WEBVTT - How Does the Wayback Machine Work?

0:00:01.920 --> 0:00:06.520
<v Speaker 1>Welcome to BrainStuff, a production of iHeartRadio. Hey

0:00:06.559 --> 0:00:10.160
<v Speaker 1>BrainStuff, Lauren Vogelbaum here. If a tree falls

0:00:10.160 --> 0:00:13.280
<v Speaker 1>in a forest, does it really make a sound? And if

0:00:13.280 --> 0:00:17.040
<v Speaker 1>a website changes overnight, did its previous homepage ever really

0:00:17.040 --> 0:00:20.160
<v Speaker 1>exist in the first place? Because so much of our

0:00:20.160 --> 0:00:23.720
<v Speaker 1>world is increasingly digital and ephemeral, it's not just a

0:00:23.720 --> 0:00:28.040
<v Speaker 1>philosophical question, it's also a simple matter of history. That's

0:00:28.080 --> 0:00:30.680
<v Speaker 1>why the Wayback Machine, which features snapshots of

0:00:30.680 --> 0:00:33.520
<v Speaker 1>websites as they age and change, is such a fascinating

0:00:33.560 --> 0:00:36.400
<v Speaker 1>glimpse into the dusty corners of the web. The Wayback

0:00:36.440 --> 0:00:39.360
<v Speaker 1>Machine is a massive digital archive meant to preserve

0:00:39.400 --> 0:00:42.400
<v Speaker 1>web pages that would otherwise be permanently lost to time.

0:00:43.240 --> 0:00:45.519
<v Speaker 1>Without this hoard of data, every time a page was

0:00:45.600 --> 0:00:48.280
<v Speaker 1>updated or deleted, it would simply vanish, as if it

0:00:48.280 --> 0:00:51.839
<v Speaker 1>had never been there. Mark Graham, the director of the

0:00:51.840 --> 0:00:56.080
<v Speaker 1>Wayback Machine, noted in an Entrepreneur article that the average

0:00:56.080 --> 0:00:58.720
<v Speaker 1>life expectancy of a web page is about a hundred days.

0:00:59.240 --> 0:01:02.240
<v Speaker 1>There are a multitude of reasons why these web pages disappear.

0:01:02.760 --> 0:01:05.520
<v Speaker 1>Site creators move on to other projects, web hosting

0:01:05.600 --> 0:01:09.120
<v Speaker 1>companies go bankrupt, or maybe the page is moved or replaced

0:01:09.120 --> 0:01:12.600
<v Speaker 1>with new data and content. One place you may have

0:01:12.640 --> 0:01:15.440
<v Speaker 1>seen the Wayback Machine's work: more than eleven million

0:01:15.480 --> 0:01:18.959
<v Speaker 1>web pages referenced in Wikipedia articles have gone bad over

0:01:18.959 --> 0:01:21.440
<v Speaker 1>the years. In other words, they now return a four

0:01:21.480 --> 0:01:25.480
<v Speaker 1>oh four, or page not found, error. Because they've been archived

0:01:25.480 --> 0:01:27.880
<v Speaker 1>in the Wayback Machine, technicians there were able to

0:01:28.000 --> 0:01:30.959
<v Speaker 1>edit those Wikipedia pages, so the references now point to

0:01:31.040 --> 0:01:34.800
<v Speaker 1>archived versions of those defunct URLs. The

0:01:34.840 --> 0:01:37.560
<v Speaker 1>Wayback Machine is the brainchild of Brewster Kahle and

0:01:37.560 --> 0:01:40.880
<v Speaker 1>Bruce Gilliat, who also founded the Internet Archive, which is

0:01:40.920 --> 0:01:44.160
<v Speaker 1>a digital library of websites, books, audio and video recordings,

0:01:44.200 --> 0:01:49.240
<v Speaker 1>and software. Both projects are San Francisco-based nonprofits. Kahle

0:01:49.240 --> 0:01:52.920
<v Speaker 1>and Gilliat also created Alexa Internet, which analyzes web traffic

0:01:52.960 --> 0:01:57.720
<v Speaker 1>patterns and was sold to Amazon. Project director Graham said

0:01:57.800 --> 0:02:00.840
<v Speaker 1>via email that they, with Kahle and Gilliat, had started to

0:02:01.000 --> 0:02:04.880
<v Speaker 1>archive web pages in and in two thousand one launched

0:02:04.880 --> 0:02:07.640
<v Speaker 1>the Wayback Machine to support discovery and playback of

0:02:07.760 --> 0:02:11.680
<v Speaker 1>those archived web resources. And yes, the name was inspired

0:02:11.680 --> 0:02:14.840
<v Speaker 1>by the nineteen sixties cartoon series The Rocky and Bullwinkle Show.

0:02:15.480 --> 0:02:18.959
<v Speaker 1>In the cartoon, the Wayback, W-A-B-A-C,

0:02:19.320 --> 0:02:22.839
<v Speaker 1>Machine was a plot device used to transport the characters Mr.

0:02:22.880 --> 0:02:25.720
<v Speaker 1>Peabody and Sherman back in time to visit important events

0:02:25.720 --> 0:02:29.320
<v Speaker 1>in human history. In a world where there are more

0:02:29.360 --> 0:02:32.480
<v Speaker 1>than one point seven billion websites, with the number climbing

0:02:32.560 --> 0:02:35.760
<v Speaker 1>dramatically by the day, how can anyone possibly hope to

0:02:35.840 --> 0:02:38.960
<v Speaker 1>catalog so many web pages? The Wayback Machine uses

0:02:39.000 --> 0:02:42.119
<v Speaker 1>what are called crawlers, a type of software that automatically

0:02:42.200 --> 0:02:45.120
<v Speaker 1>moves through the web, taking snapshots of billions of sites

0:02:45.160 --> 0:02:48.640
<v Speaker 1>as it goes. Some of the process is automated, but

0:02:48.720 --> 0:02:51.440
<v Speaker 1>many of the requests are generated manually by a network

0:02:51.480 --> 0:02:54.799
<v Speaker 1>of librarians who prioritize certain types of sites that they

0:02:54.840 --> 0:02:58.840
<v Speaker 1>think are important to preserve for posterity and for future generations.

0:03:00.120 --> 0:03:04.000
<v Speaker 1>The crawlers don't capture every iteration of sites. The frequency

0:03:04.000 --> 0:03:07.720
<v Speaker 1>of snapshots differs by the sites' importance. Very significant sites

0:03:07.800 --> 0:03:10.959
<v Speaker 1>might be recorded every few hours. Others might be logged

0:03:11.000 --> 0:03:14.520
<v Speaker 1>weeks or months apart. Most aren't logged at all. So

0:03:14.800 --> 0:03:17.519
<v Speaker 1>don't worry: that embarrassing fan website you made in high

0:03:17.520 --> 0:03:20.880
<v Speaker 1>school is probably long gone by now. The Wayback

0:03:20.919 --> 0:03:24.680
<v Speaker 1>Machine aims to capture snapshots of important content, say the

0:03:24.800 --> 0:03:29.520
<v Speaker 1>breaking news headlines created by major media companies. Furthermore, it

0:03:29.560 --> 0:03:33.120
<v Speaker 1>doesn't necessarily recreate the entire site, and it doesn't preserve

0:03:33.160 --> 0:03:35.080
<v Speaker 1>the data in a way that you'd experience it with

0:03:35.120 --> 0:03:38.520
<v Speaker 1>your browser. It may only capture a few images of

0:03:38.560 --> 0:03:41.640
<v Speaker 1>a few pages and not preserve content that's linked to

0:03:41.680 --> 0:03:45.720
<v Speaker 1>other sites outside of the domain. But on a more

0:03:45.720 --> 0:03:49.080
<v Speaker 1>practical level, you've probably had the experience of clicking on

0:03:49.080 --> 0:03:50.720
<v Speaker 1>a link on a web page and getting a four

0:03:50.760 --> 0:03:53.720
<v Speaker 1>oh four or page not found notation, and now you're

0:03:53.720 --> 0:03:56.760
<v Speaker 1>wondering what was on the page originally. That's where the

0:03:56.760 --> 0:04:00.000
<v Speaker 1>Wayback Machine can help. To use the Wayback Machine,

0:04:00.280 --> 0:04:04.160
<v Speaker 1>go to archive.org/web. Type the URL

0:04:04.320 --> 0:04:06.000
<v Speaker 1>of the site you want to investigate in the

0:04:06.120 --> 0:04:09.080
<v Speaker 1>Browse History search bar, and in the results you'll see a

0:04:09.160 --> 0:04:11.920
<v Speaker 1>chronological bar graph that shows how many times the site was

0:04:11.960 --> 0:04:15.760
<v Speaker 1>crawled and saved in a given year. Click the year

0:04:15.840 --> 0:04:18.440
<v Speaker 1>and below you'll see a twelve-month calendar with various

0:04:18.520 --> 0:04:21.680
<v Speaker 1>dates highlighted. Blue highlights mean the site was saved properly,

0:04:21.920 --> 0:04:24.839
<v Speaker 1>red means it was not. Click one of the highlighted

0:04:24.920 --> 0:04:27.599
<v Speaker 1>dates and the site's snapshots will appear. Click on

0:04:27.600 --> 0:04:30.359
<v Speaker 1>one of those snapshots, and just like that, you've traveled

0:04:30.360 --> 0:04:32.400
<v Speaker 1>back in time to that older version of the site.

0:04:33.400 --> 0:04:35.280
<v Speaker 1>If you want to make sure that a particular site

0:04:35.320 --> 0:04:37.760
<v Speaker 1>is recorded to the archive, you can do so manually.

0:04:38.360 --> 0:04:41.120
<v Speaker 1>Use the Save Page Now option to save a specific

0:04:41.120 --> 0:04:44.200
<v Speaker 1>page once, but realize that doing so only saves that

0:04:44.320 --> 0:04:47.440
<v Speaker 1>one page, not an entire website, and it doesn't guarantee

0:04:47.440 --> 0:04:50.279
<v Speaker 1>that the site will be crawled in the future. And

0:04:50.720 --> 0:04:53.920
<v Speaker 1>if content owners want their material excluded from the Wayback Machine,

0:04:54.160 --> 0:04:56.320
<v Speaker 1>they can submit a request by sending an email to

0:04:56.400 --> 0:05:00.560
<v Speaker 1>info@archive.org. Graham says that the most

0:05:00.600 --> 0:05:02.640
<v Speaker 1>amazing thing about the Wayback Machine is that it

0:05:02.720 --> 0:05:04.920
<v Speaker 1>exists at all, and how much of the public web

0:05:04.960 --> 0:05:07.120
<v Speaker 1>it's able to preserve, given that it has such a

0:05:07.120 --> 0:05:10.039
<v Speaker 1>small budget and team. They do use volunteers as well.

0:05:11.240 --> 0:05:13.840
<v Speaker 1>He said, with more support, we can do an even

0:05:13.920 --> 0:05:16.080
<v Speaker 1>better job of backing up more of the public web.

0:05:16.640 --> 0:05:19.040
<v Speaker 1>Funding for the Internet Archive and the Wayback Machine

0:05:19.240 --> 0:05:22.040
<v Speaker 1>comes from a combination of earned income from our subscription

0:05:22.080 --> 0:05:25.400
<v Speaker 1>based web archiving service archive-it.org, major donors

0:05:25.400 --> 0:05:27.880
<v Speaker 1>and foundations, as well as contributions from more than a

0:05:27.960 --> 0:05:31.280
<v Speaker 1>hundred thousand individual donors. We love being able to give

0:05:31.279 --> 0:05:33.960
<v Speaker 1>away our services and don't run ads on our web pages.

0:05:35.200 --> 0:05:37.040
<v Speaker 1>He's sure that the Wayback Machine will become even

0:05:37.120 --> 0:05:40.599
<v Speaker 1>more important in the future. Quote. As the nature of

0:05:40.600 --> 0:05:44.320
<v Speaker 1>how people communicate and share information evolves, so too we

0:05:44.360 --> 0:05:48.120
<v Speaker 1>will need to build technologies, processes, and partnerships to continue

0:05:48.160 --> 0:05:50.080
<v Speaker 1>to do the best job we can to preserve as

0:05:50.120 --> 0:05:53.440
<v Speaker 1>much of this public information as possible. All in support

0:05:53.440 --> 0:05:55.960
<v Speaker 1>of the Wayback Machine's mission to help make the

0:05:55.960 --> 0:05:59.400
<v Speaker 1>web more useful and reliable, and in particular, to help

0:05:59.440 --> 0:06:04.279
<v Speaker 1>support journalists, activists, academics, historians, researchers, and the general public.

0:06:09.560 --> 0:06:11.960
<v Speaker 1>Today's episode was written by Nathan Chandler and produced by

0:06:11.960 --> 0:06:14.719
<v Speaker 1>Tyler Klang. BrainStuff is a production of iHeartRadio's

0:06:14.720 --> 0:06:16.599
<v Speaker 1>HowStuffWorks. For more on this and lots of

0:06:16.600 --> 0:06:19.400
<v Speaker 1>other well-archived topics, visit our home planet,

0:06:19.400 --> 0:06:21.880
<v Speaker 1>howstuffworks.com. And for more podcasts from iHeartRadio,

0:06:21.960 --> 0:06:24.400
<v Speaker 1>visit the iHeartRadio app, Apple Podcasts,

0:06:24.440 --> 0:06:26.240
<v Speaker 1>or wherever you listen to your favorite shows.