WEBVTT - The Internet Archive 0:00:04.480 --> 0:00:12.639 Welcome to TechStuff, a production from iHeartRadio. Hey there, 0:00:12.640 --> 0:00:16.000 and welcome to TechStuff. I'm your host, Jonathan Strickland. 0:00:16.040 --> 0:00:19.040 I'm an executive producer with iHeart Podcasts. And how the 0:00:19.079 --> 0:00:23.280 tech are you? So let's take a little literary trip. 0:00:23.600 --> 0:00:29.200 In Anthony Burgess's A Clockwork Orange, the extremely wicked protagonist, 0:00:29.680 --> 0:00:32.920 and that's putting it lightly, at one point early in 0:00:32.920 --> 0:00:36.760 the novel reflects on the nature of permanence. He thinks 0:00:36.800 --> 0:00:40.680 the reader might not remember what milk bars were like 0:00:41.159 --> 0:00:45.360 due to, quote, things changing so skorry these days and 0:00:45.479 --> 0:00:49.600 everybody very quick to forget, newspapers not being read much 0:00:49.760 --> 0:00:54.120 neither, end quote. Alex in this case is saying that 0:00:54.200 --> 0:00:58.040 the combination of the world changing very quickly (skorry is 0:00:58.080 --> 0:01:01.880 derived from a Slavic word meaning swiftly or quickly) and 0:01:02.000 --> 0:01:05.720 people having short memories means that referencing something that happened 0:01:05.760 --> 0:01:08.680 even just a few years ago might mean you're met 0:01:08.680 --> 0:01:12.360 with blank stares because the world has moved on. Now 0:01:12.520 --> 0:01:15.759 take that same sentiment and crank it up to eleven 0:01:16.040 --> 0:01:18.840 when you talk about the Internet in general and the 0:01:18.840 --> 0:01:21.600 Web in particular. So, on the one hand, we know 0:01:22.000 --> 0:01:24.240 that the rule of thumb is that once something gets 0:01:24.280 --> 0:01:27.920 posted online, that's kind of it, right? It's sort of 0:01:27.959 --> 0:01:31.240 perpetually online. Like that's kind of the joke. 
Like once 0:01:31.280 --> 0:01:33.520 it's up, it's up, and you can take it down, 0:01:33.520 --> 0:01:35.280 but there's going to be a copy of it somewhere. 0:01:35.720 --> 0:01:39.319 So even if the originator tries to take down whatever 0:01:39.400 --> 0:01:43.440 the stuff was, somebody's got it. But on the other hand, 0:01:43.440 --> 0:01:46.200 we also know that so much stuff gets added every 0:01:46.240 --> 0:01:49.400 single day to the Internet. There's actually a colossal mountain 0:01:49.400 --> 0:01:53.120 of content out there that just keeps getting bigger moment 0:01:53.160 --> 0:01:55.960 by moment, and everything that came before it can end 0:01:56.040 --> 0:01:59.480 up getting buried in the process. And sometimes stuff can 0:01:59.560 --> 0:02:03.760 be added and taken down without anyone being the wiser. Now, 0:02:03.800 --> 0:02:06.640 on top of that, web pages obviously can change. A 0:02:06.720 --> 0:02:10.760 website might adopt a new format or style, might incorporate 0:02:10.840 --> 0:02:15.000 new technologies and interfaces that are added to web browsers, 0:02:15.360 --> 0:02:18.680 or it might choose to remove sections that once might 0:02:18.720 --> 0:02:21.960 have been relevant but maybe now not so much. Or 0:02:22.080 --> 0:02:27.079 entire websites could disappear as servers go offline or companies 0:02:27.320 --> 0:02:32.040 go bankrupt, or you know, web administrators just lose interest. 0:02:32.520 --> 0:02:36.520 The entire spectrum of human output can be found on 0:02:36.560 --> 0:02:39.400 the web. Not every instance of human output, but an 0:02:39.440 --> 0:02:44.440 example of everything is out there. Everything from deep philosophical 0:02:44.520 --> 0:02:48.040 musings to the most banal posts you know, which often 0:02:48.520 --> 0:02:51.320 revolve around what someone is having for lunch. All of 0:02:51.320 --> 0:02:53.760 that finds its way to the Internet. 
And while you 0:02:53.840 --> 0:02:56.600 might argue that a lot of it, or perhaps even 0:02:56.680 --> 0:02:59.040 most of it, isn't really worth the time it 0:02:59.080 --> 0:03:02.920 takes to consume, let alone keep it around, there is 0:03:03.080 --> 0:03:06.160 undeniably a huge amount of valuable data out there too, 0:03:06.639 --> 0:03:09.800 but there's no guarantee that it will stay there or 0:03:09.880 --> 0:03:13.880 remain easily findable. And that's where today's topic comes in. 0:03:13.960 --> 0:03:16.480 I wanted to talk about a project that began back 0:03:16.520 --> 0:03:19.320 in nineteen ninety six. It's a project that aims to 0:03:19.360 --> 0:03:22.520 preserve as much of the Internet as possible in little 0:03:22.720 --> 0:03:26.600 slices of time, little snapshots. Not only does that mean 0:03:26.639 --> 0:03:29.200 you can potentially dig up something that hasn't been online 0:03:29.240 --> 0:03:31.919 for years, but also you can get a look at 0:03:32.000 --> 0:03:35.080 what different sites were like in various eras of the Web. 0:03:35.320 --> 0:03:37.600 It could be a really eye-opening experience to see 0:03:37.640 --> 0:03:40.480 something like Amazon and what it looked like, you know, 0:03:40.520 --> 0:03:43.960 shortly after it launched, compared to what it looks like today. 0:03:44.400 --> 0:03:48.960 So we are going to talk about the Internet Archive. Now, 0:03:48.960 --> 0:03:51.240 to do that, we need to talk a little bit 0:03:51.240 --> 0:03:54.040 about the people who founded the ding dang darn thing, 0:03:54.320 --> 0:03:58.520 and that would be Brewster Kahle and Bruce Gilliat. So 0:03:58.680 --> 0:04:02.040 Kahle graduated from MIT with a degree in computer science 0:04:02.040 --> 0:04:06.280 and engineering. After he graduated, he joined fellow MIT graduate 0:04:06.400 --> 0:04:10.080 Danny Hillis, who had created a company called Thinking Machines. 0:04:10.320 --> 0:04:13.960 So this was a supercomputer company. 
His team specialized 0:04:13.960 --> 0:04:17.920 in building massively parallel computer systems, mostly with the aim 0:04:17.960 --> 0:04:21.120 of building machines for AI research and development. So yeah, 0:04:21.240 --> 0:04:24.480 Kahle was working on the challenges of providing AI researchers 0:04:24.520 --> 0:04:28.040 with the compute power they need, decades before our current 0:04:28.120 --> 0:04:33.040 AI explosion. Bruce Gilliat is also a computer scientist, and 0:04:33.080 --> 0:04:35.160 that's just about all I know about him. I mean, 0:04:35.320 --> 0:04:38.040 I know he is, or at least was, married, and 0:04:38.120 --> 0:04:40.600 I also know he owned a series of very impressive 0:04:40.600 --> 0:04:43.960 houses in the San Francisco and San Jose areas because 0:04:44.000 --> 0:04:46.600 it made the news whenever he sold one or bought 0:04:46.600 --> 0:04:49.679 a new one. But other than that, there's precious little 0:04:49.680 --> 0:04:53.000 information about him that I could find, which is somewhat ironic 0:04:53.040 --> 0:04:55.440 when you consider that he has dedicated a lot of 0:04:55.440 --> 0:04:58.520 time and effort to preserving information on the Internet. He 0:04:58.520 --> 0:05:00.839 would go on to co-found the company called Alexa 0:05:00.920 --> 0:05:03.960 Internet with Brewster Kahle, but that's getting ahead of ourselves. 0:05:04.080 --> 0:05:07.839 So most of my story will center around Kahle simply 0:05:07.880 --> 0:05:10.520 because out of the two co-founders, he's the one 0:05:10.520 --> 0:05:13.839 who acted more as the face of the efforts, and Gilliat, 0:05:13.839 --> 0:05:15.880 from what I can tell, has just been really good 0:05:15.880 --> 0:05:20.120 about kind of maintaining a very personal private life. So 0:05:20.880 --> 0:05:24.960 I don't mean to diminish Gilliat's contributions, but at the 0:05:24.960 --> 0:05:27.640 same time, you know, I can only cover what I 0:05:27.640 --> 0:05:31.240 can find. 
So in nineteen eighty nine, Kahle, along with 0:05:31.320 --> 0:05:35.080 a colleague named Harry Morris, created an innovative tool for 0:05:35.200 --> 0:05:38.760 the blossoming Internet. Now remember, this is the Internet. It's 0:05:38.839 --> 0:05:42.119 not the World Wide Web. The Web didn't exist yet; 0:05:42.240 --> 0:05:45.159 the Internet did, and the tool they created was called 0:05:45.160 --> 0:05:51.960 the Wide Area Information Server, or WAIS, pronounced "ways." So people 0:05:52.000 --> 0:05:55.040 could create an Internet server. They could host documents on 0:05:55.080 --> 0:05:59.960 their servers. But finding these documents was really hard 0:06:00.720 --> 0:06:04.680 because you didn't necessarily have hyperlinks connecting one document to 0:06:04.760 --> 0:06:07.920 others and vice versa. You didn't have an easy way 0:06:07.960 --> 0:06:12.680 of even navigating through different documents from one to the next. 0:06:13.160 --> 0:06:15.320 So it was almost a case that you needed to 0:06:15.360 --> 0:06:19.080 know where something was and what it was called first, 0:06:19.240 --> 0:06:22.440 and then you could go to the relevant server and 0:06:22.480 --> 0:06:26.599 retrieve that document. Otherwise the document would just remain quietly 0:06:26.680 --> 0:06:30.359 sitting on some server somewhere and no one would know 0:06:30.400 --> 0:06:34.080 about it. Now, that is antithetical to the entire purpose 0:06:34.160 --> 0:06:37.840 of a wide area information sharing system, because, I mean, 0:06:37.880 --> 0:06:40.800 the name tells us the whole purpose of this technology 0:06:40.839 --> 0:06:45.360 is to allow information to be widely shared. Jeremy Norman's 0:06:45.400 --> 0:06:50.000 History of Information lists WAIS as, quote, the first Internet 0:06:50.080 --> 0:06:54.120 publishing system, just predating Gopher and the World Wide Web, 0:06:54.320 --> 0:06:58.839 end quote. 
In a recorded presentation to some Xerox employees, 0:06:59.000 --> 0:07:03.120 Kahle laid out his personal perspective on what he wants from 0:07:03.279 --> 0:07:06.159 his experience on the Internet. So first up, he said 0:07:06.360 --> 0:07:09.520 he wanted his own personal information to be easily accessible 0:07:09.960 --> 0:07:13.240 by him specifically. Not that it should be easily accessible 0:07:13.280 --> 0:07:16.880 to everybody, but specifically to him. He wanted the ability 0:07:16.920 --> 0:07:19.760 to get access to all the different stuff he generates, 0:07:19.800 --> 0:07:22.280 like articles and such, and to make it really easy 0:07:22.320 --> 0:07:25.080 to do that. He also wanted the ability for publishers 0:07:25.120 --> 0:07:27.960 to get their work to him. So in Kahle's mind, 0:07:28.280 --> 0:07:30.720 the best approach would be for published works that are 0:07:30.760 --> 0:07:33.360 relevant to his interests to find their way to him, 0:07:33.560 --> 0:07:36.120 as opposed to Kahle having to go out and hunt 0:07:36.200 --> 0:07:39.480 down these published works himself. And he pointed out this 0:07:39.600 --> 0:07:42.480 is what publishers want too, because you wouldn't publish something 0:07:42.560 --> 0:07:45.239 unless you wanted folks to actually read it. He also 0:07:45.320 --> 0:07:48.160 said that he wanted this technology to be usable anywhere. 0:07:48.600 --> 0:07:51.200 He wanted people to be able to access it no 0:07:51.240 --> 0:07:53.080 matter what kind of device they were relying on. Now 0:07:53.160 --> 0:07:56.160 he was specifically referencing laptops at the time, but he 0:07:56.280 --> 0:08:00.120 was also saying that portable computer systems, essentially things that 0:08:00.120 --> 0:08:03.400 would become smartphones and tablets, were on the horizon and 0:08:03.440 --> 0:08:05.880 that these needed to be able to access that stuff too. 
0:08:06.280 --> 0:08:09.080 And he said that he wanted people to be able 0:08:09.080 --> 0:08:11.880 to use what he had learned should he choose to 0:08:11.880 --> 0:08:15.440 share the information, that if he had come up with 0:08:15.480 --> 0:08:17.600 something that was useful and he wanted to share that, 0:08:17.640 --> 0:08:19.760 he wanted other people to be able to access that. 0:08:20.160 --> 0:08:23.120 Kahle didn't say that people should be compelled to share, 0:08:23.560 --> 0:08:26.000 but if they wanted to, it should be possible to 0:08:26.040 --> 0:08:30.560 do so. WAIS was Kahle's attempt to bring these ideas 0:08:30.640 --> 0:08:34.199 to life. In that presentation to the Xerox employees, he 0:08:34.320 --> 0:08:38.320 defined WAIS as electronic publishing. He further defined that term 0:08:38.400 --> 0:08:41.880 to mean the distribution of information. So whether the end 0:08:41.960 --> 0:08:45.080 user was to look at this information on a computer 0:08:45.120 --> 0:08:48.280 screen or they just chose to print out the information 0:08:48.640 --> 0:08:50.880 and then read it that way, that was beside the point. 0:08:51.120 --> 0:08:55.559 Electronic publishing was all about how information got from the 0:08:55.600 --> 0:08:58.760 originator to the end user. That's what made it 0:08:58.920 --> 0:09:02.880 e-publishing: it was publishing over wires. Now, one thing 0:09:03.000 --> 0:09:06.800 Kahle introduced in this presentation to Xerox was this concept 0:09:06.800 --> 0:09:10.760 of conducting searches using natural language. This concept is one 0:09:10.800 --> 0:09:13.640 that we're really familiar with today. You enter a query 0:09:13.800 --> 0:09:16.200 into a search bar. You describe what it is that 0:09:16.240 --> 0:09:19.760 you want to know or learn about, or have access to, 0:09:20.080 --> 0:09:23.400 or retrieve or whatever. 
This search engine brings back search 0:09:23.440 --> 0:09:26.600 results that are ordered by some kind of relevance depending 0:09:26.679 --> 0:09:29.960 upon the search engine's, you know, various algorithms. How the 0:09:30.000 --> 0:09:33.760 search engine determines relevance really depends upon the system itself, 0:09:33.880 --> 0:09:36.160 of course. Like, you could run the same search across 0:09:36.400 --> 0:09:39.760 different search engines and get very different results based upon 0:09:40.080 --> 0:09:45.280 that methodology of determining relevance. If the system believes it's relevant, 0:09:45.480 --> 0:09:47.240 it may or may not be relevant to what you 0:09:47.320 --> 0:09:50.520 actually want. Like, hopefully the two are aligned. If it's 0:09:50.520 --> 0:09:53.400 a really good search engine, then you're going to get 0:09:53.480 --> 0:09:57.600 something that is actually meaningful to you. Anyway, WAIS was 0:09:57.720 --> 0:10:01.720 kind of following that approach back before there was 0:10:01.760 --> 0:10:04.280 a World Wide Web, you know, when you just needed 0:10:04.280 --> 0:10:08.200 a way to find stuff that was being stored on 0:10:08.280 --> 0:10:11.880 these Internet servers and to be able to retrieve these 0:10:11.920 --> 0:10:14.600 documents to make use of them. Otherwise you had this 0:10:14.679 --> 0:10:19.360 incredibly powerful communications tool, but it was so challenging to 0:10:19.480 --> 0:10:22.600 use in a meaningful way that the information stored there 0:10:23.000 --> 0:10:26.560 would be not that useful. I think of it as akin 0:10:26.679 --> 0:10:31.720 to this: imagine that there's this one remote library and it's tiny, 0:10:32.080 --> 0:10:36.440 but it has the world's only copy of some text. 0:10:36.840 --> 0:10:39.280 But this library's in the middle of nowhere. 
It's really 0:10:39.360 --> 0:10:42.160 hard to get to. The fact that that library has 0:10:42.280 --> 0:10:45.800 that document would not be terribly useful to most people, 0:10:45.920 --> 0:10:47.840 and so it might as well not have the document 0:10:47.880 --> 0:10:50.120 at all. That's kind of what WAIS was trying to 0:10:50.160 --> 0:10:52.920 do: solve this problem of making it easier to 0:10:52.960 --> 0:10:57.400 get access to this wealth of information that Kahle saw 0:10:57.720 --> 0:11:01.880 was only going to get more complex and more full 0:11:01.960 --> 0:11:05.600 of data. Well, we'll move away from WAIS, because we 0:11:05.600 --> 0:11:08.280 could do a full episode about that. I will say 0:11:08.280 --> 0:11:11.960 that Kahle and Morris, the founders of WAIS, the guys 0:11:11.960 --> 0:11:17.120 who created the WAIS technologies, would actually leave Thinking Machines 0:11:17.320 --> 0:11:20.680 and they would found a spinoff company just called WAIS Incorporated. 0:11:20.920 --> 0:11:23.439 And it was around this point when the mysterious Bruce 0:11:23.480 --> 0:11:26.840 Gilliat joined the team. And while the World Wide Web would 0:11:26.880 --> 0:11:29.840 debut in the early nineties, which really opened up accessibility 0:11:29.840 --> 0:11:32.040 to information on the Internet for a lot of people, 0:11:32.480 --> 0:11:35.840 most of them for the first time, WAIS would continue 0:11:35.880 --> 0:11:38.920 to remain relevant. In fact, it was relevant enough that 0:11:39.040 --> 0:11:42.480 in nineteen ninety five AOL would come calling with an 0:11:42.480 --> 0:11:45.959 offer to purchase the company for a cool fifteen million dollars. 0:11:46.000 --> 0:11:48.840 If we adjust that for inflation to today's money, that would 0:11:48.880 --> 0:11:53.640 be around thirty million bucks, around that ballpark. 
Now, a 0:11:53.640 --> 0:11:56.680 lot of the folks at WAIS Incorporated would split off 0:11:56.760 --> 0:12:00.679 to create new companies after this acquisition, and within a 0:12:00.800 --> 0:12:04.400 year that included Kahle and Gilliat, who went on to 0:12:04.559 --> 0:12:10.000 found a new company called Alexa Internet. And you might think, huh, Alexa, 0:12:10.120 --> 0:12:13.280 you mean like the same name as the Amazon digital assistant? 0:12:13.679 --> 0:12:16.559 And yes, exactly that, because, as it would turn out, 0:12:16.600 --> 0:12:21.840 Amazon would ultimately acquire Alexa Internet just a few years 0:12:21.880 --> 0:12:25.080 after it was founded. But the name derived from the 0:12:25.120 --> 0:12:29.800 Library of Alexandria, the ancient library in Egypt that at 0:12:29.880 --> 0:12:33.240 one point housed one of the world's largest collections of 0:12:33.320 --> 0:12:39.400 accumulated knowledge. Now around forty eight BCE, Julius Caesar, Julie 0:12:39.400 --> 0:12:42.960 Baby, and his boys, they barged into Alexandria, and as 0:12:43.000 --> 0:12:46.840 a consequence of their rowdy invasion, the library caught fire 0:12:47.200 --> 0:12:49.920 and much of the collection burned. Sadly, that was not 0:12:49.960 --> 0:12:52.880 the only indignity. In fact, it wasn't the first indignity 0:12:53.200 --> 0:12:57.120 that the library suffered that would impact its relevance. Further 0:12:57.240 --> 0:13:00.000 conflicts a couple of centuries later pretty much wiped out 0:13:00.160 --> 0:13:03.560 whatever had been left from the previous calamities, and the 0:13:03.600 --> 0:13:07.079 Library of Alexandria became kind of a touchstone for folks 0:13:07.080 --> 0:13:10.160 who have stressed the importance of access to knowledge and 0:13:10.240 --> 0:13:13.240 the protection of that knowledge, and that the consequences that 0:13:13.360 --> 0:13:15.920 could follow from the loss of such knowledge can be 0:13:15.960 --> 0:13:20.200 really dire. 
See also, like, the Middle Ages, the Dark Ages, 0:13:20.200 --> 0:13:24.120 for example. That loss of knowledge is a really terrible thing. 0:13:24.520 --> 0:13:28.000 So the impetus for Alexa Internet was that Kahle and 0:13:28.080 --> 0:13:31.760 Gilliat wanted, in the words of the Web Design Museum, quote, 0:13:31.840 --> 0:13:35.960 to develop advanced web navigation that would continually improve itself 0:13:36.080 --> 0:13:39.520 on the basis of user generated data, end quote, which 0:13:39.559 --> 0:13:42.679 is a pretty advanced idea for nineteen ninety six, when 0:13:42.720 --> 0:13:45.600 the Web was still very young and the general public 0:13:45.679 --> 0:13:47.439 was still just trying to get a grip on exactly 0:13:47.480 --> 0:13:51.320 what the Web and, by extension, the Internet were. One 0:13:51.360 --> 0:13:54.679 of the first tools that Alexa Internet developed was a 0:13:54.720 --> 0:13:58.000 browser toolbar. So installing this toolbar into a browser would 0:13:58.000 --> 0:14:01.120 give users access to a sort of crowd powered 0:14:01.200 --> 0:14:04.640 recommendation engine. In some ways, it's not that different from 0:14:04.840 --> 0:14:08.360 sites like Digg and Reddit that would later rely on 0:14:08.440 --> 0:14:11.880 the user community to actually work and to recommend links 0:14:11.920 --> 0:14:17.120 to really interesting sites. This toolbar would recommend sites 0:14:17.120 --> 0:14:20.760 to users based upon how the overall community was browsing. 0:14:20.920 --> 0:14:24.160 So the more people who were using this toolbar, the 0:14:24.200 --> 0:14:27.480 more information was going into where they were going, and 0:14:27.520 --> 0:14:29.720 thus you would get different recommendations. So if a lot 0:14:29.720 --> 0:14:32.440 of people were navigating to a specific site for whatever reason, 0:14:32.680 --> 0:14:35.320 you might get a recommendation to do the same. 
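The crowd-powered recommendation idea described above can be illustrated with a minimal Python sketch. This is not how the Alexa toolbar actually worked; the episode gives no implementation details. The function name `recommend`, the sample browsing data, and the rule "suggest the sites most other users visited that this user has not" are all assumptions made for illustration.

```python
from collections import Counter


def recommend(visits_by_user, user, top_n=3):
    """Suggest sites popular with the community that `user` has not visited.

    `visits_by_user` maps a (hypothetical) user name to the list of sites
    that user browsed. Each other user counts at most once per site.
    """
    seen = set(visits_by_user.get(user, []))
    counts = Counter(
        site
        for other, sites in visits_by_user.items()
        if other != user
        for site in set(sites)
    )
    # Most-visited first; ties are broken alphabetically so output is stable.
    ranked = sorted(
        (site for site in counts if site not in seen),
        key=lambda site: (-counts[site], site),
    )
    return ranked[:top_n]


# Made-up browsing data, purely for illustration.
visits = {
    "ann": ["news.example", "maps.example"],
    "bob": ["news.example", "games.example"],
    "cat": ["news.example", "maps.example", "games.example"],
    "dan": ["recipes.example"],
}
suggestions = recommend(visits, "dan")
print(suggestions)
```

The more users contribute browsing data, the more the counts reflect the community as a whole, which is the effect described in the episode.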
It 0:14:35.360 --> 0:14:38.160 was an attempt at an organic way for folks to 0:14:38.240 --> 0:14:41.560 suggest websites, kind of like a word of mouth campaign, 0:14:41.920 --> 0:14:45.920 and Alexa Internet would also provide meta information about websites 0:14:45.960 --> 0:14:48.840 to users if they wanted it. Meta information is information 0:14:48.920 --> 0:14:52.240 about information, so this would include stuff like how many 0:14:52.440 --> 0:14:55.400 web pages were part of an overall website, or how 0:14:55.440 --> 0:14:58.600 many other websites were pointing back to the one you 0:14:58.640 --> 0:15:01.200 were on, and so forth. A lot of the stuff 0:15:01.360 --> 0:15:04.840 that Alexa Internet could tell you would reflect a specific 0:15:04.880 --> 0:15:07.640 web page's relevance. It's the same sort of information that 0:15:07.640 --> 0:15:10.600 search engines like Google would take into account when deciding 0:15:10.640 --> 0:15:14.480 relevance for search results. And that meant that it didn't 0:15:14.480 --> 0:15:16.520 take very long for Amazon to come around with an 0:15:16.560 --> 0:15:20.000 offer to purchase Alexa Internet. I'll talk about that more, 0:15:20.120 --> 0:15:22.920 as well as the development of the Internet Archive after 0:15:22.960 --> 0:15:26.360 we come back from this quick break to thank our sponsors. 0:15:35.600 --> 0:15:40.000 So Amazon in nineteen ninety nine takes a look at 0:15:40.080 --> 0:15:44.200 Alexa Internet and says, Wow, this is pretty incredible. 
This 0:15:44.600 --> 0:15:49.480 little company has created some means of checking for stuff 0:15:49.480 --> 0:15:53.840 like relevance and metadata that could be really, really useful 0:15:53.880 --> 0:15:57.280 for us. And so Amazon made an offer that Alexa 0:15:57.320 --> 0:16:00.160 Internet couldn't refuse: to acquire the company for the 0:16:00.240 --> 0:16:03.160 princely sum of two hundred and fifty million dollars in 0:16:03.280 --> 0:16:07.680 Amazon stock in May of ninety nine. So this is 0:16:07.880 --> 0:16:10.880 a little different than the earlier deal we talked about 0:16:10.880 --> 0:16:14.840 where AOL bought, you know, WAIS Incorporated, because they 0:16:14.840 --> 0:16:17.120 bought it with two hundred and fifty million dollars worth 0:16:17.200 --> 0:16:19.920 of stock. If we just treated that like it was 0:16:19.960 --> 0:16:25.040 a cash exchange, then if we adjust for inflation, 0:16:25.120 --> 0:16:28.240 that's like around four hundred and sixty nine million dollars 0:16:28.240 --> 0:16:31.480 worth of stock. But that's not really how you deal 0:16:31.520 --> 0:16:33.920 with the value here, right? You have to think about 0:16:33.920 --> 0:16:36.680 how much was the stock worth back in nineteen ninety 0:16:36.800 --> 0:16:39.600 nine versus how much is the stock worth today? I 0:16:39.800 --> 0:16:43.480 checked and I saw that in May of nineteen ninety nine, 0:16:43.560 --> 0:16:46.520 Amazon stock was trading for around two dollars eighty nine 0:16:46.560 --> 0:16:49.400 cents per share. These days, it's closer to one hundred 0:16:49.400 --> 0:16:53.840 and eighty dollars per share. Plus, between that time, Amazon 0:16:53.920 --> 0:16:56.760 had two different stock splits. There was a two to 0:16:56.760 --> 0:16:59.520 one split in late ninety nine, and there was a 0:16:59.560 --> 0:17:03.240 twenty to one stock split in twenty twenty two. 
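To make the stock arithmetic above concrete, here is a rough Python sketch that uses only the figures quoted in the episode. The episode does not say whether the quoted two dollars eighty nine cents is already adjusted for the later splits, so the sketch computes both readings; the function name `value_today` and the assumption that every share was held the whole time are hypothetical simplifications, not a statement of what the deal is actually worth.

```python
# Figures as quoted in the episode; everything else is an assumption.
DEAL_VALUE_USD = 250_000_000   # Amazon stock paid for Alexa Internet
PRICE_MAY_1999 = 2.89          # quoted per-share price in May 1999
PRICE_TODAY = 180.00           # "closer to one hundred and eighty dollars"
SPLIT_FACTOR = 2 * 20          # 2-for-1 (late 1999), then 20-for-1 (2022)


def value_today(deal_value, old_price, new_price, split_factor=1):
    """Value of the original stock today, assuming it was all held."""
    shares_then = deal_value / old_price
    shares_now = shares_then * split_factor
    return shares_now * new_price


# Reading 1: the quoted 1999 price is already split-adjusted, which is how
# historical price charts are commonly presented (an assumption here).
adjusted = value_today(DEAL_VALUE_USD, PRICE_MAY_1999, PRICE_TODAY)

# Reading 2: the quoted price is the raw 1999 price, so each share has
# since become 40 shares.
unadjusted = value_today(
    DEAL_VALUE_USD, PRICE_MAY_1999, PRICE_TODAY, SPLIT_FACTOR
)

print(f"split-adjusted reading: about ${adjusted / 1e9:.1f} billion")
print(f"raw-price reading:      about ${unadjusted / 1e9:.1f} billion")
```

Under the first reading the stock would be worth roughly fifteen and a half billion dollars; the second reading is forty times larger, which shows how much the estimate depends on what the quoted price means.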
When 0:17:03.240 --> 0:17:06.080 you factor all that in, that two hundred and fifty 0:17:06.080 --> 0:17:10.840 million dollars in stock ends up being a ton of wealth. 0:17:11.240 --> 0:17:13.760 Like, it's just a huge amount. It would take a 0:17:13.800 --> 0:17:17.040 lot of calculating to get an estimate, and even then 0:17:17.359 --> 0:17:21.520 it wouldn't really be accurate. Just say that deal is 0:17:21.560 --> 0:17:25.399 worth a lot. So anyway, the important thing with the 0:17:25.400 --> 0:17:29.119 Internet Archive is that Kahle and Gilliat, through their work 0:17:29.160 --> 0:17:32.359 in creating tools for Alexa Internet, found themselves able to 0:17:32.400 --> 0:17:36.920 create snapshots of the Web. So they were using Alexa 0:17:37.000 --> 0:17:40.560 Internet to have a commercial business, and they established the 0:17:40.560 --> 0:17:45.480 Internet Archive as a way of preserving information that had, 0:17:45.560 --> 0:17:48.680 at some point or another, found its home on the Internet. 0:17:48.960 --> 0:17:52.480 So they were using Alexa Internet tech to crawl the 0:17:52.560 --> 0:17:55.080 young Web in order to index everything, which is a 0:17:55.200 --> 0:17:58.040 necessary step if you want to give people access to 0:17:58.119 --> 0:18:00.399 the various documents posted on the web. We first have 0:18:00.440 --> 0:18:02.639 to know what is there and where it is. To 0:18:02.720 --> 0:18:07.320 do that, you've got to index everything. And then they said, well, 0:18:07.600 --> 0:18:09.760 now that we are able to index this, we could 0:18:09.800 --> 0:18:14.000 actually download these little snapshots and keep them. 
And according 0:18:14.000 --> 0:18:18.560 to the Internet Archive, that would be important because the 0:18:18.640 --> 0:18:23.119 average lifespan for a new web page was not very long. 0:18:23.400 --> 0:18:27.320 So contrary to our belief that once something is posted 0:18:27.359 --> 0:18:30.480 to the Internet, it's there forever, the archive found that 0:18:30.520 --> 0:18:34.560 on average, new web pages stuck around for about seventy 0:18:34.680 --> 0:18:38.679 seven days, which means it's less than three months, and 0:18:38.720 --> 0:18:42.639 then poof, they would disappear. Like, maybe they would change drastically, 0:18:42.680 --> 0:18:46.679 maybe they would just go away. Now, imagine that you 0:18:46.720 --> 0:18:49.800 were to walk into a brick and mortar library, but 0:18:49.880 --> 0:18:52.000 then you found out that on average the books in 0:18:52.040 --> 0:18:54.639 that library would only stick around for three months before 0:18:54.680 --> 0:18:57.720 being lost forever. And think of all the knowledge that 0:18:57.760 --> 0:19:01.200 would disappear on a regular, ongoing basis. It 0:19:01.200 --> 0:19:03.840 would be impossible to calculate the impact of that kind 0:19:03.840 --> 0:19:06.200 of reality. It would be like losing the Library of 0:19:06.240 --> 0:19:10.679 Alexandria regularly every three months. So Kahle had come to 0:19:10.720 --> 0:19:14.160 the conclusion that knowledge should be preserved and made available 0:19:14.200 --> 0:19:17.399 for posterity. This is similar to an idea that was 0:19:17.440 --> 0:19:20.880 proposed by Stewart Brand back in the nineteen eighties. It's 0:19:20.920 --> 0:19:24.560 a complicated idea that typically gets boiled down to the 0:19:24.600 --> 0:19:29.679 saying information wants to be free. That's actually an oversimplification 0:19:29.720 --> 0:19:33.800 of what Brand was really communicating. But his point was 0:19:33.800 --> 0:19:37.040 that information's value is kind of like a paradox. 
The 0:19:37.119 --> 0:19:41.440 information could be incredibly valuable, right, it could be absolutely critical, 0:19:41.480 --> 0:19:45.439 and therefore it could be expensive, but the cost of 0:19:45.480 --> 0:19:50.040 distributing information was consistently declining. It was getting easier and 0:19:50.200 --> 0:19:54.120 cheaper to share information, and the benefits of making information 0:19:54.240 --> 0:19:59.560 accessible are typically pretty tremendous. But information is only accessible 0:20:00.119 --> 0:20:03.560 if someone is able to hold onto that info. Otherwise 0:20:03.560 --> 0:20:06.520 it's lost, right? The Internet was such a volatile thing 0:20:06.560 --> 0:20:09.119 that there was no guarantee that what you saw today 0:20:09.520 --> 0:20:13.000 would be available tomorrow. In the days before the dynamic web, 0:20:13.680 --> 0:20:16.639 it wasn't really unusual for someone to establish a web page, 0:20:16.880 --> 0:20:20.159 to publish that page, and then later on to wipe 0:20:20.160 --> 0:20:24.480 the slate clean or, you know, otherwise alter vast portions 0:20:24.480 --> 0:20:27.040 of that page in order to use that same web 0:20:27.400 --> 0:20:31.400 landscape to host a totally different document. So the old 0:20:31.440 --> 0:20:34.720 stuff would just disappear. And so Kahle and Gilliat created 0:20:35.000 --> 0:20:40.119 the Internet Archive, a nonprofit organization dedicated to the archival 0:20:40.440 --> 0:20:44.399 of information across the Internet. And I think most people 0:20:44.800 --> 0:20:49.040 are familiar with it from the Wayback Machine, but 0:20:49.080 --> 0:20:52.240 that's just one part of what the Internet Archive does. 
0:20:52.600 --> 0:20:55.199 As stated in the Library of Congress, the mission of 0:20:55.240 --> 0:20:59.480 the Internet Archive was, quote, offering permanent access for researchers, 0:20:59.520 --> 0:21:03.040 historians, and scholars to historical collections that exist in 0:21:03.119 --> 0:21:07.040 digital format, end quote. Kahle and Gilliat founded the Internet 0:21:07.119 --> 0:21:09.600 Archive the same year they founded Alexa Internet. So that's 0:21:09.720 --> 0:21:14.440 nineteen ninety six. And it wasn't easy. And why is that? Well, 0:21:14.880 --> 0:21:17.280 you've got to think about the challenge you face if 0:21:17.320 --> 0:21:20.919 you want to archive everything on the Internet, or at 0:21:21.000 --> 0:21:24.480 least everything that you're allowed to archive on the Internet. 0:21:24.600 --> 0:21:26.600 We'll come back to that a couple of times. So, 0:21:26.640 --> 0:21:28.240 for one thing, you need to create a way to 0:21:28.320 --> 0:21:31.920 capture the content of a web page and to preserve 0:21:31.960 --> 0:21:35.119 that for posterity. And you need a way for people 0:21:35.280 --> 0:21:39.560 to access those archived web pages and to navigate them. 0:21:39.800 --> 0:21:43.639 So Alexa Internet would end up developing these technologies and 0:21:43.680 --> 0:21:47.320 commercializing them in various ways, and the Internet Archive was 0:21:47.359 --> 0:21:51.119 made possible through these tools. So you could think of 0:21:51.160 --> 0:21:56.000 Alexa Internet as being the funding machine for the Internet Archive 0:21:56.119 --> 0:21:58.600 in the beginning, at least as far as the tools the 0:21:58.680 --> 0:22:02.080 Internet Archive would use in order to achieve its mission. Now, 0:22:02.119 --> 0:22:05.720 on the capturing front, Alexa Internet created a web crawler. 0:22:06.000 --> 0:22:10.760 So for applications like web search engines, primarily web search engines, 0:22:11.040 --> 0:22:14.919 web crawlers are the soldiers that they send out. 
A 0:22:14.960 --> 0:22:19.080 web crawler's job is to index content across the Internet 0:22:19.160 --> 0:22:22.119 and to capture information about what the various web pages 0:22:22.160 --> 0:22:26.199 on the Internet are actually about. It's complicated, right? You 0:22:26.240 --> 0:22:29.520 could just have a directory of web pages that's based 0:22:29.520 --> 0:22:32.119 off the title of the web pages, but title and 0:22:32.240 --> 0:22:36.280 content are not always in alignment. So web crawlers are 0:22:36.320 --> 0:22:40.399 all about following the various branching pathways across the web. 0:22:40.480 --> 0:22:43.520 They crawl through the web, in other words, indexing every 0:22:43.640 --> 0:22:47.080 page as they do so. Not everyone, however, wants their 0:22:47.080 --> 0:22:50.760 web page indexed. So you can actually include some HTML 0:22:50.880 --> 0:22:54.840 language in your web page that indicates that it's off 0:22:54.880 --> 0:22:58.760 limits for indexing, and polite web crawlers, such as the 0:22:58.760 --> 0:23:03.000 ones that Alexa Internet was using, will honor those instructions 0:23:03.040 --> 0:23:06.480 and will not index that page. But other pages 0:23:06.760 --> 0:23:11.639 that lack this specific instruction of hey, don't index this, 0:23:12.359 --> 0:23:15.920 they're fair game. I like to think of web crawlers 0:23:16.000 --> 0:23:18.440 kind of like Doctor Strange from the Marvel Universe, the 0:23:18.560 --> 0:23:21.399 Cinematic Universe in particular. He uses his 0:23:21.520 --> 0:23:25.760 time manipulation abilities to see where all the different possible 0:23:26.000 --> 0:23:29.800 pathways can lead to. The web crawlers do that across 0:23:29.880 --> 0:23:32.440 the web. They explore all the nooks and crannies. They 0:23:32.480 --> 0:23:35.560 follow each link, even the ones that no one 0:23:35.640 --> 0:23:38.520 ever clicks on. They follow those too. 
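The crawling behavior described above (follow every link, index each page, skip pages that ask not to be indexed) can be sketched in a few lines of Python. This is an illustrative toy, not Alexa Internet's crawler: it walks a made-up in-memory "web" instead of fetching real pages, and it assumes the opt-out is expressed with the standard robots meta tag (`<meta name="robots" content="noindex">`). The names `PageParser`, `crawl`, and `FAKE_WEB` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects a page's title, its outgoing links, and any noindex request."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.noindex = False
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.noindex = "noindex" in (attrs.get("content") or "").lower()
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(fetch, start):
    """Breadth-first crawl from `start`; returns {url: title} for indexed pages.

    `fetch(url)` returns the page's HTML, or None if the page does not exist.
    """
    index, seen, queue = {}, {start}, deque([start])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: nothing to index or follow
        parser = PageParser()
        parser.feed(html)
        if not parser.noindex:  # a polite crawler honors the opt-out
            index[url] = parser.title.strip()
        # In this sketch, links on a noindex page are still followed.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# A tiny made-up web, standing in for real HTTP requests.
FAKE_WEB = {
    "/home": '<title>Home</title><a href="/about">About</a>'
             '<a href="/private">Private</a>',
    "/about": '<title>About</title><a href="/home">Home</a>'
              '<a href="/missing">Gone</a>',
    "/private": '<meta name="robots" content="noindex">'
                '<title>Private</title><a href="/deep">Deep</a>',
    "/deep": "<title>Deep</title>",
}
index = crawl(FAKE_WEB.get, "/home")
print(index)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `/home` and `/about` do here.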
And you know, 0:23:38.640 --> 0:23:41.359 hats off to web crawlers for doing that to build 0:23:41.359 --> 0:23:44.240 out these indices, because without it, web search wouldn't work, 0:23:44.560 --> 0:23:49.919 and Alexa Internet wouldn't have been a thing anyway. Alexa 0:23:49.960 --> 0:23:53.520