WEBVTT - How Google Won the Search War

0:00:04.240 --> 0:00:07.240
<v Speaker 1>Welcome to TechStuff, a production of iHeartRadio's

0:00:07.320 --> 0:00:14.200
<v Speaker 1>HowStuffWorks. Hey there, and welcome to TechStuff.

0:00:14.240 --> 0:00:17.520
<v Speaker 1>I'm your host, Jonathan Strickland. I'm an executive producer with

0:00:17.560 --> 0:00:19.920
<v Speaker 1>HowStuffWorks and iHeartRadio, and I love

0:00:20.040 --> 0:00:23.360
<v Speaker 1>all things tech, and today I thought I'd talk a

0:00:23.360 --> 0:00:26.680
<v Speaker 1>bit about Internet search engines and how Google was able

0:00:26.760 --> 0:00:30.400
<v Speaker 1>to sort of take the lead amongst a pack of competitors,

0:00:30.880 --> 0:00:34.480
<v Speaker 1>most of which came out well before Google did. Now

0:00:34.479 --> 0:00:37.240
<v Speaker 1>these days, lots of people use Google as a word

0:00:37.360 --> 0:00:40.239
<v Speaker 1>for web searching in general, even though the company does

0:00:40.560 --> 0:00:43.159
<v Speaker 1>way more than web search, and there's still plenty of

0:00:43.200 --> 0:00:46.199
<v Speaker 1>competitors that are still active that are out there. I'm

0:00:46.240 --> 0:00:49.400
<v Speaker 1>sure Microsoft would rather we all talk about Binging the

0:00:49.440 --> 0:00:52.519
<v Speaker 1>heck out of things, but that doesn't happen. I think

0:00:52.520 --> 0:00:54.520
<v Speaker 1>we're now at the point where people will talk about Googling,

0:00:54.640 --> 0:00:57.040
<v Speaker 1>even if they're using a different search engine. So how

0:00:57.040 --> 0:01:00.120
<v Speaker 1>did that happen? How did we get to that point? Well,

0:01:00.120 --> 0:01:02.280
<v Speaker 1>to explain how we got there, it's a good idea

0:01:02.280 --> 0:01:04.520
<v Speaker 1>to walk down memory lane. I mean, you know, I

0:01:04.560 --> 0:01:07.000
<v Speaker 1>love to do this. Every episode begins with a history

0:01:07.080 --> 0:01:09.760
<v Speaker 1>lesson, and to really look at how the idea of

0:01:09.760 --> 0:01:12.280
<v Speaker 1>search engines developed and what things were like in the

0:01:12.319 --> 0:01:16.279
<v Speaker 1>early days of the public Internet and the Web. Now, first,

0:01:16.560 --> 0:01:20.120
<v Speaker 1>the idea of search engines predates both of those concepts

0:01:20.120 --> 0:01:22.839
<v Speaker 1>by quite some time, and it rose out of necessity.

0:01:22.880 --> 0:01:26.520
<v Speaker 1>It kind of evolved out of older methods of indexing.

0:01:26.640 --> 0:01:31.039
<v Speaker 1>So a predecessor to search engines are the various library

0:01:31.120 --> 0:01:35.760
<v Speaker 1>classification systems. Three big ones are the Dewey Decimal system,

0:01:36.200 --> 0:01:40.240
<v Speaker 1>the Library of Congress system, and the Superintendent of Documents

0:01:40.280 --> 0:01:44.120
<v Speaker 1>system. The first two of those designate books with

0:01:44.200 --> 0:01:47.760
<v Speaker 1>call numbers according to subject matter, so you divide the

0:01:47.800 --> 0:01:51.640
<v Speaker 1>books up based upon whatever subject they cover. This can

0:01:51.640 --> 0:01:56.480
<v Speaker 1>get a little complicated. It is, and no pun intended, subjective.

0:01:57.000 --> 0:01:59.840
<v Speaker 1>You have to determine where does the book best fit

0:02:00.280 --> 0:02:04.760
<v Speaker 1>in the grand taxonomy of subjects? Meanwhile, the Superintendent

0:02:04.800 --> 0:02:07.240
<v Speaker 1>of Documents system is totally different. It doesn't divide it

0:02:07.320 --> 0:02:11.079
<v Speaker 1>up by subject. It divides up books by the issuing

0:02:11.120 --> 0:02:15.399
<v Speaker 1>agency responsible for the publication of the work. So they

0:02:15.520 --> 0:02:19.160
<v Speaker 1>just divide it up by where the book came from, not

0:02:19.240 --> 0:02:22.480
<v Speaker 1>what the book covers. Whatever the system, the purpose is

0:02:22.520 --> 0:02:24.440
<v Speaker 1>the same. It's to make it possible for someone to

0:02:24.480 --> 0:02:29.160
<v Speaker 1>track down a specific work in an enormous collection of works,

0:02:29.520 --> 0:02:32.080
<v Speaker 1>or to figure out where to place a new work

0:02:32.280 --> 0:02:36.359
<v Speaker 1>within an existing collection. By classifying each work and then

0:02:36.400 --> 0:02:41.360
<v Speaker 1>designating the physical location for that piece, people can find stuff. Otherwise,

0:02:41.400 --> 0:02:43.280
<v Speaker 1>you just have an enormous pile of books with no

0:02:43.520 --> 0:02:47.119
<v Speaker 1>organizational system at all, and finding anything would take ages. Now,

0:02:47.200 --> 0:02:50.160
<v Speaker 1>someday I'll have to do an episode about these systems

0:02:50.160 --> 0:02:52.679
<v Speaker 1>in more detail, to talk about how they were developed

0:02:52.720 --> 0:02:55.560
<v Speaker 1>and how they've evolved over time, because it's actually a

0:02:55.600 --> 0:02:58.480
<v Speaker 1>pretty interesting story. But we're gonna jump forward a bit,

0:02:58.800 --> 0:03:01.799
<v Speaker 1>not quite up to the computer age, however. Rather,

0:03:01.880 --> 0:03:04.799
<v Speaker 1>we're gonna jump forward to the nineteen forties. That's when

0:03:04.800 --> 0:03:08.600
<v Speaker 1>a forward-thinking fellow named Vannevar Bush wrote an article

0:03:08.639 --> 0:03:11.600
<v Speaker 1>for The Atlantic Monthly. The piece had the title "As

0:03:11.680 --> 0:03:16.320
<v Speaker 1>We May Think," and it contains some fairly prescient ideas

0:03:16.400 --> 0:03:20.119
<v Speaker 1>in it. Bush recognized that as we increased our knowledge,

0:03:20.440 --> 0:03:24.560
<v Speaker 1>we were beginning to specialize in certain fields out of necessity.

0:03:24.560 --> 0:03:29.000
<v Speaker 1>That you couldn't just be a general knowledge master. Eventually

0:03:29.400 --> 0:03:32.480
<v Speaker 1>we were starting to develop our knowledge in different areas,

0:03:32.960 --> 0:03:36.120
<v Speaker 1>so far that you had to specialize. You couldn't

0:03:36.120 --> 0:03:38.520
<v Speaker 1>be an expert in everything. To get a really

0:03:38.520 --> 0:03:42.200
<v Speaker 1>deep understanding about a particular field, such as physics or chemistry,

0:03:42.600 --> 0:03:45.480
<v Speaker 1>we might dedicate all our resources to that pursuit as

0:03:45.480 --> 0:03:48.880
<v Speaker 1>an individual. Meanwhile, there are other people who are exploring

0:03:49.000 --> 0:03:54.280
<v Speaker 1>different subjects, like pure mathematics or cosmology or something like that. Now, this,

0:03:54.680 --> 0:03:57.560
<v Speaker 1>Bush argued, presented a new challenge. How do we create

0:03:57.640 --> 0:04:02.240
<v Speaker 1>a usable record of our discovery, one that's easily navigable

0:04:02.440 --> 0:04:06.480
<v Speaker 1>and remains relevant over time? While an older library classification

0:04:06.520 --> 0:04:11.120
<v Speaker 1>system might encompass several categories, it couldn't get as granular

0:04:11.200 --> 0:04:14.120
<v Speaker 1>as our knowledge was growing to be. For example, the

0:04:14.160 --> 0:04:18.440
<v Speaker 1>Library of Congress classification system has twenty-one categories that

0:04:18.520 --> 0:04:21.200
<v Speaker 1>you can use to group books together. But as our

0:04:21.240 --> 0:04:25.400
<v Speaker 1>research and discoveries honed in on ever more precise slices

0:04:25.480 --> 0:04:29.839
<v Speaker 1>of those categories, the system becomes less relevant because you've

0:04:29.960 --> 0:04:35.160
<v Speaker 1>got, you know, minor categories within those major categories,

0:04:35.560 --> 0:04:38.919
<v Speaker 1>so it gets harder to start classifying things. Bush said

0:04:39.120 --> 0:04:41.920
<v Speaker 1>we needed to have a record that could be continuously

0:04:42.080 --> 0:04:47.679
<v Speaker 1>extended and easy to consult. But he went even further

0:04:47.760 --> 0:04:50.640
<v Speaker 1>out than that. He said, to make it a really

0:04:50.839 --> 0:04:53.719
<v Speaker 1>useful record, we need to structure it to respond to

0:04:53.760 --> 0:04:56.200
<v Speaker 1>our queries in a way similar to how the human

0:04:56.240 --> 0:05:00.680
<v Speaker 1>mind works. Bush argued that we think through association. We

0:05:00.760 --> 0:05:05.800
<v Speaker 1>associate ideas with each other, sometimes in pretty unusual ways,

0:05:05.839 --> 0:05:09.040
<v Speaker 1>in ways that might seem intuitive to us. But on

0:05:09.080 --> 0:05:11.920
<v Speaker 1>the very surface of it, there doesn't seem to

0:05:11.920 --> 0:05:14.960
<v Speaker 1>be any relation between those ideas. And you may have

0:05:15.080 --> 0:05:17.479
<v Speaker 1>experienced this where you're thinking about one thing and you

0:05:17.560 --> 0:05:20.040
<v Speaker 1>just start to think about a different thing that doesn't

0:05:20.080 --> 0:05:22.520
<v Speaker 1>seem to be related, and then you're able to relate

0:05:22.520 --> 0:05:26.560
<v Speaker 1>the two. This is really human ingenuity. It's where innovation

0:05:26.760 --> 0:05:30.520
<v Speaker 1>really takes off. Well, Bush thought it would probably be impossible

0:05:30.720 --> 0:05:33.159
<v Speaker 1>for us to create an artificial system that could replicate

0:05:33.200 --> 0:05:35.800
<v Speaker 1>that tendency, but we could at the very least design

0:05:35.920 --> 0:05:39.520
<v Speaker 1>something that acknowledges that human trait so it works better

0:05:39.640 --> 0:05:42.400
<v Speaker 1>for us. So if we did that, if we decided

0:05:42.440 --> 0:05:45.240
<v Speaker 1>to search for a record for a particular type of information,

0:05:45.760 --> 0:05:48.760
<v Speaker 1>we might also see the opportunity to search for tangential

0:05:48.839 --> 0:05:52.320
<v Speaker 1>data that is relevant to our needs. A good system

0:05:52.360 --> 0:05:54.640
<v Speaker 1>would be able to anticipate that and serve up the

0:05:54.680 --> 0:05:58.200
<v Speaker 1>information for us. So Bush proposed a hypothetical system called

0:05:58.360 --> 0:06:02.240
<v Speaker 1>memex, M-E-M-E-X, and that would use

0:06:02.279 --> 0:06:08.000
<v Speaker 1>associative factors to organize information in a virtually limitless storage space. Again,

0:06:08.000 --> 0:06:11.039
<v Speaker 1>this is hypothetical. It would be a system that one

0:06:11.080 --> 0:06:13.680
<v Speaker 1>could reference and send a retrieval command to get the

0:06:13.720 --> 0:06:16.560
<v Speaker 1>most relevant information related to whatever it was you were

0:06:16.560 --> 0:06:20.200
<v Speaker 1>asking for in your query. Essentially, he was talking about a

0:06:20.200 --> 0:06:24.560
<v Speaker 1>conceptual model that the Internet attempts to realize. Now skip

0:06:24.560 --> 0:06:27.280
<v Speaker 1>ahead to the nineteen sixties. Then you've got a computer

0:06:27.320 --> 0:06:32.240
<v Speaker 1>scientist named Gerry Salton. Gerry Salton taught at Cornell University,

0:06:32.279 --> 0:06:36.480
<v Speaker 1>and he developed an indexing strategy using a vector space model.

0:06:37.040 --> 0:06:39.680
<v Speaker 1>Now this gets a bit mind bendy for people who

0:06:39.720 --> 0:06:43.080
<v Speaker 1>haven't worked with vector space models, but follow me here. Now,

0:06:43.120 --> 0:06:47.400
<v Speaker 1>start with an imaginary virtual space kind of analogous to

0:06:47.520 --> 0:06:50.680
<v Speaker 1>the physical space we live in in our day to

0:06:50.760 --> 0:06:54.839
<v Speaker 1>day lives. Now, in our reality, we can perceive three dimensions,

0:06:54.960 --> 0:06:57.440
<v Speaker 1>and we experience a fourth one, that of time, but

0:06:57.520 --> 0:07:01.400
<v Speaker 1>we cannot directly perceive any more than that ourselves, So

0:07:01.440 --> 0:07:03.680
<v Speaker 1>most of the time we associate the physical world with

0:07:03.800 --> 0:07:08.000
<v Speaker 1>three physical dimensions. Now, in the information retrieval method that Salton

0:07:08.080 --> 0:07:11.280
<v Speaker 1>set up, he defined the number of dimensions within his

0:07:11.520 --> 0:07:15.960
<v Speaker 1>virtual space by the number of terms in a retrieval request.

0:07:16.240 --> 0:07:20.800
<v Speaker 1>So if your request included five terms, the vector space

0:07:20.800 --> 0:07:25.119
<v Speaker 1>model would have five dimensions. Documents within the model would

0:07:25.160 --> 0:07:29.920
<v Speaker 1>virtually appear as vectors within the space according to which

0:07:29.920 --> 0:07:33.200
<v Speaker 1>of the search terms were present within those documents and

0:07:33.240 --> 0:07:36.520
<v Speaker 1>how frequently they were present within the documents. The

0:07:36.640 --> 0:07:40.240
<v Speaker 1>queries and the documents are both vectors of the term counts.

0:07:40.240 --> 0:07:42.400
<v Speaker 1>And just in case you're as rusty on your physics

0:07:42.480 --> 0:07:45.000
<v Speaker 1>terms as I am, a vector is a quantity that

0:07:45.040 --> 0:07:50.120
<v Speaker 1>has a magnitude and a direction. So your terms have vectors,

0:07:50.160 --> 0:07:52.720
<v Speaker 1>your documents have vectors, and the goal is to identify

0:07:52.760 --> 0:07:55.640
<v Speaker 1>the documents that are most similar to the initial query

0:07:55.720 --> 0:07:58.560
<v Speaker 1>in an effort to retrieve the most relevant results, while

0:07:58.640 --> 0:08:01.240
<v Speaker 1>leaving out anything that doesn't meet the criterion or doesn't

0:08:01.240 --> 0:08:04.880
<v Speaker 1>meet a predetermined threshold of relevance. So you might say,

0:08:05.160 --> 0:08:09.640
<v Speaker 1>I need to have x percentage match for the retrieval

0:08:09.800 --> 0:08:12.120
<v Speaker 1>to actually come through, and anything that doesn't meet that

0:08:12.200 --> 0:08:15.520
<v Speaker 1>threshold gets discarded. It's not served to me,

0:08:16.040 --> 0:08:18.400
<v Speaker 1>and that saves you time when you start sorting through

0:08:18.480 --> 0:08:21.920
<v Speaker 1>the results to see if any of those actually represent

0:08:21.960 --> 0:08:24.960
<v Speaker 1>the information you were actually looking for. Now, suffice it

0:08:25.000 --> 0:08:27.760
<v Speaker 1>to say, this model really looks for the presence of

0:08:27.800 --> 0:08:32.080
<v Speaker 1>specific terms, but not necessarily their use within the document,

0:08:32.120 --> 0:08:35.280
<v Speaker 1>their context. So you could end up retrieving a document

0:08:35.320 --> 0:08:38.400
<v Speaker 1>that technically contains all the terms you used in the search,

0:08:38.880 --> 0:08:42.600
<v Speaker 1>but it has no real relevance to your actual needs.

0:08:42.640 --> 0:08:47.280
<v Speaker 1>So that is a limitation of this model, but still

0:08:47.360 --> 0:08:50.000
<v Speaker 1>it was a pretty good starting point, so Salton's work

0:08:50.040 --> 0:08:53.840
<v Speaker 1>was incredibly important. Another big thinker who helped shape the

0:08:53.880 --> 0:08:57.040
<v Speaker 1>course of what would become the Internet and the Web

0:08:57.400 --> 0:08:59.959
<v Speaker 1>is a guy named Ted Nelson, who in the nineteen

0:09:00.000 --> 0:09:03.160
<v Speaker 1>sixties proposed an idea he called Xanadu. And I'm

0:09:03.200 --> 0:09:05.920
<v Speaker 1>not talking about the cheesy movie starring Olivia Newton-John

0:09:06.000 --> 0:09:09.360
<v Speaker 1>about roller skating Greek muses, but as a side note,

0:09:09.360 --> 0:09:12.880
<v Speaker 1>I really love that movie. Now, Nelson's Xanadu was

0:09:12.920 --> 0:09:16.280
<v Speaker 1>a hypothetical computer based writing system that would have a

0:09:16.360 --> 0:09:20.640
<v Speaker 1>means to link different documents within a global depository. So

0:09:20.760 --> 0:09:23.720
<v Speaker 1>essentially he was talking about hypertext links, which would allow

0:09:23.800 --> 0:09:27.480
<v Speaker 1>users to navigate from document to document to relate documents together,

0:09:28.160 --> 0:09:32.800
<v Speaker 1>so that you could have a collection of documents about

0:09:32.840 --> 0:09:35.720
<v Speaker 1>the same sort of subject matter and make it

0:09:35.800 --> 0:09:39.040
<v Speaker 1>very easy to reference different research. It would also allow

0:09:39.160 --> 0:09:41.679
<v Speaker 1>document creators to add their work to a growing collection

0:09:41.720 --> 0:09:44.800
<v Speaker 1>of documents about similar subjects. Now, while the Web would

0:09:44.840 --> 0:09:48.960
<v Speaker 1>incorporate many of Nelson's ideas, he has stated that the

0:09:49.040 --> 0:09:51.880
<v Speaker 1>Web falls far short of what Xanadu was meant

0:09:52.000 --> 0:09:54.840
<v Speaker 1>to do. Still, those links would become very important for

0:09:54.880 --> 0:09:56.840
<v Speaker 1>the web. Heck, I mean you could argue the links

0:09:56.880 --> 0:09:58.880
<v Speaker 1>are what make it a web in the first place.

0:09:59.240 --> 0:10:03.600
<v Speaker 1>The World Wide Web is a series of documents published

0:10:03.600 --> 0:10:08.280
<v Speaker 1>on servers that have connective tissue between them. That's the

0:10:08.320 --> 0:10:11.600
<v Speaker 1>web that you navigate. So it would be crucial in

0:10:11.640 --> 0:10:15.200
<v Speaker 1>Google's eventual success, as we'll see. Now, in the nineteen seventies,

0:10:15.559 --> 0:10:18.080
<v Speaker 1>the agency that would become DARPA, which at the time

0:10:18.120 --> 0:10:22.040
<v Speaker 1>was just ARPA, funded the development of the ARPANET,

0:10:22.360 --> 0:10:26.400
<v Speaker 1>which would be the predecessor to the Internet. Computer scientists

0:10:26.480 --> 0:10:28.840
<v Speaker 1>worked on the rules that machines would have to follow

0:10:28.920 --> 0:10:31.319
<v Speaker 1>in order to communicate with one another over a network.

0:10:31.800 --> 0:10:34.080
<v Speaker 1>This was a nontrivial problem at the time because

0:10:34.679 --> 0:10:38.360
<v Speaker 1>computers were dependent upon proprietary systems that were not compatible

0:10:38.440 --> 0:10:42.520
<v Speaker 1>with computers from other manufacturers. So, in other words, they

0:10:42.559 --> 0:10:44.680
<v Speaker 1>were talking in different languages. So you have to find

0:10:44.720 --> 0:10:48.600
<v Speaker 1>a common means of communication between these different machines. Solving

0:10:48.600 --> 0:10:51.080
<v Speaker 1>those problems laid the foundation for the Internet that was

0:10:51.120 --> 0:10:54.600
<v Speaker 1>to follow. Now skipping ahead to the late nineteen eighties,

0:10:54.920 --> 0:10:57.640
<v Speaker 1>this is still before the Web was a thing, but

0:10:58.040 --> 0:11:03.360
<v Speaker 1>college students Alan Emtage and Bill Heelan recognized the need

0:11:03.559 --> 0:11:06.520
<v Speaker 1>for a tool to search file databases effectively. They were

0:11:06.520 --> 0:11:09.880
<v Speaker 1>part of a project at the McGill University School of

0:11:09.880 --> 0:11:13.000
<v Speaker 1>Computer Science to develop that kind of a tool. It

0:11:13.040 --> 0:11:15.840
<v Speaker 1>would become known as Archie, and it was meant to

0:11:15.880 --> 0:11:20.280
<v Speaker 1>search archives of files. The original version was pretty primitive.

0:11:20.679 --> 0:11:24.200
<v Speaker 1>It would essentially just send an automated request to a

0:11:24.240 --> 0:11:27.640
<v Speaker 1>File Transfer Protocol server and it would just say, hey,

0:11:27.679 --> 0:11:30.280
<v Speaker 1>give me a list of all the files that are

0:11:30.320 --> 0:11:33.839
<v Speaker 1>stored on your server. That's it, just give me a

0:11:33.920 --> 0:11:35.720
<v Speaker 1>laundry list of all the files that are on there.

0:11:36.160 --> 0:11:39.240
<v Speaker 1>And it was once a month it would send this request,

0:11:39.760 --> 0:11:42.080
<v Speaker 1>and so really it was just a list of the

0:11:42.160 --> 0:11:46.160
<v Speaker 1>documents that were available on that FTP server, not anything more,

0:11:46.559 --> 0:11:49.320
<v Speaker 1>you know, sophisticated than that. But it would grow to

0:11:49.320 --> 0:11:52.120
<v Speaker 1>become a query search tool, allowing users to look for

0:11:52.160 --> 0:11:55.600
<v Speaker 1>files containing specific terms in them or with specific titles.

0:11:56.240 --> 0:11:59.839
<v Speaker 1>Other schools would develop similar search tools in the following years,

0:12:00.200 --> 0:12:04.360
<v Speaker 1>naming them after characters from Archie comics like Veronica and

0:12:04.480 --> 0:12:07.280
<v Speaker 1>Jughead. Now, this is despite the fact that Emtage

0:12:07.360 --> 0:12:10.959
<v Speaker 1>said he intended no association with Archie comics at all.

0:12:11.000 --> 0:12:14.680
<v Speaker 1>He chose the name Archie because it's archive but without

0:12:14.679 --> 0:12:17.520
<v Speaker 1>the V. But sometimes memes just take hold, even if

0:12:17.520 --> 0:12:21.760
<v Speaker 1>they're based off a misunderstanding. Also, both Veronica and Jughead

0:12:22.000 --> 0:12:26.360
<v Speaker 1>search for files in the Gopher index system, a predecessor

0:12:26.440 --> 0:12:29.080
<v Speaker 1>and alternative to the World Wide Web. I did an episode

0:12:29.080 --> 0:12:32.040
<v Speaker 1>about Gopher a couple of years ago. I think, so

0:12:32.080 --> 0:12:33.640
<v Speaker 1>you can search the archives if you want to hear

0:12:33.679 --> 0:12:38.240
<v Speaker 1>about that. Now, this leads us to when Tim Berners-Lee

0:12:38.320 --> 0:12:41.920
<v Speaker 1>built and published the world's first web page. Berners-Lee

0:12:41.960 --> 0:12:45.000
<v Speaker 1>had done some work with hypertext documents at CERN

0:12:45.160 --> 0:12:48.400
<v Speaker 1>as a contractor in the early eighties. The goal then

0:12:48.600 --> 0:12:51.559
<v Speaker 1>was to help researchers share information between each other as

0:12:51.559 --> 0:12:54.480
<v Speaker 1>they were smashing particles against each other really really hard.

0:12:54.920 --> 0:12:58.840
<v Speaker 1>By nineteen eighty-nine, Berners-Lee was thinking about pairing the hypertext capabilities

0:12:58.840 --> 0:13:01.800
<v Speaker 1>with the Internet to allow for an interconnected series of

0:13:01.840 --> 0:13:06.000
<v Speaker 1>documents hosted on networked Internet servers, and thus the World

0:13:06.040 --> 0:13:08.760
<v Speaker 1>Wide Web was born. It wouldn't take long for others

0:13:08.800 --> 0:13:11.320
<v Speaker 1>to jump on the idea, and that meant it wouldn't

0:13:11.360 --> 0:13:13.840
<v Speaker 1>be long before people needed a tool to search the

0:13:13.880 --> 0:13:17.520
<v Speaker 1>growing collection of documents on the Internet. And that kind

0:13:17.520 --> 0:13:19.880
<v Speaker 1>of sets me up for the next section, which I

0:13:19.920 --> 0:13:22.360
<v Speaker 1>will tackle in just a moment after we take this

0:13:22.440 --> 0:13:32.880
<v Speaker 1>quick break. So in the earliest days of the web,

0:13:32.960 --> 0:13:36.120
<v Speaker 1>when calling it a web might have been a little generous,

0:13:36.440 --> 0:13:40.120
<v Speaker 1>CERN maintained a list of web servers that hosted web pages.

0:13:40.480 --> 0:13:44.280
<v Speaker 1>This was all part of the World Wide Web Virtual Library,

0:13:44.480 --> 0:13:49.240
<v Speaker 1>or VLib, or sometimes WWW VLib. This was the

0:13:49.280 --> 0:13:52.560
<v Speaker 1>first index of web content and it relied upon real

0:13:52.600 --> 0:13:54.959
<v Speaker 1>life human beings to build out the index. As more

0:13:54.960 --> 0:13:58.240
<v Speaker 1>web pages were published, they volunteered their time to build

0:13:58.240 --> 0:14:01.760
<v Speaker 1>out the index. So this isn't automated. People were actually

0:14:02.240 --> 0:14:06.199
<v Speaker 1>doing this by hand, adding the names and the

0:14:06.280 --> 0:14:10.680
<v Speaker 1>addresses to these different sites on this index. Next we

0:14:10.720 --> 0:14:15.120
<v Speaker 1>have Matthew Gray's World Wide Web Wanderer. Now, this was a

0:14:15.200 --> 0:14:18.000
<v Speaker 1>bot or an autonomous program on a network that can

0:14:18.080 --> 0:14:21.800
<v Speaker 1>interact in some significant way with the information on the network.

0:14:22.160 --> 0:14:24.520
<v Speaker 1>And we deal with bots all the time. Sometimes it's

0:14:24.520 --> 0:14:27.680
<v Speaker 1>in the background and humans don't really notice, and sometimes

0:14:28.080 --> 0:14:30.720
<v Speaker 1>like chatbots, it's very much in front of us.

0:14:31.200 --> 0:14:34.000
<v Speaker 1>The bot that Matthew Gray created would navigate the World

0:14:34.080 --> 0:14:37.040
<v Speaker 1>Wide Web to keep count of how many active servers

0:14:37.080 --> 0:14:39.960
<v Speaker 1>there were in any network. It was essentially measuring the

0:14:40.000 --> 0:14:43.600
<v Speaker 1>growth of the web over time by counting up these servers.

0:14:43.920 --> 0:14:46.560
<v Speaker 1>As more servers came online, we learned that the World

0:14:46.560 --> 0:14:49.920
<v Speaker 1>Wide Web was growing. Gray upgraded the bot to actually

0:14:49.920 --> 0:14:52.360
<v Speaker 1>capture the URLs of web pages, because earlier

0:14:52.360 --> 0:14:55.720
<v Speaker 1>it was just counting stuff. It wasn't actually making note

0:14:55.760 --> 0:14:58.520
<v Speaker 1>of anything in particular, and so it got a little

0:14:58.560 --> 0:15:02.800
<v Speaker 1>more sophisticated. Gray built out a database of these captured

0:15:02.880 --> 0:15:05.680
<v Speaker 1>URLs, called Wandex. The bot would

0:15:05.720 --> 0:15:09.440
<v Speaker 1>ping servers multiple times each day, and it actually became

0:15:09.440 --> 0:15:12.240
<v Speaker 1>a problem that it was pinging so frequently. And a ping

0:15:12.320 --> 0:15:14.640
<v Speaker 1>is just a quick message that essentially says, hey, are

0:15:14.680 --> 0:15:17.240
<v Speaker 1>you there, and then it's waiting for a response of yeah,

0:15:17.360 --> 0:15:19.880
<v Speaker 1>I'm here. It's all good. But it was doing this

0:15:19.960 --> 0:15:23.480
<v Speaker 1>so many times each day that it was actually starting

0:15:23.520 --> 0:15:26.040
<v Speaker 1>to create lag on the Internet. Of course, this is

0:15:26.040 --> 0:15:29.240
<v Speaker 1>in the very very early days, so whoopsie daisy there.

0:15:29.680 --> 0:15:33.760
<v Speaker 1>Now, toward the end of nineteen ninety-three, some early web search

0:15:33.840 --> 0:15:36.520
<v Speaker 1>tools were starting to make their way to the general public. Though,

0:15:36.840 --> 0:15:39.320
<v Speaker 1>keep in mind that in the very early days of

0:15:39.360 --> 0:15:43.040
<v Speaker 1>the World Wide Web, the general public accessing web pages was

0:15:43.080 --> 0:15:46.280
<v Speaker 1>really just a tiny fraction of the overall population. It's

0:15:46.320 --> 0:15:50.720
<v Speaker 1>like college students, some early adopters, some folks with various

0:15:50.800 --> 0:15:55.520
<v Speaker 1>government agencies, and a few companies, but not a whole lot.

0:15:55.640 --> 0:15:58.120
<v Speaker 1>It was largely a mysterious thing. You know, this is

0:15:58.160 --> 0:16:01.120
<v Speaker 1>when people were just starting to hear the terms

0:16:01.360 --> 0:16:05.280
<v Speaker 1>World Wide Web and information superhighway, because the Internet had

0:16:05.320 --> 0:16:07.400
<v Speaker 1>been around for a while, but most people didn't have

0:16:07.400 --> 0:16:11.320
<v Speaker 1>any regular way to access it. So these tools could

0:16:11.360 --> 0:16:15.880
<v Speaker 1>help you find stuff, but they weren't super sophisticated. There

0:16:15.960 --> 0:16:19.960
<v Speaker 1>was the World Wide Web Worm, which would pull together lists

0:16:20.040 --> 0:16:22.200
<v Speaker 1>of titles and URLs for web pages.

0:16:23.040 --> 0:16:26.080
<v Speaker 1>There was JumpStation, which would pull down information about

0:16:26.080 --> 0:16:29.800
<v Speaker 1>web pages' titles and header sections, so sort of like

0:16:30.000 --> 0:16:32.160
<v Speaker 1>the title of the web page and a brief description

0:16:32.200 --> 0:16:34.400
<v Speaker 1>of what the web page was supposed to be. But

0:16:34.520 --> 0:16:37.000
<v Speaker 1>both of those tools were very simple, and they would

0:16:37.040 --> 0:16:40.160
<v Speaker 1>present results in the order that they were found by

0:16:40.280 --> 0:16:44.920
<v Speaker 1>the tools, so there was no ranking of the search results.

0:16:45.160 --> 0:16:48.320
<v Speaker 1>It was all by a first come, first served

0:16:48.400 --> 0:16:52.960
<v Speaker 1>kind of approach. So it might be that your results

0:16:53.000 --> 0:16:56.160
<v Speaker 1>all had whatever your query was in it, but the

0:16:56.160 --> 0:16:58.760
<v Speaker 1>most relevant ones could be buried much further down the

0:16:58.840 --> 0:17:02.080
<v Speaker 1>list because they didn't rank in any way. Then there

0:17:02.160 --> 0:17:05.520
<v Speaker 1>was the RBSE spider, which actually attempted to rank

0:17:05.600 --> 0:17:08.760
<v Speaker 1>results by relevance. But all three of these were limited

0:17:08.760 --> 0:17:10.560
<v Speaker 1>in what they could do, and often you needed to

0:17:10.600 --> 0:17:13.800
<v Speaker 1>know what you were looking for exactly in order to

0:17:13.840 --> 0:17:16.560
<v Speaker 1>get a hit. In other words, you couldn't just do

0:17:16.760 --> 0:17:21.800
<v Speaker 1>a string of words. You certainly couldn't write in natural

0:17:21.920 --> 0:17:25.080
<v Speaker 1>language what your query was, so you might have to

0:17:25.080 --> 0:17:27.439
<v Speaker 1>put in the actual title of a page in order

0:17:27.520 --> 0:17:31.159
<v Speaker 1>to get the response back. So you would have to

0:17:31.200 --> 0:17:33.240
<v Speaker 1>know what the page's title is, but you

0:17:33.320 --> 0:17:35.200
<v Speaker 1>obviously don't know what the URL is, or

0:17:35.200 --> 0:17:38.439
<v Speaker 1>else you would just navigate to the page directly. You

0:17:38.520 --> 0:17:41.640
<v Speaker 1>just type the address in your browser's address bar

0:17:41.680 --> 0:17:45.600
<v Speaker 1>and go there. So it was kind of limited in

0:17:45.680 --> 0:17:49.080
<v Speaker 1>its utility. If you were to do anything outside of

0:17:49.160 --> 0:17:51.280
<v Speaker 1>the actual title of a page, you might not find

0:17:51.280 --> 0:17:55.680
<v Speaker 1>any hits, even if such pages actually existed out there. Also,

0:17:55.720 --> 0:17:59.600
<v Speaker 1>in nineteen ninety-three, some Stanford undergraduates decided to take the work they

0:17:59.640 --> 0:18:02.800
<v Speaker 1>had been doing on a project called Architext and develop

0:18:02.840 --> 0:18:05.720
<v Speaker 1>a web crawling search tool based off of that work.

0:18:06.320 --> 0:18:10.880
<v Speaker 1>Architext was all about using statistical analysis of word relationships

0:18:10.920 --> 0:18:14.000
<v Speaker 1>in an effort to kind of build a basic understanding

0:18:14.040 --> 0:18:18.040
<v Speaker 1>of what the subject matter was and that would then

0:18:18.160 --> 0:18:21.760
<v Speaker 1>be able to help you create more relevant search results

0:18:21.760 --> 0:18:26.480
<v Speaker 1>on queries. So you run a search request and this

0:18:26.560 --> 0:18:32.720
<v Speaker 1>tool would statistically analyze various indexed pages in its database

0:18:33.320 --> 0:18:37.760
<v Speaker 1>and return the results that appeared to be the most relevant. Um.

0:18:37.800 --> 0:18:40.760
<v Speaker 1>It was an interesting approach. It was definitely one that

0:18:40.840 --> 0:18:44.879
<v Speaker 1>was needed because it wasn't just listing the sites

0:18:44.920 --> 0:18:49.000
<v Speaker 1>chronologically based on how they were attained. But it would

0:18:49.000 --> 0:18:53.880
<v Speaker 1>take about two years for this project to actually turn

0:18:53.920 --> 0:18:58.119
<v Speaker 1>into something that the group could unveil uh and when

0:18:58.160 --> 0:19:02.880
<v Speaker 1>they did, they called the tool Excite and they held

0:19:02.920 --> 0:19:07.159
<v Speaker 1>a commercial release for the product in nineteen ninety-five. But in

0:19:07.200 --> 0:19:11.640
<v Speaker 1>between the founding and the release of Excite, we hit

0:19:11.760 --> 0:19:18.159
<v Speaker 1>a banner year for early search engines. Nineteen ninety-four was

0:19:18.200 --> 0:19:22.960
<v Speaker 1>the year that WebCrawler, Lycos, Infoseek, and Yahoo

0:19:23.160 --> 0:19:26.320
<v Speaker 1>all got their start. Now, with the case of Yahoo,

0:19:26.480 --> 0:19:29.320
<v Speaker 1>the company was not relying on bots to crawl through

0:19:29.359 --> 0:19:33.040
<v Speaker 1>web servers to index all the pages that the bots

0:19:33.119 --> 0:19:38.000
<v Speaker 1>came across. Instead, Yahoo initially was relying on actual human

0:19:38.040 --> 0:19:41.680
<v Speaker 1>beings to curate an index, so they were actually going

0:19:41.720 --> 0:19:44.600
<v Speaker 1>to web pages deciding whether or not those web pages

0:19:44.680 --> 0:19:48.280
<v Speaker 1>were good enough to be listed on Yahoo on the

0:19:48.359 --> 0:19:51.359
<v Speaker 1>various subjects that Yahoo was covering, and then they would

0:19:51.400 --> 0:19:54.600
<v Speaker 1>be grouped together if they passed muster. Now, there are

0:19:54.680 --> 0:19:57.479
<v Speaker 1>pros and cons to that approach. One of the pros

0:19:57.560 --> 0:20:00.320
<v Speaker 1>is that because it is human curated, there's a much

0:20:00.359 --> 0:20:03.800
<v Speaker 1>better possibility that the web pages on Yahoo's lists

0:20:03.840 --> 0:20:07.119
<v Speaker 1>were good ones with good content. But the con side was

0:20:07.160 --> 0:20:09.439
<v Speaker 1>that as the web grew and began growing at an

0:20:09.480 --> 0:20:13.320
<v Speaker 1>even faster rate, it really limited Yahoo's usefulness. It would

0:20:13.359 --> 0:20:16.160
<v Speaker 1>only be later that Yahoo would branch out into

0:20:16.200 --> 0:20:19.160
<v Speaker 1>web search in general, and even then it relied very

0:20:19.160 --> 0:20:21.959
<v Speaker 1>heavily on third parties for the actual search tools. They

0:20:21.960 --> 0:20:26.400
<v Speaker 1>didn't really dive into developing their own. They were more

0:20:26.440 --> 0:20:31.560
<v Speaker 1>about making deals with other search engines to power their search.

0:20:31.560 --> 0:20:34.680
<v Speaker 1>In fact, that happened on and off throughout Yahoo's entire existence.

0:20:35.359 --> 0:20:38.960
<v Speaker 1>But let's get back to WebCrawler, Lycos and Infoseek. Now,

0:20:39.000 --> 0:20:41.960
<v Speaker 1>of those three, WebCrawler was the first to provide full

0:20:42.080 --> 0:20:45.600
<v Speaker 1>text search of web pages, so not just headers and titles.

0:20:45.960 --> 0:20:49.359
<v Speaker 1>You could search terms, and if they appeared in the

0:20:49.440 --> 0:20:52.840
<v Speaker 1>web page at all, then, in theory, WebCrawler would be

0:20:52.880 --> 0:20:55.080
<v Speaker 1>able to bring that back as long as it was

0:20:55.160 --> 0:20:59.080
<v Speaker 1>indexed in WebCrawler's index. Um, it was the work of

0:20:59.600 --> 0:21:03.600
<v Speaker 1>University of Washington student Brian Pinkerton, and Pinkerton's

0:21:03.600 --> 0:21:07.679
<v Speaker 1>WebCrawler built out this big index of pages, and

0:21:07.720 --> 0:21:11.879
<v Speaker 1>Pinkerton started rather modestly. He first released a list of

0:21:11.920 --> 0:21:15.560
<v Speaker 1>the top twenty-five websites on the web on March fifteenth, nineteen ninety-four,

0:21:16.960 --> 0:21:19.040
<v Speaker 1>and the following month he announced that WebCrawler's

0:21:19.080 --> 0:21:24.320
<v Speaker 1>index included four thousand websites, and by June of ninety four,

0:21:24.720 --> 0:21:28.199
<v Speaker 1>he made the index searchable for everyone. So again, this

0:21:28.280 --> 0:21:31.119
<v Speaker 1>is just a slice of all the websites that were

0:21:31.119 --> 0:21:34.560
<v Speaker 1>out there, but it was a decent enough slice to

0:21:34.640 --> 0:21:37.679
<v Speaker 1>start off with, and the endeavor proved to be successful.

0:21:37.720 --> 0:21:40.879
<v Speaker 1>Pinkerton received financial investments from a couple of big companies,

0:21:41.119 --> 0:21:43.240
<v Speaker 1>and within a year he had managed to support the

0:21:43.280 --> 0:21:47.280
<v Speaker 1>service through advertising revenue, a model that other search engines

0:21:47.320 --> 0:21:49.280
<v Speaker 1>would follow, so he was able to actually make money

0:21:49.280 --> 0:21:54.480
<v Speaker 1>by serving up advertising on his search engine pages. By June of ninety-five,

0:21:54.760 --> 0:21:57.480
<v Speaker 1>AOL had become interested in WebCrawler and would

0:21:57.520 --> 0:22:00.440
<v Speaker 1>purchase the company. AOL would later sell

0:22:00.480 --> 0:22:02.840
<v Speaker 1>the company a little less than two years later to

0:22:02.920 --> 0:22:05.760
<v Speaker 1>Excite, that company that I had mentioned earlier in this episode.

0:22:05.800 --> 0:22:07.439
<v Speaker 1>I'll get back to them and to WebCrawler a

0:22:07.440 --> 0:22:09.760
<v Speaker 1>bit later, but I will say that WebCrawler was

0:22:09.840 --> 0:22:12.360
<v Speaker 1>my search engine of choice when I first started using

0:22:12.400 --> 0:22:15.399
<v Speaker 1>the web in the mid nineteen nineties. I was actually

0:22:15.440 --> 0:22:18.120
<v Speaker 1>pretty slow to move over to that crazy Google thing

0:22:18.160 --> 0:22:21.280
<v Speaker 1>that we're gonna get to later in this episode. Lycos meanwhile,

0:22:21.640 --> 0:22:25.800
<v Speaker 1>started off as a project at Carnegie Mellon University. Michael

0:22:25.880 --> 0:22:28.880
<v Speaker 1>Mauldin headed up the project and the name came from

0:22:29.000 --> 0:22:34.520
<v Speaker 1>wolf spiders that have the scientific name Lycosidae. When

0:22:34.640 --> 0:22:37.560
<v Speaker 1>Lycos became a company, Bob Davis took the helm to

0:22:37.680 --> 0:22:40.920
<v Speaker 1>turn it into a revenue-generating business that gets

0:22:40.920 --> 0:22:43.679
<v Speaker 1>cash from advertising like WebCrawler, and it also was

0:22:43.760 --> 0:22:46.920
<v Speaker 1>a success, and by the end of nineteen ninety-six the Lycos

0:22:47.040 --> 0:22:50.720
<v Speaker 1>index was the largest web search index available on the web.

0:22:51.040 --> 0:22:55.679
<v Speaker 1>It held more than sixty million documents in it. The

0:22:55.720 --> 0:22:59.280
<v Speaker 1>service grew tremendously, as did the company, and the full

0:22:59.320 --> 0:23:01.400
<v Speaker 1>story of Lycos is one I'll have to cover

0:23:01.440 --> 0:23:04.480
<v Speaker 1>in another episode because it gets pretty bonkers. But for

0:23:04.520 --> 0:23:06.320
<v Speaker 1>this episode, it's just important to note that it was

0:23:06.359 --> 0:23:11.119
<v Speaker 1>another early search service that grew and became diversified and

0:23:11.200 --> 0:23:15.479
<v Speaker 1>tried to do lots of other stuff. Um Steve Kirsch

0:23:15.880 --> 0:23:18.880
<v Speaker 1>would be the guy behind Infoseek. That one originally

0:23:18.960 --> 0:23:22.520
<v Speaker 1>launched as a pay-for-use service, so its

0:23:22.560 --> 0:23:27.160
<v Speaker 1>original revenue model wasn't advertising; it was you would pay

0:23:27.240 --> 0:23:30.960
<v Speaker 1>to use it. Now that only lasted about half a year,

0:23:30.960 --> 0:23:32.640
<v Speaker 1>a little more than half a year, before Kirsch

0:23:32.760 --> 0:23:36.240
<v Speaker 1>dropped the fee and it became free to use, and

0:23:36.240 --> 0:23:40.200
<v Speaker 1>by February nineteen ninety-five the service became known as Infoseek Search

0:23:41.119 --> 0:23:46.000
<v Speaker 1>and also Netscape and Infoseek negotiated a deal in which

0:23:46.040 --> 0:23:49.200
<v Speaker 1>Infoseek would become the default search engine in Netscape's

0:23:49.400 --> 0:23:53.960
<v Speaker 1>web browser, so that really helped Infoseek's penetration quite

0:23:53.960 --> 0:23:57.360
<v Speaker 1>a bit in those days. Now, one thing Infoseek incorporated

0:23:57.440 --> 0:24:00.159
<v Speaker 1>in its service after a couple of years is the

0:24:00.200 --> 0:24:03.440
<v Speaker 1>option to use Boolean operators. Now, these are a collection

0:24:03.480 --> 0:24:06.440
<v Speaker 1>of simple words that can help you narrow down searches.

0:24:07.040 --> 0:24:11.800
<v Speaker 1>The words include AND, OR, and NOT, so with an

0:24:11.840 --> 0:24:16.560
<v Speaker 1>AND operator you are narrowing your focus. So if you

0:24:16.600 --> 0:24:22.320
<v Speaker 1>search for the terms Superman AND movies, the results you

0:24:22.359 --> 0:24:25.040
<v Speaker 1>get should be relevant to both of those terms. You

0:24:25.080 --> 0:24:29.880
<v Speaker 1>should only get results that include information about Superman and movies.

0:24:30.760 --> 0:24:33.960
<v Speaker 1>If you're looking for a specific Superman movie, hopefully those would

0:24:34.040 --> 0:24:37.480
<v Speaker 1>be right in that list. Some of them should have

0:24:37.560 --> 0:24:39.800
<v Speaker 1>the information you're looking for, and it may be that you still

0:24:39.800 --> 0:24:41.760
<v Speaker 1>have to do some digging to find them, because you're

0:24:41.760 --> 0:24:44.600
<v Speaker 1>going to get all the web pages that have both

0:24:44.800 --> 0:24:48.639
<v Speaker 1>Superman and movies inside of them. Now you could make

0:24:48.640 --> 0:24:53.200
<v Speaker 1>it more specific. You could say Superman AND movies AND

0:24:53.520 --> 0:24:57.400
<v Speaker 1>Christopher Reeve. That would end up narrowing the results

0:24:57.800 --> 0:25:00.960
<v Speaker 1>to only those pages that have all three of

0:25:01.000 --> 0:25:05.040
<v Speaker 1>those terms inside of them. The Boolean operator OR does

0:25:05.080 --> 0:25:07.919
<v Speaker 1>the opposite. It broadens your search. Maybe you want to

0:25:07.920 --> 0:25:12.600
<v Speaker 1>search for Batman OR Superman, then you should get results

0:25:12.640 --> 0:25:17.280
<v Speaker 1>that have either or both Superman or Batman in them. Um,

0:25:17.320 --> 0:25:19.639
<v Speaker 1>so you would get all the Superman results, all the

0:25:19.680 --> 0:25:22.359
<v Speaker 1>Batman results. You probably also get all the Superman and

0:25:22.400 --> 0:25:27.240
<v Speaker 1>Batman results, so you're increasing the number that you receive.

0:25:27.800 --> 0:25:31.679
<v Speaker 1>The NOT Boolean operator helps you eliminate options from search.
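
NOTE
Editorial sketch, not part of the spoken episode. The three Boolean operators described in this stretch of the episode behave like set operations over an index that maps each term to the pages containing it. The tiny index, its page ids, and the function names below are all made up for illustration; this is not how Infoseek or any real engine was implemented.
```python
# Toy inverted index: each term maps to the set of page ids that contain it.
# The terms and page ids are hypothetical.
index = {
    "superman": {1, 2, 3},
    "movies": {2, 3, 4},
    "batman": {3, 5},
    "comic books": {1, 5, 6},
}
def search_and(a, b):
    """AND narrows: a page must contain both terms."""
    return index[a] & index[b]
def search_or(a, b):
    """OR broadens: a page may contain either term, or both."""
    return index[a] | index[b]
def search_not(a, b):
    """NOT excludes: a page must contain a and must not contain b."""
    return index[a] - index[b]
print(sorted(search_and("superman", "movies")))
print(sorted(search_or("batman", "superman")))
print(sorted(search_not("comic books", "superman")))
```
Chaining AND with a third term would simply intersect one more set, which is why each extra AND term can only shrink the result list.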

0:25:31.800 --> 0:25:35.840
<v Speaker 1>So if you searched comic books NOT Superman, you should

0:25:35.840 --> 0:25:39.240
<v Speaker 1>get results about comic books that don't mention or include

0:25:39.440 --> 0:25:43.000
<v Speaker 1>Superman in the web pages, so it should be discussions

0:25:43.000 --> 0:25:45.960
<v Speaker 1>about comic books, but they're not Superman comic books, or

0:25:45.960 --> 0:25:50.240
<v Speaker 1>at least Superman's name isn't appearing in the web page. Now,

0:25:50.240 --> 0:25:52.679
<v Speaker 1>Boolean search is still a great tool to help you

0:25:52.760 --> 0:25:55.199
<v Speaker 1>get the results you want, but as time has gone on,

0:25:55.280 --> 0:25:59.120
<v Speaker 1>search has become much more sophisticated, so it's not really

0:25:59.160 --> 0:26:02.560
<v Speaker 1>as necessary to become familiar with Boolean search. It's good

0:26:02.600 --> 0:26:05.560
<v Speaker 1>to know how to use it, but it's not key

0:26:05.600 --> 0:26:10.240
<v Speaker 1>because search has not only grown more sophisticated, but is growing

0:26:10.320 --> 0:26:14.320
<v Speaker 1>more intrusive. A lot of searches today rely on information

0:26:14.600 --> 0:26:18.520
<v Speaker 1>that various browsers and web pages are gathering about you,

0:26:19.040 --> 0:26:23.000
<v Speaker 1>so they're using your past behavior as a predictive tool

0:26:23.320 --> 0:26:26.879
<v Speaker 1>to help serve up relevant results. But that's a topic

0:26:26.920 --> 0:26:30.879
<v Speaker 1>for a different podcast episode. I'll do another podcast episode

0:26:30.880 --> 0:26:35.120
<v Speaker 1>about this at some point. Now, Infoseek had a search

0:26:35.160 --> 0:26:38.200
<v Speaker 1>tool that allowed users to include different modifiers on search

0:26:38.240 --> 0:26:41.800
<v Speaker 1>results to narrow down the returned sites, which was becoming

0:26:41.800 --> 0:26:45.000
<v Speaker 1>important because the web was growing enormously in the mid

0:26:45.040 --> 0:26:47.560
<v Speaker 1>to late nineties and would only continue to do so.

0:26:48.160 --> 0:26:51.240
<v Speaker 1>The Walt Disney Company took notice of Infoseek and would

0:26:51.240 --> 0:26:55.680
<v Speaker 1>purchase more than of the company, effectively incorporating the business

0:26:55.680 --> 0:26:59.359
<v Speaker 1>into the media empire ruled by the hand of the mouse.

0:27:00.280 --> 0:27:03.560
<v Speaker 1>Infoseek at that point had made several acquisitions of its own,

0:27:03.560 --> 0:27:07.359
<v Speaker 1>including sites like ESPN dot com and ABC news dot com,

0:27:07.400 --> 0:27:11.440
<v Speaker 1>which then became part of Disney's media empire, and Infoseek

0:27:11.440 --> 0:27:14.879
<v Speaker 1>would get rolled into Disney's Go dot com network of

0:27:14.920 --> 0:27:19.679
<v Speaker 1>services and sites, and effectively, eventually, after several years, it

0:27:19.680 --> 0:27:24.440
<v Speaker 1>would disappear into that network of sites. Infoseek would begin

0:27:24.480 --> 0:27:28.960
<v Speaker 1>to offer up manually curated search results along with automated ones. Again,

0:27:29.040 --> 0:27:31.320
<v Speaker 1>this was an effort to return the most relevant results.

0:27:31.720 --> 0:27:33.840
<v Speaker 1>You'll see if you look at the history of search

0:27:33.880 --> 0:27:36.479
<v Speaker 1>engines that a lot of them kind of experimented with

0:27:36.640 --> 0:27:40.879
<v Speaker 1>this human curated approach because that was a real issue,

0:27:41.000 --> 0:27:43.960
<v Speaker 1>was that you would use these search engines and you

0:27:43.960 --> 0:27:46.200
<v Speaker 1>would get a ton of results and only a few

0:27:46.240 --> 0:27:48.639
<v Speaker 1>of them ever seemed to be even remotely connected to

0:27:48.680 --> 0:27:52.680
<v Speaker 1>what you wanted. So putting humans in charge of that

0:27:52.920 --> 0:27:57.600
<v Speaker 1>made it a little easier to do relevant results. Because

0:27:57.760 --> 0:28:01.679
<v Speaker 1>humans understand context. They understand when a site is actually

0:28:02.119 --> 0:28:06.400
<v Speaker 1>about something versus when a site just mentions something off hand,

0:28:06.520 --> 0:28:10.480
<v Speaker 1>but it's not really about that thing. You even saw

0:28:10.520 --> 0:28:15.800
<v Speaker 1>this relatively recently. I remember, uh, shortly after I started

0:28:15.840 --> 0:28:20.480
<v Speaker 1>How Stuff Works, how the service Mahalo was kind of struggling,

0:28:21.080 --> 0:28:23.840
<v Speaker 1>but it was also a human curated search engine. And

0:28:24.000 --> 0:28:26.800
<v Speaker 1>we're talking like two thousand seven when I was looking

0:28:26.840 --> 0:28:29.960
<v Speaker 1>into that. Um my friend Veronica Belmont used to work

0:28:30.000 --> 0:28:33.320
<v Speaker 1>for that company, so it was still something that people

0:28:33.320 --> 0:28:36.720
<v Speaker 1>were trying even as late as the late two thousands era,

0:28:37.000 --> 0:28:39.719
<v Speaker 1>or by late two thousands, I mean the first decade

0:28:39.720 --> 0:28:43.760
<v Speaker 1>of the two thousands. Anyway, Infoseek, uh, tried that out.

0:28:43.920 --> 0:28:47.280
<v Speaker 1>And also one of the engineers from Infoseek, Li Yanhong,

0:28:47.960 --> 0:28:50.920
<v Speaker 1>relocated to China and became a co-founder of a

0:28:50.960 --> 0:28:54.160
<v Speaker 1>different search engine company called Baidu, B-A-I-D-U,

0:28:54.360 --> 0:28:59.480
<v Speaker 1>a company that has become truly enormous, with

0:28:59.560 --> 0:29:04.840
<v Speaker 1>assets approaching a value of three hundred billion dollars. That's

0:29:04.840 --> 0:29:10.000
<v Speaker 1>actually more than what Google's parent company, Alphabet has at

0:29:10.080 --> 0:29:13.200
<v Speaker 1>its disposal. So you could argue Baidu won the

0:29:13.240 --> 0:29:16.800
<v Speaker 1>search wars, but then Baidu is not widely known

0:29:16.840 --> 0:29:21.320
<v Speaker 1>in the West. It's a very huge company over in Asia,

0:29:21.440 --> 0:29:25.200
<v Speaker 1>but not not as well known here. Back to our

0:29:25.200 --> 0:29:30.440
<v Speaker 1>search engine history. Excite, the company I talked about earlier,

0:29:30.440 --> 0:29:33.200
<v Speaker 1>finally debuts and it did well. In fact, it did

0:29:33.280 --> 0:29:36.200
<v Speaker 1>so well that it would end up purchasing WebCrawler

0:29:36.400 --> 0:29:41.280
<v Speaker 1>in nineteen ninety-six. But by nineteen ninety-nine its numbers were starting to decline

0:29:41.280 --> 0:29:45.120
<v Speaker 1>thanks to you-know-who that rhymes with Shmoogle, and

0:29:45.160 --> 0:29:48.120
<v Speaker 1>it merged with a company called at home dot com,

0:29:48.440 --> 0:29:51.320
<v Speaker 1>the at symbol home dot com. It was a deal

0:29:51.360 --> 0:29:56.280
<v Speaker 1>that was worth nearly seven billion dollars, but that deal

0:29:56.360 --> 0:29:59.960
<v Speaker 1>did not ultimately work out. The merged company would file

0:30:00.160 --> 0:30:02.960
<v Speaker 1>for bankruptcy in two thousand one, one of the many

0:30:03.240 --> 0:30:07.400
<v Speaker 1>victims of the dot com bubble bursting um that was

0:30:07.440 --> 0:30:10.680
<v Speaker 1>at least one of the big contributing factors to that.

0:30:10.880 --> 0:30:13.120
<v Speaker 1>The company also just had a lot of debt even

0:30:13.120 --> 0:30:16.760
<v Speaker 1>heading into two thousand two thousand one, so that was

0:30:16.840 --> 0:30:20.760
<v Speaker 1>kind of the nail in the coffin. Now, Infospace, which

0:30:20.880 --> 0:30:24.400
<v Speaker 1>once upon a time owned what would become Stuff Media,

0:30:24.960 --> 0:30:29.560
<v Speaker 1>so technically I was an Infospace employee for a short while,

0:30:29.960 --> 0:30:35.600
<v Speaker 1>purchased Excite's assets and domain names, and so WebCrawler

0:30:35.920 --> 0:30:42.920
<v Speaker 1>and, uh, Excite all became wrapped up with Infospace's offerings,

0:30:43.120 --> 0:30:47.360
<v Speaker 1>and uh yeah, there you just technically, it's still part

0:30:47.800 --> 0:30:50.120
<v Speaker 1>of that. You can still use some of that, although

0:30:50.640 --> 0:30:52.960
<v Speaker 1>um it's a much different tool than what it used

0:30:53.000 --> 0:30:57.840
<v Speaker 1>to be. Also in nineteen ninety-five, AltaVista emerged from the Western

0:30:57.920 --> 0:31:01.040
<v Speaker 1>Research Laboratory at the Digital Equipment Corporation, or

0:31:01.080 --> 0:31:05.360
<v Speaker 1>DEC. AltaVista allowed for natural language queries, meaning

0:31:05.360 --> 0:31:07.400
<v Speaker 1>you could type in a query similar to how you

0:31:07.480 --> 0:31:10.280
<v Speaker 1>would ask a person to look for something for you.

0:31:10.520 --> 0:31:12.720
<v Speaker 1>You didn't have to focus on asking in a way

0:31:12.760 --> 0:31:15.640
<v Speaker 1>that would only make sense to a machine. This is

0:31:15.760 --> 0:31:18.680
<v Speaker 1>that barrier of entry we often see with technology, where

0:31:19.120 --> 0:31:23.400
<v Speaker 1>we have to adjust our behavior so that whatever technology

0:31:23.440 --> 0:31:27.240
<v Speaker 1>we're working with understands, quote unquote what we want from it.

0:31:27.760 --> 0:31:30.920
<v Speaker 1>Um, AltaVista was trying to reverse that, to make

0:31:31.280 --> 0:31:35.280
<v Speaker 1>the technology attempt to understand what we want, rather than

0:31:35.400 --> 0:31:38.840
<v Speaker 1>making us work so that the technology can understand us.

0:31:39.320 --> 0:31:41.480
<v Speaker 1>The researchers who designed it had to do a full

0:31:41.520 --> 0:31:47.200
<v Speaker 1>scale web crawl in August and indexed ten million pages in

0:31:47.240 --> 0:31:50.640
<v Speaker 1>that web crawl, and this was compelling enough to launch

0:31:50.720 --> 0:31:54.920
<v Speaker 1>as a spinoff company. AltaVista was powering search

0:31:54.960 --> 0:31:57.959
<v Speaker 1>results for Yahoo. So, like I mentioned earlier, where Yahoo

0:31:58.000 --> 0:32:01.520
<v Speaker 1>would use other companies to run their web search,

0:32:01.560 --> 0:32:04.720
<v Speaker 1>AltaVista was one of those, but also at that time,

0:32:04.800 --> 0:32:09.200
<v Speaker 1>Compaq would acquire DEC, which in turn owned AltaVista,

0:32:09.720 --> 0:32:12.680
<v Speaker 1>and Compaq turned AltaVista into more of a portal service

0:32:13.160 --> 0:32:16.400
<v Speaker 1>than a search engine, a true search engine, which

0:32:16.400 --> 0:32:20.000
<v Speaker 1>put it more in direct competition with Yahoo, and AltaVista's

0:32:20.040 --> 0:32:23.360
<v Speaker 1>numbers went into decline, possibly because of that shift to

0:32:23.440 --> 0:32:27.520
<v Speaker 1>a portal service rather than as a more straightforward search tool.

0:32:28.040 --> 0:32:31.640
<v Speaker 1>Now we're not quite done covering all the major players

0:32:31.640 --> 0:32:34.320
<v Speaker 1>in the space before Google came on board. I'm going

0:32:34.360 --> 0:32:37.320
<v Speaker 1>to cover a couple more right after we take this

0:32:37.400 --> 0:32:48.560
<v Speaker 1>quick break. Okay, So in addition to the services I've

0:32:48.600 --> 0:32:51.200
<v Speaker 1>already mentioned, there were a couple more. There was

0:32:51.320 --> 0:32:54.680
<v Speaker 1>Inktomi. That's a project that was headed by Eric

0:32:54.680 --> 0:32:59.520
<v Speaker 1>Brewer and Paul Gauthier. They founded Inktomi in nineteen ninety-six. The

0:32:59.520 --> 0:33:02.600
<v Speaker 1>two of them had been working on a parallel processing computing

0:33:02.640 --> 0:33:06.240
<v Speaker 1>project for DARPA when they came up with this approach

0:33:06.320 --> 0:33:10.120
<v Speaker 1>to search, and rather than launching a dedicated search tool

0:33:10.200 --> 0:33:13.440
<v Speaker 1>of their own, they said, oh, well, we offer to

0:33:13.600 --> 0:33:18.440
<v Speaker 1>use our technology to power other people's search engines. So essentially,

0:33:18.760 --> 0:33:21.840
<v Speaker 1>you put up the front end and we'll power the

0:33:21.880 --> 0:33:26.280
<v Speaker 1>back end. And one of those was run by a

0:33:26.320 --> 0:33:29.280
<v Speaker 1>company called HotWired, and they introduced a search tool

0:33:29.360 --> 0:33:33.240
<v Speaker 1>called HotBot. Inktomi worked largely as sort of

0:33:33.280 --> 0:33:36.560
<v Speaker 1>a business to business entity, growing far beyond a search

0:33:36.600 --> 0:33:39.560
<v Speaker 1>engine company. But the dot com crash of two thousand

0:33:39.640 --> 0:33:42.480
<v Speaker 1>one also hit Inktomi really hard, and a couple

0:33:42.520 --> 0:33:46.240
<v Speaker 1>of years later it was swept up by Yahoo. So

0:33:46.320 --> 0:33:48.000
<v Speaker 1>you see, you see a lot of these companies end

0:33:48.080 --> 0:33:50.280
<v Speaker 1>up kind of getting gulped up by each other. Now,

0:33:50.280 --> 0:33:53.160
<v Speaker 1>the last of our pre Google search engines that I'm

0:33:53.200 --> 0:33:56.840
<v Speaker 1>going to talk about is Ask Jeeves. Later on, it

0:33:56.960 --> 0:34:00.000
<v Speaker 1>was just known as Ask. It launched in nineteen

0:34:00.120 --> 0:34:03.840
<v Speaker 1>ninety-seven, having been developed by David Warthen and Garrett Gruener,

0:34:04.280 --> 0:34:06.400
<v Speaker 1>and like some of the other services I mentioned in

0:34:06.440 --> 0:34:10.200
<v Speaker 1>this episode, it would present curated lists that were created

0:34:10.200 --> 0:34:14.360
<v Speaker 1>by sort of an editorial board, along with some paid listings.

0:34:14.440 --> 0:34:17.840
<v Speaker 1>So if you're a company that wanted your website to

0:34:17.880 --> 0:34:24.920
<v Speaker 1>be listed alongside quote unquote legitimate search returns, you

0:34:25.000 --> 0:34:29.000
<v Speaker 1>could pony up the cash and have your website put on

0:34:29.040 --> 0:34:33.120
<v Speaker 1>that list. That still happens today on search engines. Happens

0:34:33.160 --> 0:34:36.080
<v Speaker 1>today on Google, where you'll see the first couple of

0:34:36.120 --> 0:34:38.960
<v Speaker 1>results tend to be ones that say, you know, ad

0:34:39.480 --> 0:34:41.520
<v Speaker 1>at the end of it. Google has to label them

0:34:41.560 --> 0:34:45.560
<v Speaker 1>as ads, not as just natural search results based on

0:34:45.600 --> 0:34:49.279
<v Speaker 1>your query. Though sometimes those ads actually are the things

0:34:49.320 --> 0:34:51.640
<v Speaker 1>you're looking for, so it's not always a bad thing,

0:34:51.880 --> 0:34:55.080
<v Speaker 1>but it is good to just pay attention. So eventually

0:34:55.440 --> 0:34:58.399
<v Speaker 1>Ask would develop its own search engine technology that would

0:34:58.400 --> 0:35:03.440
<v Speaker 1>automate things. They stopped relying exclusively on people curating lists,

0:35:03.840 --> 0:35:07.399
<v Speaker 1>and Ask would go on to acquire Excite. So you saw,

0:35:07.400 --> 0:35:10.080
<v Speaker 1>you know, Excite buy WebCrawler, and Ask would later on

0:35:10.200 --> 0:35:12.640
<v Speaker 1>buy Excite. So you see, there's a lot of shuffling

0:35:12.920 --> 0:35:18.040
<v Speaker 1>with these companies. And then came Google, which had started

0:35:18.080 --> 0:35:21.920
<v Speaker 1>as a research project at Stanford. Larry Page and Sergey

0:35:21.920 --> 0:35:24.600
<v Speaker 1>Brin had developed the tool and they were running it

0:35:24.680 --> 0:35:26.960
<v Speaker 1>out of a garage for a little while. They had

0:35:26.960 --> 0:35:30.680
<v Speaker 1>built a search tool they originally called BackRub, and their

0:35:30.680 --> 0:35:32.680
<v Speaker 1>goal was to create a search engine that could index

0:35:32.719 --> 0:35:35.000
<v Speaker 1>the web and then present the most relevant results to

0:35:35.200 --> 0:35:38.680
<v Speaker 1>any query. But how would you do that? Well, the

0:35:38.760 --> 0:35:42.480
<v Speaker 1>actual answer, if we're being totally transparent, is kind of

0:35:42.480 --> 0:35:46.640
<v Speaker 1>like Coke's secret formula, in that we know in general

0:35:46.920 --> 0:35:49.080
<v Speaker 1>what has to go into it, but we don't know

0:35:49.160 --> 0:35:52.440
<v Speaker 1>the specifics that would allow us to replicate the results precisely.

0:35:52.840 --> 0:35:58.000
<v Speaker 1>The algorithm that Google uses is peculiar to Google, and

0:35:58.080 --> 0:36:01.520
<v Speaker 1>they also change it a lot. They tweak it, so

0:36:01.800 --> 0:36:03.880
<v Speaker 1>even if we did learn how it used to work,

0:36:04.040 --> 0:36:07.560
<v Speaker 1>it doesn't work that way anymore. So Brin and Page would

0:36:07.600 --> 0:36:11.000
<v Speaker 1>refer to this process as PageRank. And here's how

0:36:11.040 --> 0:36:15.240
<v Speaker 1>it worked from a theoretical standpoint. So first, you index

0:36:15.280 --> 0:36:18.680
<v Speaker 1>the web. So you need to get kind of

0:36:18.719 --> 0:36:24.279
<v Speaker 1>a complete look at all the websites that

0:36:24.360 --> 0:36:27.080
<v Speaker 1>are available out there on the web, an inventory, if

0:36:27.120 --> 0:36:29.719
<v Speaker 1>you will, of all the web. To do this, you

0:36:29.760 --> 0:36:32.480
<v Speaker 1>send out bots to index all the pages that are

0:36:32.480 --> 0:36:35.120
<v Speaker 1>listed on the web that you can find. Um, you

0:36:35.160 --> 0:36:39.520
<v Speaker 1>can actually in the HTML of a web page, you

0:36:39.560 --> 0:36:42.600
<v Speaker 1>can designate it so that it will instruct bots to

0:36:42.680 --> 0:36:46.359
<v Speaker 1>ignore the page and not index it. So you can

0:36:46.400 --> 0:36:48.759
<v Speaker 1>do that and it won't show up on any search

0:36:48.840 --> 0:36:51.440
<v Speaker 1>result page because the bot will see that message and

0:36:51.480 --> 0:36:54.480
<v Speaker 1>will just move on. This is useful if you want

0:36:54.480 --> 0:36:56.839
<v Speaker 1>a page that only people who know about it can

0:36:56.920 --> 0:36:59.840
<v Speaker 1>navigate to it, and you don't want folks just stumbling

0:37:00.000 --> 0:37:02.799
<v Speaker 1>on it through search. So that is an option. So

0:37:02.920 --> 0:37:05.960
<v Speaker 1>for all of the pages that are discoverable, the bots

0:37:05.960 --> 0:37:08.879
<v Speaker 1>will crawl through, they follow all the links, they try

0:37:08.920 --> 0:37:11.279
<v Speaker 1>and index out the web and get as good

0:37:11.280 --> 0:37:13.839
<v Speaker 1>a snapshot of what the World Wide Web is as

0:37:13.960 --> 0:37:17.319
<v Speaker 1>is possible. Now, these bots aren't just looking for the

0:37:17.360 --> 0:37:20.200
<v Speaker 1>location of the web pages, like what server those web

0:37:20.200 --> 0:37:23.800
<v Speaker 1>pages are stored on, or even just getting a full

0:37:23.960 --> 0:37:27.600
<v Speaker 1>understanding of what the text is inside those pages, so

0:37:27.640 --> 0:37:30.000
<v Speaker 1>that when you do a search query and you put

0:37:30.000 --> 0:37:33.120
<v Speaker 1>your search terms in, they can return the pages that

0:37:33.200 --> 0:37:36.520
<v Speaker 1>have those search terms. They're also looking for links, both

0:37:36.560 --> 0:37:39.120
<v Speaker 1>going into the page and coming from the page to

0:37:39.160 --> 0:37:42.080
<v Speaker 1>go elsewhere, and the links will become a really important

0:37:42.120 --> 0:37:45.200
<v Speaker 1>part of PageRank. So here's the basic idea. Brin

0:37:45.280 --> 0:37:48.160
<v Speaker 1>and Page figured out that if a web page about

0:37:48.200 --> 0:37:52.319
<v Speaker 1>a given subject is really good, other pages tend to

0:37:52.400 --> 0:37:57.000
<v Speaker 1>link to it. They do so because they recognize the quality,

0:37:57.440 --> 0:38:02.040
<v Speaker 1>and that helps boost the page's position in search results.

0:38:02.040 --> 0:38:04.440
<v Speaker 1>So let's use an example to kind of understand this.

0:38:05.000 --> 0:38:08.560
<v Speaker 1>Let's say you are one of these early web developers

0:38:08.640 --> 0:38:11.720
<v Speaker 1>in the late nineteen nineties, and you're also a big

0:38:11.800 --> 0:38:14.799
<v Speaker 1>music fan, so you decide to create a blog that's

0:38:14.840 --> 0:38:18.000
<v Speaker 1>completely focused on the music industry, and you cover the

0:38:18.040 --> 0:38:20.840
<v Speaker 1>news in the industry. You post reviews of albums that

0:38:20.880 --> 0:38:23.520
<v Speaker 1>you've listened to. Maybe you even do some interviews with

0:38:23.560 --> 0:38:26.000
<v Speaker 1>people who are in the industry. And as you write

0:38:26.000 --> 0:38:29.640
<v Speaker 1>this blog, other people take notice. Some of them also

0:38:29.680 --> 0:38:32.239
<v Speaker 1>have a web presence and cover the industry, and they

0:38:32.280 --> 0:38:35.080
<v Speaker 1>really dig your stuff, so they link to your page.

0:38:35.120 --> 0:38:38.080
<v Speaker 1>They say, there's a really cool music industry blog. It's

0:38:38.080 --> 0:38:40.799
<v Speaker 1>being run by this person over here. Follow this link

0:38:40.800 --> 0:38:44.239
<v Speaker 1>to go check it out. Google's bots would register that;

0:38:44.320 --> 0:38:47.120
<v Speaker 1>they would see that those links were out there pointing

0:38:47.160 --> 0:38:49.879
<v Speaker 1>to your page, and the more sites that link back

0:38:49.920 --> 0:38:52.400
<v Speaker 1>to your blog, the higher your blog would rank in

0:38:52.440 --> 0:38:56.840
<v Speaker 1>search results. So if someone searched music industry news or

0:38:56.880 --> 0:38:59.640
<v Speaker 1>something along those lines, there's a chance that your blog

0:38:59.680 --> 0:39:02.560
<v Speaker 1>would pop up fairly high in results. Now, how high

0:39:02.640 --> 0:39:07.000
<v Speaker 1>would be dependent on something other than just how many

0:39:07.160 --> 0:39:10.640
<v Speaker 1>pages are linking to you. That's one factor that matters

0:39:10.680 --> 0:39:12.880
<v Speaker 1>a lot, the number of sites linking to your page.

0:39:13.120 --> 0:39:17.160
<v Speaker 1>But the other one is how trustworthy those linking sites were.
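
NOTE
A minimal sketch of the two factors just described: how many pages link to you, and how much weight those linking pages carry themselves. It is written as a simplified PageRank-style iteration. The graph format, the function name pagerank, the 0.85 damping value, and the iteration count are assumptions made for illustration; the episode does not describe Google's actual implementation.
```python
# Hypothetical toy input: a dict mapping each page to the list of pages it links to.
def pagerank(links, damping=0.85, iterations=50):
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(pages)
    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / count for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / count for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / count
        rank = new_rank
    return rank
# Example: pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}) ranks "c" highest,
# because both other pages link to it.
```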

0:39:17.719 --> 0:39:22.000
<v Speaker 1>So let's consider two scenarios. In our first scenario, you've

0:39:22.040 --> 0:39:24.680
<v Speaker 1>got your music blog and you've got a lot of

0:39:24.719 --> 0:39:27.279
<v Speaker 1>sites that are linking to your page, but they're all

0:39:27.680 --> 0:39:30.960
<v Speaker 1>small-time sites, like, some are personal sites run by

0:39:31.000 --> 0:39:33.239
<v Speaker 1>people who are interested in music, but they don't really

0:39:33.239 --> 0:39:36.520
<v Speaker 1>have any presence in the industry and no one's really

0:39:36.600 --> 0:39:40.960
<v Speaker 1>linking to their page, so they're not ranked super high

0:39:41.040 --> 0:39:44.279
<v Speaker 1>in Google's estimation. Some of them might be even worse

0:39:44.280 --> 0:39:47.120
<v Speaker 1>than that. Some of them might be link farms. Link farms.

0:39:47.400 --> 0:39:49.840
<v Speaker 1>You don't really see them that much these days, but

0:39:50.200 --> 0:39:53.239
<v Speaker 1>in the nineties they were everywhere. They only existed to

0:39:53.360 --> 0:39:57.040
<v Speaker 1>link to other pages, and it was in an effort

0:39:57.040 --> 0:40:01.160
<v Speaker 1>to boost those other pages' rankings in search. So if

0:40:01.239 --> 0:40:04.000
<v Speaker 1>you navigated to one, let's say you do a search

0:40:04.040 --> 0:40:06.920
<v Speaker 1>for a term and you click on the link, you

0:40:07.000 --> 0:40:12.400
<v Speaker 1>end up looking at a bunch of completely disconnected titles

0:40:12.400 --> 0:40:15.120
<v Speaker 1>and URLs, and that's it. There's no

0:40:15.239 --> 0:40:18.399
<v Speaker 1>other content on the page. It's just a listing of

0:40:18.480 --> 0:40:22.440
<v Speaker 1>links to different sites with no rhyme or reason to them.

0:40:22.480 --> 0:40:26.840
<v Speaker 1>Those would also be very low in Google's trustworthiness according

0:40:26.840 --> 0:40:30.040
<v Speaker 1>to its algorithm, because obviously the only reason they're existing

0:40:30.200 --> 0:40:33.440
<v Speaker 1>is to try and game the system, to try and say, well,

0:40:33.520 --> 0:40:37.160
<v Speaker 1>let's just add a lot more links to this page

0:40:37.600 --> 0:40:41.960
<v Speaker 1>and that will boost its relevance. So if that

0:40:42.040 --> 0:40:44.120
<v Speaker 1>were the case, if most of the links going to

0:40:44.200 --> 0:40:48.440
<v Speaker 1>your page were either from small potatoes websites or they

0:40:48.440 --> 0:40:51.680
<v Speaker 1>were from link farms, your PageRank wouldn't be boosted

0:40:51.800 --> 0:40:54.640
<v Speaker 1>very high. It might be higher than it would be

0:40:54.680 --> 0:40:56.640
<v Speaker 1>if there were no links going to your page at all,

0:40:56.719 --> 0:41:00.359
<v Speaker 1>but it's not a huge help. Now let's consider scenario two.

0:41:01.080 --> 0:41:03.720
<v Speaker 1>Let's say your blog only has a few sites linking

0:41:03.719 --> 0:41:06.240
<v Speaker 1>to it, a couple of dozen maybe, But those sites

0:41:06.239 --> 0:41:10.840
<v Speaker 1>are doozies. Maybe they include record labels that are in

0:41:10.880 --> 0:41:14.520
<v Speaker 1>the music industry. Maybe it's other outlets that cover music news.

0:41:14.920 --> 0:41:18.080
<v Speaker 1>Maybe it includes some news websites that use your blog

0:41:18.120 --> 0:41:21.040
<v Speaker 1>as a source for stories. Now those sites have a

0:41:21.120 --> 0:41:25.120
<v Speaker 1>much higher level of trustworthiness for Google, and so or

0:41:25.280 --> 0:41:27.839
<v Speaker 1>you know, in Google's estimation, I should say, and those

0:41:27.880 --> 0:41:31.160
<v Speaker 1>links matter more. So maybe in scenario one you have

0:41:31.200 --> 0:41:34.400
<v Speaker 1>a thousand tiny sites linking to you, and in scenario two

0:41:34.560 --> 0:41:36.080
<v Speaker 1>you just have a couple of dozen of the really

0:41:36.120 --> 0:41:39.600
<v Speaker 1>big sites linking to you. PageRank would favor scenario

0:41:39.760 --> 0:41:43.040
<v Speaker 1>two over scenario one, reasoning that if your blog is

0:41:43.080 --> 0:41:45.640
<v Speaker 1>good enough to get the attention and support of those

0:41:45.680 --> 0:41:49.480
<v Speaker 1>trusted entities, it must be a really good resource, and

0:41:49.520 --> 0:41:52.919
<v Speaker 1>so your site would get boosted in search results. Now

0:41:53.000 --> 0:41:56.480
<v Speaker 1>that helped address a troublesome trend with search. I mentioned

0:41:56.520 --> 0:41:59.840
<v Speaker 1>link farms. That was one problem. So any search engine

0:41:59.840 --> 0:42:04.399
<v Speaker 1>that looked at backlinking, um, could be fooled through

0:42:04.480 --> 0:42:07.720
<v Speaker 1>link farms that were just there to boost that number.

0:42:08.680 --> 0:42:12.040
<v Speaker 1>In the nineties, it wasn't unusual to encounter that. I

0:42:12.120 --> 0:42:13.759
<v Speaker 1>can't tell you how many times it happened to me

0:42:13.800 --> 0:42:15.520
<v Speaker 1>when I was doing a search for, you know, a

0:42:15.520 --> 0:42:19.200
<v Speaker 1>fairly obscure type of topic, and I just would come

0:42:19.200 --> 0:42:22.160
<v Speaker 1>across a link farm to all sorts of stuff,

0:42:22.239 --> 0:42:24.600
<v Speaker 1>most of which was totally not relevant to what

0:42:24.680 --> 0:42:29.760
<v Speaker 1>I wanted. Um, those were really frustrating, and so

0:42:29.760 --> 0:42:32.239
<v Speaker 1>that was one thing that people would do to try

0:42:32.239 --> 0:42:38.680
<v Speaker 1>and game the system. But another was an equally annoying tactic. Uh.

0:42:39.280 --> 0:42:42.160
<v Speaker 1>People wanted folks to come to their web pages really badly.

0:42:42.520 --> 0:42:45.000
<v Speaker 1>Back in the old, old days, there were even

0:42:45.160 --> 0:42:47.799
<v Speaker 1>web page counters. It looked like a little

0:42:47.800 --> 0:42:50.279
<v Speaker 1>digit counter that would tell you how many people had

0:42:50.320 --> 0:42:52.840
<v Speaker 1>been to that website, and it became kind of a

0:42:52.880 --> 0:42:56.520
<v Speaker 1>badge of honor among early web developers if that number

0:42:56.520 --> 0:42:59.400
<v Speaker 1>were particularly high, because it showed that a lot of

0:42:59.400 --> 0:43:02.239
<v Speaker 1>people were visiting your site, and it was kind of

0:43:02.239 --> 0:43:05.960
<v Speaker 1>a prestige thing, um, and it also could mean money, because

0:43:05.960 --> 0:43:08.640
<v Speaker 1>if you were using web advertising to support your

0:43:08.680 --> 0:43:12.280
<v Speaker 1>web site and that number was getting really really high,

0:43:12.520 --> 0:43:14.600
<v Speaker 1>then you had more page views, and more page

0:43:14.640 --> 0:43:17.799
<v Speaker 1>views would mean more cash from the advertisers, So there

0:43:17.920 --> 0:43:21.279
<v Speaker 1>was an actual, you know, financial reason to try and

0:43:21.320 --> 0:43:23.480
<v Speaker 1>get more people to come to your web page, and

0:43:23.520 --> 0:43:30.600
<v Speaker 1>not everybody played fair and square. Sometimes web developers would

0:43:30.600 --> 0:43:35.920
<v Speaker 1>include an incredibly long list of popular search terms on

0:43:36.000 --> 0:43:38.720
<v Speaker 1>the web page. Usually it would be at the very bottom

0:43:38.760 --> 0:43:42.600
<v Speaker 1>of the web page in tiny font and so that's

0:43:42.600 --> 0:43:45.040
<v Speaker 1>the only place where your search terms would show up.

0:43:45.480 --> 0:43:47.080
<v Speaker 1>The rest of the web page would be about something

0:43:47.239 --> 0:43:50.040
<v Speaker 1>entirely different, and then you do a search on the

0:43:50.040 --> 0:43:52.279
<v Speaker 1>web page for the terms you were looking for. It

0:43:52.320 --> 0:43:55.480
<v Speaker 1>turns out they're just in this list of random or

0:43:55.520 --> 0:43:59.200
<v Speaker 1>seemingly random search terms, it's really the most popular search

0:43:59.320 --> 0:44:01.719
<v Speaker 1>terms that people could come across, and they were just

0:44:02.880 --> 0:44:05.280
<v Speaker 1>dumping them all at the bottom of their web pages,

0:44:05.360 --> 0:44:07.560
<v Speaker 1>and that way their web page would pop up in

0:44:07.600 --> 0:44:10.640
<v Speaker 1>all these sorts of searches, and people would end up

0:44:10.680 --> 0:44:13.759
<v Speaker 1>going to their web page without knowing that it wasn't

0:44:13.840 --> 0:44:16.719
<v Speaker 1>really about what they were hoping for. That was really

0:44:16.760 --> 0:44:20.080
<v Speaker 1>frustrating for a lot of people, including myself, because you know,

0:44:20.200 --> 0:44:22.600
<v Speaker 1>obviously you're searching for something because you want to

0:44:22.640 --> 0:44:25.040
<v Speaker 1>get that content, but then you end up going to

0:44:25.080 --> 0:44:27.200
<v Speaker 1>a web page that's not about that at all. It's

0:44:27.239 --> 0:44:29.760
<v Speaker 1>not a good experience, So it was a terrible way

0:44:29.960 --> 0:44:32.719
<v Speaker 1>to have people come to your web page. However, if

0:44:32.719 --> 0:44:35.400
<v Speaker 1>your goal was just to get those views so that

0:44:35.480 --> 0:44:38.719
<v Speaker 1>you could get that ad money, people were willing to

0:44:38.719 --> 0:44:43.319
<v Speaker 1>do it. Um, maybe it was a successful strategy for

0:44:43.360 --> 0:44:46.000
<v Speaker 1>people who were maybe running an online store, but I

0:44:46.000 --> 0:44:47.879
<v Speaker 1>can't imagine it would have worked too well. I mean,

0:44:48.120 --> 0:44:51.840
<v Speaker 1>if I'm looking for information about quantum mechanics and I

0:44:51.960 --> 0:44:55.440
<v Speaker 1>end up being dumped in some store that's selling baseball

0:44:55.480 --> 0:44:58.440
<v Speaker 1>caps that have nothing to do with anything, I'm probably

0:44:58.480 --> 0:45:01.200
<v Speaker 1>just gonna be mad. But anyway, that was one of

0:45:01.239 --> 0:45:05.400
<v Speaker 1>the other approaches people were taking, was trying to include

0:45:05.400 --> 0:45:07.520
<v Speaker 1>this text. Sometimes they would even hide it. They would

0:45:07.520 --> 0:45:11.080
<v Speaker 1>have a big section of the web page where the

0:45:11.239 --> 0:45:15.239
<v Speaker 1>font had the same color as the background, so

0:45:15.280 --> 0:45:18.200
<v Speaker 1>you couldn't see it just when you're reading through the

0:45:18.239 --> 0:45:21.319
<v Speaker 1>web page, but it could be read by bots as

0:45:21.360 --> 0:45:26.640
<v Speaker 1>they're crawling through all this material. Uh, so search engine

0:45:26.960 --> 0:45:31.640
<v Speaker 1>developers got into kind of a seesaw battle with web

0:45:31.640 --> 0:45:34.960
<v Speaker 1>developers to try and get around these tricks. One of

0:45:35.000 --> 0:45:37.840
<v Speaker 1>the things they started to do, and Google was one of

0:45:37.880 --> 0:45:41.400
<v Speaker 1>them, was focus on the text in the actual body

0:45:41.520 --> 0:45:45.000
<v Speaker 1>of the document itself and then ignore information that might

0:45:45.040 --> 0:45:47.399
<v Speaker 1>be in the headers or footers, which was typically where

0:45:47.400 --> 0:45:52.160
<v Speaker 1>people were putting these laundry lists of popular search terms.

0:45:52.480 --> 0:45:54.640
<v Speaker 1>So Google got around that by saying, Okay, well, we're

0:45:54.680 --> 0:45:57.000
<v Speaker 1>no longer worried about the text that's in the header

0:45:57.120 --> 0:45:59.840
<v Speaker 1>or the footer. We're just concentrating on what's in the

0:45:59.840 --> 0:46:05.440
<v Speaker 1>body of the page. And Google's approach really improved upon relevance,

0:46:05.480 --> 0:46:08.759
<v Speaker 1>the search results were just better than most of the competitors.

0:46:08.920 --> 0:46:11.160
<v Speaker 1>You know, you were more likely to come across

0:46:11.239 --> 0:46:13.960
<v Speaker 1>something, the stuff that, you know, represented what you wanted,

0:46:14.480 --> 0:46:17.640
<v Speaker 1>and so Google was able to tap into advertising revenue

0:46:17.719 --> 0:46:21.520
<v Speaker 1>because they were able to really give people what they wanted.

0:46:22.400 --> 0:46:25.799
<v Speaker 1>Advertisers wanted to be included with that, and Google began

0:46:25.880 --> 0:46:29.280
<v Speaker 1>listing ad-supported results with the top returns for queries.

0:46:29.680 --> 0:46:32.800
<v Speaker 1>So it meant that you know, the stuff that people

0:46:32.840 --> 0:46:36.080
<v Speaker 1>most wanted to see, you would get ads served right

0:46:36.160 --> 0:46:40.279
<v Speaker 1>with that. Uh, that's a very attractive proposition, and it

0:46:40.320 --> 0:46:42.719
<v Speaker 1>positioned the company well enough to survive the dot com

0:46:42.719 --> 0:46:45.440
<v Speaker 1>bubble burst of two thousand and two thousand one, and

0:46:45.560 --> 0:46:48.200
<v Speaker 1>many of its competitors either merged with other companies as

0:46:48.239 --> 0:46:52.080
<v Speaker 1>I mentioned, or they completely went under. Google remained

0:46:53.160 --> 0:46:56.440
<v Speaker 1>around and then was able to actually seriously grow in

0:46:56.480 --> 0:46:59.319
<v Speaker 1>the two thousands. Uh, there were a couple of discussions

0:46:59.320 --> 0:47:02.600
<v Speaker 1>with other companies early on, including Excite, that could

0:47:02.600 --> 0:47:05.040
<v Speaker 1>have led to Google getting acquired, but none of that

0:47:05.120 --> 0:47:08.160
<v Speaker 1>came to fruition, and Google remained its own company and

0:47:08.200 --> 0:47:13.280
<v Speaker 1>continued to build on its success. And Google would evolve

0:47:13.320 --> 0:47:16.160
<v Speaker 1>its algorithm trying to crack the nut of deciphering the

0:47:16.239 --> 0:47:20.200
<v Speaker 1>meaning of text inside web pages. So not just here

0:47:20.200 --> 0:47:23.160
<v Speaker 1>are the web pages that include the terms that you

0:47:23.239 --> 0:47:26.080
<v Speaker 1>search for, but here are the ones that include them in

0:47:26.120 --> 0:47:30.040
<v Speaker 1>the way that you meant, including, uh, improving it so

0:47:30.080 --> 0:47:34.400
<v Speaker 1>that it can recognize natural language and not just you know,

0:47:34.640 --> 0:47:38.879
<v Speaker 1>lists of search terms. Pairing that with the PageRank

0:47:38.960 --> 0:47:41.520
<v Speaker 1>kind of approach would give Google the information it needed

0:47:41.520 --> 0:47:45.200
<v Speaker 1>to really rank its results and necessitated the search engine

0:47:45.239 --> 0:47:49.880
<v Speaker 1>optimization strategy that became a whole new industry. Ranking

0:47:49.880 --> 0:47:52.759
<v Speaker 1>well in search was a really good way to get

0:47:52.800 --> 0:47:56.800
<v Speaker 1>serious Internet traffic to a site. People made entire careers

0:47:56.800 --> 0:47:58.799
<v Speaker 1>out of figuring out the best way to rank well

0:47:58.840 --> 0:48:03.880
<v Speaker 1>in search, which honestly mostly involves creating a compelling and

0:48:03.920 --> 0:48:06.920
<v Speaker 1>relevant web page or website that makes people want to

0:48:07.000 --> 0:48:11.440
<v Speaker 1>link to it. Um, it's easier said than done. But

0:48:11.480 --> 0:48:15.560
<v Speaker 1>that was the best way to rank well within Google's search. Occasionally,

0:48:15.600 --> 0:48:18.680
<v Speaker 1>Google would tweak things so that your site, if it

0:48:18.719 --> 0:48:21.439
<v Speaker 1>was particularly good, would just rise to the top because

0:48:21.480 --> 0:48:24.840
<v Speaker 1>Google recognized that. They might weight your site more heavily

0:48:24.920 --> 0:48:28.120
<v Speaker 1>than other sites. Um. But it also led to companies

0:48:28.560 --> 0:48:32.759
<v Speaker 1>learning the hard lesson that depending upon search traffic is

0:48:32.960 --> 0:48:36.919
<v Speaker 1>a risky thing to do. Every time Google changes its

0:48:37.000 --> 0:48:41.360
<v Speaker 1>search algorithm, it affects search rankings. So you might be

0:48:41.440 --> 0:48:43.960
<v Speaker 1>doing really well for years, and then suddenly you see

0:48:43.960 --> 0:48:47.520
<v Speaker 1>a massive drop-off in visitor numbers because Google changed

0:48:47.560 --> 0:48:50.080
<v Speaker 1>its algorithm and your page no longer ranks as well

0:48:50.120 --> 0:48:52.480
<v Speaker 1>in search results as it used to. So in a

0:48:52.560 --> 0:48:54.840
<v Speaker 1>future episode, I plan on getting some SEO

0:48:55.080 --> 0:48:58.759
<v Speaker 1>experts on the show and have them talk about the

0:48:58.840 --> 0:49:01.640
<v Speaker 1>challenges of developing a good strategy to rank well in

0:49:01.719 --> 0:49:05.719
<v Speaker 1>search and what other strategies people might consider if they

0:49:05.719 --> 0:49:10.200
<v Speaker 1>want to promote their traffic to sites and services. You know,

0:49:10.320 --> 0:49:14.120
<v Speaker 1>it's it's tricky stuff because again, it might work great

0:49:14.600 --> 0:49:17.200
<v Speaker 1>today and then tomorrow it might not work at all.

0:49:17.880 --> 0:49:22.160
<v Speaker 1>So there's a real strong push among web developers

0:49:22.160 --> 0:49:27.400
<v Speaker 1>to try and find alternatives to search engine traffic being

0:49:27.480 --> 0:49:31.840
<v Speaker 1>your main way of getting people into your website. Um. Also,

0:49:32.280 --> 0:49:36.360
<v Speaker 1>if people are just searching for content and then popping

0:49:36.360 --> 0:49:40.040
<v Speaker 1>over to your site, uh, and they read one page

0:49:40.080 --> 0:49:43.279
<v Speaker 1>that is relevant to whatever their search engine query was,

0:49:43.719 --> 0:49:46.520
<v Speaker 1>they're not likely to stick around unless they go down

0:49:46.680 --> 0:49:50.480
<v Speaker 1>sort of the Wikipedia rabbit hole. They're more likely to bounce.

0:49:50.920 --> 0:49:52.719
<v Speaker 1>And this was a problem we saw at the How

0:49:52.719 --> 0:49:54.839
<v Speaker 1>Stuff Works website all the time, is that we could

0:49:54.840 --> 0:49:59.000
<v Speaker 1>get great search engine traffic. People were looking for specific

0:49:59.040 --> 0:50:02.520
<v Speaker 1>answers to questions, and we had articles that answered those questions,

0:50:02.560 --> 0:50:05.160
<v Speaker 1>so people would come and read those articles. Now, what

0:50:05.200 --> 0:50:07.400
<v Speaker 1>would be ideal for us is that people say, this

0:50:07.480 --> 0:50:09.640
<v Speaker 1>is a great site, I want to read more articles.

0:50:09.760 --> 0:50:12.879
<v Speaker 1>Let's just see what's here. But the reality was most

0:50:12.880 --> 0:50:16.600
<v Speaker 1>people would come in, read whatever they wanted and then leave. Um,

0:50:16.640 --> 0:50:19.120
<v Speaker 1>they wouldn't stick around to read other stuff. And

0:50:19.480 --> 0:50:21.520
<v Speaker 1>it was a real challenge. One of the things that

0:50:21.560 --> 0:50:24.000
<v Speaker 1>we always tried to do was figure out how to

0:50:24.040 --> 0:50:27.319
<v Speaker 1>create a site that was a destination all of its own.

0:50:27.680 --> 0:50:30.600
<v Speaker 1>That you're not going there because a search engine told

0:50:30.640 --> 0:50:33.040
<v Speaker 1>you to. You're going there because you love the site

0:50:33.080 --> 0:50:36.760
<v Speaker 1>and you want to read more of the stuff on there. Um.

0:50:36.800 --> 0:50:39.160
<v Speaker 1>That was always our goal. It was always very, very

0:50:39.280 --> 0:50:41.799
<v Speaker 1>challenging because there's a ton of websites out there, and

0:50:41.840 --> 0:50:45.520
<v Speaker 1>there's a ton of really great content, So making sure

0:50:45.560 --> 0:50:48.719
<v Speaker 1>that yours can stand up to everybody else's is a

0:50:48.719 --> 0:50:50.680
<v Speaker 1>heck of a challenge. It's a hard thing to do.

0:50:50.880 --> 0:50:53.359
<v Speaker 1>I think the site does a great job of it, um,

0:50:53.480 --> 0:50:55.000
<v Speaker 1>but it was one of those things that we were

0:50:55.040 --> 0:50:59.240
<v Speaker 1>always striving toward. In the end, Google won out because

0:50:59.440 --> 0:51:02.680
<v Speaker 1>it hadn't grown too large before the bubble burst, so

0:51:02.800 --> 0:51:05.640
<v Speaker 1>it hadn't spread its assets out too thin, it wasn't

0:51:05.680 --> 0:51:08.680
<v Speaker 1>in incredible amounts of debt, so it was able to

0:51:08.840 --> 0:51:10.960
<v Speaker 1>weather that storm, and then it was able to build

0:51:11.000 --> 0:51:14.400
<v Speaker 1>on its success, and it had developed a search engine

0:51:14.400 --> 0:51:16.680
<v Speaker 1>tool that people felt returned the best results and they

0:51:16.719 --> 0:51:19.959
<v Speaker 1>put a ton of trust in it. Ultimately, Google would

0:51:19.960 --> 0:51:23.120
<v Speaker 1>become this enormous company that would be able to gather

0:51:23.520 --> 0:51:26.960
<v Speaker 1>huge amounts of data from its users and put that

0:51:27.000 --> 0:51:28.919
<v Speaker 1>to use as well, and that made it a very

0:51:29.040 --> 0:51:34.160
<v Speaker 1>valuable resource for advertisers, and that's kind of how Google

0:51:34.239 --> 0:51:38.480
<v Speaker 1>won the search engine war. Now, we'll talk about other

0:51:38.520 --> 0:51:40.920
<v Speaker 1>stuff related to search engines in the future, but our next

0:51:40.960 --> 0:51:44.160
<v Speaker 1>episode is going to be about something totally different. Um

0:51:44.200 --> 0:51:46.520
<v Speaker 1>And I'm just doing a few one off episodes because

0:51:46.520 --> 0:51:51.120
<v Speaker 1>after doing that arc of seven episodes about the media

0:51:51.239 --> 0:51:55.640
<v Speaker 1>and its relationship to us and and technology, I felt

0:51:55.640 --> 0:51:58.440
<v Speaker 1>like we kind of needed to do some one offs.

0:51:58.560 --> 0:52:02.600
<v Speaker 1>So the next one's gonna be another entertainment-related podcast,

0:52:02.680 --> 0:52:04.520
<v Speaker 1>but it will be another one off. If you guys

0:52:04.560 --> 0:52:07.480
<v Speaker 1>have suggestions for future topics I should tackle, why not

0:52:07.719 --> 0:52:10.239
<v Speaker 1>send me an email? The address is tech stuff at how

0:52:10.280 --> 0:52:12.400
<v Speaker 1>stuff works dot com or hop on over to our

0:52:12.440 --> 0:52:16.040
<v Speaker 1>website that's tech stuff podcast dot com. You will find

0:52:16.040 --> 0:52:18.600
<v Speaker 1>the archive of all of our shows there. You'll find

0:52:18.640 --> 0:52:21.399
<v Speaker 1>links to our social media sites. You'll find a link

0:52:21.520 --> 0:52:24.120
<v Speaker 1>to our online store, where every purchase you make goes

0:52:24.160 --> 0:52:26.920
<v Speaker 1>to help the show and we greatly appreciate it and

0:52:26.960 --> 0:52:34.680
<v Speaker 1>I will talk to you again really soon. Tech Stuff

0:52:34.680 --> 0:52:37.040
<v Speaker 1>is a production of I Heart Radio's How Stuff Works.

0:52:37.200 --> 0:52:40.000
<v Speaker 1>For more podcasts from I Heart Radio, visit the I

0:52:40.120 --> 0:52:43.360
<v Speaker 1>Heart Radio app, Apple Podcasts, or wherever you listen to

0:52:43.400 --> 0:52:44.320
<v Speaker 1>your favorite shows.