WEBVTT - The challenge of natural language processing

0:00:04.120 --> 0:00:07.160
<v Speaker 1>Get in touch with technology with TechStuff from

0:00:07.200 --> 0:00:14.120
<v Speaker 1>HowStuffWorks dot com. Hey there, and welcome to TechStuff.

0:00:14.160 --> 0:00:17.400
<v Speaker 1>I'm your host, Jonathan Strickland. I'm an executive producer at

0:00:17.440 --> 0:00:20.680
<v Speaker 1>HowStuffWorks, and I love all things tech. And

0:00:20.720 --> 0:00:24.000
<v Speaker 1>in the last episode, I covered the history and technology

0:00:24.040 --> 0:00:28.639
<v Speaker 1>behind speech recognition. So today we're going to look at

0:00:28.680 --> 0:00:34.440
<v Speaker 1>a related concept called natural language processing or natural language understanding.

0:00:34.479 --> 0:00:38.920
<v Speaker 1>The two are related. This technology and speech recognition

0:00:39.000 --> 0:00:42.800
<v Speaker 1>are both part of what makes voice assistants like Siri,

0:00:43.120 --> 0:00:46.840
<v Speaker 1>Alexa, and Google Assistant work, though there are other technologies

0:00:46.880 --> 0:00:49.040
<v Speaker 1>that also go into that. Now, this is a huge

0:00:49.120 --> 0:00:53.120
<v Speaker 1>topic and has a long and fascinating history, so this

0:00:53.200 --> 0:00:55.120
<v Speaker 1>episode is just going to be the start of it.

0:00:55.320 --> 0:00:58.320
<v Speaker 1>In the next episode, I will conclude the discussion of

0:00:58.480 --> 0:01:01.360
<v Speaker 1>natural language processing and go into the history of these

0:01:01.400 --> 0:01:05.920
<v Speaker 1>actual voice assistants. So, on a high level, what is

0:01:06.120 --> 0:01:11.080
<v Speaker 1>natural language processing? Well, simply put, it's programming a machine

0:01:11.160 --> 0:01:14.720
<v Speaker 1>to interpret language the way we human beings use it.

0:01:14.840 --> 0:01:19.640
<v Speaker 1>So in an ideal implementation, which would also require advanced

0:01:19.720 --> 0:01:23.680
<v Speaker 1>artificial intelligence, you could speak to a machine or type

0:01:23.720 --> 0:01:25.760
<v Speaker 1>whatever you like into a terminal and it would be

0:01:25.800 --> 0:01:29.080
<v Speaker 1>able to understand what you meant, what your commands were,

0:01:29.200 --> 0:01:32.800
<v Speaker 1>no matter how you worded the phrase. In turn, the

0:01:32.880 --> 0:01:36.440
<v Speaker 1>machine would be able to generate responses that made linguistic

0:01:36.560 --> 0:01:39.959
<v Speaker 1>sense to us, and we could in effect hold entire

0:01:40.080 --> 0:01:44.840
<v Speaker 1>conversations with those machines. This, as it turns out, is

0:01:44.880 --> 0:01:49.000
<v Speaker 1>a very difficult challenge. Even creating a machine that can

0:01:49.040 --> 0:01:52.560
<v Speaker 1>respond to basic commands delivered in a natural language is

0:01:52.720 --> 0:01:56.080
<v Speaker 1>really, really hard to do, and we haven't yet cracked

0:01:56.240 --> 0:02:00.520
<v Speaker 1>the nut on making a machine that can actually hold

0:02:00.560 --> 0:02:04.040
<v Speaker 1>a real conversation with us. Yet we can sometimes forget

0:02:04.520 --> 0:02:09.520
<v Speaker 1>that machines do not natively understand human language. Machines process

0:02:09.600 --> 0:02:13.639
<v Speaker 1>information in machine code, which is difficult for humans to understand.

0:02:14.120 --> 0:02:17.480
<v Speaker 1>I almost said impossible for humans to understand, but really

0:02:17.880 --> 0:02:22.600
<v Speaker 1>it's just impractical. It's incredibly difficult. So, for example, computers

0:02:22.600 --> 0:02:26.639
<v Speaker 1>that run on binary systems process all information in zeros

0:02:26.760 --> 0:02:29.840
<v Speaker 1>and ones, ultimately, when you get down to it. So

0:02:29.880 --> 0:02:31.880
<v Speaker 1>if you were to look at a sheet of zeros

0:02:31.919 --> 0:02:36.280
<v Speaker 1>and ones, it would probably seem completely incomprehensible to you,

0:02:36.440 --> 0:02:40.560
<v Speaker 1>although to a computer it could seem perfectly logical. Our

0:02:40.680 --> 0:02:46.000
<v Speaker 1>language is equally incomprehensible to machines. Programming languages make it

0:02:46.080 --> 0:02:49.079
<v Speaker 1>easier for humans to make machines do what we want

0:02:49.160 --> 0:02:52.960
<v Speaker 1>them to do. Programming languages create a level of abstraction

0:02:53.200 --> 0:02:56.200
<v Speaker 1>between human language and machine language. It's kind of a

0:02:56.600 --> 0:02:59.600
<v Speaker 1>meeting ground in the middle. Programming languages tend to be

0:02:59.720 --> 0:03:05.079
<v Speaker 1>highly structured with specific strict sets of rules. Programming within

0:03:05.160 --> 0:03:08.200
<v Speaker 1>those rules will get you the results you want, assuming

0:03:08.360 --> 0:03:11.960
<v Speaker 1>your code is good, but if you stray outside those rules,

0:03:12.160 --> 0:03:15.359
<v Speaker 1>you start to get errors. Human language is much more

0:03:15.440 --> 0:03:20.200
<v Speaker 1>variable and complicated and ambiguous, and that's something that machines

0:03:20.200 --> 0:03:22.880
<v Speaker 1>are not very good at handling. Now, if you've ever

0:03:22.880 --> 0:03:26.600
<v Speaker 1>played a text-based adventure from way back in the day,

0:03:26.639 --> 0:03:29.800
<v Speaker 1>like Zork, you know that those adventure games have a

0:03:29.880 --> 0:03:34.080
<v Speaker 1>very limited vocabulary. The game can accept certain commands, but

0:03:34.200 --> 0:03:37.200
<v Speaker 1>only because the programmer built that option into the game.

0:03:37.280 --> 0:03:40.880
<v Speaker 1>They incorporated that in the game's design. So you might

0:03:40.920 --> 0:03:44.200
<v Speaker 1>be able to type something like go north or just north,

0:03:44.280 --> 0:03:46.840
<v Speaker 1>and the game understands you want your character to move

0:03:46.880 --> 0:03:49.240
<v Speaker 1>to a new location that's to the north of your

0:03:49.240 --> 0:03:52.480
<v Speaker 1>current location. But maybe you type something else, maybe you

0:03:52.520 --> 0:03:57.120
<v Speaker 1>type jog north or saunter north, and the programmer didn't

0:03:57.160 --> 0:03:58.880
<v Speaker 1>think of that. They didn't come up with all the

0:03:58.920 --> 0:04:01.560
<v Speaker 1>different ways you might describe the way you want to

0:04:01.640 --> 0:04:04.240
<v Speaker 1>move north, so you might get a result that says

0:04:04.280 --> 0:04:07.440
<v Speaker 1>something like I didn't understand that, or you can't do

0:04:07.480 --> 0:04:12.360
<v Speaker 1>that here. Computers only have the illusion of understanding us.
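
NOTE
A minimal Python sketch of the kind of fixed-vocabulary command parser described above. The verb and direction sets, the response strings, and the name parse are assumptions for illustration; this is not Zork's actual parser.
```python
# Hypothetical fixed-vocabulary command parser in the spirit of early text adventures.
# Only the words the programmer anticipated are recognized (assumed vocabulary).
MOVE_VERBS = {"go", "walk"}
DIRECTIONS = {"north", "south", "east", "west"}
def parse(command):
    words = command.lower().split()
    # A bare direction such as "north" is treated as shorthand for "go north".
    if len(words) == 1 and words[0] in DIRECTIONS:
        return f"You move {words[0]}."
    # A known movement verb followed by a known direction.
    if len(words) == 2 and words[0] in MOVE_VERBS and words[1] in DIRECTIONS:
        return f"You move {words[1]}."
    # Anything unanticipated, like "jog north" or "saunter north", falls through.
    return "I didn't understand that."
print(parse("go north"))       # You move north.
print(parse("north"))          # You move north.
print(parse("saunter north"))  # I didn't understand that.
```
Any verb missing from the hand-built vocabulary lands in the final error branch, which is the limitation the episode describes.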

0:04:12.400 --> 0:04:15.720
<v Speaker 1>They don't actually know what we mean when we say something,

0:04:15.760 --> 0:04:19.599
<v Speaker 1>at least not natively. Now, that means that for most

0:04:19.640 --> 0:04:22.640
<v Speaker 1>of our history with computers, humans have had to learn

0:04:22.720 --> 0:04:25.560
<v Speaker 1>how to work with machines, not the other way around.

0:04:26.000 --> 0:04:30.719
<v Speaker 1>We have had to learn commands and syntax that machines accept,

0:04:31.120 --> 0:04:32.960
<v Speaker 1>and if we try to word those commands in a

0:04:33.000 --> 0:04:36.760
<v Speaker 1>different way, we tend to get an error. Natural language

0:04:36.760 --> 0:04:40.000
<v Speaker 1>processing attempts to turn the tables on this relationship and

0:04:40.000 --> 0:04:43.039
<v Speaker 1>teach machines how to work with humans so that we

0:04:43.080 --> 0:04:45.599
<v Speaker 1>don't have to go through any sort of learning curve.

0:04:45.640 --> 0:04:48.960
<v Speaker 1>We don't need to formulate our commands in a

0:04:49.000 --> 0:04:53.360
<v Speaker 1>specific way to be understood. The technology works on our terms,

0:04:53.640 --> 0:04:56.640
<v Speaker 1>or as close to those as we can manage. That

0:04:56.720 --> 0:04:59.800
<v Speaker 1>means that programmers have to build systems that can parse

0:05:00.160 --> 0:05:03.680
<v Speaker 1>language for meaning, and it also means having to build

0:05:03.760 --> 0:05:07.160
<v Speaker 1>tools and machines that can handle stuff that you typically

0:05:07.240 --> 0:05:11.600
<v Speaker 1>encounter in higher-level language courses. So here's a quick

0:05:11.720 --> 0:05:16.480
<v Speaker 1>rundown on some of the stuff a natural language processing

0:05:16.480 --> 0:05:21.000
<v Speaker 1>approach has to take into account. First, you have grammar. Now,

0:05:21.000 --> 0:05:25.120
<v Speaker 1>grammar can refer to the study of language, but generally speaking,

0:05:25.120 --> 0:05:27.200
<v Speaker 1>when we say grammar, or at least when I'm using

0:05:27.240 --> 0:05:30.640
<v Speaker 1>the term in the context of natural language processing, I

0:05:30.680 --> 0:05:35.320
<v Speaker 1>mean a set of rules for the organization of components

0:05:35.360 --> 0:05:39.760
<v Speaker 1>of a language into meaningful statements or sentences. This is

0:05:39.800 --> 0:05:43.520
<v Speaker 1>a broad concept. It is a big, big idea. It

0:05:43.560 --> 0:05:47.479
<v Speaker 1>actually encompasses a couple of other big ideas that

0:05:47.520 --> 0:05:50.880
<v Speaker 1>are important in natural language processing. One of those is

0:05:50.920 --> 0:05:56.400
<v Speaker 1>the concept of morphology. Morphology has to do with word forms.

0:05:57.240 --> 0:06:01.080
<v Speaker 1>Words consist of morphemes, and a word can actually

0:06:01.120 --> 0:06:04.599
<v Speaker 1>have multiple morphemes. So, for example, let's take a

0:06:04.640 --> 0:06:10.080
<v Speaker 1>word like skydivers. Skydivers technically has four morphemes,

0:06:10.120 --> 0:06:16.840
<v Speaker 1>and they are sky, dive, -er, and -s. Skydivers.

0:06:16.880 --> 0:06:20.080
<v Speaker 1>The morphemes only make sense if we put them

0:06:20.120 --> 0:06:24.760
<v Speaker 1>in that particular order for the word skydivers. Diveskyers

0:06:24.839 --> 0:06:27.760
<v Speaker 1>does not mean the same thing. Actually, it doesn't mean

0:06:27.880 --> 0:06:30.200
<v Speaker 1>anything at all. So a good system will have to

0:06:30.240 --> 0:06:34.200
<v Speaker 1>understand morphology and know how words can and cannot be formed.

0:06:34.600 --> 0:06:38.039
<v Speaker 1>So again, with skydivers, the system knows: all right, well, I

0:06:38.200 --> 0:06:40.320
<v Speaker 1>know the word sky, I know what that means. I

0:06:40.360 --> 0:06:43.279
<v Speaker 1>know what the word dive means. The -er means that this

0:06:43.360 --> 0:06:47.040
<v Speaker 1>is not an action. This is actually an entity that

0:06:47.160 --> 0:06:50.599
<v Speaker 1>engages in that action. Right. A skydiver is someone

0:06:50.640 --> 0:06:54.919
<v Speaker 1>who skydives, and the -s also says it's plural, so

0:06:54.960 --> 0:06:59.200
<v Speaker 1>that there's more than one skydiver. That's what morphology is

0:06:59.240 --> 0:07:02.880
<v Speaker 1>all about. This is the sort of internal logic of

0:07:02.920 --> 0:07:09.240
<v Speaker 1>word formation. Syntax is another big concept within grammar. Syntax, however,

0:07:09.320 --> 0:07:13.560
<v Speaker 1>does not refer to word formation. It refers to sentence structure.

0:07:13.600 --> 0:07:18.680
<v Speaker 1>How do we arrange words to make meaningful sentences? For example,

0:07:18.880 --> 0:07:23.200
<v Speaker 1>the sentence "You must have patience, my young Padawan." That

0:07:23.240 --> 0:07:27.560
<v Speaker 1>follows good syntax, but "Patience you must have, my young

0:07:27.640 --> 0:07:31.360
<v Speaker 1>Padawan" is a bit wonky because Yoda is all over

0:07:31.400 --> 0:07:35.760
<v Speaker 1>the place with his syntax. In addition to grammar, you

0:07:35.840 --> 0:07:39.240
<v Speaker 1>also have to take into account semantics. Now, that is

0:07:39.280 --> 0:07:43.240
<v Speaker 1>the study of the meaning within language. This is a

0:07:43.240 --> 0:07:46.160
<v Speaker 1>tricky one because there's a lot to unwrap here. For example,

0:07:46.480 --> 0:07:50.440
<v Speaker 1>words and phrases can actually stand for different meanings. They

0:07:50.440 --> 0:07:54.960
<v Speaker 1>can denote different ideas. We might use many different phrases

0:07:55.120 --> 0:07:58.320
<v Speaker 1>or words to describe the same concept. Right? So we

0:07:58.400 --> 0:08:02.320
<v Speaker 1>might use a dozen or more different ways to say

0:08:02.360 --> 0:08:05.840
<v Speaker 1>the same thing, or we might use two similar words

0:08:05.960 --> 0:08:09.240
<v Speaker 1>or phrases to describe very different concepts. We might even

0:08:09.360 --> 0:08:13.880
<v Speaker 1>use the same phrase to describe wildly different things or

0:08:13.920 --> 0:08:16.840
<v Speaker 1>with very different meanings. Semantics gets down to what we

0:08:16.880 --> 0:08:20.320
<v Speaker 1>actually mean when we say something. If you've ever had

0:08:20.360 --> 0:08:23.920
<v Speaker 1>a discussion with someone and that person says, "You know

0:08:24.000 --> 0:08:27.800
<v Speaker 1>what I meant," that's essentially a statement that indicates semantically

0:08:28.280 --> 0:08:31.800
<v Speaker 1>the meaning was clear, even if the phrasing did not

0:08:32.000 --> 0:08:35.800
<v Speaker 1>indicate it on the face of things. Then there is

0:08:35.880 --> 0:08:41.600
<v Speaker 1>pragmatics, which is all about context. Contextual information is incredibly important

0:08:41.600 --> 0:08:45.240
<v Speaker 1>in communication, and it relates a little bit to semantics.

0:08:45.320 --> 0:08:50.000
<v Speaker 1>Semantics is about literal meaning, and pragmatics is about context. So

0:08:50.040 --> 0:08:53.920
<v Speaker 1>if I say, "The weather sure is nice today," on

0:08:54.080 --> 0:08:55.880
<v Speaker 1>the face of it, that sounds like I'm in favor

0:08:56.080 --> 0:08:58.520
<v Speaker 1>of the way the weather is. Right, it sounds like, oh,

0:08:58.640 --> 0:09:01.120
<v Speaker 1>I like how the weather is. But if I say

0:09:01.120 --> 0:09:04.800
<v Speaker 1>that same phrase while I'm standing in a downpour and

0:09:04.880 --> 0:09:08.960
<v Speaker 1>I'm clearly not happy, I'm obviously being sarcastic. I mean

0:09:09.000 --> 0:09:12.600
<v Speaker 1>the opposite of what I actually said. The context of

0:09:12.640 --> 0:09:16.240
<v Speaker 1>the situation changes the meaning of what I am saying,

0:09:16.600 --> 0:09:19.839
<v Speaker 1>even though the actual phrasing would seem to indicate the

0:09:19.920 --> 0:09:23.959
<v Speaker 1>opposite of what my meaning was. As we develop more

0:09:24.000 --> 0:09:26.600
<v Speaker 1>technology that can communicate with us, we have to take

0:09:26.600 --> 0:09:30.120
<v Speaker 1>pragmatics into consideration, or else machines are going to be

0:09:30.160 --> 0:09:34.080
<v Speaker 1>misinterpreting what we actually mean when we say stuff. So

0:09:34.320 --> 0:09:36.160
<v Speaker 1>machines are going to have to learn how to deal

0:09:36.200 --> 0:09:41.280
<v Speaker 1>with stuff like sarcasm. Yeah. Right. Then we have phonology,

0:09:41.400 --> 0:09:44.680
<v Speaker 1>that is the sound of a language. I talked a

0:09:44.679 --> 0:09:48.000
<v Speaker 1>little bit about this in the Speech Recognition podcast about

0:09:48.000 --> 0:09:51.000
<v Speaker 1>how different languages have different phonemes. So I'm not going

0:09:51.040 --> 0:09:52.960
<v Speaker 1>to dwell on that again. You can listen to the

0:09:53.000 --> 0:09:56.200
<v Speaker 1>Speech Recognition podcast to learn more about it. But it

0:09:56.320 --> 0:09:59.439
<v Speaker 1>is an important element in languages, especially when you get

0:09:59.480 --> 0:10:05.000
<v Speaker 1>into natural language processing that takes verbal input

0:10:05.120 --> 0:10:09.520
<v Speaker 1>and not just textual input. Then you have lexicons. That's

0:10:09.559 --> 0:10:14.240
<v Speaker 1>the total vocabulary for a system. Ideally, a lexicon has not

0:10:14.360 --> 0:10:18.240
<v Speaker 1>just the words, but some sort of metadata attached that

0:10:18.360 --> 0:10:22.000
<v Speaker 1>indicates the meaning of words or the relationship of words

0:10:22.080 --> 0:10:24.760
<v Speaker 1>with one another. Though you can fudge this a little

0:10:24.760 --> 0:10:27.280
<v Speaker 1>bit depending upon the implementation of the system. I'll talk

0:10:27.320 --> 0:10:30.719
<v Speaker 1>a lot more about that throughout these podcasts. Now, these

0:10:30.760 --> 0:10:34.840
<v Speaker 1>can be tricky concepts for human beings, let alone for machines.

0:10:35.160 --> 0:10:39.640
<v Speaker 1>Machines are very good at following strict sets of instructions,

0:10:40.120 --> 0:10:43.760
<v Speaker 1>but language can sometimes defy logic. Think of rules that

0:10:43.840 --> 0:10:47.960
<v Speaker 1>apply to your native language, then just think of the

0:10:48.000 --> 0:10:52.040
<v Speaker 1>exceptions that exist to those rules. Every language has exceptions

0:10:52.080 --> 0:10:55.520
<v Speaker 1>to rules that are established, and depending upon the rule

0:10:55.679 --> 0:10:58.160
<v Speaker 1>and the exception, there may seem to be no rhyme

0:10:58.440 --> 0:11:01.600
<v Speaker 1>or reason for the deviation from the rule. Moreover,

0:11:02.240 --> 0:11:05.480
<v Speaker 1>if we want machines that are capable of understanding us

0:11:05.640 --> 0:11:08.680
<v Speaker 1>and responding to our language in a meaningful way, those

0:11:08.720 --> 0:11:12.040
<v Speaker 1>machines need to be able to handle the idiosyncrasies of

0:11:12.120 --> 0:11:16.360
<v Speaker 1>individual speakers, to some extent. There may be regional turns

0:11:16.400 --> 0:11:19.880
<v Speaker 1>of phrase or vocabulary that don't extend to the general

0:11:19.920 --> 0:11:24.199
<v Speaker 1>population of speakers of the respective language. So you might

0:11:24.440 --> 0:11:29.680
<v Speaker 1>encounter a person who speaks in local idioms quite a bit,

0:11:30.320 --> 0:11:33.520
<v Speaker 1>and if those are not frequently used in the broader

0:11:33.559 --> 0:11:37.320
<v Speaker 1>general population of that language, then you're gonna have a

0:11:37.320 --> 0:11:40.680
<v Speaker 1>lot of communication errors between that person and a machine

0:11:40.800 --> 0:11:44.880
<v Speaker 1>that is trying to process that language. Ideally, machines would

0:11:44.880 --> 0:11:48.520
<v Speaker 1>be able to understand whatever we say and interpret the

0:11:48.600 --> 0:11:52.360
<v Speaker 1>meaning correctly, although we haven't even gotten to a world

0:11:52.360 --> 0:11:54.920
<v Speaker 1>where human beings can do that reliably, so I don't

0:11:54.920 --> 0:11:57.360
<v Speaker 1>know why I'm holding machines up to such a high standard.

0:11:57.600 --> 0:11:59.960
<v Speaker 1>We definitely would want them to reach a certain level

0:12:00.200 --> 0:12:05.319
<v Speaker 1>of confidence and capability; however, machines just are

0:12:05.360 --> 0:12:09.200
<v Speaker 1>not quite there yet. I'm going to talk a lot

0:12:09.360 --> 0:12:13.640
<v Speaker 1>more about the history of natural language processing in just

0:12:13.679 --> 0:12:16.680
<v Speaker 1>a moment, but first let's take a quick break to

0:12:16.800 --> 0:12:27.960
<v Speaker 1>thank our sponsor. The history of natural language processing is

0:12:28.120 --> 0:12:32.920
<v Speaker 1>pretty darn complicated because it involves multiple lines of research

0:12:33.120 --> 0:12:37.559
<v Speaker 1>and lots of different disciplines. So we have all sorts

0:12:37.559 --> 0:12:40.480
<v Speaker 1>of things that play into this, like hidden Markov models

0:12:40.520 --> 0:12:45.000
<v Speaker 1>(I talked about those in the Speech Recognition podcast), neural networks,

0:12:45.360 --> 0:12:50.239
<v Speaker 1>referencing language using mathematical vectors, and a lot more contributing

0:12:50.240 --> 0:12:53.240
<v Speaker 1>to the evolution of natural language processing, and a lot

0:12:53.280 --> 0:12:58.359
<v Speaker 1>of disciplines, not just computer science, but linguistics and psychology.

0:12:58.520 --> 0:13:02.880
<v Speaker 1>So there's not like a single line I can follow

0:13:03.240 --> 0:13:07.040
<v Speaker 1>where it's A leads to B leads to C. So

0:13:07.080 --> 0:13:10.240
<v Speaker 1>we're gonna be jumping around a little bit. However, one

0:13:10.280 --> 0:13:12.160
<v Speaker 1>of the sources I want to call out that I

0:13:12.240 --> 0:13:15.040
<v Speaker 1>used while I was researching this episode was a paper

0:13:15.080 --> 0:13:20.160
<v Speaker 1>written by Karen Spärck Jones called Natural Language Processing: A

0:13:20.240 --> 0:13:24.320
<v Speaker 1>Historical Review. It's pretty dense, it's pretty technical, but it's

0:13:24.360 --> 0:13:26.800
<v Speaker 1>also available to read online if you want a more

0:13:26.840 --> 0:13:29.840
<v Speaker 1>thorough treatment of the history of the technology up to

0:13:29.880 --> 0:13:32.400
<v Speaker 1>two thousand. I'm gonna be skimming over quite a bit

0:13:32.440 --> 0:13:35.000
<v Speaker 1>of it because, as I say, it gets really deep

0:13:35.040 --> 0:13:38.200
<v Speaker 1>and really technical, and it uses a lot of shorthand

0:13:38.240 --> 0:13:40.600
<v Speaker 1>to reference things, which meant that I had to do

0:13:40.640 --> 0:13:43.800
<v Speaker 1>a lot of jumping down research rabbit holes to learn more.

0:13:43.840 --> 0:13:47.960
<v Speaker 1>But it was a very useful starting point for this research.

0:13:48.440 --> 0:13:51.040
<v Speaker 1>And also it was published in two thousand one. Obviously

0:13:51.480 --> 0:13:54.320
<v Speaker 1>a lot has happened since then. We're almost two decades

0:13:54.400 --> 0:13:58.280
<v Speaker 1>out from that. But I'm gonna start at the beginning

0:13:58.320 --> 0:14:01.360
<v Speaker 1>and then work my way up to what's going on today.

0:14:01.400 --> 0:14:05.320
<v Speaker 1>So, early work in natural language processing actually surprised me.

0:14:05.360 --> 0:14:07.640
<v Speaker 1>I was surprised at how old it was. It actually

0:14:07.720 --> 0:14:11.079
<v Speaker 1>dates all the way back to the nineteen forties. Physicist

0:14:11.120 --> 0:14:15.360
<v Speaker 1>and computer scientist Andrew Donald Booth proposed using computers to

0:14:15.400 --> 0:14:19.360
<v Speaker 1>translate passages from one language into another, which is a

0:14:19.400 --> 0:14:21.640
<v Speaker 1>type of natural language processing. You have to be able

0:14:21.640 --> 0:14:25.200
<v Speaker 1>to recognize the words of one language and then map

0:14:25.320 --> 0:14:28.800
<v Speaker 1>them to a similar meaning in a different language. Now,

0:14:28.840 --> 0:14:33.000
<v Speaker 1>Booth's approach involved creating a word-for-word model. If

0:14:33.000 --> 0:14:36.440
<v Speaker 1>the model couldn't find a match between two words, it

0:14:36.480 --> 0:14:40.440
<v Speaker 1>would automatically discard the last letter on the input word

0:14:40.760 --> 0:14:43.840
<v Speaker 1>and try again. It would do this until it found

0:14:43.840 --> 0:14:45.840
<v Speaker 1>a match, or if it didn't find a match, you've

0:14:45.840 --> 0:14:48.720
<v Speaker 1>got an error. If it did find a match, it

0:14:48.760 --> 0:14:51.080
<v Speaker 1>would search its memory to see if the ending of

0:14:51.120 --> 0:14:54.320
<v Speaker 1>the input word could give information about what the ending

0:14:54.440 --> 0:14:57.920
<v Speaker 1>does to the meaning of the word. So, for example,

0:14:58.120 --> 0:15:01.240
<v Speaker 1>if you were using this to translate from English

0:15:01.320 --> 0:15:06.680
<v Speaker 1>into Russian and you use the word writer, maybe writer

0:15:07.040 --> 0:15:11.240
<v Speaker 1>does not show up in the Russian lexicon, but write

0:15:11.720 --> 0:15:17.840
<v Speaker 1>does, W-R-I-T-E. So the translating program

0:15:17.880 --> 0:15:21.800
<v Speaker 1>tries to translate writer from English into Russian, cannot find

0:15:21.840 --> 0:15:25.760
<v Speaker 1>a Russian equivalent to writer, drops the R, looks for

0:15:25.800 --> 0:15:28.440
<v Speaker 1>the Russian word for write, and it finds it. Then it says,

0:15:28.480 --> 0:15:31.640
<v Speaker 1>all right, well, in English, what does the R in writer mean? What

0:15:31.680 --> 0:15:36.200
<v Speaker 1>does that R do to the word write? And it

0:15:36.240 --> 0:15:38.480
<v Speaker 1>looks at its memory and finds out that the letter

0:15:38.720 --> 0:15:43.040
<v Speaker 1>R makes a noun out of the verb, but

0:15:43.240 --> 0:15:47.160
<v Speaker 1>it creates an entity that does the action, which is

0:15:47.400 --> 0:15:51.280
<v Speaker 1>to write. Then it looks in the Russian lexicon and says,

0:15:51.520 --> 0:15:54.720
<v Speaker 1>all right, well, is there a word in that lexicon

0:15:55.160 --> 0:15:59.400
<v Speaker 1>that matches this meaning? It's kind of a slow, laborious

0:15:59.480 --> 0:16:02.240
<v Speaker 1>way of doing things, but it was also very, very early.
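
NOTE
A minimal Python sketch of the word-for-word lookup with last-letter stripping described above. The names LEXICON, ENDINGS, and translate_word are hypothetical, and both tables are toy data chosen for illustration, not Booth's actual dictionary or program.
```python
# Hypothetical sketch of word-for-word lookup with suffix stripping.
# LEXICON maps an English stem to an illustrative Russian gloss (toy data).
LEXICON = {"write": "писать", "read": "читать"}
# ENDINGS records what a stripped ending does to the meaning (also toy data).
ENDINGS = {"r": "agent noun (one who does the action)", "s": "plural or third person"}
def translate_word(word):
    """Drop letters from the end of the word until the stem is in LEXICON."""
    stem, removed = word.lower(), ""
    while stem:
        if stem in LEXICON:
            # Report what the discarded ending means, if the ending is recorded.
            note = ENDINGS.get(removed) if removed else None
            return LEXICON[stem], removed, note
        removed = stem[-1] + removed  # keep the discarded letters in their original order
        stem = stem[:-1]
    # Every letter was stripped without a match: report an error.
    raise KeyError(f"no match for {word!r}")
print(translate_word("writer"))  # ('писать', 'r', 'agent noun (one who does the action)')
print(translate_word("write"))   # ('писать', '', None)
```
The final step the episode mentions, searching the Russian side for a word that carries the combined meaning, is left out of this sketch.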

0:16:02.280 --> 0:16:07.360
<v Speaker 1>I mean, it was the following year, in nineteen forty-nine, Warren

0:16:07.440 --> 0:16:12.160
<v Speaker 1>Weaver produced a memorandum about machine translation, and Weaver admitted

0:16:12.200 --> 0:16:14.840
<v Speaker 1>in the memorandum that such an application would likely be

0:16:14.960 --> 0:16:17.760
<v Speaker 1>much more challenging than what he understood it to be,

0:16:18.360 --> 0:16:21.560
<v Speaker 1>but that he was quote willing to expose my ignorance,

0:16:21.600 --> 0:16:24.920
<v Speaker 1>hoping that it will be slightly shielded by my intentions,

0:16:25.000 --> 0:16:28.640
<v Speaker 1>end quote. And I think that's rather charming. In that memo,

0:16:29.080 --> 0:16:32.560
<v Speaker 1>Weaver cites a letter he wrote to Professor Norbert Wiener

0:16:32.680 --> 0:16:36.800
<v Speaker 1>of MIT, and that included the following paragraph.

0:16:36.880 --> 0:16:40.600
<v Speaker 1>So here's a full paragraph. Actually it's two paragraphs from

0:16:40.640 --> 0:16:45.920
<v Speaker 1>the memorandum: "Recognizing fully, even though necessarily vaguely, the semantic

0:16:46.000 --> 0:16:50.400
<v Speaker 1>difficulties because of multiple meanings, etc., I have wondered if

0:16:50.440 --> 0:16:54.119
<v Speaker 1>it were unthinkable to design a computer which would translate.

0:16:54.520 --> 0:16:58.360
<v Speaker 1>Even if it would translate only scientific material, where

0:16:58.360 --> 0:17:02.280
<v Speaker 1>the semantic difficulties are very notably less, and even if

0:17:02.320 --> 0:17:06.440
<v Speaker 1>it did produce an inelegant but intelligible result, it would

0:17:06.440 --> 0:17:10.960
<v Speaker 1>seem to me worthwhile. Also knowing nothing official about, but

0:17:11.280 --> 0:17:16.199
<v Speaker 1>having guessed and inferred considerable about, powerful new mechanized methods

0:17:16.200 --> 0:17:20.040
<v Speaker 1>in cryptography, methods which I believe succeed even when one

0:17:20.080 --> 0:17:23.560
<v Speaker 1>does not know what language has been coded, one naturally

0:17:23.640 --> 0:17:27.400
<v Speaker 1>wonders if the problem of translation could conceivably be treated

0:17:27.440 --> 0:17:30.600
<v Speaker 1>as a problem in cryptography. When I look at an

0:17:30.680 --> 0:17:34.040
<v Speaker 1>article in Russian, I say, 'This is really written in English,

0:17:34.119 --> 0:17:36.960
<v Speaker 1>but it has been coded in some strange symbols. I

0:17:37.000 --> 0:17:40.439
<v Speaker 1>will now proceed to decode.'" So he got this idea

0:17:40.480 --> 0:17:43.720
<v Speaker 1>because of activities that were going on in World War Two,

0:17:44.240 --> 0:17:48.240
<v Speaker 1>where teams were trying to decode messages. And they might

0:17:48.400 --> 0:17:52.520
<v Speaker 1>decode the message, they might figure out what letters correspond

0:17:52.600 --> 0:17:55.159
<v Speaker 1>to the code, but it may even be in a

0:17:55.160 --> 0:17:58.400
<v Speaker 1>totally different language than the one they speak. So while they

0:17:58.400 --> 0:18:01.680
<v Speaker 1>are able to decode the message into its native language,

0:18:01.720 --> 0:18:04.320
<v Speaker 1>they are not able to speak that language. He says, well,

0:18:04.320 --> 0:18:07.000
<v Speaker 1>what if we just take that same step, and now

0:18:07.040 --> 0:18:10.399
<v Speaker 1>we treat the other language as a code in and of

0:18:10.480 --> 0:18:13.320
<v Speaker 1>itself and try to translate that into English, or

0:18:13.560 --> 0:18:17.600
<v Speaker 1>decrypt it into English. Weaver acknowledged that the

0:18:17.800 --> 0:18:21.320
<v Speaker 1>word-for-word approach that Booth and his contemporaries were relying

0:18:21.400 --> 0:18:25.080
<v Speaker 1>upon had limited utility. He wrote, quote, it is in

0:18:25.160 --> 0:18:28.639
<v Speaker 1>fact amply clear that a translation procedure that does little

0:18:28.640 --> 0:18:31.200
<v Speaker 1>more than handle a one to one correspondence of words

0:18:31.520 --> 0:18:35.440
<v Speaker 1>cannot hope to be useful for problems of literary translation

0:18:35.760 --> 0:18:38.680
<v Speaker 1>in which style is important, and in which the problems

0:18:38.720 --> 0:18:42.879
<v Speaker 1>of idiom, multiple meanings, etcetera, are frequent. End quote. So

0:18:42.920 --> 0:18:46.679
<v Speaker 1>there he's saying, you can't just take a foreign word,

0:18:47.160 --> 0:18:51.560
<v Speaker 1>translate it into whatever the closest equivalent in English is,

0:18:52.080 --> 0:18:55.520
<v Speaker 1>and hope to get the same meaning, especially in literary works,

0:18:55.640 --> 0:18:58.720
<v Speaker 1>because there are all these different turns of phrase and

0:18:58.840 --> 0:19:03.199
<v Speaker 1>cultural meanings that will get lost in that translation. You

0:19:03.240 --> 0:19:07.280
<v Speaker 1>would have something that might technically be considered more or

0:19:07.359 --> 0:19:10.520
<v Speaker 1>less correct, but would not be actually correct. You wouldn't

0:19:10.520 --> 0:19:14.840
<v Speaker 1>be getting across the meaning of the author in that translation.

0:19:15.080 --> 0:19:19.800
<v Speaker 1>You would just have words in a syntactical order that

0:19:19.880 --> 0:19:24.360
<v Speaker 1>would make sense from a syntax perspective. In other words,

0:19:24.400 --> 0:19:28.600
<v Speaker 1>you would have sentences that held up grammatically, but they

0:19:28.600 --> 0:19:33.240
<v Speaker 1>wouldn't necessarily have the meaning of the original writing. Weaver's

0:19:33.240 --> 0:19:36.600
<v Speaker 1>proposal was to perhaps expand the word-for-word model

0:19:36.680 --> 0:19:39.000
<v Speaker 1>and create a system that would analyze not just the

0:19:39.040 --> 0:19:42.800
<v Speaker 1>target word, but the words adjacent to the target in

0:19:42.920 --> 0:19:46.479
<v Speaker 1>order to determine the context of the word, the meaning

0:19:46.680 --> 0:19:48.720
<v Speaker 1>of the word. As we'll see when we get a

0:19:48.760 --> 0:19:51.359
<v Speaker 1>little bit further down in the timeline, this is one

0:19:51.400 --> 0:19:54.520
<v Speaker 1>of the methods that folks working in natural language

0:19:54.520 --> 0:19:58.080
<v Speaker 1>processing incorporated into their approach. So this was incredibly forward

0:19:58.080 --> 0:20:02.800
<v Speaker 1>thinking of Weaver. On January seventh, nineteen fifty four, researchers from

0:20:02.800 --> 0:20:06.720
<v Speaker 1>IBM and Georgetown University demonstrated a system that was able

0:20:06.760 --> 0:20:12.760
<v Speaker 1>to translate around sixty sentences from Russian into English automatically. Now,

0:20:12.760 --> 0:20:16.359
<v Speaker 1>the process wasn't exactly painless. It required an operator to

0:20:16.440 --> 0:20:19.800
<v Speaker 1>take a sentence written in Russian but transliterated into the

0:20:19.800 --> 0:20:23.439
<v Speaker 1>English alphabet. It wasn't in the Cyrillic alphabet. The person

0:20:23.520 --> 0:20:27.800
<v Speaker 1>would then encode that sentence on punch cards. They would

0:20:27.800 --> 0:20:30.800
<v Speaker 1>feed the punch cards into a seven oh one computer.

0:20:31.359 --> 0:20:34.480
<v Speaker 1>The seven oh one was an IBM system,

0:20:34.520 --> 0:20:37.080
<v Speaker 1>and I mentioned it in the previous episode on speech recognition.

0:20:37.359 --> 0:20:40.440
<v Speaker 1>Then they would wait for the translation program's response, which

0:20:40.440 --> 0:20:43.000
<v Speaker 1>would take a few seconds. The program would attempt to

0:20:43.040 --> 0:20:47.480
<v Speaker 1>translate the words from Russian to English. The demonstration was impressive,

0:20:47.600 --> 0:20:50.480
<v Speaker 1>but it was limited in scope. The program had a lexicon

0:20:50.560 --> 0:20:53.640
<v Speaker 1>of only two hundred fifty words or so, and it required

0:20:53.680 --> 0:20:58.199
<v Speaker 1>extensive programming to cope with syntax because word order in

0:20:58.320 --> 0:21:02.679
<v Speaker 1>Russian is different than word order in English, and you

0:21:02.720 --> 0:21:06.560
<v Speaker 1>can think of the programming as including metadata. The researchers

0:21:06.560 --> 0:21:11.000
<v Speaker 1>would tag Russian words with little signs that related to

0:21:11.119 --> 0:21:14.480
<v Speaker 1>specific rules. So, for example, one of the terms the

0:21:14.520 --> 0:21:18.760
<v Speaker 1>system could translate was a Russian two-word phrase. It

0:21:18.840 --> 0:21:25.200
<v Speaker 1>was general mayor (I'm butchering the Russian pronunciation),

0:21:25.280 --> 0:21:28.760
<v Speaker 1>but in English it means major general. But the word

0:21:28.880 --> 0:21:32.119
<v Speaker 1>order is reversed in Russian. If you did a strict

0:21:32.160 --> 0:21:35.760
<v Speaker 1>word-for-word translation, you would get general major as

0:21:35.840 --> 0:21:39.600
<v Speaker 1>the translation, because that's the order that the Russian phrase

0:21:39.600 --> 0:21:42.560
<v Speaker 1>would put it in. So the programmers would tag each

0:21:42.600 --> 0:21:45.879
<v Speaker 1>word with a rule to kind of give the idea

0:21:45.920 --> 0:21:48.880
<v Speaker 1>of what you should follow

0:21:48.920 --> 0:21:51.520
<v Speaker 1>when you're making these translations, and by you I mean

0:21:51.720 --> 0:21:55.359
<v Speaker 1>the computer system. So the word for general got the

0:21:55.400 --> 0:22:00.679
<v Speaker 1>assignment of rule twenty one and the word for major

0:22:01.200 --> 0:22:04.879
<v Speaker 1>got the assignment of rule one. So when the system encountered a word,

0:22:05.240 --> 0:22:08.320
<v Speaker 1>it would look up any rules related to that word.

0:22:08.720 --> 0:22:11.200
<v Speaker 1>So if it comes across a word that has the

0:22:11.240 --> 0:22:14.760
<v Speaker 1>associated rule one, it would say, all right, this rule

0:22:14.800 --> 0:22:17.200
<v Speaker 1>tells me I have to go back over the message

0:22:17.240 --> 0:22:19.240
<v Speaker 1>and look to see if there was a rule twenty

0:22:19.280 --> 0:22:22.720
<v Speaker 1>one word in that same phrase. And if it finds

0:22:22.760 --> 0:22:25.159
<v Speaker 1>a rule twenty one word, it would then know I

0:22:25.240 --> 0:22:29.479
<v Speaker 1>need to reverse the order of these two words. This

0:22:29.480 --> 0:22:33.080
<v Speaker 1>word order that appears in Russian needs to

0:22:33.119 --> 0:22:36.280
<v Speaker 1>be flipped for English. Now that's a pretty laborious process

0:22:36.840 --> 0:22:40.280
<v Speaker 1>and it doesn't work great for larger lexicons. The larger

0:22:40.440 --> 0:22:43.879
<v Speaker 1>the vocabulary, the more complex the sentences can become, the

0:22:43.920 --> 0:22:47.000
<v Speaker 1>more exceptions and rules you're going to encounter. It would

0:22:47.040 --> 0:22:49.640
<v Speaker 1>be really hard to implement this on a big scale,

0:22:49.680 --> 0:22:53.119
<v Speaker 1>but it was an impressive display of machine translation. The

0:22:53.160 --> 0:22:56.440
<v Speaker 1>system was essentially a vocabulary list and a long series

0:22:56.480 --> 0:23:00.919
<v Speaker 1>of if-then rules. If the word is this, then

0:23:01.040 --> 0:23:04.919
<v Speaker 1>look for this. If that is there, then switch the

0:23:05.119 --> 0:23:09.720
<v Speaker 1>word order, essentially. According to articles, it could translate sentences

0:23:09.760 --> 0:23:13.640
<v Speaker 1>designed for the system in about six seconds. But again

0:23:13.720 --> 0:23:17.640
<v Speaker 1>it was designed for the system, very limited vocabulary, so

0:23:18.359 --> 0:23:21.639
<v Speaker 1>limited implementation there. And it's good to point out that

0:23:21.680 --> 0:23:24.200
<v Speaker 1>a lot of work in machine translation around this time

0:23:24.240 --> 0:23:27.919
<v Speaker 1>focused on English and Russian, which is no big surprise.
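
The tag-and-rule scheme described above can be sketched in a few lines of Python. This is a hypothetical toy, not the 1954 program: the two-word lexicon, the transliterations, and the rule numbers (21 and 1) simply follow the episode's description, and rule 1 is assumed to mean "search back through the phrase for a rule-21 word and swap the two."

```python
# Hypothetical toy re-creation of tag-driven word reordering, loosely modeled on
# the episode's description of the 1954 Georgetown-IBM demo. The lexicon and
# rule numbers are illustrative assumptions, not the original program.
from typing import Dict, List, Optional, Tuple

# Transliterated Russian word -> (English gloss, rule tag or None)
LEXICON: Dict[str, Tuple[str, Optional[int]]] = {
    "general": ("general", 21),
    "mayor": ("major", 1),
}


def translate(sentence: str) -> str:
    """Look each word up, then apply rule 1: swap with an earlier rule-21 word."""
    # Unknown words pass through untranslated with no rule attached.
    glosses: List[Tuple[str, Optional[int]]] = [
        LEXICON.get(word, (word, None)) for word in sentence.lower().split()
    ]
    for i in range(len(glosses)):
        if glosses[i][1] == 1:
            # Rule 1: search backward through the phrase for a rule-21 word.
            for j in range(i - 1, -1, -1):
                if glosses[j][1] == 21:
                    glosses[i], glosses[j] = glosses[j], glosses[i]
                    break
    return " ".join(english for english, _ in glosses)


print(translate("general mayor"))  # major general
```

Every new word-order exception needs another tag and another hand-written rule, which is the scaling problem described in the episode.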

0:23:28.720 --> 0:23:30.720
<v Speaker 1>Keep in mind the time scale we're talking about the

0:23:30.800 --> 0:23:34.719
<v Speaker 1>nineteen fifties here. The USA and the then USSR

0:23:34.880 --> 0:23:38.000
<v Speaker 1>were not on great terms. Both countries were using pretty

0:23:38.080 --> 0:23:41.439
<v Speaker 1>much every means at their disposal to analyze one another,

0:23:41.960 --> 0:23:44.919
<v Speaker 1>to spy on one another, to maneuver to make certain

0:23:44.960 --> 0:23:47.800
<v Speaker 1>the other nation didn't get a superior position. And we

0:23:47.800 --> 0:23:50.640
<v Speaker 1>saw a lot of technological development during this period, including

0:23:50.680 --> 0:23:53.800
<v Speaker 1>the space race that was all wrapped up in this

0:23:53.880 --> 0:23:57.840
<v Speaker 1>Cold War issue as well, and perhaps as no big surprise,

0:23:57.920 --> 0:24:00.520
<v Speaker 1>the US government was pretty keen to fund research and

0:24:00.560 --> 0:24:03.919
<v Speaker 1>development in machine translation, up to a point, that is.

0:24:04.240 --> 0:24:09.440
<v Speaker 1>In nineteen sixty six, Joseph Weizenbaum published a computer program

0:24:09.480 --> 0:24:12.919
<v Speaker 1>called Eliza. I've talked about Eliza in previous episodes of

0:24:12.960 --> 0:24:16.800
<v Speaker 1>Tech Stuff. This was a primitive chat bot, a text-based

0:24:17.000 --> 0:24:22.119
<v Speaker 1>chat bot. It mimicked a Rogerian psychotherapist. That's a discipline

0:24:22.160 --> 0:24:26.440
<v Speaker 1>that was pioneered by the psychologist Carl Rogers. It's sometimes

0:24:26.480 --> 0:24:31.679
<v Speaker 1>also called person-centered therapy. Eliza was strictly this text

0:24:31.680 --> 0:24:34.760
<v Speaker 1>based terminal operation. You would see a line of text

0:24:34.800 --> 0:24:37.000
<v Speaker 1>pop up. It would ask you how you're doing.

0:24:37.440 --> 0:24:39.080
<v Speaker 1>You could type stuff in and then it would respond

0:24:39.119 --> 0:24:43.399
<v Speaker 1>to you, so you would get the responses that appeared

0:24:43.440 --> 0:24:46.400
<v Speaker 1>to be semi-intelligent. Typically it would be a question

0:24:46.440 --> 0:24:49.600
<v Speaker 1>to ask for more information, or sometimes it would be

0:24:49.640 --> 0:24:52.600
<v Speaker 1>a phrase to change the subject. So you might say

0:24:53.240 --> 0:24:56.640
<v Speaker 1>something along the lines of I'm so angry right now,

0:24:56.880 --> 0:24:59.800
<v Speaker 1>and Eliza might respond with what has made you angry?

0:25:00.320 --> 0:25:03.880
<v Speaker 1>So Eliza has flipped this around in order to sustain

0:25:03.920 --> 0:25:06.280
<v Speaker 1>the conversation. Then you could type in something else. Maybe

0:25:06.280 --> 0:25:09.760
<v Speaker 1>you type in everything is going wrong today, and Eliza

0:25:09.880 --> 0:25:12.600
<v Speaker 1>might respond with can you give me an example? And

0:25:12.640 --> 0:25:15.800
<v Speaker 1>so on. Eliza would give the appearance of understanding

0:25:15.800 --> 0:25:18.640
<v Speaker 1>the subject, but in reality it was simply taking the input,

0:25:19.320 --> 0:25:22.040
<v Speaker 1>analyzing the parts of speech, then sending back a very

0:25:22.080 --> 0:25:25.760
<v Speaker 1>similar message or a related message in an effort to

0:25:25.840 --> 0:25:28.880
<v Speaker 1>keep the conversation going. Like it might just be a placeholder.
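
A minimal sketch of the keyword-and-reflection trick described above, assuming a made-up rule list. Weizenbaum's real ELIZA used a larger script of ranked keywords with decomposition and reassembly rules; the patterns, the pronoun table, and the stock responses here are illustrative assumptions chosen to reproduce the two exchanges from the episode.

```python
import re

# Illustrative keyword rules: (compiled pattern, response template).
# These are assumptions, not Weizenbaum's original DOCTOR script.
RULES = [
    (re.compile(r"\bi'?m (?:so |really |very )?(\w+)", re.IGNORECASE), "What has made you {0}?"),
    (re.compile(r"\beverything\b", re.IGNORECASE), "Can you give me an example?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
]

# Swap first- and second-person words so a fragment can be echoed back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}

# Content-free placeholder replies used when no keyword matches.
STOCK_RESPONSES = ["Please go on.", "Tell me more.", "How does that make you feel?"]


def reflect(fragment: str) -> str:
    """Drop trailing punctuation and flip pronouns in a captured fragment."""
    words = fragment.strip().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in words)


def respond(text: str, turn: int = 0) -> str:
    """Return the first matching rule's response, else cycle through stock replies."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return STOCK_RESPONSES[turn % len(STOCK_RESPONSES)]


print(respond("I'm so angry right now"))           # What has made you angry?
print(respond("Everything is going wrong today"))  # Can you give me an example?
```

Nothing in this sketch models meaning: it only matches surface patterns and echoes fragments back, which is the limitation the episode points out.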

0:25:29.520 --> 0:25:33.000
<v Speaker 1>The program did not understand language or context beyond being

0:25:33.040 --> 0:25:35.840
<v Speaker 1>able to parse the basic parts of a sentence and

0:25:35.880 --> 0:25:39.480
<v Speaker 1>then rearrange them or go with several stock responses when

0:25:40.040 --> 0:25:42.399
<v Speaker 1>it didn't have a way of figuring out what it

0:25:42.400 --> 0:25:47.359
<v Speaker 1>should do. Nineteen sixty six also saw something else that would end

0:25:47.440 --> 0:25:50.400
<v Speaker 1>up creating a big setback for natural

0:25:50.520 --> 0:25:54.840
<v Speaker 1>language processing researchers. But I'll explain more about that when

0:25:54.920 --> 0:26:06.160
<v Speaker 1>we come back after a quick break to thank our sponsors. Okay,

0:26:06.200 --> 0:26:11.080
<v Speaker 1>So nineteen sixty six, what happened that set back research

0:26:11.119 --> 0:26:15.040
<v Speaker 1>in this field. Well, that's when a report was published

0:26:15.040 --> 0:26:18.359
<v Speaker 1>that had a dramatic impact on funding for R and

0:26:18.440 --> 0:26:22.680
<v Speaker 1>D in machine translation. It was called the ALPAC Report.

0:26:23.240 --> 0:26:27.680
<v Speaker 1>ALPAC, A-L-P-A-C, stood for Automatic Language

0:26:27.760 --> 0:26:31.800
<v Speaker 1>Processing Advisory Committee. This was a group consisting of various

0:26:31.800 --> 0:26:36.280
<v Speaker 1>experts in fields ranging from computer science to linguistics to psychology,

0:26:36.560 --> 0:26:38.960
<v Speaker 1>and the U.S. government had established the committee back

0:26:39.000 --> 0:26:42.119
<v Speaker 1>in nineteen sixty four, and they had a very simple assignment,

0:26:42.440 --> 0:26:46.000
<v Speaker 1>or at least simple on the surface, which was evaluate

0:26:46.119 --> 0:26:49.880
<v Speaker 1>the progress that was being made in automatic machine translation

0:26:50.160 --> 0:26:53.360
<v Speaker 1>across the board, look at what everyone's working on. Give

0:26:53.480 --> 0:26:56.320
<v Speaker 1>us an idea of where we are and where we're headed.

0:26:56.680 --> 0:27:00.720
<v Speaker 1>The nineteen sixty six report essentially concluded that the field

0:27:00.800 --> 0:27:03.800
<v Speaker 1>was still in its infancy, and that before any real

0:27:03.840 --> 0:27:07.479
<v Speaker 1>advancements could happen, a lot more basic research in the

0:27:07.520 --> 0:27:11.879
<v Speaker 1>field of computational linguistics would be required. So essentially, the

0:27:11.960 --> 0:27:14.840
<v Speaker 1>report was saying, we're trying to move at a full gallop,

0:27:14.880 --> 0:27:17.040
<v Speaker 1>but we still aren't really sure how to get on

0:27:17.080 --> 0:27:21.520
<v Speaker 1>the horse. I'm paraphrasing, of course. One result of this

0:27:21.880 --> 0:27:25.119
<v Speaker 1>was that the US government began to scale back grants

0:27:25.200 --> 0:27:28.639
<v Speaker 1>for research in the field of machine translation. This was,

0:27:28.720 --> 0:27:33.000
<v Speaker 1>unfortunately, exactly the opposite of what needed to happen. The

0:27:33.040 --> 0:27:37.280
<v Speaker 1>US government wanted more immediate results and decided, well, if

0:27:37.320 --> 0:27:39.680
<v Speaker 1>you're not going to get results right away, we're gonna

0:27:39.760 --> 0:27:42.679
<v Speaker 1>take that money away and put it to use somewhere else.

0:27:43.240 --> 0:27:46.119
<v Speaker 1>And that made funding scarce, and it likely prolonged the

0:27:46.160 --> 0:27:49.280
<v Speaker 1>amount of time it took to advance the discipline. Although

0:27:49.400 --> 0:27:52.080
<v Speaker 1>I should stress work was still being performed in the

0:27:52.160 --> 0:27:54.960
<v Speaker 1>United States as well as elsewhere. It's not like this

0:27:55.359 --> 0:27:58.760
<v Speaker 1>brought everything to a standstill. It just slowed down quite

0:27:58.800 --> 0:28:04.120
<v Speaker 1>a bit. By nineteen sixty seven, NLP research was straining

0:28:04.280 --> 0:28:10.600
<v Speaker 1>against technological limitations. They were starting to feel the

0:28:10.840 --> 0:28:14.440
<v Speaker 1>very limit of what computers were able to do. Even

0:28:14.480 --> 0:28:18.240
<v Speaker 1>advanced systems could take upwards of seven minutes to analyze

0:28:18.280 --> 0:28:22.800
<v Speaker 1>a long sentence. Programming was still largely in assembly language,

0:28:22.840 --> 0:28:24.760
<v Speaker 1>so it wasn't easy to do. And you would still

0:28:24.800 --> 0:28:27.639
<v Speaker 1>have to interact with machines using punch cards, so that

0:28:27.720 --> 0:28:30.440
<v Speaker 1>was also laborious, and heaven help you if you dropped

0:28:30.480 --> 0:28:32.480
<v Speaker 1>all your punch cards and you forgot to number them,

0:28:32.520 --> 0:28:36.439
<v Speaker 1>because then you've ruined your program. Work was progressing on

0:28:36.480 --> 0:28:39.560
<v Speaker 1>the linguistic side, but the technological side was kind of

0:28:39.640 --> 0:28:42.840
<v Speaker 1>lagging behind at this point. One of the big decisions

0:28:42.880 --> 0:28:45.719
<v Speaker 1>researchers had to make around this time was what were

0:28:45.760 --> 0:28:49.600
<v Speaker 1>they going to focus on first while building out computational linguistics.

0:28:49.640 --> 0:28:53.200
<v Speaker 1>It's such a huge problem that you couldn't really tackle

0:28:53.240 --> 0:28:56.840
<v Speaker 1>it wholesale. You needed to kind of focus on specifics.

0:28:56.880 --> 0:29:00.920
<v Speaker 1>So should research focus on syntax, which is all about sentence

0:29:01.000 --> 0:29:03.400
<v Speaker 1>form and structure, as I mentioned earlier, or should it

0:29:03.440 --> 0:29:06.640
<v Speaker 1>focus on semantics, which is more about the underlying meaning

0:29:06.760 --> 0:29:09.760
<v Speaker 1>of what was said and less about the structure of

0:29:09.840 --> 0:29:15.000
<v Speaker 1>how it was said. And ultimately, most researchers, not all

0:29:15.040 --> 0:29:17.840
<v Speaker 1>of them, but most of them decided to focus on syntax.

0:29:17.920 --> 0:29:20.600
<v Speaker 1>For one thing, it seemed like a more analytical thing

0:29:20.640 --> 0:29:24.360
<v Speaker 1>to concentrate on, right? You could define rules more

0:29:24.440 --> 0:29:28.240
<v Speaker 1>easily for syntax than you could for semantics, and semantic

0:29:28.280 --> 0:29:31.200
<v Speaker 1>ambiguity could be fudged a bit. You can rely heavily

0:29:31.240 --> 0:29:34.680
<v Speaker 1>on output words that had a broad meaning. So using

0:29:34.680 --> 0:29:37.720
<v Speaker 1>a word with a broad meaning might not produce a specific,

0:29:37.960 --> 0:29:41.800
<v Speaker 1>precise result, but at least it could be, quote, not wrong,

0:29:41.920 --> 0:29:46.360
<v Speaker 1>end quote. So if a word might have several translations

0:29:46.480 --> 0:29:50.520
<v Speaker 1>ranging from hut to villa to bungalow to mansion, the

0:29:50.560 --> 0:29:55.400
<v Speaker 1>output word might be building because the translating program might

0:29:55.400 --> 0:29:59.120
<v Speaker 1>not know which variation of that translation it should go with,

0:29:59.680 --> 0:30:03.800
<v Speaker 1>but it knows that all of those different examples fall into

0:30:03.840 --> 0:30:08.440
<v Speaker 1>a larger category called building. So that's not precise, but

0:30:08.480 --> 0:30:11.560
<v Speaker 1>it gets the job done. You would understand what

0:30:11.680 --> 0:30:15.880
<v Speaker 1>the actual noun was, in general. You would know

0:30:15.920 --> 0:30:17.640
<v Speaker 1>it was a building. You might not know that it

0:30:17.720 --> 0:30:20.400
<v Speaker 1>was a home, and you might not know what kind

0:30:20.440 --> 0:30:22.280
<v Speaker 1>of home it was, but you would at least know

0:30:22.520 --> 0:30:24.760
<v Speaker 1>that it was a structure. So much of the work

0:30:24.760 --> 0:30:28.640
<v Speaker 1>in the late nineteen sixties focused on solving syntax problems

0:30:28.680 --> 0:30:33.720
<v Speaker 1>for computers, with the researchers saying we'll worry about semantics later.
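
The "not wrong" fallback can be sketched as choosing the most specific category that covers every candidate translation. The tiny taxonomy below is a hypothetical assumption built from the episode's hut, villa, bungalow, and mansion example; systems of the era relied on their own hand-built dictionaries, and this sketch assumes the candidate list is never empty.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: word -> broader category. Built only for this example.
BROADER: Dict[str, str] = {
    "hut": "building",
    "villa": "building",
    "bungalow": "building",
    "mansion": "building",
    "bridge": "structure",
    "building": "structure",
}


def chain(word: str) -> List[str]:
    """The word followed by each broader category, most specific first."""
    result = [word]
    while result[-1] in BROADER:
        result.append(BROADER[result[-1]])
    return result


def safe_translation(candidates: List[str]) -> Optional[str]:
    """Most specific category shared by every candidate, or None if nothing is shared."""
    shared = set(chain(candidates[0]))
    for candidate in candidates[1:]:
        shared &= set(chain(candidate))
    # Walk the first candidate's chain so the most specific shared category wins.
    for category in chain(candidates[0]):
        if category in shared:
            return category
    return None


print(safe_translation(["hut", "villa", "bungalow", "mansion"]))  # building
```

A single unambiguous candidate comes back unchanged, and the output only gets vaguer as the candidates diverge, which matches the trade-off described above: less precise, but rarely wrong.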

0:30:34.400 --> 0:30:37.160
<v Speaker 1>Some notable groups went against the flow and decided to

0:30:37.160 --> 0:30:42.320
<v Speaker 1>tackle semantics and semantically driven processing, partly because they recognized

0:30:42.360 --> 0:30:46.080
<v Speaker 1>it as being a really tough problem and some engineers

0:30:46.160 --> 0:30:50.000
<v Speaker 1>just love solving really hard problems. That's kind of what

0:30:50.280 --> 0:30:53.080
<v Speaker 1>thrills them, and so they chose to go that route.

0:30:53.400 --> 0:30:57.080
<v Speaker 1>They began building out semantic categories and worked on semantic

0:30:57.160 --> 0:31:02.280
<v Speaker 1>pattern matching using semantic networks as a means of knowledge representation.

0:31:03.440 --> 0:31:06.800
<v Speaker 1>Karen Spärck Jones, who wrote that history I mentioned earlier,

0:31:06.800 --> 0:31:09.840
<v Speaker 1>suggests that it was in the late nineteen sixties that

0:31:10.040 --> 0:31:13.320
<v Speaker 1>the research moved out of its initial phase and into

0:31:13.360 --> 0:31:16.560
<v Speaker 1>a second phase, and that second phase was largely marked

0:31:16.560 --> 0:31:22.600
<v Speaker 1>by the incorporation of artificial intelligence, including incorporating world knowledge

0:31:22.720 --> 0:31:27.680
<v Speaker 1>in processing natural language. In nineteen sixty eight, Terry Winograd,

0:31:27.760 --> 0:31:30.800
<v Speaker 1>who today is a Professor Emeritus of Computer Science at

0:31:30.840 --> 0:31:34.000
<v Speaker 1>Stanford University, was working in MIT's AI

0:31:34.120 --> 0:31:37.520
<v Speaker 1>Lab as part of his postgraduate studies, and he began

0:31:37.600 --> 0:31:40.920
<v Speaker 1>to work on a virtual world he would call

0:32:41.120 --> 0:32:45.400
<v Speaker 1>SHRDLU, spelled S-H-R-D-L-U. That's what

0:32:45.440 --> 0:32:47.880
<v Speaker 1>I'm going to call it: shurdlu. It consisted

0:31:47.920 --> 0:31:51.600
<v Speaker 1>of virtual objects on a virtual table, so it's all imaginary, right?

0:31:52.040 --> 0:31:56.280
<v Speaker 1>He then programmed a grammar and lexicon specifically for this

0:31:56.560 --> 0:32:00.880
<v Speaker 1>very, very limited imaginary world. So anything that did

0:32:00.920 --> 0:32:04.000
<v Speaker 1>not involve the things that were in this imaginary world,

0:32:04.080 --> 0:32:07.560
<v Speaker 1>namely the table and these virtual objects, that didn't need

0:32:07.640 --> 0:32:10.520
<v Speaker 1>to be dealt with at all because it was immaterial.

0:32:10.600 --> 0:32:13.640
<v Speaker 1>It didn't exist in this universe. So he only had

0:32:13.640 --> 0:32:15.800
<v Speaker 1>to focus on the elements he had created, and that

0:32:15.920 --> 0:32:18.480
<v Speaker 1>limited the scope of his work and made it more manageable.

0:32:19.040 --> 0:32:22.800
<v Speaker 1>His design even included the concept of persistence and memory.

0:32:23.600 --> 0:32:28.160
<v Speaker 1>So imagine a table with a collection of five objects

0:32:28.200 --> 0:32:30.400
<v Speaker 1>on it. So you've got an imaginary table, you've got five

0:32:30.440 --> 0:32:34.480
<v Speaker 1>imaginary objects on it. Two of the five imaginary objects

0:32:34.520 --> 0:32:37.600
<v Speaker 1>are spheres. One of them is a green sphere, and

0:32:37.640 --> 0:32:39.960
<v Speaker 1>one of them is a red sphere. You then type

0:32:39.960 --> 0:32:43.680
<v Speaker 1>in a command into a terminal that's giving

0:32:43.680 --> 0:32:47.200
<v Speaker 1>you information about this virtual world, and you say, I

0:32:47.240 --> 0:32:50.360
<v Speaker 1>want to move the red sphere over to the far

0:32:50.600 --> 0:32:53.280
<v Speaker 1>end of the table. And then you send another command,

0:32:53.320 --> 0:32:55.680
<v Speaker 1>only this time you don't specify red sphere. You just

0:32:55.720 --> 0:32:59.680
<v Speaker 1>say, move the sphere back. Winograd's system could actually

0:32:59.720 --> 0:33:02.720
<v Speaker 1>remember that you had previously moved the red sphere,

0:33:03.040 --> 0:33:05.240
<v Speaker 1>and it would apply your command to the red sphere

0:33:05.280 --> 0:33:08.520
<v Speaker 1>again under the assumption that's what you meant. When you

0:33:08.840 --> 0:33:12.320
<v Speaker 1>didn't specify, you must have meant the same sphere that

0:33:12.400 --> 0:33:15.200
<v Speaker 1>you had just moved. This is a concept that we're

0:33:15.200 --> 0:33:18.880
<v Speaker 1>seeing rolled out into voice assistants today, like Google Assistant.
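
Here is a rough sketch of the discourse memory described above, in the spirit of SHRDLU but not Winograd's implementation, which was written in Lisp and Micro-Planner with a far richer grammar. The `Thing` and `TableWorld` classes, the word-matching rule, and the "back" history are hypothetical simplifications: an underspecified phrase like "the sphere" or "it" resolves to the most recently handled matching object, and each move records the previous position so "back" can undo it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False keeps identity semantics, so each Thing can be a dict key
class Thing:
    color: str
    shape: str
    position: str


@dataclass
class TableWorld:
    things: List[Thing]
    last_referent: Optional[Thing] = None  # discourse memory: most recently handled object
    past_positions: Dict[Thing, List[str]] = field(default_factory=dict)

    def resolve(self, phrase: str) -> Thing:
        """Map 'the red sphere', 'the sphere', or 'it' to a single object."""
        words = [w for w in phrase.lower().split() if w != "the"]
        if words == ["it"]:
            candidates = [self.last_referent] if self.last_referent is not None else []
        else:
            candidates = [t for t in self.things if all(w in (t.color, t.shape) for w in words)]
            # An underspecified phrase falls back to whatever was handled last.
            if len(candidates) > 1 and self.last_referent in candidates:
                candidates = [self.last_referent]
        if len(candidates) != 1:
            raise ValueError(f"cannot tell what {phrase!r} refers to")
        return candidates[0]

    def move(self, phrase: str, destination: str) -> Thing:
        thing = self.resolve(phrase)
        if destination == "back":
            history = self.past_positions.get(thing)
            if not history:
                raise ValueError(f"{phrase!r} has no earlier position")
            thing.position = history.pop()
        else:
            self.past_positions.setdefault(thing, []).append(thing.position)
            thing.position = destination
        self.last_referent = thing
        return thing


red = Thing("red", "sphere", "center")
green = Thing("green", "sphere", "left")
world = TableWorld([red, green, Thing("blue", "cube", "right")])
world.move("the red sphere", "far end")
world.move("the sphere", "back")  # ambiguous on its own; resolved via memory
print(red.position)  # center
```

The same idea of carrying the last referent forward is what lets a follow-up like "what about tomorrow?" inherit the topic of the previous question, and it also shows why the approach got harder as the world grew: more objects mean more ambiguous phrases to disambiguate.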

0:33:19.360 --> 0:33:22.720
<v Speaker 1>It's the ability to reference something you've already accessed without

0:33:22.800 --> 0:33:26.600
<v Speaker 1>having to specify what you're talking about. So if I

0:33:26.640 --> 0:33:29.840
<v Speaker 1>asked a voice assistant what the weather will be like today,

0:33:29.920 --> 0:33:32.400
<v Speaker 1>and then I follow that up after I get the information,

0:33:32.440 --> 0:33:35.920
<v Speaker 1>I say what about tomorrow, the system that has this

0:33:36.000 --> 0:33:39.360
<v Speaker 1>kind of capability could infer that what I meant was

0:33:39.960 --> 0:33:42.720
<v Speaker 1>what will the weather be like tomorrow, even though I

0:33:42.720 --> 0:33:46.040
<v Speaker 1>didn't say it specifically like that. That's pretty advanced for

0:33:46.160 --> 0:33:48.920
<v Speaker 1>nineteen sixty eight, even though it was for this very

0:33:48.960 --> 0:33:53.520
<v Speaker 1>restricted virtual world with a limited number of variables. However,

0:33:54.080 --> 0:33:56.760
<v Speaker 1>Winograd discovered that the secret to his success

0:33:56.840 --> 0:34:00.360
<v Speaker 1>was largely in this restriction. As you expanded the

0:34:00.440 --> 0:34:03.920
<v Speaker 1>virtual world to incorporate more elements, it made the problem

0:34:04.040 --> 0:34:07.920
<v Speaker 1>exponentially harder. His work, by the way, was an early

0:34:07.920 --> 0:34:11.560
<v Speaker 1>example of what we call anaphora resolution, and an anaphor

0:34:11.640 --> 0:34:13.279
<v Speaker 1>is what I was talking about a second ago. It's a

0:34:13.280 --> 0:34:16.759
<v Speaker 1>word or phrase that refers to an earlier word or

0:34:16.880 --> 0:34:20.799
<v Speaker 1>phrase within a discourse. So if I said move the

0:34:20.800 --> 0:34:24.080
<v Speaker 1>red sphere to the left, then I said, now move

0:34:24.120 --> 0:34:27.560
<v Speaker 1>it back, then "it" obviously refers to the red sphere.

0:34:27.640 --> 0:34:30.759
<v Speaker 1>You would understand that, but a machine wouldn't necessarily understand it.

0:34:31.239 --> 0:34:33.520
<v Speaker 1>You would have to say move the red sphere to

0:34:33.520 --> 0:34:37.040
<v Speaker 1>the left, move the red sphere back. And even with back,

0:34:37.560 --> 0:34:40.200
<v Speaker 1>that has an element of memory to it, because the

0:34:40.239 --> 0:34:43.440
<v Speaker 1>system has to remember where the red sphere used to be.

0:34:43.480 --> 0:34:46.040
<v Speaker 1>Winograd's approach was one of the early attempts to

0:34:46.080 --> 0:34:51.200
<v Speaker 1>incorporate anaphora resolution into NLP models. Other models concentrated on

0:34:51.239 --> 0:34:54.600
<v Speaker 1>translating word by word or sentence by sentence. They were

0:34:54.640 --> 0:35:00.360
<v Speaker 1>incapable of maintaining relationships between words beyond that. That

0:35:00.520 --> 0:35:04.600
<v Speaker 1>shift marked a change in attitude among NLP researchers of

0:35:04.680 --> 0:35:07.680
<v Speaker 1>the time. A growing number of researchers felt that world

0:35:07.719 --> 0:35:11.640
<v Speaker 1>knowledge and artificial intelligence were necessary if we wanted machines

0:35:11.719 --> 0:35:14.600
<v Speaker 1>to be able to analyze and act upon longer forms

0:35:14.640 --> 0:35:18.360
<v Speaker 1>of discourse. The early approaches to NLP were best suited

0:35:18.400 --> 0:35:24.400
<v Speaker 1>to short, self-contained passages. In nineteen seventy one, ARPA launched

0:35:24.400 --> 0:35:28.319
<v Speaker 1>the Speech Understanding Research Program. I also mentioned that in

0:35:28.360 --> 0:35:30.880
<v Speaker 1>the speech recognition episode; it was very important for the

0:35:30.920 --> 0:35:34.080
<v Speaker 1>development of speech recognition. The goal of that program was

0:35:34.120 --> 0:35:37.319
<v Speaker 1>to advance not only speech recognition but also NLP

0:35:37.520 --> 0:35:40.640
<v Speaker 1>research so that a computer could not just detect and

0:35:40.680 --> 0:35:44.680
<v Speaker 1>transcribe speech, but also respond to it in some meaningful way,

0:35:44.800 --> 0:35:49.000
<v Speaker 1>for example, being able to index all that information

0:35:49.080 --> 0:35:53.480
<v Speaker 1>so that it is searchable. The program lasted five years. However,

0:35:53.600 --> 0:35:57.280
<v Speaker 1>at the conclusion, the agency was not satisfied with the results,

0:35:57.640 --> 0:36:00.520
<v Speaker 1>which technically delivered what was asked, but in a pretty

0:36:00.640 --> 0:36:05.239
<v Speaker 1>limited implementation, so ARPA decided to cut funding. They

0:36:05.280 --> 0:36:08.800
<v Speaker 1>stopped the project. This was another big blow to research

0:36:08.800 --> 0:36:11.160
<v Speaker 1>in the United States, which had viewed the project as

0:36:11.160 --> 0:36:14.440
<v Speaker 1>a positive development ever since the ALPAC report had pulled

0:36:14.440 --> 0:36:17.920
<v Speaker 1>the rug out from under the funding earlier. Now, I've

0:36:17.920 --> 0:36:20.560
<v Speaker 1>got a lot more to say about the development of

0:36:20.680 --> 0:36:24.439
<v Speaker 1>natural language processing and where we are now, as well

0:36:24.480 --> 0:36:27.239
<v Speaker 1>as the history of the various voice assistants that we're

0:36:27.280 --> 0:36:30.799
<v Speaker 1>familiar with today. But it's time to conclude this episode.

0:36:31.040 --> 0:36:33.239
<v Speaker 1>In our next episode, we'll pick up where I left

0:36:33.239 --> 0:36:36.160
<v Speaker 1>off today and we'll continue down and talk about all

0:36:36.200 --> 0:36:39.680
<v Speaker 1>of our beloved friends like Siri and Alexa. Now, if

0:36:39.719 --> 0:36:43.880
<v Speaker 1>you have suggestions for future episodes of Tech Stuff, write me.

0:36:44.040 --> 0:36:46.319
<v Speaker 1>Let me know what you want to hear. There might

0:36:46.360 --> 0:36:49.560
<v Speaker 1>be a specific technology or a company, a person in tech.

0:36:49.640 --> 0:36:51.560
<v Speaker 1>Maybe there's someone you want me to interview or have

0:36:51.680 --> 0:36:54.279
<v Speaker 1>on as a special guest host. You can send me

0:36:54.320 --> 0:36:57.160
<v Speaker 1>an email. The address for the show is tech Stuff

0:36:57.440 --> 0:37:00.439
<v Speaker 1>at how stuff works dot com, or you can drop

0:37:00.440 --> 0:37:02.719
<v Speaker 1>me a line on Facebook or Twitter. The handle of

0:37:02.840 --> 0:37:06.480
<v Speaker 1>both of those is Tech Stuff HSW. Don't forget.

0:37:06.600 --> 0:37:08.680
<v Speaker 1>You can follow us on Instagram. I want to see

0:37:08.680 --> 0:37:10.920
<v Speaker 1>you guys over there, and I'll talk to you again

0:37:11.680 --> 0:37:20.480
<v Speaker 1>really soon for more on this and thousands of other topics,

0:37:20.560 --> 0:37:32.000
<v Speaker 1>visit how stuff works dot com