Get in touch with technology with TechStuff from howstuffworks dot com.

Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer at HowStuffWorks and I love all things tech. In the last episode, I covered the history and technology behind speech recognition. So today we're going to look at a related concept called natural language processing, or natural language understanding. The two are related. This technology and speech recognition are both part of what makes voice assistants like Siri, Alexa, and Google Assistant work, though there are other technologies that also go into that. Now, this is a huge topic and has a long and fascinating history, so this episode is just going to be the start of it. In the next episode, I will conclude the discussion on natural language processing and go into the history of those actual voice assistants.

So, on a high level, what is natural language processing? Well, simply put, it's programming a machine to interpret language the way we human beings use it. In an ideal implementation, which would also require advanced artificial intelligence, you could speak to a machine or type whatever you like into a terminal, and it would be able to understand what you meant, what your commands were, no matter how you worded the phrase. In turn, the machine would be able to generate responses that made linguistic sense to us, and we could in effect hold entire conversations with those machines. This, as it turns out, is a very difficult challenge. Even creating a machine that can respond to basic commands delivered in a natural language is really, really hard to do, and we haven't yet cracked the nut on making a machine that can actually hold a real conversation with us.

We can sometimes forget that machines do not natively understand human language. Machines process information in machine code, which is difficult for humans to understand.
I almost said impossible for humans to understand, but really it's just impractical. It's incredibly difficult. So, for example, computers that run on binary systems process all information in zeros and ones, ultimately, when you get down to it. If you were to look at a sheet of zeros and ones, it would probably seem completely incomprehensible to you, although to a computer it could seem perfectly logical. Our language is equally incomprehensible to machines.

Programming languages make it easier for humans to make machines do what we want them to do. Programming languages create a level of abstraction between human language and machine language. It's kind of a meeting ground in the middle. Programming languages tend to be highly structured, with specific, strict sets of rules. Programming within those rules will get you the results you want, assuming your code is good, but if you stray outside those rules, you start to get errors. Human language is much more variable and complicated and ambiguous, and that's something that machines are not very good at handling.

Now, if you've ever played a text-based adventure from way back in the day, like Zork, you know that those adventure games have a very limited vocabulary. The game can accept certain commands, but only because the programmer built the option into the game. They incorporated that in the game's design. So you might be able to type something like go north, or just north, and the game understands you want your character to move to a new location that's to the north of your current location. But maybe you type something else, maybe you type jog north or saunter north, and the programmer didn't think of that. They didn't come up with all the different ways you might describe the way you want to move north, so you might get a result that says something like I didn't understand that, or you can't do that here. Computers only have the illusion of understanding us.
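To make that concrete, here's a minimal Python sketch of the kind of rigid command matching an old text adventure relies on. The commands and responses are made up for illustration, but the point stands: anything the programmer didn't anticipate falls through to a canned failure message.

```python
# A tiny, Zork-style command matcher: it only "understands" commands the
# programmer explicitly anticipated, which is why unanticipated wordings fail.
KNOWN_COMMANDS = {
    "go north": "You walk north into a dark forest.",
    "north": "You walk north into a dark forest.",
    "go south": "You return to the clearing.",
    "look": "You are standing in a clearing.",
}

def respond(player_input: str) -> str:
    command = player_input.strip().lower()
    # Exact-match lookup only: no notion of synonyms, verbs, or meaning.
    return KNOWN_COMMANDS.get(command, "I didn't understand that.")

print(respond("go north"))       # You walk north into a dark forest.
print(respond("saunter north"))  # I didn't understand that.
```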
They don't actually know what we mean when we say something, at least not natively. Now, that meant that for most of our history with computers, humans have had to learn how to work with machines, not the other way around. We have had to learn commands and syntax that machines accept, and if we try to word those commands in a different way, we tend to get an error. Natural language processing attempts to flip the tables on this relationship and teach machines how to work with humans, so that we don't have to go through any sort of learning curve. We don't need to formulate our commands in a specific way to be understood. The technology works on our terms, or as close to those as we can manage. That means that programmers have to build systems that can parse language for meaning, and it also means having to build tools and machines that can handle stuff that you typically encounter in higher-level language courses.

So here's a quick rundown on some of the stuff a natural language processing approach has to take into account. First, you have grammar. Now, grammar can refer to the study of language, but generally speaking, when we say grammar, or at least when I'm using the term in the context of natural language processing, I mean a set of rules for the organization of components of a language into meaningful statements or sentences. This is a broad concept. It is a big, big idea. It actually encompasses a couple of other, also big, ideas that are important in natural language processing.

One of those is the concept of morphology. Morphology has to do with word forms. Words consist of morphemes, and a word can actually have multiple morphemes. So, for example, let's take a word like skydivers. Skydivers technically has four morphemes, and they are sky, dive, er, and s: skydivers. The morphemes only make sense if we put them in that particular order for the word skydivers; rearranged, say as dive-sky-er-s, it does not mean the same thing.
Actually, it doesn't mean anything at all. So a good system will have to understand morphology and know how words can and cannot be formed. So again, with skydivers, it knows: all right, I know the word sky, I know what that means. I know what the word dive means. Er means that this is not an action; this is actually an entity that engages in that action. Right? A skydiver is someone who skydives. And the s says it's plural, so there's more than one skydiver. That's what morphology is all about. It's the sort of internal logic of word formation.

Syntax is another big concept within grammar. Syntax, however, does not refer to word formation. It refers to sentence structure. How do we arrange words to make meaningful sentences? For example, the sentence "You must have patience, my young Padawan" follows good syntax, but "Patience you must have, my young Padawan" is a bit janky, because Yoda is all over the place with his syntax.

In addition to grammar, you also have to take into account semantics. Now, that is the study of the meaning within language. This is a tricky one, because there's a lot to unwrap here. For example, words and phrases can actually stand for different meanings. They can denote different ideas. We might use many different phrases or words to describe the same concept, right? So we might use a dozen or more different ways to say the same thing, or we might use two similar words or phrases to describe very different concepts. We might even use the same phrase to describe wildly different things, with very different meanings. Semantics gets down to what we actually mean when we say something. If you've ever had a discussion with someone and that person says, you know what I meant, that's essentially a statement that indicates that, semantically, the meaning was clear, even if the phrasing did not indicate it on the face of things.
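Before moving on, here's a rough Python sketch of the morphology idea from a moment ago, applied to skydivers. The tiny root and suffix lists are hypothetical, purely for illustration; a real morphological analyzer uses much richer rules, but the basic move of peeling off affixes and matching known roots is the same.

```python
# A toy morphological analyzer, assuming a tiny hand-made lexicon.
ROOTS = {"sky", "dive"}

def analyze(word):
    morphemes = []
    # Peel known suffixes off the end: "skydivers" -> "skydiver" -> "skydiv"
    for suffix in ("s", "er"):
        if word.endswith(suffix):
            morphemes.insert(0, suffix)
            word = word[: -len(suffix)]
    # Match known roots at the front, allowing a dropped final "e"
    # ("dive" + "er" surfaces as "diver").
    stem = []
    while word:
        match = next((r for r in ROOTS
                      if word.startswith(r) or word.startswith(r.rstrip("e"))), None)
        if match is None:
            stem.append(word)   # unknown residue; a real system would flag this
            break
        stem.append(match)
        consumed = len(match) if word.startswith(match) else len(match) - 1
        word = word[consumed:]
    return stem + morphemes

print(analyze("skydivers"))  # ['sky', 'dive', 'er', 's']
print(analyze("skydiver"))   # ['sky', 'dive', 'er']
```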
Then there is pragmatics. That's all about context. Contextual information is incredibly important in communication, and it relates a little bit to semantics: semantics is about what the words themselves mean, and pragmatics is about the context they're said in. So if I say the weather sure is nice today, on the face of it, that sounds like I'm in favor of the way the weather is, right? It sounds like, oh, I like how the weather is. But if I say that same phrase while I'm standing in a downpour and I'm clearly not happy, I'm obviously being sarcastic. I mean the opposite of what I actually said. The context of the situation changes the meaning of what I am saying, even though the actual phrasing would seem to indicate the opposite of what my meaning was. As we develop more technology that can communicate with us, we have to take pragmatics into consideration, or else machines are going to be misinterpreting what we actually mean when we say stuff. So machines are going to have to learn how to deal with stuff like sarcasm. Yeah, right.

Then we have phonology. That is the sound of a language. I talked a little bit about this in the speech recognition podcast, about how different languages have different phonemes, so I'm not going to dwell on that again. You can listen to the speech recognition podcast to learn more about it. But it is an important element in languages, especially when you get into natural language processing that is taking verbal input and not just textual input.

Then you have lexicons. That's the total vocabulary for a system. Ideally, a lexicon has not just the words, but some sort of metadata attached that indicates the meaning of words or the relationship of words with one another, though you can fudge this a little bit depending upon the implementation of the system. I'll talk a lot more about that throughout these podcasts.

Now, these can be tricky concepts for human beings, let alone for machines. Machines are very good at following strict sets of instructions, but language can sometimes defy logic.
Think of rules that apply to your native language, then just think of the exceptions that exist to those rules. Every language has exceptions to the rules that are established, and depending upon the rule and the exception, there may seem to be no rhyme or reason for the deviation from the rule. Moreover, if we want machines that are capable of understanding us and responding to our language in a meaningful way, those machines need to be able to handle the idiosyncrasies of individual speakers, to some extent. There may be regional turns of phrase or vocabulary that don't extend to the general population of speakers of the respective language. So you might encounter a person who speaks in local idioms quite a bit, and if those are not frequently used in the broader general population of that language, then you're going to have a lot of communication errors between that person and a machine that is trying to process that language. Ideally, machines would be able to understand whatever we say and interpret the meaning correctly, although we haven't even gotten to a world where human beings can do that reliably, so I don't know why I'm holding machines up to such a high standard. We definitely would want them to reach a certain level of confidence and capability; however, machines just are not quite there yet.

I'm going to talk a lot more about the history of natural language processing in just a moment, but first let's take a quick break to thank our sponsor.

The history of natural language processing is pretty darn complicated, because it involves multiple lines of research and lots of different disciplines. So we have all sorts of things that play into this, like hidden Markov models, which I talked about in the speech recognition podcast, neural networks, referencing language using mathematical vectors, and a lot more contributing to the evolution of natural language processing, and a lot of disciplines, not just computer science, but linguistics and psychology.
So there's not like a single line I can follow where it's A led to B led to C. So we're going to be jumping around a little bit. However, one of the sources I want to call out that I used while I was researching this episode was a paper written by Karen Spärck Jones called Natural Language Processing: A Historical Review. It's pretty dense, it's pretty technical, but it's also available to read online if you want a more thorough treatment of the history of the technology up to two thousand. I'm going to be skimming over quite a bit of it because, as I say, it gets really deep and really technical, and it uses a lot of shorthand to reference things, which meant that I had to do a lot of jumping down research rabbit holes to learn more. But it was a very useful starting point for this research. And also, it was published in two thousand one. Obviously a lot has happened since then; we're almost two decades out from that. But I'm going to start at the beginning and then work my way up to what's going on today.

So, early work in natural language processing actually surprised me. I was surprised at how old it was. It actually dates all the way back to the nineteen forties. Physicist and computer scientist Andrew Donald Booth proposed using computers to translate passages from one language into another, which is a type of natural language processing. You have to be able to recognize the words of one language and then map them to a similar meaning in a different language. Now, Booth's approach involved creating a word-for-word model. If the model couldn't find a match between two words, it would automatically discard the last letter on the input word and try again. It would do this until it found a match, or if it never found a match, you got an error.
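Here's a minimal Python sketch of that strip-a-letter-and-retry lookup. The two-entry dictionary is completely made up for illustration, and, as you'll hear next, the real approach also kept notes on what the stripped ending did to the meaning of the word.

```python
# A sketch of Booth's look-up-or-strip-a-letter loop, with a hypothetical
# two-entry English-to-Russian dictionary (Russian shown transliterated).
DICTIONARY = {"write": "pisat", "sky": "nebo"}

def look_up(word):
    """Drop the last letter and retry until something matches the dictionary."""
    remaining = word
    while remaining:
        if remaining in DICTIONARY:
            ending = word[len(remaining):]    # e.g. the "r" left over from "writer"
            return DICTIONARY[remaining], ending
        remaining = remaining[:-1]            # discard the last letter, try again
    raise KeyError(f"no match for {word!r}")  # the error case described above

print(look_up("write"))    # ('pisat', '')
print(look_up("writer"))   # ('pisat', 'r')  -- the ending gets handled separately
```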
But if it did find a match, it would search its memory to see if the ending of the input word could give information about what that ending does to the meaning of the word. So, for example, if you were using this to translate from English into Russian and you used the word writer, maybe writer does not show up in the lexicon, but write does, w-r-i-t-e. So the translating program tries to translate writer from English into Russian, cannot find a Russian equivalent for writer, drops the R, looks for the Russian word for write, and it finds it. Then it says, all right, well, in English, what does writer mean? What does that R do to the word write? And it looks at its memory and finds out that the letter R makes a noun out of the verb; it creates an entity that does the action, which is to write. Then it looks in the Russian lexicon and says, all right, well, is there a word in that lexicon that matches this meaning? It's kind of a slow, laborious way of doing things, but it was also very, very early.

I mean, it was just the following year, in nineteen forty-nine, that Warren Weaver produced a memorandum about machine translation, and Weaver admitted in the memorandum that such an application would likely be much more challenging than what he understood it to be, but that he was, quote, willing to expose my ignorance, hoping that it will be slightly shielded by my intentions, end quote. And I think that's rather charming. In that memo, Weaver cites a letter he wrote to Professor Norbert Wiener of MIT, and that included the following paragraph. So here's a full paragraph, actually two paragraphs, from the memorandum: recognizing fully, even though necessarily vaguely, the semantic difficulties because of multiple meanings, etcetera,
I have wondered if it were unthinkable to design a computer which would translate, even if it would translate only scientific material, where the semantic difficulties are very notably less, and even if it did produce an inelegant but intelligible result, it would seem to me worthwhile. Also, knowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods in cryptography, methods which I believe succeed even when one does not know what language has been coded, one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say, this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.

So he got this idea because of activities that were going on in World War Two, where teams were trying to decode messages. And they might decode the message, they might figure out what letters correspond to the code, but it may even be in a totally different language than one they speak. So while they are able to decode the message into its native language, they are not able to speak that language. He says, well, what if we just take that same step, and now we treat the other language as a code in and of itself and try to translate that into English, or decrypt it into English.

Weaver acknowledged that the word-to-word approach that Booth and his contemporaries were relying upon had limited utility. He wrote, quote, it is in fact amply clear that a translation procedure that does little more than handle a one-to-one correspondence of words cannot hope to be useful for problems of literary translation, in which style is important, and in which the problems of idiom, multiple meanings, etcetera, are frequent, end quote.
So there he's saying, you can't just take a foreign word, translate it into whatever the closest equivalent in English is, and hope to get the same meaning, especially in literary works, because there are all these different turns of phrase and cultural meanings that will get lost in that translation. You would have something that might technically be considered more or less correct, but would not be actually correct. You wouldn't be getting across the meaning of the author in that translation. You would just have words in an order that would make sense from a syntax perspective. In other words, you would have sentences that held up grammatically, but they wouldn't necessarily have the meaning of the original writing. Weaver's proposal was to perhaps expand the word-to-word model and create a system that would analyze not just the target word, but the words adjacent to the target, in order to determine the context of the word, the meaning of the word. As we'll see when we get a little bit further down in the timeline, this is one of the methods that folks working in natural language processing incorporated into their approach. So this was incredibly forward-thinking of Weaver.

On January seventh, nineteen fifty-four, researchers from IBM and Georgetown University demonstrated a system that was able to translate around sixty sentences from Russian into English automatically. Now, the process wasn't exactly painless. It required an operator to take a sentence written in Russian, but transliterated into the English alphabet; it wasn't in the Cyrillic alphabet. The person would then encode that sentence on punch cards. They would feed the punch cards into a seven oh one computer. The seven oh one was an IBM system, and I mentioned it in the previous episode on speech recognition. Then they would wait for the translation program's response, which would take a few seconds. The program would attempt to translate the words from Russian to English.
The demonstration was impressive, but it was limited in scope. The program had a lexicon of only two hundred fifty words or so, and it required extensive programming to cope with syntax, because word order in Russian is different than word order in English. You can think of the programming as including metadata: the researchers would tag Russian words with little signs that related to specific rules. So, for example, one of the terms the system could translate was a Russian two-word phrase. It was gyeneral mayor, and I'm butchering the Russian pronunciation, but in English it means major general. The word order is reversed in Russian. If you did a strict word-to-word translation, you would get general major as the translation, because that's the order that the Russian phrase would put it in. So the programmers would tag each word with a rule to kind of give the idea of what you should follow when you're making these translations, and by you, I mean the computer system. So the word for general got the assignment of rule twenty-one, and the word for major got the assignment of rule one. When the system encountered a word, it would look up any rules related to that word. So if it comes across a word that has the associated rule one, it would say, all right, this rule tells me I have to go back over the message and look to see if there was a rule twenty-one word in that same phrase. And if it finds a rule twenty-one word, it would then know, I need to reverse the order of these two words; this word order that appears in Russian needs to be flipped for English. Now, that's a pretty laborious process, and it doesn't work great for larger lexicons. The larger the vocabulary, the more complex the sentences can become, and the more exceptions and rules you're going to encounter. It would be really hard to implement this on a big scale, but it was an impressive display of machine translation.
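Here's a small Python sketch of that tag-and-reorder idea, using the gyeneral mayor example. The transliterated spellings, the rule numbers, and the lookup are simplified stand-ins for illustration, not the actual 1954 program.

```python
# Each Russian word (transliterated, hypothetical spellings) carries an English
# gloss plus an optional rule tag. Rule 1 means: look back for a rule-21 word
# in the same phrase and reverse the pair, because Russian puts them in the
# opposite order from English.
LEXICON = {
    "gyeneral": {"english": "general", "rule": 21},
    "mayor":    {"english": "major",   "rule": 1},
}

def translate(sentence):
    output, rules = [], []
    for word in sentence.split():
        entry = LEXICON[word]
        output.append(entry["english"])
        rules.append(entry.get("rule"))
        if entry.get("rule") == 1 and len(rules) > 1 and rules[-2] == 21:
            output[-1], output[-2] = output[-2], output[-1]   # flip the word order
    return " ".join(output)

print(translate("gyeneral mayor"))   # major general
```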
The system was essentially a vocabulary list and a long series of if-then rules. If the word is this, then look for this; if that is there, then switch the word order, essentially. According to articles, it could translate sentences designed for the system in about six seconds. But again, those sentences were designed for the system, with a very limited vocabulary, so it was a limited implementation. And it's good to point out that a lot of work in machine translation around this time focused on English and Russian, which is no big surprise. Keep in mind the time scale we're talking about, the nineteen fifties. Here, the USA and the then-USSR were not on great terms. Both countries were using pretty much every means at their disposal to analyze one another, to spy on one another, to maneuver to make certain the other nation didn't get a superior position. And we saw a lot of technological development during this period, including the space race, that was all wrapped up in this Cold War issue as well. And perhaps as no big surprise, the US government was pretty keen to fund research and development in machine translation, up to a point, that is.

In nineteen sixty-six, Joseph Weizenbaum published a computer program called Eliza. I've talked about Eliza in previous episodes of TechStuff. This was a primitive chatbot, a text-based chatbot. It mimicked a Rogerian psychotherapist. That's a discipline that was pioneered by the psychologist Carl Rogers; it's sometimes also called person-centered therapy. Eliza was strictly a text-based terminal operation. You would see a line of text pop up. It would ask you how you're doing. You could type stuff in and then it would respond to you, so you would get responses that appeared to be semi-intelligent. Typically it would be a question to ask for more information, or sometimes it would be a phrase to change the subject.
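Here's a tiny Python sketch of that kind of keyword-and-template trick. The patterns are invented for illustration, and the real Eliza script was much larger, but the core move is the same: spot a keyword, reflect part of the user's own words back as a question, and fall back to a stock response otherwise.

```python
import re

# A handful of Eliza-style rules: a pattern to spot, and a template that
# reflects part of the user's own words back as a question.
RULES = [
    (re.compile(r"i'?m so (\w+)", re.I), "What has made you {0}?"),
    (re.compile(r"everything is (\w+)", re.I), "Can you give me an example?"),
    (re.compile(r"i feel (\w+)", re.I), "Why do you feel {0}?"),
]
FALLBACK = "Please, tell me more."

def eliza_reply(text):
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK   # stock response when nothing matches

print(eliza_reply("I'm so angry right now"))           # What has made you angry?
print(eliza_reply("Everything is going wrong today"))  # Can you give me an example?
```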
So you might say something along the lines of, I'm so angry right now, and Eliza might respond with, what has made you angry? So Eliza has flipped this around in order to sustain the conversation. Then you could type in something else. Maybe you type in, everything is going wrong today, and Eliza might respond with, can you give me an example? And then so on. Eliza would give the appearance of understanding the subject, but in reality it was simply taking the input, analyzing the parts of speech, then sending back a very similar message or a related message in an effort to keep the conversation going. It might just be a placeholder. The program did not understand language or context beyond being able to parse the basic parts of a sentence and then rearrange them, or go with one of several stock responses when it didn't have a way of figuring out what it should do.

Nineteen sixty-six also saw something else, something that would end up creating a bit of a big setback for natural language processing researchers. But I'll explain more about that when we come back after a quick break to thank our sponsors.

Okay, so, nineteen sixty-six. What happened that set back research in this field? Well, that's when a report was published that had a dramatic impact on funding for R and D in machine translation. It was called the ALPAC report. ALPAC, A-L-P-A-C, stood for Automatic Language Processing Advisory Committee. This was a group consisting of various experts in fields ranging from computer science to linguistics to psychology. The US government had established the committee back in nineteen sixty-four, and they had a very simple assignment, or at least simple on the surface, which was: evaluate the progress that was being made in automatic machine translation across the board, look at what everyone's working on, give us an idea of where we are and where we're headed.
The nineteen sixty-six report essentially concluded that the field was still in its infancy, and that before any real advancements could happen, a lot more basic research in the field of computational linguistics would be required. So essentially, the report was saying, we're trying to move at a full gallop, but we still aren't really sure how to get on the horse. I'm paraphrasing, of course. One result of this was that the US government began to scale back grants for research in the field of machine translation. This was, unfortunately, exactly the opposite of what needed to happen. The US government wanted more immediate results and decided, well, if you're not going to get results right away, we're going to take that money away and put it to use somewhere else. And that made funding scarce, and it likely prolonged the amount of time it took to advance the discipline, although I should stress that work was still being performed in the United States as well as elsewhere. It's not like this brought everything to a standstill; it just slowed down quite a bit.

By nineteen sixty-seven, NLP research was straining against technological limitations. Researchers were starting to feel the very limit of what computers were able to do. Even advanced systems could take upwards of seven minutes to analyze a long sentence. Programming was still largely in assembly language, so it wasn't easy to do. And you would still have to interact with machines using punch cards, so that was also laborious, and heaven help you if you dropped all your punch cards and had forgotten to number them, because then you'd ruined your program. Work was progressing on the linguistic side, but the technological side was kind of lagging behind at this point.

One of the big decisions researchers had to make around this time was what they were going to focus on first while building out computational linguistics, because it's such a huge problem that you couldn't really tackle it wholesale. You needed to kind of focus on specifics.
So, should research focus on syntax, which is all about sentence form and structure, as I mentioned earlier, or should it focus on semantics, which is more about the underlying meaning of what was said and less about the structure of how it was said? Ultimately, most researchers, not all of them, but most of them, decided to focus on syntax. For one thing, it seemed like a more analytical thing to concentrate on, right? Like, you could define rules more easily for syntax than you could for semantics, and semantic ambiguity could be fudged a bit. You could rely heavily on output words that had a broad meaning. Using a word with a broad meaning might not produce a specific, precise result, but at least it could be, quote, not wrong, end quote. So if a word might have several translations ranging from hut to villa to bungalow to mansion, the output word might be building, because the translating program might not know which variation of that translation it should go with, but it knows that all of those different examples fall into a larger category called building. So that's not precise, but it gets the job done. You would understand what the actual noun was, in general. You would know it was a building. You might not know that it was a home, and you might not know what kind of home it was, but you would at least know that it was a structure.

So much of the work in the late nineteen sixties focused on solving syntax problems for computers, with the researchers saying, we'll worry about semantics later. Some notable groups went against the flow and decided to tackle semantics and semantically driven processing, partly because they recognized it as being a really tough problem, and some engineers just love solving really hard problems. That's kind of what thrills them, and so they chose to go that route. They began building out semantic categories and worked on semantic pattern matching, using semantic networks as a means of knowledge representation.
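Going back to that building example for a moment, here's a small Python sketch of the back-off-to-a-broader-category trick. The source word and its candidate translations are invented for illustration; the point is just that when the system can't choose among several specific options, it can emit the category they all share and stay, quote, not wrong.

```python
# A hypothetical source word with several candidate English translations.
# If nothing tells the system which one to pick, it backs off to the broad
# category the candidates share, which is vague but "not wrong."
CANDIDATES = {"dom": ["hut", "villa", "bungalow", "mansion"]}
HYPERNYM = {"hut": "building", "villa": "building",
            "bungalow": "building", "mansion": "building"}

def translate(word):
    options = CANDIDATES.get(word, [word])
    if len(options) == 1:
        return options[0]              # unambiguous: use it directly
    categories = {HYPERNYM.get(option) for option in options}
    if len(categories) == 1:
        return categories.pop()        # all candidates share one broad category
    return options[0]                  # last resort: just guess

print(translate("dom"))    # building
print(translate("hut"))    # hut
```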
Karen Spärck Jones, who wrote that history I mentioned earlier, suggests that it was in the late nineteen sixties that the research moved out of its initial phase and into a second phase, and that second phase was largely marked by the incorporation of artificial intelligence, including incorporating world knowledge in processing natural language. In nineteen sixty-eight, Terry Winograd, who today is a professor emeritus of computer science at Stanford University, was working in MIT's AI lab as part of his postgraduate studies, and he began to work on a virtual world he would call SHRDLU, "shird-loo," that's what I'm going to call it, shird-loo. It consisted of virtual objects on a virtual table, so it's all imaginary, right? He then programmed a grammar and lexicon specifically for this very, very limited imaginary world. So anything that did not involve the things that were in this imaginary world, namely the table and these virtual objects, didn't need to be dealt with at all, because it was immaterial; it didn't exist in this universe. So he only had to focus on the elements he had created, and that limited the scope of his work and made it more manageable.

His design even included the concept of persistence and memory. So imagine a table with a collection of five objects on it. You've got an imaginary table, you've got five imaginary objects on it. Two of the five imaginary objects are spheres. One of them is a green sphere, and one of them is a red sphere. You then type a command into a terminal that's giving you information about this virtual world, and you say, I want to move the red sphere over to the far end of the table. And then you send another command, only this time you don't specify red sphere. You just say, move the sphere back.
Winograd's system could actually remember that you had previously moved the red sphere, and it would apply your command to the red sphere again, under the assumption that that's what you meant. When you didn't specify, you must have meant the same sphere that you had just moved. This is a concept that we're seeing rolled out into voice assistants today, like Google Assistant. It's the ability to reference something you've already accessed without having to specify what you're talking about. So if I ask a voice assistant what the weather will be like today, and then I follow that up, after I get the information, with, what about tomorrow, a system that has this kind of capability could infer that what I meant was, what will the weather be like tomorrow, even though I didn't say it specifically like that. That's pretty advanced for nineteen sixty-eight, even though it was for this very restricted virtual world with a limited number of variables. However, Winograd discovered that the secret to his success was largely in this restriction. As you expanded the virtual world to incorporate more elements, the problem became exponentially harder.

His work, by the way, was an early example of what we call anaphora resolution, and an anaphor is what I was talking about a second ago. It's a word or phrase that refers to an earlier word or phrase within a discourse. So if I said, move the red sphere to the left, and then I said, now move it back, the "it" obviously refers to the red sphere. You would understand that, but a machine wouldn't necessarily understand it. You would have to say, move the red sphere to the left, move the red sphere back. And even "back" has an element of memory to it, because the system has to remember where the red sphere used to be. Winograd's approach was one of the early attempts to incorporate anaphora resolution into NLP models.
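Here's a minimal Python sketch of that kind of anaphora resolution: the dialogue state simply remembers the last entity that was mentioned, and a pronoun or an underspecified noun like "the sphere" resolves back to it. It's an illustration of the idea, not Winograd's actual program.

```python
class DialogueState:
    """Remembers the most recently mentioned entity so later commands can
    refer to it with a pronoun or an underspecified noun."""
    def __init__(self, known_entities):
        self.known_entities = known_entities
        self.last_mentioned = None

    def resolve(self, phrase):
        phrase = phrase.lower()
        for entity in self.known_entities:
            if entity in phrase:              # explicit mention, e.g. "red sphere"
                self.last_mentioned = entity
                return entity
        if ("it" in phrase.split() or "sphere" in phrase) and self.last_mentioned:
            return self.last_mentioned        # anaphor: fall back to the last entity
        return None

state = DialogueState(["red sphere", "green sphere"])
print(state.resolve("move the red sphere to the far end"))  # red sphere
print(state.resolve("now move the sphere back"))            # red sphere
print(state.resolve("move it to the left"))                 # red sphere
```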
Other models concentrated on translating word by word or sentence by sentence. They were incapable of maintaining relationships between words beyond that. That shift marked a change in attitude among NLP researchers of the time. A growing number of researchers felt that world knowledge and artificial intelligence were necessary if we wanted machines to be able to analyze and act upon longer forms of discourse. The early approaches to NLP were best suited to short, self-contained passages.

In nineteen seventy-one, ARPA launched the Speech Understanding Research program. I also mentioned that in the speech recognition episode; it was very important for the development of speech recognition. The goal of that program was to advance not only speech recognition but also NLP research, so that a computer could not just detect and transcribe speech, but also respond to it in some meaningful way, for example, being able to index all that information so that it is searchable. The program lasted five years. However, at the conclusion, the agency was not satisfied with the results, which technically delivered upon what was asked, but in a pretty limited implementation, so ARPA decided to cut funding. They stopped the project. This was another big blow to research in the United States, which had viewed the project as a positive development ever since the ALPAC report had pulled the rug out from under the funding earlier.

Now, I've got a lot more to say about the development of natural language processing and where we are now, as well as the history of the various voice assistants that we're familiar with today, but it's time to conclude this episode. In our next episode, we'll pick up where I left off today, and we'll continue on and talk about all of our beloved friends like Siri and Alexa. Now, if you have suggestions for future episodes of TechStuff, write me. Let me know what you want to hear. There might be a specific technology or a company or a person in tech. Maybe there's someone you want me to interview or have on as a special guest host.
You can send me an email. The address for the show is techstuff at howstuffworks dot com, or you can drop me a line on Facebook or Twitter. The handle for both of those is tech stuff H S W. Don't forget, you can follow us on Instagram. I want to see you guys over there, and I'll talk to you again really soon.

For more on this and thousands of other topics, visit howstuffworks dot com.