Get in touch with technology with TechStuff from howstuffworks dot com.

Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer at HowStuffWorks and I love all things tech. In the last episode, I covered the history and technology behind speech recognition. So today we're going to look at a related concept called natural language processing, or natural language understanding. The two are related. This technology and speech recognition are both part of what makes voice assistants like Siri, Alexa, and Google Assistant work, though there are other technologies that also go into that. Now, this is a huge topic and has a long and fascinating history, so this episode is just going to be the start of it. In the next episode, I will conclude the discussion on natural language processing and go into the history of those actual voice assistants.

So, on a high level, what is natural language processing? Well, simply put, it's programming a machine to interpret language the way we human beings use it. In an ideal implementation, which would also require advanced artificial intelligence, you could speak to a machine or type whatever you like into a terminal, and it would be able to understand what you meant, what your commands were, no matter how you worded the phrase. In turn, the machine would be able to generate responses that made linguistic sense to us, and we could in effect hold entire conversations with those machines. This, as it turns out, is a very difficult challenge. Even creating a machine that can respond to basic commands delivered in a natural language is really, really hard to do, and we haven't yet cracked the nut on making a machine that can actually hold a real conversation with us.

We can sometimes forget that machines do not natively understand human language. Machines process information in machine code, which is difficult for humans to understand.
I almost said impossible for humans to understand, but really it's just impractical. It's incredibly difficult. So, for example, computers that run on binary systems process all information in zeros and ones, ultimately, when you get down to it. If you were to look at a sheet of zeros and ones, it would probably seem completely incomprehensible to you, although to a computer it could seem perfectly logical. Our language is equally incomprehensible to machines.

Programming languages make it easier for humans to make machines do what we want them to do. Programming languages create a level of abstraction between human language and machine language. It's kind of a meeting ground in the middle. Programming languages tend to be highly structured, with specific, strict sets of rules. Programming within those rules will get you the results you want, assuming your code is good, but if you stray outside those rules, you start to get errors. Human language is much more variable and complicated and ambiguous, and that's something that machines are not very good at handling.

Now, if you've ever played a text-based adventure from way back in the day, like Zork, you know that those adventure games have a very limited vocabulary. The game can accept certain commands, but only because the programmer built the option into the game. They incorporated that in the game's design. So you might be able to type something like go north, or just north, and the game understands you want your character to move to a new location that's to the north of your current location. But maybe you type something else, maybe you type jog north or saunter north, and the programmer didn't think of that. They didn't come up with all the different ways you might describe the way you want to move north, so you might get a result that says something like I didn't understand that, or you can't do that here. Computers only have the illusion of understanding us.
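To make that concrete, here's a minimal Python sketch of the kind of rigid command matching an old text adventure relies on. The commands and responses are made up for illustration, but the point stands: anything the programmer didn't anticipate falls through to a canned failure message.

```python
# A tiny, Zork-style command matcher: it only "understands" commands the
# programmer explicitly anticipated, which is why unanticipated wordings fail.
KNOWN_COMMANDS = {
    "go north": "You walk north into a dark forest.",
    "north": "You walk north into a dark forest.",
    "go south": "You return to the clearing.",
    "look": "You are standing in a clearing.",
}

def respond(player_input: str) -> str:
    command = player_input.strip().lower()
    # Exact-match lookup only: no notion of synonyms, verbs, or meaning.
    return KNOWN_COMMANDS.get(command, "I didn't understand that.")

print(respond("go north"))       # You walk north into a dark forest.
print(respond("saunter north"))  # I didn't understand that.
```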
They don't actually know what we mean when we say something, at least not natively. Now, that meant that for most of our history with computers, humans have had to learn how to work with machines, not the other way around. We have had to learn commands and syntax that machines accept, and if we try to word those commands in a different way, we tend to get an error. Natural language processing attempts to flip the tables on this relationship and teach machines how to work with humans, so that we don't have to go through any sort of learning curve. We don't need to formulate our commands in a specific way to be understood. The technology works on our terms, or as close to those as we can manage. That means that programmers have to build systems that can parse language for meaning, and it also means having to build tools and machines that can handle stuff that you typically encounter in higher-level language courses.

So here's a quick rundown on some of the stuff a natural language processing approach has to take into account. First, you have grammar. Now, grammar can refer to the study of language, but generally speaking, when we say grammar, or at least when I'm using the term in the context of natural language processing, I mean a set of rules for the organization of components of a language into meaningful statements or sentences. This is a broad concept. It is a big, big idea. It actually encompasses a couple of other, also big, ideas that are important in natural language processing.

One of those is the concept of morphology. Morphology has to do with word forms. Words consist of morphemes, and a word can actually have multiple morphemes. So, for example, let's take a word like skydivers. Skydivers technically has four morphemes, and they are sky, dive, er, and s: skydivers. The morphemes only make sense if we put them in that particular order for the word skydivers; rearranged, say as dive-sky-er-s, it does not mean the same thing.
Actually, it doesn't mean anything at all. So a good system will have to understand morphology and know how words can and cannot be formed. So again, with skydivers, it knows: all right, I know the word sky, I know what that means. I know what the word dive means. Er means that this is not an action; this is actually an entity that engages in that action. Right? A skydiver is someone who skydives. And the s says it's plural, so there's more than one skydiver. That's what morphology is all about. It's the sort of internal logic of word formation.

Syntax is another big concept within grammar. Syntax, however, does not refer to word formation. It refers to sentence structure. How do we arrange words to make meaningful sentences? For example, the sentence "You must have patience, my young Padawan" follows good syntax, but "Patience you must have, my young Padawan" is a bit janky, because Yoda is all over the place with his syntax.

In addition to grammar, you also have to take into account semantics. Now, that is the study of the meaning within language. This is a tricky one, because there's a lot to unwrap here. For example, words and phrases can actually stand for different meanings. They can denote different ideas. We might use many different phrases or words to describe the same concept, right? So we might use a dozen or more different ways to say the same thing, or we might use two similar words or phrases to describe very different concepts. We might even use the same phrase to describe wildly different things, with very different meanings. Semantics gets down to what we actually mean when we say something. If you've ever had a discussion with someone and that person says, you know what I meant, that's essentially a statement that indicates that, semantically, the meaning was clear, even if the phrasing did not indicate it on the face of things.
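Before moving on, here's a rough Python sketch of the morphology idea from a moment ago, applied to skydivers. The tiny root and suffix lists are hypothetical, purely for illustration; a real morphological analyzer uses much richer rules, but the basic move of peeling off affixes and matching known roots is the same.

```python
# A toy morphological analyzer, assuming a tiny hand-made lexicon.
ROOTS = {"sky", "dive"}

def analyze(word):
    morphemes = []
    # Peel known suffixes off the end: "skydivers" -> "skydiver" -> "skydiv"
    for suffix in ("s", "er"):
        if word.endswith(suffix):
            morphemes.insert(0, suffix)
            word = word[: -len(suffix)]
    # Match known roots at the front, allowing a dropped final "e"
    # ("dive" + "er" surfaces as "diver").
    stem = []
    while word:
        match = next((r for r in ROOTS
                      if word.startswith(r) or word.startswith(r.rstrip("e"))), None)
        if match is None:
            stem.append(word)   # unknown residue; a real system would flag this
            break
        stem.append(match)
        consumed = len(match) if word.startswith(match) else len(match) - 1
        word = word[consumed:]
    return stem + morphemes

print(analyze("skydivers"))  # ['sky', 'dive', 'er', 's']
print(analyze("skydiver"))   # ['sky', 'dive', 'er']
```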
Then there is pragmatics. That's all about context. Contextual information is incredibly important in communication, and it relates a little bit to semantics: semantics is about what the words themselves mean, and pragmatics is about the context they're said in. So if I say the weather sure is nice today, on the face of it, that sounds like I'm in favor of the way the weather is, right? It sounds like, oh, I like how the weather is. But if I say that same phrase while I'm standing in a downpour and I'm clearly not happy, I'm obviously being sarcastic. I mean the opposite of what I actually said. The context of the situation changes the meaning of what I am saying, even though the actual phrasing would seem to indicate the opposite of what my meaning was. As we develop more technology that can communicate with us, we have to take pragmatics into consideration, or else machines are going to be misinterpreting what we actually mean when we say stuff. So machines are going to have to learn how to deal with stuff like sarcasm. Yeah, right.

Then we have phonology. That is the sound of a language. I talked a little bit about this in the speech recognition podcast, about how different languages have different phonemes, so I'm not going to dwell on that again. You can listen to the speech recognition podcast to learn more about it. But it is an important element in languages, especially when you get into natural language processing that is taking verbal input and not just textual input.

Then you have lexicons. That's the total vocabulary for a system. Ideally, a lexicon has not just the words, but some sort of metadata attached that indicates the meaning of words or the relationship of words with one another, though you can fudge this a little bit depending upon the implementation of the system. I'll talk a lot more about that throughout these podcasts.

Now, these can be tricky concepts for human beings, let alone for machines. Machines are very good at following strict sets of instructions, but language can sometimes defy logic.
Think of rules that apply to your native language, then just think of the exceptions that exist to those rules. Every language has exceptions to the rules that are established, and depending upon the rule and the exception, there may seem to be no rhyme or reason for the deviation from the rule. Moreover, if we want machines that are capable of understanding us and responding to our language in a meaningful way, those machines need to be able to handle the idiosyncrasies of individual speakers, to some extent. There may be regional turns of phrase or vocabulary that don't extend to the general population of speakers of the respective language. So you might encounter a person who speaks in local idioms quite a bit, and if those are not frequently used in the broader general population of that language, then you're going to have a lot of communication errors between that person and a machine that is trying to process that language. Ideally, machines would be able to understand whatever we say and interpret the meaning correctly, although we haven't even gotten to a world where human beings can do that reliably, so I don't know why I'm holding machines up to such a high standard. We definitely would want them to reach a certain level of confidence and capability; however, machines just are not quite there yet.

I'm going to talk a lot more about the history of natural language processing in just a moment, but first let's take a quick break to thank our sponsor.

The history of natural language processing is pretty darn complicated, because it involves multiple lines of research and lots of different disciplines. So we have all sorts of things that play into this, like hidden Markov models, which I talked about in the speech recognition podcast, neural networks, referencing language using mathematical vectors, and a lot more contributing to the evolution of natural language processing, and a lot of disciplines, not just computer science, but linguistics and psychology.
So there's not like a single line I can follow where it's A led to B led to C. So we're going to be jumping around a little bit. However, one of the sources I want to call out that I used while I was researching this episode was a paper written by Karen Spärck Jones called Natural Language Processing: A Historical Review. It's pretty dense, it's pretty technical, but it's also available to read online if you want a more thorough treatment of the history of the technology up to two thousand. I'm going to be skimming over quite a bit of it because, as I say, it gets really deep and really technical, and it uses a lot of shorthand to reference things, which meant that I had to do a lot of jumping down research rabbit holes to learn more. But it was a very useful starting point for this research. And also, it was published in two thousand one. Obviously a lot has happened since then; we're almost two decades out from that. But I'm going to start at the beginning and then work my way up to what's going on today.

So, early work in natural language processing actually surprised me. I was surprised at how old it was. It actually dates all the way back to the nineteen forties. Physicist and computer scientist Andrew Donald Booth proposed using computers to translate passages from one language into another, which is a type of natural language processing. You have to be able to recognize the words of one language and then map them to a similar meaning in a different language. Now, Booth's approach involved creating a word-for-word model. If the model couldn't find a match between two words, it would automatically discard the last letter on the input word and try again. It would do this until it found a match, or if it never found a match, you got an error.
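Here's a minimal Python sketch of that strip-a-letter-and-retry lookup. The two-entry dictionary is completely made up for illustration, and, as you'll hear next, the real approach also kept notes on what the stripped ending did to the meaning of the word.

```python
# A sketch of Booth's look-up-or-strip-a-letter loop, with a hypothetical
# two-entry English-to-Russian dictionary (Russian shown transliterated).
DICTIONARY = {"write": "pisat", "sky": "nebo"}

def look_up(word):
    """Drop the last letter and retry until something matches the dictionary."""
    remaining = word
    while remaining:
        if remaining in DICTIONARY:
            ending = word[len(remaining):]    # e.g. the "r" left over from "writer"
            return DICTIONARY[remaining], ending
        remaining = remaining[:-1]            # discard the last letter, try again
    raise KeyError(f"no match for {word!r}")  # the error case described above

print(look_up("write"))    # ('pisat', '')
print(look_up("writer"))   # ('pisat', 'r')  -- the ending gets handled separately
```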
But if it did find a match, it would search its memory to see if the ending of the input word could give information about what that ending does to the meaning of the word. So, for example, if you were using this to translate from English into Russian and you used the word writer, maybe writer does not show up in the lexicon, but write does, w-r-i-t-e. So the translating program tries to translate writer from English into Russian, cannot find a Russian equivalent for writer, drops the R, looks for the Russian word for write, and it finds it. Then it says, all right, well, in English, what does writer mean? What does that R do to the word write? And it looks at its memory and finds out that the letter R makes a noun out of the verb; it creates an entity that does the action, which is to write. Then it looks in the Russian lexicon and says, all right, well, is there a word in that lexicon that matches this meaning? It's kind of a slow, laborious way of doing things, but it was also very, very early.

I mean, it was just the following year, in nineteen forty-nine, that Warren Weaver produced a memorandum about machine translation, and Weaver admitted in the memorandum that such an application would likely be much more challenging than what he understood it to be, but that he was, quote, willing to expose my ignorance, hoping that it will be slightly shielded by my intentions, end quote. And I think that's rather charming. In that memo, Weaver cites a letter he wrote to Professor Norbert Wiener of MIT, and that included the following paragraph. So here's a full paragraph, actually two paragraphs, from the memorandum: recognizing fully, even though necessarily vaguely, the semantic difficulties because of multiple meanings, etcetera,
I have wondered if it were unthinkable to design a computer which would translate, even if it would translate only scientific material, where the semantic difficulties are very notably less, and even if it did produce an inelegant but intelligible result, it would seem to me worthwhile. Also, knowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods in cryptography, methods which I believe succeed even when one does not know what language has been coded, one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say, this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.

So he got this idea because of activities that were going on in World War Two, where teams were trying to decode messages. And they might decode the message, they might figure out what letters correspond to the code, but it may even be in a totally different language than one they speak. So while they are able to decode the message into its native language, they are not able to speak that language. He says, well, what if we just take that same step, and now we treat the other language as a code in and of itself and try to translate that into English, or decrypt it into English.

Weaver acknowledged that the word-to-word approach that Booth and his contemporaries were relying upon had limited utility. He wrote, quote, it is in fact amply clear that a translation procedure that does little more than handle a one-to-one correspondence of words cannot hope to be useful for problems of literary translation, in which style is important, and in which the problems of idiom, multiple meanings, etcetera, are frequent, end quote.
So there he's saying, you can't just take a foreign word, translate it into whatever the closest equivalent in English is, and hope to get the same meaning, especially in literary works, because there are all these different turns of phrase and cultural meanings that will get lost in that translation. You would have something that might technically be considered more or less correct, but would not be actually correct. You wouldn't be getting across the meaning of the author in that translation. You would just have words in an order that would make sense from a syntax perspective. In other words, you would have sentences that held up grammatically, but they wouldn't necessarily have the meaning of the original writing. Weaver's proposal was to perhaps expand the word-to-word model and create a system that would analyze not just the target word, but the words adjacent to the target, in order to determine the context of the word, the meaning of the word. As we'll see when we get a little bit further down in the timeline, this is one of the methods that folks working in natural language processing incorporated into their approach. So this was incredibly forward-thinking of Weaver.

On January seventh, nineteen fifty-four, researchers from IBM and Georgetown University demonstrated a system that was able to translate around sixty sentences from Russian into English automatically. Now, the process wasn't exactly painless. It required an operator to take a sentence written in Russian, but transliterated into the English alphabet; it wasn't in the Cyrillic alphabet. The person would then encode that sentence on punch cards. They would feed the punch cards into a seven oh one computer. The seven oh one was an IBM system, and I mentioned it in the previous episode on speech recognition. Then they would wait for the translation program's response, which would take a few seconds. The program would attempt to translate the words from Russian to English.
The demonstration was impressive, but it was limited in scope. The program had a lexicon of only two hundred fifty words or so, and it required extensive programming to cope with syntax, because word order in Russian is different than word order in English. You can think of the programming as including metadata: the researchers would tag Russian words with little signs that related to specific rules. So, for example, one of the terms the system could translate was a Russian two-word phrase. It was gyeneral mayor, and I'm butchering the Russian pronunciation, but in English it means major general. The word order is reversed in Russian. If you did a strict word-to-word translation, you would get general major as the translation, because that's the order that the Russian phrase would put it in. So the programmers would tag each word with a rule to kind of give the idea of what you should follow when you're making these translations, and by you, I mean the computer system. So the word for general got the assignment of rule twenty-one, and the word for major got the assignment of rule one. When the system encountered a word, it would look up any rules related to that word. So if it comes across a word that has the associated rule one, it would say, all right, this rule tells me I have to go back over the message and look to see if there was a rule twenty-one word in that same phrase. And if it finds a rule twenty-one word, it would then know, I need to reverse the order of these two words; this word order that appears in Russian needs to be flipped for English. Now, that's a pretty laborious process, and it doesn't work great for larger lexicons. The larger the vocabulary, the more complex the sentences can become, and the more exceptions and rules you're going to encounter. It would be really hard to implement this on a big scale, but it was an impressive display of machine translation.
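Here's a small Python sketch of that tag-and-reorder idea, using the gyeneral mayor example. The transliterated spellings, the rule numbers, and the lookup are simplified stand-ins for illustration, not the actual 1954 program.

```python
# Each Russian word (transliterated, hypothetical spellings) carries an English
# gloss plus an optional rule tag. Rule 1 means: look back for a rule-21 word
# in the same phrase and reverse the pair, because Russian puts them in the
# opposite order from English.
LEXICON = {
    "gyeneral": {"english": "general", "rule": 21},
    "mayor":    {"english": "major",   "rule": 1},
}

def translate(sentence):
    output, rules = [], []
    for word in sentence.split():
        entry = LEXICON[word]
        output.append(entry["english"])
        rules.append(entry.get("rule"))
        if entry.get("rule") == 1 and len(rules) > 1 and rules[-2] == 21:
            output[-1], output[-2] = output[-2], output[-1]   # flip the word order
    return " ".join(output)

print(translate("gyeneral mayor"))   # major general
```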
The system was essentially a vocabulary list and a long series of if-then rules. If the word is this, then look for this; if that is there, then switch the word order, essentially. According to articles, it could translate sentences designed for the system in about six seconds. But again, those sentences were designed for the system, with a very limited vocabulary, so it was a limited implementation. And it's good to point out that a lot of work in machine translation around this time focused on English and Russian, which is no big surprise. Keep in mind the time scale we're talking about, the nineteen fifties. Here, the USA and the then-USSR were not on great terms. Both countries were using pretty much every means at their disposal to analyze one another, to spy on one another, to maneuver to make certain the other nation didn't get a superior position. And we saw a lot of technological development during this period, including the space race, that was all wrapped up in this Cold War issue as well. And perhaps as no big surprise, the US government was pretty keen to fund research and development in machine translation, up to a point, that is.

In nineteen sixty-six, Joseph Weizenbaum published a computer program called Eliza. I've talked about Eliza in previous episodes of TechStuff. This was a primitive chatbot, a text-based chatbot. It mimicked a Rogerian psychotherapist. That's a discipline that was pioneered by the psychologist Carl Rogers; it's sometimes also called person-centered therapy. Eliza was strictly a text-based terminal operation. You would see a line of text pop up. It would ask you how you're doing. You could type stuff in and then it would respond to you, so you would get responses that appeared to be semi-intelligent. Typically it would be a question to ask for more information, or sometimes it would be a phrase to change the subject.
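Here's a tiny Python sketch of that kind of keyword-and-template trick. The patterns are invented for illustration, and the real Eliza script was much larger, but the core move is the same: spot a keyword, reflect part of the user's own words back as a question, and fall back to a stock response otherwise.

```python
import re

# A handful of Eliza-style rules: a pattern to spot, and a template that
# reflects part of the user's own words back as a question.
RULES = [
    (re.compile(r"i'?m so (\w+)", re.I), "What has made you {0}?"),
    (re.compile(r"everything is (\w+)", re.I), "Can you give me an example?"),
    (re.compile(r"i feel (\w+)", re.I), "Why do you feel {0}?"),
]
FALLBACK = "Please, tell me more."

def eliza_reply(text):
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK   # stock response when nothing matches

print(eliza_reply("I'm so angry right now"))           # What has made you angry?
print(eliza_reply("Everything is going wrong today"))  # Can you give me an example?
```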
So you might say something along the lines of, I'm so angry right now, and Eliza might respond with, what has made you angry? So Eliza has flipped this around in order to sustain the conversation. Then you could type in something else. Maybe you type in, everything is going wrong today, and Eliza might respond with, can you give me an example? And then so on. Eliza would give the appearance of understanding the subject, but in reality it was simply taking the input, analyzing the parts of speech, then sending back a very similar message or a related message in an effort to keep the conversation going. It might just be a placeholder. The program did not understand language or context beyond being able to parse the basic parts of a sentence and then rearrange them, or go with one of several stock responses when it didn't have a way of figuring out what it should do.

Nineteen sixty-six also saw something else, something that would end up creating a bit of a big setback for natural language processing researchers. But I'll explain more about that when we come back after a quick break to thank our sponsors.

Okay, so, nineteen sixty-six. What happened that set back research in this field? Well, that's when a report was published that had a dramatic impact on funding for R and D in machine translation. It was called the ALPAC report. ALPAC, A-L-P-A-C, stood for Automatic Language Processing Advisory Committee. This was a group consisting of various experts in fields ranging from computer science to linguistics to psychology. The US government had established the committee back in nineteen sixty-four, and they had a very simple assignment, or at least simple on the surface, which was: evaluate the progress that was being made in automatic machine translation across the board, look at what everyone's working on, give us an idea of where we are and where we're headed.
The nineteen sixty-six report essentially concluded that the field was still in its infancy, and that before any real advancements could happen, a lot more basic research in the field of computational linguistics would be required. So essentially, the report was saying, we're trying to move at a full gallop, but we still aren't really sure how to get on the horse. I'm paraphrasing, of course. One result of this was that the US government began to scale back grants for research in the field of machine translation. This was, unfortunately, exactly the opposite of what needed to happen. The US government wanted more immediate results and decided, well, if you're not going to get results right away, we're going to take that money away and put it to use somewhere else. And that made funding scarce, and it likely prolonged the amount of time it took to advance the discipline, although I should stress that work was still being performed in the United States as well as elsewhere. It's not like this brought everything to a standstill; it just slowed down quite a bit.

By nineteen sixty-seven, NLP research was straining against technological limitations. Researchers were starting to feel the very limit of what computers were able to do. Even advanced systems could take upwards of seven minutes to analyze a long sentence. Programming was still largely in assembly language, so it wasn't easy to do. And you would still have to interact with machines using punch cards, so that was also laborious, and heaven help you if you dropped all your punch cards and had forgotten to number them, because then you'd ruined your program. Work was progressing on the linguistic side, but the technological side was kind of lagging behind at this point.

One of the big decisions researchers had to make around this time was what they were going to focus on first while building out computational linguistics, because it's such a huge problem that you couldn't really tackle it wholesale. You needed to kind of focus on specifics.
So, should research focus on syntax, which is all about sentence form and structure, as I mentioned earlier, or should it focus on semantics, which is more about the underlying meaning of what was said and less about the structure of how it was said? Ultimately, most researchers, not all of them, but most of them, decided to focus on syntax. For one thing, it seemed like a more analytical thing to concentrate on, right? Like, you could define rules more easily for syntax than you could for semantics, and semantic ambiguity could be fudged a bit. You could rely heavily on output words that had a broad meaning. Using a word with a broad meaning might not produce a specific, precise result, but at least it could be, quote, not wrong, end quote. So if a word might have several translations ranging from hut to villa to bungalow to mansion, the output word might be building, because the translating program might not know which variation of that translation it should go with, but it knows that all of those different examples fall into a larger category called building. So that's not precise, but it gets the job done. You would understand what the actual noun was, in general. You would know it was a building. You might not know that it was a home, and you might not know what kind of home it was, but you would at least know that it was a structure.

So much of the work in the late nineteen sixties focused on solving syntax problems for computers, with the researchers saying, we'll worry about semantics later. Some notable groups went against the flow and decided to tackle semantics and semantically driven processing, partly because they recognized it as being a really tough problem, and some engineers just love solving really hard problems. That's kind of what thrills them, and so they chose to go that route. They began building out semantic categories and worked on semantic pattern matching, using semantic networks as a means of knowledge representation.
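Going back to that building example for a moment, here's a small Python sketch of the back-off-to-a-broader-category trick. The source word and its candidate translations are invented for illustration; the point is just that when the system can't choose among several specific options, it can emit the category they all share and stay, quote, not wrong.

```python
# A hypothetical source word with several candidate English translations.
# If nothing tells the system which one to pick, it backs off to the broad
# category the candidates share, which is vague but "not wrong."
CANDIDATES = {"dom": ["hut", "villa", "bungalow", "mansion"]}
HYPERNYM = {"hut": "building", "villa": "building",
            "bungalow": "building", "mansion": "building"}

def translate(word):
    options = CANDIDATES.get(word, [word])
    if len(options) == 1:
        return options[0]              # unambiguous: use it directly
    categories = {HYPERNYM.get(option) for option in options}
    if len(categories) == 1:
        return categories.pop()        # all candidates share one broad category
    return options[0]                  # last resort: just guess

print(translate("dom"))    # building
print(translate("hut"))    # hut
```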
Karen Spärck Jones, who wrote that history I mentioned earlier, suggests that it was in the late nineteen sixties that the research moved out of its initial phase and into a second phase, and that second phase was largely marked by the incorporation of artificial intelligence, including incorporating world knowledge in processing natural language. In nineteen sixty-eight, Terry Winograd, who today is a professor emeritus of computer science at Stanford University, was working in MIT's AI lab as part of his postgraduate studies, and he began to work on a virtual world he would call SHRDLU, "shird-loo," that's what I'm going to call it, shird-loo. It consisted of virtual objects on a virtual table, so it's all imaginary, right? He then programmed a grammar and lexicon specifically for this very, very limited imaginary world. So anything that did not involve the things that were in this imaginary world, namely the table and these virtual objects, didn't need to be dealt with at all, because it was immaterial; it didn't exist in this universe. So he only had to focus on the elements he had created, and that limited the scope of his work and made it more manageable.

His design even included the concept of persistence and memory. So imagine a table with a collection of five objects on it. You've got an imaginary table, you've got five imaginary objects on it. Two of the five imaginary objects are spheres. One of them is a green sphere, and one of them is a red sphere. You then type a command into a terminal that's giving you information about this virtual world, and you say, I want to move the red sphere over to the far end of the table. And then you send another command, only this time you don't specify red sphere. You just say, move the sphere back.
Winograd's system could actually remember that you had previously moved the red sphere, and it would apply your command to the red sphere again, under the assumption that that's what you meant. When you didn't specify, you must have meant the same sphere that you had just moved. This is a concept that we're seeing rolled out into voice assistants today, like Google Assistant. It's the ability to reference something you've already accessed without having to specify what you're talking about. So if I ask a voice assistant what the weather will be like today, and then I follow that up, after I get the information, with, what about tomorrow, a system that has this kind of capability could infer that what I meant was, what will the weather be like tomorrow, even though I didn't say it specifically like that. That's pretty advanced for nineteen sixty-eight, even though it was for this very restricted virtual world with a limited number of variables. However, Winograd discovered that the secret to his success was largely in this restriction. As you expanded the virtual world to incorporate more elements, the problem became exponentially harder.

His work, by the way, was an early example of what we call anaphora resolution, and an anaphor is what I was talking about a second ago. It's a word or phrase that refers to an earlier word or phrase within a discourse. So if I said, move the red sphere to the left, and then I said, now move it back, the "it" obviously refers to the red sphere. You would understand that, but a machine wouldn't necessarily understand it. You would have to say, move the red sphere to the left, move the red sphere back. And even "back" has an element of memory to it, because the system has to remember where the red sphere used to be. Winograd's approach was one of the early attempts to incorporate anaphora resolution into NLP models.
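Here's a minimal Python sketch of that kind of anaphora resolution: the dialogue state simply remembers the last entity that was mentioned, and a pronoun or an underspecified noun like "the sphere" resolves back to it. It's an illustration of the idea, not Winograd's actual program.

```python
class DialogueState:
    """Remembers the most recently mentioned entity so later commands can
    refer to it with a pronoun or an underspecified noun."""
    def __init__(self, known_entities):
        self.known_entities = known_entities
        self.last_mentioned = None

    def resolve(self, phrase):
        phrase = phrase.lower()
        for entity in self.known_entities:
            if entity in phrase:              # explicit mention, e.g. "red sphere"
                self.last_mentioned = entity
                return entity
        if ("it" in phrase.split() or "sphere" in phrase) and self.last_mentioned:
            return self.last_mentioned        # anaphor: fall back to the last entity
        return None

state = DialogueState(["red sphere", "green sphere"])
print(state.resolve("move the red sphere to the far end"))  # red sphere
print(state.resolve("now move the sphere back"))            # red sphere
print(state.resolve("move it to the left"))                 # red sphere
```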
Other models concentrated on translating word by word or sentence by sentence. They were incapable of maintaining relationships between words beyond that. That shift marked a change in attitude among NLP researchers of the time. A growing number of researchers felt that world knowledge and artificial intelligence were necessary if we wanted machines to be able to analyze and act upon longer forms of discourse. The early approaches to NLP were best suited to short, self-contained passages.

In nineteen seventy-one, ARPA launched the Speech Understanding Research program. I also mentioned that in the speech recognition episode; it was very important for the development of speech recognition. The goal of that program was to advance not only speech recognition but also NLP research, so that a computer could not just detect and transcribe speech, but also respond to it in some meaningful way, for example, being able to index all that information so that it is searchable. The program lasted five years. However, at the conclusion, the agency was not satisfied with the results, which technically delivered upon what was asked, but in a pretty limited implementation, so ARPA decided to cut funding. They stopped the project. This was another big blow to research in the United States, which had viewed the project as a positive development ever since the ALPAC report had pulled the rug out from under the funding earlier.

Now, I've got a lot more to say about the development of natural language processing and where we are now, as well as the history of the various voice assistants that we're familiar with today, but it's time to conclude this episode. In our next episode, we'll pick up where I left off today, and we'll continue on and talk about all of our beloved friends like Siri and Alexa. Now, if you have suggestions for future episodes of TechStuff, write me. Let me know what you want to hear. There might be a specific technology or a company or a person in tech. Maybe there's someone you want me to interview or have on as a special guest host.
You can send me an email. The address for the show is techstuff at howstuffworks dot com, or you can drop me a line on Facebook or Twitter. The handle for both of those is tech stuff H S W. Don't forget, you can follow us on Instagram. I want to see you guys over there, and I'll talk to you again really soon.

For more on this and thousands of other topics, visit howstuffworks dot com.