WEBVTT - More on NLP and where voice assistants come from 0:00:04.120 --> 0:00:07.160 Get in touch with technology with TechStuff from 0:00:07.200 --> 0:00:13.880 HowStuffWorks.com. Hey there, and welcome to TechStuff. 0:00:13.920 --> 0:00:17.520 I'm your host, Jonathan Strickland. I'm an executive producer with 0:00:17.560 --> 0:00:21.319 HowStuffWorks and I love all things tech, and 0:00:21.440 --> 0:00:26.720 this is the second episode about natural language processing, or NLP, 0:00:27.040 --> 0:00:31.040 and also natural language understanding, or NLU. The two are related. 0:00:31.800 --> 0:00:35.080 That describes the technologies and processes we use to 0:00:35.120 --> 0:00:39.080 give machines the ability to interpret and respond to language 0:00:39.120 --> 0:00:43.479 the way we use it, so not just understanding our input, 0:00:43.520 --> 0:00:47.839 but also generating output that still follows the rules of 0:00:47.960 --> 0:00:51.479 various languages. So it's all about getting machines to conform 0:00:51.600 --> 0:00:54.520 to us rather than the other way around. If you 0:00:54.640 --> 0:00:58.200 have not listened to the episode immediately before this one, 0:00:58.720 --> 0:01:00.760 you should do that, because I'm about to pick 0:01:00.800 --> 0:01:02.920 up where I left off, which was just after 0:01:03.000 --> 0:01:07.280 ARPA pulled the plug on its Speech Understanding Research project. 0:01:07.920 --> 0:01:10.920 The research under the ARPA project had shown 0:01:11.000 --> 0:01:15.280 that NLP was an even more challenging problem than had 0:01:15.319 --> 0:01:20.360 previously been anticipated. Even the simplest approaches were creating enormous 0:01:20.400 --> 0:01:22.959 demands on both the work programmers had to do to 0:01:23.000 --> 0:01:26.160 build a system out and the processing the system would 0:01:26.200 --> 0:01:29.640 have to rely upon in order to interpret language. 
Work 0:01:29.680 --> 0:01:35.280 in the late nineteen seventies ranged into psychology. NLP researchers 0:01:35.440 --> 0:01:37.720 felt a system needed to be able to identify a 0:01:37.840 --> 0:01:42.400 user's needs and goals in order to function properly. It had 0:01:42.440 --> 0:01:46.240 to understand not just the surface level meaning of a phrase, 0:01:46.760 --> 0:01:50.920 but the underlying meaning of linguistic expressions as well. Only 0:01:50.960 --> 0:01:53.880 then could you have a computer system that could collaborate 0:01:53.920 --> 0:01:56.560 with a human being in a seamless way. So, in 0:01:56.560 --> 0:01:59.640 other words, what they're saying is that you could translate 0:01:59.680 --> 0:02:03.080 stuff or interpret stuff word by word, but unless you 0:02:03.080 --> 0:02:05.800 have an understanding of what the person is trying to 0:02:05.840 --> 0:02:09.799 actually accomplish, chances are the results you're going to get 0:02:09.800 --> 0:02:12.160 back are not going to be as relevant as they 0:02:12.160 --> 0:02:15.440 could be. And so that was where the psychology was 0:02:15.480 --> 0:02:19.480 starting to take form. By the early nineteen eighties, which 0:02:19.720 --> 0:02:22.640 marks the third phase of NLP development according to 0:02:22.680 --> 0:02:26.160 the researcher Karen Spärck Jones, who I talked about in 0:02:26.160 --> 0:02:29.800 the last episode, researchers were coming to terms with the 0:02:29.840 --> 0:02:34.000 idea that a scalable NLP system that relied upon the 0:02:34.040 --> 0:02:38.160 old methods of building lexicons and syntax rules just was 0:02:38.200 --> 0:02:41.040 not practical. It required far too much work on the 0:02:41.080 --> 0:02:43.880 front end when designing a system to make a general 0:02:43.919 --> 0:02:48.040 purpose NLP application. The problem was just way too 0:02:48.040 --> 0:02:52.880 big to take that approach. 
Even with relatively narrow implementations 0:02:52.919 --> 0:02:57.080 like designing a system that would parse technical documents, where you might think, 0:02:57.360 --> 0:02:59.799 all right, well, the language used in technical documents is 0:02:59.840 --> 0:03:02.799 a subset of the language you would encounter in the 0:03:02.880 --> 0:03:07.640 quote unquote real world, even with those use cases, the 0:03:07.720 --> 0:03:10.600 old methods were proving to require far too much investment 0:03:10.639 --> 0:03:14.799 in time, money, and effort on the design front. Spärck 0:03:14.919 --> 0:03:18.919 Jones identifies the key focus during this phase as being 0:03:19.000 --> 0:03:24.160 on grammar and logic. During this phase, researchers developed several 0:03:24.240 --> 0:03:27.680 different grammar types. Now, grammars are sets of rules for 0:03:27.720 --> 0:03:31.440 analyzing and formalizing language. I would love to go into 0:03:31.480 --> 0:03:34.679 more detail about the different grammars that were developed during 0:03:34.680 --> 0:03:39.120 this phase or adopted for computational models, but honestly, it 0:03:39.160 --> 0:03:44.040 gets really, really heavy, really quickly. It gets extremely technical, 0:03:44.280 --> 0:03:46.680 though not on the technological side, but more on the 0:03:46.760 --> 0:03:50.320 linguistic side. Suffice it to say that a lot 0:03:50.360 --> 0:03:53.080 of research and debate centered around what is the best 0:03:53.120 --> 0:03:56.440 way to arrive at the meaning of language. How do 0:03:56.520 --> 0:04:00.080 we get to that? How can you ascertain what 0:04:00.200 --> 0:04:03.400 is meant by what was spoken or what was written? 
0:04:03.760 --> 0:04:07.240 The grammars were meant to direct NLP models to analyze 0:04:07.320 --> 0:04:11.680 language in different ways that were computationally viable and that 0:04:11.720 --> 0:04:15.320 wouldn't require the laborious process of programming everything in a 0:04:15.360 --> 0:04:19.280 word for word style. Another big area of focus at 0:04:19.279 --> 0:04:23.320 this time was on generation, meaning creating models that would 0:04:23.320 --> 0:04:28.040 allow machines to generate natural language responses to users, including 0:04:28.080 --> 0:04:32.240 responses that were extended, long examples of discourse, not just 0:04:32.920 --> 0:04:36.760 a quick message. While machines wouldn't be able to think, 0:04:37.480 --> 0:04:39.880 they would be able to put together a more sophisticated 0:04:39.960 --> 0:04:43.320 response than chatbots like ELIZA, which I mentioned in the 0:04:43.400 --> 0:04:46.800 last episode, could manage. So the idea being, how can 0:04:46.839 --> 0:04:51.120 we make a machine that can communicate results to a 0:04:51.160 --> 0:04:54.880 person in a way that just makes sense? It's almost 0:04:54.880 --> 0:04:57.440 as if a normal human being is chatting with you. 0:04:58.200 --> 0:05:01.360 But as we understand it, it's very difficult to do 0:05:01.440 --> 0:05:04.960 this on an extended basis. You can do it for 0:05:05.360 --> 0:05:09.280 responses to individual queries, but when you start trying to 0:05:09.320 --> 0:05:12.680 create something that can carry on an actual conversation, that's 0:05:12.680 --> 0:05:16.120 where things start to break down. In the nineties, work 0:05:16.200 --> 0:05:20.600 in NLP focused on representing words as mathematical vectors. 0:05:21.279 --> 0:05:25.480 Many words are related to one another, so for example, 0:05:25.720 --> 0:05:29.719 hotel and motel are related. They don't mean exactly the 0:05:29.760 --> 0:05:33.640 same thing, but they mean very similar things. 
Then you 0:05:33.720 --> 0:05:37.080 have a term like bed and breakfast. A bed and 0:05:37.080 --> 0:05:40.120 breakfast is similar again to a hotel or a motel. 0:05:40.200 --> 0:05:43.200 It's a different thing, but it's related. So these words 0:05:43.240 --> 0:05:46.640 have similarities. They also have differences between them, but they're 0:05:46.680 --> 0:05:49.560 all more similar to each other than if I used 0:05:49.560 --> 0:05:52.520 a different word like hospital. A bed and breakfast is 0:05:52.600 --> 0:05:54.880 more like a hotel or a motel than it is 0:05:54.920 --> 0:05:57.880 a hospital. So in other words, we can group words 0:05:57.920 --> 0:06:02.880 together into vector spaces and calculate the quote unquote distances 0:06:02.920 --> 0:06:07.240 between vectors, and that determines degrees of similarity, and this 0:06:07.320 --> 0:06:11.560 is very helpful for both translation and natural language processing. 0:06:12.040 --> 0:06:15.360 There are ways to do this that even take context 0:06:15.440 --> 0:06:18.799 into account. And this relates back to what was being 0:06:19.760 --> 0:06:26.240 suggested by Warren Weaver when I talked about that memorandum. 0:06:26.279 --> 0:06:28.960 There's a model called skip-gram, which is essentially what 0:06:29.040 --> 0:06:33.200 he was talking about. This model takes a window of 0:06:33.240 --> 0:06:36.800 words surrounding each word in a sentence to determine context, 0:06:36.920 --> 0:06:38.800 so it's not looking at it just on a word 0:06:38.960 --> 0:06:42.440 by word basis. Let's say that I write a phrase and 0:06:42.440 --> 0:06:46.520 it says, I'm going to the bank to make a withdrawal. Now, 0:06:46.560 --> 0:06:48.560 the word bank can actually refer to a couple of 0:06:48.560 --> 0:06:52.240 different things, right? It could be a financial institution, which 0:06:52.279 --> 0:06:55.000 is obviously what I do mean when I say that sentence. 
0:06:55.320 --> 0:06:58.440 But it could also mean the area right next to 0:06:58.480 --> 0:07:01.279 a river, right, the bank of a river. The skip-gram 0:07:01.320 --> 0:07:04.520 model would take each word in that sentence and 0:07:04.560 --> 0:07:07.440 then pair it with a few other words that are close 0:07:07.520 --> 0:07:10.880 by to determine the meaning of the phrase. So it's 0:07:10.880 --> 0:07:13.160 looking at I'm going to the bank to make a 0:07:13.200 --> 0:07:17.800 withdrawal. For bank, it might pair to with bank, the with bank, 0:07:18.000 --> 0:07:22.640 bank with to, bank with make, bank with a, bank with withdrawal. By looking 0:07:22.680 --> 0:07:26.440 at these pairings, the system can figure out from context 0:07:26.880 --> 0:07:30.240 that the bank I'm talking about is probably a financial institution. 0:07:30.520 --> 0:07:33.239 I'm probably not making a withdrawal from a river bank. 0:07:33.960 --> 0:07:38.600 So it's a way of machine systems figuring out the 0:07:38.640 --> 0:07:42.120 meaning of a phrase through contextual cues by using this 0:07:42.160 --> 0:07:45.800 windowed approach. And again, Warren Weaver had proposed such 0:07:45.800 --> 0:07:48.800 a thing. The vector approach would become more important as 0:07:48.800 --> 0:07:53.240 computer scientists made advances in neural networks. That approach also 0:07:53.360 --> 0:07:56.920 made machine translation much more effective because it no longer 0:07:57.000 --> 0:08:00.560 looked for word for word matches, but rather matched meaning 0:08:00.880 --> 0:08:05.880 based on vectors and probabilities. That's really important because once 0:08:05.920 --> 0:08:08.640 you determine the meaning of a phrase in one language, 0:08:09.040 --> 0:08:13.320 then you can look for a phrase in another language 0:08:13.360 --> 0:08:18.720 that most closely resembles the meaning of the original. 0:08:18.760 --> 0:08:22.679 This is the art of translation. 
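To make the windowed pairing from the bank example concrete, here is a minimal sketch in Python. It only shows how a skip-gram style model might build its (center word, context word) training pairs; it does not train any vectors. The function name `skipgram_pairs` and the window size of two words on each side are assumptions made for illustration, not details from the episode.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions.

    Hypothetical helper for illustration; real skip-gram training would feed
    these pairs to a model that learns a vector for each word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge of the window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


sentence = "i'm going to the bank to make a withdrawal".split()
bank_pairs = [p for p in skipgram_pairs(sentence, window=2) if p[0] == "bank"]
print(bank_pairs)
```

With a window of two, the word bank gets paired with to, the, to, and make. A wider window would also reach a and withdrawal, which is the kind of context that hints at the financial sense of the word.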
A real translator, someone 0:08:22.680 --> 0:08:26.040 who's translating from one language to another, is probably not 0:08:26.120 --> 0:08:28.880 doing so word for word. Rather, they're doing meaning for 0:08:29.080 --> 0:08:33.120 meaning to make certain that the intent of what is 0:08:33.160 --> 0:08:38.480 being communicated gets through, not just the vocabulary. The nineteen nineties, 0:08:38.520 --> 0:08:42.480 which Spärck Jones identifies as the fourth phase of NLP 0:08:42.600 --> 0:08:45.400 development, and which would be the final phase in her report, 0:08:46.040 --> 0:08:50.960 saw a more concentrated focus on lexicons over syntax, and 0:08:50.960 --> 0:08:55.000 it also saw more practical applications of natural language processing, 0:08:55.320 --> 0:08:57.880 as well as leveraging the World Wide Web to help train 0:08:58.000 --> 0:09:01.840 natural language processing models. There was a rich source 0:09:02.440 --> 0:09:06.120 of natural language on the World Wide Web. Pretty much every 0:09:06.120 --> 0:09:09.800 permutation you could imagine, from people who are very careful 0:09:10.160 --> 0:09:13.560 in the way they construct sentences and paragraphs to people 0:09:13.559 --> 0:09:17.040 who are much more cavalier in the way they use language, 0:09:17.040 --> 0:09:21.680 whether purposefully or otherwise. And also, that report from Spärck 0:09:21.800 --> 0:09:25.480 Jones again is dated October two thousand one, so that's 0:09:25.520 --> 0:09:30.160 where her work stops for that particular report. But nearly 0:09:30.160 --> 0:09:34.240 two decades have passed since that time. So in that time, 0:09:34.280 --> 0:09:36.719 what has changed? Well, I would argue we are now 0:09:36.720 --> 0:09:40.520 in a new phase of NLP development, one marked largely 0:09:40.600 --> 0:09:43.680 by the rise of a few key technologies. One of 0:09:43.679 --> 0:09:47.640 those is cloud computing. 
Cloud computing has removed the necessity 0:09:47.760 --> 0:09:51.840 to build complex capabilities into end machines like a 0:09:51.880 --> 0:09:55.640 smartphone or a computer terminal. So an organization can create 0:09:55.679 --> 0:10:00.480 a cloud infrastructure which consists of powerful machines and databases. 0:10:00.679 --> 0:10:03.680 Those machines could be real, they could be virtual. Virtual 0:10:03.760 --> 0:10:07.040 machines are hosted on real hardware, but they're running virtual 0:10:07.200 --> 0:10:11.560 implementations of various operating systems. So these machines provide the 0:10:11.600 --> 0:10:14.760 processing power and they house the systems that are necessary 0:10:14.800 --> 0:10:17.959 to parse language and respond appropriately. So you can think 0:10:17.960 --> 0:10:21.320 of it as the brains of natural language processing. They 0:10:21.320 --> 0:10:24.439 all exist on these very powerful computers that are in 0:10:24.559 --> 0:10:28.840 data centers. The widespread availability of the Internet and the 0:10:28.880 --> 0:10:31.679 fact that it's pretty easy to stay connected in many 0:10:31.720 --> 0:10:35.360 parts of the world make this possible. So the end 0:10:35.480 --> 0:10:39.640 user feels like the capabilities are actually housed on whatever 0:10:39.679 --> 0:10:41.719 device he or she is using, like if it's a 0:10:41.760 --> 0:10:44.480 smartphone or a computer. But in reality, all the work 0:10:44.559 --> 0:10:48.400 is actually taking place potentially thousands of miles away in 0:10:48.440 --> 0:10:51.160 a data center, and it's just being sent to you. 0:10:51.360 --> 0:10:54.520 The queries are being sent to the center and 0:10:54.559 --> 0:10:58.240 the responses are being sent back to your device. 
Another 0:10:58.280 --> 0:11:00.880 big development that has helped significantly is the 0:11:00.920 --> 0:11:04.199 pairing of artificial neural networks with 0:11:04.480 --> 0:11:07.679 the process of deep learning. A neural 0:11:07.720 --> 0:11:10.920 network processes information in a way similar to how our 0:11:10.960 --> 0:11:13.960 brains do it. Every node in a neural network represents 0:11:13.960 --> 0:11:18.360 a neuron, and it executes an operation upon data 0:11:18.559 --> 0:11:21.920 and then hands off this data, which has now been 0:11:21.960 --> 0:11:25.960 altered, it's been transformed by this operation, to another layer 0:11:26.080 --> 0:11:29.560 of neurons within the network, which do further processing, and 0:11:29.600 --> 0:11:31.920 so on and so forth. The system as a whole 0:11:32.040 --> 0:11:36.520 can evaluate calculations and assign confidence levels to them. Deep 0:11:36.600 --> 0:11:40.600 learning passes information through numerous layers to transform data and, 0:11:40.679 --> 0:11:44.920 in the context of natural language processing, extract meaning from 0:11:44.960 --> 0:11:47.720 that information. Now I've got a bit more to say 0:11:47.720 --> 0:11:50.560 about natural language processing in general, and then after that 0:11:50.640 --> 0:11:55.920 I'm going to transition to talk about recent implementations like Siri, Alexa, 0:11:55.960 --> 0:11:59.800 Google Assistant, and Cortana. But first let's take a quick 0:12:00.080 --> 0:12:10.000 break and thank our sponsor. In two thousand and sixteen, 0:12:10.040 --> 0:12:14.280 Google announced a system that could analyze syntax and recognize 0:12:14.320 --> 0:12:19.160 the various elements of a sentence, including verbs, nouns, adjectives, 0:12:19.160 --> 0:12:22.800 and other components. 
The system's name is sort of a 0:12:22.840 --> 0:12:27.760 snapshot of the zeitgeist. It was called, and I'm 0:12:27.840 --> 0:12:32.720 not making this up, Parsey McParseface. It really was. 0:12:33.360 --> 0:12:37.760 This is a parser, a piece of software that is meant 0:12:37.800 --> 0:12:42.880 to analyze inputs and determine what the relationships are between 0:12:43.000 --> 0:12:46.840 various components within the input. So it's parsing out the 0:12:46.960 --> 0:12:50.120 meaning of a phrase by looking at the relationship between 0:12:50.160 --> 0:12:53.760 all the different components. It was designed specifically for English 0:12:53.920 --> 0:12:57.920 language inputs. In that same announcement, Google unveiled an open 0:12:57.960 --> 0:13:03.280 source neural network framework called SyntaxNet. SyntaxNet tags 0:13:03.360 --> 0:13:07.319 every word in an input with a part of speech tag, 0:13:07.679 --> 0:13:10.800 and the tag describes the purpose of that word, what 0:13:10.880 --> 0:13:15.520 purpose it serves within the sentence, within the context 0:13:15.640 --> 0:13:18.600 of that input. So, for example, it might be the 0:13:18.679 --> 0:13:21.920 subject of the sentence, or it could be an object 0:13:22.200 --> 0:13:25.040 of the sentence, or it might be the action, the 0:13:25.200 --> 0:13:28.720 root, the user wishes to perform upon the object. So 0:13:29.520 --> 0:13:31.720 if it identifies a verb, that tends to be the 0:13:31.840 --> 0:13:36.960 root of the command. The system also determines the syntactic 0:13:37.040 --> 0:13:40.320 relationship between all the words, so not just what each 0:13:40.360 --> 0:13:43.560 word's purpose is, but how that word relates to all 0:13:43.679 --> 0:13:46.960 the other words within the input, and then it creates 0:13:47.000 --> 0:13:52.080 a dependency tree which illustrates which words depend upon others. 0:13:52.640 --> 0:13:56.080 SyntaxNet also makes use of beam search. 
That's the 0:13:56.120 --> 0:13:58.959 strategy I talked about in the Speech Recognition podcast a 0:13:59.040 --> 0:14:05.200 couple of podcasts ago, and it's there to help eliminate ambiguity. 0:14:05.320 --> 0:14:10.320 As sentence length increases, the number of possible interpretations of 0:14:10.360 --> 0:14:14.839 that sentence also increases dramatically. Right, the more complicated a 0:14:14.920 --> 0:14:18.840 sentence is, the easier it is to misinterpret what that 0:14:18.960 --> 0:14:21.760 sentence means, especially if you're looking at it from the 0:14:21.760 --> 0:14:24.480 perspective of a machine. So how does the computer know 0:14:25.000 --> 0:14:29.320 which interpretation is the right one? SyntaxNet takes a 0:14:29.480 --> 0:14:33.040 sentence and starts to parse it, beginning with a left 0:14:33.040 --> 0:14:35.520 to right approach for English, so it starts at the 0:14:35.560 --> 0:14:38.880 beginning of the sentence and works its way through. Essentially, 0:14:38.920 --> 0:14:42.360 it creates a hypothesis as to how the words relate 0:14:42.400 --> 0:14:45.080 to each other. But as it goes along, it detects 0:14:45.120 --> 0:14:49.800 possible alternate interpretations, so it starts to assign a probability 0:14:49.840 --> 0:14:54.040 score to each interpretation, essentially how sure it is that 0:14:54.200 --> 0:14:56.800 this is on the right track. And it will keep 0:14:56.920 --> 0:15:00.680 multiple possible answers as it parses, so it doesn't toss 0:15:00.760 --> 0:15:04.120 them aside immediately. It says, all right, right now, 0:15:04.440 --> 0:15:07.280 I'm pretty sure answer A is correct, but I'm going 0:15:07.320 --> 0:15:10.320 to hold on to B and C just in case. 
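The idea of holding on to answers A, B, and C while reading left to right can be sketched as a toy beam search in Python. This is not how SyntaxNet is implemented; it is a simplified part-of-speech tagging example. The sentence "I saw her duck", the tag probabilities, the `TRANSITIONS` table, and the beam width of three are all made-up assumptions for illustration.

```python
import math

# Made-up probabilities that one tag follows another; unknown pairs get 0.5.
TRANSITIONS = {
    ("PRON", "VERB"): 0.6, ("PRON", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
}


def transition(prev_tag, tag):
    return TRANSITIONS.get((prev_tag, tag), 0.5)


def beam_search(candidates, beam_width=3):
    """Keep only the `beam_width` best-scoring tag sequences after each word."""
    beams = [((), 0.0)]  # (tag sequence so far, log-probability)
    for options in candidates:
        expanded = []
        for seq, score in beams:
            prev = seq[-1] if seq else None
            for tag, prob in options.items():
                new_score = score + math.log(prob) + math.log(transition(prev, tag))
                expanded.append((seq + (tag,), new_score))
        # Rank every extended hypothesis and discard the low scorers.
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]
    return beams


# Hypothetical per-word tag guesses for "I saw her duck".
candidates = [
    {"PRON": 1.0},
    {"VERB": 0.9, "NOUN": 0.1},
    {"PRON": 0.5, "DET": 0.5},
    {"NOUN": 0.6, "VERB": 0.4},
]
beams = beam_search(candidates, beam_width=3)
for seq, score in beams:
    print(seq, round(math.exp(score), 4))
```

Under these invented numbers, the top hypothesis treats her as a determiner and duck as a noun, while the reading where duck is a verb survives in the beam with a lower score instead of being thrown away immediately.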
Now, 0:15:10.360 --> 0:15:13.920 if one interpretation has a particularly low score and there 0:15:13.960 --> 0:15:17.720 are several other potential interpretations that have higher scores, the 0:15:17.760 --> 0:15:20.760 system will discard the low score with the assumption that 0:15:20.840 --> 0:15:22.960 it just can't be the right answer. It just doesn't make 0:15:23.000 --> 0:15:27.720 sense. In well formed text, that is, formal text, something 0:15:27.760 --> 0:15:30.840 that has been written in a very formal approach, Parsey 0:15:31.000 --> 0:15:33.680 McParseface does a pretty good job. In fact, a 0:15:33.760 --> 0:15:38.480 really good job. It has an accuracy rating that's approaching the 0:15:38.560 --> 0:15:43.040 level of a human linguist who is trained in parsing sentences. 0:15:43.920 --> 0:15:46.360 Humans who have that kind of training average at around 0:15:46.880 --> 0:15:52.360 ninety-seven percent accuracy, so Parsey McParseface is right behind them. 0:15:52.440 --> 0:15:57.120 But the key phrase there is well formed text. If 0:15:57.160 --> 0:16:00.920 you present Parsey McParseface with more loosey-goosey language, 0:16:01.200 --> 0:16:04.560 such as what you might find on your average Internet website, 0:16:05.560 --> 0:16:08.360 which I know is redundant, Parsey McParseface has a 0:16:08.400 --> 0:16:12.520 much more modest ninety percent success rating. It's still impressive, but 0:16:12.560 --> 0:16:15.920 it's a significant drop in accuracy. Now, these sorts of 0:16:15.920 --> 0:16:18.960 tools have been used in various Google products for a while, 0:16:19.160 --> 0:16:22.600 not just Google Assistant, which is the one that people 0:16:22.640 --> 0:16:24.520 tend to think about because it's the one we interact 0:16:24.560 --> 0:16:28.040 with when we are speaking to Google, but also in 0:16:28.120 --> 0:16:30.920 stuff like Gmail. 
If you've used Gmail and you've noticed 0:16:30.960 --> 0:16:34.320 that sometimes you get automated responses popping up that you 0:16:34.400 --> 0:16:37.280 can choose as an option, so instead of writing an email, 0:16:37.280 --> 0:16:40.360 you just select sounds good or I'll see you then, 0:16:40.520 --> 0:16:43.120 or whatever it may be, then you have seen this 0:16:43.160 --> 0:16:46.000 technology at work, or at least you've seen the product 0:16:46.080 --> 0:16:49.280 of its work. Those automated responses are the result of 0:16:49.320 --> 0:16:54.080 a natural language understanding system that's parsing that email, identifying 0:16:54.120 --> 0:16:57.200 whatever the salient points are in the message, and then 0:16:57.240 --> 0:17:00.520 generating what are hopefully logical responses to it, so you 0:17:00.560 --> 0:17:02.520 can just choose one instead of taking the time to 0:17:02.520 --> 0:17:05.679 actually type something in. One of the key elements in 0:17:05.800 --> 0:17:09.520 natural language understanding is creating machines that can communicate with 0:17:09.640 --> 0:17:13.600 us and explain how they arrived at a certain result. Now, 0:17:13.640 --> 0:17:16.880 this falls into the concept of transparency, which is really 0:17:16.960 --> 0:17:19.919 important when we're talking about artificial intelligence. There's a 0:17:20.000 --> 0:17:24.119 real fear that AI and neural networks are careening toward 0:17:24.240 --> 0:17:28.320 a black box scenario, and a black box describes any 0:17:28.400 --> 0:17:31.240 system where the workings of the system are hidden from 0:17:31.240 --> 0:17:35.719 our view. We cannot see how something works, and so 0:17:35.760 --> 0:17:38.159 we can only make guesses as to what's going on. 
0:17:38.760 --> 0:17:40.760 I know a lot of gear heads who are exasperated 0:17:40.760 --> 0:17:44.719 with the way vehicle manufacturers are creating more of their cars, trucks, 0:17:44.760 --> 0:17:48.879 and other vehicles with systems that aren't easily accessible or modifiable. 0:17:49.320 --> 0:17:53.160 They consider those cars to be black boxes. It makes 0:17:53.160 --> 0:17:55.480 it much harder to work on a vehicle if you 0:17:55.520 --> 0:17:59.720 don't have the proprietary tools and knowledge that are specifically 0:17:59.760 --> 0:18:02.840 for that system. Now take that concept and apply it 0:18:02.840 --> 0:18:06.000 to AI, and it gets pretty scary pretty fast, particularly 0:18:06.280 --> 0:18:09.000 since we're relying on AI to do some important stuff 0:18:09.040 --> 0:18:13.239 like drive cars, make stock option deals, or help with 0:18:13.320 --> 0:18:17.399 healthcare issues, and so one area of work focuses on 0:18:17.440 --> 0:18:21.159 giving machines the capability to explain themselves, not just to 0:18:21.200 --> 0:18:24.440 provide an answer, but explain why they came up with 0:18:24.480 --> 0:18:28.120 that answer. So imagine a chess playing computer. It's playing 0:18:28.119 --> 0:18:30.200 a game of chess and it makes a move. Then 0:18:30.240 --> 0:18:33.040 imagine being able to ask the computer, why did you 0:18:33.119 --> 0:18:36.200 make that move, and then the computer could actually answer 0:18:36.280 --> 0:18:39.680 the question, explaining the logic behind the move it made. 0:18:40.119 --> 0:18:43.920 Now extend that concept to all sorts of different AI applications. 0:18:44.240 --> 0:18:46.880 If an AI stock trader suddenly buys up a ton 0:18:46.880 --> 0:18:50.080 of stocks, you might want to know exactly what prompted 0:18:50.160 --> 0:18:53.840 that decision, why did it make that purchase? 
And you 0:18:53.880 --> 0:18:56.479 can easily imagine situations in which you'd want to know 0:18:56.560 --> 0:18:59.480 why a machine behaved the way it did. Why did 0:18:59.720 --> 0:19:03.399 an autonomous car choose a particular route? Why did a 0:19:03.440 --> 0:19:07.920 healthcare program suggest a particular diagnosis? Without getting those answers, 0:19:07.920 --> 0:19:11.040 we're just putting our faith into machines blindly, and giving 0:19:11.040 --> 0:19:15.120 a computer the ability to generate meaningful and, equally important, 0:19:15.240 --> 0:19:20.080 relevant explanations would be extremely helpful. So what are some 0:19:20.119 --> 0:19:24.440 of the uses of natural language processing technology? Well, one 0:19:24.520 --> 0:19:28.160 fairly simple application is in spelling and grammar checking software. 0:19:28.200 --> 0:19:30.520 If you've used a word processing program over the last 0:19:30.560 --> 0:19:33.480 few years, or the last couple of decades, chances are you're 0:19:33.480 --> 0:19:37.960 familiar with automatic real time spell check and grammar check features. 0:19:38.680 --> 0:19:40.760 This is possible because of the work that has been 0:19:40.800 --> 0:19:44.120 done in natural language processing. Spell check needs to take 0:19:44.160 --> 0:19:47.560 into consideration not only if a word is spelled correctly, 0:19:47.600 --> 0:19:51.760 if a word matches a word that's in the computer's lexicon, 0:19:52.320 --> 0:19:55.639 but also if it's the right word for that instance. 0:19:56.000 --> 0:19:58.320 In English, we have a lot of homonyms. Those are 0:19:58.320 --> 0:20:01.760 words that sound the same but have different meanings. 0:20:02.080 --> 0:20:05.040 Now you can have homonyms that are spelled exactly the 0:20:05.080 --> 0:20:07.960 same way, and those really aren't a problem because the 0:20:07.960 --> 0:20:12.480 reader can pick up on what meaning you intended through context. 
Though, 0:20:12.520 --> 0:20:15.960 if you're using natural language processing to do a translation, 0:20:16.400 --> 0:20:18.879 then the NLP system needs to be able to determine 0:20:18.960 --> 0:20:22.480 which meaning the original author intended. In my earlier example 0:20:22.520 --> 0:20:26.640 about making a withdrawal at the bank, there's a homonym, 0:20:26.680 --> 0:20:29.160 you know, two versions of bank, but they mean 0:20:29.200 --> 0:20:32.040 two different things. I could also talk about bank 0:20:32.080 --> 0:20:34.400 in the sense of a verb, as in banking off 0:20:34.520 --> 0:20:39.040 of something, but you get the point. There are also 0:20:39.119 --> 0:20:42.760 homonyms that sound the same but are spelled differently, and 0:20:42.800 --> 0:20:45.560 they have different meanings as well. So for example, the 0:20:45.680 --> 0:20:49.480 dreaded to as in T-O, too as in 0:20:49.760 --> 0:20:54.400 T-O-O, and two as in T-W-O combo. Those are 0:20:54.440 --> 0:20:58.159 three words with three different applications, three different spellings. A 0:20:58.320 --> 0:21:01.359 good spell check algorithm will be able to determine if 0:21:01.359 --> 0:21:04.960 you've used the correct one in any instance. So if 0:21:04.960 --> 0:21:10.040 you write that's too sweet, but you're 0:21:10.240 --> 0:21:13.920 using the number two, just in word form, the spell 0:21:14.040 --> 0:21:16.280 check will give you the old heads up and say 0:21:16.600 --> 0:21:19.399 I think you meant T-O-O, not T-W-O. 0:21:20.160 --> 0:21:22.760 Fun fact, I typed that sentence into Google Docs and 0:21:22.800 --> 0:21:26.000 it said you're totes fine, bro. Didn't notice it at all. 0:21:26.560 --> 0:21:30.040 Grammar checkers have to be able to analyze sentence structure 0:21:30.160 --> 0:21:32.840 and word choice and compare them against the grammar program for 0:21:32.880 --> 0:21:35.920 the system. 
This might also help determine if the word 0:21:35.960 --> 0:21:39.240 you used was the correct one. So, for example, affect 0:21:39.640 --> 0:21:45.080 versus effect. Affect is a verb: you affect something. Effect 0:21:45.240 --> 0:21:49.159 is usually a noun. It's typically the result of some action. 0:21:49.280 --> 0:21:52.520 So I could affect a drum, which is a dumb 0:21:52.560 --> 0:21:55.320 thing to say, and the effect might be that the 0:21:55.359 --> 0:21:58.680 sound I played hurt your ears. Now, if you spell 0:21:58.760 --> 0:22:02.160 the word correctly and the spell checker is only comparing 0:22:02.200 --> 0:22:04.800 the words you type against a lexicon to see if 0:22:04.840 --> 0:22:07.159 there's a match, you might not get an indication that 0:22:07.200 --> 0:22:10.000 anything is wrong because the computer system is saying, well, 0:22:10.000 --> 0:22:12.959 that word is spelled correctly. It doesn't realize it's the 0:22:12.960 --> 0:22:15.760 wrong word. But if it has a way of checking grammar, 0:22:15.760 --> 0:22:17.760 it can also make sure you're using the right word 0:22:17.840 --> 0:22:21.520 in the right context. Search engines such as Google use 0:22:21.600 --> 0:22:24.440 natural language processing to determine what it is you're looking 0:22:24.480 --> 0:22:26.800 for, right? So when you're typing in a search and 0:22:26.840 --> 0:22:30.280 you hit the search button, you might get a little 0:22:31.240 --> 0:22:34.920 notification that says, maybe you meant this other thing, 0:22:35.040 --> 0:22:38.160 or maybe you need to search for this terminology. That's 0:22:38.160 --> 0:22:41.040 a useful feature since not everyone thinks of search the 0:22:41.119 --> 0:22:43.760 same way. 
I could tell a dozen people to go 0:22:43.800 --> 0:22:48.159 on Google and pull up information about Benjamin Franklin and 0:22:48.200 --> 0:22:51.479 the story about the kite, and those folks might go 0:22:51.640 --> 0:22:54.639 and perform their searches in twelve different ways. But the 0:22:54.680 --> 0:22:57.560 search engine's job is to return the best results based 0:22:57.560 --> 0:23:00.480 on the query, which means it needs to suss out 0:23:00.520 --> 0:23:04.160 what the searcher is actually looking for. So even if 0:23:04.480 --> 0:23:08.160 the twelve people all type twelve different ways of looking 0:23:08.240 --> 0:23:11.280 up this information about Benjamin Franklin and the kite story, 0:23:11.880 --> 0:23:16.320 it should respond with the most relevant results. And maybe 0:23:16.320 --> 0:23:19.919 people get slightly different search results based upon the query, 0:23:20.000 --> 0:23:22.760 but they should be more or less the same. And 0:23:22.800 --> 0:23:24.359 it can also look out for you. It could give 0:23:24.359 --> 0:23:26.880 you suggestions for search terms, should you use an incorrect 0:23:26.880 --> 0:23:31.920 spelling or approximate a spelling, or something like that. 0:23:32.720 --> 0:23:36.320 One of the areas of opportunity for natural language processing 0:23:36.320 --> 0:23:39.160 applications in the near future is handling the massive amounts 0:23:39.200 --> 0:23:42.960 of information in big data applications. So, for example, a 0:23:43.040 --> 0:23:47.159 lawyer might want to search historical legal results using natural 0:23:47.240 --> 0:23:50.040 language to look for precedents that might help his or 0:23:50.040 --> 0:23:53.639 her case in the courtroom. A pharmaceuticals company might need 0:23:53.680 --> 0:23:58.080 to search information about clinical trials, doctors' notes, patient testimonials, 0:23:58.119 --> 0:24:01.560 and related information. 
And the amount of information represented by 0:24:01.560 --> 0:24:04.800 big data is truly astounding. It's enormous. It's way too 0:24:04.880 --> 0:24:07.960 much for any human to sort through. So developing a 0:24:07.960 --> 0:24:10.800 method for computers to parse a query and return relevant 0:24:10.840 --> 0:24:14.520 results is highly desirable. For a computer to understand that 0:24:14.680 --> 0:24:19.720 context, understanding in air quotes, and to be able to give 0:24:19.760 --> 0:24:23.840 you results based upon your questions, that would be incredibly 0:24:23.920 --> 0:24:27.080 valuable for lots of different industries. And we started off 0:24:27.119 --> 0:24:30.880 talking about machine translation at the early stages of natural 0:24:30.960 --> 0:24:34.640