WEBVTT - Computer-Generated Communication 0:00:04.400 --> 0:00:12.760 Welcome to Tech Stuff, a production from iHeartRadio. Hey there, 0:00:12.760 --> 0:00:16.480 and welcome to Tech Stuff. I'm your host, Jonathan Strickland. 0:00:16.520 --> 0:00:19.000 I'm an executive producer with iHeartRadio and I 0:00:19.079 --> 0:00:22.599 love all things tech, and today I want to tackle 0:00:22.840 --> 0:00:29.480 a really interesting, complicated, and potentially scary topic, and that 0:00:29.640 --> 0:00:34.360 is predictive text generation. And I know that sounds weird 0:00:34.400 --> 0:00:36.760 to say potentially scary, but you know, stick with me. 0:00:37.360 --> 0:00:40.720 I'm sure many of you have seen social media posts 0:00:40.760 --> 0:00:45.080 that say things like type "I am the" on your 0:00:45.120 --> 0:00:48.519 phone and then generate a result using the middle option 0:00:48.840 --> 0:00:52.280 of predictive text. So you know, just for example, I 0:00:52.320 --> 0:00:54.600 did that. If I do that on my phone, then 0:00:54.920 --> 0:00:58.320 I get "I am the only one who can help 0:00:58.400 --> 0:01:04.679 me with this." Oh, too real, predictive text. I mean, 0:01:04.680 --> 0:01:07.360 I'm the only one who researches and writes these episodes. 0:01:07.400 --> 0:01:11.840 That's, it's way too real. But the whole meme of 0:01:12.000 --> 0:01:17.120 using predictive text to generate seemingly meaningful or, you know, 0:01:17.240 --> 0:01:22.120 sometimes wildly absurd phrases is just part of what I 0:01:22.200 --> 0:01:25.640 want to talk about today. Now, the reason this topic 0:01:25.920 --> 0:01:29.120 jumped at me is because of a recent news article 0:01:29.200 --> 0:01:32.600 that I read over on The Verge. The article, which 0:01:32.680 --> 0:01:35.959 was written by Kim Lyons, has the title "A college 0:01:36.080 --> 0:01:40.959 student used GPT-3 to write fake blog posts and 0:01:41.120 --> 0:01:44.560 ended up at the top of Hacker News."
Now, as 0:01:44.600 --> 0:01:49.080 the headline indicates, a computer science student used a predictive 0:01:49.160 --> 0:01:52.960 text engine called GPT-3, a beta build of it 0:01:53.040 --> 0:01:58.000 in fact (that stands for Generative Pre-trained Transformer), and 0:01:58.040 --> 0:02:01.400 then generated a blog post that was featured on a site 0:02:01.400 --> 0:02:04.320 called Hacker News as if it were a piece written 0:02:04.320 --> 0:02:07.960 by a flesh and blood human being. What's more, a 0:02:08.040 --> 0:02:10.919 thread on Reddit showed that only a few people were 0:02:10.960 --> 0:02:13.920 picking up on the feeling that something hinky was going on, 0:02:14.000 --> 0:02:17.240 and that perhaps the blog post had not been written 0:02:17.760 --> 0:02:21.040 but generated. And Lyons goes on to point out that 0:02:21.120 --> 0:02:24.680 the fact that there's a lot of, you know, not 0:02:25.000 --> 0:02:28.359 very good writing on the Internet makes it a little 0:02:28.400 --> 0:02:32.359 harder to suss out a decent generated post as opposed 0:02:32.400 --> 0:02:35.240 to a written one. It's not so much that the 0:02:35.280 --> 0:02:39.480 AI has become super awesome at writing, but rather that 0:02:39.520 --> 0:02:42.400 we've kind of lowered the bar more than a little. 0:02:42.680 --> 0:02:44.760 This kind of plays into the whole concept of a 0:02:44.800 --> 0:02:48.720 Turing test. So, just to go off on a tangent here, 0:02:48.919 --> 0:02:50.960 this isn't in my notes, I'm just going to speak 0:02:51.360 --> 0:02:54.880 off the cuff. The Turing test is named after Alan Turing, 0:02:55.560 --> 0:03:01.600 famous computer scientist, and the idea.
Nowadays, it's kind of 0:03:01.600 --> 0:03:05.400 evolved into this idea of you have a series of 0:03:05.639 --> 0:03:10.040 interviews that a person does over a computer, and some 0:03:10.160 --> 0:03:14.119 of the interviewees are people and some of them are 0:03:14.840 --> 0:03:18.960 chatbots, essentially, and the goal of this whole exercise 0:03:19.080 --> 0:03:21.000 is to see if the person who's doing the interview 0:03:21.560 --> 0:03:25.679 can consistently tell if the other entity on the other 0:03:25.680 --> 0:03:28.359 side of the interview is a person or if it's 0:03:28.360 --> 0:03:32.239 a chatbot. And if you pass with a certain percentage, 0:03:32.520 --> 0:03:35.040 you would say that the chatbot has passed the 0:03:35.040 --> 0:03:38.240 Turing test, that people are unable to tell the difference 0:03:38.280 --> 0:03:41.080 between the chatbot and a real human being, and 0:03:41.080 --> 0:03:43.880 that this is kind of one of the markers for 0:03:44.040 --> 0:03:48.600 artificial intelligence. We're gonna be dipping into that sort of 0:03:48.680 --> 0:03:53.520 thing with this discussion as well. So today I really 0:03:53.520 --> 0:03:56.520 wanted to dive into the whole concept of predictive text 0:03:56.560 --> 0:04:00.000 and how it's done and how it could absolutely destroy 0:04:00.120 --> 0:04:02.960 platforms like Facebook in the future. That's how I'm going 0:04:03.000 --> 0:04:06.040 to end this episode, so stick around, but we have 0:04:06.200 --> 0:04:10.760 to build on this gradually. So let's start at the 0:04:10.880 --> 0:04:14.600 very beginning, which, according to this woman who's singing outside 0:04:14.600 --> 0:04:17.320 my window, is a very good place to start. And 0:04:17.400 --> 0:04:21.080 we are going to start with a particularly tricky concept 0:04:21.200 --> 0:04:25.120 for a former English lit major to try and explain, 0:04:25.680 --> 0:04:29.560 and this is called a Markov model.
It's named after 0:04:29.760 --> 0:04:35.920 a mathematician named Andrey Andreyevich Markov, and he was born 0:04:35.920 --> 0:04:39.400 in Russia in eighteen fifty six, and he did a 0:04:39.400 --> 0:04:43.840 lot of work on an area of mathematics called stochastic processes. 0:04:44.560 --> 0:04:48.800 But that just raises another question, right, what does stochastic mean? Well, 0:04:48.839 --> 0:04:53.640 a stochastic variable is one that is randomly determined. A 0:04:53.680 --> 0:04:59.040 stochastic system has a random probability pattern that you can study, 0:04:59.080 --> 0:05:03.720 but you can't predict it precisely. There's always uncertainty. So 0:05:03.760 --> 0:05:07.880 you can assign probabilities as to how the pattern will form, 0:05:08.480 --> 0:05:11.720 but those are just indications of how likely a particular 0:05:11.800 --> 0:05:15.320 pattern will form, not a guarantee. So let's take a 0:05:15.400 --> 0:05:19.640 very simple example, and let's pick something really random. Let's 0:05:19.680 --> 0:05:22.760 talk about my two year old niece. So let's say 0:05:22.960 --> 0:05:25.520 my niece is standing in the middle of a room 0:05:25.760 --> 0:05:29.560 and I walk in. Now, based on my past interactions 0:05:29.800 --> 0:05:34.480 with this random creature, I know my niece is likely 0:05:34.560 --> 0:05:38.000 to do one of three things. She is going to 0:05:38.120 --> 0:05:40.799 run at me and grab my hand, and then boss 0:05:40.880 --> 0:05:43.359 me around and put me someplace and tell me I 0:05:43.400 --> 0:05:46.159 have to stay there. She's going to run away from 0:05:46.200 --> 0:05:49.760 me and then hide and then demand very loudly that 0:05:49.839 --> 0:05:52.760 I come find her. She has not, I should add, 0:05:52.839 --> 0:05:57.960 quite grasped the concept of hiding. Or she is going 0:05:58.040 --> 0:06:02.520 to ignore me and sing and/or dance. Those are 0:06:02.560 --> 0:06:05.440 the things that she typically does.
There are other things 0:06:05.560 --> 0:06:08.760 she might do as well, but they happen much less frequently. 0:06:09.080 --> 0:06:12.440 So let's say I want to sketch out this scenario 0:06:12.560 --> 0:06:16.640 on paper. I might start with the scenario: my 0:06:16.720 --> 0:06:19.400 niece is in a room and I come into the room. 0:06:19.640 --> 0:06:22.440 Then I would draw little bubbles on my paper 0:06:22.800 --> 0:06:26.960 to represent the potential actions, or states as we would 0:06:26.960 --> 0:06:30.240 call them in a Markov chain, that could follow this 0:06:30.440 --> 0:06:33.560 input of me walking into the room. Now, based on 0:06:33.600 --> 0:06:36.640 the number of times I've seen her respond before, I 0:06:36.680 --> 0:06:41.440 could weight each of those states with a certain probability. If, 0:06:41.640 --> 0:06:44.680 for example, she runs at me and grabs my hand 0:06:44.760 --> 0:06:48.039 then bosses me around more than half the time, I 0:06:48.080 --> 0:06:52.760 can weight that outcome, as you know. And does that 0:06:52.839 --> 0:06:54.800 mean the next time I walk into a room that 0:06:54.880 --> 0:07:00.160 she's going to do that? No, each incident is random. 0:07:00.160 --> 0:07:03.400 I'm just illustrating how likely a particular outcome is going 0:07:03.440 --> 0:07:07.080 to be. I would then assign probabilities for the other 0:07:07.200 --> 0:07:11.280 two outcomes I outlined, and maybe just ignore all 0:07:11.280 --> 0:07:16.200 the outliers and say that one of them is you know, likely, 0:07:16.240 --> 0:07:18.880 which means the third one is only five percent likely 0:07:18.960 --> 0:07:22.320 to happen because it has to add up to one hundred percent. Now,
0:07:22.440 --> 0:07:26.280 the example I just gave is ridiculously simple, despite the 0:07:26.280 --> 0:07:29.760 fact that my niece is already incredibly complicated. And it 0:07:29.920 --> 0:07:34.320 just gives us the odds of one starting state, that 0:07:34.400 --> 0:07:37.720 is, me walking into a room, that then transitions into 0:07:37.760 --> 0:07:42.480 one of three outcome states. Markov models can have lots 0:07:42.480 --> 0:07:46.040 of variables, with some variables dependent upon the value of 0:07:46.120 --> 0:07:49.760 other variables. So you might see a chain as something 0:07:50.080 --> 0:07:54.040 like: if outcome A happens, and there's a sixty percent chance 0:07:54.080 --> 0:07:57.280 that it will, then there's a thirty percent chance that 0:07:57.440 --> 0:08:01.880 a subsequent outcome A three will happen. And it can 0:08:01.920 --> 0:08:05.640 become a really complex branching path of possibilities, but we 0:08:05.720 --> 0:08:09.679 can stick with simple. Let's take the coin flip, the 0:08:09.840 --> 0:08:13.720 classic example of a random variable. We know that the 0:08:13.760 --> 0:08:18.040 odds of a fair coin landing heads up are fifty percent, and 0:08:18.120 --> 0:08:22.040 landing tails up are fifty percent. Flipping a coin many 0:08:22.240 --> 0:08:27.320 thousands of times should show that collectively you're gravitating towards 0:08:27.400 --> 0:08:31.560 those probabilities, that about half of your coin flips will 0:08:31.600 --> 0:08:34.000 be heads and the other half will be tails. But 0:08:34.120 --> 0:08:37.480 that does not mean you won't get on streaks where 0:08:37.520 --> 0:08:41.600 you flip heads over and over, à la Rosencrantz and Guildenstern 0:08:41.640 --> 0:08:44.640 Are Dead.
And if you don't know that reference, I 0:08:44.720 --> 0:08:48.360 highly recommend that you read that play or you watch 0:08:48.480 --> 0:08:51.280 the excellent film version that has Tim Roth and Gary 0:08:51.320 --> 0:08:54.400 Oldman in it, because it is fantastic and it kind 0:08:54.400 --> 0:08:59.160 of dives into a fun discussion of probabilities and what 0:08:59.320 --> 0:09:03.080 that actually means. Anyway, the odds of flipping a 0:09:03.160 --> 0:09:07.440 coin heads are fifty percent for a single coin flip, but what 0:09:07.520 --> 0:09:11.160 about a second coin flip? Well, if we look at 0:09:11.280 --> 0:09:15.560 just that flip in isolation, that second coin flip, it's 0:09:15.559 --> 0:09:18.200 still a fifty percent chance that it's going to land on heads. 0:09:18.840 --> 0:09:21.160 But if we frame it a different way, if we 0:09:21.200 --> 0:09:25.360 ask the question, what are the odds of flipping 0:09:25.360 --> 0:09:28.120 heads twice in a row? This is a different question 0:09:28.240 --> 0:09:32.040 because you're not thinking about individual flips. You're saying, what 0:09:32.160 --> 0:09:36.360 are the odds of this happening twice sequentially? Well, now 0:09:36.720 --> 0:09:38.640 we have to take the odds of it happening once, 0:09:38.679 --> 0:09:42.280 which is fifty percent, and then we have to multiply it against itself. 0:09:42.320 --> 0:09:46.000 It's a fifty percent chance again that it would happen twice. 0:09:46.200 --> 0:09:50.280 So fifty percent of fifty percent is, let me do the math, it is 0:09:51.640 --> 0:09:54.400 twenty-five percent, or one in four. So if you were to do a 0:09:54.440 --> 0:09:57.080 pair of coin flips, and you were to repeat this 0:09:57.160 --> 0:10:00.760 experiment over and over and over again, over the long run, 0:10:00.800 --> 0:10:05.400 you would find that twenty-five percent of those sequences would end up 0:10:05.440 --> 0:10:08.880 with heads followed by heads.
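The multiply-it-against-itself step for coin flips can be sketched in a few lines of Python. This is only an illustration, not something from the episode: the function name streak_probability is made up here, and the coin is assumed to be fair with each flip independent of the last.

```python
from fractions import Fraction


def streak_probability(flips: int, p_heads: Fraction = Fraction(1, 2)) -> Fraction:
    """Chance that every one of `flips` independent flips lands heads.

    Assumes a fair coin by default; each flip multiplies in another
    factor of p_heads.
    """
    return p_heads ** flips


two_in_a_row = streak_probability(2)
print(two_in_a_row, float(two_in_a_row))  # 1/4 0.25
```

Calling the same function with a larger number of flips extends the calculation to longer streaks, since each extra flip just halves the result again.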
But what if we wanted 0:10:08.920 --> 0:10:11.400 to say, what are the odds of flipping three 0:10:11.520 --> 0:10:14.719 heads in a row? Well, then we have to halve 0:10:15.000 --> 0:10:20.199 it again. So instead of one out of every four trials, 0:10:20.520 --> 0:10:23.080 we would see one out of every eight, or twelve 0:10:23.120 --> 0:10:26.120 point five percent. And we can keep extending this out. 0:10:26.200 --> 0:10:29.920 We can figure out the odds of some ridiculously long 0:10:30.080 --> 0:10:33.880 stretch of flipping heads in a row. Now, in Rosencrantz and 0:10:33.880 --> 0:10:37.240 Guildenstern Are Dead, we are told that it happens an 0:10:37.400 --> 0:10:42.400 astonishing ninety-two times in a row. That streak has 0:10:42.440 --> 0:10:48.120 a probability of one in five octillion. That would be 0:10:48.160 --> 0:10:53.160 a five followed by twenty seven zeros. This does not 0:10:53.280 --> 0:10:58.479 mean that it would be impossible, but it is unfathomably unlikely. 0:10:59.440 --> 0:11:03.520 Clemson University has a useful lecture available online in the 0:11:03.559 --> 0:11:08.600 form of a presentation, and it's titled Introduction to Markov Models, 0:11:08.880 --> 0:11:12.880 and it uses weather forecasting as an example. And their 0:11:12.960 --> 0:11:19.760 example takes three initial states: sunny, rainy, and cloudy. Consequently, 0:11:19.760 --> 0:11:23.319 those are also the three potential output states, so each 0:11:23.440 --> 0:11:29.079 state can transition into three states, including transitioning into itself, 0:11:29.120 --> 0:11:32.960 so you could go sunny to cloudy, sunny to rainy, 0:11:33.080 --> 0:11:36.400 or sunny to sunny. That's a valid result as well. 0:11:36.600 --> 0:11:40.640 And in their example, the idea is that we have, based 0:11:40.640 --> 0:11:46.160 on past observations, figured out the probability for specific forecasts 0:11:46.200 --> 0:11:48.800 based on whatever the current weather happens to be.
So, 0:11:49.120 --> 0:11:54.720 for example, we've figured out that rain tomorrow is likely 0:11:54.840 --> 0:11:59.040 if it's raining today, but it's only likely if it's 0:11:59.080 --> 0:12:04.959 just cloudy or sunny today. So if it's cloudy, if 0:12:04.960 --> 0:12:09.200 it's sunny, if it's raining today, that we'll see rain tomorrow. 0:12:09.400 --> 0:12:11.960 But our model would need to have probabilities assigned to 0:12:12.120 --> 0:12:15.600 each pair of starting and ending states. So I'm gonna 0:12:15.600 --> 0:12:18.200 follow through with that just for the purposes of this conversation. 0:12:18.640 --> 0:12:21.959 And we've covered the probabilities of tomorrow being rainy based 0:12:21.960 --> 0:12:25.520 on whatever today's weather is. But the example from Clemson 0:12:25.559 --> 0:12:29.079 also gives the other two outcomes states. So if we're 0:12:29.120 --> 0:12:33.520 looking at the probability of tomorrow being cloudy, we see 0:12:33.520 --> 0:12:37.520 that based on our past observations, that if today is sunny, 0:12:37.559 --> 0:12:41.080 it's a chance of cloudy tomorrow. If today is rainy, 0:12:41.120 --> 0:12:43.800 it's a thirty percent chance, and if today is cloudy, 0:12:43.840 --> 0:12:46.679 there's a fifty percent chance. And finally, if we want 0:12:46.760 --> 0:12:49.200 to know if it's going to be sunny tomorrow, again 0:12:49.200 --> 0:12:51.600 this is all just based on the example. We see 0:12:51.600 --> 0:12:54.200 that if today is sunny, there's an eight percent chance 0:12:54.240 --> 0:12:56.800 that tomorrow will be too. If today is rainy, it's 0:12:56.840 --> 0:12:59.760 just a five percent chance. If today is cloudy there's 0:12:59.760 --> 0:13:02.400 a fifteen percent chance. Now, the reason we need to 0:13:02.400 --> 0:13:05.040 know all of these probabilities will become clear in a second. 0:13:05.280 --> 0:13:08.679 And again these are just examples, they don't reflect real data. 
0:13:09.360 --> 0:13:12.840 Markov got very clever and began to use math to 0:13:12.920 --> 0:13:18.120 describe probabilities for predictions that are further out than one state. So, 0:13:18.240 --> 0:13:21.440 for example, you might say, what is the probability that, 0:13:21.679 --> 0:13:25.320 if today is cloudy, tomorrow will be sunny and 0:13:25.360 --> 0:13:29.240 the following day will be rainy? This is kind 0:13:29.240 --> 0:13:31.520 of similar to us asking the question of what are 0:13:31.520 --> 0:13:34.800 the odds of flipping heads two or three times in 0:13:34.800 --> 0:13:37.920 a row, except we're looking at the probabilities of weather 0:13:38.360 --> 0:13:41.400 that are based on what our current conditions happen to be. 0:13:41.800 --> 0:13:45.360 So using the example probabilities that were used in that lecture, 0:13:45.720 --> 0:13:49.600 we would find that sunny days follow cloudy days just 0:13:49.880 --> 0:13:52.160 fifteen percent of the time. So there's a fifteen percent 0:13:52.320 --> 0:13:55.719 chance that tomorrow will be sunny if today is cloudy, 0:13:56.400 --> 0:14:00.480 and rainy days follow sunny days twenty percent of 0:14:00.559 --> 0:14:04.400 the time. So if tomorrow is sunny, there's a twenty 0:14:04.840 --> 0:14:08.600 percent chance the day after tomorrow will be rainy. So then 0:14:09.520 --> 0:14:12.800 that means that if today's cloudy, we've got that fifteen percent 0:14:13.320 --> 0:14:15.360 chance tomorrow will be sunny, and if it is sunny, 0:14:15.400 --> 0:14:18.240 there's a twenty percent chance that the day after tomorrow will be rainy. 0:14:18.320 --> 0:14:21.080 So we have to multiply those probabilities together. We have 0:14:21.120 --> 0:14:26.640 to multiply fifteen percent by twenty percent, or point one five times 0:14:26.640 --> 0:14:30.520 point two. That gives us point zero three, which we 0:14:30.760 --> 0:14:33.480 convert to a percentage.
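The cloudy, then sunny, then rainy calculation can be written out as a short Python sketch. It is a hedged illustration only: the names TRANSITIONS and path_probability are invented for this example, and the table holds just the two transition probabilities quoted in the episode (cloudy to sunny at fifteen percent, sunny to rainy at twenty percent) rather than a full transition matrix.

```python
# Only the two transitions mentioned in the episode are filled in;
# a real Markov model would list every (today, tomorrow) pair.
TRANSITIONS = {
    ("cloudy", "sunny"): 0.15,
    ("sunny", "rainy"): 0.20,
}


def path_probability(states):
    """Multiply the one-step probabilities along a sequence of days."""
    prob = 1.0
    for today, tomorrow in zip(states, states[1:]):
        prob *= TRANSITIONS[(today, tomorrow)]
    return prob


chance = path_probability(["cloudy", "sunny", "rainy"])
print(f"{chance:.0%}")  # 3%
```

Each extra day in the sequence multiplies in another number below one, which is why longer specific forecasts get less and less likely.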
That means there's just a three 0:14:33.520 --> 0:14:37.760 percent chance that if today is cloudy, tomorrow will be sunny, 0:14:37.800 --> 0:14:40.400 and the day after tomorrow will be rainy. That's just 0:14:40.440 --> 0:14:43.200 a three percent chance of that happening. And the further 0:14:43.280 --> 0:14:45.800 out we try to predict a particular sequence of weather, 0:14:46.200 --> 0:14:49.280 the lower the probability will be, meaning, you know, it 0:14:49.320 --> 0:14:52.080 could happen. It's not like it's impossible, but it gets 0:14:52.200 --> 0:14:55.520 less likely the further out we go from our initial state. 0:14:55.880 --> 0:14:59.520 So a Markov model is a stochastic model that describes 0:14:59.600 --> 0:15:03.960 potential sequences. It is temporal in nature. That means 0:15:04.400 --> 0:15:07.600 we are really concerned with the state of things and 0:15:07.640 --> 0:15:11.000 how those states will change over time, and it gives 0:15:11.080 --> 0:15:15.080 us a way to explain how current states will depend 0:15:15.160 --> 0:15:18.800 upon previous states. It's not just about predicting the future, 0:15:18.840 --> 0:15:23.040 but also understanding the present. Why are things the way 0:15:23.080 --> 0:15:25.840 they are right now? And it gives us the chance 0:15:25.880 --> 0:15:30.280 to weigh the predictions of the future based upon past 0:15:30.360 --> 0:15:35.560 observational data. This is why we see weather forecasts that 0:15:35.600 --> 0:15:39.000 give us percentages for rainy days. Like a chance for 0:15:39.120 --> 0:15:41.800 rain tells us that it's probably a good idea to 0:15:41.800 --> 0:15:44.440 bring an umbrella if we're going outside, because based on 0:15:44.520 --> 0:15:49.320 past observations, there's a decent chance it's going to rain today. Now, 0:15:50.640 --> 0:15:54.000 let's get more complicated. What if we don't actually know 0:15:54.760 --> 0:15:58.080 the current state of the weather?
Let's say that you 0:15:58.160 --> 0:16:01.280 are stuck inside and you can't see out a window, 0:16:01.320 --> 0:16:03.160 you have no windows in the room you're in, and 0:16:03.240 --> 0:16:06.160 someone else comes into your room and says, what's the 0:16:06.200 --> 0:16:10.280 weather like outside? Well, the only hint that we have 0:16:10.560 --> 0:16:14.120 in this experience is if the person that comes in 0:16:14.360 --> 0:16:17.160 is carrying an umbrella or not. We don't actually know 0:16:17.400 --> 0:16:20.800 the current state. We can only make an educated guess 0:16:20.840 --> 0:16:24.440 based on the presence or absence of an umbrella. The 0:16:24.560 --> 0:16:28.040 reality of the current state is hidden from us. This 0:16:28.160 --> 0:16:31.200 leads us to a type of sequential analysis that's used 0:16:31.200 --> 0:16:35.640 in computer science, the hidden Markov model. So with these models, 0:16:35.920 --> 0:16:39.280 we're trying to learn more about the initial states by 0:16:39.320 --> 0:16:42.960 analyzing the outcomes that we can observe. And another way 0:16:42.960 --> 0:16:45.080 of putting it is we're trying to answer the question 0:16:45.920 --> 0:16:48.440 Why are things how they are right now? Why did 0:16:48.440 --> 0:16:53.120 this happen? Let's look back and figure out the probability 0:16:53.160 --> 0:16:57.560 that a particular initial state led to what is going 0:16:57.600 --> 0:17:00.440 on right now now. The whole reason I spent time 0:17:00.440 --> 0:17:04.080 talking about Markov models and probability is that it ties 0:17:04.200 --> 0:17:08.199 heavily into predictive text. It's also used in tons of 0:17:08.240 --> 0:17:12.800 other computational processes and analysis, from natural language analysis to 0:17:12.920 --> 0:17:17.639 genome sequencing. It's really powerful stuff. If we think about language, 0:17:18.000 --> 0:17:20.439 we know that there are certain rules to things. 
You 0:17:20.480 --> 0:17:24.240 can't just string random letters in a sequence and expect 0:17:24.359 --> 0:17:27.520 that to make a word that other people can understand. 0:17:28.119 --> 0:17:31.320 We have developed languages that have their own vocabularies and 0:17:31.440 --> 0:17:35.440 syntax and grammars. We know that in English, for example, 0:17:35.680 --> 0:17:39.439 the letter Q is nearly always followed by the letter U. 0:17:40.160 --> 0:17:42.920 We know that it would be very odd to see 0:17:42.960 --> 0:17:46.960 the letter H follow right behind the letter J in English. 0:17:47.320 --> 0:17:49.879 And so we can start building out a dictionary and 0:17:49.960 --> 0:17:53.800 a matrix, and the dictionary would include lots of common words, 0:17:53.840 --> 0:17:56.439 and the matrix would include basic rules to help us 0:17:56.480 --> 0:18:00.679 identify when someone is making a typo or misspelling something. 0:18:01.200 --> 0:18:03.959 And with these tools we could build out a method 0:18:04.000 --> 0:18:07.280 for predicting a letter based on the letters that were 0:18:07.320 --> 0:18:11.359 already typed. So if I typed T and then H, 0:18:11.520 --> 0:18:14.680 my predictive text might helpfully offer up the letter E 0:18:14.960 --> 0:18:18.080 because I frequently type the word "the." If I ignore 0:18:18.160 --> 0:18:20.680 that and I hit the letter A, I might get 0:18:20.720 --> 0:18:25.280 the prompt of using "than" or "thank" or maybe even 0:18:25.359 --> 0:18:29.399 "thanks" or maybe something else. And we're starting down that 0:18:29.520 --> 0:18:34.320 journey toward generative text. When we come back, I'll explain 0:18:34.359 --> 0:18:39.320 more about this and some really cool experiments with using 0:18:39.640 --> 0:18:42.720 machine learning and what that all means. But first let's 0:18:42.760 --> 0:18:53.919 take a quick break.
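The T, then H, then A example from before the break can be sketched as a tiny prefix lookup in Python. Everything here is an assumption made for illustration: the word counts in WORD_COUNTS are invented, and the suggest function is a hypothetical stand-in, not how any real phone keyboard is implemented.

```python
from collections import Counter

# Made-up counts standing in for how often each word has been typed.
WORD_COUNTS = Counter({"the": 50, "this": 20, "than": 8, "thank": 6, "thanks": 5})


def suggest(prefix, counts=WORD_COUNTS, limit=3):
    """Return up to `limit` known words starting with `prefix`, most frequent first."""
    matches = [(word, n) for word, n in counts.items() if word.startswith(prefix)]
    # Sort by count (descending), breaking ties alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in matches[:limit]]


print(suggest("th"))   # ['the', 'this', 'than']
print(suggest("tha"))  # ['than', 'thank', 'thanks']
```

Typing one more letter narrows the candidates, which is the same narrowing a keyboard does as it updates its suggestions.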
Okay, so we're building out a 0:18:53.920 --> 0:18:59.040 tool that quote unquote understands basic probabilities of words appearing 0:18:59.080 --> 0:19:01.479 in a given language in a given order, and it 0:19:01.560 --> 0:19:04.320 understands that, for example, a Q will be followed by 0:19:04.480 --> 0:19:08.280 U nearly all of the time in English. We build into 0:19:08.320 --> 0:19:12.320 this model all sorts of probabilities, so that words that 0:19:12.359 --> 0:19:15.280 are more common are going to pop up as autocomplete 0:19:15.280 --> 0:19:19.520 options more frequently than uncommon words. But we can do 0:19:19.600 --> 0:19:23.679 better than this. We can pair this with a learning model. 0:19:24.160 --> 0:19:28.680 Learning models evolve over time. They adjust based on the 0:19:28.720 --> 0:19:32.320 input fed to them, and we're talking about lots and 0:19:32.480 --> 0:19:37.240 lots of input. They refine themselves, so, in other words, 0:19:37.640 --> 0:19:42.200 they learn. So with learning models, our predictive text begins 0:19:42.240 --> 0:19:47.160 to adjust to the specific individual who uses the predictive 0:19:47.200 --> 0:19:49.679 text over time, like on a phone. So let's say you 0:19:49.720 --> 0:19:53.960 and I each have the same particular model of smartphone, 0:19:54.480 --> 0:19:58.159 and we're both running the same operating system version and everything, 0:19:58.200 --> 0:20:02.080 like our phones are essentially identical, at least at 0:20:02.119 --> 0:20:05.520 casual glance. And we've both been using these phones for 0:20:05.760 --> 0:20:08.439 a few weeks. And in that time, you and I 0:20:08.480 --> 0:20:11.560 have each used our phones to send various messages to 0:20:11.600 --> 0:20:14.960 our friends, our family, our colleagues, you know, your arch nemesis, 0:20:14.960 --> 0:20:18.159 Ben Bowlin, you know, the usual.
As we do that, 0:20:18.800 --> 0:20:22.000 our predictive text keyboards start to pick up on how 0:20:22.119 --> 0:20:26.360 we use words, and they can build up a frequency matrix, 0:20:26.359 --> 0:20:30.160 which isn't just looking at words that are common in general, 0:20:30.359 --> 0:20:34.000 but words that are common to us as individuals, and 0:20:34.040 --> 0:20:36.920 the way that we use words, and sometimes the way 0:20:36.960 --> 0:20:40.040 we generate words. Maybe you happen to use the word 0:20:40.160 --> 0:20:44.040 balderdash a lot, and so you start typing the 0:20:44.080 --> 0:20:46.800 word and the autocomplete for balderdash will jump up 0:20:46.880 --> 0:20:49.679 much faster than it would if I were typing it 0:20:49.800 --> 0:20:52.119 on my phone, because my phone has never heard me 0:20:52.720 --> 0:20:56.639 use that, so it doesn't automatically assume that's what I'm typing. 0:20:56.880 --> 0:20:59.800 Maybe I use the word folderol a lot, and 0:20:59.840 --> 0:21:02.679 the same happens with my phone compared to yours. The 0:21:02.720 --> 0:21:06.520 models learn the words we use, and not just 0:21:06.600 --> 0:21:09.560 the words we use, but the words we create as well. So 0:21:09.640 --> 0:21:12.320 let's say that I was, for some reason, a big 0:21:12.400 --> 0:21:14.840 fan of How I Met Your Mother, which I'm not. 0:21:15.040 --> 0:21:16.919 But let's say that I am a big fan of 0:21:16.920 --> 0:21:20.119 Neil Patrick Harris, which is true, and his character often 0:21:20.160 --> 0:21:24.080 says, "That is, wait for it, legendary." And I 0:21:24.440 --> 0:21:27.560 might extend the word legendary. So to do that, I 0:21:27.680 --> 0:21:29.840 might throw in a whole bunch of extra E's at 0:21:29.880 --> 0:21:34.040 the beginning of legendary.
Well, my phone might pick up 0:21:34.080 --> 0:21:36.560 that I tend to do this, and so it includes 0:21:36.640 --> 0:21:40.200 that as a legitimate word, even though any sort of 0:21:40.560 --> 0:21:45.600 spelling check would say this ain't a word, stop it, 0:21:45.640 --> 0:21:48.520 But my phone's predictive text is going to include it 0:21:48.560 --> 0:21:52.200 as saying this is something that is meaningful and thus 0:21:52.240 --> 0:21:57.480 a valid option. Also, the phones can learn to adapt 0:21:57.520 --> 0:22:01.439 to our own sense of syntax and grammar. Perhaps for 0:22:01.520 --> 0:22:05.200 purposes of a particular effect. One of us tends to 0:22:05.240 --> 0:22:08.719 tweak the syntax of the language that we're communicating in 0:22:08.760 --> 0:22:12.080 for some reason. Maybe it's for comedic effect and it's 0:22:12.080 --> 0:22:15.480 not following the established rules of grammar for English. But 0:22:15.560 --> 0:22:18.560 our phone starts to understand that's how we communicate, based 0:22:18.600 --> 0:22:21.880 on how we order our words and how we generate 0:22:21.880 --> 0:22:25.479 our phrases, you know, how we communicate that. While our 0:22:25.560 --> 0:22:30.560 choices aren't necessarily in alignment with an established formal system, 0:22:30.600 --> 0:22:34.880 they represent a particular approach to communicating. Predictive text can 0:22:34.960 --> 0:22:38.840 start to get a handle on that if it's built properly, 0:22:39.359 --> 0:22:43.640 and even someone who communicates in an idiosyncratic way might 0:22:43.680 --> 0:22:47.680 find that their phone is offering up particularly relevant suggestions. 0:22:47.720 --> 0:22:50.720 So how does all this work? How do machines actually 0:22:51.160 --> 0:22:55.760 learn stuff? Well, there's not one single method, but there 0:22:55.800 --> 0:23:00.160 are a collection of related processes that computer scientists develop 0:23:00.160 --> 0:23:04.480 to train machines. 
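The idea of two identical phones drifting apart as they learn from their owners can be sketched as follows. This is a toy illustration under stated assumptions: the class name PersonalKeyboard, its methods, and the sample messages are all hypothetical, and a real keyboard would use a far richer model than simple word counts.

```python
from collections import Counter


class PersonalKeyboard:
    """Toy per-user model: counts every word its owner has typed."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, message):
        # Any token the user types is kept, even if no dictionary lists it.
        self.counts.update(message.lower().split())

    def suggest(self, prefix, limit=3):
        matches = [(w, n) for w, n in self.counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [w for w, _ in matches[:limit]]


yours, mine = PersonalKeyboard(), PersonalKeyboard()
yours.learn("what utter balderdash")
mine.learn("enough of this folderol")
mine.learn("that was leeeegendary")

print(yours.suggest("bal"))  # ['balderdash']
print(mine.suggest("bal"))   # []
print(mine.suggest("lee"))   # ['leeeegendary']
```

The two objects start out identical and differ only in what they have been fed, which mirrors the point about same-model phones offering different suggestions to different people.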
And you can look at two major 0:23:04.640 --> 0:23:08.359 categories of machine learning, and there are a 0:23:08.400 --> 0:23:10.800 lot of subtypes under each of these, and those would 0:23:10.840 --> 0:23:16.280 be supervised learning and unsupervised learning. Supervised learning involves training 0:23:16.280 --> 0:23:21.280 a computer model using known input and output information. So 0:23:21.560 --> 0:23:23.680 let's take an example that I like to use a lot, 0:23:23.960 --> 0:23:26.919 and it's about image recognition. So let's say you're teaching 0:23:26.920 --> 0:23:31.320 a computer to recognize images of coffee mugs, and you 0:23:31.400 --> 0:23:35.720 have an enormous supply of images, millions of them. Some 0:23:35.840 --> 0:23:39.120 of them contain coffee mugs in various shapes and sizes 0:23:39.160 --> 0:23:44.320 and colors and orientations, and the lighting can be different. 0:23:44.400 --> 0:23:46.560 You might have the handle pointing to the left, and 0:23:46.680 --> 0:23:48.680 some are pointing to the right, or in other 0:23:48.720 --> 0:23:51.040 cases it might be on its side. But you've got 0:23:51.160 --> 0:23:55.120 tons of these, and you also have millions of images 0:23:55.320 --> 0:23:58.240 of other stuff. Some of it might not even resemble 0:23:58.320 --> 0:24:02.280 a mug remotely. Maybe it's an airplane or Christopher Walken. 0:24:02.840 --> 0:24:05.840 Others might look kind of like a mug, you know, 0:24:05.840 --> 0:24:09.160 it might be a glass or a bowl or something similar. Now, 0:24:09.200 --> 0:24:12.600 as a human being, you can tell straight away if 0:24:12.640 --> 0:24:14.840 the image you've got in front of you represents a 0:24:14.840 --> 0:24:21.280 coffee mug or not, but machines don't inherently possess this ability.
0:24:21.640 --> 0:24:25.480 You could feed one photo of a generic off white 0:24:25.520 --> 0:24:28.160 coffee mug, the handle happens to be pointed to the left, 0:24:28.200 --> 0:24:30.720 and you tag that photo as a coffee mug, you 0:24:30.760 --> 0:24:33.320 give meta data to the computer to classify that as 0:24:33.359 --> 0:24:36.320 a coffee mug. And if you create a database of images, 0:24:36.760 --> 0:24:39.480 maybe you do a search for coffee mug, that one 0:24:39.480 --> 0:24:41.560 would come up as a result because of all the 0:24:41.600 --> 0:24:46.040 work you've done with tagging this thing and effectively telling 0:24:46.080 --> 0:24:49.440 the computer this is what I mean by coffee mug. However, 0:24:49.720 --> 0:24:52.560 if you fed a new image and this one is 0:24:52.600 --> 0:24:55.800 of a red coffee mug that's of a different size, 0:24:56.119 --> 0:24:59.119 maybe the photo has different lighting conditions, maybe the mug 0:24:59.160 --> 0:25:02.440 is a little closer to the camera, the handles point 0:25:02.440 --> 0:25:04.760 to the right and on the left, would the computer 0:25:04.880 --> 0:25:09.280 automatically know that that's a coffee mug. No, it hasn't 0:25:09.400 --> 0:25:13.040