WEBVTT - Machine Learning 101 0:00:04.400 --> 0:00:07.800 Welcome to TechStuff, a production from iHeartRadio. 0:00:12.119 --> 0:00:15.440 Hey there, and welcome to TechStuff. This is your host, 0:00:15.640 --> 0:00:19.120 Jonathan Strickland. I'm an executive producer with iHeartRadio 0:00:19.160 --> 0:00:22.360 and I love all things tech. You know, folks, back 0:00:22.400 --> 0:00:28.000 in the nineteen eighties, there was a comedy science fiction film that I 0:00:28.040 --> 0:00:32.400 saw in the theater about a robot that gains sentience 0:00:32.440 --> 0:00:35.640 and becomes a total goofball, if you will. It hit 0:00:35.720 --> 0:00:39.199 theaters in eighty six and it was called Short Circuit. 0:00:39.880 --> 0:00:44.200 The movie starred Steve Guttenberg, Ally Sheedy, and lamentably a 0:00:44.240 --> 0:00:48.080 white actor named Fisher Stevens playing a non-white character, 0:00:48.720 --> 0:00:52.640 someone who is Indian. I should add that's not Stevens's fault. 0:00:52.880 --> 0:00:55.760 I mean, he auditioned to be in a movie and 0:00:55.840 --> 0:00:58.280 he got a gig. He didn't cast himself in the film, 0:00:58.320 --> 0:01:01.800 and he has since talked about his experiences realizing the 0:01:01.840 --> 0:01:04.280 problems with a white man playing a non-white character. 0:01:04.319 --> 0:01:08.880 But setting aside all the problematic whitewashing, the movie showed 0:01:09.000 --> 0:01:11.920 this robot, who in the course of the film names 0:01:11.959 --> 0:01:17.000 itself Johnny Five, learning. It learns about the world around it, 0:01:17.120 --> 0:01:20.840 it learns about people, it learns about human concepts like 0:01:21.000 --> 0:01:25.720 humor and emotion, and the general idea was pretty cute. 0:01:26.800 --> 0:01:31.119 Now the nifty thing is machines actually can learn. In fact, 0:01:31.200 --> 0:01:35.120 machine learning is a really important field of study these days, 0:01:35.480 --> 0:01:38.920 complete with its own challenges and risks.
I've talked about 0:01:39.000 --> 0:01:41.399 machine learning a few times in the past, but I 0:01:41.400 --> 0:01:44.240 figured we could do a deeper dive to understand what 0:01:44.400 --> 0:01:48.120 machine learning is, what it isn't, how people are leveraging 0:01:48.160 --> 0:01:51.880 machine learning, and why I said that it does come 0:01:51.920 --> 0:01:58.280 with risks. So let's learn about machines learning. It will 0:01:58.320 --> 0:02:02.480 be impossible to talk about machine learning without also talking 0:02:02.520 --> 0:02:08.079 about artificial intelligence, or AI. And this term, artificial intelligence, 0:02:08.280 --> 0:02:12.880 is a real doozy. It trips people up, even people 0:02:12.960 --> 0:02:17.880 who have dedicated their lives to researching and developing artificial intelligence. 0:02:18.200 --> 0:02:22.679 You can get two experts in AI talking about AI 0:02:22.800 --> 0:02:25.600 and find out that because they have slightly different takes 0:02:25.639 --> 0:02:31.320 on what AI is, there are some communication issues. It's 0:02:31.320 --> 0:02:34.200 not as simple as Red vs. Blue would have you think: 0:02:34.840 --> 0:02:40.440 "What does the A stand for?" So when you really 0:02:40.880 --> 0:02:43.200 boil it down, it comes as no big 0:02:43.200 --> 0:02:46.239 surprise that there's a lot of ambiguity here. After all, 0:02:46.600 --> 0:02:51.639 how would you define intelligence? Just intelligence, not artificial intelligence, 0:02:52.000 --> 0:02:56.640 just intelligence. Well, would it be the ability to learn, 0:02:57.000 --> 0:03:01.240 that is, to acquire skills and knowledge? Or is it 0:03:01.320 --> 0:03:04.680 the application of learning? Is it problem solving? Is it 0:03:05.160 --> 0:03:08.440 being able to think ahead and make plans in order 0:03:08.480 --> 0:03:12.720 to achieve a specific goal?
Is it the ability to 0:03:12.960 --> 0:03:16.560 examine a problem and deconstruct it in order to figure out 0:03:16.600 --> 0:03:19.600 the best solution, a more specific version of problem solving? 0:03:20.240 --> 0:03:25.560 Is it the ability to recognize, understand, and navigate emotional scenarios? Now, 0:03:25.680 --> 0:03:30.920 arguably it's all of these things and more. We all 0:03:31.000 --> 0:03:35.400 have kind of an intuitive grasp on what intelligence is, 0:03:36.280 --> 0:03:40.960 but defining it in a simple way tends to feel 0:03:41.000 --> 0:03:44.080 reductive and it leaves out a lot of important details. 0:03:44.480 --> 0:03:50.160 So if defining just general intelligence is hard, it stands 0:03:50.200 --> 0:03:55.000 to reason that defining artificial intelligence is also a tough job. Heck, 0:03:55.320 --> 0:03:58.640 even coming up with a number of different types of 0:03:58.680 --> 0:04:02.720 AI is tricky. And if you don't believe me, just 0:04:02.920 --> 0:04:08.920 google the phrase "different types of artificial intelligence." Never mind, 0:04:08.960 --> 0:04:10.680 you don't really actually have to do that. 0:04:10.720 --> 0:04:13.800 I already did it, though feel free to do it 0:04:13.840 --> 0:04:16.839 yourself and check my work if you like. When I 0:04:17.080 --> 0:04:20.640 googled that phrase, different types of AI, some of the 0:04:20.680 --> 0:04:24.960 top results included a blog post on BMC Software titled 0:04:25.240 --> 0:04:28.840 Four Types of Artificial Intelligence. But then there was also 0:04:28.880 --> 0:04:31.840 an article on Codebots that was titled What Are 0:04:31.880 --> 0:04:34.960 the Three Types of AI? And then there was an 0:04:35.040 --> 0:04:40.080 article from Forbes titled Seven Types of Artificial Intelligence. See, 0:04:40.279 --> 0:04:43.320 we can't even agree on how many versions of 0:04:43.480 --> 0:04:48.560 AI there are, because defining AI is really hard.
It 0:04:48.680 --> 0:04:52.080 largely depends upon how you view AI and then how 0:04:52.080 --> 0:04:56.039 you break it down into different realms of intelligence. Now 0:04:56.080 --> 0:04:59.839 we could go super high level, because a classic way 0:04:59.839 --> 0:05:04.960 to look at AI is strong versus weak artificial intelligence. 0:05:06.560 --> 0:05:12.440 Strong AI, sometimes called artificial general intelligence, would be 0:05:12.600 --> 0:05:17.640 a machine that processes information and at least appears to 0:05:17.839 --> 0:05:21.599 have some form of consciousness and self awareness and the 0:05:21.640 --> 0:05:26.000 ability to both have experiences and to be aware that 0:05:26.120 --> 0:05:30.239 it is having experiences. It might even feel emotion, though 0:05:30.880 --> 0:05:34.880 maybe not emotions that we could easily identify or sympathize with. 0:05:35.520 --> 0:05:38.839 So this would be the kind of machine that would 0:05:39.080 --> 0:05:42.560 think in a way similar to humans. It would be 0:05:42.600 --> 0:05:45.960 able to sense its environment and not just react, but 0:05:46.120 --> 0:05:49.800 really process what is going on and build an understanding. 0:05:50.000 --> 0:05:51.880 It's the type of AI that we see a lot 0:05:52.160 --> 0:05:55.239 in science fiction. That's the type of AI of Johnny 0:05:55.320 --> 0:05:59.240 Five from Short Circuit, or HAL from 2001, 0:05:59.360 --> 0:06:02.440 or the droids in Star Wars. It's also a 0:06:02.440 --> 0:06:06.560 type of artificial intelligence that we have yet to actually 0:06:06.600 --> 0:06:11.479 achieve in the real world. So then what is weak AI? Well, 0:06:12.680 --> 0:06:16.120 you could say it's everything else, or you could say 0:06:16.120 --> 0:06:21.080 it's the building blocks that maybe collectively will lead to 0:06:21.200 --> 0:06:26.200 strong AI. Weak AI involves processes that allow machines to 0:06:26.279 --> 0:06:31.360 complete tasks.
So, for example, image recognition software could fall 0:06:31.400 --> 0:06:34.680 into this category. Once upon a time, in order to 0:06:34.760 --> 0:06:39.560 search photos effectively, you needed to actually add metadata 0:06:39.839 --> 0:06:45.039 like tags to those photos. So, for example, I might 0:06:45.400 --> 0:06:49.240 tag pictures of my dog with the meta tag "dog," 0:06:50.040 --> 0:06:52.719 and then if I wanted to see photos of my pooch, 0:06:53.320 --> 0:06:55.440 then I would pull up my photo app and search 0:06:55.760 --> 0:06:58.840 the term "dog," and all the photos that I had 0:06:58.880 --> 0:07:01.280 tagged with the word "dog" would show up. But if 0:07:01.320 --> 0:07:04.679 I had failed to tag some pictures of my dog, 0:07:05.279 --> 0:07:07.839 those pictures wouldn't pop up in search, because the computer 0:07:07.839 --> 0:07:11.240 program wasn't actually looking for dogs in my photos. It 0:07:11.360 --> 0:07:13.960 was just looking for photos that had that particular meta 0:07:14.000 --> 0:07:18.040 tag attached to them. But now we've reached a point 0:07:18.240 --> 0:07:21.400 where at least some photo apps are using image recognition 0:07:21.480 --> 0:07:25.240 to analyze photos, and these will return results that the 0:07:25.280 --> 0:07:28.679 algorithm has identified as having a reasonable chance of meeting 0:07:28.880 --> 0:07:31.720 your search query. So if I used an app like 0:07:31.760 --> 0:07:35.480 that and I put in "dog" as my search term, 0:07:35.520 --> 0:07:38.239 it could pull up photos that had no meta tags 0:07:38.240 --> 0:07:41.160 attached to them at all, because the search is relying 0:07:41.200 --> 0:07:44.640 on image recognition.
Now, this also means that if the 0:07:44.680 --> 0:07:48.120 image recognition algorithm isn't very good, I could get some 0:07:48.200 --> 0:07:50.360 images that don't have a dog in them at all, 0:07:50.880 --> 0:07:54.040 or it might miss other images that have my dog 0:07:54.120 --> 0:07:56.760 in them. But my point is that the ability to 0:07:56.840 --> 0:07:59.760 identify whether or not a dog is in a particular 0:08:00.080 --> 0:08:05.760 photo represents a kind of weak artificial intelligence. You wouldn't 0:08:05.800 --> 0:08:10.640 say that the photo search tool possesses humanlike intelligence, because 0:08:10.680 --> 0:08:14.200 really it only does one thing. It analyzes photos and 0:08:14.240 --> 0:08:17.840 looks for matches to specific search queries, but it can't 0:08:17.920 --> 0:08:21.440 do anything outside of that use case. However, that's just 0:08:21.560 --> 0:08:24.520 one little example. There are all sorts of other ones, 0:08:24.560 --> 0:08:30.480 like voice recognition, environmental sensing, course plotting, that kind of thing, 0:08:30.720 --> 0:08:33.880 and in some circles, as we get better at making machines 0:08:33.920 --> 0:08:39.040 and systems that can do these things, those elements seem 0:08:39.080 --> 0:08:42.360 to kind of drift away from the ongoing conversation about 0:08:42.440 --> 0:08:46.199 artificial intelligence. A guy named Larry Tesler, who was a 0:08:46.240 --> 0:08:49.160 computer scientist who worked at lots of really important places 0:08:49.240 --> 0:08:54.520 like Xerox PARC and Amazon and Apple, once observed, 0:08:54.640 --> 0:08:59.920 quote, "Intelligence is whatever machines haven't done yet," end quote. 0:09:00.440 --> 0:09:03.480 So his point was that the reason that AI is 0:09:03.559 --> 0:09:06.120 really hard to talk about is that the goalpost 0:09:06.320 --> 0:09:12.920 for what actually is artificial intelligence is constantly moving.
Now, 0:09:12.920 --> 0:09:16.719 this pretty much mirrors how we think about things like consciousness. 0:09:17.120 --> 0:09:20.640 Lots of people study consciousness, and the general sense I 0:09:20.679 --> 0:09:23.240 get is that it's a lot easier for people to 0:09:23.280 --> 0:09:29.080 talk about what isn't consciousness rather than what consciousness actually is. 0:09:29.760 --> 0:09:33.480 And it seems like artificial intelligence is in a similar place, 0:09:33.559 --> 0:09:36.520 which really isn't that big of a surprise, as we 0:09:36.679 --> 0:09:41.200 closely associate intelligence with consciousness. Now this leads us to 0:09:41.720 --> 0:09:45.160 why there are so many different takes on how many 0:09:45.200 --> 0:09:48.680 types of AI there are. It all depends on how 0:09:48.800 --> 0:09:53.199 you classify different disciplines in artificial intelligence, and over time, 0:09:53.679 --> 0:09:57.679 a lot of disciplines that were previously distinct from AI 0:09:57.800 --> 0:10:01.599 have sort of converged into becoming part of the AI discussion. 0:10:01.880 --> 0:10:04.840 Machine learning, as it turns out, was part of the 0:10:04.920 --> 0:10:09.520 AI discussion, branched off from it, and then rejoined the 0:10:09.559 --> 0:10:12.920 AI discussion years later. So I am not going to 0:10:12.960 --> 0:10:16.240 go down all the different approaches to classification, because I 0:10:16.280 --> 0:10:18.719 don't know that they would be that valuable to us. 0:10:19.200 --> 0:10:21.120 They would really just illustrate that there are a lot 0:10:21.160 --> 0:10:26.280 of different ways to look at the subject. So if 0:10:26.360 --> 0:10:30.559 you ever find yourself in a conversation about AI, it 0:10:30.640 --> 0:10:33.720 might be a good idea to set a few ground 0:10:33.840 --> 0:10:37.440 rules as to what everyone means when they use the 0:10:37.520 --> 0:10:42.760 term artificial intelligence. That can help with expectations and understanding.
0:10:43.320 --> 0:10:46.200 Or you could just run for the nearest exit, which 0:10:46.240 --> 0:10:49.920 is what people tend to do whenever I start talking 0:10:49.960 --> 0:10:56.040 about it anyway. What about machine learning? Well, from one perspective, 0:10:56.280 --> 0:10:59.199 you could say machine learning is a sub discipline of 0:10:59.280 --> 0:11:03.080 artificial intelligence, although like I said, it hasn't always 0:11:03.120 --> 0:11:07.080 been viewed as such. I think most people would say 0:11:07.080 --> 0:11:11.000 that the ability to learn, that is, to take information 0:11:11.160 --> 0:11:15.280 and experience and then have some form of understanding of 0:11:15.320 --> 0:11:19.080 those things so that you can apply that to future tasks, 0:11:19.240 --> 0:11:23.160 potentially getting better over time, I would say most people 0:11:23.200 --> 0:11:26.720 would call that part of intelligence. But you could also 0:11:26.760 --> 0:11:29.240 be a bit more wishy washy and say it's related to, 0:11:29.880 --> 0:11:33.520 you know, artificial intelligence, as opposed to being part of AI. 0:11:33.640 --> 0:11:37.839 Since the definition of AI is, let's say, fluid, either 0:11:37.920 --> 0:11:41.520 way of classifying machine learning works as far as I'm concerned. 0:11:42.600 --> 0:11:46.160 Machine learning boils down to the idea of creating a 0:11:46.200 --> 0:11:50.120 system that can learn as it performs a task. It 0:11:50.160 --> 0:11:54.679 can learn what works and, more importantly, what does not work. 0:11:55.200 --> 0:11:57.440 You may have heard that we learn a lot more 0:11:57.520 --> 0:12:01.200 from our mistakes than we do from our successes, which 0:12:01.880 --> 0:12:05.320 is pretty much true in my experience. When something goes wrong, 0:12:05.880 --> 0:12:11.280 it's usually, but not always, possible to trace the event 0:12:11.480 --> 0:12:14.760 or events that led to the failure.
You can identify 0:12:14.840 --> 0:12:19.120 decisions that were probably the wrong ones or that led 0:12:19.200 --> 0:12:22.679 to a bad outcome. But if you have a success, 0:12:23.080 --> 0:12:27.160 it's hard to figure out which decisions were key to 0:12:27.280 --> 0:12:30.960 that successful outcome. Did your decision at step two set 0:12:31.000 --> 0:12:33.560 you on the right path, or was your choice at 0:12:33.559 --> 0:12:36.920 step three so good that it helped correct a mistake 0:12:37.160 --> 0:12:39.920 that you made at step two? But a good approach 0:12:39.960 --> 0:12:43.480 to machine learning involves a system that can adjust things 0:12:43.520 --> 0:12:47.160 on its own to reduce mistakes and increase the success rate. 0:12:47.520 --> 0:12:50.040 And another way of putting it is that instead of 0:12:50.080 --> 0:12:53.720 programming a system to arrive at a specific outcome, you 0:12:53.800 --> 0:12:57.160 are training the system to learn how to do it 0:12:57.240 --> 0:13:00.520 by itself. And that sounds a bit magical when you 0:13:00.559 --> 0:13:03.760 put it that way, doesn't it? It sounds like someone 0:13:03.840 --> 0:13:06.880 just took a computer and showed it pictures of cats 0:13:07.080 --> 0:13:09.640 and then expected the computer to know what a cat was. 0:13:10.440 --> 0:13:13.840 And this actually does mirror an actual project that really 0:13:14.240 --> 0:13:17.880 did do that, but I'm leaving out some big important 0:13:17.880 --> 0:13:22.200 information in the middle. Now, one big step is that 0:13:22.240 --> 0:13:26.520 computers and machines can't just magically learn by default. People 0:13:26.600 --> 0:13:29.840 first had to come up with a methodology that allows 0:13:29.920 --> 0:13:32.560 machines to go through the process of completing a task, 0:13:33.200 --> 0:13:36.960 then making adjustments to the process of doing that task, 0:13:37.360 --> 0:13:40.880 which would then improve future results.
We have to lay 0:13:40.880 --> 0:13:45.440 the groundwork in architecture and theory and algorithms. We have 0:13:45.520 --> 0:13:49.600 to build the logical pathways that computers can follow in 0:13:49.720 --> 0:13:52.720 order for them to learn. A lot of machine learning 0:13:53.120 --> 0:13:57.360 revolves around patterns and pattern recognition. So what do I 0:13:57.400 --> 0:14:01.400 mean by patterns? Well, I mean some form of regularity 0:14:01.480 --> 0:14:06.800 and predictability. Machine learning models analyze patterns and attempt to 0:14:06.880 --> 0:14:11.640 draw conclusions based on those patterns. This in itself is 0:14:11.640 --> 0:14:15.720 tricky stuff. So why is that, Well, it's because sometimes 0:14:15.960 --> 0:14:19.960 we might think there's a pattern, when in reality there 0:14:20.080 --> 0:14:25.480 is not. We humans are pretty good at recognizing patterns, 0:14:25.680 --> 0:14:29.480 which makes sense. It's a survival mechanism. If you were 0:14:29.520 --> 0:14:33.160 to look at tall grass and you see patterns that 0:14:33.240 --> 0:14:37.320 suggest the presence of a predator like a tiger, well 0:14:37.440 --> 0:14:40.520 you would know that danger is nearby, and you would 0:14:40.520 --> 0:14:43.560 have the opportunity to do something about that to help 0:14:43.600 --> 0:14:48.960 your chances of survival. If, however, you remained blissfully unaware 0:14:49.080 --> 0:14:51.960 of the danger, you'd be far more likely to fall 0:14:52.000 --> 0:14:55.920 prey to that hungry tiger. So recognizing patterns is one 0:14:55.920 --> 0:14:58.760 of the abilities that gave humans a chance to live 0:14:58.800 --> 0:15:02.440 another day, and, from an evolutionary standpoint, a chance to 0:15:02.800 --> 0:15:07.680 make more humans. But sometimes we humans will perceive a 0:15:07.720 --> 0:15:12.920 pattern where none actually exists. 
A simple example of this 0:15:13.080 --> 0:15:16.960 is the fun exercise of lying on your back outside, 0:15:17.360 --> 0:15:20.200 looking up at the clouds and saying, what does that 0:15:20.240 --> 0:15:23.960 cloud remind you of? The shapes of clouds, which have 0:15:24.560 --> 0:15:28.480 no significance and are the product of environmental factors, can 0:15:28.600 --> 0:15:32.600 seem to suggest patterns to us. We might see a dog, 0:15:32.840 --> 0:15:36.120 or a car, or a face, but we know that 0:15:36.280 --> 0:15:40.360 what we're really seeing is just the appearance of a pattern. 0:15:40.440 --> 0:15:43.360 It's not evidence of a pattern actually being there. 0:15:43.400 --> 0:15:50.040 It's noise, not signal, but it could be misinterpreted as signal. Well, 0:15:50.080 --> 0:15:53.000 it turns out that in machine learning applications this is 0:15:53.080 --> 0:15:55.520 also an issue. I'll talk about it more towards the 0:15:55.600 --> 0:15:59.800 end of this episode. Computers can sometimes misinterpret data and 0:16:00.080 --> 0:16:04.000 determine something represents a pattern when it really doesn't. When 0:16:04.040 --> 0:16:07.000 that happens, a system relying on machine learning can produce 0:16:07.080 --> 0:16:11.480 false positives, and the consequences can sometimes be funny, like, hey, 0:16:11.520 --> 0:16:14.320 this image recognition software thinks this coffee mug is actually 0:16:14.360 --> 0:16:17.320 a kitty cat, or they can be really serious and 0:16:17.360 --> 0:16:22.440 potentially harmful. Hey, this facial recognition software has misidentified a person, 0:16:22.720 --> 0:16:25.640 marking them as, say, a person of interest in a 0:16:25.680 --> 0:16:29.080 criminal case. And it's all because this facial recognition software 0:16:29.120 --> 0:16:32.560 isn't very good at differentiating people of color. That's a 0:16:32.680 --> 0:16:36.520 real problem that really happens.
Now, when we come back 0:16:36.800 --> 0:16:40.400 I'll give a little overview of the evolution of machine learning, 0:16:40.880 --> 0:16:44.200 but before we do that, let's take a quick break. 0:16:51.840 --> 0:16:55.320 To talk about the history of machine learning, we first 0:16:55.360 --> 0:16:59.120 have to look back much, much earlier, long before the 0:16:59.160 --> 0:17:02.880 era of computers, and talk about how thinkers like Thomas 0:17:02.960 --> 0:17:07.600 Bayes thought about the act of problem solving. Bayes was 0:17:07.680 --> 0:17:11.240 born way back in seventeen oh two, so quite a bit before 0:17:11.280 --> 0:17:14.480 we were thinking about machine learning, but he was interested 0:17:14.600 --> 0:17:19.560 in problem solving for problems involving probabilities, and specifically the 0:17:19.600 --> 0:17:24.000 relationship between different probabilities. I think it's easier to talk 0:17:24.040 --> 0:17:27.520 about if I give you an example. So let's make 0:17:27.560 --> 0:17:30.320 a silly one. All right, so let's say we got 0:17:30.320 --> 0:17:35.440 ourselves a plucky podcaster. Hey there, everybody, it's Jonathan Strickland, 0:17:36.080 --> 0:17:39.600 and it's Tuesday as I record this. And because of 0:17:39.760 --> 0:17:43.199 who I am, you know who this podcaster is. And 0:17:43.280 --> 0:17:47.480 because it's Tuesday, there is a forty percent chance I am wearing 0:17:47.640 --> 0:17:51.159 a They Might Be Giants T shirt. And we also 0:17:51.240 --> 0:17:55.760 know that if this podcaster is wearing a They Might 0:17:55.800 --> 0:17:59.879 Be Giants T shirt on a Tuesday, there's a sixty 0:18:00.119 --> 0:18:03.440 percent chance that I'm going to end up wearing pajamas 0:18:03.520 --> 0:18:06.960 on Wednesday.
But we also know that if I did 0:18:07.080 --> 0:18:11.280 not wear the They Might Be Giants shirt on Tuesday, and 0:18:11.400 --> 0:18:15.280 remember there's a sixty percent chance I didn't, then we know 0:18:15.400 --> 0:18:17.920 there's an eighty percent chance I'm going to be wearing 0:18:17.960 --> 0:18:22.240 pajamas on Wednesday. Well, Bayes worked out a way that 0:18:22.320 --> 0:18:28.040 described this sort of probability relationship between different discrete events, 0:18:28.200 --> 0:18:32.000 and using his reasoning, you can work forward or backward 0:18:32.000 --> 0:18:35.959 based on probabilities. Bayes would describe wearing a They Might 0:18:36.000 --> 0:18:39.320 Be Giants shirt on Tuesday as one event and wearing 0:18:39.320 --> 0:18:43.600 pajamas on Wednesday as a separate event, and then describe 0:18:43.640 --> 0:18:46.400 how the two relate, not only determining how likely it is I'll 0:18:46.400 --> 0:18:50.720 wear pajamas on Wednesday, but also working backward if we start with the 0:18:50.920 --> 0:18:53.320 later event. In other words, if we start with the 0:18:53.359 --> 0:18:57.240 fact that it's Wednesday and I'm wearing pajamas, we could 0:18:57.240 --> 0:19:02.120 work out how likely it was that yesterday, on Tuesday, 0:19:02.200 --> 0:19:05.439 I was wearing the They Might Be Giants shirt. That was 0:19:05.560 --> 0:19:08.000 his contribution, that you can work this in either 0:19:08.119 --> 0:19:11.679 direction if you know these different variables. Now, Bayes 0:19:11.760 --> 0:19:15.240 never published his thoughts, but rather sent an essay explaining 0:19:15.280 --> 0:19:18.040 it to a friend of his, who then made sure 0:19:18.080 --> 0:19:20.840 that the work was published after Bayes had passed away, 0:19:20.880 --> 0:19:25.040 and a few decades later Pierre-Simon Laplace would take 0:19:25.119 --> 0:19:27.560 this work that Bayes had done and flesh it out 0:19:27.600 --> 0:19:32.280 into an actual formal theorem.
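To make the T shirt and pajamas example concrete, here is a minimal sketch of working the probabilities in both directions. The function name and variable names are made up for illustration, and the forty percent Tuesday figure is taken as the complement of the sixty percent chance of not wearing the shirt that the example mentions.

```python
def bayes_backward(prior, likelihood, likelihood_if_not):
    """Return (evidence, posterior) for a two-outcome Bayes problem.

    prior             -- P(A): chance the earlier event happened
    likelihood        -- P(B | A): chance of the later event if A happened
    likelihood_if_not -- P(B | not A): chance of the later event if A did not happen
    """
    # Working forward: total chance of the later event B.
    evidence = prior * likelihood + (1 - prior) * likelihood_if_not
    # Working backward: chance A happened, given that B was observed.
    posterior = prior * likelihood / evidence
    return evidence, posterior


# Numbers from the example (the 0.4 prior is inferred from "sixty percent chance I didn't").
p_shirt_tuesday = 0.4        # P(band shirt on Tuesday)
p_pajamas_if_shirt = 0.6     # P(pajamas Wednesday | shirt Tuesday)
p_pajamas_if_no_shirt = 0.8  # P(pajamas Wednesday | no shirt Tuesday)

p_pajamas, p_shirt_given_pajamas = bayes_backward(
    p_shirt_tuesday, p_pajamas_if_shirt, p_pajamas_if_no_shirt
)
print(f"P(pajamas on Wednesday) = {p_pajamas:.2f}")
print(f"P(shirt on Tuesday | pajamas on Wednesday) = {p_shirt_given_pajamas:.3f}")
```

Under those numbers, pajamas on Wednesday come out at about seventy two percent overall, and seeing pajamas on Wednesday means there was roughly a one in three chance the band shirt came out on Tuesday.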
It's an important example of 0:19:32.320 --> 0:19:36.840 conditional probability, and a lot of what machine learning 0:19:37.640 --> 0:19:42.800 really boils down to is dealing with different probabilities, not certainties, which, 0:19:42.800 --> 0:19:44.119 when you get down to it, is what most of 0:19:44.200 --> 0:19:46.120 us are doing most of the time, right? We make 0:19:46.160 --> 0:19:51.480 decisions based on at least perceived probabilities. Sometimes these decisions 0:19:51.520 --> 0:19:54.960 might feel like they're a coin flip situation, that any 0:19:55.040 --> 0:19:58.399 choice is equally likely to precipitate a good outcome or 0:19:58.440 --> 0:20:01.399 a bad outcome. Other times we might make a choice 0:20:01.400 --> 0:20:04.960 because we feel the probabilities are stacked favorably one way 0:20:05.080 --> 0:20:08.840 over another. Sometimes we will make a choice to back 0:20:08.960 --> 0:20:13.679 the least probable outcome because, well, humans are not always 0:20:13.720 --> 0:20:17.399 super rational, and heck, sometimes the long shot does pay off, 0:20:17.560 --> 0:20:22.560 so that keeps Vegas in business. Bayes' theorem is just 0:20:22.680 --> 0:20:26.000 one example of how mathematicians and philosophers figured out 0:20:26.040 --> 0:20:31.280 ways to mathematically express problem solving and decision making, and 0:20:31.320 --> 0:20:33.440 a lot of this was figuring out if there were 0:20:33.520 --> 0:20:36.119 a way to boil down things that most of us 0:20:36.119 --> 0:20:40.280 approach through intuition and experience. So it's kind of neat, 0:20:40.880 --> 0:20:43.480 and also the more you look into it, the more 0:20:43.560 --> 0:20:46.240 likely you might find it's a little spooky, because it's 0:20:46.240 --> 0:20:49.639 weird to consider that our approaches to making choices and 0:20:49.720 --> 0:20:55.399 solving problems can be reduced down to mathematical expressions.
But 0:20:56.440 --> 0:21:00.359 let's leave the potential existential crises alone for now, shall we? 0:21:00.480 --> 0:21:03.920 So moving on, we have another smarty pants we need 0:21:03.960 --> 0:21:08.479 to talk about: Andrey Markov, a Russian mathematician. In the 0:21:08.560 --> 0:21:12.120 early twentieth century, he began studying the nature of certain 0:21:12.240 --> 0:21:16.160 random processes that follow a particular type of rule, which 0:21:16.160 --> 0:21:20.000 we now call the Markov property. That rule says that 0:21:20.400 --> 0:21:24.600 for this particular process, the next stage of the process 0:21:24.720 --> 0:21:29.120 only depends upon the current stage, but not any stages 0:21:29.160 --> 0:21:33.520 that came before then. So let's take my ridiculous T 0:21:33.720 --> 0:21:36.600 shirt example, and let's build it out a little bit further. 0:21:37.000 --> 0:21:39.800 Let's say that I've got three T shirts to my name. 0:21:40.200 --> 0:21:42.119 One of them is the They Might Be Giants shirt, 0:21:42.680 --> 0:21:46.160 one is a plain blue T shirt, and the third 0:21:46.480 --> 0:21:49.240 is a shirt that has the TechStuff logo on it, 0:21:49.800 --> 0:21:54.959 and based off of long observation, you've determined 0:21:55.280 --> 0:21:59.680 the following facts. If I am wearing the They Might 0:21:59.680 --> 0:22:04.399 Be Giants shirt today, I definitely will not wear it tomorrow. 0:22:04.800 --> 0:22:08.280 But there's a fifty fifty shot I'll wear either the blue 0:22:08.280 --> 0:22:12.080 shirt or the TechStuff shirt. Now, if I'm wearing 0:22:12.280 --> 0:22:15.800 the blue shirt today, there's a ten percent chance I'm 0:22:15.800 --> 0:22:19.280 going to wear the same blue shirt tomorrow. Don't worry, 0:22:19.520 --> 0:22:23.600 I'll wash it first. There's a sixty percent chance that I'll 0:22:23.600 --> 0:22:26.320 wear the TechStuff shirt, and there's a thirty percent 0:22:26.400 --> 0:22:29.600 chance I'll wear the They Might Be Giants shirt.
But 0:22:30.520 --> 0:22:33.159 if I'm wearing the TechStuff shirt today, there's a 0:22:33.200 --> 0:22:36.399 seventy percent chance I'll wear it again tomorrow, because I like 0:22:36.440 --> 0:22:39.760 to promote myself. But there's a thirty percent chance I'll 0:22:39.760 --> 0:22:42.159 wear the They Might Be Giants shirt, and there is 0:22:42.280 --> 0:22:44.920 no chance that I'm going to wear the blue one 0:22:45.240 --> 0:22:49.520 in this case. So those are our various scenarios, right? 0:22:49.800 --> 0:22:54.560 Which shirt I will wear tomorrow depends only upon which 0:22:54.640 --> 0:22:58.120 shirt I am wearing today. What I wore yesterday has 0:22:58.119 --> 0:23:02.119 no bearing on the outcome for tomorrow, so today is 0:23:02.160 --> 0:23:05.879 all that matters. And depending on which shirt I wear, 0:23:06.320 --> 0:23:09.639 you can make some probability predictions for tomorrow. So we 0:23:09.640 --> 0:23:12.600 can actually use this approach to figure out the probability 0:23:12.640 --> 0:23:15.840 that I might wear the TechStuff shirt, say, ten 0:23:15.920 --> 0:23:19.119 days in a row, since there's a better than even 0:23:19.240 --> 0:23:22.760 chance that if I'm wearing TechStuff today, I'll end 0:23:22.840 --> 0:23:26.000 up wearing it again tomorrow. And if I wear it tomorrow, 0:23:26.240 --> 0:23:28.879 then there's a better than fifty percent chance that I'm going 0:23:28.920 --> 0:23:32.639 to wear it the following day. But at some point 0:23:32.720 --> 0:23:35.880 you're going to see that the odds are starting to 0:23:35.960 --> 0:23:40.320 be against you for, you know, increasingly long strings of 0:23:40.400 --> 0:23:44.000 wearing the TechStuff shirt.
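The three shirt example can be written down as a small transition table. What follows is only a sketch: the shirt labels and function names are invented for illustration, and it assumes the "fifty" figure for the band shirt means an even split between the blue shirt and the TechStuff shirt.

```python
import random

# Chance of tomorrow's shirt given today's shirt, as described in the example.
# Assumption: after the band shirt, blue and TechStuff are equally likely.
TRANSITIONS = {
    "tmbg":      {"tmbg": 0.0, "blue": 0.5, "techstuff": 0.5},
    "blue":      {"tmbg": 0.3, "blue": 0.1, "techstuff": 0.6},
    "techstuff": {"tmbg": 0.3, "blue": 0.0, "techstuff": 0.7},
}


def next_shirt(today, rng):
    """Pick tomorrow's shirt using only today's shirt (the Markov property)."""
    options = TRANSITIONS[today]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def streak_probability(shirt, days):
    """Chance of wearing `shirt` for `days` days in a row, given it is worn on day one."""
    return TRANSITIONS[shirt][shirt] ** (days - 1)


rng = random.Random(42)  # fixed seed so the simulated week is repeatable
week = ["techstuff"]
for _ in range(6):
    week.append(next_shirt(week[-1], rng))
print(week)

print(f"Ten TechStuff days in a row: {streak_probability('techstuff', 10):.4f}")
```

Each single day favors another TechStuff day, but nine repeats in a row multiply out to about a four percent chance, which is the point about long streaks working against you.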
Anyway, Markov chains would become 0:23:44.040 --> 0:23:46.920 one of the types of processes that machine learning models 0:23:46.960 --> 0:23:50.520 would incorporate, with some models looking at the current state 0:23:50.600 --> 0:23:53.639 of a given process and then making predictions on what 0:23:53.920 --> 0:23:57.399 the next state will be, with no need to look 0:23:57.560 --> 0:24:03.320 back at the previous decision. The Markov chain is memoryless. 0:24:04.400 --> 0:24:07.680 Now that's just a couple of the mathematicians whose work 0:24:07.840 --> 0:24:12.159 underlies elements of machine learning. There's also structure we need 0:24:12.200 --> 0:24:15.880 to talk about. In nineteen forty nine, a man named Donald 0:24:15.920 --> 0:24:19.520 Hebb wrote a book titled The Organization of Behavior, and 0:24:19.600 --> 0:24:24.200 in that book, Hebb gave a hypothesis on how neurons, 0:24:24.480 --> 0:24:27.879 that is, how brain cells, interact with one another. 0:24:28.440 --> 0:24:32.480 His ideas included the notion that if two neurons interact 0:24:32.520 --> 0:24:36.760 with one another regularly, that is, if one fires, 0:24:36.880 --> 0:24:40.440 the second one is also likely to fire, they end 0:24:40.520 --> 0:24:44.959 up forming a tighter communicative relationship with each other. Not 0:24:45.160 --> 0:24:50.320 long after his expression of this hypothesis, computer scientists began 0:24:50.359 --> 0:24:53.000 to think of a potential way to do this artificially, 0:24:53.400 --> 0:24:59.120 with machines creating the equivalent of artificial neurons. The relative 0:24:59.280 --> 0:25:04.080 strength of the relationship between artificial neurons is something we describe 0:25:04.119 --> 0:25:07.520 by weight. That's going to be an important part of 0:25:07.560 --> 0:25:11.439 machine learning. Weight, by the way, is W E I 0:25:11.720 --> 0:25:15.640 G H T, as in this relationship is weighted more 0:25:15.720 --> 0:25:21.040 heavily than that relationship.
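One common textbook way to turn Hebb's idea into arithmetic is to nudge a weight upward whenever the two connected units are active at the same time. The sketch below assumes that simple rule; the function name, the learning rate of 0.1, and the firing pattern are all illustrative choices, not details from the episode or from Hebb's book.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen a connection in proportion to joint activity.

    pre and post are the activity levels of the sending and receiving
    units (1.0 for firing, 0.0 for silent in this toy example).
    """
    return weight + learning_rate * pre * post


weight = 0.0
# Hypothetical history of (sender fired, receiver fired) pairs.
history = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
for pre, post in history:
    weight = hebbian_update(weight, pre, post)
print(f"Weight after {len(history)} steps: {weight:.1f}")
```

The weight only grows on the three steps where both units fire together, so the connection that sees more joint activity ends up weighted more heavily than one that does not.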
In the early nineteen fifties, an 0:25:21.200 --> 0:25:25.080 IBM researcher named Arthur Samuel created a program designed to 0:25:25.119 --> 0:25:28.399 win at checkers. The program would do a quick analysis 0:25:28.440 --> 0:25:32.680 of where pieces were on a checkerboard and whose move 0:25:32.720 --> 0:25:36.080 it was, and then calculate the chances of each side 0:25:36.080 --> 0:25:38.960 winning the game based on those positions, and it did 0:25:39.000 --> 0:25:43.119 this with a minimax approach. All right, so checkers is 0:25:43.160 --> 0:25:46.840 a two player turn based game. Player one makes a move, 0:25:47.160 --> 0:25:49.399 then player two can make a move. There are a 0:25:49.440 --> 0:25:52.840 finite number of moves that can be made, a finite 0:25:52.960 --> 0:25:57.159 number of possibilities, though admittedly it's a pretty good number 0:25:57.200 --> 0:26:00.520 of possibilities. But let's say a game has been going 0:26:00.520 --> 0:26:03.439 on for a few moves, and you've got your two sides. 0:26:03.480 --> 0:26:06.040 You've got the red checkers over on player one's side 0:26:06.160 --> 0:26:08.880 and the black checkers for player two. Let's say 0:26:08.880 --> 0:26:12.080 it's player one's move. For the purposes of this example, 0:26:12.400 --> 0:26:15.040 we'll say that player one really just has one piece 0:26:15.200 --> 0:26:19.119 that they can actually move on this turn, and it 0:26:19.160 --> 0:26:23.480