WEBVTT - Rerun: Machine Learning 101 0:00:04.400 --> 0:00:07.800 Welcome to TechStuff, a production from iHeartRadio. 0:00:11.840 --> 0:00:14.040 Hey there, and welcome to TechStuff. I'm your host, 0:00:14.120 --> 0:00:16.759 Jonathan Strickland. I'm an executive producer with iHeartRadio. And 0:00:16.800 --> 0:00:20.479 how the tech are you? All right, well, I'm still on vacation. 0:00:20.600 --> 0:00:24.040 I'll be coming back soon, so tomorrow you should expect 0:00:24.040 --> 0:00:27.920 a brand new episode unless something goes wrong while I'm 0:00:27.920 --> 0:00:31.000 trying to get back. Hopefully nothing like that happens. And 0:00:31.440 --> 0:00:33.920 so we thought we'd have a little rerun. This episode 0:00:33.960 --> 0:00:38.080 originally published in April one, so just last year. It 0:00:38.200 --> 0:00:42.400 is titled Machine Learning 101. And I wanted 0:00:42.440 --> 0:00:45.360 to do this one because, as always, we hear a 0:00:45.440 --> 0:00:48.919 lot about artificial intelligence and machine learning in the news 0:00:48.960 --> 0:00:54.040 and in media, and often those topics get a little confusing. 0:00:54.120 --> 0:00:59.520 They can come across more broad than some people intend, 0:00:59.840 --> 0:01:04.640 or they can be somewhat misguided in their interpretations. 0:01:04.680 --> 0:01:06.480 So I thought it would be useful to have a 0:01:06.480 --> 0:01:10.399 little refresher course on machine learning and artificial intelligence. I 0:01:10.400 --> 0:01:13.920 hope you enjoy, and I will be back at 0:01:13.920 --> 0:01:20.200 the end. Back in 1986, there was a comedy science fiction 0:01:20.280 --> 0:01:24.200 film that I saw in the theater about a robot 0:01:24.600 --> 0:01:28.560 that gains sentience and becomes a total goofball. 0:01:28.600 --> 0:01:31.039 It hit theaters in '86 and it was called 0:01:31.640 --> 0:01:36.080 Short Circuit. 
The movie starred Steve Guttenberg, Ally Sheedy, and 0:01:36.360 --> 0:01:40.440 lamentably a white actor named Fisher Stevens playing a 0:01:40.480 --> 0:01:44.520 non-white character, someone who is Indian. I should add that's 0:01:44.520 --> 0:01:48.240 not Stevens's fault. I mean, he auditioned to be in 0:01:48.240 --> 0:01:50.640 a movie and he got a gig. He didn't cast 0:01:50.720 --> 0:01:53.080 himself in the film, and he has since talked about 0:01:53.120 --> 0:01:56.720 his experiences, realizing the problems with a white man playing 0:01:56.720 --> 0:01:59.760 a non-white character. But setting aside all the problematic 0:01:59.760 --> 0:02:04.080 whitewashing, the movie showed this robot, who in the 0:02:04.080 --> 0:02:08.440 course of the film names itself Johnny 5, learning. It 0:02:08.560 --> 0:02:11.560 learns about the world around it, it learns about people, 0:02:12.080 --> 0:02:16.960 it learns about human concepts like humor and emotion, and 0:02:17.000 --> 0:02:20.919 the general idea was pretty cute. Now, the nifty thing 0:02:21.040 --> 0:02:25.680 is machines actually can learn. In fact, machine learning is 0:02:25.720 --> 0:02:29.320 a really important field of study these days, complete with 0:02:29.360 --> 0:02:32.959 its own challenges and risks. I've talked about machine learning 0:02:33.240 --> 0:02:35.040 a few times in the past, but I figured we 0:02:35.040 --> 0:02:38.400 could do a deeper dive to understand what machine learning 0:02:38.560 --> 0:02:42.160 is, what it isn't, how people are leveraging machine learning, 0:02:42.240 --> 0:02:45.919 and why I said that it does come with risks. 0:02:45.919 --> 0:02:53.280 So let's learn about machines learning. It will be impossible 0:02:53.360 --> 0:02:56.800 to talk about machine learning without also talking about artificial 0:02:56.840 --> 0:03:01.840 intelligence, or AI. And this term artificial intelligence is a 0:03:02.000 --> 0:03:06.520 real doozy. 
It trips people up, even people who have 0:03:06.680 --> 0:03:11.560 dedicated their lives to researching and developing artificial intelligence. You 0:03:11.600 --> 0:03:16.200 can get two experts in AI talking about AI and 0:03:16.240 --> 0:03:19.000 find out that because they have slightly different takes on 0:03:19.160 --> 0:03:24.680 what AI is, there are some communication issues. It's not 0:03:24.760 --> 0:03:27.480 as simple as Red vs. Blue would have you think: 0:03:28.080 --> 0:03:33.680 "What does the A stand for?" So when you really 0:03:34.120 --> 0:03:36.440 boil it down, it comes as no big 0:03:36.480 --> 0:03:39.480 surprise that there's a lot of ambiguity here. After all, 0:03:39.840 --> 0:03:44.880 how would you define intelligence? Just intelligence, not artificial intelligence, 0:03:45.240 --> 0:03:49.880 just intelligence. Well, would it be the ability to learn, 0:03:50.240 --> 0:03:54.480 that is, to acquire skills and knowledge? Or is it 0:03:54.560 --> 0:03:57.920 the application of learning? Is it problem solving? Is it 0:03:58.400 --> 0:04:01.680 being able to think ahead and make plans in order 0:04:01.720 --> 0:04:05.960 to achieve a specific goal? Is it the ability to 0:04:06.240 --> 0:04:09.800 examine a problem and deconstruct it in order to figure out 0:04:09.840 --> 0:04:12.840 the best solution, a more specific version of problem solving? 0:04:13.480 --> 0:04:18.800 Is it the ability to recognize, understand, and navigate emotional scenarios? Now, 0:04:18.920 --> 0:04:24.200 arguably it's all of these things and more. We all 0:04:24.240 --> 0:04:28.640 have kind of an intuitive grasp on what intelligence is, 0:04:29.560 --> 0:04:34.240 but defining it in a simple way tends to feel 0:04:34.240 --> 0:04:37.680 reductive and it leaves out a lot of important details. 
0:04:37.720 --> 0:04:43.440 So if defining just general intelligence is hard, it stands 0:04:43.440 --> 0:04:46.720 to reason that defining artificial intelligence is also a 0:04:46.760 --> 0:04:50.600 tough job. Heck, even coming up with a number of 0:04:50.640 --> 0:04:54.680 different types of AI is tricky. And if you 0:04:54.720 --> 0:04:59.159 don't believe me, just Google the phrase "different types of 0:04:59.279 --> 0:05:03.400 artificial intelligence." Never mind, you don't really actually 0:05:03.440 --> 0:05:06.119 have to do that. I already did it, though feel 0:05:06.160 --> 0:05:08.640 free to do it yourself and check my work if 0:05:08.680 --> 0:05:13.360 you like. When I Googled that phrase, different types of AI, 0:05:13.520 --> 0:05:16.400 some of the top results included a blog post on 0:05:16.600 --> 0:05:21.480 BMC Software titled Four Types of Artificial Intelligence. But then 0:05:21.520 --> 0:05:24.279 there was also an article on Codebots that was 0:05:24.320 --> 0:05:27.680 titled What Are the Three Types of AI? And then 0:05:27.720 --> 0:05:31.440 there was an article from Forbes titled Seven Types of 0:05:31.520 --> 0:05:35.600 Artificial Intelligence. See, we can't even agree on how many 0:05:35.720 --> 0:05:39.200 versions of AI there are, because defining AI 0:05:40.080 --> 0:05:44.040 is really hard. It largely depends upon how you view 0:05:44.200 --> 0:05:46.720 AI and then how you break it down into different 0:05:46.760 --> 0:05:51.599 realms of intelligence. 
Now we could go super high level, 0:05:51.920 --> 0:05:55.159 because a classic way to look at AI is strong 0:05:55.760 --> 0:06:02.240 versus weak artificial intelligence. Strong AI, sometimes called 0:06:02.440 --> 0:06:08.760 artificial general intelligence, would be a machine that processes information 0:06:09.040 --> 0:06:13.400 and at least appears to have some form of consciousness 0:06:13.480 --> 0:06:17.440 and self-awareness and the ability to both have experiences 0:06:17.480 --> 0:06:21.359 and to be aware that it is having experiences. It 0:06:21.440 --> 0:06:25.599 might even feel emotion, though maybe not emotions that we 0:06:25.680 --> 0:06:29.480 could easily identify or sympathize with. So this would be 0:06:30.080 --> 0:06:33.840 the kind of machine that would think in a way 0:06:34.000 --> 0:06:36.840 similar to humans. It would be able to sense its 0:06:36.920 --> 0:06:40.640 environment and not just react, but really process what is 0:06:40.680 --> 0:06:43.839 going on and build an understanding. It's the type of 0:06:43.880 --> 0:06:46.880 AI that we see a lot in science fiction. It's 0:06:46.920 --> 0:06:50.000 the type of AI of Johnny 5 from Short Circuit 0:06:50.480 --> 0:06:53.719 or HAL from 2001, or the droids in 0:06:53.800 --> 0:06:57.880 Star Wars. It's also a type of artificial intelligence that 0:06:57.960 --> 0:07:01.480 we have yet to actually achieve in the real world. 0:07:02.000 --> 0:07:06.520 So then what is weak AI? Well, you could say 0:07:06.520 --> 0:07:10.120 it's everything else, or you could say it's the building 0:07:10.160 --> 0:07:16.080 blocks that maybe collectively will lead to strong AI. Weak 0:07:16.240 --> 0:07:21.160 AI involves processes that allow machines to complete tasks. So, 0:07:21.240 --> 0:07:25.640 for example, image recognition software could fall into this category. 
0:07:25.960 --> 0:07:29.640 Once upon a time, in order to search photos effectively, 0:07:30.160 --> 0:07:34.680 you needed to actually add metadata like tags to 0:07:34.880 --> 0:07:40.040 those photos. So, for example, I might tag pictures of 0:07:40.080 --> 0:07:44.080 my dog with the meta tag dog, and then if 0:07:44.080 --> 0:07:46.920 I wanted to see photos of my pooch, then I 0:07:46.920 --> 0:07:49.920 would pull up my photo app and search the term dog, 0:07:50.440 --> 0:07:52.920 and all the photos that I had tagged with the 0:07:52.960 --> 0:07:55.320 word dog would show up. But if I had failed 0:07:55.480 --> 0:07:59.520 to tag some pictures of my dog, those pictures wouldn't 0:07:59.560 --> 0:08:02.200 pop up in search because the computer program wasn't actually 0:08:02.280 --> 0:08:05.200 looking for dogs in my photos. It was just looking 0:08:05.200 --> 0:08:08.720 for photos that had that particular meta tag attached to them. 0:08:09.480 --> 0:08:12.320 But now we've reached a point where at least some 0:08:12.400 --> 0:08:16.720 photo apps are using image recognition to analyze photos, and 0:08:16.760 --> 0:08:20.120 these will return results that the algorithm has identified as 0:08:20.160 --> 0:08:23.560 having a reasonable chance of meeting your search query. So 0:08:23.840 --> 0:08:26.280 if I used an app like that and I put 0:08:26.320 --> 0:08:29.480 in dog as my search term, it could pull up 0:08:29.480 --> 0:08:32.640 photos that had no meta tags attached to them at all, 0:08:33.120 --> 0:08:36.520 because the search is relying on image recognition. Now, this 0:08:36.640 --> 0:08:40.680 also means that if the image recognition algorithm isn't very good, 0:08:40.720 --> 0:08:42.960 I could get some images that don't have a dog 0:08:43.000 --> 0:08:46.480 in them at all, or it might miss other images 0:08:46.520 --> 0:08:48.960 that have my dog in them. 
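To make the difference between the two kinds of search concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the file names, the tags, and especially the confidence scores, which stand in for whatever a real image recognition model might output. It is not how any particular photo app works.

```python
# Hypothetical photo library. "beach.jpg" shows a dog but was never tagged.
photos = [
    {"file": "park.jpg", "tags": {"dog", "park"}},
    {"file": "beach.jpg", "tags": set()},
    {"file": "lunch.jpg", "tags": {"food"}},
]

# Assumed scores: how confident an imaginary recognizer is that each photo
# contains a dog. A real model would compute these from the pixels.
dog_scores = {"park.jpg": 0.97, "beach.jpg": 0.88, "lunch.jpg": 0.05}


def search_by_tag(library, term):
    """Old-style search: only finds photos a person remembered to tag."""
    return [photo["file"] for photo in library if term in photo["tags"]]


def search_by_recognition(scores, threshold=0.5):
    """Recognition-style search: returns photos whose score clears a cutoff."""
    return [name for name, score in scores.items() if score >= threshold]


print(search_by_tag(photos, "dog"))       # misses the untagged beach photo
print(search_by_recognition(dog_scores))  # finds it, if the scores are good
```

The cutoff is where things can go wrong: a bad score above the threshold is a photo with no dog in it, and a bad score below the threshold is a dog photo that gets missed.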
But my point is 0:08:49.000 --> 0:08:52.080 that the ability to identify whether or not a dog 0:08:52.160 --> 0:08:56.000 is in a particular photo represents a kind of weak 0:08:56.160 --> 0:09:01.560 artificial intelligence. You wouldn't say that the photo search tool 0:09:01.720 --> 0:09:05.560 possesses humanlike intelligence, because really it only does one thing. 0:09:06.120 --> 0:09:10.200 It analyzes photos and looks for matches to specific search queries, 0:09:10.559 --> 0:09:14.360 but it can't do anything outside of that use case. However, 0:09:14.400 --> 0:09:17.080 that's just one little example. There are all sorts of 0:09:17.080 --> 0:09:23.120 other ones, like voice recognition, environmental sensing, course plotting, that 0:09:23.200 --> 0:09:25.760 kind of thing. And in some circles, as we get 0:09:25.800 --> 0:09:30.320 better at making machines and systems that can do these things, 0:09:31.120 --> 0:09:34.120 those elements seem to kind of drift away from the 0:09:34.200 --> 0:09:38.960 ongoing conversation about artificial intelligence. A guy named Larry Tesler, 0:09:39.160 --> 0:09:41.320 who was a computer scientist who worked at lots of 0:09:41.320 --> 0:09:46.320 really important places like Xerox PARC and Amazon and Apple, 0:09:46.840 --> 0:09:52.200 he once observed, quote, intelligence is whatever machines haven't done yet, 0:09:52.559 --> 0:09:55.920 end quote. So his point was that the reason that 0:09:56.000 --> 0:09:58.560 AI is really hard to talk about is that the 0:09:58.600 --> 0:10:04.160 goalpost for what actually is artificial intelligence is constantly moving. 0:10:06.000 --> 0:10:08.560 Now this pretty much mirrors how we think about things 0:10:08.600 --> 0:10:13.439 like consciousness. 
Lots of people study consciousness, and the general 0:10:13.480 --> 0:10:16.040 sense I get is that it's a lot easier for 0:10:16.080 --> 0:10:20.160 people to talk about what isn't consciousness rather than what 0:10:20.520 --> 0:10:25.080 consciousness actually is. And it seems like artificial intelligence is 0:10:25.120 --> 0:10:28.640 in a similar place, which really isn't that big of 0:10:28.640 --> 0:10:33.640 a surprise, as we closely associate intelligence with consciousness. Now 0:10:33.679 --> 0:10:36.959 this leads us to why there are so many different 0:10:37.040 --> 0:10:41.000 takes on how many types of AI there are. It 0:10:41.000 --> 0:10:45.400 all depends on how you classify different disciplines in artificial intelligence, 0:10:45.720 --> 0:10:48.920 and over time, a lot of disciplines that were previously 0:10:49.080 --> 0:10:53.480 distinct from AI have sort of converged into becoming part 0:10:53.600 --> 0:10:56.840 of the AI discussion. Machine learning, as it turns out, 0:10:57.360 --> 0:11:00.880 was part of the AI discussion, branched off from it, 0:11:01.120 --> 0:11:05.480 and then rejoined the AI discussion years later. So I 0:11:05.520 --> 0:11:08.000 am not going to go down all the different approaches 0:11:08.040 --> 0:11:10.640 to classification because I don't know that they would be 0:11:10.760 --> 0:11:13.840 that valuable to us. They would really just illustrate that 0:11:13.880 --> 0:11:16.280 there are a lot of different ways to look at 0:11:16.320 --> 0:11:21.560 the subject. So if you ever find yourself in a 0:11:21.600 --> 0:11:25.760 conversation about AI, it might be a good idea to 0:11:25.800 --> 0:11:29.400 set a few ground rules as to what everyone means 0:11:29.840 --> 0:11:33.320 when they use the term artificial intelligence. That can help 0:11:33.559 --> 0:11:38.360 with expectations and understanding. 
Or you could just run for 0:11:38.400 --> 0:11:41.560 the nearest exit, which is what people tend to do 0:11:41.640 --> 0:11:48.120 whenever I start talking about it anyway. What about machine learning? Well, 0:11:48.200 --> 0:11:51.240 from one perspective, you could say machine learning is a 0:11:51.360 --> 0:11:55.520 subdiscipline of artificial intelligence, although like I said, it 0:11:55.600 --> 0:11:59.679 hasn't always been viewed as such. I think most people 0:11:59.760 --> 0:12:02.880 would say that the ability to learn, that is, to 0:12:03.200 --> 0:12:07.520 take information and experience and then have some form of 0:12:07.640 --> 0:12:11.120 understanding of those things so that you can apply that 0:12:11.200 --> 0:12:15.200 to future tasks, potentially getting better over time, I would 0:12:15.240 --> 0:12:18.880 say most people would call that part of intelligence. But 0:12:19.480 --> 0:12:21.400 you could also be a bit more wishy-washy and 0:12:21.440 --> 0:12:25.000 say it's related to, you know, artificial intelligence, as opposed 0:12:25.040 --> 0:12:28.080 to being part of AI, since the definition of AI 0:12:28.240 --> 0:12:33.320 is, let's say, fluid. Either way of classifying machine learning works, 0:12:33.360 --> 0:12:37.960 as far as I'm concerned. Machine learning boils down to 0:12:38.000 --> 0:12:41.520 the idea of creating a system that can learn as 0:12:41.559 --> 0:12:45.360 it performs a task. It can learn what works and, 0:12:45.520 --> 0:12:49.280 more importantly, what does not work. You may have heard 0:12:49.360 --> 0:12:51.920 that we learn a lot more from our mistakes than 0:12:51.960 --> 0:12:56.320 we do from our successes, which is pretty much true 0:12:56.360 --> 0:13:00.480 in my experience. When something goes wrong, it's usually, but 0:13:00.800 --> 0:13:05.640 not always, possible to trace the event or events that 0:13:05.800 --> 0:13:09.920 led to the failure. 
You can identify decisions that were 0:13:09.960 --> 0:13:13.400 probably the wrong ones or that led to a bad outcome. 0:13:14.120 --> 0:13:17.640 But if you have a success, it's hard to figure 0:13:17.679 --> 0:13:22.600 out which decisions were key to that successful outcome. Did 0:13:22.640 --> 0:13:25.199 your decision at step two set you on the right path, 0:13:25.600 --> 0:13:28.720 or was your choice at step three so good that 0:13:28.800 --> 0:13:31.840 it helped correct a mistake that you made at step two? 0:13:32.360 --> 0:13:35.319 But a good approach to machine learning involves a system 0:13:35.480 --> 0:13:38.560 that can adjust things on its own to reduce mistakes 0:13:38.960 --> 0:13:41.839 and increase the success rate. And another way of putting 0:13:41.880 --> 0:13:44.959 it is that instead of programming a system to arrive 0:13:45.000 --> 0:13:48.920 at a specific outcome, you are training the system to 0:13:49.080 --> 0:13:52.480 learn how to do it by itself. And that sounds 0:13:52.480 --> 0:13:55.240 a bit magical when you put it that way, doesn't it? 0:13:55.800 --> 0:13:59.040 It sounds like someone just took a computer and showed 0:13:59.040 --> 0:14:01.840 it pictures of cats and then expected the computer to 0:14:01.880 --> 0:14:05.200 know what a cat was. And this actually does mirror 0:14:05.360 --> 0:14:09.000 an actual project that really did do that, but I'm 0:14:09.080 --> 0:14:13.320 leaving out some big important information in the middle. Now, 0:14:13.840 --> 0:14:17.679 one big step is that computers and machines can't just 0:14:17.800 --> 0:14:20.880 magically learn by default. People first had to come up 0:14:20.920 --> 0:14:24.240 with a methodology that allows machines to go through the 0:14:24.280 --> 0:14:27.960 process of completing a task, then making adjustments to the 0:14:28.080 --> 0:14:32.920 process of doing that task, which would then improve future results. 
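That loop of "do the task, measure the mistake, adjust, repeat" can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: a toy task where the system has to discover the multiplier that turns each input into its output, a made-up learning rate, and a simple error-driven update. It is not a description of any specific system mentioned in the episode.

```python
def train(examples, learning_rate=0.05, epochs=200):
    """Learn a single multiplier w so that w * x lands close to y."""
    w = 0.0  # the system starts out knowing nothing
    for _ in range(epochs):
        for x, y in examples:
            prediction = w * x
            error = prediction - y            # how wrong was this attempt?
            w -= learning_rate * error * x    # nudge w to shrink that mistake
    return w


# Toy training data. Nobody tells the program the rule is "multiply by two";
# it only ever sees example inputs and the outputs it should have produced.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

learned_w = train(examples)
print(round(learned_w, 4))
```

Nothing in `train` mentions the number two. The value is arrived at by repeatedly correcting mistakes, which is the sense in which the system is trained rather than programmed.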
0:14:33.440 --> 0:14:36.960 We have to lay the groundwork in architecture and theory 0:14:37.160 --> 0:14:41.160 and algorithms. We have to build the logical pathways that 0:14:41.200 --> 0:14:44.760 computers can follow in order for them to learn. A 0:14:44.800 --> 0:14:49.680 lot of machine learning revolves around patterns and pattern recognition. 0:14:50.080 --> 0:14:52.400 So what do I mean by patterns? Well, I mean 0:14:52.560 --> 0:14:58.680 some form of regularity and predictability. Machine learning models analyze 0:14:58.720 --> 0:15:03.040 patterns and attempt to draw conclusions based on those patterns. 0:15:03.760 --> 0:15:07.120 This in itself is tricky stuff. So why is that? Well, 0:15:07.160 --> 0:15:11.720 it's because sometimes we might think there's a pattern when 0:15:11.720 --> 0:15:17.040 in reality there is not. We humans are pretty good 0:15:17.320 --> 0:15:22.160 at recognizing patterns, which makes sense. It's a survival mechanism. 0:15:22.200 --> 0:15:25.280 If you were to look at tall grass and you 0:15:25.480 --> 0:15:28.800 see patterns that suggest the presence of a predator like 0:15:29.000 --> 0:15:33.200 a tiger, well, you would know that danger is nearby, 0:15:33.240 --> 0:15:36.120 and you would have the opportunity to do something about 0:15:36.160 --> 0:15:40.200 that to help your chances of survival. If, however, you 0:15:40.320 --> 0:15:44.400 remained blissfully unaware of the danger, you'd be far more 0:15:44.480 --> 0:15:48.240 likely to fall prey to that hungry tiger. So recognizing 0:15:48.320 --> 0:15:51.280 patterns is one of the abilities that gave humans a 0:15:51.360 --> 0:15:55.080 chance to live another day, and, from an evolutionary standpoint, 0:15:55.120 --> 0:16:00.240 a chance to make more humans. But sometimes we humans 0:16:00.280 --> 0:16:05.360 will perceive a pattern where none actually exists. 
A simple 0:16:05.360 --> 0:16:08.760 example of this is the fun exercise of lying on 0:16:08.800 --> 0:16:13.000 your back outside, looking up at the clouds and saying, 0:16:13.040 --> 0:16:16.600 what does that cloud remind you of? The shapes of clouds, 0:16:16.680 --> 0:16:21.120 which have no significance and are the product of environmental factors, 0:16:21.560 --> 0:16:25.040 can seem to suggest patterns to us. We might see 0:16:25.040 --> 0:16:28.840 a dog, or a car, or a face, but we 0:16:28.920 --> 0:16:32.880 know that what we're really seeing is just the appearance 0:16:33.000 --> 0:16:35.400 of a pattern. It's not evidence of a pattern 0:16:35.480 --> 0:16:40.000 actually being there. It's noise, not signal. But it could 0:16:40.040 --> 0:16:44.200 be misinterpreted as signal. Well, it turns out that in 0:16:44.280 --> 0:16:47.440 machine learning applications this is also an issue. I'll talk 0:16:47.480 --> 0:16:50.520 about it more towards the end of this episode. Computers 0:16:50.560 --> 0:16:55.400 can sometimes misinterpret data and determine something represents a pattern 0:16:55.480 --> 0:16:58.760 when it really doesn't. When that happens, a system relying 0:16:58.760 --> 0:17:02.760 on machine learning can produce false positives, and the consequences 0:17:02.800 --> 0:17:06.159 can sometimes be funny, like, hey, this image recognition software 0:17:06.200 --> 0:17:09.119 thinks this coffee mug is actually a kitty cat. Or 0:17:09.160 --> 0:17:12.640 they can be really serious and potentially harmful: hey, this 0:17:12.800 --> 0:17:17.120 facial recognition software has misidentified a person, marking them as, say, 0:17:17.200 --> 0:17:20.240 a person of interest in a criminal case. And it's 0:17:20.240 --> 0:17:23.280 all because this facial recognition software isn't very good at 0:17:23.320 --> 0:17:29.040 differentiating people of color. That's a real problem that really happens. 
Now, 0:17:29.040 --> 0:17:31.800 when we come back, I'll give a little overview of 0:17:31.880 --> 0:17:35.080 the evolution of machine learning. But before we do that, 0:17:35.720 --> 0:17:46.560 let's take a quick break. To talk about the history 0:17:46.760 --> 0:17:50.080 of machine learning, we first have to look back much, 0:17:50.560 --> 0:17:54.080 much earlier, long before the era of computers, and talk 0:17:54.160 --> 0:17:58.480 about how thinkers like Thomas Bayes thought about the act 0:17:58.720 --> 0:18:03.400 of problem solving. Bayes was born way back in 1702, 0:18:03.440 --> 0:18:06.320 so quite a bit before we were thinking about machine learning, 0:18:06.720 --> 0:18:11.400 but he was interested in problem solving for problems involving probabilities, 0:18:11.840 --> 0:18:16.480 and specifically the relationship between different probabilities. I think it's 0:18:16.520 --> 0:18:19.440 easier to talk about if I give you an example. 0:18:20.040 --> 0:18:22.520 So let's make a silly one. All right, so let's 0:18:22.560 --> 0:18:27.200 say we got ourselves a plucky podcaster. Hey there, everybody, 0:18:27.440 --> 0:18:31.960 it's Jonathan Strickland, and it's Tuesday as I record this. 0:18:32.160 --> 0:18:35.040 And because of who I am, you know who this 0:18:35.119 --> 0:18:39.800 podcaster is. And because it's Tuesday, there is a chance 0:18:39.960 --> 0:18:42.840 I am wearing a They Might Be Giants T-shirt. 0:18:43.320 --> 0:18:48.080 And we also know that if this podcaster is wearing 0:18:48.280 --> 0:18:51.800 a They Might Be Giants T-shirt on a Tuesday, 0:18:52.000 --> 0:18:55.639 there's a sixty percent chance that I'm going to end up 0:18:55.640 --> 0:18:59.720 wearing pajamas on Wednesday. 
But we also know that if 0:18:59.760 --> 0:19:04.280 I did not wear a They Might Be Giants shirt on Tuesday, 0:19:04.480 --> 0:19:08.359 and remember there's a six chance I didn't, then we 0:19:08.440 --> 0:19:10.879 know there's an eighty percent chance I'm going to be 0:19:10.920 --> 0:19:15.359 wearing pajamas on Wednesday. Well, Bayes worked out a way 0:19:15.440 --> 0:19:20.240 that described the sort of probability relationship between different discrete 0:19:20.320 --> 0:19:24.320 events, and using his reasoning, you can work forward or 0:19:24.440 --> 0:19:29.000 backward based on probabilities. Bayes would describe wearing a They 0:19:29.080 --> 0:19:32.240 Might Be Giants shirt on Tuesday as one event and 0:19:32.280 --> 0:19:36.360 wearing pajamas on Wednesday as a separate event, and then 0:19:36.400 --> 0:19:39.399 describe the relationship between the two, not only determining how likely it is 0:19:39.440 --> 0:19:43.760 I'll wear pajamas on Wednesday. If we start with 0:19:43.880 --> 0:19:46.439 the later event, in other words, if we start with 0:19:46.480 --> 0:19:50.199 the fact that it's Wednesday and I'm wearing pajamas, we 0:19:50.240 --> 0:19:55.360 could work out how likely it was that yesterday, on Tuesday, 0:19:55.440 --> 0:19:58.719 I was wearing a They Might Be Giants shirt. That was 0:19:58.800 --> 0:20:01.240 his contribution, that you can work this in either 0:20:01.359 --> 0:20:04.919 direction if you know these different variables. Now, Bayes 0:20:05.000 --> 0:20:08.480 never published his thoughts, but rather sent an essay explaining 0:20:08.520 --> 0:20:11.280 it to a friend of his, who then made sure 0:20:11.359 --> 0:20:13.879 that the work was published after Bayes had passed away. 0:20:14.160 --> 0:20:18.280 And a few decades later, Pierre-Simon Laplace would take 0:20:18.359 --> 0:20:20.800 this work that Bayes had done and flesh it out 0:20:20.840 --> 0:20:25.520 into an actual formal theorem. 
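The "work it forward or backward" idea in the shirt-and-pajamas example can be sketched in Python. One loud caveat: the transcript does not clearly give the chance of wearing the They Might Be Giants shirt on Tuesday, so the 0.4 below is purely an assumed placeholder. The sixty percent and eighty percent figures are the ones from the example. The function name `posterior` is just a label chosen for this sketch.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem for one yes/no hypothesis and one observed event.

    prior               -- P(hypothesis) before seeing the event
    likelihood_if_true  -- P(event | hypothesis)
    likelihood_if_false -- P(event | not hypothesis)
    Returns P(hypothesis | event).
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


p_shirt_tuesday = 0.4           # ASSUMPTION: not stated clearly in the episode
p_pajamas_given_shirt = 0.6     # from the example
p_pajamas_given_no_shirt = 0.8  # from the example

# Working forward: overall chance of pajamas on Wednesday.
p_pajamas = (p_pajamas_given_shirt * p_shirt_tuesday
             + p_pajamas_given_no_shirt * (1 - p_shirt_tuesday))

# Working backward: it is Wednesday and the pajamas are on. How likely is it
# that the shirt was worn on Tuesday?
p_shirt_given_pajamas = posterior(
    p_shirt_tuesday, p_pajamas_given_shirt, p_pajamas_given_no_shirt
)

print(round(p_pajamas, 3), round(p_shirt_given_pajamas, 3))
```

With that assumed 0.4, seeing pajamas on Wednesday pulls the estimate for the Tuesday shirt down to about one in three, because pajamas are more likely on weeks when the shirt stayed in the drawer. A different prior would give a different answer, which is the whole point of the theorem.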
It's an important example of 0:20:25.600 --> 0:20:30.080 conditional probability, and a lot of what machine learning 0:20:30.880 --> 0:20:36.000 really boils down to is dealing with different probabilities, not certainties, which, 0:20:36.040 --> 0:20:37.399 when you get down to it, is what most of 0:20:37.440 --> 0:20:39.360 us are doing most of the time, right? We make 0:20:39.400 --> 0:20:44.720 decisions based on at least perceived probabilities. Sometimes these decisions 0:20:44.800 --> 0:20:48.200 might feel like they're a coin flip situation, that any 0:20:48.320 --> 0:20:51.639 choice is equally likely to precipitate a good outcome or 0:20:51.680 --> 0:20:54.640 a bad outcome. Other times we might make a choice 0:20:54.680 --> 0:20:58.240 because we feel the probabilities are stacked favorably one way 0:20:58.320 --> 0:21:02.080 over another. Sometimes we will make a choice to back 0:21:02.240 --> 0:21:07.720 the least probable outcome, because, well, humans are not always super rational. 0:21:07.760 --> 0:21:10.960 And heck, sometimes the long shot does pay off, so 0:21:11.920 --> 0:21:16.120 that keeps Vegas in business. Bayes' theorem is just one 0:21:16.160 --> 0:21:19.639 example of ways that mathematicians and philosophers figured out 0:21:19.680 --> 0:21:24.639 to mathematically express problem solving and decision making. And a 0:21:24.680 --> 0:21:26.879 lot of this was figuring out if there were a 0:21:26.920 --> 0:21:29.880 way to boil down things that most of us approach 0:21:29.960 --> 0:21:34.359 through intuition and experience. So it's kind of neat, and 0:21:34.480 --> 0:21:37.080 also the more you look into it, the more likely 0:21:37.119 --> 0:21:39.879 you might find it's a little spooky, because it's weird to 0:21:39.880 --> 0:21:43.960 consider that our approaches to making choices and solving problems 0:21:44.240 --> 0:21:50.440 can be reduced down to mathematical expressions. 
But let's leave 0:21:50.520 --> 0:21:53.840 the potential existential crises alone for now, shall we? So 0:21:53.960 --> 0:21:57.280 moving on, we have another smarty pants we need to 0:21:57.320 --> 0:22:03.240 talk about: Andrey Markov, a mathematician. In the early twentieth century, 0:22:03.320 --> 0:22:07.159 he began studying the nature of certain random processes that 0:22:07.240 --> 0:22:10.040 follow a particular type of rule, which we now call 0:22:10.240 --> 0:22:15.400 the Markov property. That rule says that for this particular process, 0:22:15.440 --> 0:22:19.640 the next stage of the process only depends upon the 0:22:19.680 --> 0:22:23.960 current stage, but not any stages that came before then. 0:22:24.400 --> 0:22:28.480 So let's take my ridiculous T-shirt example and let's 0:22:28.480 --> 0:22:30.880 build it out a little bit further. Let's say that 0:22:31.000 --> 0:22:33.680 I've got three T-shirts to my name. One of 0:22:33.720 --> 0:22:36.320 them is that They Might Be Giants shirt. One is 0:22:36.359 --> 0:22:40.040 a plain blue T-shirt, and the third is a 0:22:40.119 --> 0:22:43.159 shirt that has the TechStuff logo on it. And 0:22:43.960 --> 0:22:48.879 it's based off of long observation that you've determined the 0:22:48.920 --> 0:22:53.040 following facts. If I am wearing that They Might Be 0:22:53.119 --> 0:22:57.639 Giants shirt today, I definitely will not wear it tomorrow. 0:22:58.040 --> 0:23:01.199 But there's a fifty-fifty shot I'll wear either the 0:23:01.200 --> 0:23:05.000 blue shirt or the TechStuff shirt. Now, if I'm 0:23:05.040 --> 0:23:09.040 wearing the blue shirt today, there's a ten percent chance I'm 0:23:09.040 --> 0:23:12.520 going to wear the same blue shirt tomorrow. Don't worry, 0:23:12.800 --> 0:23:16.840 I'll wash it first. There's a sixty percent chance that I'll 0:23:16.880 --> 0:23:19.560 wear the TechStuff shirt, and there's a thirty percent 0:23:19.640 --> 0:23:22.879 chance I'll wear the They Might Be Giants shirt. 
But 0:23:23.800 --> 0:23:26.439 if I'm wearing the TechStuff shirt today, there's a 0:23:26.440 --> 0:23:29.639 seventy percent chance I'll wear it again tomorrow because I like 0:23:29.720 --> 0:23:33.000 to promote myself. But there's a thirty percent chance I'll 0:23:33.000 --> 0:23:35.439 wear the They Might Be Giants shirt, and there is 0:23:35.520 --> 0:23:38.160 no chance that I'm going to wear the blue one 0:23:38.520 --> 0:23:42.760 in this case. So those are our various scenarios, right? 0:23:43.080 --> 0:23:47.800 Which shirt I will wear tomorrow depends only upon which 0:23:47.880 --> 0:23:51.359 shirt I am wearing today. What I wore yesterday has 0:23:51.400 --> 0:23:55.359 no bearing on the outcome for tomorrow. So today is 0:23:55.400 --> 0:23:59.119 all that matters. And depending on which shirt I wear, 0:23:59.560 --> 0:24:02.879 you can make some probability predictions for tomorrow. So we 0:24:02.920 --> 0:24:05.840 can actually use this approach to figure out the probability 0:24:05.920 --> 0:24:09.080 that I might wear the TechStuff shirt, say, ten 0:24:09.200 --> 0:24:12.359 days in a row, since there's a better than even 0:24:12.480 --> 0:24:16.000 chance that if I'm wearing TechStuff today, I'll end 0:24:16.080 --> 0:24:19.280 up wearing it again tomorrow, and if I wear it tomorrow, 0:24:19.480 --> 0:24:22.119 then there's a better than fifty percent chance that I'm going 0:24:22.160 --> 0:24:25.840 to wear it the following day. But at some point 0:24:25.960 --> 0:24:29.119 you're going to see that the odds are starting to 0:24:29.200 --> 0:24:33.600 be against you for, you know, increasingly long strings of 0:24:33.640 --> 0:24:37.240 wearing the TechStuff shirt. 
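The three-shirt example can be written down as a small table of transition probabilities, and the ten-days-in-a-row question falls out of it. The sketch below uses the percentages from the example. The dictionary layout, the short shirt labels, and the function names are my own choices for illustration, and it assumes that today counts as day one of the streak, so ten days in a row means nine repeat picks.

```python
# transitions[today][tomorrow] = chance of wearing `tomorrow` given `today`.
transitions = {
    "tmbg":      {"tmbg": 0.0, "blue": 0.5, "techstuff": 0.5},
    "blue":      {"tmbg": 0.3, "blue": 0.1, "techstuff": 0.6},
    "techstuff": {"tmbg": 0.3, "blue": 0.0, "techstuff": 0.7},
}


def streak_probability(shirt, days):
    """Chance of wearing `shirt` for `days` days in a row, given it is worn today.

    Only today's shirt matters for tomorrow (the Markov property), so the
    streak is the same one-step probability multiplied by itself.
    """
    return transitions[shirt][shirt] ** (days - 1)


def next_day(distribution):
    """Push a probability distribution over shirts forward by one day."""
    result = {shirt: 0.0 for shirt in transitions}
    for today, p_today in distribution.items():
        for tomorrow, p_move in transitions[today].items():
            result[tomorrow] += p_today * p_move
    return result


print(round(streak_probability("techstuff", 2), 3))   # 0.7, better than even
print(round(streak_probability("techstuff", 10), 4))  # about 0.04

# Starting from "definitely TechStuff today", look two days ahead.
today = {"tmbg": 0.0, "blue": 0.0, "techstuff": 1.0}
print(next_day(next_day(today)))
```

Each individual repeat is more likely than not, yet the ten-day streak comes out at roughly four percent, which is the "odds are starting to be against you" effect described above.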
Anyway, Markov chains would become 0:24:37.320 --> 0:24:40.159 one of the types of processes that machine learning models 0:24:40.200 --> 0:24:43.760 would incorporate, with some models looking at the current state 0:24:43.880 --> 0:24:46.879 of a given process and then making predictions on what 0:24:47.160 --> 0:24:50.679 the next state will be, with no need to look 0:24:50.800 --> 0:24:56.720 back at the previous decisions. The Markov chain is memoryless. 0:24:57.640 --> 0:25:00.960 Now that's just a couple of the mathematicians whose work 0:25:01.080 --> 0:25:05.399 underlies elements of machine learning. There's also structure we need 0:25:05.440 --> 0:25:09.800 to talk about. In 1949, a man named Donald Hebb wrote 0:25:09.800 --> 0:25:13.520 a book titled The Organization of Behavior, and in that book, 0:25:14.080 --> 0:25:18.560 Hebb gave a hypothesis on how neurons, that is, how 0:25:18.640 --> 0:25:22.840 brain cells, interact with one another. His ideas included the 0:25:22.840 --> 0:25:27.119 notion that if two neurons interact with one another regularly, 0:25:27.640 --> 0:25:31.000 that is, if one fires, the second one is 0:25:31.040 --> 0:25:35.280 also likely to fire, they end up forming a tighter 0:25:35.320 --> 0:25:40.399 communicative relationship with each other. Not long after his expression 0:25:40.400 --> 0:25:44.199 of this hypothesis, computer scientists began to think of a 0:25:44.200 --> 0:25:48.480 potential way to do this artificially, with machines creating the 0:25:48.560 --> 0:25:54.440 equivalent of artificial neurons. The relative strength of the relationship between 0:25:54.720 --> 0:25:59.560 artificial neurons is something we describe by weight. That's going 0:25:59.600 --> 0:26:02.919 to be an important part of machine learning. Weight, by 0:26:02.920 --> 0:26:06.120 the way, is W-E-I-G-H-T, as 0:26:06.160 --> 0:26:11.439 in this relationship is weighted more heavily than that relationship. 
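Hebb's idea, often summarized as "neurons that fire together wire together," translates into a very small update rule for a weight between two artificial neurons. The Python sketch below is a deliberately simplified textbook form, assumed here for illustration: activity is just 0 or 1, the learning rate of 0.1 is arbitrary, and the weight only ever grows. It is not taken from Hebb's book or from any specific system.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the connection when both neurons are active together.

    pre  -- activity of the sending neuron (1 = fired, 0 = silent)
    post -- activity of the receiving neuron (1 = fired, 0 = silent)
    """
    return weight + learning_rate * pre * post


# A made-up activity history for two neurons: (pre, post) at each moment.
history = [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]

weight = 0.0
for pre, post in history:
    weight = hebbian_update(weight, pre, post)

print(round(weight, 2))  # only the three moments of joint firing added weight
```

The weight ends up reflecting how often the two neurons were active at the same moment, which is the sense in which one relationship can be weighted more heavily than another.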
0:26:12.200 --> 0:26:16.080 In the early nineteen fifties, an IBM researcher named Arthur 0:26:16.280 --> 0:26:19.919 Samuel created a program designed to win at checkers. The 0:26:19.960 --> 0:26:22.920 program would do a quick analysis of where pieces were 0:26:23.160 --> 0:26:27.120