WEBVTT - Machine Learning 101

0:00:04.400 --> 0:00:07.800
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio.

0:00:12.119 --> 0:00:15.440
<v Speaker 1>Hey there, and welcome to TechStuff. This is your host,

0:00:15.640 --> 0:00:19.120
<v Speaker 1>Jonathan Strickland. I'm an executive producer with iHeartRadio

0:00:19.160 --> 0:00:22.360
<v Speaker 1>and I love all things tech. You know, folks, back

0:00:22.400 --> 0:00:28.000
<v Speaker 1>in nineteen eighty-six, a comedy science fiction film that I

0:00:28.040 --> 0:00:32.400
<v Speaker 1>saw in the theater about a robot that gains sentience

0:00:32.440 --> 0:00:35.640
<v Speaker 1>and becomes a total goofball, if you will. It hit

0:00:35.720 --> 0:00:39.199
<v Speaker 1>theaters in eighty-six and it was called Short Circuit.

0:00:39.880 --> 0:00:44.200
<v Speaker 1>The movie starred Steve Guttenberg, Ally Sheedy, and lamentably a

0:00:44.240 --> 0:00:48.080
<v Speaker 1>white actor named Fisher Stevens playing a non white character,

0:00:48.720 --> 0:00:52.640
<v Speaker 1>someone who is Indian. I should add that's not Stevens's fault.

0:00:52.880 --> 0:00:55.760
<v Speaker 1>I mean, he auditioned to be in a movie and

0:00:55.840 --> 0:00:58.280
<v Speaker 1>he got a gig. He didn't cast himself in the film,

0:00:58.320 --> 0:01:01.800
<v Speaker 1>and he has since talked about his experiences realizing the

0:01:01.840 --> 0:01:04.280
<v Speaker 1>problems with a white man playing a non white character,

0:01:04.319 --> 0:01:08.880
<v Speaker 1>but setting aside all the problematic whitewashing, the movie showed

0:01:09.000 --> 0:01:11.920
<v Speaker 1>this robot, who in the course of the film names

0:01:11.959 --> 0:01:17.000
<v Speaker 1>itself Johnny Five, learning. It learns about the world around it,

0:01:17.120 --> 0:01:20.840
<v Speaker 1>it learns about people, it learns about human concepts like

0:01:21.000 --> 0:01:25.720
<v Speaker 1>humor and emotion, and the general idea was pretty cute.

0:01:26.800 --> 0:01:31.119
<v Speaker 1>Now the nifty thing is machines actually can learn. In fact,

0:01:31.200 --> 0:01:35.120
<v Speaker 1>machine learning is a really important field of study these days,

0:01:35.480 --> 0:01:38.920
<v Speaker 1>complete with its own challenges and risks. I've talked about

0:01:39.000 --> 0:01:41.399
<v Speaker 1>machine learning a few times in the past, but I

0:01:41.400 --> 0:01:44.240
<v Speaker 1>figured we could do a deeper dive to understand what

0:01:44.400 --> 0:01:48.120
<v Speaker 1>machine learning is, what it isn't, how people are leveraging

0:01:48.160 --> 0:01:51.880
<v Speaker 1>machine learning, and why I said that it does come

0:01:51.920 --> 0:01:58.280
<v Speaker 1>with risks. So let's learn about machines learning. It will

0:01:58.320 --> 0:02:02.480
<v Speaker 1>be impossible to talk about machine learning without also talking

0:02:02.520 --> 0:02:08.079
<v Speaker 1>about artificial intelligence or AI. And this term artificial intelligence

0:02:08.280 --> 0:02:12.880
<v Speaker 1>is a real doozy. It trips people up, even people

0:02:12.960 --> 0:02:17.880
<v Speaker 1>who have dedicated their lives to researching and developing artificial intelligence.

0:02:18.200 --> 0:02:22.679
<v Speaker 1>You can get two experts in AI talking about AI

0:02:22.800 --> 0:02:25.600
<v Speaker 1>and find out that because they have slightly different takes

0:02:25.639 --> 0:02:31.320
<v Speaker 1>on what AI is, there are some communication issues. It's

0:02:31.320 --> 0:02:34.200
<v Speaker 1>not as simple as Red vs. Blue would have you think:

0:02:34.840 --> 0:02:40.440
<v Speaker 1>"What does the A stand for?" So when you really

0:02:40.880 --> 0:02:43.200
<v Speaker 1>boil it down, it comes as no big

0:02:43.200 --> 0:02:46.239
<v Speaker 1>surprise that there's a lot of ambiguity here. After all,

0:02:46.600 --> 0:02:51.639
<v Speaker 1>how would you define intelligence just intelligence, not artificial intelligence

0:02:52.000 --> 0:02:56.640
<v Speaker 1>just intelligence? Well, would it be the ability to learn,

0:02:57.000 --> 0:03:01.240
<v Speaker 1>that is, to acquire skills and knowledge? Or is it

0:03:01.320 --> 0:03:04.680
<v Speaker 1>the application of learning? Is it problem solving? Is it

0:03:05.160 --> 0:03:08.440
<v Speaker 1>being able to think ahead and make plans in order

0:03:08.480 --> 0:03:12.720
<v Speaker 1>to achieve a specific goal? Is it the ability to

0:03:12.960 --> 0:03:16.560
<v Speaker 1>examine a problem and deconstruct it in order to figure out

0:03:16.600 --> 0:03:19.600
<v Speaker 1>the best solution, a more specific version of problem solving?

0:03:20.240 --> 0:03:25.560
<v Speaker 1>Is it the ability to recognize, understand, and navigate emotional scenarios? Now,

0:03:25.680 --> 0:03:30.920
<v Speaker 1>arguably it's all of these things and more. We all

0:03:31.000 --> 0:03:35.400
<v Speaker 1>have kind of an intuitive grasp on what intelligence is,

0:03:36.280 --> 0:03:40.960
<v Speaker 1>but defining it in a simple way tends to feel

0:03:41.000 --> 0:03:44.080
<v Speaker 1>reductive and it leaves out a lot of important details.

0:03:44.480 --> 0:03:50.160
<v Speaker 1>So if defining just general intelligence is hard, it stands

0:03:50.200 --> 0:03:55.000
<v Speaker 1>to reason that defining artificial intelligence is also a tough job. Heck,

0:03:55.320 --> 0:03:58.640
<v Speaker 1>even coming up with a number of different types of

0:03:58.680 --> 0:04:02.720
<v Speaker 1>AI is tricky. And if you don't believe me, just

0:04:02.920 --> 0:04:08.920
<v Speaker 1>google the phrase different types of artificial intelligence. Never mind,

0:04:08.960 --> 0:04:10.680
<v Speaker 1>you don't. You don't really actually have to do that.

0:04:10.720 --> 0:04:13.800
<v Speaker 1>I already did it, though feel free to do it

0:04:13.840 --> 0:04:16.839
<v Speaker 1>yourself and check my work if you like. When I

0:04:17.080 --> 0:04:20.640
<v Speaker 1>googled that phrase, different types of AI, some of the

0:04:20.680 --> 0:04:24.960
<v Speaker 1>top results included a blog post on BMC Software titled

0:04:25.240 --> 0:04:28.840
<v Speaker 1>four types of Artificial Intelligence. But then there was also

0:04:28.880 --> 0:04:31.840
<v Speaker 1>an article on Codebots that was titled what are

0:04:31.880 --> 0:04:34.960
<v Speaker 1>the three types of AI? And then there was an

0:04:35.040 --> 0:04:40.080
<v Speaker 1>article from Forbes titled seven types of Artificial Intelligence. See,

0:04:40.279 --> 0:04:43.320
<v Speaker 1>we can't even agree on how many versions of AI

0:04:43.480 --> 0:04:48.560
<v Speaker 1>there are, because defining AI is really hard. It

0:04:48.680 --> 0:04:52.080
<v Speaker 1>largely depends upon how you view AI and then how

0:04:52.080 --> 0:04:56.039
<v Speaker 1>you break it down into different realms of intelligence. Now

0:04:56.080 --> 0:04:59.839
<v Speaker 1>we could go super high level because a classic way

0:04:59.839 --> 0:05:04.960
<v Speaker 1>to look at AI is strong versus weak artificial intelligence.

0:05:06.560 --> 0:05:12.440
<v Speaker 1>Strong AI, sometimes called artificial general intelligence, would be

0:05:12.600 --> 0:05:17.640
<v Speaker 1>a machine that processes information and at least appears to

0:05:17.839 --> 0:05:21.599
<v Speaker 1>have some form of consciousness and self awareness and the

0:05:21.640 --> 0:05:26.000
<v Speaker 1>ability to both have experiences and to be aware that

0:05:26.120 --> 0:05:30.239
<v Speaker 1>it is having experiences. It might even feel emotion, though

0:05:30.880 --> 0:05:34.880
<v Speaker 1>maybe not emotions that we could easily identify or sympathize with.

0:05:35.520 --> 0:05:38.839
<v Speaker 1>So this would be the kind of machine that would

0:05:39.080 --> 0:05:42.560
<v Speaker 1>think in a way similar to humans. It would be

0:05:42.600 --> 0:05:45.960
<v Speaker 1>able to sense its environment and not just react, but

0:05:46.120 --> 0:05:49.800
<v Speaker 1>really process what is going on and build an understanding.

0:05:50.000 --> 0:05:51.880
<v Speaker 1>It's the type of AI that we see a lot

0:05:52.160 --> 0:05:55.239
<v Speaker 1>in science fiction. That's the type of AI of Johnny

0:05:55.320 --> 0:05:59.240
<v Speaker 1>Five from Short Circuit, or HAL from two thousand one,

0:05:59.360 --> 0:06:02.440
<v Speaker 1>or the droids in Star Wars. It's also a

0:06:02.440 --> 0:06:06.560
<v Speaker 1>type of artificial intelligence that we have yet to actually

0:06:06.600 --> 0:06:11.479
<v Speaker 1>achieve in the real world. So then what is weak AI? Well,

0:06:12.680 --> 0:06:16.120
<v Speaker 1>you could say it's everything else, or you could say

0:06:16.120 --> 0:06:21.080
<v Speaker 1>it's the building blocks that maybe collectively will lead to

0:06:21.200 --> 0:06:26.200
<v Speaker 1>strong AI. Weak AI involves processes that allow machines to

0:06:26.279 --> 0:06:31.360
<v Speaker 1>complete tasks. So, for example, image recognition software could fall

0:06:31.400 --> 0:06:34.680
<v Speaker 1>into this category. Once upon a time, in order to

0:06:34.760 --> 0:06:39.560
<v Speaker 1>search photos effectively, you needed to actually add metadata

0:06:39.839 --> 0:06:45.039
<v Speaker 1>like tags to those photos. So, for example, I might

0:06:45.400 --> 0:06:49.240
<v Speaker 1>tag pictures of my dog with the meta tag dog,

0:06:50.040 --> 0:06:52.719
<v Speaker 1>and then if I wanted to see photos of my pooch,

0:06:53.320 --> 0:06:55.440
<v Speaker 1>then I would pull up my photo app and search

0:06:55.760 --> 0:06:58.840
<v Speaker 1>the term dog and all the photos that I had

0:06:58.880 --> 0:07:01.280
<v Speaker 1>tagged with the word dog would show up. But if

0:07:01.320 --> 0:07:04.679
<v Speaker 1>I had failed to tag some pictures of my dog,

0:07:05.279 --> 0:07:07.839
<v Speaker 1>those pictures wouldn't pop up in search because the computer

0:07:07.839 --> 0:07:11.240
<v Speaker 1>program wasn't actually looking for dogs in my photos, it

0:07:11.360 --> 0:07:13.960
<v Speaker 1>was just looking for photos that had that particular meta

0:07:14.000 --> 0:07:18.040
<v Speaker 1>tag attached to it. But now we've reached a point

0:07:18.240 --> 0:07:21.400
<v Speaker 1>where at least some photo apps are using image recognition

0:07:21.480 --> 0:07:25.240
<v Speaker 1>to analyze photos, and these will return results that the

0:07:25.280 --> 0:07:28.679
<v Speaker 1>algorithm has identified as having a reasonable chance of meeting

0:07:28.880 --> 0:07:31.720
<v Speaker 1>your search query. So if I used an app like

0:07:31.760 --> 0:07:35.480
<v Speaker 1>that and I put in dog as my search term,

0:07:35.520 --> 0:07:38.239
<v Speaker 1>it could pull up photos that had no meta tags

0:07:38.240 --> 0:07:41.160
<v Speaker 1>attached to them at all, because the search is relying

0:07:41.200 --> 0:07:44.640
<v Speaker 1>on image recognition. Now, this also means that if the

0:07:44.680 --> 0:07:48.120
<v Speaker 1>image recognition algorithm isn't very good, I could get some

0:07:48.200 --> 0:07:50.360
<v Speaker 1>images that don't have a dog in them at all,

0:07:50.880 --> 0:07:54.040
<v Speaker 1>or it might miss other images that have my dog

0:07:54.120 --> 0:07:56.760
<v Speaker 1>in them. But my point is that the ability to

0:07:56.840 --> 0:07:59.760
<v Speaker 1>identify whether or not a dog is in a particular

0:08:00.080 --> 0:08:05.760
<v Speaker 1>photo represents a kind of weak artificial intelligence. You wouldn't

0:08:05.800 --> 0:08:10.640
<v Speaker 1>say that the photo search tool possesses humanlike intelligence, because

0:08:10.680 --> 0:08:14.200
<v Speaker 1>really it only does one thing. It analyzes photos and

0:08:14.240 --> 0:08:17.840
<v Speaker 1>looks for matches to specific search queries, but it can't

0:08:17.920 --> 0:08:21.440
<v Speaker 1>do anything outside of that use case. However, that's just

0:08:21.560 --> 0:08:24.520
<v Speaker 1>one little example. There are all sorts of other ones,

0:08:24.560 --> 0:08:30.480
<v Speaker 1>like voice recognition, environmental sensing, course plotting, that kind of thing,

0:08:30.720 --> 0:08:33.880
<v Speaker 1>and in some circles, as we get better at making machines

0:08:33.920 --> 0:08:39.040
<v Speaker 1>and systems that can do these things, those elements seem

0:08:39.080 --> 0:08:42.360
<v Speaker 1>to kind of drift away from the ongoing conversation about

0:08:42.440 --> 0:08:46.199
<v Speaker 1>artificial intelligence. A guy named Larry Tesler, who was a

0:08:46.240 --> 0:08:49.160
<v Speaker 1>computer scientist who worked at lots of really important places

0:08:49.240 --> 0:08:54.520
<v Speaker 1>like Xerox PARC and Amazon and Apple, once observed,

0:08:54.640 --> 0:08:59.920
<v Speaker 1>quote intelligence is whatever machines haven't done yet, end quote.

0:09:00.440 --> 0:09:03.480
<v Speaker 1>So his point was that the reason that AI is

0:09:03.559 --> 0:09:06.120
<v Speaker 1>really hard to talk about is that the goalpost

0:09:06.320 --> 0:09:12.920
<v Speaker 1>for what actually is artificial intelligence is constantly moving. Now,

0:09:12.920 --> 0:09:16.719
<v Speaker 1>this pretty much mirrors how we think about things like consciousness.

0:09:17.120 --> 0:09:20.640
<v Speaker 1>Lots of people study consciousness, and the general sense I

0:09:20.679 --> 0:09:23.240
<v Speaker 1>get is that it's a lot easier for people to

0:09:23.280 --> 0:09:29.080
<v Speaker 1>talk about what isn't consciousness rather than what consciousness actually is.

0:09:29.760 --> 0:09:33.480
<v Speaker 1>And it seems like artificial intelligence is in a similar place,

0:09:33.559 --> 0:09:36.520
<v Speaker 1>which really isn't that big of a surprise as we

0:09:36.679 --> 0:09:41.200
<v Speaker 1>closely associate intelligence with consciousness. Now this leads us to

0:09:41.720 --> 0:09:45.160
<v Speaker 1>why there are so many different takes on how many

0:09:45.200 --> 0:09:48.680
<v Speaker 1>types of AI there are. It all depends on how

0:09:48.800 --> 0:09:53.199
<v Speaker 1>you classify different disciplines in artificial intelligence, and over time,

0:09:53.679 --> 0:09:57.679
<v Speaker 1>a lot of disciplines that were previously distinct from AI

0:09:57.800 --> 0:10:01.599
<v Speaker 1>have sort of converged into becoming part of the AI discussion.

0:10:01.880 --> 0:10:04.840
<v Speaker 1>Machine learning, as it turns out, was part of the

0:10:04.920 --> 0:10:09.520
<v Speaker 1>AI discussion, branched off from it, and then rejoined the

0:10:09.559 --> 0:10:12.920
<v Speaker 1>AI discussion years later. So I am not going to

0:10:12.960 --> 0:10:16.240
<v Speaker 1>go down all the different approaches to classification because I

0:10:16.280 --> 0:10:18.719
<v Speaker 1>don't know that they would be that valuable to us.

0:10:19.200 --> 0:10:21.120
<v Speaker 1>They would really just illustrate that there are a lot

0:10:21.160 --> 0:10:26.280
<v Speaker 1>of different ways to look at the subject. So if

0:10:26.360 --> 0:10:30.559
<v Speaker 1>you ever find yourself in a conversation about AI, it

0:10:30.640 --> 0:10:33.720
<v Speaker 1>might be a good idea to set a few ground

0:10:33.840 --> 0:10:37.440
<v Speaker 1>rules as to what everyone means when they use the

0:10:37.520 --> 0:10:42.760
<v Speaker 1>term artificial intelligence. That can help with expectations and understanding.

0:10:43.320 --> 0:10:46.200
<v Speaker 1>Or you could just run for the nearest exit, which

0:10:46.240 --> 0:10:49.920
<v Speaker 1>is what people tend to do whenever I start talking

0:10:49.960 --> 0:10:56.040
<v Speaker 1>about it. Anyway, what about machine learning? Well, from one perspective,

0:10:56.280 --> 0:10:59.199
<v Speaker 1>you could say machine learning is a subdiscipline of

0:10:59.280 --> 0:11:03.080
<v Speaker 1>artificial intelligence, although like I said, it hasn't always

0:11:03.120 --> 0:11:07.080
<v Speaker 1>been viewed as such. I think most people would say

0:11:07.080 --> 0:11:11.000
<v Speaker 1>that the ability to learn, that is, to take information

0:11:11.160 --> 0:11:15.280
<v Speaker 1>and experience and then have some form of understanding of

0:11:15.320 --> 0:11:19.080
<v Speaker 1>those things so that you can apply that to future tasks,

0:11:19.240 --> 0:11:23.160
<v Speaker 1>potentially getting better over time, I would say most people

0:11:23.200 --> 0:11:26.720
<v Speaker 1>would call that part of intelligence, but you could also

0:11:26.760 --> 0:11:29.240
<v Speaker 1>be a bit more wishy washy and say it's related to,

0:11:29.880 --> 0:11:33.520
<v Speaker 1>you know, artificial intelligence, as opposed to being part of AI.

0:11:33.640 --> 0:11:37.839
<v Speaker 1>Since the definition of AI is, let's say, fluid, either

0:11:37.920 --> 0:11:41.520
<v Speaker 1>way of classifying machine learning works, as far as I'm concerned.

0:11:42.600 --> 0:11:46.160
<v Speaker 1>Machine learning boils down to the idea of creating a

0:11:46.200 --> 0:11:50.120
<v Speaker 1>system that can learn as it performs a task. It

0:11:50.160 --> 0:11:54.679
<v Speaker 1>can learn what works and more importantly, what does not work.

0:11:55.200 --> 0:11:57.440
<v Speaker 1>You may have heard that we learn a lot more

0:11:57.520 --> 0:12:01.200
<v Speaker 1>from our mistakes than we do from our successes, which

0:12:01.880 --> 0:12:05.320
<v Speaker 1>is pretty much true in my experience. When something goes wrong,

0:12:05.880 --> 0:12:11.280
<v Speaker 1>it's usually, but not always, possible to trace the event

0:12:11.480 --> 0:12:14.760
<v Speaker 1>or events that led to the failure. You can identify

0:12:14.840 --> 0:12:19.120
<v Speaker 1>decisions that were probably the wrong ones or that led

0:12:19.200 --> 0:12:22.679
<v Speaker 1>to a bad outcome, But if you have a success,

0:12:23.080 --> 0:12:27.160
<v Speaker 1>it's hard to figure out which decisions were key to

0:12:27.280 --> 0:12:30.960
<v Speaker 1>that successful outcome. Did your decision at step two set

0:12:31.000 --> 0:12:33.560
<v Speaker 1>you on the right path, or was your choice at

0:12:33.559 --> 0:12:36.920
<v Speaker 1>step three so good that it helped correct a mistake

0:12:37.160 --> 0:12:39.920
<v Speaker 1>that you made at step two? But a good approach

0:12:39.960 --> 0:12:43.480
<v Speaker 1>to machine learning involves a system that can adjust things

0:12:43.520 --> 0:12:47.160
<v Speaker 1>on its own to reduce mistakes and increase the success rate.

0:12:47.520 --> 0:12:50.040
<v Speaker 1>And another way of putting it is that instead of

0:12:50.080 --> 0:12:53.720
<v Speaker 1>programming a system to arrive at a specific outcome, you

0:12:53.800 --> 0:12:57.160
<v Speaker 1>are training the system to learn how to do it

0:12:57.240 --> 0:13:00.520
<v Speaker 1>by itself. And that sounds a bit magical when you

0:13:00.559 --> 0:13:03.760
<v Speaker 1>put it that way, doesn't it? It sounds like someone

0:13:03.840 --> 0:13:06.880
<v Speaker 1>just took a computer and showed it pictures of cats

0:13:07.080 --> 0:13:09.640
<v Speaker 1>and then expected the computer to know what a cat was.

0:13:10.440 --> 0:13:13.840
<v Speaker 1>And this actually does mirror an actual project that really

0:13:14.240 --> 0:13:17.880
<v Speaker 1>did do that, But I'm leaving out some big important

0:13:17.880 --> 0:13:22.200
<v Speaker 1>information in the middle. Now, one big step is that

0:13:22.240 --> 0:13:26.520
<v Speaker 1>computers and machines can't just magically learn by default. People

0:13:26.600 --> 0:13:29.840
<v Speaker 1>first had to come up with a methodology that allows

0:13:29.920 --> 0:13:32.560
<v Speaker 1>machines to go through the process of completing a task,

0:13:33.200 --> 0:13:36.960
<v Speaker 1>then making adjustments to the process of doing that task,

0:13:37.360 --> 0:13:40.880
<v Speaker 1>which would then improve future results. We have to lay

0:13:40.880 --> 0:13:45.440
<v Speaker 1>the groundwork in architecture and theory and algorithms. We have

0:13:45.520 --> 0:13:49.600
<v Speaker 1>to build the logical pathways that computers can follow in

0:13:49.720 --> 0:13:52.720
<v Speaker 1>order for them to learn. A lot of machine learning

0:13:53.120 --> 0:13:57.360
<v Speaker 1>revolves around patterns and pattern recognition. So what do I

0:13:57.400 --> 0:14:01.400
<v Speaker 1>mean by patterns? Well, I mean some form of regularity

0:14:01.480 --> 0:14:06.800
<v Speaker 1>and predictability. Machine learning models analyze patterns and attempt to

0:14:06.880 --> 0:14:11.640
<v Speaker 1>draw conclusions based on those patterns. This in itself is

0:14:11.640 --> 0:14:15.720
<v Speaker 1>tricky stuff. So why is that, Well, it's because sometimes

0:14:15.960 --> 0:14:19.960
<v Speaker 1>we might think there's a pattern, when in reality there

0:14:20.080 --> 0:14:25.480
<v Speaker 1>is not. We humans are pretty good at recognizing patterns,

0:14:25.680 --> 0:14:29.480
<v Speaker 1>which makes sense. It's a survival mechanism. If you were

0:14:29.520 --> 0:14:33.160
<v Speaker 1>to look at tall grass and you see patterns that

0:14:33.240 --> 0:14:37.320
<v Speaker 1>suggest the presence of a predator like a tiger, well

0:14:37.440 --> 0:14:40.520
<v Speaker 1>you would know that danger is nearby, and you would

0:14:40.520 --> 0:14:43.560
<v Speaker 1>have the opportunity to do something about that to help

0:14:43.600 --> 0:14:48.960
<v Speaker 1>your chances of survival. If, however, you remained blissfully unaware

0:14:49.080 --> 0:14:51.960
<v Speaker 1>of the danger, you'd be far more likely to fall

0:14:52.000 --> 0:14:55.920
<v Speaker 1>prey to that hungry tiger. So recognizing patterns is one

0:14:55.920 --> 0:14:58.760
<v Speaker 1>of the abilities that gave humans a chance to live

0:14:58.800 --> 0:15:02.440
<v Speaker 1>another day, and, from an evolutionary standpoint, a chance to

0:15:02.800 --> 0:15:07.680
<v Speaker 1>make more humans. But sometimes we humans will perceive a

0:15:07.720 --> 0:15:12.920
<v Speaker 1>pattern where none actually exists. A simple example of this

0:15:13.080 --> 0:15:16.960
<v Speaker 1>is the fun exercise of laying on your back outside,

0:15:17.360 --> 0:15:20.200
<v Speaker 1>looking up at the clouds and saying, what does that

0:15:20.240 --> 0:15:23.960
<v Speaker 1>cloud remind you of? The shapes of clouds, which have

0:15:24.560 --> 0:15:28.480
<v Speaker 1>no significance and are the product of environmental factors, can

0:15:28.600 --> 0:15:32.600
<v Speaker 1>seem to suggest patterns to us. We might see a dog,

0:15:32.840 --> 0:15:36.120
<v Speaker 1>or a car or a face, but we know that

0:15:36.280 --> 0:15:40.360
<v Speaker 1>what we're really seeing is just the appearance of a pattern,

0:15:40.440 --> 0:15:43.360
<v Speaker 1>it's not evidence of a pattern actually being there.

0:15:43.400 --> 0:15:50.040
<v Speaker 1>It's noise, not signal, but it could be misinterpreted as signal. Well,

0:15:50.080 --> 0:15:53.000
<v Speaker 1>it turns out that in machine learning applications this is

0:15:53.080 --> 0:15:55.520
<v Speaker 1>also an issue. I'll talk about it more towards the

0:15:55.600 --> 0:15:59.800
<v Speaker 1>end of this episode. Computers can sometimes misinterpret data and

0:16:00.080 --> 0:16:04.000
<v Speaker 1>determine something represents a pattern when it really doesn't. When

0:16:04.040 --> 0:16:07.000
<v Speaker 1>that happens, a system relying on machine learning can produce

0:16:07.080 --> 0:16:11.480
<v Speaker 1>false positives, and the consequences can sometimes be funny, like hey,

0:16:11.520 --> 0:16:14.320
<v Speaker 1>this image recognition software thinks this coffee mug is actually

0:16:14.360 --> 0:16:17.320
<v Speaker 1>a kitty cat, or they can be really serious and

0:16:17.360 --> 0:16:22.440
<v Speaker 1>potentially harmful. Hey, this facial recognition software has misidentified a person,

0:16:22.720 --> 0:16:25.640
<v Speaker 1>marking them as, say, a person of interest in a

0:16:25.680 --> 0:16:29.080
<v Speaker 1>criminal case. And it's all because this facial recognition software

0:16:29.120 --> 0:16:32.560
<v Speaker 1>isn't very good at differentiating people of color. That's a

0:16:32.680 --> 0:16:36.520
<v Speaker 1>real problem that really happens. Now, when we come back

0:16:36.800 --> 0:16:40.400
<v Speaker 1>I'll give a little overview of the evolution of machine learning,

0:16:40.880 --> 0:16:44.200
<v Speaker 1>but before we do that, let's take a quick break.

0:16:51.840 --> 0:16:55.320
<v Speaker 1>To talk about the history of machine learning, we first

0:16:55.360 --> 0:16:59.120
<v Speaker 1>have to look back much much earlier, long before the

0:16:59.160 --> 0:17:02.880
<v Speaker 1>era of computers, and talk about how thinkers like Thomas

0:17:02.960 --> 0:17:07.600
<v Speaker 1>Bayes thought about the act of problem solving. Bayes was

0:17:07.680 --> 0:17:11.240
<v Speaker 1>born way back in seventeen oh two, so quite a bit before

0:17:11.280 --> 0:17:14.480
<v Speaker 1>we were thinking about machine learning, but he was interested

0:17:14.600 --> 0:17:19.560
<v Speaker 1>in problem solving for problems involving probabilities, and specifically the

0:17:19.600 --> 0:17:24.000
<v Speaker 1>relationship between different probabilities. I think it's easier to talk

0:17:24.040 --> 0:17:27.520
<v Speaker 1>about if I give you an example. So let's make

0:17:27.560 --> 0:17:30.320
<v Speaker 1>a silly one, all right, So let's say we got

0:17:30.320 --> 0:17:35.440
<v Speaker 1>ourselves a plucky podcaster. Hey there, everybody, It's Jonathan Strickland,

0:17:36.080 --> 0:17:39.600
<v Speaker 1>and it's Tuesday as I record this, And because of

0:17:39.760 --> 0:17:43.199
<v Speaker 1>who I am, you know who this podcaster is. And

0:17:43.280 --> 0:17:47.480
<v Speaker 1>because it's Tuesday, there is a forty percent chance I am wearing

0:17:47.640 --> 0:17:51.159
<v Speaker 1>a They Might Be Giants T-shirt. And we also

0:17:51.240 --> 0:17:55.760
<v Speaker 1>know that if this podcaster is wearing a They Might

0:17:55.800 --> 0:17:59.879
<v Speaker 1>Be Giants T-shirt on a Tuesday, there's a sixty

0:18:00.119 --> 0:18:03.440
<v Speaker 1>percent chance that I'm going to end up wearing pajamas

0:18:03.520 --> 0:18:06.960
<v Speaker 1>on Wednesday. But we also know that if I did

0:18:07.080 --> 0:18:11.280
<v Speaker 1>not wear a They Might Be Giants shirt on Tuesday, and

0:18:11.400 --> 0:18:15.280
<v Speaker 1>remember there's a sixty percent chance I didn't, then we know

0:18:15.400 --> 0:18:17.920
<v Speaker 1>there's an eighty percent chance I'm going to be wearing

0:18:17.960 --> 0:18:22.240
<v Speaker 1>pajamas on Wednesday. Well, Bayes worked out a way that

0:18:22.320 --> 0:18:28.040
<v Speaker 1>described this sort of probability relationship between different discrete events,

0:18:28.200 --> 0:18:32.000
<v Speaker 1>and using his reasoning, you can work forward or backward

0:18:32.000 --> 0:18:35.959
<v Speaker 1>based on probabilities. Bayes would describe wearing a They Might

0:18:36.000 --> 0:18:39.320
<v Speaker 1>Be Giants shirt on Tuesday as one event and wearing

0:18:39.320 --> 0:18:43.600
<v Speaker 1>pajamas on Wednesday as a separate event, and then describe

0:18:43.640 --> 0:18:46.400
<v Speaker 1>the two not only determining how likely it is I'll

0:18:46.400 --> 0:18:50.720
<v Speaker 1>wear pajamas on Wednesday, but if we start with the

0:18:50.920 --> 0:18:53.320
<v Speaker 1>later event. In other words, if we start with the

0:18:53.359 --> 0:18:57.240
<v Speaker 1>fact that it's Wednesday and I'm wearing pajamas, we could

0:18:57.240 --> 0:19:02.120
<v Speaker 1>work out how likely it was that yesterday, on Tuesday,

0:19:02.200 --> 0:19:05.439
<v Speaker 1>I was wearing a They Might Be Giants shirt. That was

0:19:05.560 --> 0:19:08.000
<v Speaker 1>his contribution, that you can work this in either

0:19:08.119 --> 0:19:11.679
<v Speaker 1>direction if you know these different variables. Now, Bayes

0:19:11.760 --> 0:19:15.240
<v Speaker 1>never published his thoughts, but rather sent an essay explaining

0:19:15.280 --> 0:19:18.040
<v Speaker 1>it to a friend of his, who then made sure

0:19:18.080 --> 0:19:20.840
<v Speaker 1>that the work was published after Bayes had passed away,

0:19:20.880 --> 0:19:25.040
<v Speaker 1>and a few decades later Pierre-Simon Laplace would take

0:19:25.119 --> 0:19:27.560
<v Speaker 1>this work that Bayes had done and flesh it out

0:19:27.600 --> 0:19:32.280
<v Speaker 1>into an actual formal theorem. It's an important example of

0:19:32.320 --> 0:19:36.840
<v Speaker 1>conditional probability, and a lot of what machine learning is

0:19:37.640 --> 0:19:42.800
<v Speaker 1>really boiled down to is dealing with different probabilities, not certainties, which,

0:19:42.800 --> 0:19:44.119
<v Speaker 1>when you get down to it, is what most of

0:19:44.200 --> 0:19:46.120
<v Speaker 1>us are doing most of the time, right? We make

0:19:46.160 --> 0:19:51.480
<v Speaker 1>decisions based on at least perceived probabilities. Sometimes these decisions

0:19:51.520 --> 0:19:54.960
<v Speaker 1>might feel like they're a coin flip situation, that any

0:19:55.040 --> 0:19:58.399
<v Speaker 1>choice is equally likely to precipitate a good outcome or

0:19:58.440 --> 0:20:01.399
<v Speaker 1>a bad outcome. Other times we might make a choice

0:20:01.400 --> 0:20:04.960
<v Speaker 1>because we feel the probabilities are stacked favorably one way

0:20:05.080 --> 0:20:08.840
<v Speaker 1>over another. Sometimes we will make a choice to back

0:20:08.960 --> 0:20:13.679
<v Speaker 1>the least probable outcome because well, humans are not always

0:20:13.720 --> 0:20:17.399
<v Speaker 1>super rational, and heck, sometimes the long shot does pay off,

0:20:17.560 --> 0:20:22.560
<v Speaker 1>so that keeps Vegas in business. Bayes' theorem is just

0:20:22.680 --> 0:20:26.000
<v Speaker 1>one example of ways that mathematicians and philosophers figured out

0:20:26.040 --> 0:20:31.280
<v Speaker 1>ways to mathematically express problem solving and decision making, and

0:20:31.320 --> 0:20:33.440
<v Speaker 1>a lot of this was figuring out if there were

0:20:33.520 --> 0:20:36.119
<v Speaker 1>a way to boil down things that most of us

0:20:36.119 --> 0:20:40.280
<v Speaker 1>approached through intuition and experience. So it's kind of neat,

0:20:40.880 --> 0:20:43.480
<v Speaker 1>and also the more you look into it, the more

0:20:43.560 --> 0:20:46.240
<v Speaker 1>likely you might find it's a little spooky, because it's

0:20:46.240 --> 0:20:49.639
<v Speaker 1>weird to consider that our approaches to making choices and

0:20:49.720 --> 0:20:55.399
<v Speaker 1>solving problems can be reduced down to mathematical expressions. But

0:20:56.440 --> 0:21:00.359
<v Speaker 1>let's leave the potential existential crises alone for now, shall we?

0:21:00.480 --> 0:21:03.920
<v Speaker 1>So moving on, we have another smarty pants we need

0:21:03.960 --> 0:21:08.479
<v Speaker 1>to talk about: Andrey Markov, a Russian mathematician. In the

0:21:08.560 --> 0:21:12.120
<v Speaker 1>early twentieth century. He began studying the nature of certain

0:21:12.240 --> 0:21:16.160
<v Speaker 1>random processes that follow a particular type of rule, which

0:21:16.160 --> 0:21:20.000
<v Speaker 1>we now call the Markov property. That rule says that

0:21:20.400 --> 0:21:24.600
<v Speaker 1>for this particular process, the next stage of the process

0:21:24.720 --> 0:21:29.120
<v Speaker 1>only depends upon the current stage, but not any stages

0:21:29.160 --> 0:21:33.520
<v Speaker 1>that came before then. So let's take my ridiculous T

0:21:33.720 --> 0:21:36.600
<v Speaker 1>shirt example, and let's build it out a little bit further.

0:21:37.000 --> 0:21:39.800
<v Speaker 1>Let's say that I've got three T shirts to my name.

0:21:40.200 --> 0:21:42.119
<v Speaker 1>One of them is that They Might Be Giants shirt,

0:21:42.680 --> 0:21:46.160
<v Speaker 1>one is a plain blue T shirt, and the third

0:21:46.480 --> 0:21:49.240
<v Speaker 1>is a shirt that has the tech stuff logo on it,

0:21:49.800 --> 0:21:54.959
<v Speaker 1>and it's based off of long observation that you've determined

0:21:55.280 --> 0:21:59.680
<v Speaker 1>these following facts. If I am wearing that They Might

0:21:59.680 --> 0:22:04.399
<v Speaker 1>Be Giants shirt today, I definitely will not wear it tomorrow.

0:22:04.800 --> 0:22:08.280
<v Speaker 1>But there's a fifty-fifty shot I'll wear either the blue

0:22:08.280 --> 0:22:12.080
<v Speaker 1>shirt or the tech Stuff shirt. Now, if I'm wearing

0:22:12.280 --> 0:22:15.800
<v Speaker 1>the blue shirt today, there's a ten percent chance I'm

0:22:15.800 --> 0:22:19.280
<v Speaker 1>going to wear the same blue shirt tomorrow. Don't worry,

0:22:19.520 --> 0:22:23.600
<v Speaker 1>I'll wash it first. There's a sixty percent chance that I'll

0:22:23.600 --> 0:22:26.320
<v Speaker 1>wear the tech Stuff shirt, and there's a thirty percent

0:22:26.400 --> 0:22:29.600
<v Speaker 1>chance I'll wear the They Might Be Giants shirt. But

0:22:30.520 --> 0:22:33.159
<v Speaker 1>if I'm wearing the tech Stuff shirt today, there's a

0:22:33.200 --> 0:22:36.399
<v Speaker 1>seventy percent chance I'll wear it again tomorrow because I like

0:22:36.440 --> 0:22:39.760
<v Speaker 1>to promote myself. But there's a thirty percent chance I'll

0:22:39.760 --> 0:22:42.159
<v Speaker 1>wear the They Might Be Giants shirt, and there is

0:22:42.280 --> 0:22:44.920
<v Speaker 1>no chance that I'm going to wear the blue one

0:22:45.240 --> 0:22:49.520
<v Speaker 1>in this case. So those are our various scenarios, right?

0:22:49.800 --> 0:22:54.560
<v Speaker 1>Which shirt I will wear tomorrow depends only upon which

0:22:54.640 --> 0:22:58.120
<v Speaker 1>shirt I am wearing today. What I wore yesterday has

0:22:58.119 --> 0:23:02.119
<v Speaker 1>no bearing on the outcome for tomorrow, So today is

0:23:02.160 --> 0:23:05.879
<v Speaker 1>all that matters. And depending on which shirt I wear,

0:23:06.320 --> 0:23:09.639
<v Speaker 1>you can make some probability predictions for tomorrow. So we

0:23:09.640 --> 0:23:12.600
<v Speaker 1>can actually use this approach to figure out the probability

0:23:12.640 --> 0:23:15.840
<v Speaker 1>that I might wear the tech Stuff shirt, say, ten

0:23:15.920 --> 0:23:19.119
<v Speaker 1>days in a row, since there's a better than even

0:23:19.240 --> 0:23:22.760
<v Speaker 1>chance that if I'm wearing tech stuff today, I'll end

0:23:22.840 --> 0:23:26.000
<v Speaker 1>up wearing it again tomorrow. And if I wear it tomorrow,

0:23:26.240 --> 0:23:28.879
<v Speaker 1>then there's a better than fifty percent chance that I'm going

0:23:28.920 --> 0:23:32.639
<v Speaker 1>to wear it the following day. But at some point

0:23:32.720 --> 0:23:35.880
<v Speaker 1>you're going to see that the odds are starting to

0:23:35.960 --> 0:23:40.320
<v Speaker 1>be against you, for you know, increasingly long strings of

0:23:40.400 --> 0:23:44.000
<v Speaker 1>wearing the tech Stuff shirt. Anyway, Markov chains would become

0:23:44.040 --> 0:23:46.920
<v Speaker 1>one of the types of processes that machine learning models

0:23:46.960 --> 0:23:50.520
<v Speaker 1>would incorporate, with some models looking at the current state

0:23:50.600 --> 0:23:53.639
<v Speaker 1>of a given process and then making predictions on what

0:23:53.920 --> 0:23:57.399
<v Speaker 1>the next state will be with no need to look

0:23:57.560 --> 0:24:03.320
<v Speaker 1>back at the previous decision. The Markov chain is memoryless.
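
NOTE
Editor's sketch, not part of the episode audio: the T-shirt example can be written as a small transition table in Python.
The shirt keys ("tmbg", "blue", "techstuff") and the function names are made up for illustration.
The probabilities are the ones quoted in the episode, assuming the They Might Be Giants row splits evenly between the other two shirts.
```python
# Transition probabilities from the episode's T-shirt example.
# Outer keys are today's shirt; inner dicts give tomorrow's probabilities.
TRANSITIONS = {
    "tmbg": {"tmbg": 0.0, "blue": 0.5, "techstuff": 0.5},
    "blue": {"tmbg": 0.3, "blue": 0.1, "techstuff": 0.6},
    "techstuff": {"tmbg": 0.3, "blue": 0.0, "techstuff": 0.7},
}
def step(distribution):
    """Advance a probability distribution over shirts by one day."""
    nxt = {shirt: 0.0 for shirt in TRANSITIONS}
    for today, p_today in distribution.items():
        for tomorrow, p_move in TRANSITIONS[today].items():
            nxt[tomorrow] += p_today * p_move
    return nxt
def streak_probability(shirt, days):
    """Chance of wearing `shirt` for `days` days in a row, given it is worn on day one."""
    return TRANSITIONS[shirt][shirt] ** (days - 1)
if __name__ == "__main__":
    today = {"tmbg": 0.0, "blue": 1.0, "techstuff": 0.0}
    print(step(today))                          # distribution for tomorrow
    print(step(step(today)))                    # distribution for the day after
    print(streak_probability("techstuff", 10))  # ten tech Stuff days in a row
```
Only today's distribution is passed to step(), which is the memoryless property in code form.
The ten-day streak works out to 0.7 raised to the ninth power, roughly four percent.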

0:24:04.400 --> 0:24:07.680
<v Speaker 1>Now that's just a couple of the mathematicians whose work

0:24:07.840 --> 0:24:12.159
<v Speaker 1>underlies elements of machine learning. There's also structure we need

0:24:12.200 --> 0:24:15.880
<v Speaker 1>to talk about. In nineteen forty-nine, a man named Donald

0:24:15.920 --> 0:24:19.520
<v Speaker 1>Hebb wrote a book titled The Organization of Behavior, and

0:24:19.600 --> 0:24:24.200
<v Speaker 1>in that book, Hebb gave a hypothesis on how neurons,

0:24:24.480 --> 0:24:27.879
<v Speaker 1>that is, how brain cells interact with one another.

0:24:28.440 --> 0:24:32.480
<v Speaker 1>His ideas included the notion that if two neurons interact

0:24:32.520 --> 0:24:36.760
<v Speaker 1>with one another regularly, that is, if one fires, that

0:24:36.880 --> 0:24:40.440
<v Speaker 1>the second one is also likely to fire, they end

0:24:40.520 --> 0:24:44.959
<v Speaker 1>up forming a tighter communicative relationship with each other. Not

0:24:45.160 --> 0:24:50.320
<v Speaker 1>long after his expression of this hypothesis, computer scientists began

0:24:50.359 --> 0:24:53.000
<v Speaker 1>to think of a potential way to do this artificially,

0:24:53.400 --> 0:24:59.120
<v Speaker 1>with machines creating the equivalent of artificial neurons. The relative

0:24:59.280 --> 0:25:04.080
<v Speaker 1>strength of the relationship between artificial neurons is something we describe

0:25:04.119 --> 0:25:07.520
<v Speaker 1>by weight. That's going to be an important part of

0:25:07.560 --> 0:25:11.439
<v Speaker 1>machine learning. Weight, by the way, is W E I

0:25:11.720 --> 0:25:15.640
<v Speaker 1>G H T, as in this relationship is weighted more

0:25:15.720 --> 0:25:21.040
<v Speaker 1>heavily than that relationship. In the early nineteen fifties, an

0:25:21.200 --> 0:25:25.080
<v Speaker 1>IBM researcher named Arthur Samuel created a program designed to

0:25:25.119 --> 0:25:28.399
<v Speaker 1>win at checkers. The program would do a quick analysis

0:25:28.440 --> 0:25:32.680
<v Speaker 1>of where pieces were on a checkerboard and whose move

0:25:32.720 --> 0:25:36.080
<v Speaker 1>it was, and then calculate the chances of each side

0:25:36.080 --> 0:25:38.960
<v Speaker 1>winning the game based on those positions, and it did

0:25:39.000 --> 0:25:43.119
<v Speaker 1>this with a minimax approach. All right, so checkers is

0:25:43.160 --> 0:25:46.840
<v Speaker 1>a two player turn based game. Player one makes a move,

0:25:47.160 --> 0:25:49.399
<v Speaker 1>then player two can make a move. There are a

0:25:49.440 --> 0:25:52.840
<v Speaker 1>finite number of moves that can be made, a finite

0:25:52.960 --> 0:25:57.159
<v Speaker 1>number of possibilities, though admittedly it's a pretty good number

0:25:57.200 --> 0:26:00.520
<v Speaker 1>of possibilities. But let's say a game has been going

0:26:00.520 --> 0:26:03.439
<v Speaker 1>on for a few moves, and you've got your two sides.

0:26:03.480 --> 0:26:06.040
<v Speaker 1>You've got the red checkers over on player one side

0:26:06.160 --> 0:26:08.880
<v Speaker 1>and the black checkers for player two. Let's say

0:26:08.880 --> 0:26:12.080
<v Speaker 1>it's player one's move. For the purposes of this example,

0:26:12.400 --> 0:26:15.040
<v Speaker 1>we'll say that player one really just has one piece

0:26:15.200 --> 0:26:19.119
<v Speaker 1>that they can actually move on this turn, and it

0:26:19.160 --> 0:26:23.480
<v Speaker 1>can move into one of two open spaces. So player

0:26:23.520 --> 0:26:26.760
<v Speaker 1>one has to make a choice. After that choice, it's

0:26:26.760 --> 0:26:29.800
<v Speaker 1>going to be player two's turn. So we can create

0:26:29.840 --> 0:26:34.360
<v Speaker 1>a decision tree illustrating the possible choices and the possible

0:26:34.400 --> 0:26:38.639
<v Speaker 1>outcomes of those choices. These choices are the children of

0:26:38.680 --> 0:26:42.040
<v Speaker 1>the starting position for player one, so player one's starting

0:26:42.040 --> 0:26:46.120
<v Speaker 1>position has two children. Player two will have their own

0:26:46.200 --> 0:26:49.520
<v Speaker 1>choices to make after that decision has been made, but

0:26:49.840 --> 0:26:53.240
<v Speaker 1>those choices are going to depend upon whatever move player

0:26:53.280 --> 0:26:57.360
<v Speaker 1>one ultimately takes. So we can extend out our decision

0:26:57.440 --> 0:27:01.920
<v Speaker 1>tree showing the branching possible moves that player two might make,

0:27:02.480 --> 0:27:05.879
<v Speaker 1>and these are the children of the two possible outcomes

0:27:05.920 --> 0:27:10.160
<v Speaker 1>of our first choice. After player two's turn, it's player

0:27:10.240 --> 0:27:14.000
<v Speaker 1>one's turn again, which means we need to branch those

0:27:14.040 --> 0:27:17.720
<v Speaker 1>decisions out even further. And this is all before player

0:27:17.800 --> 0:27:22.560
<v Speaker 1>one has even made that first choice. We're just evaluating possibilities.

0:27:22.840 --> 0:27:25.560
<v Speaker 1>At some point, either when we have plotted far enough

0:27:25.600 --> 0:27:28.840
<v Speaker 1>out that we know all possible outcomes of the game,

0:27:29.520 --> 0:27:32.399
<v Speaker 1>or we're just reaching a point where it would be

0:27:32.520 --> 0:27:35.399
<v Speaker 1>unmanageable for us to go any further. We need to

0:27:35.440 --> 0:27:40.160
<v Speaker 1>actually analyze what our options are. The endpoints represent either

0:27:40.720 --> 0:27:45.119
<v Speaker 1>a win, a loss, or a draw for player one, or,

0:27:45.160 --> 0:27:48.320
<v Speaker 1>if we haven't extended out the tree all the way

0:27:48.359 --> 0:27:50.600
<v Speaker 1>to the end of the game, at least a change

0:27:50.600 --> 0:27:54.160
<v Speaker 1>in advantage, whether it would be in player one's advantage

0:27:54.200 --> 0:27:58.119
<v Speaker 1>to make that move or disadvantage. We could actually assign

0:27:58.240 --> 0:28:01.760
<v Speaker 1>numerical values to each endpoint, with positive values

0:28:01.840 --> 0:28:05.600
<v Speaker 1>representing an advantage for player one and a negative value

0:28:05.640 --> 0:28:09.080
<v Speaker 1>representing an advantage for player two. And once we do that,

0:28:09.480 --> 0:28:12.600
<v Speaker 1>we can see which pathways tend to lead to better

0:28:12.640 --> 0:28:17.399
<v Speaker 1>outcomes for Player one. We work backward through the decision tree.
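
NOTE
Editor's sketch, not part of the episode audio: working backward through a decision tree, as described here, might look like the following in Python.
This is a generic textbook minimax over a hand-built tree, not Arthur Samuel's actual checkers program.
The tree shape, the endpoint scores, and the names minimax, best_move, spot_a, and spot_b are all hypothetical.
```python
# A game tree as nested lists: inner lists are positions, numbers are endpoint scores.
# Positive scores favor player one, negative scores favor player two.
def minimax(node, player_one_to_move):
    """Return the score reached if both players always pick their best option."""
    if isinstance(node, (int, float)):
        return node  # an endpoint: win, loss, draw, or estimated advantage
    scores = [minimax(child, not player_one_to_move) for child in node]
    # Player one wants the highest score, player two wants the lowest.
    return max(scores) if player_one_to_move else min(scores)
def best_move(children):
    """Index of the child position player one should move into."""
    scores = [minimax(child, False) for child in children]
    return scores.index(max(scores))
if __name__ == "__main__":
    # Player one picks spot A or spot B, then player two replies.
    spot_a = [3, -2]  # tempting, but player two can answer with the -2 line
    spot_b = [1, 2]   # the worst player two can force here is +1
    print(minimax([spot_a, spot_b], True))
    print(best_move([spot_a, spot_b]))
```
With these made-up scores, spot B is the better move even though spot A contains the single highest endpoint, because player two gets to choose the reply.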

0:28:17.840 --> 0:28:21.800
<v Speaker 1>So on all the decisions that end in an advantage

0:28:21.800 --> 0:28:24.760
<v Speaker 1>for player one, we can say this is the choice

0:28:24.760 --> 0:28:28.119
<v Speaker 1>that player one would take. But then we know that

0:28:28.240 --> 0:28:31.200
<v Speaker 1>for player two, player two is always going to choose

0:28:31.520 --> 0:28:35.640
<v Speaker 1>whichever choice has the greatest advantage for that player. So

0:28:36.040 --> 0:28:38.160
<v Speaker 1>we have to actually take that into account as we're

0:28:38.200 --> 0:28:43.400
<v Speaker 1>working backward, and this is how we can finally get

0:28:43.440 --> 0:28:45.520
<v Speaker 1>to the point where we decide which move we're going

0:28:45.560 --> 0:28:48.920
<v Speaker 1>to make, because these decisions, as you go backward up

0:28:48.960 --> 0:28:53.040
<v Speaker 1>the tree, they ultimately inform you which of those two

0:28:53.160 --> 0:28:58.040
<v Speaker 1>choices is going to give you the best result. Those values, well,

0:28:58.160 --> 0:29:01.040
<v Speaker 1>those are weights. So for player one, the goal is

0:29:01.080 --> 0:29:04.680
<v Speaker 1>to pick the path that has the highest positive value.

0:29:04.800 --> 0:29:07.440
<v Speaker 1>For player two, it's to pick the path that has

0:29:07.480 --> 0:29:11.080
<v Speaker 1>the lowest possible value or the highest negative value, if

0:29:11.080 --> 0:29:13.560
<v Speaker 1>you prefer. So, in other words, player one might be

0:29:13.600 --> 0:29:16.720
<v Speaker 1>thinking something like, if I moved to Spot A, my

0:29:16.840 --> 0:29:19.800
<v Speaker 1>chance of winning this game is but if I moved

0:29:19.800 --> 0:29:24.480
<v Speaker 1>to Spot B, it's only so. Of course, those percentages

0:29:24.520 --> 0:29:26.600
<v Speaker 1>will also depend on what player two is going to

0:29:26.640 --> 0:29:29.240
<v Speaker 1>do in response. Some moves that player two might do

0:29:29.440 --> 0:29:33.000
<v Speaker 1>could end up guaranteeing a win for player one. This

0:29:33.160 --> 0:29:35.920
<v Speaker 1>is the minimax approach, and there's an algorithm that

0:29:36.000 --> 0:29:39.320
<v Speaker 1>guides it. It depends upon the current position within a

0:29:39.400 --> 0:29:43.120
<v Speaker 1>game and how many moves or how much depth it

0:29:43.160 --> 0:29:46.200
<v Speaker 1>has to take into account, and for which player is

0:29:46.240 --> 0:29:50.760
<v Speaker 1>it actually helping out. What happens is if player one

0:29:50.880 --> 0:29:55.040
<v Speaker 1>does this evaluation and finds that both options are negative, well,

0:29:55.240 --> 0:29:58.360
<v Speaker 1>then this is something that happens in games. Right. Sometimes

0:29:58.440 --> 0:30:01.360
<v Speaker 1>you find out there is no good move, like any

0:30:01.440 --> 0:30:03.640
<v Speaker 1>move you make is going to be a losing move. Well,

0:30:03.640 --> 0:30:05.800
<v Speaker 1>the only option at that point is to choose the

0:30:05.920 --> 0:30:09.200
<v Speaker 1>least bad one, so it would be whatever the smallest

0:30:09.240 --> 0:30:13.240
<v Speaker 1>negative value choice was. Our next big development that I

0:30:13.280 --> 0:30:18.880
<v Speaker 1>need to mention is Frank Rosenblatt's artificial neural network called Perceptron.

0:30:19.560 --> 0:30:22.880
<v Speaker 1>Its purpose was to recognize shapes and patterns, and it

0:30:22.920 --> 0:30:26.480
<v Speaker 1>was originally going to be its own machine like actual hardware,

0:30:27.000 --> 0:30:30.480
<v Speaker 1>but the first incarnation of Perceptron would actually be in

0:30:30.520 --> 0:30:33.840
<v Speaker 1>the form of software rather than hardware. There was a

0:30:33.880 --> 0:30:37.480
<v Speaker 1>purpose built Perceptron later, but the original one was software.

0:30:37.960 --> 0:30:41.960
<v Speaker 1>Despite some early excitement, the Perceptron proved to be somewhat

0:30:42.080 --> 0:30:46.000
<v Speaker 1>limited in its capabilities and interest in artificial neural networks

0:30:46.040 --> 0:30:49.560
<v Speaker 1>died down for a while as a result. In a way,

0:30:50.200 --> 0:30:53.200
<v Speaker 1>you could kind of compare this to some other technologies

0:30:53.200 --> 0:30:56.640
<v Speaker 1>that got a big hype cycle and then later deflated.

0:30:57.040 --> 0:31:00.120
<v Speaker 1>Virtual reality is the one I always go with. Back

0:31:00.120 --> 0:31:02.720
<v Speaker 1>in the nineteen nineties, the world was really hyped for

0:31:02.800 --> 0:31:08.120
<v Speaker 1>virtual reality. People had incredibly unrealistic expectations for what VR

0:31:08.320 --> 0:31:11.200
<v Speaker 1>actually meant and what it could do. And when it

0:31:11.200 --> 0:31:14.600
<v Speaker 1>turned out that VR wasn't nearly as sophisticated as people

0:31:14.600 --> 0:31:18.400
<v Speaker 1>were imagining, a lot of enthusiasm dropped out for the

0:31:18.600 --> 0:31:23.040
<v Speaker 1>entire field, and with that dropped funding and support, and

0:31:23.080 --> 0:31:26.120
<v Speaker 1>as a result, development in VR hit a real wall,

0:31:26.200 --> 0:31:29.040
<v Speaker 1>with only a fraction of the people who had been

0:31:29.080 --> 0:31:32.400
<v Speaker 1>working in the field sticking around, and they had to

0:31:32.440 --> 0:31:35.320
<v Speaker 1>scramble just to find funding to keep their projects going.

0:31:35.720 --> 0:31:38.280
<v Speaker 1>So VR was effectively put on the shelf and wouldn't

0:31:38.320 --> 0:31:42.160
<v Speaker 1>make much progress for nearly twenty years. Well, artificial neural

0:31:42.200 --> 0:31:46.959
<v Speaker 1>networks had a very similar issue, but other computer scientists

0:31:47.280 --> 0:31:51.000
<v Speaker 1>eventually found ways to design artificial neural networks that could

0:31:51.040 --> 0:31:54.520
<v Speaker 1>do some pretty amazing things if they had access to

0:31:54.680 --> 0:31:57.960
<v Speaker 1>enough data. When we come back, I'll talk a little

0:31:57.960 --> 0:32:00.440
<v Speaker 1>bit more about that and what it all means. But

0:32:00.560 --> 0:32:11.800
<v Speaker 1>first let's take another quick break. So we left off

0:32:11.880 --> 0:32:15.080
<v Speaker 1>with the AI field going into hibernation for a little bit.

0:32:15.640 --> 0:32:20.040
<v Speaker 1>Theory and mathematics were bumping up against the limitations of technology,

0:32:20.200 --> 0:32:23.160
<v Speaker 1>which wasn't quite at the level to put all that

0:32:23.280 --> 0:32:26.160
<v Speaker 1>theory to the test. Plus there needed to be some

0:32:26.200 --> 0:32:30.120
<v Speaker 1>tweaks to the approaches, but those came with time and

0:32:30.400 --> 0:32:34.440
<v Speaker 1>more mathematicians found new ways to create artificial neural networks

0:32:34.480 --> 0:32:38.880
<v Speaker 1>capable of stuff like pattern recognition and learning. So let's

0:32:39.040 --> 0:32:44.040
<v Speaker 1>imagine another decision tree. We've got our starting position. This

0:32:44.080 --> 0:32:47.200
<v Speaker 1>is probably where we put some input. We would feed

0:32:47.640 --> 0:32:51.920
<v Speaker 1>data into a system, and let's say from that starting position,

0:32:51.960 --> 0:32:55.160
<v Speaker 1>we have a process that's going to transform that input

0:32:55.760 --> 0:32:59.520
<v Speaker 1>in one of two possible ways. So we've got two

0:33:00.040 --> 0:33:05.120
<v Speaker 1>potential outputs for that first step. Like our minimax example,

0:33:05.440 --> 0:33:08.800
<v Speaker 1>we can go down several layers of possible choices, and

0:33:08.840 --> 0:33:12.640
<v Speaker 1>we can weight the relationships between these different choices. So

0:33:13.080 --> 0:33:16.280
<v Speaker 1>if the incoming value is higher than a certain amount,

0:33:16.640 --> 0:33:19.800
<v Speaker 1>maybe the node sends it down one pathway, but if

0:33:19.880 --> 0:33:23.479
<v Speaker 1>the value is lower than that arbitrary amount, the node

0:33:23.600 --> 0:33:28.760
<v Speaker 1>will send it down a different pathway. This is drastically oversimplifying,

0:33:28.920 --> 0:33:31.240
<v Speaker 1>but I hope you kind of get the idea. It's

0:33:31.240 --> 0:33:34.400
<v Speaker 1>like a big sorting system and the goal is that

0:33:34.800 --> 0:33:38.760
<v Speaker 1>at the very end, whatever comes out as output is

0:33:38.880 --> 0:33:43.320
<v Speaker 1>correct or true. Ideally, you've got a system that is

0:33:43.480 --> 0:33:48.760
<v Speaker 1>self improving. It trains itself to be better. But how

0:33:48.800 --> 0:33:52.440
<v Speaker 1>the heck does that happen? Well, let's consider cats for

0:33:52.480 --> 0:33:57.760
<v Speaker 1>a bit, not the musical, and good heavens, definitely not

0:33:58.200 --> 0:34:02.760
<v Speaker 1>the movie musical. That is a subject that

0:34:02.960 --> 0:34:05.760
<v Speaker 1>deserves its own episode. Maybe one day I'll figure out

0:34:06.000 --> 0:34:08.120
<v Speaker 1>a way to tackle that film with some sort of

0:34:08.160 --> 0:34:11.120
<v Speaker 1>tech capacity, but honestly, I'm just not ready to do

0:34:11.200 --> 0:34:14.560
<v Speaker 1>that yet, from like an emotional standpoint as well as

0:34:14.560 --> 0:34:19.040
<v Speaker 1>a research one. No, let's say you're teaching a computer

0:34:19.080 --> 0:34:23.720
<v Speaker 1>system to recognize cats, pictures of cats, and the system

0:34:23.760 --> 0:34:27.840
<v Speaker 1>has an artificial neural network that accepts input pictures of

0:34:27.880 --> 0:34:31.400
<v Speaker 1>cats and then filters that input through the network to

0:34:31.520 --> 0:34:35.399
<v Speaker 1>make the determination does this picture include a cat in it?

0:34:35.920 --> 0:34:38.880
<v Speaker 1>And you start feeding it lots of images. The neural

0:34:38.920 --> 0:34:42.359
<v Speaker 1>network acts on the data according to the weighted relationship

0:34:42.520 --> 0:34:47.480
<v Speaker 1>between the artificial neurons, and it produces an output. Now

0:34:47.800 --> 0:34:50.640
<v Speaker 1>here's the thing: we already know what we want the

0:34:50.640 --> 0:34:54.000
<v Speaker 1>output to be because we can recognize if a picture

0:34:54.000 --> 0:34:57.160
<v Speaker 1>has a cat in it or not. Maybe we've got one

0:34:57.239 --> 0:35:00.440
<v Speaker 1>thousand pictures. This is the training data we're going to

0:35:00.600 --> 0:35:03.960
<v Speaker 1>use for this machine learning process. We also know that

0:35:04.080 --> 0:35:06.719
<v Speaker 1>eight hundred of those pictures have a cat in them

0:35:06.760 --> 0:35:10.160
<v Speaker 1>and two hundred don't, so we know what we want

0:35:10.160 --> 0:35:13.160
<v Speaker 1>the results to be. We've got an artificial neural network

0:35:13.360 --> 0:35:16.759
<v Speaker 1>in which some neurons or nodes will accept input and

0:35:16.760 --> 0:35:19.440
<v Speaker 1>perform a function based on that input, and then the

0:35:19.480 --> 0:35:23.360
<v Speaker 1>weighted connections that neuron has to other neurons will determine

0:35:23.640 --> 0:35:26.480
<v Speaker 1>where it passes the information down until we get to

0:35:26.480 --> 0:35:29.759
<v Speaker 1>an output. And this happens until we get that conclusion.

0:35:30.440 --> 0:35:34.160
<v Speaker 1>So what happens if the computer's answer is wrong? What

0:35:34.239 --> 0:35:37.160
<v Speaker 1>if we feed those one thousand photos to it and

0:35:37.239 --> 0:35:40.239
<v Speaker 1>it says only three hundred of them have cats in them?

0:35:40.480 --> 0:35:43.720
<v Speaker 1>Well, we have to go back and adjust those weighted

0:35:43.840 --> 0:35:48.719
<v Speaker 1>connections because clearly something didn't go right. The connections within

0:35:48.760 --> 0:35:53.000
<v Speaker 1>the network need to be readjusted. We would likely start

0:35:53.320 --> 0:35:57.200
<v Speaker 1>closest to our output and see which neurons seem to

0:35:57.239 --> 0:36:01.560
<v Speaker 1>contribute to the mistake, which neurons were responsible, in

0:36:01.600 --> 0:36:04.160
<v Speaker 1>other words, for it to say, oh, only three hundred of these

0:36:04.200 --> 0:36:07.960
<v Speaker 1>pictures had cats in them. And then we would adjust

0:36:08.040 --> 0:36:11.719
<v Speaker 1>the weights, the incoming weights of connections to those neurons

0:36:12.360 --> 0:36:15.880
<v Speaker 1>in order to try and favor pathways that lead to

0:36:16.040 --> 0:36:19.480
<v Speaker 1>correct answers. Then we feed it the one thousand pictures

0:36:19.560 --> 0:36:22.719
<v Speaker 1>again and we look at those results. Then we do

0:36:22.840 --> 0:36:26.480
<v Speaker 1>this again and again and again, every time tweaking the

0:36:26.520 --> 0:36:31.239
<v Speaker 1>network a little bit so that it gets a bit better. Eventually,

0:36:31.520 --> 0:36:34.960
<v Speaker 1>when we have trained the system, we can start to

0:36:35.120 --> 0:36:39.720
<v Speaker 1>feed brand new data to the network, not the stuff

0:36:39.760 --> 0:36:43.640
<v Speaker 1>we've trained it on, but pictures that we and the

0:36:43.680 --> 0:36:47.160
<v Speaker 1>system have never seen before. And if our network is

0:36:47.200 --> 0:36:49.480
<v Speaker 1>a good one, if we have trained it well, it

0:36:49.520 --> 0:36:53.239
<v Speaker 1>will sort through these new photos and it will count

0:36:53.320 --> 0:36:56.320
<v Speaker 1>up the ones that have the cat pictures lickety split.
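
NOTE
Editor's sketch, not part of the episode audio: the train, check, adjust, repeat loop described here can be shown on a toy problem in Python.
This uses a single artificial neuron with the classic perceptron update rule, which is far simpler than the multi-layer network adjustment the episode describes.
The two-number "features", the labels, and the names predict and train are invented stand-ins; real cat pictures would need far more than this.
```python
# Toy labeled data: (features, label). Label 1 stands in for "cat", 0 for "no cat".
# The two made-up features are placeholders, not real image measurements.
TRAINING_DATA = [
    ((0.9, 0.8), 1),
    ((0.8, 0.6), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]
def predict(weights, bias, features):
    """Fire (return 1) if the weighted sum of the inputs is above zero."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0
def train(data, passes=100, learning_rate=0.1):
    """Show the data to the neuron repeatedly, nudging weights after each mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(passes):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Shift each weight toward the answer we already know is correct.
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training example is now classified correctly
    return weights, bias
if __name__ == "__main__":
    weights, bias = train(TRAINING_DATA)
    print(weights, bias)
    print(predict(weights, bias, (0.85, 0.75)))  # an example it was not trained on
    print(predict(weights, bias, (0.15, 0.2)))   # another unseen example
```
The known labels play the role of the answer key, and the unseen examples at the end mirror feeding the trained system brand new pictures.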

0:36:56.760 --> 0:37:00.839
<v Speaker 1>This approach is called supervised learning because it involves kind

0:37:00.840 --> 0:37:04.880
<v Speaker 1>of grading the network on its homework and then working

0:37:04.920 --> 0:37:08.759
<v Speaker 1>with it to get better. Heck, with the right algorithm,

0:37:08.760 --> 0:37:12.480
<v Speaker 1>a neural network can learn to recognize and differentiate patterns

0:37:12.960 --> 0:37:16.520
<v Speaker 1>even if we never explicitly told the system what it

0:37:16.600 --> 0:37:20.719
<v Speaker 1>was looking for. Google discovered this several years ago when

0:37:20.760 --> 0:37:25.040
<v Speaker 1>it fed several thousand YouTube videos to an enormous artificial

0:37:25.080 --> 0:37:29.359
<v Speaker 1>neural network. The system analyzed the videos that were fed

0:37:29.400 --> 0:37:33.520
<v Speaker 1>to it and gradually recognized patterns that represented different types

0:37:33.560 --> 0:37:39.160
<v Speaker 1>of stuff, like people or like cats, because there are

0:37:39.200 --> 0:37:42.520
<v Speaker 1>a lot of cat videos on YouTube, and the network

0:37:42.880 --> 0:37:45.120
<v Speaker 1>got to the point where it could identify an image

0:37:45.120 --> 0:37:48.960
<v Speaker 1>of a cat fairly reliably, better than seventy percent of the time,

0:37:49.440 --> 0:37:53.239
<v Speaker 1>even though it was never told how to do that,

0:37:53.920 --> 0:37:57.839
<v Speaker 1>or it was never even told what a cat was. So,

0:37:57.880 --> 0:38:01.120
<v Speaker 1>as Google representatives put it, they said, it had to

0:38:01.239 --> 0:38:04.719
<v Speaker 1>invent the concept of a cat. It had to recognize

0:38:05.200 --> 0:38:09.719
<v Speaker 1>that cats are not the same as people, which I

0:38:09.719 --> 0:38:14.080
<v Speaker 1>think is a big slap in the face to some cats. Really,

0:38:14.760 --> 0:38:18.560
<v Speaker 1>what it said was that I recognized this particular pattern

0:38:18.600 --> 0:38:23.040
<v Speaker 1>of features, and I recognized that these other instances of

0:38:23.120 --> 0:38:26.839
<v Speaker 1>creatures that have a similar pattern seemed to match that,

0:38:27.040 --> 0:38:30.919
<v Speaker 1>and so I draw the conclusion that this instance of

0:38:30.960 --> 0:38:35.080
<v Speaker 1>a thing belongs with all these other instances of things

0:38:35.160 --> 0:38:39.640
<v Speaker 1>that are similar in characteristics. So this was more of

0:38:39.640 --> 0:38:43.439
<v Speaker 1>an example of unsupervised learning, in that the system, when

0:38:43.480 --> 0:38:46.640
<v Speaker 1>fed enough data, began to categorize stuff all on its

0:38:46.640 --> 0:38:50.680
<v Speaker 1>own through its own parameters. Now, one neat way that

0:38:50.719 --> 0:38:54.720
<v Speaker 1>computer scientists will train up systems for certain types of applications

0:38:55.280 --> 0:39:00.840
<v Speaker 1>is through a generative adversarial network, which I admit sounds

0:39:00.920 --> 0:39:03.719
<v Speaker 1>kind of sinister, doesn't it? And I mean, it can be,

0:39:03.920 --> 0:39:07.719
<v Speaker 1>but it doesn't have to be. Essentially, you're using two

0:39:07.760 --> 0:39:11.480
<v Speaker 1>different artificial neural networks. One of the networks has a

0:39:11.520 --> 0:39:15.520
<v Speaker 1>specific job, it's to fool the other network. So the

0:39:15.520 --> 0:39:18.719
<v Speaker 1>other network's job is to detect attempts to fool it

0:39:19.080 --> 0:39:23.520
<v Speaker 1>versus legitimate data. So let's use an example. Let's say

0:39:23.560 --> 0:39:26.200
<v Speaker 1>you're trying to create a system that can make realistic

0:39:26.600 --> 0:39:33.040
<v Speaker 1>but entirely computer generated, that is, fabricated photographs of people. So,

0:39:33.080 --> 0:39:36.440
<v Speaker 1>in other words, these are computer generated images that don't

0:39:36.560 --> 0:39:40.239
<v Speaker 1>actually represent a real person at all. We've got one

0:39:40.320 --> 0:39:43.600
<v Speaker 1>artificial neural network, the generator, and its job is to

0:39:43.680 --> 0:39:49.399
<v Speaker 1>create images of people that can pass as real photographs.

0:39:49.760 --> 0:39:52.520
<v Speaker 1>Then we've got our other network, which is the discriminator.

0:39:52.840 --> 0:39:56.520
<v Speaker 1>This is trying to sort out real photos of actual

0:39:56.600 --> 0:40:02.600
<v Speaker 1>people from pictures that have been generated by the generative system.

0:40:02.640 --> 0:40:06.400
<v Speaker 1>And we put these two networks against each other. The

0:40:06.480 --> 0:40:10.279
<v Speaker 1>idea here is that both systems get better as they

0:40:10.320 --> 0:40:14.880
<v Speaker 1>test one another out. If the generator network is falling

0:40:14.920 --> 0:40:19.040
<v Speaker 1>behind because the discriminator can suss out the fakes too easily, well,

0:40:19.080 --> 0:40:22.280
<v Speaker 1>then it's time to tweak some weights in that neural

0:40:22.320 --> 0:40:27.680
<v Speaker 1>network that are leading to dissatisfactory computer generated images and

0:40:27.719 --> 0:40:31.960
<v Speaker 1>try it again. But then, if the discriminator is starting

0:40:32.000 --> 0:40:36.160
<v Speaker 1>to miss fakes, well, it's time to tweak the discriminator

0:40:36.200 --> 0:40:41.080
<v Speaker 1>network so it's better at spotting the false pictures. Now,

0:40:41.200 --> 0:40:44.879
<v Speaker 1>along the way, some pretty extraordinary stuff can happen. There

0:40:44.880 --> 0:40:50.120
<v Speaker 1>are photos of computer generated faces, not altered pictures, not

0:40:50.280 --> 0:40:54.920
<v Speaker 1>ones created by a human artist, but entirely composed by

0:40:54.960 --> 0:40:59.440
<v Speaker 1>a computer, and they can look absolutely realistic, complete with

0:40:59.480 --> 0:41:04.680
<v Speaker 1>consistent lighting and shadows. This is only after lots of

0:41:04.760 --> 0:41:09.000
<v Speaker 1>training sessions. The networks learn what the giveaways are, like,

0:41:09.520 --> 0:41:12.920
<v Speaker 1>what is it that leads the discriminator to say, no,

0:41:13.480 --> 0:41:15.920
<v Speaker 1>this is a fake photo, and how can you fix that?

0:41:16.400 --> 0:41:19.399
<v Speaker 1>It reminds me a bit of how photo experts used

0:41:19.440 --> 0:41:22.919
<v Speaker 1>to point out really bad Photoshop jobs and explain how

0:41:23.160 --> 0:41:27.120
<v Speaker 1>certain elements like shadows or edges or whatever, were a

0:41:27.239 --> 0:41:30.880
<v Speaker 1>dead giveaway that someone had altered an image. Well, similar

0:41:31.000 --> 0:41:35.160
<v Speaker 1>rules exist for generated images, and through training, the generator

0:41:35.200 --> 0:41:39.640
<v Speaker 1>gets better at making really convincing examples that don't fall

0:41:39.680 --> 0:41:42.560
<v Speaker 1>into the traps that would reveal it as a fake.

0:41:43.520 --> 0:41:47.360
<v Speaker 1>Over time, generative networks can get good enough to produce

0:41:47.360 --> 0:41:50.080
<v Speaker 1>stuff that would be very difficult for a human to

0:41:50.160 --> 0:41:54.000
<v Speaker 1>tell apart from the quote unquote real thing, and discriminators

0:41:54.040 --> 0:41:57.280
<v Speaker 1>can get good enough to detect fakes that would otherwise

0:41:57.360 --> 0:42:01.000
<v Speaker 1>pass human inspection. So an example of this is the

0:42:01.080 --> 0:42:05.520
<v Speaker 1>current ongoing battle with deepfakes. These are computer-generated

0:42:05.640 --> 0:42:09.600
<v Speaker 1>videos that appear to be legit. If they're done well enough,

0:42:10.080 --> 0:42:12.680
<v Speaker 1>they can have famous people in them. Doesn't have to

0:42:12.680 --> 0:42:15.200
<v Speaker 1>be a famous person, but it can show a video

0:42:15.200 --> 0:42:18.799
<v Speaker 1>of someone doing something that they absolutely never did, but

0:42:19.160 --> 0:42:21.799
<v Speaker 1>according to the video, they did, and it can be

0:42:21.840 --> 0:42:25.200
<v Speaker 1>really convincing if it's done well. A good deepfake

0:42:25.520 --> 0:42:29.520
<v Speaker 1>can fool people if you aren't paying too much attention.

0:42:29.600 --> 0:42:33.680
<v Speaker 1>Some of the really good ones can pass pretty deep scrutiny.

0:42:33.760 --> 0:42:37.319
<v Speaker 1>So this requires researchers to come up with solutions that

0:42:37.360 --> 0:42:41.040
<v Speaker 1>are pretty subtle and beyond the average person's ability to replicate,

0:42:41.120 --> 0:42:44.880
<v Speaker 1>like looking at the reflections in the person's eyes and

0:42:44.920 --> 0:42:48.680
<v Speaker 1>whether or not they seem realistic or computer generated.

0:42:48.760 --> 0:42:53.520
<v Speaker 1>But that really just represents another hurdle for the generative side.

0:42:53.800 --> 0:42:57.960
<v Speaker 1>So in other words, this is a seesaw approach, right.

0:42:58.840 --> 0:43:02.600
<v Speaker 1>It's creating fakes on one side and detecting them

0:43:02.600 --> 0:43:05.080
<v Speaker 1>on the other side. It's something we see in artificial

0:43:05.120 --> 0:43:08.040
<v Speaker 1>intelligence in general. A similar story played out with the

0:43:08.080 --> 0:43:11.880
<v Speaker 1>old CAPTCHA systems, where, you know, we saw back and

0:43:11.920 --> 0:43:15.440
<v Speaker 1>forth between methods to try and weed out bots by

0:43:15.560 --> 0:43:19.960
<v Speaker 1>using CAPTCHA images that only humans could really parse, and

0:43:20.000 --> 0:43:24.280
<v Speaker 1>then we saw improved bots that could analyze these images

0:43:24.320 --> 0:43:27.920
<v Speaker 1>and return correct results, which meant it was necessary to

0:43:27.960 --> 0:43:31.080
<v Speaker 1>create more difficult CAPTCHAs. Eventually you get to a point where

0:43:31.200 --> 0:43:34.200
<v Speaker 1>the CAPTCHAs are difficult enough that the average person can't

0:43:34.200 --> 0:43:36.040
<v Speaker 1>even pass them, and then you have to go to

0:43:36.080 --> 0:43:39.080
<v Speaker 1>a different method. We also see this play out in

0:43:39.120 --> 0:43:42.320
<v Speaker 1>the cybersecurity realm, where, you might say, the thieves

0:43:42.360 --> 0:43:45.600
<v Speaker 1>get better at lock picking, and then security experts make

0:43:45.719 --> 0:43:50.719
<v Speaker 1>better locks, and the cycle just repeats endlessly. One thing

0:43:50.840 --> 0:43:54.720
<v Speaker 1>that has really fueled machine learning recently is the era

0:43:54.880 --> 0:43:58.480
<v Speaker 1>of big data. Being able to harvest information on a

0:43:58.680 --> 0:44:04.000
<v Speaker 1>truly massive scale provides the opportunity to feed that data

0:44:04.120 --> 0:44:09.120
<v Speaker 1>into various machine learning systems to search for meaning within

0:44:09.200 --> 0:44:13.560
<v Speaker 1>that data. These systems might scour the information to look

0:44:13.560 --> 0:44:18.120
<v Speaker 1>for stuff like criminal activity like financial crimes or the

0:44:18.160 --> 0:44:22.520
<v Speaker 1>attempt to move some money around from various criminal exploits.

0:44:22.760 --> 0:44:25.400
<v Speaker 1>Or it could be used to look for trends like

0:44:25.480 --> 0:44:29.279
<v Speaker 1>market trends, or it might be used to plot possible

0:44:29.360 --> 0:44:33.560
<v Speaker 1>spikes in COVID-19 transmission, where those might occur, where

0:44:33.680 --> 0:44:37.560
<v Speaker 1>people should really be focusing their attention. But now we

0:44:37.680 --> 0:44:40.360
<v Speaker 1>got to think back on what I said earlier about

0:44:40.440 --> 0:44:44.000
<v Speaker 1>looking up at the sky and seeing shapes in the clouds.

0:44:45.000 --> 0:44:48.240
<v Speaker 1>There's a risk that comes along with machine learning. Actually,

0:44:48.280 --> 0:44:50.560
<v Speaker 1>technically there are a lot of risks, but this one

0:44:50.680 --> 0:44:54.280
<v Speaker 1>is a biggie. It is possible for machines like humans

0:44:54.760 --> 0:44:58.600
<v Speaker 1>to detect a pattern where there really isn't a pattern.

0:44:59.080 --> 0:45:03.160
<v Speaker 1>Systems might interpret noise to be signal, and depending on

0:45:03.239 --> 0:45:06.160
<v Speaker 1>what you're using the system to do, that could lead

0:45:06.200 --> 0:45:10.680
<v Speaker 1>you to some seriously dangerous incorrect conclusions. In some cases,

0:45:11.280 --> 0:45:13.640
<v Speaker 1>it could just be inconvenient, but depending on what you're

0:45:13.760 --> 0:45:17.319
<v Speaker 1>working toward, it could be catastrophic. And so computer scientists

0:45:17.400 --> 0:45:19.799
<v Speaker 1>know they have to do a lot of analysis to

0:45:19.840 --> 0:45:24.320
<v Speaker 1>make sure that patterns that are identified through machine learning

0:45:24.360 --> 0:45:30.960
<v Speaker 1>processes are actually real before acting on that information. Likewise,

0:45:31.440 --> 0:45:35.640
<v Speaker 1>bias is something that we humans have. Well, it's also

0:45:35.680 --> 0:45:39.560
<v Speaker 1>something that machine learning systems have too. Now, sometimes bias

0:45:39.800 --> 0:45:42.800
<v Speaker 1>is intentional. It can take the form of those weighted

0:45:42.920 --> 0:45:48.960
<v Speaker 1>relationships between artificial neurons. Other times, a system's architects, you know,

0:45:49.040 --> 0:45:52.280
<v Speaker 1>the people who put it together. They might have introduced bias,

0:45:52.400 --> 0:45:57.400
<v Speaker 1>not through conscious effort, but merely through the approach they took,

0:45:57.760 --> 0:46:01.120
<v Speaker 1>and that approach might have been too narrow. Now we've

0:46:01.160 --> 0:46:04.719
<v Speaker 1>seen this pop up a lot again with facial recognition technologies,

0:46:04.760 --> 0:46:08.880
<v Speaker 1>many of which have a sliding scale of efficacy. They

0:46:08.960 --> 0:46:13.000
<v Speaker 1>might be more reliable with certain ethnicities like white people,

0:46:13.320 --> 0:46:16.960
<v Speaker 1>over others. That points to a likely problem with the

0:46:16.960 --> 0:46:20.239
<v Speaker 1>way those systems were trained. This is one of the

0:46:20.320 --> 0:46:23.560
<v Speaker 1>reasons why many companies have made a choice to stop

0:46:23.600 --> 0:46:28.080
<v Speaker 1>supplying certain parties like police forces and military branches with

0:46:28.160 --> 0:46:32.799
<v Speaker 1>facial recognition systems. The systems aren't reliable for all demographic

0:46:32.840 --> 0:46:37.120
<v Speaker 1>groups and thus could cause disproportionate harm to certain populations.

0:46:37.400 --> 0:46:40.440
<v Speaker 1>It would be a technological approach to systemic racism, and

0:46:40.480 --> 0:46:44.000
<v Speaker 1>this stuff is already out there in the wild. You

0:46:44.080 --> 0:46:47.480
<v Speaker 1>might think a computer system can't be biased or prejudiced

0:46:47.760 --> 0:46:51.160
<v Speaker 1>or racist, and sure, we're still not at the point

0:46:51.200 --> 0:46:53.920
<v Speaker 1>where these systems are thinking in the way that humans do,

0:46:54.280 --> 0:46:59.160
<v Speaker 1>but the outcome is still disproportionately harmful to some groups.

0:46:59.640 --> 0:47:02.640
<v Speaker 1>Now that's not to say that machine learning itself is bad.

0:47:03.120 --> 0:47:06.880
<v Speaker 1>It's not bad. It's a tool, just as all technology

0:47:06.920 --> 0:47:10.319
<v Speaker 1>is a tool. Used properly, with a careful hand to

0:47:10.360 --> 0:47:15.000
<v Speaker 1>make sure that bias is understood and, where needed, mitigated, and

0:47:15.200 --> 0:47:19.080
<v Speaker 1>where work can be double or triple checked before being acted upon,

0:47:19.520 --> 0:47:22.840
<v Speaker 1>it is a remarkably useful tool, one that will power

0:47:22.960 --> 0:47:27.600
<v Speaker 1>and design and improve elements in our lives if it's

0:47:27.719 --> 0:47:31.040
<v Speaker 1>under the correct stewardship. But it does require a bit

0:47:31.040 --> 0:47:34.920
<v Speaker 1>more hands on work. We can't just leave it to

0:47:34.960 --> 0:47:40.320
<v Speaker 1>the machines just yet. Well, that wraps up this look

0:47:40.520 --> 0:47:43.400
<v Speaker 1>at the concept of machine learning and some of the

0:47:43.920 --> 0:47:48.040
<v Speaker 1>thought that underlies it. This really is a very high

0:47:48.120 --> 0:47:52.440
<v Speaker 1>level treatment of machine learning. There are plenty of resources

0:47:52.480 --> 0:47:54.719
<v Speaker 1>online if you want to dive in and learn more.

0:47:55.080 --> 0:47:58.040
<v Speaker 1>A lot of them get very heavy into the math.

0:47:58.280 --> 0:48:00.719
<v Speaker 1>So if that's not your bag, uh, it might be

0:48:00.719 --> 0:48:03.200
<v Speaker 1>a little challenging to navigate. It certainly is for me.

0:48:03.840 --> 0:48:07.160
<v Speaker 1>I love learning about the stuff, but um, a lot

0:48:07.200 --> 0:48:10.480
<v Speaker 1>of it requires me to look up a term, then

0:48:10.560 --> 0:48:13.560
<v Speaker 1>look up a term that explains that term, and so on,

0:48:13.760 --> 0:48:16.920
<v Speaker 1>and I go down a rabbit hole. But hopefully you

0:48:17.000 --> 0:48:19.879
<v Speaker 1>have a better appreciation for what machine learning is at

0:48:19.880 --> 0:48:22.479
<v Speaker 1>this point. If you have suggestions for topics I should

0:48:22.560 --> 0:48:26.560
<v Speaker 1>cover in future TechStuff episodes, let me know. The

0:48:26.600 --> 0:48:28.560
<v Speaker 1>best way to get in touch with me is through

0:48:28.600 --> 0:48:32.120
<v Speaker 1>Twitter, and the handle is TechStuff HSW,

0:48:32.880 --> 0:48:41.320
<v Speaker 1>and I'll talk to you again really soon. TechStuff

0:48:41.400 --> 0:48:44.600
<v Speaker 1>is an iHeartRadio production. For more podcasts from

0:48:44.600 --> 0:48:48.359
<v Speaker 1>iHeartRadio, visit the iHeartRadio app, Apple Podcasts,

0:48:48.480 --> 0:48:50.480
<v Speaker 1>or wherever you listen to your favorite shows.