WEBVTT - Machine Learning and Catastrophic Forgetting

0:00:04.440 --> 0:00:12.280
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio. Hey there,

0:00:12.280 --> 0:00:15.960
<v Speaker 1>and welcome to TechStuff. I'm your host, Jonathan Strickland.

0:00:16.000 --> 0:00:19.160
<v Speaker 1>I'm an executive producer with iHeartRadio and how the tech

0:00:19.200 --> 0:00:22.479
<v Speaker 1>are you? So, over this past weekend, I was listening

0:00:22.520 --> 0:00:26.239
<v Speaker 1>to the podcast The Skeptics' Guide to the Universe, which

0:00:26.280 --> 0:00:28.520
<v Speaker 1>I have no connection to. I just listened to it,

0:00:29.240 --> 0:00:32.800
<v Speaker 1>and it included a section on AI that referenced something

0:00:33.000 --> 0:00:37.040
<v Speaker 1>I don't think I had heard of before, which is

0:00:37.600 --> 0:00:41.680
<v Speaker 1>really talking more about my oversight than anything else. Maybe

0:00:41.720 --> 0:00:44.120
<v Speaker 1>I did hear about it, but then I forgot about it,

0:00:44.320 --> 0:00:48.040
<v Speaker 1>you know, catastrophically. So the thing they talked about was

0:00:48.200 --> 0:00:54.040
<v Speaker 1>catastrophic forgetting in artificial intelligence, specifically in machine learning systems

0:00:54.040 --> 0:00:57.920
<v Speaker 1>built on artificial neural networks. Now, before we talk about

0:00:58.080 --> 0:01:02.080
<v Speaker 1>catastrophic forgetting, which as I mentioned, is related to neural

0:01:02.120 --> 0:01:04.760
<v Speaker 1>networks and machine learning, we really need to do a

0:01:04.840 --> 0:01:07.120
<v Speaker 1>quick reminder, not a quick reminder. We need to do

0:01:07.160 --> 0:01:10.280
<v Speaker 1>a full reminder on how all this works. And that's

0:01:10.319 --> 0:01:13.080
<v Speaker 1>going to require us to do a whole lot of remembering.

0:01:13.360 --> 0:01:17.000
<v Speaker 1>Not a catastrophic amount, but a lot. So the history

0:01:17.040 --> 0:01:20.680
<v Speaker 1>of artificial intelligence as a discipline is one of intense

0:01:21.600 --> 0:01:26.000
<v Speaker 1>and important debates in fields like computer science. Now, I

0:01:26.040 --> 0:01:29.000
<v Speaker 1>have often talked about how AI can be seen as

0:01:29.160 --> 0:01:33.440
<v Speaker 1>the convergence of several other disciplines into its own field,

0:01:34.000 --> 0:01:37.720
<v Speaker 1>and there's more than one way to approach the challenge

0:01:37.720 --> 0:01:41.080
<v Speaker 1>of artificial intelligence. And in the history of AI, we

0:01:41.120 --> 0:01:45.680
<v Speaker 1>actually saw that play out, and some would argue the

0:01:45.680 --> 0:01:48.680
<v Speaker 1>way it played out means that we're actually just now

0:01:48.720 --> 0:01:53.000
<v Speaker 1>playing catch up. So different schools of thought pushed these

0:01:53.040 --> 0:01:58.760
<v Speaker 1>different approaches forward as this should be the prevailing methodology

0:01:58.800 --> 0:02:03.160
<v Speaker 1>we use to develop artificial intelligence. This is important because

0:02:03.200 --> 0:02:05.880
<v Speaker 1>the development of AI does not exist in a vacuum

0:02:06.440 --> 0:02:11.519
<v Speaker 1>right? It exists in our real world. Research requires funding,

0:02:11.720 --> 0:02:17.240
<v Speaker 1>and when you've got different sides arguing that their approach

0:02:17.360 --> 0:02:22.359
<v Speaker 1>to artificial intelligence is superior and that the alternatives are

0:02:22.400 --> 0:02:25.200
<v Speaker 1>not just inferior, but potentially limited to the point of

0:02:25.200 --> 0:02:29.080
<v Speaker 1>being useless, well you've got a metaphorical wrestling match going on.

0:02:29.720 --> 0:02:33.120
<v Speaker 1>The winner takes home the big prize of getting funding

0:02:33.200 --> 0:02:36.240
<v Speaker 1>for their research, and the loser has to scrabble for

0:02:36.320 --> 0:02:39.160
<v Speaker 1>whatever they can find, and often they will see their

0:02:39.160 --> 0:02:43.520
<v Speaker 1>work languish as a result. By the way, this is

0:02:43.560 --> 0:02:46.520
<v Speaker 1>why I often bring stuff up in this podcast that

0:02:46.639 --> 0:02:50.680
<v Speaker 1>is outside the realm of tech. I've received a lot

0:02:50.720 --> 0:02:53.079
<v Speaker 1>of messages over the years from folks saying that I

0:02:53.080 --> 0:02:56.600
<v Speaker 1>should leave out stuff like money or politics. Politics is

0:02:56.600 --> 0:03:00.000
<v Speaker 1>the big one. But to me, that doesn't make sense

0:03:00.120 --> 0:03:04.840
<v Speaker 1>cause tech exists within our world, a world that is

0:03:04.960 --> 0:03:09.080
<v Speaker 1>largely shaped by money and politics. I don't think we

0:03:09.280 --> 0:03:12.240
<v Speaker 1>can separate the tech from all of that because I

0:03:12.280 --> 0:03:16.400
<v Speaker 1>believe that if you were to somehow magically remove those influences,

0:03:16.440 --> 0:03:19.920
<v Speaker 1>if somehow money and politics never played a part in

0:03:19.960 --> 0:03:23.480
<v Speaker 1>the development of technology, our tech would look very different

0:03:23.560 --> 0:03:27.040
<v Speaker 1>from what it does today. Not necessarily better or worse,

0:03:27.560 --> 0:03:31.160
<v Speaker 1>but different. I mean, think about Thomas Edison. He was

0:03:31.400 --> 0:03:36.680
<v Speaker 1>very much driven by financial success, like his work in

0:03:36.800 --> 0:03:40.960
<v Speaker 1>tech was really mostly about making lots of money. And

0:03:41.000 --> 0:03:43.920
<v Speaker 1>without the making lots of money part, you don't really

0:03:43.960 --> 0:03:48.120
<v Speaker 1>have his drive to really bring together the brightest minds

0:03:48.120 --> 0:03:51.040
<v Speaker 1>of his generation and set them to work on creating

0:03:51.080 --> 0:03:54.960
<v Speaker 1>incredible technology. So I think we have to take all

0:03:55.000 --> 0:03:58.680
<v Speaker 1>these things into consideration. Anyway, that's a total rabbit trail,

0:03:58.720 --> 0:04:01.280
<v Speaker 1>and I apologize. Let's get back to our story. It

0:04:01.320 --> 0:04:04.920
<v Speaker 1>really begins around nineteen forty three when a pair of

0:04:04.960 --> 0:04:09.240
<v Speaker 1>researchers at the University of Chicago first proposed the concept

0:04:09.400 --> 0:04:13.520
<v Speaker 1>of the basic unit of a neural network. Those researchers

0:04:13.600 --> 0:04:17.760
<v Speaker 1>were Warren McCulloch and Walter Pitts. And in fact, they

0:04:17.800 --> 0:04:22.400
<v Speaker 1>demonstrated their idea by showing a simple electrical circuit, the

0:04:22.520 --> 0:04:25.719
<v Speaker 1>very basis for what would become a neural network. So

0:04:25.839 --> 0:04:29.400
<v Speaker 1>their proposal was a system that would use those simple

0:04:29.440 --> 0:04:33.960
<v Speaker 1>circuits to mimic the neurons that we have in our noggins.

0:04:34.320 --> 0:04:37.760
<v Speaker 1>So our brain consists of a bunch of these neurons,

0:04:38.320 --> 0:04:40.039
<v Speaker 1>and you might wonder how much is a bunch? Well,

0:04:40.920 --> 0:04:44.480
<v Speaker 1>we're talking about on average, around one hundred billion neurons

0:04:44.640 --> 0:04:48.240
<v Speaker 1>in the human brain. These neurons interconnect with each other.

0:04:48.360 --> 0:04:50.960
<v Speaker 1>It's not just a one to one, right, You've got

0:04:50.960 --> 0:04:55.159
<v Speaker 1>these interconnections between all these different neurons, not with every

0:04:55.160 --> 0:04:58.200
<v Speaker 1>neuron connected to every other neuron, but lots of interconnections.

0:04:58.200 --> 0:05:01.000
<v Speaker 1>And if we're looking at just the connection, you would

0:05:01.040 --> 0:05:04.159
<v Speaker 1>count more than one hundred trillion of them in the

0:05:04.160 --> 0:05:07.840
<v Speaker 1>typical human brain. And these connections in our brains make

0:05:07.960 --> 0:05:12.640
<v Speaker 1>up neural circuits. Those circuits light up, and that represents

0:05:12.720 --> 0:05:15.960
<v Speaker 1>us doing lots of different stuff from experiencing the world

0:05:16.000 --> 0:05:20.120
<v Speaker 1>around us so perception to thinking about a past memory.

0:05:20.320 --> 0:05:23.320
<v Speaker 1>You know that typically is like recreating the same pathway

0:05:23.360 --> 0:05:27.760
<v Speaker 1>over and over, and sometimes we don't recreate it exactly correctly,

0:05:28.240 --> 0:05:32.280
<v Speaker 1>and our memory ends up not being a perfect representation

0:05:32.360 --> 0:05:35.159
<v Speaker 1>of the thing that we actually experienced. This is why

0:05:35.440 --> 0:05:38.679
<v Speaker 1>things like eyewitness testimony are not always very reliable, because

0:05:38.680 --> 0:05:43.840
<v Speaker 1>our memories aren't infallible. They can trick us and we

0:05:43.880 --> 0:05:46.360
<v Speaker 1>can have all those pathways light up when we learn

0:05:46.400 --> 0:05:49.279
<v Speaker 1>a new skill and we start forming new pathways, and

0:05:49.320 --> 0:05:53.160
<v Speaker 1>then as we practice this skill, we start to reinforce

0:05:53.279 --> 0:05:56.919
<v Speaker 1>those pathways. So McCulloch and Pitts proposed that we create

0:05:57.040 --> 0:06:02.080
<v Speaker 1>machines capable of doing essentially a similar thing that our

0:06:02.120 --> 0:06:07.599
<v Speaker 1>brains do, so kind of a neuromimicry, not exactly one

0:06:07.640 --> 0:06:11.479
<v Speaker 1>to one the way our brains work, but inspired by

0:06:11.560 --> 0:06:15.560
<v Speaker 1>the way our brains work. Now, we would be limited

0:06:15.839 --> 0:06:18.919
<v Speaker 1>by what the technology of the day would be able

0:06:18.920 --> 0:06:22.000
<v Speaker 1>to do, because there's no feasible way we could create

0:06:22.040 --> 0:06:27.920
<v Speaker 1>a massive electrical system with one hundred billion individual simple

0:06:28.040 --> 0:06:31.880
<v Speaker 1>circuits with more than one hundred trillion connections between them.

0:06:32.160 --> 0:06:35.680
<v Speaker 1>That would be beyond our capability. It would be beyond

0:06:35.680 --> 0:06:39.599
<v Speaker 1>our resources. We could, however, create systems that used interconnected

0:06:39.640 --> 0:06:44.320
<v Speaker 1>circuits to process information and to teach such a system

0:06:44.440 --> 0:06:49.080
<v Speaker 1>to do specific tasks. Now, in nineteen forty nine, Donald

0:06:49.200 --> 0:06:53.880
<v Speaker 1>Hebb wrote a book about biological neurons, and he titled

0:06:53.880 --> 0:06:58.800
<v Speaker 1>this book the Organization of Behavior and suggested neural pathways

0:06:58.800 --> 0:07:02.320
<v Speaker 1>get stronger with additional use, kind of like you know,

0:07:02.320 --> 0:07:05.400
<v Speaker 1>if you exercise your muscles, you build strength over time.

0:07:05.480 --> 0:07:09.560
<v Speaker 1>Well, it's the same with neural pathways. And if

0:07:09.600 --> 0:07:12.560
<v Speaker 1>you don't use those muscles well, then your muscles get weaker. Well,

0:07:12.640 --> 0:07:16.080
<v Speaker 1>same with neural pathways. If you end up learning a skill,

0:07:16.760 --> 0:07:20.920
<v Speaker 1>but then over a great amount of time you no

0:07:20.960 --> 0:07:24.560
<v Speaker 1>longer practice that skill, you're gonna lose some of your ability.

0:07:25.160 --> 0:07:27.080
<v Speaker 1>Maybe not all of it, but at least some of it.

0:07:27.160 --> 0:07:29.400
<v Speaker 1>And you have to you know, like I think about

0:07:29.800 --> 0:07:34.200
<v Speaker 1>wrestlers who come back from retirement. Professional wrestlers, they

0:07:34.240 --> 0:07:36.480
<v Speaker 1>call it ring rust. You got to knock off the

0:07:36.560 --> 0:07:39.000
<v Speaker 1>ring rust and get back into step and kind of

0:07:39.320 --> 0:07:41.520
<v Speaker 1>get back into your groove. And it takes a little time.

0:07:41.680 --> 0:07:46.000
<v Speaker 1>Typically sometimes you know, you can get back into the

0:07:46.040 --> 0:07:48.760
<v Speaker 1>game faster than others, but you get the idea, and

0:07:49.000 --> 0:07:53.640
<v Speaker 1>also Hebb ended up proposing the concept of cells that

0:07:53.760 --> 0:07:59.360
<v Speaker 1>fire together wire together, meaning that neurons that fire at

0:07:59.360 --> 0:08:04.880
<v Speaker 1>the same time end up strengthening faster than other neurons do.

0:08:05.520 --> 0:08:09.640
<v Speaker 1>So when you get into that system, you can actually

0:08:09.680 --> 0:08:15.160
<v Speaker 1>reinforce those pathways. And for AI this would be really important.
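
NOTE
Here's a minimal sketch of that "fire together, wire together" idea in Python. It is an illustration only, not Hebb's own formulation: the function name hebbian_update, the learning rate of 0.1, and the activity values are all made-up assumptions.
```python
# Hebbian-style update (illustrative): a connection gets stronger in
# proportion to how active the neurons on both ends are at the same time.
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Return the new connection strength after one step of activity."""
    return weight + learning_rate * pre * post
weight_together = 0.0  # both neurons fire on every step
weight_apart = 0.0     # only the first neuron fires
for _ in range(5):
    weight_together = hebbian_update(weight_together, pre=1.0, post=1.0)
    weight_apart = hebbian_update(weight_apart, pre=1.0, post=0.0)
print(weight_together, weight_apart)
```
The connection whose two neurons keep firing together ends up stronger, while the other one never moves off zero.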

0:08:15.640 --> 0:08:18.200
<v Speaker 1>And it wasn't very long after Donald Hebb had published

0:08:18.240 --> 0:08:21.240
<v Speaker 1>this work that researchers in the field of AI tried

0:08:21.280 --> 0:08:26.440
<v Speaker 1>to apply that concept that philosophy to computer science. By

0:08:26.480 --> 0:08:29.800
<v Speaker 1>the mid nineteen fifties, the burgeoning computer science Lab and

0:08:29.920 --> 0:08:34.600
<v Speaker 1>AI Lab at MIT was building out neural networks based

0:08:34.679 --> 0:08:41.359
<v Speaker 1>on Hebb's ideas. Meanwhile, another computer scientist named Frank Rosenblatt

0:08:41.679 --> 0:08:46.280
<v Speaker 1>was looking at primitive neural systems and he started with flies,

0:08:46.440 --> 0:08:50.760
<v Speaker 1>like house flies. He wanted to explore systems that were

0:08:50.800 --> 0:08:54.920
<v Speaker 1>involved when a fly would quickly move away after detecting

0:08:54.920 --> 0:08:59.560
<v Speaker 1>a possible threat, like instantly, or at least appearing to

0:08:59.640 --> 0:09:03.839
<v Speaker 1>us to instantly react to something. So, for example, a

0:09:03.920 --> 0:09:06.320
<v Speaker 1>fly swatter coming at it. Like, you might be moving

0:09:06.360 --> 0:09:08.400
<v Speaker 1>the fly swatter very quickly, and yet the fly is

0:09:08.440 --> 0:09:14.240
<v Speaker 1>able to move super fast with no perceivable delay. Right,

0:09:14.360 --> 0:09:16.760
<v Speaker 1>we know that we have a delay from when we

0:09:16.800 --> 0:09:19.280
<v Speaker 1>perceive something to when we can act on something. Like

0:09:19.320 --> 0:09:21.599
<v Speaker 1>if you've ever been in a fender bender in a

0:09:21.640 --> 0:09:24.680
<v Speaker 1>car accident, you know that there's a delay between

0:09:24.679 --> 0:09:26.880
<v Speaker 1>when you see the issue and when you can hit the brake,

0:09:27.520 --> 0:09:30.960
<v Speaker 1>and that can lead to accidents. Well, with flies, that

0:09:31.000 --> 0:09:35.080
<v Speaker 1>delay seems to be super super small. So Rosenblatt was

0:09:35.120 --> 0:09:39.440
<v Speaker 1>really interested in exploring the neurological reasons for that. How

0:09:39.559 --> 0:09:41.920
<v Speaker 1>can that happen? It has to be really simple, right,

0:09:42.360 --> 0:09:44.680
<v Speaker 1>There has to be a simple and more or less

0:09:44.840 --> 0:09:50.400
<v Speaker 1>direct pathway that exists to allow a fly to react

0:09:50.520 --> 0:09:54.720
<v Speaker 1>to detecting a potential threat like that. And if you

0:09:54.840 --> 0:09:58.960
<v Speaker 1>could replicate that with electronics, you could have a very

0:09:59.000 --> 0:10:05.640
<v Speaker 1>simple but potentially powerful artificial intelligence system. So he came

0:10:05.720 --> 0:10:08.440
<v Speaker 1>up with this system that would be based off that

0:10:08.640 --> 0:10:11.080
<v Speaker 1>very simple direct pathway that you would see in something

0:10:11.120 --> 0:10:14.760
<v Speaker 1>like a fly, and he called it the perceptron. So

0:10:14.800 --> 0:10:17.080
<v Speaker 1>he went back to the simple circuit design that was

0:10:17.080 --> 0:10:20.080
<v Speaker 1>proposed by Pitts and McCulloch and he built out the

0:10:20.120 --> 0:10:24.560
<v Speaker 1>Mark I Perceptron, or Perceptron, I guess I should say.

0:10:24.720 --> 0:10:27.520
<v Speaker 1>So let's talk about a perceptron like not big p

0:10:27.760 --> 0:10:30.600
<v Speaker 1>but a little p perceptron. This is probably what we

0:10:30.600 --> 0:10:33.920
<v Speaker 1>would call a neural node in a modern neural network.

0:10:34.240 --> 0:10:38.080
<v Speaker 1>So the purpose of the perceptron was to accept inputs

0:10:38.559 --> 0:10:43.319
<v Speaker 1>and produce an output based on some threshold. Like if

0:10:43.320 --> 0:10:46.720
<v Speaker 1>the inputs meet a certain threshold, one output would be produced.

0:10:46.760 --> 0:10:48.680
<v Speaker 1>If they failed to do so, a different output would

0:10:48.679 --> 0:10:53.320
<v Speaker 1>be produced. The inputs, in turn would be assigned weights,

0:10:53.760 --> 0:10:56.840
<v Speaker 1>which would factor into the output the perceptron would generate.

0:10:57.240 --> 0:11:02.880
<v Speaker 1>So when we're talking weights, I mean weights as

0:11:02.920 --> 0:11:06.000
<v Speaker 1>in like how heavy something is, or in this case,

0:11:06.360 --> 0:11:10.440
<v Speaker 1>how much impact that thing has. So we're talking about

0:11:10.640 --> 0:11:14.280
<v Speaker 1>how much impact one input has relative to other inputs.
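
NOTE
A minimal sketch of a single little-p perceptron as described here: weighted inputs go in, and the output depends on whether their sum meets a threshold. The function name perceptron_output and all of the numbers are assumptions picked for illustration, not values from the Mark I Perceptron.
```python
# One threshold unit: multiply each input by its weight, add them up,
# and output 1 if the total meets the threshold, otherwise 0.
def perceptron_output(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0
inputs = [1, 1, 0]  # the third input is switched off
# Heavy weights on the active inputs push the sum past the threshold.
fires = perceptron_output(inputs, [0.6, 0.3, 0.8], threshold=0.5)
# Same inputs, lighter weights on the active ones: the sum falls short.
stays_quiet = perceptron_output(inputs, [0.2, 0.1, 0.8], threshold=0.5)
print(fires, stays_quiet)
```
Same inputs, different weights, different answer. That's the sense in which a weight measures how much impact one input has relative to the others.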

0:11:14.720 --> 0:11:17.480
<v Speaker 1>Let me use a really mundane human example to kind

0:11:17.480 --> 0:11:21.280
<v Speaker 1>of explain what this means. Let's say that your friend

0:11:21.360 --> 0:11:23.600
<v Speaker 1>asks you to go see a movie with them, and

0:11:23.640 --> 0:11:26.520
<v Speaker 1>it's going to be playing tonight at nine pm. But

0:11:26.600 --> 0:11:29.720
<v Speaker 1>you've had a really busy day and you might not

0:11:29.760 --> 0:11:32.920
<v Speaker 1>be able to even eat dinner until around nine pm.

0:11:33.280 --> 0:11:34.880
<v Speaker 1>And if you go see this movie, it might mean

0:11:34.920 --> 0:11:37.760
<v Speaker 1>having to skip dinner or to try and eat something

0:11:37.840 --> 0:11:41.600
<v Speaker 1>really fast and unhealthy before you go to the movie.

0:11:41.920 --> 0:11:45.120
<v Speaker 1>What's more, you got a really big day tomorrow and

0:11:45.400 --> 0:11:47.400
<v Speaker 1>you feel like you really need to be well rested

0:11:47.440 --> 0:11:51.240
<v Speaker 1>for it. However, at the same time, you haven't seen

0:11:51.280 --> 0:11:54.200
<v Speaker 1>this friend in ages, and you really like this person

0:11:54.360 --> 0:11:56.880
<v Speaker 1>and you've wanted to hang with them for a really

0:11:56.920 --> 0:11:59.880
<v Speaker 1>long time. Plus the movie they're suggesting is one you've

0:12:00.120 --> 0:12:03.000
<v Speaker 1>really wanted to see and you haven't gone yet. Well,

0:12:03.040 --> 0:12:07.160
<v Speaker 1>you would likely assign at least unconsciously weights to each

0:12:07.200 --> 0:12:09.800
<v Speaker 1>of these factors before you make your decision. You know,

0:12:09.840 --> 0:12:12.880
<v Speaker 1>if getting some dinner without having to rush and also

0:12:12.960 --> 0:12:15.880
<v Speaker 1>to be really well rested for tomorrow are really important

0:12:15.920 --> 0:12:20.240
<v Speaker 1>to you, you'll probably reluctantly decline the offer. But if

0:12:20.280 --> 0:12:22.440
<v Speaker 1>you really crave some time with your friend and you

0:12:22.520 --> 0:12:24.720
<v Speaker 1>really want to see that movie before all the spoilers

0:12:24.760 --> 0:12:28.000
<v Speaker 1>come out on Facebook or whatever, maybe you'll say yes.

0:12:28.559 --> 0:12:32.520
<v Speaker 1>Your decision depends upon the weights you assign those factors,

0:12:32.559 --> 0:12:36.160
<v Speaker 1>those inputs, even if you don't consciously think about it

0:12:36.200 --> 0:12:39.680
<v Speaker 1>that way. Well, the Perceptron system worked in a similar way. It

0:12:39.880 --> 0:12:44.000
<v Speaker 1>produced outputs by taking the inputs into consideration, including each

0:12:44.120 --> 0:12:48.280
<v Speaker 1>input's weight. Moreover, the more you submitted inputs, the more

0:12:48.320 --> 0:12:51.680
<v Speaker 1>the system would quote unquote learn how to weight each

0:12:51.720 --> 0:12:54.160
<v Speaker 1>of those inputs, all with the goal of bringing the

0:12:54.320 --> 0:12:59.079
<v Speaker 1>actual output that the processor, you know, generates closer

0:12:59.080 --> 0:13:03.720
<v Speaker 1>to the one you want it to generate. Okay, I
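
NOTE
A rough sketch of how a system like this can learn its weights from examples, using the classic perceptron learning rule. Assumptions: the toy task (logical AND), the bias term standing in for the threshold, the learning rate of 1, and the 20 passes over the data are all illustrative choices, not details of the Mark I hardware.
```python
# Perceptron learning rule (illustrative): after each example, nudge the
# weights in the direction that shrinks the gap between the actual output
# and the output we wanted.
def predict(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0
def train(examples, passes=20, learning_rate=1):
    weights, bias = [0, 0], 0
    for _ in range(passes):
        for inputs, wanted in examples:
            # error is 0 when the output is right, +1 or -1 when it is wrong
            error = wanted - predict(inputs, weights, bias)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias
# Toy task: output 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
predictions = [predict(inputs, weights, bias) for inputs, _ in examples]
print(predictions)
```
Nobody sets the weights by hand here. Wrong answers move them, right answers leave them alone, and on a simple task like this one the outputs settle on the wanted ones.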

0:13:03.800 --> 0:13:05.800
<v Speaker 1>just said a lot there. We've got some more to

0:13:05.840 --> 0:13:07.839
<v Speaker 1>get through. But before we get to that, let's take

0:13:07.840 --> 0:13:19.120
<v Speaker 1>a quick break. All right. Before the break, we were

0:13:19.120 --> 0:13:22.720
<v Speaker 1>talking about inputs and weights and the idea of getting

0:13:23.320 --> 0:13:26.680
<v Speaker 1>an output that is close to what you want the

0:13:26.720 --> 0:13:30.240
<v Speaker 1>system to do. That's not a guarantee, right, The system

0:13:30.280 --> 0:13:34.200
<v Speaker 1>could generate an output that's quote unquote wrong, you know,

0:13:34.280 --> 0:13:38.400
<v Speaker 1>depending on whatever task you've set this machine learning system

0:13:38.520 --> 0:13:42.040
<v Speaker 1>to learn. And that gets a bit conceptual. So let's

0:13:42.040 --> 0:13:44.640
<v Speaker 1>talk about a simple example that I love to use.

0:13:44.720 --> 0:13:46.520
<v Speaker 1>If you've been listening to TechStuff for a while,

0:13:46.600 --> 0:13:50.600
<v Speaker 1>you've heard this before, and that's talking about pictures of cats.

0:13:50.960 --> 0:13:54.000
<v Speaker 1>Because cats ruled the internet. I don't know if they

0:13:54.040 --> 0:13:56.400
<v Speaker 1>still do. They won't talk to me, so they just

0:13:56.440 --> 0:13:59.040
<v Speaker 1>knock things off shelves. Anyway, if your goal is to

0:13:59.160 --> 0:14:04.040
<v Speaker 1>teach a computer system to differentiate photos that include a

0:14:04.080 --> 0:14:08.240
<v Speaker 1>cat from photos that do not include a cat, well,

0:14:08.800 --> 0:14:10.840
<v Speaker 1>you would need to train the system, and part of

0:14:10.840 --> 0:14:15.640
<v Speaker 1>that includes feeding the system a whole bunch of photographs.

0:14:16.280 --> 0:14:19.200
<v Speaker 1>Some of those would have cats in them, some would not,

0:14:19.960 --> 0:14:24.120
<v Speaker 1>and chances are the system would misidentify photos, maybe a

0:14:24.160 --> 0:14:27.120
<v Speaker 1>significant number of those photos. You would probably have false

0:14:27.160 --> 0:14:29.800
<v Speaker 1>positives where the system thinks there's a cat there and

0:14:29.840 --> 0:14:32.960
<v Speaker 1>there's not, and false negatives where it doesn't think there's

0:14:33.000 --> 0:14:36.320
<v Speaker 1>a cat there but there is. At that point, your

0:14:36.320 --> 0:14:39.080
<v Speaker 1>goal is to try and teach the system to close

0:14:39.200 --> 0:14:43.280
<v Speaker 1>the gap between the actual results it produces and what

0:14:43.440 --> 0:14:46.600
<v Speaker 1>you want it to produce. In some systems that means

0:14:46.680 --> 0:14:49.480
<v Speaker 1>you might have to go in manually to adjust the

0:14:49.560 --> 0:14:52.480
<v Speaker 1>input weights to increase the weight of one input versus

0:14:52.560 --> 0:14:55.960
<v Speaker 1>another in an effort to cut down on mistakes. So

0:14:56.080 --> 0:15:01.440
<v Speaker 1>the perceptron was interesting, but it was very limited in complexity.

0:15:01.760 --> 0:15:04.160
<v Speaker 1>It was essentially a single layer where you'd feed a

0:15:04.160 --> 0:15:06.480
<v Speaker 1>bunch of inputs in and you would get an output,

0:15:06.840 --> 0:15:10.360
<v Speaker 1>So it was suitable for a subset of computational challenges,

0:15:10.360 --> 0:15:14.640
<v Speaker 1>but anything beyond that was well beyond its own reach

0:15:14.880 --> 0:15:18.040
<v Speaker 1>as a single layer network. By the late nineteen fifties,

0:15:18.080 --> 0:15:21.680
<v Speaker 1>other researchers had created new neural networks that were multi layered,

0:15:22.160 --> 0:15:26.920
<v Speaker 1>so a node or neuron didn't just accept inputs, it

0:15:26.920 --> 0:15:30.240
<v Speaker 1>would generate outputs that then would become inputs for another

0:15:30.760 --> 0:15:35.640
<v Speaker 1>layer down. So instead of just having one layer of nodes,

0:15:35.640 --> 0:15:38.040
<v Speaker 1>you would have multiple layers of nodes. Typically you would

0:15:38.040 --> 0:15:42.000
<v Speaker 1>have one at the quote unquote top of the network,

0:15:42.360 --> 0:15:43.920
<v Speaker 1>and you would have outputs at the bottom, and the

0:15:43.960 --> 0:15:47.240
<v Speaker 1>ones in between would be often referred to as hidden layers,

0:15:47.720 --> 0:15:51.040
<v Speaker 1>and who knows how many there would be. So anyway,

0:15:51.360 --> 0:15:54.160
<v Speaker 1>you would feed data to the system. The initial nodes

0:15:54.200 --> 0:15:58.200
<v Speaker 1>would generate information as outputs that would become inputs for

0:15:58.280 --> 0:16:02.400
<v Speaker 1>the next layer down, which would then continue the process

0:16:03.040 --> 0:16:05.000
<v Speaker 1>and so on and so forth until you get to

0:16:05.040 --> 0:16:08.080
<v Speaker 1>the output. So now you had artificial neural networks that

0:16:08.080 --> 0:16:12.480
<v Speaker 1>could tackle more complex challenges, and you would have multiple

0:16:12.520 --> 0:16:16.400
<v Speaker 1>steps in the process. That didn't necessarily mean they were automatically

0:16:16.480 --> 0:16:20.600
<v Speaker 1>better than the perceptron; it was just that they were able

0:16:20.640 --> 0:16:26.400
<v Speaker 1>to tackle more complicated tasks. What followed is something that
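
NOTE
A small sketch of the multi-layer idea: the outputs of one layer of nodes become the inputs of the next layer down. The weights and thresholds are hand-picked assumptions, chosen so the little network computes exclusive-or, a task a single threshold node cannot do on its own. The helper names layer and network are made up for this example.
```python
# One layer: every node thresholds its own weighted sum of the same inputs.
def layer(inputs, weight_rows, thresholds):
    return [
        1 if sum(x * w for x, w in zip(inputs, row)) >= t else 0
        for row, t in zip(weight_rows, thresholds)
    ]
def network(inputs):
    # Hidden layer: first node fires if either input is on,
    # second node fires only if both inputs are on.
    hidden = layer(inputs, [[1, 1], [1, 1]], [1, 2])
    # Output node: fires for "either" but is cancelled out by "both".
    return layer(hidden, [[1, -1]], [1])[0]
results = [network([a, b]) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(results)
```
The hidden layer never shows up in the final answer. It just hands intermediate results down to the output node, which is the whole point of stacking layers.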

0:16:26.480 --> 0:16:30.000
<v Speaker 1>will probably sound really familiar to you if you ever

0:16:30.160 --> 0:16:35.200
<v Speaker 1>follow technology or fads. The hype around machine learning and

0:16:35.320 --> 0:16:38.120
<v Speaker 1>artificial intelligence, and keep in mind this is like the

0:16:38.240 --> 0:16:43.200
<v Speaker 1>nineteen sixties, grew beyond the technology's actual capabilities at

0:16:43.240 --> 0:16:47.160
<v Speaker 1>that time. People started to project what this technology would

0:16:47.200 --> 0:16:49.560
<v Speaker 1>be able to do, and they did so thinking it

0:16:49.600 --> 0:16:52.840
<v Speaker 1>was going to be in a very short turnaround, like

0:16:52.880 --> 0:16:57.359
<v Speaker 1>we're right on the very precipice of a monstrous breakthrough

0:16:57.440 --> 0:17:00.000
<v Speaker 1>that will bring the science fiction future into the present.

0:17:01.160 --> 0:17:06.000
<v Speaker 1>So when it was realized that we weren't at that, like,

0:17:06.080 --> 0:17:09.920
<v Speaker 1>that's not how progress typically works. It's usually much more

0:17:10.440 --> 0:17:15.480
<v Speaker 1>gradual and humble than that, Well, then enthusiasm around AI

0:17:15.600 --> 0:17:18.120
<v Speaker 1>began to take a hit. And as I mentioned already,

0:17:18.160 --> 0:17:21.720
<v Speaker 1>a big part of AI research really comes down to funding,

0:17:22.280 --> 0:17:25.680
<v Speaker 1>and it gets really challenging to secure funding when public

0:17:25.800 --> 0:17:30.480
<v Speaker 1>opinion dims on a technology. We've seen this happen lots

0:17:30.520 --> 0:17:34.320
<v Speaker 1>of times, right, Like three D television was a fad

0:17:34.359 --> 0:17:37.040
<v Speaker 1>that was pushed. Now, granted, that one, you could argue

0:17:37.119 --> 0:17:40.440
<v Speaker 1>was more of an example of manufacturing companies that make

0:17:40.480 --> 0:17:44.080
<v Speaker 1>televisions trying to push a technology on consumers and the

0:17:44.119 --> 0:17:46.840
<v Speaker 1>consumers just weren't interested. You could argue that was the

0:17:46.880 --> 0:17:50.320
<v Speaker 1>case there. But virtual reality in the nineteen nineties definitely

0:17:50.359 --> 0:17:53.920
<v Speaker 1>followed this pathway. There was this excitement around virtual reality.

0:17:54.920 --> 0:17:58.800
<v Speaker 1>Then that excitement faded to almost nothing when people realized

0:17:58.800 --> 0:18:02.080
<v Speaker 1>that the actual state of the art of the technology

0:18:02.320 --> 0:18:05.800
<v Speaker 1>was far below where they expected it to be. And

0:18:05.880 --> 0:18:09.360
<v Speaker 1>suddenly people who are working in VR couldn't get funding

0:18:09.520 --> 0:18:11.720
<v Speaker 1>for their work, and they kind of had to scrounge

0:18:11.760 --> 0:18:15.680
<v Speaker 1>around in order to keep the development going at all,

0:18:16.320 --> 0:18:19.200
<v Speaker 1>and then eventually we would see that come back around again.

0:18:19.760 --> 0:18:23.320
<v Speaker 1>You could argue that NFTs recently went through this too,

0:18:23.400 --> 0:18:26.880
<v Speaker 1>where the hype went well beyond what NFTs could actually do.

0:18:27.920 --> 0:18:31.240
<v Speaker 1>I've been really down on NFTs in general. I do

0:18:31.280 --> 0:18:36.359
<v Speaker 1>think that there are potential legitimate uses for NFTs, but

0:18:36.480 --> 0:18:42.720
<v Speaker 1>I think the early examples were frivolous and almost solely

0:18:42.800 --> 0:18:48.640
<v Speaker 1>centered around speculation, as in like financial speculation, and as

0:18:48.720 --> 0:18:50.640
<v Speaker 1>a result, there was nothing for it to do other

0:18:50.720 --> 0:18:53.840
<v Speaker 1>than to create a bubble that would ultimately burst, which

0:18:53.880 --> 0:18:57.520
<v Speaker 1>is what happened. And maybe NFTs will recover from that

0:18:57.640 --> 0:19:01.760
<v Speaker 1>and become something that's more fundamentally useful in the Internet

0:19:01.840 --> 0:19:04.880
<v Speaker 1>in the future or in digital commerce in the future,

0:19:06.240 --> 0:19:10.200
<v Speaker 1>but it's going to have to get over the catastrophe

0:19:10.240 --> 0:19:12.960
<v Speaker 1>that happened when the rug was pulled out from underneath

0:19:13.080 --> 0:19:18.000
<v Speaker 1>NFTs, and that was all, you know, predictable and preventable.

0:19:18.760 --> 0:19:23.080
<v Speaker 1>But like I've said before, like I've lifted the joke

0:19:23.119 --> 0:19:25.359
<v Speaker 1>from Peter Cook, we've learned from our mistakes. We can

0:19:25.440 --> 0:19:29.760
<v Speaker 1>repeat them almost exactly. Anyway, This same sort of hype

0:19:29.960 --> 0:19:35.000
<v Speaker 1>cycle activity happened with neural networks and machine learning in

0:19:35.040 --> 0:19:40.800
<v Speaker 1>the nineteen sixties. Then enter Marvin Minsky and Seymour Papert

0:19:40.840 --> 0:19:43.880
<v Speaker 1>of MIT's AI lab. They were leading that lab at

0:19:43.880 --> 0:19:46.840
<v Speaker 1>the time. In nineteen sixty nine, they co authored a

0:19:46.960 --> 0:19:53.280
<v Speaker 1>book titled Perceptrons. They were actually critical of that artificial

0:19:53.359 --> 0:19:56.480
<v Speaker 1>neural network approach to AI and machine learning. They were

0:19:56.520 --> 0:20:00.040
<v Speaker 1>concerned that the limitations of the technology meant that you

0:19:59.840 --> 0:20:03.560
<v Speaker 1>would need an unrealistically huge system of artificial neurons,

0:20:03.920 --> 0:20:08.159
<v Speaker 1>perhaps then using that system to compute an infinite number

0:20:08.200 --> 0:20:12.679
<v Speaker 1>of variations of the same process or task if you

0:20:12.760 --> 0:20:15.200
<v Speaker 1>wanted to train the weights so that they were of

0:20:15.320 --> 0:20:19.719
<v Speaker 1>the optimal value. So, in other words, they thought, it's

0:20:19.920 --> 0:20:23.640
<v Speaker 1>too impractical, and it's going to take too much compute time,

0:20:23.680 --> 0:20:26.040
<v Speaker 1>and you're never going to achieve the result you want.

0:20:26.080 --> 0:20:30.200
<v Speaker 1>You're never going to get to that most perfect system.

0:20:30.880 --> 0:20:35.840
<v Speaker 1>And they believed it just had fundamental inescapable flaws. They

0:20:35.880 --> 0:20:40.760
<v Speaker 1>had different systems in mind. Now, Minsky and Papert tried

0:20:40.800 --> 0:20:43.159
<v Speaker 1>to push their systems forward, and I could do a

0:20:43.200 --> 0:20:46.800
<v Speaker 1>full episode about them too, and their ideas were not bad.

0:20:47.359 --> 0:20:50.119
<v Speaker 1>They were different. It was a different approach. But this

0:20:50.240 --> 0:20:53.040
<v Speaker 1>also meant that researchers who had been pushing the development

0:20:53.040 --> 0:20:56.800
<v Speaker 1>of artificial neural networks felt forced to move on

0:20:57.000 --> 0:21:02.280
<v Speaker 1>to different projects because financial support for anything connected to

0:21:02.320 --> 0:21:07.320
<v Speaker 1>the concept of neural networks effectively disappeared, right like funding

0:21:07.480 --> 0:21:11.119
<v Speaker 1>just dropped for that. Because here you had these experts

0:21:11.160 --> 0:21:15.800
<v Speaker 1>in computer science saying, yeah, this approach, while interesting, has

0:21:15.840 --> 0:21:19.320
<v Speaker 1>already hit an insurmountable obstacle and it's not going to

0:21:19.359 --> 0:21:21.200
<v Speaker 1>go any further. It's gone as far as it can go.

0:21:21.840 --> 0:21:25.480
<v Speaker 1>And so a lot of computer scientists blamed Minsky and

0:21:25.520 --> 0:21:31.320
<v Speaker 1>Papert for essentially demolishing funding for neural networks for more

0:21:31.359 --> 0:21:34.720
<v Speaker 1>than a decade, and in fact, this would become an

0:21:34.760 --> 0:21:38.919
<v Speaker 1>era that retrospectively, computer scientists would reference as the AI

0:21:39.240 --> 0:21:42.960
<v Speaker 1>Winter. Got all Game of Thrones up in here. Now,

0:21:43.000 --> 0:21:46.240
<v Speaker 1>In nineteen eighty two, there was a hint of spring

0:21:47.040 --> 0:21:51.119
<v Speaker 1>thawing out that AI Winter. Researchers in Japan were starting

0:21:51.160 --> 0:21:56.119
<v Speaker 1>to resurrect work on neural network projects. And meanwhile, a

0:21:56.160 --> 0:21:59.720
<v Speaker 1>scientist named John Hopfield submitted a research paper to the

0:21:59.800 --> 0:22:03.760
<v Speaker 1>National Academy of Sciences that brought neural networks back into

0:22:03.840 --> 0:22:07.320
<v Speaker 1>discussion here in the United States, and because Japan was

0:22:07.440 --> 0:22:13.200
<v Speaker 1>actively investing in developing that technology, institutions in the United

0:22:13.240 --> 0:22:15.600
<v Speaker 1>States began to open up the purse strings a bit

0:22:16.000 --> 0:22:18.880
<v Speaker 1>because there was a concern that if there were something

0:22:19.359 --> 0:22:22.720
<v Speaker 1>to this artificial neural network concept, if in fact those

0:22:22.840 --> 0:22:28.040
<v Speaker 1>obstacles weren't insurmountable, as Minsky and Papert had suggested, the

0:22:28.200 --> 0:22:32.800
<v Speaker 1>US could potentially fall behind another country because it would

0:22:32.800 --> 0:22:36.159
<v Speaker 1>fail to fund its development. So, in a desire not

0:22:36.680 --> 0:22:38.760
<v Speaker 1>to have Japan take the ball and run with it,

0:22:39.080 --> 0:22:42.520
<v Speaker 1>the United States began to invest again in artificial neural

0:22:42.600 --> 0:22:46.600
<v Speaker 1>network research and development. In the mid nineteen eighties, computer

0:22:46.680 --> 0:22:53.040
<v Speaker 1>scientists essentially rediscovered the usefulness of a process called backpropagation.

0:22:53.640 --> 0:22:56.720
<v Speaker 1>And I've already talked about nodes and weights and stuff,

0:22:56.760 --> 0:22:58.479
<v Speaker 1>but this is going to require a little bit more

0:22:58.480 --> 0:23:02.000
<v Speaker 1>explanation to understand what backpropagation is all about. So

0:23:02.119 --> 0:23:05.560
<v Speaker 1>let's kind of try to visualize a neural network. So

0:23:05.600 --> 0:23:07.960
<v Speaker 1>you've got your input nodes. Just think of a bunch

0:23:07.960 --> 0:23:10.960
<v Speaker 1>of circles. If you were drawing it from top to bottom,

0:23:10.960 --> 0:23:13.760
<v Speaker 1>this would be your top layer. This is like the

0:23:13.800 --> 0:23:17.560
<v Speaker 1>funnels where you're going to feed data into the system.

0:23:18.040 --> 0:23:20.200
<v Speaker 1>Now you've got a whole bunch of these at the top,

0:23:20.280 --> 0:23:22.720
<v Speaker 1>and they can accept the data that you're feeding in.

0:23:23.200 --> 0:23:28.240
<v Speaker 1>They process that data and then, based upon, you know, some operation,

0:23:28.840 --> 0:23:32.920
<v Speaker 1>they will then send an output to a node one

0:23:33.000 --> 0:23:36.280
<v Speaker 1>layer down. So there's lots of other nodes in the

0:23:36.359 --> 0:23:38.760
<v Speaker 1>layers below, or maybe not as many as you have

0:23:38.840 --> 0:23:43.560
<v Speaker 1>initial layers. You might actually have fewer, and the layers

0:23:43.600 --> 0:23:47.119
<v Speaker 1>above will send, you know, data to a specific

0:23:47.200 --> 0:23:51.040
<v Speaker 1>node depending upon what the outcome is. Whatever the output is,

0:23:52.280 --> 0:23:56.800
<v Speaker 1>So these nodes accept the input. These inputs have a

0:23:56.840 --> 0:23:59.920
<v Speaker 1>bias and a weight to them, and this is one

0:24:00.040 --> 0:24:03.040
<v Speaker 1>of the hidden layers. They will then create an output and

0:24:03.080 --> 0:24:07.840
<v Speaker 1>send that on to nodes another layer down. So this

0:24:07.960 --> 0:24:10.640
<v Speaker 1>goes on until you get to your output layer where

0:24:10.680 --> 0:24:14.280
<v Speaker 1>you get your final result, and then you can determine

0:24:14.280 --> 0:24:16.840
<v Speaker 1>whether or not the final result matches what you were

0:24:16.880 --> 0:24:20.480
<v Speaker 1>hoping for. So did your system properly identify which photos

0:24:20.520 --> 0:24:23.280
<v Speaker 1>do and don't have cats in them? Now, as I

0:24:23.280 --> 0:24:26.720
<v Speaker 1>mentioned earlier, you typically get results that aren't perfect, but

0:24:26.800 --> 0:24:30.520
<v Speaker 1>we want to train the system to improve with every test.
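
NOTE
A minimal sketch of the top-to-bottom flow just described, assuming sigmoid nodes.
The weights, biases, and the names sigmoid, layer, forward, and network are all
made up for illustration; nothing here comes from a real cat detector.
```python
import math
def sigmoid(x):
    # Squash any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))
def layer(inputs, weights, biases):
    # Each node sums its weighted inputs, adds its bias, then applies sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]
def forward(inputs, network):
    # Pass the data down through every layer, from the input side to the output.
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations
# Hypothetical network: 3 inputs, a hidden layer of 2 nodes, then 1 output node.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
cat_score = forward([1.0, 0.0, 0.5], network)[0]
print("cat" if cat_score > 0.5 else "no cat", round(cat_score, 3))
```
Comparing cat_score against the label you hoped for is the "did it match" check;
training is about shrinking that gap.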

0:24:31.200 --> 0:24:35.320
<v Speaker 1>Backpropagation is one way to do this. So with

0:24:35.359 --> 0:24:38.600
<v Speaker 1>backpropagation, you actually start with the final output. You've

0:24:38.640 --> 0:24:41.840
<v Speaker 1>already done a test run, right, and you've got your output,

0:24:42.560 --> 0:24:48.080
<v Speaker 1>and maybe your test has five possible final outcomes, but

0:24:48.200 --> 0:24:51.440
<v Speaker 1>only one of those is the outcome you actually want. Okay,

0:24:51.480 --> 0:24:54.680
<v Speaker 1>we'll say it's outcome number one. We're saying I want

0:24:54.680 --> 0:24:58.439
<v Speaker 1>this system to more often than not come to the

0:24:58.480 --> 0:25:00.680
<v Speaker 1>conclusion that it's outcome number one. But you run

0:25:00.720 --> 0:25:07.480
<v Speaker 1>your test. It's got one thousand little tasks in it,

0:25:07.520 --> 0:25:10.800
<v Speaker 1>and you run your test, you find out that it

0:25:10.880 --> 0:25:13.720
<v Speaker 1>only arrives at outcome number one five percent of the time,

0:25:13.920 --> 0:25:16.399
<v Speaker 1>which is actually worse than random chance. Right, it should

0:25:16.400 --> 0:25:18.640
<v Speaker 1>be twenty percent for random chance, But it's only getting

0:25:18.680 --> 0:25:21.679
<v Speaker 1>there five percent of the time. Something is going really

0:25:21.720 --> 0:25:25.480
<v Speaker 1>wrong with your system for it to mistakenly go to

0:25:25.560 --> 0:25:29.080
<v Speaker 1>one of the other options and very rarely go to

0:25:29.119 --> 0:25:32.280
<v Speaker 1>the correct one. So let's say you also noticed the

0:25:32.280 --> 0:25:35.359
<v Speaker 1>outcome number three. It goes to that one forty percent

0:25:35.400 --> 0:25:37.680
<v Speaker 1>of the time. So it's making this mistake forty percent

0:25:37.680 --> 0:25:39.760
<v Speaker 1>of the time and only getting it right five percent

0:25:39.760 --> 0:25:42.399
<v Speaker 1>of the time. So things are seriously out of whack.

0:25:42.440 --> 0:25:46.560
<v Speaker 1>You need to find which connections which would involve the

0:25:46.600 --> 0:25:50.480
<v Speaker 1>biases and the weights that are within your system that

0:25:50.520 --> 0:25:54.640
<v Speaker 1>are leading it to mistakenly arrive at the wrong outcome

0:25:54.840 --> 0:25:59.359
<v Speaker 1>so frequently. You want to reduce those factors, and simultaneously

0:25:59.400 --> 0:26:02.440
<v Speaker 1>you need to boost the ones that lead the system

0:26:02.560 --> 0:26:05.240
<v Speaker 1>to arrive at outcome number one, because that's the answer

0:26:05.320 --> 0:26:08.840
<v Speaker 1>you actually want the system to get to. All Right,

0:26:09.960 --> 0:26:12.040
<v Speaker 1>I've been droning on for a bit. Let's take another

0:26:12.119 --> 0:26:14.640
<v Speaker 1>quick break. When we come back, I'll finish up explaining

0:26:14.680 --> 0:26:27.159
<v Speaker 1>this and then we'll move on to catastrophic forgetting. Okay,

0:26:27.600 --> 0:26:30.639
<v Speaker 1>so we were talking about how you are looking at

0:26:30.640 --> 0:26:34.760
<v Speaker 1>a system that is coming to the wrong conclusion ninety

0:26:34.800 --> 0:26:37.880
<v Speaker 1>five percent of the time. It is a broken system.

0:26:38.200 --> 0:26:42.399
<v Speaker 1>You have to then figure out what factors are causing

0:26:42.440 --> 0:26:46.240
<v Speaker 1>this to happen, and they are numerous, right, They extend

0:26:46.320 --> 0:26:48.800
<v Speaker 1>all the way up to the very top of your

0:26:48.840 --> 0:26:51.480
<v Speaker 1>neural network, the other end where the input comes in.

0:26:51.800 --> 0:26:54.439
<v Speaker 1>But you can't just change everything all at once. You've

0:26:54.440 --> 0:26:57.800
<v Speaker 1>got to figure this out systematically, and that's what backpropagation

0:26:57.920 --> 0:27:02.200
<v Speaker 1>is really all about. It detects which links one layer

0:27:02.280 --> 0:27:06.520
<v Speaker 1>up from the output have the greatest impact on the outcome. Right,

0:27:07.200 --> 0:27:10.040
<v Speaker 1>changing everything would be tedious, It would be impractical. You

0:27:10.119 --> 0:27:13.440
<v Speaker 1>might even make things worse. Some of these neural networks

0:27:13.480 --> 0:27:18.600
<v Speaker 1>are confoundingly complicated, so it's not really a feasible solution.

0:27:19.000 --> 0:27:21.840
<v Speaker 1>So instead you look at the connections that are having

0:27:21.840 --> 0:27:25.080
<v Speaker 1>the biggest impact on your outcome. So you want things

0:27:25.119 --> 0:27:27.439
<v Speaker 1>where if you make a small change in either the

0:27:27.480 --> 0:27:30.399
<v Speaker 1>bias or the weight, or maybe both, you'll see a

0:27:30.520 --> 0:27:34.360
<v Speaker 1>larger end effect on the outcome. All the connections are

0:27:34.480 --> 0:27:38.359
<v Speaker 1>arguably important, but some are more important than others. Backpropagation

0:27:38.480 --> 0:27:41.240
<v Speaker 1>works backwards from the result toward the other end of

0:27:41.240 --> 0:27:44.040
<v Speaker 1>the network to tweak those connections. It boosts ones that

0:27:44.160 --> 0:27:48.120
<v Speaker 1>lead to the correct or desired response, and it reduces

0:27:48.160 --> 0:27:52.640
<v Speaker 1>the values of those that lead to incorrect or undesired responses.
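
NOTE
A hedged sketch of the backward pass described above, on a toy network with two
inputs, two hidden nodes, and one output. The "impact" of each connection is
taken to be the gradient of the squared error, which is the usual reading of
backpropagation; the starting weights, learning rate, and the logical-OR data
set are assumptions chosen only to keep the example small.
```python
import math
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))
def forward(x, p):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(p["w1"], p["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"])
    return hidden, out
def train_step(x, target, p, lr=0.5):
    hidden, out = forward(x, p)
    # Start at the output: how far off was it, and how sensitive is it?
    d_out = (out - target) * out * (1.0 - out)
    # Step one layer back: each hidden node's share of the error.
    d_hidden = [d_out * w * h * (1.0 - h) for w, h in zip(p["w2"], hidden)]
    # Nudge every weight and bias against its share of the error.
    p["w2"] = [w - lr * d_out * h for w, h in zip(p["w2"], hidden)]
    p["b2"] -= lr * d_out
    p["w1"] = [[w - lr * dh * xi for w, xi in zip(ws, x)]
               for ws, dh in zip(p["w1"], d_hidden)]
    p["b1"] = [b - lr * dh for b, dh in zip(p["b1"], d_hidden)]
def loss(data, p):
    return sum((forward(x, p)[1] - t) ** 2 for x, t in data) / len(data)
params = {"w1": [[0.1, -0.2], [0.3, 0.4]], "b1": [0.0, 0.0],
          "w2": [0.2, -0.1], "b2": 0.0}
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR
loss_before = loss(data, params)
for _ in range(2000):
    for x, t in data:
        train_step(x, t, params)
loss_after = loss(data, params)
print(round(loss_before, 3), round(loss_after, 3))
```
Connections with a large gradient get a large nudge and connections with a
negligible gradient barely move, which is the "some are more important than
others" idea in numeric form.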

0:27:53.040 --> 0:27:54.840
<v Speaker 1>If we were to think of this like the classic

0:27:54.920 --> 0:27:59.280
<v Speaker 1>example in chaos theory, this could potentially involve us studying

0:27:59.320 --> 0:28:01.879
<v Speaker 1>a hurricane as it hits land and tracing its history

0:28:01.960 --> 0:28:05.240
<v Speaker 1>back as it moved through the ocean, and we would

0:28:05.240 --> 0:28:08.560
<v Speaker 1>eventually arrive at the point where it was a tropical storm,

0:28:08.640 --> 0:28:11.359
<v Speaker 1>and then we would go further back and see the

0:28:11.359 --> 0:28:14.240
<v Speaker 1>factors that led to the creation of that storm, and

0:28:14.320 --> 0:28:16.040
<v Speaker 1>maybe if we tracked it all the way back we

0:28:16.040 --> 0:28:19.520
<v Speaker 1>would even find that one of a billion factors that

0:28:19.600 --> 0:28:22.800
<v Speaker 1>made the storm was, in fact, a butterfly flapping

0:28:22.880 --> 0:28:24.399
<v Speaker 1>its wings on the other side of the world, and

0:28:24.400 --> 0:28:27.680
<v Speaker 1>that contributed to it. Maybe we find out that butterfly's

0:28:27.720 --> 0:28:31.520
<v Speaker 1>flap of its wings had an impact, but it was negligible,

0:28:31.560 --> 0:28:33.280
<v Speaker 1>and that if the butterfly hadn't flapped its wings, the

0:28:33.359 --> 0:28:36.640
<v Speaker 1>hurricane still would have happened. That would be an example of, well,

0:28:36.640 --> 0:28:40.600
<v Speaker 1>we don't bother adjusting the weight of the impact of

0:28:40.600 --> 0:28:43.640
<v Speaker 1>that butterfly flapping its wings because it doesn't matter for

0:28:43.680 --> 0:28:46.560
<v Speaker 1>the end result. But what if we were to discover

0:28:46.720 --> 0:28:49.720
<v Speaker 1>that that butterfly's flap of its wings is the only

0:28:49.840 --> 0:28:53.680
<v Speaker 1>reason the hurricane happened, or at least was the

0:28:53.720 --> 0:28:57.680
<v Speaker 1>primary reason, that all the other factors pale in comparison. Well,

0:28:57.680 --> 0:29:00.800
<v Speaker 1>then we'd want to make sure we boost the weight

0:29:00.920 --> 0:29:05.680
<v Speaker 1>of that input, because clearly that butterfly is fundamental for hurricanes.

0:29:07.480 --> 0:29:09.920
<v Speaker 1>I think hurricanes are really dangerous, and I would ask

0:29:10.000 --> 0:29:13.920
<v Speaker 1>butterflies to kind of chill, all right. I mean, I

0:29:13.920 --> 0:29:17.200
<v Speaker 1>don't want butterflies to go away, just you know, maybe

0:29:17.240 --> 0:29:21.280
<v Speaker 1>stop flapping so much. Anyway, the formula for backpropagation

0:29:21.360 --> 0:29:25.200
<v Speaker 1>gets into some calculus that is well beyond my knowledge

0:29:25.280 --> 0:29:28.360
<v Speaker 1>and skill. So rather than attempt to stumble my way

0:29:28.440 --> 0:29:32.560
<v Speaker 1>through an explanation that I don't actually understand, I think

0:29:32.560 --> 0:29:34.760
<v Speaker 1>it's best to leave the concept at the high level

0:29:34.920 --> 0:29:37.360
<v Speaker 1>that I have described right now. So just know that

0:29:37.400 --> 0:29:39.920
<v Speaker 1>it gets way more granular than what I've talked about.

0:29:39.960 --> 0:29:44.080
<v Speaker 1>But essentially, you're looking at those factors that led to

0:29:44.280 --> 0:29:48.200
<v Speaker 1>the ultimate decision and saying which ones of these had

0:29:48.240 --> 0:29:51.680
<v Speaker 1>the greatest impact, and how can I tweak them so

0:29:51.840 --> 0:29:54.680
<v Speaker 1>that I can shape the outcome to one I wanted.

0:29:54.760 --> 0:29:57.000
<v Speaker 1>If we were thinking about that example I gave about

0:29:57.280 --> 0:30:01.160
<v Speaker 1>whether or not you go to the movies, maybe in

0:30:02.400 --> 0:30:05.920
<v Speaker 1>present day you start thinking about past experiences where you

0:30:06.040 --> 0:30:08.080
<v Speaker 1>made a decision to go out when you had a

0:30:08.080 --> 0:30:11.800
<v Speaker 1>big day the following day, and how that impacted you,

0:30:11.840 --> 0:30:14.480
<v Speaker 1>perhaps negatively. Maybe you're like, man, I should have gotten

0:30:14.520 --> 0:30:17.160
<v Speaker 1>a promotion by now, and then you think, well, I

0:30:17.200 --> 0:30:20.320
<v Speaker 1>do go to the movies an awful lot. You might say,

0:30:20.640 --> 0:30:23.400
<v Speaker 1>I need to adjust some of the factors that affect

0:30:23.440 --> 0:30:27.880
<v Speaker 1>my decision making process and perhaps prioritize my career. Or

0:30:28.600 --> 0:30:33.040
<v Speaker 1>if you've decided that late stage capitalism is terrible, evil,

0:30:33.160 --> 0:30:35.920
<v Speaker 1>and that you're going to try and live a hedonistic

0:30:35.960 --> 0:30:40.320
<v Speaker 1>lifestyle of a wandering soul. Maybe you say, I'm going

0:30:40.400 --> 0:30:43.200
<v Speaker 1>to go and see my movie with my friend, and yeah,

0:30:43.480 --> 0:30:45.600
<v Speaker 1>that's just how it is, because that's the most important

0:30:45.600 --> 0:30:47.800
<v Speaker 1>thing to me. You only go around this crazy world once,

0:30:47.880 --> 0:30:50.360
<v Speaker 1>after all. I'm not telling you which way to go.

0:30:51.360 --> 0:30:55.320
<v Speaker 1>I'm still finding my own way. But yeah, backpropagation would

0:30:55.360 --> 0:30:57.640
<v Speaker 1>be how you would go back and say, all right, well,

0:30:57.680 --> 0:31:00.760
<v Speaker 1>because I don't like the outcome that happened, and I

0:31:00.840 --> 0:31:05.240
<v Speaker 1>need to change the way these factors weigh in on

0:31:05.360 --> 0:31:08.720
<v Speaker 1>the decision making process that goes through the whole system. Now,

0:31:08.760 --> 0:31:12.400
<v Speaker 1>the advancements in the science of neural networks proved that

0:31:12.400 --> 0:31:15.920
<v Speaker 1>the technology no longer operated under the constraints that concerned

0:31:16.040 --> 0:31:19.240
<v Speaker 1>Minsky and Papert in the late sixties. So once again

0:31:19.640 --> 0:31:23.840
<v Speaker 1>funding found its way to neural network research and development projects.

0:31:24.560 --> 0:31:29.160
<v Speaker 1>Now let's finally talk about forgetting and what makes it catastrophic.

0:31:29.960 --> 0:31:33.680
<v Speaker 1>So you could, in theory, develop an artificial neural network

0:31:34.000 --> 0:31:37.800
<v Speaker 1>and have a library of training data, and the only

0:31:37.840 --> 0:31:40.560
<v Speaker 1>thing you ever do with this network is you feed

0:31:40.640 --> 0:31:45.280
<v Speaker 1>that same set of training data to that same neural

0:31:45.320 --> 0:31:49.760
<v Speaker 1>network over and over in an effort to get performance

0:31:49.760 --> 0:31:53.040
<v Speaker 1>as close to perfect as you possibly can. Just you know,

0:31:53.120 --> 0:31:54.920
<v Speaker 1>it's kind of like if you have a car and

0:31:54.960 --> 0:31:58.719
<v Speaker 1>you're constantly tweaking it so it will perform better, and

0:31:58.800 --> 0:32:01.640
<v Speaker 1>maybe you change one thing and it boosts performance in

0:32:01.680 --> 0:32:05.440
<v Speaker 1>one area, but it kind of negatively impacts performance in

0:32:05.480 --> 0:32:08.360
<v Speaker 1>another area, so then you got to tweak something else.

0:32:08.720 --> 0:32:10.800
<v Speaker 1>You could be doing that with an artificial neural network

0:32:10.800 --> 0:32:13.200
<v Speaker 1>forever and just be using the same set of training data,

0:32:13.640 --> 0:32:15.600
<v Speaker 1>and all you're trying to do is make a system

0:32:15.960 --> 0:32:18.720
<v Speaker 1>that could handle that training data better than any other

0:32:18.760 --> 0:32:21.240
<v Speaker 1>system in the world. And that would be interesting, but

0:32:21.320 --> 0:32:24.880
<v Speaker 1>it would be useless from a practical standpoint. You could say, like, hey,

0:32:24.880 --> 0:32:26.680
<v Speaker 1>you want to see my machine that can sort through

0:32:26.880 --> 0:32:30.600
<v Speaker 1>only this collection of photographs and pick out the ones

0:32:30.680 --> 0:32:32.520
<v Speaker 1>that have cats in them and the ones that don't

0:32:32.840 --> 0:32:37.080
<v Speaker 1>pretty pretty darn effectively, but not perfectly. It's not really

0:32:37.120 --> 0:32:40.880
<v Speaker 1>an interesting value proposition, right, So more likely you are

0:32:40.920 --> 0:32:43.720
<v Speaker 1>eventually going to start feeding lots of different kinds of

0:32:43.800 --> 0:32:48.240
<v Speaker 1>data to this neural network, And yeah, you train the

0:32:48.320 --> 0:32:51.200
<v Speaker 1>network on certain data sets, but your goal is to

0:32:51.240 --> 0:32:53.960
<v Speaker 1>feed new sets of data, data the system has never

0:32:54.040 --> 0:32:57.400
<v Speaker 1>encountered before, and rely on the system's ability to process

0:32:57.480 --> 0:33:00.760
<v Speaker 1>this information correctly to get the result you want. And

0:33:00.880 --> 0:33:03.720
<v Speaker 1>we might even be talking about stuff that human beings

0:33:03.720 --> 0:33:07.880
<v Speaker 1>can't easily do. Right. But see, the training data is

0:33:07.880 --> 0:33:09.800
<v Speaker 1>going to mean that the network will start to create

0:33:09.800 --> 0:33:14.640
<v Speaker 1>and reinforce certain pathways, and those pathways will over time

0:33:14.680 --> 0:33:16.760
<v Speaker 1>get stronger and stronger, just as we said at the

0:33:16.760 --> 0:33:19.960
<v Speaker 1>beginning of this episode. But new data is going to

0:33:20.000 --> 0:33:24.720
<v Speaker 1>necessitate new pathways. Sometimes, when the system begins to form

0:33:24.800 --> 0:33:29.520
<v Speaker 1>these new pathways, it forgets the old pathways. So it's

0:33:29.600 --> 0:33:32.600
<v Speaker 1>possible for a neural network to actually get worse at

0:33:32.600 --> 0:33:36.440
<v Speaker 1>the task it had previously been trained to do with

0:33:36.600 --> 0:33:40.440
<v Speaker 1>the actual training material. In fact, in a true catastrophe,

0:33:40.480 --> 0:33:44.760
<v Speaker 1>the system might forget the objective and doesn't recognize what

0:33:44.800 --> 0:33:47.720
<v Speaker 1>the desired outcome is meant to be, so the results

0:33:47.760 --> 0:33:50.760
<v Speaker 1>can appear random and meaningless. It's as if the system

0:33:50.800 --> 0:33:54.720
<v Speaker 1>has developed some form of amnesia. So this is prevalent,

0:33:55.320 --> 0:33:59.880
<v Speaker 1>most prevalent anyway, in systems that rely on unguided learning.
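
NOTE
A toy illustration of the forgetting described above, assuming a single
artificial neuron with two weights and two hypothetical tasks of two examples
each. The old task only needs the first weight to stay positive, but training
on the new task alone keeps pushing that weight down. All names, data points,
and epoch counts are invented for the sketch.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def predict(x, w):
    # One artificial neuron with two weighted inputs.
    return sigmoid(w[0] * x[0] + w[1] * x[1])
def train(data, w, epochs, lr=0.5):
    # Plain gradient descent; returns new weights and leaves the old list untouched.
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, t in data:
            err = predict(x, w) - t
            for i in range(2):
                grad[i] += err * x[i] / len(data)
        w = [w[i] - lr * grad[i] for i in range(2)]
    return w
def accuracy(data, w):
    return sum((predict(x, w) > 0.5) == (t == 1) for x, t in data) / len(data)
task_a = [((1.0, 0.0), 1), ((-1.0, 0.0), 0)]   # hypothetical old task
task_b = [((-1.0, 1.0), 1), ((1.0, -1.0), 0)]  # hypothetical new task
w = train(task_a, [0.0, 0.0], epochs=200)
acc_a_before = accuracy(task_a, w)
w = train(task_b, w, epochs=2000)  # new data only; the old data is never revisited
acc_a_after = accuracy(task_a, w)
acc_b_after = accuracy(task_b, w)
print(acc_a_before, acc_a_after, acc_b_after)
```
A set of weights that handles both tasks does exist here; the network simply
has no reason to stay near it once the old examples stop showing up.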

0:34:00.520 --> 0:34:05.440
<v Speaker 1>With guided learning, you have engineers who are carefully selecting

0:34:05.480 --> 0:34:10.120
<v Speaker 1>the data that gets fed into a system. An unguided

0:34:10.160 --> 0:34:14.480
<v Speaker 1>system would collect raw data from wherever and attempt to

0:34:14.520 --> 0:34:17.880
<v Speaker 1>deliver desired results, and those are the kinds of

0:34:18.600 --> 0:34:22.319
<v Speaker 1>neural networks that are more prone to catastrophic forgetting. But

0:34:22.360 --> 0:34:26.640
<v Speaker 1>as I said, machine learning systems tackle new data, maybe

0:34:26.680 --> 0:34:30.560
<v Speaker 1>even new tasks, and then you get the risk of

0:34:30.600 --> 0:34:33.359
<v Speaker 1>the system forgetting stuff. So I jokingly say, it's kind

0:34:33.400 --> 0:34:35.480
<v Speaker 1>of like when I learn something new, it has to

0:34:35.520 --> 0:34:39.000
<v Speaker 1>push out something old, like you know, my friend's phone

0:34:39.040 --> 0:34:41.480
<v Speaker 1>number or something. Suddenly I can no longer remember it

0:34:41.560 --> 0:34:44.960
<v Speaker 1>because I learned some new interesting fact, as if I

0:34:45.000 --> 0:34:48.320
<v Speaker 1>have met my capacity for being able to know things.

0:34:48.520 --> 0:34:52.040
<v Speaker 1>So learning anything new necessitates having to forget something I

0:34:52.200 --> 0:34:55.440
<v Speaker 1>used to know, like Gotye, because now Gotye

0:34:55.600 --> 0:34:58.760
<v Speaker 1>is just somebody that I used to know. But wait,

0:34:59.040 --> 0:35:04.000
<v Speaker 1>there's more. Just as a system can experience catastrophic forgetting,

0:35:04.719 --> 0:35:10.239
<v Speaker 1>it can also experience catastrophic remembering. This is when a

0:35:10.280 --> 0:35:14.480
<v Speaker 1>system mistakenly believes it is doing one process, a task

0:35:14.600 --> 0:35:18.480
<v Speaker 1>it had previously been trained to do, rather than the

0:35:18.480 --> 0:35:22.359
<v Speaker 1>one it's actually trying to do. So let's say we've

0:35:22.360 --> 0:35:25.680
<v Speaker 1>got an artificial neural network, and originally we taught it

0:35:25.719 --> 0:35:28.160
<v Speaker 1>to recognize the photos that have cats in them versus

0:35:28.239 --> 0:35:31.359
<v Speaker 1>the ones that don't. But now we have retrained the

0:35:31.400 --> 0:35:35.800
<v Speaker 1>same artificial neural network to try and recognize handwritten text,

0:35:36.520 --> 0:35:40.400
<v Speaker 1>except when we feed handwritten text to the system, suddenly

0:35:40.440 --> 0:35:43.879
<v Speaker 1>the system believes it's trying to determine where the cats are.

0:35:44.480 --> 0:35:47.040
<v Speaker 1>This is something that can happen with machine learning systems too,

0:35:47.040 --> 0:35:49.880
<v Speaker 1>and you still get bad results out of it. So

0:35:50.000 --> 0:35:54.440
<v Speaker 1>this is a real problem. Now, these are not insurmountable problems.

0:35:54.960 --> 0:35:58.840
<v Speaker 1>There are some solutions that are actually intuitive. For example,

0:35:59.360 --> 0:36:02.760
<v Speaker 1>any gamer out there knows that it's best to save

0:36:02.880 --> 0:36:05.720
<v Speaker 1>your game just before you head into a big boss battle,

0:36:06.000 --> 0:36:09.200
<v Speaker 1>just in case things don't go the way you planned. Well.

0:36:09.200 --> 0:36:12.640
<v Speaker 1>With artificial neural networks, it's maybe not a bad idea

0:36:12.680 --> 0:36:15.960
<v Speaker 1>to make a copy of a network before you retrain

0:36:16.040 --> 0:36:18.480
<v Speaker 1>it to do something new. Then you still have the

0:36:18.520 --> 0:36:22.399
<v Speaker 1>backup if things do go pear-shaped. There are other

0:36:22.440 --> 0:36:27.080
<v Speaker 1>approaches to decreasing the risk of catastrophic forgetting or catastrophic remembering.
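
NOTE
The save-your-game idea mentioned a moment ago can be sketched in a few lines,
assuming the network's weights live in an ordinary Python structure. The
network dictionary and its values are hypothetical stand-ins, and the
overwritten weight simply simulates retraining going badly.
```python
import copy
# Hypothetical trained weights for a tiny network.
network = {"w1": [[0.1, -0.2]], "b1": [0.0]}
# Save a full, independent copy before retraining (the "save game").
checkpoint = copy.deepcopy(network)
# Stand-in for a retraining run that wrecks an important weight.
network["w1"][0][0] = 9.9
# Things went wrong, so restore the saved copy.
network = copy.deepcopy(checkpoint)
print(network["w1"][0][0])
```
A deep copy matters here: a shallow copy would share the inner lists, so the
backup would be damaged along with the original.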

0:36:27.600 --> 0:36:31.719
<v Speaker 1>An article in applied Mathematics titled Overcoming catastrophic forgetting in

0:36:31.760 --> 0:36:35.839
<v Speaker 1>neural networks describes a system in which the researchers purposefully

0:36:35.920 --> 0:36:40.960
<v Speaker 1>slowed down the network's ability to change the weights involved

0:36:41.040 --> 0:36:46.680
<v Speaker 1>in important tasks from previous training cycles. So this makes

0:36:46.719 --> 0:36:49.560
<v Speaker 1>teaching the system to do new tasks a little more

0:36:49.640 --> 0:36:56.680
<v Speaker 1>challenging because it's protecting these weights. It's preventing the system's

0:36:56.680 --> 0:37:01.920
<v Speaker 1>ability to be completely plastic, which means the system has

0:37:01.960 --> 0:37:04.440
<v Speaker 1>to work around these constraints and still learn how to

0:37:04.480 --> 0:37:07.600
<v Speaker 1>do the new task, but in the process it means

0:37:07.600 --> 0:37:11.400
<v Speaker 1>it doesn't forget how to do the previous tasks. This

0:37:11.560 --> 0:37:15.240
<v Speaker 1>article is interesting because of the tasks the researchers actually used

0:37:15.360 --> 0:37:17.920
<v Speaker 1>for the purposes of training. Like, what were they teaching the

0:37:18.000 --> 0:37:20.759
<v Speaker 1>artificial neural network to do? Well, they were teaching it

0:37:20.960 --> 0:37:24.000
<v Speaker 1>how to play Atari twenty six hundred games. So they

0:37:24.000 --> 0:37:27.200
<v Speaker 1>would start with one game and train the system on

0:37:27.239 --> 0:37:30.960
<v Speaker 1>how to play the game. Then they would give the

0:37:31.000 --> 0:37:35.480
<v Speaker 1>system a new game with different game mechanics, and the

0:37:35.480 --> 0:37:38.160
<v Speaker 1>system would have to learn how to play this new game,

0:37:38.760 --> 0:37:40.800
<v Speaker 1>but they wanted to see if it could still remember

0:37:40.840 --> 0:37:43.400
<v Speaker 1>how to play the original game. That was kind of

0:37:43.200 --> 0:37:45.640
<v Speaker 1>the system they were working on. They were tweaking things

0:37:46.239 --> 0:37:50.680
<v Speaker 1>so that the machine learning artificial neural network as a

0:37:50.680 --> 0:37:53.719
<v Speaker 1>whole could learn how to play multiple Atari twenty six

0:37:53.800 --> 0:37:56.720
<v Speaker 1>hundred games without forgetting how to do the previous ones.
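
NOTE
A rough sketch of the "slow down the important weights" idea, loosely modeled
on the quadratic penalty used in elastic weight consolidation, the technique
that paper introduces. This is a toy with one neuron and two hypothetical
two-example tasks, not the researchers' code and nothing like an Atari agent.
The names weight_importance, anchor, and strength, and the value 100.0, are
assumptions for illustration.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def predict(x, w):
    return sigmoid(w[0] * x[0] + w[1] * x[1])
def train(data, w, epochs, lr=0.5, anchor=None, importance=None, strength=0.0):
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, t in data:
            err = predict(x, w) - t
            for i in range(2):
                grad[i] += err * x[i] / len(data)
        if anchor is not None:
            # Penalty term: pull important weights back toward their old values.
            for i in range(2):
                grad[i] += strength * importance[i] * (w[i] - anchor[i])
        w = [w[i] - lr * grad[i] for i in range(2)]
    return w
def weight_importance(data, w):
    # Rough score: how sharply the old task's predictions react to each weight.
    scores = [0.0, 0.0]
    for x, _ in data:
        p = predict(x, w)
        for i in range(2):
            scores[i] += p * (1.0 - p) * x[i] ** 2 / len(data)
    return scores
def accuracy(data, w):
    return sum((predict(x, w) > 0.5) == (t == 1) for x, t in data) / len(data)
task_a = [((1.0, 0.0), 1), ((-1.0, 0.0), 0)]   # hypothetical old game
task_b = [((-1.0, 1.0), 1), ((1.0, -1.0), 0)]  # hypothetical new game
w_a = train(task_a, [0.0, 0.0], epochs=200)
plain = train(task_b, w_a, epochs=2000)
guarded = train(task_b, w_a, epochs=2000, anchor=w_a,
                importance=weight_importance(task_a, w_a), strength=100.0)
print("plain:  ", accuracy(task_a, plain), accuracy(task_b, plain))
print("guarded:", accuracy(task_a, guarded), accuracy(task_b, guarded))
```
The second weight gets an importance of zero because the old task never used
it, so the new task is free to move it while the first weight stays put.
Picking the strength is the balancing act: too high and the new task is slow
to learn, too low and the old one is forgotten.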

0:37:57.160 --> 0:37:59.319
<v Speaker 1>This is a non trivial task. I mean, it takes

0:37:59.320 --> 0:38:02.440
<v Speaker 1>a lot of work to see exactly how to preserve

0:38:02.480 --> 0:38:05.240
<v Speaker 1>things so that you're not slowing down the learning process

0:38:05.280 --> 0:38:07.920
<v Speaker 1>too much, but you're also not inviting the possibility of

0:38:07.960 --> 0:38:12.759
<v Speaker 1>catastrophic forgetting. Now that's just one example of how researchers

0:38:12.800 --> 0:38:16.400
<v Speaker 1>are looking to mitigate the problem of catastrophic forgetting and

0:38:16.400 --> 0:38:20.040
<v Speaker 1>catastrophic remembering. There are other methods as well, and maybe

0:38:20.080 --> 0:38:23.000
<v Speaker 1>I'll do another episode where I'll go into more detail

0:38:23.280 --> 0:38:26.480
<v Speaker 1>on some of those. They do get pretty complicated, and

0:38:26.800 --> 0:38:30.839
<v Speaker 1>in fact, eventually, really not even eventually, pretty early on,

0:38:31.400 --> 0:38:35.000
<v Speaker 1>I hit my limit for as far as I can

0:38:35.120 --> 0:38:38.879
<v Speaker 1>understand the actual mechanics of the system. So rather than

0:38:40.280 --> 0:38:43.960
<v Speaker 1>try and punch above my weight, I think it's best

0:38:44.000 --> 0:38:47.000
<v Speaker 1>to kind of be a little more general, but just

0:38:47.040 --> 0:38:49.040
<v Speaker 1>to have that understanding to kind of get a better

0:38:49.080 --> 0:38:53.520
<v Speaker 1>appreciation of some of the challenges relating to artificial intelligence

0:38:53.560 --> 0:38:57.799
<v Speaker 1>in general and machine learning in particular. And again, like

0:38:57.840 --> 0:39:02.279
<v Speaker 1>this machine learning issue, it's a bigger problem with more

0:39:02.320 --> 0:39:08.279
<v Speaker 1>sophisticated systems that are meant to do unsupervised and unguided learning, right,

0:39:08.360 --> 0:39:09.879
<v Speaker 1>those are the ones that are going to be more

0:39:09.960 --> 0:39:14.040
<v Speaker 1>prone to these issues. If we're talking about supervised and

0:39:14.120 --> 0:39:18.640
<v Speaker 1>guided learning, where engineers are being very careful with the

0:39:18.719 --> 0:39:22.080
<v Speaker 1>data being fed to a system, it's less likely to happen.

0:39:22.440 --> 0:39:27.480
<v Speaker 1>But the whole promise, or at least the you know,

0:39:27.600 --> 0:39:29.640
<v Speaker 1>not the promise of the technology itself, but the promise

0:39:29.640 --> 0:39:32.880
<v Speaker 1>of the people who are funding it, is that this

0:39:32.920 --> 0:39:36.440
<v Speaker 1>technology is going to reach a point where it's able

0:39:36.440 --> 0:39:38.920
<v Speaker 1>to learn on its own and be able to do

0:39:39.000 --> 0:39:41.799
<v Speaker 1>things better than people can do, to free us up

0:39:41.840 --> 0:39:44.080
<v Speaker 1>to doing, you know, stuff we want to do instead

0:39:44.120 --> 0:39:46.800
<v Speaker 1>of stuff we have to do. That's like the science

0:39:46.800 --> 0:39:51.360
<v Speaker 1>fiction dream version of AI. As we all know, getting

0:39:51.400 --> 0:39:53.840
<v Speaker 1>there is much more painful. It's not like a simple

0:39:53.880 --> 0:39:58.320
<v Speaker 1>process of Hey, we've made everything easy to do now,

0:39:58.400 --> 0:40:01.040
<v Speaker 1>and you don't have to work all day. You can

0:40:01.120 --> 0:40:04.839
<v Speaker 1>enjoy your life and pursue your dreams and develop your

0:40:04.880 --> 0:40:07.879
<v Speaker 1>hobbies and your interests, and you can have fulfillment and

0:40:07.960 --> 0:40:10.960
<v Speaker 1>somehow money isn't important anymore. Like that seems to be

0:40:11.080 --> 0:40:13.239
<v Speaker 1>the Star Trek version of the future that people want

0:40:13.280 --> 0:40:15.759
<v Speaker 1>it to go in. But as we have seen, the

0:40:15.800 --> 0:40:18.440
<v Speaker 1>process of getting there is way more painful. As you know,

0:40:18.480 --> 0:40:22.000
<v Speaker 1>people face a reality of potentially being out of work

0:40:22.600 --> 0:40:27.640
<v Speaker 1>because of AI, or maybe being paid way less to

0:40:27.800 --> 0:40:30.560
<v Speaker 1>do work because the AI is doing most of it.

0:40:31.400 --> 0:40:34.720
<v Speaker 1>These are not... that's not Star Trek future. That's getting

0:40:34.760 --> 0:40:38.439
<v Speaker 1>like into Blade Runner future, so we don't want that one.

0:40:38.520 --> 0:40:42.759
<v Speaker 1>By the way, the Tears in the Rain speech is fantastic,

0:40:42.840 --> 0:40:44.399
<v Speaker 1>but you do not want to live in the Blade

0:40:44.440 --> 0:40:47.520
<v Speaker 1>Runner world. Trust me, you might not want to live

0:40:47.520 --> 0:40:49.520
<v Speaker 1>in the Star Trek world either, because those outfits don't

0:40:49.560 --> 0:40:55.200
<v Speaker 1>look that comfortable. Anyway, that's my little discussion about AI,

0:40:55.400 --> 0:40:59.360
<v Speaker 1>machine learning, and catastrophic forgetting and catastrophic remembering. Again,

0:40:59.440 --> 0:41:03.480
<v Speaker 1>this is just one of the challenges associated with AI

0:41:03.640 --> 0:41:05.879
<v Speaker 1>and machine learning. I don't mean to suggest it's the

0:41:05.920 --> 0:41:09.560
<v Speaker 1>one and only, or even that it's the most important one,

0:41:09.800 --> 0:41:11.839
<v Speaker 1>but it is one that I had not really heard

0:41:11.840 --> 0:41:14.200
<v Speaker 1>of until I listened to that Skeptics Guide to the

0:41:14.320 --> 0:41:17.520
<v Speaker 1>Universe episode over the weekend, and it was really interesting

0:41:17.560 --> 0:41:21.120
<v Speaker 1>to dive into the material and read up about it

0:41:21.160 --> 0:41:23.280
<v Speaker 1>and to get a better understanding of what it means

0:41:23.280 --> 0:41:27.200
<v Speaker 1>and how it works. And as I said, we'll probably

0:41:27.239 --> 0:41:29.880
<v Speaker 1>revisit this topic in the future, especially since AI is

0:41:29.880 --> 0:41:32.719
<v Speaker 1>such a big deal these days. Okay, but that's it

0:41:32.920 --> 0:41:36.160
<v Speaker 1>for this episode of Tech Stuff. I hope you are

0:41:36.280 --> 0:41:40.359
<v Speaker 1>all well, and I will talk to you again really soon.

0:41:46.680 --> 0:41:51.360
<v Speaker 1>Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio,

0:41:51.680 --> 0:41:55.359
<v Speaker 1>visit the iHeartRadio app, Apple Podcasts, or wherever you listen

0:41:55.400 --> 0:41:56.480
<v Speaker 1>to your favorite shows.