WEBVTT - Machine Learning and Catastrophic Forgetting 0:00:04.440 --> 0:00:12.280 Welcome to tech Stuff, a production from iHeartRadio. Hey there, 0:00:12.280 --> 0:00:15.960 and welcome to tech Stuff. I'm your host, Jonathan Strickland. 0:00:16.000 --> 0:00:19.160 I'm an executive producer with iHeartRadio and how the tech 0:00:19.200 --> 0:00:22.479 are you? So? Over this past weekend, I was listening 0:00:22.520 --> 0:00:26.239 to the podcast The Skeptics Guide to the Universe, which 0:00:26.280 --> 0:00:28.520 I have no connection to. I just listened to it, 0:00:29.240 --> 0:00:32.800 and it included a section on AI that referenced something 0:00:33.000 --> 0:00:37.040 I don't think I had heard of before, which is 0:00:37.600 --> 0:00:41.680 really talking more about my oversight than anything else. Maybe 0:00:41.720 --> 0:00:44.120 I did hear about it, but then I forgot about it, 0:00:44.320 --> 0:00:48.040 you know, catastrophically. So the thing they talked about was 0:00:48.200 --> 0:00:54.040 catastrophic forgetting in artificial intelligence, specifically in machine learning systems 0:00:54.040 --> 0:00:57.920 built on artificial neural networks. Now, before we talk about 0:00:58.080 --> 0:01:02.080 catastrophic forgetting, which as I mentioned, is related to neural 0:01:02.120 --> 0:01:04.760 networks and machine learning, we really need to do a 0:01:04.840 --> 0:01:07.120 quick reminder, not a quick reminder. We need to do 0:01:07.160 --> 0:01:10.280 a full reminder on how all this works. And that's 0:01:10.319 --> 0:01:13.080 going to require us to do a whole lot of remembering. 0:01:13.360 --> 0:01:17.000 Not a catastrophic amount, but a lot. So the history 0:01:17.040 --> 0:01:20.680 of artificial intelligence as a discipline is one of intense 0:01:21.600 --> 0:01:26.000 and important debates in fields like computer science. 
Now, I 0:01:26.040 --> 0:01:29.000 have often talked about how AI can be seen as 0:01:29.160 --> 0:01:33.440 the convergence of several other disciplines into its own field, 0:01:34.000 --> 0:01:37.720 and there's more than one way to approach the challenge 0:01:37.720 --> 0:01:41.080 of artificial intelligence. And in the history of AI, we 0:01:41.120 --> 0:01:45.680 actually saw that play out, and some would argue the 0:01:45.680 --> 0:01:48.680 way it played out means that we're actually just now 0:01:48.720 --> 0:01:53.000 playing catch up. So different schools of thought pushed these 0:01:53.040 --> 0:01:58.760 different approaches forward as this should be the prevailing methodology 0:01:58.800 --> 0:02:03.160 we use to develop artificial intelligence. This is important because 0:02:03.200 --> 0:02:05.880 the development of AI does not exist in a vacuum, 0:02:06.440 --> 0:02:11.519 right? It exists in our real world. Research requires funding, 0:02:11.720 --> 0:02:17.240 and when you've got different sides arguing that their approach 0:02:17.360 --> 0:02:22.359 to artificial intelligence is superior and that the alternatives are 0:02:22.400 --> 0:02:25.200 not just inferior, but potentially limited to the point of 0:02:25.200 --> 0:02:29.080 being useless, well you've got a metaphorical wrestling match going on. 0:02:29.720 --> 0:02:33.120 The winner takes home the big prize of getting funding 0:02:33.200 --> 0:02:36.240 for their research, and the loser has to scrabble for 0:02:36.320 --> 0:02:39.160 whatever they can find, and often they will see their 0:02:39.160 --> 0:02:43.520 work languish as a result. By the way, this is 0:02:43.560 --> 0:02:46.520 why I often bring stuff up in this podcast that 0:02:46.639 --> 0:02:50.680 is outside the realm of tech. I've received a lot 0:02:50.720 --> 0:02:53.079 of messages over the years from folks saying that I 0:02:53.080 --> 0:02:56.600 should leave out stuff like money or politics. 
Politics is 0:02:56.600 --> 0:03:00.000 the big one. But to me, that doesn't make sense 0:03:00.120 --> 0:03:04.840 'cause tech exists within our world, a world that is 0:03:04.960 --> 0:03:09.080 largely shaped by money and politics. I don't think we 0:03:09.280 --> 0:03:12.240 can separate the tech from all of that because I 0:03:12.280 --> 0:03:16.400 believe that if you were to somehow magically remove those influences, 0:03:16.440 --> 0:03:19.920 if somehow money and politics never played a part in 0:03:19.960 --> 0:03:23.480 the development of technology, our tech would look very different 0:03:23.560 --> 0:03:27.040 from what it does today. Not necessarily better or worse, 0:03:27.560 --> 0:03:31.160 but different. I mean, think about Thomas Edison. He was 0:03:31.400 --> 0:03:36.680 very much driven by financial success, like his work in 0:03:36.800 --> 0:03:40.960 tech was really mostly about making lots of money. And 0:03:41.000 --> 0:03:43.920 without the making lots of money part, you don't really 0:03:43.960 --> 0:03:48.120 have his drive to really bring together the brightest minds 0:03:48.120 --> 0:03:51.040 of his generation and set them to work on creating 0:03:51.080 --> 0:03:54.960 incredible technology. So I think we have to take all 0:03:55.000 --> 0:03:58.680 these things into consideration. Anyway, that's a total rabbit trail, 0:03:58.720 --> 0:04:01.280 and I apologize. Let's get back to our story. It 0:04:01.320 --> 0:04:04.920 really begins around nineteen forty three when a pair of 0:04:04.960 --> 0:04:09.240 researchers at the University of Chicago first proposed the concept 0:04:09.400 --> 0:04:13.520 of the basic unit of a neural network. Those researchers 0:04:13.600 --> 0:04:17.760 were Warren McCulloch and Walter Pitts, and in fact, they 0:04:17.800 --> 0:04:22.400 demonstrated their idea by showing a simple electrical circuit, the 0:04:22.520 --> 0:04:25.719 very basis for what would become a neural network. 
So 0:04:25.839 --> 0:04:29.400 their proposal was a system that would use those simple 0:04:29.440 --> 0:04:33.960 circuits to mimic the neurons that we have in our noggins. 0:04:34.320 --> 0:04:37.760 So our brain consists of a bunch of these neurons, 0:04:38.320 --> 0:04:40.039 and you might wonder how much is a bunch, Well, 0:04:40.920 --> 0:04:44.480 we're talking about on average, around one hundred billion neurons 0:04:44.640 --> 0:04:48.240 in the human brain. These neurons interconnect with each other. 0:04:48.360 --> 0:04:50.960 It's not just a one to one, right, You've got 0:04:50.960 --> 0:04:55.159 these interconnections between all these different neurons, not with every 0:04:55.160 --> 0:04:58.200 neuron connected to every other neuron, but lots of interconnections. 0:04:58.200 --> 0:05:01.000 And if we're looking at just the connection, you would 0:05:01.040 --> 0:05:04.159 count more than one hundred trillion of them in the 0:05:04.160 --> 0:05:07.840 typical human brain. And these connections in our brains make 0:05:07.960 --> 0:05:12.640 up neural circuits. Those circuits light up, and that represents 0:05:12.720 --> 0:05:15.960 us doing lots of different stuff from experiencing the world 0:05:16.000 --> 0:05:20.120 around us so perception to thinking about a past memory. 0:05:20.320 --> 0:05:23.320 You know that typically is like recreating the same pathway 0:05:23.360 --> 0:05:27.760 over and over, and sometimes we don't recreate it exactly correctly, 0:05:28.240 --> 0:05:32.280 and our memory ends up not being a perfect representation 0:05:32.360 --> 0:05:35.159 of the thing that we actually experienced. This is why 0:05:35.440 --> 0:05:38.679 things like eyewitness testimony is not always very reliable, because 0:05:38.680 --> 0:05:43.840 our memories aren't infallible. 
They can trick us and we 0:05:43.880 --> 0:05:46.360 can have all those pathways light up when we learn 0:05:46.400 --> 0:05:49.279 a new skill and we start forming new pathways, and 0:05:49.320 --> 0:05:53.160 then as we practice this skill, we start to reinforce 0:05:53.279 --> 0:05:56.919 those pathways. So McCulloch and Pitts proposed that we create 0:05:57.040 --> 0:06:02.080 machines capable of doing essentially a similar thing that our 0:06:02.120 --> 0:06:07.599 brains do, so kind of a neuromimicry, not exactly one 0:06:07.640 --> 0:06:11.479 to one the way our brains work, but inspired by 0:06:11.560 --> 0:06:15.560 the way our brains work. Now, we would be limited 0:06:15.839 --> 0:06:18.919 by what the technology of the day would be able 0:06:18.920 --> 0:06:22.000 to do, because there's no feasible way we could create 0:06:22.040 --> 0:06:27.920 a massive electrical system with one hundred billion individual simple 0:06:28.040 --> 0:06:31.880 circuits with more than one hundred trillion connections between them. 0:06:32.160 --> 0:06:35.680 That would be beyond our capability; it would be beyond 0:06:35.680 --> 0:06:39.599 our resources. We could, however, create systems that used interconnected 0:06:39.640 --> 0:06:44.320 circuits to process information and to teach such a system 0:06:44.440 --> 0:06:49.080 to do specific tasks. Now, in nineteen forty nine, Donald 0:06:49.200 --> 0:06:53.880 Hebb wrote a book about biological neurons, and he titled 0:06:53.880 --> 0:06:58.800 this book The Organization of Behavior and suggested neural pathways 0:06:58.800 --> 0:07:02.320 get stronger with additional use, kind of like you know, 0:07:02.320 --> 0:07:05.400 if you exercise your muscles, you build strength over time. 0:07:05.480 --> 0:07:09.560 Well, it's the same with neural pathways. And if 0:07:09.600 --> 0:07:12.560 you don't use those muscles well, then your muscles get weaker. Well, 0:07:12.640 --> 0:07:16.080 same with neural pathways. 
If you end up learning a skill, 0:07:16.760 --> 0:07:20.920 but then over a great amount of time you no 0:07:20.960 --> 0:07:24.560 longer practice that skill, you're gonna lose some of your ability. 0:07:25.160 --> 0:07:27.080 Maybe not all of it, but at least some of it. 0:07:27.160 --> 0:07:29.400 And you have to you know, like I think about 0:07:29.800 --> 0:07:34.200 wrestlers who come back from retirement. Professional wrestlers, they 0:07:34.240 --> 0:07:36.480 call it ring rust. You got to knock off the 0:07:36.560 --> 0:07:39.000 ring rust and get back into step and kind of 0:07:39.320 --> 0:07:41.520 get back into your groove. And it takes a little time. 0:07:41.680 --> 0:07:46.000 Typically sometimes you know, you can get back into the 0:07:46.040 --> 0:07:48.760 game faster than others, but you get the idea, and 0:07:49.000 --> 0:07:53.640 also Hebb ended up proposing the concept of cells that 0:07:53.760 --> 0:07:59.360 fire together wire together, meaning that neurons that fire at 0:07:59.360 --> 0:08:04.880 the same time end up strengthening faster than other neurons do. 0:08:05.520 --> 0:08:09.640 So when you get into that system, you can actually 0:08:09.680 --> 0:08:15.160 reinforce those pathways. And for AI this would be really important. 0:08:15.640 --> 0:08:18.200 And it wasn't very long after Donald Hebb had published 0:08:18.240 --> 0:08:21.240 this work that researchers in the field of AI tried 0:08:21.280 --> 0:08:26.440 to apply that concept, that philosophy, to computer science. By 0:08:26.480 --> 0:08:29.800 the mid nineteen fifties, the burgeoning computer science Lab and 0:08:29.920 --> 0:08:34.600 AI Lab at MIT was building out neural networks based 0:08:34.679 --> 0:08:41.359 on Hebb's ideas. Meanwhile, another computer scientist named Frank Rosenblatt 0:08:41.679 --> 0:08:46.280 was looking at primitive neural systems and he started with flies, 0:08:46.440 --> 0:08:50.760 like house flies. 
He wanted to explore systems that were 0:08:50.800 --> 0:08:54.920 involved when a fly would quickly move away after detecting 0:08:54.920 --> 0:08:59.560 a possible threat, like instantly, or at least appearing to 0:08:59.640 --> 0:09:03.839 us to instantly react to something. So, for example, a 0:09:03.920 --> 0:09:06.320 fly swatter coming at it like you might be moving 0:09:06.360 --> 0:09:08.400 the fly swatter very quickly, and yet the fly is 0:09:08.440 --> 0:09:14.240 able to move super fast with no perceivable delay. Right, 0:09:14.360 --> 0:09:16.760 we know that we have a delay from when we 0:09:16.800 --> 0:09:19.280 perceive something to when we can act on something. Like 0:09:19.320 --> 0:09:21.599 if you've ever been in a fender bender in a 0:09:21.640 --> 0:09:24.680 car accident, you know that there's a delay between 0:09:24.679 --> 0:09:26.880 when you see the issue and when you can hit the brake, 0:09:27.520 --> 0:09:30.960 and that can lead to accidents. Well, with flies, that 0:09:31.000 --> 0:09:35.080 delay seems to be super super small. So Rosenblatt was 0:09:35.120 --> 0:09:39.440 really interested in exploring the neurological reasons for that. How 0:09:39.559 --> 0:09:41.920 can that happen? It has to be really simple, right, 0:09:42.360 --> 0:09:44.680 There has to be a simple and more or less 0:09:44.840 --> 0:09:50.400 direct pathway that exists to allow a fly to react 0:09:50.520 --> 0:09:54.720 to detecting a potential threat like that. And if you 0:09:54.840 --> 0:09:58.960 could replicate that with electronics, you could have a very 0:09:59.000 --> 0:10:05.640 simple but potentially powerful artificial intelligence system. So he came 0:10:05.720 --> 0:10:08.440 up with this system that would be based off that 0:10:08.640 --> 0:10:11.080 very simple direct pathway that you would see in something 0:10:11.120 --> 0:10:14.760 like a fly, and he called it the perceptron. 
So 0:10:14.800 --> 0:10:17.080 he went back to the simple circuit design that was 0:10:17.080 --> 0:10:20.080 proposed by Pitts and McCulloch and he built out the 0:10:20.120 --> 0:10:24.560 Mark one perceptron or perceptron I guess I should say, 0:10:24.720 --> 0:10:27.520 So let's talk about a perceptron like not big P 0:10:27.760 --> 0:10:30.600 but a little p perceptron. This is probably what we 0:10:30.600 --> 0:10:33.920 would call a neural node in a modern neural network. 0:10:34.240 --> 0:10:38.080 So the purpose of the perceptron was to accept inputs 0:10:38.559 --> 0:10:43.319 and produce an output based on some threshold. Like if 0:10:43.320 --> 0:10:46.720 the inputs meet a certain threshold, one output would be produced. 0:10:46.760 --> 0:10:48.680 If they failed to do so, a different output would 0:10:48.679 --> 0:10:53.320 be produced. The inputs, in turn would be assigned weights, 0:10:53.760 --> 0:10:56.840 which would factor into the output the perceptron would generate. 0:10:57.240 --> 0:11:02.880 So when we're talking weights, I mean weights as 0:11:02.920 --> 0:11:06.000 in like how heavy something is, or in this case, 0:11:06.360 --> 0:11:10.440 how much impact that thing has. So we're talking about 0:11:10.640 --> 0:11:14.280 how much impact one input has relative to other inputs. 0:11:14.720 --> 0:11:17.480 Let me use a really mundane human example to kind 0:11:17.480 --> 0:11:21.280 of explain what this means. Let's say that your friend 0:11:21.360 --> 0:11:23.600 asks you to go see a movie with them, and 0:11:23.640 --> 0:11:26.520 it's going to be playing tonight at nine pm. But 0:11:26.600 --> 0:11:29.720 you've had a really busy day and you might not 0:11:29.760 --> 0:11:32.920 be able to even eat dinner until around nine pm. 
0:11:33.280 --> 0:11:34.880 And if you go see this movie, it might mean 0:11:34.920 --> 0:11:37.760 having to skip dinner or to try and eat something 0:11:37.840 --> 0:11:41.600 really fast and unhealthy before you go to the movie. 0:11:41.920 --> 0:11:45.120 What's more, you got a really big day tomorrow and 0:11:45.400 --> 0:11:47.400 you feel like you really need to be well rested 0:11:47.440 --> 0:11:51.240 for it. However, at the same time, you haven't seen 0:11:51.280 --> 0:11:54.200 this friend in ages, and you really like this person 0:11:54.360 --> 0:11:56.880 and you've wanted to hang with them for a really 0:11:56.920 --> 0:11:59.880 long time. Plus the movie they're suggesting is one you've 0:12:00.120 --> 0:12:03.000 really wanted to see and you haven't gone yet. Well, 0:12:03.040 --> 0:12:07.160 you would likely assign at least unconsciously weights to each 0:12:07.200 --> 0:12:09.800 of these factors before you make your decision. You know, 0:12:09.840 --> 0:12:12.880 if getting some dinner without having to rush and also 0:12:12.960 --> 0:12:15.880 to be really well rested for tomorrow are really important 0:12:15.920 --> 0:12:20.240 to you, you'll probably reluctantly decline the offer. But if 0:12:20.280 --> 0:12:22.440 you really crave some time with your friend and you 0:12:22.520 --> 0:12:24.720 really want to see that movie before all the spoilers 0:12:24.760 --> 0:12:28.000 come out on Facebook or whatever, maybe you'll say yes. 0:12:28.559 --> 0:12:32.520 Your decision depends upon the weights you assign those factors, 0:12:32.559 --> 0:12:36.160 those inputs, even if you don't consciously think about it 0:12:36.200 --> 0:12:39.680 that way. Well, the perceptron system worked in a similar way: 0:12:39.880 --> 0:12:44.000 it produced outputs by taking the inputs into consideration, including each 0:12:44.120 --> 0:12:48.280 input's weight. 
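To make the movie-night example a bit more concrete, here is a minimal Python sketch of a perceptron-style decision: multiply each input by its weight, add everything up, and compare the total to a threshold. The factor names, the weight values, and the threshold are all made-up numbers for illustration; none of them come from the Mark I Perceptron or from the episode.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 ("go") if the weighted sum meets the threshold, else 0 ("stay home")."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical factors, in order: miss your friend, want to see the movie,
# need a proper dinner, need to be rested for tomorrow.
# A 1 means the factor applies tonight; here all four do.
inputs = [1, 1, 1, 1]

# Hypothetical weights: positive values push toward going,
# negative values push toward staying home.
weights_rest_matters_most = [0.4, 0.3, -0.5, -0.6]
weights_friend_matters_most = [0.9, 0.6, -0.3, -0.4]

THRESHOLD = 0.5  # assumed cutoff for saying yes

decision_rest = perceptron_output(inputs, weights_rest_matters_most, THRESHOLD)
decision_friend = perceptron_output(inputs, weights_friend_matters_most, THRESHOLD)

print("Rest matters most   ->", "go" if decision_rest else "decline")
print("Friend matters most ->", "go" if decision_friend else "decline")
```

The inputs are identical in both cases; only the weights differ, and that alone flips the decision.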
Moreover, the more you submitted inputs, the more 0:12:48.320 --> 0:12:51.680 the system would quote unquote learn how to weight each 0:12:51.720 --> 0:12:54.160 of those inputs, all with the goal of bringing the 0:12:54.320 --> 0:12:59.079 actual output that the processor, you know, generates closer 0:12:59.080 --> 0:13:03.720 to the one you want it to generate. Okay, I 0:13:03.800 --> 0:13:05.800 just said a lot there. We've got some more to 0:13:05.840 --> 0:13:07.839 get through. But before we get to that, let's take 0:13:07.840 --> 0:13:19.120 a quick break. All right. Before the break, we were 0:13:19.120 --> 0:13:22.720 talking about inputs and weights and the idea of getting 0:13:23.320 --> 0:13:26.680 an output that is close to what you want the 0:13:26.720 --> 0:13:30.240 system to do. That's not a guarantee, right, The system 0:13:30.280 --> 0:13:34.200 could generate an output that's quote unquote wrong, you know, 0:13:34.280 --> 0:13:38.400 depending on whatever task you've set this machine learning system 0:13:38.520 --> 0:13:42.040 to learn. And that gets a bit conceptual. So let's 0:13:42.040 --> 0:13:44.640 talk about a simple example that I love to use. 0:13:44.720 --> 0:13:46.520 If you've been listening to TechStuff for a while, 0:13:46.600 --> 0:13:50.600 you've heard this before, and that's talking about pictures of cats. 0:13:50.960 --> 0:13:54.000 Because cats ruled the internet. I don't know if they 0:13:54.040 --> 0:13:56.400 still do. They won't talk to me, so they just 0:13:56.440 --> 0:13:59.040 knock things off shelves. Anyway, if your goal is to 0:13:59.160 --> 0:14:04.040 teach a computer system to differentiate photos that include a 0:14:04.080 --> 0:14:08.240 cat from photos that do not include a cat, well, 0:14:08.800 --> 0:14:10.840 you would need to train the system, and part of 0:14:10.840 --> 0:14:15.640 that includes feeding the system a whole bunch of photographs. 
0:14:16.280 --> 0:14:19.200 Some of those would have cats in them, some would not, 0:14:19.960 --> 0:14:24.120 and chances are the system would misidentify photos, maybe a 0:14:24.160 --> 0:14:27.120 significant number of those photos. You would probably have false 0:14:27.160 --> 0:14:29.800 positives where the system thinks there's a cat there and 0:14:29.840 --> 0:14:32.960 there's not, and false negatives where it doesn't think there's 0:14:33.000 --> 0:14:36.320 a cat there but there is. At that point, your 0:14:36.320 --> 0:14:39.080 goal is to try and teach the system to close 0:14:39.200 --> 0:14:43.280 the gap between the actual results it produces and what 0:14:43.440 --> 0:14:46.600 you want it to produce. In some systems that means 0:14:46.680 --> 0:14:49.480 you might have to go in manually to adjust the 0:14:49.560 --> 0:14:52.480 input weights to increase the weight of one input versus 0:14:52.560 --> 0:14:55.960 another in an effort to cut down on mistakes. So 0:14:56.080 --> 0:15:01.440 the perceptron was interesting, but it was very limited in complexity. 0:15:01.760 --> 0:15:04.160 It was essentially a single layer where you'd feed a 0:15:04.160 --> 0:15:06.480 bunch of inputs in and you would get an output, 0:15:06.840 --> 0:15:10.360 So it was suitable for a subset of computational challenges, 0:15:10.360 --> 0:15:14.640 but anything beyond that was well beyond its own reach 0:15:14.880 --> 0:15:18.040 as a single layer network. By the late nineteen fifties, 0:15:18.080 --> 0:15:21.680 other researchers had created new neural networks that were multi layered, 0:15:22.160 --> 0:15:26.920 so a node or neuron didn't just accept inputs, it 0:15:26.920 --> 0:15:30.240 would generate outputs that then would become inputs for another 0:15:30.760 --> 0:15:35.640 layer down. So instead of just having one layer of nodes, 0:15:35.640 --> 0:15:38.040 you would have multiple layers of nodes. 
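The idea of nudging input weights to close the gap between what a single-layer system produces and what you want it to produce can be sketched with the classic perceptron update rule. This is a toy Python illustration, not a reconstruction of any historical system: the two "features" (whiskers, pointy ears), the four-example dataset, the learning rate, and the epoch count are all assumptions chosen so the example stays tiny.

```python
def predict(weights, bias, features):
    """Single-layer perceptron: weighted sum plus bias, thresholded at zero."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total >= 0 else 0


def count_mistakes(weights, bias, samples, labels):
    """Return (false_positives, false_negatives) over the dataset."""
    false_pos = false_neg = 0
    for features, target in zip(samples, labels):
        guess = predict(weights, bias, features)
        if guess == 1 and target == 0:
            false_pos += 1  # said "cat" when there was no cat
        elif guess == 0 and target == 1:
            false_neg += 1  # missed a cat that was there
    return false_pos, false_neg


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Nudge each weight in proportion to the error on each example."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: (has_whiskers, has_pointy_ears) -> 1 if "cat".
# In this made-up world a photo counts as a cat only when both are present.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

mistakes_before = count_mistakes([0.0, 0.0], 0.0, samples, labels)
weights, bias = train_perceptron(samples, labels)
mistakes_after = count_mistakes(weights, bias, samples, labels)

print("false positives / negatives before training:", mistakes_before)
print("false positives / negatives after training: ", mistakes_after)
```

Here the adjustment is automatic rather than manual, but the goal is the same one described above: fewer false positives and false negatives after each pass.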
Typically you would 0:15:38.040 --> 0:15:42.000 have one at the quote unquote top of the network, 0:15:42.360 --> 0:15:43.920 and you would have outputs at the bottom, and the 0:15:43.960 --> 0:15:47.240 ones in between would be often referred to as hidden layers, 0:15:47.720 --> 0:15:51.040 and who knows how many there would be. So anyway, 0:15:51.360 --> 0:15:54.160 you would feed data to the system. The initial nodes 0:15:54.200 --> 0:15:58.200 would generate information as outputs that would become inputs for 0:15:58.280 --> 0:16:02.400 the next layer down, which would then continue the process 0:16:03.040 --> 0:16:05.000 and so on and so forth until you get to 0:16:05.040 --> 0:16:08.080 the output. So now you had artificial neural networks that 0:16:08.080 --> 0:16:12.480 could tackle more complex challenges, and you would have multiple 0:16:12.520 --> 0:16:16.400 steps in the process. Didn't necessarily mean they were automatically 0:16:16.480 --> 0:16:20.600 better than the perceptron, it was just that they were able 0:16:20.640 --> 0:16:26.400 to tackle more complicated tasks. What followed is something that 0:16:26.480 --> 0:16:30.000 will probably sound really familiar to you if you ever 0:16:30.160 --> 0:16:35.200 follow technology or fads: the hype around machine learning and 0:16:35.320 --> 0:16:38.120 artificial intelligence, And keep in mind this is like the 0:16:38.240 --> 0:16:43.200 nineteen sixties. It grew beyond the technology's actual capabilities. At 0:16:43.240 --> 0:16:47.160 that time, people started to project what this technology would 0:16:47.200 --> 0:16:49.560 be able to do, and they did so thinking it 0:16:49.600 --> 0:16:52.840 was going to be in a very short turnaround, like 0:16:52.880 --> 0:16:57.359 we're right on the very precipice of a monstrous breakthrough 0:16:57.440 --> 0:17:00.000 that will bring the science fiction future into the present. 
0:17:01.160 --> 0:17:06.000 So when it was realized that we weren't at that, like, 0:17:06.080 --> 0:17:09.920 that's not how progress typically works. It's usually much more 0:17:10.440 --> 0:17:15.480 gradual and humble than that, Well, then enthusiasm around AI 0:17:15.600 --> 0:17:18.120 began to take a hit. And as I mentioned already, 0:17:18.160 --> 0:17:21.720 a big part of AI research really comes down to funding, 0:17:22.280 --> 0:17:25.680 and it gets really challenging to secure funding when public 0:17:25.800 --> 0:17:30.480 opinion dims on a technology. We've seen this happen lots 0:17:30.520 --> 0:17:34.320 of times, right, Like three D television was a fad 0:17:34.359 --> 0:17:37.040 that was pushed. Now, granted, that one, you could argue 0:17:37.119 --> 0:17:40.440 was more of an example of manufacturing companies that make 0:17:40.480 --> 0:17:44.080 televisions trying to push a technology on consumers and the 0:17:44.119 --> 0:17:46.840 consumers just weren't interested. You could argue that was the 0:17:46.880 --> 0:17:50.320 case there. But virtual reality in the nineteen nineties definitely 0:17:50.359 --> 0:17:53.920 followed this pathway. There was this excitement around virtual reality. 0:17:54.920 --> 0:17:58.800 Then that excitement faded to almost nothing when people realized 0:17:58.800 --> 0:18:02.080 that the actual state of the art of the technology 0:18:02.320 --> 0:18:05.800 was far below where they expected it to be. And 0:18:05.880 --> 0:18:09.360 suddenly people who are working in VR couldn't get funding 0:18:09.520 --> 0:18:11.720 for their work, and they kind of had to scrounge 0:18:11.760 --> 0:18:15.680 around in order to keep the development going at all, 0:18:16.320 --> 0:18:19.200 and then eventually we would see that come back around again. 0:18:19.760 --> 0:18:23.320 You could argue that NFTs recently went through this too, 0:18:23.400 --> 0:18:26.880 where the hype went well beyond what NFTs could actually do. 
0:18:27.920 --> 0:18:31.240 I've been really down on NFTs in general. I do 0:18:31.280 --> 0:18:36.359 think that there are potential legitimate uses for NFTs, but 0:18:36.480 --> 0:18:42.720 I think the early examples were frivolous and almost solely 0:18:42.800 --> 0:18:48.640 centered around speculation, as in like financial speculation, and as 0:18:48.720 --> 0:18:50.640 a result, there was nothing for it to do other 0:18:50.720 --> 0:18:53.840 than to create a bubble that would ultimately burst, which 0:18:53.880 --> 0:18:57.520 is what happened. And maybe NFTs will recover from that 0:18:57.640 --> 0:19:01.760 and become something that's more fundamentally useful in the Internet 0:19:01.840 --> 0:19:04.880 in the future or in digital commerce in the future, 0:19:06.240 --> 0:19:10.200 but it's going to have to get over the catastrophe 0:19:10.240 --> 0:19:12.960 that happened when the rug was pulled out from underneath 0:19:13.080 --> 0:19:18.000 NFTs, and that was all, you know, predictable and preventable. 0:19:18.760 --> 0:19:23.080 But like I've said before, like I've lifted the joke 0:19:23.119 --> 0:19:25.359 from Peter Cook, we've learned from our mistakes. We can 0:19:25.440 --> 0:19:29.760 repeat them almost exactly. Anyway, this same sort of hype 0:19:29.960 --> 0:19:35.000 cycle activity happened with neural networks and machine learning in 0:19:35.040 --> 0:19:40.800 the nineteen sixties. Then enter Marvin Minsky and Seymour Papert 0:19:40.840 --> 0:19:43.880 of MIT's AI lab. They were leading that lab at 0:19:43.880 --> 0:19:46.840 the time. In nineteen sixty nine, they co authored a 0:19:46.960 --> 0:19:53.280 book titled Perceptrons. They were actually critical of that artificial 0:19:53.359 --> 0:19:56.480 neural network approach to AI and machine learning. 
They were 0:19:56.520 --> 0:20:00.040 concerned that the limitations of the technology meant that you 0:19:59.840 --> 0:20:03.560 would need an unrealistically huge system of artificial neurons, 0:20:03.920 --> 0:20:08.159 perhaps then using that system to compute an infinite number 0:20:08.200 --> 0:20:12.679 of variations of the same process or task if you 0:20:12.760 --> 0:20:15.200 wanted to train the weights so that they were of 0:20:15.320 --> 0:20:19.719 the optimal value. So, in other words, they thought, it's 0:20:19.920 --> 0:20:23.640 too impractical, and it's going to take too much compute time, 0:20:23.680 --> 0:20:26.040 and you're never going to achieve the result you want. 0:20:26.080 --> 0:20:30.200 You're never going to get to that most perfect system. 0:20:30.880 --> 0:20:35.840 And they believed it just had fundamental inescapable flaws. They 0:20:35.880 --> 0:20:40.760 had different systems in mind. Now Minsky and Papert tried 0:20:40.800 --> 0:20:43.159 to push their systems forward, and I could do a 0:20:43.200 --> 0:20:46.800 full episode about them too, and their ideas were not bad. 0:20:47.359 --> 0:20:50.119 They were different. It was a different approach. But this 0:20:50.240 --> 0:20:53.040 also meant that researchers who had been pushing the development 0:20:53.040 --> 0:20:56.800 of artificial neural networks felt forced to move on 0:20:57.000 --> 0:21:02.280 to different projects because financial support for anything connected to 0:21:02.320 --> 0:21:07.320 the concept of neural networks effectively disappeared, right like funding 0:21:07.480 --> 0:21:11.119 just dropped for that. Because here you had these experts 0:21:11.160 --> 0:21:15.800 in computer science saying, yeah, this approach, while interesting, has 0:21:15.840 --> 0:21:19.320 already hit an insurmountable obstacle and it's not going to 0:21:19.359 --> 0:21:21.200 go any further. It's gone as far as it can go. 
0:21:21.840 --> 0:21:25.480 And so a lot of computer scientists blamed Minsky and 0:21:25.520 --> 0:21:31.320 Papert for essentially demolishing funding for neural networks for more 0:21:31.359 --> 0:21:34.720 than a decade, and in fact, this would become an 0:21:34.760 --> 0:21:38.919 era that retrospectively, computer scientists would reference as the AI 0:21:39.240 --> 0:21:42.960 Winter. Got all Game of Thrones up in here. Now, 0:21:43.000 --> 0:21:46.240 in nineteen eighty two, there was a hint of spring 0:21:47.040 --> 0:21:51.119 thawing out that AI Winter. Researchers in Japan were starting 0:21:51.160 --> 0:21:56.119 to resurrect work on neural network projects. And meanwhile, a 0:21:56.160 --> 0:21:59.720 scientist named John Hopfield submitted a research paper to the 0:21:59.800 --> 0:22:03.760 National Academy of Sciences that brought neural networks back into 0:22:03.840 --> 0:22:07.320 discussion here in the United States, and because Japan was 0:22:07.440 --> 0:22:13.200 actively investing in developing that technology, institutions in the United 0:22:13.240 --> 0:22:15.600 States began to open up the purse strings a bit 0:22:16.000 --> 0:22:18.880 because there was a concern that if there were something 0:22:19.359 --> 0:22:22.720 to this artificial neural network concept, if in fact those 0:22:22.840 --> 0:22:28.040 obstacles weren't insurmountable, as Minsky and Papert had suggested, the 0:22:28.200 --> 0:22:32.800 US could potentially fall behind another country because it would 0:22:32.800 --> 0:22:36.159 fail to fund its development. So, in a desire not 0:22:36.680 --> 0:22:38.760 to have Japan take the ball and run with it, 0:22:39.080 --> 0:22:42.520 the United States began to invest again in artificial neural 0:22:42.600 --> 0:22:46.600 network research and development. In the mid nineteen eighties, computer 0:22:46.680 --> 0:22:53.040 scientists essentially rediscovered the usefulness of a process called back propagation. 
0:22:53.640 --> 0:22:56.720 And I've already talked about nodes and weights and stuff, 0:22:56.760 --> 0:22:58.479 but this is going to require a little bit more 0:22:58.480 --> 0:23:02.000 explanation to understand what back propagation is all about. So 0:23:02.119 --> 0:23:05.560 let's kind of try to visualize a neural network. So 0:23:05.600 --> 0:23:07.960 you've got your input nodes. Just think of a bunch 0:23:07.960 --> 0:23:10.960 of circles. If you were drawing it from top to bottom, 0:23:10.960 --> 0:23:13.760 this would be your top layer. This is like the 0:23:13.800 --> 0:23:17.560 funnels where you're going to feed data into the system. 0:23:18.040 --> 0:23:20.200 Now you've got a whole bunch of these at the top, 0:23:20.280 --> 0:23:22.720 and they can accept the data that you're feeding in. 0:23:23.200 --> 0:23:28.240 They process that data and then based upon, you know, some operation, 0:23:28.840 --> 0:23:32.920 they will then send an output to a node one 0:23:33.000 --> 0:23:36.280 layer down. So there's lots of other nodes in the 0:23:36.359 --> 0:23:38.760 layers below, or maybe not as many as you have 0:23:38.840 --> 0:23:43.560 in the initial layer. You might actually have fewer, and the layers 0:23:43.600 --> 0:23:47.119 above will send, you know, data to a specific 0:23:47.200 --> 0:23:51.040 node depending upon what the outcome is. Whatever the output is, 0:23:52.280 --> 0:23:56.800 So these nodes accept the input. These inputs have a 0:23:56.840 --> 0:23:59.920 bias and a weight to them, and this is one 0:24:00.040 --> 0:24:03.040 of the hidden layers. They will then create an output and 0:24:03.080 --> 0:24:07.840 send that on to nodes another layer down. So this 0:24:07.960 --> 0:24:10.640 goes on until you get to your output layer where 0:24:10.680 --> 0:24:14.280 you get your final result, and then you can determine 0:24:14.280 --> 0:24:16.840 whether or not the final result matches what you were 0:24:16.880 --> 0:24:20.480 hoping for. 
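The top-to-bottom flow just described, where each layer's outputs become the next layer's inputs, can be sketched as a forward pass in Python. Everything specific here is an assumption for illustration: the network shape (three inputs, two hidden nodes, one output), the weight and bias values, and the use of a sigmoid function as the "operation" each node applies. The transcript does not say which operation the nodes use.

```python
import math


def sigmoid(value):
    """Squash any number into the range (0, 1). An assumed choice of node operation."""
    return 1.0 / (1.0 + math.exp(-value))


def layer_forward(inputs, layer_weights, layer_biases):
    """Each node takes a weighted sum of the inputs, adds its bias, then applies sigmoid."""
    outputs = []
    for node_weights, node_bias in zip(layer_weights, layer_biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + node_bias
        outputs.append(sigmoid(total))
    return outputs


def network_forward(inputs, layers):
    """Feed data in at the top; each layer's outputs become the next layer's inputs."""
    activations = inputs
    for layer_weights, layer_biases in layers:
        activations = layer_forward(activations, layer_weights, layer_biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output node.
# Each layer is (weights per node, bias per node); the numbers are arbitrary.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                             # output layer
]

final_result = network_forward([1.0, 0.5, -1.0], layers)
print("network output:", final_result)
```

Comparing `final_result` against the answer you were hoping for is the starting point for training: the size of that gap is what a method like back propagation works backward from.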
So did your system properly identify which photos 0:24:20.520 --> 0:24:23.280 do and don't have cats in them? Now, as I 0:24:23.280 --> 0:24:26.720 mentioned earlier, you typically get results that aren't perfect, but 0:24:26.800 --> 0:24:30.520 we want to train the system to improve with every test. 0:24:31.200 --> 0:24:35.320 Back propagation is one way to do this. So with 0:24:35.359 --> 0:24:38.600 back propagation, you actually start with the final output. You've 0:24:38.640 --> 0:24:41.840 already done a test run, right, and you've got your output, 0:24:42.560 --> 0:24:48.080 and maybe your test has five possible final outcomes, but 0:24:48.200 --> 0:24:51.440 only one of those is the outcome you actually want. Okay, 0:24:51.480 --> 0:24:54.680 we'll say it's outcome number one. We're saying I want 0:24:54.680 --> 0:24:58.439 this system to more often than not come to the 0:24:58.480 --> 0:25:00.680 conclusion that it's outcome number one. But you run 0:25:00.720 --> 0:25:07.480 your test. It's got one thousand little tasks in it, 0:25:07.520 --> 0:25:10.800 and you run your test, you find out that it 0:25:10.880 --> 0:25:13.720 only arrives at outcome number one five percent of the time, 0:25:13.920 --> 0:25:16.399 which is actually worse than random chance. Right, it should 0:25:16.400 --> 0:25:18.640 be twenty percent for random chance, But it's only getting 0:25:18.680 --> 0:25:21.679 there five percent of the time. Something is going really 0:25:21.720 --> 0:25:25.480