WEBVTT - Rerun: Machine Learning and Catastrophic Forgetting 0:00:02.920 --> 0:00:11.080 Welcome to tech Stuff, a production from iHeartRadio. Hey there, 0:00:11.119 --> 0:00:14.560 and welcome to tech Stuff. I'm your host, Jonathan Strickland. 0:00:14.640 --> 0:00:18.040 I'm an executive producer with iHeart Podcasts. And how the 0:00:18.120 --> 0:00:21.960 tech are you? Well, I just got back from celebrating 0:00:22.079 --> 0:00:25.000 my birthday. Thank y'all for all of you who are 0:00:25.000 --> 0:00:28.040 wishing me a happy birthday. And here in the United States, 0:00:28.120 --> 0:00:32.640 we're about to have our national holiday celebrating the fourth 0:00:32.680 --> 0:00:36.320 of July. I realize Fourth of July happens everywhere, not 0:00:36.520 --> 0:00:38.879 just in the US, but we celebrate it here in 0:00:38.920 --> 0:00:42.680 the US, and as such, there's very limited time to 0:00:42.680 --> 0:00:45.600 get everything done, and I really wasn't able to pull 0:00:45.640 --> 0:00:48.320 an episode together in time, and I apologize for that, 0:00:48.720 --> 0:00:52.239 but I thought I would bring an older episode to 0:00:52.320 --> 0:00:55.480 y'all so that we can still have an episode to 0:00:55.560 --> 0:00:59.800 listen to today. And typically I would have one of my 0:01:00.080 --> 0:01:04.120 Fireworks episodes play on this day, because Fireworks has a 0:01:04.240 --> 0:01:06.240 very close association with the Fourth of July here in 0:01:06.240 --> 0:01:09.400 the United States. But I've done that for several years 0:01:09.440 --> 0:01:11.560 in a row, and I've thought it might be nice 0:01:11.600 --> 0:01:14.760 to have a break from Fireworks. Instead, I thought I 0:01:14.760 --> 0:01:17.240 would focus on something that continues to be a very 0:01:17.280 --> 0:01:21.120 important topic in tech, and that is artificial intelligence. 
And 0:01:21.600 --> 0:01:26.280 AI is incredibly impressive, but there are also lots of 0:01:26.480 --> 0:01:31.400 challenges with AI, and those are ranging from the technological 0:01:31.480 --> 0:01:35.800 side to the social side right and how we implement AI. 0:01:36.240 --> 0:01:38.560 One thing I thought that we don't really get to 0:01:38.600 --> 0:01:43.959 talk about very much is the concept of forgetting with AI. 0:01:44.240 --> 0:01:46.440 We have a lot of generative AI out there that 0:01:46.959 --> 0:01:51.120 is drawing upon huge resources of information, but AI can 0:01:51.240 --> 0:01:56.440 also quote unquote forget. So this episode originally published on 0:01:56.520 --> 0:01:59.520 July thirty first of twenty twenty three. It is called 0:01:59.600 --> 0:02:03.640 Machine Learning and Catastrophic Forgetting. And I think it's a 0:02:03.760 --> 0:02:07.080 useful thing to reflect upon as we see more and 0:02:07.160 --> 0:02:14.720 more headlines about tech companies and their investment increasingly astronomical 0:02:14.840 --> 0:02:21.520 investment in artificial intelligence. I hope you enjoy so. Over 0:02:21.560 --> 0:02:24.920 this past weekend, I was listening to the podcast The 0:02:24.919 --> 0:02:27.920 Skeptics Guide to the Universe, which I have no connection to. 0:02:28.200 --> 0:02:31.480 I just listened to it, and it included a section 0:02:31.720 --> 0:02:36.280 on AI that referenced something I don't think I had 0:02:36.400 --> 0:02:39.560 heard of before, which is really talking more about my 0:02:39.680 --> 0:02:43.440 oversight than anything else. Maybe I did hear about it 0:02:43.600 --> 0:02:47.160 but then I forgot about it, you know, catastrophically. So 0:02:47.560 --> 0:02:52.280 the thing they talked about was catastrophic forgetting in artificial intelligence, 0:02:52.280 --> 0:02:57.200 specifically in machine learning systems built on artificial neural networks. 
Now, 0:02:57.200 --> 0:03:01.760 before we talk about catastrophic forgetting, which as I mentioned, 0:03:01.800 --> 0:03:04.960 is related to neural networks and machine learning, we really 0:03:05.000 --> 0:03:07.360 need to do a quick reminder, not a quick reminder. 0:03:07.360 --> 0:03:09.280 We need to do a full reminder on how all 0:03:09.360 --> 0:03:12.040 this works. And that's going to require us to do 0:03:12.240 --> 0:03:15.560 a whole lot of remembering. Not a catastrophic amount, but 0:03:15.639 --> 0:03:19.280 a lot. So the history of artificial intelligence as a 0:03:19.320 --> 0:03:25.120 discipline is one of intense and important debates in fields 0:03:25.160 --> 0:03:28.040 like computer science. Now, I have often talked about how 0:03:28.120 --> 0:03:31.480 AI can be seen as the convergence of several other 0:03:31.600 --> 0:03:35.600 disciplines into its own field. And there's more than one 0:03:35.600 --> 0:03:40.680 way to approach the challenge of artificial intelligence. And in 0:03:40.760 --> 0:03:43.440 the history of AI, we actually saw that play out, 0:03:44.080 --> 0:03:47.680 and some would argue the way it played out means 0:03:47.720 --> 0:03:51.200 that we're actually just now playing catch up. So different 0:03:51.240 --> 0:03:56.200 schools of thought pushed these different approaches forward as this 0:03:56.400 --> 0:04:01.920 should be the prevailing methodology we use to develop artificial intelligence. 0:04:02.360 --> 0:04:05.440 This is important because the development of AI does not 0:04:05.560 --> 0:04:09.680 exist in a vacuum, right. It exists in our real world. 
0:04:10.320 --> 0:04:16.760 Research requires funding, and when you've got different sides arguing 0:04:16.800 --> 0:04:21.160 that their approach to artificial intelligence is superior and that 0:04:21.200 --> 0:04:25.400 the alternatives are not just inferior, but potentially limited to 0:04:25.440 --> 0:04:28.360 the point of being useless, well you've got a metaphorical 0:04:28.440 --> 0:04:31.760 wrestling match going on. The winner takes home the big 0:04:31.800 --> 0:04:36.000 prize of getting funding for their research, and the loser 0:04:36.120 --> 0:04:38.839 has to scrabble for whatever they can find, and often 0:04:39.080 --> 0:04:42.840 they will see their work languish as a result. By 0:04:42.880 --> 0:04:45.960 the way, this is why I often bring stuff up 0:04:46.000 --> 0:04:49.520 in this podcast that is outside the realm of tech. 0:04:50.480 --> 0:04:52.720 I've received a lot of messages over the years from 0:04:52.720 --> 0:04:55.400 folks saying that I should leave out stuff like money 0:04:55.880 --> 0:04:58.640 or politics. Politics is the big one. But to me, 0:04:58.760 --> 0:05:04.720 that doesn't make sense because tech exists within our world, 0:05:04.839 --> 0:05:08.640 a world that is largely shaped by money and politics. 0:05:09.040 --> 0:05:12.000 I don't think we can separate the tech from all 0:05:12.040 --> 0:05:14.440 of that because I believe that if you were to 0:05:14.480 --> 0:05:18.839 somehow magically remove those influences, If somehow money and politics 0:05:18.880 --> 0:05:22.600 never played a part in the development of technology, our 0:05:22.640 --> 0:05:25.479 tech would look very different from what it does today. 0:05:25.960 --> 0:05:29.960 Not necessarily better or worse, but different. I mean, think 0:05:29.960 --> 0:05:36.040 about Thomas Edison. He was very much driven by financial success, 0:05:36.120 --> 0:05:40.200 like his work in tech was really mostly about making 0:05:40.320 --> 0:05:43.520 lots of money. 
And without the making lots of money part, 0:05:43.920 --> 0:05:47.480 you don't really have his drive to really bring together 0:05:47.560 --> 0:05:50.800 the brightest minds of his generation and set them to 0:05:50.880 --> 0:05:55.080 work on creating incredible technology. So I think we have 0:05:55.240 --> 0:05:58.440 to take all these things into consideration. Anyway, that's a 0:05:58.480 --> 0:06:00.720 total rabbit trail, and I apologize. Let's get back to 0:06:00.760 --> 0:06:05.200 our story. It really begins around nineteen forty three when 0:06:05.200 --> 0:06:08.360 a pair of researchers at the University of Chicago first 0:06:08.640 --> 0:06:13.080 proposed the concept of the basic unit of a neural network. 0:06:13.400 --> 0:06:18.279 Those researchers were Warren McCulloch and Walter Pitts. And in fact, 0:06:18.320 --> 0:06:22.839 they demonstrated their idea by showing a simple electrical circuit, 0:06:23.040 --> 0:06:25.839 the very basis for what would become a neural network. 0:06:26.320 --> 0:06:29.679 So their proposal was a system that would use those 0:06:29.720 --> 0:06:33.880 simple circuits to mimic the neurons that we have in 0:06:33.880 --> 0:06:37.720 our noggins. So our brain consists of a bunch of 0:06:37.760 --> 0:06:40.719 these neurons, and you might wonder how much is a bunch. Well, 0:06:41.600 --> 0:06:45.159 we're talking about on average, around one hundred billion neurons 0:06:45.320 --> 0:06:48.920 in the human brain. These neurons interconnect with each other. 0:06:49.040 --> 0:06:51.640 It's not just a one to one, right, You've got 0:06:51.640 --> 0:06:55.839 these interconnections between all these different neurons, not with every 0:06:55.839 --> 0:06:58.880 neuron connected to every other neuron, but lots of interconnections. 
0:06:58.880 --> 0:07:01.680 And if we're looking at just the connections, you would 0:07:01.720 --> 0:07:04.839 count more than one hundred trillion of them in the 0:07:04.880 --> 0:07:08.560 typical human brain. And these connections in our brains make 0:07:08.640 --> 0:07:13.320 up neural circuits. Those circuits light up, and that represents 0:07:13.400 --> 0:07:16.640 us doing lots of different stuff, from experiencing the world 0:07:16.680 --> 0:07:20.840 around us, so perception, to thinking about a past memory. 0:07:21.000 --> 0:07:24.000 You know that typically is like recreating the same pathway 0:07:24.080 --> 0:07:28.440 over and over, and sometimes we don't recreate it exactly correctly, 0:07:28.920 --> 0:07:32.920 and our memory ends up not being a perfect representation 0:07:33.080 --> 0:07:35.880 of the thing that we actually experienced. This is why 0:07:36.120 --> 0:07:39.360 things like eyewitness testimony are not always very reliable, because 0:07:39.400 --> 0:07:44.520 our memories aren't infallible. They can trick us and we 0:07:44.560 --> 0:07:47.040 can have all those pathways light up. When we learn 0:07:47.080 --> 0:07:50.200 a new skill, we start forming new pathways, and then 0:07:50.360 --> 0:07:54.800 as we practice this skill, we start to reinforce those pathways. 0:07:55.160 --> 0:07:58.800 So McCulloch and Pitts proposed that we create machines capable 0:07:58.880 --> 0:08:03.320 of doing essentially a similar thing that our brains do, 0:08:03.440 --> 0:08:08.680 so kind of a neuromimicry, not exactly one to one 0:08:08.720 --> 0:08:12.600 the way our brains work, but inspired by the way 0:08:12.760 --> 0:08:17.080 our brains work. 
Now, we would be limited by what 0:08:17.360 --> 0:08:19.920 the technology of the day would be able to do, 0:08:20.360 --> 0:08:23.640 because there's no feasible way we could create a massive 0:08:24.160 --> 0:08:29.640 electrical system with one hundred billion individual simple circuits with 0:08:29.760 --> 0:08:33.240 more than one hundred trillion connections between them. That would 0:08:33.240 --> 0:08:37.199 be beyond our capability. It would be beyond our resources. 0:08:37.559 --> 0:08:40.840 We could, however, create systems that used interconnected circuits to 0:08:40.920 --> 0:08:45.480 process information and to teach such a system to do 0:08:45.559 --> 0:08:50.920 specific tasks. Now, in nineteen forty nine, Donald Hebb wrote 0:08:50.960 --> 0:08:55.080 a book about biological neurons, and he titled this book 0:08:55.320 --> 0:08:59.960 the Organization of Behavior and suggested neural pathways get stronger 0:09:00.520 --> 0:09:03.320 with additional use, kind of like you know, if you 0:09:03.559 --> 0:09:06.520 exercise your muscles, you build strength over time, while so 0:09:06.720 --> 0:09:10.640 is the same with neural pathways, and if you don't 0:09:10.720 --> 0:09:13.240 use those muscles, well, then your muscles get weaker. Well, 0:09:13.320 --> 0:09:16.760 same with neural pathways. If you end up learning a skill, 0:09:17.480 --> 0:09:21.600 but then over a great amount of time you no 0:09:21.640 --> 0:09:24.560 longer practice that skill, you're going to lose some of 0:09:24.600 --> 0:09:27.400 your ability, maybe not all of it, but at least 0:09:27.400 --> 0:09:29.240 some of it. And you have to you know, like 0:09:29.559 --> 0:09:33.240 I think about wrestlers who come back from from retirement, 0:09:33.360 --> 0:09:36.520 professional wrestlers, they call it ring rust. You got to 0:09:36.600 --> 0:09:39.120 knock off the ring rust and get back into step 0:09:39.200 --> 0:09:41.320 and kind of get back into your groove. 
And it 0:09:41.360 --> 0:09:45.880 takes a little time. Typically sometimes you know, you can 0:09:46.000 --> 0:09:48.280 get back into the game faster than others, but you 0:09:48.400 --> 0:09:53.040 get the idea. And also Hebb ended up proposing the 0:09:53.080 --> 0:09:58.080 concept of cells that fire together wire together, meaning that 0:09:58.800 --> 0:10:02.800 neurons that fire at the same time end up strengthening 0:10:02.880 --> 0:10:08.160 faster than other neurons do. So when you get into 0:10:08.240 --> 0:10:14.040 that system, you can actually reinforce those pathways. And for 0:10:14.160 --> 0:10:17.120 AI this would be really important. And it wasn't very 0:10:17.160 --> 0:10:20.599 long after Donald Hebb had published this work that researchers 0:10:20.600 --> 0:10:23.679 in the field of AI tried to apply that concept, 0:10:23.760 --> 0:10:28.480 that philosophy, to computer science. By the mid nineteen fifties, 0:10:28.520 --> 0:10:32.040 the burgeoning computer science lab and AI lab at MIT 0:10:32.880 --> 0:10:38.400 was building out neural networks based on Hebb's ideas. Meanwhile, 0:10:38.840 --> 0:10:43.680 another computer scientist named Frank Rosenblatt was looking at primitive 0:10:43.679 --> 0:10:48.079 neural systems and he started with flies like house flies. 0:10:49.040 --> 0:10:52.160 He wanted to explore systems that were involved when a 0:10:52.200 --> 0:10:56.560 fly would quickly move away after detecting a possible threat, 0:10:57.000 --> 0:11:01.439 like instantly, or at least appear to us to instantly 0:11:01.520 --> 0:11:05.480 react to something. So, for example, a fly swatter coming 0:11:05.480 --> 0:11:07.640 at it, like you might be moving the fly swatter 0:11:07.720 --> 0:11:09.840 very quickly, and yet the fly is able to move 0:11:10.400 --> 0:11:15.640 super fast with no perceivable delay. 
Right, we know that 0:11:15.679 --> 0:11:18.200 we have a delay from when we perceive something to 0:11:18.240 --> 0:11:20.520 when we can act on something. Like if you've ever 0:11:20.559 --> 0:11:23.000 been in a fender bender in a car accident, you 0:11:23.040 --> 0:11:25.920 know that there's a delay between when you see 0:11:25.920 --> 0:11:28.680 the issue and when you can hit the brake, and that 0:11:28.920 --> 0:11:32.240 can lead to accidents. Well, with flies, that delay seems 0:11:32.280 --> 0:11:36.600 to be super super small. So Rosenblatt was really interested 0:11:36.960 --> 0:11:40.960 in exploring the neurological reasons for that. How can that happen? 0:11:41.000 --> 0:11:43.520 It has to be really simple, right, There has to 0:11:43.559 --> 0:11:48.199 be a simple and more or less direct pathway that 0:11:48.360 --> 0:11:52.800 exists to allow a fly to react to detecting a 0:11:52.800 --> 0:11:57.160 potential threat like that, and if you could replicate that 0:11:57.920 --> 0:12:02.040 with electronics, you could have a very simple but potentially 0:12:02.200 --> 0:12:07.240 powerful artificial intelligence system. So he came up with this 0:12:07.440 --> 0:12:10.160 system that would be based off that very simple direct 0:12:10.200 --> 0:12:12.240 pathway that you would see in something like a fly, 0:12:12.760 --> 0:12:16.120 and he called it the perceptron. So he went back 0:12:16.200 --> 0:12:18.680 to the simple circuit design that was proposed by Pitts 0:12:18.679 --> 0:12:22.520 and McCulloch and he built out the Mark one Perceptron, 0:12:23.480 --> 0:12:25.920 or perceptron, I guess I should say. So let's talk 0:12:25.920 --> 0:12:28.920 about a perceptron, like not big P, but a little 0:12:29.040 --> 0:12:31.840 P perceptron. This is probably what we would call a 0:12:31.920 --> 0:12:35.680 neural node in a modern neural network. 
So the purpose 0:12:35.800 --> 0:12:40.000 of the perceptron was to accept inputs and produce an 0:12:40.040 --> 0:12:44.679 output based on some threshold, Like if the inputs meet 0:12:44.720 --> 0:12:47.640 a certain threshold, one output would be produced. If they 0:12:47.720 --> 0:12:49.880 failed to do so, a different output would be produced. 0:12:50.720 --> 0:12:54.880 The inputs, in turn would be assigned weights, which would 0:12:54.880 --> 0:12:58.240 factor into the output the perceptron would generate. So when 0:12:58.240 --> 0:13:04.760 we're talking weights, I mean weights as in like how 0:13:04.840 --> 0:13:08.079 heavy something is or in this case, how much impact 0:13:08.520 --> 0:13:12.200 that thing has, So we're talking about how much impact 0:13:12.280 --> 0:13:15.920 one input has relative to other inputs. Let me use 0:13:15.960 --> 0:13:19.440 a really mundane human example to kind of explain what 0:13:19.520 --> 0:13:22.640 this means. Let's say that your friend asks you to 0:13:22.679 --> 0:13:24.760 go see a movie with them, and it's going to 0:13:24.800 --> 0:13:27.760 be playing tonight at nine pm. But you've had a 0:13:27.880 --> 0:13:30.880 really busy day and you might not be able to 0:13:30.920 --> 0:13:34.320 even eat dinner until around nine pm. And if you 0:13:34.360 --> 0:13:36.280 go see this movie, it might mean having to skip 0:13:36.320 --> 0:13:40.000 dinner or to try and eat something really fast and 0:13:40.120 --> 0:13:43.599 unhealthy before you go to the movie. What's more, you 0:13:43.679 --> 0:13:46.680 got a really big day tomorrow and you feel like 0:13:46.720 --> 0:13:49.480 you really need to be well rested for it. However, 0:13:49.600 --> 0:13:53.320 at the same time, you haven't seen this friend in ages, 0:13:53.360 --> 0:13:55.800 and you really like this person and you've wanted to 0:13:55.800 --> 0:13:59.000 hang with them for a really long time. 
Plus the 0:13:59.040 --> 0:14:01.600 movie they're suggesting is one you've really wanted to see 0:14:01.600 --> 0:14:04.840 and you haven't gone yet. Well, you would likely assign 0:14:04.960 --> 0:14:09.360 at least unconsciously weights to each of these factors before 0:14:09.360 --> 0:14:11.439 you make your decision. You know, if getting some dinner 0:14:11.480 --> 0:14:14.440 without having to rush, and also to be really well 0:14:14.480 --> 0:14:17.720 rested for tomorrow are really important to you, you'll probably 0:14:18.000 --> 0:14:21.880 reluctantly decline the offer. But if you really crave some 0:14:21.960 --> 0:14:24.000 time with your friend and you really want to see 0:14:24.000 --> 0:14:26.360 that movie before all the spoilers come out on Facebook 0:14:26.400 --> 0:14:30.440 or whatever, maybe you'll say yes. Your decision depends upon 0:14:30.480 --> 0:14:34.520 the weights you assign those factors, those inputs, even if 0:14:34.520 --> 0:14:38.000 you don't consciously think about it that way. Well, the 0:14:38.040 --> 0:14:41.920 Perceptron system worked in a similar way, produced outputs by 0:14:41.920 --> 0:14:46.800 taking the inputs into consideration, including each input's weight. Moreover, 0:14:47.080 --> 0:14:49.560 the more you submitted inputs, the more the system would 0:14:49.640 --> 0:14:53.280 quote unquote learn how to weight each of those inputs, 0:14:53.560 --> 0:14:56.600 all with the goal of bringing the actual output that 0:14:56.640 --> 0:15:00.360 the processor, you know, generates closer to the one 0:15:00.560 --> 0:15:04.920 you want it to generate. Okay, I just said a 0:15:04.920 --> 0:15:07.280 lot there. We've got some more to get through. But 0:15:07.320 --> 0:15:09.320 before we get to that, let's take a quick break. 0:15:18.520 --> 0:15:20.920 All right. 
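The weighing-up in the movie example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Rosenblatt's actual Mark I design: the four yes/no inputs (friend, movie, hungry, tired), the made-up training examples, and the function names `predict` and `train` are all hypothetical, chosen only to mirror the example. The update used is the standard perceptron learning rule, which nudges each weight whenever the output differs from the answer you wanted.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs clears the threshold, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(samples, epochs=10, learning_rate=1.0):
    """Perceptron learning rule: adjust weights only when the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: (miss friend, want movie, hungry, tired) -> go?
samples = [
    ((1, 1, 0, 0), 1),  # friend and movie, nothing in the way: go
    ((0, 0, 1, 1), 0),  # hungry and tired, no pull at all: stay home
    ((1, 0, 1, 1), 0),  # friend alone does not outweigh dinner plus rest
    ((1, 1, 1, 0), 1),  # friend and movie outweigh a rushed dinner
    ((0, 0, 0, 0), 0),  # no reason either way: stay home
]

weights, bias = train(samples)
print(weights, bias)
```

On this toy data the rule settles on positive weights for the two reasons to go and negative weights for the two reasons to stay home, which is the "learning how to weight each input" idea in miniature.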
Before the break, we were talking about inputs 0:15:21.080 --> 0:15:25.240 and weights and the idea of getting an output that 0:15:25.520 --> 0:15:28.240 is close to what you want the system to do. 0:15:28.960 --> 0:15:31.720 That's not a guarantee, right, The system could generate an 0:15:31.720 --> 0:15:35.800 output that's quote unquote wrong, you know, depending on whatever 0:15:35.880 --> 0:15:41.080 task you've set this machine learning system to learn, and 0:15:41.160 --> 0:15:43.280 that gets a bit conceptual. So let's talk about a 0:15:43.320 --> 0:15:45.840 simple example that I love to use. If you've been 0:15:45.840 --> 0:15:48.400 listening to tech Stuff for a while, you've heard this before, 0:15:49.400 --> 0:15:53.000 and that's talking about pictures of cats. Because cats ruled 0:15:53.160 --> 0:15:55.440 the Internet. I don't know if they still do. They 0:15:55.480 --> 0:15:58.960 won't talk to me, so just knock things off shelves. Anyway, 0:15:58.960 --> 0:16:01.320 if your goal is to teach a computer system 0:16:01.720 --> 0:16:06.360 to differentiate photos that include a cat from photos that 0:16:06.440 --> 0:16:10.000 do not include a cat, well, you would need to 0:16:10.040 --> 0:16:13.400 train the system, and part of that includes feeding the 0:16:13.480 --> 0:16:18.200 system a whole bunch of photographs. Some of those would 0:16:18.240 --> 0:16:21.960 have cats in them, some would not, and chances are 0:16:22.040 --> 0:16:25.840 the system would misidentify photos. Maybe a significant number of 0:16:25.840 --> 0:16:28.680 those photos. You would probably have false positives where the 0:16:28.720 --> 0:16:31.560 system thinks there's a cat there and there's not, and 0:16:31.600 --> 0:16:34.280 false negatives where it doesn't think there's a cat there 0:16:34.560 --> 0:16:37.680 but there is. 
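The two kinds of mistake described here are easy to count once you have the system's guesses next to the true answers. The sketch below is only an illustration: the function name `count_errors` and the five made-up photo labels are assumptions, not anything from a real cat classifier.

```python
def count_errors(predicted, actual):
    """Return (false_positives, false_negatives) for paired yes/no labels."""
    # False positive: the system said "cat" but the photo has no cat.
    false_positives = sum(1 for p, a in zip(predicted, actual) if p and not a)
    # False negative: the system said "no cat" but the photo has a cat.
    false_negatives = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return false_positives, false_negatives


# Hypothetical results for five photos (True means "contains a cat").
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]

false_positives, false_negatives = count_errors(predicted, actual)
print(false_positives, false_negatives)
```

Training is then a matter of driving both counts down across a much larger set of photos.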
At that point, your goal is to 0:16:37.680 --> 0:16:41.120 try and teach the system to close the gap between 0:16:41.360 --> 0:16:44.800 the actual results it produces and what you want it 0:16:44.920 --> 0:16:47.760 to produce. In some systems, that means you might have 0:16:47.840 --> 0:16:51.320 to go in manually to adjust the input weights to 0:16:51.440 --> 0:16:53.880 increase the weight of one input versus another in an 0:16:53.920 --> 0:16:59.360 effort to cut down on mistakes. So the perceptron was interesting, 0:16:59.760 --> 0:17:03.080 but it was very limited in complexity. It was essentially 0:17:03.160 --> 0:17:05.560 a single layer where you'd feed a bunch of inputs 0:17:05.560 --> 0:17:07.879 in and you would get an output. So it was 0:17:07.920 --> 0:17:11.959 suitable for a subset of computational challenges, but anything beyond 0:17:12.000 --> 0:17:16.119 that was well beyond its own reach as a single 0:17:16.200 --> 0:17:19.719 layer network. By the late nineteen fifties, other researchers had 0:17:19.760 --> 0:17:23.879 created new neural networks that were multi layered. So a 0:17:23.960 --> 0:17:28.160 node or neuron didn't just accept inputs, it would generate 0:17:28.200 --> 0:17:32.600 outputs that then would become inputs for another layer down. 0:17:33.000 --> 0:17:36.399 So instead of just having one layer of nodes, you 0:17:36.400 --> 0:17:38.840 would have multiple layers of nodes. Typically you would have 0:17:39.280 --> 0:17:43.119 one at the quote unquote top of the network, and 0:17:43.160 --> 0:17:44.880 you would have outputs at the bottom, and the ones 0:17:44.880 --> 0:17:47.920 in between would be often referred to as hidden layers, 0:17:48.400 --> 0:17:51.640 and who knows how many there would be. 
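One way to see why extra layers matter is the exclusive-or (XOR) task: output 1 when exactly one of two inputs is on. A single threshold node cannot compute XOR, but a small network with one hidden layer can. The sketch below is an illustration chosen for this passage rather than something described in the episode: the weights are set by hand instead of learned, and the names `node` and `xor_network` are assumptions.

```python
def node(weights, bias, inputs):
    """One artificial neuron: weighted sum of inputs, then a threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def xor_network(a, b):
    """Two-layer network: outputs of the hidden nodes feed the output node."""
    # Hidden layer: one node acts like OR, the other like AND.
    hidden_or = node([1, 1], -0.5, [a, b])
    hidden_and = node([1, 1], -1.5, [a, b])
    # Output layer: fire when OR is on but AND is off.
    return node([1, -1], -0.5, [hidden_or, hidden_and])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The hidden nodes never see the final answer; they just pass their outputs down as inputs to the next layer, which is the layered flow described above.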
So anyway 0:17:52.040 --> 0:17:54.840 you would feed data to the system, the initial nodes 0:17:54.880 --> 0:17:58.879 would generate information as outputs that would become inputs for 0:17:58.960 --> 0:18:03.680 the next layer down, which would then continue the process 0:18:03.720 --> 0:18:05.679 and so on and so forth until you get to 0:18:05.720 --> 0:18:08.760 the output. So now you had artificial neural networks that 0:18:08.800 --> 0:18:13.199 could tackle more complex challenges, and you would have multiple 0:18:13.200 --> 0:18:17.120 steps in the process. Didn't necessarily mean they were automatically 0:18:17.200 --> 0:18:21.280 better than the perceptron, was just that they were able 0:18:21.320 --> 0:18:27.119 to tackle more complicated tasks. What followed is something that 0:18:27.160 --> 0:18:30.680 will probably sound really familiar to you if you ever 0:18:30.840 --> 0:18:35.919 follow technology or fads, the hype around machine learning and 0:18:36.000 --> 0:18:38.800 artificial intelligence, and keep in mind this is like the 0:18:38.920 --> 0:18:43.920 nineteen sixties. It grew beyond the technology's actual capabilities. At 0:18:43.920 --> 0:18:47.840 that time. People started to project what this technology would 0:18:47.880 --> 0:18:50.239 be able to do, and they did so thinking it 0:18:50.280 --> 0:18:53.520 was going to be in a very short turnaround, like 0:18:53.560 --> 0:18:58.080 we're right on the very precipice of a monstrous breakthrough 0:18:58.119 --> 0:19:00.960 that will bring the science fiction future into the present. 0:19:01.880 --> 0:19:06.719 So when it was realized that we weren't at that, like, 0:19:06.800 --> 0:19:10.639 that's not how progress typically works. It's usually much more 0:19:11.119 --> 0:19:16.200 gradual and humble than that, well, then enthusiasm around AI 0:19:16.280 --> 0:19:18.800 began to take a hit. 
And as I mentioned already, 0:19:18.840 --> 0:19:22.440 a big part of AI research really comes down to funding, 0:19:23.000 --> 0:19:26.360 and it gets really challenging to secure funding when public 0:19:26.480 --> 0:19:31.200 opinion dims on a technology. We've seen this happen lots 0:19:31.200 --> 0:19:35.000 of times, right, like three D television was a fad 0:19:35.080 --> 0:19:37.720 that was pushed. Now, granted, that one, you could argue 0:19:37.800 --> 0:19:41.120 was more of an example of manufacturing companies that make 0:19:41.200 --> 0:19:44.800 televisions trying to push a technology on consumers and the 0:19:44.800 --> 0:19:47.520 consumers just weren't interested. You could argue that was the 0:19:47.560 --> 0:19:51.000 case there. But virtual reality in the nineteen nineties definitely 0:19:51.040 --> 0:19:54.639 followed this pathway. There was this excitement around virtual reality. 0:19:55.640 --> 0:19:59.480 Then that excitement faded to almost nothing when people realized 0:19:59.480 --> 0:20:02.800 that the actual state of the art of the technology 0:20:03.000 --> 0:20:06.480 was far below where they expected it to be. And 0:20:06.560 --> 0:20:10.040 suddenly people who are working in VR couldn't get funding 0:20:10.200 --> 0:20:12.400 for their work and they kind of had to scrounge 0:20:12.440 --> 0:20:16.359 around in order to keep the development going at all. 0:20:17.040 --> 0:20:19.879 And then eventually we would see that come back around again. 0:20:20.480 --> 0:20:24.040 You could argue that NFTs recently went through this too, 0:20:24.080 --> 0:20:27.560 where the hype went well beyond what NFTs could actually do. 0:20:28.640 --> 0:20:31.920 I've been really down on NFTs in general. 
I do 0:20:31.960 --> 0:20:37.080 think that there are potential legitimate uses for NFTs, but 0:20:37.160 --> 0:20:43.399 I think the early examples were frivolous and almost solely 0:20:43.480 --> 0:20:49.400 centered around speculation, as in like financial speculation and as 0:20:49.400 --> 0:20:51.320 a result, there was nothing for it to do other 0:20:51.400 --> 0:20:54.520 than to create a bubble that would ultimately burst, which 0:20:54.560 --> 0:20:58.199 is what happened. And maybe NFTs will recover from that 0:20:58.320 --> 0:21:02.440 and become something that's more fundamentally useful in the Internet 0:21:02.520 --> 0:21:05.560 in the future or in digital commerce in the future. 0:21:06.920 --> 0:21:10.879 But it's going to have to get over the catastrophe 0:21:10.920 --> 0:21:13.680 that happened when the rug was pulled out from underneath 0:21:13.760 --> 0:21:19.520 NFTs. And that was all predictable and preventable. But 0:21:21.000 --> 0:21:23.919 like I've said before, like I've lifted the joke from 0:21:23.960 --> 0:21:26.440 Peter Cook, we've learned from our mistakes. We can repeat 0:21:26.480 --> 0:21:31.040 them almost exactly. Anyway, This same sort of hype cycle 0:21:31.119 --> 0:21:35.800 activity happened with neural networks and machine learning in the 0:21:35.880 --> 0:21:41.639 nineteen sixties. Then enter Marvin Minsky and Seymour Papert of 0:21:41.800 --> 0:21:44.920 MIT's AI lab. They were leading that lab at the time. 0:21:45.280 --> 0:21:49.800 In nineteen sixty nine, they co-authored a book titled Perceptrons. 0:21:50.720 --> 0:21:55.040 They were actually critical of that artificial neural network approach 0:21:55.080 --> 0:21:58.080 to AI and machine learning. They were concerned that the 0:21:58.119 --> 0:22:01.040 limitations of the technology meant that you would need an 0:22:01.160 --> 0:22:06.399 unrealistically huge system of artificial neurons. 
Perhaps then using that 0:22:06.400 --> 0:22:10.639 system to compute an infinite number of variations of the 0:22:10.680 --> 0:22:14.399 same process or task if you wanted to train the 0:22:14.400 --> 0:22:18.879 weights so that they were of the optimal value. So, 0:22:18.920 --> 0:22:22.920 in other words, they thought, it's too impractical and it's 0:22:22.960 --> 0:22:24.960 going to take too much compute time, and you're never 0:22:25.040 --> 0:22:27.360 going to achieve the result you want. You're never going 0:22:27.400 --> 0:22:32.600 to get to that most perfect system. And they believed 0:22:33.119 --> 0:22:37.760 it just had fundamental inescapable flaws. They had different systems 0:22:37.800 --> 0:22:42.120 in mind. Now Minsky and Papert tried to push their 0:22:42.160 --> 0:22:44.680 systems forward, and I could do a full episode about 0:22:44.720 --> 0:22:48.800 them too, and their ideas were not bad. They were different. 0:22:49.160 --> 0:22:51.520 It was a different approach. But this also meant that 0:22:51.600 --> 0:22:54.520 researchers who had been pushing the development of artificial 0:22:54.560 --> 0:22:58.919 neural networks felt forced to move on to different projects 0:22:59.000 --> 0:23:03.600 because financial support for anything connected to the concept of 0:23:03.640 --> 0:23:09.120 neural networks effectively disappeared, right like funding just dropped for that. 0:23:09.200 --> 0:23:13.359 Because here you had these experts in computer science saying, yeah, 0:23:13.560 --> 0:23:19.159 this approach, while interesting, has already hit an insurmountable obstacle 0:23:19.200 --> 0:23:20.960 and it's not going to go any further. It's gone 0:23:21.000 --> 0:23:23.880 as far as it can go. 
And so a lot 0:23:23.920 --> 0:23:29.640 of computer scientists blamed Minsky and Papert for essentially demolishing 0:23:29.720 --> 0:23:33.680 funding for neural networks for more than a decade, and 0:23:33.680 --> 0:23:37.320 in fact, this would become an era that retrospectively, computer 0:23:37.400 --> 0:23:41.680 scientists would reference as the AI Winter. Got all Game 0:23:41.720 --> 0:23:44.800 of Thrones up in here. Now, in nineteen eighty two, 0:23:45.240 --> 0:23:49.200 there was a hint of spring thawing out that AI 0:23:49.240 --> 0:23:54.120 Winter. Researchers in Japan were starting to resurrect work on 0:23:54.280 --> 0:23:58.640 neural network projects, and meanwhile, a scientist named John Hopfield 0:23:59.080 --> 0:24:02.080 submitted a research paper to the National Academy of Sciences 0:24:02.560 --> 0:24:05.280 that brought neural networks back into discussion here in the 0:24:05.359 --> 0:24:10.800 United States. And because Japan was actively investing in developing 0:24:10.800 --> 0:24:15.000 that technology, institutions in the United States began to open 0:24:15.119 --> 0:24:17.359 up the purse strings a bit because there was a 0:24:17.400 --> 0:24:21.280 concern that if there were something to this artificial neural 0:24:21.320 --> 0:24:25.920 network concept, if in fact those obstacles weren't insurmountable, as 0:24:25.960 --> 0:24:30.480 Minsky and Papert had suggested, the US could potentially fall 0:24:30.720 --> 0:24:35.320 behind another country because it would fail to fund its development. So, 0:24:35.920 --> 0:24:38.760 in a desire not to have Japan take the ball 0:24:38.800 --> 0:24:41.439 and run with it, the United States began to invest 0:24:41.680 --> 0:24:45.479 again in artificial neural network research and development. In the 0:24:45.480 --> 0:24:50.920 mid nineteen eighties, computer scientists essentially rediscovered the usefulness of 0:24:51.480 --> 0:24:55.639 a process called back propagation. 
And I've already talked about 0:24:56.160 --> 0:24:58.159 nodes and weights and stuff, but this is going to 0:24:58.160 --> 0:25:00.479 require a little bit more explanation to under stand what 0:25:00.560 --> 0:25:03.760 back propagation is all about. So let's kind of try 0:25:03.800 --> 0:25:07.560 to visualize a neural network. So you've got your input nodes. 0:25:07.920 --> 0:25:10.240 Just think of a bunch of circles. If you were 0:25:10.359 --> 0:25:12.160 drawing it from top to bottom, this would be your 0:25:12.200 --> 0:25:15.679 top layer. This is like the funnels where you're going 0:25:15.760 --> 0:25:19.639 to feed data into the system. Now you've got a 0:25:19.640 --> 0:25:21.400 whole bunch of these at the top and they can 0:25:21.440 --> 0:25:25.240 accept the data that you're feeding in. They process that data, 0:25:25.640 --> 0:25:30.480