WEBVTT - Ep147 "Can we engineer human thought?" with Tom Griffiths 0:00:05.120 --> 0:00:07.440 AI seems like it burst out of the gate a 0:00:07.480 --> 0:00:10.920 few years ago, But is it actually the latest chapter 0:00:11.119 --> 0:00:16.040 in a three hundred year trajectory to turn thought into math? 0:00:16.680 --> 0:00:19.400 Can the mind be captured with equations? 0:00:19.960 --> 0:00:23.400 Why do current AI models need petabytes of data but 0:00:23.480 --> 0:00:26.759 a child can learn from just a few examples. Why 0:00:26.760 --> 0:00:31.440 does AI have jagged intelligence, meaning it looks brilliant in 0:00:31.480 --> 0:00:36.080 one moment and then it does something totally nonsensical. In physics, 0:00:36.440 --> 0:00:39.479 we have various laws, like the law of gravity or 0:00:39.520 --> 0:00:43.320 the laws of motion, And today we're joined by cognitive 0:00:43.360 --> 0:00:47.080 scientist Tom Griffiths from Princeton to talk about whether we 0:00:47.159 --> 0:00:55.400 are moving towards nailing down laws of thought. Welcome to 0:00:55.400 --> 0:00:58.880 Inner Cosmos with me David Eagleman. I'm a neuroscientist and 0:00:58.920 --> 0:01:02.400 author at Stanford, and in these episodes we sail deeply 0:01:02.440 --> 0:01:05.960 into our three pound universe to understand why and how. 0:01:05.800 --> 0:01:07.880 Our lives look the way they do. 0:01:23.920 --> 0:01:27.000 One thing that distinguishes Homo sapiens from all our cousins 0:01:27.040 --> 0:01:30.119 in the animal kingdom is that we watch the world 0:01:30.120 --> 0:01:34.160 around us and we try to abstract patterns from it. 
0:01:34.600 --> 0:01:37.679 For example, you might watch the way that a stone 0:01:37.760 --> 0:01:41.039 falls to the ground and maybe you see a tree 0:01:41.120 --> 0:01:44.200 branch fall, and maybe you see a glacier and one 0:01:44.240 --> 0:01:46.959 day a huge wall of ice falls off it, and 0:01:47.040 --> 0:01:51.000 pretty soon you start seeing an underlying similarity to the 0:01:51.040 --> 0:01:54.600 way that things move. And eventually someone very, very smart 0:01:54.680 --> 0:01:58.360 comes along, like Isaac Newton, and summarizes all this in 0:01:58.600 --> 0:02:02.080 the law of gravity. And then the same smart guy, 0:02:02.160 --> 0:02:05.600 Newton, comes up with the three laws of motion. And 0:02:05.640 --> 0:02:08.560 then another smart person is Einstein. He figures out the 0:02:08.919 --> 0:02:12.840 conservation of mass and energy, which seems to be another 0:02:13.040 --> 0:02:16.919 ironclad law, and then we have the laws of thermodynamics 0:02:16.960 --> 0:02:20.320 and electrostatic laws, and all of this speaks to the 0:02:20.360 --> 0:02:24.079 great success that we've had as a species in figuring 0:02:24.120 --> 0:02:28.160 out the lowest level of code that's running in the universe. 0:02:28.880 --> 0:02:32.120 But for most of human history, the concept of a 0:02:32.600 --> 0:02:35.840 thought has felt like the most intimate thing we experience 0:02:35.960 --> 0:02:38.560 and the least tractable thing to study. 0:02:39.200 --> 0:02:41.840 What a thought is and how it occurs, 0:02:42.280 --> 0:02:46.200 that seems to live in a different category of mystery 0:02:46.240 --> 0:02:49.560 from how an object falls. Why? Well, it's because the 0:02:49.639 --> 0:02:53.600 thought pops into your head and somehow it carries memory 0:02:53.680 --> 0:02:57.760 and expectation and language and often a feeling. But it 0:02:57.840 --> 0:03:02.440 feels vaporous and private.
It feels like the one thing 0:03:02.960 --> 0:03:07.880 that will forever escape formal description. But what's interesting is 0:03:07.919 --> 0:03:11.680 that for centuries people have tried. There's always been a 0:03:11.800 --> 0:03:13.399 deep human urge 0:03:13.120 --> 0:03:16.480 to ask whether thought has laws to it. 0:03:16.560 --> 0:03:19.720 In other words, does the mind have principles that you 0:03:19.760 --> 0:03:23.760 can write down? Does reasoning have a grammar to it? 0:03:24.120 --> 0:03:27.960 Can you describe intelligence in a language that's precise enough 0:03:28.360 --> 0:03:31.440 that once you understand the rules, you can begin to 0:03:31.480 --> 0:03:35.800 build with them, like build artificial intelligence? Most of us 0:03:35.840 --> 0:03:38.400 are old enough to remember that this question of AI 0:03:38.960 --> 0:03:42.840 once lived in philosophy seminars and math departments, but now 0:03:42.880 --> 0:03:45.320 it's sitting at the center of our economy. 0:03:46.280 --> 0:03:48.200 Okay, so what is thought? 0:03:48.440 --> 0:03:52.640 Can we capture it in formal systems like laws or equations? 0:03:53.040 --> 0:03:57.600 Do different parts of intelligence come from logic, from learning, 0:03:57.720 --> 0:04:03.840 from uncertainty, from memory, from prior knowledge, from living inside bodies, 0:04:03.920 --> 0:04:06.120 from living inside our cultures, 0:04:06.640 --> 0:04:08.040 from the particular 0:04:07.600 --> 0:04:11.120 constraints of being a human animal with a short lifespan 0:04:11.520 --> 0:04:15.320 and limited bandwidth? Our guest today is someone who lives 0:04:15.400 --> 0:04:19.240 right at the intersection of all these questions. Tom Griffiths 0:04:19.320 --> 0:04:22.880 is a professor at Princeton, where he directs the Computational 0:04:22.960 --> 0:04:27.680 Cognitive Science Lab and the Princeton Laboratory for Artificial Intelligence.
0:04:28.160 --> 0:04:31.719 He has spent years asking how minds work through the 0:04:31.839 --> 0:04:35.760 different lenses of math and computation and learning. And he's 0:04:35.760 --> 0:04:38.919 the author of a wonderful new book called The Laws 0:04:38.960 --> 0:04:43.719 of Thought, which traces the long history of thinkers asking, 0:04:44.080 --> 0:04:46.719 are there rules to this? Can we understand what human 0:04:46.760 --> 0:04:50.839 thinking is? In his book we get the lengthy arc 0:04:51.040 --> 0:04:55.839 of minds trying to understand mind. This begins millennia ago 0:04:55.920 --> 0:05:00.520 with Aristotle, who wondered whether logic itself could be mathematized, 0:05:00.960 --> 0:05:05.520 and Tom follows the trail through the architects of symbolic reasoning, 0:05:05.960 --> 0:05:09.240 through the birth of computation, through the rise of neural networks, 0:05:09.640 --> 0:05:13.680 through the realization that probability theory might serve as a 0:05:13.800 --> 0:05:17.719 language for our beliefs about things. Along the way, in 0:05:17.760 --> 0:05:20.880 his book, a picture emerges that there may not be 0:05:21.320 --> 0:05:24.440 just a single tool for capturing the mind, but instead 0:05:24.440 --> 0:05:27.760 there are different ways of trying to tackle the problem, 0:05:28.120 --> 0:05:32.520 and each one sheds light on a different aspect of cognition. 0:05:33.240 --> 0:05:36.360 So we're going to talk about ourselves, human minds, and 0:05:36.440 --> 0:05:40.160 we'll talk about AI: what kind of intelligence is this 0:05:40.440 --> 0:05:47.480 and what is missing? Here's my interview with Tom Griffiths. 0:05:48.160 --> 0:05:50.320 As soon as you turn thought into math, it becomes 0:05:50.320 --> 0:05:52.480 something that machines would be able to do.
And so 0:05:52.600 --> 0:05:57.160 our modern AI systems are really a consequence of, you know, 0:05:57.200 --> 0:06:00.520 that thought that people were having hundreds of years ago, 0:06:01.080 --> 0:06:04.160 of being able to turn thought into something that can 0:06:04.160 --> 0:06:05.800 be expressed in mathematical terms. 0:06:05.920 --> 0:06:07.440 And so one of the things that I loved about 0:06:07.440 --> 0:06:09.400 your book, by the way, is that you really tell 0:06:09.560 --> 0:06:11.640 stories of all the thinkers. 0:06:12.240 --> 0:06:14.880 You dive into the lives, you tell them with real color. 0:06:15.160 --> 0:06:17.440 If you were going to start with one thinker that 0:06:17.480 --> 0:06:19.160 you think is the most important, who would that be? 0:06:19.400 --> 0:06:21.160 There are a couple of people who have this sort 0:06:21.160 --> 0:06:23.960 of enduring influence throughout the book. One of them is Leibniz, 0:06:24.040 --> 0:06:26.680 who kind of started this enterprise in some sense. He 0:06:26.760 --> 0:06:30.360 was really trying to take the idea of logic as 0:06:30.400 --> 0:06:33.280 expressed by Aristotle and turn it into math, but ultimately 0:06:33.279 --> 0:06:35.960 failed in doing that. But along the way he also 0:06:36.120 --> 0:06:38.799 discovered the calculus, which turned out to be really important 0:06:38.800 --> 0:06:41.440 when people wanted to make neural networks that could learn 0:06:41.520 --> 0:06:44.480 from data. It turns out that the trick for doing 0:06:44.560 --> 0:06:46.840 that is actually a trick that Leibniz had figured out 0:06:46.880 --> 0:06:50.599 all that time ago. And then another key figure here, 0:06:50.760 --> 0:06:53.320 as might be suggested by the title of the book, 0:06:53.400 --> 0:06:57.800 is George Boole, who was a nineteenth century mathematician.
He 0:06:57.920 --> 0:06:59.600 was a school teacher for most of his life and 0:06:59.680 --> 0:07:01.600 did a lot of like serious math on the side 0:07:01.640 --> 0:07:03.680 and sort of, you know, had a big effect on the 0:07:03.720 --> 0:07:08.080 history of mathematics. But he was really the person who 0:07:08.160 --> 0:07:11.560 then first solved that problem that Leibniz had posed. And 0:07:11.600 --> 0:07:15.480 in addition to the impact that that work had, he's 0:07:15.560 --> 0:07:19.000 also the great grandfather of Geoff Hinton, who was one 0:07:19.000 --> 0:07:21.400 of the people who played an important role in developing 0:07:21.440 --> 0:07:24.040 these algorithms for learning in neural networks. And so 0:07:24.200 --> 0:07:26.240 you could make an argument that without Boole we would 0:07:26.240 --> 0:07:29.480 be a fair way back from where we are today. 0:07:29.920 --> 0:07:30.080 You know, 0:07:30.200 --> 0:07:33.160 interestingly, when most people think about Boole, they only know 0:07:33.320 --> 0:07:38.360 about Boolean numbers. They know about zero and one binary numbers, 0:07:38.680 --> 0:07:41.520 and that's essentially the extent of the thing. But he 0:07:41.560 --> 0:07:43.760 was quite celebrated in his life, right, even though he 0:07:43.840 --> 0:07:48.440 was a headmaster and not formally involved as a professor. 0:07:48.560 --> 0:07:51.000 Am I correct about this? He nonetheless was quite recognized 0:07:51.000 --> 0:07:51.800 as a mathematician. 0:07:52.400 --> 0:07:55.080 Yeah, he became a university professor later in his life, 0:07:55.120 --> 0:07:57.080 but spent most of his life as a teacher and 0:07:57.120 --> 0:08:01.080 a headmaster. But yeah, he won a gold medal 0:08:01.080 --> 0:08:04.080 in mathematics from the Royal Society.
Was a very prestigious award, 0:08:05.080 --> 0:08:08.960 and you know, was this amazing person who was having 0:08:09.000 --> 0:08:11.880 these high level correspondences with the leading mathematicians of the 0:08:11.960 --> 0:08:16.360 day while holding down his job running a small school. 0:08:16.960 --> 0:08:17.640 Yeah. 0:08:17.800 --> 0:08:22.840 Now, in the book, you essentially use three different frameworks. 0:08:22.920 --> 0:08:26.440 What phenomenon does each framework explain 0:08:26.560 --> 0:08:27.960 unusually well? The 0:08:27.840 --> 0:08:29.880 three frameworks I talk about in the book are what 0:08:29.920 --> 0:08:32.360 I call rules and symbols, which is what we've been 0:08:32.400 --> 0:08:34.800 talking about. This kind of like approach that stems out 0:08:34.840 --> 0:08:37.120 of logic, where the idea is that you're going to 0:08:37.200 --> 0:08:39.719 be able to write down some rules that characterize the 0:08:39.720 --> 0:08:41.960 structure of thought, and by following those rules, you end 0:08:42.000 --> 0:08:47.319 up with interesting consequences. The second approach is networks, features 0:08:47.320 --> 0:08:47.840 and spaces. 0:08:47.920 --> 0:08:48.079 Right. 0:08:48.120 --> 0:08:50.640 This is neural networks, which you can kind of think 0:08:50.640 --> 0:08:54.160 about as a system for doing computation when you start 0:08:54.200 --> 0:08:56.839 representing things as points in a space. Right, so if 0:08:56.840 --> 0:09:01.600 you start to think about, you know, every object that 0:09:01.640 --> 0:09:03.360 you could see in the world as not being something 0:09:03.400 --> 0:09:05.920 that's described by rules, but being described by a location 0:09:06.040 --> 0:09:08.560 along some dimensions, you need to have a way of 0:09:08.600 --> 0:09:11.400 talking about how to map between those spaces, and 0:09:11.440 --> 0:09:14.120 neural networks solve that problem.
And then the third is 0:09:14.960 --> 0:09:20.439 probability and statistics. And probability theory is really powerful because 0:09:20.720 --> 0:09:24.120 it is the complement to logic, where logic tells us 0:09:24.160 --> 0:09:26.120 how to go from things that we know to be 0:09:26.200 --> 0:09:28.959 true to other things that we're equally certain are true. 0:09:29.360 --> 0:09:31.680 Probability theory tells us what to do when we're uncertain. 0:09:32.160 --> 0:09:34.920 So if we get some information we want to draw 0:09:34.920 --> 0:09:37.319 a conclusion, but we're not able to draw that conclusion 0:09:37.360 --> 0:09:40.559 with perfect certainty, probability theory tells us how to do that, 0:09:40.920 --> 0:09:44.000 and it tells us how to combine our sort of 0:09:44.400 --> 0:09:49.320 background beliefs, the other sources of information we have, our 0:09:49.360 --> 0:09:52.120 biases, in with the data that we see in a 0:09:52.120 --> 0:09:54.160 way that helps us to explain how it's possible to 0:09:54.240 --> 0:09:56.440 learn from small amounts of data. And that's one thing 0:09:56.480 --> 0:09:59.199 which is still something that discriminates human learning from the 0:09:59.280 --> 0:10:01.120 learning that's done by AI systems today. 0:10:01.520 --> 0:10:03.560 Okay, great, so we're going to dive into each of 0:10:03.600 --> 0:10:06.560 these three lenses. But just before we do, do you 0:10:06.640 --> 0:10:11.680 see the AI conversation today over-indexing on one of 0:10:11.720 --> 0:10:13.160 these lenses over the others? 0:10:14.760 --> 0:10:17.520 I think there's a lot of emphasis on neural networks, 0:10:17.559 --> 0:10:21.240 which are fundamentally the sort of engineering technology which is 0:10:21.280 --> 0:10:25.360 making possible the creation of our chatbots and the other 0:10:25.520 --> 0:10:29.600 sort of big AI systems that are deployed. I think
I think 0:10:30.200 --> 0:10:34.360 that potentially misses out the importance of these other threads 0:10:34.559 --> 0:10:37.679 right where. One thing that's important to remember is that 0:10:37.720 --> 0:10:40.680 those neural networks are being trained on what is essentially 0:10:40.720 --> 0:10:43.200 a system of rules and symbols. They're being trained on 0:10:43.720 --> 0:10:47.480 human language, which is symbolic and rule like in various ways, 0:10:48.160 --> 0:10:50.680 and they're being trained on code, which is even more 0:10:50.720 --> 0:10:54.240 symbolic and even more rule like, And those things together 0:10:54.280 --> 0:10:56.559 provide some of the substrate for developing the kind of 0:10:56.600 --> 0:10:59.280 intelligence that they demonstrate. And then the way that they're 0:10:59.320 --> 0:11:03.079 trained is by learning to predict the next token, right, 0:11:03.120 --> 0:11:05.080 the next word or part of word, based on what 0:11:05.080 --> 0:11:07.760 they've seen so far. And that way of training them 0:11:07.800 --> 0:11:12.040 is actually using probability theory. So that's a probabilistic problem 0:11:12.040 --> 0:11:14.079 because you're making a guess about what the next thing 0:11:14.120 --> 0:11:15.920 is going to be based on the things that you see, 0:11:16.200 --> 0:11:18.520 and so that's an important ingredient in their success as well, 0:11:18.559 --> 0:11:22.000 is that they're essentially learning to approximate a big probability distribution. 0:11:22.360 --> 0:11:25.079 So let's dive into the first one, rules and symbols. 0:11:25.280 --> 0:11:28.600 So take us back to the original urge. Why did 0:11:28.679 --> 0:11:33.559 early thinkers believe that this could be used to explain thinking. 
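The next-token training described in the answer above can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses simple bigram counts rather than a neural network, and the tiny corpus and the function names `train_bigram` and `next_token_distribution` are hypothetical, chosen for illustration. It shows the one idea from the conversation that predicting the next token amounts to estimating a probability distribution over what comes next.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


# Hypothetical toy corpus; real systems train on vastly more text.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")

# The guess for the next token is the most probable continuation.
best_guess = max(dist, key=dist.get)
```

A large language model replaces the count table with a neural network that conditions on far more context, but its output for each position is still a distribution of this kind.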
0:11:35.200 --> 0:11:38.240 I think a lot of the draw of rules and 0:11:38.280 --> 0:11:41.760 symbols was that that really was, in some way, what 0:11:42.120 --> 0:11:45.400 mathematics was to people, right? So Leibniz, part of the 0:11:45.400 --> 0:11:47.400 reason why he wasn't able to solve this problem of 0:11:47.440 --> 0:11:50.160 figuring out how to turn thought into math is that 0:11:50.679 --> 0:11:53.560 what he thought math was, or the kind of math 0:11:53.600 --> 0:11:55.440 that he was trying to use to solve that problem, 0:11:56.000 --> 0:11:59.080 was really arithmetic, right? And arithmetic was kind of like 0:11:59.120 --> 0:12:01.560 the model that they had for a mathematical system. So 0:12:01.600 --> 0:12:04.120 you can think about ideas being added together or subtracting 0:12:04.120 --> 0:12:07.080 one idea from another, and really thinking about the operators 0:12:07.120 --> 0:12:09.160 that you're using as being the things that are sort 0:12:09.160 --> 0:12:11.280 of coming from this familiar mathematical language. 0:12:11.360 --> 0:12:13.640 And so I think part of 0:12:13.640 --> 0:12:16.240 the reason that we end up with that approach is 0:12:16.280 --> 0:12:19.120 because of the kind of math that has been successful 0:12:19.120 --> 0:12:22.920 in other settings, right, where we need to do arithmetic 0:12:23.040 --> 0:12:25.320 to, you know, that's a good description of certain 0:12:25.160 --> 0:12:26.320 kinds of things that human minds do. 0:12:27.720 --> 0:12:30.120 Boole had the insight that you needed a different kind 0:12:30.120 --> 0:12:32.640 of algebra in order to describe thought, and then that's 0:12:32.679 --> 0:12:36.200 what leads to modern mathematical logic. But it's still in 0:12:36.240 --> 0:12:39.560 this kind of symbolic language, although Boole also talked about 0:12:39.559 --> 0:12:42.840 probability theory as being important for capturing languages as well.
0:12:42.920 --> 0:12:45.360 So I think it's really more about what are the 0:12:45.440 --> 0:12:48.640 kinds of mathematical systems that it was sort of straightforward 0:12:48.640 --> 0:12:51.199 to formalize, and that gave us something that we could 0:12:51.240 --> 0:12:53.600 try to map thought onto. And that's what we do 0:12:53.679 --> 0:12:58.320 as scientists, is often taking mathematical systems that mathematicians have 0:12:58.360 --> 0:13:00.840 defined for us and then saying, oh, I think this 0:13:01.000 --> 0:13:03.800 mathematical system maps onto the thing that I want to understand, 0:13:04.200 --> 0:13:06.400 and so trying to establish that correspondence, and that 0:13:06.520 --> 0:13:08.199 then allows us to derive its consequences. 0:13:09.160 --> 0:13:12.840 So speaking of rules and symbols, thinkers like Newell 0:13:12.920 --> 0:13:17.040 and Simon, they popularized this idea of goals and subgoals. 0:13:17.480 --> 0:13:21.920 What did that viewpoint get exactly right about human problem solving? 0:13:23.440 --> 0:13:26.760 So now we're fast forwarding a bit, right, from we 0:13:26.840 --> 0:13:30.800 have Boole figuring out the structure of logic. That turns 0:13:30.840 --> 0:13:32.480 into, you know, lots of people then sort of turn 0:13:32.520 --> 0:13:34.840 that into a sort of mature theory of logic. You 0:13:34.920 --> 0:13:39.160 get Alan Turing kind of turning this into a theory of computation, 0:13:39.480 --> 0:13:41.960 thinking about what an abstract mathematician is doing when they're 0:13:42.000 --> 0:13:43.880 doing something like logic, and thinking about how you can 0:13:43.920 --> 0:13:48.240 make a machine do that.
And then we have people 0:13:48.520 --> 0:13:51.560 starting to realize that, you know, as digital computers are 0:13:51.559 --> 0:13:55.560 being developed, maybe those provide a good model for how 0:13:55.600 --> 0:13:59.880 thinking works in general, and then trying to use a 0:14:00.000 --> 0:14:03.200 computer as a sort of foundation for, you know, thinking 0:14:03.200 --> 0:14:05.280 about things like how people might solve problems. And so 0:14:06.040 --> 0:14:09.800 Allen Newell and Herbert Simon were influential cognitive scientists who 0:14:10.600 --> 0:14:14.679 did exactly that. They had this idea that maybe there 0:14:14.760 --> 0:14:17.400 is a way that you could make computers smarter by 0:14:17.440 --> 0:14:20.720 using insights from human cognition, but also get a better 0:14:20.800 --> 0:14:23.160 understanding of what humans are doing when they're solving problems 0:14:23.200 --> 0:14:25.400 by using the sort of ideas that come from things 0:14:25.440 --> 0:14:29.360 like computer programming, and so they set up, you know, this, 0:14:29.640 --> 0:14:31.240 you know, when we're trying to solve a problem or 0:14:31.240 --> 0:14:33.360 prove a mathematical theorem or play a game of chess, 0:14:33.720 --> 0:14:35.520 they set this up as a problem of searching through 0:14:35.560 --> 0:14:40.600 a tree of possibilities, where what you're doing is making choices, 0:14:40.880 --> 0:14:42.480 and then each of those choices gives you a new 0:14:42.520 --> 0:14:44.240 set of choices, and each of those choices gives you 0:14:44.240 --> 0:14:46.360 a new set of choices, and the hard thing is 0:14:46.880 --> 0:14:49.160 finding a path through those choices that leads to the 0:14:49.160 --> 0:14:51.160 point that you want to end up at. And so 0:14:51.720 --> 0:14:54.120 that's something where you can take inspiration from how human 0:14:54.120 --> 0:14:57.720 mathematicians solve problems.
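The idea of searching through a tree of possibilities can be sketched in a few lines. This is a minimal illustration, not Newell and Simon's actual programs: it uses plain breadth-first search, and the toy problem (reach 10 from 1 using "add1" and "double"), the function name `search`, and the `max_depth` limit are all assumptions made for the example.

```python
from collections import deque


def search(start, goal, moves, max_depth=10):
    """Breadth-first search through a tree of choices.

    Returns the list of move names leading from start to goal,
    or None if no path is found within max_depth moves.
    """
    frontier = deque([(start, [])])  # (state, path of move names so far)
    seen = {start}                   # states already reached by some path
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        # Each choice leads to a new state, which offers new choices.
        for name, apply_move in moves:
            nxt = apply_move(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None


# Hypothetical toy problem: two available choices at every step.
moves = [("add1", lambda x: x + 1), ("double", lambda x: x * 2)]
path = search(1, 10, moves)
```

Exhaustive search like this becomes expensive as the tree grows, which is where heuristics such as working backwards from the goal come in.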
You can take inspiration from the kind 0:14:57.760 --> 0:14:59.960 of you know, tricks like working backwards from the end 0:15:00.080 --> 0:15:03.320 towards the start. Right, Those were principles that they were 0:15:03.360 --> 0:15:05.280 able to use to try and explain these aspects of 0:15:05.360 --> 0:15:07.800 human cognition as well as making the machines work better. 0:15:08.040 --> 0:15:09.960 Okay, but then one of the things that happened is 0:15:10.000 --> 0:15:13.120 that at least one of these attempts had ballooned into 0:15:13.200 --> 0:15:17.320 twenty five million rules. And so what does that teach 0:15:17.400 --> 0:15:20.520 us about the shape of human intelligence. 0:15:21.960 --> 0:15:23.800 This rules and symbols enterprise. 0:15:24.040 --> 0:15:24.160 Right. 0:15:24.240 --> 0:15:27.000 The sort of appeal that this had was that maybe 0:15:27.040 --> 0:15:27.520 one day. 0:15:27.400 --> 0:15:29.560 You could just write down all of the rules that 0:15:29.560 --> 0:15:31.720 you need to write down, and then you've characterized how 0:15:31.760 --> 0:15:34.320 intelligence works. Right, So it's just a matter of getting 0:15:34.400 --> 0:15:37.440 enough rules in a way that's very reminiscent today, right 0:15:37.520 --> 0:15:40.920 of you know, the way that our modern AI systems 0:15:40.920 --> 0:15:43.520 are being made is by training them on more and 0:15:43.600 --> 0:15:46.640 more data, right, feeding in more and more language. There 0:15:46.680 --> 0:15:48.520 was a hope that you could just like, yeah, like 0:15:48.800 --> 0:15:50.760 document all of the rules that you need to capture 0:15:51.080 --> 0:15:54.360 the structure of human knowledge. 
And so that led to 0:15:54.960 --> 0:15:57.240 you know, companies being started to try and engage in 0:15:57.280 --> 0:16:01.880 that enterprise, ultimately, I would say, unsuccessfully, but giving us 0:16:01.920 --> 0:16:06.440 some kind of characterization of like particular subsets of human knowledge. 0:16:06.520 --> 0:16:08.640 And so I think the thing that came out of 0:16:08.920 --> 0:16:12.280 that enterprise was revealing that maybe you need something more 0:16:12.360 --> 0:16:16.600 than just rules, right, that maybe thinking about logic as 0:16:16.640 --> 0:16:19.480 a basis for our model of intelligence was missing something. 0:16:20.040 --> 0:16:22.240 It's an approach that worked really well for certain kinds 0:16:22.280 --> 0:16:26.200 of problems like doing arithmetic, playing games like chess, but 0:16:26.240 --> 0:16:28.520 it didn't work very well for other kinds of problems 0:16:28.640 --> 0:16:31.520 like figuring out what you're seeing in the world, or 0:16:31.880 --> 0:16:34.360 actually learning language or these other kinds of things. 0:16:34.400 --> 0:16:37.080 And so this is what leads to your second lens, 0:16:37.240 --> 0:16:40.800 which is neural networks. And you talk about these as 0:16:40.880 --> 0:16:44.080 having, you know, a boom and bust history. So, first, 0:16:44.120 --> 0:16:46.720 what happened in the last decade that allowed them to 0:16:46.760 --> 0:16:48.880 turn into the dominant paradigm? 0:16:49.320 --> 0:16:53.440 The big breakthrough in the last decade was really about 0:16:53.760 --> 0:16:56.680 being able to make bigger neural networks that could 0:16:56.720 --> 0:17:01.840 be trained on more data in a way that could scale, right, 0:17:01.880 --> 0:17:05.159 and so bigger here means what these are. An artificial 0:17:05.240 --> 0:17:08.840 neural network is a set of units that are communicating 0:17:08.840 --> 0:17:13.120 with one another.
They're communicating along weighted connections, a sort 0:17:13.160 --> 0:17:15.639 of, you know, imagine like how neurons are connected in 0:17:15.680 --> 0:17:17.840 your brain, and those neurons are connected to one another 0:17:17.880 --> 0:17:20.359 and sending each other signals. An artificial neural network is 0:17:20.400 --> 0:17:23.719 basically simulating that kind of structure inside a computer. And 0:17:23.760 --> 0:17:26.679 so for a long time, the sort of the history 0:17:26.680 --> 0:17:30.040 of neural networks has been one of people figuring out 0:17:30.119 --> 0:17:33.880 how to make bigger neural networks work. So the very 0:17:33.920 --> 0:17:37.800 first, you know, learning neural networks, they had a learning 0:17:37.840 --> 0:17:40.719 algorithm that worked for one layer of weights, and then 0:17:40.760 --> 0:17:42.560 there was a breakthrough in the nineteen eighties that meant 0:17:42.600 --> 0:17:44.280 now you had a learning algorithm that could work for 0:17:44.359 --> 0:17:46.600 multiple layers of weights, but it didn't work for very 0:17:46.760 --> 0:17:48.920 deep neural networks with lots of layers of weights 0:17:48.960 --> 0:17:52.160 because, I can sort of explain the technical reasons 0:17:52.200 --> 0:17:54.159 behind it, but you know, sort of like the basic 0:17:54.160 --> 0:17:57.760 algorithm didn't quite work.
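A learning algorithm for one layer of weights can be illustrated with the classic perceptron update rule. This is a hedged sketch: the conversation does not name the specific algorithm, so the perceptron is an assumption here, and the AND task, the learning rate, and the function names `predict` and `train_perceptron` are hypothetical choices for the example.

```python
def predict(weights, bias, inputs):
    """One unit: weighted sum of inputs plus bias, thresholded at zero."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, lr=1.0):
    """Perceptron rule: nudge each weight in proportion to the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Hypothetical task: learn logical AND from its four input/output pairs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
outputs = [predict(weights, bias, inputs) for inputs, _ in AND]
```

A single layer like this can only learn linearly separable functions. Training multiple layers requires propagating the error backwards through the network, which relies on the calculus mentioned earlier in the conversation.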
And so the big breakthroughs of 0:17:57.840 --> 0:18:00.000 the last, you know, ten to fifteen years have been 0:18:00.080 --> 0:18:03.480 about coming up with ways to take those algorithms and 0:18:03.520 --> 0:18:05.440 actually make them work for neural networks that are bigger 0:18:05.480 --> 0:18:07.679 and bigger and deeper and deeper, that are able to 0:18:07.840 --> 0:18:11.840 easily learn more complex functions and can do so from 0:18:12.160 --> 0:18:14.879 massive amounts of data in a way that means that 0:18:14.920 --> 0:18:18.000 they're able to discover sort of complex relationships between things 0:18:18.000 --> 0:18:19.960 that are necessary to produce intelligent behavior. 0:18:20.280 --> 0:18:24.080 And so, what do these neural networks capture about cognition 0:18:24.760 --> 0:18:29.200 that symbols missed, especially in terms of things like similarity 0:18:29.240 --> 0:18:32.040 and fuzziness and graded concepts? 0:18:32.560 --> 0:18:35.000 Fuzziness is a really good way of describing it. It's 0:18:35.040 --> 0:18:39.080 that, you know, if you ask somebody, you know, whether 0:18:39.119 --> 0:18:41.800 something is a piece of furniture, they're going to say, 0:18:42.119 --> 0:18:43.840 you know, if you show them a chair, they'll say, yes, 0:18:43.920 --> 0:18:46.879 definitely a piece of furniture. If you show them a rug, 0:18:47.440 --> 0:18:51.119 they'll say, yeah, maybe a piece of furniture. Right, it 0:18:51.119 --> 0:18:52.960 doesn't sort of fit with our, you know, we sort 0:18:52.960 --> 0:18:56.240 of have a prototypical idea of what furniture is, which 0:18:56.240 --> 0:18:58.479 contains things like chairs and tables and ottomans and these 0:18:58.480 --> 0:19:02.119 other kinds of things, and then rugs and treadmills, and you 0:19:02.160 --> 0:19:04.040 know, like these are things that maybe 0:19:03.760 --> 0:19:06.320 are in this category, but maybe aren't, right.
And so 0:19:07.160 --> 0:19:09.520 we need to have a way of thinking about concepts 0:19:09.520 --> 0:19:11.760 that's not just the sort of yes or no, true 0:19:11.800 --> 0:19:14.760 or false, one or zero that logic would give us. 0:19:14.800 --> 0:19:16.840 We need to have something which has that fuzziness in it. 0:19:17.160 --> 0:19:19.399 One way of getting fuzziness is by thinking about a 0:19:19.480 --> 0:19:22.760 concept in terms of points in space, right, where you 0:19:22.760 --> 0:19:26.399 could think chairs are here in one location, rugs are 0:19:26.440 --> 0:19:29.040 here in another location, and maybe what it is to 0:19:29.040 --> 0:19:30.439 be a piece of furniture is to just be in 0:19:30.440 --> 0:19:32.560 some part of that space, and how close you are 0:19:32.600 --> 0:19:34.280 to that part of the space is like how good 0:19:34.320 --> 0:19:36.560 you are as an example of that kind of furniture. 0:19:37.359 --> 0:19:39.800 And so as soon as you think in those terms, 0:19:39.800 --> 0:19:42.160 you have a new problem, which is, with our rules 0:19:42.160 --> 0:19:44.400 and symbols, we knew how to do computation, we knew 0:19:44.400 --> 0:19:47.000 how to describe thinking. Thinking was a matter of applying 0:19:47.040 --> 0:19:49.240 the rules and sort of, you know, repeating that process. 0:19:50.040 --> 0:19:52.640 But we don't have a way of doing computation in spaces. 0:19:52.720 --> 0:19:54.439 And that's what neural networks give us. So you can 0:19:54.520 --> 0:19:58.760 kind of think about a space corresponding to the activation 0:19:58.920 --> 0:20:00.639 of the units inside this neural network. 0:20:00.680 --> 0:20:03.040 How much, you know, how much input 0:20:02.720 --> 0:20:05.600 each neural unit in that neural network is receiving, and 0:20:04.920 --> 0:20:08.600 how much of a response it's making, that characterizes some 0:20:08.680 --> 0:20:10.639 kind of space.
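The picture of concepts as regions of a space, with graded membership depending on closeness, can be sketched as follows. Everything specific here is an assumption for illustration: the two-dimensional coordinates, the single prototype point, the exponential falloff, and the function name `membership` are hypothetical and not taken from the conversation or the book.

```python
import math


def membership(point, prototype, scale=1.0):
    """Graded membership in (0, 1]: 1.0 at the prototype, decaying with distance."""
    return math.exp(-math.dist(point, prototype) / scale)


# Hypothetical 2-D feature space; the coordinates are made up for illustration.
furniture_prototype = (0.9, 0.8)
items = {
    "chair": (0.9, 0.9),
    "rug": (0.3, 0.4),
    "treadmill": (0.2, 0.1),
}

# Closer to the prototype means a better example of the category.
scores = {name: membership(p, furniture_prototype) for name, p in items.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Unlike a true-or-false rule, the score is a continuous value, so a chair can be a clear case while a rug or a treadmill is a borderline one.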
And then the neural network gives us a 0:20:10.640 --> 0:20:13.359 way of mapping from the inputs that it's getting to 0:20:13.440 --> 0:20:14.080 some output. 0:20:14.200 --> 0:20:16.040 So you could put in, you know, 0:20:16.000 --> 0:20:18.239 your picture of a chair, and it maps that to 0:20:18.240 --> 0:20:20.560 some point in space, and then it sort of 0:20:20.560 --> 0:20:23.440 produces an output that corresponds to, yes, this is 0:20:23.480 --> 0:20:26.080 a piece of furniture. And because those outputs can now 0:20:26.119 --> 0:20:29.199 be continuous values, you can capture the fuzziness and other 0:20:29.280 --> 0:20:31.360 kinds of things that you want for your concepts. 0:20:31.640 --> 0:20:34.960 And so, in what sense are these modern systems, these 0:20:35.040 --> 0:20:38.360 artificial neural networks, learning, and in what sense are they 0:20:38.400 --> 0:20:42.720 doing something that's maybe categorically different from how children learn? 0:20:44.520 --> 0:20:48.320 This is a fundamental question, right? That's the kind of 0:20:48.320 --> 0:20:50.440 thing that we cognitive scientists think about a lot, and 0:20:50.800 --> 0:20:53.080 I think that AI researchers are starting to care about 0:20:53.119 --> 0:20:55.359 a lot too, which is, you know, what are these 0:20:55.440 --> 0:20:59.720 sort of meaningful differences between human minds, human brains and 0:20:59.720 --> 0:21:01.680 what we're building in these AI systems or these sort 0:21:01.680 --> 0:21:07.240 of artificial brains. I think one very salient difference is 0:21:07.359 --> 0:21:10.399 the amount of data which is needed for a human 0:21:10.640 --> 0:21:12.639 to learn language compared to the amount of data you 0:21:12.640 --> 0:21:15.159 need to put into a neural network. So if you 0:21:15.200 --> 0:21:18.640 take a system like ChatGPT, right, one of these chatbots, 0:21:19.000 --> 0:21:21.639