WEBVTT - The Mystery at the Heart of ChatGPT 0:00:15.356 --> 0:00:25.836 Pushkin. Today's show is about known unknowns and unknown unknowns, 0:00:26.436 --> 0:00:30.876 which is to say, we're talking about AI, specifically a 0:00:30.956 --> 0:00:34.676 type of AI called a large language model, or an LLM. 0:00:35.196 --> 0:00:38.956 The most famous LLM is ChatGPT, but there are 0:00:38.996 --> 0:00:41.996 lots of others, and at their core, they all do 0:00:42.036 --> 0:00:44.796 the same thing. They read a piece of text and 0:00:44.836 --> 0:00:47.836 they predict what the next series of words should be. 0:00:48.436 --> 0:00:52.676 LLMs are, obviously and quite suddenly, a huge deal in 0:00:52.716 --> 0:00:55.476 a lot of ways. One thing about them that is 0:00:55.596 --> 0:00:59.556 particularly wild to me: LLMs behave in ways that are 0:00:59.556 --> 0:01:03.876 surprising even to the people who built them. In other words, 0:01:04.196 --> 0:01:08.716 large language models are this profoundly powerful, disruptive new thing, 0:01:09.316 --> 0:01:12.316 and right now we urgently need to figure out what 0:01:12.356 --> 0:01:19.996 they mean and how they work. I'm Jacob Goldstein and 0:01:20.036 --> 0:01:22.116 this is What's Your Problem, the show where I talk 0:01:22.196 --> 0:01:26.036 to people who are trying to make technological progress. My 0:01:26.076 --> 0:01:29.276 guest today is Sam Bowman. He's an expert in large 0:01:29.316 --> 0:01:33.396 language models, in LLMs. He's on the faculty at NYU, 0:01:33.516 --> 0:01:36.116 and he runs a research group at an AI company 0:01:36.116 --> 0:01:40.796 called Anthropic. All the recent talk about LLMs inspired Sam 0:01:40.836 --> 0:01:42.716 to write a paper to clear up what he thought 0:01:42.716 --> 0:01:46.036 were some misconceptions. The paper is called Eight Things to 0:01:46.116 --> 0:01:49.236 Know about Large Language Models. 
I am a fan of 0:01:49.356 --> 0:01:52.436 lists in general, and I loved this list in particular. 0:01:53.156 --> 0:01:55.356 Among other things, it gave me a deeper sense of 0:01:55.396 --> 0:01:58.356 the ways in which large language models are still a mystery, 0:01:58.556 --> 0:02:04.356 even to experts like Sam. That mystery, those unknowns, have 0:02:04.556 --> 0:02:08.156 important implications for the way we think about, and regulate 0:02:08.196 --> 0:02:11.796 and develop AI. We're going to start by discussing a 0:02:11.836 --> 0:02:15.596 pretty simple item on Sam's list. The item is this: 0:02:16.476 --> 0:02:20.916 brief interactions with LLMs are often misleading. You write this. 0:02:21.036 --> 0:02:26.316 You write, brief interactions with LLMs are often misleading. What's 0:02:26.356 --> 0:02:26.636 that mean? 0:02:27.236 --> 0:02:31.676 So when, especially when GPT-4 came out, and I 0:02:31.676 --> 0:02:35.636 guess also when ChatGPT first came out, there 0:02:35.676 --> 0:02:39.436 was very predictably this wave of people on Twitter saying, hey, 0:02:40.356 --> 0:02:44.276 this system is sentient and it knows where I live 0:02:44.556 --> 0:02:47.956 and it's ready to take over the world tomorrow, because 0:02:47.996 --> 0:02:50.516 they had one chat with it and it said that 0:02:50.556 --> 0:02:52.476 it was sentient and it made a few educated guesses 0:02:52.556 --> 0:02:55.236 that happened to be right, and you'll get other people 0:02:55.276 --> 0:02:58.236 on Twitter saying, hey, this system is dumb as bricks. 0:02:58.556 --> 0:03:02.076 I told it a really simple story and asked it 0:03:02.156 --> 0:03:03.556 what happened in the story and it got it wrong. 0:03:04.396 --> 0:03:06.396 There's a couple of things going on here. 
There's this 0:03:06.436 --> 0:03:08.236 great analogy that came up in a recent, I think, 0:03:08.236 --> 0:03:10.636 Time article by Helen Toner, saying they're basically 0:03:10.676 --> 0:03:14.716 improv players, where if you put them in some situation, 0:03:14.996 --> 0:03:18.276 if you put them in this situation of, oh, this 0:03:18.316 --> 0:03:20.476 is a conversation between a human who thinks the AI 0:03:20.516 --> 0:03:23.476 is sentient and the AI, then maybe the AI is 0:03:23.476 --> 0:03:24.316 going to say it's sentient. 0:03:24.436 --> 0:03:27.396 So specifically, they're improv players in the sense that famously 0:03:27.436 --> 0:03:30.076 in improv you're supposed to say yes to everything that 0:03:30.116 --> 0:03:34.476 your improv partner suggests, and so ChatGPT and the other 0:03:34.836 --> 0:03:37.436 LLMs are there to say yes, yes, and, and that's 0:03:37.476 --> 0:03:38.196 what's going on. 0:03:38.556 --> 0:03:41.756 That's a decent part of it. Yeah, they're going to 0:03:41.796 --> 0:03:43.396 say yes. They're going to go along with what you're 0:03:43.396 --> 0:03:46.036 doing if you make it clear what you expect. If 0:03:46.076 --> 0:03:48.356 you make it clear, like, what kind of narrative you're 0:03:48.396 --> 0:03:51.356 putting them in, what kind of environment you're putting them in, 0:03:51.396 --> 0:03:52.156 they'll go along with that. 0:03:52.556 --> 0:03:55.956 Uh, there are a couple of items on your list 0:03:56.436 --> 0:04:00.036 that seem directly contrary to assertions I've heard from other 0:04:00.076 --> 0:04:04.836 people about LLMs, so that's fun and exciting. One is: 0:04:06.436 --> 0:04:10.156 human performance on a task is not an upper bound 0:04:10.356 --> 0:04:12.076 on LLM performance. 
0:04:12.796 --> 0:04:15.316 So one of the reasons I think these systems can 0:04:16.356 --> 0:04:18.596 be better at a lot of tasks than humans is 0:04:18.676 --> 0:04:21.916 just that they've learned more stuff. They've read and 0:04:21.956 --> 0:04:24.756 mostly memorized, not just sort of all of the important 0:04:24.756 --> 0:04:27.916 papers in one little branch of chemistry or all of 0:04:27.916 --> 0:04:29.996 the important papers in all of chemistry. They've just read 0:04:29.996 --> 0:04:33.276 and mostly memorized sort of all of the research papers 0:04:33.036 --> 0:04:35.436 in everything. All of the papers in everything. 0:04:35.676 --> 0:04:38.756 Yeah, and many of the novels and many of the 0:04:38.996 --> 0:04:42.476 news stories. And even if these systems 0:04:42.476 --> 0:04:45.076 aren't really great at drawing connections between these and sort 0:04:45.076 --> 0:04:47.076 of synthesizing new knowledge out of them, they can 0:04:47.076 --> 0:04:49.756 do that a little bit. So you can sort of 0:04:49.756 --> 0:04:53.716 imagine what happens if you get someone who's not especially bright, 0:04:53.756 --> 0:04:59.156 but basically a reasonably intelligent, reasonably competent person who's just gotten 0:04:59.196 --> 0:05:01.236 a PhD in every single thing you can get a 0:05:01.236 --> 0:05:03.716 PhD in. I'd expect them to figure some things out 0:05:03.756 --> 0:05:05.836 and to be able to do some things that no 0:05:05.996 --> 0:05:09.956 one person can do, and probably they'll notice some things 0:05:09.956 --> 0:05:11.676 that'll be really hard for even a team or 0:05:11.676 --> 0:05:13.996 an organization to do, just because it's really important 0:05:13.996 --> 0:05:16.876 that it kind of is, in some sense, living in 0:05:16.916 --> 0:05:17.836 this one person's head. 0:05:18.036 --> 0:05:21.516 Let me just, like, lean into that one for a sec. 
0:05:22.196 --> 0:05:27.436 So do you think that in some amount of time 0:05:27.516 --> 0:05:32.516 in the next few years, say, an LLM will make 0:05:32.596 --> 0:05:37.556 some kind of, you know, breakthrough in knowledge, will figure 0:05:37.596 --> 0:05:39.876 something out that no human has ever figured out, that 0:05:39.956 --> 0:05:42.276 will be a meaningful breakthrough? 0:05:42.636 --> 0:05:45.596 Yeah, I think so. Almost by definition, I don't have 0:05:45.636 --> 0:05:46.956 a good guess of what that's going to look like 0:05:46.996 --> 0:05:47.676 or what that's going to be. 0:05:47.716 --> 0:05:50.836 Otherwise you'd be figuring it out right now, right? Yeah, yeah, yeah. 0:05:50.876 --> 0:05:53.076 But no, I can imagine some story like, hey, kind 0:05:53.076 --> 0:05:55.996 of a bunch of chemists in this field of chemistry 0:05:55.996 --> 0:05:59.076 have noticed this thing, and some biologists in this other 0:05:59.116 --> 0:06:01.356 subfield have noticed this other thing, and some doctors have 0:06:01.436 --> 0:06:04.076 noticed this third thing, and together they mean that some 0:06:05.076 --> 0:06:07.716 very unexpected kind of drug design might treat some new disease. 0:06:08.596 --> 0:06:12.436 And maybe if you had enough medical researchers trying enough 0:06:12.436 --> 0:06:15.276 different things, eventually they'd stumble into that. But it seems 0:06:15.276 --> 0:06:17.836 possible at some point that something like a large language 0:06:17.876 --> 0:06:20.196 model is just going to notice that, and if you 0:06:20.516 --> 0:06:21.876 ask it the right way, it's going to tell you, 0:06:22.836 --> 0:06:25.396 and you might have to second guess it a lot. 0:06:25.436 --> 0:06:28.036 These systems also make stuff up. But I think it's 0:06:28.076 --> 0:06:31.276 quite possible that you start seeing these things pretty often 0:06:31.356 --> 0:06:34.076 tell you surprising new things that happen to be true. 
0:06:34.716 --> 0:06:38.156 There's another item on your list that seems to me 0:06:38.276 --> 0:06:41.916 to be like a provocation. It seems to me in 0:06:41.916 --> 0:06:47.116 a good way. It seems like directly contradictory to what 0:06:47.156 --> 0:06:51.316 I have read, specifically to this idea that all large 0:06:51.396 --> 0:06:55.836 language models are doing is guessing what the next word 0:06:55.916 --> 0:06:58.676 in a series is likely to be, and that list 0:06:58.716 --> 0:07:04.836 item is this: LLMs often appear to learn and use 0:07:04.916 --> 0:07:09.036 representations of the outside world. LLMs often appear to learn 0:07:09.196 --> 0:07:13.556 and use representations of the outside world. So that sounds 0:07:13.636 --> 0:07:17.156 quite different from just guessing the next word. Is it, 0:07:16.956 --> 0:07:18.996 or is it not, different in a way that I 0:07:19.036 --> 0:07:19.916 just don't understand? 0:07:20.396 --> 0:07:23.156 It turns out it's not that different. Okay, this is, 0:07:24.316 --> 0:07:26.276 I want to say it's the big discovery, but it's 0:07:26.356 --> 0:07:29.796 this big discovery that's spread out over dozens of experiments 0:07:30.156 --> 0:07:31.276 over the last few years. 0:07:31.956 --> 0:07:34.076 Can you give me a specific example? It's such an 0:07:34.076 --> 0:07:38.596 abstract assertion that I think it would be helpful to 0:07:38.716 --> 0:07:40.996 have a specific example 0:07:40.836 --> 0:07:44.236 that we can think about. One great example of this 0:07:44.516 --> 0:07:48.196 is if you tell a model a story, a simple 0:07:48.236 --> 0:07:51.116 story that takes place in some sort of physical space, 0:07:51.156 --> 0:07:54.556 where it's some characters walking around a house and 0:07:54.596 --> 0:07:56.756 they're having a conversation while they're walking, and they're picking 0:07:56.756 --> 0:07:59.956 stuff up and they're putting it down. 
You can see 0:08:00.036 --> 0:08:03.916 inside the activations of the neurons when the model is 0:08:03.956 --> 0:08:06.236 reading that story. You can pull out a map of 0:08:06.236 --> 0:08:09.356 the house. You can see that there's a 0:08:09.396 --> 0:08:11.396 piece of the network that says, oh, okay, now they're in 0:08:11.436 --> 0:08:13.556 the living room, and another piece that says, oh, the living 0:08:13.596 --> 0:08:17.876 room is connected to the bedroom. And you can mess 0:08:17.876 --> 0:08:19.396 with this in ways that show that it 0:08:19.436 --> 0:08:23.316 really is representing the house. If you 0:08:23.356 --> 0:08:25.996 find the piece of the network that says, oh, Susan 0:08:26.076 --> 0:08:29.916 is in the living room, and you flip that 0:08:29.916 --> 0:08:32.436 from a positive number to a negative number, then 0:08:32.676 --> 0:08:35.716 the story will continue as though Susan is not in 0:08:35.716 --> 0:08:37.236 the living room, or couldn't possibly have 0:08:37.236 --> 0:08:37.796 been in the living room. 0:08:37.676 --> 0:08:41.636 So that does seem like it's representing the physical world 0:08:41.716 --> 0:08:45.116 in a way that is not just guessing the next word. 0:08:45.876 --> 0:08:50.156 Yeah. Yeah, so we're finding out these systems are actually 0:08:50.196 --> 0:08:52.676 representing the objects they're talking about, at least some of 0:08:52.716 --> 0:08:52.996 the time. 0:08:53.156 --> 0:08:55.796 They're creating a representation of physical space. 0:08:56.356 --> 0:08:58.796 Yeah. I should be clear that this doesn't 0:08:58.836 --> 0:09:02.956 always work. When you're giving these systems something really 0:09:03.036 --> 0:09:06.196 hard and subtle, they're just going to totally botch this stuff. 0:09:06.196 --> 0:09:09.876 Their internal representations are a mess. But more and more 0:09:09.876 --> 0:09:12.036 of the time they're really doing it. 
And as these 0:09:12.036 --> 0:09:14.556 things get bigger and bigger, they're doing it more and more. 0:09:15.196 --> 0:09:17.316 And so this feels like this important turning point where 0:09:17.316 --> 0:09:19.996 it's like, oh, okay, there is some understanding going on 0:09:20.076 --> 0:09:24.076 here and it's getting better, and that really radically opens 0:09:24.156 --> 0:09:26.596 up the possibilities for where this technology might go. 0:09:27.276 --> 0:09:32.276 What you're saying seems very much at odds with 0:09:34.116 --> 0:09:39.276 what people generally say about LLMs, right? Like the standard 0:09:39.876 --> 0:09:43.316 line is they're just predicting what the next word is 0:09:43.316 --> 0:09:44.796 going to be. And they're very good at predicting what 0:09:44.796 --> 0:09:45.956 the next word is going to be, and there's a 0:09:45.996 --> 0:09:48.116 lot of powerful things you can do, but what you're 0:09:48.116 --> 0:09:52.556 saying sounds fundamentally different from that. And so, I mean, 0:09:52.956 --> 0:09:54.876 are the people saying they're just predicting the next word, 0:09:54.876 --> 0:09:58.036 are they wrong? Is what you're saying a point of 0:09:58.116 --> 0:10:01.156 debate among experts, or what? Why is this so different 0:10:01.156 --> 0:10:02.156 than what I've heard before? 0:10:02.636 --> 0:10:05.716 There's a few things going on. So first, saying that 0:10:05.716 --> 0:10:08.196 they're just predicting the next word is mostly right. But 0:10:08.276 --> 0:10:09.996 it turns out that saying that they just predict the 0:10:09.996 --> 0:10:12.036 next word is a lot like saying humans are just 0:10:12.276 --> 0:10:16.156 chemical reactions. 
It turns out that if you're trying to 0:10:16.156 --> 0:10:20.556 predict the next word, and if you've got a small 0:10:20.716 --> 0:10:22.756 network that's trying to predict the next word, it's going 0:10:22.836 --> 0:10:26.196 to learn that sort of the words "the" and "of" 0:10:26.316 --> 0:10:28.556 and "a," those show up often, and that's about 0:10:28.596 --> 0:10:31.236 all it's going to learn. If you take a medium 0:10:31.276 --> 0:10:33.796 sized neural network, it's going to learn how to write 0:10:33.796 --> 0:10:35.756 fluent sentences. It's going to learn, oh, okay, sort 0:10:35.756 --> 0:10:39.156 of adjectives come before nouns, these kinds of nouns come 0:10:39.196 --> 0:10:41.796 before these kinds of nouns. It might even learn some facts. 0:10:41.796 --> 0:10:44.436 It might learn that if you talk about the president 0:10:44.476 --> 0:10:46.956 of the United States, you'll get names like Obama and 0:10:46.996 --> 0:10:50.236 Bush and Biden and Trump, and it'll start to kind 0:10:50.276 --> 0:10:53.196 of make sense, but it's still just kind of learning statistics. 0:10:53.836 --> 0:10:56.596 And if you make the neural network even bigger, it 0:10:56.676 --> 0:11:00.116 will abstract further away. It will start to reason about 0:11:00.756 --> 0:11:04.316 the people and the objects and the spaces themselves and 0:11:04.436 --> 0:11:07.756 use that abstraction to predict the next word. So kind 0:11:07.756 --> 0:11:11.076 of the more these systems learn about the world, the 0:11:11.196 --> 0:11:13.836 farther and farther their internal representations get from just sort 0:11:13.876 --> 0:11:16.476 of literally what word comes after what other word. 0:11:17.236 --> 0:11:20.236 So there's another item on your list that seems like 0:11:20.596 --> 0:11:24.076 it should have interesting implications for the AI industry, right, 0:11:24.116 --> 0:11:28.156 for the business of building LLMs. I'll just read that one. 
0:11:28.956 --> 0:11:34.796 It goes: LLMs predictably get more capable with increasing investment, 0:11:35.276 --> 0:11:39.716 even without targeted innovation. So we'll get into it. But 0:11:40.356 --> 0:11:42.276 just top line, what does that mean? 0:11:44.556 --> 0:11:49.116 We had language models in almost their modern form back 0:11:49.196 --> 0:11:53.996 in twenty ten, eleven, twelve. Most of the building blocks 0:11:53.996 --> 0:11:55.876 for them go back even farther to the eighties or 0:11:55.876 --> 0:12:00.036 even the sixties. You might have noticed that we 0:12:00.516 --> 0:12:03.956 didn't have ChatGPT ten or twenty or fifty 0:12:04.476 --> 0:12:09.436 years ago. What people have been gradually discovering, and gradually 0:12:10.396 --> 0:12:13.036 sort of discovering to a greater and greater degree, is 0:12:13.036 --> 0:12:17.636 that if you just take this relatively simple technology and 0:12:18.716 --> 0:12:22.076 throw more data at it and run it in its 0:12:22.076 --> 0:12:25.836 sort of training phase for longer and longer, and buy fancier 0:12:25.876 --> 0:12:27.956 and fancier computers to run it on, it 0:12:28.116 --> 0:12:29.036 just keeps getting better. 0:12:29.156 --> 0:12:32.796 But if the technology is not special, I mean, everybody 0:12:32.836 --> 0:12:37.076 knows the basic sauce, it suggests that GPT might not 0:12:37.196 --> 0:12:40.876 have... OpenAI, the company that makes ChatGPT, 0:12:41.116 --> 0:12:44.316 might not have like that much of a moat, right? 0:12:45.276 --> 0:12:48.996 I mean, Google is clearly in this business, as is Anthropic, 0:12:49.036 --> 0:12:53.116 the company where you're working. Is there any reason to 0:12:53.156 --> 0:12:56.036 think OpenAI, GPT, is going to stay ahead? 0:12:56.556 --> 0:12:59.156 I think there's not a lot of secret sauce. 
There 0:12:59.196 --> 0:13:01.276 are some details of how to build these things that 0:13:01.796 --> 0:13:04.196 don't get published, but the basic idea is very much 0:13:04.196 --> 0:13:09.996 out there. And yeah, I think the closest thing 0:13:10.036 --> 0:13:12.516 you can really have to a moat is just enormous amounts 0:13:12.516 --> 0:13:15.036 of money. I think at some point you're going to 0:13:15.076 --> 0:13:18.476 have a relatively small number of labs building the really 0:13:18.556 --> 0:13:21.556 impressive frontier systems, just because at some point these are 0:13:21.556 --> 0:13:24.996 going to be ten billion dollar projects, and it just 0:13:25.036 --> 0:13:26.876 seems unlikely that you're going to get that many ten 0:13:26.876 --> 0:13:28.836 billion dollar projects. 0:13:28.436 --> 0:13:31.076 If it's the case, as you say, that essentially what 0:13:31.676 --> 0:13:34.956 you need to build a frontier-level LLM is a 0:13:34.996 --> 0:13:41.596 lot of money, I would guess that governments around the world, 0:13:41.636 --> 0:13:45.076 certainly, say, China, to pick a salient government, are probably 0:13:45.356 --> 0:13:48.716 building giant LLMs right now. Does that seem like a 0:13:48.756 --> 0:13:50.036 reasonable guess? 0:13:51.676 --> 0:13:55.116 Yeah, that seems right. I know there are a lot 0:13:55.116 --> 0:13:59.836 of private and private-public and public groups in China 0:14:00.036 --> 0:14:02.516 working on this stuff, and when I sort of hear 0:14:02.556 --> 0:14:05.716 people in the field who are following the geopolitical side 0:14:05.716 --> 0:14:08.036 of this more closely, they're paying a lot of attention 0:14:08.196 --> 0:14:13.716 to things like the CHIPS Act and global trade in 0:14:14.396 --> 0:14:17.476 chips, in that you really do need them. 
When you're spending 0:14:17.556 --> 0:14:19.956 these millions or billions of dollars, you're basically spending them 0:14:19.956 --> 0:14:23.316 to buy or rent very fancy, state of the art 0:14:23.436 --> 0:14:27.476 computer chips. And it has become a priority for the 0:14:27.516 --> 0:14:29.796 US to try to make it hard for China to 0:14:29.796 --> 0:14:33.116 do that, and... 0:14:33.716 --> 0:14:35.716 To try and make it hard for China to get 0:14:35.556 --> 0:14:38.476 at the processor level, which in a sense is like 0:14:38.836 --> 0:14:41.796 the cement that LLMs are built from. There is a 0:14:41.836 --> 0:14:45.796 physical thing. We forget that, but it's fancy chips, basically. 0:14:46.156 --> 0:14:46.556 That's right. 0:14:46.636 --> 0:14:52.276 Yeah. We've been talking so far about what we know 0:14:52.516 --> 0:14:55.916 about how large language models work. After the break, we'll 0:14:55.916 --> 0:14:58.396 get into what I think is the most interesting thing 0:14:58.476 --> 0:15:09.236 about LLMs: what we don't know about how they work. 0:15:09.796 --> 0:15:10.756 That's the end of the ads. 0:15:11.196 --> 0:15:12.356 Now we're going back to the show. 0:15:12.796 --> 0:15:16.636 So far, we've basically been talking about how do LLMs work, 0:15:16.796 --> 0:15:22.916 what's going on. There is another bucket in your list, 0:15:22.956 --> 0:15:26.756 several items, three items, that are, it seems to me, 0:15:26.796 --> 0:15:29.116 in quite a different category, and they get at this 0:15:29.876 --> 0:15:35.156 very, very interesting idea about LLMs, and that is, to 0:15:35.276 --> 0:15:40.316 some significant degree, nobody knows how they work. The people 0:15:40.316 --> 0:15:43.116 who build LLMs, people like you, people who build them 0:15:43.116 --> 0:15:46.516 and study them, don't understand a lot of what is 0:15:46.556 --> 0:15:49.716 going on, which is amazing to me and super interesting. 0:15:49.756 --> 0:15:55.996 So let's start with this list item. 
It says: specific 0:15:56.116 --> 0:16:03.436 important behaviors in LLMs tend to emerge unpredictably as a byproduct 0:16:02.796 --> 0:16:03.836 of increasing investment. 0:16:03.916 --> 0:16:06.956 And you give a couple of examples of this happening 0:16:07.556 --> 0:16:09.916 for real in the world. I think the best way 0:16:09.916 --> 0:16:12.876 to understand what's going on here is to talk about 0:16:12.876 --> 0:16:15.236 one of those examples. Can you just, like, talk me 0:16:15.276 --> 0:16:20.036 through one of those examples of this unpredictable new behavior emerging? Yeah. 0:16:20.436 --> 0:16:23.396 So a specific large language model that people working on 0:16:23.436 --> 0:16:25.396 this stuff talk about a lot is GPT-3. This 0:16:25.476 --> 0:16:28.476 came out a little less than three years ago and 0:16:28.516 --> 0:16:30.356 I think sort of kicked off the modern wave of 0:16:30.356 --> 0:16:34.116 research on this stuff. And one thing researchers would do, 0:16:34.156 --> 0:16:37.236 as these systems would come out, is give them 0:16:37.476 --> 0:16:39.676 math puzzles and logic puzzles and see how they did. 0:16:40.356 --> 0:16:42.276 And this could be as simple as just sort of 0:16:42.316 --> 0:16:45.636 giving the model reasonably hard arithmetic, sort of asking the model, 0:16:45.956 --> 0:16:49.076 what is one hundred and twenty five plus four hundred 0:16:49.076 --> 0:16:52.036 and sixty seven? And what they found is sort of 0:16:52.556 --> 0:16:55.196 GPT-1 was bad at this, and GPT-2 was 0:16:55.236 --> 0:16:57.396 bad at this, and at least for some of these tasks, 0:16:57.476 --> 0:17:02.396 GPT-3 was also bad at this. And they released it. 0:17:02.396 --> 0:17:03.556 They put it out in the world, they wrote a 0:17:03.556 --> 0:17:06.676 paper about it, they did some demos to researchers, and 0:17:06.716 --> 0:17:08.836 then eventually just let anyone sign up and use it. 
0:17:09.716 --> 0:17:14.076 And after a few months people started noticing, oh, there 0:17:14.076 --> 0:17:15.876 are some tricks you can use to actually make it 0:17:15.996 --> 0:17:21.716 quite a bit better at this. If you ask the 0:17:21.756 --> 0:17:24.996 model the right way, sometimes it'll just kind of reason 0:17:25.036 --> 0:17:28.196 out loud. Sometimes it will say, well, it'll actually do 0:17:28.316 --> 0:17:30.116 long addition, it'll actually write out its steps. 0:17:30.476 --> 0:17:33.556 So give me a specific example. How do you ask 0:17:33.596 --> 0:17:34.276 it the right way? 0:17:35.756 --> 0:17:37.796 So it took even a few more months for people 0:17:37.796 --> 0:17:40.916 to figure out how to do this systematically, but it 0:17:40.956 --> 0:17:43.916 turned out the trick was you literally say, let's think 0:17:43.956 --> 0:17:44.716 step by step. 0:17:44.996 --> 0:17:48.196 You actually type that in, you say that to the machine, 0:17:48.196 --> 0:17:49.116 to the model. 0:17:49.036 --> 0:17:51.516 Yes. And if you say, what is this number 0:17:51.516 --> 0:17:54.956 plus this number, question mark, it'll give a wrong answer. 0:17:55.116 --> 0:17:56.876 If you say, what is this number plus this number, 0:17:57.356 --> 0:18:00.636 let's think step by step, dot dot, it's going to 0:18:00.716 --> 0:18:03.036 list out, okay, let's start with the ones digit, and 0:18:03.036 --> 0:18:04.996 then the tens digit, and then the hundreds digit, 0:18:05.556 --> 0:18:08.236 and then give you the answer, and it'll very often 0:18:08.236 --> 0:18:10.836 be right. Huh. And it turns out this works really 0:18:10.876 --> 0:18:14.316 generally, that for many kinds of sort of math and 0:18:14.396 --> 0:18:19.276 reasoning problems, even some sort of ethics problems. 
There's 0:18:19.636 --> 0:18:21.396 a huge range of things you might ask one of 0:18:21.436 --> 0:18:24.036 these neural networks to do where if you just tell it, 0:18:24.316 --> 0:18:28.116 let's think step by step, it will bring out this 0:18:28.156 --> 0:18:31.076 whole reasoning ability that is actually really useful, that allows 0:18:31.116 --> 0:18:32.516 it to do much better at a lot of things, 0:18:32.916 --> 0:18:37.556 and that it didn't have before. And when this technology 0:18:37.596 --> 0:18:39.676 was first released, the people who built it, they did 0:18:39.676 --> 0:18:41.076 not know this was a possibility. 0:18:42.036 --> 0:18:45.916 That's wild, right? Like, it means this thing is incredibly 0:18:45.996 --> 0:18:48.996 powerful in a way that the people who built it 0:18:49.076 --> 0:18:51.996 didn't know. And let's think step by step is just 0:18:52.076 --> 0:18:56.116 like this incantation. It's just like saying abracadabra or something, 0:18:56.716 --> 0:18:58.956 and the builders didn't know it was there. 0:18:59.436 --> 0:19:01.996 Yeah, it's a bizarre time to be working on 0:19:02.036 --> 0:19:02.596 this stuff. 0:19:02.676 --> 0:19:06.076 Like, here's where it's getting a little sketchy to 0:19:06.116 --> 0:19:08.316 me at a certain level, right? I mean, you've also 0:19:08.316 --> 0:19:10.316 done a lot of work in AI safety, and this 0:19:10.396 --> 0:19:12.876 kind of section of the interview, I feel like we're 0:19:12.876 --> 0:19:14.996 getting more toward that, the section of, like, the people 0:19:15.036 --> 0:19:17.916 building this stuff don't understand what it can do. And 0:19:17.956 --> 0:19:20.596 here, should we add another list item here? Like this 0:19:20.716 --> 0:19:23.636 might be the place Cherkiff, So there's this other item 0:19:23.676 --> 0:19:25.996 on your Eight Things to Know list that seems germane. 
0:19:25.996 --> 0:19:30.916 Here: experts are not yet able to interpret the inner 0:19:30.956 --> 0:19:35.836 workings of LLMs, which, also wild, also kind of goes 0:19:35.876 --> 0:19:39.676 with this idea of not knowing what the thing can do, 0:19:39.836 --> 0:19:44.076 right? And very not intuitive for a piece of technology. 0:19:44.156 --> 0:19:47.156 Right? If you go back to, say, the Internet, sure, 0:19:47.196 --> 0:19:50.996 we didn't know all the social implications of the Internet, 0:19:51.236 --> 0:19:54.156 but we knew how the technology worked. We knew what 0:19:54.236 --> 0:19:56.716 was going on with the chips and the wires and 0:19:56.716 --> 0:20:00.076 the electrons and whatever. Right? Like, the amazing thing here 0:20:00.116 --> 0:20:02.396 is clearly we don't know the social implications of AI. 0:20:02.796 --> 0:20:05.436 But you're saying we don't even know what it's doing 0:20:05.516 --> 0:20:06.516 inside the box. 0:20:08.076 --> 0:20:11.036 Yeah, that's right. We've got these very crude tools for 0:20:11.116 --> 0:20:13.476 sort of opening the box and looking inside. I mean, 0:20:13.636 --> 0:20:15.156 in a literal sense, we know what's going on. We 0:20:15.156 --> 0:20:17.796 can say, oh, when you put in this word, then 0:20:18.276 --> 0:20:20.316 it makes this number bigger, which makes that number smaller, 0:20:20.316 --> 0:20:21.996 which makes this number bigger. And you could keep saying 0:20:22.036 --> 0:20:25.076 that for twenty years and then you'd have explained what happened. 0:20:26.636 --> 0:20:29.316 But we haven't figured out any other way of talking 0:20:29.356 --> 0:20:32.556 about these systems that actually gives us any clarity about 0:20:33.676 --> 0:20:35.836 what's possible, why these systems are doing what they're doing, 0:20:35.956 --> 0:20:39.596 where they're reliable and not. It's just this huge mess 0:20:39.636 --> 0:20:43.436 of connections that we don't really know what to do with. 
0:20:43.996 --> 0:20:48.156 I mean, what should we make of this set of 0:20:48.236 --> 0:20:54.476 facts, that these are incredibly powerful tools that nobody understands 0:20:54.516 --> 0:20:59.956 at a pretty deep level, that can do unpredictable things, 0:20:59.956 --> 0:21:03.436 that are able to do things that even their makers 0:21:03.516 --> 0:21:04.476 don't know they can do? 0:21:05.236 --> 0:21:09.956 I think it's pretty exciting and also pretty sobering. I 0:21:09.956 --> 0:21:11.436 think we don't have a good way of predicting how 0:21:11.476 --> 0:21:13.476 fast this is moving or what we're going to get when. 0:21:14.556 --> 0:21:18.196 But in the big picture, it seems like there's a 0:21:18.236 --> 0:21:21.436 lot of momentum toward building these really powerful AI systems 0:21:21.476 --> 0:21:24.956 over the next few years. We don't understand how they work. 0:21:25.476 --> 0:21:27.676 Another one of these list items is we also aren't 0:21:27.716 --> 0:21:29.236 very good at controlling them, and we aren't very good at 0:21:29.236 --> 0:21:30.436 making them do what we want. 0:21:30.516 --> 0:21:32.396 Yes, let me just pause there, because it's the last 0:21:32.436 --> 0:21:34.956 list item and you have just walked up to it. 0:21:34.956 --> 0:21:37.356 So the last item, the item that we haven't mentioned 0:21:37.356 --> 0:21:40.956 on your list: there are no reliable techniques for steering 0:21:40.996 --> 0:21:44.076 the behavior of LLMs. So they're powerful. We don't really 0:21:44.156 --> 0:21:46.036 understand how they work. They can do things we don't 0:21:46.076 --> 0:21:48.676 know they're going to do, and we can't really control them. 0:21:49.036 --> 0:21:51.116 Now we're through the list. Now let's just talk it out. 0:21:51.436 --> 0:21:55.596