WEBVTT - Jack Morris on Finding the Next Big AI Breakthrough 0:00:02.720 --> 0:00:19.000 Bloomberg Audio Studios, Podcasts, Radio News. Hello and welcome to 0:00:19.040 --> 0:00:21.080 another episode of the Odd Lots podcast. 0:00:21.200 --> 0:00:23.520 I'm Joe Weisenthal and I'm Tracy Alloway. 0:00:23.800 --> 0:00:26.319 Tracy, have you played around with GPT-5 much? 0:00:26.920 --> 0:00:30.640 Not really, I've been Perplexity-pilled. Oh, that's your 0:00:30.720 --> 0:00:33.360 main? Yeah, that's my main one at the moment. But 0:00:33.479 --> 0:00:34.840 is it good? I hear mixed. 0:00:35.080 --> 0:00:37.239 I use it because I use GPT every day. It 0:00:37.280 --> 0:00:41.400 does not strike me as, like, obviously better, yeah, for 0:00:41.600 --> 0:00:44.640 my uses than, like, the o3 models, which I've been 0:00:44.720 --> 0:00:46.760 very impressed by, because, you know, I want to establish I'm 0:00:46.760 --> 0:00:47.960 no hater or anything like that. 0:00:48.240 --> 0:00:50.040 But, like, it did not strike me as like, oh, 0:00:50.080 --> 0:00:50.720 this is like an 0:00:50.720 --> 0:00:52.839 amazing... Yeah, this is the thing. 0:00:53.040 --> 0:00:54.280 Step function or whatever. 0:00:54.280 --> 0:00:59.080 It feels like the sort of breakthroughs, awe-inspiring breakthroughs, 0:00:59.120 --> 0:01:00.680 are kind of behind us, and a lot of the 0:01:00.720 --> 0:01:04.160 progress on the models feels very incremental at this point, 0:01:04.160 --> 0:01:05.959 even though people are spending a lot of time and 0:01:06.040 --> 0:01:07.240 resources on doing it. 0:01:07.520 --> 0:01:09.880 The one thing GPT-5 does is prompt me and say, oh, 0:01:09.880 --> 0:01:11.560 that's a great question. Would you like to follow up 0:01:11.560 --> 0:01:11.959 more on that? 0:01:12.160 --> 0:01:13.160 But it's like... Does it 0:01:13.080 --> 0:01:16.320 say, oh, Joe, you're so smart, that's such a smart question? 0:01:16.440 --> 0:01:17.559 You know what it did say?
0:01:17.680 --> 0:01:19.560 I asked a follow-up, and it started an answer 0:01:19.560 --> 0:01:22.280 with "Love it." And then: "Love it. Do you want 0:01:22.280 --> 0:01:23.200 me to look into that?" 0:01:23.720 --> 0:01:24.039 Yes. 0:01:24.200 --> 0:01:26.600 They are very flattering, aren't they? Actually, that's one thing 0:01:26.640 --> 0:01:29.399 I like about Perplexity: it doesn't really flatter you. 0:01:29.400 --> 0:01:30.600 It just spits out an answer. 0:01:30.840 --> 0:01:33.800 So anyway, there's so many questions I have about AI, 0:01:33.880 --> 0:01:36.039 and we talk about the business a fair amount, and 0:01:36.200 --> 0:01:38.160 Nvidia and all that stuff. We actually don't really talk 0:01:38.200 --> 0:01:41.160 that much about the pure research side. But 0:01:41.240 --> 0:01:43.160 it's pretty important, I think, because I think a lot 0:01:43.200 --> 0:01:45.360 of people would agree that if the scaling is, like, 0:01:45.400 --> 0:01:47.840 slowing down, or if there were a wall or something 0:01:47.880 --> 0:01:51.000 like that, that might change some of these business model calculations, 0:01:51.040 --> 0:01:53.080 et cetera. So I think it's good, we need to 0:01:53.080 --> 0:01:55.280 get an update on just sort of the state of 0:01:55.280 --> 0:01:56.800 the art, the science of AI. 0:01:57.080 --> 0:01:57.320 Yeah. 0:01:57.400 --> 0:02:00.600 Also, it would be nice just to understand what's possible 0:02:00.760 --> 0:02:03.080 in terms of the AI models and what people are 0:02:03.080 --> 0:02:06.360 actually researching, what they're working towards. Like, is it 0:02:06.480 --> 0:02:09.480 mostly about price? Is it mostly about the output? Is 0:02:09.520 --> 0:02:12.000 it mostly about energy use? All those things? 0:02:12.080 --> 0:02:13.760 All those things. Well, I'm really excited to say we 0:02:13.800 --> 0:02:16.760 have the perfect guest, someone who is an AI researcher.
0:02:17.000 --> 0:02:19.360 We're gonna be speaking with Jack Morris. He's currently about 0:02:19.400 --> 0:02:21.280 to finish his PhD 0:02:20.880 --> 0:02:22.400 at Cornell in AI. 0:02:22.440 --> 0:02:27.160 He's been affiliated with Meta professionally, so presumably he already 0:02:27.200 --> 0:02:30.080 has a one hundred million dollar pay package in the bank. 0:02:30.320 --> 0:02:32.840 Now he's shaking his head. He's not. That's a joke. 0:02:33.040 --> 0:02:36.079 But Jack, thank you so much for coming on Odd Lots. 0:02:36.200 --> 0:02:38.040 Yeah, thanks for having me. This is gonna be fun. 0:02:38.120 --> 0:02:39.560 Why don't you explain to me, like, what you're up to, 0:02:39.600 --> 0:02:41.239 because I don't really understand how 0:02:41.080 --> 0:02:42.040 it works where people are, 0:02:42.040 --> 0:02:44.560 they're at a university and they're also at a company, 0:02:44.639 --> 0:02:47.560 and this isn't how it works in much of the world, right? 0:02:47.600 --> 0:02:49.880 People get their degree and then they get a job. 0:02:50.080 --> 0:02:52.200 I get the impression that in the AI world it's 0:02:52.240 --> 0:02:56.440 a little fuzzier in terms of one's affiliations between industry 0:02:56.600 --> 0:02:58.040 and education and stuff like that. 0:02:58.280 --> 0:03:01.320 Yeah, that's definitely true. I think it might be on the 0:03:01.360 --> 0:03:03.200 way out, but I can tell you about my situation.
0:03:03.400 --> 0:03:06.960 So there's kind of a public research world and, like, 0:03:06.960 --> 0:03:11.240 a private research world, and all the academic institutions do 0:03:11.280 --> 0:03:17.400 public research, and the AI labs like OpenAI, Anthropic, Google 0:03:17.440 --> 0:03:19.840 DeepMind, they essentially do private research where they have 0:03:19.960 --> 0:03:23.079 these people in house that are running experiments and learning 0:03:23.080 --> 0:03:25.960 more about their systems, but they don't publish anything or 0:03:25.960 --> 0:03:28.280 share any of their knowledge. And so a cool thing 0:03:28.280 --> 0:03:30.920 about getting your PhD right now is you can do 0:03:31.080 --> 0:03:34.079 research, write about it, and then publicize it, like put 0:03:34.080 --> 0:03:36.400 it online. I tweet about it. I kind of, like, 0:03:36.520 --> 0:03:38.920 can talk to you about it. And there's a few 0:03:38.960 --> 0:03:42.000 places left that will still kind of... moment, we're never 0:03:41.920 --> 0:03:42.720 going to hear from you again. 0:03:44.600 --> 0:03:46.640 Yeah, I'll make sure they have a clause in my 0:03:46.720 --> 0:03:50.040 contract that I can still talk to Joe and Tracy. 0:03:49.400 --> 0:03:52.320 The Odd Lots clause. Yes, that would be important. So 0:03:52.680 --> 0:03:56.440 when we say AI research or an AI researcher, what 0:03:56.560 --> 0:04:01.480 exactly does that entail? Can't the AI models just research themselves? 0:04:01.600 --> 0:04:02.440 Just let them do it? 0:04:02.720 --> 0:04:06.000 Yeah, that's actually a very smart idea, and, like, people 0:04:06.120 --> 0:04:08.720 are really worried about that.
Actually, like, if we get 0:04:08.720 --> 0:04:13.320 to the point where the AI can improve itself through research, yeah, 0:04:13.360 --> 0:04:15.120 then it sort of gets smarter and then it improves 0:04:15.120 --> 0:04:17.120 itself again, and it ends up being this kind of 0:04:17.400 --> 0:04:21.160 exponential improvement that ends up with all of our demise. 0:04:21.839 --> 0:04:24.840 But I think right now it's not quite there yet. 0:04:25.080 --> 0:04:28.160 Like, maybe you can talk to ChatGPT... Well, good. Yeah, 0:04:28.160 --> 0:04:29.960 and good news for me too, because it means I 0:04:30.000 --> 0:04:33.599 can still get a degree and be gainfully employed. But 0:04:34.200 --> 0:04:36.680 I think it's still helpful, but we still need, 0:04:36.720 --> 0:04:39.800 like, humans to make these improvements. And in terms of 0:04:39.839 --> 0:04:41.479 what the actual day-to-day work looks like, I 0:04:41.480 --> 0:04:43.800 think it really varies. Like, there's some people working on 0:04:44.480 --> 0:04:47.279 trying to make the models run faster, or trying to 0:04:47.279 --> 0:04:50.440 make the hardware that runs the models run faster, more efficiently. 0:04:50.880 --> 0:04:52.800 There's people that try to work on the data, like, 0:04:52.839 --> 0:04:55.640 what should we train on: more coding problems, or more 0:04:55.880 --> 0:04:59.600 textbooks, or more Reddit posts? What works best to make 0:04:59.640 --> 0:05:01.920 the model? And then there's a lot more people working 0:05:01.960 --> 0:05:04.680 on different areas of the stack, like training algorithms. I 0:05:04.760 --> 0:05:08.240 kind of have my own little niche and niche. There's 0:05:08.279 --> 0:05:11.760 this old field of information theory from, like, the twentieth 0:05:11.760 --> 0:05:14.360 century where they talk about bits, like, a zero or 0:05:14.400 --> 0:05:16.600 a one is a bit, and you can add them 0:05:16.680 --> 0:05:20.040 up and have kilobytes and megabytes.
And so I've been 0:05:20.080 --> 0:05:21.919 trying to think about what that means in, like, the 0:05:22.000 --> 0:05:24.200 ChatGPT world. If you train a model on a 0:05:24.200 --> 0:05:26.880 certain number of bits, how many bits does it actually learn? 0:05:27.200 --> 0:05:29.040 And, like, can you look at the model and figure 0:05:29.040 --> 0:05:30.599 out, like, if you have one slice of the model, 0:05:30.600 --> 0:05:32.600 how many bits that is, and stuff like that. So 0:05:32.640 --> 0:05:34.920 maybe the easiest way to explain is if you had, 0:05:34.960 --> 0:05:37.640 for some godforsaken reason, to use ChatGPT as, 0:05:37.680 --> 0:05:40.560 like, a flash drive, like you had a certain set 0:05:40.600 --> 0:05:43.080 of data and it had to memorize all that data, 0:05:43.200 --> 0:05:46.840 like, how much data could it actually store? That's the 0:05:46.960 --> 0:05:48.800 kind of area I've been working in. And then, you know, 0:05:48.800 --> 0:05:50.880 once you're there, you kind of realize we could do this, 0:05:51.040 --> 0:05:52.800 or maybe next semester, if we have time, we could 0:05:53.120 --> 0:05:54.839 try this other thing. And so it kind of 0:05:54.839 --> 0:05:56.880 branches out and there's a lot of little problems that 0:05:56.920 --> 0:05:57.520 you can try. 0:05:57.920 --> 0:06:01.800 I mentioned GPT-5. Fine to me. It does not 0:06:02.040 --> 0:06:05.479 strike me as, like, you know... Because actually, so the 0:06:05.520 --> 0:06:08.360 first time I used ChatGPT I was genuinely blown away, 0:06:08.400 --> 0:06:10.559 like most people. And then actually I was pretty blown 0:06:10.600 --> 0:06:13.600 away by the o3 models, in part because of how 0:06:13.640 --> 0:06:16.640 well they could do document search, superior to Google 0:06:16.680 --> 0:06:19.680 Search in many respects, and also just the organization of 0:06:19.720 --> 0:06:22.279 a lot of unstructured data, et cetera.
Like, I didn't 0:06:22.279 --> 0:06:25.720 have, like, some oh-my-god-wow moment with GPT-5. 0:06:25.760 --> 0:06:28.960 It's like, this seems like... How do we measure whether 0:06:29.080 --> 0:06:31.359 AI is getting better all the time? 0:06:32.680 --> 0:06:35.360 Yeah, that's a huge question, right? 0:06:35.800 --> 0:06:37.200 Well, let me ask you, okay, let me ask you 0:06:37.320 --> 0:06:41.480 actually a more specific question. How do the entities that 0:06:41.800 --> 0:06:46.320 test AI models as their job or as their function... 0:06:46.800 --> 0:06:50.320 What does the formal testing process look like to rank 0:06:50.400 --> 0:06:52.480 the quality of AI models? 0:06:52.560 --> 0:06:55.240 Okay, yeah, that's more tractable. We can 0:06:55.320 --> 0:06:57.120 start there, and then we can talk about o3 and 0:06:57.320 --> 0:07:01.159 GPT-5. So there's essentially two ways people do this 0:07:01.240 --> 0:07:05.000 kind of model evaluation. The main one is just by 0:07:05.120 --> 0:07:08.159 testing them on different data sets. So, for example, there's 0:07:08.200 --> 0:07:10.680 this data set called SWE-bench that's a bunch of 0:07:11.120 --> 0:07:14.680 software-engineering-related coding problems, and they all have a 0:07:14.760 --> 0:07:18.040 human-written solution and tests, and so you can ask 0:07:18.120 --> 0:07:20.280 GPT-5, can you write the code for this, and 0:07:20.320 --> 0:07:22.320 then run the tests and see if it's right. And 0:07:22.400 --> 0:07:24.360 still the models are pretty bad at that. I think 0:07:24.360 --> 0:07:26.800 they can do about half of them. They're very hard. 0:07:26.840 --> 0:07:30.640 They're like entire days of work for professional software engineers. 0:07:30.880 --> 0:07:32.920 But when a new model comes out, they can say, oh, look, 0:07:32.920 --> 0:07:35.360 we actually got a higher score on SWE-bench. And 0:07:35.400 --> 0:07:37.520 there's a ton of different data sets like that.
So 0:07:37.560 --> 0:07:39.920 when GPT-5 comes out, they say, you know, it's 0:07:39.920 --> 0:07:42.880 better at these types of coding tests. And a big 0:07:42.880 --> 0:07:46.760 one that specifically OpenAI has been advocating for is math, 0:07:47.120 --> 0:07:50.200 like, they did the International Math Olympiad, and they said 0:07:50.760 --> 0:07:54.520 essentially GPT-5 scored at the level of the best 0:07:54.640 --> 0:07:59.200 high school mathematicians, which is pretty cool. But you raise 0:07:59.240 --> 0:08:00.960 a good question of how does that actually map to 0:08:01.000 --> 0:08:03.200 real-world usage? And I think this is, like, a 0:08:03.240 --> 0:08:06.680 really hard problem that people still haven't figured out. 0:08:06.960 --> 0:08:11.080 Does anyone try to capture that sort of, like, je ne sais quoi, 0:08:11.320 --> 0:08:13.640 I guess, when it comes to AI models? Is one 0:08:13.680 --> 0:08:15.920 of the tests asking it to, I don't know, come 0:08:16.000 --> 0:08:17.680 up with a stupid limerick or something? 0:08:18.160 --> 0:08:21.200 Yeah, there are a lot of tests like that. There's 0:08:21.280 --> 0:08:25.200 some creative writing benchmarks and some poetry-related ones. But 0:08:25.480 --> 0:08:29.080 I think you point out something interesting, that, for example, 0:08:29.120 --> 0:08:32.680 I mostly use Claude from Anthropic, and I think Claude 0:08:32.760 --> 0:08:36.520 does have this something to it that's, like, a little 0:08:36.559 --> 0:08:38.960 bit different, and it's very difficult to characterize. It's just 0:08:39.000 --> 0:08:40.400 sort of the way it speaks to you and the 0:08:40.400 --> 0:08:42.760 way it thinks of itself. I like it a 0:08:42.800 --> 0:08:44.960 lot better, but I don't know how you would design, 0:08:45.000 --> 0:08:47.320 like, a data set that can really capture that. The 0:08:47.360 --> 0:08:50.079 second way they do the evaluation is by, they call 0:08:50.080 --> 0:08:53.840 it Elo scores, like in chess. So they, for example,
So they, for example, 0:08:53.920 --> 0:08:56.440 ask the two models to write a limerick, and then 0:08:56.480 --> 0:08:59.000 they have humans rank which one is better, and they 0:08:59.040 --> 0:09:02.400 make this kind of lat of Elo rankings for models. 0:09:02.640 --> 0:09:05.439 So I think right now Claude or GPT five or 0:09:05.480 --> 0:09:09.440 maybe the Google model is top on this ladder. 0:09:10.000 --> 0:09:12.680 The algorithm made famous in the social network that Mark 0:09:12.760 --> 0:09:17.080 Zuckerberg used to rate the of his colleagues still the 0:09:17.200 --> 0:09:19.400 workhorse model for comp evaluation. 0:09:19.640 --> 0:09:22.800 That's some good trivia, Joe, very good and no comment. Well, 0:09:22.880 --> 0:09:27.440 I assume just on the hard number evaluation. People are 0:09:27.480 --> 0:09:31.839 also ranking these on data usage, energy, that sort of. 0:09:31.760 --> 0:09:32.240 Thing as well. 0:09:32.320 --> 0:09:35.760 Right speed, speed would be a definitely. 0:09:35.960 --> 0:09:38.640 The AI companies like to use price as a metric, 0:09:38.760 --> 0:09:41.120 which is kind of interesting because there's a lot that 0:09:41.160 --> 0:09:43.440 goes on behind the scenes, including just sort of like 0:09:44.280 --> 0:09:47.720 free money that drives the prices down, but they also 0:09:47.760 --> 0:09:50.240 do benchmark speed, and I think you make a good 0:09:50.280 --> 0:09:53.079 point that the benchmarks can be pretty misleading, Like, for example, 0:09:53.080 --> 0:09:56.160 there's a bunch of recent open source models that came 0:09:56.200 --> 0:09:59.000 from different Chinese AI labs that have really, really high 0:09:59.080 --> 0:10:02.400 scores on certain benchmarks, but people kind of think they're 0:10:02.400 --> 0:10:05.680 not as good for real world usage for whatever reason. 
0:10:06.360 --> 0:10:08.720 I've seen people talk about this. Isn't part of the 0:10:08.840 --> 0:10:14.120 problem with testing AI or evaluating AI that a lot 0:10:14.160 --> 0:10:16.959 of these problems exist in the real world already, right? 0:10:17.200 --> 0:10:19.720 You see this a lot, and people are always finding this, 0:10:19.920 --> 0:10:23.400 which is that here's an AI model that is amazing 0:10:23.559 --> 0:10:27.520 at math on the Math Olympiad, and yet it gets 0:10:27.520 --> 0:10:31.280 tripped up by questions like, which is heavier, a pound 0:10:31.320 --> 0:10:33.880 of steel or two pounds of feathers? And they'll say 0:10:33.920 --> 0:10:35.920 that that's a trick question, a pound of steel weighs the 0:10:35.920 --> 0:10:38.520 same as two pounds of feathers, which is clearly, like, 0:10:38.840 --> 0:10:41.760 it has clearly been trained in some sense to 0:10:42.280 --> 0:10:44.960 recognize this steel-versus-feathers thing, or whatever it is. 0:10:45.200 --> 0:10:47.920 I forget if it's steel. But it also clearly can't 0:10:47.960 --> 0:10:49.719 measure whether one or 0:10:49.679 --> 0:10:50.480 two is bigger. 0:10:50.840 --> 0:10:54.960 Yeah, that's a really good example. I think they kind 0:10:54.960 --> 0:10:58.720 of successively include these kinds of things in more rounds 0:10:58.720 --> 0:11:00.760 of training data, and so every time a new model 0:11:00.800 --> 0:11:03.640 comes out, they kind of patch little holes that appeared 0:11:03.640 --> 0:11:06.040 in the previous models. So you're pointing to this, like, 0:11:06.080 --> 0:11:08.280 they probably started with the classic riddle that's like, a 0:11:08.320 --> 0:11:10.200 pound of bricks or a pound of feathers, and 0:11:10.240 --> 0:11:13.120 they're equal, but then, like, the models got that wrong, 0:11:13.160 --> 0:11:14.040 and so they added to...
0:11:13.960 --> 0:11:19.080 That doesn't seem like a very efficient way to achieve intelligence, like, oh yeah, 0:11:19.080 --> 0:11:19.960 we should have included that. 0:11:20.000 --> 0:11:21.640 Oh yeah, we got to include that trick. Oh yeah, 0:11:21.640 --> 0:11:22.320 we gotta have... Right. 0:11:22.360 --> 0:11:26.480 Like, ever, like, going... that does not speak to me 0:11:26.880 --> 0:11:30.200 of a line towards something that we would call anything 0:11:30.280 --> 0:11:32.280 resembling human intelligence. 0:11:32.400 --> 0:11:35.760 I definitely agree. I think one counterexample is people 0:11:35.760 --> 0:11:37.880 said this for a long time about self-driving cars. 0:11:38.240 --> 0:11:40.480 Like, everyone was really excited about them for a long time, 0:11:40.520 --> 0:11:42.760 and then they kind of didn't really work, like, eight 0:11:42.880 --> 0:11:45.360 or so years ago, and there was this period where 0:11:45.360 --> 0:11:47.959 they were saying, oh, the models can't do green cones. 0:11:48.040 --> 0:11:50.400 We're going out there trying to take videos of green cones, 0:11:50.440 --> 0:11:55.640 and yeah, they can't do snow. I'm saying that it 0:11:55.720 --> 0:11:59.240 worked for them, and so it might be possible. But 0:11:59.720 --> 0:12:01.960 in the case of language models, there's something a little 0:12:02.000 --> 0:12:05.880 more interesting happening, because we now have two ways to learn. 0:12:06.280 --> 0:12:07.760 If you guys are ready, we could get 0:12:07.760 --> 0:12:10.040 into something a little technical, which I think gives you 0:12:10.080 --> 0:12:13.280 some insights. So there's essentially two ways you can teach 0:12:13.360 --> 0:12:16.680 machines to learn from data.
One is called supervised learning, 0:12:16.920 --> 0:12:19.640 where the computer will copy what you did, which is, 0:12:19.640 --> 0:12:22.040 like, basically what we're talking about now, and the other 0:12:22.160 --> 0:12:25.199 is called reinforcement learning, where the computer just does something 0:12:25.280 --> 0:12:27.120 and then you give it a reward if it does 0:12:27.160 --> 0:12:30.360 something well. And so for a long time, like, the 0:12:30.400 --> 0:12:34.640 original ChatGPT was mostly just trained with supervised learning, 0:12:34.720 --> 0:12:37.120 like, it would just copy the text from all of 0:12:37.160 --> 0:12:39.680 the Internet, and so the best it could ever do 0:12:39.880 --> 0:12:44.280 is emulate Reddit posts very well. And there was a 0:12:44.320 --> 0:12:47.439 tiny bit of reinforcement learning, but people didn't know how 0:12:47.480 --> 0:12:50.040 to do it right. And then you mentioned this o3 model, 0:12:50.040 --> 0:12:52.839 which is kind of in some ways, like, a big jump, 0:12:52.960 --> 0:12:55.040 like, it made the models much better at math, much 0:12:55.040 --> 0:12:57.760 better at certain things. And the way they did that 0:12:57.840 --> 0:13:00.760 is actually through reinforcement learning. They found out a way to 0:13:00.840 --> 0:13:02.760 kind of, like, let the model think for a while 0:13:03.240 --> 0:13:05.280 and then give it a reward when it gets the 0:13:05.360 --> 0:13:07.600 answer at the end. It's kind of scary. 0:13:07.840 --> 0:13:10.199 Yeah, when you say give it a reward, is 0:13:10.120 --> 0:13:13.680 it like, take a cookie? Paying robots? 0:13:13.920 --> 0:13:14.120 Yeah? 0:13:14.240 --> 0:13:16.920 Well, no, genuinely, like, what is the reward? You just 0:13:16.920 --> 0:13:18.080 tell it it did a good job? 0:13:18.440 --> 0:13:20.199 You just give it, like, a higher number. Okay, and 0:13:20.240 --> 0:13:21.559 that makes it happy. All right.
0:13:22.120 --> 0:13:24.520 I'd get a little bit worried when we're, like, giving 0:13:24.520 --> 0:13:27.520 it cupcakes or something, like, here you go, good job. 0:13:28.440 --> 0:13:30.240 Just going back to the intro, you know, we were 0:13:30.240 --> 0:13:32.880 talking about how it feels like a lot of the 0:13:32.920 --> 0:13:36.520 progress on AI models is a little bit more incremental, 0:13:36.960 --> 0:13:39.000 and I guess it's hard to tell whether that's just 0:13:39.200 --> 0:13:41.840 personal bias because now we're used to them and the 0:13:41.880 --> 0:13:44.720 sort of wow moment has passed. But what does it 0:13:44.760 --> 0:13:47.440 feel like to you in terms of improvements? Are we 0:13:47.600 --> 0:13:52.040 seeing the improvement cycle accelerate or decelerate at this point? 0:13:52.240 --> 0:13:55.000 I think it's kind of like the market, where it's, 0:13:55.040 --> 0:13:57.679 like, always, it gets faster for a little while, and 0:13:57.679 --> 0:14:00.560 then it feels like things have slowed down, and the 0:14:00.600 --> 0:14:02.920 progress is never quite in the areas that you expect. 0:14:03.000 --> 0:14:06.720 As one example, people really thought this year was the 0:14:06.840 --> 0:14:10.680 year when the assistants would start being able to act 0:14:10.720 --> 0:14:13.560 like actual assistants, like the year of agents. People actually 0:14:13.640 --> 0:14:15.800 coined that term, I think, like, the year of agents, 0:14:16.000 --> 0:14:19.080 and it really didn't happen, for whatever reason. Maybe 0:14:19.080 --> 0:14:21.160 it will in the next three months. But the agents 0:14:21.160 --> 0:14:23.200 are still pretty bad, the ones that you can use. 0:14:23.440 --> 0:14:25.920 But they did get way better at competitive math. Like, 0:14:25.960 --> 0:14:29.600 now they can do these, like, world-class proofs that 0:14:29.640 --> 0:14:33.080 they couldn't do before.
So it's almost unpredictable, like, which 0:14:33.160 --> 0:14:36.120 areas the AI will kind of conquer next. But it 0:14:36.160 --> 0:14:38.920 does feel like progress is continuing. 0:14:39.320 --> 0:14:42.920 Actually, what happened with agents? I've never had a successful 0:14:43.280 --> 0:14:45.840 agent experience, even basic things like, come up with a 0:14:45.880 --> 0:14:49.120 list of every past Odd Lots guest, yeah, and put 0:14:49.160 --> 0:14:52.120 it in a file or something like that, which just... 0:14:52.760 --> 0:14:55.200 There's an RSS feed that exists for Odd Lots. This 0:14:55.200 --> 0:14:57.480 should be ray stick for it all around, and then 0:14:58.040 --> 0:15:00.880 something will happen or it'll get lazy. It's like, here's 0:15:00.920 --> 0:15:04.560 fifteen. And what is, actually... Thought leaders love 0:15:04.600 --> 0:15:06.560 this stuff. They love talking about the agents. So 0:15:06.600 --> 0:15:09.720 what actually happened with agents? Maybe they'll get there, but 0:15:09.800 --> 0:15:11.400 what do you see as... what is the roadblock there? 0:15:11.880 --> 0:15:14.680 I don't think there's any conceptual roadblock. Like, there's no 0:15:14.800 --> 0:15:17.400 reason why you couldn't collect data for that and train 0:15:17.480 --> 0:15:20.600 them either in a supervised way or using reinforcement learning. 0:15:20.920 --> 0:15:23.520 It just hasn't happened yet. So I think maybe behind 0:15:23.520 --> 0:15:25.400 the scenes it turned out that the problem was harder 0:15:25.400 --> 0:15:28.560 than people thought. Like, getting data from all those scenarios 0:15:28.640 --> 0:15:31.640 is really hard.
And there have been some stories from, 0:15:31.800 --> 0:15:34.040 like, people that I've heard of that found these little 0:15:34.080 --> 0:15:37.840 companies in San Francisco, and they build these tiny environments 0:15:37.880 --> 0:15:41.240 for the AI labs to do reinforcement learning on for agents, 0:15:41.320 --> 0:15:44.000 like, for example, doing a calendar. They'll build, like, a 0:15:44.000 --> 0:15:47.120 little calendar app, but make it have rewards so you 0:15:47.120 --> 0:15:49.440 can do reinforcement learning, and they can just sell that 0:15:49.520 --> 0:15:51.760 for, like, hundreds of thousands of dollars. So I think 0:15:51.920 --> 0:15:54.600 the progress is ongoing behind the scenes. Like, there's a 0:15:54.600 --> 0:15:58.080 whole ecosystem built around it. It just hasn't really manifested 0:15:58.080 --> 0:15:59.400 in the products that we use. 0:16:00.000 --> 0:16:02.880 I was going to ask, how much of the difficulty is, 0:16:03.360 --> 0:16:06.360 you know, the actual development of the models, the thinking part, 0:16:06.440 --> 0:16:10.640 versus just getting them to plug in seamlessly with other applications? 0:16:11.160 --> 0:16:15.440 Yeah, I think the second thing is probably the biggest 0:16:15.440 --> 0:16:17.920 barrier in terms of time. Like, it just takes a 0:16:17.920 --> 0:16:20.520 really long time to figure out what data you need 0:16:20.640 --> 0:16:23.160 and collect it properly and actually train the models on 0:16:23.200 --> 0:16:25.680 that data. But at the same time, there are people 0:16:26.120 --> 0:16:27.920 like me who are trying to work on better, like, 0:16:28.000 --> 0:16:31.400 conceptual frameworks for training the models. So to go back 0:16:31.400 --> 0:16:37.280 to the o3 example, doing reinforcement learning on ChatGPT, like, 0:16:37.320 --> 0:16:40.240 that seems to me like a huge breakthrough. Like, we 0:16:40.280 --> 0:16:42.760 didn't know how to do that before.
It unlocks all 0:16:42.800 --> 0:16:45.800 sorts of doors and ways to train the models. So 0:16:45.960 --> 0:16:48.400 even if maybe you don't think that model was that 0:16:48.480 --> 0:16:50.880 much better than the previous one, it seems like it 0:16:50.960 --> 0:16:54.160 will give us huge improvements in the future. 0:17:10.040 --> 0:17:14.879 So you mentioned at the intro that it's possible, hopefully 0:17:14.920 --> 0:17:16.919 you'll get a clause, but you might end up in 0:17:16.960 --> 0:17:19.879 a situation in which you go to work for some frontier 0:17:20.040 --> 0:17:22.800 AI lab and we never hear from you again, or 0:17:22.840 --> 0:17:25.480 you just post cryptic tweets like, oh, no idea what's coming, 0:17:25.880 --> 0:17:26.680 oh, it's gonna 0:17:26.440 --> 0:17:29.760 be so over, or whatever. Yeah, and the Death Star. Yeah, 0:17:29.760 --> 0:17:30.560 it's very annoying, 0:17:30.640 --> 0:17:33.320 the way they all tweet. It's possible. Talk to us 0:17:33.359 --> 0:17:36.400 about, like, why not work on an open-source project? 0:17:36.880 --> 0:17:38.880 And this is, of course, when people talk about 0:17:38.920 --> 0:17:40.680 DeepSeek and a lot of the Chinese models that the 0:17:40.760 --> 0:17:43.520 US competes with, a lot of those are open source. 0:17:43.840 --> 0:17:46.960 Presumably you could keep coming on Odd Lots over and 0:17:47.000 --> 0:17:50.639 over again. Why, like, what is even the case for 0:17:50.800 --> 0:17:52.960 the best and the brightest to work on a closed- 0:17:52.960 --> 0:17:54.600 source frontier model? 0:17:54.840 --> 0:17:58.080 Yeah, it's a really hard question. Like, I've struggled 0:17:58.119 --> 0:18:00.399 with this in my own personal decision making.
I was 0:18:00.560 --> 0:18:03.080 originally thinking, oh, I'd love to become a professor and 0:18:03.119 --> 0:18:07.479 mentor younger students and get a whole, like, group of 0:18:07.520 --> 0:18:11.160 these ideas going and start working on similar, related problems 0:18:11.200 --> 0:18:12.920 to the stuff I was talking about. And I still 0:18:12.920 --> 0:18:15.639 think that would be fun. But there's a big gap 0:18:15.680 --> 0:18:18.520 in terms of the things we can do at Cornell 0:18:18.640 --> 0:18:20.600 and the things that you can do at OpenAI. 0:18:20.800 --> 0:18:24.919 Like, they just have, like, crazy infrastructure for training models 0:18:24.960 --> 0:18:29.480 really easily, and data, and a ton of really good data. 0:18:29.960 --> 0:18:32.720 And so I think as that gap has widened, I've 0:18:32.720 --> 0:18:34.760 felt like a lot of what we're doing is, like, 0:18:35.080 --> 0:18:38.080 kind of devising these toy scenarios where we can study 0:18:38.080 --> 0:18:41.240 interesting things, but I feel a bit disconnected from the 0:18:41.280 --> 0:18:45.399 real, like, progress of humanity. You know, like, if you 0:18:45.480 --> 0:18:47.879 really agree that this is, like, the biggest problem of 0:18:47.920 --> 0:18:49.760 our time. I don't want to say it's like the 0:18:49.800 --> 0:18:52.439 Manhattan Project, but, like, it's more like trying to go 0:18:52.480 --> 0:18:54.680 to the Moon in the sixties, the space race. It's 0:18:54.760 --> 0:18:56.720 kind of like a space race going on in these 0:18:56.760 --> 0:18:58.840 different private labs. You want to be a part of it. 0:18:58.880 --> 0:19:02.320 Like, there's crazy energy. It has huge implications for 0:19:02.359 --> 0:19:05.880 the future of society. So I think I am interested 0:19:05.880 --> 0:19:09.760 in participating in that. My big question is, like, if 0:19:09.760 --> 0:19:13.560 you think that the reinforcement learning thing was the most 0:19:13.600 --> 0:19:16.800