WEBVTT - Understanding the Most Viral Chart in Artificial Intelligence

0:00:02.720 --> 0:00:14.000
<v Speaker 1>Bloomberg Audio Studios, Podcasts, Radio News.

0:00:18.560 --> 0:00:21.919
<v Speaker 2>Hello and welcome to another episode of the Odd Lots podcast.

0:00:22.000 --> 0:00:24.959
<v Speaker 2>I'm Joe Weisenthal, and I'm Tracy Alloway. Tracy,

0:00:25.200 --> 0:00:28.560
<v Speaker 2>One thing about AI is that there are lots of lines that

0:00:28.640 --> 0:00:29.000
<v Speaker 2>go up.

0:00:30.640 --> 0:00:34.919
<v Speaker 3>Yes, famously, there is perhaps one line that has captured

0:00:34.920 --> 0:00:37.000
<v Speaker 3>the attention more than others when it comes to lines

0:00:37.080 --> 0:00:37.479
<v Speaker 3>going up.

0:00:37.720 --> 0:00:40.560
<v Speaker 2>Yes, but we're recording this April seventh. Did you see

0:00:40.600 --> 0:00:43.640
<v Speaker 2>the Anthropic revenue chart, by the way? Oh,

0:00:43.320 --> 0:00:44.440
<v Speaker 4>It's just extreme.

0:00:44.720 --> 0:00:47.519
<v Speaker 2>Okay, it's just the number of lines going up. I

0:00:47.560 --> 0:00:49.440
<v Speaker 2>mean there are some really.

0:00:49.520 --> 0:00:53.240
<v Speaker 3>Let me caveat that. Up until recently, there was one

0:00:53.400 --> 0:00:56.840
<v Speaker 3>chart of a line going up exponentially that became I

0:00:56.840 --> 0:00:59.880
<v Speaker 3>think it's fair to say the most viral chart in AI.

0:01:00.240 --> 0:01:02.680
<v Speaker 4>Right, Yes, I would absolutely agree with that.

0:01:02.800 --> 0:01:05.600
<v Speaker 2>So one of the many lines that go up, or

0:01:05.600 --> 0:01:08.399
<v Speaker 2>there are various lines that sort of capture this. It's

0:01:08.840 --> 0:01:11.240
<v Speaker 2>essentially just measures of AI progress, of what they can do,

0:01:11.280 --> 0:01:14.360
<v Speaker 2>what the models are capable of, and so forth. And

0:01:15.000 --> 0:01:18.800
<v Speaker 2>you know, there's all different benchmarks out there and hobbyist

0:01:18.920 --> 0:01:22.120
<v Speaker 2>benchmark creators, et cetera, all kinds of benchmarks out there.

0:01:22.920 --> 0:01:26.880
<v Speaker 2>There's an organization called METR, based out in San Francisco, and they

0:01:26.920 --> 0:01:30.280
<v Speaker 2>measure how well AI models are doing at various sort

0:01:30.280 --> 0:01:34.000
<v Speaker 2>of engineering tasks, et cetera. And they have these charts

0:01:34.000 --> 0:01:37.440
<v Speaker 2>showing how long, you know, certain tasks, how long would

0:01:37.440 --> 0:01:40.240
<v Speaker 2>take a human to do them, and then whether AI

0:01:40.319 --> 0:01:43.040
<v Speaker 2>could do them. And yes, the line's just almost vertical.

0:01:43.040 --> 0:01:44.920
<v Speaker 2>I think there was one, one of the ones that

0:01:44.959 --> 0:01:47.720
<v Speaker 2>came out maybe very early this year or late last year,

0:01:47.800 --> 0:01:49.280
<v Speaker 2>showing the latest Claude model.

0:01:49.360 --> 0:01:50.920
<v Speaker 4>It, yes, like this is crazy.

0:01:51.040 --> 0:01:54.360
<v Speaker 3>When I look at these charts, they're called time horizon charts.

0:01:54.720 --> 0:01:57.880
<v Speaker 3>When I look at them, I like, intuitively I kind

0:01:57.880 --> 0:02:00.840
<v Speaker 3>of understand what they're saying, and you can kind of

0:02:00.880 --> 0:02:04.080
<v Speaker 3>see the leap in progress between some of the previous

0:02:04.120 --> 0:02:07.560
<v Speaker 3>models and Claude, right, the latest Claude model. And that's

0:02:07.560 --> 0:02:10.440
<v Speaker 3>what got everyone excited, was you had this big exponential

0:02:10.520 --> 0:02:14.400
<v Speaker 3>shift up in the capability of that particular AI model.

0:02:14.680 --> 0:02:17.600
<v Speaker 3>But then when I start like diving into what it

0:02:17.600 --> 0:02:21.320
<v Speaker 3>actually says on METR's website about what these charts represent,

0:02:21.440 --> 0:02:24.040
<v Speaker 3>I start getting really confused. I know everyone wants to

0:02:24.040 --> 0:02:26.520
<v Speaker 3>get excited about AI and charts going up in general,

0:02:26.680 --> 0:02:28.280
<v Speaker 3>but I think there's a lot of nuance here and

0:02:28.320 --> 0:02:30.200
<v Speaker 3>we should probably talk about it, because the other thing

0:02:30.240 --> 0:02:33.000
<v Speaker 3>going on with METR right now is they've become sort

0:02:33.000 --> 0:02:35.920
<v Speaker 3>of the industry standard benchmark, and so a lot of

0:02:36.080 --> 0:02:39.720
<v Speaker 3>investment decisions are being based on these charts. And if

0:02:39.720 --> 0:02:42.400
<v Speaker 3>you oversimplify them as just like okay, lines going up

0:02:42.760 --> 0:02:46.440
<v Speaker 3>and then suddenly it goes up even more, obviously, people

0:02:46.480 --> 0:02:48.280
<v Speaker 3>are going to start to get like maybe a little

0:02:48.280 --> 0:02:48.959
<v Speaker 3>over excited.

0:02:49.040 --> 0:02:51.160
<v Speaker 2>Can I say one other thing too that I'm very

0:02:51.160 --> 0:02:54.040
<v Speaker 2>curious about, Like, I'm really glad that there are people

0:02:54.240 --> 0:02:58.320
<v Speaker 2>designing various benchmarks for measuring AI progress. Seems like an

0:02:58.360 --> 0:03:01.119
<v Speaker 2>important thing to get a handle on. But like if

0:03:01.200 --> 0:03:05.480
<v Speaker 2>I were, like, say, like talented or smart enough to

0:03:05.520 --> 0:03:07.440
<v Speaker 2>be like doing these things, I would go work for

0:03:07.480 --> 0:03:09.560
<v Speaker 2>one of the labs and make ten million dollars a

0:03:09.639 --> 0:03:11.520
<v Speaker 2>year or something like that. And so I'm actually curious

0:03:11.560 --> 0:03:13.440
<v Speaker 2>to a lot of the nonprofits, et cetera. It's like,

0:03:13.960 --> 0:03:16.120
<v Speaker 2>do you really want to be like working at the

0:03:16.160 --> 0:03:19.480
<v Speaker 2>cutting edge of AI in a nonprofit? I mean, I

0:03:19.480 --> 0:03:22.040
<v Speaker 2>guess OpenAI is owned by a nonprofit, weirdly enough, but

0:03:22.080 --> 0:03:23.960
<v Speaker 2>you know what I'm saying, Like I would want the money.

0:03:24.160 --> 0:03:26.240
<v Speaker 3>We should talk about it with our guests who are

0:03:26.240 --> 0:03:27.480
<v Speaker 3>currently sitting right here.

0:03:28.480 --> 0:03:29.320
<v Speaker 4>That's exactly right.

0:03:29.400 --> 0:03:31.799
<v Speaker 2>I'm very excited to say we have the two perfect

0:03:31.840 --> 0:03:34.280
<v Speaker 2>guests to talk about the most viral and maybe

0:03:34.320 --> 0:03:36.640
<v Speaker 2>most important chart in AI right now. We're going to be

0:03:36.640 --> 0:03:38.560
<v Speaker 2>speaking with Joel Becker. He is a member of the

0:03:38.600 --> 0:03:41.160
<v Speaker 2>technical staff at METR, and we're also going to be

0:03:41.160 --> 0:03:44.160
<v Speaker 2>speaking with Chris Painter, the president of METR. So, uh,

0:03:44.600 --> 0:03:46.920
<v Speaker 2>Joel and Chris, thank you so much for coming on Odd

0:03:47.000 --> 0:03:50.960
<v Speaker 2>Lots. Thanks for having us. Yeah, really excited to chat with

0:03:51.080 --> 0:03:54.000
<v Speaker 2>both of you. Chris, you're the president. I'll start

0:03:54.040 --> 0:03:55.920
<v Speaker 2>with you, like, what is METR? How long has it

0:03:55.920 --> 0:03:58.640
<v Speaker 2>been around? What is this organization? What's its goal? Just

0:03:58.640 --> 0:04:02.000
<v Speaker 2>give us the sort of sixty-second synopsis of METR.

0:04:02.160 --> 0:04:02.640
<v Speaker 5>Yeah, totally.

0:04:02.720 --> 0:04:04.840
<v Speaker 6>I can try and you know, sometimes I give a

0:04:04.840 --> 0:04:07.040
<v Speaker 6>long version. I can try and do a short version here.

0:04:07.120 --> 0:04:11.080
<v Speaker 6>So METR is a research nonprofit based in the Bay Area,

0:04:11.160 --> 0:04:15.160
<v Speaker 6>like you said, dedicated to advancing the science of measuring

0:04:15.200 --> 0:04:19.159
<v Speaker 6>whether and when AI systems might pose catastrophic risks to

0:04:19.320 --> 0:04:23.680
<v Speaker 6>humanity as a whole, focused specifically on threats that come

0:04:23.720 --> 0:04:27.400
<v Speaker 6>from AI autonomy or AI systems themselves. So when you

0:04:27.440 --> 0:04:29.559
<v Speaker 6>talk about, there's kind of this whole field in AI

0:04:30.240 --> 0:04:34.360
<v Speaker 6>of dangerous capability evaluations, people asking: can this AI system

0:04:34.360 --> 0:04:37.960
<v Speaker 6>assist with a chemical or biological weapon attack? Can it advance,

0:04:38.360 --> 0:04:40.920
<v Speaker 6>kind of like, bad actors' ability to execute cyber attacks

0:04:40.960 --> 0:04:43.880
<v Speaker 6>on a really large scale. METR is sort of specialized

0:04:43.960 --> 0:04:48.880
<v Speaker 6>in specifically assessing how autonomous are AI systems, what is

0:04:48.920 --> 0:04:51.840
<v Speaker 6>the scale and like length and difficulty of tasks that

0:04:51.839 --> 0:04:55.000
<v Speaker 6>they're able to do by themselves, partially because we think

0:04:55.000 --> 0:04:59.080
<v Speaker 6>it sets the stakes for conversations about AI misalignment. So

0:04:59.120 --> 0:05:01.200
<v Speaker 6>we sort of see ourselves as being on the hook for

0:05:01.240 --> 0:05:04.559
<v Speaker 6>at any given point in time, giving humanity the bits

0:05:04.600 --> 0:05:08.480
<v Speaker 6>of evidence that are most informative for establishing the stakes

0:05:08.520 --> 0:05:12.400
<v Speaker 6>of are we reliant on AI systems as a society

0:05:12.440 --> 0:05:14.960
<v Speaker 6>in a way that could make it really bad if

0:05:15.000 --> 0:05:16.200
<v Speaker 6>they are misaligned.

0:05:16.480 --> 0:05:18.360
<v Speaker 3>I'm going to let Joe ask the question about why

0:05:18.360 --> 0:05:20.280
<v Speaker 3>you're both working in a nonprofit instead of one of

0:05:20.320 --> 0:05:23.440
<v Speaker 3>the labs later, But one question I do have is

0:05:23.480 --> 0:05:26.120
<v Speaker 3>when I think of METR, you guys always come up

0:05:26.120 --> 0:05:28.760
<v Speaker 3>in the context of these time horizon charts. And I

0:05:29.560 --> 0:05:31.240
<v Speaker 3>don't mean this as an insult or anything, but I

0:05:31.240 --> 0:05:34.600
<v Speaker 3>hardly ever hear anyone talk about the actual safety aspect

0:05:34.760 --> 0:05:36.440
<v Speaker 3>of your mission. Why do you think that is.

0:05:36.720 --> 0:05:39.599
<v Speaker 6>Yeah, So I think there's some distinction between our motive

0:05:39.680 --> 0:05:42.800
<v Speaker 6>for assessing time horizons and the kind of how it

0:05:42.839 --> 0:05:45.040
<v Speaker 6>gets used then by the rest of the world or

0:05:45.120 --> 0:05:47.040
<v Speaker 6>kind of like what the origin of the rest of

0:05:47.080 --> 0:05:49.520
<v Speaker 6>the world's interest in it is. For METR, I think the

0:05:49.920 --> 0:05:52.719
<v Speaker 6>reason that we work on things like the time horizon

0:05:52.800 --> 0:05:55.680
<v Speaker 6>charts is because if we're trying to establish the stakes

0:05:55.720 --> 0:05:59.120
<v Speaker 6>for talking about could AI systems go rogue or one

0:05:59.200 --> 0:06:01.599
<v Speaker 6>day could they like try to take over and subvert

0:06:01.680 --> 0:06:04.760
<v Speaker 6>human control? Three years ago, if you went back to

0:06:04.760 --> 0:06:07.320
<v Speaker 6>around when METR started, about four-ish years ago, and

0:06:07.400 --> 0:06:09.760
<v Speaker 6>it was started by Beth Barnes, Paul Christiano,

0:06:09.800 --> 0:06:12.720
<v Speaker 6>and this was kind of the initial motive. Is if

0:06:12.720 --> 0:06:14.880
<v Speaker 6>you went back then and you said, why don't I

0:06:14.960 --> 0:06:17.240
<v Speaker 6>think that AI systems are going to go rogue and

0:06:17.320 --> 0:06:20.680
<v Speaker 6>like take over or overthrow humanity today? The kind of

0:06:20.720 --> 0:06:22.400
<v Speaker 6>most intuitive you know, you can come up with a

0:06:22.400 --> 0:06:25.520
<v Speaker 6>lot of abstract reasons debates about the goals AI systems

0:06:25.560 --> 0:06:27.680
<v Speaker 6>might or might not eventually have, but the kind of

0:06:27.720 --> 0:06:30.920
<v Speaker 6>most damning in the moment reason is the AI system

0:06:31.040 --> 0:06:33.280
<v Speaker 6>just can't do much right. It doesn't make sense to

0:06:33.320 --> 0:06:36.560
<v Speaker 6>talk about a question answer system that like can't even

0:06:36.600 --> 0:06:40.000
<v Speaker 6>reliably answer programming questions saying like is it going to

0:06:40.200 --> 0:06:42.480
<v Speaker 6>hack my systems or like backdoor me in some way.

0:06:42.520 --> 0:06:44.120
<v Speaker 6>It just doesn't make any sense to talk about.

0:06:44.000 --> 0:06:46.040
<v Speaker 3>It's going to write you a poem that you asked.

0:06:46.240 --> 0:06:48.240
<v Speaker 6>Right, or won't even at the time they couldn't do

0:06:48.279 --> 0:06:50.880
<v Speaker 6>anything for themselves. And so if you're like, kind of

0:06:51.080 --> 0:06:54.159
<v Speaker 6>being able to subvert human control depends on agency, And

0:06:54.200 --> 0:06:55.960
<v Speaker 6>so we wanted to come up with a measure that

0:06:56.040 --> 0:06:58.400
<v Speaker 6>kind of tracks agency over time to kind of say,

0:06:58.400 --> 0:07:01.520
<v Speaker 6>when would this argument no longer hold? When are AI systems

0:07:01.560 --> 0:07:04.840
<v Speaker 6>now able to kind of do long, complex enough actions

0:07:04.839 --> 0:07:08.320
<v Speaker 6>by themselves that the argument kind of the goalposts almost

0:07:08.360 --> 0:07:10.920
<v Speaker 6>move somewhere else to like, well, we would catch the

0:07:10.960 --> 0:07:13.880
<v Speaker 6>AIS or the AIS don't want to subvert human control.

0:07:14.240 --> 0:07:17.000
<v Speaker 6>And so I agree that there is a distinction between

0:07:17.040 --> 0:07:19.600
<v Speaker 6>like how I think partially the exercise of trying to

0:07:19.600 --> 0:07:22.320
<v Speaker 6>come up with these measures throws off things that are

0:07:22.440 --> 0:07:26.000
<v Speaker 6>very like grounded and intuitive measures of AI progress that

0:07:26.080 --> 0:07:28.960
<v Speaker 6>might be more intuitive than just benchmarks. Right, So if

0:07:29.000 --> 0:07:30.600
<v Speaker 6>you a lot of people are in the game of

0:07:30.600 --> 0:07:33.480
<v Speaker 6>making just benchmarks, where you say, like here's my harm

0:07:33.480 --> 0:07:36.400
<v Speaker 6>bench or something the AI gets seventy percent. That's much

0:07:36.480 --> 0:07:39.400
<v Speaker 6>less of a kind of grounded or long lasting metric.

0:07:39.480 --> 0:07:41.560
<v Speaker 6>Like it's hard to say what that means or how

0:07:41.560 --> 0:07:44.240
<v Speaker 6>that generalizes, but the idea with time horizon is like,

0:07:44.280 --> 0:07:46.880
<v Speaker 6>maybe it's more intuitive, and I think that helps both

0:07:46.880 --> 0:07:49.280
<v Speaker 6>for safety and for like business understanding.

0:07:49.600 --> 0:07:52.240
<v Speaker 4>So let's talk about what this chart is.

0:07:52.240 --> 0:07:55.240
<v Speaker 2>I go to the main chart here at metr.org,

0:07:55.360 --> 0:07:58.320
<v Speaker 2>right on the front page, it's this time horizon chart,

0:07:58.360 --> 0:08:01.360
<v Speaker 2>and it shows Claude Opus four point six, as of

0:08:01.400 --> 0:08:05.240
<v Speaker 2>February twenty twenty six, able to complete a task length

0:08:05.360 --> 0:08:08.520
<v Speaker 2>of eleven hours and fifty nine minutes with a fifty

0:08:08.560 --> 0:08:13.280
<v Speaker 2>percent success rate. I have to admit, the first time

0:08:13.680 --> 0:08:16.160
<v Speaker 2>I saw this chart or version of this chart, what

0:08:16.280 --> 0:08:19.200
<v Speaker 2>I assume, and I suspect others assume, is that it

0:08:19.320 --> 0:08:21.000
<v Speaker 2>was able to go off and work on a task

0:08:21.040 --> 0:08:24.200
<v Speaker 2>for eleven hours and fifty nine minutes then come back

0:08:24.240 --> 0:08:27.200
<v Speaker 2>with an answer. But apparently it's not that. Why don't

0:08:27.240 --> 0:08:30.160
<v Speaker 2>you walk us through what's really being measured here? By

0:08:30.160 --> 0:08:34.240
<v Speaker 2>the way, the previous high was GPT five

0:08:34.280 --> 0:08:36.559
<v Speaker 2>point three Codex. That was five hours and fifty minutes.

0:08:36.679 --> 0:08:38.280
<v Speaker 2>So I guess part of the reason this chart

0:08:38.320 --> 0:08:40.720
<v Speaker 2>blew people's minds is because literally that's basically a double.

0:08:40.760 --> 0:08:42.640
<v Speaker 2>But why don't you talk to us about what's really

0:08:42.679 --> 0:08:43.400
<v Speaker 2>being measured here?

0:08:43.520 --> 0:08:46.520
<v Speaker 7>Yeah, so fundamentally, you know, in simpler terms, we are

0:08:46.559 --> 0:08:49.880
<v Speaker 7>plotting the difficulty of tasks the AIs are able to

0:08:49.920 --> 0:08:52.439
<v Speaker 7>complete over time. And you know, the particular way that

0:08:52.480 --> 0:08:54.959
<v Speaker 7>we measure the difficulty of tasks is in how long

0:08:55.000 --> 0:08:57.679
<v Speaker 7>it takes humans to complete those same tasks that we're

0:08:57.720 --> 0:08:59.880
<v Speaker 7>asking the AIs to do. So in this case, you know,

0:09:00.080 --> 0:09:02.400
<v Speaker 7>talking about, for Opus four point six, something like,

0:09:02.520 --> 0:09:05.520
<v Speaker 7>tasks that take humans twelve hours to do, we predict

0:09:05.559 --> 0:09:08.199
<v Speaker 7>that it will succeed at those tasks around fifty percent

0:09:08.240 --> 0:09:11.120
<v Speaker 7>of the time. And yeah, you know, it turns out

0:09:11.200 --> 0:09:14.280
<v Speaker 7>that when you plot using this particular difficulty measure how

0:09:14.280 --> 0:09:17.440
<v Speaker 7>performant AIs are relative to how long it takes humans

0:09:17.440 --> 0:09:21.280
<v Speaker 7>to complete these tasks, we see an exponential increase in

0:09:21.440 --> 0:09:24.640
<v Speaker 7>capabilities for AIs. And what that ends up meaning is

0:09:24.640 --> 0:09:27.640
<v Speaker 7>that you keep on having these doublings of capabilities every

0:09:27.760 --> 0:09:28.880
<v Speaker 7>let's say four months.

0:09:28.920 --> 0:09:30.640
<v Speaker 5>It seems on recent trends.

0:09:30.480 --> 0:09:32.600
<v Speaker 7>Where you know, the next model is not merely going

0:09:32.640 --> 0:09:35.280
<v Speaker 7>to have necessarily, you know, an hour longer time horizon,

0:09:35.320 --> 0:09:38.240
<v Speaker 7>but perhaps be having some multiple of the.

0:09:38.160 --> 0:09:40.319
<v Speaker 5>Time horizon of the previous model that's come out.
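
NOTE
Editor's sketch, not METR's own code. It checks the "basically a double" remark
against the two figures quoted in this episode (5h50m and 11h59m) and then
extrapolates under the "let's say four months" doubling time mentioned above.
The function name and the constant names are hypothetical, and the projection
is only what a fixed doubling time would imply, not a forecast.
```python
import math
# Figures quoted in the episode, converted to minutes.
PREVIOUS_HIGH_MIN = 5 * 60 + 50   # 5 hours 50 minutes
LATEST_MIN = 11 * 60 + 59         # 11 hours 59 minutes
ratio = LATEST_MIN / PREVIOUS_HIGH_MIN   # how many times longer the new horizon is
doublings = math.log2(ratio)             # the same gap expressed in doublings
def project_horizon(current_min, months_ahead, doubling_months=4.0):
    """Extrapolate a horizon, assuming it doubles every `doubling_months` months."""
    return current_min * 2.0 ** (months_ahead / doubling_months)
print(f"ratio: {ratio:.2f}x, doublings: {doublings:.2f}")
for months in (4, 8, 12):
    hours = project_horizon(LATEST_MIN, months) / 60
    print(f"+{months} months: about {hours:.0f} hours")
```
The point of the multiplicative form is the one made in the conversation: under
an exponential trend each step multiplies the horizon rather than adding a fixed
number of hours to it.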

0:09:40.520 --> 0:09:44.239
<v Speaker 2>So then explain how the number there, twelve hours,

0:09:44.360 --> 0:09:49.000
<v Speaker 2>is established. So there is some engineering task and you say, okay,

0:09:49.040 --> 0:09:51.760
<v Speaker 2>this is a task that would require twelve hours, but

0:09:52.080 --> 0:09:55.959
<v Speaker 2>humans have all different types of talent and capabilities. How do

0:09:56.000 --> 0:09:58.920
<v Speaker 2>you establish that, okay, this was a twelve hour task,

0:09:58.960 --> 0:10:00.679
<v Speaker 2>this was a six hour task, whatever?

0:10:00.960 --> 0:10:03.600
<v Speaker 7>Yeah, So the simple answer is, literally, we get humans

0:10:03.840 --> 0:10:05.800
<v Speaker 7>to sit down and complete the tasks that we give

0:10:05.800 --> 0:10:09.080
<v Speaker 7>to AIs, in as close to identical conditions as possible.

0:10:09.400 --> 0:10:11.400
<v Speaker 7>So first we come up with the tasks, and that's

0:10:11.480 --> 0:10:13.240
<v Speaker 7>you know, that's a whole other kettle of fish. We can

0:10:13.240 --> 0:10:16.240
<v Speaker 7>talk about exactly how we do that. And then, using

0:10:16.440 --> 0:10:18.960
<v Speaker 7>essentially the same tools that we're about to give the AIs,

0:10:19.240 --> 0:10:21.880
<v Speaker 7>we take talented humans, you know, not people who have

0:10:21.880 --> 0:10:24.080
<v Speaker 7>seen this particular type of task before, but people who

0:10:24.120 --> 0:10:26.959
<v Speaker 7>have relevant expertise. So if it's a software engineering task,

0:10:27.120 --> 0:10:29.760
<v Speaker 7>you know, they have software engineering expertise. Machine learning task,

0:10:29.840 --> 0:10:33.280
<v Speaker 7>they have machine learning expertise, and then we time them,

0:10:33.360 --> 0:10:35.360
<v Speaker 7>we see how long it takes for them to complete

0:10:35.400 --> 0:10:39.439
<v Speaker 7>those tasks successfully, and then roughly we call the difficulty

0:10:39.440 --> 0:10:41.720
<v Speaker 7>of the task as measured in human time to complete,

0:10:41.800 --> 0:10:44.200
<v Speaker 7>as the average time it took these humans to complete

0:10:44.200 --> 0:10:47.080
<v Speaker 7>the task. Then we'll run the AIs on this same

0:10:47.120 --> 0:10:49.960
<v Speaker 7>set of tasks. Typically today, for the very easiest tasks,

0:10:49.960 --> 0:10:52.240
<v Speaker 7>they're more or less always going to succeed; there's

0:10:52.280 --> 0:10:54.800
<v Speaker 7>some mid range of tasks where you know, perhaps they

0:10:54.800 --> 0:10:57.600
<v Speaker 7>succeed fifty percent of the time, or perhaps for some

0:10:57.720 --> 0:11:00.480
<v Speaker 7>tasks in that range they succeed zero percent of the

0:11:00.480 --> 0:11:02.079
<v Speaker 7>time and for others one hundred percent of the time.

0:11:02.080 --> 0:11:04.120
<v Speaker 7>And so they're getting fifty percent on average, let's say,

0:11:04.400 --> 0:11:06.920
<v Speaker 7>and then for the much harder tasks, perhaps they're getting

0:11:06.920 --> 0:11:10.720
<v Speaker 7>closer to zero percent. And then the point at which

0:11:10.760 --> 0:11:12.760
<v Speaker 7>we predict, you know, in the middle of all these

0:11:12.880 --> 0:11:15.000
<v Speaker 7>zero percents and one hundred percents by task, the point

0:11:15.040 --> 0:11:17.160
<v Speaker 7>at which we predict that they'd have a fifty percent

0:11:17.240 --> 0:11:20.560
<v Speaker 7>chance of succeeding. That is, either a fifty percent chance

0:11:20.600 --> 0:11:22.920
<v Speaker 7>of succeeding on some task or fifty percent of the

0:11:22.960 --> 0:11:25.920
<v Speaker 7>tasks of that difficulty that we think they would

0:11:25.960 --> 0:11:27.800
<v Speaker 7>succeed on. That's what we're going to call the time

0:11:27.840 --> 0:11:28.880
<v Speaker 7>horizon of these models.
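
NOTE
Editor's sketch of the procedure just described, under stated assumptions. The
episode only says the horizon is the task length at which success is predicted
to be fifty percent; the logistic curve in log2(human minutes), the plain
gradient-ascent fit, the function names, and the run data below are all
illustrative choices made here, not METR's implementation or results.
```python
import math
def fit_logistic(xs, ys, steps=20000, lr=0.05):
    """Fit P(success) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b
def time_horizon(a, b, p=0.5):
    """Human task length in minutes at which the fitted success probability equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)
# Made-up runs: (average human baseline time in minutes, 1 if the model succeeded else 0).
runs = [(2, 1), (4, 1), (8, 1), (15, 1), (30, 1), (30, 0), (60, 1),
        (60, 0), (120, 1), (120, 0), (240, 0), (480, 0), (960, 0)]
xs = [math.log2(minutes) for minutes, _ in runs]
ys = [float(success) for _, success in runs]
a, b = fit_logistic(xs, ys)
h50 = time_horizon(a, b, 0.5)   # the "fifty percent" horizon
h80 = time_horizon(a, b, 0.8)   # a stricter threshold gives a shorter horizon
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```
Because success falls as tasks get longer, the fitted slope is negative, so any
threshold above fifty percent lands on a shorter task length than the fifty
percent horizon does.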

0:11:29.120 --> 0:11:31.320
<v Speaker 6>I think one thing also that could be good to explain

0:11:31.360 --> 0:11:33.400
<v Speaker 6>here is the task distribution. I mean, this is not

0:11:33.640 --> 0:11:36.320
<v Speaker 6>this is not all activities that humans do. We are

0:11:36.360 --> 0:11:39.400
<v Speaker 6>specifically here interested in or the like. There's some question

0:11:39.559 --> 0:11:42.440
<v Speaker 6>in what tasks are you know, like Joel mentioned, we're

0:11:42.480 --> 0:11:44.240
<v Speaker 6>having people come into our office do the task to

0:11:44.240 --> 0:11:46.120
<v Speaker 6>get a sense of how long it takes. We're not

0:11:46.200 --> 0:11:48.800
<v Speaker 6>having them come in and like, you know, paint paintings

0:11:48.920 --> 0:11:51.800
<v Speaker 6>or write novels or you know. We're focused here specifically

0:11:51.800 --> 0:11:54.320
<v Speaker 6>on things that are in the distribution of work that

0:11:54.440 --> 0:11:57.559
<v Speaker 6>an engineer at a, like, we like to think of

0:11:57.559 --> 0:11:59.920
<v Speaker 6>it as like a frontier AI lab, the tasks that

0:12:00.000 --> 0:12:02.520
<v Speaker 6>they might be doing. So this is things like software engineering.

0:12:02.559 --> 0:12:05.800
<v Speaker 6>It's fine-tuning AI models, it is like software

0:12:05.920 --> 0:12:07.680
<v Speaker 6>machine learning, that kind of task.

0:12:07.720 --> 0:12:09.360
<v Speaker 3>Wait, can I just ask why did you decide to

0:12:09.400 --> 0:12:11.960
<v Speaker 3>focus on engineering? Because you could have widened out to

0:12:12.080 --> 0:12:14.600
<v Speaker 3>you know, if we're talking about AI being capable of,

0:12:14.800 --> 0:12:17.120
<v Speaker 3>you know, taking over the world, there are all sorts

0:12:17.120 --> 0:12:20.640
<v Speaker 3>of substantive tasks that would fall under that category. So

0:12:20.679 --> 0:12:21.840
<v Speaker 3>why just do engineering?

0:12:22.080 --> 0:12:24.400
<v Speaker 6>Yeah, I think that for one thing, maybe other people

0:12:24.440 --> 0:12:26.200
<v Speaker 6>on the team, or maybe Joel, has thoughts about this, but

0:12:26.600 --> 0:12:29.120
<v Speaker 6>I think my particular motive in being interested in the

0:12:29.120 --> 0:12:31.959
<v Speaker 6>time horizon on software tasks is that first of all,

0:12:31.960 --> 0:12:34.960
<v Speaker 6>it's the thing that the industry is very like already

0:12:34.960 --> 0:12:37.240
<v Speaker 6>even before we started working on this, is very focused on.

0:12:37.320 --> 0:12:39.120
<v Speaker 6>So it's one of the capabilities that you should expect

0:12:39.160 --> 0:12:41.439
<v Speaker 6>to come along for the ride earliest. It's the thing

0:12:41.480 --> 0:12:43.760
<v Speaker 6>that like a lot of optimization pressure is being exerted on.

0:12:44.040 --> 0:12:46.040
<v Speaker 6>And then I think that it is kind of the

0:12:46.240 --> 0:12:48.280
<v Speaker 6>like thing that you would expect as an early warning

0:12:48.400 --> 0:12:51.480
<v Speaker 6>kind of sign of this AI R&D automation. So

0:12:51.600 --> 0:12:54.680
<v Speaker 6>to some extent, METR thinks of itself as trying to

0:12:54.679 --> 0:12:57.800
<v Speaker 6>build, you know, science, or advance science, that can

0:12:58.040 --> 0:13:00.240
<v Speaker 6>say when are we getting to the point that AI systems

0:13:00.520 --> 0:13:03.559
<v Speaker 6>could improve themselves or speed up the pace of AI development.

0:13:03.559 --> 0:13:06.080
<v Speaker 6>When will AI research kind of feed on itself? And

0:13:06.120 --> 0:13:09.319
<v Speaker 6>the kind of core capability for that might be software

0:13:09.360 --> 0:13:12.920
<v Speaker 6>engineering and machine learning research ability. There are other

0:13:13.000 --> 0:13:15.240
<v Speaker 6>skills that could be relevant to taking over the world.

0:13:15.800 --> 0:13:17.440
<v Speaker 6>I think other people have done time horizons on, like,

0:13:17.480 --> 0:13:18.640
<v Speaker 6>cybersecurity since.

0:13:19.200 --> 0:13:21.280
<v Speaker 3>But I suppose it is true like the basilisk isn't

0:13:21.280 --> 0:13:23.720
<v Speaker 3>going to paint its way into like power or something

0:13:23.760 --> 0:13:25.400
<v Speaker 3>like that. Okay, it might.

0:13:25.400 --> 0:13:28.960
<v Speaker 2>Deceive you. It might be very convincing or cunning in

0:13:29.000 --> 0:13:31.400
<v Speaker 2>some way, and you hand over the keys.

0:13:32.240 --> 0:13:33.920
<v Speaker 5>I always say for your mental models.

0:13:33.920 --> 0:13:36.360
<v Speaker 7>You know, we don't have perfect evidence of this whatsoever,

0:13:36.720 --> 0:13:39.840
<v Speaker 7>but my rough sense, sort of colloquially or you know,

0:13:39.880 --> 0:13:42.800
<v Speaker 7>my prior before evidence comes in, is that if we

0:13:42.880 --> 0:13:45.520
<v Speaker 7>did study tasks on these very different distributions, you know,

0:13:45.520 --> 0:13:47.920
<v Speaker 7>not machine learning, not software engineering. I'm not sure about

0:13:47.920 --> 0:13:50.400
<v Speaker 7>painting exactly, but you know, perhaps or other kinds of

0:13:50.440 --> 0:13:53.320
<v Speaker 7>task distributions that we could enumerate that basically we would

0:13:53.400 --> 0:13:57.880
<v Speaker 7>see this similarly shaped exponential progress over time where every

0:13:57.960 --> 0:14:00.199
<v Speaker 7>I'm not sure exactly, but let's say, you know, four months,

0:14:00.240 --> 0:14:03.680
<v Speaker 7>six months, something like that, the level of capabilities as

0:14:03.720 --> 0:14:06.320
<v Speaker 7>measured in time horizon would be doubling at something like

0:14:06.320 --> 0:14:08.920
<v Speaker 7>that pace, maybe from a much lower level. So you know,

0:14:09.000 --> 0:14:11.400
<v Speaker 7>one example that we do have better evidence of is

0:14:11.440 --> 0:14:14.360
<v Speaker 7>that the AIs today are much less performant at, you know,

0:14:14.400 --> 0:14:17.199
<v Speaker 7>anything that requires vision capabilities, seeing what's on a screen,

0:14:17.320 --> 0:14:20.040
<v Speaker 7>clicking around at a computer, but they're getting you know,

0:14:20.120 --> 0:14:22.520
<v Speaker 7>tremendously better at that sort of thing over time.

0:14:22.840 --> 0:14:23.920
<v Speaker 5>I'll just mention quickly.

0:14:23.920 --> 0:14:26.479
<v Speaker 6>We did actually do a very kind of brief investigation

0:14:26.600 --> 0:14:29.640
<v Speaker 6>of this in other task distributions. That's on our website somewhere,

0:14:29.880 --> 0:14:31.920
<v Speaker 6>like cross domain time horizons. I think we looked at

0:14:32.000 --> 0:14:35.000
<v Speaker 6>data that Tesla shared on self-driving, and I'm forgetting

0:14:35.000 --> 0:14:37.200
<v Speaker 6>the other. There's like OSWorld. Maybe some of these

0:14:37.240 --> 0:14:39.920
<v Speaker 6>are like somewhat similar, still kind of in the distribution

0:14:39.920 --> 0:14:42.680
<v Speaker 6>of software tasks, but trying to get further afield into

0:14:42.720 --> 0:14:43.440
<v Speaker 6>things like vision.

0:14:59.240 --> 0:15:02.080
<v Speaker 3>How big is the sample size on the humans who

0:15:02.120 --> 0:15:05.480
<v Speaker 3>are actually doing work? And also is it getting harder

0:15:05.680 --> 0:15:08.520
<v Speaker 3>getting like human engineers into the room to compete with

0:15:08.600 --> 0:15:11.240
<v Speaker 3>like Claude Opus four point six versus say, if I

0:15:11.280 --> 0:15:14.320
<v Speaker 3>was a mediocre engineer, and I'm not, I'm a non

0:15:14.360 --> 0:15:16.920
<v Speaker 3>existent engineer, but if I was a mediocre one, I

0:15:16.920 --> 0:15:19.000
<v Speaker 3>would like maybe I would feel good about going up

0:15:19.040 --> 0:15:22.000
<v Speaker 3>against like GPT three or something, and maybe I would

0:15:22.040 --> 0:15:25.520
<v Speaker 3>feel a lot worse about myself going up against like Claude.

0:15:25.640 --> 0:15:28.000
<v Speaker 7>Yeah, you know, on these tasks, I'm in a pretty

0:15:28.040 --> 0:15:31.760
<v Speaker 7>similar position myself to you. So we have approximately three,

0:15:31.880 --> 0:15:35.239
<v Speaker 7>although it varies quite a lot across tasks. Human baselines

0:15:35.440 --> 0:15:37.840
<v Speaker 7>per task. So, you know, typically we're rarely ever going over

0:15:37.880 --> 0:15:40.640
<v Speaker 7>something like three. I think the final numbers, it's my

0:15:40.680 --> 0:15:42.800
<v Speaker 7>impression that they're not going to be so sensitive to

0:15:42.840 --> 0:15:44.280
<v Speaker 7>the particular baselines that we use.

0:15:44.680 --> 0:15:47.320
<v Speaker 6>Aren't the longer tasks more weakly baselined?

0:15:47.720 --> 0:15:50.360
<v Speaker 7>Yeah, So indeed, I think it will get a lot

0:15:50.360 --> 0:15:52.720
<v Speaker 7>harder to baseline these tasks as the length of task

0:15:52.800 --> 0:15:55.640
<v Speaker 7>AIs are able to successfully complete gets longer and longer.

0:15:55.760 --> 0:15:57.440
<v Speaker 7>You know, you might think at some points the length

0:15:57.440 --> 0:15:59.840
<v Speaker 7>of task that they can complete is longer than the

0:16:00.320 --> 0:16:02.440
<v Speaker 7>time in four months time, they're going to be able

0:16:02.440 --> 0:16:04.720
<v Speaker 7>to complete tasks of more than four months, and then

0:16:04.760 --> 0:16:06.920
<v Speaker 7>it, you know, kind of becomes close to impossible

0:16:06.960 --> 0:16:09.320
<v Speaker 7>to get these four-month-long baselines. Of course we're

0:16:09.360 --> 0:16:11.560
<v Speaker 7>not at that point yet, but you know, definitely has

0:16:11.600 --> 0:16:14.280
<v Speaker 7>become more difficult to get these baselines as time has

0:16:14.320 --> 0:16:14.640
<v Speaker 7>gone on.

0:16:15.000 --> 0:16:17.640
<v Speaker 5>At the moment, not impossible, but very challenging.

0:16:17.640 --> 0:16:21.200
<v Speaker 3>Joe, these are the future jobs for displaced engineers, right? It's

0:16:21.320 --> 0:16:25.360
<v Speaker 3>competing against the Claudes for benchmark evaluation. We

0:16:25.400 --> 0:16:26.120
<v Speaker 3>found the jobs.

0:16:26.480 --> 0:16:29.600
<v Speaker 2>So we mentioned at the beginning the most viral chart in AI

0:16:30.320 --> 0:16:32.360
<v Speaker 2>is this chart that you have on the front of

0:16:32.360 --> 0:16:35.760
<v Speaker 2>your website. Your website defaults to this and it shows

0:16:36.000 --> 0:16:39.040
<v Speaker 2>you know, this doubling. So if we actually go back

0:16:39.080 --> 0:16:43.280
<v Speaker 2>to November let's say November twenty twenty five, Gemini three

0:16:43.480 --> 0:16:46.320
<v Speaker 2>Pro, three hours and forty-four minutes; Claude Opus

0:16:46.320 --> 0:16:48.920
<v Speaker 2>four point six, twelve hours. Those are the fifty percent

0:16:49.280 --> 0:16:53.040
<v Speaker 2>success benchmark. If we go to the eighty percent benchmark,

0:16:53.120 --> 0:16:57.880
<v Speaker 2>which the website doesn't default to, the pace of

0:16:57.920 --> 0:17:02.240
<v Speaker 2>improvement looks a little less impressive to me. So okay,

0:17:02.800 --> 0:17:06.320
<v Speaker 2>now it's like it does not have the same gap

0:17:06.480 --> 0:17:09.959
<v Speaker 2>pretty clearly. Now eighty percent is still not one hundred percent.

0:17:10.400 --> 0:17:14.160
<v Speaker 2>And I know that METR's goal is about,

0:17:14.200 --> 0:17:17.000
<v Speaker 2>like you know, human safety and all this stuff. But

0:17:17.040 --> 0:17:19.199
<v Speaker 2>when we think about people look at this and they

0:17:19.560 --> 0:17:22.600
<v Speaker 2>use it as a stand-in for how performant are

0:17:22.640 --> 0:17:27.600
<v Speaker 2>these models? Even eighty percent, you know, certainly for like

0:17:27.640 --> 0:17:31.080
<v Speaker 2>any business application. I understand you're not like serving business

0:17:31.119 --> 0:17:34.920
<v Speaker 2>here per se, but probably businesses care about this. Even

0:17:34.960 --> 0:17:37.560
<v Speaker 2>eighty percent may not be good enough. And it

0:17:37.600 --> 0:17:40.080
<v Speaker 2>does not look as crazy when you look at the

0:17:40.119 --> 0:17:43.240
<v Speaker 2>eighty percent chart as it does at the fifty percent chart.

0:17:43.800 --> 0:17:47.080
<v Speaker 2>Why the focus on the fifty percent chart? And given like,

0:17:47.720 --> 0:17:50.280
<v Speaker 2>why not look at the chart that just does not

0:17:50.480 --> 0:17:51.840
<v Speaker 2>look as impressive.

0:17:51.960 --> 0:17:53.840
<v Speaker 5>Yeah, maybe two central things to say.

0:17:54.119 --> 0:17:56.240
<v Speaker 7>One, to my eyes, the eighty percent chart basically

0:17:56.359 --> 0:17:59.480
<v Speaker 7>does look as impressive. Well, the doubling time is about

0:17:59.480 --> 0:18:00.680
<v Speaker 7>the same.

0:18:00.480 --> 0:18:02.520
<v Speaker 5>Oh, it's the same.

0:18:02.640 --> 0:18:05.879
<v Speaker 6>It's the same, it's an offset, but it's the

0:18:05.880 --> 0:18:06.960
<v Speaker 6>same pace of progress.

0:18:07.040 --> 0:18:09.040
<v Speaker 7>You know, it's something like five times smaller than the

0:18:09.480 --> 0:18:11.360
<v Speaker 7>fifty percent number.

0:18:11.440 --> 0:18:13.120
<v Speaker 5>But you know that only takes you two doublings.

0:18:13.160 --> 0:18:15.240
<v Speaker 7>And if each doubling takes around four months, that means

0:18:15.280 --> 0:18:16.919
<v Speaker 7>that in eight months time you're going to have the

0:18:16.920 --> 0:18:19.560
<v Speaker 7>same eighty percent success rate roughly as you do fifty

0:18:19.560 --> 0:18:20.679
<v Speaker 7>percent success rate today.

0:18:20.880 --> 0:18:21.840
<v Speaker 5>That's one thing to say.
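
NOTE
A rough sketch of the arithmetic described above, not the guests' own code.
It assumes only the figures quoted in the conversation: an eighty percent
horizon about five times shorter than the fifty percent one, and a doubling
time of about four months. The function name is hypothetical.
```python
import math
def months_to_catch_up(ratio, doubling_months):
    # Doublings needed to grow by `ratio`, times the months each doubling takes.
    return math.log2(ratio) * doubling_months
# Figures quoted in the conversation (assumed, approximate).
exact = months_to_catch_up(5.0, 4.0)
rounded = months_to_catch_up(4.0, 4.0)
print(f"5x gap: {exact:.1f} months; 4x gap (two doublings): {rounded:.1f} months")
```
A five-fold gap is about 2.3 doublings, so a little over nine months; rounding
it to two doublings gives the eight months mentioned in the conversation.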

0:18:21.960 --> 0:18:24.080
<v Speaker 7>Maybe a second thing to say is, you know, remember

0:18:24.119 --> 0:18:26.520
<v Speaker 7>at the beginning I said, essentially what we're doing is

0:18:26.560 --> 0:18:29.800
<v Speaker 7>plotting the difficulty of tasks that these ais can complete

0:18:29.800 --> 0:18:32.000
<v Speaker 7>over time, just with this particular measure that ends up

0:18:32.000 --> 0:18:35.600
<v Speaker 7>showing this clean exponential trend. And we've picked a particular

0:18:35.640 --> 0:18:37.760
<v Speaker 7>number as our difficulty number, and you know that is

0:18:37.800 --> 0:18:40.520
<v Speaker 7>this fifty percent reliability threshold. We could have picked a

0:18:40.520 --> 0:18:42.840
<v Speaker 7>different one. I think there are reasons for picking the

0:18:42.960 --> 0:18:46.120
<v Speaker 7>fifty percent one. In particular, it's the one that statistically

0:18:46.160 --> 0:18:49.239
<v Speaker 7>we're better able to measure. For some technical reasons, it's

0:18:49.280 --> 0:18:51.719
<v Speaker 7>the one that shows up in previous literature. So

0:18:51.840 --> 0:18:53.439
<v Speaker 7>there are a couple of other reasons

0:18:53.440 --> 0:18:55.080
<v Speaker 7>why we went for fifty percent rather than

0:18:55.119 --> 0:18:58.520
<v Speaker 7>eighty percent. Maybe a final thing to say is that

0:18:58.880 --> 0:19:03.399
<v Speaker 7>this fifty percent number is sort of equivocating between these tasks

0:19:03.440 --> 0:19:05.800
<v Speaker 7>it's able to complete fifty percent of the time, and

0:19:05.960 --> 0:19:07.960
<v Speaker 7>fifty percent of the tasks it's able to complete one

0:19:08.000 --> 0:19:09.880
<v Speaker 7>hundred percent of the time, and fifty percent it's able

0:19:09.880 --> 0:19:12.720
<v Speaker 7>to complete zero percent of the time. And actually, I

0:19:12.760 --> 0:19:15.080
<v Speaker 7>think the situation is it's somewhere in between, but it's

0:19:15.080 --> 0:19:17.000
<v Speaker 7>a little bit closer to the latter, where there are

0:19:17.000 --> 0:19:20.080
<v Speaker 7>some tasks that it's completing with near perfect reliability and

0:19:20.119 --> 0:19:22.560
<v Speaker 7>some tasks in that range that it's completing with very

0:19:22.560 --> 0:19:27.040
<v Speaker 7>low reliability. And for downstream economic applications or for applications

0:19:27.080 --> 0:19:29.800
<v Speaker 7>inside of these major AI companies or something, you know,

0:19:30.160 --> 0:19:32.240
<v Speaker 7>you might think that that's more favorable in some sense,

0:19:32.240 --> 0:19:34.000
<v Speaker 7>that there are some of these tasks where we're getting

0:19:34.080 --> 0:19:36.919
<v Speaker 7>one hundred percent reliability, even for very challenging tasks.

0:19:37.000 --> 0:19:39.240
<v Speaker 6>I think two other things. Maybe it could be

0:19:39.320 --> 0:19:41.240
<v Speaker 6>useful to just explain, when you said that there are

0:19:41.280 --> 0:19:44.679
<v Speaker 6>technical reasons why it's easiest to measure at fifty percent. One, like,

0:19:44.920 --> 0:19:47.080
<v Speaker 6>it is just the case that it is fifty percent

0:19:47.160 --> 0:19:49.600
<v Speaker 6>is the point at which it is like least sensitive

0:19:49.680 --> 0:19:52.280
<v Speaker 6>to Like the distribution is kind of thickest, right, I mean,

0:19:52.359 --> 0:19:54.200
<v Speaker 6>correct me if this is wrong, But my I mean

0:19:54.440 --> 0:19:57.000
<v Speaker 6>there are like to resolve something like ninety five percent,

0:19:57.080 --> 0:19:59.199
<v Speaker 6>you would need way more samples because then you need

0:19:59.240 --> 0:20:01.480
<v Speaker 6>to have some that are like you need way more

0:20:01.480 --> 0:20:03.880
<v Speaker 6>samples to be able to resolve that level of precision.

0:20:04.040 --> 0:20:05.800
<v Speaker 7>I think there are some caveats to that picture. But

0:20:06.000 --> 0:20:07.760
<v Speaker 7>let's say even more extreme. You know, let's say that

0:20:07.800 --> 0:20:10.520
<v Speaker 7>we cared about, you know, ninety-nine percent. In that case,

0:20:10.600 --> 0:20:13.880
<v Speaker 7>if we had one percent label noise, quote unquote, that

0:20:13.880 --> 0:20:16.280
<v Speaker 7>is, you know, if sometimes we were accidentally grading

0:20:16.320 --> 0:20:18.800
<v Speaker 7>some of the failing tasks as passing, some of the passing

0:20:18.840 --> 0:20:21.159
<v Speaker 7>tasks as failing, then we'd just never be able to

0:20:21.200 --> 0:20:25.360
<v Speaker 7>estimate that reliably, right? And at fifty percent, this comes

0:20:25.359 --> 0:20:26.600
<v Speaker 7>a little bit closer to washing out.
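
NOTE
A small illustration of the label-noise point above, not the guests' own code.
It assumes symmetric noise: each grade is flipped (pass to fail or fail to
pass) with the same probability. The function name and the noise model are
assumptions made for this sketch.
```python
def observed_pass_rate(true_rate, flip_prob):
    # Passes that stay passes, plus failures mistakenly graded as passes.
    return true_rate * (1 - flip_prob) + (1 - true_rate) * flip_prob
for true_rate in (0.50, 0.99):
    seen = observed_pass_rate(true_rate, 0.01)
    print(f"true pass rate {true_rate:.2f}, observed {seen:.4f}")
```
Under this assumption a true fifty percent pass rate is still observed as
fifty percent, while a true ninety-nine percent is observed as about 98.02
percent, so the apparent failure rate nearly doubles.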

0:20:26.680 --> 0:20:29.160
<v Speaker 6>And I think one other intuitive thing here, or one intuition,

0:20:29.280 --> 0:20:31.600
<v Speaker 6>is that if you give me a task and you

0:20:31.640 --> 0:20:33.760
<v Speaker 6>give me the model, it is the point at which

0:20:33.800 --> 0:20:35.720
<v Speaker 6>I think that the model, all you tell me is

0:20:35.760 --> 0:20:37.600
<v Speaker 6>the time or the length of task that it takes

0:20:37.600 --> 0:20:40.120
<v Speaker 6>a human to do the task. The fifty percent time

0:20:40.119 --> 0:20:42.520
<v Speaker 6>horizon is the point at which I think it is

0:20:42.600 --> 0:20:45.199
<v Speaker 6>more likely that the model will be able to do

0:20:45.280 --> 0:20:47.400
<v Speaker 6>the task than that it can't. And I just find

0:20:47.400 --> 0:20:47.960
<v Speaker 6>that intuitive.

0:20:48.040 --> 0:20:50.679
<v Speaker 3>Yeah, how much interest do you get on these charts

0:20:50.680 --> 0:20:54.280
<v Speaker 3>from potential investors specifically? And the reason I ask is

0:20:54.320 --> 0:20:56.800
<v Speaker 3>because I was just messing around and like googling some

0:20:56.840 --> 0:21:00.320
<v Speaker 3>stuff and when the Opus chart, the latest Opus chart

0:21:00.359 --> 0:21:02.760
<v Speaker 3>came up, someone posted it on Reddit and I think

0:21:02.800 --> 0:21:05.440
<v Speaker 3>like the second comment on it was someone going, how

0:21:05.440 --> 0:21:08.600
<v Speaker 3>do I invest in OpenAI? And like,

0:21:08.720 --> 0:21:10.919
<v Speaker 3>people were they were trying to club together to like

0:21:11.040 --> 0:21:13.680
<v Speaker 3>invest in these companies. So clearly there are people out

0:21:13.680 --> 0:21:16.480
<v Speaker 3>there who are using these charts as investment tools.

0:21:16.720 --> 0:21:19.360
<v Speaker 6>I would say, you know, we don't get an enormous

0:21:19.400 --> 0:21:22.199
<v Speaker 6>amount of inbound from investment firms. I mean, sometimes, you know,

0:21:22.320 --> 0:21:24.239
<v Speaker 6>VCs or whatever, we're based in the Bay Area, will

0:21:24.280 --> 0:21:27.120
<v Speaker 6>reach out to us. I think that there's some kind

0:21:27.160 --> 0:21:30.360
<v Speaker 6>of principle of our goal is to inform the public

0:21:30.800 --> 0:21:34.360
<v Speaker 6>and give them the best evidence that we can about

0:21:34.480 --> 0:21:36.239
<v Speaker 6>when we might get to this point of kind of

0:21:36.400 --> 0:21:39.200
<v Speaker 6>you know, AI being you know, fully autonomous or able

0:21:39.240 --> 0:21:42.960
<v Speaker 6>to improve itself. And there's some principle at play here

0:21:43.000 --> 0:21:45.760
<v Speaker 6>of like I kind of want to enable people to

0:21:45.800 --> 0:21:49.159
<v Speaker 6>do whatever they will do with that information, and I

0:21:49.200 --> 0:21:52.359
<v Speaker 6>think that we don't engage a ton in kind of

0:21:52.359 --> 0:21:55.560
<v Speaker 6>the like business side or investment implication of the work.

0:21:55.800 --> 0:21:58.119
<v Speaker 6>One kind of thought experiment I sometimes say to myself

0:21:58.160 --> 0:22:00.200
<v Speaker 6>is if I do believe that at some point we're

0:22:00.240 --> 0:22:03.600
<v Speaker 6>going to get this AI that's improving itself, and where

0:22:03.640 --> 0:22:05.719
<v Speaker 6>like AI research is automated, and you have all these

0:22:05.720 --> 0:22:08.520
<v Speaker 6>fears about a singularity, would I rather that like all

0:22:08.560 --> 0:22:12.000
<v Speaker 6>of Wall Street like falsely didn't think that was coming

0:22:12.200 --> 0:22:14.480
<v Speaker 6>when I believed it was coming, Or would I want

0:22:14.520 --> 0:22:16.800
<v Speaker 6>them all to know that it was coming, given that

0:22:16.880 --> 0:22:19.399
<v Speaker 6>I believe it's coming, and I think all of human

0:22:19.520 --> 0:22:21.160
<v Speaker 6>Maybe this is more a personal view, but I think

0:22:21.720 --> 0:22:25.000
<v Speaker 6>if this is possible that we will automate AI research,

0:22:25.240 --> 0:22:28.520
<v Speaker 6>I think all of humanity being aware of it, aware

0:22:28.560 --> 0:22:31.639
<v Speaker 6>of where we're heading, is sort of a precondition for

0:22:31.720 --> 0:22:33.439
<v Speaker 6>us all being able to figure out what to do

0:22:33.600 --> 0:22:35.679
<v Speaker 6>about it. And so I don't kind of want like

0:22:35.800 --> 0:22:37.959
<v Speaker 6>certain people or one side or one team to kind

0:22:38.000 --> 0:22:40.400
<v Speaker 6>of like selectively be in the dark because they might

0:22:40.640 --> 0:22:43.080
<v Speaker 6>invest on the basis of this or something like that.

0:22:43.160 --> 0:22:44.840
<v Speaker 6>But we don't, you know, it's not where we put

0:22:44.840 --> 0:22:47.719
<v Speaker 6>our time. We're focused on informing the public. The public

0:22:47.720 --> 0:22:48.959
<v Speaker 6>includes some investors.

0:22:49.280 --> 0:22:52.240
<v Speaker 3>So on that note, like what is the actual level

0:22:52.359 --> 0:22:56.479
<v Speaker 3>at which we're all presumably supposed to panic or at which, like,

0:22:56.560 --> 0:22:58.800
<v Speaker 3>if you're a policy maker, you would start to get

0:22:58.840 --> 0:23:01.439
<v Speaker 3>worried about AI being able to automate and improve on

0:23:01.520 --> 0:23:04.760
<v Speaker 3>itself in a way that eventually becomes detrimental to humanity.

0:23:05.080 --> 0:23:07.640
<v Speaker 7>I don't know exactly what the level is on this time

0:23:07.680 --> 0:23:09.600
<v Speaker 7>horizon measure. I think, you know, one thing to

0:23:09.640 --> 0:23:12.399
<v Speaker 7>say is we have made real progress on the science

0:23:12.440 --> 0:23:14.880
<v Speaker 7>of measuring these AI systems and how capable they are.

0:23:14.920 --> 0:23:16.480
<v Speaker 7>But I think there's a long way to go, and

0:23:16.560 --> 0:23:19.240
<v Speaker 7>in an important sense, I think we're behind on this task.

0:23:19.600 --> 0:23:24.360
<v Speaker 7>We're measuring some underlying technical trend and at some point

0:23:24.440 --> 0:23:26.800
<v Speaker 7>I do think that implies greater risks.

0:23:26.800 --> 0:23:27.960
<v Speaker 5>So astonishing things happening.

0:23:28.000 --> 0:23:30.840
<v Speaker 7>Although Chris can speak more to other arguments that we

0:23:30.920 --> 0:23:33.080
<v Speaker 7>might back out to for why even if AIs are

0:23:33.160 --> 0:23:36.280
<v Speaker 7>very capable, we still might not see catastrophic dangers emerge

0:23:36.359 --> 0:23:37.720
<v Speaker 7>in the short term.

0:23:38.200 --> 0:23:39.520
<v Speaker 5>Yeah, I'm sure, you know.

0:23:39.560 --> 0:23:43.159
<v Speaker 2>I think part of the reason why the AGI chatter

0:23:43.200 --> 0:23:45.760
<v Speaker 2>has really picked up, particularly in the wake of like

0:23:45.880 --> 0:23:50.480
<v Speaker 2>everyone using Claude Code, is it's very easy to imagine

0:23:50.560 --> 0:23:52.080
<v Speaker 2>it. So like you're sitting there, it's like, yeah,

0:23:52.160 --> 0:23:53.720
<v Speaker 2>do this, do this. It's like I don't even need

0:23:53.720 --> 0:23:55.400
<v Speaker 2>to be here, right, I think you sort of get

0:23:55.440 --> 0:23:57.680
<v Speaker 2>a very intuitive feel for like how the human could

0:23:57.680 --> 0:24:00.760
<v Speaker 2>come out of the loop. What happens today, because I'm

0:24:00.800 --> 0:24:03.239
<v Speaker 2>sure this has been tried, like if you go to like

0:24:04.119 --> 0:24:06.919
<v Speaker 2>ChatGPT and you say, here, you

0:24:07.000 --> 0:24:09.920
<v Speaker 2>have Claude Code access, go build something, and the AI

0:24:10.040 --> 0:24:13.280
<v Speaker 2>is... what actually happens today when AI is working with AI?

0:24:13.840 --> 0:24:16.520
<v Speaker 7>Yeah, my sense is that at some point, you know,

0:24:16.880 --> 0:24:19.520
<v Speaker 7>a further-away point than would have been true some time ago,

0:24:19.760 --> 0:24:22.760
<v Speaker 7>the AIs will more or less fall on their faces,

0:24:22.440 --> 0:24:24.679
<v Speaker 7>that, you know, there are some things they're not so

0:24:24.720 --> 0:24:27.520
<v Speaker 7>capable of today, like collaborative hallucinations.

0:24:27.560 --> 0:24:30.840
<v Speaker 4>Well, they're just like, you know, just like devolve terribly.

0:24:30.960 --> 0:24:33.240
<v Speaker 5>Yeah, I think all sorts of ways it can go.

0:24:33.400 --> 0:24:34.760
<v Speaker 7>You know, at some point they're going to need to

0:24:34.800 --> 0:24:37.840
<v Speaker 7>rely on external resources, and today they're not as

0:24:37.880 --> 0:24:41.280
<v Speaker 7>capable at managing these external resources effectively. I think they're

0:24:41.320 --> 0:24:44.320
<v Speaker 7>less capable at sort of ideation and sort of self

0:24:44.320 --> 0:24:46.359
<v Speaker 7>awareness about where they are in the problem today than

0:24:46.359 --> 0:24:48.840
<v Speaker 7>they are at these kind of raw software engineering skills.

0:24:48.960 --> 0:24:50.520
<v Speaker 7>You know, you know, as you mentioned, the ways in

0:24:50.560 --> 0:24:54.240
<v Speaker 7>which AIs are autonomous today or close to autonomous today,

0:24:54.560 --> 0:24:57.240
<v Speaker 7>is the human has the idea and then you know,

0:24:57.480 --> 0:25:00.480
<v Speaker 7>submits that idea to Claude Code or Codex or

0:25:00.480 --> 0:25:03.320
<v Speaker 7>one of these other agentic AI tools, and then they handle

0:25:03.359 --> 0:25:06.720
<v Speaker 7>the software engineering components and possibly there's still some

0:25:06.920 --> 0:25:09.600
<v Speaker 7>intervention after that. I do imagine that the sort of

0:25:09.760 --> 0:25:12.320
<v Speaker 7>circle of autonomy or something gets larger over time. I

0:25:12.359 --> 0:25:14.960
<v Speaker 7>do think there's no fundamental barrier, it seems to me

0:25:15.040 --> 0:25:17.280
<v Speaker 7>today, to AIs having those ideas, and so we move to

0:25:17.320 --> 0:25:20.280
<v Speaker 7>a greater level of abstraction. But if we were purely

0:25:20.280 --> 0:25:23.680
<v Speaker 7>relying today on these fully autonomous capabilities, you know, could

0:25:23.760 --> 0:25:27.000
<v Speaker 7>you manage research departments, any departments of your choice

0:25:27.000 --> 0:25:28.520
<v Speaker 7>inside of a major AI company.

0:25:28.720 --> 0:25:30.080
<v Speaker 5>No, my guess is probably not.

0:25:31.160 --> 0:25:33.200
<v Speaker 3>Actually on this note, this reminds me something I wanted

0:25:33.240 --> 0:25:36.920
<v Speaker 3>to ask. So when you look at the domain specific

0:25:37.520 --> 0:25:40.080
<v Speaker 3>time horizon charts, so the ones that show like I

0:25:40.119 --> 0:25:42.320
<v Speaker 3>think you call them task suites or something like that.

0:25:42.359 --> 0:25:45.439
<v Speaker 3>Like I guess productivity by a specific job, and you

0:25:45.520 --> 0:25:49.399
<v Speaker 3>see these different lines, So sometimes you see like almost

0:25:49.480 --> 0:25:53.199
<v Speaker 3>horizontal lines and sometimes you see squiggly or steeper lines.

0:25:53.920 --> 0:25:56.880
<v Speaker 3>What is actually happening there? Like, how are we supposed

0:25:56.880 --> 0:25:59.600
<v Speaker 3>to interpret that? Like is this a measurement problem? Or

0:25:59.640 --> 0:26:02.520
<v Speaker 3>does it say something very fundamental about like what AI

0:26:03.359 --> 0:26:06.000
<v Speaker 3>can and can't do under current conditions.

0:26:06.160 --> 0:26:07.600
<v Speaker 6>The thing that I think would be good for Joel

0:26:07.680 --> 0:26:09.840
<v Speaker 6>to explain is that I think that there is a

0:26:09.880 --> 0:26:13.880
<v Speaker 6>distinction here between will AI like the time horizon chart

0:26:13.880 --> 0:26:17.119
<v Speaker 6>doesn't by itself, I think, tell you will productivity in

0:26:17.200 --> 0:26:20.639
<v Speaker 6>one specific kind of job increase because of access to AI?

0:26:21.160 --> 0:26:23.800
<v Speaker 7>Yeah, maybe one thing to say on that chart showing

0:26:24.040 --> 0:26:27.760
<v Speaker 7>the time horizon on these different task distributions relative to

0:26:27.800 --> 0:26:29.800
<v Speaker 7>my guesses ahead of time, You know, I think those

0:26:29.800 --> 0:26:32.640
<v Speaker 7>time horizons are remarkably similar. I think the doubling times

0:26:32.680 --> 0:26:34.920
<v Speaker 7>the pace of progress in AI seems more similar than

0:26:34.920 --> 0:26:37.239
<v Speaker 7>I would have guessed to the original trend that we

0:26:37.480 --> 0:26:41.680
<v Speaker 7>published, although, you know, imperfectly so. On this

0:26:42.000 --> 0:26:45.960
<v Speaker 7>difficulty translating what we might call raw AI capabilities, in

0:26:46.000 --> 0:26:48.359
<v Speaker 7>some sense you know, capabilities on benchmarks or something to

0:26:48.760 --> 0:26:51.040
<v Speaker 7>real world productivity. I think there are a number of

0:26:51.080 --> 0:26:53.840
<v Speaker 7>differences in a number of ways, in particular in which

0:26:54.080 --> 0:26:57.679
<v Speaker 7>the benchmark results are overestimating what we might see in

0:26:57.680 --> 0:27:00.520
<v Speaker 7>the wild, you know, not hugely overestimating. I think

0:27:00.560 --> 0:27:02.800
<v Speaker 7>we do see that people are getting real utility out

0:27:02.800 --> 0:27:06.159
<v Speaker 7>of these modern agentic AI tools, but overestimating to some extent.

0:27:06.840 --> 0:27:08.159
<v Speaker 5>One is that the.

0:27:08.400 --> 0:27:12.240
<v Speaker 7>Scoring implicitly is different in real problems, I'm scoring based

0:27:12.280 --> 0:27:16.240
<v Speaker 7>on something a bit more holistic than these algorithmic scoring procedures,

0:27:16.240 --> 0:27:19.359
<v Speaker 7>these automatic scoring procedures that we're using at METR and

0:27:19.560 --> 0:27:21.800
<v Speaker 7>many other people are using in the

0:27:21.840 --> 0:27:24.360
<v Speaker 7>benchmark world. There's some notion of code quality if you're

0:27:24.400 --> 0:27:26.800
<v Speaker 7>working in software engineering, but for other tasks

0:27:26.840 --> 0:27:27.560
<v Speaker 7>there's there's.

0:27:27.640 --> 0:27:29.240
<v Speaker 3>Beautiful code, elegant code.

0:27:29.280 --> 0:27:32.360
<v Speaker 5>People talk that yeah, yeah, yeah, for other tasks that's

0:27:32.359 --> 0:27:34.200
<v Speaker 5>going to be coding.

0:27:34.320 --> 0:27:34.720
<v Speaker 1>This is.

0:27:36.240 --> 0:27:38.159
<v Speaker 7>One more thing is that the tasks that come up

0:27:38.200 --> 0:27:40.679
<v Speaker 7>in the wild are more likely to be messy in

0:27:40.720 --> 0:27:43.800
<v Speaker 7>some sense. They involve working with other people, They involve

0:27:43.840 --> 0:27:46.320
<v Speaker 7>working in much larger code bases or sort of more

0:27:46.359 --> 0:27:49.800
<v Speaker 7>open ended problems, maybe with something even adversarial going on

0:27:50.080 --> 0:27:52.359
<v Speaker 7>in the software engineering context, that might be

0:27:52.440 --> 0:27:54.760
<v Speaker 7>that someone's trying to make a change to the part

0:27:54.800 --> 0:27:56.840
<v Speaker 7>of the code base that you're currently working on and

0:27:56.880 --> 0:27:58.320
<v Speaker 7>you need to work around that,

0:27:58.720 --> 0:28:01.879
<v Speaker 7>and we do tend to see that the AIs are

0:28:02.160 --> 0:28:04.520
<v Speaker 7>less capable working on these more messy problems.

0:28:04.880 --> 0:28:05.879
<v Speaker 5>I don't want to overstate that.

0:28:06.000 --> 0:28:08.159
<v Speaker 7>You know, it's not an enormous effect, but you know,

0:28:08.240 --> 0:28:10.120
<v Speaker 7>that's one thing that gets in the way of these

0:28:10.160 --> 0:28:12.760
<v Speaker 7>productivity increases, you know. And then I do think there's

0:28:12.800 --> 0:28:15.520
<v Speaker 7>something to the reliability question right where you know, if

0:28:15.520 --> 0:28:17.639
<v Speaker 7>it was true that for a certain type of task

0:28:17.680 --> 0:28:20.479
<v Speaker 7>you only had you know, eighty percent reliability, then every

0:28:20.520 --> 0:28:23.000
<v Speaker 7>time you're going to need to go back and verify

0:28:23.080 --> 0:28:25.520
<v Speaker 7>the work of these AIs, and not only verify the

0:28:25.520 --> 0:28:27.919
<v Speaker 7>work of these AIs, but without the context of how they

0:28:27.920 --> 0:28:30.600
<v Speaker 7>implemented the solution relative to if you went about the

0:28:30.600 --> 0:28:33.000
<v Speaker 7>task yourself, you'd already have that in your head, and

0:28:33.000 --> 0:28:36.280
<v Speaker 7>so this verification step quote unquote would take less time.

0:28:36.520 --> 0:28:38.680
<v Speaker 7>You know, I don't expect these frictions to be sort

0:28:38.680 --> 0:28:41.000
<v Speaker 7>of so fundamental in some sense, or I imagine they

0:28:41.080 --> 0:28:43.720
<v Speaker 7>go up levels of abstraction. I think not only is

0:28:43.760 --> 0:28:46.640
<v Speaker 7>the underlying technical progress real, but I think that the

0:28:46.680 --> 0:28:49.560
<v Speaker 7>productivity improvements are also going to show up increasingly.

0:28:49.600 --> 0:28:51.000
<v Speaker 5>But yeah, there are these frictions.

0:28:51.240 --> 0:28:54.960
<v Speaker 2>Tracy alluded to this question when she asked about VCs

0:28:55.000 --> 0:28:58.520
<v Speaker 2>and investor interest. So people see these charts and regardless

0:28:58.520 --> 0:28:59.800
<v Speaker 2>of what METR's point

0:28:59.640 --> 0:29:02.120
<v Speaker 4>is, like, this is incredible. I got to invest in this.

0:29:02.520 --> 0:29:05.640
<v Speaker 2>But this brings me to this broader thing that I

0:29:05.680 --> 0:29:09.120
<v Speaker 2>find very strange about AI, which is this kind of odd,

0:29:09.280 --> 0:29:13.880
<v Speaker 2>sort of Baptist and bootlegger relationship between the AI labs

0:29:13.880 --> 0:29:16.239
<v Speaker 2>people who are building this stuff and the sort of

0:29:16.400 --> 0:29:20.800
<v Speaker 2>alignment safety people, and they sort of go back and forth,

0:29:20.920 --> 0:29:23.320
<v Speaker 2>and like you have the heads of the lab saying yes,

0:29:23.360 --> 0:29:26.000
<v Speaker 2>this might destroy the world and take all your jobs,

0:29:26.000 --> 0:29:27.800
<v Speaker 2>and the safety people and the alignment people

0:29:27.600 --> 0:29:30.200
<v Speaker 4>say, yes, this might destroy the world.

0:29:30.320 --> 0:29:34.480
<v Speaker 2>And like, it's a very strange industry, right? Like the only

0:29:34.520 --> 0:29:36.240
<v Speaker 2>thing that I can think of is cigarettes, where like

0:29:36.240 --> 0:29:38.200
<v Speaker 2>they warn you that smoking is bad, except they had

0:29:38.200 --> 0:29:39.560
<v Speaker 2>to do that because they lost a lawsuit.

0:29:39.600 --> 0:29:41.760
<v Speaker 4>I don't think they were particularly inclined to do that.

0:29:41.840 --> 0:29:44.440
<v Speaker 2>I can't think of any other industry where the most

0:29:44.600 --> 0:29:48.720
<v Speaker 2>enthusiastic people about it are also warning and dooming about

0:29:48.720 --> 0:29:51.440
<v Speaker 2>how bad the thing they're building could be. So I'm

0:29:51.480 --> 0:29:54.760
<v Speaker 2>sort of curious, like you know, first of all, like

0:29:54.880 --> 0:29:57.960
<v Speaker 2>and I talked about this in the intro, like who

0:29:58.080 --> 0:30:01.040
<v Speaker 2>is the type of person that's like working at METR,

0:30:01.320 --> 0:30:04.960
<v Speaker 2>that is like skilled enough to do like advanced evaluations,

0:30:04.960 --> 0:30:07.200
<v Speaker 2>and like where's the funding coming from? But like talk

0:30:07.240 --> 0:30:11.080
<v Speaker 2>to us about like who's behind METR and why they're there.

0:30:11.160 --> 0:30:14.120
<v Speaker 6>Yeah, totally. So I think one thing to say on

0:30:14.320 --> 0:30:16.600
<v Speaker 6>the history of kind of people caring about AI safety

0:30:16.640 --> 0:30:19.040
<v Speaker 6>in the Bay Area is that this concern goes back

0:30:19.160 --> 0:30:21.560
<v Speaker 6>like quite a ways, I could say for over a decade.

0:30:21.680 --> 0:30:24.160
<v Speaker 6>There are many people who got into the field because

0:30:24.200 --> 0:30:27.800
<v Speaker 6>they saw this trend of deep learning, Like what if

0:30:27.840 --> 0:30:29.840
<v Speaker 6>deep learning works and it kind of goes all the

0:30:29.880 --> 0:30:34.680
<v Speaker 6>way to artificial general intelligence and then superintelligence, and if

0:30:34.720 --> 0:30:38.840
<v Speaker 6>that works, then it could affect everything. I think possibly

0:30:38.840 --> 0:30:40.880
<v Speaker 6>when people worry about this, there's a future that they

0:30:40.880 --> 0:30:43.600
<v Speaker 6>have in mind with super intelligence that's even more capable

0:30:43.640 --> 0:30:45.840
<v Speaker 6>than what people who think of themselves as like AGI

0:30:45.960 --> 0:30:48.560
<v Speaker 6>pilled today think of. They're imagining AI systems that can

0:30:48.600 --> 0:30:50.960
<v Speaker 6>run you know, the entire economy and I think people

0:30:50.960 --> 0:30:53.280
<v Speaker 6>who kind of a while ago or many years ago

0:30:53.520 --> 0:30:56.320
<v Speaker 6>saw that vision and were sort of alarmed about the

0:30:56.320 --> 0:30:58.640
<v Speaker 6>stakes of it. Many people had this intuition that the

0:30:58.680 --> 0:31:00.800
<v Speaker 6>thing to do is go and work in the industry

0:31:00.880 --> 0:31:02.680
<v Speaker 6>because if you're like helping build it, you know what's

0:31:02.720 --> 0:31:04.560
<v Speaker 6>the best way to shape the future, It's to build it.

0:31:05.080 --> 0:31:07.360
<v Speaker 6>And I think that there's obviously you could have questions

0:31:07.360 --> 0:31:09.960
<v Speaker 6>about how sincere that is for many of the people

0:31:10.200 --> 0:31:11.960
<v Speaker 6>who are in the industry, or if there's kind of

0:31:11.960 --> 0:31:14.520
<v Speaker 6>a mix of different motivations and like you know, different

0:31:14.560 --> 0:31:17.040
<v Speaker 6>wolves inside of them where maybe they partially are motivated

0:31:17.040 --> 0:31:19.080
<v Speaker 6>by that, but also they're like there's kind of this

0:31:19.200 --> 0:31:23.200
<v Speaker 6>like Oppenheimer, like, it feels good to feel like you're

0:31:23.240 --> 0:31:25.720
<v Speaker 6>in the position of making something that's dangerous.

0:31:25.760 --> 0:31:29.600
<v Speaker 2>Someone once described OpenAI to me, this is years ago.

0:31:29.720 --> 0:31:32.240
<v Speaker 2>A friend said it was like, OpenAI was sort

0:31:32.280 --> 0:31:34.680
<v Speaker 2>of like the Manhattan Project, except the goal was to

0:31:34.720 --> 0:31:36.880
<v Speaker 2>not build the bomb at the very end, if that

0:31:37.000 --> 0:31:39.840
<v Speaker 2>makes any sense. So to your Oppenheimer point, it's like

0:31:39.920 --> 0:31:40.560
<v Speaker 2>very strange.

0:31:40.600 --> 0:31:43.200
<v Speaker 6>And I think one thing to emphasize is, you know, while

0:31:43.240 --> 0:31:45.040
<v Speaker 6>it could be that there's a mix of motivations now

0:31:45.080 --> 0:31:47.280
<v Speaker 6>there are definitely many people, I think in the Bay

0:31:47.280 --> 0:31:51.200
<v Speaker 6>Area who sincerely believe that the technology is headed to

0:31:51.240 --> 0:31:53.320
<v Speaker 6>someplace where

0:31:53.320 --> 0:31:55.920
<v Speaker 6>it will be very difficult for humanity to stay kind

0:31:55.920 --> 0:31:58.920
<v Speaker 6>of in the driver's seat or like stay in control

0:31:59.040 --> 0:32:00.360
<v Speaker 6>in kind of a meaningful sense.

0:32:00.760 --> 0:32:04.840
<v Speaker 2>It does seem as though, like, people talk about all

0:32:04.920 --> 0:32:07.520
<v Speaker 2>the big AI labs have like a PR problem or

0:32:07.560 --> 0:32:09.440
<v Speaker 2>something like that. They keep bringing this up, and it's

0:32:09.480 --> 0:32:11.000
<v Speaker 2>like maybe they just believe it.

0:32:11.240 --> 0:32:13.600
<v Speaker 6>So I think that this concern is quite old, and

0:32:13.680 --> 0:32:15.480
<v Speaker 6>I think many people have this intuition that they're like,

0:32:15.520 --> 0:32:18.160
<v Speaker 6>I can influence the thing by building it. But now

0:32:18.280 --> 0:32:22.560
<v Speaker 6>there's this problem that that logic kind of always recommends

0:32:22.600 --> 0:32:25.760
<v Speaker 6>that you continue building more advanced technology or like more

0:32:25.760 --> 0:32:28.560
<v Speaker 6>advanced AI systems. And now you have this problem where

0:32:28.880 --> 0:32:31.520
<v Speaker 6>there's all of these companies and they all say that

0:32:32.000 --> 0:32:34.800
<v Speaker 6>they need to build it because if they don't build it,

0:32:34.840 --> 0:32:37.760
<v Speaker 6>another company will. And

0:32:37.960 --> 0:32:40.120
<v Speaker 6>they could all have doubts about each other's commitment to

0:32:40.160 --> 0:32:42.880
<v Speaker 6>safety or to these principles. Famously, the leaders of the

0:32:42.960 --> 0:32:45.080
<v Speaker 6>labs really do not get along. They're not friends. It's

0:32:45.080 --> 0:32:47.000
<v Speaker 6>not easy for them to kind of sort out the

0:32:47.040 --> 0:32:49.760
<v Speaker 6>safety thing among themselves. And then even if all the

0:32:50.000 --> 0:32:52.520
<v Speaker 6>US AI labs kind of agreed to do that, they then

0:32:52.600 --> 0:32:55.239
<v Speaker 6>have this kind of external bogeyman of China, right? Well,

0:32:55.440 --> 0:32:58.000
<v Speaker 6>what will the Chinese companies do? And so there's this

0:32:58.320 --> 0:33:01.320
<v Speaker 6>sense in which, just like, even if the concern is real,

0:33:01.400 --> 0:33:03.720
<v Speaker 6>I think a lot of people then who are in

0:33:03.760 --> 0:33:06.280
<v Speaker 6>the industry have the instinct that they kind of there's

0:33:06.280 --> 0:33:08.640
<v Speaker 6>no guiding principle for what they should do on safety

0:33:08.680 --> 0:33:11.720
<v Speaker 6>other than to like build leverage for themselves for later.

0:33:12.280 --> 0:33:15.320
<v Speaker 6>And I think that is a concerning state of affairs

0:33:15.360 --> 0:33:18.520
<v Speaker 6>for AI development to be in globally. You know, obviously

0:33:18.600 --> 0:33:21.400
<v Speaker 6>we're trying to do something different by like informing the

0:33:21.480 --> 0:33:23.800
<v Speaker 6>public or kind of giving like you know, you could

0:33:23.840 --> 0:33:26.400
<v Speaker 6>imagine that this situation would be better if or like

0:33:26.440 --> 0:33:28.960
<v Speaker 6>one gap that exists right now in that picture is

0:33:29.000 --> 0:33:32.080
<v Speaker 6>that it's the people building the technology who most believe

0:33:32.160 --> 0:33:35.000
<v Speaker 6>that it's going to be destabilizing and sort of all encompassing.

0:33:35.320 --> 0:33:37.520
<v Speaker 6>Maybe if the public and governments all were on the

0:33:37.520 --> 0:33:39.680
<v Speaker 6>same page and believed the same thing, if it were

0:33:39.760 --> 0:33:41.720
<v Speaker 6>true that it was headed there, then there would be

0:33:41.800 --> 0:33:44.120
<v Speaker 6>kind of like more time for society to figure out

0:33:44.120 --> 0:33:47.160
<v Speaker 6>a response from people who are not trying to build

0:33:47.240 --> 0:33:50.600
<v Speaker 6>leverage over the technology themselves directly, or you know, control

0:33:50.640 --> 0:33:53.880
<v Speaker 6>the technology via some kind of like public action or government.

0:33:54.240 --> 0:33:56.240
<v Speaker 3>Can I just ask very quickly since you brought up

0:33:56.320 --> 0:33:58.120
<v Speaker 3>China and I don't want to forget to ask this question.

0:33:58.440 --> 0:34:02.000
<v Speaker 3>But Qwen doesn't show up on your like main charts.

0:34:02.000 --> 0:34:04.360
<v Speaker 3>I think you did a preliminary assessment of it a

0:34:04.400 --> 0:34:07.840
<v Speaker 3>while ago, but like, what's the difference between assessing one

0:34:07.840 --> 0:34:10.000
<v Speaker 3>of the closed models in America versus one of the

0:34:10.000 --> 0:34:11.880
<v Speaker 3>open source models over in China.

0:34:12.000 --> 0:34:14.080
<v Speaker 7>I think one thing to say is that the capabilities

0:34:14.280 --> 0:34:17.000
<v Speaker 7>are lagging behind. We think that they're lagging behind a bit.

0:34:17.120 --> 0:34:18.840
<v Speaker 7>I'm not sure that of it.

0:34:19.000 --> 0:34:20.759
<v Speaker 3>They just like don't make it onto the chart.

0:34:21.320 --> 0:34:24.440
<v Speaker 7>So we do try to prioritize, just because METR has

0:34:24.560 --> 0:34:27.960
<v Speaker 7>limited resources, staff time in particular, the models

0:34:27.960 --> 0:34:30.440
<v Speaker 7>that we anticipate being on the frontier and in general,

0:34:30.480 --> 0:34:32.719
<v Speaker 7>the Chinese models have been something like, you know, nine

0:34:32.760 --> 0:34:35.239
<v Speaker 7>to twelve months let's say, behind the US models. And

0:34:35.280 --> 0:34:38.040
<v Speaker 7>I think the gap by time horizon is probably even

0:34:38.080 --> 0:34:41.520
<v Speaker 7>larger than the gap by benchmark scores, where there's some

0:34:41.960 --> 0:34:43.600
<v Speaker 7>I'm not sure how scientific I can make this, but

0:34:43.680 --> 0:34:47.280
<v Speaker 7>there's some colloquial sense or something that the Chinese

0:34:47.280 --> 0:34:51.080
<v Speaker 7>models are stronger according to benchmark scores than they would

0:34:51.080 --> 0:34:53.880
<v Speaker 7>be on, you know, truly held out problems in some sense.

0:34:53.760 --> 0:34:58.320
<v Speaker 3>Like gaming the benchmark? Is that what that means?

0:34:58.120 --> 0:35:01.880
<v Speaker 7>Or I'm not sure you know exactly how that shakes out,

0:35:01.719 --> 0:35:04.560
<v Speaker 7>but something spiritually close to that. I'm not sure

0:35:04.600 --> 0:35:06.560
<v Speaker 7>that's true for all Chinese models. I'm sure it's true

0:35:06.600 --> 0:35:09.319
<v Speaker 7>for lots of models outside of China, but I think

0:35:09.320 --> 0:35:21.160
<v Speaker 7>that's at least a possibility.

0:35:26.480 --> 0:35:29.799
<v Speaker 3>I'm very curious when you talk to external actors in

0:35:29.840 --> 0:35:31.920
<v Speaker 3>all of this, and I'm going to group them into

0:35:32.080 --> 0:35:37.480
<v Speaker 3>I guess policymakers, investors, and the labs themselves, like who

0:35:37.520 --> 0:35:39.840
<v Speaker 3>are you interacting the most with at the moment?

0:35:40.239 --> 0:35:42.560
<v Speaker 6>I think that in practice we end up interacting a

0:35:42.560 --> 0:35:45.480
<v Speaker 6>lot with AI labs because there's some amount of sorting out,

0:35:45.840 --> 0:35:48.759
<v Speaker 6>getting access to models, working with them to set new

0:35:48.760 --> 0:35:52.120
<v Speaker 6>precedents and things related to third party red teaming and

0:35:52.320 --> 0:35:55.439
<v Speaker 6>third party risk assessment. We think of our audience as

0:35:55.440 --> 0:35:58.480
<v Speaker 6>being sort of like high context members of the public,

0:35:58.560 --> 0:36:00.960
<v Speaker 6>so the kind of like people, you know, who are

0:36:01.680 --> 0:36:03.959
<v Speaker 6>maybe like you two, right? People who are kind of...

0:36:03.880 --> 0:36:05.440
<v Speaker 3>Like people listening to this podcast.

0:36:05.960 --> 0:36:08.040
<v Speaker 6>So people listening to this podcast people with kind of

0:36:08.120 --> 0:36:11.000
<v Speaker 6>who have to make important decisions that will be informed

0:36:11.000 --> 0:36:14.239
<v Speaker 6>by the pace of AI progress or like the kind

0:36:14.280 --> 0:36:17.440
<v Speaker 6>of profile of AI capabilities overall. Because we're based in

0:36:17.440 --> 0:36:19.480
<v Speaker 6>the Bay Area, I think we like disproportionately end up

0:36:19.520 --> 0:36:22.520
<v Speaker 6>interacting with people who are building the technology and like

0:36:22.560 --> 0:36:25.439
<v Speaker 6>closer to it. Partially, I think back to Joe's point before,

0:36:25.480 --> 0:36:27.319
<v Speaker 6>I think this is kind of because it is the

0:36:27.360 --> 0:36:30.120
<v Speaker 6>case that to kind of care about a lot of

0:36:30.160 --> 0:36:33.920
<v Speaker 6>these frontier problems, you're kind of selecting for people who

0:36:34.320 --> 0:36:36.799
<v Speaker 6>are building the technology themselves. There's some sense in which,

0:36:36.840 --> 0:36:39.960
<v Speaker 6>like the companies in the industry spend more time thinking

0:36:40.000 --> 0:36:44.040
<v Speaker 6>today about frontier capabilities assessment than the government does. I

0:36:44.040 --> 0:36:46.200
<v Speaker 6>think like one day you could imagine us getting to

0:36:46.239 --> 0:36:48.319
<v Speaker 6>the point where the government is like very focused on

0:36:48.360 --> 0:36:50.759
<v Speaker 6>this and dedicating a lot of resources to it, and

0:36:50.800 --> 0:36:52.840
<v Speaker 6>at that point I would expect METR to be spending

0:36:52.840 --> 0:36:54.360
<v Speaker 6>more time talking to governments.

0:36:55.040 --> 0:36:56.520
<v Speaker 3>That's kind of what I was getting at because our

0:36:56.600 --> 0:36:59.000
<v Speaker 3>sense is, in a lot of the conversations, like we talk

0:36:59.040 --> 0:37:01.080
<v Speaker 3>to people and they'll say something about like, oh, it's

0:37:01.080 --> 0:37:03.240
<v Speaker 3>important to have a social safety net for an AI

0:37:03.400 --> 0:37:06.560
<v Speaker 3>enabled future, but no one seems to be really thinking

0:37:06.600 --> 0:37:07.840
<v Speaker 3>about it in a lot of detail.

0:37:07.960 --> 0:37:09.880
<v Speaker 2>And when you say, you know, it's easy to imagine

0:37:10.000 --> 0:37:12.719
<v Speaker 2>or maybe the government will care more about this, not

0:37:12.800 --> 0:37:14.799
<v Speaker 2>so easy for me to imagine. It seems like they

0:37:14.840 --> 0:37:17.640
<v Speaker 2>mostly care about you know, data centers and like where

0:37:17.640 --> 0:37:20.799
<v Speaker 2>they located and stuff like that. It would be nice

0:37:20.840 --> 0:37:25.160
<v Speaker 2>if we had policymakers really looking at like frontier capabilities

0:37:25.200 --> 0:37:28.319
<v Speaker 2>and stuff. Still seems kind of a way off, but

0:37:28.360 --> 0:37:30.480
<v Speaker 2>it is interesting. You know, you're like talking about like

0:37:30.520 --> 0:37:34.120
<v Speaker 2>the sort of like capitalist dynamic, right, there's competition, and

0:37:34.360 --> 0:37:36.080
<v Speaker 2>it's like you have a lot of people that are

0:37:36.120 --> 0:37:39.319
<v Speaker 2>really worried about, oh, what if the other guys get

0:37:39.360 --> 0:37:42.400
<v Speaker 2>to ASI or AGI first, or what if the Chinese,

0:37:42.400 --> 0:37:45.080
<v Speaker 2>et cetera. How much does the fact of like free

0:37:45.080 --> 0:37:48.480
<v Speaker 2>market capitalism and the demand you know, the big investors

0:37:48.520 --> 0:37:50.400
<v Speaker 2>at the VC funds, like they want to return, they

0:37:50.400 --> 0:37:52.720
<v Speaker 2>want an IPO. We might get some big AI

0:37:52.880 --> 0:37:55.880
<v Speaker 2>IPOs this year in fact, how much do you find

0:37:55.960 --> 0:37:59.680
<v Speaker 2>that to be perhaps in tension with the safety element?

0:38:00.360 --> 0:38:03.560
<v Speaker 6>Yeah, I maybe, yeah, people on our team would have

0:38:03.600 --> 0:38:09.000
<v Speaker 6>different views on this. I personally don't feel there's, yeah,

0:38:09.080 --> 0:38:13.920
<v Speaker 6>there's something where, like, investors are key decision makers.

0:38:13.560 --> 0:38:15.000
<v Speaker 5>And you know they're people too.

0:38:15.080 --> 0:38:17.080
<v Speaker 6>That sounds strange to say, investors are people. Do I

0:38:17.160 --> 0:38:19.920
<v Speaker 6>sound like Mitt Romney or something. But I think that, like,

0:38:20.360 --> 0:38:22.200
<v Speaker 6>I think that the element of this that feels like

0:38:22.239 --> 0:38:24.399
<v Speaker 6>it could be in tension is if you build a bunch

0:38:24.440 --> 0:38:28.040
<v Speaker 6>of financial obligations to keep kind of the pedal to

0:38:28.080 --> 0:38:31.080
<v Speaker 6>the metal no matter what the risks are going into

0:38:31.120 --> 0:38:32.839
<v Speaker 6>the future. So, like, one thing I think a lot

0:38:32.880 --> 0:38:35.120
<v Speaker 6>about is if you're like building up a huge amount

0:38:35.120 --> 0:38:37.600
<v Speaker 6>of debt to build data centers and then say that

0:38:37.680 --> 0:38:40.840
<v Speaker 6>you do find evidence that you're now worried about

0:38:40.880 --> 0:38:43.480
<v Speaker 6>the you know, loss of control from AI systems, you

0:38:43.520 --> 0:38:46.080
<v Speaker 6>do find instances of AI systems going rogue. Do you

0:38:46.160 --> 0:38:49.319
<v Speaker 6>now have like a financial commitment to build up those

0:38:49.400 --> 0:38:52.080
<v Speaker 6>data centers and like continue kind of the pace of progress.

0:38:52.400 --> 0:38:54.680
<v Speaker 6>I think that is one place where I feel the

0:38:54.719 --> 0:38:57.920
<v Speaker 6>tension pretty acutely, Like you're building these expectations into the

0:38:57.960 --> 0:39:01.440
<v Speaker 6>market that could kind of force you to continue development

0:39:01.480 --> 0:39:05.960
<v Speaker 6>when you otherwise would rather invest more in safety or Yeah,

0:39:06.000 --> 0:39:08.240
<v Speaker 6>like it at least gives you a kind of financial

0:39:08.320 --> 0:39:12.480
<v Speaker 6>obligation to continue scaling at least compute. I think that

0:39:12.600 --> 0:39:16.919
<v Speaker 6>like the people themselves being informed about the progress does

0:39:16.920 --> 0:39:20.040
<v Speaker 6>not seem bad to me. I think it's like good

0:39:20.040 --> 0:39:21.799
<v Speaker 6>in some ways for everyone to be on the same

0:39:21.840 --> 0:39:25.560
<v Speaker 6>page about capabilities that could be related to subverting human

0:39:25.600 --> 0:39:28.680
<v Speaker 6>control later on. But I think in the world beyond

0:39:28.680 --> 0:39:31.080
<v Speaker 6>like the information that METR shares, I do think there

0:39:31.120 --> 0:39:34.040
<v Speaker 6>is a tension, like the fact that private companies are

0:39:34.080 --> 0:39:37.160
<v Speaker 6>building this I think could cause really acute tensions in

0:39:37.200 --> 0:39:40.800
<v Speaker 6>the future where people make these commitments that they wouldn't

0:39:40.880 --> 0:39:43.319
<v Speaker 6>if they were trying to like slow or you know,

0:39:43.400 --> 0:39:45.680
<v Speaker 6>maximize social resilience of the technology.

0:39:45.960 --> 0:39:46.160
<v Speaker 5>Yeah.

0:39:46.360 --> 0:39:48.279
<v Speaker 7>I'm not sure how these things shake out, but I

0:39:48.280 --> 0:39:51.080
<v Speaker 7>think there are some forces on the other side, right, Yeah,

0:39:51.320 --> 0:39:54.719
<v Speaker 7>you know, some safety promoting technologies, quote unquote, or techniques

0:39:54.960 --> 0:39:57.080
<v Speaker 7>do make the models more useful, you know, if they're

0:39:57.120 --> 0:39:59.399
<v Speaker 7>better at complying with your will in some sense,

0:39:59.440 --> 0:40:04.080
<v Speaker 7>and so you have capitalist incentives, standard capitalist incentives, to invest

0:40:04.160 --> 0:40:06.920
<v Speaker 7>in that kind of research. Maybe that doesn't cover you know,

0:40:06.960 --> 0:40:10.440
<v Speaker 7>the broad suite of safety research that seems important. It

0:40:10.480 --> 0:40:14.040
<v Speaker 7>certainly doesn't rule out capabilities progress as being an important

0:40:14.080 --> 0:40:17.239
<v Speaker 7>axis on which you do want to scale. But you know,

0:40:17.280 --> 0:40:19.640
<v Speaker 7>I think there are some forces in each direction.

0:40:20.000 --> 0:40:22.080
<v Speaker 3>Since you mentioned compute just then, can you talk a

0:40:22.080 --> 0:40:24.640
<v Speaker 3>little bit more about I guess the relationship between like

0:40:24.680 --> 0:40:27.759
<v Speaker 3>the time horizon improvements and the cost of compute at

0:40:27.800 --> 0:40:29.919
<v Speaker 3>the moment, and like what you've actually seen and how

0:40:30.200 --> 0:40:31.000
<v Speaker 3>that impacts it.

0:40:31.280 --> 0:40:34.399
<v Speaker 7>Yeah, so, so one extraordinary fact from my perspective. I'm

0:40:34.400 --> 0:40:36.000
<v Speaker 7>not sure how to fit these facts together,

0:40:36.320 --> 0:40:39.520
<v Speaker 7>but something like the R&D spend on compute

0:40:39.719 --> 0:40:43.319
<v Speaker 7>of these companies has risen exponentially, of course, and in

0:40:43.360 --> 0:40:46.319
<v Speaker 7>fact it's risen exponentially at essentially the same rate as

0:40:46.440 --> 0:40:48.800
<v Speaker 7>time horizon progress. You know, I think there's nothing necessary

0:40:48.840 --> 0:40:50.880
<v Speaker 7>about that. You know, it doesn't mean by itself that

0:40:50.920 --> 0:40:54.479
<v Speaker 7>if compute progress slows then capabilities progress will also slow,

0:40:54.520 --> 0:40:57.080
<v Speaker 7>but you know, it's clearly an important input into

0:40:57.120 --> 0:41:00.239
<v Speaker 7>AI progress. I expect that to continue to be true

0:41:00.280 --> 0:41:04.440
<v Speaker 7>in future. Sometimes people ask us if we think it's plausible,

0:41:04.480 --> 0:41:07.000
<v Speaker 7>or how plausible we think it is that capabilities progress,

0:41:07.080 --> 0:41:10.279
<v Speaker 7>this exponential capabilities progress might slow down at some point

0:41:10.440 --> 0:41:13.480
<v Speaker 7>in the future. And you know, one

0:41:13.520 --> 0:41:16.080
<v Speaker 7>reason it seems it's hard for me to consider it

0:41:16.080 --> 0:41:18.200
<v Speaker 7>plausible that it will slow down in the next at

0:41:18.280 --> 0:41:20.560
<v Speaker 7>least small number of years, is that a lot of

0:41:20.600 --> 0:41:23.759
<v Speaker 7>those compute R&D investments are basically already baked in, right?

0:41:23.840 --> 0:41:25.759
<v Speaker 7>Like the data centers have already been built, you know,

0:41:25.800 --> 0:41:28.879
<v Speaker 7>plans for data centers even beyond twenty twenty seven twenty

0:41:28.920 --> 0:41:31.920
<v Speaker 7>twenty eight are presumably, you know, coming to fruition,

0:41:32.080 --> 0:41:35.479
<v Speaker 7>coming about, and so some of these input investments are

0:41:35.600 --> 0:41:37.480
<v Speaker 7>already baked in in some sense. So it would be

0:41:37.520 --> 0:41:41.320
<v Speaker 7>surprising to see capabilities slow to the extent that compute

0:41:41.360 --> 0:41:44.200
<v Speaker 7>has been an important input. After that, maybe

0:41:44.280 --> 0:41:46.400
<v Speaker 7>you need to think about, you know, other arguments

0:41:46.400 --> 0:41:47.839
<v Speaker 7>for how capabilities might slow.

0:41:47.880 --> 0:41:49.440
<v Speaker 5>But that's roughly how I think about it.

0:41:49.960 --> 0:41:53.600
<v Speaker 2>There's a very good or interesting critical Substack post called

0:41:53.640 --> 0:41:56.520
<v Speaker 2>Against the METR Graph by someone named Nathan Witkin who

0:41:56.520 --> 0:41:59.120
<v Speaker 2>brings up an interesting point that I wouldn't have

0:41:59.120 --> 0:42:02.520
<v Speaker 2>thought of had I not read it, which is you're paying the

0:42:02.600 --> 0:42:06.719
<v Speaker 2>software engineers to come in and perform these tasks, right?

0:42:07.520 --> 0:42:09.439
<v Speaker 2>It seems, you know, maybe this will be the last

0:42:09.520 --> 0:42:12.799
<v Speaker 2>job of humans, is just doing benchmarks. If I were

0:42:12.880 --> 0:42:14.839
<v Speaker 2>like a good software engineer and you say, Joe, come

0:42:14.880 --> 0:42:15.760
<v Speaker 2>in and do this task.

0:42:16.040 --> 0:42:18.000
<v Speaker 4>How do you prevent me? Oh man, this is taking

0:42:18.040 --> 0:42:18.680
<v Speaker 4>me a long time.

0:42:18.760 --> 0:42:20.759
<v Speaker 2>Meanwhile, I keep getting one hundred dollars an hour

0:42:20.880 --> 0:42:23.440
<v Speaker 2>for like looking at my computer the entire time. Whew, this

0:42:23.520 --> 0:42:25.160
<v Speaker 2>is tough. I'm gonna have to come back tomorrow and

0:42:25.239 --> 0:42:28.919
<v Speaker 2>keep working on this. How do you avoid the sort

0:42:28.920 --> 0:42:32.520
<v Speaker 2>of conflict of interest where the person who's paid to

0:42:32.600 --> 0:42:35.319
<v Speaker 2>work on this problem may be encouraged to take as

0:42:35.320 --> 0:42:37.560
<v Speaker 2>long as possible to solve it, and with only three

0:42:38.320 --> 0:42:42.160
<v Speaker 2>people working on it at times, I don't know, like

0:42:42.200 --> 0:42:44.400
<v Speaker 2>this does not... It seems like a conflict of interest

0:42:44.400 --> 0:42:44.560
<v Speaker 2>to me.

0:42:44.800 --> 0:42:47.120
<v Speaker 7>Yeah, so the short answer is, you know, in general,

0:42:47.200 --> 0:42:50.200
<v Speaker 7>we are incentivizing these people to complete the task because

0:42:50.200 --> 0:42:52.600
<v Speaker 7>you know, it's possible, in particular, to complete the task

0:42:52.760 --> 0:42:55.680
<v Speaker 7>faster than others who are attempting the same task,

0:42:56.080 --> 0:42:56.960
<v Speaker 7>the time that it would take for them.

0:42:57.000 --> 0:43:00.000
<v Speaker 3>They get a bonus if they do it faster than... Yeah.

0:43:00.080 --> 0:43:03.400
<v Speaker 7>Yeah, approximately, there's a bonus if they complete it faster

0:43:03.560 --> 0:43:04.479
<v Speaker 7>than anyone else.

0:43:04.880 --> 0:43:06.480
<v Speaker 5>You know. Another thing to say.

0:43:06.640 --> 0:43:10.279
<v Speaker 7>Is I think it just is true that our baselining methodology,

0:43:10.440 --> 0:43:12.279
<v Speaker 7>or the ways in which we compare to humans in

0:43:12.320 --> 0:43:15.120
<v Speaker 7>some ways leaves a lot to be desired that you know,

0:43:15.239 --> 0:43:17.719
<v Speaker 7>ideally we would have invested, you know, one hundred times

0:43:17.760 --> 0:43:21.840
<v Speaker 7>as many resources in having one hundred baselines, human baselines,

0:43:21.840 --> 0:43:24.319
<v Speaker 7>per task, and those would have come from, you know,

0:43:24.480 --> 0:43:26.960
<v Speaker 7>perhaps the very best software engineers or machine learning engineers

0:43:27.000 --> 0:43:28.880
<v Speaker 7>in the world. Maybe that would be the Maybe that

0:43:28.920 --> 0:43:31.480
<v Speaker 7>would be the comparison that we're making. And indeed, we'd

0:43:31.480 --> 0:43:34.280
<v Speaker 7>be doing all of this procedure over many more tasks,

0:43:34.360 --> 0:43:37.280
<v Speaker 7>not just many more tasks, many more tasks, over wider

0:43:37.320 --> 0:43:40.800
<v Speaker 7>task distributions than just software engineering or machine learning engineering.

0:43:41.080 --> 0:43:44.360
<v Speaker 7>I mean, I do think time horizon still represents progress

0:43:44.400 --> 0:43:46.960
<v Speaker 7>over what's come before in the science of measuring

0:43:47.000 --> 0:43:49.360
<v Speaker 7>AI capabilities. But you know, in some ways I'm sympathetic

0:43:49.440 --> 0:43:52.880
<v Speaker 7>to a lot of criticisms of time horizon. I do

0:43:52.960 --> 0:43:55.320
<v Speaker 7>think that some of the details, at least for the

0:43:55.360 --> 0:43:57.279
<v Speaker 7>work we've done so far, you know, aren't going to

0:43:57.280 --> 0:44:00.279
<v Speaker 7>matter as much as you might naively think. So choosing

0:44:00.440 --> 0:44:03.279
<v Speaker 7>the shortest baseline time that we end up observing or

0:44:03.280 --> 0:44:05.040
<v Speaker 7>the longest time you know, it's actually not going to

0:44:05.080 --> 0:44:06.840
<v Speaker 7>make that much difference to the final measurements.

0:44:07.040 --> 0:44:07.200
<v Speaker 5>You know.

0:44:07.239 --> 0:44:10.400
<v Speaker 7>Of course, we do think these people are talented software

0:44:10.400 --> 0:44:13.600
<v Speaker 7>engineers or cybersecurity people or so on, depending on the task.

0:44:13.760 --> 0:44:15.600
<v Speaker 7>But you know, perhaps we could have found even more

0:44:15.840 --> 0:44:18.960
<v Speaker 7>talented people. They would have completed it in half the time.

0:44:19.080 --> 0:44:20.880
<v Speaker 7>And so you know, naively, it would seem like the

0:44:20.880 --> 0:44:23.440
<v Speaker 7>time horizon that we estimate of these models would be

0:44:23.600 --> 0:44:26.080
<v Speaker 7>half as long as we actually end up observing. But

0:44:26.120 --> 0:44:27.880
<v Speaker 7>of course that wouldn't change the doubling time. It

0:44:27.880 --> 0:44:30.520
<v Speaker 7>would mean you'd get to the same level after another

0:44:30.560 --> 0:44:32.840
<v Speaker 7>four months. In some sense, the big picture that I

0:44:32.880 --> 0:44:35.400
<v Speaker 7>want time horizon to point to is less this like

0:44:35.760 --> 0:44:38.480
<v Speaker 7>Opus four point six is twelve hours in particular, and

0:44:38.560 --> 0:44:41.360
<v Speaker 7>more that we're seeing this remarkable pace of progress that

0:44:41.440 --> 0:44:45.040
<v Speaker 7>shows no signs of slowing in the recent past, and

0:44:45.640 --> 0:44:47.920
<v Speaker 7>I think in the near future as well. You know,

0:44:47.960 --> 0:44:50.480
<v Speaker 7>in fact, it shows some signs of speeding up.

0:44:50.560 --> 0:44:52.880
<v Speaker 3>Well, I was going to ask about this because I

0:44:52.920 --> 0:44:55.960
<v Speaker 3>think recently the statistic that you would always hear was

0:44:56.000 --> 0:44:59.479
<v Speaker 3>like a doubling every seven months something like that. How

0:44:59.560 --> 0:45:01.680
<v Speaker 3>fast do you see it going in the near future?

0:45:02.120 --> 0:45:05.439
<v Speaker 7>Yeah, so I was a doubling-every-seven-months

0:45:05.520 --> 0:45:07.839
<v Speaker 7>person. There was controversy in our team

0:45:07.880 --> 0:45:10.720
<v Speaker 7>about what to believe here because when we originally

0:45:10.719 --> 0:45:14.040
<v Speaker 7>published this work approximately a year ago, you'd see, you know,

0:45:14.080 --> 0:45:16.840
<v Speaker 7>if you plotted a single straight line, a single exponential

0:45:16.920 --> 0:45:18.960
<v Speaker 7>you'd get something like, you know, six or seven months,

0:45:19.040 --> 0:45:19.560
<v Speaker 7>let's say.

0:45:19.640 --> 0:45:20.320
<v Speaker 5>But if you.

0:45:20.280 --> 0:45:23.359
<v Speaker 7>restricted to just the time since, I think, GPT-4o,

0:45:23.320 --> 0:45:25.759
<v Speaker 7>since the twenty twenty four models onwards, you'd

0:45:25.760 --> 0:45:27.360
<v Speaker 7>see something closer to this sort

0:45:27.200 --> 0:45:28.480
<v Speaker 5>Of like four or five month trend.

0:45:29.000 --> 0:45:31.440
<v Speaker 7>And some people believed in that, and you know, some

0:45:31.440 --> 0:45:34.000
<v Speaker 7>people like me had the intuition that, well, we have

0:45:34.080 --> 0:45:36.000
<v Speaker 7>so few data points, we should really be

0:45:36.440 --> 0:45:38.400
<v Speaker 7>estimating over this larger number of data points, and the

0:45:38.440 --> 0:45:39.000
<v Speaker 7>larger number of

0:45:39.040 --> 0:45:41.520
<v Speaker 5>data points says every six or seven months.

0:45:41.760 --> 0:45:43.440
<v Speaker 7>There are a couple of things that have changed my

0:45:43.480 --> 0:45:46.239
<v Speaker 7>mind and made me realize my colleagues were right. Since

0:45:46.280 --> 0:45:48.680
<v Speaker 7>then. One is that for the models that have

0:45:48.800 --> 0:45:51.000
<v Speaker 7>come out since, you know, which trend has

0:45:51.160 --> 0:45:54.160
<v Speaker 7>better predicted how performant those models would be? And

0:45:54.200 --> 0:45:57.200
<v Speaker 7>it's very clear that the answer to that is the

0:45:57.560 --> 0:46:01.520
<v Speaker 7>four month doubling time and not this seven month doubling time.

0:46:01.760 --> 0:46:04.200
<v Speaker 7>You know, there's some possibility that it could speed

0:46:04.239 --> 0:46:06.080
<v Speaker 7>up again. We've seen it. We've seen it speed up

0:46:06.080 --> 0:46:08.440
<v Speaker 7>once. I think there are some reasons in principle why

0:46:08.480 --> 0:46:11.400
<v Speaker 7>you might expect it to speed up again. I think

0:46:11.480 --> 0:46:13.960
<v Speaker 7>there are some caveats about this, you know, these are

0:46:14.000 --> 0:46:16.200
<v Speaker 7>maybe some takes that my colleagues would

0:46:16.200 --> 0:46:18.200
<v Speaker 7>disagree with, and so you know, maybe you should

0:46:18.200 --> 0:46:20.120
<v Speaker 7>discard that, or you know, you should think that they're

0:46:20.120 --> 0:46:21.799
<v Speaker 7>going to convince me in the way that they did

0:46:21.840 --> 0:46:24.440
<v Speaker 7>with the four month versus seven month doubling times.

0:46:24.719 --> 0:46:29.080
<v Speaker 7>I have some suspicion that the tasks that METR is

0:46:29.120 --> 0:46:32.640
<v Speaker 7>measuring performance on are, you know, in some sense a more

0:46:32.680 --> 0:46:35.800
<v Speaker 7>and more narrow slice of possible tasks, and in particular,

0:46:35.840 --> 0:46:39.640
<v Speaker 7>a more and more narrow slice that is perhaps similar

0:46:39.680 --> 0:46:42.160
<v Speaker 7>to the kinds of tasks that you'd expect these major

0:46:42.200 --> 0:46:44.720
<v Speaker 7>AI companies to be training on in the first instance.

0:46:45.000 --> 0:46:47.880
<v Speaker 7>And so in some sense, we're increasingly more so than

0:46:48.000 --> 0:46:51.160
<v Speaker 7>was the case before, measuring progress on the exact types

0:46:51.200 --> 0:46:52.920
<v Speaker 7>of tasks that they're trying to get better at. You know,

0:46:53.120 --> 0:46:55.080
<v Speaker 7>you might think, for instance, the kinds of tasks that

0:46:55.280 --> 0:46:58.320
<v Speaker 7>would make for good reinforcement learning environments, the kinds of

0:46:58.320 --> 0:47:01.480
<v Speaker 7>tasks that you can score quickly and cheaply and automatically.

0:47:01.880 --> 0:47:04.160
<v Speaker 7>I think that progress is real. I think that progress

0:47:04.320 --> 0:47:06.960
<v Speaker 7>generalizes to some extent to other types of tasks.

0:47:07.000 --> 0:47:08.920
<v Speaker 7>I think we're seeing, you know, remarkable progress on these

0:47:08.920 --> 0:47:10.440
<v Speaker 7>more messy tasks, for example.

0:47:10.920 --> 0:47:13.960
<v Speaker 2>I have one last question, which is like how big

0:47:14.000 --> 0:47:16.720
<v Speaker 2>is your team funding? And like also how many people

0:47:17.080 --> 0:47:21.239
<v Speaker 2>at METR are basically like really rich from AI and they're like,

0:47:21.239 --> 0:47:23.760
<v Speaker 2>you know what, I'm good. I don't need to pursue

0:47:23.960 --> 0:47:26.839
<v Speaker 2>like stick around for the IPO or whatever. I'm set

0:47:27.239 --> 0:47:30.160
<v Speaker 2>and now I want to work on something that like humanity.

0:47:30.400 --> 0:47:30.440
<v Speaker 3>No.

0:47:30.600 --> 0:47:33.160
<v Speaker 2>I've seen like there are other independent AI researchers and

0:47:33.160 --> 0:47:35.360
<v Speaker 2>they talk about this. It's like, I want to be

0:47:35.400 --> 0:47:38.520
<v Speaker 2>able to talk about what I saw. Miles Brundage, someone

0:47:38.560 --> 0:47:41.200
<v Speaker 2>who has like a little think tank, He's talked about this.

0:47:41.360 --> 0:47:44.200
<v Speaker 2>What's like, how many people are like rich already and

0:47:44.200 --> 0:47:46.000
<v Speaker 2>they're like, Okay, now I want to work for something

0:47:46.040 --> 0:47:46.920
<v Speaker 2>that's public facing.

0:47:47.040 --> 0:47:49.920
<v Speaker 6>Yeah, so METR right now is about thirty people, but

0:47:50.040 --> 0:47:52.600
<v Speaker 6>we're growing and hoping to grow fast. We are hiring

0:47:52.640 --> 0:47:56.520
<v Speaker 6>I should say, metr dot org slash careers, and yeah,

0:47:56.520 --> 0:47:58.040
<v Speaker 6>you were touching before on kind of the thing about

0:47:58.120 --> 0:47:59.440
<v Speaker 6>is it difficult to be a nonprofit?

0:47:59.760 --> 0:48:01.640
<v Speaker 5>You know, we can't pay people in equity.

0:48:02.560 --> 0:48:04.000
<v Speaker 4>We got to get an IPO, right.

0:48:04.080 --> 0:48:07.359
<v Speaker 6>Yeah, there's no IPO for METR, but we

0:48:07.400 --> 0:48:10.360
<v Speaker 6>do try to pay competitively on cash compensation, right, So

0:48:10.480 --> 0:48:13.120
<v Speaker 6>that's an area where we feel we can like somewhat

0:48:13.120 --> 0:48:16.360
<v Speaker 6>compete with labs. And it's true that I think a

0:48:16.400 --> 0:48:19.960
<v Speaker 6>lot of our team is just motivated by trying to

0:48:20.080 --> 0:48:22.359
<v Speaker 6>kind of do something different like not you know, all

0:48:22.400 --> 0:48:24.320
<v Speaker 6>the companies to some extent are in this business of

0:48:24.400 --> 0:48:27.759
<v Speaker 6>kind of like building somewhat redundant products kind of competing

0:48:27.800 --> 0:48:30.800
<v Speaker 6>for the same role in the world. And METR is

0:48:30.840 --> 0:48:32.520
<v Speaker 6>in a really unique position at the moment where I

0:48:32.520 --> 0:48:35.160
<v Speaker 6>think that we have like access and the ability to

0:48:35.200 --> 0:48:38.440
<v Speaker 6>communicate these ideas and explain the state of AI research

0:48:38.560 --> 0:48:41.040
<v Speaker 6>to a number, like a lot of audiences that might

0:48:41.080 --> 0:48:44.200
<v Speaker 6>be hard for like individual researchers inside of a company,

0:48:44.239 --> 0:48:46.040
<v Speaker 6>Like we get to talk to a lot of governments directly.

0:48:46.080 --> 0:48:47.960
<v Speaker 6>We get to come here and talk with you all,

0:48:48.360 --> 0:48:50.000
<v Speaker 6>And that's kind of different. I think if you look

0:48:50.000 --> 0:48:52.200
<v Speaker 6>at all the actors that are working on the frontier

0:48:52.320 --> 0:48:55.040
<v Speaker 6>of AI research or AI safety, you kind of if

0:48:55.040 --> 0:48:57.880
<v Speaker 6>you compare us to AI lab staff, I think that

0:48:57.960 --> 0:48:59.719
<v Speaker 6>our work gets to be we get to kind of

0:48:59.760 --> 0:49:02.600
<v Speaker 6>every day work on whatever research we think will be

0:49:02.719 --> 0:49:05.480
<v Speaker 6>most informative to the like public decision.

0:49:05.600 --> 0:49:09.520
<v Speaker 2>Do you have ex AI, not xAI, but ex as

0:49:09.560 --> 0:49:12.600
<v Speaker 2>in former AI lab staff, who maybe there was a

0:49:12.680 --> 0:49:14.720
<v Speaker 2>tender at some point and now they work at METR?

0:49:14.760 --> 0:49:16.520
<v Speaker 6>Yeah, we do, okay of those. Yeah, so we do

0:49:16.560 --> 0:49:19.279
<v Speaker 6>have some people who previously worked at AI labs. I

0:49:19.320 --> 0:49:21.799
<v Speaker 6>do think that as time goes on, I think one

0:49:21.880 --> 0:49:23.799
<v Speaker 6>hope that I have is that more, you know, there

0:49:24.120 --> 0:49:26.520
<v Speaker 6>will be more and more researchers who have kind of

0:49:26.560 --> 0:49:29.160
<v Speaker 6>like made the money that they need from working in

0:49:29.200 --> 0:49:31.600
<v Speaker 6>the industry and now are excited and kind of like

0:49:31.680 --> 0:49:34.520
<v Speaker 6>lifting all boats by working on kind of like inside

0:49:34.520 --> 0:49:36.920
<v Speaker 6>of an organization where the north star can be what

0:49:37.080 --> 0:49:39.360
<v Speaker 6>is most informative to the rest of the world outside

0:49:39.360 --> 0:49:41.960
<v Speaker 6>of these like relatively small set of companies.

0:49:42.360 --> 0:49:45.040
<v Speaker 7>Chris is very polite. I think that's I think that's wonderful.

0:49:45.160 --> 0:49:47.000
<v Speaker 7>I'm tempted to be a little bit, a little bit

0:49:47.000 --> 0:49:50.720
<v Speaker 7>more aggressive in this conversation. I think we have spoken

0:49:50.760 --> 0:49:53.760
<v Speaker 7>through METR's work on some of the most important problems

0:49:53.760 --> 0:49:56.080
<v Speaker 7>in the world, problems that are going to define the

0:49:56.080 --> 0:49:58.279
<v Speaker 7>future I think for not just the next years, but

0:49:58.400 --> 0:50:02.080
<v Speaker 7>you know, coming coming decades, maybe maybe even coming centuries.

0:50:02.160 --> 0:50:04.560
<v Speaker 7>And we've also spoken about some of the ways in

0:50:04.600 --> 0:50:06.960
<v Speaker 7>which METR's work is not, might not be, what you

0:50:07.040 --> 0:50:08.960
<v Speaker 7>might want it to be. That there's a long way

0:50:08.960 --> 0:50:12.879
<v Speaker 7>to go in the science of evaluating these AIs. Why

0:50:12.880 --> 0:50:14.960
<v Speaker 7>have we not made more progress? You know, maybe maybe

0:50:14.960 --> 0:50:17.480
<v Speaker 7>a couple of reasons. I think clearly the central reason

0:50:18.040 --> 0:50:22.200
<v Speaker 7>is that we are bottlenecked on technical talent, on incredibly

0:50:22.239 --> 0:50:24.960
<v Speaker 7>capable people to come work on these questions. I was

0:50:25.000 --> 0:50:28.320
<v Speaker 7>on a METR work retreat recently where we were brainstorming,

0:50:28.360 --> 0:50:30.960
<v Speaker 7>you know, twenty thirty of these what seemed like world

0:50:31.040 --> 0:50:33.960
<v Speaker 7>important problems, problems that we think no one else is

0:50:34.000 --> 0:50:36.000
<v Speaker 7>going to get to if we do not get to them,

0:50:36.320 --> 0:50:38.680
<v Speaker 7>and we are able to conduct research on how many

0:50:38.719 --> 0:50:39.919
<v Speaker 7>of those problems, I think it's one.

0:50:40.160 --> 0:50:40.359
<v Speaker 5>Two.

0:50:40.800 --> 0:50:43.160
<v Speaker 7>You know, maybe if we do an extraordinary job this quarter,

0:50:43.480 --> 0:50:45.839
<v Speaker 7>it might be three. As Chris alludes to, I think

0:50:45.840 --> 0:50:49.120
<v Speaker 7>if you're interested in, you know, less working on redundant

0:50:49.160 --> 0:50:52.560
<v Speaker 7>products at these major AI companies and more advancing our

0:50:52.640 --> 0:50:54.759
<v Speaker 7>understanding on some of the most important questions in the

0:50:54.760 --> 0:50:56.600
<v Speaker 7>world that are going to shape the world for years

0:50:56.600 --> 0:50:58.400
<v Speaker 7>to come, METR is a great place to go.

0:50:58.520 --> 0:51:00.239
<v Speaker 6>Well, yeah, One more thing to say about that is

0:51:00.280 --> 0:51:03.120
<v Speaker 6>like the vibe inside of METR is a state of triage, right,

0:51:03.200 --> 0:51:06.600
<v Speaker 6>And I think people often tell themselves externally. People might guess, oh,

0:51:06.680 --> 0:51:08.680
<v Speaker 6>you know, METR, it's outside of any of the

0:51:08.680 --> 0:51:10.520
<v Speaker 6>AI labs. So the thing it might most struggle with

0:51:10.640 --> 0:51:12.719
<v Speaker 6>is things like access to AI models. You know, you

0:51:12.760 --> 0:51:14.440
<v Speaker 6>can't do the research you want because you don't have

0:51:14.600 --> 0:51:17.279
<v Speaker 6>you're not building the thing yourself in practice, or that's

0:51:17.320 --> 0:51:18.799
<v Speaker 6>the story that people always tell us. You have to

0:51:18.800 --> 0:51:20.920
<v Speaker 6>build you know, the future to shape it in practice.

0:51:20.960 --> 0:51:23.319
<v Speaker 6>I think our experience at METR is that, like when

0:51:23.360 --> 0:51:25.480
<v Speaker 6>we want to try new types of research that would

0:51:25.560 --> 0:51:28.520
<v Speaker 6>require new kinds of structured access, our experience at this

0:51:28.560 --> 0:51:30.799
<v Speaker 6>point has been that AI labs are like pretty game

0:51:30.840 --> 0:51:33.360
<v Speaker 6>to play ball on that. And the thing that is

0:51:33.360 --> 0:51:35.919
<v Speaker 6>more happening is that we're having to turn down opportunities

0:51:35.920 --> 0:51:37.600
<v Speaker 6>to do stuff like that because we don't have the

0:51:37.680 --> 0:51:39.960
<v Speaker 6>staff that we need to make those things happen.

0:51:40.239 --> 0:51:43.520
<v Speaker 2>Interesting. Joel and Chris, thank you so much for coming

0:51:43.520 --> 0:51:47.239
<v Speaker 2>on Odd Lots. Absolutely fascinating conversation and I appreciate your

0:51:47.280 --> 0:51:47.759
<v Speaker 2>taking your time.

0:51:47.920 --> 0:51:48.840
<v Speaker 5>Great to have you in the studio.

0:51:48.920 --> 0:51:55.400
<v Speaker 6>Yeah, thank you so much for having us.

0:52:03.120 --> 0:52:06.640
<v Speaker 2>That was a really interesting conversation too. We're starting

0:52:06.680 --> 0:52:08.840
<v Speaker 2>from the end sort of the idea of like, Okay,

0:52:09.280 --> 0:52:11.919
<v Speaker 2>here are some really important questions, like let's just set

0:52:11.960 --> 0:52:12.879
<v Speaker 2>everything aside.

0:52:12.560 --> 0:52:14.520
<v Speaker 3>And there's thirty people working on there, there's.

0:52:14.400 --> 0:52:17.000
<v Speaker 2>You know, and like how many people want to do it,

0:52:17.080 --> 0:52:19.560
<v Speaker 2>and it's like, okay, we try to match cash comp

0:52:19.600 --> 0:52:21.760
<v Speaker 2>et cetera. Yeah, that seems like kind of a tricky

0:52:22.040 --> 0:52:24.200
<v Speaker 2>issue if like, if you accept the premise that these

0:52:24.200 --> 0:52:27.359
<v Speaker 2>are some big questions we have to get right and

0:52:27.760 --> 0:52:30.400
<v Speaker 2>you got to land this plane hopefully, Like that's a

0:52:30.400 --> 0:52:31.160
<v Speaker 2>bit of an issue.

0:52:31.280 --> 0:52:34.319
<v Speaker 3>Yeah. The other thing I thought was really interesting was

0:52:34.920 --> 0:52:37.880
<v Speaker 3>the Chinese models not really making it on the charts

0:52:38.080 --> 0:52:40.880
<v Speaker 3>even though, like we know, in the market itself, like

0:52:41.000 --> 0:52:44.040
<v Speaker 3>when DeepSeek, when that new version came out, that

0:52:44.160 --> 0:52:47.040
<v Speaker 3>was like this huge thing where everyone started to panic

0:52:47.120 --> 0:52:49.160
<v Speaker 3>and to not see it even like land on the

0:52:49.200 --> 0:52:51.360
<v Speaker 3>time horizon chart. It's kind of interesting.

0:52:51.480 --> 0:52:52.280
<v Speaker 4>I guess it's interesting.

0:52:52.280 --> 0:52:54.919
<v Speaker 2>I mean, I guess I buy the reasoning from their

0:52:54.960 --> 0:52:58.319
<v Speaker 2>perspective that the only interesting question from METR's perspective is

0:52:58.360 --> 0:53:02.360
<v Speaker 2>like the most cutting edge slightly adjacent to the most

0:53:02.360 --> 0:53:06.240
<v Speaker 2>interesting chart for like business, right, So it's like, Okay,

0:53:06.360 --> 0:53:09.360
<v Speaker 2>we know that DeepSeek and Qwen and Kimi and

0:53:09.400 --> 0:53:12.719
<v Speaker 2>all those are like very impressive. Do they push like

0:53:12.800 --> 0:53:16.880
<v Speaker 2>the very frontier? Perhaps not, but just in general, I

0:53:16.920 --> 0:53:19.840
<v Speaker 2>find this space so weird because it's like, here you

0:53:19.880 --> 0:53:22.359
<v Speaker 2>have these people who are like clearly quite alarmed at

0:53:22.360 --> 0:53:26.319
<v Speaker 2>the potential here, and most people, I think, look at

0:53:26.320 --> 0:53:28.680
<v Speaker 2>these charts and they say like, wow, this is like

0:53:29.040 --> 0:53:30.480
<v Speaker 2>I want to invest in this, or this is.

0:53:30.400 --> 0:53:31.560
<v Speaker 4>Like no, I know, I know.

0:53:31.960 --> 0:53:34.439
<v Speaker 3>Like that's why my first question was like, you're here

0:53:34.480 --> 0:53:37.400
<v Speaker 3>for AI safety purposes, but everyone seems to get excited

0:53:37.400 --> 0:53:39.920
<v Speaker 3>about the line go up charts right, Like there's a

0:53:39.920 --> 0:53:43.880
<v Speaker 3>disconnect all connected. Like I say, when an industry basically

0:53:43.920 --> 0:53:47.760
<v Speaker 3>says it's worried about itself, you should pay attention.

0:53:48.000 --> 0:53:51.560
<v Speaker 2>It's really strange. This gets back to, you know, very

0:53:51.560 --> 0:53:54.120
<v Speaker 2>It's very strange where you have the CEOs of these

0:53:54.120 --> 0:53:57.840
<v Speaker 2>companies who are in many cases the most alarmist, and

0:53:57.880 --> 0:54:00.520
<v Speaker 2>there's this sort of cynical thing. And I don't totally

0:54:00.520 --> 0:54:03.080
<v Speaker 2>discount the cynical interpretations like oh, they're saying this because

0:54:03.120 --> 0:54:05.160
<v Speaker 2>they want to get investors and so forth, and they

0:54:05.200 --> 0:54:08.640
<v Speaker 2>need all this money. But look, it was also true

0:54:08.680 --> 0:54:12.000
<v Speaker 2>that OpenAI and Anthropic, but OpenAI a little

0:54:12.040 --> 0:54:15.799
<v Speaker 2>more were like founded with these very exotic corporate structures

0:54:15.800 --> 0:54:18.759
<v Speaker 2>of like a private company owned by nonprofit et cetera,

0:54:19.120 --> 0:54:22.480
<v Speaker 2>which they presumably did because they took pretty seriously the

0:54:22.520 --> 0:54:25.520
<v Speaker 2>fact that this technology is science. It was like very

0:54:25.560 --> 0:54:29.440
<v Speaker 2>strange and not just like it's not just enterprise office right, Like.

0:54:29.400 --> 0:54:31.319
<v Speaker 3>They were self limiting in a way.

0:54:31.600 --> 0:54:35.600
<v Speaker 2>One other interesting thing too, that this idea is like, okay, like,

0:54:35.719 --> 0:54:38.400
<v Speaker 2>first of all, what's the difference between seven months and

0:54:38.440 --> 0:54:39.960
<v Speaker 2>four month doubling time?

0:54:40.120 --> 0:54:40.600
<v Speaker 5>Not much?

0:54:40.680 --> 0:54:40.880
<v Speaker 1>You know.

0:54:40.920 --> 0:54:43.080
<v Speaker 3>It's like these people's like, oh, I can't but it's exponential,

0:54:43.120 --> 0:54:43.400
<v Speaker 3>isn't it.

0:54:43.440 --> 0:54:45.400
<v Speaker 4>I guess it's exponential, But it's still funny to me.

0:54:45.440 --> 0:54:47.600
<v Speaker 2>It's like, oh, I think like AI is going to

0:54:47.719 --> 0:54:50.000
<v Speaker 2>destroy all white collar work in two years, and someone

0:54:50.040 --> 0:54:52.000
<v Speaker 2>else is like, no, no, I think it's gonna be three years.

0:54:52.160 --> 0:54:55.520
<v Speaker 2>As if that makes any difference whatsoever? But one thing

0:54:55.560 --> 0:54:58.560
<v Speaker 2>to consider all sort of alluded to this. You know,

0:54:58.600 --> 0:55:01.520
<v Speaker 2>you had like OpenAI shut down its like video efforts,

0:55:01.520 --> 0:55:04.960
<v Speaker 2>et cetera. So perhaps part of the story is just

0:55:05.040 --> 0:55:08.399
<v Speaker 2>this intense focus now on the software engineering side, as

0:55:08.480 --> 0:55:10.839
<v Speaker 2>what these labs are working on. Yeah, and sort of

0:55:10.880 --> 0:55:13.400
<v Speaker 2>like all these other side quests are not as important,

0:55:13.680 --> 0:55:17.200
<v Speaker 2>So maybe we will see even more rapid progress on

0:55:17.239 --> 0:55:21.200
<v Speaker 2>some of these technical benchmarks, because clearly, from the labs' perspective,

0:55:21.480 --> 0:55:23.359
<v Speaker 2>that's where the action is more than some of these

0:55:23.360 --> 0:55:26.960
<v Speaker 2>consumer things like making making images or videos.

0:55:27.080 --> 0:55:28.839
<v Speaker 3>Yep, all right, shall we leave it there, Let's leave

0:55:28.880 --> 0:55:30.680
<v Speaker 3>it there. Okay, this has been another episode of the

0:55:30.680 --> 0:55:33.080
<v Speaker 3>Odd Lots podcast. I'm Tracy Alloway. You can follow me

0:55:33.160 --> 0:55:34.120
<v Speaker 3>at Tracy Alloway.

0:55:34.360 --> 0:55:37.240
<v Speaker 2>And I'm Joe Weisenthal. You can follow me at the Stalwart.

0:55:37.320 --> 0:55:40.320
<v Speaker 2>Follow our guest Chris Painter He's at Chris Painter yup.

0:55:40.360 --> 0:55:43.600
<v Speaker 2>And Joel Becker He's at Joel Underscore b k R.

0:55:43.960 --> 0:55:47.120
<v Speaker 2>Follow our producers Carmen Rodriguez at Carmen armand, Dashiell

0:55:47.120 --> 0:55:51.000
<v Speaker 2>Bennett at Dashbot, Kail Brooks at Kail Brooks, and Kevin Lozano

0:55:51.160 --> 0:55:54.480
<v Speaker 2>at Kevin Lloyd Lozano. And for more Odd Lots content,

0:55:54.560 --> 0:55:56.719
<v Speaker 2>go to Bloomberg dot com slash odd Lots where the

0:55:56.800 --> 0:55:59.200
<v Speaker 2>daily newsletter and all of our episodes and you can

0:55:59.280 --> 0:56:01.680
<v Speaker 2>chat about all these topics twenty four to seven in

0:56:01.800 --> 0:56:05.239
<v Speaker 2>our discord Discord dot gg slash lots.

0:56:05.200 --> 0:56:07.439
<v Speaker 3>And if you enjoy Odd Lots. If you like these

0:56:07.480 --> 0:56:10.040
<v Speaker 3>AI episodes, then please leave us a positive review on

0:56:10.080 --> 0:56:12.759
<v Speaker 3>your favorite podcast platform. And remember, if you are a

0:56:12.800 --> 0:56:15.560
<v Speaker 3>Bloomberg subscriber, you can listen to all of our episodes

0:56:15.600 --> 0:56:18.040
<v Speaker 3>absolutely ad free. All you need to do is find

0:56:18.080 --> 0:56:21.319
<v Speaker 3>the Bloomberg channel on Apple Podcasts and follow the instructions there.

0:56:21.719 --> 0:56:22.480
<v Speaker 3>Thanks for listening.