WEBVTT - Why I Think Karpathy is Wrong on the AGI Timeline

0:00:00.840 --> 0:00:04.480
<v S1>Andrej Karpathy came on Dwarkesh's podcast recently, and I

0:00:04.480 --> 0:00:07.880
<v S1>have a number of thoughts. The consensus seems to be

0:00:07.880 --> 0:00:11.240
<v S1>that Karpathy thinks AGI is ten years away and therefore

0:00:11.240 --> 0:00:14.600
<v S1>Gary Marcus is right. And people like myself and Sholto

0:00:15.240 --> 0:00:17.400
<v S1>and all the other people saying AGI is within a

0:00:17.400 --> 0:00:21.520
<v S1>few years have just basically lost the war. It's a

0:00:21.520 --> 0:00:27.040
<v S1>compelling narrative, but that's not really what happened. He did, however,

0:00:27.080 --> 0:00:30.800
<v S1>say that he thinks AGI is ten years out. But

0:00:30.800 --> 0:00:34.040
<v S1>the AGI debate has always hinged on definitions, and I

0:00:34.040 --> 0:00:36.760
<v S1>think the one that Karpathy is using is the reason

0:00:36.760 --> 0:00:40.720
<v S1>he's wrong. It came from back when he was at OpenAI,

0:00:40.880 --> 0:00:43.840
<v S1>and it basically goes like this. An AI that can

0:00:43.840 --> 0:00:48.360
<v S1>do any economically valuable work as well as or better than

0:00:48.360 --> 0:00:52.160
<v S1>a human. And again, that goes all the way back to, I don't know,

0:00:52.200 --> 0:00:55.920
<v S1>whenever Karpathy was at OpenAI. This is

0:00:55.920 --> 0:00:59.800
<v S1>over five years ago. I simply don't think this is

0:00:59.800 --> 0:01:03.790
<v S1>the best definition to use. I quite like it as

0:01:03.790 --> 0:01:07.390
<v S1>a pure definition or as a computer science definition, but

0:01:07.390 --> 0:01:10.110
<v S1>I think we should use one that focuses more on

0:01:10.110 --> 0:01:16.510
<v S1>practically and directly helping humans and avoiding bad outcomes for humans,

0:01:16.709 --> 0:01:19.789
<v S1>as opposed to talking about what's interesting and valuable to

0:01:19.830 --> 0:01:25.589
<v S1>AI people like us. I'm worried about human worker replacement,

0:01:25.750 --> 0:01:29.110
<v S1>specifically human knowledge work, and that's why I've been using

0:01:29.110 --> 0:01:33.589
<v S1>this definition since 2023. And Dwarkesh is now using this

0:01:33.590 --> 0:01:37.870
<v S1>definition as well, which is an AI system that can

0:01:37.870 --> 0:01:41.910
<v S1>replace an average knowledge worker. For me, this is a

0:01:41.910 --> 0:01:44.750
<v S1>better definition for two reasons. One, it focuses on the

0:01:44.750 --> 0:01:48.390
<v S1>fact that it's an AI system and not one particular

0:01:48.390 --> 0:01:52.950
<v S1>component of a system, such as a model. Two, it provides

0:01:52.950 --> 0:01:55.390
<v S1>a more direct benchmark for the thing we care about,

0:01:55.390 --> 0:01:58.910
<v S1>which is whether companies are actually replacing workers with the system.

0:01:59.190 --> 0:02:03.670
<v S1>Yes or no. And this system part is extremely key.

0:02:04.990 --> 0:02:08.230
<v S1>I have no reason or even ability to disagree with

0:02:08.230 --> 0:02:12.550
<v S1>Karpathy on the limitations of pure LLMs. He recently wrote

0:02:12.550 --> 0:02:17.270
<v S1>yet another LLM from scratch, by hand: a thousand lines

0:02:17.270 --> 0:02:20.950
<v S1>of code. He is the actual sensei here. Like, I

0:02:20.950 --> 0:02:28.030
<v S1>know 0.0017% of what he knows about LLMs. The problem is,

0:02:28.030 --> 0:02:32.230
<v S1>AI systems aren't just the LLMs themselves; they're not naked

0:02:32.230 --> 0:02:37.230
<v S1>neural nets. When you go to ChatGPT and you're talking

0:02:37.230 --> 0:02:40.350
<v S1>with GPT-5, you're not talking to a base neural net;

0:02:40.350 --> 0:02:43.510
<v S1>you're talking to an AI system. You're talking to the

0:02:43.510 --> 0:02:46.990
<v S1>result of that initial LLM being shaped and molded with

0:02:46.990 --> 0:02:51.750
<v S1>colossal amounts of extra scaffolding and engineering to be the

0:02:51.750 --> 0:02:55.950
<v S1>best possible system it can be for doing that particular task.

0:02:56.230 --> 0:03:00.420
<v S1>In this case, being a chatbot or an assistant. This

0:03:00.419 --> 0:03:03.980
<v S1>distinction is crucial because replacing human jobs will also be

0:03:03.980 --> 0:03:08.140
<v S1>done through composite, stitched-together systems that are many times

0:03:08.139 --> 0:03:12.899
<v S1>more powerful than their parts. To replace a project manager

0:03:12.900 --> 0:03:16.860
<v S1>or an executive assistant, the companies building human worker replacements

0:03:16.860 --> 0:03:20.260
<v S1>aren't going to wait for GPT-9 or Gemini 7.5

0:03:20.780 --> 0:03:24.820
<v S1>to maybe solve their problems. Human worker replacement will happen

0:03:24.820 --> 0:03:28.700
<v S1>through AI products and systems that work around the pure

0:03:28.740 --> 0:03:34.500
<v S1>limitations of LLMs and of individual model intelligence, like RAG,

0:03:34.540 --> 0:03:39.860
<v S1>expanding context windows, context management, things like that. And the

0:03:39.860 --> 0:03:43.940
<v S1>best example of this is actually Claude Code. It's just

0:03:43.940 --> 0:03:47.780
<v S1>a brilliant example. Just throwing out estimates: when Claude Code

0:03:47.780 --> 0:03:51.940
<v S1>came out earlier in 2025, basically

0:03:51.940 --> 0:03:55.580
<v S1>in March of 2025 when it launched, it was like five

0:03:55.620 --> 0:03:58.740
<v S1>times better than Opus, which was Anthropic's best model at

0:03:58.740 --> 0:04:04.420
<v S1>the time for doing coding tasks and stuff like that. Well,

0:04:04.420 --> 0:04:07.780
<v S1>it's less than ten months later and it's already gotten

0:04:07.780 --> 0:04:11.980
<v S1>many times better than that. It's like a night

0:04:11.980 --> 0:04:15.340
<v S1>and day difference. Yes, the models got better, but that's

0:04:15.340 --> 0:04:19.739
<v S1>not what made the difference. It was constant iterative improvements,

0:04:19.779 --> 0:04:23.740
<v S1>grinding towards improving how the AI talks to itself and

0:04:23.740 --> 0:04:30.220
<v S1>how humans interact with the AI: coordination, context management, context engineering.

0:04:31.420 --> 0:04:33.860
<v S1>And just now they added Skills, which takes the whole

0:04:33.860 --> 0:04:38.780
<v S1>thing to a completely different tier. This is exactly the

0:04:38.779 --> 0:04:43.740
<v S1>type of efficiency ratchet that will apply to human worker replacement.

0:04:44.500 --> 0:04:47.260
<v S1>Where we don't have a big enough context window to read all

0:04:47.260 --> 0:04:51.700
<v S1>of a company's docs, companies will have or invent systems to

0:04:51.740 --> 0:04:56.060
<v S1>do that, whether or not those systems are general enough to match human flexibility.

0:04:56.060 --> 0:04:59.180
<v S1>They'll just add so many great use cases and capabilities,

0:04:59.700 --> 0:05:03.140
<v S1>based roughly around the Agent Skills feature from Anthropic

0:05:03.140 --> 0:05:06.299
<v S1>that they just released, that we eventually won't notice the difference, because

0:05:06.300 --> 0:05:10.380
<v S1>it'll cover most use cases. The part that concerns me

0:05:10.380 --> 0:05:13.419
<v S1>most about the speed of progress towards AI replacing human

0:05:13.420 --> 0:05:17.300
<v S1>knowledge workers is not just the speed of AI system improvement.

0:05:17.740 --> 0:05:20.820
<v S1>It's also the fact that the bar is so low.

0:05:21.540 --> 0:05:24.580
<v S1>A good portion of our culture's comedy is based on

0:05:24.580 --> 0:05:28.660
<v S1>the utter incompetence of, like, half of our workforce. We're

0:05:28.660 --> 0:05:32.299
<v S1>talking about the worst possible customer service, people bragging about

0:05:32.300 --> 0:05:35.500
<v S1>how little work they do, making a sport of doing

0:05:35.500 --> 0:05:38.820
<v S1>the bare minimum, showing up the bare minimum amount of time,

0:05:39.500 --> 0:05:42.020
<v S1>doing hardly any work and getting away with it

0:05:42.020 --> 0:05:47.140
<v S1>and getting paid. People absolutely detesting their jobs. Even decent

0:05:47.140 --> 0:05:50.419
<v S1>workers just mindlessly punch in and out a lot of

0:05:50.420 --> 0:05:56.730
<v S1>the time. Mediocrity is the baseline, almost by definition. That

0:05:56.730 --> 0:06:01.930
<v S1>is what multibillion-dollar human worker replacement startups are competing with,

0:06:02.170 --> 0:06:05.330
<v S1>not the top 10% performers that you know, that a lot

0:06:05.330 --> 0:06:09.289
<v S1>of us know, at least for now. Think of it

0:06:09.290 --> 0:06:12.330
<v S1>this way: in the time that we went from Claude

0:06:12.330 --> 0:06:16.650
<v S1>Code not existing to getting really, really good to now

0:06:16.650 --> 0:06:22.610
<v S1>having shareable work task replacement skills, the bottom 50% of

0:06:22.610 --> 0:06:28.809
<v S1>knowledge workers improved by how much? Zero. In the time

0:06:28.810 --> 0:06:33.050
<v S1>since ChatGPT came out in late 2022,

0:06:33.770 --> 0:06:37.210
<v S1>so roughly three years,

0:06:38.890 --> 0:06:42.210
<v S1>we've seen

0:06:42.250 --> 0:06:47.250
<v S1>a stark difference in AI between then and now. Roughly three

0:06:47.250 --> 0:06:51.210
<v S1>years have gone by, and the bottom 50% of knowledge workers

0:06:51.210 --> 0:06:57.120
<v S1>improved their capabilities by how much? Again, 0%. The bar

0:06:57.120 --> 0:07:01.560
<v S1>for human worker replacement is not moving, while the capabilities

0:07:01.560 --> 0:07:07.159
<v S1>of AI systems are going absolutely apeshit. Now, you might

0:07:07.160 --> 0:07:09.240
<v S1>push back, saying this is only for the people not

0:07:09.240 --> 0:07:13.320
<v S1>trying very hard or who aren't that smart or whatever. True.

0:07:13.440 --> 0:07:17.200
<v S1>But it doesn't matter. You, me, Dwarkesh, and

0:07:17.200 --> 0:07:20.840
<v S1>Karpathy are going to be fine. So what? I'm worried

0:07:20.840 --> 0:07:25.560
<v S1>about everyone else. If AI only eats the absolute worst

0:07:25.720 --> 0:07:28.800
<v S1>bottom 50% of knowledge workers in the next 5 or

0:07:28.800 --> 0:07:33.680
<v S1>10 years, we're still talking about hundreds of millions of jobs.

0:07:34.760 --> 0:07:38.440
<v S1>Or even 25%. I just

0:07:38.440 --> 0:07:41.520
<v S1>did a bunch of research on this, and the total

0:07:41.520 --> 0:07:46.800
<v S1>number of knowledge workers worldwide is right around a billion.

0:07:47.680 --> 0:07:53.120
<v S1>1 billion knowledge workers. So half is a big percentage.

0:07:53.120 --> 0:07:57.920
<v S1>That's 500 million people, but let's just say it's 10%.

0:07:57.920 --> 0:08:02.440
<v S1>Let's just say it's 25%. And we've already established that

0:08:02.440 --> 0:08:04.960
<v S1>these are the least competent people at the job. So no,

0:08:04.960 --> 0:08:08.120
<v S1>they won't be pivoting easily to another knowledge work position.

0:08:09.280 --> 0:08:13.160
<v S1>This is why I disagree with Karpathy on AGI. It's

0:08:13.160 --> 0:08:18.120
<v S1>not because he's wrong about LLMs having severe limitations. He's not,

0:08:18.560 --> 0:08:21.400
<v S1>but he's focused on the wrong thing. If the thing

0:08:21.400 --> 0:08:25.360
<v S1>we care about is AI's near-term and practical impact on humanity,

0:08:26.120 --> 0:08:28.600
<v S1>the thing to watch is not the pure LLM tech

0:08:28.760 --> 0:08:33.200
<v S1>or the specific technical limitations of RL in achieving continual learning.

0:08:33.559 --> 0:08:37.559
<v S1>It's the trillions of dollars being invested in replacing the

0:08:37.559 --> 0:08:41.320
<v S1>worst performing human workers, who will likely never get better

0:08:41.320 --> 0:08:45.000
<v S1>than they already are. Those trillions are being spent on

0:08:45.000 --> 0:08:51.160
<v S1>scaffolding workarounds to LLM limitations that provide us with just general

0:08:51.160 --> 0:08:55.720
<v S1>enough AGI to start replacing people, and from there it

0:08:55.720 --> 0:08:59.480
<v S1>will only improve. Given what we've seen in systems like

0:08:59.480 --> 0:09:06.400
<v S1>Claude Code, Cursor, and Codex that dramatically magnify model capability, while

0:09:06.400 --> 0:09:09.840
<v S1>the models continue to improve along their own axis as well,

0:09:10.080 --> 0:09:13.440
<v S1>do you really want to bet that good-enough generality

0:09:13.840 --> 0:09:17.559
<v S1>won't be hit in the next couple of years? I

0:09:17.559 --> 0:09:20.520
<v S1>wouldn't take that bet. And this is why I think

0:09:20.559 --> 0:09:24.880
<v S1>AGI will arrive before 2028, with something like a 70% chance. A

0:09:24.920 --> 0:09:31.959
<v S1>rough guess; who really knows? And before 2030, I'm guessing 95%.

0:09:32.760 --> 0:09:36.040
<v S1>Not because all the stuff Karpathy is talking about will

0:09:36.040 --> 0:09:39.079
<v S1>be solved by then, but because it won't matter if

0:09:39.080 --> 0:09:43.440
<v S1>it's solved. With trillions of dollars in funding and trillions

0:09:43.440 --> 0:09:48.000
<v S1>of dollars in market opportunity, we're almost guaranteed to Claude

0:09:48.000 --> 0:09:51.720
<v S1>Code our way past a very low bar of millions

0:09:51.720 --> 0:09:53.400
<v S1>of barely-there employees.