1 00:00:00,840 --> 00:00:04,480 S1: Andrej Karpathy came on Dwarkesh's podcast recently and I 2 00:00:04,480 --> 00:00:07,880 S1: have a number of thoughts. The consensus seems to be 3 00:00:07,880 --> 00:00:11,240 S1: that Karpathy thinks AGI is ten years away and therefore 4 00:00:11,240 --> 00:00:14,600 S1: Gary Marcus is right. And people like myself and Sholto 5 00:00:15,240 --> 00:00:17,400 S1: and all the other people saying AGI is within a 6 00:00:17,400 --> 00:00:21,520 S1: few years have just basically lost the war. It's a 7 00:00:21,520 --> 00:00:27,040 S1: compelling narrative, but that's not really what happened. He did, however, 8 00:00:27,080 --> 00:00:30,800 S1: say that he thinks AGI is ten years out. But 9 00:00:30,800 --> 00:00:34,040 S1: the AGI debate has always hinged on definitions, and I 10 00:00:34,040 --> 00:00:36,760 S1: think the one that Karpathy is using is the reason 11 00:00:36,760 --> 00:00:40,720 S1: he's wrong. It came from back when he was at OpenAI, 12 00:00:40,880 --> 00:00:43,840 S1: and it basically goes like this: an AI that can 13 00:00:43,840 --> 00:00:48,360 S1: do any economically valuable work as good or better than 14 00:00:48,360 --> 00:00:52,160 S1: a human. And again, that goes all the way back, like, 15 00:00:52,200 --> 00:00:55,920 S1: I don't know, whenever Karpathy was at OpenAI. This is 16 00:00:55,920 --> 00:00:59,800 S1: over five years ago. I simply don't think this is 17 00:00:59,800 --> 00:01:03,790 S1: the best definition to use. I quite like it as 18 00:01:03,790 --> 00:01:07,390 S1: a pure definition or as a computer science definition, but 19 00:01:07,390 --> 00:01:10,110 S1: I think we should use one that focuses more on 20 00:01:10,110 --> 00:01:16,510 S1: practically and directly helping humans and avoiding bad outcomes for humans, 21 00:01:16,709 --> 00:01:19,789 S1: as opposed to talking about what's interesting and valuable to 22 00:01:19,830 --> 00:01:25,589 S1: AI people like us.
I'm worried about human worker replacement, 23 00:01:25,750 --> 00:01:29,110 S1: specifically human knowledge work, and that's why I've been using 24 00:01:29,110 --> 00:01:33,589 S1: this definition since 2023. And Dwarkesh is now using this 25 00:01:33,590 --> 00:01:37,870 S1: definition as well, which is an AI system that can 26 00:01:37,870 --> 00:01:41,910 S1: replace an average knowledge worker. For me, this is a 27 00:01:41,910 --> 00:01:44,750 S1: better definition for two reasons. One, it focuses on the 28 00:01:44,750 --> 00:01:48,390 S1: fact that it's an AI system and not one particular 29 00:01:48,390 --> 00:01:52,950 S1: component of a system, like a model. Two, it provides 30 00:01:52,950 --> 00:01:55,390 S1: a more direct benchmark for the thing we care about, 31 00:01:55,390 --> 00:01:58,910 S1: which is: are companies actually replacing workers with the system? 32 00:01:59,190 --> 00:02:03,670 S1: Yes or no. And this system part is extremely key. 33 00:02:04,990 --> 00:02:08,230 S1: I have no reason or even ability to disagree with 34 00:02:08,230 --> 00:02:12,550 S1: Karpathy on the limitations of pure LLMs. He recently wrote 35 00:02:12,550 --> 00:02:17,270 S1: yet another LLM from scratch by hand, a thousand lines 36 00:02:17,270 --> 00:02:20,950 S1: of code. He is the actual sensei here. Like, I 37 00:02:20,950 --> 00:02:28,030 S1: know 0.0017% of what he knows about LLMs. The problem is, 38 00:02:28,030 --> 00:02:32,230 S1: AI systems aren't just the LLMs themselves. They're not naked 39 00:02:32,230 --> 00:02:37,230 S1: neural nets. When you go to ChatGPT and you're talking 40 00:02:37,230 --> 00:02:40,350 S1: with GPT-5, you're not talking to a base neural net, 41 00:02:40,350 --> 00:02:43,510 S1: you're talking to an AI system.
You're talking to the 42 00:02:43,510 --> 00:02:46,990 S1: result of that initial LLM being shaped and molded with 43 00:02:46,990 --> 00:02:51,750 S1: colossal amounts of extra scaffolding and engineering to be the 44 00:02:51,750 --> 00:02:55,950 S1: best possible system it can be for doing that particular task. 45 00:02:56,230 --> 00:03:00,420 S1: In this case, being a chatbot or an assistant. This 46 00:03:00,419 --> 00:03:03,980 S1: distinction is crucial because replacing human jobs will also be 47 00:03:03,980 --> 00:03:08,140 S1: done through composite, stitched-together systems that are many times 48 00:03:08,139 --> 00:03:12,899 S1: more powerful than their parts. To replace a project manager 49 00:03:12,900 --> 00:03:16,860 S1: or an executive assistant, the companies building human worker replacement 50 00:03:16,860 --> 00:03:20,260 S1: aren't going to wait for GPT-9 or Gemini 7.5 51 00:03:20,780 --> 00:03:24,820 S1: to maybe solve their problems. Human worker replacement will happen 52 00:03:24,820 --> 00:03:28,700 S1: through AI products and systems that work around the pure 53 00:03:28,740 --> 00:03:34,500 S1: limitations of LLMs and of individual model intelligence, like RAG, 54 00:03:34,540 --> 00:03:39,860 S1: expanding context windows, context management, things like that. And the 55 00:03:39,860 --> 00:03:43,940 S1: best example of this is actually Claude Code. It's just 56 00:03:43,940 --> 00:03:47,780 S1: a brilliant example. Just throwing out estimates: when Claude Code 57 00:03:47,780 --> 00:03:51,940 S1: came out, which was earlier in '25, in, like, basically 58 00:03:51,940 --> 00:03:55,580 S1: March of '25 when it launched, it was like five 59 00:03:55,620 --> 00:03:58,740 S1: times better than Opus, which was its best model at 60 00:03:58,740 --> 00:04:04,420 S1: the time, for doing coding tasks and stuff like that.
Well, 61 00:04:04,420 --> 00:04:07,780 S1: it's less than ten months later and it's already gotten 62 00:04:07,780 --> 00:04:11,980 S1: many times better than that. It's like a night 63 00:04:11,980 --> 00:04:15,340 S1: and day difference. Yes, the models got better, but that's 64 00:04:15,340 --> 00:04:19,739 S1: not what made the difference. It was constant iterative improvements, 65 00:04:19,779 --> 00:04:23,740 S1: grinding towards improving how the AI talks to itself and 66 00:04:23,740 --> 00:04:30,220 S1: how humans interact with the AI: coordination, context management, context engineering. 67 00:04:31,420 --> 00:04:33,860 S1: And just now they added Skills, which takes the whole 68 00:04:33,860 --> 00:04:38,780 S1: thing to, like, a completely different tier. This is exactly the 69 00:04:38,779 --> 00:04:43,740 S1: type of efficiency ratchet that will apply to human work replacement. 70 00:04:44,500 --> 00:04:47,260 S1: Where we don't have enough context window to read all 71 00:04:47,260 --> 00:04:51,700 S1: the company's docs, companies will have or invent systems to 72 00:04:51,740 --> 00:04:56,060 S1: do that. Whether or not they're general enough to match human flexibility, 73 00:04:56,060 --> 00:04:59,180 S1: they'll just add so many great use cases and capabilities, 74 00:04:59,700 --> 00:05:03,140 S1: based roughly around, like, the Agent Skills thing from Anthropic 75 00:05:03,140 --> 00:05:06,299 S1: that they just released, that we eventually won't notice, because 76 00:05:06,300 --> 00:05:10,380 S1: it'll cover most use cases. The part that concerns me 77 00:05:10,380 --> 00:05:13,419 S1: most about the speed of progress towards AI replacing human 78 00:05:13,420 --> 00:05:17,300 S1: knowledge workers is not just the speed of the AI system improvement. 79 00:05:17,740 --> 00:05:20,820 S1: It's also the fact that the bar is so low.
80 00:05:21,540 --> 00:05:24,580 S1: A good portion of our culture's comedy is based on 81 00:05:24,580 --> 00:05:28,660 S1: the utter incompetence of, like, half of our workforce. We're 82 00:05:28,660 --> 00:05:32,299 S1: talking about the worst possible customer service, people bragging about 83 00:05:32,300 --> 00:05:35,500 S1: how little work they do, making a sport of doing 84 00:05:35,500 --> 00:05:38,820 S1: the bare minimum, showing up the bare minimum amount of time, 85 00:05:39,500 --> 00:05:42,020 S1: not doing hardly any work and getting away with it 86 00:05:42,020 --> 00:05:47,140 S1: and getting paid. People absolutely detesting their jobs. Even decent 87 00:05:47,140 --> 00:05:50,419 S1: workers just mindlessly punch in and out a lot of 88 00:05:50,420 --> 00:05:56,730 S1: the time. Mediocrity is the baseline, almost by definition. That 89 00:05:56,730 --> 00:06:01,930 S1: is what multibillion-dollar human worker replacement startups are competing with, 90 00:06:02,170 --> 00:06:05,330 S1: not the top 10% performers that, you know, a lot 91 00:06:05,330 --> 00:06:09,289 S1: of us know, at least for now. Think of it 92 00:06:09,290 --> 00:06:12,330 S1: this way: in the time that we went from Claude 93 00:06:12,330 --> 00:06:16,650 S1: Code not existing to getting really, really good to now 94 00:06:16,650 --> 00:06:22,610 S1: having shareable work task replacement skills, the bottom 50% of 95 00:06:22,610 --> 00:06:28,809 S1: knowledge workers improved by how much? Zero. In the time 96 00:06:28,810 --> 00:06:33,050 S1: since ChatGPT came out, right, so we're talking about late '22.
97 00:06:33,770 --> 00:06:37,210 S1: So we're talking about, what is that, over three years. 98 00:06:38,890 --> 00:06:42,210 S1: In the time since ChatGPT came out, we're talking about 99 00:06:42,250 --> 00:06:47,250 S1: a stark difference in AI before then and now. Three 100 00:06:47,250 --> 00:06:51,210 S1: full years go by, and the bottom 50% of knowledge workers 101 00:06:51,210 --> 00:06:57,120 S1: improved their capabilities by how much? Again, 0%. The bar 102 00:06:57,120 --> 00:07:01,560 S1: for human work replacement is not moving, while the capabilities 103 00:07:01,560 --> 00:07:07,159 S1: of AI systems are going absolutely apeshit. Now, you might 104 00:07:07,160 --> 00:07:09,240 S1: push back, saying this is only for the people not 105 00:07:09,240 --> 00:07:13,320 S1: trying very hard or who aren't that smart or whatever. True. 106 00:07:13,440 --> 00:07:17,200 S1: But it doesn't matter. You and me and Dwarkesh and 107 00:07:17,200 --> 00:07:20,840 S1: Karpathy are going to be fine. So what? I'm worried 108 00:07:20,840 --> 00:07:25,560 S1: about everyone else. If AI only eats the absolute worst 109 00:07:25,720 --> 00:07:28,800 S1: bottom 50% of knowledge workers in the next 5 or 110 00:07:28,800 --> 00:07:33,680 S1: 10 years, we're still talking about hundreds of millions of jobs, 111 00:07:34,760 --> 00:07:38,440 S1: or even 25%. So basically, I just 112 00:07:38,440 --> 00:07:41,520 S1: did a bunch of research on this, and the total 113 00:07:41,520 --> 00:07:46,800 S1: number of knowledge workers worldwide is right around a billion. 114 00:07:47,680 --> 00:07:53,120 S1: 1 billion knowledge workers. So half is a big percentage. 115 00:07:53,120 --> 00:07:57,920 S1: That's 500 million people, but let's just say it's 10%. 116 00:07:57,920 --> 00:08:02,440 S1: Let's just say it's 25%. And we've already established that 117 00:08:02,440 --> 00:08:04,960 S1: these are the least competent people at the job.
So no, 118 00:08:04,960 --> 00:08:08,120 S1: they won't be pivoting easily to another knowledge work position. 119 00:08:09,280 --> 00:08:13,160 S1: This is why I disagree with Karpathy on AGI. It's 120 00:08:13,160 --> 00:08:18,120 S1: not because he's wrong about LLMs having severe limitations. He's not, 121 00:08:18,560 --> 00:08:21,400 S1: but he's focused on the wrong thing. If the thing 122 00:08:21,400 --> 00:08:25,360 S1: we care about is AI's near-term and practical impact on humanity, 123 00:08:26,120 --> 00:08:28,600 S1: the thing to watch is not the pure LLM tech 124 00:08:28,760 --> 00:08:33,200 S1: or the specific technical limitations of RL for achieving continuous learning. 125 00:08:33,559 --> 00:08:37,559 S1: It's the trillions of dollars being invested in replacing the 126 00:08:37,559 --> 00:08:41,320 S1: worst-performing human workers, who will likely never get better 127 00:08:41,320 --> 00:08:45,000 S1: than they already are. Those trillions are being spent on 128 00:08:45,000 --> 00:08:51,160 S1: scaffolding, workarounds to LLM limitations that provide us just general 129 00:08:51,160 --> 00:08:55,720 S1: enough AGI to start replacing people, and from there it 130 00:08:55,720 --> 00:08:59,480 S1: will only improve. Given what we've seen in systems like 131 00:08:59,480 --> 00:09:06,400 S1: Claude Code, Cursor, Codex that dramatically magnify model capability, while 132 00:09:06,400 --> 00:09:09,840 S1: the models continue to improve along their own axis as well, 133 00:09:10,080 --> 00:09:13,440 S1: do you really want to bet that good enough generality 134 00:09:13,840 --> 00:09:17,559 S1: won't be hit in the next couple of years? I 135 00:09:17,559 --> 00:09:20,520 S1: wouldn't take that bet. And this is why I think 136 00:09:20,559 --> 00:09:24,880 S1: AGI will arrive before 2028, like a 70% chance. A 137 00:09:24,920 --> 00:09:31,959 S1: rough guess, who really knows. And before 2030, I'm guessing 95%.
138 00:09:32,760 --> 00:09:36,040 S1: Not because all the stuff Karpathy is talking about will 139 00:09:36,040 --> 00:09:39,079 S1: be solved by then, but because it won't matter if 140 00:09:39,080 --> 00:09:43,440 S1: it's solved. With trillions of dollars in funding and trillions 141 00:09:43,440 --> 00:09:48,000 S1: of dollars in market opportunity, we're almost guaranteed to Claude 142 00:09:48,000 --> 00:09:51,720 S1: Code our way past a very low bar of millions 143 00:09:51,720 --> 00:09:53,400 S1: of barely-there employees.