1
00:00:00,840 --> 00:00:04,480
S1: Andrej Karpathy came on Dwarkesh's podcast recently, and I

2
00:00:04,480 --> 00:00:07,880
S1: have a number of thoughts. The consensus seems to be

3
00:00:07,880 --> 00:00:11,240
S1: that Karpathy thinks AGI is ten years away and therefore

4
00:00:11,240 --> 00:00:14,600
S1: Gary Marcus is right. And people like myself and Sholto

5
00:00:15,240 --> 00:00:17,400
S1: and all the other people saying AGI is within a

6
00:00:17,400 --> 00:00:21,520
S1: few years have just basically lost the war. It's a

7
00:00:21,520 --> 00:00:27,040
S1: compelling narrative, but that's not really what happened. He did, however,

8
00:00:27,080 --> 00:00:30,800
S1: say that he thinks AGI is ten years out. But

9
00:00:30,800 --> 00:00:34,040
S1: the AGI debate has always hinged on definitions, and I

10
00:00:34,040 --> 00:00:36,760
S1: think the one that Karpathy is using is the reason

11
00:00:36,760 --> 00:00:40,720
S1: he's wrong. It came from back when he was at OpenAI,

12
00:00:40,880 --> 00:00:43,840
S1: and it basically goes like this: an AI that can

13
00:00:43,840 --> 00:00:48,360
S1: do any economically valuable work as well as or better than

14
00:00:48,360 --> 00:00:52,160
S1: a human. And again, that goes all the way back, like,

15
00:00:52,200 --> 00:00:55,920
S1: I don't know, whenever Karpathy was at OpenAI. This is

16
00:00:55,920 --> 00:00:59,800
S1: over five years ago. I simply don't think this is

17
00:00:59,800 --> 00:01:03,790
S1: the best definition to use. I quite like it as

18
00:01:03,790 --> 00:01:07,390
S1: a pure definition or as a computer science definition, but

19
00:01:07,390 --> 00:01:10,110
S1: I think we should use one that focuses more on

20
00:01:10,110 --> 00:01:16,510
S1: practically and directly helping humans and avoiding bad outcomes for humans,

21
00:01:16,709 --> 00:01:19,789
S1: as opposed to talking about what's interesting and valuable to

22
00:01:19,830 --> 00:01:25,589
S1: AI people like us. I'm worried about human worker replacement,

23
00:01:25,750 --> 00:01:29,110
S1: specifically human knowledge work, and that's why I've been using

24
00:01:29,110 --> 00:01:33,589
S1: this definition since 2023. And Dwarkesh is now using this

25
00:01:33,590 --> 00:01:37,870
S1: definition as well, which is an AI system that can

26
00:01:37,870 --> 00:01:41,910
S1: replace an average knowledge worker. For me, this is a

27
00:01:41,910 --> 00:01:44,750
S1: better definition for two reasons. One, it focuses on the

28
00:01:44,750 --> 00:01:48,390
S1: fact that it's an AI system and not one particular

29
00:01:48,390 --> 00:01:52,950
S1: component of a system, like a model. Two, it provides

30
00:01:52,950 --> 00:01:55,390
S1: a more direct benchmark for the thing we care about,

31
00:01:55,390 --> 00:01:58,910
S1: which is: are companies actually replacing workers with the system?

32
00:01:59,190 --> 00:02:03,670
S1: Yes or no. And this system part is extremely key.

33
00:02:04,990 --> 00:02:08,230
S1: I have no reason or even ability to disagree with

34
00:02:08,230 --> 00:02:12,550
S1: Karpathy on the limitations of pure LLMs. He recently wrote

35
00:02:12,550 --> 00:02:17,270
S1: yet another LLM from scratch by hand. A thousand lines

36
00:02:17,270 --> 00:02:20,950
S1: of code. He is the actual sensei here. Like, I

37
00:02:20,950 --> 00:02:28,030
S1: know 0.0017% of what he knows about LLMs. The problem is,

38
00:02:28,030 --> 00:02:32,230
S1: AI systems aren't just the LLMs themselves; they're not naked

39
00:02:32,230 --> 00:02:37,230
S1: neural nets. When you go to ChatGPT and you're talking

40
00:02:37,230 --> 00:02:40,350
S1: with GPT-5, you're not talking to a base neural net,

41
00:02:40,350 --> 00:02:43,510
S1: you're talking to an AI system. You're talking to the

42
00:02:43,510 --> 00:02:46,990
S1: result of that initial LLM being shaped and molded with

43
00:02:46,990 --> 00:02:51,750
S1: colossal amounts of extra scaffolding and engineering to be the

44
00:02:51,750 --> 00:02:55,950
S1: best possible system it can be for doing that particular task.

45
00:02:56,230 --> 00:03:00,420
S1: In this case, being a chatbot or an assistant. This

46
00:03:00,419 --> 00:03:03,980
S1: distinction is crucial because replacing human jobs will also be

47
00:03:03,980 --> 00:03:08,140
S1: done through composite, stitched-together systems that are many times

48
00:03:08,139 --> 00:03:12,899
S1: more powerful than their parts. To replace a project manager

49
00:03:12,900 --> 00:03:16,860
S1: or an executive assistant, the companies building human worker replacement

50
00:03:16,860 --> 00:03:20,260
S1: aren't going to wait for GPT-9 or Gemini 7.5

51
00:03:20,780 --> 00:03:24,820
S1: to maybe solve their problems. Human worker replacement will happen

52
00:03:24,820 --> 00:03:28,700
S1: through AI products and systems that work around the pure

53
00:03:28,740 --> 00:03:34,500
S1: limitations of LLMs and of individual model intelligence, like RAG,

54
00:03:34,540 --> 00:03:39,860
S1: expanding context windows, context management, things like that. And the

55
00:03:39,860 --> 00:03:43,940
S1: best example of this is actually Claude Code. It's just

56
00:03:43,940 --> 00:03:47,780
S1: a brilliant example. Just throwing out estimates, when Claude Code

57
00:03:47,780 --> 00:03:51,940
S1: came out, which was earlier in 2025, basically in

58
00:03:51,940 --> 00:03:55,580
S1: March of 2025, when it launched, it was like five

59
00:03:55,620 --> 00:03:58,740
S1: times better than Opus, which was its best model at

60
00:03:58,740 --> 00:04:04,420
S1: the time for doing coding tasks and stuff like that. Well,

61
00:04:04,420 --> 00:04:07,780
S1: it's less than ten months later and it's already gotten

62
00:04:07,780 --> 00:04:11,980
S1: many times better than that. It's like a night

63
00:04:11,980 --> 00:04:15,340
S1: and day difference. Yes, the models got better, but that's

64
00:04:15,340 --> 00:04:19,739
S1: not what made the difference. It was constant iterative improvements,

65
00:04:19,779 --> 00:04:23,740
S1: grinding towards improving how the AI talks to itself and

66
00:04:23,740 --> 00:04:30,220
S1: how humans interact with the AI: coordination, context management, context engineering.

67
00:04:31,420 --> 00:04:33,860
S1: And just now they added skills, which takes the whole

68
00:04:33,860 --> 00:04:38,780
S1: thing to a completely different tier. This is exactly the

69
00:04:38,779 --> 00:04:43,740
S1: type of efficiency ratchet that will apply to human worker replacement.

70
00:04:44,500 --> 00:04:47,260
S1: Where we don't have enough context window to read all

71
00:04:47,260 --> 00:04:51,700
S1: the company's docs, companies will have or invent systems to

72
00:04:51,740 --> 00:04:56,060
S1: do that, whether or not the models are general enough to match human flexibility.

73
00:04:56,060 --> 00:04:59,180
S1: They'll just add so many great use cases and capabilities

74
00:04:59,700 --> 00:05:03,140
S1: based roughly around, like, the Agent Skills thing from Anthropic

75
00:05:03,140 --> 00:05:06,299
S1: that they just released, that we eventually won't notice, because

76
00:05:06,300 --> 00:05:10,380
S1: it'll cover most use cases. The part that concerns me

77
00:05:10,380 --> 00:05:13,419
S1: most about the speed of progress towards AI replacing human

78
00:05:13,420 --> 00:05:17,300
S1: knowledge workers is not just the speed of AI system improvement.

79
00:05:17,740 --> 00:05:20,820
S1: It's also the fact that the bar is so low.

80
00:05:21,540 --> 00:05:24,580
S1: A good portion of our culture's comedy is based on

81
00:05:24,580 --> 00:05:28,660
S1: the utter incompetence of, like, half of our workforce. We're

82
00:05:28,660 --> 00:05:32,299
S1: talking about the worst possible customer service, people bragging about

83
00:05:32,300 --> 00:05:35,500
S1: how little work they do, making a sport of doing

84
00:05:35,500 --> 00:05:38,820
S1: the bare minimum, showing up the bare minimum amount of time,

85
00:05:39,500 --> 00:05:42,020
S1: doing hardly any work and getting away with it

86
00:05:42,020 --> 00:05:47,140
S1: and getting paid. People absolutely detesting their jobs. Even decent

87
00:05:47,140 --> 00:05:50,419
S1: workers just mindlessly punch in and out a lot of

88
00:05:50,420 --> 00:05:56,730
S1: the time. Mediocrity is the baseline, almost by definition. That

89
00:05:56,730 --> 00:06:01,930
S1: is what multibillion-dollar human worker replacement startups are competing with,

90
00:06:02,170 --> 00:06:05,330
S1: not the top 10% performers that you know, that a lot

91
00:06:05,330 --> 00:06:09,289
S1: of us know, at least for now. Think of it

92
00:06:09,290 --> 00:06:12,330
S1: this way: in the time that we went from Claude

93
00:06:12,330 --> 00:06:16,650
S1: Code not existing to getting really, really good to now

94
00:06:16,650 --> 00:06:22,610
S1: having shareable work task replacement skills, the bottom 50% of

95
00:06:22,610 --> 00:06:28,809
S1: knowledge workers improved by how much? Zero. In the time

96
00:06:28,810 --> 00:06:33,050
S1: since ChatGPT came out, right, so we're talking about late 2022,

97
00:06:33,770 --> 00:06:37,210
S1: so we're talking about, what is that, over three years.

98
00:06:38,890 --> 00:06:42,210
S1: In the time since ChatGPT came out, we're talking about

99
00:06:42,250 --> 00:06:47,250
S1: a stark difference between AI then and now. Three

100
00:06:47,250 --> 00:06:51,210
S1: full years go by, and the bottom 50% of knowledge workers

101
00:06:51,210 --> 00:06:57,120
S1: improved their capabilities by how much? Again, 0%. The bar

102
00:06:57,120 --> 00:07:01,560
S1: for human work replacement is not moving, while the capabilities

103
00:07:01,560 --> 00:07:07,159
S1: of AI systems are going absolutely apeshit. Now, you might

104
00:07:07,160 --> 00:07:09,240
S1: push back saying this is only for the people not

105
00:07:09,240 --> 00:07:13,320
S1: trying very hard or who aren't that smart or whatever. True.

106
00:07:13,440 --> 00:07:17,200
S1: But it doesn't matter. You and I and Dwarkesh and

107
00:07:17,200 --> 00:07:20,840
S1: Karpathy are going to be fine. So what? I'm worried

108
00:07:20,840 --> 00:07:25,560
S1: about everyone else. If AI only eats the absolute worst

109
00:07:25,720 --> 00:07:28,800
S1: bottom 50% of knowledge workers in the next 5 or

110
00:07:28,800 --> 00:07:33,680
S1: 10 years, we're still talking about hundreds of millions of jobs,

111
00:07:34,760 --> 00:07:38,440
S1: or even 25%. So basically, I just

112
00:07:38,440 --> 00:07:41,520
S1: did a bunch of research on this, and the total

113
00:07:41,520 --> 00:07:46,800
S1: number of knowledge workers worldwide is right around a billion.

114
00:07:47,680 --> 00:07:53,120
S1: 1 billion knowledge workers. So half is a big percentage.

115
00:07:53,120 --> 00:07:57,920
S1: That's 500 million people, but let's just say it's 10%.

116
00:07:57,920 --> 00:08:02,440
S1: Let's just say it's 25%. And we've already established that

117
00:08:02,440 --> 00:08:04,960
S1: these are the least competent people at the job. So no,

118
00:08:04,960 --> 00:08:08,120
S1: they won't be pivoting easily to another knowledge work position.

119
00:08:09,280 --> 00:08:13,160
S1: This is why I disagree with Karpathy on AGI. It's

120
00:08:13,160 --> 00:08:18,120
S1: not because he's wrong about LLMs having severe limitations. He's not,

121
00:08:18,560 --> 00:08:21,400
S1: but he's focused on the wrong thing. If the thing

122
00:08:21,400 --> 00:08:25,360
S1: we care about is AI's near-term and practical impact on humanity,

123
00:08:26,120 --> 00:08:28,600
S1: the thing to watch is not the pure LLM tech

124
00:08:28,760 --> 00:08:33,200
S1: or the specific technical limitations of RL in achieving continuous learning.

125
00:08:33,559 --> 00:08:37,559
S1: It's the trillions of dollars being invested in replacing the

126
00:08:37,559 --> 00:08:41,320
S1: worst performing human workers, who will likely never get better

127
00:08:41,320 --> 00:08:45,000
S1: than they already are. Those trillions are being spent on

128
00:08:45,000 --> 00:08:51,160
S1: scaffolding workarounds to LLM limitations that provide us just general

129
00:08:51,160 --> 00:08:55,720
S1: enough AGI to start replacing people, and from there it

130
00:08:55,720 --> 00:08:59,480
S1: will only improve. Given what we've seen in systems like

131
00:08:59,480 --> 00:09:06,400
S1: Claude Code, Cursor, and Codex that dramatically magnify model capability, while

132
00:09:06,400 --> 00:09:09,840
S1: the models continue to improve along their own axis as well,

133
00:09:10,080 --> 00:09:13,440
S1: do you really want to bet that good enough generality

134
00:09:13,840 --> 00:09:17,559
S1: won't be hit in the next couple of years? I

135
00:09:17,559 --> 00:09:20,520
S1: wouldn't take that bet. And this is why I think

136
00:09:20,559 --> 00:09:24,880
S1: AGI will arrive before 2028, like a 70% chance. A

137
00:09:24,920 --> 00:09:31,959
S1: rough guess; who really knows? And before 2030, I'm guessing 95%.

138
00:09:32,760 --> 00:09:36,040
S1: Not because all the stuff Karpathy is talking about will

139
00:09:36,040 --> 00:09:39,079
S1: be solved by then, but because it won't matter if

140
00:09:39,080 --> 00:09:43,440
S1: it's solved. With trillions of dollars in funding and trillions

141
00:09:43,440 --> 00:09:48,000
S1: of dollars in market opportunity, we're almost guaranteed to Claude

142
00:09:48,000 --> 00:09:51,720
S1: Code our way past a very low bar of millions

143
00:09:51,720 --> 00:09:53,400
S1: of barely there employees.