1
00:00:02,400 --> 00:00:07,040
Speaker 1: Hello and welcome to Better Offline. I'm your

2
00:00:07,040 --> 00:00:20,759
Speaker 1: host, Ed Zitron. Well, a lot of you have been

3
00:00:20,800 --> 00:00:24,320
Speaker 1: getting in touch. Yes, you're getting your DeepSeek episode.

4
00:00:24,320 --> 00:00:26,080
Speaker 1: In fact, this is the first of a two parter.

5
00:00:26,640 --> 00:00:28,520
Speaker 1: This will come out on Friday, which is when you're

6
00:00:28,560 --> 00:00:30,960
Speaker 1: listening to this, and then it'll follow up on Monday.

7
00:00:31,320 --> 00:00:36,040
Speaker 1: I apologize. I spent a lot of Monday writing this

8
00:00:36,159 --> 00:00:38,720
Speaker 1: and also learning about a lot of this stuff in

9
00:00:38,760 --> 00:00:41,479
Speaker 1: an attempt to distill it as best I could. This

10
00:00:41,560 --> 00:00:45,120
Speaker 1: situation is extremely weird, and it's developing, and I think

11
00:00:45,159 --> 00:00:46,839
Speaker 1: even when I put out this episode there will be

12
00:00:46,920 --> 00:00:49,320
Speaker 1: new parts of it that I have yet to really

13
00:00:49,360 --> 00:00:52,400
Speaker 1: get to. I will do my absolute best to explain

14
00:00:52,440 --> 00:00:55,240
Speaker 1: in these episodes both what is happening with DeepSeek,

15
00:00:55,640 --> 00:00:58,320
Speaker 1: what it means, what they've built, and what it's going

16
00:00:58,360 --> 00:01:01,880
Speaker 1: to do in the future. But let's begin. So, as

17
00:01:01,920 --> 00:01:05,200
Speaker 1: January came to a close, the entire generative AI industry

18
00:01:05,240 --> 00:01:07,880
Speaker 1: found itself in a kind of chaos. In short, the

19
00:01:07,920 --> 00:01:10,880
Speaker 1: recent AI bubble and in particular the hundreds of billions

20
00:01:10,920 --> 00:01:14,040
Speaker 1: of dollars being spent on it, hinged on this big

21
00:01:14,080 --> 00:01:16,720
Speaker 1: idea that we need bigger models, which are both trained

22
00:01:16,760 --> 00:01:20,319
Speaker 1: and run on bigger and even larger GPUs, almost entirely

23
00:01:20,319 --> 00:01:23,639
Speaker 1: sold by Nvidia, and in turn they're based in bigger

24
00:01:23,680 --> 00:01:27,880
Speaker 1: and bigger data centers owned by companies like Microsoft, Oracle, Amazon,

25
00:01:27,920 --> 00:01:31,320
Speaker 1: and Google. Now, there was also this expectation that this

26
00:01:31,360 --> 00:01:34,840
Speaker 1: would always be the case. Hubris within this industry is

27
00:01:34,920 --> 00:01:39,280
Speaker 1: kind of part of the whole deal, and generative AI

28
00:01:39,520 --> 00:01:41,360
Speaker 1: was always meant to be this way, at least for

29
00:01:41,400 --> 00:01:43,920
Speaker 1: the American developers. It was always meant to be energy

30
00:01:43,959 --> 00:01:47,280
Speaker 1: and compute hungry. Throwing entire zoos' worth of animals and

31
00:01:47,360 --> 00:01:50,000
Speaker 1: boiling lakes was necessary to do this. There was never

32
00:01:50,000 --> 00:01:54,040
Speaker 1: any other way to do it, and I thought, at

33
00:01:54,120 --> 00:01:56,200
Speaker 1: least I've thought for a while that this was because

34
00:01:56,280 --> 00:01:59,760
Speaker 1: they just they tried to make them more efficient, but

35
00:01:59,840 --> 00:02:03,080
Speaker 1: they couldn't. There was just something about transformer based architecture,

36
00:02:03,120 --> 00:02:05,639
Speaker 1: like the stuff that underpins ChatGPT, so the GPT

37
00:02:05,720 --> 00:02:09,160
Speaker 1: model under ChatGPT, either. It wasn't the case, though.

38
00:02:10,000 --> 00:02:12,960
Speaker 1: A Chinese artificial intelligence company that few people had really

39
00:02:13,040 --> 00:02:15,280
Speaker 1: heard of, called DeepSeek, came along a few weeks

40
00:02:15,280 --> 00:02:19,000
Speaker 1: ago with multiple models that aren't merely competitive with OpenAI's,

41
00:02:19,160 --> 00:02:22,919
Speaker 1: but actually undercut them in several meaningful ways. DeepSeek's

42
00:02:22,960 --> 00:02:25,680
Speaker 1: models are both open source, which means that their source

43
00:02:25,720 --> 00:02:29,680
Speaker 1: code and research is public, and they're significantly more efficient

44
00:02:29,680 --> 00:02:32,640
Speaker 1: as well, as much as thirty times cheaper to run

45
00:02:32,720 --> 00:02:34,880
Speaker 1: in the case of their reasoning model, R1, which

46
00:02:34,919 --> 00:02:38,560
Speaker 1: is competitive with OpenAI's o1, and fifteen or more

47
00:02:38,639 --> 00:02:43,200
Speaker 1: times more efficient than GPT-4o. It's actually kind

48
00:02:43,240 --> 00:02:45,200
Speaker 1: of crazy when you think about it, and as you're

49
00:02:45,240 --> 00:02:47,640
Speaker 1: going to hear, this whole thing has Jokerfied me all

50
00:02:47,680 --> 00:02:50,440
Speaker 1: over again. And what's crazy is that some of them

51
00:02:50,440 --> 00:02:52,799
Speaker 1: can be distilled, which I'll get to later, and run

52
00:02:52,840 --> 00:02:55,800
Speaker 1: on local devices like a laptop. It's kind of crazy,

53
00:02:56,160 --> 00:02:58,600
Speaker 1: and as a result, the markets have kind of panicked

54
00:02:58,639 --> 00:03:02,480
Speaker 1: because the entire narrative of the AI bubble has been

55
00:03:02,520 --> 00:03:04,800
Speaker 1: that these models have to be expensive because they are

56
00:03:04,840 --> 00:03:08,079
Speaker 1: the future, and that's why hyperscalers had to burn two

57
00:03:08,160 --> 00:03:12,040
Speaker 1: hundred billion dollars in capital expenditures for infrastructure to support

58
00:03:12,080 --> 00:03:15,919
Speaker 1: this wonderful boom, and specifically the ideas of OpenAI

59
00:03:16,000 --> 00:03:18,920
Speaker 1: and Anthropic. The idea that there was another way to

60
00:03:18,960 --> 00:03:20,840
Speaker 1: do this, that in fact, we didn't need to spend

61
00:03:20,840 --> 00:03:22,600
Speaker 1: all this money, and that maybe we could find a

62
00:03:22,639 --> 00:03:26,960
Speaker 1: more efficient way of doing it. Well, that would require

63
00:03:27,000 --> 00:03:29,239
Speaker 1: them to have another idea rather than throw as much

64
00:03:29,280 --> 00:03:32,000
Speaker 1: money at the problem as possible. Yeah, they just didn't

65
00:03:32,080 --> 00:03:35,200
Speaker 1: consider it, it turns out. And now along has

66
00:03:35,240 --> 00:03:38,920
Speaker 1: come this outsider that's upended the whole conventional understanding and

67
00:03:39,120 --> 00:03:43,440
Speaker 1: perhaps even dethroned a member of America's tech royalty, Sam Altman,

68
00:03:43,480 --> 00:03:46,000
Speaker 1: a man who has crafted, if not a cult of personality,

69
00:03:46,400 --> 00:03:49,440
Speaker 1: some sort of public image of an unassailable visionary that

70
00:03:49,480 --> 00:03:52,200
Speaker 1: will lead the vanguard in the biggest technological change since

71
00:03:52,200 --> 00:03:56,680
Speaker 1: the Internet. Yeah, he's wrong. He never was doing that.

72
00:03:56,960 --> 00:03:59,440
Speaker 1: I've been saying it for a while. He's never been

73
00:03:59,480 --> 00:04:02,880
Speaker 1: doing this. But DeepSeek isn't just an outsider now.

74
00:04:02,920 --> 00:04:05,440
Speaker 1: They are a company that's emerged as a side project

75
00:04:05,440 --> 00:04:08,720
Speaker 1: from a tiny, tiny Chinese hedge fund, at least by

76
00:04:08,760 --> 00:04:10,880
Speaker 1: the standards of hedge funds, like five point five billion

77
00:04:10,920 --> 00:04:14,200
Speaker 1: dollars in assets under management, and their founding team

78
00:04:14,240 --> 00:04:16,839
Speaker 1: has nowhere near the level of fame and celebrity or

79
00:04:16,839 --> 00:04:21,000
Speaker 1: even the accolades of Sam Altman. It's distinctly humiliating for

80
00:04:21,080 --> 00:04:23,920
Speaker 1: everyone involved that isn't DeepSeek. And on top

81
00:04:23,920 --> 00:04:27,479
Speaker 1: of all of that, DeepSeek's biggest, ugliest insult is

82
00:04:27,480 --> 00:04:30,360
Speaker 1: that its model, DeepSeek R1, is competitive, like

83
00:04:30,400 --> 00:04:33,799
Speaker 1: I said, with OpenAI's incredibly expensive o1 reasoning model,

84
00:04:33,960 --> 00:04:37,880
Speaker 1: yet significantly and I mean ninety six percent cheaper to run.

85
00:04:38,120 --> 00:04:40,120
Speaker 1: And it can even be run locally, like I said.

86
00:04:40,440 --> 00:04:42,520
Speaker 1: Speaking to a few developers I know, one was able

87
00:04:42,520 --> 00:04:44,679
Speaker 1: to run DeepSeek's R1 model on their twenty

88
00:04:44,760 --> 00:04:47,279
Speaker 1: twenty one MacBook Pro with an M1 chip. That

89
00:04:47,400 --> 00:04:51,480
Speaker 1: is a four-year-old computer, not a thirty thousand dollar

90
00:04:51,680 --> 00:04:55,440
Speaker 1: GPU inside. It's kind of crazy. Worse still, DeepSeek's

91
00:04:55,480 --> 00:04:58,159
Speaker 1: models are made freely available to use, with the source

92
00:04:58,160 --> 00:05:01,200
Speaker 1: code published under the MIT license, along with the

93
00:05:01,200 --> 00:05:04,119
Speaker 1: research on how they were made, although not the training data,

94
00:05:04,160 --> 00:05:06,159
Speaker 1: which makes some people say it's not really open source.

95
00:05:06,160 --> 00:05:08,280
Speaker 1: But for the sake of argument, I'm just going to

96
00:05:08,320 --> 00:05:11,080
Speaker 1: say open source. And this means by the way that

97
00:05:11,320 --> 00:05:14,120
Speaker 1: DeepSeek's models can be adapted and used for commercial

98
00:05:14,200 --> 00:05:17,599
Speaker 1: use without the need for royalties or fees. Anyone can

99
00:05:17,640 --> 00:05:20,880
Speaker 1: take this and build their own. It's kind of crazy.

100
00:05:21,400 --> 00:05:24,200
Speaker 1: By contrast, OpenAI is anything but open, and its

101
00:05:24,240 --> 00:05:26,840
Speaker 1: last LLM to be released under the MIT license was

102
00:05:26,880 --> 00:05:30,479
Speaker 1: twenty nineteen's GPT-2. No, no, wait, wait, shit,

103
00:05:30,680 --> 00:05:33,800
Speaker 1: let me correct that. DeepSeek's biggest, ugliest secret is

104
00:05:33,839 --> 00:05:36,880
Speaker 1: actually that it's obviously taking aim at every element of

105
00:05:36,920 --> 00:05:40,839
Speaker 1: OpenAI's portfolio. As the company was already dominating headlines,

106
00:05:40,880 --> 00:05:43,719
Speaker 1: this week it quietly dropped its Janus Pro seven B

107
00:05:43,839 --> 00:05:47,360
Speaker 1: image generation and analysis model, which the company says outperforms

108
00:05:47,360 --> 00:05:50,719
Speaker 1: both Stable Diffusion and OpenAI's DALL-E 3. And those

109
00:05:50,760 --> 00:05:53,480
Speaker 1: are, by the way, image generation things. So you type

110
00:05:53,480 --> 00:05:57,200
Speaker 1: in something like Garfield with boobs, and then out comes

111
00:05:57,200 --> 00:06:00,560
Speaker 1: a Garfield with juicy cans, and that's probably the first

112
00:06:00,560 --> 00:06:02,560
Speaker 1: time you hear that on the podcast, but probably not

113
00:06:02,640 --> 00:06:06,840
Speaker 1: the last. And as with its other code, DeepSeek

114
00:06:06,880 --> 00:06:09,320
Speaker 1: has made this freely available to both commercial and personal

115
00:06:09,400 --> 00:06:13,560
Speaker 1: users alike, whereas OpenAI has largely paywalled DALL-E 3.

116
00:06:13,640 --> 00:06:17,520
Speaker 1: This is really, it's a truly crazy situation. And it's

117
00:06:17,520 --> 00:06:20,520
Speaker 1: also this cynical, vulgar version of David and Goliath, where

118
00:06:20,520 --> 00:06:23,200
Speaker 1: a tech startup backed by a shadowy Chinese hedge fund

119
00:06:23,360 --> 00:06:26,520
Speaker 1: with eight billion dollars under management is somehow the plucky

120
00:06:26,560 --> 00:06:29,000
Speaker 1: upstart against the lumbering, lossy, oafish one hundred and

121
00:06:29,040 --> 00:06:33,000
Speaker 1: fifty billion dollar startup backed by multiple public tech companies

122
00:06:33,000 --> 00:06:36,599
Speaker 1: with a market capitalization of over three trillion dollars. I realize,

123
00:06:36,600 --> 00:06:39,119
Speaker 1: by the way, I said earlier five point five billion

124
00:06:39,160 --> 00:06:41,719
Speaker 1: dollars under management. This is why you check your notes

125
00:06:41,720 --> 00:06:44,040
Speaker 1: in advance. But I'm not cutting it. This is fresh.

126
00:06:44,120 --> 00:06:47,120
Speaker 1: I am inside a closet in New York. The content

127
00:06:47,320 --> 00:06:51,159
Speaker 1: must flow. Anyway, DeepSeek's V3 model, which is

128
00:06:51,160 --> 00:06:54,080
Speaker 1: comparable and competitive with both OpenAI's GPT-4o

129
00:06:54,160 --> 00:06:57,360
Speaker 1: and Anthropic's Claude Sonnet three point five models, which by

130
00:06:57,360 --> 00:07:00,480
Speaker 1: the way, has some reasoning features. As I said, it's

131
00:07:00,520 --> 00:07:03,839
Speaker 1: fifty three times cheaper to run the R one when

132
00:07:03,920 --> 00:07:08,040
Speaker 1: using the company's own cloud services, and as mentioned earlier,

133
00:07:08,080 --> 00:07:11,000
Speaker 1: said model is effectively free for anyone to use locally

134
00:07:11,080 --> 00:07:13,240
Speaker 1: or on their own cloud instances, and could be taken

135
00:07:13,280 --> 00:07:15,640
Speaker 1: by any commercial enterprise and turned into a product of

136
00:07:15,640 --> 00:07:19,680
Speaker 1: their own should they desire to, say, compete with OpenAI,

137
00:07:19,800 --> 00:07:24,400
Speaker 1: the loudest and most annoying startup of all time. In essence, DeepSeek,

138
00:07:24,440 --> 00:07:26,800
Speaker 1: and I'll get into its background and the concerns people

139
00:07:26,840 --> 00:07:29,600
Speaker 1: might have about its Chinese origins, released two models that

140
00:07:29,640 --> 00:07:32,640
Speaker 1: perform competitively and even beat models from both OpenAI

141
00:07:32,720 --> 00:07:35,760
Speaker 1: and Anthropic, undercut them in price, and then made them

142
00:07:35,800 --> 00:07:38,880
Speaker 1: open, undermining not just the economics of the biggest generative

143
00:07:38,880 --> 00:07:42,360
Speaker 1: AI companies, but laying bare exactly how they work. The

144
00:07:42,400 --> 00:07:47,240
Speaker 1: magic's gone. There's no more voodoo inside Sam Altman's soul. It's

145
00:07:47,320 --> 00:07:51,440
Speaker 1: all out there. And the last point is extremely important

146
00:07:51,480 --> 00:07:54,480
Speaker 1: when it comes to OpenAI's reasoning model, which specifically

147
00:07:54,600 --> 00:07:57,080
Speaker 1: hid its chain of thought for fear of these unsafe

148
00:07:57,120 --> 00:08:00,200
Speaker 1: thoughts that might manipulate the customer. And then they add

149
00:08:00,280 --> 00:08:02,600
Speaker 1: slightly under their breath that the actual reason they did

150
00:08:02,640 --> 00:08:05,720
Speaker 1: it was a competitive advantage. Now to explain what that means.

151
00:08:05,880 --> 00:08:09,640
Speaker 1: When you make a request with OpenAI's o1 model,

152
00:08:09,720 --> 00:08:11,720
Speaker 1: say, give me all the states with the letter R

153
00:08:11,840 --> 00:08:14,720
Speaker 1: in them, it actually shows you like the thinking. And

154
00:08:14,720 --> 00:08:16,880
Speaker 1: by the way, these things don't fucking think. They're they're

155
00:08:16,920 --> 00:08:19,880
Speaker 1: computer bullshit, like they don't think at all. But I'm

156
00:08:19,880 --> 00:08:22,320
Speaker 1: going to use it just for this so you see it.

157
00:08:22,360 --> 00:08:26,000
Speaker 1: Say okay, here are all the American states, which ones

158
00:08:26,040 --> 00:08:29,080
Speaker 1: have that letter? I'm checking all of those. It's effectively

159
00:08:29,120 --> 00:08:32,440
Speaker 1: having a large language model check a large language model. Now,

160
00:08:32,600 --> 00:08:35,280
Speaker 1: the thing is the steps they were showing you were

161
00:08:35,280 --> 00:08:37,560
Speaker 1: all cleaned up. They would look nice, they would be

162
00:08:37,600 --> 00:08:41,440
Speaker 1: formatted nicely. DeepSeek's chain of thought is completely laid bare,

163
00:08:42,080 --> 00:08:46,000
Speaker 1: which is very interesting because it really takes the wind

164
00:08:46,000 --> 00:08:48,800
Speaker 1: out of OpenAI's sails. And on top of that,

165
00:08:49,760 --> 00:08:52,320
Speaker 1: it allows you to see actually how these things think

166
00:08:52,400 --> 00:08:55,240
Speaker 1: through things, again not really thinking, but still you can

167
00:08:55,280 --> 00:08:57,959
Speaker 1: see things about how large language models work that these

168
00:08:57,960 --> 00:09:00,440
Speaker 1: companies didn't want you to have. On top of this,

169
00:09:00,840 --> 00:09:04,560
Speaker 1: OpenAI's o1 model has something even shittier to it,

170
00:09:04,600 --> 00:09:07,240
Speaker 1: which is these chain of thought things all cost money.

171
00:09:07,600 --> 00:09:10,880
Speaker 1: When you see it generate these thoughts, it's actually generating

172
00:09:10,920 --> 00:09:13,240
Speaker 1: more thoughts than you see because they're hiding the chain

173
00:09:13,280 --> 00:09:15,440
Speaker 1: of thought. So open ai is just charging you an

174
00:09:15,440 --> 00:09:18,200
Speaker 1: indeterminate amount of money, an insane amount of money, as

175
00:09:18,200 --> 00:09:21,360
Speaker 1: I'll get to later. But nevertheless, you don't know what

176
00:09:21,400 --> 00:09:23,920
Speaker 1: you're being charged for. You don't even know what's really

177
00:09:23,960 --> 00:09:26,720
Speaker 1: going on under the hood. Or you could use DeepSeek.

178
00:09:26,760 --> 00:09:30,439
Speaker 1: And let's be completely clear, by the way, OpenAI's

179
00:09:30,440 --> 00:09:34,319
Speaker 1: literal only competitive advantage against Meta and Anthropic was

180
00:09:34,400 --> 00:09:37,200
Speaker 1: its reasoning models, o1 and o3. And o3,

181
00:09:37,200 --> 00:09:38,839
Speaker 1: by the way, is currently in a research preview and

182
00:09:38,960 --> 00:09:41,920
Speaker 1: is mostly just more of the same. Although I mentioned

183
00:09:41,960 --> 00:09:44,480
Speaker 1: earlier in the show that Anthropic's Claude Sonnet three point

184
00:09:44,520 --> 00:09:48,480
Speaker 1: five has some reasoning features. They're comparatively more rudimentary than

185
00:09:48,520 --> 00:09:50,600
Speaker 1: those in o1 and o3 and, I'd argue,

186
00:09:50,679 --> 00:09:54,400
Speaker 1: R1, which is DeepSeek's model. In an AI context,

187
00:09:54,480 --> 00:09:56,839
Speaker 1: reasoning works by breaking down a prompt into a series

188
00:09:56,840 --> 00:10:00,480
Speaker 1: of different steps with considerations of different approaches. Like I

189
00:10:00,520 --> 00:10:03,439
Speaker 1: said earlier, effectively a large language model checking its own

190
00:10:03,480 --> 00:10:06,480
Speaker 1: homework with no thinking involved, because like I said, they

191
00:10:06,480 --> 00:10:09,520
Speaker 1: do not think or know things. And OpenAI rushed

192
00:10:09,559 --> 00:10:12,160
Speaker 1: to launch its o1 reasoning model last year because,

193
00:10:12,320 --> 00:10:15,720
Speaker 1: and I quote Fortune from last October, Sam Altman was

194
00:10:16,000 --> 00:10:19,320
Speaker 1: eager to prove to potential investors in the company's

195
00:10:19,400 --> 00:10:22,080
Speaker 1: latest funding round that OpenAI remains at the forefront

196
00:10:22,120 --> 00:10:25,480
Speaker 1: of AI development, and as I've noted in my newsletter

197
00:10:25,520 --> 00:10:28,400
Speaker 1: at the time, it was not particularly reliable, failing to

198
00:10:28,440 --> 00:10:31,040
Speaker 1: accurately count the number of times the letter R appeared

199
00:10:31,040 --> 00:10:33,800
Speaker 1: in the word strawberry, which was the code name for

200
00:10:34,240 --> 00:10:38,080
Speaker 1: o1. Very funny stuff. At this point, it's fairly obvious

201
00:10:38,120 --> 00:10:41,400
Speaker 1: that OpenAI wasn't anywhere near the forefront of AI development,

202
00:10:41,640 --> 00:10:44,440
Speaker 1: and now that its competitive advantage is effectively gone, there

203
00:10:44,440 --> 00:10:47,000
Speaker 1: are genuine doubts about what comes next for the company.

204
00:10:48,280 --> 00:10:51,000
Speaker 1: As I'll go into, there are many questionable parts of

205
00:10:51,000 --> 00:10:53,960
Speaker 1: DeepSeek's story: its funding, what GPUs it has, and how

206
00:10:54,040 --> 00:10:56,720
Speaker 1: much it actually spent training these models. But what we

207
00:10:56,840 --> 00:11:00,680
Speaker 1: definitively understand to be true is bad for OpenAI,

208
00:11:00,880 --> 00:11:03,480
Speaker 1: and I would argue every other large US tech firm

209
00:11:03,480 --> 00:11:06,160
Speaker 1: that's jumped onto the generative AI bandwagon in the past

210
00:11:06,160 --> 00:11:20,200
Speaker 1: few years. DeepSeek's models actually exist. They work, at

211
00:11:20,280 --> 00:11:22,880
Speaker 1: least by the standards of hallucination-prone LLMs that don't,

212
00:11:22,920 --> 00:11:25,959
Speaker 1: at the risk of repeating myself, know anything. They've been

213
00:11:26,000 --> 00:11:29,680
Speaker 1: independently verified to be competitive in performance, and they're magnitudes

214
00:11:29,800 --> 00:11:34,400
Speaker 1: cheaper in price than those from both hyperscalers, Google's Gemini, Meta's Llama,

215
00:11:34,440 --> 00:11:36,560
Speaker 1: Amazon Q and so on and so forth, and from

216
00:11:36,600 --> 00:11:41,000
Speaker 1: those released by OpenAI and Anthropic. DeepSeek's models

217
00:11:41,040 --> 00:11:44,200
Speaker 1: don't require massive new data centers. They run on GPUs

218
00:11:44,240 --> 00:11:47,040
Speaker 1: currently used to run services like ChatGPT, and even

219
00:11:47,080 --> 00:11:50,000
Speaker 1: work on more austere hardware. Nor do they require an

220
00:11:50,120 --> 00:11:53,840
Speaker 1: endless supply of bigger, faster Nvidia GPUs every single year

221
00:11:53,880 --> 00:11:57,920
Speaker 1: to progress. The entire AI bubble was inflated based on

222
00:11:57,960 --> 00:12:00,600
Speaker 1: the premise that these models were simply impossible to build

223
00:12:00,600 --> 00:12:04,000
Speaker 1: without burning massive amounts of cash, straining the power grid,

224
00:12:04,000 --> 00:12:07,400
Speaker 1: and blowing past emissions goals, and that these costs were

225
00:12:07,400 --> 00:12:11,560
Speaker 1: both necessary and really good because they'd lead to creating

226
00:12:11,600 --> 00:12:15,400
Speaker 1: powerful AI, something that's yet to happen. And it's kind

227
00:12:15,400 --> 00:12:18,319
Speaker 1: of obvious at this point that that wasn't true. Now

228
00:12:18,360 --> 00:12:22,600
Speaker 1: the markets are sitting around there asking a very reasonable question, Shit,

229
00:12:22,760 --> 00:12:27,400
Speaker 1: did we just waste two hundred billion dollars? Anyway, let's

230
00:12:27,400 --> 00:12:30,720
Speaker 1: get into the nitty-gritty. What is DeepSeek? First

231
00:12:30,760 --> 00:12:32,760
Speaker 1: of all, if you want to super deep dive into

232
00:12:32,800 --> 00:12:35,240
Speaker 1: what it is, I can't recommend VentureBeat's write-up enough.

233
00:12:35,280 --> 00:12:36,880
Speaker 1: I'll link to it in the show notes, as I

234
00:12:36,960 --> 00:12:39,800
Speaker 1: usually do. It's really good and it goes into a

235
00:12:39,800 --> 00:12:42,120
Speaker 1: lot more detail than I will. But here's the too

236
00:12:42,200 --> 00:12:44,880
Speaker 1: long, didn't read for you. DeepSeek is a spin

237
00:12:44,920 --> 00:12:47,520
Speaker 1: off from a Chinese hedge fund called High-Flyer Quant.

238
00:12:47,840 --> 00:12:50,079
Speaker 1: It's a relatively small and young company, and from its

239
00:12:50,120 --> 00:12:52,960
Speaker 1: inception it went big on algorithmic and AI driven trading.

240
00:12:53,320 --> 00:12:56,120
Speaker 1: Later it started building its own standalone chatbots, including

241
00:12:56,120 --> 00:12:59,440
Speaker 1: a ChatGPT equivalent for the Chinese market. This is

242
00:12:59,559 --> 00:13:01,760
Speaker 1: what we need, right Now, I'm sure some of you

243
00:13:01,800 --> 00:13:05,080
Speaker 1: will say, oh, well, who knows if that's really true. Sure,

244
00:13:05,520 --> 00:13:07,760
Speaker 1: I think that that's fair. I also think that there

245
00:13:07,760 --> 00:13:09,880
Speaker 1: are parts of Sam Altman's legend that we should question

246
00:13:09,960 --> 00:13:13,280
Speaker 1: as well. I think the circumstances under which Sam Altman

247
00:13:13,360 --> 00:13:16,880
Speaker 1: got made head of Y Combinator are extremely questionable. I'm

248
00:13:16,920 --> 00:13:19,240
Speaker 1: saying you can question DeepSeek, and indeed you should.

249
00:13:19,240 --> 00:13:21,920
Speaker 1: We should be more critical of these powerful companies, but

250
00:13:22,040 --> 00:13:24,520
Speaker 1: don't do it halfway. If we're going to be worried,

251
00:13:24,600 --> 00:13:28,360
Speaker 1: let's be worried about everyone. Now, DeepSeek did a few

252
00:13:28,360 --> 00:13:31,200
Speaker 1: things differently, like open sourcing its models, although it likely

253
00:13:31,240 --> 00:13:34,800
Speaker 1: built upon tech from other companies, like Meta's Llama and the

254
00:13:35,160 --> 00:13:38,680
Speaker 1: ML library PyTorch. To train its models, it secured over

255
00:13:38,760 --> 00:13:43,160
Speaker 1: ten thousand Nvidia GPUs right before the US imposed export restrictions,

256
00:13:43,160 --> 00:13:45,240
Speaker 1: which sounds like a lot, but it's a fraction of

257
00:13:45,240 --> 00:13:47,320
Speaker 1: what the big AI labs like Google, OpenAI, and

258
00:13:47,360 --> 00:13:50,480
Speaker 1: Anthropic have to play with. I think I've heard estimates

259
00:13:50,520 --> 00:13:53,120
Speaker 1: of like one hundred thousand to three hundred thousand each,

260
00:13:53,200 --> 00:13:56,199
Speaker 1: if not more. Now you've likely seen or heard that

261
00:13:56,280 --> 00:13:59,080
Speaker 1: DeepSeek trained its latest model for five point six

262
00:13:59,120 --> 00:14:01,520
Speaker 1: million dollars, as opposed to the insane amounts that I'll

263
00:14:01,520 --> 00:14:03,640
Speaker 1: get to later, and I want to be clear that

264
00:14:03,840 --> 00:14:06,760
Speaker 1: any and all mentions of this number are estimates. In fact,

265
00:14:06,800 --> 00:14:09,600
Speaker 1: the provenance of the five point five eight million

266
00:14:09,679 --> 00:14:12,000
Speaker 1: dollar number appears to be a citation of a post

267
00:14:12,040 --> 00:14:15,080
Speaker 1: made by an Nvidia engineer in an article from the

268
00:14:15,120 --> 00:14:18,199
Speaker 1: South China Morning Post, which links to another article from

269
00:14:18,240 --> 00:14:21,040
Speaker 1: the South China Morning Post which simply states that DeepSeek

270
00:14:21,080 --> 00:14:23,480
Speaker 1: V3 comes with six hundred and seventy one

271
00:14:23,480 --> 00:14:25,880
Speaker 1: billion parameters and was trained in around two months at

272
00:14:25,880 --> 00:14:28,400
Speaker 1: the cost of five point five eight million dollars with

273
00:14:28,480 --> 00:14:31,640
Speaker 1: no additional citations of any kind. So you should take

274
00:14:31,640 --> 00:14:36,320
Speaker 1: it with a pinch of salt. But it's not totally ludicrous. While

275
00:14:36,360 --> 00:14:38,920
Speaker 1: there are some that have estimated the cost, DeepSeek's

276
00:14:39,000 --> 00:14:41,840
Speaker 1: V3 model was allegedly trained using two thousand and forty

277
00:14:41,880 --> 00:14:45,440
Speaker 1: eight Nvidia H800 GPUs, according to its paper,

278
00:14:46,000 --> 00:14:48,840
Speaker 1: and Ben Thompson of Stratechery has made it clear that

279
00:14:48,880 --> 00:14:51,440
Speaker 1: the five point five million dollar number only covers the

280
00:14:51,480 --> 00:14:54,520
Speaker 1: literal training cost of the official training run, and this

281
00:14:54,640 --> 00:14:56,400
Speaker 1: is made fairly clear in the paper by the way

282
00:14:56,520 --> 00:14:59,080
Speaker 1: of V three, and that's the one that's competitive with

283
00:14:59,200 --> 00:15:02,400
Speaker 1: OpenAI's GPT-4o model, meaning that any costs

284
00:15:02,440 --> 00:15:04,680
Speaker 1: related to prior research or experiments on how to build

285
00:15:04,680 --> 00:15:07,800
Speaker 1: the model were left out. Now, big, big shout-out to

286
00:15:07,800 --> 00:15:10,400
Speaker 1: Minimaxer, the guy on Bluesky and Twitter. He's great.

287
00:15:10,960 --> 00:15:13,200
Speaker 1: He is wonderful, and also added that this is fairly

288
00:15:13,200 --> 00:15:16,240
Speaker 1: standard for the industry. Again, you choose how you feel

289
00:15:16,240 --> 00:15:17,840
Speaker 1: about this, but I want to give you the information.

290
00:15:19,080 --> 00:15:21,680
Speaker 1: And while it's safe to say that DeepSeek's models

291
00:15:21,680 --> 00:15:24,600
Speaker 1: are cheaper to train, the actual costs, especially as DeepSeek

292
00:15:24,600 --> 00:15:27,040
Speaker 1: doesn't share its training data, which some might argue

293
00:15:27,040 --> 00:15:29,440
Speaker 1: means its models are not really open source. As I said,

294
00:15:30,560 --> 00:15:33,400
Speaker 1: the numbers get a little harder to guess at. Thompson

295
00:15:33,440 --> 00:15:35,160
Speaker 1: notes that DeepSeek had to craft a bunch of

296
00:15:35,160 --> 00:15:38,560
Speaker 1: elegant workarounds to make the model perform, including writing code

297
00:15:38,560 --> 00:15:41,600
Speaker 1: that ultimately changed how GPUs actually communicated with each other.

298
00:15:41,960 --> 00:15:45,880
Speaker 1: This functionality isn't otherwise possible using Nvidia's developer tools. They

299
00:15:46,000 --> 00:15:47,760
Speaker 1: really had to get in there. It's kind of cool.

300
00:15:48,160 --> 00:15:50,720
Speaker 1: DeepSeek's models, V3 and R1, are more

301
00:15:50,760 --> 00:15:53,160
Speaker 1: efficient and as a result, cheaper to run, and can

302
00:15:53,200 --> 00:15:56,560
Speaker 1: be accessed via its API at prices that are astronomically

303
00:15:56,640 --> 00:16:00,240
Speaker 1: cheaper than OpenAI's. Deep Seek Chat, running Deep Seek's

304
00:16:00,360 --> 00:16:03,960
Speaker 1: GPT-4o-competitive V3 model, costs zero point

305
00:16:04,040 --> 00:16:07,640
Speaker 1: zero seven dollars per one million input tokens, as in

306
00:16:07,680 --> 00:16:11,080
Speaker 1: commands given to the model, and one dollar ten

307
00:16:11,480 --> 00:16:14,520
Speaker 1: per one million output tokens as in the resulting output

308
00:16:14,560 --> 00:16:16,800
Speaker 1: from the model. I know that these numbers kind of

309
00:16:16,840 --> 00:16:19,200
Speaker 1: like just sound like numbers. Like, maybe you don't

310
00:16:19,240 --> 00:16:21,160
Speaker 1: have context, so let me give you some. This is

311
00:16:21,200 --> 00:16:24,440
Speaker 1: a dramatic price drop from the two dollars fifty cents

312
00:16:24,480 --> 00:16:28,040
Speaker 1: per one million input tokens and ten dollars per one

313
00:16:28,080 --> 00:16:32,520
Speaker 1: million output tokens that OpenAI charges for GPT-4o.

314
00:16:33,200 --> 00:16:39,400
Speaker 1: This isn't just undercutting, this is a bunker buster. Now,

315
00:16:39,520 --> 00:16:41,560
Speaker 1: there is an aside that I'll kind of get into

316
00:16:41,560 --> 00:16:44,160
Speaker 1: a little bit later, in that you are using models

317
00:16:44,160 --> 00:16:46,440
Speaker 1: hosted in a country that you don't know, probably China.

318
00:16:46,760 --> 00:16:49,920
Speaker 1: There are data concerns. But again, you can put this

319
00:16:50,040 --> 00:16:52,800
Speaker 1: on your own server. You could put this in Google Cloud.

320
00:16:52,880 --> 00:16:55,880
Speaker 1: Both Microsoft and Google are apparently thinking about it now.

321
00:16:55,880 --> 00:16:58,560
Speaker 1: The Information reported that Google had added it to Google Cloud.

322
00:16:58,720 --> 00:17:01,520
Speaker 1: No they did not. They didn't do that. They allowed

323
00:17:01,520 --> 00:17:03,840
Speaker 1: you to connect Hugging Face. This is a whole bunch

324
00:17:03,840 --> 00:17:06,159
Speaker 1: of technical stuff that if you understand, you'll be like, yeah, Ed,

325
00:17:06,240 --> 00:17:10,639
Speaker 1: I know. Long story short, the hyperscalers are already bringing

326
00:17:10,680 --> 00:17:13,920
Speaker 1: Deep Seek out, and I'll get to why that's bad

327
00:17:14,200 --> 00:17:17,480
Speaker 1: later in detail. But it's also very funny. Now here's

328
00:17:17,520 --> 00:17:20,680
Speaker 1: something else that's funny. Deep Seek Reasoner, its reasoning model,

329
00:17:20,760 --> 00:17:23,600
Speaker 1: costs fifty-five cents per one million input tokens

330
00:17:23,680 --> 00:17:27,160
Speaker 1: and two dollars and nineteen cents per one million output tokens.

331
00:17:27,359 --> 00:17:31,360
Speaker 1: Now that sounds expensive. Maybe it is. Whatever, that's goddamn

332
00:17:31,480 --> 00:17:34,760
Speaker 1: nothing compared to the fifteen dollars per one million input

333
00:17:34,840 --> 00:17:37,600
Speaker 1: tokens and sixty dollars per one million output tokens of

334
00:17:37,640 --> 00:17:41,960
Speaker 1: OpenAI. Woof. If I'm Sam Altman, I'm shitting myself.

335
00:17:43,560 --> 00:17:45,800
Speaker 1: But there's an obvious "but" here. We do not know

336
00:17:45,840 --> 00:17:48,560
Speaker 1: where Deep Seek is hosting its models, who has access

337
00:17:48,560 --> 00:17:50,640
Speaker 1: to that data, or where that data is coming from

338
00:17:50,760 --> 00:17:52,960
Speaker 1: or going to. We don't know who funds Deep Seek

339
00:17:53,040 --> 00:17:55,240
Speaker 1: other than it's connected to High-Flyer, the hedge fund

340
00:17:55,240 --> 00:17:57,320
Speaker 1: that I mentioned earlier, that it split from in twenty

341
00:17:57,359 --> 00:17:59,760
Speaker 1: twenty-three. There are concerns that Deep Seek could be

342
00:17:59,760 --> 00:18:02,200
Speaker 1: state-funded, and that Deep Seek's low prices are a

343
00:18:02,280 --> 00:18:05,000
Speaker 1: kind of geopolitical weapon breaking the back of the generative

344
00:18:05,000 --> 00:18:08,440
Speaker 1: AI industry in America. I'm not really sure whether that's

345
00:18:08,480 --> 00:18:11,080
Speaker 1: the case or not. It's certainly true that China has

346
00:18:11,119 --> 00:18:13,720
Speaker 1: long treated AI as a strategic part of its national

347
00:18:13,760 --> 00:18:16,840
Speaker 1: industrial policy and is reported to help companies in sectors

348
00:18:16,840 --> 00:18:18,800
Speaker 1: where it wants to catch up with the Western world.

349
00:18:19,480 --> 00:18:21,879
Speaker 1: The Made in China twenty twenty-five initiative saw a

350
00:18:21,880 --> 00:18:25,399
Speaker 1: reported hundreds of billions of dollars provided to Chinese firms

351
00:18:25,440 --> 00:18:28,960
Speaker 1: working in industries like chipmaking, aviation, and, yeah, AI.

352
00:18:29,400 --> 00:18:32,760
Speaker 1: The extent of that support isn't exactly transparent, surprise, surprise,

353
00:18:33,000 --> 00:18:34,760
Speaker 1: and so it's not entirely out of the realm of

354
00:18:34,760 --> 00:18:37,800
Speaker 1: possibility that Deep Seek is also the recipient of state aid.

355
00:18:38,240 --> 00:18:39,760
Speaker 1: The good news is that we're going to find out

356
00:18:39,840 --> 00:18:43,720
Speaker 1: fairly quickly. American AI infrastructure company Groq is already bringing

357
00:18:43,760 --> 00:18:46,680
Speaker 1: Deep Seek's model online, meaning that we'll get at least

358
00:18:46,720 --> 00:18:49,760
Speaker 1: some sort of confirmation of whether these prices

359
00:18:49,760 --> 00:18:52,520
Speaker 1: are realistic or whether they're heavily subsidized by whoever it

360
00:18:52,560 --> 00:18:55,080
Speaker 1: is that backs Deep Seek. It's also true that Deep

361
00:18:55,080 --> 00:18:57,280
Speaker 1: Seek is owned in part by a hedge fund, which

362
00:18:57,359 --> 00:19:00,479
Speaker 1: likely isn't short of cash to pump into them. But

363
00:19:00,520 --> 00:19:03,439
Speaker 1: as an aside, given that OpenAI is the

364
00:19:03,520 --> 00:19:07,199
Speaker 1: beneficiary of billions of dollars of cloud compute credits and

365
00:19:07,240 --> 00:19:10,600
Speaker 1: gets reduced pricing for Microsoft's Azure cloud services to run

366
00:19:10,640 --> 00:19:13,560
Speaker 1: its actual models, it's a bit tough for them to

367
00:19:13,600 --> 00:19:16,439
Speaker 1: complain about a rival being subsidized by a larger entity with

368
00:19:16,480 --> 00:19:18,960
Speaker 1: the ability to absorb the costs of doing business should

369
00:19:19,040 --> 00:19:21,560
Speaker 1: that be the case. Same goes for Anthropic, by the way,

370
00:19:21,920 --> 00:19:24,359
Speaker 1: and yes, I know Microsoft isn't a state, but with

371
00:19:24,400 --> 00:19:26,960
Speaker 1: a market cap of three point two trillion dollars and

372
00:19:27,040 --> 00:19:30,320
Speaker 1: quarterly revenues larger than the combined GDPs of some EU

373
00:19:30,400 --> 00:19:33,000
Speaker 1: and NATO nations, it's kind of the next best thing.

374
00:19:33,640 --> 00:19:36,560
Speaker 1: But I digress. Whatever concerns there may be about malign

375
00:19:36,680 --> 00:19:40,000
Speaker 1: Chinese influence are bordering on irrelevant, outside of the low prices,

376
00:19:40,040 --> 00:19:43,080
Speaker 1: of course, offered by Deep Seek itself, and even that is

377
00:19:43,080 --> 00:19:46,080
Speaker 1: speculative at this point. Once these models are hosted elsewhere,

378
00:19:46,119 --> 00:19:48,240
Speaker 1: and once Deep Seek's methods, which I'll get to in

379
00:19:48,280 --> 00:19:50,760
Speaker 1: a little bit, are recreated, and by the way, that's

380
00:19:50,800 --> 00:19:52,840
Speaker 1: not really going to take very long. I believe we're

381
00:19:52,840 --> 00:19:54,880
Speaker 1: going to see that these prices are indicative of how

382
00:19:54,960 --> 00:20:11,280
Speaker 1: cheap these models are to run. So you might be wondering,

383
00:20:11,359 --> 00:20:13,480
Speaker 1: how the hell is this so much cheaper? And that's

384
00:20:13,480 --> 00:20:15,639
Speaker 1: a bloody good question. And because I'm me, I have

385
00:20:15,680 --> 00:20:19,520
Speaker 1: a hypothesis. I do not believe that the companies making

386
00:20:19,600 --> 00:20:22,520
Speaker 1: these foundation models, such as OpenAI and Anthropic, have

387
00:20:22,600 --> 00:20:25,639
Speaker 1: actually been incentivized to do more with less. And because

388
00:20:25,680 --> 00:20:29,359
Speaker 1: their chummy little relationships with hyperscalers like Amazon, Google and

389
00:20:29,400 --> 00:20:33,040
Speaker 1: Microsoft were focused almost entirely on making the biggest, most

390
00:20:33,119 --> 00:20:37,240
Speaker 1: hugest models possible, using the biggest, even huger chips. And

391
00:20:37,280 --> 00:20:39,960
Speaker 1: because the absence of profitability didn't stop them from raising

392
00:20:40,000 --> 00:20:43,200
Speaker 1: more money. Well, they've never had to be fucking efficient,

393
00:20:43,320 --> 00:20:46,520
Speaker 1: have they. They've never had to try. Maybe they should

394
00:20:46,520 --> 00:20:50,359
Speaker 1: buy less avocado fucking toast. Anyway, let me put it

395
00:20:50,359 --> 00:20:53,960
Speaker 1: in simpler terms. Imagine living on fifteen hundred dollars a month,

396
00:20:54,040 --> 00:20:55,639
Speaker 1: and then imagine how you'd live on one hundred and

397
00:20:55,680 --> 00:20:57,800
Speaker 1: fifty thousand dollars a month, and that you have to,

398
00:20:58,160 --> 00:21:00,479
Speaker 1: like Brewster's Millions, spend as much of it as

399
00:21:00,480 --> 00:21:04,240
Speaker 1: you can to complete a mission, a very simple mission. Live.

400
00:21:05,240 --> 00:21:08,320
Speaker 1: In the former example, your concern is survival. You have a

401
00:21:08,359 --> 00:21:10,280
Speaker 1: limited amount of money and must make it go as

402
00:21:10,280 --> 00:21:12,639
Speaker 1: far as possible, with real sacrifices to be made with

403
00:21:12,680 --> 00:21:14,880
Speaker 1: every dollar you spend. If you want to have fun,

404
00:21:15,080 --> 00:21:17,199
Speaker 1: you're going to have to eat less. Potentially all the

405
00:21:17,240 --> 00:21:19,240
Speaker 1: food you eat will have to be cheaper. You have

406
00:21:19,280 --> 00:21:21,640
Speaker 1: to live on a budget. You have to make decisions,

407
00:21:21,680 --> 00:21:24,399
Speaker 1: and indeed you might learn to cook at home. You

408
00:21:24,480 --> 00:21:27,520
Speaker 1: might walk more, you might do things that will help

409
00:21:27,560 --> 00:21:30,800
Speaker 1: you not spend all your money. In the latter example,

410
00:21:30,880 --> 00:21:32,720
Speaker 1: where you have one hundred and fifty thousand dollars a

411
00:21:32,760 --> 00:21:35,720
Speaker 1: month that you must spend, you're incentivized to splurge, to

412
00:21:35,800 --> 00:21:39,359
Speaker 1: lean into excess, to pursue this vague idea of living

413
00:21:39,400 --> 00:21:43,159
Speaker 1: your life. Your actions are dictated not by any existential threats,

414
00:21:43,240 --> 00:21:45,800
Speaker 1: or indeed any kind of future planning, but by whatever

415
00:21:45,840 --> 00:21:49,600
Speaker 1: you perceive to be an opportunity to live. OpenAI

416
00:21:49,720 --> 00:21:53,000
Speaker 1: and Anthropic are emblematic of what happens when survival takes

417
00:21:53,000 --> 00:21:56,240
Speaker 1: a back seat to living. They have been incentivized by

418
00:21:56,280 --> 00:21:59,600
Speaker 1: frothy venture capital and public markets desperate for the next

419
00:21:59,600 --> 00:22:02,600
Speaker 1: big thing, the next big growth, to build bigger

420
00:22:02,600 --> 00:22:05,480
Speaker 1: models and sell even bigger dreams, like Dario Amodei of

421
00:22:05,480 --> 00:22:08,800
Speaker 1: Anthropic saying that AI, and I quote, could surpass

422
00:22:08,840 --> 00:22:12,800
Speaker 1: almost all human beings at almost everything shortly after twenty

423
00:22:12,960 --> 00:22:16,000
Speaker 1: twenty-seven. I just want to take a fucking second. Journalists,

424
00:22:16,040 --> 00:22:18,720
Speaker 1: if you're listening to this, stop fucking quoting this bullshit.

425
00:22:19,440 --> 00:22:22,800
Speaker 1: Stop it. You're doing nothing. You are failing at your

426
00:22:22,840 --> 00:22:26,840
Speaker 1: goddamn job every single time you quote this bullshit, this nonsense.

427
00:22:27,119 --> 00:22:29,800
Speaker 1: Shortly after twenty twenty seven. What the fuck does that mean?

428
00:22:29,840 --> 00:22:33,640
Speaker 1: Twenty twenty eight, twenty twenty nine, twenty thirty, what does

429
00:22:34,000 --> 00:22:38,760
Speaker 1: surpassing humans at almost everything even mean? This shit doesn't work.

430
00:22:38,840 --> 00:22:42,040
Speaker 1: This shit is not good. Oh my god. Anyway, back

431
00:22:42,080 --> 00:22:45,399
Speaker 1: to the podcast. Calm down. Both OpenAI and

432
00:22:45,440 --> 00:22:48,280
Speaker 1: Anthropic have effectively lived their existence with the infinite money

433
00:22:48,320 --> 00:22:50,320
Speaker 1: cheat from The Sims. And I know some of you

434
00:22:50,440 --> 00:22:52,120
Speaker 1: might say, by the way, it's not infinite money,

435
00:22:52,119 --> 00:22:54,440
Speaker 1: you just add you go into the console. You get

436
00:22:54,440 --> 00:22:57,199
Speaker 1: my point. And both companies have been bleeding billions of

437
00:22:57,200 --> 00:22:59,760
Speaker 1: dollars a year after revenue, and that's, by the way,

438
00:23:00,040 --> 00:23:03,080
Speaker 1: making billions of dollars and then still losing billions is insane,

439
00:23:03,480 --> 00:23:06,200
Speaker 1: and they still operated as if money would never run

440
00:23:06,200 --> 00:23:09,560
Speaker 1: out, because it wouldn't. If they were actually

441
00:23:09,560 --> 00:23:11,919
Speaker 1: worried about that happening, they would have certainly tried to

442
00:23:11,920 --> 00:23:14,439
Speaker 1: do what Deep Seek has done, except they didn't have

443
00:23:14,560 --> 00:23:16,720
Speaker 1: to because both of them had the endless cash and

444
00:23:16,760 --> 00:23:20,720
Speaker 1: access to GPUs from either Microsoft, Amazon or Google. And

445
00:23:21,000 --> 00:23:23,480
Speaker 1: the Stargate thing is just... I will mention it later,

446
00:23:23,680 --> 00:23:26,280
Speaker 1: just long story short. They're not going to put five

447
00:23:26,359 --> 00:23:29,000
Speaker 1: hundred billion dollars into the... it was up to five

448
00:23:29,040 --> 00:23:32,800
Speaker 1: hundred bill... I'm so tired of this shit. OpenAI and

449
00:23:32,800 --> 00:23:35,359
Speaker 1: Anthropic have never been made to sweat, unlike me in

450
00:23:35,400 --> 00:23:38,320
Speaker 1: this closet where I'm recording this. And they've received endless

451
00:23:38,320 --> 00:23:40,600
Speaker 1: amounts of free marketing from a tech and business media

452
00:23:40,640 --> 00:23:44,320
Speaker 1: happy to print whatever vapid bullshit they spout, and it's

453
00:23:44,400 --> 00:23:48,080
Speaker 1: just very frustrating. They've raised money at will. Anthropic,

454
00:23:48,119 --> 00:23:50,560
Speaker 1: by the way, is currently raising another two billion dollars,

455
00:23:50,680 --> 00:23:52,840
Speaker 1: valuing the company at sixty billion dollars. And this was

456
00:23:52,920 --> 00:23:55,600
Speaker 1: I think happening while Deep Seek was going on, which

457
00:23:55,640 --> 00:23:58,040
Speaker 1: is really funny. And they've done all of this off

458
00:23:58,040 --> 00:24:00,800
Speaker 1: of a narrative of: we need more money than

459
00:24:00,800 --> 00:24:04,080
Speaker 1: any company has ever needed, ever, because the things we're

460
00:24:04,080 --> 00:24:08,800
Speaker 1: doing have to cost this much. There is no other way.

461
00:24:09,000 --> 00:24:12,159
Speaker 1: You must give us more money. My name is Sam Altman.

462
00:24:12,200 --> 00:24:14,640
Speaker 1: I need more money than has ever been made from

463
00:24:14,680 --> 00:24:17,320
Speaker 1: my huge, beautiful company that sucks and needs money to

464
00:24:17,359 --> 00:24:20,440
Speaker 1: train it. Help me, please, My big, beautiful sick company

465
00:24:20,480 --> 00:24:22,520
Speaker 1: is dying, but the best and most important company of

466
00:24:22,520 --> 00:24:28,119
Speaker 1: all time. It's also normal. Now, do I think that

467
00:24:28,200 --> 00:24:30,399
Speaker 1: they were aware that there were methods to make their

468
00:24:30,440 --> 00:24:34,280
Speaker 1: models more efficient? Sure. OpenAI tried and failed in

469
00:24:34,320 --> 00:24:36,560
Speaker 1: twenty twenty three to deliver a more efficient model to

470
00:24:36,600 --> 00:24:42,600
Speaker 1: Microsoft called Arrakis. I'm sure there are teams at both

471
00:24:42,600 --> 00:24:45,920
Speaker 1: Anthropic and OpenAI that are specifically dedicated to making things

472
00:24:46,040 --> 00:24:48,560
Speaker 1: kind of more efficient. But they didn't have to do it,

473
00:24:48,600 --> 00:24:51,639
Speaker 1: and so they didn't. And as I've written before in

474
00:24:51,680 --> 00:24:54,400
Speaker 1: my newsletter and argued on this very podcast, OpenAI

475
00:24:54,520 --> 00:24:56,880
Speaker 1: simply burns money, has been allowed to burn money,

476
00:24:56,880 --> 00:24:58,879
Speaker 1: and up until recently likely would have been allowed to

477
00:24:58,880 --> 00:25:02,040
Speaker 1: burn even more money because everybody, all of the American

478
00:25:02,080 --> 00:25:04,639
Speaker 1: model developers, appeared to agree that the only way to

479
00:25:04,640 --> 00:25:07,280
Speaker 1: develop large language models was to make them as big

480
00:25:07,400 --> 00:25:10,840
Speaker 1: as humanly possible and work out troublesome stuff like making

481
00:25:10,840 --> 00:25:14,240
Speaker 1: them profitable or turning them into a useful thing later,

482
00:25:14,560 --> 00:25:17,840
Speaker 1: which is I presume when AGI happens, a thing that

483
00:25:17,840 --> 00:25:20,679
Speaker 1: they're still in the process of defining, let alone doing.

484
00:25:21,760 --> 00:25:23,640
Speaker 1: Deep Seek, on the other hand, had to work out

485
00:25:23,640 --> 00:25:25,600
Speaker 1: a way to make its own large language models within

486
00:25:25,640 --> 00:25:28,000
Speaker 1: the constraints of the hamstrung Nvidia chips that can

487
00:25:28,040 --> 00:25:31,080
Speaker 1: be legally sold to China. While there's a whole cottage

488
00:25:31,119 --> 00:25:34,160
Speaker 1: industry of selling chips in China, using resellers and other

489
00:25:34,200 --> 00:25:37,280
Speaker 1: parties to get restricted silicon into the country, the entire

490
00:25:37,320 --> 00:25:40,040
Speaker 1: way in which Deep Seek went about developing its models

491
00:25:40,160 --> 00:25:44,240
Speaker 1: suggests that it was working around very specific memory bandwidth constraints,

492
00:25:44,560 --> 00:25:46,320
Speaker 1: meaning the amount of data that could be fed

493
00:25:46,320 --> 00:25:48,640
Speaker 1: into and out of the chips.

494
00:25:48,680 --> 00:25:51,720
Speaker 1: In essence, doing more with less wasn't something it chose,

495
00:25:51,720 --> 00:25:55,000
Speaker 1: but something it had to do. I've touched already

496
00:25:55,000 --> 00:25:57,160
Speaker 1: on the technical how of these models in greater depth,

497
00:25:57,200 --> 00:25:59,200
Speaker 1: and you can really read about that in my newsletter,

498
00:25:59,240 --> 00:26:01,359
Speaker 1: and you can go to wheresyoured dot

499
00:26:01,480 --> 00:26:03,200
Speaker 1: at; it's at the end of the episode. But I'll

500
00:26:03,200 --> 00:26:05,560
Speaker 1: also have show notes to articles like Ben Thompson's

501
00:26:05,520 --> 00:26:08,960
Speaker 1: at Stratechery, because there are lots of things to read here.

502
00:26:09,000 --> 00:26:11,160
Speaker 1: I know there are some really technical listeners, and I'm

503
00:26:11,160 --> 00:26:13,800
Speaker 1: sure you're gonna flame me in my emails. Please go

504
00:26:13,840 --> 00:26:16,080
Speaker 1: and read it. I'm not wrong. I've checked with a

505
00:26:16,080 --> 00:26:18,920
Speaker 1: lot of people too, and by the way, all of

506
00:26:18,920 --> 00:26:22,399
Speaker 1: this austerity stuff seems to have worked. There's also the

507
00:26:22,440 --> 00:26:26,840
Speaker 1: training data situation and another mea culpa. I've previously discussed

508
00:26:26,880 --> 00:26:29,760
Speaker 1: the concept of model collapse and how feeding synthetic data,

509
00:26:29,800 --> 00:26:32,639
Speaker 1: which is training data created by a generative model, into

510
00:26:32,680 --> 00:26:35,440
Speaker 1: another model, could end up teaching it bad habits, which

511
00:26:35,440 --> 00:26:37,800
Speaker 1: in turn would destroy the model. But it seems that

512
00:26:37,840 --> 00:26:41,240
Speaker 1: Deep Seek has succeeded in training its models using generative data,

513
00:26:41,760 --> 00:26:45,919
Speaker 1: specifically, though, and I'm quoting GeekWire's Jon Turow, for subjects like mathematics

514
00:26:45,960 --> 00:26:49,000
Speaker 1: where correctness is unambiguous, and using, and I quote again,

515
00:26:49,240 --> 00:26:52,640
Speaker 1: highly efficient reward functions that could identify which new

516
00:26:52,680 --> 00:26:55,959
Speaker 1: training examples would actually improve the model, avoiding wasted compute

517
00:26:55,960 --> 00:26:59,000
Speaker 1: on redundant data, and it seems to have worked, though

518
00:26:59,040 --> 00:27:02,080
Speaker 1: model collapse may still be a possibility. This approach, extremely

519
00:27:02,119 --> 00:27:04,720
Speaker 1: precise use of synthetic data, is in line with some

520
00:27:04,760 --> 00:27:07,399
Speaker 1: of the defenses against model collapse I've heard from LLM

521
00:27:07,440 --> 00:27:10,600
Speaker 1: developers I've talked to. This is also a situation where

522
00:27:10,640 --> 00:27:13,440
Speaker 1: we don't know the exact training data, and it doesn't

523
00:27:13,480 --> 00:27:16,320
Speaker 1: negate any of the previous points I've made about model collapse.

524
00:27:17,119 --> 00:27:20,520
Speaker 1: Now we'll see what happens there. But synthetic data might

525
00:27:20,560 --> 00:27:22,359
Speaker 1: work where the output is something that you could figure

526
00:27:22,359 --> 00:27:24,800
Speaker 1: out using a calculator. But when you get into anything

527
00:27:24,840 --> 00:27:26,840
Speaker 1: a bit more fuzzy, like written text or anything with

528
00:27:26,880 --> 00:27:30,680
Speaker 1: an element of analysis, you'll likely encounter some unhappy side effects.

529
00:27:30,840 --> 00:27:32,760
Speaker 1: But I don't know if that's really going to change

530
00:27:32,760 --> 00:27:35,679
Speaker 1: how good these things are. There's also a little scuttlebutt

531
00:27:35,680 --> 00:27:38,840
Speaker 1: about where Deep Seek got its data. Ben Thompson

532
00:27:38,880 --> 00:27:42,080
Speaker 1: at Stratechery suggests that Deep Seek's models are potentially distilling

533
00:27:42,160 --> 00:27:45,040
Speaker 1: other models' outputs, by which I mean having another model,

534
00:27:45,080 --> 00:27:48,520
Speaker 1: say Meta's Llama or OpenAI's GPT-4o, which

535
00:27:48,560 --> 00:27:51,119
Speaker 1: is why Deep Seek identified itself as ChatGPT at

536
00:27:51,160 --> 00:27:54,240
Speaker 1: one point, spit out outputs specifically to train parts of

537
00:27:54,240 --> 00:27:57,600
Speaker 1: Deep Seek. This obviously violates the terms of service of

538
00:27:57,640 --> 00:28:00,280
Speaker 1: these tools, as OpenAI and its rivals would much rather

539
00:28:00,400 --> 00:28:03,240
Speaker 1: you not use their technology to create their next rival.

540
00:28:03,800 --> 00:28:07,480
Speaker 1: And OpenAI, by the way, has recently reportedly found

541
00:28:07,480 --> 00:28:10,880
Speaker 1: evidence that Deep Seek used OpenAI's models to train

542
00:28:10,960 --> 00:28:14,160
Speaker 1: its rival. And this is from the Financial Times, although

543
00:28:14,200 --> 00:28:16,800
Speaker 1: it failed to make any formal allegations, but it did

544
00:28:16,880 --> 00:28:19,520
Speaker 1: say that using ChatGPT to train a competing model

545
00:28:19,640 --> 00:28:22,920
Speaker 1: violates its terms of service, and David Sacks, the investor

546
00:28:22,920 --> 00:28:25,920
Speaker 1: and Trump administration AI and crypto czar, says it's possible

547
00:28:25,960 --> 00:28:29,320
Speaker 1: that this occurred, although he failed to provide evidence. I

548
00:28:29,440 --> 00:28:31,760
Speaker 1: just want to say, how fucking funny it is that

549
00:28:31,920 --> 00:28:36,000
Speaker 1: OpenAI is going, waah, waah, you're stealing my stuff.

550
00:28:36,040 --> 00:28:41,440
Speaker 1: Don't steal my things. Waah. Fucking coward, pansy bastard bitches.

551
00:28:41,560 --> 00:28:44,880
Speaker 1: Fucking hell, what a bunch of whiny babies.

552
00:28:44,960 --> 00:28:49,400
Speaker 1: Oh no, my plagiarism machine got plagiarized. Waah. Kiss my

553
00:28:49,760 --> 00:28:54,160
Speaker 1: entire asshole, Sam Altman, you little worm, you fucking embarrassment

554
00:28:54,200 --> 00:28:56,640
Speaker 1: to Silicon Valley. You should be ashamed of yourself for

555
00:28:56,680 --> 00:29:01,120
Speaker 1: many reasons, but so much this, though. Waah. Yeah, oh no,

556
00:29:01,320 --> 00:29:03,800
Speaker 1: you stole from my plagiarism machine that

557
00:29:03,880 --> 00:29:07,200
Speaker 1: requires me to steal from literally every artist and author

558
00:29:07,240 --> 00:29:09,640
Speaker 1: on the Internet. The thing where we went on YouTube

559
00:29:09,680 --> 00:29:12,760
Speaker 1: and transcribed everything and fed it into the machine. That's

560
00:29:12,800 --> 00:29:15,680
Speaker 1: that's not stealing, that's good. But you using our model

561
00:29:15,720 --> 00:29:19,200
Speaker 1: to generate answers. That's just not fair. What a bunch

562
00:29:19,240 --> 00:29:22,160
Speaker 1: of babies, you guys. Sam Altman's worth billions of dollars.

563
00:29:22,240 --> 00:29:24,880
Speaker 1: He has a five million dollar car. Cry more, you

564
00:29:24,960 --> 00:29:29,080
Speaker 1: little worm. Personally, I genuinely want OpenAI to point

565
00:29:29,080 --> 00:29:31,600
Speaker 1: a finger at Deep Seek and accuse it of IP theft,

566
00:29:32,080 --> 00:29:35,280
Speaker 1: mostly for the yuks, but also for the hypocrisy factor.

567
00:29:35,600 --> 00:29:38,440
Speaker 1: This is a company that, as I've just very cleanly said,

568
00:29:38,600 --> 00:29:42,240
Speaker 1: exists purely from the wholesale industrial larceny of content produced

569
00:29:42,240 --> 00:29:46,200
Speaker 1: by literally fucking everyone. And now they're crying, waah.

570
00:29:47,040 --> 00:29:49,920
Speaker 1: I'm Sam Altman. I'm a big baby. I've filled my

571
00:29:50,080 --> 00:29:54,280
Speaker 1: diaper because someone stole from my plagiarism machine. Kiss my ass,

572
00:29:55,000 --> 00:29:58,920
Speaker 1: Kiss my ass. These companies haven't got shit. OpenAI

573
00:29:59,040 --> 00:30:01,840
Speaker 1: doesn't have shit. They don't have anything. They don't

574
00:30:01,880 --> 00:30:05,360
Speaker 1: have a next product. Without reasoning, they haven't got anything.

575
00:30:05,600 --> 00:30:10,360
Speaker 1: And now they don't have that disgusting justification for overspending,

576
00:30:10,400 --> 00:30:14,160
Speaker 1: the fat, ugly American startup culture of spending as much

577
00:30:14,200 --> 00:30:17,080
Speaker 1: as you can to build America's next top monopoly. They

578
00:30:17,080 --> 00:30:20,720
Speaker 1: should be fucking ashamed of themselves. They shouldn't be billionaires,

579
00:30:20,760 --> 00:30:23,880
Speaker 1: they should be poverty stricken. They should have to pay

580
00:30:23,880 --> 00:30:27,680
Speaker 1: everyone they stole from. And it's just, it sickens me

581
00:30:27,800 --> 00:30:31,120
Speaker 1: seeing the reaction from some people on this, seeing the Sinophobia,

582
00:30:31,280 --> 00:30:33,920
Speaker 1: but seeing this level of defensiveness of a company like

583
00:30:33,960 --> 00:30:37,560
Speaker 1: OpenAI or Anthropic. And as I'll get into next episode,

584
00:30:37,640 --> 00:30:40,200
Speaker 1: we are really running out of time here, and I

585
00:30:40,240 --> 00:30:43,960
Speaker 1: think Deep Seek is really I think it could be

586
00:30:44,080 --> 00:30:47,360
Speaker 1: really the end of days for these companies. I don't

587
00:30:47,360 --> 00:30:50,000
Speaker 1: know how much they've got left time wise, or even

588
00:30:50,040 --> 00:30:53,120
Speaker 1: money wise, and I'm not sure how they even raise money.

589
00:30:53,200 --> 00:30:55,240
Speaker 1: But in the next episode, I'm going to deep dive

590
00:30:55,280 --> 00:30:58,040
Speaker 1: into Deep Seek and I'll tell you how they sent

591
00:30:58,120 --> 00:31:00,120
Speaker 1: the US tech market into a panic and what it

592
00:31:00,120 --> 00:31:03,760
Speaker 1: actually means for the future of OpenAI, Anthropic, and the hyperscalers

593
00:31:03,840 --> 00:31:06,920
Speaker 1: backing them. This has been a crazy few days.

594
00:31:07,400 --> 00:31:10,480
Speaker 1: I hope this has helped, and on Monday you'll find

595
00:31:10,480 --> 00:31:13,800
Speaker 1: out more. Thank you so much for listening. The support

596
00:31:13,840 --> 00:31:15,520
Speaker 1: I've got for the show has been incredible, and the

597
00:31:15,560 --> 00:31:19,880
Speaker 1: emails I've got about Deep Seek... I've been trying, okay?

598
00:31:19,920 --> 00:31:22,240
Speaker 1: I've really been trying. This is the fastest I could do it.

599
00:31:22,880 --> 00:31:24,520
Speaker 1: But I'm so happy to do this show, and I'm

600
00:31:24,520 --> 00:31:34,680
Speaker 1: so grateful for all of you. Thank you for listening

601
00:31:34,720 --> 00:31:37,360
Speaker 1: to Better Offline. The editor and composer of the Better

602
00:31:37,400 --> 00:31:40,400
Speaker 1: Offline theme song is Matt Osowski. You can check out more

603
00:31:40,440 --> 00:31:43,920
Speaker 1: of his music and audio projects at mattosowski dot com,

604
00:31:44,040 --> 00:31:48,960
Speaker 1: M A T T O S O W S K I dot com.

605
00:31:49,000 --> 00:31:51,320
Speaker 1: You can email me at E Z at Better Offline dot

606
00:31:51,360 --> 00:31:53,560
Speaker 1: com or visit Better Offline dot com to find more

607
00:31:53,600 --> 00:31:57,000
Speaker 1: podcast links and of course, my newsletter. I also really

608
00:31:57,000 --> 00:31:59,320
Speaker 1: recommend you go to chat dot wheresyoured dot at

609
00:31:59,320 --> 00:32:01,760
Speaker 1: to visit the Discord, and go to r slash Better

610
00:32:01,800 --> 00:32:04,960
Speaker 1: Offline to check out our Reddit. Thank you so much

611
00:32:05,000 --> 00:32:08,840
Speaker 1: for listening. Better Offline is a production of Cool Zone Media.

612
00:32:08,960 --> 00:32:12,360
Speaker 1: For more from Cool Zone Media, visit our website, coolzonemedia

613
00:32:12,400 --> 00:32:15,200
Speaker 1: dot com, or check us out on the iHeartRadio app,

614
00:32:15,280 --> 00:32:17,720
Speaker 1: Apple Podcasts, or wherever you get your podcasts.