1
00:00:02,480 --> 00:00:07,040
Speaker 1: Bloomberg Audio Studios, Podcasts, Radio News.

2
00:00:18,040 --> 00:00:21,919
Speaker 2: Hello and welcome to another episode of the Odd Lots podcast.

3
00:00:22,000 --> 00:00:24,480
Speaker 3: I'm Joe Weisenthal and I'm Tracy Alloway.

4
00:00:24,560 --> 00:00:26,200
Speaker 2: Tracy, the DeepSeek sell-off.

5
00:00:27,200 --> 00:00:30,240
Speaker 3: That's right, it's pretty deep. Has anyone made that joke yet?

6
00:00:30,200 --> 00:00:31,080
Speaker 1: We're in DeepSeek?

7
00:00:31,240 --> 00:00:33,600
Speaker 2: Yeah, I don't think anyone's made that joke.

8
00:00:33,880 --> 00:00:36,760
Speaker 3: I will say, like, you know, it's bad in markets

9
00:00:36,800 --> 00:00:40,000
Speaker 3: when all the headlines are about standard deviation, yes, right,

10
00:00:40,040 --> 00:00:43,240
Speaker 3: And then you know it's really bad when you see

11
00:00:43,280 --> 00:00:46,000
Speaker 3: people start to say it's not a crash, it's a

12
00:00:46,040 --> 00:00:49,159
Speaker 3: healthy correction. Yes, that's the real cope.

13
00:00:49,360 --> 00:00:52,280
Speaker 2: But just for, like, real scene setting: you know, we've

14
00:00:52,320 --> 00:00:55,520
Speaker 2: done some very timely interviews about tech concentration in the

15
00:00:55,560 --> 00:00:57,840
Speaker 2: market lately and how so much of the market is

16
00:00:57,880 --> 00:01:02,000
Speaker 2: this big concentrated bet on AI, et cetera. Anyway, on Monday,

17
00:01:02,080 --> 00:01:04,120
Speaker 2: I think people will be listening to this on Tuesday,

18
00:01:04,560 --> 00:01:07,720
Speaker 2: markets got clobbered. Nvidia, one of the big winners,

19
00:01:07,800 --> 00:01:10,120
Speaker 2: as of the time I'm talking about this, three thirty

20
00:01:10,120 --> 00:01:14,000
Speaker 2: p.m. on Monday, down seventeen percent. We're talking major losses

21
00:01:14,080 --> 00:01:17,640
Speaker 2: really across the tech complex. Basically, it seems to

22
00:01:17,680 --> 00:01:21,839
Speaker 2: be catalyzed by the introduction of this high performance, open

23
00:01:21,920 --> 00:01:26,199
Speaker 2: source Chinese AI model called DeepSeek. It was born,

24
00:01:26,319 --> 00:01:28,360
Speaker 2: from what we know, out of a hedge fund. Apparently

25
00:01:28,440 --> 00:01:30,959
Speaker 2: it was very cheap to train, very cheap to build.

26
00:01:31,440 --> 00:01:34,640
Speaker 2: You know, the tech constraints at this point didn't seem

27
00:01:34,680 --> 00:01:36,000
Speaker 2: to be much of a problem. They may be a

28
00:01:36,000 --> 00:01:38,720
Speaker 2: problem going forward. But yes, here is the entire

29
00:01:38,760 --> 00:01:41,080
Speaker 2: market betting on a lot of companies making AI and

30
00:01:41,160 --> 00:01:45,120
Speaker 2: there are now concerns about, of course, a cheap Chinese competitor.

31
00:01:45,360 --> 00:01:48,520
Speaker 3: I just realized, Joe, this is actually your fault, isn't it?

32
00:01:49,040 --> 00:01:51,160
Speaker 3: This last week you wrote that you were a

33
00:01:51,200 --> 00:01:55,360
Speaker 3: DeepSeek AI bro, and look what you've done. You've wiped five

34
00:01:55,480 --> 00:01:58,240
Speaker 3: hundred and sixty billion dollars off of Nvidia's market cap.

35
00:01:58,440 --> 00:02:01,440
Speaker 2: Yeah, might be. Anyway, one of the interesting

36
00:02:01,520 --> 00:02:03,480
Speaker 2: questions though, is that this was sort of announced in

37
00:02:03,520 --> 00:02:06,920
Speaker 2: a white paper in December. Why did it take

38
00:02:07,040 --> 00:02:09,799
Speaker 2: until January twenty-seventh for it to really freak people out?

39
00:02:10,040 --> 00:02:13,200
Speaker 2: Big questions. Anyway, let's jump right into it. We really

40
00:02:13,240 --> 00:02:16,240
Speaker 2: do have the perfect guest, someone who was here for

41
00:02:16,280 --> 00:02:19,640
Speaker 2: our Election Eve special, a guy who knows all about

42
00:02:20,120 --> 00:02:23,320
Speaker 2: numbers and AI and quant stuff, and he writes a

43
00:02:23,360 --> 00:02:26,680
Speaker 2: substack that has become for me a daily absolute must

44
00:02:26,720 --> 00:02:29,919
Speaker 2: read where he writes an extraordinary amount. I don't even

45
00:02:29,960 --> 00:02:31,560
Speaker 2: know how he writes so much on a given day.

46
00:02:31,840 --> 00:02:34,160
Speaker 2: We're going to be speaking with Zvi Mowshowitz. He is

47
00:02:34,240 --> 00:02:37,280
Speaker 2: the author of the Don't Worry About the Vase blog

48
00:02:37,400 --> 00:02:41,480
Speaker 2: or Substack. Zvi, you're also a DeepSeek AI bro. You've

49
00:02:41,520 --> 00:02:42,359
Speaker 2: switched to using that.

50
00:02:43,000 --> 00:02:46,519
Speaker 1: So I use a wide variety of different AIs. So

51
00:02:46,639 --> 00:02:49,600
Speaker 1: I will use Claude from Anthropic, I will use o1 from

52
00:02:49,680 --> 00:02:53,600
Speaker 1: ChatGPT, from OpenAI. I'll use Gemini sometimes, and

53
00:02:53,639 --> 00:02:56,520
Speaker 1: I'll use Perplexity for web searches. But yeah, I'll use

54
00:02:56,600 --> 00:02:59,960
Speaker 1: R1, the new DeepSeek model, for certain types of

55
00:03:00,160 --> 00:03:02,680
Speaker 1: queries where I want to see how it thinks and

56
00:03:02,840 --> 00:03:06,000
Speaker 1: like see the logic laid out, and then I can judge,

57
00:03:06,000 --> 00:03:07,760
Speaker 1: like did that make sense? Do I agree with that?

58
00:03:08,480 --> 00:03:10,880
Speaker 3: So one of the things that seems to be freaking

59
00:03:10,960 --> 00:03:14,799
Speaker 3: people out as well as the market is that purportedly

60
00:03:15,480 --> 00:03:19,519
Speaker 3: this was trained at, like, a very low cost, something

61
00:03:19,600 --> 00:03:23,760
Speaker 3: like five point five million dollars for DeepSeek V3,

62
00:03:24,080 --> 00:03:27,160
Speaker 3: although I've seen people erroneously say that the five point

63
00:03:27,200 --> 00:03:30,040
Speaker 3: five million was for all of its R1 models,

64
00:03:30,040 --> 00:03:32,760
Speaker 3: and that's not what it says in the technical paper.

65
00:03:32,840 --> 00:03:35,760
Speaker 3: It was just for V3. But anyway, oh, I

66
00:03:35,760 --> 00:03:38,280
Speaker 3: should mention it also seems like a big chunk of

67
00:03:38,280 --> 00:03:41,320
Speaker 3: it was built on Llama, so they're sort of piggybacking

68
00:03:41,600 --> 00:03:45,520
Speaker 3: off of others' investment. But anyway, five point five million

69
00:03:45,560 --> 00:03:50,320
Speaker 3: dollars to train: is that, A, realistic, and then, B,

70
00:03:50,600 --> 00:03:52,840
Speaker 3: do we have any sense of how they were able

71
00:03:52,920 --> 00:03:53,360
Speaker 3: to do that.

72
00:03:53,720 --> 00:03:55,320
Speaker 1: So we have a very good sense of exactly what

73
00:03:55,360 --> 00:03:58,760
Speaker 1: they did because they're unusually open and they gave us

74
00:03:58,760 --> 00:04:01,120
Speaker 1: technical papers, they tell us what they did. They still

75
00:04:01,200 --> 00:04:03,400
Speaker 1: hid some parts of the process, especially with getting from

76
00:04:03,640 --> 00:04:05,400
Speaker 1: V3, which was trained for the five point five

77
00:04:05,440 --> 00:04:08,040
Speaker 1: million, to R1, which is the reasoning model, for

78
00:04:08,200 --> 00:04:10,720
Speaker 1: additional millions of dollars, where they tried to make it

79
00:04:10,720 --> 00:04:12,160
Speaker 1: a little bit harder for us to duplicate it by

80
00:04:12,200 --> 00:04:16,479
Speaker 1: not sharing their reinforcement learning techniques. But we shouldn't get

81
00:04:16,480 --> 00:04:18,440
Speaker 1: over anchored or carried away with the five point five

82
00:04:18,440 --> 00:04:20,240
Speaker 1: million dollar number. It's not that it's not real, it's

83
00:04:20,360 --> 00:04:23,880
Speaker 1: very real. But in order to get that ability to

84
00:04:23,880 --> 00:04:26,799
Speaker 1: spend five point five million dollars and get the model

85
00:04:26,839 --> 00:04:28,680
Speaker 1: to pop out. They had to acquire the data, they

86
00:04:28,680 --> 00:04:30,360
Speaker 1: had to hire the engineers, they had to build their

87
00:04:30,400 --> 00:04:33,840
Speaker 1: own cluster, they had to over optimize to the bone

88
00:04:34,040 --> 00:04:36,680
Speaker 1: their cluster because they're having problems of chip access thanks

89
00:04:36,680 --> 00:04:39,320
Speaker 1: to our export controls. And they were training on H800s.

90
00:04:40,480 --> 00:04:43,400
Speaker 1: And the way they did this was they did all

91
00:04:43,400 --> 00:04:46,880
Speaker 1: these sorts of mini optimizations, little optimizations, including, like, just

92
00:04:46,960 --> 00:04:50,559
Speaker 1: exactly integrating the hardware, the software, everything they were doing

93
00:04:51,000 --> 00:04:53,800
Speaker 1: in order to train as cheaply as possible on fifteen

94
00:04:53,800 --> 00:04:58,640
Speaker 1: trillion tokens and get the same level of performance or

95
00:04:58,839 --> 00:05:01,120
Speaker 1: you know, close to the same level of performance as other

96
00:05:01,160 --> 00:05:04,440
Speaker 1: companies have gotten with much much more compute. But it

97
00:05:04,480 --> 00:05:05,960
Speaker 1: doesn't mean that you can get your own model for

98
00:05:06,000 --> 00:05:07,760
Speaker 1: five point five million dollars, even though they told you

99
00:05:07,800 --> 00:05:10,200
Speaker 1: a lot of the information. In total, they're spending hundreds

100
00:05:10,200 --> 00:05:11,400
Speaker 1: of millions of dollars to get this result.

101
00:05:11,560 --> 00:05:13,800
Speaker 2: Wait, explain that further. Why does it still take hundreds

102
00:05:13,800 --> 00:05:16,800
Speaker 2: of millions? And does this mean, if it takes hundreds

103
00:05:16,800 --> 00:05:20,039
Speaker 2: of millions of dollars that the gap between what they're

104
00:05:20,080 --> 00:05:23,000
Speaker 2: able to do versus, say, the American labs is perhaps

105
00:05:23,040 --> 00:05:24,599
Speaker 2: not as wide as maybe people think.

106
00:05:24,880 --> 00:05:28,640
Speaker 1: Well, what DeepSeek is doing is they have less access

107
00:05:28,680 --> 00:05:31,080
Speaker 1: to chips. They can't just buy Nvidia chips the same

108
00:05:31,120 --> 00:05:34,160
Speaker 1: way that, you know, OpenAI or Microsoft or

109
00:05:34,279 --> 00:05:37,599
Speaker 1: Anthropic can buy Nvidia chips. So instead they had

110
00:05:37,600 --> 00:05:40,880
Speaker 1: to make good use, very very efficient, killer use of

111
00:05:40,920 --> 00:05:44,600
Speaker 1: the chips that they did have. So they focused on

112
00:05:44,920 --> 00:05:46,960
Speaker 1: all these optimizations and all of these ways that they

113
00:05:47,000 --> 00:05:50,120
Speaker 1: could save on compute. But in order to get there,

114
00:05:50,200 --> 00:05:52,239
Speaker 1: they had to spend a lot of money to figure

115
00:05:52,240 --> 00:05:54,400
Speaker 1: out how to do that and to build the infrastructure

116
00:05:54,400 --> 00:05:57,720
Speaker 1: to do that. And you know, once they knew what

117
00:05:57,800 --> 00:05:59,680
Speaker 1: to do, it cost them five point five million dollars

118
00:05:59,720 --> 00:06:01,120
Speaker 1: to do it. They've shared a lot of that information

119
00:06:01,560 --> 00:06:04,720
Speaker 1: and this has dramatically reduced the cost of somebody who

120
00:06:04,720 --> 00:06:06,240
Speaker 1: wants to follow in their footsteps and train a new

121
00:06:06,279 --> 00:06:08,920
Speaker 1: model because they've shown the way of many of their

122
00:06:08,920 --> 00:06:11,479
Speaker 1: optimizations that people didn't realize they could do or didn't

123
00:06:11,480 --> 00:06:13,440
Speaker 1: realize how to do them. That can now very easily

124
00:06:13,480 --> 00:06:16,280
Speaker 1: be copied. But it does not mean that you are

125
00:06:16,320 --> 00:06:18,520
Speaker 1: five point five million dollars away from your own V3.

126
00:06:19,200 --> 00:06:22,320
Speaker 3: So the other thing that is freaking people out is

127
00:06:22,480 --> 00:06:25,040
Speaker 3: the fact that this is open source, right, we all

128
00:06:25,080 --> 00:06:28,960
Speaker 3: remember the days when OpenAI was more open and now

129
00:06:28,960 --> 00:06:31,479
Speaker 3: it's moved to closed source. Why do you think they

130
00:06:31,480 --> 00:06:34,080
Speaker 3: did that? And like how big a deal is that?

131
00:06:35,560 --> 00:06:37,919
Speaker 1: So this is one of those things where they have

132
00:06:38,040 --> 00:06:40,080
Speaker 1: a story and you can believe their story or

133
00:06:40,080 --> 00:06:41,960
Speaker 1: not believe their story, but their story is that they are

134
00:06:42,040 --> 00:06:45,800
Speaker 1: essentially ideologically in favor of the idea that everyone should

135
00:06:45,800 --> 00:06:48,919
Speaker 1: have access to the same AI, that AI should be

136
00:06:48,920 --> 00:06:51,960
Speaker 1: shared with the world, especially that China should help pump

137
00:06:51,960 --> 00:06:54,640
Speaker 1: out its own ecosystem and they should help grow all

138
00:06:54,680 --> 00:06:57,120
Speaker 1: of the AI for the betterment of humanity. And they're

139
00:06:57,120 --> 00:06:59,400
Speaker 1: going to get artificial general intelligence and they are going

140
00:06:59,440 --> 00:07:02,440
Speaker 1: to open source that as well, and this is their

141
00:07:02,680 --> 00:07:04,440
Speaker 1: the main point of DeepSeek. This is why

142
00:07:04,480 --> 00:07:07,760
Speaker 1: DeepSeek exists. They disclaim even having a business model, really,

143
00:07:08,480 --> 00:07:11,360
Speaker 1: and, you know, they're an outgrowth of a hedge fund,

144
00:07:11,680 --> 00:07:14,720
Speaker 1: and the hedge fund makes money, and maybe they can just

145
00:07:14,720 --> 00:07:17,440
Speaker 1: do this if they choose to do that, or maybe

146
00:07:17,440 --> 00:07:19,680
Speaker 1: they will end up with a different business model. But

147
00:07:20,640 --> 00:07:23,480
Speaker 1: it was obviously very concerning from a lot of angles

148
00:07:23,520 --> 00:07:26,960
Speaker 1: if you open source increasingly capable models, because you know,

149
00:07:27,040 --> 00:07:30,520
Speaker 1: artificial general intelligence means something that's you know, as smart

150
00:07:30,520 --> 00:07:33,280
Speaker 1: and capable as you and I as a human, and

151
00:07:33,400 --> 00:07:37,320
Speaker 1: perhaps more so. And if you just hand that over

152
00:07:37,520 --> 00:07:40,440
Speaker 1: in open form to anybody in the world who wants

153
00:07:40,480 --> 00:07:44,280
Speaker 1: to do anything with it, then we don't know how

154
00:07:44,360 --> 00:07:48,360
Speaker 1: dangerous that is, but it's existentially risky at some limit

155
00:07:49,120 --> 00:07:51,600
Speaker 1: to unleash things that are smarter and more capable, more

156
00:07:51,600 --> 00:07:54,240
Speaker 1: competitive than us, that are then going to be free

157
00:07:54,280 --> 00:07:57,119
Speaker 1: and loose to you know, engage in whatever any human

158
00:07:57,160 --> 00:07:57,840
Speaker 1: directs them to do.

159
00:07:58,480 --> 00:08:01,480
Speaker 3: I have a really dumb question, but I hear people

160
00:08:01,520 --> 00:08:05,760
Speaker 3: say artificial general intelligence all the time. AGI, what does

161
00:08:05,800 --> 00:08:06,600
Speaker 3: that actually mean?

162
00:08:07,480 --> 00:08:09,840
Speaker 1: There is a lot of dispute over exactly what that means.

163
00:08:09,840 --> 00:08:12,360
Speaker 1: The words are not used consistently, but it stands for

164
00:08:12,440 --> 00:08:17,680
Speaker 1: artificial general intelligence. Generally, it is understood to mean you

165
00:08:17,760 --> 00:08:20,880
Speaker 1: can do any task that can be done on a

166
00:08:20,920 --> 00:08:24,880
Speaker 1: computer, that can be done cognitively only, as well as

167
00:08:25,080 --> 00:08:25,520
Speaker 1: a human.

168
00:08:26,840 --> 00:08:28,680
Speaker 2: I mean, most of these things do things

169
00:08:28,760 --> 00:08:30,520
Speaker 2: much better than me. I don't know how to code,

170
00:08:30,560 --> 00:08:33,480
Speaker 2: and so, but I get that there are still some things.

171
00:08:33,480 --> 00:08:35,880
Speaker 2: Maybe they wouldn't be as good at proving some of

172
00:08:35,880 --> 00:08:38,400
Speaker 2: the "are you human" tests. Everyone wants to talk about Jevons

173
00:08:38,480 --> 00:08:41,160
Speaker 2: paradox. And so we see Nvidia and Broadcom shares,

174
00:08:41,240 --> 00:08:44,319
Speaker 2: these chip companies, they're getting crumbled today. And one of

175
00:08:44,360 --> 00:08:46,680
Speaker 2: the theories is like, oh no, with all these optimizations and

176
00:08:46,720 --> 00:08:50,520
Speaker 2: so forth, AI researchers will just use those and they'll

177
00:08:50,559 --> 00:08:54,079
Speaker 2: still have max demand for compute, and so it won't

178
00:08:54,120 --> 00:08:56,800
Speaker 2: actually change the ultimate demand for compute. How are you

179
00:08:56,800 --> 00:08:57,720
Speaker 2: thinking about this question?

180
00:08:58,679 --> 00:09:02,040
Speaker 1: So I'm definitely a Jevons bro right now from the

181
00:09:02,080 --> 00:09:03,680
Speaker 1: perspective of this, you.

182
00:09:03,720 --> 00:09:06,080
Speaker 2: Don't think it'll have a negative impact on just the

183
00:09:06,120 --> 00:09:07,679
Speaker 2: amount of compute demanded.

184
00:09:08,000 --> 00:09:10,079
Speaker 1: The tweet I sent this morning was: Nvidia down eleven

185
00:09:10,120 --> 00:09:12,400
Speaker 1: percent premarket on news that its chips are highly useful.

186
00:09:14,200 --> 00:09:16,800
Speaker 1: And I believe that what we've shown is that, yes,

187
00:09:16,840 --> 00:09:19,240
Speaker 1: you can get a lot more in some sense out

188
00:09:19,320 --> 00:09:22,720
Speaker 1: of each Nvidia chip than you expected. You can get

189
00:09:22,760 --> 00:09:25,760
Speaker 1: more AI. And if there was a limited amount of

190
00:09:25,800 --> 00:09:29,439
Speaker 1: stuff to do with AI, and once you did that stuff,

191
00:09:29,480 --> 00:09:31,720
Speaker 1: you were done, then that would be a different story.

192
00:09:31,720 --> 00:09:34,560
Speaker 1: But that's very much not the case. As we get

193
00:09:34,720 --> 00:09:37,600
Speaker 1: further along towards AGI, as these AIs get more capable,

194
00:09:38,000 --> 00:09:39,400
Speaker 1: we're going to want to use them for more and

195
00:09:39,440 --> 00:09:42,720
Speaker 1: more things, more and more often, and most importantly, the

196
00:09:42,840 --> 00:09:45,280
Speaker 1: entire revolution of R1 and also OpenAI's

197
00:09:45,440 --> 00:09:49,760
Speaker 1: o1 is inference-time compute. What that means is every

198
00:09:49,760 --> 00:09:53,400
Speaker 1: time you ask the question, it's going to use more compute,

199
00:09:53,400 --> 00:09:57,840
Speaker 1: more cycles of GPUs to think for longer, to basically

200
00:09:57,920 --> 00:10:00,400
Speaker 1: use more tokens or words to figure out what the

201
00:10:00,400 --> 00:10:03,480
Speaker 1: best possible answer is. And this scales not necessarily

202
00:10:03,559 --> 00:10:06,480
Speaker 1: without limit, but it scales very, very far. So

203
00:10:06,480 --> 00:10:08,960
Speaker 1: OpenAI's new o3 is capable of thinking for, you know,

204
00:10:09,080 --> 00:10:11,760
Speaker 1: many minutes. It's capable of potentially spending you know, hundreds

205
00:10:11,840 --> 00:10:14,160
Speaker 1: or even in theory thousands of dollars or more on

206
00:10:14,280 --> 00:10:18,840
Speaker 1: an individual query. And if you knock that down by an

207
00:10:18,920 --> 00:10:21,560
Speaker 1: order of magnitude, that almost certainly gets you to use

208
00:10:21,559 --> 00:10:23,760
Speaker 1: it more for a given result, not use it less,

209
00:10:23,760 --> 00:10:27,959
Speaker 1: because that is in effect starting to get prohibitive. And over time,

210
00:10:28,559 --> 00:10:31,120
Speaker 1: you know, if you have the ability to spend a

211
00:10:31,120 --> 00:10:33,760
Speaker 1: remarkably little amount of money and then get things like virtual

212
00:10:33,800 --> 00:10:37,600
Speaker 1: employees and abilities to answer any question under the sun, yeah,

213
00:10:37,640 --> 00:10:41,000
Speaker 1: there's basically unlimited demand to do that or to scale

214
00:10:41,080 --> 00:10:43,439
Speaker 1: up the quality of the answers as the price drops.

215
00:10:44,000 --> 00:10:47,720
Speaker 1: So I basically expect that as fast as Nvidia

216
00:10:47,800 --> 00:10:50,920
Speaker 1: can manufacture chips and we can put them into data

217
00:10:50,960 --> 00:10:53,559
Speaker 1: centers and give them electrical power. People will be happy

218
00:10:53,559 --> 00:10:54,840
Speaker 1: to buy those chips.

219
00:10:54,920 --> 00:10:58,640
Speaker 3: At the risk of angering the Jevons paradox bros, just

220
00:10:58,679 --> 00:11:01,440
Speaker 3: to push on the point a little bit more so,

221
00:11:01,520 --> 00:11:04,560
Speaker 3: my understanding of DeepSeek is that one of the reasons

222
00:11:04,640 --> 00:11:09,880
Speaker 3: it's special is because it doesn't rely on like specialized components,

223
00:11:09,960 --> 00:11:12,840
Speaker 3: custom operators, and so it can work on a variety

224
00:11:12,840 --> 00:11:16,880
Speaker 3: of GPUs. Is there a scenario where, you know, AI

225
00:11:17,160 --> 00:11:21,240
Speaker 3: becomes so free and plentiful, which could in theory be

226
00:11:21,320 --> 00:11:25,000
Speaker 3: good for Nvidia, But at the same time, because it's

227
00:11:25,040 --> 00:11:28,000
Speaker 3: easy to run on a bunch of other GPUs, people

228
00:11:28,120 --> 00:11:32,480
Speaker 3: start using, you know, more like ASIC chips, like customized

229
00:11:32,559 --> 00:11:34,480
Speaker 3: chips for a specific purpose.

230
00:11:35,320 --> 00:11:37,880
Speaker 1: I mean, in the long run, we will almost certainly

231
00:11:37,920 --> 00:11:40,319
Speaker 1: see specialized inference chips, whether from Nvidia or

232
00:11:40,320 --> 00:11:43,200
Speaker 1: from someone else, and we will almost certainly see various

233
00:11:43,200 --> 00:11:45,880
Speaker 1: different advancements, and today's chips are going to be obsolete

234
00:11:46,200 --> 00:11:48,400
Speaker 1: in a few years. That's how AI works, right, There's

235
00:11:48,400 --> 00:11:52,400
Speaker 1: all these rapid advancements. But you know, I think

236
00:11:52,520 --> 00:11:55,160
Speaker 1: Nvidia is in a very, very good position to take advantage

237
00:11:55,160 --> 00:11:57,600
Speaker 1: of all of this. I certainly don't think that like

238
00:11:57,679 --> 00:12:01,000
Speaker 1: you'll just use your laptop to run the best AGIs

239
00:12:01,240 --> 00:12:03,840
Speaker 1: and therefore we don't have to worry about buying GPUs

240
00:12:04,080 --> 00:12:07,040
Speaker 1: is a proposition. It's certainly possible that rivals will come

241
00:12:07,080 --> 00:12:09,160
Speaker 1: up with superior chips. That's always possible. Nvidia does

242
00:12:09,160 --> 00:12:11,880
Speaker 1: not have a monopoly, but Nvidia certainly seems to

243
00:12:11,920 --> 00:12:13,040
Speaker 1: be in a dominant position right now.

244
00:12:29,640 --> 00:12:31,360
Speaker 2: It seems to me. I mean, I know there's others,

245
00:12:31,360 --> 00:12:33,040
Speaker 2: but it seems to me, in the US, there's like

246
00:12:33,240 --> 00:12:38,559
Speaker 2: three main AI producers of models that people know about.

247
00:12:38,600 --> 00:12:43,400
Speaker 2: There's OpenAI, there's Claude, and then there's Meta with Llama.

248
00:12:43,480 --> 00:12:46,760
Speaker 2: And it's worth noting that Meta is green today, that

249
00:12:46,800 --> 00:12:48,760
Speaker 2: the stock is actually up as of the time I'm

250
00:12:48,760 --> 00:12:51,640
Speaker 2: talking about this one point one percent. Just go through

251
00:12:51,720 --> 00:12:55,480
Speaker 2: each one real quickly, how the sort of DeepSeek

252
00:12:55,640 --> 00:12:58,679
Speaker 2: shock affects them and their viability and where they stand today.

253
00:12:59,160 --> 00:13:01,400
Speaker 1: I think the most amazing thing about your question is

254
00:13:01,400 --> 00:13:02,520
Speaker 1: that you forgot about Google.

255
00:13:02,960 --> 00:13:05,000
Speaker 2: Oh yeah, right, yeah, that's very telling.

256
00:13:05,280 --> 00:13:10,320
Speaker 1: But everyone else has forgotten about it. Yeah, surprising. Gemini Flash

257
00:13:10,400 --> 00:13:13,480
Speaker 1: Thinking, their version of o1 and R1, got updated

258
00:13:13,520 --> 00:13:16,520
Speaker 1: a few days ago, and there are many reports that

259
00:13:16,559 --> 00:13:20,319
Speaker 1: it's actually very good now and potentially competitive, and effectively

260
00:13:20,320 --> 00:13:22,360
Speaker 1: it's free to use for a lot of people on

261
00:13:22,600 --> 00:13:26,240
Speaker 1: AI Studio, but nobody I know has taken the time

262
00:13:26,280 --> 00:13:28,320
Speaker 1: to check and find out how good it is because

263
00:13:28,320 --> 00:13:30,280
Speaker 1: we've all been too obsessed with being DeepSeek bros.

264
00:13:31,720 --> 00:13:34,319
Speaker 1: Google's had its like rhetorical lunch eaten over and over

265
00:13:34,320 --> 00:13:36,160
Speaker 1: and over again in December. Like, OpenAI would come

266
00:13:36,200 --> 00:13:37,960
Speaker 1: up with advance after advance after advance, then Google would

267
00:13:38,000 --> 00:13:40,080
Speaker 1: have advance after advance after advance, and Google's would be

268
00:13:40,400 --> 00:13:42,679
Speaker 1: seemingly actually, if anything, more impressive. And yet everyone will

269
00:13:42,720 --> 00:13:44,160
Speaker 1: always just talk about OpenAI's, so this is

270
00:13:44,160 --> 00:13:46,640
Speaker 1: not even new. Something is going on there. So in

271
00:13:46,760 --> 00:13:50,400
Speaker 1: terms of OpenAI, OpenAI should be very nervous

272
00:13:50,800 --> 00:13:53,920
Speaker 1: in some sense, of course, because they have the reasoning models,

273
00:13:53,920 --> 00:13:55,600
Speaker 1: and now the reasoning model has been copied much more

274
00:13:55,640 --> 00:13:59,080
Speaker 1: effectively than previously, and the competition is a hell of

275
00:13:59,080 --> 00:14:02,120
Speaker 1: a lot cheaper than what OpenAI is charging, so it's a

276
00:14:02,120 --> 00:14:04,400
Speaker 1: direct threat to their business model for obvious reasons, and

277
00:14:04,440 --> 00:14:07,280
Speaker 1: it looks like their lead in reasoning models is smaller

278
00:14:07,280 --> 00:14:10,320
Speaker 1: and faster to undo than you would expect. Because if

279
00:14:10,400 --> 00:14:12,760
Speaker 1: DeepSeek can do it, of course Anthropic and Google,

280
00:14:13,320 --> 00:14:14,800
Speaker 1: you know, can do it. And everyone else can do

281
00:14:14,800 --> 00:14:18,840
Speaker 1: it as well. Anthropic, which produces Claude, has not

282
00:14:18,920 --> 00:14:21,960
Speaker 1: yet produced their own reasoning model. They clearly are operating

283
00:14:22,160 --> 00:14:25,000
Speaker 1: under a shortage of compute in some sense, so it's

284
00:14:25,080 --> 00:14:27,400
Speaker 1: entirely possible that they have chosen not to launch a

285
00:14:27,440 --> 00:14:30,240
Speaker 1: reasoning model even though they could, or not focused on

286
00:14:30,280 --> 00:14:33,240
Speaker 1: training one as quickly as possible until they've addressed this problem.

287
00:14:33,320 --> 00:14:36,760
Speaker 1: They're continuously taking investment. We should expect them to solve

288
00:14:36,760 --> 00:14:40,680
Speaker 1: their problems over time, but they seem like they should

289
00:14:40,680 --> 00:14:43,560
Speaker 1: be less directly concerned, because they're less of a directly

290
00:14:43,560 --> 00:14:46,440
Speaker 1: competitive product in some sense, but also they tend to

291
00:14:46,520 --> 00:14:49,600
Speaker 1: market to effectively much more aware people, so their people

292
00:14:49,600 --> 00:14:51,400
Speaker 1: will also know about DeepSeek and they will have

293
00:14:51,440 --> 00:14:54,680
Speaker 1: a choice to make. If I was Meta, I would

294
00:14:54,680 --> 00:14:58,000
Speaker 1: be far more worried, especially if I was on their

295
00:14:58,040 --> 00:15:01,200
Speaker 1: GenAI team and wanted to keep my job, because Meta's

296
00:15:01,240 --> 00:15:03,920
Speaker 1: lunch has been eaten massively here, right? Meta with Llama

297
00:15:04,080 --> 00:15:07,560
Speaker 1: had the best open models, and all the best open

298
00:15:07,600 --> 00:15:12,600
Speaker 1: models were effectively fine-tunes of Llama, and now

299
00:15:12,640 --> 00:15:15,360
Speaker 1: DeepSeek comes out, and this is absolutely not in any

300
00:15:15,400 --> 00:15:17,600
Speaker 1: way a fine-tune of Llama. This is their own product,

301
00:15:18,000 --> 00:15:20,920
Speaker 1: and V three was already blowing everything that Meta had

302
00:15:20,920 --> 00:15:23,360
Speaker 1: out of the water. R1, there are reports that

303
00:15:23,400 --> 00:15:25,680
Speaker 1: it's better than their new version that they're training now,

304
00:15:25,680 --> 00:15:28,560
Speaker 1: it's better than Llama 4, which I would expect to

305
00:15:28,600 --> 00:15:33,320
Speaker 1: be true. And so there's no point in releasing an

306
00:15:33,360 --> 00:15:36,560
Speaker 1: inferior open model if everyone in the open model community

307
00:15:36,640 --> 00:15:38,680
Speaker 1: will just be like, why don't I just use DeepSeek?

308
00:15:38,720 --> 00:15:42,000
Speaker 2: Tracy, it's interesting that, as Zvi said, the people who should

309
00:15:42,000 --> 00:15:45,520
Speaker 2: be nervous are the employees of Meta, not Meta itself,

310
00:15:45,520 --> 00:15:48,840
Speaker 2: because Meta is up, and so you gotta wonder. It's like, well,

311
00:15:48,840 --> 00:15:50,800
Speaker 2: maybe they don't. I don't know, maybe they don't need

312
00:15:50,840 --> 00:15:54,400
Speaker 2: to invest as much in their own open source AI

313
00:15:54,440 --> 00:15:56,000
Speaker 2: if there's a better one out there now the stock

314
00:15:56,120 --> 00:15:56,320
Speaker 2: is up.

315
00:15:56,320 --> 00:16:00,000
Speaker 1: Anyway, the market has been very strange from my perspective

316
00:16:00,080 --> 00:16:02,080
Speaker 1: on how it reacts to different things that Meta does.

317
00:16:02,120 --> 00:16:04,360
Speaker 1: For a while, Meta would announce we're spending more in AI,

318
00:16:04,680 --> 00:16:06,880
Speaker 1: we're investing in all these data centers, we're training all

319
00:16:06,880 --> 00:16:09,240
Speaker 1: of these models, and the market would go, what are

320
00:16:09,280 --> 00:16:12,480
Speaker 1: you doing? This is another metaverse or something, and we're

321
00:16:12,480 --> 00:16:14,240
Speaker 1: gonna hammer your stock and we're gonna drag you down.

322
00:16:14,640 --> 00:16:16,920
Speaker 1: And then with the most recent sixty five billion dollar

323
00:16:16,920 --> 00:16:20,840
Speaker 1: announced spend, then Meta was up. Presumably, they're gonna

324
00:16:20,920 --> 00:16:24,040
Speaker 1: use it mostly for inference effectively in a lot of

325
00:16:24,040 --> 00:16:27,440
Speaker 1: scenarios because they have these massive inference costs. They want

326
00:16:27,480 --> 00:16:31,360
Speaker 1: to put AI all over Facebook and Instagram. So you know,

327
00:16:31,520 --> 00:16:33,480
Speaker 1: if anything, like you know, I think the market might

328
00:16:33,480 --> 00:16:35,680
Speaker 1: be speculating that this means that they will know how

329
00:16:35,680 --> 00:16:37,920
Speaker 1: to train better lamas that are cheaper to operate, and

330
00:16:38,160 --> 00:16:40,400
Speaker 1: their costs will go down, and then they'll be in

331
00:16:40,440 --> 00:16:43,200
Speaker 1: a better position, and that theory isn't.

332
00:16:42,960 --> 00:16:48,800
Speaker 3: Crazy since we all just collectively remembered Google. I have

333
00:16:48,880 --> 00:16:50,960
Speaker 3: a question that's sort of been on the back in

334
00:16:51,000 --> 00:16:53,000
Speaker 3: the back of my mind. I think Joe has brought

335
00:16:53,040 --> 00:16:56,760
Speaker 3: this up before as well. But like when Google debuted,

336
00:16:57,800 --> 00:17:00,840
Speaker 3: it took years and years and years for people to

337
00:17:00,920 --> 00:17:04,119
Speaker 3: sort of catch up to the search function, and actually

338
00:17:04,200 --> 00:17:07,400
Speaker 3: no one ever really caught up, right, So Google has

339
00:17:07,440 --> 00:17:10,440
Speaker 3: like dominated for years. Why is it when it comes

340
00:17:10,440 --> 00:17:16,359
Speaker 3: to these chatbots there aren't like higher wider moats around

341
00:17:16,480 --> 00:17:17,359
Speaker 3: these businesses.

342
00:17:18,119 --> 00:17:22,679
Speaker 1: So one reason is that everyone's training on roughly the

343
00:17:22,720 --> 00:17:26,440
Speaker 1: same data, meeting the entire Internet and all of human knowledge,

344
00:17:26,560 --> 00:17:28,080
Speaker 1: so it's very hard to get that much of a

345
00:17:28,080 --> 00:17:30,800
Speaker 1: permanent data edge there unless you're creating synthetic data off

346
00:17:30,800 --> 00:17:32,960
Speaker 1: of your own models, which is what Opening Eye is

347
00:17:33,200 --> 00:17:36,720
Speaker 1: plausively doing. Now. Another reason is because everybody is scaling

348
00:17:36,720 --> 00:17:39,359
Speaker 1: as fast as possible and adding zeros to everything on

349
00:17:39,400 --> 00:17:42,600
Speaker 1: a periodic basis in calendar time. It doesn't take that

350
00:17:42,680 --> 00:17:45,760
Speaker 1: long before your rival is going to have access to

351
00:17:45,800 --> 00:17:48,720
Speaker 1: more compute than you had, and they're copying your techniques

352
00:17:48,720 --> 00:17:51,400
Speaker 1: more aggressively. They's just a lot less secret sauce there's

353
00:17:51,440 --> 00:17:54,480
Speaker 1: only so many algorithms. Fundamentally, everyone is relying on the

354
00:17:54,520 --> 00:17:56,399
Speaker 1: scaling laws. It's called the bitter lesson is the idea

355
00:17:56,440 --> 00:17:58,440
Speaker 1: that you know, you just scale more, you just use

356
00:17:58,440 --> 00:18:00,600
Speaker 1: more compute, you just use more data, you just use

357
00:18:00,680 --> 00:18:03,400
Speaker 1: more parameters and deep seek. You're saying, maybe you don't.

358
00:18:03,560 --> 00:18:06,159
Speaker 1: You can do more optimizations, you can get around this

359
00:18:06,240 --> 00:18:10,399
Speaker 1: problem and still get a superior model. But mostly, yeah,

360
00:18:10,480 --> 00:18:13,159
Speaker 1: there's been a lot of just I can catch up

361
00:18:13,160 --> 00:18:16,119
Speaker 1: to you by copying what you did. Also that I

362
00:18:16,119 --> 00:18:19,399
Speaker 1: can see the outputs, right, I can query your model,

363
00:18:19,640 --> 00:18:22,160
Speaker 1: and I can use your model's outputs to actively train

364
00:18:22,560 --> 00:18:27,119
Speaker 1: my model. And you see this in things like most

365
00:18:27,160 --> 00:18:29,760
Speaker 1: models that get trained. You ask them who trains you,

366
00:18:29,840 --> 00:18:32,960
Speaker 1: and they will often say, oh, I'm from Open Ai and.

367
00:18:33,040 --> 00:18:35,520
Speaker 2: The internet has gotten so weird. I just the internet

368
00:18:35,600 --> 00:18:38,160
Speaker 2: is so weird to speak. Mashavitz, thank you so much

369
00:18:38,240 --> 00:18:41,159
Speaker 2: for running over to the Odd Lots and helping us

370
00:18:41,200 --> 00:18:44,320
Speaker 2: record this emergency pod on the Deep Seek selloff though.

371
00:18:44,320 --> 00:18:45,000
Speaker 2: It was fantastic.

372
00:18:45,080 --> 00:18:58,600
Speaker 1: All right, thank you, Tracy.

373
00:18:58,640 --> 00:19:00,639
Speaker 2: I love talking to v We got just sort of

374
00:19:00,680 --> 00:19:03,880
Speaker 2: make him our Ai or our Ai guy.

375
00:19:04,119 --> 00:19:06,120
Speaker 3: I mean, to be honest, we could probably have him

376
00:19:06,160 --> 00:19:09,280
Speaker 3: back on again because there's gonna be stuff happening.

377
00:19:09,480 --> 00:19:12,359
Speaker 2: Maybe we will, and obviously it's we could go a

378
00:19:12,400 --> 00:19:15,280
Speaker 2: lot longer. This is a really exciting story. This is

379
00:19:15,320 --> 00:19:18,360
Speaker 2: a really exciting story, and things are just getting really

380
00:19:18,400 --> 00:19:19,200
Speaker 2: weird these days.

381
00:19:19,240 --> 00:19:22,320
Speaker 3: It is kind of crazy how fast all of this is. Yap,

382
00:19:22,840 --> 00:19:24,960
Speaker 3: And then the other thing I would say is just

383
00:19:25,320 --> 00:19:28,560
Speaker 3: the bitter lesson. Great name for a band.

384
00:19:29,119 --> 00:19:32,680
Speaker 2: Oh, totally totally great. Maybe when we do our Ai

385
00:19:32,840 --> 00:19:36,040
Speaker 2: themed proud rock band. True, Yes, that could be our name.

386
00:19:36,119 --> 00:19:38,040
Speaker 3: Yes, let's do that. Okay, shall we leave it there?

387
00:19:38,119 --> 00:19:38,840
Speaker 2: Let's leave it there.

388
00:19:39,160 --> 00:19:41,879
Speaker 3: This has been another episode of the Odd Thoughts podcast.

389
00:19:42,000 --> 00:19:45,520
Speaker 3: I'm Tracy Alloway. You can follow me at Tracy Alloway.

390
00:19:45,280 --> 00:19:48,320
Speaker 2: And I'm Jill Wisenthal. You can follow me at the Stalwart.

391
00:19:48,560 --> 00:19:52,000
Speaker 2: Follow our guests Vimashovitz, he's at this v Also definitely

392
00:19:52,119 --> 00:19:54,440
Speaker 2: check out his free subs deck. It's a must read

393
00:19:54,520 --> 00:19:57,760
Speaker 2: for me. Don't worry about the v OZ, really great stuff

394
00:19:57,800 --> 00:20:00,639
Speaker 2: every single day. Follow our producers Carmen ra Rigaz at

395
00:20:00,720 --> 00:20:03,879
Speaker 2: Kerman armand dash O Bennett at Dashbot and kill Brooks

396
00:20:03,880 --> 00:20:06,960
Speaker 2: at Kilbrooks. For more oddlocks content, go to Bloomberg dot

397
00:20:07,000 --> 00:20:09,840
Speaker 2: com slash odlocks. We have transcripts, a blog in a newsletter,

398
00:20:10,040 --> 00:20:11,960
Speaker 2: and you can chat about all of these topics twenty

399
00:20:11,960 --> 00:20:15,800
Speaker 2: four to seven in our discord Discord dot gg slash Odlots.

400
00:20:15,840 --> 00:20:17,920
Speaker 3: Maybe we'll give zv to do a Q and A

401
00:20:18,040 --> 00:20:21,200
Speaker 3: in there with oh yeah, that'd be great. And if

402
00:20:21,240 --> 00:20:24,119
Speaker 3: you enjoy Oddlots, if you like it when we roll

403
00:20:24,160 --> 00:20:27,800
Speaker 3: out these emergency episodes, then please leave us a positive

404
00:20:27,840 --> 00:20:31,119
Speaker 3: review on your favorite platform. Thanks for listening.