1 00:00:02,480 --> 00:00:07,040 Speaker 1: Bloomberg Audio Studios, Podcasts, Radio News. 2 00:00:18,040 --> 00:00:21,919 Speaker 2: Hello and welcome to another episode of the Odd Lots podcast. 3 00:00:22,000 --> 00:00:24,480 Speaker 3: I'm Joe Weisenthal and I'm Tracy Alloway. 4 00:00:24,560 --> 00:00:26,200 Speaker 2: Tracy, the DeepSeek selloff. 5 00:00:27,200 --> 00:00:30,240 Speaker 3: That's right, it's pretty deep. Has anyone made that joke yet? 6 00:00:30,200 --> 00:00:31,080 Speaker 1: We're in DeepSeek? 7 00:00:31,240 --> 00:00:33,600 Speaker 2: Yeah, I don't think anyone's made that joke. 8 00:00:33,880 --> 00:00:36,760 Speaker 3: I will say, like, you know it's bad in markets 9 00:00:36,800 --> 00:00:40,000 Speaker 3: when all the headlines are about standard deviation, yes, right? 10 00:00:40,040 --> 00:00:43,240 Speaker 3: And then you know it's really bad when you see 11 00:00:43,280 --> 00:00:46,000 Speaker 3: people start to say it's not a crash, it's a 12 00:00:46,040 --> 00:00:49,159 Speaker 3: healthy correction. Yes, that's the real cope. 13 00:00:49,360 --> 00:00:52,280 Speaker 2: But just for, like, real scene setting, you know, we've 14 00:00:52,320 --> 00:00:55,520 Speaker 2: done some very timely interviews about tech concentration in the 15 00:00:55,560 --> 00:00:57,840 Speaker 2: market lately and how so much of the market is 16 00:00:57,880 --> 00:01:02,000 Speaker 2: this big concentrated bet on AI, et cetera. Anyway, on Monday, 17 00:01:02,080 --> 00:01:04,120 Speaker 2: I think people will be listening to this on Tuesday, 18 00:01:04,560 --> 00:01:07,720 Speaker 2: markets got clobbered. Nvidia, one of the big winners, 19 00:01:07,800 --> 00:01:10,120 Speaker 2: as of the time I'm talking about this, three thirty 20 00:01:10,120 --> 00:01:14,000 Speaker 2: pm on Monday, down seventeen percent. We're talking major losses 21 00:01:14,080 --> 00:01:17,640 Speaker 2: really across the tech complex.
Basically, it seems to 22 00:01:17,680 --> 00:01:21,839 Speaker 2: be catalyzed by the introduction of this high performance, open 23 00:01:21,920 --> 00:01:26,199 Speaker 2: source Chinese AI model called DeepSeek. It was born, 24 00:01:26,319 --> 00:01:28,360 Speaker 2: from what we know, out of a hedge fund. Apparently 25 00:01:28,440 --> 00:01:30,959 Speaker 2: it was very cheap to train, very cheap to build. 26 00:01:31,440 --> 00:01:34,640 Speaker 2: You know, the tech constraints at this point didn't seem 27 00:01:34,680 --> 00:01:36,000 Speaker 2: to be much of a problem. They may be a 28 00:01:36,000 --> 00:01:38,720 Speaker 2: problem going forward. But yes, here is the entire 29 00:01:38,760 --> 00:01:41,080 Speaker 2: market betting on a lot of companies making AI, and 30 00:01:41,160 --> 00:01:45,120 Speaker 2: there are now concerns about, of course, a cheap Chinese competitor. 31 00:01:45,360 --> 00:01:48,520 Speaker 3: I just realized, Joe, this is actually your fault, isn't it? 32 00:01:49,040 --> 00:01:51,160 Speaker 3: This last week you wrote that you were a 33 00:01:51,200 --> 00:01:55,360 Speaker 3: DeepSeek AI bro, and look what you've done. You've wiped five 34 00:01:55,480 --> 00:01:58,240 Speaker 3: hundred and sixty billion dollars off of Nvidia's market cap. 35 00:01:58,440 --> 00:02:01,440 Speaker 2: Yeah, might be. Anyway, one of the interesting 36 00:02:01,520 --> 00:02:03,480 Speaker 2: questions, though, is that this was sort of announced in 37 00:02:03,520 --> 00:02:06,920 Speaker 2: a white paper in December. Why did it take 38 00:02:07,040 --> 00:02:09,799 Speaker 2: till January twenty seventh for it to freak people out? 39 00:02:10,040 --> 00:02:13,200 Speaker 2: Big questions. Anyway, let's jump right into it.
We really 40 00:02:13,240 --> 00:02:16,240 Speaker 2: do have the perfect guest, someone who was here for 41 00:02:16,280 --> 00:02:19,640 Speaker 2: our Election Eve special, a guy who knows all about 42 00:02:20,120 --> 00:02:23,320 Speaker 2: numbers and AI and quant stuff, and he writes a 43 00:02:23,360 --> 00:02:26,680 Speaker 2: Substack that has become for me a daily absolute must 44 00:02:26,720 --> 00:02:29,919 Speaker 2: read, where he writes an extraordinary amount. I don't even 45 00:02:29,960 --> 00:02:31,560 Speaker 2: know how he writes so much on a given day. 46 00:02:31,840 --> 00:02:34,160 Speaker 2: We're going to be speaking with Zvi Mowshowitz. He is 47 00:02:34,240 --> 00:02:37,280 Speaker 2: the author of the Don't Worry About the Vase blog 48 00:02:37,400 --> 00:02:41,480 Speaker 2: or Substack. Zvi, you're also a DeepSeek AI bro. You've 49 00:02:41,520 --> 00:02:42,359 Speaker 2: switched to using that? 50 00:02:43,000 --> 00:02:46,519 Speaker 1: So I use a wide variety of different AIs. So 51 00:02:46,639 --> 00:02:49,600 Speaker 1: I will use Claude from Anthropic, I will use o1 from 52 00:02:49,680 --> 00:02:53,600 Speaker 1: ChatGPT, from OpenAI. I'll use Gemini sometimes, and 53 00:02:53,639 --> 00:02:56,520 Speaker 1: I'll use Perplexity for web searches. But yeah, I'll use 54 00:02:56,600 --> 00:02:59,960 Speaker 1: R1, the new DeepSeek model, for certain types of 55 00:03:00,160 --> 00:03:02,680 Speaker 1: queries where I want to see how it thinks and, 56 00:03:02,840 --> 00:03:06,000 Speaker 1: like, see the logic laid out, and then I can judge, 57 00:03:06,000 --> 00:03:07,760 Speaker 1: like, did that make sense? Do I agree with that?
58 00:03:08,480 --> 00:03:10,880 Speaker 3: So one of the things that seems to be freaking 59 00:03:10,960 --> 00:03:14,799 Speaker 3: people out, as well as the market, is that purportedly 60 00:03:15,480 --> 00:03:19,519 Speaker 3: this was trained at, like, a very low cost, something 61 00:03:19,600 --> 00:03:23,760 Speaker 3: like five point five million dollars for DeepSeek V3, 62 00:03:24,080 --> 00:03:27,160 Speaker 3: although I've seen people erroneously say that the five point 63 00:03:27,200 --> 00:03:30,040 Speaker 3: five million was for all of its R1 models, 64 00:03:30,040 --> 00:03:32,760 Speaker 3: and that's not what it says in the technical paper. 65 00:03:32,840 --> 00:03:35,760 Speaker 3: It was just for V3. But anyway, oh, I 66 00:03:35,760 --> 00:03:38,280 Speaker 3: should mention it also seems like a big chunk of 67 00:03:38,280 --> 00:03:41,320 Speaker 3: it was built on Llama, so they're sort of piggybacking 68 00:03:41,600 --> 00:03:45,520 Speaker 3: off of others' investment. But anyway, five point five million 69 00:03:45,560 --> 00:03:50,320 Speaker 3: dollars to train: is that (a) realistic, and then (b) 70 00:03:50,600 --> 00:03:52,840 Speaker 3: do we have any sense of how they were able 71 00:03:52,920 --> 00:03:53,360 Speaker 3: to do that? 72 00:03:53,720 --> 00:03:55,320 Speaker 1: So we have a very good sense of exactly what 73 00:03:55,360 --> 00:03:58,760 Speaker 1: they did because they're unusually open and they gave us 74 00:03:58,760 --> 00:04:01,120 Speaker 1: technical papers, they tell us what they did.
They still 75 00:04:01,200 --> 00:04:03,400 Speaker 1: hid some parts of the process, especially with getting from 76 00:04:03,640 --> 00:04:05,400 Speaker 1: V3, which was trained for the five point five 77 00:04:05,440 --> 00:04:08,040 Speaker 1: million, to R1, which is the reasoning model, for 78 00:04:08,200 --> 00:04:10,720 Speaker 1: additional millions of dollars, where they tried to make it 79 00:04:10,720 --> 00:04:12,160 Speaker 1: a little bit harder for us to duplicate it by 80 00:04:12,200 --> 00:04:16,479 Speaker 1: not sharing their reinforcement learning techniques. But we shouldn't get 81 00:04:16,480 --> 00:04:18,440 Speaker 1: over anchored or carried away with the five point five 82 00:04:18,440 --> 00:04:20,240 Speaker 1: million dollar number. It's not that it's not real, it's 83 00:04:20,360 --> 00:04:23,880 Speaker 1: very real. But in order to get that ability to 84 00:04:23,880 --> 00:04:26,799 Speaker 1: spend five point five million dollars and get the model 85 00:04:26,839 --> 00:04:28,680 Speaker 1: to pop out, they had to acquire the data, they 86 00:04:28,680 --> 00:04:30,360 Speaker 1: had to hire the engineers, they had to build their 87 00:04:30,400 --> 00:04:33,840 Speaker 1: own cluster, they had to over optimize to the bone 88 00:04:34,040 --> 00:04:36,680 Speaker 1: their cluster because they're having problems with chip access thanks 89 00:04:36,680 --> 00:04:39,320 Speaker 1: to our export controls. And they were training on H800s.
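[Editor's sketch, not part of the recording: a headline figure like the five point five million dollars discussed here is typically a final training run priced as GPU-hours times an hourly rental rate. The GPU-hour count and the two dollar rate below are, as far as we understand it, the ones the V3 technical report uses; the function and constant names are ours. As the speaker stresses, this kind of figure leaves out data, salaries, earlier experiments, and the cluster itself.]

```python
def training_run_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-equivalent cost of a single training run, in US dollars."""
    return gpu_hours * usd_per_gpu_hour


# Figures as we understand them from the V3 technical report (treat as assumptions):
V3_H800_GPU_HOURS = 2_788_000   # total H800 GPU-hours for the final run
ASSUMED_USD_PER_GPU_HOUR = 2.0  # assumed rental price per H800 GPU-hour

cost = training_run_cost(V3_H800_GPU_HOURS, ASSUMED_USD_PER_GPU_HOUR)
print(f"Final-run cost: ${cost / 1e6:.3f} million")
```

[The result lands close to the five point five million quoted in the conversation, which is why the number is best read as the price of one run, not the price of building the lab.]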
90 00:04:40,480 --> 00:04:43,400 Speaker 1: And the way they did this was they did all 91 00:04:43,400 --> 00:04:46,880 Speaker 1: these sorts of mini, little optimizations, including, like, just 92 00:04:46,960 --> 00:04:50,559 Speaker 1: exactly integrating the hardware, the software, everything they were doing 93 00:04:51,000 --> 00:04:53,800 Speaker 1: in order to train as cheaply as possible on fifteen 94 00:04:53,800 --> 00:04:58,640 Speaker 1: trillion tokens and get the same level of performance, or, 95 00:04:58,839 --> 00:05:01,120 Speaker 1: you know, close to the same level of performance, as other 96 00:05:01,160 --> 00:05:04,440 Speaker 1: companies have gotten with much, much more compute. But it 97 00:05:04,480 --> 00:05:05,960 Speaker 1: doesn't mean that you can get your own model for 98 00:05:06,000 --> 00:05:07,760 Speaker 1: five point five million dollars, even though they told you 99 00:05:07,800 --> 00:05:10,200 Speaker 1: a lot of the information. In total, they're spending hundreds 100 00:05:10,200 --> 00:05:11,400 Speaker 1: of millions of dollars to get this result. 101 00:05:11,560 --> 00:05:13,800 Speaker 2: Wait, explain that further. Why does it still take hundreds 102 00:05:13,800 --> 00:05:16,800 Speaker 2: of millions? And does this mean, if it takes hundreds 103 00:05:16,800 --> 00:05:20,039 Speaker 2: of millions of dollars, that the gap between what they're 104 00:05:20,080 --> 00:05:23,000 Speaker 2: able to do versus, say, the American labs is perhaps 105 00:05:23,040 --> 00:05:24,599 Speaker 2: not as wide as maybe people think? 106 00:05:24,880 --> 00:05:28,640 Speaker 1: Well, what DeepSeek is doing is they have less access 107 00:05:28,680 --> 00:05:31,080 Speaker 1: to chips. They can't just buy Nvidia chips the same 108 00:05:31,120 --> 00:05:34,160 Speaker 1: way that, you know, OpenAI or Microsoft or 109 00:05:34,279 --> 00:05:37,599 Speaker 1: Anthropic can buy Nvidia chips.
So instead they had 110 00:05:37,600 --> 00:05:40,880 Speaker 1: to make good use, very, very efficient, killer use of 111 00:05:40,920 --> 00:05:44,600 Speaker 1: the chips that they did have. So they focused on 112 00:05:44,920 --> 00:05:46,960 Speaker 1: all these optimizations and all of these ways that they 113 00:05:47,000 --> 00:05:50,120 Speaker 1: could save on compute. But in order to get there, 114 00:05:50,200 --> 00:05:52,239 Speaker 1: they had to spend a lot of money to figure 115 00:05:52,240 --> 00:05:54,400 Speaker 1: out how to do that and to build the infrastructure 116 00:05:54,400 --> 00:05:57,720 Speaker 1: to do that. And, you know, once they knew what 117 00:05:57,800 --> 00:05:59,680 Speaker 1: to do, it cost them five point five million dollars 118 00:05:59,720 --> 00:06:01,120 Speaker 1: to do it. They've shared a lot of that information, 119 00:06:01,560 --> 00:06:04,720 Speaker 1: and this has dramatically reduced the cost for somebody who 120 00:06:04,720 --> 00:06:06,240 Speaker 1: wants to follow in their footsteps and train a new 121 00:06:06,279 --> 00:06:08,920 Speaker 1: model, because they've shown the way on many of their 122 00:06:08,920 --> 00:06:11,479 Speaker 1: optimizations that people didn't realize they could do or didn't 123 00:06:11,480 --> 00:06:13,440 Speaker 1: realize how to do, which can now very easily 124 00:06:13,480 --> 00:06:16,280 Speaker 1: be copied. But it does not mean that you are 125 00:06:16,320 --> 00:06:18,520 Speaker 1: five point five million dollars away from your own V3. 126 00:06:19,200 --> 00:06:22,320 Speaker 3: So the other thing that is freaking people out is 127 00:06:22,480 --> 00:06:25,040 Speaker 3: the fact that this is open source, right? We all 128 00:06:25,080 --> 00:06:28,960 Speaker 3: remember the days when OpenAI was more open, and now 129 00:06:28,960 --> 00:06:31,479 Speaker 3: it's moved to closed source.
Why do you think they 130 00:06:31,480 --> 00:06:34,080 Speaker 3: did that? And, like, how big a deal is that? 131 00:06:35,560 --> 00:06:37,919 Speaker 1: So this is one of those things where they have 132 00:06:38,040 --> 00:06:40,080 Speaker 1: a story, and you can believe their story or 133 00:06:40,080 --> 00:06:41,960 Speaker 1: not believe their story, but their story is that they are 134 00:06:42,040 --> 00:06:45,800 Speaker 1: essentially ideologically in favor of the idea that everyone should 135 00:06:45,800 --> 00:06:48,919 Speaker 1: have access to the same AI, that AI should be 136 00:06:48,920 --> 00:06:51,960 Speaker 1: shared with the world, especially that China should help pump 137 00:06:51,960 --> 00:06:54,640 Speaker 1: out its own ecosystem and they should help grow all 138 00:06:54,680 --> 00:06:57,120 Speaker 1: of the AI for the betterment of humanity. And they're 139 00:06:57,120 --> 00:06:59,400 Speaker 1: going to get artificial general intelligence and they are going 140 00:06:59,440 --> 00:07:02,440 Speaker 1: to open source that as well, and this is 141 00:07:02,680 --> 00:07:04,440 Speaker 1: the main point of DeepSeek. This is why 142 00:07:04,480 --> 00:07:07,760 Speaker 1: DeepSeek exists. They disclaim even having a business model, really. 143 00:07:08,480 --> 00:07:11,360 Speaker 1: And, you know, they're an outgrowth of a hedge fund, 144 00:07:11,680 --> 00:07:14,720 Speaker 1: and the hedge fund makes money, and maybe they can just 145 00:07:14,720 --> 00:07:17,440 Speaker 1: do this if they choose to do that, or maybe 146 00:07:17,440 --> 00:07:19,680 Speaker 1: they will end up with a different business model.
But 147 00:07:20,640 --> 00:07:23,480 Speaker 1: it is obviously very concerning from a lot of angles 148 00:07:23,520 --> 00:07:26,960 Speaker 1: if you open source increasingly capable models, because, you know, 149 00:07:27,040 --> 00:07:30,520 Speaker 1: artificial general intelligence means something that's, you know, as smart 150 00:07:30,520 --> 00:07:33,280 Speaker 1: and capable as you and I, as a human, and 151 00:07:33,400 --> 00:07:37,320 Speaker 1: perhaps more so. And if you just hand that over 152 00:07:37,520 --> 00:07:40,440 Speaker 1: in open form to anybody in the world who wants 153 00:07:40,480 --> 00:07:44,280 Speaker 1: to do anything with it, then we don't know how 154 00:07:44,360 --> 00:07:48,360 Speaker 1: dangerous that is, but it's existentially risky at some limit 155 00:07:49,120 --> 00:07:51,600 Speaker 1: to unleash things that are smarter and more capable, more 156 00:07:51,600 --> 00:07:54,240 Speaker 1: competitive than us, that are then going to be free 157 00:07:54,280 --> 00:07:57,119 Speaker 1: and loose to, you know, engage in whatever any human 158 00:07:57,160 --> 00:07:57,840 Speaker 1: directs them to do. 159 00:07:58,480 --> 00:08:01,480 Speaker 3: I have a really dumb question, but I hear people 160 00:08:01,520 --> 00:08:05,760 Speaker 3: say artificial general intelligence all the time. AGI, what does 161 00:08:05,800 --> 00:08:06,600 Speaker 3: that actually mean? 162 00:08:07,480 --> 00:08:09,840 Speaker 1: There is a lot of dispute over exactly what that means. 163 00:08:09,840 --> 00:08:12,360 Speaker 1: The words are not used consistently, but it stands for 164 00:08:12,440 --> 00:08:17,680 Speaker 1: artificial general intelligence. Generally, it is understood to mean you 165 00:08:17,760 --> 00:08:20,880 Speaker 1: can do any task that can be done on a 166 00:08:20,920 --> 00:08:24,880 Speaker 1: computer, that can be done cognitively only, as well as 167 00:08:25,080 --> 00:08:25,520 Speaker 1: a human.
168 00:08:26,840 --> 00:08:28,680 Speaker 2: I mean, most of these things do things 169 00:08:28,760 --> 00:08:30,520 Speaker 2: much better than me. I don't know how to code, 170 00:08:30,560 --> 00:08:33,480 Speaker 2: and so, but I get that there are still some things. 171 00:08:33,480 --> 00:08:35,880 Speaker 2: Maybe they wouldn't be as good at proving some of 172 00:08:35,880 --> 00:08:38,400 Speaker 2: the "are you human" tests. Everyone wants to talk about Jevons 173 00:08:38,480 --> 00:08:41,160 Speaker 2: paradox, and so we see Nvidia and Broadcom shares, 174 00:08:41,240 --> 00:08:44,319 Speaker 2: these chip companies, they're getting crumbled today. And one of 175 00:08:44,360 --> 00:08:46,680 Speaker 2: the theories is like, oh no, with all these optimizations and 176 00:08:46,720 --> 00:08:50,520 Speaker 2: so forth, AI researchers will just use those and they'll 177 00:08:50,559 --> 00:08:54,079 Speaker 2: still have max demand for compute, and so it won't 178 00:08:54,120 --> 00:08:56,800 Speaker 2: actually change the ultimate demand for compute. How are you 179 00:08:56,800 --> 00:08:57,720 Speaker 2: thinking about this question? 180 00:08:58,679 --> 00:09:02,040 Speaker 1: So I'm definitely a Jevons bro right now. From the 181 00:09:02,080 --> 00:09:03,680 Speaker 1: perspective of this... 182 00:09:03,720 --> 00:09:06,080 Speaker 2: You don't think it'll have a negative impact on just the 183 00:09:06,120 --> 00:09:07,679 Speaker 2: amount of compute demanded? 184 00:09:08,000 --> 00:09:10,079 Speaker 1: The tweet I sent this morning was: Nvidia down eleven 185 00:09:10,120 --> 00:09:12,400 Speaker 1: percent premarket on news that its chips are highly useful. 186 00:09:14,200 --> 00:09:16,800 Speaker 1: And I believe that what we've shown is that, yes, 187 00:09:16,840 --> 00:09:19,240 Speaker 1: you can get a lot more, in some sense, out 188 00:09:19,320 --> 00:09:22,720 Speaker 1: of each Nvidia chip than you expected.
You can get 189 00:09:22,760 --> 00:09:25,760 Speaker 1: more AI. And if there was a limited amount of 190 00:09:25,800 --> 00:09:29,439 Speaker 1: stuff to do with AI, and once you did that stuff, 191 00:09:29,480 --> 00:09:31,720 Speaker 1: you were done, then that would be a different story. 192 00:09:31,720 --> 00:09:34,560 Speaker 1: But that's very much not the case. As we get 193 00:09:34,720 --> 00:09:37,600 Speaker 1: further along towards AGI, as these AIs get more capable, 194 00:09:38,000 --> 00:09:39,400 Speaker 1: we're going to want to use them for more and 195 00:09:39,440 --> 00:09:42,720 Speaker 1: more things, more and more often, and most importantly, the 196 00:09:42,840 --> 00:09:45,280 Speaker 1: entire revolution of R1 and also OpenAI's 197 00:09:45,440 --> 00:09:49,760 Speaker 1: o1 is inference time compute. What that means is every 198 00:09:49,760 --> 00:09:53,400 Speaker 1: time you ask the question, it's going to use more compute, 199 00:09:53,400 --> 00:09:57,840 Speaker 1: more cycles of GPUs, to think for longer, to basically 200 00:09:57,920 --> 00:10:00,400 Speaker 1: use more tokens or words to figure out what the 201 00:10:00,400 --> 00:10:03,480 Speaker 1: best possible answer is. And this scales, not necessarily 202 00:10:03,559 --> 00:10:06,480 Speaker 1: without limit, but it scales very, very far. So 203 00:10:06,480 --> 00:10:08,960 Speaker 1: OpenAI's new o3 is capable of thinking for, you know, 204 00:10:09,080 --> 00:10:11,760 Speaker 1: many minutes. It's capable of potentially spending, you know, hundreds 205 00:10:11,840 --> 00:10:14,160 Speaker 1: or even in theory thousands of dollars or more on 206 00:10:14,280 --> 00:10:18,840 Speaker 1: an individual query.
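[Editor's sketch, not part of the recording: the Jevons paradox argument in this exchange can be illustrated with a constant-elasticity demand curve. Every number below is hypothetical, and the elasticity values are assumptions chosen for illustration, not estimates from the episode. The point is only that when demand is elastic enough (elasticity above one), making each query cheaper raises total spending on compute instead of lowering it.]

```python
def demand(price: float, base_price: float, base_quantity: float,
           elasticity: float) -> float:
    """Constant-elasticity demand: quantity scales as (price / base_price) ** -elasticity."""
    return base_quantity * (price / base_price) ** (-elasticity)


def total_spend(price: float, base_price: float, base_quantity: float,
                elasticity: float) -> float:
    """Total spending is price times quantity demanded at that price."""
    return price * demand(price, base_price, base_quantity, elasticity)


BASE_PRICE = 10.0            # hypothetical dollars per reasoning-heavy query
BASE_QUERIES = 1_000.0       # hypothetical queries per day at that price
NEW_PRICE = BASE_PRICE / 10  # a tenfold drop in the price of a query

for elasticity in (0.5, 1.0, 1.5):  # assumed elasticities, for illustration only
    before = total_spend(BASE_PRICE, BASE_PRICE, BASE_QUERIES, elasticity)
    after = total_spend(NEW_PRICE, BASE_PRICE, BASE_QUERIES, elasticity)
    print(f"elasticity {elasticity}: total spend changes by {after / before:.2f}x")
```

[With an elasticity below one, the price cut shrinks total spend; at exactly one it is unchanged; above one it grows. The "Jevons bro" position amounts to a bet that demand for AI compute sits in the last case.]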
And if you knock that down by an 207 00:10:18,920 --> 00:10:21,560 Speaker 1: order of magnitude, that almost certainly gets you to use 208 00:10:21,559 --> 00:10:23,760 Speaker 1: it more for a given result, not use it less, 209 00:10:23,760 --> 00:10:27,959 Speaker 1: because that is in effect starting to get prohibitive. And over time, 210 00:10:28,559 --> 00:10:31,120 Speaker 1: you know, if you have the ability to spend a 211 00:10:31,120 --> 00:10:33,760 Speaker 1: remarkably little amount of money and then get things like virtual 212 00:10:33,800 --> 00:10:37,600 Speaker 1: employees and abilities to answer any question under the sun, yeah, 213 00:10:37,640 --> 00:10:41,000 Speaker 1: there's basically unlimited demand to do that, or to scale 214 00:10:41,080 --> 00:10:43,439 Speaker 1: up the quality of the answers as the price drops. 215 00:10:44,000 --> 00:10:47,720 Speaker 1: So I basically expect that as fast as Nvidia 216 00:10:47,800 --> 00:10:50,920 Speaker 1: can manufacture chips and we can put them into data 217 00:10:50,960 --> 00:10:53,559 Speaker 1: centers and give them electrical power, people will be happy 218 00:10:53,559 --> 00:10:54,840 Speaker 1: to buy those chips. 219 00:10:54,920 --> 00:10:58,640 Speaker 3: At the risk of angering the Jevons Paradox bros, just 220 00:10:58,679 --> 00:11:01,440 Speaker 3: to push on the point a little bit more: so, 221 00:11:01,520 --> 00:11:04,560 Speaker 3: my understanding of DeepSeek is that one of the reasons 222 00:11:04,640 --> 00:11:09,880 Speaker 3: it's special is because it doesn't rely on, like, specialized components, 223 00:11:09,960 --> 00:11:12,840 Speaker 3: custom operators, and so it can work on a variety 224 00:11:12,840 --> 00:11:16,880 Speaker 3: of GPUs.
Is there a scenario where, you know, AI 225 00:11:17,160 --> 00:11:21,240 Speaker 3: becomes so free and plentiful, which could in theory be 226 00:11:21,320 --> 00:11:25,000 Speaker 3: good for Nvidia, but at the same time, because it's 227 00:11:25,040 --> 00:11:28,000 Speaker 3: easy to run on a bunch of other GPUs, people 228 00:11:28,120 --> 00:11:32,480 Speaker 3: start using, you know, more, like, ASIC chips, like customized 229 00:11:32,559 --> 00:11:34,480 Speaker 3: chips for a specific purpose? 230 00:11:35,320 --> 00:11:37,880 Speaker 1: I mean, in the long run, we will almost certainly 231 00:11:37,920 --> 00:11:40,319 Speaker 1: see specialized inference chips, whether from Nvidia or 232 00:11:40,320 --> 00:11:43,200 Speaker 1: from someone else, and we will almost certainly see various 233 00:11:43,200 --> 00:11:45,880 Speaker 1: different advancements, so that today's chips are going to be obsolete 234 00:11:46,200 --> 00:11:48,400 Speaker 1: in a few years. That's how AI works, right? There's 235 00:11:48,400 --> 00:11:52,400 Speaker 1: all these rapid advancements. But, you know, I think 236 00:11:52,520 --> 00:11:55,160 Speaker 1: Nvidia is in a very, very good position to take advantage 237 00:11:55,160 --> 00:11:57,600 Speaker 1: of all of this. I certainly don't think that, like, 238 00:11:57,679 --> 00:12:01,000 Speaker 1: "you'll just use your laptop to run the best AGIs 239 00:12:01,240 --> 00:12:03,840 Speaker 1: and therefore we don't have to worry about buying GPUs" 240 00:12:04,080 --> 00:12:07,040 Speaker 1: is a proposition. It's certainly possible that rivals will come 241 00:12:07,080 --> 00:12:09,160 Speaker 1: up with superior chips. That's always possible. Nvidia does 242 00:12:09,160 --> 00:12:11,880 Speaker 1: not have a monopoly, but Nvidia certainly seems to 243 00:12:11,920 --> 00:12:13,040 Speaker 1: be in a dominant position right now. 244 00:12:29,640 --> 00:12:31,360 Speaker 2: It seems to me.
I mean, I know there's others, 245 00:12:31,360 --> 00:12:33,040 Speaker 2: but it seems to me in the US there's, like, 246 00:12:33,240 --> 00:12:38,559 Speaker 2: three main AI producers of models that people know about. 247 00:12:38,600 --> 00:12:43,400 Speaker 2: There's OpenAI, there's Claude, and then there's Meta with Llama. 248 00:12:43,480 --> 00:12:46,760 Speaker 2: And it's worth noting that Meta is green today, that 249 00:12:46,800 --> 00:12:48,760 Speaker 2: the stock is actually up, as of the time I'm 250 00:12:48,760 --> 00:12:51,640 Speaker 2: talking about this, one point one percent. Just go through 251 00:12:51,720 --> 00:12:55,480 Speaker 2: each one real quickly, how the sort of DeepSeek 252 00:12:55,640 --> 00:12:58,679 Speaker 2: shock affects them and their viability and where they stand today. 253 00:12:59,160 --> 00:13:01,400 Speaker 1: I think the most amazing thing about your question is 254 00:13:01,400 --> 00:13:02,520 Speaker 1: that you forgot about Google. 255 00:13:02,960 --> 00:13:05,000 Speaker 2: Oh yeah, right, yeah, that's very telling. 256 00:13:05,280 --> 00:13:10,320 Speaker 1: But everyone else has forgotten about it. Yeah. Surprisingly, Gemini Flash 257 00:13:10,400 --> 00:13:13,480 Speaker 1: Thinking, their version of o1 and R1, got updated 258 00:13:13,520 --> 00:13:16,520 Speaker 1: a few days ago, and there are many reports that 259 00:13:16,559 --> 00:13:20,319 Speaker 1: it's actually very good now and potentially competitive, and effectively 260 00:13:20,320 --> 00:13:22,360 Speaker 1: it's free to use for a lot of people on 261 00:13:22,600 --> 00:13:26,240 Speaker 1: AI Studio, but nobody I know has taken the time 262 00:13:26,280 --> 00:13:28,320 Speaker 1: to check and find out how good it is because 263 00:13:28,320 --> 00:13:30,280 Speaker 1: we've all been too obsessed with being DeepSeek bros.
264 00:13:31,720 --> 00:13:34,319 Speaker 1: Google's had its, like, rhetorical lunch eaten over and over 265 00:13:34,320 --> 00:13:36,160 Speaker 1: and over again in December. Like, OpenAI would come 266 00:13:36,200 --> 00:13:37,960 Speaker 1: up with advance after advance after advance, then Google would 267 00:13:38,000 --> 00:13:40,080 Speaker 1: have advance after advance after advance, and Google's would be 268 00:13:40,400 --> 00:13:42,679 Speaker 1: seemingly, actually, if anything, more impressive. And yet everyone would 269 00:13:42,720 --> 00:13:44,160 Speaker 1: always just talk about OpenAI's, so this is 270 00:13:44,160 --> 00:13:46,640 Speaker 1: not even new. Something is going on there. So in 271 00:13:46,760 --> 00:13:50,400 Speaker 1: terms of OpenAI, OpenAI should be very nervous 272 00:13:50,800 --> 00:13:53,920 Speaker 1: in some sense, of course, because they have the reasoning models, 273 00:13:53,920 --> 00:13:55,600 Speaker 1: and now the reasoning model has been copied much more 274 00:13:55,640 --> 00:13:59,080 Speaker 1: effectively than previously, and the competition is a hell of 275 00:13:59,080 --> 00:14:02,120 Speaker 1: a lot cheaper than what OpenAI is charging, so it's a 276 00:14:02,120 --> 00:14:04,400 Speaker 1: direct threat to their business model for obvious reasons, and 277 00:14:04,440 --> 00:14:07,280 Speaker 1: it looks like their lead in reasoning models is smaller 278 00:14:07,280 --> 00:14:10,320 Speaker 1: and faster to undo than you would expect. Because if 279 00:14:10,400 --> 00:14:12,760 Speaker 1: DeepSeek can do it, of course Anthropic and Google, 280 00:14:13,320 --> 00:14:14,800 Speaker 1: you know, can do it, and everyone else can do 281 00:14:14,800 --> 00:14:18,840 Speaker 1: it as well. Anthropic, which produces Claude, has not 282 00:14:18,920 --> 00:14:21,960 Speaker 1: yet produced their own reasoning model.
They clearly are operating 283 00:14:22,160 --> 00:14:25,000 Speaker 1: under a shortage of compute in some sense, so it's 284 00:14:25,080 --> 00:14:27,400 Speaker 1: entirely possible that they have chosen not to launch a 285 00:14:27,440 --> 00:14:30,240 Speaker 1: reasoning model even though they could, or not focused on 286 00:14:30,280 --> 00:14:33,240 Speaker 1: training one as quickly as possible until they've addressed this problem. 287 00:14:33,320 --> 00:14:36,760 Speaker 1: They're continuously taking investment. We should expect them to solve 288 00:14:36,760 --> 00:14:40,680 Speaker 1: their problems over time, but they seem like they should 289 00:14:40,680 --> 00:14:43,560 Speaker 1: be less directly concerned, because they're less of a directly 290 00:14:43,560 --> 00:14:46,440 Speaker 1: competitive product in some sense. But also they tend to 291 00:14:46,520 --> 00:14:49,600 Speaker 1: market to effectively much more aware people, so their people 292 00:14:49,600 --> 00:14:51,400 Speaker 1: will also know about DeepSeek and they will have 293 00:14:51,440 --> 00:14:54,680 Speaker 1: a choice to make. If I was Meta, I would 294 00:14:54,680 --> 00:14:58,000 Speaker 1: be far more worried, especially if I was on their 295 00:14:58,040 --> 00:15:01,200 Speaker 1: GenAI team and wanted to keep my job, because Meta's 296 00:15:01,240 --> 00:15:03,920 Speaker 1: lunch has been eaten massively here, right? Meta with Llama 297 00:15:04,080 --> 00:15:07,560 Speaker 1: had the best open models, and all the best open 298 00:15:07,600 --> 00:15:12,600 Speaker 1: models were effectively fine tunes of Llama, and now 299 00:15:12,640 --> 00:15:15,360 Speaker 1: DeepSeek comes out, and this is absolutely not in any 300 00:15:15,400 --> 00:15:17,600 Speaker 1: way a fine tune of Llama.
This is their own product, 301 00:15:18,000 --> 00:15:20,920 Speaker 1: and V3 was already blowing everything that Meta had 302 00:15:20,920 --> 00:15:23,360 Speaker 1: out of the water. R1, there are reports that 303 00:15:23,400 --> 00:15:25,680 Speaker 1: it's better than their new version that they're training now, 304 00:15:25,680 --> 00:15:28,560 Speaker 1: it's better than Llama 4, which I would expect to 305 00:15:28,600 --> 00:15:33,320 Speaker 1: be true. And so there's no point in releasing an 306 00:15:33,360 --> 00:15:36,560 Speaker 1: inferior open model if everyone in the open model community will 307 00:15:36,640 --> 00:15:38,680 Speaker 1: just be like, why don't I just use DeepSeek? 308 00:15:38,720 --> 00:15:42,000 Speaker 2: Tracy, it's interesting that, as Zvi said, the people who should 309 00:15:42,000 --> 00:15:45,520 Speaker 2: be nervous are the employees of Meta, not Meta itself, 310 00:15:45,520 --> 00:15:48,840 Speaker 2: because Meta is up, and so you gotta wonder. It's like, well, 311 00:15:48,840 --> 00:15:50,800 Speaker 2: maybe they don't, I don't know, maybe they don't need 312 00:15:50,840 --> 00:15:54,400 Speaker 2: to invest as much in their own open source AI 313 00:15:54,440 --> 00:15:56,000 Speaker 2: if there's a better one out there now. The stock 314 00:15:56,120 --> 00:15:56,320 Speaker 2: is up. 315 00:15:56,320 --> 00:16:00,000 Speaker 1: Anyway, the market has been very strange from my perspective 316 00:16:00,080 --> 00:16:02,080 Speaker 1: on how it reacts to different things that Meta does. 317 00:16:02,120 --> 00:16:04,360 Speaker 1: For a while, Meta would announce, we're spending more on AI, 318 00:16:04,680 --> 00:16:06,880 Speaker 1: we're investing in all these data centers, we're training all 319 00:16:06,880 --> 00:16:09,240 Speaker 1: of these models, and the market would go, what are 320 00:16:09,280 --> 00:16:12,480 Speaker 1: you doing?
This is another metaverse or something, and we're 321 00:16:12,480 --> 00:16:14,240 Speaker 1: gonna hammer your stock and we're gonna drag you down. 322 00:16:14,640 --> 00:16:16,920 Speaker 1: And then with the most recent sixty five billion dollar 323 00:16:16,920 --> 00:16:20,840 Speaker 1: announced spend, then Meta was up. Presumably, they're gonna 324 00:16:20,920 --> 00:16:24,040 Speaker 1: use it mostly for inference, effectively, in a lot of 325 00:16:24,040 --> 00:16:27,440 Speaker 1: scenarios, because they have these massive inference costs, wanting 326 00:16:27,480 --> 00:16:31,360 Speaker 1: to put AI all over Facebook and Instagram. So, you know, 327 00:16:31,520 --> 00:16:33,480 Speaker 1: if anything, like, you know, I think the market might 328 00:16:33,480 --> 00:16:35,680 Speaker 1: be speculating that this means that they will know how 329 00:16:35,680 --> 00:16:37,920 Speaker 1: to train better Llamas that are cheaper to operate, and 330 00:16:38,160 --> 00:16:40,400 Speaker 1: their costs will go down, and then they'll be in 331 00:16:40,440 --> 00:16:43,200 Speaker 1: a better position, and that theory isn't crazy. 332 00:16:42,960 --> 00:16:48,800 Speaker 3: Since we all just collectively remembered Google, I have 333 00:16:48,880 --> 00:16:50,960 Speaker 3: a question that's sort of been in the 334 00:16:51,000 --> 00:16:53,000 Speaker 3: back of my mind. I think Joe has brought 335 00:16:53,040 --> 00:16:56,760 Speaker 3: this up before as well. But, like, when Google debuted, 336 00:16:57,800 --> 00:17:00,840 Speaker 3: it took years and years and years for people to 337 00:17:00,920 --> 00:17:04,119 Speaker 3: sort of catch up to the search function, and actually 338 00:17:04,200 --> 00:17:07,400 Speaker 3: no one ever really caught up, right? So Google has 339 00:17:07,440 --> 00:17:10,440 Speaker 3: like dominated for years.
Why is it when it comes 340 00:17:10,440 --> 00:17:16,359 Speaker 3: to these chatbots there aren't like higher, wider moats around 341 00:17:16,480 --> 00:17:17,359 Speaker 3: these businesses? 342 00:17:18,119 --> 00:17:22,679 Speaker 1: So one reason is that everyone's training on roughly the 343 00:17:22,720 --> 00:17:26,440 Speaker 1: same data, meaning the entire Internet and all of human knowledge, 344 00:17:26,560 --> 00:17:28,080 Speaker 1: so it's very hard to get that much of a 345 00:17:28,080 --> 00:17:30,800 Speaker 1: permanent data edge there unless you're creating synthetic data off 346 00:17:30,800 --> 00:17:32,960 Speaker 1: of your own models, which is what OpenAI is 347 00:17:33,200 --> 00:17:36,720 Speaker 1: plausibly doing now. Another reason is because everybody is scaling 348 00:17:36,720 --> 00:17:39,359 Speaker 1: as fast as possible and adding zeros to everything on 349 00:17:39,400 --> 00:17:42,600 Speaker 1: a periodic basis. In calendar time, it doesn't take that 350 00:17:42,680 --> 00:17:45,760 Speaker 1: long before your rival is going to have access to 351 00:17:45,800 --> 00:17:48,720 Speaker 1: more compute than you had, and they're copying your techniques 352 00:17:48,720 --> 00:17:51,400 Speaker 1: more aggressively. There's just a lot less secret sauce. There are 353 00:17:51,440 --> 00:17:54,480 Speaker 1: only so many algorithms. Fundamentally, everyone is relying on the 354 00:17:54,520 --> 00:17:56,399 Speaker 1: scaling laws. It's called the bitter lesson: the idea 355 00:17:56,440 --> 00:17:58,440 Speaker 1: that you know, you just scale more, you just use 356 00:17:58,440 --> 00:18:00,600 Speaker 1: more compute, you just use more data, you just use 357 00:18:00,680 --> 00:18:03,400 Speaker 1: more parameters. And DeepSeek is saying, maybe you don't. 
358 00:18:03,560 --> 00:18:06,159 Speaker 1: You can do more optimizations, you can get around this 359 00:18:06,240 --> 00:18:10,399 Speaker 1: problem and still get a superior model. But mostly, yeah, 360 00:18:10,480 --> 00:18:13,159 Speaker 1: there's been a lot of just, I can catch up 361 00:18:13,160 --> 00:18:16,119 Speaker 1: to you by copying what you did. Also, I 362 00:18:16,119 --> 00:18:19,399 Speaker 1: can see the outputs, right? I can query your model, 363 00:18:19,640 --> 00:18:22,160 Speaker 1: and I can use your model's outputs to actively train 364 00:18:22,560 --> 00:18:27,119 Speaker 1: my model. And you see this in things like most 365 00:18:27,160 --> 00:18:29,760 Speaker 1: models that get trained. You ask them who trained you, 366 00:18:29,840 --> 00:18:32,960 Speaker 1: and they will often say, oh, I'm from OpenAI. 367 00:18:33,040 --> 00:18:35,520 Speaker 2: The internet has gotten so weird. I just, the internet 368 00:18:35,600 --> 00:18:38,160 Speaker 2: is so weird. Zvi Mowshowitz, thank you so much 369 00:18:38,240 --> 00:18:41,159 Speaker 2: for running over to Odd Lots and helping us 370 00:18:41,200 --> 00:18:44,320 Speaker 2: record this emergency pod on the DeepSeek selloff. 371 00:18:44,320 --> 00:18:45,000 Speaker 2: It was fantastic. 372 00:18:45,080 --> 00:18:58,600 Speaker 1: All right, thank you. 373 00:18:58,640 --> 00:19:00,639 Speaker 2: Tracy, I love talking to Zvi. We've got to just sort of 374 00:19:00,680 --> 00:19:03,880 Speaker 2: make him our AI, or our AI guy. 375 00:19:04,119 --> 00:19:06,120 Speaker 3: I mean, to be honest, we could probably have him 376 00:19:06,160 --> 00:19:09,280 Speaker 3: back on again because there's gonna be stuff happening. 377 00:19:09,480 --> 00:19:12,359 Speaker 2: Maybe we will, and obviously we could go a 378 00:19:12,400 --> 00:19:15,280 Speaker 2: lot longer. This is a really exciting story. 
This is 379 00:19:15,320 --> 00:19:18,360 Speaker 2: a really exciting story, and things are just getting really 380 00:19:18,400 --> 00:19:19,200 Speaker 2: weird these days. 381 00:19:19,240 --> 00:19:22,320 Speaker 3: It is kind of crazy how fast all of this is. Yep. 382 00:19:22,840 --> 00:19:24,960 Speaker 3: And then the other thing I would say is just, 383 00:19:25,320 --> 00:19:28,560 Speaker 3: the Bitter Lesson: great name for a band. 384 00:19:29,119 --> 00:19:32,680 Speaker 2: Oh, totally, totally great. Maybe when we do our AI 385 00:19:32,840 --> 00:19:36,040 Speaker 2: themed prog rock band. True, yes, that could be our name. 386 00:19:36,119 --> 00:19:38,040 Speaker 3: Yes, let's do that. Okay, shall we leave it there? 387 00:19:38,119 --> 00:19:38,840 Speaker 2: Let's leave it there. 388 00:19:39,160 --> 00:19:41,879 Speaker 3: This has been another episode of the Odd Lots podcast. 389 00:19:42,000 --> 00:19:45,520 Speaker 3: I'm Tracy Alloway. You can follow me at Tracy Alloway. 390 00:19:45,280 --> 00:19:48,320 Speaker 2: And I'm Joe Weisenthal. You can follow me at The Stalwart. 391 00:19:48,560 --> 00:19:52,000 Speaker 2: Follow our guest Zvi Mowshowitz, he's at TheZvi. Also definitely 392 00:19:52,119 --> 00:19:54,440 Speaker 2: check out his free Substack. It's a must read 393 00:19:54,520 --> 00:19:57,760 Speaker 2: for me. Don't Worry About the Vase, really great stuff 394 00:19:57,800 --> 00:20:00,639 Speaker 2: every single day. Follow our producers Carmen Rodriguez at 395 00:20:00,720 --> 00:20:03,879 Speaker 2: Carmen Arman, Dashiell Bennett at Dashbot, and Kail Brooks 396 00:20:03,880 --> 00:20:06,960 Speaker 2: at Kail Brooks. For more Odd Lots content, go to Bloomberg dot 397 00:20:07,000 --> 00:20:09,840 Speaker 2: com slash oddlots. 
We have transcripts, a blog, and a newsletter, 398 00:20:10,040 --> 00:20:11,960 Speaker 2: and you can chat about all of these topics twenty 399 00:20:11,960 --> 00:20:15,800 Speaker 2: four seven in our Discord, discord dot gg slash oddlots. 400 00:20:15,840 --> 00:20:17,920 Speaker 3: Maybe we'll get Zvi to do a Q and A 401 00:20:18,040 --> 00:20:21,200 Speaker 3: in there. Oh yeah, that'd be great. And if 402 00:20:21,240 --> 00:20:24,119 Speaker 3: you enjoy Odd Lots, if you like it when we roll 403 00:20:24,160 --> 00:20:27,800 Speaker 3: out these emergency episodes, then please leave us a positive 404 00:20:27,840 --> 00:20:31,119 Speaker 3: review on your favorite platform. Thanks for listening.