WEBVTT - The AI Model That Tanked the Stock Market 0:00:02.480 --> 0:00:07.040 Bloomberg Audio Studios, Podcasts, Radio News. 0:00:18.040 --> 0:00:21.919 Hello and welcome to another episode of the Odd Lots podcast. 0:00:22.000 --> 0:00:24.480 I'm Joe Weisenthal and I'm Tracy Alloway. 0:00:24.560 --> 0:00:26.200 Tracy, the DeepSeek sell-off. 0:00:27.200 --> 0:00:30.240 That's right, it's pretty deep. Has anyone made that joke yet? 0:00:30.200 --> 0:00:31.080 We're in DeepSeek? 0:00:31.240 --> 0:00:33.600 Yeah, I don't think anyone's made that joke. 0:00:33.880 --> 0:00:36.760 I will say, like, you know it's bad in markets 0:00:36.800 --> 0:00:40.000 when all the headlines are about standard deviations. Yes, right. 0:00:40.040 --> 0:00:43.240 And then you know it's really bad when you see 0:00:43.280 --> 0:00:46.000 people start to say it's not a crash, it's a 0:00:46.040 --> 0:00:49.159 healthy correction. Yes, that's the real cope. 0:00:49.360 --> 0:00:52.280 But just for, like, real scene setting: you know, we've 0:00:52.320 --> 0:00:55.520 done some very timely interviews about tech concentration in the 0:00:55.560 --> 0:00:57.840 market lately and how so much of the market is 0:00:57.880 --> 0:01:02.000 this big concentrated bet on AI, et cetera. Anyway, on Monday 0:01:02.080 --> 0:01:04.120 (I think people will be listening to this on Tuesday), 0:01:04.560 --> 0:01:07.720 markets got clobbered. Nvidia, one of the big winners, 0:01:07.800 --> 0:01:10.120 as of the time I'm talking about this, three thirty 0:01:10.120 --> 0:01:14.000 pm on Monday, is down seventeen percent. We're talking major losses 0:01:14.080 --> 0:01:17.640 really across the tech complex. Basically, it seems to 0:01:17.680 --> 0:01:21.839 be catalyzed by the introduction of this high-performance, open 0:01:21.920 --> 0:01:26.199 source Chinese AI model called DeepSeek. It was born, 0:01:26.319 --> 0:01:28.360 from what we know, out of a hedge fund.
Apparently 0:01:28.440 --> 0:01:30.959 it was very cheap to train, very cheap to build. 0:01:31.440 --> 0:01:34.640 You know, the tech constraints at this point didn't seem 0:01:34.680 --> 0:01:36.000 to be much of a problem. They may be a 0:01:36.000 --> 0:01:38.720 problem going forward. But yes, here is the entire 0:01:38.760 --> 0:01:41.080 market betting on a lot of companies making AI, and 0:01:41.160 --> 0:01:45.120 there are now concerns about, of course, a cheap Chinese competitor. 0:01:45.360 --> 0:01:48.520 I just realized, Joe, this is actually your fault, isn't it? 0:01:49.040 --> 0:01:51.160 This last week you wrote that you were a 0:01:51.200 --> 0:01:55.360 DeepSeek AI bro, and look what you've done. You've wiped five 0:01:55.480 --> 0:01:58.240 hundred and sixty billion dollars off of Nvidia's market cap. 0:01:58.440 --> 0:02:01.440 Yeah, might be, that's you. Anyway, one of the interesting 0:02:01.520 --> 0:02:03.480 questions, though, is that this was sort of announced in 0:02:03.520 --> 0:02:06.920 a white paper in December. Why did it take 0:02:07.040 --> 0:02:09.799 until January twenty-seventh for it to really freak people out? 0:02:10.040 --> 0:02:13.200 Big questions. Anyway, let's jump right into it. We really 0:02:13.240 --> 0:02:16.240 do have the perfect guest, someone who was here for 0:02:16.280 --> 0:02:19.640 our Election Eve special, a guy who knows all about 0:02:20.120 --> 0:02:23.320 numbers and AI and quant stuff, and he writes a 0:02:23.360 --> 0:02:26.680 Substack that has become for me a daily absolute must 0:02:26.720 --> 0:02:29.919 read, where he writes an extraordinary amount. I don't even 0:02:29.960 --> 0:02:31.560 know how he writes so much on a given day. 0:02:31.840 --> 0:02:34.160 We're going to be speaking with Zvi Mowshowitz. He is 0:02:34.240 --> 0:02:37.280 the author of the Don't Worry About the Vase blog, 0:02:37.400 --> 0:02:41.480 or Substack. Zvi, you're also a DeepSeek AI bro.
You've 0:02:41.520 --> 0:02:42.359 switched to using that? 0:02:43.000 --> 0:02:46.519 So I use a wide variety of different AIs. So 0:02:46.639 --> 0:02:49.600 I will use Claude from Anthropic, I will use o1 from 0:02:49.680 --> 0:02:53.600 ChatGPT, from OpenAI. I'll use Gemini sometimes, and 0:02:53.639 --> 0:02:56.520 I'll use Perplexity for web searches. But yeah, I'll use 0:02:56.600 --> 0:02:59.960 R1, the new DeepSeek model, for certain types of 0:03:00.160 --> 0:03:02.680 queries where I want to see how it thinks and, 0:03:02.840 --> 0:03:06.000 like, see the logic laid out, and then I can judge, 0:03:06.000 --> 0:03:07.760 like, did that make sense? Do I agree with that? 0:03:08.480 --> 0:03:10.880 So one of the things that seems to be freaking 0:03:10.960 --> 0:03:14.799 people out, as well as the market, is that purportedly 0:03:15.480 --> 0:03:19.519 this was trained at, like, a very low cost, something 0:03:19.600 --> 0:03:23.760 like five point five million dollars for DeepSeek V3, 0:03:24.080 --> 0:03:27.160 although I've seen people erroneously say that the five point 0:03:27.200 --> 0:03:30.040 five million was for all of its R1 models, 0:03:30.040 --> 0:03:32.760 and that's not what it says in the technical paper. 0:03:32.840 --> 0:03:35.760 It was just for V3. But anyway, oh, I 0:03:35.760 --> 0:03:38.280 should mention it also seems like a big chunk of 0:03:38.280 --> 0:03:41.320 it was built on Llama, so they're sort of piggybacking 0:03:41.600 --> 0:03:45.520 off of others' investment. But anyway, five point five million 0:03:45.560 --> 0:03:50.320 dollars to train: is that, A, realistic, and then, B, 0:03:50.600 --> 0:03:52.840 do we have any sense of how they were able 0:03:52.920 --> 0:03:53.360 to do that? 0:03:53.720 --> 0:03:55.320 So we have a very good sense of exactly what 0:03:55.360 --> 0:03:58.760 they did because they're unusually open and they gave us 0:03:58.760 --> 0:04:01.120 technical papers. They tell us what they did.
They still 0:04:01.200 --> 0:04:03.400 hid some parts of the process, especially with getting from 0:04:03.640 --> 0:04:05.400 V3, which was trained for the five point five 0:04:05.440 --> 0:04:08.040 million, to R1, which is the reasoning model, for 0:04:08.200 --> 0:04:10.720 additional millions of dollars, where they tried to make it 0:04:10.720 --> 0:04:12.160 a little bit harder for us to duplicate it by 0:04:12.200 --> 0:04:16.479 not sharing their reinforcement learning techniques. But we shouldn't get 0:04:16.480 --> 0:04:18.440 over-anchored or carried away with the five point five 0:04:18.440 --> 0:04:20.240 million dollar number. It's not that it's not real; it's 0:04:20.360 --> 0:04:23.880 very real. But in order to get that ability to 0:04:23.880 --> 0:04:26.799 spend five point five million dollars and get the model 0:04:26.839 --> 0:04:28.680 to pop out, they had to acquire the data, they 0:04:28.680 --> 0:04:30.360 had to hire the engineers, they had to build their 0:04:30.400 --> 0:04:33.840 own cluster, they had to over-optimize their cluster to the bone 0:04:34.040 --> 0:04:36.680 because they're having problems with chip access thanks 0:04:36.680 --> 0:04:39.320 to our export controls. And they were training on H800s. 0:04:40.480 --> 0:04:43.400 And the way they did this was they did all 0:04:43.400 --> 0:04:46.880 these sorts of mini optimizations, little optimizations, including, like, just 0:04:46.960 --> 0:04:50.559 exactly integrating the hardware, the software, everything they were doing 0:04:51.000 --> 0:04:53.800 in order to train as cheaply as possible on fifteen 0:04:53.800 --> 0:04:58.640 trillion tokens and get the same level of performance, or, 0:04:58.839 --> 0:05:01.120 you know, close to the same level of performance, as other 0:05:01.160 --> 0:05:04.440 companies have gotten with much, much more compute.
But it 0:05:04.480 --> 0:05:05.960 doesn't mean that you can get your own model for 0:05:06.000 --> 0:05:07.760 five point five million dollars, even though they told you 0:05:07.800 --> 0:05:10.200 a lot of the information. In total, they're spending hundreds 0:05:10.200 --> 0:05:11.400 of millions of dollars to get this result. 0:05:11.560 --> 0:05:13.800 Wait, explain that further. Why does it still take hundreds 0:05:13.800 --> 0:05:16.800 of millions? And does this mean, if it takes hundreds 0:05:16.800 --> 0:05:20.039 of millions of dollars, that the gap between what they're 0:05:20.080 --> 0:05:23.000 able to do versus, say, the American labs is perhaps 0:05:23.040 --> 0:05:24.599 not as wide as maybe people think? 0:05:24.880 --> 0:05:28.640 Well, what DeepSeek is doing is they have less access 0:05:28.680 --> 0:05:31.080 to chips. They can't just buy Nvidia chips the same 0:05:31.120 --> 0:05:34.160 way that, you know, OpenAI or Microsoft or 0:05:34.279 --> 0:05:37.599 Anthropic can buy Nvidia chips. So instead they had 0:05:37.600 --> 0:05:40.880 to make good use, very, very efficient, killer use, of 0:05:40.920 --> 0:05:44.600 the chips that they did have. So they focused on 0:05:44.920 --> 0:05:46.960 all these optimizations and all of these ways that they 0:05:47.000 --> 0:05:50.120 could save on compute. But in order to get there, 0:05:50.200 --> 0:05:52.239 they had to spend a lot of money to figure 0:05:52.240 --> 0:05:54.400 out how to do that and to build the infrastructure 0:05:54.400 --> 0:05:57.720 to do that. And, you know, once they knew what 0:05:57.800 --> 0:05:59.680 to do, it cost them five point five million dollars 0:05:59.720 --> 0:06:01.120 to do it.
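A rough sketch of the arithmetic behind the five point five million dollar figure discussed above. The GPU-hour count and rental rate are assumptions recalled from the DeepSeek V3 technical report rather than anything stated in this conversation, and the total program budget is a purely hypothetical placeholder for the "hundreds of millions" mentioned here.

```python
# Back-of-the-envelope cost of one final training run versus a whole program.
# ASSUMPTIONS (not stated in the conversation):
#   - GPU_HOURS and USD_PER_GPU_HOUR are figures recalled from the DeepSeek V3
#     technical report; treat them as approximate.
#   - HYPOTHETICAL_TOTAL_USD is an illustrative stand-in for "hundreds of
#     millions"; it is not a reported number.

GPU_HOURS = 2_788_000                 # H800 GPU-hours for the final run (assumed)
USD_PER_GPU_HOUR = 2.00               # assumed rental price per GPU-hour
HYPOTHETICAL_TOTAL_USD = 300_000_000  # hypothetical all-in spend


def final_run_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of renting the GPUs for a single training run."""
    return gpu_hours * usd_per_gpu_hour


def share_of_total(run_cost: float, total_cost: float) -> float:
    """Fraction of the overall spend that the final run represents."""
    return run_cost / total_cost


run_cost = final_run_cost(GPU_HOURS, USD_PER_GPU_HOUR)
share = share_of_total(run_cost, HYPOTHETICAL_TOTAL_USD)

print(f"Final run: ${run_cost:,.0f}")
print(f"Share of hypothetical total: {share:.1%}")
```

Under these assumptions the final run comes to under two percent of the total, which is the point being made: the headline number covers the last run, not the data, engineers, cluster, and experimentation that came before it.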
They've shared a lot of that information, 0:06:01.560 --> 0:06:04.720 and this has dramatically reduced the cost for somebody who 0:06:04.720 --> 0:06:06.240 wants to follow in their footsteps and train a new 0:06:06.279 --> 0:06:08.920 model, because they've shown the way on many of their 0:06:08.920 --> 0:06:11.479 optimizations that people didn't realize they could do or didn't 0:06:11.480 --> 0:06:13.440 realize how to do. Those can now very easily 0:06:13.480 --> 0:06:16.280 be copied. But it does not mean that you are 0:06:16.320 --> 0:06:18.520 five point five million dollars away from your own V3. 0:06:19.200 --> 0:06:22.320 So the other thing that is freaking people out is 0:06:22.480 --> 0:06:25.040 the fact that this is open source, right? We all 0:06:25.080 --> 0:06:28.960 remember the days when OpenAI was more open, and now 0:06:28.960 --> 0:06:31.479 it's moved to closed source. Why do you think they 0:06:31.480 --> 0:06:34.080 did that? And, like, how big a deal is that? 0:06:35.560 --> 0:06:37.919 So this is one of those things where they have 0:06:38.040 --> 0:06:40.080 a story, and you can believe their story or 0:06:40.080 --> 0:06:41.960 not believe their story, but their story is that they are 0:06:42.040 --> 0:06:45.800 essentially ideologically in favor of the idea that everyone should 0:06:45.800 --> 0:06:48.919 have access to the same AI, that AI should be 0:06:48.920 --> 0:06:51.960 shared with the world, especially that China should help pump 0:06:51.960 --> 0:06:54.640 out its own ecosystem and they should help grow all 0:06:54.680 --> 0:06:57.120 of the AI for the betterment of humanity. And they're 0:06:57.120 --> 0:06:59.400 going to get artificial general intelligence and they are going 0:06:59.440 --> 0:07:02.440 to open source that as well, and this is 0:07:02.680 --> 0:07:04.440 the main point of DeepSeek. This is why 0:07:04.480 --> 0:07:07.760 DeepSeek exists.
They disclaim even having a business model, really, 0:07:08.480 --> 0:07:11.360 and, you know, they're an outgrowth of a hedge fund, 0:07:11.680 --> 0:07:14.720 and the hedge fund makes money, and maybe they can just 0:07:14.720 --> 0:07:17.440 do this if they choose to do that, or maybe 0:07:17.440 --> 0:07:19.680 they will end up with a different business model. But 0:07:20.640 --> 0:07:23.480 it is obviously very concerning from a lot of angles 0:07:23.520 --> 0:07:26.960 if you open source increasingly capable models, because, you know, 0:07:27.040 --> 0:07:30.520 artificial general intelligence means something that's, you know, as smart 0:07:30.520 --> 0:07:33.280 and capable as you and I, as a human, and 0:07:33.400 --> 0:07:37.320 perhaps more so. And if you just hand that over 0:07:37.520 --> 0:07:40.440 in open form to anybody in the world who wants 0:07:40.480 --> 0:07:44.280 to do anything with it, then we don't know how 0:07:44.360 --> 0:07:48.360 dangerous that is, but it's existentially risky at some limit 0:07:49.120 --> 0:07:51.600 to unleash things that are smarter and more capable, more 0:07:51.600 --> 0:07:54.240 competitive than us, that are then going to be free 0:07:54.280 --> 0:07:57.119 and loose to, you know, engage in whatever any human 0:07:57.160 --> 0:07:57.840 directs them to do. 0:07:58.480 --> 0:08:01.480 I have a really dumb question, but I hear people 0:08:01.520 --> 0:08:05.760 say artificial general intelligence all the time. AGI: what does 0:08:05.800 --> 0:08:06.600 that actually mean? 0:08:07.480 --> 0:08:09.840 There is a lot of dispute over exactly what that means. 0:08:09.840 --> 0:08:12.360 The words are not used consistently, but it stands for 0:08:12.440 --> 0:08:17.680 artificial general intelligence.
Generally, it is understood to mean it 0:08:17.760 --> 0:08:20.880 can do any task that can be done on a 0:08:20.920 --> 0:08:24.880 computer, that can be done cognitively only, as well as 0:08:25.080 --> 0:08:25.520 a human. 0:08:26.840 --> 0:08:28.680 I mean, most of these things do things 0:08:28.760 --> 0:08:30.520 much better than me. I don't know how to code. 0:08:30.560 --> 0:08:33.480 But I get that there are still some things. 0:08:33.480 --> 0:08:35.880 Maybe they wouldn't be as good at proving some of 0:08:35.880 --> 0:08:38.400 the "are you human" tests. Everyone wants to talk about Jevons 0:08:38.480 --> 0:08:41.160 paradox, and so we see Nvidia and Broadcom shares, 0:08:41.240 --> 0:08:44.319 these chip companies, they're getting crumbled today. And one of 0:08:44.360 --> 0:08:46.680 the theories is, like, oh no, with all these optimizations and 0:08:46.720 --> 0:08:50.520 so forth, researchers will just use those and they'll 0:08:50.559 --> 0:08:54.079 still have max demand for compute, and so it won't 0:08:54.120 --> 0:08:56.800 actually change the ultimate demand for compute. How are you 0:08:56.800 --> 0:08:57.720 thinking about this question? 0:08:58.679 --> 0:09:02.040 So I'm definitely a Jevons bro right now from the 0:09:02.080 --> 0:09:03.680 perspective of this. You 0:09:03.720 --> 0:09:06.080 don't think it'll have a negative impact on just the 0:09:06.120 --> 0:09:07.679 amount of compute demanded? 0:09:08.000 --> 0:09:10.079 The tweet I sent this morning was: Nvidia down eleven 0:09:10.120 --> 0:09:12.400 percent premarket on news that its chips are highly useful. 0:09:14.200 --> 0:09:16.800 And I believe that what we've shown is that, yes, 0:09:16.840 --> 0:09:19.240 you can get a lot more, in some sense, out 0:09:19.320 --> 0:09:22.720 of each Nvidia chip than you expected. You can get 0:09:22.760 --> 0:09:25.760 more AI.
And if there was a limited amount of 0:09:25.800 --> 0:09:29.439 stuff to do with AI, and once you did that stuff, 0:09:29.480 --> 0:09:31.720 you were done, then that would be a different story. 0:09:31.720 --> 0:09:34.560 But that's very much not the case. As we get 0:09:34.720 --> 0:09:37.600 further along towards AGI, as these AIs get more capable, 0:09:38.000 --> 0:09:39.400 we're going to want to use them for more and 0:09:39.440 --> 0:09:42.720 more things, more and more often. And most importantly, the 0:09:42.840 --> 0:09:45.280 entire revolution of R1, and also OpenAI's 0:09:45.440 --> 0:09:49.760 o1, is inference-time compute. What that means is every 0:09:49.760 --> 0:09:53.400 time you ask the question, it's going to use more compute, 0:09:53.400 --> 0:09:57.840 more cycles of GPUs, to think for longer, to basically 0:09:57.920 --> 0:10:00.400 use more tokens or words to figure out what the 0:10:00.400 --> 0:10:03.480 best possible answer is. And this scales, not necessarily 0:10:03.559 --> 0:10:06.480 without limit, but it scales very, very far. So 0:10:06.480 --> 0:10:08.960 OpenAI's new o3 is capable of thinking for, you know, 0:10:09.080 --> 0:10:11.760 many minutes. It's capable of potentially spending, you know, hundreds 0:10:11.840 --> 0:10:14.160 or even, in theory, thousands of dollars or more on 0:10:14.280 --> 0:10:18.840 an individual query. And if you knock that down by an 0:10:18.920 --> 0:10:21.560 order of magnitude, that almost certainly gets you to use 0:10:21.559 --> 0:10:23.760 it more for a given result, not use it less, 0:10:23.760 --> 0:10:27.959 because that is in effect starting to get prohibitive.
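The two claims above, that a query costs more the longer the model thinks and that a tenfold price cut can raise total usage rather than lower it, can be sketched with a toy model. Every number here (token counts, prices, elasticities) is hypothetical, and the constant-elasticity demand curve is an assumption chosen for illustration, not something asserted in the conversation.

```python
# Toy model: per-query inference cost and a Jevons-style demand response.
# ASSUMPTIONS: all prices, token counts, and elasticities are hypothetical,
# and constant-elasticity demand is an illustrative modeling choice.


def query_cost(thinking_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one query when the model 'thinks' for a given number of tokens."""
    return thinking_tokens / 1_000_000 * usd_per_million_tokens


def total_spend(price: float, elasticity: float,
                base_price: float = 1.0, base_queries: float = 1.0) -> float:
    """Total spend under constant-elasticity demand.

    queries = base_queries * (price / base_price) ** (-elasticity)
    """
    queries = base_queries * (price / base_price) ** (-elasticity)
    return price * queries


# More thinking tokens means a proportionally more expensive query.
short_answer = query_cost(2_000, 60.0)
long_reasoning = query_cost(2_000_000, 60.0)

# A 10x price cut: spend rises if demand is elastic (> 1), falls otherwise.
elastic_spend = total_spend(price=0.1, elasticity=1.5)
inelastic_spend = total_spend(price=0.1, elasticity=0.5)

print(f"short answer: ${short_answer:.2f}, long reasoning: ${long_reasoning:.2f}")
print(f"spend after 10x cut, elasticity 1.5: {elastic_spend:.2f}x baseline")
print(f"spend after 10x cut, elasticity 0.5: {inelastic_spend:.2f}x baseline")
```

The Jevons argument made here amounts to a claim that demand for AI sits in the elastic regime. The sketch does not establish that; it only shows what follows for total spend if it is true.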
And over time, 0:10:28.559 --> 0:10:31.120 you know, if you have the ability to spend 0:10:31.120 --> 0:10:33.760 remarkably little money and then get things like virtual 0:10:33.800 --> 0:10:37.600 employees and abilities to answer any question under the sun, yeah, 0:10:37.640 --> 0:10:41.000 there's basically unlimited demand to do that, or to scale 0:10:41.080 --> 0:10:43.439 up the quality of the answers as the price drops. 0:10:44.000 --> 0:10:47.720 So I basically expect that as fast as Nvidia 0:10:47.800 --> 0:10:50.920 can manufacture chips and we can put them into data 0:10:50.960 --> 0:10:53.559 centers and give them electrical power, people will be happy 0:10:53.559 --> 0:10:54.840 to buy those chips. 0:10:54.920 --> 0:10:58.640 At the risk of angering the Jevons paradox bros, just 0:10:58.679 --> 0:11:01.440 to push on the point a little bit more: so, 0:11:01.520 --> 0:11:04.560 my understanding of DeepSeek is that one of the reasons 0:11:04.640 --> 0:11:09.880 it's special is because it doesn't rely on, like, specialized components, 0:11:09.960 --> 0:11:12.840 custom operators, and so it can work on a variety 0:11:12.840 --> 0:11:16.880 of GPUs. Is there a scenario where, you know, AI 0:11:17.160 --> 0:11:21.240 becomes so free and plentiful, which could in theory be 0:11:21.320 --> 0:11:25.000 good for Nvidia, but at the same time, because it's 0:11:25.040 --> 0:11:28.000 easy to run on a bunch of other GPUs, people 0:11:28.120 --> 0:11:32.480 start using, you know, more, like, ASIC chips, like customized 0:11:32.559 --> 0:11:34.480 chips for a specific purpose?
0:11:35.320 --> 0:11:37.880 I mean, in the long run, we will almost certainly 0:11:37.920 --> 0:11:40.319 see specialized inference chips, whether they're from Nvidia or 0:11:40.320 --> 0:11:43.200 from someone else, and we will almost certainly see various 0:11:43.200 --> 0:11:45.880 different advancements such that today's chips are going to be obsolete 0:11:46.200 --> 0:11:48.400 in a few years. That's how AI works, right? There's 0:11:48.400 --> 0:11:52.400 all these rapid advancements. But, you know, I think 0:11:52.520 --> 0:11:55.160 Nvidia is in a very, very good position to take advantage 0:11:55.160 --> 0:11:57.600 of all of this. I certainly don't think that, like, 0:11:57.679 --> 0:12:01.000 you'll just use your laptop to run the best AGIs 0:12:01.240 --> 0:12:03.840 and therefore we don't have to worry about buying GPUs 0:12:04.080 --> 0:12:07.040 is a proposition. It's certainly possible that rivals will come 0:12:07.080 --> 0:12:09.160 up with superior chips. That's always possible. Nvidia does 0:12:09.160 --> 0:12:11.880 not have a monopoly, but Nvidia certainly seems to 0:12:11.920 --> 0:12:13.040 be in a dominant position right now. 0:12:29.640 --> 0:12:31.360 It seems to me, I mean, I know there's others, 0:12:31.360 --> 0:12:33.040 but it seems to me in the US there's, like, 0:12:33.240 --> 0:12:38.559 three main AI producers of models that people know about. 0:12:38.600 --> 0:12:43.400 There's OpenAI, there's Claude, and then there's Meta with Llama. 0:12:43.480 --> 0:12:46.760 And it's worth noting that Meta is green today, that 0:12:46.800 --> 0:12:48.760 the stock is actually up, as of the time I'm 0:12:48.760 --> 0:12:51.640 talking about this, one point one percent. Just go through 0:12:51.720 --> 0:12:55.480 each one real quickly: how the sort of DeepSeek 0:12:55.640 --> 0:12:58.679 shock affects them and their viability, and where they stand today.
0:12:59.160 --> 0:13:01.400 I think the most amazing thing about your question is 0:13:01.400 --> 0:13:02.520 that you forgot about Google. 0:13:02.960 --> 0:13:05.000 Oh yeah, right. Yeah, that's very telling. 0:13:05.280 --> 0:13:10.320 But everyone else has forgotten about it. Yeah, surprising. Gemini Flash 0:13:10.400 --> 0:13:13.480 Thinking, their version of o1 and R1, got updated 0:13:13.520 --> 0:13:16.520 a few days ago, and there are many reports that 0:13:16.559 --> 0:13:20.319 it's actually very good now and potentially competitive, and effectively 0:13:20.320 --> 0:13:22.360 it's free to use for a lot of people on 0:13:22.600 --> 0:13:26.240 AI Studio, but nobody I know has taken the time 0:13:26.280 --> 0:13:28.320 to check and find out how good it is, because 0:13:28.320 --> 0:13:30.280 we've all been too obsessed with being DeepSeek bros. 0:13:31.720 --> 0:13:34.319 Google's had its, like, rhetorical lunch eaten over and over 0:13:34.320 --> 0:13:36.160 and over again. In December, like, OpenAI would come 0:13:36.200 --> 0:13:37.960 up with advance after advance after advance, then Google would 0:13:38.000 --> 0:13:40.080 have advance after advance after advance, and Google's would be 0:13:40.400 --> 0:13:42.679 seemingly, actually, if anything, more impressive. And yet everyone would 0:13:42.720 --> 0:13:44.160 always just talk about OpenAI's, so this is 0:13:44.160 --> 0:13:46.640 not even new. Something is going on there.
So in 0:13:46.760 --> 0:13:50.400 terms of OpenAI, OpenAI should be very nervous 0:13:50.800 --> 0:13:53.920 in some sense, of course, because they have the reasoning models, 0:13:53.920 --> 0:13:55.600 and now the reasoning model has been copied much more 0:13:55.640 --> 0:13:59.080 effectively than previously, and the competition is a hell of 0:13:59.080 --> 0:14:02.120 a lot cheaper than what OpenAI is charging. So it's a 0:14:02.120 --> 0:14:04.400 direct threat to their business model for obvious reasons, and 0:14:04.440 --> 0:14:07.280 it looks like their lead in reasoning models is smaller 0:14:07.280 --> 0:14:10.320 and faster to undo than you would expect. Because if 0:14:10.400 --> 0:14:12.760 DeepSeek can do it, of course Anthropic and Google, 0:14:13.320 --> 0:14:14.800 you know, can do it, and everyone else can do 0:14:14.800 --> 0:14:18.840 it as well. Anthropic, which produces Claude, has not 0:14:18.920 --> 0:14:21.960 yet produced their own reasoning model. They clearly are operating 0:14:22.160 --> 0:14:25.000 under a shortage of compute in some sense, so it's 0:14:25.080 --> 0:14:27.400 entirely possible that they have chosen not to launch a 0:14:27.440 --> 0:14:30.240 reasoning model even though they could, or not focused on 0:14:30.280 --> 0:14:33.240 training one as quickly as possible, until they've addressed this problem. 0:14:33.320 --> 0:14:36.760 They're continuously taking investment. We should expect them to solve 0:14:36.760 --> 0:14:40.680 their problems over time. But they seem like they should 0:14:40.680 --> 0:14:43.560 be less directly concerned, because they're less of a directly 0:14:43.560 --> 0:14:46.440 competitive product in some sense. But also they tend to 0:14:46.520 --> 0:14:49.600 market to effectively much more aware people, so their people 0:14:49.600 --> 0:14:51.400 will also know about DeepSeek, and they will have 0:14:51.440 --> 0:14:54.680 a choice to make.
If I was Meta, I would 0:14:54.680 --> 0:14:58.000 be far more worried, especially if I was on their 0:14:58.040 --> 0:15:01.200 GenAI team and wanted to keep my job, because Meta's 0:15:01.240 --> 0:15:03.920 lunch has been eaten massively here, right? Meta with Llama 0:15:04.080 --> 0:15:07.560 had the best open models, and all the best open 0:15:07.600 --> 0:15:12.600 models were effectively fine-tunes of Llama, and now 0:15:12.640 --> 0:15:15.360 DeepSeek comes out, and this is absolutely not in any 0:15:15.400 --> 0:15:17.600 way a fine-tune of Llama. This is their own product, 0:15:18.000 --> 0:15:20.920 and V3 was already blowing everything that Meta had 0:15:20.920 --> 0:15:23.360 out of the water. R1, there are reports that 0:15:23.400 --> 0:15:25.680 it's better than their new version that they're training now, 0:15:25.680 --> 0:15:28.560 that it's better than Llama 4, which I would expect to 0:15:28.600 --> 0:15:33.320 be true. And so there's no point in releasing an 0:15:33.360 --> 0:15:36.560 inferior open model if everyone in the open model community will 0:15:36.640 --> 0:15:38.680 just be like, why don't I just use DeepSeek? Tracy, 0:15:38.720 --> 0:15:42.000 it's interesting that, as Zvi said, the people who should 0:15:42.000 --> 0:15:45.520 be nervous are the employees of Meta, not Meta itself, 0:15:45.520 --> 0:15:48.840 because Meta is up, and so you gotta wonder. It's like, well, 0:15:48.840 --> 0:15:50.800 maybe they don't, I don't know, maybe they don't need 0:15:50.840 --> 0:15:54.400 to invest as much in their own open source AI 0:15:54.440 --> 0:15:56.000 if there's a better one out there. Now the stock 0:15:56.120 --> 0:15:56.320 is up. 0:15:56.320 --> 0:16:00.000 Anyway, the market has been very strange, from my perspective, 0:16:00.080 --> 0:16:02.080 in how it reacts to different things that Meta does.
0:16:02.120 --> 0:16:04.360 For a while, Meta would announce, we're spending more on AI, 0:16:04.680 --> 0:16:06.880 we're investing in all these data centers, we're training all 0:16:06.880 --> 0:16:09.240 of these models, and the market would go, what are 0:16:09.280 --> 0:16:12.480 you doing? This is another metaverse or something, and we're 0:16:12.480 --> 0:16:14.240 gonna hammer your stock and we're gonna drag you down. 0:16:14.640 --> 0:16:16.920 And then with the most recent sixty-five billion dollar 0:16:16.920 --> 0:16:20.840 announced spend, then Meta was up. Presumably, they're gonna 0:16:20.920 --> 0:16:24.040 use it mostly for inference, effectively, in a lot of 0:16:24.040 --> 0:16:27.440 scenarios, because they have these massive inference costs from wanting 0:16:27.480 --> 0:16:31.360 to put AI all over Facebook and Instagram. So, you know, 0:16:31.520 --> 0:16:33.480 if anything, like, you know, I think the market might 0:16:33.480 --> 0:16:35.680 be speculating that this means that they will know how 0:16:35.680 --> 0:16:37.920 to train better Llamas that are cheaper to operate, and 0:16:38.160 --> 0:16:40.400 their costs will go down, and then they'll be in 0:16:40.440 --> 0:16:43.200 a better position, and that theory isn't 0:16:42.960 --> 0:16:48.800 crazy. Since we all just collectively remembered Google, I have 0:16:48.880 --> 0:16:50.960 a question that's sort of been in 0:16:51.000 --> 0:16:53.000 the back of my mind. I think Joe has brought 0:16:53.040 --> 0:16:56.760 this up before as well. But, like, when Google debuted, 0:16:57.800 --> 0:17:00.840 it took years and years and years for people to 0:17:00.920 --> 0:17:04.119 sort of catch up to the search function, and actually 0:17:04.200 --> 0:17:07.400 no one ever really caught up, right? So Google has, 0:17:07.440 --> 0:17:10.440 like, dominated for years.
Why is it, when it comes 0:17:10.440 --> 0:17:16.359 to these chatbots, there aren't, like, higher, wider moats around 0:17:16.480 --> 0:17:17.359 these businesses? 0:17:18.119 --> 0:17:22.679 So one reason is that everyone's training on roughly the 0:17:22.720 --> 0:17:26.440 same data, meaning the entire internet and all of human knowledge, 0:17:26.560 --> 0:17:28.080 so it's very hard to get that much of a 0:17:28.080 --> 0:17:30.800 permanent data edge there unless you're creating synthetic data off 0:17:30.800 --> 0:17:32.960 of your own models, which is what OpenAI is 0:17:33.200 --> 0:17:36.720 plausibly doing now. Another reason is because everybody is scaling 0:17:36.720 --> 0:17:39.359 as fast as possible and adding zeros to everything on 0:17:39.400 --> 0:17:42.600 a periodic basis. In calendar time, it doesn't take that 0:17:42.680 --> 0:17:45.760 long before your rival is going to have access to 0:17:45.800 --> 0:17:48.720 more compute than you had, and they're copying your techniques 0:17:48.720 --> 0:17:51.400 more aggressively. There's just a lot less secret sauce. There's 0:17:51.440 --> 0:17:54.480 only so many algorithms. Fundamentally, everyone is relying on the 0:17:54.520 --> 0:17:56.399 scaling laws. It's called the bitter lesson: the idea 0:17:56.440 --> 0:17:58.440 that, you know, you just scale more, you just use 0:17:58.440 --> 0:18:00.600 more compute, you just use more data, you just use 0:18:00.680 --> 0:18:03.400 more parameters. And DeepSeek is saying, maybe you don't. 0:18:03.560 --> 0:18:06.159 You can do more optimizations, you can get around this 0:18:06.240 --> 0:18:10.399 problem and still get a superior model. But mostly, yeah, 0:18:10.480 --> 0:18:13.159 there's been a lot of just: I can catch up 0:18:13.160 --> 0:18:16.119 to you by copying what you did.
Also, I 0:18:16.119 --> 0:18:19.399 can see the outputs, right? I can query your model, 0:18:19.640 --> 0:18:22.160 and I can use your model's outputs to actively train 0:18:22.560 --> 0:18:27.119 my model. And you see this in things like most 0:18:27.160 --> 0:18:29.760 models that get trained: you ask them who trained you, 0:18:29.840 --> 0:18:32.960 and they will often say, oh, I'm from OpenAI. 0:18:33.040 --> 0:18:35.520 The internet has gotten so weird. I just... the internet 0:18:35.600 --> 0:18:38.160 is so weird. Zvi Mowshowitz, thank you so much 0:18:38.240 --> 0:18:41.159 for running over to Odd Lots and helping us 0:18:41.200 --> 0:18:44.320 record this emergency pod on the DeepSeek sell-off. 0:18:44.320 --> 0:18:45.000 That was fantastic. 0:18:45.080 --> 0:18:58.600 All right, thank you. Tracy, 0:18:58.640 --> 0:19:00.639 I love talking to Zvi. We've got to just sort of 0:19:00.680 --> 0:19:03.880 make him our AI, or our AI guy. 0:19:04.119 --> 0:19:06.120 I mean, to be honest, we could probably have him 0:19:06.160 --> 0:19:09.280 back on again, because there's gonna be stuff happening. 0:19:09.480 --> 0:19:12.359 Maybe we will, and obviously we could go a 0:19:12.400 --> 0:19:15.280 lot longer. This is a really exciting story. This is 0:19:15.320 --> 0:19:18.360 a really exciting story, and things are just getting really 0:19:18.400 --> 0:19:19.200 weird these days. 0:19:19.240 --> 0:19:22.320 It is kind of crazy how fast all of this is. Yep. 0:19:22.840 --> 0:19:24.960 And then the other thing I would say is just, 0:19:25.320 --> 0:19:28.560 the bitter lesson: great name for a band. 0:19:29.119 --> 0:19:32.680 Oh, totally, totally great. Maybe when we do our AI 0:19:32.840 --> 0:19:36.040 themed prog rock band. True, yes, that could be our name. 0:19:36.119 --> 0:19:38.040 Yes, let's do that. Okay, shall we leave it there? 0:19:38.119 --> 0:19:38.840 Let's leave it there.
0:19:39.160 --> 0:19:41.879 This has been another episode of the Odd Lots podcast. 0:19:42.000 --> 0:19:45.520 I'm Tracy Alloway. You can follow me @tracyalloway. 0:19:45.280 --> 0:19:48.320 And I'm Joe Weisenthal. You can follow me @TheStalwart. 0:19:48.560 --> 0:19:52.000 Follow our guest Zvi Mowshowitz, he's @TheZvi. Also definitely 0:19:52.119 --> 0:19:54.440 check out his free Substack. It's a must read 0:19:54.520 --> 0:19:57.760 for me. Don't Worry About the Vase, really great stuff 0:19:57.800 --> 0:20:00.639 every single day. Follow our producers Carmen Rodriguez 0:20:00.720 --> 0:20:03.879 @CarmenArmen, Dashiell Bennett @Dashbot, and Kail Brooks 0:20:03.880 --> 0:20:06.960 @KailBrooks. For more Odd Lots content, go to Bloomberg.com/oddlots. 0:20:07.000 --> 0:20:09.840 We have transcripts, a blog, and a newsletter, 0:20:10.040 --> 0:20:11.960 and you can chat about all of these topics twenty-four 0:20:11.960 --> 0:20:15.800 seven in our Discord, discord.gg/oddlots. 0:20:15.840 --> 0:20:17.920 Maybe we'll get Zvi to do a Q and A 0:20:18.040 --> 0:20:21.200