WEBVTT - What The Hell Is DeepSeek? 0:00:02.400 --> 0:00:07.040 Cool Zone Media. Hello and welcome to Better Offline. I'm your 0:00:07.040 --> 0:00:20.759 host, Ed Zitron. Well, a lot of you have been 0:00:20.800 --> 0:00:24.320 getting in touch. Yes, you're getting your DeepSeek episode. 0:00:24.320 --> 0:00:26.080 In fact, this is the first of a two-parter. 0:00:26.640 --> 0:00:28.520 This will come out on Friday, which is when you're 0:00:28.560 --> 0:00:30.960 listening to this, and then it'll follow up on Monday. 0:00:31.320 --> 0:00:36.040 I apologize. I spent a lot of Monday writing this 0:00:36.159 --> 0:00:38.720 and also learning about a lot of this stuff in 0:00:38.760 --> 0:00:41.479 an attempt to distill it as best I could. This 0:00:41.560 --> 0:00:45.120 situation is extremely weird, and it's developing, and I think 0:00:45.159 --> 0:00:46.839 even when I put out this episode there will be 0:00:46.920 --> 0:00:49.320 new parts of it that I have yet to really 0:00:49.360 --> 0:00:52.400 get to. I will do my absolute best to explain 0:00:52.440 --> 0:00:55.240 in these episodes both what is happening with DeepSeek, 0:00:55.640 --> 0:00:58.320 what it means, what they've built, and what it's going 0:00:58.360 --> 0:01:01.880 to do in the future. But let's begin. So, as 0:01:01.920 --> 0:01:05.200 January came to a close, the entire generative AI industry 0:01:05.240 --> 0:01:07.880 found itself in a kind of chaos.
In short, the 0:01:07.920 --> 0:01:10.880 recent AI bubble, and in particular the hundreds of billions 0:01:10.920 --> 0:01:14.040 of dollars being spent on it, hinged on this big 0:01:14.080 --> 0:01:16.720 idea that we need bigger models, which are both trained 0:01:16.760 --> 0:01:20.319 and run on bigger and even larger GPUs, almost entirely 0:01:20.319 --> 0:01:23.639 sold by Nvidia, and in turn they're based in bigger 0:01:23.680 --> 0:01:27.880 and bigger data centers owned by companies like Microsoft, Oracle, Amazon, 0:01:27.920 --> 0:01:31.320 and Google. Now, there was also this expectation that this 0:01:31.360 --> 0:01:34.840 would always be the case. Hubris within this industry is 0:01:34.920 --> 0:01:39.280 kind of part of the whole deal, and generative AI 0:01:39.520 --> 0:01:41.360 was always meant to be this way, at least for 0:01:41.400 --> 0:01:43.920 the American developers. It was always meant to be energy 0:01:43.959 --> 0:01:47.280 and compute hungry. Throwing entire zoos' worth of animals and 0:01:47.360 --> 0:01:50.000 boiling lakes was necessary to do this. There was never 0:01:50.000 --> 0:01:54.040 any other way to do it, and I thought, at 0:01:54.120 --> 0:01:56.200 least I've thought for a while, that this was because 0:01:56.280 --> 0:01:59.760 they just, they tried to make them more efficient, but 0:01:59.840 --> 0:02:03.080 they couldn't. There was just something about transformer-based architecture, 0:02:03.120 --> 0:02:05.639 like the stuff that underpins ChatGPT, so the GPT 0:02:05.720 --> 0:02:09.160 model under ChatGPT. It wasn't the case, though. 0:02:10.000 --> 0:02:12.960 A Chinese artificial intelligence company that few people had really 0:02:13.040 --> 0:02:15.280 heard of, called DeepSeek, came along a few weeks 0:02:15.280 --> 0:02:19.000 ago with multiple models that aren't merely competitive with OpenAI's, 0:02:19.160 --> 0:02:22.919 but actually undercut them in several meaningful ways. 
DeepSeek's 0:02:22.960 --> 0:02:25.680 models are both open source, which means that their source 0:02:25.720 --> 0:02:29.680 code and research is public, and they're significantly more efficient 0:02:29.680 --> 0:02:32.640 as well, as much as thirty times cheaper to run 0:02:32.720 --> 0:02:34.880 in the case of their reasoning model, R1, which 0:02:34.919 --> 0:02:38.560 is competitive with OpenAI's o1, and fifteen or more 0:02:38.639 --> 0:02:43.200 times more efficient than GPT-4o. It's actually kind 0:02:43.240 --> 0:02:45.200 of crazy when you think about it, and as you're 0:02:45.240 --> 0:02:47.640 going to hear, this whole thing has Jokerfied me all 0:02:47.680 --> 0:02:50.440 over again. And what's crazy is that some of them 0:02:50.440 --> 0:02:52.799 can be distilled, which I'll get to later, and run 0:02:52.840 --> 0:02:55.800 on local devices like a laptop. It's kind of crazy, 0:02:56.160 --> 0:02:58.600 and as a result, the markets have kind of panicked, 0:02:58.639 --> 0:03:02.480 because the entire narrative of the AI bubble has been 0:03:02.520 --> 0:03:04.800 that these models have to be expensive because they are 0:03:04.840 --> 0:03:08.079 the future, and that's why hyperscalers had to burn two 0:03:08.160 --> 0:03:12.040 hundred billion dollars in capital expenditures for infrastructure to support 0:03:12.080 --> 0:03:15.919 this wonderful boom, and specifically the ideas of OpenAI 0:03:16.000 --> 0:03:18.920 and Anthropic. The idea that there was another way to 0:03:18.960 --> 0:03:20.840 do this, that in fact we didn't need to spend 0:03:20.840 --> 0:03:22.600 all this money, and that maybe we could find a 0:03:22.639 --> 0:03:26.960 more efficient way of doing it, well, that would require 0:03:27.000 --> 0:03:29.239 them to have another idea rather than throw as much 0:03:29.280 --> 0:03:32.000 money at the problem as possible. Yeah, they just didn't 0:03:32.080 --> 0:03:35.200 consider it, it turns out. 
And now along 0:03:35.240 --> 0:03:38.920 comes this outsider that's upended the whole conventional understanding and 0:03:39.120 --> 0:03:43.440 perhaps even dethroned a member of America's tech royalty, Sam Altman, 0:03:43.480 --> 0:03:46.000 a man who has crafted, if not a cult of personality, 0:03:46.400 --> 0:03:49.440 some sort of public image of an unassailable visionary that 0:03:49.480 --> 0:03:52.200 will lead the vanguard in the biggest technological change since 0:03:52.200 --> 0:03:56.680 the Internet. Yeah, he's wrong. He never was doing that. 0:03:56.960 --> 0:03:59.440 I've been saying it for a while. He's never been 0:03:59.480 --> 0:04:02.880 doing this. But DeepSeek isn't just an outsider now. 0:04:02.920 --> 0:04:05.440 They are a company that's emerged as a side project 0:04:05.440 --> 0:04:08.720 from a tiny, tiny Chinese hedge fund, at least by 0:04:08.760 --> 0:04:10.880 the standards of hedge funds, like five point five billion 0:04:10.920 --> 0:04:14.200 dollars in assets under management, and their founding team 0:04:14.240 --> 0:04:16.839 has nowhere near the level of fame and celebrity or 0:04:16.839 --> 0:04:21.000 even the accolades of Sam Altman. It's distinctly humiliating for 0:04:21.080 --> 0:04:23.920 everyone involved that isn't DeepSeek. And on top 0:04:23.920 --> 0:04:27.479 of all of that, DeepSeek's biggest, ugliest insult is 0:04:27.480 --> 0:04:30.360 that its model, DeepSeek R1, is competitive, like 0:04:30.400 --> 0:04:33.799 I said, with OpenAI's incredibly expensive o1 reasoning model, 0:04:33.960 --> 0:04:37.880 yet significantly, and I mean ninety-six percent, cheaper to run. 0:04:38.120 --> 0:04:40.120 And it can even be run locally. 
Like I said, 0:04:40.440 --> 0:04:42.520 speaking to a few developers I know, one was able 0:04:42.520 --> 0:04:44.679 to run DeepSeek's R1 model on their twenty 0:04:44.760 --> 0:04:47.279 twenty-one MacBook Pro with an M1 chip. That 0:04:47.400 --> 0:04:51.480 is a four-year-old computer, not a thirty thousand dollar 0:04:51.680 --> 0:04:55.440 GPU inside. It's kind of crazy. Worse still, DeepSeek's 0:04:55.480 --> 0:04:58.159 models are made freely available to use, with the source 0:04:58.160 --> 0:05:01.200 code published under the MIT license, along with the 0:05:01.200 --> 0:05:04.119 research on how they were made, although not the training data, 0:05:04.160 --> 0:05:06.159 which makes some people say it's not really open source. 0:05:06.160 --> 0:05:08.280 But for the sake of argument, I'm just going to 0:05:08.320 --> 0:05:11.080 say open source. And this means, by the way, that 0:05:11.320 --> 0:05:14.120 DeepSeek's models can be adapted and used for commercial 0:05:14.200 --> 0:05:17.599 use without the need for royalties or fees. Anyone can 0:05:17.640 --> 0:05:20.880 take this and build their own. It's kind of crazy. 0:05:21.400 --> 0:05:24.200 By contrast, OpenAI is anything but open, and its 0:05:24.240 --> 0:05:26.840 last LLM to be released under the MIT license was 0:05:26.880 --> 0:05:30.479 twenty nineteen's GPT-2. No, no, wait, wait, shit, 0:05:30.680 --> 0:05:33.800 let me correct that. DeepSeek's biggest, ugliest secret is 0:05:33.839 --> 0:05:36.880 actually that it's obviously taking aim at every element of 0:05:36.920 --> 0:05:40.839 OpenAI's portfolio. As the company was already dominating headlines 0:05:40.880 --> 0:05:43.719 this week, it quietly dropped its Janus-Pro-7B 0:05:43.839 --> 0:05:47.360 image generation and analysis model, which the company says outperforms 0:05:47.360 --> 0:05:50.719 both Stable Diffusion and OpenAI's DALL-E 3. And those 0:05:50.760 --> 0:05:53.480 are, by the way, image generation things. 
So you type 0:05:53.480 --> 0:05:57.200 in something, like Garfield with boobs, and then out comes 0:05:57.200 --> 0:06:00.560 a Garfield with juicy cans, and that's probably the first 0:06:00.560 --> 0:06:02.560 time you hear that on the podcast, but probably not 0:06:02.640 --> 0:06:06.840 the last. And as with its other code, DeepSeek 0:06:06.880 --> 0:06:09.320 has made this freely available to both commercial and personal 0:06:09.400 --> 0:06:13.560 users alike, whereas OpenAI has largely paywalled DALL-E 3. 0:06:13.640 --> 0:06:17.520 This is really, it's a truly crazy situation. And it's 0:06:17.520 --> 0:06:20.520 also this cynical, vulgar version of David and Goliath, where 0:06:20.520 --> 0:06:23.200 a tech startup backed by a shadowy Chinese hedge fund 0:06:23.360 --> 0:06:26.520 with eight billion dollars under management is somehow the plucky 0:06:26.560 --> 0:06:29.000 upstart against the lumbering, lossy, oafish one hundred and 0:06:29.040 --> 0:06:33.000 fifty billion dollar startup backed by multiple public tech companies 0:06:33.000 --> 0:06:36.599 with a market capitalization of over three trillion dollars. I realize, 0:06:36.600 --> 0:06:39.119 by the way, I said earlier five point five billion 0:06:39.160 --> 0:06:41.719 dollars under management. This is why you check your notes 0:06:41.720 --> 0:06:44.040 in advance. But I'm not cutting it. This is fresh. 0:06:44.120 --> 0:06:47.120 I am inside a closet in New York. The content 0:06:47.320 --> 0:06:51.159 must flow. Anyway, DeepSeek's V3 model, which is 0:06:51.160 --> 0:06:54.080 comparable and competitive with both OpenAI's GPT-4o 0:06:54.160 --> 0:06:57.360 and Anthropic's Claude Sonnet 3.5 models, which, by 0:06:57.360 --> 0:07:00.480 the way, has some reasoning features. 
As I said, it's 0:07:00.520 --> 0:07:03.839 fifty-three times cheaper to run when 0:07:03.920 --> 0:07:08.040 using the company's own cloud services, and as mentioned earlier, 0:07:08.080 --> 0:07:11.000 said model is effectively free for anyone to use locally 0:07:11.080 --> 0:07:13.240 or on their own cloud instances, and could be taken 0:07:13.280 --> 0:07:15.640 by any commercial enterprise and turned into a product of 0:07:15.640 --> 0:07:19.680 their own should they desire to, say, compete with OpenAI, 0:07:19.800 --> 0:07:24.400 the loudest and most annoying startup of all time. In essence, DeepSeek, 0:07:24.440 --> 0:07:26.800 and I'll get into its background and the concerns people 0:07:26.840 --> 0:07:29.600 might have about its Chinese origins, released two models that 0:07:29.640 --> 0:07:32.640 perform competitively with and even beat models from both OpenAI 0:07:32.720 --> 0:07:35.760 and Anthropic, undercut them in price, and then made them 0:07:35.800 --> 0:07:38.880 open, undermining not just the economics of the biggest generative 0:07:38.880 --> 0:07:42.360 AI companies, but laying bare exactly how they work. The 0:07:42.400 --> 0:07:47.240 magic's gone. There's no more voodoo inside Sam Altman's soul. It's 0:07:47.320 --> 0:07:51.440 all out there. And the last point is extremely important 0:07:51.480 --> 0:07:54.480 when it comes to OpenAI's reasoning model, which specifically 0:07:54.600 --> 0:07:57.080 hid its chain of thought for fear of these unsafe 0:07:57.120 --> 0:08:00.200 thoughts that might manipulate the customer. And then they add, 0:08:00.280 --> 0:08:02.600 slightly under their breath, that the actual reason they did 0:08:02.640 --> 0:08:05.720 it was a competitive advantage. Now, to explain what that means. 
0:08:05.880 --> 0:08:09.640 When you make a request with OpenAI's o1 model, 0:08:09.720 --> 0:08:11.720 say, give me all the states with the letter R 0:08:11.840 --> 0:08:14.720 in them, it actually shows you, like, the thinking. And 0:08:14.720 --> 0:08:16.880 by the way, these things don't fucking think. They're, they're 0:08:16.920 --> 0:08:19.880 computer bullshit, like, they don't think at all. But I'm 0:08:19.880 --> 0:08:22.320 going to use it just for this. So you see it 0:08:22.360 --> 0:08:26.000 say, okay, here are all the American states, which ones 0:08:26.040 --> 0:08:29.080 have that letter? I'm checking all of those. It's effectively 0:08:29.120 --> 0:08:32.440 having a large language model check a large language model. Now, 0:08:32.600 --> 0:08:35.280 the thing is, the steps they were showing you were 0:08:35.280 --> 0:08:37.560 all cleaned up. They would look nice, they would be 0:08:37.600 --> 0:08:41.440 formatted nicely. DeepSeek's chain of thought is completely laid bare, 0:08:42.080 --> 0:08:46.000 which is very interesting because it really takes the wind 0:08:46.000 --> 0:08:48.800 out of OpenAI's sails. And on top of that, 0:08:49.760 --> 0:08:52.320 it allows you to see actually how these things think 0:08:52.400 --> 0:08:55.240 through things, again, not really thinking, but still you can 0:08:55.280 --> 0:08:57.959 see things about how large language models work that these 0:08:57.960 --> 0:09:00.440 companies didn't want you to have. On top of this, 0:09:00.840 --> 0:09:04.560 OpenAI's o1 model has something even shittier to it, 0:09:04.600 --> 0:09:07.240 which is these chain of thought things all cost money. 0:09:07.600 --> 0:09:10.880 When you see it generate these thoughts, it's actually generating 0:09:10.920 --> 0:09:13.240 more thoughts than you see because they're hiding the chain 0:09:13.280 --> 0:09:15.440 of thought. 
So OpenAI is just charging you an 0:09:15.440 --> 0:09:18.200 indeterminate amount of money, an insane amount of money, as 0:09:18.200 --> 0:09:21.360 I'll get to later. But nevertheless, you don't know what 0:09:21.400 --> 0:09:23.920 you're being charged for. You don't even know what's really 0:09:23.960 --> 0:09:26.720 going on under the hood. Or you could use DeepSeek. 0:09:26.760 --> 0:09:30.439 And let's be completely clear, by the way, OpenAI's 0:09:30.440 --> 0:09:34.319 literal only competitive advantage against Meta and Anthropic was 0:09:34.400 --> 0:09:37.200 its reasoning models, o1 and o3, and o3, 0:09:37.200 --> 0:09:38.839 by the way, is currently in a research preview and 0:09:38.960 --> 0:09:41.920 is mostly just more of the same. Although I mentioned 0:09:41.960 --> 0:09:44.480 earlier in the show that Anthropic's Claude Sonnet 3.5 0:09:44.520 --> 0:09:48.480 has some reasoning features, they're comparatively more rudimentary than 0:09:48.520 --> 0:09:50.600 those in o1 and o3, and, I'd argue, 0:09:50.679 --> 0:09:54.400 R1, which is DeepSeek's model. In an AI context, 0:09:54.480 --> 0:09:56.839 reasoning works by breaking down a prompt into a series 0:09:56.840 --> 0:10:00.480 of different steps with considerations of different approaches. 
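As a rough illustration of the hidden chain of thought billing point made above, here is a minimal Python sketch. It assumes hidden reasoning tokens are billed at the same rate as visible output tokens, uses the sixty dollars per million output tokens that the episode quotes for o1 further on, and the token counts are entirely hypothetical.

```python
# Assumption: hidden reasoning tokens are billed at the same per-token rate
# as the visible output tokens.
O1_OUTPUT_PRICE_PER_MILLION = 60.00  # US dollars, the o1 figure quoted in the episode


def output_cost(visible_tokens: int, hidden_reasoning_tokens: int,
                price_per_million: float) -> float:
    """Cost in dollars of one response, counting tokens the user never sees."""
    billed_tokens = visible_tokens + hidden_reasoning_tokens
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical token counts, chosen only to show the effect.
visible = 500
hidden = 4_500

cost_if_only_visible = output_cost(visible, 0, O1_OUTPUT_PRICE_PER_MILLION)
cost_actually_billed = output_cost(visible, hidden, O1_OUTPUT_PRICE_PER_MILLION)

print(f"Visible answer alone: ${cost_if_only_visible:.2f}")
print(f"With hidden reasoning: ${cost_actually_billed:.2f}")
```

With these made-up counts the hidden tokens account for nine tenths of the bill, which is the sense in which the amount charged is hard to predict from what is shown on screen.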
Like I 0:10:00.520 --> 0:10:03.439 said earlier, effectively a large language model checking its own 0:10:03.480 --> 0:10:06.480 homework, with no thinking involved, because, like I said, they 0:10:06.480 --> 0:10:09.520 do not think or know things. And OpenAI rushed 0:10:09.559 --> 0:10:12.160 to launch its o1 reasoning model last year because, 0:10:12.320 --> 0:10:15.720 and I quote Fortune from last October, Sam Altman was 0:10:16.000 --> 0:10:19.320 eager to prove to potential investors in the company's 0:10:19.400 --> 0:10:22.080 latest funding round that OpenAI remains at the forefront 0:10:22.120 --> 0:10:25.480 of AI development. And as I noted in my newsletter 0:10:25.520 --> 0:10:28.400 at the time, it was not particularly reliable, failing to 0:10:28.440 --> 0:10:31.040 accurately count the number of times the letter R appeared 0:10:31.040 --> 0:10:33.800 in the word strawberry, which was the code name for 0:10:34.240 --> 0:10:38.080 o1. Very funny stuff. At this point, it's fairly obvious 0:10:38.120 --> 0:10:41.400 that OpenAI wasn't anywhere near the forefront of AI development, 0:10:41.640 --> 0:10:44.440 and now that its competitive advantage is effectively gone, there 0:10:44.440 --> 0:10:47.000 are genuine doubts about what comes next for the company. 0:10:48.280 --> 0:10:51.000 As I'll go into, there are many questionable parts of 0:10:51.000 --> 0:10:53.960 DeepSeek's story: its funding, what GPUs it has, and how 0:10:54.040 --> 0:10:56.720 much it actually spent training these models. But what we 0:10:56.840 --> 0:11:00.680 definitively understand to be true is bad news for OpenAI, 0:11:00.880 --> 0:11:03.480 and I would argue every other large US tech firm 0:11:03.480 --> 0:11:06.160 that's jumped onto the generative AI bandwagon in the past 0:11:06.160 --> 0:11:20.200 few years. DeepSeek's models actually exist. 
They work, at 0:11:20.280 --> 0:11:22.880 least by the standards of hallucination-prone LLMs that don't, 0:11:22.920 --> 0:11:25.959 at the risk of repeating myself, know anything. They've been 0:11:26.000 --> 0:11:29.680 independently verified to be competitive in performance, and they're magnitudes 0:11:29.800 --> 0:11:34.400 cheaper in price than those from both hyperscalers, Google's Gemini, Meta's Llama, 0:11:34.440 --> 0:11:36.560 Amazon Q and so on and so forth, and from 0:11:36.600 --> 0:11:41.000 those released by OpenAI and Anthropic. DeepSeek's models 0:11:41.040 --> 0:11:44.200 don't require massive new data centers. They run on GPUs 0:11:44.240 --> 0:11:47.040 currently used to run services like ChatGPT, and even 0:11:47.080 --> 0:11:50.000 work on more austere hardware. Nor do they require an 0:11:50.120 --> 0:11:53.840 endless supply of bigger, faster Nvidia GPUs every single year 0:11:53.880 --> 0:11:57.920 to progress. The entire AI bubble was inflated based on 0:11:57.960 --> 0:12:00.600 the premise that these models were simply impossible to build 0:12:00.600 --> 0:12:04.000 without burning massive amounts of cash, straining the power grid, 0:12:04.000 --> 0:12:07.400 and blowing past emissions goals, and that these costs were 0:12:07.400 --> 0:12:11.560 both necessary and really good because they'd lead to creating 0:12:11.600 --> 0:12:15.400 powerful AI, something that's yet to happen. And it's kind 0:12:15.400 --> 0:12:18.319 of obvious at this point that that wasn't true. Now 0:12:18.360 --> 0:12:22.600 the markets are sitting around there asking a very reasonable question: shit, 0:12:22.760 --> 0:12:27.400 did we just waste two hundred billion dollars? Anyway, let's 0:12:27.400 --> 0:12:30.720 get into the nitty-gritty. What is DeepSeek? First 0:12:30.760 --> 0:12:32.760 of all, if you want a super deep dive into 0:12:32.800 --> 0:12:35.240 what it is, I can't recommend VentureBeat's write-up enough. 
0:12:35.280 --> 0:12:36.880 I'll link to it in the show notes, as I 0:12:36.960 --> 0:12:39.800 usually do. It's really good and it goes into a 0:12:39.800 --> 0:12:42.120 lot more detail than I will. But here's the too 0:12:42.200 --> 0:12:44.880 long, didn't read for you. DeepSeek is a spin-off 0:12:44.920 --> 0:12:47.520 from a Chinese hedge fund called High-Flyer Quant. 0:12:47.840 --> 0:12:50.079 It's a relatively small and young company, and from its 0:12:50.120 --> 0:12:52.960 inception it went big on algorithmic and AI-driven trading. 0:12:53.320 --> 0:12:56.120 Later it started building its own standalone chatbots, including 0:12:56.120 --> 0:12:59.440 a ChatGPT equivalent for the Chinese market. This is 0:12:59.559 --> 0:13:01.760 what we know right now. I'm sure some of you 0:13:01.800 --> 0:13:05.080 will say, oh, well, who knows if that's really true. Sure, 0:13:05.520 --> 0:13:07.760 I think that that's fair. I also think that there 0:13:07.760 --> 0:13:09.880 are parts of Sam Altman's legend that we should question 0:13:09.960 --> 0:13:13.280 as well. I think the circumstances under which Sam Altman 0:13:13.360 --> 0:13:16.880 got made head of Y Combinator are extremely questionable. I'm 0:13:16.920 --> 0:13:19.240 saying you can question DeepSeek, and indeed you should. 0:13:19.240 --> 0:13:21.920 We should be more critical of these powerful companies, but 0:13:22.040 --> 0:13:24.520 don't do it halfway. If we're going to be worried, 0:13:24.600 --> 0:13:28.360 let's be worried about everyone. Now, DeepSeek did a few 0:13:28.360 --> 0:13:31.200 things differently, like open sourcing its models, although it likely 0:13:31.240 --> 0:13:34.800 built upon tech from other companies, like Meta's Llama and the 0:13:35.160 --> 0:13:38.680 ML library PyTorch, to train its models. 
It secured over 0:13:38.760 --> 0:13:43.160 ten thousand Nvidia GPUs right before the US imposed export restrictions, 0:13:43.160 --> 0:13:45.240 which sounds like a lot, but it's a fraction of 0:13:45.240 --> 0:13:47.320 what the big AI labs like Google, OpenAI, and 0:13:47.360 --> 0:13:50.480 Anthropic have to play with. I think I've heard estimates 0:13:50.520 --> 0:13:53.120 of like one hundred thousand to three hundred thousand each, 0:13:53.200 --> 0:13:56.199 if not more. Now, you've likely seen or heard that 0:13:56.280 --> 0:13:59.080 DeepSeek trained its latest model for five point six 0:13:59.120 --> 0:14:01.520 million dollars, as opposed to the insane amounts that I'll 0:14:01.520 --> 0:14:03.640 get to later, and I want to be clear that 0:14:03.840 --> 0:14:06.760 any and all mentions of this number are estimates. In fact, 0:14:06.800 --> 0:14:09.600 the provenance of the five point five eight million 0:14:09.679 --> 0:14:12.000 dollar number appears to be a citation of a post 0:14:12.040 --> 0:14:15.080 made by an Nvidia engineer in an article from the 0:14:15.120 --> 0:14:18.199 South China Morning Post, which links to another article from 0:14:18.240 --> 0:14:21.040 the South China Morning Post, which simply states that DeepSeek 0:14:21.080 --> 0:14:23.480 V3 comes with six hundred and seventy-one 0:14:23.480 --> 0:14:25.880 billion parameters and was trained in around two months at 0:14:25.880 --> 0:14:28.400 the cost of five point five eight million dollars, with 0:14:28.480 --> 0:14:31.640 no additional citations of any kind. So you should take 0:14:31.640 --> 0:14:36.320 it with a pinch of salt. But it's not totally ludicrous. Well, 0:14:36.360 --> 0:14:38.920 there are some that have estimated the cost. 
DeepSeek's 0:14:39.000 --> 0:14:41.840 V3 model was allegedly trained using two thousand and forty-eight 0:14:41.880 --> 0:14:45.440 Nvidia H800 GPUs, according to its paper, 0:14:46.000 --> 0:14:48.840 and Ben Thompson of Stratechery has made this clear, that 0:14:48.880 --> 0:14:51.440 the five point five million dollar number only covers the 0:14:51.480 --> 0:14:54.520 literal training cost of the official training run, and this 0:14:54.640 --> 0:14:56.400 is made fairly clear in the paper, by the way, 0:14:56.520 --> 0:14:59.080 of V3, and that's the one that's competitive with 0:14:59.200 --> 0:15:02.400 OpenAI's GPT-4o model, meaning that any costs 0:15:02.440 --> 0:15:04.680 related to prior research or experiments on how to build 0:15:04.680 --> 0:15:07.800 the model were left out. Now, big, big shout-out to 0:15:07.800 --> 0:15:10.400 Minimaxir, the guy on Bluesky and Twitter, he's great. 0:15:10.960 --> 0:15:13.200 He is wonderful, and also added that this is fairly 0:15:13.200 --> 0:15:16.240 standard for the industry. Again, you choose how you feel 0:15:16.240 --> 0:15:17.840 about this, but I want to give you the information. 0:15:19.080 --> 0:15:21.680 And while it's safe to say that DeepSeek's models 0:15:21.680 --> 0:15:24.600 are cheaper to train, the actual costs, especially as DeepSeek 0:15:24.600 --> 0:15:27.040 doesn't share its training data, which some might argue 0:15:27.040 --> 0:15:29.440 means its models are not really open source, as I said, 0:15:30.560 --> 0:15:33.400 the numbers get a little harder to guess at. Thompson 0:15:33.440 --> 0:15:35.160 notes that DeepSeek had to craft a bunch of 0:15:35.160 --> 0:15:38.560 elegant workarounds to make the model perform, including writing code 0:15:38.560 --> 0:15:41.600 that ultimately changed how GPUs actually communicated with each other. 0:15:41.960 --> 0:15:45.880 This functionality isn't otherwise possible using Nvidia's developer tools. 
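As a back-of-the-envelope check on the training-cost figure discussed above, this short Python sketch works out what hourly GPU rate the numbers imply. It uses only the figures quoted in the episode (two thousand and forty-eight H800 GPUs, around two months, five point five eight million dollars). Treating "around two months" as sixty days of round-the-clock use is an assumption, not something the episode states.

```python
# Figures quoted in the episode for DeepSeek V3's official training run.
GPUS = 2048                    # Nvidia H800 GPUs, per the paper as cited
REPORTED_COST_USD = 5_580_000  # the "five point five eight million" figure

# Assumption: "around two months" means 60 days with every GPU busy 24 hours a day.
ASSUMED_DAYS = 60


def gpu_hours(gpus: int, days: int) -> int:
    """Total GPU-hours if every GPU runs continuously for the whole period."""
    return gpus * days * 24


def implied_rate(cost_usd: float, gpus: int, days: int) -> float:
    """Dollars per GPU-hour implied by a total cost and a training duration."""
    return cost_usd / gpu_hours(gpus, days)


hours = gpu_hours(GPUS, ASSUMED_DAYS)
rate = implied_rate(REPORTED_COST_USD, GPUS, ASSUMED_DAYS)
print(f"{hours:,} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Under that assumption the figure comes out a little under two dollars per GPU-hour, which reads like a compute rental bill for one run and, as noted above, leaves out prior research and experiments.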
They 0:15:46.000 --> 0:15:47.760 really had to get in there. It's kind of cool. 0:15:48.160 --> 0:15:50.720 DeepSeek's models, V3 and R1, are more 0:15:50.760 --> 0:15:53.160 efficient and, as a result, cheaper to run, and can 0:15:53.200 --> 0:15:56.560 be accessed via its API at prices that are astronomically 0:15:56.640 --> 0:16:00.240 cheaper than OpenAI's. DeepSeek Chat, running DeepSeek's 0:16:00.360 --> 0:16:03.960 GPT-4o-competitive V3 model, costs 0:16:04.040 --> 0:16:07.640 seven cents per one million input tokens, as in 0:16:07.680 --> 0:16:11.080 commands given to the model, and one dollar ten 0:16:11.480 --> 0:16:14.520 per one million output tokens, as in the resulting output 0:16:14.560 --> 0:16:16.800 from the model. I know that these numbers kind of, 0:16:16.840 --> 0:16:19.200 like, just sound like numbers. Like, maybe you don't 0:16:19.240 --> 0:16:21.160 have context, so let me give you some. This is 0:16:21.200 --> 0:16:24.440 a dramatic price drop from the two dollars fifty cents 0:16:24.480 --> 0:16:28.040 per one million input tokens and ten dollars per one 0:16:28.080 --> 0:16:32.520 million output tokens that OpenAI charges for GPT-4o. 0:16:33.200 --> 0:16:39.400 This isn't just undercutting, this is, this is a bunker buster. Now, 0:16:39.520 --> 0:16:41.560 there is an aside that I'll kind of get into 0:16:41.560 --> 0:16:44.160 a little bit later, in that you are using models 0:16:44.160 --> 0:16:46.440 hosted in a country that you don't know, probably China. 0:16:46.760 --> 0:16:49.920 There are data concerns. But again, you can put this 0:16:50.040 --> 0:16:52.800 on your own server. You could put this in Google Cloud. 0:16:52.880 --> 0:16:55.880 Both Microsoft and Google are apparently thinking about it now. 0:16:55.880 --> 0:16:58.560 The Information reported that Google had added it to Google Cloud. 0:16:58.720 --> 0:17:01.520 No, they did not. They didn't do that. 
They allowed 0:17:01.520 --> 0:17:03.840 you to connect Hugging Face. This is a whole bunch 0:17:03.840 --> 0:17:06.159 of technical stuff that, if you understand, you'll be like, yeah, Ed, 0:17:06.240 --> 0:17:10.639 I know. Long story short, the hyperscalers are already bringing 0:17:10.680 --> 0:17:13.920 DeepSeek out, and I'll get to why that's bad 0:17:14.200 --> 0:17:17.480 later in detail. But it's also very funny. Now here's 0:17:17.520 --> 0:17:20.680 something else that's funny. DeepSeek Reasoner, its reasoning model, 0:17:20.760 --> 0:17:23.600 costs fifty-five cents per one million input tokens 0:17:23.680 --> 0:17:27.160 and two dollars and nineteen cents per one million output tokens. 0:17:27.359 --> 0:17:31.360 Now that sounds expensive. Maybe it is. Whatever, that's goddamn 0:17:31.480 --> 0:17:34.760 nothing compared to the fifteen dollars per one million input 0:17:34.840 --> 0:17:37.600 tokens and sixty dollars per one million output tokens of 0:17:37.640 --> 0:17:41.960 OpenAI's o1. If I'm Sam Altman, I'm shitting myself. 0:17:43.560 --> 0:17:45.800 But there's an obvious "but" here. We do not know 0:17:45.840 --> 0:17:48.560 where DeepSeek is hosting its models, who has access 0:17:48.560 --> 0:17:50.640 to that data, or where that data is coming from 0:17:50.760 --> 0:17:52.960 or going to. We don't know who funds DeepSeek, 0:17:53.040 --> 0:17:55.240 other than it's connected to High-Flyer, the hedge fund 0:17:55.240 --> 0:17:57.320 that I mentioned earlier, that it split from in twenty 0:17:57.359 --> 0:17:59.760 twenty-three. There are concerns that DeepSeek could be 0:17:59.760 --> 0:18:02.200 state-funded, and that DeepSeek's low prices are a 0:18:02.280 --> 0:18:05.000 kind of geopolitical weapon, breaking the back of the generative 0:18:05.000 --> 0:18:08.440 AI industry in America. I'm not really sure whether that's 0:18:08.480 --> 0:18:11.080 the case or not. 
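To put the per-token prices quoted above side by side, here is a small Python sketch that computes the ratios. The prices are the ones given in the episode, reading the V3 input price as seven cents per million tokens, which is an assumption about a figure that is awkwardly phrased in the recording. The dictionary keys are just labels chosen for this example, not official API model identifiers.

```python
# US dollars per one million tokens, as quoted in the episode.
# Assumption: the V3 input price is read as $0.07 (seven cents).
# The keys are illustrative labels, not official API model names.
PRICES = {
    "deepseek-v3": {"input": 0.07, "output": 1.10},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "deepseek-r1": {"input": 0.55, "output": 2.19},
    "o1": {"input": 15.00, "output": 60.00},
}


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the expensive model charges per token."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


def percent_cheaper(expensive: str, cheap: str, kind: str) -> float:
    """Percentage saved per token by using the cheap model instead."""
    return 100.0 * (1.0 - PRICES[cheap][kind] / PRICES[expensive][kind])


for expensive, cheap in [("gpt-4o", "deepseek-v3"), ("o1", "deepseek-r1")]:
    for kind in ("input", "output"):
        ratio = price_ratio(expensive, cheap, kind)
        saving = percent_cheaper(expensive, cheap, kind)
        print(f"{cheap} vs {expensive} ({kind}): "
              f"{ratio:.1f}x cheaper, {saving:.0f}% less")
```

On these numbers R1 comes out roughly twenty-seven times cheaper than o1 for both input and output, about ninety-six percent less, which lines up with the ninety-six percent figure given earlier in the episode.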
It's certainly true that China has 0:18:11.119 --> 0:18:13.720 long treated AI as a strategic part of its national 0:18:13.760 --> 0:18:16.840 industrial policy and is reported to help companies and sectors 0:18:16.840 --> 0:18:18.800 where it wants to catch up with the Western world. 0:18:19.480 --> 0:18:21.879 The Made in China 2025 initiative saw a 0:18:21.880 --> 0:18:25.399 reported hundreds of billions of dollars provided to Chinese firms 0:18:25.440 --> 0:18:28.960 working in industries like chipmaking, aviation, and, yeah, AI. 0:18:29.400 --> 0:18:32.760 The extent of that support isn't exactly transparent, surprise, surprise, 0:18:33.000 --> 0:18:34.760 and so it's not entirely out of the realm of 0:18:34.760 --> 0:18:37.800 possibility that DeepSeek is also the recipient of state aid. 0:18:38.240 --> 0:18:39.760 The good news is that we're going to find out 0:18:39.840 --> 0:18:43.720 fairly quickly. American AI infrastructure company Groq is already bringing 0:18:43.760 --> 0:18:46.680 DeepSeek's model online, meaning that we'll get at least 0:18:46.720 --> 0:18:49.760 some sort of confirmation of whether these prices 0:18:49.760 --> 0:18:52.520 are realistic or whether they're heavily subsidized by whoever it 0:18:52.560 --> 0:18:55.080 is that backs DeepSeek. It's also true that DeepSeek 0:18:55.080 --> 0:18:57.280 is owned in part by a hedge fund, which 0:18:57.359 --> 0:19:00.479 likely isn't short of cash to pump into them. 
But 0:19:00.520 --> 0:19:03.439 as an aside, given that OpenAI is the 0:19:03.520 --> 0:19:07.199 beneficiary of billions of dollars of cloud compute credits and 0:19:07.240 --> 0:19:10.600 gets reduced pricing for Microsoft's Azure cloud services to run 0:19:10.640 --> 0:19:13.560 its actual models, it's a bit tough for them to 0:19:13.600 --> 0:19:16.439 complain about a rival being subsidized by a larger entity with 0:19:16.480 --> 0:19:18.960 the ability to absorb the costs of doing business, should 0:19:19.040 --> 0:19:21.560 that be the case. Same goes for Anthropic, by the way. 0:19:21.920 --> 0:19:24.359 And yes, I know Microsoft isn't a state, but with 0:19:24.400 --> 0:19:26.960 a market cap of three point two trillion dollars and 0:19:27.040 --> 0:19:30.320 quarterly revenues larger than the combined GDPs of some EU 0:19:30.400 --> 0:19:33.000 and NATO nations, it's kind of the next best thing. 0:19:33.640 --> 0:19:36.560 But I digress. Whatever concerns there may be about malign 0:19:36.680 --> 0:19:40.000 Chinese influence are bordering on irrelevant, outside of the low prices, 0:19:40.040 --> 0:19:43.080 of course, offered by DeepSeek itself, and even that is 0:19:43.080 --> 0:19:46.080 speculative at this point. Once these models are hosted elsewhere, 0:19:46.119 --> 0:19:48.240 and once DeepSeek's methods, which I'll get to in 0:19:48.280 --> 0:19:50.760 a little bit, are recreated, and by the way, that's 0:19:50.800 --> 0:19:52.840 not really going to take very long, I believe we're 0:19:52.840 --> 0:19:54.880 going to see that these prices are indicative of how 0:19:54.960 --> 0:20:11.280 cheap these models are to run. So you might be wondering, 0:20:11.359 --> 0:20:13.480 how the hell is this so much cheaper? And that's 0:20:13.480 --> 0:20:15.639 a bloody good question. And because I'm me, I have 0:20:15.680 --> 0:20:19.520 a hypothesis. 
I do not believe that the companies making 0:20:19.600 --> 0:20:22.520 these foundation models, such as OpenAI and Anthropic, have 0:20:22.600 --> 0:20:25.639 actually been incentivized to do more with less. And because 0:20:25.680 --> 0:20:29.359 their chummy little relationships with hyperscalers like Amazon, Google and 0:20:29.400 --> 0:20:33.040 Microsoft were focused almost entirely on making the biggest, most 0:20:33.119 --> 0:20:37.240 hugest models possible, using the biggest, even hugerest chips, and 0:20:37.280 --> 0:20:39.960 because the absence of profitability didn't stop them from raising 0:20:40.000 --> 0:20:43.200 more money, well, they've never had to be fucking efficient, 0:20:43.320 --> 0:20:46.520 have they? They've never had to try. Maybe they should 0:20:46.520 --> 0:20:50.359 buy less avocado fucking toast. Anyway, let me put it 0:20:50.359 --> 0:20:53.960 in simpler terms. Imagine living on fifteen hundred dollars a month, 0:20:54.040 --> 0:20:55.639 and then imagine how you'd live on one hundred and 0:20:55.680 --> 0:20:57.800 fifty thousand dollars a month, and that you have to, 0:20:58.160 --> 0:21:00.479 like Brewster's Millions, spend as much of it as 0:21:00.480 --> 0:21:04.240 you can to complete a mission, a very simple mission: live. 0:21:05.240 --> 0:21:08.320 In the former example, your concern is survival. You have a 0:21:08.359 --> 0:21:10.280