1 00:00:02,400 --> 00:00:07,040 Speaker 1: Cool Zone Media. Hello and welcome to Better Offline. I'm your 2 00:00:07,040 --> 00:00:20,759 Speaker 1: host Ed Zitron. Well, a lot of you have been 3 00:00:20,800 --> 00:00:24,320 Speaker 1: getting in touch. Yes, you're getting your DeepSeek episode. 4 00:00:24,320 --> 00:00:26,080 Speaker 1: In fact, this is the first of a two parter. 5 00:00:26,640 --> 00:00:28,520 Speaker 1: This will come out on Friday, which is when you're 6 00:00:28,560 --> 00:00:30,960 Speaker 1: listening to this, and then it'll follow up on Monday. 7 00:00:31,320 --> 00:00:36,040 Speaker 1: I apologize. I spent a lot of Monday writing this 8 00:00:36,159 --> 00:00:38,720 Speaker 1: and also learning about a lot of this stuff in 9 00:00:38,760 --> 00:00:41,479 Speaker 1: an attempt to distill it as best I could. This 10 00:00:41,560 --> 00:00:45,120 Speaker 1: situation is extremely weird, and it's developing, and I think 11 00:00:45,159 --> 00:00:46,839 Speaker 1: even when I put out this episode there will be 12 00:00:46,920 --> 00:00:49,320 Speaker 1: new parts of it that I have yet to really 13 00:00:49,360 --> 00:00:52,400 Speaker 1: get to. I will do my absolute best to explain 14 00:00:52,440 --> 00:00:55,240 Speaker 1: in these episodes both what is happening with DeepSeek, 15 00:00:55,640 --> 00:00:58,320 Speaker 1: what it means, what they've built, and what it's going 16 00:00:58,360 --> 00:01:01,880 Speaker 1: to do in the future. But let's begin. So, as 17 00:01:01,920 --> 00:01:05,200 Speaker 1: January came to a close, the entire generative AI industry 18 00:01:05,240 --> 00:01:07,880 Speaker 1: found itself in a kind of chaos.
In short, the 19 00:01:07,920 --> 00:01:10,880 Speaker 1: recent AI bubble, and in particular the hundreds of billions 20 00:01:10,920 --> 00:01:14,040 Speaker 1: of dollars being spent on it, hinged on this big 21 00:01:14,080 --> 00:01:16,720 Speaker 1: idea that we need bigger models, which are both trained 22 00:01:16,760 --> 00:01:20,319 Speaker 1: and run on bigger and even larger GPUs, almost entirely 23 00:01:20,319 --> 00:01:23,639 Speaker 1: sold by Nvidia, and in turn they're based in bigger 24 00:01:23,680 --> 00:01:27,880 Speaker 1: and bigger data centers owned by companies like Microsoft, Oracle, Amazon, 25 00:01:27,920 --> 00:01:31,320 Speaker 1: and Google. Now, there was also this expectation that this 26 00:01:31,360 --> 00:01:34,840 Speaker 1: would always be the case. Hubris within this industry is 27 00:01:34,920 --> 00:01:39,280 Speaker 1: kind of part of the whole deal, and generative AI 28 00:01:39,520 --> 00:01:41,360 Speaker 1: was always meant to be this way, at least for 29 00:01:41,400 --> 00:01:43,920 Speaker 1: the American developers. It was always meant to be energy 30 00:01:43,959 --> 00:01:47,280 Speaker 1: and compute hungry. Throwing entire zoos' worth of animals and 31 00:01:47,360 --> 00:01:50,000 Speaker 1: boiling lakes was necessary to do this. There was never 32 00:01:50,000 --> 00:01:54,040 Speaker 1: any other way to do it, and I thought, at 33 00:01:54,120 --> 00:01:56,200 Speaker 1: least I've thought for a while, that this was because 34 00:01:56,280 --> 00:01:59,760 Speaker 1: they just, they tried to make them more efficient, but 35 00:01:59,840 --> 00:02:03,080 Speaker 1: they couldn't. There was just something about transformer-based architecture, 36 00:02:03,120 --> 00:02:05,639 Speaker 1: like the stuff that underpins ChatGPT, so the GPT 37 00:02:05,720 --> 00:02:09,160 Speaker 1: model under ChatGPT, either. It wasn't the case, though.
38 00:02:10,000 --> 00:02:12,960 Speaker 1: A Chinese artificial intelligence company that few people had really 39 00:02:13,040 --> 00:02:15,280 Speaker 1: heard of, called DeepSeek, came along a few weeks 40 00:02:15,280 --> 00:02:19,000 Speaker 1: ago with multiple models that aren't merely competitive with OpenAI's, 41 00:02:19,160 --> 00:02:22,919 Speaker 1: but actually undercut them in several meaningful ways. DeepSeek's 42 00:02:22,960 --> 00:02:25,680 Speaker 1: models are both open source, which means that their source 43 00:02:25,720 --> 00:02:29,680 Speaker 1: code and research is public, and they're significantly more efficient 44 00:02:29,680 --> 00:02:32,640 Speaker 1: as well, as much as thirty times cheaper to run 45 00:02:32,720 --> 00:02:34,880 Speaker 1: in the case of their reasoning model R1, which 46 00:02:34,919 --> 00:02:38,560 Speaker 1: is competitive with OpenAI's o1, and fifteen or more 47 00:02:38,639 --> 00:02:43,200 Speaker 1: times more efficient than GPT-4o. It's actually kind 48 00:02:43,240 --> 00:02:45,200 Speaker 1: of crazy when you think about it, and as you're 49 00:02:45,240 --> 00:02:47,640 Speaker 1: going to hear, this whole thing has Jokerfied me all 50 00:02:47,680 --> 00:02:50,440 Speaker 1: over again. And what's crazy is that some of them 51 00:02:50,440 --> 00:02:52,799 Speaker 1: can be distilled, which I'll get to later, and run 52 00:02:52,840 --> 00:02:55,800 Speaker 1: on local devices like a laptop.
It's kind of crazy, 53 00:02:56,160 --> 00:02:58,600 Speaker 1: and as a result, the markets have kind of panicked 54 00:02:58,639 --> 00:03:02,480 Speaker 1: because the entire narrative of the AI bubble has been 55 00:03:02,520 --> 00:03:04,800 Speaker 1: that these models have to be expensive because they are 56 00:03:04,840 --> 00:03:08,079 Speaker 1: the future, and that's why hyperscalers had to burn two 57 00:03:08,160 --> 00:03:12,040 Speaker 1: hundred billion dollars in capital expenditures for infrastructure to support 58 00:03:12,080 --> 00:03:15,919 Speaker 1: this wonderful boom, and specifically the ideas of OpenAI 59 00:03:16,000 --> 00:03:18,920 Speaker 1: and Anthropic. The idea that there was another way to 60 00:03:18,960 --> 00:03:20,840 Speaker 1: do this, that in fact we didn't need to spend 61 00:03:20,840 --> 00:03:22,600 Speaker 1: all this money, and that maybe we could find a 62 00:03:22,639 --> 00:03:26,960 Speaker 1: more efficient way of doing it? Well, that would require 63 00:03:27,000 --> 00:03:29,239 Speaker 1: them to have another idea rather than throw as much 64 00:03:29,280 --> 00:03:32,000 Speaker 1: money at the problem as possible. Yeah, they just didn't 65 00:03:32,080 --> 00:03:35,200 Speaker 1: consider it, it turns out. And now along has 66 00:03:35,240 --> 00:03:38,920 Speaker 1: come this outsider that's upended the whole conventional understanding and 67 00:03:39,120 --> 00:03:43,440 Speaker 1: perhaps even dethroned a member of America's tech royalty, Sam Altman, 68 00:03:43,480 --> 00:03:46,000 Speaker 1: a man who has crafted, if not a cult of personality, 69 00:03:46,400 --> 00:03:49,440 Speaker 1: some sort of public image of an unassailable visionary that 70 00:03:49,480 --> 00:03:52,200 Speaker 1: will lead the vanguard in the biggest technological change since 71 00:03:52,200 --> 00:03:56,680 Speaker 1: the Internet. Yeah, he's wrong. He never was doing that.
72 00:03:56,960 --> 00:03:59,440 Speaker 1: I've been saying it for a while. He's never been 73 00:03:59,480 --> 00:04:02,880 Speaker 1: doing this. But DeepSeek isn't just an outsider now. 74 00:04:02,920 --> 00:04:05,440 Speaker 1: They are a company that's emerged as a side project 75 00:04:05,440 --> 00:04:08,720 Speaker 1: from a tiny, tiny Chinese hedge fund, at least by 76 00:04:08,760 --> 00:04:10,880 Speaker 1: the standards of hedge funds, like five point five billion 77 00:04:10,920 --> 00:04:14,200 Speaker 1: dollars in assets under management, and their founding team 78 00:04:14,240 --> 00:04:16,839 Speaker 1: has nowhere near the level of fame and celebrity or 79 00:04:16,839 --> 00:04:21,000 Speaker 1: even the accolades of Sam Altman. It's distinctly humiliating for 80 00:04:21,080 --> 00:04:23,920 Speaker 1: everyone involved that isn't DeepSeek. And on top 81 00:04:23,920 --> 00:04:27,479 Speaker 1: of all of that, DeepSeek's biggest, ugliest insult is 82 00:04:27,480 --> 00:04:30,360 Speaker 1: that its model, DeepSeek R1, is competitive, like 83 00:04:30,400 --> 00:04:33,799 Speaker 1: I said, with OpenAI's incredibly expensive o1 reasoning model, 84 00:04:33,960 --> 00:04:37,880 Speaker 1: yet significantly, and I mean ninety six percent, cheaper to run. 85 00:04:38,120 --> 00:04:40,120 Speaker 1: And it can even be run locally, like I said. 86 00:04:40,440 --> 00:04:42,520 Speaker 1: Speaking to a few developers I know, one was able 87 00:04:42,520 --> 00:04:44,679 Speaker 1: to run DeepSeek's R1 model on their twenty 88 00:04:44,760 --> 00:04:47,279 Speaker 1: twenty one MacBook Pro with an M1 chip. That 89 00:04:47,400 --> 00:04:51,480 Speaker 1: is a four year old computer, not a thirty thousand 90 00:04:51,680 --> 00:04:55,440 Speaker 1: dollar GPU inside. It's kind of crazy.
Worse still, DeepSeek's 91 00:04:55,480 --> 00:04:58,159 Speaker 1: models are made freely available to use, with the source 92 00:04:58,160 --> 00:05:01,200 Speaker 1: code published under the MIT license, along with the 93 00:05:01,200 --> 00:05:04,119 Speaker 1: research on how they were made, although not the training data, 94 00:05:04,160 --> 00:05:06,159 Speaker 1: which makes some people say it's not really open source. 95 00:05:06,160 --> 00:05:08,280 Speaker 1: But for the sake of argument, I'm just going to 96 00:05:08,320 --> 00:05:11,080 Speaker 1: say open source. And this means, by the way, that 97 00:05:11,320 --> 00:05:14,120 Speaker 1: DeepSeek's models can be adapted and used for commercial 98 00:05:14,200 --> 00:05:17,599 Speaker 1: use without the need for royalties or fees. Anyone can 99 00:05:17,640 --> 00:05:20,880 Speaker 1: take this and build their own. It's kind of crazy. 100 00:05:21,400 --> 00:05:24,200 Speaker 1: By contrast, OpenAI is anything but open, and its 101 00:05:24,240 --> 00:05:26,840 Speaker 1: last LLM to be released under the MIT license was 102 00:05:26,880 --> 00:05:30,479 Speaker 1: twenty nineteen's GPT-2. No, no, wait, wait, shit, 103 00:05:30,680 --> 00:05:33,800 Speaker 1: let me correct that. DeepSeek's biggest, ugliest secret is 104 00:05:33,839 --> 00:05:36,880 Speaker 1: actually that it's obviously taking aim at every element of 105 00:05:36,920 --> 00:05:40,839 Speaker 1: OpenAI's portfolio. As the company was already dominating headlines 106 00:05:40,880 --> 00:05:43,719 Speaker 1: this week, it quietly dropped its Janus-Pro-7B 107 00:05:43,839 --> 00:05:47,360 Speaker 1: image generation and analysis model, which the company says outperforms 108 00:05:47,360 --> 00:05:50,719 Speaker 1: both Stable Diffusion and OpenAI's DALL-E 3. And those 109 00:05:50,760 --> 00:05:53,480 Speaker 1: are, by the way, image generation things.
So you type 110 00:05:53,480 --> 00:05:57,200 Speaker 1: in something, like Garfield with boobs, and then out comes 111 00:05:57,200 --> 00:06:00,560 Speaker 1: a Garfield with juicy cans, and that's probably the first 112 00:06:00,560 --> 00:06:02,560 Speaker 1: time you hear that on the podcast, but probably not 113 00:06:02,640 --> 00:06:06,840 Speaker 1: the last. And as with its other code, DeepSeek 114 00:06:06,880 --> 00:06:09,320 Speaker 1: has made this freely available to both commercial and personal 115 00:06:09,400 --> 00:06:13,560 Speaker 1: users alike, whereas OpenAI has largely paywalled DALL-E 3. 116 00:06:13,640 --> 00:06:17,520 Speaker 1: This is really, it's a truly crazy situation. And it's 117 00:06:17,520 --> 00:06:20,520 Speaker 1: also this cynical, vulgar version of David and Goliath, where 118 00:06:20,520 --> 00:06:23,200 Speaker 1: a tech startup backed by a shadowy Chinese hedge fund 119 00:06:23,360 --> 00:06:26,520 Speaker 1: with eight billion dollars under management is somehow the plucky 120 00:06:26,560 --> 00:06:29,000 Speaker 1: upstart against the lumbering, lossy, oafish one hundred and 121 00:06:29,040 --> 00:06:33,000 Speaker 1: fifty billion dollar startup backed by multiple public tech companies 122 00:06:33,000 --> 00:06:36,599 Speaker 1: with a market capitalization of over three trillion dollars. I realize, 123 00:06:36,600 --> 00:06:39,119 Speaker 1: by the way, I said earlier five point five billion 124 00:06:39,160 --> 00:06:41,719 Speaker 1: dollars under management. This is why you check your notes 125 00:06:41,720 --> 00:06:44,040 Speaker 1: in advance. But I'm not cutting it. This is fresh. 126 00:06:44,120 --> 00:06:47,120 Speaker 1: I am inside a closet in New York. The content 127 00:06:47,320 --> 00:06:51,159 Speaker 1: must flow. Anyway.
DeepSeek's V3 model, which is 128 00:06:51,160 --> 00:06:54,080 Speaker 1: comparable and competitive with both OpenAI's GPT-4o 129 00:06:54,160 --> 00:06:57,360 Speaker 1: and Anthropic's Claude Sonnet 3.5 models, which by 130 00:06:57,360 --> 00:07:00,480 Speaker 1: the way, has some reasoning features. As I said, it's 131 00:07:00,520 --> 00:07:03,839 Speaker 1: fifty three times cheaper to run the R1 when 132 00:07:03,920 --> 00:07:08,040 Speaker 1: using the company's own cloud services, and as mentioned earlier, 133 00:07:08,080 --> 00:07:11,000 Speaker 1: said model is effectively free for anyone to use locally 134 00:07:11,080 --> 00:07:13,240 Speaker 1: or on their own cloud instances, and could be taken 135 00:07:13,280 --> 00:07:15,640 Speaker 1: by any commercial enterprise and turned into a product of 136 00:07:15,640 --> 00:07:19,680 Speaker 1: their own should they desire to, say, compete with OpenAI, 137 00:07:19,800 --> 00:07:24,400 Speaker 1: the loudest and most annoying startup of all time. In essence, DeepSeek, 138 00:07:24,440 --> 00:07:26,800 Speaker 1: and I'll get into its background and the concerns people 139 00:07:26,840 --> 00:07:29,600 Speaker 1: might have about its Chinese origins, released two models that 140 00:07:29,640 --> 00:07:32,640 Speaker 1: perform competitively with, and even beat, models from both OpenAI 141 00:07:32,720 --> 00:07:35,760 Speaker 1: and Anthropic, undercut them in price, and then made them 142 00:07:35,800 --> 00:07:38,880 Speaker 1: open, undermining not just the economics of the biggest generative 143 00:07:38,880 --> 00:07:42,360 Speaker 1: AI companies, but laying bare exactly how they work. The 144 00:07:42,400 --> 00:07:47,240 Speaker 1: magic's gone. There's no more voodoo inside Sam Altman's soul. It's 145 00:07:47,320 --> 00:07:51,440 Speaker 1: all out there.
And the last point is extremely important 146 00:07:51,480 --> 00:07:54,480 Speaker 1: when it comes to OpenAI's reasoning model, which specifically 147 00:07:54,600 --> 00:07:57,080 Speaker 1: hid its chain of thought for fear of these unsafe 148 00:07:57,120 --> 00:08:00,200 Speaker 1: thoughts that might manipulate the customer. And then they add, 149 00:08:00,280 --> 00:08:02,600 Speaker 1: slightly under their breath, that the actual reason they did 150 00:08:02,640 --> 00:08:05,720 Speaker 1: it was a competitive advantage. Now, to explain what that means. 151 00:08:05,880 --> 00:08:09,640 Speaker 1: When you make a request with OpenAI's o1 model, 152 00:08:09,720 --> 00:08:11,720 Speaker 1: say, give me all the states with the letter R 153 00:08:11,840 --> 00:08:14,720 Speaker 1: in them, it actually shows you, like, the thinking. And 154 00:08:14,720 --> 00:08:16,880 Speaker 1: by the way, these things don't fucking think. They're 155 00:08:16,920 --> 00:08:19,880 Speaker 1: computer bullshit, like, they don't think at all. But I'm 156 00:08:19,880 --> 00:08:22,320 Speaker 1: going to use it just for this. So you see it 157 00:08:22,360 --> 00:08:26,000 Speaker 1: say, okay, here are all the American states, which ones 158 00:08:26,040 --> 00:08:29,080 Speaker 1: have that letter? I'm checking all of those. It's effectively 159 00:08:29,120 --> 00:08:32,440 Speaker 1: having a large language model check a large language model. Now, 160 00:08:32,600 --> 00:08:35,280 Speaker 1: the thing is, the steps they were showing you were 161 00:08:35,280 --> 00:08:37,560 Speaker 1: all cleaned up. They would look nice, they would be 162 00:08:37,600 --> 00:08:41,440 Speaker 1: formatted nicely. DeepSeek's chain of thought is completely laid bare, 163 00:08:42,080 --> 00:08:46,000 Speaker 1: which is very interesting because it really takes the wind 164 00:08:46,000 --> 00:08:48,800 Speaker 1: out of OpenAI's sails.
And on top of that, 165 00:08:49,760 --> 00:08:52,320 Speaker 1: it allows you to see actually how these things think 166 00:08:52,400 --> 00:08:55,240 Speaker 1: through things, again, not really thinking, but still you can 167 00:08:55,280 --> 00:08:57,959 Speaker 1: see things about how large language models work that these 168 00:08:57,960 --> 00:09:00,440 Speaker 1: companies didn't want you to have. On top of this, 169 00:09:00,840 --> 00:09:04,560 Speaker 1: OpenAI's o1 model has something even shittier to it, 170 00:09:04,600 --> 00:09:07,240 Speaker 1: which is these chain of thought things all cost money. 171 00:09:07,600 --> 00:09:10,880 Speaker 1: When you see it generate these thoughts, it's actually generating 172 00:09:10,920 --> 00:09:13,240 Speaker 1: more thoughts than you see because they're hiding the chain 173 00:09:13,280 --> 00:09:15,440 Speaker 1: of thought. So OpenAI is just charging you an 174 00:09:15,440 --> 00:09:18,200 Speaker 1: indeterminate amount of money, an insane amount of money, as 175 00:09:18,200 --> 00:09:21,360 Speaker 1: I'll get to later. But nevertheless, you don't know what 176 00:09:21,400 --> 00:09:23,920 Speaker 1: you're being charged for. You don't even know what's really 177 00:09:23,960 --> 00:09:26,720 Speaker 1: going on under the hood. Or you could use 178 00:09:26,760 --> 00:09:30,439 Speaker 1: DeepSeek. And let's be completely clear, by the way, 179 00:09:30,440 --> 00:09:34,319 Speaker 1: OpenAI's literal only competitive advantage against Meta and Anthropic was 180 00:09:34,400 --> 00:09:37,200 Speaker 1: its reasoning models, o1 and o3, and o3, 181 00:09:37,200 --> 00:09:38,839 Speaker 1: by the way, is currently in a research preview and 182 00:09:38,960 --> 00:09:41,920 Speaker 1: is mostly just more of the same. Although I mentioned 183 00:09:41,960 --> 00:09:44,480 Speaker 1: earlier in the show that Anthropic's
Claude Sonnet 3.5 184 00:09:44,520 --> 00:09:48,480 Speaker 1: has some reasoning features, they're comparatively more rudimentary than 185 00:09:48,520 --> 00:09:50,600 Speaker 1: those in o1 and o3, and I'd argue 186 00:09:50,679 --> 00:09:54,400 Speaker 1: R1, which is DeepSeek's model. In an AI context, 187 00:09:54,480 --> 00:09:56,839 Speaker 1: reasoning works by breaking down a prompt into a series 188 00:09:56,840 --> 00:10:00,480 Speaker 1: of different steps with considerations of different approaches. Like I 189 00:10:00,520 --> 00:10:03,439 Speaker 1: said earlier, effectively a large language model checking its own 190 00:10:03,480 --> 00:10:06,480 Speaker 1: homework with no thinking involved, because like I said, they 191 00:10:06,480 --> 00:10:09,520 Speaker 1: do not think or know things. And OpenAI rushed 192 00:10:09,559 --> 00:10:12,160 Speaker 1: to launch its o1 reasoning model last year because, 193 00:10:12,320 --> 00:10:15,720 Speaker 1: and I quote Fortune from last October, Sam Altman was 194 00:10:16,000 --> 00:10:19,320 Speaker 1: eager to prove to potential investors in the company's 195 00:10:19,400 --> 00:10:22,080 Speaker 1: latest funding round that OpenAI remains at the forefront 196 00:10:22,120 --> 00:10:25,480 Speaker 1: of AI development. And as I noted in my newsletter 197 00:10:25,520 --> 00:10:28,400 Speaker 1: at the time, it was not particularly reliable, failing to 198 00:10:28,440 --> 00:10:31,040 Speaker 1: accurately count the number of times the letter R appeared 199 00:10:31,040 --> 00:10:33,800 Speaker 1: in the word strawberry, which was the code name for 200 00:10:34,240 --> 00:10:38,080 Speaker 1: o1. Very funny stuff.
At this point, it's fairly obvious 201 00:10:38,120 --> 00:10:41,400 Speaker 1: that OpenAI wasn't anywhere near the forefront of AI development, 202 00:10:41,640 --> 00:10:44,440 Speaker 1: and now that its competitive advantage is effectively gone, there 203 00:10:44,440 --> 00:10:47,000 Speaker 1: are genuine doubts about what comes next for the company. 204 00:10:48,280 --> 00:10:51,000 Speaker 1: As I'll go into, there are many questionable parts of 205 00:10:51,000 --> 00:10:53,960 Speaker 1: DeepSeek's story: its funding, what GPUs it has, and how 206 00:10:54,040 --> 00:10:56,720 Speaker 1: much it actually spent training these models. But what we 207 00:10:56,840 --> 00:11:00,680 Speaker 1: definitively understand to be true is bad for OpenAI, 208 00:11:00,880 --> 00:11:03,480 Speaker 1: and I would argue every other large US tech firm 209 00:11:03,480 --> 00:11:06,160 Speaker 1: that's jumped onto the generative AI bandwagon in the past 210 00:11:06,160 --> 00:11:20,200 Speaker 1: few years. DeepSeek's models actually exist. They work, at 211 00:11:20,280 --> 00:11:22,880 Speaker 1: least by the standards of hallucination-prone LLMs that don't, 212 00:11:22,920 --> 00:11:25,959 Speaker 1: at the risk of repeating myself, know anything. They've been 213 00:11:26,000 --> 00:11:29,680 Speaker 1: independently verified to be competitive in performance, and they're magnitudes 214 00:11:29,800 --> 00:11:34,400 Speaker 1: cheaper in price than those from both hyperscalers, Google's Gemini, Meta's Llama, 215 00:11:34,440 --> 00:11:36,560 Speaker 1: Amazon Q, and so on and so forth, and from 216 00:11:36,600 --> 00:11:41,000 Speaker 1: those released by OpenAI and Anthropic. DeepSeek's models 217 00:11:41,040 --> 00:11:44,200 Speaker 1: don't require massive new data centers.
They run on GPUs 218 00:11:44,240 --> 00:11:47,040 Speaker 1: currently used to run services like ChatGPT, and even 219 00:11:47,080 --> 00:11:50,000 Speaker 1: work on more austere hardware. Nor do they require an 220 00:11:50,120 --> 00:11:53,840 Speaker 1: endless supply of bigger, faster Nvidia GPUs every single year 221 00:11:53,880 --> 00:11:57,920 Speaker 1: to progress. The entire AI bubble was inflated based on 222 00:11:57,960 --> 00:12:00,600 Speaker 1: the premise that these models were simply impossible to build 223 00:12:00,600 --> 00:12:04,000 Speaker 1: without burning massive amounts of cash, straining the power grid, 224 00:12:04,000 --> 00:12:07,400 Speaker 1: and blowing past emissions goals, and that these costs were 225 00:12:07,400 --> 00:12:11,560 Speaker 1: both necessary and really good because they'd lead to creating 226 00:12:11,600 --> 00:12:15,400 Speaker 1: powerful AI, something that's yet to happen. And it's kind 227 00:12:15,400 --> 00:12:18,319 Speaker 1: of obvious at this point that that wasn't true. Now 228 00:12:18,360 --> 00:12:22,600 Speaker 1: the markets are sitting around there asking a very reasonable question: shit, 229 00:12:22,760 --> 00:12:27,400 Speaker 1: did we just waste two hundred billion dollars? Anyway, let's 230 00:12:27,400 --> 00:12:30,720 Speaker 1: get into the nitty gritty. What is DeepSeek? First 231 00:12:30,760 --> 00:12:32,760 Speaker 1: of all, if you want to super deep dive into 232 00:12:32,800 --> 00:12:35,240 Speaker 1: what it is, I can't recommend VentureBeat's write-up enough. 233 00:12:35,280 --> 00:12:36,880 Speaker 1: I'll link to it in the show notes as I 234 00:12:36,960 --> 00:12:39,800 Speaker 1: usually do. It's really good and it goes into a 235 00:12:39,800 --> 00:12:42,120 Speaker 1: lot more detail than I will. But here's the too 236 00:12:42,200 --> 00:12:44,880 Speaker 1: long, didn't read for you.
DeepSeek is a spin 237 00:12:44,920 --> 00:12:47,520 Speaker 1: off from a Chinese hedge fund called High-Flyer Quant. 238 00:12:47,840 --> 00:12:50,079 Speaker 1: It's a relatively small and young company, and from its 239 00:12:50,120 --> 00:12:52,960 Speaker 1: inception it went big on algorithmic and AI-driven trading. 240 00:12:53,320 --> 00:12:56,120 Speaker 1: Later it started building its own standalone chatbots, including 241 00:12:56,120 --> 00:12:59,440 Speaker 1: a ChatGPT equivalent for the Chinese market. This is 242 00:12:59,559 --> 00:13:01,760 Speaker 1: what we know right now. I'm sure some of you 243 00:13:01,800 --> 00:13:05,080 Speaker 1: will say, oh, well, who knows if that's really true. Sure, 244 00:13:05,520 --> 00:13:07,760 Speaker 1: I think that that's fair. I also think that there 245 00:13:07,760 --> 00:13:09,880 Speaker 1: are parts of Sam Altman's legend that we should question 246 00:13:09,960 --> 00:13:13,280 Speaker 1: as well. I think the circumstances under which Sam Altman 247 00:13:13,360 --> 00:13:16,880 Speaker 1: got made head of Y Combinator are extremely questionable. I'm 248 00:13:16,920 --> 00:13:19,240 Speaker 1: saying you can question DeepSeek, and indeed you should. 249 00:13:19,240 --> 00:13:21,920 Speaker 1: We should be more critical of these powerful companies, but 250 00:13:22,040 --> 00:13:24,520 Speaker 1: don't do it halfway. If we're going to be worried, 251 00:13:24,600 --> 00:13:28,360 Speaker 1: let's be worried about everyone. Now, DeepSeek did a few 252 00:13:28,360 --> 00:13:31,200 Speaker 1: things differently, like open sourcing its models, although it likely 253 00:13:31,240 --> 00:13:34,800 Speaker 1: built upon tech from other companies, like Meta's Llama and the 254 00:13:35,160 --> 00:13:38,680 Speaker 1: ML library PyTorch, to train its models.
It secured over 255 00:13:38,760 --> 00:13:43,160 Speaker 1: ten thousand Nvidia GPUs right before the US imposed export restrictions, 256 00:13:43,160 --> 00:13:45,240 Speaker 1: which sounds like a lot, but it's a fraction of 257 00:13:45,240 --> 00:13:47,320 Speaker 1: what the big AI labs like Google, OpenAI, and 258 00:13:47,360 --> 00:13:50,480 Speaker 1: Anthropic have to play with. I think I've heard estimates 259 00:13:50,520 --> 00:13:53,120 Speaker 1: of like one hundred thousand to three hundred thousand each, 260 00:13:53,200 --> 00:13:56,199 Speaker 1: if not more. Now you've likely seen or heard that 261 00:13:56,280 --> 00:13:59,080 Speaker 1: DeepSeek trained its latest model for five point six 262 00:13:59,120 --> 00:14:01,520 Speaker 1: million dollars, as opposed to the insane amounts that I'll 263 00:14:01,520 --> 00:14:03,640 Speaker 1: get to later, and I want to be clear that 264 00:14:03,840 --> 00:14:06,760 Speaker 1: any and all mentions of this number are estimates. In fact, 265 00:14:06,800 --> 00:14:09,600 Speaker 1: the provenance of the five point five eight million 266 00:14:09,679 --> 00:14:12,000 Speaker 1: dollar number appears to be a citation of a post 267 00:14:12,040 --> 00:14:15,080 Speaker 1: made by an Nvidia engineer in an article from the 268 00:14:15,120 --> 00:14:18,199 Speaker 1: South China Morning Post, which links to another article from 269 00:14:18,240 --> 00:14:21,040 Speaker 1: the South China Morning Post, which simply states that 270 00:14:21,080 --> 00:14:23,480 Speaker 1: DeepSeek V3 comes with six hundred and seventy one 271 00:14:23,480 --> 00:14:25,880 Speaker 1: billion parameters and was trained in around two months at 272 00:14:25,880 --> 00:14:28,400 Speaker 1: the cost of five point five eight million dollars, with 273 00:14:28,480 --> 00:14:31,640 Speaker 1: no additional citations of any kind. So you should take 274 00:14:31,640 --> 00:14:36,320 Speaker 1: it with a pinch of salt.
But it's not totally ludicrous. Well, 275 00:14:36,360 --> 00:14:38,920 Speaker 1: there are some that have estimated the cost. DeepSeek's 276 00:14:39,000 --> 00:14:41,840 Speaker 1: V3 model was allegedly trained using two thousand and forty 277 00:14:41,880 --> 00:14:45,440 Speaker 1: eight Nvidia H800 GPUs, according to its paper, 278 00:14:46,000 --> 00:14:48,840 Speaker 1: and Ben Thompson of Stratechery has made this clear, that 279 00:14:48,880 --> 00:14:51,440 Speaker 1: the five point five million dollar number only covers the 280 00:14:51,480 --> 00:14:54,520 Speaker 1: literal training cost of the official training run, and this 281 00:14:54,640 --> 00:14:56,400 Speaker 1: is made fairly clear in the paper, by the way, 282 00:14:56,520 --> 00:14:59,080 Speaker 1: of V3, and that's the one that's competitive with 283 00:14:59,200 --> 00:15:02,400 Speaker 1: OpenAI's GPT-4o model, meaning that any costs 284 00:15:02,440 --> 00:15:04,680 Speaker 1: related to prior research or experiments on how to build 285 00:15:04,680 --> 00:15:07,800 Speaker 1: the model were left out. Now, big, big shout out to 286 00:15:07,800 --> 00:15:10,400 Speaker 1: Minimaxer, the guy on Bluesky and Twitter, he's great. 287 00:15:10,960 --> 00:15:13,200 Speaker 1: He is wonderful, and also added that this is fairly 288 00:15:13,200 --> 00:15:16,240 Speaker 1: standard for the industry. Again, you choose how you feel 289 00:15:16,240 --> 00:15:17,840 Speaker 1: about this, but I want to give you the information. 290 00:15:19,080 --> 00:15:21,680 Speaker 1: And while it's safe to say that DeepSeek's models 291 00:15:21,680 --> 00:15:24,600 Speaker 1: are cheaper to train, the actual costs, especially as 292 00:15:24,600 --> 00:15:27,040 Speaker 1: DeepSeek doesn't share its training data, which some might argue 293 00:15:27,040 --> 00:15:29,440 Speaker 1: means its models are not really open source.
As I said, 294 00:15:30,560 --> 00:15:33,400 Speaker 1: the numbers get a little harder to guess at. Thompson 295 00:15:33,440 --> 00:15:35,160 Speaker 1: notes that DeepSeek had to craft a bunch of 296 00:15:35,160 --> 00:15:38,560 Speaker 1: elegant workarounds to make the model perform, including writing code 297 00:15:38,560 --> 00:15:41,600 Speaker 1: that ultimately changed how GPUs actually communicated with each other. 298 00:15:41,960 --> 00:15:45,880 Speaker 1: This functionality isn't otherwise possible using Nvidia's developer tools. They 299 00:15:46,000 --> 00:15:47,760 Speaker 1: really had to get in there. It's kind of cool. 300 00:15:48,160 --> 00:15:50,720 Speaker 1: DeepSeek's models, V3 and R1, are more 301 00:15:50,760 --> 00:15:53,160 Speaker 1: efficient and, as a result, cheaper to run, and can 302 00:15:53,200 --> 00:15:56,560 Speaker 1: be accessed via its API at prices that are astronomically 303 00:15:56,640 --> 00:16:00,240 Speaker 1: cheaper than OpenAI's. DeepSeek Chat, running DeepSeek's 304 00:16:00,360 --> 00:16:03,960 Speaker 1: GPT-4o-competitive V3 model, costs zero point 305 00:16:04,040 --> 00:16:07,640 Speaker 1: zero seven dollars per one million input tokens, as in 306 00:16:07,680 --> 00:16:11,080 Speaker 1: commands given to the model, and one dollar ten 307 00:16:11,480 --> 00:16:14,520 Speaker 1: per one million output tokens, as in the resulting output 308 00:16:14,560 --> 00:16:16,800 Speaker 1: from the model. I know that these numbers kind of 309 00:16:16,840 --> 00:16:19,200 Speaker 1: like just sound like numbers, like, you maybe don't 310 00:16:19,240 --> 00:16:21,160 Speaker 1: have context, so let me give you some.
This is 311 00:16:21,200 --> 00:16:24,440 Speaker 1: a dramatic price drop from the two dollars fifty cents 312 00:16:24,480 --> 00:16:28,040 Speaker 1: per one million input tokens and ten dollars per one 313 00:16:28,080 --> 00:16:32,520 Speaker 1: million output tokens that OpenAI charges for GPT-4o. 314 00:16:33,200 --> 00:16:39,400 Speaker 1: This isn't just undercutting, this is a bunker buster. Now, 315 00:16:39,520 --> 00:16:41,560 Speaker 1: there is a side that I'll kind of get into 316 00:16:41,560 --> 00:16:44,160 Speaker 1: a little bit later, in that you are using models 317 00:16:44,160 --> 00:16:46,440 Speaker 1: hosted in a country that you don't know, probably China. 318 00:16:46,760 --> 00:16:49,920 Speaker 1: There are data concerns. But again, you can put this 319 00:16:50,040 --> 00:16:52,800 Speaker 1: on your own server. You could put this in Google Cloud. 320 00:16:52,880 --> 00:16:55,880 Speaker 1: Both Microsoft and Google are apparently thinking about it now. 321 00:16:55,880 --> 00:16:58,560 Speaker 1: The Information reported that Google had added it to Google Cloud. 322 00:16:58,720 --> 00:17:01,520 Speaker 1: No, they did not. They didn't do that. They allowed 323 00:17:01,520 --> 00:17:03,840 Speaker 1: you to connect Hugging Face. This is a whole bunch 324 00:17:03,840 --> 00:17:06,159 Speaker 1: of technical stuff that, if you understand, you'll be like, yeah, Ed, 325 00:17:06,240 --> 00:17:10,639 Speaker 1: I know. Long story short, the hyperscalers are already bringing 326 00:17:10,680 --> 00:17:13,920 Speaker 1: DeepSeek out, and I'll get to why that's bad 327 00:17:14,200 --> 00:17:17,480 Speaker 1: later in detail. But it's also very funny. Now here's 328 00:17:17,520 --> 00:17:20,680 Speaker 1: something else that's funny. DeepSeek Reasoner.
Its reasoning model 329 00:17:20,760 --> 00:17:23,600 Speaker 1: costs fifty five cents per one million input tokens 330 00:17:23,680 --> 00:17:27,160 Speaker 1: and two dollars and nineteen cents per one million output tokens. 331 00:17:27,359 --> 00:17:31,360 Speaker 1: Now that sounds expensive. Maybe it is. Whatever, that's goddamn 332 00:17:31,480 --> 00:17:34,760 Speaker 1: nothing compared to the fifteen dollars per one million input 333 00:17:34,840 --> 00:17:37,600 Speaker 1: tokens and sixty dollars per one million output tokens of 334 00:17:37,640 --> 00:17:41,960 Speaker 1: OpenAI's o one. If I'm Sam Altman, I'm shitting myself. 335 00:17:43,560 --> 00:17:45,800 Speaker 1: But there's an obvious but here. We do not know 336 00:17:45,840 --> 00:17:48,560 Speaker 1: where DeepSeek is hosting its models, who has access 337 00:17:48,560 --> 00:17:50,640 Speaker 1: to that data, or where that data is coming from 338 00:17:50,760 --> 00:17:52,960 Speaker 1: or going to. We don't know who funds DeepSeek, 339 00:17:53,040 --> 00:17:55,240 Speaker 1: other than it's connected to High-Flyer, the hedge fund 340 00:17:55,240 --> 00:17:57,320 Speaker 1: that I mentioned earlier that it split from in twenty 341 00:17:57,359 --> 00:17:59,760 Speaker 1: twenty three. There are concerns that DeepSeek could be 342 00:17:59,760 --> 00:18:02,200 Speaker 1: state funded, and that DeepSeek's low prices are a 343 00:18:02,280 --> 00:18:05,000 Speaker 1: kind of geopolitical weapon breaking the back of the generative 344 00:18:05,000 --> 00:18:08,440 Speaker 1: AI industry in America. I'm not really sure whether that's 345 00:18:08,480 --> 00:18:11,080 Speaker 1: the case or not.
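To put the per-token prices quoted earlier in this segment side by side, here is a minimal Python sketch. The dictionary keys and the function names `cost` and `price_ratio` are made up for illustration, the figures are only the ones read out in this episode (with the V three input price taken as seven cents per million tokens, which is an assumption about what was said), and real pricing pages may differ or change.

```python
# US dollars per one million tokens, as quoted in this episode.
# Assumption: the V3 input price is read as $0.07 per million tokens.
PRICES = {
    "deepseek_v3": {"input": 0.07, "output": 1.10},
    "gpt_4o": {"input": 2.50, "output": 10.00},
    "deepseek_r1": {"input": 0.55, "output": 2.19},
    "openai_o1": {"input": 15.00, "output": 60.00},
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the first model charges than the second."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


if __name__ == "__main__":
    # Hypothetical workload: one million tokens in, one million tokens out.
    for model in PRICES:
        print(f"{model}: ${cost(model, 1_000_000, 1_000_000):.2f}")
    print(f"o1 vs R1, input: {price_ratio('openai_o1', 'deepseek_r1', 'input'):.1f}x")
    print(f"GPT-4o vs V3, output: {price_ratio('gpt_4o', 'deepseek_v3', 'output'):.1f}x")
```

On these quoted numbers the reasoning models differ by roughly a factor of twenty seven on both input and output, and the chat models by roughly a factor of nine on output.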
It's certainly true that China has 346 00:18:11,119 --> 00:18:13,720 Speaker 1: long treated AI as a strategic part of its national 347 00:18:13,760 --> 00:18:16,840 Speaker 1: industrial policy and is reported to help companies and sectors 348 00:18:16,840 --> 00:18:18,800 Speaker 1: where it wants to catch up with the Western world. 349 00:18:19,480 --> 00:18:21,879 Speaker 1: The Made in China twenty twenty five initiative saw a 350 00:18:21,880 --> 00:18:25,399 Speaker 1: reported hundreds of billions of dollars provided to Chinese firms 351 00:18:25,440 --> 00:18:28,960 Speaker 1: working in industries like chip making, aviation, and, yeah, AI. 352 00:18:29,400 --> 00:18:32,760 Speaker 1: The extent of that support isn't exactly transparent, surprise, surprise, 353 00:18:33,000 --> 00:18:34,760 Speaker 1: and so it's not entirely out of the realm of 354 00:18:34,760 --> 00:18:37,800 Speaker 1: possibility that DeepSeek is also the recipient of state aid. 355 00:18:38,240 --> 00:18:39,760 Speaker 1: The good news is that we're going to find out 356 00:18:39,840 --> 00:18:43,720 Speaker 1: fairly quickly. American AI infrastructure company Groq is already bringing 357 00:18:43,760 --> 00:18:46,680 Speaker 1: DeepSeek's model online, meaning that we'll get at least 358 00:18:46,720 --> 00:18:49,760 Speaker 1: some sort of confirmation of whether these prices 359 00:18:49,760 --> 00:18:52,520 Speaker 1: are realistic or whether they're heavily subsidized by whoever it 360 00:18:52,560 --> 00:18:55,080 Speaker 1: is that backs DeepSeek. It's also true that 361 00:18:55,080 --> 00:18:57,280 Speaker 1: DeepSeek is owned in part by a hedge fund, which 362 00:18:57,359 --> 00:19:00,479 Speaker 1: likely isn't short of cash to pump into it.
But 363 00:19:00,520 --> 00:19:03,439 Speaker 1: as an aside, given that OpenAI is the 364 00:19:03,520 --> 00:19:07,199 Speaker 1: beneficiary of billions of dollars of cloud compute credits and 365 00:19:07,240 --> 00:19:10,600 Speaker 1: gets reduced pricing for Microsoft's Azure cloud services to run 366 00:19:10,640 --> 00:19:13,560 Speaker 1: its actual models, it's a bit tough for them to 367 00:19:13,600 --> 00:19:16,439 Speaker 1: complain about a rival being subsidized by a larger entity with 368 00:19:16,480 --> 00:19:18,960 Speaker 1: the ability to absorb the costs of doing business, should 369 00:19:19,040 --> 00:19:21,560 Speaker 1: that be the case. Same goes for Anthropic, by the way. 370 00:19:21,920 --> 00:19:24,359 Speaker 1: And yes, I know Microsoft isn't a state, but with 371 00:19:24,400 --> 00:19:26,960 Speaker 1: a market cap of three point two trillion dollars and 372 00:19:27,040 --> 00:19:30,320 Speaker 1: quarterly revenues larger than the combined GDPs of some EU 373 00:19:30,400 --> 00:19:33,000 Speaker 1: and NATO nations, it's kind of the next best thing. 374 00:19:33,640 --> 00:19:36,560 Speaker 1: But I digress. Whatever concerns there may be about malign 375 00:19:36,680 --> 00:19:40,000 Speaker 1: Chinese influence are bordering on irrelevant, outside of the low prices, 376 00:19:40,040 --> 00:19:43,080 Speaker 1: of course, offered by DeepSeek itself, and even that is 377 00:19:43,080 --> 00:19:46,080 Speaker 1: speculative at this point. Once these models are hosted elsewhere, 378 00:19:46,119 --> 00:19:48,240 Speaker 1: and once DeepSeek's methods, which I'll get to in 379 00:19:48,280 --> 00:19:50,760 Speaker 1: a little bit, are recreated, and by the way, that's 380 00:19:50,800 --> 00:19:52,840 Speaker 1: not really going to take very long,
I believe we're 381 00:19:52,840 --> 00:19:54,880 Speaker 1: going to see that these prices are indicative of how 382 00:19:54,960 --> 00:20:11,280 Speaker 1: cheap these models are to run. So you might be wondering, 383 00:20:11,359 --> 00:20:13,480 Speaker 1: how the hell is this so much cheaper? And that's 384 00:20:13,480 --> 00:20:15,639 Speaker 1: a bloody good question. And because I'm me, I have 385 00:20:15,680 --> 00:20:19,520 Speaker 1: a hypothesis. I do not believe that the companies making 386 00:20:19,600 --> 00:20:22,520 Speaker 1: these foundation models, such as OpenAI and Anthropic, have 387 00:20:22,600 --> 00:20:25,639 Speaker 1: actually been incentivized to do more with less. And because 388 00:20:25,680 --> 00:20:29,359 Speaker 1: their chummy little relationships with hyperscalers like Amazon, Google and 389 00:20:29,400 --> 00:20:33,040 Speaker 1: Microsoft were focused almost entirely on making the biggest, most 390 00:20:33,119 --> 00:20:37,240 Speaker 1: hugest models possible, using the biggest, even huger chips, and 391 00:20:37,280 --> 00:20:39,960 Speaker 1: because the absence of profitability didn't stop them from raising 392 00:20:40,000 --> 00:20:43,200 Speaker 1: more money, well, they've never had to be fucking efficient, 393 00:20:43,320 --> 00:20:46,520 Speaker 1: have they? They've never had to try. Maybe they should 394 00:20:46,520 --> 00:20:50,359 Speaker 1: buy less avocado fucking toast. Anyway, let me put it 395 00:20:50,359 --> 00:20:53,960 Speaker 1: in simpler terms. Imagine living on fifteen hundred dollars a month, 396 00:20:54,040 --> 00:20:55,639 Speaker 1: and then imagine how you'd live on one hundred and 397 00:20:55,680 --> 00:20:57,800 Speaker 1: fifty thousand dollars a month, and that you have to, 398 00:20:58,160 --> 00:21:00,479 Speaker 1: like Brewster's Millions, spend as much of it as 399 00:21:00,480 --> 00:21:04,240 Speaker 1: you can to complete a mission, a very simple mission. Live.
400 00:21:05,240 --> 00:21:08,320 Speaker 1: In the former example, your concern is survival. You have a 401 00:21:08,359 --> 00:21:10,280 Speaker 1: limited amount of money and must make it go as 402 00:21:10,280 --> 00:21:12,639 Speaker 1: far as possible, with real sacrifices to be made with 403 00:21:12,680 --> 00:21:14,880 Speaker 1: every dollar you spend. If you want to have fun, 404 00:21:15,080 --> 00:21:17,199 Speaker 1: you're going to have to eat less. Potentially all the 405 00:21:17,240 --> 00:21:19,240 Speaker 1: food you eat will have to be cheaper. You have 406 00:21:19,280 --> 00:21:21,640 Speaker 1: to live on a budget. You have to make decisions, 407 00:21:21,680 --> 00:21:24,399 Speaker 1: and indeed you might learn to cook at home. You 408 00:21:24,480 --> 00:21:27,520 Speaker 1: might walk more, you might do things that will help 409 00:21:27,560 --> 00:21:30,800 Speaker 1: you not spend all your money. In the latter example, 410 00:21:30,880 --> 00:21:32,720 Speaker 1: where you have one hundred and fifty thousand dollars a 411 00:21:32,760 --> 00:21:35,720 Speaker 1: month that you must spend, you're incentivized to splurge, to 412 00:21:35,800 --> 00:21:39,359 Speaker 1: lean into excess, to pursue this vague idea of living 413 00:21:39,400 --> 00:21:43,159 Speaker 1: your life. Your actions are dictated not by any existential threats, 414 00:21:43,240 --> 00:21:45,800 Speaker 1: or indeed any kind of future planning, but by whatever 415 00:21:45,840 --> 00:21:49,600 Speaker 1: you perceive to be an opportunity to live. OpenAI 416 00:21:49,720 --> 00:21:53,000 Speaker 1: and Anthropic are emblematic of what happens when survival takes 417 00:21:53,000 --> 00:21:56,240 Speaker 1: a back seat to living.
They have been incentivized by 418 00:21:56,280 --> 00:21:59,600 Speaker 1: frothy venture capital and public markets desperate for the next 419 00:21:59,600 --> 00:22:02,600 Speaker 1: big thing, the next big growth, to build bigger 420 00:22:02,600 --> 00:22:05,480 Speaker 1: models and sell even bigger dreams. Like Dario Amodei of 421 00:22:05,480 --> 00:22:08,800 Speaker 1: Anthropic saying that AI, and I quote, could surpass 422 00:22:08,840 --> 00:22:12,800 Speaker 1: almost all human beings at almost everything shortly after twenty 423 00:22:12,960 --> 00:22:16,000 Speaker 1: twenty seven. I just want to take a fucking second. Journalists, 424 00:22:16,040 --> 00:22:18,720 Speaker 1: if you're listening to this, stop fucking quoting this bullshit. 425 00:22:19,440 --> 00:22:22,800 Speaker 1: Stop it. You're doing nothing. You are failing at your 426 00:22:22,840 --> 00:22:26,840 Speaker 1: goddamn job every single time you quote this bullshit, this nonsense. 427 00:22:27,119 --> 00:22:29,800 Speaker 1: Shortly after twenty twenty seven. What the fuck does that mean? 428 00:22:29,840 --> 00:22:33,640 Speaker 1: Twenty twenty eight, twenty twenty nine, twenty thirty? What does 429 00:22:34,000 --> 00:22:38,760 Speaker 1: surpassing humans at almost everything even mean? This shit doesn't work. 430 00:22:38,840 --> 00:22:42,040 Speaker 1: This shit is not good. Oh my god. Anyway, back 431 00:22:42,080 --> 00:22:45,399 Speaker 1: to the podcast. Calm down. Both OpenAI and 432 00:22:45,440 --> 00:22:48,280 Speaker 1: Anthropic have effectively lived their existence with the infinite money 433 00:22:48,320 --> 00:22:50,320 Speaker 1: cheat from The Sims. And I know some of you 434 00:22:50,440 --> 00:22:52,120 Speaker 1: might say, by the way, it's not infinite money, 435 00:22:52,119 --> 00:22:54,440 Speaker 1: you just add, you go into the console. You get 436 00:22:54,440 --> 00:22:57,199 Speaker 1: my point.
And both companies have been bleeding billions of 437 00:22:57,200 --> 00:22:59,760 Speaker 1: dollars a year after revenue, and that's, by the way, 438 00:23:00,040 --> 00:23:03,080 Speaker 1: making billions of dollars and then still losing billions is insane, 439 00:23:03,480 --> 00:23:06,200 Speaker 1: and they still operated as if money would never run 440 00:23:06,200 --> 00:23:09,560 Speaker 1: out, and it wouldn't. If they were actually 441 00:23:09,560 --> 00:23:11,919 Speaker 1: worried about that happening, they would have certainly tried to 442 00:23:11,920 --> 00:23:14,439 Speaker 1: do what DeepSeek has done, except they didn't have 443 00:23:14,560 --> 00:23:16,720 Speaker 1: to, because both of them had the endless cash and 444 00:23:16,760 --> 00:23:20,720 Speaker 1: access to GPUs from either Microsoft, Amazon or Google. And 445 00:23:21,000 --> 00:23:23,480 Speaker 1: the Stargate thing is just, I will mention it later, 446 00:23:23,680 --> 00:23:26,280 Speaker 1: just long story short, they're not going to put five 447 00:23:26,359 --> 00:23:29,000 Speaker 1: hundred billion dollars into the, it was up to five 448 00:23:29,040 --> 00:23:32,800 Speaker 1: hundred bill, I'm so tired of this shit. OpenAI and 449 00:23:32,800 --> 00:23:35,359 Speaker 1: Anthropic have never been made to sweat, unlike me in 450 00:23:35,400 --> 00:23:38,320 Speaker 1: this closet where I'm recording this. And they've received endless 451 00:23:38,320 --> 00:23:40,600 Speaker 1: amounts of free marketing from a tech and business media 452 00:23:40,640 --> 00:23:44,320 Speaker 1: happy to print whatever vapid bullshit they spout, and it's 453 00:23:44,400 --> 00:23:48,080 Speaker 1: just very frustrating. They've raised money at will, with Anthropic, 454 00:23:48,119 --> 00:23:50,560 Speaker 1: by the way, currently raising another two billion dollars, 455 00:23:50,680 --> 00:23:52,840 Speaker 1: valuing the company at sixty billion dollars.
And this was, 456 00:23:52,920 --> 00:23:55,600 Speaker 1: I think, happening while DeepSeek was going on, which 457 00:23:55,640 --> 00:23:58,040 Speaker 1: is really funny. And they've done all of this off 458 00:23:58,040 --> 00:24:00,800 Speaker 1: of a narrative of: we need more money than 459 00:24:00,800 --> 00:24:04,080 Speaker 1: any company has ever needed, ever, because the things we're 460 00:24:04,080 --> 00:24:08,800 Speaker 1: doing have to cost this much. There is no other way. 461 00:24:09,000 --> 00:24:12,159 Speaker 1: You must give us more money. My name is Sam Altman. 462 00:24:12,200 --> 00:24:14,640 Speaker 1: I need more money than has ever been made for 463 00:24:14,680 --> 00:24:17,320 Speaker 1: my huge, beautiful company that sucks and needs money to 464 00:24:17,359 --> 00:24:20,440 Speaker 1: train it. Help me, please. My big, beautiful, sick company 465 00:24:20,480 --> 00:24:22,520 Speaker 1: is dying, but it's the best and most important company of 466 00:24:22,520 --> 00:24:28,119 Speaker 1: all time. It's also normal. Now, do I think that 467 00:24:28,200 --> 00:24:30,399 Speaker 1: they were aware that there were methods to make their 468 00:24:30,440 --> 00:24:34,280 Speaker 1: models more efficient? Sure. OpenAI tried and failed in 469 00:24:34,320 --> 00:24:36,560 Speaker 1: twenty twenty three to deliver a more efficient model to 470 00:24:36,600 --> 00:24:42,600 Speaker 1: Microsoft called Arrakis. I'm sure there are teams at both 471 00:24:42,600 --> 00:24:45,920 Speaker 1: Anthropic and OpenAI that are specifically dedicated to making things 472 00:24:46,040 --> 00:24:48,560 Speaker 1: kind of more efficient. But they didn't have to do it, 473 00:24:48,600 --> 00:24:51,639 Speaker 1: and so they didn't.
And as I've written before in 474 00:24:51,680 --> 00:24:54,400 Speaker 1: my newsletter and argued on this very podcast, OpenAI 475 00:24:54,520 --> 00:24:56,880 Speaker 1: simply burns money and has been allowed to burn money, 476 00:24:56,880 --> 00:24:58,879 Speaker 1: and up until recently likely would have been allowed to 477 00:24:58,880 --> 00:25:02,040 Speaker 1: burn even more money, because everybody, all of the American 478 00:25:02,080 --> 00:25:04,639 Speaker 1: model developers, appeared to agree that the only way to 479 00:25:04,640 --> 00:25:07,280 Speaker 1: develop large language models was to make them as big 480 00:25:07,400 --> 00:25:10,840 Speaker 1: as humanly possible and work out troublesome stuff like making 481 00:25:10,840 --> 00:25:14,240 Speaker 1: them profitable or turning them into a useful thing later, 482 00:25:14,560 --> 00:25:17,840 Speaker 1: which is, I presume, when AGI happens, a thing that 483 00:25:17,840 --> 00:25:20,679 Speaker 1: they're still in the process of defining, let alone doing. 484 00:25:21,760 --> 00:25:23,640 Speaker 1: DeepSeek, on the other hand, had to work out 485 00:25:23,640 --> 00:25:25,600 Speaker 1: a way to make its own large language models within 486 00:25:25,640 --> 00:25:28,000 Speaker 1: the constraints of the hamstrung Nvidia chips that can 487 00:25:28,040 --> 00:25:31,080 Speaker 1: be legally sold to China.
While there's a whole cottage 488 00:25:31,119 --> 00:25:34,160 Speaker 1: industry of selling chips in China using resellers and other 489 00:25:34,200 --> 00:25:37,280 Speaker 1: parties to get restricted silicon into the country, the entire 490 00:25:37,320 --> 00:25:40,040 Speaker 1: way in which DeepSeek went about developing its models 491 00:25:40,160 --> 00:25:44,240 Speaker 1: suggests that it was working around very specific memory bandwidth constraints, 492 00:25:44,560 --> 00:25:46,320 Speaker 1: meaning the amount of data that could be fed 493 00:25:46,320 --> 00:25:48,640 Speaker 1: into and out of the chips. 494 00:25:48,680 --> 00:25:51,720 Speaker 1: In essence, doing more with less wasn't something it chose, 495 00:25:51,720 --> 00:25:55,000 Speaker 1: but something it had to do. I've touched already 496 00:25:55,000 --> 00:25:57,160 Speaker 1: on the technical how of these models in greater depth, 497 00:25:57,200 --> 00:25:59,200 Speaker 1: and you can really read into that in my newsletter, 498 00:25:59,240 --> 00:26:01,359 Speaker 1: and you can go to wheresyoured dot 499 00:26:01,480 --> 00:26:03,200 Speaker 1: at, it's at the end of the episode. But I'll 500 00:26:03,200 --> 00:26:05,560 Speaker 1: also have show notes to articles like Ben Thompson's 501 00:26:05,520 --> 00:26:08,960 Speaker 1: on Stratechery, because there are lots of things to read here. 502 00:26:09,000 --> 00:26:11,160 Speaker 1: I know there are some really technical listeners, and I'm 503 00:26:11,160 --> 00:26:13,800 Speaker 1: sure you're gonna flame me in my emails. Please go 504 00:26:13,840 --> 00:26:16,080 Speaker 1: and read it. I'm not wrong. I've checked with a 505 00:26:16,080 --> 00:26:18,920 Speaker 1: lot of people too. And by the way, all of 506 00:26:18,920 --> 00:26:22,399 Speaker 1: this austerity stuff seems to have worked.
There's also the 507 00:26:22,440 --> 00:26:26,840 Speaker 1: training data situation and another mea culpa. I've previously discussed 508 00:26:26,880 --> 00:26:29,760 Speaker 1: the concept of model collapse and how feeding synthetic data, 509 00:26:29,800 --> 00:26:32,639 Speaker 1: which is training data created by a generative model, into 510 00:26:32,680 --> 00:26:35,440 Speaker 1: another model could end up teaching it bad habits, which 511 00:26:35,440 --> 00:26:37,800 Speaker 1: in turn would destroy the model. But it seems that 512 00:26:37,840 --> 00:26:41,240 Speaker 1: DeepSeek has succeeded in training its models using generated data, 513 00:26:41,760 --> 00:26:45,919 Speaker 1: specifically, though, and I'm quoting GeekWire's Jon Turow, for subjects like mathematics 514 00:26:45,960 --> 00:26:49,000 Speaker 1: where correctness is unambiguous, and using, and I quote again, 515 00:26:49,240 --> 00:26:52,640 Speaker 1: highly efficient reward functions that could identify which new 516 00:26:52,680 --> 00:26:55,959 Speaker 1: training examples would actually improve the model, avoiding wasted compute 517 00:26:55,960 --> 00:26:59,000 Speaker 1: on redundant data. And it seems to have worked, though 518 00:26:59,040 --> 00:27:02,080 Speaker 1: model collapse may still be a possibility. This approach, an extremely 519 00:27:02,119 --> 00:27:04,720 Speaker 1: precise use of synthetic data, is in line with some 520 00:27:04,760 --> 00:27:07,399 Speaker 1: of the defenses against model collapse I've heard from LLM 521 00:27:07,440 --> 00:27:10,600 Speaker 1: developers I've talked to. This is also a situation where 522 00:27:10,640 --> 00:27:13,440 Speaker 1: we don't know the exact training data, and it doesn't 523 00:27:13,480 --> 00:27:16,320 Speaker 1: negate any of the previous points I've made about model collapse. 524 00:27:17,119 --> 00:27:20,520 Speaker 1: Now we'll see what happens there.
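The idea that synthetic data is safer where correctness is unambiguous can be sketched with a toy Python example. This is an illustration under stated assumptions and not DeepSeek's actual pipeline: `noisy_generator` is a made-up stand-in for a generative model that is sometimes wrong, the binary reward is the simplest possible checker, and nothing here models the part of the quote about picking which examples would actually improve a model.

```python
import random


def make_problem(rng: random.Random) -> tuple[str, int]:
    """Create an addition problem whose true answer can be checked exactly."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return f"{a} + {b}", a + b


def noisy_generator(truth: int, rng: random.Random, error_rate: float = 0.3) -> int:
    """Hypothetical stand-in for a generative model: sometimes answers wrong."""
    if rng.random() < error_rate:
        return truth + rng.choice([-2, -1, 1, 2])
    return truth


def build_verified_dataset(n: int, seed: int = 0) -> list[tuple[str, int]]:
    """Generate n synthetic examples and keep only those that pass the check."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        question, truth = make_problem(rng)
        proposed = noisy_generator(truth, rng)
        # Binary reward: 1 if the proposed answer is verifiably right, else 0.
        reward = 1 if proposed == truth else 0
        if reward == 1:
            kept.append((question, proposed))
    return kept


if __name__ == "__main__":
    dataset = build_verified_dataset(200)
    print(f"kept {len(dataset)} of 200 generated examples")
```

The check works only because arithmetic has a single right answer. For written text or analysis there is no equivalent of `proposed == truth`, so a filter like this has nothing solid to compare against.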
But synthetic data might 525 00:27:20,560 --> 00:27:22,359 Speaker 1: work where the output is something that you could figure 526 00:27:22,359 --> 00:27:24,800 Speaker 1: out using a calculator. But when you get into anything 527 00:27:24,840 --> 00:27:26,840 Speaker 1: a bit more fuzzy, like written text or anything with 528 00:27:26,880 --> 00:27:30,680 Speaker 1: an element of analysis, you'll likely encounter some unhappy side effects. 529 00:27:30,840 --> 00:27:32,760 Speaker 1: But I don't know if that's really going to change 530 00:27:32,760 --> 00:27:35,679 Speaker 1: how good these things are. There's also a little scuttlebutt 531 00:27:35,680 --> 00:27:38,840 Speaker 1: about where DeepSeek got its data. Ben Thompson 532 00:27:38,880 --> 00:27:42,080 Speaker 1: at Stratechery suggests that DeepSeek's models are potentially distilling 533 00:27:42,160 --> 00:27:45,040 Speaker 1: other models' outputs, by which I mean having another model, 534 00:27:45,080 --> 00:27:48,520 Speaker 1: say Meta's Llama or OpenAI's GPT four oh, which 535 00:27:48,560 --> 00:27:51,119 Speaker 1: is why DeepSeek identified itself as ChatGPT at 536 00:27:51,160 --> 00:27:54,240 Speaker 1: one point, spit out outputs specifically to train parts of 537 00:27:54,240 --> 00:27:57,600 Speaker 1: DeepSeek. This obviously violates the terms of service of 538 00:27:57,640 --> 00:28:00,280 Speaker 1: these tools, as OpenAI and its rivals would much rather 539 00:28:00,400 --> 00:28:03,240 Speaker 1: you not use their technology to create their next rival. 540 00:28:03,800 --> 00:28:07,480 Speaker 1: And OpenAI, by the way, has recently reportedly found 541 00:28:07,480 --> 00:28:10,880 Speaker 1: evidence that DeepSeek used OpenAI's models to train 542 00:28:10,960 --> 00:28:14,160 Speaker 1: its rival.
And this is from the Financial Times, although 543 00:28:14,200 --> 00:28:16,800 Speaker 1: it failed to make any formal allegations, but it did 544 00:28:16,880 --> 00:28:19,520 Speaker 1: say that using ChatGPT to train a competing model 545 00:28:19,640 --> 00:28:22,920 Speaker 1: violates its terms of service. And David Sacks, the investor 546 00:28:22,920 --> 00:28:25,920 Speaker 1: and Trump administration AI and crypto czar, says it's possible 547 00:28:25,960 --> 00:28:29,320 Speaker 1: that this occurred, although he failed to provide evidence. I 548 00:28:29,440 --> 00:28:31,760 Speaker 1: just want to say how fucking funny it is that 549 00:28:31,920 --> 00:28:36,000 Speaker 1: OpenAI is going, wah, wah, you're stealing my stuff. 550 00:28:36,040 --> 00:28:41,440 Speaker 1: Don't steal my things. Wah. Fucking coward, pansy bastard bitches. 551 00:28:41,560 --> 00:28:44,880 Speaker 1: Fucking hell, what a bunch of whiny babies. 552 00:28:44,960 --> 00:28:49,400 Speaker 1: Oh no, my plagiarism machine got plagiarized. Wah. Kiss my 553 00:28:49,760 --> 00:28:54,160 Speaker 1: entire asshole, Sam Altman, you little worm, you fucking embarrassment 554 00:28:54,200 --> 00:28:56,640 Speaker 1: to Silicon Valley. You should be ashamed of yourself for 555 00:28:56,680 --> 00:29:01,120 Speaker 1: many reasons, but so much for this, though. Wah. Yeah, oh no, 556 00:29:01,320 --> 00:29:03,800 Speaker 1: you stole from us. My plagiarism machine that 557 00:29:03,880 --> 00:29:07,200 Speaker 1: requires me to steal from literally every artist and author 558 00:29:07,240 --> 00:29:09,640 Speaker 1: on the Internet. The thing where we went on YouTube 559 00:29:09,680 --> 00:29:12,760 Speaker 1: and transcribed everything and fed it into the machine, that's, 560 00:29:12,800 --> 00:29:15,680 Speaker 1: that's not stealing, that's good. But you using our model 561 00:29:15,720 --> 00:29:19,200 Speaker 1: to generate answers, that's just not fair.
What a bunch 562 00:29:19,240 --> 00:29:22,160 Speaker 1: of babies, you guys. Sam Altman's worth billions of dollars. 563 00:29:22,240 --> 00:29:24,880 Speaker 1: He has a five million dollar car. Cry more, you 564 00:29:24,960 --> 00:29:29,080 Speaker 1: little worm. Personally, I genuinely want OpenAI to point 565 00:29:29,080 --> 00:29:31,600 Speaker 1: a finger at DeepSeek and accuse it of IP theft, 566 00:29:32,080 --> 00:29:35,280 Speaker 1: mostly for the yuks, but also for the hypocrisy factor. 567 00:29:35,600 --> 00:29:38,440 Speaker 1: This is a company that, as I've just very cleanly said, 568 00:29:38,600 --> 00:29:42,240 Speaker 1: exists purely from the wholesale industrial larceny of content produced 569 00:29:42,240 --> 00:29:46,200 Speaker 1: by literally fucking everyone. And now they're crying, wah, 570 00:29:47,040 --> 00:29:49,920 Speaker 1: I'm Sam Altman. I'm a big baby. I've filled my 571 00:29:50,080 --> 00:29:54,280 Speaker 1: diaper because someone stole from my plagiarism machine. Kiss my ass. 572 00:29:55,000 --> 00:29:58,920 Speaker 1: Kiss my ass. These companies haven't got shit. OpenAI 573 00:29:59,040 --> 00:30:01,840 Speaker 1: doesn't have shit. They don't have anything. They don't 574 00:30:01,880 --> 00:30:05,360 Speaker 1: have a next product. Without reasoning, they haven't got anything. 575 00:30:05,600 --> 00:30:10,360 Speaker 1: And now they don't have that disgusting justification for overspending, 576 00:30:10,400 --> 00:30:14,160 Speaker 1: the fat, ugly American startup culture of spending as much 577 00:30:14,200 --> 00:30:17,080 Speaker 1: as you can to build America's next top monopoly. They 578 00:30:17,080 --> 00:30:20,720 Speaker 1: should be fucking ashamed of themselves. They shouldn't be billionaires, 579 00:30:20,760 --> 00:30:23,880 Speaker 1: they should be poverty stricken.
They should have to pay 580 00:30:23,880 --> 00:30:27,680 Speaker 1: everyone they stole from. And it's just, it sickens me 581 00:30:27,800 --> 00:30:31,120 Speaker 1: seeing the reaction from some people on this, seeing the sinophobia, 582 00:30:31,280 --> 00:30:33,920 Speaker 1: but seeing this level of defensiveness of a company like 583 00:30:33,960 --> 00:30:37,560 Speaker 1: OpenAI or Anthropic. And as I'll get into next episode, 584 00:30:37,640 --> 00:30:40,200 Speaker 1: we are really running out of time here, and I 585 00:30:40,240 --> 00:30:43,960 Speaker 1: think DeepSeek is really, I think it could be 586 00:30:44,080 --> 00:30:47,360 Speaker 1: really the end of days for these companies. I don't 587 00:30:47,360 --> 00:30:50,000 Speaker 1: know how much they've got left, time wise or even 588 00:30:50,040 --> 00:30:53,120 Speaker 1: money wise, and I'm not sure how they even raise money. 589 00:30:53,200 --> 00:30:55,240 Speaker 1: But in the next episode, I'm going to deep dive 590 00:30:55,280 --> 00:30:58,040 Speaker 1: into DeepSeek, and I'll tell you how they sent 591 00:30:58,120 --> 00:31:00,120 Speaker 1: the US tech market into a panic and what it 592 00:31:00,120 --> 00:31:03,760 Speaker 1: actually means for the future of OpenAI, Anthropic and the hyperscalers 593 00:31:03,840 --> 00:31:06,920 Speaker 1: backing them. This has been a crazy few days. 594 00:31:07,400 --> 00:31:10,480 Speaker 1: I hope this has helped, and on Monday you'll find 595 00:31:10,480 --> 00:31:13,800 Speaker 1: out more. Thank you so much for listening. The support 596 00:31:13,840 --> 00:31:15,520 Speaker 1: I've got for the show has been incredible, and the 597 00:31:15,560 --> 00:31:19,880 Speaker 1: emails I've got about DeepSeek. I've been trying, okay? 598 00:31:19,920 --> 00:31:22,240 Speaker 1: I've really been trying. This is the fastest I could do it.
599 00:31:22,880 --> 00:31:24,520 Speaker 1: But I'm so happy to do this show, and I'm 600 00:31:24,520 --> 00:31:34,680 Speaker 1: so grateful for all of you. Thank you for listening 601 00:31:34,720 --> 00:31:37,360 Speaker 1: to Better Offline. The editor and composer of the Better 602 00:31:37,400 --> 00:31:40,400 Speaker 1: Offline theme song is Mattosowski. You can check out more 603 00:31:40,440 --> 00:31:43,920 Speaker 1: of his music and audio projects at Mattosowski dot com, 604 00:31:44,040 --> 00:31:48,960 Speaker 1: M A T T O S O W S K I dot com. 605 00:31:49,000 --> 00:31:51,320 Speaker 1: You can email me at E Z at Better Offline dot 606 00:31:51,360 --> 00:31:53,560 Speaker 1: com or visit Better Offline dot com to find more 607 00:31:53,600 --> 00:31:57,000 Speaker 1: podcast links and, of course, my newsletter. I also really 608 00:31:57,000 --> 00:31:59,320 Speaker 1: recommend you go to chat dot wheresyoured dot at 609 00:31:59,320 --> 00:32:01,760 Speaker 1: to visit the Discord, and go to r slash Better 610 00:32:01,800 --> 00:32:04,960 Speaker 1: Offline to check out our Reddit. Thank you so much 611 00:32:05,000 --> 00:32:08,840 Speaker 1: for listening. Better Offline is a production of Cool Zone Media. 612 00:32:08,960 --> 00:32:12,360 Speaker 1: For more from Cool Zone Media, visit our website, Coolzonemedia 613 00:32:12,400 --> 00:32:15,200 Speaker 1: dot com, or check us out on the iHeartRadio app, 614 00:32:15,280 --> 00:32:17,720 Speaker 1: Apple Podcasts, or wherever you get your podcasts.