WEBVTT - What The Hell Is DeepSeek?

0:00:02.400 --> 0:00:07.040
<v Speaker 1>A media Hello and welcome to Better Offline. I'm your

0:00:07.040 --> 0:00:20.759
<v Speaker 1>host Ed Zitron. What a lot of you have been

0:00:20.800 --> 0:00:24.320
<v Speaker 1>getting in touch? Yes, you're getting your DeepSeek episode.

0:00:24.320 --> 0:00:26.080
<v Speaker 1>In fact, this is the first of a two parter.

0:00:26.640 --> 0:00:28.520
<v Speaker 1>This will come out on Friday, which is when you're

0:00:28.560 --> 0:00:30.960
<v Speaker 1>listening to this, and then it'll follow up on Monday.

0:00:31.320 --> 0:00:36.040
<v Speaker 1>I apologize. I spent a lot of Monday writing this

0:00:36.159 --> 0:00:38.720
<v Speaker 1>and also learning about a lot of this stuff in

0:00:38.760 --> 0:00:41.479
<v Speaker 1>an attempt to distill it as best I could. This

0:00:41.560 --> 0:00:45.120
<v Speaker 1>situation is extremely weird, and it's developing, and I think

0:00:45.159 --> 0:00:46.839
<v Speaker 1>even when I put out this episode there will be

0:00:46.920 --> 0:00:49.320
<v Speaker 1>new parts of it that I have yet to really

0:00:49.360 --> 0:00:52.400
<v Speaker 1>get to. I will do my absolute best to explain

0:00:52.440 --> 0:00:55.240
<v Speaker 1>in these episodes both what is happening with DeepSeek,

0:00:55.640 --> 0:00:58.320
<v Speaker 1>what it means, what they've built, and what it's going

0:00:58.360 --> 0:01:01.880
<v Speaker 1>to do in the future. But let's begin. So, as

0:01:01.920 --> 0:01:05.200
<v Speaker 1>January came to a close, the entire generative AI industry

0:01:05.240 --> 0:01:07.880
<v Speaker 1>found itself in a kind of chaos. In short, the

0:01:07.920 --> 0:01:10.880
<v Speaker 1>recent AI bubble and in particular the hundreds of billions

0:01:10.920 --> 0:01:14.040
<v Speaker 1>of dollars being spent on it, hinged on this big

0:01:14.080 --> 0:01:16.720
<v Speaker 1>idea that we need bigger models, which are both trained

0:01:16.760 --> 0:01:20.319
<v Speaker 1>and run on bigger and even larger GPUs, almost entirely

0:01:20.319 --> 0:01:23.639
<v Speaker 1>sold by Nvidia, and in turn they're based in bigger

0:01:23.680 --> 0:01:27.880
<v Speaker 1>and bigger data centers owned by companies like Microsoft, Oracle, Amazon,

0:01:27.920 --> 0:01:31.320
<v Speaker 1>and Google. Now, there was also this expectation that this

0:01:31.360 --> 0:01:34.840
<v Speaker 1>would always be the case. Hubris within this industry is

0:01:34.920 --> 0:01:39.280
<v Speaker 1>kind of part of the whole deal, and generative AI

0:01:39.520 --> 0:01:41.360
<v Speaker 1>was always meant to be this way, at least for

0:01:41.400 --> 0:01:43.920
<v Speaker 1>the American developers. It was always meant to be energy

0:01:43.959 --> 0:01:47.280
<v Speaker 1>and compute hungry. Throwing entire zoos' worth of animals and

0:01:47.360 --> 0:01:50.000
<v Speaker 1>boiling lakes was necessary to do this. There was never

0:01:50.000 --> 0:01:54.040
<v Speaker 1>any other way to do it, and I thought, at

0:01:54.120 --> 0:01:56.200
<v Speaker 1>least I've thought for a while that this was because

0:01:56.280 --> 0:01:59.760
<v Speaker 1>they just they tried to make them more efficient, but

0:01:59.840 --> 0:02:03.080
<v Speaker 1>they couldn't. There was just something about transformer based architecture,

0:02:03.120 --> 0:02:05.639
<v Speaker 1>like the stuff that underpins ChatGPT, so the GPT

0:02:05.720 --> 0:02:09.160
<v Speaker 1>model under ChatGPT either. It wasn't the case, though.

0:02:10.000 --> 0:02:12.960
<v Speaker 1>A Chinese artificial intelligence company that few people had really

0:02:13.040 --> 0:02:15.280
<v Speaker 1>heard of, called DeepSeek, came along a few weeks

0:02:15.280 --> 0:02:19.000
<v Speaker 1>ago with multiple models that aren't merely competitive with OpenAI's,

0:02:19.160 --> 0:02:22.919
<v Speaker 1>but actually undercut them in several meaningful ways. DeepSeek's

0:02:22.960 --> 0:02:25.680
<v Speaker 1>models are both open source, which means that their source

0:02:25.720 --> 0:02:29.680
<v Speaker 1>code and research is public, and they're significantly more efficient

0:02:29.680 --> 0:02:32.640
<v Speaker 1>as well, as much as thirty times cheaper to run.

0:02:32.720 --> 0:02:34.880
<v Speaker 1>In the case of their reasoning model, R1, which

0:02:34.919 --> 0:02:38.560
<v Speaker 1>is competitive with OpenAI's o1, and fifteen or more

0:02:38.639 --> 0:02:43.200
<v Speaker 1>times more efficient than GPT-4o. It's actually kind

0:02:43.240 --> 0:02:45.200
<v Speaker 1>of crazy when you think about it, and as you're

0:02:45.240 --> 0:02:47.640
<v Speaker 1>going to hear, this whole thing has Jokerfied me all

0:02:47.680 --> 0:02:50.440
<v Speaker 1>over again. And what's crazy is that some of them

0:02:50.440 --> 0:02:52.799
<v Speaker 1>can be distilled, which I'll get to later, and run

0:02:52.840 --> 0:02:55.800
<v Speaker 1>on local devices like a laptop. It's kind of crazy,

0:02:56.160 --> 0:02:58.600
<v Speaker 1>and as a result, the markets have kind of panicked

0:02:58.639 --> 0:03:02.480
<v Speaker 1>because the entire narrative of the AI bubble has been

0:03:02.520 --> 0:03:04.800
<v Speaker 1>that these models have to be expensive because they are

0:03:04.840 --> 0:03:08.079
<v Speaker 1>the future, and that's why hyperscalers had to burn two

0:03:08.160 --> 0:03:12.040
<v Speaker 1>hundred billion dollars in capital expenditures for infrastructure to support

0:03:12.080 --> 0:03:15.919
<v Speaker 1>this wonderful boom, and specifically the ideas of OpenAI

0:03:16.000 --> 0:03:18.920
<v Speaker 1>and Anthropic. The idea that there was another way to

0:03:18.960 --> 0:03:20.840
<v Speaker 1>do this, that in fact, we didn't need to spend

0:03:20.840 --> 0:03:22.600
<v Speaker 1>all this money, and that maybe we could find a

0:03:22.639 --> 0:03:26.960
<v Speaker 1>more efficient way of doing it. Well, that would require

0:03:27.000 --> 0:03:29.239
<v Speaker 1>them to have another idea rather than throw as much

0:03:29.280 --> 0:03:32.000
<v Speaker 1>money at the problem as possible. Yeah, they just didn't

0:03:32.080 --> 0:03:35.200
<v Speaker 1>consider it, it turns out. And now along has

0:03:35.240 --> 0:03:38.920
<v Speaker 1>come this outsider that's upended the whole conventional understanding and

0:03:39.120 --> 0:03:43.440
<v Speaker 1>perhaps even dethroned a member of America's tech royalty, Sam Altman,

0:03:43.480 --> 0:03:46.000
<v Speaker 1>a man who has crafted, if not a cult of personality,

0:03:46.400 --> 0:03:49.440
<v Speaker 1>some sort of public image of an unassailable visionary that

0:03:49.480 --> 0:03:52.200
<v Speaker 1>will lead the vanguard in the biggest technological change since

0:03:52.200 --> 0:03:56.680
<v Speaker 1>the Internet. Yeah, he's wrong. He never was doing that.

0:03:56.960 --> 0:03:59.440
<v Speaker 1>I've been saying it for a while. He's never been

0:03:59.480 --> 0:04:02.880
<v Speaker 1>doing this. But DeepSeek isn't just an outsider now.

0:04:02.920 --> 0:04:05.440
<v Speaker 1>They are a company that's emerged as a side project

0:04:05.440 --> 0:04:08.720
<v Speaker 1>from a tiny, tiny Chinese hedge fund, at least by

0:04:08.760 --> 0:04:10.880
<v Speaker 1>the standards of hedge funds, like five point five billion

0:04:10.920 --> 0:04:14.200
<v Speaker 1>dollars in assets under management, and their founding team

0:04:14.240 --> 0:04:16.839
<v Speaker 1>has nowhere near the level of fame and celebrity or

0:04:16.839 --> 0:04:21.000
<v Speaker 1>even the accolades of Sam Altman. It's distinctly humiliating for

0:04:21.080 --> 0:04:23.920
<v Speaker 1>everyone involved that isn't DeepSeek. And on top

0:04:23.920 --> 0:04:27.479
<v Speaker 1>of all of that, DeepSeek's biggest, ugliest insult is

0:04:27.480 --> 0:04:30.360
<v Speaker 1>that its model, DeepSeek R1, is competitive, like

0:04:30.400 --> 0:04:33.799
<v Speaker 1>I said, with OpenAI's incredibly expensive o1 reasoning model,

0:04:33.960 --> 0:04:37.880
<v Speaker 1>yet significantly and I mean ninety six percent cheaper to run.

0:04:38.120 --> 0:04:40.120
<v Speaker 1>And it can even be run locally. Like I said

0:04:40.440 --> 0:04:42.520
<v Speaker 1>speaking to a few developers I know, one was able

0:04:42.520 --> 0:04:44.679
<v Speaker 1>to run DeepSeek's R1 model on their twenty

0:04:44.760 --> 0:04:47.279
<v Speaker 1>twenty one MacBook Pro with an M1 chip. That

0:04:47.400 --> 0:04:51.480
<v Speaker 1>is a four year old computer, not a thirty thousand

0:04:51.680 --> 0:04:55.440
<v Speaker 1>GPU inside. It's kind of crazy. Worse still, DeepSeek's

0:04:55.480 --> 0:04:58.159
<v Speaker 1>models are made freely available to use, with the source

0:04:58.160 --> 0:05:01.200
<v Speaker 1>code published under the MIT license, along with the

0:05:01.200 --> 0:05:04.119
<v Speaker 1>research on how they were made, although not the training data,

0:05:04.160 --> 0:05:06.159
<v Speaker 1>which makes some people say it's not really open source.

0:05:06.160 --> 0:05:08.280
<v Speaker 1>But for the sake of argument, I'm just going to

0:05:08.320 --> 0:05:11.080
<v Speaker 1>say open source. And this means by the way that

0:05:11.320 --> 0:05:14.120
<v Speaker 1>DeepSeek's models can be adapted and used for commercial

0:05:14.200 --> 0:05:17.599
<v Speaker 1>use without the need for royalties or fees. Anyone can

0:05:17.640 --> 0:05:20.880
<v Speaker 1>take this and build their own. It's kind of crazy.

0:05:21.400 --> 0:05:24.200
<v Speaker 1>By contrast, OpenAI is anything but open, and its

0:05:24.240 --> 0:05:26.840
<v Speaker 1>last LLM to be released under the MIT license was

0:05:26.880 --> 0:05:30.479
<v Speaker 1>twenty nineteen's GPT-2. No, no, wait, wait, shit,

0:05:30.680 --> 0:05:33.800
<v Speaker 1>let me correct that. DeepSeek's biggest, ugliest secret is

0:05:33.839 --> 0:05:36.880
<v Speaker 1>actually that it's obviously taking aim at every element of

0:05:36.920 --> 0:05:40.839
<v Speaker 1>OpenAI's portfolio. As the company was already dominating headlines,

0:05:40.880 --> 0:05:43.719
<v Speaker 1>this week it quietly dropped its Janus Pro 7B

0:05:43.839 --> 0:05:47.360
<v Speaker 1>image generation and analysis model, which the company says outperforms

0:05:47.360 --> 0:05:50.719
<v Speaker 1>both Stable Diffusion and OpenAI's DALL-E 3. And those

0:05:50.760 --> 0:05:53.480
<v Speaker 1>are, by the way, image generation things. So you type

0:05:53.480 --> 0:05:57.200
<v Speaker 1>in something like Garfield with boobs, and then out comes

0:05:57.200 --> 0:06:00.560
<v Speaker 1>a Garfield with juicy cans, and that's probably the first

0:06:00.560 --> 0:06:02.560
<v Speaker 1>time you hear that on the podcast, but probably not

0:06:02.640 --> 0:06:06.840
<v Speaker 1>the last. And as with its other code, DeepSeek

0:06:06.880 --> 0:06:09.320
<v Speaker 1>has made this freely available to both commercial and personal

0:06:09.400 --> 0:06:13.560
<v Speaker 1>users alike, whereas OpenAI has largely paywalled DALL-E 3.

0:06:13.640 --> 0:06:17.520
<v Speaker 1>This is really, it's a truly crazy situation. And it's

0:06:17.520 --> 0:06:20.520
<v Speaker 1>also this cynical, vulgar version of David and Goliath, where

0:06:20.520 --> 0:06:23.200
<v Speaker 1>a tech startup backed by a shadowy Chinese hedge fund

0:06:23.360 --> 0:06:26.520
<v Speaker 1>with eight billion dollars under management is somehow the plucky

0:06:26.560 --> 0:06:29.000
<v Speaker 1>upstart against the lumbering, lossy, oafish one hundred and

0:06:29.040 --> 0:06:33.000
<v Speaker 1>fifty billion dollar startup backed by multiple public tech companies

0:06:33.000 --> 0:06:36.599
<v Speaker 1>with a market capitalization of over three trillion dollars. I realize,

0:06:36.600 --> 0:06:39.119
<v Speaker 1>by the way, I said earlier five point five billion

0:06:39.160 --> 0:06:41.719
<v Speaker 1>dollars under management. This is why you check your notes

0:06:41.720 --> 0:06:44.040
<v Speaker 1>in advance. But I'm not cutting it. This is fresh.

0:06:44.120 --> 0:06:47.120
<v Speaker 1>I am inside a closet in New York. The content

0:06:47.320 --> 0:06:51.159
<v Speaker 1>must flow. Anyway, DeepSeek's V3 model, which is

0:06:51.160 --> 0:06:54.080
<v Speaker 1>comparable and competitive with both OpenAI's GPT-4o

0:06:54.160 --> 0:06:57.360
<v Speaker 1>and Anthropic's Claude Sonnet three point five models, which by

0:06:57.360 --> 0:07:00.480
<v Speaker 1>the way, has some reasoning features. As I said, it's

0:07:00.520 --> 0:07:03.839
<v Speaker 1>fifty three times cheaper to run the R one when

0:07:03.920 --> 0:07:08.040
<v Speaker 1>using the company's own cloud services, and as mentioned earlier,

0:07:08.080 --> 0:07:11.000
<v Speaker 1>said model is effectively free for anyone to use locally

0:07:11.080 --> 0:07:13.240
<v Speaker 1>or on their own cloud instances, and could be taken

0:07:13.280 --> 0:07:15.640
<v Speaker 1>by any commercial enterprise and turned into a product of

0:07:15.640 --> 0:07:19.680
<v Speaker 1>their own should they desire to, say, compete with OpenAI,

0:07:19.800 --> 0:07:24.400
<v Speaker 1>the loudest and most annoying startup of all time. In essence, DeepSeek,

0:07:24.440 --> 0:07:26.800
<v Speaker 1>and I'll get into its background and the concerns people

0:07:26.840 --> 0:07:29.600
<v Speaker 1>might have about its Chinese origins, released two models that

0:07:29.640 --> 0:07:32.640
<v Speaker 1>perform competitively and even beat models from both OpenAI

0:07:32.720 --> 0:07:35.760
<v Speaker 1>and Anthropic, undercut them in price, and then made them

0:07:35.800 --> 0:07:38.880
<v Speaker 1>open undermining not just the economics of the biggest generative

0:07:38.880 --> 0:07:42.360
<v Speaker 1>AI companies, but laying bare exactly how they work. The

0:07:42.400 --> 0:07:47.240
<v Speaker 1>magic's gone. There's no more voodoo inside Sam Altman's soul. It's

0:07:47.320 --> 0:07:51.440
<v Speaker 1>all out there. And the last point is extremely important

0:07:51.480 --> 0:07:54.480
<v Speaker 1>when it comes to OpenAI's reasoning model, which specifically

0:07:54.600 --> 0:07:57.080
<v Speaker 1>hid its chain of thought for fear of these unsafe

0:07:57.120 --> 0:08:00.200
<v Speaker 1>thoughts that might manipulate the customer. And then they add

0:08:00.280 --> 0:08:02.600
<v Speaker 1>slightly under their breath that the actual reasons they did

0:08:02.640 --> 0:08:05.720
<v Speaker 1>it was a competitive advantage. Now to explain what that means.

0:08:05.880 --> 0:08:09.640
<v Speaker 1>When you make a request with OpenAI's o1 model,

0:08:09.720 --> 0:08:11.720
<v Speaker 1>say, give me all the states with the letter R

0:08:11.840 --> 0:08:14.720
<v Speaker 1>in them, it actually shows you like the thinking. And

0:08:14.720 --> 0:08:16.880
<v Speaker 1>by the way, these things don't fucking think. They're they're

0:08:16.920 --> 0:08:19.880
<v Speaker 1>computer bullshit, like they don't think at all. But I'm

0:08:19.880 --> 0:08:22.320
<v Speaker 1>going to use it just for this so you see it.

0:08:22.360 --> 0:08:26.000
<v Speaker 1>Say okay, here are all the American states, which ones

0:08:26.040 --> 0:08:29.080
<v Speaker 1>have that letter? I'm checking all of those. It's effectively

0:08:29.120 --> 0:08:32.440
<v Speaker 1>having a large language model check a large language model. Now,

0:08:32.600 --> 0:08:35.280
<v Speaker 1>the thing is the steps they were showing you were

0:08:35.280 --> 0:08:37.560
<v Speaker 1>all cleaned up. They would look nice, they would be

0:08:37.600 --> 0:08:41.440
<v Speaker 1>formatted nicely. DeepSeek's chain of thought is completely laid bare,

0:08:42.080 --> 0:08:46.000
<v Speaker 1>which is very interesting because it really takes the wind

0:08:46.000 --> 0:08:48.800
<v Speaker 1>out of OpenAI's sails. And on top of that,

0:08:49.760 --> 0:08:52.320
<v Speaker 1>it allows you to see actually how these things think

0:08:52.400 --> 0:08:55.240
<v Speaker 1>through things, again not really thinking, but still you can

0:08:55.280 --> 0:08:57.959
<v Speaker 1>see things about how large language models work that these

0:08:57.960 --> 0:09:00.440
<v Speaker 1>companies didn't want you to have. On top of this,

0:09:00.840 --> 0:09:04.560
<v Speaker 1>OpenAI's o1 model has something even shittier to it,

0:09:04.600 --> 0:09:07.240
<v Speaker 1>which is these chain of thought things all cost money.

0:09:07.600 --> 0:09:10.880
<v Speaker 1>When you see it generate these thoughts, it's actually generating

0:09:10.920 --> 0:09:13.240
<v Speaker 1>more thoughts than you see because they're hiding the chain

0:09:13.280 --> 0:09:15.440
<v Speaker 1>of thought. So OpenAI is just charging you an

0:09:15.440 --> 0:09:18.200
<v Speaker 1>indeterminate amount of money, an insane amount of money, as

0:09:18.200 --> 0:09:21.360
<v Speaker 1>I'll get to later. But nevertheless, you don't know what

0:09:21.400 --> 0:09:23.920
<v Speaker 1>you're being charged for. You don't even know what's really

0:09:23.960 --> 0:09:26.720
<v Speaker 1>going on under the hood. Or you could use

0:09:26.760 --> 0:09:30.439
<v Speaker 1>DeepSeek. And let's be completely clear, by the way,

0:09:30.440 --> 0:09:34.319
<v Speaker 1>OpenAI's literal only competitive advantage against Meta and Anthropic was

0:09:34.400 --> 0:09:37.200
<v Speaker 1>its reasoning models, o1 and o3, and o3,

0:09:37.200 --> 0:09:38.839
<v Speaker 1>by the way, is currently in a research preview and

0:09:38.960 --> 0:09:41.920
<v Speaker 1>is mostly just more of the same. Although I mentioned

0:09:41.960 --> 0:09:44.480
<v Speaker 1>earlier in the show that Anthropic's Claude Sonnet three point

0:09:44.520 --> 0:09:48.480
<v Speaker 1>five has some reasoning features. They're comparatively more rudimentary than

0:09:48.520 --> 0:09:50.600
<v Speaker 1>those in o1 and o3, and I'd argue

0:09:50.679 --> 0:09:54.400
<v Speaker 1>R1, which is DeepSeek's model. In an AI context,

0:09:54.480 --> 0:09:56.839
<v Speaker 1>reasoning works by breaking down a prompt into a series

0:09:56.840 --> 0:10:00.480
<v Speaker 1>of different steps with considerations of different approaches. Like I

0:10:00.520 --> 0:10:03.439
<v Speaker 1>said earlier, effectively a large language model checking its own

0:10:03.480 --> 0:10:06.480
<v Speaker 1>homework with no thinking involved, because like I said, they

0:10:06.480 --> 0:10:09.520
<v Speaker 1>do not think or know things. And OpenAI rushed

0:10:09.559 --> 0:10:12.160
<v Speaker 1>to launch its o1 reasoning model last year because,

0:10:12.320 --> 0:10:15.720
<v Speaker 1>and I quote Fortune from last October, Sam Altman was

0:10:16.000 --> 0:10:19.320
<v Speaker 1>eager to prove to potential investors in the company's

0:10:19.400 --> 0:10:22.080
<v Speaker 1>latest funding round that OpenAI remains at the forefront

0:10:22.120 --> 0:10:25.480
<v Speaker 1>of AI development, and as I noted in my newsletter

0:10:25.520 --> 0:10:28.400
<v Speaker 1>at the time, it was not particularly reliable, failing to

0:10:28.440 --> 0:10:31.040
<v Speaker 1>accurately count the number of times the letter R appeared

0:10:31.040 --> 0:10:33.800
<v Speaker 1>in the word strawberry, which was the code name for

0:10:34.240 --> 0:10:38.080
<v Speaker 1>o1. Very funny stuff. At this point, it's fairly obvious

0:10:38.120 --> 0:10:41.400
<v Speaker 1>that OpenAI wasn't anywhere near the forefront of AI development,

0:10:41.640 --> 0:10:44.440
<v Speaker 1>and now that its competitive advantage is effectively gone, there

0:10:44.440 --> 0:10:47.000
<v Speaker 1>are genuine doubts about what comes next for the company.

0:10:48.280 --> 0:10:51.000
<v Speaker 1>As I'll go into, there are many questionable parts of

0:10:51.000 --> 0:10:53.960
<v Speaker 1>DeepSeek's story: its funding, what GPUs it has, and how

0:10:54.040 --> 0:10:56.720
<v Speaker 1>much it actually spent training these models. But what we

0:10:56.840 --> 0:11:00.680
<v Speaker 1>definitively understand to be true is bad for OpenAI,

0:11:00.880 --> 0:11:03.480
<v Speaker 1>and I would argue every other large US tech firm

0:11:03.480 --> 0:11:06.160
<v Speaker 1>that's jumped onto the generative AI bandwagon in the past

0:11:06.160 --> 0:11:20.200
<v Speaker 1>few years. DeepSeek's models actually exist. They work, at

0:11:20.280 --> 0:11:22.880
<v Speaker 1>least by the standards of hallucination-prone LLMs that don't,

0:11:22.920 --> 0:11:25.959
<v Speaker 1>at the risk of repeating myself, know anything. They've been

0:11:26.000 --> 0:11:29.680
<v Speaker 1>independently verified to be competitive in performance, and they're magnitudes

0:11:29.800 --> 0:11:34.400
<v Speaker 1>cheaper in price than those from both hyperscalers, Google's Gemini, Meta's Llama,

0:11:34.440 --> 0:11:36.560
<v Speaker 1>Amazon Q and so on and so forth, and from

0:11:36.600 --> 0:11:41.000
<v Speaker 1>those released by OpenAI and Anthropic. DeepSeek's models

0:11:41.040 --> 0:11:44.200
<v Speaker 1>don't require massive new data centers. They run on GPUs

0:11:44.240 --> 0:11:47.040
<v Speaker 1>currently used to run services like ChatGPT, and even

0:11:47.080 --> 0:11:50.000
<v Speaker 1>work on more austere hardware. Nor do they require an

0:11:50.120 --> 0:11:53.840
<v Speaker 1>endless supply of bigger, faster Nvidia GPUs every single year

0:11:53.880 --> 0:11:57.920
<v Speaker 1>to progress. The entire AI bubble was inflated based on

0:11:57.960 --> 0:12:00.600
<v Speaker 1>the premise that these models were simply impossible to build

0:12:00.600 --> 0:12:04.000
<v Speaker 1>without burning massive amounts of cash, straining the power grid,

0:12:04.000 --> 0:12:07.400
<v Speaker 1>and blowing past emissions goals, and that these costs were

0:12:07.400 --> 0:12:11.560
<v Speaker 1>both necessary and really good because they'd lead to creating

0:12:11.600 --> 0:12:15.400
<v Speaker 1>powerful AI, something that's yet to happen. And it's kind

0:12:15.400 --> 0:12:18.319
<v Speaker 1>of obvious at this point that that wasn't true. Now

0:12:18.360 --> 0:12:22.600
<v Speaker 1>the markets are sitting around there asking a very reasonable question, Shit,

0:12:22.760 --> 0:12:27.400
<v Speaker 1>did we just waste two hundred billion dollars? Anyway, let's

0:12:27.400 --> 0:12:30.720
<v Speaker 1>get into the nitty gritty. What is DeepSeek? First

0:12:30.760 --> 0:12:32.760
<v Speaker 1>of all, if you want to super deep dive into

0:12:32.800 --> 0:12:35.240
<v Speaker 1>what it is, I can't recommend VentureBeat's write-up enough.

0:12:35.280 --> 0:12:36.880
<v Speaker 1>I'll link to it in the show notes as I

0:12:36.960 --> 0:12:39.800
<v Speaker 1>usually do. It's really good and it goes into a

0:12:39.800 --> 0:12:42.120
<v Speaker 1>lot more detail than I will. But here's the too

0:12:42.200 --> 0:12:44.880
<v Speaker 1>long didn't read for you. DeepSeek is a spin

0:12:44.920 --> 0:12:47.520
<v Speaker 1>off from a Chinese hedge fund called High-Flyer Quant.

0:12:47.840 --> 0:12:50.079
<v Speaker 1>It's a relatively small and young company, and from its

0:12:50.120 --> 0:12:52.960
<v Speaker 1>inception it went big on algorithmic and AI driven trading.

0:12:53.320 --> 0:12:56.120
<v Speaker 1>Later it started building its own standalone chat bots, including

0:12:56.120 --> 0:12:59.440
<v Speaker 1>a ChatGPT equivalent for the Chinese market. This is

0:12:59.559 --> 0:13:01.760
<v Speaker 1>what we know, right? Now, I'm sure some of you

0:13:01.800 --> 0:13:05.080
<v Speaker 1>will say, oh, well, who knows if that's really true. Sure,

0:13:05.520 --> 0:13:07.760
<v Speaker 1>I think that that's fair. I also think that there

0:13:07.760 --> 0:13:09.880
<v Speaker 1>are parts of Sam Altman's legend that we should question

0:13:09.960 --> 0:13:13.280
<v Speaker 1>as well. I think the circumstances under which Sam Altman

0:13:13.360 --> 0:13:16.880
<v Speaker 1>got made head of Y Combinator are extremely questionable. I'm

0:13:16.920 --> 0:13:19.240
<v Speaker 1>saying you can question DeepSeek, and indeed you should.

0:13:19.240 --> 0:13:21.920
<v Speaker 1>We should be more critical of these powerful companies, but

0:13:22.040 --> 0:13:24.520
<v Speaker 1>don't do it halfway. If we're going to be worried,

0:13:24.600 --> 0:13:28.360
<v Speaker 1>let's be worried about everyone. Now, DeepSeek did a few

0:13:28.360 --> 0:13:31.200
<v Speaker 1>things differently, like open sourcing its models, although it likely

0:13:31.240 --> 0:13:34.800
<v Speaker 1>built upon tech from other companies, like Meta's Llama and the

0:13:35.160 --> 0:13:38.680
<v Speaker 1>ML library PyTorch. To train its models, it secured over

0:13:38.760 --> 0:13:43.160
<v Speaker 1>ten thousand Nvidia GPUs right before the US imposed export restrictions,

0:13:43.160 --> 0:13:45.240
<v Speaker 1>which sounds like a lot, but it's a fraction of

0:13:45.240 --> 0:13:47.320
<v Speaker 1>what the big AI labs like Google, OpenAI, and

0:13:47.360 --> 0:13:50.480
<v Speaker 1>Anthropic have to play with. I think I've heard estimates

0:13:50.520 --> 0:13:53.120
<v Speaker 1>of like one hundred thousand to three hundred thousand each,

0:13:53.200 --> 0:13:56.199
<v Speaker 1>if not more. Now you've likely seen or heard that

0:13:56.280 --> 0:13:59.080
<v Speaker 1>DeepSeek trained its latest model for five point six

0:13:59.120 --> 0:14:01.520
<v Speaker 1>million dollars, as opposed to the insane amounts that I'll

0:14:01.520 --> 0:14:03.640
<v Speaker 1>get to later, and I want to be clear that

0:14:03.840 --> 0:14:06.760
<v Speaker 1>any and all mentions of this number are estimates. In fact,

0:14:06.800 --> 0:14:09.600
<v Speaker 1>the provenance of the five point five eight million

0:14:09.679 --> 0:14:12.000
<v Speaker 1>dollar number appears to be a citation of a post

0:14:12.040 --> 0:14:15.080
<v Speaker 1>made by an Nvidia engineer in an article from the

0:14:15.120 --> 0:14:18.199
<v Speaker 1>South China Morning Post, which links to another article from

0:14:18.240 --> 0:14:21.040
<v Speaker 1>the South China Morning Post, which simply states that

0:14:21.080 --> 0:14:23.480
<v Speaker 1>DeepSeek V3 comes with six hundred and seventy one

0:14:23.480 --> 0:14:25.880
<v Speaker 1>billion parameters and was trained in around two months at

0:14:25.880 --> 0:14:28.400
<v Speaker 1>the cost of five point five eight million dollars with

0:14:28.480 --> 0:14:31.640
<v Speaker 1>no additional citations of any kind. So you should take

0:14:31.640 --> 0:14:36.320
<v Speaker 1>it with a pinch of salt. But it's not totally ludicrous. Well,

0:14:36.360 --> 0:14:38.920
<v Speaker 1>there are some that have estimated the cost. DeepSeek's

0:14:39.000 --> 0:14:41.840
<v Speaker 1>V3 model was allegedly trained using two thousand and forty

0:14:41.880 --> 0:14:45.440
<v Speaker 1>eight Nvidia H800 GPUs, according to its paper,

0:14:46.000 --> 0:14:48.840
<v Speaker 1>and Ben Thompson of Stratechery has made this clear that

0:14:48.880 --> 0:14:51.440
<v Speaker 1>the five point five million dollar number only covers the

0:14:51.480 --> 0:14:54.520
<v Speaker 1>literal training cost of the official training run, and this

0:14:54.640 --> 0:14:56.400
<v Speaker 1>is made fairly clear in the paper by the way

0:14:56.520 --> 0:14:59.080
<v Speaker 1>of V three, and that's the one that's competitive with

0:14:59.200 --> 0:15:02.400
<v Speaker 1>OpenAI's GPT-4o model, meaning that any costs

0:15:02.440 --> 0:15:04.680
<v Speaker 1>related to prior research or experiments on how to build

0:15:04.680 --> 0:15:07.800
<v Speaker 1>the model were left out. Now, big, big shout-out to

0:15:07.800 --> 0:15:10.400
<v Speaker 1>Minimaxir, the guy on Bluesky and Twitter, he's great.

0:15:10.960 --> 0:15:13.200
<v Speaker 1>He is wonderful, and also added that this is fairly

0:15:13.200 --> 0:15:16.240
<v Speaker 1>standard for the industry. Again, you choose how you feel

0:15:16.240 --> 0:15:17.840
<v Speaker 1>about this, but I want to give you the information.
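
NOTE
A rough sanity check of the quoted training figures, not something from the episode. It takes the two thousand and forty eight GPUs and the five point five eight million dollar cost as quoted, and assumes that "around two months" means sixty days. The variable names are illustrative, and the result covers only the final training run.
```python
GPUS = 2048               # H800 GPUs, as quoted from the paper
DAYS = 60                 # assumption: "around two months" read as 60 days
QUOTED_COST = 5_580_000   # dollars, the quoted cost of the training run
# Total GPU hours if every GPU ran around the clock for the whole period.
gpu_hours = GPUS * DAYS * 24
# Price per GPU hour that the quoted cost would imply under that assumption.
implied_rate = QUOTED_COST / gpu_hours
print(gpu_hours, round(implied_rate, 2))
```
Under those assumptions that is about 2.9 million GPU hours at a little under two dollars per GPU hour, so the headline number is at least internally consistent, whatever was spent before the final run.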

0:15:19.080 --> 0:15:21.680
<v Speaker 1>And while it's safe to say that DeepSeek's models

0:15:21.680 --> 0:15:24.600
<v Speaker 1>are cheaper to train, the actual costs, especially as

0:15:24.600 --> 0:15:27.040
<v Speaker 1>DeepSeek doesn't share its training data, which some might argue

0:15:27.040 --> 0:15:29.440
<v Speaker 1>means its models are not really open source. As I said,

0:15:30.560 --> 0:15:33.400
<v Speaker 1>the numbers get a little harder to guess at. Thompson

0:15:33.440 --> 0:15:35.160
<v Speaker 1>notes that DeepSeek had to craft a bunch of

0:15:35.160 --> 0:15:38.560
<v Speaker 1>elegant workarounds to make the model perform, including writing code

0:15:38.560 --> 0:15:41.600
<v Speaker 1>that ultimately changed how GPUs actually communicated with each other.

0:15:41.960 --> 0:15:45.880
<v Speaker 1>This functionality isn't otherwise possible using Nvidia's developer tools. They

0:15:46.000 --> 0:15:47.760
<v Speaker 1>really had to get in there. It's kind of cool.

0:15:48.160 --> 0:15:50.720
<v Speaker 1>DeepSeek's models, V3 and R1, are more

0:15:50.760 --> 0:15:53.160
<v Speaker 1>efficient and as a result, cheaper to run, and can

0:15:53.200 --> 0:15:56.560
<v Speaker 1>be accessed via its API at prices that are astronomically

0:15:56.640 --> 0:16:00.240
<v Speaker 1>cheaper than OpenAI's. DeepSeek Chat, running DeepSeek's

0:16:00.360 --> 0:16:03.960
<v Speaker 1>GPT-4o-competitive V3 model, costs seven

0:16:04.040 --> 0:16:07.640
<v Speaker 1>cents per one million input tokens, as in

0:16:07.680 --> 0:16:11.080
<v Speaker 1>commands given to the model, and one dollar ten

0:16:11.480 --> 0:16:14.520
<v Speaker 1>per one million output tokens as in the resulting output

0:16:14.560 --> 0:16:16.800
<v Speaker 1>from the model. I know that these numbers kind of

0:16:16.840 --> 0:16:19.200
<v Speaker 1>like, just sound like numbers. Like, maybe you don't

0:16:19.240 --> 0:16:21.160
<v Speaker 1>have context, so let me give you some. This is

0:16:21.200 --> 0:16:24.440
<v Speaker 1>a dramatic price drop from the two dollars fifty cents

0:16:24.480 --> 0:16:28.040
<v Speaker 1>per one million input tokens and ten dollars per one

0:16:28.080 --> 0:16:32.520
<v Speaker 1>million output tokens that OpenAI charges for GPT-4o.
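
NOTE
A rough sketch of the arithmetic behind this comparison, not something from the episode. It assumes the DeepSeek V3 prices are seven cents and one dollar ten per million tokens and the GPT-4o prices are two dollars fifty and ten dollars. The model labels and the example workload are made up for illustration.
```python
# Prices in US dollars per one million tokens, as quoted in the episode.
PRICES = {
    "deepseek-v3": {"input": 0.07, "output": 1.10},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}
def cost(model, input_tokens, output_tokens):
    """Dollar cost of one workload at the quoted per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
def ratio(kind):
    """How many times more GPT-4o charges than DeepSeek V3 for one token kind."""
    return PRICES["gpt-4o"][kind] / PRICES["deepseek-v3"][kind]
# Hypothetical workload: ten million input tokens and two million output tokens.
for model in PRICES:
    print(model, round(cost(model, 10_000_000, 2_000_000), 2))
print("input ratio:", round(ratio("input"), 1), "output ratio:", round(ratio("output"), 1))
```
At these prices GPT-4o's input tokens cost roughly thirty six times more and its output tokens roughly nine times more, so the overall gap depends on the mix of input and output in a given workload.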

0:16:33.200 --> 0:16:39.400
<v Speaker 1>This isn't just undercutting, this is a bunker buster. Now,

0:16:39.520 --> 0:16:41.560
<v Speaker 1>there is a side that I'll kind of get into

0:16:41.560 --> 0:16:44.160
<v Speaker 1>a little bit later, in that you are using models

0:16:44.160 --> 0:16:46.440
<v Speaker 1>hosted in a country that you don't know, probably China.

0:16:46.760 --> 0:16:49.920
<v Speaker 1>There are data concerns. But again, you can put this

0:16:50.040 --> 0:16:52.800
<v Speaker 1>on your own server. You could put this in Google Cloud.

0:16:52.880 --> 0:16:55.880
<v Speaker 1>Both Microsoft and Google are apparently thinking about it now.

0:16:55.880 --> 0:16:58.560
<v Speaker 1>The Information reported that Google had added it to Google Cloud.

0:16:58.720 --> 0:17:01.520
<v Speaker 1>No they did not. They didn't do that. They allowed

0:17:01.520 --> 0:17:03.840
<v Speaker 1>you to connect Hugging Face. This is a whole bunch

0:17:03.840 --> 0:17:06.159
<v Speaker 1>of technical stuff that if you understand, you'll be like, yeah, Ed,

0:17:06.240 --> 0:17:10.639
<v Speaker 1>I know. Long story short, the hyperscalers are already bringing

0:17:10.680 --> 0:17:13.920
<v Speaker 1>DeepSeek out, and I'll get to why that's bad

0:17:14.200 --> 0:17:17.480
<v Speaker 1>later in detail. But it's also very funny. Now here's

0:17:17.520 --> 0:17:20.680
<v Speaker 1>something else that's funny. DeepSeek Reasoner, its reasoning model,

0:17:20.760 --> 0:17:23.600
<v Speaker 1>costs fifty-five cents per one million input tokens

0:17:23.680 --> 0:17:27.160
<v Speaker 1>and two dollars and nineteen cents per one million output tokens.

0:17:27.359 --> 0:17:31.360
<v Speaker 1>Now that sounds expensive. Maybe it is. Whatever, that's goddamn

0:17:31.480 --> 0:17:34.760
<v Speaker 1>nothing compared to the fifteen dollars per one million input

0:17:34.840 --> 0:17:37.600
<v Speaker 1>tokens and sixty dollars per one million output tokens of

0:17:37.640 --> 0:17:41.960
<v Speaker 1>OpenAI. Woof. If I'm Sam Altman, I'm shitting myself.

0:17:43.560 --> 0:17:45.800
<v Speaker 1>But there's an obvious "but" here. We do not know

0:17:45.840 --> 0:17:48.560
<v Speaker 1>where DeepSeek is hosting its models, who has access

0:17:48.560 --> 0:17:50.640
<v Speaker 1>to that data, or where that data is coming from

0:17:50.760 --> 0:17:52.960
<v Speaker 1>or going to. We don't know who funds deep Seek

0:17:53.040 --> 0:17:55.240
<v Speaker 1>other than it's connected to High-Flyer, the hedge fund

0:17:55.240 --> 0:17:57.320
<v Speaker 1>that I mentioned earlier, which it split from in twenty

0:17:57.359 --> 0:17:59.760
<v Speaker 1>twenty-three. There are concerns that DeepSeek could be

0:17:59.760 --> 0:18:02.200
<v Speaker 1>state-funded, and that DeepSeek's low prices are a

0:18:02.280 --> 0:18:05.000
<v Speaker 1>kind of geopolitical weapon breaking the back of the generative

0:18:05.000 --> 0:18:08.440
<v Speaker 1>AI industry in America. I'm not really sure whether that's

0:18:08.480 --> 0:18:11.080
<v Speaker 1>the case or not. It's certainly true that China has

0:18:11.119 --> 0:18:13.720
<v Speaker 1>long treated AI as a strategic part of its national

0:18:13.760 --> 0:18:16.840
<v Speaker 1>industrial policy and is reported to help companies in sectors

0:18:16.840 --> 0:18:18.800
<v Speaker 1>where it wants to catch up with the Western world.

0:18:19.480 --> 0:18:21.879
<v Speaker 1>The Made in China twenty twenty-five initiative saw a

0:18:21.880 --> 0:18:25.399
<v Speaker 1>reported hundreds of billions of dollars provided to Chinese firms

0:18:25.440 --> 0:18:28.960
<v Speaker 1>working in industries like chip making, aviation, and, yeah, AI.

0:18:29.400 --> 0:18:32.760
<v Speaker 1>The extent of that support isn't exactly transparent, surprise, surprise,

0:18:33.000 --> 0:18:34.760
<v Speaker 1>and so it's not entirely out of the realm of

0:18:34.760 --> 0:18:37.800
<v Speaker 1>possibility that DeepSeek is also the recipient of state aid.

0:18:38.240 --> 0:18:39.760
<v Speaker 1>The good news is that we're going to find out

0:18:39.840 --> 0:18:43.720
<v Speaker 1>fairly quickly. American AI infrastructure company Groq is already bringing

0:18:43.760 --> 0:18:46.680
<v Speaker 1>DeepSeek's models online, meaning that we'll get at least

0:18:46.720 --> 0:18:49.760
<v Speaker 1>some sort of confirmation of whether these prices

0:18:49.760 --> 0:18:52.520
<v Speaker 1>are realistic or whether they're heavily subsidized by whoever it

0:18:52.560 --> 0:18:55.080
<v Speaker 1>is that backs DeepSeek. It's also true that

0:18:55.080 --> 0:18:57.280
<v Speaker 1>DeepSeek is owned in part by a hedge fund, which

0:18:57.359 --> 0:19:00.479
<v Speaker 1>likely isn't short of cash to pump into them. But

0:19:00.520 --> 0:19:03.439
<v Speaker 1>as an aside, given that OpenAI is the

0:19:03.520 --> 0:19:07.199
<v Speaker 1>beneficiary of billions of dollars of cloud compute credits and

0:19:07.240 --> 0:19:10.600
<v Speaker 1>gets reduced pricing for Microsoft's Azure cloud services to run

0:19:10.640 --> 0:19:13.560
<v Speaker 1>its actual models, it's a bit tough for them to

0:19:13.600 --> 0:19:16.439
<v Speaker 1>complain about a rival being subsidized by a larger entity with

0:19:16.480 --> 0:19:18.960
<v Speaker 1>the ability to absorb the costs of doing business should

0:19:19.040 --> 0:19:21.560
<v Speaker 1>that be the case. Same goes for Anthropic, by the way,

0:19:21.920 --> 0:19:24.359
<v Speaker 1>and yes, I know Microsoft isn't a state, but with

0:19:24.400 --> 0:19:26.960
<v Speaker 1>a market cap of three point two trillion dollars and

0:19:27.040 --> 0:19:30.320
<v Speaker 1>quarterly revenues larger than the combined GDPs of some EU

0:19:30.400 --> 0:19:33.000
<v Speaker 1>and NATO nations, it's kind of the next best thing.

0:19:33.640 --> 0:19:36.560
<v Speaker 1>But I digress. Whatever concerns there may be about malign

0:19:36.680 --> 0:19:40.000
<v Speaker 1>Chinese influence are bordering on irrelevant, outside of the low prices,

0:19:40.040 --> 0:19:43.080
<v Speaker 1>of course, offered by DeepSeek itself, and even that is

0:19:43.080 --> 0:19:46.080
<v Speaker 1>speculative at this point. Once these models are hosted elsewhere,

0:19:46.119 --> 0:19:48.240
<v Speaker 1>and once DeepSeek's methods, which I'll get to in

0:19:48.280 --> 0:19:50.760
<v Speaker 1>a little bit, are recreated, and by the way, that's

0:19:50.800 --> 0:19:52.840
<v Speaker 1>not really going to take very long. I believe we're

0:19:52.840 --> 0:19:54.880
<v Speaker 1>going to see that these prices are indicative of how

0:19:54.960 --> 0:20:11.280
<v Speaker 1>cheap these models are to run. So you might be wondering,

0:20:11.359 --> 0:20:13.480
<v Speaker 1>how the hell is this so much cheaper? And that's

0:20:13.480 --> 0:20:15.639
<v Speaker 1>a bloody good question. And because I'm me, I have

0:20:15.680 --> 0:20:19.520
<v Speaker 1>a hypothesis. I do not believe that the companies making

0:20:19.600 --> 0:20:22.520
<v Speaker 1>these foundation models, such as OpenAI and Anthropic, have

0:20:22.600 --> 0:20:25.639
<v Speaker 1>actually been incentivized to do more with less. And because

0:20:25.680 --> 0:20:29.359
<v Speaker 1>their chummy little relationships with hyperscalers like Amazon, Google and

0:20:29.400 --> 0:20:33.040
<v Speaker 1>Microsoft were focused almost entirely on making the biggest, most

0:20:33.119 --> 0:20:37.240
<v Speaker 1>hugest models possible, using the biggest, even huger chips. And

0:20:37.280 --> 0:20:39.960
<v Speaker 1>because the absence of profitability didn't stop them from raising

0:20:40.000 --> 0:20:43.200
<v Speaker 1>more money. Well, they've never had to be fucking efficient,

0:20:43.320 --> 0:20:46.520
<v Speaker 1>have they. They've never had to try. Maybe they should

0:20:46.520 --> 0:20:50.359
<v Speaker 1>buy less avocado fucking toast. Anyway, let me put it

0:20:50.359 --> 0:20:53.960
<v Speaker 1>in simpler terms. Imagine living on fifteen hundred dollars a month,

0:20:54.040 --> 0:20:55.639
<v Speaker 1>and then imagine how you'd live on one hundred and

0:20:55.680 --> 0:20:57.800
<v Speaker 1>fifty thousand dollars a month, and that you have to,

0:20:58.160 --> 0:21:00.479
<v Speaker 1>like Brewster's Millions, spend as much of it as

0:21:00.480 --> 0:21:04.240
<v Speaker 1>you can to complete a mission, a very simple mission. Live.

0:21:05.240 --> 0:21:08.320
<v Speaker 1>In the former example, your concern is survival. You have a

0:21:08.359 --> 0:21:10.280
<v Speaker 1>limited amount of money and must make it go as

0:21:10.280 --> 0:21:12.639
<v Speaker 1>far as possible, with real sacrifices to be made with

0:21:12.680 --> 0:21:14.880
<v Speaker 1>every dollar you spend. If you want to have fun,

0:21:15.080 --> 0:21:17.199
<v Speaker 1>you're going to have to eat less. Potentially all the

0:21:17.240 --> 0:21:19.240
<v Speaker 1>food you eat will have to be cheaper. You have

0:21:19.280 --> 0:21:21.640
<v Speaker 1>to live on a budget. You have to make decisions,

0:21:21.680 --> 0:21:24.399
<v Speaker 1>and indeed you might learn to cook at home. You

0:21:24.480 --> 0:21:27.520
<v Speaker 1>might walk more, you might do things that will help

0:21:27.560 --> 0:21:30.800
<v Speaker 1>you not spend all your money. In the latter example,

0:21:30.880 --> 0:21:32.720
<v Speaker 1>where you have one hundred and fifty thousand dollars a

0:21:32.760 --> 0:21:35.720
<v Speaker 1>month that you must spend, you're incentivized to splurge, to

0:21:35.800 --> 0:21:39.359
<v Speaker 1>lean into excess to pursue this vague idea of living

0:21:39.400 --> 0:21:43.159
<v Speaker 1>your life. Your actions are dictated not by any existential threats,

0:21:43.240 --> 0:21:45.800
<v Speaker 1>or indeed any kind of future planning, but by whatever

0:21:45.840 --> 0:21:49.600
<v Speaker 1>you perceive to be an opportunity to live. OpenAI

0:21:49.720 --> 0:21:53.000
<v Speaker 1>and Anthropic are emblematic of what happens when survival takes

0:21:53.000 --> 0:21:56.240
<v Speaker 1>a back seat to living. They have been incentivized by

0:21:56.280 --> 0:21:59.600
<v Speaker 1>frothy venture capital and public markets desperate for the next

0:21:59.600 --> 0:22:02.600
<v Speaker 1>big thing, the next big growth, to build bigger

0:22:02.600 --> 0:22:05.480
<v Speaker 1>models and sell even bigger dreams. Like Dario Amodei of

0:22:05.480 --> 0:22:08.800
<v Speaker 1>Anthropic saying that AI, and I quote, could surpass

0:22:08.840 --> 0:22:12.800
<v Speaker 1>almost all human beings at almost everything shortly after twenty

0:22:12.960 --> 0:22:16.000
<v Speaker 1>twenty-seven. I just want to take a fucking second. Journalists,

0:22:16.040 --> 0:22:18.720
<v Speaker 1>if you're listening to this, stop fucking quoting this bullshit.

0:22:19.440 --> 0:22:22.800
<v Speaker 1>Stop it. You're doing nothing. You are failing at your

0:22:22.840 --> 0:22:26.840
<v Speaker 1>goddamn job every single time you quote this bullshit, this nonsense.

0:22:27.119 --> 0:22:29.800
<v Speaker 1>Shortly after twenty twenty seven. What the fuck does that mean?

0:22:29.840 --> 0:22:33.640
<v Speaker 1>Twenty twenty eight, twenty twenty nine, twenty thirty, what does

0:22:34.000 --> 0:22:38.760
<v Speaker 1>surpassing humans at almost everything even mean? This shit doesn't work.

0:22:38.840 --> 0:22:42.040
<v Speaker 1>This shit is not good. Oh my god. Anyway, back

0:22:42.080 --> 0:22:45.399
<v Speaker 1>to the podcast. Calm down. Both OpenAI and

0:22:45.440 --> 0:22:48.280
<v Speaker 1>Anthropic have effectively lived their existence with the infinite money

0:22:48.320 --> 0:22:50.320
<v Speaker 1>cheat from The Sims. And I know some of you

0:22:50.440 --> 0:22:52.120
<v Speaker 1>might say, by the way, it's not infinite money,

0:22:52.119 --> 0:22:54.440
<v Speaker 1>you just go into the console. You get

0:22:54.440 --> 0:22:57.199
<v Speaker 1>my point. And both companies have been bleeding billions of

0:22:57.200 --> 0:22:59.760
<v Speaker 1>dollars a year after revenue, and that's, by the way,

0:23:00.040 --> 0:23:03.080
<v Speaker 1>making billions of dollars and then still losing billions is insane,

0:23:03.480 --> 0:23:06.200
<v Speaker 1>and they still operated as if money would never run

0:23:06.200 --> 0:23:09.560
<v Speaker 1>out, because it wouldn't. If they were actually

0:23:09.560 --> 0:23:11.919
<v Speaker 1>worried about that happening, they would have certainly tried to

0:23:11.920 --> 0:23:14.439
<v Speaker 1>do what DeepSeek has done, except they didn't have

0:23:14.560 --> 0:23:16.720
<v Speaker 1>to because both of them had the endless cash and

0:23:16.760 --> 0:23:20.720
<v Speaker 1>access to GPUs from either Microsoft, Amazon or Google. And

0:23:21.000 --> 0:23:23.480
<v Speaker 1>the Stargate thing is just, I will mention it later,

0:23:23.680 --> 0:23:26.280
<v Speaker 1>just long story short. They're not going to put five

0:23:26.359 --> 0:23:29.000
<v Speaker 1>hundred billion dollars into the... it was up to five

0:23:29.040 --> 0:23:32.800
<v Speaker 1>hundred bill... I'm so tired of this shit. OpenAI and

0:23:32.800 --> 0:23:35.359
<v Speaker 1>Anthropic have never been made to sweat, unlike me in

0:23:35.400 --> 0:23:38.320
<v Speaker 1>this closet where I'm recording this. And they've received endless

0:23:38.320 --> 0:23:40.600
<v Speaker 1>amounts of free marketing from a tech and business media

0:23:40.640 --> 0:23:44.320
<v Speaker 1>happy to print whatever vapid bullshit they spout, and it's

0:23:44.400 --> 0:23:48.080
<v Speaker 1>just very frustrating. They've raised money at will, with Anthropic,

0:23:48.119 --> 0:23:50.560
<v Speaker 1>by the way, is currently raising another two billion dollars,

0:23:50.680 --> 0:23:52.840
<v Speaker 1>valuing the company at sixty billion dollars. And this was

0:23:52.920 --> 0:23:55.600
<v Speaker 1>I think happening while DeepSeek was going on, which

0:23:55.640 --> 0:23:58.040
<v Speaker 1>is really funny. And they've done all of this off

0:23:58.040 --> 0:24:00.800
<v Speaker 1>of a narrative of: we need more money than

0:24:00.800 --> 0:24:04.080
<v Speaker 1>any company has ever needed, ever, because the things we're

0:24:04.080 --> 0:24:08.800
<v Speaker 1>doing have to cost this much. There is no other way.

0:24:09.000 --> 0:24:12.159
<v Speaker 1>You must give us more money. My name is Sam Altman.

0:24:12.200 --> 0:24:14.640
<v Speaker 1>I need more money than has ever been made from

0:24:14.680 --> 0:24:17.320
<v Speaker 1>my huge, beautiful company that sucks and needs money to

0:24:17.359 --> 0:24:20.440
<v Speaker 1>train it. Help me, please, My big, beautiful sick company

0:24:20.480 --> 0:24:22.520
<v Speaker 1>is dying, but the best and most important company of

0:24:22.520 --> 0:24:28.119
<v Speaker 1>all time. It's also normal. Now, do I think that

0:24:28.200 --> 0:24:30.399
<v Speaker 1>they were aware that there were methods to make their

0:24:30.440 --> 0:24:34.280
<v Speaker 1>models more efficient? Sure. OpenAI tried and failed in

0:24:34.320 --> 0:24:36.560
<v Speaker 1>twenty twenty three to deliver a more efficient model to

0:24:36.600 --> 0:24:42.600
<v Speaker 1>Microsoft called Arrakis. I'm sure there are teams at both

0:24:42.600 --> 0:24:45.920
<v Speaker 1>Anthropic and OpenAI that are specifically dedicated to making things

0:24:46.040 --> 0:24:48.560
<v Speaker 1>kind of more efficient. But they didn't have to do it,

0:24:48.600 --> 0:24:51.639
<v Speaker 1>and so they didn't. And as I've written before in

0:24:51.680 --> 0:24:54.400
<v Speaker 1>my newsletter and argued on this very podcast, OpenAI

0:24:54.520 --> 0:24:56.880
<v Speaker 1>simply burns money and has been allowed to burn money,

0:24:56.880 --> 0:24:58.879
<v Speaker 1>and up until recently likely would have been allowed to

0:24:58.880 --> 0:25:02.040
<v Speaker 1>burn even more money because everybody, all of the American

0:25:02.080 --> 0:25:04.639
<v Speaker 1>model developers, appeared to agree that the only way to

0:25:04.640 --> 0:25:07.280
<v Speaker 1>develop large language models was to make them as big

0:25:07.400 --> 0:25:10.840
<v Speaker 1>as humanly possible and work out troublesome stuff like making

0:25:10.840 --> 0:25:14.240
<v Speaker 1>them profitable or turning them into a useful thing later,

0:25:14.560 --> 0:25:17.840
<v Speaker 1>which is I presume when AGI happens, a thing that

0:25:17.840 --> 0:25:20.679
<v Speaker 1>they're still in the process of defining, let alone doing.

0:25:21.760 --> 0:25:23.640
<v Speaker 1>Deep Seek, on the other hand, had to work out

0:25:23.640 --> 0:25:25.600
<v Speaker 1>a way to make its own large language models within

0:25:25.640 --> 0:25:28.000
<v Speaker 1>the constraints of the hamstrung Nvidia chips that can

0:25:28.040 --> 0:25:31.080
<v Speaker 1>be legally sold to China. While there's a whole cottage

0:25:31.119 --> 0:25:34.160
<v Speaker 1>industry of selling chips in China using resellers and other

0:25:34.200 --> 0:25:37.280
<v Speaker 1>parties to get restricted silicon into the country, the entire

0:25:37.320 --> 0:25:40.040
<v Speaker 1>way in which DeepSeek went about developing its models

0:25:40.160 --> 0:25:44.240
<v Speaker 1>suggests that it was working around very specific memory bandwidth constraints,

0:25:44.560 --> 0:25:46.320
<v Speaker 1>meaning the amount of data that could be fed

0:25:46.320 --> 0:25:48.640
<v Speaker 1>into and out of the chips.

0:25:48.680 --> 0:25:51.720
<v Speaker 1>In essence, doing more with less wasn't something it chose,

0:25:51.720 --> 0:25:55.000
<v Speaker 1>but it's something they had to do. I've touched already

0:25:55.000 --> 0:25:57.160
<v Speaker 1>on the technical how of these models in greater depth,

0:25:57.200 --> 0:25:59.200
<v Speaker 1>and you can really read into that in my newsletter,

0:25:59.240 --> 0:26:01.359
<v Speaker 1>and you can go to where's your ed dot at.

0:26:01.480 --> 0:26:03.200
<v Speaker 1>It's at the end of the episode. But I'll

0:26:03.200 --> 0:26:05.560
<v Speaker 1>also have show notes to articles like Ben Thompson's

0:26:05.520 --> 0:26:08.960
<v Speaker 1>on Stratechery, because there are lots of things to read here.

0:26:09.000 --> 0:26:11.160
<v Speaker 1>I know there are some really technical listeners, and I'm

0:26:11.160 --> 0:26:13.800
<v Speaker 1>sure you're gonna flame me in my emails. Please go

0:26:13.840 --> 0:26:16.080
<v Speaker 1>and read it. I'm not wrong. I've checked with a

0:26:16.080 --> 0:26:18.920
<v Speaker 1>lot of people too, and by the way, all of

0:26:18.920 --> 0:26:22.399
<v Speaker 1>this austerity stuff seems to have worked. There's also the

0:26:22.440 --> 0:26:26.840
<v Speaker 1>training data situation and another mea culpa. I've previously discussed

0:26:26.880 --> 0:26:29.760
<v Speaker 1>the concept of model collapse and how feeding synthetic data,

0:26:29.800 --> 0:26:32.639
<v Speaker 1>which is training data created by a generative model, into

0:26:32.680 --> 0:26:35.440
<v Speaker 1>another model, could end up teaching it bad habits, which

0:26:35.440 --> 0:26:37.800
<v Speaker 1>in turn would destroy the model. But it seems that

0:26:37.840 --> 0:26:41.240
<v Speaker 1>DeepSeek has succeeded in training its models using generative data

0:26:41.760 --> 0:26:45.919
<v Speaker 1>specifically, though, and I'm quoting GeekWire's Jon Turow, for subjects like mathematics

0:26:45.960 --> 0:26:49.000
<v Speaker 1>where correctness is unambiguous, and using, and I quote again,

0:26:49.240 --> 0:26:52.640
<v Speaker 1>highly efficient reward functions that could identify which new

0:26:52.680 --> 0:26:55.959
<v Speaker 1>training examples would actually improve the model, avoiding wasted compute

0:26:55.960 --> 0:26:59.000
<v Speaker 1>on redundant data, and it seems to have worked. Though

0:26:59.040 --> 0:27:02.080
<v Speaker 1>model collapse may still be a possibility. This approach, an extremely

0:27:02.119 --> 0:27:04.720
<v Speaker 1>precise use of synthetic data, is in line with some

0:27:04.760 --> 0:27:07.399
<v Speaker 1>of the defenses against model collapse I've heard from LLM

0:27:07.440 --> 0:27:10.600
<v Speaker 1>developers I've talked to. This is also a situation where

0:27:10.640 --> 0:27:13.440
<v Speaker 1>we don't know the exact training data, and it doesn't

0:27:13.480 --> 0:27:16.320
<v Speaker 1>negate any of the previous points I've made about model collapse.

0:27:17.119 --> 0:27:20.520
<v Speaker 1>Now we'll see what happens there. But synthetic data might

0:27:20.560 --> 0:27:22.359
<v Speaker 1>work where the output is something that you could figure

0:27:22.359 --> 0:27:24.800
<v Speaker 1>out using a calculator. But when you get into anything

0:27:24.840 --> 0:27:26.840
<v Speaker 1>a bit more fuzzy, like written text or anything with

0:27:26.880 --> 0:27:30.680
<v Speaker 1>an element of analysis, you'll likely encounter some unhappy side effects.

0:27:30.840 --> 0:27:32.760
<v Speaker 1>But I don't know if that's really going to change

0:27:32.760 --> 0:27:35.679
<v Speaker 1>how good these things are. There's also a little scuttlebutt

0:27:35.680 --> 0:27:38.840
<v Speaker 1>about where DeepSeek got its data. Ben Thompson,

0:27:38.880 --> 0:27:42.080
<v Speaker 1>of Stratechery, suggests that DeepSeek's models are potentially distilling

0:27:42.160 --> 0:27:45.040
<v Speaker 1>other models' outputs, by which I mean having another model,

0:27:45.080 --> 0:27:48.520
<v Speaker 1>say Meta's Llama or OpenAI's GPT-4o, which

0:27:48.560 --> 0:27:51.119
<v Speaker 1>is why DeepSeek identified itself as ChatGPT at

0:27:51.160 --> 0:27:54.240
<v Speaker 1>one point, spit out outputs specifically to train parts of

0:27:54.240 --> 0:27:57.600
<v Speaker 1>DeepSeek. This obviously violates the terms of service of

0:27:57.640 --> 0:28:00.280
<v Speaker 1>these tools, as OpenAI and its rivals would much rather

0:28:00.400 --> 0:28:03.240
<v Speaker 1>you not use their technology to create their next rival.

0:28:03.800 --> 0:28:07.480
<v Speaker 1>And OpenAI, by the way, has recently reportedly found

0:28:07.480 --> 0:28:10.880
<v Speaker 1>evidence that DeepSeek used OpenAI's models to train

0:28:10.960 --> 0:28:14.160
<v Speaker 1>its rival. And this is from the Financial Times, although

0:28:14.200 --> 0:28:16.800
<v Speaker 1>it failed to make any formal allegations, but it did

0:28:16.880 --> 0:28:19.520
<v Speaker 1>say that using ChatGPT to train a competing model

0:28:19.640 --> 0:28:22.920
<v Speaker 1>violates its terms of service, and David Sacks, the investor

0:28:22.920 --> 0:28:25.920
<v Speaker 1>and Trump administration AI and crypto czar, says it's possible

0:28:25.960 --> 0:28:29.320
<v Speaker 1>that this occurred, although he failed to provide evidence. I

0:28:29.440 --> 0:28:31.760
<v Speaker 1>just want to say, how fucking funny it is that

0:28:31.920 --> 0:28:36.000
<v Speaker 1>OpenAI is going, wah, wah, you're stealing my stuff!

0:28:36.040 --> 0:28:41.440
<v Speaker 1>Don't steal my things! Wah. Fucking coward, pansy bastard bitches.

0:28:41.560 --> 0:28:44.880
<v Speaker 1>Fucking hell, what a what a bunch of whiny babies.

0:28:44.960 --> 0:28:49.400
<v Speaker 1>Oh no, my plagiarism machine got plagiarized. Wah. Kiss my

0:28:49.760 --> 0:28:54.160
<v Speaker 1>entire asshole, Sam Altman, you little worm, you fucking embarrassment

0:28:54.200 --> 0:28:56.640
<v Speaker 1>to Silicon Valley. You should be ashamed of yourself for

0:28:56.680 --> 0:29:01.120
<v Speaker 1>many reasons, but so much for this, though. Wah. Yeah, oh no,

0:29:01.320 --> 0:29:03.800
<v Speaker 1>you stole from us, my plagiarism machine that

0:29:03.880 --> 0:29:07.200
<v Speaker 1>requires me to steal from literally every artist and author

0:29:07.240 --> 0:29:09.640
<v Speaker 1>on the Internet. The thing where we went on YouTube

0:29:09.680 --> 0:29:12.760
<v Speaker 1>and transcribed everything and fed it into the machine. That's

0:29:12.800 --> 0:29:15.680
<v Speaker 1>that's not stealing, that's good. But you using our model

0:29:15.720 --> 0:29:19.200
<v Speaker 1>to generate answers. That's just not fair. What a bunch

0:29:19.240 --> 0:29:22.160
<v Speaker 1>of babies, you guys. Sam Altman's worth billions of dollars.

0:29:22.240 --> 0:29:24.880
<v Speaker 1>He has a five million dollar car. Cry more, you

0:29:24.960 --> 0:29:29.080
<v Speaker 1>little worm. Personally, I genuinely want OpenAI to point

0:29:29.080 --> 0:29:31.600
<v Speaker 1>a finger at Deep Seek and accuse it of IP theft,

0:29:32.080 --> 0:29:35.280
<v Speaker 1>mostly for the yucks, but also for the hypocrisy factor.

0:29:35.600 --> 0:29:38.440
<v Speaker 1>This is a company that, as I've just very cleanly said,

0:29:38.600 --> 0:29:42.240
<v Speaker 1>exists purely from the wholesale industrial larceny of content produced

0:29:42.240 --> 0:29:46.200
<v Speaker 1>by literally fucking everyone. And now they're crying, wah.

0:29:47.040 --> 0:29:49.920
<v Speaker 1>I'm Sam Altman. I'm a big baby. I've filled my

0:29:50.080 --> 0:29:54.280
<v Speaker 1>diaper because someone stole from my plagiarism machine. Kiss my ass,

0:29:55.000 --> 0:29:58.920
<v Speaker 1>Kiss my ass. These companies haven't got shit. OpenAI

0:29:59.040 --> 0:30:01.840
<v Speaker 1>doesn't have shit. They don't have anything, they don't

0:30:01.880 --> 0:30:05.360
<v Speaker 1>have a next product without reasoning, they haven't got anything.

0:30:05.600 --> 0:30:10.360
<v Speaker 1>And now they don't have that disgusting justification for overspending,

0:30:10.400 --> 0:30:14.160
<v Speaker 1>the fat, ugly American startup culture of spending as much

0:30:14.200 --> 0:30:17.080
<v Speaker 1>as you can to build America's next top monopoly. They

0:30:17.080 --> 0:30:20.720
<v Speaker 1>should be fucking ashamed of themselves. They shouldn't be billionaires,

0:30:20.760 --> 0:30:23.880
<v Speaker 1>they should be poverty stricken. They should have to pay

0:30:23.880 --> 0:30:27.680
<v Speaker 1>everyone they stole from. And it's just, it sickens me

0:30:27.800 --> 0:30:31.120
<v Speaker 1>seeing the reaction from some people on this, seeing the sinophobia,

0:30:31.280 --> 0:30:33.920
<v Speaker 1>but seeing this level of defensiveness of a company like

0:30:33.960 --> 0:30:37.560
<v Speaker 1>OpenAI or Anthropic. And as I'll get into next episode,

0:30:37.640 --> 0:30:40.200
<v Speaker 1>we are really running out of time here, and I

0:30:40.240 --> 0:30:43.960
<v Speaker 1>think Deep Seek is really I think it could be

0:30:44.080 --> 0:30:47.360
<v Speaker 1>really the end of days for these companies. I don't

0:30:47.360 --> 0:30:50.000
<v Speaker 1>know how much they've got left time wise, or even

0:30:50.040 --> 0:30:53.120
<v Speaker 1>money wise, and I'm not sure how they even raise money.

0:30:53.200 --> 0:30:55.240
<v Speaker 1>But in the next episode, I'm going to deep dive

0:30:55.280 --> 0:30:58.040
<v Speaker 1>into Deep Seek and I'll tell you how they sent

0:30:58.120 --> 0:31:00.120
<v Speaker 1>the US tech market into a panic and what it

0:31:00.120 --> 0:31:03.760
<v Speaker 1>actually means for the future of OpenAI, Anthropic, and the hyperscalers

0:31:03.840 --> 0:31:06.920
<v Speaker 1>backing them. This has been a crazy few days.

0:31:07.400 --> 0:31:10.480
<v Speaker 1>I hope this has helped, and on Monday you'll find

0:31:10.480 --> 0:31:13.800
<v Speaker 1>out more. Thank you so much for listening. The support

0:31:13.840 --> 0:31:15.520
<v Speaker 1>I've got for the show has been incredible, and the

0:31:15.560 --> 0:31:19.880
<v Speaker 1>emails I've got about DeepSeek. I've been trying, okay?

0:31:19.920 --> 0:31:22.240
<v Speaker 1>I've really been trying. It's the fastest I could do it.

0:31:22.880 --> 0:31:24.520
<v Speaker 1>But I'm so happy to do this show, and I'm

0:31:24.520 --> 0:31:34.680
<v Speaker 1>so grateful for all of you. Thank you for listening

0:31:34.720 --> 0:31:37.360
<v Speaker 1>to Better Offline. The editor and composer of the Better

0:31:37.400 --> 0:31:40.400
<v Speaker 1>Offline theme song is Mattosowski. You can check out more

0:31:40.440 --> 0:31:43.920
<v Speaker 1>of his music and audio projects at Mattosowski dot com,

0:31:44.040 --> 0:31:48.960
<v Speaker 1>M A T T O S O W S K I dot com.

0:31:49.000 --> 0:31:51.320
<v Speaker 1>You can email me at ez at Better Offline dot

0:31:51.360 --> 0:31:53.560
<v Speaker 1>com or visit Better Offline dot com to find more

0:31:53.600 --> 0:31:57.000
<v Speaker 1>podcast links and of course, my newsletter. I also really

0:31:57.000 --> 0:31:59.320
<v Speaker 1>recommend you go to chat dot where's your ed dot at

0:31:59.320 --> 0:32:01.760
<v Speaker 1>to visit the Discord, and go to r slash Better

0:32:01.800 --> 0:32:04.960
<v Speaker 1>Offline to check out our Reddit. Thank you so much

0:32:05.000 --> 0:32:08.840
<v Speaker 1>for listening. Better Offline is a production of Cool Zone Media.

0:32:08.960 --> 0:32:12.360
<v Speaker 1>For more from Cool Zone Media, visit our website Coolzonemedia

0:32:12.400 --> 0:32:15.200
<v Speaker 1>dot com, or check us out on the iHeartRadio app,

0:32:15.280 --> 0:32:17.720
<v Speaker 1>Apple Podcasts, or wherever you get your podcasts.