WEBVTT - How To Argue With An AI Booster, Part Two 0:00:02.120 --> 0:00:02.880 Ze Media. 0:00:04.200 --> 0:00:07.000 Hello one, Welcome to Better Offline. I'm your host ed 0:00:07.080 --> 0:00:21.840 Zi Trun. This is part two of our three parts 0:00:21.880 --> 0:00:25.040 serious on how to argue with an AI booster. When 0:00:25.040 --> 0:00:27.440 we last left off, I'd started talking about some of 0:00:27.480 --> 0:00:30.240 the most common and vacuous talking points used by those 0:00:30.240 --> 0:00:32.959 who defend the generative AI industry and why a lot 0:00:32.960 --> 0:00:36.080 of them are wholly without merit. These are the booster quips, 0:00:36.120 --> 0:00:38.680 assertions that if you don't know much, sound convincing but 0:00:38.720 --> 0:00:41.680 are easily disproven with the right information. And in that 0:00:41.800 --> 0:00:44.000 last episode we addressed the quips that say were in 0:00:44.040 --> 0:00:47.080 the early days of AI and that people doubted smartphones 0:00:47.080 --> 0:00:49.479 and the internet. Things they didn't do just like they 0:00:49.479 --> 0:00:52.880 did generative AI, which they should do in the cycle 0:00:52.920 --> 0:00:55.360 of grief. That's the denial stage. Now we're going to 0:00:55.400 --> 0:00:58.880 move on to bargaining. This is just that the dot 0:00:58.920 --> 0:01:01.920 com boom, even if of this collapses, the overcapacity will 0:01:01.960 --> 0:01:04.200 be practical for the market like the fiber boom was. 0:01:05.040 --> 0:01:07.760 All right, folks, time for a little history. You know me, 0:01:07.840 --> 0:01:10.800 I'll love me some mystery. The fiber boom began after 0:01:10.840 --> 0:01:14.520 the Telecommunications Act of nineteen ninety six deregulated large parts 0:01:14.520 --> 0:01:18.920 of America's communications infrastructure, creating a massive boom, a five 0:01:19.000 --> 0:01:25.720 hundred billion dollars one to be precise, primarily funded with debt. 
Obviously, 0:01:25.720 --> 0:01:28.400 we're still using the infrastructure bought during that boom, and 0:01:28.480 --> 0:01:30.640 this fact is used as a defense of the insane 0:01:30.720 --> 0:01:35.520 capex spending surrounding generative AI. High-speed internet is useful, right? Sure. 0:01:35.600 --> 0:01:38.480 But the fiber optic boom period was also defined by 0:01:38.480 --> 0:01:43.280 a gluttony of overinvestment, ridiculous valuations, and genuine, outright fraud. 0:01:43.480 --> 0:01:45.560 In any case, this is not remotely the same thing, 0:01:45.560 --> 0:01:47.480 and anyone making this point needs to learn the very 0:01:47.520 --> 0:01:51.520 fucking basics of technology. Let's get going now. The fiber 0:01:51.520 --> 0:01:54.120 optic cable of this era is mostly owned by a 0:01:54.120 --> 0:01:57.360 few companies. Forty-two percent of Nvidia's revenue is from 0:01:57.400 --> 0:02:00.440 the Magnificent Seven, and the companies buying these GPUs are 0:02:00.480 --> 0:02:02.360 for the most part not going to go bust once 0:02:02.400 --> 0:02:05.680 the AI bubble bursts. You can also already get the 0:02:05.800 --> 0:02:09.560 cheap fiber of this era too: cheap AI GPUs are already here. 0:02:09.840 --> 0:02:13.040 GPUs are depreciating assets, meaning that the good deals are 0:02:13.080 --> 0:02:16.640 already happening. I found an Nvidia A100 0:02:16.639 --> 0:02:19.160 for two or three thousand dollars multiple times on eBay, 0:02:19.360 --> 0:02:21.120 and you can get the H100s, which are 0:02:21.160 --> 0:02:23.639 more powerful, for, well, I think thirty grand, and those 0:02:23.680 --> 0:02:27.720 things go for forty-five thousand retail, so not brilliant. AI GPUs 0:02:27.760 --> 0:02:29.760 also do not have a variety of use cases and 0:02:29.800 --> 0:02:33.440 are limited by CUDA, Nvidia's programming libraries and APIs. 
0:02:33.760 --> 0:02:37.760 AI GPUs are integrated into applications using this language, CUDA, and 0:02:37.800 --> 0:02:41.280 this is specifically Nvidia's programming language. While there are 0:02:41.400 --> 0:02:45.320 other use cases, such as scientific simulations, image and video processing, data 0:02:45.360 --> 0:02:48.880 science and analytics, medical imaging, and so on, CUDA is 0:02:48.880 --> 0:02:53.720 not a one-size-fits-all digital panacea, while fiber 0:02:53.720 --> 0:02:57.040 optic cable was, and it was also put everywhere. It 0:02:57.200 --> 0:03:00.240 truly did set up the future. What are these 0:03:00.240 --> 0:03:04.679 GPUs setting up exactly? Also, widespread access to cheaper GPUs 0:03:04.720 --> 0:03:08.280 has already happened, and what new use cases are there? 0:03:08.600 --> 0:03:11.520 What are the new innovative things we can do? As 0:03:11.520 --> 0:03:14.440 a result of the AI bubble, there are now many, many, many, many, 0:03:14.440 --> 0:03:17.720 many different vendors to get access to GPUs. You can 0:03:17.760 --> 0:03:20.000 pay at an hourly rate. Who knows if it's profitable, 0:03:20.040 --> 0:03:21.880 but you can do it, and sometimes you can get 0:03:21.880 --> 0:03:23.880 them for as little as one dollar an hour, which 0:03:23.919 --> 0:03:26.640 is really not good. It definitely isn't making them money. 0:03:26.639 --> 0:03:30.520 But putting the financial collapse aside: while they might be 0:03:30.639 --> 0:03:33.840 cheaper when the AI bubble bursts, does cheaper actually enable 0:03:33.840 --> 0:03:36.920 people to do new stuff? Is cost the problem? Because 0:03:36.920 --> 0:03:38.080 I think the costs are going to go up. But 0:03:38.120 --> 0:03:40.440 even if they weren't going up, what are the things 0:03:40.480 --> 0:03:42.520 that you could do that are new? What is the 0:03:42.560 --> 0:03:46.520 prohibitive cost? No one can actually answer this question because 0:03:46.560 --> 0:03:50.080 the answer isn't fun. 
GPUs are built to shove massive 0:03:50.080 --> 0:03:52.960 amounts of compute into one specific function, again and again 0:03:53.000 --> 0:03:55.560 and again, like generating the output of a model, which, remember, 0:03:55.680 --> 0:03:59.640 mostly boils down to complex maths. Unlike CPUs, a GPU 0:03:59.680 --> 0:04:03.240 can't easily change tasks or handle many little distinct operations, 0:04:03.520 --> 0:04:05.560 meaning that these things aren't going to be adopted for 0:04:05.640 --> 0:04:08.640 another mass market use case because there probably isn't one. 0:04:09.280 --> 0:04:12.800 In simpler terms, this was not an infrastructure build-out. 0:04:13.000 --> 0:04:16.360 The GPU boom is a heavily centralized, capital expenditure funded 0:04:16.400 --> 0:04:18.640 asset bubble where a bunch of chips will sit in 0:04:18.680 --> 0:04:22.560 warehouses or kind of fallow data centers waiting for somebody 0:04:22.560 --> 0:04:24.480 to make up a use case for them. And if 0:04:24.520 --> 0:04:27.000 an enduring one existed, we'd already have it, because we 0:04:27.040 --> 0:04:31.920 already have all the fucking GPUs. Now here's a really 0:04:31.920 --> 0:04:34.359 big booster quip, and one I have been looking forward to. 0:04:34.360 --> 0:04:35.880 I get a lot of people asking me about this. 0:04:36.839 --> 0:04:41.280 Um, Ed, you're so stupid. Why am I stupid, exactly? Well, 0:04:41.320 --> 0:04:44.200 five really smart guys got together and wrote AI twenty 0:04:44.279 --> 0:04:47.320 twenty seven, which is a very real sounding extrapolation that... 0:04:47.440 --> 0:04:52.559 shut the fuck up, shut up, shut up. AI twenty 0:04:52.600 --> 0:04:55.440 twenty seven is fan fiction. If you were scared by this, 0:04:55.480 --> 0:04:57.560 and you're not a booster, you shouldn't feel bad, by 0:04:57.560 --> 0:05:00.320 the way. This was written to scare you. 
By the way, 0:05:00.320 --> 0:05:02.200 if you don't know what it is I'm talking about, 0:05:02.360 --> 0:05:04.880 you should consider yourself lucky. It's essentially a piece of 0:05:04.920 --> 0:05:09.000 speculative fiction that describes where GENAI companies get fatter models 0:05:09.000 --> 0:05:11.400 that get exponentially better, and the US and China are 0:05:11.440 --> 0:05:14.120 in brailed in an AI arms race. It's really silly. 0:05:14.160 --> 0:05:17.000 It's so very silly, and I call it fan fiction 0:05:17.080 --> 0:05:19.680 because it is. If we're thinking about this in purely 0:05:19.720 --> 0:05:22.080 intellectual terms. It's up there with my immortal and no, 0:05:22.200 --> 0:05:24.599 I'm not explaining that you can google that one for yourselves. 0:05:25.160 --> 0:05:27.240 It doesn't matter if all the people writing the fan 0:05:27.279 --> 0:05:30.080 fiction are scientists or that they have the right credentials. 0:05:30.440 --> 0:05:33.200 They themselves said that AI twenty twenty seven is a 0:05:33.279 --> 0:05:36.960 guess an extrapolation, which means guess with expert feedback, which 0:05:37.000 --> 0:05:40.120 means someone editing your fan fiction and involves experience that 0:05:40.200 --> 0:05:42.240 open AI. There are people that worked on the shows 0:05:42.240 --> 0:05:45.479 they write fan fiction about. We're not even insulting fan fiction. 0:05:45.560 --> 0:05:48.520 By the way, go nuts, you're more You are one 0:05:48.600 --> 0:05:53.040 hundred times more ethically positive than these people. At least 0:05:53.040 --> 0:05:56.960 you admits fan fiction could knuckles get pregnant. I'm sure 0:05:56.960 --> 0:05:59.200 somebody's found out. I'm not going to go line by 0:05:59.240 --> 0:06:01.160 line and cut this any more than I'm going to 0:06:01.200 --> 0:06:03.839 go and do a lengthy takedown of someone's erotic Bancho 0:06:03.920 --> 0:06:07.640 Kazoui's story, because both are fictional. 
The entire premise of 0:06:07.640 --> 0:06:10.400 this nonsense is that at one point someone invents a 0:06:10.400 --> 0:06:13.400 self learning agent that teaches itself stuff, and it does 0:06:13.400 --> 0:06:16.520 a bunch of other stuff requiring a Brazilian compute points 0:06:17.000 --> 0:06:19.599 with different agents with different numbers after them. There is 0:06:19.640 --> 0:06:21.800 no proof that this is possible. Nobody has done it, 0:06:21.839 --> 0:06:24.600 and nobody will do it. AA twenty twenty seven was 0:06:24.640 --> 0:06:27.120 written specifically to fool people that want to be fooled, 0:06:27.279 --> 0:06:29.440 with big chants and the right technical terms used to 0:06:29.480 --> 0:06:31.400 lull the credulus into a wet dream and a New 0:06:31.480 --> 0:06:33.680 York Times column where one of the writers folds their 0:06:33.720 --> 0:06:36.520 hands and looks worried. It was also written to scare 0:06:36.520 --> 0:06:40.480 people that are already scared. It makes big, scary proclamations 0:06:40.480 --> 0:06:43.000 with tons of links to stuff that looks really legitimate, 0:06:43.080 --> 0:06:45.920 but when you piece it all together, is literally just 0:06:46.000 --> 0:06:50.440 fan fection, except really not that endearing. My personal favorite 0:06:50.480 --> 0:06:53.200 part is mid twenty twenty six China Wakes Up, which 0:06:53.240 --> 0:06:56.520 involves China's intelligence agents. He's trying to steal Open Brains 0:06:56.560 --> 0:06:59.960 agent no idea who this companicably referring to please email 0:07:00.000 --> 0:07:02.000 if you can work it out to I don't care 0:07:02.080 --> 0:07:05.760 at business dot org before the headline of AI take 0:07:05.839 --> 0:07:08.560 some jobs. After Open Brain releases a model. Oh God, 0:07:08.600 --> 0:07:12.520 I'm so bored even fucking talking about this now. 
Sarah 0:07:12.600 --> 0:07:15.120 lyonce puts this well, arguing that AI twenty twenty seven 0:07:15.160 --> 0:07:17.680 and AI in general is no different from the spurious 0:07:17.720 --> 0:07:20.200 spectral evidence used to accuse someone of being a witch 0:07:20.280 --> 0:07:23.520 during the Salem witch trials, and I quote and the 0:07:23.520 --> 0:07:26.320 evidence is spectral. What is the real evidence in AI 0:07:26.320 --> 0:07:29.680 twenty twenty seven beyond trust us and vibes? People who 0:07:29.680 --> 0:07:32.720 wrote it site themselves in the piece, do not demand 0:07:32.720 --> 0:07:35.440 I take this seriously. This is so clearly a marketing 0:07:35.960 --> 0:07:38.240 device to scare people into buying your product before this 0:07:38.280 --> 0:07:41.600 imaginary window closes. Don't call me stupid for not falling 0:07:41.640 --> 0:07:44.840 for your spectral evidence. My whole life, people have been 0:07:44.880 --> 0:07:48.200 saying artificial intelligence is around the corner, and it never arrives. 0:07:48.640 --> 0:07:50.680 I simply do not believe a chatbot will ever be 0:07:50.720 --> 0:07:52.720 more than a chat pot, and until you show me 0:07:52.760 --> 0:07:57.040 it doing that, I will not believe it anyway. AI 0:07:57.080 --> 0:08:00.480 twenty twenty seven is fan fiction nothing more. Just because 0:08:00.480 --> 0:08:02.920 it's full of fancy words and has five different grifters 0:08:02.960 --> 0:08:19.400 on its byline doesn't mean a goddamn thing. Now now, now, now, now, folks, 0:08:20.240 --> 0:08:24.120 we've all been waiting for this moment, and here's the 0:08:24.200 --> 0:08:28.239 ultimate booster quip the cust of inference is coming down. 0:08:28.520 --> 0:08:31.640 This proves that things are getting cheaper. 
And here's a 0:08:31.640 --> 0:08:34.000 bonus trick for you before I get to my ben 0:08:34.640 --> 0:08:37.640 Here we go, ask them to explain whether things have 0:08:37.720 --> 0:08:40.000 actually got cheaper, and if they say they have, ask 0:08:40.040 --> 0:08:42.880 them why there are no profitable AI companies. If they 0:08:42.920 --> 0:08:45.240 say they're in the growth stage, ask them why there 0:08:45.240 --> 0:08:47.920 are no profitable AI companies. Again, I'd say it's been 0:08:48.000 --> 0:08:50.679 several years and not got one. At this point they 0:08:50.679 --> 0:08:53.640 should try and kill you. But really, I'm about to 0:08:53.679 --> 0:08:55.880 be petty. I'm about to be petty for a fucking 0:08:55.920 --> 0:08:58.960 reason though. In an interview on a podcast from earlier 0:08:58.960 --> 0:09:01.560 this year that I will not even quote because the 0:09:01.679 --> 0:09:04.040 journalist in question did not back me up and it 0:09:04.080 --> 0:09:08.240 pisses me off, Journalist Casey Newton said the following about 0:09:08.240 --> 0:09:08.720 my work. 0:09:09.720 --> 0:09:11.160 You don't think that that kind of flies in the 0:09:11.160 --> 0:09:13.120 face of same altman saying that we need billions of 0:09:13.160 --> 0:09:15.880 dollars for years. No, not at all. And I think 0:09:15.920 --> 0:09:18.080 that's why it's so important when you're reading about AI 0:09:18.240 --> 0:09:20.600 to read people who actually interview people who work at 0:09:20.640 --> 0:09:23.640 these companies and understand how the technology works. 
Because the 0:09:23.800 --> 0:09:28.000 entire industry has been on this curve where they are 0:09:28.200 --> 0:09:32.440 trying to find micro innovations that reduce the cost of 0:09:32.480 --> 0:09:35.240 training the models and to reduce the cost of what 0:09:35.280 --> 0:09:37.600 they call inference, which is when you actually enter a query 0:09:37.640 --> 0:09:41.000 into ChatGPT. And if you plotted the curve of 0:09:41.280 --> 0:09:44.360 how the cost has been falling over time, DeepSeek 0:09:44.440 --> 0:09:47.520 is on that curve. Right? So everything that DeepSeek 0:09:47.559 --> 0:09:50.160 did, it was expected by the AI labs that someone 0:09:50.200 --> 0:09:52.520 would be able to do. The novelty was just that 0:09:52.559 --> 0:09:54.760 a Chinese company did it. So to say that it, 0:09:54.920 --> 0:09:58.600 like, upends expectations of how AI would be built 0:09:58.760 --> 0:10:01.440 is just purely false and the opinion of somebody who 0:10:01.440 --> 0:10:02.680 does not know what he's talking about. 0:10:03.280 --> 0:10:06.520 Newton then says, several octaves higher, which shows you exactly 0:10:06.520 --> 0:10:09.360 how mad he isn't, that he thought what he said 0:10:09.480 --> 0:10:12.000 was very civil, and that there are things that are 0:10:12.000 --> 0:10:14.679 true and there are things that are false, like you 0:10:14.720 --> 0:10:17.560 can choose which ones you want to believe. I'm not 0:10:17.600 --> 0:10:20.240 going to be so civil. Other than the fact that 0:10:20.280 --> 0:10:23.959 Casey refers to micro innovations (the fuck are you talking about?) 0:10:24.200 --> 0:10:26.640 and DeepSeek being on a curve that was expected, 0:10:27.000 --> 0:10:30.320 he makes, as many do, two very big mistakes. And personally... 
0:10:30.360 --> 0:10:34.160 If I was doing this, I personally would not have 0:10:34.280 --> 0:10:37.680 said these things in a sentence that began with me 0:10:37.760 --> 0:10:40.560 suggesting that I, being Casey Newton in this 0:10:40.679 --> 0:10:44.080 example, knew how the technology works. Now here's the Casey 0:10:44.120 --> 0:10:47.160 Newton quip: inference, which is when you actually enter 0:10:47.200 --> 0:10:50.040 a query into ChatGPT. This statement is false. It's 0:10:50.040 --> 0:10:52.760 not what inference means. Inference (and I've gotten this wrong 0:10:52.800 --> 0:10:55.680 in the past too, I'm being accountable) is everything that 0:10:55.760 --> 0:10:58.120 happens when you put in a prompt to generate an output. 0:10:58.400 --> 0:11:02.080 It's when an AI, based on your prompt, infers meaning. To 0:11:02.160 --> 0:11:05.280 be more specific, and quoting Google, machine learning inference is 0:11:05.280 --> 0:11:07.720 the process of running data points into a machine learning 0:11:07.720 --> 0:11:10.960 model to calculate an output, such as a single numerical score. 0:11:11.320 --> 0:11:13.439 Except that's what these things are bad at. But nevertheless, 0:11:13.720 --> 0:11:15.440 Casey will try and weasel out of this one and 0:11:15.480 --> 0:11:18.320 say this is what he meant. It wasn't. He also said, 0:11:18.400 --> 0:11:20.240 if you plotted the curve of how the cost of 0:11:20.280 --> 0:11:24.200 inference has been falling over time. Well, that's wrong, Casey, 0:11:24.320 --> 0:11:26.320 that's wrong, my man. The cost of inference has gone 0:11:26.360 --> 0:11:28.960 up over time. Now, Casey, like many people who talk 0:11:28.960 --> 0:11:31.600 about stuff without learning about it first, is likely referring 0:11:31.600 --> 0:11:33.320 to the fact that the price of tokens for some 0:11:33.360 --> 0:11:36.240 models has gone down in some cases. But you know what, folks, 0:11:36.320 --> 0:11:38.959 let's establish some facts about inference. 
I'm doing the train. 0:11:39.320 --> 0:11:41.960 I'm pulling the big horn on the invisible train. I'm 0:11:42.000 --> 0:11:45.000 cooking now. Inference, a thing that costs money, is 0:11:45.120 --> 0:11:47.760 entirely different to the price of tokens, and conflating the 0:11:47.800 --> 0:11:51.000 two is journalistic malpractice. The cost of inference would be 0:11:51.000 --> 0:11:53.720 the price of running the GPU and the associated architecture. 0:11:53.800 --> 0:11:55.800 Of course, we do not at this point have any 0:11:55.840 --> 0:11:59.520 real insight into that. Token prices are set by the people 0:11:59.520 --> 0:12:02.160 who sell access to the tokens, such as OpenAI 0:12:02.200 --> 0:12:05.120 and Anthropic. For example, OpenAI dropped the price of 0:12:05.160 --> 0:12:07.959 its o3 model's token costs almost immediately after the 0:12:08.000 --> 0:12:10.520 launch of Claude Opus 4. Do you think it did 0:12:10.559 --> 0:12:12.800 that because the price of serving the models got cheaper? 0:12:13.000 --> 0:12:16.040 If you do, I don't know how you possibly put 0:12:16.080 --> 0:12:19.920 your trousers on every morning without cutting yourself in half. Now, 0:12:19.920 --> 0:12:22.960 the cost of inference conversation comes from articles that say 0:12:23.000 --> 0:12:25.400 that we now have models that are cheaper that can 0:12:25.400 --> 0:12:28.960 now hit higher benchmark scores. Though the article I'm referring to, 0:12:29.000 --> 0:12:31.080 which will be in the show notes, is from November 0:12:31.080 --> 0:12:33.240 twenty twenty four, and the comparison it makes is between 0:12:33.280 --> 0:12:36.280 GPT-3, which is from November twenty twenty one, and 0:12:36.400 --> 0:12:40.400 Llama 3.2 3B, from September twenty twenty four. Now, 0:12:40.440 --> 0:12:42.200 the suggestion is, in any case, that the cost of 0:12:42.200 --> 0:12:45.040 inference is going down ten x year over year. 
The 0:12:45.080 --> 0:12:47.600 problem is, however, that these are raw token costs, not 0:12:47.640 --> 0:12:51.199 actual expressions or evaluations of token burn in a practical setting. 0:12:51.720 --> 0:12:54.199 And really, I realize that was a bit technical. 0:12:54.960 --> 0:12:57.920 These are just what it costs to do something. It 0:12:57.960 --> 0:13:01.120 doesn't actually tell you how many tokens will be 0:13:01.160 --> 0:13:03.640 burned or at what volume they will be burned, because that 0:13:03.679 --> 0:13:06.800 would change things. And well, wouldn't you know it, the 0:13:06.840 --> 0:13:10.120 cost of inference actually went up as a result. In 0:13:10.160 --> 0:13:12.080 an excellent blog from Kilo Code, and I did not 0:13:12.160 --> 0:13:14.640 get the chance to find out the pronunciation of this 0:13:15.400 --> 0:13:17.319 second name, so I'm just going to spell it. It 0:13:17.400 --> 0:13:22.760 is Ewa S-Z-Y-S-Z-K-A. I am so sorry. I would 0:13:22.840 --> 0:13:25.679 rather spell it out, miss, than actually mispronounce it. I 0:13:25.720 --> 0:13:29.240 hate when people say Zitron wrong. Great blog anyway, 0:13:29.320 --> 0:13:33.520 let me quote: application inference costs increased for two reasons. 0:13:33.559 --> 0:13:36.600 The frontier model's cost per token stayed constant, and the 0:13:36.679 --> 0:13:40.760 token consumption per application grew a lot. Token consumption per 0:13:40.800 --> 0:13:43.600 application grew a lot because models allowed for longer context 0:13:43.600 --> 0:13:46.880 windows and bigger suggestions from the models. The combination of 0:13:46.920 --> 0:13:49.840 a steady price per token and more token consumption caused 0:13:49.880 --> 0:13:52.880 app inference costs to grow about ten times over the 0:13:52.880 --> 0:13:56.600 past two years. 
To explain that in really simple terms, 0:13:56.640 --> 0:13:59.440 while the costs of old models may have decreased, new models, 0:13:59.640 --> 0:14:02.760 which you need to do most things, cost about the same, 0:14:02.800 --> 0:14:05.600 and the reasoning that these new models use does actually 0:14:05.600 --> 0:14:09.079 burn way, way more tokens. When these new models reason, 0:14:09.160 --> 0:14:11.280 they take the user's input and break it into 0:14:11.280 --> 0:14:14.360 component parts, then run inference on each of those parts. 0:14:14.600 --> 0:14:16.200 When you plug an LLM into an AI 0:14:16.240 --> 0:14:19.320 coding environment, it will naturally burn an absolute shit ton 0:14:19.360 --> 0:14:21.640 of tokens, in part because of the large amount of 0:14:21.640 --> 0:14:23.800 information you have to load into the prompt and the 0:14:23.840 --> 0:14:25.960 context window, or the amount of information you can load 0:14:26.000 --> 0:14:29.440 in at once, and in part because generating code is inference 0:14:29.520 --> 0:14:31.920 intensive and also means breaking down all those coding tasks, with 0:14:31.960 --> 0:14:34.360 each of those tasks requiring a coding tool and taking 0:14:34.400 --> 0:14:38.200 a bunch of inference themselves. It's really bad. In fact, 0:14:38.240 --> 0:14:40.640 the inference costs are so severe that Kilo Code says 0:14:40.680 --> 0:14:43.160 that a combination of a steady price per token and 0:14:43.200 --> 0:14:46.040 more token consumption caused app inference costs to grow about 0:14:46.040 --> 0:14:49.160 ten x over the last two years. I'm repeating myself, 0:14:49.200 --> 0:14:51.520 I realize, but I really need you to get one thing, 0:14:51.760 --> 0:14:53.960 which is that the cost of inference went up. But 0:14:54.120 --> 0:14:56.600 I'm not done. 
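The arithmetic behind that ten x claim can be sketched in a few lines of Python. This is a minimal illustration, not Kilo Code's own method: the function name, the ten-dollar price per million tokens, and the token counts are all made-up assumptions, chosen only to show that a flat price per token combined with ten times the token consumption gives ten times the bill.

```python
# Hypothetical numbers throughout: none of these figures come from the
# Kilo Code blog or any vendor's price list.

def app_inference_cost(tokens_per_request: int,
                       price_per_million_tokens: float,
                       requests: int) -> float:
    """Dollar cost of serving `requests` requests at a given token burn."""
    total_tokens = tokens_per_request * requests
    return total_tokens * price_per_million_tokens / 1_000_000

# Same assumed price per token in both periods (the "steady price").
PRICE = 10.0  # dollars per million tokens, assumed

# Two years ago: short prompts, short completions (assumed).
cost_before = app_inference_cost(2_000, PRICE, requests=1_000)

# Today: longer context windows and reasoning output (assumed 10x tokens).
cost_after = app_inference_cost(20_000, PRICE, requests=1_000)

growth = cost_after / cost_before
print(f"before=${cost_before:.2f} after=${cost_after:.2f} growth={growth:.0f}x")
```

The point of the sketch is that the price-per-token line item never changes, yet the amount actually paid does, which is why a token price chart on its own says little about what an application costs to run.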
I refuse to let this point go 0:14:56.800 --> 0:14:58.760 because people love to say the cost of inference is 0:14:58.800 --> 0:15:01.400 going down when the cost of inference has increased, and 0:15:01.440 --> 0:15:04.240 they do so to a national audience, all while suggesting 0:15:04.320 --> 0:15:07.880 I'm wrong somehow and acting superior. I don't like being 0:15:07.920 --> 0:15:10.680 made to feel this way. I don't think it's nice 0:15:10.680 --> 0:15:13.360 to do this to people. And if you're gonna do it, 0:15:13.440 --> 0:15:15.720 if you have the temerity to call someone out directly, 0:15:15.840 --> 0:15:20.160 at least be fucking right. I'm not wrong, you're wrong. 0:15:20.600 --> 0:15:24.240 In fact, software developer influencer Theo Browne recently put out 0:15:24.240 --> 0:15:26.960 a video called "I was wrong about AI costs, they 0:15:27.040 --> 0:15:30.240 keep going up", which he breaks down as follows: reasoning 0:15:30.240 --> 0:15:34.000 models are significantly increasing the amount of output tokens being generated. 0:15:34.320 --> 0:15:37.760 These tokens are also more expensive. In one example, Browne 0:15:37.840 --> 0:15:41.080 finds that Grok 4's reasoning mode uses six hundred and three 0:15:41.120 --> 0:15:45.760 tokens to generate two words. This was a problem across 0:15:45.800 --> 0:15:48.720 every single reasoning model, as even cheap reasoning models would 0:15:48.760 --> 0:15:51.600 do the same thing. As a result, tasks are taking 0:15:51.680 --> 0:15:55.240 longer and burning more tokens. Another writer called Ethan Ding 0:15:55.280 --> 0:15:57.760 noted a few months ago that reasoning models burn so 0:15:57.800 --> 0:16:00.680 many tokens that there is no flat subscription price that 0:16:00.720 --> 0:16:03.200 works in this new world, as the number of tokens 0:16:03.240 --> 0:16:06.920 they consume went absolutely nuclear. The price drops have 0:16:07.000 --> 0:16:09.920 also, for the most part, stopped. 
You cannot at this 0:16:10.040 --> 0:16:12.560 point fairly evaluate whether a model is cheaper just based 0:16:12.600 --> 0:16:15.640 on its cost per tokens, because reasoning models inherently burn 0:16:15.880 --> 0:16:19.080 and are built to inherently burn more tokens to create 0:16:19.120 --> 0:16:21.560 an output. Reasoning models are also the only way that 0:16:21.600 --> 0:16:23.840 model developers have been able to improve the efficacy of 0:16:23.880 --> 0:16:26.640 new models, using something called test time compute to burn 0:16:26.680 --> 0:16:30.080 extra tokens to complete a task, and in basically anything 0:16:30.120 --> 0:16:31.800 you're using today, there's going to be some sort of 0:16:31.880 --> 0:16:35.360 reasoning model, especially if you're coding, the cost of inference 0:16:35.360 --> 0:16:38.800 has gone up. Statements otherwise are purely false and are 0:16:38.840 --> 0:16:41.000 the opinion of somebody who does not know what he's 0:16:41.040 --> 0:16:44.240 talking about. But you ask, could the costs of inference 0:16:44.280 --> 0:16:49.000 go down? Maybe it sure isn't trending that way, nor 0:16:49.040 --> 0:16:51.560 has it gone down yet. I also predict that there's 0:16:51.560 --> 0:16:53.440 going to be some sort of sudden realization in the 0:16:53.440 --> 0:16:55.720 media that inference is going up, which is kind of 0:16:55.720 --> 0:16:58.960 already started. 
The Information had a piece on it in 0:16:59.040 --> 0:17:01.480 late August where they note that Intuit paid twenty 0:17:01.480 --> 0:17:03.880 million dollars to Azure last year, primarily to access 0:17:03.920 --> 0:17:06.160 OpenAI's models, and it's on track to spend thirty 0:17:06.200 --> 0:17:08.720 million this year, which outpaces the company's revenue growth in 0:17:08.760 --> 0:17:11.800 the same period, raising questions about how sustainable the spending 0:17:11.920 --> 0:17:13.560 is and how much of the cost it can pass 0:17:13.560 --> 0:17:16.320 along to customers. Christopher Mims at The Wall Street Journal 0:17:16.359 --> 0:17:18.359 also had a piece about the costs going up. Do 0:17:18.520 --> 0:17:21.040 not be mad at Chris. Chris and I chatted before 0:17:21.080 --> 0:17:24.040 he submitted that piece. Like, he literally on Bluesky 0:17:24.080 --> 0:17:26.360 called me out. It fucking rocks. By the way, big 0:17:26.440 --> 0:17:28.600 up to Chris Mims, because it's nice to see the 0:17:28.640 --> 0:17:31.639 mainstream media actually engaging with these things, even though it's 0:17:31.720 --> 0:17:34.600 dangerous to the bubble. But you know what, the truth 0:17:34.680 --> 0:17:37.040 must win out, and the problem here is that the 0:17:37.160 --> 0:17:41.600 architecture underlying large language models is inherently unreliable. I imagine OpenAI's 0:17:41.600 --> 0:17:44.520 introduction of the router to ChatGPT five was 0:17:44.560 --> 0:17:46.359 an attempt to moderate both the costs of the model 0:17:46.440 --> 0:17:49.320 chosen and reduce the amount of exposure to reasoning models 0:17:49.320 --> 0:17:52.520 for simple queries. Though Sam Altman was boasting on August 0:17:52.520 --> 0:17:54.880 tenth about the significant increase in both free and paid 0:17:54.960 --> 0:17:58.000 users' exposure to reasoning models. They don't teach you this 0:17:58.119 --> 0:18:01.640 in business school. 
Still, a study written up by VentureBeat 0:18:01.680 --> 0:18:04.040 found that open weight models burn between one point five 0:18:04.080 --> 0:18:06.119 and four times more tokens, in part due to a 0:18:06.200 --> 0:18:08.879 lack of token efficiency and in part thanks to, you 0:18:09.040 --> 0:18:13.440 guessed it, reasoning models. I quote: the findings challenge a 0:18:13.480 --> 0:18:16.560 prevailing assumption in the AI industry that open source models 0:18:16.560 --> 0:18:20.520 offer clear economic advantages over proprietary alternatives. While open 0:18:20.520 --> 0:18:23.000 source models typically cost less per token to run, the 0:18:23.000 --> 0:18:25.520 study suggests that this advantage could be, and I quote 0:18:25.560 --> 0:18:28.280 the study, easily offset if they require more tokens to 0:18:28.320 --> 0:18:31.560 reason about a given problem. And models keep getting bigger 0:18:31.560 --> 0:18:36.399 and more expensive too. So why did this happen? Well, 0:18:36.520 --> 0:18:39.359 it's because model developers hit a wall of diminishing returns 0:18:39.400 --> 0:18:41.159 and the only way to make models do more was 0:18:41.200 --> 0:18:43.080 to make them burn more tokens to generate a more 0:18:43.119 --> 0:18:46.560 accurate response, which is a very simple way of describing 0:18:46.600 --> 0:18:49.160 reasoning, a thing that OpenAI launched in September twenty 0:18:49.200 --> 0:18:52.120 twenty four, and others followed. As a result, all the 0:18:52.160 --> 0:18:55.040 gains from powerful new models come from burning more and 0:18:55.119 --> 0:18:57.639 more tokens. 
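To see how a lower price per token can be "easily offset", here is a rough Python sketch of the break-even point. The prices and the baseline token count are invented for illustration and are not taken from the study or the VentureBeat write-up; only the one point five to four times range comes from the episode.

```python
# Hypothetical prices and token counts; only the 1.5x-4x token multiplier
# range is taken from the study as described in the episode.

def cost_per_task(price_per_million_tokens: float, tokens_per_task: float) -> float:
    """Dollar cost of one task at a given price and token burn."""
    return price_per_million_tokens * tokens_per_task / 1_000_000

CLOSED_PRICE = 10.0    # dollars per million tokens, assumed
OPEN_PRICE = 4.0       # dollars per million tokens, assumed (cheaper per token)
BASELINE_TOKENS = 1_000  # tokens the closed model burns on the task, assumed

closed_cost = cost_per_task(CLOSED_PRICE, BASELINE_TOKENS)

# The open-weight model burns 1.5x to 4x the tokens on the same task.
open_cost_low = cost_per_task(OPEN_PRICE, BASELINE_TOKENS * 1.5)
open_cost_high = cost_per_task(OPEN_PRICE, BASELINE_TOKENS * 4)

# Above this token multiplier, the "cheaper" model costs more per task.
break_even_multiplier = CLOSED_PRICE / OPEN_PRICE

print(f"closed:     ${closed_cost:.4f} per task")
print(f"open, 1.5x: ${open_cost_low:.4f} per task")
print(f"open, 4x:   ${open_cost_high:.4f} per task")
print(f"break-even token multiplier: {break_even_multiplier}x")
```

With these assumed prices, the open-weight model wins at one point five times the tokens and loses at four times, so the per-token price alone does not settle which model is cheaper to run.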
The cost per million token number is no 0:18:57.720 --> 0:18:59.840 longer an accurate measure of the actual cost of generative 0:18:59.880 --> 0:19:02.720 AI because it's much, much, much, much harder to tell 0:19:02.720 --> 0:19:04.920 how many tokens a reasoning model may burn, and it 0:19:05.040 --> 0:19:08.399 varies, as Theo Boying... Theo Boying, I'm keeping that. 0:19:08.480 --> 0:19:11.080 All right, you get the real cuts. As Theo 0:19:11.240 --> 0:19:14.840 Browne noted, from model to model. In any case, there 0:19:14.880 --> 0:19:17.600 really is no changing this path. These companies are out 0:19:17.600 --> 0:19:22.679 of ideas. Now, another one of my favorite ultimate 0:19:22.720 --> 0:19:25.120 booster quips. This is a classic and I still get 0:19:25.160 --> 0:19:28.679 this on social media. I have people yapping in 0:19:28.720 --> 0:19:31.919 my ear saying OpenAI and Anthropic are just like 0:19:32.080 --> 0:19:34.840 Uber, because Uber burned twenty-five billion dollars over the 0:19:34.880 --> 0:19:37.960 course of fifteen or so years and look, look, Edward, 0:19:38.119 --> 0:19:40.399 they're now profitable. Why are you calling me Edward? Shut up. 0:19:40.640 --> 0:19:43.199 This proves that OpenAI, a totally different company with 0:19:43.240 --> 0:19:46.280 different economics, will be totally fine. So I've heard this 0:19:46.400 --> 0:19:48.520 argument maybe fifty times in the last year, to the 0:19:48.520 --> 0:19:49.879 point that I had to talk about it in my 0:19:49.960 --> 0:19:53.160 piece How Does OpenAI Survive, which I also turned 0:19:53.160 --> 0:19:55.720 into a podcast around July twenty twenty four. Go back 0:19:55.720 --> 0:19:58.960 and link a link to it in the piece. Yaddy yaddy, yadda. Nevertheless, 0:19:58.960 --> 0:20:00.840 people make a few points about Uber and AI that 0:20:00.840 --> 0:20:02.880 I think are fundamentally incorrect, and I'm going to break 0:20:02.920 --> 0:20:05.680 them down for you now. 
They claim that AI is 0:20:05.720 --> 0:20:08.200 making itself too big to fail and embedding itself everywhere 0:20:08.240 --> 0:20:10.920 and becoming essential, and none of these things are the case. 0:20:11.560 --> 0:20:13.480 I've heard this argument a lot, by the way, and 0:20:13.520 --> 0:20:16.879 it's one that's both ahistorical and alarmingly ignorant of the 0:20:17.040 --> 0:20:21.320