WEBVTT - The Case Against Generative AI (Part 3) 0:00:02.840 --> 0:00:06.640 Media. Hello and welcome to Better Offline. I'm, of course, 0:00:06.680 --> 0:00:20.880 your host, Ed Zitron. We're in the third episode of 0:00:20.880 --> 0:00:23.080 our four-part series where I give you a comprehensive 0:00:23.120 --> 0:00:25.919 explanation as to the origins of the AI bubble, the 0:00:25.960 --> 0:00:29.000 mythology sustaining it, and why it's destined to end really, 0:00:29.080 --> 0:00:32.840 really badly. Now, if you're jumping in now, please start 0:00:32.880 --> 0:00:35.240 from the very beginning. The reason why this is a 0:00:35.320 --> 0:00:37.839 four-parter, my first ever, is because I want it 0:00:37.880 --> 0:00:40.040 to be comprehensive, and because this is a very big 0:00:40.080 --> 0:00:42.959 subject with a lot of moving parts and even more bullshit. 0:00:43.600 --> 0:00:45.879 A few weeks ago, I published a premium newsletter that 0:00:45.920 --> 0:00:48.640 explained how everybody is losing money on generative AI, in 0:00:48.680 --> 0:00:51.320 part because the cost of running AI models is increasing, 0:00:51.520 --> 0:00:53.680 and in part because the software itself doesn't do enough 0:00:53.680 --> 0:00:55.880 to warrant the costs associated with running it, which are 0:00:55.880 --> 0:00:59.680 already subsidized and unprofitable for the model providers. 
Outside of 0:00:59.760 --> 0:01:03.160 OpenAI and, to a lesser extent, Anthropic, nobody seems to 0:01:03.160 --> 0:01:05.800 be making much revenue, with the most successful company being 0:01:05.840 --> 0:01:08.800 Anysphere, makers of AI coding tool Cursor, which hit 0:01:08.800 --> 0:01:11.800 five hundred million dollars annualized, so forty one point 0:01:12.280 --> 0:01:14.679 six million in one month, a few months ago, just 0:01:14.720 --> 0:01:17.319 before Anthropic and OpenAI jacked up the prices for 0:01:17.400 --> 0:01:21.160 priority processing on enterprise queries, raising its operating costs as 0:01:21.160 --> 0:01:24.280 a result. In any case, that's some piss-poor revenue for 0:01:24.319 --> 0:01:26.440 an industry that's meant to be the future of software. 0:01:26.800 --> 0:01:29.280 Smartwatches are projected to make thirty two billion dollars 0:01:29.280 --> 0:01:32.000 this year, and as I've mentioned in the past, the 0:01:32.040 --> 0:01:34.600 Magnificent Seven expect to make thirty five billion dollars or 0:01:34.600 --> 0:01:36.399 so in revenue from AI this year, and I think 0:01:36.440 --> 0:01:38.800 in total, when you throw in CoreWeave and all them, 0:01:39.120 --> 0:01:42.560 it's barely fifty five billion dollars in total. Even Anthropic 0:01:42.600 --> 0:01:45.240 and OpenAI seem a little lethargic, both burning billions 0:01:45.240 --> 0:01:47.560 of dollars while making, by my estimates, no more than 0:01:47.560 --> 0:01:50.360 two billion dollars in Anthropic's case this year so far 0:01:50.600 --> 0:01:53.480 and six point six two six billion dollars in twenty 0:01:53.520 --> 0:01:56.440 twenty five so far for OpenAI, despite projections of 0:01:56.520 --> 0:02:00.240 five billion dollars and thirteen billion dollars respectively. 
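For anyone checking the math on that annualized figure, here is a minimal Python sketch. It assumes "annualized" simply means one month's revenue multiplied by twelve; the function name is made up for illustration and does not come from any company's reporting.

```python
def monthly_from_annualized(annualized_usd: float) -> float:
    """Back out the single month behind an 'annualized' revenue figure.

    Assumption: annualized revenue = one month's revenue * 12.
    """
    return annualized_usd / 12


# Cursor's reported five hundred million dollars annualized.
cursor_monthly = monthly_from_annualized(500_000_000)
print(f"${cursor_monthly / 1e6:.1f}M in one month")
```

The result is a little under forty two million dollars, which the episode truncates to forty one point six.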
Outside of 0:02:00.320 --> 0:02:03.240 these two, AI startups are floundering, struggling to stay alive 0:02:03.320 --> 0:02:06.000 and raising money in several hundred million dollar bursts as their 0:02:06.040 --> 0:02:09.919 negative gross margin businesses flounder. As I dug into a 0:02:09.960 --> 0:02:12.520 few months ago, I could find only twelve AI-powered 0:02:12.560 --> 0:02:15.040 companies making more than eight point three million dollars a month, 0:02:15.080 --> 0:02:17.840 with two of them slightly improving their revenue, specifically AI 0:02:17.880 --> 0:02:20.120 search company Perplexity, which has now hit one hundred and 0:02:20.160 --> 0:02:23.040 fifty million dollars in ARR, or twelve point five 0:02:23.080 --> 0:02:26.119 million dollars a month, and AI coding startup Replit, which 0:02:26.120 --> 0:02:29.280 has hit the same amount. Both of these companies burn 0:02:29.400 --> 0:02:32.720 ridiculous amounts of money. Perplexity burned one hundred and sixty 0:02:32.720 --> 0:02:35.400 four percent of its revenue on Amazon Web Services, OpenAI 0:02:35.400 --> 0:02:38.440 and Anthropic last year, and while Replit hasn't leaked 0:02:38.440 --> 0:02:41.720 its costs, The Information reports its gross margins in July 0:02:41.800 --> 0:02:44.320 were twenty three percent, which doesn't include the cost of 0:02:44.320 --> 0:02:47.200 its free users, which you simply have to do with LLMs, 0:02:47.360 --> 0:02:49.440 as free users are capable of costing you a shit 0:02:49.480 --> 0:02:51.600 ton of money. 
And some of you might say that's 0:02:51.600 --> 0:02:54.040 how they do it in software. Well, guess what: software 0:02:54.040 --> 0:02:56.160 doesn't usually connect you to a model that can burn, 0:02:56.360 --> 0:02:59.520 I don't know, ten cents, twenty cents every time they 0:02:59.560 --> 0:03:01.880 touch it, which may not seem like much, but when 0:03:01.919 --> 0:03:04.760 you're making three dollars on someone and they don't convert, 0:03:04.840 --> 0:03:08.880 it does. Problematically, your paid users also cost you more 0:03:08.880 --> 0:03:11.440 than they bring in as well. In fact, every user 0:03:11.639 --> 0:03:14.080 loses you money in generative AI because it's impossible to 0:03:14.080 --> 0:03:17.280 do cost control in a consistent manner. A few months ago, 0:03:17.320 --> 0:03:19.440 I did a piece on Anthropic losing money on every 0:03:19.440 --> 0:03:21.880 single Claude Code subscriber. And now I'm going to walk 0:03:21.919 --> 0:03:24.760 you through the whole story in a simplified fashion because 0:03:24.800 --> 0:03:28.600 it's quite important. So Claude Code is a coding environment 0:03:28.639 --> 0:03:32.680 that people use, or I should really say, try 0:03:32.720 --> 0:03:36.480 to use, to build software using generative AI. It's available 0:03:36.520 --> 0:03:39.360 as part of Anthropic's twenty dollar, one hundred dollar and 0:03:39.400 --> 0:03:41.920 two hundred dollar a month Claude subscriptions, with the more 0:03:41.920 --> 0:03:46.520 expensive subscriptions having more generous rate limits. Generally, these subscriptions 0:03:46.520 --> 0:03:47.920 are all you can eat. You can use them as 0:03:47.960 --> 0:03:49.920 much as you want until you hit limits, rather than 0:03:49.960 --> 0:03:52.400 paying for the actual tokens you burn. When I say 0:03:52.480 --> 0:03:55.680 burn tokens, and someone reached out saying I should specify this, 0:03:56.040 --> 0:03:59.960 I'm describing how these models are traditionally billed. 
In general, 0:04:00.120 --> 0:04:03.360 you're billed in dollars per million input tokens, as in 0:04:03.600 --> 0:04:06.520 the user feeding in data, and output tokens, the output created, 0:04:06.920 --> 0:04:09.640 so you wouldn't get one token billed; it's every million 0:04:09.640 --> 0:04:12.520 you get charged. So, for example, Anthropic charges three dollars 0:04:12.560 --> 0:04:16.880 per million input tokens and fifteen dollars per million output tokens to 0:04:17.000 --> 0:04:19.440 use its Claude Sonnet 4 model, and it's about, 0:04:19.480 --> 0:04:23.400 I think, well, a word before tokens, should really look 0:04:23.440 --> 0:04:26.719 that up. It also gets more complex as you 0:04:26.760 --> 0:04:29.800 get into things like generating code. Nevertheless, Claude Code has 0:04:29.839 --> 0:04:33.120 been quite popular, and a user created a program called 0:04:33.160 --> 0:04:36.160 ccusage which allowed you to see your token burn, 0:04:36.400 --> 0:04:39.200 the amount of tokens you were actually 0:04:39.279 --> 0:04:43.440 burning using Anthropic's models while using Claude Code, versus just 0:04:43.640 --> 0:04:45.640 getting charged a month and not knowing, and many were 0:04:45.640 --> 0:04:47.440 seeing that they were burning in excess of their 0:04:47.480 --> 0:04:50.200 monthly spend. To be clear, this is the token price 0:04:50.240 --> 0:04:52.480 based on Anthropic's own pricing, and thus the costs to 0:04:52.520 --> 0:04:55.080 Anthropic are likely not identical. So I got a little 0:04:55.120 --> 0:04:59.000 clever. Using Anthropic's gross profit margins (I chose fifty five percent, 0:04:59.160 --> 0:05:01.480 and then a few weeks after my article sixty percent 0:05:01.560 --> 0:05:03.680 was leaked), I found at least twenty different accounts of 0:05:03.720 --> 0:05:06.400 people costing Anthropic anywhere from one hundred and thirty percent 0:05:06.600 --> 0:05:09.320 to three thousand and eighty four percent of their subscription. 
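The token billing and margin estimate described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the exact method used in the article: the three dollar input price comes from the episode, the fifteen dollar output price is assumed to be the published Claude Sonnet 4 list price, the token counts are invented, and "cost to Anthropic" is assumed to be the list price reduced by the gross margin.

```python
def api_price_usd(input_tokens: int, output_tokens: int,
                  input_usd_per_million: float = 3.0,
                  output_usd_per_million: float = 15.0) -> float:
    """List price of a workload billed per million tokens (assumed rates)."""
    return (input_tokens / 1_000_000 * input_usd_per_million
            + output_tokens / 1_000_000 * output_usd_per_million)


def estimated_provider_cost_usd(api_price: float,
                                gross_margin: float = 0.55) -> float:
    """Assumed reading: provider cost = list price * (1 - gross margin)."""
    return api_price * (1.0 - gross_margin)


def percent_of_subscription(cost_usd: float, subscription_usd: float) -> float:
    """Provider cost expressed as a percentage of the monthly plan price."""
    return cost_usd / subscription_usd * 100.0


# Hypothetical heavy user on the two hundred dollar plan:
# 200 million input tokens and 20 million output tokens in a month.
price = api_price_usd(200_000_000, 20_000_000)
cost = estimated_provider_cost_usd(price)
share = percent_of_subscription(cost, 200.0)
print(f"list price ${price:.0f}, estimated cost ${cost:.0f}, "
      f"{share:.1f}% of the subscription")
```

Swapping in the leaked sixty percent margin only means passing `gross_margin=0.60`; the estimate gets smaller, but a heavy enough user still costs more than they pay.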
0:05:09.960 --> 0:05:13.240 There is also now a leaderboard called Viberank, where 0:05:13.240 --> 0:05:16.039 people compete to see how much they burn, with the 0:05:16.080 --> 0:05:18.840 current leader burning, and I shit you not, fifty two 0:05:18.920 --> 0:05:20.880 hundred and ninety one dollars over the course of a month. 0:05:22.120 --> 0:05:25.400 Anthropic is, to be clear, the second largest model developer 0:05:25.520 --> 0:05:27.760 and has some of the best AI talent in the industry. 0:05:27.960 --> 0:05:30.240 It has a better handle on its infrastructure than anyone 0:05:30.240 --> 0:05:32.760 outside of big tech and OpenAI, and it still 0:05:32.800 --> 0:05:35.640 cannot seem to fix this problem, even with weekly rate 0:05:35.680 --> 0:05:38.360 limits brought in at the end of August. While one 0:05:38.400 --> 0:05:41.400 could assume that Anthropic is simply letting users run wild, 0:05:41.440 --> 0:05:44.360 my theory is far simpler: even the model developers have 0:05:44.520 --> 0:05:47.520 no real way of limiting user activity, likely due to 0:05:47.520 --> 0:05:50.839 the architecture of generative AI. I know it sounds insane, 0:05:50.839 --> 0:05:54.320 but at the most advanced level, even the model providers 0:05:54.360 --> 0:05:57.560 are still prompting their models, and whatever rate limits may 0:05:57.839 --> 0:06:00.719 be in place appear to at times get completely ignored, 0:06:00.920 --> 0:06:02.479 and there doesn't seem to be anything they can do 0:06:02.520 --> 0:06:05.839 to stop it. No, really. Anthropic counts amongst its capitalist 0:06:05.920 --> 0:06:09.480 apex predators one lone Chinese man who spent fifty thousand 0:06:09.480 --> 0:06:11.400 dollars of their compute in the space of a month 0:06:11.480 --> 0:06:15.320 fucking around with Claude Code. Even if Anthropic was profitable 0:06:15.520 --> 0:06:17.719 (it isn't, and will burn billions of dollars this year), 
0:06:17.920 --> 0:06:20.240 a customer paying two hundred dollars a month ran up 0:06:20.279 --> 0:06:24.600 fifty thousand dollars in costs, immediately devouring the margin of 0:06:24.640 --> 0:06:27.120 any user running the service that day, that week, or 0:06:27.120 --> 0:06:30.560 even that month. Even if Anthropic's costs are half the 0:06:30.560 --> 0:06:33.400 published rates (they're not, by the way), one guy amounted 0:06:33.400 --> 0:06:35.159 to one hundred and twenty five users' worth of 0:06:35.200 --> 0:06:38.720 monthly revenue. This is not a real business. That's a 0:06:38.760 --> 0:06:41.479 bad business with out-of-control costs, and it doesn't appear 0:06:41.600 --> 0:06:45.160 anybody has these costs under control. And faced with the 0:06:45.200 --> 0:06:47.880 grim reality ahead of them, these companies are trying nasty 0:06:47.920 --> 0:06:50.599 little tricks on their customers to juice more revenue from them. 0:06:51.279 --> 0:06:54.159 A few weeks ago, Replit, an unprofitable AI coding company, 0:06:54.200 --> 0:06:56.880 released a product called Agent 3, which promised to be 0:06:56.960 --> 0:07:00.560 ten times more autonomous and offer infinitely more possibilities, 0:07:00.760 --> 0:07:04.040 testing and fixing its code, constantly improving your application behind 0:07:04.080 --> 0:07:08.760 the scenes in a reflection loop. Sounds very real, sounds 0:07:08.800 --> 0:07:12.600 extremely real. It's so real, but actually it isn't. In reality, 0:07:12.920 --> 0:07:15.040 this means you go and tell the model to build something, 0:07:15.040 --> 0:07:16.520 and it would go and do it, and you'll be 0:07:16.520 --> 0:07:18.640 shocked to hear that these models can't be relied upon 0:07:18.680 --> 0:07:21.400 to go and do anything. 
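The fifty thousand dollar user mentioned a moment ago works out as follows. This is a small Python sketch of that arithmetic only; the function name and the `cost_fraction` parameter are made up for illustration, with `cost_fraction=0.5` standing in for the "costs are half the published rates" scenario.

```python
def subscribers_needed_to_cover(cost_usd: float, subscription_usd: float,
                                cost_fraction: float = 1.0) -> float:
    """How many monthly subscriptions one user's compute bill swallows.

    cost_fraction scales the published-rate cost down to an assumed
    real cost (1.0 = published rates, 0.5 = half of them).
    """
    return cost_usd * cost_fraction / subscription_usd


at_published_rates = subscribers_needed_to_cover(50_000, 200)
at_half_rates = subscribers_needed_to_cover(50_000, 200, cost_fraction=0.5)
print(f"{at_published_rates:.0f} subscribers at published rates, "
      f"{at_half_rates:.0f} at half")
```

At published rates one user eats two hundred and fifty subscriptions' worth of revenue, and at half the published rates it is the one hundred and twenty five quoted in the episode.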
Please note that this was 0:07:21.480 --> 0:07:24.160 launched a few months after Replit raised its prices, shifting 0:07:24.320 --> 0:07:27.240 to obfuscated effort-based pricing that would charge for the full 0:07:27.280 --> 0:07:29.320 scope of the agent's work. And if you're wondering what 0:07:29.400 --> 0:07:32.440 the fuck that means, so are their customers. Agent 3 0:07:32.440 --> 0:07:35.600 has been a disaster. Users found that tasks that previously 0:07:35.640 --> 0:07:38.280 cost a few dollars were spiraling into the hundreds of dollars, 0:07:38.480 --> 0:07:41.200 with The Register reporting one customer found themselves with a one 0:07:41.240 --> 0:07:43.440 thousand dollar bill after a week, and I quote them: 0:07:44.040 --> 0:07:46.880 I think it's just launch pricing adjustment. Some tasks on 0:07:46.960 --> 0:07:49.040 new apps ran over an hour and forty five minutes 0:07:49.040 --> 0:07:51.760 and only charged four to six dollars, but editing pre 0:07:51.840 --> 0:07:55.120 existing apps seems to cost most overall. I spent one 0:07:55.320 --> 0:07:58.240 K this week alone. And they told that to The Register, 0:07:58.320 --> 0:08:01.440 by the way. Another user complained that costs skyrocketed without 0:08:01.440 --> 0:08:04.280 any concrete results, and I quote The Register here: I 0:08:04.320 --> 0:08:06.480 typically spent between one hundred dollars and two hundred and 0:08:06.480 --> 0:08:08.640 fifty dollars a month. I blew through seventy dollars in 0:08:08.640 --> 0:08:11.280 a night at Agent 3 launch. And another Redditor wrote, 0:08:11.320 --> 0:08:14.800 alleging the new tool also performed some questionable actions: one 0:08:14.840 --> 0:08:17.760 prompt brute forced its way through authentication, redoing auth and 0:08:17.800 --> 0:08:20.520 hard resetting a user's password to what it wanted, to perform 0:08:20.640 --> 0:08:23.160 app testing on a form. 
The user wrote, I realize 0:08:23.200 --> 0:08:25.960 that's a little nonsensical, but long story short, it did 0:08:25.960 --> 0:08:28.120 a bunch of shit it wasn't asked to. As I 0:08:28.160 --> 0:08:31.720 previously reported, in late May, early June, both OpenAI 0:08:31.800 --> 0:08:34.280 and Anthropic cranked up the pricing on their enterprise customers, 0:08:34.360 --> 0:08:37.720 leading to Replit and Cursor both shifting their prices upward. This 0:08:37.800 --> 0:08:40.800 abuse has now trickled down to the customers. Replit has 0:08:40.800 --> 0:08:43.520 now released an update that lets you choose how autonomous you 0:08:43.520 --> 0:08:45.840 want Agent 3 to be, which is a tacit admission 0:08:45.840 --> 0:08:48.880 that you can't trust coding LLMs to build software. Replit's 0:08:48.960 --> 0:08:51.360 users are still pissed off, complaining that Replit is charging 0:08:51.360 --> 0:08:53.720 them for activity when the agent doesn't do anything, 0:08:53.960 --> 0:08:57.200 a consistent problem I've found across Redditors. While Reddit is 0:08:57.200 --> 0:09:00.000 not the full summation of all users of every company everywhere, 0:09:00.160 --> 0:09:02.880 it's a fairly good barometer of user sentiment, and man, 0:09:02.880 --> 0:09:08.079 are users pissed. And now here's why this is bad. Traditionally, 0:09:08.120 --> 0:09:10.920 Silicon Valley startups have relied upon the same model of 0:09:11.040 --> 0:09:13.120 grow really fast and burn a bunch of money, then 0:09:13.160 --> 0:09:16.600 turn the profit lever. AI does not have a profit 0:09:16.679 --> 0:09:19.200 lever, because the raw costs of providing access to AI 0:09:19.320 --> 0:09:22.280 models are so high, and they're only increasing, that the 0:09:22.280 --> 0:09:25.080 basic economics of how the tech industry sells software don't 0:09:25.080 --> 0:09:38.600 make sense. I'll reiterate something I wrote a few weeks ago. 
0:09:39.280 --> 0:09:43.240 A large language model user's infrastructural burden varies wildly between 0:09:43.320 --> 0:09:46.679 users and use cases. While somebody asking ChatGPT to 0:09:46.720 --> 0:09:48.880 summarize an email might not be much of a burden, 0:09:49.120 --> 0:09:52.240 somebody asking ChatGPT to review hundreds of pages of 0:09:52.240 --> 0:09:54.800 documents at once, a core feature of basically any twenty 0:09:54.800 --> 0:09:57.520 dollar a month subscription, could eat up to eight GPUs 0:09:57.559 --> 0:09:59.480 at once. To be very clear, a user that pays 0:09:59.520 --> 0:10:01.720 twenty dollars a month could run multiple queries like this 0:10:01.760 --> 0:10:04.800 a month, and there's not really a way to stop them. 0:10:04.960 --> 0:10:08.440 Unlike most software products, any errors in producing an output 0:10:08.440 --> 0:10:11.360 from a large language model have a significant opportunity cost. 0:10:11.640 --> 0:10:13.520 When a user doesn't like an output, or the model 0:10:13.520 --> 0:10:16.160 gets something wrong, which it's guaranteed to do, or the 0:10:16.240 --> 0:10:19.160 user realizes they forgot something, the model must make a 0:10:19.200 --> 0:10:22.800 further generation or generations, and even with caching, which Anthropic 0:10:22.880 --> 0:10:25.400 has added a toll to, there's a definitive cost attached 0:10:25.400 --> 0:10:28.679 to any mistake. Large language models are for the most 0:10:28.720 --> 0:10:31.720 part lacking in any definitive use cases, meaning that every 0:10:31.800 --> 0:10:33.719 user is, even with an idea of what they want 0:10:33.760 --> 0:10:37.040 to do, experimenting with every input and output. In doing so, 0:10:37.080 --> 0:10:39.480 they create the opportunity to burn more tokens, which in 0:10:39.480 --> 0:10:42.120 turn creates an infrastructural burn on GPUs, which cost a 0:10:42.120 --> 0:10:44.800 lot of money to run. 
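To put a rough shape on the point that every mistake has a cost attached, here is a toy Python model. Everything in it is an assumption made for illustration: it supposes each generation costs the same amount and is rejected independently with a fixed probability, so the expected number of generations per accepted output is 1 / (1 - p). Real usage is messier, and the numbers are hypothetical.

```python
def expected_cost_per_accepted_output(cost_per_generation_usd: float,
                                      retry_probability: float) -> float:
    """Expected spend until the user accepts an output.

    Toy model: each attempt is rejected with probability retry_probability,
    independently, so attempts follow a geometric distribution with
    mean 1 / (1 - retry_probability).
    """
    if not 0.0 <= retry_probability < 1.0:
        raise ValueError("retry_probability must be in [0, 1)")
    expected_generations = 1.0 / (1.0 - retry_probability)
    return cost_per_generation_usd * expected_generations


# Hypothetical numbers: ten cents per generation, half of outputs rejected.
print(expected_cost_per_accepted_output(0.10, 0.5))
```

Under these made-up numbers the effective cost per usable answer doubles, and it climbs quickly as the rejection rate approaches one.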
The more specific the output, 0:10:44.880 --> 0:10:47.240 the more opportunities there are for a monstrous token burn, 0:10:47.280 --> 0:10:50.440 and I'm specifically thinking about coding with LLMs. The 0:10:50.480 --> 0:10:53.600 token-heavy nature of generating code means that any mistakes, 0:10:53.640 --> 0:10:57.600 suboptimal generations, or straight-up errors will guarantee further token burn. 0:10:58.280 --> 0:11:01.319 Even efforts to reduce compute costs by, for example, pushing 0:11:01.360 --> 0:11:03.679 free users or those on cheap plans to smaller or 0:11:03.760 --> 0:11:07.440 less intensive models have dubious efficacy. As I talked about 0:11:07.480 --> 0:11:09.679 in a previous episode, OpenAI's split model in 0:11:09.720 --> 0:11:12.959 the GPT-5 version of ChatGPT requires vast amounts of 0:11:13.000 --> 0:11:15.760 additional compute in order to route the user's request to 0:11:15.880 --> 0:11:18.960 the appropriate model, with simpler requests going to smaller models 0:11:19.000 --> 0:11:21.760 and more complex ones being shifted to reasoning models, and 0:11:21.800 --> 0:11:23.840 it makes it impossible to cache part of the input 0:11:23.880 --> 0:11:26.680 as a result. It's not really clear whether it's saving 0:11:26.679 --> 0:11:29.120 OpenAI any money, and indeed, I'd suggest it 0:11:29.200 --> 0:11:32.200 might be costing them more. In simpler terms, it's very, 0:11:32.280 --> 0:11:34.920 very, very difficult to imagine what one user, free or 0:11:34.960 --> 0:11:37.480 otherwise, might cost, and thus it's hard to charge them 0:11:37.679 --> 0:11:39.840 anything on a monthly basis or tell them what a 0:11:39.840 --> 0:11:42.720 service might actually cost them on average. And this is 0:11:42.760 --> 0:11:47.200 a huge, huge problem with AI coding environments. But let's 0:11:47.200 --> 0:11:50.640 talk about Claude Code again, Anthropic's code generation tool. 
0:11:50.760 --> 0:11:53.480 According to The Information, Claude Code was driving nearly four 0:11:53.559 --> 0:11:56.719 hundred million dollars in annualized revenue, roughly doubling from a 0:11:56.720 --> 0:11:59.360 few weeks ago, on July thirty first, twenty twenty five. 0:12:00.080 --> 0:12:02.880 That annualized revenue works out to about thirty three million 0:12:02.920 --> 0:12:05.280 dollars a month in revenue for a company that predicts 0:12:05.280 --> 0:12:07.679 it will make at least four hundred and sixteen million 0:12:07.720 --> 0:12:09.280 dollars a month by the end of the year, and 0:12:09.320 --> 0:12:11.840 for a product that has become, for a time, the 0:12:11.880 --> 0:12:14.280 most popular coding environment in the world, from the second 0:12:14.360 --> 0:12:17.680 largest and best funded AI company in the world. Is 0:12:17.720 --> 0:12:20.760 that it? Is that fucking it? Is that all that's 0:12:20.760 --> 0:12:23.960 happening here? Thirty three million dollars, all of which is 0:12:24.000 --> 0:12:27.800 unprofitable, after it felt, at least based on social media 0:12:27.920 --> 0:12:30.840 chatter and discussions with multiple different engineers, that Claude Code 0:12:30.840 --> 0:12:33.760 had become ubiquitous with anything to do with LLMs and coding. 0:12:34.720 --> 0:12:37.280 To be clear, Anthropic's Sonnet and Opus models 0:12:37.320 --> 0:12:39.560 are consistently some of the most popular for programming on 0:12:39.600 --> 0:12:42.720 OpenRouter, an aggregator of LLM usage, and Anthropic has 0:12:42.760 --> 0:12:45.120 been consistently named as the best at coding. Whether or 0:12:45.120 --> 0:12:47.959 not I feel that way is irrelevant. Some bright spark 0:12:48.000 --> 0:12:49.720 out there is going to say that Microsoft's GitHub 0:12:49.760 --> 0:12:52.320 Copilot has one point eight million paying subscribers, and guess 0:12:52.320 --> 0:12:55.360 what, that's true. In fact, I reported it. 
Here's another 0:12:55.440 --> 0:12:58.160 fun fact. The Wall Street Journal reported that Microsoft loses 0:12:58.200 --> 0:13:00.440 on average twenty dollars a month per user, with some 0:13:00.520 --> 0:13:03.120 users costing the company as much as eighty bucks. And 0:13:03.160 --> 0:13:06.600 that's for the most popular product. But wait, wait, wait, wait, 0:13:07.320 --> 0:13:11.479 hold up, wait, I read some shit in the newspaper. 0:13:11.800 --> 0:13:15.480 Aren't these LLM code generators replacing actual human engineers? And thus, 0:13:15.640 --> 0:13:17.480 even if they cost way more than twenty dollars, one 0:13:17.520 --> 0:13:19.440 hundred dollars or two hundred dollars a month, they're still 0:13:19.440 --> 0:13:22.800 worth it, right? They're replacing an entire engineer. Oh, my 0:13:22.880 --> 0:13:25.520 sweet summer child. If you believe The New York Times 0:13:25.600 --> 0:13:28.240 or other outlets that simply copy and paste whatever Anthropic 0:13:28.280 --> 0:13:31.079 CEO Dario Amodei says, you'd think that the reason 0:13:31.080 --> 0:13:33.200 that software engineers are having trouble finding work is because 0:13:33.200 --> 0:13:37.120 their jobs are being replaced by AI. This grotesque, manipulative, abusive, 0:13:37.120 --> 0:13:40.000 and offensive lie has been propagated through the entire business 0:13:40.080 --> 0:13:42.360 and tech media without anybody sitting down and asking whether 0:13:42.360 --> 0:13:44.560 it's true, or even getting a good understanding of what 0:13:44.600 --> 0:13:47.880 it is that LLMs can actually do with code. Members 0:13:47.880 --> 0:13:51.440 of the media, I am begging you: stop, stop doing this. 0:13:51.559 --> 0:13:56.480 Stop publishing these fucking headlines. You're embarrassing yourself. 
Every asshole 0:13:56.559 --> 0:13:58.400 is willing to give a quote saying that coding is 0:13:58.440 --> 0:14:00.600 dead, and every executive is willing to 0:14:00.600 --> 0:14:03.160 burp out some nonsense about replacing all of their engineers. 0:14:03.200 --> 0:14:05.199 But I'm fucking begging you to either use these things 0:14:05.280 --> 0:14:08.040 yourself or speak to people that do. I am not 0:14:08.080 --> 0:14:10.800 a coder. I cannot write or read code. Nevertheless, I'm 0:14:10.840 --> 0:14:13.520 capable of learning, and I've spoken to numerous software engineers 0:14:13.520 --> 0:14:15.880 in the last few months, and basically I've reached a 0:14:15.920 --> 0:14:20.480 consensus of: this is kind of useful sometimes. However, one time, 0:14:20.520 --> 0:14:24.440 a very silly man with an increasingly squeaky voice said 0:14:24.440 --> 0:14:26.600 that I don't speak to people who use AI tools. 0:14:26.600 --> 0:14:29.440 So I went and spoke to three notable, experienced software 0:14:29.480 --> 0:14:31.600 engineers and asked them to give me the straight truth 0:14:31.640 --> 0:14:34.560 about what coding LLMs can do. Now, for the purposes 0:14:34.600 --> 0:14:36.400 of brevity, I'm going to use select quotes from what 0:14:36.440 --> 0:14:37.920 these people said. But if you want to read the 0:14:37.920 --> 0:14:40.560 whole thing, you can check out the newsletter. First, I'm 0:14:40.560 --> 0:14:42.160 going to read what Carl Brown of the Internet of 0:14:42.200 --> 0:14:44.160 Bugs said, and I had him on the show a 0:14:44.200 --> 0:14:48.040 few months back. He's fantastic. So: most of the advancements 0:14:48.040 --> 0:14:50.760 in programming languages, technique and craft in the last four 0:14:50.840 --> 0:14:53.080 years have been designing safer and better ways of tying 0:14:53.080 --> 0:14:56.240 these blocks together to create larger and larger programs with 0:14:56.320 --> 0:15:00.000 more complexity and functionality. 
Humans use these advancements to arrange 0:15:00.120 --> 0:15:02.720 these blocks in logical abstraction layers so we can fit 0:15:02.720 --> 0:15:05.160 an understanding of the layers' interconnections in our heads as 0:15:05.160 --> 0:15:08.640 we work, diving into blocks temporarily as needed. This is 0:15:08.680 --> 0:15:11.360 where AIs fall down. The amount of context required to 0:15:11.400 --> 0:15:14.480 hold the interconnections between these blocks quickly grows beyond the 0:15:14.480 --> 0:15:17.760 AI's effective short-term memory, in practice much smaller than 0:15:17.760 --> 0:15:21.000 its advertised context window size, and the AIs lack the 0:15:21.040 --> 0:15:23.880 ability to reason about the abstractions as we do. This 0:15:24.000 --> 0:15:27.840 leads to real-world code that's illogically layered, hard to understand, debug, 0:15:27.880 --> 0:15:32.440 and maintain. Carl also said: code generation AIs, from an 0:15:32.480 --> 0:15:35.600 industry standpoint, are roughly the equivalent of a slightly below 0:15:35.600 --> 0:15:38.640 average computer science graduate fresh out of school without any 0:15:38.680 --> 0:15:41.600 real-world experience, only ever having written programs to be 0:15:41.600 --> 0:15:44.480 printed and graded. That's bad because, as he pointed out, 0:15:44.520 --> 0:15:47.280 whereas LLMs can't get past this summer intern stage, 0:15:47.320 --> 0:15:50.320 actual humans get better, and if we're replacing the bottom 0:15:50.360 --> 0:15:52.160 rung of the labor market, there won't be any mid 0:15:52.240 --> 0:15:55.080 level or senior developers later down the line. Next, I 0:15:55.120 --> 0:15:57.560 asked Nik Suresh of I Will Fucking Piledrive You 0:15:57.640 --> 0:16:01.240 If You Mention AI Again what he thought. LLMs, he said, 0:16:01.280 --> 0:16:03.600 will sometimes solve a thorny problem for me in a 0:16:03.640 --> 0:16:06.320 few seconds, saving me some brain power. 
But in practice, 0:16:06.320 --> 0:16:08.960 the effort of articulating so much of the design work 0:16:08.960 --> 0:16:11.560 in plain English and hoping the LLM emits code that 0:16:11.600 --> 0:16:15.120 I find acceptable is frequently more work than just writing 0:16:15.160 --> 0:16:18.480 the code. For most problems, the hardest part is the thinking, 0:16:18.640 --> 0:16:21.560 and LLMs don't make that part any easier. I 0:16:21.600 --> 0:16:24.440 also talked to Colton Voege of No, AI Is Not Making 0:16:24.480 --> 0:16:27.680 Engineers 10x as Productive, who we also had on 0:16:27.720 --> 0:16:30.760 the show recently, and he said this: LLMs often function 0:16:30.920 --> 0:16:32.680 like a fresh summer intern. They're good at solving the 0:16:32.680 --> 0:16:35.080 straightforward problems that coders learn about in school, but 0:16:35.160 --> 0:16:37.800 they are unworldly. They do not understand how to bring 0:16:37.840 --> 0:16:40.520 lots of solutions to the small, straightforward problems together into 0:16:40.560 --> 0:16:42.920 a larger whole. They lack the experience to be wholly 0:16:42.920 --> 0:16:44.720 trusted, and trust is the most important thing you 0:16:44.760 --> 0:16:48.360 need to fully delegate coding tasks. In simpler terms, LLMs 0:16:48.360 --> 0:16:50.880 are capable of writing code, but can't do software engineering, 0:16:50.880 --> 0:16:54.400 because software engineering is the process of understanding, maintaining and 0:16:54.440 --> 0:16:58.080 executing code to produce functional software, and LLMs do not learn, 0:16:58.160 --> 0:17:01.280 cannot adapt, and, to paraphrase something Carl Brown said to me, 0:17:01.640 --> 0:17:04.439 break down the more of your code and variables you 0:17:04.480 --> 0:17:06.840 ask them to look at at once, so you can't 0:17:06.880 --> 0:17:09.600 replace a software engineer with them. 
If you are printing 0:17:09.640 --> 0:17:12.080 this in a media outlet and have heard this sentence, 0:17:12.280 --> 0:17:15.840 you are fucking up. You really are fucking up. I 0:17:15.920 --> 0:17:18.040 really need members of the media to hear this. You 0:17:18.080 --> 0:17:19.879 need to change. You need to change on this one. 0:17:19.960 --> 0:17:38.000 You are doing software engineers dirty. Look, and I understand 0:17:38.000 --> 0:17:40.680 why, too. It's very easy to believe that software engineering 0:17:40.720 --> 0:17:42.679 is just writing code, but the reality is that software 0:17:42.680 --> 0:17:46.480 engineers maintain software, which includes writing and analyzing code, amongst 0:17:46.480 --> 0:17:49.440 a vast array of different personalities and programs and problems. 0:17:50.040 --> 0:17:53.600 Good software engineering harkens back to Brian Merchant's interviews with translators. 0:17:53.680 --> 0:17:55.959 While some may believe that translators simply tell you what 0:17:55.960 --> 0:17:59.840 words mean, true translation is communicating the meaning of a sentence, 0:18:00.119 --> 0:18:03.800 which is cultural, contextual, regional, and personal, and often requires 0:18:03.840 --> 0:18:07.240 the exercise of creativity and novel thinking. And on top 0:18:07.320 --> 0:18:10.199 of that, while translation is the production of words, you 0:18:10.240 --> 0:18:12.200 can't just take code and look at it. You actually 0:18:12.240 --> 0:18:15.440 need to know how code works and functions and why it functions 0:18:15.480 --> 0:18:18.640 in that way. Using an LLM, you'll never know, because 0:18:18.680 --> 0:18:21.760 the LLM doesn't know anything either. Now, my editor Matt 0:18:21.800 --> 0:18:23.960 Hughes gave an example of this in his newsletter, which 0:18:24.000 --> 0:18:26.399 I think I'll paraphrase. 
He used to live in France 0:18:26.400 --> 0:18:28.680 and the French-speaking part of Switzerland, and sometimes he 0:18:28.720 --> 0:18:31.159 will read French translations of books to see how awkward 0:18:31.240 --> 0:18:34.399 bits of prose are translated. Doing those awkward bits requires 0:18:34.400 --> 0:18:37.000 a bit of creative thinking. And I quote: take Harry 0:18:37.040 --> 0:18:40.960 Potter. In French, Hogwarts is Poudlard, which translates into bacon lice. 0:18:41.359 --> 0:18:43.680 Why did they go with that instead of a literal translation 0:18:43.720 --> 0:18:47.439 of Hogwarts, which would be Verrues Porc? I'm sorry to 0:18:47.440 --> 0:18:50.000 anyone who can actually read languages. No idea, but I'd 0:18:50.000 --> 0:18:51.359 assume it is something to do with the fact that 0:18:51.400 --> 0:18:55.120 Poudlard sounds a lot better than Verrues Porc, 0:18:55.520 --> 0:18:58.960 and both of them I can say flawlessly. Someone had 0:18:59.000 --> 0:19:01.520 to actually think about how to translate that one idea. They 0:19:01.520 --> 0:19:04.040 had to exercise creativity, which is something that an AI 0:19:04.119 --> 0:19:08.040 is inherently incapable of doing. Similarly, coding is not 0:19:08.080 --> 0:19:10.199 just a series of texts that programs a computer, 0:19:10.280 --> 0:19:12.800 but a series of interconnected characters that refers to other 0:19:12.880 --> 0:19:15.959 software in other places that must also function now and 0:19:16.040 --> 0:19:18.040 explain on some level to someone who has never ever 0:19:18.080 --> 0:19:20.320 seen the code before why it was done in this way. 0:19:20.880 --> 0:19:23.000 This is, by the way, why we're still yet to 0:19:23.000 --> 0:19:25.800 get any tangible proof that AI is replacing software engineers: 0:19:25.840 --> 0:19:30.359 because it isn't replacing software engineers. And now we need 0:19:30.400 --> 0:19:33.080 to understand why this is so existentially bad for generative AI. 
0:19:33.880 --> 0:19:36.600 Of all the fields supposedly at risk from AI disruption, 0:19:36.720 --> 0:19:39.440 coding feels, or felt, the most tangible, if only because 0:19:39.480 --> 0:19:42.040 the answer to can you write code with LLMs wasn't 0:19:42.080 --> 0:19:45.280 an immediate, unilateral no. The media has also been 0:19:45.359 --> 0:19:48.000 quick to suggest that AI writes software, which is true 0:19:48.000 --> 0:19:51.440 in the same way that ChatGPT writes novels. In reality, 0:19:51.560 --> 0:19:54.919 LLMs can generate code and do some sort of 0:19:55.000 --> 0:19:58.720 software engineering adjacent tasks, but, like all large language models, 0:19:58.760 --> 0:20:01.359 break down and go totally insane, hallucinating more and 0:20:01.400 --> 0:20:03.920 more as the tasks get more complex, and software engineering 0:20:03.960 --> 0:20:07.520 is extremely complex. Even software engineers who can read code 0:20:07.520 --> 0:20:09.359