[00:00:02] Speaker 1: Cool Zone Media.
[00:00:05] Speaker 2: Hi, my name's Ed Zitron, and welcome to Better Offline. This is also Jackass. So you've just had a cheery two-part chuckle-fest about how generative AI might tank the markets and our economy, so I'm going to give you a lighter one: an episode about GPT-5, a model from OpenAI, and why just under three years of hype have led to the software equivalent of the launch of St. Anger, except every time Lars hit the snare drum, it cost them fifty-five thousand dollars. Now, if we look at the positive reviews, we see takes ranging from Simon Willison's tempered remark that GPT-5 is "just good at stuff" to SemiAnalysis's completely insane statement that GPT-5 is "setting the stage for ad monetization and the ChatGPT super app," in a piece that makes several assertions about how the router that underpins GPT-5 is somehow the secret way that OpenAI will inject ads, which is just distinctly silly.
[00:01:10] Speaker 2: I'll get into this in the episode a little bit, but with everything you're going to hear, you're going to realize that this is just someone saying stuff. It took four bylines to do that shit, too. I'm also British, so I'm gonna say "rooter." I might say "rowter" as well, because I've been here a while. Make fun of my voice if you really must. But with that out of the way, here's a quote from SemiAnalysis's coverage: "Before the router, there was no way for a query to be distinguished, and after the router, the first low-value query could be routed to a GPT-5 mini model that can answer with zero tool calls and no reasoning. This likely means serving this user is approaching the cost of a search query." This does not make any sense. None of this makes sense; it's just a bunch of assumptions. Why would this be the case? The article also makes a lot of claims about the "value" of a question and how ChatGPT could (I am serious) "a... agentically" reach out to lawyers. I'm not going to edit that out, because "agentically" is not a fun word to say.
[00:02:05] Speaker 2: It is just complete nonsense, and in fact, I'm not sure this piece reflects how GPT-5 even works at all. Again, quoting it: "The router serves multiple purposes on both the cost and performance side. On the cost side, routing users to mini versions of each model allows OpenAI to service users with lower costs." To be fair to SemiAnalysis, it's not as if OpenAI gave them much help. OpenAI's official writings about the router aren't exactly filled with details, talking in glowing terms about what it does, but not how. Here's what they say: "ChatGPT's real-time router quickly decides which model to use based on conversation type, complexity, tool needs, and your explicit intent (for example, if you say 'think hard about this' in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time. Once usage limits are reached, a mini version of each model handles remaining queries. In the near future, we plan to integrate these capabilities into a single model."
[00:03:05] Speaker 2: And that last bit really doesn't make sense, but in any case, the launch of GPT-5 has been very, very weird. At first, some people seemed really happy about it. Chief among them was software YouTuber Theo Browne, who has over four hundred and sixty-eight thousand subscribers (he's also known as t3.gg), who said:
[00:03:20] Speaker 1: I didn't know it could get this good. This was kind of the, like, "oh fuck" moment for me in a lot of ways, and I've had to fight, like, a slow spiral into insanity. It's a really, really good model.
[00:03:39] Speaker 2: He finished by saying:
[00:03:41] Speaker 1: And keep an eye on your job, because I don't know what this means for us long term.
[00:03:45] Speaker 2: Pretty crazy, right? Comments on the video included people saying things like "If OpenAI is holding you hostage, blink twice," and yes, that is a verbatim quote. Another said, "This dude is everything wrong in IT today." Another said, "This video was sponsored by OpenAI." Another said, "GPT-5 failed every test project I gave it today. It's a lie in my experience. Maybe they haven't ramped up the GPUs."
[00:04:08] Speaker 2: From what I can tell, Theo Browne played with GPT-5 in OpenAI's offices and did all the benchmarking there. OpenAI, by the way... fucking how? Come on. You can't benchmark in their offices. Anyway, OpenAI's API-based access to GPT-5 models (you know, the thing you use if you want to integrate GPT into your app) does not route them, by the way, nor does OpenAI offer access to its router or any associated models. Important detail. I just want you to know that, because we need to make that very clear. Now, a week later, Theo Browne would put out another video called "I was wrong about GPT-5," which he would open by saying:
[00:04:41] Speaker 1: So first and foremost, I want to make sure it is very, very clear that the experience that you probably are having with ChatGPT and GPT-5 right now is not the experience that I had when I was first testing it.
[00:04:53] Speaker 2: Browne goes on to explain that he was not paid by OpenAI at all, that he was sincerely impressed by the company and GPT-5, that he'd actually spent over twenty-five thousand dollars on inference testing it on his own company's software, and indeed that he turned down a grand appearance fee. Sorry, that's a very British thing: I mean a one-thousand-dollar appearance fee, not just a really nice one. Browne claims he asked OpenAI to try it out, and after they declined to let him test it early on his own, he was invited to try it on camera with a small group of other people at OpenAI's offices, where they'd film his reactions. He said that the API was incredible, but that it has become apparent that the models he was using in the video were not the same as those released to the public.
[00:05:34] Speaker 2: In a post on August thirteenth on X, the everything app, he said that GPT-5 was nowhere near as good in Cursor as it was when he was using it a few weeks ago, complaining that things that worked while demoing it at OpenAI no longer did, and adding that somebody else on Twitter had said they'd had a similarly great experience with GPT-5 at launch that has since decayed. It isn't completely clear what happened here, but I'm going to guess that OpenAI showed Theo Browne and others in their offices some sort of heavily modded version of the model that burns significantly more compute to provide its outputs, though I'm also very suspicious of how significant the difference is here. Browne's video attempts to show the difference between the generations he received from the model when it was good and when it was bad (I'll include a link to it in the episode notes), but if I'm honest, they look pretty similar, in that they're kind of mediocre. I'm not saying that as a hater, by the way. They just kind of look like shit.
[00:06:26] Speaker 2: They're just kind of okay, like, shit. They look like regular fucking generated websites. They don't look special. The good one is fine, and the bad one has weird gradients on it. This whole thing sucks, though, and was a clear setup by OpenAI to overstate the abilities of GPT-5, one that fell apart at the lightest brush with reality. I imagine their assumption was that Browne would post the glossy video and then walk away, and I give Theo some credit for straight up stating he was misled. This was a desperate move, and one that blew up in OpenAI's face, along with the rest of the GPT-5 launch. People hate the model, customers are mad at OpenAI for taking away models like GPT-4o and have remained mad even with their return, and the ChatGPT subreddit is almost entirely people complaining about how ineffective the new version is and how even GPT-4o is not the same. They got game of brain, baby. As I said in last week's monologue:
[00:07:16] Speaker 2: I believe OpenAI has grown a fandom rather than any kind of sustainable product-market fit, and they're now suffering fandom-like hate with every minor change they make in an attempt to push GPT-5 further, further aggravating people who barely understand why they use the product to begin with. Yet at the center of the anger lay the reason for GPT-5's launch: the belief that this was somehow a cost-cutting measure, where OpenAI had added a router to ChatGPT as a means of sending certain requests to cheaper models to save money. But when I hear "router," I hear latency, and I never for even a second believed that this would somehow be cheaper to run. It didn't make sense. I'm a curious little critter, so I went and found out how ChatGPT-5 actually works, and unlike the following incredible products that you should buy, it's actually kind of a big piece of shit.
[00:08:12] Speaker 2: And we're back, and from here on out, I will define two things.
[00:08:15] Speaker 2: "GPT-5" refers to the model and its associated mini and nano models, and "ChatGPT-5" refers to the current state of ChatGPT, which features Auto, Fast, Thinking, and Thinking Mini model selections. You can also see legacy models, but that's not what we're talking about today, and that's also only for a little bit. It's a distinction I have to make, and make early, because the two things are different: they work in different ways, and ChatGPT-5's structure introduces a bunch of trade-offs and downsides that, as I'll discuss later, make this whole thing even more wasteful. In discussions with a source at an infrastructure provider familiar with the architecture, it appears that ChatGPT-5 is in fact potentially more expensive to run than previous models, and, due to the complex and chaotic nature of said architecture, can at times burn upwards of double the tokens per query. Tokens, for those who don't know, are basically chunks of text that AI models do stuff with. I'm simplifying this. Do not email me to correct some minor thing; nobody cares.
[00:09:11] Speaker 2: A sentence like "the quick brown fox jumps over the lazy dog" will be broken into lots of smaller chunks, roughly four characters each on average. There are different kinds of tokens, and they're priced differently. An input token refers to the data you send to the model when you ask a question. Output tokens measure the size of its response, with bigger responses requiring more tokens. The more tokens you burn per query, the more expensive it is to run that query, so the fact that ChatGPT-5 can, in certain circumstances, burn twice the number of tokens per query means those questions cost more. ChatGPT-5 is also significantly more convoluted, plagued by latency issues, and more compute-intensive, thanks to OpenAI's new, smarter, more efficient model routing system. In simpler terms, every user prompt on ChatGPT-5, whether it's in Auto, Fast, Thinking, or Thinking Mini, starts by putting the user's prompt before the static prompt. I don't want to lose you here. This is important.
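A minimal Python sketch of the token arithmetic described above. The per-million-token prices are hypothetical placeholders, not OpenAI's actual rates, and `estimate_tokens` uses the rough four-characters-per-token rule of thumb rather than a real tokenizer.

```python
# Rough illustration of how per-query cost scales with token counts.
# PRICES ARE HYPOTHETICAL placeholders, not any provider's real rates.
INPUT_PRICE_PER_M = 1.25    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 10.00  # dollars per million output tokens (assumed)


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly four characters per token (not a real tokenizer)."""
    return max(1, round(len(text) / 4))


def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one query, billing input and output tokens at separate rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


sentence = "the quick brown fox jumps over the lazy dog"
print(estimate_tokens(sentence))  # 43 characters / 4, rounded: 11 tokens

normal = query_cost(input_tokens=2_000, output_tokens=500)
doubled = query_cost(input_tokens=4_000, output_tokens=1_000)
print(f"normal: ${normal:.6f}, doubled tokens: ${doubled:.6f}")
print(doubled / normal)  # cost is linear in tokens, so doubling tokens doubles cost
```

Because both input and output are billed per token, any architecture that doubles the tokens a query consumes doubles that query's cost, whatever the actual prices are.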
[00:10:01] Speaker 2: A static prompt is the invisible set of instructions OpenAI gives to ChatGPT, the models themselves, and the tools associated with them, telling them how to operate. Instructions like "You are ChatGPT, a large language model," "You're a helpful chatbot," "Do not threaten them with a knife," and so on and so forth. These static prompts are different for each model you use. A reasoning model will have a different instruction set from a more chat-focused one, with things like "Think harder about a particular problem before giving an answer," "Break down problems into component parts," or "When you get a certain kind of request, like a coding question, query a coding tool." That kind of thing. A user prompt is exactly what it sounds like: the thing that a user wants the AI model to do. The new order in ChatGPT-5 becomes an issue when you use multiple different models in the same conversation, because the router, the thing that selects the right model for the request, has to look at the user prompt.
[00:10:47] Speaker 2: It can't consider the static instructions first, because they may be different based on what the user asked. In fact, the order has to be flipped for the whole thing to work. Simpler, previous versions of ChatGPT would take the static prompt and then invisibly append the user prompt onto it. That static prompt would typically be cached, massively reducing the amount of compute the model needs to perform a task. ChatGPT-5 cannot do this. Every time you use ChatGPT-5, every single thing you say or do can cause it to do something different. Attach a file? Might need a different model. Ask it to look into something and be detailed? Might trigger a reasoning model, or a different depth of reasoning. Ask a question in a weird way? Sorry, the router is going to need to send you to a different model entirely, each time coming up with new instructions based on its subtle interpretation of what you asked. Every single thing that can happen when you ask ChatGPT to do something may trigger the router to change model.
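The caching point above can be illustrated with a toy prefix cache in Python. This is an assumption-laden sketch, not OpenAI's serving code: `ToyPrefixCache`, `fresh_tokens` and the placeholder `STATIC` prompt are hypothetical, tokens are plain list items, and a request only reuses cached work for tokens that exactly match the start of an earlier request.

```python
from typing import List


class ToyPrefixCache:
    """Hypothetical prefix cache: work is reused only for a matching leading run of tokens."""

    def __init__(self) -> None:
        self.seen: List[List[str]] = []

    def process(self, tokens: List[str]) -> int:
        """Record a request and return how many of its tokens must be computed fresh."""
        best = 0
        for prev in self.seen:
            match = 0
            for a, b in zip(prev, tokens):
                if a != b:
                    break
                match += 1
            best = max(best, match)
        self.seen.append(tokens)
        return len(tokens) - best


# Placeholder 400-token static prompt (stands in for hidden instructions).
STATIC = [f"sys{i}" for i in range(400)]


def fresh_tokens(order: str, user_prompts: List[List[str]]) -> int:
    """Total uncached tokens across requests, with the static prompt first or last."""
    cache = ToyPrefixCache()
    total = 0
    for user in user_prompts:
        request = STATIC + user if order == "static_first" else user + STATIC
        total += cache.process(request)
    return total


prompts = [["what", "is", "the", "capital", "of", "France"], ["write", "a", "haiku"]]
print(fresh_tokens("static_first", prompts))  # 406 + 3 = 409
print(fresh_tokens("user_first", prompts))    # 406 + 403 = 809
```

With the static prompt first, only the second user prompt's three tokens are new; with the user prompt first, the shared instructions no longer sit at the start of the request, so almost the whole second request is recomputed.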
[00:11:39] Speaker 2: Or it may request a new tool, and each time it does so, it requires a completely fresh static prompt, regardless of whether you select Auto, Thinking, Fast, or any other option on ChatGPT. This in turn requires it to expend more compute, with queries consuming more tokens compared to previous versions. It's like you started a job, and every time you do a task (write an email, make a cup of coffee, attend a meeting, email someone with a threat), your workplace requires you to complete the entire mandatory onboarding training first. Want to update that spreadsheet? Not before you brush up on your anti-bribery legislation first, you prick. As a result, ChatGPT-5 may be smart, but it doesn't really seem efficient. Now, to play devil's advocate, OpenAI likely added the routing model as a means of creating a more sophisticated output for a user and, I imagine, with the intention of saving costs. Then again, this might just be the thing it had to ship.
[00:12:30] Speaker 2: After all, GPT-5 was meant to be the next great leap in AI, and the pressure was on to get it out the door. By creating a system that depends on an external routing model (likely another LLM, in this case), OpenAI has removed the ability to cache the hidden instructions that dictate how the models generate answers in ChatGPT, creating massive infrastructural overhead. Worse still, this happens with every single turn, as in every message, on ChatGPT-5, regardless of the model you choose, creating endless infrastructural baggage with no real way out that only compounds based on how complex the user's queries get or how much they change. The queries could be simple but just go in a different direction every time. Could OpenAI make a better router? Sure. Does it have a good one today? No. Every time you message ChatGPT-5, it has the potential to change model or tooling based on its own whims, each time requiring a fresh static prompt, and short of totally reworking the architecture of ChatGPT-5, there's no way to change this.
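A rough sketch of how that overhead compounds over a conversation. The model names and static-prompt sizes in `STATIC_PROMPT_TOKENS` are invented for illustration, and the reuse rule (a static prompt is reusable only if the previous turn used the same model) is an assumption; passing `reuse_on_same_model=False` models the worst case described in the episode, where every turn needs a fresh static prompt.

```python
from typing import Dict, List, Optional

# Hypothetical static-prompt sizes in tokens; real values are not public.
STATIC_PROMPT_TOKENS: Dict[str, int] = {"fast": 1_500, "thinking": 3_000, "thinking-mini": 2_000}


def uncached_static_tokens(route: List[str], reuse_on_same_model: bool = True) -> int:
    """Count static-prompt tokens recomputed over a conversation.

    Assumed rule: a static prompt can be served from cache only if the previous
    turn used the same model (and so the same prompt). With reuse disabled,
    every turn pays for its model's full static prompt.
    """
    total = 0
    previous: Optional[str] = None
    for model in route:
        if not reuse_on_same_model or model != previous:
            total += STATIC_PROMPT_TOKENS[model]
        previous = model
    return total


steady = ["fast"] * 6
switching = ["fast", "thinking", "fast", "thinking-mini", "thinking", "fast"]
print(uncached_static_tokens(steady))                             # 1500: paid once, then reused
print(uncached_static_tokens(switching))                          # 12500: every turn switches model
print(uncached_static_tokens(steady, reuse_on_same_model=False))  # 9000: worst case, every turn pays
```

Under these made-up numbers, a six-turn chat that hops between models recomputes more than eight times as many static-prompt tokens as one that stays put, and the gap grows with every additional switch.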
[00:13:25] Speaker 2: And if it's an LLM choosing the model, I don't know, maybe it hallucinates? Just a guess. It doesn't even need to be the case that a user asks ChatGPT-5 to think: based on my tests with GPT-5, sometimes you can just ask it a straightforward question and it will think about it for no apparent reason. OpenAI has created a product with latency issues and an overwhelmingly convoluted routing system that's already straining capacity, to the point that the GPT-5 announcement feels like OpenAI is walking away from its API entirely. The API, as a reminder, is the thing that people use to incorporate OpenAI's models into their apps, while also running said models on the infrastructure OpenAI rents from Microsoft and CoreWeave and, at some point, Oracle as well. And this API thing is really weird, by the way, because these are new models, but OpenAI's really not talking about the models themselves that much.
[00:14:11] Speaker 2: Unlike the GPT-4o announcement, which mentions the API in the first paragraph, the GPT-5 announcement has no reference to it and only has a single reference to developers at all, when talking about coding. Sam Altman has already hinted that he intends to deprioritize any new API demand, though I imagine OpenAI will let in anyone who'll pay for priority processing, which is essentially OpenAI's way to require minimum commitments and extra payments from API customers just so they never feel the bite of compute shortages and throttling, which OpenAI absolutely will inflict on people who don't pay. ChatGPT-5 feels like the ultimate comeuppance for a company that has never been forced to build a product, choosing instead to bolt increasingly complex tools onto the side of models in the hope that a product would magically appear. Now, each and every feature of ChatGPT burns more money than it ever did before. ChatGPT-5 feels like a product that was rushed to market by a desperate company that had to get something out the door. And in simpler terms, it's actually really funny.
[00:15:04] Speaker 2: When I worked this out, I chuckled. I chuckled vigorously. This is just a case where OpenAI has given ChatGPT a middle manager. But now I'm giving you the chance to open up your hearts and do something better. Open up your wallets too, and send money to the companies that follow. Here: behold my advertisements.
[00:15:38] Speaker 2: And we're back. Like every great middle manager, ChatGPT-5's router creates more work based on its own interpretation of what's going on, and since it's a separate large language model, I can't imagine it has a ton of training data available. If I had to guess (and this is a guess, by the way), OpenAI has done, and will do, a lot of fine-tuning and reinforcement learning to make it work. Though, to give it a little grace, this is a new thing that it's doing, and it's doing it at sort of a huge scale. The problems start, by the way, with the fact that ChatGPT-5 takes the user's initial prompt and then decides which model to use, unlike previous models, which sent your prompt directly to the model along with the static prompt, which was cached and came first.
That's an important feature in how 329 00:16:13,960 --> 00:16:17,080 Speaker 2: these models limit token burn. OpenAI starts with a router 330 00:16:17,160 --> 00:16:20,400 Speaker 2: model that takes what you ask 331 00:16:20,480 --> 00:16:22,560 Speaker 2: ChatGPT and tags it based on what kind of 332 00:16:22,640 --> 00:16:25,400 Speaker 2: thing your question might need. The thing might be a tool, 333 00:16:25,480 --> 00:16:27,400 Speaker 2: such as whether it has to do a web search 334 00:16:27,480 --> 00:16:30,360 Speaker 2: to spit out the thing at the end, a reasoning model, 335 00:16:30,520 --> 00:16:32,360 Speaker 2: whether it needs to use a coding language, and so 336 00:16:32,520 --> 00:16:35,760 Speaker 2: on and so forth. Once ChatGPT has bounced your 337 00:16:35,800 --> 00:16:38,800 Speaker 2: query across various models, burning compute along the way, 338 00:16:39,040 --> 00:16:41,600 Speaker 2: it then pushes it towards the chat portion of the generation. 339 00:16:42,080 --> 00:16:44,480 Speaker 2: And each time you ask ChatGPT a question or 340 00:16:44,600 --> 00:16:47,520 Speaker 2: to do something, a new, specialized static prompt is generated, 341 00:16:47,800 --> 00:16:50,920 Speaker 2: sometimes several, making it impossible to cache them in advance. 342 00:16:51,240 --> 00:16:53,520 Speaker 2: In simpler terms, each time you message it, ChatGPT 343 00:16:53,640 --> 00:16:56,760 Speaker 2: has to dump all cached information and instructions for what 344 00:16:56,800 --> 00:16:59,120 Speaker 2: it needs to do and reload it with each prompt.
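To make the caching point concrete, here is a toy sketch. It assumes a cache that can only reuse work for an identical leading prefix of tokens, which is the general shape of the prompt caching LLM providers describe; real thresholds, tokenization, and eviction are ignored. `PrefixCache` and the token lists are hypothetical stand-ins, not OpenAI code.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy prefix cache (hypothetical, for illustration only).

    A request only "pays" for the tokens after the longest prefix it
    shares with some request that was processed earlier.
    """

    seen: list = field(default_factory=list)

    def process(self, tokens: list) -> int:
        """Return how many tokens had to be processed from scratch."""
        best = 0
        for cached in self.seen:
            shared = 0
            for a, b in zip(cached, tokens):
                if a != b:
                    break
                shared += 1
            best = max(best, shared)
        self.seen.append(tuple(tokens))
        return len(tokens) - best


# Old style: one fixed static prompt always comes first, so it gets reused.
STATIC = ["sys"] * 1000  # stand-in for a 1,000-token static prompt
fixed = PrefixCache()
print(fixed.process(STATIC + ["what", "is", "2+2"]))  # 1003: cold cache
print(fixed.process(STATIC + ["name", "a", "bird"]))  # 3: prefix reused

# As the speaker describes GPT five: the instructions depend on the route,
# so consecutive requests can start with different prefixes.
routed = PrefixCache()
print(routed.process(["search-rules"] * 1000 + ["latest", "news"]))  # 1002
print(routed.process(["code-rules"] * 1000 + ["sort", "a", "list"]))  # 1003
```

In this toy, a routed request can still hit the cache when the same route, and therefore the same instructions, comes up again; the speaker's claim is that GPT five multiplies the number of distinct prefixes, which makes such hits rarer.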
345 00:16:59,520 --> 00:17:02,120 Speaker 2: Now, here are some examples of what ChatGPT five has 346 00:17:02,200 --> 00:17:04,879 Speaker 2: to reload every single time you prompt it: whether or 347 00:17:04,880 --> 00:17:06,560 Speaker 2: not to use a browser or search the internet, and 348 00:17:06,640 --> 00:17:09,200 Speaker 2: under what conditions to do so, because these will change 349 00:17:09,200 --> 00:17:12,040 Speaker 2: with each prompt; how to approach a particular problem based 350 00:17:12,080 --> 00:17:14,439 Speaker 2: on what the user asked, including any specific ways it's 351 00:17:14,480 --> 00:17:16,840 Speaker 2: meant to answer (tone, brevity, and so on) based on 352 00:17:16,920 --> 00:17:20,840 Speaker 2: their request; and specifics around how it might use, say, OpenAI's 353 00:17:20,880 --> 00:17:23,800 Speaker 2: code interpreter, such as the usage rules for running 354 00:17:23,800 --> 00:17:25,920 Speaker 2: a Python script, or how you want the code's output, 355 00:17:25,960 --> 00:17:28,359 Speaker 2: which again will be different based on each prompt. And 356 00:17:28,520 --> 00:17:30,199 Speaker 2: you can even say, do it exactly the 357 00:17:30,200 --> 00:17:32,919 Speaker 2: same way, and because it's a large language model, it 358 00:17:32,960 --> 00:17:37,480 Speaker 2: may hallucinate something different. Every single goddamn time you prompt 359 00:17:37,560 --> 00:17:40,520 Speaker 2: ChatGPT five, it has to do this. Worse still, 360 00:17:40,560 --> 00:17:43,480 Speaker 2: a particular conversation can involve you using multiple different models 361 00:17:43,520 --> 00:17:47,119 Speaker 2: and tools, requiring it, with each and every prompt, 362 00:17:47,119 --> 00:17:49,639 Speaker 2: to inject a different static prompt for each component that 363 00:17:49,720 --> 00:17:52,800 Speaker 2: ChatGPT five uses.
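Here is a minimal sketch of the per-prompt instruction assembly described above, assuming the router emits tags and each tag maps to a block of instructions. The tag names and snippet texts are hypothetical, since ChatGPT's real instructions are not public; they only stand in for the browsing, tone, and code-interpreter rules the transcript lists.

```python
# Hypothetical instruction snippets standing in for the rules the
# transcript lists (browsing, tone/brevity, code interpreter usage).
SNIPPETS = {
    "browse": "You may search the web; cite every source you use.",
    "code": "You may run Python; report the script's output as a table.",
    "brief": "Answer in at most two sentences.",
    "detailed": "Explain your answer step by step.",
}

# Fixed order, so the same tags always produce byte-identical text.
ORDER = ["browse", "code", "brief", "detailed"]


def build_static_prompt(tags: set) -> str:
    """Assemble per-request instructions from the router's tags."""
    unknown = tags - set(SNIPPETS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return "\n".join(SNIPPETS[t] for t in ORDER if t in tags)


# Two requests in the same chat can get entirely different instructions.
print(build_static_prompt({"browse", "brief"}))
print("---")
print(build_static_prompt({"code", "detailed"}))
```

Keeping a fixed order means identical tag sets always yield identical text. That is the only way such a block could ever be reused from a cache, but it does nothing for requests whose tags differ.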
And you can't cache the static 364 00:17:52,800 --> 00:17:54,760 Speaker 2: prompt before knowing the user's intent, because if you did that, 365 00:17:55,040 --> 00:17:57,040 Speaker 2: it might send an instruction to a model that doesn't 366 00:17:57,040 --> 00:17:59,199 Speaker 2: make sense, such as telling a reasoning model to give 367 00:17:59,200 --> 00:18:01,840 Speaker 2: a quick and simple one-line answer, or a mini or nano model to 368 00:18:01,880 --> 00:18:04,000 Speaker 2: do some sort of deep reasoning, which would create a 369 00:18:04,000 --> 00:18:07,920 Speaker 2: crappy answer and burn tokens in the process. And this 370 00:18:07,960 --> 00:18:10,040 Speaker 2: is all thanks to the complicated way that OpenAI 371 00:18:10,160 --> 00:18:14,400 Speaker 2: insisted on building GPT five. Every single thing you send 372 00:18:14,480 --> 00:18:16,399 Speaker 2: to ChatGPT can trigger it to use a 373 00:18:16,560 --> 00:18:21,199 Speaker 2: different series of models (audio, vision, reasoning), each with their 374 00:18:21,240 --> 00:18:24,680 Speaker 2: own instructions and static prompts, all while pulling in different tools, each 375 00:18:24,720 --> 00:18:27,359 Speaker 2: requiring their own instructions based on what you asked, and 376 00:18:27,440 --> 00:18:30,679 Speaker 2: reasoning models even have different depths of reasoning. Unlike four 377 00:18:30,720 --> 00:18:33,800 Speaker 2: oh, which is a multimodal model combining text, vision, 378 00:18:33,800 --> 00:18:36,399 Speaker 2: and voice, GPT five is a rat king of OpenAI's 379 00:18:36,440 --> 00:18:38,720 Speaker 2: models and tools that gets reborn every single time you 380 00:18:38,760 --> 00:18:41,640 Speaker 2: ask it to do anything. It can prompt-cache 381 00:18:41,720 --> 00:18:45,199 Speaker 2: some things, but the core instructions, not so much.
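Here is a toy sketch of why, under the speaker's description, the instructions can only be chosen after routing. Each (model, reasoning effort) pair wants different instructions, so a single prefix fixed in advance is wrong for every route it was not written for. The model names, the length-based `route` rule, and the instruction strings are all hypothetical placeholders for what the transcript says is a separate LLM.

```python
# Hypothetical per-route instructions, keyed by (model, reasoning effort).
ROUTE_INSTRUCTIONS = {
    ("nano", "none"): "Reply with a short, single-line answer.",
    ("mini", "none"): "Reply concisely; do not use tools.",
    ("reasoner", "high"): "Think through the problem in depth before answering.",
}


def route(query: str) -> tuple:
    """Stand-in router: a trivial word-count rule instead of a real model."""
    words = len(query.split())
    if words <= 5:
        return ("nano", "none")
    if words <= 20:
        return ("mini", "none")
    return ("reasoner", "high")


def route_then_instruct(query: str) -> str:
    """Pick instructions only after the route is known."""
    return ROUTE_INSTRUCTIONS[route(query)]


# A prefix fixed *before* routing has to guess which route will be taken.
FIXED_PREFIX = ROUTE_INSTRUCTIONS[("nano", "none")]
queries = ["capital of France?", " ".join(["why"] * 30)]
for q in queries:
    chosen = route_then_instruct(q)
    print(route(q), "ok" if chosen == FIXED_PREFIX else "mismatch")
```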
But 382 00:18:45,280 --> 00:18:47,600 Speaker 2: let's get a little more granular, because I know I've 383 00:18:47,720 --> 00:18:51,480 Speaker 2: been quite repetitive, but this is detailed. So from what 384 00:18:51,520 --> 00:18:53,879 Speaker 2: I've been told, there are either one or two models 385 00:18:53,880 --> 00:18:55,639 Speaker 2: at work for the routing. I'm going to go with 386 00:18:55,680 --> 00:18:57,600 Speaker 2: what I think is most likely based on the discussions 387 00:18:57,600 --> 00:19:00,640 Speaker 2: I've had with people familiar with the architecture. I've heard 388 00:19:00,680 --> 00:19:04,040 Speaker 2: the term orchestrator thrown around, potentially suggesting the 389 00:19:04,119 --> 00:19:06,840 Speaker 2: router may be more omnipresent throughout the process, but I 390 00:19:06,880 --> 00:19:09,479 Speaker 2: was unable to confirm its existence. Reach out if you 391 00:19:09,480 --> 00:19:12,480 Speaker 2: hear differently. I'll explain things as they were explained to me, though. 392 00:19:13,080 --> 00:19:15,760 Speaker 2: When a user sends a prompt, it goes through the splitter leg, 393 00:19:15,760 --> 00:19:18,480 Speaker 2: which decides to send the query down one of two paths. 394 00:19:18,760 --> 00:19:21,399 Speaker 2: One is called the fast path, where a query is straightforward, 395 00:19:21,400 --> 00:19:24,240 Speaker 2: such as a text-only conversation that doesn't require any 396 00:19:24,400 --> 00:19:27,399 Speaker 2: analysis or extra tools or thinking. The other is a path where the 397 00:19:27,440 --> 00:19:30,679 Speaker 2: query may require reasoning or more complex tools like code generation 398 00:19:30,800 --> 00:19:33,560 Speaker 2: or access to a web browser for research.
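The two-path split can be sketched like this. The transcript says the real splitter is a separate LLM, so the keyword sets here are only a placeholder for its decision. The cue words, the `SplitDecision` fields, and the name "full" for the non-fast path are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical cue words standing in for the splitter model's judgment.
TOOL_CUES = {"search", "browse", "code", "python", "chart", "image"}
REASONING_CUES = {"why", "prove", "compare", "analyze", "best"}


@dataclass
class SplitDecision:
    path: str  # "fast" or "full" (the second name is made up here)
    wants_tools: bool
    wants_reasoning: bool


def split(prompt: str, has_attachment: bool = False) -> SplitDecision:
    """Toy splitter: send plain chat down the fast path, else the full one."""
    words = {w.strip("?.,!").lower() for w in prompt.split()}
    wants_tools = has_attachment or bool(words & TOOL_CUES)
    wants_reasoning = bool(words & REASONING_CUES)
    # Fast path: text-only chat that needs no tools and no thinking.
    path = "full" if (wants_tools or wants_reasoning) else "fast"
    return SplitDecision(path, wants_tools, wants_reasoning)


print(split("hey, how are you?").path)
print(split("Write Python to sort this list").path)
```

Returning the individual flags, not just the path, leaves room for a prompt that needs both tools and reasoning to fan out into several components.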
To be clear, 399 00:19:33,640 --> 00:19:35,639 Speaker 2: there are prompts where it may be split into multiple 400 00:19:35,680 --> 00:19:38,320 Speaker 2: paths that trigger multiple models or tools, each requiring their 401 00:19:38,320 --> 00:19:41,720 Speaker 2: own static instructions. From what I understand, the splitter model 402 00:19:41,800 --> 00:19:44,480 Speaker 2: is a completely separate large language model, though we don't 403 00:19:44,480 --> 00:19:47,600 Speaker 2: have a ton of details about it. I also, based 404 00:19:47,600 --> 00:19:49,720 Speaker 2: on conversations I've had, think there's a chance there could 405 00:19:49,720 --> 00:19:52,000 Speaker 2: be a separate model that sits above the splitter that 406 00:19:52,080 --> 00:19:55,119 Speaker 2: does much lighter classification of how a query might be routed. 407 00:19:55,160 --> 00:19:56,919 Speaker 2: So you ask it to do something, it might just 408 00:19:57,000 --> 00:20:00,359 Speaker 2: go, okay, this looks like it needs a tool, and 409 00:20:00,400 --> 00:20:02,600 Speaker 2: go off. Why not? In any case, none of this 410 00:20:02,680 --> 00:20:05,240 Speaker 2: can be cached, because all of this exists before inference, 411 00:20:05,400 --> 00:20:07,679 Speaker 2: which, by the way, is a term I've misstated 412 00:20:07,720 --> 00:20:10,919 Speaker 2: in the past. It's like inferring: inference is 413 00:20:11,000 --> 00:20:14,240 Speaker 2: everything that happens to get an output to you. So 414 00:20:14,400 --> 00:20:17,320 Speaker 2: that's all of the stuff that's happening. And by the way, 415 00:20:17,359 --> 00:20:20,239 Speaker 2: this is all a completely new cost that OpenAI 416 00:20:20,359 --> 00:20:22,760 Speaker 2: has created. No one does this like this. It's so 417 00:20:22,840 --> 00:20:25,400 Speaker 2: fucking stupid. But now we get to the chat leg.
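Here is a toy ledger of the model calls one request triggers under the speaker's description: an optional light classifier (which he could not confirm), then the splitter LLM, then any tool components, then the chat leg. It is compared with the old setup, where the prompt went straight to one model. Every stage name is hypothetical.

```python
def direct_pipeline() -> list:
    """Old style: the prompt goes straight to one model."""
    return ["answer_model"]


def routed_pipeline(needs_tool: bool, light_classifier: bool) -> list:
    """Speaker's (unconfirmed) description of GPT five, as a call list."""
    calls = []
    if light_classifier:
        calls.append("light_classifier")  # possible model above the splitter
    calls.append("splitter")  # a separate LLM, per the transcript
    if needs_tool:
        calls.append("tool_model")
    calls.append("chat_leg")  # where the answer is finally assembled
    return calls


# Even a trivial "hi" pays for routing calls that the old setup never made.
for extra in (False, True):
    calls = routed_pipeline(needs_tool=False, light_classifier=extra)
    print(len(calls), calls)
print(len(direct_pipeline()), direct_pipeline())
```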
418 00:20:25,720 --> 00:20:27,919 Speaker 2: Now that OpenAI has added layers of abstraction, it 419 00:20:27,920 --> 00:20:30,000 Speaker 2: can begin cooking up the output, by which I mean 420 00:20:30,200 --> 00:20:32,560 Speaker 2: do inference. The chat leg is where the pieces that 421 00:20:32,600 --> 00:20:35,080 Speaker 2: the splitter model created are pulled together, each loaded 422 00:20:35,119 --> 00:20:38,159 Speaker 2: with their respective static prompts based on what the 423 00:20:38,240 --> 00:20:40,879 Speaker 2: user asked ChatGPT five to do. Each piece of 424 00:20:40,880 --> 00:20:43,080 Speaker 2: the model (a tool to generate Python, an image 425 00:20:43,119 --> 00:20:46,400 Speaker 2: generation tool, a reasoning model) has 426 00:20:46,440 --> 00:20:49,720 Speaker 2: to process an entirely new static prompt to generate an output, and again, that's 427 00:20:49,760 --> 00:20:53,560 Speaker 2: every interaction. Remember, static prompts are effectively instructions. So the 428 00:20:53,560 --> 00:20:55,680 Speaker 2: splitter model has told each piece of the pie how 429 00:20:55,720 --> 00:20:58,280 Speaker 2: to act to create a particular output. As a result, 430 00:20:58,400 --> 00:20:59,960 Speaker 2: much of this can't be cached, creating more and more 431 00:21:00,160 --> 00:21:03,240 Speaker 2: repetitious token burn. And I mean to repeat 432 00:21:03,280 --> 00:21:05,919 Speaker 2: this stuff so that you really get it.
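Here is a rough sketch of the chat-leg token accounting the speaker describes. It counts the static-prompt tokens that must be processed from scratch over a short conversation, comparing a case where one instruction block is cacheable with a case where nothing is. The component names and token sizes are invented for illustration; real sizes are not public.

```python
# Hypothetical static-prompt sizes (tokens) for components the splitter
# might pull into the chat leg.
STATIC_TOKENS = {"chat": 800, "python_tool": 1200, "image_tool": 600, "reasoner": 1500}


def uncached_static_tokens(turns: list, cached: set) -> int:
    """Static-prompt tokens processed from scratch over a conversation.

    Each turn lists the components it uses. A component in `cached`
    costs nothing extra; any other component is re-processed on every
    turn it appears in (the speaker's claim about GPT five).
    """
    return sum(STATIC_TOKENS[c] for turn in turns for c in turn if c not in cached)


conversation = [
    ["chat"],
    ["python_tool", "chat"],
    ["reasoner", "image_tool", "chat"],
]
# One fixed, cacheable chat block: only the extra components cost anything.
print(uncached_static_tokens(conversation, cached={"chat"}))  # 3300
# Nothing cacheable, because every component's instructions vary per prompt.
print(uncached_static_tokens(conversation, cached=set()))  # 5700
```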
The upshot 433 00:21:05,920 --> 00:21:08,000 Speaker 2: of the chat leg's static prompt baggage is that you 434 00:21:08,040 --> 00:21:10,000 Speaker 2: can do a little more here, at least in theory: 435 00:21:10,200 --> 00:21:13,119 Speaker 2: because each component can be instructed separately, they can, again, 436 00:21:13,160 --> 00:21:16,320 Speaker 2: in theory, be made to give more individualized, specialized outputs, 437 00:21:16,359 --> 00:21:18,440 Speaker 2: like creating an image with tags that are, as I'll 438 00:21:18,440 --> 00:21:21,080 Speaker 2: show you in an example very shortly, generated using a 439 00:21:21,119 --> 00:21:26,520 Speaker 2: specific reasoning model. I'm clutching at straws here. I don't 440 00:21:26,520 --> 00:21:29,360 Speaker 2: really know if this is better, but I'm trying to be reasonable. 441 00:21:29,400 --> 00:21:31,960 Speaker 2: I'm trying to be normal. Every day, I try and 442 00:21:32,000 --> 00:21:35,679 Speaker 2: be normal. Previously, OpenAI's advantage was that a model 443 00:21:35,760 --> 00:21:37,400 Speaker 2: like four oh was kind of a jack 444 00:21:37,440 --> 00:21:39,800 Speaker 2: of all trades. But to get the benefits of Chat 445 00:21:39,840 --> 00:21:42,919 Speaker 2: GPT five, and that's in air quotes, it's engaged a 446 00:21:43,000 --> 00:21:46,680 Speaker 2: conductor model that can just make things more convoluted, even 447 00:21:46,720 --> 00:21:49,480 Speaker 2: in the case of simple requests. Let me give you 448 00:21:49,480 --> 00:21:52,520 Speaker 2: an example. You upload a chart of NFL players' stats 449 00:21:52,520 --> 00:21:55,240 Speaker 2: and ask ChatGPT to decide which is the best 450 00:21:55,240 --> 00:21:57,160 Speaker 2: of the group and create an image to show the results.
451 00:21:57,359 --> 00:21:59,880 Speaker 2: In GPT four oh, ChatGPT would use one model, 452 00:22:00,160 --> 00:22:02,359 Speaker 2: and thus one static prompt, to look at the image, 453 00:22:02,400 --> 00:22:04,520 Speaker 2: decide which tools to use, and then how to format 454 00:22:04,560 --> 00:22:07,600 Speaker 2: the response. You only needed one prompt, which was cached, 455 00:22:07,640 --> 00:22:09,560 Speaker 2: because one model can look at the stats for all 456 00:22:09,600 --> 00:22:11,480 Speaker 2: the data and make the decisions and then use the 457 00:22:11,520 --> 00:22:15,160 Speaker 2: image generation tool to make the final image. In GPT five, 458 00:22:15,240 --> 00:22:17,960 Speaker 2: the ChatGPT conductor model would see the stats, route 459 00:22:18,040 --> 00:22:20,640 Speaker 2: it to a vision model requiring its own static prompt, 460 00:22:20,680 --> 00:22:23,160 Speaker 2: then to a separate text-only reasoning model, one that has 461 00:22:23,160 --> 00:22:25,000 Speaker 2: no ability to use tools but might be cheaper 462 00:22:25,000 --> 00:22:27,919 Speaker 2: to get an answer from, and that also requires a static prompt. 463 00:22:28,640 --> 00:22:30,960 Speaker 2: That would then decide which players are best and 464 00:22:31,000 --> 00:22:32,760 Speaker 2: spit out an output, and then route it to 465 00:22:32,800 --> 00:22:35,680 Speaker 2: a completely separate model that can generate text to query 466 00:22:35,720 --> 00:22:39,199 Speaker 2: the image tool, again needing a static prompt for this, 467 00:22:39,359 --> 00:22:41,600 Speaker 2: to then generate the image. On top of all this 468 00:22:41,680 --> 00:22:44,920 Speaker 2: onerous baggage lies another problem: GPT five's various models 469 00:22:44,920 --> 00:22:48,160 Speaker 2: are just more complex.
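The NFL example can be written down as two step lists, following the speaker's description, with a count of how many steps need their own static prompt. The component names are hypothetical. The conductor's own instructions are assumed here to be fixed and cacheable, so it is not counted; the transcript treats its cost separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    component: str
    needs_own_static_prompt: bool


# GPT four oh, as described: one multimodal model does everything and
# calls the image tool itself, under one (cacheable) static prompt.
GPT4O_FLOW = [Step("four_oh", True), Step("image_tool_call", False)]

# GPT five, as described: the conductor hands the job to three models,
# each loaded with its own static prompt.
GPT5_FLOW = [
    Step("conductor", False),  # assumed fixed, cacheable instructions
    Step("vision_model", True),
    Step("text_reasoning_model", True),
    Step("image_query_model", True),
    Step("image_tool_call", False),
]


def static_prompts(flow: list) -> int:
    """Number of steps that load their own static prompt."""
    return sum(step.needs_own_static_prompt for step in flow)


print("four oh:", static_prompts(GPT4O_FLOW))
print("five:", static_prompts(GPT5_FLOW))
```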
By splitting out the component elements 470 00:22:48,160 --> 00:22:50,080 Speaker 2: of what a model can do and allowing each model 471 00:22:50,119 --> 00:22:52,600 Speaker 2: to have different levels of reasoning, even the cheaper ones 472 00:22:52,640 --> 00:22:55,399 Speaker 2: like mini and nano, OpenAI has created an endless 473 00:22:55,440 --> 00:22:57,960 Speaker 2: combination of different reasons to have to make a brand 474 00:22:58,000 --> 00:23:01,520 Speaker 2: new static prompt instruction, automated by a router, a large 475 00:23:01,560 --> 00:23:04,000 Speaker 2: language model that chooses which large language model to use 476 00:23:04,000 --> 00:23:08,760 Speaker 2: for a query. It is, if I'm honest, kind of funny. 477 00:23:08,960 --> 00:23:12,040 Speaker 2: Reasoning models work, simply described, by breaking up a 478 00:23:12,040 --> 00:23:14,480 Speaker 2: prompt into component pieces, looking over them, and deciding what 479 00:23:14,520 --> 00:23:17,320 Speaker 2: the best course of action might be. ChatGPT's router 480 00:23:17,359 --> 00:23:19,920 Speaker 2: is effectively one abstraction higher, breaking up the prompt into 481 00:23:19,920 --> 00:23:22,679 Speaker 2: component pieces, then choosing different models for each of those pieces, 482 00:23:22,680 --> 00:23:26,000 Speaker 2: which may in turn be broken up by a reasoning model. 483 00:23:26,240 --> 00:23:28,119 Speaker 2: While I wouldn't say this is a hat-on-a- 484 00:23:28,160 --> 00:23:31,119 Speaker 2: hat situation, it is at this point unclear what exactly 485 00:23:31,200 --> 00:23:35,320 Speaker 2: the benefits of ChatGPT five's new architecture are. Fewer hallucinations? 486 00:23:35,640 --> 00:23:38,440 Speaker 2: Better answers?
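Here is a rough count of how many distinct instruction variants such a system could need, assuming each combination of model size, reasoning depth, and tool set gets its own static prompt. The option lists are illustrative assumptions, not OpenAI's actual configuration.

```python
from itertools import chain, combinations, product

# Illustrative option sets assumed for this sketch.
MODELS = ["full", "mini", "nano"]
EFFORTS = ["minimal", "low", "medium", "high"]
TOOLS = ["web_search", "python", "image_gen"]


def tool_subsets(tools: list):
    """Every subset of the tool list, including the empty one."""
    return chain.from_iterable(combinations(tools, k) for k in range(len(tools) + 1))


# Each (model, effort, tool subset) triple is one possible instruction variant.
configs = list(product(MODELS, EFFORTS, tool_subsets(TOOLS)))
print(len(configs))  # 3 * 4 * 2**3 = 96
```

Under these assumptions, adding one more optional tool doubles the count. To a prefix cache, each variant is a different prefix.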
Based on what I've been told, this was 487 00:23:38,480 --> 00:23:41,480 Speaker 2: a decision made to increase the model's performance. What I 488 00:23:41,520 --> 00:23:43,919 Speaker 2: can say is that this very likely increased OpenAI's 489 00:23:43,960 --> 00:23:45,560 Speaker 2: overhead at a time when it needs to do the 490 00:23:45,600 --> 00:23:49,040 Speaker 2: exact opposite. Even if ChatGPT five pushes people towards 491 00:23:49,119 --> 00:23:51,920 Speaker 2: cheaper models, it does so while guaranteeing extra costs and 492 00:23:52,000 --> 00:23:55,400 Speaker 2: latency, and whatever signals it may learn as people use it 493 00:23:55,440 --> 00:23:59,200 Speaker 2: will have to create significant benefits, massive one hundred 494 00:23:59,240 --> 00:24:01,720 Speaker 2: percent plus gains, for it to be anything close to worthwhile. 495 00:24:02,400 --> 00:24:04,359 Speaker 2: While OpenAI's router may be smart in 496 00:24:04,480 --> 00:24:06,560 Speaker 2: terms of the nuance of how it might answer a query 497 00:24:06,600 --> 00:24:09,600 Speaker 2: (and even that I question), it most decidedly is not 498 00:24:09,720 --> 00:24:12,119 Speaker 2: more efficient and may have actually increased the burn rate 499 00:24:12,160 --> 00:24:13,680 Speaker 2: for a company that will lose as much as eight 500 00:24:13,720 --> 00:24:16,400 Speaker 2: billion dollars this year, and I think that number might 501 00:24:16,440 --> 00:24:19,840 Speaker 2: be low, too. Yet what I'm left with in writing 502 00:24:19,880 --> 00:24:23,080 Speaker 2: this script is how wasteful all of this is.
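The overhead argument can be framed as simple break-even arithmetic. Let routing plus re-processed instructions add a fixed cost to every request, and let a share s of requests be downgraded to a cheaper model. Routing then only pays off if s exceeds overhead / (big cost minus small cost). The unit costs below are made up, and latency and answer quality are not modeled.

```python
from fractions import Fraction


def break_even_share(overhead: Fraction, big_cost: Fraction, small_cost: Fraction) -> Fraction:
    """Share of requests that must go to the cheaper model to break even.

    Old cost per request: big_cost
    New cost per request: overhead + s * small_cost + (1 - s) * big_cost
    Setting them equal gives s = overhead / (big_cost - small_cost).
    A result above 1 means routing can never pay for itself.
    """
    if big_cost <= small_cost:
        raise ValueError("routing only saves money if the small model is cheaper")
    return overhead / (big_cost - small_cost)


# Made-up unit costs: big model 10, small model 2, routing overhead 1.
share = break_even_share(Fraction(1), Fraction(10), Fraction(2))
print(share)  # 1/8, i.e. at least 12.5% of traffic must be downgraded
```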
OpenAI, 503 00:24:23,400 --> 00:24:26,439 Speaker 2: a company that has already incinerated upwards of fifteen billion 504 00:24:26,480 --> 00:24:28,840 Speaker 2: dollars in the last two years, has chosen to create 505 00:24:28,880 --> 00:24:31,000 Speaker 2: a less efficient way of doing business as a means 506 00:24:31,000 --> 00:24:35,040 Speaker 2: of eking out modest, at best, performance improvements. It 507 00:24:35,160 --> 00:24:38,359 Speaker 2: just sucks. In our own lives, we're continually pushed and 508 00:24:38,400 --> 00:24:40,960 Speaker 2: pressured and punished if we get into debt, judged by 509 00:24:40,960 --> 00:24:43,480 Speaker 2: our peers and our parents if we spend our money recklessly, 510 00:24:43,640 --> 00:24:45,920 Speaker 2: and if we're too reckless, we find ourselves less likely 511 00:24:45,960 --> 00:24:49,600 Speaker 2: to receive anything from credit to housing. Companies like Open 512 00:24:49,600 --> 00:24:52,560 Speaker 2: AI live by a different set of standards. Sam Altman 513 00:24:52,640 --> 00:24:54,959 Speaker 2: intends to lose more than forty-four billion dollars by 514 00:24:55,000 --> 00:24:57,080 Speaker 2: the end of twenty twenty-eight on OpenAI, and 515 00:24:57,119 --> 00:25:00,639 Speaker 2: graciously told CNBC, like Lord Farquaad, he was willing to 516 00:25:00,720 --> 00:25:02,919 Speaker 2: run at a loss for a long time, and he 517 00:25:03,000 --> 00:25:06,000 Speaker 2: was treated like he was this smart, reasonable decision maker 518 00:25:06,080 --> 00:25:08,920 Speaker 2: rather than someone that needs to rein in their horrendous 519 00:25:09,000 --> 00:25:12,560 Speaker 2: spending habits and be more mindful.
The ultra-rich are 520 00:25:12,720 --> 00:25:15,280 Speaker 2: rewarded far more for their errant spending habits than we 521 00:25:15,320 --> 00:25:18,160 Speaker 2: ever are for any thriftiness or austerity measures we make, 522 00:25:18,600 --> 00:25:20,600 Speaker 2: and none of us are afforded the level of grace 523 00:25:20,640 --> 00:25:24,720 Speaker 2: that Clammy Sam Altman has been, far more than feels appropriate. 524 00:25:25,240 --> 00:25:28,960 Speaker 2: ChatGPT five is an engineering nightmare, a phenomenally silly 525 00:25:29,000 --> 00:25:31,240 Speaker 2: and desperate attempt to juice what remains of the dying 526 00:25:31,280 --> 00:25:34,760 Speaker 2: innovation and excitement within the walls of OpenAI. It's 527 00:25:34,800 --> 00:25:37,480 Speaker 2: not November twenty twenty-two anymore. And let's be honest, 528 00:25:37,480 --> 00:25:39,959 Speaker 2: there really hasn't been anything exciting or interesting out of this 529 00:25:40,000 --> 00:25:44,560 Speaker 2: company since GPT four. There's nothing exciting happening at this company. 530 00:25:45,080 --> 00:25:47,600 Speaker 2: As many as seven hundred million people a week allegedly 531 00:25:47,680 --> 00:25:50,320 Speaker 2: use ChatGPT, but nobody can really say why. And 532 00:25:50,320 --> 00:25:53,720 Speaker 2: OpenAI, despite its massive popularity, cannot seem to stop 533 00:25:53,760 --> 00:25:56,880 Speaker 2: losing billions of dollars, and it can't seem to explain 534 00:25:56,920 --> 00:26:00,399 Speaker 2: why that's necessary other than this shit's really expensive, dude. 535 00:26:00,960 --> 00:26:03,480 Speaker 2: Can anyone actually articulate a reason why we need to 536 00:26:03,480 --> 00:26:05,960 Speaker 2: burn billions of dollars to do this? What are we doing? 537 00:26:06,080 --> 00:26:08,240 Speaker 2: Why are we doing it?
Has everybody just agreed to 538 00:26:08,280 --> 00:26:11,080 Speaker 2: do this until it becomes completely untenable? Do we 539 00:26:11,119 --> 00:26:13,080 Speaker 2: all yearn for the abyss so much that we can't 540 00:26:13,080 --> 00:26:17,359 Speaker 2: find camaraderie in admitting we were wrong? Look at GPT five. 541 00:26:17,880 --> 00:26:20,399 Speaker 2: This is, if you believe the hype, the best-funded, 542 00:26:20,440 --> 00:26:23,320 Speaker 2: best-resourced company in the world, with the greatest mind 543 00:26:23,359 --> 00:26:26,080 Speaker 2: at its helm and the greatest minds within its walls. 544 00:26:26,240 --> 00:26:28,600 Speaker 2: And this is the best they've got: a large language 545 00:26:28,640 --> 00:26:31,480 Speaker 2: model that chooses which large language model will answer your question. 546 00:26:32,200 --> 00:26:35,160 Speaker 2: Gee fucking whiz, Sam Altman, sounds dandy. And how much 547 00:26:35,240 --> 00:26:37,800 Speaker 2: better is this, you say? Oh, you can't really say? 548 00:26:38,119 --> 00:26:40,400 Speaker 2: Fucking brilliant. Hey, does it do anything new? 549 00:26:40,840 --> 00:26:40,879 Speaker 3: No? 550 00:26:41,720 --> 00:26:44,560 Speaker 2: Oh, what's that? It's actually our job to work out 551 00:26:44,560 --> 00:26:48,040 Speaker 2: for ourselves. Thanks, man, I love it. I love this shit. 552 00:26:48,200 --> 00:26:50,560 Speaker 2: And if you're someone that is a hype merchant listening 553 00:26:50,560 --> 00:26:52,200 Speaker 2: to this, you've done really well getting to the 554 00:26:52,280 --> 00:26:54,119 Speaker 2: end of the third part. By the way, I respect you. 555 00:26:54,440 --> 00:26:56,639 Speaker 2: I want you to email me and explain why they 556 00:26:56,680 --> 00:26:59,159 Speaker 2: should be justified in burning billions of dollars. If you 557 00:26:59,280 --> 00:27:03,040 Speaker 2: tell me, if you tell me AWS, I will eat 558 00:27:03,080 --> 00:27:06,000 Speaker 2: you alive.
I mean that, does it? I mean that 559 00:27:06,240 --> 00:27:10,399 Speaker 2: completely literally. I will unhinge my jaw. I'll eat you 560 00:27:10,520 --> 00:27:12,359 Speaker 2: like Kirby and shit out of dance. I've said that 561 00:27:12,359 --> 00:27:15,600 Speaker 2: one before, but I'm going with it. In any case, 562 00:27:16,400 --> 00:27:19,119 Speaker 2: this three-parter has also really reminded me how ridiculous 563 00:27:19,160 --> 00:27:23,120 Speaker 2: this is, how nonsensical things have become, and how much 564 00:27:23,200 --> 00:27:27,920 Speaker 2: waste has been kind of justified, justified on this idea 565 00:27:27,960 --> 00:27:30,200 Speaker 2: that this will become something, by people that don't really 566 00:27:30,200 --> 00:27:32,240 Speaker 2: know what it does today or might do in the future. 567 00:27:32,840 --> 00:27:34,919 Speaker 2: None of this is going to end well, and not 568 00:27:34,960 --> 00:27:38,080 Speaker 2: even the boosters seem to be having fun anymore. Everybody's 569 00:27:38,160 --> 00:27:40,640 Speaker 2: just floating around waiting for it to end. Even Sam 570 00:27:40,720 --> 00:27:43,600 Speaker 2: Altman seems tired of it all. I know I bloody 571 00:27:43,600 --> 00:27:54,359 Speaker 2: well am. Thank you for listening to Better Offline. 572 00:27:54,480 --> 00:27:56,920 Speaker 3: The editor and composer of the Better Offline theme song 573 00:27:57,000 --> 00:27:59,639 Speaker 3: is Matt Osowski. You can check out more of his music 574 00:27:59,640 --> 00:28:03,320 Speaker 3: and audio projects at mattosowski dot com, M A T 575 00:28:03,320 --> 00:28:07,760 Speaker 3: T O S O W S K I dot com. You 576 00:28:07,800 --> 00:28:10,320 Speaker 3: can email me at EZ at better offline dot com 577 00:28:10,400 --> 00:28:12,720 Speaker 3: or visit better offline dot com to find more podcast 578 00:28:12,760 --> 00:28:16,080 Speaker 3: links and, of course, my newsletter.
I also really recommend 579 00:28:16,119 --> 00:28:18,080 Speaker 3: you go to chat dot wheresyoured dot at to 580 00:28:18,160 --> 00:28:20,520 Speaker 3: visit the Discord, and go to r slash 581 00:28:20,200 --> 00:28:23,359 Speaker 2: Better Offline to check out our Reddit. Thank you so 582 00:28:23,440 --> 00:28:26,880 Speaker 2: much for listening. Better Offline is a production of Cool 583 00:28:26,960 --> 00:28:29,719 Speaker 2: Zone Media. For more from Cool Zone Media, visit our 584 00:28:29,760 --> 00:28:32,760 Speaker 2: website coolzonemedia dot com, or check us out on 585 00:28:32,840 --> 00:28:36,639 Speaker 2: the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.