WEBVTT - Exclusive: How GPT-5 Actually Works

0:00:02.800 --> 0:00:03.560
<v Speaker 1>Cool Zone Media.

0:00:05.320 --> 0:00:07.880
<v Speaker 2>Hi, my name's Ed Zitron, and welcome to Better Offline.

0:00:07.880 --> 0:00:22.280
<v Speaker 2>This is also Jackass. So you've just had a cheery

0:00:22.280 --> 0:00:25.080
<v Speaker 2>two-part chucklefest about how generative AI may tank the

0:00:25.239 --> 0:00:27.319
<v Speaker 2>markets and our economy. So I'm going to give you

0:00:27.320 --> 0:00:30.319
<v Speaker 2>a lighter one: an episode about GPT five, which is

0:00:30.360 --> 0:00:33.080
<v Speaker 2>a model from OpenAI, and why just under three

0:00:33.159 --> 0:00:35.360
<v Speaker 2>years of hype have led to the software equivalent of

0:00:35.360 --> 0:00:38.239
<v Speaker 2>the launch of St. Anger, except every time Lars hit

0:00:38.280 --> 0:00:41.800
<v Speaker 2>the snare drum it cost them fifty five thousand dollars. Now,

0:00:41.800 --> 0:00:44.480
<v Speaker 2>if we look at the positive reviews, we see takes

0:00:44.600 --> 0:00:48.000
<v Speaker 2>ranging from Simon Willison's tempered remark that GPT five is

0:00:48.240 --> 0:00:51.280
<v Speaker 2>just good at stuff, to SemiAnalysis's completely

0:00:51.320 --> 0:00:54.800
<v Speaker 2>insane statement that GPT five is setting the stage for

0:00:54.880 --> 0:00:59.920
<v Speaker 2>ad monetization and the ChatGPT super app.

0:01:00.120 --> 0:01:02.440
<v Speaker 2>In a piece that makes several assertions about how the

0:01:02.520 --> 0:01:05.520
<v Speaker 2>router that underpins GPT five is somehow the secret way

0:01:05.600 --> 0:01:09.959
<v Speaker 2>that OpenAI will inject ads, which is just distinctly silly.

0:01:10.080 --> 0:01:13.400
<v Speaker 2>I'll get into this in the episode a little bit,

0:01:13.400 --> 0:01:15.480
<v Speaker 2>but just with everything you're going to hear, you're going

0:01:15.560 --> 0:01:18.160
<v Speaker 2>to realize that this is just someone just saying stuff.

0:01:18.200 --> 0:01:21.120
<v Speaker 2>Took four bylines to do that shit too. I'm also British.

0:01:21.120 --> 0:01:23.080
<v Speaker 2>I'm gonna say "rooter." I might say "rowter" as well,

0:01:23.120 --> 0:01:24.760
<v Speaker 2>because I've been here a while. Make fun of my

0:01:24.840 --> 0:01:27.399
<v Speaker 2>voice if you really must. But with that out the way,

0:01:27.440 --> 0:01:30.640
<v Speaker 2>here's a quote from SemiAnalysis's coverage. Before the router,

0:01:30.720 --> 0:01:32.640
<v Speaker 2>there was no way for a query to be distinguished,

0:01:32.680 --> 0:01:35.880
<v Speaker 2>and after the router, the first low value query could

0:01:35.920 --> 0:01:38.679
<v Speaker 2>be routed to a GPT five mini model that can

0:01:38.760 --> 0:01:41.959
<v Speaker 2>answer with zero tool calls and no reasoning. This likely

0:01:41.959 --> 0:01:44.160
<v Speaker 2>means serving this user is approaching the cost of a

0:01:44.200 --> 0:01:48.120
<v Speaker 2>search query. This does not make any sense. None

0:01:48.120 --> 0:01:50.480
<v Speaker 2>of this makes sense. It's just a bunch of assumptions.

0:01:50.600 --> 0:01:53.120
<v Speaker 2>Why would this be the case? The article also makes

0:01:53.120 --> 0:01:54.840
<v Speaker 2>a lot of claims about the value of a question

0:01:54.920 --> 0:01:58.440
<v Speaker 2>and how ChatGPT could, I am serious, agent...

0:01:59.000 --> 0:02:02.000
<v Speaker 2>agentically reach out to lawyers. I'm not going to edit

0:02:02.040 --> 0:02:04.560
<v Speaker 2>that out because agentically is not a fun word to say.

0:02:05.640 --> 0:02:07.760
<v Speaker 2>It is just complete nonsense, and in fact, I'm not

0:02:07.840 --> 0:02:11.320
<v Speaker 2>sure this piece reflects how GPT five even works at all. Again,

0:02:11.400 --> 0:02:14.520
<v Speaker 2>quoting it, the router serves multiple purposes on both the

0:02:14.560 --> 0:02:17.320
<v Speaker 2>cost and performance side. On the cost side, routing users

0:02:17.320 --> 0:02:19.400
<v Speaker 2>to mini versions of each model allows OpenAI to

0:02:19.440 --> 0:02:22.480
<v Speaker 2>service users at a lower cost, or with lower costs,

0:02:22.520 --> 0:02:25.160
<v Speaker 2>even. To be fair to SemiAnalysis, it's not as

0:02:25.200 --> 0:02:27.920
<v Speaker 2>if OpenAI gave them much help. OpenAI's official

0:02:27.919 --> 0:02:31.520
<v Speaker 2>writings about the router aren't exactly filled with details, talking

0:02:31.560 --> 0:02:34.000
<v Speaker 2>in glowing terms about what it does, but not how.

0:02:34.480 --> 0:02:38.440
<v Speaker 2>Here's what they say. ChatGPT's real-time router quickly

0:02:38.480 --> 0:02:41.760
<v Speaker 2>decides which model to use based on the conversation type, complexity,

0:02:41.800 --> 0:02:44.640
<v Speaker 2>tool needs, and your explicit intent. For example, if you

0:02:44.720 --> 0:02:47.520
<v Speaker 2>say think hard about this in the prompt. The router

0:02:47.600 --> 0:02:51.480
<v Speaker 2>is continuously trained on real signals, including when users switch models,

0:02:51.520 --> 0:02:56.200
<v Speaker 2>preference rates for responses, and measured correctness, improving over time.

0:02:56.600 --> 0:02:59.120
<v Speaker 2>Once usage limits are reached, a mini version of each

0:02:59.160 --> 0:03:02.040
<v Speaker 2>model handles remaining queries. In the near future, we plan

0:03:02.080 --> 0:03:05.280
<v Speaker 2>to integrate these capabilities into a single model. And that

0:03:05.360 --> 0:03:08.359
<v Speaker 2>last bit really doesn't make sense, but in any case,

0:03:08.400 --> 0:03:11.760
<v Speaker 2>the launch of GPT five has been very, very weird. At first,

0:03:11.760 --> 0:03:14.120
<v Speaker 2>some people seemed really happy about it, chief of them

0:03:14.120 --> 0:03:16.640
<v Speaker 2>software YouTuber Theo Browne, who has over four hundred

0:03:16.680 --> 0:03:19.520
<v Speaker 2>and sixty eight thousand subscribers. He's also known as t3.gg,

0:03:19.760 --> 0:03:20.560
<v Speaker 2>who said.

0:03:20.840 --> 0:03:24.400
<v Speaker 1>I didn't know it could get this good. This was

0:03:24.520 --> 0:03:29.280
<v Speaker 1>kind of the like oh fuck moment for me in

0:03:29.320 --> 0:03:33.040
<v Speaker 1>a lot of ways, and I've had to fight like

0:03:33.120 --> 0:03:38.560
<v Speaker 1>a slow spiral into insanity. It's a really really good model.

0:03:39.600 --> 0:03:41.120
<v Speaker 2>He finished by saying,

0:03:41.120 --> 0:03:42.960
<v Speaker 1>And keep an eye on your job because I don't know

0:03:43.000 --> 0:03:44.840
<v Speaker 1>what this means for us long term.

0:03:45.360 --> 0:03:48.480
<v Speaker 2>Pretty crazy, right? Comments on the video included people saying

0:03:48.520 --> 0:03:51.200
<v Speaker 2>things like if OpenAI is helding you hostage, blink

0:03:51.280 --> 0:03:54.200
<v Speaker 2>twice, and yes, that is a verbatim quote. Another saying

0:03:54.240 --> 0:03:57.040
<v Speaker 2>this dude is everything wrong in IT today. Another saying

0:03:57.080 --> 0:03:59.600
<v Speaker 2>this video was sponsored by OpenAI. Another saying

0:03:59.800 --> 0:04:02.360
<v Speaker 2>GPT five failed every test project I gave it today.

0:04:02.440 --> 0:04:04.640
<v Speaker 2>It's a lie in my experience. Maybe they haven't ramped

0:04:04.720 --> 0:04:08.040
<v Speaker 2>up the GPUs. Now, from what I can tell, Theo

0:04:08.160 --> 0:04:10.800
<v Speaker 2>Browne played with GPT five in OpenAI's offices and

0:04:10.800 --> 0:04:14.640
<v Speaker 2>did all the benchmarking there. OpenAI, by the way,

0:04:14.880 --> 0:04:19.520
<v Speaker 2>fucking how? Come on. You can't benchmark in their offices. Anyway,

0:04:19.560 --> 0:04:22.599
<v Speaker 2>OpenAI's API-based access to GPT five models, you

0:04:22.640 --> 0:04:24.000
<v Speaker 2>know, the thing that you use if you want to

0:04:24.000 --> 0:04:26.720
<v Speaker 2>integrate GPT into your app, does not route them, by

0:04:26.760 --> 0:04:29.000
<v Speaker 2>the way, nor does OpenAI offer access to its

0:04:29.080 --> 0:04:32.440
<v Speaker 2>router or any associated models. Important detail. Just want you

0:04:32.480 --> 0:04:34.400
<v Speaker 2>to know that because we need to make that very

0:04:34.400 --> 0:04:37.080
<v Speaker 2>clear. Now, a week later, Theo Browne would put out

0:04:37.120 --> 0:04:39.680
<v Speaker 2>another video called I was wrong about GPT five, which

0:04:39.960 --> 0:04:41.560
<v Speaker 2>he would open by saying.

0:04:41.880 --> 0:04:43.760
<v Speaker 1>So first and foremost, I want to make sure it

0:04:43.800 --> 0:04:47.359
<v Speaker 1>is very very clear that the experience that you probably

0:04:47.400 --> 0:04:50.000
<v Speaker 1>are having with ChatGPT and GPT five right now

0:04:50.400 --> 0:04:52.760
<v Speaker 1>is not the experience that I had when I was

0:04:52.760 --> 0:04:53.600
<v Speaker 1>first testing it.

0:04:53.960 --> 0:04:55.880
<v Speaker 2>Browne goes on to explain that he was not paid

0:04:55.880 --> 0:04:59.120
<v Speaker 2>by OpenAI at all, that he was sincerely impressed

0:04:59.120 --> 0:05:01.599
<v Speaker 2>by the company and GPT five, and that he'd actually

0:05:01.680 --> 0:05:04.200
<v Speaker 2>spent over twenty five thousand dollars in inference testing it

0:05:04.240 --> 0:05:06.720
<v Speaker 2>on his own company's software, and indeed also that he

0:05:06.800 --> 0:05:10.280
<v Speaker 2>turned down a grand appearance fee. Sorry, I mean that's

0:05:10.320 --> 0:05:13.160
<v Speaker 2>a very British thing: a one-thousand-dollar appearance fee, not

0:05:13.240 --> 0:05:16.160
<v Speaker 2>just like a really nice one. Browne claims he asked

0:05:16.160 --> 0:05:18.240
<v Speaker 2>OpenAI to try it out, and after they declined

0:05:18.279 --> 0:05:20.240
<v Speaker 2>to let him test it early on his own, he

0:05:20.360 --> 0:05:22.159
<v Speaker 2>was invited to try it on camera with a small

0:05:22.200 --> 0:05:24.679
<v Speaker 2>group of other people in OpenAI's offices, where they'd film

0:05:24.720 --> 0:05:27.919
<v Speaker 2>his reactions. He said that the API was incredible, but

0:05:28.000 --> 0:05:30.039
<v Speaker 2>that it's become apparent that the models he was using

0:05:30.080 --> 0:05:31.799
<v Speaker 2>in the video were not the same as those released

0:05:31.839 --> 0:05:34.200
<v Speaker 2>to the public, making a post on August thirteenth on

0:05:34.440 --> 0:05:37.000
<v Speaker 2>X, the Everything App, that GPT five was nowhere near as

0:05:37.040 --> 0:05:39.360
<v Speaker 2>good in Cursor as it

0:05:39.440 --> 0:05:40.960
<v Speaker 2>was when he was using it a few weeks ago,

0:05:41.040 --> 0:05:43.760
<v Speaker 2>complaining that things that worked while demoing it at

0:05:43.800 --> 0:05:47.159
<v Speaker 2>OpenAI no longer did, adding that there was somebody

0:05:47.160 --> 0:05:49.680
<v Speaker 2>else on Twitter that said they'd had a similarly great

0:05:49.720 --> 0:05:53.560
<v Speaker 2>experience with GPT five on launch that has since decayed. It

0:05:53.640 --> 0:05:55.880
<v Speaker 2>isn't completely clear what happened here, but I'm going to

0:05:55.880 --> 0:05:58.040
<v Speaker 2>guess that OpenAI showed Theo Browne and others in

0:05:58.080 --> 0:06:01.200
<v Speaker 2>their offices some sort of heavily modded version of the

0:06:01.200 --> 0:06:04.560
<v Speaker 2>model that burns significantly more compute to provide its outputs,

0:06:04.680 --> 0:06:07.599
<v Speaker 2>though I'm also very suspicious of how significant the difference

0:06:07.640 --> 0:06:11.040
<v Speaker 2>is here. Browne's videos attempt to show the difference between

0:06:11.080 --> 0:06:12.840
<v Speaker 2>the generations that he received from the model when it

0:06:12.880 --> 0:06:14.920
<v Speaker 2>was good and when it was bad. In this video,

0:06:15.160 --> 0:06:17.000
<v Speaker 2>which I'll include a link to in the episode notes.

0:06:17.000 --> 0:06:20.280
<v Speaker 2>But if I'm honest, they look pretty similar in that

0:06:20.279 --> 0:06:23.440
<v Speaker 2>they're kind of mediocre. I'm not saying that as a hater,

0:06:23.480 --> 0:06:25.120
<v Speaker 2>by the way. They just kind of look like shit.

0:06:26.000 --> 0:06:28.080
<v Speaker 2>It's just kind of okay, like shit. They look like

0:06:28.160 --> 0:06:31.240
<v Speaker 2>regular fucking generated websites. They don't look special. The good

0:06:31.279 --> 0:06:34.839
<v Speaker 2>one is fine, and the bad one has weird gradients

0:06:34.880 --> 0:06:37.919
<v Speaker 2>on it. This whole thing sucks, though, and was a

0:06:37.960 --> 0:06:41.000
<v Speaker 2>clear setup by OpenAI to overstate the abilities

0:06:41.000 --> 0:06:43.320
<v Speaker 2>of GPT five, one that fell apart with the lightest

0:06:43.320 --> 0:06:46.480
<v Speaker 2>brush with reality. I imagine their assumption was that Browne

0:06:46.480 --> 0:06:48.720
<v Speaker 2>would post the glossy video and then walk away, and

0:06:48.760 --> 0:06:51.320
<v Speaker 2>I give Theo some credit for straight up stating he

0:06:51.360 --> 0:06:53.919
<v Speaker 2>was misled. This was a desperate move and one that

0:06:53.960 --> 0:06:56.000
<v Speaker 2>blew up in the face of OpenAI, along with

0:06:56.040 --> 0:06:58.919
<v Speaker 2>the rest of the GPT five launch. People hate the model,

0:06:59.000 --> 0:07:01.960
<v Speaker 2>customers are mad at it for taking models away, like 4o,

0:07:02.040 --> 0:07:04.560
<v Speaker 2>and have remained mad even with their return, and

0:07:04.600 --> 0:07:07.919
<v Speaker 2>the ChatGPT subreddit is almost entirely people complaining about

0:07:08.320 --> 0:07:11.320
<v Speaker 2>how ineffective the new version is and how even GPT

0:07:11.360 --> 0:07:13.760
<v Speaker 2>4o is not the same. They've got gamer

0:07:13.760 --> 0:07:16.640
<v Speaker 2>brain, baby. As I said in last week's monologue, I

0:07:16.680 --> 0:07:18.800
<v Speaker 2>believe OpenAI has grown a fandom rather than any

0:07:18.840 --> 0:07:21.880
<v Speaker 2>kind of sustainable product market fit, and they're now suffering

0:07:21.920 --> 0:07:24.520
<v Speaker 2>fandom-like hate with every minor change they make in

0:07:24.520 --> 0:07:27.680
<v Speaker 2>an attempt to push GPT five further, further aggravating people

0:07:27.680 --> 0:07:30.640
<v Speaker 2>that barely understand why they use the product to begin with. Yet

0:07:30.760 --> 0:07:33.720
<v Speaker 2>at the center of the anger lay the reason for GPT

0:07:33.800 --> 0:07:36.520
<v Speaker 2>five's launch, the belief that this was somehow a cost

0:07:36.520 --> 0:07:39.240
<v Speaker 2>cutting measure, where OpenAI had added a router to

0:07:39.280 --> 0:07:41.920
<v Speaker 2>ChatGPT as a means of sending certain requests to cheaper

0:07:41.920 --> 0:07:45.080
<v Speaker 2>models to save money. But when I hear router, I

0:07:45.160 --> 0:07:47.680
<v Speaker 2>hear latency, and I never for even a second believed

0:07:47.720 --> 0:07:49.760
<v Speaker 2>that this would somehow be cheaper to run. It didn't

0:07:49.760 --> 0:07:52.720
<v Speaker 2>make sense. I'm a curious little critter, so I went

0:07:52.760 --> 0:07:55.920
<v Speaker 2>and found out how ChatGPT five actually works, and

0:07:56.040 --> 0:07:59.160
<v Speaker 2>unlike the following incredible products that you should buy, it's

0:07:59.200 --> 0:08:12.679
<v Speaker 2>actually kind of a big piece of shit. And we're back,

0:08:13.120 --> 0:08:14.960
<v Speaker 2>and from here on out, I will define two things.

0:08:15.000 --> 0:08:17.720
<v Speaker 2>GPT five referring to the model and its associated mini

0:08:17.720 --> 0:08:20.400
<v Speaker 2>and nano models, and ChatGPT five referring to the

0:08:20.400 --> 0:08:23.520
<v Speaker 2>current state of ChatGPT, which features auto, fast,

0:08:23.560 --> 0:08:27.120
<v Speaker 2>thinking, and thinking mini model selections. You also can

0:08:27.160 --> 0:08:30.239
<v Speaker 2>see legacy models, but that's not what we're talking about today,

0:08:30.240 --> 0:08:32.760
<v Speaker 2>and that's also only for a little bit. It's a

0:08:32.800 --> 0:08:34.959
<v Speaker 2>distinction I have to make, by the way, and make early,

0:08:34.960 --> 0:08:37.480
<v Speaker 2>because the two things are different, they work in different ways,

0:08:37.480 --> 0:08:40.600
<v Speaker 2>and ChatGPT five's structure induces a bunch of

0:08:40.600 --> 0:08:43.600
<v Speaker 2>trade-offs and downsides that, as I'll discuss later, make this

0:08:43.640 --> 0:08:47.320
<v Speaker 2>whole thing even more wasteful. In discussions with a source

0:08:47.360 --> 0:08:50.360
<v Speaker 2>at an infrastructure provider familiar with the architecture, it appears

0:08:50.400 --> 0:08:53.320
<v Speaker 2>that ChatGPT five is in fact potentially more expensive

0:08:53.320 --> 0:08:55.679
<v Speaker 2>to run than previous models, and due to the complex

0:08:55.679 --> 0:08:58.200
<v Speaker 2>and chaotic nature of said architecture, can at times burn

0:08:58.320 --> 0:09:02.400
<v Speaker 2>upwards of double the tokens per query. Tokens, for those

0:09:02.400 --> 0:09:04.560
<v Speaker 2>who don't know, are basically chunks of text that the

0:09:04.600 --> 0:09:08.000
<v Speaker 2>AI models do stuff with. I'm simplifying this. Do not

0:09:08.120 --> 0:09:11.600
<v Speaker 2>email me and correct some minor thing. Nobody cares. A

0:09:11.679 --> 0:09:14.320
<v Speaker 2>sentence like the quick brown fox jumps over the lazy

0:09:14.360 --> 0:09:17.160
<v Speaker 2>dog will be broken into lots of smaller four character chunks.

0:09:17.400 --> 0:09:19.720
<v Speaker 2>There are different kinds of tokens, and they're all priced differently.

0:09:20.080 --> 0:09:22.120
<v Speaker 2>An input token refers to the data you send to

0:09:22.160 --> 0:09:24.280
<v Speaker 2>the model when you ask a question. Output tokens are

0:09:24.360 --> 0:09:26.199
<v Speaker 2>used to measure the size of its response, with bigger

0:09:26.200 --> 0:09:30.240
<v Speaker 2>responses requiring more tokens. The more tokens you burn per query,

0:09:30.280 --> 0:09:32.480
<v Speaker 2>the more expensive it is to run that query. The

0:09:32.520 --> 0:09:35.560
<v Speaker 2>fact that ChatGPT five can, in certain circumstances, burn

0:09:35.600 --> 0:09:37.920
<v Speaker 2>twice the number of tokens per query means that every

0:09:38.000 --> 0:09:41.839
<v Speaker 2>question costs more. ChatGPT is also significantly more convoluted,

0:09:41.840 --> 0:09:45.280
<v Speaker 2>plagued by latency issues, and is more compute intensive thanks

0:09:45.280 --> 0:09:49.319
<v Speaker 2>to OpenAI's new, smarter, more efficient model routing system.
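
NOTE
A rough sketch, in Python, of the token arithmetic described above. The prices and token counts are made up for illustration (they are not OpenAI's actual rates), and query_cost is a hypothetical helper. It only shows that doubling the tokens burned on a query doubles what that query costs.
```python
def query_cost(input_tokens: int, output_tokens: int, input_price: int, output_price: int) -> int:
    """Cost of one query in micro-dollars, given prices in dollars per million tokens."""
    return input_tokens * input_price + output_tokens * output_price
# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
INPUT_PRICE = 1
OUTPUT_PRICE = 10
# A made-up query: 2,000 input tokens and 500 output tokens.
baseline = query_cost(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
# The same query if it burns double the tokens.
doubled = query_cost(4_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
print(f"baseline: ${baseline / 1_000_000:.4f}, doubled: ${doubled / 1_000_000:.4f}")
```
Costs are kept as whole micro-dollars so the comparison between the two queries is exact.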

0:09:50.040 --> 0:09:52.880
<v Speaker 2>In simpler terms, every user prompt on ChatGPT, whether

0:09:52.920 --> 0:09:55.920
<v Speaker 2>it's in auto, fast, thinking, or thinking mini, starts by

0:09:55.920 --> 0:09:59.120
<v Speaker 2>putting the user's prompt before the static prompt. I don't

0:09:59.160 --> 0:10:01.480
<v Speaker 2>want to lose you here. This is important. A static

0:10:01.480 --> 0:10:04.079
<v Speaker 2>prompt is the invisible instructions given by OpenAI to

0:10:04.160 --> 0:10:07.080
<v Speaker 2>ChatGPT, and the models themselves, and the tools associated

0:10:07.160 --> 0:10:09.800
<v Speaker 2>with them to tell them how to operate. Instructions like

0:10:09.840 --> 0:10:12.199
<v Speaker 2>you are ChatGPT, you're a large language model, you're

0:10:12.200 --> 0:10:14.720
<v Speaker 2>a helpful chat bot. Do not threaten them with a knife,

0:10:14.720 --> 0:10:17.280
<v Speaker 2>and so on and so forth. These static prompts are

0:10:17.280 --> 0:10:19.480
<v Speaker 2>different with each model you use. A reasoning model will

0:10:19.480 --> 0:10:22.400
<v Speaker 2>have a different instruction set to a more chat-focused one,

0:10:22.440 --> 0:10:24.760
<v Speaker 2>such as think harder about a particular problem before giving

0:10:24.800 --> 0:10:27.760
<v Speaker 2>an answer. Break down problems into component answers. When you

0:10:27.840 --> 0:10:30.200
<v Speaker 2>get a certain thing, like if someone asks you a

0:10:30.240 --> 0:10:33.080
<v Speaker 2>coding question, query a coding tool. That kind of thing.

0:10:33.760 --> 0:10:35.800
<v Speaker 2>A user prompt is exactly what it sounds like, the

0:10:35.840 --> 0:10:37.760
<v Speaker 2>thing that a user wants the AI model to do.

0:10:38.320 --> 0:10:40.560
<v Speaker 2>The new order in ChatGPT five becomes an issue

0:10:40.600 --> 0:10:43.080
<v Speaker 2>when you use multiple different models in the same conversation.

0:10:43.160 --> 0:10:45.199
<v Speaker 2>Because the router, the thing that selects the right model

0:10:45.200 --> 0:10:47.520
<v Speaker 2>for the request, has to look at the user prompt.

0:10:47.760 --> 0:10:50.800
<v Speaker 2>It can't consider static instructions first because they may be

0:10:50.840 --> 0:10:53.920
<v Speaker 2>different based on what the user asked. In fact, the

0:10:54.120 --> 0:10:56.000
<v Speaker 2>order has to be flipped for the whole thing to work.

0:10:56.679 --> 0:11:00.240
<v Speaker 2>Put simpler, previous versions of ChatGPT would take the

0:11:00.240 --> 0:11:03.360
<v Speaker 2>static prompt and then invisibly append the user prompt onto it.

0:11:03.400 --> 0:11:06.080
<v Speaker 2>This static prompt would typically be cached, massively reducing the

0:11:06.080 --> 0:11:08.040
<v Speaker 2>amount of compute the model needs to perform a task.
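
NOTE
A minimal sketch, in Python, of why prompt order matters for caching as described above. It assumes the cache works by reusing a shared prefix between requests, which is a common design for prompt caching; the episode does not describe OpenAI's internals. Whitespace splitting is a stand-in for real tokenization, and shared_prefix_len and the sample prompts are hypothetical.
```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading tokens that two requests have in common."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count
# Stand-in static prompt and two unrelated user prompts (all invented).
STATIC = "you are a helpful chatbot".split()
first_user = "what is a token".split()
second_user = "write me a poem".split()
# Static prompt first: its tokens form a shared prefix a cache could reuse.
static_first = shared_prefix_len(STATIC + first_user, STATIC + second_user)
# User prompt first: the requests differ from the first token, so nothing is shared.
user_first = shared_prefix_len(first_user + STATIC, second_user + STATIC)
print(static_first, user_first)
```
With the static prompt in front, every request starts with the same tokens; with the user prompt in front, two different questions share no prefix at all.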

0:11:08.559 --> 0:11:12.400
<v Speaker 2>ChatGPT cannot do this. Every time you use

0:11:12.440 --> 0:11:15.480
<v Speaker 2>ChatGPT five, every single thing you say or do can

0:11:15.520 --> 0:11:17.880
<v Speaker 2>cause it to do something different. Attach a file? Might

0:11:17.880 --> 0:11:20.080
<v Speaker 2>need a different model. Ask it to look into something

0:11:20.120 --> 0:11:22.600
<v Speaker 2>and be detailed? Might trigger a reasoning model or a

0:11:22.600 --> 0:11:26.600
<v Speaker 2>different depth of reasoning. Ask a question in a weird way? Sorry,

0:11:26.600 --> 0:11:27.880
<v Speaker 2>the router is going to need to send you to

0:11:27.880 --> 0:11:30.800
<v Speaker 2>a different model entirely each time, coming up with new

0:11:30.800 --> 0:11:33.839
<v Speaker 2>instructions based on the subtle interpretation of what you asked.

0:11:34.559 --> 0:11:36.600
<v Speaker 2>Every single thing that can happen when you ask

0:11:36.640 --> 0:11:39.280
<v Speaker 2>ChatGPT to do something may trigger the router to change model

0:11:39.400 --> 0:11:41.559
<v Speaker 2>or request a new tool, and each time it does

0:11:41.600 --> 0:11:44.680
<v Speaker 2>so requires a completely fresh static prompt, regardless of whether

0:11:44.679 --> 0:11:46.920
<v Speaker 2>you select auto, thinking, fast, or any other option on

0:11:47.040 --> 0:11:50.400
<v Speaker 2>ChatGPT. This in turn requires it to expend more

0:11:50.400 --> 0:11:53.640
<v Speaker 2>compute with queries consuming more tokens compared to previous versions.

0:11:54.960 --> 0:11:56.640
<v Speaker 2>It's like you started a job, and every time you

0:11:56.720 --> 0:11:58.800
<v Speaker 2>do a task, write an email, make a cup of coffee,

0:11:58.920 --> 0:12:03.440
<v Speaker 2>attend a meeting, email someone with a threat, your workplace

0:12:03.480 --> 0:12:06.640
<v Speaker 2>requires you to complete the entire mandatory onboarding training first.

0:12:06.760 --> 0:12:08.800
<v Speaker 2>Want to edit a spreadsheet? Not before you brush up

0:12:08.800 --> 0:12:13.040
<v Speaker 2>on your anti-bribery legislation first, you prick. As a result,

0:12:13.120 --> 0:12:16.160
<v Speaker 2>ChatGPT may be smart, but it doesn't really seem

0:12:16.160 --> 0:12:20.320
<v Speaker 2>efficient in the GPT five version. Now, to play devil's advocate,

0:12:20.480 --> 0:12:22.840
<v Speaker 2>OpenAI likely added the routing model as a means

0:12:22.840 --> 0:12:25.440
<v Speaker 2>of creating a more sophisticated output for a user, and

0:12:25.520 --> 0:12:28.959
<v Speaker 2>I imagine with the intention of cost saving. Then again,

0:12:29.000 --> 0:12:30.800
<v Speaker 2>this might just be the thing it had to ship.

0:12:30.920 --> 0:12:32.760
<v Speaker 2>After all, GPT five was meant to be the next

0:12:32.840 --> 0:12:35.000
<v Speaker 2>great leap in AI, and the pressure was on to

0:12:35.040 --> 0:12:37.480
<v Speaker 2>get it out the door. By creating a system that

0:12:37.520 --> 0:12:41.040
<v Speaker 2>depends on an external routing model, likely another LLM

0:12:41.080 --> 0:12:43.280
<v Speaker 2>in this case, OpenAI has removed the ability to

0:12:43.280 --> 0:12:46.200
<v Speaker 2>cache the hidden instructions that dictate how the models

0:12:46.240 --> 0:12:50.840
<v Speaker 2>generate answers in ChatGPT, creating massive infrastructural overhead. Worse still,

0:12:51.000 --> 0:12:53.880
<v Speaker 2>this happens with every single turn, as in message, on

0:12:53.960 --> 0:12:56.880
<v Speaker 2>ChatGPT five, regardless of the model you choose, creating

0:12:57.000 --> 0:12:59.800
<v Speaker 2>endless infrastructural baggage with no real way out that only

0:12:59.800 --> 0:13:02.880
<v Speaker 2>compounds based on how complex the user's queries get

0:13:02.920 --> 0:13:05.280
<v Speaker 2>or how much they change. They could be simple, but

0:13:05.400 --> 0:13:08.560
<v Speaker 2>just going in different directions every time. Could OpenAI

0:13:08.679 --> 0:13:10.800
<v Speaker 2>make a better router? Sure. Does it have a good

0:13:10.840 --> 0:13:13.959
<v Speaker 2>one today? No. Every time you message ChatGPT, it has the

0:13:13.960 --> 0:13:16.640
<v Speaker 2>potential to change model or tooling based on its own whims,

0:13:16.760 --> 0:13:19.200
<v Speaker 2>each time requiring a fresh static prompt, and short of

0:13:19.480 --> 0:13:22.240
<v Speaker 2>totally reworking the architecture of ChatGPT five, there's no

0:13:22.280 --> 0:13:25.280
<v Speaker 2>way to change this. And if it's an LLM choosing

0:13:25.320 --> 0:13:28.640
<v Speaker 2>which model, I don't know, maybe it hallucinates. Just a guess.

0:13:29.400 --> 0:13:30.840
<v Speaker 2>It doesn't even need to be the case where a

0:13:30.920 --> 0:13:33.560
<v Speaker 2>user asks ChatGPT five to think, and based on

0:13:33.600 --> 0:13:36.480
<v Speaker 2>my test with GPT five, sometimes you can just ask

0:13:36.480 --> 0:13:38.800
<v Speaker 2>it a straightforward question and it will think about it

0:13:38.800 --> 0:13:41.840
<v Speaker 2>for no apparent reason. OpenAI has created a product

0:13:41.840 --> 0:13:45.680
<v Speaker 2>with latency issues and an overwhelmingly convoluted routing system that's

0:13:45.720 --> 0:13:48.560
<v Speaker 2>already straining capacity, to the point that this announcement feels

0:13:48.640 --> 0:13:51.880
<v Speaker 2>like OpenAI is walking away from its API entirely. This,

0:13:52.000 --> 0:13:53.880
<v Speaker 2>as a reminder, is the thing that people use to

0:13:53.920 --> 0:13:56.800
<v Speaker 2>incorporate OpenAI's models into their apps while also running

0:13:56.800 --> 0:13:59.560
<v Speaker 2>said models on the infrastructure OpenAI rents from Microsoft

0:14:00.040 --> 0:14:02.400
<v Speaker 2>and CoreWeave, at some point, as well as Oracle.

0:14:03.200 --> 0:14:05.600
<v Speaker 2>And this API thing is really weird, by the way,

0:14:05.640 --> 0:14:08.559
<v Speaker 2>because these are new models, but OpenAI's really not

0:14:08.600 --> 0:14:11.760
<v Speaker 2>talking about the models themselves that much. Unlike the GPT

0:14:11.840 --> 0:14:14.840
<v Speaker 2>4o announcement, which mentions the API in the first paragraph,

0:14:14.920 --> 0:14:17.440
<v Speaker 2>the GPT five announcement has no reference to it and

0:14:17.520 --> 0:14:19.720
<v Speaker 2>only has a single reference to developers at all when

0:14:19.760 --> 0:14:22.560
<v Speaker 2>talking about coding. Sam Altman has already hinted that he

0:14:22.640 --> 0:14:25.680
<v Speaker 2>intends to deprecate any new API demand, though I imagine

0:14:25.680 --> 0:14:27.920
<v Speaker 2>he'll let in anyone who will pay for priority processing,

0:14:27.960 --> 0:14:31.400
<v Speaker 2>which is essentially OpenAI's way to require minimum commitments

0:14:31.400 --> 0:14:34.040
<v Speaker 2>and extra payments from API customers just so they never

0:14:34.120 --> 0:14:37.200
<v Speaker 2>feel the bite of any compute shortages and throttling, which

0:14:37.200 --> 0:14:40.520
<v Speaker 2>they absolutely will do to people that don't pay.

0:14:40.520 --> 0:14:43.000
<v Speaker 2>ChatGPT five feels like the ultimate comeuppance for a company

0:14:43.000 --> 0:14:45.040
<v Speaker 2>that has never been forced to build a product, choosing

0:14:45.120 --> 0:14:48.200
<v Speaker 2>instead to bolt increasingly complex tools onto the side of

0:14:48.280 --> 0:14:51.280
<v Speaker 2>models in the hopes that one will magically appear. Now,

0:14:51.360 --> 0:14:53.880
<v Speaker 2>each and every feature of ChatGPT burns more money

0:14:53.880 --> 0:14:56.760
<v Speaker 2>than it ever did before. ChatGPT five feels like

0:14:56.800 --> 0:14:58.600
<v Speaker 2>a product that was rushed to market by a desperate

0:14:58.600 --> 0:15:00.680
<v Speaker 2>company that had to get something out the door. In

0:15:00.720 --> 0:15:04.120
<v Speaker 2>simpler terms, here, it's actually really funny. When I worked

0:15:04.160 --> 0:15:07.200
<v Speaker 2>this out, I chuckled. I chuckled vigorously. This is just

0:15:07.240 --> 0:15:10.200
<v Speaker 2>a case where OpenAI has given ChatGPT a middle manager.

0:15:10.960 --> 0:15:12.640
<v Speaker 2>But now I'm giving you the chance to open up

0:15:12.680 --> 0:15:15.680
<v Speaker 2>your hearts and do something better. Open up your wallets too,

0:15:15.680 --> 0:15:18.800
<v Speaker 2>and send money to a company that follows here.

0:15:19.000 --> 0:15:38.280
<v Speaker 2>Behold my advertisements. And we're back. Like every great middle manager,

0:15:38.480 --> 0:15:41.280
<v Speaker 2>ChatGPT five's router creates more work based on its

0:15:41.320 --> 0:15:43.840
<v Speaker 2>own interpretation of what's going on, and as a separate

0:15:43.920 --> 0:15:45.960
<v Speaker 2>large language model, I can't imagine it has a ton

0:15:46.000 --> 0:15:48.520
<v Speaker 2>of training data available. If I had to guess, and

0:15:48.560 --> 0:15:51.080
<v Speaker 2>this is a guess, by the way, OpenAI has done,

0:15:51.120 --> 0:15:53.160
<v Speaker 2>and will do, a lot of fine tuning and reinforcement

0:15:53.240 --> 0:15:55.680
<v Speaker 2>learning to make it work. Though, to give it a

0:15:55.680 --> 0:15:57.640
<v Speaker 2>little grace, this is a new thing that it's doing,

0:15:57.680 --> 0:16:01.840
<v Speaker 2>and it's doing so at a huge scale. The problems start,

0:16:01.880 --> 0:16:03.600
<v Speaker 2>by the way, with the fact that ChatGPT five

0:16:03.680 --> 0:16:06.280
<v Speaker 2>is taking the user's initial prompt and then deciding which

0:16:06.280 --> 0:16:09.720
<v Speaker 2>model to use, unlike previous models, which sent your prompt

0:16:09.760 --> 0:16:11.920
<v Speaker 2>directly to the model along with the static prompt which

0:16:11.960 --> 0:16:13.880
<v Speaker 2>was cached and came first, an important feature in how

0:16:13.960 --> 0:16:17.080
<v Speaker 2>these models limit token burn. OpenAI starts with a router

0:16:17.160 --> 0:16:20.400
<v Speaker 2>model that takes what you ask and give

0:16:20.480 --> 0:16:22.560
<v Speaker 2>ChatGPT and tags it based on what kind of

0:16:22.640 --> 0:16:25.400
<v Speaker 2>thing your question might need. The thing might be a tool,

0:16:25.480 --> 0:16:27.400
<v Speaker 2>such as whether it has to do a web search

0:16:27.480 --> 0:16:30.360
<v Speaker 2>to spit out the thing at the end, a reasoning model,

0:16:30.520 --> 0:16:32.360
<v Speaker 2>whether it needs to use a coding language, and so

0:16:32.520 --> 0:16:35.760
<v Speaker 2>on and so forth. Once ChatGPT has bounced your

0:16:35.800 --> 0:16:38.800
<v Speaker 2>query across various models, burning compute along the way,

0:16:39.040 --> 0:16:41.600
<v Speaker 2>it then pushes it towards the chat portion of the generation.

0:16:42.080 --> 0:16:44.480
<v Speaker 2>And each time you ask ChatGPT a question or

0:16:44.600 --> 0:16:47.520
<v Speaker 2>to do something, a new specialized static prompt is generated,

0:16:47.800 --> 0:16:50.920
<v Speaker 2>sometimes several, making it impossible to cache them in advance.

0:16:51.240 --> 0:16:53.520
<v Speaker 2>In simpler terms, each time you message it, ChatGPT

0:16:53.640 --> 0:16:56.760
<v Speaker 2>has to dump all cached information and instructions for what

0:16:56.800 --> 0:16:59.120
<v Speaker 2>it needs to do and reload them with each prompt.
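
NOTE
A minimal sketch of why a fixed static prompt can be cached while a freshly
generated one cannot, assuming a toy cache keyed on the exact instruction text.
PrefixCache and simulate are hypothetical names, not OpenAI's implementation;
real prompt caching generally matches on prompt prefixes inside the serving stack.
```python
import hashlib
class PrefixCache:
    """Toy stand-in for a prompt cache keyed on the exact instruction text."""
    def __init__(self):
        self.seen = set()
        self.hits = 0
        self.misses = 0
    def lookup(self, static_prompt: str) -> bool:
        # Identical instruction text hashes to the same key and can be reused.
        key = hashlib.sha256(static_prompt.encode("utf-8")).hexdigest()
        if key in self.seen:
            self.hits += 1
            return True
        self.seen.add(key)
        self.misses += 1
        return False
def simulate(prompts):
    """Return (hits, misses) after sending each static prompt through a fresh cache."""
    cache = PrefixCache()
    for prompt in prompts:
        cache.lookup(prompt)
    return cache.hits, cache.misses
# One fixed static prompt reused across four messages (hypothetical text).
fixed = ["You are a helpful assistant."] * 4
# A new static prompt generated for every message (hypothetical text).
per_request = [f"Instructions tailored to request {i}" for i in range(4)]
print(simulate(fixed))        # (3, 1)
print(simulate(per_request))  # (0, 4)
```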

0:16:59.520 --> 0:17:02.120
<v Speaker 2>Now here's some examples of what ChatGPT-5 has

0:17:02.200 --> 0:17:04.879
<v Speaker 2>to reload every single time you prompt it: whether or

0:17:04.880 --> 0:17:06.560
<v Speaker 2>not to use a browser or search the internet, and

0:17:06.640 --> 0:17:09.200
<v Speaker 2>under what conditions to do so, because they will change

0:17:09.200 --> 0:17:12.040
<v Speaker 2>with each prompt. How to approach a particular problem based

0:17:12.080 --> 0:17:14.439
<v Speaker 2>on what the user asked, including any specific ways it's

0:17:14.480 --> 0:17:16.840
<v Speaker 2>meant to answer, such as tone, brevity, and so on, based on

0:17:16.920 --> 0:17:20.840
<v Speaker 2>their request. Specifics around how it might use, say, OpenAI's

0:17:20.880 --> 0:17:23.800
<v Speaker 2>Code Interpreter, such as the usage rules for running

0:17:23.800 --> 0:17:25.920
<v Speaker 2>a Python script, or how you want the code's output,

0:17:25.960 --> 0:17:28.359
<v Speaker 2>which again will be different based on each prompt. And

0:17:28.520 --> 0:17:30.199
<v Speaker 2>you can even say, do it in exactly the

0:17:30.200 --> 0:17:32.919
<v Speaker 2>same way, and because it's a large language model, it

0:17:32.960 --> 0:17:37.480
<v Speaker 2>may hallucinate something different. Every single goddamn time you prompt

0:17:37.560 --> 0:17:40.520
<v Speaker 2>ChatGPT-5, it has to do this. Worse still,

0:17:40.560 --> 0:17:43.480
<v Speaker 2>a particular conversation can involve you using multiple different models

0:17:43.520 --> 0:17:47.119
<v Speaker 2>and tools, requiring it, with each and every prompt,

0:17:47.119 --> 0:17:49.639
<v Speaker 2>to inject a different static prompt for each component that

0:17:49.720 --> 0:17:52.800
<v Speaker 2>ChatGPT-5 uses. And you can't cache the static

0:17:52.800 --> 0:17:54.760
<v Speaker 2>prompt before the user's intent because if you did that,

0:17:55.040 --> 0:17:57.040
<v Speaker 2>it might send an instruction to a model that doesn't

0:17:57.040 --> 0:17:59.199
<v Speaker 2>make sense, such as telling a reasoning model to give

0:17:59.200 --> 0:18:01.840
<v Speaker 2>a quick and simple one-line answer, or a mini or nano model to

0:18:01.880 --> 0:18:04.000
<v Speaker 2>do some sort of deep reasoning, which would create a

0:18:04.000 --> 0:18:07.920
<v Speaker 2>crappy answer and burn tokens in the process. And this

0:18:07.960 --> 0:18:10.040
<v Speaker 2>is all thanks to the complicated way that OpenAI

0:18:10.160 --> 0:18:14.400
<v Speaker 2>insisted on building GPT five. Every single time you send

0:18:14.480 --> 0:18:16.399
<v Speaker 2>something to ChatGPT, you can trigger it to use a

0:18:16.560 --> 0:18:21.199
<v Speaker 2>different series of models (audio, vision, reasoning), each with their

0:18:21.240 --> 0:18:24.680
<v Speaker 2>own instructions (static prompts), all while pulling in different tools, each

0:18:24.720 --> 0:18:27.359
<v Speaker 2>requiring their own instructions based on what you asked, and

0:18:27.440 --> 0:18:30.679
<v Speaker 2>reasoning models even have different depths of reasoning. Unlike

0:18:30.720 --> 0:18:33.800
<v Speaker 2>4o, which is a multimodal model combining text, vision,

0:18:33.800 --> 0:18:36.399
<v Speaker 2>and voice, GPT-5 is a rat king of OpenAI's

0:18:36.440 --> 0:18:38.720
<v Speaker 2>models and tools that gets reborn every single time you

0:18:38.760 --> 0:18:41.640
<v Speaker 2>ask it to do anything. It can prompt-cache

0:18:41.720 --> 0:18:45.199
<v Speaker 2>some things, but the core instructions, not so much. But

0:18:45.280 --> 0:18:47.600
<v Speaker 2>let's get a little more granular, because I know I've

0:18:47.720 --> 0:18:51.480
<v Speaker 2>been quite repetitive, but this is detailed. So from what

0:18:51.520 --> 0:18:53.879
<v Speaker 2>I've been told, there are either one or two models

0:18:53.880 --> 0:18:55.639
<v Speaker 2>at work for the routing. I'm going to go with

0:18:55.680 --> 0:18:57.600
<v Speaker 2>what I think is most likely based on the discussions

0:18:57.600 --> 0:19:00.640
<v Speaker 2>I've had with people familiar with the architecture. I've heard

0:19:00.680 --> 0:19:04.040
<v Speaker 2>the term orchestrator thrown around, potentially suggesting the

0:19:04.119 --> 0:19:06.840
<v Speaker 2>router may be more omnipresent throughout the process, but I

0:19:06.880 --> 0:19:09.479
<v Speaker 2>was unable to confirm its existence. Reach out if you

0:19:09.480 --> 0:19:12.480
<v Speaker 2>hear differently. I'll explain things as they were explained to me, though.

0:19:13.080 --> 0:19:15.760
<v Speaker 2>When a user sends a prompt, it goes through the splitter leg,

0:19:15.760 --> 0:19:18.480
<v Speaker 2>which decides to send the query on one of two paths.

0:19:18.760 --> 0:19:21.399
<v Speaker 2>One is called the fast path, where a query is straightforward,

0:19:21.400 --> 0:19:24.240
<v Speaker 2>such as a text only conversation that doesn't require any

0:19:24.400 --> 0:19:27.399
<v Speaker 2>analysis or extra tools, or "thinking," a path where the

0:19:27.440 --> 0:19:30.679
<v Speaker 2>query may require reasoning or more complex tools like code generation

0:19:30.800 --> 0:19:33.560
<v Speaker 2>or access to a web browser for research. To be clear,

0:19:33.640 --> 0:19:35.639
<v Speaker 2>there are prompts where it may be split into multiple

0:19:35.680 --> 0:19:38.320
<v Speaker 2>paths that trigger multiple models or tools, each requiring their

0:19:38.320 --> 0:19:41.720
<v Speaker 2>own static instructions. From what I understand, the splitter model

0:19:41.800 --> 0:19:44.480
<v Speaker 2>is a completely separate large language model, though we don't

0:19:44.480 --> 0:19:47.600
<v Speaker 2>have a ton of details about it. I also, based

0:19:47.600 --> 0:19:49.720
<v Speaker 2>on conversations I've had, think there's a chance there could

0:19:49.720 --> 0:19:52.000
<v Speaker 2>be a separate model that sits above the splitter that

0:19:52.080 --> 0:19:55.119
<v Speaker 2>does much lighter classification of how a query might be routed.
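
NOTE
A rough sketch of the fast-versus-thinking split described here, assuming simple
keyword rules in place of the splitter model, which the episode describes as a
separate large language model. Route, split, the keyword lists, and the component
names are all hypothetical and chosen only for illustration.
```python
from dataclasses import dataclass
from typing import List
@dataclass
class Route:
    path: str              # "fast" or "thinking"; labels are a reading of the episode
    components: List[str]  # hypothetical component names, each needing its own instructions
def split(query: str, has_attachment: bool = False) -> Route:
    """Keyword rules standing in for the splitter model; purely illustrative."""
    text = query.lower()
    components = []
    if has_attachment:
        components.append("vision")
    if any(word in text for word in ("search", "latest", "browse")):
        components.append("web_search")
    if any(word in text for word in ("code", "python", "script")):
        components.append("code_tool")
    if any(word in text for word in ("why", "decide", "compare", "prove")):
        components.append("reasoning")
    if not components:
        # Plain text conversation: no tools, no extra analysis.
        return Route("fast", ["chat"])
    return Route("thinking", components)
print(split("hello there"))
print(split("decide which player is best", has_attachment=True))
```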

0:19:55.160 --> 0:19:56.919
<v Speaker 2>So you ask it to do something, it might just

0:19:57.000 --> 0:20:00.359
<v Speaker 2>go, "Okay, this looks like it needs a tool," and

0:20:00.400 --> 0:20:02.600
<v Speaker 2>go off. Now, in any case, none of this

0:20:02.680 --> 0:20:05.240
<v Speaker 2>can be cached because all of this exists before inference,

0:20:05.400 --> 0:20:07.679
<v Speaker 2>which is, by the way, and I've misstated inference

0:20:07.720 --> 0:20:10.919
<v Speaker 2>in the past, like inferring meaning. Inference is

0:20:11.000 --> 0:20:14.240
<v Speaker 2>everything that happens to get an output to you. So

0:20:14.400 --> 0:20:17.320
<v Speaker 2>all of the stuff that's happening. And by the way,

0:20:17.359 --> 0:20:20.239
<v Speaker 2>this is all a completely new cost that OpenAI

0:20:20.359 --> 0:20:22.760
<v Speaker 2>has created. No one does it like this. It's so

0:20:22.840 --> 0:20:25.400
<v Speaker 2>fucking stupid. But now we get to the chat leg.

0:20:25.720 --> 0:20:27.919
<v Speaker 2>Now that OpenAI has added layers of abstraction, it

0:20:27.920 --> 0:20:30.000
<v Speaker 2>can begin cooking up the output, by which I mean

0:20:30.200 --> 0:20:32.560
<v Speaker 2>do inference. The chat leg is where the pieces that

0:20:32.600 --> 0:20:35.080
<v Speaker 2>the splitter model created are pulled together, each loaded

0:20:35.119 --> 0:20:38.159
<v Speaker 2>with their respective static prompts based on what the

0:20:38.240 --> 0:20:40.879
<v Speaker 2>user asked ChatGPT-5 to do. Each piece of

0:20:40.880 --> 0:20:43.080
<v Speaker 2>the model, whether a tool to generate Python, an image

0:20:43.119 --> 0:20:46.400
<v Speaker 2>generation tool, or a reasoning model, to generate an output has

0:20:46.440 --> 0:20:49.720
<v Speaker 2>to process an entirely new static prompt, and again, that's

0:20:49.760 --> 0:20:53.560
<v Speaker 2>every interaction. Remember, static prompts are effectively instructions. So the

0:20:53.560 --> 0:20:55.680
<v Speaker 2>splitter model has told each piece of the pie how

0:20:55.720 --> 0:20:58.280
<v Speaker 2>to act to create a particular output. As a result,

0:20:58.400 --> 0:20:59.960
<v Speaker 2>much of this can't be cached, creating more and more

0:21:00.160 --> 0:21:03.240
<v Speaker 2>repetitious token burn. And I do mean to repeat

0:21:03.280 --> 0:21:05.919
<v Speaker 2>this stuff so that you really get it. The upshot

0:21:05.920 --> 0:21:08.000
<v Speaker 2>of the chat leg's static prompt baggage is that you

0:21:08.040 --> 0:21:10.000
<v Speaker 2>can do a little more here, at least in theory,

0:21:10.200 --> 0:21:13.119
<v Speaker 2>because each component can be instructed separately, they can again,

0:21:13.160 --> 0:21:16.320
<v Speaker 2>in theory, be made to give more individualized, specialized outputs,

0:21:16.359 --> 0:21:18.440
<v Speaker 2>like creating an image with tags that is, as I'll

0:21:18.440 --> 0:21:21.080
<v Speaker 2>give you an example of very shortly, generated using a

0:21:21.119 --> 0:21:26.520
<v Speaker 2>specific reasoning model. I'm clutching at straws here. I don't

0:21:26.520 --> 0:21:29.360
<v Speaker 2>really know if this is better, but I'm trying to be reasonable.

0:21:29.400 --> 0:21:31.960
<v Speaker 2>I'm trying to be normal. Every day, I try and

0:21:32.000 --> 0:21:35.679
<v Speaker 2>be normal. Previously, OpenAI's advantage was that a model

0:21:35.760 --> 0:21:37.400
<v Speaker 2>like 4o was kind of a jack

0:21:37.440 --> 0:21:39.800
<v Speaker 2>of all trades. But to get the benefits of

0:21:39.840 --> 0:21:42.919
<v Speaker 2>ChatGPT-5, and that's in air quotes, it's engaged a

0:21:43.000 --> 0:21:46.680
<v Speaker 2>conductor model that can just make things more convoluted, even

0:21:46.720 --> 0:21:49.480
<v Speaker 2>in the case of simple requests. Let me give you

0:21:49.480 --> 0:21:52.520
<v Speaker 2>an example. You upload a chart of NFL players' stats

0:21:52.520 --> 0:21:55.240
<v Speaker 2>and ask ChatGPT to decide which is the best

0:21:55.240 --> 0:21:57.160
<v Speaker 2>of the group and create an image to show the results.

0:21:57.359 --> 0:21:59.880
<v Speaker 2>In GPT-4o, ChatGPT would use one model

0:22:00.160 --> 0:22:02.359
<v Speaker 2>and thus one static prompt to look at the image,

0:22:02.400 --> 0:22:04.520
<v Speaker 2>decide which tools to use, and then how to format

0:22:04.560 --> 0:22:07.600
<v Speaker 2>the response. You only needed one prompt, which was cached

0:22:07.640 --> 0:22:09.560
<v Speaker 2>because one model can look at the stats for all

0:22:09.600 --> 0:22:11.480
<v Speaker 2>the data and make the decisions and then use the

0:22:11.520 --> 0:22:15.160
<v Speaker 2>image generation tool to make the final image. In GPT-5,

0:22:15.240 --> 0:22:17.960
<v Speaker 2>the ChatGPT conductor model would see the stats, route

0:22:18.040 --> 0:22:20.640
<v Speaker 2>it to a vision model requiring its own static prompt,

0:22:20.680 --> 0:22:23.160
<v Speaker 2>then a separate text only reasoning model, one that has

0:22:23.160 --> 0:22:25.000
<v Speaker 2>no ability to use tools, but it might be cheaper

0:22:25.000 --> 0:22:27.919
<v Speaker 2>to get an answer from and also requires a static prompt,

0:22:28.640 --> 0:22:30.960
<v Speaker 2>and that would then decide which players are best and

0:22:31.000 --> 0:22:32.760
<v Speaker 2>then spit out an output, and then route it to

0:22:32.800 --> 0:22:35.680
<v Speaker 2>a completely separate model that can generate text to query

0:22:35.720 --> 0:22:39.199
<v Speaker 2>the image tool (again, needing a static prompt for this)

0:22:39.359 --> 0:22:41.600
<v Speaker 2>to then generate the image. On top of all this

0:22:41.680 --> 0:22:44.920
<v Speaker 2>onerous baggage lies another problem: GPT-5's various models

0:22:44.920 --> 0:22:48.160
<v Speaker 2>are just more complex. By splitting out the component elements

0:22:48.160 --> 0:22:50.080
<v Speaker 2>of what a model can do and allowing each model

0:22:50.119 --> 0:22:52.600
<v Speaker 2>to have different levels of reasoning, even the cheaper ones

0:22:52.640 --> 0:22:55.399
<v Speaker 2>like mini and nano, OpenAI has created an endless

0:22:55.440 --> 0:22:57.960
<v Speaker 2>combination of different reasons to have to make a brand

0:22:58.000 --> 0:23:01.520
<v Speaker 2>new static prompt instruction, automated by a router, a large

0:23:01.560 --> 0:23:04.000
<v Speaker 2>language model that chooses what large language model to choose

0:23:04.000 --> 0:23:08.760
<v Speaker 2>for a query. It is, if I'm honest, kind of funny.

0:23:08.960 --> 0:23:12.040
<v Speaker 2>Reasoning models work, when simply described, by breaking up a

0:23:12.040 --> 0:23:14.480
<v Speaker 2>prompt into component pieces, looking over them, and deciding what

0:23:14.520 --> 0:23:17.320
<v Speaker 2>the best course of action might be. ChatGPT's router

0:23:17.359 --> 0:23:19.920
<v Speaker 2>is effectively an abstraction higher, breaking up the prompt into

0:23:19.920 --> 0:23:22.679
<v Speaker 2>component pieces, then choosing different models for each of those pieces,

0:23:22.680 --> 0:23:26.000
<v Speaker 2>which may in turn be broken up by a reasoning model.

0:23:26.240 --> 0:23:28.119
<v Speaker 2>While I wouldn't say this is a hat on a

0:23:28.160 --> 0:23:31.119
<v Speaker 2>hat situation, it is at this point unclear what exactly

0:23:31.200 --> 0:23:35.320
<v Speaker 2>the benefits of ChatGPT-5's new architecture are. Less hallucinations?

0:23:35.640 --> 0:23:38.440
<v Speaker 2>Better answers? Based on what I've been told, this was

0:23:38.480 --> 0:23:41.480
<v Speaker 2>a decision made to increase the model's performance. What I

0:23:41.520 --> 0:23:43.919
<v Speaker 2>can say is that this very likely increased OpenAI's

0:23:43.960 --> 0:23:45.560
<v Speaker 2>overhead at a time when it needs to do the

0:23:45.600 --> 0:23:49.040
<v Speaker 2>exact opposite. Even if ChatGPT-5 pushes people towards

0:23:49.119 --> 0:23:51.920
<v Speaker 2>cheaper models, it does so while guaranteeing extra costs and

0:23:52.000 --> 0:23:55.400
<v Speaker 2>latency, and whatever signals it may learn as people use

0:23:55.440 --> 0:23:59.200
<v Speaker 2>this will have to create significant benefits, massive one hundred

0:23:59.240 --> 0:24:01.720
<v Speaker 2>percent plus gains, for it to be anything close to worthwhile.

0:24:02.400 --> 0:24:04.359
<v Speaker 2>While OpenAI's router may be smart in

0:24:04.480 --> 0:24:06.560
<v Speaker 2>terms of nuance of how it might answer a query,

0:24:06.600 --> 0:24:09.600
<v Speaker 2>and even that I question, it most decidedly is not

0:24:09.720 --> 0:24:12.119
<v Speaker 2>more efficient and may have actually increased the burn rate

0:24:12.160 --> 0:24:13.680
<v Speaker 2>for a company that will lose as much as eight

0:24:13.720 --> 0:24:16.400
<v Speaker 2>billion dollars this year, and I think that number might

0:24:16.440 --> 0:24:19.840
<v Speaker 2>be low too. Yet what I'm left with in writing

0:24:19.880 --> 0:24:23.080
<v Speaker 2>this script is how wasteful all of this is. OpenAI,

0:24:23.400 --> 0:24:26.439
<v Speaker 2>a company that has already incinerated upwards of fifteen billion

0:24:26.480 --> 0:24:28.840
<v Speaker 2>dollars in the last two years, has chosen to create

0:24:28.880 --> 0:24:31.000
<v Speaker 2>a less efficient way of doing business as a means

0:24:31.000 --> 0:24:35.040
<v Speaker 2>of eking out modest, at best, performance improvements. It

0:24:35.160 --> 0:24:38.359
<v Speaker 2>just sucks. In our own lives, we're continually pushed and

0:24:38.400 --> 0:24:40.960
<v Speaker 2>pressured and punished if we get into debt, judged by

0:24:40.960 --> 0:24:43.480
<v Speaker 2>our peers and our parents, if we spend our money recklessly,

0:24:43.640 --> 0:24:45.920
<v Speaker 2>and if we're too reckless, we find ourselves less likely

0:24:45.960 --> 0:24:49.600
<v Speaker 2>to receive anything from credit to housing. Companies like

0:24:49.600 --> 0:24:52.560
<v Speaker 2>OpenAI live by a different set of standards. Sam Altman

0:24:52.640 --> 0:24:54.959
<v Speaker 2>intends to lose more than forty four billion dollars by

0:24:55.000 --> 0:24:57.080
<v Speaker 2>the end of twenty twenty eight on OpenAI, and

0:24:57.119 --> 0:25:00.639
<v Speaker 2>graciously told CNBC, like Lord Farquaad, he was willing to

0:25:00.720 --> 0:25:02.919
<v Speaker 2>run at a loss for a long time, where he

0:25:03.000 --> 0:25:06.000
<v Speaker 2>was treated like he was this smart, reasonable decision maker

0:25:06.080 --> 0:25:08.920
<v Speaker 2>rather than someone that needs to rein in their horrendous

0:25:09.000 --> 0:25:12.560
<v Speaker 2>spending habits and be more mindful. The ultra rich are

0:25:12.720 --> 0:25:15.280
<v Speaker 2>rewarded far more for their errant spending habits than we

0:25:15.320 --> 0:25:18.160
<v Speaker 2>ever are for any thriftiness or austerity measures we make,

0:25:18.600 --> 0:25:20.600
<v Speaker 2>and none of us are afforded the level of grace

0:25:20.640 --> 0:25:24.720
<v Speaker 2>that Clammy Sam Altman has been, and has-been feels appropriate.

0:25:25.240 --> 0:25:28.960
<v Speaker 2>ChatGPT-5 is an engineering nightmare, a phenomenally silly

0:25:29.000 --> 0:25:31.240
<v Speaker 2>and desperate attempt to juice what remains of the dying

0:25:31.280 --> 0:25:34.760
<v Speaker 2>innovation and excitement within the walls of OpenAI. It's

0:25:34.800 --> 0:25:37.480
<v Speaker 2>not November twenty twenty two anymore. And let's be honest,

0:25:37.480 --> 0:25:39.959
<v Speaker 2>there really hasn't been anything exciting or interesting out of this

0:25:40.000 --> 0:25:44.560
<v Speaker 2>company since GPT four. There's nothing exciting happening at this company.

0:25:45.080 --> 0:25:47.600
<v Speaker 2>As many as seven hundred million people a week allegedly

0:25:47.680 --> 0:25:50.320
<v Speaker 2>use ChatGPT, but nobody can really say why. And

0:25:50.320 --> 0:25:53.720
<v Speaker 2>OpenAI, despite its massive popularity, cannot seem to stop

0:25:53.760 --> 0:25:56.880
<v Speaker 2>losing billions of dollars, and it can't seem to explain

0:25:56.920 --> 0:26:00.399
<v Speaker 2>why that's necessary other than "this shit's really expensive, dude."

0:26:00.960 --> 0:26:03.480
<v Speaker 2>Can anyone actually articulate a reason why we need to

0:26:03.480 --> 0:26:05.960
<v Speaker 2>burn billions of dollars to do this? What are we doing?

0:26:06.080 --> 0:26:08.240
<v Speaker 2>Why are we doing it? Has everybody just agreed to

0:26:08.280 --> 0:26:11.080
<v Speaker 2>do this until it becomes completely untenable? Do we

0:26:11.119 --> 0:26:13.080
<v Speaker 2>all yearn for the abyss so much that we can't

0:26:13.080 --> 0:26:17.359
<v Speaker 2>find camaraderie in admitting we were wrong? Look at GPT-5.

0:26:17.880 --> 0:26:20.399
<v Speaker 2>This is, if you believe the hype, the best funded,

0:26:20.440 --> 0:26:23.320
<v Speaker 2>best resourced company in the world, with the greatest mind

0:26:23.359 --> 0:26:26.080
<v Speaker 2>at its helm and the greatest minds within its walls.

0:26:26.240 --> 0:26:28.600
<v Speaker 2>And this is the best they've got: a large language

0:26:28.640 --> 0:26:31.480
<v Speaker 2>model that chooses which large language model will answer your question.

0:26:32.200 --> 0:26:35.160
<v Speaker 2>Gee fucking whiz, Sam Altman, sounds dandy. And how much

0:26:35.240 --> 0:26:37.800
<v Speaker 2>better is this, you say? Oh, you can't really say?

0:26:38.119 --> 0:26:40.400
<v Speaker 2>Fucking brilliant. Hey, does it do anything new?

0:26:40.840 --> 0:26:40.879
<v Speaker 3>No?

0:26:41.720 --> 0:26:44.560
<v Speaker 2>Oh, what's that? It's actually our job to work out

0:26:44.560 --> 0:26:48.040
<v Speaker 2>for ourselves. Thanks man, I love it. I love this shit.

0:26:48.200 --> 0:26:50.560
<v Speaker 2>And if you're someone that is a hype merchant listening

0:26:50.560 --> 0:26:52.200
<v Speaker 2>to this and you've done really well getting to the

0:26:52.280 --> 0:26:54.119
<v Speaker 2>end of the third part. By the way, I respect you.

0:26:54.440 --> 0:26:56.639
<v Speaker 2>I want you to email me and explain why they

0:26:56.680 --> 0:26:59.159
<v Speaker 2>should be justified in burning billions of dollars. If you

0:26:59.280 --> 0:27:03.040
<v Speaker 2>tell me, if you tell me AWS, I will eat

0:27:03.080 --> 0:27:06.000
<v Speaker 2>you alive. I mean that, does it? I mean that

0:27:06.240 --> 0:27:10.399
<v Speaker 2>completely literally, I will unhinge my jaw. I'll eat you

0:27:10.520 --> 0:27:12.359
<v Speaker 2>like Kirby and shit out of dance. I've said that

0:27:12.359 --> 0:27:15.600
<v Speaker 2>one before, but I'm going with it. In any case,

0:27:16.400 --> 0:27:19.119
<v Speaker 2>This three parter has also really reminded me how ridiculous

0:27:19.160 --> 0:27:23.120
<v Speaker 2>this is, how nonsensical things have become, and how much

0:27:23.200 --> 0:27:27.920
<v Speaker 2>waste has been kind of justified, justified on this idea

0:27:27.960 --> 0:27:30.200
<v Speaker 2>that this will become something by people that don't really

0:27:30.200 --> 0:27:32.240
<v Speaker 2>know what it does today or might do in the future.

0:27:32.840 --> 0:27:34.919
<v Speaker 2>None of this is going to end well, and not

0:27:34.960 --> 0:27:38.080
<v Speaker 2>even the boosters seem to be having fun anymore. Everybody's

0:27:38.160 --> 0:27:40.640
<v Speaker 2>just flailing around waiting for it to end. Even Sam

0:27:40.720 --> 0:27:43.600
<v Speaker 2>Altman seems tired of it all. I know I bloody

0:27:43.600 --> 0:27:54.359
<v Speaker 2>well am. Thank you for listening to Better Offline.

0:27:54.480 --> 0:27:56.920
<v Speaker 3>The editor and composer of the Better Offline theme song

0:27:57.000 --> 0:27:59.639
<v Speaker 3>is Mattosowski. You can check out more of his music

0:27:59.640 --> 0:28:03.320
<v Speaker 3>and audio projects at mattosowski dot com, M A T

0:28:03.320 --> 0:28:07.760
<v Speaker 3>T O S O W S K I dot com. You

0:28:07.800 --> 0:28:10.320
<v Speaker 3>can email me at ez at betteroffline dot com

0:28:10.400 --> 0:28:12.720
<v Speaker 3>or visit better offline dot com to find more podcast

0:28:12.760 --> 0:28:16.080
<v Speaker 3>links and of course, my newsletter. I also really recommend

0:28:16.119 --> 0:28:18.080
<v Speaker 3>you go to chat dot wheresyoured dot at to

0:28:18.160 --> 0:28:20.520
<v Speaker 3>visit the Discord, and go to r slash

0:28:20.200 --> 0:28:23.359
<v Speaker 2>BetterOffline to check out our Reddit. Thank you so

0:28:23.440 --> 0:28:26.880
<v Speaker 2>much for listening. Better Offline is a production of Cool

0:28:26.960 --> 0:28:29.719
<v Speaker 2>Zone Media. For more from Cool Zone Media, visit our

0:28:29.760 --> 0:28:32.760
<v Speaker 2>website coolzonemedia dot com, or check us out on

0:28:32.840 --> 0:28:36.639
<v Speaker 2>the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.