1
00:00:02,800 --> 00:00:03,560
Speaker 1: Cool Zone Media.

2
00:00:05,320 --> 00:00:07,880
Speaker 2: Hi, my name's Ed Zitron, and welcome to Better Offline.

3
00:00:07,880 --> 00:00:22,280
Speaker 2: This is also Jackass. So you've just had a cheery

4
00:00:22,280 --> 00:00:25,080
Speaker 2: two-part chucklefest about how generative AI may tank our

5
00:00:25,239 --> 00:00:27,319
Speaker 2: markets and our economy. So I'm going to give you

6
00:00:27,320 --> 00:00:30,319
Speaker 2: a lighter one: an episode about GPT five, which is

7
00:00:30,360 --> 00:00:33,080
Speaker 2: a model from OpenAI, and why just under three

8
00:00:33,159 --> 00:00:35,360
Speaker 2: years of hype have led to the software equivalent of

9
00:00:35,360 --> 00:00:38,239
Speaker 2: the launch of St. Anger, except every time Lars hit

10
00:00:38,280 --> 00:00:41,800
Speaker 2: the snare drum it cost them fifty-five thousand dollars. Now,

11
00:00:41,800 --> 00:00:44,480
Speaker 2: if we look at the positive reviews, we see takes

12
00:00:44,600 --> 00:00:48,000
Speaker 2: ranging from Simon Willison's tempered remark that GPT five is

13
00:00:48,240 --> 00:00:51,280
Speaker 2: just good at stuff to SemiAnalysis's completely

14
00:00:51,320 --> 00:00:54,800
Speaker 2: insane statement that GPT five is setting the stage for

15
00:00:54,880 --> 00:00:59,920
Speaker 2: ad monetization and the OpenAI ChatGPT super app.

16
00:01:00,120 --> 00:01:02,440
Speaker 2: In a piece that makes several assertions about how the

17
00:01:02,520 --> 00:01:05,520
Speaker 2: router that underpins GPT five is somehow the secret way

18
00:01:05,600 --> 00:01:09,959
Speaker 2: that OpenAI will inject ads, which is just distinctly silly.

19
00:01:10,080 --> 00:01:13,400
Speaker 2: It's I'll get into this in the episode a little bit,

20
00:01:13,400 --> 00:01:15,480
Speaker 2: but just with everything you're going to hear, you're going

21
00:01:15,560 --> 00:01:18,160
Speaker 2: to realize that this is just someone just saying stuff.

22
00:01:18,200 --> 00:01:21,120
Speaker 2: Took four bylines to do that shit too. I'm also British.

23
00:01:21,120 --> 00:01:23,080
Speaker 2: I'm gonna say router. I might say router as well,

24
00:01:23,120 --> 00:01:24,760
Speaker 2: because I've been here a while. Make fun of my

25
00:01:24,840 --> 00:01:27,399
Speaker 2: voice if you really must. But with that out the way,

26
00:01:27,440 --> 00:01:30,640
Speaker 2: here's a quote from SemiAnalysis's coverage: Before the router,

27
00:01:30,720 --> 00:01:32,640
Speaker 2: there was no way for a query to be distinguished,

28
00:01:32,680 --> 00:01:35,880
Speaker 2: and after the router, the first low value query could

29
00:01:35,920 --> 00:01:38,679
Speaker 2: be routed to a GPT five mini model that can

30
00:01:38,760 --> 00:01:41,959
Speaker 2: answer with zero tool calls and no reasoning. This likely

31
00:01:41,959 --> 00:01:44,160
Speaker 2: means serving this user is approaching the cost of a

32
00:01:44,200 --> 00:01:48,120
Speaker 2: search query. This does not make any sense. This None

33
00:01:48,120 --> 00:01:50,480
Speaker 2: of this makes it like it's just a bunch of assumptions.

34
00:01:50,600 --> 00:01:53,120
Speaker 2: Why would this be the case? The article also makes

35
00:01:53,120 --> 00:01:54,840
Speaker 2: a lot of claims about the value of a question

36
00:01:54,920 --> 00:01:58,440
Speaker 2: and how ChatGPT could, I am serious, agent,

37
00:01:59,000 --> 00:02:02,000
Speaker 2: agentically reach out to lawyers. I'm not going to edit

38
00:02:02,040 --> 00:02:04,560
Speaker 2: that out because agentically is not a fun word to say.

39
00:02:05,640 --> 00:02:07,760
Speaker 2: It is just complete nonsense, and in fact, I'm not

40
00:02:07,840 --> 00:02:11,320
Speaker 2: sure this piece reflects how GPT five even works at all. Again,

41
00:02:11,400 --> 00:02:14,520
Speaker 2: quoting it, the router serves multiple purposes on both the

42
00:02:14,560 --> 00:02:17,320
Speaker 2: cost and performance side. On the cost side, routing users

43
00:02:17,320 --> 00:02:19,400
Speaker 2: to mini versions of each model allows OpenAI to

44
00:02:19,440 --> 00:02:22,480
Speaker 2: service users at a lower cost, or with lower costs.

45
00:02:22,520 --> 00:02:25,160
Speaker 2: Even to be fair on SemiAnalysis, it's not as

46
00:02:25,200 --> 00:02:27,920
Speaker 2: if OpenAI gave them much help. OpenAI's official

47
00:02:27,919 --> 00:02:31,520
Speaker 2: writings about the router aren't exactly filled with details, talking

48
00:02:31,560 --> 00:02:34,000
Speaker 2: in glowing terms about what it does, but not how.

49
00:02:34,480 --> 00:02:38,440
Speaker 2: Here's what they say: ChatGPT's real-time router quickly

50
00:02:38,480 --> 00:02:41,760
Speaker 2: decides which model to use based on the conversation type, complexity,

51
00:02:41,800 --> 00:02:44,640
Speaker 2: tool needs, and your explicit intent. For example, if you

52
00:02:44,720 --> 00:02:47,520
Speaker 2: say think hard about this in the prompt. The router

53
00:02:47,600 --> 00:02:51,480
Speaker 2: is continuously trained on real signals, including when users switch models,

54
00:02:51,520 --> 00:02:56,200
Speaker 2: preference rates for responses, and measured correctness, improving over time.

55
00:02:56,600 --> 00:02:59,120
Speaker 2: Once usage limits are reached, a mini version of each

56
00:02:59,160 --> 00:03:02,040
Speaker 2: model handles remaining queries. In the near future, we plan

57
00:03:02,080 --> 00:03:05,280
Speaker 2: to integrate these capabilities into a single model. And that

58
00:03:05,360 --> 00:03:08,359
Speaker 2: last bit really doesn't make sense, but in any case,

59
00:03:08,400 --> 00:03:11,760
Speaker 2: the launch of GPT five has been very, very weird. At first,

60
00:03:11,760 --> 00:03:14,120
Speaker 2: some people seemed really happy about it, chief of them

61
00:03:14,120 --> 00:03:16,640
Speaker 2: software YouTuber Theo Browne, who has over four hundred

62
00:03:16,680 --> 00:03:19,520
Speaker 2: and sixty eight thousand subscribers. He's also known as theogg

63
00:03:19,760 --> 00:03:20,560
Speaker 2: who said.

64
00:03:20,840 --> 00:03:24,400
Speaker 1: I didn't know it could get this good. This was

65
00:03:24,520 --> 00:03:29,280
Speaker 1: kind of the like oh fuck moment for me in

66
00:03:29,320 --> 00:03:33,040
Speaker 1: a lot of ways, and I've had to fight like

67
00:03:33,120 --> 00:03:38,560
Speaker 1: a slow spiral into insanity. It's a really really good model.

68
00:03:39,600 --> 00:03:41,120
Speaker 2: He finished by saying, and.

69
00:03:41,120 --> 00:03:42,960
Speaker 1: Keep an eye on your job because I don't know

70
00:03:43,000 --> 00:03:44,840
Speaker 1: what this means for us long term.

71
00:03:45,360 --> 00:03:48,480
Speaker 2: Pretty crazy, right? Comments on the video included people saying

72
00:03:48,520 --> 00:03:51,200
Speaker 2: things like if OpenAI is helding you hostage, blink

73
00:03:51,280 --> 00:03:54,200
Speaker 2: twice, and yes, that is a verbatim quote. Another saying

74
00:03:54,240 --> 00:03:57,040
Speaker 2: this dude is everything wrong in IT today. Another saying

75
00:03:57,080 --> 00:03:59,600
Speaker 2: this video was sponsored by OpenAI. Another saying

76
00:03:59,800 --> 00:04:02,360
Speaker 2: GPT five failed every test project I gave it today.

77
00:04:02,440 --> 00:04:04,640
Speaker 2: It's a lie in my experience. Maybe they haven't ramped

78
00:04:04,720 --> 00:04:08,040
Speaker 2: up the GPUs. Now, from what I can tell, Theo

79
00:04:08,160 --> 00:04:10,800
Speaker 2: Browne played with GPT five in OpenAI's offices and

80
00:04:10,800 --> 00:04:14,640
Speaker 2: did all the benchmarking there. OpenAI, by the way,

81
00:04:14,880 --> 00:04:19,520
Speaker 2: fucking how? Come on. You can't benchmark in their offices. Anyway,

82
00:04:19,560 --> 00:04:22,599
Speaker 2: OpenAI's API-based access to GPT five models, you

83
00:04:22,640 --> 00:04:24,000
Speaker 2: know the thing that you use if you want to

84
00:04:24,000 --> 00:04:26,720
Speaker 2: integrate GPT into your app, does not route them, by

85
00:04:26,760 --> 00:04:29,000
Speaker 2: the way, nor does OpenAI offer access to its

86
00:04:29,080 --> 00:04:32,440
Speaker 2: router or any associated models. Important detail. Just want you

87
00:04:32,480 --> 00:04:34,400
Speaker 2: to know that because we need to make sure very

88
00:04:34,400 --> 00:04:37,080
Speaker 2: clear. Now, a week later, Theo Browne would put out

89
00:04:37,120 --> 00:04:39,680
Speaker 2: another video called I was wrong about GPT five, which

90
00:04:39,960 --> 00:04:41,560
Speaker 2: he would open by saying.

91
00:04:41,880 --> 00:04:43,760
Speaker 1: So first and foremost, I want to make sure it

92
00:04:43,800 --> 00:04:47,359
Speaker 1: is very very clear that the experience that you probably

93
00:04:47,400 --> 00:04:50,000
Speaker 1: are having with ChatGPT and GPT five right now

94
00:04:50,400 --> 00:04:52,760
Speaker 1: is not the experience that I had when I was

95
00:04:52,760 --> 00:04:53,600
Speaker 1: first testing it.

96
00:04:53,960 --> 00:04:55,880
Speaker 2: Browne goes on to explain that he was not paid

97
00:04:55,880 --> 00:04:59,120
Speaker 2: by OpenAI at all, that he was sincerely impressed

98
00:04:59,120 --> 00:05:01,599
Speaker 2: by the company and GPT five, and that he'd actually

99
00:05:01,680 --> 00:05:04,200
Speaker 2: spent over twenty five thousand dollars in inference testing it

100
00:05:04,240 --> 00:05:06,720
Speaker 2: on his own company's software, and indeed also that he

101
00:05:06,800 --> 00:05:10,280
Speaker 2: turned down a grand appearance fee. Sorry, I mean that's

102
00:05:10,320 --> 00:05:13,160
Speaker 2: a very British thing, one thousand dollars appearance fee, not

103
00:05:13,240 --> 00:05:16,160
Speaker 2: just like a really nice one. Browne claims he asked

104
00:05:16,160 --> 00:05:18,240
Speaker 2: OpenAI to try it out, and after they declined

105
00:05:18,279 --> 00:05:20,240
Speaker 2: to let him test it early on his own, he

106
00:05:20,360 --> 00:05:22,159
Speaker 2: was invited to try it on camera with a small

107
00:05:22,200 --> 00:05:24,679
Speaker 2: group of other people at OpenAI's offices, where they'd film

108
00:05:24,720 --> 00:05:27,919
Speaker 2: his reactions. He said that the API was incredible, but

109
00:05:28,000 --> 00:05:30,039
Speaker 2: that it's become apparent that the models he was using

110
00:05:30,080 --> 00:05:31,799
Speaker 2: in the video were not the same as those released

111
00:05:31,839 --> 00:05:34,200
Speaker 2: to the public, making a post on August thirteenth on

112
00:05:34,440 --> 00:05:37,000
Speaker 2: X, the Everything App, that GPT five was nowhere near as

113
00:05:37,040 --> 00:05:39,360
Speaker 2: good in Cursor as it

114
00:05:39,440 --> 00:05:40,960
Speaker 2: was when he was using it a few weeks ago,

115
00:05:41,040 --> 00:05:43,760
Speaker 2: complaining that things that worked while demoing it at OpenAI

116
00:05:43,800 --> 00:05:47,159
Speaker 2: no longer did, adding that there was somebody

117
00:05:47,160 --> 00:05:49,680
Speaker 2: else on Twitter that said they'd had a similarly great

118
00:05:49,720 --> 00:05:53,560
Speaker 2: experience with GPT five on launch that has since decayed. It

119
00:05:53,640 --> 00:05:55,880
Speaker 2: isn't completely clear what happened here, but I'm going to

120
00:05:55,880 --> 00:05:58,040
Speaker 2: guess that OpenAI showed Theo Browne and others in

121
00:05:58,080 --> 00:06:01,200
Speaker 2: their offices some sort of heavily modded version of the

122
00:06:01,200 --> 00:06:04,560
Speaker 2: model that burns significantly more compute to provide its outputs,

123
00:06:04,680 --> 00:06:07,599
Speaker 2: though I'm also very suspicious of how significant the difference

124
00:06:07,640 --> 00:06:11,040
Speaker 2: is here. Browne's videos attempt to show the difference between

125
00:06:11,080 --> 00:06:12,840
Speaker 2: the generations that you received from the model when it

126
00:06:12,880 --> 00:06:14,920
Speaker 2: was good and when it was bad. In this video,

127
00:06:15,160 --> 00:06:17,000
Speaker 2: which I'll include a link to in the episode notes.

128
00:06:17,000 --> 00:06:20,280
Speaker 2: But if I'm honest, they look pretty similar in that

129
00:06:20,279 --> 00:06:23,440
Speaker 2: they're kind of mediocre. I'm not saying that as a hater,

130
00:06:23,480 --> 00:06:25,120
Speaker 2: by the way. They just kind of look like shit.

131
00:06:26,000 --> 00:06:28,080
Speaker 2: It's just kind of okay, like shit. They look like

132
00:06:28,160 --> 00:06:31,240
Speaker 2: regular fucking generated websites. They don't look special. The good

133
00:06:31,279 --> 00:06:34,839
Speaker 2: one is fine, and the bad one has weird gradients

134
00:06:34,880 --> 00:06:37,919
Speaker 2: on it. This whole thing sucks, though, and was a

135
00:06:37,960 --> 00:06:41,000
Speaker 2: clear setup by OpenAI to overstate the abilities

136
00:06:41,000 --> 00:06:43,320
Speaker 2: of GPT five, one that fell apart with the lightest

137
00:06:43,320 --> 00:06:46,480
Speaker 2: brush with reality. I imagine their assumption was that Browne

138
00:06:46,480 --> 00:06:48,720
Speaker 2: would post the glossy video and then walk away, and

139
00:06:48,760 --> 00:06:51,320
Speaker 2: I give Theo some credit for straight-up stating he

140
00:06:51,360 --> 00:06:53,919
Speaker 2: was misled. This was a desperate move and one that

141
00:06:53,960 --> 00:06:56,000
Speaker 2: blew up in the face of OpenAI, along with

142
00:06:56,040 --> 00:06:58,919
Speaker 2: the rest of the GPT five launch. People hate the model,

143
00:06:59,000 --> 00:07:01,960
Speaker 2: customers are mad at it for taking models away like 4o

144
00:07:02,040 --> 00:07:04,560
Speaker 2: and have remained mad even with their return, and

145
00:07:04,600 --> 00:07:07,919
Speaker 2: the ChatGPT subreddit is almost entirely people complaining about

146
00:07:08,320 --> 00:07:11,320
Speaker 2: how ineffective the new version is and how even GPT

147
00:07:11,360 --> 00:07:13,760
Speaker 2: 4o is not the same. They got game of

148
00:07:13,760 --> 00:07:16,640
Speaker 2: brain Baby. As I said in last week's monologue. I

149
00:07:16,680 --> 00:07:18,800
Speaker 2: believe OpenAI has grown a fandom rather than any

150
00:07:18,840 --> 00:07:21,880
Speaker 2: kind of sustainable product market fit, and they're now suffering

151
00:07:21,920 --> 00:07:24,520
Speaker 2: fandom-like hate with every minor change they make in

152
00:07:24,520 --> 00:07:27,680
Speaker 2: an attempt to push GPT five further, further aggravating people

153
00:07:27,680 --> 00:07:30,640
Speaker 2: that barely understand why they use the product to begin with. Yet at

154
00:07:30,760 --> 00:07:33,720
Speaker 2: the center of the anger lay the reason for GPT

155
00:07:33,800 --> 00:07:36,520
Speaker 2: five's launch, the belief that this was somehow a cost

156
00:07:36,520 --> 00:07:39,240
Speaker 2: cutting measure, where OpenAI had added a router to ChatGPT

157
00:07:39,280 --> 00:07:41,920
Speaker 2: as a means of sending certain requests to cheaper

158
00:07:41,920 --> 00:07:45,080
Speaker 2: models to save money. But when I hear router, I

159
00:07:45,160 --> 00:07:47,680
Speaker 2: hear latency, and I never for even a second believed

160
00:07:47,720 --> 00:07:49,760
Speaker 2: that this would somehow be cheaper to run. It didn't

161
00:07:49,760 --> 00:07:52,720
Speaker 2: make sense. I'm a curious little critter, so I went

162
00:07:52,760 --> 00:07:55,920
Speaker 2: and found out how ChatGPT five actually works, and

163
00:07:56,040 --> 00:07:59,160
Speaker 2: unlike the following incredible products that you should buy, it's

164
00:07:59,200 --> 00:08:12,679
Speaker 2: actually kind of a big piece of shit. And we're back,

165
00:08:13,120 --> 00:08:14,960
Speaker 2: and from here on out, I will define two things.

166
00:08:15,000 --> 00:08:17,720
Speaker 2: GPT five referring to the model and its associated mini

167
00:08:17,720 --> 00:08:20,400
Speaker 2: and nano models, and ChatGPT five referring to the

168
00:08:20,400 --> 00:08:23,520
Speaker 2: current state of ChatGPT, which features auto, fast,

169
00:08:23,560 --> 00:08:27,120
Speaker 2: thinking, and thinking mini model selections. You also can

170
00:08:27,160 --> 00:08:30,239
Speaker 2: see legacy models, but that's not what we're talking about today,

171
00:08:30,240 --> 00:08:32,760
Speaker 2: and that's also only for a little bit. It's a

172
00:08:32,800 --> 00:08:34,959
Speaker 2: distinction I have to make, by the way, and make early,

173
00:08:34,960 --> 00:08:37,480
Speaker 2: because the two things are different, they work in different ways,

174
00:08:37,480 --> 00:08:40,600
Speaker 2: and ChatGPT five's structure induces a bunch of trade

175
00:08:40,600 --> 00:08:43,600
Speaker 2: offs and downsides that, as I'll discuss later, make this

176
00:08:43,640 --> 00:08:47,320
Speaker 2: whole thing even more wasteful. In discussions with a source

177
00:08:47,360 --> 00:08:50,360
Speaker 2: at an infrastructure provider familiar with the architecture, it appears

178
00:08:50,400 --> 00:08:53,320
Speaker 2: that ChatGPT five is in fact potentially more expensive

179
00:08:53,320 --> 00:08:55,679
Speaker 2: to run than previous models, and due to the complex

180
00:08:55,679 --> 00:08:58,200
Speaker 2: and chaotic nature of said architecture, can at times burn

181
00:08:58,320 --> 00:09:02,400
Speaker 2: upwards of double the tokens per query. Tokens, for those

182
00:09:02,400 --> 00:09:04,560
Speaker 2: who don't know, are basically chunks of text that the

183
00:09:04,600 --> 00:09:08,000
Speaker 2: AI models do stuff with. I'm simplifying this. Do not

184
00:09:08,120 --> 00:09:11,600
Speaker 2: email me and correct some minor thing. Nobody cares. A

185
00:09:11,679 --> 00:09:14,320
Speaker 2: sentence like the quick brown fox jumps over the lazy

186
00:09:14,360 --> 00:09:17,160
Speaker 2: dog will be broken into lots of smaller four character chunks.

187
00:09:17,400 --> 00:09:19,720
Speaker 2: There are different kinds of tokens, and they're all priced differently.

188
00:09:20,080 --> 00:09:22,120
Speaker 2: An input token refers to the data you send to

189
00:09:22,160 --> 00:09:24,280
Speaker 2: the model when you ask a question. Output tokens are

190
00:09:24,360 --> 00:09:26,199
Speaker 2: used to measure the size of its response, with bigger

191
00:09:26,200 --> 00:09:30,240
Speaker 2: responses requiring more tokens. The more tokens you burn per query,

192
00:09:30,280 --> 00:09:32,480
Speaker 2: the more expensive it is to run that query. The

193
00:09:32,520 --> 00:09:35,560
Speaker 2: fact that ChatGPT five can, in certain circumstances, burn

194
00:09:35,600 --> 00:09:37,920
Speaker 2: twice the number of tokens per query means that every

195
00:09:38,000 --> 00:09:41,839
Speaker 2: question costs more. ChatGPT is also significantly more convoluted,

196
00:09:41,840 --> 00:09:45,280
Speaker 2: plagued by latency issues, and is more compute intensive thanks

197
00:09:45,280 --> 00:09:49,319
Speaker 2: to OpenAI's new, smarter, more efficient model routing system.

198
00:09:50,040 --> 00:09:52,880
Speaker 2: In simpler terms, every user prompt on ChatGPT, whether

199
00:09:52,920 --> 00:09:55,920
Speaker 2: it's in auto, fast, thinking, or thinking mini, starts by

200
00:09:55,920 --> 00:09:59,120
Speaker 2: putting the user's prompt before the static prompt. I don't

201
00:09:59,160 --> 00:10:01,480
Speaker 2: want to lose you here. This is important. A static

202
00:10:01,480 --> 00:10:04,079
Speaker 2: prompt is the invisible instructions given by OpenAI to

203
00:10:04,160 --> 00:10:07,080
Speaker 2: ChatGPT, the models themselves, and the tools associated

204
00:10:07,160 --> 00:10:09,800
Speaker 2: with them to tell them how to operate. Instructions like

205
00:10:09,840 --> 00:10:12,199
Speaker 2: you are ChatGPT, you're a large language model, you're

206
00:10:12,200 --> 00:10:14,720
Speaker 2: a helpful chat bot. Do not threaten them with a knife,

207
00:10:14,720 --> 00:10:17,280
Speaker 2: and so on and so forth. These static prompts are

208
00:10:17,280 --> 00:10:19,480
Speaker 2: different with each model you use. A reasoning model will

209
00:10:19,480 --> 00:10:22,400
Speaker 2: have a different instruction set to a more chat-focused one,

210
00:10:22,440 --> 00:10:24,760
Speaker 2: such as think harder about a particular problem before giving

211
00:10:24,800 --> 00:10:27,760
Speaker 2: an answer. Break down problems into component answers. When you

212
00:10:27,840 --> 00:10:30,200
Speaker 2: get a certain thing, like if someone asks you a

213
00:10:30,240 --> 00:10:33,080
Speaker 2: coding question, query a coding tool, that kind of thing.

214
00:10:33,760 --> 00:10:35,800
Speaker 2: A user prompt is exactly what it sounds like, the

215
00:10:35,840 --> 00:10:37,760
Speaker 2: thing that a user wants the AI model to do.

216
00:10:38,320 --> 00:10:40,560
Speaker 2: The new order in ChatGPT five becomes an issue

217
00:10:40,600 --> 00:10:43,080
Speaker 2: when you use multiple different models in the same conversation.

218
00:10:43,160 --> 00:10:45,199
Speaker 2: Because the router, the thing that selects the right model

219
00:10:45,200 --> 00:10:47,520
Speaker 2: for the request, has to look at the user prompt.

220
00:10:47,760 --> 00:10:50,800
Speaker 2: It can't consider static instructions first because they may be

221
00:10:50,840 --> 00:10:53,920
Speaker 2: different based on what the user asked. In fact, the

222
00:10:54,120 --> 00:10:56,000
Speaker 2: order has to be flipped for the whole thing to work.

223
00:10:56,679 --> 00:11:00,240
Speaker 2: Put simpler, previous versions of ChatGPT would take the

224
00:11:00,240 --> 00:11:03,360
Speaker 2: static prompt and then invisibly append the user prompt onto it.

225
00:11:03,400 --> 00:11:06,080
Speaker 2: This static prompt would typically be cached, massively reducing the

226
00:11:06,080 --> 00:11:08,040
Speaker 2: amount of compute the model needs to perform a task.

227
00:11:08,559 --> 00:11:12,400
Speaker 2: ChatGPT cannot do this. Every time you use ChatGPT

228
00:11:12,440 --> 00:11:15,480
Speaker 2: five, every single thing you say or do can

229
00:11:15,520 --> 00:11:17,880
Speaker 2: cause it to do something different. Attach a file? Might

230
00:11:17,880 --> 00:11:20,080
Speaker 2: need a different model. Ask it to look into something

231
00:11:20,120 --> 00:11:22,600
Speaker 2: and be detailed? Might trigger a reasoning model or a

232
00:11:22,600 --> 00:11:26,600
Speaker 2: different depth of reasoning. Ask a question in a weird way. Sorry,

233
00:11:26,600 --> 00:11:27,880
Speaker 2: the router is going to need to send you to

234
00:11:27,880 --> 00:11:30,800
Speaker 2: a different model entirely each time, coming up with new

235
00:11:30,800 --> 00:11:33,839
Speaker 2: instructions based on the subtle interpretation of what you asked.

236
00:11:34,559 --> 00:11:36,600
Speaker 2: Every single thing that can happen when you ask chat

237
00:11:36,640 --> 00:11:39,280
Speaker 2: GPT to do something may trigger the router to change model

238
00:11:39,400 --> 00:11:41,559
Speaker 2: or request a new tool, and each time it does

239
00:11:41,600 --> 00:11:44,680
Speaker 2: so requires a completely fresh static prompt, regardless of whether

240
00:11:44,679 --> 00:11:46,920
Speaker 2: you select auto, thinking, fast, or any other option on

241
00:11:47,040 --> 00:11:50,400
Speaker 2: ChatGPT. This in turn requires it to expend more

242
00:11:50,400 --> 00:11:53,640
Speaker 2: compute with queries consuming more tokens compared to previous versions.

243
00:11:54,960 --> 00:11:56,640
Speaker 2: It's like you started a job, and every time you

244
00:11:56,720 --> 00:11:58,800
Speaker 2: do a task, write an email, make a cup of coffee,

245
00:11:58,920 --> 00:12:03,440
Speaker 2: attend a meeting, email someone with a threat, your workplace

246
00:12:03,480 --> 00:12:06,640
Speaker 2: requires you to complete the entire mandatory onboarding training first.

247
00:12:06,760 --> 00:12:08,800
Speaker 2: Want to edit a spreadsheet? Not before you brush up

248
00:12:08,800 --> 00:12:13,040
Speaker 2: on your anti-bribery legislation first, you prick. As a result,

249
00:12:13,120 --> 00:12:16,160
Speaker 2: ChatGPT may be smart, but it doesn't really seem

250
00:12:16,160 --> 00:12:20,320
Speaker 2: efficient in the GPT five version. Now, to play devil's advocate,

251
00:12:20,480 --> 00:12:22,840
Speaker 2: OpenAI likely added the routing model as a means

252
00:12:22,840 --> 00:12:25,440
Speaker 2: of creating a more sophisticated output for a user, and

253
00:12:25,520 --> 00:12:28,959
Speaker 2: I imagine with the intention of cost saving. Then again,

254
00:12:29,000 --> 00:12:30,800
Speaker 2: this might just be the thing it had to ship.

255
00:12:30,920 --> 00:12:32,760
Speaker 2: After all, GPT five was meant to be the next

256
00:12:32,840 --> 00:12:35,000
Speaker 2: great leap in AI, and the pressure was on to

257
00:12:35,040 --> 00:12:37,480
Speaker 2: get it out the door. By creating a system that

258
00:12:37,520 --> 00:12:41,040
Speaker 2: depends on an external routing model, likely another LLM

259
00:12:41,080 --> 00:12:43,280
Speaker 2: in this case, OpenAI has removed the ability to

260
00:12:43,280 --> 00:12:46,200
Speaker 2: cache the hidden instructions that dictate how the models

261
00:12:46,240 --> 00:12:50,840
Speaker 2: generate answers in ChatGPT, creating massive infrastructural overhead. Worse still,

262
00:12:51,000 --> 00:12:53,880
Speaker 2: this happens with every single turn as in message on

263
00:12:53,960 --> 00:12:56,880
Speaker 2: ChatGPT five, regardless of the model you choose, creating

264
00:12:57,000 --> 00:12:59,800
Speaker 2: endless infrastructural baggage with no real way out that only

265
00:12:59,800 --> 00:13:02,880
Speaker 2: compounds based on how complex the user's queries get

266
00:13:02,920 --> 00:13:05,280
Speaker 2: or how much they change. They could be simple, but

267
00:13:05,400 --> 00:13:08,560
Speaker 2: just going in different directions every time. Could OpenAI

268
00:13:08,679 --> 00:13:10,800
Speaker 2: make a better router? Sure? Does it have a good

269
00:13:10,840 --> 00:13:13,959
Speaker 2: one today? No. Every time you message ChatGPT, it has the

270
00:13:13,960 --> 00:13:16,640
Speaker 2: potential to change model or tooling based on its own whims,

271
00:13:16,760 --> 00:13:19,200
Speaker 2: each time requiring a fresh static prompt, and short of

272
00:13:19,480 --> 00:13:22,240
Speaker 2: totally reworking the architecture of ChatGPT five, there's no

273
00:13:22,280 --> 00:13:25,280
Speaker 2: way to change this. And if it's an LLM choosing

274
00:13:25,320 --> 00:13:28,640
Speaker 2: which model, I don't know, maybe it hallucinates just a guess.

275
00:13:29,400 --> 00:13:30,840
Speaker 2: It doesn't even need to be the case where a

276
00:13:30,920 --> 00:13:33,560
Speaker 2: user asks ChatGPT five to think, and based on

277
00:13:33,600 --> 00:13:36,480
Speaker 2: my test with GPT five, sometimes you can just ask

278
00:13:36,480 --> 00:13:38,800
Speaker 2: it a straightforward question and it will think about it

279
00:13:38,800 --> 00:13:41,840
Speaker 2: for no apparent reason. OpenAI has created a product

280
00:13:41,840 --> 00:13:45,680
Speaker 2: with latency issues and an overwhelmingly convoluted routing system that's

281
00:13:45,720 --> 00:13:48,560
Speaker 2: already straining capacity, to the point that this announcement feels

282
00:13:48,640 --> 00:13:51,880
Speaker 2: like OpenAI is walking away from its API entirely. This,

283
00:13:52,000 --> 00:13:53,880
Speaker 2: as a reminder, is the thing that people use to

284
00:13:53,920 --> 00:13:56,800
Speaker 2: incorporate OpenAI's models into their apps while also running

285
00:13:56,800 --> 00:13:59,560
Speaker 2: said models on the infrastructure OpenAI rents from Microsoft

286
00:14:00,040 --> 00:14:02,400
Speaker 2: and CoreWeave at some point, as well as Oracle,

287
00:14:03,200 --> 00:14:05,600
Speaker 2: and this API thing is really weird by the way

288
00:14:05,640 --> 00:14:08,559
Speaker 2: because these are new models, but OpenAI's really not

289
00:14:08,600 --> 00:14:11,760
Speaker 2: talking about the models themselves that much. Unlike the GPT

290
00:14:11,840 --> 00:14:14,840
Speaker 2: 4o announcement, which mentions the API in the first paragraph,

291
00:14:14,920 --> 00:14:17,440
Speaker 2: the GPT five announcement has no reference to it and

292
00:14:17,520 --> 00:14:19,720
Speaker 2: only has a single reference to developers at all when

293
00:14:19,760 --> 00:14:22,560
Speaker 2: talking about coding. Sam Altman has already hinted that he

294
00:14:22,640 --> 00:14:25,680
Speaker 2: intends to deprecate any new API demand, though I imagine

295
00:14:25,680 --> 00:14:27,920
Speaker 2: it will let anyone who will pay for priority processing,

296
00:14:27,960 --> 00:14:31,400
Speaker 2: which is essentially OpenAI's way to require minimum commitments

297
00:14:31,400 --> 00:14:34,040
Speaker 2: and extra payments from API customers just so they never

298
00:14:34,120 --> 00:14:37,200
Speaker 2: feel the bite of any compute shortages and throttling, which

299
00:14:37,200 --> 00:14:40,520
Speaker 2: they absolutely will do to people that don't pay. ChatGPT

300
00:14:40,520 --> 00:14:43,000
Speaker 2: five feels like the ultimate comeuppance for a company

301
00:14:43,000 --> 00:14:45,040
Speaker 2: that has never been forced to build a product, choosing

302
00:14:45,120 --> 00:14:48,200
Speaker 2: instead to bolt increasingly complex tools onto the side of

303
00:14:48,280 --> 00:14:51,280
Speaker 2: models in the hopes that one will magically appear. Now,

304
00:14:51,360 --> 00:14:53,880
Speaker 2: each and every feature of ChatGPT burns more money

305
00:14:53,880 --> 00:14:56,760
Speaker 2: than it ever did before. ChatGPT five feels like

306
00:14:56,800 --> 00:14:58,600
Speaker 2: a product that was rushed to market by a desperate

307
00:14:58,600 --> 00:15:00,680
Speaker 2: company that had to get something out the door. In

308
00:15:00,720 --> 00:15:04,120
Speaker 2: simpler terms, here, it's actually really funny. When I worked

309
00:15:04,160 --> 00:15:07,200
Speaker 2: this out, I chuckled. I chuckled vigorously. This is just

310
00:15:07,240 --> 00:15:10,200
Speaker 2: a case where OpenAI has given ChatGPT a middle manager.

311
00:15:10,960 --> 00:15:12,640
Speaker 2: But now I'm giving you the chance to open up

312
00:15:12,680 --> 00:15:15,680
Speaker 2: your hearts and do something better. Open up your wallets too,

313
00:15:15,680 --> 00:15:18,800
Speaker 2: and send money to a company that follows. Here,

314
00:15:19,000 --> 00:15:38,280
Speaker 2: behold my advertisements. And we're back. Like every great middle manager,

315
00:15:38,480 --> 00:15:41,280
Speaker 2: ChatGPT five's router creates more work based on its

316
00:15:41,320 --> 00:15:43,840
Speaker 2: own interpretation of what's going on, and as a separate

317
00:15:43,920 --> 00:15:45,960
Speaker 2: large language model, I can't imagine it has a ton

318
00:15:46,000 --> 00:15:48,520
Speaker 2: of training data available if I had to guess, and

319
00:15:48,560 --> 00:15:51,080
Speaker 2: this is a guess, by the way, OpenAI has done,

320
00:15:51,120 --> 00:15:53,160
Speaker 2: and will do, a lot of fine-tuning and reinforcement

321
00:15:53,240 --> 00:15:55,680
Speaker 2: learning to make it work. Though, to give it a

322
00:15:55,680 --> 00:15:57,640
Speaker 2: little grace, this is a new thing that it's doing,

323
00:15:57,680 --> 00:16:01,840
Speaker 2: and it's doing it at sort of a huge scale. The problems start,

324
00:16:01,880 --> 00:16:03,600
Speaker 2: by the way, with the fact that chat GPT five

325
00:16:03,680 --> 00:16:06,280
Speaker 2: is taking the user's initial prompt and then deciding which

326
00:16:06,280 --> 00:16:09,720
Speaker 2: model to use, unlike previous models, which sent your prompt

327
00:16:09,760 --> 00:16:11,920
Speaker 2: directly to the model along with the static prompt which

328
00:16:11,960 --> 00:16:13,880
Speaker 2: was cached and came first, an important feature in how

329
00:16:13,960 --> 00:16:17,080
Speaker 2: these models limit token burn. OpenAI starts with a router

330
00:16:17,160 --> 00:16:20,400
Speaker 2: model that takes what you ask and gives it to

331
00:16:20,480 --> 00:16:22,560
Speaker 2: chat GPT and tags it based on what kind of

332
00:16:22,640 --> 00:16:25,400
Speaker 2: thing your question might need. The thing might be a tool,

333
00:16:25,480 --> 00:16:27,400
Speaker 2: such as whether it has to do a web search

334
00:16:27,480 --> 00:16:30,360
Speaker 2: to spit out the thing at the end, a reasoning model,

335
00:16:30,520 --> 00:16:32,360
Speaker 2: whether it needs to use a coding language, and so

336
00:16:32,520 --> 00:16:35,760
Speaker 2: on and so forth. Once chat GPT has bounced your

337
00:16:35,800 --> 00:16:38,800
Speaker 2: query across various models, burning compute along the way,

338
00:16:39,040 --> 00:16:41,600
Speaker 2: it then pushes it towards the chat portion of the generation.

339
00:16:42,080 --> 00:16:44,480
Speaker 2: And each time you ask chat GPT a question or

340
00:16:44,600 --> 00:16:47,520
Speaker 2: to do something, a new specialized static prompt is generated,

341
00:16:47,800 --> 00:16:50,920
Speaker 2: sometimes several, making it impossible to cache them in advance.

342
00:16:51,240 --> 00:16:53,520
Speaker 2: In simpler terms, each time you message it, chat GPT

343
00:16:53,640 --> 00:16:56,760
Speaker 2: has to dump all cached information and instructions for what

344
00:16:56,800 --> 00:16:59,120
Speaker 2: you need to do and reload it with each prompt.

345
00:16:59,520 --> 00:17:02,120
Speaker 2: Now here's some examples of what chat GPT five has

346
00:17:02,200 --> 00:17:04,879
Speaker 2: to reload every single time you prompt it: whether or

347
00:17:04,880 --> 00:17:06,560
Speaker 2: not to use a browser or search the internet, and

348
00:17:06,640 --> 00:17:09,200
Speaker 2: under what conditions to do so, because they will change

349
00:17:09,200 --> 00:17:12,040
Speaker 2: with each prompt. How to approach a particular problem based

350
00:17:12,080 --> 00:17:14,439
Speaker 2: on what the user asked, including any specific ways it's

351
00:17:14,480 --> 00:17:16,840
Speaker 2: meant to answer, tone, brevity, and so on based on

352
00:17:16,920 --> 00:17:20,840
Speaker 2: their request. Specifics around how it might use, say, OpenAI's

353
00:17:20,880 --> 00:17:23,800
Speaker 2: code interpreter, such as the usage rules for running

354
00:17:23,800 --> 00:17:25,920
Speaker 2: a Python script, or how you want the code's output,

355
00:17:25,960 --> 00:17:28,359
Speaker 2: which again will be different based on each prompt. And

356
00:17:28,520 --> 00:17:30,199
Speaker 2: you can even say, do it in exactly the

357
00:17:30,200 --> 00:17:32,919
Speaker 2: same way, and because it's a large language model, it

358
00:17:32,960 --> 00:17:37,480
Speaker 2: may hallucinate something different. Every single goddamn time you prompt

359
00:17:37,560 --> 00:17:40,520
Speaker 2: ChatGPT five, it has to do this. Worse still,

360
00:17:40,560 --> 00:17:43,480
Speaker 2: a particular conversation can involve you using multiple different models

361
00:17:43,520 --> 00:17:47,119
Speaker 2: and tools, requiring it, with each and every prompt,

362
00:17:47,119 --> 00:17:49,639
Speaker 2: to inject a different static prompt for each component that

363
00:17:49,720 --> 00:17:52,800
Speaker 2: ChatGPT five uses. And you can't cache the static

364
00:17:52,800 --> 00:17:54,760
Speaker 2: prompt before the user's intent because if you did that,

365
00:17:55,040 --> 00:17:57,040
Speaker 2: it might send an instruction to a model that doesn't

366
00:17:57,040 --> 00:17:59,199
Speaker 2: make sense, such as telling a reasoning model to give

367
00:17:59,200 --> 00:18:01,840
Speaker 2: a quick and simple one-line answer, or a mini or nano model to

368
00:18:01,880 --> 00:18:04,000
Speaker 2: do some sort of deep reasoning, which would create a

369
00:18:04,000 --> 00:18:07,920
Speaker 2: crappy answer and burn tokens in the process. And this

370
00:18:07,960 --> 00:18:10,040
Speaker 2: is all thanks to the complicated way that open ai

371
00:18:10,160 --> 00:18:14,400
Speaker 2: insisted on building GPT five. Every single time you send

372
00:18:14,480 --> 00:18:16,399
Speaker 2: something to ChatGPT can trigger it to use a

373
00:18:16,560 --> 00:18:21,199
Speaker 2: different series of models, audio, vision, reasoning, each with their

374
00:18:21,240 --> 00:18:24,680
Speaker 2: own instructions, static prompts, all while pulling different tools, each

375
00:18:24,720 --> 00:18:27,359
Speaker 2: requiring their own instructions based on what you asked, and

376
00:18:27,440 --> 00:18:30,679
Speaker 2: reasoning models even have different depths of reasoning. Unlike four

377
00:18:30,720 --> 00:18:33,800
Speaker 2: oh, which is a multimodal model combining text, vision,

378
00:18:33,800 --> 00:18:36,399
Speaker 2: and voice, GPT five is a rat king of OpenAI's

379
00:18:36,440 --> 00:18:38,720
Speaker 2: models and tools that gets reborn every single time you

380
00:18:38,760 --> 00:18:41,640
Speaker 2: ask it to do anything. It can prompt-cache

381
00:18:41,720 --> 00:18:45,199
Speaker 2: some things, but the core instructions not so much. But

382
00:18:45,280 --> 00:18:47,600
Speaker 2: let's get a little more granular, because I know I've

383
00:18:47,720 --> 00:18:51,480
Speaker 2: been quite repetitive, but this is detailed. So from what

384
00:18:51,520 --> 00:18:53,879
Speaker 2: I've been told, there are either one or two models

385
00:18:53,880 --> 00:18:55,639
Speaker 2: at work for the routing. I'm going to go with

386
00:18:55,680 --> 00:18:57,600
Speaker 2: what I think is most likely based on the discussions

387
00:18:57,600 --> 00:19:00,640
Speaker 2: I've had with people familiar with the architecture. I've heard

388
00:19:00,680 --> 00:19:04,040
Speaker 2: the term orchestrator thrown around, potentially suggesting the

389
00:19:04,119 --> 00:19:06,840
Speaker 2: router may be more omnipresent throughout the process, but I

390
00:19:06,880 --> 00:19:09,479
Speaker 2: was unable to confirm its existence. Reach out if you

391
00:19:09,480 --> 00:19:12,480
Speaker 2: hear differently. I'll explain things as they were explained to me, though.

392
00:19:13,080 --> 00:19:15,760
Speaker 2: When a user sends a prompt, it goes through the splitter leg,

393
00:19:15,760 --> 00:19:18,480
Speaker 2: which decides to send the query on one of two paths.

394
00:19:18,760 --> 00:19:21,399
Speaker 2: One is called the fast path, where a query is straightforward,

395
00:19:21,400 --> 00:19:24,240
Speaker 2: such as a text only conversation that doesn't require any

396
00:19:24,400 --> 00:19:27,399
Speaker 2: analysis or extra tools or thinking; the other is a path where the

397
00:19:27,440 --> 00:19:30,679
Speaker 2: query may require reasoning or more complex tools like code generation

398
00:19:30,800 --> 00:19:33,560
Speaker 2: or access to a web browser for research. To be clear,

399
00:19:33,640 --> 00:19:35,639
Speaker 2: there are prompts where it may be split into multiple

400
00:19:35,680 --> 00:19:38,320
Speaker 2: paths that trigger multiple models or tools, each requiring their

401
00:19:38,320 --> 00:19:41,720
Speaker 2: own static instructions. From what I understand, the splitter model

402
00:19:41,800 --> 00:19:44,480
Speaker 2: is a completely separate large language model, though we don't

403
00:19:44,480 --> 00:19:47,600
Speaker 2: have a ton of details about it. I also, based

404
00:19:47,600 --> 00:19:49,720
Speaker 2: on conversations I've had, think there's a chance there could

405
00:19:49,720 --> 00:19:52,000
Speaker 2: be a separate model that sits above the splitter that

406
00:19:52,080 --> 00:19:55,119
Speaker 2: does much lighter classification of how a query might be routed.

407
00:19:55,160 --> 00:19:56,919
Speaker 2: So you ask it to do something, it might just

408
00:19:57,000 --> 00:20:00,359
Speaker 2: go Okay, this looks like it needs a tool and

409
00:20:00,400 --> 00:20:02,600
Speaker 2: go that way. Now, in any case, none of this

410
00:20:02,680 --> 00:20:05,240
Speaker 2: can be cached because all of this exists before inference,

411
00:20:05,400 --> 00:20:07,679
Speaker 2: which is where, by the way, it's inference I've misstated

412
00:20:07,720 --> 00:20:10,919
Speaker 2: in the past, is like it inferring, meaning inference is

413
00:20:11,000 --> 00:20:14,240
Speaker 2: everything that happens to get an output to you. So

414
00:20:14,400 --> 00:20:17,320
Speaker 2: all of the stuff that's happening. And by the way,

415
00:20:17,359 --> 00:20:20,239
Speaker 2: this is all a completely new cost that open ai

416
00:20:20,359 --> 00:20:22,760
Speaker 2: has created. No one does this like this, it's so

417
00:20:22,840 --> 00:20:25,400
Speaker 2: fucking stupid. But now we get to the chat leg.

418
00:20:25,720 --> 00:20:27,919
Speaker 2: Now that OpenAI has added layers of abstraction, it

419
00:20:27,920 --> 00:20:30,000
Speaker 2: can begin cooking up the output, by which I mean

420
00:20:30,200 --> 00:20:32,560
Speaker 2: do inference. The chat leg is where the pieces that

421
00:20:32,600 --> 00:20:35,080
Speaker 2: the splitter model created are pulled together, each loaded

422
00:20:35,119 --> 00:20:38,159
Speaker 2: with their respective static prompts based on what the

423
00:20:38,240 --> 00:20:40,879
Speaker 2: user asked ChatGPT five to do. Each piece of

424
00:20:40,880 --> 00:20:43,080
Speaker 2: the model, a tool to generate Python, an image

425
00:20:43,119 --> 00:20:46,400
Speaker 2: generation tool, a reasoning model, to generate an output has

426
00:20:46,440 --> 00:20:49,720
Speaker 2: to process an entirely new static prompt and again that's

427
00:20:49,760 --> 00:20:53,560
Speaker 2: every interaction. Remember, static prompts are effectively instructions, so the

428
00:20:53,560 --> 00:20:55,680
Speaker 2: splitter model has told each piece of the pie how

429
00:20:55,720 --> 00:20:58,280
Speaker 2: to act to create a particular output. As a result,

430
00:20:58,400 --> 00:20:59,960
Speaker 2: much of this can't be cached, creating more and more

431
00:21:00,160 --> 00:21:03,240
Speaker 2: repetitious token burn. And I mean to repeat

432
00:21:03,280 --> 00:21:05,919
Speaker 2: this stuff so that you really get it. The upshot

433
00:21:05,920 --> 00:21:08,000
Speaker 2: of the chat leg's static prompt baggage is that you

434
00:21:08,040 --> 00:21:10,000
Speaker 2: can do a little more here, at least in theory,

435
00:21:10,200 --> 00:21:13,119
Speaker 2: because each component can be instructed separately, they can again,

436
00:21:13,160 --> 00:21:16,320
Speaker 2: in theory, be made to give more individualized, specialized outputs,

437
00:21:16,359 --> 00:21:18,440
Speaker 2: like creating an image with tags that is, as I'll

438
00:21:18,440 --> 00:21:21,080
Speaker 2: give you an example of very shortly, generated using a

439
00:21:21,119 --> 00:21:26,520
Speaker 2: specific reasoning model. I'm clutching at straws here. I don't

440
00:21:26,520 --> 00:21:29,360
Speaker 2: really know if this is better, but I'm trying to be reasonable.

441
00:21:29,400 --> 00:21:31,960
Speaker 2: I'm trying to be normal. Every day, I try and

442
00:21:32,000 --> 00:21:35,679
Speaker 2: be normal. Previously, OpenAI's advantage was that a model

443
00:21:35,760 --> 00:21:37,400
Speaker 2: like four oh was kind of a jack

444
00:21:37,440 --> 00:21:39,800
Speaker 2: of all trades. But to get the benefits of chat

445
00:21:39,840 --> 00:21:42,919
Speaker 2: GPT five and that's in air quotes, it's engaged a

446
00:21:43,000 --> 00:21:46,680
Speaker 2: conductor model that can just make things more convoluted, even

447
00:21:46,720 --> 00:21:49,480
Speaker 2: in the case of simple requests. Let me give you

448
00:21:49,480 --> 00:21:52,520
Speaker 2: an example. You upload a chart of NFL players' stats

449
00:21:52,520 --> 00:21:55,240
Speaker 2: and ask chat GPT to decide which is the best

450
00:21:55,240 --> 00:21:57,160
Speaker 2: of the group and create an image to show the results.

451
00:21:57,359 --> 00:21:59,880
Speaker 2: In GPT four oh, ChatGPT would use one model

452
00:22:00,160 --> 00:22:02,359
Speaker 2: and thus one static prompt to look at the image,

453
00:22:02,400 --> 00:22:04,520
Speaker 2: decide which tools to use, and then how to format

454
00:22:04,560 --> 00:22:07,600
Speaker 2: the response. You only needed one prompt, which was cached

455
00:22:07,640 --> 00:22:09,560
Speaker 2: because one model can look at the stats for all

456
00:22:09,600 --> 00:22:11,480
Speaker 2: the data and make the decisions and then use the

457
00:22:11,520 --> 00:22:15,160
Speaker 2: image generation tool to make the final image. In GPT five,

458
00:22:15,240 --> 00:22:17,960
Speaker 2: the ChatGPT conductor model would see the stats, route

459
00:22:18,040 --> 00:22:20,640
Speaker 2: it to a vision model requiring its own static prompt,

460
00:22:20,680 --> 00:22:23,160
Speaker 2: then a separate text only reasoning model, one that has

461
00:22:23,160 --> 00:22:25,000
Speaker 2: no ability to use tools, but it might be cheaper

462
00:22:25,000 --> 00:22:27,919
Speaker 2: to get an answer from and also requires a static prompt,

463
00:22:28,640 --> 00:22:30,960
Speaker 2: and that would then decide which players are best and

464
00:22:31,000 --> 00:22:32,760
Speaker 2: then spit out an output, and then route it to

465
00:22:32,800 --> 00:22:35,680
Speaker 2: a completely separate model that can generate text to query

466
00:22:35,720 --> 00:22:39,199
Speaker 2: the image tool, again needing a static prompt for this,

467
00:22:39,359 --> 00:22:41,600
Speaker 2: to then generate the image. On top of all this

468
00:22:41,680 --> 00:22:44,920
Speaker 2: onerous baggage lies another problem: GPT five's various models

469
00:22:44,920 --> 00:22:48,160
Speaker 2: are just more complex. By splitting out the component elements

470
00:22:48,160 --> 00:22:50,080
Speaker 2: of what a model can do and allowing each model

471
00:22:50,119 --> 00:22:52,600
Speaker 2: to have different levels of reasoning, even the cheaper ones

472
00:22:52,640 --> 00:22:55,399
Speaker 2: like mini and nano, OpenAI has created an endless

473
00:22:55,440 --> 00:22:57,960
Speaker 2: combination of different reasons to have to make a brand

474
00:22:58,000 --> 00:23:01,520
Speaker 2: new static prompt instruction automated by a router, a large

475
00:23:01,560 --> 00:23:04,000
Speaker 2: language model that chooses what large language model to choose

476
00:23:04,000 --> 00:23:08,760
Speaker 2: for a query. It is, if I'm honest, kind of funny.

477
00:23:08,960 --> 00:23:12,040
Speaker 2: Reasoning models work, when simply described, by breaking up a

478
00:23:12,040 --> 00:23:14,480
Speaker 2: prompt into component pieces, looking over them, and deciding what

479
00:23:14,520 --> 00:23:17,320
Speaker 2: the best course of action might be. Chat GPT's router

480
00:23:17,359 --> 00:23:19,920
Speaker 2: is effectively an abstraction higher, breaking up the prompt into

481
00:23:19,920 --> 00:23:22,679
Speaker 2: component pieces, then choosing different models for each of those pieces,

482
00:23:22,680 --> 00:23:26,000
Speaker 2: which may in turn be broken up by a reasoning model.

483
00:23:26,240 --> 00:23:28,119
Speaker 2: While I wouldn't say this is a hat on a

484
00:23:28,160 --> 00:23:31,119
Speaker 2: hat situation, it is at this point unclear what exactly

485
00:23:31,200 --> 00:23:35,320
Speaker 2: the benefits of ChatGPT five's new architecture are. Fewer hallucinations?

486
00:23:35,640 --> 00:23:38,440
Speaker 2: Better answers? Based on what I've been told, this was

487
00:23:38,480 --> 00:23:41,480
Speaker 2: a decision made to increase the model's performance. What I

488
00:23:41,520 --> 00:23:43,919
Speaker 2: can say is that this very likely increased OpenAI's

489
00:23:43,960 --> 00:23:45,560
Speaker 2: overhead at a time when it needs to do the

490
00:23:45,600 --> 00:23:49,040
Speaker 2: exact opposite. Even if chat GPT five pushes people towards

491
00:23:49,119 --> 00:23:51,920
Speaker 2: cheaper models, it does so while guaranteeing extra costs and

492
00:23:52,000 --> 00:23:55,400
Speaker 2: latency, and whatever signals it may learn as people use

493
00:23:55,440 --> 00:23:59,200
Speaker 2: this will have to create significant benefits, massive one hundred

494
00:23:59,240 --> 00:24:01,720
Speaker 2: percent-plus gains, for it to be anything close to worthwhile.

495
00:24:02,400 --> 00:24:04,359
Speaker 2: While OpenAI's router may be smart in

496
00:24:04,480 --> 00:24:06,560
Speaker 2: terms of nuance of how it might answer a query,

497
00:24:06,600 --> 00:24:09,600
Speaker 2: and even that I question, it most decidedly is not

498
00:24:09,720 --> 00:24:12,119
Speaker 2: more efficient and may have actually increased the burn rate

499
00:24:12,160 --> 00:24:13,680
Speaker 2: for a company that will lose as much as eight

500
00:24:13,720 --> 00:24:16,400
Speaker 2: billion dollars this year, and I think that number might

501
00:24:16,440 --> 00:24:19,840
Speaker 2: be low too. Yet what I'm left with in writing

502
00:24:19,880 --> 00:24:23,080
Speaker 2: this script is how wasteful all of this is. Open Ai,

503
00:24:23,400 --> 00:24:26,439
Speaker 2: a company that has already incinerated upwards of fifteen billion

504
00:24:26,480 --> 00:24:28,840
Speaker 2: dollars in the last two years, has chosen to create

505
00:24:28,880 --> 00:24:31,000
Speaker 2: a less efficient way of doing business as a means

506
00:24:31,000 --> 00:24:35,040
Speaker 2: of eking out modest-at-best performance improvements. It

507
00:24:35,160 --> 00:24:38,359
Speaker 2: just sucks. In our own lives, we're continually pushed and

508
00:24:38,400 --> 00:24:40,960
Speaker 2: pressured and punished if we get into debt, judged by

509
00:24:40,960 --> 00:24:43,480
Speaker 2: our peers and our parents, if we spend our money recklessly,

510
00:24:43,640 --> 00:24:45,920
Speaker 2: and if we're too reckless, we find ourselves less likely

511
00:24:45,960 --> 00:24:49,600
Speaker 2: to receive anything from credit to housing. Companies like open

512
00:24:49,600 --> 00:24:52,560
Speaker 2: Ai live by a different set of standards. Sam Altman

513
00:24:52,640 --> 00:24:54,959
Speaker 2: intends to lose more than forty four billion dollars by

514
00:24:55,000 --> 00:24:57,080
Speaker 2: the end of twenty twenty eight on open Ai, and

515
00:24:57,119 --> 00:25:00,639
Speaker 2: graciously told CNBC, like Lord Farquaad, that he was willing to

516
00:25:00,720 --> 00:25:02,919
Speaker 2: run at a loss for a long time where he

517
00:25:03,000 --> 00:25:06,000
Speaker 2: was treated like he was this smart, reasonable decision maker

518
00:25:06,080 --> 00:25:08,920
Speaker 2: rather than someone that needs to rein in their horrendous

519
00:25:09,000 --> 00:25:12,560
Speaker 2: spending habits and be more mindful. The ultra rich are

520
00:25:12,720 --> 00:25:15,280
Speaker 2: rewarded far more for their errant spending habits than we

521
00:25:15,320 --> 00:25:18,160
Speaker 2: ever are for any thriftiness or austerity measures we make,

522
00:25:18,600 --> 00:25:20,600
Speaker 2: and none of us are afforded the level of grace

523
00:25:20,640 --> 00:25:24,720
Speaker 2: that Clammy Sam Altman has been, and "has been" feels appropriate.

524
00:25:25,240 --> 00:25:28,960
Speaker 2: Chat GPT five is an engineering nightmare, a phenomenally silly

525
00:25:29,000 --> 00:25:31,240
Speaker 2: and desperate attempt to juice what remains of the dying

526
00:25:31,280 --> 00:25:34,760
Speaker 2: innovation and excitement within the walls of open Ai. It's

527
00:25:34,800 --> 00:25:37,480
Speaker 2: not November twenty twenty two anymore. And let's be honest,

528
00:25:37,480 --> 00:25:39,959
Speaker 2: there really hasn't been anything exciting or interesting out of this

529
00:25:40,000 --> 00:25:44,560
Speaker 2: company since GPT four. There's nothing exciting happening at this company.

530
00:25:45,080 --> 00:25:47,600
Speaker 2: As many as seven hundred million people a week allegedly

531
00:25:47,680 --> 00:25:50,320
Speaker 2: use ChatGPT, but nobody can really say why. And

532
00:25:50,320 --> 00:25:53,720
Speaker 2: OpenAI, despite its massive popularity, cannot seem to stop

533
00:25:53,760 --> 00:25:56,880
Speaker 2: losing billions of dollars, and it can't seem to explain

534
00:25:56,920 --> 00:26:00,399
Speaker 2: why that's necessary other than "this shit's really expensive, dude."

535
00:26:00,960 --> 00:26:03,480
Speaker 2: Can anyone actually articulate a reason why we need to

536
00:26:03,480 --> 00:26:05,960
Speaker 2: burn billions of dollars to do this? What are we doing?

537
00:26:06,080 --> 00:26:08,240
Speaker 2: Why are we doing it? Has everybody just agreed to

538
00:26:08,280 --> 00:26:11,080
Speaker 2: do this until it becomes completely untenable? Do we

539
00:26:11,119 --> 00:26:13,080
Speaker 2: all yearn for the abyss so much that we can't

540
00:26:13,080 --> 00:26:17,359
Speaker 2: find camaraderie in admitting we were wrong? Look at GPT five.

541
00:26:17,880 --> 00:26:20,399
Speaker 2: This is, if you believe the hype, the best funded,

542
00:26:20,440 --> 00:26:23,320
Speaker 2: best resourced company in the world, with the greatest mind

543
00:26:23,359 --> 00:26:26,080
Speaker 2: at its helm and the greatest minds within its walls.

544
00:26:26,240 --> 00:26:28,600
Speaker 2: And this is the best they've got: a large language

545
00:26:28,640 --> 00:26:31,480
Speaker 2: model that chooses which large language model will answer your question.

546
00:26:32,200 --> 00:26:35,160
Speaker 2: Gee fucking whiz, Sam Altman, sounds dandy. And how much

547
00:26:35,240 --> 00:26:37,800
Speaker 2: better is this, you say? Oh, you can't really say?

548
00:26:38,119 --> 00:26:40,400
Speaker 2: Fucking brilliant. Hey, does it do anything new?

549
00:26:40,840 --> 00:26:40,879
Speaker 3: No?

550
00:26:41,720 --> 00:26:44,560
Speaker 2: Oh, what's that? It's actually our job to work out

551
00:26:44,560 --> 00:26:48,040
Speaker 2: for ourselves. Thanks man, I love it. I love this shit.

552
00:26:48,200 --> 00:26:50,560
Speaker 2: And if you're someone that is a hype merchant listening

553
00:26:50,560 --> 00:26:52,200
Speaker 2: to this and you've done really well getting to the

554
00:26:52,280 --> 00:26:54,119
Speaker 2: end of the third part. By the way, I respect you.

555
00:26:54,440 --> 00:26:56,639
Speaker 2: I want you to email me and explain why they

556
00:26:56,680 --> 00:26:59,159
Speaker 2: should be justified in burning billions of dollars. If you

557
00:26:59,280 --> 00:27:03,040
Speaker 2: tell me, if you tell me AWS, I will eat

558
00:27:03,080 --> 00:27:06,000
Speaker 2: you alive. I mean that. I mean that

559
00:27:06,240 --> 00:27:10,399
Speaker 2: completely literally, I will unhinge my jaw. I'll eat you

560
00:27:10,520 --> 00:27:12,359
Speaker 2: like Kirby and shit out of dance. I've said that

561
00:27:12,359 --> 00:27:15,600
Speaker 2: one before, but I'm going with it. In any case,

562
00:27:16,400 --> 00:27:19,119
Speaker 2: this three-parter has also really reminded me how ridiculous

563
00:27:19,160 --> 00:27:23,120
Speaker 2: this is, how nonsensical things have become, and how much

564
00:27:23,200 --> 00:27:27,920
Speaker 2: waste has been kind of justified, justified on this idea

565
00:27:27,960 --> 00:27:30,200
Speaker 2: that this will become something by people that don't really

566
00:27:30,200 --> 00:27:32,240
Speaker 2: know what it does today or might do in the future.

567
00:27:32,840 --> 00:27:34,919
Speaker 2: None of this is going to end well, and not

568
00:27:34,960 --> 00:27:38,080
Speaker 2: even the boosters seem to be having fun anymore. Everybody's

569
00:27:38,160 --> 00:27:40,640
Speaker 2: just floating around waiting for it to end. Even Sam

570
00:27:40,720 --> 00:27:43,600
Speaker 2: Altman seems tired of it all. I know I bloody

571
00:27:43,600 --> 00:27:54,359
Speaker 2: well I am. Thank you for listening to Better Offline.

572
00:27:54,480 --> 00:27:56,920
Speaker 3: The editor and composer of the Better Offline theme song

573
00:27:57,000 --> 00:27:59,639
Speaker 3: is Mattosowski. You can check out more of his music

574
00:27:59,640 --> 00:28:03,320
Speaker 3: and audio projects at mattosowski dot com, M A T

575
00:28:03,320 --> 00:28:07,760
Speaker 3: T O S O W S K I dot com. You

576
00:28:07,800 --> 00:28:10,320
Speaker 3: can email me at ez at betteroffline dot com

577
00:28:10,400 --> 00:28:12,720
Speaker 3: or visit better offline dot com to find more podcast

578
00:28:12,760 --> 00:28:16,080
Speaker 3: links and of course, my newsletter. I also really recommend

579
00:28:16,119 --> 00:28:18,080
Speaker 3: you go to chat dot wheresyoured dot at to

580
00:28:18,160 --> 00:28:20,520
Speaker 3: visit the Discord, and go to r slash

581
00:28:20,200 --> 00:28:23,359
Speaker 2: Better Offline to check out our Reddit. Thank you so

582
00:28:23,440 --> 00:28:26,880
Speaker 2: much for listening. Better Offline is a production of Cool

583
00:28:26,960 --> 00:28:29,719
Speaker 2: Zone Media. For more from Cool Zone Media, visit our

584
00:28:29,760 --> 00:28:32,760
Speaker 2: website, coolzonemedia dot com, or check us out on

585
00:28:32,840 --> 00:28:36,639
Speaker 2: the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.