1
00:00:21,513 --> 00:00:26,403
S1: All right. Welcome to Unsupervised Learning. This is Daniel. Okay.

2
00:00:26,433 --> 00:00:29,613
S1: I'm going to start off with something that just happened.

3
00:00:29,613 --> 00:00:34,443
S1: So Strawberry just launched. It is being called o1,

4
00:00:34,443 --> 00:00:38,403
S1: which I assume the O might mean Orion because people

5
00:00:38,403 --> 00:00:41,433
S1: were saying that it might have been called Orion. So

6
00:00:41,433 --> 00:00:44,193
S1: this is the new model from OpenAI. And I've been

7
00:00:44,193 --> 00:00:48,783
S1: messing with it for a couple hours already. So, uh,

8
00:00:48,783 --> 00:00:51,033
S1: first thing is I gave it a task of building

9
00:00:51,033 --> 00:00:53,553
S1: a business plan for something I'm working on, and it

10
00:00:53,553 --> 00:00:56,313
S1: produced output that was far and above better than 4o

11
00:00:56,343 --> 00:01:03,663
S1: or Sonnet 3.5. Yeah, it was really quite, quite good. Uh,

12
00:01:03,663 --> 00:01:06,843
S1: very detailed. It took quite a while. There's no streaming

13
00:01:06,843 --> 00:01:09,933
S1: in the API, so it feels a little rough compared

14
00:01:09,933 --> 00:01:13,863
S1: to the current models. But whatever that, that will come

15
00:01:13,863 --> 00:01:18,693
S1: with time. Uh, it's quite expensive. So basically I did

16
00:01:18,693 --> 00:01:24,703
S1: a couple of conversation analyses, uh, analysis by passing in, um,

17
00:01:25,183 --> 00:01:28,933
S1: you know, conversations like transcripts from podcasts. And I think

18
00:01:28,933 --> 00:01:31,633
S1: I did 2 or 3 of those, and it was

19
00:01:31,633 --> 00:01:35,443
S1: almost a dollar. And there's also a mini version which

20
00:01:35,443 --> 00:01:39,403
S1: is way less expensive, but I'm trying to test the capabilities,

21
00:01:39,403 --> 00:01:42,703
S1: so I'm using the full model. But yeah, a few

22
00:01:42,703 --> 00:01:50,623
S1: requests for a dollar, whereas I would say probably many

23
00:01:50,623 --> 00:01:55,513
S1: dozen or a couple of hundred requests are normally like

24
00:01:55,783 --> 00:02:01,453
S1: a few dollars. So it's many factors more expensive. So

25
00:02:01,453 --> 00:02:05,983
S1: just something to consider. As with most models, you don't

26
00:02:05,983 --> 00:02:08,923
S1: need the biggest, best or latest. This is a tweet

27
00:02:08,923 --> 00:02:12,043
S1: I just put out, so I'm going through it. So

28
00:02:12,073 --> 00:02:16,543
S1: this does one particular thing well, and better

29
00:02:16,543 --> 00:02:20,683
S1: than anything else, which is pausing to think and actually

30
00:02:20,683 --> 00:02:23,453
S1: going step by step. That's kind of like the magic sauce

31
00:02:23,453 --> 00:02:27,743
S1: here, is the chain-of-thought reasoning. So if you

32
00:02:27,743 --> 00:02:30,773
S1: don't need that for what you're trying to do, you

33
00:02:30,773 --> 00:02:33,893
S1: definitely shouldn't use this because it's more expensive, takes longer

34
00:02:33,893 --> 00:02:38,243
S1: to run, all those sorts of reasons. This type of

35
00:02:38,243 --> 00:02:42,473
S1: model and similar ones going forward are going to massively

36
00:02:42,473 --> 00:02:45,953
S1: benefit from high quality prompting. So things like we use

37
00:02:45,953 --> 00:02:50,123
S1: with Fabric, which is open source on GitHub if you're

38
00:02:50,123 --> 00:02:52,943
S1: not familiar, but you probably are if you're listening to this.

39
00:02:53,213 --> 00:02:55,793
S1: But essentially, the more you know what you want and

40
00:02:55,793 --> 00:02:58,133
S1: the better you can articulate that, the better this is

41
00:02:58,133 --> 00:03:01,493
S1: going to perform, because it is a chain of thought

42
00:03:01,523 --> 00:03:04,523
S1: sort of concept. So the more you give it to

43
00:03:04,553 --> 00:03:12,203
S1: help with that, the better. Okay, sorry about that. I

44
00:03:12,203 --> 00:03:15,113
S1: was just checking to make sure I wasn't doxxing anyone

45
00:03:15,113 --> 00:03:18,143
S1: by showing you my messages, but I was not, so

46
00:03:18,173 --> 00:03:24,793
S1: I don't have to rerecord. Okay, so, um, continuing on

47
00:03:24,793 --> 00:03:30,403
S1: here and going to expand this window fully. Okay. So, um, yeah,

48
00:03:30,433 --> 00:03:33,163
S1: the better you can articulate all of this. And by

49
00:03:33,163 --> 00:03:35,233
S1: the way, I want to do an edit there for

50
00:03:35,233 --> 00:03:39,703
S1: the team. So the better you can articulate this stuff

51
00:03:40,303 --> 00:03:43,993
S1: in exactly what you want, the better things are. That's

52
00:03:43,993 --> 00:03:47,893
S1: the bottom line here. So a lot of people are

53
00:03:47,893 --> 00:03:52,243
S1: going to question is this AGI or not? Uh, Sam

54
00:03:52,243 --> 00:03:55,813
S1: Altman already responded. He's like, yeah, this absolutely is not.

55
00:03:56,023 --> 00:03:59,923
S1: So that that should end it in terms of the

56
00:03:59,923 --> 00:04:03,253
S1: actual creator of this thing saying it's not. I also

57
00:04:03,253 --> 00:04:07,363
S1: don't think it is either. Uh, whatever that matters for.

58
00:04:07,363 --> 00:04:10,633
S1: But bottom line is, anyone who's making the claim of

59
00:04:10,663 --> 00:04:14,953
S1: like this is or isn't AGI. Here's my request to

60
00:04:14,953 --> 00:04:18,583
S1: the internet. Basically, anyone claiming something is or is not

61
00:04:18,583 --> 00:04:22,483
S1: should also provide a concise and achievable definition of what

62
00:04:22,533 --> 00:04:25,443
S1: that means. And I have one here, of course, which

63
00:04:25,443 --> 00:04:30,513
S1: is what I've talked about before: the ability of an AI,

64
00:04:30,543 --> 00:04:33,243
S1: whether a model or a product or a system to

65
00:04:33,273 --> 00:04:37,593
S1: perform the work of an average US-based knowledge worker

66
00:04:37,593 --> 00:04:43,023
S1: in 2022, and I say 2022 because that's pre-GPT-4. Right.

67
00:04:43,623 --> 00:04:51,393
S1: So basically pre AI in these terms anyway. So yeah

68
00:04:51,423 --> 00:04:54,783
S1: anyone who's talking about AGI make sure they have a definition.

69
00:04:54,783 --> 00:04:58,653
S1: Otherwise you're just wasting your time because the entire conversation

70
00:04:58,653 --> 00:05:01,443
S1: will be about definitions. And you might not even figure

71
00:05:01,443 --> 00:05:06,513
S1: that out until fucking two hours later. Sorry for the cussing.

72
00:05:06,903 --> 00:05:09,033
S1: All right. One of the most important changes to me

73
00:05:09,033 --> 00:05:12,183
S1: with this model. This this is massive, okay? This is

74
00:05:12,183 --> 00:05:15,813
S1: the first model that does this. Uh, it's the first

75
00:05:15,813 --> 00:05:19,143
S1: model of its kind to do this very, very interesting.

76
00:05:19,803 --> 00:05:24,783
S1: It's actually spending tokens to think. Okay, before you had

77
00:05:24,783 --> 00:05:28,803
S1: input and you had output, and you were being charged, and

78
00:05:28,803 --> 00:05:30,873
S1: the amount of work that was being done was based

79
00:05:30,873 --> 00:05:33,693
S1: on the number of tokens coming in and the number

80
00:05:33,693 --> 00:05:36,303
S1: of tokens coming out, and that that was the extent

81
00:05:36,303 --> 00:05:40,143
S1: of it. What's happening now is you have tokens coming

82
00:05:40,143 --> 00:05:44,253
S1: in and you have tokens coming out, but there are

83
00:05:44,253 --> 00:05:50,193
S1: tokens being spent while it's thinking. It's actually thinking and

84
00:05:50,193 --> 00:05:54,153
S1: reasoning through how to solve the problem. And what's really

85
00:05:54,153 --> 00:05:59,763
S1: fascinating about this is that you now have multiple factors here. Okay.

86
00:05:59,793 --> 00:06:03,633
S1: So you can do better prompting. And this is the

87
00:06:03,633 --> 00:06:07,833
S1: next piece here. Number seven. You could do better prompting.

88
00:06:07,983 --> 00:06:11,133
S1: You could use a smarter model. Or you could have

89
00:06:11,133 --> 00:06:15,813
S1: the model think harder on the problem. And these are

90
00:06:15,813 --> 00:06:19,743
S1: all going to be levers and knobs that we have

91
00:06:19,773 --> 00:06:22,503
S1: to get better results from AI. And this is the

92
00:06:22,543 --> 00:06:26,143
S1: first time we have this third lever of, like,

93
00:06:26,173 --> 00:06:31,243
S1: actually having it think, right. So at inference time, more

94
00:06:31,243 --> 00:06:34,693
S1: effort being spent. And they actually say in the blog

95
00:06:34,693 --> 00:06:37,123
S1: post they're like, hey, look, right now it's taking, you know,

96
00:06:37,153 --> 00:06:40,963
S1: a few seconds to think or whatever, and it's going

97
00:06:40,993 --> 00:06:43,903
S1: to get back great results. But we're thinking, what if

98
00:06:43,903 --> 00:06:47,413
S1: it thinks for minutes? What if it thinks for hours?

99
00:06:47,413 --> 00:06:50,743
S1: What if it thinks for days or weeks? And not

100
00:06:50,743 --> 00:06:54,643
S1: only that, but we give it more compute power to think.

101
00:06:55,243 --> 00:06:58,123
S1: And the example they gave, I think this was an

102
00:06:58,123 --> 00:07:01,483
S1: OpenAI post. The example they gave here was how much

103
00:07:01,483 --> 00:07:04,003
S1: do you want to solve cancer? What if you could

104
00:07:04,033 --> 00:07:07,393
S1: build a data center? What if you had one data

105
00:07:07,423 --> 00:07:10,453
S1: center just for working on cancer and one data center

106
00:07:10,453 --> 00:07:16,393
S1: just for working on aging and so on? Okay. And

107
00:07:16,393 --> 00:07:19,663
S1: you basically have models like this that scale with the

108
00:07:19,663 --> 00:07:22,723
S1: inference difficulty based on the amount of difficulty of the,

109
00:07:22,763 --> 00:07:25,643
S1: of the thinking. And then, of course, you have a

110
00:07:25,643 --> 00:07:28,973
S1: smart model and a good neural net and all that, right?

111
00:07:29,003 --> 00:07:32,693
S1: Scalability of the of the neural net. So maybe that's

112
00:07:32,693 --> 00:07:36,863
S1: GPT-5, GPT-6, whatever. Combined with the good prompting,

113
00:07:36,863 --> 00:07:42,893
S1: combined with this thinking capability and combined with, you know,

114
00:07:42,923 --> 00:07:48,713
S1: all those things unified into the combined with having that

115
00:07:48,713 --> 00:07:53,933
S1: giant infrastructure to run it so that that's insane. Um,

116
00:07:53,963 --> 00:07:56,213
S1: and this scales all the way down to, like, the

117
00:07:56,213 --> 00:08:00,863
S1: smallest stupid problem where it's just like, whatever, GPT-3

118
00:08:00,893 --> 00:08:04,313
S1: and you get back the answer almost instantaneously. In fact,

119
00:08:04,313 --> 00:08:07,733
S1: forget GPT-3. It's some local model that only does

120
00:08:07,733 --> 00:08:11,573
S1: one thing well. You're spending almost no resources whatsoever. It

121
00:08:11,573 --> 00:08:14,933
S1: just goes to your phone, bounces back immediately, doesn't go anywhere,

122
00:08:14,933 --> 00:08:18,803
S1: barely costs any cycles of a GPU or a CPU

123
00:08:18,833 --> 00:08:21,713
S1: because you don't need those resources to run. Because it's

124
00:08:21,713 --> 00:08:24,733
S1: just an easy thing to answer. So now we're talking

125
00:08:24,733 --> 00:08:30,703
S1: about AI that scales with the difficulty of the problem, right? With,

126
00:08:30,943 --> 00:08:35,893
S1: you know, cancer, aging, getting out of the solar system,

127
00:08:35,893 --> 00:08:40,303
S1: escaping the sun, expanding, ultimately, heat death of the universe.

128
00:08:40,303 --> 00:08:45,613
S1: That's a big one, right? Because entropy kills everything. So ultimately,

129
00:08:45,613 --> 00:08:47,173
S1: we're going to need a way out of here at

130
00:08:47,173 --> 00:08:53,143
S1: some point, assuming we survive that long. Not happening anytime soon.

131
00:08:53,143 --> 00:08:55,783
S1: I wouldn't worry about that. But these are the types

132
00:08:55,783 --> 00:09:00,103
S1: of things that are really exciting. You know, the size

133
00:09:00,103 --> 00:09:04,213
S1: of the problem being a factor for which AI

134
00:09:04,243 --> 00:09:06,973
S1: you point at it with lots and lots of different

135
00:09:07,003 --> 00:09:11,533
S1: knobs and levers controlling that decision. So I think that's

136
00:09:11,533 --> 00:09:15,103
S1: really cool. Another important thing to mention is that the

137
00:09:15,103 --> 00:09:18,673
S1: innovation seems independent of what we were waiting for for

138
00:09:18,673 --> 00:09:22,423
S1: GPT-5. So based on all I read, all the

139
00:09:22,423 --> 00:09:25,883
S1: releases from OpenAI. And I've seen all the rumors and,

140
00:09:25,913 --> 00:09:28,163
S1: you know, talked to a bunch of people who've been

141
00:09:28,163 --> 00:09:32,603
S1: speculating about this. And this seems completely independent from, oh,

142
00:09:32,633 --> 00:09:36,473
S1: is this GPT four oh, is it for oh, is

143
00:09:36,473 --> 00:09:40,343
S1: it five? Is it an early version of five. Doesn't

144
00:09:40,343 --> 00:09:45,293
S1: really matter. It's like a separate axis. This is like

145
00:09:45,293 --> 00:09:48,923
S1: a capability. This is like thinking capability, which is on

146
00:09:48,923 --> 00:09:52,763
S1: a separate axis from how big or smart is the

147
00:09:52,763 --> 00:09:56,573
S1: neural net, right? Or how big or smart is the

148
00:09:56,603 --> 00:10:01,103
S1: is the model. So really, really cool to think about

149
00:10:01,103 --> 00:10:03,503
S1: those being two separate things because now we can start

150
00:10:03,503 --> 00:10:06,803
S1: thinking about, okay, well if GPT-5 is still going

151
00:10:06,833 --> 00:10:09,263
S1: to come out, you know, later this year, beginning of

152
00:10:09,293 --> 00:10:11,723
S1: next year or whenever it's going to come out and

153
00:10:11,723 --> 00:10:15,833
S1: whatever they're going to call it. Well, imagine GPT-5

154
00:10:15,833 --> 00:10:22,283
S1: with this thinking capability. That's cool. So presumably this is

155
00:10:22,283 --> 00:10:25,203
S1: just a feature that you can add onto any model,

156
00:10:25,293 --> 00:10:28,323
S1: which is what we were just talking about. And I think

157
00:10:28,323 --> 00:10:32,823
S1: this is okay. This is really, really crucial here. I've

158
00:10:32,823 --> 00:10:34,983
S1: been talking for a long time about slack in the

159
00:10:34,983 --> 00:10:39,093
S1: rope and tricks that we're going to use to jump

160
00:10:39,093 --> 00:10:43,503
S1: ahead in, um, advancement of AI, so so check this out.

161
00:10:43,533 --> 00:10:45,513
S1: A lot of people are like, oh, we're running into

162
00:10:45,513 --> 00:10:49,173
S1: a data wall. Oh, neural nets are only so good,

163
00:10:49,233 --> 00:10:52,293
S1: they can only get so good. We've already hit a thing.

164
00:10:52,293 --> 00:10:54,993
S1: I mean, so many, so many people are saying things

165
00:10:54,993 --> 00:11:00,903
S1: like this that just sound absolutely ridiculous to me. First

166
00:11:00,903 --> 00:11:03,963
S1: of all, they were the ones saying we wouldn't be here.

167
00:11:03,963 --> 00:11:08,253
S1: And so now we are here and everyone's surprised and

168
00:11:08,253 --> 00:11:10,773
S1: they're like, well, here's what we know for sure is

169
00:11:10,803 --> 00:11:13,833
S1: we're not going to get any better. How can I

170
00:11:13,833 --> 00:11:18,063
S1: believe you if you didn't predict any of this and

171
00:11:18,063 --> 00:11:21,753
S1: you were absolutely certain back then, and now you're absolutely

172
00:11:21,753 --> 00:11:26,323
S1: certain it's not going to jump ahead again, right? Leopold

173
00:11:26,353 --> 00:11:29,713
S1: talks about this in his paper. There's lots of different

174
00:11:29,713 --> 00:11:33,793
S1: ways to get better. There's the architecture of the model.

175
00:11:33,793 --> 00:11:37,723
S1: There's the size of the model. I forget what all

176
00:11:37,723 --> 00:11:40,273
S1: levers he had, but it's the architecture of the model,

177
00:11:40,303 --> 00:11:42,043
S1: the size of the model. And I think it was

178
00:11:42,043 --> 00:11:45,793
S1: unhobbling was the other one, which is what I called

179
00:11:46,003 --> 00:11:48,913
S1: like a year ago, slack in the rope or tricks

180
00:11:48,913 --> 00:11:50,533
S1: we're going to. This is what I told a friend

181
00:11:50,533 --> 00:11:53,203
S1: of mine who's really smart in this stuff. I said,

182
00:11:53,473 --> 00:11:58,663
S1: watch this. We're going to find multiple tricks where we're

183
00:11:58,663 --> 00:12:02,113
S1: messing around in percentage points, and then we find a

184
00:12:02,113 --> 00:12:05,003
S1: thing and it jumps us 2 or 3 or 5

185
00:12:05,003 --> 00:12:10,663
S1: or 10 x or 100 x ahead. And and I

186
00:12:10,663 --> 00:12:13,843
S1: actually learned this from him. Uh, I actually learned this

187
00:12:13,843 --> 00:12:16,033
S1: from him. He was like, hey, you know, there are

188
00:12:16,033 --> 00:12:19,243
S1: things that jump you ahead. Um, and I think he

189
00:12:19,243 --> 00:12:22,333
S1: gave me an example from some public paper or whatever. And

190
00:12:22,333 --> 00:12:25,793
S1: it was an example of like a big jump. And

191
00:12:25,793 --> 00:12:28,673
S1: my natural intuition was there's going to be a lot

192
00:12:28,673 --> 00:12:33,353
S1: more of those, and they're not coming from pursuing along

193
00:12:33,353 --> 00:12:36,293
S1: this axis, which is difficult. They are actually just hanging

194
00:12:36,293 --> 00:12:38,693
S1: off to the side. It's like, oh, did you know

195
00:12:38,693 --> 00:12:40,823
S1: if you just changed the color of this? Hey, did

196
00:12:40,823 --> 00:12:43,673
S1: you know if you just orient the data backward instead

197
00:12:43,673 --> 00:12:45,953
S1: of forward? Hey, did you know if you just prune

198
00:12:45,983 --> 00:12:48,653
S1: the data in this way or if you add this

199
00:12:48,653 --> 00:12:52,283
S1: particular data set or. And I'm just making up these examples,

200
00:12:52,283 --> 00:12:57,053
S1: but simple things that you wouldn't think would work. And

201
00:12:57,053 --> 00:13:01,253
S1: this is why Leopold talks about if you automate an

202
00:13:01,253 --> 00:13:05,543
S1: AI engineer or an AI researcher, is what he called it.

203
00:13:05,573 --> 00:13:08,783
S1: That's when it gets completely silly, because they have the

204
00:13:08,783 --> 00:13:10,913
S1: ability to now go and try a whole bunch of

205
00:13:10,913 --> 00:13:15,293
S1: these things, including these tricks. Um, all this to say

206
00:13:15,293 --> 00:13:18,683
S1: that the slack in the rope or this series of

207
00:13:18,683 --> 00:13:22,943
S1: tricks is going to keep multiplying our advances, and that's

208
00:13:22,973 --> 00:13:27,483
S1: at the same time that we're working on the algorithms. Oh,

209
00:13:27,483 --> 00:13:30,123
S1: that was the other. That was the other factor is algorithms.

210
00:13:30,123 --> 00:13:32,073
S1: That was this is going to happen at the same

211
00:13:32,073 --> 00:13:34,773
S1: time we're working on the algorithms to make those better.

212
00:13:34,803 --> 00:13:38,733
S1: We're also working on the size of the neural net, um,

213
00:13:38,733 --> 00:13:41,523
S1: and the quality and the structure. And everything about the

214
00:13:41,523 --> 00:13:44,013
S1: neural net is going to get bigger and more powerful,

215
00:13:44,043 --> 00:13:46,953
S1: but mostly just a matter of size, number of parameters.

216
00:13:48,003 --> 00:13:51,393
S1: But all those things are changing at the same time

217
00:13:51,393 --> 00:13:55,563
S1: as we're finding all these tricks. Right. So we're talking

218
00:13:55,563 --> 00:14:00,123
S1: about this is just begun. And this is what people

219
00:14:00,123 --> 00:14:02,883
S1: don't realize. This is just now starting. We're going to

220
00:14:02,883 --> 00:14:06,483
S1: look back in two years and be like, what was that?

221
00:14:06,483 --> 00:14:11,943
S1: That was silly. Right. And so I really want to

222
00:14:11,973 --> 00:14:15,693
S1: warn people against thinking we're hitting some kind of a wall.

223
00:14:16,293 --> 00:14:19,053
S1: Think of it this way. We just found alien technology.

224
00:14:19,083 --> 00:14:21,933
S1: We have no idea how it works. And we're like,

225
00:14:21,963 --> 00:14:26,063
S1: poking it with a stick and it's already spitting out

226
00:14:26,063 --> 00:14:29,693
S1: amazing things. So think about that. Okay, we got a

227
00:14:29,693 --> 00:14:32,363
S1: glowy ball. We don't know how it floats. We don't

228
00:14:32,393 --> 00:14:36,923
S1: know how it's doing anti-gravity, right? We don't know how

229
00:14:36,923 --> 00:14:39,443
S1: it's doing this. We don't know how it's reflecting its surface.

230
00:14:39,473 --> 00:14:41,543
S1: We don't know how it's coming up with these answers.

231
00:14:41,543 --> 00:14:43,763
S1: We don't know how it got here from the other

232
00:14:43,763 --> 00:14:46,763
S1: solar system. We don't know anything about it. You poke

233
00:14:46,763 --> 00:14:49,463
S1: it with a stick and it tells us this magic stuff

234
00:14:49,463 --> 00:14:54,023
S1: and we're like, holy crap, that's amazing. Somebody walks up,

235
00:14:54,023 --> 00:14:57,053
S1: sees you poke it with a stick and goes, yeah,

236
00:14:57,083 --> 00:15:00,803
S1: that's I mean, that's that's all it's ever going to

237
00:15:00,803 --> 00:15:04,283
S1: be able to do. I mean, I've seen you poke

238
00:15:04,283 --> 00:15:07,613
S1: it with a stick twice, and it gave you kind

239
00:15:07,613 --> 00:15:10,703
S1: of a similar answer, which means that's all we could

240
00:15:10,703 --> 00:15:16,433
S1: learn from this alien ball. That's their conclusion. I am

241
00:15:16,433 --> 00:15:19,553
S1: certain that since you poked it with a stick while

242
00:15:19,553 --> 00:15:22,133
S1: I was standing here three times, and it kind of

243
00:15:22,163 --> 00:15:26,323
S1: gave you a similar answer: one, it must be stupid.

244
00:15:26,353 --> 00:15:29,833
S1: Two, it's not as smart as us. And three, this

245
00:15:29,833 --> 00:15:32,683
S1: is as as smart as it's ever going to be.

246
00:15:32,713 --> 00:15:36,043
S1: This is the most it has to offer. That is

247
00:15:36,043 --> 00:15:39,973
S1: the claim that's being made by these kind of like denialists,

248
00:15:40,003 --> 00:15:45,163
S1: in my view. And that doesn't mean the current shiny

249
00:15:45,163 --> 00:15:49,783
S1: ball is better than humans, or it should replace humans,

250
00:15:49,783 --> 00:15:52,723
S1: or it could do everything we could do. Like, this

251
00:15:52,723 --> 00:15:55,093
S1: is not a competition. Okay, here's a better way to

252
00:15:55,123 --> 00:15:58,063
S1: think about this. This is not like a rock that

253
00:15:58,063 --> 00:16:00,973
S1: we have animated. Think of it this way. If an

254
00:16:00,973 --> 00:16:04,363
S1: alien comes here because someone else was like, hey, this

255
00:16:04,363 --> 00:16:08,083
S1: is not thinking, this is processing. And I'm like, come on,

256
00:16:08,083 --> 00:16:11,503
S1: come on. If you if an alien comes here, let's

257
00:16:11,503 --> 00:16:14,863
S1: assume we know how our brain works. An alien comes

258
00:16:14,863 --> 00:16:17,473
S1: here and we look at its brain, or it shows

259
00:16:17,473 --> 00:16:22,813
S1: us its brain and it looks different. And we're like, oh,

260
00:16:22,843 --> 00:16:28,783
S1: you guys do neurons and synapses different than us? Who's

261
00:16:28,783 --> 00:16:31,363
S1: going to walk over and be like, well, since they're

262
00:16:31,363 --> 00:16:35,593
S1: doing neurons and synapses different than us, they're not thinking.

263
00:16:36,043 --> 00:16:40,723
S1: Only humans can think. And I'm like, they got here.

264
00:16:40,753 --> 00:16:43,483
S1: They got here, didn't they? It's a little shiny ball.

265
00:16:43,483 --> 00:16:46,213
S1: And they got here from whatever part of the galaxy

266
00:16:46,213 --> 00:16:51,253
S1: or universe that they came from. They're obviously doing something right.

267
00:16:52,033 --> 00:16:55,573
S1: And AI is obviously doing something right too. So I

268
00:16:55,573 --> 00:16:59,173
S1: think it's a little bit specious. Is that is that

269
00:16:59,173 --> 00:17:05,713
S1: the name of the word? It's like specious to just

270
00:17:05,713 --> 00:17:09,973
S1: magically assume that we are the best. Only we are

271
00:17:10,003 --> 00:17:15,493
S1: thinking, only we are special. Instead of thinking like we

272
00:17:15,493 --> 00:17:19,813
S1: might have this nascent alien intelligence thing going on that

273
00:17:19,813 --> 00:17:22,873
S1: actually is doing things that are very much analogous to us.

274
00:17:22,993 --> 00:17:25,223
S1: It reminds me of the first time that I clicked

275
00:17:25,223 --> 00:17:29,423
S1: around inside of Linux. This is like late 90s. I

276
00:17:29,423 --> 00:17:33,863
S1: was messing with Linux. This must have been like '97, '98

277
00:17:33,863 --> 00:17:37,673
S1: or something. I'm messing with Linux and I'm clicking around

278
00:17:37,703 --> 00:17:41,693
S1: because I had started with Windows and I'm like, oh,

279
00:17:41,693 --> 00:17:44,303
S1: it opens windows and it opens things that I could

280
00:17:44,303 --> 00:17:47,633
S1: click and navigate. Then I'm like, it's it's just like

281
00:17:47,663 --> 00:17:52,523
S1: on Windows Explorer. And this like, blew me away. It

282
00:17:52,553 --> 00:17:55,373
S1: absolutely blew me away that this was just a different

283
00:17:55,373 --> 00:17:59,033
S1: way of doing the same thing. And that underneath this,

284
00:17:59,303 --> 00:18:02,543
S1: there's a universal thing of you need to be able

285
00:18:02,543 --> 00:18:05,333
S1: to browse files, you need to be able to open windows,

286
00:18:05,333 --> 00:18:08,903
S1: you need to be able to close windows. And that

287
00:18:08,903 --> 00:18:10,913
S1: clicked for me. And I'm like, oh, I guess like

288
00:18:10,943 --> 00:18:13,763
S1: all operating systems are going to do this differently. It's

289
00:18:13,763 --> 00:18:16,193
S1: the same with aliens. It's the same with like they

290
00:18:16,193 --> 00:18:20,153
S1: might think differently, but whatever. They have to think, right.

291
00:18:20,153 --> 00:18:23,633
S1: So why would we expect this synthetic intelligence that we've

292
00:18:23,673 --> 00:18:28,053
S1: birthed to do it exactly the same way that we

293
00:18:28,083 --> 00:18:32,253
S1: do. We should not expect that. We

294
00:18:32,283 --> 00:18:38,763
S1: got here accidentally stumbling through time due to evolution. And

295
00:18:38,763 --> 00:18:42,663
S1: we've got this version that we have and it's awesome, obviously.

296
00:18:42,933 --> 00:18:46,953
S1: But like, that's way different than we invented this thing

297
00:18:46,983 --> 00:18:52,293
S1: five years ago or whenever that was, 2017, six years ago.

298
00:18:53,313 --> 00:18:55,803
S1: And I know it goes further back than that. But

299
00:18:55,833 --> 00:19:01,083
S1: you know what I'm saying? Transformers. All right. So that's that.

300
00:19:01,083 --> 00:19:04,503
S1: And this this is becoming a long thing. But whatever

301
00:19:04,533 --> 00:19:09,663
S1: we'll go with it. So yeah, basically we have no

302
00:19:09,963 --> 00:19:14,463
S1: idea how early all of this is. We're likely to

303
00:19:14,493 --> 00:19:17,643
S1: find 10, 20, or 200 more of these holy-crap

304
00:19:17,673 --> 00:19:23,463
S1: optimizations like this thinking thing before we start hitting any

305
00:19:23,463 --> 00:19:30,323
S1: limits for neural network architecture or the transformer. Like,

306
00:19:30,743 --> 00:19:34,073
S1: Plus we could just find something better than a transformer.

307
00:19:34,073 --> 00:19:38,993
S1: You realize how how lucky we were to find the transformer.

308
00:19:39,023 --> 00:19:41,813
S1: Like the people who made that paper. They're like, hey,

309
00:19:41,813 --> 00:19:44,003
S1: this is this is a cool way we think this

310
00:19:44,003 --> 00:19:45,923
S1: is a cool way of doing something. They didn't know

311
00:19:45,923 --> 00:19:50,033
S1: what they had. Okay, you should watch a Karpathy talk

312
00:19:50,063 --> 00:19:53,303
S1: about the transformer. He's like, this thing is a general

313
00:19:53,303 --> 00:19:57,173
S1: purpose computer. This thing is insanely good at learning. He

314
00:19:57,173 --> 00:20:01,193
S1: talks about different ways that it's better than humans at learning. Okay,

315
00:20:01,223 --> 00:20:05,933
S1: some some people randomly found this thing and it shot

316
00:20:05,933 --> 00:20:08,813
S1: us off. Okay. So so check this out. This is

317
00:20:08,813 --> 00:20:14,573
S1: another example of finding tricks or slack in the rope

318
00:20:14,573 --> 00:20:17,873
S1: just lying on the ground. So we stumble through AI

319
00:20:18,383 --> 00:20:22,943
S1: for decades and decades and decades. And then someone's like, hey,

320
00:20:22,943 --> 00:20:25,413
S1: this is kind of cool about this attention mechanism. Hey,

321
00:20:25,413 --> 00:20:29,193
S1: what do you think about this architecture for a neural net? Boom!

322
00:20:29,193 --> 00:20:34,263
S1: Now we have this takeoff. There's nothing saying somebody

323
00:20:34,263 --> 00:20:37,653
S1: isn't going to be like, I like what you did

324
00:20:37,653 --> 00:20:42,603
S1: with that transformer architecture. What if it looked like this instead?

325
00:20:42,633 --> 00:20:46,473
S1: It might be 20 times better. It might be 2000

326
00:20:46,473 --> 00:20:50,403
S1: times better. It might be 4% better. It doesn't matter.

327
00:20:50,433 --> 00:20:55,053
S1: Like the we have only just begun. We have only

328
00:20:55,053 --> 00:20:58,833
S1: just begun. I can absolutely guarantee you that assuming we

329
00:20:58,833 --> 00:21:02,043
S1: don't kill ourselves off as a result of this, like

330
00:21:02,073 --> 00:21:07,713
S1: that would set things back. But I'm trying to get

331
00:21:07,713 --> 00:21:12,873
S1: you to think about things in this way because it's

332
00:21:12,873 --> 00:21:16,773
S1: insane what's about to happen. And yeah, I'm going to

333
00:21:16,773 --> 00:21:18,963
S1: have more examples here. I'm working on an example right

334
00:21:18,963 --> 00:21:21,783
S1: here on this other screen. Uh, pretty cool thing I'm

335
00:21:21,783 --> 00:21:25,023
S1: building with it. Um, okay. So that was that.