WEBVTT - My First Thoughts on New OpenAI Strawberry Model (OpenAI o1-preview)

0:00:21.513 --> 0:00:26.403
<v S1>All right. Welcome to Unsupervised Learning. This is Daniel. Okay.

0:00:26.433 --> 0:00:29.613
<v S1>I'm going to start off with something that just happened.

0:00:29.613 --> 0:00:34.443
<v S1>So Strawberry just launched. It is being called o1,

0:00:34.443 --> 0:00:38.403
<v S1>which I assume the O might mean Orion because people

0:00:38.403 --> 0:00:41.433
<v S1>were saying that it might have been called Orion. So

0:00:41.433 --> 0:00:44.193
<v S1>this is the new model from OpenAI. And I've been

0:00:44.193 --> 0:00:48.783
<v S1>messing with it for a couple hours already. So, uh,

0:00:48.783 --> 0:00:51.033
<v S1>first thing is I gave it a task of building

0:00:51.033 --> 0:00:53.553
<v S1>a business plan for something I'm working on, and it

0:00:53.553 --> 0:00:56.313
<v S1>produced output that was far and above better than 4o

0:00:56.343 --> 0:01:03.663
<v S1>or Sonnet 3.5. Yeah, it was really quite, quite good. Uh,

0:01:03.663 --> 0:01:06.843
<v S1>very detailed. It took quite a while. There's no streaming

0:01:06.843 --> 0:01:09.933
<v S1>in the API, so it feels a little rough compared

0:01:09.933 --> 0:01:13.863
<v S1>to the current models. But whatever that, that will come

0:01:13.863 --> 0:01:18.693
<v S1>with time. Uh, it's quite expensive. So basically I did

0:01:18.693 --> 0:01:24.703
<v S1>a couple of conversation analyses, uh, analyses by passing in, um,

0:01:25.183 --> 0:01:28.933
<v S1>you know, conversations like transcripts from podcasts. And I think

0:01:28.933 --> 0:01:31.633
<v S1>I did 2 or 3 of those, and it was

0:01:31.633 --> 0:01:35.443
<v S1>almost a dollar. And there's also a mini version which

0:01:35.443 --> 0:01:39.403
<v S1>is way less expensive, but I'm trying to test the capabilities,

0:01:39.403 --> 0:01:42.703
<v S1>so I'm using the full model. But yeah, a few

0:01:42.703 --> 0:01:50.623
<v S1>requests for a dollar, whereas I would say probably many

0:01:50.623 --> 0:01:55.513
<v S1>dozen or a couple of hundred requests are normally like

0:01:55.783 --> 0:02:01.453
<v S1>a few dollars. So it's many factors more expensive. So

0:02:01.453 --> 0:02:05.983
<v S1>just something to consider. As with most models, you don't

0:02:05.983 --> 0:02:08.923
<v S1>need the biggest, best or latest. This is a tweet

0:02:08.923 --> 0:02:12.043
<v S1>I just put out, so I'm going through it. So

0:02:12.073 --> 0:02:16.543
<v S1>this does one particular thing well, and better

0:02:16.543 --> 0:02:20.683
<v S1>than anything else, which is pausing to think and actually

0:02:20.683 --> 0:02:23.453
<v S1>going step by step. That's kind of like the magic sauce

0:02:23.453 --> 0:02:27.743
<v S1>here: the chain-of-thought reasoning. So if you

0:02:27.743 --> 0:02:30.773
<v S1>don't need that for what you're trying to do, you

0:02:30.773 --> 0:02:33.893
<v S1>definitely shouldn't use this because it's more expensive, takes longer

0:02:33.893 --> 0:02:38.243
<v S1>to run, all those sorts of reasons. This type of

0:02:38.243 --> 0:02:42.473
<v S1>model and similar ones going forward are going to massively

0:02:42.473 --> 0:02:45.953
<v S1>benefit from high quality prompting. So things like we use

0:02:45.953 --> 0:02:50.123
<v S1>with Fabric, which is open source on GitHub if you're

0:02:50.123 --> 0:02:52.943
<v S1>not familiar, but you probably are if you're listening to this.

0:02:53.213 --> 0:02:55.793
<v S1>But essentially, the more you know what you want and

0:02:55.793 --> 0:02:58.133
<v S1>the better you can articulate that, the better this is

0:02:58.133 --> 0:03:01.493
<v S1>going to perform, because it is a chain of thought

0:03:01.523 --> 0:03:04.523
<v S1>sort of concept. So the more you give it to

0:03:04.553 --> 0:03:12.203
<v S1>help with that, the better. Okay, sorry about that. I

0:03:12.203 --> 0:03:15.113
<v S1>was just checking to make sure I wasn't doxxing anyone

0:03:15.113 --> 0:03:18.143
<v S1>by showing you my messages, but I was not, so

0:03:18.173 --> 0:03:24.793
<v S1>I don't have to rerecord. Okay, so, um, continuing on

0:03:24.793 --> 0:03:30.403
<v S1>here and going to expand this window fully. Okay. So, um, yeah,

0:03:30.433 --> 0:03:33.163
<v S1>the better you can articulate all of this. And by

0:03:33.163 --> 0:03:35.233
<v S1>the way, I want to do an edit there for

0:03:35.233 --> 0:03:39.703
<v S1>the team. So the better you can articulate this stuff

0:03:40.303 --> 0:03:43.993
<v S1>in exactly what you want, the better things are. That's

0:03:43.993 --> 0:03:47.893
<v S1>the bottom line here. So a lot of people are

0:03:47.893 --> 0:03:52.243
<v S1>going to question is this AGI or not? Uh, Sam

0:03:52.243 --> 0:03:55.813
<v S1>Altman already responded. He's like, yeah, this absolutely is not.

0:03:56.023 --> 0:03:59.923
<v S1>So that that should end it in terms of the

0:03:59.923 --> 0:04:03.253
<v S1>actual creator of this thing saying it's not. I also

0:04:03.253 --> 0:04:07.363
<v S1>don't think it is either. Uh, whatever that matters for.

0:04:07.363 --> 0:04:10.633
<v S1>But bottom line is, anyone who's making the claim of

0:04:10.663 --> 0:04:14.953
<v S1>like this is or isn't AGI. Here's my request to

0:04:14.953 --> 0:04:18.583
<v S1>the internet. Basically, anyone claiming something is or is not

0:04:18.583 --> 0:04:22.483
<v S1>should also provide a concise and achievable definition of what

0:04:22.533 --> 0:04:25.443
<v S1>that means. And I have one here, of course, which

0:04:25.443 --> 0:04:30.513
<v S1>is, I've talked about it before, the ability of an AI,

0:04:30.543 --> 0:04:33.243
<v S1>whether a model or a product or a system to

0:04:33.273 --> 0:04:37.593
<v S1>perform the work of an average US-based knowledge worker

0:04:37.593 --> 0:04:43.023
<v S1>in 2022, and I say 2022 because that's pre-GPT-4. Right.

0:04:43.623 --> 0:04:51.393
<v S1>So basically pre AI in these terms anyway. So yeah

0:04:51.423 --> 0:04:54.783
<v S1>anyone who's talking about AGI make sure they have a definition.

0:04:54.783 --> 0:04:58.653
<v S1>Otherwise you're just wasting your time because the entire conversation

0:04:58.653 --> 0:05:01.443
<v S1>will be about definitions. And you might not even figure

0:05:01.443 --> 0:05:06.513
<v S1>that out until fucking two hours later. Sorry for the cussing.

0:05:06.903 --> 0:05:09.033
<v S1>All right. One of the most important changes to me

0:05:09.033 --> 0:05:12.183
<v S1>with this model. This this is massive, okay? This is

0:05:12.183 --> 0:05:15.813
<v S1>the first model that does this. Uh, it's the first

0:05:15.813 --> 0:05:19.143
<v S1>model of its kind to do this very, very interesting.

0:05:19.803 --> 0:05:24.783
<v S1>It's actually spending tokens to think. Okay, before you had

0:05:24.783 --> 0:05:28.803
<v S1>input and you had output, and what you were being charged and

0:05:28.803 --> 0:05:30.873
<v S1>the amount of work that was being done was based

0:05:30.873 --> 0:05:33.693
<v S1>on the number of tokens coming in and the number

0:05:33.693 --> 0:05:36.303
<v S1>of tokens coming out, and that that was the extent

0:05:36.303 --> 0:05:40.143
<v S1>of it. What's happening now is you have tokens coming

0:05:40.143 --> 0:05:44.253
<v S1>in and you have tokens coming out, but there are

0:05:44.253 --> 0:05:50.193
<v S1>tokens being spent while it's thinking. It's actually thinking and

0:05:50.193 --> 0:05:54.153
<v S1>reasoning through how to solve the problem. And what's really

0:05:54.153 --> 0:05:59.763
<v S1>fascinating about this is that you now have multiple factors here. Okay.

0:05:59.793 --> 0:06:03.633
<v S1>So you can do better prompting. And this is the

0:06:03.633 --> 0:06:07.833
<v S1>next piece here. Number seven. You could do better prompting.

0:06:07.983 --> 0:06:11.133
<v S1>You could use a smarter model. Or you could have

0:06:11.133 --> 0:06:15.813
<v S1>the model think harder on the problem. And these are

0:06:15.813 --> 0:06:19.743
<v S1>all going to be levers and knobs that we have

0:06:19.773 --> 0:06:22.503
<v S1>to get better results from AI. And this is the

0:06:22.543 --> 0:06:26.143
<v S1>first time we have this third lever of, like,

0:06:26.173 --> 0:06:31.243
<v S1>actually having it think, right. So at inference time, more

0:06:31.243 --> 0:06:34.693
<v S1>effort being spent. And they actually say in the blog

0:06:34.693 --> 0:06:37.123
<v S1>post they're like, hey, look, right now it's taking, you know,

0:06:37.153 --> 0:06:40.963
<v S1>a few seconds to think or whatever, and it's going

0:06:40.993 --> 0:06:43.903
<v S1>to get back great results. But we're thinking, what if

0:06:43.903 --> 0:06:47.413
<v S1>it thinks for minutes? What if it thinks for hours?

0:06:47.413 --> 0:06:50.743
<v S1>What if it thinks for days or weeks? And not

0:06:50.743 --> 0:06:54.643
<v S1>only that, but we give it more compute power to think.

0:06:55.243 --> 0:06:58.123
<v S1>And the example they gave, I think this was an

0:06:58.123 --> 0:07:01.483
<v S1>OpenAI post. The example they gave here was how much

0:07:01.483 --> 0:07:04.003
<v S1>do you want to solve cancer? What if you could

0:07:04.033 --> 0:07:07.393
<v S1>build a data center? What if you had one data

0:07:07.423 --> 0:07:10.453
<v S1>center just for working on cancer and one data center

0:07:10.453 --> 0:07:16.393
<v S1>just for working on aging and so on? Okay. And

0:07:16.393 --> 0:07:19.663
<v S1>you basically have models like this that scale with the

0:07:19.663 --> 0:07:22.723
<v S1>inference difficulty based on the amount of difficulty of the,

0:07:22.763 --> 0:07:25.643
<v S1>of the thinking. And then, of course, you have a

0:07:25.643 --> 0:07:28.973
<v S1>smart model and a good neural net and all that, right?

0:07:29.003 --> 0:07:32.693
<v S1>Scalability of the of the neural net. So maybe that's

0:07:32.693 --> 0:07:36.863
<v S1>GPT-5, GPT-6, whatever. Combined with the good prompting,

0:07:36.863 --> 0:07:42.893
<v S1>combined with this thinking capability and combined with, you know,

0:07:42.923 --> 0:07:48.713
<v S1>all those things unified, combined with having that

0:07:48.713 --> 0:07:53.933
<v S1>giant infrastructure to run it so that that's insane. Um,

0:07:53.963 --> 0:07:56.213
<v S1>and this scales all the way down to, like, the

0:07:56.213 --> 0:08:00.863
<v S1>smallest stupid problem where it's just like, whatever, GPT-3

0:08:00.893 --> 0:08:04.313
<v S1>and you get back the answer almost instantaneously. In fact,

0:08:04.313 --> 0:08:07.733
<v S1>forget GPT-3. It's some local model that only does

0:08:07.733 --> 0:08:11.573
<v S1>one thing well. You're spending almost no resources whatsoever. It

0:08:11.573 --> 0:08:14.933
<v S1>just goes to your phone, bounces back immediately, doesn't go anywhere,

0:08:14.933 --> 0:08:18.803
<v S1>barely costs any cycles of a GPU or a CPU

0:08:18.833 --> 0:08:21.713
<v S1>because you don't need those resources to run. Because it's

0:08:21.713 --> 0:08:24.733
<v S1>just an easy thing to answer. So now we're talking

0:08:24.733 --> 0:08:30.703
<v S1>about AI that scales with the difficulty of the problem, right? With,

0:08:30.943 --> 0:08:35.893
<v S1>you know, cancer, aging, getting out of the solar system,

0:08:35.893 --> 0:08:40.303
<v S1>escaping the sun, expanding, ultimately heat death of the universe.

0:08:40.303 --> 0:08:45.613
<v S1>That's a big one, right? Because entropy kills everything. So ultimately,

0:08:45.613 --> 0:08:47.173
<v S1>we're going to need a way out of here at

0:08:47.173 --> 0:08:53.143
<v S1>some point, assuming we survive that long. Not happening anytime soon.

0:08:53.143 --> 0:08:55.783
<v S1>I wouldn't worry about that. But these are the types

0:08:55.783 --> 0:09:00.103
<v S1>of things that are really exciting. You know, the size

0:09:00.103 --> 0:09:04.213
<v S1>of the problem being a factor for which AI

0:09:04.243 --> 0:09:06.973
<v S1>you point at it with lots and lots of different

0:09:07.003 --> 0:09:11.533
<v S1>knobs and levers controlling that decision. So I think that's

0:09:11.533 --> 0:09:15.103
<v S1>really cool. Another important thing to mention is that the

0:09:15.103 --> 0:09:18.673
<v S1>innovation seems independent of what we were waiting for for

0:09:18.673 --> 0:09:22.423
<v S1>GPT-5. So based on all I read, all the

0:09:22.423 --> 0:09:25.883
<v S1>releases from OpenAI, and I've seen all the rumors and,

0:09:25.913 --> 0:09:28.163
<v S1>you know, talked to a bunch of people who've been

0:09:28.163 --> 0:09:32.603
<v S1>speculating about this. And this seems completely independent from, oh,

0:09:32.633 --> 0:09:36.473
<v S1>is this GPT-4o, is it 4o, is

0:09:36.473 --> 0:09:40.343
<v S1>it 5? Is it an early version of 5? Doesn't

0:09:40.343 --> 0:09:45.293
<v S1>really matter. It's like a separate axis. This is like

0:09:45.293 --> 0:09:48.923
<v S1>a capability. This is like thinking capability, which is on

0:09:48.923 --> 0:09:52.763
<v S1>a separate axis from how big or smart is the

0:09:52.763 --> 0:09:56.573
<v S1>neural net, right? Or how big or smart is the

0:09:56.603 --> 0:10:01.103
<v S1>is the model. So really, really cool to think about

0:10:01.103 --> 0:10:03.503
<v S1>those being two separate things because now we can start

0:10:03.503 --> 0:10:06.803
<v S1>thinking about, okay, well if GPT-5 is still going

0:10:06.803 --> 0:10:09.263
<v S1>to come out, you know, later this year, beginning of

0:10:09.293 --> 0:10:11.723
<v S1>next year or whenever it's going to come out and

0:10:11.723 --> 0:10:15.833
<v S1>whatever they're going to call it. Well, imagine GPT-5

0:10:15.833 --> 0:10:22.283
<v S1>with this thinking capability. That's cool. So presumably this is

0:10:22.283 --> 0:10:25.203
<v S1>just a feature that you can add onto any model,

0:10:25.293 --> 0:10:28.323
<v S1>which is what we were just talking about. And I think

0:10:28.323 --> 0:10:32.823
<v S1>this is okay. This is really, really crucial here. I've

0:10:32.823 --> 0:10:34.983
<v S1>been talking for a long time about slack in the

0:10:34.983 --> 0:10:39.093
<v S1>rope and tricks that we're going to use to jump

0:10:39.093 --> 0:10:43.503
<v S1>ahead in, um, advancement of AI. So check this out.

0:10:43.533 --> 0:10:45.513
<v S1>A lot of people are like, oh, we're running into

0:10:45.513 --> 0:10:49.173
<v S1>a data wall. Oh, neural nets are only so good,

0:10:49.233 --> 0:10:52.293
<v S1>they can only get so good. We've already hit a thing.

0:10:52.293 --> 0:10:54.993
<v S1>I mean, so many, so many people are saying things

0:10:54.993 --> 0:11:00.903
<v S1>like this that just sound absolutely ridiculous to me. First

0:11:00.903 --> 0:11:03.963
<v S1>of all, they were the ones saying we wouldn't be here.

0:11:03.963 --> 0:11:08.253
<v S1>And so now we are here and everyone's surprised and

0:11:08.253 --> 0:11:10.773
<v S1>they're like, well, here's what we know for sure is

0:11:10.803 --> 0:11:13.833
<v S1>we're not going to get any better. How can I

0:11:13.833 --> 0:11:18.063
<v S1>believe you if you didn't predict any of this and

0:11:18.063 --> 0:11:21.753
<v S1>you were absolutely certain back then, and now you're absolutely

0:11:21.753 --> 0:11:26.323
<v S1>certain it's not going to jump ahead again, right? Leopold

0:11:26.353 --> 0:11:29.713
<v S1>talks about this in his paper. There's lots of different

0:11:29.713 --> 0:11:33.793
<v S1>ways to get better. There's the architecture of the model.

0:11:33.793 --> 0:11:37.723
<v S1>There's the size of the model. I forget what all

0:11:37.723 --> 0:11:40.273
<v S1>levers he had, but it's the architecture of the model,

0:11:40.303 --> 0:11:42.043
<v S1>the size of the model. And I think it was

0:11:42.043 --> 0:11:45.793
<v S1>unhobbling was the other one, which is what I called

0:11:46.003 --> 0:11:48.913
<v S1>like a year ago, slack in the rope, or tricks

0:11:48.913 --> 0:11:50.533
<v S1>we're going to. This is what I told a friend

0:11:50.533 --> 0:11:53.203
<v S1>of mine who's really smart in this stuff. I said,

0:11:53.473 --> 0:11:58.663
<v S1>watch this. We're going to find multiple tricks where we're

0:11:58.663 --> 0:12:02.113
<v S1>messing around in percentage points, and then we find a

0:12:02.113 --> 0:12:05.003
<v S1>thing and it jumps us 2 or 3 or 5

0:12:05.003 --> 0:12:10.663
<v S1>or 10x or 100x ahead. And I

0:12:10.663 --> 0:12:13.843
<v S1>actually learned this from him. Uh, I actually learned this

0:12:13.843 --> 0:12:16.033
<v S1>from him. He was like, hey, you know, there are

0:12:16.033 --> 0:12:19.243
<v S1>things that jump you ahead. Um, and I think he

0:12:19.243 --> 0:12:22.333
<v S1>gave me an example from some public paper or whatever. And

0:12:22.333 --> 0:12:25.793
<v S1>it was an example of like a big jump. And

0:12:25.793 --> 0:12:28.673
<v S1>my natural intuition was there's going to be a lot

0:12:28.673 --> 0:12:33.353
<v S1>more of those, and they're not coming from pursuing along

0:12:33.353 --> 0:12:36.293
<v S1>this axis, which is difficult. They are actually just hanging

0:12:36.293 --> 0:12:38.693
<v S1>off to the side. It's like, oh, did you know

0:12:38.693 --> 0:12:40.823
<v S1>if you just changed the color of this? Hey, did

0:12:40.823 --> 0:12:43.673
<v S1>you know if you just orient the data backward instead

0:12:43.673 --> 0:12:45.953
<v S1>of forward? Hey, did you know if you just prune

0:12:45.983 --> 0:12:48.653
<v S1>the data in this way or if you add this

0:12:48.653 --> 0:12:52.283
<v S1>particular data set or. And I'm just making up these examples,

0:12:52.283 --> 0:12:57.053
<v S1>but simple things that you wouldn't think would work. And

0:12:57.053 --> 0:13:01.253
<v S1>this is why Leopold talks about if you automate an

0:13:01.253 --> 0:13:05.543
<v S1>AI engineer or an AI researcher, is what he called it.

0:13:05.573 --> 0:13:08.783
<v S1>That's when it gets completely silly, because they have the

0:13:08.783 --> 0:13:10.913
<v S1>ability to now go and try a whole bunch of

0:13:10.913 --> 0:13:15.293
<v S1>these things, including these tricks. Um, all this to say

0:13:15.293 --> 0:13:18.683
<v S1>that the slack in the rope or this series of

0:13:18.683 --> 0:13:22.943
<v S1>tricks is going to keep multiplying our advances, and that's

0:13:22.973 --> 0:13:27.483
<v S1>at the same time that we're working on the algorithms. Oh,

0:13:27.483 --> 0:13:30.123
<v S1>that was the other. That was the other factor is algorithms.

0:13:30.123 --> 0:13:32.073
<v S1>That was this is going to happen at the same

0:13:32.073 --> 0:13:34.773
<v S1>time we're working on the algorithms to make those better.

0:13:34.803 --> 0:13:38.733
<v S1>We're also working on the size of the neural net, um,

0:13:38.733 --> 0:13:41.523
<v S1>and the quality and the structure. And everything about the

0:13:41.523 --> 0:13:44.013
<v S1>neural net is going to get bigger and more powerful,

0:13:44.043 --> 0:13:46.953
<v S1>but mostly just a matter of size, number of parameters.

0:13:48.003 --> 0:13:51.393
<v S1>But all those things are changing at the same time

0:13:51.393 --> 0:13:55.563
<v S1>as we're finding all these tricks. Right. So we're talking

0:13:55.563 --> 0:14:00.123
<v S1>about, this has just begun. And this is what people

0:14:00.123 --> 0:14:02.883
<v S1>don't realize. This is just now starting. We're going to

0:14:02.883 --> 0:14:06.483
<v S1>look back in two years and be like, what was that?

0:14:06.483 --> 0:14:11.943
<v S1>That was silly. Right. And so I really want to

0:14:11.973 --> 0:14:15.693
<v S1>warn people against thinking we're hitting some kind of a wall.

0:14:16.293 --> 0:14:19.053
<v S1>Think of it this way. We just found alien technology.

0:14:19.083 --> 0:14:21.933
<v S1>We have no idea how it works. And we're like,

0:14:21.963 --> 0:14:26.063
<v S1>poking it with a stick and it's already spitting out

0:14:26.063 --> 0:14:29.693
<v S1>amazing things. So think about that. Okay, we got a

0:14:29.693 --> 0:14:32.363
<v S1>glowy ball. We don't know how it floats. We don't

0:14:32.393 --> 0:14:36.923
<v S1>know how it's doing anti-gravity, right? We don't know how

0:14:36.923 --> 0:14:39.443
<v S1>it's doing this. We don't know how it's reflecting its surface.

0:14:39.473 --> 0:14:41.543
<v S1>We don't know how it's coming up with these answers.

0:14:41.543 --> 0:14:43.763
<v S1>We don't know how it got here from the other

0:14:43.763 --> 0:14:46.763
<v S1>solar system. We don't know anything about it. You poke

0:14:46.763 --> 0:14:49.463
<v S1>it with a stick and it tells us this magic stuff

0:14:49.463 --> 0:14:54.023
<v S1>and we're like, holy crap, that's amazing. Somebody walks up,

0:14:54.023 --> 0:14:57.053
<v S1>sees you poke it with a stick and goes, yeah,

0:14:57.083 --> 0:15:00.803
<v S1>that's I mean, that's that's all it's ever going to

0:15:00.803 --> 0:15:04.283
<v S1>be able to do. I mean, I've seen you poke

0:15:04.283 --> 0:15:07.613
<v S1>it with a stick twice, and it gave you kind

0:15:07.613 --> 0:15:10.703
<v S1>of a similar answer, which means that's all we could

0:15:10.703 --> 0:15:16.433
<v S1>learn from this alien ball. That's their conclusion. I am

0:15:16.433 --> 0:15:19.553
<v S1>certain that since you poked it with a stick while

0:15:19.553 --> 0:15:22.133
<v S1>I was standing here three times, and it kind of

0:15:22.163 --> 0:15:26.323
<v S1>gave you a similar answer, one, it must be stupid.

0:15:26.353 --> 0:15:29.833
<v S1>Two, it's not as smart as us. And three, this

0:15:29.833 --> 0:15:32.683
<v S1>is as smart as it's ever going to be.

0:15:32.713 --> 0:15:36.043
<v S1>This is the most it has to offer. That is

0:15:36.043 --> 0:15:39.973
<v S1>the claim that's being made by these kind of like denialists,

0:15:40.003 --> 0:15:45.163
<v S1>in my view. And that doesn't mean the current shiny

0:15:45.163 --> 0:15:49.783
<v S1>ball is better than humans, or it should replace humans,

0:15:49.783 --> 0:15:52.723
<v S1>or it could do everything we could do. Like, this

0:15:52.723 --> 0:15:55.093
<v S1>is not a competition. Okay, here's a better way to

0:15:55.123 --> 0:15:58.063
<v S1>think about this. This is not like a rock that

0:15:58.063 --> 0:16:00.973
<v S1>we have animated. Think of it this way. If an

0:16:00.973 --> 0:16:04.363
<v S1>alien comes here because someone else was like, hey, this

0:16:04.363 --> 0:16:08.083
<v S1>is not thinking, this is processing. And I'm like, come on,

0:16:08.083 --> 0:16:11.503
<v S1>come on. If you if an alien comes here, let's

0:16:11.503 --> 0:16:14.863
<v S1>assume we know how our brain works. An alien comes

0:16:14.863 --> 0:16:17.473
<v S1>here and we look at its brain, or it shows

0:16:17.473 --> 0:16:22.813
<v S1>us its brain and it looks different. And we're like, oh,

0:16:22.843 --> 0:16:28.783
<v S1>you guys do neurons and synapses different than us? Who's

0:16:28.783 --> 0:16:31.363
<v S1>going to walk over and be like, well, since they're

0:16:31.363 --> 0:16:35.593
<v S1>doing neurons and synapses different than us, they're not thinking.

0:16:36.043 --> 0:16:40.723
<v S1>Only humans can think. And I'm like, they got here.

0:16:40.753 --> 0:16:43.483
<v S1>They got here, didn't they? It's a little shiny ball.

0:16:43.483 --> 0:16:46.213
<v S1>And they got here from whatever part of the galaxy

0:16:46.213 --> 0:16:51.253
<v S1>or universe that they came from. They're obviously doing something right.

0:16:52.033 --> 0:16:55.573
<v S1>And AI is obviously doing something right too. So I

0:16:55.573 --> 0:16:59.173
<v S1>think it's a little bit specious. Is that is that

0:16:59.173 --> 0:17:05.713
<v S1>the name of the word? It's like specious to just

0:17:05.713 --> 0:17:09.973
<v S1>magically assume that we are the best. Only we are

0:17:10.003 --> 0:17:15.493
<v S1>thinking, only we are special. Instead of thinking like we

0:17:15.493 --> 0:17:19.813
<v S1>might have this nascent alien intelligence thing going on that

0:17:19.813 --> 0:17:22.873
<v S1>actually is doing things that are very much analogous to us.

0:17:22.993 --> 0:17:25.223
<v S1>It reminds me of the first time that I clicked

0:17:25.223 --> 0:17:29.423
<v S1>around inside of Linux. This is like late '90s. I

0:17:29.423 --> 0:17:33.863
<v S1>was messing with Linux. This must have been like '97, '98

0:17:33.863 --> 0:17:37.673
<v S1>or something. I'm messing with Linux and I'm clicking around

0:17:37.703 --> 0:17:41.693
<v S1>because I had started with Windows and I'm like, oh,

0:17:41.693 --> 0:17:44.303
<v S1>it opens windows and it opens things that I could

0:17:44.303 --> 0:17:47.633
<v S1>click and navigate. Then I'm like, it's it's just like

0:17:47.663 --> 0:17:52.523
<v S1>on Windows Explorer. And this like, blew me away. It

0:17:52.553 --> 0:17:55.373
<v S1>absolutely blew me away that this was just a different

0:17:55.373 --> 0:17:59.033
<v S1>way of doing the same thing. And that underneath this,

0:17:59.303 --> 0:18:02.543
<v S1>there's a universal thing of you need to be able

0:18:02.543 --> 0:18:05.333
<v S1>to browse files, you need to be able to open windows,

0:18:05.333 --> 0:18:08.903
<v S1>you need to be able to close windows. And that

0:18:08.903 --> 0:18:10.913
<v S1>clicked for me. And I'm like, oh, I guess like

0:18:10.943 --> 0:18:13.763
<v S1>all operating systems are going to do this differently. It's

0:18:13.763 --> 0:18:16.193
<v S1>the same with aliens. It's the same with like they

0:18:16.193 --> 0:18:20.153
<v S1>might think differently, but whatever. They have to think, right.

0:18:20.153 --> 0:18:23.633
<v S1>So why would we expect this synthetic intelligence that we've

0:18:23.673 --> 0:18:28.053
<v S1>birthed to do it exactly the same way that we

0:18:28.083 --> 0:18:32.253
<v S1>do? We should not expect that. We

0:18:32.283 --> 0:18:38.763
<v S1>got here accidentally stumbling through time due to evolution. And

0:18:38.763 --> 0:18:42.663
<v S1>we've got this version that we have and it's awesome, obviously.

0:18:42.933 --> 0:18:46.953
<v S1>But like, that's way different than we invented this thing

0:18:46.983 --> 0:18:52.293
<v S1>five years ago, or whenever that was, 2017, six years ago.

0:18:53.313 --> 0:18:55.803
<v S1>And I know it goes further back than that. But

0:18:55.833 --> 0:19:01.083
<v S1>you know what I'm saying? Transformers. All right. So that's that.

0:19:01.083 --> 0:19:04.503
<v S1>And this this is becoming a long thing. But whatever

0:19:04.533 --> 0:19:09.663
<v S1>we'll go with it. So yeah, basically we have no

0:19:09.963 --> 0:19:14.463
<v S1>idea how early all of this is. We're likely to

0:19:14.493 --> 0:19:17.643
<v S1>find 10, 20, or 200 more of these holy-crap

0:19:17.673 --> 0:19:23.463
<v S1>optimizations like this thinking thing before we start hitting any

0:19:23.463 --> 0:19:30.323
<v S1>limits for neural network architecture or the transformer, like.

0:19:30.743 --> 0:19:34.073
<v S1>Plus we could just find something better than a transformer.

0:19:34.073 --> 0:19:38.993
<v S1>You realize how lucky we were to find the transformer.

0:19:39.023 --> 0:19:41.813
<v S1>Like the people who made that paper. They're like, hey,

0:19:41.813 --> 0:19:44.003
<v S1>this is this is a cool way we think this

0:19:44.003 --> 0:19:45.923
<v S1>is a cool way of doing something. They didn't know

0:19:45.923 --> 0:19:50.033
<v S1>what they had. Okay, you should watch a Karpathy talk

0:19:50.063 --> 0:19:53.303
<v S1>about the transformer. He's like, this thing is a general

0:19:53.303 --> 0:19:57.173
<v S1>purpose computer. This thing is insanely good at learning. He

0:19:57.173 --> 0:20:01.193
<v S1>talks about different ways that it's better than humans at learning. Okay,

0:20:01.223 --> 0:20:05.933
<v S1>some some people randomly found this thing and it shot

0:20:05.933 --> 0:20:08.813
<v S1>us off. Okay. So so check this out. This is

0:20:08.813 --> 0:20:14.573
<v S1>another example of finding tricks or slack in the rope

0:20:14.573 --> 0:20:17.873
<v S1>just lying on the ground. So we stumble through AI

0:20:18.383 --> 0:20:22.943
<v S1>for decades and decades and decades. And then someone's like, hey,

0:20:22.943 --> 0:20:25.413
<v S1>this is kind of cool about this attention mechanism. Hey,

0:20:25.413 --> 0:20:29.193
<v S1>what do you think about this architecture for a neural net? Boom!

0:20:29.193 --> 0:20:34.263
<v S1>Now we have this take off. There's nothing saying somebody

0:20:34.263 --> 0:20:37.653
<v S1>isn't going to be like, I like what you did

0:20:37.653 --> 0:20:42.603
<v S1>with that transformer architecture. What if it looked like this instead?

0:20:42.633 --> 0:20:46.473
<v S1>It might be 20 times better. It might be 2000

0:20:46.473 --> 0:20:50.403
<v S1>times better. It might be 4% better. It doesn't matter.

0:20:50.433 --> 0:20:55.053
<v S1>Like the we have only just begun. We have only

0:20:55.053 --> 0:20:58.833
<v S1>just begun. I can absolutely guarantee you that assuming we

0:20:58.833 --> 0:21:02.043
<v S1>don't kill ourselves off as a result of this, like

0:21:02.073 --> 0:21:07.713
<v S1>that would set things back. But I'm trying to get

0:21:07.713 --> 0:21:12.873
<v S1>you to think about things in this way because it's

0:21:12.873 --> 0:21:16.773
<v S1>insane what's about to happen. And yeah, I'm going to

0:21:16.773 --> 0:21:18.963
<v S1>have more examples here. I'm working on an example right

0:21:18.963 --> 0:21:21.783
<v S1>here on this other screen. Uh, pretty cool thing I'm

0:21:21.783 --> 0:21:25.023
<v S1>building with it. Um, okay. So that was that.