1 00:00:21,513 --> 00:00:26,403 S1: All right. Welcome to Unsupervised Learning. This is Daniel. Okay. 2 00:00:26,433 --> 00:00:29,613 S1: I'm going to start off with something that just happened. 3 00:00:29,613 --> 00:00:34,443 S1: So Strawberry just launched. It is being called o1, 4 00:00:34,443 --> 00:00:38,403 S1: which I assume the O might mean Orion because people 5 00:00:38,403 --> 00:00:41,433 S1: were saying that it might have been called Orion. So 6 00:00:41,433 --> 00:00:44,193 S1: this is the new model from OpenAI. And I've been 7 00:00:44,193 --> 00:00:48,783 S1: messing with it for a couple hours already. So, uh, 8 00:00:48,783 --> 00:00:51,033 S1: first thing is I gave it a task of building 9 00:00:51,033 --> 00:00:53,553 S1: a business plan for something I'm working on, and it 10 00:00:53,553 --> 00:00:56,313 S1: produced output that was far and above better than 4o 11 00:00:56,343 --> 00:01:03,663 S1: or Sonnet 3.5. Yeah, it was really quite, quite good. Uh, 12 00:01:03,663 --> 00:01:06,843 S1: very detailed. It took quite a while. There's no streaming 13 00:01:06,843 --> 00:01:09,933 S1: in the API, so it feels a little rough compared 14 00:01:09,933 --> 00:01:13,863 S1: to the current models. But whatever, that, that will come 15 00:01:13,863 --> 00:01:18,693 S1: with time. Uh, it's quite expensive. So basically I did 16 00:01:18,693 --> 00:01:24,703 S1: a couple of conversation analyses, uh, analyses by passing in, um, 17 00:01:25,183 --> 00:01:28,933 S1: you know, conversations like transcripts from podcasts. And I think 18 00:01:28,933 --> 00:01:31,633 S1: I did 2 or 3 of those, and it was 19 00:01:31,633 --> 00:01:35,443 S1: almost a dollar. And there's also a mini version which 20 00:01:35,443 --> 00:01:39,403 S1: is way less expensive, but I'm trying to test the capabilities, 21 00:01:39,403 --> 00:01:42,703 S1: so I'm using the full model. 
But yeah, a few 22 00:01:42,703 --> 00:01:50,623 S1: requests for a dollar, whereas I would say probably many 23 00:01:50,623 --> 00:01:55,513 S1: dozen or a couple of hundred requests are normally like 24 00:01:55,783 --> 00:02:01,453 S1: a few dollars. So it's many factors more expensive. So 25 00:02:01,453 --> 00:02:05,983 S1: just something to consider. As with most models, you don't 26 00:02:05,983 --> 00:02:08,923 S1: need the biggest, best or latest. This is a tweet 27 00:02:08,923 --> 00:02:12,043 S1: I just put out, so I'm going through it. So 28 00:02:12,073 --> 00:02:16,543 S1: this does one particular thing well, and better 29 00:02:16,543 --> 00:02:20,683 S1: than anything else, which is pausing to think and actually 30 00:02:20,683 --> 00:02:23,453 S1: going step by step. That's kind of like the magic sauce 31 00:02:23,453 --> 00:02:27,743 S1: here, is the chain of thought reasoning. So if you 32 00:02:27,743 --> 00:02:30,773 S1: don't need that for what you're trying to do, you 33 00:02:30,773 --> 00:02:33,893 S1: definitely shouldn't use this because it's more expensive, takes longer 34 00:02:33,893 --> 00:02:38,243 S1: to run, all those sorts of reasons. This type of 35 00:02:38,243 --> 00:02:42,473 S1: model and similar ones going forward are going to massively 36 00:02:42,473 --> 00:02:45,953 S1: benefit from high quality prompting. So things like we use 37 00:02:45,953 --> 00:02:50,123 S1: with Fabric, which is open source on GitHub if you're 38 00:02:50,123 --> 00:02:52,943 S1: not familiar, but you probably are if you're listening to this. 39 00:02:53,213 --> 00:02:55,793 S1: But essentially, the more you know what you want and 40 00:02:55,793 --> 00:02:58,133 S1: the better you can articulate that, the better this is 41 00:02:58,133 --> 00:03:01,493 S1: going to perform, because it is a chain of thought 42 00:03:01,523 --> 00:03:04,523 S1: sort of concept. 
So the more you give it to 43 00:03:04,553 --> 00:03:12,203 S1: help with that, the better. Okay, sorry about that. I 44 00:03:12,203 --> 00:03:15,113 S1: was just checking to make sure I wasn't doxxing anyone 45 00:03:15,113 --> 00:03:18,143 S1: by showing you my messages, but I was not, so 46 00:03:18,173 --> 00:03:24,793 S1: I don't have to rerecord. Okay, so, um. continuing on 47 00:03:24,793 --> 00:03:30,403 S1: here and going to expand this window fully. Okay. So, um, yeah, 48 00:03:30,433 --> 00:03:33,163 S1: the better you can articulate all of this. And by 49 00:03:33,163 --> 00:03:35,233 S1: the way, I want to do an edit there for 50 00:03:35,233 --> 00:03:39,703 S1: the team. So the better you can articulate this stuff 51 00:03:40,303 --> 00:03:43,993 S1: in exactly what you want, the better things are. That's 52 00:03:43,993 --> 00:03:47,893 S1: the bottom line here. So a lot of people are 53 00:03:47,893 --> 00:03:52,243 S1: going to question is this AGI or not? Uh, Sam 54 00:03:52,243 --> 00:03:55,813 S1: Altman already responded. He's like, yeah, this absolutely is not. 55 00:03:56,023 --> 00:03:59,923 S1: So that that should end it in terms of the 56 00:03:59,923 --> 00:04:03,253 S1: actual creator of this thing saying it's not. I also 57 00:04:03,253 --> 00:04:07,363 S1: don't think it is either. Uh, whatever that matters for. 58 00:04:07,363 --> 00:04:10,633 S1: But bottom line is, anyone who's making the claim of 59 00:04:10,663 --> 00:04:14,953 S1: like this is or isn't AGI. Here's my request to 60 00:04:14,953 --> 00:04:18,583 S1: the internet. Basically, anyone claiming something is or is not 61 00:04:18,583 --> 00:04:22,483 S1: should also provide a concise and achievable definition of what 62 00:04:22,533 --> 00:04:25,443 S1: that means. 
And I have one here, of course, which 63 00:04:25,443 --> 00:04:30,513 S1: is I've talked about before, the ability of an AI, 64 00:04:30,543 --> 00:04:33,243 S1: whether a model or a product or a system, to 65 00:04:33,273 --> 00:04:37,593 S1: perform the work of an average US based knowledge worker 66 00:04:37,593 --> 00:04:43,023 S1: in 2022, and I say 2022 because that's pre GPT-4. Right. 67 00:04:43,623 --> 00:04:51,393 S1: So basically pre AI in these terms anyway. So yeah, 68 00:04:51,423 --> 00:04:54,783 S1: anyone who's talking about AGI, make sure they have a definition. 69 00:04:54,783 --> 00:04:58,653 S1: Otherwise you're just wasting your time because the entire conversation 70 00:04:58,653 --> 00:05:01,443 S1: will be about definitions. And you might not even figure 71 00:05:01,443 --> 00:05:06,513 S1: that out until fucking two hours later. Sorry for the cussing. 72 00:05:06,903 --> 00:05:09,033 S1: All right. One of the most important changes to me 73 00:05:09,033 --> 00:05:12,183 S1: with this model. This this is massive, okay? This is 74 00:05:12,183 --> 00:05:15,813 S1: the first model that does this. Uh, it's the first 75 00:05:15,813 --> 00:05:19,143 S1: model of its kind to do this. Very, very interesting. 76 00:05:19,803 --> 00:05:24,783 S1: It's actually spending tokens to think. Okay, before you had 77 00:05:24,783 --> 00:05:28,803 S1: input and you had output and you were being charged, and 78 00:05:28,803 --> 00:05:30,873 S1: the amount of work that was being done was based 79 00:05:30,873 --> 00:05:33,693 S1: on the number of tokens coming in and the number 80 00:05:33,693 --> 00:05:36,303 S1: of tokens coming out, and that that was the extent 81 00:05:36,303 --> 00:05:40,143 S1: of it. What's happening now is you have tokens coming 82 00:05:40,143 --> 00:05:44,253 S1: in and you have tokens coming out, but there are 83 00:05:44,253 --> 00:05:50,193 S1: tokens being spent while it's thinking. 
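To make that billing change concrete, here is a minimal Python sketch of how the cost of one request shifts once thinking tokens are counted. It assumes the hidden thinking tokens are billed at the same rate as output tokens. The `Usage` class, the `request_cost` function, and every price and token count in it are hypothetical placeholders for illustration, not real API objects or real pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure)."""
    input_tokens: int      # tokens coming in (the prompt)
    reasoning_tokens: int  # tokens spent while the model is thinking
    output_tokens: int     # tokens coming out (the visible answer)


def request_cost(usage: Usage,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Return the cost in dollars for one request.

    Assumption: thinking tokens are charged at the output-token rate,
    so they are added to the visible output before pricing.
    """
    billed_output = usage.reasoning_tokens + usage.output_tokens
    total = (usage.input_tokens * input_price_per_million
             + billed_output * output_price_per_million)
    return total / 1_000_000


# Same prompt and same visible answer, with and without thinking tokens.
# All numbers below are made up for illustration.
without_thinking = Usage(input_tokens=10_000, reasoning_tokens=0, output_tokens=1_000)
with_thinking = Usage(input_tokens=10_000, reasoning_tokens=5_000, output_tokens=1_000)

cost_plain = request_cost(without_thinking, 10.0, 40.0)
cost_thinking = request_cost(with_thinking, 10.0, 40.0)
print(f"without thinking: ${cost_plain:.2f}, with thinking: ${cost_thinking:.2f}")
```

The point of the sketch is only that the third bucket of tokens raises the bill even when the prompt and the visible answer stay the same size.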
It's actually thinking and 84 00:05:50,193 --> 00:05:54,153 S1: reasoning through how to solve the problem. And what's really 85 00:05:54,153 --> 00:05:59,763 S1: fascinating about this is that you now have multiple factors here. Okay. 86 00:05:59,793 --> 00:06:03,633 S1: So you can do better prompting. And this is the 87 00:06:03,633 --> 00:06:07,833 S1: next piece here. Number seven. You could do better prompting. 88 00:06:07,983 --> 00:06:11,133 S1: You could use a smarter model. Or you could have 89 00:06:11,133 --> 00:06:15,813 S1: the model think harder on the problem. And these are 90 00:06:15,813 --> 00:06:19,743 S1: all going to be levers and knobs that we have 91 00:06:19,773 --> 00:06:22,503 S1: to get better results from AI. And this is the 92 00:06:22,543 --> 00:06:26,143 S1: first time we have this third level lever of like 93 00:06:26,173 --> 00:06:31,243 S1: actually having it think, right. So at inference time, more 94 00:06:31,243 --> 00:06:34,693 S1: effort being spent. And they actually say in the blog 95 00:06:34,693 --> 00:06:37,123 S1: post they're like, hey, look, right now it's taking, you know, 96 00:06:37,153 --> 00:06:40,963 S1: a few seconds to think or whatever, and it's going 97 00:06:40,993 --> 00:06:43,903 S1: to get back great results. But we're thinking, what if 98 00:06:43,903 --> 00:06:47,413 S1: it thinks for minutes? What if it thinks for hours? 99 00:06:47,413 --> 00:06:50,743 S1: What if it thinks for days or weeks? And not 100 00:06:50,743 --> 00:06:54,643 S1: only that, but we give it more compute power to think. 101 00:06:55,243 --> 00:06:58,123 S1: And the example they gave, I think this was an 102 00:06:58,123 --> 00:07:01,483 S1: OpenAI post. The example they gave here was how much 103 00:07:01,483 --> 00:07:04,003 S1: do you want to solve cancer? What if you could 104 00:07:04,033 --> 00:07:07,393 S1: build a data center? 
What if you had one data 105 00:07:07,423 --> 00:07:10,453 S1: center just for working on cancer and one data center 106 00:07:10,453 --> 00:07:16,393 S1: just for working on aging and so on? Okay. And 107 00:07:16,393 --> 00:07:19,663 S1: you basically have models like this that scale with the 108 00:07:19,663 --> 00:07:22,723 S1: inference difficulty based on the amount of difficulty of the, 109 00:07:22,763 --> 00:07:25,643 S1: of the thinking. And then, of course, you have a 110 00:07:25,643 --> 00:07:28,973 S1: smart model and a good neural net and all that, right? 111 00:07:29,003 --> 00:07:32,693 S1: Scalability of the of the neural net. So maybe that's 112 00:07:32,693 --> 00:07:36,863 S1: GPT five, GPT six, whatever. Combined with the good prompting, 113 00:07:36,863 --> 00:07:42,893 S1: combined with this thinking capability and combined with, you know, 114 00:07:42,923 --> 00:07:48,713 S1: all those things unified into the combined with having that 115 00:07:48,713 --> 00:07:53,933 S1: giant infrastructure to run it so that that's insane. Um, 116 00:07:53,963 --> 00:07:56,213 S1: and the scales all the way down to like, the 117 00:07:56,213 --> 00:08:00,863 S1: smallest stupid problem where it's just like, whatever, GPT three 118 00:08:00,893 --> 00:08:04,313 S1: and you get back the answer almost instantaneously. In fact, 119 00:08:04,313 --> 00:08:07,733 S1: forget GPT three. It's some local model that only does 120 00:08:07,733 --> 00:08:11,573 S1: one thing well. You're spending almost no resources whatsoever. It 121 00:08:11,573 --> 00:08:14,933 S1: just goes to your phone, bounces back immediately, doesn't go anywhere, 122 00:08:14,933 --> 00:08:18,803 S1: barely costs any cycles of a GPU or a CPU 123 00:08:18,833 --> 00:08:21,713 S1: because you don't need those resources to run. Because it's 124 00:08:21,713 --> 00:08:24,733 S1: just an easy thing to answer. 
So now we're talking 125 00:08:24,733 --> 00:08:30,703 S1: about AI that scales with the difficulty of the problem, right? With, 126 00:08:30,943 --> 00:08:35,893 S1: you know, cancer, aging, getting out of the solar system, 127 00:08:35,893 --> 00:08:40,303 S1: escaping the sun, expanding, ultimately heat death of the universe. 128 00:08:40,303 --> 00:08:45,613 S1: That's a big one, right? Because entropy kills everything. So ultimately, 129 00:08:45,613 --> 00:08:47,173 S1: we're going to need a way out of here at 130 00:08:47,173 --> 00:08:53,143 S1: some point, assuming we survive that long. Not happening anytime soon. 131 00:08:53,143 --> 00:08:55,783 S1: I wouldn't worry about that. But these are the types 132 00:08:55,783 --> 00:09:00,103 S1: of things that are really exciting. You know, the size 133 00:09:00,103 --> 00:09:04,213 S1: of the problem being a factor for which AI 134 00:09:04,243 --> 00:09:06,973 S1: you point at it, with lots and lots of different 135 00:09:07,003 --> 00:09:11,533 S1: knobs and levers controlling that decision. So I think that's 136 00:09:11,533 --> 00:09:15,103 S1: really cool. Another important thing to mention is that the 137 00:09:15,103 --> 00:09:18,673 S1: innovation seems independent of what we were waiting for for 138 00:09:18,673 --> 00:09:22,423 S1: GPT-5. So based on all I read, all the 139 00:09:22,423 --> 00:09:25,883 S1: releases from OpenAI. And I've seen all the rumors and, 140 00:09:25,913 --> 00:09:28,163 S1: you know, talked to a bunch of people who've been 141 00:09:28,163 --> 00:09:32,603 S1: speculating about this. And this seems completely independent from, oh, 142 00:09:32,633 --> 00:09:36,473 S1: is this GPT-4? Oh, is it 4o? Oh, is 143 00:09:36,473 --> 00:09:40,343 S1: it five? Is it an early version of five? Doesn't 144 00:09:40,343 --> 00:09:45,293 S1: really matter. It's like a separate axis. This is like 145 00:09:45,293 --> 00:09:48,923 S1: a capability. 
This is like thinking capability, which is on 146 00:09:48,923 --> 00:09:52,763 S1: a separate axis from how big or smart is the 147 00:09:52,763 --> 00:09:56,573 S1: neural net, right? Or how big or smart is the 148 00:09:56,603 --> 00:10:01,103 S1: is the model. So really, really cool to think about 149 00:10:01,103 --> 00:10:03,503 S1: those being two separate things because now we can start 150 00:10:03,503 --> 00:10:06,803 S1: thinking about, okay, well if GPT five is still going 151 00:10:06,833 --> 00:10:09,263 S1: to come out, you know, later this year, beginning in 152 00:10:09,293 --> 00:10:11,723 S1: next year or whenever it's going to come out and 153 00:10:11,723 --> 00:10:15,833 S1: whatever they're going to call it. Well, imagine GPT five 154 00:10:15,833 --> 00:10:22,283 S1: with this thinking capability. That's cool. So presumably this is 155 00:10:22,283 --> 00:10:25,203 S1: just a feature that you can add onto any model, 156 00:10:25,293 --> 00:10:28,323 S1: which is what we're just talking about. And I think 157 00:10:28,323 --> 00:10:32,823 S1: this is okay. This is really, really crucial here. I've 158 00:10:32,823 --> 00:10:34,983 S1: been talking for a long time about slack in the 159 00:10:34,983 --> 00:10:39,093 S1: rope and tricks that we're going to use to jump 160 00:10:39,093 --> 00:10:43,503 S1: ahead in, um, advancement of AI, so so check this out. 161 00:10:43,533 --> 00:10:45,513 S1: A lot of people are like, oh, we're running into 162 00:10:45,513 --> 00:10:49,173 S1: a data wall. Oh, neural nets are only so good 163 00:10:49,233 --> 00:10:52,293 S1: they can only get so good. We've already hit a thing. 164 00:10:52,293 --> 00:10:54,993 S1: I mean, so many, so many people are saying things 165 00:10:54,993 --> 00:11:00,903 S1: like this that just sound absolutely ridiculous to me. First 166 00:11:00,903 --> 00:11:03,963 S1: of all, they were the ones saying we wouldn't be here. 
167 00:11:03,963 --> 00:11:08,253 S1: And so now we are here and everyone's surprised and 168 00:11:08,253 --> 00:11:10,773 S1: they're like, well, here's what we know for sure is 169 00:11:10,803 --> 00:11:13,833 S1: we're not going to get any better. How can I 170 00:11:13,833 --> 00:11:18,063 S1: believe you if you didn't predict any of this and 171 00:11:18,063 --> 00:11:21,753 S1: you were absolutely certain back then, and now you're absolutely 172 00:11:21,753 --> 00:11:26,323 S1: certain it's not going to jump ahead again, right? Leopold 173 00:11:26,353 --> 00:11:29,713 S1: talks about this in his paper. There's lots of different 174 00:11:29,713 --> 00:11:33,793 S1: ways to get better. There's the architecture of the model. 175 00:11:33,793 --> 00:11:37,723 S1: There's the size of the model. I forget what all 176 00:11:37,723 --> 00:11:40,273 S1: levers he had, but it's the architecture of the model, 177 00:11:40,303 --> 00:11:42,043 S1: the size of the model. And I think it was 178 00:11:42,043 --> 00:11:45,793 S1: unhobbling was the other one, which is what I called, 179 00:11:46,003 --> 00:11:48,913 S1: like a year ago, slack in the rope, or tricks 180 00:11:48,913 --> 00:11:50,533 S1: we're going to. This is what I told a friend 181 00:11:50,533 --> 00:11:53,203 S1: of mine who's really smart in this stuff. I said, 182 00:11:53,473 --> 00:11:58,663 S1: watch this. We're going to find multiple tricks where we're 183 00:11:58,663 --> 00:12:02,113 S1: messing around in percentage points, and then we find a 184 00:12:02,113 --> 00:12:05,003 S1: thing and it jumps us 2 or 3 or 5 185 00:12:05,003 --> 00:12:10,663 S1: or 10x or 100x ahead. And and I 186 00:12:10,663 --> 00:12:13,843 S1: actually learned this from him. Uh, I actually learned this 187 00:12:13,843 --> 00:12:16,033 S1: from him. He was like, hey, you know, there are 188 00:12:16,033 --> 00:12:19,243 S1: things that jump you ahead. 
Um, and I think he 189 00:12:19,243 --> 00:12:22,333 S1: gave me example from some public paper or whatever. And 190 00:12:22,333 --> 00:12:25,793 S1: it was an example of like a big jump. And 191 00:12:25,793 --> 00:12:28,673 S1: my natural intuition was there's going to be a lot 192 00:12:28,673 --> 00:12:33,353 S1: more of those, and they're not coming from pursuing along 193 00:12:33,353 --> 00:12:36,293 S1: this axis, which is difficult. They are actually just hanging 194 00:12:36,293 --> 00:12:38,693 S1: off to the side. It's like, oh, did you know 195 00:12:38,693 --> 00:12:40,823 S1: if you just changed the color of this? Hey, did 196 00:12:40,823 --> 00:12:43,673 S1: you know if you just orient the data backward instead 197 00:12:43,673 --> 00:12:45,953 S1: of forward? Hey, did you know if you just prune 198 00:12:45,983 --> 00:12:48,653 S1: the data in this way or if you add this 199 00:12:48,653 --> 00:12:52,283 S1: particular data set or. And I'm just making up these examples, 200 00:12:52,283 --> 00:12:57,053 S1: but simple things that you wouldn't think would work. And 201 00:12:57,053 --> 00:13:01,253 S1: this is why Leopold talks about if you automate an 202 00:13:01,253 --> 00:13:05,543 S1: AI engineer or an AI researcher, is what he called it. 203 00:13:05,573 --> 00:13:08,783 S1: That's when it gets completely silly, because they have the 204 00:13:08,783 --> 00:13:10,913 S1: ability to now go and try a whole bunch of 205 00:13:10,913 --> 00:13:15,293 S1: these things, including these tricks. Um, all this to say 206 00:13:15,293 --> 00:13:18,683 S1: that the slack in the rope or this series of 207 00:13:18,683 --> 00:13:22,943 S1: tricks is going to keep multiplying our advances, and that's 208 00:13:22,973 --> 00:13:27,483 S1: at the same time that we're working on the algorithms. Oh, 209 00:13:27,483 --> 00:13:30,123 S1: that was the other. That was the other factor is algorithms. 
210 00:13:30,123 --> 00:13:32,073 S1: That was this is going to happen at the same 211 00:13:32,073 --> 00:13:34,773 S1: time we're working on the algorithms to make those better. 212 00:13:34,803 --> 00:13:38,733 S1: We're also working on the size of the neural net, um, 213 00:13:38,733 --> 00:13:41,523 S1: and the quality and the structure. And everything about the 214 00:13:41,523 --> 00:13:44,013 S1: neural net is going to get bigger and more powerful, 215 00:13:44,043 --> 00:13:46,953 S1: but mostly just a matter of size, number of parameters. 216 00:13:48,003 --> 00:13:51,393 S1: But all those things are changing at the same time 217 00:13:51,393 --> 00:13:55,563 S1: as we're finding all these tricks. Right. So we're talking 218 00:13:55,563 --> 00:14:00,123 S1: about this is just begun. And this is what people 219 00:14:00,123 --> 00:14:02,883 S1: don't realize. This is just now starting. We're going to 220 00:14:02,883 --> 00:14:06,483 S1: look back in two years and be like, what was that? 221 00:14:06,483 --> 00:14:11,943 S1: That was silly. Right. And so I really want to 222 00:14:11,973 --> 00:14:15,693 S1: warn people against thinking we're hitting some kind of a wall. 223 00:14:16,293 --> 00:14:19,053 S1: Think of it this way. We just found alien technology. 224 00:14:19,083 --> 00:14:21,933 S1: We have no idea how it works. And we're like, 225 00:14:21,963 --> 00:14:26,063 S1: poking it with a stick and it's already spitting out 226 00:14:26,063 --> 00:14:29,693 S1: amazing things. So think about that. Okay, we got a 227 00:14:29,693 --> 00:14:32,363 S1: glowy ball. We don't know how it floats. We don't 228 00:14:32,393 --> 00:14:36,923 S1: know how it's doing. Anti-Gravity, right? We don't know how 229 00:14:36,923 --> 00:14:39,443 S1: it's doing this. We don't know how it's reflecting its surface. 230 00:14:39,473 --> 00:14:41,543 S1: We don't know how it's coming up with these answers. 
231 00:14:41,543 --> 00:14:43,763 S1: We don't know how it got here from the other 232 00:14:43,763 --> 00:14:46,763 S1: solar system. We don't know anything about it. You poke 233 00:14:46,763 --> 00:14:49,463 S1: it with a stick and it tells this magic stuff 234 00:14:49,463 --> 00:14:54,023 S1: and we're like, Holy crap, that's amazing. Somebody walks up, 235 00:14:54,023 --> 00:14:57,053 S1: sees you poke it with a stick and goes, yeah, 236 00:14:57,083 --> 00:15:00,803 S1: that's I mean, that's that's all it's ever going to 237 00:15:00,803 --> 00:15:04,283 S1: be able to do. I mean, I've seen you poke 238 00:15:04,283 --> 00:15:07,613 S1: it with a stick twice, and it gave you kind 239 00:15:07,613 --> 00:15:10,703 S1: of a similar answer, which means that's all we could 240 00:15:10,703 --> 00:15:16,433 S1: learn from this alien ball. That's their conclusion. I am 241 00:15:16,433 --> 00:15:19,553 S1: certain that since you poked it with a stick while 242 00:15:19,553 --> 00:15:22,133 S1: I was standing here three times, and it kind of 243 00:15:22,163 --> 00:15:26,323 S1: gave you a similar answer. One it must be stupid. 244 00:15:26,353 --> 00:15:29,833 S1: Two it's not as smart as us. And three, this 245 00:15:29,833 --> 00:15:32,683 S1: is as as smart as it's ever going to be. 246 00:15:32,713 --> 00:15:36,043 S1: This is the most it has to offer. That is 247 00:15:36,043 --> 00:15:39,973 S1: the claim that's being made by these kind of like denialists, 248 00:15:40,003 --> 00:15:45,163 S1: in my view. And that doesn't mean the current shiny 249 00:15:45,163 --> 00:15:49,783 S1: ball is better than humans, or it should replace humans, 250 00:15:49,783 --> 00:15:52,723 S1: or it could do everything we could do. Like, this 251 00:15:52,723 --> 00:15:55,093 S1: is not a competition. Okay, here's a better way to 252 00:15:55,123 --> 00:15:58,063 S1: think about this. This is not like a rock that 253 00:15:58,063 --> 00:16:00,973 S1: we have animated. 
Think of it this way. If an 254 00:16:00,973 --> 00:16:04,363 S1: alien comes here because someone else was like, hey, this 255 00:16:04,363 --> 00:16:08,083 S1: is not thinking, this is processing. And I'm like, come on, 256 00:16:08,083 --> 00:16:11,503 S1: come on. If you if an alien comes here, let's 257 00:16:11,503 --> 00:16:14,863 S1: assume we know how our brain works. An alien comes 258 00:16:14,863 --> 00:16:17,473 S1: here and we look at its brain, or it shows 259 00:16:17,473 --> 00:16:22,813 S1: us its brain and it looks different. And we're like, oh, 260 00:16:22,843 --> 00:16:28,783 S1: you guys do neurons and synapses different than us? Who's 261 00:16:28,783 --> 00:16:31,363 S1: going to walk over and be like, well, since they're 262 00:16:31,363 --> 00:16:35,593 S1: doing neurons and synapses different than us, they're not thinking. 263 00:16:36,043 --> 00:16:40,723 S1: Only humans can think. And I'm like, they got here. 264 00:16:40,753 --> 00:16:43,483 S1: They got here, didn't they? It's a little shiny ball. 265 00:16:43,483 --> 00:16:46,213 S1: And they got here from whatever part of the galaxy 266 00:16:46,213 --> 00:16:51,253 S1: or universe that they came from. They're obviously doing something right. 267 00:16:52,033 --> 00:16:55,573 S1: And AI is obviously doing something right too. So I 268 00:16:55,573 --> 00:16:59,173 S1: think it's a little bit specious. Is that is that 269 00:16:59,173 --> 00:17:05,713 S1: the name of the word? It's like specious to just 270 00:17:05,713 --> 00:17:09,973 S1: magically assume that we are the best. Only we are 271 00:17:10,003 --> 00:17:15,493 S1: thinking, only we are special. Instead of thinking like we 272 00:17:15,493 --> 00:17:19,813 S1: might have this nascent alien intelligence thing going on that 273 00:17:19,813 --> 00:17:22,873 S1: actually is doing things that are very much analogous to us. 
274 00:17:22,993 --> 00:17:25,223 S1: It reminds me of the first time that I clicked 275 00:17:25,223 --> 00:17:29,423 S1: around inside of Linux. This is like late 90s. I 276 00:17:29,423 --> 00:17:33,863 S1: was messing with Linux. This must have been like '97, '98 277 00:17:33,863 --> 00:17:37,673 S1: or something. I'm messing with Linux and I'm clicking around 278 00:17:37,703 --> 00:17:41,693 S1: because I had started with Windows and I'm like, oh, 279 00:17:41,693 --> 00:17:44,303 S1: it opens windows and it opens things that I could 280 00:17:44,303 --> 00:17:47,633 S1: click and navigate. Then I'm like, it's it's just like 281 00:17:47,663 --> 00:17:52,523 S1: on Windows Explorer. And this, like, blew me away. It 282 00:17:52,553 --> 00:17:55,373 S1: absolutely blew me away that this was just a different 283 00:17:55,373 --> 00:17:59,033 S1: way of doing the same thing. And that underneath this, 284 00:17:59,303 --> 00:18:02,543 S1: there's a universal thing of you need to be able 285 00:18:02,543 --> 00:18:05,333 S1: to browse files, you need to be able to open windows, 286 00:18:05,333 --> 00:18:08,903 S1: you need to be able to close windows. And that 287 00:18:08,903 --> 00:18:10,913 S1: clicked for me. And I'm like, oh, I guess like 288 00:18:10,943 --> 00:18:13,763 S1: all operating systems are going to do this differently. It's 289 00:18:13,763 --> 00:18:16,193 S1: the same with aliens. It's the same with like they 290 00:18:16,193 --> 00:18:20,153 S1: might think differently, but whatever. They have to think, right? 291 00:18:20,153 --> 00:18:23,633 S1: So why would we expect this synthetic intelligence that we've 292 00:18:23,673 --> 00:18:28,053 S1: birthed to do it exactly the same way that we 293 00:18:28,083 --> 00:18:32,253 S1: do? We should not expect that. We 294 00:18:32,283 --> 00:18:38,763 S1: got here accidentally stumbling through time due to evolution. 
And 295 00:18:38,763 --> 00:18:42,663 S1: we've got this version that we have and it's awesome, obviously. 296 00:18:42,933 --> 00:18:46,953 S1: But like, that's way different than we invented this thing 297 00:18:46,983 --> 00:18:52,293 S1: five years ago or whenever that was, 2017, six years ago. 298 00:18:53,313 --> 00:18:55,803 S1: And I know it goes further back than that. But 299 00:18:55,833 --> 00:19:01,083 S1: you know what I'm saying? Transformers. All right. So that's that. 300 00:19:01,083 --> 00:19:04,503 S1: And this this is becoming a long thing. But whatever, 301 00:19:04,533 --> 00:19:09,663 S1: we'll go with it. So yeah, basically we have no 302 00:19:09,963 --> 00:19:14,463 S1: idea how early all of this is. We're likely to 303 00:19:14,493 --> 00:19:17,643 S1: find 10, 20 or 200 more of these holy crap 304 00:19:17,673 --> 00:19:23,463 S1: optimizations like this thinking thing before we start hitting any 305 00:19:23,463 --> 00:19:30,323 S1: limits for neural network architecture or the transformer. 306 00:19:30,743 --> 00:19:34,073 S1: Plus we could just find something better than a transformer. 307 00:19:34,073 --> 00:19:38,993 S1: You realize how how lucky we were to find the transformer? 308 00:19:39,023 --> 00:19:41,813 S1: Like the people who made that paper. They're like, hey, 309 00:19:41,813 --> 00:19:44,003 S1: this is this is a cool way we think this 310 00:19:44,003 --> 00:19:45,923 S1: is a cool way of doing something. They didn't know 311 00:19:45,923 --> 00:19:50,033 S1: what they had. Okay, you should watch a Karpathy talk 312 00:19:50,063 --> 00:19:53,303 S1: about the transformer. He's like, this thing is a general 313 00:19:53,303 --> 00:19:57,173 S1: purpose computer. This thing is insanely good at learning. He 314 00:19:57,173 --> 00:20:01,193 S1: talks about different ways that it's better than humans at learning. 
Okay, 315 00:20:01,223 --> 00:20:05,933 S1: some some people randomly found this thing and it shot 316 00:20:05,933 --> 00:20:08,813 S1: us off. Okay. So so check this out. This is 317 00:20:08,813 --> 00:20:14,573 S1: another example of finding tricks or slack in the rope 318 00:20:14,573 --> 00:20:17,873 S1: just lying on the ground. So we stumble through AI 319 00:20:18,383 --> 00:20:22,943 S1: for decades and decades and decades. And then someone's like, hey, 320 00:20:22,943 --> 00:20:25,413 S1: this is kind of cool about this attention mechanism. Hey, 321 00:20:25,413 --> 00:20:29,193 S1: what do you think about this architecture for a neural net? Boom! 322 00:20:29,193 --> 00:20:34,263 S1: Now we have this take off. There's nothing saying somebody 323 00:20:34,263 --> 00:20:37,653 S1: isn't going to be like, I like what you did 324 00:20:37,653 --> 00:20:42,603 S1: with that transformer architecture. What if it looked like this instead? 325 00:20:42,633 --> 00:20:46,473 S1: It might be 20 times better. It might be 2000 326 00:20:46,473 --> 00:20:50,403 S1: times better. It might be 4% better. It doesn't matter. 327 00:20:50,433 --> 00:20:55,053 S1: Like the we have only just begun. We have only 328 00:20:55,053 --> 00:20:58,833 S1: just begun. I can absolutely guarantee you that assuming we 329 00:20:58,833 --> 00:21:02,043 S1: don't kill ourselves off as a result of this, like 330 00:21:02,073 --> 00:21:07,713 S1: that would set things back. But I'm trying to get 331 00:21:07,713 --> 00:21:12,873 S1: you to think about things in this way because it's 332 00:21:12,873 --> 00:21:16,773 S1: insane what's about to happen. And yeah, I'm going to 333 00:21:16,773 --> 00:21:18,963 S1: have more examples here. I'm working on an example right 334 00:21:18,963 --> 00:21:21,783 S1: here on this other screen. Uh, pretty cool thing I'm 335 00:21:21,783 --> 00:21:25,023 S1: building with it. Um, okay. So that was that.