1 00:00:15,356 --> 00:00:15,796 Speaker 1: Pushkin. 2 00:00:20,156 --> 00:00:22,156 Speaker 2: If I were going to pick one paper from the 3 00:00:22,196 --> 00:00:25,316 Speaker 2: past decade that had the biggest impact on the world, 4 00:00:25,876 --> 00:00:28,676 Speaker 2: I would choose one called Attention Is All You Need, 5 00:00:28,916 --> 00:00:34,836 Speaker 2: published in twenty seventeen. That paper basically invented transformer models. 6 00:00:35,356 --> 00:00:38,836 Speaker 2: You've almost certainly used a transformer model if you have 7 00:00:39,036 --> 00:00:42,676 Speaker 2: used ChatGPT or Gemini or Claude or DeepSeek. 8 00:00:42,996 --> 00:00:47,236 Speaker 2: In fact, the T in ChatGPT stands for transformer, 9 00:00:48,076 --> 00:00:51,196 Speaker 2: and transformer models have turned out to be wildly useful, 10 00:00:51,276 --> 00:00:54,476 Speaker 2: not just at generating language, but also at everything from 11 00:00:54,516 --> 00:00:57,596 Speaker 2: generating images to predicting what proteins will look like. 12 00:00:58,516 --> 00:01:00,716 Speaker 1: In fact, transformers 13 00:01:00,076 --> 00:01:03,156 Speaker 2: are so ubiquitous and so powerful that it's easy to 14 00:01:03,196 --> 00:01:05,556 Speaker 2: forget that some guy just thought them up. 15 00:01:06,316 --> 00:01:08,036 Speaker 1: But in fact, some guy did 16 00:01:08,116 --> 00:01:10,876 Speaker 2: just think up transformers, and I'm talking to him today 17 00:01:11,276 --> 00:01:19,396 Speaker 2: on the show. I'm Jacob Goldstein and this is What's 18 00:01:19,436 --> 00:01:21,396 Speaker 2: Your Problem, the show where I talk to people who 19 00:01:21,396 --> 00:01:24,956 Speaker 2: are trying to make technological progress. My guest today is 20 00:01:25,036 --> 00:01:28,436 Speaker 2: Jakob Uszkoreit. 
And just to be clear, Jakob was one 21 00:01:28,476 --> 00:01:31,796 Speaker 2: of several co-authors on that transformer paper, and on 22 00:01:31,836 --> 00:01:34,596 Speaker 2: top of that, lots of other researchers were working on 23 00:01:34,716 --> 00:01:36,756 Speaker 2: related things at the same time, so a lot of 24 00:01:36,796 --> 00:01:39,596 Speaker 2: people were working on this, but the key idea did 25 00:01:39,636 --> 00:01:44,076 Speaker 2: seem to come from Jakob. Today, Jakob is the CEO 26 00:01:44,236 --> 00:01:47,476 Speaker 2: of Inceptive. That's a company that he co-founded to 27 00:01:47,636 --> 00:01:51,076 Speaker 2: use AI to develop new kinds of medicine, and the 28 00:01:51,116 --> 00:01:54,876 Speaker 2: company is particularly focused on RNA. We talked about his 29 00:01:54,916 --> 00:01:57,476 Speaker 2: work at Inceptive in the second part of our conversation. 30 00:01:58,236 --> 00:02:00,196 Speaker 2: In the first part, we talked about his work on 31 00:02:00,316 --> 00:02:03,956 Speaker 2: transformer models. At the time he started working on the 32 00:02:04,036 --> 00:02:07,236 Speaker 2: idea for transformers, this is around a decade ago now, 33 00:02:07,556 --> 00:02:10,916 Speaker 2: there were a couple of big problems with existing language models. 34 00:02:11,436 --> 00:02:13,956 Speaker 2: For one thing, they were slow. They were in fact 35 00:02:13,996 --> 00:02:16,276 Speaker 2: so slow that they could not even keep up with 36 00:02:16,356 --> 00:02:19,996 Speaker 2: all the new training data that was becoming available. A 37 00:02:20,036 --> 00:02:24,716 Speaker 2: second problem, they struggled with what are called long-range dependencies. 38 00:02:25,196 --> 00:02:29,156 Speaker 2: Basically in language, that's relationships between words that are far 39 00:02:29,236 --> 00:02:32,196 Speaker 2: apart from each other in a sentence. 
So to start, 40 00:02:32,356 --> 00:02:34,756 Speaker 2: I asked Jakob for an example we could use to 41 00:02:34,796 --> 00:02:37,756 Speaker 2: discuss these problems and also how he came up with 42 00:02:37,836 --> 00:02:41,076 Speaker 2: his big idea for how to solve them. So, pick 43 00:02:41,116 --> 00:02:43,916 Speaker 2: a sentence that's going to be a good object lesson 44 00:02:43,956 --> 00:02:44,276 Speaker 2: for us. 45 00:02:44,356 --> 00:02:47,516 Speaker 1: Okay, so we could have: the frog didn't cross the 46 00:02:47,556 --> 00:02:50,356 Speaker 1: road because it was too tired. Okay, so we got 47 00:02:50,356 --> 00:02:51,556 Speaker 1: our sentence. Yep. 48 00:02:52,236 --> 00:02:55,236 Speaker 2: How would the sort of big, powerful but slow to 49 00:02:55,316 --> 00:02:57,836 Speaker 2: train algorithm in twenty fifteen 50 00:02:59,036 --> 00:03:02,596 Speaker 1: have processed that sentence? So basically it would have walked 51 00:03:02,636 --> 00:03:06,476 Speaker 1: through that sentence word by word, and so it would 52 00:03:06,556 --> 00:03:11,876 Speaker 1: walk through the sentence left to right. The frog did 53 00:03:12,076 --> 00:03:15,036 Speaker 1: not cross the road because it was too tired. 54 00:03:15,076 --> 00:03:17,796 Speaker 2: Which is logical, which is how I would think a 55 00:03:17,836 --> 00:03:18,636 Speaker 2: system would work. 56 00:03:18,676 --> 00:03:21,676 Speaker 1: It's more or less how we read, right? It's how 57 00:03:21,716 --> 00:03:25,076 Speaker 1: we read, but it's not necessarily how we understand. Uh huh. 58 00:03:25,116 --> 00:03:28,476 Speaker 1: That is actually one of the integral insights, I would say, 59 00:03:29,036 --> 00:03:31,876 Speaker 1: for how we then went about trying 60 00:03:31,916 --> 00:03:32,716 Speaker 1: to speed this all up. 61 00:03:32,876 --> 00:03:34,796 Speaker 2: Well, I love that. I want you to say more 62 00:03:34,836 --> 00:03:37,236 Speaker 2: about it. 
When you say it's not how we understand, 63 00:03:37,276 --> 00:03:37,876 Speaker 2: what do you mean? 64 00:03:38,516 --> 00:03:43,156 Speaker 1: So, on one hand, right, linearity of time forces us 65 00:03:43,236 --> 00:03:48,636 Speaker 1: to almost always feel that we're communicating language in order 66 00:03:48,836 --> 00:03:53,876 Speaker 1: and just linearly. It actually turns out that that's not 67 00:03:53,916 --> 00:03:56,956 Speaker 1: really how we read, not even in terms of our saccades, 68 00:03:56,956 --> 00:03:59,236 Speaker 1: in terms of our eye movements. We actually do jump 69 00:03:59,276 --> 00:04:01,916 Speaker 1: back and forth quite a bit while reading, and if 70 00:04:01,956 --> 00:04:06,756 Speaker 1: you look at conversations, you also have highly nonlinear elements 71 00:04:06,796 --> 00:04:11,076 Speaker 1: where there's repetition, there's reference, there's basically different flavors of interruption. 72 00:04:11,556 --> 00:04:14,036 Speaker 1: But sure, by and large, right, we would say we 73 00:04:14,156 --> 00:04:17,356 Speaker 1: certainly write them left to right, right. So if you 74 00:04:17,516 --> 00:04:20,556 Speaker 1: write a proper text, you don't write it as you 75 00:04:20,556 --> 00:04:22,276 Speaker 1: would read it, and you also don't write it as 76 00:04:22,316 --> 00:04:24,196 Speaker 1: you would talk about it. You do write it in 77 00:04:24,796 --> 00:04:28,196 Speaker 1: one linear order. Now, as we read this and as 78 00:04:28,196 --> 00:04:34,196 Speaker 1: we understand this, we actually form groups of words that 79 00:04:34,316 --> 00:04:38,156 Speaker 1: then form meaning. Right. So an example of that is, 80 00:04:38,596 --> 00:04:42,556 Speaker 1: you know, adjective noun, right, or say, in this 81 00:04:42,596 --> 00:04:45,756 Speaker 1: case, an article noun: it's not a frog, it's the frog. Right. 
82 00:04:45,916 --> 00:04:48,716 Speaker 1: We could have also said it's the green frog or 83 00:04:49,156 --> 00:04:49,956 Speaker 1: the lazy frog. 84 00:04:50,356 --> 00:04:54,076 Speaker 2: Right. Language has a structure, right, and things can 85 00:04:54,116 --> 00:04:58,076 Speaker 2: modify other things, and things can modify the modifiers. Exactly, exactly. 86 00:04:58,156 --> 00:05:02,156 Speaker 1: But the interesting thing now is that structure, as 87 00:05:01,956 --> 00:05:05,636 Speaker 1: a tree-structured clean hierarchy, only tells you 88 00:05:05,676 --> 00:05:10,876 Speaker 1: half the story. There are so many exceptions where statistical dependencies, 89 00:05:10,956 --> 00:05:13,836 Speaker 1: where modification actually happens at a distance. 90 00:05:14,116 --> 00:05:16,116 Speaker 2: So okay, so just to bring this back to your 91 00:05:16,156 --> 00:05:19,076 Speaker 2: sample sentence, the frog didn't cross the road because it 92 00:05:19,156 --> 00:05:22,956 Speaker 2: was too tired. That word it is actually quite far 93 00:05:22,996 --> 00:05:25,996 Speaker 2: from the word frog. And if you're an AI going 94 00:05:25,996 --> 00:05:28,956 Speaker 2: from left to right, you may well get confused there, right? 95 00:05:28,996 --> 00:05:33,196 Speaker 2: You may think it refers to road instead of to frog. 96 00:05:34,276 --> 00:05:37,076 Speaker 2: So this is one of the problems you were trying 97 00:05:37,116 --> 00:05:39,236 Speaker 2: to solve. And then the other one you were mentioning before, 98 00:05:39,556 --> 00:05:43,836 Speaker 2: which is these models were just slow because after each word, 99 00:05:43,876 --> 00:05:46,956 Speaker 2: the model just recalculates what everything means, and that just 100 00:05:47,156 --> 00:05:48,156 Speaker 2: takes a long time. 101 00:05:48,476 --> 00:05:51,396 Speaker 1: They can't go fast enough. Exactly. 
It takes a long time, 102 00:05:51,476 --> 00:05:55,116 Speaker 1: and it doesn't play to the strengths of the computers, 103 00:05:55,356 --> 00:05:57,356 Speaker 1: of the accelerators that we're using there. 104 00:05:57,756 --> 00:05:59,916 Speaker 2: And when you say accelerators, I know Google has their 105 00:05:59,956 --> 00:06:02,956 Speaker 2: own chips, but basically we mean GPUs 106 00:06:02,356 --> 00:06:04,756 Speaker 1: now, right? We mean GPUs, we mean 107 00:06:04,636 --> 00:06:08,276 Speaker 2: the chips that Nvidia sells. What is the nature of 108 00:06:08,196 --> 00:06:10,396 Speaker 1: those particular chips? Yeah, so the nature of those particular 109 00:06:10,516 --> 00:06:16,596 Speaker 1: chips is that instead of doing a broad variety of 110 00:06:16,756 --> 00:06:22,636 Speaker 1: complex computations in sequence, they are incredibly good, they excel 111 00:06:23,116 --> 00:06:27,516 Speaker 1: at performing many, many, many simple computations in parallel. And 112 00:06:27,556 --> 00:06:32,396 Speaker 1: so what this hierarchical or semi-hierarchical nature of language 113 00:06:32,956 --> 00:06:38,356 Speaker 1: enables you to do is, instead of having, so to speak, 114 00:06:38,476 --> 00:06:41,956 Speaker 1: one place where you read the current word, you could 115 00:06:42,036 --> 00:06:46,876 Speaker 1: now imagine you actually look at everything 116 00:06:46,876 --> 00:06:52,196 Speaker 1: at the same time, and you apply many simple operations 117 00:06:52,556 --> 00:06:55,516 Speaker 1: at the same time to each position in your sentence. 118 00:06:55,796 --> 00:06:58,276 Speaker 2: Huh. So this is the big idea, I just want 119 00:06:58,276 --> 00:07:01,876 Speaker 2: to, because this is it, right, this is the breakthrough happening. 
Yes, 120 00:07:02,756 --> 00:07:05,836 Speaker 2: it's basically, what if instead of reading the sentence one 121 00:07:05,836 --> 00:07:08,276 Speaker 2: word at a time from left to right, we read 122 00:07:08,276 --> 00:07:10,036 Speaker 2: the whole thing all at once. 123 00:07:10,396 --> 00:07:13,676 Speaker 1: All at once. And now the problem is clearly something's 124 00:07:13,676 --> 00:07:17,076 Speaker 1: got to give, right, so there's no free lunch in 125 00:07:17,116 --> 00:07:20,116 Speaker 1: that sense. You have to now simplify what you can 126 00:07:20,156 --> 00:07:22,956 Speaker 1: do at every position when you do this all in parallel, 127 00:07:24,436 --> 00:07:26,716 Speaker 1: but you can now afford to do this a bunch 128 00:07:26,716 --> 00:07:29,916 Speaker 1: of times after another and revise it over time or 129 00:07:29,956 --> 00:07:32,796 Speaker 1: over these steps. And so instead of walking through the 130 00:07:32,836 --> 00:07:35,836 Speaker 1: sentence from beginning to end, where an average sentence has 131 00:07:35,876 --> 00:07:39,076 Speaker 1: like twenty words or so, average sentence in prose, instead 132 00:07:39,116 --> 00:07:41,716 Speaker 1: of walking those twenty positions, what you're doing is you're 133 00:07:41,756 --> 00:07:46,116 Speaker 1: looking at every word at the same time, but in 134 00:07:46,156 --> 00:07:49,036 Speaker 1: a simpler way. But now you can do that maybe 135 00:07:49,156 --> 00:07:53,556 Speaker 1: five or six times, revising your understanding, and that, it turns 136 00:07:53,556 --> 00:07:59,596 Speaker 1: out, is faster, way faster on GPUs, and because of 137 00:07:59,636 --> 00:08:01,916 Speaker 1: this hierarchical nature of language, it's also better. 138 00:08:02,996 --> 00:08:06,076 Speaker 2: So you have this idea, and as I read the 139 00:08:06,076 --> 00:08:08,076 Speaker 2: little note on the paper, it was in fact your idea. 
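The contrast described here, one dependent step per word versus a handful of passes over all positions at once, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the paper: the function names, the `blend_with_mean` update rule, and the numeric stand-in for a twenty-word sentence are all hypothetical. What it shows is that a per-position update that reads only the previous pass gives the same result in any processing order, which is the property that lets hardware do all positions simultaneously.

```python
def recurrent_read(values, step):
    """Left-to-right reading: each step needs the state from the step before."""
    state = 0.0
    for v in values:
        state = step(state, v)
    return state


def parallel_pass(values, update, order=None):
    """One pass over all positions; each position reads only the previous pass."""
    order = list(range(len(values))) if order is None else list(order)
    new_values = [0.0] * len(values)
    for i in order:
        new_values[i] = update(values[i], values)
    return new_values


def blend_with_mean(value, all_values):
    # Hypothetical "simple operation": move each position halfway toward the mean.
    return 0.5 * value + 0.5 * sum(all_values) / len(all_values)


# Numeric stand-in for a 20-word sentence (an assumption for illustration).
sentence = [float(i) for i in range(20)]

# Sequential reading: 20 steps that cannot be reordered or run side by side.
summary = recurrent_read(sentence, lambda state, v: 0.9 * state + 0.1 * v)

# One parallel pass: visiting positions forward or backward gives the same result.
forward = parallel_pass(sentence, blend_with_mean)
backward = parallel_pass(sentence, blend_with_mean, order=reversed(range(20)))
same_result = forward == backward

# "Five or six times": a fixed number of passes, independent of sentence length.
revised = sentence
for _ in range(6):
    revised = parallel_pass(revised, blend_with_mean)
```

The loop inside `parallel_pass` still runs one position at a time in plain Python; the point is only that nothing in it forces that order, whereas `recurrent_read` cannot compute step twenty before step nineteen.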
140 00:08:08,076 --> 00:08:09,516 Speaker 2: I know you were working with a team, but the 141 00:08:09,556 --> 00:08:12,796 Speaker 2: paper credits you with the idea. So let's take 142 00:08:12,836 --> 00:08:15,476 Speaker 2: this idea, this basic idea of look at the whole 143 00:08:15,476 --> 00:08:18,796 Speaker 2: input sentence all at once, yep, a few times, and 144 00:08:19,876 --> 00:08:22,436 Speaker 2: apply it to our frog sentence. Give me, give me 145 00:08:22,436 --> 00:08:23,436 Speaker 2: that frog sentence again. 146 00:08:23,996 --> 00:08:26,396 Speaker 1: The frog did not cross the road because it was 147 00:08:26,436 --> 00:08:27,916 Speaker 1: too tired. Good. 148 00:08:28,276 --> 00:08:30,156 Speaker 2: Tired is good because that's unambiguous. 149 00:08:30,196 --> 00:08:31,716 Speaker 1: Hot could be either one. It could be the road 150 00:08:31,836 --> 00:08:33,756 Speaker 1: or the frog, right? Hot could be 151 00:08:33,836 --> 00:08:36,596 Speaker 1: either one, exactly. In fact, hot could 152 00:08:36,596 --> 00:08:40,476 Speaker 1: actually be either one, or non-referential. Non-referential, because 153 00:08:40,516 --> 00:08:42,396 Speaker 1: it was too hot outside. 154 00:08:41,996 --> 00:08:44,076 Speaker 2: Outside. It could be any of three things, the weather, 155 00:08:44,396 --> 00:08:47,396 Speaker 2: or the frog, or the road. Exactly. I love that 156 00:08:47,636 --> 00:08:52,956 Speaker 2: tired solves the problem. So your model, this new way 157 00:08:52,996 --> 00:08:57,276 Speaker 2: of doing things, how does it parse that sentence, what 158 00:08:57,316 --> 00:08:57,716 Speaker 2: does it do? 159 00:08:58,356 --> 00:09:02,636 Speaker 1: So basically, let's look at the word it and look 160 00:09:02,636 --> 00:09:04,956 Speaker 1: at it in every single step of these, you know, 161 00:09:05,076 --> 00:09:10,356 Speaker 1: say a handful of times repeated operation. 
Imagine you're looking 162 00:09:10,396 --> 00:09:12,516 Speaker 1: at this word it, that's the one that you are 163 00:09:12,556 --> 00:09:16,076 Speaker 1: now trying to understand better, and you now compare it 164 00:09:16,276 --> 00:09:18,796 Speaker 1: to every other word in the sentence. Okay, so you 165 00:09:18,836 --> 00:09:22,196 Speaker 1: compare it to the, to frog, to did, not, cross, 166 00:09:22,316 --> 00:09:28,356 Speaker 1: the, road, because, too, and tired; there's was, too, and 167 00:09:28,516 --> 00:09:36,076 Speaker 1: tired. And initially, in the first pass, already a very 168 00:09:36,596 --> 00:09:40,276 Speaker 1: simple insight the model can fairly easily learn is that 169 00:09:40,356 --> 00:09:49,196 Speaker 1: it could be strongly informed by frog, by road, by nothing, 170 00:09:50,196 --> 00:09:54,276 Speaker 1: but not so by too or by the, or maybe 171 00:09:54,276 --> 00:09:56,916 Speaker 1: only to a certain extent by was. But if you 172 00:09:56,916 --> 00:10:00,676 Speaker 1: want to know more about what it denotes, then it 173 00:10:00,716 --> 00:10:03,876 Speaker 1: could be, you know, it could be informed by 174 00:10:03,876 --> 00:10:04,476 Speaker 1: all of these. 175 00:10:04,876 --> 00:10:08,036 Speaker 2: And just to be clear, that sort of understanding arises 176 00:10:08,076 --> 00:10:09,476 Speaker 2: because it has trained in 177 00:10:09,396 --> 00:10:11,116 Speaker 1: this way on lots of data. 178 00:10:11,156 --> 00:10:14,956 Speaker 2: It's encountering a new sentence after reading lots of other 179 00:10:15,036 --> 00:10:19,236 Speaker 2: sentences with lots of pronouns with different possible antecedents. 180 00:10:19,316 --> 00:10:23,156 Speaker 1: Yeah, exactly, exactly. So now the interesting thing is that 181 00:10:23,516 --> 00:10:29,836 Speaker 1: which of the two it actually refers to doesn't depend 182 00:10:30,116 --> 00:10:33,156 Speaker 1: only on what those other two words are. 
And 183 00:10:33,196 --> 00:10:36,396 Speaker 1: this is why you need these subsequent steps, because, so 184 00:10:36,996 --> 00:10:39,476 Speaker 1: let's start with the first step. So what now happens 185 00:10:39,556 --> 00:10:44,116 Speaker 1: is that, say the model identifies frog and road could 186 00:10:44,116 --> 00:10:46,916 Speaker 1: have a lot to do with the word it. So 187 00:10:47,156 --> 00:10:51,396 Speaker 1: now you basically copy some information from both frog and 188 00:10:51,636 --> 00:10:55,916 Speaker 1: road over to it, and you don't just copy it, 189 00:10:55,956 --> 00:10:59,076 Speaker 1: you kind of transform it also on the way, but 190 00:10:59,196 --> 00:11:02,956 Speaker 1: you refine your understanding of it. And this is all learned, 191 00:11:02,996 --> 00:11:06,076 Speaker 1: it's not given by rules or, you know, in any 192 00:11:06,116 --> 00:11:07,916 Speaker 1: way pre-specified. 193 00:11:07,436 --> 00:11:11,276 Speaker 2: Right, just by training on lots of data, just by training, this emerges, 194 00:11:11,276 --> 00:11:13,636 Speaker 2: and so that sort of the meaning of it after 195 00:11:13,676 --> 00:11:17,676 Speaker 2: this first step is kind of influenced by both frog 196 00:11:17,756 --> 00:11:18,196 Speaker 2: and road. 197 00:11:18,316 --> 00:11:23,356 Speaker 1: Yes, both frog and road. Okay, so now we repeat 198 00:11:23,396 --> 00:11:27,596 Speaker 1: this operation again and we now know that it is 199 00:11:27,796 --> 00:11:30,876 Speaker 1: unsure, or the model basically now has this kind of superposition. Right, 200 00:11:30,916 --> 00:11:34,076 Speaker 1: it could be road, it could be frog. But now 201 00:11:34,196 --> 00:11:36,836 Speaker 1: in the next step it also looks at tired, and 202 00:11:36,876 --> 00:11:41,116 Speaker 1: somehow the model has learned that when it means something inanimate, 203 00:11:41,276 --> 00:11:46,556 Speaker 1: that tired is not the thing. 
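The compare-then-copy step described here is commonly written as scaled dot-product attention, and a toy version fits in a short Python sketch. Everything specific in it is an assumption made for illustration: the three-number word vectors are hand-picked (a real model learns them from data), the word list is shortened, the same vectors serve as queries, keys, and values where a real transformer applies learned projections first, and only the vector for "it" is updated, whereas the model updates every position in parallel.

```python
import math


def softmax(scores):
    """Turn raw comparison scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Compare the query to every key, then mix the values by those weights."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    mixed = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, mixed


# Hypothetical hand-picked vectors: [noun-like, animate, function-word].
words = ["the", "frog", "road", "it", "tired"]
vectors = {
    "the": [0.0, 0.0, 1.0],
    "frog": [1.0, 1.0, 0.0],
    "road": [1.0, -1.0, 0.0],
    "it": [1.0, 0.0, 0.0],
    "tired": [0.0, 1.0, 0.0],
}
context = [vectors[w] for w in words]
frog, road = words.index("frog"), words.index("road")

# First step: "it" is drawn equally toward frog and road.
it_vec = vectors["it"]
w1, mixed1 = attend(it_vec, context, context)

# Copy the mixed information over to "it" (a simple additive update).
it_vec = [a + b for a, b in zip(it_vec, mixed1)]

# Second step: the trace of "tired" picked up in step one now favors frog.
w2, mixed2 = attend(it_vec, context, context)
```

With these toy numbers, `w1[frog]` equals `w1[road]` after the first step, the superposition described in the conversation, while `w2[frog]` is larger than `w2[road]` after the second, because the small "animate" component that "it" absorbed from "tired" matches frog and conflicts with road.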
And so maybe in 204 00:11:46,676 --> 00:11:50,116 Speaker 1: context of tired, it is more likely to refer to frog, 205 00:11:50,836 --> 00:11:54,716 Speaker 1: and now you know, well, it is more likely, and 206 00:11:54,796 --> 00:11:57,036 Speaker 1: now maybe the model has figured out already, maybe needs 207 00:11:57,076 --> 00:12:00,516 Speaker 1: a bit more, a few more iterations, that it is 208 00:12:00,596 --> 00:12:03,636 Speaker 1: most likely to refer to frog because of the presence 209 00:12:04,196 --> 00:12:07,036 Speaker 1: of tired. So it has solved the problem. But it 210 00:12:07,036 --> 00:12:08,036 Speaker 1: has solved the problem. 211 00:12:08,556 --> 00:12:12,436 Speaker 2: So you do, you have this idea, you try it out. 212 00:12:12,876 --> 00:12:14,996 Speaker 2: There's a detail that you mentioned that's kind of fun, 213 00:12:15,036 --> 00:12:17,596 Speaker 2: and we kind of skipped it, but you mentioned that 214 00:12:17,996 --> 00:12:20,116 Speaker 2: another one of the co-authors, who has also gone 215 00:12:20,156 --> 00:12:22,476 Speaker 2: on to do very big things, was about to leave 216 00:12:22,556 --> 00:12:25,276 Speaker 2: Google when you sort of wanted to test this idea, 217 00:12:25,356 --> 00:12:27,476 Speaker 2: and that fact that he was about to leave 218 00:12:27,476 --> 00:12:29,956 Speaker 2: Google was actually important to the history of this idea. 219 00:12:29,996 --> 00:12:33,316 Speaker 1: Tell me about that. It was important. So this is Illia Polosukhin, 220 00:12:34,636 --> 00:12:39,476 Speaker 1: he was, at the time that this started to gain 221 00:12:39,716 --> 00:12:44,036 Speaker 1: any kind of speed, Illia was managing a good chunk 222 00:12:44,116 --> 00:12:47,676 Speaker 1: of my organization. 
And the moment he really made the 223 00:12:47,676 --> 00:12:51,396 Speaker 1: decision to leave the company, he had to wait ultimately 224 00:12:51,556 --> 00:12:55,436 Speaker 1: for his co-founder, and for them 225 00:12:55,476 --> 00:12:58,156 Speaker 1: to then actually get going together in earnest, and so 226 00:12:58,236 --> 00:13:00,716 Speaker 1: he had a few months where he knew and I 227 00:13:00,796 --> 00:13:04,716 Speaker 1: also knew that he was about to leave and where, 228 00:13:04,836 --> 00:13:06,716 Speaker 1: you know, the right thing would of course be to 229 00:13:06,756 --> 00:13:11,236 Speaker 1: transition his team to another manager, which we did immediately, 230 00:13:11,716 --> 00:13:14,436 Speaker 1: but where he then suddenly was in a position of 231 00:13:14,476 --> 00:13:18,356 Speaker 1: having nothing to lose and yet quite some time left 232 00:13:18,436 --> 00:13:21,556 Speaker 1: to play with Google's resources and do cool stuff with interesting, 233 00:13:21,796 --> 00:13:25,756 Speaker 1: interesting people. And so that's one of those moments 234 00:13:25,756 --> 00:13:31,716 Speaker 1: where suddenly your appetite for risk as a researcher just spikes, right, huh, 235 00:13:32,636 --> 00:13:34,556 Speaker 1: because you have, for a few more months, you 236 00:13:34,556 --> 00:13:38,796 Speaker 1: have these resources at your disposal, you've transitioned your responsibilities. 237 00:13:38,876 --> 00:13:41,836 Speaker 1: At that stage, you're just like, okay, let's try this 238 00:13:41,956 --> 00:13:46,236 Speaker 1: crazy shit, and that's literally, in 239 00:13:46,356 --> 00:13:49,676 Speaker 1: so many ways, was one of the integral catalysts, 240 00:13:50,796 --> 00:13:55,036 Speaker 1: because that also enabled, right, this kind of mindset of 241 00:13:55,276 --> 00:13:58,796 Speaker 1: we're going for this now, whatever the reason. 
It still, 242 00:13:58,956 --> 00:14:02,956 Speaker 1: you know, affects other people. And so there were others 243 00:14:02,956 --> 00:14:06,796 Speaker 1: who joined that collaboration really, really early on, who I 244 00:14:06,916 --> 00:14:10,236 Speaker 1: feel were much more excited as a result, much more likely 245 00:14:10,276 --> 00:14:12,396 Speaker 1: to really work on this and to really give it 246 00:14:12,396 --> 00:14:17,196 Speaker 1: their all because of his, you know, nothing left to lose, 247 00:14:17,636 --> 00:14:19,796 Speaker 1: I'm going to go for this attitude at this 248 00:14:19,716 --> 00:14:23,556 Speaker 2: point. Right, was there a moment when you realized it worked? 249 00:14:23,796 --> 00:14:27,396 Speaker 1: There were actually a few moments. And it's interesting because 250 00:14:29,396 --> 00:14:32,596 Speaker 1: on one hand, right, it's a very gradual thing, right, 251 00:14:32,636 --> 00:14:35,636 Speaker 1: and initially, actually, it took us many months to get 252 00:14:35,676 --> 00:14:39,156 Speaker 1: to the point where we saw significant first signs of 253 00:14:39,196 --> 00:14:41,836 Speaker 1: life of this not just being a curiosity but really 254 00:14:41,876 --> 00:14:45,196 Speaker 1: being something that would end up being competitive. So there 255 00:14:45,276 --> 00:14:48,036 Speaker 1: certainly was a moment when that started. There was another 256 00:14:48,116 --> 00:14:51,796 Speaker 1: moment when we for the first time had 257 00:14:51,916 --> 00:14:56,956 Speaker 1: one machine translation challenge, one language pair of the WMT 258 00:14:57,116 --> 00:15:00,796 Speaker 1: task, as it's called, where our score, our 259 00:15:00,836 --> 00:15:05,116 Speaker 1: model performed better than any other single model. 
The point 260 00:15:05,116 --> 00:15:07,796 Speaker 1: in time when I think all of us realized this 261 00:15:07,876 --> 00:15:12,436 Speaker 1: is special was when we not only had the best 262 00:15:12,436 --> 00:15:16,036 Speaker 1: one in one of these tasks, but in multiple, and 263 00:15:17,276 --> 00:15:19,556 Speaker 1: we didn't just have the best number. We also at 264 00:15:19,556 --> 00:15:22,316 Speaker 1: that point were able to establish that we'd gotten there 265 00:15:22,356 --> 00:15:27,236 Speaker 1: with about ten times less energy or training compute spend. 266 00:15:27,716 --> 00:15:30,116 Speaker 2: Wow, so you do one tenth the work and you 267 00:15:30,156 --> 00:15:31,076 Speaker 2: get a better result. 268 00:15:31,316 --> 00:15:33,276 Speaker 1: One tenth the work and you get a better result, 269 00:15:33,316 --> 00:15:37,116 Speaker 1: not just across one specific challenge, but across multiple, including 270 00:15:37,156 --> 00:15:39,876 Speaker 1: the hardest, or one of the harder ones. Right. 271 00:15:40,316 --> 00:15:45,476 Speaker 1: And then at that stage we were still improving rapidly, 272 00:15:46,716 --> 00:15:50,596 Speaker 1: and then you realize, okay, this is for real, 273 00:15:51,036 --> 00:15:53,396 Speaker 1: because, right, it wasn't that 274 00:15:53,436 --> 00:15:56,116 Speaker 1: we had to squeeze those last little bits and pieces 275 00:15:56,156 --> 00:15:59,756 Speaker 1: of gain out of it. 
It was still improving fairly rapidly, 276 00:16:00,796 --> 00:16:03,476 Speaker 1: to the point where actually, by the time we actually 277 00:16:03,476 --> 00:16:08,196 Speaker 1: published the paper, we again reduced the compute requirements, not 278 00:16:08,276 --> 00:16:11,356 Speaker 1: quite by an entire order of magnitude, but almost, right, 279 00:16:11,476 --> 00:16:14,516 Speaker 1: so it still was getting faster and better at a 280 00:16:14,556 --> 00:16:17,076 Speaker 1: pretty rapid rate. Wow. So in the paper 281 00:16:17,116 --> 00:16:19,836 Speaker 1: we had some results that were those roughly ten x 282 00:16:19,836 --> 00:16:23,396 Speaker 1: faster on eight GPUs, and what we demonstrated in terms of 283 00:16:23,516 --> 00:16:26,596 Speaker 1: quality on those eight GPUs, by the time we actually 284 00:16:26,676 --> 00:16:29,396 Speaker 1: published the paper properly we were able to do with one 285 00:16:29,276 --> 00:16:32,636 Speaker 2: GPU. One GPU meaning one chip of the kind that 286 00:16:32,676 --> 00:16:35,556 Speaker 2: people buy one hundred thousand of now to build a 287 00:16:35,636 --> 00:16:39,116 Speaker 2: data center. Exactly. So the paper actually at the end 288 00:16:39,876 --> 00:16:45,596 Speaker 2: mentions other possible uses beyond language for this technology. It 289 00:16:45,716 --> 00:16:50,996 Speaker 2: mentions images, audio, and video, I think explicitly. How much 290 00:16:51,036 --> 00:16:52,836 Speaker 2: were you thinking about that at the time? Was that 291 00:16:52,956 --> 00:16:55,036 Speaker 2: just like an afterthought or were you like, hey, wait 292 00:16:55,076 --> 00:16:57,356 Speaker 2: a minute, it's not just language. 293 00:16:57,596 --> 00:16:59,956 Speaker 1: By the time it was actually published at a conference, 294 00:16:59,996 --> 00:17:04,076 Speaker 1: not just the preprint. 
By December, we had initial models 295 00:17:04,236 --> 00:17:07,436 Speaker 1: on other modalities, on generating images. We had the first, 296 00:17:07,716 --> 00:17:09,836 Speaker 1: the first, at that stage. At that time they were 297 00:17:10,116 --> 00:17:12,636 Speaker 1: not performing that well yet, but you know, they were 298 00:17:12,716 --> 00:17:15,916 Speaker 1: rapidly getting better. We had the first prototypes actually of 299 00:17:16,036 --> 00:17:20,676 Speaker 1: models working on genomic data, working on protein structure. That's 300 00:17:20,676 --> 00:17:24,116 Speaker 1: good foreshadowing. Good foreshadowing, exactly. But then we 301 00:17:24,236 --> 00:17:27,236 Speaker 1: ended up, for a variety of reasons, we ended up 302 00:17:27,756 --> 00:17:30,716 Speaker 1: at first focusing on applications in computer vision. 303 00:17:31,116 --> 00:17:33,716 Speaker 2: The paper comes out, you know, you're working on these 304 00:17:33,756 --> 00:17:37,756 Speaker 2: other applications, you're presenting the paper, it's published in various forms. 305 00:17:38,356 --> 00:17:42,636 Speaker 1: What's the response like? It was interesting because the response 306 00:17:43,076 --> 00:17:49,836 Speaker 1: built in deep learning AI circles basically between the 307 00:17:49,916 --> 00:17:52,236 Speaker 1: preprint that I think came out in, I want to 308 00:17:52,276 --> 00:17:56,436 Speaker 1: say, June twenty seventeen, and then the actual publication, 309 00:17:56,876 --> 00:17:59,916 Speaker 1: to the extent that by the time the poster session 310 00:17:59,956 --> 00:18:03,676 Speaker 1: happened at the conference, there was quite a crowd at 311 00:18:03,676 --> 00:18:07,036 Speaker 1: the poster, so we had to be shoved out of 312 00:18:07,116 --> 00:18:10,316 Speaker 1: the hall in which the poster session happened 
313 00:18:10,316 --> 00:18:14,276 Speaker 1: by security and had very hoarse voices by the end 314 00:18:14,316 --> 00:18:18,676 Speaker 1: of the evening. You guys were like the Beatles of 315 00:18:18,756 --> 00:18:23,556 Speaker 1: the AI conference. I wouldn't say that, because we weren't 316 00:18:23,556 --> 00:18:26,396 Speaker 1: the Beatles, because it was really, it was still very specific. 317 00:18:26,436 --> 00:18:27,836 Speaker 2: You were more of the cool 318 00:18:27,916 --> 00:18:30,036 Speaker 2: hipster band. You were the hipster 319 00:18:29,756 --> 00:18:32,156 Speaker 1: band. Certainly more the cool hipster band. But it was 320 00:18:32,196 --> 00:18:35,396 Speaker 1: an interesting experience because there were some folks, including 321 00:18:35,396 --> 00:18:38,916 Speaker 1: some greats in the field, who came by and said, wow, 322 00:18:39,036 --> 00:18:40,396 Speaker 1: this is cool. 323 00:18:40,716 --> 00:18:44,156 Speaker 2: What has happened since has been wild. 324 00:18:44,596 --> 00:18:48,036 Speaker 1: It seems wild, to say the least. Yes. Is it 325 00:18:48,116 --> 00:18:52,196 Speaker 1: surprising to you? Of course, many aspects are surprising. For sure. 326 00:18:53,796 --> 00:18:57,956 Speaker 1: We definitely saw pretty early on, already back in twenty eighteen, 327 00:18:57,996 --> 00:19:04,076 Speaker 1: twenty nineteen, that something really exciting was happening here. Now, 328 00:19:05,316 --> 00:19:08,836 Speaker 1: I'm still surprised that, with the advent of ChatGPT, 329 00:19:10,276 --> 00:19:15,076 Speaker 1: something that didn't go way beyond those language models that 330 00:19:15,116 --> 00:19:19,036 Speaker 1: we had already seen a few years before was suddenly 331 00:19:20,436 --> 00:19:23,876 Speaker 1: the world's fastest growing consumer product 332 00:19:23,836 --> 00:19:25,436 Speaker 2: ever, right? I think ever? 333 00:19:25,676 --> 00:19:26,716 Speaker 1: Ever. Yes. 
334 00:19:27,076 --> 00:19:31,956 Speaker 2: And by the way, GPT stands for generative pre-trained transformer, right, 335 00:19:31,996 --> 00:19:35,636 Speaker 2: transformer is your word. That's right. So there's an interesting, 336 00:19:36,956 --> 00:19:39,996 Speaker 2: I don't know, business side to this, right, which is, 337 00:19:40,356 --> 00:19:42,196 Speaker 2: you were working for Google when you came up with this. 338 00:19:42,356 --> 00:19:48,316 Speaker 2: Google presumably owned the idea, had intellectual property around 339 00:19:47,996 --> 00:19:49,956 Speaker 1: the idea. Has filed many a patent. 340 00:19:50,116 --> 00:19:52,436 Speaker 2: Was it just a choice Google made to let everybody 341 00:19:52,556 --> 00:19:56,196 Speaker 2: use it? Like when you see the fastest growing consumer 342 00:19:56,356 --> 00:19:58,716 Speaker 2: product in the history of the world not only built 343 00:19:58,716 --> 00:20:02,076 Speaker 2: on this idea, but using the name, like, and it's 344 00:20:02,116 --> 00:20:04,356 Speaker 2: a different company. That was five years later. 345 00:20:04,236 --> 00:20:04,836 Speaker 1: Five years later. 346 00:20:04,876 --> 00:20:07,436 Speaker 2: But a patent's good for more than five years. Is 347 00:20:07,476 --> 00:20:08,276 Speaker 2: that a choice? 348 00:20:08,356 --> 00:20:10,516 Speaker 1: Is that a strategic choice? What's going on there? 349 00:20:11,036 --> 00:20:14,196 Speaker 1: So the choice to do it in the first place, 350 00:20:15,036 --> 00:20:19,396 Speaker 1: to publish it in the first place, is really based 351 00:20:19,476 --> 00:20:23,916 Speaker 1: on and rooted in a deep conviction of Google 352 00:20:23,956 --> 00:20:26,636 Speaker 1: at the time, and I'm actually pretty sure it still 353 00:20:26,676 --> 00:20:31,596 Speaker 1: is the case, that actually these developments are 354 00:20:31,676 --> 00:20:34,356 Speaker 1: the tide that floats all boats, that lifts. 
355 00:20:33,876 --> 00:20:38,796 Speaker 2: All boats, like a belief in progress, a belief in progress, 356 00:20:38,796 --> 00:20:40,596 Speaker 2: a good old fashioned Now. 357 00:20:40,996 --> 00:20:45,436 Speaker 1: It's also the case that at the time, organizationally, that 358 00:20:45,556 --> 00:20:51,596 Speaker 1: specific research arm was unusually separated from the product organizations. 359 00:20:51,756 --> 00:20:56,476 Speaker 1: And the reason why Brain or in general, the deep 360 00:20:56,556 --> 00:21:02,436 Speaker 1: learning groups were more separated was in part historical, namely 361 00:21:02,676 --> 00:21:05,716 Speaker 1: that when they started out there were no applications and 362 00:21:05,876 --> 00:21:08,996 Speaker 1: the technology was not ready for being applied, and so 363 00:21:09,356 --> 00:21:13,996 Speaker 1: it's completely understandable and just you know a consequence of 364 00:21:14,156 --> 00:21:20,196 Speaker 1: organic developments that when this technology suddenly is on the 365 00:21:20,236 --> 00:21:24,836 Speaker 1: cusp of being incredibly impactful, you're probably still under utilizing 366 00:21:24,876 --> 00:21:29,796 Speaker 1: it internally and potentially also not yet treating it in 367 00:21:29,876 --> 00:21:32,756 Speaker 1: the same way as you would have maybe otherwise treated 368 00:21:32,836 --> 00:21:35,036 Speaker 1: previous trade secrets. 369 00:21:34,716 --> 00:21:39,516 Speaker 2: For example, as it feels like this out-there research project, 370 00:21:39,716 --> 00:21:42,276 Speaker 2: not like what's going to be this consumer. 
371 00:21:42,476 --> 00:21:47,316 Speaker 1: Product exactly exactly, And to be fair, it took OpenAI 372 00:21:47,356 --> 00:21:49,836 Speaker 1: in this case a fair amount of time 373 00:21:50,116 --> 00:21:54,396 Speaker 1: to then turn this into this product, and most 374 00:21:54,396 --> 00:21:57,876 Speaker 1: of that time it also from their vantage point, wasn't 375 00:21:57,876 --> 00:22:01,036 Speaker 1: a product. Right. So up until all the way through 376 00:22:01,676 --> 00:22:06,836 Speaker 1: chat GPT, OpenAI have published all of their GPT developments, 377 00:22:07,356 --> 00:22:10,036 Speaker 1: maybe not all, but you know, a large fraction of 378 00:22:10,676 --> 00:22:11,316 Speaker 1: their work on this. 379 00:22:11,516 --> 00:22:12,516 Speaker 2: Yeah, their early models. 380 00:22:12,516 --> 00:22:15,676 Speaker 1: The whole models were open exactly. They were more true 381 00:22:15,676 --> 00:22:19,716 Speaker 1: to their name really also believing in the same thing. 382 00:22:19,716 --> 00:22:22,396 Speaker 1: And it was only really after chat GPT and after 383 00:22:22,476 --> 00:22:27,156 Speaker 1: this to them also surprising to a certain extent success, 384 00:22:27,676 --> 00:22:31,556 Speaker 1: that they started to become more closed as well when 385 00:22:31,596 --> 00:22:37,676 Speaker 1: it comes to scientific developments in this space. We'll be 386 00:22:37,716 --> 00:22:54,836 Speaker 1: back in just a minute. Let's talk about your company. 387 00:22:55,236 --> 00:22:58,156 Speaker 1: When'd you decide to start Inceptive? The decision took a 388 00:22:58,156 --> 00:23:03,196 Speaker 1: while and was influenced by events that happened over the 389 00:23:03,236 --> 00:23:07,116 Speaker 1: course of about three months two to three months in 390 00:23:07,196 --> 00:23:12,196 Speaker 1: late twenty twenty, starting with the birth of my first child. 
391 00:23:13,116 --> 00:23:16,916 Speaker 1: So when am I was born, two things happened. Number one, 392 00:23:17,516 --> 00:23:21,316 Speaker 1: witnessing a pregnancy and a birth during a pandemic where 393 00:23:21,516 --> 00:23:24,916 Speaker 1: there's a pathogen that's rapidly spreading, and so all of 394 00:23:24,956 --> 00:23:29,196 Speaker 1: that was a pretty daunting experience, and everything went great, 395 00:23:30,276 --> 00:23:35,436 Speaker 1: But having this new human in my arms also really 396 00:23:35,676 --> 00:23:41,636 Speaker 1: made me question if I couldn't more directly affect people's 397 00:23:41,676 --> 00:23:45,956 Speaker 1: lives positively with my work. And so I was at 398 00:23:46,036 --> 00:23:49,836 Speaker 1: the time quite confident that indirectly it would have an effect 399 00:23:49,916 --> 00:23:53,556 Speaker 1: also on things like medicine, biology, etc. But I was wondering, 400 00:23:53,916 --> 00:23:58,196 Speaker 1: couldn't this happen more directly if I focused more on it. 401 00:23:58,436 --> 00:24:00,916 Speaker 1: The next thing that happened was that AlphaFold 2 402 00:24:01,636 --> 00:24:05,156 Speaker 1: results at CASP14 were published. CASP14 is this 403 00:24:05,636 --> 00:24:09,596 Speaker 1: biannual challenge for protein structure prediction and some other related problems. 404 00:24:09,716 --> 00:24:11,636 Speaker 1: This is the protein folding problem, and this is the 405 00:24:11,636 --> 00:24:13,036 Speaker 1: protein folding problem exactly. 406 00:24:13,076 --> 00:24:15,836 Speaker 2: The machine learning solving the protein folding problem, which had 407 00:24:15,876 --> 00:24:18,676 Speaker 2: been a problem for decades given a chain of amino 408 00:24:18,716 --> 00:24:21,476 Speaker 2: acids predict the three D structure of a protein. Precisely, and 409 00:24:22,276 --> 00:24:24,676 Speaker 2: humans failed and machine learning succeeded. 
410 00:24:24,756 --> 00:24:29,356 Speaker 1: Just amazing. Yes, it's a great example. Humans failed despite 411 00:24:29,396 --> 00:24:33,156 Speaker 1: the fact that we actually understand the physics fundamentally, but 412 00:24:33,276 --> 00:24:37,276 Speaker 1: we still couldn't create models that were good enough using 413 00:24:37,316 --> 00:24:39,756 Speaker 1: our conceptual understanding of the processes involved. 414 00:24:39,876 --> 00:24:42,636 Speaker 2: You would think an algorithm would work on that one, right, 415 00:24:42,676 --> 00:24:45,116 Speaker 2: You would just think an old school set of rules, 416 00:24:45,196 --> 00:24:48,196 Speaker 2: like we know what the molecules look like, we know 417 00:24:48,316 --> 00:24:51,516 Speaker 2: the laws of physics. It's amazing that we couldn't predict 418 00:24:51,556 --> 00:24:53,276 Speaker 2: it that way. Right. All you want to know is 419 00:24:53,316 --> 00:24:55,396 Speaker 2: what shape is the protein going to be? You know 420 00:24:55,516 --> 00:24:57,916 Speaker 2: all of the constituent parts, you know every atom in it, 421 00:24:57,956 --> 00:25:00,276 Speaker 2: and you still couldn't predict it with a set of rules, 422 00:25:00,276 --> 00:25:02,796 Speaker 2: but AI machine learning could. 423 00:25:03,556 --> 00:25:07,036 Speaker 1: Amazing, Yes, and it is amazing. Actually, when you put 424 00:25:07,036 --> 00:25:09,156 Speaker 1: it like this, it's important to point out that and 425 00:25:09,676 --> 00:25:12,756 Speaker 1: when we say we understand it, we make massive oversimplifying 426 00:25:12,796 --> 00:25:16,596 Speaker 1: assumptions because we ignore all the other players that are 427 00:25:16,636 --> 00:25:19,876 Speaker 1: present when a protein folds. 
We ignore a lot of 428 00:25:19,916 --> 00:25:22,796 Speaker 1: the kinetics of it because we say we know the structure, 429 00:25:22,996 --> 00:25:25,156 Speaker 1: but the truth is, we don't know all the wiggling 430 00:25:25,236 --> 00:25:27,516 Speaker 1: and all the shenanigans that happen on the way there, right, 431 00:25:27,556 --> 00:25:31,996 Speaker 1: and we don't know about uh, you know, chaperone proteins 432 00:25:31,996 --> 00:25:34,076 Speaker 1: that are there to influence the folding. We don't know 433 00:25:34,316 --> 00:25:36,636 Speaker 1: around all sorts of other I'm doing the physics one. 434 00:25:36,676 --> 00:25:40,796 Speaker 2: I'm doing the assume a frictionless plane version of protein precisely. 435 00:25:40,436 --> 00:25:43,556 Speaker 1: Precisely, precisely. And the beauty is that deep learning doesn't 436 00:25:43,556 --> 00:25:45,556 Speaker 1: need to make this assumption. AI doesn't need to make 437 00:25:45,556 --> 00:25:48,396 Speaker 1: this assumption. AI it just looks at data, and it 438 00:25:48,436 --> 00:25:51,356 Speaker 1: can look at more data than any human or even 439 00:25:51,516 --> 00:25:55,796 Speaker 1: humanity eventually could look at together. It's such a good 440 00:25:55,836 --> 00:25:59,076 Speaker 1: example problem to demonstrate that these models are ready for 441 00:25:59,156 --> 00:26:02,476 Speaker 1: prime time in this field and ready for lots of applications, 442 00:26:02,476 --> 00:26:04,676 Speaker 1: not just one or two, but manifold, and so 443 00:26:04,876 --> 00:26:08,156 Speaker 1: that happens, manifold exactly. And then the third thing 444 00:26:08,316 --> 00:26:14,036 Speaker 1: was that the COVID mRNA vaccines came out with astonishing 445 00:26:14,196 --> 00:26:17,196 Speaker 1: ninety plus percent out of. 446 00:26:17,156 --> 00:26:21,996 Speaker 2: The gate that they were still so underrated. 
At the 447 00:26:21,996 --> 00:26:24,756 Speaker 2: beginning of the pandemic, people were like, it'll be two 448 00:26:24,836 --> 00:26:27,596 Speaker 2: or three years, and if they're sixty percent effective, that'll be. 449 00:26:27,516 --> 00:26:30,836 Speaker 1: Great, exactly exactly, And so everybody forgets. Everybody forgets it. 450 00:26:30,996 --> 00:26:33,676 Speaker 1: And when you look at it, this is a molecule 451 00:26:33,756 --> 00:26:35,996 Speaker 1: family that was for you know, most of the time 452 00:26:35,996 --> 00:26:38,116 Speaker 1: that we've known about it since the sixties, I suppose 453 00:26:38,756 --> 00:26:43,796 Speaker 1: we've treated it like a neglected stepchild of molecular biology, 454 00:26:43,996 --> 00:26:47,156 Speaker 1: because you're talking about RNA in general. In general. 455 00:26:47,876 --> 00:26:49,916 Speaker 2: Everybody loves DNA, right, DNA. 456 00:26:50,036 --> 00:26:53,716 Speaker 3: Everybody loves DNA movie star, Yeah, exactly, exactly, even though 457 00:26:53,756 --> 00:26:57,316 Speaker 3: now looking back, DNA is merely you know, the place 458 00:26:57,356 --> 00:27:00,716 Speaker 3: where life takes its notes, maybe the hard drive and 459 00:27:00,796 --> 00:27:01,356 Speaker 3: the memory. 460 00:27:01,556 --> 00:27:04,356 Speaker 1: It's the book, right, it's the book. So but but 461 00:27:04,436 --> 00:27:07,076 Speaker 1: at the end of the day, it was this molecule 462 00:27:07,076 --> 00:27:10,196 Speaker 1: family that was about to save, you know, depending on the estimate, 463 00:27:10,396 --> 00:27:14,276 Speaker 1: tens of millions of lives and in rapid time. So 464 00:27:14,356 --> 00:27:16,836 Speaker 1: all these things hold, but we have no training data 465 00:27:16,956 --> 00:27:21,596 Speaker 1: to apply anything like AlphaFold to this specific molecule family, 466 00:27:21,676 --> 00:27:24,356 Speaker 1: no training data to speak of. 
We had two hundred 467 00:27:24,396 --> 00:27:28,876 Speaker 1: thousand known protein structures at the time, I believe, maybe optimistically, 468 00:27:28,916 --> 00:27:31,996 Speaker 1: we had maybe twelve hundred known RNA structures. And on 469 00:27:32,036 --> 00:27:34,636 Speaker 1: top of that, it was also fairly clear that for 470 00:27:34,796 --> 00:27:38,236 Speaker 1: RNA going directly to function would be much much more important, 471 00:27:38,276 --> 00:27:41,796 Speaker 1: because it's in a certain sense a less strongly structured molecule, 472 00:27:41,876 --> 00:27:44,916 Speaker 1: and other aspects of the molecule might play a bigger role. 473 00:27:45,756 --> 00:27:49,196 Speaker 1: And then on top of that, the attention that generative 474 00:27:49,236 --> 00:27:53,476 Speaker 1: AI was receiving overall, also now in the field of 475 00:27:53,716 --> 00:27:58,756 Speaker 1: pharma or of medicine, was building, And so I ended 476 00:27:58,836 --> 00:28:02,996 Speaker 1: up finding myself in a conversation where a very, I would 477 00:28:02,996 --> 00:28:07,956 Speaker 1: say, wise longtime mentor of mine pointed out that, you know, 478 00:28:08,036 --> 00:28:11,316 Speaker 1: maybe ten years from now or so, somebody could tell 479 00:28:11,396 --> 00:28:14,836 Speaker 1: my daughter that there was this perfect storm where this 480 00:28:14,956 --> 00:28:17,516 Speaker 1: macromolecule with no training data was about to save 481 00:28:17,556 --> 00:28:20,076 Speaker 1: the world and could do so much more in the 482 00:28:20,076 --> 00:28:23,996 Speaker 1: direction of positively impacting people's lives. 
We didn't have training data, 483 00:28:24,396 --> 00:28:27,676 Speaker 1: would be very expensive to create it, but using the 484 00:28:27,796 --> 00:28:30,196 Speaker 1: technology that I've been or technologies that I'd been working 485 00:28:30,276 --> 00:28:32,076 Speaker 1: on for the last I don't know, ten plus years, 486 00:28:32,516 --> 00:28:36,076 Speaker 1: and the ability because of the attention that people were 487 00:28:36,716 --> 00:28:40,316 Speaker 1: now giving to AI in this field the ability to 488 00:28:40,356 --> 00:28:43,116 Speaker 1: raise quite a bit of money. I, in that position, 489 00:28:43,276 --> 00:28:47,956 Speaker 1: chose to stay back at my cushy dream job in 490 00:28:47,956 --> 00:28:53,436 Speaker 1: big tech and not actually take this opportunity to really 491 00:28:53,436 --> 00:28:56,716 Speaker 1: positively impact people's lives, And that idea was not one 492 00:28:56,796 --> 00:28:58,556 Speaker 1: I was willing to entertain. 493 00:28:59,036 --> 00:29:00,956 Speaker 2: You couldn't just coast it out at Google and let 494 00:29:01,036 --> 00:29:03,316 Speaker 2: somebody else go figure out RNA. 495 00:29:03,516 --> 00:29:06,556 Speaker 1: Yeah, and it's not just RNA. I think RNA is 496 00:29:06,556 --> 00:29:08,356 Speaker 1: a great starting point at the end of the day, 497 00:29:08,676 --> 00:29:14,916 Speaker 1: but building models that learn from first of all, all 498 00:29:14,916 --> 00:29:17,076 Speaker 1: the publicly available data that we can possibly get our 499 00:29:17,116 --> 00:29:19,596 Speaker 1: hands on, but also from data that we can reasonably 500 00:29:19,636 --> 00:29:24,516 Speaker 1: effectively create in our own lab. 
How to design molecules 501 00:29:24,596 --> 00:29:28,356 Speaker 1: for specific functions is something that now is within reach 502 00:29:28,636 --> 00:29:32,156 Speaker 1: and that will in the next years, in the years 503 00:29:32,196 --> 00:29:36,116 Speaker 1: to come, have completely transformational impact on how we even 504 00:29:36,156 --> 00:29:41,716 Speaker 1: think about what medicines are. That any opportunity to speed 505 00:29:41,756 --> 00:29:44,596 Speaker 1: this up, to make this happen, even just a day 506 00:29:44,676 --> 00:29:48,116 Speaker 1: sooner than it could have otherwise happened, is incredibly valuable 507 00:29:48,196 --> 00:29:48,836 Speaker 1: in my opinion. 508 00:29:49,116 --> 00:29:52,116 Speaker 2: As you're talking about this idea that the absence of 509 00:29:52,276 --> 00:29:55,116 Speaker 2: training data is kind of seems to be at the 510 00:29:55,116 --> 00:29:56,716 Speaker 2: center of it, right, It seems to be the core 511 00:29:57,396 --> 00:30:01,156 Speaker 2: yeah problem, which makes sense, right, Like the reason language 512 00:30:01,156 --> 00:30:03,356 Speaker 2: works so well is basically because of the Internet. I know, 513 00:30:03,436 --> 00:30:05,796 Speaker 2: now we're going beyond it, but like it just happened 514 00:30:05,796 --> 00:30:08,556 Speaker 2: to be that there was this incredibly giant set of 515 00:30:08,636 --> 00:30:12,556 Speaker 2: natural language that became available. We don't have anything 516 00:30:12,636 --> 00:30:14,676 Speaker 2: like that for RNA, so are you. I mean, it's 517 00:30:14,756 --> 00:30:19,916 Speaker 2: kind of step one at Inceptive creating the data. Is 518 00:30:19,956 --> 00:30:21,476 Speaker 2: that kind of what's happening? 
519 00:30:22,356 --> 00:30:25,156 Speaker 1: So step one at Inceptive is learning to use all 520 00:30:25,196 --> 00:30:27,036 Speaker 1: the data or was I think we've made a lot 521 00:30:27,036 --> 00:30:28,796 Speaker 1: of focus in that direction, learning to use all the 522 00:30:28,876 --> 00:30:33,676 Speaker 1: data that is available already and identify what other data 523 00:30:33,716 --> 00:30:36,276 Speaker 1: we're missing, and then see how far we can get 524 00:30:36,316 --> 00:30:39,556 Speaker 1: with just the publicly available data and at the same 525 00:30:39,596 --> 00:30:42,996 Speaker 1: time scale up generating our own data. And it turns 526 00:30:42,996 --> 00:30:46,916 Speaker 1: out that actually, because of the nature of evolution, because 527 00:30:46,916 --> 00:30:51,676 Speaker 1: of how evolution isn't actually incentivized to really explore the 528 00:30:51,876 --> 00:30:58,516 Speaker 1: entire space of possibilities. It is almost always given that 529 00:30:58,676 --> 00:31:02,716 Speaker 1: if you are trying to design exceptional molecules, especially ones 530 00:31:02,836 --> 00:31:08,156 Speaker 1: that are not say, you know, natural formats, you are 531 00:31:08,396 --> 00:31:11,596 Speaker 1: basically guaranteed to need novel training data. 532 00:31:11,916 --> 00:31:15,276 Speaker 2: Yeah, basically you're saying you build RNAs that don't exist 533 00:31:15,356 --> 00:31:17,796 Speaker 2: in the world that have therapeutic uses, and there's no 534 00:31:17,996 --> 00:31:19,556 Speaker 2: kind of definitionally no training data. 535 00:31:19,636 --> 00:31:21,916 Speaker 1: Yes, that exists. 
The funny thing is we have a 536 00:31:21,956 --> 00:31:25,596 Speaker 1: few of them, and so we have existence proofs of 537 00:31:25,796 --> 00:31:32,956 Speaker 1: RNA molecules, for example, RNA viruses that actually exhibit incredibly 538 00:31:32,996 --> 00:31:38,316 Speaker 1: complex different functions in our cells, that do all sorts of 539 00:31:38,436 --> 00:31:40,876 Speaker 1: things that we don't usually like. But if we could 540 00:31:40,956 --> 00:31:43,836 Speaker 1: use those, you know, for good, If we could use those, 541 00:31:44,236 --> 00:31:48,396 Speaker 1: you know, in ways that would actually be aimed at 542 00:31:48,396 --> 00:31:51,676 Speaker 1: fighting disease rather than creating them, those kinds of functions, 543 00:31:51,716 --> 00:31:55,516 Speaker 1: even just a small subset of them, would really transform 544 00:31:55,556 --> 00:31:58,076 Speaker 1: medicine already. And so we know it's possible. What are 545 00:31:58,076 --> 00:31:59,516 Speaker 1: you dreaming of when you say that, what are you 546 00:31:59,556 --> 00:32:03,516 Speaker 1: thinking of? Specifically? Okay, So, for example, right, one estimate 547 00:32:03,676 --> 00:32:07,756 Speaker 1: is that in order for COVID to infect you, you 548 00:32:07,796 --> 00:32:13,436 Speaker 1: would need potentially as few as five COVID genomes inside 549 00:32:13,436 --> 00:32:16,436 Speaker 1: your organism that's already in five five viral particles. Five 550 00:32:16,516 --> 00:32:21,356 Speaker 1: viral particles. Yeah, you inhale those, you wouldn't have to 551 00:32:21,516 --> 00:32:24,516 Speaker 1: inject it you wouldn't even have to swallow it, you 552 00:32:24,516 --> 00:32:25,076 Speaker 1: inhale them. 553 00:32:25,316 --> 00:32:27,156 Speaker 2: If we could have a medicine that worked as well 554 00:32:27,156 --> 00:32:29,076 Speaker 2: as a disease is a version of your. 
555 00:32:29,076 --> 00:32:31,996 Speaker 1: Truth, exactly exactly so at the end of the day, right, 556 00:32:32,076 --> 00:32:36,716 Speaker 1: this medicine is able to spread in your body only 557 00:32:36,876 --> 00:32:40,156 Speaker 1: into certain types of organs and tissues and cells. It 558 00:32:40,196 --> 00:32:42,876 Speaker 1: does certain things there that are really quite complex, right, 559 00:32:43,076 --> 00:32:46,916 Speaker 1: changing the cell's behavior again not usually in this case 560 00:32:46,996 --> 00:32:50,676 Speaker 1: in favorable ways, but still in ways that wouldn't have 561 00:32:50,716 --> 00:32:53,276 Speaker 1: to be modified that much in order to potentially be 562 00:32:53,436 --> 00:32:56,636 Speaker 1: exactly what you would need for complex multifactorial medicine. And 563 00:32:56,676 --> 00:32:58,756 Speaker 1: if you could make all of that happen by just 564 00:32:58,796 --> 00:33:02,756 Speaker 1: inhaling five of those molecules, then again, that would completely 565 00:33:02,836 --> 00:33:05,516 Speaker 1: change how you think about medicine. Right, you have viruses 566 00:33:05,756 --> 00:33:09,116 Speaker 1: that aren't immediately active, but that are inactive for long 567 00:33:09,116 --> 00:33:12,956 Speaker 1: periods of time in your organism, and only under certain conditions, 568 00:33:13,036 --> 00:33:19,516 Speaker 1: say under certain immune conditions, really start being reactivated. 
Why 569 00:33:19,516 --> 00:33:23,236 Speaker 1: can't we have medicines that work in a similar way 570 00:33:23,276 --> 00:33:26,756 Speaker 1: where you actually not only in a vaccination sense, but 571 00:33:26,876 --> 00:33:29,836 Speaker 1: where you take a medicine for a genetic predisposition for 572 00:33:29,836 --> 00:33:31,916 Speaker 1: a certain disease that you are able to 573 00:33:31,956 --> 00:33:33,676 Speaker 1: design a medicine that you can take and that 574 00:33:33,796 --> 00:33:36,876 Speaker 1: waits until the disease actually starts to develop, and only 575 00:33:36,876 --> 00:33:39,676 Speaker 1: then and only where that disease then starts to develop, becomes 576 00:33:39,716 --> 00:33:43,236 Speaker 1: active and actually affects it and potentially also then alarms 577 00:33:43,316 --> 00:33:44,916 Speaker 1: the doctor through a blood test. 578 00:33:45,916 --> 00:33:48,436 Speaker 2: Like for cancer cells or something. So you have some 579 00:33:48,956 --> 00:33:51,396 Speaker 2: kind of prophylactic medicine in your body and it is 580 00:33:51,516 --> 00:33:54,756 Speaker 2: encoded in such a way that it just hangs out there, 581 00:33:55,116 --> 00:33:58,476 Speaker 2: like herpes, to take a pathological example for example, and 582 00:33:58,516 --> 00:34:02,196 Speaker 2: only in certain settings does it do anything. And those 583 00:34:02,236 --> 00:34:05,076 Speaker 2: settings are if you see a cancer cell, destroy it, 584 00:34:05,116 --> 00:34:06,876 Speaker 2: otherwise just sit there. Precisely. 585 00:34:07,316 --> 00:34:09,476 Speaker 1: And if you can design those also in ways where 586 00:34:09,476 --> 00:34:12,356 Speaker 1: you can just make them all go away. When you know, 587 00:34:12,476 --> 00:34:15,956 Speaker 1: you take a say a completely harmless small molecule, and 588 00:34:15,996 --> 00:34:17,636 Speaker 1: that's again entirely feasible. 
589 00:34:17,836 --> 00:34:21,036 Speaker 2: Sure, So, I mean you're dreaming big. These are wonderful 590 00:34:21,076 --> 00:34:23,276 Speaker 2: big you know, science-fiction-y dreams that I hope 591 00:34:23,316 --> 00:34:27,076 Speaker 2: you figure them out. On a practical level. What's happening 592 00:34:27,076 --> 00:34:29,036 Speaker 2: at the company right now? How many people work there, 593 00:34:29,116 --> 00:34:30,716 Speaker 2: what are they doing, and what have they figured out 594 00:34:30,756 --> 00:34:31,036 Speaker 2: so far? 595 00:34:31,116 --> 00:34:35,636 Speaker 1: We're around forty. What we're doing is really exactly what 596 00:34:35,676 --> 00:34:40,796 Speaker 1: we just talked about. We're basically scaling data generation experiments 597 00:34:40,836 --> 00:34:44,516 Speaker 1: in our lab that allow us to assess a variety 598 00:34:44,556 --> 00:34:50,396 Speaker 1: of different functions of different mostly RNA molecules actually mostly 599 00:34:50,516 --> 00:34:54,876 Speaker 1: mRNA molecules at the moment, that are relevant to 600 00:34:55,116 --> 00:34:58,236 Speaker 1: a pretty broad variety of different diseases. And so this 601 00:34:58,356 --> 00:35:03,636 Speaker 1: ranges from things like infectious disease vaccines to cell therapies 602 00:35:03,676 --> 00:35:06,636 Speaker 1: that can be applied in oncology or an auto or 603 00:35:06,676 --> 00:35:12,076 Speaker 1: against autoimmune disease. We have mRNAs that we hope will 604 00:35:12,116 --> 00:35:16,076 Speaker 1: eventually be effective in enzyme replacement as enzyme replacement therapies 605 00:35:16,436 --> 00:35:20,396 Speaker 1: for families of a large family of rare diseases, and 606 00:35:20,436 --> 00:35:23,836 Speaker 1: the list goes on. 
And so we're creating this or 607 00:35:23,956 --> 00:35:28,836 Speaker 1: growing this training data set that eventually, on top of 608 00:35:30,236 --> 00:35:33,436 Speaker 1: foundation models that we pre trained on all publicly 609 00:35:33,476 --> 00:35:39,036 Speaker 1: available data, allow us to tune those foundation models towards 610 00:35:39,116 --> 00:35:44,636 Speaker 1: designing exceptional molecules for exactly those applications and many more 611 00:35:44,676 --> 00:35:45,956 Speaker 1: sharing similar properties. 612 00:35:45,996 --> 00:35:50,556 Speaker 2: So you basically build new mRNA molecules 613 00:35:50,596 --> 00:35:52,796 Speaker 2: and test them, and then you give that data to 614 00:35:52,876 --> 00:35:56,356 Speaker 2: your model and presumably it tells you what to build next, 615 00:35:56,396 --> 00:35:58,116 Speaker 2: or it helps you figure out what to build next. 616 00:35:58,116 --> 00:35:59,436 Speaker 2: It's sort of a loop in that way. 617 00:35:59,516 --> 00:36:03,236 Speaker 1: The models are definitely one interesting source for proposals if 618 00:36:03,276 --> 00:36:07,596 Speaker 1: you wish for what to synthesize and test next, they're 619 00:36:07,636 --> 00:36:11,036 Speaker 1: not the only such source, so we basically also explore 620 00:36:11,196 --> 00:36:14,916 Speaker 1: kind of and maybe less guided or heuristically guided ways, 621 00:36:15,316 --> 00:36:18,236 Speaker 1: but exactly so in some of the cases, it's really 622 00:36:18,316 --> 00:36:21,076 Speaker 1: quite iterative. 
For some of those functions and for some 623 00:36:21,156 --> 00:36:25,956 Speaker 1: of those modalities and diseases or disease targets, we're actually 624 00:36:25,956 --> 00:36:29,316 Speaker 1: already at a point where our models can spit out 625 00:36:29,476 --> 00:36:33,116 Speaker 1: entirely novel molecules that really are unlike anything they've ever 626 00:36:33,156 --> 00:36:37,756 Speaker 1: seen or we've ever seen in nature, that very consistently 627 00:36:38,676 --> 00:36:43,156 Speaker 1: perform quite favorably compared to pretty strong baselines by incumbents 628 00:36:43,156 --> 00:36:43,636 Speaker 1: in the field. 629 00:36:44,076 --> 00:36:48,716 Speaker 2: When you say perform quite favorably compared to baselines by 630 00:36:48,716 --> 00:36:50,916 Speaker 2: incumbents in the field, and does that on some level 631 00:36:50,996 --> 00:36:54,196 Speaker 2: mean better than what experts would think. 632 00:36:54,076 --> 00:36:56,756 Speaker 1: Up, better than what experts can think up, and also 633 00:36:56,876 --> 00:37:00,876 Speaker 1: better than more traditional machine learning tools can easily produce. 634 00:37:01,276 --> 00:37:03,796 Speaker 2: It's like that famous moment in the Go match when 635 00:37:03,836 --> 00:37:07,476 Speaker 2: AlphaGo made some move that like no human being 636 00:37:07,716 --> 00:37:08,796 Speaker 2: ever would have thought of. 637 00:37:09,796 --> 00:37:14,276 Speaker 1: Yes, so I would say we've long passed the move 638 00:37:14,356 --> 00:37:18,556 Speaker 1: thirty seven in the sense that our understanding of the 639 00:37:18,676 --> 00:37:23,116 Speaker 1: underlying biological phenomena is so incomplete that for most of 640 00:37:23,196 --> 00:37:26,756 Speaker 1: the things that we're able to design for, we don't 641 00:37:26,756 --> 00:37:28,396 Speaker 1: really understand why they happen. 
642 00:37:28,836 --> 00:37:31,196 Speaker 2: Huh, when you say we, you mean at Inceptive or 643 00:37:31,236 --> 00:37:32,916 Speaker 2: do you mean just medicine in general? 644 00:37:33,516 --> 00:37:34,916 Speaker 1: I would say just medicine in general. 645 00:37:35,316 --> 00:37:39,156 Speaker 2: Okay, So Inceptive is doing this very kind of high 646 00:37:39,276 --> 00:37:42,836 Speaker 2: level work, right, I mean building what will hopefully be 647 00:37:42,876 --> 00:37:46,236 Speaker 2: the foundation. What's the right amount of time in the 648 00:37:46,236 --> 00:37:48,716 Speaker 2: future to ask about when will we know if it works? 649 00:37:48,836 --> 00:37:49,956 Speaker 2: You think five years? 650 00:37:50,316 --> 00:37:55,276 Speaker 1: So the general idea of using generative AI and similar 651 00:37:55,436 --> 00:38:02,156 Speaker 1: techniques to generate therapeutics, there are some things in clinical 652 00:38:02,196 --> 00:38:07,076 Speaker 1: trials that were largely designed with AI. As far as 653 00:38:07,076 --> 00:38:11,276 Speaker 1: I know, we're still maybe now we have the first 654 00:38:11,356 --> 00:38:15,756 Speaker 1: trials just now starting for molecules that were truly entirely 655 00:38:15,796 --> 00:38:18,076 Speaker 1: designed by AI. 656 00:38:17,356 --> 00:38:20,436 Speaker 2: As opposed to sort of selected from a library. 657 00:38:20,036 --> 00:38:24,716 Speaker 1: Or selected, influenced, exactly selected, adjusted, tuned, and tweaked, 658 00:38:25,156 --> 00:38:29,076 Speaker 1: et cetera. Right, So that's really still only happening just now, 659 00:38:29,876 --> 00:38:33,556 Speaker 1: but we will see I believe, the first success or 660 00:38:33,636 --> 00:38:37,236 Speaker 1: a first success of such molecules, certainly within the next 661 00:38:37,276 --> 00:38:37,876 Speaker 1: five years. 662 00:38:38,116 --> 00:38:41,076 Speaker 2: What about more narrowly, the project at Inceptive. 
663 00:38:41,156 --> 00:38:44,316 Speaker 1: It's a similar timeframe. We should be able to get 664 00:38:44,876 --> 00:38:49,356 Speaker 1: molecules into the clinic in the next few years, certainly 665 00:38:49,396 --> 00:38:51,476 Speaker 1: in the next handful of years. Now. These will not 666 00:38:51,556 --> 00:38:57,196 Speaker 1: be molecules where the objective that we used in 667 00:38:57,236 --> 00:39:01,756 Speaker 1: their design is you know, even remotely as complex or 668 00:39:02,236 --> 00:39:05,516 Speaker 1: you know, kind of the different functions that we're designing 669 00:39:05,556 --> 00:39:09,436 Speaker 1: for are not going to be even remotely as 670 00:39:09,476 --> 00:39:12,276 Speaker 1: diverse as say what you would find because we used 671 00:39:12,276 --> 00:39:15,676 Speaker 1: this example earlier in an RNA virus. These will really be 672 00:39:15,876 --> 00:39:21,236 Speaker 1: simpler. Those will be molecules that don't do things 673 00:39:21,276 --> 00:39:24,076 Speaker 1: that we couldn't possibly have done before, but that do 674 00:39:24,236 --> 00:39:28,276 Speaker 1: them much better in ways that are more accessible, in 675 00:39:28,356 --> 00:39:30,916 Speaker 1: ways that come with less side effects. 676 00:39:30,996 --> 00:39:34,916 Speaker 2: What biotech largely is is they make protein drugs. And 677 00:39:34,956 --> 00:39:37,836 Speaker 2: so if you could make an mRNA drug where you 678 00:39:37,836 --> 00:39:39,836 Speaker 2: put the mRNA into the body and the body 679 00:39:39,836 --> 00:39:42,436 Speaker 2: makes the protein, it wouldn't be some crazy sleeper cell 680 00:39:42,436 --> 00:39:44,196 Speaker 2: that sits in your body for twenty years or whatever, 681 00:39:44,436 --> 00:39:48,996 Speaker 2: but it might be a more practical alternative to today's biotech drugs. 682 00:39:49,116 --> 00:39:49,596 Speaker 1: Absolutely. 
683 00:39:50,396 --> 00:39:53,756 Speaker 2: So you've had a kind of crash course in biology 684 00:39:53,796 --> 00:39:55,676 Speaker 2: in the last few years, yes, And I'm curious, like, 685 00:39:56,276 --> 00:40:00,156 Speaker 2: what is what is something that has been particularly compelling 686 00:40:00,276 --> 00:40:02,876 Speaker 2: or surprising or interesting to you that you have learned 687 00:40:02,876 --> 00:40:03,676 Speaker 2: about biology? 688 00:40:03,956 --> 00:40:07,596 Speaker 1: There are countless things. The biggest one, or the red thread 689 00:40:08,116 --> 00:40:15,196 Speaker 1: across many of them is really just how effective life 690 00:40:16,396 --> 00:40:22,676 Speaker 1: is at finding solutions to problems that on one hand 691 00:40:22,916 --> 00:40:27,756 Speaker 1: are incredibly robust, surprisingly robust, and on the other hand, 692 00:40:28,436 --> 00:40:34,476 Speaker 1: are so different from how we would design solutions to 693 00:40:34,636 --> 00:40:35,596 Speaker 1: similar problems. 694 00:40:36,356 --> 00:40:37,156 Speaker 2: Uh huh. 695 00:40:37,516 --> 00:40:40,116 Speaker 1: That really this comes back to this idea that we 696 00:40:40,196 --> 00:40:43,516 Speaker 1: might just not be particularly well equipped in terms of 697 00:40:43,516 --> 00:40:48,356 Speaker 1: cognitive capabilities to understand biology, that basically, you know we 698 00:40:49,236 --> 00:40:53,276 Speaker 1: are we would never think to do it this way, 699 00:40:53,356 --> 00:40:57,196 Speaker 1: and how we think to do it is oftentimes much 700 00:40:57,276 --> 00:40:57,876 Speaker 1: more brittle. 701 00:40:58,556 --> 00:41:01,956 Speaker 2: Uh huh. Brittle is an interesting word, less, less resilient, 702 00:41:02,076 --> 00:41:04,036 Speaker 2: less able to persist under different 703 00:41:03,756 --> 00:41:06,796 Speaker 1: conditions. Exactly, exactly.
I mean, you know, we still haven't 704 00:41:06,796 --> 00:41:08,916 Speaker 1: built machines that can fix themselves, for one. 705 00:41:09,116 --> 00:41:11,916 Speaker 2: Which is fundamentally the miracle of being a human being. 706 00:41:12,036 --> 00:41:17,516 Speaker 1: Just fundamentally exactly, exactly exactly and so and of course 707 00:41:17,516 --> 00:41:20,796 Speaker 1: this is true across the scales, right from from you know, 708 00:41:21,196 --> 00:41:24,996 Speaker 1: single cells all the way to complex organisms like ourselves 709 00:41:25,436 --> 00:41:33,116 Speaker 1: and and really just how many also very different kinds 710 00:41:33,116 --> 00:41:37,796 Speaker 1: of solutions life has found and or and or constantly 711 00:41:37,876 --> 00:41:40,956 Speaker 1: is finding. Uh. And you see this all over the place, 712 00:41:40,956 --> 00:41:47,716 Speaker 1: and it's both daunting, humbling, but also incredibly inspiring when 713 00:41:47,716 --> 00:41:51,316 Speaker 1: it comes to applying AI in this area, because again 714 00:41:51,356 --> 00:41:54,396 Speaker 1: I think that at least so far, it's the best 715 00:41:54,436 --> 00:41:58,636 Speaker 1: tool and maybe actually the only tool we have so 716 00:41:58,796 --> 00:42:02,716 Speaker 1: far, in face of this kind of complexity, to really design 717 00:42:02,756 --> 00:42:07,356 Speaker 1: interventions, medicines that go way beyond what we were 718 00:42:07,396 --> 00:42:09,556 Speaker 1: able to do or are able to do, just based 719 00:42:09,596 --> 00:42:10,916 Speaker 1: on our own conceptual understanding. 720 00:42:14,436 --> 00:42:16,636 Speaker 2: We'll be back in a minute with the lightning round. 721 00:42:18,196 --> 00:42:33,116 Speaker 2: M hm, let's finish with the lightning round.
As an 722 00:42:33,156 --> 00:42:38,476 Speaker 2: inventor of the Transformer model, are there particular possible uses 723 00:42:38,516 --> 00:42:41,836 Speaker 2: of it that worry you/make you sad? 724 00:42:42,596 --> 00:42:48,396 Speaker 1: I am quite concerned about the p(doom), doomerism, whatever 725 00:42:48,436 --> 00:42:53,836 Speaker 1: you want to call it, existential fear instilling rhetoric that 726 00:42:53,956 --> 00:42:57,916 Speaker 1: is in some cases actually also promoted by people, by 727 00:42:58,396 --> 00:42:59,956 Speaker 1: entities in the space. 728 00:43:00,436 --> 00:43:02,436 Speaker 2: So just to be clear, you're you're not worried about 729 00:43:02,436 --> 00:43:05,556 Speaker 2: the existential risk. You're worried about people talking. 730 00:43:05,676 --> 00:43:10,276 Speaker 1: I'm worried about the about the existential risk being inflated 731 00:43:10,356 --> 00:43:17,196 Speaker 1: or the perception being inflated to the extent that we 732 00:43:17,316 --> 00:43:20,996 Speaker 1: actually don't look enough at some of the much more 733 00:43:21,076 --> 00:43:23,996 Speaker 1: concrete and much more immediate risks. Right. I'm not going 734 00:43:24,076 --> 00:43:27,436 Speaker 1: to say that the existential risk is zero. That would 735 00:43:27,436 --> 00:43:27,836 Speaker 1: be silly. 736 00:43:27,956 --> 00:43:31,276 Speaker 2: What is a concrete and immediate risk that is, you 737 00:43:31,316 --> 00:43:32,236 Speaker 2: think, under-discussed?
738 00:43:32,396 --> 00:43:37,396 Speaker 1: These large scale models are such effective tools in 739 00:43:37,636 --> 00:43:42,556 Speaker 1: manipulating people in large numbers already today, and it's happening 740 00:43:42,756 --> 00:43:47,436 Speaker 1: everywhere for many, many different purposes by in some cases 741 00:43:47,476 --> 00:43:52,516 Speaker 1: benevolent and in many cases malevolent actors that I really 742 00:43:53,156 --> 00:43:56,516 Speaker 1: firmly believe we need to look much more at things 743 00:43:56,636 --> 00:44:03,116 Speaker 1: like enabling cryptographic certification of human generated content, because doing 744 00:44:03,156 --> 00:44:05,196 Speaker 1: that with the machine generated content is not going to work. 745 00:44:05,316 --> 00:44:09,716 Speaker 1: But we definitely can cryptographically certify human generated content as such. 746 00:44:09,556 --> 00:44:12,996 Speaker 2: Basically watermarking or something, some way to say this, 747 00:44:13,196 --> 00:44:14,036 Speaker 2: a human made this. 748 00:44:14,316 --> 00:44:15,716 Speaker 1: Exactly. 749 00:44:15,716 --> 00:44:19,156 Speaker 2: What would you be working on if you were not working in biology on 750 00:44:19,236 --> 00:44:19,956 Speaker 2: drug development? 751 00:44:20,516 --> 00:44:26,076 Speaker 1: Education, using using artificial intelligence to democratize access to education. 752 00:44:26,836 --> 00:44:30,876 Speaker 2: What have you seen that has been impressive or compelling 753 00:44:30,916 --> 00:44:31,716 Speaker 2: to you in that regard? 754 00:44:31,916 --> 00:44:35,396 Speaker 1: There are lots of little examples so far and really countless. 755 00:44:36,116 --> 00:44:39,236 Speaker 1: It's what's happening at the Khan Academy. There are many 756 00:44:39,276 --> 00:44:44,076 Speaker 1: examples of AI applied to education problems in places like China, 757 00:44:44,156 --> 00:44:47,316 Speaker 1: for example.
You have a bunch of very compelling examples 758 00:44:47,316 --> 00:44:50,556 Speaker 1: in fiction. A book I really like, by 759 00:44:50,636 --> 00:44:54,676 Speaker 1: Neal Stephenson, The Diamond Age: Or, A Young Lady's Illustrated 760 00:44:54,676 --> 00:44:56,756 Speaker 1: Primer, that I recommend if you. 761 00:44:56,796 --> 00:44:59,516 Speaker 2: Just everybody in AI talks about that, Well now they do. 762 00:44:59,596 --> 00:45:01,436 Speaker 1: Yeah, it's yeah, well. 763 00:45:01,316 --> 00:45:01,676 Speaker 2: Now they do. 764 00:45:01,796 --> 00:45:02,516 Speaker 1: You liked it before? 765 00:45:02,516 --> 00:45:02,996 Speaker 2: It was cool? 766 00:45:03,076 --> 00:45:05,836 Speaker 1: I'm sure at one point I thought it was really 767 00:45:05,836 --> 00:45:09,516 Speaker 1: really important to ensure that Neal Stephenson knows that 768 00:45:09,836 --> 00:45:13,876 Speaker 1: we are about to be able to build the Primer, 769 00:45:13,956 --> 00:45:16,516 Speaker 1: and so I ended up having coffee with him to 770 00:45:16,516 --> 00:45:19,316 Speaker 1: tell him, oh, that's great. So at the end of 771 00:45:19,356 --> 00:45:24,756 Speaker 1: the day, maybe the biggest inspiration there is my daughter. 772 00:45:25,076 --> 00:45:28,476 Speaker 1: She's four and a half now, and I think she 773 00:45:28,796 --> 00:45:34,036 Speaker 1: could today read. She can read read okay, but she 774 00:45:34,076 --> 00:45:38,076 Speaker 1: could read, you know, grade school level if she had 775 00:45:38,116 --> 00:45:41,276 Speaker 1: access to you know, an AI tutor teaching her how 776 00:45:41,276 --> 00:45:41,596 Speaker 1: to read. 777 00:45:41,676 --> 00:45:45,276 Speaker 2: Does your daughter use AI, use, you know, AI chatbots? 778 00:45:45,316 --> 00:45:49,796 Speaker 1: Not directly, without me. But we've
779 00:45:49,596 --> 00:45:54,396 Speaker 1: actually used ChatGPT to implement an AI reading tutor 780 00:45:55,236 --> 00:45:58,036 Speaker 1: that works reasonably well. I mean we basically, you know, 781 00:45:58,156 --> 00:46:01,476 Speaker 1: kind of as I call it now, vibe coding, vibe coded. 782 00:46:02,316 --> 00:46:04,956 Speaker 1: And I wasn't there for all of it. Took some time, 783 00:46:04,996 --> 00:46:06,396 Speaker 1: but she was there for some of it. Oh, you 784 00:46:06,556 --> 00:46:09,076 Speaker 1: vibe coded it with her? Yeah, well, I mean she was, 785 00:46:09,196 --> 00:46:11,996 Speaker 1: she was there. You know, she witnessed a good chunk 786 00:46:12,036 --> 00:46:14,196 Speaker 1: of it, Yes, although she was more interested in the 787 00:46:14,196 --> 00:46:16,636 Speaker 1: image generation parts. But yeah, we have a sketch of 788 00:46:16,676 --> 00:46:19,716 Speaker 1: one that she quite enjoys. So that's kind of like 789 00:46:19,756 --> 00:46:23,596 Speaker 1: the extent of her at this age using AI directly. 790 00:46:30,036 --> 00:46:32,676 Speaker 1: Jakob Uszkoreit is the CEO and 791 00:46:32,636 --> 00:46:36,156 Speaker 2: co founder of Inceptive and the co author of the 792 00:46:36,196 --> 00:46:40,076 Speaker 2: paper Attention Is All You Need. Just a quick note, 793 00:46:40,596 --> 00:46:42,996 Speaker 2: This is our last episode before a break of a 794 00:46:43,036 --> 00:46:45,996 Speaker 2: couple of weeks, and then we'll be back with more episodes. 795 00:46:46,676 --> 00:46:49,996 Speaker 2: Please email us at problem at Pushkin dot fm. We 796 00:46:50,036 --> 00:46:53,756 Speaker 2: are always looking for new guests for the show. Today's 797 00:46:53,756 --> 00:46:57,556 Speaker 2: show was produced by Trina Menino and Gabriel Hunter Chang. It 798 00:46:57,756 --> 00:47:01,756 Speaker 2: was edited by Alexandra Garaton and engineered by Sarah Bruguiere