1 00:00:15,356 --> 00:00:24,076 Speaker 1: Pushkin. Imagine something that is sort of like ChatGPT, 2 00:00:24,636 --> 00:00:28,076 Speaker 1: but for the human body. ChatGPT looks at a 3 00:00:28,116 --> 00:00:31,236 Speaker 1: sentence and predicts what words are likely to come next. 4 00:00:31,756 --> 00:00:34,316 Speaker 1: This thing would look at a human body and predict 5 00:00:34,356 --> 00:00:38,156 Speaker 1: what diseases are likely to come next. The body is 6 00:00:38,316 --> 00:00:41,836 Speaker 1: wildly complex and unpredictable. This seems like a very, very 7 00:00:41,876 --> 00:00:45,076 Speaker 1: hard problem, but it is a problem people are working on, 8 00:00:45,516 --> 00:00:48,996 Speaker 1: and at least in some circumstances, they're figuring out how 9 00:00:49,036 --> 00:00:57,716 Speaker 1: to make predictions that are truly useful. I'm Jacob Goldstein, 10 00:00:57,756 --> 00:00:59,636 Speaker 1: and this is What's Your Problem, the show where I 11 00:00:59,716 --> 00:01:02,796 Speaker 1: talk to people who are trying to make technological progress. 12 00:01:03,156 --> 00:01:06,476 Speaker 1: My guest today is Charles Fisher, co-founder and CEO 13 00:01:06,636 --> 00:01:10,836 Speaker 1: of Unlearn. Charles's problem is how do you build an 14 00:01:10,836 --> 00:01:14,836 Speaker 1: AI model that can predict human health. Charles and his 15 00:01:14,876 --> 00:01:17,796 Speaker 1: colleagues have built a predictive model of human health that's 16 00:01:17,876 --> 00:01:21,076 Speaker 1: already being used in clinical trials for new drugs and 17 00:01:21,196 --> 00:01:24,356 Speaker 1: new medical devices. But we started out talking about the 18 00:01:24,396 --> 00:01:27,956 Speaker 1: big picture, about the very idea of trying to predict 19 00:01:27,996 --> 00:01:31,836 Speaker 1: what's going to happen to a human body. 20 00:01:32,996 --> 00:01:36,356 Speaker 2: It's funny. When I talk about trying to quantify biology 21 00:01:36,356 --> 00:01:38,756 Speaker 2: and make it predictable, I often get hit with this 22 00:01:40,796 --> 00:01:45,716 Speaker 2: critique that biology isn't physics. Biology is complex, biology is 23 00:01:45,716 --> 00:01:47,476 Speaker 2: not physics. We're not going to be able to do that. 24 00:01:47,876 --> 00:01:49,276 Speaker 1: Less deterministic. 25 00:01:50,396 --> 00:01:54,636 Speaker 2: Right. So for physics, for two thousand years, right, people 26 00:01:54,676 --> 00:01:58,476 Speaker 2: started working on physics in ancient Greece. And for two 27 00:01:58,476 --> 00:02:03,596 Speaker 2: thousand years, physics wasn't physics. Physics was unpredictable. Physics was 28 00:02:03,756 --> 00:02:07,676 Speaker 2: too complex to understand until something was invented. And that 29 00:02:07,676 --> 00:02:09,076 Speaker 2: thing was calculus. 30 00:02:10,116 --> 00:02:10,796 Speaker 1: Until Newton, right. 31 00:02:10,916 --> 00:02:15,236 Speaker 2: Yeah. So once calculus was invented, all of a sudden 32 00:02:15,276 --> 00:02:18,036 Speaker 2: we had a new language, and this new 33 00:02:18,116 --> 00:02:21,276 Speaker 2: kind of mathematics allowed us to really easily describe lots 34 00:02:21,276 --> 00:02:25,036 Speaker 2: of physical phenomena. And so now physics has become this 35 00:02:25,116 --> 00:02:28,676 Speaker 2: thing that's very predictable and well understood. And that's what 36 00:02:28,676 --> 00:02:31,116 Speaker 2: we've been waiting for in biology.
We've been waiting for 37 00:02:31,156 --> 00:02:33,996 Speaker 2: a new tool, a new language, a new mathematics that 38 00:02:34,036 --> 00:02:37,276 Speaker 2: will allow us to understand these complex systems. And that's 39 00:02:37,396 --> 00:02:39,476 Speaker 2: really what I think these new tools are. 40 00:02:39,796 --> 00:02:41,796 Speaker 1: So your hope, your hope is that 41 00:02:41,916 --> 00:02:48,156 Speaker 1: machine learning, generative AI, will do for medicine, for biology, what 42 00:02:48,236 --> 00:02:49,756 Speaker 1: calculus did for physics. 43 00:02:50,236 --> 00:02:53,516 Speaker 2: Exactly. That is big, big. It's exactly what I hope. 44 00:02:53,916 --> 00:02:54,996 Speaker 2: That's exactly what I hope. 45 00:02:55,036 --> 00:02:57,916 Speaker 1: So okay, so this is your hope. You're starting this 46 00:02:58,036 --> 00:03:03,796 Speaker 1: company to test your hypothesis. Uh, what do you do? 47 00:03:04,796 --> 00:03:06,116 Speaker 2: What do you mean? What do I do? What do I 48 00:03:06,156 --> 00:03:08,836 Speaker 2: do on day one? Or like, what are we doing? No? 49 00:03:08,836 --> 00:03:12,276 Speaker 1: No, no. We're back to twenty seventeen. You have this 50 00:03:12,516 --> 00:03:16,036 Speaker 1: big, up-in-the-sky, two-thousand-year, thirty-thousand 51 00:03:16,036 --> 00:03:18,636 Speaker 1: foot idea. But you've got to make a thing that 52 00:03:18,676 --> 00:03:20,876 Speaker 1: somebody is going to pay you for, that will hopefully 53 00:03:20,996 --> 00:03:22,916 Speaker 1: use AI in medicine in some way. So what do 54 00:03:22,956 --> 00:03:23,156 Speaker 1: you do? 55 00:03:23,836 --> 00:03:27,436 Speaker 2: So we didn't know what would work, so we focused 56 00:03:27,516 --> 00:03:33,196 Speaker 2: on two different problems at the time. So one problem is, 57 00:03:33,636 --> 00:03:36,596 Speaker 2: let's imagine we're going to have a bunch of data 58 00:03:36,956 --> 00:03:40,356 Speaker 2: from maybe a big, large collection of patients. We're 59 00:03:40,356 --> 00:03:43,276 Speaker 2: gonna have this data over time, so the symptoms 60 00:03:43,276 --> 00:03:46,996 Speaker 2: that a patient might have every week for a year or 61 00:03:47,036 --> 00:03:49,716 Speaker 2: something like that. And our goal is to be able 62 00:03:49,756 --> 00:03:53,596 Speaker 2: to create a simulator of a patient's future health. So, 63 00:03:53,756 --> 00:03:55,876 Speaker 2: given what I know about a patient in the past, 64 00:03:56,316 --> 00:03:59,196 Speaker 2: can I simulate what will happen to them in the future. 65 00:03:59,916 --> 00:04:03,636 Speaker 1: And presumably that is sort of probabilistic. I mean, what 66 00:04:03,756 --> 00:04:05,476 Speaker 1: we know about health. Like, you can say there's an 67 00:04:05,636 --> 00:04:08,476 Speaker 1: X percent chance that in Y years this person will 68 00:04:08,476 --> 00:04:09,876 Speaker 1: have a heart attack, something like that. 69 00:04:10,236 --> 00:04:13,716 Speaker 2: Exactly. Yeah, we want to, yes, because so many things 70 00:04:13,756 --> 00:04:16,996 Speaker 2: are undetermined, you know. Maybe, yeah, exactly. 71 00:04:16,836 --> 00:04:18,996 Speaker 1: Right, and it's just the nature of the world, right, 72 00:04:19,356 --> 00:04:20,036 Speaker 1: one hundred percent. 73 00:04:20,196 --> 00:04:21,036 Speaker 2: Yeah.
74 00:04:21,556 --> 00:04:24,636 Speaker 1: So okay, so you have this idea of basically where 75 00:04:24,836 --> 00:04:28,036 Speaker 1: ChatGPT, which didn't exist yet, but predicts the next 76 00:04:28,036 --> 00:04:31,316 Speaker 1: word with some probability, you want to predict the next 77 00:04:31,356 --> 00:04:32,476 Speaker 1: health outcome. 78 00:04:32,076 --> 00:04:34,716 Speaker 2: Exactly, that is the big idea. Yeah. So that, 79 00:04:34,916 --> 00:04:36,676 Speaker 2: that was one of them. That was not 80 00:04:36,756 --> 00:04:38,516 Speaker 2: the only one; that was the one that is what 81 00:04:38,516 --> 00:04:40,956 Speaker 2: we do. The one that we didn't do, we 82 00:04:40,956 --> 00:04:43,676 Speaker 2: were interested in as well, potentially. So that's at a very 83 00:04:43,716 --> 00:04:47,756 Speaker 2: macroscopic scale, that's at the scale of the person, whereas 84 00:04:47,796 --> 00:04:49,956 Speaker 2: the other thing we were interested in was potentially could 85 00:04:49,996 --> 00:04:52,116 Speaker 2: we go at the micro scale and look at what's 86 00:04:52,156 --> 00:04:55,356 Speaker 2: happening inside individual cells. We were interested in this at 87 00:04:55,396 --> 00:04:58,036 Speaker 2: the beginning. Basically, the way we figured this out is 88 00:04:58,116 --> 00:05:01,116 Speaker 2: we signed a few deals with pharma companies to try 89 00:05:01,156 --> 00:05:06,356 Speaker 2: these things, and we found that the technology worked 90 00:05:06,396 --> 00:05:11,196 Speaker 2: really well in simulating health outcomes, and it didn't 91 00:05:11,196 --> 00:05:13,676 Speaker 2: work very well when it comes down to simulating what's 92 00:05:13,716 --> 00:05:15,996 Speaker 2: inside the cell. And I think this comes down to data, 93 00:05:16,356 --> 00:05:18,676 Speaker 2: which is that we get a ton of data on 94 00:05:18,916 --> 00:05:21,516 Speaker 2: human health outcomes. Like, literally every time you go to 95 00:05:21,556 --> 00:05:24,436 Speaker 2: the doctor, there's data there on your health outcomes. But 96 00:05:24,476 --> 00:05:28,036 Speaker 2: the data from the things inside the cell, there is 97 00:05:28,076 --> 00:05:31,636 Speaker 2: a lot of it, but it's much more difficult to 98 00:05:31,676 --> 00:05:34,116 Speaker 2: work with. So I think a lot of 99 00:05:34,156 --> 00:05:37,596 Speaker 2: what drove us in this direction is really the focus 100 00:05:37,636 --> 00:05:39,556 Speaker 2: on where we think we have the data to solve 101 00:05:39,636 --> 00:05:40,596 Speaker 2: these kinds of problems. 102 00:05:40,676 --> 00:05:44,476 Speaker 1: So, okay, you go in the direction of simulating health 103 00:05:44,516 --> 00:05:48,756 Speaker 1: outcomes for patients, and in particular, sort of where you 104 00:05:48,796 --> 00:05:52,596 Speaker 1: get to is working with companies that are running clinical trials. 105 00:05:52,596 --> 00:05:54,796 Speaker 1: And I know eventually you get to a point where 106 00:05:54,836 --> 00:05:57,596 Speaker 1: companies can use your model, use your software, to run 107 00:05:57,636 --> 00:06:01,396 Speaker 1: clinical trials with fewer patients. So just tell me about that 108 00:06:02,156 --> 00:06:03,476 Speaker 1: arc. Tell me how you get there. 109 00:06:04,076 --> 00:06:08,076 Speaker 2: Clinical trials are, well, they take forever, and they're 110 00:06:08,076 --> 00:06:10,916 Speaker 2: really really expensive.
Something might take like five years and 111 00:06:10,956 --> 00:06:14,796 Speaker 2: cost one hundred million dollars to run a clinical trial. Yeah. 112 00:06:14,836 --> 00:06:17,996 Speaker 2: And these are hundreds or thousands of patients, right? Oh, 113 00:06:18,036 --> 00:06:21,996 Speaker 2: thousands of patients typically, right. Yeah. And typically half of 114 00:06:22,036 --> 00:06:24,516 Speaker 2: the patients in a clinical trial are receiving a placebo. 115 00:06:25,436 --> 00:06:27,796 Speaker 2: So you're going to randomly assign half to receive an 116 00:06:27,796 --> 00:06:30,476 Speaker 2: experimental treatment, half to receive a placebo. And the reason 117 00:06:30,596 --> 00:06:33,916 Speaker 2: is that every clinical trial is ultimately just doing a comparison. 118 00:06:34,396 --> 00:06:36,956 Speaker 2: You're comparing how a patient responds to the new treatment 119 00:06:36,996 --> 00:06:38,796 Speaker 2: to how they respond if they don't get that treatment. 120 00:06:38,836 --> 00:06:40,716 Speaker 1: And let me just give a shout out to the 121 00:06:40,796 --> 00:06:44,836 Speaker 1: randomized controlled trial as like a really beautiful construct, right, 122 00:06:45,636 --> 00:06:47,956 Speaker 1: not that old? Not that old. I learned that 123 00:06:48,076 --> 00:06:51,636 Speaker 1: preparing for this interview, like less than one hundred years old, amazingly. 124 00:06:52,676 --> 00:06:56,356 Speaker 1: But it's a perfect way to assess, not perfect, it's 125 00:06:56,356 --> 00:06:59,956 Speaker 1: a very very good way to assess causality. It's really elegant. 126 00:07:00,156 --> 00:07:03,076 Speaker 2: It is an elegant idea. But if you're a patient, 127 00:07:04,356 --> 00:07:06,996 Speaker 2: why are you participating in a clinical trial at all? What's 128 00:07:07,036 --> 00:07:09,716 Speaker 2: the number one reason people participate in clinical trials? They 129 00:07:09,716 --> 00:07:12,116 Speaker 2: participate in clinical trials because they want access to this 130 00:07:12,196 --> 00:07:14,796 Speaker 2: experimental treatment that you can't get any other way. That's 131 00:07:14,796 --> 00:07:17,716 Speaker 2: the number one reason why patients are participating in clinical trials. 132 00:07:17,796 --> 00:07:19,156 Speaker 2: Number one. Now, they... 133 00:07:19,076 --> 00:07:21,316 Speaker 1: They don't want to be randomized to the placebo. 134 00:07:21,516 --> 00:07:23,636 Speaker 2: No, no, no, they don't. 135 00:07:23,716 --> 00:07:27,156 Speaker 1: I can certainly understand that. It is the case, right, 136 00:07:27,276 --> 00:07:33,076 Speaker 1: that most trials fail, meaning the drug is not helping 137 00:07:33,076 --> 00:07:36,836 Speaker 1: you and possibly hurting you, meaning on average, you're better 138 00:07:36,916 --> 00:07:39,396 Speaker 1: off being in the placebo arm. Like, that is true, right? 139 00:07:39,396 --> 00:07:42,516 Speaker 2: Yeah, there's a principle of equipoise. But that's an academic, 140 00:07:42,636 --> 00:07:43,956 Speaker 2: ivory tower principle. 141 00:07:44,156 --> 00:07:48,636 Speaker 1: I mean, it also is true. It's just true. That's fine, that's 142 00:07:48,476 --> 00:07:52,716 Speaker 2: fine, but in the end, that's like, in the end, 143 00:07:52,836 --> 00:07:56,956 Speaker 2: patients choose not to participate in clinical trials because they 144 00:07:56,996 --> 00:08:00,516 Speaker 2: don't want to get a placebo.
Patients drop out of 145 00:08:00,596 --> 00:08:03,196 Speaker 2: clinical trials when they think they are getting a placebo. 146 00:08:03,796 --> 00:08:07,436 Speaker 2: Those are also true. Those are the number one reasons those things happen. 147 00:08:07,556 --> 00:08:08,516 Speaker 2: Fair? 148 00:08:08,676 --> 00:08:08,916 Speaker 1: Okay? 149 00:08:08,956 --> 00:08:12,636 Speaker 2: Right? So, and in fact, twenty percent of clinical trials 150 00:08:12,636 --> 00:08:14,996 Speaker 2: fail not because the drug didn't work, but because they 151 00:08:15,036 --> 00:08:19,476 Speaker 2: just couldn't find enough people to participate. Okay. And what 152 00:08:19,516 --> 00:08:23,916 Speaker 2: we realized, though, is that there was a way for 153 00:08:24,076 --> 00:08:29,156 Speaker 2: us not to try to replace the randomized controlled trial, 154 00:08:29,196 --> 00:08:31,716 Speaker 2: but to make it better. And what we are 155 00:08:31,756 --> 00:08:35,916 Speaker 2: doing is we take what we call digital twins 156 00:08:35,636 --> 00:08:38,516 Speaker 2: of the patients, so these are these simulations of 157 00:08:38,596 --> 00:08:42,076 Speaker 2: their future outcomes, and we could incorporate those 158 00:08:42,156 --> 00:08:48,836 Speaker 2: data into RCTs directly, randomized controlled trials. We call 159 00:08:48,876 --> 00:08:51,276 Speaker 2: it kind of like a reimagining of RCTs. 160 00:08:51,396 --> 00:08:54,996 Speaker 2: You're going to have an RCT that is 161 00:08:55,676 --> 00:09:01,156 Speaker 2: more accurate, that requires fewer patients, and as 162 00:09:01,196 --> 00:09:03,356 Speaker 2: a result, you get a lot of the benefits of 163 00:09:03,996 --> 00:09:06,756 Speaker 2: faster trials, of things that are better for the patients. 164 00:09:06,996 --> 00:09:09,756 Speaker 2: We can talk about that in a minute. But you 165 00:09:09,876 --> 00:09:11,476 Speaker 2: keep all of the same scientific rigor. 166 00:09:12,716 --> 00:09:17,356 Speaker 1: So specifically, okay, that's a good, like, big picture. Specifically, 167 00:09:18,596 --> 00:09:19,116 Speaker 1: how does it 168 00:09:19,076 --> 00:09:25,196 Speaker 2: work? Right now, we build one model per disease. So, 169 00:09:25,276 --> 00:09:28,716 Speaker 2: for example, we have a model for patients with Alzheimer's disease. 170 00:09:28,756 --> 00:09:31,236 Speaker 2: We have a separate model for patients with ALS, we 171 00:09:31,276 --> 00:09:33,476 Speaker 2: have a separate model for multiple sclerosis, et cetera. 172 00:09:33,876 --> 00:09:36,436 Speaker 1: Let's pick one model and talk about it. What's the 173 00:09:36,436 --> 00:09:38,516 Speaker 1: one that's farthest along? Which is the one that works 174 00:09:38,516 --> 00:09:38,876 Speaker 1: the best? 175 00:09:39,076 --> 00:09:41,996 Speaker 2: Yeah. So our Alzheimer's disease model, that was our 176 00:09:41,996 --> 00:09:44,916 Speaker 2: first one; we've published scientific papers on it and things 177 00:09:44,956 --> 00:09:47,076 Speaker 2: like this, so that one's our most well known. 178 00:09:47,356 --> 00:09:51,156 Speaker 1: Okay, so you're setting out to build a model that 179 00:09:51,236 --> 00:09:55,196 Speaker 1: will predict what's going to happen, presumably, to a 180 00:09:55,196 --> 00:09:58,196 Speaker 1: patient who has the early stages of Alzheimer's disease. How 181 00:09:58,196 --> 00:10:00,876 Speaker 1: will their disease progress?
A hard thing to know in 182 00:10:00,916 --> 00:10:04,636 Speaker 1: the real world. How do you build that? What do 183 00:10:04,676 --> 00:10:04,956 Speaker 1: you do? 184 00:10:05,796 --> 00:10:07,836 Speaker 2: So the first thing is that you need data to 185 00:10:07,916 --> 00:10:11,956 Speaker 2: learn from. Yeah, it's kind of obvious. So our first 186 00:10:11,996 --> 00:10:14,076 Speaker 2: step was like, oh, we say, okay, we want to 187 00:10:14,116 --> 00:10:16,236 Speaker 2: have data sets where we get a ton of information 188 00:10:16,276 --> 00:10:19,436 Speaker 2: about each patient. What's that mean? That means that at any 189 00:10:19,476 --> 00:10:22,436 Speaker 2: individual time, I want to have a lot of 190 00:10:22,996 --> 00:10:25,276 Speaker 2: different measurements made on that patient at each time. 191 00:10:26,156 --> 00:10:29,116 Speaker 1: So presumably you want to have a lot of moments 192 00:10:29,196 --> 00:10:31,156 Speaker 1: with lots of information, exactly. 193 00:10:31,156 --> 00:10:32,156 Speaker 2: You also want to have lots of... 194 00:10:32,196 --> 00:10:34,516 Speaker 1: Lots of times over a long period of time, over 195 00:10:34,556 --> 00:10:35,276 Speaker 1: a long period. 196 00:10:35,316 --> 00:10:37,316 Speaker 2: Yeah. And so, you know, these are going to be, 197 00:10:37,476 --> 00:10:39,756 Speaker 2: for Alzheimer's, you're looking at a bunch of things related 198 00:10:39,836 --> 00:10:45,356 Speaker 2: to the patient's cognitive performance on different assessments. Also 199 00:10:45,396 --> 00:10:48,236 Speaker 2: there's things about just their daily life. How are they 200 00:10:48,276 --> 00:10:51,076 Speaker 2: able to function in their daily life? There's things related 201 00:10:51,116 --> 00:10:55,996 Speaker 2: to their caregivers, actually, like how does their caregiver rate 202 00:10:56,316 --> 00:11:00,796 Speaker 2: their behavior? Brain imaging, blood tests, all that kind of information. 203 00:11:00,916 --> 00:11:02,796 Speaker 2: You want to have as much of it about each patient, 204 00:11:02,876 --> 00:11:05,276 Speaker 2: you want to have it as many times as possible. Sure. 205 00:11:05,516 --> 00:11:07,276 Speaker 2: And we'll try to get that for, you know, like 206 00:11:07,356 --> 00:11:11,516 Speaker 2: fifty thousand people. And that's the kind of data set 207 00:11:11,596 --> 00:11:13,036 Speaker 2: that we're starting with. 208 00:11:13,396 --> 00:11:16,836 Speaker 1: And like, is there one repository that when you get that, 209 00:11:16,876 --> 00:11:18,276 Speaker 1: you're like, jackpot? Or what? 210 00:11:19,236 --> 00:11:23,316 Speaker 2: No, we have to aggregate data from lots and 211 00:11:23,356 --> 00:11:25,156 Speaker 2: lots of different places to be able to build a 212 00:11:25,156 --> 00:11:25,996 Speaker 2: big enough data set. 213 00:11:26,916 --> 00:11:29,196 Speaker 1: Okay, so now you've got the data. What do you 214 00:11:29,236 --> 00:11:29,676 Speaker 1: do next? 215 00:11:30,436 --> 00:11:33,716 Speaker 2: Then we've got to train a model to 216 00:11:33,836 --> 00:11:37,036 Speaker 2: be able to learn from those data how to simulate things. 217 00:11:37,396 --> 00:11:38,676 Speaker 2: And that's actually what we do. 218 00:11:38,876 --> 00:11:42,556 Speaker 1: In particular, in this case, how to predict, given some 219 00:11:42,596 --> 00:11:45,036 Speaker 1: set of inputs for a patient, what's going to happen 220 00:11:45,076 --> 00:11:46,276 Speaker 1: next? Exactly.
221 00:11:46,316 --> 00:11:48,956 Speaker 2: And so this does look like that. You were using that analogy 222 00:11:49,036 --> 00:11:52,036 Speaker 2: of, like, a language model predicts the next word. So, 223 00:11:52,436 --> 00:11:55,036 Speaker 2: given these words I've seen before, predict the next word. 224 00:11:55,316 --> 00:11:57,676 Speaker 2: And that is similar to how our models in 225 00:11:57,716 --> 00:11:59,956 Speaker 2: these diseases work. So we're going to say, given I've 226 00:11:59,996 --> 00:12:02,956 Speaker 2: observed these things in the past about a patient, what 227 00:12:03,036 --> 00:12:06,876 Speaker 2: will happen to them next? That is very analogous 228 00:12:06,916 --> 00:12:07,876 Speaker 2: to kind of what we're doing. 229 00:12:08,516 --> 00:12:11,276 Speaker 1: Okay, so you build the model. How does it work? 230 00:12:11,276 --> 00:12:14,636 Speaker 1: How does it work in a clinical trial, specifically, so 231 00:12:14,676 --> 00:12:17,236 Speaker 1: that, you know, the people running the trial can 232 00:12:17,316 --> 00:12:18,756 Speaker 1: do it with fewer patients? 233 00:12:18,996 --> 00:12:25,836 Speaker 2: Sure. So in a typical case, we're involved at the 234 00:12:25,876 --> 00:12:29,916 Speaker 2: beginning of the clinical trial in the design of the protocol. Okay. 235 00:12:30,316 --> 00:12:34,556 Speaker 2: So there's a question of how many patients should you 236 00:12:34,716 --> 00:12:37,716 Speaker 2: randomize to your control group, how many patients do you 237 00:12:37,756 --> 00:12:40,076 Speaker 2: need overall, and how many should be in the treatment, 238 00:12:40,076 --> 00:12:40,716 Speaker 2: how many should be in 239 00:12:40,716 --> 00:12:42,876 Speaker 1: the control. It's not always fifty-fifty. 240 00:12:43,196 --> 00:12:46,316 Speaker 2: It's not always fifty-fifty. In our studies, our 241 00:12:46,396 --> 00:12:49,356 Speaker 2: typical goal is to try to minimize the number of 242 00:12:49,396 --> 00:12:51,876 Speaker 2: people that you need to put in the control group. Okay. 243 00:12:52,996 --> 00:12:56,156 Speaker 2: And so we're involved in helping to do that 244 00:12:56,756 --> 00:12:58,796 Speaker 2: calculation, to say, here's how big your trial should be. 245 00:12:58,996 --> 00:13:03,476 Speaker 2: And so then, as patients enroll in the study, we 246 00:13:03,556 --> 00:13:08,436 Speaker 2: take data from their first visit, before they receive whatever 247 00:13:08,596 --> 00:13:12,956 Speaker 2: new treatment they're going to receive, and we take those data, 248 00:13:13,076 --> 00:13:15,796 Speaker 2: we input them into our pre-trained model. So I 249 00:13:15,916 --> 00:13:17,876 Speaker 2: like to think about, you know, ChatGPT: you give it a 250 00:13:17,916 --> 00:13:20,476 Speaker 2: prompt and it gives you output. Same thing. We take 251 00:13:20,516 --> 00:13:22,716 Speaker 2: the data from the patient, we prompt the model, and 252 00:13:22,756 --> 00:13:25,276 Speaker 2: it outputs its predictions for what will happen. 253 00:13:24,996 --> 00:13:26,596 Speaker 1: And to be clear, you do that for 254 00:13:26,716 --> 00:13:28,916 Speaker 1: all of the patients in both arms, the treatment and the control? 255 00:13:29,476 --> 00:13:32,356 Speaker 2: Yes, yeah, and we don't know, right, it's blinded. 256 00:13:32,356 --> 00:13:35,716 Speaker 2: It's blinded to us. We don't know which is which. Yeah.
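To make that prompting analogy concrete, here is a minimal, self-contained Python sketch. Everything in it is invented for illustration: the class names, the single cognitive score, and the toy random-walk dynamics stand in for a model actually trained on historical patient records. This is not Unlearn's software or API, just the shape of the idea.

```python
import random
from dataclasses import dataclass

@dataclass
class Visit:
    month: int
    cognitive_score: float  # e.g., score on a cognitive assessment

class ToyDiseaseModel:
    """Toy stand-in for a model pretrained on historical patient records."""
    def simulate(self, baseline, months, n_samples):
        trajectories = []
        for _ in range(n_samples):
            score, path = baseline.cognitive_score, []
            for m in range(1, months + 1):
                # Toy dynamics: average decline plus noise, standing in for
                # a learned autoregressive model (the "next visit" analogue
                # of a language model predicting the next word).
                score += random.gauss(-0.15, 0.4)
                path.append(Visit(month=m, cognitive_score=score))
            trajectories.append(path)
        return trajectories

# "Prompt" with the first, pre-treatment visit; sample many possible futures.
# The spread across samples captures the stochasticity discussed earlier.
model = ToyDiseaseModel()
futures = model.simulate(Visit(month=0, cognitive_score=24.0), months=18, n_samples=500)
final_scores = [t[-1].cognitive_score for t in futures]
print(f"mean simulated 18-month score: {sum(final_scores) / len(final_scores):.1f}")
```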
257 00:13:35,756 --> 00:13:37,356 Speaker 2: So we do that for one hundred percent of the 258 00:13:37,396 --> 00:13:42,156 Speaker 2: patients, and then we give those data to the customer, 259 00:13:42,756 --> 00:13:44,116 Speaker 2: to the pharma company. 260 00:13:44,316 --> 00:13:46,796 Speaker 1: So then what happens next? What happens next? 261 00:13:46,916 --> 00:13:50,076 Speaker 2: We wait around for a while. Yeah. And then, when 262 00:13:50,116 --> 00:13:53,196 Speaker 2: the study is actually completed, right, and they 263 00:13:53,196 --> 00:13:57,596 Speaker 2: do unblind the data, we help to 264 00:13:57,956 --> 00:14:01,356 Speaker 2: say, here's how you now can incorporate these predicted 265 00:14:01,396 --> 00:14:03,396 Speaker 2: outcomes into the analysis. 266 00:14:02,956 --> 00:14:04,836 Speaker 1: So this is it. Now we're at 267 00:14:04,836 --> 00:14:07,716 Speaker 1: the moment when the thing you have built is useful. 268 00:14:07,796 --> 00:14:11,476 Speaker 1: So now they have done the study, 269 00:14:11,836 --> 00:14:14,796 Speaker 1: they have the outcomes for the real human beings, and 270 00:14:14,836 --> 00:14:17,956 Speaker 1: they have the predicted outcomes from your model. How is 271 00:14:17,996 --> 00:14:19,716 Speaker 1: your system, how's your model, useful? 272 00:14:20,396 --> 00:14:22,996 Speaker 2: So the very first thing that we're basically going to 273 00:14:22,996 --> 00:14:24,156 Speaker 2: do, is what I'm going to say, we're going to 274 00:14:24,196 --> 00:14:28,956 Speaker 2: recalibrate our model. Recalibrate. You're going to figure out 275 00:14:29,036 --> 00:14:33,236 Speaker 2: a relationship between your predicted outcomes and your observed outcomes 276 00:14:33,276 --> 00:14:36,796 Speaker 2: for the patients who really received the placebo. 277 00:14:36,876 --> 00:14:39,116 Speaker 1: For the patients in the placebo group. And basically you're going 278 00:14:39,156 --> 00:14:40,436 Speaker 1: to see how you did. How did we do? 279 00:14:40,876 --> 00:14:43,436 Speaker 2: Yes, and in particular, you're going to find out, not just, 280 00:14:43,836 --> 00:14:45,716 Speaker 2: it's not like a measure of was it good or bad, 281 00:14:45,756 --> 00:14:47,956 Speaker 2: you're going to find out exactly how they are related. 282 00:14:48,916 --> 00:14:53,076 Speaker 2: And then you can take that information and adjust your predictions, 283 00:14:53,636 --> 00:14:57,676 Speaker 2: okay, for everybody. So you can say, let's imagine that 284 00:14:57,956 --> 00:15:03,156 Speaker 2: I find out, well, on average, I'm underestimating how 285 00:15:03,236 --> 00:15:05,436 Speaker 2: much a patient would progress by one point per year. 286 00:15:05,476 --> 00:15:08,236 Speaker 2: I'm on average underestimating it. Well, then I'll go through 287 00:15:08,236 --> 00:15:09,836 Speaker 2: and I'll take my prediction and I'll be like, well, 288 00:15:10,476 --> 00:15:13,516 Speaker 2: add one point, add one point for everyone.
And then 289 00:15:13,916 --> 00:15:15,876 Speaker 2: now you have said, okay, well, now I've taken the 290 00:15:15,916 --> 00:15:18,236 Speaker 2: model and I've been able to do it in such 291 00:15:18,236 --> 00:15:21,036 Speaker 2: a way where I've fixed these mistakes by looking at 292 00:15:21,076 --> 00:15:23,556 Speaker 2: the actual patients who got placebo. And now I'm 293 00:15:23,596 --> 00:15:25,596 Speaker 2: going to apply that model to the patients in the 294 00:15:25,636 --> 00:15:28,996 Speaker 2: treatment group. And now I 295 00:15:29,156 --> 00:15:31,676 Speaker 2: just look at that difference between the patients in the 296 00:15:31,676 --> 00:15:33,636 Speaker 2: treatment group and their predictions from the model, and I 297 00:15:33,676 --> 00:15:36,156 Speaker 2: average that, and I get an estimate for the treatment effect. 298 00:15:36,596 --> 00:15:39,996 Speaker 2: Now, I described that as a two-stage procedure, but 299 00:15:40,236 --> 00:15:43,236 Speaker 2: it's not actually a two-stage procedure. It's one mathematical 300 00:15:43,236 --> 00:15:47,796 Speaker 2: analysis that you do. But the thing that's really, 301 00:15:48,316 --> 00:15:53,036 Speaker 2: I think, quite amazing, actually, is that this has a 302 00:15:53,596 --> 00:15:57,916 Speaker 2: bunch of mathematical guarantees to it. We can actually prove 303 00:15:58,956 --> 00:16:01,596 Speaker 2: that the estimate that you get for how effective the 304 00:16:01,636 --> 00:16:06,236 Speaker 2: treatment is is still unbiased. So it's not an overestimate, 305 00:16:06,236 --> 00:16:09,836 Speaker 2: it's not an underestimate, it's on average correct. We can prove 306 00:16:10,076 --> 00:16:12,636 Speaker 2: that if you compute a P value from the analysis, 307 00:16:12,636 --> 00:16:15,236 Speaker 2: like you would typically do, that it has exactly the 308 00:16:15,316 --> 00:16:17,596 Speaker 2: right properties as it does out of a regular RCT. 309 00:16:17,756 --> 00:16:20,516 Speaker 1: P value is roughly the probability that the finding was 310 00:16:20,516 --> 00:16:20,916 Speaker 1: a fluke. 311 00:16:22,156 --> 00:16:25,756 Speaker 2: Yeah, right. Yeah. If you compute an error bar, the error bar 312 00:16:25,876 --> 00:16:27,996 Speaker 2: you get from our analysis and the error bar you would 313 00:16:27,996 --> 00:16:31,916 Speaker 2: get from a normal trial, they all have exactly identical statistics. 314 00:16:31,956 --> 00:16:35,476 Speaker 1: This is not intuitive, but you're saying the mathematical 315 00:16:35,596 --> 00:16:39,076 Speaker 1: fact is that it works. Yes. And just to be clear, 316 00:16:40,036 --> 00:16:42,716 Speaker 1: what this allows you, or the people running the trial, 317 00:16:42,836 --> 00:16:46,796 Speaker 1: to do is to enroll fewer people in the placebo 318 00:16:46,916 --> 00:16:49,636 Speaker 1: arm, not none, but fewer than they otherwise would have, 319 00:16:49,716 --> 00:16:52,236 Speaker 1: to get the same amount of statistical power. Right, 320 00:16:52,316 --> 00:16:55,076 Speaker 1: that is the bottom line thing that you are delivering. Yes, 321 00:16:55,156 --> 00:16:57,956 Speaker 1: that's correct. And it's something like a quarter or a 322 00:16:58,076 --> 00:17:00,036 Speaker 1: third less, is that right? Yeah? 323 00:17:00,156 --> 00:17:03,956 Speaker 2: So it depends on how accurate our models are.
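Here is a toy numerical sketch of the recalibrate-then-compare logic just described. All the data and numbers are invented, and, as Fisher notes, the real method is a single joint analysis with proven statistical guarantees rather than the literal two steps shown; this is only to make the arithmetic concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: model-predicted vs. observed one-year disease progression.
pred_placebo = rng.normal(3.0, 1.0, size=50)               # digital-twin predictions
obs_placebo = pred_placebo + 1.0 + rng.normal(0, 0.5, 50)  # model underestimates by ~1 point

# "Recalibrate": fit the relationship between predicted and observed
# outcomes using only the patients who really received the placebo.
slope, intercept = np.polyfit(pred_placebo, obs_placebo, deg=1)

# Use the calibrated predictions as counterfactuals for the treated arm,
# then average observed-minus-predicted to estimate the treatment effect.
true_effect = -1.5                                         # invented: drug slows progression
pred_treated = rng.normal(3.0, 1.0, size=100)
obs_treated = pred_treated + 1.0 + true_effect + rng.normal(0, 0.5, 100)

counterfactual = slope * pred_treated + intercept
print(f"estimated treatment effect: {np.mean(obs_treated - counterfactual):.2f}")  # near -1.5
```

And a back-of-envelope version of how model accuracy translates into a smaller control arm, which comes up in the next exchange: under simplified assumptions, adjusting for a prognostic prediction that correlates with the outcome at rho shrinks the residual outcome variance by a factor of (1 - rho squared), which is roughly where the control-arm savings come from. The exact trial-design calculation is more involved than this.

```python
def approx_control_arm_savings(rho: float) -> float:
    """Approximate fractional reduction in control-arm size: residual
    variance scales like (1 - rho**2) under these toy assumptions."""
    return rho ** 2

for rho in (0.5, 0.6, 0.7):
    print(f"prediction-outcome correlation {rho:.1f} -> "
          f"~{100 * approx_control_arm_savings(rho):.0f}% fewer control patients")
# ~25% at rho = 0.5 and ~49% at rho = 0.7: roughly the quarter-to-half
# range discussed here, under these simplified assumptions.
```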
The 324 00:17:04,076 --> 00:17:06,516 Speaker 2: more accurate the model is, the fewer patients you need 325 00:17:06,556 --> 00:17:10,676 Speaker 2: in your placebo group. Sure. So typically, right now, it's 326 00:17:10,796 --> 00:17:13,716 Speaker 2: somewhere between, like, a quarter and fifty percent. It depends 327 00:17:13,836 --> 00:17:15,796 Speaker 2: on the specific details. 328 00:17:15,956 --> 00:17:19,236 Speaker 1: So tell me, what is the effect of that at 329 00:17:19,236 --> 00:17:21,476 Speaker 1: a macro scale? What does it mean to say a 330 00:17:21,596 --> 00:17:26,036 Speaker 1: drug company can get the same statistical power by enrolling 331 00:17:26,156 --> 00:17:30,196 Speaker 1: twenty five percent fewer people in their study, specifically in 332 00:17:30,276 --> 00:17:30,876 Speaker 1: the placebo arm? 333 00:17:31,916 --> 00:17:34,476 Speaker 2: Well, I think that there are two things. First is, 334 00:17:35,356 --> 00:17:39,316 Speaker 2: I think people don't always understand how expensive clinical trials 335 00:17:39,356 --> 00:17:43,116 Speaker 2: are. You know, companies are paying one hundred, sometimes two 336 00:17:43,236 --> 00:17:46,036 Speaker 2: hundred thousand dollars per patient in one of their clinical trials. 337 00:17:46,116 --> 00:17:49,196 Speaker 2: So finding and enrolling and monitoring a patient for all 338 00:17:49,236 --> 00:17:52,116 Speaker 2: that time is very, very expensive. It also just takes 339 00:17:52,116 --> 00:17:54,556 Speaker 2: a long time to find people who are willing to participate. 340 00:17:55,316 --> 00:17:58,036 Speaker 2: And so if you're talking about a large phase three trial, 341 00:17:58,276 --> 00:18:01,996 Speaker 2: reducing the size of the control group by twenty five percent, 342 00:18:02,076 --> 00:18:04,156 Speaker 2: that might mean like one hundred fewer patients that you 343 00:18:04,236 --> 00:18:06,916 Speaker 2: need to actually recruit and enroll in your study, and 344 00:18:07,276 --> 00:18:09,516 Speaker 2: that could be like a year. You know, 345 00:18:09,676 --> 00:18:11,996 Speaker 2: so you can save six months to a year off 346 00:18:12,036 --> 00:18:15,396 Speaker 2: of your total clinical trial timeline. That means a lot, right, 347 00:18:16,116 --> 00:18:19,436 Speaker 2: both for patients, if the drug is actually successful, 348 00:18:19,876 --> 00:18:24,636 Speaker 2: that's a year faster it gets to market, and, you know, 349 00:18:24,716 --> 00:18:27,276 Speaker 2: for the pharma company, that's obviously a big value proposition, 350 00:18:27,396 --> 00:18:29,116 Speaker 2: being able to get the drug to market a year faster. 351 00:18:35,716 --> 00:18:39,836 Speaker 1: In a minute, moving from clinical trials to individual patients. 352 00:18:47,396 --> 00:18:53,236 Speaker 1: Now back to the show. What's the 353 00:18:53,276 --> 00:18:55,076 Speaker 1: big picture? Where are you trying to get to, 354 00:18:55,516 --> 00:19:01,076 Speaker 1: you know, in the medium term and in the long term? 355 00:19:02,476 --> 00:19:06,796 Speaker 2: So, the ability to understand what a person's health outcome is 356 00:19:06,836 --> 00:19:09,356 Speaker 2: going to be under different scenarios, this is, I think, 357 00:19:09,396 --> 00:19:12,396 Speaker 2: what's really important. It's not just, hey, given that 358 00:19:12,436 --> 00:19:14,396 Speaker 2: they would get a placebo, what's going to happen to 359 00:19:14,436 --> 00:19:16,636 Speaker 2: their health outcomes?
That's nice for clinical trials, but we 360 00:19:16,716 --> 00:19:19,476 Speaker 2: want to know, hey, there's ten different treatment options for 361 00:19:19,556 --> 00:19:22,116 Speaker 2: this patient, and if I were to give them each 362 00:19:22,156 --> 00:19:24,436 Speaker 2: one of these different treatment options, what would their health 363 00:19:24,476 --> 00:19:26,356 Speaker 2: outcomes look like in those different scenarios? 364 00:19:27,276 --> 00:19:30,036 Speaker 1: So there you're also moving out of the clinical trial 365 00:19:30,596 --> 00:19:32,876 Speaker 1: into the realm of, like, a doctor seeing a patient. 366 00:19:32,996 --> 00:19:35,756 Speaker 1: Let's just be very clear, like, that's a huge leap, 367 00:19:36,076 --> 00:19:37,556 Speaker 1: and, like, that's what you're talking about. 368 00:19:37,796 --> 00:19:42,556 Speaker 2: I think that there's a really good pathway to being 369 00:19:42,676 --> 00:19:47,596 Speaker 2: able to build these things and make them useful for 370 00:19:47,996 --> 00:19:50,196 Speaker 2: problems that are at the individual patient level. 371 00:19:50,396 --> 00:19:52,516 Speaker 1: And is the narrow way to think about it, like, 372 00:19:53,236 --> 00:19:56,876 Speaker 1: before you get to the magical computer that can predict 373 00:19:56,916 --> 00:19:59,316 Speaker 1: everything for everybody, that you get to a very very 374 00:19:59,396 --> 00:20:04,196 Speaker 1: good model that can predict for individuals in certain circumstances 375 00:20:04,236 --> 00:20:06,116 Speaker 1: a certain set of outcomes? So, for example, you might 376 00:20:06,156 --> 00:20:09,636 Speaker 1: have a very very good Alzheimer's model for certain patients 377 00:20:10,156 --> 00:20:12,996 Speaker 1: at a certain stage of disease. This model is very 378 00:20:13,156 --> 00:20:15,396 Speaker 1: powerful at the level of the individual. Is that the 379 00:20:15,436 --> 00:20:18,036 Speaker 1: way to think about it? 380 00:20:18,036 --> 00:20:19,956 Speaker 2: Yeah, I'll tell you the way I think about it. I think that the 381 00:20:20,076 --> 00:20:23,316 Speaker 2: most important thing that models can do, which actually things 382 00:20:23,396 --> 00:20:26,716 Speaker 2: like ChatGPT are not good at, is that 383 00:20:26,796 --> 00:20:33,476 Speaker 2: they can give you really well calibrated estimates of their 384 00:20:33,556 --> 00:20:37,956 Speaker 2: own confidence. That's the most important thing that a model 385 00:20:37,996 --> 00:20:43,196 Speaker 2: can do, because, like we said earlier, health is stochastic. 386 00:20:43,556 --> 00:20:49,356 Speaker 2: There are all kinds of things that happen; it's fundamental, exactly right. 387 00:20:50,356 --> 00:20:52,796 Speaker 2: And so, you know, we're going to make a prediction 388 00:20:53,036 --> 00:20:56,196 Speaker 2: about somebody in the future, and sometimes we're going to 389 00:20:56,196 --> 00:20:58,636 Speaker 2: be really confident in that prediction, and then it's actionable, 390 00:20:59,836 --> 00:21:02,836 Speaker 2: but sometimes you're not. You're not confident, and 391 00:21:02,956 --> 00:21:06,356 Speaker 2: maybe it's not actionable because you're really unconfident. And 392 00:21:06,476 --> 00:21:08,396 Speaker 2: we're never going to get to the point that it's 393 00:21:08,396 --> 00:21:10,396 Speaker 2: going to say, hey, you're going to have a heart 394 00:21:10,436 --> 00:21:15,196 Speaker 2: attack on July seventeenth of twenty thirty-seven. It's like,
395 00:21:15,236 --> 00:21:17,636 Speaker 2: it's never going to be that detailed. But the 396 00:21:17,876 --> 00:21:21,996 Speaker 2: key question is, can you believe the model's estimates of 397 00:21:22,076 --> 00:21:25,036 Speaker 2: its own confidence? And if you can, then, when 398 00:21:25,076 --> 00:21:27,396 Speaker 2: it is confident, you can act on it, and when 399 00:21:27,436 --> 00:21:29,756 Speaker 2: it's not confident, you can do other things. And so 400 00:21:29,836 --> 00:21:32,756 Speaker 2: it's actually a really key technical thing, 401 00:21:32,836 --> 00:21:34,156 Speaker 2: and we know that's what we need to work on. 402 00:21:34,796 --> 00:21:36,836 Speaker 1: If I were going to anthropomorphize it, I'd be like, 403 00:21:36,876 --> 00:21:38,956 Speaker 1: it's like a, it's like a humility. It's like an 404 00:21:38,956 --> 00:21:41,436 Speaker 1: epistemic humility. Like, it knows what it doesn't know. 405 00:21:41,716 --> 00:21:43,436 Speaker 2: It knows what it doesn't know, and it will tell 406 00:21:43,476 --> 00:21:48,916 Speaker 2: you, like, yeah, here's my prediction, but... yeah, exactly. 407 00:21:49,276 --> 00:21:50,996 Speaker 2: So if you can get it to that point where, 408 00:21:51,036 --> 00:21:54,396 Speaker 2: where it's well calibrated that way, then they 409 00:21:54,476 --> 00:21:58,476 Speaker 2: become really really useful for a whole bunch of things. 410 00:21:59,116 --> 00:22:00,036 Speaker 2: And it's not going to say... 411 00:21:59,916 --> 00:22:03,076 Speaker 1: They become really useful if they can have a relatively high 412 00:22:03,116 --> 00:22:06,236 Speaker 1: degree of certainty about at least some things, right? Yeah, just... 413 00:22:06,276 --> 00:22:11,236 Speaker 2: Like, yeah, it's not very coarse, yeah, but exactly so. 414 00:22:11,996 --> 00:22:15,596 Speaker 2: I think that that's the most important thing for these 415 00:22:16,116 --> 00:22:19,556 Speaker 2: applications of AI in medicine: to have models that 416 00:22:19,636 --> 00:22:21,556 Speaker 2: are going to be able to do that effectively.
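One standard way to test the "epistemic humility" being described is a coverage check: if the model claims 80 percent confidence intervals, real outcomes should land inside them about 80 percent of the time. Here is a toy sketch with invented data; this is a generic calibration check, not a description of Unlearn's internal tooling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: the model gives each patient a point prediction plus a
# claimed uncertainty; we then observe the real outcome.
n = 1000
mu = rng.normal(0, 1, n)           # model's point predictions
claimed_sigma = 1.0                # model's stated uncertainty
low = mu - 1.28 * claimed_sigma    # nominal 80% interval (z of about 1.28)
high = mu + 1.28 * claimed_sigma
outcomes = mu + rng.normal(0, 1.0, n)  # here, reality matches the claim

coverage = np.mean((outcomes >= low) & (outcomes <= high))
print(f"empirical coverage of nominal 80% intervals: {coverage:.1%}")
# Well calibrated: prints roughly 80%. If real outcomes were noisier than
# the model claims, coverage would fall short, and the model's confidence
# could not be acted on.
```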
417 00:22:22,716 --> 00:22:25,596 Speaker 1: If everything goes well, what problem will you be trying 418 00:22:25,636 --> 00:22:26,836 Speaker 1: to solve in five years? 419 00:22:28,196 --> 00:22:30,636 Speaker 2: In five years, I hope that we are rolling out 420 00:22:31,756 --> 00:22:35,596 Speaker 2: something that is a model for everything. That's what we 421 00:22:35,676 --> 00:22:37,596 Speaker 2: want to be rolling out, not this one disease at 422 00:22:37,636 --> 00:22:39,756 Speaker 2: a time thing, but one model for all disease. And 423 00:22:39,876 --> 00:22:42,916 Speaker 2: the reason why I really want to do this is 424 00:22:43,076 --> 00:22:45,996 Speaker 2: because if it's one model per disease, I need a 425 00:22:46,116 --> 00:22:48,916 Speaker 2: ton of data on that disease, a ton. So we 426 00:22:48,996 --> 00:22:50,956 Speaker 2: can work on these areas like Alzheimer's, where I can 427 00:22:50,956 --> 00:22:53,636 Speaker 2: get data from fifty thousand patients. But how do I 428 00:22:53,756 --> 00:22:56,676 Speaker 2: work on the disease where I have fifty patients, fifty 429 00:22:56,716 --> 00:22:58,756 Speaker 2: patients in the world, who have this rare disease? Those 430 00:22:58,756 --> 00:23:01,996 Speaker 2: are really really important things. And the only way that 431 00:23:02,076 --> 00:23:03,556 Speaker 2: we're going to be able to do that is to 432 00:23:03,676 --> 00:23:06,716 Speaker 2: unlock a new kind of capability in our models, to 433 00:23:06,876 --> 00:23:11,036 Speaker 2: learn from a handful of examples. And so this is, 434 00:23:11,356 --> 00:23:14,916 Speaker 2: this is, to me, the next frontier for our work: 435 00:23:15,596 --> 00:23:18,116 Speaker 2: figuring out how we can do that and then 436 00:23:18,316 --> 00:23:21,276 Speaker 2: bring that to market, because it opens up the ability 437 00:23:21,396 --> 00:23:24,716 Speaker 2: to work on rare diseases that are really really important 438 00:23:24,756 --> 00:23:29,236 Speaker 2: but very difficult to develop drugs for. And 439 00:23:29,316 --> 00:23:31,636 Speaker 2: again, you know, as a scientist, I'm drawn 440 00:23:31,676 --> 00:23:34,076 Speaker 2: to the technical challenges. Those are the things that... 441 00:23:34,236 --> 00:23:36,876 Speaker 1: It seems so hard, right? I mean, it seems like 442 00:23:37,516 --> 00:23:42,756 Speaker 1: this really basic insight about generative models is that, like, 443 00:23:44,276 --> 00:23:47,196 Speaker 1: you feed them gigantic amounts of data. You know, for a language model, 444 00:23:47,236 --> 00:23:48,996 Speaker 1: you feed it the whole internet; that's the way to 445 00:23:49,076 --> 00:23:52,716 Speaker 1: get it to understand how language works. And so how, 446 00:23:53,116 --> 00:23:55,116 Speaker 1: how can you do something for fifty people? 447 00:23:55,316 --> 00:23:55,356 Speaker 2: Like? 448 00:23:55,836 --> 00:23:57,956 Speaker 1: How, how do you do that in five years? 449 00:23:58,436 --> 00:24:02,076 Speaker 2: Yeah, it's really hard. But the analogy is actually perfect. Okay, 450 00:24:02,596 --> 00:24:05,036 Speaker 2: what we've learned is that 451 00:24:05,156 --> 00:24:08,156 Speaker 2: if you want to build a really amazing language model 452 00:24:08,716 --> 00:24:13,076 Speaker 2: that's really specific to some domain, so you only want 453 00:24:13,116 --> 00:24:16,396 Speaker 2: a language model that's really good at biophysics, it knows 454 00:24:16,476 --> 00:24:19,636 Speaker 2: biophysics really well, would you be better off training a 455 00:24:19,716 --> 00:24:22,116 Speaker 2: model trying to find as much biophysics as you can 456 00:24:22,196 --> 00:24:24,556 Speaker 2: and training it on that, or just training a model 457 00:24:24,556 --> 00:24:27,276 Speaker 2: on the entire internet? And what we've learned is it's much 458 00:24:27,316 --> 00:24:29,676 Speaker 2: better to train a model on the entire internet, that 459 00:24:29,756 --> 00:24:33,996 Speaker 2: there's a lot of things that transfer from one domain 460 00:24:34,156 --> 00:24:36,876 Speaker 2: to another. And so what we can do now is 461 00:24:36,916 --> 00:24:38,836 Speaker 2: say we train the model on the whole Internet, and 462 00:24:38,916 --> 00:24:42,436 Speaker 2: we have one biophysics paper, and we give it that 463 00:24:42,596 --> 00:24:46,276 Speaker 2: one or two papers on the background of all of 464 00:24:46,396 --> 00:24:49,156 Speaker 2: the knowledge from everywhere else, and that's much better than 465 00:24:49,196 --> 00:24:51,636 Speaker 2: trying to get lots and lots of biophysics papers. So 466 00:24:51,716 --> 00:24:54,956 Speaker 2: the analogy works perfectly, in the exact same direction. That's 467 00:24:54,996 --> 00:24:56,516 Speaker 2: the whole point.
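A schematic sketch of the "pretrain on everything, then learn a rare disease from a handful of examples" idea in that analogy: a frozen feature extractor stands in for broad pretraining, and only a tiny regression head is fit on the fifty rare-disease patients. The shapes, names, and data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for representations learned during broad pretraining; in the
# analogy, this is the "whole internet" knowledge that transfers across domains.
W_pretrained = rng.normal(0, 1, (10, 32))

def pretrained_features(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_pretrained)  # frozen: not updated on the rare disease

# The rare-disease dataset: fifty patients, ten measurements each (invented).
X_rare = rng.normal(0, 1, (50, 10))
y_rare = rng.normal(0, 1, 50)

# Fit only a small ridge-regression "head" on the frozen features; with so
# few patients, adapting a pretrained model like this is far more
# data-efficient than training a full model from scratch.
feats = pretrained_features(X_rare)
lam = 1.0
head = np.linalg.solve(feats.T @ feats + lam * np.eye(32), feats.T @ y_rare)
predictions = feats @ head
```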
We want to be able to take 468 00:24:56,676 --> 00:24:59,636 Speaker 2: all of the world's... Imagine taking a model that has 469 00:24:59,876 --> 00:25:02,916 Speaker 2: all of the world's health data and putting all of 470 00:25:03,036 --> 00:25:05,516 Speaker 2: that into one, so it's seen everything and it can 471 00:25:05,556 --> 00:25:08,116 Speaker 2: now draw analogies, because there's a lot of things. 472 00:25:08,196 --> 00:25:11,076 Speaker 2: You think about, like, Parkinson's and Alzheimer's: they have a 473 00:25:11,116 --> 00:25:14,956 Speaker 2: lot of similarities. Huntington's, a lot of similarities. So why 474 00:25:14,996 --> 00:25:18,276 Speaker 2: aren't we drawing kind of information or knowledge from one 475 00:25:18,356 --> 00:25:20,876 Speaker 2: disease area and using it to inform another, because they 476 00:25:20,916 --> 00:25:24,236 Speaker 2: are similar? And so allowing a model to have 477 00:25:24,476 --> 00:25:26,716 Speaker 2: access to all of the data and figure out how 478 00:25:26,756 --> 00:25:28,796 Speaker 2: to do it, I think, is the right path forward. 479 00:25:29,596 --> 00:25:30,156 Speaker 2: So is that... 480 00:25:32,676 --> 00:25:35,996 Speaker 1: Wildly capital intensive? Like, what do you actually do to 481 00:25:36,116 --> 00:25:38,036 Speaker 1: do that? You just get all the health data about 482 00:25:38,076 --> 00:25:40,236 Speaker 1: all the people you can and say to the machine, 483 00:25:40,316 --> 00:25:41,996 Speaker 1: figure it out? Like, what do you do? 484 00:25:43,516 --> 00:25:47,356 Speaker 2: Yeah, yes. I mean, the first step for us is 485 00:25:48,236 --> 00:25:50,916 Speaker 2: you need to get a lot of data. The biggest 486 00:25:50,996 --> 00:25:53,236 Speaker 2: thing is that we need to figure out a way 487 00:25:54,596 --> 00:25:58,676 Speaker 2: to have the model map all of those data to 488 00:25:58,796 --> 00:25:59,916 Speaker 2: the same representation. 489 00:26:00,476 --> 00:26:02,556 Speaker 1: What does that mean, map all of those data to 490 00:26:02,636 --> 00:26:03,596 Speaker 1: the same representation? 491 00:26:04,276 --> 00:26:09,996 Speaker 2: So let's imagine that there is some unobservable state of 492 00:26:10,076 --> 00:26:13,596 Speaker 2: a person which just describes their health. We can't actually 493 00:26:13,636 --> 00:26:16,516 Speaker 2: observe it directly. We don't exactly know what it is, 494 00:26:16,916 --> 00:26:19,236 Speaker 2: but we can make these measurements of it that tell 495 00:26:19,356 --> 00:26:23,356 Speaker 2: us something about that underlying state. So I can measure BMI, 496 00:26:23,476 --> 00:26:25,436 Speaker 2: I can measure heart rate, I can 497 00:26:25,556 --> 00:26:27,796 Speaker 2: measure all of these different things. And what 498 00:26:27,956 --> 00:26:29,956 Speaker 2: we want to be able to do is, instead of 499 00:26:30,156 --> 00:26:32,076 Speaker 2: working in the world of measurements, which is where we 500 00:26:32,156 --> 00:26:34,396 Speaker 2: currently work, we want to be able to work at 501 00:26:34,436 --> 00:26:37,756 Speaker 2: that underlying unobservable state, because if you can, if you 502 00:26:37,796 --> 00:26:39,916 Speaker 2: can see that, if you could reach through into that 503 00:26:40,156 --> 00:26:43,236 Speaker 2: underlying state, you can answer any question about any patient's health.
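A schematic sketch of the shared-representation idea just described: heterogeneous measurements (BMI, heart rate, a cognitive score) each get mapped into one latent "health state" space, so patients measured in different ways become comparable to a single model. The random linear encoders here are purely illustrative; a real system would learn them from data.

```python
import numpy as np

rng = np.random.default_rng(3)
LATENT_DIM = 64  # dimensionality of the shared, unobservable health state

# One (invented) encoder per measurement type, all targeting the same space.
encoders = {
    "bmi": rng.normal(0, 0.1, LATENT_DIM),
    "heart_rate": rng.normal(0, 0.1, LATENT_DIM),
    "cognitive_score": rng.normal(0, 0.1, LATENT_DIM),
}

def encode(measurements: dict) -> np.ndarray:
    """Map whatever measurements are available into the shared latent space."""
    z = np.zeros(LATENT_DIM)
    for name, value in measurements.items():
        z += value * encoders[name]  # missing measurements contribute nothing
    return z

# Two patients measured differently still land in one comparable space.
z1 = encode({"bmi": 27.0, "heart_rate": 72.0})
z2 = encode({"cognitive_score": 24.0, "heart_rate": 68.0})
cos = float(z1 @ z2) / float(np.linalg.norm(z1) * np.linalg.norm(z2))
print(f"cosine similarity in the latent space: {cos:.2f}")
```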
504 00:26:43,116 --> 00:26:46,676 Speaker 1: Like, like, like a number? Like this one 505 00:26:46,756 --> 00:26:48,156 Speaker 1: state that is just, like, one... 506 00:26:48,476 --> 00:26:56,236 Speaker 2: High dimensional. It's high dimensional, right. Well, okay, yeah, 507 00:26:56,316 --> 00:26:58,436 Speaker 2: so, I mean, yeah, but basically what we're talking about is, is there 508 00:26:58,556 --> 00:27:01,796 Speaker 2: some vector, some really high dimensional space, where we're able 509 00:27:01,876 --> 00:27:04,876 Speaker 2: to take all diseases and look at how they're 510 00:27:04,916 --> 00:27:07,076 Speaker 2: related to each other in this really high dimensional space? 511 00:27:07,716 --> 00:27:10,036 Speaker 2: That is the way language models work. That's exactly how they work. 512 00:27:10,116 --> 00:27:15,316 Speaker 1: I love it. And that's intense. Like, that's pretty far out, right? 513 00:27:15,396 --> 00:27:16,676 Speaker 1: Doesn't that feel far out to you? 514 00:27:17,236 --> 00:27:20,396 Speaker 2: You would say I talk like a hippie. But if 515 00:27:20,436 --> 00:27:22,876 Speaker 2: I describe this to a machine learning researcher, they're like, 516 00:27:23,036 --> 00:27:26,596 Speaker 2: that sounds exactly like what you should do. So it 517 00:27:26,676 --> 00:27:29,236 Speaker 2: doesn't seem far out to me. It seems, it seems 518 00:27:29,356 --> 00:27:31,476 Speaker 2: very clear that that's the direction that we should be 519 00:27:31,516 --> 00:27:32,036 Speaker 2: taking things. 520 00:27:32,396 --> 00:27:34,636 Speaker 1: And does five years seem like, like you might 521 00:27:34,716 --> 00:27:36,076 Speaker 1: actually do it in five years? 522 00:27:36,716 --> 00:27:40,076 Speaker 2: Yeah, we're hoping to be able to have a 523 00:27:40,236 --> 00:27:43,196 Speaker 2: version next year that's a pan-neuroscience model. So we're 524 00:27:43,236 --> 00:27:47,516 Speaker 2: starting with, so we're starting with something 525 00:27:47,796 --> 00:27:50,356 Speaker 2: more tractable, build a more tractable thing. So right now 526 00:27:50,436 --> 00:27:53,636 Speaker 2: we're working on a neuroscience model. So we're hoping, I 527 00:27:53,676 --> 00:27:55,916 Speaker 2: mean, to be totally honest, this might not work. This 528 00:27:56,356 --> 00:27:58,756 Speaker 2: is a research idea, right? So it may work, it 529 00:27:58,836 --> 00:28:00,716 Speaker 2: might not work. But you asked where I would 530 00:28:00,716 --> 00:28:02,196 Speaker 2: hope to be. That's where I hope to be, is 531 00:28:02,276 --> 00:28:04,236 Speaker 2: that we're able to solve those problems. 532 00:28:08,076 --> 00:28:09,796 Speaker 1: So we'll be back in a minute with the Lightning Round, 533 00:28:09,996 --> 00:28:12,556 Speaker 1: including what Charles learned when he worked as an ice 534 00:28:12,596 --> 00:28:22,476 Speaker 1: hockey ref. Back to the show. I'm going to finish 535 00:28:22,556 --> 00:28:24,436 Speaker 1: with the Lightning Round. It will just be a few more minutes. 536 00:28:24,716 --> 00:28:24,996 Speaker 2: Okay. 537 00:28:26,236 --> 00:28:30,436 Speaker 1: As the name suggests. I've heard you say that you 538 00:28:30,596 --> 00:28:34,516 Speaker 1: read academic preprints, which is basically studies that are about 539 00:28:34,556 --> 00:28:37,516 Speaker 1: to be published, that you read them every day. What's 540 00:28:37,556 --> 00:28:39,636 Speaker 1: one you read recently that you found particularly interesting?
541 00:28:41,076 --> 00:28:44,276 Speaker 2: Recently? There have been a number of papers that I've 542 00:28:44,316 --> 00:28:50,556 Speaker 2: been reading around different ways of training the kind 543 00:28:50,556 --> 00:28:54,436 Speaker 2: of neural networks that we use. All of them use 544 00:28:54,556 --> 00:28:58,076 Speaker 2: a particular algorithm that people call Adam. It's been used 545 00:28:58,116 --> 00:29:01,356 Speaker 2: for a really long time, like everyone uses it now, 546 00:29:02,556 --> 00:29:06,116 Speaker 2: and it has, I don't know, it has some problems. 547 00:29:06,316 --> 00:29:08,356 Speaker 2: There's a paper from just really recently on a 548 00:29:08,436 --> 00:29:10,636 Speaker 2: new algorithm people call Lion. I don't know what it 549 00:29:10,716 --> 00:29:13,396 Speaker 2: stands for. L-I-O-N stands for something. And 550 00:29:13,556 --> 00:29:17,556 Speaker 2: this was discovered... so they used a machine learning, 551 00:29:17,596 --> 00:29:20,756 Speaker 2: a reinforcement learning algorithm, to discover a new kind 552 00:29:20,836 --> 00:29:21,516 Speaker 2: of optimizer. 553 00:29:22,436 --> 00:29:26,316 Speaker 1: So if this works, if Lion is better than Adam, 554 00:29:26,436 --> 00:29:29,716 Speaker 1: will it be like machine learning figuring out a better 555 00:29:29,796 --> 00:29:32,476 Speaker 1: way to build machine learning? Is that what's happening here? 556 00:29:32,756 --> 00:29:34,436 Speaker 2: Yeah, that's what people are working on exactly. 557 00:29:34,716 --> 00:29:36,796 Speaker 1: This is like the takeoff. This is like the moment 558 00:29:36,836 --> 00:29:39,716 Speaker 1: when GPT five builds GPT six or whatever. 559 00:29:39,916 --> 00:29:42,196 Speaker 2: I think the claim is it's like five percent better or something. 560 00:29:42,196 --> 00:29:42,956 Speaker 2: It's not, it's not... 561 00:29:44,716 --> 00:29:46,596 Speaker 1: Yes, Lion couldn't find the... 562 00:29:46,676 --> 00:29:49,836 Speaker 2: ...time, another thing yet. Yeah. So, yeah, that was a 563 00:29:49,876 --> 00:29:51,036 Speaker 2: paper I read really recently. 564 00:29:52,356 --> 00:29:54,676 Speaker 1: If you couldn't work in AI, what field would you 565 00:29:54,716 --> 00:29:54,916 Speaker 1: work in? 566 00:29:58,636 --> 00:30:03,756 Speaker 2: If I couldn't work in AI? Uh, I guess I 567 00:30:03,796 --> 00:30:09,356 Speaker 2: would probably try to work in energy, maybe climate 568 00:30:09,476 --> 00:30:11,676 Speaker 2: change, something related to that. 569 00:30:12,236 --> 00:30:14,476 Speaker 1: You seem bummed at the prospect of not being 570 00:30:14,516 --> 00:30:16,636 Speaker 1: able to work in AI. I appreciate that. I don't 571 00:30:16,636 --> 00:30:16,996 Speaker 1: want to make it... 572 00:30:17,076 --> 00:30:20,556 Speaker 2: I'm very bummed. Yeah, you know, I think it's the 573 00:30:20,676 --> 00:30:25,196 Speaker 2: most exciting thing that's happened on Earth since the Industrial Revolution. 574 00:30:25,276 --> 00:30:27,636 Speaker 2: So it's a new industrial revolution. Yeah. 575 00:30:28,356 --> 00:30:31,756 Speaker 1: Weirdly, you used to work at a virtual reality hardware company. 576 00:30:33,796 --> 00:30:36,996 Speaker 1: I feel like VR is always about to break through, 577 00:30:37,316 --> 00:30:39,716 Speaker 1: you know, like Apple just had this big announcement, 578 00:30:39,756 --> 00:30:42,236 Speaker 1: Facebook did a while ago, but yet it never 579 00:30:42,396 --> 00:30:45,956 Speaker 1: quite happens.
Why not? Like, why are we not doing 580 00:30:45,996 --> 00:30:47,076 Speaker 1: this interview in the metaverse? 581 00:30:48,396 --> 00:30:51,436 Speaker 2: So I only worked at that company for a few months. 582 00:30:52,036 --> 00:30:56,276 Speaker 2: I spent my whole career working in biophysics. I moved 583 00:30:56,356 --> 00:30:58,996 Speaker 2: to Pfizer. I was working at Pfizer, and then I 584 00:30:59,076 --> 00:31:02,116 Speaker 2: was just like, I'm gonna try something totally different, 585 00:31:02,676 --> 00:31:05,196 Speaker 2: and I went and tried this work at the VR company. 586 00:31:05,676 --> 00:31:08,516 Speaker 2: I was interested in that because of the underlying technical 587 00:31:08,556 --> 00:31:10,956 Speaker 2: problems, the research that I had to do, not because I 588 00:31:11,116 --> 00:31:15,276 Speaker 2: was drawn to the product. I have only ever used 589 00:31:15,316 --> 00:31:20,396 Speaker 2: a virtual reality headset twice in my entire life. Once was 590 00:31:20,476 --> 00:31:23,156 Speaker 2: in the interview for that job, and once was testing 591 00:31:23,276 --> 00:31:26,316 Speaker 2: something while I was working at that job. I'm not 592 00:31:26,556 --> 00:31:28,996 Speaker 2: interested in it; I was 593 00:31:29,076 --> 00:31:31,196 Speaker 2: interested in the engineering. So you want to know why 594 00:31:31,276 --> 00:31:34,156 Speaker 2: I don't think it's taken off? It's because most people 595 00:31:34,236 --> 00:31:37,876 Speaker 2: don't have a compelling reason to use it. Neither do I. Yeah. 596 00:31:38,476 --> 00:31:41,396 Speaker 2: What'd you learn working as an ice hockey referee? Ice 597 00:31:41,436 --> 00:31:45,236 Speaker 2: hockey referee? Oh, that was like my super super young job. 598 00:31:47,356 --> 00:31:50,596 Speaker 2: I would say that I learned it's best not to 599 00:31:50,716 --> 00:31:57,396 Speaker 2: call penalties on little children. That's what I learned. You know, 600 00:31:57,516 --> 00:31:59,516 Speaker 2: people would just, like, run into each other and 601 00:31:59,556 --> 00:32:01,396 Speaker 2: they'd fall down. You're like, is that a penalty? Was 602 00:32:01,436 --> 00:32:03,916 Speaker 2: it on purpose? Not on purpose? If you call a penalty, 603 00:32:04,196 --> 00:32:05,876 Speaker 2: the parents are going to be real upset at you. 604 00:32:05,956 --> 00:32:07,396 Speaker 2: So you just, just let them play. 605 00:32:07,836 --> 00:32:10,476 Speaker 1: Good early experience with cost-benefit analysis. 606 00:32:10,716 --> 00:32:11,396 Speaker 2: Just let them play. 607 00:32:17,156 --> 00:32:20,636 Speaker 1: Charles Fisher is the co-founder and CEO of Unlearn. 608 00:32:21,396 --> 00:32:24,876 Speaker 1: Today's show was edited by Sarah Nis, produced by Gabriel 609 00:32:24,996 --> 00:32:29,796 Speaker 1: Hunter Chang and Edith Russlo, and engineered by Amanda K. Wong. 610 00:32:30,076 --> 00:32:33,036 Speaker 1: I'm Jacob Goldstein. One last note: the show is going 611 00:32:33,116 --> 00:32:35,236 Speaker 1: to be off for the next several weeks, and we'll 612 00:32:35,276 --> 00:32:38,596 Speaker 1: be back with new episodes in August. Have a rad summer.