1 00:00:15,356 --> 00:00:24,076 Speaker 1: Pushkin. Imagine something that is sort of like ChatGPT, 2 00:00:24,636 --> 00:00:28,076 Speaker 1: but for the human body. ChatGPT looks at a 3 00:00:28,116 --> 00:00:31,236 Speaker 1: sentence and predicts what words are likely to come next. 4 00:00:31,756 --> 00:00:34,316 Speaker 1: This thing would look at a human body and predict 5 00:00:34,356 --> 00:00:38,156 Speaker 1: what diseases are likely to come next. The body is 6 00:00:38,316 --> 00:00:41,836 Speaker 1: wildly complex and unpredictable. This seems like a very, very 7 00:00:41,876 --> 00:00:45,076 Speaker 1: hard problem, but it is a problem people are working on, 8 00:00:45,516 --> 00:00:48,996 Speaker 1: and at least in some circumstances, they're figuring out how 9 00:00:49,036 --> 00:00:57,716 Speaker 1: to make predictions that are truly useful. I'm Jacob Goldstein, 10 00:00:57,756 --> 00:00:59,636 Speaker 1: and this is What's Your Problem, the show where I 11 00:00:59,716 --> 00:01:02,796 Speaker 1: talk to people who are trying to make technological progress. 12 00:01:03,156 --> 00:01:06,476 Speaker 1: My guest today is Charles Fisher, co-founder and CEO 13 00:01:06,636 --> 00:01:10,836 Speaker 1: of Unlearn. Charles's problem is how do you build an 14 00:01:10,836 --> 00:01:14,836 Speaker 1: AI model that can predict human health. Charles and his 15 00:01:14,876 --> 00:01:17,796 Speaker 1: colleagues have built a predictive model of human health that's 16 00:01:17,876 --> 00:01:21,076 Speaker 1: already being used in clinical trials for new drugs and 17 00:01:21,196 --> 00:01:24,356 Speaker 1: new medical devices. But we started out talking about the 18 00:01:24,396 --> 00:01:27,956 Speaker 1: big picture, about the very idea of trying to predict 19 00:01:27,996 --> 00:01:31,836 Speaker 1: what's going to happen to a human body. 20 00:01:32,996 --> 00:01:36,356 Speaker 2: It's funny. When I talk about trying to quantify biology 21 00:01:36,356 --> 00:01:38,756 Speaker 2: and make it predictable, I often get hit with this 22 00:01:40,796 --> 00:01:45,716 Speaker 2: critique that biology isn't physics. Biology is complex, biology is 23 00:01:45,716 --> 00:01:47,476 Speaker 2: not physics. We're not going to be able to do that. 24 00:01:47,876 --> 00:01:49,276 Speaker 1: Less deterministic. 25 00:01:50,396 --> 00:01:54,636 Speaker 2: Right. So for physics, for two thousand years, right, people 26 00:01:54,676 --> 00:01:58,476 Speaker 2: started working on physics in ancient Greece. And for two 27 00:01:58,476 --> 00:02:03,596 Speaker 2: thousand years, physics wasn't physics. Physics was unpredictable. Physics was 28 00:02:03,756 --> 00:02:07,676 Speaker 2: too complex to understand until something was invented. And that 29 00:02:07,676 --> 00:02:09,076 Speaker 2: thing was calculus. 30 00:02:10,116 --> 00:02:10,796 Speaker 1: Until Newton, right. 31 00:02:10,916 --> 00:02:15,236 Speaker 2: Yeah. So once calculus was invented, all of a sudden 32 00:02:15,276 --> 00:02:18,036 Speaker 2: we had a new language, and this new 33 00:02:18,116 --> 00:02:21,276 Speaker 2: kind of mathematics allowed us to really easily describe lots 34 00:02:21,276 --> 00:02:25,036 Speaker 2: of physical phenomena. And so now physics has become this 35 00:02:25,116 --> 00:02:28,676 Speaker 2: thing that's very predictable and well understood. And that's what 36 00:02:28,676 --> 00:02:31,116 Speaker 2: we've been waiting for in biology.
We've been waiting for 37 00:02:31,156 --> 00:02:33,996 Speaker 2: a new tool, a new language, a new mathematics that 38 00:02:34,036 --> 00:02:37,276 Speaker 2: will allow us to understand these complex systems. And that's 39 00:02:37,396 --> 00:02:39,476 Speaker 2: really what I think these new tools are. 40 00:02:39,796 --> 00:02:41,796 Speaker 1: So your hope, your hope is that 41 00:02:41,916 --> 00:02:48,156 Speaker 1: machine learning, generative AI, will do for medicine, for biology, what 42 00:02:48,236 --> 00:02:49,756 Speaker 1: calculus did for physics. 43 00:02:50,236 --> 00:02:53,516 Speaker 2: Exactly. That is big, big. It's exactly what I hope. 44 00:02:53,916 --> 00:02:54,996 Speaker 2: That's exactly what I hope. 45 00:02:55,036 --> 00:02:57,916 Speaker 1: So okay, so this is your hope. You're starting this 46 00:02:58,036 --> 00:03:03,796 Speaker 1: company to test your hypothesis. Uh, what do you do? 47 00:03:04,796 --> 00:03:06,116 Speaker 2: What do you mean? What do I do? What do I 48 00:03:06,156 --> 00:03:08,836 Speaker 2: do on day one? Or like, what are we doing? No? 49 00:03:08,836 --> 00:03:12,276 Speaker 1: No, no. We're back to twenty seventeen. You have this 50 00:03:12,516 --> 00:03:16,036 Speaker 1: big, up-in-the-sky, two-thousand-year, thirty-thousand 51 00:03:16,036 --> 00:03:18,636 Speaker 1: foot idea. But you've got to make a thing that 52 00:03:18,676 --> 00:03:20,876 Speaker 1: somebody is going to pay you for, that will hopefully 53 00:03:20,996 --> 00:03:22,916 Speaker 1: use AI in medicine in some way. So what do 54 00:03:22,956 --> 00:03:23,156 Speaker 1: you do? 55 00:03:23,836 --> 00:03:27,436 Speaker 2: So we didn't know what would work, so we focused 56 00:03:27,516 --> 00:03:33,196 Speaker 2: on two different problems at the time. So one problem is, 57 00:03:33,636 --> 00:03:36,596 Speaker 2: let's imagine we're going to have a bunch of data 58 00:03:36,956 --> 00:03:40,356 Speaker 2: from maybe a big, large collection of patients. We're 59 00:03:40,356 --> 00:03:43,276 Speaker 2: gonna have this data over time, so the symptoms 60 00:03:43,276 --> 00:03:46,996 Speaker 2: that a patient might have every week for a year or 61 00:03:47,036 --> 00:03:49,716 Speaker 2: something like that. And our goal is to be able 62 00:03:49,756 --> 00:03:53,596 Speaker 2: to create a simulator of a patient's future health. So, 63 00:03:53,756 --> 00:03:55,876 Speaker 2: given what I know about a patient in the past, 64 00:03:56,316 --> 00:03:59,196 Speaker 2: can I simulate what will happen to them in the future. 65 00:03:59,916 --> 00:04:03,636 Speaker 1: And presumably that is sort of probabilistic. I mean, what 66 00:04:03,756 --> 00:04:05,476 Speaker 1: we know about health. Like, you can say there's an 67 00:04:05,636 --> 00:04:08,476 Speaker 1: X percent chance that in Y years this person will 68 00:04:08,476 --> 00:04:09,876 Speaker 1: have a heart attack, something like that. 69 00:04:10,236 --> 00:04:13,716 Speaker 2: Exactly. Yeah, we want to, yes, because so many things 70 00:04:13,756 --> 00:04:16,996 Speaker 2: are undetermined, you know. Maybe, yeah, exactly. 71 00:04:16,836 --> 00:04:18,996 Speaker 1: Right, and it's just the nature of the world, right, 72 00:04:19,356 --> 00:04:20,036 Speaker 1: one hundred percent. 73 00:04:20,196 --> 00:04:21,036 Speaker 2: Yeah.
74 00:04:21,556 --> 00:04:24,636 Speaker 1: So okay, so you have this idea of basically where 75 00:04:24,836 --> 00:04:28,036 Speaker 1: ChatGPT, which didn't exist yet, but predicts the next 76 00:04:28,036 --> 00:04:31,316 Speaker 1: word with some probability, you want to predict the next 77 00:04:31,356 --> 00:04:32,476 Speaker 1: health outcome. 78 00:04:32,076 --> 00:04:34,716 Speaker 2: Exactly, that is the big idea. Yeah. So that, 79 00:04:34,916 --> 00:04:36,676 Speaker 2: that was one of them. That was not 80 00:04:36,756 --> 00:04:38,516 Speaker 2: the only one; that was the one that is what 81 00:04:38,516 --> 00:04:40,956 Speaker 2: we do. The one that we didn't do, we 82 00:04:40,956 --> 00:04:43,676 Speaker 2: were interested in as well, potentially. So that's at a very 83 00:04:43,716 --> 00:04:47,756 Speaker 2: macroscopic scale, that's at the scale of the person, whereas 84 00:04:47,796 --> 00:04:49,956 Speaker 2: the other thing we were interested in was potentially could 85 00:04:49,996 --> 00:04:52,116 Speaker 2: we go at the micro scale and look at what's 86 00:04:52,156 --> 00:04:55,356 Speaker 2: happening inside individual cells. We were interested in this at 87 00:04:55,396 --> 00:04:58,036 Speaker 2: the beginning. Basically, the way we figured this out is 88 00:04:58,116 --> 00:05:01,116 Speaker 2: we signed a few deals with pharma companies to try 89 00:05:01,156 --> 00:05:06,356 Speaker 2: these things, and we found that the technology worked 90 00:05:06,396 --> 00:05:11,196 Speaker 2: really well in simulating health outcomes, and it didn't 91 00:05:11,196 --> 00:05:13,676 Speaker 2: work very well when it comes down to simulating what's 92 00:05:13,716 --> 00:05:15,996 Speaker 2: inside the cell. And I think this comes down to data, 93 00:05:16,356 --> 00:05:18,676 Speaker 2: which is that we get a ton of data on 94 00:05:18,916 --> 00:05:21,516 Speaker 2: human health outcomes. Like, literally every time you go to 95 00:05:21,556 --> 00:05:24,436 Speaker 2: the doctor, there's data there on your health outcomes. But 96 00:05:24,476 --> 00:05:28,036 Speaker 2: the data from the things inside the cell, there is 97 00:05:28,076 --> 00:05:31,636 Speaker 2: a lot of it, but it's much more difficult to 98 00:05:31,676 --> 00:05:34,116 Speaker 2: work with. So I think a lot of 99 00:05:34,156 --> 00:05:37,596 Speaker 2: what drove us in this direction is really the focus 100 00:05:37,636 --> 00:05:39,556 Speaker 2: on where we think we have the data to solve 101 00:05:39,636 --> 00:05:40,596 Speaker 2: these kinds of problems. 102 00:05:40,676 --> 00:05:44,476 Speaker 1: So, okay, you go in the direction of simulating health 103 00:05:44,516 --> 00:05:48,756 Speaker 1: outcomes for patients, and in particular, sort of where you 104 00:05:48,796 --> 00:05:52,596 Speaker 1: get to is working with companies that are running clinical trials. 105 00:05:52,596 --> 00:05:54,796 Speaker 1: And I know eventually you get to a point where 106 00:05:54,836 --> 00:05:57,596 Speaker 1: companies can use your model, use your software, to run 107 00:05:57,636 --> 00:06:01,396 Speaker 1: clinical trials with fewer patients. So just tell me about that 108 00:06:02,156 --> 00:06:03,476 Speaker 1: arc. Tell me how you get there. 109 00:06:04,076 --> 00:06:08,076 Speaker 2: Clinical trials are, well, they take forever, and they're 110 00:06:08,076 --> 00:06:10,916 Speaker 2: really really expensive.
Something might take like five years and 111 00:06:10,956 --> 00:06:14,796 Speaker 2: cost one hundred million dollars to run a clinical trial. Yeah. 112 00:06:14,836 --> 00:06:17,996 Speaker 2: And these are hundreds or thousands of patients, right? Oh, 113 00:06:18,036 --> 00:06:21,996 Speaker 2: thousands of patients typically, right. Yeah. And typically half of 114 00:06:22,036 --> 00:06:24,516 Speaker 2: the patients in a clinical trial are receiving a placebo. 115 00:06:25,436 --> 00:06:27,796 Speaker 2: So you're going to randomly assign half to receive an 116 00:06:27,796 --> 00:06:30,476 Speaker 2: experimental treatment, half to receive a placebo. And the reason 117 00:06:30,596 --> 00:06:33,916 Speaker 2: is that every clinical trial is ultimately just doing a comparison. 118 00:06:34,396 --> 00:06:36,956 Speaker 2: You're comparing how a patient responds to the new treatment 119 00:06:36,996 --> 00:06:38,796 Speaker 2: to how they respond if they don't get that treatment. 120 00:06:38,836 --> 00:06:40,716 Speaker 1: And let me just give a shout out to the 121 00:06:40,796 --> 00:06:44,836 Speaker 1: randomized controlled trial as like a really beautiful construct, right, 122 00:06:45,636 --> 00:06:47,956 Speaker 1: not that old? Not that old. I learned that 123 00:06:48,076 --> 00:06:51,636 Speaker 1: preparing for this interview, like less than one hundred years old, amazingly. 124 00:06:52,676 --> 00:06:56,356 Speaker 1: But it's a perfect way to assess, not perfect, it's 125 00:06:56,356 --> 00:06:59,956 Speaker 1: a very very good way to assess causality. It's really elegant. 126 00:07:00,156 --> 00:07:03,076 Speaker 2: It is an elegant idea. But if you're a patient, 127 00:07:04,356 --> 00:07:06,996 Speaker 2: why are you participating in a clinical trial at all? What's 128 00:07:07,036 --> 00:07:09,716 Speaker 2: the number one reason people participate in clinical trials? They 129 00:07:09,716 --> 00:07:12,116 Speaker 2: participate in clinical trials because they want access to this 130 00:07:12,196 --> 00:07:14,796 Speaker 2: experimental treatment that you can't get any other way. That's 131 00:07:14,796 --> 00:07:17,716 Speaker 2: the number one reason why patients are participating in clinical trials. 132 00:07:17,796 --> 00:07:19,156 Speaker 2: Number one. Now, they... 133 00:07:19,076 --> 00:07:21,316 Speaker 1: They don't want to be randomized to the placebo. 134 00:07:21,516 --> 00:07:23,636 Speaker 2: No, no, no, they don't. 135 00:07:23,716 --> 00:07:27,156 Speaker 1: I can certainly understand that. It is the case, right, 136 00:07:27,276 --> 00:07:33,076 Speaker 1: that most trials fail, meaning the drug is not helping 137 00:07:33,076 --> 00:07:36,836 Speaker 1: you and possibly hurting you, meaning on average, you're better 138 00:07:36,916 --> 00:07:39,396 Speaker 1: off being in the placebo arm. Like, that is true, right? 139 00:07:39,396 --> 00:07:42,516 Speaker 2: Yeah, there's a principle of equipoise. But that's an academic, 140 00:07:42,636 --> 00:07:43,956 Speaker 2: ivory tower principle. 141 00:07:44,156 --> 00:07:48,636 Speaker 1: I mean, it also is true. It's just true. That's fine, that's 142 00:07:48,476 --> 00:07:52,716 Speaker 2: fine, but in the end, that's like, in the end, 143 00:07:52,836 --> 00:07:56,956 Speaker 2: patients choose not to participate in clinical trials because they 144 00:07:56,996 --> 00:08:00,516 Speaker 2: don't want to get a placebo.
Patients drop out of 145 00:08:00,596 --> 00:08:03,196 Speaker 2: clinical trials when they think they are getting a placebo. 146 00:08:03,796 --> 00:08:07,436 Speaker 2: Those are also true. Those are the number one reasons those things happen. 147 00:08:07,556 --> 00:08:08,516 Speaker 2: Fair? 148 00:08:08,676 --> 00:08:08,916 Speaker 1: Okay? 149 00:08:08,956 --> 00:08:12,636 Speaker 2: Right? So, and in fact, twenty percent of clinical trials 150 00:08:12,636 --> 00:08:14,996 Speaker 2: fail not because the drug didn't work, but because they 151 00:08:15,036 --> 00:08:19,476 Speaker 2: just couldn't find enough people to participate. Okay. And what 152 00:08:19,516 --> 00:08:23,916 Speaker 2: we realized, though, is that there was a way for 153 00:08:24,076 --> 00:08:29,156 Speaker 2: us not to try to replace the randomized controlled trial, 154 00:08:29,196 --> 00:08:31,716 Speaker 2: but to make it better. And what we are 155 00:08:31,756 --> 00:08:35,916 Speaker 2: doing is we take what we call digital twins 156 00:08:35,636 --> 00:08:38,516 Speaker 2: of the patients, so these are these simulations of 157 00:08:38,596 --> 00:08:42,076 Speaker 2: their future outcomes, and we could incorporate those 158 00:08:42,156 --> 00:08:48,836 Speaker 2: data into RCTs directly, randomized controlled trials. We call 159 00:08:48,876 --> 00:08:51,276 Speaker 2: it kind of like a reimagining of RCTs. 160 00:08:51,396 --> 00:08:54,996 Speaker 2: You're going to have an RCT that is 161 00:08:55,676 --> 00:09:01,156 Speaker 2: more accurate, that requires fewer patients, and as 162 00:09:01,196 --> 00:09:03,356 Speaker 2: a result, you get a lot of the benefits of 163 00:09:03,996 --> 00:09:06,756 Speaker 2: faster trials, of things that are better for the patients. 164 00:09:06,996 --> 00:09:09,756 Speaker 2: We can talk about that in a minute. But you 165 00:09:09,876 --> 00:09:11,476 Speaker 2: keep all of the same scientific rigor. 166 00:09:12,716 --> 00:09:17,356 Speaker 1: So specifically, okay, that's a good, like, big picture. Specifically, 167 00:09:18,596 --> 00:09:19,116 Speaker 1: how does it 168 00:09:19,076 --> 00:09:25,196 Speaker 2: work? Right now, we build one model per disease. So, 169 00:09:25,276 --> 00:09:28,716 Speaker 2: for example, we have a model for patients with Alzheimer's disease. 170 00:09:28,756 --> 00:09:31,236 Speaker 2: We have a separate model for patients with ALS, we 171 00:09:31,276 --> 00:09:33,476 Speaker 2: have a separate model for multiple sclerosis, et cetera. 172 00:09:33,876 --> 00:09:36,436 Speaker 1: Let's pick one model and talk about it. What's the 173 00:09:36,436 --> 00:09:38,516 Speaker 1: one that's farthest along? Which is the one that works 174 00:09:38,516 --> 00:09:38,876 Speaker 1: the best? 175 00:09:39,076 --> 00:09:41,996 Speaker 2: Yeah. So our Alzheimer's disease model, that was our 176 00:09:41,996 --> 00:09:44,916 Speaker 2: first one; we've published scientific papers on it and things 177 00:09:44,956 --> 00:09:47,076 Speaker 2: like this, so that one's our most well known. 178 00:09:47,356 --> 00:09:51,156 Speaker 1: Okay, so you're setting out to build a model that 179 00:09:51,236 --> 00:09:55,196 Speaker 1: will predict what's going to happen, presumably, to a 180 00:09:55,196 --> 00:09:58,196 Speaker 1: patient who has the early stages of Alzheimer's disease. How 181 00:09:58,196 --> 00:10:00,876 Speaker 1: will their disease progress?
A hard thing to know in 182 00:10:00,916 --> 00:10:04,636 Speaker 1: the real world. How do you build that? What do 183 00:10:04,676 --> 00:10:04,956 Speaker 1: you do? 184 00:10:05,796 --> 00:10:07,836 Speaker 2: So the first thing is that you need data to 185 00:10:07,916 --> 00:10:11,956 Speaker 2: learn from. Yeah, it's kind of obvious. So our first 186 00:10:11,996 --> 00:10:14,076 Speaker 2: step was like, oh, we say, okay, we want to 187 00:10:14,116 --> 00:10:16,236 Speaker 2: have data sets where we get a ton of information 188 00:10:16,276 --> 00:10:19,436 Speaker 2: about each patient. What's that mean? That means that at any 189 00:10:19,476 --> 00:10:22,436 Speaker 2: individual time, I want to have a lot of 190 00:10:22,996 --> 00:10:25,276 Speaker 2: different measurements made on that patient at each time. 191 00:10:26,156 --> 00:10:29,116 Speaker 1: So presumably you want to have a lot of moments 192 00:10:29,196 --> 00:10:31,156 Speaker 1: with lots of information, exactly. 193 00:10:31,156 --> 00:10:32,156 Speaker 2: You also want to have lots of... 194 00:10:32,196 --> 00:10:34,516 Speaker 1: Lots of times over a long period of time, over 195 00:10:34,556 --> 00:10:35,276 Speaker 1: a long period. 196 00:10:35,316 --> 00:10:37,316 Speaker 2: Yeah. And so, you know, these are going to be, 197 00:10:37,476 --> 00:10:39,756 Speaker 2: for Alzheimer's, you're looking at a bunch of things related 198 00:10:39,836 --> 00:10:45,356 Speaker 2: to the patient's cognitive performance on different assessments. Also 199 00:10:45,396 --> 00:10:48,236 Speaker 2: there's things about just their daily life. How are they 200 00:10:48,276 --> 00:10:51,076 Speaker 2: able to function in their daily life? There's things related 201 00:10:51,116 --> 00:10:55,996 Speaker 2: to their caregivers, actually, like how does their caregiver rate 202 00:10:56,316 --> 00:11:00,796 Speaker 2: their behavior? Brain imaging, blood tests, all that kind of information. 203 00:11:00,916 --> 00:11:02,796 Speaker 2: You want to have as much of it about each patient, 204 00:11:02,876 --> 00:11:05,276 Speaker 2: you want to have it as many times as possible. Sure. 205 00:11:05,516 --> 00:11:07,276 Speaker 2: And we'll try to get that for, you know, like 206 00:11:07,356 --> 00:11:11,516 Speaker 2: fifty thousand people. And that's the kind of data set 207 00:11:11,596 --> 00:11:13,036 Speaker 2: that we're starting with. 208 00:11:13,396 --> 00:11:16,836 Speaker 1: And like, is there one repository that when you get that, 209 00:11:16,876 --> 00:11:18,276 Speaker 1: you're like, jackpot? Or what? 210 00:11:19,236 --> 00:11:23,316 Speaker 2: No, we have to aggregate data from lots and 211 00:11:23,356 --> 00:11:25,156 Speaker 2: lots of different places to be able to build a 212 00:11:25,156 --> 00:11:25,996 Speaker 2: big enough data set. 213 00:11:26,916 --> 00:11:29,196 Speaker 1: Okay, so now you've got the data. What do you 214 00:11:29,236 --> 00:11:29,676 Speaker 1: do next? 215 00:11:30,436 --> 00:11:33,716 Speaker 2: Then we've got to train a model to 216 00:11:33,836 --> 00:11:37,036 Speaker 2: be able to learn from those data how to simulate things. 217 00:11:37,396 --> 00:11:38,676 Speaker 2: And that's actually what we do. 218 00:11:38,876 --> 00:11:42,556 Speaker 1: In particular, in this case, how to predict, given some 219 00:11:42,596 --> 00:11:45,036 Speaker 1: set of inputs for a patient, what's going to happen 220 00:11:45,076 --> 00:11:46,276 Speaker 1: next? Exactly.
221 00:11:46,316 --> 00:11:48,956 Speaker 2: And so this does look like that. You were using that analogy 222 00:11:49,036 --> 00:11:52,036 Speaker 2: of, like, a language model predicts the next word. So, 223 00:11:52,436 --> 00:11:55,036 Speaker 2: given these words I've seen before, predict the next word. 224 00:11:55,316 --> 00:11:57,676 Speaker 2: And that is similar to how our models in 225 00:11:57,716 --> 00:11:59,956 Speaker 2: these diseases work. So we're going to say, given I've 226 00:11:59,996 --> 00:12:02,956 Speaker 2: observed these things in the past about a patient, what 227 00:12:03,036 --> 00:12:06,876 Speaker 2: will happen to them next? That is very analogous 228 00:12:06,916 --> 00:12:07,876 Speaker 2: to kind of what we're doing. 229 00:12:08,516 --> 00:12:11,276 Speaker 1: Okay, so you build the model. How does it work? 230 00:12:11,276 --> 00:12:14,636 Speaker 1: How does it work in a clinical trial, specifically, so 231 00:12:14,676 --> 00:12:17,236 Speaker 1: that, you know, the people running the trial can 232 00:12:17,316 --> 00:12:18,756 Speaker 1: do it with fewer patients? 233 00:12:18,996 --> 00:12:25,836 Speaker 2: Sure. So in a typical case, we're involved at the 234 00:12:25,876 --> 00:12:29,916 Speaker 2: beginning of the clinical trial in the design of the protocol. Okay. 235 00:12:30,316 --> 00:12:34,556 Speaker 2: So there's a question of how many patients should you 236 00:12:34,716 --> 00:12:37,716 Speaker 2: randomize to your control group, how many patients do you 237 00:12:37,756 --> 00:12:40,076 Speaker 2: need overall, and how many should be in the treatment, 238 00:12:40,076 --> 00:12:40,716 Speaker 2: how many should be in 239 00:12:40,716 --> 00:12:42,876 Speaker 1: the control. It's not always fifty-fifty. 240 00:12:43,196 --> 00:12:46,316 Speaker 2: It's not always fifty-fifty. In our studies, our 241 00:12:46,396 --> 00:12:49,356 Speaker 2: typical goal is to try to minimize the number of 242 00:12:49,396 --> 00:12:51,876 Speaker 2: people that you need to put in the control group. Okay. 243 00:12:52,996 --> 00:12:56,156 Speaker 2: And so we're involved in helping to do that 244 00:12:56,756 --> 00:12:58,796 Speaker 2: calculation, to say, here's how big your trial should be. 245 00:12:58,996 --> 00:13:03,476 Speaker 2: And so then, as patients enroll in the study, we 246 00:13:03,556 --> 00:13:08,436 Speaker 2: take data from their first visit, before they receive whatever 247 00:13:08,596 --> 00:13:12,956 Speaker 2: new treatment they're going to receive, and we take those data, 248 00:13:13,076 --> 00:13:15,796 Speaker 2: we input them into our pre-trained model. So I 249 00:13:15,916 --> 00:13:17,876 Speaker 2: like to think about, you know, ChatGPT: you give it a 250 00:13:17,916 --> 00:13:20,476 Speaker 2: prompt and it gives you output. Same thing. We take 251 00:13:20,516 --> 00:13:22,716 Speaker 2: the data from the patient, we prompt the model, and 252 00:13:22,756 --> 00:13:25,276 Speaker 2: it outputs its predictions for what will happen. 253 00:13:24,996 --> 00:13:26,596 Speaker 1: And to be clear, you do that for 254 00:13:26,716 --> 00:13:28,916 Speaker 1: all of the patients in both arms, the treatment and the control? 255 00:13:29,476 --> 00:13:32,356 Speaker 2: Yes, yeah, and we don't know, right, it's blinded. 256 00:13:32,356 --> 00:13:35,716 Speaker 2: It's blinded to us. We don't know which is which. Yeah.
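To make that prompting analogy concrete, here is a minimal, self-contained Python sketch. Everything in it is invented for illustration: the class names, the single cognitive score, and the toy random-walk dynamics stand in for a model actually trained on historical patient records. This is not Unlearn's software or API, just the shape of the idea.

```python
import random
from dataclasses import dataclass

@dataclass
class Visit:
    month: int
    cognitive_score: float  # e.g., score on a cognitive assessment

class ToyDiseaseModel:
    """Toy stand-in for a model pretrained on historical patient records."""
    def simulate(self, baseline, months, n_samples):
        trajectories = []
        for _ in range(n_samples):
            score, path = baseline.cognitive_score, []
            for m in range(1, months + 1):
                # Toy dynamics: average decline plus noise, standing in for
                # a learned autoregressive model (the "next visit" analogue
                # of a language model predicting the next word).
                score += random.gauss(-0.15, 0.4)
                path.append(Visit(month=m, cognitive_score=score))
            trajectories.append(path)
        return trajectories

# "Prompt" with the first, pre-treatment visit; sample many possible futures.
# The spread across samples captures the stochasticity discussed earlier.
model = ToyDiseaseModel()
futures = model.simulate(Visit(month=0, cognitive_score=24.0), months=18, n_samples=500)
final_scores = [t[-1].cognitive_score for t in futures]
print(f"mean simulated 18-month score: {sum(final_scores) / len(final_scores):.1f}")
```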
257 00:13:35,756 --> 00:13:37,356 Speaker 2: So we do that for one hundred percent of the 258 00:13:37,396 --> 00:13:42,156 Speaker 2: patients, and then we give those data to the customer, 259 00:13:42,756 --> 00:13:44,116 Speaker 2: to the pharma company. 260 00:13:44,316 --> 00:13:46,796 Speaker 1: So then what happens next? What happens next? 261 00:13:46,916 --> 00:13:50,076 Speaker 2: We wait around for a while. Yeah. And then, when 262 00:13:50,116 --> 00:13:53,196 Speaker 2: the study is actually completed, right, and they 263 00:13:53,196 --> 00:13:57,596 Speaker 2: do unblind the data, we help to 264 00:13:57,956 --> 00:14:01,356 Speaker 2: say, here's how you now can incorporate these predicted 265 00:14:01,396 --> 00:14:03,396 Speaker 2: outcomes into the analysis. 266 00:14:02,956 --> 00:14:04,836 Speaker 1: So this is it. Now we're at 267 00:14:04,836 --> 00:14:07,716 Speaker 1: the moment when the thing you have built is useful. 268 00:14:07,796 --> 00:14:11,476 Speaker 1: So now they have done the study, 269 00:14:11,836 --> 00:14:14,796 Speaker 1: they have the outcomes for the real human beings, and 270 00:14:14,836 --> 00:14:17,956 Speaker 1: they have the predicted outcomes from your model. How is 271 00:14:17,996 --> 00:14:19,716 Speaker 1: your system, how's your model, useful? 272 00:14:20,396 --> 00:14:22,996 Speaker 2: So the very first thing that we're basically going to 273 00:14:22,996 --> 00:14:24,156 Speaker 2: do, is what I'm going to say, we're going to 274 00:14:24,196 --> 00:14:28,956 Speaker 2: recalibrate our model. Recalibrate. You're going to figure out 275 00:14:29,036 --> 00:14:33,236 Speaker 2: a relationship between your predicted outcomes and your observed outcomes 276 00:14:33,276 --> 00:14:36,796 Speaker 2: for the patients who really received the placebo. 277 00:14:36,876 --> 00:14:39,116 Speaker 1: For the patients in the placebo group. And basically you're going 278 00:14:39,156 --> 00:14:40,436 Speaker 1: to see how you did. How did we do? 279 00:14:40,876 --> 00:14:43,436 Speaker 2: Yes, and in particular, you're going to find out, not just, 280 00:14:43,836 --> 00:14:45,716 Speaker 2: it's not like a measure of was it good or bad, 281 00:14:45,756 --> 00:14:47,956 Speaker 2: you're going to find out exactly how they are related. 282 00:14:48,916 --> 00:14:53,076 Speaker 2: And then you can take that information and adjust your predictions, 283 00:14:53,636 --> 00:14:57,676 Speaker 2: okay, for everybody. So you can say, let's imagine that 284 00:14:57,956 --> 00:15:03,156 Speaker 2: I find out, well, on average, I'm underestimating how 285 00:15:03,236 --> 00:15:05,436 Speaker 2: much a patient would progress by one point per year. 286 00:15:05,476 --> 00:15:08,236 Speaker 2: I'm on average underestimating it. Well, then I'll go through 287 00:15:08,236 --> 00:15:09,836 Speaker 2: and I'll take my prediction and I'll be like, well, 288 00:15:10,476 --> 00:15:13,516 Speaker 2: add one point, add one point for everyone.
And then 289 00:15:13,916 --> 00:15:15,876 Speaker 2: now you have said, okay, well, now I've taken the 290 00:15:15,916 --> 00:15:18,236 Speaker 2: model and I've been able to do it in such 291 00:15:18,236 --> 00:15:21,036 Speaker 2: a way where I've fixed these mistakes by looking at 292 00:15:21,076 --> 00:15:23,556 Speaker 2: the actual patients who got placebo. And now I'm 293 00:15:23,596 --> 00:15:25,596 Speaker 2: going to apply that model to the patients in the 294 00:15:25,636 --> 00:15:28,996 Speaker 2: treatment group. And now I 295 00:15:29,156 --> 00:15:31,676 Speaker 2: just look at that difference between the patients in the 296 00:15:31,676 --> 00:15:33,636 Speaker 2: treatment group and their predictions from the model, and I 297 00:15:33,676 --> 00:15:36,156 Speaker 2: average that, and I get an estimate for the treatment effect. 298 00:15:36,596 --> 00:15:39,996 Speaker 2: Now, I described that as a two-stage procedure, but 299 00:15:40,236 --> 00:15:43,236 Speaker 2: it's not actually a two-stage procedure. It's one mathematical 300 00:15:43,236 --> 00:15:47,796 Speaker 2: analysis that you do. But the thing that's really, 301 00:15:48,316 --> 00:15:53,036 Speaker 2: I think, quite amazing, actually, is that this has a 302 00:15:53,596 --> 00:15:57,916 Speaker 2: bunch of mathematical guarantees to it. We can actually prove 303 00:15:58,956 --> 00:16:01,596 Speaker 2: that the estimate that you get for how effective the 304 00:16:01,636 --> 00:16:06,236 Speaker 2: treatment is is still unbiased. So it's not an overestimate, 305 00:16:06,236 --> 00:16:09,836 Speaker 2: it's not an underestimate, it's on average correct. We can prove 306 00:16:10,076 --> 00:16:12,636 Speaker 2: that if you compute a P value from the analysis, 307 00:16:12,636 --> 00:16:15,236 Speaker 2: like you would typically do, that it has exactly the 308 00:16:15,316 --> 00:16:17,596 Speaker 2: right properties as it does out of a regular RCT. 309 00:16:17,756 --> 00:16:20,516 Speaker 1: P value is roughly the probability that the finding was 310 00:16:20,516 --> 00:16:20,916 Speaker 1: a fluke. 311 00:16:22,156 --> 00:16:25,756 Speaker 2: Yeah, right. Yeah. If you compute an error bar, the error bar 312 00:16:25,876 --> 00:16:27,996 Speaker 2: you get from our analysis and the error bar you would 313 00:16:27,996 --> 00:16:31,916 Speaker 2: get from a normal trial, they all have exactly identical statistics. 314 00:16:31,956 --> 00:16:35,476 Speaker 1: This is not intuitive, but you're saying the mathematical 315 00:16:35,596 --> 00:16:39,076 Speaker 1: fact is that it works. Yes. And just to be clear, 316 00:16:40,036 --> 00:16:42,716 Speaker 1: what this allows you, or the people running the trial, 317 00:16:42,836 --> 00:16:46,796 Speaker 1: to do is to enroll fewer people in the placebo 318 00:16:46,916 --> 00:16:49,636 Speaker 1: arm, not none, but fewer than they otherwise would have, 319 00:16:49,716 --> 00:16:52,236 Speaker 1: to get the same amount of statistical power. Right, 320 00:16:52,316 --> 00:16:55,076 Speaker 1: that is the bottom line thing that you are delivering. Yes, 321 00:16:55,156 --> 00:16:57,956 Speaker 1: that's correct. And it's something like a quarter or a 322 00:16:58,076 --> 00:17:00,036 Speaker 1: third less, is that right? Yeah? 323 00:17:00,156 --> 00:17:03,956 Speaker 2: So it depends on how accurate our models are.
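Here is a toy numerical sketch of the recalibrate-then-compare logic just described. All the data and numbers are invented, and, as Fisher notes, the real method is a single joint analysis with proven statistical guarantees rather than the literal two steps shown; this is only to make the arithmetic concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: model-predicted vs. observed one-year disease progression.
pred_placebo = rng.normal(3.0, 1.0, size=50)               # digital-twin predictions
obs_placebo = pred_placebo + 1.0 + rng.normal(0, 0.5, 50)  # model underestimates by ~1 point

# "Recalibrate": fit the relationship between predicted and observed
# outcomes using only the patients who really received the placebo.
slope, intercept = np.polyfit(pred_placebo, obs_placebo, deg=1)

# Use the calibrated predictions as counterfactuals for the treated arm,
# then average observed-minus-predicted to estimate the treatment effect.
true_effect = -1.5                                         # invented: drug slows progression
pred_treated = rng.normal(3.0, 1.0, size=100)
obs_treated = pred_treated + 1.0 + true_effect + rng.normal(0, 0.5, 100)

counterfactual = slope * pred_treated + intercept
print(f"estimated treatment effect: {np.mean(obs_treated - counterfactual):.2f}")  # near -1.5
```

And a back-of-envelope version of how model accuracy translates into a smaller control arm, which comes up in the next exchange: under simplified assumptions, adjusting for a prognostic prediction that correlates with the outcome at rho shrinks the residual outcome variance by a factor of (1 - rho squared), which is roughly where the control-arm savings come from. The exact trial-design calculation is more involved than this.

```python
def approx_control_arm_savings(rho: float) -> float:
    """Approximate fractional reduction in control-arm size: residual
    variance scales like (1 - rho**2) under these toy assumptions."""
    return rho ** 2

for rho in (0.5, 0.6, 0.7):
    print(f"prediction-outcome correlation {rho:.1f} -> "
          f"~{100 * approx_control_arm_savings(rho):.0f}% fewer control patients")
# ~25% at rho = 0.5 and ~49% at rho = 0.7: roughly the quarter-to-half
# range discussed here, under these simplified assumptions.
```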
The 324 00:17:04,076 --> 00:17:06,516 Speaker 2: more accurate the model is, the fewer patients you need 325 00:17:06,556 --> 00:17:10,676 Speaker 2: in your placebo group. Sure. So typically, right now, it's 326 00:17:10,796 --> 00:17:13,716 Speaker 2: somewhere between, like, a quarter and fifty percent. It depends 327 00:17:13,836 --> 00:17:15,796 Speaker 2: on the specific details. 328 00:17:15,956 --> 00:17:19,236 Speaker 1: So tell me, what is the effect of that at 329 00:17:19,236 --> 00:17:21,476 Speaker 1: a macro scale? What does it mean to say a 330 00:17:21,596 --> 00:17:26,036 Speaker 1: drug company can get the same statistical power by enrolling 331 00:17:26,156 --> 00:17:30,196 Speaker 1: twenty five percent fewer people in their study, specifically in 332 00:17:30,276 --> 00:17:30,876 Speaker 1: the placebo arm? 333 00:17:31,916 --> 00:17:34,476 Speaker 2: Well, I think that there are two things. First is, 334 00:17:35,356 --> 00:17:39,316 Speaker 2: I think people don't always understand how expensive clinical trials 335 00:17:39,356 --> 00:17:43,116 Speaker 2: are. You know, companies are paying one hundred, sometimes two 336 00:17:43,236 --> 00:17:46,036 Speaker 2: hundred thousand dollars per patient in one of their clinical trials. 337 00:17:46,116 --> 00:17:49,196 Speaker 2: So finding and enrolling and monitoring a patient for all 338 00:17:49,236 --> 00:17:52,116 Speaker 2: that time is very, very expensive. It also just takes 339 00:17:52,116 --> 00:17:54,556 Speaker 2: a long time to find people who are willing to participate. 340 00:17:55,316 --> 00:17:58,036 Speaker 2: And so if you're talking about a large phase three trial, 341 00:17:58,276 --> 00:18:01,996 Speaker 2: reducing the size of the control group by twenty five percent, 342 00:18:02,076 --> 00:18:04,156 Speaker 2: that might mean like one hundred fewer patients that you 343 00:18:04,236 --> 00:18:06,916 Speaker 2: need to actually recruit and enroll in your study, and 344 00:18:07,276 --> 00:18:09,516 Speaker 2: that could be like a year. You know, 345 00:18:09,676 --> 00:18:11,996 Speaker 2: so you can save six months to a year off 346 00:18:12,036 --> 00:18:15,396 Speaker 2: of your total clinical trial timeline. That means a lot, right, 347 00:18:16,116 --> 00:18:19,436 Speaker 2: both for patients, if the drug is actually successful, 348 00:18:19,876 --> 00:18:24,636 Speaker 2: that's a year faster it gets to market, and, you know, 349 00:18:24,716 --> 00:18:27,276 Speaker 2: for the pharma company, that's obviously a big value proposition, 350 00:18:27,396 --> 00:18:29,116 Speaker 2: being able to get the drug to market a year faster. 351 00:18:35,716 --> 00:18:39,836 Speaker 1: In a minute, moving from clinical trials to individual patients. 352 00:18:47,396 --> 00:18:53,236 Speaker 1: Now back to the show. What's the 353 00:18:53,276 --> 00:18:55,076 Speaker 1: big picture? Where are you trying to get to, 354 00:18:55,516 --> 00:19:01,076 Speaker 1: you know, in the medium term and in the long term? 355 00:19:02,476 --> 00:19:06,796 Speaker 2: So, the ability to understand what a person's health outcome is 356 00:19:06,836 --> 00:19:09,356 Speaker 2: going to be under different scenarios, this is, I think, 357 00:19:09,396 --> 00:19:12,396 Speaker 2: what's really important. It's not just, hey, given that 358 00:19:12,436 --> 00:19:14,396 Speaker 2: they would get a placebo, what's going to happen to 359 00:19:14,436 --> 00:19:16,636 Speaker 2: their health outcomes?
That's nice for clinical trials, but we 360 00:19:16,716 --> 00:19:19,476 Speaker 2: want to know, hey, there's ten different treatment options for 361 00:19:19,556 --> 00:19:22,116 Speaker 2: this patient, and if I were to give them each 362 00:19:22,156 --> 00:19:24,436 Speaker 2: one of these different treatment options, what would their health 363 00:19:24,476 --> 00:19:26,356 Speaker 2: outcomes look like in those different scenarios? 364 00:19:27,276 --> 00:19:30,036 Speaker 1: So there you're also moving out of the clinical trial 365 00:19:30,596 --> 00:19:32,876 Speaker 1: into the realm of, like, a doctor seeing a patient. 366 00:19:32,996 --> 00:19:35,756 Speaker 1: Let's just be very clear, like, that's a huge leap, 367 00:19:36,076 --> 00:19:37,556 Speaker 1: and, like, that's what you're talking about. 368 00:19:37,796 --> 00:19:42,556 Speaker 2: I think that there's a really good pathway to being 369 00:19:42,676 --> 00:19:47,596 Speaker 2: able to build these things and make them useful for 370 00:19:47,996 --> 00:19:50,196 Speaker 2: problems that are at the individual patient level. 371 00:19:50,396 --> 00:19:52,516 Speaker 1: And is the narrow way to think about it, like, 372 00:19:53,236 --> 00:19:56,876 Speaker 1: before you get to the magical computer that can predict 373 00:19:56,916 --> 00:19:59,316 Speaker 1: everything for everybody, that you get to a very very 374 00:19:59,396 --> 00:20:04,196 Speaker 1: good model that can predict for individuals in certain circumstances 375 00:20:04,236 --> 00:20:06,116 Speaker 1: a certain set of outcomes? So, for example, you might 376 00:20:06,156 --> 00:20:09,636 Speaker 1: have a very very good Alzheimer's model for certain patients 377 00:20:10,156 --> 00:20:12,996 Speaker 1: at a certain stage of disease. This model is very 378 00:20:13,156 --> 00:20:15,396 Speaker 1: powerful at the level of the individual. Is that the 379 00:20:15,436 --> 00:20:18,036 Speaker 1: way to think about it? 380 00:20:18,036 --> 00:20:19,956 Speaker 2: Yeah, I'll tell you the way I think about it. I think that the 381 00:20:20,076 --> 00:20:23,316 Speaker 2: most important thing that models can do, which actually things 382 00:20:23,396 --> 00:20:26,716 Speaker 2: like ChatGPT are not good at, is that 383 00:20:26,796 --> 00:20:33,476 Speaker 2: they can give you really well calibrated estimates of their 384 00:20:33,556 --> 00:20:37,956 Speaker 2: own confidence. That's the most important thing that a model 385 00:20:37,996 --> 00:20:43,196 Speaker 2: can do, because, like we said earlier, health is stochastic. 386 00:20:43,556 --> 00:20:49,356 Speaker 2: There are all kinds of things that happen; it's fundamental, exactly right. 387 00:20:50,356 --> 00:20:52,796 Speaker 2: And so, you know, we're going to make a prediction 388 00:20:53,036 --> 00:20:56,196 Speaker 2: about somebody in the future, and sometimes we're going to 389 00:20:56,196 --> 00:20:58,636 Speaker 2: be really confident in that prediction, and then it's actionable, 390 00:20:59,836 --> 00:21:02,836 Speaker 2: but sometimes you're not. You're not confident, and 391 00:21:02,956 --> 00:21:06,356 Speaker 2: maybe it's not actionable because you're really unconfident. And 392 00:21:06,476 --> 00:21:08,396 Speaker 2: we're never going to get to the point that it's 393 00:21:08,396 --> 00:21:10,396 Speaker 2: going to say, hey, you're going to have a heart 394 00:21:10,436 --> 00:21:15,196 Speaker 2: attack on July seventeenth of twenty thirty-seven. It's like,
395 00:21:15,236 --> 00:21:17,636 Speaker 2: it's never going to be that detailed. But the 396 00:21:17,876 --> 00:21:21,996 Speaker 2: key question is, can you believe the model's estimates of 397 00:21:22,076 --> 00:21:25,036 Speaker 2: its own confidence? And if you can, then, when 398 00:21:25,076 --> 00:21:27,396 Speaker 2: it is confident, you can act on it, and when 399 00:21:27,436 --> 00:21:29,756 Speaker 2: it's not confident, you can do other things. And so 400 00:21:29,836 --> 00:21:32,756 Speaker 2: it's actually a really key technical thing, 401 00:21:32,836 --> 00:21:34,156 Speaker 2: and we know that's what we need to work on. 402 00:21:34,796 --> 00:21:36,836 Speaker 1: If I were going to anthropomorphize it, I'd be like, 403 00:21:36,876 --> 00:21:38,956 Speaker 1: it's like a, it's like a humility. It's like an 404 00:21:38,956 --> 00:21:41,436 Speaker 1: epistemic humility. Like, it knows what it doesn't know. 405 00:21:41,716 --> 00:21:43,436 Speaker 2: It knows what it doesn't know, and it will tell 406 00:21:43,476 --> 00:21:48,916 Speaker 2: you, like, yeah, here's my prediction, but... yeah, exactly. 407 00:21:49,276 --> 00:21:50,996 Speaker 2: So if you can get it to that point where, 408 00:21:51,036 --> 00:21:54,396 Speaker 2: where it's well calibrated that way, then they 409 00:21:54,476 --> 00:21:58,476 Speaker 2: become really really useful for a whole bunch of things. 410 00:21:59,116 --> 00:22:00,036 Speaker 2: And it's not going to say... 411 00:21:59,916 --> 00:22:03,076 Speaker 1: They become really useful if they can have a relatively high 412 00:22:03,116 --> 00:22:06,236 Speaker 1: degree of certainty about at least some things, right? Yeah, just... 413 00:22:06,276 --> 00:22:11,236 Speaker 2: Like, yeah, it's not very coarse, yeah, but exactly so. 414 00:22:11,996 --> 00:22:15,596 Speaker 2: I think that that's the most important thing for these 415 00:22:16,116 --> 00:22:19,556 Speaker 2: applications of AI in medicine: to have models that 416 00:22:19,636 --> 00:22:21,556 Speaker 2: are going to be able to do that effectively.
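One standard way to test the "epistemic humility" being described is a coverage check: if the model claims 80 percent confidence intervals, real outcomes should land inside them about 80 percent of the time. Here is a toy sketch with invented data; this is a generic calibration check, not a description of Unlearn's internal tooling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: the model gives each patient a point prediction plus a
# claimed uncertainty; we then observe the real outcome.
n = 1000
mu = rng.normal(0, 1, n)           # model's point predictions
claimed_sigma = 1.0                # model's stated uncertainty
low = mu - 1.28 * claimed_sigma    # nominal 80% interval (z of about 1.28)
high = mu + 1.28 * claimed_sigma
outcomes = mu + rng.normal(0, 1.0, n)  # here, reality matches the claim

coverage = np.mean((outcomes >= low) & (outcomes <= high))
print(f"empirical coverage of nominal 80% intervals: {coverage:.1%}")
# Well calibrated: prints roughly 80%. If real outcomes were noisier than
# the model claims, coverage would fall short, and the model's confidence
# could not be acted on.
```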
417 00:22:22,716 --> 00:22:25,596 Speaker 1: If everything goes well, what problem will you be trying 418 00:22:25,636 --> 00:22:26,836 Speaker 1: to solve in five years? 419 00:22:28,196 --> 00:22:30,636 Speaker 2: In five years, I hope that we are rolling out 420 00:22:31,756 --> 00:22:35,596 Speaker 2: something that is a model for everything. That's what we 421 00:22:35,676 --> 00:22:37,596 Speaker 2: want to be rolling out, not this one disease at 422 00:22:37,636 --> 00:22:39,756 Speaker 2: a time thing, but one model for all disease. And 423 00:22:39,876 --> 00:22:42,916 Speaker 2: the reason why I really want to do this is 424 00:22:43,076 --> 00:22:45,996 Speaker 2: because if it's one model per disease, I need a 425 00:22:46,116 --> 00:22:48,916 Speaker 2: ton of data on that disease, a ton. So we 426 00:22:48,996 --> 00:22:50,956 Speaker 2: can work on these areas like Alzheimer's, where I can 427 00:22:50,956 --> 00:22:53,636 Speaker 2: get data from fifty thousand patients. But how do I 428 00:22:53,756 --> 00:22:56,676 Speaker 2: work on the disease where I have fifty patients, fifty 429 00:22:56,716 --> 00:22:58,756 Speaker 2: patients in the world, who have this rare disease? Those 430 00:22:58,756 --> 00:23:01,996 Speaker 2: are really really important things. And the only way that 431 00:23:02,076 --> 00:23:03,556 Speaker 2: we're going to be able to do that is to 432 00:23:03,676 --> 00:23:06,716 Speaker 2: unlock a new kind of capability in our models, to 433 00:23:06,876 --> 00:23:11,036 Speaker 2: learn from a handful of examples. And so this is, 434 00:23:11,356 --> 00:23:14,916 Speaker 2: this is, to me, the next frontier for our work: 435 00:23:15,596 --> 00:23:18,116 Speaker 2: figuring out how we can do that and then 436 00:23:18,316 --> 00:23:21,276 Speaker 2: bring that to market, because it opens up the ability 437 00:23:21,396 --> 00:23:24,716 Speaker 2: to work on rare diseases that are really really important 438 00:23:24,756 --> 00:23:29,236 Speaker 2: but very difficult to develop drugs for. And 439 00:23:29,316 --> 00:23:31,636 Speaker 2: again, you know, as a scientist, I'm drawn 440 00:23:31,676 --> 00:23:34,076 Speaker 2: to the technical challenges. Those are the things that... 441 00:23:34,236 --> 00:23:36,876 Speaker 1: It seems so hard, right? I mean, it seems like 442 00:23:37,516 --> 00:23:42,756 Speaker 1: this really basic insight about generative models is that, like, 443 00:23:44,276 --> 00:23:47,196 Speaker 1: you feed them gigantic amounts of data. You know, for a language model, 444 00:23:47,236 --> 00:23:48,996 Speaker 1: you feed it the whole internet; that's the way to 445 00:23:49,076 --> 00:23:52,716 Speaker 1: get it to understand how language works. And so how, 446 00:23:53,116 --> 00:23:55,116 Speaker 1: how can you do something for fifty people? 447 00:23:55,316 --> 00:23:55,356 Speaker 2: Like? 448 00:23:55,836 --> 00:23:57,956 Speaker 1: How, how do you do that in five years? 449 00:23:58,436 --> 00:24:02,076 Speaker 2: Yeah, it's really hard. But the analogy is actually perfect. Okay, 450 00:24:02,596 --> 00:24:05,036 Speaker 2: what we've learned is that 451 00:24:05,156 --> 00:24:08,156 Speaker 2: if you want to build a really amazing language model 452 00:24:08,716 --> 00:24:13,076 Speaker 2: that's really specific to some domain, so you only want 453 00:24:13,116 --> 00:24:16,396 Speaker 2: a language model that's really good at biophysics, it knows 454 00:24:16,476 --> 00:24:19,636 Speaker 2: biophysics really well, would you be better off training a 455 00:24:19,716 --> 00:24:22,116 Speaker 2: model trying to find as much biophysics as you can 456 00:24:22,196 --> 00:24:24,556 Speaker 2: and training it on that, or just training a model 457 00:24:24,556 --> 00:24:27,276 Speaker 2: on the entire internet? And what we've learned is it's much 458 00:24:27,316 --> 00:24:29,676 Speaker 2: better to train a model on the entire internet, that 459 00:24:29,756 --> 00:24:33,996 Speaker 2: there's a lot of things that transfer from one domain 460 00:24:34,156 --> 00:24:36,876 Speaker 2: to another. And so what we can do now is 461 00:24:36,916 --> 00:24:38,836 Speaker 2: say we train the model on the whole Internet, and 462 00:24:38,916 --> 00:24:42,436 Speaker 2: we have one biophysics paper, and we give it that 463 00:24:42,596 --> 00:24:46,276 Speaker 2: one or two papers on the background of all of 464 00:24:46,396 --> 00:24:49,156 Speaker 2: the knowledge from everywhere else, and that's much better than 465 00:24:49,196 --> 00:24:51,636 Speaker 2: trying to get lots and lots of biophysics papers. So 466 00:24:51,716 --> 00:24:54,956 Speaker 2: the analogy works perfectly, in the exact same direction. That's 467 00:24:54,996 --> 00:24:56,516 Speaker 2: the whole point.
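A schematic sketch of the "pretrain on everything, then learn a rare disease from a handful of examples" idea in that analogy: a frozen feature extractor stands in for broad pretraining, and only a tiny regression head is fit on the fifty rare-disease patients. The shapes, names, and data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for representations learned during broad pretraining; in the
# analogy, this is the "whole internet" knowledge that transfers across domains.
W_pretrained = rng.normal(0, 1, (10, 32))

def pretrained_features(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_pretrained)  # frozen: not updated on the rare disease

# The rare-disease dataset: fifty patients, ten measurements each (invented).
X_rare = rng.normal(0, 1, (50, 10))
y_rare = rng.normal(0, 1, 50)

# Fit only a small ridge-regression "head" on the frozen features; with so
# few patients, adapting a pretrained model like this is far more
# data-efficient than training a full model from scratch.
feats = pretrained_features(X_rare)
lam = 1.0
head = np.linalg.solve(feats.T @ feats + lam * np.eye(32), feats.T @ y_rare)
predictions = feats @ head
```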
We want to be able to take 468 00:24:56,676 --> 00:24:59,636 Speaker 2: all of the world's... Imagine taking a model that has 469 00:24:59,876 --> 00:25:02,916 Speaker 2: all of the world's health data and putting all of 470 00:25:03,036 --> 00:25:05,516 Speaker 2: that into one, so it's seen everything and it can 471 00:25:05,556 --> 00:25:08,116 Speaker 2: now draw analogies, because there's a lot of things. 472 00:25:08,196 --> 00:25:11,076 Speaker 2: You think about, like, Parkinson's and Alzheimer's: they have a 473 00:25:11,116 --> 00:25:14,956 Speaker 2: lot of similarities. Huntington's, a lot of similarities. So why 474 00:25:14,996 --> 00:25:18,276 Speaker 2: aren't we drawing kind of information or knowledge from one 475 00:25:18,356 --> 00:25:20,876 Speaker 2: disease area and using it to inform another, because they 476 00:25:20,916 --> 00:25:24,236 Speaker 2: are similar? And so allowing a model to have 477 00:25:24,476 --> 00:25:26,716 Speaker 2: access to all of the data and figure out how 478 00:25:26,756 --> 00:25:28,796 Speaker 2: to do it, I think, is the right path forward. 479 00:25:29,596 --> 00:25:30,156 Speaker 2: So is that... 480 00:25:32,676 --> 00:25:35,996 Speaker 1: Wildly capital intensive? Like, what do you actually do to 481 00:25:36,116 --> 00:25:38,036 Speaker 1: do that? You just get all the health data about 482 00:25:38,076 --> 00:25:40,236 Speaker 1: all the people you can and say to the machine, 483 00:25:40,316 --> 00:25:41,996 Speaker 1: figure it out? Like, what do you do? 484 00:25:43,516 --> 00:25:47,356 Speaker 2: Yeah, yes. I mean, the first step for us is 485 00:25:48,236 --> 00:25:50,916 Speaker 2: you need to get a lot of data. The biggest 486 00:25:50,996 --> 00:25:53,236 Speaker 2: thing is that we need to figure out a way 487 00:25:54,596 --> 00:25:58,676 Speaker 2: to have the model map all of those data to 488 00:25:58,796 --> 00:25:59,916 Speaker 2: the same representation. 489 00:26:00,476 --> 00:26:02,556 Speaker 1: What does that mean, map all of those data to 490 00:26:02,636 --> 00:26:03,596 Speaker 1: the same representation? 491 00:26:04,276 --> 00:26:09,996 Speaker 2: So let's imagine that there is some unobservable state of 492 00:26:10,076 --> 00:26:13,596 Speaker 2: a person which just describes their health. We can't actually 493 00:26:13,636 --> 00:26:16,516 Speaker 2: observe it directly. We don't exactly know what it is, 494 00:26:16,916 --> 00:26:19,236 Speaker 2: but we can make these measurements of it that tell 495 00:26:19,356 --> 00:26:23,356 Speaker 2: us something about that underlying state. So I can measure BMI, 496 00:26:23,476 --> 00:26:25,436 Speaker 2: I can measure heart rate, I can 497 00:26:25,556 --> 00:26:27,796 Speaker 2: measure all of these different things. And what 498 00:26:27,956 --> 00:26:29,956 Speaker 2: we want to be able to do is, instead of 499 00:26:30,156 --> 00:26:32,076 Speaker 2: working in the world of measurements, which is where we 500 00:26:32,156 --> 00:26:34,396 Speaker 2: currently work, we want to be able to work at 501 00:26:34,436 --> 00:26:37,756 Speaker 2: that underlying unobservable state, because if you can, if you 502 00:26:37,796 --> 00:26:39,916 Speaker 2: can see that, if you could reach through into that 503 00:26:40,156 --> 00:26:43,236 Speaker 2: underlying state, you can answer any question about any patient's health.
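A schematic sketch of the shared-representation idea just described: heterogeneous measurements (BMI, heart rate, a cognitive score) each get mapped into one latent "health state" space, so patients measured in different ways become comparable to a single model. The random linear encoders here are purely illustrative; a real system would learn them from data.

```python
import numpy as np

rng = np.random.default_rng(3)
LATENT_DIM = 64  # dimensionality of the shared, unobservable health state

# One (invented) encoder per measurement type, all targeting the same space.
encoders = {
    "bmi": rng.normal(0, 0.1, LATENT_DIM),
    "heart_rate": rng.normal(0, 0.1, LATENT_DIM),
    "cognitive_score": rng.normal(0, 0.1, LATENT_DIM),
}

def encode(measurements: dict) -> np.ndarray:
    """Map whatever measurements are available into the shared latent space."""
    z = np.zeros(LATENT_DIM)
    for name, value in measurements.items():
        z += value * encoders[name]  # missing measurements contribute nothing
    return z

# Two patients measured differently still land in one comparable space.
z1 = encode({"bmi": 27.0, "heart_rate": 72.0})
z2 = encode({"cognitive_score": 24.0, "heart_rate": 68.0})
cos = float(z1 @ z2) / float(np.linalg.norm(z1) * np.linalg.norm(z2))
print(f"cosine similarity in the latent space: {cos:.2f}")
```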
504 00:26:43,116 --> 00:26:46,676 Speaker 1: Like, like, like a number? Like this one 505 00:26:46,756 --> 00:26:48,156 Speaker 1: state that is just, like, one... 506 00:26:48,476 --> 00:26:56,236 Speaker 2: High dimensional. It's high dimensional, right. Well, okay, yeah, 507 00:26:56,316 --> 00:26:58,436 Speaker 2: so, I mean, yeah, but basically what we're talking about is, is there 508 00:26:58,556 --> 00:27:01,796 Speaker 2: some vector, some really high dimensional space, where we're able 509 00:27:01,876 --> 00:27:04,876 Speaker 2: to take all diseases and look at how they're 510 00:27:04,916 --> 00:27:07,076 Speaker 2: related to each other in this really high dimensional space? 511 00:27:07,716 --> 00:27:10,036 Speaker 2: That is the way language models work. That's exactly how they work. 512 00:27:10,116 --> 00:27:15,316 Speaker 1: I love it. And that's intense. Like, that's pretty far out, right? 513 00:27:15,396 --> 00:27:16,676 Speaker 1: Doesn't that feel far out to you? 514 00:27:17,236 --> 00:27:20,396 Speaker 2: You would say I talk like a hippie. But if 515 00:27:20,436 --> 00:27:22,876 Speaker 2: I describe this to a machine learning researcher, they're like, 516 00:27:23,036 --> 00:27:26,596 Speaker 2: that sounds exactly like what you should do. So it 517 00:27:26,676 --> 00:27:29,236 Speaker 2: doesn't seem far out to me. It seems, it seems 518 00:27:29,356 --> 00:27:31,476 Speaker 2: very clear that that's the direction that we should be 519 00:27:31,516 --> 00:27:32,036 Speaker 2: taking things. 520 00:27:32,396 --> 00:27:34,636 Speaker 1: And does five years seem like, like you might 521 00:27:34,716 --> 00:27:36,076 Speaker 1: actually do it in five years? 522 00:27:36,716 --> 00:27:40,076 Speaker 2: Yeah, we're hoping to be able to have a 523 00:27:40,236 --> 00:27:43,196 Speaker 2: version next year that's a pan-neuroscience model. So we're 524 00:27:43,236 --> 00:27:47,516 Speaker 2: starting with, so we're starting with something 525 00:27:47,796 --> 00:27:50,356 Speaker 2: more tractable, build a more tractable thing. So right now 526 00:27:50,436 --> 00:27:53,636 Speaker 2: we're working on a neuroscience model. So we're hoping, I 527 00:27:53,676 --> 00:27:55,916 Speaker 2: mean, to be totally honest, this might not work. This 528 00:27:56,356 --> 00:27:58,756 Speaker 2: is a research idea, right? So it may work, it 529 00:27:58,836 --> 00:28:00,716 Speaker 2: might not work. But you asked where I would 530 00:28:00,716 --> 00:28:02,196 Speaker 2: hope to be. That's where I hope to be, is 531 00:28:02,276 --> 00:28:04,236 Speaker 2: that we're able to solve those problems. 532 00:28:08,076 --> 00:28:09,796 Speaker 1: So we'll be back in a minute with the Lightning Round, 533 00:28:09,996 --> 00:28:12,556 Speaker 1: including what Charles learned when he worked as an ice 534 00:28:12,596 --> 00:28:22,476 Speaker 1: hockey ref. Back to the show. I'm going to finish 535 00:28:22,556 --> 00:28:24,436 Speaker 1: with the Lightning Round. It will just be a few more minutes. 536 00:28:24,716 --> 00:28:24,996 Speaker 2: Okay. 537 00:28:26,236 --> 00:28:30,436 Speaker 1: As the name suggests. I've heard you say that you 538 00:28:30,596 --> 00:28:34,516 Speaker 1: read academic preprints, which is basically studies that are about 539 00:28:34,556 --> 00:28:37,516 Speaker 1: to be published, that you read them every day. What's 540 00:28:37,556 --> 00:28:39,636 Speaker 1: one you read recently that you found particularly interesting?
541 00:28:41,076 --> 00:28:44,276 Speaker 2: Recently? There have been a number of papers that I've 542 00:28:44,316 --> 00:28:50,556 Speaker 2: been reading around different ways of training the kind 543 00:28:50,556 --> 00:28:54,436 Speaker 2: of neural networks that we use. All of them use 544 00:28:54,556 --> 00:28:58,076 Speaker 2: a particular algorithm that people call Adam. It's been used 545 00:28:58,116 --> 00:29:01,356 Speaker 2: for a really long time, like everyone uses it now, 546 00:29:02,556 --> 00:29:06,116 Speaker 2: and it has, I don't know, it has some problems. 547 00:29:06,316 --> 00:29:08,356 Speaker 2: There's a paper from just really recently on a 548 00:29:08,436 --> 00:29:10,636 Speaker 2: new algorithm people call Lion. I don't know what it 549 00:29:10,716 --> 00:29:13,396 Speaker 2: stands for. L-I-O-N stands for something. And 550 00:29:13,556 --> 00:29:17,556 Speaker 2: this was discovered... so they used a machine learning, 551 00:29:17,596 --> 00:29:20,756 Speaker 2: a reinforcement learning algorithm, to discover a new kind 552 00:29:20,836 --> 00:29:21,516 Speaker 2: of optimizer. 553 00:29:22,436 --> 00:29:26,316 Speaker 1: So if this works, if Lion is better than Adam, 554 00:29:26,436 --> 00:29:29,716 Speaker 1: will it be like machine learning figuring out a better 555 00:29:29,796 --> 00:29:32,476 Speaker 1: way to build machine learning? Is that what's happening here? 556 00:29:32,756 --> 00:29:34,436 Speaker 2: Yeah, that's what people are working on exactly. 557 00:29:34,716 --> 00:29:36,796 Speaker 1: This is like the takeoff. This is like the moment 558 00:29:36,836 --> 00:29:39,716 Speaker 1: when GPT five builds GPT six or whatever. 559 00:29:39,916 --> 00:29:42,196 Speaker 2: I think the claim is it's like five percent better or something. 560 00:29:42,196 --> 00:29:42,956 Speaker 2: It's not, it's not... 561 00:29:44,716 --> 00:29:46,596 Speaker 1: Yes, Lion couldn't find the... 562 00:29:46,676 --> 00:29:49,836 Speaker 2: ...time, another thing yet. Yeah. So, yeah, that was a 563 00:29:49,876 --> 00:29:51,036 Speaker 2: paper I read really recently. 564 00:29:52,356 --> 00:29:54,676 Speaker 1: If you couldn't work in AI, what field would you 565 00:29:54,716 --> 00:29:54,916 Speaker 1: work in? 566 00:29:58,636 --> 00:30:03,756 Speaker 2: If I couldn't work in AI? Uh, I guess I 567 00:30:03,796 --> 00:30:09,356 Speaker 2: would probably try to work in energy, maybe climate 568 00:30:09,476 --> 00:30:11,676 Speaker 2: change, something related to that. 569 00:30:12,236 --> 00:30:14,476 Speaker 1: You seem bummed at the prospect of not being 570 00:30:14,516 --> 00:30:16,636 Speaker 1: able to work in AI. I appreciate that. I don't 571 00:30:16,636 --> 00:30:16,996 Speaker 1: want to make it... 572 00:30:17,076 --> 00:30:20,556 Speaker 2: I'm very bummed. Yeah, you know, I think it's the 573 00:30:20,676 --> 00:30:25,196 Speaker 2: most exciting thing that's happened on Earth since the Industrial Revolution. 574 00:30:25,276 --> 00:30:27,636 Speaker 2: So it's a new industrial revolution. Yeah. 575 00:30:28,356 --> 00:30:31,756 Speaker 1: Weirdly, you used to work at a virtual reality hardware company. 576 00:30:33,796 --> 00:30:36,996 Speaker 1: I feel like VR is always about to break through, 577 00:30:37,316 --> 00:30:39,716 Speaker 1: you know, like Apple just had this big announcement, 578 00:30:39,756 --> 00:30:42,236 Speaker 1: Facebook did a while ago, but yet it never 579 00:30:42,396 --> 00:30:45,956 Speaker 1: quite happens.
Why not? Like, why are we not doing 580 00:30:45,996 --> 00:30:47,076 Speaker 1: this interview in the metaverse? 581 00:30:48,396 --> 00:30:51,436 Speaker 2: So I only worked at that company for a few months. 582 00:30:52,036 --> 00:30:56,276 Speaker 2: I spent my whole career working in biophysics. I moved 583 00:30:56,356 --> 00:30:58,996 Speaker 2: to Pfizer. I was working at Pfizer, and then I 584 00:30:59,076 --> 00:31:02,116 Speaker 2: was just like, I'm gonna try something totally different, 585 00:31:02,676 --> 00:31:05,196 Speaker 2: and I went and tried this work at the VR company. 586 00:31:05,676 --> 00:31:08,516 Speaker 2: I was interested in that because of the underlying technical 587 00:31:08,556 --> 00:31:10,956 Speaker 2: problems, the research that I had to do, not because I 588 00:31:11,116 --> 00:31:15,276 Speaker 2: was drawn to the product. I have only ever used 589 00:31:15,316 --> 00:31:20,396 Speaker 2: a virtual reality headset twice in my entire life. Once was 590 00:31:20,476 --> 00:31:23,156 Speaker 2: in the interview for that job, and once was testing 591 00:31:23,276 --> 00:31:26,316 Speaker 2: something while I was working at that job. I'm not 592 00:31:26,556 --> 00:31:28,996 Speaker 2: interested in it; I was 593 00:31:29,076 --> 00:31:31,196 Speaker 2: interested in the engineering. So you want to know why 594 00:31:31,276 --> 00:31:34,156 Speaker 2: I don't think it's taken off? It's because most people 595 00:31:34,236 --> 00:31:37,876 Speaker 2: don't have a compelling reason to use it. Neither do I. Yeah. 596 00:31:38,476 --> 00:31:41,396 Speaker 2: What'd you learn working as an ice hockey referee? Ice 597 00:31:41,436 --> 00:31:45,236 Speaker 2: hockey referee? Oh, that was like my super super young job. 598 00:31:47,356 --> 00:31:50,596 Speaker 2: I would say that I learned it's best not to 599 00:31:50,716 --> 00:31:57,396 Speaker 2: call penalties on little children. That's what I learned. You know, 600 00:31:57,516 --> 00:31:59,516 Speaker 2: people would just, like, run into each other and 601 00:31:59,556 --> 00:32:01,396 Speaker 2: they'd fall down. You're like, is that a penalty? Was 602 00:32:01,436 --> 00:32:03,916 Speaker 2: it on purpose? Not on purpose? If you call a penalty, 603 00:32:04,196 --> 00:32:05,876 Speaker 2: the parents are going to be real upset at you. 604 00:32:05,956 --> 00:32:07,396 Speaker 2: So you just, just let them play. 605 00:32:07,836 --> 00:32:10,476 Speaker 1: Good early experience with cost-benefit analysis. 606 00:32:10,716 --> 00:32:11,396 Speaker 2: Just let them play. 607 00:32:17,156 --> 00:32:20,636 Speaker 1: Charles Fisher is the co-founder and CEO of Unlearn. 608 00:32:21,396 --> 00:32:24,876 Speaker 1: Today's show was edited by Sarah Nis, produced by Gabriel 609 00:32:24,996 --> 00:32:29,796 Speaker 1: Hunter Chang and Edith Russlo, and engineered by Amanda K. Wong. 610 00:32:30,076 --> 00:32:33,036 Speaker 1: I'm Jacob Goldstein. One last note: the show is going 611 00:32:33,116 --> 00:32:35,236 Speaker 1: to be off for the next several weeks, and we'll 612 00:32:35,276 --> 00:32:38,596 Speaker 1: be back with new episodes in August. Have a rad summer.