Speaker 1: This is Masters in Business with Barry Ritholtz on Bloomberg Radio.

Speaker 2: This week on the podcast, strap yourself in, I have another extra special guest. Jon McAuliffe is co-founder and chief investment officer at the Voleon Group. They're a five-billion-dollar hedge fund and one of the earliest shops to ever use machine learning as it applies to trading and investment management decisions. It is a fully systematic approach to using computer horsepower and databases and machine learning and their own predictive engine to make investments and trades, and it's managed to put together quite a track record. Previously, Jon was at D. E. Shaw, where he ran statistical arbitrage. He is one of the people who worked on the Amazon recommendation engine, and he is currently a professor of statistics at Berkeley. I don't even know where to begin, other than to say, if you're interested in AI or machine learning or quantitative strategies, this is just a masterclass in how it's done by one of the first people in the space to not only do this sort of machine learning and apply it to investing, but one of the best. I think this is a fascinating conversation and I believe you will find it to be so as well. With no further ado, my discussion with the Voleon Group's Jon McAuliffe. Jon McAuliffe, welcome to Bloomberg.

Speaker 1: Thanks, Barry. I'm really happy to be here.

Speaker 2: So let's talk a little bit about your academic background first. You start out undergrad in computer science and applied mathematics at Harvard before you go on to get a PhD from the University of California, Berkeley. What led to a career in data analysis? How early did you know that's what you wanted to do?

Speaker 1: Well, it was a winding path, actually. I was very interested in international relations and foreign languages when I was finishing high school. In fact, I spent the last year of high school as an exchange student in Germany. And so when I got to college, I was expecting to major in government and go on to maybe work in the foreign service, something like that.
Speaker 2: Really? So this is a big shift from your original expectations.

Speaker 1: Yeah, it took about one semester for me to realize that none of the questions that were being asked in my classes had definitive and correct answers.

Speaker 2: Did that frustrate you?

Speaker 1: It did frustrate me, yeah. And so I stayed home over winter, I stayed, excuse me, I didn't go home. I stayed at college over winter break to try to sort out what the heck I was going to do, because I could see that my plan was in disarray. And I'd always been interested in computers, had played around with computers, never done anything very serious, but I thought I might as well give it a shot, and so in the spring semester I took my first computer science course. And when you write software, everything has a right answer. It either does what you want it to do or...

Speaker 2: It doesn't. It does not compile. Exactly. So that's really quite fascinating. So what led you from Berkeley to D. E. Shaw? They're one of the first quant shops. How did you get there? What sort of research did you do?

Speaker 1: Yeah, actually I spent time at D. E. Shaw in between my undergrad and my PhD program, so it was after Harvard that I went to D. E. Shaw.

Speaker 2: Did that light an interest in using machine learning and computers applied to finance, or what was that experience like?
Speaker 1: Yeah, it made me really interested in and excited about using statistical thinking and data analysis to sort of understand the dynamics of securities prices. Machine learning did not really play a role at that time, I think, not at D. E. Shaw, but, you know, probably nowhere. It was too immature a field in the nineties. But I had already been curious and interested in using these kinds of statistical tools in trading and in investing when I was finishing college, and then at D. E. Shaw, you know, I had brilliant colleagues and we were working on hard problems. So I really, I really got a lot out of it.

Speaker 2: It's still one of the top-performing hedge funds, one of the earliest quant hedge funds, a great, a great place to absolutely cut your teeth at. So was it Harvard, D. E. Shaw, and then Berkeley?

Speaker 1: Yeah, that's right.

Speaker 2: And then from Berkeley, how did you end up at Amazon?

Speaker 1: I guess I should correct myself. There was a year at Amazon after D. E. Shaw, but before Berkeley.

Speaker 2: And am I reading this correctly? The recommendation engine that Amazon uses, you helped develop?

Speaker 1: I would say I worked on it. You know, it existed, it was in place when I got there, and sort of the things that are familiar about the recommendation engine had already been built by my manager and his colleagues. But I worked, I did research on improvements and different ways of forming recommendations. It was funny because at the time, the entire database of purchase history for all of Amazon fit in one twenty-gigabyte file on a disk, so I could just load it on my computer and run.

Speaker 2: Now I don't think we could do that anymore. We could not. So, thank goodness for Amazon cloud services, so you could put, what is it, twenty-five years and hundreds of billions of dollars of transactions. So my assumption is products like that are highly iterative. The first version is all right, it does a half-decent job, and then it gets better, and then it starts to get almost spookily good. It's like, oh, how much of that is just the size of the database, and how much of that is just a clever algorithm?
Speaker 1: Well, that's a great question, because the two are inextricably linked. The way that you make algorithms great is by making them more powerful, more expressive, able to describe lots of different kinds of patterns and relationships. But those kinds of approaches need huge amounts of data in order to correctly sort out what's signal and what's noise. The more expressive a tool like that is, like a recommender system, the more prone it is to mistake one-time noise for persistent signal, and that is a recurring theme in statistical prediction. It is really the central problem in statistical prediction. So you have it in recommender systems, you have it in predicting price action, in the problems that we solve, and elsewhere.

Speaker 2: There was a pretty infamous New York Times article a couple of years ago about Target using their own recommender system and sending out maternity things to people. A dad gets his young teenage daughter's, "what is this?", and goes in to yell at them, and it turns out she was pregnant and they had pieced it together. How far of a leap is it from these systems to much more sophisticated machine learning and even large language models?

Speaker 1: The answer, it turns out, is that it's a question of scale. That wasn't at all obvious before GPT-3 and ChatGPT. But it just turned out that when you have, for example, GPT is built from a database of sentences in English, it's got a trillion words in it, that database, and when you take a trillion words and you use it to fit a model that has one hundred and seventy-five billion parameters, there is apparently a kind of transition where things become, you know, frankly astounding. I don't think that anybody who isn't astounded is telling the truth.

Speaker 2: Right. It's eerie in terms of how sophisticated it is, but it's also kind of surprising in terms of, I guess, what the programs like to call hallucinations. I guess if you're using the Internet as your base model, hey, there's one or two things on the Internet that are wrong, so of course that's going to show up in something like ChatGPT.
Speaker 1: Yeah, you know, underlying it, there's this tool, GPT-3, that's really the engine that powers ChatGPT, and that tool has one goal. It's a simple goal. You show it the beginning of a sentence, and it predicts the next word in the sentence, and that's all it is trained to do. I mean, it really is actually that simple.

Speaker 2: It's a dumb program that looks smart.

Speaker 1: If you like. But the thing about predicting the next word in a sentence is that whether the sequence of words being output is leading to something that is true or false is irrelevant. The only thing that it is trained to do is make highly accurate predictions of next words.
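To make the next-word objective concrete, here is a deliberately tiny sketch in Python: it builds a word-following frequency table from a made-up corpus and predicts whichever word most often came next. It is only an illustration of the training goal described above; GPT-3 does this job with a transformer network and one hundred seventy-five billion parameters, not a lookup table, and nothing here checks whether the continuation is true.

```python
from collections import Counter, defaultdict

# A made-up stand-in corpus; GPT-3's training data is on the order of a trillion words.
corpus = (
    "the market opened higher . the market opened lower . "
    "the market closed mixed . the fund closed a position ."
)

# Count how often each word follows each preceding word.
followers = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    followers[prev][nxt] += 1

def predict_next_word(prompt: str) -> str:
    """Return the word that most often followed the prompt's last word in the corpus."""
    last = prompt.split()[-1]
    candidates = followers.get(last)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

# The only objective is guessing the next word; whether the continuation is true never enters.
print(predict_next_word("the market"))  # "opened", the most frequent follower of "market"
print(predict_next_word("the fund"))    # "closed"
```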
Speaker 2: So when I said it's really very sophisticated, it's just, we tend to call this artificial intelligence, but I've read a number of people who said, hey, this really isn't AI, this is something a little more rudimentary.

Speaker 1: Yeah, I think, you know, a critic would say that artificial intelligence is a complete misnomer, that there's sort of nothing remotely intelligent in the colloquial sense about these systems. And then a common defense in AI research is that artificial intelligence is a moving target. As soon as you build a system that does something quasi-magical that was the old yardstick of intelligence, then the goalposts get moved by the people who are supplying the evaluations. And I guess I would sit somewhere in between. I think the language is unfortunate because it's so easily misconstrued. I wouldn't call the system dumb, and I wouldn't call it smart. You know, those are not characteristics of these systems.

Speaker 2: But it's complex and sophisticated.

Speaker 1: It certainly is. It has one hundred and seventy-five billion parameters. If that doesn't fit your definition of complex, you know, what would?

Speaker 2: Yeah, that works for me. So in your career line, where is Affymetrix, and what was that recommendation engine like?

Speaker 1: Yeah, so that was work I did as a summer research intern during my PhD. And that work was about, the problem is called genotype calling. So genotype calling, I'll explain. Barry, do you have an identical twin?

Speaker 2: I do not.

Speaker 1: Okay, so I can safely say your genome is unique in the world. There's no one else who has exactly your genome. On the other hand, if you were to lay your genome and mine alongside each other, lined up, they would be ninety-nine point nine percent identical. About one position in a thousand is different. But those differences are what cause you to be you and me to be me, so they're obviously of intense kind of scientific and applied interest. And so it's very important to be able to take a sort of a sample of your DNA and quickly produce a profile of all the places that have variability, what your particular values are. And that problem is the genotyping problem.

Speaker 2: And this used to be a very expensive, very complex problem to solve, that we've spent billions of dollars figuring out. Now a lot faster, a lot cheaper.

Speaker 1: A lot faster. In fact, even the technology I worked on in two thousand and five, two thousand and four, is multiple generations old and not really what's used anymore.

Speaker 2: So let's talk about what you did at Efficient Frontier. Explain what real-time click prediction rules are and how it works for a keyword search.
Speaker 1: Sure. The revenue engine that drives Google is search keyword ads, right? So every time you do a search, at the top you see ad, ad, ad. And so how do those ads get there? Well, actually, it's surprising maybe if you don't know about it, but every single time you type in a search term on Google and hit return, a very fast auction takes place, and a whole bunch of companies running software bid electronically to place their ads at the top of your search results. And more or less, the results that are shown on the page are in order of how much they bid. It's not quite true, but you could think of it as true.

Speaker 2: A rough outline. So the first three sponsored results on a Google page go through that auction process, and I think at this point everybody knows what PageRank is for, for the rest of that, right? And that seemed to be Google's secret sauce early on.

Speaker 1: Right. Well, you know, to talk about the ad placement: the people who are supplying the ad, who are participating in the auctions, they have a problem, which is how much to bid, right? And so how would you decide how much to bid? Well, you want to know basically the probability that somebody is going to click on your ad, and then you would multiply that by how much money you make eventually if they click. And that's kind of an expectation of how much money you'll make. And so then you gear your bid price to make sure that it's going to be profitable for you. And so really you have to make a decision about what this click-through rate is going to be. You have to predict the click-through probability.

Speaker 2: So I was going to say, this sounds like a very sophisticated application of computer science, probability, and statistics. And if you do it right, you make money, and if you do it wrong, your ad budget is a money loser.

Speaker 1: That's right.
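The bidding logic described here boils down to an expected-value calculation: the predicted click probability times the money made if the click happens, with the bid geared to stay below that number. A minimal sketch, with made-up figures and an illustrative margin knob that is not from the conversation:

```python
def expected_value_and_max_bid(p_click: float, revenue_if_clicked: float, margin: float = 0.2):
    """Expected money from showing the ad is P(click) times the revenue if clicked;
    the bid is then geared down so the placement stays profitable.

    All numbers, and the margin parameter, are purely illustrative.
    """
    expected_revenue = p_click * revenue_if_clicked   # the expectation described above
    max_bid = expected_revenue * (1.0 - margin)       # leave room for profit
    return expected_revenue, max_bid

# Example: a 3% predicted click-through rate and $40 of eventual revenue per click
# gives $1.20 of expected revenue per ad shown, so bid no more than about $0.96.
ev, bid = expected_value_and_max_bid(p_click=0.03, revenue_if_clicked=40.0)
print(f"expected revenue: ${ev:.2f}, maximum bid: ${bid:.2f}")
```

Getting the click-through probability right is the hard part: too optimistic and the ad budget loses money, too pessimistic and the bid never wins the auction.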
Speaker 2: Huh. So tell us a little bit about your doctorate, what you wrote about for your PhD at Berkeley.

Speaker 1: Yeah, so we're back to genomes. Actually, this was around the time when I was in my first year of my PhD program, which is when the human genome was published in Nature. So it was kind of really the beginning of the explosion of work on kind of high-throughput, large-scale genetics research. And one really important question after you've sequenced a genome is, well, what are all the bits of it doing? You can look at a string of DNA, it's just made up of these kind of four letters, but you don't want to just know the four letters. They're kind of a code. And some parts of the DNA represent useful stuff that is being turned by your cell into proteins, et cetera, and other parts of the DNA don't appear to have any function at all, and it's really important to know which is which as a biology researcher. And so, you know, for a long time before high-throughput sequencing, biologists would be in the lab and they would very laboriously look at very tiny segments of DNA and establish what their function was. But now we have the whole human genome sitting on disk, and we would like to be able to just run an analysis on it and have the computer spit out everything that is functional and not functional. And so that's the problem I worked on. And a really important insight is that you can take advantage of the idea of natural selection and the idea of evolution to help you. And the way you do that is, you have the human genome, you sequence a bunch of primate genomes, nearby relatives of the human, and you lay all those genomes on top of each other, and then you look for places where all of the genomes agree. Right? There hasn't been variation that's happened through mutations. And why hasn't there been? Well, the biggest force that throws out variation is natural selection. If you get a mutation in a part of your genome that really matters, then you won't have progeny, and that'll get stamped out. So natural selection is this very strong force that's causing DNA not to change. And so when you make these primate alignments, you can really leverage that fact and look for conservation, and use that as a big signal that something is functional.
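A toy version of that conservation signal can be written in a few lines: line the sequences up and flag the positions where every genome agrees. Real comparative-genomics pipelines use statistical models over whole alignments rather than exact matching, so treat this only as a sketch of the idea, with invented sequences.

```python
# Toy aligned sequences of equal length; real alignments run to millions of positions.
alignment = {
    "human":   "ACGTTACCGA",
    "chimp":   "ACGTTACCGA",
    "gorilla": "ACGATACCGA",
    "macaque": "ACGTTACCTA",
}

def conserved_positions(seqs: dict) -> list:
    """Return the 0-based alignment columns where every genome has the same letter.

    Agreement across species suggests natural selection has been removing mutations
    there, which is the signal used to flag a stretch of DNA as likely functional.
    """
    columns = zip(*seqs.values())   # iterate over the alignment column by column
    return [i for i, column in enumerate(columns) if len(set(column)) == 1]

print(conserved_positions(alignment))   # columns where all four genomes agree
```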
Speaker 2: Huh, really, really interesting. You mentioned our DNA is ninety-nine point nine nine, I don't know how many places to the right of the decimal point you would want to go, but very similar. How similar or different are we from, let's say, a chimpanzee? I've always questioned, there's an urban legend that they're practically the same; it always seems like it's overstated. Two percent? So you and I have a point one percent difference, and me and the average chimp, it's two point zero percent?

Speaker 1: That's exactly right. Yeah, so chimps are essentially our closest non-human primate relatives.

Speaker 2: Really, really quite fascinating. So let's talk a little bit about the firm. You guys were one of the earliest pioneers of machine learning research. Explain a little bit about what the firm does.

Speaker 1: Sure. So we run trading strategies, investment strategies, that are fully automated, so we call them fully systematic. And that means that we have software systems that run every day during market hours, and they take in information about the characteristics of the securities we're trading, think of stocks, and then they make predictions of how the price of each security is going to change over time, and then they decide on changes in our inventory, changes in held positions, based on those predictions, and then those desired changes are sent into an execution system which automatically carries them out.

Speaker 2: So fully automated. Is there supervision, or is it kind of running on its own with a couple of checks?
Speaker 1: There's lots of human diagnostic supervision, right? So there are people who are watching screens full of instrumentation and telemetry about what the systems are doing. But those people are not taking any actions, right, unless there's a problem, and then they do.

Speaker 2: So let's talk a little bit about how machines learn to identify signals. I'm assuming you start with the giant database that is the history of stock prices, volume, movement, et cetera, and then bring in a lot of additional things to bear. What's the process like, developing a particular trading strategy?

Speaker 1: Yeah. So, as you're saying, we begin with a very large historical data set of prices and volumes, market data of that kind, but importantly, all kinds of other information about securities: financial statement data, textual data, analyst data.

Speaker 2: So it's everything from prices to fundamentals, everything from earnings to revenue to sales, et cetera. I'm assuming the change, and the delta of the change, is going to be very significant in that. What about macroeconomic data, what some people call noise, but one would imagine some signal in everything from inflation to interest rates to GDP to firm spending? Are those inputs worthwhile, or how do you think about those?

Speaker 1: So we don't hold portfolios that are exposed to those things. So it's really a business decision on our part. We are working with institutional investors who already have as much exposure as they want to things like the market or to well-recognized econometric risk factors like value, and so they don't need our help to be exposed to those things. They are very well equipped to handle that part of their investment process. What we're trying to provide is the most diversification possible. So we want to give them a new return stream which has good and stable returns, but on top of that, importantly, is also not correlated with any of the other return streams that they already have.
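One standard way to check, and strip out, the kind of market exposure described here is to estimate a return stream's beta to the market and keep only the residual. The numpy sketch below uses simulated returns and is purely illustrative; it is not a description of Voleon's actual risk process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns: a market factor and a strategy with some market exposure baked in.
market = rng.normal(0.0003, 0.010, size=1000)
strategy = 0.5 * market + rng.normal(0.0004, 0.008, size=1000)

# Estimate the strategy's beta to the market, then hedge it out.
cov = np.cov(strategy, market)
beta = cov[0, 1] / cov[1, 1]
residual = strategy - beta * market   # the return stream left after removing market exposure

print(f"estimated beta: {beta:.2f}")
print(f"correlation with market before hedging: {np.corrcoef(strategy, market)[0, 1]:.2f}")
print(f"correlation with market after hedging:  {np.corrcoef(residual, market)[0, 1]:.2f}")
```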
Speaker 2: That's interesting. So can I assume that you're applying your machine learning methodology across different asset classes, or is it strictly equities?

Speaker 1: Oh no, we apply it to equities, to credit, to corporate bonds, and we trade futures contracts, and in the fullness of time, we hope that we will be trading kind of every security in the world.

Speaker 2: So currently stocks, bonds. When you say futures, I assume commodities, all...

Speaker 1: All kinds of futures contracts.

Speaker 2: It's really, really interesting. So it could be anything from interest rate swaps to commodities, the full gamut. So how different is this approach from what other quant shops do that really focus on equities?
Speaker 1: I think it's kind of the same question as asking, well, what do we mean when we say we use machine learning, or that our principles are machine learning principles? And so how does that make us different from the kind of standard approach in quantitative trading? And the answer to the question really comes back to this idea we mentioned a little while ago of how powerful the tools are that you're using to form predictions. Right? So in our business, the thing that we build is called a prediction rule. Okay, that's our widget. And what a prediction rule does is it takes in a bunch of input, a bunch of information about a stock at a moment in time, and it hands you a guess about how that stock's price is going to change over some future period of time. Okay? And so there is one most important question about prediction rules, which is: how complex are they? How much complexity do they have? Complexity is a colloquial term. It's unfortunately another example of a place where things can be vague or ambiguous, because a general-purpose word has been borrowed in a technical setting. But when you use the word complexity in statistical prediction, there's a very specific meaning. It means how much expressive power this prediction rule has, how good a job it can do of approximating what's going on in the data you show it. Remember, we have these giant historical data sets, and every entry in the data set looks like this: what was going on with the stock at a certain moment in time, its price action, its financials, analyst information, and then what did its price do in the subsequent twenty-four hours, or the subsequent fifteen minutes, or whatever. Okay? And so when you talk about the amount of complexity that a prediction rule has, that means how well it is able to capture the relationship between the things that you can show it when you ask it for a prediction and what actually happens to the price. And naturally, you kind of want to use high-complexity rules, because they have a lot of approximating power; they do a good job of describing anything that's going on. But there are two disadvantages to high complexity. One is it needs a lot of data, otherwise it gets fooled into thinking that randomness is actually signal. And the other is that it's hard to reason about what's going on under the hood. Right? When you have very simple prediction rules, you can sort of summarize everything that they're doing in a sentence. You can look inside them and get a complete understanding of how they behave, and that's not possible with high-complexity prediction rules.
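A minimal sketch of what one of those historical entries and a prediction rule might look like, using invented field names and a deliberately trivial rule:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Example:
    features: Dict[str, float]   # what was knowable at the time: price action, financials, analyst data
    forward_return: float        # what the price then did over the chosen horizon (say, the next 24 hours)

# A prediction rule maps the features to a guess at the forward return.
PredictionRule = Callable[[Dict[str, float]], float]

def toy_rule(features: Dict[str, float]) -> float:
    """A deliberately simple rule: bet on a slight reversal of the previous day's move."""
    return -0.1 * features["return_1d"]

entry = Example(
    features={"return_1d": 0.02, "volume_zscore": 1.3, "earnings_surprise": 0.0},
    forward_return=-0.004,
)
prediction = toy_rule(entry.features)        # -0.002
error = prediction - entry.forward_return    # rules are judged by how small this is on average
print(prediction, error)
```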
Speaker 2: So I'm glad you brought up the concept of how easy it is, or how frequently you can fool an algorithm or a complex rule, because sometimes the results are just random. And it reminds me of the issue of backtesting; no one ever shows you a bad backtest. How do you deal with the issue of overfitting, and backtesting that is just geared towards what already happened and not what might happen in the future?

Speaker 1: Yeah, that is, you know, if you like, the million-dollar question in statistical prediction. Okay? And you might find it surprising that relatively straightforward ideas go a long way here. And so let me just describe a little scenario of how you can deal with this. All right, we agree we have this big historical data set, right? One thing you could do is just start analyzing the heck out of that data set and find a complicated prediction rule. But you've already started doing it wrong. The first thing you do, before you even look at the data, is you randomly pick out half of the data and you lock it in a drawer. Okay? And that leaves you with the other half of the data that you haven't locked away. On this half, you get to go hog wild. You build every kind of prediction rule: simple rules, enormously complicated rules, everything in between. Right? And now you can check how accurate all of these prediction rules that you've built are on the data that they have been looking at, and the answer will always be the same: the most complex rules will look the best. Of course, they have the most expressive power, so naturally they do the best job of describing what you showed them. The big problem is that what you showed them is a mix of signal and noise, and there's no way you can tell to what extent a complex rule has found the signal versus the noise. All you know is that it's perfectly described the data you showed it. You certainly suspect it must be overfitting if it's doing that well. Okay, so now you freeze all those prediction rules. You're not allowed to change them in any way anymore. And now you unlock the drawer and you pull out all that data that you've never looked at. You can't overfit data that you never fit. And so you take that data and you run it through each of these frozen prediction rules that you built. And now it is not the case at all that the most complex rules look the best. Instead, you'll see a kind of U-shaped behavior, where the very simple rules are too simple; they've missed signal, they've left signal on the table. The too-complex rules are also doing badly, because they've captured all the signal but also lots of noise. And then somewhere in the middle is a sweet spot where you've struck the right trade-off between how much expressive power the prediction rule has and how good a job it is doing of avoiding the mistaking of noise for signal.
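The drawer procedure just described is easy to demonstrate on synthetic data. The sketch below fits polynomial prediction rules of increasing complexity on one half of some noisy data, then scores the frozen rules on the held-out half: in-sample error keeps falling as complexity grows, while held-out error traces the U shape. The data, the polynomial family, and the error metric are all stand-ins, not anything Voleon uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy synthetic data: a smooth signal plus random noise.
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=60)

# Step 1: before looking at anything, lock a random half of the data in a drawer.
idx = rng.permutation(len(x))
fit_idx, drawer_idx = idx[:30], idx[30:]

def mse(pred, actual):
    return float(np.mean((pred - actual) ** 2))

# Step 2: on the working half, build rules of every complexity (here, polynomial degree).
results = []
for degree in range(1, 13):
    rule = np.polyfit(x[fit_idx], y[fit_idx], degree)                # fit the rule
    in_sample = mse(np.polyval(rule, x[fit_idx]), y[fit_idx])
    # Step 3: freeze the rule, unlock the drawer, and score it on data it never saw.
    held_out = mse(np.polyval(rule, x[drawer_idx]), y[drawer_idx])
    results.append((degree, in_sample, held_out))

for degree, in_s, out_s in results:
    print(f"degree {degree:2d}   in-sample MSE {in_s:.3f}   held-out MSE {out_s:.3f}")

# In-sample error only improves with complexity; the held-out column has a sweet spot.
best = min(results, key=lambda r: r[2])[0]
print("sweet spot (lowest held-out error): degree", best)
```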
Speaker 2: Really, really intriguing. So you guys have built one of the largest specialized machine learning research and development teams in finance. How do you assemble a team like that, and how do you get the brain trust to do the sort of work that's applicable to managing assets?

Speaker 1: Well, the short answer is, we spend a huge amount of energy on recruiting and, you know, identifying the sort of premier people in the field of machine learning, both academics and practitioners, and we exhibit a lot of patience. We wait a really long time to be able to find the people who are kind of really the best, and that matters enormously to us, both from the standpoint of the success of the firm and also because it's something that, you know, we value extremely highly, just having great colleagues, brilliant colleagues. You know, I want to work in a place where I can learn from all the people around me. And, you know, when my co-founder Michael Kharitonov and I were talking about starting Voleon, one of the reasons that was on our minds is we wanted to be in control of who we worked with. You know, we really wanted to be able to assemble a group of people who were as brilliant as we could find, but also, you know, good people, people that we liked, people that we were excited to collaborate with.

Speaker 2: So let's talk about some of the fundamental principles Voleon is built on. You reference a prediction-based approach from a paper Leo Breiman wrote called "Two Cultures." Tell us a little bit about what "Two Cultures" actually is.
Speaker 1: Yeah. So this paper was written about twenty years ago. Leo Breiman was one of the great probabilists and statisticians of his generation, a Berkeley professor, need I say. And, you know, Leo had been a practitioner in statistical consulting, actually, for quite some time, in between a UCLA tenured job and returning to academia at Berkeley, and he learned a lot in that time about actually solving prediction problems, as opposed to hypothetically solving them in sort of the academic context. And so all of his insights about the difference really culminated in this paper from two thousand that he wrote.

Speaker 2: The difference between practical use versus academic theory, if you like.

Speaker 1: Yeah. And so he identified two schools of thought about solving prediction problems, right? And one school is sort of model-based. The idea is, there's some stuff you're going to get to observe, stock characteristics, let's say; there's a thing you wish you knew, future price change, let's say; and there's a box in nature that turns those inputs into the output, right? And in the model-based school of thought, you try to open that box, reason about how it must work, make theories. In our case, these would be sort of econometric theories, financial economics theories. And then those theories have knobs, not many, and you use data to set the knobs, but otherwise you believe the model. And he contrasts that with the machine learning school of thought, which also has the idea of nature's box: the inputs go in, the thing you wish you knew comes out. But in machine learning, you don't try to open the box. You just try to approximate what the box is doing. And your measure of success is predictive accuracy, and it is only predictive accuracy. If you build a gadget and that gadget produces predictions that are really accurate, they turn out to look like the thing that nature produces, then that is success. And at the time he wrote the paper, his assessment was that ninety-eight percent of statistics was taking the model-based approach, and two percent was taking the machine learning approach.
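As a toy illustration of the two cultures applied to the same problem (not of anything Voleon runs), the sketch below fits a believe-the-model linear regression with a handful of knobs and a flexible, harder-to-interpret random forest, then judges both the way the machine learning culture insists on: by predictive accuracy on data that was not used for fitting. The data and model choices are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Synthetic data in which nature's "box" is nonlinear.
X = rng.uniform(-2, 2, size=(600, 3))
y = np.sin(X[:, 0]) * X[:, 1] + 0.3 * X[:, 2] ** 2 + rng.normal(0, 0.2, size=600)

X_fit, X_heldout = X[:400], X[400:]
y_fit, y_heldout = y[:400], y[400:]

# Culture one: a model with few knobs that you can reason about and believe.
linear = LinearRegression().fit(X_fit, y_fit)

# Culture two: an algorithmic approximator judged only on how well it predicts.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_fit, y_fit)

def heldout_mse(model):
    return float(np.mean((model.predict(X_heldout) - y_heldout) ** 2))

print(f"linear model held-out MSE:  {heldout_mse(linear):.3f}")
print(f"random forest held-out MSE: {heldout_mse(forest):.3f}")
```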
Speaker 2: And are those statistics still valid today, or have we shifted quite a bit?

Speaker 1: We've shifted quite a bit. And different arenas of prediction problems have different mixes these days. But even in finance, I would say it's probably more like fifty-fifty.

Speaker 2: Really? That much?
Speaker 1: Yeah, I think so. You know, the logical extreme is natural language modeling, which was done for decades and decades in the model-based approach, where you kind of reasoned about linguistic characteristics of how people do dialogue, and those models had some parameters and you fit them with data. And then instead you have, as we said, a database of a trillion words and a tool with one hundred and seventy-five billion parameters, and you run that, and there is no hope of completely understanding what is going on inside of GPT-3. But nobody complains about that, because the results are astounding. The thing that you get is incredible. And so that is, by analogy, the way that we reason about running systematic investment strategies. At the end of the day, predictive accuracy is what creates returns for investors. Being able to give complete descriptions of exactly how the predictions arise does not in itself create returns for investors. Now, I'm not against interpretability and simplicity, all else equal. I love interpretability and simplicity. But all else is not equal. If you want the most accurate predictions, you are going to have to sacrifice some amount of simplicity. In fact, this truth is so widespread that Leo gave it a name in his paper. He called it Occam's dilemma. So Occam's razor is the philosophical idea that you should choose the simplest explanation that fits the facts. Occam's dilemma is the point that in statistical prediction, the simplest approach, even though you wish you could choose it, is not the most accurate approach, if you care about predictive accuracy. If you're putting predictive accuracy first, then you have to embrace a certain amount of complexity and lack of interpretability.

Speaker 2: Huh, that's really quite fascinating. So let's talk a little bit about artificial intelligence and large language models. Following D. E. Shaw, you were playing in e-commerce and biotech. It seems like this approach to using statistics, probability, and computer science is applicable to so many different fields.

Speaker 1: It is, yeah. I think you're talking about prediction problems, ultimately. So in recommender systems, you can think of the question as being, well, if I had to predict, what thing could I show a person that would be most likely to change their behavior and cause them to buy it? It's a kind of prediction problem that motivates recommendations. In biotechnology, very often we are trying to make predictions about whether someone, let's say, does or doesn't have a condition, a disease, based on lots of information we can gather from high-throughput diagnostic techniques. These days, the keyword in biology and in medicine and biotechnology is high throughput. You're running analyses on an individual that are producing hundreds of thousands of numbers, and you want to be able to take all of that kind of wealth of data and turn it into diagnostic information.
And so any field 590 00:35:15,800 --> 00:35:20,959 Speaker 1: where it's relatively easy to produce, at large scale, let's say, 591 00:35:20,960 --> 00:35:23,279 Speaker 1: the same kinds of information that 592 00:35:23,920 --> 00:35:26,520 Speaker 1: experts are using to make their decisions, you should expect 593 00:35:26,520 --> 00:35:29,160 Speaker 1: that field to be impacted by these tools if it 594 00:35:29,160 --> 00:35:29,920 Speaker 1: hasn't been already. 595 00:35:30,000 --> 00:35:33,200 Speaker 2: So you're kind of answering my next question, which is 596 00:35:33,680 --> 00:35:36,719 Speaker 2: what led you back to investment management. But it seems 597 00:35:37,040 --> 00:35:40,560 Speaker 2: if there's any field that just generates endless amounts of data. 598 00:35:40,840 --> 00:35:44,200 Speaker 1: It's the markets, that's true. And I had already been 599 00:35:44,960 --> 00:35:48,879 Speaker 1: really interested in the problems of systematic investment strategies from 600 00:35:48,920 --> 00:35:52,160 Speaker 1: my time working at D. E. Shaw, and so my co 601 00:35:52,280 --> 00:35:56,200 Speaker 1: founder Michael Kharitonov and I, you know, we were both 602 00:35:56,200 --> 00:35:59,399 Speaker 1: in the Bay Area in two thousand and four. 603 00:36:02,120 --> 00:36:04,280 Speaker 1: He was there because of a firm that he had founded, 604 00:36:04,320 --> 00:36:06,960 Speaker 1: and I was there finishing my PhD. And we started 605 00:36:06,960 --> 00:36:10,360 Speaker 1: to talk about the idea of using contemporary machine learning 606 00:36:10,360 --> 00:36:14,480 Speaker 1: methods to build strategies that would be, you know, really 607 00:36:14,480 --> 00:36:19,600 Speaker 1: different from strategies that result from classical techniques. And we 608 00:36:19,640 --> 00:36:21,400 Speaker 1: had met at D. E. Shaw in the nineties and been 609 00:36:21,480 --> 00:36:26,640 Speaker 1: less excited about this idea because the methods were pretty immature. 610 00:36:27,000 --> 00:36:29,719 Speaker 1: There wasn't actually a giant diversity of data back in 611 00:36:29,760 --> 00:36:33,200 Speaker 1: the nineties in financial markets, not like there was in 612 00:36:33,239 --> 00:36:36,719 Speaker 1: two thousand and five. And compute was really still quite 613 00:36:36,719 --> 00:36:39,359 Speaker 1: expensive in the nineties, whereas in two thousand and five, 614 00:36:40,000 --> 00:36:42,440 Speaker 1: you know, it had been dropping in the usual Moore's 615 00:36:42,520 --> 00:36:45,560 Speaker 1: Law way. And this was even before GPUs. And so 616 00:36:45,840 --> 00:36:47,879 Speaker 1: when we looked at the problem in two thousand and five, 617 00:36:48,520 --> 00:36:52,960 Speaker 1: it felt like there was a very live opportunity to 618 00:36:53,040 --> 00:36:56,120 Speaker 1: do something with a lot of promise that would be 619 00:36:56,200 --> 00:36:59,680 Speaker 1: really different. And we had the sense that not a 620 00:36:59,680 --> 00:37:02,480 Speaker 1: lot of people were of the same opinion, and so 621 00:37:02,520 --> 00:37:04,680 Speaker 1: it seemed like something that we should try. 622 00:37:04,880 --> 00:37:08,240 Speaker 2: So there was a void, and nothing in the market 623 00:37:08,239 --> 00:37:12,160 Speaker 2: hates more than a vacuum in intellectual approach. So 624 00:37:12,560 --> 00:37:17,879 Speaker 2: you mentioned the diversity of various data sources.
What what 625 00:37:18,000 --> 00:37:21,520 Speaker 2: don't you consider, like how how far off of price 626 00:37:21,560 --> 00:37:25,280 Speaker 2: and volume do you go in the net you're casting 627 00:37:25,440 --> 00:37:28,040 Speaker 2: for inputs into into your systems. 628 00:37:29,000 --> 00:37:33,440 Speaker 1: Well, I think we're prepared as a you know, as 629 00:37:33,480 --> 00:37:36,919 Speaker 1: a as a research principle, we're prepared to consider any 630 00:37:37,120 --> 00:37:41,160 Speaker 1: data that has some bearing on price formation, like some 631 00:37:41,160 --> 00:37:44,560 Speaker 1: some plausible bearing on how prices are formed. Now, of 632 00:37:44,600 --> 00:37:47,880 Speaker 1: course we're you know, we're a relatively small group of 633 00:37:47,880 --> 00:37:50,759 Speaker 1: people with a lot of ideas and uh, and so 634 00:37:51,120 --> 00:37:55,240 Speaker 1: we have to prioritize so you know, in the event 635 00:37:55,360 --> 00:37:58,200 Speaker 1: we end up pursuing data that you know makes a 636 00:37:58,239 --> 00:38:00,520 Speaker 1: lot of sense, you know, we don't we don't try. 637 00:38:00,920 --> 00:38:03,080 Speaker 2: I mean, can you go as far as politics or 638 00:38:03,120 --> 00:38:06,360 Speaker 2: the weather, like how far off of prices can you 639 00:38:06,560 --> 00:38:07,160 Speaker 2: can you look? 640 00:38:07,239 --> 00:38:10,120 Speaker 1: So, you know, an example would be the weather. You're 641 00:38:09,920 --> 00:38:12,800 Speaker 1: for most securities, you're not going to be very interested 642 00:38:12,880 --> 00:38:15,200 Speaker 1: in the weather, but for commodities future as you might be, 643 00:38:15,320 --> 00:38:17,080 Speaker 1: so that you know, that's the kind of reasoning you 644 00:38:17,080 --> 00:38:17,560 Speaker 1: would apply. 645 00:38:18,120 --> 00:38:22,799 Speaker 2: Right, really really interesting. So let's talk about some of 646 00:38:22,840 --> 00:38:26,960 Speaker 2: the strategies. You guys are running short and mid horizon 647 00:38:27,120 --> 00:38:32,600 Speaker 2: US equities, European equities, Asian equities, mid horizon US credit, 648 00:38:33,040 --> 00:38:36,680 Speaker 2: and then cross assets. So I might to assume all 649 00:38:36,719 --> 00:38:40,200 Speaker 2: of these are machine learning based, and how similar different 650 00:38:40,960 --> 00:38:43,480 Speaker 2: is each approach to each of those asset classes. 651 00:38:43,920 --> 00:38:50,040 Speaker 1: Yeah, they're all machine learning based. The kind of principles 652 00:38:50,080 --> 00:38:53,360 Speaker 1: that I've described of using as much complexity as you 653 00:38:53,440 --> 00:38:57,800 Speaker 1: need to maximize predictive accuracy, et cetera. Those principles underlie 654 00:38:57,880 --> 00:39:00,840 Speaker 1: all the systems. But of course it's trading. Trading corporate 655 00:39:00,840 --> 00:39:03,840 Speaker 1: bonds is very different from trading equities, and so the 656 00:39:04,200 --> 00:39:06,040 Speaker 1: implementations reflect that reality. 657 00:39:06,760 --> 00:39:09,879 Speaker 2: Huh. So let's talk a little bit about the four 658 00:39:09,960 --> 00:39:15,000 Speaker 2: step process that you bring to the systematic approach, and 659 00:39:15,040 --> 00:39:19,240 Speaker 2: this is off of your site, so it's it's data prediction, engine, 660 00:39:19,640 --> 00:39:26,400 Speaker 2: portfolio construction, and execution. 
Yeah, I'm assuming that is heavily 661 00:39:26,560 --> 00:39:30,359 Speaker 2: computer and machine learning based at each step along the way. 662 00:39:30,440 --> 00:39:31,359 Speaker 2: Is that fair? 663 00:39:32,480 --> 00:39:34,880 Speaker 1: I think that's fair. I mean, to different degrees. The 664 00:39:35,360 --> 00:39:41,719 Speaker 1: data gathering, that's, you know, that's largely a 665 00:39:41,800 --> 00:39:46,080 Speaker 1: software and kind of operations and infrastructure job. 666 00:39:46,280 --> 00:39:48,280 Speaker 2: Do you guys have to spend a lot of time 667 00:39:48,400 --> 00:39:51,759 Speaker 2: cleaning up that data and making sure, because you 668 00:39:52,120 --> 00:39:56,320 Speaker 2: hear, between CRSP and S&P and Bloomberg, sometimes 669 00:39:56,400 --> 00:39:58,719 Speaker 2: you'll pull something up and they're just all off a 670 00:39:58,719 --> 00:40:00,759 Speaker 2: little bit from each other because they all bring a 671 00:40:00,840 --> 00:40:04,040 Speaker 2: very different approach to data assembly. How do you make 672 00:40:04,080 --> 00:40:07,960 Speaker 2: sure everything is consistent and there's no errors or errant 673 00:40:09,040 --> 00:40:09,960 Speaker 2: inputs throughout? 674 00:40:10,239 --> 00:40:14,000 Speaker 1: Yeah, through a lot of effort, essentially. There, we have, 675 00:40:15,040 --> 00:40:17,799 Speaker 1: you know, we have an entire group of people who 676 00:40:17,840 --> 00:40:23,800 Speaker 1: focus on data operations, both for gathering historical data 677 00:40:23,880 --> 00:40:27,080 Speaker 1: and for the management of the ongoing live data feeds. 678 00:40:27,360 --> 00:40:29,319 Speaker 1: There's no way around that. I mean, that's just work 679 00:40:29,360 --> 00:40:31,080 Speaker 1: that you have to do. 680 00:40:31,200 --> 00:40:33,080 Speaker 2: You just have to brute force your way through that. 681 00:40:33,520 --> 00:40:36,640 Speaker 2: And then the prediction engine. Sounds like that's the single 682 00:40:36,719 --> 00:40:41,600 Speaker 2: most important part of the machine learning process if I'm 683 00:40:42,040 --> 00:40:45,880 Speaker 2: understanding you correctly, that that's where all the meat of 684 00:40:45,960 --> 00:40:47,080 Speaker 2: the technology is. 685 00:40:47,280 --> 00:40:50,680 Speaker 1: Yeah, I understand the sentiment. I mean, it's worth emphasizing 686 00:40:50,719 --> 00:40:54,200 Speaker 1: that you do not get to a successful systematic strategy 687 00:40:54,200 --> 00:40:57,200 Speaker 1: without all the ingredients. You have to have clean data 688 00:40:57,680 --> 00:41:01,920 Speaker 1: because of garbage in, garbage out. You have to 689 00:41:01,920 --> 00:41:06,880 Speaker 1: have accurate predictions. But you know, predictions don't automatically translate 690 00:41:06,920 --> 00:41:10,400 Speaker 1: into returns for investors. Those predictions are kind of the 691 00:41:10,480 --> 00:41:15,440 Speaker 1: power that drives the portfolio holding part of the system. 692 00:41:15,480 --> 00:41:18,520 Speaker 2: So let's talk about that portfolio construction. Given that you 693 00:41:18,640 --> 00:41:22,920 Speaker 2: have a prediction engine and good data going into it, 694 00:41:23,200 --> 00:41:26,560 Speaker 2: so you're fairly confident as to the output.
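The vendor-reconciliation chore described a moment ago can be pictured with a small sketch: pull the same field from two feeds, compare, and route disagreements to review rather than letting them flow into research. The vendor names, tickers, prices, and tolerance below are all hypothetical, and real data operations are far more involved than this.

```python
# Minimal sketch of cross-vendor data reconciliation: flag closing prices that
# disagree beyond a tolerance so a person (or a rule) can resolve them.
vendor_a = {"IBM": 168.20, "XOM": 104.55, "GE": 112.10}
vendor_b = {"IBM": 168.20, "XOM": 104.60, "GE": 131.40}   # e.g., a split adjusted differently

TOLERANCE = 0.005  # 50 basis points of relative disagreement

for symbol in sorted(set(vendor_a) & set(vendor_b)):
    a, b = vendor_a[symbol], vendor_b[symbol]
    rel_diff = abs(a - b) / ((a + b) / 2)
    status = "OK" if rel_diff <= TOLERANCE else "MISMATCH -> hold for review"
    print(f"{symbol}: {a} vs {b} ({rel_diff:.2%}) {status}")
```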
How do 695 00:41:26,640 --> 00:41:28,920 Speaker 2: you then take that output and say, here's how I'm 696 00:41:28,920 --> 00:41:32,400 Speaker 2: going to build a portfolio based on what this generates? 697 00:41:32,520 --> 00:41:38,920 Speaker 1: Yeah, so there are three big ingredients in the portfolio construction. 698 00:41:39,160 --> 00:41:43,440 Speaker 1: The predictions, what is usually called a risk model in 699 00:41:43,760 --> 00:41:50,560 Speaker 1: this business, which means some understanding of how volatile prices 700 00:41:50,600 --> 00:41:54,000 Speaker 1: are across all the securities you're trading, how correlated they are, 701 00:41:54,840 --> 00:41:57,200 Speaker 1: how, you know, if they have 702 00:41:57,239 --> 00:42:00,719 Speaker 1: a big movement, how big that movement will be. That's 703 00:42:00,760 --> 00:42:03,880 Speaker 1: all the risk model. And then the final ingredient is 704 00:42:04,840 --> 00:42:08,240 Speaker 1: what's usually called a market impact model, and that means 705 00:42:09,640 --> 00:42:13,279 Speaker 1: an understanding of how much you are going to push 706 00:42:13,320 --> 00:42:15,759 Speaker 1: prices away from you when you try to trade. This 707 00:42:15,800 --> 00:42:18,799 Speaker 1: is a reality of all trading. You buy a lot 708 00:42:18,800 --> 00:42:21,239 Speaker 1: of a security, you push the price up, you push 709 00:42:21,280 --> 00:42:24,239 Speaker 1: it away from you in the unfavorable direction. And in 710 00:42:24,280 --> 00:42:27,959 Speaker 1: the systems that we run, the predictions that we're trying 711 00:42:27,960 --> 00:42:31,880 Speaker 1: to capture are about the same size as the effect 712 00:42:31,920 --> 00:42:34,279 Speaker 1: that we have on the markets when we trade, and 713 00:42:34,360 --> 00:42:38,439 Speaker 1: so you cannot neglect that impact effect when you're thinking 714 00:42:38,480 --> 00:42:41,000 Speaker 1: about what portfolios to hold. 715 00:42:41,040 --> 00:42:44,760 Speaker 2: So execution becomes really important. If you're not executing well, 716 00:42:44,920 --> 00:42:48,560 Speaker 2: you are moving prices away from your profit. 717 00:42:48,880 --> 00:42:53,400 Speaker 1: That's right, and it is, you know, probably the single 718 00:42:53,520 --> 00:42:59,319 Speaker 1: thing that undoes quantitative hedge funds most often, is that 719 00:43:00,040 --> 00:43:04,560 Speaker 1: they misunderstand how much they're moving prices. They get too big, 720 00:43:04,600 --> 00:43:08,040 Speaker 1: they start trading too much, and they sort of blow 721 00:43:08,080 --> 00:43:08,600 Speaker 1: themselves up. 722 00:43:08,800 --> 00:43:11,120 Speaker 2: It's funny that you say that, because as you were 723 00:43:11,160 --> 00:43:14,239 Speaker 2: describing that, the first name that popped into my head 724 00:43:14,680 --> 00:43:19,400 Speaker 2: was Long Term Capital Management's trading these really thinly traded 725 00:43:20,239 --> 00:43:27,080 Speaker 2: obscure fixed income products, and everything they bought they sent 726 00:43:27,239 --> 00:43:30,359 Speaker 2: higher because there just wasn't any volume in it. And 727 00:43:30,400 --> 00:43:33,359 Speaker 2: when they needed liquidity there was none to be had. 728 00:43:33,440 --> 00:43:36,800 Speaker 2: And you know, that plus no risk management plus one hundred 729 00:43:36,960 --> 00:43:39,440 Speaker 2: x leverage equals a kaboom. 730 00:43:39,840 --> 00:43:43,120 Speaker 1: They made a number of mistakes.
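The three ingredients just listed, return predictions, a risk model, and a market impact model, combine naturally into a single-period portfolio choice: pick holdings that trade expected return against risk and against the cost of pushing prices while trading. The sketch below uses made-up numbers, a generic optimizer, and an assumed power-law impact cost; it illustrates the structure of the problem, not Voleon's actual construction.

```python
# Stylized single-period portfolio construction: predictions + risk model +
# market impact model, solved with a generic optimizer. Toy inputs only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5
alpha = rng.normal(0, 0.002, size=n)          # predicted returns (hypothetical)
A = rng.normal(size=(n, n))
cov = A @ A.T / n * 1e-4                      # risk model: covariance of returns
w_prev = np.zeros(n)                          # current holdings (start flat)
risk_aversion = 10.0
impact_coef = 0.001                           # assumed impact cost ~ |trade|^(3/2)

def objective(w):
    trade = w - w_prev
    expected = alpha @ w
    risk_penalty = risk_aversion * (w @ cov @ w)
    impact_cost = impact_coef * np.sum(np.abs(trade) ** 1.5)
    return -(expected - risk_penalty - impact_cost)   # minimize the negative

res = minimize(objective, x0=w_prev, method="L-BFGS-B",
               bounds=[(-1.0, 1.0)] * n)      # per-name position limits
print("target weights:", np.round(res.x, 3))
```

The impact term is what keeps the optimizer from trading all the way to the unconstrained target, which is the discipline that, as described here, too-large quant funds tend to neglect.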
731 00:43:43,160 --> 00:43:45,279 Speaker 1: The book is good. So, When Genius Failed? Oh, absolutely, love 732 00:43:45,280 --> 00:43:46,880 Speaker 1: that. Fantastically fascinating. 733 00:43:46,960 --> 00:43:51,239 Speaker 2: So when you're reading a book like that, somewhere in 734 00:43:51,280 --> 00:43:53,279 Speaker 2: the back of your head are you thinking, hey, this 735 00:43:53,440 --> 00:43:56,280 Speaker 2: is like a what not to do when you're setting 736 00:43:56,360 --> 00:44:00,000 Speaker 2: up a machine learning fund? How influential is something like that? 737 00:44:00,040 --> 00:44:03,480 Speaker 1: Well, one hundred percent. I mean, look, I think the 738 00:44:03,520 --> 00:44:05,959 Speaker 1: most important adage I've ever heard in my professional life 739 00:44:06,000 --> 00:44:10,160 Speaker 1: is good judgment comes from experience. Experience comes from bad judgment. 740 00:44:10,600 --> 00:44:13,800 Speaker 1: So the extent to which you can get good judgment 741 00:44:14,120 --> 00:44:17,440 Speaker 1: from other people's experience, that is like 742 00:44:17,480 --> 00:44:22,400 Speaker 1: free tuition. And so we talk a lot about 743 00:44:22,560 --> 00:44:25,640 Speaker 1: all the mistakes that other people 744 00:44:25,680 --> 00:44:29,960 Speaker 1: have made. And you know, we do not congratulate ourselves 745 00:44:29,960 --> 00:44:33,600 Speaker 1: on having avoided mistakes. We think those people were smart. 746 00:44:33,680 --> 00:44:36,040 Speaker 1: I mean, look, you know, you read about these 747 00:44:36,040 --> 00:44:38,040 Speaker 1: events and these people. None of these people were dummies. 748 00:44:38,080 --> 00:44:40,000 Speaker 1: They were sophisticated Nobel laureates. 749 00:44:40,080 --> 00:44:43,040 Speaker 2: Yeah, right, they just didn't have a guidebook 750 00:44:43,080 --> 00:44:45,560 Speaker 2: on what not to do, which you guys do. 751 00:44:45,280 --> 00:44:47,880 Speaker 1: We don't. No, I don't think we do. I 752 00:44:47,920 --> 00:44:50,200 Speaker 1: mean, apart from reading about it, right. But 753 00:44:50,480 --> 00:44:53,160 Speaker 1: everybody is undone by a failure that they didn't 754 00:44:53,280 --> 00:44:55,240 Speaker 1: think of, or didn't know about yet. 755 00:44:55,280 --> 00:44:57,200 Speaker 1: And we're extremely cognizant of that. 756 00:44:57,560 --> 00:45:00,680 Speaker 2: Huh. That has to be somewhat humbling, to keep being 757 00:45:01,000 --> 00:45:06,720 Speaker 2: on the lookout for that blind spot that could disrupt everything. 758 00:45:06,920 --> 00:45:11,480 Speaker 1: Yes, yeah, humility is the key ingredient in 759 00:45:11,560 --> 00:45:13,040 Speaker 1: running these systems. 760 00:45:13,400 --> 00:45:18,000 Speaker 2: Really quite amazing. So let's talk a little bit about 761 00:45:18,360 --> 00:45:22,960 Speaker 2: how academically focused Voleon is. You guys have a 762 00:45:23,040 --> 00:45:28,000 Speaker 2: pretty deep R and D team internally, you teach at Berkeley. 763 00:45:28,200 --> 00:45:30,560 Speaker 2: What does it mean for a hedge fund to be 764 00:45:30,680 --> 00:45:32,000 Speaker 2: academically focused? 765 00:45:32,480 --> 00:45:36,120 Speaker 1: What I would say probably is kind of evidence based 766 00:45:36,280 --> 00:45:40,640 Speaker 1: rather than academically focused.
Saying academically focused gives the impression 767 00:45:40,760 --> 00:45:43,600 Speaker 1: that kind of papers would be the goal or the 768 00:45:43,960 --> 00:45:46,080 Speaker 1: desired output, and that's not the case at all. We have, 769 00:45:46,520 --> 00:45:49,319 Speaker 1: you know, a very specific applied problem that we are 770 00:45:49,440 --> 00:45:50,200 Speaker 1: trying to solve. 771 00:45:50,480 --> 00:45:51,799 Speaker 2: Papers are a mean to an end. 772 00:45:52,000 --> 00:45:56,320 Speaker 1: Papers are you know, we don't write papers for external consumption. 773 00:45:56,360 --> 00:45:59,600 Speaker 1: We do lots of writing internally, and that's to make 774 00:45:59,640 --> 00:46:02,359 Speaker 1: sure that that you know, we're keeping track of our 775 00:46:02,400 --> 00:46:04,840 Speaker 1: own kind of scientific process. 776 00:46:05,000 --> 00:46:08,800 Speaker 2: But you're fairly widely published in statistics and machine learning. Yes, 777 00:46:08,840 --> 00:46:13,280 Speaker 2: what purpose does that serve other than a calling card 778 00:46:13,440 --> 00:46:16,440 Speaker 2: for the fund as well as Hey, I have this 779 00:46:16,560 --> 00:46:18,360 Speaker 2: idea and I want to see what the rest of 780 00:46:18,600 --> 00:46:21,640 Speaker 2: my peers think of it. When when you put stuff 781 00:46:21,680 --> 00:46:24,839 Speaker 2: out into the world, what sort of feedback or pushback 782 00:46:25,239 --> 00:46:26,120 Speaker 2: do you get? 783 00:46:27,480 --> 00:46:29,319 Speaker 1: I guess I would have to say, I really I 784 00:46:29,400 --> 00:46:32,279 Speaker 1: do that as kind of a double life of non 785 00:46:32,320 --> 00:46:37,880 Speaker 1: financial research. So it's just something that I really enjoy. Principally, 786 00:46:37,880 --> 00:46:39,359 Speaker 1: what it means is that I get to work with 787 00:46:39,719 --> 00:46:44,760 Speaker 1: PhD students, and you know, we have really outstanding PhD 788 00:46:44,800 --> 00:46:50,600 Speaker 1: students at Berkeley in statistics, and so it's an opportunity 789 00:46:50,640 --> 00:46:58,640 Speaker 1: for me to do a kind of intellectual work that namely, 790 00:46:58,880 --> 00:47:01,080 Speaker 1: you know, writing a paper laying out an argument for 791 00:47:01,120 --> 00:47:05,040 Speaker 1: public consumption, et cetera that is kind of closed off 792 00:47:05,120 --> 00:47:05,960 Speaker 1: as far as so. 793 00:47:05,960 --> 00:47:10,080 Speaker 2: Not adjacent to what you guys are doing at Volleyon generally. No, No, 794 00:47:10,560 --> 00:47:14,440 Speaker 2: that's really interesting. So then I always assume that that 795 00:47:14,600 --> 00:47:17,799 Speaker 2: was part of your process for developing new models to 796 00:47:17,840 --> 00:47:22,120 Speaker 2: apply machine learning to new assets. Take us through the process. 797 00:47:22,160 --> 00:47:25,400 Speaker 2: How do you go about saying, Hey, this is an 798 00:47:25,440 --> 00:47:28,080 Speaker 2: asset class we don't have exposure to. Let's see how 799 00:47:28,080 --> 00:47:32,280 Speaker 2: to apply what we already know to that specific area. 800 00:47:32,560 --> 00:47:35,399 Speaker 1: Yeah, we have it's a great question. 
So we're trying 801 00:47:35,400 --> 00:47:39,680 Speaker 1: as much as possible to get the problem for a 802 00:47:39,719 --> 00:47:43,160 Speaker 1: new asset class into a familiar setup, into, you know, 803 00:47:43,719 --> 00:47:47,839 Speaker 1: as standard a setup as we can. And so we 804 00:47:47,920 --> 00:47:52,200 Speaker 1: know what these systems look like in the world of equities. 805 00:47:52,320 --> 00:47:55,359 Speaker 1: And so if you're trying to do the same kind, 806 00:47:55,400 --> 00:47:57,440 Speaker 1: if you're trying to build the same kind of system 807 00:47:57,520 --> 00:47:59,920 Speaker 1: for corporate bonds, and you start off by saying, well, okay, 808 00:48:00,239 --> 00:48:02,400 Speaker 1: I'd like, I need to know, you know, closing prices 809 00:48:02,480 --> 00:48:05,279 Speaker 1: or intraday prices for all the bonds. Already, you 810 00:48:05,320 --> 00:48:09,239 Speaker 1: have a very big problem in corporate bonds because there 811 00:48:09,320 --> 00:48:15,440 Speaker 1: is no live price feed that's showing 812 00:48:15,480 --> 00:48:18,560 Speaker 1: you a bid-offer quote in the way that there 813 00:48:18,640 --> 00:48:21,920 Speaker 1: is in equities. And so before you can even get started 814 00:48:21,960 --> 00:48:24,680 Speaker 1: thinking about predicting how a price is going to change, 815 00:48:24,680 --> 00:48:26,080 Speaker 1: it would be nice if you knew what the price 816 00:48:26,160 --> 00:48:28,759 Speaker 1: currently was, and that is already a problem you have 817 00:48:28,800 --> 00:48:30,719 Speaker 1: to solve in corporate bonds, as opposed to being just 818 00:48:30,760 --> 00:48:32,000 Speaker 1: an input that you have access to. 819 00:48:32,320 --> 00:48:35,279 Speaker 2: The old joke was trading by appointment only. Yeah, and 820 00:48:35,320 --> 00:48:37,520 Speaker 2: that seems to be a bit of an issue. And 821 00:48:37,560 --> 00:48:42,080 Speaker 2: there are so many more bond issues than there are equities. Absolutely, 822 00:48:42,239 --> 00:48:45,480 Speaker 2: is this just a database challenge, or how do you work it? 823 00:48:45,520 --> 00:48:49,280 Speaker 1: No, it's a statistics problem, but it's a different 824 00:48:49,360 --> 00:48:52,319 Speaker 1: kind of statistics problem. We're not, in this case, we're 825 00:48:52,360 --> 00:48:54,920 Speaker 1: not yet trying to predict 826 00:48:55,080 --> 00:48:58,520 Speaker 1: the future of any quantity. We're trying to say, I 827 00:48:58,560 --> 00:49:00,839 Speaker 1: wish I knew what the fair value of 828 00:49:00,880 --> 00:49:05,319 Speaker 1: this CUSIP was. I can't see that exactly because there's 829 00:49:05,320 --> 00:49:07,239 Speaker 1: no live order book with a bid and an 830 00:49:07,280 --> 00:49:09,680 Speaker 1: offer that's got lots of liquidity that lets me figure 831 00:49:09,680 --> 00:49:11,320 Speaker 1: out the fair value. But I do know what. 832 00:49:11,520 --> 00:49:14,120 Speaker 2: At best, you have a recent price, maybe not even 833 00:49:14,160 --> 00:49:14,680 Speaker 2: so recent. 834 00:49:14,840 --> 00:49:17,839 Speaker 1: I have lots of related information. I know, you know, 835 00:49:18,480 --> 00:49:21,279 Speaker 1: this bond, maybe this bond didn't trade today, but it 836 00:49:21,320 --> 00:49:23,279 Speaker 1: traded a few times yesterday. I get to say I 837 00:49:23,400 --> 00:49:26,440 Speaker 1: know where it traded.
I'm in touch with bond dealers, 838 00:49:26,440 --> 00:49:28,959 Speaker 1: so I know where they've quoted this bond, maybe only 839 00:49:28,960 --> 00:49:31,440 Speaker 1: on one side, over the last few days. I have 840 00:49:31,560 --> 00:49:34,800 Speaker 1: some information about the company that issued this bond, et cetera. 841 00:49:35,280 --> 00:49:37,719 Speaker 1: So I have lots of stuff that's related to the 842 00:49:37,800 --> 00:49:39,320 Speaker 1: number that I want to know. I just 843 00:49:39,360 --> 00:49:42,319 Speaker 1: don't know that number, right. And so what I want 844 00:49:42,360 --> 00:49:44,440 Speaker 1: to try to do is kind of fill in and 845 00:49:44,600 --> 00:49:46,919 Speaker 1: do what, in statistics or in control, we would 846 00:49:46,920 --> 00:49:50,440 Speaker 1: call a nowcasting problem. Huh. And the analogy, 847 00:49:50,480 --> 00:49:55,600 Speaker 1: actually, is to automatically controlling an airplane. Oh, surprisingly. 848 00:49:55,840 --> 00:49:57,719 Speaker 1: Yeah, the main, there are, if 849 00:49:57,719 --> 00:49:59,800 Speaker 1: a software is trying to fly an 850 00:49:59,800 --> 00:50:03,080 Speaker 1: airplane, there are six things that it absolutely has 851 00:50:03,120 --> 00:50:05,160 Speaker 1: to know. It has to know the x y z of 852 00:50:05,160 --> 00:50:07,400 Speaker 1: where the plane is and the x y z of 853 00:50:07,440 --> 00:50:10,200 Speaker 1: its velocity, where it's headed. Right, those are the six 854 00:50:10,280 --> 00:50:14,120 Speaker 1: most important numbers. Now, nature does not just supply those 855 00:50:14,200 --> 00:50:17,440 Speaker 1: numbers to you. You cannot know those numbers with perfect exactitude. 856 00:50:17,600 --> 00:50:20,200 Speaker 1: But there's lots of instruments on the plane, and there's 857 00:50:20,239 --> 00:50:23,800 Speaker 1: GPS and all sorts of information that is very closely 858 00:50:23,840 --> 00:50:26,040 Speaker 1: related to the numbers you wish you knew, and you 859 00:50:26,080 --> 00:50:28,919 Speaker 1: can use statistics to go from all that stuff that's 860 00:50:29,040 --> 00:50:32,719 Speaker 1: adjacent to a guess, an infill, of the thing you 861 00:50:32,800 --> 00:50:35,120 Speaker 1: wish you knew. And the same goes with the current 862 00:50:35,160 --> 00:50:36,560 Speaker 1: price of a corporate bond. 863 00:50:37,160 --> 00:50:41,520 Speaker 2: Huh. That's really kind of interesting. So I'm curious as 864 00:50:41,600 --> 00:50:45,399 Speaker 2: to how often you start working your way into one 865 00:50:45,480 --> 00:50:50,640 Speaker 2: particular asset or a particular strategy for that asset and 866 00:50:50,760 --> 00:50:54,040 Speaker 2: just suddenly realize, oh, this is wildly different than we 867 00:50:54,160 --> 00:50:57,960 Speaker 2: previously expected, and suddenly you're down a rabbit hole to 868 00:50:58,160 --> 00:51:02,200 Speaker 2: just wildly unexpected areas. It sounds like that isn't at all 869 00:51:02,239 --> 00:51:02,920 Speaker 2: that uncommon. 870 00:51:02,960 --> 00:51:03,960 Speaker 1: It is not uncommon at all. 871 00:51:04,160 --> 00:51:04,399 Speaker 2: Huh.
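The airplane analogy above is essentially a state-estimation, or filtering, problem, and the corporate bond version can be sketched the same way: treat the unobserved fair value as a latent state and fold in whatever related observations arrive, an occasional trade print, a one-sided dealer quote. The scalar Kalman filter below uses invented numbers and assumed noise levels; it is only meant to show the "infill the number you wish you knew" idea, not any firm's actual model.

```python
# Minimal "nowcasting" sketch: a scalar Kalman filter estimating a latent fair
# value from noisy, intermittently observed prices. Synthetic data only.
import numpy as np

rng = np.random.default_rng(2)
T = 50
fair = np.cumsum(rng.normal(0, 0.10, size=T)) + 100.0   # latent fair value (random walk)
obs = fair + rng.normal(0, 0.30, size=T)                # noisy prints / dealer quotes
obs[rng.random(T) < 0.6] = np.nan                       # most days, nothing trades

q, r = 0.10**2, 0.30**2          # assumed process and observation variances
est, var = 100.0, 1.0            # initial guess and its uncertainty
nowcast = []
for y in obs:
    var += q                     # predict: uncertainty grows while nothing trades
    if not np.isnan(y):          # update: fold in a print or quote when one arrives
        k = var / (var + r)      # Kalman gain
        est += k * (y - est)
        var *= (1 - k)
    nowcast.append(est)

print("latest nowcast of fair value:", round(nowcast[-1], 2))
```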
872 00:51:04,400 --> 00:51:07,480 Speaker 1: No, there's this kind of 873 00:51:07,560 --> 00:51:09,440 Speaker 1: wishful thinking that, you know, we've figured 874 00:51:09,440 --> 00:51:11,840 Speaker 1: it out in one asset class, in the sense that 875 00:51:11,840 --> 00:51:14,200 Speaker 1: we have a system that's kind of stable and performing 876 00:51:14,200 --> 00:51:16,600 Speaker 1: reasonably well that we have a feel for, 877 00:51:17,160 --> 00:51:20,280 Speaker 1: and now we want to take that system and somehow 878 00:51:20,320 --> 00:51:23,920 Speaker 1: replicate it in a different situation. And we're going 879 00:51:23,960 --> 00:51:26,799 Speaker 1: to standardize the new situation to make it look like 880 00:51:26,800 --> 00:51:29,080 Speaker 1: the old situation. That's the principle. That principle kind of 881 00:51:29,120 --> 00:51:31,480 Speaker 1: quickly goes out the window when you start 882 00:51:31,520 --> 00:51:33,400 Speaker 1: to make contact with the reality of how the new 883 00:51:33,440 --> 00:51:34,800 Speaker 1: asset class actually behaves. 884 00:51:34,880 --> 00:51:37,360 Speaker 2: So stocks are different than credit, are different than bonds, 885 00:51:37,440 --> 00:51:40,360 Speaker 2: are different than commodities. They're all like starting fresh. Yeah. 886 00:51:40,400 --> 00:51:42,919 Speaker 2: What are some of the more surprising things you've learned 887 00:51:42,960 --> 00:51:46,880 Speaker 2: as you've applied machine learning to totally different asset classes? 888 00:51:47,040 --> 00:51:49,719 Speaker 1: Well, I think, you know, corporate bonds provide a lot 889 00:51:49,719 --> 00:51:52,480 Speaker 1: of examples of this. I mean, the fact that you 890 00:51:52,520 --> 00:51:57,279 Speaker 1: don't actually really know a good live price or a 891 00:51:57,320 --> 00:52:00,480 Speaker 1: good live bid-offer, it seems, you know, surprising. 892 00:52:00,520 --> 00:52:03,520 Speaker 1: I mean, this fact has started to change. 893 00:52:03,560 --> 00:52:07,279 Speaker 1: Like over the years, there's been an accelerating electronification of 894 00:52:07,320 --> 00:52:09,880 Speaker 1: corporate bond trading, and that's, you know, that's been 895 00:52:09,880 --> 00:52:11,839 Speaker 1: a big advantage for us actually, because we were kind 896 00:52:11,880 --> 00:52:14,000 Speaker 1: of first movers, and so we've really benefited from that. 897 00:52:14,440 --> 00:52:17,360 Speaker 1: So the problem is diminished relative to how it was, 898 00:52:17,719 --> 00:52:20,160 Speaker 1: you know, six, seven years ago when we started, but 899 00:52:20,200 --> 00:52:23,440 Speaker 1: it's still, relative to equities, absolutely there. 900 00:52:23,520 --> 00:52:25,839 Speaker 2: Yeah. So, in other words, if 901 00:52:25,840 --> 00:52:28,279 Speaker 2: I'm looking at a bond mutual fund or even a 902 00:52:28,280 --> 00:52:33,400 Speaker 2: bond ETF that's trading during the day, that price is somebody's 903 00:52:33,440 --> 00:52:37,560 Speaker 2: best approximation of the value of all the bonds inside. 904 00:52:37,880 --> 00:52:41,439 Speaker 2: But really you don't know the NAV, do you? 905 00:52:41,440 --> 00:52:43,600 Speaker 1: You're just kind of guessing. Barry, don't even get me 906 00:52:43,640 --> 00:52:46,160 Speaker 1: started on bond ETFs, because. 907 00:52:45,960 --> 00:52:48,160 Speaker 2: It seems like that would be the first place 908 00:52:48,200 --> 00:52:52,120 Speaker 2: that would show up.
Hey, bond ETFs sound like throughout the 909 00:52:52,200 --> 00:52:56,759 Speaker 2: day they're gonna be mispriced a little bit or wildly mispriced. 910 00:52:57,080 --> 00:53:00,319 Speaker 1: Well, with the bond ETF, there's a sense, if you're 911 00:53:00,520 --> 00:53:02,360 Speaker 1: a market purist, in which they can't be 912 00:53:02,440 --> 00:53:05,280 Speaker 1: mispriced, because their price is set by supply and demand 913 00:53:05,560 --> 00:53:08,560 Speaker 1: in the ETF market, and that's a super liquid market. 914 00:53:08,840 --> 00:53:11,720 Speaker 1: And so there may be a difference between the market 915 00:53:11,719 --> 00:53:14,120 Speaker 1: price of the ETF and the NAV of 916 00:53:14,120 --> 00:53:18,520 Speaker 1: the underlying portfolio, except in many cases with bond ETFs 917 00:53:18,560 --> 00:53:23,120 Speaker 1: there's not even a crisply defined underlying portfolio. It turns 918 00:53:23,120 --> 00:53:26,520 Speaker 1: out that the authorized participants in those ETF markets can 919 00:53:27,120 --> 00:53:32,279 Speaker 1: negotiate with the fund manager about exactly what the constituents 920 00:53:32,320 --> 00:53:35,160 Speaker 1: are of the create and redeem baskets, and so it's not 921 00:53:35,200 --> 00:53:37,920 Speaker 1: even at all clear what you mean when you say 922 00:53:38,080 --> 00:53:40,239 Speaker 1: that the NAV is this or that relative to the 923 00:53:40,320 --> 00:53:41,200 Speaker 1: price of the ETF. 924 00:53:41,520 --> 00:53:44,040 Speaker 2: So when I asked about what's surprising when you work 925 00:53:44,080 --> 00:53:46,000 Speaker 2: your way down a rabbit hole: hey, we don't know 926 00:53:46,000 --> 00:53:48,120 Speaker 2: what the hell's in this bond ETF, trust us, it's 927 00:53:48,160 --> 00:53:51,640 Speaker 2: all good. That's a pretty big surprise. And I'm only exaggerating 928 00:53:51,640 --> 00:53:54,520 Speaker 2: a little bit. But that seems like that's kind of shocking. 929 00:53:55,160 --> 00:53:57,920 Speaker 1: It is surprising when you find out about it, 930 00:53:57,960 --> 00:54:00,919 Speaker 1: but you quickly come to understand. If you trade single 931 00:54:00,960 --> 00:54:03,160 Speaker 1: name bonds, as we do, you quickly come to understand 932 00:54:03,520 --> 00:54:05,719 Speaker 1: why bond ETFs work that way. 933 00:54:06,560 --> 00:54:08,680 Speaker 2: I recall a couple of years ago there was a 934 00:54:08,719 --> 00:54:12,480 Speaker 2: big Wall Street Journal article on the GLD 935 00:54:13,280 --> 00:54:17,279 Speaker 2: ETF, and from that article I learned that 936 00:54:18,040 --> 00:54:22,280 Speaker 2: GLD was formed because gold dealers had just excess gold 937 00:54:22,400 --> 00:54:25,120 Speaker 2: piling up in their warehouses and they needed a way 938 00:54:25,560 --> 00:54:27,920 Speaker 2: to move it. So that was kind of shocking about 939 00:54:27,960 --> 00:54:32,279 Speaker 2: that ETF. Any other space that led to a 940 00:54:33,360 --> 00:54:35,719 Speaker 2: sort of big surprise as you worked your way into it? 941 00:54:37,160 --> 00:54:41,200 Speaker 1: Well, I think ETFs are a kind of a good 942 00:54:41,239 --> 00:54:45,239 Speaker 1: source of these examples. So the volatility ETFs, the, you know, 943 00:54:45,280 --> 00:54:47,560 Speaker 1: the ETFs that are based on the VIX 944 00:54:47,640 --> 00:54:50,360 Speaker 1: or that are short the VIX. You may remember, several years ago.
945 00:54:50,160 --> 00:54:52,120 Speaker 2: I was gonna say, the ones that haven't blown up. 946 00:54:52,200 --> 00:54:55,600 Speaker 1: Yeah, right, there was this event called Volmageddon where. 947 00:54:56,239 --> 00:54:58,640 Speaker 2: Those were exchange-traded notes, weren't they? Yeah. 948 00:54:59,680 --> 00:55:03,160 Speaker 1: Right, there were these, essentially these investment products that were 949 00:55:03,280 --> 00:55:07,040 Speaker 1: short VIX, and VIX went through a spike that caused 950 00:55:07,040 --> 00:55:09,160 Speaker 1: them to have to liquidate, which was, part, I mean, 951 00:55:09,239 --> 00:55:13,200 Speaker 1: the people who designed the exchange-traded note, they understood 952 00:55:13,239 --> 00:55:15,040 Speaker 1: that this was a possibility, so they had a sort 953 00:55:15,040 --> 00:55:18,600 Speaker 1: of, uh, descriptions in their contract for what 954 00:55:18,880 --> 00:55:23,680 Speaker 1: it would mean. But yeah, always surprising to watch something 955 00:55:24,040 --> 00:55:25,360 Speaker 1: suddenly go out of business. 956 00:55:25,600 --> 00:55:28,120 Speaker 2: We seem to get a thousand year flood every couple 957 00:55:28,120 --> 00:55:30,760 Speaker 2: of years. Maybe we shouldn't be calling these things thousand 958 00:55:30,800 --> 00:55:33,880 Speaker 2: year floods. That's right, that's a big misnomer. 959 00:55:34,360 --> 00:55:36,879 Speaker 1: As statisticians, we tell people, you know, if 960 00:55:36,920 --> 00:55:39,960 Speaker 1: you think that you've experienced a six sigma event, the 961 00:55:40,000 --> 00:55:43,120 Speaker 1: problem is that you have underestimated sigma. 962 00:55:43,239 --> 00:55:46,759 Speaker 2: That's really interesting. So, given the gap in 963 00:55:46,840 --> 00:55:53,000 Speaker 2: the world between computer science and investment management, how 964 00:55:53,080 --> 00:55:56,279 Speaker 2: long is it going to be before that narrows and 965 00:55:56,320 --> 00:55:58,560 Speaker 2: we start seeing a whole lot more of the sort 966 00:55:58,560 --> 00:56:02,239 Speaker 2: of work you're doing applied across the board to 967 00:56:02,320 --> 00:56:03,440 Speaker 2: the world of investment? 968 00:56:04,520 --> 00:56:08,160 Speaker 1: Well, I think it's happening. It's been happening for 969 00:56:08,239 --> 00:56:11,000 Speaker 1: quite a long time. I mean, for example, all of 970 00:56:11,440 --> 00:56:15,279 Speaker 1: modern portfolio theory really kind of began in the 971 00:56:15,320 --> 00:56:18,520 Speaker 1: fifties with, you know, first of all, Markowitz and other 972 00:56:18,560 --> 00:56:21,600 Speaker 1: people thinking about, you know, what it means to benefit 973 00:56:21,640 --> 00:56:24,960 Speaker 1: from diversification, and the idea that, you know, diversification is 974 00:56:24,960 --> 00:56:28,040 Speaker 1: the only free lunch in finance. So I 975 00:56:28,040 --> 00:56:32,880 Speaker 1: would say that, you know, the idea of thinking in 976 00:56:32,920 --> 00:56:38,120 Speaker 1: a systematic and scientific way about how to 977 00:56:38,120 --> 00:56:41,279 Speaker 1: manage and grow wealth, not, you know, not even 978 00:56:41,440 --> 00:56:45,359 Speaker 1: just for institutions, but also for individuals, is an 979 00:56:45,360 --> 00:56:48,920 Speaker 1: example of a way that these ideas have kind of 980 00:56:49,840 --> 00:56:51,880 Speaker 1: had profound effects.
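The "diversification is the only free lunch" idea mentioned here can be checked with a few lines of arithmetic: the volatility of an equal-weight portfolio falls as names are added, but only down to a floor set by the average correlation. The volatilities and correlations below are made up purely for illustration.

```python
# Diversification as a "free lunch": portfolio volatility of an equal-weight
# mix of identical-volatility assets, for a few asset counts and correlations.
import numpy as np

vol = 0.20                                   # each asset: 20% volatility (assumed)
for n, rho in [(1, 0.0), (10, 0.0), (10, 0.3), (50, 0.3)]:
    cov = np.full((n, n), rho * vol**2)      # pairwise covariances
    np.fill_diagonal(cov, vol**2)
    w = np.full(n, 1.0 / n)                  # equal-weight portfolio
    port_vol = np.sqrt(w @ cov @ w)
    print(f"{n:3d} assets, correlation {rho}: portfolio vol = {port_vol:.1%}")
# With zero correlation, volatility falls like 1/sqrt(n); with correlation 0.3
# it floors near sqrt(0.3) * 20% (about 11%), so correlation, not asset count,
# is what ultimately binds.
```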
981 00:56:52,120 --> 00:56:55,200 Speaker 2: I know, I only have you for a little while longer, 982 00:56:55,520 --> 00:56:58,319 Speaker 2: So let's jump to our favorite questions that we ask 983 00:56:59,040 --> 00:57:01,640 Speaker 2: all of our guests, starting with tell us what you're 984 00:57:01,640 --> 00:57:04,400 Speaker 2: streaming these days? What are you either listening to or 985 00:57:04,480 --> 00:57:06,680 Speaker 2: watching to keep yourself entertained. 986 00:57:08,160 --> 00:57:11,480 Speaker 1: I A few things I've been watching recently. The Bear, 987 00:57:11,520 --> 00:57:13,960 Speaker 1: I don't know if you've heard So Great, So great, right, 988 00:57:14,239 --> 00:57:16,200 Speaker 1: and I'm in Chicago, as I know, we were just 989 00:57:16,360 --> 00:57:19,680 Speaker 1: from Yeah, so. 990 00:57:18,920 --> 00:57:21,200 Speaker 2: So and and there are parts of that show that 991 00:57:21,280 --> 00:57:23,760 Speaker 2: are kind of a love letter to absolutely as you 992 00:57:23,800 --> 00:57:26,640 Speaker 2: get deeper into the series, because it starts out kind 993 00:57:26,640 --> 00:57:29,600 Speaker 2: of gritty and you're seeing the underside, and then as 994 00:57:29,640 --> 00:57:33,600 Speaker 2: we progress, it really becomes like a lovely postcard. Such 995 00:57:33,640 --> 00:57:34,400 Speaker 2: an amazing show. 996 00:57:34,480 --> 00:57:37,760 Speaker 1: So really really love that show. Was I was late 997 00:57:37,800 --> 00:57:41,040 Speaker 1: to better call Saul that I'm finishing up. I think 998 00:57:41,080 --> 00:57:45,600 Speaker 1: as good as as Breaking Bad, So I maybe when 999 00:57:45,640 --> 00:57:48,240 Speaker 1: you haven't heard of there's a show called Mister in Between. 1000 00:57:48,160 --> 00:57:50,320 Speaker 2: Which is mister Yeah. 1001 00:57:50,320 --> 00:57:53,040 Speaker 1: It's not Hulu, it's from it's from Australia. It's about 1002 00:57:53,040 --> 00:57:58,360 Speaker 1: a guy who's, you know, a doting father living his life. 1003 00:57:58,360 --> 00:58:03,120 Speaker 1: He's also essentially a muscle man and hitman for for 1004 00:58:04,040 --> 00:58:07,479 Speaker 1: local criminals in his part of Australia. But it's half 1005 00:58:07,480 --> 00:58:09,200 Speaker 1: hour dark comedy. 1006 00:58:09,160 --> 00:58:12,440 Speaker 2: Right, so not quite Barry and not quite Sopranos somewhere. 1007 00:58:12,960 --> 00:58:14,080 Speaker 1: Yeah, that's exactly. 1008 00:58:14,240 --> 00:58:19,360 Speaker 2: Yeah, sounds really interesting. Tell us about your early mentors 1009 00:58:19,360 --> 00:58:21,160 Speaker 2: who helped shape your career. 1010 00:58:21,880 --> 00:58:24,440 Speaker 1: Well, Berry, I'd been lucky to have a lot of 1011 00:58:24,840 --> 00:58:28,880 Speaker 1: people who were you know, both really smart and talented 1012 00:58:28,920 --> 00:58:32,000 Speaker 1: and willing to you know, take the time to help 1013 00:58:32,040 --> 00:58:35,840 Speaker 1: me learn and understand things. So actually, my co founder, 1014 00:58:36,000 --> 00:58:40,240 Speaker 1: Michael Caratanov, he was kind of my first mentor in finance. 1015 00:58:40,320 --> 00:58:42,680 Speaker 1: He he had been a d SHAW for several years 1016 00:58:43,400 --> 00:58:46,120 Speaker 1: when I got there, and he he really taught me 1017 00:58:46,560 --> 00:58:49,280 Speaker 1: kind of the ins and outs of of market micro structure. 
1018 00:58:50,440 --> 00:58:53,120 Speaker 1: I worked with a couple of people who managed me 1019 00:58:53,280 --> 00:58:56,480 Speaker 1: at D. E. Shaw, Yossi Friedman and Kapil Mathur, who 1020 00:58:56,480 --> 00:59:00,480 Speaker 1: have gone on to hugely successful careers in quantitative finance, 1021 00:59:00,560 --> 00:59:03,360 Speaker 1: and they taught me a lot too. When I did 1022 00:59:03,360 --> 00:59:06,800 Speaker 1: my PhD, my advisor Mike Jordan, who's a kind of 1023 00:59:06,840 --> 00:59:12,320 Speaker 1: world famous machine learning researcher, you know, I learned enormously 1024 00:59:12,320 --> 00:59:18,000 Speaker 1: from him. And there's another professor of statistics who sadly 1025 00:59:18,000 --> 00:59:21,920 Speaker 1: passed away about fifteen years ago named David Freedman. He 1026 00:59:22,040 --> 00:59:26,120 Speaker 1: was really just an intellectual giant of the twentieth century 1027 00:59:26,120 --> 00:59:30,040 Speaker 1: in probability and statistics. He was both, you know, one 1028 00:59:30,080 --> 00:59:35,120 Speaker 1: of the most brilliant probabilists and also an applied statistician. 1029 00:59:35,160 --> 00:59:38,200 Speaker 1: And this is like a pink diamond kind 1030 00:59:38,240 --> 00:59:42,240 Speaker 1: of combination. It's that rare to find someone who has 1031 00:59:42,320 --> 00:59:46,320 Speaker 1: that kind of technical capability but also understands the pragmatics 1032 00:59:46,320 --> 00:59:48,560 Speaker 1: of actually doing data analysis. He spent a lot of 1033 00:59:48,600 --> 00:59:53,360 Speaker 1: time as an expert witness. He was the lead statistical 1034 00:59:53,360 --> 00:59:56,440 Speaker 1: consultant for the case on census adjustment that went to 1035 00:59:56,520 --> 01:00:02,000 Speaker 1: the Supreme Court. In fact, he told me what 1036 01:00:02,360 --> 01:00:05,280 Speaker 1: went on. In the end, you know, the 1037 01:00:05,320 --> 01:00:08,760 Speaker 1: people against adjustment, they won in a unanimous Supreme Court decision. 1038 01:00:08,800 --> 01:00:11,120 Speaker 1: And David Freedman told me, he said, you know, all 1039 01:00:11,160 --> 01:00:13,240 Speaker 1: that work and we only convinced nine people. 1040 01:00:15,440 --> 01:00:17,840 Speaker 2: But the nine people that kind of matter. Yeah, exactly. 1041 01:00:18,160 --> 01:00:21,280 Speaker 1: So it was just, it was 1042 01:00:21,360 --> 01:00:24,480 Speaker 1: kind of a once in a lifetime privilege to get 1043 01:00:24,520 --> 01:00:28,520 Speaker 1: to spend time with someone of that intellectual caliber. And 1044 01:00:28,600 --> 01:00:30,520 Speaker 1: there were others too. I mean, I've been 1045 01:00:30,600 --> 01:00:31,520 Speaker 1: very fortunate that way. 1046 01:00:31,600 --> 01:00:35,360 Speaker 2: That's quite a list to begin with. Let's talk about books. 1047 01:00:35,360 --> 01:00:36,920 Speaker 2: What are some of your favorites and what are you 1048 01:00:36,960 --> 01:00:37,760 Speaker 2: reading right now? 1049 01:00:38,880 --> 01:00:40,880 Speaker 1: Uh, well, I'm a big book reader, so 1050 01:00:41,080 --> 01:00:42,240 Speaker 1: I have a long list. 1051 01:00:42,480 --> 01:00:45,800 Speaker 2: But probably, by the way, this is everybody's favorite section 1052 01:00:46,400 --> 01:00:50,200 Speaker 2: of the podcast.
People are always looking for good book recommendations, 1053 01:00:50,320 --> 01:00:54,520 Speaker 2: and if they like what you said earlier, they're gonna 1054 01:00:54,520 --> 01:00:56,720 Speaker 2: love your book recommendations, so fire away. 1055 01:00:57,120 --> 01:01:02,800 Speaker 1: So I'm a big fan of kind of modernist dystopian fiction. 1056 01:01:03,280 --> 01:01:05,800 Speaker 1: So a couple of examples of that would be the 1057 01:01:05,800 --> 01:01:10,600 Speaker 1: book Infinite Jest by David Foster Wallace, The Wind-Up Bird 1058 01:01:10,680 --> 01:01:14,000 Speaker 1: Chronicle by Haruki Murakami. Those are two of my all 1059 01:01:14,040 --> 01:01:17,919 Speaker 1: time favorite books. There's a, I think, much less well 1060 01:01:17,960 --> 01:01:22,840 Speaker 1: known but beautiful novel. It's a kind of academic coming 1061 01:01:22,880 --> 01:01:28,440 Speaker 1: of age novel called Stoner by John Williams. A really moving, 1062 01:01:28,560 --> 01:01:33,000 Speaker 1: just a tremendous book. Sort of more dystopia would be 1063 01:01:33,360 --> 01:01:38,240 Speaker 1: White Noise by DeLillo, and kind of the classics that 1064 01:01:38,240 --> 01:01:41,040 Speaker 1: everybody knows, Nineteen Eighty-Four and Brave New World. Those 1065 01:01:41,040 --> 01:01:42,920 Speaker 1: are two more of my favorites. 1066 01:01:42,600 --> 01:01:46,800 Speaker 2: Huh, it's funny, when you mentioned The Bear, I'm in 1067 01:01:46,840 --> 01:01:49,920 Speaker 2: the middle of reading a book that I would swear 1068 01:01:50,000 --> 01:01:54,880 Speaker 2: the writers of The Bear leaned on, called Unreasonable Hospitality, 1069 01:01:55,520 --> 01:01:59,640 Speaker 2: by somebody who worked for Danny Meyer's hospitality group, 1070 01:02:00,080 --> 01:02:03,520 Speaker 2: Eleven Madison Park and Gramercy Tavern and all these famous 1071 01:02:03,840 --> 01:02:08,040 Speaker 2: New York haunts. And the scene in The Bear where 1072 01:02:08,440 --> 01:02:12,240 Speaker 2: they overhear a couple say, oh, we visited Chicago and 1073 01:02:12,240 --> 01:02:14,640 Speaker 2: never had deep dish, so they send the guy 1074 01:02:14,720 --> 01:02:18,160 Speaker 2: out to get deep dish. There's a part of the book 1075 01:02:18,720 --> 01:02:23,720 Speaker 2: where at Eleven Madison Park these people actually showed up 1076 01:02:23,760 --> 01:02:25,760 Speaker 2: with suitcases. It was the last thing they would eat 1077 01:02:25,800 --> 01:02:28,360 Speaker 2: before heading to the airport. And they said, oh, 1078 01:02:28,360 --> 01:02:30,360 Speaker 2: we ate at all these great places in New York, but 1079 01:02:30,400 --> 01:02:32,640 Speaker 2: we never had a New York hot dog. And what 1080 01:02:32,640 --> 01:02:34,320 Speaker 2: do they do? They send someone 1081 01:02:34,360 --> 01:02:36,600 Speaker 2: out to get a hot dog. They plated it and 1082 01:02:37,280 --> 01:02:39,920 Speaker 2: used all the condiments to make it very special, and 1083 01:02:39,960 --> 01:02:42,840 Speaker 2: it looks like it was ripped right out of The Bear, 1084 01:02:43,000 --> 01:02:46,960 Speaker 2: or vice versa. But if you're interested in just, hey, 1085 01:02:47,000 --> 01:02:51,840 Speaker 2: how can we disrupt the restaurant business and make it 1086 01:02:51,920 --> 01:02:54,000 Speaker 2: not just about the celebrity chef in the kitchen but 1087 01:02:54,400 --> 01:02:58,160 Speaker 2: the whole experience, it's a fascinating kind of nonfiction book.
1088 01:02:58,240 --> 01:02:59,240 Speaker 1: That does sound really interesting. 1089 01:02:59,400 --> 01:03:02,080 Speaker 2: Yeah, really, you mentioned the Bear and it just popped 1090 01:03:02,120 --> 01:03:04,160 Speaker 2: into my head. Any of the books you want to 1091 01:03:04,160 --> 01:03:06,080 Speaker 2: mention that's that's a good list to start with. 1092 01:03:06,440 --> 01:03:10,240 Speaker 1: Yeah. My other kind of big interest is science fiction, 1093 01:03:10,480 --> 01:03:16,920 Speaker 1: speculative fiction. Unsurprisingly right, Sorry, sorry, but so there are 1094 01:03:16,960 --> 01:03:19,640 Speaker 1: some classics that I think everybody should read. Ursula LeGuin 1095 01:03:19,960 --> 01:03:24,040 Speaker 1: loves just amazing. So The Dispossessed and The Left Hand 1096 01:03:24,040 --> 01:03:25,680 Speaker 1: of Darkness, those are just two of the best books 1097 01:03:25,680 --> 01:03:26,520 Speaker 1: I've ever read period. 1098 01:03:26,560 --> 01:03:30,040 Speaker 2: Forget Left Handed Darkness stays with you for a long time. 1099 01:03:30,120 --> 01:03:35,200 Speaker 1: Yeah right, yeah, really really amazing books. I'm rereading right now, 1100 01:03:35,360 --> 01:03:41,960 Speaker 1: Cryptonomicon Neil Stevenson. And one other thing I try to 1101 01:03:42,000 --> 01:03:45,520 Speaker 1: do is I have very big gaps in my reading. 1102 01:03:45,560 --> 01:03:48,280 Speaker 1: For example, I've never read Updyke, so I started reading 1103 01:03:48,320 --> 01:03:49,000 Speaker 1: The Rabbit. 1104 01:03:49,000 --> 01:03:52,280 Speaker 2: Serious World of Corn. It's a garb and they're they're 1105 01:03:52,440 --> 01:03:53,800 Speaker 2: very much of an era. 1106 01:03:54,000 --> 01:03:54,880 Speaker 1: Yeah, that's right. 1107 01:03:55,880 --> 01:03:57,720 Speaker 2: What else give us more? Uh? 1108 01:03:58,080 --> 01:03:59,880 Speaker 1: Wow? Okay, let's see George so. 1109 01:04:01,600 --> 01:04:01,840 Speaker 2: He. 1110 01:04:02,320 --> 01:04:05,040 Speaker 1: Oh wow, I think I think you'd love him. So 1111 01:04:05,400 --> 01:04:09,280 Speaker 1: He's his real strength is short fiction. He had He's 1112 01:04:09,280 --> 01:04:12,960 Speaker 1: written great novels too, but tenth of December this is 1113 01:04:13,000 --> 01:04:16,320 Speaker 1: his best collection of of fiction. And that this is 1114 01:04:16,360 --> 01:04:23,280 Speaker 1: more kind of modern dystopian, kind of comic dystopian stuff. 1115 01:04:23,600 --> 01:04:27,560 Speaker 2: You keep coming back to dystopia, yeasinating. 1116 01:04:26,760 --> 01:04:30,680 Speaker 1: I find, you know, it's uh, it's very different from 1117 01:04:30,720 --> 01:04:32,919 Speaker 1: my my day to day reality. So I think it's 1118 01:04:32,960 --> 01:04:36,120 Speaker 1: a you know, it's a great change of pace for 1119 01:04:36,200 --> 01:04:41,320 Speaker 1: me to be able to read this stuff. So, uh, 1120 01:04:41,960 --> 01:04:45,360 Speaker 1: some some science writing, I can tell you. 
Probably the 1121 01:04:45,400 --> 01:04:48,360 Speaker 1: best science book I ever read is The Selfish Gene 1122 01:04:48,800 --> 01:04:54,280 Speaker 1: by Richard Dawkins, which kind of really you know, you 1123 01:04:54,360 --> 01:04:57,480 Speaker 1: have a kind of intuitive understanding of genetics and natural 1124 01:04:57,480 --> 01:05:01,560 Speaker 1: selection in Darwin, but the language that Dawkins uses really 1125 01:05:01,560 --> 01:05:06,160 Speaker 1: makes you appreciate just how how much the genes are 1126 01:05:06,160 --> 01:05:08,840 Speaker 1: in charge and how little we as the as the 1127 01:05:09,520 --> 01:05:13,320 Speaker 1: you know he calls he calls organisms survival machines that 1128 01:05:13,400 --> 01:05:16,720 Speaker 1: the genes have kind of built and and exist inside 1129 01:05:16,760 --> 01:05:19,200 Speaker 1: in order to ensure their propagation. And his whole the 1130 01:05:19,200 --> 01:05:21,560 Speaker 1: whole point of view in that book just gives you, Uh, 1131 01:05:22,360 --> 01:05:25,280 Speaker 1: it's really eye opening, makes you think about natural selection 1132 01:05:25,360 --> 01:05:28,040 Speaker 1: and evolution and genetics in a completely different way, even 1133 01:05:28,040 --> 01:05:30,600 Speaker 1: though it's all based on the same kind of facts that. 1134 01:05:30,600 --> 01:05:32,400 Speaker 2: You know, it's just framing. 1135 01:05:32,480 --> 01:05:34,760 Speaker 1: It's the framing and the perspective that are really that 1136 01:05:34,840 --> 01:05:37,120 Speaker 1: really kind of blow your mind. So it's a great 1137 01:05:37,200 --> 01:05:38,200 Speaker 1: it's a great book to read. 1138 01:05:39,440 --> 01:05:41,439 Speaker 2: Huh, that's a hell of a list. You've given people 1139 01:05:41,480 --> 01:05:44,080 Speaker 2: a lot of things to start with, and now down 1140 01:05:44,080 --> 01:05:47,160 Speaker 2: to our last two questions, HM, what advice would you 1141 01:05:47,200 --> 01:05:50,560 Speaker 2: give to a recent college grad who is interested in 1142 01:05:50,600 --> 01:05:54,880 Speaker 2: a career in either investment management or machine learning. 1143 01:05:56,600 --> 01:05:59,560 Speaker 1: Yeah? So, I mean I work in a very specialized 1144 01:05:59,720 --> 01:06:01,600 Speaker 1: sub domain of finance, So there are a lot of 1145 01:06:01,600 --> 01:06:03,200 Speaker 1: people who are going to be interested in investment in 1146 01:06:03,240 --> 01:06:06,600 Speaker 1: finance that I that I couldn't give any specific advice to. 1147 01:06:06,880 --> 01:06:11,240 Speaker 1: I have kind of general advice that I think is 1148 01:06:11,520 --> 01:06:15,600 Speaker 1: useful both for finance and even more broadly. This advice 1149 01:06:15,680 --> 01:06:19,400 Speaker 1: is really kind of top of Maslow's pyramid advice. If 1150 01:06:19,520 --> 01:06:22,240 Speaker 1: you know, if you're trying to kind of write your 1151 01:06:22,280 --> 01:06:25,040 Speaker 1: novel and pay the rent while you get it done, 1152 01:06:25,040 --> 01:06:27,600 Speaker 1: this is I can't really help you with that. But 1153 01:06:29,120 --> 01:06:32,360 Speaker 1: you know, if what you care about is building this career, 1154 01:06:32,520 --> 01:06:34,520 Speaker 1: then I would say number one piece of advice is 1155 01:06:34,520 --> 01:06:37,080 Speaker 1: work with incredible people. 
Like far and away, much more 1156 01:06:37,080 --> 01:06:40,360 Speaker 1: important than what the particular field is the details of 1157 01:06:40,400 --> 01:06:42,880 Speaker 1: what you're working on, is the caliber of the people 1158 01:06:42,880 --> 01:06:45,640 Speaker 1: that you do it with, both in terms of your 1159 01:06:45,680 --> 01:06:51,120 Speaker 1: own satisfaction and how much you learn and and and 1160 01:06:51,600 --> 01:06:55,040 Speaker 1: all of that. I think you know you'll learn, you'll 1161 01:06:55,040 --> 01:06:59,360 Speaker 1: benefit hugely on a personal level from working with incredible 1162 01:06:59,400 --> 01:07:03,680 Speaker 1: people and if you don't work with people that are 1163 01:07:04,000 --> 01:07:06,760 Speaker 1: like that, then you're probably going to have a lot 1164 01:07:06,760 --> 01:07:09,080 Speaker 1: of professional unhappiness. So it's kind of either or. 1165 01:07:09,600 --> 01:07:14,760 Speaker 2: That's a really intriguing answer. So final question, what do 1166 01:07:14,800 --> 01:07:18,360 Speaker 2: you know about the world of investing, machine learning, large 1167 01:07:18,440 --> 01:07:22,400 Speaker 2: language models, just the application of technology to the field 1168 01:07:22,440 --> 01:07:25,439 Speaker 2: of investing that you wish you knew twenty five years 1169 01:07:25,520 --> 01:07:28,520 Speaker 2: or so ago when you were really first ramping up. 1170 01:07:29,840 --> 01:07:33,720 Speaker 1: I think one of the most important lessons that I learned, 1171 01:07:33,760 --> 01:07:35,640 Speaker 1: had to learn the hard way kind of going through 1172 01:07:35,840 --> 01:07:39,960 Speaker 1: and running these systems, was that it's kind of comes 1173 01:07:40,000 --> 01:07:43,440 Speaker 1: back to the point you made earlier about the primacy 1174 01:07:43,480 --> 01:07:47,680 Speaker 1: of prediction rules. And it may be true that the 1175 01:07:47,720 --> 01:07:51,360 Speaker 1: most important thing is the prediction quality, but there are 1176 01:07:51,480 --> 01:07:55,280 Speaker 1: lots of other very necessary, mandatory ingredients, and I would 1177 01:07:55,280 --> 01:07:57,680 Speaker 1: put kind of risk management at the top of that list. 1178 01:07:57,720 --> 01:08:02,560 Speaker 1: So I think it's easy to to maybe neglect risk 1179 01:08:02,600 --> 01:08:06,480 Speaker 1: management to a certain extent and focus all of your 1180 01:08:06,520 --> 01:08:10,520 Speaker 1: attention on predictive accuracy. But I think it really does 1181 01:08:10,560 --> 01:08:13,720 Speaker 1: turn out that if you don't have high quality risk 1182 01:08:13,760 --> 01:08:16,479 Speaker 1: management to go along with that predictive accuracy, you won't succeed. 1183 01:08:17,439 --> 01:08:20,640 Speaker 1: And I guess I wish I had appreciated that in 1184 01:08:21,080 --> 01:08:23,080 Speaker 1: a really deep way twenty five years ago. 1185 01:08:23,320 --> 01:08:27,400 Speaker 2: John, This has been really, absolutely fascinating. I don't even 1186 01:08:27,479 --> 01:08:30,000 Speaker 2: know where to begin other than saying thank you for 1187 01:08:30,040 --> 01:08:34,080 Speaker 2: being so generous with your time and your expertise. We 1188 01:08:34,280 --> 01:08:36,920 Speaker 2: have been speaking with John mccauloff. He is the co 1189 01:08:37,040 --> 01:08:41,360 Speaker 2: founder and chief investment officer at the five billion dollar 1190 01:08:41,479 --> 01:08:46,040 Speaker 2: hedge fund Volleyon Group. 
If you enjoy this conversation, well, 1191 01:08:46,320 --> 01:08:48,599 Speaker 2: be sure and check out any of the previous five 1192 01:08:48,720 --> 01:08:53,000 Speaker 2: hundred we've done over the past nine years. You can 1193 01:08:53,040 --> 01:08:57,200 Speaker 2: find those at iTunes, Spotify, YouTube, or wherever you find 1194 01:08:57,280 --> 01:09:01,040 Speaker 2: your favorite podcasts. Sign up for my daily reading list 1195 01:09:01,560 --> 01:09:06,200 Speaker 2: at ritholtz.com. Follow me on Twitter at Barry underscore 1196 01:09:06,280 --> 01:09:10,200 Speaker 2: Ritholtz until I get my hacked account at 1197 01:09:10,200 --> 01:09:16,240 Speaker 2: Ritholtz back. I say that because the 1198 01:09:16,320 --> 01:09:20,759 Speaker 2: process of dealing with the seventeen people left at what was once 1199 01:09:20,840 --> 01:09:27,000 Speaker 2: Twitter, now X, is unbelievably frustrating and annoying. Follow all 1200 01:09:27,080 --> 01:09:31,160 Speaker 2: of the fine family of podcasts on Twitter at podcast. 1201 01:09:31,720 --> 01:09:33,800 Speaker 2: I would be remiss if I did not thank the 1202 01:09:33,800 --> 01:09:37,639 Speaker 2: crack team that helps put these conversations together each week. 1203 01:09:38,320 --> 01:09:42,080 Speaker 2: Paris Wald is my producer. Atika Valbrun is my 1204 01:09:42,280 --> 01:09:47,519 Speaker 2: project manager. Sean Russo is my director of research. I'm 1205 01:09:47,600 --> 01:09:51,000 Speaker 2: Barry Ritholtz. You've been listening to Masters in Business 1206 01:09:51,560 --> 01:10:03,160 Speaker 2: on Bloomberg Radio.