Speaker 1: This is Masters in Business with Barry Ritholtz on Bloomberg Radio.

Speaker 2: This week on the podcast, strap yourself in, I have another extra special guest. Jon McAuliffe is co-founder and chief investment officer at the Voleon Group. They're a five-billion-dollar hedge fund and one of the earliest shops to ever use machine learning as it applies to trading and investment management decisions. It is a fully systematic approach to using computer horsepower and databases and machine learning and their own predictive engine to make investments and trades, and it's managed to put together quite a track record. Previously, Jon was at D. E. Shaw, where he ran statistical arbitrage. He is one of the people who worked on the Amazon recommendation engine, and he is currently a professor of statistics at Berkeley. I don't even know where to begin, other than to say, if you're interested in AI or machine learning or quantitative strategies, this is just a masterclass in how it's done by one of the first people in the space to not only do this sort of machine learning and apply it to investing, but one of the best. I think this is a fascinating conversation and I believe you will find it to be so as well. With no further ado, my discussion with the Voleon Group's Jon McAuliffe. Jon McAuliffe, welcome to Bloomberg.

Speaker 1: Thanks, Barry. I'm really happy to be here.

Speaker 2: So let's talk a little bit about your academic background first. You start out undergrad in computer science and applied mathematics at Harvard before you go on to get a PhD from the University of California, Berkeley. What led to a career in data analysis? How early did you know that's what you wanted to do?

Speaker 1: Well, it was a winding path, actually. I was very interested in international relations and foreign languages when I was finishing high school. In fact, I spent the last year of high school as an exchange student in Germany. And so when I got to college, I was expecting to major in government and go on to maybe work in the foreign service, something like that.
Speaker 2: Really? So this is a big shift from your original expectations.

Speaker 1: Yeah, it took about one semester for me to realize that none of the questions that were being asked in my classes had definitive and correct answers.

Speaker 2: Did that frustrate you?

Speaker 1: It did frustrate me, yeah. And so I stayed home over winter, I stayed, excuse me, I didn't go home. I stayed at college over winter break to try to sort out what the heck I was going to do, because I could see that my plan was in disarray. And I'd always been interested in computers, had played around with computers, never done anything very serious, but I thought I might as well give it a shot, and so in the spring semester I took my first computer science course. And when you write software, everything has a right answer. It either does what you want it to do or...

Speaker 2: It doesn't. It does not compile. Exactly. So that's really quite fascinating. So what led you from Berkeley to D. E. Shaw? They're one of the first quant shops. How did you get there? What sort of research did you do?

Speaker 1: Yeah, actually I spent time at D. E. Shaw in between my undergrad and my PhD program, so it was after Harvard that I went to D. E. Shaw.

Speaker 2: Did that light an interest in using machine learning and computers applied to finance, or what was that experience like?
Speaker 1: Yeah, it made me really interested in and excited about using statistical thinking and data analysis to sort of understand the dynamics of securities prices. Machine learning did not really play a role at that time, I think, not at D. E. Shaw, but, you know, probably nowhere. It was too immature a field in the nineties. But I had already been curious and interested in using these kinds of statistical tools in trading and in investing when I was finishing college, and then at D. E. Shaw, you know, I had brilliant colleagues and we were working on hard problems. So I really, I really got a lot out of it.

Speaker 2: It's still one of the top-performing hedge funds, one of the earliest quant hedge funds, a great, a great place to absolutely cut your teeth at. So was it Harvard, D. E. Shaw, and then Berkeley?

Speaker 1: Yeah, that's right.

Speaker 2: And then from Berkeley, how did you end up at Amazon?

Speaker 1: I guess I should correct myself. There was a year at Amazon after D. E. Shaw, but before Berkeley.

Speaker 2: And am I reading this correctly? The recommendation engine that Amazon uses, you helped develop?

Speaker 1: I would say I worked on it. You know, it existed, it was in place when I got there, and sort of the things that are familiar about the recommendation engine had already been built by my manager and his colleagues. But I worked, I did research on improvements and different ways of forming recommendations. It was funny because at the time, the entire database of purchase history for all of Amazon fit in one twenty-gigabyte file on a disk, so I could just load it on my computer and run.

Speaker 2: Now I don't think we could do that anymore. We could not. So, thank goodness for Amazon cloud services, so you could put, what is it, twenty-five years and hundreds of billions of dollars of transactions. So my assumption is products like that are highly iterative. The first version is all right, it does a half-decent job, and then it gets better, and then it starts to get almost spookily good. It's like, oh, how much of that is just the size of the database, and how much of that is just a clever algorithm?
Speaker 1: Well, that's a great question, because the two are inextricably linked. The way that you make algorithms great is by making them more powerful, more expressive, able to describe lots of different kinds of patterns and relationships. But those kinds of approaches need huge amounts of data in order to correctly sort out what's signal and what's noise. The more expressive a tool like that is, like a recommender system, the more prone it is to mistake one-time noise for persistent signal, and that is a recurring theme in statistical prediction. It is really the central problem in statistical prediction. So you have it in recommender systems, you have it in predicting price action, in the problems that we solve, and elsewhere.

Speaker 2: There was a pretty infamous New York Times article a couple of years ago about Target using their own recommender system and sending out maternity things to people. A dad gets his young teenage daughter's, "what is this?", and goes in to yell at them, and it turns out she was pregnant and they had pieced it together. How far of a leap is it from these systems to much more sophisticated machine learning and even large language models?

Speaker 1: The answer, it turns out, is that it's a question of scale. That wasn't at all obvious before GPT-3 and ChatGPT. But it just turned out that when you have, for example, GPT is built from a database of sentences in English, it's got a trillion words in it, that database, and when you take a trillion words and you use it to fit a model that has one hundred and seventy-five billion parameters, there is apparently a kind of transition where things become, you know, frankly astounding. I don't think that anybody who isn't astounded is telling the truth.

Speaker 2: Right. It's eerie in terms of how sophisticated it is, but it's also kind of surprising in terms of, I guess, what the programs like to call hallucinations. I guess if you're using the Internet as your base model, hey, there's one or two things on the Internet that are wrong, so of course that's going to show up in something like ChatGPT.
Speaker 1: Yeah, you know, underlying it, there's this tool, GPT-3, that's really the engine that powers ChatGPT, and that tool has one goal. It's a simple goal. You show it the beginning of a sentence, and it predicts the next word in the sentence, and that's all it is trained to do. I mean, it really is actually that simple.

Speaker 2: It's a dumb program that looks smart.

Speaker 1: If you like. But the thing about predicting the next word in a sentence is that whether the sequence of words being output is leading to something that is true or false is irrelevant. The only thing that it is trained to do is make highly accurate predictions of next words.
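To make the next-word objective concrete, here is a deliberately tiny sketch in Python: it builds a word-following frequency table from a made-up corpus and predicts whichever word most often came next. It is only an illustration of the training goal described above; GPT-3 does this job with a transformer network and one hundred seventy-five billion parameters, not a lookup table, and nothing here checks whether the continuation is true.

```python
from collections import Counter, defaultdict

# A made-up stand-in corpus; GPT-3's training data is on the order of a trillion words.
corpus = (
    "the market opened higher . the market opened lower . "
    "the market closed mixed . the fund closed a position ."
)

# Count how often each word follows each preceding word.
followers = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    followers[prev][nxt] += 1

def predict_next_word(prompt: str) -> str:
    """Return the word that most often followed the prompt's last word in the corpus."""
    last = prompt.split()[-1]
    candidates = followers.get(last)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

# The only objective is guessing the next word; whether the continuation is true never enters.
print(predict_next_word("the market"))  # "opened", the most frequent follower of "market"
print(predict_next_word("the fund"))    # "closed"
```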
Speaker 2: So when I said it's really very sophisticated, it's just, we tend to call this artificial intelligence, but I've read a number of people who said, hey, this really isn't AI, this is something a little more rudimentary.

Speaker 1: Yeah, I think, you know, a critic would say that artificial intelligence is a complete misnomer, that there's sort of nothing remotely intelligent in the colloquial sense about these systems. And then a common defense in AI research is that artificial intelligence is a moving target. As soon as you build a system that does something quasi-magical that was the old yardstick of intelligence, then the goalposts get moved by the people who are supplying the evaluations. And I guess I would sit somewhere in between. I think the language is unfortunate because it's so easily misconstrued. I wouldn't call the system dumb, and I wouldn't call it smart. You know, those are not characteristics of these systems.

Speaker 2: But it's complex and sophisticated.

Speaker 1: It certainly is. It has one hundred and seventy-five billion parameters. If that doesn't fit your definition of complex, you know, what would?

Speaker 2: Yeah, that works for me. So in your career line, where is Affymetrix, and what was that recommendation engine like?

Speaker 1: Yeah, so that was work I did as a summer research intern during my PhD. And that work was about, the problem is called genotype calling. So genotype calling, I'll explain. Barry, do you have an identical twin?

Speaker 2: I do not.

Speaker 1: Okay, so I can safely say your genome is unique in the world. There's no one else who has exactly your genome. On the other hand, if you were to lay your genome and mine alongside each other, lined up, they would be ninety-nine point nine percent identical. About one position in a thousand is different. But those differences are what cause you to be you and me to be me, so they're obviously of intense kind of scientific and applied interest. And so it's very important to be able to take a sort of a sample of your DNA and quickly produce a profile of all the places that have variability, what your particular values are. And that problem is the genotyping problem.

Speaker 2: And this used to be a very expensive, very complex problem to solve, that we've spent billions of dollars figuring out. Now a lot faster, a lot cheaper.

Speaker 1: A lot faster. In fact, even the technology I worked on in two thousand and five, two thousand and four, is multiple generations old and not really what's used anymore.

Speaker 2: So let's talk about what you did at Efficient Frontier. Explain what real-time click prediction rules are and how it works for a keyword search.
Speaker 1: Sure. The revenue engine that drives Google is search keyword ads, right? So every time you do a search, at the top you see ad, ad, ad. And so how do those ads get there? Well, actually, it's surprising maybe if you don't know about it, but every single time you type in a search term on Google and hit return, a very fast auction takes place, and a whole bunch of companies running software bid electronically to place their ads at the top of your search results. And more or less, the results that are shown on the page are in order of how much they bid. It's not quite true, but you could think of it as true.

Speaker 2: A rough outline. So the first three sponsored results on a Google page go through that auction process, and I think at this point everybody knows what PageRank is for, for the rest of that, right? And that seemed to be Google's secret sauce early on.

Speaker 1: Right. Well, you know, to talk about the ad placement: the people who are supplying the ad, who are participating in the auctions, they have a problem, which is how much to bid, right? And so how would you decide how much to bid? Well, you want to know basically the probability that somebody is going to click on your ad, and then you would multiply that by how much money you make eventually if they click. And that's kind of an expectation of how much money you'll make. And so then you gear your bid price to make sure that it's going to be profitable for you. And so really you have to make a decision about what this click-through rate is going to be. You have to predict the click-through probability.

Speaker 2: So I was going to say, this sounds like a very sophisticated application of computer science, probability, and statistics. And if you do it right, you make money, and if you do it wrong, your ad budget is a money loser.

Speaker 1: That's right.
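The bidding logic described here boils down to an expected-value calculation: the predicted click probability times the money made if the click happens, with the bid geared to stay below that number. A minimal sketch, with made-up figures and an illustrative margin knob that is not from the conversation:

```python
def expected_value_and_max_bid(p_click: float, revenue_if_clicked: float, margin: float = 0.2):
    """Expected money from showing the ad is P(click) times the revenue if clicked;
    the bid is then geared down so the placement stays profitable.

    All numbers, and the margin parameter, are purely illustrative.
    """
    expected_revenue = p_click * revenue_if_clicked   # the expectation described above
    max_bid = expected_revenue * (1.0 - margin)       # leave room for profit
    return expected_revenue, max_bid

# Example: a 3% predicted click-through rate and $40 of eventual revenue per click
# gives $1.20 of expected revenue per ad shown, so bid no more than about $0.96.
ev, bid = expected_value_and_max_bid(p_click=0.03, revenue_if_clicked=40.0)
print(f"expected revenue: ${ev:.2f}, maximum bid: ${bid:.2f}")
```

Getting the click-through probability right is the hard part: too optimistic and the ad budget loses money, too pessimistic and the bid never wins the auction.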
Speaker 2: Huh. So tell us a little bit about your doctorate, what you wrote about for your PhD at Berkeley.

Speaker 1: Yeah, so we're back to genomes. Actually, this was around the time when I was in my first year of my PhD program, which is when the human genome was published in Nature. So it was kind of really the beginning of the explosion of work on kind of high-throughput, large-scale genetics research. And one really important question after you've sequenced a genome is, well, what are all the bits of it doing? You can look at a string of DNA, it's just made up of these kind of four letters, but you don't want to just know the four letters. They're kind of a code. And some parts of the DNA represent useful stuff that is being turned by your cell into proteins, et cetera, and other parts of the DNA don't appear to have any function at all, and it's really important to know which is which as a biology researcher. And so, you know, for a long time before high-throughput sequencing, biologists would be in the lab and they would very laboriously look at very tiny segments of DNA and establish what their function was. But now we have the whole human genome sitting on disk, and we would like to be able to just run an analysis on it and have the computer spit out everything that is functional and not functional. And so that's the problem I worked on. And a really important insight is that you can take advantage of the idea of natural selection and the idea of evolution to help you. And the way you do that is, you have the human genome, you sequence a bunch of primate genomes, nearby relatives of the human, and you lay all those genomes on top of each other, and then you look for places where all of the genomes agree. Right? There hasn't been variation that's happened through mutations. And why hasn't there been? Well, the biggest force that throws out variation is natural selection. If you get a mutation in a part of your genome that really matters, then you won't have progeny, and that'll get stamped out. So natural selection is this very strong force that's causing DNA not to change. And so when you make these primate alignments, you can really leverage that fact and look for conservation, and use that as a big signal that something is functional.
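A toy version of that conservation signal can be written in a few lines: line the sequences up and flag the positions where every genome agrees. Real comparative-genomics pipelines use statistical models over whole alignments rather than exact matching, so treat this only as a sketch of the idea, with invented sequences.

```python
# Toy aligned sequences of equal length; real alignments run to millions of positions.
alignment = {
    "human":   "ACGTTACCGA",
    "chimp":   "ACGTTACCGA",
    "gorilla": "ACGATACCGA",
    "macaque": "ACGTTACCTA",
}

def conserved_positions(seqs: dict) -> list:
    """Return the 0-based alignment columns where every genome has the same letter.

    Agreement across species suggests natural selection has been removing mutations
    there, which is the signal used to flag a stretch of DNA as likely functional.
    """
    columns = zip(*seqs.values())   # iterate over the alignment column by column
    return [i for i, column in enumerate(columns) if len(set(column)) == 1]

print(conserved_positions(alignment))   # columns where all four genomes agree
```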
Speaker 2: Huh, really, really interesting. You mentioned our DNA is ninety-nine point nine nine, I don't know how many places to the right of the decimal point you would want to go, but very similar. How similar or different are we from, let's say, a chimpanzee? I've always questioned, there's an urban legend that they're practically the same; it always seems like it's overstated. Two percent? So you and I have a point one percent difference, and me and the average chimp, it's two point zero percent?

Speaker 1: That's exactly right. Yeah, so chimps are essentially our closest non-human primate relatives.

Speaker 2: Really, really quite fascinating. So let's talk a little bit about the firm. You guys were one of the earliest pioneers of machine learning research. Explain a little bit about what the firm does.

Speaker 1: Sure. So we run trading strategies, investment strategies, that are fully automated, so we call them fully systematic. And that means that we have software systems that run every day during market hours, and they take in information about the characteristics of the securities we're trading, think of stocks, and then they make predictions of how the price of each security is going to change over time, and then they decide on changes in our inventory, changes in held positions, based on those predictions, and then those desired changes are sent into an execution system which automatically carries them out.

Speaker 2: So fully automated. Is there supervision, or is it kind of running on its own with a couple of checks?
Speaker 1: There's lots of human diagnostic supervision, right? So there are people who are watching screens full of instrumentation and telemetry about what the systems are doing. But those people are not taking any actions, right, unless there's a problem, and then they do.

Speaker 2: So let's talk a little bit about how machines learn to identify signals. I'm assuming you start with the giant database that is the history of stock prices, volume, movement, et cetera, and then bring in a lot of additional things to bear. What's the process like, developing a particular trading strategy?

Speaker 1: Yeah. So, as you're saying, we begin with a very large historical data set of prices and volumes, market data of that kind, but importantly, all kinds of other information about securities: financial statement data, textual data, analyst data.

Speaker 2: So it's everything from prices to fundamentals, everything from earnings to revenue to sales, et cetera. I'm assuming the change, and the delta of the change, is going to be very significant in that. What about macroeconomic data, what some people call noise, but one would imagine some signal in everything from inflation to interest rates to GDP to firm spending? Are those inputs worthwhile, or how do you think about those?

Speaker 1: So we don't hold portfolios that are exposed to those things. So it's really a business decision on our part. We are working with institutional investors who already have as much exposure as they want to things like the market or to well-recognized econometric risk factors like value, and so they don't need our help to be exposed to those things. They are very well equipped to handle that part of their investment process. What we're trying to provide is the most diversification possible. So we want to give them a new return stream which has good and stable returns, but on top of that, importantly, is also not correlated with any of the other return streams that they already have.
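One standard way to check, and strip out, the kind of market exposure described here is to estimate a return stream's beta to the market and keep only the residual. The numpy sketch below uses simulated returns and is purely illustrative; it is not a description of Voleon's actual risk process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns: a market factor and a strategy with some market exposure baked in.
market = rng.normal(0.0003, 0.010, size=1000)
strategy = 0.5 * market + rng.normal(0.0004, 0.008, size=1000)

# Estimate the strategy's beta to the market, then hedge it out.
cov = np.cov(strategy, market)
beta = cov[0, 1] / cov[1, 1]
residual = strategy - beta * market   # the return stream left after removing market exposure

print(f"estimated beta: {beta:.2f}")
print(f"correlation with market before hedging: {np.corrcoef(strategy, market)[0, 1]:.2f}")
print(f"correlation with market after hedging:  {np.corrcoef(residual, market)[0, 1]:.2f}")
```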
Speaker 2: That's interesting. So can I assume that you're applying your machine learning methodology across different asset classes, or is it strictly equities?

Speaker 1: Oh no, we apply it to equities, to credit, to corporate bonds, and we trade futures contracts, and in the fullness of time, we hope that we will be trading kind of every security in the world.

Speaker 2: So currently stocks, bonds. When you say futures, I assume commodities, all...

Speaker 1: All kinds of futures contracts.

Speaker 2: It's really, really interesting. So it could be anything from interest rate swaps to commodities, the full gamut. So how different is this approach from what other quant shops do that really focus on equities?
Speaker 1: I think it's kind of the same question as asking, well, what do we mean when we say we use machine learning, or that our principles are machine learning principles? And so how does that make us different from the kind of standard approach in quantitative trading? And the answer to the question really comes back to this idea we mentioned a little while ago of how powerful the tools are that you're using to form predictions. Right? So in our business, the thing that we build is called a prediction rule. Okay, that's our widget. And what a prediction rule does is it takes in a bunch of input, a bunch of information about a stock at a moment in time, and it hands you a guess about how that stock's price is going to change over some future period of time. Okay? And so there is one most important question about prediction rules, which is: how complex are they? How much complexity do they have? Complexity is a colloquial term. It's unfortunately another example of a place where things can be vague or ambiguous, because a general-purpose word has been borrowed in a technical setting. But when you use the word complexity in statistical prediction, there's a very specific meaning. It means how much expressive power this prediction rule has, how good a job it can do of approximating what's going on in the data you show it. Remember, we have these giant historical data sets, and every entry in the data set looks like this: what was going on with the stock at a certain moment in time, its price action, its financials, analyst information, and then what did its price do in the subsequent twenty-four hours, or the subsequent fifteen minutes, or whatever. Okay? And so when you talk about the amount of complexity that a prediction rule has, that means how well it is able to capture the relationship between the things that you can show it when you ask it for a prediction and what actually happens to the price. And naturally, you kind of want to use high-complexity rules, because they have a lot of approximating power; they do a good job of describing anything that's going on. But there are two disadvantages to high complexity. One is it needs a lot of data, otherwise it gets fooled into thinking that randomness is actually signal. And the other is that it's hard to reason about what's going on under the hood. Right? When you have very simple prediction rules, you can sort of summarize everything that they're doing in a sentence. You can look inside them and get a complete understanding of how they behave, and that's not possible with high-complexity prediction rules.
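A minimal sketch of what one of those historical entries and a prediction rule might look like, using invented field names and a deliberately trivial rule:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Example:
    features: Dict[str, float]   # what was knowable at the time: price action, financials, analyst data
    forward_return: float        # what the price then did over the chosen horizon (say, the next 24 hours)

# A prediction rule maps the features to a guess at the forward return.
PredictionRule = Callable[[Dict[str, float]], float]

def toy_rule(features: Dict[str, float]) -> float:
    """A deliberately simple rule: bet on a slight reversal of the previous day's move."""
    return -0.1 * features["return_1d"]

entry = Example(
    features={"return_1d": 0.02, "volume_zscore": 1.3, "earnings_surprise": 0.0},
    forward_return=-0.004,
)
prediction = toy_rule(entry.features)        # -0.002
error = prediction - entry.forward_return    # rules are judged by how small this is on average
print(prediction, error)
```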
Speaker 2: So I'm glad you brought up the concept of how easy it is, or how frequently you can fool an algorithm or a complex rule, because sometimes the results are just random. And it reminds me of the issue of backtesting; no one ever shows you a bad backtest. How do you deal with the issue of overfitting, and backtesting that is just geared towards what already happened and not what might happen in the future?

Speaker 1: Yeah, that is, you know, if you like, the million-dollar question in statistical prediction. Okay? And you might find it surprising that relatively straightforward ideas go a long way here. And so let me just describe a little scenario of how you can deal with this. All right, we agree we have this big historical data set, right? One thing you could do is just start analyzing the heck out of that data set and find a complicated prediction rule. But you've already started doing it wrong. The first thing you do, before you even look at the data, is you randomly pick out half of the data and you lock it in a drawer. Okay? And that leaves you with the other half of the data that you haven't locked away. On this half, you get to go hog wild. You build every kind of prediction rule: simple rules, enormously complicated rules, everything in between. Right? And now you can check how accurate all of these prediction rules that you've built are on the data that they have been looking at, and the answer will always be the same: the most complex rules will look the best. Of course, they have the most expressive power, so naturally they do the best job of describing what you showed them. The big problem is that what you showed them is a mix of signal and noise, and there's no way you can tell to what extent a complex rule has found the signal versus the noise. All you know is that it's perfectly described the data you showed it. You certainly suspect it must be overfitting if it's doing that well. Okay, so now you freeze all those prediction rules. You're not allowed to change them in any way anymore. And now you unlock the drawer and you pull out all that data that you've never looked at. You can't overfit data that you never fit. And so you take that data and you run it through each of these frozen prediction rules that you built. And now it is not the case at all that the most complex rules look the best. Instead, you'll see a kind of U-shaped behavior, where the very simple rules are too simple; they've missed signal, they've left signal on the table. The too-complex rules are also doing badly, because they've captured all the signal but also lots of noise. And then somewhere in the middle is a sweet spot where you've struck the right trade-off between how much expressive power the prediction rule has and how good a job it is doing of avoiding the mistaking of noise for signal.
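The drawer procedure just described is easy to demonstrate on synthetic data. The sketch below fits polynomial prediction rules of increasing complexity on one half of some noisy data, then scores the frozen rules on the held-out half: in-sample error keeps falling as complexity grows, while held-out error traces the U shape. The data, the polynomial family, and the error metric are all stand-ins, not anything Voleon uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy synthetic data: a smooth signal plus random noise.
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=60)

# Step 1: before looking at anything, lock a random half of the data in a drawer.
idx = rng.permutation(len(x))
fit_idx, drawer_idx = idx[:30], idx[30:]

def mse(pred, actual):
    return float(np.mean((pred - actual) ** 2))

# Step 2: on the working half, build rules of every complexity (here, polynomial degree).
results = []
for degree in range(1, 13):
    rule = np.polyfit(x[fit_idx], y[fit_idx], degree)                # fit the rule
    in_sample = mse(np.polyval(rule, x[fit_idx]), y[fit_idx])
    # Step 3: freeze the rule, unlock the drawer, and score it on data it never saw.
    held_out = mse(np.polyval(rule, x[drawer_idx]), y[drawer_idx])
    results.append((degree, in_sample, held_out))

for degree, in_s, out_s in results:
    print(f"degree {degree:2d}   in-sample MSE {in_s:.3f}   held-out MSE {out_s:.3f}")

# In-sample error only improves with complexity; the held-out column has a sweet spot.
best = min(results, key=lambda r: r[2])[0]
print("sweet spot (lowest held-out error): degree", best)
```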
Speaker 2: Really, really intriguing. So you guys have built one of the largest specialized machine learning research and development teams in finance. How do you assemble a team like that, and how do you get the brain trust to do the sort of work that's applicable to managing assets?

Speaker 1: Well, the short answer is, we spend a huge amount of energy on recruiting and, you know, identifying the sort of premier people in the field of machine learning, both academics and practitioners, and we exhibit a lot of patience. We wait a really long time to be able to find the people who are kind of really the best, and that matters enormously to us, both from the standpoint of the success of the firm and also because it's something that, you know, we value extremely highly, just having great colleagues, brilliant colleagues. You know, I want to work in a place where I can learn from all the people around me. And, you know, when my co-founder Michael Kharitonov and I were talking about starting Voleon, one of the reasons that was on our minds is we wanted to be in control of who we worked with. You know, we really wanted to be able to assemble a group of people who were as brilliant as we could find, but also, you know, good people, people that we liked, people that we were excited to collaborate with.

Speaker 2: So let's talk about some of the fundamental principles Voleon is built on. You reference a prediction-based approach from a paper Leo Breiman wrote called "Two Cultures." Tell us a little bit about what "Two Cultures" actually is.
Speaker 1: Yeah. So this paper was written about twenty years ago. Leo Breiman was one of the great probabilists and statisticians of his generation, a Berkeley professor, need I say. And, you know, Leo had been a practitioner in statistical consulting, actually, for quite some time, in between a UCLA tenured job and returning to academia at Berkeley, and he learned a lot in that time about actually solving prediction problems, as opposed to hypothetically solving them in sort of the academic context. And so all of his insights about the difference really culminated in this paper from two thousand that he wrote.

Speaker 2: The difference between practical use versus academic theory, if you like.

Speaker 1: Yeah. And so he identified two schools of thought about solving prediction problems, right? And one school is sort of model-based. The idea is, there's some stuff you're going to get to observe, stock characteristics, let's say; there's a thing you wish you knew, future price change, let's say; and there's a box in nature that turns those inputs into the output, right? And in the model-based school of thought, you try to open that box, reason about how it must work, make theories. In our case, these would be sort of econometric theories, financial economics theories. And then those theories have knobs, not many, and you use data to set the knobs, but otherwise you believe the model. And he contrasts that with the machine learning school of thought, which also has the idea of nature's box: the inputs go in, the thing you wish you knew comes out. But in machine learning, you don't try to open the box. You just try to approximate what the box is doing. And your measure of success is predictive accuracy, and it is only predictive accuracy. If you build a gadget and that gadget produces predictions that are really accurate, they turn out to look like the thing that nature produces, then that is success. And at the time he wrote the paper, his assessment was that ninety-eight percent of statistics was taking the model-based approach, and two percent was taking the machine learning approach.
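As a toy illustration of the two cultures applied to the same problem (not of anything Voleon runs), the sketch below fits a believe-the-model linear regression with a handful of knobs and a flexible, harder-to-interpret random forest, then judges both the way the machine learning culture insists on: by predictive accuracy on data that was not used for fitting. The data and model choices are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Synthetic data in which nature's "box" is nonlinear.
X = rng.uniform(-2, 2, size=(600, 3))
y = np.sin(X[:, 0]) * X[:, 1] + 0.3 * X[:, 2] ** 2 + rng.normal(0, 0.2, size=600)

X_fit, X_heldout = X[:400], X[400:]
y_fit, y_heldout = y[:400], y[400:]

# Culture one: a model with few knobs that you can reason about and believe.
linear = LinearRegression().fit(X_fit, y_fit)

# Culture two: an algorithmic approximator judged only on how well it predicts.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_fit, y_fit)

def heldout_mse(model):
    return float(np.mean((model.predict(X_heldout) - y_heldout) ** 2))

print(f"linear model held-out MSE:  {heldout_mse(linear):.3f}")
print(f"random forest held-out MSE: {heldout_mse(forest):.3f}")
```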
Speaker 2: And are those statistics still valid today, or have we shifted quite a bit?

Speaker 1: We've shifted quite a bit. And different arenas of prediction problems have different mixes these days. But even in finance, I would say it's probably more like fifty-fifty.

Speaker 2: Really? That much?
Speaker 1: Yeah, I think so. You know, the logical extreme is natural language modeling, which was done for decades and decades in the model-based approach, where you kind of reasoned about linguistic characteristics of how people do dialogue, and those models had some parameters and you fit them with data. And then instead you have, as we said, a database of a trillion words and a tool with one hundred and seventy-five billion parameters, and you run that, and there is no hope of completely understanding what is going on inside of GPT-3. But nobody complains about that, because the results are astounding. The thing that you get is incredible. And so that is, by analogy, the way that we reason about running systematic investment strategies. At the end of the day, predictive accuracy is what creates returns for investors. Being able to give complete descriptions of exactly how the predictions arise does not in itself create returns for investors. Now, I'm not against interpretability and simplicity, all else equal. I love interpretability and simplicity. But all else is not equal. If you want the most accurate predictions, you are going to have to sacrifice some amount of simplicity. In fact, this truth is so widespread that Leo gave it a name in his paper. He called it Occam's dilemma. So Occam's razor is the philosophical idea that you should choose the simplest explanation that fits the facts. Occam's dilemma is the point that in statistical prediction, the simplest approach, even though you wish you could choose it, is not the most accurate approach, if you care about predictive accuracy. If you're putting predictive accuracy first, then you have to embrace a certain amount of complexity and lack of interpretability.

Speaker 2: Huh, that's really quite fascinating. So let's talk a little bit about artificial intelligence and large language models. Following D. E. Shaw, you were playing in e-commerce and biotech. It seems like this approach to using statistics, probability, and computer science is applicable to so many different fields.

Speaker 1: It is, yeah. I think you're talking about prediction problems, ultimately. So in recommender systems, you can think of the question as being, well, if I had to predict, what thing could I show a person that would be most likely to change their behavior and cause them to buy it? It's a kind of prediction problem that motivates recommendations. In biotechnology, very often we are trying to make predictions about whether someone, let's say, does or doesn't have a condition, a disease, based on lots of information we can gather from high-throughput diagnostic techniques. These days, the keyword in biology and in medicine and biotechnology is high throughput. You're running analyses on an individual that are producing hundreds of thousands of numbers, and you want to be able to take all of that kind of wealth of data and turn it into diagnostic information.
And so any field 590 00:35:15,800 --> 00:35:20,959 Speaker 1: where it's relatively easy to produce, at large scale, let's say, 591 00:35:20,960 --> 00:35:23,279 Speaker 1: the same kinds of information that 592 00:35:23,920 --> 00:35:26,520 Speaker 1: experts are using to make their decisions, you should expect 593 00:35:26,520 --> 00:35:29,160 Speaker 1: that field to be impacted by these tools if it 594 00:35:29,160 --> 00:35:29,920 Speaker 1: hasn't been already. 595 00:35:30,000 --> 00:35:33,200 Speaker 2: So you're kind of answering my next question, which is 596 00:35:33,680 --> 00:35:36,719 Speaker 2: what led you back to investment management. But it seems 597 00:35:37,040 --> 00:35:40,560 Speaker 2: if there's any field that just generates endless amounts of data. 598 00:35:40,840 --> 00:35:44,200 Speaker 1: It's the markets, that's true. And I had already been 599 00:35:44,960 --> 00:35:48,879 Speaker 1: really interested in the problems of systematic investment strategies from 600 00:35:48,920 --> 00:35:52,160 Speaker 1: my time working at D. E. Shaw, and so my co 601 00:35:52,280 --> 00:35:56,200 Speaker 1: founder Michael Kharitonov and I, you know, we were both 602 00:35:56,200 --> 00:35:59,399 Speaker 1: in the Bay Area in two thousand and four. 603 00:36:02,120 --> 00:36:04,280 Speaker 1: He was there because of a firm that he had founded, 604 00:36:04,320 --> 00:36:06,960 Speaker 1: and I was there finishing my PhD. And we started 605 00:36:06,960 --> 00:36:10,360 Speaker 1: to talk about the idea of using contemporary machine learning 606 00:36:10,360 --> 00:36:14,480 Speaker 1: methods to build strategies that would be, you know, really 607 00:36:14,480 --> 00:36:19,600 Speaker 1: different from strategies that result from classical techniques. And we 608 00:36:19,640 --> 00:36:21,400 Speaker 1: had met at D. E. Shaw in the nineties and been 609 00:36:21,480 --> 00:36:26,640 Speaker 1: less excited about this idea because the methods were pretty immature. 610 00:36:27,000 --> 00:36:29,719 Speaker 1: There wasn't actually a giant diversity of data back in 611 00:36:29,760 --> 00:36:33,200 Speaker 1: the nineties in financial markets, not like there was in 612 00:36:33,239 --> 00:36:36,719 Speaker 1: two thousand and five. And compute was really still quite 613 00:36:36,719 --> 00:36:39,359 Speaker 1: expensive in the nineties, whereas in two thousand and five, 614 00:36:40,000 --> 00:36:42,440 Speaker 1: you know, it had been dropping in the usual Moore's 615 00:36:42,520 --> 00:36:45,560 Speaker 1: Law way. And this was even before GPUs. And so 616 00:36:45,840 --> 00:36:47,879 Speaker 1: when we looked at the problem in two thousand and five, 617 00:36:48,520 --> 00:36:52,960 Speaker 1: it felt like there was a very live opportunity to 618 00:36:53,040 --> 00:36:56,120 Speaker 1: do something with a lot of promise that would be 619 00:36:56,200 --> 00:36:59,680 Speaker 1: really different. And we had the sense that not a 620 00:36:59,680 --> 00:37:02,480 Speaker 1: lot of people were of the same opinion, and so 621 00:37:02,520 --> 00:37:04,680 Speaker 1: it seemed like something that we should try. 622 00:37:04,880 --> 00:37:08,240 Speaker 2: So there was a void, and nothing in the market 623 00:37:08,239 --> 00:37:12,160 Speaker 2: hates more than a vacuum in intellectual approach. So 624 00:37:12,560 --> 00:37:17,879 Speaker 2: you mentioned the diversity of various data sources.
What what 625 00:37:18,000 --> 00:37:21,520 Speaker 2: don't you consider, like how how far off of price 626 00:37:21,560 --> 00:37:25,280 Speaker 2: and volume do you go in the net you're casting 627 00:37:25,440 --> 00:37:28,040 Speaker 2: for inputs into into your systems. 628 00:37:29,000 --> 00:37:33,440 Speaker 1: Well, I think we're prepared as a you know, as 629 00:37:33,480 --> 00:37:36,919 Speaker 1: a as a research principle, we're prepared to consider any 630 00:37:37,120 --> 00:37:41,160 Speaker 1: data that has some bearing on price formation, like some 631 00:37:41,160 --> 00:37:44,560 Speaker 1: some plausible bearing on how prices are formed. Now, of 632 00:37:44,600 --> 00:37:47,880 Speaker 1: course we're you know, we're a relatively small group of 633 00:37:47,880 --> 00:37:50,759 Speaker 1: people with a lot of ideas and uh, and so 634 00:37:51,120 --> 00:37:55,240 Speaker 1: we have to prioritize so you know, in the event 635 00:37:55,360 --> 00:37:58,200 Speaker 1: we end up pursuing data that you know makes a 636 00:37:58,239 --> 00:38:00,520 Speaker 1: lot of sense, you know, we don't we don't try. 637 00:38:00,920 --> 00:38:03,080 Speaker 2: I mean, can you go as far as politics or 638 00:38:03,120 --> 00:38:06,360 Speaker 2: the weather, like how far off of prices can you 639 00:38:06,560 --> 00:38:07,160 Speaker 2: can you look? 640 00:38:07,239 --> 00:38:10,120 Speaker 1: So, you know, an example would be the weather. You're 641 00:38:09,920 --> 00:38:12,800 Speaker 1: for most securities, you're not going to be very interested 642 00:38:12,880 --> 00:38:15,200 Speaker 1: in the weather, but for commodities future as you might be, 643 00:38:15,320 --> 00:38:17,080 Speaker 1: so that you know, that's the kind of reasoning you 644 00:38:17,080 --> 00:38:17,560 Speaker 1: would apply. 645 00:38:18,120 --> 00:38:22,799 Speaker 2: Right, really really interesting. So let's talk about some of 646 00:38:22,840 --> 00:38:26,960 Speaker 2: the strategies. You guys are running short and mid horizon 647 00:38:27,120 --> 00:38:32,600 Speaker 2: US equities, European equities, Asian equities, mid horizon US credit, 648 00:38:33,040 --> 00:38:36,680 Speaker 2: and then cross assets. So I might to assume all 649 00:38:36,719 --> 00:38:40,200 Speaker 2: of these are machine learning based, and how similar different 650 00:38:40,960 --> 00:38:43,480 Speaker 2: is each approach to each of those asset classes. 651 00:38:43,920 --> 00:38:50,040 Speaker 1: Yeah, they're all machine learning based. The kind of principles 652 00:38:50,080 --> 00:38:53,360 Speaker 1: that I've described of using as much complexity as you 653 00:38:53,440 --> 00:38:57,800 Speaker 1: need to maximize predictive accuracy, et cetera. Those principles underlie 654 00:38:57,880 --> 00:39:00,840 Speaker 1: all the systems. But of course it's trading. Trading corporate 655 00:39:00,840 --> 00:39:03,840 Speaker 1: bonds is very different from trading equities, and so the 656 00:39:04,200 --> 00:39:06,040 Speaker 1: implementations reflect that reality. 657 00:39:06,760 --> 00:39:09,879 Speaker 2: Huh. So let's talk a little bit about the four 658 00:39:09,960 --> 00:39:15,000 Speaker 2: step process that you bring to the systematic approach, and 659 00:39:15,040 --> 00:39:19,240 Speaker 2: this is off of your site, so it's it's data prediction, engine, 660 00:39:19,640 --> 00:39:26,400 Speaker 2: portfolio construction, and execution. 
Yeah, I'm assuming that is heavily 661 00:39:26,560 --> 00:39:30,359 Speaker 2: computer and machine learning based at each step along the way. 662 00:39:30,440 --> 00:39:31,359 Speaker 2: Is that fair? 663 00:39:32,480 --> 00:39:34,880 Speaker 1: I think that's fair. I mean, to different degrees. The 664 00:39:35,360 --> 00:39:41,719 Speaker 1: data gathering, that's, you know, that's largely a 665 00:39:41,800 --> 00:39:46,080 Speaker 1: software and kind of operations and infrastructure job. 666 00:39:46,280 --> 00:39:48,280 Speaker 2: Do you guys have to spend a lot of time 667 00:39:48,400 --> 00:39:51,759 Speaker 2: cleaning up that data and making sure, because you 668 00:39:52,120 --> 00:39:56,320 Speaker 2: hear, between CRSP and S&P and Bloomberg, sometimes 669 00:39:56,400 --> 00:39:58,719 Speaker 2: you'll pull something up and they're just all off a 670 00:39:58,719 --> 00:40:00,759 Speaker 2: little bit from each other because they all bring a 671 00:40:00,840 --> 00:40:04,040 Speaker 2: very different approach to data assembly. How do you make 672 00:40:04,080 --> 00:40:07,960 Speaker 2: sure everything is consistent and there's no errors or errant 673 00:40:09,040 --> 00:40:09,960 Speaker 2: inputs throughout? 674 00:40:10,239 --> 00:40:14,000 Speaker 1: Yeah, through a lot of effort, essentially. There, we have, 675 00:40:15,040 --> 00:40:17,799 Speaker 1: you know, we have an entire group of people who 676 00:40:17,840 --> 00:40:23,800 Speaker 1: focus on data operations, both for gathering historical data 677 00:40:23,880 --> 00:40:27,080 Speaker 1: and for the management of the ongoing live data feeds. 678 00:40:27,360 --> 00:40:29,319 Speaker 1: There's no way around that. I mean, that's just work 679 00:40:29,360 --> 00:40:31,080 Speaker 1: that you have to do. 680 00:40:31,200 --> 00:40:33,080 Speaker 2: You just have to brute force your way through that. 681 00:40:33,520 --> 00:40:36,640 Speaker 2: And then the prediction engine. Sounds like that's the single 682 00:40:36,719 --> 00:40:41,600 Speaker 2: most important part of the machine learning process if I'm 683 00:40:42,040 --> 00:40:45,880 Speaker 2: understanding you correctly, that that's where all the meat of 684 00:40:45,960 --> 00:40:47,080 Speaker 2: the technology is. 685 00:40:47,280 --> 00:40:50,680 Speaker 1: Yeah, I understand the sentiment. I mean, it's worth emphasizing 686 00:40:50,719 --> 00:40:54,200 Speaker 1: that you do not get to a successful systematic strategy 687 00:40:54,200 --> 00:40:57,200 Speaker 1: without all the ingredients. You have to have clean data 688 00:40:57,680 --> 00:41:01,920 Speaker 1: because of garbage in, garbage out. You have to 689 00:41:01,920 --> 00:41:06,880 Speaker 1: have accurate predictions. But you know, predictions don't automatically translate 690 00:41:06,920 --> 00:41:10,400 Speaker 1: into returns for investors. Those predictions are kind of the 691 00:41:10,480 --> 00:41:15,440 Speaker 1: power that drives the portfolio holding part of the system. 692 00:41:15,480 --> 00:41:18,520 Speaker 2: So let's talk about that portfolio construction. Given that you 693 00:41:18,640 --> 00:41:22,920 Speaker 2: have a prediction engine and good data going into it, 694 00:41:23,200 --> 00:41:26,560 Speaker 2: so you're fairly confident as to the output.
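The vendor-reconciliation chore described a moment ago can be pictured with a small sketch: pull the same field from two feeds, compare, and route disagreements to review rather than letting them flow into research. The vendor names, tickers, prices, and tolerance below are all hypothetical, and real data operations are far more involved than this.

```python
# Minimal sketch of cross-vendor data reconciliation: flag closing prices that
# disagree beyond a tolerance so a person (or a rule) can resolve them.
vendor_a = {"IBM": 168.20, "XOM": 104.55, "GE": 112.10}
vendor_b = {"IBM": 168.20, "XOM": 104.60, "GE": 131.40}   # e.g., a split adjusted differently

TOLERANCE = 0.005  # 50 basis points of relative disagreement

for symbol in sorted(set(vendor_a) & set(vendor_b)):
    a, b = vendor_a[symbol], vendor_b[symbol]
    rel_diff = abs(a - b) / ((a + b) / 2)
    status = "OK" if rel_diff <= TOLERANCE else "MISMATCH -> hold for review"
    print(f"{symbol}: {a} vs {b} ({rel_diff:.2%}) {status}")
```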
How do 695 00:41:26,640 --> 00:41:28,920 Speaker 2: you then take that output and say, here's how I'm 696 00:41:28,920 --> 00:41:32,400 Speaker 2: going to build a portfolio based on what this generates? 697 00:41:32,520 --> 00:41:38,920 Speaker 1: Yeah, so there are three big ingredients in the portfolio construction. 698 00:41:39,160 --> 00:41:43,440 Speaker 1: The predictions, what is usually called a risk model in 699 00:41:43,760 --> 00:41:50,560 Speaker 1: this business, which means some understanding of how volatile prices 700 00:41:50,600 --> 00:41:54,000 Speaker 1: are across all the securities you're trading, how correlated they are, 701 00:41:54,840 --> 00:41:57,200 Speaker 1: how, you know, if they have 702 00:41:57,239 --> 00:42:00,719 Speaker 1: a big movement, how big that movement will be. That's 703 00:42:00,760 --> 00:42:03,880 Speaker 1: all the risk model. And then the final ingredient is 704 00:42:04,840 --> 00:42:08,240 Speaker 1: what's usually called a market impact model, and that means 705 00:42:09,640 --> 00:42:13,279 Speaker 1: an understanding of how much you are going to push 706 00:42:13,320 --> 00:42:15,759 Speaker 1: prices away from you when you try to trade. This 707 00:42:15,800 --> 00:42:18,799 Speaker 1: is a reality of all trading. You buy a lot 708 00:42:18,800 --> 00:42:21,239 Speaker 1: of a security, you push the price up, you push 709 00:42:21,280 --> 00:42:24,239 Speaker 1: it away from you in the unfavorable direction. And in 710 00:42:24,280 --> 00:42:27,959 Speaker 1: the systems that we run, the predictions that we're trying 711 00:42:27,960 --> 00:42:31,880 Speaker 1: to capture are about the same size as the effect 712 00:42:31,920 --> 00:42:34,279 Speaker 1: that we have on the markets when we trade, and 713 00:42:34,360 --> 00:42:38,439 Speaker 1: so you cannot neglect that impact effect when you're thinking 714 00:42:38,480 --> 00:42:41,000 Speaker 1: about what portfolios to hold. 715 00:42:41,040 --> 00:42:44,760 Speaker 2: So execution becomes really important. If you're not executing well, 716 00:42:44,920 --> 00:42:48,560 Speaker 2: you are moving prices away from your profit. 717 00:42:48,880 --> 00:42:53,400 Speaker 1: That's right, and it is, you know, probably the single 718 00:42:53,520 --> 00:42:59,319 Speaker 1: thing that undoes quantitative hedge funds most often, is that 719 00:43:00,040 --> 00:43:04,560 Speaker 1: they misunderstand how much they're moving prices. They get too big, 720 00:43:04,600 --> 00:43:08,040 Speaker 1: they start trading too much, and they sort of blow 721 00:43:08,080 --> 00:43:08,600 Speaker 1: themselves up. 722 00:43:08,800 --> 00:43:11,120 Speaker 2: It's funny that you say that, because as you were 723 00:43:11,160 --> 00:43:14,239 Speaker 2: describing that, the first name that popped into my head 724 00:43:14,680 --> 00:43:19,400 Speaker 2: was Long Term Capital Management's trading these really thinly traded 725 00:43:20,239 --> 00:43:27,080 Speaker 2: obscure fixed income products, and everything they bought they sent 726 00:43:27,239 --> 00:43:30,359 Speaker 2: higher because there just wasn't any volume in it. And 727 00:43:30,400 --> 00:43:33,359 Speaker 2: when they needed liquidity there was none to be had. 728 00:43:33,440 --> 00:43:36,800 Speaker 2: And you know, that plus no risk management plus one hundred 729 00:43:36,960 --> 00:43:39,440 Speaker 2: x leverage equals a kaboom. 730 00:43:39,840 --> 00:43:43,120 Speaker 1: They made a number of mistakes.
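The three ingredients just listed, return predictions, a risk model, and a market impact model, combine naturally into a single-period portfolio choice: pick holdings that trade expected return against risk and against the cost of pushing prices while trading. The sketch below uses made-up numbers, a generic optimizer, and an assumed power-law impact cost; it illustrates the structure of the problem, not Voleon's actual construction.

```python
# Stylized single-period portfolio construction: predictions + risk model +
# market impact model, solved with a generic optimizer. Toy inputs only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5
alpha = rng.normal(0, 0.002, size=n)          # predicted returns (hypothetical)
A = rng.normal(size=(n, n))
cov = A @ A.T / n * 1e-4                      # risk model: covariance of returns
w_prev = np.zeros(n)                          # current holdings (start flat)
risk_aversion = 10.0
impact_coef = 0.001                           # assumed impact cost ~ |trade|^(3/2)

def objective(w):
    trade = w - w_prev
    expected = alpha @ w
    risk_penalty = risk_aversion * (w @ cov @ w)
    impact_cost = impact_coef * np.sum(np.abs(trade) ** 1.5)
    return -(expected - risk_penalty - impact_cost)   # minimize the negative

res = minimize(objective, x0=w_prev, method="L-BFGS-B",
               bounds=[(-1.0, 1.0)] * n)      # per-name position limits
print("target weights:", np.round(res.x, 3))
```

The impact term is what keeps the optimizer from trading all the way to the unconstrained target, which is the discipline that, as described here, too-large quant funds tend to neglect.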
731 00:43:43,160 --> 00:43:45,279 Speaker 1: The book is good. So, When Genius Failed? Oh, absolutely, love 732 00:43:45,280 --> 00:43:46,880 Speaker 1: that. Fantastically fascinating. 733 00:43:46,960 --> 00:43:51,239 Speaker 2: So when you're reading a book like that, somewhere in 734 00:43:51,280 --> 00:43:53,279 Speaker 2: the back of your head are you thinking, hey, this 735 00:43:53,440 --> 00:43:56,280 Speaker 2: is like a what not to do when you're setting 736 00:43:56,360 --> 00:44:00,000 Speaker 2: up a machine learning fund? How influential is something like that? 737 00:44:00,040 --> 00:44:03,480 Speaker 1: Well, one hundred percent. I mean, look, I think the 738 00:44:03,520 --> 00:44:05,959 Speaker 1: most important adage I've ever heard in my professional life 739 00:44:06,000 --> 00:44:10,160 Speaker 1: is good judgment comes from experience. Experience comes from bad judgment. 740 00:44:10,600 --> 00:44:13,800 Speaker 1: So the extent to which you can get good judgment 741 00:44:14,120 --> 00:44:17,440 Speaker 1: from other people's experience, that is like 742 00:44:17,480 --> 00:44:22,400 Speaker 1: free tuition. And so we talk a lot about 743 00:44:22,560 --> 00:44:25,640 Speaker 1: all the mistakes that other people 744 00:44:25,680 --> 00:44:29,960 Speaker 1: have made. And you know, we do not congratulate ourselves 745 00:44:29,960 --> 00:44:33,600 Speaker 1: on having avoided mistakes. We think those people were smart. 746 00:44:33,680 --> 00:44:36,040 Speaker 1: I mean, look, you know, you read about these 747 00:44:36,040 --> 00:44:38,040 Speaker 1: events and these people. None of these people were dummies. 748 00:44:38,080 --> 00:44:40,000 Speaker 1: They were sophisticated Nobel laureates. 749 00:44:40,080 --> 00:44:43,040 Speaker 2: Yeah, right, they just didn't have a guidebook 750 00:44:43,080 --> 00:44:45,560 Speaker 2: on what not to do, which you guys do. 751 00:44:45,280 --> 00:44:47,880 Speaker 1: We don't. No, I don't think we do. I 752 00:44:47,920 --> 00:44:50,200 Speaker 1: mean, apart from reading about it, right. But 753 00:44:50,480 --> 00:44:53,160 Speaker 1: everybody is undone by a failure that they didn't 754 00:44:53,280 --> 00:44:55,240 Speaker 1: think of, or didn't know about yet. 755 00:44:55,280 --> 00:44:57,200 Speaker 1: And we're extremely cognizant of that. 756 00:44:57,560 --> 00:45:00,680 Speaker 2: Huh. That has to be somewhat humbling, to keep being 757 00:45:01,000 --> 00:45:06,720 Speaker 2: on the lookout for that blind spot that could disrupt everything. 758 00:45:06,920 --> 00:45:11,480 Speaker 1: Yes, yeah, humility is the key ingredient in 759 00:45:11,560 --> 00:45:13,040 Speaker 1: running these systems. 760 00:45:13,400 --> 00:45:18,000 Speaker 2: Really quite amazing. So let's talk a little bit about 761 00:45:18,360 --> 00:45:22,960 Speaker 2: how academically focused Voleon is. You guys have a 762 00:45:23,040 --> 00:45:28,000 Speaker 2: pretty deep R and D team internally, you teach at Berkeley. 763 00:45:28,200 --> 00:45:30,560 Speaker 2: What does it mean for a hedge fund to be 764 00:45:30,680 --> 00:45:32,000 Speaker 2: academically focused? 765 00:45:32,480 --> 00:45:36,120 Speaker 1: What I would say probably is kind of evidence based 766 00:45:36,280 --> 00:45:40,640 Speaker 1: rather than academically focused.
Saying academically focused gives the impression 767 00:45:40,760 --> 00:45:43,600 Speaker 1: that kind of papers would be the goal or the 768 00:45:43,960 --> 00:45:46,080 Speaker 1: desired output, and that's not the case at all. We have, 769 00:45:46,520 --> 00:45:49,319 Speaker 1: you know, a very specific applied problem that we are 770 00:45:49,440 --> 00:45:50,200 Speaker 1: trying to solve. 771 00:45:50,480 --> 00:45:51,799 Speaker 2: Papers are a mean to an end. 772 00:45:52,000 --> 00:45:56,320 Speaker 1: Papers are you know, we don't write papers for external consumption. 773 00:45:56,360 --> 00:45:59,600 Speaker 1: We do lots of writing internally, and that's to make 774 00:45:59,640 --> 00:46:02,359 Speaker 1: sure that that you know, we're keeping track of our 775 00:46:02,400 --> 00:46:04,840 Speaker 1: own kind of scientific process. 776 00:46:05,000 --> 00:46:08,800 Speaker 2: But you're fairly widely published in statistics and machine learning. Yes, 777 00:46:08,840 --> 00:46:13,280 Speaker 2: what purpose does that serve other than a calling card 778 00:46:13,440 --> 00:46:16,440 Speaker 2: for the fund as well as Hey, I have this 779 00:46:16,560 --> 00:46:18,360 Speaker 2: idea and I want to see what the rest of 780 00:46:18,600 --> 00:46:21,640 Speaker 2: my peers think of it. When when you put stuff 781 00:46:21,680 --> 00:46:24,839 Speaker 2: out into the world, what sort of feedback or pushback 782 00:46:25,239 --> 00:46:26,120 Speaker 2: do you get? 783 00:46:27,480 --> 00:46:29,319 Speaker 1: I guess I would have to say, I really I 784 00:46:29,400 --> 00:46:32,279 Speaker 1: do that as kind of a double life of non 785 00:46:32,320 --> 00:46:37,880 Speaker 1: financial research. So it's just something that I really enjoy. Principally, 786 00:46:37,880 --> 00:46:39,359 Speaker 1: what it means is that I get to work with 787 00:46:39,719 --> 00:46:44,760 Speaker 1: PhD students, and you know, we have really outstanding PhD 788 00:46:44,800 --> 00:46:50,600 Speaker 1: students at Berkeley in statistics, and so it's an opportunity 789 00:46:50,640 --> 00:46:58,640 Speaker 1: for me to do a kind of intellectual work that namely, 790 00:46:58,880 --> 00:47:01,080 Speaker 1: you know, writing a paper laying out an argument for 791 00:47:01,120 --> 00:47:05,040 Speaker 1: public consumption, et cetera that is kind of closed off 792 00:47:05,120 --> 00:47:05,960 Speaker 1: as far as so. 793 00:47:05,960 --> 00:47:10,080 Speaker 2: Not adjacent to what you guys are doing at Volleyon generally. No, No, 794 00:47:10,560 --> 00:47:14,440 Speaker 2: that's really interesting. So then I always assume that that 795 00:47:14,600 --> 00:47:17,799 Speaker 2: was part of your process for developing new models to 796 00:47:17,840 --> 00:47:22,120 Speaker 2: apply machine learning to new assets. Take us through the process. 797 00:47:22,160 --> 00:47:25,400 Speaker 2: How do you go about saying, Hey, this is an 798 00:47:25,440 --> 00:47:28,080 Speaker 2: asset class we don't have exposure to. Let's see how 799 00:47:28,080 --> 00:47:32,280 Speaker 2: to apply what we already know to that specific area. 800 00:47:32,560 --> 00:47:35,399 Speaker 1: Yeah, we have it's a great question. 
So we're trying 801 00:47:35,400 --> 00:47:39,680 Speaker 1: as much as possible to get the problem for a 802 00:47:39,719 --> 00:47:43,160 Speaker 1: new asset class into a familiar setup, into, you know, 803 00:47:43,719 --> 00:47:47,839 Speaker 1: as standard a setup as we can. And so we 804 00:47:47,920 --> 00:47:52,200 Speaker 1: know what these systems look like in the world of equities. 805 00:47:52,320 --> 00:47:55,359 Speaker 1: And so if you're trying to do the same kind, 806 00:47:55,400 --> 00:47:57,440 Speaker 1: if you're trying to build the same kind of system 807 00:47:57,520 --> 00:47:59,920 Speaker 1: for corporate bonds, and you start off by saying, well, okay, 808 00:48:00,239 --> 00:48:02,400 Speaker 1: I'd like, I need to know, you know, closing prices 809 00:48:02,480 --> 00:48:05,279 Speaker 1: or intraday prices for all the bonds. Already, you 810 00:48:05,320 --> 00:48:09,239 Speaker 1: have a very big problem in corporate bonds because there 811 00:48:09,320 --> 00:48:15,440 Speaker 1: is no live price feed that's showing 812 00:48:15,480 --> 00:48:18,560 Speaker 1: you a bid-offer quote in the way that there 813 00:48:18,640 --> 00:48:21,920 Speaker 1: is in equities. And so before you can even get started 814 00:48:21,960 --> 00:48:24,680 Speaker 1: thinking about predicting how a price is going to change, 815 00:48:24,680 --> 00:48:26,080 Speaker 1: it would be nice if you knew what the price 816 00:48:26,160 --> 00:48:28,759 Speaker 1: currently was, and that is already a problem you have 817 00:48:28,800 --> 00:48:30,719 Speaker 1: to solve in corporate bonds, as opposed to being just 818 00:48:30,760 --> 00:48:32,000 Speaker 1: an input that you have access to. 819 00:48:32,320 --> 00:48:35,279 Speaker 2: The old joke was trading by appointment only. Yeah, and 820 00:48:35,320 --> 00:48:37,520 Speaker 2: that seems to be a bit of an issue. And 821 00:48:37,560 --> 00:48:42,080 Speaker 2: there are so many more bond issues than there are equities. Absolutely, 822 00:48:42,239 --> 00:48:45,480 Speaker 2: is this just a database challenge, or how do you work it? 823 00:48:45,520 --> 00:48:49,280 Speaker 1: No, it's a statistics problem, but it's a different 824 00:48:49,360 --> 00:48:52,319 Speaker 1: kind of statistics problem. We're not, in this case, we're 825 00:48:52,360 --> 00:48:54,920 Speaker 1: not yet trying to predict 826 00:48:55,080 --> 00:48:58,520 Speaker 1: the future of any quantity. We're trying to say, I 827 00:48:58,560 --> 00:49:00,839 Speaker 1: wish I knew what the fair value of 828 00:49:00,880 --> 00:49:05,319 Speaker 1: this CUSIP was. I can't see that exactly because there's 829 00:49:05,320 --> 00:49:07,239 Speaker 1: no live order book with a bid and an 830 00:49:07,280 --> 00:49:09,680 Speaker 1: offer that's got lots of liquidity that lets me figure 831 00:49:09,680 --> 00:49:11,320 Speaker 1: out the fair value. But I do know what. 832 00:49:11,520 --> 00:49:14,120 Speaker 2: At best, you have a recent price, maybe not even 833 00:49:14,160 --> 00:49:14,680 Speaker 2: so recent. 834 00:49:14,840 --> 00:49:17,839 Speaker 1: I have lots of related information. I know, you know, 835 00:49:18,480 --> 00:49:21,279 Speaker 1: this bond, maybe this bond didn't trade today, but it 836 00:49:21,320 --> 00:49:23,279 Speaker 1: traded a few times yesterday. I get to say I 837 00:49:23,400 --> 00:49:26,440 Speaker 1: know where it traded.
I'm in touch with bond dealers, 838 00:49:26,440 --> 00:49:28,959 Speaker 1: so I know where they've quoted this bond, maybe only 839 00:49:28,960 --> 00:49:31,440 Speaker 1: on one side, over the last few days. I have 840 00:49:31,560 --> 00:49:34,800 Speaker 1: some information about the company that issued this bond, et cetera. 841 00:49:35,280 --> 00:49:37,719 Speaker 1: So I have lots of stuff that's related to the 842 00:49:37,800 --> 00:49:39,320 Speaker 1: number that I want to know. I just 843 00:49:39,360 --> 00:49:42,319 Speaker 1: don't know that number, right. And so what I want 844 00:49:42,360 --> 00:49:44,440 Speaker 1: to try to do is kind of fill in and 845 00:49:44,600 --> 00:49:46,919 Speaker 1: do what, in statistics or in control, we would 846 00:49:46,920 --> 00:49:50,440 Speaker 1: call a nowcasting problem. Huh. And the analogy, 847 00:49:50,480 --> 00:49:55,600 Speaker 1: actually, is to automatically controlling an airplane. Oh, surprisingly. 848 00:49:55,840 --> 00:49:57,719 Speaker 1: Yeah, the main, there are, if 849 00:49:57,719 --> 00:49:59,800 Speaker 1: a software is trying to fly an 850 00:49:59,800 --> 00:50:03,080 Speaker 1: airplane, there are six things that it absolutely has 851 00:50:03,120 --> 00:50:05,160 Speaker 1: to know. It has to know the x y z of 852 00:50:05,160 --> 00:50:07,400 Speaker 1: where the plane is and the x y z of 853 00:50:07,440 --> 00:50:10,200 Speaker 1: its velocity, where it's headed. Right, those are the six 854 00:50:10,280 --> 00:50:14,120 Speaker 1: most important numbers. Now, nature does not just supply those 855 00:50:14,200 --> 00:50:17,440 Speaker 1: numbers to you. You cannot know those numbers with perfect exactitude. 856 00:50:17,600 --> 00:50:20,200 Speaker 1: But there's lots of instruments on the plane, and there's 857 00:50:20,239 --> 00:50:23,800 Speaker 1: GPS and all sorts of information that is very closely 858 00:50:23,840 --> 00:50:26,040 Speaker 1: related to the numbers you wish you knew, and you 859 00:50:26,080 --> 00:50:28,919 Speaker 1: can use statistics to go from all that stuff that's 860 00:50:29,040 --> 00:50:32,719 Speaker 1: adjacent to a guess, an infill, of the thing you 861 00:50:32,800 --> 00:50:35,120 Speaker 1: wish you knew. And the same goes with the current 862 00:50:35,160 --> 00:50:36,560 Speaker 1: price of a corporate bond. 863 00:50:37,160 --> 00:50:41,520 Speaker 2: Huh. That's really kind of interesting. So I'm curious as 864 00:50:41,600 --> 00:50:45,399 Speaker 2: to how often you start working your way into one 865 00:50:45,480 --> 00:50:50,640 Speaker 2: particular asset or a particular strategy for that asset and 866 00:50:50,760 --> 00:50:54,040 Speaker 2: just suddenly realize, oh, this is wildly different than we 867 00:50:54,160 --> 00:50:57,960 Speaker 2: previously expected, and suddenly you're down a rabbit hole to 868 00:50:58,160 --> 00:51:02,200 Speaker 2: just wildly unexpected areas. It sounds like that isn't at all 869 00:51:02,239 --> 00:51:02,920 Speaker 2: that uncommon. 870 00:51:02,960 --> 00:51:03,960 Speaker 1: It is not uncommon at all. 871 00:51:04,160 --> 00:51:04,399 Speaker 2: Huh.
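The airplane analogy above is essentially a state-estimation, or filtering, problem, and the corporate bond version can be sketched the same way: treat the unobserved fair value as a latent state and fold in whatever related observations arrive, an occasional trade print, a one-sided dealer quote. The scalar Kalman filter below uses invented numbers and assumed noise levels; it is only meant to show the "infill the number you wish you knew" idea, not any firm's actual model.

```python
# Minimal "nowcasting" sketch: a scalar Kalman filter estimating a latent fair
# value from noisy, intermittently observed prices. Synthetic data only.
import numpy as np

rng = np.random.default_rng(2)
T = 50
fair = np.cumsum(rng.normal(0, 0.10, size=T)) + 100.0   # latent fair value (random walk)
obs = fair + rng.normal(0, 0.30, size=T)                # noisy prints / dealer quotes
obs[rng.random(T) < 0.6] = np.nan                       # most days, nothing trades

q, r = 0.10**2, 0.30**2          # assumed process and observation variances
est, var = 100.0, 1.0            # initial guess and its uncertainty
nowcast = []
for y in obs:
    var += q                     # predict: uncertainty grows while nothing trades
    if not np.isnan(y):          # update: fold in a print or quote when one arrives
        k = var / (var + r)      # Kalman gain
        est += k * (y - est)
        var *= (1 - k)
    nowcast.append(est)

print("latest nowcast of fair value:", round(nowcast[-1], 2))
```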
872 00:51:04,400 --> 00:51:07,480 Speaker 1: No, there's this kind of 873 00:51:07,560 --> 00:51:09,440 Speaker 1: wishful thinking that, you know, we've figured 874 00:51:09,440 --> 00:51:11,840 Speaker 1: it out in one asset class, in the sense that 875 00:51:11,840 --> 00:51:14,200 Speaker 1: we have a system that's kind of stable and performing 876 00:51:14,200 --> 00:51:16,600 Speaker 1: reasonably well that we have a feel for, 877 00:51:17,160 --> 00:51:20,280 Speaker 1: and now we want to take that system and somehow 878 00:51:20,320 --> 00:51:23,920 Speaker 1: replicate it in a different situation. And we're going 879 00:51:23,960 --> 00:51:26,799 Speaker 1: to standardize the new situation to make it look like 880 00:51:26,800 --> 00:51:29,080 Speaker 1: the old situation. That's the principle. That principle kind of 881 00:51:29,120 --> 00:51:31,480 Speaker 1: quickly goes out the window when you start 882 00:51:31,520 --> 00:51:33,400 Speaker 1: to make contact with the reality of how the new 883 00:51:33,440 --> 00:51:34,800 Speaker 1: asset class actually behaves. 884 00:51:34,880 --> 00:51:37,360 Speaker 2: So stocks are different than credit, are different than bonds, 885 00:51:37,440 --> 00:51:40,360 Speaker 2: are different than commodities. They're all like starting fresh. Yeah. 886 00:51:40,400 --> 00:51:42,919 Speaker 2: What are some of the more surprising things you've learned 887 00:51:42,960 --> 00:51:46,880 Speaker 2: as you've applied machine learning to totally different asset classes? 888 00:51:47,040 --> 00:51:49,719 Speaker 1: Well, I think, you know, corporate bonds provide a lot 889 00:51:49,719 --> 00:51:52,480 Speaker 1: of examples of this. I mean, the fact that you 890 00:51:52,520 --> 00:51:57,279 Speaker 1: don't actually really know a good live price or a 891 00:51:57,320 --> 00:52:00,480 Speaker 1: good live bid-offer, it seems, you know, surprising. 892 00:52:00,520 --> 00:52:03,520 Speaker 1: I mean, this fact has started to change. 893 00:52:03,560 --> 00:52:07,279 Speaker 1: Like over the years, there's been an accelerating electronification of 894 00:52:07,320 --> 00:52:09,880 Speaker 1: corporate bond trading, and that's, you know, that's been 895 00:52:09,880 --> 00:52:11,839 Speaker 1: a big advantage for us actually, because we were kind 896 00:52:11,880 --> 00:52:14,000 Speaker 1: of first movers, and so we've really benefited from that. 897 00:52:14,440 --> 00:52:17,360 Speaker 1: So the problem is diminished relative to how it was, 898 00:52:17,719 --> 00:52:20,160 Speaker 1: you know, six, seven years ago when we started, but 899 00:52:20,200 --> 00:52:23,440 Speaker 1: it's still, relative to equities, absolutely there. 900 00:52:23,520 --> 00:52:25,839 Speaker 2: Yeah. So, in other words, if 901 00:52:25,840 --> 00:52:28,279 Speaker 2: I'm looking at a bond mutual fund or even a 902 00:52:28,280 --> 00:52:33,400 Speaker 2: bond ETF that's trading during the day, that price is somebody's 903 00:52:33,440 --> 00:52:37,560 Speaker 2: best approximation of the value of all the bonds inside. 904 00:52:37,880 --> 00:52:41,439 Speaker 2: But really you don't know the NAV, do you? 905 00:52:41,440 --> 00:52:43,600 Speaker 1: You're just kind of guessing. Barry, don't even get me 906 00:52:43,640 --> 00:52:46,160 Speaker 1: started on bond ETFs, because. 907 00:52:45,960 --> 00:52:48,160 Speaker 2: It seems like that would be the first place 908 00:52:48,200 --> 00:52:52,120 Speaker 2: that would show up.
Hey, bond ETFs sound like throughout the 909 00:52:52,200 --> 00:52:56,759 Speaker 2: day they're gonna be mispriced a little bit or wildly mispriced. 910 00:52:57,080 --> 00:53:00,319 Speaker 1: Well, with the bond ETF, there's a sense, if you're 911 00:53:00,520 --> 00:53:02,360 Speaker 1: a market purist, in which they can't be 912 00:53:02,440 --> 00:53:05,280 Speaker 1: mispriced, because their price is set by supply and demand 913 00:53:05,560 --> 00:53:08,560 Speaker 1: in the ETF market, and that's a super liquid market. 914 00:53:08,840 --> 00:53:11,720 Speaker 1: And so there may be a difference between the market 915 00:53:11,719 --> 00:53:14,120 Speaker 1: price of the ETF and the NAV of 916 00:53:14,120 --> 00:53:18,520 Speaker 1: the underlying portfolio, except in many cases with bond ETFs 917 00:53:18,560 --> 00:53:23,120 Speaker 1: there's not even a crisply defined underlying portfolio. It turns 918 00:53:23,120 --> 00:53:26,520 Speaker 1: out that the authorized participants in those ETF markets can 919 00:53:27,120 --> 00:53:32,279 Speaker 1: negotiate with the fund manager about exactly what the constituents 920 00:53:32,320 --> 00:53:35,160 Speaker 1: are of the create and redeem baskets, and so it's not 921 00:53:35,200 --> 00:53:37,920 Speaker 1: even at all clear what you mean when you say 922 00:53:38,080 --> 00:53:40,239 Speaker 1: that the NAV is this or that relative to the 923 00:53:40,320 --> 00:53:41,200 Speaker 1: price of the ETF. 924 00:53:41,520 --> 00:53:44,040 Speaker 2: So when I asked about what's surprising when you work 925 00:53:44,080 --> 00:53:46,000 Speaker 2: your way down a rabbit hole: hey, we don't know 926 00:53:46,000 --> 00:53:48,120 Speaker 2: what the hell's in this bond ETF, trust us, it's 927 00:53:48,160 --> 00:53:51,640 Speaker 2: all good. That's a pretty big surprise. And I'm only exaggerating 928 00:53:51,640 --> 00:53:54,520 Speaker 2: a little bit. But that seems like that's kind of shocking. 929 00:53:55,160 --> 00:53:57,920 Speaker 1: It is surprising when you find out about it, 930 00:53:57,960 --> 00:54:00,919 Speaker 1: but you quickly come to understand. If you trade single 931 00:54:00,960 --> 00:54:03,160 Speaker 1: name bonds, as we do, you quickly come to understand 932 00:54:03,520 --> 00:54:05,719 Speaker 1: why bond ETFs work that way. 933 00:54:06,560 --> 00:54:08,680 Speaker 2: I recall a couple of years ago there was a 934 00:54:08,719 --> 00:54:12,480 Speaker 2: big Wall Street Journal article on the GLD 935 00:54:13,280 --> 00:54:17,279 Speaker 2: ETF, and from that article I learned that 936 00:54:18,040 --> 00:54:22,280 Speaker 2: GLD was formed because gold dealers had just excess gold 937 00:54:22,400 --> 00:54:25,120 Speaker 2: piling up in their warehouses and they needed a way 938 00:54:25,560 --> 00:54:27,920 Speaker 2: to move it. So that was kind of shocking about 939 00:54:27,960 --> 00:54:32,279 Speaker 2: that ETF. Any other space that led to a 940 00:54:33,360 --> 00:54:35,719 Speaker 2: sort of big surprise as you worked your way into it? 941 00:54:37,160 --> 00:54:41,200 Speaker 1: Well, I think ETFs are a kind of a good 942 00:54:41,239 --> 00:54:45,239 Speaker 1: source of these examples. So the volatility ETFs, the, you know, 943 00:54:45,280 --> 00:54:47,560 Speaker 1: the ETFs that are based on the VIX 944 00:54:47,640 --> 00:54:50,360 Speaker 1: or that are short the VIX. You may remember, several years ago.
945 00:54:50,160 --> 00:54:52,120 Speaker 2: I was gonna say, the ones that haven't blown up. 946 00:54:52,200 --> 00:54:55,600 Speaker 1: Yeah, right, there was this event called Volmageddon where. 947 00:54:56,239 --> 00:54:58,640 Speaker 2: Those were exchange-traded notes, weren't they? Yeah. 948 00:54:59,680 --> 00:55:03,160 Speaker 1: Right, there were these, essentially these investment products that were 949 00:55:03,280 --> 00:55:07,040 Speaker 1: short VIX, and VIX went through a spike that caused 950 00:55:07,040 --> 00:55:09,160 Speaker 1: them to have to liquidate, which was, part, I mean, 951 00:55:09,239 --> 00:55:13,200 Speaker 1: the people who designed the exchange-traded note, they understood 952 00:55:13,239 --> 00:55:15,040 Speaker 1: that this was a possibility, so they had a sort 953 00:55:15,040 --> 00:55:18,600 Speaker 1: of, uh, descriptions in their contract for what 954 00:55:18,880 --> 00:55:23,680 Speaker 1: it would mean. But yeah, always surprising to watch something 955 00:55:24,040 --> 00:55:25,360 Speaker 1: suddenly go out of business. 956 00:55:25,600 --> 00:55:28,120 Speaker 2: We seem to get a thousand year flood every couple 957 00:55:28,120 --> 00:55:30,760 Speaker 2: of years. Maybe we shouldn't be calling these things thousand 958 00:55:30,800 --> 00:55:33,880 Speaker 2: year floods. That's right, that's a big misnomer. 959 00:55:34,360 --> 00:55:36,879 Speaker 1: As statisticians, we tell people, you know, if 960 00:55:36,920 --> 00:55:39,960 Speaker 1: you think that you've experienced a six sigma event, the 961 00:55:40,000 --> 00:55:43,120 Speaker 1: problem is that you have underestimated sigma. 962 00:55:43,239 --> 00:55:46,759 Speaker 2: That's really interesting. So, given the gap in 963 00:55:46,840 --> 00:55:53,000 Speaker 2: the world between computer science and investment management, how 964 00:55:53,080 --> 00:55:56,279 Speaker 2: long is it going to be before that narrows and 965 00:55:56,320 --> 00:55:58,560 Speaker 2: we start seeing a whole lot more of the sort 966 00:55:58,560 --> 00:56:02,239 Speaker 2: of work you're doing applied across the board to 967 00:56:02,320 --> 00:56:03,440 Speaker 2: the world of investment? 968 00:56:04,520 --> 00:56:08,160 Speaker 1: Well, I think it's happening. It's been happening for 969 00:56:08,239 --> 00:56:11,000 Speaker 1: quite a long time. I mean, for example, all of 970 00:56:11,440 --> 00:56:15,279 Speaker 1: modern portfolio theory really kind of began in the 971 00:56:15,320 --> 00:56:18,520 Speaker 1: fifties with, you know, first of all, Markowitz and other 972 00:56:18,560 --> 00:56:21,600 Speaker 1: people thinking about, you know, what it means to benefit 973 00:56:21,640 --> 00:56:24,960 Speaker 1: from diversification, and the idea that, you know, diversification is 974 00:56:24,960 --> 00:56:28,040 Speaker 1: the only free lunch in finance. So I 975 00:56:28,040 --> 00:56:32,880 Speaker 1: would say that, you know, the idea of thinking in 976 00:56:32,920 --> 00:56:38,120 Speaker 1: a systematic and scientific way about how to 977 00:56:38,120 --> 00:56:41,279 Speaker 1: manage and grow wealth, not, you know, not even 978 00:56:41,440 --> 00:56:45,359 Speaker 1: just for institutions, but also for individuals, is an 979 00:56:45,360 --> 00:56:48,920 Speaker 1: example of a way that these ideas have kind of 980 00:56:49,840 --> 00:56:51,880 Speaker 1: had profound effects.
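The "diversification is the only free lunch" idea mentioned here can be checked with a few lines of arithmetic: the volatility of an equal-weight portfolio falls as names are added, but only down to a floor set by the average correlation. The volatilities and correlations below are made up purely for illustration.

```python
# Diversification as a "free lunch": portfolio volatility of an equal-weight
# mix of identical-volatility assets, for a few asset counts and correlations.
import numpy as np

vol = 0.20                                   # each asset: 20% volatility (assumed)
for n, rho in [(1, 0.0), (10, 0.0), (10, 0.3), (50, 0.3)]:
    cov = np.full((n, n), rho * vol**2)      # pairwise covariances
    np.fill_diagonal(cov, vol**2)
    w = np.full(n, 1.0 / n)                  # equal-weight portfolio
    port_vol = np.sqrt(w @ cov @ w)
    print(f"{n:3d} assets, correlation {rho}: portfolio vol = {port_vol:.1%}")
# With zero correlation, volatility falls like 1/sqrt(n); with correlation 0.3
# it floors near sqrt(0.3) * 20% (about 11%), so correlation, not asset count,
# is what ultimately binds.
```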
981 00:56:52,120 --> 00:56:55,200 Speaker 2: I know, I only have you for a little while longer, 982 00:56:55,520 --> 00:56:58,319 Speaker 2: So let's jump to our favorite questions that we ask 983 00:56:59,040 --> 00:57:01,640 Speaker 2: all of our guests, starting with tell us what you're 984 00:57:01,640 --> 00:57:04,400 Speaker 2: streaming these days? What are you either listening to or 985 00:57:04,480 --> 00:57:06,680 Speaker 2: watching to keep yourself entertained. 986 00:57:08,160 --> 00:57:11,480 Speaker 1: I A few things I've been watching recently. The Bear, 987 00:57:11,520 --> 00:57:13,960 Speaker 1: I don't know if you've heard So Great, So great, right, 988 00:57:14,239 --> 00:57:16,200 Speaker 1: and I'm in Chicago, as I know, we were just 989 00:57:16,360 --> 00:57:19,680 Speaker 1: from Yeah, so. 990 00:57:18,920 --> 00:57:21,200 Speaker 2: So and and there are parts of that show that 991 00:57:21,280 --> 00:57:23,760 Speaker 2: are kind of a love letter to absolutely as you 992 00:57:23,800 --> 00:57:26,640 Speaker 2: get deeper into the series, because it starts out kind 993 00:57:26,640 --> 00:57:29,600 Speaker 2: of gritty and you're seeing the underside, and then as 994 00:57:29,640 --> 00:57:33,600 Speaker 2: we progress, it really becomes like a lovely postcard. Such 995 00:57:33,640 --> 00:57:34,400 Speaker 2: an amazing show. 996 00:57:34,480 --> 00:57:37,760 Speaker 1: So really really love that show. Was I was late 997 00:57:37,800 --> 00:57:41,040 Speaker 1: to better call Saul that I'm finishing up. I think 998 00:57:41,080 --> 00:57:45,600 Speaker 1: as good as as Breaking Bad, So I maybe when 999 00:57:45,640 --> 00:57:48,240 Speaker 1: you haven't heard of there's a show called Mister in Between. 1000 00:57:48,160 --> 00:57:50,320 Speaker 2: Which is mister Yeah. 1001 00:57:50,320 --> 00:57:53,040 Speaker 1: It's not Hulu, it's from it's from Australia. It's about 1002 00:57:53,040 --> 00:57:58,360 Speaker 1: a guy who's, you know, a doting father living his life. 1003 00:57:58,360 --> 00:58:03,120 Speaker 1: He's also essentially a muscle man and hitman for for 1004 00:58:04,040 --> 00:58:07,479 Speaker 1: local criminals in his part of Australia. But it's half 1005 00:58:07,480 --> 00:58:09,200 Speaker 1: hour dark comedy. 1006 00:58:09,160 --> 00:58:12,440 Speaker 2: Right, so not quite Barry and not quite Sopranos somewhere. 1007 00:58:12,960 --> 00:58:14,080 Speaker 1: Yeah, that's exactly. 1008 00:58:14,240 --> 00:58:19,360 Speaker 2: Yeah, sounds really interesting. Tell us about your early mentors 1009 00:58:19,360 --> 00:58:21,160 Speaker 2: who helped shape your career. 1010 00:58:21,880 --> 00:58:24,440 Speaker 1: Well, Berry, I'd been lucky to have a lot of 1011 00:58:24,840 --> 00:58:28,880 Speaker 1: people who were you know, both really smart and talented 1012 00:58:28,920 --> 00:58:32,000 Speaker 1: and willing to you know, take the time to help 1013 00:58:32,040 --> 00:58:35,840 Speaker 1: me learn and understand things. So actually, my co founder, 1014 00:58:36,000 --> 00:58:40,240 Speaker 1: Michael Caratanov, he was kind of my first mentor in finance. 1015 00:58:40,320 --> 00:58:42,680 Speaker 1: He he had been a d SHAW for several years 1016 00:58:43,400 --> 00:58:46,120 Speaker 1: when I got there, and he he really taught me 1017 00:58:46,560 --> 00:58:49,280 Speaker 1: kind of the ins and outs of of market micro structure. 
1018 00:58:50,440 --> 00:58:53,120 Speaker 1: I worked with a couple of people who managed me 1019 00:58:53,280 --> 00:58:56,480 Speaker 1: at D. E. Shaw, Yossi Friedman and Kapil Mathur, who 1020 00:58:56,480 --> 00:59:00,480 Speaker 1: have gone on to hugely successful careers in quantitative finance, 1021 00:59:00,560 --> 00:59:03,360 Speaker 1: and they taught me a lot too. When I did 1022 00:59:03,360 --> 00:59:06,800 Speaker 1: my PhD, my advisor Mike Jordan, who's a kind of 1023 00:59:06,840 --> 00:59:12,320 Speaker 1: world famous machine learning researcher, you know, I learned enormously 1024 00:59:12,320 --> 00:59:18,000 Speaker 1: from him. And there's another professor of statistics who sadly 1025 00:59:18,000 --> 00:59:21,920 Speaker 1: passed away about fifteen years ago named David Freedman. He 1026 00:59:22,040 --> 00:59:26,120 Speaker 1: was really just an intellectual giant of the twentieth century 1027 00:59:26,120 --> 00:59:30,040 Speaker 1: in probability and statistics. He was both, you know, one 1028 00:59:30,080 --> 00:59:35,120 Speaker 1: of the most brilliant probabilists and also an applied statistician. 1029 00:59:35,160 --> 00:59:38,200 Speaker 1: And this is like a pink diamond kind 1030 00:59:38,240 --> 00:59:42,240 Speaker 1: of combination. It's that rare to find someone who has 1031 00:59:42,320 --> 00:59:46,320 Speaker 1: that kind of technical capability but also understands the pragmatics 1032 00:59:46,320 --> 00:59:48,560 Speaker 1: of actually doing data analysis. He spent a lot of 1033 00:59:48,600 --> 00:59:53,360 Speaker 1: time as an expert witness. He was the lead statistical 1034 00:59:53,360 --> 00:59:56,440 Speaker 1: consultant for the case on census adjustment that went to 1035 00:59:56,520 --> 01:00:02,000 Speaker 1: the Supreme Court. In fact, he told me what 1036 01:00:02,360 --> 01:00:05,280 Speaker 1: went on. In the end, you know, the 1037 01:00:05,320 --> 01:00:08,760 Speaker 1: people against adjustment, they won in a unanimous Supreme Court decision. 1038 01:00:08,800 --> 01:00:11,120 Speaker 1: And David Freedman told me, he said, you know, all 1039 01:00:11,160 --> 01:00:13,240 Speaker 1: that work and we only convinced nine people. 1040 01:00:15,440 --> 01:00:17,840 Speaker 2: But the nine people that kind of matter. Yeah, exactly. 1041 01:00:18,160 --> 01:00:21,280 Speaker 1: So it was just, it was 1042 01:00:21,360 --> 01:00:24,480 Speaker 1: kind of a once in a lifetime privilege to get 1043 01:00:24,520 --> 01:00:28,520 Speaker 1: to spend time with someone of that intellectual caliber. And 1044 01:00:28,600 --> 01:00:30,520 Speaker 1: there were others too. I mean, I've been 1045 01:00:30,600 --> 01:00:31,520 Speaker 1: very fortunate that way. 1046 01:00:31,600 --> 01:00:35,360 Speaker 2: That's quite a list to begin with. Let's talk about books. 1047 01:00:35,360 --> 01:00:36,920 Speaker 2: What are some of your favorites and what are you 1048 01:00:36,960 --> 01:00:37,760 Speaker 2: reading right now? 1049 01:00:38,880 --> 01:00:40,880 Speaker 1: Uh, well, I'm a big book reader, so 1050 01:00:41,080 --> 01:00:42,240 Speaker 1: I have a long list. 1051 01:00:42,480 --> 01:00:45,800 Speaker 2: But probably, by the way, this is everybody's favorite section 1052 01:00:46,400 --> 01:00:50,200 Speaker 2: of the podcast.
People are always looking for good book recommendations, 1053 01:00:50,320 --> 01:00:54,520 Speaker 2: and if they like what you said earlier, they're gonna 1054 01:00:54,520 --> 01:00:56,720 Speaker 2: love your book recommendations, so fire away. 1055 01:00:57,120 --> 01:01:02,800 Speaker 1: So I'm a big fan of kind of modernist dystopian fiction. 1056 01:01:03,280 --> 01:01:05,800 Speaker 1: So a couple of examples of that would be the 1057 01:01:05,800 --> 01:01:10,600 Speaker 1: book Infinite Jest by David Foster Wallace, The Wind-Up Bird 1058 01:01:10,680 --> 01:01:14,000 Speaker 1: Chronicle by Haruki Murakami. Those are two of my all 1059 01:01:14,040 --> 01:01:17,919 Speaker 1: time favorite books. There's a, I think, much less well 1060 01:01:17,960 --> 01:01:22,840 Speaker 1: known but beautiful novel. It's a kind of academic coming 1061 01:01:22,880 --> 01:01:28,440 Speaker 1: of age novel called Stoner by John Williams. A really moving, 1062 01:01:28,560 --> 01:01:33,000 Speaker 1: just a tremendous book. Sort of more dystopia would be 1063 01:01:33,360 --> 01:01:38,240 Speaker 1: White Noise by DeLillo, and kind of the classics that 1064 01:01:38,240 --> 01:01:41,040 Speaker 1: everybody knows, Nineteen Eighty-Four and Brave New World. Those 1065 01:01:41,040 --> 01:01:42,920 Speaker 1: are two more of my favorites. 1066 01:01:42,600 --> 01:01:46,800 Speaker 2: Huh, it's funny, when you mentioned The Bear, I'm in 1067 01:01:46,840 --> 01:01:49,920 Speaker 2: the middle of reading a book that I would swear 1068 01:01:50,000 --> 01:01:54,880 Speaker 2: the writers of The Bear leaned on, called Unreasonable Hospitality, 1069 01:01:55,520 --> 01:01:59,640 Speaker 2: by somebody who worked for Danny Meyer's hospitality group, 1070 01:02:00,080 --> 01:02:03,520 Speaker 2: Eleven Madison Park and Gramercy Tavern and all these famous 1071 01:02:03,840 --> 01:02:08,040 Speaker 2: New York haunts. And the scene in The Bear where 1072 01:02:08,440 --> 01:02:12,240 Speaker 2: they overhear a couple say, oh, we visited Chicago and 1073 01:02:12,240 --> 01:02:14,640 Speaker 2: never had deep dish, so they send the guy 1074 01:02:14,720 --> 01:02:18,160 Speaker 2: out to get deep dish. There's a part of the book 1075 01:02:18,720 --> 01:02:23,720 Speaker 2: where at Eleven Madison Park these people actually showed up 1076 01:02:23,760 --> 01:02:25,760 Speaker 2: with suitcases. It was the last thing they would eat 1077 01:02:25,800 --> 01:02:28,360 Speaker 2: before heading to the airport. And they said, oh, 1078 01:02:28,360 --> 01:02:30,360 Speaker 2: we ate at all these great places in New York, but 1079 01:02:30,400 --> 01:02:32,640 Speaker 2: we never had a New York hot dog. And what 1080 01:02:32,640 --> 01:02:34,320 Speaker 2: do they do? They send someone 1081 01:02:34,360 --> 01:02:36,600 Speaker 2: out to get a hot dog. They plated it and 1082 01:02:37,280 --> 01:02:39,920 Speaker 2: used all the condiments to make it very special, and 1083 01:02:39,960 --> 01:02:42,840 Speaker 2: it looks like it was ripped right out of The Bear, 1084 01:02:43,000 --> 01:02:46,960 Speaker 2: or vice versa. But if you're interested in just, hey, 1085 01:02:47,000 --> 01:02:51,840 Speaker 2: how can we disrupt the restaurant business and make it 1086 01:02:51,920 --> 01:02:54,000 Speaker 2: not just about the celebrity chef in the kitchen but 1087 01:02:54,400 --> 01:02:58,160 Speaker 2: the whole experience, it's a fascinating kind of nonfiction book.
1088 01:02:58,240 --> 01:02:59,240 Speaker 1: That does sound really interesting. 1089 01:02:59,400 --> 01:03:02,080 Speaker 2: Yeah, really, you mentioned the Bear and it just popped 1090 01:03:02,120 --> 01:03:04,160 Speaker 2: into my head. Any of the books you want to 1091 01:03:04,160 --> 01:03:06,080 Speaker 2: mention that's that's a good list to start with. 1092 01:03:06,440 --> 01:03:10,240 Speaker 1: Yeah. My other kind of big interest is science fiction, 1093 01:03:10,480 --> 01:03:16,920 Speaker 1: speculative fiction. Unsurprisingly right, Sorry, sorry, but so there are 1094 01:03:16,960 --> 01:03:19,640 Speaker 1: some classics that I think everybody should read. Ursula LeGuin 1095 01:03:19,960 --> 01:03:24,040 Speaker 1: loves just amazing. So The Dispossessed and The Left Hand 1096 01:03:24,040 --> 01:03:25,680 Speaker 1: of Darkness, those are just two of the best books 1097 01:03:25,680 --> 01:03:26,520 Speaker 1: I've ever read period. 1098 01:03:26,560 --> 01:03:30,040 Speaker 2: Forget Left Handed Darkness stays with you for a long time. 1099 01:03:30,120 --> 01:03:35,200 Speaker 1: Yeah right, yeah, really really amazing books. I'm rereading right now, 1100 01:03:35,360 --> 01:03:41,960 Speaker 1: Cryptonomicon Neil Stevenson. And one other thing I try to 1101 01:03:42,000 --> 01:03:45,520 Speaker 1: do is I have very big gaps in my reading. 1102 01:03:45,560 --> 01:03:48,280 Speaker 1: For example, I've never read Updyke, so I started reading 1103 01:03:48,320 --> 01:03:49,000 Speaker 1: The Rabbit. 1104 01:03:49,000 --> 01:03:52,280 Speaker 2: Serious World of Corn. It's a garb and they're they're 1105 01:03:52,440 --> 01:03:53,800 Speaker 2: very much of an era. 1106 01:03:54,000 --> 01:03:54,880 Speaker 1: Yeah, that's right. 1107 01:03:55,880 --> 01:03:57,720 Speaker 2: What else give us more? Uh? 1108 01:03:58,080 --> 01:03:59,880 Speaker 1: Wow? Okay, let's see George so. 1109 01:04:01,600 --> 01:04:01,840 Speaker 2: He. 1110 01:04:02,320 --> 01:04:05,040 Speaker 1: Oh wow, I think I think you'd love him. So 1111 01:04:05,400 --> 01:04:09,280 Speaker 1: He's his real strength is short fiction. He had He's 1112 01:04:09,280 --> 01:04:12,960 Speaker 1: written great novels too, but tenth of December this is 1113 01:04:13,000 --> 01:04:16,320 Speaker 1: his best collection of of fiction. And that this is 1114 01:04:16,360 --> 01:04:23,280 Speaker 1: more kind of modern dystopian, kind of comic dystopian stuff. 1115 01:04:23,600 --> 01:04:27,560 Speaker 2: You keep coming back to dystopia, yeasinating. 1116 01:04:26,760 --> 01:04:30,680 Speaker 1: I find, you know, it's uh, it's very different from 1117 01:04:30,720 --> 01:04:32,919 Speaker 1: my my day to day reality. So I think it's 1118 01:04:32,960 --> 01:04:36,120 Speaker 1: a you know, it's a great change of pace for 1119 01:04:36,200 --> 01:04:41,320 Speaker 1: me to be able to read this stuff. So, uh, 1120 01:04:41,960 --> 01:04:45,360 Speaker 1: some some science writing, I can tell you. 
Probably the 1121 01:04:45,400 --> 01:04:48,360 Speaker 1: best science book I ever read is The Selfish Gene 1122 01:04:48,800 --> 01:04:54,280 Speaker 1: by Richard Dawkins, which kind of really you know, you 1123 01:04:54,360 --> 01:04:57,480 Speaker 1: have a kind of intuitive understanding of genetics and natural 1124 01:04:57,480 --> 01:05:01,560 Speaker 1: selection in Darwin, but the language that Dawkins uses really 1125 01:05:01,560 --> 01:05:06,160 Speaker 1: makes you appreciate just how how much the genes are 1126 01:05:06,160 --> 01:05:08,840 Speaker 1: in charge and how little we as the as the 1127 01:05:09,520 --> 01:05:13,320 Speaker 1: you know he calls he calls organisms survival machines that 1128 01:05:13,400 --> 01:05:16,720 Speaker 1: the genes have kind of built and and exist inside 1129 01:05:16,760 --> 01:05:19,200 Speaker 1: in order to ensure their propagation. And his whole the 1130 01:05:19,200 --> 01:05:21,560 Speaker 1: whole point of view in that book just gives you, Uh, 1131 01:05:22,360 --> 01:05:25,280 Speaker 1: it's really eye opening, makes you think about natural selection 1132 01:05:25,360 --> 01:05:28,040 Speaker 1: and evolution and genetics in a completely different way, even 1133 01:05:28,040 --> 01:05:30,600 Speaker 1: though it's all based on the same kind of facts that. 1134 01:05:30,600 --> 01:05:32,400 Speaker 2: You know, it's just framing. 1135 01:05:32,480 --> 01:05:34,760 Speaker 1: It's the framing and the perspective that are really that 1136 01:05:34,840 --> 01:05:37,120 Speaker 1: really kind of blow your mind. So it's a great 1137 01:05:37,200 --> 01:05:38,200 Speaker 1: it's a great book to read. 1138 01:05:39,440 --> 01:05:41,439 Speaker 2: Huh, that's a hell of a list. You've given people 1139 01:05:41,480 --> 01:05:44,080 Speaker 2: a lot of things to start with, and now down 1140 01:05:44,080 --> 01:05:47,160 Speaker 2: to our last two questions, HM, what advice would you 1141 01:05:47,200 --> 01:05:50,560 Speaker 2: give to a recent college grad who is interested in 1142 01:05:50,600 --> 01:05:54,880 Speaker 2: a career in either investment management or machine learning. 1143 01:05:56,600 --> 01:05:59,560 Speaker 1: Yeah? So, I mean I work in a very specialized 1144 01:05:59,720 --> 01:06:01,600 Speaker 1: sub domain of finance, So there are a lot of 1145 01:06:01,600 --> 01:06:03,200 Speaker 1: people who are going to be interested in investment in 1146 01:06:03,240 --> 01:06:06,600 Speaker 1: finance that I that I couldn't give any specific advice to. 1147 01:06:06,880 --> 01:06:11,240 Speaker 1: I have kind of general advice that I think is 1148 01:06:11,520 --> 01:06:15,600 Speaker 1: useful both for finance and even more broadly. This advice 1149 01:06:15,680 --> 01:06:19,400 Speaker 1: is really kind of top of Maslow's pyramid advice. If 1150 01:06:19,520 --> 01:06:22,240 Speaker 1: you know, if you're trying to kind of write your 1151 01:06:22,280 --> 01:06:25,040 Speaker 1: novel and pay the rent while you get it done, 1152 01:06:25,040 --> 01:06:27,600 Speaker 1: this is I can't really help you with that. But 1153 01:06:29,120 --> 01:06:32,360 Speaker 1: you know, if what you care about is building this career, 1154 01:06:32,520 --> 01:06:34,520 Speaker 1: then I would say number one piece of advice is 1155 01:06:34,520 --> 01:06:37,080 Speaker 1: work with incredible people. 
Like far and away, much more 1156 01:06:37,080 --> 01:06:40,360 Speaker 1: important than what the particular field is the details of 1157 01:06:40,400 --> 01:06:42,880 Speaker 1: what you're working on, is the caliber of the people 1158 01:06:42,880 --> 01:06:45,640 Speaker 1: that you do it with, both in terms of your 1159 01:06:45,680 --> 01:06:51,120 Speaker 1: own satisfaction and how much you learn and and and 1160 01:06:51,600 --> 01:06:55,040 Speaker 1: all of that. I think you know you'll learn, you'll 1161 01:06:55,040 --> 01:06:59,360 Speaker 1: benefit hugely on a personal level from working with incredible 1162 01:06:59,400 --> 01:07:03,680 Speaker 1: people and if you don't work with people that are 1163 01:07:04,000 --> 01:07:06,760 Speaker 1: like that, then you're probably going to have a lot 1164 01:07:06,760 --> 01:07:09,080 Speaker 1: of professional unhappiness. So it's kind of either or. 1165 01:07:09,600 --> 01:07:14,760 Speaker 2: That's a really intriguing answer. So final question, what do 1166 01:07:14,800 --> 01:07:18,360 Speaker 2: you know about the world of investing, machine learning, large 1167 01:07:18,440 --> 01:07:22,400 Speaker 2: language models, just the application of technology to the field 1168 01:07:22,440 --> 01:07:25,439 Speaker 2: of investing that you wish you knew twenty five years 1169 01:07:25,520 --> 01:07:28,520 Speaker 2: or so ago when you were really first ramping up. 1170 01:07:29,840 --> 01:07:33,720 Speaker 1: I think one of the most important lessons that I learned, 1171 01:07:33,760 --> 01:07:35,640 Speaker 1: had to learn the hard way kind of going through 1172 01:07:35,840 --> 01:07:39,960 Speaker 1: and running these systems, was that it's kind of comes 1173 01:07:40,000 --> 01:07:43,440 Speaker 1: back to the point you made earlier about the primacy 1174 01:07:43,480 --> 01:07:47,680 Speaker 1: of prediction rules. And it may be true that the 1175 01:07:47,720 --> 01:07:51,360 Speaker 1: most important thing is the prediction quality, but there are 1176 01:07:51,480 --> 01:07:55,280 Speaker 1: lots of other very necessary, mandatory ingredients, and I would 1177 01:07:55,280 --> 01:07:57,680 Speaker 1: put kind of risk management at the top of that list. 1178 01:07:57,720 --> 01:08:02,560 Speaker 1: So I think it's easy to to maybe neglect risk 1179 01:08:02,600 --> 01:08:06,480 Speaker 1: management to a certain extent and focus all of your 1180 01:08:06,520 --> 01:08:10,520 Speaker 1: attention on predictive accuracy. But I think it really does 1181 01:08:10,560 --> 01:08:13,720 Speaker 1: turn out that if you don't have high quality risk 1182 01:08:13,760 --> 01:08:16,479 Speaker 1: management to go along with that predictive accuracy, you won't succeed. 1183 01:08:17,439 --> 01:08:20,640 Speaker 1: And I guess I wish I had appreciated that in 1184 01:08:21,080 --> 01:08:23,080 Speaker 1: a really deep way twenty five years ago. 1185 01:08:23,320 --> 01:08:27,400 Speaker 2: John, This has been really, absolutely fascinating. I don't even 1186 01:08:27,479 --> 01:08:30,000 Speaker 2: know where to begin other than saying thank you for 1187 01:08:30,040 --> 01:08:34,080 Speaker 2: being so generous with your time and your expertise. We 1188 01:08:34,280 --> 01:08:36,920 Speaker 2: have been speaking with John mccauloff. He is the co 1189 01:08:37,040 --> 01:08:41,360 Speaker 2: founder and chief investment officer at the five billion dollar 1190 01:08:41,479 --> 01:08:46,040 Speaker 2: hedge fund Volleyon Group. 
If you enjoy this conversation, well, 1191 01:08:46,320 --> 01:08:48,599 Speaker 2: be sure and check out any of the previous five 1192 01:08:48,720 --> 01:08:53,000 Speaker 2: hundred we've done over the past nine years. You can 1193 01:08:53,040 --> 01:08:57,200 Speaker 2: find those at iTunes, Spotify, YouTube, or wherever you find 1194 01:08:57,280 --> 01:09:01,040 Speaker 2: your favorite podcasts. Sign up for my daily reading list 1195 01:09:01,560 --> 01:09:06,200 Speaker 2: at ritholtz.com. Follow me on Twitter at Barry underscore 1196 01:09:06,280 --> 01:09:10,200 Speaker 2: Ritholtz until I get my hacked account at 1197 01:09:10,200 --> 01:09:16,240 Speaker 2: Ritholtz back. I say that because the 1198 01:09:16,320 --> 01:09:20,759 Speaker 2: process of dealing with the seventeen people left at what was once 1199 01:09:20,840 --> 01:09:27,000 Speaker 2: Twitter, now X, is unbelievably frustrating and annoying. Follow all 1200 01:09:27,080 --> 01:09:31,160 Speaker 2: of the fine family of podcasts on Twitter at podcast. 1201 01:09:31,720 --> 01:09:33,800 Speaker 2: I would be remiss if I did not thank the 1202 01:09:33,800 --> 01:09:37,639 Speaker 2: crack team that helps put these conversations together each week. 1203 01:09:38,320 --> 01:09:42,080 Speaker 2: Paris Wald is my producer. Atika Valbrun is my 1204 01:09:42,280 --> 01:09:47,519 Speaker 2: project manager. Sean Russo is my director of research. I'm 1205 01:09:47,600 --> 01:09:51,000 Speaker 2: Barry Ritholtz. You've been listening to Masters in Business 1206 01:09:51,560 --> 01:10:03,160 Speaker 2: on Bloomberg Radio.