WEBVTT - Jon McAuliffe on Innovation and Statistical Methods 0:00:02.040 --> 0:00:07.360 This is Masters in Business with Barry Ritholtz on Bloomberg Radio. 0:00:08.400 --> 0:00:11.800 This week on the podcast, strap yourself in, I have 0:00:12.039 --> 0:00:16.159 another extra special guest. Jon McAuliffe is co-founder and 0:00:16.239 --> 0:00:19.720 chief investment officer at the Voleon Group. They're a five 0:00:19.760 --> 0:00:23.360 billion dollar hedge fund and one of the earliest shops 0:00:24.079 --> 0:00:27.880 to ever use machine learning as it applies to trading 0:00:27.920 --> 0:00:33.239 and investment management decisions. It is a fully systematic approach 0:00:33.360 --> 0:00:39.599 to using computer horsepower and database and machine learning and 0:00:39.880 --> 0:00:43.960 their own predictive engine to make investments and trades, and 0:00:44.000 --> 0:00:48.400 it's managed to put together quite a track record. Previously, 0:00:48.479 --> 0:00:52.680 Jon was at D. E. Shaw, where he ran statistical arbitrage. 0:00:53.080 --> 0:00:56.040 He is one of the people who worked on the 0:00:56.080 --> 0:01:01.520 Amazon recommendation engine, and he is currently a professor of 0:01:01.560 --> 0:01:06.040 statistics at Berkeley. I don't even know where to begin, 0:01:06.200 --> 0:01:10.160 other than to say, if you're interested in AI or machine 0:01:10.240 --> 0:01:16.080 learning or quantitative strategies, this is just a masterclass in 0:01:16.120 --> 0:01:18.280 how it's done by one of the first people in 0:01:18.319 --> 0:01:22.880 the space to not only do this sort of machine 0:01:22.920 --> 0:01:25.800 learning and apply it to investing, but one of the best. 0:01:26.120 --> 0:01:29.400 I think this is a fascinating conversation and I believe 0:01:29.440 --> 0:01:31.840 you will find it to be so also. With no 0:01:32.000 --> 0:01:37.800 further ado, my discussion with Voleon Group's Jon McAuliffe.
Jon McAuliffe, 0:01:37.920 --> 0:01:39.479 welcome to Bloomberg. 0:01:40.120 --> 0:01:41.560 Thanks, Barry. I'm really happy to be here. 0:01:41.640 --> 0:01:45.160 So let's talk a little bit about your academic background first. 0:01:45.840 --> 0:01:49.640 You start out undergrad, computer science and applied mathematics at 0:01:49.720 --> 0:01:54.040 Harvard, before you go on to get a PhD from 0:01:54.080 --> 0:01:58.240 California Berkeley. What led to a career in data analysis? 0:01:58.280 --> 0:02:00.880 How early did you know that's what you wanted to do? 0:02:01.880 --> 0:02:06.200 Well, it was a winding path. Actually, I was very 0:02:06.240 --> 0:02:11.320 interested in international relations and foreign languages when I was 0:02:11.400 --> 0:02:13.880 finishing high school. In fact, I spent the last year 0:02:13.919 --> 0:02:17.040 of high school as an exchange student in Germany. And 0:02:17.120 --> 0:02:20.400 so when I got to college, I was expecting to 0:02:20.919 --> 0:02:23.880 major in government and go on to maybe work in 0:02:23.919 --> 0:02:25.600 the foreign service, something like that. 0:02:25.800 --> 0:02:30.200 Really? So this is a big shift from your original expectations. 0:02:30.280 --> 0:02:33.559 Yeah, it took about one semester for me to realize 0:02:33.840 --> 0:02:36.600 that none of the questions that were being asked in 0:02:36.639 --> 0:02:39.680 my classes had definitive and correct answers. 0:02:40.680 --> 0:02:41.720 Did that frustrate you? 0:02:41.880 --> 0:02:45.760 It did frustrate me, yeah. And so I stayed home 0:02:45.880 --> 0:02:47.959 over winter, I stayed, excuse me, I didn't go home. 0:02:48.000 --> 0:02:51.120 I stayed at college over winter break to try to 0:02:51.160 --> 0:02:52.400 sort out what the heck I was going to do, 0:02:52.440 --> 0:02:54.440 because I could see that it wasn't, my plan was 0:02:54.440 --> 0:02:58.000 in disarray.
And I'd always been interested in computers, had 0:02:58.000 --> 0:03:01.840 played around with computers, never done anything very serious, but 0:03:02.919 --> 0:03:05.560 I thought I might as well give it a shot, 0:03:05.639 --> 0:03:07.799 and so in the spring semester I took my first 0:03:07.840 --> 0:03:13.560 computer science course. And when you write software, everything has 0:03:13.720 --> 0:03:15.800 a right answer. It either does what you wanted it to 0:03:15.800 --> 0:03:16.200 do or 0:03:16.160 --> 0:03:20.360 it doesn't. Does not compile. Exactly. So that's really quite, 0:03:20.480 --> 0:03:26.919 quite fascinating. So what led you from Berkeley to D. E. 0:03:27.280 --> 0:03:30.040 Shaw? They're one of the first quant shops. How 0:03:30.080 --> 0:03:31.839 did you get there? What sort of research did... Yeah, 0:03:31.840 --> 0:03:34.920 I actually, I spent time at D. E. Shaw in between 0:03:34.960 --> 0:03:38.920 my undergrad and my PhD program. So it was after 0:03:39.000 --> 0:03:40.720 Harvard that I went to D. E. Shaw. 0:03:40.840 --> 0:03:45.280 Did that light an interest in using machine learning and 0:03:45.320 --> 0:03:48.840 computers applied to finance, or what was that experience like? 0:03:49.120 --> 0:03:54.040 Yeah, it made me really interested in and excited about 0:03:54.520 --> 0:03:59.720 using statistical thinking and data analysis to sort of understand 0:03:59.720 --> 0:04:02.840 the dynamics of securities prices. Machine learning did not play 0:04:02.880 --> 0:04:05.840 really a role at that time, I think, not at 0:04:05.920 --> 0:04:08.680 D. E. Shaw, but, you know, probably nowhere. It was too 0:04:08.720 --> 0:04:13.960 immature a field in the nineties. But I had already 0:04:14.000 --> 0:04:20.039 been curious and interested in using these kinds of statistical 0:04:20.080 --> 0:04:23.480 tools in trading and in investing when I was finishing 0:04:23.520 --> 0:04:25.880 college and then at D. E. Shaw.
You know, I had 0:04:25.920 --> 0:04:28.800 brilliant colleagues and we were working on hard problems. So 0:04:28.839 --> 0:04:30.200 I really, I really got a lot 0:04:30.040 --> 0:04:32.559 out of it. Still one of the top performing hedge funds, 0:04:32.600 --> 0:04:34.520 one of the earliest quant hedge funds. A great, a 0:04:34.560 --> 0:04:37.440 great place to absolutely cut your teeth at. So was 0:04:37.480 --> 0:04:40.080 it Harvard, D. E. Shaw, and then Berkeley? 0:04:40.160 --> 0:04:40.800 Yeah, that's right. 0:04:40.839 --> 0:04:43.800 And then from Berkeley, how did you end up at Amazon? 0:04:44.800 --> 0:04:47.400 I guess I should correct myself. There was a year 0:04:47.440 --> 0:04:50.320 at Amazon after D. E. Shaw, but before Berkeley. 0:04:50.240 --> 0:04:55.520 And am I reading this correctly? The recommendation engine that 0:04:55.640 --> 0:04:58.480 Amazon uses, you helped develop? 0:04:58.880 --> 0:05:01.200 I would say I worked on it. I would, you 0:05:01.200 --> 0:05:03.560 know, it existed, it was in place when I got there, 0:05:03.920 --> 0:05:07.200 and sort of the things that are familiar about the 0:05:07.200 --> 0:05:11.480 recommendation engine had already been built by my manager and 0:05:12.000 --> 0:05:17.520 his colleagues. But I worked, I did research on improvements 0:05:17.560 --> 0:05:21.880 and different ways of forming recommendations. It was funny because 0:05:21.920 --> 0:05:28.840 at the time, the entire database of purchase history for 0:05:29.240 --> 0:05:32.719 all of Amazon fit in one twenty gigabyte file on 0:05:32.800 --> 0:05:34.160 a disk, so I could just load it on my 0:05:34.200 --> 0:05:35.280 computer and run. 0:05:35.360 --> 0:05:36.960 Now I don't think we could do that anymore. We 0:05:37.160 --> 0:05:39.920 could not. So, thank goodness there's Amazon cloud services, 0:05:39.960 --> 0:05:43.279 so you could put, what is it, twenty five years 0:05:43.520 --> 0:05:50.200 and hundreds of billions of dollars of transactions.
So my 0:05:50.320 --> 0:05:53.839 assumption is products like that are highly iterative. The first 0:05:53.920 --> 0:05:56.159 version is all right, it does a half decent job, 0:05:56.440 --> 0:05:58.240 and then it gets better, and then it starts to 0:05:58.279 --> 0:06:01.919 get almost spookily good. It's like, oh, how much of 0:06:01.960 --> 0:06:04.560 that is just the size of the database, and how 0:06:04.600 --> 0:06:07.360 much of that is just a clever algorithm? 0:06:08.040 --> 0:06:12.960 Well, that's a great question, because the two are inextricably linked. 0:06:14.160 --> 0:06:18.080 The way that you make algorithms great is by making 0:06:18.120 --> 0:06:21.360 them more powerful, more expressive, able to describe lots of 0:06:21.360 --> 0:06:25.000 different kinds of patterns and relationships. But those kinds of 0:06:25.040 --> 0:06:29.480 approaches need huge amounts of data in order to correctly 0:06:29.520 --> 0:06:32.280 sort out what's signal and what's noise. The more expressive 0:06:32.800 --> 0:06:35.680 a tool like that is, like a recommender system, the 0:06:35.680 --> 0:06:40.359 more prone it is to mistake one-time noise for 0:06:40.560 --> 0:06:44.200 persistent signal, and that is a recurring theme in statistical prediction. 0:06:44.279 --> 0:06:47.599 It is really the central problem in statistical prediction. So 0:06:47.839 --> 0:06:50.520 you have it in recommender systems, you have it in 0:06:50.720 --> 0:06:54.960 predicting price action, in the problems that we solve, and elsewhere. 0:06:55.400 --> 0:06:57.560 There was a pretty infamous New York Times article a 0:06:57.560 --> 0:07:02.919 couple of years ago about Target using their own recommender 0:07:03.000 --> 0:07:09.279 system and sending out maternity things to people.
A dad 0:07:09.320 --> 0:07:13.160 gets his young teenage daughter's, what is this, and goes 0:07:13.200 --> 0:07:15.200 in to yell at them, and turns out she was 0:07:15.200 --> 0:07:20.080 pregnant and they had pieced it together. How far of 0:07:20.120 --> 0:07:23.400 a leap is it from these systems to much more 0:07:23.880 --> 0:07:29.160 sophisticated machine learning and even large language models? 0:07:30.080 --> 0:07:32.840 The answer, it turns out, is that it's a 0:07:32.920 --> 0:07:37.520 question of scale. That wasn't at all obvious before GPT 0:07:38.120 --> 0:07:42.640 three and ChatGPT. But it just turned out that 0:07:42.920 --> 0:07:46.360 when you have, for example, GPT is built from a 0:07:46.440 --> 0:07:50.800 database of sentences in English, it's got a trillion words 0:07:50.800 --> 0:07:53.880 in it, that database, and when you take a trillion 0:07:53.920 --> 0:07:55.520 words and you use it to fit a model that 0:07:55.560 --> 0:07:59.080 has one hundred and seventy five billion parameters, there is 0:07:59.120 --> 0:08:02.480 apparently a kind of transition where things become, you know, 0:08:02.560 --> 0:08:05.440 frankly astounding. I don't, I think, I don't think that 0:08:05.480 --> 0:08:07.600 anybody who isn't astounded is telling the truth. 0:08:07.720 --> 0:08:13.200 Right. It's eerie in terms of how sophisticated it is, 0:08:13.520 --> 0:08:17.120 but it's also kind of surprising in terms of, I 0:08:17.120 --> 0:08:20.520 guess, what the programmers like to call hallucinations. I guess 0:08:20.560 --> 0:08:24.160 if you're using the Internet as your base model, hey, 0:08:24.320 --> 0:08:26.760 there's one or two things on the Internet that are wrong, 0:08:27.240 --> 0:08:30.080 so of course that's going to show up in something 0:08:30.080 --> 0:08:30.960 like ChatGPT. 0:08:31.640 --> 0:08:36.559 Yeah, you know.
Underlyingly, there's this tool, GPT three, that's 0:08:36.600 --> 0:08:39.560 really the engine that powers ChatGPT, and that tool, 0:08:40.760 --> 0:08:43.560 it has one goal. It's a simple goal. You show 0:08:43.600 --> 0:08:46.240 it the beginning of a sentence, and it predicts the 0:08:46.280 --> 0:08:48.600 next word in the sentence, and that's all it is 0:08:48.600 --> 0:08:50.800 trained to do. I mean, it really is actually that simple. 0:08:51.280 --> 0:08:53.559 It's a dumb program that looks smart. 0:08:53.679 --> 0:08:58.079 If you like. But the thing about predicting the next 0:08:58.120 --> 0:09:01.600 word in a sentence is whether, you know, the sequence 0:09:01.600 --> 0:09:04.400 of words that's being output is leading to something that 0:09:04.440 --> 0:09:07.400 is true or false is irrelevant. The only thing that 0:09:07.520 --> 0:09:10.640 it is trained to do is make highly accurate predictions 0:09:10.679 --> 0:09:11.320 of next words. 0:09:11.640 --> 0:09:16.080 So when I said it's really very sophisticated, it just 0:09:16.640 --> 0:09:20.080 for what we tend to call this artificial intelligence. But 0:09:20.160 --> 0:09:22.559 I've read a number of people said, hey, this really 0:09:22.600 --> 0:09:26.320 isn't AI. This is something a little more rudimentary. 0:09:26.480 --> 0:09:30.719 Yeah, I think, you know, a critic would say that 0:09:31.280 --> 0:09:35.559 artificial intelligence is a complete misnomer. There's sort of nothing 0:09:36.160 --> 0:09:40.560 remotely intelligent in the colloquial sense about these systems. And 0:09:40.600 --> 0:09:44.080 then a common defense in AI research is that artificial 0:09:44.120 --> 0:09:46.240 intelligence is a moving target. As soon as you build 0:09:46.240 --> 0:09:50.120 a system that does something quasi magical that was the 0:09:50.160 --> 0:09:53.720 old yardstick of intelligence, then the goalposts get moved by 0:09:53.720 --> 0:09:57.400 the people who are supplying the evaluations.
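As a rough illustration of the next-word objective described here, the following Python sketch counts which word most often follows each word in a tiny made-up corpus and uses that count as its prediction. It is a toy stand-in and not how GPT-3 works internally (GPT-3 is a large neural network); the names `train_bigram` and `predict_next` and the corpus itself are assumptions made up for this example. Like the system discussed above, it only scores what tends to come next and has no notion of whether the output is true.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real system trains on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(predict_next(model, "the"))    # most common word after "the"
print(predict_next(model, "zebra"))  # unseen word, so no prediction
```

With so little data the counts are mostly noise, which is one way to see why scale matters so much for this kind of model.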
And I guess 0:09:57.440 --> 0:09:59.320 I would sit somewhere in between. I think the language 0:09:59.320 --> 0:10:03.480 is unfortunate because it's so easily misconstrued. I wouldn't call 0:10:03.520 --> 0:10:05.760 the system dumb, and I wouldn't call it smart. It's, 0:10:06.000 --> 0:10:08.319 you know, those are not characteristics of these systems. 0:10:08.440 --> 0:10:10.320 But it's complex and sophisticated. 0:10:10.440 --> 0:10:12.280 It certainly is. It has one hundred and seventy five 0:10:12.280 --> 0:10:15.559 billion parameters. If that doesn't fit your definition of complex, I don't 0:10:15.679 --> 0:10:16.120 know what would. 0:10:16.920 --> 0:10:21.360 Yeah, that works for me. So in your career line, 0:10:21.400 --> 0:10:26.720 where is Affymetrix, and what was that recommendation engine like? 0:10:26.920 --> 0:10:29.760 Yeah, so that was work I did as a summer 0:10:30.280 --> 0:10:34.120 research intern during my PhD. And that work was about 0:10:35.800 --> 0:10:39.880 what's called, the problem is called genotype calling. So genotype calling, 0:10:40.360 --> 0:10:43.200 I'll explain. Barry, do you have an identical twin? I 0:10:43.280 --> 0:10:46.720 do not. Okay, so I can safely say your genome 0:10:46.760 --> 0:10:49.040 is unique in the world. There's no one else who 0:10:49.040 --> 0:10:52.560 has exactly your genome. On the other hand, if you 0:10:52.559 --> 0:10:55.280 were to lay your genome and mine alongside each other, 0:10:55.400 --> 0:10:58.840 lined up, they would be ninety nine point nine percent identical. 0:10:58.960 --> 0:11:03.240 About one position in a thousand is different. But those 0:11:03.280 --> 0:11:05.719 differences are what cause you to be you and me 0:11:05.800 --> 0:11:08.200 to be me. So they're obviously of intense kind of 0:11:08.200 --> 0:11:11.680 scientific and applied interest.
And so it's very important to 0:11:11.800 --> 0:11:15.320 be able to take a sort of a sample of 0:11:15.360 --> 0:11:20.360 your DNA and quickly produce a profile of all the 0:11:20.400 --> 0:11:23.960 places that have variability, and what your particular values are. Okay? 0:11:24.120 --> 0:11:27.079 And that problem is the genotyping problem. 0:11:27.400 --> 0:11:31.000 And this used to be a very expensive, very complex 0:11:31.760 --> 0:11:34.840 problem to solve, that we've spent billions of dollars figuring 0:11:34.880 --> 0:11:38.040 out. Now a lot faster, a lot cheaper. 0:11:37.800 --> 0:11:40.560 A lot faster. In fact, even the technology I worked 0:11:40.600 --> 0:11:43.280 on in two thousand and five, two thousand and four 0:11:43.840 --> 0:11:47.679 is multiple generations old and not really what's used anymore. 0:11:47.880 --> 0:11:51.000 So let's talk about what you did at Efficient Frontier. 0:11:51.559 --> 0:11:56.680 Explain what real time click prediction rules are and how 0:11:56.720 --> 0:11:59.120 it works for a keyword search. 0:11:59.280 --> 0:12:06.000 Sure. The revenue engine that drives Google is search keyword ads, right? 0:12:06.040 --> 0:12:07.480 So every time you do a search, at the top 0:12:07.520 --> 0:12:10.360 you see ad, ad, ad. And so how do those 0:12:10.400 --> 0:12:14.520 ads get there? Well, actually it's surprising maybe if you 0:12:14.520 --> 0:12:16.280 don't know about it, but every single time you type 0:12:16.280 --> 0:12:19.240 in a search term on Google and hit return, a 0:12:19.559 --> 0:12:23.280 very fast auction takes place, and a whole bunch of 0:12:23.360 --> 0:12:28.840 companies running software bid electronically to place their ads at 0:12:28.840 --> 0:12:33.360 the top of your search results. And, more or 0:12:33.480 --> 0:12:36.600 less, the results that are shown on the page are 0:12:36.600 --> 0:12:38.760 in order of how much they bid.
It's not quite true, 0:12:38.760 --> 0:12:40.000 but you could think of it... It's true, 0:12:40.360 --> 0:12:44.160 in rough outline. So the first three sponsored results on 0:12:44.280 --> 0:12:48.320 a Google page go through that auction process, and I 0:12:48.360 --> 0:12:50.760 think at this point everybody knows what PageRank is 0:12:50.800 --> 0:12:53.800 for the rest of that. That's right. And that 0:12:53.840 --> 0:12:56.080 seemed to be Google's secret sauce early on. 0:12:56.280 --> 0:13:01.360 Right. Well, you know, to talk about the ad placement. 0:13:01.520 --> 0:13:04.120 So the people who are supplying the ad, who are 0:13:04.120 --> 0:13:06.200 participating in the auctions, they have a problem, which is 0:13:06.200 --> 0:13:08.840 how much to bid, right? And so how would you 0:13:08.880 --> 0:13:11.959 decide how much to bid? Well, you want to know 0:13:13.120 --> 0:13:15.520 basically the probability that somebody is going to click on 0:13:15.559 --> 0:13:19.040 your ad, and then you would multiply that by how 0:13:19.120 --> 0:13:21.959 much money you make eventually if they click. And that's 0:13:22.040 --> 0:13:24.840 kind of an expectation of how much money you'll make. 0:13:25.200 --> 0:13:29.959 And so then you gear your bid price to make 0:13:30.000 --> 0:13:32.319 sure that it's going to be profitable for you. And 0:13:32.360 --> 0:13:36.000 then so really you have to make a decision about 0:13:36.200 --> 0:13:38.040 what this click through rate is going to be. You 0:13:38.040 --> 0:13:39.800 have to predict the click through probability. 0:13:40.360 --> 0:13:42.480 So I was going to say, this sounds like it's 0:13:42.520 --> 0:13:48.000 a very sophisticated application of computer science, probability, and statistics. 0:13:48.520 --> 0:13:50.880 And if you do it right, you make money, and 0:13:50.960 --> 0:13:54.360 if you do it wrong, your ad budget is a 0:13:54.360 --> 0:13:55.400 money loser.
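The bid arithmetic described above (click probability times the money made if they click, then a bid geared to stay profitable) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Efficient Frontier's or Google's actual logic: the function names, the `target_margin` parameter, and the example numbers are all hypothetical, and in practice the click probability would come from a prediction model rather than being supplied by hand.

```python
def expected_value_per_showing(click_probability, revenue_per_click):
    """Expected revenue from showing the ad once: P(click) * revenue if clicked."""
    if not 0.0 <= click_probability <= 1.0:
        raise ValueError("click_probability must be between 0 and 1")
    return click_probability * revenue_per_click


def max_profitable_bid(click_probability, revenue_per_click, target_margin=0.0):
    """Highest bid that keeps `target_margin` of the expected value as profit.

    `target_margin` is a hypothetical knob added for this example:
    0.0 means break even, 0.2 means keep 20 percent of the expected value.
    """
    if not 0.0 <= target_margin < 1.0:
        raise ValueError("target_margin must be in [0, 1)")
    expected_value = expected_value_per_showing(click_probability, revenue_per_click)
    return expected_value * (1.0 - target_margin)


# Made-up numbers: a 2 percent predicted click probability and
# 5 dollars of eventual revenue per click.
bid_cap = max_profitable_bid(0.02, 5.0, target_margin=0.2)
print(f"bid at most {bid_cap:.3f} dollars per showing")
```

If the predicted click probability is too high, the bid cap is too high as well, which is the sense in which a bad prediction turns the ad budget into a money loser.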
0:13:55.160 --> 0:13:55.559 That's right. 0:13:55.840 --> 0:13:58.520 Huh. So tell us a little bit about your doctorate, 0:13:58.600 --> 0:14:01.920 what you wrote about for your PhD at Berkeley. 0:14:02.240 --> 0:14:06.640 Yeah, so we're back to genomes. Actually, this was around 0:14:06.679 --> 0:14:08.679 the time when I was in my first year of 0:14:08.679 --> 0:14:11.560 my PhD program, is when the human genome was published 0:14:11.960 --> 0:14:16.120 in Nature. So it was kind of really the beginning 0:14:16.200 --> 0:14:20.720 of the explosion of work on kind of high throughput, 0:14:21.240 --> 0:14:26.400 large scale genetics research. And one really important question after 0:14:26.440 --> 0:14:28.600 you've sequenced a genome is well, what are all the 0:14:28.640 --> 0:14:30.360 bits of it doing. You can look at a string 0:14:30.400 --> 0:14:33.520 of DNA. It's just made up of these kind of 0:14:33.520 --> 0:14:37.400 four letters, but you don't want to just know the 0:14:37.440 --> 0:14:39.960 four letters. They're kind of a code. And some parts 0:14:39.960 --> 0:14:43.920 of the DNA represent useful stuff that is being turned 0:14:44.080 --> 0:14:47.720 by your cell into proteins and et cetera, and other 0:14:47.840 --> 0:14:49.880 parts of the DNA don't appear to have any function 0:14:49.920 --> 0:14:51.760 at all, and it's really important to know which is 0:14:51.800 --> 0:14:56.000 which as a biology researcher. And so it's you know, 0:14:56.040 --> 0:15:01.240 for a long time before high throughput sequencing, biologists would 0:15:01.240 --> 0:15:03.320 be in the lab and they would very laboriously look 0:15:03.360 --> 0:15:05.920 at very tiny segments of DNA and establish what their 0:15:05.920 --> 0:15:08.960 function was. 
But now we have the whole human genome 0:15:09.040 --> 0:15:10.880 sitting on disk, and we would like to be able 0:15:10.920 --> 0:15:13.200 to just run an analysis on it and have the 0:15:13.240 --> 0:15:16.760 computer spit out everything that is functional and not functional. 0:15:17.760 --> 0:15:21.480 And so that's the problem I worked on. And a 0:15:21.520 --> 0:15:24.400 really important insight is that you can take advantage of 0:15:24.440 --> 0:15:28.600 the idea of natural selection and the idea of evolution 0:15:29.120 --> 0:15:31.640 to help you. And the way you do that is 0:15:32.160 --> 0:15:34.840 you have the human genome, you sequence a bunch of 0:15:35.160 --> 0:15:38.800 primate genomes, nearby relatives of the human, and you lay 0:15:38.840 --> 0:15:41.760 all those genomes on top of each other, and then 0:15:41.960 --> 0:15:45.800 you look for places where all of the genomes agree. Right? 0:15:45.920 --> 0:15:50.080 There hasn't been variation that's happening through mutations. And why 0:15:50.160 --> 0:15:53.080 hasn't there been? Well, the biggest force that throws out 0:15:53.160 --> 0:15:56.440 variation is natural selection. If you get a mutation in 0:15:56.480 --> 0:15:59.400 a part of your genome that really matters, then you're 0:15:59.480 --> 0:16:02.640 kind of on it and you won't have progeny and 0:16:02.680 --> 0:16:06.120 that'll get stamped out. So natural selection is this very 0:16:06.120 --> 0:16:10.160 strong force that's causing DNA not to change. And so 0:16:10.200 --> 0:16:13.160 when you make these primate alignments, you can 0:16:13.360 --> 0:16:18.320 really leverage that fact and look for conservation and use 0:16:18.360 --> 0:16:20.000 that as a big signal that something is functional. 0:16:20.280 --> 0:16:25.160 Huh, really, really interesting. You mentioned our DNA is ninety 0:16:25.240 --> 0:16:28.640 nine point ninety nine.
I don't know how many places 0:16:28.640 --> 0:16:30.200 to the right of the decimal point you would want 0:16:30.200 --> 0:16:34.560 to go, but very similar. How how similar or different 0:16:34.720 --> 0:16:38.360 are we from let's say, a chimpanzee. I've always questioned, 0:16:38.400 --> 0:16:41.680 there's an urban legend that they're practically the same. It 0:16:41.680 --> 0:16:46.680 always seems like it's overstated two percent. So you and 0:16:46.720 --> 0:16:49.680 I have a point one percent different me and the 0:16:49.720 --> 0:16:52.280 average chimp. It's two point zero percent. 0:16:52.440 --> 0:16:55.800 That's exactly right. Yeah, so chimps are essentially our closest 0:16:56.360 --> 0:16:57.720 non human primate relatives. 0:16:58.320 --> 0:17:02.280 Really really quite fascinating. So let's talk a little bit 0:17:02.320 --> 0:17:05.160 about the firm. You guys were one of the earliest 0:17:05.160 --> 0:17:09.040 pioneers of machine learning research. Explain a little bit what 0:17:09.119 --> 0:17:09.840 the firm does. 0:17:10.880 --> 0:17:16.439 Sure, so, we run trading strategies investment strategies that are 0:17:16.640 --> 0:17:20.320 fully automated, so we call them fully systematic, and that 0:17:20.400 --> 0:17:24.760 means that we have software systems that run every day 0:17:25.440 --> 0:17:29.800 during market hours, and they take in information about the 0:17:29.880 --> 0:17:34.400 characteristics of the securities we're trading. Think of stocks and 0:17:34.440 --> 0:17:39.800 then they make predictions of how the prices of each 0:17:39.960 --> 0:17:43.400 security is going to change over time, and then they 0:17:44.480 --> 0:17:47.600 decide on changes in our inventory, changes in held positions 0:17:48.280 --> 0:17:53.000 based on those predictions, and then those desired changes are 0:17:53.040 --> 0:17:56.360 sent into an execution system which automatically carries them out. 0:17:56.960 --> 0:18:02.680 So fully automated. 
Is there supervision, or is it kind of 0:18:02.760 --> 0:18:05.040 running on its own with a couple of checks? 0:18:05.119 --> 0:18:09.119 There's lots of human diagnostic supervision, right? So there are 0:18:09.160 --> 0:18:14.760 people who are watching screens full of instrumentation and telemetry 0:18:14.840 --> 0:18:17.720 about what the systems are doing. But those people are 0:18:17.760 --> 0:18:21.240 not taking any actions, right, unless there's a problem, right, 0:18:21.320 --> 0:18:23.000 and then they do. 0:18:23.480 --> 0:18:26.120 So let's talk a little bit about how machines learn 0:18:26.200 --> 0:18:30.280 to identify signals. I'm assuming you start with the giant 0:18:30.359 --> 0:18:35.320 database that is the history of stock prices, volume, movement, etc. 0:18:36.200 --> 0:18:38.840 And then bring a lot of additional things to bear. 0:18:39.520 --> 0:18:44.439 What's the process like, developing a particular trading strategy? 0:18:44.960 --> 0:18:48.760 Yeah, so, as you're saying, we begin with a very 0:18:48.840 --> 0:18:54.160 large historical data set of prices and volumes, market data 0:18:54.200 --> 0:18:59.440 of that kind, but importantly all kinds of other information about securities, 0:19:00.000 --> 0:19:04.760 financial statement data, textual data, analyst data. 0:19:05.080 --> 0:19:11.000 So it's everything from prices, fundamentals, everything from earnings to 0:19:11.080 --> 0:19:14.080 revenue to sales, etc. I'm assuming the change and the 0:19:14.680 --> 0:19:17.159 delta of the change is going to be very significant 0:19:17.200 --> 0:19:22.000 in that. What about macroeconomic, what some people call noise, 0:19:22.119 --> 0:19:27.159 but one would imagine some signal in everything from inflation 0:19:27.359 --> 0:19:31.840 to interest rates to GDP, firm spending? Are those inputs 0:19:32.280 --> 0:19:34.200 worthwhile, or how do you think about those?
0:19:34.560 --> 0:19:38.640 So we don't hold portfolios that are exposed to those things. 0:19:38.760 --> 0:19:42.320 So it's really a business decision on our part. We 0:19:42.440 --> 0:19:47.200 are working with institutional investors who already have as much 0:19:47.240 --> 0:19:50.200 exposure as they want to things like the market or 0:19:50.520 --> 0:19:56.040 to well recognized econometric risk factors like value, and so 0:19:56.080 --> 0:19:58.680 they don't need our help to be exposed to those things. 0:19:58.680 --> 0:20:01.439 They are very well equipped to handle that part of 0:20:01.480 --> 0:20:05.440 their investment process. What we're trying to provide is the 0:20:05.480 --> 0:20:09.000 most diversification possible. So we want to give them a 0:20:09.040 --> 0:20:14.000 new return stream which has good and stable returns, but 0:20:14.160 --> 0:20:17.199 on top of that, importantly, is also not correlated with 0:20:17.240 --> 0:20:19.639 any of the other return streams that they already that 0:20:19.680 --> 0:20:20.280 they already have. 0:20:20.480 --> 0:20:25.040 That's interesting. So can I assume that you're applying your 0:20:25.359 --> 0:20:29.359 machine learning methodology across different asset classes or is it 0:20:29.400 --> 0:20:30.560 strictly equities? Oh? 0:20:30.600 --> 0:20:34.639 No, We apply it to UH to equities, to credit, 0:20:34.720 --> 0:20:39.840 to corporate bonds, and we trade futures contracts, and in 0:20:39.880 --> 0:20:41.520 the fullness of time, we hope that we will be 0:20:41.560 --> 0:20:44.320 trading kind of every security in the world. 0:20:44.359 --> 0:20:47.159 So, so currently stocks, bonds, When you say futures, I 0:20:47.200 --> 0:20:48.440 assume commodities, all. 0:20:48.400 --> 0:20:49.320 Kinds of futures contract. 0:20:49.359 --> 0:20:52.399 It's really really interesting. So it could be anything from 0:20:52.640 --> 0:20:56.560 interest rate swaps to commodities to the full gamut. 
So, 0:20:56.880 --> 0:21:01.480 so how different is this approach from what other quant 0:21:01.600 --> 0:21:05.480 shops do that really focus on equities. 0:21:06.800 --> 0:21:11.280 I think it's kind of the same question as asking, well, 0:21:11.400 --> 0:21:13.119 what do we mean when we say we use machine 0:21:13.160 --> 0:21:16.480 learning or that you know we are our principles are 0:21:16.520 --> 0:21:20.520 our machine learning principles, and so how does that make 0:21:20.600 --> 0:21:24.159 us different than the kind of standard approach in quantitative trading? 0:21:24.840 --> 0:21:28.000 And the answer to the question really comes back to 0:21:28.000 --> 0:21:31.720 this idea we mentioned a little while ago of how 0:21:31.760 --> 0:21:36.200 powerful the tools are that you're using to form predictions. Right, 0:21:36.480 --> 0:21:40.199 So in our business, the thing that we build is 0:21:40.200 --> 0:21:44.040 called a prediction rule. Okay, that's that's our widget and 0:21:44.320 --> 0:21:46.760 What a prediction rule does is it takes in a 0:21:46.760 --> 0:21:49.560 bunch of input, a bunch of information about a stock 0:21:49.880 --> 0:21:53.600 at a moment in time, and it hands you a 0:21:53.760 --> 0:21:56.080 guess about how that stock's price is going to change 0:21:56.200 --> 0:22:00.840 over some future period of time. Okay, and so there 0:22:00.920 --> 0:22:05.320 is one most important question about prediction rules, which is 0:22:05.480 --> 0:22:07.840 how complex are they? How much complexity do they have? 0:22:08.400 --> 0:22:13.000 Complexity is a colloquial term. It's unfortunately another example of 0:22:13.600 --> 0:22:16.879 a place where things can be vague or ambiguous because 0:22:18.080 --> 0:22:21.359 a general purpose word has been borrowed in a technical setting. 0:22:21.520 --> 0:22:24.399 But when you use the word complexity in statistical prediction, 0:22:24.720 --> 0:22:28.800 there's a very specific meaning. 
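One way to make the idea of a prediction rule and its complexity concrete is the small Python sketch below. It is an illustration under stated assumptions and not Voleon's method: the data are synthetic and contain no signal at all, the low-complexity rule always predicts the training average, and the high-complexity rule copies the outcome of the nearest training example. All function and variable names are hypothetical.

```python
import random


def fit_mean_rule(xs, ys):
    """Low complexity: ignore the input and always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_nearest_neighbor_rule(xs, ys):
    """High complexity: predict the outcome of the closest training input."""
    pairs = list(zip(xs, ys))

    def rule(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return rule


def mean_squared_error(rule, xs, ys):
    """Average squared gap between the rule's guesses and what happened."""
    return sum((rule(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_noise_data(rng, n):
    """Synthetic inputs and outcomes with no relationship between them."""
    xs = [rng.random() for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_noise_data(rng, 300)
test_x, test_y = make_noise_data(rng, 300)

simple_rule = fit_mean_rule(train_x, train_y)
complex_rule = fit_nearest_neighbor_rule(train_x, train_y)

simple_train_mse = mean_squared_error(simple_rule, train_x, train_y)
simple_test_mse = mean_squared_error(simple_rule, test_x, test_y)
complex_train_mse = mean_squared_error(complex_rule, train_x, train_y)
complex_test_mse = mean_squared_error(complex_rule, test_x, test_y)

print("simple rule  train/test error:", simple_train_mse, simple_test_mse)
print("complex rule train/test error:", complex_train_mse, complex_test_mse)
```

The complex rule reproduces the historical data perfectly, yet does worse than the simple one on fresh data, because everything it memorized was noise. That is the earlier point about expressive tools mistaking one-time noise for persistent signal.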
It means how much expressive 0:22:28.840 --> 0:22:32.280 power does this prediction rule have? How good a job 0:22:32.440 --> 0:22:35.280 can it do of approximating what's going on in the 0:22:35.359 --> 0:22:38.200 data you show it. Remember, we have these giant historical 0:22:38.280 --> 0:22:41.760 data sets, and every entry in the data set looks 0:22:41.800 --> 0:22:44.520 like this. What was going on with the stock at 0:22:44.520 --> 0:22:47.440 a moment in a certain moment in time, it's price action, 0:22:47.680 --> 0:22:52.080 it's financials analyst information. And then what did its price 0:22:52.160 --> 0:22:55.040 do in the subsequent twenty four hours or the subsequent 0:22:55.160 --> 0:23:01.000 fifteen minutes or whatever. Okay, and so when you talk 0:23:01.040 --> 0:23:04.240 about the amount of complexity that a prediction rule has, 0:23:04.720 --> 0:23:08.280 that means how well is it able to capture the 0:23:08.320 --> 0:23:11.000 relationship between the things that you can show it when 0:23:11.040 --> 0:23:13.840 you ask it for a prediction, and what actually happens 0:23:14.040 --> 0:23:18.159 to the price. And naturally you kind of want to 0:23:18.720 --> 0:23:20.840 use high complexity rules because they have a lot of 0:23:20.880 --> 0:23:23.440 approximating power. They do a good job of describing anything 0:23:23.480 --> 0:23:26.920 that's going on. But there are two There are two 0:23:26.960 --> 0:23:30.639 disadvantages to high complexity. One is it needs a lot 0:23:30.680 --> 0:23:34.639 of data, otherwise it gets fooled into thinking that randomness 0:23:34.680 --> 0:23:39.000 is actually signal. And the other is that it's hard 0:23:39.000 --> 0:23:41.880 to reason about what's going on under the hood. Right, 0:23:41.960 --> 0:23:45.080