WEBVTT - Why You Should Wait Out AI’s Super-Spending False Start 0:00:02.720 --> 0:00:19.880 Bloomberg Audio Studios, Podcasts, Radio News. Welcome to Merryn Talks Money, 0:00:19.880 --> 0:00:22.560 the podcast in which people who know the markets explain 0:00:22.640 --> 0:00:25.479 the markets. I am Merryn Somerset Webb and this week 0:00:25.520 --> 0:00:28.600 I am speaking with Dr. Janusz Marecki, who is an AI 0:00:28.680 --> 0:00:32.080 partner at Aaron Innovation Capital. Now, as you know, on 0:00:32.159 --> 0:00:34.680 this podcast, we like to talk about the big forces 0:00:34.720 --> 0:00:37.520 affecting our economy and markets in general, so it's really 0:00:37.720 --> 0:00:40.879 no surprise that we keep coming back to the impact 0:00:40.880 --> 0:00:44.000 of AI. We've talked at different times about the consequences 0:00:44.000 --> 0:00:46.760 for jobs, for inflation, for interest rates, for tech companies, 0:00:46.840 --> 0:00:50.800 and whether politicians and indeed policymakers, let alone ordinary workers 0:00:50.840 --> 0:00:53.519 and investors, are ready for any of this. But what 0:00:53.560 --> 0:00:59.640 we've never asked is: is it actually working? Janusz, welcome 0:00:59.680 --> 0:01:00.680 to Merryn Talks Money. 0:01:01.360 --> 0:01:02.640 Thank you so much for having me. 0:01:03.000 --> 0:01:06.840 Can I get you to start with a brief explanation of 0:01:06.920 --> 0:01:10.199 exactly what it is that we mean when we say AI? 0:01:10.480 --> 0:01:11.559 Yes. Everyone talks 0:01:11.400 --> 0:01:13.840 about AI all the time. They all say that it is going 0:01:13.880 --> 0:01:15.840 to change the world, it's going to solve all our problems, 0:01:15.920 --> 0:01:18.319 it's going to destroy our jobs, etc. But what do 0:01:18.360 --> 0:01:20.520 we actually mean when we say AI? 0:01:21.319 --> 0:01:24.720 So these days, what we mean by saying AI is 0:01:24.760 --> 0:01:30.080 a system which is approximating a certain process. 
It might be 0:01:30.160 --> 0:01:33.600 a system which is approximating language. It might be a 0:01:33.640 --> 0:01:37.200 system which is approximating images. It might be a system 0:01:37.520 --> 0:01:40.560 which is approximating how a robot moves. At the end 0:01:40.600 --> 0:01:44.280 of the day, the current generation of AI techniques, those 0:01:44.319 --> 0:01:49.480 neural networks, are function approximators. They approximate things. 0:01:49.720 --> 0:01:53.320 They don't solve intelligence, they approximate. So that's why you 0:01:53.360 --> 0:01:56.160 may have an illusion that those systems are intelligent. At 0:01:56.160 --> 0:01:58.960 the end of the day, they are approximating intelligence. 0:01:59.080 --> 0:02:00.520 And I suppose there's two ways to look at this: 0:02:00.600 --> 0:02:04.120 what AI, as you've just described, needs from us and 0:02:04.160 --> 0:02:06.480 what we really need from it to make it work 0:02:06.520 --> 0:02:09.280 for us. So if you look at the way that 0:02:09.280 --> 0:02:11.600 the big hyperscalers are approaching things at the moment, they're 0:02:11.600 --> 0:02:15.080 building massive data centers to build out their capacity, and 0:02:15.120 --> 0:02:18.960 that requires vast amounts of energy, it requires lots of coolants, 0:02:19.120 --> 0:02:21.640 it requires a very large volume of very different types 0:02:21.680 --> 0:02:25.079 of chips, right? Yes. And all those things, obviously there 0:02:25.160 --> 0:02:27.680 are troubles at the moment getting all these things, 0:02:27.760 --> 0:02:29.680 with the war in the Middle East, etc. So there 0:02:29.680 --> 0:02:32.560 are supply restrictions, but nonetheless none of these things are 0:02:32.560 --> 0:02:35.440 really a long-term problem. Given the correct policy choices, 0:02:35.639 --> 0:02:40.480 all those material elements can easily be, not easily, but 0:02:40.560 --> 0:02:43.400 can be found and built. 
Then there's the second 0:02:43.400 --> 0:02:47.400 bit that we talked about when we last met, 0:02:47.440 --> 0:02:51.200 which is the data that you require to train a model, 0:02:51.720 --> 0:02:56.520 and that can't be created in the volumes that are required. 0:02:56.560 --> 0:02:59.880 And we've hit it. We've hit a supply problem with data. 0:03:00.160 --> 0:03:04.240 We have hit a supply problem with diverse data. To 0:03:04.240 --> 0:03:07.520 clarify the thing, because you can just create an infinite 0:03:07.520 --> 0:03:10.800 amount of data by randomly generating new words, so you 0:03:10.800 --> 0:03:13.679 can create it. But we're talking about creating high-quality, 0:03:13.800 --> 0:03:17.120 diverse data. And you see, we've run out of data, 0:03:17.200 --> 0:03:21.040 diverse data, not yesterday, not a month ago. We 0:03:21.120 --> 0:03:23.840 ran out of data three and a half years ago. 0:03:23.880 --> 0:03:28.160 You have to understand that the last frontier model, GPT-4, 0:03:28.400 --> 0:03:34.320 which was not a combination of agents, etc., just a basic LLM. 0:03:34.920 --> 0:03:38.560 The last LLM that was the most potent was GPT-4. 0:03:39.240 --> 0:03:42.080 It was released in twenty twenty 0:03:42.160 --> 0:03:45.960 three, in January. However, the training of that model finished 0:03:46.000 --> 0:03:48.920 at the end of twenty twenty two, so it's three 0:03:48.960 --> 0:03:52.080 and a half years ago. We have trained the model 0:03:52.440 --> 0:03:56.000 that used all the publicly available data on the Internet. 0:03:56.320 --> 0:03:58.600 There is nothing more out there to use to train 0:03:58.680 --> 0:04:02.400 the model. We hit the data ceiling not a month ago, 0:04:02.640 --> 0:04:05.640 but three and a half years ago. It's extremely important. 0:04:06.280 --> 0:04:08.920 And yes, what we're doing right now is we're trying 0:04:09.000 --> 0:04:13.160 to put together multiple LLMs. 
We're trying to have synthetic data, 0:04:13.200 --> 0:04:16.919 but the performance isn't really there. We hit diminishing returns 0:04:17.000 --> 0:04:19.440 not a month ago, but three and a half years ago. 0:04:19.880 --> 0:04:22.640 Okay. But surely new data is created all the 0:04:22.680 --> 0:04:25.559 time on the Internet. You know, we talk about diverse data, 0:04:25.600 --> 0:04:27.480 and there was an awful lot more created in the past 0:04:27.520 --> 0:04:30.000 than there is now. But everyone's use of the Internet 0:04:30.080 --> 0:04:33.200 surely creates huge volumes of new data all the time. 0:04:33.560 --> 0:04:37.200 That's even worse. Humans are creating new data on the Internet, 0:04:37.240 --> 0:04:40.560 but that data falls into certain patterns. How many conversations 0:04:40.600 --> 0:04:42.680 on the weather can you have every day? 0:04:42.960 --> 0:04:46.000 A lot, actually, a lot. I think I've already had 0:04:46.040 --> 0:04:47.880 three today, to be honest. You know, it's very cold 0:04:47.920 --> 0:04:48.479 in Edinburgh. 
0:04:48.640 --> 0:04:50.680 You can have a lot, but there's a bigger, more profound 0:04:50.680 --> 0:04:52.960 problem there I want to mention, which is that if you 0:04:53.000 --> 0:04:55.840 look at the data that is currently being created on 0:04:55.880 --> 0:05:01.560 the open Internet, it is data created by those LLMs, 0:05:01.800 --> 0:05:06.599 which number one is inaccurate because it suffers from hallucinations, 0:05:06.640 --> 0:05:10.719 and number two, it feeds into itself, so that the 0:05:10.800 --> 0:05:14.600 new models now train on the entire Internet. They're training 0:05:14.640 --> 0:05:17.880 on the output from other models, which, as I said, 0:05:17.960 --> 0:05:22.080 are making mistakes and hallucinating, and that, using a technical term, 0:05:22.600 --> 0:05:26.520 slowly ends up leading to something called model collapse, where 0:05:26.520 --> 0:05:28.520 the models themselves are actually getting dumber. 0:05:29.440 --> 0:05:32.640 Okay, so if you train a new model on newly 0:05:32.680 --> 0:05:36.719 created data, you're effectively training it on its own nonsense or 0:05:36.760 --> 0:05:38.760 nonsense created by other similar models. 0:05:39.040 --> 0:05:41.880 Yeah, well, not only nonsense. We have to understand that 0:05:41.920 --> 0:05:46.200 those LLMs using artificial neural networks are correct, let's 0:05:46.240 --> 0:05:49.240 say, ninety-five to ninety-nine percent of the time. So 0:05:49.320 --> 0:05:52.360 most of the content is correct, but you no longer 0:05:52.520 --> 0:05:56.680 know what content is incorrect, which is a significant degradation 0:05:56.760 --> 0:05:59.080 of the quality of data. It's almost like having access 0:05:59.120 --> 0:06:02.080 to a calculator which claims to be correct one hundred 0:06:02.080 --> 0:06:04.200 percent of the time, but in reality it is correct 0:06:04.279 --> 0:06:07.000 ninety-five to ninety-nine percent of the time. How much 0:06:07.000 --> 0:06:09.520 would you pay for that calculator? 
Would you use the 0:06:09.560 --> 0:06:14.479 output from that calculator to produce a new calculator? No, you 0:06:14.480 --> 0:06:16.680 wouldn't do that. But we are doing it right now, 0:06:16.839 --> 0:06:19.200 and we are seeing right now that the performance of 0:06:19.200 --> 0:06:22.560 those models on benchmarks is not increasing anymore. It plateaued 0:06:22.600 --> 0:06:25.440 three years ago. And there's an even bigger, more profound question. 0:06:25.600 --> 0:06:29.960 We are training those models now using higher-quality data, 0:06:30.040 --> 0:06:33.120 so we're no longer using the entire Internet. Now we 0:06:33.200 --> 0:06:37.359 are filtering a lot of garbage out of the Internet, so 0:06:37.400 --> 0:06:41.880 we're using a smaller training data set, thus producing smaller 0:06:42.000 --> 0:06:45.200 LLMs that have the same quality. But as I mentioned, 0:06:45.279 --> 0:06:48.000 they are smaller. So what is the consequence of it? Well, 0:06:48.000 --> 0:06:50.400 as we speak, I'm running those models on my laptop. 0:06:50.600 --> 0:06:53.159 I don't use any data center for it. I never will. 0:06:53.440 --> 0:06:55.760 I'm running them on my laptop. So the models have 0:06:55.800 --> 0:06:58.880 gotten better. But by saying better, I mean they are 0:06:58.960 --> 0:07:00.000 more power efficient. 0:07:00.520 --> 0:07:03.039 They're not... all right, so you mean that they're more specific, 0:07:03.200 --> 0:07:06.520 so they're geared toward a more specific task because 0:07:06.520 --> 0:07:09.239 they're using a narrower range of data, so they can't 0:07:09.279 --> 0:07:12.080 be a type of general intelligence, they're specific and... 0:07:12.440 --> 0:07:16.080 Remarkably, no, they are general purpose. I mean, you can 0:07:16.160 --> 0:07:18.400 just go online and download Qwen three point five 0:07:18.480 --> 0:07:20.920 and three point six. 
This is a general-purpose model, 0:07:21.080 --> 0:07:24.239 which is even accessing the Internet to give you summaries, 0:07:24.280 --> 0:07:27.480 reasoning. It's doing this thing on your laptop. It's today. 0:07:27.760 --> 0:07:29.600 Of course, on top of that, if you want to, 0:07:30.440 --> 0:07:34.600 you can fine-tune your model on your proprietary data. 0:07:34.880 --> 0:07:37.960 But you see, even the fine-tuning process these days 0:07:38.080 --> 0:07:41.560 can be done on your laptop, which again raises the question: 0:07:41.880 --> 0:07:44.560 why do you need to pay for compute? Why do 0:07:44.600 --> 0:07:46.880 you need all this expansion of the data centers? You 0:07:46.960 --> 0:07:50.200 really don't. And what's going to happen in a year, two, 0:07:50.280 --> 0:07:52.960 three years from now, when the majority of laptops out 0:07:53.000 --> 0:07:56.040 there will be able to run general-purpose, the most 0:07:56.160 --> 0:07:59.720 potent language models with access to the Internet? I just, 0:08:00.040 --> 0:08:02.040 I'm just not seeing it. And that's on top of 0:08:02.120 --> 0:08:05.200 one fundamental thing that I want to mention up front, because 0:08:05.240 --> 0:08:08.960 you mentioned what does the AI need from us, and 0:08:09.080 --> 0:08:10.920 what we need from the AI. So what the AI 0:08:11.040 --> 0:08:13.400 needs from us are two things, as you mentioned. Number one, 0:08:13.480 --> 0:08:17.080 compute, and yes, we can provide more and more compute 0:08:17.200 --> 0:08:21.400 as long as Oracle CDS is not going down, which 0:08:21.440 --> 0:08:23.920 it is right now. And number two, we need to 0:08:24.000 --> 0:08:26.360 provide more data. We've used all of it, so to 0:08:26.480 --> 0:08:29.480 some extent we cannot produce better AI. Now, what do 0:08:29.600 --> 0:08:32.719 we want to have from the AI? We want to 0:08:32.840 --> 0:08:35.679 have at least two things, and just bear with me here. 
0:08:36.480 --> 0:08:41.600 Number one, we want to have systems which are continually learning. 0:08:42.360 --> 0:08:45.800 So just like during this podcast today, I hope that 0:08:46.040 --> 0:08:48.240 I will be able to learn something from you. You 0:08:48.320 --> 0:08:51.000 will be able to learn something from me and remember it 0:08:51.160 --> 0:08:56.640 tomorrow, or maybe remember it after Easter. Current-generation systems, in general 0:08:56.800 --> 0:09:00.719 neural networks, are not learning anything new when you 0:09:00.840 --> 0:09:03.679 interact with them, which is a significant limitation. So we're not 0:09:03.760 --> 0:09:06.240 getting from the AIs what we want. They're not 0:09:06.400 --> 0:09:10.040 learning from us. It's a fundamental limitation that's not solved. 0:09:10.080 --> 0:09:12.839 And number two, I want to mention this thing up front. 0:09:13.600 --> 0:09:19.640 Those systems are stochastic. They are probabilistic. You cannot trust them. 0:09:20.120 --> 0:09:23.240 They roll the dice whenever they produce output, so to 0:09:23.400 --> 0:09:26.640 some extent you cannot trust their output. Can you make 0:09:26.679 --> 0:09:30.400 them deterministic? Yes, of course you can, by making sure 0:09:30.440 --> 0:09:34.360 that they always produce the most likely token. Now, I used 0:09:34.400 --> 0:09:36.280 the word token. But the problem with this thing is 0:09:36.320 --> 0:09:38.559 that then they will be just copying the data from 0:09:38.600 --> 0:09:41.839 the training set. So just imagine those lawsuits when you 0:09:41.920 --> 0:09:46.240 see verbatim copies of all the podcasts, books, produced as 0:09:46.240 --> 0:09:48.880 an output of Gemini or OpenAI systems. So those 0:09:48.920 --> 0:09:52.880 systems number one are not continually learning, which basically for 0:09:53.000 --> 0:09:55.120 me is just a no-go. 
And number two, those 0:09:55.160 --> 0:09:58.840 systems are not to be trusted because whenever they produce output, 0:09:59.000 --> 0:10:02.920 that output is stochastic. It's not deterministic, it's 0:10:03.040 --> 0:10:05.280 basically rolling the dice to get you the output. 0:10:05.840 --> 0:10:07.679 Yeah. Can we talk a bit more about that, about 0:10:07.720 --> 0:10:10.280 how the hallucinations or errors build up? 0:10:10.760 --> 0:10:14.600 Absolutely. Let me try to explain using simple terms. When 0:10:14.679 --> 0:10:18.959 you use ChatGPT or Gemini or Copilot, though 0:10:19.080 --> 0:10:23.640 Copilot is actually an OpenAI system, it just produces text, 0:10:23.800 --> 0:10:27.840 so you have an impression that it generates one word at 0:10:27.880 --> 0:10:31.439 a time deterministically. That's what's on the screen, just one 0:10:31.559 --> 0:10:34.800 word after another. But in reality that's not what those 0:10:34.840 --> 0:10:39.280 systems produce. If you are a developer like me, and 0:10:39.360 --> 0:10:42.480 my colleagues are developers as well, you can look at 0:10:42.520 --> 0:10:45.480 the developer output of a large language model. Do you 0:10:45.520 --> 0:10:48.559 know what it gives you? It gives you a vector 0:10:48.920 --> 0:10:52.640 of fifty thousand elements, actually fifty-two thousand elements, so 0:10:52.800 --> 0:10:56.840 fifty thousand elements, and each element has a certain probability 0:10:56.960 --> 0:11:00.360 of being correct zero point ninety five zero one zero 0:11:00.480 --> 0:11:03.360 zero one zero, zero point three. It's an entire vector. 0:11:03.880 --> 0:11:07.679 It's not that one word has probability one and everything else is zero. No, 0:11:07.840 --> 0:11:09.640 there is a little bit of error there. It has 0:11:09.720 --> 0:11:12.240 to be, and so that is what happens when you're 0:11:12.280 --> 0:11:15.400 producing one word at a time, or one token at 0:11:15.440 --> 0:11:18.439 a time. 
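Editor's sketch: the per-token probability vector described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not how any particular product is implemented: the four-word vocabulary, the scores in `logits`, and the helper names `softmax`, `greedy_pick` and `sampled_pick` are all hypothetical, and a real model emits a vector with tens of thousands of entries rather than four. It contrasts the two decoding choices mentioned in the conversation: always taking the most likely token (deterministic) versus drawing a token at random in proportion to its probability ("rolling the dice").

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Deterministic choice: index of the most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)

def sampled_pick(probs, rng):
    """Stochastic choice: draw one index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical toy vocabulary and scores (assumptions for illustration only).
vocab = ["sunny", "cold", "rainy", "windy"]
logits = [2.0, 1.5, 0.5, -1.0]
probs = softmax(logits)

rng = random.Random(0)  # fixed seed so the demonstration is repeatable
greedy_tokens = [vocab[greedy_pick(probs)] for _ in range(5)]
sampled_tokens = [vocab[sampled_pick(probs, rng)] for _ in range(5)]

print("probabilities:", [round(p, 3) for p in probs])
print("greedy  :", greedy_tokens)   # the same token every time
print("sampled :", sampled_tokens)  # may differ from draw to draw
```

Greedy selection returns the same token on every call, while sampling spreads its choices across the vocabulary according to the probabilities; the sketch shows only that mechanical difference and makes no claim about output quality.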
You can think of a token as a 0:11:18.559 --> 0:11:21.400 word split in two or three. So when you produce 0:11:21.480 --> 0:11:24.920 output one token at a time, this system is rolling 0:11:24.960 --> 0:11:27.400 the dice all the time. It's making a small, tiny 0:11:27.600 --> 0:11:31.319 error every time it produces a word. At the beginning, 0:11:32.040 --> 0:11:34.520 you may not perceive the error, but over time, after 0:11:34.679 --> 0:11:38.079 three hundred, five hundred, a thousand words, the error is 0:11:38.120 --> 0:11:41.160 going to be so big that it's going to result 0:11:41.200 --> 0:11:45.119 in the critical failure of the system. You cannot circumvent 0:11:45.200 --> 0:11:48.599 it because those systems are probabilistic. It's not like in 0:11:48.640 --> 0:11:51.280 an Excel spreadsheet, where you can have a chain of 0:11:51.400 --> 0:11:54.719 ten, twenty, one hundred, one thousand formulas and you 0:11:54.840 --> 0:11:58.120 know that the formulas are going to produce the correct result. Here, 0:11:58.679 --> 0:12:02.079 if you have a chain of words, every time you 0:12:02.200 --> 0:12:04.640 produce a word, you accrue a little bit of error 0:12:05.080 --> 0:12:07.400 and you see this error manifest itself later on. 0:12:08.400 --> 0:12:12.679 Okay, so is this solvable? I guess that's the key question. 0:12:13.000 --> 0:12:15.679 Is it possible? In the type of models that we're 0:12:15.760 --> 0:12:18.800 using at the moment, which are super hyped, is it 0:12:18.880 --> 0:12:22.839 possible for that hallucination problem or compounding error problem to 0:12:22.920 --> 0:12:26.400 be solved, or is it simply a systemic shortcoming that 0:12:26.559 --> 0:12:27.400 is unresolvable? 0:12:28.280 --> 0:12:31.679 So here I'm again speaking from my experience, having two 0:12:31.679 --> 0:12:34.719 PhDs in mathematics and computer science and twenty years of 0:12:34.800 --> 0:12:39.640 experience at DeepMind and IBM Watson Research. 
No, it's impossible. 0:12:39.880 --> 0:12:43.120 You have to use a different technique for it. Yes, 0:12:43.320 --> 0:12:47.040 there are different techniques that are emerging right now. In 0:12:47.200 --> 0:12:49.520 full disclosure, I'm also a co-founder of a startup 0:12:49.679 --> 0:12:53.520 working on one of those techniques. We call it Fractal Brain. Yes, 0:12:53.600 --> 0:12:55.920 there are new techniques on the horizon. However, the existing 0:12:56.040 --> 0:13:00.600 techniques have a built-in mechanism so that every time you produce 0:13:00.640 --> 0:13:04.040 an output, another word, you have a little bit of error. 0:13:04.400 --> 0:13:07.079 You cannot eliminate this thing. And there's also another fundamental 0:13:07.120 --> 0:13:10.400 thing. You mentioned hallucinations. What do we mean by hallucinations? 0:13:10.720 --> 0:13:14.240 Number one, it's producing those small errors one word 0:13:14.240 --> 0:13:18.280 at a time. But there's also another reason for hallucinations, 0:13:18.320 --> 0:13:22.200 when this system just doesn't remember what was mentioned yesterday, 0:13:22.320 --> 0:13:24.600 a week ago, or a month ago, and goes back 0:13:24.640 --> 0:13:26.800 to the initial question. You need both. You need to 0:13:26.880 --> 0:13:30.439 have a system which, like humans, is continually learning. And 0:13:30.600 --> 0:13:33.720 number two, you need to have a system which does 0:13:33.840 --> 0:13:37.040 not make errors when it produces the next word. It 0:13:37.080 --> 0:13:39.760 shouldn't roll the dice. You need those two. And so 0:13:40.480 --> 0:13:43.240 now to answer your question, can we solve this problem 0:13:43.400 --> 0:13:49.200 using artificial neural networks? No. There are attempts to... there 0:13:49.280 --> 0:13:52.000 have been attempts to circumvent it. 
If you want to 0:13:52.080 --> 0:13:55.320 have continual learning, you can maybe try to use something 0:13:55.440 --> 0:13:58.800 like continual backpropagation from Rich Sutton, who got a Turing 0:13:58.840 --> 0:14:00.559 Award in twenty twenty four. You can, 0:14:00.679 --> 0:14:03.559 but it's not solving the problem. At least he's attempting 0:14:03.640 --> 0:14:07.200 to find solutions to it. You can retrain the system, 0:14:07.280 --> 0:14:09.599 of course, right? I mean, why not take GPT-4 0:14:09.960 --> 0:14:12.520 after the end of the podcast and retrain the entire system? 0:14:12.559 --> 0:14:14.839 It's going to cost you five million dollars. But you 0:14:14.920 --> 0:14:18.559 could do that. You could retrain the entire system, or 0:14:18.640 --> 0:14:20.320 you can fine-tune your system. But when you 0:14:20.440 --> 0:14:24.600 fine-tune your system, the system is forgetting what 0:14:24.920 --> 0:14:28.400 it was trained on before. So you're suffering something which 0:14:28.480 --> 0:14:34.080 is called catastrophic forgetting or catastrophic interference. Long story short, no, 0:14:35.120 --> 0:14:38.960 you cannot solve the outstanding problems of hallucinations and lack 0:14:39.000 --> 0:14:41.800 of continual learning. Unfortunately. I wish you could, but you 0:14:41.880 --> 0:14:43.320 need to have different things for it. And it's not 0:14:43.440 --> 0:14:47.400 just me saying it. Look at the landscape of researchers, 0:14:47.480 --> 0:14:49.800 leading researchers in the field, my colleagues, I know all 0:14:49.840 --> 0:14:54.520 of them personally. They have all jumped ship. This is important. 0:14:54.680 --> 0:14:57.880 Look at, for example, Yann LeCun from Meta. He jumped ship. 0:14:58.120 --> 0:15:00.280 He's working on his new startup, AMI Labs, 0:15:00.720 --> 0:15:03.480 not working on LLMs, saying LLMs are a dead 0:15:03.600 --> 0:15:07.680 end to AGI. Look at my colleague Dave Silver from DeepMind. 
0:15:07.760 --> 0:15:09.720 He just left DeepMind, and a couple of weeks 0:15:09.760 --> 0:15:12.480 ago he formed Ineffable Intelligence. I think they're raising a 0:15:12.520 --> 0:15:15.600 billion dollars. The same thing: he is not a believer 0:15:15.760 --> 0:15:19.720 in using LLMs for general artificial intelligence. And I can 0:15:19.960 --> 0:15:24.400 go on and on: Ilya Sutskever, Andrej Karpathy. And so it's 0:15:24.440 --> 0:15:26.560 not just me who is saying that we need to 0:15:26.600 --> 0:15:30.160 go back to research. The leading researchers in 0:15:30.240 --> 0:15:34.200 the field already jumped ship a few months, maybe 0:15:34.200 --> 0:15:37.640 even years ago, and are working on the next-generation things. The 0:15:37.760 --> 0:15:41.840 market still believes we can solve hallucinations, but the leading 0:15:41.920 --> 0:15:45.800 researchers have jumped ship. It's unbelievable to me that we 0:15:46.040 --> 0:15:50.160 keep pouring money into bigger data centers, knowing we've used 0:15:50.200 --> 0:15:53.840 all the data already, and knowing that even if you 0:15:53.880 --> 0:15:57.320 have more data, you will not solve continual learning, and 0:15:57.440 --> 0:16:00.800 you will not solve hallucinations. So why is everyone... 0:16:01.360 --> 0:16:02.200 Why are people doing it? 0:16:02.800 --> 0:16:05.480 Okay. So if we know clearly, and it sounds from 0:16:05.520 --> 0:16:07.440 what you say that we do know very clearly, that 0:16:07.640 --> 0:16:10.680 using more and more computing power and more and more 0:16:10.720 --> 0:16:13.920 already mildly corrupted data isn't going to get us anywhere, 0:16:14.320 --> 0:16:19.120 then the enormous capex spend on vast data centers, the 0:16:19.560 --> 0:16:21.960 hundreds of billions of dollars that have already gone into 0:16:22.040 --> 0:16:24.200 this and are still projected to go into this, is 0:16:24.280 --> 0:16:27.080 a catastrophic misallocation of capital. 
0:16:27.600 --> 0:16:31.080 Well, that's how you look at it. So obviously you 0:16:31.160 --> 0:16:34.040 can say the biggest winner of it is Nvidia because 0:16:34.040 --> 0:16:37.680 it's producing those GPUs. Or, during the gold rush, you 0:16:37.680 --> 0:16:40.880 should invest in companies that produce the shovels. You can 0:16:40.920 --> 0:16:43.920 still make money. For example, I'm a partner at Aaron 0:16:43.960 --> 0:16:47.920 Innovation Capital in the UK and we are investing in 0:16:48.400 --> 0:16:51.560 airs mission two companies, one called high Verge and a second 0:16:51.640 --> 0:16:57.000 called Hydra. These companies improve the cooling of data centers, or 0:16:57.120 --> 0:17:01.040 these companies produce better algorithms to run on data centers. 0:17:01.080 --> 0:17:04.560 So you can still allocate your capital wisely, but you 0:17:04.600 --> 0:17:08.360 shouldn't allocate it to companies which are spending on this compute. 0:17:09.000 --> 0:17:12.880 You should allocate your capital to companies that are allowing 0:17:12.960 --> 0:17:16.520 those data centers to run efficiently, because those data centers, 0:17:16.920 --> 0:17:19.280 who knows, maybe they will be used for a different purpose 0:17:19.320 --> 0:17:22.040 at some point. So again, there are going to be 0:17:22.119 --> 0:17:25.879 winners and losers of the current gold rush in GPUs 0:17:26.000 --> 0:17:31.000 and AI. I would say companies that have not invested 0:17:31.200 --> 0:17:34.880 massively in data centers and in frontier models are 0:17:34.920 --> 0:17:38.399 going to be the winners. There are some companies, without mentioning 0:17:38.440 --> 0:17:42.760 the names, that have been accused of not training their 0:17:42.840 --> 0:17:45.760 own language models. I think these, to me, are going 0:17:45.840 --> 0:17:47.320 to be the winners in today's market. 0:17:47.560 --> 0:17:48.760 Are we talking about Apple? 0:17:49.280 --> 0:17:52.040 I will let you determine that thing. 
But you can 0:17:52.119 --> 0:17:56.040 see some companies, number one, have not invested in the 0:17:56.240 --> 0:17:59.680 LLM frontier models, but their research teams have kept 0:17:59.720 --> 0:18:02.760 publishing papers saying that those LLMs don't reason, 0:18:02.800 --> 0:18:05.359 they make mistakes. So I'll let you find those companies. And 0:18:05.480 --> 0:18:08.440 there are some companies that have borrowed a lot of 0:18:08.560 --> 0:18:10.960 money to expand those data centers. And it's not me. 0:18:11.119 --> 0:18:13.520 Look at the market, look at the CDS on Oracle. 0:18:14.320 --> 0:18:17.400 The market is just flashing red, saying this is foolish. 0:18:17.800 --> 0:18:20.240 So we're seeing those signals already. But I want to 0:18:20.280 --> 0:18:21.680 make sure that, if you want to make money in 0:18:21.720 --> 0:18:25.520 today's market, given that the governments have to pay 0:18:25.600 --> 0:18:29.360 eight, nine percent of the revenue on servicing the debt, 0:18:29.800 --> 0:18:31.280 you don't know if the market is going to go 0:18:31.359 --> 0:18:33.520 up or down, and maybe a capital injection, a liquidity 0:18:33.520 --> 0:18:35.520 injection from central banks. You don't know those things. So 0:18:35.640 --> 0:18:38.960 I would not recommend you short or go long any investment. 0:18:39.000 --> 0:18:42.280 I would recommend maybe doing an arbitrage: buy companies that 0:18:42.400 --> 0:18:46.399 have not wasted money on LLMs and short companies that 0:18:46.560 --> 0:18:49.080 have borrowed a lot of money to expand data centers. 0:18:49.119 --> 0:18:51.680 That will be my suggestion, but I might be wrong. 0:18:51.760 --> 0:19:16.600 Again. Can you tell us about any of the startup 0:19:16.640 --> 0:19:19.159 companies that you're invested in or interested in that are 0:19:19.240 --> 0:19:22.120 taking us to this new frontier in AI, away from 0:19:22.160 --> 0:19:24.080 the LLM model and towards a different model? 
0:19:24.240 --> 0:19:27.680 Absolutely. So in our investment fund, Aaron Innovation Capital, 0:19:27.840 --> 0:19:30.040 we have access to a lot of companies, actually maybe 0:19:30.119 --> 0:19:33.680 twenty or thirty companies that are developing the next frontier 0:19:33.760 --> 0:19:36.560 models, which are not necessarily using artificial neural networks. 0:19:36.560 --> 0:19:39.240 And I can speak about three or four of them 0:19:39.240 --> 0:19:42.000 which are very exciting. One company you can have a look at: 0:19:42.200 --> 0:19:46.640 they are based in Switzerland. They're called Innate AI. Again, 0:19:46.640 --> 0:19:48.800 I'm advertising them and we're not investing in them. We're 0:19:48.800 --> 0:19:53.080 not investing in them yet. They are developing a new version of 0:19:53.960 --> 0:19:57.280 neural networks which are inspired by the brain. This is 0:19:57.359 --> 0:19:59.320 an effort that was going on in Europe for more 0:19:59.359 --> 0:20:02.680 than a decade, the Blue Brain Project. They are developing 0:20:02.760 --> 0:20:05.600 something new which is not an artificial neural network. So 0:20:05.680 --> 0:20:10.159 that's one kind. Look at another company, for example, Pathway AI 0:20:10.480 --> 0:20:14.960 in the Bay Area. Again, another example. As I mentioned up front, 0:20:15.240 --> 0:20:18.800 you need to solve continual learning. You need to solve that. 0:20:18.880 --> 0:20:21.280 If you don't have it, forget about the solution to AGI. 0:20:21.800 --> 0:20:27.000 And so they have been developing systems that can learn 0:20:27.160 --> 0:20:29.960 using something called Hebbian learning, which is a local 0:20:30.080 --> 0:20:32.840 learning technique that happens in the brain, not using 0:20:32.880 --> 0:20:36.520 backpropagation and gradient descent. So this is another example. Another 0:20:36.640 --> 0:20:38.879 company that I'm actually a co-founder of and the 0:20:39.000 --> 0:20:42.040 CEO, called Fractal Brain AI. Have a look at the thing. 
0:20:42.600 --> 0:20:45.440 It's also based on the prefrontal cortex, and it's this idea 0:20:45.600 --> 0:20:50.359 that those networks are continually growing and rewiring themselves. So 0:20:50.680 --> 0:20:53.119 you no longer have a fixed network with a fixed 0:20:53.240 --> 0:20:58.080 number of parameters. No, the networks are growing, expanding themselves. 0:20:58.200 --> 0:21:00.840 Like today, you're probably going to form new connections after 0:21:00.920 --> 0:21:03.800 this podcast. Those networks do the same. They create new 0:21:03.840 --> 0:21:06.719 connections all the time, and on top of that, they 0:21:06.760 --> 0:21:10.080 are continually learning and thousands of times more power efficient, 0:21:10.600 --> 0:21:13.000 in addition to being data efficient. So these are only 0:21:13.080 --> 0:21:15.720 some of the examples of companies that I'm personally very 0:21:15.760 --> 0:21:20.159 excited about. But as Ilya Sutskever said the other day 0:21:20.200 --> 0:21:24.880 on one of those podcasts, we have gone back 0:21:25.240 --> 0:21:29.520 from the age of scaling to the age of research. 0:21:30.200 --> 0:21:33.320 So researchers have gone back to developing the new things. 0:21:33.840 --> 0:21:36.879 It's just that, on our end here, me and my teams 0:21:37.000 --> 0:21:41.160 started developing, for example, Fractal Brain twelve years ago. 0:21:41.320 --> 0:21:44.960 We knew about those outstanding limitations of artificial neural networks 0:21:45.000 --> 0:21:47.199 more than a decade ago, so we wouldn't invest our 0:21:47.280 --> 0:21:47.720 time in it. 0:21:48.000 --> 0:21:50.480 Nevertheless, somehow we've got caught up in this sort 0:21:50.520 --> 0:21:54.120 of super-bubble hype, despite the fact that good scientists 0:21:54.160 --> 0:21:54.399 aren't. 0:21:54.880 --> 0:21:56.800 But this is good. 
I like the hype because, you see, 0:21:57.240 --> 0:22:02.000 to some extent that hype and the LLMs, they allowed 0:22:02.160 --> 0:22:06.600 us to understand that it's possible to approximate human language. 0:22:07.080 --> 0:22:09.400 So now that you know that you can approximate human 0:22:09.520 --> 0:22:13.240 language with LLMs, you can try to find ways 0:22:13.280 --> 0:22:15.960 to actually solve the idea of human language. If you 0:22:16.040 --> 0:22:18.480 can approximate something, you can see the size of it. 0:22:19.119 --> 0:22:22.000 So it's almost like if someone showed you, hey, there 0:22:22.080 --> 0:22:24.240 is a rocket there, it flies. You already know the 0:22:24.359 --> 0:22:26.280 size of the rocket. You know it can fly, 0:22:26.720 --> 0:22:29.159 you can start to crack the details of the engine 0:22:29.200 --> 0:22:31.240 of the rocket. So to some extent, I like the 0:22:31.320 --> 0:22:34.560 current hype. I like the current-generation systems because they 0:22:34.600 --> 0:22:38.000 allowed us to understand the size of the problem and 0:22:38.119 --> 0:22:40.320