WEBVTT - Why You Should Wait Out AI’s Super-Spending False Start

0:00:02.720 --> 0:00:19.880
<v Speaker 1>Bloomberg Audio Studios, Podcasts, Radio News. Welcome to Merryn Talks Money,

0:00:19.880 --> 0:00:22.560
<v Speaker 1>the podcast in which people who know the markets explain

0:00:22.640 --> 0:00:25.479
<v Speaker 1>the markets. I am Merryn Somerset Webb, and this week

0:00:25.520 --> 0:00:28.600
<v Speaker 1>I am speaking with Dr. Janusz Marecki, who is an AI

0:00:28.680 --> 0:00:32.080
<v Speaker 1>partner at Aaron Innovation Capital. Now, as you know, on

0:00:32.159 --> 0:00:34.680
<v Speaker 1>this podcast, we like to talk about the big forces

0:00:34.720 --> 0:00:37.520
<v Speaker 1>affecting our economy and markets in general, so it's really

0:00:37.720 --> 0:00:40.879
<v Speaker 1>no surprise that we keep coming back to the impact

0:00:40.880 --> 0:00:44.000
<v Speaker 1>of AI. We've talked at different times about the consequences

0:00:44.000 --> 0:00:46.760
<v Speaker 1>for jobs, for inflation, for interest rates, for tech companies,

0:00:46.840 --> 0:00:50.800
<v Speaker 1>and whether politicians and indeed policymakers, let alone ordinary workers

0:00:50.840 --> 0:00:53.519
<v Speaker 1>and investors, are ready for any of this. But what

0:00:53.560 --> 0:00:59.640
<v Speaker 1>we've never asked is: is it actually working? Janusz, welcome

0:00:59.680 --> 0:01:00.680
<v Speaker 1>to Merryn Talks Money.

0:01:01.360 --> 0:01:02.640
<v Speaker 2>Thank you so much for having me.

0:01:03.000 --> 0:01:06.840
<v Speaker 1>And can we start with a brief explanation of

0:01:06.920 --> 0:01:10.199
<v Speaker 1>exactly what it is that we mean when we say AI.

0:01:10.480 --> 0:01:11.559
<v Speaker 2>Yes, everyone talks.

0:01:11.400 --> 0:01:13.840
<v Speaker 1>About AI all the time. They say that it is going

0:01:13.880 --> 0:01:15.840
<v Speaker 1>to change the world, it's going to solve all our problems,

0:01:15.920 --> 0:01:18.319
<v Speaker 1>it's going to destroy our jobs, etc. But what do

0:01:18.360 --> 0:01:20.520
<v Speaker 1>we actually mean when we say AI?

0:01:21.319 --> 0:01:24.720
<v Speaker 2>So these days, what we mean by saying AI is

0:01:24.760 --> 0:01:30.080
<v Speaker 2>a system which is approximating a certain process. It might be

0:01:30.160 --> 0:01:33.600
<v Speaker 2>a system which is approximating language. It might be a

0:01:33.640 --> 0:01:37.200
<v Speaker 2>system which is approximating images. It might be a system

0:01:37.520 --> 0:01:40.560
<v Speaker 2>which is approximating how a robot moves. By the end

0:01:40.600 --> 0:01:44.280
<v Speaker 2>of the day, the current generation of AI techniques, those

0:01:44.319 --> 0:01:49.480
<v Speaker 2>neural networks, are function approximators. They approximate things.

0:01:49.720 --> 0:01:53.320
<v Speaker 2>They don't solve intelligence, they approximate. So that's why you

0:01:53.360 --> 0:01:56.160
<v Speaker 2>may have an illusion that those systems are intelligent. At

0:01:56.160 --> 0:01:58.960
<v Speaker 2>the end of the day, they are approximating intelligence.

0:01:59.080 --> 0:02:00.520
<v Speaker 1>And I suppose there's two ways to look at this.

0:02:00.600 --> 0:02:04.120
<v Speaker 1>What AI, as you've just described it, needs from us and

0:02:04.160 --> 0:02:06.480
<v Speaker 1>what we really need from it to make it work

0:02:06.520 --> 0:02:09.280
<v Speaker 1>for us. So if you look at the way that

0:02:09.280 --> 0:02:11.600
<v Speaker 1>the big hyperscalers are approaching things at the moment, they're

0:02:11.600 --> 0:02:15.080
<v Speaker 1>building massive data centers to build out their capacity, and

0:02:15.120 --> 0:02:18.960
<v Speaker 1>that requires vast amounts of energy, it requires lots of coolant,

0:02:19.120 --> 0:02:21.640
<v Speaker 1>it requires a very large volume of very different types

0:02:21.680 --> 0:02:25.079
<v Speaker 1>of chips, right? Yes. And with all those things, obviously there

0:02:25.160 --> 0:02:27.680
<v Speaker 1>are troubles at the moment with getting all these things,

0:02:27.760 --> 0:02:29.680
<v Speaker 1>with the war in the Middle East, etc. So there

0:02:29.680 --> 0:02:32.560
<v Speaker 1>are supply restrictions, but nonetheless none of these things are

0:02:32.560 --> 0:02:35.440
<v Speaker 1>really a long-term problem. Given the correct policy choices,

0:02:35.639 --> 0:02:40.480
<v Speaker 1>all those material elements can easily be, not easily, but

0:02:40.560 --> 0:02:43.400
<v Speaker 1>can be found and built in. Then there's the second

0:02:43.400 --> 0:02:47.400
<v Speaker 1>bit that we talked about when we last met,

0:02:47.440 --> 0:02:51.200
<v Speaker 1>which is the data that you require to train a model,

0:02:51.720 --> 0:02:56.520
<v Speaker 1>and that can't be created in the volumes that are required.

0:02:56.560 --> 0:02:59.880
<v Speaker 1>And we've hit it. We've hit a supply problem with data.

0:03:00.160 --> 0:03:04.240
<v Speaker 2>We have hit a supply problem with diverse data. To

0:03:04.240 --> 0:03:07.520
<v Speaker 2>clarify the thing, because you can just create an infinite

0:03:07.520 --> 0:03:10.800
<v Speaker 2>amount of data by randomly generating new words, so you

0:03:10.800 --> 0:03:13.679
<v Speaker 2>can create it. But we're talking about creating high quality,

0:03:13.800 --> 0:03:17.120
<v Speaker 2>diverse data. And you see we've run out of data,

0:03:17.200 --> 0:03:21.040
<v Speaker 2>diverse data, not yesterday, not a month ago. We have

0:03:21.120 --> 0:03:23.840
<v Speaker 2>run out of data three and a half years ago.

0:03:23.880 --> 0:03:28.160
<v Speaker 2>You have to understand that the last frontier model, GPT-4,

0:03:28.400 --> 0:03:34.320
<v Speaker 2>which was not a combination of agents, etc., just a basic LLM.

0:03:34.920 --> 0:03:38.560
<v Speaker 2>The last LLM that was the most potent was GPT-4.

0:03:39.240 --> 0:03:42.080
<v Speaker 2>It was released in twenty twenty

0:03:42.160 --> 0:03:45.960
<v Speaker 2>three, in January. However, the training of that model finished

0:03:46.000 --> 0:03:48.920
<v Speaker 2>at the end of twenty twenty two, so it's three

0:03:48.960 --> 0:03:52.080
<v Speaker 2>and a half years ago. We have trained the model

0:03:52.440 --> 0:03:56.000
<v Speaker 2>that used all the publicly available data on the Internet.

0:03:56.320 --> 0:03:58.600
<v Speaker 2>There is nothing more out there to use to train

0:03:58.680 --> 0:04:02.400
<v Speaker 2>the model. We hit the data ceiling not a month ago,

0:04:02.640 --> 0:04:05.640
<v Speaker 2>but three and a half years ago. It's extremely important.

0:04:06.280 --> 0:04:08.920
<v Speaker 2>And yes, what we're doing right now is we're trying

0:04:09.000 --> 0:04:13.160
<v Speaker 2>to put together multiple LLMs. We're trying to have synthetic data,

0:04:13.200 --> 0:04:16.919
<v Speaker 2>but the performance isn't really there. We've hit diminishing returns

0:04:17.000 --> 0:04:19.440
<v Speaker 2>not a month ago, but three and a half years ago.

0:04:19.880 --> 0:04:22.640
<v Speaker 1>Okay. But surely new data is created all the

0:04:22.680 --> 0:04:25.559
<v Speaker 1>time on the Internet. You know, we talk about diverse data,

0:04:25.600 --> 0:04:27.480
<v Speaker 1>and there was an awful lot more created in the past

0:04:27.520 --> 0:04:30.000
<v Speaker 1>than there is now. But everyone's use of the Internet

0:04:30.080 --> 0:04:33.200
<v Speaker 1>surely creates huge volumes of new data all the time.

0:04:33.560 --> 0:04:37.200
<v Speaker 2>That's even worse. Humans are creating new data on the Internet,

0:04:37.240 --> 0:04:40.560
<v Speaker 2>but that data falls into certain patterns. How many conversations

0:04:40.600 --> 0:04:42.680
<v Speaker 2>on the weather can you have every day?

0:04:42.960 --> 0:04:46.000
<v Speaker 1>A lot, actually a lot. I think I've already had

0:04:46.040 --> 0:04:47.880
<v Speaker 1>three today, to be honest. You know, it's very cold

0:04:47.920 --> 0:04:48.479
<v Speaker 1>in Edinburgh.

0:04:48.640 --> 0:04:50.680
<v Speaker 2>You can have a lot, but there's a bigger, profound

0:04:50.680 --> 0:04:52.960
<v Speaker 2>problem there I want to mention, which is that if you

0:04:53.000 --> 0:04:55.840
<v Speaker 2>look at the data that is currently being created on

0:04:55.880 --> 0:05:01.560
<v Speaker 2>the open Internet, it is data created by those LLMs,

0:05:01.800 --> 0:05:06.599
<v Speaker 2>which, number one, is inaccurate because it suffers from hallucinations,

0:05:06.640 --> 0:05:10.719
<v Speaker 2>and number two, it feeds into itself, so that the

0:05:10.800 --> 0:05:14.600
<v Speaker 2>new models, now trained on the entire Internet, are training

0:05:14.640 --> 0:05:17.880
<v Speaker 2>on the output from other models, which, as I said,

0:05:17.960 --> 0:05:22.080
<v Speaker 2>are making mistakes and hallucinations, and that, using a technical term,

0:05:22.600 --> 0:05:26.520
<v Speaker 2>slowly ends up leading to something called model collapse, where

0:05:26.520 --> 0:05:28.520
<v Speaker 2>the models themselves are actually getting dumber.

0:05:29.440 --> 0:05:32.640
<v Speaker 1>Okay, so if you train a new model on newly

0:05:32.680 --> 0:05:36.719
<v Speaker 1>created data, you're effectively training it on its own nonsense or

0:05:36.760 --> 0:05:38.760
<v Speaker 1>nonsense created by other similar models.

0:05:39.040 --> 0:05:41.880
<v Speaker 2>Yeah, well, not only nonsense. We have to understand that

0:05:41.920 --> 0:05:46.200
<v Speaker 2>those LLMs using artificial neural networks are correct, let's

0:05:46.240 --> 0:05:49.240
<v Speaker 2>say, ninety five to ninety nine percent of the time. So

0:05:49.320 --> 0:05:52.360
<v Speaker 2>most of the content is correct, but you no longer

0:05:52.520 --> 0:05:56.680
<v Speaker 2>know what content is incorrect, which is a significant degradation

0:05:56.760 --> 0:05:59.080
<v Speaker 2>of the quality of data. It's almost like having access

0:05:59.120 --> 0:06:02.080
<v Speaker 2>to a calculator which claims to be correct one hundred

0:06:02.080 --> 0:06:04.200
<v Speaker 2>percent of the time, but in reality it is correct

0:06:04.279 --> 0:06:07.000
<v Speaker 2>ninety five to ninety nine percent of the time. How much

0:06:07.000 --> 0:06:09.520
<v Speaker 2>would you pay for that calculator? Would you use the

0:06:09.560 --> 0:06:14.479
<v Speaker 2>output from that calculator to produce a novel calculator? No, you

0:06:14.480 --> 0:06:16.680
<v Speaker 2>wouldn't do that. But we are doing it right now,

0:06:16.839 --> 0:06:19.200
<v Speaker 2>and we are seeing right now that the performance of

0:06:19.200 --> 0:06:22.560
<v Speaker 2>those models on benchmarks is not increasing anymore. It plateaued

0:06:22.600 --> 0:06:25.440
<v Speaker 2>three years ago. And there's an even bigger, profound question.

0:06:25.600 --> 0:06:29.960
<v Speaker 2>We are training those models now using higher quality data,

0:06:30.040 --> 0:06:33.120
<v Speaker 2>so we're no longer using the entire Internet. Now we

0:06:33.200 --> 0:06:37.359
<v Speaker 2>are filtering from the Internet a lot of garbage, so

0:06:37.400 --> 0:06:41.880
<v Speaker 2>we're using a smaller training data set, thus producing smaller

0:06:42.000 --> 0:06:45.200
<v Speaker 2>LLMs that have the same quality. But as I mentioned,

0:06:45.279 --> 0:06:48.000
<v Speaker 2>they are smaller. So what is the consequence of it? Well,

0:06:48.000 --> 0:06:50.400
<v Speaker 2>as we speak, I'm running those models on my laptop.

0:06:50.600 --> 0:06:53.159
<v Speaker 2>I don't use any data center for it. I never will.

0:06:53.440 --> 0:06:55.760
<v Speaker 2>I'm running them on my laptop. So the models have

0:06:55.800 --> 0:06:58.880
<v Speaker 2>gotten better. But by saying better, I mean they are

0:06:58.960 --> 0:07:00.000
<v Speaker 2>more power efficient.

0:07:00.520 --> 0:07:03.039
<v Speaker 1>They're not... All right, so you mean that they're more specific,

0:07:03.200 --> 0:07:06.520
<v Speaker 1>so they're geared toward a more specific task because

0:07:06.520 --> 0:07:09.239
<v Speaker 1>they're using a narrower range of data, so they can't

0:07:09.279 --> 0:07:12.080
<v Speaker 1>be a type of general intelligence. They're specific.

0:07:12.440 --> 0:07:16.080
<v Speaker 2>Remarkably, no, they are general purpose. I mean you can

0:07:16.160 --> 0:07:18.400
<v Speaker 2>just go online and download Qwen three point five

0:07:18.480 --> 0:07:20.920
<v Speaker 2>and three point six. This is a general purpose model,

0:07:21.080 --> 0:07:24.239
<v Speaker 2>which is even accessing the Internet to give you summaries,

0:07:24.280 --> 0:07:27.480
<v Speaker 2>reasoning. It's doing this thing on your laptop, today.

0:07:27.760 --> 0:07:29.600
<v Speaker 2>Of course. On top of that, if you want to,

0:07:30.440 --> 0:07:34.600
<v Speaker 2>you can fine-tune your model on your proprietary data.

0:07:34.880 --> 0:07:37.960
<v Speaker 2>But you see, even the fine-tuning process these days

0:07:38.080 --> 0:07:41.560
<v Speaker 2>can be done on your laptop, which again raises the question,

0:07:41.880 --> 0:07:44.560
<v Speaker 2>why do you need to pay for compute? Why do

0:07:44.600 --> 0:07:46.880
<v Speaker 2>you need all this expansion of the data centers. You

0:07:46.960 --> 0:07:50.200
<v Speaker 2>really don't. And what's going to happen in a year, two,

0:07:50.280 --> 0:07:52.960
<v Speaker 2>three years from now when the majority of laptops out

0:07:53.000 --> 0:07:56.040
<v Speaker 2>there will be able to run general purpose the most

0:07:56.160 --> 0:07:59.720
<v Speaker 2>potent language models with access to the Internet. I just

0:08:00.040 --> 0:08:02.040
<v Speaker 2>I'm just not seeing it. And that's on top of

0:08:02.120 --> 0:08:05.200
<v Speaker 2>one fundamental thing that I want to mention upfront, because

0:08:05.240 --> 0:08:08.960
<v Speaker 2>you mentioned what does the AI need from us? And

0:08:09.080 --> 0:08:10.920
<v Speaker 2>what do we need from the AI? So what the AI

0:08:11.040 --> 0:08:13.400
<v Speaker 2>needs from us are two things, as you mentioned. Number one,

0:08:13.480 --> 0:08:17.080
<v Speaker 2>compute and yes, we can provide more and more compute

0:08:17.200 --> 0:08:21.400
<v Speaker 2>as long as Oracle CDS is not going down, which

0:08:21.440 --> 0:08:23.920
<v Speaker 2>it is right now. And number two, we need to

0:08:24.000 --> 0:08:26.360
<v Speaker 2>provide more data. We've used all of it, so to

0:08:26.480 --> 0:08:29.480
<v Speaker 2>some extent we cannot produce better AI. Now what do

0:08:29.600 --> 0:08:32.719
<v Speaker 2>we want to have from the AI. We want to

0:08:32.840 --> 0:08:35.679
<v Speaker 2>have at least two things, and just bear with me here.

0:08:36.480 --> 0:08:41.600
<v Speaker 2>Number one, we want to have systems which are continually learning.

0:08:42.360 --> 0:08:45.800
<v Speaker 2>So just like during this podcast today, I hope that

0:08:46.040 --> 0:08:48.240
<v Speaker 2>I will be able to learn something from you. You

0:08:48.320 --> 0:08:51.000
<v Speaker 2>will be able to learn something from me and remember

0:08:51.160 --> 0:08:56.640
<v Speaker 2>tomorrow, or maybe remember it after Easter. Current generation systems, in general

0:08:56.800 --> 0:09:00.719
<v Speaker 2>neural networks, are not learning anything new when you

0:09:00.840 --> 0:09:03.679
<v Speaker 2>interact with them, which is a significant limitation. So we're not

0:09:03.760 --> 0:09:06.240
<v Speaker 2>getting from the AIs what we want. They're not

0:09:06.400 --> 0:09:10.040
<v Speaker 2>learning from us. It's a fundamental limitation that's not solved.

0:09:10.080 --> 0:09:12.839
<v Speaker 2>And number two, I want to mention this thing up front.

0:09:13.600 --> 0:09:19.640
<v Speaker 2>Those systems are stochastic. They are probabilistic. You cannot trust them.

0:09:20.120 --> 0:09:23.240
<v Speaker 2>They roll the dice whenever they produce output, so to

0:09:23.400 --> 0:09:26.640
<v Speaker 2>some extent you cannot trust their output. Can you make

0:09:26.679 --> 0:09:30.400
<v Speaker 2>them deterministic? Yes, of course you can by making sure

0:09:30.440 --> 0:09:34.360
<v Speaker 2>that they always produce the most likely token. Now, I used

0:09:34.400 --> 0:09:36.280
<v Speaker 2>the word token. But the problem with this thing is

0:09:36.320 --> 0:09:38.559
<v Speaker 2>that then they will be just copying the data from

0:09:38.600 --> 0:09:41.839
<v Speaker 2>the training set. So just imagine those lawsuits when you

0:09:41.920 --> 0:09:46.240
<v Speaker 2>see verbatim copies of all the podcasts and books produced as

0:09:46.240 --> 0:09:48.880
<v Speaker 2>an output of a Gemini or OpenAI system. So those

0:09:48.920 --> 0:09:52.880
<v Speaker 2>systems, number one, are not continually learning, which basically for

0:09:53.000 --> 0:09:55.120
<v Speaker 2>me is just a no go. And number two, those

0:09:55.160 --> 0:09:58.840
<v Speaker 2>systems are not to be trusted because whenever they produce output,

0:09:59.000 --> 0:10:02.920
<v Speaker 2>that output is stochastic. It's not deterministic. It's

0:10:03.040 --> 0:10:05.280
<v Speaker 2>basically rolling the dice to get you the output.
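
NOTE
A minimal Python sketch of the two decoding modes described in this turn, offered as an illustration rather than as how any particular product works.
The four-word vocabulary, the scores, and the helper names (softmax, greedy_pick, sampled_pick) are assumptions made up for the example.
It shows only that always taking the most likely token is repeatable while sampling is not; it does not test the claim about copying training data.
```python
import math
import random
def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
def greedy_pick(probs):
    """Deterministic: always return the index of the most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])
def sampled_pick(probs, rng):
    """Stochastic: draw an index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
# Hypothetical four-token vocabulary and made-up scores, for illustration only.
vocab = ["sunny", "cold", "rainy", "windy"]
probs = softmax([2.0, 1.5, 0.3, -1.0])
rng = random.Random(0)  # seeded so the script itself is repeatable
greedy = [vocab[greedy_pick(probs)] for _ in range(5)]
sampled = [vocab[sampled_pick(probs, rng)] for _ in range(5)]
print("greedy :", greedy)   # the same token every time
print("sampled:", sampled)  # can differ from draw to draw
```
The greedy list repeats the single most likely word, while the sampled list can mix words in proportion to their probabilities.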

0:10:05.840 --> 0:10:07.679
<v Speaker 1>Yeah. Can we talk a bit more about that, about

0:10:07.720 --> 0:10:10.280
<v Speaker 1>how the hallucinations or errors build up?

0:10:10.760 --> 0:10:14.600
<v Speaker 2>Absolutely. Let me try to explain using simple terms. When

0:10:14.679 --> 0:10:18.959
<v Speaker 2>you use ChatGPT or Gemini or Copilot, and

0:10:19.080 --> 0:10:23.640
<v Speaker 2>Copilot is actually an OpenAI system, it just produces text,

0:10:23.800 --> 0:10:27.840
<v Speaker 2>so you have an impression that it generates one word at

0:10:27.880 --> 0:10:31.439
<v Speaker 2>a time deterministically. That's what's on the screen, just one

0:10:31.559 --> 0:10:34.800
<v Speaker 2>word after another. But in reality that's not what those

0:10:34.840 --> 0:10:39.280
<v Speaker 2>systems produce. If you are a developer like me and

0:10:39.360 --> 0:10:42.480
<v Speaker 2>my colleagues are developers as well, you can look at

0:10:42.520 --> 0:10:45.480
<v Speaker 2>the developer output of a large language model. Do you

0:10:45.520 --> 0:10:48.559
<v Speaker 2>know what it gives you? It gives you a vector

0:10:48.920 --> 0:10:52.640
<v Speaker 2>of fifty thousand elements, actually fifty two thousand elements, so

0:10:52.800 --> 0:10:56.840
<v Speaker 2>fifty thousand elements, and each element has a certain probability

0:10:56.960 --> 0:11:00.360
<v Speaker 2>of being correct zero point ninety five zero one zero

0:11:00.480 --> 0:11:03.360
<v Speaker 2>zero one zero, zero point three. It's an entire vector.

0:11:03.880 --> 0:11:07.679
<v Speaker 2>It's not that one word has probability one and everything else is zero. No,

0:11:07.840 --> 0:11:09.640
<v Speaker 2>there is a little bit of error there. It has

0:11:09.720 --> 0:11:12.240
<v Speaker 2>to be. And so now, this is what happens when you're

0:11:12.280 --> 0:11:15.400
<v Speaker 2>producing one word at a time, or one token at

0:11:15.440 --> 0:11:18.439
<v Speaker 2>a time. You can think of a token as a

0:11:18.559 --> 0:11:21.400
<v Speaker 2>word split in two or three. So when you produce

0:11:21.480 --> 0:11:24.920
<v Speaker 2>output one token at a time, this system is rolling

0:11:24.960 --> 0:11:27.400
<v Speaker 2>the dice all the time. It's making a small, tiny

0:11:27.600 --> 0:11:31.319
<v Speaker 2>error every time it produces a word. At the beginning,

0:11:32.040 --> 0:11:34.520
<v Speaker 2>you may not perceive the error, but over time, after

0:11:34.679 --> 0:11:38.079
<v Speaker 2>three hundred, five hundred, a thousand words, the error is

0:11:38.120 --> 0:11:41.160
<v Speaker 2>going to be so big that it's going to result

0:11:41.200 --> 0:11:45.119
<v Speaker 2>in a critical failure of the system. You cannot circumvent

0:11:45.200 --> 0:11:48.599
<v Speaker 2>it because those systems are probabilistic. It's not like in

0:11:48.640 --> 0:11:51.280
<v Speaker 2>an Excel spreadsheet, where you can have a chain of

0:11:51.400 --> 0:11:54.719
<v Speaker 2>ten, twenty, one hundred, one thousand formulas and you

0:11:54.840 --> 0:11:58.120
<v Speaker 2>know that the formulas are going to produce the correct result. Here,

0:11:58.679 --> 0:12:02.079
<v Speaker 2>if you have a chain of words, every time you

0:12:02.200 --> 0:12:04.640
<v Speaker 2>produce a word, you accrue a little bit of error

0:12:05.080 --> 0:12:07.400
<v Speaker 2>and you see this error manifest itself later on.
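
NOTE
The compounding argument in this turn can be put as rough arithmetic, sketched here in Python.
The sketch assumes, purely for illustration, that each token is acceptable with a fixed probability p and that tokens are independent, so the chance that n tokens in a row are all acceptable is p to the power n.
The per-token rates are hypothetical; real models do not have a single fixed error rate, and one imperfect token does not necessarily ruin an answer.
```python
def chance_all_acceptable(p, n):
    """Probability that n independent tokens are all acceptable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n
# Sequence lengths mentioned in the conversation; per-token rates are made up.
lengths = [300, 500, 1000]
rates = [0.99, 0.999]
for p in rates:
    for n in lengths:
        print(f"p={p}, n={n}: {chance_all_acceptable(p, n):.4f}")
```
Under these assumptions the chance falls geometrically with length, which is the shape of the argument even if the real numbers differ.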

0:12:08.400 --> 0:12:12.679
<v Speaker 1>Okay, so is this solvable? I guess that's the key question.

0:12:13.000 --> 0:12:15.679
<v Speaker 1>Is it possible, in the type of models that we're

0:12:15.760 --> 0:12:18.800
<v Speaker 1>using at the moment, which are super hyped, is it

0:12:18.880 --> 0:12:22.839
<v Speaker 1>possible for that hallucination problem or compounding error problem to

0:12:22.920 --> 0:12:26.400
<v Speaker 1>be solved, or is it simply a systemic shortcoming that

0:12:26.559 --> 0:12:27.400
<v Speaker 1>is unresolvable.

0:12:28.280 --> 0:12:31.679
<v Speaker 2>So here I'm again speaking from my experience having two

0:12:31.679 --> 0:12:34.719
<v Speaker 2>PhDs in mathematics and computer science and twenty years of

0:12:34.800 --> 0:12:39.640
<v Speaker 2>experience at DeepMind and IBM Watson Research. No, it's impossible.

0:12:39.880 --> 0:12:43.120
<v Speaker 2>You have to use a different technique for it. Yes,

0:12:43.320 --> 0:12:47.040
<v Speaker 2>there are different techniques that are emerging right now. In

0:12:47.200 --> 0:12:49.520
<v Speaker 2>full disclaimer, I'm also a co-founder of a startup

0:12:49.679 --> 0:12:53.520
<v Speaker 2>working on one of those techniques. We call it Fractal Brain. Yes,

0:12:53.600 --> 0:12:55.920
<v Speaker 2>there are new techniques on the horizon. However, the existing

0:12:56.040 --> 0:13:00.600
<v Speaker 2>techniques have a built-in mechanism so that every time you produce

0:13:00.640 --> 0:13:04.040
<v Speaker 2>an output, another word, you have a little bit of error.

0:13:04.400 --> 0:13:07.079
<v Speaker 2>You cannot eliminate this thing. And there's also a fundamental

0:13:07.120 --> 0:13:10.400
<v Speaker 2>other thing. You mentioned hallucinations. What do we mean by hallucinations?

0:13:10.720 --> 0:13:14.240
<v Speaker 2>Number one, it's producing those small errors one word

0:13:14.240 --> 0:13:18.280
<v Speaker 2>at a time. But there's also another reason for hallucinations

0:13:18.320 --> 0:13:22.200
<v Speaker 2>when this system just doesn't remember what was mentioned yesterday,

0:13:22.320 --> 0:13:24.600
<v Speaker 2>a week ago, or a month ago, and goes back

0:13:24.640 --> 0:13:26.800
<v Speaker 2>to the initial question. You need both. You need to

0:13:26.880 --> 0:13:30.439
<v Speaker 2>have a system which, like humans, is continually learning. And

0:13:30.600 --> 0:13:33.720
<v Speaker 2>number two, you need to have a system which does

0:13:33.840 --> 0:13:37.040
<v Speaker 2>not make errors when it produces the next word. It

0:13:37.080 --> 0:13:39.760
<v Speaker 2>shouldn't roll a dice. You need those two. And so,

0:13:40.480 --> 0:13:43.240
<v Speaker 2>now to answer your question, can we solve this problem

0:13:43.400 --> 0:13:49.200
<v Speaker 2>using artificial neural networks? No. There are attempts to... there

0:13:49.280 --> 0:13:52.000
<v Speaker 2>have been attempts to circumvent it. If you want to

0:13:52.080 --> 0:13:55.320
<v Speaker 2>have continual learning, you can maybe try to use something

0:13:55.440 --> 0:13:58.800
<v Speaker 2>like continual backpropagation from Rich Sutton, who got the Turing

0:13:58.840 --> 0:14:00.559
<v Speaker 2>Award in twenty twenty four. You can,

0:14:00.679 --> 0:14:03.559
<v Speaker 2>but it's not solving the problem. At least he's attempting

0:14:03.640 --> 0:14:07.200
<v Speaker 2>to find solutions to it. You can retrain the system,

0:14:07.280 --> 0:14:09.599
<v Speaker 2>of course, right? I mean, why not take GPT-4

0:14:09.960 --> 0:14:12.520
<v Speaker 2>after the end of the podcast. Retrain the entire system.

0:14:12.559 --> 0:14:14.839
<v Speaker 2>It's going to cost you five million dollars. But you

0:14:14.920 --> 0:14:18.559
<v Speaker 2>could do that. You could retrain the entire system, or

0:14:18.640 --> 0:14:20.320
<v Speaker 2>you can fine-tune your system. But when you

0:14:20.440 --> 0:14:24.600
<v Speaker 2>fine-tune your system, the system is forgetting what

0:14:24.920 --> 0:14:28.400
<v Speaker 2>it was trained on before. So you're suffering from something which

0:14:28.480 --> 0:14:34.080
<v Speaker 2>is called catastrophic forgetting or catastrophic interference. Long story short, No,

0:14:35.120 --> 0:14:38.960
<v Speaker 2>you cannot solve the outstanding problems of hallucinations and lack

0:14:39.000 --> 0:14:41.800
<v Speaker 2>of continual learning. Unfortunately. I wish you could, but you

0:14:41.880 --> 0:14:43.320
<v Speaker 2>need to have different things for it. And it's not

0:14:43.440 --> 0:14:47.400
<v Speaker 2>just me saying it. Look at the landscape of researchers,

0:14:47.480 --> 0:14:49.800
<v Speaker 2>leading researchers in the field, my colleagues, I know all

0:14:49.840 --> 0:14:54.520
<v Speaker 2>of them personally. They have all jumped ship. This is important.

0:14:54.680 --> 0:14:57.880
<v Speaker 2>Look at, for example, Yann LeCun from Meta. He jumped ship.

0:14:58.120 --> 0:15:00.280
<v Speaker 2>He's working on his new startup, AMI Labs,

0:15:00.720 --> 0:15:03.480
<v Speaker 2>not working on LLMs, saying LLMs are a dead

0:15:03.600 --> 0:15:07.680
<v Speaker 2>end to AGI. Look at my colleague Dave Silver from DeepMind.

0:15:07.760 --> 0:15:09.720
<v Speaker 2>He just left DeepMind, and a couple of weeks

0:15:09.760 --> 0:15:12.480
<v Speaker 2>ago he formed Ineffable Intelligence. I think they're raising a

0:15:12.520 --> 0:15:15.600
<v Speaker 2>billion dollars. The same thing: he is not a believer

0:15:15.760 --> 0:15:19.720
<v Speaker 2>of using LLMs for general artificial intelligence. And I can

0:15:19.960 --> 0:15:24.400
<v Speaker 2>go on and on: Ilya Sutskever, Andrej Karpathy. And so it's

0:15:24.440 --> 0:15:26.560
<v Speaker 2>not just me who is saying that we need to

0:15:26.600 --> 0:15:30.160
<v Speaker 2>go back to research. The leading research and researchers in

0:15:30.240 --> 0:15:34.200
<v Speaker 2>the field have already jumped ship a few months, maybe

0:15:34.200 --> 0:15:37.640
<v Speaker 2>even years ago, working on the next generation things. The

0:15:37.760 --> 0:15:41.840
<v Speaker 2>market still believes we can solve hallucinations, but the leading

0:15:41.920 --> 0:15:45.800
<v Speaker 2>researchers have jumped ship. That's unbelievable to me that we

0:15:46.040 --> 0:15:50.160
<v Speaker 2>keep pouring money into bigger data centers, knowing we've used

0:15:50.200 --> 0:15:53.840
<v Speaker 2>all the data already, and knowing that even if you

0:15:53.880 --> 0:15:57.320
<v Speaker 2>have more data, you will not solve continual learning, and

0:15:57.440 --> 0:16:00.800
<v Speaker 2>you will not solve hallucinations. So why is not everyone?

0:16:01.360 --> 0:16:02.200
<v Speaker 2>Why are people doing it?

0:16:02.800 --> 0:16:05.480
<v Speaker 1>Okay. So if we know clearly, and it sounds from

0:16:05.520 --> 0:16:07.440
<v Speaker 1>what you say that we do know very clearly that

0:16:07.640 --> 0:16:10.680
<v Speaker 1>using more and more computing power and more and more

0:16:10.720 --> 0:16:13.920
<v Speaker 1>already mildly corrupted data isn't going to get us anywhere,

0:16:14.320 --> 0:16:19.120
<v Speaker 1>then the enormous capex spend on vast data centers, the

0:16:19.560 --> 0:16:21.960
<v Speaker 1>hundreds of billions of dollars that have already gone into

0:16:22.040 --> 0:16:24.200
<v Speaker 1>this and are still projected to go into this, is

0:16:24.280 --> 0:16:27.080
<v Speaker 1>a catastrophic misallocation of capital.

0:16:27.600 --> 0:16:31.080
<v Speaker 2>Well, that depends on how you look at it. So obviously you

0:16:31.160 --> 0:16:34.040
<v Speaker 2>can say the biggest winner of it is Nvidia because

0:16:34.040 --> 0:16:37.680
<v Speaker 2>it's producing those GPUs. During a gold rush, you

0:16:37.680 --> 0:16:40.880
<v Speaker 2>should invest in companies that produce the shovels. You can

0:16:40.920 --> 0:16:43.920
<v Speaker 2>still make money. For example, I'm a partner at Aaron

0:16:43.960 --> 0:16:47.920
<v Speaker 2>Innovation Capital in the UK and we are investing in

0:16:48.400 --> 0:16:51.560
<v Speaker 2>airs mission two companies, one called high Verge and second

0:16:51.640 --> 0:16:57.000
<v Speaker 2>called Hydra. These companies improve the cooling of data centers, or

0:16:57.120 --> 0:17:01.040
<v Speaker 2>these companies produce better algorithms to run on data centers.

0:17:01.080 --> 0:17:04.560
<v Speaker 2>So you can still allocate your capital wisely, but you

0:17:04.600 --> 0:17:08.360
<v Speaker 2>shouldn't allocate it to companies which are spending on this compute.

0:17:09.000 --> 0:17:12.880
<v Speaker 2>You should allocate your capital in companies that are allowing

0:17:12.960 --> 0:17:16.520
<v Speaker 2>those data centers to run efficiently because those data centers,

0:17:16.920 --> 0:17:19.280
<v Speaker 2>who knows, maybe they will be used for a different purpose

0:17:19.320 --> 0:17:22.040
<v Speaker 2>at some point. So again, there are going to be

0:17:22.119 --> 0:17:25.879
<v Speaker 2>winners and losers of the current gold rush in GPUs

0:17:26.000 --> 0:17:31.000
<v Speaker 2>and AI. I would say companies that have not invested

0:17:31.200 --> 0:17:34.880
<v Speaker 2>massively in data centers and in frontier models are

0:17:34.920 --> 0:17:38.399
<v Speaker 2>going to be the winners. There are some companies, without mentioning

0:17:38.440 --> 0:17:42.760
<v Speaker 2>the names, that have been accused of not training their

0:17:42.840 --> 0:17:45.760
<v Speaker 2>own language models. I think these, to me are going

0:17:45.840 --> 0:17:47.320
<v Speaker 2>to be the winners in today's market.

0:17:47.560 --> 0:17:48.760
<v Speaker 1>Are we talking about Apple?

0:17:49.280 --> 0:17:52.040
<v Speaker 2>I will let you determine that thing. But you can

0:17:52.119 --> 0:17:56.040
<v Speaker 2>see some companies, number one, have not invested in the

0:17:56.240 --> 0:17:59.680
<v Speaker 2>LLM frontier models, and their research teams have kept

0:17:59.720 --> 0:18:02.760
<v Speaker 2>publishing papers saying that those LLMs don't reason,

0:18:02.800 --> 0:18:05.359
<v Speaker 2>they make mistakes. So I'll let you find those companies. And

0:18:05.480 --> 0:18:08.440
<v Speaker 2>there are some companies that have borrowed a lot of

0:18:08.560 --> 0:18:10.960
<v Speaker 2>money to expand those data centers. And it's not me.

0:18:11.119 --> 0:18:13.520
<v Speaker 2>Look at the market, look at the CDS on Oracle. It's

0:18:14.320 --> 0:18:17.400
<v Speaker 2>the market is just flashing red, saying this is foolish.

0:18:17.800 --> 0:18:20.240
<v Speaker 2>So we're seeing those signals already. But I want to

0:18:20.280 --> 0:18:21.680
<v Speaker 2>make sure that if you want to make money in

0:18:21.720 --> 0:18:25.520
<v Speaker 2>today's market, given that governments have to pay

0:18:25.600 --> 0:18:29.360
<v Speaker 2>eight, nine percent of their revenue on servicing the debt.

0:18:29.800 --> 0:18:31.280
<v Speaker 2>You don't know if the market is going to go

0:18:31.359 --> 0:18:33.520
<v Speaker 2>up or down, and maybe there's a capital injection, a liquidity

0:18:33.520 --> 0:18:35.520
<v Speaker 2>injection from central banks. You don't know those things. So

0:18:35.640 --> 0:18:38.960
<v Speaker 2>I would not recommend you short or go long any investment.

0:18:39.000 --> 0:18:42.280
<v Speaker 2>I would recommend maybe doing an arbitrage: buy companies that

0:18:42.400 --> 0:18:46.399
<v Speaker 2>have not wasted money on LLMs and short companies that

0:18:46.560 --> 0:18:49.080
<v Speaker 2>have borrowed a lot of money to expand data centers.

0:18:49.119 --> 0:18:51.680
<v Speaker 2>That will be my suggestion, but I might be wrong.

0:18:51.760 --> 0:19:16.600
<v Speaker 1>Again. Can you tell us about any of the startup

0:19:16.640 --> 0:19:19.159
<v Speaker 1>companies that you're invested in or interested in that are

0:19:19.240 --> 0:19:22.120
<v Speaker 1>taking us to this new frontier in AI, away from

0:19:22.160 --> 0:19:24.080
<v Speaker 1>the LLM model and towards a different model.

0:19:24.240 --> 0:19:27.680
<v Speaker 2>Absolutely. So in our investment fund, Aaron Innovation Capital,

0:19:27.840 --> 0:19:30.040
<v Speaker 2>we have access to a lot of companies, actually maybe

0:19:30.119 --> 0:19:33.680
<v Speaker 2>twenty or thirty companies that are developing the next frontier

0:19:33.760 --> 0:19:36.560
<v Speaker 2>models which are not necessarily using artificial neural networks.

0:19:36.560 --> 0:19:39.240
<v Speaker 2>And I can speak about three or four of them

0:19:39.240 --> 0:19:42.000
<v Speaker 2>which are very exciting. One company you can have a look.

0:19:42.200 --> 0:19:46.640
<v Speaker 2>They are based in Switzerland. They're called Innate AI. Again,

0:19:46.640 --> 0:19:48.800
<v Speaker 2>I'm advertising them and we're not investing in them. We're

0:19:48.800 --> 0:19:53.080
<v Speaker 2>not investing in them yet. They are developing a new version of

0:19:53.960 --> 0:19:57.280
<v Speaker 2>neural networks which are inspired by the brain. This is

0:19:57.359 --> 0:19:59.320
<v Speaker 2>an effort that was going on in Europe for more

0:19:59.359 --> 0:20:02.680
<v Speaker 2>than a decade, the Blue Brain project. They are developing

0:20:02.760 --> 0:20:05.600
<v Speaker 2>something new which is not an artificial neural network. So

0:20:05.680 --> 0:20:10.159
<v Speaker 2>that's one kind. Look at another company, for example, Pathway AI

0:20:10.480 --> 0:20:14.960
<v Speaker 2>in the Bay Area. Again another example they mentioned upfront,

0:20:15.240 --> 0:20:18.800
<v Speaker 2>you need to solve continual learning. You need to solve that.

0:20:18.880 --> 0:20:21.280
<v Speaker 2>If you don't have it, forget about the solution to AGI.

0:20:21.800 --> 0:20:27.000
<v Speaker 2>And so they have been developing systems that can learn

0:20:27.160 --> 0:20:29.960
<v Speaker 2>using something called Hebbian learning, which is a local

0:20:30.080 --> 0:20:32.840
<v Speaker 2>learning technique that happens in the brain, not using back

0:20:32.880 --> 0:20:36.520
<v Speaker 2>propagation and gradient descent. So this is another example. Another

0:20:36.640 --> 0:20:38.879
<v Speaker 2>company that I'm actually a co-founder of and the

0:20:39.000 --> 0:20:42.040
<v Speaker 2>CEO called Fractal Brain AI. Have a look at the thing.

0:20:42.600 --> 0:20:45.440
<v Speaker 2>It's also based on prefrontal cortex, and it's this idea

0:20:45.600 --> 0:20:50.359
<v Speaker 2>that those networks are continually growing and rewiring themselves. So

0:20:50.680 --> 0:20:53.119
<v Speaker 2>you no longer have a fixed network with a fixed

0:20:53.240 --> 0:20:58.080
<v Speaker 2>number of parameters. No, the networks are growing and expanding themselves.

0:20:58.200 --> 0:21:00.840
<v Speaker 2>Like today, you're probably going to form new connections after

0:21:00.920 --> 0:21:03.800
<v Speaker 2>this podcast. Those networks do the same. They create new

0:21:03.840 --> 0:21:06.719
<v Speaker 2>connections all the time, and on top of that, they

0:21:06.760 --> 0:21:10.080
<v Speaker 2>are continually learning and thousands of times more power efficient

0:21:10.600 --> 0:21:13.000
<v Speaker 2>in addition to being data efficient. So these are only

0:21:13.080 --> 0:21:15.720
<v Speaker 2>some of the examples of companies that I'm personally very

0:21:15.760 --> 0:21:20.159
<v Speaker 2>excited about. But as Ilya Sutskever said the other day

0:21:20.200 --> 0:21:24.880
<v Speaker 2>on one of those podcasts, that we have gone back

0:21:25.240 --> 0:21:29.520
<v Speaker 2>from the age of scaling to the age of research,

0:21:30.200 --> 0:21:33.320
<v Speaker 2>So researchers have gone back to developing the new things.

0:21:33.840 --> 0:21:36.879
<v Speaker 2>It's just, in the end, here me and my teams have

0:21:37.000 --> 0:21:41.160
<v Speaker 2>started developing, for example, Fractal Brain twelve years ago.

0:21:41.320 --> 0:21:44.960
<v Speaker 2>We knew about those outstanding limitations of artificial neural networks

0:21:45.000 --> 0:21:47.199
<v Speaker 2>more than a decade ago, so we wouldn't invest our

0:21:47.280 --> 0:21:47.720
<v Speaker 2>time in it.

0:21:48.000 --> 0:21:50.480
<v Speaker 1>Nevertheless, somehow we all got caught up in this sort

0:21:50.520 --> 0:21:54.120
<v Speaker 1>of super bubble hype, despite the fact that good scientists

0:21:54.160 --> 0:21:54.399
<v Speaker 1>aren't you.

0:21:54.880 --> 0:21:56.800
<v Speaker 2>But this is good. I like the hype because you see,

0:21:57.240 --> 0:22:02.000
<v Speaker 2>to some extent that hype and the LLMs, they allowed

0:22:02.160 --> 0:22:06.600
<v Speaker 2>us to understand that it's possible to approximate human language.

0:22:07.080 --> 0:22:09.400
<v Speaker 2>So now that you know that you can approximate human

0:22:09.520 --> 0:22:13.240
<v Speaker 2>language with LLMs, you can try to find ways

0:22:13.280 --> 0:22:15.960
<v Speaker 2>to actually solve the idea of human language. If you

0:22:16.040 --> 0:22:18.480
<v Speaker 2>can approximate something, you can see the size of it.

0:22:19.119 --> 0:22:22.000
<v Speaker 2>So it's almost like if someone showed you, hey, there

0:22:22.080 --> 0:22:24.240
<v Speaker 2>is a rocket there, it flies, you already know the

0:22:24.359 --> 0:22:26.280
<v Speaker 2>size of the rocket. You know it can fly,

0:22:26.720 --> 0:22:29.159
<v Speaker 2>you can start to crack the details of the engine

0:22:29.200 --> 0:22:31.240
<v Speaker 2>of the rocket. So to some extent, I like the

0:22:31.320 --> 0:22:34.560
<v Speaker 2>current hype. I like the current generation systems because they

0:22:34.600 --> 0:22:38.000
<v Speaker 2>allowed us to understand the size of the problem and

0:22:38.119 --> 0:22:40.320
<v Speaker 2>approximate it. Now that we know that we can in

0:22:40.400 --> 0:22:44.480
<v Speaker 2>principle approximate human language, let's just solve it, okay.

0:22:44.640 --> 0:22:46.359
<v Speaker 1>And the other thing I suppose we should say is

0:22:46.440 --> 0:22:48.440
<v Speaker 1>that while we spent quite a lot of time criticizing

0:22:48.520 --> 0:22:52.520
<v Speaker 1>this generation of LLMs, they're still great, still really useful.

0:22:52.640 --> 0:22:55.560
<v Speaker 1>It's not like we have a totally pointless technology, something

0:22:55.680 --> 0:22:58.760
<v Speaker 1>that can remove entry level jobs across the board, which

0:22:58.760 --> 0:23:01.280
<v Speaker 1>of course comes with its own problems, but nonetheless

0:23:01.320 --> 0:23:04.160
<v Speaker 1>has enormous use for productivity and business.

0:23:04.320 --> 0:23:07.000
<v Speaker 2>Absolutely. I love, maybe not just LLMs, I love

0:23:07.160 --> 0:23:09.879
<v Speaker 2>generative AI. For example, not sure if you've noticed

0:23:09.920 --> 0:23:13.520
<v Speaker 2>behind me, I have this amazing landscape of London, but

0:23:13.720 --> 0:23:15.679
<v Speaker 2>you can tell it's all fake here, like this building

0:23:15.760 --> 0:23:19.959
<v Speaker 2>is all tilted here. So those systems they produce very

0:23:20.119 --> 0:23:23.680
<v Speaker 2>pretty graphics. It's inaccurate, but it's okay for me. It

0:23:23.760 --> 0:23:26.119
<v Speaker 2>still gives me a very nice background. Same thing with text.

0:23:26.480 --> 0:23:31.920
<v Speaker 2>They will produce a beautiful-looking poem. They can summarize a document. Yes,

0:23:32.080 --> 0:23:34.159
<v Speaker 2>there are errors. They are like this building here. You

0:23:34.240 --> 0:23:37.000
<v Speaker 2>can tell it's all tilted a little bit, But I'm

0:23:37.040 --> 0:23:40.159
<v Speaker 2>okay with that. So those systems are very good for

0:23:40.480 --> 0:23:44.560
<v Speaker 2>creating templates, you know, templates of data, the so-called

0:23:44.640 --> 0:23:48.680
<v Speaker 2>boilerplate code, creating nice graphics. They're not good for

0:23:48.880 --> 0:23:51.840
<v Speaker 2>details to put in there. And it's interesting because about

0:23:51.840 --> 0:23:54.920
<v Speaker 2>two or three years ago, I was giving a talk

0:23:55.200 --> 0:23:57.879
<v Speaker 2>to high school students. They were asking me, what do we

0:23:58.040 --> 0:24:01.720
<v Speaker 2>use those LLMs for. I told them for generating templates,

0:24:02.320 --> 0:24:08.200
<v Speaker 2>templates of presentation, etc. And for summarizing documents. But I

0:24:08.320 --> 0:24:10.600
<v Speaker 2>misled them. I don't think you should be using those

0:24:10.640 --> 0:24:14.960
<v Speaker 2>systems for summarization for two reasons. The reason number one,

0:24:15.800 --> 0:24:19.520
<v Speaker 2>in that summary there might be errors and mistakes, So

0:24:19.680 --> 0:24:22.840
<v Speaker 2>if you summarize a document, don't throw away the original.

0:24:23.760 --> 0:24:27.240
<v Speaker 2>And number two, you know, when you're summarizing something, you

0:24:27.320 --> 0:24:29.680
<v Speaker 2>should know what you care about. For example, if you

0:24:29.760 --> 0:24:33.640
<v Speaker 2>are to summarize today's podcast, maybe you only care about

0:24:33.680 --> 0:24:36.320
<v Speaker 2>this tilted building here, which is fake. Maybe that's what

0:24:36.440 --> 0:24:39.159
<v Speaker 2>you're looking for. The LLM doesn't know it. So when it

0:24:39.240 --> 0:24:43.639
<v Speaker 2>produces a summary of text, when it compresses graphics, it

0:24:43.760 --> 0:24:45.960
<v Speaker 2>doesn't know what you care about really, so it's going

0:24:46.000 --> 0:24:49.520
<v Speaker 2>to produce a summary maybe lacking the details that you

0:24:49.600 --> 0:24:50.520
<v Speaker 2>want to know later on.

0:24:51.800 --> 0:24:54.040
<v Speaker 1>And I suppose the other thing is with the errors. You

0:24:54.080 --> 0:24:56.960
<v Speaker 1>should only really be using it for things where you

0:24:57.080 --> 0:24:58.720
<v Speaker 1>know that you will be able to spot the errors

0:24:58.760 --> 0:24:59.159
<v Speaker 1>at the end.

0:24:59.480 --> 0:25:01.159
<v Speaker 2>So this is actually very interesting. I think that

0:25:01.240 --> 0:25:04.800
<v Speaker 2>the killer use case for generative AI is producing output

0:25:05.119 --> 0:25:09.479
<v Speaker 2>that you yourself can check for correctness. So for example,

0:25:09.520 --> 0:25:11.760
<v Speaker 2>if I want to compute one hundred plus

0:25:11.840 --> 0:25:14.159
<v Speaker 2>one hundred, the LLM will give me an answer. I know

0:25:14.320 --> 0:25:17.680
<v Speaker 2>I can check the answer myself for correctness. I like

0:25:17.760 --> 0:25:22.240
<v Speaker 2>it this way. Unfortunately, people are using those LLMs to

0:25:22.440 --> 0:25:25.680
<v Speaker 2>answer a question to which they themselves don't know the

0:25:25.800 --> 0:25:29.920
<v Speaker 2>answer. This is a recipe for an absolute disaster.

0:25:30.680 --> 0:25:33.119
<v Speaker 2>In the worst case, you can use those systems to

0:25:34.320 --> 0:25:38.119
<v Speaker 2>check whether your output that you produce yourself is correct.

0:25:38.200 --> 0:25:40.320
<v Speaker 2>You can do it that way. But people are using them the

0:25:40.400 --> 0:25:42.960
<v Speaker 2>other way around. They are asking those LLMs to produce

0:25:43.000 --> 0:25:45.840
<v Speaker 2>an output when they don't know what the output should be,

0:25:46.560 --> 0:25:49.359
<v Speaker 2>and the output can have one or five percent error, right,

0:25:49.480 --> 0:25:50.320
<v Speaker 2>why would you even do that?

0:25:50.480 --> 0:25:53.159
<v Speaker 1>So are we worrying unnecessarily about the job market? In

0:25:53.240 --> 0:25:56.480
<v Speaker 1>most of the talks and podcasts and panels, etc.,

0:25:56.760 --> 0:25:58.800
<v Speaker 1>that I do, the question is always: what on earth

0:25:58.880 --> 0:26:01.280
<v Speaker 1>does my child do for work in an age of AI?

0:26:01.440 --> 0:26:03.720
<v Speaker 1>Are we worrying too much about that? Because the human

0:26:03.760 --> 0:26:07.959
<v Speaker 1>input will remain absolutely compulsory for the next couple of decades.

0:26:08.760 --> 0:26:11.240
<v Speaker 2>I'm having the same problem. As I mentioned, my son

0:26:11.720 --> 0:26:14.719
<v Speaker 2>is thirteen years old, so it's on me as the father to give

0:26:14.800 --> 0:26:16.920
<v Speaker 2>him advice what to do in the future. I guess

0:26:17.080 --> 0:26:19.600
<v Speaker 2>being a scuba diver instructor is a good job. Yeah,

0:26:20.320 --> 0:26:22.000
<v Speaker 2>it is a great job. You're going to need them.

0:26:22.200 --> 0:26:27.080
<v Speaker 2>I worry about interns and entry-level software engineers because

0:26:27.160 --> 0:26:31.639
<v Speaker 2>to some extent you can automate. It's not replaced. You

0:26:31.680 --> 0:26:35.119
<v Speaker 2>can automate most of the tasks that you delegate to

0:26:35.280 --> 0:26:40.320
<v Speaker 2>interns today, like, for example, write boilerplate code, template code,

0:26:40.520 --> 0:26:43.040
<v Speaker 2>check if there are errors in that code. You can

0:26:43.119 --> 0:26:44.840
<v Speaker 2>automate that. But what's going to happen then is that

0:26:45.560 --> 0:26:48.640
<v Speaker 2>we are not going to have interns anymore, or a significantly

0:26:48.760 --> 0:26:51.960
<v Speaker 2>smaller number of interns and entry-level software engineers. So

0:26:52.080 --> 0:26:55.720
<v Speaker 2>what's the consequence. What's going to happen with the entire

0:26:55.920 --> 0:27:00.640
<v Speaker 2>promotion cycle? If you're in senior management or middle management,

0:27:01.320 --> 0:27:04.359
<v Speaker 2>you're gonna get promoted. Who's going to replace you? Who's

0:27:04.400 --> 0:27:07.880
<v Speaker 2>going to become a software architect if we're not training

0:27:08.160 --> 0:27:11.680
<v Speaker 2>the new entry level software engineers. So there's this entire

0:27:11.760 --> 0:27:14.520
<v Speaker 2>skills gap right now. Some people that I know have

0:27:14.720 --> 0:27:17.399
<v Speaker 2>chosen not to pursue studies in computer science for

0:27:17.520 --> 0:27:20.680
<v Speaker 2>the very reason they worry that we're not going to

0:27:20.720 --> 0:27:23.639
<v Speaker 2>need software engineers. Yes, we are going to need software engineers,

0:27:23.720 --> 0:27:27.919
<v Speaker 2>it's just you need to jump immediately to being an

0:27:28.119 --> 0:27:31.360
<v Speaker 2>architect of a software engineering system. And, to be sure,

0:27:31.680 --> 0:27:34.760
<v Speaker 2>that's very hard. It is hard. Typically what you do

0:27:35.040 --> 0:27:37.240
<v Speaker 2>is you gain this experience on the job. You

0:27:37.359 --> 0:27:40.280
<v Speaker 2>go to Google, spend the first two or three years writing code,

0:27:40.480 --> 0:27:43.800
<v Speaker 2>but your peers are software architects, and so you learn

0:27:43.880 --> 0:27:46.560
<v Speaker 2>from them. If we don't have this experience that we're

0:27:46.600 --> 0:27:49.680
<v Speaker 2>giving to entry level software engineers, they won't be able

0:27:49.720 --> 0:27:51.719
<v Speaker 2>to have this experience. So this is what worries me more.

0:27:51.840 --> 0:27:56.480
<v Speaker 2>It's not about replacing software engineers, it's about us not

0:27:56.680 --> 0:28:00.960
<v Speaker 2>having a pipeline of software architects and senior software engineers.

0:28:01.000 --> 0:28:03.560
<v Speaker 2>We are not having this pipeline anymore. That worries me

0:28:03.680 --> 0:28:04.680
<v Speaker 2>quite a bit, to be honest.

0:28:04.800 --> 0:28:07.200
<v Speaker 1>Yeah, and the pipeline problem is just being discussed in

0:28:07.200 --> 0:28:09.159
<v Speaker 1>a lot of other professions as well, most obviously in

0:28:09.200 --> 0:28:10.080
<v Speaker 1>the legal profession.

0:28:10.359 --> 0:28:12.760
<v Speaker 2>Absolutely, again for lawyers, this is very interesting and this

0:28:12.880 --> 0:28:14.840
<v Speaker 2>is not my profession, so you can discount what I

0:28:14.880 --> 0:28:17.920
<v Speaker 2>say quite a bit. Right now, you can get the

0:28:18.119 --> 0:28:21.520
<v Speaker 2>initial blueprints of legal documents very quickly, and we do

0:28:21.600 --> 0:28:23.880
<v Speaker 2>it all the time at our startup. You can get

0:28:23.920 --> 0:28:26.520
<v Speaker 2>a blueprint of a legal document, but I would never

0:28:26.760 --> 0:28:29.440
<v Speaker 2>use that document to get an investor on board in

0:28:29.520 --> 0:28:32.399
<v Speaker 2>my startup. I wouldn't do that. I still need to

0:28:32.520 --> 0:28:34.880
<v Speaker 2>send it to an actual human being to at least

0:28:34.960 --> 0:28:37.240
<v Speaker 2>proofread it. So we're going to have to have

0:28:37.359 --> 0:28:42.120
<v Speaker 2>those lawyers who can proofread documents produced by generative AI,

0:28:43.040 --> 0:28:46.680
<v Speaker 2>but aren't they doing it already? They already have hundreds of

0:28:46.800 --> 0:28:49.920
<v Speaker 2>template documents saved on their computers. It's just changing the

0:28:50.040 --> 0:28:53.080
<v Speaker 2>names of companies and investors in those documents. So the legal

0:28:53.160 --> 0:28:57.080
<v Speaker 2>profession is not going to go away because those generative

0:28:57.120 --> 0:29:00.960
<v Speaker 2>AI systems, they don't have the notion of true and false.

0:29:01.160 --> 0:29:04.320
<v Speaker 2>They don't; they get confused. It's all probabilistic, so they will

0:29:04.360 --> 0:29:09.080
<v Speaker 2>make an error. Sometimes after twenty thirty forty legal statements,

0:29:09.200 --> 0:29:13.120
<v Speaker 2>they will just change one true into false. So we're

0:29:13.160 --> 0:29:15.040
<v Speaker 2>still going to need to have lawyers for it. This

0:29:15.160 --> 0:29:17.160
<v Speaker 2>is one of those professions where I don't think it's

0:29:17.200 --> 0:29:19.920
<v Speaker 2>going to be automated fully by AI. But as I said,

0:29:20.000 --> 0:29:23.479
<v Speaker 2>there are other professions that have already been automated. For example,

0:29:23.560 --> 0:29:25.800
<v Speaker 2>content creators. So you can go online, go to YouTube.

0:29:25.800 --> 0:29:28.720
<v Speaker 2>You're going to see a lot of videos summarizing AI,

0:29:28.960 --> 0:29:33.720
<v Speaker 2>the AI bubble, summarizing the conflict in Ukraine or Iran, all generated

0:29:33.880 --> 0:29:37.080
<v Speaker 2>using generative AI. So some jobs have already gone away,

0:29:37.600 --> 0:29:39.920
<v Speaker 2>The jobs that require you to be one

0:29:39.960 --> 0:29:43.320
<v Speaker 2>hundred percent accurate, those jobs are not going to go

0:29:43.440 --> 0:29:47.840
<v Speaker 2>away using the current generation of AI systems. Remember, there

0:29:47.880 --> 0:29:51.160
<v Speaker 2>are new generations on the horizon. My startup is working

0:29:51.240 --> 0:29:53.600
<v Speaker 2>on it, other startups are working on it. But with

0:29:53.720 --> 0:29:56.880
<v Speaker 2>the current generation of tools, they will not displace those

0:29:56.960 --> 0:29:57.400
<v Speaker 2>jobs yet.

0:29:57.680 --> 0:30:00.160
<v Speaker 1>Yeah, it sounds to me like we shouldn't necessarily be

0:30:00.240 --> 0:30:02.920
<v Speaker 1>frightened of the current generation of tools, but we should probably

0:30:02.960 --> 0:30:04.719
<v Speaker 1>be pretty frightened of the next generation.

0:30:05.400 --> 0:30:07.760
<v Speaker 2>I would give a very simple example here. So me

0:30:07.800 --> 0:30:11.080
<v Speaker 2>and my colleagues, we built AlphaGo at DeepMind in

0:30:11.120 --> 0:30:13.920
<v Speaker 2>twenty fifteen, which is the system that won at computer

0:30:14.040 --> 0:30:17.640
<v Speaker 2>Go against the world champion, and so back then it

0:30:17.880 --> 0:30:21.960
<v Speaker 2>was a state of the art system. These days, you

0:30:22.200 --> 0:30:25.240
<v Speaker 2>yourself can win against that system. I can do it

0:30:25.320 --> 0:30:30.000
<v Speaker 2>as well. Why because those systems have stayed the same,

0:30:30.560 --> 0:30:34.400
<v Speaker 2>and humans have identified flaws in those systems. They have

0:30:34.560 --> 0:30:39.280
<v Speaker 2>identified hallucinations, errors. And now when you play against that system,

0:30:39.800 --> 0:30:43.120
<v Speaker 2>you just exploit this system, exploit its weaknesses. The system

0:30:43.200 --> 0:30:46.880
<v Speaker 2>is not adapting itself, not rewiring itself. So to answer

0:30:46.920 --> 0:30:50.600
<v Speaker 2>your question quickly, I'm personally not worried about existing AI

0:30:50.720 --> 0:30:53.840
<v Speaker 2>systems because they are not adapting themselves. However, I do have

0:30:53.920 --> 0:30:58.320
<v Speaker 2>to mention I do worry that people are going to

0:30:58.360 --> 0:31:03.000
<v Speaker 2>be using existing systems in domains where they should not

0:31:03.120 --> 0:31:06.880
<v Speaker 2>be used, for example, for identifying targets to bomb in Iran.

0:31:07.120 --> 0:31:09.800
<v Speaker 2>Just don't do those things. Those systems make errors

0:31:09.840 --> 0:31:14.880
<v Speaker 2>and hallucinations. So I worry about misuse of existing AI tools.

0:31:15.440 --> 0:31:18.280
<v Speaker 2>I don't worry about these tools themselves being malicious. No,

0:31:19.000 --> 0:31:23.480
<v Speaker 2>I worry about inadvertent misuse of those tools without understanding

0:31:23.600 --> 0:31:25.000
<v Speaker 2>what they are not good for.

0:31:26.000 --> 0:31:29.200
<v Speaker 1>Okay, interesting, always something to worry about. Can I ask

0:31:29.280 --> 0:31:31.600
<v Speaker 1>you one last thing. One of the things that you

0:31:31.680 --> 0:31:36.240
<v Speaker 1>mentioned earlier was the extraordinary energy inefficiency of the current systems.

0:31:36.320 --> 0:31:38.960
<v Speaker 1>And it is true, isn't it. The human brain is

0:31:39.080 --> 0:31:43.440
<v Speaker 1>remarkably energy efficient. And when you look at these models,

0:31:43.480 --> 0:31:45.560
<v Speaker 1>the amount of energy that they will use to simply

0:31:45.600 --> 0:31:47.520
<v Speaker 1>have the same thought that the human can have on

0:31:47.560 --> 0:31:50.560
<v Speaker 1>a couple of watts, that's an extraordinary problem that

0:31:50.720 --> 0:31:54.000
<v Speaker 1>in your new generation that we talked about earlier will

0:31:54.040 --> 0:31:54.760
<v Speaker 1>be diminished.

0:31:55.520 --> 0:31:59.480
<v Speaker 2>Absolutely. The systems that I know that Fractal Brain is

0:31:59.560 --> 0:32:03.000
<v Speaker 2>coming up with, Innate AI is coming up with. See

0:32:03.040 --> 0:32:08.240
<v Speaker 2>those systems, they don't use back propagation to produce the

0:32:08.320 --> 0:32:12.880
<v Speaker 2>next word. They don't have to load from the memory

0:32:13.280 --> 0:32:17.000
<v Speaker 2>all of those one point five trillion parameters just to

0:32:17.080 --> 0:32:20.080
<v Speaker 2>produce the next word. They don't do that. They load

0:32:20.160 --> 0:32:24.680
<v Speaker 2>maybe one hundred, maybe one thousand parameters: three or four orders

0:32:24.720 --> 0:32:27.320
<v Speaker 2>of magnitude better power efficiency. And we know this thing,

0:32:27.400 --> 0:32:29.400
<v Speaker 2>and we know that we've trained the first version of

0:32:29.480 --> 0:32:33.000
<v Speaker 2>our fractal language model using I think zero point one

0:32:33.040 --> 0:32:36.080
<v Speaker 2>percent of the electricity that OpenAI used for GPT-1 and GPT-2,

0:32:36.640 --> 0:32:38.760
<v Speaker 2>so we know it's possible to do that. I find

0:32:38.760 --> 0:32:42.000
<v Speaker 2>it interesting. Before this podcast, we had a conversation, if

0:32:42.040 --> 0:32:45.840
<v Speaker 2>you recall, about Easter dinner and cooking potatoes. So

0:32:46.840 --> 0:32:50.240
<v Speaker 2>as we're talking right now, I can assure you that

0:32:50.680 --> 0:32:52.960
<v Speaker 2>for me and for you to produce the next word

0:32:53.040 --> 0:32:57.000
<v Speaker 2>in our conversation, you are not thinking about potatoes for Easter.

0:32:57.120 --> 0:33:00.400
<v Speaker 2>You're not doing that thing. You don't need those connections too.

0:33:00.720 --> 0:33:02.480
<v Speaker 2>When we talk about AI, we don't do those things.

0:33:02.520 --> 0:33:04.720
<v Speaker 2>And maybe now you think about your Easter dinner. But

0:33:04.800 --> 0:33:08.120
<v Speaker 2>the point is that those systems should produce the next

0:33:08.280 --> 0:33:12.360
<v Speaker 2>word only loading up parameters which are important for the

0:33:12.960 --> 0:33:15.880
<v Speaker 2>next word. And this is not billions of parameters, nor

0:33:16.040 --> 0:33:19.080
<v Speaker 2>hundreds of thousands; at most maybe a thousand, two thousand parameters.

0:33:19.320 --> 0:33:23.760
<v Speaker 2>So yes, the next generation systems, in addition to being deterministic,

0:33:23.880 --> 0:33:26.960
<v Speaker 2>you can trust their output. They have power and the

0:33:27.080 --> 0:33:30.680
<v Speaker 2>data efficiency which is remarkable, talking three orders of magnitude better,

0:33:31.120 --> 0:33:33.400
<v Speaker 2>it's coming out. This thing is coming better. And again

0:33:34.000 --> 0:33:36.200
<v Speaker 2>the world may not be ready for it yet.

0:33:36.320 --> 0:33:40.040
<v Speaker 2>Plus, you need to understand my own motivation for building those

0:33:40.080 --> 0:33:44.960
<v Speaker 2>systems and the dangers. We have systems which are adapting themselves,

0:33:45.120 --> 0:33:50.160
<v Speaker 2>rewiring themselves, so to some extent you cannot outsmart them permanently.

0:33:50.200 --> 0:33:53.120
<v Speaker 2>They'll learn from your mistakes. It's like your kid. Try

0:33:53.400 --> 0:33:55.600
<v Speaker 2>using some technique on your kid. It's going to learn

0:33:55.680 --> 0:33:58.200
<v Speaker 2>how to adapt itself. Those systems do it as well,

0:33:58.600 --> 0:34:00.840
<v Speaker 2>So I personally do not know if it's a good

0:34:00.880 --> 0:34:04.520
<v Speaker 2>time to release those systems to the general public. I don't.

0:34:04.920 --> 0:34:07.880
<v Speaker 2>That's why those systems are not being published because think

0:34:07.920 --> 0:34:09.640
<v Speaker 2>about it. If it has a solution for a system

0:34:09.719 --> 0:34:13.080
<v Speaker 2>which is learning all the time, adapting to its flaws, that

0:34:13.360 --> 0:34:18.120
<v Speaker 2>becomes spooky to release that system. So I worry more

0:34:18.200 --> 0:34:22.919
<v Speaker 2>about potential misuse of those next generation systems, less worry

0:34:22.920 --> 0:34:25.960
<v Speaker 2>about their performance because they're already beating existing systems on

0:34:26.080 --> 0:34:26.960
<v Speaker 2>common benchmarks.

0:34:27.640 --> 0:34:30.920
<v Speaker 1>Okay, thank you very much. Just one last thing, Janusz,

0:34:31.000 --> 0:34:33.160
<v Speaker 1>before we go. I think our listeners will have been

0:34:33.360 --> 0:34:36.479
<v Speaker 1>absolutely fascinated by all this and what I often ask people,

0:34:36.480 --> 0:34:37.839
<v Speaker 1>I'm going to ask you as well, and I hope

0:34:37.880 --> 0:34:39.800
<v Speaker 1>that we'll be able to understand

0:34:40.080 --> 0:34:42.839
<v Speaker 1>your suggestion: what are you reading at the moment?

0:34:42.880 --> 0:34:44.360
<v Speaker 2>Well, I wish I could tell you that I'm reading books

0:34:44.400 --> 0:34:46.640
<v Speaker 2>on AI, but I don't. In fact, four years ago

0:34:46.840 --> 0:34:51.279
<v Speaker 2>I started learning Spanish without any necessity, zero necessity, nothing.

0:34:51.920 --> 0:34:54.640
<v Speaker 2>The reason I started doing that is I wanted to

0:34:55.440 --> 0:35:00.440
<v Speaker 2>again learn a language the way humans do that without

0:35:00.560 --> 0:35:02.760
<v Speaker 2>billions of words, but with a small number of words.

0:35:02.760 --> 0:35:05.799
<v Speaker 2>So as a mental experiment to learn whether

0:35:05.920 --> 0:35:08.640
<v Speaker 2>our fractal language model learns language in the same way

0:35:08.680 --> 0:35:11.480
<v Speaker 2>as I'm learning Spanish, I started learning Spanish, and because

0:35:11.520 --> 0:35:14.520
<v Speaker 2>of that, I'll disappoint you. I'm reading kids' books,

0:35:15.080 --> 0:35:17.399
<v Speaker 2>reading Diary of a Wimpy Kid in Spanish. Of course,

0:35:17.440 --> 0:35:19.719
<v Speaker 2>I'm reading Harry Potter as well in Spanish. I'm reading

0:35:19.760 --> 0:35:22.560
<v Speaker 2>lots and lots of books in Spanish, but these are

0:35:23.480 --> 0:35:25.520
<v Speaker 2>elementary school books. I'm sorry to disappoint.

0:35:25.520 --> 0:35:28.239
<v Speaker 1>You're reading the same things in Spanish that my son is.

0:35:28.320 --> 0:35:29.640
<v Speaker 1>So there we go, something in common.

0:35:29.719 --> 0:35:32.080
<v Speaker 2>Yes, absolutely, this is actually good because if you want

0:35:32.080 --> 0:35:35.000
<v Speaker 2>to understand, for example, how language works, at least try

0:35:35.080 --> 0:35:37.719
<v Speaker 2>to learn yet another language. Now that you can do

0:35:37.800 --> 0:35:40.480
<v Speaker 2>an introspection, you can see how you're learning a language,

0:35:40.560 --> 0:35:42.839
<v Speaker 2>and so it's remarkable you can learn a language after

0:35:43.520 --> 0:35:46.600
<v Speaker 2>in my case, about seven or eight thousand hours. I

0:35:46.680 --> 0:35:48.560
<v Speaker 2>guess my Spanish is better than my English right now,

0:35:48.960 --> 0:35:51.480
<v Speaker 2>So eight thousand hours, how many tokens we're talking about?

0:35:51.840 --> 0:35:55.680
<v Speaker 2>Couple million tokens, couple million tokens of a training set

0:35:56.120 --> 0:36:01.120
<v Speaker 2>rather than couple trillion tokens. So that's what fascinates me.

0:36:01.280 --> 0:36:03.480
<v Speaker 2>And yeah, maybe next time we can speak in

0:36:03.520 --> 0:36:04.680
<v Speaker 2>Spanish on the podcast.

0:36:04.840 --> 0:36:06.200
<v Speaker 1>No, I'll have to get one of my kids in

0:36:06.280 --> 0:36:06.480
<v Speaker 1>for that.

0:36:07.040 --> 0:36:08.000
<v Speaker 2>Okay, sounds good.

0:36:09.040 --> 0:36:10.880
<v Speaker 1>Janusz, thank you so much for joining us today.

0:36:11.360 --> 0:36:12.960
<v Speaker 2>It's a pleasure. Thank you so much for having me.

0:36:18.080 --> 0:36:20.239
<v Speaker 1>Thanks for listening to this week's Merryn Talks Money. If

0:36:20.239 --> 0:36:22.560
<v Speaker 1>you like our show, rate, review and subscribe wherever

0:36:22.600 --> 0:36:24.800
<v Speaker 1>you listen to podcasts, and keep sending your questions or

0:36:24.800 --> 0:36:27.480
<v Speaker 1>comments to merrynmoney@bloomberg.net. You can

0:36:27.520 --> 0:36:30.200
<v Speaker 1>also follow me and John on Twitter or X. I'm

0:36:30.320 --> 0:36:34.080
<v Speaker 1>@MerrynSW and John is @John_Stepek. This episode

0:36:34.160 --> 0:36:37.480
<v Speaker 1>was hosted by me, Merryn Somerset Webb. It was produced by Sommer Saadi

0:36:37.520 --> 0:36:40.160
<v Speaker 1>and Moses and sound designed by Blake Maples and Aaron

0:36:40.239 --> 0:36:43.480
<v Speaker 1>Kasper. Special thanks, of course, to Janusz Marecki.