WEBVTT - Inside the Battle for Chips That Will Power Artificial Intelligence 0:00:10.160 --> 0:00:14.480 Hello, and welcome to another episode of the Odd Lots podcast. 0:00:14.560 --> 0:00:16.280 I'm Joe Weisenthal. 0:00:15.840 --> 0:00:16.880 And I'm Tracy Alloway. 0:00:17.239 --> 0:00:21.160 Tracy, I'm not sure if you've heard anyone talking about 0:00:21.200 --> 0:00:22.800 it or anything, but have you heard about like this 0:00:22.920 --> 0:00:24.920 sort of AI thing people have been discussing? 0:00:24.960 --> 0:00:27.720 Oh, you know what, I discovered this really cool new 0:00:27.800 --> 0:00:29.240 thing called ChatGPT. 0:00:29.280 --> 0:00:31.520 Oh yeah, I saw that website too. Yeah. 0:00:31.560 --> 0:00:32.440 Have you tried it? 0:00:32.960 --> 0:00:35.080 I tried it. Yeah, I had it kind of like write a 0:00:35.080 --> 0:00:38.840 poem for me. It's pretty cool technology. We should probably 0:00:38.880 --> 0:00:39.639 learn more about it. 0:00:39.880 --> 0:00:42.199 Yeah, I think we should. No, okay, all right, obviously 0:00:42.320 --> 0:00:46.920 we're being facetious and joking, but everyone has been talking 0:00:47.159 --> 0:00:51.920 about AI and these new sort of natural language interfaces 0:00:52.000 --> 0:00:56.440 that allow you to ask questions or generate all different 0:00:56.480 --> 0:00:59.320 types of texts and things like that. It feels like 0:00:59.440 --> 0:01:02.280 everyone is very excited about that space. 0:01:02.160 --> 0:01:06.240 Every, almost every time. Like I went out with 0:01:06.280 --> 0:01:08.399 some friends that I hadn't seen in a long time, 0:01:08.480 --> 0:01:10.720 like I was at a bar last night, and like 0:01:10.800 --> 0:01:13.840 the conversation like turned to AI within like two minutes. 0:01:13.880 --> 0:01:16.120 Never got to talk about the experiments they did. But yes, 0:01:16.240 --> 0:01:18.959 there is a lot.
It's basically like this, like wall 0:01:19.040 --> 0:01:22.240 of noise and everyone's been talking about it, actually, but us, 0:01:22.280 --> 0:01:24.280 because I don't think we have done as far as 0:01:24.319 --> 0:01:27.400 I can recall, like an AI episode. We don't want 0:01:27.400 --> 0:01:30.240 to just add to the noise and get another sort 0:01:30.280 --> 0:01:33.240 of chin stroke around. But obviously there's a lot there 0:01:33.280 --> 0:01:33.679 for us 0:01:33.560 --> 0:01:36.320 to discuss. Totally, and I'm sure this will be the 0:01:36.319 --> 0:01:39.720 first of many episodes. But one of the ways that 0:01:39.760 --> 0:01:43.640 it fits into sort of classic Odd Lots lore is 0:01:44.000 --> 0:01:45.360 via semiconductors. 0:01:45.480 --> 0:01:45.640 Right. 0:01:45.840 --> 0:01:49.480 If you think about what ChatGPT, for instance, is doing, 0:01:49.680 --> 0:01:55.000 it's taking words and transforming them into numbers and then 0:01:55.240 --> 0:01:57.920 spitting those words back out at you. And the thing 0:01:58.000 --> 0:02:01.520 that enables it to do that is semiconductors, chips. 0:02:01.800 --> 0:02:04.560 Right. So here's like the four things I think I 0:02:04.600 --> 0:02:08.440 know about this, and so this is that, A, training 0:02:08.480 --> 0:02:10.680 the AI models so that they can do that is 0:02:10.680 --> 0:02:16.119 a computationally intensive process. B, each query is much more 0:02:16.120 --> 0:02:18.680 computationally intensive than, say, a Google search. 0:02:19.400 --> 0:02:19.680 Three, 0:02:20.360 --> 0:02:23.880 the company that's absolutely crushing the space and printing money 0:02:24.000 --> 0:02:27.840 because of this is Nvidia. Yeah. And four, there's 0:02:27.880 --> 0:02:31.920 a general scarcity of computing power, so that even if 0:02:31.960 --> 0:02:35.200 you and I like were brilliant mathematicians and AI theorists, 0:02:35.200 --> 0:02:38.440 et cetera.
If we wanted to start a ChatGPT competitor, 0:02:39.200 --> 0:02:42.400 just getting access to the computing power in order to 0:02:42.480 --> 0:02:44.960 do that would not be trivial, even if we had 0:02:45.000 --> 0:02:46.040 tons of money outside of it. 0:02:46.120 --> 0:02:49.200 I'm going to buy an out-of-business crypto mine and 0:02:49.240 --> 0:02:49.799 take all the. 0:02:50.280 --> 0:02:53.280 They've already been bought. Someone got that. But that's that's 0:02:53.360 --> 0:02:57.240 basically the extent of my understanding of the nexus between 0:02:57.360 --> 0:03:01.320 this AI and chips, and I suspect there's more to know. 0:03:01.400 --> 0:03:05.120 They're just well. I also think having a conversation about 0:03:05.160 --> 0:03:09.280 semiconductors and AI is a really good way to understand 0:03:09.480 --> 0:03:12.720 the underlying technology of both those things. So that's what 0:03:12.760 --> 0:03:14.280 I'm hoping for out of this conversation. 0:03:14.320 --> 0:03:16.320 All right. Well, you mentioned we've been doing, we've done 0:03:16.360 --> 0:03:18.560 lots of chips episodes in the past, so we're going 0:03:18.639 --> 0:03:22.040 to go back to the future or something like that. 0:03:22.080 --> 0:03:23.960 We're going to go back to our first episode, our 0:03:24.000 --> 0:03:27.240 first guest, where we started exploring chips episodes. I think 0:03:27.240 --> 0:03:29.720 it was the first one that we did sometime maybe 0:03:29.760 --> 0:03:32.760 in early twenty twenty one. We're going to be speaking 0:03:32.800 --> 0:03:36.320 with Stacy Rasgon, managing director and senior analyst of US 0:03:36.360 --> 0:03:40.960 Semiconductors and Semiconductor Capital Equipment at Bernstein Research, someone who's 0:03:41.040 --> 0:03:43.280 great at breaking all this stuff down, has been doing 0:03:43.320 --> 0:03:46.280 a lot of research on this question now. So Stacy, 0:03:46.680 --> 0:03:48.760 thank you so much for coming back on Odd Lots.
0:03:49.680 --> 0:03:51.520 I am so happy to be back. Thank you so 0:03:51.640 --> 0:03:52.560 much for having me. 0:03:52.560 --> 0:03:54.560 Right, so I'm going to start with just sort of like 0:03:54.880 --> 0:03:58.560 not even a business question, but a sort of semiconductor 0:03:58.600 --> 0:04:03.280 design question, which is this company Nvidia. Like for 0:04:03.440 --> 0:04:05.480 years I just sort of knew them as, like, they 0:04:05.480 --> 0:04:08.680 were the company that made graphics cards for video games, 0:04:08.720 --> 0:04:10.880 and then for a while they got, like, oh, 0:04:10.920 --> 0:04:13.960 and they're also good for crypto mining, and they were 0:04:14.040 --> 0:04:16.880 very popular for a while in Ethereum mining when it 0:04:17.000 --> 0:04:20.279 used proof of work. And now my understanding is everyone 0:04:20.320 --> 0:04:22.800 wants their chips for AI purposes. And we'll get into 0:04:22.839 --> 0:04:25.760 all that, but just to start, what is it about 0:04:25.839 --> 0:04:29.920 the design of their chips that makes them naturally suited 0:04:29.960 --> 0:04:32.200 for these other things? A company that started in graphics 0:04:32.240 --> 0:04:35.440 cards that makes them naturally suited for these things like 0:04:35.560 --> 0:04:39.240 AI in a way apparently that other chip makers, like, 0:04:39.279 --> 0:04:43.400 say, Intel, their chips do not seem to be as 0:04:43.720 --> 0:04:44.640 used for this space. 0:04:46.160 --> 0:04:48.560 Yeah, so let me step back. 0:04:48.640 --> 0:04:52.040 Yeah, sure, if the question, if the question is totally 0:04:52.120 --> 0:04:54.320 flawed in its premise, then feel free to say your 0:04:54.400 --> 0:04:56.320 question is totally... Let me step back.
0:04:56.360 --> 0:05:00.279 So sure, I'd say the idea of like using compute 0:05:00.360 --> 0:05:02.599 and artificial intelligence has obviously been around for a long 0:05:02.880 --> 0:05:05.120 long time, and actually the AI industry has been through 0:05:05.120 --> 0:05:08.240 a number of what they call AI winters over the years, 0:05:08.279 --> 0:05:10.760 where people would get really excited about this and then 0:05:10.760 --> 0:05:12.279 they would do work, and then it would just turn 0:05:12.320 --> 0:05:15.640 out it wasn't working, and pretty much it was just 0:05:15.680 --> 0:05:19.839 because the compute capacity and capabilities of the hardware at 0:05:19.880 --> 0:05:21.720 the time doesn't really, wasn't really up to the task, 0:05:21.760 --> 0:05:24.080 and so interest would wane and you'd go through this 0:05:24.160 --> 0:05:27.560 winter period, and a while back, I don't know, ten, 0:05:27.720 --> 0:05:29.719 fifteen years ago, whenever it was, it was sort of 0:05:29.760 --> 0:05:35.520 discovered that the types of calculations that are used for 0:05:35.839 --> 0:05:38.280 neural networks and machine learning, it turns out they are 0:05:38.440 --> 0:05:41.080 very similar to the kinds of application, the kinds of 0:05:41.200 --> 0:05:45.479 mathematics that are used for graphics processing and graphics rendering. 0:05:45.520 --> 0:05:48.960 As it turns out it's primarily matrix multiplication, and we'll 0:05:48.960 --> 0:05:51.000 probably get into this on this call a little 0:05:51.040 --> 0:05:53.960 bit in terms of how these machine learning models and 0:05:53.960 --> 0:05:55.680 everything actually work. But at the end of the day, 0:05:55.800 --> 0:05:59.520 really it comes down to like really really large amounts 0:05:59.520 --> 0:06:02.840 of matrix multiplication and parallel operations. And as it turned out, 0:06:03.600 --> 0:06:07.200 the GPU, the graphics processing unit, was quite suitable.
0:06:07.640 --> 0:06:10.400 Before you go on then, and maybe we'll get into 0:06:10.440 --> 0:06:13.159 this in hour three of this conversation. No, we're not 0:06:13.160 --> 0:06:15.599 going to go that long, but what is matrix multiplication? 0:06:17.000 --> 0:06:18.599 Yeah. So, I don't know how many of your 0:06:18.640 --> 0:06:21.880 listeners here have had linear algebra or anything, but 0:06:22.120 --> 0:06:24.000 a matrix is just like an array of numbers, like 0:06:24.120 --> 0:06:27.279 think about like a square array of numbers, okay, okay, 0:06:27.320 --> 0:06:29.800 and matrix multiplication is, I've got two of these arrays and 0:06:29.839 --> 0:06:32.960 I'm multiplying them together, and it's it's not as simple 0:06:33.000 --> 0:06:35.800 as the kind of math or multiplication that maybe you're 0:06:35.960 --> 0:06:39.880 typically used to, but it can be done. And it 0:06:39.960 --> 0:06:42.240 turns out there are some of these characteristics of these 0:06:42.320 --> 0:06:44.520 kinds of matrices. These matrices can be really big, 0:06:44.560 --> 0:06:46.680 and there's like lots and lots of operations that need 0:06:46.760 --> 0:06:49.000 to happen, and this stuff needs to happen like like 0:06:49.080 --> 0:06:52.520 quite rapidly. And again I'm grossly simplifying here for the listeners, 0:06:53.279 --> 0:06:56.360 but when when you're working through these kinds of machine 0:06:56.440 --> 0:06:58.960 learning models, that that's really what you're doing. It's it's 0:06:58.960 --> 0:07:02.000 a bunch of different matrices, a bunch of different arrays 0:07:02.720 --> 0:07:06.080 of numbers that contain all of the different parameters and things. 0:07:06.279 --> 0:07:08.120 But we should probably step up a bit and talk 0:07:08.160 --> 0:07:11.200 about what we actually mean when we talk about machine 0:07:11.240 --> 0:07:14.720 learning and models and all kinds of things.
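Editor's sketch, not part of the recorded conversation: the matrix multiplication described above can be written out in a few lines of plain Python. The function name `matmul` and the two small example matrices are illustrative assumptions; real machine learning workloads use optimized GPU libraries rather than loops like these. The point is only that every output cell is its own independent sum of products, which is why the work can be spread across many parallel cores.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    inner = len(b)        # rows of b must match columns of a
    cols = len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    # Each output cell (i, j) is row i of a times column j of b.
    # No cell depends on any other cell, so all of them could be
    # computed at the same time on parallel hardware.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
print(product)  # [[19, 22], [43, 50]]
```

A 2x2 product needs 8 multiplications; the count grows with the product of the three matrix dimensions, which is how the large arrays mentioned in the conversation turn into a very large compute problem.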
But at 0:07:14.760 --> 0:07:16.440 the end of the day, you have these really large 0:07:16.480 --> 0:07:19.560 arrays of numbers that have to get multiplied together in 0:07:19.600 --> 0:07:21.760 many cases, over and over again, many many times, and 0:07:21.800 --> 0:07:26.000 it turns into a very very large compute problem. And 0:07:26.040 --> 0:07:30.000 it's something that the GPU architecture can actually can do 0:07:30.120 --> 0:07:33.800 really really efficiently, much more efficiently than you could, say, 0:07:33.840 --> 0:07:37.760 on a traditional CPU. And so, as it turns out, 0:07:37.760 --> 0:07:40.200 the GPU has become a good architecture for this. Now 0:07:40.200 --> 0:07:41.640 what Nvidia has done on top of this, not 0:07:41.640 --> 0:07:44.160 only with having the hardware, is they've also built a 0:07:44.240 --> 0:07:48.160 really massive software ecosystem around all of this. They have, 0:07:48.360 --> 0:07:51.240 their software is called CUDA. Think about it as kind 0:07:51.280 --> 0:07:54.440 of like the software, the programming environment, like 0:07:54.440 --> 0:07:57.440 the parallel programming environment for these GPUs, and they've layered 0:07:57.480 --> 0:08:01.120 on all kinds of other libraries, SDKs and everything on 0:08:01.440 --> 0:08:05.480 top of that that actually makes this relatively easy to 0:08:05.640 --> 0:08:07.600 use and to deploy and to deliver. And so they've 0:08:07.640 --> 0:08:09.800 built up not just the hardware but the software 0:08:09.800 --> 0:08:12.160 around this, and it's given them a really really sort 0:08:12.160 --> 0:08:15.520 of like like like massive gap versus like a lot 0:08:15.520 --> 0:08:17.480 of the other competitors that are now trying to get 0:08:17.480 --> 0:08:19.960 into this market as well.
And so, and it's funny, 0:08:20.000 --> 0:08:22.720 if you look at Nvidia as a stock, I mean today, 0:08:22.760 --> 0:08:24.320 I mean this morning, it's about a lot of a 0:08:24.320 --> 0:08:26.640 two hundred and sixty or two hundred and seventy dollars 0:08:26.680 --> 0:08:29.920 a share. This was a ten to twenty dollar stock forever, 0:08:30.000 --> 0:08:33.319 and they did a four-to-one stock split recently, 0:08:33.400 --> 0:08:35.200 so that'd be more like, you know, like a two 0:08:35.240 --> 0:08:37.880 dollar and fifty cent to five dollar stock on today's 0:08:37.880 --> 0:08:40.560 basis for for years and years and years. And just 0:08:40.600 --> 0:08:44.640 the magnitude of the growth that we've had with these 0:08:44.640 --> 0:08:47.000 guys over over the last like five or ten years, 0:08:47.000 --> 0:08:51.040 particularly around their data center business and artificial intelligence and everything, 0:08:51.240 --> 0:08:54.000 has just been quite remarkable, and so the earnings have 0:08:54.040 --> 0:08:56.959 gone through the roof, and clearly the multiple that you're 0:08:57.000 --> 0:08:59.280 placing on those earnings has gone through the roof, because 0:08:59.440 --> 0:09:01.400 you know, the the view is that the opportunity here 0:09:01.440 --> 0:09:02.960 is massive and that we're early and there's a lot 0:09:02.960 --> 0:09:05.000 of runway ahead of us, and the stock, I mean, 0:09:05.000 --> 0:09:07.000 it's had its ups and downs, but in general it's 0:09:07.000 --> 0:09:07.640 been a home run. 0:09:08.200 --> 0:09:10.240 I definitely want to ask you about where we are 0:09:10.280 --> 0:09:14.800 in the sort of semiconductor stock price cycle. But before 0:09:14.840 --> 0:09:17.560 we get into that, you know, I will also bite 0:09:17.640 --> 0:09:21.240 on the really basic question that you already alluded to, 0:09:21.400 --> 0:09:26.560 but how does machine learning slash AI actually work?
You 0:09:26.640 --> 0:09:29.560 mentioned this idea of I guess processing a bunch of 0:09:29.640 --> 0:09:34.199 data in parallel versus I guess old style computing where 0:09:34.240 --> 0:09:36.960 it would be sequential. But like, talk to us about 0:09:37.000 --> 0:09:40.280 what is actually happening here and how does it fit 0:09:40.480 --> 0:09:42.200 into the semiconductor space. 0:09:43.360 --> 0:09:45.120 You bet. You bet. So let me let me first 0:09:45.160 --> 0:09:47.679 abstract this up and I'll give you a really contrived 0:09:47.720 --> 0:09:50.959 example just sort of simplistically about what's going on, and 0:09:51.000 --> 0:09:52.319 then we can go a little bit more into the 0:09:52.360 --> 0:09:55.199 actual details of what's happening. But let's imagine you want 0:09:55.200 --> 0:09:58.079 to have some kind of a neural net. But the 0:09:58.280 --> 0:10:01.079 machine learning is typically done with something called a neural network, 0:10:01.480 --> 0:10:03.600 and I'll talk about what that is in a moment. 0:10:03.600 --> 0:10:05.680 And let's let's just imagine, for example, you want to 0:10:05.679 --> 0:10:09.720 build an artificial intelligence, a neural network, to recognize 0:10:09.760 --> 0:10:13.040 pictures of cats. It's just saying, okay, let's imagine I've 0:10:13.080 --> 0:10:15.040 got this black box sitting in front of me, and 0:10:15.280 --> 0:10:17.680 it's got a slot on one side where I'm taking 0:10:17.720 --> 0:10:20.800 pictures and I'm feeding them in. It's got a display 0:10:20.880 --> 0:10:22.800 on the other side which tells me, yes, it's a 0:10:22.840 --> 0:10:25.360 cat or no it's not. And on the side of 0:10:25.400 --> 0:10:30.080 the box there are a billion knobs that you can turn, okay, 0:10:30.679 --> 0:10:34.160 and and they'll change various parameters of this model that 0:10:34.280 --> 0:10:36.520 right now are inside the black box.
Don't worry about 0:10:36.520 --> 0:10:38.920 what those parameters are, but there's there's knobs that can 0:10:39.000 --> 0:10:41.760 change them, and so effectively what you're doing when you're 0:10:42.480 --> 0:10:43.880 training the thing. And by the way, when you have 0:10:43.920 --> 0:10:45.440 the artificial intelligence, what you have is you have this 0:10:45.480 --> 0:10:48.320 big black box. You need to train it to do 0:10:48.400 --> 0:10:50.600 a specific task. That's what I'm going to talk about 0:10:50.600 --> 0:10:53.760 in a moment. That's called training, and then once it's trained, 0:10:53.800 --> 0:10:56.800 you need to use it for whatever task you've trained it for. 0:10:57.080 --> 0:10:59.280 That task is called inference. So you got to do 0:10:59.520 --> 0:11:02.040 the training and inference. So the training, here's what we have. 0:11:02.280 --> 0:11:04.160 I got my box with a slot and the display 0:11:04.160 --> 0:11:06.920 and a billion knobs. Okay. So what I do for 0:11:06.960 --> 0:11:09.360 the training process effectively is I take a picture, and 0:11:10.440 --> 0:11:12.400 a known picture, okay, so I know if it's a 0:11:12.440 --> 0:11:15.599 cat or not. I feed it into the box and 0:11:15.720 --> 0:11:18.400 I look at the display and it tells me yes 0:11:18.440 --> 0:11:20.240 it's a cat or no it's not, and it probably gets 0:11:20.280 --> 0:11:21.640 it wrong. And so then what I do is I 0:11:21.679 --> 0:11:25.240 turn some of the knobs and I feed another picture in, 0:11:26.160 --> 0:11:27.920 and then I turn some of the knobs, and I'm 0:11:27.920 --> 0:11:31.440 basically tuning all of the parameters and sort of measuring 0:11:31.559 --> 0:11:35.280 how accurate is this network at doing this task of recognizing 0:11:35.360 --> 0:11:36.679 is this a picture of a cat or is it not?
0:11:37.400 --> 0:11:42.200 And I keep feeding pictures in, known pictures, a known data set, 0:11:42.679 --> 0:11:45.080 and I keep playing with all the knobs until the 0:11:45.120 --> 0:11:47.040 accuracy of the thing is wherever I want it to be. 0:11:47.120 --> 0:11:50.480 So yes, it's decided that that now it's very good 0:11:50.520 --> 0:11:52.840 at recognizing is this a picture of a cat or is 0:11:52.840 --> 0:11:55.600 it not. At that point, my model, my box is trained. 0:11:56.240 --> 0:11:58.280 I now lock all of those knobs in place, I 0:11:58.280 --> 0:12:00.720 don't move them anymore, and I use it. Now I 0:12:00.720 --> 0:12:02.839 can just feed in pictures and it'll tell me yes, 0:12:02.880 --> 0:12:05.360 it's a cat, or no, it's not. And so the process 0:12:05.400 --> 0:12:07.920 of training this model is what, that's really what it's about. 0:12:07.920 --> 0:12:11.079 It's about varying all of the parameters. And by the way, 0:12:11.120 --> 0:12:14.480 these models can have billions or hundreds of billions or 0:12:14.480 --> 0:12:17.679 even more parameters that can be changed. And 0:12:17.720 --> 0:12:20.920 that's the process of training. You're basically trying to optimize 0:12:20.960 --> 0:12:24.240 this this sort of situation. I'm changing the parameters a 0:12:24.280 --> 0:12:26.960 little bit at a time such that I can optimize 0:12:27.000 --> 0:12:29.040 the response of this thing such that I can 0:12:29.080 --> 0:12:33.280 get the performance of it, the accuracy of the network 0:12:33.320 --> 0:12:36.040 to be high. So that's the training process, and it 0:12:36.120 --> 0:12:39.040 is very very compute intensive, because you can imagine, if 0:12:39.040 --> 0:12:41.480 I've got a billion different knobs I'm turning, I'm trying 0:12:41.520 --> 0:12:43.640 to optimize the output, that takes a lot of compute.
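Editor's sketch, not part of the recorded conversation: the black box with knobs can be imitated with a three-knob toy model. Everything here is an assumption made for illustration: the two made-up feature scores per "picture", the data set `LABELED_PICTURES`, and the knob-turning rule, which is the classic perceptron update rather than the gradient-based method used for real networks. It shows the loop described above: feed in a known picture, check the display against the known answer, nudge the knobs when it is wrong, and stop once the accuracy is where you want it.

```python
# Hypothetical labeled data set: two made-up feature scores per picture,
# paired with the known answer (1 = cat, 0 = not a cat).
LABELED_PICTURES = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.6), 1),
    ((0.2, 0.1), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0),
]


def predict(knobs, features):
    """The 'display' on the box: 1 for cat, 0 for not a cat."""
    w1, w2, bias = knobs
    return 1 if w1 * features[0] + w2 * features[1] + bias > 0 else 0


def accuracy(knobs, examples):
    """Fraction of known pictures the box gets right."""
    correct = sum(predict(knobs, feats) == label for feats, label in examples)
    return correct / len(examples)


def train(examples, step=0.1, max_passes=100):
    """Turn the knobs a little after every wrong answer (perceptron rule)."""
    knobs = [0.0, 0.0, 0.0]
    for _ in range(max_passes):
        mistakes = 0
        for features, label in examples:
            error = label - predict(knobs, features)
            if error:
                mistakes += 1
                knobs[0] += step * error * features[0]
                knobs[1] += step * error * features[1]
                knobs[2] += step * error
        if mistakes == 0:  # every known picture is classified correctly
            break
    return tuple(knobs)  # a tuple: the knobs are now locked in place


trained_knobs = train(LABELED_PICTURES)
print(accuracy(trained_knobs, LABELED_PICTURES))
# Inference: the knobs stay fixed and the box is simply applied to new input.
print(predict(trained_knobs, (0.85, 0.7)))
```

Training touches every knob repeatedly over the whole data set, while the final `predict` call is a single pass with fixed knobs, which mirrors the difference in compute cost between training and inference.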
0:12:43.960 --> 0:12:47.280 The inference process, once all that's done, is much less compute 0:12:47.280 --> 0:12:50.640 intensive because I'm not changing anything. I'm just applying the 0:12:50.679 --> 0:12:53.559 network as it is to whatever data that I'm feeding 0:12:53.559 --> 0:12:55.480 in at that point. But I'm not changing anything. But I 0:12:55.559 --> 0:12:57.240 may be doing a lot more, that's the difference of 0:12:57.320 --> 0:12:58.679 the inference. I may be using it all the time, 0:12:58.720 --> 0:13:01.280 whereas once I've trained the model, I've trained it. So it's 0:13:01.280 --> 0:13:04.000 more like a one and done versus like a continual 0:13:04.080 --> 0:13:04.679 use sort of thing. 0:13:05.160 --> 0:13:07.160 Since you said, we're getting into sort of the 0:13:07.240 --> 0:13:12.199 economics of training versus inference. A, is there sort of 0:13:12.240 --> 0:13:14.440 any way to get a sense of, like, let's say 0:13:14.679 --> 0:13:18.000 Tracy and me start Odd Lots GPT. It's a competitor 0:13:18.080 --> 0:13:21.000 to ChatGPT, a competitor to OpenAI. Like, what are 0:13:21.040 --> 0:13:23.199 we thinking of in terms of just that scale? How 0:13:23.280 --> 0:13:27.400 much we're spending on compute on the training part? Then 0:13:27.440 --> 0:13:30.520 how much our recurring costs in terms of inference are? 0:13:30.920 --> 0:13:33.280 And then I'm also just curious, like also, like I 0:13:33.640 --> 0:13:36.280 know you said the inference is much cheaper, but how 0:13:36.360 --> 0:13:41.120 much cheaper is it versus, say, asking Google a question? How 0:13:41.200 --> 0:13:43.960 much more expensive is it? How much more expensive is 0:13:44.000 --> 0:13:47.320 a ChatGPT query or an Odd Lots GPT query 0:13:47.520 --> 0:13:49.520 versus just a normal Google search? 0:13:50.000 --> 0:13:52.080 Yeah, now you get, and by the way, when I say cheaper.
0:13:52.080 --> 0:13:54.800 It's like for any given single use, right? Again, 0:13:54.840 --> 0:13:56.480 if I've got if I'm if I've got like one 0:13:56.520 --> 0:13:58.719 hundred billion different inference activities, maybe it's not. 0:13:58.880 --> 0:13:59.840 It's still expensive. 0:14:00.360 --> 0:14:02.400 Yeah. But I first want to talk about it, just 0:14:02.400 --> 0:14:04.160 just really quickly about like, so that, this is my 0:14:04.200 --> 0:14:07.760 big abstract, contrived example about what's going on. If if 0:14:07.800 --> 0:14:10.000 I go just a little bit deeper about what what 0:14:10.040 --> 0:14:11.880 this thing is, like, let's talk just briefly about a 0:14:11.920 --> 0:14:13.959 neural network, and then I will get to your question, but 0:14:14.559 --> 0:14:17.120 it kind of influences it. So think, what is a 0:14:17.160 --> 0:14:19.640 neural network? If I was to draw like a representation of 0:14:19.640 --> 0:14:21.160 a neural network for you, what I would do is 0:14:21.200 --> 0:14:24.000 I have a bunch of circles. Each of the circles 0:14:24.000 --> 0:14:25.760 would be a neuron, and I wish I was there. 0:14:25.760 --> 0:14:28.200 I could draw a picture for you. But imagine like... 0:14:27.960 --> 0:14:30.680 Send a picture after you're done, send a picture and we'll 0:14:30.720 --> 0:14:31.840 run it with the episode. 0:14:31.840 --> 0:14:34.200 We'll run it with the... Okay, okay, I can, I 0:14:34.200 --> 0:14:34.480 can do. 0:14:34.520 --> 0:14:38.760 There your a hand drawn explanation of these are varies. 0:14:39.400 --> 0:14:42.680 These are varies and fine, but anyways, but imagine like 0:14:42.720 --> 0:14:44.720 I've got like a group of circles. I've got like 0:14:44.760 --> 0:14:47.720 a column, you know, in column one with like three circles, 0:14:47.720 --> 0:14:50.160 and then column two, I've got I don't know, three 0:14:50.200 --> 0:14:52.520 or four circles, and column three, I've got some circles.
0:14:52.760 --> 0:14:55.160 These are my neurons. And imagine I've got arrows that 0:14:55.200 --> 0:14:58.960 are connecting each circle, the circles in one row, 0:14:59.000 --> 0:15:00.720 to all of the circles in the next row. Those 0:15:00.760 --> 0:15:03.280 are my connections between my neurons. So you can see 0:15:03.280 --> 0:15:05.880 it looks like kind of a net or a network. Okay. 0:15:06.520 --> 0:15:09.960 And so within each circle, I've got some, which is what's 0:15:10.000 --> 0:15:12.480 called an activation function. So what each circle does is it 0:15:12.520 --> 0:15:16.120 takes an input, the arrow that's coming into it, and 0:15:16.160 --> 0:15:18.720 it has to decide based on those inputs, do I 0:15:18.800 --> 0:15:22.520 send an output out out the other side or not? Right. 0:15:22.840 --> 0:15:25.960 So there's some certain threshold. If the inputs reach some 0:15:26.040 --> 0:15:28.200 amount of threshold, the neuron will fire, just just like 0:15:28.240 --> 0:15:31.760 the neuron in your brain. Okay. Each each neuron can 0:15:31.800 --> 0:15:33.800 have more than one input coming in from from more 0:15:33.840 --> 0:15:36.480 than one neuron in the previous... These are called layers, 0:15:36.480 --> 0:15:38.840 by the way, these rows of circles. It can have more 0:15:38.840 --> 0:15:41.360 than one input from the different neurons in the previous layer, 0:15:41.640 --> 0:15:44.600 and that the neuron can weight those those different inputs 0:15:44.640 --> 0:15:46.720 differently. So it can say, you know, from from 0:15:46.920 --> 0:15:48.600 this one neuron, I'm going to give that a fifty 0:15:48.640 --> 0:15:50.680 percent weight, and from the other neuron, only weight it at 0:15:50.680 --> 0:15:52.640 twenty percent. I'm not going to take the full signal. 0:15:53.040 --> 0:15:57.400 So those are called the weights of the network.
And 0:15:57.440 --> 0:16:01.160 so each neuron has inputs coming in and outputs going out, 0:16:01.200 --> 0:16:02.760 and each of those inputs and outputs will have a 0:16:02.760 --> 0:16:04.960 weight associated with it. So those those are where I 0:16:05.000 --> 0:16:08.320 talk about those knobs, those parameters. Yeah, those weights are 0:16:08.400 --> 0:16:11.800 are one set of parameters. And then within each neuron 0:16:12.000 --> 0:16:15.600 there's there's basically, there's a certain threshold. With all those, 0:16:15.640 --> 0:16:17.760 all those signals coming in, when you add them up, 0:16:17.760 --> 0:16:20.560 if they reach a certain threshold, then the neuron fires. Okay. 0:16:20.720 --> 0:16:23.080 So that that threshold is called the bias, and you 0:16:23.120 --> 0:16:25.520 can tune that. Like I can have a really sensitive 0:16:25.560 --> 0:16:28.080 neuron where if the bias doesn't, I don't need a 0:16:28.080 --> 0:16:29.920 lot of signal coming in to make it fire. I 0:16:29.920 --> 0:16:32.200 can have a neuron that's less sensitive. I need a 0:16:32.200 --> 0:16:35.560 lot of signal coming in for it to fire. That's called a bias. 0:16:35.600 --> 0:16:37.520 That that that's also a parameter. So those are the 0:16:37.560 --> 0:16:41.440 parameters that you're setting. The structure of the network itself, 0:16:41.480 --> 0:16:43.640 the number of neurons and the number of layers and 0:16:43.640 --> 0:16:46.640 everything, that's that's sort of set, and then you're trying 0:16:46.680 --> 0:16:50.160 to determine these weights and biases. And again, just just 0:16:50.200 --> 0:16:53.160 to level set you, ChatGPT, which you've been getting 0:16:53.160 --> 0:16:56.360 excited about, has one hundred and seventy five billion separate 0:16:56.400 --> 0:17:00.400 parameters that get set during their, during the training process. Okay. 0:17:00.640 --> 0:17:02.640 So that's that's kind of what's what's going on.
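Editor's sketch, not part of the recorded conversation: the neuron described above, with weighted inputs and a firing threshold, fits in a few lines. The function names and the example layer sizes are assumptions chosen to match the description (a 50 percent weight, a 20 percent weight, and columns of three, four, and three circles). Real networks generally use smooth activation functions and add the bias to the weighted sum rather than applying a hard threshold, so treat this as a simplification.

```python
def neuron_fires(inputs, weights, threshold):
    """Weight each incoming signal, add them up, and compare to the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum >= threshold


def count_parameters(layer_sizes):
    """Count the knobs in a fully connected network: one weight per arrow
    between neighboring layers, plus one bias per neuron after the input layer."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases


signals = [1.0, 1.0]   # full signal from two neurons in the previous layer
weights = [0.5, 0.2]   # one input weighted at 50 percent, the other at 20 percent

print(neuron_fires(signals, weights, threshold=0.6))  # True: a sensitive neuron
print(neuron_fires(signals, weights, threshold=0.9))  # False: a less sensitive neuron
print(count_parameters([3, 4, 3]))  # 31
```

The tiny three-layer drawing already has 31 parameters (24 weights and 7 biases); the same counting, applied to far larger layer structures, is how a model reaches the hundreds of billions of parameters mentioned in the conversation.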
0:17:19.440 --> 0:17:21.640 Before you talk about the economics, can I just ask, 0:17:21.800 --> 0:17:24.920 so one of the things about the technology is it's 0:17:24.960 --> 0:17:28.360 sort of, it's supposed to be iterative, right, like it's 0:17:28.480 --> 0:17:31.440 learning as it goes along. Can you talk just briefly 0:17:31.480 --> 0:17:36.760 maybe about how it's incorporating like new inputs as it develops? 0:17:37.880 --> 0:17:40.639 Yeah. So when when you, when you're training, let's talk 0:17:40.640 --> 0:17:43.760 about training now. So when you train the network, it 0:17:43.880 --> 0:17:47.000 happens on a static data set. Okay, so you have 0:17:47.080 --> 0:17:49.359 to start with a data set, right, and in terms 0:17:49.359 --> 0:17:53.159 of ChatGPT, that is, you know, it has a 0:17:53.400 --> 0:17:56.000 large corpus of data that it was trained on. It 0:17:56.040 --> 0:17:58.399 was, there's a lot of data from the Internet and 0:17:58.400 --> 0:17:59.680 from other sources. 0:17:59.359 --> 0:18:02.439 Right, basically trained it on, like, all of the Internet, 0:18:03.200 --> 0:18:06.920 but also a lot of Reddit. So it's like we've, right, 0:18:07.080 --> 0:18:09.120 like is it like we've trained just like the greatest 0:18:09.119 --> 0:18:11.120 brain of all time is like Reddit-pilled. 0:18:11.800 --> 0:18:13.880 Now it talks like a seventeen year old boy. 0:18:14.400 --> 0:18:16.440 So there's a lot of data and and so yes, 0:18:16.560 --> 0:18:18.639 I sort of, how does that data get get, you know, 0:18:19.560 --> 0:18:22.760 incorporated into... I don't want to get too, sort of 0:18:22.760 --> 0:18:24.480 getting too compli... I don't want to get too complicated. 0:18:24.760 --> 0:18:26.600 Let me talk about how standard training works, and 0:18:26.600 --> 0:18:28.400 then we can talk about ChatGPT because that uses 0:18:28.440 --> 0:18:30.760 a different kind of model. It's called a transformer model.
0:18:30.840 --> 0:18:33.639 But anyways, but when when I'm training this, so, so 0:18:33.680 --> 0:18:35.800 what happens is is I feed this stuff, that there's 0:18:35.840 --> 0:18:38.600 a there's a process called, it's called backpropagation. Basically 0:18:38.680 --> 0:18:42.879 what you do is you sort of feed this stuff 0:18:42.920 --> 0:18:46.679 through through this, through the network itself, and then you 0:18:46.720 --> 0:18:48.680 work it backwards, and you're basically, what you're doing is 0:18:48.720 --> 0:18:51.480 you're measuring the output against a known response. I want 0:18:51.480 --> 0:18:54.480 to sort of, you know, that's my my cat picture. 0:18:54.560 --> 0:18:56.080 Is it a cat or is it not a cat, right? 0:18:56.119 --> 0:18:58.160 I'm trying to minimize the difference between, because I want 0:18:58.160 --> 0:19:00.080 to be accurate. Right. So what you sort of 0:19:00.160 --> 0:19:03.280 do is you roll a certain step through the network, right? 0:19:03.320 --> 0:19:06.040 You measure the output against the, against the known, what 0:19:06.200 --> 0:19:08.400 it should be. And then there's a process that's called 0:19:08.480 --> 0:19:11.200 backpropagation, where what you're doing, you're actually, what you 0:19:11.200 --> 0:19:14.160 calculate is what's called the gradients of all of these things. 0:19:14.160 --> 0:19:16.119 You're basically looking at sort of like the sort of 0:19:16.119 --> 0:19:19.720 like the rate of change of of these different parameters, 0:19:19.720 --> 0:19:23.000 and you sort of work the network backwards, and that 0:19:23.160 --> 0:19:25.400