WEBVTT - Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip 0:00:02.720 --> 0:00:13.960 Bloomberg Audio Studios, Podcasts, Radio News. 0:00:18.600 --> 0:00:22.239 Hello and welcome to another episode of the Odd Lots podcast. 0:00:22.360 --> 0:00:24.840 I'm Joe Weisenthal and I'm Tracy Alloway. 0:00:25.040 --> 0:00:29.600 Tracy, I have to say, unfortunately, I don't have AI psychosis. 0:00:29.640 --> 0:00:31.040 I'm certain of that. Debatable. 0:00:31.640 --> 0:00:34.240 I'm pretty sure. I'm pretty sure I don't have AI psychosis. 0:00:34.440 --> 0:00:39.400 I do have to say, unfortunately, like the amount of 0:00:39.640 --> 0:00:43.320 time now where it's like it feels like AI-related 0:00:43.440 --> 0:00:47.360 questions, and there's many of them, are sort of like 0:00:47.760 --> 0:00:51.000 swallowing up the other thoughts that I have in my head, 0:00:51.240 --> 0:00:55.279 whether it's questions about which model's best and why, and 0:00:55.360 --> 0:00:58.880 what are the economics of inference and how much training 0:00:59.120 --> 0:01:02.200 is pre-training versus post-training for each model. 0:01:02.600 --> 0:01:05.160 Like it's just sort of like this blob that's 0:01:05.200 --> 0:01:07.240 growing, that's taking up more and more of my thoughts. 0:01:07.319 --> 0:01:10.640 What is your definition of AI psychosis? Because one would 0:01:10.800 --> 0:01:14.600 argue that maybe thinking about AI literally all the time 0:01:14.920 --> 0:01:16.240 would be a form of psychosis. 0:01:16.240 --> 0:01:18.240 Well, let's just say, like, I'm not the type who 0:01:18.280 --> 0:01:21.039 thinks that, like, I don't, like, think that the AI 0:01:21.360 --> 0:01:24.240 is a friend, for one thing. I'm not in love 0:01:24.280 --> 0:01:27.280 with the AI models. I don't think that in collaboration 0:01:27.560 --> 0:01:32.160 with ChatGPT, that I'm stumbling on a unified theory of 0:01:32.240 --> 0:01:33.959 physics and things like that.
0:01:34.319 --> 0:01:36.520 So like, but you do spend a lot of time 0:01:37.160 --> 0:01:40.959 inputting instructions, pressing the button, yes, what comes out, and 0:01:41.000 --> 0:01:41.720 see what comes out. 0:01:41.840 --> 0:01:44.039 I'm just saying I think I'm aware that I'm talking 0:01:44.080 --> 0:01:47.920 to a machine and that we're not establishing any great breakthroughs 0:01:48.360 --> 0:01:51.279 of which we are collaborators and partners and friends. 0:01:51.320 --> 0:01:54.080 Recognizing you have a problem is the first step towards 0:01:54.080 --> 0:01:57.760 healing, Joe. Seriously, though, there's a good reason to 0:01:57.840 --> 0:02:01.080 think about AI more and more, which is that a huge 0:02:01.160 --> 0:02:03.160 chunk of not just the market, but the real economy 0:02:03.200 --> 0:02:05.840 is now revolving around AI, right? Totally. 0:02:05.880 --> 0:02:09.480 So anyway, again, within the AI conversation, there are a 0:02:09.520 --> 0:02:13.640 lot of subcategories. One of the subcategories happens to be 0:02:13.800 --> 0:02:16.760 another Odd Lots favorite topic, which is chips. Of course, 0:02:16.880 --> 0:02:19.560 chips are used in multiple different ways. The chips are 0:02:19.639 --> 0:02:22.639 used in different parts of the AI supply chain, different 0:02:22.639 --> 0:02:24.640 types of chips have different roles, and so we have 0:02:24.680 --> 0:02:25.200 to learn more. 0:02:25.240 --> 0:02:26.919 We have to learn more, and I have to say 0:02:27.000 --> 0:02:30.280 I'm particularly interested in the company we're about to speak 0:02:30.320 --> 0:02:33.519 to partly because the two things I know about them 0:02:33.600 --> 0:02:37.119 are number one, they just had a huge IPO, yep, right, 0:02:37.280 --> 0:02:40.119 raising something like five point five billion dollars at a kind 0:02:40.160 --> 0:02:43.000 of insane multiple.
I can't even do a price to 0:02:43.040 --> 0:02:46.440 earnings multiple because they're not profitable yet, but I think 0:02:46.520 --> 0:02:49.600 just on a sales basis, it was like sixty seven 0:02:49.639 --> 0:02:54.520 times forward earnings, which is pretty juicy, pretty hot. And 0:02:54.560 --> 0:02:57.200 the second thing I know about the company is they 0:02:57.200 --> 0:03:01.000 make giant wafers, which is just a fun image 0:03:01.520 --> 0:03:02.080 in your head. 0:03:02.280 --> 0:03:02.720 That's right. 0:03:02.840 --> 0:03:05.919 So if you were thinking it's like, okay, there is 0:03:06.000 --> 0:03:10.120 a hot entrant in this space. What is their differentiator? Well, 0:03:10.160 --> 0:03:13.359 one fact about them is their chips are just enormous. 0:03:13.400 --> 0:03:15.840 About the size of a dinner plate. One might think 0:03:15.880 --> 0:03:18.120 you're reading an Onion article, but in fact it's real 0:03:18.240 --> 0:03:23.160 and apparently it actually has some real technical advantages. 0:03:22.840 --> 0:03:25.240 And it's different from what everyone else is doing. 0:03:25.280 --> 0:03:27.560 So everyone else is, I guess, doing this sort of 0:03:27.600 --> 0:03:30.359 like modular networking thing where you get together a bunch 0:03:30.400 --> 0:03:32.760 of chips and you connect them together and that's how 0:03:32.800 --> 0:03:36.400 you get more compute, more memory, more power basically. But 0:03:36.520 --> 0:03:38.760 this company has done something different in the form of 0:03:39.200 --> 0:03:39.920 the giant wafers. 0:03:40.000 --> 0:03:42.520 The giant wafer, and if you figure that to get 0:03:42.600 --> 0:03:46.400 maximum performance, you sort of want to lessen the distance 0:03:46.480 --> 0:03:48.200 between things, then put it all 0:03:48.080 --> 0:03:48.720 on one wafer. 0:03:48.760 --> 0:03:51.800 Anyway, we're gonna learn a lot more. I'm very sad 0:03:52.680 --> 0:03:53.600 about giant wafers.
0:03:53.640 --> 0:03:53.760 More. 0:03:53.800 --> 0:03:56.040 I'm very excited to say we do have the founder 0:03:56.400 --> 0:03:59.520 and CEO of Cerebras on the podcast, Andrew Feldman. 0:03:59.600 --> 0:04:00.520 Truly the perfect guest. 0:04:00.600 --> 0:04:03.160 So, Andrew, thank you so much for coming on the 0:04:03.200 --> 0:04:04.960 podcast on the week of your IPO. 0:04:05.280 --> 0:04:07.280 Well, thank you so much for having me. What a pleasure. 0:04:07.440 --> 0:04:09.760 Absolutely. Why don't you just start us 0:04:09.600 --> 0:04:11.600 off: the big giant chip. 0:04:11.680 --> 0:04:14.640 They're apparently real. They're as big as a dinner plate. 0:04:14.920 --> 0:04:19.320 What is the technical reason why this actually makes sense 0:04:19.760 --> 0:04:23.359 as a superior form of architecture for at least some 0:04:23.640 --> 0:04:24.520 aspect of AI? 0:04:25.200 --> 0:04:29.680 I think larger chips process more information in less time, okay, and 0:04:29.640 --> 0:04:31.360 that produces faster results. 0:04:32.200 --> 0:04:36.160 And everybody had gone to bigger chips. Nvidia had 0:04:36.200 --> 0:04:40.039 moved from four hundred square millimeters to eight hundred square 0:04:40.080 --> 0:04:41.320 millimeters over 0:04:41.120 --> 0:04:43.440 the course of five or six years for 0:04:43.360 --> 0:04:48.000 this exact reason, and in the compute industry wafer scale, 0:04:48.040 --> 0:04:50.080 which is building a chip for. 0:04:50.000 --> 0:04:52.520 Those, by the way, for those who are just listening, 0:04:52.600 --> 0:04:54.840 Andrew's now holding up the chip, and yes, it looks 0:04:55.160 --> 0:04:57.760 it actually looks bigger than a dinner plate, to be honest. 0:04:57.800 --> 0:05:01.359 But that is a big, that's a big chip. 0:05:00.640 --> 0:05:03.400 That's fifty, I think. 0:05:03.440 --> 0:05:05.719 It's fifty eight times larger than any other chip that 0:05:05.760 --> 0:05:09.800 had ever been. Wow.
And what it did was it 0:05:09.839 --> 0:05:13.560 allowed us to use a different type of memory, a 0:05:13.600 --> 0:05:17.240 type of memory that at the beginning, there are two 0:05:17.279 --> 0:05:19.440 types of memory. There's memory that can store a lot, 0:05:20.080 --> 0:05:23.120 but it's really slow, and there's memory that can't store 0:05:23.279 --> 0:05:27.400 very much per square millimeter, but it's blisteringly fast. And 0:05:27.960 --> 0:05:34.120 historically all graphics processing units use this memory that could 0:05:34.120 --> 0:05:37.200 store a lot but was really slow, and that's the 0:05:37.279 --> 0:05:40.000 reason they do inference so slowly. So if you're using 0:05:40.040 --> 0:05:43.479 Claude right now, or you're using anything but ChatGPT, 0:05:44.000 --> 0:05:47.240 what you frequently feel is you'll enter your prompt and 0:05:47.279 --> 0:05:52.080 you'll wait for an answer, right? And that's because the 0:05:52.200 --> 0:05:54.360 memory is slow and they have to move a ton 0:05:54.360 --> 0:05:58.040 of information from memory to compute. Now, by going to 0:05:58.560 --> 0:06:02.960 wafer scale, we could use this fast memory. Now we couldn't make 0:06:03.000 --> 0:06:07.120 that memory store more information per square millimeter, but we 0:06:07.160 --> 0:06:11.119 could add square millimeters, and so by building this big chip, 0:06:11.400 --> 0:06:14.520 we were able to stuff it to the gills with 0:06:14.560 --> 0:06:18.920 this fast memory. And that's why we're fifteen times faster 0:06:19.000 --> 0:06:23.480 than the fastest GPU. That's why on some problems we're fifty, 0:06:23.520 --> 0:06:27.720 one hundred, even one thousand times faster than graphics processing units. 0:06:28.000 --> 0:06:31.200 Wait, can you explain how you actually managed to do this?
0:06:31.440 --> 0:06:34.080 Because I know there have been previous attempts to do 0:06:34.360 --> 0:06:37.160 wafer scale, and I seem to remember there was even 0:06:37.200 --> 0:06:39.720 like an early attempt in the nineteen eighties or something 0:06:39.800 --> 0:06:42.640 to do it. How are you able to pull this off? 0:06:42.960 --> 0:06:46.080 Yeah, it was an ambitious undertaking, that's for sure. Every 0:06:46.120 --> 0:06:49.360 previous effort in the seventy-five-year history of our 0:06:49.400 --> 0:06:54.320 industry had failed, including Gene Amdahl, who's sort of on 0:06:54.360 --> 0:06:57.839 the Mount Rushmore of compute in our industry. He failed 0:06:58.200 --> 0:07:00.920 sort of spectacularly in the mid-eighties at a company 0:07:01.000 --> 0:07:05.920 called Trilogy. Not only that, but after we succeeded, people 0:07:05.960 --> 0:07:08.560 who had visited us, who'd been in our labs, tried 0:07:08.600 --> 0:07:12.040 to copy us, and they also failed. And so what 0:07:12.080 --> 0:07:14.440 we were able to do is solve a set of 0:07:14.680 --> 0:07:18.760 really fundamental problems, and those problems cut across a wide 0:07:18.880 --> 0:07:22.760 swath of technology. They cut across lithography, so we had 0:07:22.800 --> 0:07:25.880 to collaborate closely with TSMC, and they turned out to 0:07:25.880 --> 0:07:29.720 be a great partner. We had to make inventions in 0:07:29.800 --> 0:07:33.240 material and packaging. That's how you put a processor, or 0:07:33.280 --> 0:07:35.640 how you put a piece of silicon, on a motherboard and 0:07:35.840 --> 0:07:39.920 deliver power and I/O to it. We had to make 0:07:40.000 --> 0:07:44.160 inventions in power delivery, right? When you build a giant chip, 0:07:44.200 --> 0:07:46.480 you're going to deliver way more power to it than 0:07:46.520 --> 0:07:48.960 if you do a chip the size of a postage stamp. 0:07:49.400 --> 0:07:51.640 We had to invent ways to cool it.
We had 0:07:51.680 --> 0:07:52.840 to write new types of 0:07:52.800 --> 0:07:54.080 software that ran on it. 0:07:54.640 --> 0:07:57.880 All of these had never been done before, and it 0:07:57.960 --> 0:08:01.760 was a decade-long process. It took us five years 0:08:01.800 --> 0:08:04.880 and about five hundred million dollars to deliver the first one, 0:08:05.480 --> 0:08:10.080 and it's been an extraordinary run since. In December, we 0:08:10.200 --> 0:08:13.360 signed a deal with OpenAI north of twenty billion dollars, 0:08:13.400 --> 0:08:16.840 one of the largest contracts ever signed in Silicon Valley, 0:08:17.280 --> 0:08:19.600 and then in March we signed a deal with 0:08:19.680 --> 0:08:24.280 AWS where they would deploy our systems in their data 0:08:24.320 --> 0:08:27.680 centers, in their AWS data centers, and so it's just 0:08:27.720 --> 0:08:30.880 been an extraordinary run. But it took a long time. 0:08:31.160 --> 0:08:34.800 It took extraordinary engineering, and there were certainly long periods 0:08:34.840 --> 0:08:37.000 of time when it wasn't clear we were going to 0:08:37.040 --> 0:08:37.679 make this work. 0:08:38.000 --> 0:08:41.400 Obviously you've hit this remarkable milestone, you have in fact 0:08:41.640 --> 0:08:45.959 IPO'd and so forth, and right now market's valuing your 0:08:45.960 --> 0:08:49.680 company at sixty four billion dollars, early days of the IPO. 0:08:50.000 --> 0:08:53.839 Just for the listener to understand, the chips, are 0:08:53.840 --> 0:08:57.560 they solely in inference as opposed to, you know, in training? 0:08:57.559 --> 0:09:01.000 When we think about AI, I think about, okay, there's training, training 0:09:01.040 --> 0:09:03.360 the model, and then answer giving, that's the inference. 0:09:03.640 --> 0:09:05.680 Are the chips just for inference? 0:09:05.800 --> 0:09:08.400 So a couple things. I think you framed it exactly right.
0:09:08.520 --> 0:09:11.320 Training is how we make AI, and inference is how 0:09:11.360 --> 0:09:15.800 we use AI. And so what happened was that in 0:09:16.080 --> 0:09:18.000 sort of twenty twenty five, in the first part of 0:09:18.040 --> 0:09:21.480 twenty twenty five, the models we made were smart enough 0:09:21.520 --> 0:09:24.960 to be useful, and there was an explosion of use. 0:09:26.120 --> 0:09:28.160 And we use AI by doing inference. So there was 0:09:28.160 --> 0:09:32.520 this sort of tidal wave of demand on inference, and 0:09:32.559 --> 0:09:34.760 that has continued in twenty twenty six, and we think 0:09:34.800 --> 0:09:38.480 it will continue for years and years to come. And 0:09:38.559 --> 0:09:43.400 so that's what had happened. In twenty fifteen, when we 0:09:43.520 --> 0:09:46.920 began thinking about the company, we knew that AI was 0:09:46.920 --> 0:09:49.079 on the horizon and it would eat a huge amount 0:09:49.120 --> 0:09:54.199 of compute, right? And we made sort of two fundamental bets. 0:09:54.400 --> 0:09:59.719 We bet that it would need dedicated silicon, and, right, 0:10:00.000 --> 0:10:02.520 graphics had needed dedicated silicon, that's how you got 0:10:02.360 --> 0:10:03.800 the graphics processing unit. 0:10:04.160 --> 0:10:07.000 Mobile compute had needed dedicated compute. 0:10:07.040 --> 0:10:08.640 That's where you got ARM processors. 0:10:09.320 --> 0:10:11.160 We made that bet, and we made a bet that 0:10:11.800 --> 0:10:15.400 modifying the GPU architecture wouldn't be right. You needed to 0:10:15.440 --> 0:10:18.080 start with a clean sheet of paper. And so what 0:10:18.160 --> 0:10:22.400 we started with was a new vision, and that vision 0:10:22.440 --> 0:10:25.920 could do training and it could do inference, and it 0:10:25.960 --> 0:10:30.280 was orders of magnitude faster at both.
But right now 0:10:30.320 --> 0:10:34.200 what we're seeing is such an explosion in demand for 0:10:34.320 --> 0:10:36.680 inference that a lot of the business this minute is inference, 0:10:36.920 --> 0:10:40.920 even though we're just as fast, the same amount 0:10:41.000 --> 0:10:43.160 faster than GPUs, on training. 0:10:43.320 --> 0:10:43.959 That's interesting. 0:10:44.000 --> 0:10:46.679 Maybe we'll get more to the theoretical training market a 0:10:46.720 --> 0:10:47.240 little later. 0:10:47.600 --> 0:10:49.199 Just real quick on inference. 0:10:49.320 --> 0:10:52.640 Ben Thompson, who writes a newsletter about tech, he wrote 0:10:52.640 --> 0:10:56.640 a piece in which he distinguishes between answer inference and 0:10:56.720 --> 0:11:02.520 agentic. So answer inference is like, format my resume or whatever, 0:11:02.880 --> 0:11:05.240 or write me an essay on X or Y, or 0:11:05.240 --> 0:11:08.400 answer some questions, and then agentic inference is like, okay, 0:11:08.440 --> 0:11:11.360 here's this thing that's going to go around, do you 0:11:11.440 --> 0:11:15.440 distinguish, and do services for you, not producing visual answers. 0:11:15.520 --> 0:11:18.440 Do you distinguish between those two? Is that a real 0:11:18.480 --> 0:11:21.800 divide in your view? And can your chips do both? 0:11:22.120 --> 0:11:25.640 Our chips can do both. I think it is a divide. Okay. 0:11:25.800 --> 0:11:28.000 I think speed 0:11:27.960 --> 0:11:29.280 matters equally in both. 0:11:29.480 --> 0:11:29.800 Okay. 0:11:30.360 --> 0:11:33.800 I think if you are engaged with the AI, if 0:11:33.800 --> 0:11:37.000 you're writing code, which is agentic, if you're writing code 0:11:37.080 --> 0:11:41.040 or you're doing work, nobody wants to wait. I mean, 0:11:41.480 --> 0:11:43.400 we could just turn the question around and say, well, 0:11:43.400 --> 0:11:46.959 how big is the market for slow search? Zero. How 0:11:46.960 --> 0:11:49.240 big is the market for dial-up internet? Zero.
0:11:49.280 --> 0:11:52.680 Why is that? Because nobody wants to wait, right? 0:11:52.760 --> 0:11:56.040 So, if you're engaged with the AI, speed is of 0:11:56.080 --> 0:11:59.760 the essence. But if the AI is doing agentic work 0:12:00.559 --> 0:12:04.480 and your competitor gets three times, five times, ten times 0:12:04.480 --> 0:12:07.520 as much work done in twenty minutes than you do, 0:12:07.800 --> 0:12:11.760 you're gonna get smoked. And so this notion somehow that's 0:12:11.880 --> 0:12:16.520 been proposed that speed isn't very important in agentic flows 0:12:16.640 --> 0:12:20.520 is dead wrong. Speed is important in all aspects 0:12:20.559 --> 0:12:24.280 of productive work, and your ability to get more 0:12:24.360 --> 0:12:29.400 done in less time is a fundamental advantage that accrues 0:12:29.520 --> 0:12:33.680 over time. Right? If, while your competitor is doing one 0:12:33.800 --> 0:12:37.640 unit of work, you can do three, and in the 0:12:37.679 --> 0:12:41.600 next time they do one unit of work, you do six. Sure, right, 0:12:41.679 --> 0:12:46.280 this adds up over time and you beat them in 0:12:46.360 --> 0:12:49.719 any line of work. And so speed, which is sort 0:12:49.720 --> 0:12:53.079 of our specialty, is important across the board. 0:12:53.400 --> 0:12:56.520 What do giant wafers and speed in general actually mean 0:12:56.679 --> 0:13:00.160 for, I guess, the economics of tokens? Because one way 0:13:00.760 --> 0:13:02.839 I think about it, I have this sort of vision 0:13:02.880 --> 0:13:07.040 in my head, like, okay, if I'm out shopping for toothpaste, 0:13:07.400 --> 0:13:09.120 I know I need toothpaste every once in a while, 0:13:09.160 --> 0:13:11.000 and I go into like a CVS store, I 0:13:11.040 --> 0:13:13.040 get one thing of toothpaste, and then maybe a week 0:13:13.160 --> 0:13:15.680 later I get some more toothpaste.
Or I could go 0:13:15.720 --> 0:13:19.640 to Costco and buy a giant thing of toothpaste and 0:13:19.679 --> 0:13:22.360 take it home, probably at a cheaper cost. And that's 0:13:22.400 --> 0:13:25.200 sort of how I think of the giant wafers. Maybe 0:13:25.200 --> 0:13:28.400 it's a bad analogy, but what does speed actually mean for 0:13:29.080 --> 0:13:30.320 the cost of tokens? 0:13:30.679 --> 0:13:33.840 Well, I think there are a couple observations. I think 0:13:33.880 --> 0:13:38.440 people have chosen so far to price speed a little higher. 0:13:39.600 --> 0:13:45.280 For example, Anthropic offered a premium service in which they 0:13:45.720 --> 0:13:49.640 offered tokens twice as fast and charged six times as much, 0:13:50.200 --> 0:13:53.560 and they sold it out and they couldn't meet the demand. Now, 0:13:53.840 --> 0:13:56.520 just to give you an idea, we're fifteen times faster 0:13:56.880 --> 0:14:01.400 than their twice as fast, and so people value speed 0:14:01.600 --> 0:14:04.720 because it allows them to do more work and they 0:14:04.800 --> 0:14:07.920 value their time. And when you can do more work 0:14:07.920 --> 0:14:11.280 in less time, you are making people more productive. That's 0:14:11.320 --> 0:14:13.920 why people have chosen to price them at a premium. 0:14:13.920 --> 0:14:15.400 They don't cost more to make. 0:14:16.640 --> 0:14:21.000 In fact, the GPU architecture is an extremely good 0:14:21.080 --> 0:14:25.960 architecture and extremely efficient at building very slow tokens. And 0:14:26.000 --> 0:14:29.800 if you don't mind slow, the cost per token on 0:14:29.880 --> 0:14:34.080 a GPU is extremely low. But the GPU has a 0:14:34.200 --> 0:14:38.840 characteristic that as you try and go faster, the cost 0:14:38.920 --> 0:14:43.640 and the power used per token increase, sort of like 0:14:43.800 --> 0:14:46.200 as you go faster in your car, your miles per 0:14:46.240 --> 0:14:51.080 gallon decrease.
Right. So what happens is as you try 0:14:51.120 --> 0:14:54.080 and get fast enough to be useful, fast enough to 0:14:54.120 --> 0:14:59.400 be interesting, fast enough to keep users' intelligence focused on 0:14:59.440 --> 0:15:04.440 this product, they become extremely expensive and extremely power hungry. 0:15:05.280 --> 0:15:08.480 And so the question is not just what people 0:15:08.480 --> 0:15:11.400 are paying for a token, what people are choosing to 0:15:11.480 --> 0:15:13.800 price them at, but what they actually cost to make. 0:15:14.440 --> 0:15:17.080 And GPUs make very 0:15:16.960 --> 0:15:22.040 slow tokens very cheaply, and they're unbelievably expensive at fast tokens. 0:15:22.080 --> 0:15:26.160 We make fast tokens vastly less expensive than the GPUs 0:15:26.160 --> 0:15:32.760 and we use a tiny fraction of the power. 0:15:43.440 --> 0:15:46.120 Let's say we stipulate that this is not true and 0:15:46.560 --> 0:15:48.800 everyone wants the fastest and everyone's like, you know what, 0:15:49.400 --> 0:15:53.560 this is the solution, the Cerebras technology, one big chip. 0:15:53.920 --> 0:15:58.440 This is really where it's at. How much of your 0:15:58.600 --> 0:16:01.920 market share for the inference market, when you look out 0:16:02.000 --> 0:16:05.440 next year, the year after, et cetera, how much is 0:16:05.480 --> 0:16:10.320 your market share going to be dictated by your ability 0:16:10.360 --> 0:16:13.680 to get capacity at TSMC fabs? How much is that 0:16:13.760 --> 0:16:15.360 a gating mechanism for growth? 0:16:15.600 --> 0:16:19.560 You know, TSMC is a huge part of the supply chain. Yeah, 0:16:19.640 --> 0:16:21.480 but we have some real advantages. 0:16:21.840 --> 0:16:25.680 There are three areas right now that are limiting vendors 0:16:25.680 --> 0:16:27.120 in building AI compute. 0:16:28.040 --> 0:16:30.640 Number one is HBM memory.
0:16:31.200 --> 0:16:34.920 It's this memory we described earlier that can store a lot, 0:16:34.960 --> 0:16:39.920 but it's really slow. That's made by three companies approximately: Samsung, SK Hynix, 0:16:39.960 --> 0:16:44.640 and Micron, and it's under unbelievable supply pressure. It's extremely 0:16:44.680 --> 0:16:47.960 difficult to get. There are very long lead times. It's unbelievably 0:16:47.960 --> 0:16:51.880 expensive right now. We don't use it. The second part 0:16:51.920 --> 0:16:56.920 that's limiting is a process inside of TSMC called CoWoS, 0:16:57.880 --> 0:17:00.440 and this is the process that Nvidia and other 0:17:00.520 --> 0:17:01.400 GPUs use. 0:17:02.160 --> 0:17:02.880 We don't use it. 0:17:03.680 --> 0:17:07.919 The third thing is that at TSMC, the factory that 0:17:08.040 --> 0:17:11.560 is under most pressure is their three nanometer factory. 0:17:12.000 --> 0:17:14.000 We don't use it. We use five nanometer. 0:17:14.760 --> 0:17:18.879 So we have managed to avoid some of the most 0:17:18.920 --> 0:17:23.919 binding supply constraints. Now, TSMC still has to give us 0:17:23.920 --> 0:17:27.080 a meaningful allocation, and they've been an extraordinary partner from 0:17:27.080 --> 0:17:30.560 the get go, and they are the greatest manufacturing company 0:17:30.680 --> 0:17:33.600 on earth by far. A fab is sort of a 0:17:33.640 --> 0:17:37.280 modern pyramid. It's an unbelievable thing. And I highly recommend 0:17:37.359 --> 0:17:39.639 you or any of your listeners, if you get 0:17:39.640 --> 0:17:42.040 a chance to go to Taipei, go and see them. 0:17:42.280 --> 0:17:44.240 They are just extraordinary. 0:17:44.280 --> 0:17:47.639 Can you do fab tours? You can't, actually, you can't do 0:17:48.560 --> 0:17:50.560 you can go and they have a Museum of Innovation 0:17:50.680 --> 0:17:52.680 and it is an extraordinary thing. 0:17:53.280 --> 0:17:56.040 They are sort of the national champion of Taiwan.
0:17:56.720 --> 0:18:00.680 But I think today TSMC has given us as many 0:18:00.680 --> 0:18:05.280 wafers as we've needed. Business today is constrained by data centers, 0:18:06.080 --> 0:18:09.440 and that's the grand irony, right? You invent technology that 0:18:09.520 --> 0:18:13.320 has been unbuildable, never been invented for seventy five years 0:18:13.359 --> 0:18:17.520 in the history of compute. You write software that is extraordinary. 0:18:17.560 --> 0:18:19.720 You build a product that is vastly faster 0:18:19.560 --> 0:18:22.679 than the incumbent. And what are we all constrained by? Buildings. 0:18:24.000 --> 0:18:24.520 All right? 0:18:24.760 --> 0:18:28.040 The data centers. Right now everybody's constrained in the 0:18:28.160 --> 0:18:30.640 entire industry by powered buildings. So real estate. 0:18:31.000 --> 0:18:32.600 It is an amazing thing right now. 0:18:33.480 --> 0:18:37.160 And that is true sort of across the board, and 0:18:37.240 --> 0:18:40.760 that will not change for the next fifteen or eighteen 0:18:40.760 --> 0:18:41.440 months for sure. 0:18:41.880 --> 0:18:45.119 I mean, since we're talking physical constraints, I guess I 0:18:45.119 --> 0:18:48.800 should ask you. We did an episode about helium recently, 0:18:49.000 --> 0:18:51.840 a helium shortage given the situation in the Strait of 0:18:51.880 --> 0:18:54.840 Hormuz, and one of the things that helium is 0:18:54.960 --> 0:18:58.920 used for is lithography on semiconductor chips. Has that affected 0:18:58.960 --> 0:19:01.400 you at all or is that something that you're monitoring? 0:19:01.760 --> 0:19:03.600 We monitor, but there's not a lot we can do, 0:19:04.520 --> 0:19:06.720 and there's plenty of stuff to worry about that we 0:19:06.800 --> 0:19:11.760 can't affect. We obviously are in communication every day with TSMC. 0:19:12.240 --> 0:19:14.320 We're in communication with our entire
0:19:14.240 --> 0:19:17.960 supply chain every single day, and we stay abreast of 0:19:18.640 --> 0:19:23.000 the various issues. But it has had no impact on us, 0:19:23.359 --> 0:19:26.720 and we put that in the bucket of things that 0:19:26.840 --> 0:19:31.200 our manufacturing partners worry about also and that we can't help. 0:19:31.720 --> 0:19:36.320 You know, so, in addition to manufacturing these chips, you 0:19:36.400 --> 0:19:39.320 actually, I didn't realize this, you have your own cloud. 0:19:39.400 --> 0:19:42.119 We do. Or you have your own cloud services, 0:19:42.400 --> 0:19:45.040 which I have a bunch of questions about. You 0:19:45.080 --> 0:19:48.600 have your own cloud services through which a user can 0:19:48.680 --> 0:19:52.400 actually get access to various open source models and so forth. 0:19:52.520 --> 0:19:54.520 It looks a little bit, sort of visually, it looks 0:19:54.560 --> 0:19:59.000 a lot like the OpenRouter interface, roughly the same environment, 0:19:59.359 --> 0:20:03.320 except like the open source. Something I'm curious 0:20:03.320 --> 0:20:06.240 about, and maybe you could speak to this. You know, 0:20:07.720 --> 0:20:11.600 in traditional software open source, one nice thing about open 0:20:11.640 --> 0:20:13.640 source is you don't have to pay for it, so it's free. 0:20:13.840 --> 0:20:14.600 It's a little bit 0:20:14.480 --> 0:20:16.600 different when we're talking about, there's no really such thing 0:20:16.640 --> 0:20:19.439 as like free AI software because even if it's like free, 0:20:19.680 --> 0:20:21.760 you still have to pay for the depreciation of the 0:20:21.800 --> 0:20:24.000 chips and you have to pay for the electricity to 0:20:24.119 --> 0:20:26.720 run them. So there's no real such thing as free 0:20:26.800 --> 0:20:29.600 open source AI software.
But what I am curious about 0:20:29.720 --> 0:20:34.679 in your experience as a cloud vendor, are the open 0:20:34.760 --> 0:20:39.800 source models cheaper on a per unit of intelligence basis? 0:20:40.040 --> 0:20:44.160 If we had some way of saying levelized cost of intelligence, 0:20:44.160 --> 0:20:46.720 which I don't know if the industry has yet, are 0:20:46.760 --> 0:20:52.240 open source models cheaper per IQ point, whatever we want, 0:20:52.280 --> 0:20:54.600 however we want to measure intelligence? 0:20:54.200 --> 0:20:55.520 Yes, by a lot. 0:20:55.640 --> 0:20:59.240 Really? Yeah, I think in the closed source world you're 0:20:59.280 --> 0:21:01.920 paying a lot for that extra little bit of intelligence. 0:21:02.000 --> 0:21:04.320 Right? The open source models, there are no open source 0:21:04.359 --> 0:21:06.120 models that are as good as 0:21:06.000 --> 0:21:07.040 the closed source models. 0:21:07.240 --> 0:21:10.800 Okay, think of it as three, four percent, five percent 0:21:10.880 --> 0:21:14.320 different. Okay, something in that range, and it could be 0:21:14.359 --> 0:21:16.760 a little more, could be a little less, but the 0:21:16.920 --> 0:21:20.720 cost to you using them. You can jump up right 0:21:20.760 --> 0:21:23.160 now and run Kimi K2. It's a one trillion 0:21:23.200 --> 0:21:26.600 parameter model. It's an open source model on Cerebras, where we're 0:21:26.640 --> 0:21:30.560 ten or fifteen times faster than others. And what you're 0:21:30.560 --> 0:21:35.000 paying for is the cost of our power and some 0:21:35.720 --> 0:21:38.679 cost of the compute that it took to calculate it. What 0:21:38.720 --> 0:21:40.800 you're not paying for was the cost to train it. 0:21:41.520 --> 0:21:43.400 And that's a battle that 0:21:43.400 --> 0:21:44.520 is underway in the market. 0:21:45.200 --> 0:21:49.720 You have OpenAI with their coding software, you have 0:21:50.320 --> 0:21:53.840 Anthropic with their coding software.
And you've got companies like 0:21:53.960 --> 0:21:58.440 Cursor and Cognition that are using open source. We power 0:21:58.480 --> 0:22:01.800 OpenAI and we power Cognition. You have a battle 0:22:01.880 --> 0:22:06.359 underway between closed source and open source, and I think 0:22:06.400 --> 0:22:09.000 that the winner of that battle is yet to be determined. 0:22:09.359 --> 0:22:13.840 What is clear is that the closed source is strictly 0:22:14.040 --> 0:22:18.040 better by a little bit, by how much varies, and 0:22:18.119 --> 0:22:18.920 it's more expensive. 0:22:19.320 --> 0:22:21.919