WEBVTT - Two Veteran Chip Builders Have a Plan to Take On Nvidia 0:00:03.160 --> 0:00:18.520 Bloomberg Audio Studios. Podcasts. Radio. News. 0:00:20.079 --> 0:00:23.959 Hello and welcome to another episode of the Odd Lots podcast. 0:00:24.040 --> 0:00:25.680 I'm Joe Weisenthal. 0:00:25.360 --> 0:00:26.439 And I'm Tracy Alloway. 0:00:26.720 --> 0:00:30.880 Tracy, here's something I know about AI. I don't know much, 0:00:30.920 --> 0:00:31.920 but here's something 0:00:31.680 --> 0:00:32.080 I do know. 0:00:32.240 --> 0:00:33.600 How to log into ChatGPT? 0:00:33.920 --> 0:00:35.680 No, I'm good at it. I'm good at that. I'm 0:00:35.680 --> 0:00:38.479 good at logging into ChatGPT and Claude, and I'm 0:00:38.520 --> 0:00:41.680 reasonably good at asking questions. Now, here's actually something about 0:00:41.680 --> 0:00:44.280 the, actually about the business of AI that I know. 0:00:44.520 --> 0:00:44.840 Okay. 0:00:45.120 --> 0:00:45.879 I know that 0:00:46.080 --> 0:00:50.120 Nvidia is making a ton of money and the stock 0:00:50.159 --> 0:00:53.280 has gone to the moon, and that other companies would 0:00:53.280 --> 0:00:54.560 like a slice of that pie. 0:00:55.160 --> 0:00:57.560 Yes, yes, that's a good thing to know. 0:00:58.000 --> 0:01:00.360 It's like a basic, simple thing, which is that when 0:01:00.360 --> 0:01:04.080 people think about AI chips, there's literally one company that 0:01:04.160 --> 0:01:08.280 comes to mind. I know others are involved. AMD has stuff, 0:01:08.440 --> 0:01:11.800 Intel obviously wants to play, others too, but there is obviously 0:01:11.840 --> 0:01:15.840 that one gigantic pile of cash that's flowing to this 0:01:15.840 --> 0:01:18.120 one company. I don't know if it still is, but at 0:01:18.120 --> 0:01:20.240 one point it was the biggest company in the world. It's 0:01:20.440 --> 0:01:21.160 pulled back 0:01:21.000 --> 0:01:21.520 a little bit. 0:01:22.080 --> 0:01:24.640 Well, I would say two things. 
One, other companies would 0:01:24.680 --> 0:01:27.920 like that a piece of that pie. And b companies 0:01:27.959 --> 0:01:31.639 that are in the business of building AI models would 0:01:31.680 --> 0:01:35.039 like to find a way to get cheaper, more efficient, 0:01:35.360 --> 0:01:38.640 less energy intensive chips so that they don't have to 0:01:38.680 --> 0:01:40.160 always pay the Nvidia tax. 0:01:40.440 --> 0:01:43.240 Do you want to know what I know about AI 0:01:43.319 --> 0:01:46.320 and semiconductors, Let's go for it. Okay, here's the one 0:01:46.360 --> 0:01:49.160 thing that I know, which is that whenever you have 0:01:49.280 --> 0:01:52.800 this conversation about in Nvidia, the one word that always 0:01:52.800 --> 0:01:54.080 comes up is moat. 0:01:54.400 --> 0:01:55.440 Oh yes, moat yeah. 0:01:55.520 --> 0:01:59.400 So, like you're either talking about like medieval castles or 0:01:59.440 --> 0:02:02.280 you're talking about semiconductor manufacturing. That's when you hear the 0:02:02.320 --> 0:02:05.360 word mote because over and over again people will say 0:02:05.400 --> 0:02:07.480 it is expensive to make the chips. You need a 0:02:07.480 --> 0:02:10.040 lot of money for research and development and to set 0:02:10.120 --> 0:02:12.480 up the fabs, and you need a lot of first 0:02:12.520 --> 0:02:16.080 person expertise in building them. And then there's also the 0:02:16.120 --> 0:02:20.160 network effect. So a company like Nvidia has this huge 0:02:20.200 --> 0:02:23.560 moat around its business. The question, of course, is whether 0:02:23.680 --> 0:02:26.520 or not, getting back to the medieval castle analogy, it 0:02:26.600 --> 0:02:28.560 is unassailable, that's right. 
0:02:28.720 --> 0:02:32.519 Semiconductors seem to be moat after moat after moat, 0:02:32.520 --> 0:02:36.840 because there's ASML's moat, and then there's Taiwan Semiconductor's moat, 0:02:37.440 --> 0:02:41.000 and then there's Nvidia's moat, and so yes, it's like 0:02:41.040 --> 0:02:44.880 there's a series of moats, and if someone could overcome 0:02:45.000 --> 0:02:46.960 these moats or find a way to build a 0:02:47.000 --> 0:02:50.800 bridge over one of these moats and enter this proverbial castle, 0:02:51.080 --> 0:02:53.760 that would be very lucrative. We know that many are 0:02:53.880 --> 0:02:57.919 trying to cross these moats, but it's incredibly costly and 0:02:58.080 --> 0:03:01.680 capital intensive and difficult. There are just not many people 0:03:01.680 --> 0:03:04.080 who know how to do any of this stuff, and 0:03:04.200 --> 0:03:06.840 so the question is whether these moats can be overcome. 0:03:07.200 --> 0:03:09.480 But again, there are many businesses that would love to 0:03:09.480 --> 0:03:13.320 see more robust competition in the space so that their 0:03:13.400 --> 0:03:15.160 payment is not a tax. 0:03:15.520 --> 0:03:18.359 You know, one thing I don't know, and I don't 0:03:18.400 --> 0:03:21.120 think we've ever done an episode purely on this, but 0:03:21.200 --> 0:03:25.040 I don't really understand the different designs of chips. So 0:03:25.200 --> 0:03:28.720 I know that some chips, specifically Nvidia's, are supposed 0:03:28.760 --> 0:03:33.040 to be better at AI. They're better at running lots 0:03:33.120 --> 0:03:36.400 of little calculations all at the same time. And I 0:03:36.440 --> 0:03:40.200 know there's basic chips that go into your refrigerator or 0:03:40.200 --> 0:03:42.360 your car or whatever. 
But I don't really know the 0:03:42.400 --> 0:03:46.560 difference between what a chip that was designed specifically to 0:03:46.720 --> 0:03:50.120 run a large language model would look like compared to 0:03:50.560 --> 0:03:52.080 a standard basic chip. 0:03:52.320 --> 0:03:54.400 I don't know anything about chip design. I just sort 0:03:54.400 --> 0:03:58.760 of imagine someone, like, using some CAD software, etching 0:03:58.880 --> 0:04:02.520 little lines in the thing and drawing some sort of, 0:04:02.560 --> 0:04:05.560 like, circuitry or, you know, placing the transistors. 0:04:06.040 --> 0:04:08.520 You know, a chip design game would be really fun, 0:04:08.600 --> 0:04:10.400 now that I think about it. Yeah, you could just 0:04:10.520 --> 0:04:13.360 draw little things on the square. Okay, anyway. Well, we 0:04:13.400 --> 0:04:13.800 are going 0:04:13.760 --> 0:04:17.200 to learn about how chip design works. We are going 0:04:17.279 --> 0:04:21.200 to learn about what makes a chip particularly good for 0:04:21.279 --> 0:04:25.320 the task of training and running inference on these AI models. 0:04:25.600 --> 0:04:27.479 And I have to say, I really do believe we 0:04:27.600 --> 0:04:31.400 have the two perfect guests because they are both veterans 0:04:31.480 --> 0:04:34.400 in this space, and they are both active in the 0:04:34.680 --> 0:04:38.120 attempt to bridge some of these moats and enter the 0:04:38.160 --> 0:04:41.479 space and bring competition to the industry. We are going 0:04:41.520 --> 0:04:44.320 to be speaking with Reiner Pope, co-founder and 0:04:44.440 --> 0:04:47.400 CEO of MatX, as well as Mike Gunter, co-founder 0:04:47.400 --> 0:04:50.679 and CTO of MatX. It's a new company that's trying 0:04:50.720 --> 0:04:55.960 to build chips specifically for the purpose of large language models. 
0:04:56.279 --> 0:04:58.839 Both of them have a lot of experience in the 0:04:58.880 --> 0:05:01.440 space. We're going to get our hands dirty, so 0:05:01.520 --> 0:05:04.560 to speak, and understand how you build the hardware for 0:05:04.560 --> 0:05:06.800 all this stuff and what makes it win and whether 0:05:06.839 --> 0:05:09.400 it's even a winnable game. Reiner and Mike, thank 0:05:09.440 --> 0:05:11.080 you so much for coming on Odd Lots. 0:05:11.440 --> 0:05:14.040 Thanks, happy to be here. Pleasure to be here. 0:05:14.160 --> 0:05:16.839 So why don't you tell us: what does a chip 0:05:16.880 --> 0:05:20.640 designer do? I know, I have this completely cartoonish view 0:05:20.680 --> 0:05:23.880 in my head that cannot possibly be right of someone 0:05:23.960 --> 0:05:27.200 on a big screen using some CAD software to sort of, 0:05:27.279 --> 0:05:28.880 you know, figure out what's going to be etched in 0:05:28.920 --> 0:05:31.560 that wafer of silicon. What is the job of 0:05:31.640 --> 0:05:32.280 chip design? 0:05:33.200 --> 0:05:35.440 So maybe this is best told by what is the 0:05:35.480 --> 0:05:38.520 story of chip development from the beginning of a project 0:05:38.560 --> 0:05:41.000 to the end of it. So there's a range of 0:05:41.000 --> 0:05:42.360 different ways this can go, but there's a lot of 0:05:42.400 --> 0:05:46.000 things that are in common. So generally a chip design 0:05:46.200 --> 0:05:49.880 team is, at the low end, maybe thirty people, up 0:05:49.920 --> 0:05:52.560 to many, many thousands of people at the high end, 0:05:53.000 --> 0:05:56.479 and the project typically runs for somewhere in 0:05:56.480 --> 0:05:58.800 the range of three to five years from conception to 0:05:58.880 --> 0:06:02.160 actually shipping to customers, and so over that time what 0:06:02.160 --> 0:06:04.760 we see in the life cycle is we tend to 0:06:04.800 --> 0:06:07.840 start with a small team of architects. 
If you think 0:06:07.839 --> 0:06:10.080 of designing a house, the team of architects are the 0:06:10.080 --> 0:06:12.440 people who decide what rooms go in here, or how 0:06:12.440 --> 0:06:14.880 many bedrooms, how many bathrooms, what are the flows between them, 0:06:14.880 --> 0:06:16.640 how do people walk through the corridors, and so on, 0:06:17.000 --> 0:06:19.840 the coarse-grained design. In the chip itself, 0:06:19.880 --> 0:06:22.080 that is, you know, what kinds of components at the 0:06:22.360 --> 0:06:26.160 high level we have. And then after that initial exploration, 0:06:26.680 --> 0:06:29.039 this moves then over to the microarchitects. These are 0:06:29.080 --> 0:06:31.200 the people who are designing the individual rooms: what are 0:06:31.240 --> 0:06:34.320 the components that go in the individual rooms. So at 0:06:34.360 --> 0:06:36.760 that point everything we've done so far is a design 0:06:36.839 --> 0:06:41.040 stage thing. This is done in documents, spreadsheets, and it's 0:06:41.080 --> 0:06:44.080 a verbal and human communication form. But beyond that, that's 0:06:44.080 --> 0:06:46.160 when it starts to actually touch the computer in a 0:06:46.520 --> 0:06:49.839 more meaningful sense. And so the microarchitects will hand 0:06:49.880 --> 0:06:52.760 over to the logic designers. They are the people who 0:06:52.800 --> 0:06:55.200 are actually writing code. So even though you think of 0:06:55.240 --> 0:06:58.080 chips as being this very physical thing where there's wires 0:06:58.120 --> 0:07:00.320 and gates and everything, the way we transmit 0:07:00.320 --> 0:07:02.400 this information to the computer is actually writing code. We 0:07:02.440 --> 0:07:05.760 write Verilog that expresses the design of the chip. So 0:07:06.120 --> 0:07:10.040 that's what the logic designers are doing. 
That's an extended 0:07:10.080 --> 0:07:12.400 period of time building out all of the different, you know, 0:07:12.680 --> 0:07:16.320 matrix multiplies, memories, circuitry that connects to the outside world, 0:07:16.360 --> 0:07:18.800 and so on. And then the output of all of 0:07:18.840 --> 0:07:21.920 them is this Verilog piece of software code that gets 0:07:22.080 --> 0:07:25.000 then compiled by a computer down to a set of 0:07:25.080 --> 0:07:27.960 gates, which are logic gates, AND gates, OR gates and so on, 0:07:28.040 --> 0:07:29.800 and then wires that connect them together. That's the 0:07:29.840 --> 0:07:33.560 netlist file. Then there's a few more stages still 0:07:33.560 --> 0:07:36.960 coming here. This file gets handed off to physical designers, 0:07:37.000 --> 0:07:39.760 who again work with CAD tools to convert this kind 0:07:39.760 --> 0:07:40.600 of logical description. 0:07:40.640 --> 0:07:42.560 I was right, someone is using CAD tools. 0:07:43.480 --> 0:07:46.040 Absolutely, there's a CAD tool, but it's only part 0:07:46.080 --> 0:07:50.040 of the job. Okay, so the physical designers are converting 0:07:50.080 --> 0:07:52.800 the sort of logical description into a physical placement. So 0:07:53.240 --> 0:07:55.560 where does each of these gates go? Now there's two 0:07:55.640 --> 0:07:58.000 hundred billion logic gates on a chip, so a human 0:07:58.040 --> 0:07:59.760 is not going to be placing all of those manually. 0:08:00.040 --> 0:08:03.120 So there's a huge amount of software assistance here. But 0:08:03.160 --> 0:08:05.240 what the human is doing is providing oversight through this 0:08:05.280 --> 0:08:07.520 process and saying, I've done this a ton of times before. 0:08:07.640 --> 0:08:10.560 This placement kind of looks wrong, it doesn't match my heuristics, 0:08:10.600 --> 0:08:12.760 and so I can probably do a better job here. 
0:08:13.160 --> 0:08:15.360 So that's the physical designers, and the output of their 0:08:15.400 --> 0:08:18.920 work is actually eventually you get a polygons, so basically 0:08:18.920 --> 0:08:21.600 an image saying here is the thing that is going 0:08:21.680 --> 0:08:26.160 to get etched onto a piece of silicon. So that 0:08:26.600 --> 0:08:29.640 file is ultimately a huge, like really big image in 0:08:29.640 --> 0:08:32.000 some form a bunch of polygons on it. It gets 0:08:32.040 --> 0:08:36.439 handed over to a manufacturing company such as TSMC. They 0:08:36.440 --> 0:08:41.040 spend maybe four or five months initially creating a mask set, 0:08:41.120 --> 0:08:43.760 so those are like the templates or the stencils that 0:08:43.800 --> 0:08:46.160 will be used to stamp out many many copies of 0:08:46.160 --> 0:08:48.679 the chip, and then stamps up many copies of the chip. 0:08:48.720 --> 0:08:51.840 You get a chip back. This is typically about two 0:08:51.920 --> 0:08:54.160 or three years after you started the project. You get 0:08:54.200 --> 0:08:57.000 chips back, and now you have a bring up team 0:08:57.000 --> 0:09:00.520 who puts this chip into a whole board and connected 0:09:00.559 --> 0:09:02.680 to what to power and electricity and starts testing it, 0:09:03.240 --> 0:09:05.760 and then after another six to twelve months or maybe 0:09:05.760 --> 0:09:08.800 even more, eventually you actually can hand this over to customers. 0:09:09.160 --> 0:09:10.920 There's maybe just one or two other things which are 0:09:10.920 --> 0:09:13.920 not in that flow but very essential to call out too. 0:09:14.360 --> 0:09:18.040 Are because of this whole process taking so long, especially 0:09:18.040 --> 0:09:21.440 the manufacturing, we also have like very large teams of 0:09:21.600 --> 0:09:24.920 verification people. 
So these are the people who before we 0:09:24.920 --> 0:09:27.160 actually send it to manufacturing and pay twenty to thirty 0:09:27.160 --> 0:09:31.480 million dollars of manufacturing, we have a substantial team doing 0:09:31.480 --> 0:09:33.640 a lot of testing. And this is software based testing, 0:09:33.720 --> 0:09:36.120 so writing tests in the same way a software engineer 0:09:36.160 --> 0:09:39.600 might to make sure that the functionality actually works as intended. 0:09:39.920 --> 0:09:44.240 To underlying the comparison to ordinary software, which Reiner touched 0:09:44.280 --> 0:09:47.760 on it, we're writing code, but it's on super hard mode. 0:09:48.160 --> 0:09:50.600 So if you have a if you have a software 0:09:50.640 --> 0:09:54.000 that's deployed the website, you can fix a bug and 0:09:54.120 --> 0:09:57.880 you know, ten minutes at basically zero cost. Whereas in 0:09:57.920 --> 0:09:59.880 our case, the reason that we have a large team 0:10:00.080 --> 0:10:03.280 people doing verification making sure that what we've done is 0:10:03.320 --> 0:10:07.439 correct is that it's potentially four months and thirty million 0:10:07.440 --> 0:10:11.079 dollars for every mistake that you let through. Likewise, there 0:10:11.120 --> 0:10:14.280 is software, but it's a relatively small fraction of software 0:10:14.320 --> 0:10:16.719 that's very performance critical where you want the code to 0:10:16.760 --> 0:10:19.400 run as fast as possible. But in some sense, every 0:10:19.480 --> 0:10:22.120 line of code that you write in hardware has an 0:10:22.120 --> 0:10:25.480 impact on the overall performance of the product, because every 0:10:25.520 --> 0:10:28.400 line of code ends up getting embodied in silicon, and 0:10:28.440 --> 0:10:31.280 every line of code affects the eventual performance. So it's 0:10:31.360 --> 0:10:34.080 kind of coding, but on hard mode. 
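A small illustration of the flow described above, where a design compiles down to a netlist of gates and wires and a verification team tests it in software before manufacturing. This is only a sketch under stated assumptions: the `NETLIST` dictionary format, the full-adder circuit, and the function names are hypothetical teaching devices, not Verilog and not the format of any real design tool.

```python
from itertools import product

# Hypothetical gate-level netlist for a 1-bit full adder.
# Each entry maps an output wire to (gate type, input wires).
NETLIST = {
    "x1":   ("XOR", ("a", "b")),
    "sum":  ("XOR", ("x1", "cin")),
    "a1":   ("AND", ("a", "b")),
    "a2":   ("AND", ("x1", "cin")),
    "cout": ("OR",  ("a1", "a2")),
}

GATES = {
    "AND": lambda x, y: x & y,
    "OR":  lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def simulate(netlist, inputs):
    """Evaluate every wire, assuming gates are listed in dependency order."""
    wires = dict(inputs)
    for out, (kind, ins) in netlist.items():
        wires[out] = GATES[kind](*(wires[w] for w in ins))
    return wires


def verify_full_adder(netlist):
    """Exhaustively compare the netlist against an arithmetic reference model."""
    mismatches = []
    for a, b, cin in product((0, 1), repeat=3):
        wires = simulate(netlist, {"a": a, "b": b, "cin": cin})
        expected = a + b + cin
        got = wires["sum"] + 2 * wires["cout"]
        if got != expected:
            mismatches.append((a, b, cin, got, expected))
    return mismatches
```

Calling `verify_full_adder(NETLIST)` returns an empty list when the netlist matches the reference model. Swapping a single gate (for example, making `cout` an AND instead of an OR) produces mismatches, which is the kind of bug a verification team wants to catch in simulation rather than in silicon. Exhaustive testing only works here because the circuit has three inputs; real designs are far too large to test this way.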
0:10:34.800 --> 0:10:40.520 So I intuitively understand the importance of getting the software right. 0:10:40.679 --> 0:10:45.360 But why does placement on the actual chip or wafer, 0:10:45.480 --> 0:10:48.280 why does that matter? Are you trying to make it 0:10:48.280 --> 0:10:51.280 more efficient, are you trying to reduce the rise time? 0:10:51.440 --> 0:10:53.640 Or why does it matter where the little bits and 0:10:53.679 --> 0:10:56.679 bobs are placed, to use the scientific 0:10:56.200 --> 0:11:00.400 term? Yeah, you're right that reducing the rise time is 0:11:00.640 --> 0:11:04.320 a massive issue. And you know, fundamentally the issue is 0:11:04.320 --> 0:11:07.520 that chips, you know, at a very abstract level, 0:11:07.960 --> 0:11:11.480 or at a somewhat more concrete level, really 0:11:11.800 --> 0:11:16.000 are composed of transistors and wires, and the placement has 0:11:16.000 --> 0:11:19.720 a dramatic effect on the length of the wires, which 0:11:19.720 --> 0:11:22.199 has a dramatic effect on both the performance of the 0:11:22.240 --> 0:11:24.760 chip and how much you can fit. In terms of 0:11:24.800 --> 0:11:27.679 the impact that this has on the quality of chip 0:11:27.720 --> 0:11:32.080 that you produce, wires have over time not been shrinking 0:11:32.200 --> 0:11:36.559 in the same way that transistors have, and so getting 0:11:36.800 --> 0:11:39.560 the wiring right, which usually means getting the placement right, 0:11:39.679 --> 0:11:41.560 has become more and more important over time. 0:11:57.960 --> 0:12:01.160 Can chips be beautiful? I know code can be elegant, 0:12:01.720 --> 0:12:04.160 and some people will say certain code is beautiful. But 0:12:04.320 --> 0:12:07.120 have you ever looked at a semiconductor and been like, oh, wow, 0:12:07.320 --> 0:12:09.680 that's really nicely put together? 0:12:10.520 --> 0:12:12.640 For me, I mean, I think absolutely yes. 
This is 0:12:12.679 --> 0:12:14.320 like why I work in this space is I just 0:12:14.400 --> 0:12:16.560 really like geeking out on the design of things. But 0:12:16.800 --> 0:12:19.000 to me, what beautiful for a chip means is that 0:12:19.280 --> 0:12:21.439 it kind of does exactly what it was designed to do, 0:12:21.960 --> 0:12:24.679 and no more and no less. I mean, obviously less 0:12:24.720 --> 0:12:27.720 would be a bit of a disappointment, but often if 0:12:27.720 --> 0:12:29.600 it does more, do you think, well, maybe I designed 0:12:29.600 --> 0:12:31.600 it for slightly the wrong purpose or something like that. 0:12:32.000 --> 0:12:35.240 I think this is a good seg into getting into 0:12:35.360 --> 0:12:39.120 your business specifically, so we all know that so much 0:12:39.120 --> 0:12:42.720 of this AI is powered by these in video GPUs, 0:12:43.240 --> 0:12:46.520 but in video GPUs have been used for a long 0:12:46.559 --> 0:12:49.480 time for many things that do not have anything to 0:12:49.559 --> 0:12:53.880 do with large language models or the specific AI applications 0:12:53.880 --> 0:12:56.120 that people are excited about today in twenty twenty four. 
0:12:56.640 --> 0:12:58.960 So for a while they were, well, video games 0:12:59.000 --> 0:13:01.400 is obviously the big one for decades and decades, and 0:13:01.440 --> 0:13:03.560 then there was like five minutes where people got really 0:13:03.600 --> 0:13:07.520 excited to use them for Ethereum mining, and now everyone's 0:13:07.559 --> 0:13:11.600 really excited about their use for artificial intelligence and large 0:13:11.679 --> 0:13:14.920 language models and some of these other generative AI applications 0:13:14.960 --> 0:13:18.440 that people are excited about right now. Why don't you 0:13:18.559 --> 0:13:21.920 tell us maybe the sort of idea behind MatX, but 0:13:22.040 --> 0:13:25.640 specifically what you were both doing when you were at 0:13:25.679 --> 0:13:29.440 Alphabet or Google, which, you know, has its own chips. 0:13:29.480 --> 0:13:33.319 I believe it has something called TPUs. What was the 0:13:33.440 --> 0:13:38.160 project at Google? Why did Google find it necessary or 0:13:38.280 --> 0:13:40.600 a good business to start building their own chips for 0:13:40.640 --> 0:13:43.520 in-house purposes? And then why did you feel the 0:13:43.559 --> 0:13:46.960 need to then leave to build what you're building now 0:13:47.040 --> 0:13:48.400 for LLMs specifically? 
0:13:48.960 --> 0:13:52.760 Yeah. So what Google was seeing, and this was at 0:13:52.760 --> 0:13:56.640 this point some time back, more than a decade ago, they 0:13:56.679 --> 0:14:01.439 were seeing that the use of artificial intelligence (LLMs were 0:14:01.440 --> 0:14:04.160 not a thing at that point) was going up, and 0:14:04.440 --> 0:14:08.520 they were worried about how much money they would have 0:14:08.720 --> 0:14:11.960 to spend on traditional, it would have 0:14:12.000 --> 0:14:16.040 been GPUs at that time, and so they built a 0:14:16.160 --> 0:14:21.040 very specialized chip to do neural nets, and that chip 0:14:21.400 --> 0:14:27.240 specializes in matrix multiplication. So they put in a structure 0:14:27.280 --> 0:14:31.520 called a systolic array, which they definitely didn't invent. It 0:14:32.120 --> 0:14:35.400 has existed since the seventies, and it is especially good at 0:14:35.400 --> 0:14:39.920 doing matrix multiplication. Now after that, Nvidia has added a 0:14:39.960 --> 0:14:44.680 similar structure into their chips. And the initial Google TPU 0:14:45.000 --> 0:14:47.600 was an inference-only chip, and then they have 0:14:47.840 --> 0:14:51.360 subsequently made chips that can be used for both training 0:14:51.360 --> 0:14:54.480 and inference. And I guess now is a good point 0:14:54.520 --> 0:14:56.920 to, so the very last thing that I was doing 0:14:56.920 --> 0:14:59.440 at Google was I was on the TPU team and 0:14:59.480 --> 0:15:02.120 Reiner was on the large language model team. And it's 0:15:02.120 --> 0:15:04.680 probably good to have him sort of tell it from here. 0:15:05.040 --> 0:15:07.320 So I mean, what we were seeing, and this 0:15:07.400 --> 0:15:09.320 is what we personally were seeing, but Google was seeing 0:15:09.360 --> 0:15:12.120 more generally as well, is just that large language models were 0:15:12.120 --> 0:15:14.400 a thing. 
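To make the systolic array mentioned above more concrete, here is a rough simulation of one textbook variant, the output-stationary array, in which each cell of a grid holds one output value and operands are passed from neighbor to neighbor on every clock tick. It is a generic sketch and is not claimed to match the dataflow of Google's TPU or of Nvidia's chips; the function name `systolic_matmul` and the register layout are assumptions made for illustration.

```python
def systolic_matmul(A, B):
    """Multiply A (M x K) by B (K x N) on a simulated M x N systolic array.

    Cell (i, j) accumulates C[i][j]. Values of A flow left to right,
    values of B flow top to bottom, one hop per clock cycle. Row i of A
    and column j of B are injected with a delay of i and j cycles so that
    matching operands meet in the right cell at the right time.
    """
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0] * N for _ in range(M)]    # one accumulator per cell
    a_reg = [[0] * N for _ in range(M)]  # A operand currently held by each cell
    b_reg = [[0] * N for _ in range(M)]  # B operand currently held by each cell

    for t in range(M + N + K - 2):       # cycles until the last operands meet
        # Shift A operands one cell to the right and inject at the left edge.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        # Shift B operands one cell down and inject at the top edge.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

The point of the structure is that each operand is fetched from memory once and then reused as it passes through a whole row or column of cells, so most of the work is local multiply-accumulate steps rather than memory traffic. For instance, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives the same result as an ordinary matrix product.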
There was this period of time between GPT-3 0:15:14.560 --> 0:15:17.480 and ChatGPT coming out. GPT-3 came out 0:15:17.480 --> 0:15:20.440 in twenty twenty, and so people who were very plugged 0:15:20.480 --> 0:15:24.560 into the field recognized the importance of it, or at 0:15:24.640 --> 0:15:26.720 least to some extent recognized the importance of it, back then, 0:15:27.280 --> 0:15:30.080 and so there was this push to, you know, everyone 0:15:30.120 --> 0:15:32.600 wanted to create their own large language model that was 0:15:32.640 --> 0:15:35.800 better than GPT-3. And so, I mean, at the time, 0:15:35.840 --> 0:15:38.280 I was on the large language model team. We helped 0:15:38.320 --> 0:15:41.440 train Google's PaLM, and we were using thousands of TPUs 0:15:41.480 --> 0:15:44.240 for that, and one of the things we were saying is, well, 0:15:44.240 --> 0:15:47.240 look, what does it cost to deploy this in Google Search? 0:15:47.360 --> 0:15:49.280 There's quite a lot of search queries. I think 0:15:49.320 --> 0:15:51.200 the public estimates are about one hundred thousand of them 0:15:51.200 --> 0:15:54.600 per second. If you multiply out how much each query costs, 0:15:54.720 --> 0:15:56.400 and if you want to run that on large language models, 0:15:56.400 --> 0:15:58.680 that's a lot more expensive. And then also, 0:15:58.720 --> 0:16:00.680 if I want to train a model that's ten times bigger 0:16:00.680 --> 0:16:03.840 than my current model or one hundred times bigger, suddenly 0:16:04.280 --> 0:16:07.120 these models have just moved from costing, you know, a 0:16:07.160 --> 0:16:09.640 million dollars or one hundred thousand dollars to train to 0:16:10.000 --> 0:16:12.040 tens of millions and hundreds of millions of dollars, and 0:16:12.120 --> 0:16:16.000 so the overall goal was, can we make it cheaper 0:16:16.000 --> 0:16:18.440 by any way possible? So, of course there's algorithmic approaches. 
0:16:18.480 --> 0:16:21.440 There's a lot of opportunity on the algorithm and research side. 0:16:21.480 --> 0:16:23.560 But then the other really big lever is just making 0:16:23.560 --> 0:16:25.840 better hardware. So one of the things we were looking 0:16:25.880 --> 0:16:29.440 at was trying to make Google's TPUs better for large 0:16:29.480 --> 0:16:32.000 language models. What led us, actually, I mean this is 0:16:32.040 --> 0:16:33.760 personally about Mike and me in this case, what 0:16:33.840 --> 0:16:36.440 led us to leave Google to make MatX was we 0:16:36.480 --> 0:16:38.640 saw that there was, we believe that there is, some 0:16:38.720 --> 0:16:42.400 opportunity to make chips substantially better if you're only looking 0:16:42.400 --> 0:16:45.160 to focus on large language models. And so the chips 0:16:45.160 --> 0:16:49.560 that were designed pre-GPT-3 and especially pre-ChatGPT 0:16:49.600 --> 0:16:52.560 try to do a 0:16:52.560 --> 0:16:54.440 really good job on small models as well as a really 0:16:54.480 --> 0:16:56.840 good job on large models. And so what you find 0:16:56.880 --> 0:16:59.040 is that the circuitry in those chips, there's a bit 0:16:59.080 --> 0:17:01.120 of circuitry for what you need for small models, there's 0:17:01.120 --> 0:17:03.080 a bit of circuitry for what you need for large models, 0:17:03.120 --> 0:17:05.760 also for maybe embedding lookups. There's three or four 0:17:05.760 --> 0:17:08.560 different kinds of workloads, and all of them take some 0:17:08.640 --> 0:17:11.640 of the real estate in your silicon. And so if 0:17:11.640 --> 0:17:13.280 you really want to make the best use of the 0:17:13.280 --> 0:17:15.119 real estate, you should just focus on the thing you 0:17:15.160 --> 0:17:17.520 care about most and hope that there's a big market there. 
0:17:17.640 --> 0:17:20.639 So the game, or what we decided to 0:17:20.680 --> 0:17:22.600 do, and we see some others deciding to do as well, 0:17:22.720 --> 0:17:25.680 is to really try and focus on just the one 0:17:25.680 --> 0:17:27.639 workload that seems like it's going to become a one 0:17:27.680 --> 0:17:30.320 hundred billion dollar or a trillion dollar industry. 0:17:30.680 --> 0:17:33.160 I know there's always this sort of cliche when talking 0:17:33.160 --> 0:17:36.480 about tech: oh, Google and Facebook, they can just build 0:17:36.480 --> 0:17:38.760 this and they'll destroy your little startup because they have 0:17:38.840 --> 0:17:42.000 infinite amounts of money. Except that doesn't actually seem to 0:17:42.200 --> 0:17:44.840 happen in the real world as much as people on 0:17:44.880 --> 0:17:48.400 Twitter expect it to happen. But can you just sort 0:17:48.400 --> 0:17:51.639 of give a sense of maybe the business and organizational 0:17:52.200 --> 0:17:57.960 incentives for why a company like Google doesn't say, oh, 0:17:58.040 --> 0:18:00.159 this is a one hundred billion dollar market, Nvidia is 0:18:00.200 --> 0:18:02.320 worth three and a half trillion or three trillion dollars, 0:18:02.440 --> 0:18:06.240 let's build our own LLM-specific chips. Why doesn't that 0:18:06.880 --> 0:18:11.159 happen at these large hyperscaler companies that presumably have all 0:18:11.200 --> 0:18:12.520 the talent and money to do it? 0:18:13.920 --> 0:18:20.919 So Google's TPUs are primarily built to serve their internal customers, 0:18:21.520 --> 0:18:25.320 and Google's revenue for the most part comes from Google 0:18:25.359 --> 0:18:28.960 Search, and in particular from Google Search ads. 0:18:29.400 --> 0:18:34.280 Google Search ads 
Is you know, a customer of the TPUs, 0:18:34.040 --> 0:18:38.720 It's a relatively difficult thing to say that hundreds of 0:18:38.800 --> 0:18:41.480 billions of dollars of revenue that we're making, we're going 0:18:41.520 --> 0:18:44.359 to make a chip that doesn't really support that particularly well, 0:18:44.400 --> 0:18:47.400 and focuses on this at this point unproven in terms 0:18:47.440 --> 0:18:51.840 of revenue market and it's not just ads, but they 0:18:51.880 --> 0:18:54.320 are you know, a variety of other customers. For instance, 0:18:54.560 --> 0:18:57.359 you know, you may have noticed how Google is pretty 0:18:57.359 --> 0:19:01.679 good at identifying good photos and doing a whole variety 0:19:01.760 --> 0:19:04.359 of other things that are supported in many cases by 0:19:04.400 --> 0:19:05.000 the TPUs. 0:19:06.280 --> 0:19:08.240 I think one of the other things too, that we 0:19:08.320 --> 0:19:11.760 see in all chip companies in general, or companies producing chips, 0:19:11.840 --> 0:19:14.919 is because producing chips is so expensive, you end up 0:19:14.960 --> 0:19:16.600 in this place where you really want to put all 0:19:16.640 --> 0:19:21.320 your resources behind one chip effort. And so just because 0:19:21.400 --> 0:19:23.520 the thinking is that there's a huge amount of return 0:19:23.600 --> 0:19:25.879 on investment in making this one thing better rather than 0:19:25.920 --> 0:19:28.199 fragmenting your efforts. Really, what you'd like to do in 0:19:28.200 --> 0:19:30.880 this situation where there's a new emerging field that might 0:19:30.960 --> 0:19:33.600 be huge or might not, but it's hard to say yet, 0:19:33.720 --> 0:19:35.399 what you'd like to do is maybe spin up a 0:19:35.440 --> 0:19:37.760 second effort on the side and have like a skunk works. Yeah, 0:19:37.880 --> 0:19:38.439 that's work, right. 
0:19:38.440 --> 0:19:41.199 That would be just to let Ryan er and just 0:19:41.320 --> 0:19:43.280 let the two of you go have your own little 0:19:43.280 --> 0:19:44.160 office somewhere else. 0:19:44.560 --> 0:19:48.199 Yeah, just organizationally that it's often challenging to do, and 0:19:48.240 --> 0:19:50.720 we see this across all companies. Every chip company really 0:19:50.720 --> 0:19:54.760 has essentially only one mainstream chip product that is that 0:19:54.800 --> 0:19:57.120 they're iterating on and making better and better over time. 0:19:58.200 --> 0:20:03.000 To what degree is to design driven by the customer? 0:20:03.119 --> 0:20:05.440 And what I mean by that is, so the TPUs 0:20:05.480 --> 0:20:09.639 at Google were developed to handle Google's internal workloads, but 0:20:09.920 --> 0:20:13.920 at other chip designers, to what degree will customers come 0:20:13.960 --> 0:20:16.600 and like basically do a reverse inquiry and ask for 0:20:16.640 --> 0:20:20.320 a specific chip or what does the dialogue between customers 0:20:20.400 --> 0:20:23.320 and the big chip designers actually look like. 0:20:24.080 --> 0:20:27.040 Yeah, it's a fun interplay of I want my provider 0:20:27.080 --> 0:20:28.479 to do a good job, but I also don't want 0:20:28.520 --> 0:20:31.880 to leak my IP too much. So you can see 0:20:31.920 --> 0:20:34.640 this how this played out in so Mike was talking 0:20:34.680 --> 0:20:37.880 about through the development of the TPUs which were publicly 0:20:37.920 --> 0:20:41.439