WEBVTT - AWS CEO Matt Garman Talks AI Race

0:00:02.520 --> 0:00:07.000 Bloomberg Audio Studios, podcasts, radio news. 0:00:07.840 --> 0:00:10.160 Now, let's narrow our focus from the broader markets to 0:00:10.280 --> 0:00:15.079 one single stock: Amazon, the tech giant, hosting its annual 0:00:15.160 --> 0:00:19.280 Amazon Web Services re:Invent conference down in Las Vegas this week. 0:00:19.520 --> 0:00:23.400 The cloud-focused confab draws developers, engineers, and other thought 0:00:23.480 --> 0:00:25.840 leaders in tech to explore the latest cloud and AI 0:00:25.960 --> 0:00:29.640 projects happening under Amazon's roof, including a new AI chip. 0:00:29.960 --> 0:00:32.839 Let's go live now, where Bloomberg Tech co-host Ed 0:00:33.000 --> 0:00:38.960 Ludlow is joined by a special guest. Ed, take it away. Yeah. 0:00:39.000 --> 0:00:41.639 Three pieces of news moved markets this morning: a new 0:00:41.720 --> 0:00:45.960 generation of frontier models from AWS, new agentic tools, and 0:00:46.000 --> 0:00:51.720 then a very quickly released, installed, and now ramping generation 0:00:51.920 --> 0:00:55.520 of in-house custom accelerator, which is Trainium three. All 0:00:55.520 --> 0:00:59.640 points of discussion for Matt Garman, AWS CEO. You know, the 0:00:59.680 --> 0:01:02.920 base point with Trainium three, and you've moved quickly to 0:01:02.960 --> 0:01:07.360 bring it to the real world, is cost-performance efficiency 0:01:07.440 --> 0:01:10.800 over the prior generation, but also over Nvidia GPUs, 0:01:10.920 --> 0:01:14.360 over Google TPUs. I think what people are trying to 0:01:14.400 --> 0:01:17.440 understand is that ramp part I was talking about, when 0:01:17.560 --> 0:01:22.240 real-world customers use it beyond this anchor customer, Anthropic, 0:01:22.480 --> 0:01:23.800 which relies on it currently. 0:01:24.200 --> 0:01:27.520 Yeah.
Well, look, we're quite excited about Trainium and Trainium 0:01:27.520 --> 0:01:30.360 three in particular, as you mentioned, excited to get it 0:01:30.360 --> 0:01:32.760 into customers' hands. And part of where we have a 0:01:32.800 --> 0:01:35.759 benefit that we can bring to bear is, as you mentioned, 0:01:35.760 --> 0:01:38.120 getting it into market quickly, and it's because we control 0:01:38.200 --> 0:01:41.360 that full stack. We control the silicon development, we control 0:01:41.360 --> 0:01:43.440 the data centers that it all lands in. We know that 0:01:43.480 --> 0:01:45.880 full environment, and we can land that in very large 0:01:45.880 --> 0:01:47.720 clusters for people to take advantage of, and the 0:01:47.760 --> 0:01:50.840 performance that we're seeing out of it is quite incredible, 0:01:50.840 --> 0:01:53.280 and so we're anxious and excited to get more and 0:01:53.320 --> 0:01:54.080 more people using it. 0:01:54.640 --> 0:01:56.600 I've been able to go inside Annapurna Labs 0:01:56.640 --> 0:01:59.280 and look at the engineering work between the first generation 0:01:59.360 --> 0:02:02.960 of Trainium and the second. It wasn't just the accelerator; it was 0:02:02.960 --> 0:02:04.320 the server level as well. 0:02:04.440 --> 0:02:04.800 That's right. 0:02:05.040 --> 0:02:07.040 But part of the surprise of today is this: 0:02:07.560 --> 0:02:10.560 you appear to be committing to an annual cadence, a 0:02:10.560 --> 0:02:13.799 new generation of Trainium. How do you keep that up? 0:02:14.000 --> 0:02:16.880 Well, the key thing that we're focused on is making 0:02:16.880 --> 0:02:20.160 sure that we can iterate on the technology as fast 0:02:20.160 --> 0:02:23.359 as possible. The desire and the hunger out there for 0:02:23.760 --> 0:02:28.240 more power and more compute is almost insatiable.
And so 0:02:28.320 --> 0:02:31.799 the more we can take an existing power footprint, an 0:02:31.800 --> 0:02:34.600 existing set of capabilities, and bring more and more compute 0:02:34.600 --> 0:02:38.799 into that for customers to build cool applications and cool 0:02:38.880 --> 0:02:42.000 environments and to get value from that, that's what we're focused on. And 0:02:42.040 --> 0:02:44.160 so we're going to be pushing that envelope as fast 0:02:44.200 --> 0:02:46.600 as we possibly can to get those new 0:02:46.600 --> 0:02:48.120 capabilities out to customers. 0:02:48.280 --> 0:02:50.760 The pitch for Trainium in both the training and inference 0:02:50.840 --> 0:02:54.160 use cases is that it's a great deal, you know, 0:02:54.320 --> 0:02:57.480 cost-effective performance. At the same time, you went on 0:02:57.520 --> 0:03:00.720 stage and said AWS is, quote, "by far the best 0:03:00.720 --> 0:03:05.560 place to run Nvidia GPUs." How are both possible? 0:03:05.320 --> 0:03:07.920 Well, I mean, both are possible because it 0:03:08.040 --> 0:03:12.760 is a great environment to run accelerators and compute in. 0:03:13.160 --> 0:03:15.960 And so we've been working for fifteen-plus years with 0:03:16.040 --> 0:03:19.400 the Nvidia team, Jensen and team, to deliver 0:03:19.639 --> 0:03:23.240 outstanding capabilities for our customers, and when you're running 0:03:23.280 --> 0:03:26.160 a large cluster of Nvidia GPUs, people will tell you 0:03:26.240 --> 0:03:28.600 AWS is the best place: you get the best performance, 0:03:28.639 --> 0:03:32.120 the most stable clusters, the best capabilities out there, and 0:03:32.320 --> 0:03:34.720 broad scale, and it's why folks like OpenAI and others 0:03:34.760 --> 0:03:37.880 are running in AWS, and we have that choice.
And 0:03:37.960 --> 0:03:39.800 so for others that want to be able to take 0:03:39.840 --> 0:03:42.800 advantage of Trainium, there are some use cases that are 0:03:42.800 --> 0:03:45.040 best for Trainium, and there are other use cases where Nvidia 0:03:45.080 --> 0:03:47.040 GPUs are going to be your best option. We want 0:03:47.040 --> 0:03:49.200 to have all of those available, and so we think 0:03:49.200 --> 0:03:51.680 that if we can continue to push the envelope on 0:03:51.720 --> 0:03:54.480 what Trainium can deliver for customers and make sure that 0:03:54.560 --> 0:03:58.040 we are supporting the latest and greatest from everything that 0:03:58.040 --> 0:04:00.800 the awesome team at Nvidia is delivering, that's going to 0:04:00.800 --> 0:04:02.240 be the best outcome for our customers. 0:04:02.800 --> 0:04:06.200 The plan for AWS is to basically double capacity by 0:04:06.200 --> 0:04:08.760 the end of twenty twenty-seven, to around eight gigawatts. 0:04:09.080 --> 0:04:11.480 Do you have a sense of how you apportion that 0:04:11.520 --> 0:04:15.920 capacity in terms of silicon and server designs, Trainium versus 0:04:16.080 --> 0:04:17.159 Nvidia GPUs? 0:04:18.000 --> 0:04:19.560 We're just going to keep pushing as fast as we 0:04:19.600 --> 0:04:22.200 can, and we'll see where customer demand drives us as 0:04:22.240 --> 0:04:25.719 we go. And as you said, we're massively adding capacity. 0:04:25.760 --> 0:04:28.200 In the last year alone, we've added three point eight 0:04:28.200 --> 0:04:30.760 gigawatts of capacity, and we'll continue to add more and 0:04:30.800 --> 0:04:33.640 more over the next couple of years, and we'll 0:04:33.680 --> 0:04:35.880 let customer demand drive us a little bit on what 0:04:35.920 --> 0:04:38.919 they're looking for and what they want, and that's what 0:04:38.960 --> 0:04:40.520 we always listen to and that's what we'll continue to 0:04:40.520 --> 0:04:40.880 listen to.
0:04:41.400 --> 0:04:43.760 The focus with Trainium, in the time I've been able 0:04:43.760 --> 0:04:46.240 to interact with you and talk about it, again, not 0:04:46.279 --> 0:04:49.400 just the accelerator but the server design level: there are 0:04:49.440 --> 0:04:52.239 a lot of benefits to the customer. When does that benefit 0:04:52.320 --> 0:04:55.520 start accruing to AWS in terms of profitability? Like, if 0:04:55.520 --> 0:04:59.160 it's such a good financial proposition, you must soon be able 0:04:59.240 --> 0:05:01.080 to say we're making a lot of money on this. 0:05:01.240 --> 0:05:03.600 Yeah. Well, you're already seeing some of the benefits 0:05:03.680 --> 0:05:06.239 accrue. You see things like Bedrock growing really, really rapidly, 0:05:06.440 --> 0:05:08.760 and you see Trainium powering that under the covers, and 0:05:08.800 --> 0:05:12.920 we announced today that more than half of all tokens 0:05:12.920 --> 0:05:15.760 and inference done in Bedrock are done on Trainium two 0:05:15.760 --> 0:05:18.000 servers under the covers, and so you're already seeing that 0:05:18.040 --> 0:05:21.160 benefit come. You see the models that we're building in 0:05:21.240 --> 0:05:23.480 Nova and Nova two start to get better and better 0:05:23.520 --> 0:05:26.839 over time and be accelerated by Trainium, and so we 0:05:26.880 --> 0:05:29.240 really think that there's a whole bunch of dimensions on 0:05:29.279 --> 0:05:32.520 which our customers, our partners, and our own products 0:05:32.520 --> 0:05:34.480 are going to get accelerated, all from Trainium. 0:05:34.800 --> 0:05:37.200 Every time you come onto the program, I always offer 0:05:37.240 --> 0:05:39.400 the audience an opportunity to pose a question to you. There's 0:05:39.400 --> 0:05:41.680 a lot of interest in AWS, right? Many of your 0:05:41.920 --> 0:05:45.680 customers span global technology. Actually, most of the questions were 0:05:45.720 --> 0:05:49.680 about Anthropic.
There wasn't much said on stage. I think 0:05:49.680 --> 0:05:52.760 people are trying to understand what is the benefit and 0:05:52.839 --> 0:05:57.560 advantage AWS offers to Anthropic while they are ramping Trainium 0:05:57.560 --> 0:06:01.320 through Project Rainier, but also ramping their TPU allocations as well. 0:06:01.839 --> 0:06:04.000 Well, look, our partners at Anthropic, our partnership with 0:06:04.000 --> 0:06:06.440 them is incredibly strong and it's never been stronger, and 0:06:07.720 --> 0:06:09.720 we do a ton of collaboration with them, and as 0:06:09.720 --> 0:06:12.480 I mentioned, through Project Rainier, it's a huge collaboration 0:06:12.600 --> 0:06:15.279 there to go build their current generation models, and all 0:06:15.320 --> 0:06:18.400 their models run today and launch on day one on 0:06:18.520 --> 0:06:21.120 top of Trainium and on top of AWS, which we're 0:06:21.160 --> 0:06:23.520 incredibly excited about, and we'll continue that partnership for 0:06:23.520 --> 0:06:26.560 a long time. I think for them, they have a 0:06:26.680 --> 0:06:29.080 huge demand for compute, and so they'll go to other 0:06:29.120 --> 0:06:32.080 places where it makes sense to round out their compute 0:06:32.480 --> 0:06:35.640 needs because they just have such massive needs for compute, 0:06:35.680 --> 0:06:38.039 and they have customers in other clouds as well. But 0:06:38.080 --> 0:06:41.640 we're definitely their primary cloud provider and closest partner, 0:06:41.680 --> 0:06:42.000 for sure. 0:06:42.960 --> 0:06:46.680 Supply constraints. So Anthropic is supply constrained; they can't 0:06:46.680 --> 0:06:49.400 get the compute they need. We've talked about the ramp on 0:06:49.440 --> 0:06:52.279 Nvidia GPUs and in-house silicon. Is there a 0:06:52.320 --> 0:06:55.679 supply constraint element with AWS? Are you able to get 0:06:56.040 --> 0:06:57.479 the chips that you need?
0:06:57.640 --> 0:07:01.440 Yeah, I think anytime you see an industry 0:07:01.440 --> 0:07:03.840 that's growing as fast as this is right now, when 0:07:03.839 --> 0:07:07.120 you think about AI and model development and chips, there 0:07:07.160 --> 0:07:09.440 are going to be constraints no matter what. There is 0:07:09.520 --> 0:07:13.040 more demand than there is supply. Sometimes it's in chips, 0:07:13.080 --> 0:07:15.680 sometimes it's in power and data centers, sometimes it's in, 0:07:16.440 --> 0:07:19.120 you know, different parts of that. At some points it's, 0:07:19.440 --> 0:07:23.320 you know, networking equipment. At some point it's transistors, you know, 0:07:23.480 --> 0:07:25.520 resistors or whatever it is, and you look at the 0:07:25.680 --> 0:07:29.320 entire supply chain that is needed to ramp up at 0:07:29.400 --> 0:07:32.720 such a massive rate, right? Never before has the technology 0:07:32.720 --> 0:07:34.640 industry ramped at the rate that we are right now, 0:07:35.240 --> 0:07:37.640 and so there are always constraints. And so it's not 0:07:37.680 --> 0:07:40.920 necessarily that there is one constraint where it's like, wow, 0:07:40.960 --> 0:07:42.600 I can't get Nvidia chips. We can get 0:07:42.640 --> 0:07:45.720 Nvidia chips. And actually, Jensen and team have been incredibly supportive 0:07:45.840 --> 0:07:48.160 and great partners in helping us get capacity there. It's 0:07:48.200 --> 0:07:50.480 not that you can't get power. We're getting power all 0:07:50.520 --> 0:07:52.720 over the place. But it's just that we're ramping all of 0:07:52.760 --> 0:07:56.280 these places at such rapid rates that there's always a 0:07:56.320 --> 0:07:58.560 constraint in that system, and it'll change every month you 0:07:58.600 --> 0:07:59.720 ask me what the current one is.
0:08:00.080 --> 0:08:01.960 Throughout the day, we were just speaking with your team 0:08:02.040 --> 0:08:05.200 about the idea that we're moving from AI assistants to AI 0:08:05.280 --> 0:08:08.400 co-workers, you know, a particular focus on the agentic offering 0:08:08.440 --> 0:08:10.560 that you've done. You're in the camp of people, if 0:08:10.600 --> 0:08:13.520 you don't mind me saying, that sees basically ninety percent 0:08:13.560 --> 0:08:16.840 of the value in enterprise coming from agentic technology. Do 0:08:16.880 --> 0:08:19.280 you have any data or evidence to support that all 0:08:19.320 --> 0:08:20.800 of your customers are ready for that? 0:08:21.320 --> 0:08:23.920 Yeah, I don't think all of our customers are 0:08:23.920 --> 0:08:25.800 yet ready for that, but they're excited about it. So, 0:08:25.840 --> 0:08:27.360 you know, I think it'd definitely be an overstatement to 0:08:27.360 --> 0:08:29.280 say everybody's ready for it. And part of that is 0:08:29.320 --> 0:08:31.480 because it is going to take change, right? People are 0:08:31.480 --> 0:08:33.240 going to have to change how they think about work. 0:08:33.280 --> 0:08:35.240 They're going to have to change their process flows; they're 0:08:35.240 --> 0:08:37.240 going to have to change some of the things about how 0:08:37.240 --> 0:08:38.680 they get work done. It's not just going to be 0:08:38.880 --> 0:08:40.920 a magic wand that's going to come in and magically 0:08:40.920 --> 0:08:44.199 get them value. But almost everyone that I 0:08:44.280 --> 0:08:46.440 talk to definitely sees that that's the path. The 0:08:46.520 --> 0:08:50.560 agentic power, the power of agents, is what 0:08:50.720 --> 0:08:53.480 allows customers to actually get that work done. And when 0:08:53.520 --> 0:08:55.880 they see that efficiency gain, they see themselves able to 0:08:55.880 --> 0:08:58.640 accomplish things they weren't able to do before.
That is 0:08:58.640 --> 0:09:00.440 when it's worth it to go make these changes. And 0:09:00.480 --> 0:09:01.920 so there's going to be work for people, and it's 0:09:01.960 --> 0:09:04.440 going to take some time, right? We're twenty 0:09:04.480 --> 0:09:07.319 years into the cloud journey and still only a fraction 0:09:07.400 --> 0:09:09.280 of workloads have moved to the cloud. So it's going 0:09:09.360 --> 0:09:10.720 to take time. It's not like people are going 0:09:10.800 --> 0:09:12.880 to magically switch. And I think it's going to be really... 0:09:12.800 --> 0:09:15.719 Fair. We just have sixty seconds. Twenty years into 0:09:15.760 --> 0:09:17.920 the cloud journey, when I touched down in Vegas, everyone 0:09:17.960 --> 0:09:23.000 accepts AWS is number one in terms of scale infrastructure. The question is, 0:09:23.200 --> 0:09:26.319 is AWS number one in AI? Just in the thirty 0:09:26.360 --> 0:09:28.760 seconds we have left. Yeah, I think I'll give it. 0:09:28.760 --> 0:09:30.240 It's a question that we got a lot two years 0:09:30.240 --> 0:09:32.160 ago and not that much a year ago, and today 0:09:32.160 --> 0:09:33.719 I don't think we get that nearly as much. It's 0:09:33.720 --> 0:09:35.360 just people that are kind of playing the same tapes. 0:09:35.679 --> 0:09:39.319 We have a huge choice of models. We see that when 0:09:39.360 --> 0:09:42.480 customers are actually moving their workloads to production, they want 0:09:42.480 --> 0:09:44.440 to run those AI workloads on AWS, and that, to 0:09:44.480 --> 0:09:46.800 me, is the biggest signal. When we see our customers, 0:09:46.840 --> 0:09:48.600 they say, I ran proofs of concept in a lot 0:09:48.600 --> 0:09:50.439 of places; when I want to move to production, I 0:09:50.480 --> 0:09:52.200 want to run on AWS. And that's the thing that 0:09:52.200 --> 0:09:54.120 we hear over and over again, which makes me think 0:09:54.160 --> 0:09:55.320 we're actually in a great position.
0:09:55.640 --> 0:09:59.240 Matt Garman, AWS CEO, with the full-stack AI company 0:09:59.280 --> 0:10:01.080 pitch here in Vegas at re:Invent.