WEBVTT - AWS CEO Matt Garman Talks AI Race

0:00:02.520 --> 0:00:07.000
<v Speaker 1>Bloomberg Audio Studios, podcasts, radio news.

0:00:07.840 --> 0:00:10.160
<v Speaker 2>Now, let's narrow our focus from the broader markets to

0:00:10.280 --> 0:00:15.079
<v Speaker 2>one single stock. Amazon, the tech giant, hosting its annual

0:00:15.160 --> 0:00:19.280
<v Speaker 2>Amazon Web Services re:Invent conference down in Las Vegas this week.

0:00:19.520 --> 0:00:23.400
<v Speaker 2>The cloud-focused confab draws developers, engineers, and other thought

0:00:23.480 --> 0:00:25.840
<v Speaker 2>leaders in tech to explore the latest cloud and AI

0:00:25.960 --> 0:00:29.640
<v Speaker 2>projects happening under Amazon's roof, including a new AI chip.

0:00:29.960 --> 0:00:32.839
<v Speaker 2>Let's go live now, where Bloomberg Tech co-host Ed

0:00:33.000 --> 0:00:38.960
<v Speaker 2>Ludlow is joined by a special guest. Ed, take it away. Yeah.

0:00:39.000 --> 0:00:41.639
<v Speaker 3>Three pieces of news move markets this morning. A new

0:00:41.720 --> 0:00:45.960
<v Speaker 3>generation of frontier model from AWS, new agentic tools, and

0:00:46.000 --> 0:00:51.720
<v Speaker 3>then a very quickly released, installed and now ramping generation

0:00:51.920 --> 0:00:55.520
<v Speaker 3>of in-house custom accelerator, which is Trainium three. All

0:00:55.520 --> 0:00:59.640
<v Speaker 3>points of discussion for Matt Garman, AWS CEO. You know, the

0:00:59.680 --> 0:01:02.920
<v Speaker 3>base point with Trainium three and you've moved quickly to

0:01:02.960 --> 0:01:07.360
<v Speaker 3>bring it to the real world is cost performance efficiency

0:01:07.440 --> 0:01:10.800
<v Speaker 3>over the prior generation, but also over Nvidia GPUs,

0:01:10.920 --> 0:01:14.360
<v Speaker 3>over Google TPUs. I think what people are trying to

0:01:14.400 --> 0:01:17.440
<v Speaker 3>understand is that ramp part I was talking about when

0:01:17.560 --> 0:01:22.240
<v Speaker 3>real-world customers use it beyond this anchor customer of Anthropic,

0:01:22.480 --> 0:01:23.800
<v Speaker 3>which relies on it currently.

0:01:24.200 --> 0:01:27.520
<v Speaker 1>Yeah. Well, look, we're quite excited about Trainium and Trainium

0:01:27.520 --> 0:01:30.360
<v Speaker 1>three in particular, as you mentioned, excited to get it

0:01:30.360 --> 0:01:32.760
<v Speaker 1>into customers' hands. And part of where we have a

0:01:32.800 --> 0:01:35.759
<v Speaker 1>benefit that we can bring to bear is, as you mentioned,

0:01:35.760 --> 0:01:38.120
<v Speaker 1>getting it into market quickly, and it's because we control

0:01:38.200 --> 0:01:41.360
<v Speaker 1>that full stack. We control the silicon development, we control

0:01:41.360 --> 0:01:43.440
<v Speaker 1>the data centers that it all lands in. We know that

0:01:43.480 --> 0:01:45.880
<v Speaker 1>full environment, and we can land that in very large

0:01:45.880 --> 0:01:47.720
<v Speaker 1>clusters for people to take advantage of that, and the

0:01:47.760 --> 0:01:50.840
<v Speaker 1>performance that we're seeing out of it is quite incredible,

0:01:50.840 --> 0:01:53.280
<v Speaker 1>and so we're anxious and excited to get more and

0:01:53.320 --> 0:01:54.080
<v Speaker 1>more people using it.

0:01:54.640 --> 0:01:56.600
<v Speaker 3>I've been able to go inside Annapurna Labs

0:01:56.640 --> 0:01:59.280
<v Speaker 3>and look at the engineering work between the first generation

0:01:59.360 --> 0:02:02.960
<v Speaker 3>of Trainium and second. It wasn't just the accelerator, but at

0:02:02.960 --> 0:02:04.320
<v Speaker 3>the server level as well.

0:02:04.440 --> 0:02:04.800
<v Speaker 1>That's right.

0:02:05.040 --> 0:02:07.040
<v Speaker 3>But a part of the surprise of today is this,

0:02:07.560 --> 0:02:10.560
<v Speaker 3>you appear to be committing to an annual cadence, a

0:02:10.560 --> 0:02:13.799
<v Speaker 3>new generation of Trainium. How do you keep that up?

0:02:14.000 --> 0:02:16.880
<v Speaker 1>Well, the key thing that we're focused on is making

0:02:16.880 --> 0:02:20.160
<v Speaker 1>sure that we can iterate on the technology as fast

0:02:20.160 --> 0:02:23.359
<v Speaker 1>as possible. The desire and the hunger out there for

0:02:23.760 --> 0:02:28.240
<v Speaker 1>more power and more compute is almost insatiable. And so

0:02:28.320 --> 0:02:31.799
<v Speaker 1>the more we can take an existing power footprint, an

0:02:31.800 --> 0:02:34.600
<v Speaker 1>existing set of capabilities and bring more and more compute

0:02:34.600 --> 0:02:38.799
<v Speaker 1>into that for customers to build cool applications and cool

0:02:38.880 --> 0:02:42.000
<v Speaker 1>environments and to get value from that, that's what we're focused on. And

0:02:42.040 --> 0:02:44.160
<v Speaker 1>so we're going to be pushing that envelope as fast

0:02:44.200 --> 0:02:46.600
<v Speaker 1>as we possibly can to get those new

0:02:46.600 --> 0:02:48.120
<v Speaker 1>and new capabilities out to customers.

0:02:48.280 --> 0:02:50.760
<v Speaker 3>The pitch for Trainium in both the training and inference

0:02:50.840 --> 0:02:54.160
<v Speaker 3>use case is that it's a great deal, you know,

0:02:54.320 --> 0:02:57.480
<v Speaker 3>cost effective performance. At the same time, you went on

0:02:57.520 --> 0:03:00.720
<v Speaker 3>stage and said AWS is, quote, by far the best

0:03:00.720 --> 0:03:05.560
<v Speaker 3>place to run Nvidia GPUs. How are both possible?

0:03:05.320 --> 0:03:07.920
<v Speaker 1>Well, I mean, both are possible because that

0:03:08.040 --> 0:03:12.760
<v Speaker 1>is a great environment to run accelerators and compute in.

0:03:13.160 --> 0:03:15.960
<v Speaker 1>And so we've been working for fifteen plus years with

0:03:16.040 --> 0:03:19.400
<v Speaker 1>the Nvidia team, and Jensen and team, to deliver

0:03:19.639 --> 0:03:23.240
<v Speaker 1>outstanding capabilities for our customers. And when you're running

0:03:23.280 --> 0:03:26.160
<v Speaker 1>a large cluster of Nvidia GPUs, people will tell you

0:03:26.240 --> 0:03:28.600
<v Speaker 1>AWS is the best place. You get the best performance,

0:03:28.639 --> 0:03:32.120
<v Speaker 1>the most stable cluster, the best capabilities out there, and

0:03:32.320 --> 0:03:34.720
<v Speaker 1>broad scale, and it's why folks like OpenAI and others

0:03:34.760 --> 0:03:37.880
<v Speaker 1>are running in AWS and we have that choice. And

0:03:37.960 --> 0:03:39.800
<v Speaker 1>so for others that want to be able to take

0:03:39.840 --> 0:03:42.800
<v Speaker 1>advantage of Trainium, and there's some use cases that are

0:03:42.800 --> 0:03:45.040
<v Speaker 1>best for Trainium, there's other use cases where Nvidia

0:03:45.080 --> 0:03:47.040
<v Speaker 1>GPUs are going to be your best option. We want

0:03:47.040 --> 0:03:49.200
<v Speaker 1>to have all of those available, and so we think

0:03:49.200 --> 0:03:51.680
<v Speaker 1>that if we can continue to push the envelope on

0:03:51.720 --> 0:03:54.480
<v Speaker 1>what Trainium can deliver for customers and make sure that

0:03:54.560 --> 0:03:58.040
<v Speaker 1>we are supporting the latest and greatest from everything that

0:03:58.040 --> 0:04:00.800
<v Speaker 1>the awesome team at Nvidia is delivering, that's going to

0:04:00.800 --> 0:04:02.240
<v Speaker 1>be the best outcome for our customers.

0:04:02.800 --> 0:04:06.200
<v Speaker 3>The plan for AWS is to basically double capacity by

0:04:06.200 --> 0:04:08.760
<v Speaker 3>the end of twenty twenty seven to around eight gigawatts.

0:04:09.080 --> 0:04:11.480
<v Speaker 3>Do you have a sense of how you apportion that

0:04:11.520 --> 0:04:15.920
<v Speaker 3>capacity in in-house silicon and server designs to Trainium versus

0:04:16.080 --> 0:04:17.159
<v Speaker 3>Nvidia GPUs?

0:04:18.000 --> 0:04:19.560
<v Speaker 1>We're just going to keep pushing as fast as we

0:04:19.600 --> 0:04:22.200
<v Speaker 1>can and we'll see where customer demand drives us as

0:04:22.240 --> 0:04:25.719
<v Speaker 1>we go. And as you said, we're massively adding capacity.

0:04:25.760 --> 0:04:28.200
<v Speaker 1>In the last year alone, we've added three point eight

0:04:28.200 --> 0:04:30.760
<v Speaker 1>gigawatts of capacity, and we'll continue to add more and

0:04:30.800 --> 0:04:33.640
<v Speaker 1>more over the next couple of years, and we'll

0:04:33.680 --> 0:04:35.880
<v Speaker 1>let customer demands drive us a little bit on what

0:04:35.920 --> 0:04:38.919
<v Speaker 1>they're looking for and what they want, and that's what

0:04:38.960 --> 0:04:40.520
<v Speaker 1>we always listen to and that's what we'll continue to

0:04:40.520 --> 0:04:40.880
<v Speaker 1>listen to.

0:04:41.400 --> 0:04:43.760
<v Speaker 3>The focus with Trainium in the time I've been able

0:04:43.760 --> 0:04:46.240
<v Speaker 3>to interact with you and talk about it is, again, not

0:04:46.279 --> 0:04:49.400
<v Speaker 3>just the accelerator, but at the server design level, there's

0:04:49.440 --> 0:04:52.239
<v Speaker 3>a lot of benefits to the customer. When does that benefit

0:04:52.320 --> 0:04:55.520
<v Speaker 3>start accruing to AWS in terms of profitability, Like if

0:04:55.520 --> 0:04:59.160
<v Speaker 3>it's such a good financial proposition, you must be able

0:04:59.240 --> 0:05:01.080
<v Speaker 3>soon to say we're making a lot of money on this.

0:05:01.240 --> 0:05:03.600
<v Speaker 1>Yeah, well, you're already seeing some of the benefits

0:05:03.680 --> 0:05:06.239
<v Speaker 1>accrue. You see things like Bedrock growing really, really rapidly,

0:05:06.440 --> 0:05:08.760
<v Speaker 1>and you see Trainium powering that under the covers, and

0:05:08.800 --> 0:05:12.920
<v Speaker 1>we announced today that more than half of all tokens

0:05:12.920 --> 0:05:15.760
<v Speaker 1>and inference done in Bedrock are done on Trainium two

0:05:15.760 --> 0:05:18.000
<v Speaker 1>servers under the covers, and so you're already seeing that

0:05:18.040 --> 0:05:21.160
<v Speaker 1>benefit come. You see the models that we're building in

0:05:21.240 --> 0:05:23.480
<v Speaker 1>Nova and Nova two start to get better and better

0:05:23.520 --> 0:05:26.839
<v Speaker 1>over time and be accelerated by Trainium, and so we

0:05:26.880 --> 0:05:29.240
<v Speaker 1>really think that there's a whole bunch of dimensions on

0:05:29.279 --> 0:05:32.520
<v Speaker 1>which both our customers, our partners, and our own products

0:05:32.520 --> 0:05:34.480
<v Speaker 1>are going to get accelerated all from Trainium.

0:05:34.800 --> 0:05:37.200
<v Speaker 3>Every time you come onto the program, I always offer

0:05:37.240 --> 0:05:39.400
<v Speaker 3>the audience opportunity to pose a question to you. There's

0:05:39.400 --> 0:05:41.680
<v Speaker 3>a lot of interest in AWS, right? Many of your

0:05:41.920 --> 0:05:45.680
<v Speaker 3>customers span global technology. Actually most of the questions were

0:05:45.720 --> 0:05:49.680
<v Speaker 3>about Anthropic. There wasn't much said on stage. I think

0:05:49.680 --> 0:05:52.760
<v Speaker 3>people are trying to understand what is the benefit and

0:05:52.839 --> 0:05:57.560
<v Speaker 3>advantage AWS offers to Anthropic while they are ramping Trainium

0:05:57.560 --> 0:06:01.320
<v Speaker 3>through Project Rainier, but also ramping their TPU allocations as well.

0:06:01.839 --> 0:06:04.000
<v Speaker 1>Well, look, our partners at Anthropic, our partnership with

0:06:04.000 --> 0:06:06.440
<v Speaker 1>them is incredibly strong and it's never been stronger, and

0:06:07.720 --> 0:06:09.720
<v Speaker 1>we do a ton of collaboration with them, and as

0:06:09.720 --> 0:06:12.480
<v Speaker 1>I mentioned, through Project Rainier, it's a huge collaboration

0:06:12.600 --> 0:06:15.279
<v Speaker 1>there to go build their current generation models and all

0:06:15.320 --> 0:06:18.400
<v Speaker 1>their models run today and launch on day one on

0:06:18.520 --> 0:06:21.120
<v Speaker 1>top of Trainium and on top of AWS, which we're

0:06:21.160 --> 0:06:23.520
<v Speaker 1>incredibly excited about, and we'll continue that partnership for

0:06:23.520 --> 0:06:26.560
<v Speaker 1>a long time. I think from them, they have a

0:06:26.680 --> 0:06:29.080
<v Speaker 1>huge demand for compute, and so they'll go to other

0:06:29.120 --> 0:06:32.080
<v Speaker 1>places where it makes sense to round out their compute

0:06:32.480 --> 0:06:35.640
<v Speaker 1>needs because they just have such massive needs for compute,

0:06:35.680 --> 0:06:38.039
<v Speaker 1>and they have customers in other clouds as well, But

0:06:38.080 --> 0:06:41.640
<v Speaker 1>we're definitely their primary cloud provider and closest partner.

0:06:41.680 --> 0:06:42.000
<v Speaker 1>For sure.

0:06:42.960 --> 0:06:46.680
<v Speaker 3>Supply constraints. So Anthropic is supply constrained, they can't

0:06:46.680 --> 0:06:49.400
<v Speaker 3>get the compute they need. We've talked about the ramp on

0:06:49.440 --> 0:06:52.279
<v Speaker 3>Nvidia GPUs and in-house silicon. Is there a

0:06:52.320 --> 0:06:55.679
<v Speaker 3>supply constraint element with AWS? Are you able to get

0:06:56.040 --> 0:06:57.479
<v Speaker 3>the chips that you need.

0:06:57.640 --> 0:07:01.440
<v Speaker 1>Yeah, I think there's always anytime you see an industry

0:07:01.440 --> 0:07:03.840
<v Speaker 1>that's growing as fast as this is right now, when

0:07:03.839 --> 0:07:07.120
<v Speaker 1>you think about AI and model development and chips, there

0:07:07.160 --> 0:07:09.440
<v Speaker 1>are going to be constraints. No matter what. There is

0:07:09.520 --> 0:07:13.040
<v Speaker 1>more demand than there is supply. Sometimes it's in chips,

0:07:13.080 --> 0:07:15.680
<v Speaker 1>Sometimes it's in power and data centers. Sometimes it's in

0:07:16.440 --> 0:07:19.120
<v Speaker 1>you know, different parts of that. At some points it's

0:07:19.440 --> 0:07:23.320
<v Speaker 1>you know, networking equipment. At some point it's transistors, you know,

0:07:23.480 --> 0:07:25.520
<v Speaker 1>resistors or whatever it is, and you look at the

0:07:25.680 --> 0:07:29.320
<v Speaker 1>entire supply chain that is needed to ramp up at

0:07:29.400 --> 0:07:32.720
<v Speaker 1>such a massive rate right Never before has the technology

0:07:32.720 --> 0:07:34.640
<v Speaker 1>industry ramped at the rate that we are right now,

0:07:35.240 --> 0:07:37.640
<v Speaker 1>and so there are always constraints, and so it's not

0:07:37.680 --> 0:07:40.920
<v Speaker 1>necessarily that there is necessarily one constraint where it's like, wow,

0:07:40.960 --> 0:07:42.600
<v Speaker 1>I can't get Nvidia chips. We can get

0:07:42.640 --> 0:07:45.720
<v Speaker 1>Nvidia chips. And actually Jensen and team have been incredibly supportive

0:07:45.840 --> 0:07:48.160
<v Speaker 1>and great partners in helping us get capacity there. It's

0:07:48.200 --> 0:07:50.480
<v Speaker 1>not that you can't get power. We're getting power all

0:07:50.520 --> 0:07:52.720
<v Speaker 1>over the place. But it's just we're ramping all of

0:07:52.760 --> 0:07:56.280
<v Speaker 1>these places at such rapid rates that always there's a

0:07:56.320 --> 0:07:58.560
<v Speaker 1>constraint in that system, and it'll change every month you

0:07:58.600 --> 0:07:59.720
<v Speaker 1>ask me of what the current one is.

0:08:00.080 --> 0:08:01.960
<v Speaker 3>Throughout the day, we were just speaking with your team

0:08:02.040 --> 0:08:05.200
<v Speaker 3>about the idea we're moving from AI assistants to AI

0:08:05.280 --> 0:08:08.400
<v Speaker 3>co-workers. You know, particular focus on the agentic offering

0:08:08.440 --> 0:08:10.560
<v Speaker 3>that you've done. You're in the camp of people, if

0:08:10.600 --> 0:08:13.520
<v Speaker 3>you don't mind me saying, that sees basically ninety percent

0:08:13.560 --> 0:08:16.840
<v Speaker 3>of the value in enterprise coming from agentic technology. Do

0:08:16.880 --> 0:08:19.280
<v Speaker 3>you have any data or evidence to support that all

0:08:19.320 --> 0:08:20.800
<v Speaker 3>of your customers are ready for that?

0:08:21.320 --> 0:08:23.920
<v Speaker 1>Yeah, I don't think all of our customers are yet

0:08:23.920 --> 0:08:25.800
<v Speaker 1>ready for that, but they're excited about it. So,

0:08:25.840 --> 0:08:27.360
<v Speaker 1>you know, I think it'd definitely be an overstatement to

0:08:27.360 --> 0:08:29.280
<v Speaker 1>say everybody's ready for it. And part of that is

0:08:29.320 --> 0:08:31.480
<v Speaker 1>because it is going to take change. Right, people are

0:08:31.480 --> 0:08:33.240
<v Speaker 1>going to have to change how they think about work.

0:08:33.280 --> 0:08:35.240
<v Speaker 1>They're going to have to change their process flows, they're

0:08:35.240 --> 0:08:37.240
<v Speaker 1>going to have to change some of the things about how

0:08:37.240 --> 0:08:38.680
<v Speaker 1>they get work done. It's not just going to be

0:08:38.880 --> 0:08:40.920
<v Speaker 1>a magic wand that's going to come in and magically

0:08:40.920 --> 0:08:44.199
<v Speaker 1>get them to get value. But almost everyone that I

0:08:44.280 --> 0:08:46.440
<v Speaker 1>talk to definitely sees that that's the path. The

0:08:46.520 --> 0:08:50.560
<v Speaker 1>agentic power, the power of agents, is what

0:08:50.720 --> 0:08:53.480
<v Speaker 1>allows customers to actually get that work done. And when

0:08:53.520 --> 0:08:55.880
<v Speaker 1>they see that efficiency gain, they see them able to

0:08:55.880 --> 0:08:58.640
<v Speaker 1>accomplish things they weren't able to do before. That is

0:08:58.640 --> 0:09:00.440
<v Speaker 1>when it's worth it to go make these changes. And

0:09:00.480 --> 0:09:01.920
<v Speaker 1>so there's going to be work for people, and it's

0:09:01.960 --> 0:09:04.440
<v Speaker 1>going to take some time, right? We're twenty

0:09:04.480 --> 0:09:07.319
<v Speaker 1>years into the cloud journey and still only a fraction

0:09:07.400 --> 0:09:09.280
<v Speaker 1>of workloads have moved to the cloud. So it's going

0:09:09.360 --> 0:09:10.720
<v Speaker 1>to take time. It's not like the people are going

0:09:10.800 --> 0:09:12.880
<v Speaker 1>to magically switch. And I think it's going to be really.

0:09:12.800 --> 0:09:15.719
<v Speaker 3>Fair that we just have sixty seconds. Twenty years into

0:09:15.760 --> 0:09:17.920
<v Speaker 3>the cloud journey. When I touched down in Vegas, everyone

0:09:17.960 --> 0:09:23.000
<v Speaker 3>accepts AWS number one in terms of scale infrastructure. They question,

0:09:23.200 --> 0:09:26.319
<v Speaker 3>is AWS number one in AI? Just in the thirty

0:09:26.360 --> 0:09:28.760
<v Speaker 3>seconds we have left? Yeah, I think I'll give it.

0:09:28.760 --> 0:09:30.240
<v Speaker 1>It's a question that we got a lot two years

0:09:30.240 --> 0:09:32.160
<v Speaker 1>ago and not that much a year ago, and today

0:09:32.160 --> 0:09:33.719
<v Speaker 1>I don't think we get that nearly as much. It's

0:09:33.720 --> 0:09:35.360
<v Speaker 1>just people that are kind of playing the same tapes.

0:09:35.679 --> 0:09:39.319
<v Speaker 1>We have a huge choice of models. We see when

0:09:39.360 --> 0:09:42.480
<v Speaker 1>customers are actually moving their workloads to production, they want

0:09:42.480 --> 0:09:44.440
<v Speaker 1>to run those AI workloads on AWS, and that to

0:09:44.480 --> 0:09:46.800
<v Speaker 1>me is the biggest signal. When we see our customers

0:09:46.840 --> 0:09:48.600
<v Speaker 1>they say, I ran proof of concepts in a lot

0:09:48.600 --> 0:09:50.439
<v Speaker 1>of places. When I want to move to production, I

0:09:50.480 --> 0:09:52.200
<v Speaker 1>want to run on AWS. And that's the thing that

0:09:52.200 --> 0:09:54.120
<v Speaker 1>we hear over and over again, which makes me think

0:09:54.160 --> 0:09:55.320
<v Speaker 1>we're actually in a great position.

0:09:55.640 --> 0:09:59.240
<v Speaker 3>Matt Garman, AWS CEO, with the full-stack AI company

0:09:59.280 --> 0:10:01.080
<v Speaker 3>pitch here in Vegas at re:Invent.