WEBVTT - The Amazon Web Services Outage

0:00:04.400 --> 0:00:07.800
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio.

0:00:12.160 --> 0:00:15.120
<v Speaker 1>Hey there, and welcome to TechStuff. I'm your host,

0:00:15.280 --> 0:00:18.400
<v Speaker 1>Jonathan Strickland. I'm an executive producer with iHeartRadio

0:00:18.440 --> 0:00:21.560
<v Speaker 1>and I love all things tech. And last week,

0:00:21.960 --> 0:00:27.360
<v Speaker 1>Amazon's US-EAST-1 cloud region had a bit of

0:00:27.400 --> 0:00:32.360
<v Speaker 1>an outage, and the effects were widespread. Amazon delivery services

0:00:32.400 --> 0:00:35.680
<v Speaker 1>were affected. A lot of deliveries just couldn't be made

0:00:35.720 --> 0:00:40.240
<v Speaker 1>because the whole system that underlies that, the computer system,

0:00:40.320 --> 0:00:45.919
<v Speaker 1>was affected. Computer games like PlayerUnknown's Battlegrounds became unavailable.

0:00:46.320 --> 0:00:49.520
<v Speaker 1>People discovered that some of their home automation devices weren't

0:00:49.520 --> 0:00:53.600
<v Speaker 1>working properly. Roombas went berserk and rose up against

0:00:53.600 --> 0:00:57.720
<v Speaker 1>their human owners. Even down at Walt Disney World, guests

0:00:57.800 --> 0:01:01.360
<v Speaker 1>found themselves struggling with systems like Genie Plus, or even

0:01:01.400 --> 0:01:04.080
<v Speaker 1>just making a park reservation so that they could visit

0:01:04.160 --> 0:01:06.720
<v Speaker 1>a theme park. Also, I kind of made up the

0:01:06.800 --> 0:01:11.319
<v Speaker 1>Roomba thing. So today I thought I would talk a

0:01:11.360 --> 0:01:15.399
<v Speaker 1>little bit about the history of Amazon Web Services, what

0:01:15.600 --> 0:01:19.280
<v Speaker 1>it actually does, why it's such a big deal for

0:01:19.319 --> 0:01:23.520
<v Speaker 1>Amazon the company, and why when there's an outage it

0:01:23.640 --> 0:01:28.040
<v Speaker 1>has such a widespread effect. Now, the history of Amazon

0:01:28.120 --> 0:01:32.119
<v Speaker 1>Web Services or AWS goes back a couple of decades

0:01:32.360 --> 0:01:35.560
<v Speaker 1>and it is tied closely with the general rise of

0:01:35.640 --> 0:01:39.800
<v Speaker 1>cloud computing. So first, let's define cloud computing just so

0:01:39.840 --> 0:01:43.080
<v Speaker 1>that we have a common language. Now, if you were

0:01:43.160 --> 0:01:47.919
<v Speaker 1>to go to Google and query the terms cloud computing definition,

0:01:48.520 --> 0:01:52.320
<v Speaker 1>you would likely get something like the following quote. The

0:01:52.360 --> 0:01:56.040
<v Speaker 1>practice of using a network of remote servers hosted on

0:01:56.080 --> 0:02:00.360
<v Speaker 1>the Internet to store, manage, and process data, rather than

0:02:00.400 --> 0:02:04.520
<v Speaker 1>a local server or a personal computer end quote. So,

0:02:05.320 --> 0:02:09.919
<v Speaker 1>at its most simplest form, cloud computing is when you

0:02:10.000 --> 0:02:14.839
<v Speaker 1>access computational resources that are on someone else's computer and

0:02:14.960 --> 0:02:17.440
<v Speaker 1>you use the Internet to do it. So, if you

0:02:17.560 --> 0:02:21.400
<v Speaker 1>use any sort of cloud storage like OneDrive or

0:02:21.480 --> 0:02:24.520
<v Speaker 1>Dropbox or any of a thousand others, what you

0:02:24.560 --> 0:02:29.000
<v Speaker 1>are actually doing is saving files to special data servers

0:02:29.040 --> 0:02:33.200
<v Speaker 1>that are in some massive server farm somewhere in the world,

0:02:33.320 --> 0:02:37.280
<v Speaker 1>probably not too far from where you are. Or maybe

0:02:37.639 --> 0:02:40.560
<v Speaker 1>you're actually saving that one file to servers that are

0:02:40.560 --> 0:02:44.560
<v Speaker 1>in a few different massive server farms. Though you wouldn't

0:02:44.600 --> 0:02:47.400
<v Speaker 1>necessarily be aware of any of this, because that would

0:02:47.400 --> 0:02:49.720
<v Speaker 1>be going on in the background, and it would be

0:02:49.760 --> 0:02:52.799
<v Speaker 1>a matter of redundancy to make sure that your file

0:02:53.440 --> 0:02:56.680
<v Speaker 1>remains available even if something should happen to any one

0:02:56.840 --> 0:03:00.960
<v Speaker 1>particular machine. So when you access that file, what you're

0:03:01.000 --> 0:03:04.440
<v Speaker 1>doing is connecting back to one of those servers that

0:03:04.560 --> 0:03:08.160
<v Speaker 1>holds that particular file, and you might download the file

0:03:08.200 --> 0:03:11.200
<v Speaker 1>to your local machine, so you're just retrieving it, or

0:03:11.320 --> 0:03:14.560
<v Speaker 1>depending on the type of file you're accessing and the

0:03:14.560 --> 0:03:16.760
<v Speaker 1>type of service you're using, you might be able to

0:03:16.800 --> 0:03:19.720
<v Speaker 1>do stuff like make changes to that file through a

0:03:19.760 --> 0:03:22.720
<v Speaker 1>web based client. So if you were to create a

0:03:22.800 --> 0:03:26.720
<v Speaker 1>document in Google Docs, for example, that would follow that

0:03:26.840 --> 0:03:30.040
<v Speaker 1>kind of cloud computing model. It's one of the simplest

0:03:30.080 --> 0:03:34.840
<v Speaker 1>manifestations of cloud computing and is effectively cloud storage with

0:03:34.960 --> 0:03:38.040
<v Speaker 1>a little bit of editing thrown in. But cloud computing

0:03:38.080 --> 0:03:41.840
<v Speaker 1>can go far beyond just storing files. There are cloud

0:03:41.840 --> 0:03:45.920
<v Speaker 1>based services that allow developers to build out an app environment.

0:03:46.520 --> 0:03:49.440
<v Speaker 1>They might do this so that a distributed team, you know,

0:03:49.520 --> 0:03:52.400
<v Speaker 1>people who aren't working all in the same location can

0:03:52.520 --> 0:03:56.560
<v Speaker 1>simultaneously work on the same code and create test environments

0:03:56.600 --> 0:04:00.000
<v Speaker 1>to make sure that the app performs as expected before

0:04:00.120 --> 0:04:03.760
<v Speaker 1>they deploy the app to end users, you know,

0:04:03.800 --> 0:04:09.000
<v Speaker 1>to customers. Other cloud services serve as an actual deployment platform,

0:04:09.080 --> 0:04:12.080
<v Speaker 1>so not just to develop, but to deploy. It gives

0:04:12.120 --> 0:04:14.720
<v Speaker 1>developers the assets that they need to push out an

0:04:14.760 --> 0:04:20.120
<v Speaker 1>app and handle user interactions. So some apps might quote

0:04:20.160 --> 0:04:24.919
<v Speaker 1>unquote live natively on your device, right? You download a

0:04:25.040 --> 0:04:28.560
<v Speaker 1>file to whatever you're using, whether it's a computer or

0:04:28.680 --> 0:04:31.800
<v Speaker 1>smartphone or tablet or whatever it is, and then all

0:04:31.839 --> 0:04:35.400
<v Speaker 1>the processes and all the data could be contained right

0:04:35.400 --> 0:04:39.200
<v Speaker 1>there locally on your machine. Um that's like the old

0:04:39.279 --> 0:04:43.719
<v Speaker 1>form of computing. But increasingly we're seeing apps that rely

0:04:43.880 --> 0:04:47.800
<v Speaker 1>on the cloud for functionality. So games could have stuff

0:04:47.839 --> 0:04:51.280
<v Speaker 1>like leaderboards or ways that you can compete or cooperate

0:04:51.360 --> 0:04:55.400
<v Speaker 1>with other players in real time. A weather app needs

0:04:55.440 --> 0:04:58.240
<v Speaker 1>to fetch data from servers to tell you what the

0:04:58.279 --> 0:05:01.240
<v Speaker 1>weather will be like. Your phone doesn't magically know

0:05:01.279 --> 0:05:03.560
<v Speaker 1>what the weather is going to be. Even a lot

0:05:03.560 --> 0:05:07.520
<v Speaker 1>of home automation apps will communicate back with a web

0:05:07.560 --> 0:05:11.320
<v Speaker 1>server somewhere rather than handle everything right there in your home.

0:05:11.360 --> 0:05:14.000
<v Speaker 1>In fact, that's a sticking point for a lot of

0:05:14.040 --> 0:05:18.440
<v Speaker 1>home automation folks, right? They don't necessarily want to have

0:05:18.640 --> 0:05:21.080
<v Speaker 1>the cloud part of the infrastructure. They would prefer to

0:05:21.080 --> 0:05:23.799
<v Speaker 1>have their home be kind of a self contained system.

0:05:24.400 --> 0:05:27.280
<v Speaker 1>You see this a lot with people who have security

0:05:27.320 --> 0:05:30.839
<v Speaker 1>systems where they would prefer to have something that was

0:05:30.920 --> 0:05:34.400
<v Speaker 1>completely contained within their own home, as opposed to having

0:05:34.440 --> 0:05:38.600
<v Speaker 1>their security system become a surveillance tool for a company

0:05:38.640 --> 0:05:41.560
<v Speaker 1>that may or may not be working in conjunction with, say,

0:05:41.839 --> 0:05:45.120
<v Speaker 1>law enforcement. That's become a big issue, but that's a

0:05:45.120 --> 0:05:48.559
<v Speaker 1>matter for a different podcast. Now, building out these kinds

0:05:48.560 --> 0:05:53.719
<v Speaker 1>of systems is expensive because you need the physical facilities right,

0:05:53.760 --> 0:05:55.919
<v Speaker 1>You need the actual buildings, and they have to be

0:05:56.000 --> 0:05:59.240
<v Speaker 1>large enough to hold all the servers that are designed

0:05:59.560 --> 0:06:03.279
<v Speaker 1>to make your app work right, and then the facilities

0:06:03.320 --> 0:06:06.039
<v Speaker 1>also have to be designed themselves to allow those servers

0:06:06.080 --> 0:06:09.479
<v Speaker 1>to operate. That means building out stuff like cooling systems

0:06:09.760 --> 0:06:12.320
<v Speaker 1>so that your machines don't overheat. So you know, it's

0:06:12.320 --> 0:06:15.480
<v Speaker 1>not just enough to have a place where you store

0:06:15.480 --> 0:06:19.360
<v Speaker 1>all the computers. You've gotta have it be appropriate for that.

0:06:19.400 --> 0:06:22.760
<v Speaker 1>You know, it needs to be dry, free of dust, cooled,

0:06:22.880 --> 0:06:26.280
<v Speaker 1>that kind of stuff. You also need to maintain those systems.

0:06:26.400 --> 0:06:29.359
<v Speaker 1>You have to repair or replace components as they fail,

0:06:29.440 --> 0:06:32.279
<v Speaker 1>because we all know technology does fail at some point

0:06:32.360 --> 0:06:35.760
<v Speaker 1>for a variety of reasons. That's also why you need

0:06:35.800 --> 0:06:39.880
<v Speaker 1>to make more than the bare minimum to run your operations, right,

0:06:39.920 --> 0:06:42.760
<v Speaker 1>you need to do more than just the basics. You

0:06:42.800 --> 0:06:47.240
<v Speaker 1>need backups for redundancy so that if and when a

0:06:47.400 --> 0:06:51.039
<v Speaker 1>specific machine goes down, others can take its place seamlessly

0:06:51.160 --> 0:06:54.720
<v Speaker 1>without affecting the end user. So in our Google Docs experience,

0:06:55.400 --> 0:06:58.240
<v Speaker 1>like if you were to create a document that's not

0:06:58.320 --> 0:07:01.080
<v Speaker 1>just sitting on one server that Google owns, it's actually

0:07:01.120 --> 0:07:05.840
<v Speaker 1>on multiple servers um multiple machines, and if one of

0:07:05.880 --> 0:07:08.320
<v Speaker 1>those machines goes down, you can still get access to

0:07:08.360 --> 0:07:11.360
<v Speaker 1>your file. Also when you make changes to it. Essentially

0:07:11.400 --> 0:07:14.560
<v Speaker 1>you're making changes on one file on one machine that

0:07:14.560 --> 0:07:16.640
<v Speaker 1>that machine then sends out a message to all the

0:07:16.680 --> 0:07:19.440
<v Speaker 1>other machines that have that same file so that they

0:07:19.440 --> 0:07:22.640
<v Speaker 1>can all be updated with the newest version. UM. I've

0:07:22.680 --> 0:07:25.600
<v Speaker 1>done episodes about you know, that kind of background stuff

0:07:25.720 --> 0:07:29.040
<v Speaker 1>in the Google Docs world. So, moving on. Essentially,

0:07:29.080 --> 0:07:35.040
<v Speaker 1>cloud computing utilizes network connections to allow you, other people, organizations,

0:07:35.120 --> 0:07:38.160
<v Speaker 1>companies to rely on machines that are hosted somewhere else,

0:07:38.400 --> 0:07:41.440
<v Speaker 1>and that frees you up considerably. You don't have to

0:07:41.520 --> 0:07:43.960
<v Speaker 1>invest in buying hard drives so that you can save

0:07:44.000 --> 0:07:46.840
<v Speaker 1>all your files. You can just subscribe to a service

0:07:46.920 --> 0:07:49.800
<v Speaker 1>to get some cloud storage. Companies don't have to kit

0:07:49.880 --> 0:07:53.600
<v Speaker 1>themselves out with massive computer systems, uh complete with an

0:07:53.600 --> 0:07:56.400
<v Speaker 1>IT department to support those computer systems. They can

0:07:56.440 --> 0:07:58.760
<v Speaker 1>just you know, spend some money to use a cloud

0:07:58.800 --> 0:08:02.240
<v Speaker 1>computing service owned by someone else and then host all

0:08:02.280 --> 0:08:05.160
<v Speaker 1>their operations through that. Though, I should add a lot

0:08:05.160 --> 0:08:07.760
<v Speaker 1>of companies take a more hybrid approach, so they have

0:08:07.880 --> 0:08:12.880
<v Speaker 1>some systems, typically really, like, mission critical systems, or sometimes

0:08:12.880 --> 0:08:15.200
<v Speaker 1>ones that require a great deal of privacy and security.

0:08:15.520 --> 0:08:19.400
<v Speaker 1>They might run those on premises or on prem and

0:08:19.440 --> 0:08:22.800
<v Speaker 1>then rely on other stuff, like the more administrative stuff

0:08:23.440 --> 0:08:28.560
<v Speaker 1>for cloud systems. Cloud computing started becoming a term, a

0:08:28.600 --> 0:08:32.360
<v Speaker 1>buzz term really around I mean, the idea was older

0:08:32.360 --> 0:08:34.960
<v Speaker 1>than that, but it was starting to really get circulation

0:08:35.000 --> 0:08:37.040
<v Speaker 1>around twenty ten or so. But the seeds, like I said,

0:08:37.080 --> 0:08:40.679
<v Speaker 1>were planted earlier. So let's, let's take a look at

0:08:40.679 --> 0:08:44.199
<v Speaker 1>Amazon specifically, because it played a big part in this.

0:08:45.160 --> 0:08:48.360
<v Speaker 1>So way back in two thousand, Amazon was scrambling to

0:08:48.440 --> 0:08:51.720
<v Speaker 1>keep up with some scaling issues, and this is something

0:08:51.760 --> 0:08:54.719
<v Speaker 1>that you hear about with startups pretty frequently. A new

0:08:54.760 --> 0:08:58.920
<v Speaker 1>startup is usually a fairly small company, and it's nimble,

0:08:59.000 --> 0:09:01.440
<v Speaker 1>and it's agile, and it might offer a small range

0:09:01.440 --> 0:09:04.400
<v Speaker 1>of services or products, or it might only serve a

0:09:04.440 --> 0:09:08.600
<v Speaker 1>relatively small region or both. I think companies like Lyft

0:09:08.640 --> 0:09:11.600
<v Speaker 1>and Uber those launched in just a couple of cities

0:09:11.720 --> 0:09:15.680
<v Speaker 1>early on, right, so they were able to grow in

0:09:15.720 --> 0:09:19.360
<v Speaker 1>a controlled manner. Well, if customer demand is high and

0:09:19.520 --> 0:09:22.920
<v Speaker 1>investors are pouring money into the startup, it makes some

0:09:23.040 --> 0:09:26.400
<v Speaker 1>sense to try and grow the company and expand operations.

0:09:26.440 --> 0:09:29.640
<v Speaker 1>But growing adds new challenges, and making sure that the

0:09:29.679 --> 0:09:32.600
<v Speaker 1>things you offer are able to scale up and meet

0:09:32.640 --> 0:09:37.360
<v Speaker 1>demand is a non trivial matter. That's the situation Amazon

0:09:37.520 --> 0:09:40.920
<v Speaker 1>was in around two thousand. One of the things the

0:09:40.920 --> 0:09:44.520
<v Speaker 1>company was exploring was building out merchant sites for other

0:09:44.600 --> 0:09:48.680
<v Speaker 1>companies but still using the Amazon platform. So, for example,

0:09:48.880 --> 0:09:52.640
<v Speaker 1>Amazon might partner with a retail company like Target to

0:09:52.720 --> 0:09:57.360
<v Speaker 1>provide an online store, but use Amazon's infrastructure underlying that store,

0:09:57.920 --> 0:10:00.000
<v Speaker 1>and this would bring in a new stream of revenue

0:10:00.240 --> 0:10:03.080
<v Speaker 1>for Amazon, and it would mean these retail companies could

0:10:03.080 --> 0:10:06.480
<v Speaker 1>rely on Amazon's platform rather than having to build out

0:10:06.559 --> 0:10:09.680
<v Speaker 1>an online store all of their own. So Amazon

0:10:09.760 --> 0:10:13.560
<v Speaker 1>called this merchant dot com. But it turned out building

0:10:13.600 --> 0:10:16.800
<v Speaker 1>merchant dot com was pretty challenging. It was one thing

0:10:16.840 --> 0:10:20.040
<v Speaker 1>to manage Amazon's rapid growth, but it was another to

0:10:20.080 --> 0:10:23.160
<v Speaker 1>build out products that could immediately scale to fit the

0:10:23.200 --> 0:10:27.160
<v Speaker 1>needs of established companies like Target. The initial result was

0:10:27.200 --> 0:10:30.840
<v Speaker 1>a product that had so many interconnected moving parts and

0:10:30.960 --> 0:10:33.960
<v Speaker 1>features that it was difficult for a user to navigate

0:10:34.040 --> 0:10:37.240
<v Speaker 1>and actually use. And I'm sure all of you out

0:10:37.320 --> 0:10:40.280
<v Speaker 1>there know that if a tool is hard to use,

0:10:41.120 --> 0:10:43.680
<v Speaker 1>most people don't bother with it. Right. You might get

0:10:43.720 --> 0:10:46.199
<v Speaker 1>it and try and think this is too much hassle,

0:10:46.280 --> 0:10:48.439
<v Speaker 1>so you would rather go without or find some

0:10:48.480 --> 0:10:52.320
<v Speaker 1>other alternative. Well, in two thousand two, Amazon began building

0:10:52.320 --> 0:10:56.439
<v Speaker 1>out Amazon dot Com Web Service. Now this would not

0:10:56.679 --> 0:11:00.000
<v Speaker 1>quite be the same thing as Amazon Web Services, despite

0:11:00.040 --> 0:11:02.760
<v Speaker 1>the similar names. It was much more simple than that.

0:11:03.240 --> 0:11:08.079
<v Speaker 1>It used a SOAP and XML interface. And by SOAP,

0:11:08.120 --> 0:11:11.280
<v Speaker 1>I don't mean the stuff you use to get clean. Now,

0:11:11.440 --> 0:11:14.160
<v Speaker 1>if you're not a developer, those things probably sound a

0:11:14.160 --> 0:11:17.520
<v Speaker 1>little confusing, so let's clear it up. SOAP is a

0:11:17.559 --> 0:11:22.280
<v Speaker 1>messaging protocol which originally stood for Simple Object Access Protocol

0:11:22.760 --> 0:11:29.400
<v Speaker 1>and XML means Extensible Markup Language. It is a language

0:11:30.480 --> 0:11:32.880
<v Speaker 1>so weird to say this, so it's like a machine

0:11:32.920 --> 0:11:36.439
<v Speaker 1>readable and human readable language that's used to create sets

0:11:36.480 --> 0:11:40.800
<v Speaker 1>of rules for document encoding. So this this is a

0:11:41.800 --> 0:11:46.320
<v Speaker 1>language we use to define rules as opposed to, you know,

0:11:46.440 --> 0:11:51.280
<v Speaker 1>programming something together. These allowed developers to create processes that

0:11:51.320 --> 0:11:53.960
<v Speaker 1>can run on pretty much any machine that has HTTP

0:11:54.160 --> 0:11:57.760
<v Speaker 1>installed on it. UH. That way, you could create a

0:11:57.880 --> 0:12:00.719
<v Speaker 1>process that can run on Windows devices or Mac

0:12:00.800 --> 0:12:03.280
<v Speaker 1>OS or Linux, all that kind of stuff, without having

0:12:03.280 --> 0:12:06.880
<v Speaker 1>to program a specific version for each operating system. So

0:12:07.320 --> 0:12:10.520
<v Speaker 1>Amazon's version of this allowed for a pretty limited amount

0:12:10.559 --> 0:12:14.120
<v Speaker 1>of development around creating processes that could access the Amazon

0:12:14.200 --> 0:12:18.240
<v Speaker 1>product catalog. This would allow web developers to create an

0:12:18.280 --> 0:12:22.720
<v Speaker 1>interface on their own web page that would utilize Amazon's store,

0:12:23.360 --> 0:12:25.800
<v Speaker 1>with the idea that people could buy a product right

0:12:25.840 --> 0:12:29.360
<v Speaker 1>there from that web page instead of having to navigate

0:12:29.440 --> 0:12:33.199
<v Speaker 1>over to Amazon dot com itself, and the developers would

0:12:33.200 --> 0:12:36.760
<v Speaker 1>earn a small commission on every sale made through that

0:12:37.240 --> 0:12:39.960
<v Speaker 1>you know, web page based point of sale. It's just

0:12:40.040 --> 0:12:44.120
<v Speaker 1>a tiny dip of the toe in the cloud based infrastructure. Also,

0:12:44.160 --> 0:12:47.240
<v Speaker 1>Amazon noticed that developers were I mean, this happens all

0:12:47.280 --> 0:12:50.680
<v Speaker 1>the time. Developers were taking that tool and making stuff

0:12:50.679 --> 0:12:55.160
<v Speaker 1>that Amazon had not anticipated or intended, and nothing necessarily bad,

0:12:55.200 --> 0:12:58.559
<v Speaker 1>but like some were making games where they would show

0:12:59.240 --> 0:13:02.360
<v Speaker 1>use this this methodology to show a picture of an

0:13:02.360 --> 0:13:04.360
<v Speaker 1>Amazon product, and it was up to you to guess

0:13:04.400 --> 0:13:07.959
<v Speaker 1>what that product was. That kind of thing. So they

0:13:07.960 --> 0:13:11.360
<v Speaker 1>were gamifying certain elements of this and that kind of

0:13:11.360 --> 0:13:15.240
<v Speaker 1>got wheels turning over at Amazon. This happens all the time.

0:13:15.280 --> 0:13:17.440
<v Speaker 1>Whenever you create anything and you give it to developers,

0:13:17.480 --> 0:13:20.280
<v Speaker 1>they immediately figure out ways to misuse it, I mean

0:13:20.400 --> 0:13:25.479
<v Speaker 1>use it creatively anyway. Around the same time, Amazon executives

0:13:25.520 --> 0:13:28.480
<v Speaker 1>began to realize that their various development teams were running

0:13:28.480 --> 0:13:32.120
<v Speaker 1>into the same problems over and over. Namely, each team

0:13:32.120 --> 0:13:35.600
<v Speaker 1>working on a different internal project would need to go

0:13:35.679 --> 0:13:38.120
<v Speaker 1>through the same basic steps before they could do any

0:13:38.160 --> 0:13:41.040
<v Speaker 1>serious work on the project itself, which involved things like

0:13:41.640 --> 0:13:46.240
<v Speaker 1>establishing systems to handle compute operations, UH, storage solutions to

0:13:46.280 --> 0:13:49.920
<v Speaker 1>hold all the data, and also database solutions to organize everything,

0:13:50.280 --> 0:13:53.600
<v Speaker 1>and a clear picture began to emerge. Amazon's teams were

0:13:53.640 --> 0:13:57.360
<v Speaker 1>having to reinvent the wheel with every new project, and

0:13:57.400 --> 0:13:59.880
<v Speaker 1>the original projection for seeing a project go from start

0:13:59.880 --> 0:14:01.959
<v Speaker 1>to finish was supposed to be three months. That was

0:14:02.000 --> 0:14:04.720
<v Speaker 1>the goal for Amazon, but it turned out that just

0:14:04.800 --> 0:14:07.760
<v Speaker 1>building out the infrastructure to allow a project team to

0:14:07.800 --> 0:14:12.320
<v Speaker 1>actually start developing their project would take three months, so

0:14:12.440 --> 0:14:16.119
<v Speaker 1>everything was running behind schedule. The lesson that the executives

0:14:16.120 --> 0:14:18.360
<v Speaker 1>took from this is that it would be a worthwhile

0:14:18.440 --> 0:14:23.280
<v Speaker 1>endeavor to establish a centralized internal system that could support

0:14:23.320 --> 0:14:27.120
<v Speaker 1>the compute, database, and storage needs of all these different

0:14:27.120 --> 0:14:29.480
<v Speaker 1>project teams. So it would need to be a system

0:14:29.560 --> 0:14:33.120
<v Speaker 1>that could compartmentalize and contain each project so that every

0:14:33.120 --> 0:14:35.880
<v Speaker 1>one of them would have the resources that the teams needed.

0:14:36.320 --> 0:14:39.480
<v Speaker 1>It meant building out virtual machines and figuring out ways

0:14:39.520 --> 0:14:42.760
<v Speaker 1>to create redundancy, and it was a matter of necessity

0:14:42.840 --> 0:14:45.400
<v Speaker 1>for Amazon in order for those internal teams to get

0:14:45.440 --> 0:14:48.880
<v Speaker 1>out of that you know, three month projection goal. But

0:14:48.960 --> 0:14:51.600
<v Speaker 1>it also meant Amazon was building up something that could

0:14:51.640 --> 0:14:54.280
<v Speaker 1>potentially end up being a service that the company could

0:14:54.280 --> 0:14:57.160
<v Speaker 1>offer to others. It would take a little bit longer

0:14:57.200 --> 0:14:59.880
<v Speaker 1>for that to come about. Over time, folks at Amazon

0:15:00.000 --> 0:15:02.680
<v Speaker 1>began to think of this effort as creating something almost

0:15:02.760 --> 0:15:06.440
<v Speaker 1>like an operating system, but for the Internet rather than

0:15:06.520 --> 0:15:10.640
<v Speaker 1>for a computer or a mobile device. These ideas began

0:15:10.680 --> 0:15:13.160
<v Speaker 1>to first take shape around two thousand three, when Amazon

0:15:13.240 --> 0:15:16.480
<v Speaker 1>executives were attending a company retreat. It would be another

0:15:16.560 --> 0:15:19.760
<v Speaker 1>few years before the earliest version of Amazon's web services

0:15:19.840 --> 0:15:22.560
<v Speaker 1>would launch. All Right, we're gonna take a quick break.

0:15:22.880 --> 0:15:34.400
<v Speaker 1>When we come back, we'll talk more about Amazon Web Services. Okay,

0:15:34.400 --> 0:15:36.800
<v Speaker 1>we left off in two thousand three. Let's put this

0:15:36.880 --> 0:15:39.800
<v Speaker 1>in perspective. If you're anything like me, you might say,

0:15:39.840 --> 0:15:41.840
<v Speaker 1>all right, well that's less than twenty years ago. I

0:15:41.880 --> 0:15:44.480
<v Speaker 1>get it. But let's think about other things that

0:15:44.520 --> 0:15:46.920
<v Speaker 1>were going on. Right, So, two thousand three was a

0:15:47.000 --> 0:15:51.360
<v Speaker 1>year before Facebook would launch at Harvard, let alone expand

0:15:51.400 --> 0:15:53.560
<v Speaker 1>beyond it. In fact, it was about three years before

0:15:53.560 --> 0:15:55.360
<v Speaker 1>Facebook would get out of the phase where it was

0:15:55.480 --> 0:15:59.040
<v Speaker 1>only available to college students. Two thousand three was two

0:15:59.160 --> 0:16:02.360
<v Speaker 1>years before YouTube launched. It was four years before

0:16:02.360 --> 0:16:05.760
<v Speaker 1>Apple would introduce the iPhone, and it was just two

0:16:05.840 --> 0:16:09.680
<v Speaker 1>years after we had had the dot com crash that

0:16:09.760 --> 0:16:13.680
<v Speaker 1>had wiped out numerous web based companies. So this was

0:16:13.800 --> 0:16:18.880
<v Speaker 1>very early on in thinking about cloud computing and operations

0:16:18.880 --> 0:16:21.600
<v Speaker 1>at this kind of scale. The company began to invest

0:16:21.640 --> 0:16:24.640
<v Speaker 1>in building out data centers, you know, these huge facilities

0:16:24.640 --> 0:16:29.040
<v Speaker 1>that hold thousands of servers, and engineers developed and tweaked

0:16:29.120 --> 0:16:34.800
<v Speaker 1>database management services to coordinate and partition these machines effectively. Meanwhile,

0:16:35.200 --> 0:16:38.040
<v Speaker 1>the product development teams would work on new products to

0:16:38.120 --> 0:16:41.840
<v Speaker 1>expand what Amazon could do for customers. So in two

0:16:41.880 --> 0:16:45.120
<v Speaker 1>thousand three, Andy Jassy, who would go on to become

0:16:45.200 --> 0:16:48.760
<v Speaker 1>the CEO of Amazon as of July five of this year,

0:16:49.400 --> 0:16:52.960
<v Speaker 1>he became the project lead for Amazon Web Services. He

0:16:53.040 --> 0:16:56.320
<v Speaker 1>had suggested to Jeff Bezos that Amazon could take the

0:16:56.440 --> 0:16:59.680
<v Speaker 1>systems the company had been developing for internal use and

0:16:59.720 --> 0:17:03.040
<v Speaker 1>then open those up as a product for other companies.

0:17:03.080 --> 0:17:06.640
<v Speaker 1>And he was essentially pitching cloud computing to Jeff Bezos,

0:17:07.040 --> 0:17:09.600
<v Speaker 1>and he got the go ahead. In two thousand four,

0:17:09.680 --> 0:17:12.639
<v Speaker 1>Jassy's team had a beta version of this product that

0:17:12.720 --> 0:17:16.280
<v Speaker 1>was ready for testing, and over the following two years

0:17:16.560 --> 0:17:19.440
<v Speaker 1>they would refine and tweak that product until in two

0:17:19.480 --> 0:17:23.920
<v Speaker 1>thousand six, AWS was ready to launch its first initial product.

0:17:24.440 --> 0:17:27.800
<v Speaker 1>Now this would not be Amazon Web Services as a

0:17:27.840 --> 0:17:33.160
<v Speaker 1>cohesive whole, but rather a single product called Simple Storage

0:17:33.200 --> 0:17:37.680
<v Speaker 1>Service, or S three, which debuted on March fourteen, two

0:17:37.680 --> 0:17:42.280
<v Speaker 1>thousand six. Now, Amazon described S three as a tool

0:17:42.320 --> 0:17:46.119
<v Speaker 1>that would let developers save and retrieve quote any amount

0:17:46.160 --> 0:17:49.480
<v Speaker 1>of data at any time from anywhere on the web

0:17:49.800 --> 0:17:53.639
<v Speaker 1>end quote. So this was a cloud storage product. It

0:17:53.840 --> 0:17:57.159
<v Speaker 1>is a cloud storage product it still exists. The experience

0:17:57.200 --> 0:18:00.359
<v Speaker 1>of merchant dot com had, however, taught Amazon developers

0:18:00.400 --> 0:18:04.800
<v Speaker 1>a pretty valuable lesson, which I would summarize as just

0:18:05.000 --> 0:18:08.600
<v Speaker 1>because you can doesn't mean you should. Now, granted, I

0:18:08.720 --> 0:18:12.439
<v Speaker 1>usually use that phrase to criticize vocalists who do irritating

0:18:12.520 --> 0:18:17.840
<v Speaker 1>vocal runs during their songs, um, Mariah Carey, but in

0:18:17.880 --> 0:18:22.440
<v Speaker 1>this case, I'm talking about the issue of feature creep. Now.

0:18:22.560 --> 0:18:26.119
<v Speaker 1>Feature creep is this tendency to throw in extra features

0:18:26.160 --> 0:18:30.040
<v Speaker 1>and options into a product just because you can. These

0:18:30.040 --> 0:18:34.359
<v Speaker 1>features don't necessarily contribute to the usefulness of that product.

0:18:34.680 --> 0:18:37.119
<v Speaker 1>In fact, more often than not, they can cause a

0:18:37.119 --> 0:18:39.840
<v Speaker 1>product to be janky and hard to navigate. The

0:18:39.880 --> 0:18:43.360
<v Speaker 1>Amazon developers didn't want S three to fall into that trap,

0:18:43.400 --> 0:18:46.240
<v Speaker 1>and so early on the team decided that the only

0:18:46.240 --> 0:18:48.680
<v Speaker 1>thing that needed to be done was to make sure

0:18:48.720 --> 0:18:51.520
<v Speaker 1>the storage service was as good as it could be

0:18:51.760 --> 0:18:56.159
<v Speaker 1>and just avoid including any extraneous options. Their motto was

0:18:56.280 --> 0:19:00.000
<v Speaker 1>quote the system should be made as simple as possible,

0:19:00.720 --> 0:19:04.200
<v Speaker 1>but no simpler end quote. That's also a good point.

0:19:04.560 --> 0:19:08.040
<v Speaker 1>A bare bones approach is sometimes the best one, but

0:19:08.240 --> 0:19:11.639
<v Speaker 1>you do still need the bones to be there. The

0:19:11.760 --> 0:19:15.480
<v Speaker 1>architecture of the product can be described as objects, buckets,

0:19:15.520 --> 0:19:20.240
<v Speaker 1>and keys. Objects are essentially data, and that data could

0:19:20.240 --> 0:19:23.040
<v Speaker 1>be just about anything. S three doesn't care what the

0:19:23.160 --> 0:19:25.600
<v Speaker 1>data is. It could be video files, it could be

0:19:26.160 --> 0:19:29.760
<v Speaker 1>a game, it could be a database, it could be music,

0:19:29.800 --> 0:19:33.560
<v Speaker 1>it could be whatever. The objects have metadata that describes

0:19:33.680 --> 0:19:38.280
<v Speaker 1>what the object is and when it was last modified. Next,

0:19:38.359 --> 0:19:40.639
<v Speaker 1>you've got your buckets, and this is a kind of

0:19:40.680 --> 0:19:45.760
<v Speaker 1>classification system. So imagine you've got these objects, that is, files,

0:19:46.040 --> 0:19:48.120
<v Speaker 1>and you've got a lot of different types of them

0:19:48.280 --> 0:19:51.199
<v Speaker 1>that belong to a lot of different things. So you

0:19:51.280 --> 0:19:54.320
<v Speaker 1>might have a bucket for a specific kind of file, like

0:19:54.480 --> 0:19:58.720
<v Speaker 1>music files, or more likely, you might organize buckets according

0:19:58.720 --> 0:20:02.040
<v Speaker 1>to specific projects. So one project might have all its

0:20:02.080 --> 0:20:05.560
<v Speaker 1>objects sorted into one or more buckets that belong to

0:20:05.600 --> 0:20:09.280
<v Speaker 1>that project alone. Now, keys are a kind of I

0:20:09.440 --> 0:20:13.240
<v Speaker 1>D for each object inside a bucket, and each object has

0:20:13.480 --> 0:20:17.240
<v Speaker 1>one key, so you can find any object inside S

0:20:17.280 --> 0:20:20.400
<v Speaker 1>three if you have two pieces of information, the bucket

0:20:20.480 --> 0:20:23.320
<v Speaker 1>it is in and the key for the object. So

0:20:23.480 --> 0:20:26.960
<v Speaker 1>keys are used mainly for retrieval and you know that

0:20:27.080 --> 0:20:31.080
<v Speaker 1>kind of thing. The Amazon developers created a storage system

0:20:31.119 --> 0:20:35.159
<v Speaker 1>that was priced at fifteen cents per gigabyte of storage

0:20:35.240 --> 0:20:39.440
<v Speaker 1>space per month. At least at launch, it is significantly

0:20:39.520 --> 0:20:42.040
<v Speaker 1>cheaper than that now, and this tells you that Amazon

0:20:42.119 --> 0:20:46.000
<v Speaker 1>has scaled the service dramatically, and considering we're well into

0:20:46.040 --> 0:20:48.480
<v Speaker 1>the era of big data, that's a good thing for developers.

0:20:48.840 --> 0:20:54.080
<v Speaker 1>So today, Amazon's S three standard storage has three different

0:20:54.119 --> 0:20:57.760
<v Speaker 1>tiers of cost, which depend on how much storage you're

0:20:57.760 --> 0:20:59.560
<v Speaker 1>actually using, like how much data do you have in

0:20:59.600 --> 0:21:03.560
<v Speaker 1>the system. So let's say that you have fifty terabytes

0:21:03.720 --> 0:21:07.320
<v Speaker 1>or less in S three standard, that would mean that

0:21:07.359 --> 0:21:10.919
<v Speaker 1>you are looking at two point three cents per gigabyte

0:21:10.920 --> 0:21:13.959
<v Speaker 1>per month. Uh, if you've got more than five hundred

0:21:14.040 --> 0:21:16.760
<v Speaker 1>terabytes stored, that's on the other end of the scale,

0:21:17.400 --> 0:21:21.240
<v Speaker 1>then you're paying two point one cents per gigabyte per month.

0:21:21.640 --> 0:21:23.600
<v Speaker 1>And yeah, that adds up for companies that need to

0:21:23.600 --> 0:21:26.560
<v Speaker 1>store a lot of data. Anyway, I bring it up

0:21:26.560 --> 0:21:29.520
<v Speaker 1>to help illustrate how much things have changed. Fifteen cents

0:21:29.560 --> 0:21:32.760
<v Speaker 1>per gig per month is way, way, way, way, way

0:21:32.800 --> 0:21:35.880
<v Speaker 1>more expensive than two point three cents per gig per month.

0:21:36.359 --> 0:21:39.040
<v Speaker 1>Oh and I should also mention that S three today

0:21:39.080 --> 0:21:43.000
<v Speaker 1>offers several other storage products that have other features and

0:21:43.119 --> 0:21:46.560
<v Speaker 1>costs associated with them. But this is not meant to

0:21:46.560 --> 0:21:48.600
<v Speaker 1>be an ad for S three, so we're just gonna

0:21:49.119 --> 0:21:52.639
<v Speaker 1>leave that for now. Anyway, S three, right out of the

0:21:52.640 --> 0:21:56.520
<v Speaker 1>gate was successful. In fact, just two months after launch,

0:21:56.800 --> 0:22:00.120
<v Speaker 1>Amazon saw that demand had exceeded their projections by a factor

0:22:00.280 --> 0:22:03.760
<v Speaker 1>of one hundred. Today, there are more than one hundred

0:22:03.880 --> 0:22:07.320
<v Speaker 1>trillion objects stored in buckets in S three, and the

0:22:07.359 --> 0:22:10.080
<v Speaker 1>fact that the product could scale up to accommodate that

0:22:10.200 --> 0:22:13.800
<v Speaker 1>number of objects attests to good design decisions that were

0:22:13.840 --> 0:22:17.960
<v Speaker 1>made early on. Yeah, the organization system is simple, but

0:22:18.040 --> 0:22:21.440
<v Speaker 1>that simplicity also meant that S three could grow on demand,

0:22:21.680 --> 0:22:25.679
<v Speaker 1>which it did. In August two thousand six, Amazon launched

0:22:25.680 --> 0:22:28.159
<v Speaker 1>a new cloud based service, and this one was called

0:22:28.280 --> 0:22:31.639
<v Speaker 1>and still is called Amazon Elastic Compute Cloud or e

0:22:31.840 --> 0:22:36.240
<v Speaker 1>C two, And as the name suggests, this product offers

0:22:36.320 --> 0:22:41.040
<v Speaker 1>up a different element of computing, the actual compute part.

0:22:41.600 --> 0:22:44.600
<v Speaker 1>That is, this is a system that would allow customers

0:22:44.640 --> 0:22:48.800
<v Speaker 1>the chance to tap into on demand computing power. Now,

0:22:48.800 --> 0:22:51.080
<v Speaker 1>developers who had a great idea but who lacked the

0:22:51.119 --> 0:22:54.160
<v Speaker 1>money or space or both to build out a computer

0:22:54.200 --> 0:22:58.239
<v Speaker 1>facility could subscribe to e C two and lean on

0:22:58.280 --> 0:23:02.120
<v Speaker 1>Amazon's systems to do the work for them. Like S three,

0:23:02.359 --> 0:23:05.040
<v Speaker 1>this idea had its roots back in two thousand three.

0:23:05.080 --> 0:23:08.760
<v Speaker 1>A couple of Amazon engineers, Chris Pinkham and Benjamin Black,

0:23:08.920 --> 0:23:12.520
<v Speaker 1>had authored a memo suggesting a product that could give

0:23:12.560 --> 0:23:15.960
<v Speaker 1>developers the chance to run software on Amazon computer systems

0:23:16.040 --> 0:23:21.240
<v Speaker 1>specifically designated for that task. Around this same time, Amazon

0:23:21.400 --> 0:23:25.880
<v Speaker 1>introduced Simple Queue Service, or Amazon s q S. This

0:23:26.040 --> 0:23:29.040
<v Speaker 1>is a type of message queue, and by message I

0:23:29.040 --> 0:23:32.440
<v Speaker 1>mean the kinds of communications that go from service to service.

0:23:32.920 --> 0:23:35.080
<v Speaker 1>So let's say you're running an app on your phone

0:23:35.400 --> 0:23:38.240
<v Speaker 1>and the app might in the background send a request

0:23:38.359 --> 0:23:41.840
<v Speaker 1>to a remote server to get access to some data,

0:23:41.960 --> 0:23:44.399
<v Speaker 1>and that would be a message. So s q S

0:23:44.480 --> 0:23:47.320
<v Speaker 1>is a platform that queues up messages so that the

0:23:47.400 --> 0:23:50.800
<v Speaker 1>back end of a system can respond appropriately to requests

0:23:51.359 --> 0:23:55.280
<v Speaker 1>that should give the end user a seamless experience. Now

0:23:55.320 --> 0:23:57.600
<v Speaker 1>there's a lot more to s q S than that,

0:23:58.080 --> 0:24:00.760
<v Speaker 1>but I think that simple explanation will serve

0:24:00.840 --> 0:24:04.879
<v Speaker 1>us well enough for this episode. So these products S three,

0:24:05.320 --> 0:24:08.639
<v Speaker 1>e C two and s q S kind of became

0:24:08.680 --> 0:24:12.240
<v Speaker 1>the backbone for what would grow into Amazon Web Services

0:24:12.320 --> 0:24:14.560
<v Speaker 1>as a whole. And there are a lot of other

0:24:14.720 --> 0:24:18.960
<v Speaker 1>focused products in that suite, but generally speaking, each one

0:24:19.040 --> 0:24:21.840
<v Speaker 1>is meant to be really good at doing something specific

0:24:22.000 --> 0:24:25.640
<v Speaker 1>without having that feature creep issue come into play. Amazon

0:24:25.760 --> 0:24:28.720
<v Speaker 1>got the jump on other big companies like Google and

0:24:28.760 --> 0:24:32.080
<v Speaker 1>Microsoft when it came to offering up cloud based computing products.

0:24:32.119 --> 0:24:34.760
<v Speaker 1>This gave Amazon the chance to establish a dominant position

0:24:34.840 --> 0:24:37.480
<v Speaker 1>in the market. I mean, when you're effectively the only

0:24:37.520 --> 0:24:40.920
<v Speaker 1>game in town, it's, you know, not hard to become dominant.

0:24:41.240 --> 0:24:45.160
<v Speaker 1>But today these other companies, Microsoft and Google and lots

0:24:45.160 --> 0:24:49.520
<v Speaker 1>more have their own cloud computing services available. Still, Amazon's

0:24:49.560 --> 0:24:52.000
<v Speaker 1>head start meant that the company still has a very

0:24:52.040 --> 0:24:56.800
<v Speaker 1>strong presence. According to Synergy Research Group, Amazon's share of

0:24:56.840 --> 0:25:00.760
<v Speaker 1>the cloud computing market is thirty two percent, or nearly

0:25:00.880 --> 0:25:04.760
<v Speaker 1>one third of the entire market. That's more than Microsoft

0:25:04.800 --> 0:25:08.600
<v Speaker 1>and Google's products combined. Together, those companies make up about

0:25:09.640 --> 0:25:12.919
<v Speaker 1>of the market. So about a third of all the

0:25:12.920 --> 0:25:15.920
<v Speaker 1>cloud computing business that's going on out there is going

0:25:15.920 --> 0:25:19.359
<v Speaker 1>through Amazon. And like I said, that includes tons of

0:25:19.400 --> 0:25:22.359
<v Speaker 1>different things from apps on your phone to video games

0:25:22.359 --> 0:25:26.719
<v Speaker 1>to Walt Disney World's virtual ticketing system. Now, I'm not

0:25:26.760 --> 0:25:29.600
<v Speaker 1>going to say that as long as AWS is

0:25:29.680 --> 0:25:33.800
<v Speaker 1>running smoothly, everything should go well, because all the products

0:25:33.840 --> 0:25:36.200
<v Speaker 1>that are built on top of AWS still

0:25:36.240 --> 0:25:38.360
<v Speaker 1>need to have a good design. I mean, it's possible

0:25:38.400 --> 0:25:42.640
<v Speaker 1>to make a really lousy product that's using AWS, and

0:25:42.840 --> 0:25:45.960
<v Speaker 1>it's not the fault of AWS if that product is lousy.

0:25:46.000 --> 0:25:48.480
<v Speaker 1>But as we saw last week, when things get hairy

0:25:48.880 --> 0:25:52.639
<v Speaker 1>on AWS, all the products that rely on those services

0:25:52.720 --> 0:25:56.280
<v Speaker 1>they can be affected. So last week, at approximately ten

0:25:56.359 --> 0:26:00.680
<v Speaker 1>thirty am on Tuesday, December seven, twenty one, AWS

0:26:01.000 --> 0:26:03.280
<v Speaker 1>had what we in the tech biz call a

0:26:03.280 --> 0:26:06.600
<v Speaker 1>whoopsie. It was a whoopsie that lasted between

0:26:06.680 --> 0:26:09.400
<v Speaker 1>five to seven hours, depending upon the services you were

0:26:09.440 --> 0:26:14.200
<v Speaker 1>relying upon. And because AWS has this massive presence

0:26:14.240 --> 0:26:17.080
<v Speaker 1>in the market, and because so many big companies rely

0:26:17.240 --> 0:26:20.119
<v Speaker 1>on it in order to make their stuff work, that

0:26:20.160 --> 0:26:23.360
<v Speaker 1>whoopsie had a pretty big footprint. According to Amazon,

0:26:23.480 --> 0:26:26.000
<v Speaker 1>the issue was that there was a glitch in some

0:26:26.080 --> 0:26:29.840
<v Speaker 1>crucial networking hardware. And this hardware is in charge of

0:26:29.880 --> 0:26:34.159
<v Speaker 1>hosting what Amazon called foundational services, including stuff like e

0:26:34.359 --> 0:26:38.960
<v Speaker 1>C two, but also it handled stuff like Amazon's Domain

0:26:39.040 --> 0:26:42.000
<v Speaker 1>name service. Now, this service is kind of like the

0:26:42.040 --> 0:26:46.239
<v Speaker 1>liaison that connects human readable URL addresses with

0:26:46.440 --> 0:26:52.040
<v Speaker 1>machine readable addresses, and without it, you can tell your

0:26:52.240 --> 0:26:55.080
<v Speaker 1>browser to go to that particular website all you like,

0:26:55.280 --> 0:26:58.560
<v Speaker 1>but it ain't happening because the liaison is on like

0:26:58.600 --> 0:27:01.440
<v Speaker 1>a five to seven hour coffee break, and the machines

0:27:01.480 --> 0:27:05.440
<v Speaker 1>have no idea what you're on about. Anyway, the AWS

0:27:05.520 --> 0:27:10.399
<v Speaker 1>internal system became overwhelmed, and that's something that usually doesn't happen.

0:27:10.640 --> 0:27:13.800
<v Speaker 1>Usually there's this cross network scaling system that kicks in

0:27:13.920 --> 0:27:17.439
<v Speaker 1>and meets increased demand. But this glitch essentially caused a

0:27:17.480 --> 0:27:21.399
<v Speaker 1>massive game of telephone within the AWS system, and it

0:27:21.520 --> 0:27:25.800
<v Speaker 1>overloaded all the circuits, to use a somewhat flimsy analogy.

0:27:25.840 --> 0:27:29.080
<v Speaker 1>So the glitch triggered what Amazon called quote a large

0:27:29.119 --> 0:27:32.960
<v Speaker 1>surge of connection activity that overwhelmed the networking devices between

0:27:32.960 --> 0:27:37.480
<v Speaker 1>the internal network and the main AWS network, resulting in

0:27:37.520 --> 0:27:41.280
<v Speaker 1>delays for communication between these networks end quote. So

0:27:41.320 --> 0:27:44.800
<v Speaker 1>it's almost like a classic denial of service attack, only

0:27:44.840 --> 0:27:48.920
<v Speaker 1>Amazon kind of did it to itself. I guess if we're

0:27:48.920 --> 0:27:51.880
<v Speaker 1>being fair, we would say the glitch caused it. Now.

0:27:51.960 --> 0:27:54.719
<v Speaker 1>A delay in communication normally just means you have an

0:27:54.720 --> 0:27:59.680
<v Speaker 1>irritating experience like lag, right, and you can manage that

0:27:59.800 --> 0:28:02.800
<v Speaker 1>usually, but, you know, it just makes whatever you're

0:28:02.840 --> 0:28:05.800
<v Speaker 1>doing more difficult. Except a lot of systems have timeout

0:28:05.800 --> 0:28:08.400
<v Speaker 1>features, in which if there is a long enough

0:28:08.440 --> 0:28:11.800
<v Speaker 1>delay between sending a message and getting a response, you

0:28:11.880 --> 0:28:16.200
<v Speaker 1>reach a failed state, and that happened a lot last Tuesday.

0:28:17.119 --> 0:28:20.040
<v Speaker 1>What made matters more difficult was that Amazon's own real

0:28:20.119 --> 0:28:25.119
<v Speaker 1>time monitoring services rely on those internal AWS systems. I

0:28:25.119 --> 0:28:27.520
<v Speaker 1>mean that's how AWS even got started, right, I mean

0:28:27.560 --> 0:28:30.640
<v Speaker 1>it was Amazon building out its own infrastructure and then

0:28:30.680 --> 0:28:34.200
<v Speaker 1>offering up those capabilities to other companies. So that meant

0:28:34.400 --> 0:28:38.040
<v Speaker 1>the mitigation teams who were working to fix stuff didn't

0:28:38.080 --> 0:28:41.520
<v Speaker 1>have all their real time monitoring tools available as they

0:28:41.520 --> 0:28:44.200
<v Speaker 1>were tackling the problem, so that slowed down the recovery

0:28:44.280 --> 0:28:48.400
<v Speaker 1>quite a bit. Amazon has since apologized to customers for

0:28:48.480 --> 0:28:51.320
<v Speaker 1>this outage, and the reps now say that the company

0:28:51.400 --> 0:28:55.360
<v Speaker 1>is working to distribute its Service Health Dashboard across multiple regions,

0:28:55.360 --> 0:28:58.680
<v Speaker 1>so that should something similar happen in the future, the

0:28:58.760 --> 0:29:03.160
<v Speaker 1>fix should theoretically happen much more quickly. Uh, so, yeah,

0:29:03.200 --> 0:29:05.800
<v Speaker 1>this is another way for us to realize that we

0:29:05.880 --> 0:29:09.920
<v Speaker 1>have put a tremendous amount of trust and dependence upon

0:29:10.040 --> 0:29:15.120
<v Speaker 1>cloud services. And it's another reminder that sometimes like you

0:29:15.160 --> 0:29:18.000
<v Speaker 1>could have designed everything yourself as good as it can

0:29:18.040 --> 0:29:21.000
<v Speaker 1>possibly be. You can have an incredible app, but if

0:29:21.040 --> 0:29:24.560
<v Speaker 1>the technology that powers that app goes down, it doesn't

0:29:24.560 --> 0:29:27.120
<v Speaker 1>matter how good your product is, right? It, you know,

0:29:27.240 --> 0:29:29.800
<v Speaker 1>and since you don't control that, since you are dependent

0:29:29.880 --> 0:29:35.280
<v Speaker 1>upon a cloud uh provider, then if the cloud provider

0:29:35.320 --> 0:29:38.440
<v Speaker 1>has problems, that's really a big blow to your own

0:29:38.800 --> 0:29:43.160
<v Speaker 1>business plans. It's one of the reasons why companies really

0:29:43.200 --> 0:29:45.680
<v Speaker 1>debate on what services they want to put on the

0:29:45.720 --> 0:29:49.680
<v Speaker 1>cloud versus on premises. Um, it's, it's a complicated thing too,

0:29:49.720 --> 0:29:54.200
<v Speaker 1>because scaling is such a tricky issue. Most companies you

0:29:54.240 --> 0:29:57.960
<v Speaker 1>know, that aren't like huge Fortune five hundred companies don't have

0:29:58.040 --> 0:30:00.840
<v Speaker 1>the assets necessary to be able to scale, at

0:30:00.920 --> 0:30:03.600
<v Speaker 1>least not to the massive scales that we're seeing in

0:30:03.680 --> 0:30:08.000
<v Speaker 1>the global Internet space. Anyway, I hope you found this

0:30:08.040 --> 0:30:10.840
<v Speaker 1>episode interesting as we talked about AWS and what

0:30:10.960 --> 0:30:13.400
<v Speaker 1>happened last week. If you have suggestions for topics I

0:30:13.400 --> 0:30:16.360
<v Speaker 1>should cover on future episodes of tech Stuff, please reach

0:30:16.360 --> 0:30:17.960
<v Speaker 1>out to me. The best way to do that is

0:30:18.000 --> 0:30:20.760
<v Speaker 1>on Twitter. The handle for the show is Tech Stuff

0:30:21.160 --> 0:30:25.680
<v Speaker 1>H s W and I'll talk to you again really soon.

0:30:30.760 --> 0:30:33.800
<v Speaker 1>Tech Stuff is an I Heart Radio production. For more

0:30:33.880 --> 0:30:37.280
<v Speaker 1>podcasts from I Heart Radio, visit the I Heart Radio app,

0:30:37.400 --> 0:30:40.560
<v Speaker 1>Apple podcasts, or wherever you listen to your favorite shows.