WEBVTT - Why Amazon Built a Spatula-Wielding Robot

0:00:15.356 --> 0:00:15.796
<v Speaker 1>Pushkin.

0:00:20.396 --> 0:00:22.596
<v Speaker 2>One of the things I look at in robotics as

0:00:22.636 --> 0:00:27.076
<v Speaker 2>a big field. There are so many amazing demonstrations of mobility,

0:00:27.636 --> 0:00:32.396
<v Speaker 2>robots doing backflips, robots running down hills, and that's really

0:00:32.396 --> 0:00:34.916
<v Speaker 2>impressive to me because I can't do a backflip, or

0:00:35.036 --> 0:00:37.756
<v Speaker 2>I might trip if I run down the hill. But

0:00:37.876 --> 0:00:41.636
<v Speaker 2>where the really valuable parts of robotics are going to

0:00:41.676 --> 0:00:46.036
<v Speaker 2>be are in manipulation. So my kid can take a

0:00:46.036 --> 0:00:48.436
<v Speaker 2>blueberry out of her cereal bowl because she doesn't want

0:00:48.476 --> 0:00:51.396
<v Speaker 2>to eat it, and that is an incredibly hard task

0:00:51.436 --> 0:00:54.196
<v Speaker 2>for a robot. And you don't see any of those demos.

0:00:54.276 --> 0:00:58.156
<v Speaker 2>And I think we're like kind of inherently programmed as

0:00:58.236 --> 0:01:01.476
<v Speaker 2>people to like be biased towards the backflip being more

0:01:01.516 --> 0:01:04.796
<v Speaker 2>impressive and in reality like the business value and the

0:01:04.836 --> 0:01:07.516
<v Speaker 2>harder thing for the robot is to like take the

0:01:07.516 --> 0:01:13.876
<v Speaker 2>blueberry out of the cereal bowl.

0:01:15.076 --> 0:01:17.436
<v Speaker 1>I'm Jacob Goldstein and this is What's Your Problem, the

0:01:17.476 --> 0:01:19.516
<v Speaker 1>show where I talk to people who are trying to

0:01:19.596 --> 0:01:24.676
<v Speaker 1>make technological progress. My guest today is Aaron Parness. Aaron

0:01:24.756 --> 0:01:27.876
<v Speaker 1>spent the earlier part of his career building space robots

0:01:27.876 --> 0:01:32.236
<v Speaker 1>at NASA's Jet Propulsion Laboratory JPL. Six years ago, he

0:01:32.276 --> 0:01:35.676
<v Speaker 1>went to work at Amazon. Now, Aaron is a director

0:01:35.716 --> 0:01:39.636
<v Speaker 1>of Applied Science at Amazon Robotics. I wanted to talk

0:01:39.676 --> 0:01:43.356
<v Speaker 1>to Aaron about a robot arm called Vulcan. He and

0:01:43.436 --> 0:01:45.996
<v Speaker 1>his team developed Vulcan to do a job that is

0:01:46.076 --> 0:01:50.276
<v Speaker 1>surprisingly hard for robots to do, taking stuff that gets

0:01:50.316 --> 0:01:55.236
<v Speaker 1>delivered to Amazon warehouses and putting it onto shelves. In

0:01:55.316 --> 0:01:58.236
<v Speaker 1>order to solve this problem, Aaron and his team had

0:01:58.236 --> 0:02:00.676
<v Speaker 1>to build a robot that had a sense of touch,

0:02:01.156 --> 0:02:05.116
<v Speaker 1>that could deal with complicated, unpredictable situations, and that could

0:02:05.196 --> 0:02:08.036
<v Speaker 1>look at a shelf and plan out a course of action.

0:02:09.156 --> 0:02:11.796
<v Speaker 1>As you'll hear in the interview, all of those traits

0:02:11.916 --> 0:02:14.836
<v Speaker 1>may someday be helpful, not just in stocking shelves in

0:02:14.876 --> 0:02:18.036
<v Speaker 1>a warehouse, but in doing lots of boring-sounding but

0:02:18.196 --> 0:02:23.316
<v Speaker 1>complicated real-world tasks like, for example, taking a blueberry

0:02:23.396 --> 0:02:26.196
<v Speaker 1>out of a bowl of cereal. To start, I asked

0:02:26.196 --> 0:02:28.716
<v Speaker 1>Aaron to tell me the problem that Vulcan was designed

0:02:28.756 --> 0:02:30.716
<v Speaker 1>to solve at Amazon's warehouses.

0:02:31.516 --> 0:02:35.356
<v Speaker 2>So new inventory comes into the building. You know, trucks

0:02:35.356 --> 0:02:38.156
<v Speaker 2>pull up and they unload new stuff. We need to

0:02:38.196 --> 0:02:40.796
<v Speaker 2>store that stuff while it's waiting for someone to click

0:02:40.836 --> 0:02:45.236
<v Speaker 2>the buy button. We store it in these large fabric bookcases.

0:02:46.116 --> 0:02:49.276
<v Speaker 2>It's about eight feet tall. It has about forty different

0:02:49.316 --> 0:02:53.076
<v Speaker 2>shelves on it. It's four sided, so you can store

0:02:53.156 --> 0:02:56.996
<v Speaker 2>stuff from any of the different faces of the case.

0:02:57.756 --> 0:03:01.156
<v Speaker 2>What's really interesting is this stuff is randomly stowed, so

0:03:01.196 --> 0:03:04.036
<v Speaker 2>it's not like all the iPhones are in one shelf.

0:03:04.476 --> 0:03:08.076
<v Speaker 2>It'll be all different stuff, all mixed together.

0:03:08.596 --> 0:03:11.356
<v Speaker 1>When you say random, do you mean random or do

0:03:11.396 --> 0:03:13.876
<v Speaker 1>you mean it would look random to the untrained eye.

0:03:14.116 --> 0:03:18.476
<v Speaker 2>I mean literally random. Really wherever there is space you

0:03:18.556 --> 0:03:20.436
<v Speaker 2>can put the item.

0:03:20.076 --> 0:03:22.956
<v Speaker 1>Because that's what's optimal. It turns out the optimal way to

0:03:22.956 --> 0:03:24.516
<v Speaker 1>store stuff is random.

0:03:24.236 --> 0:03:28.956
<v Speaker 2>That's right. The why stems actually from Jeff Bezos's

0:03:28.996 --> 0:03:32.516
<v Speaker 2>like original vision, I think, and it's incredible. So

0:03:33.556 --> 0:03:36.596
<v Speaker 2>you want to have the most selection, and you want

0:03:36.636 --> 0:03:39.156
<v Speaker 2>to have speed of delivery, and you want to have

0:03:39.196 --> 0:03:41.556
<v Speaker 2>low cost, and that's what the customer wants, right. The

0:03:42.276 --> 0:03:46.116
<v Speaker 2>customer is using Amazon.com because we have selection,

0:03:46.236 --> 0:03:49.356
<v Speaker 2>we have speed, and we have low cost. In order

0:03:49.396 --> 0:03:53.076
<v Speaker 2>to achieve that, you have to have these massive warehouses

0:03:53.676 --> 0:03:56.996
<v Speaker 2>located really close to your customers, and you have a

0:03:56.996 --> 0:03:59.316
<v Speaker 2>lot of customers in Tokyo, in New York City and

0:03:59.356 --> 0:04:03.356
<v Speaker 2>San Francisco where real estate's really expensive, so you have

0:04:03.396 --> 0:04:05.076
<v Speaker 2>to figure out a way to put all of this

0:04:05.276 --> 0:04:10.116
<v Speaker 2>different stuff in like the densest packing area you can

0:04:10.596 --> 0:04:13.516
<v Speaker 2>and have access to it immediately so that you can

0:04:13.676 --> 0:04:17.636
<v Speaker 2>you can deliver in hours instead of days. And what

0:04:17.676 --> 0:04:22.556
<v Speaker 2>that means is that random is better than structured. So

0:04:22.676 --> 0:04:25.476
<v Speaker 2>anywhere there's a space, you can add that item into

0:04:25.516 --> 0:04:28.196
<v Speaker 2>the inventory, and that means it comes up for sale

0:04:28.236 --> 0:04:31.916
<v Speaker 2>immediately on the website, and then when someone places an order,

0:04:32.596 --> 0:04:34.956
<v Speaker 2>you don't have to wait for that iPhone bookcase to

0:04:34.996 --> 0:04:37.436
<v Speaker 2>make its way all the way across the warehouse. You

0:04:37.516 --> 0:04:40.076
<v Speaker 2>probably have a thousand iPhones in the warehouse, and whichever

0:04:40.116 --> 0:04:43.676
<v Speaker 2>one is closest can go to whichever pick station is eligible,

0:04:44.316 --> 0:04:49.716
<v Speaker 2>and it ends up being actually substantially faster.

0:04:49.676 --> 0:04:52.396
<v Speaker 1>So that last sentence seems to be the key. The idea is like, yes,

0:04:52.436 --> 0:04:55.076
<v Speaker 1>given you have whatever a thousand iPhones in the warehouse

0:04:56.076 --> 0:04:57.876
<v Speaker 1>in the universe where a human had to know where

0:04:57.876 --> 0:04:59.676
<v Speaker 1>they all were, you'd put them all on one shelf.

0:05:00.076 --> 0:05:02.516
<v Speaker 1>But you're saying at any given time that means that

0:05:02.516 --> 0:05:05.836
<v Speaker 1>shelf is probably going to be pretty far away, whereas

0:05:05.916 --> 0:05:09.756
<v Speaker 1>if you randomly distribute them throughout the shelves and warehouse

0:05:10.236 --> 0:05:12.676
<v Speaker 1>at any given time, one of those thousand iPhones is

0:05:12.676 --> 0:05:14.556
<v Speaker 1>probably going to be pretty close to where it needs

0:05:14.596 --> 0:05:17.716
<v Speaker 1>to be. And because you have a whatever, a computerized

0:05:17.716 --> 0:05:19.836
<v Speaker 1>system that can keep track of everything all the time,

0:05:19.956 --> 0:05:22.276
<v Speaker 1>it makes sense to randomly distribute all the things.

0:05:22.396 --> 0:05:25.236
<v Speaker 2>Yeah, that's exactly right. And it works on the flip

0:05:25.276 --> 0:05:27.396
<v Speaker 2>side as well. So when you have a new item

0:05:27.436 --> 0:05:30.676
<v Speaker 2>that's come in, rather than waiting for the shelf that

0:05:30.716 --> 0:05:33.516
<v Speaker 2>has the right size thing to put the new dog

0:05:33.596 --> 0:05:36.116
<v Speaker 2>toy in, you just put the dog toy anywhere you

0:05:36.156 --> 0:05:37.076
<v Speaker 2>can find space for it.

0:05:37.796 --> 0:05:41.636
<v Speaker 1>Huh. It's like my house. We have a lot of

0:05:41.676 --> 0:05:45.316
<v Speaker 1>dog toys in my house also. Yeah, that's really interesting.

0:05:45.396 --> 0:05:47.996
<v Speaker 1>It's great for the customer, and that's optimal.

0:05:48.036 --> 0:05:54.676
<v Speaker 2>It's optimal, and it creates an incredibly difficult environment for robotics. Huh,

0:05:54.756 --> 0:05:57.076
<v Speaker 2>because now you have to deal with all this clutter.

0:05:57.556 --> 0:06:00.316
<v Speaker 2>We can have more than a million unique items in

0:06:00.436 --> 0:06:03.156
<v Speaker 2>one warehouse. Yeah, so it's not like you have a

0:06:03.196 --> 0:06:06.756
<v Speaker 2>model of each of those items. And we sell more

0:06:06.956 --> 0:06:10.596
<v Speaker 2>third party items than you know, Amazon owns themselves. Right.

0:06:10.636 --> 0:06:14.036
<v Speaker 2>We are a platform for third party fulfillment, and so

0:06:14.836 --> 0:06:17.116
<v Speaker 2>you don't have all the data about all those items,

0:06:17.156 --> 0:06:19.236
<v Speaker 2>and so you have to handle all this uncertainty, all

0:06:19.276 --> 0:06:22.516
<v Speaker 2>this clutter, and everything's tightly packed.

0:06:22.596 --> 0:06:26.196
<v Speaker 1>And so still in most places as a result, when

0:06:26.316 --> 0:06:28.836
<v Speaker 1>stuff comes into the warehouse every day off a truck

0:06:28.876 --> 0:06:32.636
<v Speaker 1>people take the things out of the truck and

0:06:32.676 --> 0:06:35.156
<v Speaker 1>stick them randomly on shelves wherever they can find space.

0:06:35.236 --> 0:06:36.116
<v Speaker 1>Is that the system?

0:06:36.196 --> 0:06:38.876
<v Speaker 2>That is exactly the system, and it's in you know,

0:06:39.036 --> 0:06:41.716
<v Speaker 2>hundreds of buildings around the world.

0:06:42.556 --> 0:06:44.876
<v Speaker 1>And just to be clear, I mean it's pretty clear,

0:06:44.916 --> 0:06:48.236
<v Speaker 1>but just to really put a point on it, why

0:06:48.396 --> 0:06:50.756
<v Speaker 1>is this a hard environment for robots?

0:06:51.396 --> 0:06:56.916
<v Speaker 2>Traditional industrial robots do not handle contact well, so like

0:06:57.036 --> 0:07:03.836
<v Speaker 2>touching their environments, and they don't handle clutter or you know, uncertainty,

0:07:03.956 --> 0:07:07.356
<v Speaker 2>and so it's hard because to put that last book

0:07:07.356 --> 0:07:11.876
<v Speaker 2>onto the bookshelf, squeeze that teddy bear into the just

0:07:12.036 --> 0:07:15.036
<v Speaker 2>small enough space that it'll fit, you have to push

0:07:15.076 --> 0:07:18.396
<v Speaker 2>the other stuff around that's already on that bookshelf. And

0:07:18.916 --> 0:07:21.996
<v Speaker 2>a traditional robot doesn't have sensors, it doesn't even know

0:07:22.036 --> 0:07:24.036
<v Speaker 2>how to do that. So if you think of like

0:07:24.076 --> 0:07:28.156
<v Speaker 2>a car manufacturing line, your like nineteen nineties, two thousands,

0:07:28.196 --> 0:07:31.516
<v Speaker 2>you know, welding robot or loading sheet metal into a press.

0:07:32.196 --> 0:07:34.956
<v Speaker 2>It's doing all of that only knowing its position in space.

0:07:35.356 --> 0:07:38.396
<v Speaker 2>So it has no force sensing. If it runs into something,

0:07:38.556 --> 0:07:41.996
<v Speaker 2>it either is like an emergency stop because it's like broken,

0:07:42.476 --> 0:07:44.756
<v Speaker 2>or it just smashes that thing and keeps going and

0:07:44.836 --> 0:07:47.356
<v Speaker 2>it doesn't even know it's smashed anything. It literally has

0:07:47.476 --> 0:07:48.236
<v Speaker 2>no sensing.

0:07:48.956 --> 0:07:53.036
<v Speaker 1>That is an incredibly homogeneous environment. Right, It's doing like

0:07:53.076 --> 0:07:55.196
<v Speaker 1>the exact same thing at a very high level of

0:07:55.196 --> 0:07:57.996
<v Speaker 1>precision forever one thing.

0:07:57.916 --> 0:08:01.876
<v Speaker 2>That's exactly right. And so this extension, the fundamental breakthrough

0:08:02.516 --> 0:08:05.876
<v Speaker 2>for science for robotics manipulation that my team is trying

0:08:05.876 --> 0:08:08.316
<v Speaker 2>to make is one giving the robot a sense of

0:08:08.356 --> 0:08:11.796
<v Speaker 2>touch and using that along with sight and along with

0:08:11.996 --> 0:08:16.556
<v Speaker 2>like knowing where your robot is to do meaningful tasks

0:08:16.596 --> 0:08:20.276
<v Speaker 2>in like very high contact, high clutter environments. And then

0:08:20.276 --> 0:08:23.316
<v Speaker 2>there's a brain part. It's also much more difficult to

0:08:23.396 --> 0:08:27.476
<v Speaker 2>kind of predict how this random assortment of items is

0:08:27.516 --> 0:08:30.436
<v Speaker 2>going to move or change as you push on it.

0:08:30.756 --> 0:08:33.116
<v Speaker 2>And so there's an AI piece, there's a brain piece

0:08:33.196 --> 0:08:36.196
<v Speaker 2>that's saying this item will fit in that bin. This

0:08:36.276 --> 0:08:37.916
<v Speaker 2>is actually one of the most frustrating things when you

0:08:37.916 --> 0:08:40.396
<v Speaker 2>try and do the job yourself. I'm like an optimist.

0:08:40.396 --> 0:08:42.356
<v Speaker 2>I'm always oh, yeah, this will fit. And I go

0:08:42.436 --> 0:08:44.236
<v Speaker 2>up there and I try and play Tetris and I

0:08:44.276 --> 0:08:47.196
<v Speaker 2>try and rearrange the shelf and like, it clearly isn't

0:08:47.196 --> 0:08:49.556
<v Speaker 2>going to fit. And then I've wasted thirty seconds or

0:08:49.556 --> 0:08:52.076
<v Speaker 2>forty seconds and I have to try something else.

0:08:53.036 --> 0:08:56.036
<v Speaker 1>That's a good statement of the problem. Well, like when

0:08:56.036 --> 0:08:57.236
<v Speaker 1>did you come onto the scene?

0:08:58.196 --> 0:09:01.276
<v Speaker 2>So I was working on some other stuff and there

0:09:01.436 --> 0:09:06.476
<v Speaker 2>was a recent PhD that had joined our team. He was,

0:09:06.516 --> 0:09:08.316
<v Speaker 2>you know, one year out of school something like this,

0:09:08.956 --> 0:09:10.796
<v Speaker 2>and he says, I'm going to go try and solve

0:09:10.996 --> 0:09:15.476
<v Speaker 2>stowing items into these bookshelves. And my thought was, Oh,

0:09:15.476 --> 0:09:18.116
<v Speaker 2>how naive, Like the real world is going to teach

0:09:18.556 --> 0:09:21.836
<v Speaker 2>this new grad. That's just way too hard a problem

0:09:21.836 --> 0:09:24.716
<v Speaker 2>for robotics to solve. But I was helping him because

0:09:24.756 --> 0:09:26.596
<v Speaker 2>it's fun, right, Like you like to work on hard

0:09:26.596 --> 0:09:29.836
<v Speaker 2>problems when you're a researcher. And he was a very

0:09:29.916 --> 0:09:32.196
<v Speaker 2>nice guy, and so I was, you know, helping him,

0:09:32.236 --> 0:09:34.356
<v Speaker 2>but never thought it was going to work. And there

0:09:34.356 --> 0:09:36.476
<v Speaker 2>were a couple of kind of moments where we made

0:09:36.516 --> 0:09:41.556
<v Speaker 2>these simplifications that turn the problem from I have to

0:09:41.596 --> 0:09:44.996
<v Speaker 2>try and do every possible game of Tetris that a

0:09:45.036 --> 0:09:49.996
<v Speaker 2>person can do into a problem where you're like, oh,

0:09:50.676 --> 0:09:52.436
<v Speaker 2>it's not that this is never going to work, it's

0:09:52.436 --> 0:09:54.836
<v Speaker 2>that this is the future, Like this is robotics two

0:09:54.836 --> 0:09:57.356
<v Speaker 2>point zero, Like this is I have to work on this.

0:09:57.436 --> 0:10:00.116
<v Speaker 2>I can't do anything else anymore. I'm like, I'm all

0:10:00.156 --> 0:10:01.236
<v Speaker 2>in on this problem.

0:10:01.756 --> 0:10:05.596
<v Speaker 1>Tell me about one of those simplifications one of those moments.

0:10:05.836 --> 0:10:09.276
<v Speaker 2>It was the gripper is one. The design, the mechanical

0:10:09.476 --> 0:10:13.076
<v Speaker 2>design of the robotic hand was actually a big breakthrough.

0:10:13.396 --> 0:10:17.036
<v Speaker 2>And when we started, we were trying to push items

0:10:17.396 --> 0:10:19.876
<v Speaker 2>with the item we were gripping. So imagine you're pinching

0:10:19.916 --> 0:10:21.996
<v Speaker 2>a book and you're trying to use that book to

0:10:22.076 --> 0:10:25.236
<v Speaker 2>like push this dog toy over to the side.

0:10:25.916 --> 0:10:27.516
<v Speaker 1>I see, So you want to put the book in

0:10:27.556 --> 0:10:30.516
<v Speaker 1>a bin. Yeah. Dog toy's in the way. So you're like, okay,

0:10:30.556 --> 0:10:32.236
<v Speaker 1>pick up the book and use the book kind of

0:10:32.316 --> 0:10:34.516
<v Speaker 1>like a brush to sweep the dog toy out of

0:10:34.556 --> 0:10:34.836
<v Speaker 1>the way.

0:10:35.036 --> 0:10:37.916
<v Speaker 2>Okay, And I say, okay, like I understand, but it's

0:10:37.996 --> 0:10:39.756
<v Speaker 2>never going to work. What if you don't have a book.

0:10:39.756 --> 0:10:41.676
<v Speaker 2>What if you have a T-shirt? Yeah, what if

0:10:41.716 --> 0:10:44.836
<v Speaker 2>you have an iPhone and it's very expensive? Are you

0:10:44.916 --> 0:10:46.756
<v Speaker 2>going to actually want to start pushing on stuff with

0:10:46.836 --> 0:10:49.996
<v Speaker 2>the phone? And so we came up with this strategy

0:10:50.156 --> 0:10:53.556
<v Speaker 2>to have like a spatula that would extend into the

0:10:53.596 --> 0:10:56.436
<v Speaker 2>bin and you'd push everything with this spatula that was

0:10:56.596 --> 0:11:00.116
<v Speaker 2>part of your hand. So imagine like you're like Wolverine

0:11:00.316 --> 0:11:02.716
<v Speaker 2>and you can shoot out, you know, but instead of

0:11:02.756 --> 0:11:05.676
<v Speaker 2>like the Adamantium claws, you're shooting out a spatula.

0:11:07.516 --> 0:11:10.676
<v Speaker 1>So it's like a pincher grip. A little spatula shoots

0:11:10.716 --> 0:11:12.116
<v Speaker 1>forward out of the pincher grip, is the thing.

0:11:12.236 --> 0:11:13.476
<v Speaker 2>That's right.

0:11:13.876 --> 0:11:16.796
<v Speaker 1>It's so simple when you put it that way. I mean,

0:11:16.836 --> 0:11:18.756
<v Speaker 1>I'm sure making it was not low tech, but it

0:11:18.796 --> 0:11:22.556
<v Speaker 1>sounds very like. It's not like some crazy AI thing.

0:11:22.636 --> 0:11:24.556
<v Speaker 1>It's like just what if there was another little thing

0:11:24.596 --> 0:11:26.756
<v Speaker 1>that came out and pushed stuff out of the way.

0:11:26.916 --> 0:11:29.756
<v Speaker 2>But those ideas are like the really powerful ones when

0:11:29.796 --> 0:11:33.956
<v Speaker 2>you have a simple, elegant solution and you're like, okay,

0:11:34.396 --> 0:11:38.476
<v Speaker 2>that could work. That's different than like a five fingered

0:11:38.556 --> 0:11:42.636
<v Speaker 2>hand that has twenty five motors embedded in it. Yeah,

0:11:42.796 --> 0:11:44.596
<v Speaker 2>like, oh, it's just a spatula.

0:11:44.196 --> 0:11:48.716
<v Speaker 1>Fingers are famously difficult. Why didn't anybody think of it before?

0:11:49.516 --> 0:11:51.836
<v Speaker 2>So we had been working on it as a company

0:11:51.956 --> 0:11:54.396
<v Speaker 2>back to the Amazon Picking Challenge, which was, you know,

0:11:54.436 --> 0:11:59.916
<v Speaker 2>twenty fifteen. But I think a lot of robotics researchers

0:11:59.956 --> 0:12:02.396
<v Speaker 2>like myself, were scared that this problem was just too hard.

0:12:02.436 --> 0:12:04.476
<v Speaker 2>There was easier things to go try and work on,

0:12:05.276 --> 0:12:08.276
<v Speaker 2>and there were a couple of simplifications, so using this

0:12:08.316 --> 0:12:11.356
<v Speaker 2>spatula is one, and then you watch people do the

0:12:11.396 --> 0:12:13.716
<v Speaker 2>task and you realize they're kind of doing the same

0:12:14.876 --> 0:12:19.276
<v Speaker 2>strategies over and over again. It's like insert the spatula on

0:12:19.316 --> 0:12:23.596
<v Speaker 2>the D and sweep to one side. Or this kind

0:12:23.596 --> 0:12:26.716
<v Speaker 2>of page turn mechanism. Something's fallen over and you need

0:12:26.756 --> 0:12:29.396
<v Speaker 2>to sort of flip it back up to make space.

0:12:30.316 --> 0:12:33.076
<v Speaker 1>So you put this spatula underneath it and flip the

0:12:33.116 --> 0:12:35.276
<v Speaker 1>thing up ninety degrees, basically.

0:12:35.156 --> 0:12:38.396
<v Speaker 2>Yeah, and you realize that accounts for like ninety percent of

0:12:38.436 --> 0:12:41.476
<v Speaker 2>the actions you do when you try and stow into

0:12:41.516 --> 0:12:42.076
<v Speaker 2>these bins.

0:12:42.796 --> 0:12:45.996
<v Speaker 1>And did you figure that out by watching people stow.

0:12:45.956 --> 0:12:47.956
<v Speaker 2>We did and doing it yourself.

0:12:48.356 --> 0:12:52.356
<v Speaker 1>How much stowing did you do? Couple of days. Okay,

0:12:52.516 --> 0:12:56.156
<v Speaker 1>it's a hard job, thousands of items probably, I imagine.

0:12:55.596 --> 0:12:59.116
<v Speaker 2>Yeah, exactly. And we tried to wear GoPro cameras

0:12:59.116 --> 0:13:00.796
<v Speaker 2>on our heads so we could look at the videos later,

0:13:00.836 --> 0:13:03.716
<v Speaker 2>which turns out is a recipe for motion sickness. It's

0:13:03.876 --> 0:13:06.516
<v Speaker 2>very difficult to watch those videos, but you go and

0:13:06.556 --> 0:13:08.356
<v Speaker 2>you do it, and you build up this intuition. And

0:13:08.916 --> 0:13:11.036
<v Speaker 2>I think the other piece of the problem that

0:13:11.556 --> 0:13:14.036
<v Speaker 2>made it tractable and made me this like huge believer

0:13:14.636 --> 0:13:16.756
<v Speaker 2>was recognizing we didn't have to get to one hundred percent.

0:13:17.356 --> 0:13:21.716
<v Speaker 2>So in some automation scenarios, you have to solve the

0:13:21.756 --> 0:13:24.956
<v Speaker 2>whole problem, and if you don't, you have nothing, so

0:13:24.996 --> 0:13:27.476
<v Speaker 2>like landing on the moon. And what we realized was

0:13:27.516 --> 0:13:30.076
<v Speaker 2>there was a way to like make the business logic

0:13:30.196 --> 0:13:34.196
<v Speaker 2>work that the robot could handle seventy five percent of

0:13:34.236 --> 0:13:36.756
<v Speaker 2>the stows and it just had to not make a

0:13:36.796 --> 0:13:41.036
<v Speaker 2>mess and, like, work alongside people to do the

0:13:41.116 --> 0:13:44.276
<v Speaker 2>other twenty five percent, and the sum of the parts

0:13:44.316 --> 0:13:47.836
<v Speaker 2>is actually much better than either all robots or all

0:13:47.956 --> 0:13:51.876
<v Speaker 2>employees would be on their own. And making that realization

0:13:52.916 --> 0:13:54.396
<v Speaker 2>all of a sudden meant that it could be a

0:13:54.836 --> 0:13:57.316
<v Speaker 2>two or three year project instead of a twenty year

0:13:57.356 --> 0:14:01.276
<v Speaker 2>project because chasing this long tail. You know, we have

0:14:01.316 --> 0:14:04.156
<v Speaker 2>a million unique items in the building, but we also

0:14:04.236 --> 0:14:06.636
<v Speaker 2>process a million items per day. So I have a

0:14:06.676 --> 0:14:09.796
<v Speaker 2>phrase like, if something goes wrong one in a million,

0:14:09.876 --> 0:14:13.636
<v Speaker 2>it happens every day in every Amazon building. And to

0:14:13.676 --> 0:14:15.596
<v Speaker 2>try and solve all of those, it is a

0:14:15.636 --> 0:14:16.596
<v Speaker 2>twenty year problem.

0:14:17.076 --> 0:14:20.796
<v Speaker 1>I feel like that part of the solution generalizes in

0:14:20.836 --> 0:14:23.436
<v Speaker 1>a really nice way, right, Like I mean, I guess

0:14:23.436 --> 0:14:26.636
<v Speaker 1>the eighty twenty problem is a sort of cliche. But

0:14:27.236 --> 0:14:30.516
<v Speaker 1>the idea that like, oh, if you think of the

0:14:30.556 --> 0:14:32.636
<v Speaker 1>problem the right way, it's like, no, we don't have

0:14:32.636 --> 0:14:34.156
<v Speaker 1>to build a robot that does it every time. If

0:14:34.156 --> 0:14:36.156
<v Speaker 1>we build a robot that does it seventy five percent

0:14:36.196 --> 0:14:38.916
<v Speaker 1>of the time, that is a huge efficiency gain and

0:14:38.956 --> 0:14:42.636
<v Speaker 1>maybe the optimal point on the curve. Right? Yep. If

0:14:42.676 --> 0:14:45.956
<v Speaker 1>the robot is doing everything, you're working too hard to

0:14:45.956 --> 0:14:46.876
<v Speaker 1>make the robot work, probably.

0:14:46.956 --> 0:14:48.556
<v Speaker 2>Exactly that.

0:14:49.716 --> 0:14:52.956
<v Speaker 1>So, Okay, so you have these two big ideas. Do

0:14:53.036 --> 0:14:54.916
<v Speaker 1>you want to tell me the sort of story of

0:14:54.956 --> 0:14:56.876
<v Speaker 1>making it work? You want to tell me how it works.

0:14:57.156 --> 0:15:00.516
<v Speaker 2>We've been running six of these robots at a warehouse

0:15:00.636 --> 0:15:06.076
<v Speaker 2>in Spokane, Washington, Okay, since November of last year and

0:15:06.116 --> 0:15:09.676
<v Speaker 2>so we've done over half a million stows at this point.

0:15:10.116 --> 0:15:12.916
<v Speaker 2>We also have another product that's picking those items out

0:15:12.916 --> 0:15:16.236
<v Speaker 2>of the bins, and so that's my team in Germany,

0:15:16.316 --> 0:15:19.196
<v Speaker 2>and so we have a warehouse in Hamburg where we've

0:15:19.236 --> 0:15:22.716
<v Speaker 2>been picking items. And picking is a slightly harder problem

0:15:22.756 --> 0:15:25.436
<v Speaker 2>in some ways because you have to identify the item.

0:15:25.836 --> 0:15:28.196
<v Speaker 2>So for stow, you have to identify free space. It's

0:15:28.236 --> 0:15:32.116
<v Speaker 2>either occupied or you can make space to put the

0:15:32.156 --> 0:15:34.276
<v Speaker 2>next item in. For pick, you want to make sure

0:15:34.276 --> 0:15:36.476
<v Speaker 2>I get you the red T-shirt, not the red sweatpants,

0:15:36.556 --> 0:15:39.276
<v Speaker 2>or I get you the Harry Potter volume two and

0:15:39.356 --> 0:15:41.396
<v Speaker 2>not Sapiens or some other book.

0:15:41.956 --> 0:15:44.156
<v Speaker 1>Tell me how it works. Let's do the stowing first,

0:15:44.196 --> 0:15:46.356
<v Speaker 1>since that's what we've been talking about. So there's this

0:15:46.476 --> 0:15:49.836
<v Speaker 1>warehouse in Spokane where this robot that you built is

0:15:49.876 --> 0:15:52.916
<v Speaker 1>in use. Like what happens there? A truck pulls in

0:15:53.076 --> 0:15:54.036
<v Speaker 1>and then what happens.

0:15:54.476 --> 0:15:57.796
<v Speaker 2>The way the system works is one of these pods,

0:15:57.876 --> 0:16:00.956
<v Speaker 2>one of these bookcases pulls up to the station, so

0:16:00.996 --> 0:16:04.036
<v Speaker 2>it pulls in front of the robot. We have stereo

0:16:04.116 --> 0:16:07.076
<v Speaker 2>camera towers and so we're looking with the eyes first,

0:16:07.756 --> 0:16:12.716
<v Speaker 2>and we are creating a three D representation of the scene.

0:16:12.796 --> 0:16:15.996
<v Speaker 2>So we're modeling, you know, all the items that are

0:16:16.036 --> 0:16:20.716
<v Speaker 2>in the pod already. But the really interesting

0:16:20.756 --> 0:16:23.796
<v Speaker 2>part is we're actually predicting on top of that, how

0:16:23.836 --> 0:16:27.676
<v Speaker 2>we can move those items around to make more empty space,

0:16:28.156 --> 0:16:30.996
<v Speaker 2>how can we squeeze more stuff in. So it's not

0:16:31.276 --> 0:16:35.316
<v Speaker 2>just identifying vacant space. You have to predict where you

0:16:35.356 --> 0:16:40.156
<v Speaker 2>can make that vacant space by pushing stuff with this spatula. Okay,

0:16:40.356 --> 0:16:43.556
<v Speaker 2>then we do this matching algorithm. So we have about

0:16:43.636 --> 0:16:46.916
<v Speaker 2>forty or fifty items waiting for us to stow, and

0:16:46.956 --> 0:16:49.956
<v Speaker 2>so we have a variety of stuff, and we're matching

0:16:49.996 --> 0:16:53.756
<v Speaker 2>those forty or fifty items to the thirty ish shelves

0:16:53.836 --> 0:16:56.276
<v Speaker 2>that are in front of the robot. Which items should

0:16:56.276 --> 0:16:58.876
<v Speaker 2>go where, and then how do we make that space?

0:16:59.436 --> 0:17:01.636
<v Speaker 2>And so that's where a lot of the AI in

0:17:01.716 --> 0:17:05.436
<v Speaker 2>the system is active and operating. It's predicting success, it's

0:17:05.956 --> 0:17:08.756
<v Speaker 2>minimizing risk, it's trying to optimize for a bunch of

0:17:08.756 --> 0:17:14.196
<v Speaker 2>different parameters. Once we've made that selection, we grasp the item,

0:17:14.436 --> 0:17:18.036
<v Speaker 2>so that item we've selected for putting into the given

0:17:18.076 --> 0:17:21.876
<v Speaker 2>shelf passes into our hand and our hand is two

0:17:21.956 --> 0:17:24.196
<v Speaker 2>conveyor belt paddles, so you can think of it kind

0:17:24.196 --> 0:17:27.036
<v Speaker 2>of like a Panini press, like a George Foreman grill.

0:17:27.116 --> 0:17:30.596
<v Speaker 2>It is a George Foreman grill where each side has

0:17:30.636 --> 0:17:32.316
<v Speaker 2>a conveyor built into it.

0:17:32.596 --> 0:17:35.476
<v Speaker 1>Like a little belt, Like just a little belt going around.

0:17:35.636 --> 0:17:37.796
<v Speaker 2>That's right. So each face of the grill, the top

0:17:37.796 --> 0:17:40.876
<v Speaker 2>face and the bottom face have a conveyor belt. And

0:17:40.916 --> 0:17:43.516
<v Speaker 2>that's important because you can control the pose of the

0:17:43.556 --> 0:17:47.436
<v Speaker 2>item and you can feed it into the bin rather

0:17:47.476 --> 0:17:50.676
<v Speaker 2>than like throwing it into the bin. One of the

0:17:50.716 --> 0:17:53.076
<v Speaker 2>early versions we had kind of dropped it and tried

0:17:53.116 --> 0:17:55.036
<v Speaker 2>to punch it to put the item into the bin,

0:17:55.076 --> 0:17:57.396
<v Speaker 2>and that predictably failed in a lot of...

0:17:57.196 --> 0:18:00.876
<v Speaker 1>Well, you say predictably now, but if you tried

0:18:00.916 --> 0:18:02.236
<v Speaker 1>it, it wasn't predictable.

0:18:02.396 --> 0:18:05.556
<v Speaker 2>Yeah. Yeah. I'm a huge believer in iterative design, and

0:18:05.636 --> 0:18:09.716
<v Speaker 2>so we try and build early, build often, build and

0:18:09.796 --> 0:18:13.716
<v Speaker 2>learn from those builds. So it's actually really important to

0:18:13.916 --> 0:18:17.956
<v Speaker 2>keep six-DOF pose control of the item. So you

0:18:17.996 --> 0:18:20.756
<v Speaker 2>want to make sure the item isn't rotating as you

0:18:20.756 --> 0:18:22.556
<v Speaker 2>shoot it out. You want to make sure that you

0:18:22.796 --> 0:18:25.996
<v Speaker 2>keep the orientation of the item because it's fitting tightly,

0:18:26.636 --> 0:18:28.956
<v Speaker 2>so you don't want it to run into the bookshelf

0:18:28.996 --> 0:18:31.796
<v Speaker 2>above it or below it, or the items already in there.

0:18:31.876 --> 0:18:34.676
<v Speaker 2>Yeah yeah, yeah. We started by trying to shoot it out,

0:18:34.676 --> 0:18:36.236
<v Speaker 2>and then we had all kinds of problems when it

0:18:36.236 --> 0:18:38.756
<v Speaker 2>would like collide with stuff and fall on the floor.

0:18:39.316 --> 0:18:42.236
<v Speaker 2>The worst case is, you know, you would shoot it out,

0:18:42.236 --> 0:18:44.516
<v Speaker 2>it would bounce off the back of

0:18:44.556 --> 0:18:46.196
<v Speaker 2>the bookcase and then come back and hit you in

0:18:46.196 --> 0:18:47.356
<v Speaker 2>the face or hit you in your.

0:18:47.436 --> 0:18:48.076
<v Speaker 1>Did that happen?

0:18:48.156 --> 0:18:48.996
<v Speaker 2>Yeah? Oh yeah, good.

0:18:49.036 --> 0:18:52.636
<v Speaker 1>That's robot comedy, yes, robot physical comedy.

0:18:52.916 --> 0:18:53.396
<v Speaker 2>Yeah.

0:18:53.476 --> 0:18:54.556
<v Speaker 1>So, and that's it.

0:18:54.916 --> 0:18:56.956
<v Speaker 2>That's the stow process. And we want to do that

0:18:57.036 --> 0:18:59.516
<v Speaker 2>a few hundred times an hour, and we want to

0:18:59.516 --> 0:19:03.476
<v Speaker 2>do it on the top shelves of those bookcases where. Yeah.

0:19:03.516 --> 0:19:06.596
<v Speaker 2>That's one of the ways we are really complementary to

0:19:06.716 --> 0:19:10.116
<v Speaker 2>the employees is if the robots can do the top shelves,

0:19:09.836 --> 0:19:13.436
<v Speaker 2>it saves a lot of ergonomic tasks. It allows the

0:19:13.556 --> 0:19:16.436
<v Speaker 2>employees to work in their power zone, like you know,

0:19:17.156 --> 0:19:20.236
<v Speaker 2>shoulder level. That makes them faster too. So if you

0:19:20.236 --> 0:19:22.436
<v Speaker 2>put robots in, people get faster at the job.

0:19:22.916 --> 0:19:25.996
<v Speaker 1>I mean, presumably as the robot gets better, it'll also

0:19:26.036 --> 0:19:27.836
<v Speaker 1>be better at putting things on the middle shelf.

0:19:28.036 --> 0:19:29.716
<v Speaker 2>Right, Well, there's this like sweet spot.

0:19:29.756 --> 0:19:31.956
<v Speaker 1>The robot's going to get better faster than people will

0:19:31.996 --> 0:19:32.436
<v Speaker 1>get better.

0:19:33.036 --> 0:19:34.996
<v Speaker 2>Yeah, we want the robot to be as good as

0:19:34.996 --> 0:19:37.396
<v Speaker 2>it can and not chase one hundred percent. We don't

0:19:37.396 --> 0:19:39.756
<v Speaker 2>really believe in one hundred percent automation. We want to

0:19:40.396 --> 0:19:43.236
<v Speaker 2>find that sweet spot where we're maximizing productivity.

0:19:43.236 --> 0:19:45.476
<v Speaker 1>I mean, the sweet spot's going to keep moving, right? The

0:19:45.556 --> 0:19:47.716
<v Speaker 1>robot's going to get better and better and be able

0:19:47.716 --> 0:19:49.356
<v Speaker 1>to do more and more faster and faster, presumably.

0:19:49.196 --> 0:19:52.516
<v Speaker 2>And my science team's actually really excited about that.

0:19:52.716 --> 0:19:54.996
<v Speaker 2>As you get more and more data. So we have

0:19:55.036 --> 0:19:58.276
<v Speaker 2>five hundred thousand stows that we've done so far, but

0:19:58.356 --> 0:20:00.876
<v Speaker 2>when we get to five hundred million stows, you can

0:20:00.956 --> 0:20:03.676
<v Speaker 2>leverage some of these techniques to start learning the motions

0:20:03.716 --> 0:20:06.676
<v Speaker 2>and learning some of these strategies and refining them to

0:20:06.716 --> 0:20:09.196
<v Speaker 2>be specific to the item that you are holding in

0:20:09.196 --> 0:20:12.836
<v Speaker 2>your hand. There's a lot of opportunity as you get

0:20:12.876 --> 0:20:13.916
<v Speaker 2>more and more data.

0:20:14.156 --> 0:20:16.556
<v Speaker 1>Well, right, so we haven't really I mean, you mentioned

0:20:16.636 --> 0:20:19.236
<v Speaker 1>the software side, the AI side, but we haven't really

0:20:19.276 --> 0:20:22.316
<v Speaker 1>talked about it, and it is I mean, in talking

0:20:22.316 --> 0:20:26.076
<v Speaker 1>to other people working on robotics. It's plainly a data

0:20:27.036 --> 0:20:30.356
<v Speaker 1>game because there's no Internet of the physical world, right,

0:20:30.356 --> 0:20:33.436
<v Speaker 1>because large language models work so well, because there's this

0:20:33.556 --> 0:20:36.796
<v Speaker 1>huge data set and everybody is trying to get data

0:20:36.836 --> 0:20:40.396
<v Speaker 1>from the physical world, and you seem very well positioned

0:20:40.436 --> 0:20:42.756
<v Speaker 1>to get a lot of data from the physical world.

0:20:43.516 --> 0:20:46.236
<v Speaker 2>I think that's true. So one of the joys of

0:20:46.276 --> 0:20:48.756
<v Speaker 2>being a roboticist at Amazon is all the data that

0:20:48.796 --> 0:20:51.796
<v Speaker 2>we have access to. But I will push back a

0:20:51.796 --> 0:20:54.476
<v Speaker 2>little bit that it's just a data problem. It's a

0:20:54.516 --> 0:20:57.916
<v Speaker 2>highly debated topic. Some people in the world believe that

0:20:57.996 --> 0:21:02.036
<v Speaker 2>you can apply the same sort of transformer architectures that

0:21:02.156 --> 0:21:05.156
<v Speaker 2>work so well for search and so well for natural

0:21:05.236 --> 0:21:08.636
<v Speaker 2>language processing and apply those to robotics. If we only

0:21:08.716 --> 0:21:12.556
<v Speaker 2>had the data. I would not put myself in that camp.

0:21:12.676 --> 0:21:15.076
<v Speaker 2>I am not a believer that all we need is

0:21:15.116 --> 0:21:18.956
<v Speaker 2>more torque data from robotic grippers and we'll solve it.

0:21:19.516 --> 0:21:23.396
<v Speaker 2>Natural language is already tokenized in a way that's very

0:21:23.396 --> 0:21:27.836
<v Speaker 2>amenable to those methods, and language and search are also

0:21:28.716 --> 0:21:32.996
<v Speaker 2>very tolerant of sloppiness. So you and I can have

0:21:33.036 --> 0:21:35.916
<v Speaker 2>a conversation. I don't have to get every single word correct,

0:21:36.516 --> 0:21:38.876
<v Speaker 2>but if you mess up a torque on a gripper,

0:21:38.916 --> 0:21:41.836
<v Speaker 2>you can crush your iPhone, or you can sort of

0:21:41.876 --> 0:21:45.396
<v Speaker 2>smash something else that's there, or drop something, or just

0:21:45.436 --> 0:21:48.556
<v Speaker 2>fail the task. And that's because you have physics and

0:21:48.676 --> 0:21:54.436
<v Speaker 2>this nonlinear, very sort of difficult to model real world

0:21:54.916 --> 0:21:57.956
<v Speaker 2>that these robots have to interact with. And so I

0:21:57.996 --> 0:22:01.596
<v Speaker 2>think those techniques certainly accelerate us in a lot of places,

0:22:01.596 --> 0:22:04.636
<v Speaker 2>but they don't just solve the problem. I think we

0:22:04.676 --> 0:22:08.236
<v Speaker 2>need all of the rest of robotics, like hardware design

0:22:08.276 --> 0:22:11.476
<v Speaker 2>and classical control theory to solve those problems.

0:22:12.516 --> 0:22:15.756
<v Speaker 1>Compelling. Although you did start this part of the conversation,

0:22:15.876 --> 0:22:18.676
<v Speaker 1>you brought it up by saying, the science team is

0:22:18.716 --> 0:22:21.716
<v Speaker 1>really excited for what the model's going to learn once

0:22:21.796 --> 0:22:24.156
<v Speaker 1>you have hundreds of millions of stows.

0:22:24.316 --> 0:22:26.396
<v Speaker 2>That's right, and both things are true.

0:22:26.596 --> 0:22:28.956
<v Speaker 1>I know, yes, plainly, we're just talking about sort of

0:22:28.996 --> 0:22:31.756
<v Speaker 1>the margins, right? What is true at what margins? I mean,

0:22:31.796 --> 0:22:34.436
<v Speaker 1>I did wonder as I was reading about this, you know,

0:22:34.476 --> 0:22:37.556
<v Speaker 1>I thought of AWS of Amazon Web Services, which of course,

0:22:37.676 --> 0:22:40.156
<v Speaker 1>like was an internal Amazon thing that at some point

0:22:40.156 --> 0:22:42.476
<v Speaker 1>Amazon was like, oh, maybe other people would find this

0:22:42.516 --> 0:22:45.276
<v Speaker 1>service useful, And now it's a giant part of Amazon's business,

0:22:45.596 --> 0:22:50.876
<v Speaker 1>and so I wondered, like, are you building Amazon robotics services yet?

0:22:51.436 --> 0:22:55.396
<v Speaker 2>Not today. There's so much value that we can

0:22:55.436 --> 0:23:00.596
<v Speaker 2>provide to our fulfillment business that we are one hundred

0:23:00.596 --> 0:23:04.196
<v Speaker 2>percent focused on that. Certainly as a roboticist, though, I

0:23:04.276 --> 0:23:08.236
<v Speaker 2>take great joy that the work we're doing is advancing

0:23:08.276 --> 0:23:11.396
<v Speaker 2>the field of robots, and so it's definitely like in

0:23:11.436 --> 0:23:14.116
<v Speaker 2>the makes my job better that we're advancing the state

0:23:14.156 --> 0:23:17.716
<v Speaker 2>of the art. But from a business perspective, it's all

0:23:17.756 --> 0:23:24.956
<v Speaker 2>hands on making the fulfillment process better for Amazon.com.

0:23:25.076 --> 0:23:40.276
<v Speaker 1>We'll be back in just a minute. I think I

0:23:40.396 --> 0:23:45.356
<v Speaker 1>read you say that you're building a foundation model of items.

0:23:45.556 --> 0:23:48.556
<v Speaker 1>Is that right? And I sort of know what that means,

0:23:48.556 --> 0:23:50.276
<v Speaker 1>But tell me what that means when you say that.

0:23:50.996 --> 0:23:54.716
<v Speaker 2>So, when a robot handles an item, it would do

0:23:54.836 --> 0:23:58.396
<v Speaker 2>better if it takes into account the properties of that item.

0:23:58.476 --> 0:24:01.716
<v Speaker 2>So if you're trying to hand a bowling ball to someone,

0:24:02.356 --> 0:24:04.196
<v Speaker 2>you should do that in a different way than if

0:24:04.196 --> 0:24:06.876
<v Speaker 2>you're handing them a bouncy ball or a light bulb.

0:24:07.356 --> 0:24:11.196
<v Speaker 2>At its core, a foundation model for items is simply

0:24:11.276 --> 0:24:15.356
<v Speaker 2>a model that encodes all of those attributes of an item

0:24:15.916 --> 0:24:19.716
<v Speaker 2>and makes them available to the robotic systems that are

0:24:19.756 --> 0:24:21.516
<v Speaker 2>going to use it. And one of the things that

0:24:21.556 --> 0:24:23.756
<v Speaker 2>makes it a foundation model instead of just you know,

0:24:23.836 --> 0:24:26.076
<v Speaker 2>some custom bespoke thing is that you can transfer it

0:24:26.116 --> 0:24:30.396
<v Speaker 2>across lots of different applications. So if it's you know, stowing,

0:24:30.596 --> 0:24:32.556
<v Speaker 2>you can use it. If you're packing it into a

0:24:32.596 --> 0:24:35.716
<v Speaker 2>delivery box, you can use it. If you're putting it

0:24:35.756 --> 0:24:38.156
<v Speaker 2>onto a shelf in a physical store like for grocery

0:24:38.236 --> 0:24:40.276
<v Speaker 2>or Whole Foods or something, you can use it. And

0:24:40.316 --> 0:24:43.756
<v Speaker 2>so that like commonality across applications is one of the

0:24:43.756 --> 0:24:44.716
<v Speaker 2>things that's important.

0:24:45.476 --> 0:24:49.516
<v Speaker 1>Is part of the notion there that like the model

0:24:49.556 --> 0:24:52.516
<v Speaker 1>would allow a robot to sort of look at some

0:24:52.836 --> 0:24:56.396
<v Speaker 1>novel item and make a reasonable inference about the properties

0:24:56.436 --> 0:24:57.156
<v Speaker 1>of that item.

0:24:57.436 --> 0:25:00.636
<v Speaker 2>Yeah, absolutely that. And the other thing that's a little

0:25:00.676 --> 0:25:05.316
<v Speaker 2>non intuitive is that by understanding how to handle that

0:25:05.356 --> 0:25:10.076
<v Speaker 2>item in all those different applications, grocery, you know,

0:25:10.116 --> 0:25:14.316
<v Speaker 2>stowing, picking, you get better at the individual application. So

0:25:14.916 --> 0:25:18.196
<v Speaker 2>by training on all of this data across these different domains,

0:25:18.596 --> 0:25:22.316
<v Speaker 2>you actually get better at the individual task that your

0:25:22.556 --> 0:25:26.156
<v Speaker 2>specific robot is trying to do. Doesn't like it takes

0:25:26.156 --> 0:25:29.316
<v Speaker 2>a while to like understand that it is not intuitive.

0:25:29.796 --> 0:25:31.596
<v Speaker 1>Say more. What do you mean? Like, I don't know that

0:25:31.676 --> 0:25:32.436
<v Speaker 1>I fully get it.

0:25:32.956 --> 0:25:35.836
<v Speaker 2>Understanding how an item behaves when you gift wrap it,

0:25:36.596 --> 0:25:40.236
<v Speaker 2>Uh huh shouldn't really inform how it's going to behave

0:25:40.316 --> 0:25:42.196
<v Speaker 2>when you're picking it off of a bookshelf.

0:25:42.796 --> 0:25:45.796
<v Speaker 1>Oh, I mean yes, it should, right, Like if you

0:25:45.876 --> 0:25:49.836
<v Speaker 1>think of like a whatever, a stuffed animal versus a book. Yeah,

0:25:49.876 --> 0:25:52.836
<v Speaker 1>maybe that's too easy of a case, but like if

0:25:52.876 --> 0:25:56.796
<v Speaker 1>a thing is squishy or rigid, that seems like as

0:25:56.836 --> 0:25:58.996
<v Speaker 1>a human being, I feel like we sort of port

0:25:59.076 --> 0:26:01.516
<v Speaker 1>that knowledge from one use case to another, right.

0:26:02.436 --> 0:26:04.596
<v Speaker 2>Yeah, it's a good point. And maybe that's because we

0:26:04.756 --> 0:26:08.596
<v Speaker 2>are inherently sort of we think and manipulate items in

0:26:08.636 --> 0:26:11.436
<v Speaker 2>the world more similarly to how these foundation models do.

0:26:11.916 --> 0:26:15.076
<v Speaker 2>But ten years ago it was totally not the case.

0:26:15.116 --> 0:26:17.796
<v Speaker 2>You would train your model in a very narrow domain,

0:26:17.996 --> 0:26:21.196
<v Speaker 2>and if you gave it data from some other domain,

0:26:21.356 --> 0:26:23.396
<v Speaker 2>it would kind of corrupt the results that you had,

0:26:23.476 --> 0:26:26.956
<v Speaker 2>and so you were very careful to curate all the

0:26:27.076 --> 0:26:29.556
<v Speaker 2>data that you were using to be very specific to

0:26:29.596 --> 0:26:32.036
<v Speaker 2>the task that you wanted it to do, and that

0:26:32.076 --> 0:26:34.996
<v Speaker 2>made the performance better. But it also meant that the

0:26:35.036 --> 0:26:37.436
<v Speaker 2>model you had was only good at that one very

0:26:37.516 --> 0:26:38.116
<v Speaker 2>narrow thing.

0:26:38.676 --> 0:26:40.596
<v Speaker 1>It was why we were always so far from the

0:26:40.636 --> 0:26:43.716
<v Speaker 1>general purpose robot. Yeah, because, as you're describing it, trying

0:26:43.756 --> 0:26:46.636
<v Speaker 1>to make a robot do more than one thing just

0:26:46.916 --> 0:26:47.836
<v Speaker 1>meant it couldn't even do one.

0:26:47.756 --> 0:26:49.396
<v Speaker 2>We couldn't even do one thing, and so you're

0:26:49.396 --> 0:26:51.756
<v Speaker 2>putting all your effort into making it do that one

0:26:51.796 --> 0:26:56.036
<v Speaker 2>thing just a little better. I think there's another really

0:26:56.036 --> 0:27:01.556
<v Speaker 2>interesting piece here, which is our team, the Vulcan team

0:27:01.556 --> 0:27:05.636
<v Speaker 2>at Amazon, is trying to use touch and vision together,

0:27:06.396 --> 0:27:09.916
<v Speaker 2>and that is how people interact with the world. That's

0:27:09.956 --> 0:27:13.716
<v Speaker 2>how people manipulate the world. And so the example I

0:27:14.116 --> 0:27:16.396
<v Speaker 2>like to give is picking a coin up off a

0:27:16.436 --> 0:27:19.236
<v Speaker 2>table. Ten years ago, when a robot would try and

0:27:19.276 --> 0:27:22.636
<v Speaker 2>do that, I mean it's impossible, Like, robot can't pick

0:27:22.636 --> 0:27:24.676
<v Speaker 2>a coin up off table, it's too hard a task.

0:27:25.636 --> 0:27:27.396
<v Speaker 2>My five year old can pick a coin up off

0:27:27.396 --> 0:27:30.996
<v Speaker 2>the table in half a second without you noticing. Well,

0:27:31.036 --> 0:27:34.076
<v Speaker 2>the reason is your strategy. So when you pick a

0:27:34.116 --> 0:27:37.796
<v Speaker 2>coin up off the table, you actually don't grasp the coin.

0:27:38.076 --> 0:27:40.676
<v Speaker 2>You go and you touch the table and then you

0:27:40.796 --> 0:27:44.596
<v Speaker 2>slide your fingers along the surface of the table until

0:27:44.636 --> 0:27:47.436
<v Speaker 2>you feel the coin, and when you feel the coin,

0:27:47.476 --> 0:27:50.556
<v Speaker 2>that's your trigger to like rotate it up into a grasp.

0:27:51.716 --> 0:27:54.476
<v Speaker 2>You're not going to some millimeter precision the way your

0:27:54.516 --> 0:27:59.196
<v Speaker 2>grandfather's robot on the welding line would do. And you're

0:27:59.196 --> 0:28:01.436
<v Speaker 2>not just watching with your eyes. You're using your eyes

0:28:01.516 --> 0:28:04.516
<v Speaker 2>and your fingertips both.

0:28:04.356 --> 0:28:07.236
<v Speaker 1>Your sense of touch. Yes, sense of touch is central to pick.

0:28:07.156 --> 0:28:11.556
<v Speaker 2>And we are trying to do those same kind

0:28:11.636 --> 0:28:16.196
<v Speaker 2>of behaviors that are not only reacting to touch, but

0:28:16.516 --> 0:28:19.796
<v Speaker 2>planning for touch. So the same way you plan to

0:28:19.796 --> 0:28:23.716
<v Speaker 2>touch the table first, we plan to put our spatula

0:28:23.876 --> 0:28:26.796
<v Speaker 2>against the side of the bookcase before we try to

0:28:26.836 --> 0:28:30.676
<v Speaker 2>extend it in between this you know, small gap between

0:28:30.676 --> 0:28:32.196
<v Speaker 2>the T-shirt in the bag and the side of

0:28:32.236 --> 0:28:36.396
<v Speaker 2>the bookcase. So we are building our plans and our

0:28:36.716 --> 0:28:40.156
<v Speaker 2>controllers around having sight and touch.

0:28:40.836 --> 0:28:42.676
<v Speaker 1>I mean when you say touch in the context of

0:28:42.716 --> 0:28:45.796
<v Speaker 1>the robot, does that mean that it is getting feedback

0:28:45.916 --> 0:28:49.196
<v Speaker 1>from the stuff it is coming into contact with? And

0:28:49.276 --> 0:28:51.396
<v Speaker 1>is that novel? And how does that work?

0:28:51.476 --> 0:28:54.636
<v Speaker 2>So the sensor is a force torque sensor. It looks

0:28:54.636 --> 0:28:57.956
<v Speaker 2>like a hockey puck and a thousand times a second,

0:28:58.236 --> 0:29:03.076
<v Speaker 2>it's telling you what it feels in the six degrees

0:29:03.076 --> 0:29:05.996
<v Speaker 2>of freedom. So up and down is one, left

0:29:05.996 --> 0:29:08.396
<v Speaker 2>and right is two, in and out is three, and

0:29:08.436 --> 0:29:11.756
<v Speaker 2>then you've got roll, pitch, yaw as the three torques.

0:29:12.076 --> 0:29:15.516
<v Speaker 2>So a thousand times per second, you're sensing, you're feeling

0:29:16.876 --> 0:29:19.996
<v Speaker 2>what the world is pushing on you with, and we

0:29:20.156 --> 0:29:22.836
<v Speaker 2>use that to control the motion but also to plan

0:29:22.916 --> 0:29:23.276
<v Speaker 2>the motion.

0:29:24.676 --> 0:29:27.516
<v Speaker 1>When you say plan the motion, it's like, given the

0:29:27.556 --> 0:29:29.836
<v Speaker 1>sense of touch that is happening right now, what should

0:29:29.836 --> 0:29:30.836
<v Speaker 1>I do next?

0:29:31.156 --> 0:29:33.876
<v Speaker 2>Yep. So in a like high level view, it's like touch

0:29:33.916 --> 0:29:37.676
<v Speaker 2>the table first, slide along the table while keeping you know,

0:29:38.676 --> 0:29:41.436
<v Speaker 2>sort of one pound of force pushing into the table

0:29:42.116 --> 0:29:45.156
<v Speaker 2>until you touch the coin, and then you know, rotate.

0:29:45.596 --> 0:29:47.916
<v Speaker 2>That's at a high level, but then even at a

0:29:48.036 --> 0:29:51.116
<v Speaker 2>low level, the thousand times per second is so that

0:29:51.276 --> 0:29:54.276
<v Speaker 2>as you slide your fingers along the table, you're sort

0:29:54.276 --> 0:29:56.556
<v Speaker 2>of maintaining that accurate force.

0:29:57.676 --> 0:30:00.276
<v Speaker 1>Yeah. Or like if you're putting a thing on the shelf,

0:30:00.356 --> 0:30:02.236
<v Speaker 1>you can sort of tell if you've pushed it too

0:30:02.276 --> 0:30:04.156
<v Speaker 1>far because the shelf is pushing back at you.

0:30:04.036 --> 0:30:06.796
<v Speaker 2>Exactly, or you can tell it's slipping and

0:30:06.836 --> 0:30:08.836
<v Speaker 2>you're about to like push over the top of it,

0:30:09.156 --> 0:30:11.956
<v Speaker 2>so you can like, oh, it's about to fall over,

0:30:12.116 --> 0:30:15.596
<v Speaker 2>so I can react. And those dynamics are happening at

0:30:16.236 --> 0:30:18.316
<v Speaker 2>tens or hundreds of hertz, so you need to sense

0:30:18.356 --> 0:30:19.556
<v Speaker 2>them at a thousand hertz.

0:30:21.156 --> 0:30:23.916
<v Speaker 1>What's the frontier right now for stowing? What are you

0:30:24.276 --> 0:30:25.636
<v Speaker 1>trying to figure out?

0:30:26.796 --> 0:30:31.276
<v Speaker 2>One of the things is getting the fullness of those

0:30:31.316 --> 0:30:34.876
<v Speaker 2>bins all the way up to where they are today,

0:30:34.916 --> 0:30:37.396
<v Speaker 2>so as a person you can pack those bins really,

0:30:37.436 --> 0:30:41.836
<v Speaker 2>really densely, and so the robot's close but not quite

0:30:41.876 --> 0:30:45.636
<v Speaker 2>as good as a person is today at getting as

0:30:45.716 --> 0:30:48.916
<v Speaker 2>much stuff into the bookcase as it can. That's one frontier,

0:30:49.756 --> 0:30:53.876
<v Speaker 2>and that is because one we're conservative, like our brain

0:30:53.996 --> 0:30:56.596
<v Speaker 2>is telling us there's no space when really there is space.

0:30:57.156 --> 0:31:00.916
<v Speaker 2>And two it's because those motions are not sophisticated enough yet.

0:31:01.436 --> 0:31:04.316
<v Speaker 2>So we're trying to improve our video streaming. We're trying

0:31:04.316 --> 0:31:07.356
<v Speaker 2>to get the eyes better to help as well as

0:31:07.396 --> 0:31:11.556
<v Speaker 2>those low level touch sensors, for those behaviors to be better.

0:31:12.516 --> 0:31:16.756
<v Speaker 2>So that's one of the major frontiers. The other one

0:31:16.796 --> 0:31:19.276
<v Speaker 2>is the negative. The robot makes too many mistakes, so

0:31:20.396 --> 0:31:25.676
<v Speaker 2>defects and exception handling are so important in robotic systems,

0:31:26.156 --> 0:31:27.836
<v Speaker 2>and this is another thing I think the world on

0:31:27.876 --> 0:31:30.596
<v Speaker 2>the Internet doesn't appreciate enough. Like you can do a

0:31:30.636 --> 0:31:33.956
<v Speaker 2>demo and a happy path. Hey, it worked once. I

0:31:33.956 --> 0:31:35.916
<v Speaker 2>can submit a paper to a conference, or I can

0:31:35.916 --> 0:31:38.876
<v Speaker 2>put a cool video on YouTube. That's great. You have

0:31:38.876 --> 0:31:41.876
<v Speaker 2>a demo. To have a product, you have to make

0:31:41.876 --> 0:31:44.916
<v Speaker 2>sure it's working, you know, ninety nine percent of the time,

0:31:44.996 --> 0:31:46.916
<v Speaker 2>or ninety nine and a half percent, or you know

0:31:47.036 --> 0:31:50.676
<v Speaker 2>in some cases four nines or five nines. And a

0:31:50.716 --> 0:31:51.996
<v Speaker 2>lot of the work you have to do is to

0:31:52.076 --> 0:31:57.436
<v Speaker 2>recover and handle those rare exceptions or prevent or recover

0:31:57.516 --> 0:32:00.916
<v Speaker 2>from those defects. And so the robot still drops too

0:32:00.956 --> 0:32:03.756
<v Speaker 2>much stuff on the floor. One of our frontiers is

0:32:03.796 --> 0:32:06.436
<v Speaker 2>not dropping crap on the floor, like, we need to

0:32:06.436 --> 0:32:07.796
<v Speaker 2>get about three times better at that.

0:32:08.196 --> 0:32:13.116
<v Speaker 1>Presumably, the robot is already skipping some universe of items

0:32:13.116 --> 0:32:14.316
<v Speaker 1>that the robot can't handle.

0:32:14.876 --> 0:32:17.236
<v Speaker 2>Yeah, and so we need to get smarter about which

0:32:17.276 --> 0:32:19.716
<v Speaker 2>items we skip and which items we take. We also

0:32:19.756 --> 0:32:23.316
<v Speaker 2>need to get better at inserting those items in such

0:32:23.316 --> 0:32:24.756
<v Speaker 2>a way that they're not going to fall back out.

0:32:25.956 --> 0:32:27.956
<v Speaker 1>What items are particularly hard for the robot?

0:32:28.596 --> 0:32:31.596
<v Speaker 2>So tight fitting items are the hardest.

0:32:31.236 --> 0:32:34.116
<v Speaker 1>Uh huh. And so that's not the nature of the item,

0:32:34.156 --> 0:32:37.516
<v Speaker 1>but the nature of the particular relationship between the item and the shelf.

0:32:37.556 --> 0:32:38.556
<v Speaker 2>Exactly.

0:32:38.836 --> 0:32:41.956
<v Speaker 1>Yeah, Like, is there a kind of thing that the

0:32:42.076 --> 0:32:46.236
<v Speaker 1>robot just can't do because of its shape or something.

0:32:46.436 --> 0:32:51.476
<v Speaker 2>There is a particular rubber fish that we really hate.

0:32:52.516 --> 0:32:53.396
<v Speaker 2>It's a dog toy.

0:32:53.476 --> 0:32:54.756
<v Speaker 1>It's floppy. Is that what?

0:32:55.076 --> 0:32:55.516
<v Speaker 2>Sticky.

0:32:56.316 --> 0:33:00.156
<v Speaker 1>Oh, sticky. Interesting. Yeah. And they don't put it in

0:33:00.196 --> 0:33:03.996
<v Speaker 1>a box? Nope, they just send you the sticky fish.

0:33:04.116 --> 0:33:07.076
<v Speaker 2>Yeah, and it sort of gets hung up on whenever

0:33:07.116 --> 0:33:09.916
<v Speaker 2>it makes contact. It doesn't slide, it like it wants

0:33:09.996 --> 0:33:13.556
<v Speaker 2>to rotate about whatever it's made contact with. And so

0:33:13.596 --> 0:33:15.596
<v Speaker 2>there's this particular dog toy and so we use it.

0:33:15.636 --> 0:33:18.676
<v Speaker 2>We've bought like fifty of them and now we have

0:33:18.756 --> 0:33:21.076
<v Speaker 2>them in the lab and this is like our diabolical

0:33:21.156 --> 0:33:21.756
<v Speaker 2>item set.

0:33:22.196 --> 0:33:23.996
<v Speaker 1>Is that a term of art, that "diabolical"?

0:33:24.556 --> 0:33:27.436
<v Speaker 2>I don't know, Yeah, it's our term of art. Yeah.

0:33:27.916 --> 0:33:30.916
<v Speaker 2>Also bagged items where the bag is really loose. So

0:33:31.996 --> 0:33:34.756
<v Speaker 2>imagine having like a T shirt in a bag, but

0:33:34.796 --> 0:33:36.996
<v Speaker 2>the bag is like twice as big as the T shirt.

0:33:37.956 --> 0:33:40.036
<v Speaker 1>Floppy? Is that the floppy problem?

0:33:40.196 --> 0:33:43.996
<v Speaker 2>Floppy but also transparent, so sometimes you can see through

0:33:44.036 --> 0:33:44.476
<v Speaker 2>the bag.

0:33:44.596 --> 0:33:48.076
<v Speaker 1>Or so the robot gets confused about is the bag

0:33:48.156 --> 0:33:49.036
<v Speaker 1>the item or not?

0:33:49.076 --> 0:33:51.916
<v Speaker 2>Yeah. Sometimes you want one and sometimes you

0:33:51.916 --> 0:33:55.356
<v Speaker 2>want the other. So like if it's just floppy plastic bag,

0:33:55.756 --> 0:33:57.876
<v Speaker 2>it probably will fit. Like if I just push it

0:33:57.916 --> 0:33:59.876
<v Speaker 2>into the bin, the bag is going to conform and

0:34:00.396 --> 0:34:02.996
<v Speaker 2>slide in, but you can't be sure about that. You know,

0:34:03.036 --> 0:34:04.716
<v Speaker 2>you get into a bunch of those edge cases that

0:34:04.756 --> 0:34:06.836
<v Speaker 2>are in that long tail of being robust.

0:34:07.756 --> 0:34:10.596
<v Speaker 1>I mean, it's interesting, right because the robot is dealing

0:34:10.636 --> 0:34:13.436
<v Speaker 1>with this sort of human optimized world. Like it reminds

0:34:13.476 --> 0:34:16.836
<v Speaker 1>me of the way, I think, is it Ikea, designs

0:34:16.876 --> 0:34:19.996
<v Speaker 1>its furniture to fit optimally on a pallet, so you

0:34:20.036 --> 0:34:21.556
<v Speaker 1>can fit the most of them, like not just the

0:34:21.596 --> 0:34:25.436
<v Speaker 1>flat pack, but like in more subtle ways. And can

0:34:25.516 --> 0:34:28.716
<v Speaker 1>you imagine that there is some shift in the world

0:34:28.796 --> 0:34:31.436
<v Speaker 1>where I mean, obviously you're trying to make the robot better,

0:34:31.516 --> 0:34:33.596
<v Speaker 1>but also people are trying to make things work better

0:34:33.596 --> 0:34:34.276
<v Speaker 1>for the robot.

0:34:34.876 --> 0:34:38.956
<v Speaker 2>Yes. Absolutely, And there is a different team within Amazon

0:34:38.996 --> 0:34:43.476
<v Speaker 2>that's imagining a future world and future bookcases that are

0:34:43.916 --> 0:34:45.156
<v Speaker 2>friendly for robots.

0:34:45.636 --> 0:34:46.356
<v Speaker 1>Uh huh.

0:34:46.396 --> 0:34:53.316
<v Speaker 2>However, there are currently five million of those bookshelves in

0:34:53.436 --> 0:34:57.636
<v Speaker 2>warehouses holding inventory that's for sale on Amazon.com.

0:34:58.556 --> 0:35:02.716
<v Speaker 2>And so it's a really really big lift to go

0:35:02.836 --> 0:35:04.436
<v Speaker 2>replace all of those bookshelves.

0:35:04.436 --> 0:35:07.636
<v Speaker 1>Interesting. So it's a whole other team that's just like,

0:35:07.796 --> 0:35:13.676
<v Speaker 1>let's imagine the you know, a much more robot centric warehouse. Yeah,

0:35:13.716 --> 0:35:15.516
<v Speaker 1>those guys like you don't even talk to them. They're

0:35:15.516 --> 0:35:17.196
<v Speaker 1>just off in their own.

0:35:16.876 --> 0:35:19.716
<v Speaker 2>I mean, they're friends, but yeah, we are facing very

0:35:19.716 --> 0:35:23.956
<v Speaker 2>different problems. And so we took a tenet very early on.

0:35:24.036 --> 0:35:28.916
<v Speaker 2>It's like, the world exists, the robot needs to perform

0:35:29.116 --> 0:35:32.316
<v Speaker 2>in the world as it exists. And this team they

0:35:32.316 --> 0:35:34.916
<v Speaker 2>get their greenfield, so they get to think of

0:35:34.996 --> 0:35:38.116
<v Speaker 2>like a new field. We are a brownfield, meaning we

0:35:38.156 --> 0:35:41.036
<v Speaker 2>have to retrofit into these existing buildings. You know, we

0:35:41.076 --> 0:35:43.036
<v Speaker 2>have like ten year leases on some of these buildings.

0:35:43.036 --> 0:35:44.676
<v Speaker 2>They're going to be there for a long long time.

0:35:45.636 --> 0:35:47.716
<v Speaker 1>And then somebody else is out there. So they're building

0:35:47.756 --> 0:35:50.196
<v Speaker 1>a whole other kind of robot. Your robot is optimized

0:35:50.236 --> 0:35:52.036
<v Speaker 1>for the world today, and somebody else is building a

0:35:52.116 --> 0:35:53.956
<v Speaker 1>robot for the robot world.

0:35:53.996 --> 0:35:56.916
<v Speaker 2>That's right. I love that they have a building that

0:35:56.956 --> 0:36:00.516
<v Speaker 2>they've built in Louisiana. It's in Shreveport, Louisiana. It has

0:36:00.716 --> 0:36:04.996
<v Speaker 2>ten times the number of robots that a traditional building has.

0:36:05.716 --> 0:36:09.836
<v Speaker 2>It's a completely reimagined way of fulfilling your order. It

0:36:09.836 --> 0:36:12.436
<v Speaker 2>also has a lot of people still working in those buildings,

0:36:12.476 --> 0:36:17.556
<v Speaker 2>but they're working in maintenance and robotics quarterbacks jobs, and

0:36:17.596 --> 0:36:19.676
<v Speaker 2>so they're higher skilled. And so we have a bunch

0:36:19.676 --> 0:36:23.356
<v Speaker 2>of programs that are trying to transition our very talented

0:36:23.356 --> 0:36:26.276
<v Speaker 2>workforce into the jobs of the future. One of the

0:36:26.276 --> 0:36:28.156
<v Speaker 2>things I really like to say is, you don't need

0:36:28.196 --> 0:36:31.676
<v Speaker 2>a college degree to work in robotics. At Amazon. It's

0:36:31.716 --> 0:36:33.996
<v Speaker 2>about twenty, twenty-five percent of my team doesn't have

0:36:33.996 --> 0:36:37.436
<v Speaker 2>a college degree but are enormously valuable. Like some of

0:36:37.436 --> 0:36:40.436
<v Speaker 2>our top ten people on our team are those people.

0:36:41.316 --> 0:36:45.116
<v Speaker 1>That facility in Shreveport, is it live? Like, is real

0:36:45.156 --> 0:36:47.036
<v Speaker 1>stuff going in and real orders going out?

0:36:47.156 --> 0:36:49.956
<v Speaker 2>Yeah, it's live. We could follow up with exactly the date,

0:36:49.956 --> 0:36:52.236
<v Speaker 2>but it's been up for about a year, I think.

0:36:52.396 --> 0:36:55.076
<v Speaker 1>So interesting, a thing like that. Well, I would be interested in

0:36:55.116 --> 0:36:57.676
<v Speaker 1>talking to your counterpart there as well. That show would

0:36:57.676 --> 0:37:01.836
<v Speaker 1>pair interestingly with this show. So okay, let's talk about

0:37:01.836 --> 0:37:04.716
<v Speaker 1>the rest of the process. You know, the rest of

0:37:04.756 --> 0:37:06.876
<v Speaker 1>what's going on in the warehouse and where else you're

0:37:06.916 --> 0:37:10.356
<v Speaker 1>working on robots. So the piece we've been talking about

0:37:10.356 --> 0:37:13.476
<v Speaker 1>this whole time is getting stuff as it comes in

0:37:13.516 --> 0:37:17.716
<v Speaker 1>from the truck onto the shelf, which naively I wouldn't

0:37:17.716 --> 0:37:19.476
<v Speaker 1>even think of that part, but it turns out to

0:37:19.476 --> 0:37:22.076
<v Speaker 1>be this great big problem. What are the other pieces?

0:37:23.036 --> 0:37:26.836
<v Speaker 2>What's interesting is the science we're building giving robots a

0:37:26.876 --> 0:37:30.756
<v Speaker 2>sense of touch has applicability in lots and lots of

0:37:30.796 --> 0:37:34.756
<v Speaker 2>places across that whole chain. Anytime the robots need to

0:37:34.796 --> 0:37:40.396
<v Speaker 2>be physically interacting, like contacting, touching items is a good

0:37:40.436 --> 0:37:44.436
<v Speaker 2>place for our core technology. So if we're packing four

0:37:44.476 --> 0:37:46.516
<v Speaker 2>items into a box because we want to send you

0:37:46.556 --> 0:37:48.676
<v Speaker 2>the four things you bought in one shipment, not in

0:37:48.756 --> 0:37:52.196
<v Speaker 2>four separate packages, you need to touch the box. You

0:37:52.236 --> 0:37:53.956
<v Speaker 2>need to touch the other items that are already in

0:37:53.996 --> 0:37:56.116
<v Speaker 2>the box. You need to play that game of Tetris.

0:37:56.636 --> 0:37:59.356
<v Speaker 1>Yes, I mean it's a stowing problem again, right, I

0:37:59.436 --> 0:38:01.796
<v Speaker 1>know it's called packing, but it's a version of that same problem.

0:38:01.636 --> 0:38:04.276
<v Speaker 2>That's right. And those problems recur over and over again.

0:38:04.396 --> 0:38:07.036
<v Speaker 2>So getting all of the packages, all of the cardboard

0:38:07.076 --> 0:38:10.916
<v Speaker 2>boxes and paper mailers into a cart that can go

0:38:11.356 --> 0:38:13.596
<v Speaker 2>onto the back of the truck, that is a stowing

0:38:13.676 --> 0:38:14.716
<v Speaker 2>problem in the cart.

0:38:15.116 --> 0:38:18.036
<v Speaker 1>Putting things in a thing, yeah, is a great, big

0:38:18.076 --> 0:38:19.716
<v Speaker 1>problem in many ways.

0:38:19.836 --> 0:38:22.276
<v Speaker 2>But you can also expand to think about grocery. So

0:38:22.356 --> 0:38:28.036
<v Speaker 2>if you order produce, you don't want your grandfather's welding

0:38:28.156 --> 0:38:31.236
<v Speaker 2>robot handling your peaches. It's gonna smash them, like, you

0:38:31.276 --> 0:38:33.676
<v Speaker 2>need a robot with a sense of touch. If you

0:38:33.756 --> 0:38:37.516
<v Speaker 2>think about household tasks, if you want a robot, you know,

0:38:37.596 --> 0:38:41.796
<v Speaker 2>picking up your kid's toys or dealing with laundry, like,

0:38:41.876 --> 0:38:43.876
<v Speaker 2>those robots need to have a sense of touch. They're

0:38:43.916 --> 0:38:46.956
<v Speaker 2>physically interacting in a dexterous way with the world. And

0:38:47.036 --> 0:38:49.036
<v Speaker 2>so one of the things that we're so excited about

0:38:49.436 --> 0:38:52.836
<v Speaker 2>is not only these big applications for stowing and picking off

0:38:52.836 --> 0:38:57.076
<v Speaker 2>of you know, these bookcases, but everything that gets unlocked

0:38:57.196 --> 0:38:59.356
<v Speaker 2>once the robot has that sense of touch.

0:39:00.396 --> 0:39:04.996
<v Speaker 1>When you talk that way, it feels like a beyond

0:39:05.116 --> 0:39:08.516
<v Speaker 1>what is typically considered Amazon kind of thing. It seems

0:39:08.556 --> 0:39:11.076
<v Speaker 1>like a thing either Amazon's going to get into lots

0:39:11.076 --> 0:39:16.316
<v Speaker 1>of other sort of non retail businesses or license the

0:39:16.356 --> 0:39:19.476
<v Speaker 1>technology or sell, you know, robotic touch as a service

0:39:19.716 --> 0:39:20.236
<v Speaker 1>or whatever.

0:39:20.356 --> 0:39:25.276
<v Speaker 2>Yeah, I think there are probably five or ten applications

0:39:25.276 --> 0:39:29.036
<v Speaker 2>in how we process orders today that are all within

0:39:29.156 --> 0:39:32.756
<v Speaker 2>the warehouses and delivery stations, and those are my first

0:39:33.716 --> 0:39:37.236
<v Speaker 2>hill to climb. Then we do have a consumer robotics team.

0:39:37.316 --> 0:39:40.956
<v Speaker 2>So there was a cool robot we released called Astro.

0:39:41.516 --> 0:39:44.236
<v Speaker 2>It didn't have any manipulation capabilities, right? It would drive

0:39:44.236 --> 0:39:46.236
<v Speaker 2>around your house. It had a camera on a mast

0:39:46.316 --> 0:39:48.916
<v Speaker 2>that would extend up and down. You could talk to

0:39:48.956 --> 0:39:50.716
<v Speaker 2>it the way you can talk to an Alexa device.

0:39:51.196 --> 0:39:53.436
<v Speaker 2>The future versions of those robots are going to want

0:39:53.476 --> 0:39:55.956
<v Speaker 2>to do more useful things, and so they're going to

0:39:56.036 --> 0:39:58.316
<v Speaker 2>need this kind of underlying technology, and so that's a

0:39:58.316 --> 0:40:01.996
<v Speaker 2>business opportunity in the long term. You know, that's not

0:40:02.116 --> 0:40:05.276
<v Speaker 2>a thing my team is focused on now, but I

0:40:05.276 --> 0:40:08.316
<v Speaker 2>get excited about it when I think about what we unlock.

0:40:12.796 --> 0:40:14.916
<v Speaker 1>We'll be back in a minute with the lightning round.

0:40:25.436 --> 0:40:27.676
<v Speaker 1>Let's do a lightning round. If you listen to the show,

0:40:28.196 --> 0:40:30.476
<v Speaker 1>you have a sense of what this is. Tell me

0:40:30.476 --> 0:40:32.996
<v Speaker 1>about the last time you were in zero gravity.

0:40:33.796 --> 0:40:38.036
<v Speaker 2>I flew an experiment to try and drill into rocks,

0:40:38.436 --> 0:40:42.036
<v Speaker 2>which was going to be applied to asteroids. And of course,

0:40:42.036 --> 0:40:45.236
<v Speaker 2>if you're drilling into an asteroid, any amount you're pushing

0:40:45.276 --> 0:40:47.836
<v Speaker 2>into the rock is pushing you back off into space

0:40:47.876 --> 0:40:51.516
<v Speaker 2>because asteroids have almost zero gravity.

0:40:51.436 --> 0:40:54.556
<v Speaker 1>Right, so you gotta have somebody push it on the

0:40:54.596 --> 0:40:56.716
<v Speaker 1>other side. How do you solve that? What do you do?

0:40:56.996 --> 0:40:58.676
<v Speaker 1>You grab it? How do you even do that?

0:40:58.756 --> 0:41:02.396
<v Speaker 2>Hence my passion for robot hands. We built a robot

0:41:02.436 --> 0:41:05.116
<v Speaker 2>hand that would grab the rock with a bunch of claws.

0:41:05.156 --> 0:41:08.076
<v Speaker 2>I think it had a thousand claws, and the claws

0:41:08.076 --> 0:41:10.436
<v Speaker 2>were actually fish hooks. So imagine a bunch of fish

0:41:10.436 --> 0:41:13.676
<v Speaker 2>hooks grabbing onto a rock to react the force of

0:41:13.716 --> 0:41:15.276
<v Speaker 2>pushing a drill bit down the center.

0:41:15.916 --> 0:41:16.596
<v Speaker 1>Did it work?

0:41:17.316 --> 0:41:19.756
<v Speaker 2>It did work, but it only worked on rocks that

0:41:19.796 --> 0:41:22.476
<v Speaker 2>were pretty rough, that had a lot of spots for

0:41:22.636 --> 0:41:25.396
<v Speaker 2>the fish hooks to grab. But it turns out asteroids

0:41:25.396 --> 0:41:27.916
<v Speaker 2>are really rough. Most of the smooth rocks you find

0:41:27.916 --> 0:41:31.116
<v Speaker 2>on Earth have been processed by liquid water or ice,

0:41:31.476 --> 0:41:34.876
<v Speaker 2>and that's not happening on asteroids. No liquid water.

0:41:35.516 --> 0:41:37.556
<v Speaker 1>And so this was on the plane, on

0:41:37.636 --> 0:41:41.396
<v Speaker 1>that NASA plane that flies, what is it? Yeah, Vomit

0:41:41.476 --> 0:41:44.636
<v Speaker 1>Comet, sine curve basically. Yeah, what was it like?

0:41:45.316 --> 0:41:48.756
<v Speaker 2>The Vomit Comet's actually very zen. So when you're in

0:41:48.956 --> 0:41:52.676
<v Speaker 2>zero gravity, when you're floating, it's like very peaceful. It's

0:41:52.716 --> 0:41:54.996
<v Speaker 2>when you're in double gravity, where you're at the bottom of

0:41:55.036 --> 0:41:58.116
<v Speaker 2>the parabola and you're like being glued and pushed against

0:41:58.156 --> 0:42:00.196
<v Speaker 2>the floor. If you like turn your head very quickly,

0:42:00.196 --> 0:42:03.196
<v Speaker 2>that's where you get like into serious trouble. And so

0:42:03.276 --> 0:42:05.876
<v Speaker 2>the trick is just to like go into your zone

0:42:06.156 --> 0:42:08.196
<v Speaker 2>for the bottom of the parabola and then you become

0:42:08.236 --> 0:42:11.876
<v Speaker 2>like very free and zen like in the zero G portion.

0:42:13.236 --> 0:42:14.436
<v Speaker 1>You think you'll ever go to space?

0:42:15.556 --> 0:42:18.476
<v Speaker 2>No, I think now that I have three kids, I

0:42:18.476 --> 0:42:19.556
<v Speaker 2>think I'm landlocked.

0:42:20.156 --> 0:42:24.076
<v Speaker 1>You seem a little bit sad about that. Does everybody

0:42:24.116 --> 0:42:26.076
<v Speaker 1>who works at JPL kind of want to go to space?

0:42:26.916 --> 0:42:30.676
<v Speaker 2>Yes, everybody that works at JPL, I think does think

0:42:30.676 --> 0:42:32.876
<v Speaker 2>about going to space. I think what makes me sad

0:42:32.996 --> 0:42:36.916
<v Speaker 2>is we could be doing so much more at building

0:42:36.956 --> 0:42:41.476
<v Speaker 2>civilization out into space, at the scientific exploration of all

0:42:41.556 --> 0:42:45.156
<v Speaker 2>of the interesting places in space, and I think we're

0:42:45.236 --> 0:42:48.156
<v Speaker 2>kind of tripping ourselves up in a couple of places

0:42:48.356 --> 0:42:51.036
<v Speaker 2>as a species. I wish we would get unblocked and

0:42:51.036 --> 0:42:52.996
<v Speaker 2>get some of that eagerness you see in some of the

0:42:52.996 --> 0:42:56.036
<v Speaker 2>private investment, Like we're doing well in rockets, but we're

0:42:56.076 --> 0:42:59.636
<v Speaker 2>not yet doing well in the spacecraft and the scientific

0:42:59.676 --> 0:43:03.116
<v Speaker 2>instruments and the pieces that have to fly on top

0:43:03.156 --> 0:43:04.036
<v Speaker 2>of the rockets.

0:43:04.556 --> 0:43:07.396
<v Speaker 1>When you say we're tripping ourselves up in a couple places,

0:43:07.436 --> 0:43:09.076
<v Speaker 1>in what places? Like, what do you mean?

0:43:10.516 --> 0:43:13.596
<v Speaker 2>I think we became very conservative, like our risk posture

0:43:13.636 --> 0:43:17.316
<v Speaker 2>about going to space. We stopped treating it as this

0:43:17.476 --> 0:43:20.916
<v Speaker 2>like very dangerous activity and tried to make it extremely safe,

0:43:20.916 --> 0:43:22.636
<v Speaker 2>and that slowed us down.

0:43:22.636 --> 0:43:24.516
<v Speaker 1>To bring back the cowboys a little bit.

0:43:24.636 --> 0:43:28.076
<v Speaker 2>Yeah. Interesting, Yeah, and then there's a lot of bureaucracy,

0:43:28.196 --> 0:43:32.556
<v Speaker 2>of course, that built up over fifty years. I'm still

0:43:32.556 --> 0:43:35.116
<v Speaker 2>very optimistic. There's a lot of smart people working

0:43:35.116 --> 0:43:38.116
<v Speaker 2>in that area and a lot of exciting things happening,

0:43:39.116 --> 0:43:40.076
<v Speaker 2>so we're going to get through it.

0:43:46.996 --> 0:43:51.276
<v Speaker 1>Aaron Parness is a director of Applied Science at Amazon Robotics.

0:43:52.476 --> 0:43:55.796
<v Speaker 1>Please email us at problem at Pushkin dot fm. We

0:43:55.876 --> 0:43:59.596
<v Speaker 1>are always looking for new guests for the show. Today's

0:43:59.596 --> 0:44:03.436
<v Speaker 1>show was produced by Trina Menino and Gabriel Hunter Chang. It

0:44:03.556 --> 0:44:07.556
<v Speaker 1>was edited by Alexandra Garaton and engineered by Sarah Bruguiere.

0:44:07.796 --> 0:44:09.916
<v Speaker 1>I'm Jacob Goldstein, and we'll be back next week with

0:44:09.996 --> 0:44:23.356
<v Speaker 1>another episode of What's Your Problem.