Speaker 1: Pushkin.

Speaker 2: One of the things I look at, in robotics as a big field: there are so many amazing demonstrations of mobility, robots doing backflips, robots running down hills, and that's really impressive to me, because I can't do a backflip, or I might trip if I run down the hill. But where the really valuable parts of robotics are going to be are in manipulation. So my kid can take a blueberry out of her cereal bowl because she doesn't want to eat it, and that is an incredibly hard task for a robot. And you don't see any of those demos. And I think we're, like, kind of inherently programmed as people to, like, be biased towards the backflip being more impressive, and in reality, like, the business value and the harder thing for the robot is to, like, take the blueberry out of the cereal bowl.

Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Aaron Parness. Aaron spent the earlier part of his career building space robots at NASA's Jet Propulsion Laboratory, JPL. Six years ago, he went to work at Amazon. Now, Aaron is a director of Applied Science at Amazon Robotics. I wanted to talk to Aaron about a robot arm called Vulcan. He and his team developed Vulcan to do a job that is surprisingly hard for robots to do: taking stuff that gets delivered to Amazon warehouses and putting it onto shelves. In order to solve this problem, Aaron and his team had to build a robot that had a sense of touch, that could deal with complicated, unpredictable situations, and that could look at a shelf and plan out a course of action. As you'll hear in the interview, all of those traits may someday be helpful, not just in stocking shelves in a warehouse, but in doing lots of boring-sounding but complicated real-world tasks, like, for example, taking a blueberry out of a bowl of cereal.
To start, I asked Aaron to tell me the problem that Vulcan was designed to solve at Amazon's warehouses.

Speaker 2: So new inventory comes into the building. You know, trucks pull up and they unload new stuff. We need to store that stuff while it's waiting for someone to click the buy button. We store it in these large fabric bookcases. It's about eight feet tall. It has about forty different shelves on it. It's four-sided, so you can store stuff from any of the different faces of the case. What's really interesting is this stuff is randomly stowed, so it's not like all the iPhones are in one shelf. It'll be all different stuff, all mixed together.

Speaker 1: When you say random, do you mean random, or do you mean it would look random to the untrained eye?

Speaker 2: I mean literally random. Really, wherever there is space you can put the item, because...

Speaker 1: That's what's optimal. It turns out the optimal way to store stuff is random.

Speaker 2: That's right. Why? That stems actually from Jeff Bezos's, like, original vision, I think, and it's incredible. So you want to have the most selection, and you want to have speed of delivery, and you want to have low cost, and that's what the customer wants, right? The customer is using Amazon dot com because we have selection, we have speed, and we have low cost. In order to achieve that, you have to have these massive warehouses located really close to your customers, and you have a lot of customers in Tokyo, in New York City and San Francisco, where real estate's really expensive, so you have to figure out a way to put all of this different stuff in, like, the densest packing area you can, and have access to it immediately, so that you can deliver in hours instead of days. And what that means is that random is better than structured.
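A minimal back-of-the-envelope sketch of that claim, that random stowage beats a dedicated shelf, using entirely made-up numbers: the warehouse dimensions, the placement model, and the trial count below are assumptions for illustration, not figures from the interview.

```python
import random

# Hypothetical warehouse floor, 1000 x 1000 meters of shelf locations.
# Only the "thousand copies" figure echoes the conversation; everything else is assumed.
WIDTH, HEIGHT = 1000.0, 1000.0
N_COPIES = 1000       # e.g. a thousand units of one popular item
TRIALS = 200

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def rand_point():
    return (random.uniform(0, WIDTH), random.uniform(0, HEIGHT))

random.seed(0)
dedicated_total, random_total = 0.0, 0.0
for _ in range(TRIALS):
    station = rand_point()                                # pick station handling the next order
    dedicated_shelf = rand_point()                        # all copies stored on one shelf
    scattered = [rand_point() for _ in range(N_COPIES)]   # copies stowed wherever there was space
    dedicated_total += dist(station, dedicated_shelf)
    random_total += min(dist(station, c) for c in scattered)

print(f"avg travel, one dedicated shelf: {dedicated_total / TRIALS:6.1f} m")
print(f"avg travel, random stowage     : {random_total / TRIALS:6.1f} m")
```

With these assumed numbers, the nearest scattered copy is typically a few tens of meters from the pick station, while a single dedicated shelf averages several hundred meters away, which is the speed argument in miniature.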
Speaker 2: So anywhere there's a space, you can add that item into the inventory, and that means it comes up for sale immediately on the website. And then when someone places an order, you don't have to wait for that iPhone bookcase to make its way all the way across the warehouse. You probably have a thousand iPhones in the warehouse, and whichever one is closest can go to whichever pick station is eligible, and it ends up being actually substantially faster.

Speaker 1: So that last sentence seems to be the key. The idea is like, yes, given you have, whatever, a thousand iPhones in the warehouse: in the universe where a human had to know where they all were, you'd put them all on one shelf. But you're saying at any given time that means that shelf is probably going to be pretty far away, whereas if you randomly distribute them throughout the shelves in the warehouse, at any given time one of those thousand iPhones is probably going to be pretty close to where it needs to be. And because you have a, whatever, computerized system that can keep track of everything all the time, it makes sense to randomly distribute all the things.

Speaker 2: Yeah, that's exactly right. And it works on the flip side as well. So when you have a new item that's come in, rather than waiting for the shelf that has the right size thing to put the new dog toy in, you just put the dog toy anywhere you can find space for it.

Speaker 1: Huh. It's like my house. We have a lot of dog toys in my house, also. Yeah, that's really interesting. It's great for the customer, and that's optimal.

Speaker 2: It's optimal, and it creates an incredibly difficult environment for robotics.

Speaker 1: Huh.

Speaker 2: Because now you have to deal with all this clutter. We can have more than a million unique items in one warehouse. Yeah, so it's not like you have a model of each of those items. And we sell more third-party items than, you know, Amazon owns themselves, right?
We are a platform for third-party fulfillment, and so you don't have all the data about all those items, and so you have to handle all this uncertainty, all this clutter, and everything's tightly packed.

Speaker 1: And so still, in most places, as a result, when stuff comes into the warehouse every day off a truck, do people take the things out of the truck and stick them randomly on shelves wherever they can find space? Is that the system?

Speaker 2: That is exactly the system, and it's in, you know, hundreds of buildings around the world.

Speaker 1: And just to be clear, I mean, it's pretty clear, but just to really put a point on it: why is this a hard environment for robots?

Speaker 2: Traditional industrial robots do not handle contact well, so, like, touching their environments, and they don't handle clutter or, you know, uncertainty. And so it's hard, because to put that last book onto the bookshelf, squeeze that teddy bear into the just small enough space that it'll fit, you have to push the other stuff around that's already on that bookshelf. And a traditional robot doesn't have sensors; it doesn't even know how to do that. So if you think of, like, a car manufacturing line, you're like nineteen nineties, two thousands, you know, welding robot, or loading sheet metal into a press: it's doing all of that only knowing its position in space. So it has no force sensing. If it runs into something, it either is, like, an emergency stop, because it's, like, broken, or it just smashes that thing and keeps going and it doesn't even know it's smashed anything. It literally has no sensing.

Speaker 1: That is an incredibly homogeneous environment, right? It's doing, like, the exact same thing at a very high level of precision, forever, one thing.

Speaker 2: That's exactly right.
And so this extension, the fundamental breakthrough for science, for robotic manipulation, that my team is trying to make, is, one, giving the robot a sense of touch, and using that along with sight, and along with, like, knowing where your robot is, to do meaningful tasks in, like, very high-contact, high-clutter environments. And then there's a brain part. It's also much more difficult to kind of predict how this random assortment of items is going to move or change as you push on it. And so there's an AI piece, there's a brain piece, that's saying this item will fit in that bin. This is actually one of the most frustrating things when you try and do the job yourself. I'm, like, an optimist. I'm always, oh yeah, this will fit. And I go up there and I try and play Tetris and I try and rearrange the shelf and, like, it clearly isn't going to fit. And then I've wasted thirty seconds or forty seconds and I have to try something else.

Speaker 1: That's a good statement of the problem. Well, like, when did you come onto the scene?

Speaker 2: So I was working on some other stuff, and there was a recent PhD who had joined our team. He was, you know, one year out of school, something like this, and he says, I'm going to go try and solve stowing items into these bookshelves. And my thought was, oh, how naive. Like, the real world is going to teach this new grad. That's just way too hard a problem for robotics to solve. But I was helping him because it's fun, right? Like, you like to work on hard problems when you're a researcher. And he was a very nice guy, and so I was, you know, helping him, but never thought it was going to work.
And there were a couple of kind of moments where we made these simplifications that turned the problem from "I have to try and do every possible game of Tetris that a person can do" into a problem where you're like, oh, it's not that this is never going to work, it's that this is the future. Like, this is robotics two point zero. Like, this is: I have to work on this. I can't do anything else anymore. I'm, like, all in on this problem.

Speaker 1: Tell me about one of those simplifications, one of those moments.

Speaker 2: The gripper is one. The design, the mechanical design of the robotic hand, was actually a big breakthrough. And when we started, we were trying to push items with the item we were gripping. So imagine you're pinching a book and you're trying to use that book to, like, push this dog toy over to the side.

Speaker 1: I see. So you want to put the book in a bin. Yeah, dog toy's in the way. So you're like, okay, pick up the book and use the book kind of like a brush to sweep the dog toy out of the way.

Speaker 2: Okay. And I say, okay, like, I understand, but it's never going to work. What if you don't have a book? What if you have a T-shirt? Yeah, what if you have an iPhone and it's very expensive? Are you going to actually want to start pushing on stuff with the phone? And so we came up with this strategy to have, like, a spatula that would extend into the bin, and you'd push everything with this spatula that was part of your hand. So imagine, like, you're like Wolverine and you can shoot out, you know, but instead of, like, the adamantium claws, you're shooting out a spatula.

Speaker 1: So it's like a pincher grip, and a little spatula shoots forward out of the pincher grip, is the thing.

Speaker 2: That's right.

Speaker 1: It's so simple when you put it that way. I mean, I'm sure making it was not low-tech, but it sounds very, like... it's not like some crazy AI thing.
It's like, just: what if there was another little thing that came out and pushed stuff out of the way?

Speaker 2: But those ideas are, like, the really powerful ones, when you have a simple, elegant solution and you're like, okay, that could work. That's different than, like, a five-fingered hand that has twenty-five motors embedded in it. Yeah, it's like, oh, it's just the spatula.

Speaker 1: Fingers are famously difficult. Why didn't anybody think of it before?

Speaker 2: So we had been working on it as a company back to the Amazon Picking Challenge, which was, you know, twenty fifteen. But I think a lot of robotics researchers, like myself, were scared that this problem was just too hard. There were easier things to go try and work on. And there were a couple of simplifications. So using this spatula was one. And then you watch people do the task and you realize they're kind of doing the same strategies over and over again. It's like, insert the spatula and sweep to one side. Or this kind of page-turn mechanism: something's fallen over and you need to sort of flip it back up to make space.

Speaker 1: So you put the spatula underneath it and flip the thing up ninety degrees, basically. Yeah.

Speaker 2: And you realize that accounts for, like, ninety percent of the actions you do when you try and stow into these bins.

Speaker 1: And did you figure that out by watching people stow?

Speaker 2: We did, and doing it yourself.

Speaker 1: How much stowing did you do? A couple of days? Okay, it's a hard job. Thousands of items, probably, I imagine.

Speaker 2: Yeah, exactly. And we tried to wear GoPro cameras on our heads so we could look at the videos later, which turns out is a recipe for motion sickness. It's very difficult to watch those videos. But you go and you do it, and you build up this intuition. And I think the other piece of the problem that made it tractable, and made me this, like, huge believer, was recognizing we didn't have to get to one hundred percent.
So in some automation scenarios, you have to solve the whole problem, and if you don't, you have nothing, so, like, landing on the moon. And what we realized was there was a way to, like, make the business logic work where the robot could handle seventy-five percent of the stows, and it just had to not make a mess and, like, work alongside people to do the other twenty-five percent. And the sum of the parts is actually much better than either all robots or all employees would be on their own. And making that realization all of a sudden meant that it could be a two- or three-year project instead of a twenty-year project, because chasing this long tail... You know, we have a million unique items in the building, but we also process a million items per day. So I have a phrase: if something goes wrong one in a million, it happens every day in every Amazon building. And to try and solve all of those is a twenty-year problem.

Speaker 1: I feel like that part of the solution generalizes in a really nice way, right? Like, I mean, I guess the eighty-twenty problem is a sort of cliche. But the idea that, like, oh, if you think of the problem the right way, it's like, no, we don't have to build a robot that does it every time. We build a robot that does it seventy-five percent of the time. That is a huge efficiency gain, and maybe the optimal point on the curve, right? Yep. If the robot is doing everything, you're working too hard to make the robot work.

Speaker 2: Probably exactly that.

Speaker 1: So, okay, so you have these two big ideas. Do you want to tell me the sort of story of making it work? Do you want to tell me how it works?

Speaker 2: We've been running six of these robots at a warehouse in Spokane, Washington, okay, since November of last year, and so we've done over half a million stows at this point.
We also have another product that's picking those items out of the bins, and so that's my team in Germany, and so we have a warehouse in Hamburg where we've been picking items. And picking is a slightly harder problem in some ways, because you have to identify the item. So for stow, you have to identify free space: it's either occupied, or you can make space to put the next item in. For pick, you want to make sure I get you the red T-shirt, not the red sweatpants, or I get you the Harry Potter volume two and not Sapiens or some other book.

Speaker 1: Tell me how it works. Let's do the stowing first, since that's what we've been talking about. So there's this warehouse in Spokane where this robot that you built is in use. Like, what happens there? A truck pulls in, and then what happens?

Speaker 2: The way the system works is one of these pods, one of these bookcases, pulls up to the station, so it pulls in front of the robot. We have stereo camera towers, and so we're looking with the eyes first, and we are creating a three-D representation of the scene. So we're modeling, you know, all the items that are in the pod already. But the really interesting part is we're actually predicting, on top of that, how we can move those items around to make more empty space, how we can squeeze more stuff in. So it's not just identifying vacant space. You have to predict where you can make that vacant space by pushing stuff with this spatula. Okay, then we do this matching algorithm. So we have about forty or fifty items waiting for us to stow, and so we have a variety of stuff, and we're matching those forty or fifty items to the thirty-ish shelves that are in front of the robot: which items should go where, and then how do we make that space? And so that's where a lot of the AI in the system is active and operating.
It's predicting success, it's minimizing risk, it's trying to optimize for a bunch of different parameters. Once we've made that selection, we grasp the item. So that item we've selected for putting into the given shelf passes into our hand, and our hand is two conveyor-belt paddles, so you can think of it kind of like a panini press, like a George Foreman grill. It is a George Foreman grill where each side has a conveyor built into it.

Speaker 1: Like a little belt? Like just a little belt going around?

Speaker 2: That's right. So each face of the grill, the top face and the bottom face, has a conveyor belt. And that's important because you can control the pose of the item and you can feed it into the bin, rather than, like, throwing it into the bin. One of the early versions we had kind of dropped it and tried to punch it to put the item into the bin, and that predictably failed in a lot of...

Speaker 1: Well, you say predictably now, but before you tried it, it wasn't predictable.

Speaker 2: Yeah, yeah. I'm a huge believer in iterative design, and so we try and build early, build often, and learn from those builds. So it's actually really important to keep six-DOF pose control of the item. So you want to make sure the item isn't rotating as you shoot it out. You want to make sure that you keep the orientation of the item, because it's fitting tightly, so you don't want it to run into the bookshelf above it or below it, or the items already stowed in there. Yeah, yeah, yeah. We started by trying to shoot it out, and then we had all kinds of problems when it would, like, collide with stuff and fall on the floor. The worst case is, you know, you would shoot it out, it would bounce off the back of the bookcase and then come back and hit you in the face, or hit you in your...

Speaker 1: Did that happen?

Speaker 2: Yeah. Oh yeah.

Speaker 1: Good. That's robot comedy, yes, robot physical comedy.
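A very rough sketch of what that matching step could look like: score each queued item against each visible shelf, then greedily take the most promising pairs. The item and bin features, the scoring heuristic, and the greedy assignment below are all assumptions for illustration; the interview only says the system predicts success, minimizes risk, and optimizes several parameters at once.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Item:
    name: str
    volume: float           # liters (hypothetical feature)
    rigid: bool

@dataclass
class Bin:
    name: str
    free_volume: float      # space that is empty right now
    makeable_volume: float  # extra space the spatula could sweep open

def score(item: Item, bin_: Bin) -> float:
    """Assumed heuristic: prefer roomy fits, discount space that still has to be made."""
    usable = bin_.free_volume + 0.5 * bin_.makeable_volume
    margin = usable - item.volume
    if margin < 0:
        return float("-inf")                   # predicted not to fit at all
    risk_penalty = 0.0 if item.rigid else 0.3  # floppy items are riskier to insert
    return margin - risk_penalty

def match(items: List[Item], bins: List[Bin]) -> List[Tuple[str, str]]:
    """Greedy assignment: best-scoring (item, bin) pairs first, one item per bin."""
    pairs = sorted(((score(i, b), i, b) for i in items for b in bins),
                   key=lambda t: t[0], reverse=True)
    used_items, used_bins, plan = set(), set(), []
    for s, item, bin_ in pairs:
        if s == float("-inf") or item.name in used_items or bin_.name in used_bins:
            continue
        plan.append((item.name, bin_.name))
        used_items.add(item.name)
        used_bins.add(bin_.name)
    return plan

queued = [Item("book", 1.2, True), Item("bagged t-shirt", 0.8, False)]
shelves = [Bin("B1", 1.0, 1.0), Bin("B2", 2.5, 0.0)]
print(match(queued, shelves))   # items the scorer can't place anywhere are simply skipped
```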
Speaker 2: Yeah.

Speaker 1: So, and that's it?

Speaker 2: That's the stow process. And we want to do that a few hundred times an hour, and we want to do it on the top shelves of those bookcases. Yeah. That's one of the ways we are really complementary to the employees: if the robots can do the top shelves, it saves a lot of ergonomic tasks. It allows the employees to work in their power zone, like, you know, shoulder level. That makes them faster, too. So if you put robots in, people get faster at the job.

Speaker 1: I mean, presumably as the robot gets better, it'll also be better at putting things on the middle shelf.

Speaker 2: Right. Well, there's this, like, sweet spot.

Speaker 1: The robot's going to get better faster than people will get better.

Speaker 2: Yeah. We want the robots to be as good as they can be and not chase one hundred percent. We don't really believe in one hundred percent automation. We want to find that sweet spot where we're maximizing productivity.

Speaker 1: I mean, the sweet spot's going to keep moving, right? The robot's going to get better and better and be able to do more and more, faster and faster.

Speaker 2: Presumably. And my science team's actually really excited about that, as you get more and more data. So we have five hundred thousand stows that we've done so far, but when we get to five hundred million stows, you can leverage some of these techniques to start learning the motions and learning some of these strategies and refining them to be specific to the item that you are holding in your hand. There's a lot of opportunity as you get more and more data.

Speaker 1: Well, right. So we haven't really... I mean, you mentioned the software side, the AI side, but we haven't really talked about it. And it is, I mean, in talking to other people working on robotics,
it's plainly a data game, because there's no Internet of the physical world, right? Large language models work so well because there's this huge data set, and everybody is trying to get data from the physical world, and you seem very well positioned to get a lot of data from the physical world.

Speaker 2: I think that's true. So one of the joys of being a roboticist at Amazon is all the data that we have access to. But I will push back a little bit that it's just a data problem. It's a highly debated topic. Some people in the world believe that you can apply the same sort of transformer architectures that work so well for search, and so well for natural language processing, and apply those to robotics, if we only had the data. I would not put myself in that camp. I am not a believer that all we need is more torque data from robotic grippers and we'll solve it. Natural language is already tokenized in a way that's very amenable to those methods, and language and search are also very tolerant of sloppiness. So you and I can have a conversation; I don't have to get every single word correct. But if you mess up a torque on a gripper, you can crush your iPhone, or you can sort of smash something else that's there, or drop something, or just fail the task. And that's because you have physics and this nonlinear, very sort of difficult-to-model real world that these robots have to interact with. And so I think those techniques certainly accelerate us in a lot of places, but they don't just solve the problem. I think we need all of the rest of robotics, like hardware design and classical control theory, to solve those problems.

Speaker 1: Compelling. Although you did start this part of the conversation, you brought it up by saying the science team is really excited for what the model's going to learn once you have hundreds of millions of stows.

Speaker 2: That's right, and both things are true.
Speaker 1: I know, yes, plainly. We're just talking about sort of the margins, right? What is true at what margins, I mean. I did wonder as I was reading about this, you know... I thought of AWS, of Amazon Web Services, which of course, like, was an internal Amazon thing that at some point Amazon was like, oh, maybe other people would find this service useful, and now it's a giant part of Amazon's business. And so I wondered, like, are you building Amazon robotics services yet?

Speaker 2: Not today. There's so much value that we can provide to our fulfillment business that we are one hundred percent focused on that. Certainly as a roboticist, though, I take great joy that the work we're doing is advancing the field of robots, and so it definitely, like, makes my job better that we're advancing the state of the art. But from a business perspective, it's all hands on making the fulfillment process better for Amazon dot com.

Speaker 1: We'll be back in just a minute.

Speaker 1: I think I read you say that you're building a foundation model of items. Is that right? And I sort of know what that means, but tell me what that means when you say that.

Speaker 2: So when a robot handles an item, it would do better if it takes into account the properties of that item. So if you're trying to hand a bowling ball to someone, you should do that in a different way than if you're handing them a bouncy ball or a light bulb. At its core, a foundation model for items is simply a model that encodes all of those attributes of an item and makes them available to the robotic systems that are going to use it. And one of the things that makes it a foundation model, instead of just, you know, some custom bespoke thing, is that you can transfer it across lots of different applications. So if it's, you know, stowing, you can use it. If you're packing it into a delivery box, you can use it.
If you're putting it onto a shelf in a physical store, like for grocery or Whole Foods or something, you can use it. And so that, like, commonality across applications is one of the things that's important.

Speaker 1: Is part of the notion there that, like, the model would allow a robot to sort of look at some novel item and make a reasonable inference about the properties of that item?

Speaker 2: Yeah, absolutely that. And the other thing that's a little non-intuitive is that by understanding how to handle that item in all those different applications, grocery, you know, stowing, picking, you get better at the individual application. So by training on all of this data across these different domains, you actually get better at the individual task that your specific robot is trying to do. It takes a while to, like, understand that; it is not intuitive.

Speaker 1: Say more. What do you mean? Like, I don't know that I fully get it.

Speaker 2: Understanding how an item behaves when you gift wrap it...

Speaker 1: Uh huh.

Speaker 2: ...shouldn't really inform how it's going to behave when you're picking it off of a bookshelf.

Speaker 1: Oh, I mean, yes, it should, right? Like, if you think of, like, a whatever, a stuffed animal versus a book. Yeah, maybe that's too easy of a case, but, like, if a thing is squishy or rigid, that seems like, as a human being, I feel like we sort of port that knowledge from one use case to another, right?

Speaker 2: Yeah, it's a good point. And maybe that's because we are inherently sort of... we think and manipulate items in the world more similarly to how these foundation models do. But ten years ago it was totally not the case.
You would train your model in a very narrow domain, and if you gave it data from some other domain, it would kind of corrupt the results that you had, and so you were very careful to curate all the data that you were using to be very specific to the task that you wanted it to do, and that made the performance better. But it also meant that the model you had was only good at that one very narrow thing.

Speaker 1: It was why we were always so far from the general-purpose robot. Yeah, because, as you're describing it, trying to make a robot do more than one thing just meant it couldn't even do one.

Speaker 2: We couldn't even do one thing, and so you're putting all your effort into making it do that one thing just a little better. I think there's another really interesting piece here, which is our team, the Vulcan team at Amazon, is trying to use touch and vision together, and that is how people interact with the world. That's how people manipulate the world. And so the example I like to give is picking a coin up off a table. Ten years ago, when a robot would try and do that, I mean, it's impossible. Like, a robot can't pick a coin up off a table; it's too hard a task. My five-year-old can pick a coin up off the table in half a second without you noticing. Well, the reason is your strategy. So when you pick a coin up off the table, you actually don't grasp the coin. You go and you touch the table, and then you slide your fingers along the surface of the table until you feel the coin, and when you feel the coin, that's your trigger to, like, rotate it up into a grasp. You're not going to some millimeter precision the way your grandfather's robot on the welding line would do. And you're not just watching with your eyes. You're using your eyes and your fingertips, both your...

Speaker 1: Your sense of touch. Yes, sense of touch is central to the pick.
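The coin strategy Aaron describes is what roboticists often call a guarded move: command motion until a force threshold is felt, then switch behaviors. Below is a minimal sketch of that loop against a toy stand-in for the sensor and arm; the thresholds, gains, and the simulated world are assumptions, and the only detail taken from the interview is that the wrist sensor is read about a thousand times per second.

```python
SENSE_HZ = 1000      # wrench readings per second ("a thousand times a second")
CONTACT_N = 4.4      # roughly one pound of downward contact force (assumed target)
BUMP_N = 1.0         # lateral force that means the fingertip has hit the coin (assumed)

class ToyTable:
    """Stand-in for the real sensor/arm stack: a fingertip sliding toward a coin at x = 0.05 m."""
    def __init__(self):
        self.x, self.z = 0.0, 0.005   # fingertip position in meters, starting just above the table
        self.coin_x = 0.05

    def step(self, vx, vz, dt):
        self.x += vx * dt
        self.z = max(0.0, self.z + vz * dt)

    def wrench(self):
        fz = -CONTACT_N if self.z == 0.0 else 0.0   # the table pushes back once we are in contact
        fx = BUMP_N + 0.5 if (self.z == 0.0 and self.x >= self.coin_x) else 0.0
        return fx, 0.0, fz

def slide_until_coin(world, max_steps=5 * SENSE_HZ):
    """Guarded move: press down gently, slide sideways, stop the moment the coin is felt."""
    dt = 1.0 / SENSE_HZ
    for _ in range(max_steps):
        fx, _, fz = world.wrench()
        if fx > BUMP_N:                             # felt the coin: hand off to the grasp behavior
            return True
        vz = -0.02 if abs(fz) < CONTACT_N else 0.0  # keep pressing until contact force is reached
        world.step(vx=0.02, vz=vz, dt=dt)           # steady 2 cm/s slide along the table
    return False

print(slide_until_coin(ToyTable()))   # True once the fingertip reaches the coin
```

The same pattern, planning for contact rather than avoiding it, is what Aaron describes next for setting the spatula against the side of the bookcase before extending it into a gap.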
Speaker 2: And we are trying to do those same kind of behaviors that are not only reacting to touch, but planning for touch. So the same way you plan to touch the table first, we plan to put our spatula against the side of the bookcase before we try to extend it in between this, you know, small gap between the T-shirt and the bag and the side of the bookcase. So we are building our plans and our controllers around having sight and touch.

Speaker 1: I mean, when you say touch in the context of the robot, does that mean that it is getting feedback from the stuff it is coming into contact with? And is that novel? And how does that work?

Speaker 2: So the sensor is a force-torque sensor. It looks like a hockey puck, and a thousand times a second it's telling you what it feels in the six degrees of freedom. So up and down is one, left and right is two, in and out is three, and then you've got roll, pitch, and yaw as the three torques. So a thousand times per second, you're sensing, you're feeling what the world is pushing on you with, and we use that to control the motion but also to plan the motion.

Speaker 1: When you say plan the motion, it's like: given the sense of touch that is happening right now, what should I do next?

Speaker 2: Yep. So in, like, a high-level view, it's like: touch the table first, slide along the table while keeping, you know, sort of one pound of force pushing into the table, until you touch the coin, and then, you know, rotate. That's at a high level. But then even at a low level, the thousand times per second is so that as you slide your fingers along the table, you're sort of maintaining that accurate force.

Speaker 1: Yeah. Or, like, if you're putting a thing on the shelf, you can sort of tell if you've pushed it too far, because the shelf is pushing back at you.
Speaker 2: Exactly. Or you can tell it's slipping and you're about to, like, push over the top of it, so you can go, oh, it's about to fall over, so I can react. And those dynamics are happening at tens or hundreds of hertz, so you need to sense them at a thousand hertz.

Speaker 1: What's the frontier right now for stowing? What are you trying to figure out?

Speaker 2: One of the things is getting the fullness of those bins all the way up to where it is today. So as a person, you can pack those bins really, really densely, and the robot's close, but not quite as good as a person is today at getting as much stuff into the bookcase as it can. That's one frontier, and that is because, one, we're conservative, like our brain is telling us there's no space when really there is space, and two, it's because those motions are not sophisticated enough yet. So we're trying to improve our video streaming, we're trying to get the eyes better to help, as well as those low-level touch sensors, to get those behaviors to be better. So that's one of the major frontiers. The other one is the negative: the robot makes too many mistakes. So defects and exception handling are so important in robotic systems, and this is another thing I think the world on the Internet doesn't appreciate enough. Like, you can do a demo on a happy path. Hey, it worked once. I can submit a paper to a conference, or I can put a cool video on YouTube. That's great. You have a demo. To have a product, you have to make sure it's working, you know, ninety-nine percent of the time, or ninety-nine and a half percent, or, you know, in some cases four nines or five nines. And a lot of the work you have to do is to recover and handle those rare exceptions, or prevent or recover from those defects.
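To put those reliability tiers in concrete terms at the scale mentioned earlier in the conversation, about a million items processed per building per day, treating each tier as a per-item success rate (a simplification for illustration):

```python
items_per_day = 1_000_000   # "we also process a million items per day", per building

for label, rate in [("99%", 0.99), ("99.5%", 0.995),
                    ("four nines", 0.9999), ("five nines", 0.99999)]:
    exceptions = items_per_day * (1 - rate)
    print(f"{label:>10}: ~{exceptions:,.0f} exceptions per building per day")
```

That runs from roughly ten thousand exceptions a day at ninety-nine percent down to about ten a day at five nines, which is why recovering from rare defects dominates the work of turning a demo into a product.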
One of our frontiers is 603 00:32:03,796 --> 00:32:06,436 Speaker 2: not dropping crap on the floor, like, we need to 604 00:32:06,436 --> 00:32:07,796 Speaker 2: get about three times better at that. 605 00:32:08,196 --> 00:32:13,116 Speaker 1: Presumably, the robot is already skipping some universe of items 606 00:32:13,116 --> 00:32:14,316 Speaker 1: that the robot can't handle. 607 00:32:14,876 --> 00:32:17,236 Speaker 2: Yeah, and so we need to get smarter about which 608 00:32:17,276 --> 00:32:19,716 Speaker 2: items we skip and which items we take. We also 609 00:32:19,756 --> 00:32:23,316 Speaker 2: need to get better at inserting those items in such 610 00:32:23,316 --> 00:32:24,756 Speaker 2: a way that they're not going to fall back out. 611 00:32:25,956 --> 00:32:27,956 Speaker 1: What items are particularly hard for the robot? 612 00:32:28,596 --> 00:32:31,596 Speaker 2: So tight fitting items are the hardest. 613 00:32:31,236 --> 00:32:34,116 Speaker 1: Uh huh. And so that's not the nature of the item, 614 00:32:34,156 --> 00:32:37,516 Speaker 1: but the nature of the particular relationship between the item and. 615 00:32:37,556 --> 00:32:38,556 Speaker 2: The shelf exactly. 616 00:32:38,836 --> 00:32:41,956 Speaker 1: Yeah, Like, is there a kind of thing that the 617 00:32:42,076 --> 00:32:46,236 Speaker 1: robot just can't do because of its shape or something. 618 00:32:46,436 --> 00:32:51,476 Speaker 2: There is a particular rubber fish that we really hate. 619 00:32:52,516 --> 00:32:53,396 Speaker 2: It's a dog toy. 620 00:32:53,476 --> 00:32:54,756 Speaker 1: It's floppy. Is that what? 621 00:32:55,076 --> 00:32:55,516 Speaker 2: Sticky? 622 00:32:56,316 --> 00:33:00,156 Speaker 1: Oh? Sticky? Interesting? Yeah, And they don't put it in 623 00:33:00,196 --> 00:33:03,996 Speaker 1: a bag, Nope, they just send you the sticky fish. 624 00:33:04,116 --> 00:33:07,076 Speaker 2: Yeah, and it sort of gets hung up whenever 625 00:33:07,116 --> 00:33:09,916 Speaker 2: it makes contact. It doesn't slide, it like it wants 626 00:33:09,996 --> 00:33:13,556 Speaker 2: to rotate about whatever it's made contact with. And so 627 00:33:13,596 --> 00:33:15,596 Speaker 2: there's this particular dog toy and so we use it. 628 00:33:15,636 --> 00:33:18,676 Speaker 2: We've bought like fifty of them and now we have 629 00:33:18,756 --> 00:33:21,076 Speaker 2: them in the lab and this is like our diabolical 630 00:33:21,156 --> 00:33:21,756 Speaker 2: item set. 631 00:33:22,196 --> 00:33:23,996 Speaker 1: Is that a term of art, that diabolical? 632 00:33:24,556 --> 00:33:27,436 Speaker 2: I don't know, Yeah, it's our term of art. Yeah. 633 00:33:27,916 --> 00:33:30,916 Speaker 2: Also bagged items where the bag is really loose. So 634 00:33:31,996 --> 00:33:34,756 Speaker 2: imagine having like a T shirt in a bag, but 635 00:33:34,796 --> 00:33:36,996 Speaker 2: the bag is like twice as big as the T shirt. 636 00:33:37,956 --> 00:33:40,036 Speaker 1: Floppy? Is that the floppy problem? 637 00:33:40,196 --> 00:33:43,996 Speaker 2: Floppy but also transparent, so sometimes you can see through 638 00:33:44,036 --> 00:33:44,476 Speaker 2: the bag. 639 00:33:44,596 --> 00:33:48,076 Speaker 1: Oh, so the robot gets confused about is the bag 640 00:33:48,156 --> 00:33:49,036 Speaker 1: the item. 641 00:33:49,076 --> 00:33:51,916 Speaker 2: Yeah or not? Sometimes you want one and sometimes you 642 00:33:51,916 --> 00:33:55,356 Speaker 2: want the other.
So like if it's just a floppy plastic bag, 643 00:33:55,756 --> 00:33:57,876 Speaker 2: it probably will fit. Like if I just push it 644 00:33:57,916 --> 00:33:59,876 Speaker 2: into the bin, the bag is going to conform and 645 00:34:00,396 --> 00:34:02,996 Speaker 2: slide in, but you can't be sure about that. You know, 646 00:34:03,036 --> 00:34:04,716 Speaker 2: you get into a bunch of those edge cases that 647 00:34:04,756 --> 00:34:06,836 Speaker 2: are in that long tail of being robust. 648 00:34:07,756 --> 00:34:10,596 Speaker 1: I mean, it's interesting, right, because the robot is dealing 649 00:34:10,636 --> 00:34:13,436 Speaker 1: with this sort of human optimized world. Like it reminds 650 00:34:13,476 --> 00:34:16,836 Speaker 1: me of the way, I think it's Ikea, that designs 651 00:34:16,876 --> 00:34:19,996 Speaker 1: its furniture to fit optimally on a pallet, so you 652 00:34:20,036 --> 00:34:21,556 Speaker 1: can fit the most of them, like not just the 653 00:34:21,596 --> 00:34:25,436 Speaker 1: flat pack, but like in more subtle ways. And can 654 00:34:25,516 --> 00:34:28,716 Speaker 1: you imagine that there is some shift in the world 655 00:34:28,796 --> 00:34:31,436 Speaker 1: where I mean, obviously you're trying to make the robot better, 656 00:34:31,516 --> 00:34:33,596 Speaker 1: but also people are trying to make things work better 657 00:34:33,596 --> 00:34:34,276 Speaker 1: for the robot. 658 00:34:34,876 --> 00:34:38,956 Speaker 2: Yes. Absolutely, And there is a different team within Amazon 659 00:34:38,996 --> 00:34:43,476 Speaker 2: that's imagining a future world and future bookcases that are 660 00:34:43,916 --> 00:34:45,156 Speaker 2: friendly for robots. 661 00:34:45,636 --> 00:34:46,356 Speaker 1: Uh huh. 662 00:34:46,396 --> 00:34:53,316 Speaker 2: However, there are currently five million of those bookshelves in 663 00:34:53,436 --> 00:34:57,636 Speaker 2: warehouses holding inventory that's for sale on Amazon dot Com. 664 00:34:58,556 --> 00:35:02,716 Speaker 2: And so it's a really really big lift to go 665 00:35:02,836 --> 00:35:04,436 Speaker 2: replace all of those bookshelves. 666 00:35:04,436 --> 00:35:07,636 Speaker 1: Interesting. So it's a whole other team that's just like, 667 00:35:07,796 --> 00:35:13,676 Speaker 1: let's imagine the you know, a much more robot centric warehouse. Yeah, 668 00:35:13,716 --> 00:35:15,516 Speaker 1: those guys, like you don't even talk to them. They're 669 00:35:15,516 --> 00:35:17,196 Speaker 1: just off on their own. 670 00:35:16,876 --> 00:35:19,716 Speaker 2: I mean, they're friends, but yeah, we are facing very 671 00:35:19,716 --> 00:35:23,956 Speaker 2: different problems. And so we took a tenet very early on. 672 00:35:24,036 --> 00:35:28,916 Speaker 2: It's like, the world exists, the robot needs to perform 673 00:35:29,116 --> 00:35:32,316 Speaker 2: in the world as it exists. And this team, they 674 00:35:32,316 --> 00:35:34,916 Speaker 2: get their greenfield, so they get to think of 675 00:35:34,996 --> 00:35:38,116 Speaker 2: like a new field. We are a brownfield, meaning we 676 00:35:38,156 --> 00:35:41,036 Speaker 2: have to retrofit into these existing buildings. You know, we 677 00:35:41,076 --> 00:35:43,036 Speaker 2: have like ten year leases on some of these buildings. 678 00:35:43,036 --> 00:35:44,676 Speaker 2: They're going to be there for a long long time. 679 00:35:45,636 --> 00:35:47,716 Speaker 1: And then somebody else is out there.
So they're building 680 00:35:47,756 --> 00:35:50,196 Speaker 1: a whole other kind of robot. Your robot is optimized 681 00:35:50,236 --> 00:35:52,036 Speaker 1: for the world today, and somebody else is building a 682 00:35:52,116 --> 00:35:53,956 Speaker 1: robot for the robot world. 683 00:35:53,996 --> 00:35:56,916 Speaker 2: That's right. I love that they have a building that 684 00:35:56,956 --> 00:36:00,516 Speaker 2: they've built in Louisiana. It's in Shreveport, Louisiana. It has 685 00:36:00,716 --> 00:36:04,996 Speaker 2: ten times the number of robots that a traditional building has. 686 00:36:05,716 --> 00:36:09,836 Speaker 2: It's a completely reimagined way of fulfilling your order. It 687 00:36:09,836 --> 00:36:12,436 Speaker 2: also has a lot of people still working in those buildings, 688 00:36:12,476 --> 00:36:17,556 Speaker 2: but they're working in maintenance and robotics quarterback jobs, and 689 00:36:17,596 --> 00:36:19,676 Speaker 2: so they're higher skilled. And so we have a bunch 690 00:36:19,676 --> 00:36:23,356 Speaker 2: of programs that are trying to transition our very talented 691 00:36:23,356 --> 00:36:26,276 Speaker 2: workforce into the jobs of the future. One of the 692 00:36:26,276 --> 00:36:28,156 Speaker 2: things I really like to say is, you don't need 693 00:36:28,196 --> 00:36:31,676 Speaker 2: a college degree to work in robotics at Amazon. About 694 00:36:31,716 --> 00:36:33,996 Speaker 2: twenty to twenty five percent of my team doesn't have 695 00:36:33,996 --> 00:36:37,436 Speaker 2: a college degree but are enormously valuable. Like some of 696 00:36:37,436 --> 00:36:40,436 Speaker 2: our top ten people on our team are those people. 697 00:36:41,316 --> 00:36:45,116 Speaker 1: That facility in Shreveport. Is it live? Like, is real 698 00:36:45,156 --> 00:36:47,036 Speaker 1: stuff going in and real orders going out? 699 00:36:47,156 --> 00:36:49,956 Speaker 2: Yeah, it's live. We could follow up with exactly the date, 700 00:36:49,956 --> 00:36:52,236 Speaker 2: but it's been up for about a year, I think so. 701 00:36:52,396 --> 00:36:55,076 Speaker 1: Interesting. A thing like that, well, I would be interested in 702 00:36:55,116 --> 00:36:57,676 Speaker 1: talking to your counterpart there as well. That show would 703 00:36:57,676 --> 00:37:01,836 Speaker 1: pair interestingly with this show. So okay, let's talk about 704 00:37:01,836 --> 00:37:04,716 Speaker 1: the rest of the process. You know, the rest of 705 00:37:04,756 --> 00:37:06,876 Speaker 1: what's going on in the warehouse and where else you're 706 00:37:06,916 --> 00:37:10,356 Speaker 1: working on robots. So the piece we've been talking about 707 00:37:10,356 --> 00:37:13,476 Speaker 1: this whole time is getting stuff as it comes in 708 00:37:13,516 --> 00:37:17,716 Speaker 1: from the truck onto the shelf, which naively I wouldn't 709 00:37:17,716 --> 00:37:19,476 Speaker 1: even think of that part, but it turns out to 710 00:37:19,476 --> 00:37:22,076 Speaker 1: be this great big problem. What are the other pieces? 711 00:37:23,036 --> 00:37:26,836 Speaker 2: What's interesting is the science we're building, giving robots a 712 00:37:26,876 --> 00:37:30,756 Speaker 2: sense of touch, has applicability in lots and lots of 713 00:37:30,796 --> 00:37:34,756 Speaker 2: places across that whole chain.
Anytime the robots need to 714 00:37:34,796 --> 00:37:40,396 Speaker 2: be physically interacting, like contacting, touching items, is a good 715 00:37:40,436 --> 00:37:44,436 Speaker 2: place for our core technology. So if we're packing four 716 00:37:44,476 --> 00:37:46,516 Speaker 2: items into a box because we want to send you 717 00:37:46,556 --> 00:37:48,676 Speaker 2: the four things you bought in one shipment, not in 718 00:37:48,756 --> 00:37:52,196 Speaker 2: four separate packages, you need to touch the box. You 719 00:37:52,236 --> 00:37:53,956 Speaker 2: need to touch the other items that are already in 720 00:37:53,996 --> 00:37:56,116 Speaker 2: the box. You need to play that game of Tetris. 721 00:37:56,636 --> 00:37:59,356 Speaker 1: Yes, I mean it's a stowing problem again, right, I 722 00:37:59,436 --> 00:38:01,796 Speaker 1: know it's called packing, but it's a version of that same. 723 00:38:01,636 --> 00:38:04,276 Speaker 2: Problem, that's right, And those problems recur over and over again. 724 00:38:04,396 --> 00:38:07,036 Speaker 2: So getting all of the packages, all of the cardboard 725 00:38:07,076 --> 00:38:10,916 Speaker 2: boxes and paper mailers into a cart that can go 726 00:38:11,356 --> 00:38:13,596 Speaker 2: onto the back of the truck, that is a stowing 727 00:38:13,676 --> 00:38:14,716 Speaker 2: problem in the cart. 728 00:38:15,116 --> 00:38:18,036 Speaker 1: Putting things in a thing, yeah, is a great, big 729 00:38:18,076 --> 00:38:19,716 Speaker 1: problem in many ways. 730 00:38:19,836 --> 00:38:22,276 Speaker 2: But you can also expand to think about grocery. So 731 00:38:22,356 --> 00:38:28,036 Speaker 2: if you order produce, you don't want your grandfather's welding 732 00:38:28,156 --> 00:38:31,236 Speaker 2: robot handling your peaches. It's gonna smash them, like, you 733 00:38:31,276 --> 00:38:33,676 Speaker 2: need a robot with a sense of touch. If you 734 00:38:33,756 --> 00:38:37,516 Speaker 2: think about household tasks, if you want a robot, you know, 735 00:38:37,596 --> 00:38:41,796 Speaker 2: picking up your kid's toys or dealing with laundry, like, 736 00:38:41,876 --> 00:38:43,876 Speaker 2: those robots need to have a sense of touch. They're 737 00:38:43,916 --> 00:38:46,956 Speaker 2: physically interacting in a dexterous way with the world. And 738 00:38:47,036 --> 00:38:49,036 Speaker 2: so one of the things that we're so excited about is 739 00:38:49,436 --> 00:38:52,836 Speaker 2: not only these big applications for stowing and picking off 740 00:38:52,836 --> 00:38:57,076 Speaker 2: of you know, these bookcases, but everything that gets unlocked 741 00:38:57,196 --> 00:38:59,356 Speaker 2: once the robot has that sense of touch. 742 00:39:00,396 --> 00:39:04,996 Speaker 1: When you talk that way, it feels like it goes beyond 743 00:39:05,116 --> 00:39:08,516 Speaker 1: what is typically considered an Amazon kind of thing. It seems 744 00:39:08,556 --> 00:39:11,076 Speaker 1: like a thing either Amazon's going to get into lots 745 00:39:11,076 --> 00:39:16,316 Speaker 1: of other sort of non retail businesses or license the 746 00:39:16,356 --> 00:39:19,476 Speaker 1: technology or sell you know, robotic touch as a service 747 00:39:19,716 --> 00:39:20,236 Speaker 1: or whatever.
748 00:39:20,356 --> 00:39:25,276 Speaker 2: Yeah, I think there are probably five or ten applications 749 00:39:25,276 --> 00:39:29,036 Speaker 2: in how we process orders today that are all within 750 00:39:29,156 --> 00:39:32,756 Speaker 2: the warehouses and delivery stations, and those are my first 751 00:39:33,716 --> 00:39:37,236 Speaker 2: hill to climb. Then we do have a consumer robotics team. 752 00:39:37,316 --> 00:39:40,956 Speaker 2: So there was a cool robot we released called Astro. 753 00:39:41,516 --> 00:39:44,236 Speaker 2: It didn't have any manipulation capabilities, right? It would drive 754 00:39:44,236 --> 00:39:46,236 Speaker 2: around your house. It had a camera on a mast 755 00:39:46,316 --> 00:39:48,916 Speaker 2: that would extend up and down. You could talk to 756 00:39:48,956 --> 00:39:50,716 Speaker 2: it the way you can talk to an Alexa device. 757 00:39:51,196 --> 00:39:53,436 Speaker 2: The future versions of those robots are going to want 758 00:39:53,476 --> 00:39:55,956 Speaker 2: to do more useful things, and so they're going to 759 00:39:56,036 --> 00:39:58,316 Speaker 2: need this kind of underlying technology, and so that's a 760 00:39:58,316 --> 00:40:01,996 Speaker 2: business opportunity in the long term. You know, that's not 761 00:40:02,116 --> 00:40:05,276 Speaker 2: a thing my team is focused on now, but I 762 00:40:05,276 --> 00:40:08,316 Speaker 2: get excited about it when I think about what we unlock. 763 00:40:12,796 --> 00:40:14,916 Speaker 1: We'll be back in a minute with the lightning round. 764 00:40:25,436 --> 00:40:27,676 Speaker 1: Let's do a lightning round. If you listen to the show, 765 00:40:28,196 --> 00:40:30,476 Speaker 1: you have a sense of what this is. Tell me 766 00:40:30,476 --> 00:40:32,996 Speaker 1: about the last time you were in zero gravity. 767 00:40:33,796 --> 00:40:38,036 Speaker 2: I flew an experiment to try and drill into rocks, 768 00:40:38,436 --> 00:40:42,036 Speaker 2: which was going to be applied to asteroids. And of course, 769 00:40:42,036 --> 00:40:45,236 Speaker 2: if you're drilling into an asteroid, any amount you're pushing 770 00:40:45,276 --> 00:40:47,836 Speaker 2: into the rock is pushing you back off into space 771 00:40:47,876 --> 00:40:51,516 Speaker 2: because asteroids have almost zero gravity. 772 00:40:51,436 --> 00:40:54,556 Speaker 1: Right, so you gotta have somebody push it on the 773 00:40:54,596 --> 00:40:56,716 Speaker 1: other side. How do you solve that? What do you do? 774 00:40:56,996 --> 00:40:58,676 Speaker 1: You grab it? How do you even do that? 775 00:40:58,756 --> 00:41:02,396 Speaker 2: Hence my passion for robot hands. We built a robot 776 00:41:02,436 --> 00:41:05,116 Speaker 2: hand that would grab the rock with a bunch of claws. 777 00:41:05,156 --> 00:41:08,076 Speaker 2: I think it had a thousand claws, and the claws 778 00:41:08,076 --> 00:41:10,436 Speaker 2: were actually fish hooks. So imagine a bunch of fish 779 00:41:10,436 --> 00:41:13,676 Speaker 2: hooks grabbing onto a rock to react the force of 780 00:41:13,716 --> 00:41:15,276 Speaker 2: pushing a drill bit down the center. 781 00:41:15,916 --> 00:41:16,596 Speaker 1: Did it work? 782 00:41:17,316 --> 00:41:19,756 Speaker 2: It did work, but it only worked on rocks that 783 00:41:19,796 --> 00:41:22,476 Speaker 2: were pretty rough, that had a lot of spots for 784 00:41:22,636 --> 00:41:25,396 Speaker 2: the fish hooks to grab.
But it turns out asteroids 785 00:41:25,396 --> 00:41:27,916 Speaker 2: are really rough. Most of the smooth rocks you find 786 00:41:27,916 --> 00:41:31,116 Speaker 2: on Earth have been processed by liquid water or ice, 787 00:41:31,476 --> 00:41:34,876 Speaker 2: and that's not happening on asteroids. No liquid water. 788 00:41:35,516 --> 00:41:37,556 Speaker 1: And so this was on the plane, on 789 00:41:37,636 --> 00:41:41,396 Speaker 1: that NASA plane that flies, what is it? Yeah, the vomit 790 00:41:41,476 --> 00:41:44,636 Speaker 1: comet, flying a parabolic curve basically, yeah, what was it like? 791 00:41:45,316 --> 00:41:48,756 Speaker 2: The vomit comet's actually very zen. So when you're in 792 00:41:48,956 --> 00:41:52,676 Speaker 2: zero gravity, when you're floating, it's like very peaceful. It's 793 00:41:52,716 --> 00:41:54,996 Speaker 2: when you're in double gravity, where you're at the bottom of 794 00:41:55,036 --> 00:41:58,116 Speaker 2: the parabola and you're like being glued and pushed against 795 00:41:58,156 --> 00:42:00,196 Speaker 2: the floor. If you like turn your head very quickly, 796 00:42:00,196 --> 00:42:03,196 Speaker 2: that's where you get like into serious trouble. And so 797 00:42:03,276 --> 00:42:05,876 Speaker 2: the trick is just to like go into your zone 798 00:42:06,156 --> 00:42:08,196 Speaker 2: for the bottom of the parabola and then you become 799 00:42:08,236 --> 00:42:11,876 Speaker 2: like very free and zen-like in the zero G portion. 800 00:42:13,236 --> 00:42:14,436 Speaker 1: You think you'll ever go to space? 801 00:42:15,556 --> 00:42:18,476 Speaker 2: No, I think now that I have three kids, I 802 00:42:18,476 --> 00:42:19,556 Speaker 2: think I'm landlocked. 803 00:42:20,156 --> 00:42:24,076 Speaker 1: You seem a little bit sad about that. Does everybody 804 00:42:24,116 --> 00:42:26,076 Speaker 1: who works at JPL kind of want to go to space? 805 00:42:26,916 --> 00:42:30,676 Speaker 2: Yes, everybody that works at JPL, I think, does think 806 00:42:30,676 --> 00:42:32,876 Speaker 2: about going to space. I think what makes me sad 807 00:42:32,996 --> 00:42:36,916 Speaker 2: is we could be doing so much more at building 808 00:42:36,956 --> 00:42:41,476 Speaker 2: civilization out into space, at the scientific exploration of all 809 00:42:41,556 --> 00:42:45,156 Speaker 2: of the interesting places in space, and I think we're 810 00:42:45,236 --> 00:42:48,156 Speaker 2: kind of tripping ourselves up in a couple of places 811 00:42:48,356 --> 00:42:51,036 Speaker 2: as a species. I wish we would get unblocked and 812 00:42:51,036 --> 00:42:52,996 Speaker 2: get some of that eagerness you see in some of the 813 00:42:52,996 --> 00:42:56,036 Speaker 2: private investment, Like we're doing well in rockets, but we're 814 00:42:56,076 --> 00:42:59,636 Speaker 2: not yet doing well in the spacecraft and the scientific 815 00:42:59,676 --> 00:43:03,116 Speaker 2: instruments and the pieces that have to fly on top 816 00:43:03,156 --> 00:43:04,036 Speaker 2: of the rockets. 817 00:43:04,556 --> 00:43:07,396 Speaker 1: When you say we're tripping ourselves up in a couple places, 818 00:43:07,436 --> 00:43:09,076 Speaker 1: in what places? Like, what do you mean? 819 00:43:10,516 --> 00:43:13,596 Speaker 2: I think we became very conservative, like our risk posture 820 00:43:13,636 --> 00:43:17,316 Speaker 2: about going to space.
We stopped treating it as this 820 00:43:17,476 --> 00:43:20,916 Speaker 2: like very dangerous activity and tried to make it extremely safe, 821 00:43:20,916 --> 00:43:22,636 Speaker 2: and that slowed us down. 822 00:43:22,636 --> 00:43:24,516 Speaker 1: To bring it back to cowboys a little bit. 823 00:43:24,636 --> 00:43:28,076 Speaker 2: Yeah. Interesting, Yeah, and then there's a lot of bureaucracy, 824 00:43:28,196 --> 00:43:32,556 Speaker 2: of course, that built up over fifty years. I'm still 825 00:43:32,556 --> 00:43:35,116 Speaker 2: very optimistic. There's a lot of smart people working 826 00:43:35,116 --> 00:43:38,116 Speaker 2: in that area and a lot of exciting things happening, 827 00:43:39,116 --> 00:43:40,076 Speaker 2: so we're going to get through it. 828 00:43:46,996 --> 00:43:51,276 Speaker 1: Aaron Parness is a director of Applied Science at Amazon Robotics. 829 00:43:52,476 --> 00:43:55,796 Speaker 1: Please email us at problem at Pushkin dot fm. We 830 00:43:55,876 --> 00:43:59,596 Speaker 1: are always looking for new guests for the show. Today's 831 00:43:59,596 --> 00:44:03,436 Speaker 1: show was produced by Trinomnino and Gabriel Hunter Chang. It 832 00:44:03,556 --> 00:44:07,556 Speaker 1: was edited by Alexander Garretton and engineered by Sarah Bruguerrett. 833 00:44:07,796 --> 00:44:09,916 Speaker 1: I'm Jacob Goldstein, and we'll be back next week with 834 00:44:09,996 --> 00:44:23,356 Speaker 1: another episode of What's Your Problem.