Speaker 1: Pushkin. In a metaphorical sense, AI is everywhere. It can write essays, it can do your taxes, it can design drugs, it can make movies. But in a literal sense, AI is not everywhere. You know, a large language model can tell you, whatever, twenty-seven ways to fold your shirts and put them in the drawer, but there's no robot that you can buy that can actually fold your shirts and put them in the drawer. At some point, though, maybe at some point in the not that distant future, there will be a robot that can use AI to learn how to fold your shirts and put them in the drawer, or, you know, cook lasagna, pack boxes, plug in cables. In other words, there will be a robot that can use AI to learn how to do basically anything.

Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Chelsea Finn. She's a professor at Stanford and the co-founder of a company called Physical Intelligence, aka Pi. Chelsea's problem is this: can you build an AI model that will bring AI to robots? Or, as she puts it:

Speaker 2: We're trying to develop a model that can control any robot to do any task anywhere.

Speaker 1: Physical Intelligence was founded just last year, but the company has already raised over four hundred million dollars. Investors include Jeff Bezos and OpenAI. The company has raised so much money in part because what they're trying to do is so hard. Motor skills, the ability to move and find ways to fold the shirt, to plug in a cable: they feel simple to us, easy, basic. But Chelsea told me basic motor skills are in fact wildly complex.

Speaker 2: All of the motor control that we do with our body, with our hands, with our legs, our feet, a lot of it we don't think about when we do it. It actually is incredibly complicated what we do. This is actually like a really, really hard problem to develop in AI systems, in robots, despite it seeming so simple.
And the reasons for that are, first, that it actually is inherently very complex, and second, that we don't have tons and tons of data of doing this, in part because it's so basic to humans as well.

Speaker 1: Right, let's talk about the data side, because that seems like really the story, right, the big challenge. And it's particularly interesting in the context of large language models and computer vision, which really seem to have emerged in a weird way as a consequence of the Internet. Right, just because we happen to have this crazy amount of data of words and pictures on the Internet, we were able to train language models and computer vision models. But we don't have that for robots, right. There is no data set of training data for robots, which is like the big challenge for you and for robotics in general, it seems.

Speaker 2: Yeah, so we don't have an open internet of how to control motors to do like even really basic things. Maybe the closest thing we have is videos of people doing things, and perhaps that could be useful. But at the same time, if I watch videos of, like, Roger Federer playing tennis, you can't just become an amazing tennis player as a result of that. And likewise, just with videos of people doing things, it's very hard to actually extract the motor control behind that. And so that lack of data, that scarcity of data, makes it in some ways a very different problem than in language and computer vision. And I think that we should still learn a lot of things from language and computer vision and collect large data sets like that. It opens up new challenges and new possibilities on that front, and I think that in the long run we should be able to get large amounts of data, just like how in autonomous driving we have lots of data of cars driving around very effectively.
Robots, too, could be in the world collecting data, learning about how to pick up mustard and put it on a hot dog bun, or learning how to open a cabinet to put some objects away. We can get that sort of data, but it's not given to us for free.

Speaker 1: You still have this core problem, which is there is no giant trove of physical-reality data that you can train your model on. Right? That's the great big challenge, it seems. What do you do about that? How do you start to approach that?

Speaker 2: Yeah, so we're starting off by collecting data through teleoperation, where people are controlling the robot to do tasks, and then you don't just get video data. You get the videos alongside what are the actions or the motor commands needed to actually accomplish those tasks. We've collected data in our own office. We've also collected data in homes across San Francisco, and we also have a very modest warehouse. In some ways our current operation is actually rather small, given that we're a little over a year old at this point.

Speaker 1: Like, what's actually happening? Like, if I went into your warehouse and somebody was doing teleoperation, what would I see? What would it look like?

Speaker 2: Yeah, so it's a little bit like controlling a puppet. So the person who's operating the robot, they are holding in some ways a set of robot arms, but they're very lightweight robot arms, and we use those to measure the positions of joints.

Speaker 1: It's almost like an elaborate controller for a video game or something. It's like that, it's not actually a robot arm, right? It's a thing you control to sort of play the robot, to make the robot move.

Speaker 2: Yeah, exactly, exactly. And then we record that and directly translate those controls over to the robot. We have some robots that are just robot arms, where you're only controlling the robot arm. It's mounted to a table or something like that.
But we also have what we call mobile manipulators that have wheels and robot arms, and you can control both how the robot drives around as well as how the arms move. And we're doing tasks like wiping down counters, folding laundry, putting dishes into dishwashers, plugging cables into data center racks, assembling cardboard boxes, lots and lots of different tasks that might be useful for robots to do, and recording all the data. So we have cameras on the robots, there are sensors on the joints and on the motors of the robots as well, and we record that in a synchronized way across time.

Speaker 1: So when you do it, it's kind of like a real-world video game, like you're moving your arms in these things, and in basically real time the robot arm is moving and picking up the thing you wanted to pick up. And like, what's it like? Is there like a curve where, like, at the beginning it's really bad? Sort of tell me, talk me through an instance.

Speaker 2: It depends on the person. So some people can pick it up really, really quickly. Some people are a bit slower to pick it up. I pride myself on being a pretty good operator, and so I have done tasks as complex as peeling a hard-boiled egg with the robot.

Speaker 1: How are you, how are you at peeling a hard-boiled egg with your hands?

Speaker 2: It's pretty hard with my own hands too, yeah, and with the robot it's even harder.

Speaker 1: Tell me about the robot peeling a hard-boiled egg, because that sounds like a hard one.

Speaker 2: Yeah. So basically all the robots that we're using have kind of pincher grippers. They're called parallel jaw grippers, where there's just one degree of freedom, like open-close, two pincers.

Speaker 1: It's basically two pincers, like two...

Speaker 2: Two pincers, two arms. Yeah, exactly, and I've used that exact setup.
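To make the teleoperation setup Chelsea describes a bit more concrete, here is a minimal illustrative sketch of leader-follower data logging. It is a toy under stated assumptions, not Physical Intelligence's actual software: the LeaderArm, FollowerArm, and Camera classes below are hypothetical stand-ins for real hardware interfaces. The point is just that at every timestep the operator's lightweight leader arm is read, the real robot is commanded to match it, and a synchronized record of camera frame, joint state, and motor command is stored.

```python
# Illustrative sketch only: leader-follower teleoperation with synchronized logging.
# LeaderArm, FollowerArm, and Camera are hypothetical stand-ins, not a real robot API.
import time
import numpy as np

class LeaderArm:
    """Stub for the lightweight arm the human operator moves (six joints)."""
    def read_joint_positions(self):
        return np.random.uniform(-1.0, 1.0, size=6)  # pretend encoder readings

class FollowerArm:
    """Stub for the actual robot arm being puppeted."""
    def __init__(self):
        self._q = np.zeros(6)
    def command_joint_positions(self, q):
        self._q = np.asarray(q, dtype=float)  # a real driver would send motor commands here
    def read_joint_positions(self):
        return self._q.copy()

class Camera:
    """Stub camera returning a fake RGB frame."""
    def read(self):
        return np.zeros((224, 224, 3), dtype=np.uint8)

class EpisodeLog:
    """Time-aligned frames, joint states, and actions for one demonstration."""
    def __init__(self):
        self.frames, self.joints, self.actions, self.stamps = [], [], [], []
    def record(self, frame, joints, action, stamp):
        self.frames.append(frame)
        self.joints.append(joints)
        self.actions.append(action)
        self.stamps.append(stamp)

def collect_episode(leader, follower, camera, hz=50, seconds=30):
    """Mirror the operator's leader arm onto the robot, logging everything in sync."""
    log, dt = EpisodeLog(), 1.0 / hz
    for _ in range(int(hz * seconds)):
        t = time.time()
        target = leader.read_joint_positions()       # what the human is doing right now
        follower.command_joint_positions(target)     # puppet the real robot
        log.record(camera.read(),                    # image at this instant
                   follower.read_joint_positions(),  # robot's actual joint state
                   np.asarray(target),               # the motor command becomes the label
                   t)
        time.sleep(max(0.0, dt - (time.time() - t)))
    return log

if __name__ == "__main__":
    episode = collect_episode(LeaderArm(), FollowerArm(), Camera(), hz=10, seconds=1)
    print(f"logged {len(episode.frames)} synchronized steps")
```

Each logged step is an observation-action pair, which is the kind of supervision the conversation turns to next: it is what a model can later be trained to imitate.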
Speaker 2: There are six different joints on the arm, so it can move with basically a full range of motion in 3D space and 3D rotation, and you can use that to peel a hard-boiled egg. You don't have any tactile feedback, so you can't actually feel the egg, and that's actually one of the things that makes it more difficult. But you can use visual feedback to compensate for that. And so just by looking at the egg myself, I'm able to figure out if you're in contact with something.

Speaker 1: And you just use one prong of the claw? Like, I guess you squeeze it a little to crack it, and then use like one prong of the claw to get the shell off?

Speaker 2: Yeah, exactly. So you want to crack it initially and then hold it with one gripper, and then use basically one of the two fingers in the gripper to get pieces of shell off. When we did this, we hard-boiled only two eggs. This was actually at Stanford. The first egg a graduate student ended up breaking, and so I did the second egg, and I was able to successfully not break it and fully peel it. It took some patience, certainly, and I wasn't able to do it as quickly as with my own hands, but I guess it goes to show the extent to which we're able to control robots to do pretty complicated things.

Speaker 1: Yeah, and so obviously, I mean, that is a stunt or a game or something fun to do with the robot. But presumably in that instance, as in the other instances of folding clothes and vacuuming, like, there is learning, right. The idea is that you do it some number of times and then the robot can do it, and then presumably there's also generalization. But just to start with learning, like, you know, reductively, how many times do you have to do it for the robot to learn it?
Speaker 2: Yeah, so it really depends on the extent to which you want the robot to handle different conditions. So in some of our research, we've been able to show the robot how to do something like thirty times or fifty times, and just with that, maybe that sounds like a lot, but you can typically do that in less than an hour if it's a simple task. And from that, if you only demonstrate it in a narrow set of circumstances, like a single environment, a single particular object, the robot can learn just from less than an hour of data.

Speaker 1: What is an example of a thing that the robot learned in less than an hour of data?

Speaker 2: Oh yeah, we put a shoe on a foot, we tore off a piece of tape and put it on a box. We've also hung up a shirt on a hanger.

Speaker 1: So that's not that much, I mean, especially because you say the robot, but what you really mean is the model. So every robot, right, presumably, or every robot that's built more or less like that one, right. Like, that's one of the key things. It's like you're not teaching one robot, you're teaching every robot ever, because it's software fundamentally, it's an AI model. It's not hardware.

Speaker 2: Yeah, yes, with the caveat that, if you want to be this data efficient, it works best if it's, like, the same color of the table, the same kind of rough initial conditions of where the objects are starting, right, and the same shirt, for example. So this is just with like a single shirt and not like any shirt.

Speaker 1: So there's like concentric circles of generalizability, right, like exact same shirt, exact same spot, exact same table, versus like fold a shirt, versus fold clothes, right. And so is that just infinitely harder? Like, how does that work? That's your big, that's your big challenge at some level, right?
Speaker 2: Yeah. So generalization is one of the big challenges, not the only one, but it's one of the big challenges. And in some ways, I mean, the first unlock there is just to make sure that you're collecting data not just for one shirt, but collecting it for lots of shirts, or collecting it for lots of clothing items, and ideally also collecting data with lots of tables with different textures, and not just visual appearances either. Like, if you're folding on a surface that has very low friction, like it's very smooth, versus a surface that's maybe on top of carpet or something, that's going to behave differently when you're trying to move the shirt across the table. So having variability in the scenarios the robot is experiencing in the data set is important, and we've seen evidence that if you set things up correctly and collect data under lots of scenarios, you can actually generalize to completely new scenarios. And in the pi 0.5 release, for example, we found that if we collected data in roughly one hundred different rooms, then the robot is able to do some tasks in rooms that it's never been in before.

Speaker 1: So you mentioned pi 0.5. So pi zero point five, that's your latest model that you've released, right? Tell me about that. Like, what does that model allow robots to do? Like, what robots and what settings and what tasks?

Speaker 2: Yeah, yeah, definitely. So we were focusing on generalization. With the previous model, we were focusing on capability, and we did a really complicated task of laundry folding. From there, we wanted to answer, like, okay, that model worked in one environment. It's fairly brittle. If you put it in a new environment, it wouldn't work. And we wanted to see, if we put robots in new environments with new objects, new lighting conditions, new furniture, can the robot be successful?
And to do that, we collected data on these mobile manipulators, which feels like a terrible name, but they're robots with two arms and wheels that can drive around, kind of like a humanoid, but we're using wheels instead of legs, a bit more practical in that regard. And we trained the robot to do things like tidying a bed, or wiping spills off of a surface, or putting dishes into a sink, or putting away items into drawers, taking items of clothing, dirty clothing, off the floor and putting them into a laundry basket, things like that. And then we tested whether or not, after collecting data like that in lots of environments, aggregated with other data, including data from the internet, the robot can then do those things in a home that it has never been in before. And in some ways that sounds kind of basic, like, people have no problem with this: if you can do something in one home, you probably could do the same thing in another home. It doesn't seem like a complicated thing for humans. But for robots that are trained on data, if they're only trained in one place, their whole universe is that one place; they haven't ever seen any other place. This is actually kind of a big challenge for existing methods. And yeah, it was a step forward. We were able to see that it definitely isn't perfect by any means, and that kind of comes to another challenge, which is reliability. But we're able to see the robot do things in homes it's never been in before, where we set it up, ask it to do things, and it does some things that are useful.

Speaker 1: So, like, in the classical setting where a robot is trained in one room, it doesn't even know that room is a room. That's just the whole world to the robot, right? And if you put it in another room, it's in a completely unfamiliar world.
Speaker 2: Exactly. And so, for example, what we were talking about, like hanging up a shirt: its whole world was that one black tabletop that's smooth, that one blue shirt, that one coat hanger. And it doesn't know about this entire universe of other shirts and other...

Speaker 1: It doesn't know that there is a category called shirt. It only knows...

Speaker 2: Yeah, it doesn't even know what shirts are.

Speaker 1: Yeah, it doesn't even know what shirts are. For pi zero point five, like, what did you ask the robot to do? And how well did it work?

Speaker 2: Yeah. So we trained the model. We took actually a pre-trained language model with also a vision component, and we fine-tuned it on a lot of data, including data from different homes across San Francisco, but actually a lot of other data too. So actually only two percent of the data was on these mobile robots with arms. So we store how the motors were all moving in all of our previous data, and then train the model to mimic that data that we've stored.

Speaker 1: It's like, it's like predicting the next word, but instead of predicting the next word, it's like predicting the next movement or something.

Speaker 2: Yes, exactly. We've kind of trained it to predict next actions or next motor commands instead of next words. We do an additional training process to have it focus on and be good at the mobile robot data in homes. Then we set up the robot in a new home and we give it language commands. So we can give it low-level language commands, or we can also give it higher-level commands. So the highest-level command might be "clean the bedroom." And one of the things that we've also been thinking about more recently is, can you give it a more detailed description of how you want it to clean the bedroom? But we're not quite there yet. So we could say "clean the bedroom."
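A rough way to picture "predicting the next movement instead of the next word" is a behavior-cloning objective: a network looks at an image, an embedding of the command, and the current joint state, and is trained to regress the chunk of motor commands the teleoperator actually produced next. The PyTorch sketch below is a generic toy along those lines, with made-up names like TinyPolicy and random tensors standing in for real data; it is not Physical Intelligence's architecture or training recipe, which starts from a pretrained vision-language model and is considerably more involved.

```python
# Toy behavior-cloning sketch: predict the next chunk of motor commands instead of the next word.
# Generic illustration only; names and shapes are invented for the example.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Maps (image, command embedding, joint state) to a short chunk of future actions."""
    def __init__(self, action_dim=7, chunk=16, text_dim=64):
        super().__init__()
        self.vision = nn.Sequential(               # stand-in for a pretrained vision encoder
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + text_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim * chunk),
        )
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, image, text_emb, joints):
        feats = torch.cat([self.vision(image), text_emb, joints], dim=-1)
        return self.head(feats).view(-1, self.chunk, self.action_dim)

def train_step(policy, optimizer, batch):
    """One imitation step: regress the motor commands recorded during teleoperation."""
    pred = policy(batch["image"], batch["text_emb"], batch["joints"])
    loss = nn.functional.mse_loss(pred, batch["actions"])  # match the demonstration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors standing in for one batch of logged demonstrations.
policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
batch = {
    "image": torch.randn(8, 3, 224, 224),
    "text_emb": torch.randn(8, 64),      # e.g. an embedding of "pick up the shirt"
    "joints": torch.randn(8, 7),
    "actions": torch.randn(8, 16, 7),    # the next motor commands from the teleop log
}
print(train_step(policy, opt, batch))
```

In this framing, the teleoperation logs play the role that internet text plays for a language model: the recorded motor commands are the labels the model learns to reproduce.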
Speaker 2: We'd also tell it, put the dirty clothes in the laundry basket, so that would be kind of a subtask. Or we can tell it commands like pick up the shirt, put the shirt in the laundry basket. Then after we tell it that command, it will go off and follow that command, and actually in most cases realize that command successfully in the real world.

Speaker 1: How did it do?

Speaker 2: So it depends on the task. The average success rate was around eighty percent, so definitely room for improvement, and in many scenarios it was able to be quite successful. We also saw some failure modes where, for example, if you're trying to put dishes into a sink, sometimes one of the dishes was a cutting board, and picking up a cutting board is actually pretty tricky for the robot, because you either need to slide it to the edge of the counter and then grasp it, or somehow get a finger underneath the cutting board. And so sometimes it was able to do that successfully, sometimes it struggled and got stuck. The exciting thing, though, was that we were able to kind of drop it in a place it had never been before, and it was doing things that are quite reasonable.

Speaker 1: So what are you doing now? Like, what's the next thing you're trying to get to?

Speaker 2: Yeah, absolutely. So the next thing we're focusing on is reliability and speed. So I mentioned around eighty percent for these tasks. How do we get that to ninety-nine percent? And I think that if we can get the reliability up, that's kind of, in my mind, the main missing ingredient before we can really have these be useful in real-world scenarios.

Speaker 1: So getting to ninety-nine percent is interesting. I mean, I think of self-driving cars, right, where it seemed some time ago, I don't know, ten years ago, fifteen years ago, like they were almost there, and I know they're more almost there now.
I know in San Francisco there really are self-driving cars, but they're still very much at the margin of cars in the world, right. And it does seem like almost there means different things in different settings. But, I don't know, is it super hard to get from eighty percent to ninety-nine percent? Does the self-driving car example teach us anything for your work?

Speaker 2: The self-driving car analogy is pretty good. I do think that, fortunately, there are scenarios where we may not need it to be quite as reliable as cars. With cars there is a much, much higher safety risk. It's much easier to hurt people. And in robots there are safety risks, because you are in the physical world, but it's easier to put software precautions in place, and even hardware precautions, to prevent that as well. So that makes it a little bit easier.

Speaker 1: I mean, ninety-nine percent probably isn't good enough for cars, right? They probably need more nines than that, whereas it may well be good enough for a house-cleaning robot.

Speaker 2: Yeah, in certain circumstances. And yeah, we're also thinking about scenarios where maybe even less than that is fine. If we view humans and robots working together, it's more about kind of helping the person complete the task faster or complete the task more effectively. So I think there might be scenarios like that, but still we need the performance and reliability to be higher, and the robots to be faster, in order to accomplish that.

Speaker 1: We'll be back in just a minute. What do you imagine as the initial real-world use cases?

Speaker 2: I don't know.
There's a lot of examples of robotics companies that have attempted to kind of start with an application and hone in on that, and I think the lesson from watching those companies is that you end up then spending a lot of time on the problems of that specific application and less on developing the sort of generalist systems that we think in the long run will be more effective. And so we're very focused on understanding what are the core bottlenecks and the core missing pieces for developing these generalist models, and we think that if we had picked an application now, we would kind of lose sight of that bigger problem, because we'd need to solve things that are specific to that application. So we're very focused on what we think are the core technological challenges. We have certain tasks that we're working on. Some of them have been home cleaning tasks. We also have some more kind of industrial-like tasks as well, just to instantiate things and actually be iterating on robots. And applications could range from things in homes to things in workplaces to industrial settings. There's lots and lots of use cases for intelligent robots and intelligent kind of physical machines.

Speaker 1: What are some of the industrial tasks you've been working on?

Speaker 2: One example that I mentioned before is inserting cables. There's lots of use cases in data centers, for example, where that's a challenging task. Another example is constructing cardboard boxes and filling them with items. We've also done some packaging tasks that are highly relevant to lots of different kinds of shipping operations. And then even folding clothes. It seems like a very home task, but it turns out that there are companies that need to fold very large lots of clothing, and so that's also something that in the long term could be used in larger-scale settings.
Speaker 1: So I've read that you have open-sourced your model weights and given designs of robots to hardware companies, and I'm interested in that set of decisions, right, that set of sort of strategic decisions. Tell me about that, sort of giving away IP, basically.

Speaker 2: Right, yeah, yeah, definitely. So this is a really hard problem, especially this longer-term problem of developing a general system. We think that the field is very young, and there's like a couple of reasons. One is that we think that the field needs to mature, and we think that having more people being kind of competent with using robots and using this kind of technology will be beneficial in the long term for the company, and by open-sourcing things, we make it easier for people to do that. And then the second thing is, the models that we develop right now, they're very early, and the models that we'll be developing one to three years from now are going to be far, far more capable than the ones that we have now. And so it's kind of like the equivalent of OpenAI open-sourcing GPT-2 or GPT-3. They actually didn't open-source GPT-3, but I think that they would still be in an excellent spot today if they had.

Speaker 1: Like, what could go wrong that would either prevent you as a company from succeeding or even hold back the field in general?

Speaker 2: I don't think we entirely know the scale of data that we need for getting really capable models. And there's a little bit of a chicken-and-egg problem, where it's a lot easier to collect data once you have a really good model, and it takes large amounts of data to get a really good model.

Speaker 1: Right. Or if there were thousands of robots out in the world running your model, there would just be an incredible amount of data coming into you every day, right?

Speaker 2: Yeah, yeah, exactly. So that's one thing, though I'm actually maybe a little bit less concerned about that myself.
And then I think the other thing is just that there are technological challenges to getting these things to work really well. I think we've had incredible progress over the last year and two months, the last fourteen months, I think, since we've started, probably more progress than I was expecting, honestly, compared to when we started the company. I think it's wild that we were able to get a robot to unload and fold laundry, like a ten-minute-long task.

Speaker 1: And folding laundry is like a famously hard robot problem, right? Like, it's the one that people in robotics talk about when they talk about things people think are easy are actually hard for robots, right?

Speaker 2: Yeah, absolutely, absolutely. I mean, you have to deal with all sorts of variability in how clothes can be crumpled on each other. And also there are even really small, minor things you need to do in order to actually get it to be flat on the table and folded nicely and even stacked. And as the task gets longer as well, there are more opportunities to make mistakes, more opportunities to get stuck. And so if you're doing a task that takes ten minutes, in those ten minutes there's many, many times where the robot can make a mistake that it can't recover from, or just get stuck, or something like that. And so being able to do such a task starts to kind of point at the resilience that these models can have by recovering from those mistakes.

Speaker 1: Uh huh.

Speaker 2: So when we were first trying to fold laundry, one of the common failure modes was that it would fold the laundry very well, by my standards at the time, I would be very, very happy with the robot, and then it would push the entire stack of laundry onto the ground.

Speaker 1: Sort of like teaching a toddler to fold clothes.

Speaker 2: Yeah, yeah, exactly.
Speaker 1: Was there a particular moment when you saw a robot using your model fold clothes for ten minutes and it worked?

Speaker 2: Yeah. First off, we started with just folding a shirt starting flat on the table. We got that to work pretty quickly; it turns out to be pretty easy, and I wasn't too surprised by that. And then we moved from that to starting it in just a random ball, like some sort of crumpled position on the table, and then you have to flatten and then fold it, and that makes the problem dramatically harder because of all the variability, having to figure out how to flatten it. We were kind of stuck on that problem for at least a couple of months, where with everything we were trying, the success rate of the robot was zero percent. It wasn't able to really make progress on it. And we started to see signs of life, I think, in August or September of last year, where we tried a new recipe where we continued to train the model on a curated part of the data that was following a consistent strategy, and that sort of high-quality post-training is what really seemed to make the model work better. And then the moment that I was most excited about was the first time that I saw the model flatten and fold and stack five items in a row.

Speaker 1: Yeah.

Speaker 2: I just remember going home that night and being so excited. It seemed like we had just figured out this big missing puzzle piece.

Speaker 1: So I was asking you, why might it not work, or what might slow the field down? And then we talked about the happy shirt story. But if in five years things didn't progress as quickly as you thought, what might have happened?

Speaker 2: I mentioned that I think that incorporating practice, like allowing the robot to practice the task, should be really helpful for allowing robots to get better.
We don't know what exactly that recipe will look like, and so it's a research problem, and with any sort of research problem, you don't know exactly how hard the solution is going to be. And I think that there are some other more nuanced unknowns as well that are somewhat similar to that. And we have a large number of very talented researchers on our team, because we think that there are some of these unsolved breakthroughs that are going to be needed to really, truly solve this problem.

Speaker 1: So, if it does work well and things progress, in that universe, what would you be worried about?

Speaker 2: Good question. I mean, if things work well, I shouldn't be too worried, in general. I do think that it's very easy in general to underestimate the challenges around actually deploying and disseminating technology. That takes time, and when the technology doesn't exist yet, that means that the world is not in a place that is ready for that technology. I think that there's a lot of unknowns there.

Speaker 1: I mean, one of the striking things to me about, say, language models is the people who know the most about them seem to be the most worried about them, which is generally not the case, I think, historically with technology, right, with the possible exception of the atomic bomb. And so I'm curious, I mean, those kinds of worries, like, do you share them? Are there worries you have about developing a foundation model for robots, about bad actors using it, even?

Speaker 2: I do think that, yeah, there's plenty of technology that has dual uses, and I think there are applications of technologies that are harmful. I think that a lot of the concerns in the language model community stem from imbuing these systems with greater autonomy. And I work hands-on with the robots quite a bit, and I don't see a world in which they will be taking over in any way.
It's very easy to just, well, with our current iteration of robots, if we threw some water on it, the robot would be in trouble.

Speaker 1: So that might be a problem for you, but I'm sure you could solve that.

Speaker 2: We're working on it. So we actually do have a new iteration that is actually a lot more waterproof. But it's just not a concern that I share.

Speaker 1: Okay, interesting. Basically just because you think we can, whatever, turn it off if we need to.

Speaker 2: Yeah, and I think, yeah, there's always going to be dual-use concerns, but I think that the pros of the technology outweigh some of the downsides.

Speaker 1: Well, give me the happy story, then. Like, what number of years should we choose for a happy story? Ten? Is ten too soon?

Speaker 2: I don't want to put a number to it. I think that with research, you don't know exactly how long things will take. And I envision a world where, when you're developing hardware, it's not too hard to actually teach it to do something, and teach it to do something useful, rather than just having machines that are not particularly intelligent, like dishwashers and laundry machines and so forth.

Speaker 1: Go bigger, if you would. Like, what would people teach robots to do in that world?

Speaker 2: I guess if we were to go bigger, I think that there are a lot of challenges around helping people as they age, allowing them to be more independent. That's like a huge one. I think that, I don't know, manufacturing, there's all sorts of places where there's abuse of labor practices, and we can maybe be able to eliminate those if it's a robot instead of a human. Yeah, many, many, many examples. And I think that there are also even things that are hard to imagine because the technology doesn't exist.
594 00:31:20,516 --> 00:31:22,756 Speaker 2: So a lot of the things that I'm thinking about 595 00:31:22,796 --> 00:31:26,556 Speaker 2: are robots helping humans in different circumstances to allow them 596 00:31:26,556 --> 00:31:30,556 Speaker 2: to be more productive. But once something exists, often 597 00:31:30,836 --> 00:31:32,876 Speaker 2: people are creative and come up with new ways 598 00:31:32,876 --> 00:31:34,316 Speaker 2: of using it. 599 00:31:37,116 --> 00:31:49,196 Speaker 1: We'll be back in a minute with the lightning round. Great, 600 00:31:49,276 --> 00:31:54,116 Speaker 1: let's finish with the lightning round. What's one thing that 601 00:31:54,196 --> 00:31:58,796 Speaker 1: working with robots has caused you to appreciate about the 602 00:31:58,876 --> 00:31:59,516 Speaker 1: human body? 603 00:32:00,836 --> 00:32:02,196 Speaker 2: Our skin is pretty amazing. 604 00:32:02,676 --> 00:32:07,556 Speaker 1: Huh. Well, so we didn't talk about, I mean, a 605 00:32:07,636 --> 00:32:10,836 Speaker 1: sense of touch, or of heat or of cold, right? 606 00:32:10,876 --> 00:32:13,556 Speaker 1: I mean presumably the models you're building, the robots you're 607 00:32:13,596 --> 00:32:17,076 Speaker 1: using don't have that, but they could, right? They could 608 00:32:17,196 --> 00:32:20,676 Speaker 1: have a sense of touch. Is anyone working on that? 609 00:32:20,876 --> 00:32:21,836 Speaker 1: Is that of interest to you? 610 00:32:22,676 --> 00:32:25,036 Speaker 2: Lots of people are working on it. I think it's pretty interesting. 611 00:32:25,236 --> 00:32:28,516 Speaker 2: I think that the hardware technology is not super mature 612 00:32:28,716 --> 00:32:30,156 Speaker 2: compared to where I'd like for it to be in 613 00:32:30,236 --> 00:32:33,756 Speaker 2: terms of how robust it is, the cheapness, and 614 00:32:33,796 --> 00:32:37,156 Speaker 2: the resolution. That said, we actually put cameras on 615 00:32:37,236 --> 00:32:39,996 Speaker 2: the wrists of our robot to help it get some 616 00:32:40,036 --> 00:32:42,516 Speaker 2: sort of tactile sense. For example, if 617 00:32:42,516 --> 00:32:45,076 Speaker 2: you visually look at your finger as you make 618 00:32:45,116 --> 00:32:48,156 Speaker 2: contact with an object, you can see it deform 619 00:32:48,756 --> 00:32:51,796 Speaker 2: around that object, and you can actually, just by looking 620 00:32:51,796 --> 00:32:55,076 Speaker 2: at your finger, get some notion of tactile feedback similar 621 00:32:55,076 --> 00:32:57,196 Speaker 2: to what our skin gets. Yeah, and cameras are cheap, 622 00:32:57,196 --> 00:33:01,236 Speaker 2: really easy, robust, way more robust and cheap than existing 623 00:33:01,276 --> 00:33:02,596 Speaker 2: technology for tactile sensing. 624 00:33:04,716 --> 00:33:08,476 Speaker 1: I've heard you say that humanoid robots are overrated, and 625 00:33:08,516 --> 00:33:09,876 Speaker 1: I'm curious, why do you think that? 626 00:33:11,196 --> 00:33:14,956 Speaker 2: I think that simplicity is really helpful and important when 627 00:33:14,996 --> 00:33:19,596 Speaker 2: trying to develop technology. When you introduce more complexity than is needed, 628 00:33:19,636 --> 00:33:22,596 Speaker 2: it slows you down a lot. And humanoids 629 00:33:22,876 --> 00:33:27,116 Speaker 2: introduce that kind of complexity.
Yeah, I think that if 630 00:33:27,116 --> 00:33:29,316 Speaker 2: all of the robots we were working with were humanoids, 631 00:33:29,516 --> 00:33:31,996 Speaker 2: we wouldn't have made anywhere near the 632 00:33:31,996 --> 00:33:35,236 Speaker 2: progress that we've made, because we'd be dealing with additional challenges. 633 00:33:35,636 --> 00:33:38,636 Speaker 2: I also think that optimizing for ease of data collection 634 00:33:38,916 --> 00:33:41,276 Speaker 2: is really important in a world where we need data, 635 00:33:41,596 --> 00:33:45,396 Speaker 2: and it's a lot harder to collect data with and operate all 636 00:33:45,436 --> 00:33:49,236 Speaker 2: of the different joints and motors of a humanoid than 637 00:33:49,276 --> 00:33:51,476 Speaker 2: it is to control a simpler robot. 638 00:33:52,476 --> 00:33:54,236 Speaker 1: Do you anthropomorphize robots? 639 00:33:55,236 --> 00:33:58,676 Speaker 2: I hate it when people anthropomorphize robots. I 640 00:33:58,716 --> 00:34:03,156 Speaker 2: think that it is misleading because the failure modes that 641 00:34:03,236 --> 00:34:05,596 Speaker 2: robots have are very different from the failure modes that 642 00:34:05,636 --> 00:34:08,836 Speaker 2: people have, and it misleads people into thinking that it's 643 00:34:08,876 --> 00:34:11,196 Speaker 2: going to behave in the way that people behave. 644 00:34:12,196 --> 00:34:13,836 Speaker 1: Like in what way? 645 00:34:14,276 --> 00:34:16,516 Speaker 2: Oh, like, if you see a robot doing something like 646 00:34:16,556 --> 00:34:20,036 Speaker 2: doing a backflip, or even folding laundry, you kind 647 00:34:20,036 --> 00:34:21,996 Speaker 2: of assume that, like if you saw a 648 00:34:22,036 --> 00:34:23,796 Speaker 2: person do that, then they probably could do a lot 649 00:34:23,796 --> 00:34:26,236 Speaker 2: of other things too. And if you anthropomorphize 650 00:34:26,236 --> 00:34:28,756 Speaker 2: the robot, then you assume that the capabilities 651 00:34:28,756 --> 00:34:31,436 Speaker 2: that you see are representative, as if it were 652 00:34:31,476 --> 00:34:34,716 Speaker 2: a human, and that it could do a backflip anywhere, 653 00:34:35,036 --> 00:34:38,876 Speaker 2: or that it could fold laundry anywhere with any item 654 00:34:38,876 --> 00:34:39,676 Speaker 2: of clothing. 655 00:34:39,516 --> 00:34:41,396 Speaker 1: Or surely you would think a robot that could do 656 00:34:41,436 --> 00:34:44,476 Speaker 1: a backflip could fold a shirt, but no. 657 00:34:45,196 --> 00:34:49,956 Speaker 2: Exactly, exactly. So sometimes it's fun to assign emotions 658 00:34:49,956 --> 00:34:51,756 Speaker 2: to some of these things, or say the robot's having 659 00:34:51,796 --> 00:34:54,476 Speaker 2: a bad day, because certainly it feels like that sometimes. 660 00:34:54,676 --> 00:34:58,356 Speaker 2: But when it kind of moves beyond fun and jokes, 661 00:34:58,836 --> 00:35:01,316 Speaker 2: it might have consequences that I don't think make sense. 662 00:35:02,836 --> 00:35:06,636 Speaker 1: I read that there was a researcher who said they 663 00:35:06,676 --> 00:35:10,276 Speaker 1: would retire if a robot tied a shoelace, yes, and 664 00:35:10,276 --> 00:35:13,036 Speaker 1: then one of your robots tied a shoelace, and I 665 00:35:13,076 --> 00:35:18,476 Speaker 1: guess they didn't retire. But I'm curious. What would you 666 00:35:18,676 --> 00:35:22,156 Speaker 1: need to see a robot do to retire?
667 00:35:23,516 --> 00:35:26,916 Speaker 2: Hmm, I don't know. I guess one example that I've 668 00:35:26,916 --> 00:35:29,676 Speaker 2: given before that I would love to see a robot do, 669 00:35:29,756 --> 00:35:32,716 Speaker 2: and I don't think this is quite retirement level, is being 670 00:35:32,756 --> 00:35:34,876 Speaker 2: able to go into a kitchen that it has never been 671 00:35:34,916 --> 00:35:39,196 Speaker 2: in before and make a bowl of cereal. Pretty basic, 672 00:35:40,236 --> 00:35:42,356 Speaker 2: especially compared to doing a backflip. I cannot do a 673 00:35:42,356 --> 00:35:44,396 Speaker 2: backflip myself, but I could make a bowl of cereal. 674 00:35:44,716 --> 00:35:47,476 Speaker 2: But it requires being able to find objects in the environment, 675 00:35:47,516 --> 00:35:51,036 Speaker 2: being able to interact with delicate objects like a cereal box, 676 00:35:51,596 --> 00:35:54,116 Speaker 2: maybe even use tools in order to open the cereal box, 677 00:35:54,516 --> 00:35:58,396 Speaker 2: pouring liquids. Yeah, so that's a task that I love, 678 00:35:58,636 --> 00:36:00,796 Speaker 2: and I could actually even see us being able to 679 00:36:01,116 --> 00:36:04,276 Speaker 2: show a demo of that without too much difficulty, actually, 680 00:36:04,716 --> 00:36:06,676 Speaker 2: if we put our mind to it and collected 681 00:36:06,756 --> 00:36:09,276 Speaker 2: data for it. So it actually is, I think, more 682 00:36:09,316 --> 00:36:12,756 Speaker 2: within reach than maybe I imagined a few years ago. 683 00:36:12,876 --> 00:36:16,756 Speaker 1: Just as you're thinking about it, it's getting closer. You're like, oh, wait, 684 00:36:16,796 --> 00:36:17,516 Speaker 1: we could do that. 685 00:36:18,396 --> 00:36:20,676 Speaker 2: Yeah. I mean we've actually collected data of pouring cereal, 686 00:36:21,276 --> 00:36:23,516 Speaker 2: like opening a cereal box and pouring it into a bowl. 687 00:36:23,916 --> 00:36:26,916 Speaker 2: We haven't yet done liquid handling and pouring, but I 688 00:36:26,916 --> 00:36:28,996 Speaker 2: think we're actually going to do it this week on 689 00:36:29,076 --> 00:36:32,076 Speaker 2: the robot. I asked the hardware team to make a 690 00:36:32,476 --> 00:36:35,636 Speaker 2: waterproof robot. So we're not too far. A lot of 691 00:36:35,676 --> 00:36:38,836 Speaker 2: the pieces are coming together. I also love working 692 00:36:38,836 --> 00:36:41,796 Speaker 2: with robots, and I'm also fairly young, I 693 00:36:41,796 --> 00:36:46,036 Speaker 2: think, not too old, and so I don't imagine myself 694 00:36:46,036 --> 00:36:47,036 Speaker 2: retiring anytime soon. 695 00:36:53,996 --> 00:36:57,156 Speaker 1: Chelsea Finn is a Stanford professor and the co founder 696 00:36:57,196 --> 00:37:01,596 Speaker 1: of Physical Intelligence. You can email us at problem at 697 00:37:01,596 --> 00:37:04,556 Speaker 1: pushkin dot fm, and please do email us. I read 698 00:37:04,596 --> 00:37:07,996 Speaker 1: all the emails. Today's show was produced by Gabriel Hunter Chang, 699 00:37:08,516 --> 00:37:12,836 Speaker 1: edited by Alexander Garreton and engineered by Sarah Bruguerrett. I'm 700 00:37:12,876 --> 00:37:15,236 Speaker 1: Jacob Goldstein and we'll be back next week with another 701 00:37:15,236 --> 00:37:16,236 Speaker 1: episode of What's Your Problem.