WEBVTT - Teaching Robots How to Do Everything

0:00:15.356 --> 0:00:23.476
<v Speaker 1>Pushkin. In a metaphorical sense, AI is everywhere. It can

0:00:23.556 --> 0:00:26.356
<v Speaker 1>write essays, it can do your taxes, it can design drugs,

0:00:26.356 --> 0:00:30.516
<v Speaker 1>it can make movies. But in a literal sense, AI

0:00:31.236 --> 0:00:35.356
<v Speaker 1>is not everywhere. You know, a large language model can

0:00:35.396 --> 0:00:38.196
<v Speaker 1>tell you whatever twenty seven ways to fold your shirts

0:00:38.196 --> 0:00:40.836
<v Speaker 1>and put them in the drawer, but there's no robot

0:00:40.916 --> 0:00:44.076
<v Speaker 1>that you can buy that can actually fold your shirts

0:00:44.156 --> 0:00:46.876
<v Speaker 1>and put them in the drawer. At some point, though

0:00:47.596 --> 0:00:50.116
<v Speaker 1>maybe at some point in the not that distant future,

0:00:50.756 --> 0:00:53.516
<v Speaker 1>there will be a robot that can use AI to

0:00:53.596 --> 0:00:55.316
<v Speaker 1>learn how to fold your shirts and put them in

0:00:55.316 --> 0:00:58.996
<v Speaker 1>the drawer, or you know, cook lasagna, pack boxes, plug

0:00:58.996 --> 0:01:02.196
<v Speaker 1>in cables. In other words, there will be a robot

0:01:02.316 --> 0:01:06.196
<v Speaker 1>that can use AI to learn how to do basically anything.

0:01:12.276 --> 0:01:14.636
<v Speaker 1>I'm Jacob Goldstein and this is What's Your Problem, the

0:01:14.676 --> 0:01:16.436
<v Speaker 1>show where I talk to people who are trying to

0:01:16.436 --> 0:01:20.876
<v Speaker 1>make technological progress. My guest today is Chelsea Finn. She's

0:01:20.916 --> 0:01:23.156
<v Speaker 1>a professor at Stanford and the co-founder of a

0:01:23.196 --> 0:01:28.556
<v Speaker 1>company called Physical Intelligence aka PI. Chelsea's problem is this,

0:01:29.276 --> 0:01:32.316
<v Speaker 1>can you build an AI model that will bring AI

0:01:32.676 --> 0:01:35.876
<v Speaker 1>to robots, or, as she puts it:

0:01:35.676 --> 0:01:39.356
<v Speaker 2>We're trying to develop a model that can control any robot

0:01:39.436 --> 0:01:41.036
<v Speaker 2>to do any task anywhere.

0:01:41.756 --> 0:01:44.916
<v Speaker 1>Physical Intelligence was founded just last year, but the company

0:01:44.916 --> 0:01:49.396
<v Speaker 1>has already raised over four hundred million dollars. Investors include

0:01:49.516 --> 0:01:53.556
<v Speaker 1>Jeff Bezos and OpenAI. The company has raised so much

0:01:53.596 --> 0:01:55.836
<v Speaker 1>money in part because what they're trying to do is

0:01:55.876 --> 0:01:59.916
<v Speaker 1>so hard. Motor skills, the ability to move and find

0:01:59.956 --> 0:02:02.596
<v Speaker 1>ways to fold the shirt, to plug in a cable,

0:02:02.996 --> 0:02:07.116
<v Speaker 1>they feel simple to us, easy, basic. But Chelsea told

0:02:07.156 --> 0:02:10.756
<v Speaker 1>me basic motor skills are in fact wildly complex.

0:02:11.476 --> 0:02:14.556
<v Speaker 2>All of the motor control that we do with our body,

0:02:14.596 --> 0:02:18.196
<v Speaker 2>with our hands, with our legs, our feet, a lot

0:02:18.236 --> 0:02:20.716
<v Speaker 2>of it we don't think about when we do it.

0:02:20.716 --> 0:02:23.836
<v Speaker 2>It actually is incredibly complicated what we do. This is

0:02:23.836 --> 0:02:26.876
<v Speaker 2>actually like a really really hard problem to develop in

0:02:26.996 --> 0:02:30.476
<v Speaker 2>AI systems and robots, despite it seeming so simple. And the

0:02:30.516 --> 0:02:33.516
<v Speaker 2>reasons for that are, first, that it actually is inherently very complex,

0:02:34.116 --> 0:02:37.316
<v Speaker 2>and second that we don't have tons and tons of

0:02:37.356 --> 0:02:40.876
<v Speaker 2>data of doing this, in part because it's so basic

0:02:40.956 --> 0:02:42.756
<v Speaker 2>to humans as well.

0:02:42.836 --> 0:02:45.556
<v Speaker 1>Right, let's talk about the data side, because that seems

0:02:45.636 --> 0:02:49.396
<v Speaker 1>like really the story, right, the big challenge, and it's

0:02:49.436 --> 0:02:54.596
<v Speaker 1>particularly interesting in the context of large language models and

0:02:54.636 --> 0:02:58.956
<v Speaker 1>computer vision which really seem to have emerged in a

0:02:58.996 --> 0:03:01.876
<v Speaker 1>weird way as a consequence of the Internet. Right, just

0:03:01.916 --> 0:03:06.436
<v Speaker 1>because we happen to have this crazy amount of data

0:03:06.596 --> 0:03:09.276
<v Speaker 1>of words and pictures on the Internet, we were able

0:03:09.316 --> 0:03:12.476
<v Speaker 1>to train language models and computer vision models. But we

0:03:12.556 --> 0:03:16.756
<v Speaker 1>don't have that for robots, right. There is no data

0:03:16.796 --> 0:03:19.876
<v Speaker 1>set of training data for robots, which is like the

0:03:19.956 --> 0:03:22.756
<v Speaker 1>big challenge for you and for robotics in general, it seems.

0:03:22.796 --> 0:03:25.636
<v Speaker 2>Yeah, so we don't have an open internet

0:03:25.636 --> 0:03:29.316
<v Speaker 2>of how to control motors to do like even really

0:03:29.356 --> 0:03:31.556
<v Speaker 2>basic things. Maybe the closest thing we have is we

0:03:31.596 --> 0:03:34.596
<v Speaker 2>have videos of people doing things, and perhaps that could

0:03:34.596 --> 0:03:37.076
<v Speaker 2>be useful. But at the same time, if I watch

0:03:37.196 --> 0:03:40.036
<v Speaker 2>videos of, like, Roger Federer playing tennis, I

0:03:40.076 --> 0:03:42.956
<v Speaker 2>can't just become an amazing tennis player as a result

0:03:42.956 --> 0:03:45.476
<v Speaker 2>of that. And likewise, just with videos of people doing things,

0:03:45.876 --> 0:03:48.716
<v Speaker 2>it's very hard to actually extract the motor control behind that.

0:03:48.876 --> 0:03:51.476
<v Speaker 2>And so that lack of data, that scarcity of data,

0:03:51.876 --> 0:03:56.316
<v Speaker 2>makes it in some ways a very different problem than

0:03:56.636 --> 0:03:58.956
<v Speaker 2>in language and computer vision. And I think that we

0:03:58.956 --> 0:04:00.796
<v Speaker 2>should still learn a lot of things from language and computer

0:04:00.876 --> 0:04:04.196
<v Speaker 2>vision and collect large data sets like that. It opens

0:04:04.276 --> 0:04:07.596
<v Speaker 2>up new challenges, new possibilities on that front, and

0:04:07.676 --> 0:04:08.996
<v Speaker 2>I think that in the long run we should be able

0:04:09.236 --> 0:04:11.876
<v Speaker 2>to get large amounts of data, just like how in

0:04:11.916 --> 0:04:14.356
<v Speaker 2>autonomous driving we have lots of data of cars driving

0:04:14.396 --> 0:04:18.076
<v Speaker 2>around very effectively. Robots too, could be in the world

0:04:18.196 --> 0:04:21.316
<v Speaker 2>collecting data learning about how to pick up mustard and

0:04:21.356 --> 0:04:23.516
<v Speaker 2>put it on a hot dog bun, or learning how

0:04:23.556 --> 0:04:26.556
<v Speaker 2>to open a cabinet to put some objects away. We

0:04:26.556 --> 0:04:29.356
<v Speaker 2>can get that sort of data, but it's not given

0:04:29.436 --> 0:04:33.196
<v Speaker 2>to us for free.

0:04:33.436 --> 0:04:36.596
<v Speaker 1>You still have this core problem, which is there is

0:04:36.916 --> 0:04:41.956
<v Speaker 1>no giant trove of physical reality data that you can

0:04:41.996 --> 0:04:44.996
<v Speaker 1>train your model on. Right, That's the great big challenge,

0:04:45.036 --> 0:04:46.796
<v Speaker 1>it seems, what do you do about that? How do

0:04:46.796 --> 0:04:47.996
<v Speaker 1>you start to approach that?

0:04:49.196 --> 0:04:52.676
<v Speaker 2>Yeah, so we're starting off by collecting data through

0:04:52.716 --> 0:04:57.436
<v Speaker 2>teleoperation, where people are controlling the robot to

0:04:57.436 --> 0:05:00.116
<v Speaker 2>do tasks, and then you don't just get video data.

0:05:00.196 --> 0:05:03.196
<v Speaker 2>You get the videos alongside what are the actions or

0:05:03.196 --> 0:05:07.076
<v Speaker 2>the motor commands needed to actually accomplish those tasks. We've

0:05:07.116 --> 0:05:10.636
<v Speaker 2>collected data in our own office. We've also collected data

0:05:10.876 --> 0:05:14.956
<v Speaker 2>in homes across San Francisco, and we also have a

0:05:15.076 --> 0:05:18.476
<v Speaker 2>very modest warehouse. In some ways, actually, our

0:05:18.516 --> 0:05:22.476
<v Speaker 2>current operation is rather small, given that we're a little

0:05:22.476 --> 0:05:24.076
<v Speaker 2>over a year old at this point.

0:05:24.356 --> 0:05:26.556
<v Speaker 1>Like what's actually happening? Like if I went into your

0:05:26.556 --> 0:05:28.996
<v Speaker 1>warehouse and somebody was doing teleoperation, what would I see?

0:05:29.036 --> 0:05:29.836
<v Speaker 1>What would it look like?

0:05:30.676 --> 0:05:35.076
<v Speaker 2>Yeah, so it's a little bit like controlling a puppet.

0:05:35.276 --> 0:05:38.956
<v Speaker 2>So the person who's operating the robot, they are

0:05:38.996 --> 0:05:42.196
<v Speaker 2>holding in some ways a set of robot arms, but

0:05:42.196 --> 0:05:44.596
<v Speaker 2>they're very lightweight robot arms, and we use those to

0:05:44.676 --> 0:05:46.676
<v Speaker 2>measure the positions of joints.

0:05:47.076 --> 0:05:49.516
<v Speaker 1>It's almost like an elaborate controller for a video game

0:05:49.636 --> 0:05:52.716
<v Speaker 1>or something. It's like that, it's not actually a robot arm, right,

0:05:52.716 --> 0:05:54.796
<v Speaker 1>It's a thing you control to sort of play the

0:05:54.956 --> 0:05:57.196
<v Speaker 1>robot, to make the robot move.

0:05:57.076 --> 0:06:00.516
<v Speaker 2>Yeah, exactly exactly, and then we record that and then

0:06:01.036 --> 0:06:04.956
<v Speaker 2>directly translate those controls over to the robot. We have

0:06:04.996 --> 0:06:07.516
<v Speaker 2>some robots that are just robot arms, where you're only

0:06:07.516 --> 0:06:09.636
<v Speaker 2>just controlling the robot arm. It's mounted to a table

0:06:09.756 --> 0:06:11.996
<v Speaker 2>or something like that. But we also have what we

0:06:12.036 --> 0:06:14.636
<v Speaker 2>call mobile manipulators that have wheels and robot arms, and

0:06:14.676 --> 0:06:18.036
<v Speaker 2>you can control both how the robot drives around as

0:06:18.116 --> 0:06:21.236
<v Speaker 2>well as how the arms move and we're doing tasks

0:06:21.356 --> 0:06:26.956
<v Speaker 2>like wiping down counters, folding laundry, putting dishes into dishwashers,

0:06:27.276 --> 0:06:32.716
<v Speaker 2>plugging cables into data center racks, assembling cardboard boxes, lots

0:06:32.756 --> 0:06:35.556
<v Speaker 2>and lots of different tasks that might be useful for

0:06:35.676 --> 0:06:38.636
<v Speaker 2>robots to do, and recording all the data. So we

0:06:38.676 --> 0:06:40.996
<v Speaker 2>have cameras on the robots. There are sensors on the

0:06:41.036 --> 0:06:44.636
<v Speaker 2>joints on the motors of the robots as well, and

0:06:44.676 --> 0:06:47.596
<v Speaker 2>we record that in like a synchronized way across time.
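
NOTE
Editor's sketch, not from the episode: one way to picture recording camera
and joint data "in a synchronized way across time" is to pair each camera
frame with the joint reading closest to it in time. The function names,
sample times, and values below are hypothetical, and this is not
Physical Intelligence's actual pipeline.
```python
from bisect import bisect_left
def nearest_index(times, t):
    """Index of the entry in sorted `times` closest to time t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbor is closer in time.
    return i if times[i] - t < t - times[i - 1] else i - 1
def synchronize(frame_times, joint_times, joint_values):
    """Pair each camera frame time with the nearest joint reading."""
    return [(t, joint_values[nearest_index(joint_times, t)]) for t in frame_times]
# Hypothetical data: 3 camera frames, 5 joint samples (seconds, radians).
frames = [0.00, 0.10, 0.20]
joint_t = [0.00, 0.04, 0.09, 0.15, 0.21]
joint_q = [0.0, 0.1, 0.2, 0.3, 0.4]
pairs = synchronize(frames, joint_t, joint_q)
```
Each resulting pair is one time-aligned training sample: what the camera
saw, and what the joints were doing at roughly the same moment.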

0:06:47.836 --> 0:06:50.596
<v Speaker 1>So when you do it, it's like kind of like

0:06:50.756 --> 0:06:52.716
<v Speaker 1>a real world video game, like you're moving your arms

0:06:52.716 --> 0:06:55.676
<v Speaker 1>in these things, and in basically real time, the robot

0:06:55.796 --> 0:06:58.036
<v Speaker 1>arm is moving and picking up the thing you wanted

0:06:58.076 --> 0:07:01.156
<v Speaker 1>to pick up, And like, what's it like? Is there

0:07:01.236 --> 0:07:03.556
<v Speaker 1>like a curve where like at the beginning it's really bad?

0:07:03.636 --> 0:07:06.036
<v Speaker 1>Sort of talk me through an instance.

0:07:06.956 --> 0:07:08.796
<v Speaker 2>And it depends on the person. So some people can

0:07:08.836 --> 0:07:11.276
<v Speaker 2>pick it up really, really quickly. Some people are a bit

0:07:11.276 --> 0:07:13.756
<v Speaker 2>slower to pick it up. I pride myself on being

0:07:13.756 --> 0:07:17.756
<v Speaker 2>a pretty good operator, and so I have done tasks

0:07:17.756 --> 0:07:20.476
<v Speaker 2>as complex as peeling a hard boiled egg with the robot,

0:07:21.196 --> 0:07:22.476
<v Speaker 2>which is...

0:07:22.316 --> 0:07:24.916
<v Speaker 1>How are you at peeling a hard-boiled

0:07:24.916 --> 0:07:25.796
<v Speaker 1>egg with your hands.

0:07:27.276 --> 0:07:29.796
<v Speaker 2>It's pretty hard with my own hands too, yeah, and

0:07:29.836 --> 0:07:31.076
<v Speaker 2>with the robot is even harder.

0:07:31.156 --> 0:07:32.996
<v Speaker 1>Tell me about the robot peeling a hard-boiled egg

0:07:33.036 --> 0:07:35.276
<v Speaker 1>because that sounds like a hard one.

0:07:35.316 --> 0:07:37.796
<v Speaker 2>Yeah. So the robots, basically, all the robots that we're using

0:07:37.836 --> 0:07:40.716
<v Speaker 2>are like kind of pincher grippers. They're called parallel-jaw grippers,

0:07:41.036 --> 0:07:44.756
<v Speaker 2>where there's just one degree of freedom, like open and close. Two pincers.

0:07:44.756 --> 0:07:46.556
<v Speaker 1>It's basically two pincers, like two.

0:07:46.396 --> 0:07:50.676
<v Speaker 2>Pincers, two arms. Yeah, exactly, and I've used that exact setup.

0:07:50.996 --> 0:07:52.956
<v Speaker 2>There's six different joints on the arm, so it can

0:07:53.396 --> 0:07:56.556
<v Speaker 2>move with basically a full range of motion

0:07:56.676 --> 0:07:59.236
<v Speaker 2>in 3D space and 3D rotation, and you

0:07:59.236 --> 0:08:01.396
<v Speaker 2>can use that to peel a hard boiled egg. You

0:08:01.436 --> 0:08:04.156
<v Speaker 2>don't have any tactile feedback, so you can't actually feel

0:08:04.556 --> 0:08:05.996
<v Speaker 2>the egg, and that's actually one of the things that

0:08:06.116 --> 0:08:08.876
<v Speaker 2>makes it more difficult. But you can actually

0:08:08.996 --> 0:08:13.036
<v Speaker 2>use visual feedback to compensate for that. And so just

0:08:13.036 --> 0:08:15.516
<v Speaker 2>by looking at the egg myself, I'm able to figure

0:08:15.516 --> 0:08:18.076
<v Speaker 2>out if I'm in contact with something.

0:08:18.156 --> 0:08:21.156
<v Speaker 1>And you just use one prong of the claw? Like, I could see

0:08:21.156 --> 0:08:23.236
<v Speaker 1>you squeeze it a little to crack it, and then

0:08:23.676 --> 0:08:25.836
<v Speaker 1>use like one prong of the claw to get the

0:08:25.836 --> 0:08:26.316
<v Speaker 1>shell off.

0:08:26.996 --> 0:08:28.956
<v Speaker 2>Yeah, exactly. So you want to crack it

0:08:28.996 --> 0:08:31.116
<v Speaker 2>initially and then hold it with one gripper and then

0:08:31.236 --> 0:08:34.716
<v Speaker 2>use basically one of the two fingers in the gripper

0:08:35.036 --> 0:08:38.076
<v Speaker 2>to get pieces of shell off. When we did this,

0:08:38.116 --> 0:08:41.836
<v Speaker 2>we hard-boiled only two eggs, and the first egg,

0:08:42.556 --> 0:08:44.516
<v Speaker 2>this was actually at Stanford, the first egg a graduate

0:08:44.556 --> 0:08:46.956
<v Speaker 2>student ended up breaking, and so then I did the

0:08:46.996 --> 0:08:49.156
<v Speaker 2>second egg, and I was able to successfully not break

0:08:49.196 --> 0:08:52.156
<v Speaker 2>it and fully peel it. It took some patience, certainly,

0:08:52.156 --> 0:08:53.956
<v Speaker 2>and I wasn't able to do it as quickly as

0:08:53.956 --> 0:08:56.556
<v Speaker 2>with my own hands. But I guess it goes to show

0:08:56.636 --> 0:09:00.276
<v Speaker 2>the extent to which we're able to control robots to

0:09:00.356 --> 0:09:02.116
<v Speaker 2>do pretty complicated things.

0:09:02.356 --> 0:09:05.596
<v Speaker 1>Yeah, and so obviously, I mean that is a stunt

0:09:05.676 --> 0:09:07.876
<v Speaker 1>or a game or something fun to do with the robot.

0:09:07.916 --> 0:09:11.956
<v Speaker 1>But presumably in that instance, as in the other instances

0:09:11.996 --> 0:09:16.556
<v Speaker 1>of folding clothes and vacuuming, like, there is learning, right?

0:09:16.596 --> 0:09:19.076
<v Speaker 1>The idea is that you do it some number of

0:09:19.116 --> 0:09:21.476
<v Speaker 1>times and then the robot can do it, and then

0:09:21.516 --> 0:09:24.516
<v Speaker 1>presumably there's also generalization. But just to start with learning,

0:09:24.796 --> 0:09:29.036
<v Speaker 1>like, you know, reductively, how many times do you have

0:09:29.036 --> 0:09:30.356
<v Speaker 1>to do it for the robot to learn it?

0:09:31.676 --> 0:09:35.796
<v Speaker 2>Yeah, so it really depends on the extent to which

0:09:35.836 --> 0:09:38.636
<v Speaker 2>you want the robot to handle different conditions. So in

0:09:38.676 --> 0:09:40.996
<v Speaker 2>some of our research, we've been able to show the

0:09:41.116 --> 0:09:44.596
<v Speaker 2>robot how to do something like thirty times or fifty times,

0:09:44.716 --> 0:09:47.716
<v Speaker 2>and that maybe sounds like quite a bit, but

0:09:47.716 --> 0:09:49.476
<v Speaker 2>you can do that in like typically less than an

0:09:49.476 --> 0:09:52.276
<v Speaker 2>hour if it's a simple task, and from that the

0:09:52.356 --> 0:09:56.036
<v Speaker 2>robot can, under the circumstances where you only kind of demonstrate it

0:09:56.036 --> 0:09:59.036
<v Speaker 2>in a narrow set of circumstances, like a single environment,

0:09:59.396 --> 0:10:02.956
<v Speaker 2>a single particular object, the robot can learn just from

0:10:03.036 --> 0:10:05.076
<v Speaker 2>like less than an hour of data.

0:10:05.156 --> 0:10:07.156
<v Speaker 1>What is an example of a thing that the robot

0:10:07.196 --> 0:10:08.556
<v Speaker 1>learned in less than an hour of data?

0:10:09.316 --> 0:10:12.556
<v Speaker 2>Oh yeah, we put a shoe on a foot, We

0:10:12.876 --> 0:10:14.156
<v Speaker 2>tear off a piece of tape and put it

0:10:14.196 --> 0:10:18.516
<v Speaker 2>on a box. We've also hung up a shirt on

0:10:18.596 --> 0:10:19.036
<v Speaker 2>a hanger.

0:10:19.676 --> 0:10:22.276
<v Speaker 1>So that's not that much. I mean, especially because you

0:10:22.316 --> 0:10:24.676
<v Speaker 1>say the robot, but what you really mean is the model.

0:10:24.796 --> 0:10:29.116
<v Speaker 1>So every robot, right, presumably or every robot that's built

0:10:29.156 --> 0:10:30.916
<v Speaker 1>more or less like that one, right, Like that's one

0:10:30.956 --> 0:10:33.236
<v Speaker 1>of the key things. It's like you're not teaching one robot,

0:10:33.276 --> 0:10:37.276
<v Speaker 1>you're teaching every robot ever, because it's software, fundamentally,

0:10:37.276 --> 0:10:38.836
<v Speaker 1>it's an AI model. It's not hardware.

0:10:39.356 --> 0:10:42.236
<v Speaker 2>Yeah, yes, with the caveat that, if you want to

0:10:42.236 --> 0:10:44.796
<v Speaker 2>be this data efficient, it works best if it's like

0:10:45.156 --> 0:10:47.356
<v Speaker 2>in the same like the same color of the table,

0:10:47.756 --> 0:10:50.156
<v Speaker 2>the same kind of rough initial conditions of where the

0:10:50.156 --> 0:10:52.636
<v Speaker 2>objects are starting, right, and the same shirt for example.

0:10:52.676 --> 0:10:54.436
<v Speaker 2>So this is just with like a single shirt and

0:10:54.476 --> 0:10:55.276
<v Speaker 2>not like any shirt.

0:10:55.436 --> 0:10:59.556
<v Speaker 1>So there's like concentric circles of generalizability, right, like

0:10:59.676 --> 0:11:02.836
<v Speaker 1>exact same shirt, exact same spot, exact same table versus

0:11:02.876 --> 0:11:06.876
<v Speaker 1>like fold a shirt versus fold clothes, right and versus.

0:11:07.676 --> 0:11:12.116
<v Speaker 1>And so is that just infinitely harder, Like how does

0:11:12.156 --> 0:11:14.396
<v Speaker 1>that work? That's your big challenge at

0:11:14.396 --> 0:11:16.396
<v Speaker 1>some level, right?

0:11:16.236 --> 0:11:18.396
<v Speaker 2>Yeah. So generalization is one of the

0:11:18.396 --> 0:11:20.076
<v Speaker 2>big challenges, not the only one, but it's one of

0:11:20.076 --> 0:11:23.636
<v Speaker 2>the big challenges. And in some ways, I mean the

0:11:23.956 --> 0:11:25.916
<v Speaker 2>first unlock there is just to make sure that you're

0:11:25.916 --> 0:11:28.316
<v Speaker 2>collecting data not just for one shirt, but collecting it

0:11:28.316 --> 0:11:30.036
<v Speaker 2>for lots of shirts, or collecting it for lots of

0:11:30.036 --> 0:11:33.316
<v Speaker 2>clothing items, and ideally also collecting data with lots of

0:11:33.356 --> 0:11:37.356
<v Speaker 2>tables with different textures, and also like not just visual

0:11:37.596 --> 0:11:40.596
<v Speaker 2>like appearances, but also like if you're folding on a

0:11:40.636 --> 0:11:43.716
<v Speaker 2>surface that has very low friction, like it's very smooth,

0:11:43.796 --> 0:11:46.236
<v Speaker 2>versus a surface that's, like, maybe on top of carpet

0:11:46.316 --> 0:11:49.436
<v Speaker 2>or something that's going to behave differently when you're trying

0:11:49.476 --> 0:11:53.916
<v Speaker 2>to move the shirt across the table. So having variability

0:11:53.996 --> 0:11:57.236
<v Speaker 2>in the scenarios that the robot is experiencing in

0:11:57.276 --> 0:12:02.076
<v Speaker 2>the data set is important, and we've seen evidence that

0:12:02.596 --> 0:12:04.716
<v Speaker 2>if you set things up correctly and collect data under lots

0:12:04.756 --> 0:12:08.276
<v Speaker 2>of scenarios, you can actually generalize to completely new scenarios.

0:12:08.316 --> 0:12:11.556
<v Speaker 2>And in, like, the PI zero point five release, for example, we found

0:12:11.596 --> 0:12:15.356
<v Speaker 2>that if we collected data in roughly like one hundred

0:12:15.396 --> 0:12:20.436
<v Speaker 2>different rooms, then the robot is able to do some

0:12:20.636 --> 0:12:22.756
<v Speaker 2>tasks in rooms that it's never been in before.

0:12:23.116 --> 0:12:26.516
<v Speaker 1>So you mentioned PI zero point five. So PI zero point five,

0:12:26.556 --> 0:12:31.716
<v Speaker 1>that's your latest model that you've released, right, tell me

0:12:31.756 --> 0:12:35.676
<v Speaker 1>about that, Like, what what does that model allow robots

0:12:35.716 --> 0:12:38.956
<v Speaker 1>to do? Like what robots and what settings and what tasks.

0:12:39.436 --> 0:12:43.116
<v Speaker 2>Yeah, yeah, definitely. So we were focusing on generalization. So

0:12:43.316 --> 0:12:46.196
<v Speaker 2>the previous model, we were focusing on capability, and we

0:12:46.236 --> 0:12:49.756
<v Speaker 2>did a really complicated task of laundry folding. From there,

0:12:49.796 --> 0:12:52.556
<v Speaker 2>we wanted to answer, like, Okay, that model worked in

0:12:52.556 --> 0:12:54.596
<v Speaker 2>one environment. It's fairly brittle. If you put it in

0:12:54.596 --> 0:12:56.556
<v Speaker 2>a new environment, it wouldn't work. And we wanted to

0:12:56.556 --> 0:12:59.476
<v Speaker 2>see if we put robots in new environments with new objects,

0:12:59.476 --> 0:13:03.476
<v Speaker 2>new lighting conditions, new furniture, can the robot be successful.

0:13:03.636 --> 0:13:09.956
<v Speaker 2>And to do that, we collected data on these mobile manipulators,

0:13:10.076 --> 0:13:13.636
<v Speaker 2>which feels like a terrible name, but robots with two

0:13:13.716 --> 0:13:16.036
<v Speaker 2>arms and wheels that can drive around kind of like

0:13:16.036 --> 0:13:18.956
<v Speaker 2>a humanoid, but we're using wheels instead of legs, a

0:13:18.956 --> 0:13:22.716
<v Speaker 2>bit more practical in that regard, and we train the

0:13:22.796 --> 0:13:26.396
<v Speaker 2>robot to do things like tidying a bed, or wiping

0:13:26.476 --> 0:13:29.556
<v Speaker 2>spills off of a surface, or putting dishes into a sink,

0:13:29.676 --> 0:13:34.156
<v Speaker 2>or putting away items into drawers, taking items of clothing,

0:13:34.156 --> 0:13:36.236
<v Speaker 2>dirty clothing off the floor and putting them into a

0:13:36.276 --> 0:13:39.836
<v Speaker 2>laundry basket, things like that, And then we tested whether

0:13:39.916 --> 0:13:42.036
<v Speaker 2>or not after collecting data like that and lots of

0:13:42.116 --> 0:13:45.676
<v Speaker 2>environments aggregated with other data, including data on the internet.

0:13:46.156 --> 0:13:49.876
<v Speaker 2>Can the robot then do those things in a home

0:13:49.916 --> 0:13:53.076
<v Speaker 2>that has never been in before. And in some ways

0:13:53.076 --> 0:13:57.916
<v Speaker 2>that sounds kind of basic, like people have no problem

0:13:58.316 --> 0:14:01.236
<v Speaker 2>with this: if you can do something in, like, one home,

0:14:01.356 --> 0:14:03.236
<v Speaker 2>you probably could do the same thing in another home. It

0:14:03.276 --> 0:14:05.796
<v Speaker 2>really doesn't seem like a complicated thing for humans,

0:14:05.956 --> 0:14:08.316
<v Speaker 2>but for robots that are trained on data, if they're

0:14:08.316 --> 0:14:11.116
<v Speaker 2>only trained in one place, their whole universe

0:14:11.196 --> 0:14:13.476
<v Speaker 2>is that one place. They haven't ever seen any other place.

0:14:13.836 --> 0:14:17.276
<v Speaker 2>This is actually kind of a big challenge for existing methods.

0:14:17.276 --> 0:14:18.956
<v Speaker 2>And yeah, it was a step forward. We were able

0:14:18.996 --> 0:14:21.676
<v Speaker 2>to see that it definitely isn't perfect by any means,

0:14:21.716 --> 0:14:25.916
<v Speaker 2>and that kind of comes to another challenge, which is reliability.

0:14:26.036 --> 0:14:29.036
<v Speaker 2>But we're able to see the robot do things in

0:14:29.076 --> 0:14:31.236
<v Speaker 2>homes it's never been in before, where we set it up,

0:14:31.356 --> 0:14:33.156
<v Speaker 2>ask it to do things, and it does some things

0:14:33.196 --> 0:14:33.756
<v Speaker 2>that are useful.

0:14:33.876 --> 0:14:36.476
<v Speaker 1>So like in the classical setting where a robot is

0:14:36.556 --> 0:14:38.356
<v Speaker 1>trained in one room, like it doesn't even know that

0:14:38.436 --> 0:14:40.996
<v Speaker 1>room is a room. That's just like the whole world

0:14:41.036 --> 0:14:43.196
<v Speaker 1>to the robot, is that world right? And if you

0:14:43.236 --> 0:14:46.996
<v Speaker 1>put it in another room, it's in a completely unfamiliar

0:14:47.036 --> 0:14:48.236
<v Speaker 1>world.

0:14:48.316 --> 0:14:50.316
<v Speaker 2>Exactly. And so for example, what we were talking about, like

0:14:50.556 --> 0:14:52.996
<v Speaker 2>hanging up a shirt, its whole world was like that one,

0:14:53.156 --> 0:14:57.036
<v Speaker 2>like, a black tabletop that's smooth, that one blue shirt,

0:14:57.156 --> 0:14:59.436
<v Speaker 2>that one coat hanger. And it doesn't know about this

0:14:59.916 --> 0:15:01.676
<v Speaker 2>entire universe of other shirts and other.

0:15:01.716 --> 0:15:03.956
<v Speaker 1>It doesn't know that there is a category called shirt.

0:15:04.156 --> 0:15:04.676
<v Speaker 1>It only knows.

0:15:04.756 --> 0:15:05.876
<v Speaker 2>Yeah, it doesn't even know what shirts are.

0:15:06.036 --> 0:15:08.356
<v Speaker 1>Yeah, it doesn't even know what shirts are. For PI

0:15:08.436 --> 0:15:10.556
<v Speaker 1>zero point five, Like, what did you ask the robot

0:15:10.596 --> 0:15:12.196
<v Speaker 1>to do? And how well did it work?

0:15:13.316 --> 0:15:16.596
<v Speaker 2>Yeah, So we trained the model. We took actually a

0:15:16.596 --> 0:15:19.956
<v Speaker 2>pre-trained language model, also with, like, a vision component,

0:15:20.476 --> 0:15:23.156
<v Speaker 2>and we fine-tuned it on a lot of data,

0:15:23.196 --> 0:15:26.676
<v Speaker 2>including data from different homes across San Francisco, but actually

0:15:26.676 --> 0:15:28.276
<v Speaker 2>a lot of other data too. So actually only two

0:15:28.316 --> 0:15:31.796
<v Speaker 2>percent of the data was on these like mobile robots

0:15:31.956 --> 0:15:35.196
<v Speaker 2>with arms. So we can store how the motors were

0:15:35.196 --> 0:15:38.036
<v Speaker 2>all moving in all of our previous data and then

0:15:38.356 --> 0:15:40.716
<v Speaker 2>train the model to mimic that data that we've stored.
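
NOTE
Editor's sketch, not from the episode: "train the model to mimic that
data" is the idea behind imitation learning. The toy below fits a
one-dimensional linear policy to demonstrations by least squares. The
real system described here fine-tunes a large vision-language model;
the function name and the demonstration data are hypothetical.
```python
def fit_linear_policy(observations, actions):
    """Least-squares fit of action = w * observation + b (1-D toy case)."""
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(observations, actions))
    var = sum((o - mean_o) ** 2 for o in observations)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b
# Hypothetical demonstrations: the operator always commands 2 * obs + 1.
demo_obs = [0.0, 1.0, 2.0, 3.0]
demo_act = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear_policy(demo_obs, demo_act)
predicted = w * 4.0 + b  # action for an observation not in the demos
```
The fitted policy reproduces the operator's commands on the stored data
and extrapolates to a nearby unseen observation, which is the same
"mimic the demonstrations" objective at a much smaller scale.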

0:15:40.836 --> 0:15:43.476
<v Speaker 1>It's like it's like predicting the next word, but instead

0:15:43.476 --> 0:15:45.716
<v Speaker 1>of predicting the next word, it's like predicting the next movement.

0:15:45.876 --> 0:15:47.236
<v Speaker 1>Or something like that.

0:15:48.716 --> 0:15:50.956
<v Speaker 2>Yes, exactly. We've kind of trained it to predict next actions or

0:15:51.036 --> 0:15:54.276
<v Speaker 2>next motor commands instead of next words. We do an

0:15:54.316 --> 0:15:57.476
<v Speaker 2>additional training process to have it focus on and be

0:15:57.596 --> 0:16:01.036
<v Speaker 2>good at the mobile robot data in homes. Then we

0:16:01.036 --> 0:16:03.396
<v Speaker 2>set up the robot in a new home and we

0:16:03.476 --> 0:16:06.516
<v Speaker 2>give it language commands, so we can give it low

0:16:06.596 --> 0:16:09.516
<v Speaker 2>level language commands, or we can actually also give

0:16:09.516 --> 0:16:12.596
<v Speaker 2>it higher level commands. So the highest level of command

0:16:12.676 --> 0:16:14.916
<v Speaker 2>might be clean the bedroom. And one of the things

0:16:14.916 --> 0:16:16.556
<v Speaker 2>that we've also been thinking about more recently is can

0:16:16.556 --> 0:16:18.916
<v Speaker 2>you give it a more detailed description of how you

0:16:18.956 --> 0:16:20.916
<v Speaker 2>want it to clean the bedroom? But we're not quite

0:16:20.916 --> 0:16:22.756
<v Speaker 2>there yet. So we could say clean the bedroom. We could

0:16:22.796 --> 0:16:25.316
<v Speaker 2>also tell it put the dirty clothes in the laundry basket,

0:16:26.236 --> 0:16:29.476
<v Speaker 2>so that would be kind of a subtask. Or we

0:16:29.516 --> 0:16:32.116
<v Speaker 2>can tell it commands like pick up the shirt,

0:16:32.556 --> 0:16:35.396
<v Speaker 2>put the shirt in the laundry basket. Then after we

0:16:35.476 --> 0:16:39.996
<v Speaker 2>tell it that command, then it will go off and

0:16:40.756 --> 0:16:44.476
<v Speaker 2>follow that command and actually in most cases realize that

0:16:44.516 --> 0:16:46.636
<v Speaker 2>command successfully in the real world.

0:16:47.156 --> 0:16:47.676
<v Speaker 1>How did it do.

0:16:48.476 --> 0:16:51.556
<v Speaker 2>So it depends on the task. The average success rate

0:16:51.596 --> 0:16:55.476
<v Speaker 2>was around eighty percent, so definitely room for improvement, and

0:16:56.036 --> 0:16:58.436
<v Speaker 2>in many scenarios it was able to be quite successful.

0:16:58.556 --> 0:17:01.756
<v Speaker 2>We also saw some failure modes where, for example,

0:17:01.796 --> 0:17:04.956
<v Speaker 2>if you're trying to put dishes into a sink, sometimes

0:17:05.076 --> 0:17:06.956
<v Speaker 2>one of the dishes was a cutting board, and picking

0:17:06.996 --> 0:17:09.036
<v Speaker 2>up a cutting board is actually pretty tricky for the

0:17:09.196 --> 0:17:11.516
<v Speaker 2>robot because you either need to slide it to the

0:17:11.676 --> 0:17:14.236
<v Speaker 2>edge of the counter and then grasp it or somehow

0:17:14.276 --> 0:17:17.916
<v Speaker 2>kind of get the finger underneath the cutting board.

0:17:18.276 --> 0:17:20.396
<v Speaker 2>And so sometimes it was able to do that successfully.

0:17:20.396 --> 0:17:24.076
<v Speaker 2>Sometimes it struggled and got stuck. The exciting thing though,

0:17:24.116 --> 0:17:26.436
<v Speaker 2>was that we were able to

0:17:26.476 --> 0:17:27.916
<v Speaker 2>kind of drop it in places it had never

0:17:27.956 --> 0:17:31.276
<v Speaker 2>been before, and it was doing things that are quite reasonable.

0:17:32.036 --> 0:17:33.836
<v Speaker 1>So what are you doing now, Like, what's the next

0:17:33.876 --> 0:17:35.996
<v Speaker 1>thing you're trying to get to?

0:17:35.996 --> 0:17:39.796
<v Speaker 2>Yeah, absolutely. So the next thing we're focusing on is reliability

0:17:40.116 --> 0:17:44.036
<v Speaker 2>and speed. So I mentioned like around eighty percent for

0:17:44.076 --> 0:17:46.956
<v Speaker 2>these tasks. How do we get that to ninety nine percent?

0:17:47.116 --> 0:17:49.716
<v Speaker 2>And I think that if we can get the reliability up,

0:17:49.916 --> 0:17:54.236
<v Speaker 2>that's kind of, in my mind, the main missing ingredient

0:17:54.476 --> 0:17:57.596
<v Speaker 2>before we can like really have these being like useful

0:17:58.236 --> 0:18:00.116
<v Speaker 2>in real world scenarios.

0:18:00.716 --> 0:18:03.316
<v Speaker 1>So getting to ninety nine percent is interesting. I mean,

0:18:03.396 --> 0:18:08.036
<v Speaker 1>I think of self driving cars right where it seemed

0:18:08.516 --> 0:18:11.236
<v Speaker 1>some time ago, I don't know, ten years ago, fifteen years ago,

0:18:11.316 --> 0:18:14.116
<v Speaker 1>like they were almost there, and I know they're more

0:18:14.156 --> 0:18:16.356
<v Speaker 1>almost there now. I know in San Francisco there really

0:18:16.436 --> 0:18:18.676
<v Speaker 1>are self driving cars, but they're still very much at

0:18:18.716 --> 0:18:22.036
<v Speaker 1>the margin of cars in the world, right, And it

0:18:22.076 --> 0:18:26.236
<v Speaker 1>does seem like almost there means different things in different settings,

0:18:26.276 --> 0:18:31.716
<v Speaker 1>But I don't know. Is it super hard to get

0:18:31.716 --> 0:18:33.996
<v Speaker 1>from eighty percent to ninety nine percent? Does the self

0:18:34.076 --> 0:18:38.716
<v Speaker 1>driving car example teach us anything for your work?

0:18:39.796 --> 0:18:42.756
<v Speaker 2>The self driving car analogy is pretty good. I do

0:18:42.836 --> 0:18:47.156
<v Speaker 2>think that fortunately, we may not need There are scenarios

0:18:47.156 --> 0:18:48.916
<v Speaker 2>where we may not need it to be quite as

0:18:48.956 --> 0:18:52.676
<v Speaker 2>reliable as cars. Cars there is a much much higher

0:18:52.876 --> 0:18:56.956
<v Speaker 2>safety risk. It's much easier to hurt people, and in

0:18:57.076 --> 0:18:59.036
<v Speaker 2>robots there are safety risks because you are in the

0:18:59.036 --> 0:19:03.356
<v Speaker 2>physical world. But it's easier to put in software precautions

0:19:03.396 --> 0:19:06.116
<v Speaker 2>in place and even hardware precautions in place to prevent

0:19:06.156 --> 0:19:08.396
<v Speaker 2>that as well, So that makes it a little bit easier.

0:19:08.396 --> 0:19:11.796
<v Speaker 1>I mean, ninety nine percent probably isn't good enough for cars, right,

0:19:11.796 --> 0:19:14.596
<v Speaker 1>They probably need more nines than that, whereas it may

0:19:14.596 --> 0:19:16.356
<v Speaker 1>well be good enough for a house cleaning robot.

0:19:16.156 --> 0:19:19.916
<v Speaker 2>Yeah, in certain circumstances. And yeah, like we're

0:19:19.916 --> 0:19:22.316
<v Speaker 2>also thinking about scenarios where maybe even less than that

0:19:22.396 --> 0:19:26.076
<v Speaker 2>is fine. And if we view humans and robots working together,

0:19:26.396 --> 0:19:29.436
<v Speaker 2>it's more about kind of helping the person complete the

0:19:29.436 --> 0:19:33.436
<v Speaker 2>task faster or complete the task more effectively. So I

0:19:33.436 --> 0:19:35.956
<v Speaker 2>think there might be scenarios like that, but still we

0:19:35.996 --> 0:19:39.076
<v Speaker 2>need the performance and reliability to be higher for the

0:19:39.156 --> 0:19:41.476
<v Speaker 2>robots to be faster in order to accomplish that.

0:19:44.676 --> 0:19:59.156
<v Speaker 1>We'll be back in just a minute. What do you

0:19:59.196 --> 0:20:02.436
<v Speaker 1>imagine as the initial real world use cases?

0:20:05.076 --> 0:20:07.236
<v Speaker 2>I don't know. There's a lot of examples of robotics

0:20:07.236 --> 0:20:11.196
<v Speaker 2>companies that have attempted to kind of start with

0:20:11.236 --> 0:20:16.156
<v Speaker 2>an application and hone in on that, and I think

0:20:16.196 --> 0:20:20.156
<v Speaker 2>the lesson from watching those companies is that you end

0:20:20.236 --> 0:20:23.596
<v Speaker 2>up then spending a lot of time on the problems

0:20:23.596 --> 0:20:26.956
<v Speaker 2>of that specific application and less on developing the sort

0:20:26.996 --> 0:20:28.796
<v Speaker 2>of generalist systems that we think in the long run

0:20:28.836 --> 0:20:31.596
<v Speaker 2>will be more effective. And so we're very focused on

0:20:32.276 --> 0:20:36.036
<v Speaker 2>understanding what are the core bottlenecks and the core missing

0:20:36.076 --> 0:20:38.876
<v Speaker 2>pieces for developing these generalist models, and we think that

0:20:38.916 --> 0:20:41.356
<v Speaker 2>if we had picked an application now, we would kind

0:20:41.356 --> 0:20:43.156
<v Speaker 2>of lose sight of that bigger problem because we need

0:20:43.156 --> 0:20:45.916
<v Speaker 2>to solve things that are specific to that application. So

0:20:46.076 --> 0:20:48.636
<v Speaker 2>we're very focused on what we think are like the

0:20:48.636 --> 0:20:53.876
<v Speaker 2>core technological challenges. We have certain tasks that we're working on.

0:20:53.916 --> 0:20:56.556
<v Speaker 2>Some of them have been home cleaning tasks. We also

0:20:56.636 --> 0:20:59.716
<v Speaker 2>have some more kind of industrial-like tasks as well,

0:20:59.956 --> 0:21:04.196
<v Speaker 2>just to instantiate and actually be iterating on robots. And

0:21:04.396 --> 0:21:09.396
<v Speaker 2>applications could range from things in homes to things in

0:21:09.476 --> 0:21:14.076
<v Speaker 2>workplaces to industrial settings. There's lots and lots of use

0:21:14.116 --> 0:21:18.716
<v Speaker 2>cases for intelligent robots and intelligent kind of physical machines.

0:21:19.556 --> 0:21:23.796
<v Speaker 1>What are some of the industrial tasks you've been working on.

0:21:24.476 --> 0:21:27.356
<v Speaker 2>One example that I mentioned before is inserting cables. There's

0:21:27.436 --> 0:21:31.236
<v Speaker 2>lots of use cases in data centers, for example, where

0:21:31.836 --> 0:21:36.716
<v Speaker 2>that's a challenging task. Another example is constructing cardboard boxes

0:21:36.756 --> 0:21:40.396
<v Speaker 2>and filling them with items. We've also done some packaging

0:21:40.436 --> 0:21:44.396
<v Speaker 2>tasks highly relevant to lots of different kind of shipping operations.

0:21:44.836 --> 0:21:47.516
<v Speaker 2>And then even folding clothes. It seems like a very

0:21:47.556 --> 0:21:50.556
<v Speaker 2>home task, but it turns out that there are companies

0:21:50.756 --> 0:21:54.316
<v Speaker 2>that need to fold like very large lots of clothing,

0:21:55.036 --> 0:21:57.996
<v Speaker 2>and so that's also something that in the long term

0:21:58.036 --> 0:22:01.316
<v Speaker 2>could be used in larger scale settings.

0:22:01.756 --> 0:22:07.916
<v Speaker 1>So I've read that you have open sourced your model

0:22:07.956 --> 0:22:11.556
<v Speaker 1>weights and given designs of robots to hardware companies, and

0:22:11.596 --> 0:22:14.916
<v Speaker 1>I'm interested in that and that set of decisions, right,

0:22:14.956 --> 0:22:17.756
<v Speaker 1>that set of sort of strategic decisions. Tell me about

0:22:17.796 --> 0:22:20.716
<v Speaker 1>that sort of giving away IP basically.

0:22:20.356 --> 0:22:23.596
<v Speaker 2>Right, yeah, yeah, definitely. So this is a really hard problem,

0:22:23.836 --> 0:22:26.676
<v Speaker 2>especially this longer term problem of developing a general system.

0:22:26.756 --> 0:22:32.996
<v Speaker 2>We think that the field is very young, and there's

0:22:33.316 --> 0:22:36.356
<v Speaker 2>like a couple of reasons. One is that we think

0:22:36.396 --> 0:22:38.236
<v Speaker 2>that the field needs to mature, and we think that

0:22:38.756 --> 0:22:41.876
<v Speaker 2>having more people being kind of competent with using robots

0:22:41.916 --> 0:22:44.916
<v Speaker 2>and using this kind of technology will be beneficial in

0:22:44.916 --> 0:22:47.476
<v Speaker 2>the long term for the company, and by open sourcing things,

0:22:47.516 --> 0:22:49.916
<v Speaker 2>we make it easier for people to do that. And

0:22:49.956 --> 0:22:52.516
<v Speaker 2>then the second thing is, like the models that we

0:22:52.596 --> 0:22:55.996
<v Speaker 2>develop right now, they're very early, and the models that

0:22:56.076 --> 0:22:59.916
<v Speaker 2>we'll be developing one to three years from now are

0:22:59.956 --> 0:23:02.396
<v Speaker 2>going to be far far more capable than the ones

0:23:02.436 --> 0:23:05.156
<v Speaker 2>that we have now. And so it's kind of like

0:23:05.156 --> 0:23:09.276
<v Speaker 2>like equivalent to like OpenAI open sourcing GPT-2,

0:23:09.516 --> 0:23:13.236
<v Speaker 2>GPT-3. They actually didn't open source GPT-3, but like,

0:23:13.596 --> 0:23:15.556
<v Speaker 2>I think that they would still be in an excellent

0:23:15.596 --> 0:23:17.356
<v Speaker 2>spot today if they had.

0:23:19.076 --> 0:23:22.836
<v Speaker 1>Like what could go wrong that would either prevent you

0:23:22.956 --> 0:23:25.676
<v Speaker 1>as a company from succeeding or even hold back the

0:23:25.716 --> 0:23:28.756
<v Speaker 1>field in general?

0:23:28.836 --> 0:23:31.996
<v Speaker 2>I don't think we entirely know the scale of data that we need for

0:23:32.676 --> 0:23:36.276
<v Speaker 2>getting really capable models. And there's a little bit of

0:23:36.276 --> 0:23:39.116
<v Speaker 2>a chicken and egg problem where it's a lot easier

0:23:39.116 --> 0:23:41.676
<v Speaker 2>to collect data once you have a really good model.

0:23:42.116 --> 0:23:43.716
<v Speaker 2>It took like large amounts of data.

0:23:43.516 --> 0:23:45.196
<v Speaker 1>Right, or if there were thousands of robots out in

0:23:45.236 --> 0:23:47.036
<v Speaker 1>the world running your model, they would just make an

0:23:47.076 --> 0:23:50.036
<v Speaker 1>incredible amount of data coming into you every day, right.

0:23:50.356 --> 0:23:53.676
<v Speaker 2>Yeah, yeah, exactly. So that's one thing. I'm actually

0:23:53.796 --> 0:23:57.116
<v Speaker 2>maybe a little bit less concerned about that myself.

0:23:57.116 --> 0:23:58.796
<v Speaker 2>And then I think the other thing is just that

0:23:58.796 --> 0:24:02.396
<v Speaker 2>there are technological challenges to getting these things to work

0:24:02.436 --> 0:24:05.316
<v Speaker 2>really well. I think we've had incredible

0:24:05.356 --> 0:24:09.476
<v Speaker 2>progress over the last year and two months, the last

0:24:09.476 --> 0:24:12.636
<v Speaker 2>like fourteen months. I think since we've started, probably more

0:24:12.676 --> 0:24:17.236
<v Speaker 2>progress than I was expecting, honestly compared to when we

0:24:17.236 --> 0:24:20.916
<v Speaker 2>started the company. I think it's like wild that we

0:24:20.956 --> 0:24:22.676
<v Speaker 2>were able to get a robot to like unload and

0:24:22.676 --> 0:24:25.676
<v Speaker 2>fold laundry like a ten minute long task.

0:24:25.596 --> 0:24:30.196
<v Speaker 1>And folding laundry is like a famously hard robot problem, right,

0:24:30.236 --> 0:24:32.636
<v Speaker 1>Like it's the one that people in robotics talk about

0:24:32.916 --> 0:24:35.796
<v Speaker 1>when they talk about things people think are easy are

0:24:35.836 --> 0:24:37.636
<v Speaker 1>actually hard for robots, right.

0:24:37.596 --> 0:24:39.796
<v Speaker 2>Yeah, absolutely absolutely. I mean you have to deal with

0:24:39.836 --> 0:24:42.836
<v Speaker 2>all sorts of variability in how clothes can be crumpled

0:24:42.836 --> 0:24:45.516
<v Speaker 2>on each other. And also it's like there's even like

0:24:45.636 --> 0:24:47.516
<v Speaker 2>really small, minor things you need to do in order

0:24:47.556 --> 0:24:49.036
<v Speaker 2>to like actually get it to be flat on the

0:24:49.076 --> 0:24:52.836
<v Speaker 2>table and folded nicely and even stacked. And as the

0:24:52.836 --> 0:24:55.476
<v Speaker 2>task gets longer as well, there are more opportunities to

0:24:55.516 --> 0:24:58.836
<v Speaker 2>make mistakes, more opportunities to get stuck. And so if

0:24:58.836 --> 0:25:00.676
<v Speaker 2>you're doing a task that takes ten minutes, in those

0:25:00.676 --> 0:25:02.676
<v Speaker 2>ten minutes, there's many many times where the robot can

0:25:02.716 --> 0:25:06.316
<v Speaker 2>make a mistake that it can't recover from or just

0:25:06.316 --> 0:25:08.276
<v Speaker 2>get stuck or something like that. And so being able

0:25:08.316 --> 0:25:10.956
<v Speaker 2>to do such a task starts to kind of point

0:25:10.956 --> 0:25:13.676
<v Speaker 2>at the resilience that these models can have by recovering

0:25:13.756 --> 0:25:16.476
<v Speaker 2>from those mistakes. Uh huh, so when we were first

0:25:16.516 --> 0:25:20.316
<v Speaker 2>trying to fold laundry, like, one of the common failure

0:25:20.356 --> 0:25:23.356
<v Speaker 2>modes is that it would fold the laundry like very

0:25:23.356 --> 0:25:26.116
<v Speaker 2>well by my standards at the time, I would be

0:25:26.196 --> 0:25:28.116
<v Speaker 2>very very happy with the robot, and then it would

0:25:28.276 --> 0:25:30.836
<v Speaker 2>push the entire stack of laundry onto the ground.

0:25:32.756 --> 0:25:35.476
<v Speaker 1>Sort of like teaching a toddler to fold clothes.

0:25:36.236 --> 0:25:37.436
<v Speaker 2>Yeah, yeah, exactly.

0:25:37.636 --> 0:25:43.556
<v Speaker 1>Was there a particular moment when you saw a robot

0:25:43.636 --> 0:25:46.236
<v Speaker 1>using your model fold clothes for ten minutes and it worked?

0:25:46.756 --> 0:25:50.356
<v Speaker 2>Yeah. First off, we started with just folding a shirt

0:25:50.516 --> 0:25:52.516
<v Speaker 2>starting flat on the table. We got that to work

0:25:52.556 --> 0:25:54.596
<v Speaker 2>pretty quickly. That turns out to be pretty easy,

0:25:55.196 --> 0:25:57.156
<v Speaker 2>and I wasn't too surprised by that. And then we

0:25:57.276 --> 0:25:59.756
<v Speaker 2>moved from that to starting it in like just a

0:25:59.836 --> 0:26:02.996
<v Speaker 2>random ball, like some sort of crumpled position on the table,

0:26:03.156 --> 0:26:04.836
<v Speaker 2>and then you have to flatten and then fold it,

0:26:04.916 --> 0:26:07.956
<v Speaker 2>and that makes the problem dramatically harder because of all

0:26:07.956 --> 0:26:10.676
<v Speaker 2>the variability having to figure out how to flatten it.

0:26:11.236 --> 0:26:14.796
<v Speaker 2>We were kind of stuck on that problem for at

0:26:14.876 --> 0:26:18.596
<v Speaker 2>least a couple of months, where, with everything we were trying, the

0:26:18.636 --> 0:26:20.956
<v Speaker 2>success rate of the robot was zero percent. It wasn't

0:26:20.956 --> 0:26:24.836
<v Speaker 2>able to really make progress on it, and we started

0:26:24.836 --> 0:26:28.676
<v Speaker 2>to see signs of life I think in August or

0:26:28.716 --> 0:26:33.196
<v Speaker 2>September of last year, where we tried a new recipe

0:26:33.236 --> 0:26:35.996
<v Speaker 2>where we would continue to train the model on a

0:26:36.076 --> 0:26:39.716
<v Speaker 2>curated part of the data that was following a consistent strategy,

0:26:40.436 --> 0:26:43.516
<v Speaker 2>and that sort of high quality post training is what

0:26:43.676 --> 0:26:46.116
<v Speaker 2>really seemed to make the model work better. And then

0:26:46.236 --> 0:26:48.436
<v Speaker 2>the moment that I was most excited about was the

0:26:48.516 --> 0:26:52.316
<v Speaker 2>first time that I saw the model flatten and fold

0:26:52.396 --> 0:26:54.076
<v Speaker 2>and stack five items in a row.

0:26:54.396 --> 0:26:54.596
<v Speaker 1>Yeah.

0:26:54.836 --> 0:26:56.796
<v Speaker 2>I just remember going home that night and being like

0:26:56.876 --> 0:27:00.196
<v Speaker 2>so excited. It seemed like we had just like figured

0:27:00.196 --> 0:27:02.116
<v Speaker 2>out this big missing puzzle piece.

0:27:02.436 --> 0:27:04.996
<v Speaker 1>So I was asking you why might it not work

0:27:05.076 --> 0:27:07.076
<v Speaker 1>or what might slow the field down? And then we

0:27:07.436 --> 0:27:10.636
<v Speaker 1>talked about the happy shirt story. But if in five

0:27:10.716 --> 0:27:12.996
<v Speaker 1>years things didn't progress as quickly as you thought, what

0:27:14.596 --> 0:27:15.316
<v Speaker 1>might have happened.

0:27:16.316 --> 0:27:18.756
<v Speaker 2>I mentioned that I think that incorporating practice, like allowing

0:27:18.756 --> 0:27:22.276
<v Speaker 2>the robot to practice the task, should be really

0:27:22.276 --> 0:27:26.556
<v Speaker 2>helpful for allowing robots to get better. We don't know

0:27:26.556 --> 0:27:30.076
<v Speaker 2>what exactly that recipe will look like, and so it's

0:27:30.116 --> 0:27:33.956
<v Speaker 2>like a research problem, and with any sort of research problem,

0:27:34.676 --> 0:27:36.756
<v Speaker 2>you don't know exactly how hard the solution is going

0:27:36.796 --> 0:27:38.596
<v Speaker 2>to be, and I think that there are some other

0:27:39.156 --> 0:27:41.836
<v Speaker 2>more nuanced unknowns as well that are somewhat similar to that.

0:27:41.956 --> 0:27:45.596
<v Speaker 2>And we have a large number of very talented researchers

0:27:45.836 --> 0:27:48.196
<v Speaker 2>on our team because we think that there are some

0:27:48.236 --> 0:27:51.436
<v Speaker 2>of these unsolved breakthroughs that are going to be needed

0:27:51.476 --> 0:27:53.516
<v Speaker 2>to really truly solve this problem.

0:27:54.276 --> 0:28:01.476
<v Speaker 1>So, if it does work well and things progress in

0:28:01.556 --> 0:28:05.316
<v Speaker 1>that universe, what would you be worried about?

0:28:06.116 --> 0:28:09.036
<v Speaker 2>Good question. I mean, if things work well, I shouldn't

0:28:09.036 --> 0:28:12.236
<v Speaker 2>be too worried. In general. I do think that it's

0:28:12.356 --> 0:28:16.316
<v Speaker 2>very easy in general to underestimate the challenges around actually

0:28:16.356 --> 0:28:20.756
<v Speaker 2>deploying and disseminating technology. That takes time, and when the

0:28:20.836 --> 0:28:24.316
<v Speaker 2>technology doesn't exist yet, that means that like the world

0:28:24.436 --> 0:28:26.396
<v Speaker 2>is not in a place that is like ready for

0:28:26.436 --> 0:28:29.636
<v Speaker 2>that technology. I think that there's a lot of unknowns there.

0:28:29.956 --> 0:28:33.996
<v Speaker 1>I mean, one of the striking things to me about, say,

0:28:34.076 --> 0:28:36.596
<v Speaker 1>language models, is the people who know the most about

0:28:36.596 --> 0:28:39.036
<v Speaker 1>them seem to be the most worried about them, which

0:28:39.076 --> 0:28:42.196
<v Speaker 1>is generally not the case. I think historically with technology,

0:28:42.276 --> 0:28:47.596
<v Speaker 1>right, with the possible exception of the atomic bomb, and so

0:28:47.636 --> 0:28:51.036
<v Speaker 1>I'm curious. I mean those kinds of worries, like do

0:28:51.116 --> 0:28:53.356
<v Speaker 1>you share them? Are there worries you have about developing

0:28:53.356 --> 0:28:57.476
<v Speaker 1>a foundation model for robots, about bad actors using it, even?

0:28:57.636 --> 0:29:01.796
<v Speaker 2>I do think that, like, yeah, there's plenty of

0:29:01.796 --> 0:29:05.476
<v Speaker 2>technology that has dual uses, and I think there are

0:29:06.636 --> 0:29:12.836
<v Speaker 2>applications of technologies that are harmful. I think that a

0:29:12.876 --> 0:29:17.916
<v Speaker 2>lot of the concerns in the language model community stem

0:29:17.956 --> 0:29:24.116
<v Speaker 2>from imbuing these systems with greater autonomy. And I think

0:29:24.156 --> 0:29:28.956
<v Speaker 2>that I work like hands on with the robots quite

0:29:28.956 --> 0:29:32.636
<v Speaker 2>a bit, and I don't see a world in which

0:29:32.876 --> 0:29:35.956
<v Speaker 2>they will be taking over in any way. It's very

0:29:35.996 --> 0:29:38.836
<v Speaker 2>easy to just like, well, with our current iteration of robots,

0:29:38.836 --> 0:29:40.676
<v Speaker 2>to just like if we threw some water on it,

0:29:40.716 --> 0:29:42.756
<v Speaker 2>the robot would be in trouble.

0:29:42.876 --> 0:29:46.716
<v Speaker 1>So that might be a problem for you, but I'm

0:29:46.756 --> 0:29:48.316
<v Speaker 1>sure you could solve that.

0:29:48.356 --> 0:29:50.356
<v Speaker 2>We're working on it. So we actually do have a new

0:29:50.356 --> 0:29:52.996
<v Speaker 2>iteration that is actually a lot more waterproof. But

0:29:53.436 --> 0:29:54.716
<v Speaker 2>it's just not a concern that I share.

0:29:54.876 --> 0:29:58.756
<v Speaker 1>Okay, interesting basically just because you think we can whatever

0:29:59.196 --> 0:30:00.436
<v Speaker 1>turn it off if we need to.

0:30:01.036 --> 0:30:03.516
<v Speaker 2>Yeah, and yeah, and I think, yeah, there's always going

0:30:03.596 --> 0:30:05.796
<v Speaker 2>to be dual use concerns, but I think that the

0:30:06.156 --> 0:30:09.396
<v Speaker 2>pros of the technology outweigh some of the downsides.

0:30:09.196 --> 0:30:11.796
<v Speaker 1>Well, give me the happy story, then, like in what

0:30:11.796 --> 0:30:13.956
<v Speaker 1>number of years should we choose for a happy story?

0:30:14.036 --> 0:30:15.396
<v Speaker 1>Ten? Is ten too soon?

0:30:16.036 --> 0:30:17.516
<v Speaker 2>I don't want to put a number to it. I

0:30:17.516 --> 0:30:21.716
<v Speaker 2>think that with research, you don't know exactly how

0:30:21.756 --> 0:30:25.756
<v Speaker 2>long things will take. And I can envision a world

0:30:25.836 --> 0:30:30.876
<v Speaker 2>where, when you're developing hardware, it's not too

0:30:30.956 --> 0:30:34.276
<v Speaker 2>hard to actually teach it to do something, and teach

0:30:34.316 --> 0:30:38.236
<v Speaker 2>it to do something useful, rather than just having machines

0:30:38.316 --> 0:30:43.796
<v Speaker 2>that are not particularly intelligent, like dishwashers and laundry machines

0:30:43.836 --> 0:30:44.676
<v Speaker 2>and so forth.

0:30:45.676 --> 0:30:49.236
<v Speaker 1>Go bigger, if you would. Like, what

0:30:49.236 --> 0:30:51.316
<v Speaker 1>would people be teaching robots to do in

0:30:51.316 --> 0:30:52.996
<v Speaker 1>that world?

0:30:53.196 --> 0:30:54.916
<v Speaker 2>I guess if we were to go bigger, I think that

0:30:55.036 --> 0:30:59.036
<v Speaker 2>there's a lot of challenges around helping people as

0:30:59.076 --> 0:31:02.316
<v Speaker 2>they age, allowing them to be more independent. That's

0:31:02.356 --> 0:31:05.636
<v Speaker 2>like a huge one. I think that I don't know, manufacturing,

0:31:05.676 --> 0:31:08.196
<v Speaker 2>there's all sorts of places where like there are abusive

0:31:08.276 --> 0:31:11.076
<v Speaker 2>labor practices and we can maybe like be able to

0:31:11.076 --> 0:31:15.476
<v Speaker 2>eliminate those if it's a robot instead of a human. Yeah, many, many,

0:31:15.516 --> 0:31:17.476
<v Speaker 2>many examples. And I think that there's also even things

0:31:17.476 --> 0:31:20.476
<v Speaker 2>that are even hard to imagine because the technology doesn't exist.

0:31:20.516 --> 0:31:22.756
<v Speaker 2>So a lot of the things that I'm thinking about

0:31:22.796 --> 0:31:26.556
<v Speaker 2>are robots helping humans in different circumstances to allow them

0:31:26.556 --> 0:31:30.556
<v Speaker 2>to be more productive. But once something exists, like you often,

0:31:30.836 --> 0:31:32.876
<v Speaker 2>like people are creative and come up with new ways

0:31:32.876 --> 0:31:34.316
<v Speaker 2>of how that's used.

0:31:37.116 --> 0:31:49.196
<v Speaker 1>We'll be back in a minute with the lightning round. Great,

0:31:49.276 --> 0:31:54.116
<v Speaker 1>let's finish with the lightning round. What's one thing that

0:31:54.196 --> 0:31:58.796
<v Speaker 1>working with robots has caused you to appreciate about the

0:31:58.876 --> 0:31:59.516
<v Speaker 1>human body?

0:32:00.836 --> 0:32:02.196
<v Speaker 2>Our skin is pretty amazing.

0:32:02.676 --> 0:32:07.556
<v Speaker 1>Huh. Well, so we didn't talk about I mean a

0:32:07.636 --> 0:32:10.836
<v Speaker 1>sense of touch, or of heat or of cold, right,

0:32:10.876 --> 0:32:13.556
<v Speaker 1>I mean presumably the models you're building, the robots you're

0:32:13.596 --> 0:32:17.076
<v Speaker 1>using don't have that, but they could, right, they could

0:32:17.196 --> 0:32:20.676
<v Speaker 1>have a sense of touch. Is anyone working on that?

0:32:20.876 --> 0:32:21.836
<v Speaker 1>Is that of interest to you?

0:32:22.676 --> 0:32:25.036
<v Speaker 2>Lots of people working on it. I think it's pretty interesting.

0:32:25.236 --> 0:32:28.516
<v Speaker 2>I think that the hardware technology is not super mature

0:32:28.716 --> 0:32:30.156
<v Speaker 2>compared to where I'd like for it to be in

0:32:30.236 --> 0:32:33.756
<v Speaker 2>terms of how robust it is, and the cheapness and

0:32:33.796 --> 0:32:37.156
<v Speaker 2>the resolution. That said, we actually put cameras on

0:32:37.236 --> 0:32:39.996
<v Speaker 2>the wrists of our robot to help it get some

0:32:40.036 --> 0:32:42.516
<v Speaker 2>sort of tactile feedback. And for example, if

0:32:42.516 --> 0:32:45.076
<v Speaker 2>you like visually look at your finger as you make

0:32:45.116 --> 0:32:48.156
<v Speaker 2>contact with an object, you can see it deform

0:32:48.756 --> 0:32:51.796
<v Speaker 2>around that object, and you can actually just by looking

0:32:51.796 --> 0:32:55.076
<v Speaker 2>at your finger get some notion of tactile feedback similar

0:32:55.076 --> 0:32:57.196
<v Speaker 2>to what our skin gets. Yeah, and cameras are cheap,

0:32:57.196 --> 0:33:01.236
<v Speaker 2>really easy, robust, way more robust and cheap than existing

0:33:01.276 --> 0:33:02.596
<v Speaker 2>technology for tactile sensing.

0:33:04.716 --> 0:33:08.476
<v Speaker 1>I've heard you say that humanoid robots are overrated, and

0:33:08.516 --> 0:33:09.876
<v Speaker 1>I'm curious, why do you think that.

0:33:11.196 --> 0:33:14.956
<v Speaker 2>I think that simplicity is really helpful and important when

0:33:14.996 --> 0:33:19.596
<v Speaker 2>trying to develop technology. When you introduce more complexity than's needed,

0:33:19.636 --> 0:33:22.596
<v Speaker 2>it slows you down a lot. And I think that

0:33:22.876 --> 0:33:27.116
<v Speaker 2>the complexity that humanoids introduce. Yeah, I think that if

0:33:27.116 --> 0:33:29.316
<v Speaker 2>all of the robots we were working with were humanoids,

0:33:29.516 --> 0:33:31.996
<v Speaker 2>I think that we wouldn't have made anywhere near the

0:33:31.996 --> 0:33:35.236
<v Speaker 2>progress that we've made because we'd be dealing with additional challenges.

0:33:35.636 --> 0:33:38.636
<v Speaker 2>I also think that optimizing for ease of data collection

0:33:38.916 --> 0:33:41.276
<v Speaker 2>is really important in a world where we need data,

0:33:41.596 --> 0:33:45.396
<v Speaker 2>and it's a lot harder to collect and operate all

0:33:45.436 --> 0:33:49.236
<v Speaker 2>of the different joints and motors of a humanoid than

0:33:49.276 --> 0:33:51.476
<v Speaker 2>it is to control a simpler robot.

0:33:52.476 --> 0:33:54.236
<v Speaker 1>Do you anthropomorphize robots?

0:33:55.236 --> 0:33:58.676
<v Speaker 2>I hate it when people anthropomorphize robots. I

0:33:58.716 --> 0:34:03.156
<v Speaker 2>think that it is misleading because the failure modes that

0:34:03.236 --> 0:34:05.596
<v Speaker 2>robots have are very different from the failure modes that

0:34:05.636 --> 0:34:08.836
<v Speaker 2>people have, and it misleads people into thinking that it's

0:34:08.876 --> 0:34:11.196
<v Speaker 2>going to behave in the way that people behave.

0:34:12.196 --> 0:34:13.836
<v Speaker 1>Like like in what way?

0:34:14.276 --> 0:34:16.516
<v Speaker 2>Oh like, if you see a robot doing something like

0:34:16.556 --> 0:34:20.036
<v Speaker 2>doing a backflip, like or even folding laundry, you kind

0:34:20.036 --> 0:34:21.996
<v Speaker 2>of assume that anything like like if you saw a

0:34:22.036 --> 0:34:23.796
<v Speaker 2>person do that, then they probably could do a lot

0:34:23.796 --> 0:34:26.236
<v Speaker 2>of other things too. And if you anthropomorphize

0:34:26.236 --> 0:34:28.756
<v Speaker 2>the robot, then you assume that it, like the capabilities

0:34:28.756 --> 0:34:31.436
<v Speaker 2>that you see are representative as if it were like

0:34:31.476 --> 0:34:34.716
<v Speaker 2>a human, and that it could do a backflip anywhere,

0:34:35.036 --> 0:34:38.876
<v Speaker 2>or that it could fold laundry anywhere with any item

0:34:38.876 --> 0:34:39.676
<v Speaker 2>of clothing.

0:34:39.516 --> 0:34:41.396
<v Speaker 1>Or surely you would think a robot that could do

0:34:41.436 --> 0:34:44.476
<v Speaker 1>a backflip could fold a shirt, but no.

0:34:45.196 --> 0:34:49.956
<v Speaker 2>Exactly exactly, so sometimes it's fun to like assign emotions

0:34:49.956 --> 0:34:51.756
<v Speaker 2>to some of the things, or say the robot's having

0:34:51.796 --> 0:34:54.476
<v Speaker 2>a bad day, because certainly it feels like that sometimes.

0:34:54.676 --> 0:34:58.356
<v Speaker 2>But when it kind of moves beyond fun and jokes,

0:34:58.836 --> 0:35:01.316
<v Speaker 2>it might have consequences that I don't think makes sense.

0:35:02.836 --> 0:35:06.636
<v Speaker 1>I read that there was a researcher who said they

0:35:06.676 --> 0:35:10.276
<v Speaker 1>would retire if a robot tied a shoelace, yes, and

0:35:10.276 --> 0:35:13.036
<v Speaker 1>then one of your robots tied a shoelace, and I

0:35:13.076 --> 0:35:18.476
<v Speaker 1>guess they didn't retire. But I'm curious. What would you

0:35:18.676 --> 0:35:22.156
<v Speaker 1>need to see a robot do to retire.

0:35:23.516 --> 0:35:26.916
<v Speaker 2>Hmm, I don't know. I guess one example that I've

0:35:26.916 --> 0:35:29.676
<v Speaker 2>given before that I would love to see a robot do.

0:35:29.756 --> 0:35:32.716
<v Speaker 2>I don't think this is quite retirement level, but being

0:35:32.756 --> 0:35:34.876
<v Speaker 2>able to go into a kitchen that it has never been

0:35:34.916 --> 0:35:39.196
<v Speaker 2>in before and make a bowl of cereal. Pretty basic,

0:35:40.236 --> 0:35:42.356
<v Speaker 2>especially compared to doing a backflip. I cannot do a

0:35:42.356 --> 0:35:44.396
<v Speaker 2>backflip myself, but I could make a bowl of cereal.

0:35:44.716 --> 0:35:47.476
<v Speaker 2>But it requires being able to find objects in the environment,

0:35:47.516 --> 0:35:51.036
<v Speaker 2>being able to interact with delicate objects like a cereal box,

0:35:51.596 --> 0:35:54.116
<v Speaker 2>maybe even use tools in order to open the cereal box.

0:35:54.516 --> 0:35:58.396
<v Speaker 2>Pouring liquids. Yeah, so that's a task that I love,

0:35:58.636 --> 0:36:00.796
<v Speaker 2>and I could actually even see us being able to

0:36:01.116 --> 0:36:04.276
<v Speaker 2>show a demo of that without too much difficulty actually

0:36:04.716 --> 0:36:06.676
<v Speaker 2>if we put our mind to it and collected

0:36:06.756 --> 0:36:09.276
<v Speaker 2>data for it. So it actually is, I think, more

0:36:09.316 --> 0:36:12.756
<v Speaker 2>within reach than maybe I imagined a few years ago.

0:36:12.876 --> 0:36:16.756
<v Speaker 1>Just as you're thinking about it, it's getting closer. You're like, oh, wait,

0:36:16.796 --> 0:36:17.516
<v Speaker 1>we could do that.

0:36:18.396 --> 0:36:20.676
<v Speaker 2>Yeah. I mean we've actually collected data of pouring cereal,

0:36:21.276 --> 0:36:23.516
<v Speaker 2>like opening a cereal box and pouring it into a bowl.

0:36:23.916 --> 0:36:26.916
<v Speaker 2>We haven't yet done liquid handling and pouring, but I

0:36:26.916 --> 0:36:28.996
<v Speaker 2>think we're actually going to do it this week on

0:36:29.076 --> 0:36:32.076
<v Speaker 2>the robot. I asked the hardware team to make a

0:36:32.476 --> 0:36:35.636
<v Speaker 2>waterproof robot. So we're not too far. A lot of

0:36:35.676 --> 0:36:38.836
<v Speaker 2>the pieces are coming together. I also, I love working

0:36:38.836 --> 0:36:41.796
<v Speaker 2>with robots and so, and I'm also fairly young, I

0:36:41.796 --> 0:36:46.036
<v Speaker 2>think not too old, and so I don't imagine myself

0:36:46.036 --> 0:36:47.036
<v Speaker 2>retiring anytime soon.

0:36:53.996 --> 0:36:57.156
<v Speaker 1>Chelsea Finn is a Stanford professor and the co founder

0:36:57.196 --> 0:37:01.596
<v Speaker 1>of Physical Intelligence. You can email us at problem at

0:37:01.596 --> 0:37:04.556
<v Speaker 1>pushkin dot fm, and please do email us. I read

0:37:04.596 --> 0:37:07.996
<v Speaker 1>all the emails. Today's show was produced by Gabriel Hunter Chang,

0:37:08.516 --> 0:37:12.836
<v Speaker 1>edited by Alexander Garreton and engineered by Sarah Bruguerrett. I'm

0:37:12.876 --> 0:37:15.236
<v Speaker 1>Jacob Goldstein and we'll be back next week with another

0:37:15.236 --> 0:37:16.236
<v Speaker 1>episode of What's Your Problem.