Speaker 1: Pushkin. In a metaphorical sense, AI is everywhere. It can write essays, it can do your taxes, it can design drugs, it can make movies. But in a literal sense, AI is not everywhere. You know, a large language model can tell you, whatever, twenty-seven ways to fold your shirts and put them in the drawer, but there's no robot that you can buy that can actually fold your shirts and put them in the drawer. At some point, though, maybe at some point in the not that distant future, there will be a robot that can use AI to learn how to fold your shirts and put them in the drawer, or, you know, cook lasagna, pack boxes, plug in cables. In other words, there will be a robot that can use AI to learn how to do basically anything.

Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Chelsea Finn. She's a professor at Stanford and the co-founder of a company called Physical Intelligence, aka Pi. Chelsea's problem is this: can you build an AI model that will bring AI to robots? Or, as she puts it:

Speaker 2: We're trying to develop a model that can control any robot to do any task anywhere.

Speaker 1: Physical Intelligence was founded just last year, but the company has already raised over four hundred million dollars. Investors include Jeff Bezos and OpenAI. The company has raised so much money in part because what they're trying to do is so hard. Motor skills, the ability to move and find ways to fold the shirt, to plug in a cable: they feel simple to us, easy, basic. But Chelsea told me basic motor skills are in fact wildly complex.

Speaker 2: All of the motor control that we do with our body, with our hands, with our legs, our feet, a lot of it we don't think about when we do it. It actually is incredibly complicated what we do. This is actually like a really, really hard problem to develop in AI systems, in robots, despite it seeming so simple.
And the reasons for that are, first, that it actually is inherently very complex, and second, that we don't have tons and tons of data of doing this, in part because it's so basic to humans as well.

Speaker 1: Right, let's talk about the data side, because that seems like really the story, right, the big challenge. And it's particularly interesting in the context of large language models and computer vision, which really seem to have emerged in a weird way as a consequence of the Internet. Right, just because we happen to have this crazy amount of data of words and pictures on the Internet, we were able to train language models and computer vision models. But we don't have that for robots, right. There is no data set of training data for robots, which is like the big challenge for you and for robotics in general, it seems.

Speaker 2: Yeah, so we don't have an open internet of how to control motors to do like even really basic things. Maybe the closest thing we have is videos of people doing things, and perhaps that could be useful. But at the same time, if I watch videos of, like, Roger Federer playing tennis, you can't just become an amazing tennis player as a result of that. And likewise, just with videos of people doing things, it's very hard to actually extract the motor control behind that. And so that lack of data, that scarcity of data, makes it in some ways a very different problem than in language and computer vision. And I think that we should still learn a lot of things from language and computer vision and collect large data sets like that. It opens up new challenges and new possibilities on that front, and I think that in the long run we should be able to get large amounts of data, just like how in autonomous driving we have lots of data of cars driving around very effectively.
Robots, too, could be in the world collecting data, learning about how to pick up mustard and put it on a hot dog bun, or learning how to open a cabinet to put some objects away. We can get that sort of data, but it's not given to us for free.

Speaker 1: You still have this core problem, which is there is no giant trove of physical-reality data that you can train your model on. Right? That's the great big challenge, it seems. What do you do about that? How do you start to approach that?

Speaker 2: Yeah, so we're starting off by collecting data through teleoperation, where people are controlling the robot to do tasks, and then you don't just get video data. You get the videos alongside what are the actions or the motor commands needed to actually accomplish those tasks. We've collected data in our own office. We've also collected data in homes across San Francisco, and we also have a very modest warehouse. In some ways our current operation is actually rather small, given that we're a little over a year old at this point.

Speaker 1: Like, what's actually happening? Like, if I went into your warehouse and somebody was doing teleoperation, what would I see? What would it look like?

Speaker 2: Yeah, so it's a little bit like controlling a puppet. So the person who's operating the robot, they are holding in some ways a set of robot arms, but they're very lightweight robot arms, and we use those to measure the positions of joints.

Speaker 1: It's almost like an elaborate controller for a video game or something. It's like that, it's not actually a robot arm, right? It's a thing you control to sort of play the robot, to make the robot move.

Speaker 2: Yeah, exactly, exactly. And then we record that and directly translate those controls over to the robot. We have some robots that are just robot arms, where you're only controlling the robot arm. It's mounted to a table or something like that.
But we also have what we call mobile manipulators that have wheels and robot arms, and you can control both how the robot drives around as well as how the arms move. And we're doing tasks like wiping down counters, folding laundry, putting dishes into dishwashers, plugging cables into data center racks, assembling cardboard boxes, lots and lots of different tasks that might be useful for robots to do, and recording all the data. So we have cameras on the robots, there are sensors on the joints and on the motors of the robots as well, and we record that in a synchronized way across time.

Speaker 1: So when you do it, it's kind of like a real-world video game, like you're moving your arms in these things, and in basically real time the robot arm is moving and picking up the thing you wanted to pick up. And like, what's it like? Is there like a curve where, like, at the beginning it's really bad? Sort of tell me, talk me through an instance.

Speaker 2: It depends on the person. So some people can pick it up really, really quickly. Some people are a bit slower to pick it up. I pride myself on being a pretty good operator, and so I have done tasks as complex as peeling a hard-boiled egg with the robot.

Speaker 1: How are you, how are you at peeling a hard-boiled egg with your hands?

Speaker 2: It's pretty hard with my own hands too, yeah, and with the robot it's even harder.

Speaker 1: Tell me about the robot peeling a hard-boiled egg, because that sounds like a hard one.

Speaker 2: Yeah. So basically all the robots that we're using have kind of pincher grippers. They're called parallel jaw grippers, where there's just one degree of freedom, like open-close, two pincers.

Speaker 1: It's basically two pincers, like two...

Speaker 2: Two pincers, two arms. Yeah, exactly, and I've used that exact setup.
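To make the teleoperation setup Chelsea describes a bit more concrete, here is a minimal illustrative sketch of leader-follower data logging. It is a toy under stated assumptions, not Physical Intelligence's actual software: the LeaderArm, FollowerArm, and Camera classes below are hypothetical stand-ins for real hardware interfaces. The point is just that at every timestep the operator's lightweight leader arm is read, the real robot is commanded to match it, and a synchronized record of camera frame, joint state, and motor command is stored.

```python
# Illustrative sketch only: leader-follower teleoperation with synchronized logging.
# LeaderArm, FollowerArm, and Camera are hypothetical stand-ins, not a real robot API.
import time
import numpy as np

class LeaderArm:
    """Stub for the lightweight arm the human operator moves (six joints)."""
    def read_joint_positions(self):
        return np.random.uniform(-1.0, 1.0, size=6)  # pretend encoder readings

class FollowerArm:
    """Stub for the actual robot arm being puppeted."""
    def __init__(self):
        self._q = np.zeros(6)
    def command_joint_positions(self, q):
        self._q = np.asarray(q, dtype=float)  # a real driver would send motor commands here
    def read_joint_positions(self):
        return self._q.copy()

class Camera:
    """Stub camera returning a fake RGB frame."""
    def read(self):
        return np.zeros((224, 224, 3), dtype=np.uint8)

class EpisodeLog:
    """Time-aligned frames, joint states, and actions for one demonstration."""
    def __init__(self):
        self.frames, self.joints, self.actions, self.stamps = [], [], [], []
    def record(self, frame, joints, action, stamp):
        self.frames.append(frame)
        self.joints.append(joints)
        self.actions.append(action)
        self.stamps.append(stamp)

def collect_episode(leader, follower, camera, hz=50, seconds=30):
    """Mirror the operator's leader arm onto the robot, logging everything in sync."""
    log, dt = EpisodeLog(), 1.0 / hz
    for _ in range(int(hz * seconds)):
        t = time.time()
        target = leader.read_joint_positions()       # what the human is doing right now
        follower.command_joint_positions(target)     # puppet the real robot
        log.record(camera.read(),                    # image at this instant
                   follower.read_joint_positions(),  # robot's actual joint state
                   np.asarray(target),               # the motor command becomes the label
                   t)
        time.sleep(max(0.0, dt - (time.time() - t)))
    return log

if __name__ == "__main__":
    episode = collect_episode(LeaderArm(), FollowerArm(), Camera(), hz=10, seconds=1)
    print(f"logged {len(episode.frames)} synchronized steps")
```

Each logged step is an observation-action pair, which is the kind of supervision the conversation turns to next: it is what a model can later be trained to imitate.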
Speaker 2: There are six different joints on the arm, so it can move with basically a full range of motion in 3D space and 3D rotation, and you can use that to peel a hard-boiled egg. You don't have any tactile feedback, so you can't actually feel the egg, and that's actually one of the things that makes it more difficult. But you can use visual feedback to compensate for that. And so just by looking at the egg myself, I'm able to figure out if you're in contact with something.

Speaker 1: And you just use one prong of the claw? Like, I guess you squeeze it a little to crack it, and then use like one prong of the claw to get the shell off?

Speaker 2: Yeah, exactly. So you want to crack it initially and then hold it with one gripper, and then use basically one of the two fingers in the gripper to get pieces of shell off. When we did this, we hard-boiled only two eggs. This was actually at Stanford. The first egg a graduate student ended up breaking, and so I did the second egg, and I was able to successfully not break it and fully peel it. It took some patience, certainly, and I wasn't able to do it as quickly as with my own hands, but I guess it goes to show the extent to which we're able to control robots to do pretty complicated things.

Speaker 1: Yeah, and so obviously, I mean, that is a stunt or a game or something fun to do with the robot. But presumably in that instance, as in the other instances of folding clothes and vacuuming, like, there is learning, right. The idea is that you do it some number of times and then the robot can do it, and then presumably there's also generalization. But just to start with learning, like, you know, reductively, how many times do you have to do it for the robot to learn it?
Speaker 2: Yeah, so it really depends on the extent to which you want the robot to handle different conditions. So in some of our research, we've been able to show the robot how to do something like thirty times or fifty times, and just with that, maybe that sounds like a lot, but you can typically do that in less than an hour if it's a simple task. And from that, if you only demonstrate it in a narrow set of circumstances, like a single environment, a single particular object, the robot can learn just from less than an hour of data.

Speaker 1: What is an example of a thing that the robot learned in less than an hour of data?

Speaker 2: Oh yeah, we put a shoe on a foot, we tore off a piece of tape and put it on a box. We've also hung up a shirt on a hanger.

Speaker 1: So that's not that much, I mean, especially because you say the robot, but what you really mean is the model. So every robot, right, presumably, or every robot that's built more or less like that one, right. Like, that's one of the key things. It's like you're not teaching one robot, you're teaching every robot ever, because it's software fundamentally, it's an AI model. It's not hardware.

Speaker 2: Yeah, yes, with the caveat that, if you want to be this data efficient, it works best if it's, like, the same color of the table, the same kind of rough initial conditions of where the objects are starting, right, and the same shirt, for example. So this is just with like a single shirt and not like any shirt.

Speaker 1: So there's like concentric circles of generalizability, right, like exact same shirt, exact same spot, exact same table, versus like fold a shirt, versus fold clothes, right. And so is that just infinitely harder? Like, how does that work? That's your big, that's your big challenge at some level, right?
Speaker 2: Yeah. So generalization is one of the big challenges, not the only one, but it's one of the big challenges. And in some ways, I mean, the first unlock there is just to make sure that you're collecting data not just for one shirt, but collecting it for lots of shirts, or collecting it for lots of clothing items, and ideally also collecting data with lots of tables with different textures, and not just visual appearances either. Like, if you're folding on a surface that has very low friction, like it's very smooth, versus a surface that's maybe on top of carpet or something, that's going to behave differently when you're trying to move the shirt across the table. So having variability in the scenarios the robot is experiencing in the data set is important, and we've seen evidence that if you set things up correctly and collect data under lots of scenarios, you can actually generalize to completely new scenarios. And in the pi 0.5 release, for example, we found that if we collected data in roughly one hundred different rooms, then the robot is able to do some tasks in rooms that it's never been in before.

Speaker 1: So you mentioned pi 0.5. So pi zero point five, that's your latest model that you've released, right? Tell me about that. Like, what does that model allow robots to do? Like, what robots and what settings and what tasks?

Speaker 2: Yeah, yeah, definitely. So we were focusing on generalization. With the previous model, we were focusing on capability, and we did a really complicated task of laundry folding. From there, we wanted to answer, like, okay, that model worked in one environment. It's fairly brittle. If you put it in a new environment, it wouldn't work. And we wanted to see, if we put robots in new environments with new objects, new lighting conditions, new furniture, can the robot be successful?
And to do that, we collected data on these mobile manipulators, which feels like a terrible name, but they're robots with two arms and wheels that can drive around, kind of like a humanoid, but we're using wheels instead of legs, a bit more practical in that regard. And we trained the robot to do things like tidying a bed, or wiping spills off of a surface, or putting dishes into a sink, or putting away items into drawers, taking items of clothing, dirty clothing, off the floor and putting them into a laundry basket, things like that. And then we tested whether or not, after collecting data like that in lots of environments, aggregated with other data, including data from the internet, the robot can then do those things in a home that it has never been in before. And in some ways that sounds kind of basic, like, people have no problem with this: if you can do something in one home, you probably could do the same thing in another home. It doesn't seem like a complicated thing for humans. But for robots that are trained on data, if they're only trained in one place, their whole universe is that one place; they haven't ever seen any other place. This is actually kind of a big challenge for existing methods. And yeah, it was a step forward. We were able to see that it definitely isn't perfect by any means, and that kind of comes to another challenge, which is reliability. But we're able to see the robot do things in homes it's never been in before, where we set it up, ask it to do things, and it does some things that are useful.

Speaker 1: So, like, in the classical setting where a robot is trained in one room, it doesn't even know that room is a room. That's just the whole world to the robot, right? And if you put it in another room, it's in a completely unfamiliar world.
Speaker 2: Exactly. And so, for example, what we were talking about, like hanging up a shirt: its whole world was that one black tabletop that's smooth, that one blue shirt, that one coat hanger. And it doesn't know about this entire universe of other shirts and other...

Speaker 1: It doesn't know that there is a category called shirt. It only knows...

Speaker 2: Yeah, it doesn't even know what shirts are.

Speaker 1: Yeah, it doesn't even know what shirts are. For pi zero point five, like, what did you ask the robot to do? And how well did it work?

Speaker 2: Yeah. So we trained the model. We took actually a pre-trained language model with also a vision component, and we fine-tuned it on a lot of data, including data from different homes across San Francisco, but actually a lot of other data too. So actually only two percent of the data was on these mobile robots with arms. So we store how the motors were all moving in all of our previous data, and then train the model to mimic that data that we've stored.

Speaker 1: It's like, it's like predicting the next word, but instead of predicting the next word, it's like predicting the next movement or something.

Speaker 2: Yes, exactly. We've kind of trained it to predict next actions or next motor commands instead of next words. We do an additional training process to have it focus on and be good at the mobile robot data in homes. Then we set up the robot in a new home and we give it language commands. So we can give it low-level language commands, or we can also give it higher-level commands. So the highest-level command might be "clean the bedroom." And one of the things that we've also been thinking about more recently is, can you give it a more detailed description of how you want it to clean the bedroom? But we're not quite there yet. So we could say "clean the bedroom."
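A rough way to picture "predicting the next movement instead of the next word" is a behavior-cloning objective: a network looks at an image, an embedding of the command, and the current joint state, and is trained to regress the chunk of motor commands the teleoperator actually produced next. The PyTorch sketch below is a generic toy along those lines, with made-up names like TinyPolicy and random tensors standing in for real data; it is not Physical Intelligence's architecture or training recipe, which starts from a pretrained vision-language model and is considerably more involved.

```python
# Toy behavior-cloning sketch: predict the next chunk of motor commands instead of the next word.
# Generic illustration only; names and shapes are invented for the example.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Maps (image, command embedding, joint state) to a short chunk of future actions."""
    def __init__(self, action_dim=7, chunk=16, text_dim=64):
        super().__init__()
        self.vision = nn.Sequential(               # stand-in for a pretrained vision encoder
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + text_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim * chunk),
        )
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, image, text_emb, joints):
        feats = torch.cat([self.vision(image), text_emb, joints], dim=-1)
        return self.head(feats).view(-1, self.chunk, self.action_dim)

def train_step(policy, optimizer, batch):
    """One imitation step: regress the motor commands recorded during teleoperation."""
    pred = policy(batch["image"], batch["text_emb"], batch["joints"])
    loss = nn.functional.mse_loss(pred, batch["actions"])  # match the demonstration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors standing in for one batch of logged demonstrations.
policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
batch = {
    "image": torch.randn(8, 3, 224, 224),
    "text_emb": torch.randn(8, 64),      # e.g. an embedding of "pick up the shirt"
    "joints": torch.randn(8, 7),
    "actions": torch.randn(8, 16, 7),    # the next motor commands from the teleop log
}
print(train_step(policy, opt, batch))
```

In this framing, the teleoperation logs play the role that internet text plays for a language model: the recorded motor commands are the labels the model learns to reproduce.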
Speaker 2: We'd also tell it, put the dirty clothes in the laundry basket, so that would be kind of a subtask. Or we can tell it commands like pick up the shirt, put the shirt in the laundry basket. Then after we tell it that command, it will go off and follow that command, and actually in most cases realize that command successfully in the real world.

Speaker 1: How did it do?

Speaker 2: So it depends on the task. The average success rate was around eighty percent, so definitely room for improvement, and in many scenarios it was able to be quite successful. We also saw some failure modes where, for example, if you're trying to put dishes into a sink, sometimes one of the dishes was a cutting board, and picking up a cutting board is actually pretty tricky for the robot, because you either need to slide it to the edge of the counter and then grasp it, or somehow get a finger underneath the cutting board. And so sometimes it was able to do that successfully, sometimes it struggled and got stuck. The exciting thing, though, was that we were able to kind of drop it in a place it had never been before, and it was doing things that are quite reasonable.

Speaker 1: So what are you doing now? Like, what's the next thing you're trying to get to?

Speaker 2: Yeah, absolutely. So the next thing we're focusing on is reliability and speed. So I mentioned around eighty percent for these tasks. How do we get that to ninety-nine percent? And I think that if we can get the reliability up, that's kind of, in my mind, the main missing ingredient before we can really have these be useful in real-world scenarios.

Speaker 1: So getting to ninety-nine percent is interesting. I mean, I think of self-driving cars, right, where it seemed some time ago, I don't know, ten years ago, fifteen years ago, like they were almost there, and I know they're more almost there now.
I know in San Francisco there really are self-driving cars, but they're still very much at the margin of cars in the world, right. And it does seem like almost there means different things in different settings. But, I don't know, is it super hard to get from eighty percent to ninety-nine percent? Does the self-driving car example teach us anything for your work?

Speaker 2: The self-driving car analogy is pretty good. I do think that, fortunately, there are scenarios where we may not need it to be quite as reliable as cars. With cars there is a much, much higher safety risk. It's much easier to hurt people. And in robots there are safety risks, because you are in the physical world, but it's easier to put software precautions in place, and even hardware precautions, to prevent that as well. So that makes it a little bit easier.

Speaker 1: I mean, ninety-nine percent probably isn't good enough for cars, right? They probably need more nines than that, whereas it may well be good enough for a house-cleaning robot.

Speaker 2: Yeah, in certain circumstances. And yeah, we're also thinking about scenarios where maybe even less than that is fine. If we view humans and robots working together, it's more about kind of helping the person complete the task faster or complete the task more effectively. So I think there might be scenarios like that, but still we need the performance and reliability to be higher, and the robots to be faster, in order to accomplish that.

Speaker 1: We'll be back in just a minute. What do you imagine as the initial real-world use cases?

Speaker 2: I don't know.
There's a lot of examples of robotics companies that have attempted to kind of start with an application and hone in on that, and I think the lesson from watching those companies is that you end up then spending a lot of time on the problems of that specific application and less on developing the sort of generalist systems that we think in the long run will be more effective. And so we're very focused on understanding what are the core bottlenecks and the core missing pieces for developing these generalist models, and we think that if we had picked an application now, we would kind of lose sight of that bigger problem, because we'd need to solve things that are specific to that application. So we're very focused on what we think are the core technological challenges. We have certain tasks that we're working on. Some of them have been home cleaning tasks. We also have some more kind of industrial-like tasks as well, just to instantiate things and actually be iterating on robots. And applications could range from things in homes to things in workplaces to industrial settings. There's lots and lots of use cases for intelligent robots and intelligent kind of physical machines.

Speaker 1: What are some of the industrial tasks you've been working on?

Speaker 2: One example that I mentioned before is inserting cables. There's lots of use cases in data centers, for example, where that's a challenging task. Another example is constructing cardboard boxes and filling them with items. We've also done some packaging tasks that are highly relevant to lots of different kinds of shipping operations. And then even folding clothes. It seems like a very home task, but it turns out that there are companies that need to fold very large lots of clothing, and so that's also something that in the long term could be used in larger-scale settings.
Speaker 1: So I've read that you have open-sourced your model weights and given designs of robots to hardware companies, and I'm interested in that set of decisions, right, that set of sort of strategic decisions. Tell me about that, sort of giving away IP, basically.

Speaker 2: Right, yeah, yeah, definitely. So this is a really hard problem, especially this longer-term problem of developing a general system. We think that the field is very young, and there's like a couple of reasons. One is that we think that the field needs to mature, and we think that having more people being kind of competent with using robots and using this kind of technology will be beneficial in the long term for the company, and by open-sourcing things, we make it easier for people to do that. And then the second thing is, the models that we develop right now, they're very early, and the models that we'll be developing one to three years from now are going to be far, far more capable than the ones that we have now. And so it's kind of like the equivalent of OpenAI open-sourcing GPT-2 or GPT-3. They actually didn't open-source GPT-3, but I think that they would still be in an excellent spot today if they had.

Speaker 1: Like, what could go wrong that would either prevent you as a company from succeeding or even hold back the field in general?

Speaker 2: I don't think we entirely know the scale of data that we need for getting really capable models. And there's a little bit of a chicken-and-egg problem, where it's a lot easier to collect data once you have a really good model, and it takes large amounts of data to get a really good model.

Speaker 1: Right. Or if there were thousands of robots out in the world running your model, there would just be an incredible amount of data coming into you every day, right?

Speaker 2: Yeah, yeah, exactly. So that's one thing, though I'm actually maybe a little bit less concerned about that myself.
And then I think the other thing is just that there are technological challenges to getting these things to work really well. I think we've had incredible progress over the last year and two months, the last fourteen months, I think, since we've started, probably more progress than I was expecting, honestly, compared to when we started the company. I think it's wild that we were able to get a robot to unload and fold laundry, like a ten-minute-long task.

Speaker 1: And folding laundry is like a famously hard robot problem, right? Like, it's the one that people in robotics talk about when they talk about things people think are easy are actually hard for robots, right?

Speaker 2: Yeah, absolutely, absolutely. I mean, you have to deal with all sorts of variability in how clothes can be crumpled on each other. And also there are even really small, minor things you need to do in order to actually get it to be flat on the table and folded nicely and even stacked. And as the task gets longer as well, there are more opportunities to make mistakes, more opportunities to get stuck. And so if you're doing a task that takes ten minutes, in those ten minutes there's many, many times where the robot can make a mistake that it can't recover from, or just get stuck, or something like that. And so being able to do such a task starts to kind of point at the resilience that these models can have by recovering from those mistakes.

Speaker 1: Uh huh.

Speaker 2: So when we were first trying to fold laundry, one of the common failure modes was that it would fold the laundry very well, by my standards at the time, I would be very, very happy with the robot, and then it would push the entire stack of laundry onto the ground.

Speaker 1: Sort of like teaching a toddler to fold clothes.

Speaker 2: Yeah, yeah, exactly.
Speaker 1: Was there a particular moment when you saw a robot using your model fold clothes for ten minutes and it worked?

Speaker 2: Yeah. First off, we started with just folding a shirt starting flat on the table. We got that to work pretty quickly; it turns out to be pretty easy, and I wasn't too surprised by that. And then we moved from that to starting it in just a random ball, like some sort of crumpled position on the table, and then you have to flatten and then fold it, and that makes the problem dramatically harder because of all the variability, having to figure out how to flatten it. We were kind of stuck on that problem for at least a couple of months, where with everything we were trying, the success rate of the robot was zero percent. It wasn't able to really make progress on it. And we started to see signs of life, I think, in August or September of last year, where we tried a new recipe where we continued to train the model on a curated part of the data that was following a consistent strategy, and that sort of high-quality post-training is what really seemed to make the model work better. And then the moment that I was most excited about was the first time that I saw the model flatten and fold and stack five items in a row.

Speaker 1: Yeah.

Speaker 2: I just remember going home that night and being so excited. It seemed like we had just figured out this big missing puzzle piece.

Speaker 1: So I was asking you, why might it not work, or what might slow the field down? And then we talked about the happy shirt story. But if in five years things didn't progress as quickly as you thought, what might have happened?

Speaker 2: I mentioned that I think that incorporating practice, like allowing the robot to practice the task, should be really helpful for allowing robots to get better.
We don't know what exactly that recipe will look like, and so it's a research problem, and with any sort of research problem, you don't know exactly how hard the solution is going to be. And I think that there are some other more nuanced unknowns as well that are somewhat similar to that. And we have a large number of very talented researchers on our team, because we think that there are some of these unsolved breakthroughs that are going to be needed to really, truly solve this problem.

Speaker 1: So, if it does work well and things progress, in that universe, what would you be worried about?

Speaker 2: Good question. I mean, if things work well, I shouldn't be too worried, in general. I do think that it's very easy in general to underestimate the challenges around actually deploying and disseminating technology. That takes time, and when the technology doesn't exist yet, that means that the world is not in a place that is ready for that technology. I think that there's a lot of unknowns there.

Speaker 1: I mean, one of the striking things to me about, say, language models is the people who know the most about them seem to be the most worried about them, which is generally not the case, I think, historically with technology, right, with the possible exception of the atomic bomb. And so I'm curious, I mean, those kinds of worries, like, do you share them? Are there worries you have about developing a foundation model for robots, about bad actors using it, even?

Speaker 2: I do think that, yeah, there's plenty of technology that has dual uses, and I think there are applications of technologies that are harmful. I think that a lot of the concerns in the language model community stem from imbuing these systems with greater autonomy. And I work hands-on with the robots quite a bit, and I don't see a world in which they will be taking over in any way.
It's very easy to just, well, with our current iteration of robots, if we threw some water on it, the robot would be in trouble.

Speaker 1: So that might be a problem for you, but I'm sure you could solve that.

Speaker 2: We're working on it. So we actually do have a new iteration that is actually a lot more waterproof. But it's just not a concern that I share.

Speaker 1: Okay, interesting. Basically just because you think we can, whatever, turn it off if we need to.

Speaker 2: Yeah, and I think, yeah, there's always going to be dual-use concerns, but I think that the pros of the technology outweigh some of the downsides.

Speaker 1: Well, give me the happy story, then. Like, what number of years should we choose for a happy story? Ten? Is ten too soon?

Speaker 2: I don't want to put a number to it. I think that with research, you don't know exactly how long things will take. And I envision a world where, when you're developing hardware, it's not too hard to actually teach it to do something, and teach it to do something useful, rather than just having machines that are not particularly intelligent, like dishwashers and laundry machines and so forth.

Speaker 1: Go bigger, if you would. Like, what would people teach robots to do in that world?

Speaker 2: I guess if we were to go bigger, I think that there are a lot of challenges around helping people as they age, allowing them to be more independent. That's like a huge one. I think that, I don't know, manufacturing, there's all sorts of places where there's abuse of labor practices, and we can maybe be able to eliminate those if it's a robot instead of a human. Yeah, many, many, many examples. And I think that there are also even things that are hard to imagine because the technology doesn't exist.
594 00:31:20,516 --> 00:31:22,756 Speaker 2: So a lot of the things that I'm thinking about 595 00:31:22,796 --> 00:31:26,556 Speaker 2: are robots helping humans in different circumstances to allow them 596 00:31:26,556 --> 00:31:30,556 Speaker 2: to be more productive. But once something exists, often 597 00:31:30,836 --> 00:31:32,876 Speaker 2: people are creative and come up with new ways 598 00:31:32,876 --> 00:31:34,316 Speaker 2: of using it. 599 00:31:37,116 --> 00:31:49,196 Speaker 1: We'll be back in a minute with the lightning round. Great, 600 00:31:49,276 --> 00:31:54,116 Speaker 1: let's finish with the lightning round. What's one thing that 601 00:31:54,196 --> 00:31:58,796 Speaker 1: working with robots has caused you to appreciate about the 602 00:31:58,876 --> 00:31:59,516 Speaker 1: human body? 603 00:32:00,836 --> 00:32:02,196 Speaker 2: Our skin is pretty amazing. 604 00:32:02,676 --> 00:32:07,556 Speaker 1: Huh. Well, so we didn't talk about, I mean, a 605 00:32:07,636 --> 00:32:10,836 Speaker 1: sense of touch, or of heat or of cold, right? 606 00:32:10,876 --> 00:32:13,556 Speaker 1: I mean presumably the models you're building, the robots you're 607 00:32:13,596 --> 00:32:17,076 Speaker 1: using don't have that, but they could, right? They could 608 00:32:17,196 --> 00:32:20,676 Speaker 1: have a sense of touch. Is anyone working on that? 609 00:32:20,876 --> 00:32:21,836 Speaker 1: Is that of interest to you? 610 00:32:22,676 --> 00:32:25,036 Speaker 2: Lots of people are working on it. I think it's pretty interesting. 611 00:32:25,236 --> 00:32:28,516 Speaker 2: I think that the hardware technology is not super mature 612 00:32:28,716 --> 00:32:30,156 Speaker 2: compared to where I'd like for it to be in 613 00:32:30,236 --> 00:32:33,756 Speaker 2: terms of how robust it is, the cheapness, and 614 00:32:33,796 --> 00:32:37,156 Speaker 2: the resolution. That said, we actually put cameras on 615 00:32:37,236 --> 00:32:39,996 Speaker 2: the wrists of our robot to help it get some 616 00:32:40,036 --> 00:32:42,516 Speaker 2: sort of tactile sense. For example, if 617 00:32:42,516 --> 00:32:45,076 Speaker 2: you visually look at your finger as you make 618 00:32:45,116 --> 00:32:48,156 Speaker 2: contact with an object, you can see it deform 619 00:32:48,756 --> 00:32:51,796 Speaker 2: around that object, and you can actually, just by looking 620 00:32:51,796 --> 00:32:55,076 Speaker 2: at your finger, get some notion of tactile feedback similar 621 00:32:55,076 --> 00:32:57,196 Speaker 2: to what our skin gets. Yeah, and cameras are cheap, 622 00:32:57,196 --> 00:33:01,236 Speaker 2: really easy, robust, way more robust and cheap than existing 623 00:33:01,276 --> 00:33:02,596 Speaker 2: technology for tactile sensing. 624 00:33:04,716 --> 00:33:08,476 Speaker 1: I've heard you say that humanoid robots are overrated, and 625 00:33:08,516 --> 00:33:09,876 Speaker 1: I'm curious, why do you think that? 626 00:33:11,196 --> 00:33:14,956 Speaker 2: I think that simplicity is really helpful and important when 627 00:33:14,996 --> 00:33:19,596 Speaker 2: trying to develop technology. When you introduce more complexity than is needed, 628 00:33:19,636 --> 00:33:22,596 Speaker 2: it slows you down a lot. And humanoids 629 00:33:22,876 --> 00:33:27,116 Speaker 2: introduce that kind of complexity.
Yeah, I think that if 630 00:33:27,116 --> 00:33:29,316 Speaker 2: all of the robots we were working with were humanoids, 631 00:33:29,516 --> 00:33:31,996 Speaker 2: we wouldn't have made anywhere near the 632 00:33:31,996 --> 00:33:35,236 Speaker 2: progress that we've made, because we'd be dealing with additional challenges. 633 00:33:35,636 --> 00:33:38,636 Speaker 2: I also think that optimizing for ease of data collection 634 00:33:38,916 --> 00:33:41,276 Speaker 2: is really important in a world where we need data, 635 00:33:41,596 --> 00:33:45,396 Speaker 2: and it's a lot harder to collect data with and operate all 636 00:33:45,436 --> 00:33:49,236 Speaker 2: of the different joints and motors of a humanoid than 637 00:33:49,276 --> 00:33:51,476 Speaker 2: it is to control a simpler robot. 638 00:33:52,476 --> 00:33:54,236 Speaker 1: Do you anthropomorphize robots? 639 00:33:55,236 --> 00:33:58,676 Speaker 2: I hate it when people anthropomorphize robots. I 640 00:33:58,716 --> 00:34:03,156 Speaker 2: think that it is misleading because the failure modes that 641 00:34:03,236 --> 00:34:05,596 Speaker 2: robots have are very different from the failure modes that 642 00:34:05,636 --> 00:34:08,836 Speaker 2: people have, and it misleads people into thinking that it's 643 00:34:08,876 --> 00:34:11,196 Speaker 2: going to behave in the way that people behave. 644 00:34:12,196 --> 00:34:13,836 Speaker 1: Like in what way? 645 00:34:14,276 --> 00:34:16,516 Speaker 2: Oh, like, if you see a robot doing something like 646 00:34:16,556 --> 00:34:20,036 Speaker 2: doing a backflip, or even folding laundry, you kind 647 00:34:20,036 --> 00:34:21,996 Speaker 2: of assume that, like if you saw a 648 00:34:22,036 --> 00:34:23,796 Speaker 2: person do that, then they probably could do a lot 649 00:34:23,796 --> 00:34:26,236 Speaker 2: of other things too. And if you anthropomorphize 650 00:34:26,236 --> 00:34:28,756 Speaker 2: the robot, then you assume that the capabilities 651 00:34:28,756 --> 00:34:31,436 Speaker 2: that you see are representative, as if it were 652 00:34:31,476 --> 00:34:34,716 Speaker 2: a human, and that it could do a backflip anywhere, 653 00:34:35,036 --> 00:34:38,876 Speaker 2: or that it could fold laundry anywhere with any item 654 00:34:38,876 --> 00:34:39,676 Speaker 2: of clothing. 655 00:34:39,516 --> 00:34:41,396 Speaker 1: Or surely you would think a robot that could do 656 00:34:41,436 --> 00:34:44,476 Speaker 1: a backflip could fold a shirt, but no. 657 00:34:45,196 --> 00:34:49,956 Speaker 2: Exactly, exactly. So sometimes it's fun to assign emotions 658 00:34:49,956 --> 00:34:51,756 Speaker 2: to some of these things, or say the robot's having 659 00:34:51,796 --> 00:34:54,476 Speaker 2: a bad day, because certainly it feels like that sometimes. 660 00:34:54,676 --> 00:34:58,356 Speaker 2: But when it kind of moves beyond fun and jokes, 661 00:34:58,836 --> 00:35:01,316 Speaker 2: it might have consequences that I don't think make sense. 662 00:35:02,836 --> 00:35:06,636 Speaker 1: I read that there was a researcher who said they 663 00:35:06,676 --> 00:35:10,276 Speaker 1: would retire if a robot tied a shoelace, yes, and 664 00:35:10,276 --> 00:35:13,036 Speaker 1: then one of your robots tied a shoelace, and I 665 00:35:13,076 --> 00:35:18,476 Speaker 1: guess they didn't retire. But I'm curious. What would you 666 00:35:18,676 --> 00:35:22,156 Speaker 1: need to see a robot do to retire?
667 00:35:23,516 --> 00:35:26,916 Speaker 2: Hmm, I don't know. I guess one example that I've 668 00:35:26,916 --> 00:35:29,676 Speaker 2: given before that I would love to see a robot do, 669 00:35:29,756 --> 00:35:32,716 Speaker 2: and I don't think this is quite retirement level, is being 670 00:35:32,756 --> 00:35:34,876 Speaker 2: able to go into a kitchen that it has never been 671 00:35:34,916 --> 00:35:39,196 Speaker 2: in before and make a bowl of cereal. Pretty basic, 672 00:35:40,236 --> 00:35:42,356 Speaker 2: especially compared to doing a backflip. I cannot do a 673 00:35:42,356 --> 00:35:44,396 Speaker 2: backflip myself, but I could make a bowl of cereal. 674 00:35:44,716 --> 00:35:47,476 Speaker 2: But it requires being able to find objects in the environment, 675 00:35:47,516 --> 00:35:51,036 Speaker 2: being able to interact with delicate objects like a cereal box, 676 00:35:51,596 --> 00:35:54,116 Speaker 2: maybe even use tools in order to open the cereal box, 677 00:35:54,516 --> 00:35:58,396 Speaker 2: pouring liquids. Yeah, so that's a task that I love, 678 00:35:58,636 --> 00:36:00,796 Speaker 2: and I could actually even see us being able to 679 00:36:01,116 --> 00:36:04,276 Speaker 2: show a demo of that without too much difficulty, actually, 680 00:36:04,716 --> 00:36:06,676 Speaker 2: if we put our mind to it and collected 681 00:36:06,756 --> 00:36:09,276 Speaker 2: data for it. So it actually is, I think, more 682 00:36:09,316 --> 00:36:12,756 Speaker 2: within reach than maybe I imagined a few years ago. 683 00:36:12,876 --> 00:36:16,756 Speaker 1: Just as you're thinking about it, it's getting closer. You're like, oh, wait, 684 00:36:16,796 --> 00:36:17,516 Speaker 1: we could do that. 685 00:36:18,396 --> 00:36:20,676 Speaker 2: Yeah. I mean we've actually collected data of pouring cereal, 686 00:36:21,276 --> 00:36:23,516 Speaker 2: like opening a cereal box and pouring it into a bowl. 687 00:36:23,916 --> 00:36:26,916 Speaker 2: We haven't yet done liquid handling and pouring, but I 688 00:36:26,916 --> 00:36:28,996 Speaker 2: think we're actually going to do it this week on 689 00:36:29,076 --> 00:36:32,076 Speaker 2: the robot. I asked the hardware team to make a 690 00:36:32,476 --> 00:36:35,636 Speaker 2: waterproof robot. So we're not too far. A lot of 691 00:36:35,676 --> 00:36:38,836 Speaker 2: the pieces are coming together. I also love working 692 00:36:38,836 --> 00:36:41,796 Speaker 2: with robots, and I'm also fairly young, I 693 00:36:41,796 --> 00:36:46,036 Speaker 2: think, not too old, and so I don't imagine myself 694 00:36:46,036 --> 00:36:47,036 Speaker 2: retiring anytime soon. 695 00:36:53,996 --> 00:36:57,156 Speaker 1: Chelsea Finn is a Stanford professor and the co founder 696 00:36:57,196 --> 00:37:01,596 Speaker 1: of Physical Intelligence. You can email us at problem at 697 00:37:01,596 --> 00:37:04,556 Speaker 1: pushkin dot fm, and please do email us. I read 698 00:37:04,596 --> 00:37:07,996 Speaker 1: all the emails. Today's show was produced by Gabriel Hunter Chang, 699 00:37:08,516 --> 00:37:12,836 Speaker 1: edited by Alexander Garreton and engineered by Sarah Bruguerrett. I'm 700 00:37:12,876 --> 00:37:15,236 Speaker 1: Jacob Goldstein and we'll be back next week with another 701 00:37:15,236 --> 00:37:16,236 Speaker 1: episode of What's Your Problem.