Speaker 1: Welcome to TechStuff, a production from iHeartRadio.

Speaker 1: Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and I love all things tech. And Halloween is over by the time you hear this. I hope you had a happy one. But I still have something that falls into the kind of creepy category, at least in my opinion. And I discovered this after looking around at tech news in general, and I became fascinated by it and figured, hey, you know, I haven't done a really focused episode on a very specific implementation of technology in a long time, so why not do that now? Now, anyone who knows me can tell you that I am a sucker for Disney Imagineering, which of course is the peculiar twist on engineering and innovation that Disney champions. Right? The inventiveness and the attention to detail impressed me a great deal. Those are hallmarks of Disney engineering, or Imagineering. And I've done episodes covering various elements that tie into this, from the history of EPCOT to how audio-animatronics work.
Speaker 1: And it's that last topic I wish to revisit, because not long ago I read a research paper from Disney Imagineers titled "Realistic and Interactive Robot Gaze." That's g-a-z-e, you know, referring to where a person, or in this case an object, a robot, appears to be looking. And the paper is fascinating, and it's available for anyone to read for free. So if you find this subject matter neat, I really recommend you read it. Now, it does get a bit technical. There's some math in there too, but for the most part, I think it's a pretty accessible paper. The pictures, and good gravy, y'all, the video that are connected to this project are the stuff of nightmares, but we'll get to that. The heart of the paper is all about designing systems so that an audio-animatronic, or just an animatronic figure, can make and maintain eye contact, or at least appear to, with someone who is looking at that figure, an onlooker. So, in other words, imagine that there's a Disney attraction at a park, and in this attraction you can walk up to a robot.
Speaker 1: It's probably going to be behind, like, a rail or inside a booth or something, so that you can't, you know, touch it. And the robot notices you looking at it, and it looks you in the eye. And then maybe you get to chat with the robot and it maintains eye contact with you, and occasionally maybe its eyes dart around to glance at other stuff that's within its field of view, or maybe even indicating that the robot is appearing to, like, take a second to think of a response. That's kind of what we're talking about here. And here's the thing: this is surprisingly difficult to do, and it's extra hard to do without dipping into super unsettling territory. So today we're going to learn more about the technology and the psychology behind this project, as well as what makes it different from earlier audio-animatronics, which is honestly a good place for us to start. The original audio-animatronics were essentially puppets. In fact, you could argue that all animatronics are ultimately puppets. Each puppet has a certain number of degrees of freedom, and that refers to the number of independent directions of motion.
Speaker 1: So let's take a simple example. Let's say that a robot's neck only has one degree of freedom. Well, that would mean the robot might be able to nod its head up and down. But if it could do that, it wouldn't be able to shake its head or tilt its head, because that would be an additional degree of freedom. Or maybe it's able to shake its head, but it's not able to nod or tilt, because it only has that one degree of freedom. That one degree is really limiting, and it just tells us the full range of directions of motion that any one joint can do, and we typically talk about degrees of freedom with joints to express the range of possible motions the, you know, whatever it is can perform. The Enchanted Tiki Room at Disneyland was an early example of audio-animatronic ingenuity. It wasn't the very first use of audio-animatronics, but it was an early one, and when you learn how it worked behind the scenes, it's pretty wacky. The various birds, flowers, and other elements in the attraction connected to a very complex system, including some pneumatic valves.
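[Editor's note: the degree-of-freedom idea from a moment ago can be sketched in a few lines of code. This is a hypothetical illustration, not Disney's software; the joint and axis names are made up for the example.]

```python
# Hypothetical sketch: counting degrees of freedom (DOF) for a figure.
# Each named joint lists its independent axes of motion.
joints = {
    "neck": ["nod"],  # one DOF: the neck can only pitch up and down
}

def total_dof(joints):
    """Total degrees of freedom is just the sum over all joints."""
    return sum(len(axes) for axes in joints.values())

# With only "nod", the neck cannot shake (yaw) or tilt (roll);
# each of those would be an additional degree of freedom.
print(total_dof(joints))  # 1

# Adding shake and tilt brings the neck to three degrees of freedom.
joints["neck"] += ["shake", "tilt"]
print(total_dof(joints))  # 3
```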
Speaker 1: A pneumatic system uses air under pressure to do work. So these valves in turn connected to a circuit that had thin metal reeds as switches. Now, normally the switch would be open, meaning no electricity could flow through the circuit and thus provide electricity to open or close the valve. But when sounds of a certain frequency would play near these reeds, it would cause those reeds to vibrate, and, you know, depending on the thickness and length of the reed, that would determine what frequency of sound would most likely get it to start vibrating. Once it vibrated, it would close the circuit and thus allow power to go through to the respective valve. And every bird and flower in the attraction had this sort of system, where the sounds playing through the sound system would actually cause the individual circuits for those birds and flowers to activate. So the chirping of the bird, that chirping sound was actually the sound that was opening and closing the circuit and thus activating the valve that would control the bird's beak. And because the figures relied on the sound to close the circuit, they were audio-animatronics.
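[Editor's note: as a rough model, you can think of each reed as a band-pass trigger: it closes its circuit only when the soundtrack contains a tone near its resonant frequency. This is my own simplification, not Disney's actual circuit, and the frequencies are illustrative, not historical.]

```python
# Simplified model of the Tiki Room's sound-triggered reed switches.
# A reed vibrates (closing its circuit) only when a played tone falls
# close enough to its resonant frequency, which in the real mechanism
# depended on the reed's thickness and length.

def reed_closed(tone_hz, resonant_hz, bandwidth_hz=50.0):
    """True if the tone is close enough to resonance to vibrate the reed."""
    return abs(tone_hz - resonant_hz) <= bandwidth_hz

def valve_powered(soundtrack_tones_hz, resonant_hz):
    """The pneumatic valve gets power if any tone in the mix excites the reed."""
    return any(reed_closed(t, resonant_hz) for t in soundtrack_tones_hz)

# A bird whose beak valve is keyed to a 2000 Hz reed responds to its own
# chirp, while a flower keyed to 500 Hz ignores that same chirp.
chirp = [1980.0, 3100.0]
print(valve_powered(chirp, resonant_hz=2000.0))  # True: beak valve opens
print(valve_powered(chirp, resonant_hz=500.0))   # False: flower stays still
```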
Speaker 1: Over the years, Disney would improve on this design, sometimes by necessity. So, for example, when the Imagineers set out to create the attraction Great Moments with Mr. Lincoln, they had to come up with new mechanisms to do that, because pneumatics would not be a good solution. With pneumatics, you've got a couple of limitations that you're working with. One is that you can't move really heavy stuff effectively with pneumatics. Another is that pneumatic pistons tend to move really fast. It's hard to do controlled, slow movements with pneumatics. So it might be okay for something like a bird flapping its wings or opening and closing its beak fairly quickly, but it's not so great for, say, a revered US president lifting his hand. But I've covered that in other episodes. The really important thing I want to stress is that audio-animatronic figures have historically been limited to a specific, pre-programmed sequence of motions, so calling them puppets is fairly appropriate. These are figures that will do the exact same sequence of motions until something goes wrong or the attraction is shut off for some reason.
Speaker 1: The pirate in Pirates of the Caribbean that is precariously attempting to step onto a rowboat is never going to fall into the water. He's never going to get into the boat, and he's never gonna step back onto the shore. He will continue his balancing act until the end of time. And this is starting to sound like some sort of Greek myth about the afterlife at this point. Now, the reason I'm bringing this up, the reason it's important, is that an animatronic figure that can actually detect an onlooker's gaze and return it, making eye contact, can't be totally dedicated to following the same set of motions on repeat. There has to be some room for variability within it. At the same time, Disney's whole gig is to create a show. The amusement parks are show business. If you are in a public space of one of those parks, like you're inside the confines of the park itself, walking down Main Street or whatever, you are on stage. The employees are called cast members, and shows, while they can have some variation in them, are supposed to follow a general flow. They follow a script.
Speaker 1: And so the Imagineers were working on creating a figure that would follow a scripted set of behaviors, but would have the freedom to throw in stuff like eye contact now and then. The figure, in a way, would be able to improvise. It's jazz, baby. The tune is more or less set, but how you go through it allows for a lot of variation. For the purposes of this work, the team relied on an animatronic bust. Now, we've kind of dropped the "audio" at this point. Modern animatronic figures are not really driven by audio signals anymore. They're driven by circuitry and sophisticated computer systems and programs. Though, to be fair, they still often are referred to as audio-animatronics. But you really need to see a picture of this thing. I'll do my best to describe it, but really you should search this: Disney interactive gaze animatronic. Because, whoo boy. So imagine the V-shaped torso of a bust sculpture, right? It's very narrow at the bottom, and it widens up to the shoulders. It's clad in a white button-up shirt, you know, kind of like an Oxford shirt or business shirt.
Speaker 1: It does have shoulders, but it does not have arms. It has a head. Good golly, it has a head. The head of this figure has a sort of plastic skull, though it's kind of more like a plastic mask than a human skull. It doesn't look like a skeleton skull. It does have eyes, it's even got eyelids, and it's got teeth. And looking at this thing is a little unsettling. And that's before it even makes eye contact with you. Now, why would you want to make something like this be able to make eye contact in the first place? Well, eye contact is an important social signal. It shows mutual acknowledgement, and it can lead us to projecting certain things upon the person or animal that's making eye contact with us. We tend to perceive such creatures as possessing a certain amount of intelligence and sincerity. For example, when I make eye contact with my dog Tybalt, I perceive him to be intelligent and alert and loving. Now, I have no way of knowing what is really going on in his doggy mind.
Speaker 1: I suspect it's probably more along the lines of, "Is the bald man about to give me a treat? I should pay attention." But I like to think of it as sincere love. Now, as the paper states, quote, "Given the importance of gaze in social interactions, as well as its ability to communicate states and shape perceptions, it is apparent that gaze can function as a significant tool for an interactive robot character," end quote. And I can totally grok that. I imagine what it might be like for a child who's going to Disney World or Disneyland for the very first time and going to a ride or an attraction where there's an animatronic figure, perhaps one that looks like a famous Disney character, and it makes eye contact with that child. Maybe it even speaks to the child, and maybe it can respond to the child if the child speaks back.
Speaker 1: That sort of interaction would have been the kind of stuff that would have stuck with me as a kid well into adulthood, and I feel confident about that because I have a lot of memories of the seemingly magical moments I've experienced at Disney with the far more primitive technologies that were in the Disney parks when I first started visiting them in the nineteen seventies. So I can certainly see the show need for this sort of development. But there are numerous challenges that stand in the way of achieving this goal, and they fall into different broad categories. Perhaps the easiest set of challenges to conquer is actually the electromechanical side of things. That is, the actual mechanisms that you're going to use to create these effects: the servos and the motors and the other components that will create the actual motions that will translate into the robot making eye contact or behaving in otherwise realistic ways. That's one set of challenges, but there are others. One is giving the robot the ability to detect the gaze of onlookers in the first place.
Speaker 1: There has to be some sort of face recognition and maybe even eye-tracking technology so that the robot looks at the right spot. So the electromechanical parts have to work correctly, but so does the robot vision, or perception. Otherwise the robot is going to look in the wrong spot, perhaps staring off to one side, or above or below an onlooker's eye contact, or attempt at eye contact. Another challenge would be on the programming side. You have to figure out how to determine who the figure is going to look at. You also have to figure out how long the robot will look at somebody, and what could distract the robot, and whether or not the robot would return to looking at, you know, the first person, or maybe look at a second person, or maybe look at something else entirely. You have to solve the challenge of the program and prioritize the order of operations so that the robot behaves in a way that makes sense, as opposed to a robot that's just, you know, reacting to all visual stimuli in a random way, which would be at the very least disconcerting.
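[Editor's note: one way to picture that programming challenge is as a scoring problem: each person the cameras detect gets a priority, and the figure looks at whoever scores highest instead of reacting randomly. This is a hypothetical sketch; the weights, field names, and scoring are my own illustration, not the attention engine described in the paper.]

```python
# Hypothetical gaze-target selection: score each detected onlooker and
# direct the figure's gaze at the highest-scoring one.

def score(person):
    s = 0.0
    if person["looking_at_robot"]:
        s += 2.0                        # mutual gaze is the strongest draw
    s += 1.0 / person["distance_m"]     # closer onlookers rank higher
    if person["is_new"]:
        s += 0.5                        # novelty can briefly steal attention
    return s

def choose_gaze_target(people):
    """Return the person to look at, or None if nobody is in view."""
    return max(people, key=score, default=None)

people = [
    {"id": "A", "looking_at_robot": True,  "distance_m": 2.0, "is_new": False},
    {"id": "B", "looking_at_robot": False, "distance_m": 1.0, "is_new": True},
]
# Person A scores 2.0 + 0.5 = 2.5; person B scores 1.0 + 0.5 = 1.5.
print(choose_gaze_target(people)["id"])  # A
```

Re-running a selection like this every control tick is also what lets the figure get "distracted" and then return: when the scores change, so does the target.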
Speaker 1: And then we get to something that's a bit harder to define than degrees of freedom or range of motion or the hierarchy of programming, and that's human psychology. Now, as the paper points out, eye contact is an important social cue for most of us, but there are a whole range of humans out there, right? For people who have autism, eye contact can be a really challenging task, and it tends to make the lives of people who have this type of autism a little more difficult or complicated as a result. It's something that people, some people anyway, have to consciously deal with. They have to remember to do this and work at it. It's not a natural behavior for them. So this is something that can be tricky for human beings, let alone for robots. Now, while eye contact can help create a sense of sincerity and interest, it can also shift over into more unpleasant territory, such as a sense of predatory intent. Or, as a comedian I once saw said, there's a fine line between the casual eye contact of a friend and the cold stare of a serial killer.
Speaker 1: He was specifically talking about trying to navigate the tricky territory of approaching people in order to get to know them, but I think the meaning could be used for lots of scenarios, including an encounter with a robotic figure. And along with that is the issue of the uncanny valley, which I have touched on in previous episodes. I'm not sure if I've ever actually talked about the origin of the phrase, however. A professor at the Tokyo Institute of Technology named Masahiro Mori coined this phrase in the nineteen seventies to describe a pretty odd phenomenon. As robots become more humanlike, or more lifelike in general, they become more appealing to us, but only up to a point. And once they get to that point and go beyond it, our reception of these robots plunges into the uncanny valley. The valley in this case is how humans react to the robot. This also applies to other stuff, like CGI characters, for instance. In other words, a robot that might be a simple industrial arm is one we probably wouldn't feel very much affinity for. You know, it's obviously a machine.
Speaker 1: A robot that still looks really robotic but has, you know, arms and legs, like a vaguely humanoid shape, we would probably feel a little more affinity towards. Make it look a little bit more human, but, you know, not to the point where anyone would mistake it for being human, and we might like it even more. But once you start getting close to, but not quite, human in appearance and behavior, our response drops to a point where a lot of people feel unsettled, or they might even feel revulsion when looking at the figure. Something is, you know, not right. The cues that would normally help us identify with the synthetic figure now feel strange and maybe even scary. It's possible to get beyond the uncanny valley, to create a robot or CGI character that doesn't initiate this kind of instant revulsion, but it is very hard to do so.
Speaker 1: A big challenge is building an animatronic that doesn't trigger the uncanny valley response, either by avoiding the trap of being almost but not quite human in behavior, you know, by keeping things a bit more obviously robotic, so there's that clear and distinct separation that kind of removes that response we have, or by creating something lifelike enough that we feel the same sort of reactions we would experience if that were a real human. So it's tough to do. It's easier to do the robot approach than it is to get something that seems human enough that we let our guard down. None of these challenges are trivial, and they all require distinct approaches that must ultimately converge into a single implementation. When we come back, I'll talk about some of the technologies in this animatronic figure and the engineering team's philosophy behind their design choices. But first let's take a quick break.

Speaker 1: The engineering team limited itself to parameters that related to creating a robot that could direct its gaze towards onlookers, which meant they didn't have to worry about it doing literally anything else.
Speaker 1: The audio-animatronic bust they used has nineteen degrees of freedom total, but the team made no use of ten of those. They only used nine degrees of freedom. They focused on the neck, which has three degrees of freedom; the eyelids, which have two degrees of freedom; the eyes, which also have two; and the eyebrows, which have two degrees of freedom. The unused degrees of freedom are for moving the jaw and the lips of the figure, but since that's not necessary to make eye contact, the team just ignored those. They didn't need to mess with them. Which means we get the effect of a robotic skull with an unchanging rictus grin staring at us as its upper facial area remains animated. I guess what I'm saying is I didn't find the overall effect particularly comforting. According to the paper, the commands going to these components come from a quote "custom proprietary software stack operating on a one-hundred-hertz real-time loop" end quote. A hertz is a cycle per second, so this means that the software is pulsing out operations one hundred times every second to control this animatronic bust.
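[Editor's note: to give a feel for what a fixed-rate control loop looks like, here is a generic 100 Hz loop that issues one command per tick and sleeps away the rest of each 10-millisecond period. This is a standard pattern, not Disney's proprietary stack; the `send_command` callback is a placeholder for whatever drives the servos.]

```python
import time

def run_control_loop(send_command, rate_hz=100, ticks=10):
    """Call send_command once per period at a fixed rate.

    Sleeping until an absolute deadline (rather than for a fixed amount)
    keeps the loop from drifting when send_command takes variable time.
    """
    period = 1.0 / rate_hz
    next_deadline = time.monotonic()
    for _ in range(ticks):
        send_command()
        next_deadline += period
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)

commands_sent = []
run_control_loop(lambda: commands_sent.append("update_servos"), ticks=5)
print(len(commands_sent))  # 5 commands, spaced roughly 10 ms apart
```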
Many of those commands aren't only 321 00:19:31,440 --> 00:19:34,800 Speaker 1: about making the bust do something specific, but to do 322 00:19:34,960 --> 00:19:39,399 Speaker 1: it in a specific way. Let's get back to the 323 00:19:39,440 --> 00:19:43,119 Speaker 1: Tiki birds as an example. The pneumatic valve that would 324 00:19:43,119 --> 00:19:45,840 Speaker 1: control whether or not pressurized air could travel to a 325 00:19:45,920 --> 00:19:49,920 Speaker 1: specific place like the mechanism that operates a bird's beak 326 00:19:50,480 --> 00:19:52,920 Speaker 1: is a pretty simple on or off switch, meaning the 327 00:19:53,000 --> 00:19:55,399 Speaker 1: valve is either open, in which case air can flow, 328 00:19:56,000 --> 00:19:58,199 Speaker 1: or it's closed, in which case the air is blocked 329 00:19:58,200 --> 00:20:01,760 Speaker 1: from flowing through and actuating the mechanism. So the 330 00:20:01,800 --> 00:20:05,000 Speaker 1: beak has a natural resting position, and for this example, 331 00:20:05,080 --> 00:20:08,720 Speaker 1: we'll just assume that the rest position is a closed beak, 332 00:20:09,600 --> 00:20:12,119 Speaker 1: and so that's what the beak will always return to 333 00:20:12,320 --> 00:20:16,080 Speaker 1: when there's no air flowing to the mechanism that opens 334 00:20:16,119 --> 00:20:19,040 Speaker 1: the beak. If we open the valve, it lets air through. 335 00:20:19,280 --> 00:20:21,399 Speaker 1: It rushes to the end point, forces the beak to 336 00:20:21,600 --> 00:20:25,280 Speaker 1: open rapidly. Closing and opening the valve quickly forces the 337 00:20:25,280 --> 00:20:28,560 Speaker 1: bird's beak to open and close quickly, and when matched 338 00:20:28,560 --> 00:20:31,080 Speaker 1: with a soundtrack, it looks as though the bird is 339 00:20:31,119 --> 00:20:34,240 Speaker 1: speaking or singing, or you know, whatever it's doing.
But 340 00:20:34,320 --> 00:20:37,080 Speaker 1: that movement is rapid and, just as I mentioned earlier, 341 00:20:37,160 --> 00:20:41,919 Speaker 1: not suitable for all animatronic applications. Having life sized humanoids 342 00:20:41,960 --> 00:20:45,080 Speaker 1: move with that kind of alarming speed would be scary 343 00:20:45,119 --> 00:20:49,040 Speaker 1: and legitimately dangerous. The greater mass of the figures would 344 00:20:49,080 --> 00:20:51,800 Speaker 1: mean you're dealing with larger amounts of inertia. I mean, 345 00:20:51,840 --> 00:20:54,400 Speaker 1: just imagine what it would look like if Mr Lincoln, 346 00:20:54,480 --> 00:20:56,760 Speaker 1: in an effort to raise his hand in a gentle 347 00:20:56,800 --> 00:21:01,400 Speaker 1: show of reserved determination, instead violently karate chopped his own 348 00:21:01,440 --> 00:21:05,159 Speaker 1: head off. It would be, as the kids say, a 349 00:21:05,240 --> 00:21:10,040 Speaker 1: bad look. To create the illusion of life, the animatronics 350 00:21:10,080 --> 00:21:14,480 Speaker 1: that Disney designs follow certain general strategies. One is called 351 00:21:14,640 --> 00:21:18,640 Speaker 1: slow in and slow out. Now. This refers to general 352 00:21:18,680 --> 00:21:22,280 Speaker 1: movements, and the idea is that any movement should start off 353 00:21:22,400 --> 00:21:26,240 Speaker 1: slowly and then pick up speed as the movement continues, 354 00:21:26,800 --> 00:21:30,080 Speaker 1: and then slow down again before coming to a stop. 355 00:21:30,440 --> 00:21:32,879 Speaker 1: And it makes the motions appear more fluid, and it 356 00:21:32,880 --> 00:21:35,320 Speaker 1: has the added benefit of not being quite so harsh 357 00:21:35,359 --> 00:21:38,680 Speaker 1: on the figures themselves.
So when a Disney figure raises 358 00:21:38,800 --> 00:21:41,720 Speaker 1: its hand, the hand should start off moving upward with 359 00:21:41,760 --> 00:21:45,399 Speaker 1: a nice, smooth slow motion, pick up a bit of 360 00:21:45,440 --> 00:21:48,960 Speaker 1: speed as it's moving upward, and then slow down again 361 00:21:49,000 --> 00:21:52,199 Speaker 1: as it's approaching its end point. And this means that 362 00:21:52,280 --> 00:21:55,440 Speaker 1: the underlying motors and mechanical systems have to be capable 363 00:21:55,560 --> 00:21:59,240 Speaker 1: of achieving the strategy. It's why you can't use pneumatic systems 364 00:21:59,240 --> 00:22:02,320 Speaker 1: that are those simple single speed devices that are 365 00:22:02,320 --> 00:22:06,080 Speaker 1: either on or off, like the Tiki birds. Oh, and 366 00:22:06,119 --> 00:22:08,320 Speaker 1: I guess I should specify I'm talking in this case 367 00:22:08,320 --> 00:22:11,639 Speaker 1: about the original Tiki birds because the birds in the 368 00:22:11,680 --> 00:22:15,600 Speaker 1: attractions today work on updated and more sophisticated computer systems 369 00:22:15,600 --> 00:22:17,760 Speaker 1: that take up a fraction of a fraction of the 370 00:22:17,800 --> 00:22:21,960 Speaker 1: space of the old attraction, which essentially required an entire 371 00:22:22,119 --> 00:22:24,920 Speaker 1: room filled with cables and tubes to make everything work 372 00:22:25,040 --> 00:22:30,240 Speaker 1: underneath the actual attraction itself. Now a few computers handle 373 00:22:30,280 --> 00:22:35,359 Speaker 1: the whole shebang. Anyway, let's get back to animatronics.
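Slow in and slow out maps directly onto what programmers call easing. Here's a minimal sketch using the classic smoothstep curve (the episode doesn't say which easing function Disney actually uses, so treat this as one plausible choice):

```python
def slow_in_slow_out(t):
    """Cubic ease: starts slow, speeds up mid-motion, slows before
    stopping. t is normalized time in [0, 1]; returns a normalized
    position in [0, 1]. This is the classic smoothstep polynomial."""
    t = min(max(t, 0.0), 1.0)          # clamp to the motion's duration
    return t * t * (3.0 - 2.0 * t)

def interpolate(start, end, t):
    """Position along a movement from start to end at normalized time t,
    following the slow-in/slow-out profile."""
    return start + (end - start) * slow_in_slow_out(t)
```

Sampled at one hundred times a second, this gives the hand-raise described above: small steps near the start and end of the motion, larger steps in the middle.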
Some 374 00:22:35,520 --> 00:22:39,080 Speaker 1: of the other guiding principles in animatronic motion that in 375 00:22:39,160 --> 00:22:42,240 Speaker 1: turn dictate the types of motors and joints and other 376 00:22:42,280 --> 00:22:45,680 Speaker 1: mechanical elements that the team must use to make 377 00:22:45,760 --> 00:22:50,000 Speaker 1: these happen include designing motions as arcs, meaning the motion 378 00:22:50,040 --> 00:22:54,560 Speaker 1: should follow an arced trajectory. Another is that the motions 379 00:22:54,680 --> 00:22:58,960 Speaker 1: should have overlap, meaning a robot shouldn't move a single 380 00:22:59,040 --> 00:23:03,320 Speaker 1: element like an arm, stop, then go to move on 381 00:23:03,400 --> 00:23:07,840 Speaker 1: the next element like the head position, and then stop 382 00:23:07,880 --> 00:23:12,160 Speaker 1: and so on, because that would be, well, really robotic. Instead, 383 00:23:12,200 --> 00:23:16,040 Speaker 1: the robot's motions should overlap with one another, so that, 384 00:23:16,359 --> 00:23:18,879 Speaker 1: let's say, Mr. Lincoln is turning his head at the 385 00:23:18,920 --> 00:23:22,320 Speaker 1: same time his arm is going up in determination. Now, 386 00:23:22,400 --> 00:23:26,040 Speaker 1: another element that's connected to this concept is that of drag, 387 00:23:26,480 --> 00:23:29,040 Speaker 1: which means that the different body parts are moving at 388 00:23:29,119 --> 00:23:31,960 Speaker 1: different frequencies or timing. They're not moving all at the 389 00:23:32,000 --> 00:23:35,000 Speaker 1: same speed. So, in other words, the speed at which Mr. 390 00:23:35,040 --> 00:23:38,399 Speaker 1: Lincoln turns his head might be slightly faster or slower 391 00:23:38,440 --> 00:23:41,280 Speaker 1: than the speed at which his arm goes up.
This 392 00:23:41,359 --> 00:23:44,560 Speaker 1: is all in an effort to create the illusion of life, 393 00:23:44,640 --> 00:23:47,960 Speaker 1: but it also means that the programming and hardware underlying 394 00:23:48,000 --> 00:23:51,840 Speaker 1: the figure has to support those strategies. For the purposes 395 00:23:51,880 --> 00:23:54,919 Speaker 1: of this project, the engineers had certain motions they wanted 396 00:23:54,960 --> 00:23:58,000 Speaker 1: to be included. One minimum set of motions needed were 397 00:23:58,080 --> 00:24:02,360 Speaker 1: some that would imply that the bust was a breathing entity, 398 00:24:02,400 --> 00:24:04,920 Speaker 1: so it needed to move slightly as if it were 399 00:24:05,040 --> 00:24:08,960 Speaker 1: drawing breath. Blinking was also an important motion to get down, 400 00:24:09,080 --> 00:24:11,359 Speaker 1: as it would be more than a little unnerving to 401 00:24:11,359 --> 00:24:14,440 Speaker 1: have an animatronic figure make eye contact with you and 402 00:24:14,480 --> 00:24:19,600 Speaker 1: then never ever blink. And then there were the saccades. 403 00:24:20,440 --> 00:24:23,040 Speaker 1: Now I have to confess something to you, guys. When 404 00:24:23,040 --> 00:24:26,639 Speaker 1: I first encountered the word saccades, which is S A 405 00:24:26,960 --> 00:24:30,920 Speaker 1: C C A D E S, I had no idea 406 00:24:30,960 --> 00:24:33,239 Speaker 1: what that meant. It was a new word to me, 407 00:24:33,840 --> 00:24:35,560 Speaker 1: and maybe it's a new word for some of you 408 00:24:35,640 --> 00:24:38,760 Speaker 1: out there too. So if you happen to be like me, 409 00:24:39,160 --> 00:24:42,720 Speaker 1: what the heck are saccades? Well, that refers to the quick, 410 00:24:43,000 --> 00:24:47,200 Speaker 1: simultaneous movement of both eyes from one point of focus 411 00:24:47,280 --> 00:24:50,240 Speaker 1: to another.
So think about how you might take in 412 00:24:50,320 --> 00:24:52,719 Speaker 1: a scene that has a lot of stuff going on. 413 00:24:52,800 --> 00:24:57,640 Speaker 1: Let's say you walk up to a building that's 414 00:24:57,640 --> 00:25:00,920 Speaker 1: burning. Well, your eyes are going to dart 415 00:25:01,080 --> 00:25:03,679 Speaker 1: at different things that are going on in front of 416 00:25:03,720 --> 00:25:06,640 Speaker 1: you that catch your attention as you focus on them, 417 00:25:06,640 --> 00:25:09,000 Speaker 1: and then you file that information away. And perhaps you're 418 00:25:09,000 --> 00:25:13,320 Speaker 1: even doing this subconsciously. It means our gaze is 419 00:25:13,359 --> 00:25:17,280 Speaker 1: not always steady and unwavering. It moves around a 420 00:25:17,280 --> 00:25:19,840 Speaker 1: bit on occasion. And that's not the only way we 421 00:25:19,880 --> 00:25:22,159 Speaker 1: move our eyes. Of course, we can actually track things 422 00:25:22,200 --> 00:25:25,320 Speaker 1: that are moving and use our eyes to move in 423 00:25:25,400 --> 00:25:28,720 Speaker 1: a more smooth and gradual motion. But the team knew 424 00:25:28,720 --> 00:25:31,080 Speaker 1: that if they could incorporate saccades, that would give 425 00:25:31,119 --> 00:25:35,320 Speaker 1: the robot a more lifelike performance. But that decision meant 426 00:25:35,320 --> 00:25:37,560 Speaker 1: the team needed to figure out something else, which was 427 00:25:37,600 --> 00:25:40,960 Speaker 1: where to put the cameras.
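The two eye-movement modes here, quick saccadic jumps and smooth tracking, can be illustrated with a toy controller. This is purely illustrative, not the paper's actual gaze controller; the threshold and speed values are invented:

```python
def update_gaze(gaze, target, dt, saccade_threshold=0.2, pursuit_speed=1.0):
    """Toy one-axis gaze update. A large error between where the eyes
    point (gaze) and where they should point (target) triggers a
    saccade, modeled here as an instantaneous jump. A small error is
    closed with slow, rate-limited smooth pursuit instead."""
    error = target - gaze
    if abs(error) > saccade_threshold:
        return target                        # saccade: snap to the new fixation
    max_step = pursuit_speed * dt            # pursuit: move a little each tick
    step = max(-max_step, min(max_step, error))
    return gaze + step
```

Called once per control cycle, this produces the darting-then-settling quality described above: big attention shifts happen in one jump, while a slowly moving target is followed gradually.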
The animatronic needs its own 428 00:25:41,119 --> 00:25:44,480 Speaker 1: vision to be able to detect onlookers and then direct 429 00:25:44,520 --> 00:25:49,080 Speaker 1: its own gaze appropriately, and some robots do put cameras 430 00:25:49,080 --> 00:25:52,000 Speaker 1: in the eyes of the robot so that the eyes 431 00:25:52,040 --> 00:25:55,520 Speaker 1: are actually camera lenses, but that presents a challenge if 432 00:25:55,560 --> 00:25:58,760 Speaker 1: you wish to incorporate rapid eye movement like saccades, 433 00:25:58,800 --> 00:26:01,720 Speaker 1: because that sort of movement introduces motion blur in the 434 00:26:01,840 --> 00:26:04,679 Speaker 1: video imagery, making it more challenging for the robot to 435 00:26:04,720 --> 00:26:06,520 Speaker 1: keep track of what's going on in front of it. 436 00:26:06,920 --> 00:26:09,600 Speaker 1: For that reason, the team decided that the cameras would 437 00:26:09,600 --> 00:26:13,360 Speaker 1: not be mounted in the eyes, but rather were 438 00:26:13,359 --> 00:26:18,640 Speaker 1: mounted on the animatronic's chest. Presumably, should the gaze tracking 439 00:26:18,720 --> 00:26:22,040 Speaker 1: technology find its way into full animatronic figures in the future, 440 00:26:22,440 --> 00:26:25,080 Speaker 1: the camera will be, you know, hidden within the body 441 00:26:25,160 --> 00:26:29,040 Speaker 1: of the animatronic torso in order to avoid this problem, 442 00:26:29,160 --> 00:26:33,160 Speaker 1: or otherwise maybe mounted in an unobtrusive spot.
One thing 443 00:26:33,160 --> 00:26:36,320 Speaker 1: that interests me with this particular approach is that the 444 00:26:36,359 --> 00:26:39,639 Speaker 1: system has to do some calculations as to where the 445 00:26:39,720 --> 00:26:43,040 Speaker 1: eyes of the animatronic are in relation to the physical 446 00:26:43,080 --> 00:26:46,680 Speaker 1: location of the cameras, you know, because for all of us, 447 00:26:46,680 --> 00:26:50,440 Speaker 1: our eyes are essentially the cameras, or at least the 448 00:26:50,480 --> 00:26:53,920 Speaker 1: camera lenses, so we don't have to make any adjustments. 449 00:26:54,000 --> 00:26:57,280 Speaker 1: Right? Where we're looking, the point of our 450 00:26:57,320 --> 00:27:01,000 Speaker 1: gaze, is the point where we're taking in visual information. 451 00:27:01,480 --> 00:27:05,520 Speaker 1: For the animatronic, the eyes of the robot, the actual 452 00:27:05,640 --> 00:27:08,960 Speaker 1: eyes that are in the skull, don't function as eyes. 453 00:27:09,520 --> 00:27:14,359 Speaker 1: They aren't lenses. They're actually several inches above the actual camera. 454 00:27:15,000 --> 00:27:18,439 Speaker 1: And yet the eyes in the robot's head need to 455 00:27:18,480 --> 00:27:20,439 Speaker 1: point in the right direction. They need to be the 456 00:27:20,520 --> 00:27:23,760 Speaker 1: part that's pointed at the person who's looking at it. Right, 457 00:27:23,800 --> 00:27:26,359 Speaker 1: it doesn't make sense for the robot to just turn 458 00:27:26,440 --> 00:27:29,480 Speaker 1: its sternum towards you. It needs to be looking at 459 00:27:29,520 --> 00:27:33,040 Speaker 1: you with its robot eyes.
And I think of this 460 00:27:33,200 --> 00:27:36,679 Speaker 1: kind of like someone who's working a hand puppet and 461 00:27:36,720 --> 00:27:39,760 Speaker 1: they've got the hand puppet up over their head, so 462 00:27:40,000 --> 00:27:42,359 Speaker 1: maybe they're behind a little stage, you know, like 463 00:27:42,480 --> 00:27:45,719 Speaker 1: the muppets tend to be. You've got this hand puppet 464 00:27:45,720 --> 00:27:49,480 Speaker 1: and it needs to make eye contact with a human being. Well, 465 00:27:49,600 --> 00:27:52,000 Speaker 1: that just means the puppeteer has to take that into 466 00:27:52,040 --> 00:27:56,919 Speaker 1: account and angle their hand so that the puppet's eyes 467 00:27:57,119 --> 00:28:00,359 Speaker 1: appear to be locking on the eyes of the real 468 00:28:00,440 --> 00:28:04,440 Speaker 1: person that the muppet or puppet is interacting with. It's 469 00:28:04,440 --> 00:28:07,680 Speaker 1: a little tricky. It requires some skill. For the robot, 470 00:28:07,800 --> 00:28:10,600 Speaker 1: it means that there's some, you know, nifty geometry going 471 00:28:10,640 --> 00:28:13,960 Speaker 1: on in the processor side to make this work out. 472 00:28:14,040 --> 00:28:17,320 Speaker 1: Like the image recognition has to identify where the eyes 473 00:28:17,400 --> 00:28:22,560 Speaker 1: of the onlooker are and then calculate where the robot's 474 00:28:22,560 --> 00:28:25,840 Speaker 1: eyes are in relation to that and direct them in 475 00:28:25,880 --> 00:28:29,359 Speaker 1: the right way, which to me is really fascinating because again, 476 00:28:29,720 --> 00:28:32,359 Speaker 1: the eyes of the robot are not where the visual 477 00:28:32,359 --> 00:28:36,399 Speaker 1: information is actually coming in.
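That nifty geometry boils down to aiming from the eye position rather than from the camera position. Here's a small sketch with made-up coordinate conventions (x right, y up, z forward, all measured in the chest camera's frame; the paper's actual math and frames may differ):

```python
import math

def eye_gaze_angles(face_in_camera, eye_offset):
    """Compute where the eyes should point at a face detected by a
    camera mounted elsewhere on the body. face_in_camera is the face
    position (x, y, z) in the camera frame, in meters; eye_offset is
    the eye position in that same frame. Returns (yaw, pitch) in
    radians for the eye mechanism."""
    # Vector from the eyes (not the camera) to the detected face:
    dx = face_in_camera[0] - eye_offset[0]
    dy = face_in_camera[1] - eye_offset[1]
    dz = face_in_camera[2] - eye_offset[2]
    yaw = math.atan2(dx, dz)                    # left/right rotation
    pitch = math.atan2(dy, math.hypot(dx, dz))  # up/down rotation
    return yaw, pitch
```

If the eyes sit, say, fifteen centimeters above the chest camera, the correction matters most for faces close to the figure; for a face far away, the camera-to-eye offset becomes a negligible angle.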
We'll talk more about the 478 00:28:36,440 --> 00:28:39,240 Speaker 1: behaviors of this robot in a second, but since we're 479 00:28:39,240 --> 00:28:41,720 Speaker 1: already chatting about cameras, it's good to talk about what 480 00:28:41,800 --> 00:28:44,600 Speaker 1: the team was actually using to give the robot its vision. 481 00:28:45,040 --> 00:28:47,840 Speaker 1: They went with an off the shelf solution. They used a 482 00:28:47,920 --> 00:28:52,320 Speaker 1: camera called the Mint Eye D one thousand, and Mint 483 00:28:52,440 --> 00:28:56,760 Speaker 1: is spelled M Y N T. This particular camera has two 484 00:28:56,840 --> 00:29:00,400 Speaker 1: lenses in it for stereoscopic vision, and so together they 485 00:29:00,400 --> 00:29:03,240 Speaker 1: can create a stereo image, that is, an image with, 486 00:29:03,640 --> 00:29:05,640 Speaker 1: you know, kind of depth, like a three D 487 00:29:05,760 --> 00:29:09,160 Speaker 1: image, with a resolution of two thousand, five hundred sixty 488 00:29:09,160 --> 00:29:12,920 Speaker 1: by seven twenty pixels at sixty frames per second, so, 489 00:29:12,920 --> 00:29:16,480 Speaker 1: you know, this is video information. There's 490 00:29:16,480 --> 00:29:20,120 Speaker 1: also a depth map mode which uses infrared light to 491 00:29:20,160 --> 00:29:23,800 Speaker 1: help judge the depth of the things within its field 492 00:29:23,800 --> 00:29:26,600 Speaker 1: of view, like how close is one thing versus another 493 00:29:27,240 --> 00:29:30,160 Speaker 1: relative to the camera, and the depth map's resolution is 494 00:29:30,200 --> 00:29:33,160 Speaker 1: one thousand, two hundred eighty by seven twenty pixels 495 00:29:33,200 --> 00:29:36,200 Speaker 1: at sixty frames per second. As I mentioned, these two 496 00:29:36,320 --> 00:29:40,000 Speaker 1: lenses allow the camera to simulate human binocular vision.
So 497 00:29:40,040 --> 00:29:42,400 Speaker 1: just as we perceive depth in the world around us 498 00:29:42,520 --> 00:29:45,320 Speaker 1: using two eyes, you know, most of us, this 499 00:29:45,440 --> 00:29:48,480 Speaker 1: camera can do the same thing and judge which things 500 00:29:48,520 --> 00:29:51,360 Speaker 1: are in the foreground versus the background, what things are 501 00:29:51,520 --> 00:29:54,280 Speaker 1: closest to it versus furthest away, and make a better 502 00:29:54,320 --> 00:29:57,440 Speaker 1: determination of which things within its field of view are 503 00:29:57,480 --> 00:30:00,680 Speaker 1: worthy of attention, which will become important in a little bit. 504 00:30:01,240 --> 00:30:03,880 Speaker 1: The camera has a more limited field of view than 505 00:30:03,920 --> 00:30:07,640 Speaker 1: a typical human. It has about half the horizontal field 506 00:30:07,720 --> 00:30:10,760 Speaker 1: of view of a person, so its periphery is more narrow, 507 00:30:11,400 --> 00:30:13,959 Speaker 1: and it has a little more than a third the 508 00:30:14,080 --> 00:30:16,440 Speaker 1: vertical field of view, so it can't see as much 509 00:30:16,520 --> 00:30:19,840 Speaker 1: up and down as your typical person can. So any 510 00:30:19,920 --> 00:30:23,080 Speaker 1: future animatronic figure might need a more expansive field of 511 00:30:23,160 --> 00:30:26,000 Speaker 1: view to be able to interact with guests who could 512 00:30:26,080 --> 00:30:28,640 Speaker 1: range in height from very small to quite tall. I mean, 513 00:30:28,680 --> 00:30:31,680 Speaker 1: all sorts of people go to Disney.
So I do 514 00:30:31,760 --> 00:30:35,240 Speaker 1: see that as a potential limiting factor in the short run, 515 00:30:35,800 --> 00:30:38,920 Speaker 1: that any stereoscopic kind of camera would need to have 516 00:30:39,000 --> 00:30:43,480 Speaker 1: a pretty good field of view for a robot to 517 00:30:43,520 --> 00:30:49,080 Speaker 1: be able to interact properly with guests of different heights. Now, 518 00:30:49,120 --> 00:30:51,200 Speaker 1: I decided to see how much this camera would cost 519 00:30:51,240 --> 00:30:54,360 Speaker 1: for some normal schlub like myself, and the answer is 520 00:30:54,520 --> 00:30:56,840 Speaker 1: less than four hundred dollars. So this is actually a 521 00:30:56,840 --> 00:31:01,600 Speaker 1: pretty inexpensive solution all things considered. And again, 522 00:31:01,640 --> 00:31:05,520 Speaker 1: it's really more important for creating the basis for the 523 00:31:05,600 --> 00:31:08,400 Speaker 1: work as opposed to saying this is a final product. 524 00:31:08,960 --> 00:31:11,440 Speaker 1: And that's more or less the hardware side of things, 525 00:31:11,520 --> 00:31:13,680 Speaker 1: or at least as specific as I can get based 526 00:31:13,680 --> 00:31:16,840 Speaker 1: on the material available. Like, I don't know what 527 00:31:17,000 --> 00:31:19,880 Speaker 1: the power of their computer system was, you know, I 528 00:31:19,920 --> 00:31:23,120 Speaker 1: don't know the specific types of motors they were using 529 00:31:23,120 --> 00:31:26,720 Speaker 1: in the animatronic, but from a high level we understand 530 00:31:26,720 --> 00:31:30,200 Speaker 1: what's going on.
However, the real magic happens with the 531 00:31:30,240 --> 00:31:33,840 Speaker 1: system that gives this hardware its orders, and the team 532 00:31:33,880 --> 00:31:36,760 Speaker 1: made the conscious decision to create the illusion of life 533 00:31:37,120 --> 00:31:41,040 Speaker 1: rather than attempt to replicate human behaviors perfectly, which is 534 00:31:41,040 --> 00:31:43,400 Speaker 1: a bit of a challenging concept. You might think, well, 535 00:31:43,400 --> 00:31:46,160 Speaker 1: what's the difference? But I think I have a pretty 536 00:31:46,200 --> 00:31:50,640 Speaker 1: decent analogy. If you've ever gone to see a stage play, 537 00:31:51,000 --> 00:31:55,560 Speaker 1: then you've seen sets. Maybe the sets were really detailed, 538 00:31:56,040 --> 00:31:59,240 Speaker 1: maybe they were bare bones sets. But in any case, 539 00:31:59,280 --> 00:32:01,800 Speaker 1: the sets are meant to create the illusion of a 540 00:32:01,880 --> 00:32:04,800 Speaker 1: real place at a real moment of time. You know, 541 00:32:04,840 --> 00:32:07,000 Speaker 1: it could be a room in the eighteenth century in 542 00:32:07,040 --> 00:32:10,480 Speaker 1: a palatial estate, or it might be a 543 00:32:10,600 --> 00:32:14,560 Speaker 1: modern day real estate sales office if it's a modern play, 544 00:32:14,680 --> 00:32:18,600 Speaker 1: or maybe it's a campsite. In any case, the sets 545 00:32:18,600 --> 00:32:21,800 Speaker 1: and props are meant to convey the illusion of that 546 00:32:21,880 --> 00:32:24,920 Speaker 1: place and time, and if you were to actually get 547 00:32:25,000 --> 00:32:27,360 Speaker 1: up on stage and walk around, that illusion would very 548 00:32:27,440 --> 00:32:30,480 Speaker 1: quickly be broken.
But when you're sitting in the audience, 549 00:32:30,920 --> 00:32:33,680 Speaker 1: it's up to you to use your imagination to fill 550 00:32:33,720 --> 00:32:36,840 Speaker 1: in some of the gaps and suspend disbelief. It is 551 00:32:37,000 --> 00:32:41,360 Speaker 1: a show. Likewise, the engineers who worked on this project 552 00:32:41,520 --> 00:32:45,600 Speaker 1: talk about robot behaviors in terms of a show, and 553 00:32:45,640 --> 00:32:48,040 Speaker 1: that means that the robot needs to react and move 554 00:32:48,080 --> 00:32:50,840 Speaker 1: in ways that create the illusion of life, but it 555 00:32:50,880 --> 00:32:56,320 Speaker 1: does not necessarily need to adhere completely to human behaviors. 556 00:32:56,360 --> 00:33:00,000 Speaker 1: This makes things much more simple, particularly since it removes 557 00:33:00,000 --> 00:33:03,800 Speaker 1: tricky questions regarding what sets of behaviors are the 558 00:33:03,800 --> 00:33:07,400 Speaker 1: most human, because I'm sure you've noticed human beings and 559 00:33:07,520 --> 00:33:11,600 Speaker 1: human behavior occur across a really broad spectrum, and what 560 00:33:11,840 --> 00:33:15,240 Speaker 1: might be a typical set of behaviors for one person 561 00:33:15,560 --> 00:33:19,000 Speaker 1: could be completely alien to another person. So it's a 562 00:33:19,000 --> 00:33:23,120 Speaker 1: good idea to not try and define what sets of 563 00:33:23,160 --> 00:33:28,000 Speaker 1: behaviors are quintessentially human. When we come back, I'll talk 564 00:33:28,040 --> 00:33:31,320 Speaker 1: about how the team determined how the robot would actually 565 00:33:31,360 --> 00:33:35,400 Speaker 1: behave. It's pretty cool, but first let's take another quick break. 566 00:33:42,760 --> 00:33:46,440 Speaker 1: The team created an architecture to describe the relationship of 567 00:33:46,560 --> 00:33:51,280 Speaker 1: various elements to create the behavior of an interactive robotic gaze.
568 00:33:51,400 --> 00:33:56,120 Speaker 1: To create this robotic eye contact, the layers include the camera, 569 00:33:56,400 --> 00:33:59,280 Speaker 1: which is, you know, the point of perception from the robot, 570 00:33:59,800 --> 00:34:05,280 Speaker 1: a perception engine, and an attention engine, which determines 571 00:34:05,680 --> 00:34:08,560 Speaker 1: which things within the robot's perception are actually worthy of 572 00:34:08,680 --> 00:34:13,200 Speaker 1: attention or focus, a behavior selection engine and a library 573 00:34:13,280 --> 00:34:18,880 Speaker 1: of potential behaviors, and the audio animatronic figure's systems, its hardware; 574 00:34:18,960 --> 00:34:22,839 Speaker 1: the motor commands and motor states go to that. And 575 00:34:23,080 --> 00:34:26,120 Speaker 1: those are the layers in order from top to bottom. These 576 00:34:26,200 --> 00:34:29,320 Speaker 1: layers explain the relationship of each element in sort of 577 00:34:29,320 --> 00:34:32,600 Speaker 1: an abstract way, allowing us to understand how the robot 578 00:34:32,680 --> 00:34:36,640 Speaker 1: processes and reacts to information. So the perception engine is 579 00:34:36,680 --> 00:34:40,759 Speaker 1: designed to identify potential elements within the robotic vision, you know, 580 00:34:40,800 --> 00:34:44,200 Speaker 1: separating things out from say just a static background, and 581 00:34:44,239 --> 00:34:47,600 Speaker 1: the attention engine attempts to identify things within the robot's 582 00:34:47,680 --> 00:34:51,560 Speaker 1: vision that merit focus. The attention engine generates what the 583 00:34:51,560 --> 00:34:56,040 Speaker 1: team calls a curiosity score. So if that curiosity score 584 00:34:56,160 --> 00:35:00,600 Speaker 1: is below a certain threshold, the robot won't quote unquote 585 00:35:00,640 --> 00:35:03,719 Speaker 1: notice something within its field of view.
It's not 586 00:35:03,880 --> 00:35:07,960 Speaker 1: enough to capture its attention. Certain actions, such as, you know, 587 00:35:08,000 --> 00:35:12,040 Speaker 1: waving at the robot, merit a higher curiosity score. So 588 00:35:12,080 --> 00:35:15,640 Speaker 1: if the score ends up being above the curiosity score threshold, 589 00:35:15,960 --> 00:35:18,920 Speaker 1: the robot will look toward whatever it was that, you know, 590 00:35:19,200 --> 00:35:22,560 Speaker 1: quote unquote got its attention. The team decided it would 591 00:35:22,560 --> 00:35:25,680 Speaker 1: be helpful to create a sort of scenario to work with, 592 00:35:25,760 --> 00:35:28,560 Speaker 1: not just have, you know, a robot randomly looking around. 593 00:35:28,600 --> 00:35:32,800 Speaker 1: So their approach was to simulate an elderly man reading 594 00:35:32,880 --> 00:35:36,720 Speaker 1: something like a newspaper or a book. Most of the time, 595 00:35:36,960 --> 00:35:39,239 Speaker 1: the robot would be looking downward a bit, you know, 596 00:35:39,280 --> 00:35:41,839 Speaker 1: its head tilted down a little, as if it were 597 00:35:41,840 --> 00:35:44,879 Speaker 1: reading something that was held more or less at torso level. 598 00:35:45,600 --> 00:35:48,960 Speaker 1: If something moves into the robot's field of view, the 599 00:35:49,080 --> 00:35:51,600 Speaker 1: robot could glance up quickly, just as a human would, 600 00:35:51,600 --> 00:35:54,640 Speaker 1: to assess what's going on, and if whatever is within 601 00:35:54,719 --> 00:35:57,839 Speaker 1: the field of view creates a curiosity score lower than 602 00:35:58,000 --> 00:36:01,319 Speaker 1: what the threshold is, then the robot just goes back 603 00:36:01,360 --> 00:36:04,840 Speaker 1: to reading.
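The curiosity-score-and-threshold idea can be sketched as a tiny scoring function. The features and weights below are invented placeholders for illustration, not values from the paper:

```python
def select_focus(stimuli, curiosity_threshold=0.5):
    """Toy attention engine: score each stimulus in the scene and return
    the one most worth looking at, or None if nothing crosses the
    curiosity threshold (in which case the robot goes back to reading).
    The feature names and weights here are made up for illustration."""
    def curiosity(s):
        score = 0.0
        score += 0.6 if s.get("waving") else 0.0        # motion aimed at robot
        score += 0.3 if s.get("close") else 0.0         # proximity from depth map
        score += 0.2 if s.get("facing_robot") else 0.0  # face turned this way
        return score

    scored = [(curiosity(s), s) for s in stimuli]
    best_score, best = max(scored, key=lambda p: p[0], default=(0.0, None))
    return best if best_score >= curiosity_threshold else None
```

The threshold is what keeps the figure from whipping its head toward every passerby: unremarkable activity scores low and never quote unquote registers.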
If whatever is going on is above that 604 00:36:04,960 --> 00:36:09,560 Speaker 1: curiosity score threshold, the robot might look directly at whatever 605 00:36:09,600 --> 00:36:12,279 Speaker 1: it is that's happening, and then things could progress from there. 606 00:36:12,920 --> 00:36:16,520 Speaker 1: That's where the behavior selection engine and behavior library come 607 00:36:16,560 --> 00:36:19,319 Speaker 1: into play. There are a few possible reactions, and the 608 00:36:19,360 --> 00:36:22,880 Speaker 1: robot will choose one depending on several factors. For example, 609 00:36:23,120 --> 00:36:26,880 Speaker 1: one such factor was familiarity. The robot would behave differently 610 00:36:26,880 --> 00:36:30,800 Speaker 1: toward people it quote unquote recognized. It also wouldn't switch 611 00:36:30,920 --> 00:36:34,560 Speaker 1: focus every time someone tried to wave it down. So 612 00:36:34,719 --> 00:36:37,080 Speaker 1: if you were to distract the robot, it might look 613 00:36:37,120 --> 00:36:39,799 Speaker 1: away from whatever it was looking at before and then 614 00:36:39,840 --> 00:36:42,440 Speaker 1: look to you once. Then it might look back at 615 00:36:42,480 --> 00:36:45,479 Speaker 1: someone it quote unquote knows, and if you were to wave 616 00:36:45,480 --> 00:36:48,880 Speaker 1: at it again, you wouldn't necessarily get a response. So 617 00:36:49,040 --> 00:36:51,600 Speaker 1: kind of think about how adults can be with kids, 618 00:36:51,880 --> 00:36:54,800 Speaker 1: where the adults tend to develop a highly attuned skill 619 00:36:54,880 --> 00:36:57,560 Speaker 1: of ignoring the child after a bit, even if the 620 00:36:57,600 --> 00:37:03,279 Speaker 1: child is saying, but look, look, look, hey, look, look 621 00:37:03,280 --> 00:37:07,800 Speaker 1: what I'm doing, look, and so on. So the team 622 00:37:07,880 --> 00:37:12,840 Speaker 1: created four basic states.
The default state was called read, 623 00:37:13,200 --> 00:37:15,920 Speaker 1: meaning it would appear as though the figure were reading 624 00:37:15,920 --> 00:37:19,480 Speaker 1: a book or newspaper at torso level. The next state 625 00:37:19,600 --> 00:37:23,040 Speaker 1: up is glance, whereupon the robot would appear to 626 00:37:23,120 --> 00:37:26,480 Speaker 1: glance away from the reading material to see what sort 627 00:37:26,480 --> 00:37:29,440 Speaker 1: of ruckus is going on. This involved movement of not 628 00:37:29,520 --> 00:37:31,759 Speaker 1: just the eyes but the head as well. So the 629 00:37:31,800 --> 00:37:35,000 Speaker 1: head tilts up a bit and it looks, for a moment, 630 00:37:35,080 --> 00:37:38,799 Speaker 1: like the robot is looking away from the imaginary book 631 00:37:38,840 --> 00:37:42,319 Speaker 1: or newspaper. If the curiosity threshold is met, then the 632 00:37:42,360 --> 00:37:46,640 Speaker 1: next state, engage, would pop up. This means that whatever 633 00:37:46,680 --> 00:37:49,040 Speaker 1: it was that got the robot's attention is worthy of 634 00:37:49,200 --> 00:37:52,399 Speaker 1: further focus, and the robot will direct its gaze at 635 00:37:52,480 --> 00:37:56,560 Speaker 1: that thing. With the engage stage, which has a nice 636 00:37:56,640 --> 00:37:59,759 Speaker 1: rhyme to it, the robot will attempt to make eye contact, 637 00:38:00,000 --> 00:38:02,840 Speaker 1: which involves the cameras detecting the face of the person 638 00:38:02,920 --> 00:38:06,160 Speaker 1: of interest, and then the computer system commanding the robot's 639 00:38:06,160 --> 00:38:09,600 Speaker 1: head and eyes to aim towards that detected face.
The 640 00:38:09,640 --> 00:38:11,920 Speaker 1: amount of time that the robot spends looking at a 641 00:38:11,960 --> 00:38:15,240 Speaker 1: person is determined both by a minimum countdown clock, saying 642 00:38:15,640 --> 00:38:18,960 Speaker 1: you have to spend at least this amount of time looking at 643 00:38:19,040 --> 00:38:22,719 Speaker 1: this person, and by the curiosity score that the 644 00:38:22,800 --> 00:38:26,040 Speaker 1: robot has assigned to that person. So once that score 645 00:38:26,080 --> 00:38:30,600 Speaker 1: decreases below the engage threshold, the robot returns to read. 646 00:38:30,760 --> 00:38:34,120 Speaker 1: So if you happen to be particularly interesting, the robot 647 00:38:34,160 --> 00:38:37,360 Speaker 1: will look at you for longer, and when you stop 648 00:38:37,360 --> 00:38:40,839 Speaker 1: being interesting, the robot eventually goes back to reading its 649 00:38:40,840 --> 00:38:45,399 Speaker 1: pretend book or whatever. The final state is called acknowledge, 650 00:38:45,400 --> 00:38:47,279 Speaker 1: and that was the name that the team gave for 651 00:38:47,320 --> 00:38:49,960 Speaker 1: those times when the robot is seeing a person that 652 00:38:50,080 --> 00:38:53,720 Speaker 1: is familiar to the robot. For the purposes of the tests, 653 00:38:54,200 --> 00:38:58,000 Speaker 1: the familiarity variable was actually randomized, so in other words, 654 00:38:58,280 --> 00:39:02,480 Speaker 1: the robot wasn't necessarily familiar with people. It just 655 00:39:02,880 --> 00:39:06,800 Speaker 1: was told it was familiar with somebody.
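The gaze-duration rule just described, a minimum countdown plus a curiosity score that must stay above the engage threshold, can be sketched as below. The two-part rule is from the episode; the specific durations, decay rate, and function names are assumptions for illustration.

```python
# Sketch of the dwell-time rule: hold eye contact for at least a minimum
# time, then keep looking only while curiosity stays above the threshold.
# All numeric defaults here are invented for the sake of the example.

def should_keep_looking(elapsed: float, curiosity: float,
                        min_dwell: float = 2.0,
                        engage_threshold: float = 0.6) -> bool:
    if elapsed < min_dwell:
        return True                      # minimum countdown still running
    return curiosity > engage_threshold  # after that, interest must persist

def decay(curiosity: float, dt: float, rate: float = 0.1) -> float:
    """Curiosity fades over time unless the person does something new."""
    return max(0.0, curiosity - rate * dt)
```

So a particularly interesting onlooker keeps the score high and holds the robot's gaze; once the score decays below the threshold, the figure drifts back to its pretend book.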
So, in other words, 655 00:39:06,800 --> 00:39:09,360 Speaker 1: it could be a totally new person that walks 656 00:39:09,440 --> 00:39:12,960 Speaker 1: up to the robot and the robot randomly assigns that 657 00:39:13,000 --> 00:39:16,879 Speaker 1: person the familiar tag, and the robot will behave as 658 00:39:16,880 --> 00:39:20,759 Speaker 1: if that's someone that the robot recognizes. Maybe they're just 659 00:39:20,880 --> 00:39:24,840 Speaker 1: an old friend the robot just met. Is there a 660 00:39:24,880 --> 00:39:28,640 Speaker 1: word for that? The robot system also had a sort 661 00:39:28,680 --> 00:39:32,480 Speaker 1: of short term memory that the team called the guesthouse. 662 00:39:33,080 --> 00:39:35,799 Speaker 1: As people would come into the robot's field of view, 663 00:39:36,080 --> 00:39:39,720 Speaker 1: or the scene as the team called it, the robot 664 00:39:39,760 --> 00:39:43,799 Speaker 1: would analyze that person and assign that person a numerical 665 00:39:43,960 --> 00:39:47,560 Speaker 1: value to keep track of that person, and it would 666 00:39:47,560 --> 00:39:49,920 Speaker 1: also keep track of how many times that particular person 667 00:39:50,200 --> 00:39:53,239 Speaker 1: had been within its field of view, and it would 668 00:39:53,320 --> 00:39:56,160 Speaker 1: keep track of the curiosity score that was assigned to 669 00:39:56,280 --> 00:39:59,760 Speaker 1: that person. In addition to the states, the team described 670 00:39:59,840 --> 00:40:03,520 Speaker 1: layers of show.
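The "guesthouse" short-term memory described here, a numeric ID per person plus a visit count and a stored curiosity score, could be sketched like this. The guesthouse name and the three tracked quantities come from the episode; the class layout, method names, and field names are my own invention.

```python
# Sketch of the "guesthouse" short-term memory: each person entering the
# scene gets a numeric ID, a visit count, and a stored curiosity score.
# Structure and names are hypothetical; the paper only describes the idea.

import itertools

class Guesthouse:
    def __init__(self):
        self._next_id = itertools.count(1)
        self.guests = {}  # id -> {"visits": int, "curiosity": float}

    def check_in(self, guest_id=None, curiosity=0.0):
        """Register a new arrival, or bump the visit count of a returnee."""
        if guest_id is None or guest_id not in self.guests:
            guest_id = next(self._next_id)
            self.guests[guest_id] = {"visits": 1, "curiosity": curiosity}
        else:
            self.guests[guest_id]["visits"] += 1
            self.guests[guest_id]["curiosity"] = curiosity
        return guest_id
```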
Now this relates closely to the 671 00:40:03,560 --> 00:40:06,480 Speaker 1: states I just mentioned, but it helps explain how the 672 00:40:06,600 --> 00:40:09,879 Speaker 1: robot transitions from one set of behaviors to another: how 673 00:40:09,880 --> 00:40:13,239 Speaker 1: does it make the determination to change from one thing 674 00:40:13,320 --> 00:40:17,360 Speaker 1: to the next, and which behaviors will override others versus 675 00:40:17,440 --> 00:40:21,160 Speaker 1: behaviors that will always be present with the robot. All 676 00:40:21,200 --> 00:40:24,640 Speaker 1: of this is necessary because of that variation I was 677 00:40:24,680 --> 00:40:27,640 Speaker 1: talking about at the beginning of the show. If the 678 00:40:27,760 --> 00:40:32,239 Speaker 1: robot were just following a scripted set of directions, it 679 00:40:32,280 --> 00:40:34,839 Speaker 1: wouldn't have to make these determinations because it would just 680 00:40:34,840 --> 00:40:38,520 Speaker 1: follow the same sequence over and over. But because we 681 00:40:38,600 --> 00:40:42,800 Speaker 1: have this variability, we have to build in a system 682 00:40:42,920 --> 00:40:45,359 Speaker 1: for the robot to follow in order to make decisions. 683 00:40:45,440 --> 00:40:48,040 Speaker 1: So at the base level you have what the team 684 00:40:48,040 --> 00:40:52,560 Speaker 1: calls zero show. This is essentially the robot in off mode. 685 00:40:52,640 --> 00:40:55,719 Speaker 1: It is inanimate. But the next layer up is a 686 00:40:55,800 --> 00:41:00,279 Speaker 1: live show, which has the baseline behaviors of simulated breathing, 687 00:41:00,960 --> 00:41:05,399 Speaker 1: eye blinking, and saccades. This level of show underlies 688 00:41:05,680 --> 00:41:08,960 Speaker 1: all the other higher levels, so this is sort of 689 00:41:09,920 --> 00:41:13,520 Speaker 1: always running in the background.
You don't want the robot 690 00:41:13,560 --> 00:41:17,160 Speaker 1: to suddenly stop breathing while it does other stuff. The 691 00:41:17,280 --> 00:41:21,400 Speaker 1: next four show levels correspond with the four states of 692 00:41:21,440 --> 00:41:25,560 Speaker 1: the robot. So you have read, glance, engage, and acknowledge, 693 00:41:26,040 --> 00:41:30,920 Speaker 1: and an engage show will subsume the glance and read shows. 694 00:41:31,360 --> 00:41:34,040 Speaker 1: It will take over the robot's behaviors, so the robot's 695 00:41:34,080 --> 00:41:37,800 Speaker 1: not going to display the behaviors of read and glance 696 00:41:38,160 --> 00:41:43,279 Speaker 1: when engage happens. So it's that hierarchy of operations, and 697 00:41:43,360 --> 00:41:46,080 Speaker 1: I find it really interesting to look at robot behaviors 698 00:41:46,080 --> 00:41:49,359 Speaker 1: in this way, as that hierarchy of potential states. It's 699 00:41:49,400 --> 00:41:52,800 Speaker 1: amazing when you break down those states and determine which 700 00:41:52,840 --> 00:41:57,520 Speaker 1: should take priority given certain circumstances, and how long that 701 00:41:57,640 --> 00:42:00,520 Speaker 1: state should remain active before it reverts to a 702 00:42:00,640 --> 00:42:04,400 Speaker 1: lower-level state. Again, the team is trying to create 703 00:42:04,440 --> 00:42:07,240 Speaker 1: the illusion of life. The robot doesn't have to actually 704 00:42:07,320 --> 00:42:10,960 Speaker 1: lose interest or anything like that. It's just simulating it. 705 00:42:11,600 --> 00:42:15,040 Speaker 1: This particular project was working within some pretty well-defined 706 00:42:15,080 --> 00:42:18,640 Speaker 1: parameters and restrictions. The team acknowledged that their work is 707 00:42:18,680 --> 00:42:22,120 Speaker 1: really meant to be a starting point for further improvements.
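The layered-show idea described here, base behaviors that never stop while the single highest active show subsumes the ones below it, can be sketched like this. The layer ordering and the always-on breathing, blinking, and saccades come from the episode; the code structure and names are illustrative assumptions.

```python
# Sketch of the layered shows: the live-show layer (breathing, blinking,
# saccades) always runs, while exactly one higher behavior layer is active
# and subsumes any lower ones. Structure is hypothetical.

LAYERS = ["read", "glance", "engage", "acknowledge"]  # low to high priority

def active_behaviors(requested):
    """Return the always-on base behaviors plus the single highest
    requested show layer, which subsumes everything below it."""
    base = ["breathing", "blinking", "saccades"]  # live show, never stops
    if not requested:
        return base
    top = max(requested, key=LAYERS.index)  # highest-priority layer wins
    return base + [top]
```

So if both read and engage are requested, engage takes over, and the figure never displays the read behaviors underneath it, but it keeps breathing and blinking throughout.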
709 00:42:22,560 --> 00:42:26,759 Speaker 1: They point out that older audio animatronics might seem lifelike 710 00:42:26,840 --> 00:42:30,760 Speaker 1: at greater distances and for shorter durations. So, for example, 711 00:42:31,239 --> 00:42:33,920 Speaker 1: if you were to ride an attraction where you go 712 00:42:34,040 --> 00:42:37,520 Speaker 1: by a scene of audio animatronic figures at a decent 713 00:42:37,520 --> 00:42:40,800 Speaker 1: clip and and there you know, good, twenty feet away. 714 00:42:41,200 --> 00:42:43,919 Speaker 1: The limited amount of time and the greater distance that 715 00:42:44,080 --> 00:42:48,160 Speaker 1: are involved can help support that illusion of life. The 716 00:42:48,200 --> 00:42:51,879 Speaker 1: animatronic figures don't have to be super convincing because you're 717 00:42:51,920 --> 00:42:55,480 Speaker 1: not spending enough time and attention to see through the illusion, 718 00:42:55,520 --> 00:42:58,120 Speaker 1: nor are you close enough to see it showed through. 719 00:42:58,840 --> 00:43:01,800 Speaker 1: The more time you have and the less distance between 720 00:43:01,880 --> 00:43:04,840 Speaker 1: you and the animatronic figure, the harder it is to 721 00:43:04,920 --> 00:43:09,800 Speaker 1: create and maintain that illusion of life. Without an interactive gaze, 722 00:43:09,880 --> 00:43:13,800 Speaker 1: Without eye contact, it becomes pretty clear that the animatronic 723 00:43:13,840 --> 00:43:17,480 Speaker 1: figure has no real lifelike quality to it. If you 724 00:43:17,520 --> 00:43:20,520 Speaker 1: were to stand close to one of these older animatronic figures, 725 00:43:20,920 --> 00:43:23,280 Speaker 1: you would notice that it's not really looking at anything 726 00:43:23,320 --> 00:43:26,520 Speaker 1: in particular, and that its movements are a matter of routine. 727 00:43:26,880 --> 00:43:32,000 Speaker 1: It's not a demonstration of spontaneous or seemingly spontaneous decisions. 
728 00:43:32,640 --> 00:43:35,799 Speaker 1: The Interactive Gaze project takes this a step up. The 729 00:43:35,880 --> 00:43:39,080 Speaker 1: robot can recognize and acknowledge someone that is in the 730 00:43:39,160 --> 00:43:43,200 Speaker 1: robot's presence, it can direct its focus and attention at 731 00:43:43,239 --> 00:43:46,560 Speaker 1: that person. This definitely is a step up in creating 732 00:43:46,560 --> 00:43:50,560 Speaker 1: that illusion and works at much smaller distances of viewing 733 00:43:51,040 --> 00:43:54,200 Speaker 1: than the older methods do, but the engineers admit it 734 00:43:54,320 --> 00:43:57,759 Speaker 1: still has limitations. They point out that their approach as 735 00:43:57,800 --> 00:44:00,680 Speaker 1: it stands, might serve as a way to reserve that 736 00:44:00,760 --> 00:44:04,320 Speaker 1: illusion of life for a couple of minutes at the most, 737 00:44:04,640 --> 00:44:08,040 Speaker 1: but beyond that the illusion would start to fade away. 738 00:44:08,280 --> 00:44:10,880 Speaker 1: They point out that as the distance between the robot 739 00:44:10,960 --> 00:44:14,359 Speaker 1: and the audience decreases, and as the time of observing 740 00:44:14,360 --> 00:44:19,560 Speaker 1: the robot increases, you have to incorporate increasingly complex and 741 00:44:19,640 --> 00:44:24,040 Speaker 1: natural behaviors to maintain that illusion of life, and interactive 742 00:44:24,080 --> 00:44:27,359 Speaker 1: gaze is just one element. Others could include stuff like 743 00:44:27,440 --> 00:44:31,440 Speaker 1: a display of emotion. The bust has sort of a 744 00:44:31,440 --> 00:44:34,040 Speaker 1: little bit of this. 
It can imply a 744 00:44:34,080 --> 00:44:36,640 Speaker 1: sense of emotion to some degree with the way it 745 00:44:36,719 --> 00:44:40,240 Speaker 1: holds its eyes, but because it doesn't have any movement 746 00:44:40,239 --> 00:44:42,960 Speaker 1: of its jaw or lips, and doesn't have any other 747 00:44:43,840 --> 00:44:48,400 Speaker 1: means of really indicating emotion, this is pretty limited. So 748 00:44:48,480 --> 00:44:52,080 Speaker 1: perhaps a robot that can hear and parse and respond 749 00:44:52,080 --> 00:44:54,440 Speaker 1: to speech, you know, sort of like the voice-activated 750 00:44:54,480 --> 00:44:57,200 Speaker 1: digital assistants that are familiar to us, and you know, 751 00:44:57,239 --> 00:45:00,360 Speaker 1: probably like the Amazon Echo or the iPhone or Android phones. 752 00:45:00,960 --> 00:45:04,640 Speaker 1: That might be something that really pushes that illusion of life. 753 00:45:05,000 --> 00:45:09,279 Speaker 1: And of course there's also the physical appearance aspect. Now, 754 00:45:09,280 --> 00:45:12,800 Speaker 1: you would never mistake this animatronic bust for a human. 755 00:45:13,120 --> 00:45:16,319 Speaker 1: As I mentioned before, it's pretty creepy looking. It's got a 756 00:45:16,360 --> 00:45:19,879 Speaker 1: plastic and skeletal quality to it that prevents you from 757 00:45:19,880 --> 00:45:23,640 Speaker 1: ever mistaking it as a person. But the team points 758 00:45:23,640 --> 00:45:27,240 Speaker 1: out the physical appearance of the robot taps back into 759 00:45:27,320 --> 00:45:31,000 Speaker 1: that problem of the uncanny valley. It might take a while 760 00:45:31,080 --> 00:45:34,800 Speaker 1: to create something that's convincing enough and yet not repulsive 761 00:45:36,120 --> 00:45:39,560 Speaker 1: to work as a robotic human animatronic. If you make 762 00:45:39,600 --> 00:45:43,240 Speaker 1: it look too real, it's going to give people the creeps.
764 00:45:43,880 --> 00:45:46,680 Speaker 1: I think, at least in the short term, we're more 765 00:45:46,760 --> 00:45:49,400 Speaker 1: likely to see this technology used to create characters that 766 00:45:49,440 --> 00:45:53,960 Speaker 1: are human like but still distinctly not human, in order 767 00:45:54,000 --> 00:45:58,000 Speaker 1: to avoid that negative reaction when the uncanny Valley gets involved. 768 00:45:58,320 --> 00:46:01,200 Speaker 1: In other words, using the US to create an animatronic 769 00:46:01,239 --> 00:46:04,799 Speaker 1: figure that looks a lot like a cartoon character, even 770 00:46:04,840 --> 00:46:09,160 Speaker 1: a human cartoon character because well, you recognize the cartoon 771 00:46:09,239 --> 00:46:13,680 Speaker 1: character as representing a human. Cartoon characters don't really look 772 00:46:13,719 --> 00:46:18,239 Speaker 1: like humans. Usually they look like they have human qualities 773 00:46:18,280 --> 00:46:21,719 Speaker 1: to them, but they still have cartoonish qualities to them, 774 00:46:21,760 --> 00:46:24,480 Speaker 1: so you wouldn't mistake them for actually being human. Or 775 00:46:24,560 --> 00:46:27,040 Speaker 1: you just you know, go the robot route or some 776 00:46:27,120 --> 00:46:31,239 Speaker 1: sort of animal career and you sidestep that problem. The 777 00:46:31,280 --> 00:46:34,440 Speaker 1: engineers conclude their paper by talking about how the attention 778 00:46:34,520 --> 00:46:37,960 Speaker 1: engine could, with some evolution, work for a lot of 779 00:46:37,960 --> 00:46:42,160 Speaker 1: different applications. 
So imagine that you design an animatronic that 780 00:46:42,239 --> 00:46:46,400 Speaker 1: represents someone who's really frightened, and that kind of character 781 00:46:46,560 --> 00:46:49,600 Speaker 1: might have a very low threshold for stimuli to push 782 00:46:49,640 --> 00:46:52,400 Speaker 1: it to a higher state of attentiveness, right? Like a 783 00:46:52,440 --> 00:46:54,920 Speaker 1: little sound might cause that character to perk up and 784 00:46:54,960 --> 00:46:59,440 Speaker 1: look around quickly, because that character is supposed to 785 00:46:59,440 --> 00:47:02,440 Speaker 1: be frightened. Or you could create something like, you know, 786 00:47:02,480 --> 00:47:06,279 Speaker 1: an absent-minded book lover who only glances up from 787 00:47:06,440 --> 00:47:10,400 Speaker 1: whatever book they're studying if something really exciting is happening; 788 00:47:10,400 --> 00:47:14,120 Speaker 1: otherwise they just ignore it. They also talk about the 789 00:47:14,160 --> 00:47:18,520 Speaker 1: bottom-up approach to layering behaviors and deciding which behaviors 790 00:47:18,560 --> 00:47:23,560 Speaker 1: will replace others that occupy a lower layer. That 791 00:47:23,680 --> 00:47:26,520 Speaker 1: is really fascinating to me. Now, we're still a fair 792 00:47:26,560 --> 00:47:29,560 Speaker 1: ways off from seeing these sorts of technologies make their 793 00:47:29,600 --> 00:47:33,279 Speaker 1: way into official attractions, but based on what I've seen 794 00:47:33,440 --> 00:47:36,160 Speaker 1: and read, I wouldn't be surprised to find them making 795 00:47:36,200 --> 00:47:38,879 Speaker 1: their way into Disney parks in the next, say, five 796 00:47:38,960 --> 00:47:42,760 Speaker 1: years or so, depending on how the company budgets stuff.
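The per-character tuning described here, the same attention engine dialed to a jumpy figure or an absent-minded reader, could be sketched like this. The two character archetypes come from the episode; the profile numbers, dictionary layout, and function names are invented for illustration.

```python
# Sketch of tuning one attention engine for different characters: a
# frightened figure reacts to almost anything, a book lover to almost
# nothing. All threshold values and names here are hypothetical.

CHARACTER_PROFILES = {
    "frightened": {"glance_threshold": 0.05, "engage_threshold": 0.2},
    "book_lover": {"glance_threshold": 0.7,  "engage_threshold": 0.9},
}

def reacts(character: str, stimulus: float) -> str:
    """Map a stimulus strength to a reaction for a given character."""
    p = CHARACTER_PROFILES[character]
    if stimulus >= p["engage_threshold"]:
        return "engage"   # full attention on the stimulus
    if stimulus >= p["glance_threshold"]:
        return "glance"   # quick look, then back to whatever it was doing
    return "ignore"
```

The same small sound that makes the frightened character perk up falls well below the book lover's glance threshold, so only the thresholds change, not the engine.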
797 00:47:42,800 --> 00:47:46,839 Speaker 1: Of course, the pandemic has created a particularly tricky situation 798 00:47:46,920 --> 00:47:49,839 Speaker 1: for that branch of the Disney Company, even as other 799 00:47:49,880 --> 00:47:53,319 Speaker 1: branches of that company continue it's global domination of all 800 00:47:53,360 --> 00:47:57,759 Speaker 1: things entertainment. But the technology itself and the design philosophy 801 00:47:57,760 --> 00:48:00,480 Speaker 1: of how to program a robot to behave as if 802 00:48:00,560 --> 00:48:03,600 Speaker 1: it were doing so naturally, it's really neat to me. 803 00:48:04,080 --> 00:48:06,040 Speaker 1: And as I said at the beginning, the paper is 804 00:48:06,040 --> 00:48:09,320 Speaker 1: available for free to read, so if you want to 805 00:48:09,400 --> 00:48:13,359 Speaker 1: check that out, I highly recommend it. I think it 806 00:48:13,480 --> 00:48:16,440 Speaker 1: is a fascinating piece of work, and as I said, 807 00:48:17,000 --> 00:48:20,840 Speaker 1: it's not that difficult to follow. There's some math stuff 808 00:48:20,880 --> 00:48:23,160 Speaker 1: that will probably, you know, lose a lot of you, 809 00:48:23,239 --> 00:48:25,719 Speaker 1: but it lost me. I'm not I'm not trying to 810 00:48:25,760 --> 00:48:28,279 Speaker 1: shame you. I couldn't follow all of it, but it 811 00:48:28,360 --> 00:48:31,000 Speaker 1: is otherwise pretty easy to understand. And like I said, 812 00:48:31,000 --> 00:48:36,279 Speaker 1: it is titled Realistic and Interactive Robot Gaze g A 813 00:48:36,600 --> 00:48:40,040 Speaker 1: Z E, so check that out. It is really a 814 00:48:40,080 --> 00:48:43,759 Speaker 1: neat paper. Just I apologize for the pictures that are 815 00:48:43,760 --> 00:48:48,080 Speaker 1: in there because they're creepy as all get out. That's 816 00:48:48,080 --> 00:48:50,640 Speaker 1: it for me. I hope you guys enjoyed this episode. 
817 00:48:50,840 --> 00:48:53,400 Speaker 1: If you have suggestions for future topics I should tackle 818 00:48:53,520 --> 00:48:56,360 Speaker 1: in tech stuff, let me know on Twitter. The handle 819 00:48:56,480 --> 00:48:59,840 Speaker 1: is text stuff h s W and I'll talk to 820 00:48:59,880 --> 00:49:08,160 Speaker 1: you again really soon. Text Stuff is an I Heart 821 00:49:08,280 --> 00:49:12,000 Speaker 1: Radio production. For more podcasts from I Heart Radio, visit 822 00:49:12,040 --> 00:49:15,080 Speaker 1: the I Heart Radio app, Apple Podcasts, or wherever you 823 00:49:15,200 --> 00:49:16,520 Speaker 1: listen to your favorite shows,