Speaker 1: Welcome to TechStuff, a production from iHeartRadio.

Speaker 1: Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and I love all things tech. And Halloween is over by the time you hear this. I hope you had a happy one. But I still have something that falls into the kind of creepy category, at least in my opinion. And I discovered this after looking around at tech news in general, and I became fascinated by it and figured, hey, you know, I haven't done a really focused episode on a very specific implementation of technology in a long time, so why not do that now? Now, anyone who knows me can tell you that I am a sucker for Disney Imagineering, which of course is the peculiar twist on engineering and innovation that Disney champions. Right? The inventiveness and the attention to detail impressed me a great deal. Those are hallmarks of Disney engineering, or Imagineering. And I've done episodes covering various elements that tie into this, from the history of EPCOT to how audio-animatronics work.
Speaker 1: And it's that last topic I wish to revisit, because not long ago I read a research paper from Disney Imagineers titled "Realistic and Interactive Robot Gaze." That's g-a-z-e, you know, referring to where a person, or in this case an object, a robot, appears to be looking. And the paper is fascinating, and it's available for anyone to read for free. So if you find this subject matter neat, I really recommend you read it. Now, it does get a bit technical. There's some math in there too, but for the most part, I think it's a pretty accessible paper. The pictures, and good gravy, y'all, the video that are connected to this project are the stuff of nightmares, but we'll get to that. The heart of the paper is all about designing systems so that an audio-animatronic, or just an animatronic figure, can make and maintain eye contact, or at least appear to, with someone who is looking at that figure, an onlooker. So, in other words, imagine that there's a Disney attraction at a park, and in this attraction you can walk up to a robot.
Speaker 1: It's probably going to be behind, like, a rail or inside a booth or something, so that you can't, you know, touch it. And the robot notices you looking at it, and it looks you in the eye. And then maybe you get to chat with the robot and it maintains eye contact with you, and occasionally maybe its eyes dart around to glance at other stuff that's within its field of view, or maybe even indicating that the robot is appearing to, like, take a second to think of a response. That's kind of what we're talking about here. And here's the thing: this is surprisingly difficult to do, and it's extra hard to do without dipping into super unsettling territory. So today we're going to learn more about the technology and the psychology behind this project, as well as what makes it different from earlier audio-animatronics, which is honestly a good place for us to start. The original audio-animatronics were essentially puppets. In fact, you could argue that all animatronics are ultimately puppets. Each puppet has a certain number of degrees of freedom, and that refers to the number of independent directions of motion.
Speaker 1: So let's take a simple example. Let's say that a robot's neck only has one degree of freedom. Well, that would mean the robot might be able to nod its head up and down. But if it could do that, it wouldn't be able to shake its head or tilt its head, because that would be an additional degree of freedom. Or maybe it's able to shake its head, but it's not able to nod or tilt, because it only has that one degree of freedom. That one degree is really limiting, and it just tells us the full range of directions of motion that any one joint can do, and we typically talk about degrees of freedom with joints to express the range of possible motions the, you know, whatever it is can perform. The Enchanted Tiki Room at Disneyland was an early example of audio-animatronic ingenuity. It wasn't the very first use of audio-animatronics, but it was an early one, and when you learn how it worked behind the scenes, it's pretty wacky. The various birds, flowers, and other elements in the attraction connected to a very complex system, including some pneumatic valves.
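[Editor's note: the degree-of-freedom idea from a moment ago can be sketched in a few lines of code. This is a hypothetical illustration, not Disney's software; the joint and axis names are made up for the example.]

```python
# Hypothetical sketch: counting degrees of freedom (DOF) for a figure.
# Each named joint lists its independent axes of motion.
joints = {
    "neck": ["nod"],  # one DOF: the neck can only pitch up and down
}

def total_dof(joints):
    """Total degrees of freedom is just the sum over all joints."""
    return sum(len(axes) for axes in joints.values())

# With only "nod", the neck cannot shake (yaw) or tilt (roll);
# each of those would be an additional degree of freedom.
print(total_dof(joints))  # 1

# Adding shake and tilt brings the neck to three degrees of freedom.
joints["neck"] += ["shake", "tilt"]
print(total_dof(joints))  # 3
```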
Speaker 1: A pneumatic system uses air under pressure to do work. So these valves in turn connected to a circuit that had thin metal reeds as switches. Now, normally the switch would be open, meaning no electricity could flow through the circuit and thus provide electricity to open or close the valve. But when sounds of a certain frequency would play near these reeds, it would cause those reeds to vibrate, and, you know, depending on the thickness and length of the reed, that would determine what frequency of sound would most likely get it to start vibrating. Once it vibrated, it would close the circuit and thus allow power to go through to the respective valve. And every bird and flower in the attraction had this sort of system, where the sounds playing through the sound system would actually cause the individual circuits for those birds and flowers to activate. So the chirping of the bird, that chirping sound was actually the sound that was opening and closing the circuit and thus activating the valve that would control the bird's beak. And because the figures relied on the sound to close the circuit, they were audio-animatronics.
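[Editor's note: as a rough model, you can think of each reed as a band-pass trigger: it closes its circuit only when the soundtrack contains a tone near its resonant frequency. This is my own simplification, not Disney's actual circuit, and the frequencies are illustrative, not historical.]

```python
# Simplified model of the Tiki Room's sound-triggered reed switches.
# A reed vibrates (closing its circuit) only when a played tone falls
# close enough to its resonant frequency, which in the real mechanism
# depended on the reed's thickness and length.

def reed_closed(tone_hz, resonant_hz, bandwidth_hz=50.0):
    """True if the tone is close enough to resonance to vibrate the reed."""
    return abs(tone_hz - resonant_hz) <= bandwidth_hz

def valve_powered(soundtrack_tones_hz, resonant_hz):
    """The pneumatic valve gets power if any tone in the mix excites the reed."""
    return any(reed_closed(t, resonant_hz) for t in soundtrack_tones_hz)

# A bird whose beak valve is keyed to a 2000 Hz reed responds to its own
# chirp, while a flower keyed to 500 Hz ignores that same chirp.
chirp = [1980.0, 3100.0]
print(valve_powered(chirp, resonant_hz=2000.0))  # True: beak valve opens
print(valve_powered(chirp, resonant_hz=500.0))   # False: flower stays still
```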
Speaker 1: Over the years, Disney would improve on this design, sometimes by necessity. So, for example, when the Imagineers set out to create the attraction Great Moments with Mr. Lincoln, they had to come up with new mechanisms to do that, because pneumatics would not be a good solution. With pneumatics, you've got a couple of limitations that you're working with. One is that you can't move really heavy stuff effectively with pneumatics. Another is that pneumatic pistons tend to move really fast. It's hard to do controlled, slow movements with pneumatics. So it might be okay for something like a bird flapping its wings or opening and closing its beak fairly quickly, but it's not so great for, say, a revered US president lifting his hand. But I've covered that in other episodes. The really important thing I want to stress is that audio-animatronic figures have historically been limited to a specific, pre-programmed sequence of motions, so calling them puppets is fairly appropriate. These are figures that will do the exact same sequence of motions until something goes wrong or the attraction is shut off for some reason.
Speaker 1: The pirate in Pirates of the Caribbean that is precariously attempting to step onto a rowboat is never going to fall into the water. He's never going to get into the boat, and he's never gonna step back onto the shore. He will continue his balancing act until the end of time. And this is starting to sound like some sort of Greek myth about the afterlife at this point. Now, the reason I'm bringing this up, the reason it's important, is that an animatronic figure that can actually detect an onlooker's gaze and return it, making eye contact, can't be totally dedicated to following the same set of motions on repeat. There has to be some room for variability within it. At the same time, Disney's whole gig is to create a show. The amusement parks are show business. If you are in a public space of one of those parks, like you're inside the confines of the park itself, walking down Main Street or whatever, you are on stage. The employees are called cast members, and shows, while they can have some variation in them, are supposed to follow a general flow. They follow a script.
Speaker 1: And so the Imagineers were working on creating a figure that would follow a scripted set of behaviors, but would have the freedom to throw in stuff like eye contact now and then. The figure, in a way, would be able to improvise. It's jazz, baby. The tune is more or less set, but how you go through it allows for a lot of variation. For the purposes of this work, the team relied on an animatronic bust. Now, we've kind of dropped the "audio" at this point. Modern animatronic figures are not really driven by audio signals anymore. They're driven by circuitry and sophisticated computer systems and programs. Though, to be fair, they still often are referred to as audio-animatronics. But you really need to see a picture of this thing. I'll do my best to describe it, but really you should search this: Disney interactive gaze animatronic. Because, whoo boy. So imagine the V-shaped torso of a bust sculpture, right? It's very narrow at the bottom, and it widens up to the shoulders. It's clad in a white button-up shirt, you know, kind of like an Oxford shirt or business shirt.
Speaker 1: It does have shoulders, but it does not have arms. It has a head. Good golly, it has a head. The head of this figure has a sort of plastic skull, though it's kind of more like a plastic mask than a human skull. It doesn't look like a skeleton skull. It does have eyes, it's even got eyelids, and it's got teeth. And looking at this thing is a little unsettling. And that's before it even makes eye contact with you. Now, why would you want to make something like this be able to make eye contact in the first place? Well, eye contact is an important social signal. It shows mutual acknowledgement, and it can lead us to projecting certain things upon the person or animal that's making eye contact with us. We tend to perceive such creatures as possessing a certain amount of intelligence and sincerity. For example, when I make eye contact with my dog Tybalt, I perceive him to be intelligent and alert and loving. Now, I have no way of knowing what is really going on in his doggy mind.
Speaker 1: I suspect it's probably more along the lines of, "Is the bald man about to give me a treat? I should pay attention." But I like to think of it as sincere love. Now, as the paper states, quote, "Given the importance of gaze in social interactions, as well as its ability to communicate states and shape perceptions, it is apparent that gaze can function as a significant tool for an interactive robot character," end quote. And I can totally grok that. I imagine what it might be like for a child who's going to Disney World or Disneyland for the very first time and going to a ride or an attraction where there's an animatronic figure, perhaps one that looks like a famous Disney character, and it makes eye contact with that child. Maybe it even speaks to the child, and maybe it can respond to the child if the child speaks back.
Speaker 1: That sort of interaction would have been the kind of stuff that would have stuck with me as a kid well into adulthood, and I feel confident about that because I have a lot of memories of the seemingly magical moments I've experienced at Disney with the far more primitive technologies that were in the Disney parks when I first started visiting them in the nineteen seventies. So I can certainly see the show need for this sort of development. But there are numerous challenges that stand in the way of achieving this goal, and they fall into different broad categories. Perhaps the easiest set of challenges to conquer is actually the electromechanical side of things. That is, the actual mechanisms that you're going to use to create these effects: the servos and the motors and the other components that will create the actual motions that will translate into the robot making eye contact or behaving in otherwise realistic ways. That's one set of challenges, but there are others. One is giving the robot the ability to detect the gaze of onlookers in the first place.
Speaker 1: There has to be some sort of face recognition and maybe even eye-tracking technology so that the robot looks at the right spot. So the electromechanical parts have to work correctly, but so does the robot vision, or perception. Otherwise the robot is going to look in the wrong spot, perhaps staring off to one side, or above or below an onlooker's eye contact, or attempt at eye contact. Another challenge would be on the programming side. You have to figure out how to determine who the figure is going to look at. You also have to figure out how long the robot will look at somebody, and what could distract the robot, and whether or not the robot would return to looking at, you know, the first person, or maybe look at a second person, or maybe look at something else entirely. You have to solve the challenge of the program and prioritize the order of operations so that the robot behaves in a way that makes sense, as opposed to a robot that's just, you know, reacting to all visual stimuli in a random way, which would be at the very least disconcerting.
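[Editor's note: one way to picture that programming challenge is as a scoring problem: each person the cameras detect gets a priority, and the figure looks at whoever scores highest instead of reacting randomly. This is a hypothetical sketch; the weights, field names, and scoring are my own illustration, not the attention engine described in the paper.]

```python
# Hypothetical gaze-target selection: score each detected onlooker and
# direct the figure's gaze at the highest-scoring one.

def score(person):
    s = 0.0
    if person["looking_at_robot"]:
        s += 2.0                        # mutual gaze is the strongest draw
    s += 1.0 / person["distance_m"]     # closer onlookers rank higher
    if person["is_new"]:
        s += 0.5                        # novelty can briefly steal attention
    return s

def choose_gaze_target(people):
    """Return the person to look at, or None if nobody is in view."""
    return max(people, key=score, default=None)

people = [
    {"id": "A", "looking_at_robot": True,  "distance_m": 2.0, "is_new": False},
    {"id": "B", "looking_at_robot": False, "distance_m": 1.0, "is_new": True},
]
# Person A scores 2.0 + 0.5 = 2.5; person B scores 1.0 + 0.5 = 1.5.
print(choose_gaze_target(people)["id"])  # A
```

Re-running a selection like this every control tick is also what lets the figure get "distracted" and then return: when the scores change, so does the target.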
Speaker 1: And then we get to something that's a bit harder to define than degrees of freedom or range of motion or the hierarchy of programming, and that's human psychology. Now, as the paper points out, eye contact is an important social cue for most of us, but there are a whole range of humans out there, right? For people who have autism, eye contact can be a really challenging task, and it tends to make the lives of people who have this type of autism a little more difficult or complicated as a result. It's something that people, some people anyway, have to consciously deal with. They have to remember to do this and work at it. It's not a natural behavior for them. So this is something that can be tricky for human beings, let alone for robots. Now, while eye contact can help create a sense of sincerity and interest, it can also shift over into more unpleasant territory, such as a sense of predatory intent. Or, as a comedian I once saw said, there's a fine line between the casual eye contact of a friend and the cold stare of a serial killer.
Speaker 1: He was specifically talking about trying to navigate the tricky territory of approaching people in order to get to know them, but I think the meaning could be used for lots of scenarios, including an encounter with a robotic figure. And along with that is the issue of the uncanny valley, which I have touched on in previous episodes. I'm not sure if I've ever actually talked about the origin of the phrase, however. A professor at the Tokyo Institute of Technology named Masahiro Mori coined this phrase in the nineteen seventies to describe a pretty odd phenomenon. As robots become more humanlike, or more lifelike in general, they become more appealing to us, but only up to a point. And once they get to that point and go beyond it, our reception of these robots plunges into the uncanny valley. The valley in this case is how humans react to the robot. This also applies to other stuff, like CGI characters, for instance. In other words, a robot that might be a simple industrial arm is one we probably wouldn't feel very much affinity for. You know, it's obviously a machine.
Speaker 1: A robot that still looks really robotic but has, you know, arms and legs, like a vaguely humanoid shape, we would probably feel a little more affinity towards. Make it look a little bit more human, but, you know, not to the point where anyone would mistake it for being human, and we might like it even more. But once you start getting close to, but not quite, human in appearance and behavior, our response drops to a point where a lot of people feel unsettled, or they might even feel revulsion when looking at the figure. Something is, you know, not right. The cues that would normally help us identify with the synthetic figure now feel strange and maybe even scary. It's possible to get beyond the uncanny valley, to create a robot or CGI character that doesn't initiate this kind of instant revulsion, but it is very hard to do so.
Speaker 1: A big challenge is building an animatronic that doesn't trigger the uncanny valley response, either by avoiding the trap of being almost but not quite human in behavior, you know, by keeping things a bit more obviously robotic, so there's that clear and distinct separation that kind of removes that response we have, or by creating something lifelike enough that we feel the same sort of reactions we would experience if that were a real human. So it's tough to do. It's easier to do the robot approach than it is to get something that seems human enough that we let our guard down. None of these challenges are trivial, and they all require distinct approaches that must ultimately converge into a single implementation. When we come back, I'll talk about some of the technologies in this animatronic figure and the engineering team's philosophy behind their design choices. But first let's take a quick break.

Speaker 1: The engineering team limited itself to parameters that related to creating a robot that could direct its gaze towards onlookers, which meant they didn't have to worry about it doing literally anything else.
Speaker 1: The audio-animatronic bust they used has nineteen degrees of freedom total, but the team made no use of ten of those. They only used nine degrees of freedom. They focused on the neck, which has three degrees of freedom; the eyelids, which have two degrees of freedom; the eyes, which also have two; and the eyebrows, which have two degrees of freedom. The unused degrees of freedom are for moving the jaw and the lips of the figure, but since that's not necessary to make eye contact, the team just ignored those. They didn't need to mess with them. Which means we get the effect of a robotic skull with an unchanging rictus grin staring at us as its upper facial area remains animated. I guess what I'm saying is I didn't find the overall effect particularly comforting. According to the paper, the commands going to these components come from a quote "custom proprietary software stack operating on a one-hundred-hertz real-time loop" end quote. A hertz is a cycle per second, so this means that the software is pulsing out operations one hundred times every second to control this animatronic bust.
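[Editor's note: to give a feel for what a fixed-rate control loop looks like, here is a generic 100 Hz loop that issues one command per tick and sleeps away the rest of each 10-millisecond period. This is a standard pattern, not Disney's proprietary stack; the `send_command` callback is a placeholder for whatever drives the servos.]

```python
import time

def run_control_loop(send_command, rate_hz=100, ticks=10):
    """Call send_command once per period at a fixed rate.

    Sleeping until an absolute deadline (rather than for a fixed amount)
    keeps the loop from drifting when send_command takes variable time.
    """
    period = 1.0 / rate_hz
    next_deadline = time.monotonic()
    for _ in range(ticks):
        send_command()
        next_deadline += period
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)

commands_sent = []
run_control_loop(lambda: commands_sent.append("update_servos"), ticks=5)
print(len(commands_sent))  # 5 commands, spaced roughly 10 ms apart
```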
Many of those commands aren't only 321 00:19:31,440 --> 00:19:34,800 Speaker 1: about making the bust do something specific, but to do 322 00:19:34,960 --> 00:19:39,399 Speaker 1: it in a specific way. Let's get back to the 323 00:19:39,440 --> 00:19:43,119 Speaker 1: Tiki birds as an example. The pneumatic valve that would 324 00:19:43,119 --> 00:19:45,840 Speaker 1: control whether or not pressurized air could travel to a 325 00:19:45,920 --> 00:19:49,920 Speaker 1: specific place like the mechanism that operates a bird's beak 326 00:19:50,480 --> 00:19:52,920 Speaker 1: is a pretty simple on or off switch, meaning the 327 00:19:53,000 --> 00:19:55,399 Speaker 1: valve is either open, in which case air can flow, 328 00:19:56,000 --> 00:19:58,199 Speaker 1: or it's closed, in which case the air is blocked 329 00:19:58,200 --> 00:20:01,760 Speaker 1: from flowing through and actuating the mechanism. So the 330 00:20:01,800 --> 00:20:05,000 Speaker 1: beak has a natural resting position, and for this example, 331 00:20:05,080 --> 00:20:08,720 Speaker 1: we'll just assume that the rest position is a closed beak, 332 00:20:09,600 --> 00:20:12,119 Speaker 1: and so that's what the beak will always return to 333 00:20:12,320 --> 00:20:16,080 Speaker 1: when there's no air flowing to the mechanism that opens 334 00:20:16,119 --> 00:20:19,040 Speaker 1: the beak. If we open the valve, it lets air through. 335 00:20:19,280 --> 00:20:21,399 Speaker 1: It rushes to the end point, forces the beak to 336 00:20:21,600 --> 00:20:25,280 Speaker 1: open rapidly. Closing and opening the valve quickly forces the 337 00:20:25,280 --> 00:20:28,560 Speaker 1: bird's beak to open and close quickly, and when matched 338 00:20:28,560 --> 00:20:31,080 Speaker 1: with a soundtrack, it looks as though the bird is 339 00:20:31,119 --> 00:20:34,240 Speaker 1: speaking or singing, or you know, whatever it's doing.
But 340 00:20:34,320 --> 00:20:37,080 Speaker 1: that movement is rapid and, just as I mentioned earlier, 341 00:20:37,160 --> 00:20:41,919 Speaker 1: not suitable for all animatronic applications. Having life sized humanoids 342 00:20:41,960 --> 00:20:45,080 Speaker 1: move with that kind of alarming speed would be scary 343 00:20:45,119 --> 00:20:49,040 Speaker 1: and legitimately dangerous. The greater mass of the figures would 344 00:20:49,080 --> 00:20:51,800 Speaker 1: mean you're dealing with larger amounts of inertia. I mean, 345 00:20:51,840 --> 00:20:54,400 Speaker 1: just imagine what it would look like if Mr Lincoln, 346 00:20:54,480 --> 00:20:56,760 Speaker 1: in an effort to raise his hand in a gentle 347 00:20:56,800 --> 00:21:01,400 Speaker 1: show of reserved determination, instead violently karate chopped his own 348 00:21:01,440 --> 00:21:05,159 Speaker 1: head off. It would be, as the kids say, a 349 00:21:05,240 --> 00:21:10,040 Speaker 1: bad look. To create the illusion of life, the animatronics 350 00:21:10,080 --> 00:21:14,480 Speaker 1: that Disney designs follow certain general strategies. One is called 351 00:21:14,640 --> 00:21:18,640 Speaker 1: slow in and slow out. Now. This refers to general 352 00:21:18,680 --> 00:21:22,280 Speaker 1: movements, and the idea is that any movement should start off 353 00:21:22,400 --> 00:21:26,240 Speaker 1: slowly and then pick up speed as the movement continues, 354 00:21:26,800 --> 00:21:30,080 Speaker 1: and then slow down again before coming to a stop. 355 00:21:30,440 --> 00:21:32,879 Speaker 1: And it makes the motions appear more fluid, and it 356 00:21:32,880 --> 00:21:35,320 Speaker 1: has the added benefit of not being quite so harsh 357 00:21:35,359 --> 00:21:38,680 Speaker 1: on the figures themselves.
So when a Disney figure raises 358 00:21:38,800 --> 00:21:41,720 Speaker 1: its hand, the hand should start off moving upward with 359 00:21:41,760 --> 00:21:45,399 Speaker 1: a nice, smooth slow motion, pick up a bit of 360 00:21:45,440 --> 00:21:48,960 Speaker 1: speed as it's moving upward, and then slow down again 361 00:21:49,000 --> 00:21:52,199 Speaker 1: as it's approaching its end point. And this means that 362 00:21:52,280 --> 00:21:55,440 Speaker 1: the underlying motors and mechanical systems have to be capable 363 00:21:55,560 --> 00:21:59,240 Speaker 1: of achieving the strategy. It's why you can't use pneumatic systems 364 00:21:59,240 --> 00:22:02,320 Speaker 1: that are those simple single speed devices that are 365 00:22:02,320 --> 00:22:06,080 Speaker 1: either on or off, like the Tiki birds. Oh, and 366 00:22:06,119 --> 00:22:08,320 Speaker 1: I guess I should specify I'm talking in this case 367 00:22:08,320 --> 00:22:11,639 Speaker 1: about the original Tiki birds because the birds in the 368 00:22:11,680 --> 00:22:15,600 Speaker 1: attractions today work on updated and more sophisticated computer systems 369 00:22:15,600 --> 00:22:17,760 Speaker 1: that take up a fraction of a fraction of the 370 00:22:17,800 --> 00:22:21,960 Speaker 1: space of the old attraction, which essentially required an entire 371 00:22:22,119 --> 00:22:24,920 Speaker 1: room filled with cables and tubes to make everything work 372 00:22:25,040 --> 00:22:30,240 Speaker 1: underneath the actual attraction itself. Now a few computers handle 373 00:22:30,280 --> 00:22:35,359 Speaker 1: the whole shebang. Anyway, let's get back to animatronics.
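Slow in and slow out maps directly onto what programmers call easing. Here's a minimal sketch using the classic smoothstep curve (the episode doesn't say which easing function Disney actually uses, so treat this as one plausible choice):

```python
def slow_in_slow_out(t):
    """Cubic ease: starts slow, speeds up mid-motion, slows before
    stopping. t is normalized time in [0, 1]; returns a normalized
    position in [0, 1]. This is the classic smoothstep polynomial."""
    t = min(max(t, 0.0), 1.0)          # clamp to the motion's duration
    return t * t * (3.0 - 2.0 * t)

def interpolate(start, end, t):
    """Position along a movement from start to end at normalized time t,
    following the slow-in/slow-out profile."""
    return start + (end - start) * slow_in_slow_out(t)
```

Sampled at one hundred times a second, this gives the hand-raise described above: small steps near the start and end of the motion, larger steps in the middle.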
Some 374 00:22:35,520 --> 00:22:39,080 Speaker 1: of the other guiding principles in animatronic motion that in 375 00:22:39,160 --> 00:22:42,240 Speaker 1: turn dictate the types of motors and joints and other 376 00:22:42,280 --> 00:22:45,680 Speaker 1: mechanical elements that the team must use to make 377 00:22:45,760 --> 00:22:50,000 Speaker 1: these happen include designing motions as arcs, meaning the motion 378 00:22:50,040 --> 00:22:54,560 Speaker 1: should follow an arced trajectory. Another is that the motions 379 00:22:54,680 --> 00:22:58,960 Speaker 1: should have overlap, meaning a robot shouldn't move a single 380 00:22:59,040 --> 00:23:03,320 Speaker 1: element like an arm, stop, then go to move on 381 00:23:03,400 --> 00:23:07,840 Speaker 1: the next element like the head position, and then stop 382 00:23:07,880 --> 00:23:12,160 Speaker 1: and so on, because that would be, well, really robotic. Instead, 383 00:23:12,200 --> 00:23:16,040 Speaker 1: the robot's motions should overlap with one another, so that, 384 00:23:16,359 --> 00:23:18,879 Speaker 1: let's say, Mr. Lincoln is turning his head at the 385 00:23:18,920 --> 00:23:22,320 Speaker 1: same time his arm is going up in determination. Now, 386 00:23:22,400 --> 00:23:26,040 Speaker 1: another element that's connected to this concept is that of drag, 387 00:23:26,480 --> 00:23:29,040 Speaker 1: which means that the different body parts are moving at 388 00:23:29,119 --> 00:23:31,960 Speaker 1: different frequencies or timing. They're not moving all at the 389 00:23:32,000 --> 00:23:35,000 Speaker 1: same speed. So, in other words, the speed at which Mr. 390 00:23:35,040 --> 00:23:38,399 Speaker 1: Lincoln turns his head might be slightly faster or slower 391 00:23:38,440 --> 00:23:41,280 Speaker 1: than the speed at which his arm goes up.
This 392 00:23:41,359 --> 00:23:44,560 Speaker 1: is all in an effort to create the illusion of life, 393 00:23:44,640 --> 00:23:47,960 Speaker 1: but it also means that the programming and hardware underlying 394 00:23:48,000 --> 00:23:51,840 Speaker 1: the figure has to support those strategies. For the purposes 395 00:23:51,880 --> 00:23:54,919 Speaker 1: of this project, the engineers had certain motions they wanted 396 00:23:54,960 --> 00:23:58,000 Speaker 1: to be included. One minimum set of motions needed were 397 00:23:58,080 --> 00:24:02,360 Speaker 1: some that would imply that the bust was a breathing entity, 398 00:24:02,400 --> 00:24:04,920 Speaker 1: so it needed to move slightly as if it were 399 00:24:05,040 --> 00:24:08,960 Speaker 1: drawing breath. Blinking was also an important motion to get down, 400 00:24:09,080 --> 00:24:11,359 Speaker 1: as it would be more than a little unnerving to 401 00:24:11,359 --> 00:24:14,440 Speaker 1: have an animatronic figure make eye contact with you and 402 00:24:14,480 --> 00:24:19,600 Speaker 1: then never ever blink. And then there were the saccades. 403 00:24:20,440 --> 00:24:23,040 Speaker 1: Now I have to confess something to you, guys. When 404 00:24:23,040 --> 00:24:26,639 Speaker 1: I first encountered the word saccades, which is S A 405 00:24:26,960 --> 00:24:30,920 Speaker 1: C C A D E S, I had no idea 406 00:24:30,960 --> 00:24:33,239 Speaker 1: what that meant. It was a new word to me, 407 00:24:33,840 --> 00:24:35,560 Speaker 1: and maybe it's a new word for some of you 408 00:24:35,640 --> 00:24:38,760 Speaker 1: out there too. So if you happen to be like me, 409 00:24:39,160 --> 00:24:42,720 Speaker 1: what the heck are saccades? Well, that refers to the quick, 410 00:24:43,000 --> 00:24:47,200 Speaker 1: simultaneous movement of both eyes from one point of focus 411 00:24:47,280 --> 00:24:50,240 Speaker 1: to another.
So think about how you might take in 412 00:24:50,320 --> 00:24:52,719 Speaker 1: a scene that has a lot of stuff going on. 413 00:24:52,800 --> 00:24:57,640 Speaker 1: Let's say you walk up to a building that's 414 00:24:57,640 --> 00:25:00,920 Speaker 1: burning. Well, your eyes are going to dart 415 00:25:01,080 --> 00:25:03,679 Speaker 1: at different things that are going on in front of 416 00:25:03,720 --> 00:25:06,640 Speaker 1: you that catch your attention as you focus on them, 417 00:25:06,640 --> 00:25:09,000 Speaker 1: and then you file that information away. And perhaps you're 418 00:25:09,000 --> 00:25:13,320 Speaker 1: even doing this subconsciously. It means our gaze is 419 00:25:13,359 --> 00:25:17,280 Speaker 1: not always steady and unwavering. It moves around a 420 00:25:17,280 --> 00:25:19,840 Speaker 1: bit on occasion. And that's not the only way we 421 00:25:19,880 --> 00:25:22,159 Speaker 1: move our eyes. Of course, we can actually track things 422 00:25:22,200 --> 00:25:25,320 Speaker 1: that are moving and use our eyes to move in 423 00:25:25,400 --> 00:25:28,720 Speaker 1: a more smooth and gradual motion. But the team knew 424 00:25:28,720 --> 00:25:31,080 Speaker 1: that if they could incorporate saccades, that would give 425 00:25:31,119 --> 00:25:35,320 Speaker 1: the robot a more lifelike performance. But that decision meant 426 00:25:35,320 --> 00:25:37,560 Speaker 1: the team needed to figure out something else, which was 427 00:25:37,600 --> 00:25:40,960 Speaker 1: where to put the cameras.
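The two eye-movement modes here, quick saccadic jumps and smooth tracking, can be illustrated with a toy controller. This is purely illustrative, not the paper's actual gaze controller; the threshold and speed values are invented:

```python
def update_gaze(gaze, target, dt, saccade_threshold=0.2, pursuit_speed=1.0):
    """Toy one-axis gaze update. A large error between where the eyes
    point (gaze) and where they should point (target) triggers a
    saccade, modeled here as an instantaneous jump. A small error is
    closed with slow, rate-limited smooth pursuit instead."""
    error = target - gaze
    if abs(error) > saccade_threshold:
        return target                        # saccade: snap to the new fixation
    max_step = pursuit_speed * dt            # pursuit: move a little each tick
    step = max(-max_step, min(max_step, error))
    return gaze + step
```

Called once per control cycle, this produces the darting-then-settling quality described above: big attention shifts happen in one jump, while a slowly moving target is followed gradually.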
The animatronic needs its own 428 00:25:41,119 --> 00:25:44,480 Speaker 1: vision to be able to detect onlookers and then direct 429 00:25:44,520 --> 00:25:49,080 Speaker 1: its own gaze appropriately, and some robots do put cameras 430 00:25:49,080 --> 00:25:52,000 Speaker 1: in the eyes of the robot so that the eyes 431 00:25:52,040 --> 00:25:55,520 Speaker 1: are actually camera lenses, but that presents a challenge if 432 00:25:55,560 --> 00:25:58,760 Speaker 1: you wish to incorporate rapid eye movement like saccades, 433 00:25:58,800 --> 00:26:01,720 Speaker 1: because that sort of movement introduces motion blur in the 434 00:26:01,840 --> 00:26:04,679 Speaker 1: video imagery, making it more challenging for the robot to 435 00:26:04,720 --> 00:26:06,520 Speaker 1: keep track of what's going on in front of it. 436 00:26:06,920 --> 00:26:09,600 Speaker 1: For that reason, the team decided that the cameras would 437 00:26:09,600 --> 00:26:13,360 Speaker 1: not be mounted in the eyes, but rather were 438 00:26:13,359 --> 00:26:18,640 Speaker 1: mounted on the animatronic's chest. Presumably, should the gaze tracking 439 00:26:18,720 --> 00:26:22,040 Speaker 1: technology find its way into full animatronic figures in the future, 440 00:26:22,440 --> 00:26:25,080 Speaker 1: the camera will be, you know, hidden within the body 441 00:26:25,160 --> 00:26:29,040 Speaker 1: of the animatronic torso in order to avoid this problem, 442 00:26:29,160 --> 00:26:33,160 Speaker 1: or otherwise maybe mounted in an unobtrusive spot.
One thing 443 00:26:33,160 --> 00:26:36,320 Speaker 1: that interests me with this particular approach is that the 444 00:26:36,359 --> 00:26:39,639 Speaker 1: system has to do some calculations as to where the 445 00:26:39,720 --> 00:26:43,040 Speaker 1: eyes of the animatronic are in relation to the physical 446 00:26:43,080 --> 00:26:46,680 Speaker 1: location of the cameras, you know, because for all of us, 447 00:26:46,680 --> 00:26:50,440 Speaker 1: our eyes are essentially the cameras, or at least the 448 00:26:50,480 --> 00:26:53,920 Speaker 1: camera lenses, so we don't have to make any adjustments. 449 00:26:54,000 --> 00:26:57,280 Speaker 1: Right? Where we're looking, the point of our 450 00:26:57,320 --> 00:27:01,000 Speaker 1: gaze, is the point where we're taking in visual information. 451 00:27:01,480 --> 00:27:05,520 Speaker 1: For the animatronic, the eyes of the robot, the actual 452 00:27:05,640 --> 00:27:08,960 Speaker 1: eyes that are in the skull, don't function as eyes. 453 00:27:09,520 --> 00:27:14,359 Speaker 1: They aren't lenses. They're actually several inches above the actual camera. 454 00:27:15,000 --> 00:27:18,439 Speaker 1: And yet the eyes in the robot's head need to 455 00:27:18,480 --> 00:27:20,439 Speaker 1: point in the right direction. They need to be the 456 00:27:20,520 --> 00:27:23,760 Speaker 1: part that's pointed at the person who's looking at it. Right, 457 00:27:23,800 --> 00:27:26,359 Speaker 1: it doesn't make sense for the robot to just turn 458 00:27:26,440 --> 00:27:29,480 Speaker 1: its sternum towards you. It needs to be looking at 459 00:27:29,520 --> 00:27:33,040 Speaker 1: you with its robot eyes.
And I think of this 460 00:27:33,200 --> 00:27:36,679 Speaker 1: kind of like someone who's working a hand puppet and 461 00:27:36,720 --> 00:27:39,760 Speaker 1: they've got the hand puppet up over their head, so 462 00:27:40,000 --> 00:27:42,359 Speaker 1: maybe they're behind a little stage, you know, like 463 00:27:42,480 --> 00:27:45,719 Speaker 1: the muppets tend to be. You've got this hand puppet 464 00:27:45,720 --> 00:27:49,480 Speaker 1: and it needs to make eye contact with a human being. Well, 465 00:27:49,600 --> 00:27:52,000 Speaker 1: that just means the puppeteer has to take that into 466 00:27:52,040 --> 00:27:56,919 Speaker 1: account and angle their hand so that the puppet's eyes 467 00:27:57,119 --> 00:28:00,359 Speaker 1: appear to be locking on the eyes of the real 468 00:28:00,440 --> 00:28:04,440 Speaker 1: person that the muppet or puppet is interacting with. It's 469 00:28:04,440 --> 00:28:07,680 Speaker 1: a little tricky. It requires some skill. For the robot, 470 00:28:07,800 --> 00:28:10,600 Speaker 1: it means that there's some, you know, nifty geometry going 471 00:28:10,640 --> 00:28:13,960 Speaker 1: on in the processor side to make this work out. 472 00:28:14,040 --> 00:28:17,320 Speaker 1: Like the image recognition has to identify where the eyes 473 00:28:17,400 --> 00:28:22,560 Speaker 1: of the onlooker are and then calculate where the robot's 474 00:28:22,560 --> 00:28:25,840 Speaker 1: eyes are in relation to that and direct them in 475 00:28:25,880 --> 00:28:29,359 Speaker 1: the right way, which to me is really fascinating because again, 476 00:28:29,720 --> 00:28:32,359 Speaker 1: the eyes of the robot are not where the visual 477 00:28:32,359 --> 00:28:36,399 Speaker 1: information is actually coming in.
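That nifty geometry boils down to aiming from the eye position rather than from the camera position. Here's a small sketch with made-up coordinate conventions (x right, y up, z forward, all measured in the chest camera's frame; the paper's actual math and frames may differ):

```python
import math

def eye_gaze_angles(face_in_camera, eye_offset):
    """Compute where the eyes should point at a face detected by a
    camera mounted elsewhere on the body. face_in_camera is the face
    position (x, y, z) in the camera frame, in meters; eye_offset is
    the eye position in that same frame. Returns (yaw, pitch) in
    radians for the eye mechanism."""
    # Vector from the eyes (not the camera) to the detected face:
    dx = face_in_camera[0] - eye_offset[0]
    dy = face_in_camera[1] - eye_offset[1]
    dz = face_in_camera[2] - eye_offset[2]
    yaw = math.atan2(dx, dz)                    # left/right rotation
    pitch = math.atan2(dy, math.hypot(dx, dz))  # up/down rotation
    return yaw, pitch
```

If the eyes sit, say, fifteen centimeters above the chest camera, the correction matters most for faces close to the figure; for a face far away, the camera-to-eye offset becomes a negligible angle.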
We'll talk more about the 478 00:28:36,440 --> 00:28:39,240 Speaker 1: behaviors of this robot in a second, but since we're 479 00:28:39,240 --> 00:28:41,720 Speaker 1: already chatting about cameras, it's good to talk about what 480 00:28:41,800 --> 00:28:44,600 Speaker 1: the team was actually using to give the robot its vision. 481 00:28:45,040 --> 00:28:47,840 Speaker 1: They went with an off the shelf solution. They used a 482 00:28:47,920 --> 00:28:52,320 Speaker 1: camera called the Mint Eye D one thousand, and Mint 483 00:28:52,440 --> 00:28:56,760 Speaker 1: is spelled M Y N T. This particular camera has two 484 00:28:56,840 --> 00:29:00,400 Speaker 1: lenses in it for stereoscopic vision, and so together they 485 00:29:00,400 --> 00:29:03,240 Speaker 1: can create a stereo image, that is, an image with, 486 00:29:03,640 --> 00:29:05,640 Speaker 1: you know, kind of depth, like a three D 487 00:29:05,760 --> 00:29:09,160 Speaker 1: image, with a resolution of two thousand, five hundred sixty 488 00:29:09,160 --> 00:29:12,920 Speaker 1: by seven twenty pixels at sixty frames per second, so, 489 00:29:12,920 --> 00:29:16,480 Speaker 1: you know, this is video information. There's 490 00:29:16,480 --> 00:29:20,120 Speaker 1: also a depth map mode which uses infrared light to 491 00:29:20,160 --> 00:29:23,800 Speaker 1: help judge the depth of the things within its field 492 00:29:23,800 --> 00:29:26,600 Speaker 1: of view, like how close is one thing versus another 493 00:29:27,240 --> 00:29:30,160 Speaker 1: relative to the camera, and the depth map's resolution is 494 00:29:30,200 --> 00:29:33,160 Speaker 1: one thousand, two hundred eighty by seven twenty pixels 495 00:29:33,200 --> 00:29:36,200 Speaker 1: at sixty frames per second. As I mentioned, these two 496 00:29:36,320 --> 00:29:40,000 Speaker 1: lenses allow the camera to simulate human binocular vision.
So 497 00:29:40,040 --> 00:29:42,400 Speaker 1: just as we perceive depth in the world around us 498 00:29:42,520 --> 00:29:45,320 Speaker 1: using two eyes, you know, most of us, this 499 00:29:45,440 --> 00:29:48,480 Speaker 1: camera can do the same thing and judge which things 500 00:29:48,520 --> 00:29:51,360 Speaker 1: are in the foreground versus the background, what things are 501 00:29:51,520 --> 00:29:54,280 Speaker 1: closest to it versus furthest away, and make a better 502 00:29:54,320 --> 00:29:57,440 Speaker 1: determination of which things within its field of view are 503 00:29:57,480 --> 00:30:00,680 Speaker 1: worthy of attention, which will become important in a little bit. 504 00:30:01,240 --> 00:30:03,880 Speaker 1: The camera has a more limited field of view than 505 00:30:03,920 --> 00:30:07,640 Speaker 1: a typical human. It has about half the horizontal field 506 00:30:07,720 --> 00:30:10,760 Speaker 1: of view of a person, so its periphery is more narrow, 507 00:30:11,400 --> 00:30:13,959 Speaker 1: and it has a little more than a third the 508 00:30:14,080 --> 00:30:16,440 Speaker 1: vertical field of view, so it can't see as much 509 00:30:16,520 --> 00:30:19,840 Speaker 1: up and down as your typical person can. So any 510 00:30:19,920 --> 00:30:23,080 Speaker 1: future animatronic figure might need a more expansive field of 511 00:30:23,160 --> 00:30:26,000 Speaker 1: view to be able to interact with guests who could 512 00:30:26,080 --> 00:30:28,640 Speaker 1: range in height from very small to quite tall. I mean, 513 00:30:28,680 --> 00:30:31,680 Speaker 1: all sorts of people go to Disney.
So I do 514 00:30:31,760 --> 00:30:35,240 Speaker 1: see that as a potential limiting factor in the short run, 515 00:30:35,800 --> 00:30:38,920 Speaker 1: that any stereoscopic kind of camera would need to have 516 00:30:39,000 --> 00:30:43,480 Speaker 1: a pretty good field of view for a robot to 517 00:30:43,520 --> 00:30:49,080 Speaker 1: be able to interact properly with guests of different heights. Now, 518 00:30:49,120 --> 00:30:51,200 Speaker 1: I decided to see how much this camera would cost 519 00:30:51,240 --> 00:30:54,360 Speaker 1: for some normal schlub like myself, and the answer is 520 00:30:54,520 --> 00:30:56,840 Speaker 1: less than four hundred dollars. So this is actually a 521 00:30:56,840 --> 00:31:01,600 Speaker 1: pretty inexpensive solution all things considered. And again, 522 00:31:01,640 --> 00:31:05,520 Speaker 1: it's really more important for creating the basis for the 523 00:31:05,600 --> 00:31:08,400 Speaker 1: work as opposed to saying this is a final product. 524 00:31:08,960 --> 00:31:11,440 Speaker 1: And that's more or less the hardware side of things, 525 00:31:11,520 --> 00:31:13,680 Speaker 1: or at least as specific as I can get based 526 00:31:13,680 --> 00:31:16,840 Speaker 1: on the material available. Like, I don't know what 527 00:31:17,000 --> 00:31:19,880 Speaker 1: the power of their computer system was, you know, I 528 00:31:19,920 --> 00:31:23,120 Speaker 1: don't know the specific types of motors they were using 529 00:31:23,120 --> 00:31:26,720 Speaker 1: in the animatronic, but from a high level we understand 530 00:31:26,720 --> 00:31:30,200 Speaker 1: what's going on.
However, the real magic happens with the 531 00:31:30,240 --> 00:31:33,840 Speaker 1: system that gives this hardware its orders, and the team 532 00:31:33,880 --> 00:31:36,760 Speaker 1: made the conscious decision to create the illusion of life 533 00:31:37,120 --> 00:31:41,040 Speaker 1: rather than attempt to replicate human behaviors perfectly, which is 534 00:31:41,040 --> 00:31:43,400 Speaker 1: a bit of a challenging concept. You might think, well, 535 00:31:43,400 --> 00:31:46,160 Speaker 1: what's the difference? But I think I have a pretty 536 00:31:46,200 --> 00:31:50,640 Speaker 1: decent analogy. If you've ever gone to see a stage play, 537 00:31:51,000 --> 00:31:55,560 Speaker 1: then you've seen sets. Maybe the sets were really detailed, 538 00:31:56,040 --> 00:31:59,240 Speaker 1: maybe they were bare bones sets. But in any case, 539 00:31:59,280 --> 00:32:01,800 Speaker 1: the sets are meant to create the illusion of a 540 00:32:01,880 --> 00:32:04,800 Speaker 1: real place at a real moment of time. You know, 541 00:32:04,840 --> 00:32:07,000 Speaker 1: it could be a room in the eighteenth century in 542 00:32:07,040 --> 00:32:10,480 Speaker 1: a palatial estate, or it might be a 543 00:32:10,600 --> 00:32:14,560 Speaker 1: modern day real estate sales office if it's a modern play, 544 00:32:14,680 --> 00:32:18,600 Speaker 1: or maybe it's a campsite. In any case, the sets 545 00:32:18,600 --> 00:32:21,800 Speaker 1: and props are meant to convey the illusion of that 546 00:32:21,880 --> 00:32:24,920 Speaker 1: place and time, and if you were to actually get 547 00:32:25,000 --> 00:32:27,360 Speaker 1: up on stage and walk around, that illusion would very 548 00:32:27,440 --> 00:32:30,480 Speaker 1: quickly be broken.
But when you're sitting in the audience, 549 00:32:30,920 --> 00:32:33,680 Speaker 1: it's up to you to use your imagination to fill 550 00:32:33,720 --> 00:32:36,840 Speaker 1: in some of the gaps and suspend disbelief. It is 551 00:32:37,000 --> 00:32:41,360 Speaker 1: a show. Likewise, the engineers who worked on this project 552 00:32:41,520 --> 00:32:45,600 Speaker 1: talk about robot behaviors in terms of a show, and 553 00:32:45,640 --> 00:32:48,040 Speaker 1: that means that the robot needs to react and move 554 00:32:48,080 --> 00:32:50,840 Speaker 1: in ways that create the illusion of life, but it 555 00:32:50,880 --> 00:32:56,320 Speaker 1: does not necessarily need to adhere completely to human behaviors. 556 00:32:56,360 --> 00:33:00,000 Speaker 1: This makes things much more simple, particularly since it removes 557 00:33:00,000 --> 00:33:03,800 Speaker 1: tricky questions regarding what sets of behaviors are the 558 00:33:03,800 --> 00:33:07,400 Speaker 1: most human, because I'm sure you've noticed human beings and 559 00:33:07,520 --> 00:33:11,600 Speaker 1: human behavior occur across a really broad spectrum, and what 560 00:33:11,840 --> 00:33:15,240 Speaker 1: might be a typical set of behaviors for one person 561 00:33:15,560 --> 00:33:19,000 Speaker 1: could be completely alien to another person. So it's a 562 00:33:19,000 --> 00:33:23,120 Speaker 1: good idea to not try and define what sets of 563 00:33:23,160 --> 00:33:28,000 Speaker 1: behaviors are quintessentially human. When we come back, I'll talk 564 00:33:28,040 --> 00:33:31,320 Speaker 1: about how the team determined how the robot would actually 565 00:33:31,360 --> 00:33:35,400 Speaker 1: behave. It's pretty cool, but first let's take another quick break. 566 00:33:42,760 --> 00:33:46,440 Speaker 1: The team created an architecture to describe the relationship of 567 00:33:46,560 --> 00:33:51,280 Speaker 1: various elements to create the behavior of an interactive robotic gaze.
568 00:33:51,400 --> 00:33:56,120 Speaker 1: To create this robotic eye contact, the layers include the camera, 569 00:33:56,400 --> 00:33:59,280 Speaker 1: which is, you know, the point of perception from the robot, 570 00:33:59,800 --> 00:34:05,280 Speaker 1: a perception engine, and an attention engine, which determines 571 00:34:05,680 --> 00:34:08,560 Speaker 1: which things within the robot's perception are actually worthy of 572 00:34:08,680 --> 00:34:13,200 Speaker 1: attention or focus, a behavior selection engine and a library 573 00:34:13,280 --> 00:34:18,880 Speaker 1: of potential behaviors, and the audio animatronic figure's systems, its hardware; 574 00:34:18,960 --> 00:34:22,839 Speaker 1: the motor commands and motor states go to that. And 575 00:34:23,080 --> 00:34:26,120 Speaker 1: those are the layers in order from top to bottom. These 576 00:34:26,200 --> 00:34:29,320 Speaker 1: layers explain the relationship of each element in sort of 577 00:34:29,320 --> 00:34:32,600 Speaker 1: an abstract way, allowing us to understand how the robot 578 00:34:32,680 --> 00:34:36,640 Speaker 1: processes and reacts to information. So the perception engine is 579 00:34:36,680 --> 00:34:40,759 Speaker 1: designed to identify potential elements within the robotic vision, you know, 580 00:34:40,800 --> 00:34:44,200 Speaker 1: separating things out from say just a static background, and 581 00:34:44,239 --> 00:34:47,600 Speaker 1: the attention engine attempts to identify things within the robot's 582 00:34:47,680 --> 00:34:51,560 Speaker 1: vision that merit focus. The attention engine generates what the 583 00:34:51,560 --> 00:34:56,040 Speaker 1: team calls a curiosity score. So if that curiosity score 584 00:34:56,160 --> 00:35:00,600 Speaker 1: is below a certain threshold, the robot won't quote unquote 585 00:35:00,640 --> 00:35:03,719 Speaker 1: notice something within its field of view.
It's not 586 00:35:03,880 --> 00:35:07,960 Speaker 1: enough to capture its attention. Certain actions, such as, you know, 587 00:35:08,000 --> 00:35:12,040 Speaker 1: waving at the robot, merit a higher curiosity score. So 588 00:35:12,080 --> 00:35:15,640 Speaker 1: if the score ends up being above the curiosity score threshold, 589 00:35:15,960 --> 00:35:18,920 Speaker 1: the robot will look toward whatever it was that, you know, 590 00:35:19,200 --> 00:35:22,560 Speaker 1: quote unquote got its attention. The team decided it would 591 00:35:22,560 --> 00:35:25,680 Speaker 1: be helpful to create a sort of scenario to work with, 592 00:35:25,760 --> 00:35:28,560 Speaker 1: not just have, you know, a robot randomly looking around. 593 00:35:28,600 --> 00:35:32,800 Speaker 1: So their approach was to simulate an elderly man reading 594 00:35:32,880 --> 00:35:36,720 Speaker 1: something like a newspaper or a book. Most of the time, 595 00:35:36,960 --> 00:35:39,239 Speaker 1: the robot would be looking downward a bit, you know, 596 00:35:39,280 --> 00:35:41,839 Speaker 1: its head tilted down a little, as if it were 597 00:35:41,840 --> 00:35:44,879 Speaker 1: reading something that was held more or less at torso level. 598 00:35:45,600 --> 00:35:48,960 Speaker 1: If something moves into the robot's field of view, the 599 00:35:49,080 --> 00:35:51,600 Speaker 1: robot could glance up quickly, just as a human would, 600 00:35:51,600 --> 00:35:54,640 Speaker 1: to assess what's going on, and if whatever is within 601 00:35:54,719 --> 00:35:57,839 Speaker 1: the field of view creates a curiosity score lower than 602 00:35:58,000 --> 00:36:01,319 Speaker 1: what the threshold is, then the robot just goes back 603 00:36:01,360 --> 00:36:04,840 Speaker 1: to reading.
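The curiosity-score-and-threshold idea can be sketched as a tiny scoring function. The features and weights below are invented placeholders for illustration, not values from the paper:

```python
def select_focus(stimuli, curiosity_threshold=0.5):
    """Toy attention engine: score each stimulus in the scene and return
    the one most worth looking at, or None if nothing crosses the
    curiosity threshold (in which case the robot goes back to reading).
    The feature names and weights here are made up for illustration."""
    def curiosity(s):
        score = 0.0
        score += 0.6 if s.get("waving") else 0.0        # motion aimed at robot
        score += 0.3 if s.get("close") else 0.0         # proximity from depth map
        score += 0.2 if s.get("facing_robot") else 0.0  # face turned this way
        return score

    scored = [(curiosity(s), s) for s in stimuli]
    best_score, best = max(scored, key=lambda p: p[0], default=(0.0, None))
    return best if best_score >= curiosity_threshold else None
```

The threshold is what keeps the figure from whipping its head toward every passerby: unremarkable activity scores low and never quote unquote registers.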
If whatever is going on is above that 604 00:36:04,960 --> 00:36:09,560 Speaker 1: curiosity score threshold, the robot might look directly at whatever 605 00:36:09,600 --> 00:36:12,279 Speaker 1: it is that's happening, and then things could progress from there. 606 00:36:12,920 --> 00:36:16,520 Speaker 1: That's where the behavior selection engine and behavior library come 607 00:36:16,560 --> 00:36:19,319 Speaker 1: into play. There are a few possible reactions, and the 608 00:36:19,360 --> 00:36:22,880 Speaker 1: robot will choose one depending on several factors. For example, 609 00:36:23,120 --> 00:36:26,880 Speaker 1: one such factor was familiarity. The robot would behave differently 610 00:36:26,880 --> 00:36:30,800 Speaker 1: toward people it quote unquote recognized. It also wouldn't switch 611 00:36:30,920 --> 00:36:34,560 Speaker 1: focus every time someone tried to wave it down. So 612 00:36:34,719 --> 00:36:37,080 Speaker 1: if you were to distract the robot, it might look 613 00:36:37,120 --> 00:36:39,799 Speaker 1: away from whatever it was looking at before and then 614 00:36:39,840 --> 00:36:42,440 Speaker 1: look to you once. Then it might look back at 615 00:36:42,480 --> 00:36:45,479 Speaker 1: someone it quote unquote knows, and if you were to wave 616 00:36:45,480 --> 00:36:48,880 Speaker 1: at it again, you wouldn't necessarily get a response. So 617 00:36:49,040 --> 00:36:51,600 Speaker 1: kind of think about how adults can be with kids, 618 00:36:51,880 --> 00:36:54,800 Speaker 1: where the adults tend to develop a highly attuned skill 619 00:36:54,880 --> 00:36:57,560 Speaker 1: of ignoring the child after a bit, even if the 620 00:36:57,600 --> 00:37:03,279 Speaker 1: child is saying, but look, look, look, hey, look, look 621 00:37:03,280 --> 00:37:07,800 Speaker 1: what I'm doing, look, and so on. So the team 622 00:37:07,880 --> 00:37:12,840 Speaker 1: created four basic states.
The default state was called read, 623 00:37:13,200 --> 00:37:15,920 Speaker 1: meaning it would appear as though the figure were reading 624 00:37:15,920 --> 00:37:19,480 Speaker 1: a book or newspaper at torso level. The next state 625 00:37:19,600 --> 00:37:23,040 Speaker 1: up is glance, whereupon the robot would appear to 626 00:37:23,120 --> 00:37:26,480 Speaker 1: glance away from the reading material to see what sort 627 00:37:26,480 --> 00:37:29,440 Speaker 1: of ruckus is going on. This involved movement of not 628 00:37:29,520 --> 00:37:31,759 Speaker 1: just the eyes but the head as well. So the 629 00:37:31,800 --> 00:37:35,000 Speaker 1: head tilts up a bit and it looks, for a moment, 630 00:37:35,080 --> 00:37:38,799 Speaker 1: like the robot is looking away from the imaginary book 631 00:37:38,840 --> 00:37:42,319 Speaker 1: or newspaper. If the curiosity threshold is met, then the 632 00:37:42,360 --> 00:37:46,640 Speaker 1: next state, engage, would pop up. This means that whatever 633 00:37:46,680 --> 00:37:49,040 Speaker 1: it was that got the robot's attention is worthy of 634 00:37:49,200 --> 00:37:52,399 Speaker 1: further focus, and the robot will direct its gaze at 635 00:37:52,480 --> 00:37:56,560 Speaker 1: that thing. With the engage stage, which has a nice 636 00:37:56,640 --> 00:37:59,759 Speaker 1: rhyme to it, the robot will attempt to make eye contact, 637 00:38:00,000 --> 00:38:02,840 Speaker 1: which involves the cameras detecting the face of the person 638 00:38:02,920 --> 00:38:06,160 Speaker 1: of interest, and then the computer system commanding the robot's 639 00:38:06,160 --> 00:38:09,600 Speaker 1: head and eyes to aim towards that detected face.
The 640 00:38:09,640 --> 00:38:11,920 Speaker 1: amount of time that the robot spends looking at a 641 00:38:11,960 --> 00:38:15,240 Speaker 1: person is determined both by a minimum countdown clock, saying 642 00:38:15,640 --> 00:38:18,960 Speaker 1: you have to spend at least this amount of time looking at 643 00:38:19,040 --> 00:38:22,719 Speaker 1: this person, and by the curiosity score that the 644 00:38:22,800 --> 00:38:26,040 Speaker 1: robot has assigned to that person. So once that score 645 00:38:26,080 --> 00:38:30,600 Speaker 1: decreases below the engage threshold, the robot returns to read. 646 00:38:30,760 --> 00:38:34,120 Speaker 1: So if you happen to be particularly interesting, the robot 647 00:38:34,160 --> 00:38:37,360 Speaker 1: will look at you for longer, and when you stop 648 00:38:37,360 --> 00:38:40,839 Speaker 1: being interesting, the robot eventually goes back to reading its 649 00:38:40,840 --> 00:38:45,399 Speaker 1: pretend book or whatever. The final state is called acknowledge, 650 00:38:45,400 --> 00:38:47,279 Speaker 1: and that was the name that the team gave for 651 00:38:47,320 --> 00:38:49,960 Speaker 1: those times when the robot is seeing a person that 652 00:38:50,080 --> 00:38:53,720 Speaker 1: is familiar to the robot. For the purposes of the tests, 653 00:38:54,200 --> 00:38:58,000 Speaker 1: the familiarity variable was actually randomized, so in other words, 654 00:38:58,280 --> 00:39:02,480 Speaker 1: the robot wasn't necessarily familiar with people. It just 655 00:39:02,880 --> 00:39:06,800 Speaker 1: was told it was familiar with somebody.
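The gaze-duration rule just described, a minimum countdown plus a curiosity score that must stay above the engage threshold, can be sketched as below. The two-part rule is from the episode; the specific durations, decay rate, and function names are assumptions for illustration.

```python
# Sketch of the dwell-time rule: hold eye contact for at least a minimum
# time, then keep looking only while curiosity stays above the threshold.
# All numeric defaults here are invented for the sake of the example.

def should_keep_looking(elapsed: float, curiosity: float,
                        min_dwell: float = 2.0,
                        engage_threshold: float = 0.6) -> bool:
    if elapsed < min_dwell:
        return True                      # minimum countdown still running
    return curiosity > engage_threshold  # after that, interest must persist

def decay(curiosity: float, dt: float, rate: float = 0.1) -> float:
    """Curiosity fades over time unless the person does something new."""
    return max(0.0, curiosity - rate * dt)
```

So a particularly interesting onlooker keeps the score high and holds the robot's gaze; once the score decays below the threshold, the figure drifts back to its pretend book.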
So, in other words, 655 00:39:06,800 --> 00:39:09,360 Speaker 1: it could be a totally new person that walks 656 00:39:09,440 --> 00:39:12,960 Speaker 1: up to the robot and the robot randomly assigns that 657 00:39:13,000 --> 00:39:16,879 Speaker 1: person the familiar tag, and the robot will behave as 658 00:39:16,880 --> 00:39:20,759 Speaker 1: if that's someone that the robot recognizes. Maybe they're just 659 00:39:20,880 --> 00:39:24,840 Speaker 1: an old friend the robot just met. Is there a 660 00:39:24,880 --> 00:39:28,640 Speaker 1: word for that? The robot system also had a sort 661 00:39:28,680 --> 00:39:32,480 Speaker 1: of short term memory that the team called the guesthouse. 662 00:39:33,080 --> 00:39:35,799 Speaker 1: As people would come into the robot's field of view, 663 00:39:36,080 --> 00:39:39,720 Speaker 1: or the scene as the team called it, the robot 664 00:39:39,760 --> 00:39:43,799 Speaker 1: would analyze that person and assign that person a numerical 665 00:39:43,960 --> 00:39:47,560 Speaker 1: value to keep track of that person, and it would 666 00:39:47,560 --> 00:39:49,920 Speaker 1: also keep track of how many times that particular person 667 00:39:50,200 --> 00:39:53,239 Speaker 1: had been within its field of view, and it would 668 00:39:53,320 --> 00:39:56,160 Speaker 1: keep track of the curiosity score that was assigned to 669 00:39:56,280 --> 00:39:59,760 Speaker 1: that person. In addition to the states, the team described 670 00:39:59,840 --> 00:40:03,520 Speaker 1: layers of show.
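The "guesthouse" short-term memory described here, a numeric ID per person plus a visit count and a stored curiosity score, could be sketched like this. The guesthouse name and the three tracked quantities come from the episode; the class layout, method names, and field names are my own invention.

```python
# Sketch of the "guesthouse" short-term memory: each person entering the
# scene gets a numeric ID, a visit count, and a stored curiosity score.
# Structure and names are hypothetical; the paper only describes the idea.

import itertools

class Guesthouse:
    def __init__(self):
        self._next_id = itertools.count(1)
        self.guests = {}  # id -> {"visits": int, "curiosity": float}

    def check_in(self, guest_id=None, curiosity=0.0):
        """Register a new arrival, or bump the visit count of a returnee."""
        if guest_id is None or guest_id not in self.guests:
            guest_id = next(self._next_id)
            self.guests[guest_id] = {"visits": 1, "curiosity": curiosity}
        else:
            self.guests[guest_id]["visits"] += 1
            self.guests[guest_id]["curiosity"] = curiosity
        return guest_id
```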
Now this relates closely to the 671 00:40:03,560 --> 00:40:06,480 Speaker 1: states I just mentioned, but it helps explain how the 672 00:40:06,600 --> 00:40:09,879 Speaker 1: robot transitions from one set of behaviors to another: how 673 00:40:09,880 --> 00:40:13,239 Speaker 1: does it make the determination to change from one thing 674 00:40:13,320 --> 00:40:17,360 Speaker 1: to the next, and which behaviors will override others versus 675 00:40:17,440 --> 00:40:21,160 Speaker 1: behaviors that will always be present with the robot. All 676 00:40:21,200 --> 00:40:24,640 Speaker 1: of this is necessary because of that variation I was 677 00:40:24,680 --> 00:40:27,640 Speaker 1: talking about at the beginning of the show. If the 678 00:40:27,760 --> 00:40:32,239 Speaker 1: robot were just following a scripted set of directions, it 679 00:40:32,280 --> 00:40:34,839 Speaker 1: wouldn't have to make these determinations because it would just 680 00:40:34,840 --> 00:40:38,520 Speaker 1: follow the same sequence over and over. But because we 681 00:40:38,600 --> 00:40:42,800 Speaker 1: have this variability, we have to build in a system 682 00:40:42,920 --> 00:40:45,359 Speaker 1: for the robot to follow in order to make decisions. 683 00:40:45,440 --> 00:40:48,040 Speaker 1: So at the base level you have what the team 684 00:40:48,040 --> 00:40:52,560 Speaker 1: calls zero show. This is essentially the robot in off mode. 685 00:40:52,640 --> 00:40:55,719 Speaker 1: It is inanimate. But the next layer up is a 686 00:40:55,800 --> 00:41:00,279 Speaker 1: live show, which has the baseline behaviors of simulated breathing, 687 00:41:00,960 --> 00:41:05,399 Speaker 1: eye blinking, and saccades. This level of show underlies 688 00:41:05,680 --> 00:41:08,960 Speaker 1: all the other higher levels, so this is sort of 689 00:41:09,920 --> 00:41:13,520 Speaker 1: always running in the background.
You don't want the robot 690 00:41:13,560 --> 00:41:17,160 Speaker 1: to suddenly stop breathing while it does other stuff. The 691 00:41:17,280 --> 00:41:21,400 Speaker 1: next four show levels correspond with the four states of 692 00:41:21,440 --> 00:41:25,560 Speaker 1: the robot. So you have read, glance, engage, and acknowledge, 693 00:41:26,040 --> 00:41:30,920 Speaker 1: and an engage show will subsume the glance and read shows. 694 00:41:31,360 --> 00:41:34,040 Speaker 1: It will take over the robot's behaviors, so the robot's 695 00:41:34,080 --> 00:41:37,800 Speaker 1: not going to display the behaviors of read and glance 696 00:41:38,160 --> 00:41:43,279 Speaker 1: when engage happens. So it's that hierarchy of operations, and 697 00:41:43,360 --> 00:41:46,080 Speaker 1: I find it really interesting to look at robot behaviors 698 00:41:46,080 --> 00:41:49,359 Speaker 1: in this way, as that hierarchy of potential states. It's 699 00:41:49,400 --> 00:41:52,800 Speaker 1: amazing when you break down those states and determine which 700 00:41:52,840 --> 00:41:57,520 Speaker 1: should take priority given certain circumstances, and how long that 701 00:41:57,640 --> 00:42:00,520 Speaker 1: state should remain active before it reverts to a 702 00:42:00,640 --> 00:42:04,400 Speaker 1: lower-level state. Again, the team is trying to create 703 00:42:04,440 --> 00:42:07,240 Speaker 1: the illusion of life. The robot doesn't have to actually 704 00:42:07,320 --> 00:42:10,960 Speaker 1: lose interest or anything like that. It's just simulating it. 705 00:42:11,600 --> 00:42:15,040 Speaker 1: This particular project was working within some pretty well-defined 706 00:42:15,080 --> 00:42:18,640 Speaker 1: parameters and restrictions. The team acknowledged that their work is 707 00:42:18,680 --> 00:42:22,120 Speaker 1: really meant to be a starting point for further improvements.
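The layered-show idea described here, base behaviors that never stop while the single highest active show subsumes the ones below it, can be sketched like this. The layer ordering and the always-on breathing, blinking, and saccades come from the episode; the code structure and names are illustrative assumptions.

```python
# Sketch of the layered shows: the live-show layer (breathing, blinking,
# saccades) always runs, while exactly one higher behavior layer is active
# and subsumes any lower ones. Structure is hypothetical.

LAYERS = ["read", "glance", "engage", "acknowledge"]  # low to high priority

def active_behaviors(requested):
    """Return the always-on base behaviors plus the single highest
    requested show layer, which subsumes everything below it."""
    base = ["breathing", "blinking", "saccades"]  # live show, never stops
    if not requested:
        return base
    top = max(requested, key=LAYERS.index)  # highest-priority layer wins
    return base + [top]
```

So if both read and engage are requested, engage takes over, and the figure never displays the read behaviors underneath it, but it keeps breathing and blinking throughout.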
709 00:42:22,560 --> 00:42:26,759 Speaker 1: They point out that older audio animatronics might seem lifelike 710 00:42:26,840 --> 00:42:30,760 Speaker 1: at greater distances and for shorter durations. So, for example, 711 00:42:31,239 --> 00:42:33,920 Speaker 1: if you were to ride an attraction where you go 712 00:42:34,040 --> 00:42:37,520 Speaker 1: by a scene of audio animatronic figures at a decent 713 00:42:37,520 --> 00:42:40,800 Speaker 1: clip and and there you know, good, twenty feet away. 714 00:42:41,200 --> 00:42:43,919 Speaker 1: The limited amount of time and the greater distance that 715 00:42:44,080 --> 00:42:48,160 Speaker 1: are involved can help support that illusion of life. The 716 00:42:48,200 --> 00:42:51,879 Speaker 1: animatronic figures don't have to be super convincing because you're 717 00:42:51,920 --> 00:42:55,480 Speaker 1: not spending enough time and attention to see through the illusion, 718 00:42:55,520 --> 00:42:58,120 Speaker 1: nor are you close enough to see it showed through. 719 00:42:58,840 --> 00:43:01,800 Speaker 1: The more time you have and the less distance between 720 00:43:01,880 --> 00:43:04,840 Speaker 1: you and the animatronic figure, the harder it is to 721 00:43:04,920 --> 00:43:09,800 Speaker 1: create and maintain that illusion of life. Without an interactive gaze, 722 00:43:09,880 --> 00:43:13,800 Speaker 1: Without eye contact, it becomes pretty clear that the animatronic 723 00:43:13,840 --> 00:43:17,480 Speaker 1: figure has no real lifelike quality to it. If you 724 00:43:17,520 --> 00:43:20,520 Speaker 1: were to stand close to one of these older animatronic figures, 725 00:43:20,920 --> 00:43:23,280 Speaker 1: you would notice that it's not really looking at anything 726 00:43:23,320 --> 00:43:26,520 Speaker 1: in particular, and that its movements are a matter of routine. 727 00:43:26,880 --> 00:43:32,000 Speaker 1: It's not a demonstration of spontaneous or seemingly spontaneous decisions. 
728 00:43:32,640 --> 00:43:35,799 Speaker 1: The Interactive Gaze project takes this a step up. The 729 00:43:35,880 --> 00:43:39,080 Speaker 1: robot can recognize and acknowledge someone that is in the 730 00:43:39,160 --> 00:43:43,200 Speaker 1: robot's presence, it can direct its focus and attention at 731 00:43:43,239 --> 00:43:46,560 Speaker 1: that person. This definitely is a step up in creating 732 00:43:46,560 --> 00:43:50,560 Speaker 1: that illusion and works at much smaller distances of viewing 733 00:43:51,040 --> 00:43:54,200 Speaker 1: than the older methods do, but the engineers admit it 734 00:43:54,320 --> 00:43:57,759 Speaker 1: still has limitations. They point out that their approach as 735 00:43:57,800 --> 00:44:00,680 Speaker 1: it stands, might serve as a way to reserve that 736 00:44:00,760 --> 00:44:04,320 Speaker 1: illusion of life for a couple of minutes at the most, 737 00:44:04,640 --> 00:44:08,040 Speaker 1: but beyond that the illusion would start to fade away. 738 00:44:08,280 --> 00:44:10,880 Speaker 1: They point out that as the distance between the robot 739 00:44:10,960 --> 00:44:14,359 Speaker 1: and the audience decreases, and as the time of observing 740 00:44:14,360 --> 00:44:19,560 Speaker 1: the robot increases, you have to incorporate increasingly complex and 741 00:44:19,640 --> 00:44:24,040 Speaker 1: natural behaviors to maintain that illusion of life, and interactive 742 00:44:24,080 --> 00:44:27,359 Speaker 1: gaze is just one element. Others could include stuff like 743 00:44:27,440 --> 00:44:31,440 Speaker 1: a display of emotion. The bust has sort of a 744 00:44:31,440 --> 00:44:34,040 Speaker 1: little bit of this. 
It can imply a 744 00:44:34,080 --> 00:44:36,640 Speaker 1: sense of emotion to some degree with the way it 745 00:44:36,719 --> 00:44:40,240 Speaker 1: holds its eyes, but because it doesn't have any movement 746 00:44:40,239 --> 00:44:42,960 Speaker 1: of its jaw or lips, and doesn't have any other 747 00:44:43,840 --> 00:44:48,400 Speaker 1: means of really indicating emotion, this is pretty limited. So 748 00:44:48,480 --> 00:44:52,080 Speaker 1: perhaps a robot that can hear and parse and respond 749 00:44:52,080 --> 00:44:54,440 Speaker 1: to speech, you know, sort of like the voice-activated 750 00:44:54,480 --> 00:44:57,200 Speaker 1: digital assistants that are familiar to us, and you know, 751 00:44:57,239 --> 00:45:00,360 Speaker 1: probably like the Amazon Echo or the iPhone or Android phones. 752 00:45:00,960 --> 00:45:04,640 Speaker 1: That might be something that really pushes that illusion of life. 753 00:45:05,000 --> 00:45:09,279 Speaker 1: And of course there's also the physical appearance aspect. Now, 754 00:45:09,280 --> 00:45:12,800 Speaker 1: you would never mistake this animatronic bust for a human. 755 00:45:13,120 --> 00:45:16,319 Speaker 1: As I mentioned before, it's pretty creepy looking. It's got a 756 00:45:16,360 --> 00:45:19,879 Speaker 1: plastic and skeletal quality to it that prevents you from 757 00:45:19,880 --> 00:45:23,640 Speaker 1: ever mistaking it as a person. But the team points 758 00:45:23,640 --> 00:45:27,240 Speaker 1: out the physical appearance of the robot taps back into 759 00:45:27,320 --> 00:45:31,000 Speaker 1: that problem of the uncanny valley. It might take a while 760 00:45:31,080 --> 00:45:34,800 Speaker 1: to create something that's convincing enough and yet not repulsive 761 00:45:36,120 --> 00:45:39,560 Speaker 1: to work as a robotic human animatronic. If you make 762 00:45:39,600 --> 00:45:43,240 Speaker 1: it look too real, it's going to give people the creeps.
764 00:45:43,880 --> 00:45:46,680 Speaker 1: I think, at least in the short term, we're more 765 00:45:46,760 --> 00:45:49,400 Speaker 1: likely to see this technology used to create characters that 766 00:45:49,440 --> 00:45:53,960 Speaker 1: are human like but still distinctly not human, in order 767 00:45:54,000 --> 00:45:58,000 Speaker 1: to avoid that negative reaction when the uncanny Valley gets involved. 768 00:45:58,320 --> 00:46:01,200 Speaker 1: In other words, using the US to create an animatronic 769 00:46:01,239 --> 00:46:04,799 Speaker 1: figure that looks a lot like a cartoon character, even 770 00:46:04,840 --> 00:46:09,160 Speaker 1: a human cartoon character because well, you recognize the cartoon 771 00:46:09,239 --> 00:46:13,680 Speaker 1: character as representing a human. Cartoon characters don't really look 772 00:46:13,719 --> 00:46:18,239 Speaker 1: like humans. Usually they look like they have human qualities 773 00:46:18,280 --> 00:46:21,719 Speaker 1: to them, but they still have cartoonish qualities to them, 774 00:46:21,760 --> 00:46:24,480 Speaker 1: so you wouldn't mistake them for actually being human. Or 775 00:46:24,560 --> 00:46:27,040 Speaker 1: you just you know, go the robot route or some 776 00:46:27,120 --> 00:46:31,239 Speaker 1: sort of animal career and you sidestep that problem. The 777 00:46:31,280 --> 00:46:34,440 Speaker 1: engineers conclude their paper by talking about how the attention 778 00:46:34,520 --> 00:46:37,960 Speaker 1: engine could, with some evolution, work for a lot of 779 00:46:37,960 --> 00:46:42,160 Speaker 1: different applications. 
So imagine that you design an animatronic that 780 00:46:42,239 --> 00:46:46,400 Speaker 1: represents someone who's really frightened, and that kind of character 781 00:46:46,560 --> 00:46:49,600 Speaker 1: might have a very low threshold for stimuli to push 782 00:46:49,640 --> 00:46:52,400 Speaker 1: it to a higher state of attentiveness, right? Like a 783 00:46:52,440 --> 00:46:54,920 Speaker 1: little sound might cause that character to perk up and 784 00:46:54,960 --> 00:46:59,440 Speaker 1: look around quickly, because that character is supposed to 785 00:46:59,440 --> 00:47:02,440 Speaker 1: be frightened. Or you could create something like, you know, 786 00:47:02,480 --> 00:47:06,279 Speaker 1: an absent-minded book lover who only glances up from 787 00:47:06,440 --> 00:47:10,400 Speaker 1: whatever book they're studying if something really exciting is happening; 788 00:47:10,400 --> 00:47:14,120 Speaker 1: otherwise they just ignore it. They also talk about the 789 00:47:14,160 --> 00:47:18,520 Speaker 1: bottom-up approach to layering behaviors and deciding which behaviors 790 00:47:18,560 --> 00:47:23,560 Speaker 1: will replace others that occupy a lower layer. That 791 00:47:23,680 --> 00:47:26,520 Speaker 1: is really fascinating to me. Now, we're still a fair 792 00:47:26,560 --> 00:47:29,560 Speaker 1: ways off from seeing these sorts of technologies make their 793 00:47:29,600 --> 00:47:33,279 Speaker 1: way into official attractions, but based on what I've seen 794 00:47:33,440 --> 00:47:36,160 Speaker 1: and read, I wouldn't be surprised to find them making 795 00:47:36,200 --> 00:47:38,879 Speaker 1: their way into Disney parks in the next, say, five 796 00:47:38,960 --> 00:47:42,760 Speaker 1: years or so, depending on how the company budgets stuff.
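The per-character tuning described here, the same attention engine dialed to a jumpy figure or an absent-minded reader, could be sketched like this. The two character archetypes come from the episode; the profile numbers, dictionary layout, and function names are invented for illustration.

```python
# Sketch of tuning one attention engine for different characters: a
# frightened figure reacts to almost anything, a book lover to almost
# nothing. All threshold values and names here are hypothetical.

CHARACTER_PROFILES = {
    "frightened": {"glance_threshold": 0.05, "engage_threshold": 0.2},
    "book_lover": {"glance_threshold": 0.7,  "engage_threshold": 0.9},
}

def reacts(character: str, stimulus: float) -> str:
    """Map a stimulus strength to a reaction for a given character."""
    p = CHARACTER_PROFILES[character]
    if stimulus >= p["engage_threshold"]:
        return "engage"   # full attention on the stimulus
    if stimulus >= p["glance_threshold"]:
        return "glance"   # quick look, then back to whatever it was doing
    return "ignore"
```

The same small sound that makes the frightened character perk up falls well below the book lover's glance threshold, so only the thresholds change, not the engine.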
797 00:47:42,800 --> 00:47:46,839 Speaker 1: Of course, the pandemic has created a particularly tricky situation 798 00:47:46,920 --> 00:47:49,839 Speaker 1: for that branch of the Disney Company, even as other 799 00:47:49,880 --> 00:47:53,319 Speaker 1: branches of that company continue it's global domination of all 800 00:47:53,360 --> 00:47:57,759 Speaker 1: things entertainment. But the technology itself and the design philosophy 801 00:47:57,760 --> 00:48:00,480 Speaker 1: of how to program a robot to behave as if 802 00:48:00,560 --> 00:48:03,600 Speaker 1: it were doing so naturally, it's really neat to me. 803 00:48:04,080 --> 00:48:06,040 Speaker 1: And as I said at the beginning, the paper is 804 00:48:06,040 --> 00:48:09,320 Speaker 1: available for free to read, so if you want to 805 00:48:09,400 --> 00:48:13,359 Speaker 1: check that out, I highly recommend it. I think it 806 00:48:13,480 --> 00:48:16,440 Speaker 1: is a fascinating piece of work, and as I said, 807 00:48:17,000 --> 00:48:20,840 Speaker 1: it's not that difficult to follow. There's some math stuff 808 00:48:20,880 --> 00:48:23,160 Speaker 1: that will probably, you know, lose a lot of you, 809 00:48:23,239 --> 00:48:25,719 Speaker 1: but it lost me. I'm not I'm not trying to 810 00:48:25,760 --> 00:48:28,279 Speaker 1: shame you. I couldn't follow all of it, but it 811 00:48:28,360 --> 00:48:31,000 Speaker 1: is otherwise pretty easy to understand. And like I said, 812 00:48:31,000 --> 00:48:36,279 Speaker 1: it is titled Realistic and Interactive Robot Gaze g A 813 00:48:36,600 --> 00:48:40,040 Speaker 1: Z E, so check that out. It is really a 814 00:48:40,080 --> 00:48:43,759 Speaker 1: neat paper. Just I apologize for the pictures that are 815 00:48:43,760 --> 00:48:48,080 Speaker 1: in there because they're creepy as all get out. That's 816 00:48:48,080 --> 00:48:50,640 Speaker 1: it for me. I hope you guys enjoyed this episode. 
817 00:48:50,840 --> 00:48:53,400 Speaker 1: If you have suggestions for future topics I should tackle 818 00:48:53,520 --> 00:48:56,360 Speaker 1: in tech stuff, let me know on Twitter. The handle 819 00:48:56,480 --> 00:48:59,840 Speaker 1: is text stuff h s W and I'll talk to 820 00:48:59,880 --> 00:49:08,160 Speaker 1: you again really soon. Text Stuff is an I Heart 821 00:49:08,280 --> 00:49:12,000 Speaker 1: Radio production. For more podcasts from I Heart Radio, visit 822 00:49:12,040 --> 00:49:15,080 Speaker 1: the I Heart Radio app, Apple Podcasts, or wherever you 823 00:49:15,200 --> 00:49:16,520 Speaker 1: listen to your favorite shows,