WEBVTT - How Motion Capture Works

0:00:00.520 --> 0:00:03.160
<v Speaker 1>Brought to you by the 2012 Toyota Camry.

0:00:05.880 --> 0:00:08.920
<v Speaker 1>Get in touch with technology with TechStuff from

0:00:08.960 --> 0:00:17.000
<v Speaker 1>HowStuffWorks.com. Hello again, everyone, and welcome to

0:00:17.040 --> 0:00:19.000
<v Speaker 1>TechStuff. My name is Chris Pollette, and I'm an

0:00:19.079 --> 0:00:21.720
<v Speaker 1>editor at HowStuffWorks.com. Sitting across from

0:00:21.720 --> 0:00:24.440
<v Speaker 1>me, as always, is senior writer Jonathan Strickland. Hey there.

0:00:25.200 --> 0:00:28.640
<v Speaker 1>So today we thought we'd talk a bit about a

0:00:28.760 --> 0:00:33.720
<v Speaker 1>type of performance that is relatively new as far as

0:00:33.760 --> 0:00:37.680
<v Speaker 1>performance goes. Uh, something that uh, I guess this falls

0:00:37.680 --> 0:00:41.080
<v Speaker 1>into our movie making category, but it's also something that's

0:00:41.120 --> 0:00:44.120
<v Speaker 1>been used in things like video games and other

0:00:44.159 --> 0:00:48.640
<v Speaker 1>forms of media as well. Motion capture. Yeah, you'll even

0:00:48.680 --> 0:00:52.720
<v Speaker 1>see it in uh, in sports, they've been talking about

0:00:52.720 --> 0:00:55.120
<v Speaker 1>this for a while now. And if you've ever seen

0:00:55.480 --> 0:01:01.040
<v Speaker 1>the making of a video or a game, um,

0:01:01.160 --> 0:01:05.400
<v Speaker 1>or you know, even in sports rehabilitation in medicine, um,

0:01:05.440 --> 0:01:08.480
<v Speaker 1>they where the people are wearing dots, little white dots

0:01:08.520 --> 0:01:12.880
<v Speaker 1>all over their clothing and sometimes their faces and hands. UM,

0:01:12.920 --> 0:01:15.679
<v Speaker 1>that's probably what they were doing. Either that or they

0:01:15.760 --> 0:01:18.960
<v Speaker 1>just really like stickers. Yeah, yeah, I mean, who doesn't.

0:01:19.120 --> 0:01:22.880
<v Speaker 1>I remember being very competitive in elementary school in order

0:01:22.920 --> 0:01:25.800
<v Speaker 1>to get a sticker. And also this is a tangent

0:01:25.880 --> 0:01:29.120
<v Speaker 1>but a true story. I got a gold star sticker

0:01:29.520 --> 0:01:32.920
<v Speaker 1>uh just last month awes from Tracy, the head of

0:01:32.920 --> 0:01:37.959
<v Speaker 1>our site. So anyway, um, yeah, motion capture. Actually,

0:01:38.000 --> 0:01:40.759
<v Speaker 1>there are a lot of different terms that you can

0:01:40.920 --> 0:01:44.800
<v Speaker 1>use in this realm. Uh, motion capture or

0:01:44.840 --> 0:01:47.600
<v Speaker 1>mocap is probably the one I hear the most frequently,

0:01:47.640 --> 0:01:53.120
<v Speaker 1>but also things like performance animation, performance capture, digital puppetry,

0:01:53.280 --> 0:01:57.080
<v Speaker 1>real time animation, motion scanning, which is really more of

0:01:57.080 --> 0:02:01.000
<v Speaker 1>a proprietary thing, but these are The concept is pretty

0:02:01.040 --> 0:02:03.000
<v Speaker 1>much the same across the board. The idea is to

0:02:03.240 --> 0:02:08.919
<v Speaker 1>capture the physical representation of something and then convert it into

0:02:09.000 --> 0:02:12.799
<v Speaker 1>a virtual format. So usually it's something that's in motion,

0:02:12.840 --> 0:02:15.560
<v Speaker 1>but it's not always that way. Uh, since you know

0:02:15.600 --> 0:02:19.280
<v Speaker 1>we're talking about motion capture, that makes sense. But you're

0:02:19.320 --> 0:02:23.359
<v Speaker 1>trying to get uh, translate something that is moving through

0:02:23.440 --> 0:02:28.560
<v Speaker 1>real space into a digital format. And uh, there's different

0:02:28.600 --> 0:02:30.000
<v Speaker 1>ways to do this. I mean, you could do it

0:02:30.080 --> 0:02:32.639
<v Speaker 1>the really hard way, which is where you study something

0:02:32.680 --> 0:02:35.680
<v Speaker 1>and then you try to recreate it, uh, either by

0:02:35.760 --> 0:02:38.560
<v Speaker 1>hand or digitally, you know, by programming

0:02:39.080 --> 0:02:43.400
<v Speaker 1>movements into an animated figure. But this is an idea

0:02:43.400 --> 0:02:46.000
<v Speaker 1>that kind of takes that step out where you are

0:02:46.560 --> 0:02:50.800
<v Speaker 1>directly porting the movements. Uh, something is making within physical

0:02:50.800 --> 0:02:55.200
<v Speaker 1>space into virtual space. Yeah, there was an early technique, um.

0:02:55.240 --> 0:02:56.960
<v Speaker 1>And of course this is this is all an attempt

0:02:57.040 --> 0:03:01.720
<v Speaker 1>to get as real as you can with animation. Um.

0:03:02.400 --> 0:03:06.240
<v Speaker 1>And one of the earlier techniques that that was sort

0:03:06.280 --> 0:03:10.359
<v Speaker 1>of a predecessor to this is called rotoscoping. Uh. Ralph

0:03:10.400 --> 0:03:13.400
<v Speaker 1>Bakshi's Lord of the Rings had a lot of

0:03:13.520 --> 0:03:17.520
<v Speaker 1>rotoscoping in it. Well, what happens is, um, in that

0:03:17.600 --> 0:03:21.440
<v Speaker 1>case is that a real uh, real human being goes

0:03:21.520 --> 0:03:24.480
<v Speaker 1>through the motions and they act through the parts that

0:03:24.520 --> 0:03:26.760
<v Speaker 1>are that you're going to see in the animation. They

0:03:26.760 --> 0:03:30.600
<v Speaker 1>shoot that on film, yes, yes, and then the the

0:03:30.639 --> 0:03:34.640
<v Speaker 1>animators basically are looking at that and are drawing more

0:03:34.720 --> 0:03:37.160
<v Speaker 1>or less on top of that. They see a projection

0:03:37.160 --> 0:03:39.800
<v Speaker 1>of that, and they are drawing, uh, the animation over

0:03:39.840 --> 0:03:44.080
<v Speaker 1>that to capture the way that person's body looks. And

0:03:44.880 --> 0:03:47.760
<v Speaker 1>this this was famous. You know, the Disney studios were

0:03:47.760 --> 0:03:50.120
<v Speaker 1>famous for this, studying models, and then they would

0:03:50.120 --> 0:03:54.600
<v Speaker 1>do the rotoscoping technique to to try to make their uh,

0:03:55.000 --> 0:03:58.480
<v Speaker 1>their characters look more realistic. Yeah. And there are some artists,

0:03:58.560 --> 0:04:02.480
<v Speaker 1>like I said, like Bakshi, who famously would leave

0:04:02.560 --> 0:04:06.120
<v Speaker 1>the film image as part of the animation, so that

0:04:06.160 --> 0:04:09.120
<v Speaker 1>you had this this weird effect where the thing you

0:04:09.160 --> 0:04:13.360
<v Speaker 1>were looking at was part uh well quote unquote real

0:04:13.400 --> 0:04:17.240
<v Speaker 1>image and part animated image, which was it was an

0:04:17.320 --> 0:04:21.200
<v Speaker 1>artistic choice, uh, definitely something that was not meant to

0:04:21.200 --> 0:04:24.280
<v Speaker 1>to necessarily fool you into thinking, oh, well, that animated

0:04:24.360 --> 0:04:26.960
<v Speaker 1>character is moving very realistically. It was done on purpose,

0:04:27.560 --> 0:04:29.560
<v Speaker 1>but it was. That's what I always think of when

0:04:29.600 --> 0:04:32.119
<v Speaker 1>I think rotoscoping, is I just think of the different

0:04:32.200 --> 0:04:34.040
<v Speaker 1>Bakshi films, but in particular I think of his

0:04:34.160 --> 0:04:37.640
<v Speaker 1>Lord of the Rings adaptation um, which, as I recall,

0:04:37.800 --> 0:04:42.800
<v Speaker 1>ended halfway through The Two Towers. So anyway, that's just

0:04:42.839 --> 0:04:45.240
<v Speaker 1>bringing back memories. But yeah, that was that was sort

0:04:45.240 --> 0:04:50.560
<v Speaker 1>of a precursor to motion capture. Motion capture itself. There

0:04:50.560 --> 0:04:55.599
<v Speaker 1>are many different ways of achieving this. For example, there

0:04:55.680 --> 0:05:00.159
<v Speaker 1>were it's not used very frequently now, but there were

0:05:00.800 --> 0:05:04.960
<v Speaker 1>mechanical systems where you had sensors that would be attached

0:05:05.000 --> 0:05:10.640
<v Speaker 1>to specific joints uh that would relay movement. And usually

0:05:10.640 --> 0:05:12.839
<v Speaker 1>it was kind of like a like an actor would

0:05:12.839 --> 0:05:19.039
<v Speaker 1>wear a physical metallic skeleton type device that would have

0:05:19.160 --> 0:05:22.800
<v Speaker 1>the sensors attached to the various joints and as the

0:05:22.839 --> 0:05:27.719
<v Speaker 1>actor moved, the sensors would register the changes in motion

0:05:28.600 --> 0:05:33.160
<v Speaker 1>in this metallic skeleton and UH, and that would be

0:05:33.200 --> 0:05:38.560
<v Speaker 1>relayed through usually cables to a computer system that would

0:05:39.120 --> 0:05:42.080
<v Speaker 1>measure these or take the measurements from the sensors and

0:05:42.080 --> 0:05:45.960
<v Speaker 1>translate it into movements for the virtual character. Uh, it's very

0:05:46.040 --> 0:05:49.839
<v Speaker 1>limiting this particular system. There was another one that was

0:05:49.920 --> 0:05:55.280
<v Speaker 1>a little more versatile, which used electromagnets. And

0:05:55.320 --> 0:05:58.640
<v Speaker 1>in this case you're talking about sensors that would be

0:05:58.680 --> 0:06:02.279
<v Speaker 1>attached by really thick cables that again would go to

0:06:02.520 --> 0:06:06.080
<v Speaker 1>a computer, and there'd be a magnetic field and by

0:06:06.160 --> 0:06:10.440
<v Speaker 1>moving through this magnetic field, the sensors would pick up alterations.

0:06:10.440 --> 0:06:13.000
<v Speaker 1>You know, it would you know, moving through magnetic field,

0:06:13.000 --> 0:06:16.880
<v Speaker 1>you would get little electrical changes. We we've talked a

0:06:16.880 --> 0:06:22.880
<v Speaker 1>lot about electricity and magnetism in general. Moving through, uh, fluctuating

0:06:22.880 --> 0:06:27.039
<v Speaker 1>a magnetic field can induce electricity through a conductor, or

0:06:27.400 --> 0:06:30.800
<v Speaker 1>putting electricity through a conductor can induce a magnetic field.

0:06:31.000 --> 0:06:34.240
<v Speaker 1>So anyway, by moving these sensors through the magnetic field,

0:06:34.440 --> 0:06:38.359
<v Speaker 1>it would create these electronic fluctuations that would then be

0:06:38.480 --> 0:06:43.359
<v Speaker 1>measured and translated into movement, and again this was a

0:06:43.560 --> 0:06:47.880
<v Speaker 1>fairly effective way of picking up movements. It actually didn't

0:06:48.000 --> 0:06:50.960
<v Speaker 1>use as many points of contact as the optical systems

0:06:50.960 --> 0:06:52.919
<v Speaker 1>that we mostly think about. That was the kind that

0:06:53.000 --> 0:06:55.599
<v Speaker 1>Chris was referring to earlier with all the dots on

0:06:55.640 --> 0:06:58.960
<v Speaker 1>the person. Those systems tend to have lots and lots

0:06:58.960 --> 0:07:02.800
<v Speaker 1>and lots of points of reference. The electromagnetic ones

0:07:03.279 --> 0:07:06.799
<v Speaker 1>didn't tend to have as many points of reference because

0:07:07.320 --> 0:07:10.640
<v Speaker 1>the the software side of it, because you know, we

0:07:10.680 --> 0:07:12.520
<v Speaker 1>do have a hardware and a software side to this.

0:07:12.880 --> 0:07:16.360
<v Speaker 1>The software side would assume that the joints that these

0:07:16.400 --> 0:07:20.440
<v Speaker 1>sensors were attached to behaved the way they normally would

0:07:20.480 --> 0:07:24.239
<v Speaker 1>in humans, and that they don't have complete freedom of movement.

0:07:24.440 --> 0:07:28.600
<v Speaker 1>Most of us are not multi jointed in every joint,

0:07:28.720 --> 0:07:30.800
<v Speaker 1>so we can't you know, that we have a limitation

0:07:30.840 --> 0:07:33.080
<v Speaker 1>on how far we can move in certain directions with

0:07:33.120 --> 0:07:36.440
<v Speaker 1>these various joints. So taking that into account, you didn't

0:07:36.480 --> 0:07:40.040
<v Speaker 1>have to have sensors all over the body. You would

0:07:40.240 --> 0:07:41.920
<v Speaker 1>just have them in a few places, which was good

0:07:41.960 --> 0:07:44.520
<v Speaker 1>considering that there were these thick cables attached to the

0:07:44.600 --> 0:07:49.080
<v Speaker 1>sensors and then once you were done moving, then the

0:07:49.480 --> 0:07:52.720
<v Speaker 1>all that data would be captured within the system

0:07:52.800 --> 0:07:56.600
<v Speaker 1>and could then be rendered into animation. Although this was

0:07:56.640 --> 0:07:59.200
<v Speaker 1>also a way that you could do real time animation

0:07:59.320 --> 0:08:02.720
<v Speaker 1>or digital puppetry. Uh, it's not that different from

0:08:02.720 --> 0:08:06.160
<v Speaker 1>controlling a video game character with a controller. It's sort

0:08:06.160 --> 0:08:08.320
<v Speaker 1>of the same principle, except in this case the the

0:08:08.560 --> 0:08:10.800
<v Speaker 1>video game controller. Instead of it being something you hold

0:08:10.800 --> 0:08:15.120
<v Speaker 1>in your hands, it's something you were actually wearing. And uh,

0:08:15.240 --> 0:08:17.600
<v Speaker 1>I've seen plenty of instances of this. If you've ever

0:08:17.600 --> 0:08:21.400
<v Speaker 1>seen Turtle Talk with Crush over at Disney, that's what

0:08:21.480 --> 0:08:23.280
<v Speaker 1>they use. They use a digital you know, they use

0:08:23.320 --> 0:08:27.440
<v Speaker 1>digital puppetry, and it's awesome, by the way. I love that.

0:08:28.360 --> 0:08:32.200
<v Speaker 1>Well it uh, it would also seem that, um, you

0:08:32.200 --> 0:08:36.559
<v Speaker 1>would need to be aware of where those cables were going,

0:08:37.080 --> 0:08:39.280
<v Speaker 1>and it would it would also affect the way that

0:08:39.280 --> 0:08:41.600
<v Speaker 1>you would move. You wouldn't move as naturally if you

0:08:41.640 --> 0:08:44.360
<v Speaker 1>were wearing something like that as if you were, you know,

0:08:44.760 --> 0:08:48.920
<v Speaker 1>unencumbered by by that, which, um, sort of I think

0:08:49.000 --> 0:08:52.400
<v Speaker 1>would lend itself to to an upgrade, which is I

0:08:52.440 --> 0:08:57.839
<v Speaker 1>think why they were so keen on Well, it's also

0:08:58.120 --> 0:09:00.920
<v Speaker 1>also that's very true. It did limit what you could do,

0:09:01.000 --> 0:09:02.720
<v Speaker 1>it could limit, it would limit your movement. I mean,

0:09:02.800 --> 0:09:05.000
<v Speaker 1>you've got these big cables attached to you. You obviously

0:09:05.040 --> 0:09:08.559
<v Speaker 1>you can't just move freely within a space. Um, So

0:09:08.640 --> 0:09:11.679
<v Speaker 1>it did put some limitations on you. There are limitations to

0:09:11.720 --> 0:09:15.560
<v Speaker 1>the optical systems too, but we'll get into that. The

0:09:14.800 --> 0:09:19.320
<v Speaker 1>uh the other problem was that the sampling rate for

0:09:19.400 --> 0:09:22.360
<v Speaker 1>the magnetic systems was not as high as it is

0:09:22.400 --> 0:09:24.800
<v Speaker 1>for optical systems. And by sampling rate, what I mean

0:09:24.880 --> 0:09:28.240
<v Speaker 1>is that this the entire system as a whole, is

0:09:28.280 --> 0:09:33.520
<v Speaker 1>taking little measurements from the sensors of, you know,

0:09:33.559 --> 0:09:37.200
<v Speaker 1>the orientation of those sensors within the space, and it

0:09:37.280 --> 0:09:42.000
<v Speaker 1>does that several times every second. But the sample rate

0:09:42.080 --> 0:09:46.160
<v Speaker 1>of the magnetic motion capture systems was much lower than

0:09:46.240 --> 0:09:49.400
<v Speaker 1>what it would be if

0:09:49.400 --> 0:09:51.480
<v Speaker 1>you were to use an optical system. So you're not

0:09:51.559 --> 0:09:55.520
<v Speaker 1>getting data as frequently. I mean still several times a second,

0:09:55.559 --> 0:09:59.000
<v Speaker 1>but it's not as precise as the optical system. So

0:09:59.520 --> 0:10:01.720
<v Speaker 1>not only were you limited in the kind of movements

0:10:01.760 --> 0:10:05.640
<v Speaker 1>you can make because you had these major cables attached

0:10:05.679 --> 0:10:11.520
<v Speaker 1>to you, but also you couldn't get really minute precise

0:10:12.480 --> 0:10:16.040
<v Speaker 1>measurements on every kind of movement, So it wasn't good

0:10:16.040 --> 0:10:19.480
<v Speaker 1>for things like sports. So, you know, something like throwing

0:10:19.480 --> 0:10:22.200
<v Speaker 1>a pitch in baseball, there are a lot of movements,

0:10:22.200 --> 0:10:25.160
<v Speaker 1>a little tiny motions that are involved in that. I mean,

0:10:25.360 --> 0:10:30.880
<v Speaker 1>anyone who's watched slow motion footage of a professional baseball

0:10:30.920 --> 0:10:33.800
<v Speaker 1>pitcher throwing a pitch, you can see that there are

0:10:33.840 --> 0:10:38.880
<v Speaker 1>some incredibly subtle movements that are involved in that. And uh,

0:10:38.920 --> 0:10:41.960
<v Speaker 1>and it takes place over a very short period of time.

0:10:42.000 --> 0:10:45.600
<v Speaker 1>I mean, it's a very fast thing to to to measure.

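NOTE
A minimal sketch of the sampling-rate point made above, not taken from the episode. The 50-millisecond wrist snap and the 10, 30 and 240 samples-per-second rates are illustrative assumptions, not figures for any real magnetic or optical system. It only shows that a slow sampler can miss most of a fast motion.
```python
import math
#
# Hypothetical fast motion: a wrist snap modelled as one 50 ms half-sine bump.
def wrist_angle(t):
    """Angle in degrees at time t (seconds); the bump peaks at 90 degrees at t = 0.125 s."""
    if 0.1 <= t <= 0.15:
        return 90.0 * math.sin(math.pi * (t - 0.1) / 0.05)
    return 0.0
#
def sample(rate_hz, duration_s=0.3):
    """Measure the angle rate_hz times per second for duration_s seconds."""
    count = round(duration_s * rate_hz) + 1
    return [wrist_angle(i / rate_hz) for i in range(count)]
#
def peak_seen(rate_hz):
    """Largest angle the capture system actually recorded at this sampling rate."""
    return max(sample(rate_hz))
#
# Compare a slow sampler with faster ones on the same motion.
for rate in (10, 30, 240):
    print(rate, "samples per second, peak recorded:", round(peak_seen(rate), 1))
```
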
0:10:46.600 --> 0:10:50.400
<v Speaker 1>Using the magnetic motion capture system, you would probably, one,

0:10:51.160 --> 0:10:53.640
<v Speaker 1>slow the person down because they have all these cables

0:10:53.679 --> 0:10:56.440
<v Speaker 1>attached to them, and two, not get enough data to

0:10:56.920 --> 0:11:00.520
<v Speaker 1>give an accurate representation of what had happened in the

0:11:00.600 --> 0:11:04.480
<v Speaker 1>virtual format. So if you were to say, create a

0:11:04.559 --> 0:11:07.520
<v Speaker 1>video game, a baseball video game, the pitcher would not

0:11:07.679 --> 0:11:13.760
<v Speaker 1>necessarily behave properly if all you did was directly port

0:11:13.800 --> 0:11:17.560
<v Speaker 1>the data you got from the motion capture into the game. Yeah.

0:11:17.559 --> 0:11:20.640
<v Speaker 1>Another drawback of the mechanical systems like that too, is

0:11:20.679 --> 0:11:23.760
<v Speaker 1>that that it's um it's the kind of system that

0:11:23.800 --> 0:11:27.040
<v Speaker 1>not only is cumbersome and inaccurate, but it has to

0:11:27.040 --> 0:11:31.200
<v Speaker 1>be calibrated fairly frequently. UM. And you know, there there's

0:11:31.200 --> 0:11:33.640
<v Speaker 1>a there's some work that you can do with this.

0:11:33.760 --> 0:11:37.920
<v Speaker 1>With the optical systems that they began to introduce UM,

0:11:37.960 --> 0:11:43.040
<v Speaker 1>you know, generally became an upgrade UM. The only there

0:11:43.120 --> 0:11:47.760
<v Speaker 1>is one big advantage that the mechanical systems do have, though,

0:11:47.800 --> 0:11:51.400
<v Speaker 1>and that is that light, the lighting, will not necessarily

0:11:51.400 --> 0:11:54.880
<v Speaker 1>interfere with the different points of motion that are captured

0:11:54.880 --> 0:11:57.880
<v Speaker 1>by the mechanical system UM. And that can be an

0:11:57.880 --> 0:12:01.960
<v Speaker 1>issue with the optical systems UM. You know, because that's

0:12:02.080 --> 0:12:04.640
<v Speaker 1>that's why UM, they will be wearing the people, the

0:12:04.679 --> 0:12:08.560
<v Speaker 1>actors who will be UM having their motions captured by

0:12:08.600 --> 0:12:11.760
<v Speaker 1>the system, will be wearing you know, those bright dots

0:12:11.920 --> 0:12:14.400
<v Speaker 1>so that the computer can pick up on that. And

0:12:14.440 --> 0:12:16.440
<v Speaker 1>at the beginning, in these early systems, there

0:12:16.440 --> 0:12:20.720
<v Speaker 1>were only so many points action points that they could capture. UM.

0:12:20.800 --> 0:12:23.400
<v Speaker 1>They were very limited in what they could do at first,

0:12:23.520 --> 0:12:25.880
<v Speaker 1>but still you know, somewhat of an upgrade over the

0:12:25.920 --> 0:12:30.400
<v Speaker 1>mechanical Yeah. It also limited what you could have in

0:12:30.440 --> 0:12:34.040
<v Speaker 1>the background, obviously, because you could not have anything that

0:12:34.200 --> 0:12:38.079
<v Speaker 1>was going to be of a similar shade. Uh. You know,

0:12:38.160 --> 0:12:42.199
<v Speaker 1>usually you're talking about a reflective white substance used as

0:12:42.240 --> 0:12:47.440
<v Speaker 1>the, um, the points of, uh, articulation. So the

0:12:47.600 --> 0:12:50.600
<v Speaker 1>little white stickers, like what you were saying,

0:12:50.640 --> 0:12:53.199
<v Speaker 1>Chris. Um, you couldn't have anything like that in the

0:12:53.240 --> 0:12:57.840
<v Speaker 1>background because it would confuse the optical system. So that's

0:12:57.880 --> 0:13:01.439
<v Speaker 1>why a lot of these motion capture scenes are shot

0:13:01.440 --> 0:13:04.760
<v Speaker 1>against a blue screen or green screen. It's so that

0:13:05.040 --> 0:13:08.720
<v Speaker 1>the background does not in any way interfere with the

0:13:08.800 --> 0:13:11.240
<v Speaker 1>motion capture. So if you've ever seen behind the scenes

0:13:11.240 --> 0:13:13.280
<v Speaker 1>footage of The Lord of the Rings movies is a

0:13:13.280 --> 0:13:18.120
<v Speaker 1>great example with Andy Serkis as Gollum or Sméagol if

0:13:18.160 --> 0:13:22.480
<v Speaker 1>you prefer, but he's wearing you know, a tight like

0:13:22.760 --> 0:13:27.360
<v Speaker 1>skin tight suit with these little white uh circles all

0:13:27.400 --> 0:13:30.800
<v Speaker 1>over it. Those are the points that the cameras track

0:13:31.280 --> 0:13:36.080
<v Speaker 1>to create the performance of Gollum slash Sméagol. So

0:13:36.480 --> 0:13:39.880
<v Speaker 1>the performance is something that's being created not only by

0:13:40.120 --> 0:13:43.840
<v Speaker 1>the actor but also the animators because not We should

0:13:43.880 --> 0:13:47.240
<v Speaker 1>also point out that the motion capture stuff rarely is

0:13:47.320 --> 0:13:54.120
<v Speaker 1>motion capture uh completely. Uh. There's there's rarely a moment

0:13:54.120 --> 0:13:56.520
<v Speaker 1>where you don't have an animator step in and tweak

0:13:56.559 --> 0:14:01.520
<v Speaker 1>it somehow, like Uh, you don't normally have someone create

0:14:01.559 --> 0:14:06.840
<v Speaker 1>a physical performance and that physical performance is completely without

0:14:06.920 --> 0:14:10.719
<v Speaker 1>any tinkering represented in the final product. I mean it

0:14:11.040 --> 0:14:14.760
<v Speaker 1>can happen, there are instances of it, but it's more frequently, uh,

0:14:14.800 --> 0:14:18.439
<v Speaker 1>something where the motion capture performance goes to the animator,

0:14:18.480 --> 0:14:21.720
<v Speaker 1>who can then tweak things if the performance is not

0:14:21.840 --> 0:14:24.840
<v Speaker 1>exactly what it needs to be, which is kind of nice.

0:14:25.240 --> 0:14:28.480
<v Speaker 1>You don't necessarily have that luxury with flesh and blood actors.

0:14:29.280 --> 0:14:33.000
<v Speaker 1>That's that's true. That's true. Well, especially with the earlier systems,

0:14:33.080 --> 0:14:38.440
<v Speaker 1>especially the electromagnetic systems. Uh, those were really noisy, not

0:14:38.520 --> 0:14:42.120
<v Speaker 1>literally noisy, but digital noise. They weren't

0:14:42.160 --> 0:14:46.240
<v Speaker 1>really highly accurate. Um. The optical systems are are far

0:14:46.440 --> 0:14:49.920
<v Speaker 1>cleaner and give a more accurate representation. But, you know, there,

0:14:50.000 --> 0:14:52.720
<v Speaker 1>that sort of falls in the realm of artistic license.

0:14:52.720 --> 0:14:55.200
<v Speaker 1>I would think, um, where they need to go in

0:14:55.240 --> 0:14:59.000
<v Speaker 1>and make subtle adjustments to make it look the way

0:14:59.000 --> 0:15:01.560
<v Speaker 1>they want it to look. Up. I should also point out,

0:15:01.720 --> 0:15:04.240
<v Speaker 1>now you just reminded me of something else another drawback

0:15:04.280 --> 0:15:07.720
<v Speaker 1>to the electromagnetic systems, which was you couldn't have anything

0:15:07.760 --> 0:15:11.840
<v Speaker 1>metal on the set because it would interfere with that

0:15:11.880 --> 0:15:15.520
<v Speaker 1>magnetic field and give incorrect readings to the system. So

0:15:15.600 --> 0:15:18.880
<v Speaker 1>your virtual character would not move in the same

0:15:18.920 --> 0:15:20.640
<v Speaker 1>way as the physical one because there would be some

0:15:20.680 --> 0:15:24.440
<v Speaker 1>interference in that sense. So your set couldn't have anything

0:15:24.600 --> 0:15:28.640
<v Speaker 1>metal in it. The props shouldn't have anything metal

0:15:28.800 --> 0:15:31.160
<v Speaker 1>in them, so that limited you as well.

0:15:31.240 --> 0:15:34.920
<v Speaker 1>So each system has its own limitations. Getting back to

0:15:34.960 --> 0:15:37.360
<v Speaker 1>the optical one, um, one of the other things you

0:15:37.400 --> 0:15:39.560
<v Speaker 1>have to remember is that in order to really capture

0:15:40.480 --> 0:15:45.560
<v Speaker 1>a physical object moving through 3D space and

0:15:45.680 --> 0:15:51.280
<v Speaker 1>to replicate that in virtual space, you need multiple cameras

0:15:51.440 --> 0:15:55.080
<v Speaker 1>in that system. Because a single camera, assuming that's a

0:15:55.160 --> 0:15:59.480
<v Speaker 1>regular video or film camera, something that does not have

0:15:59.680 --> 0:16:04.680
<v Speaker 1>3D capability, pointing that at an object, it's creating a

0:16:04.800 --> 0:16:08.280
<v Speaker 1>two dimensional image of something that's moving in three dimensions.

0:16:09.120 --> 0:16:14.400
<v Speaker 1>The camera can't necessarily tell where movements are happening within

0:16:14.920 --> 0:16:19.720
<v Speaker 1>the depth of that image. Right. So,

0:16:19.760 --> 0:16:22.360
<v Speaker 1>if someone's moving in such a way where let's say

0:16:22.360 --> 0:16:25.040
<v Speaker 1>they're moving their head where it would be bobbing closer

0:16:25.080 --> 0:16:31.120
<v Speaker 1>to the camera, Uh, unless the size of the the

0:16:31.200 --> 0:16:34.680
<v Speaker 1>sensors is such that something that subtle could be picked

0:16:34.720 --> 0:16:37.640
<v Speaker 1>up by the camera system, you would lose that information.

0:16:38.760 --> 0:16:41.480
<v Speaker 1>So what you need are multiple cameras on the same

0:16:41.600 --> 0:16:45.000
<v Speaker 1>object so that you can compare that data from the

0:16:45.040 --> 0:16:48.560
<v Speaker 1>multiple angles to tell how this object is really moving

0:16:48.600 --> 0:16:51.560
<v Speaker 1>through this three dimensional space. So it's kind of like

0:16:51.840 --> 0:16:55.400
<v Speaker 1>the idea of having parallax with two eyes. You know,

0:16:55.440 --> 0:16:59.040
<v Speaker 1>our eyes are offset, so by looking at an object,

0:16:59.400 --> 0:17:02.120
<v Speaker 1>we can tell how far away it is in part

0:17:02.680 --> 0:17:07.160
<v Speaker 1>because of parallax. Uh. We also have other visual cues

0:17:07.160 --> 0:17:09.119
<v Speaker 1>that tell us about how far something is, you know,

0:17:09.320 --> 0:17:11.880
<v Speaker 1>things like how tall it is in relation to where

0:17:11.880 --> 0:17:13.480
<v Speaker 1>we are that kind of thing, or how tall it

0:17:13.520 --> 0:17:15.400
<v Speaker 1>is in relation to other objects that are within our

0:17:15.480 --> 0:17:19.080
<v Speaker 1>frame of vision. But parallax is very important. Same sort

0:17:19.080 --> 0:17:21.400
<v Speaker 1>of thing. With these optical systems, you would have multiple

0:17:21.440 --> 0:17:25.920
<v Speaker 1>cameras set up to try and capture the information that's

0:17:25.960 --> 0:17:29.280
<v Speaker 1>going on in the frame so that you could tell

0:17:29.320 --> 0:17:33.160
<v Speaker 1>exactly how it's moving through that three dimensional space. Yeah,

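NOTE
A minimal sketch of the multiple-camera idea discussed above, not taken from the episode. It assumes two idealized pinhole cameras mounted side by side and pointing the same way; the function names, the 0.5 m baseline and the 800-pixel focal length are hypothetical. One camera alone cannot tell a near marker from a far one on the same line of sight, while the offset between two views (parallax) recovers the depth.
```python
def project(point, camera_x, focal_px):
    """Horizontal image coordinate of a marker at (x, z) seen by a camera at (camera_x, 0)."""
    x, z = point
    return focal_px * (x - camera_x) / z
#
def triangulate(u_left, u_right, baseline_m, focal_px):
    """Recover (x, z) from the marker's image coordinates in the left and right cameras."""
    disparity = u_left - u_right  # the parallax shift between the two views
    if disparity <= 0:
        raise ValueError("marker must be in front of both cameras")
    z = focal_px * baseline_m / disparity
    x = u_left * z / focal_px  # measured from the left camera, which sits at x = 0
    return x, z
#
# Two different markers that look identical to the left camera alone.
near, far = (0.3, 2.0), (0.6, 4.0)
print(project(near, 0.0, 800.0), project(far, 0.0, 800.0))
# Adding the right camera, 0.5 m to the side, separates them.
print(triangulate(project(near, 0.0, 800.0), project(near, 0.5, 800.0), 0.5, 800.0))
print(triangulate(project(far, 0.0, 800.0), project(far, 0.5, 800.0), 0.5, 800.0))
```
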
0:17:33.240 --> 0:17:37.000
<v Speaker 1>it seems like um. In order to capture the correct perspective,

0:17:37.359 --> 0:17:40.159
<v Speaker 1>you need that additional information, even though you may not

0:17:40.240 --> 0:17:43.879
<v Speaker 1>necessarily see it. UM. It helps the the animator do that,

0:17:44.160 --> 0:17:47.360
<v Speaker 1>and the optical system, too, allows you to work with

0:17:47.760 --> 0:17:51.440
<v Speaker 1>more than one actor um, which was not really an

0:17:51.440 --> 0:17:56.440
<v Speaker 1>option with some of the earlier systems. So in other words,

0:17:56.440 --> 0:17:59.879
<v Speaker 1>you can, although it requires more equipment, you know, just

0:18:00.000 --> 0:18:03.840
<v Speaker 1>simply out of necessity, the optical system is really affording

0:18:04.080 --> 0:18:08.280
<v Speaker 1>the animators an opportunity to use a greater amount

0:18:08.320 --> 0:18:12.520
<v Speaker 1>of information um both you know, from the different the

0:18:12.600 --> 0:18:16.280
<v Speaker 1>different points of data they're getting from a single actor,

0:18:16.680 --> 0:18:20.479
<v Speaker 1>but from multiple actors on the set simultaneously, which enables

0:18:20.520 --> 0:18:24.240
<v Speaker 1>them to create more complex work. Right, and, uh,

0:18:24.400 --> 0:18:27.919
<v Speaker 1>This also gives us a good example of how the

0:18:27.960 --> 0:18:32.720
<v Speaker 1>optical motion capture systems are a passive system because you

0:18:32.760 --> 0:18:35.159
<v Speaker 1>have these sensors you're wearing that are not necessarily or

0:18:35.160 --> 0:18:39.000
<v Speaker 1>not even sensors. They're they're reflective markers that you're wearing.

0:18:39.040 --> 0:18:42.880
<v Speaker 1>They aren't connected to any sort of electronic components at all,

0:18:43.320 --> 0:18:47.960
<v Speaker 1>versus the active systems like the electromagnetic one, where you

0:18:48.040 --> 0:18:52.560
<v Speaker 1>are generating data by moving through a magnetic field and

0:18:52.600 --> 0:18:55.600
<v Speaker 1>you have these big cables attached to it. Uh. With

0:18:55.640 --> 0:18:58.520
<v Speaker 1>the optical motion capture systems. Another thing that's kind of interesting,

0:18:58.600 --> 0:19:00.880
<v Speaker 1>I think is that, in a lot of, at least, the early ones,

0:19:01.320 --> 0:19:06.520
<v Speaker 1>the cameras would have infrared LEDs, uh, so emitters,

0:19:06.560 --> 0:19:10.959
<v Speaker 1>really, that were emitting infrared light. That's outside our

0:19:11.040 --> 0:19:15.160
<v Speaker 1>visible spectrum. We cannot see infrared light. But by putting

0:19:15.200 --> 0:19:18.320
<v Speaker 1>an infrared filter on the camera, you could have the

0:19:18.359 --> 0:19:21.879
<v Speaker 1>camera pick up reflections of infrared light. And that was

0:19:21.920 --> 0:19:25.879
<v Speaker 1>a way of helping to identify the sensors that you

0:19:25.960 --> 0:19:29.240
<v Speaker 1>had put on the actor. The sensors would

0:19:29.280 --> 0:19:33.240
<v Speaker 1>be reflective specifically so that the infrared light would reflect

0:19:33.280 --> 0:19:37.080
<v Speaker 1>back toward the camera and give the most accurate rendering

0:19:37.160 --> 0:19:41.000
<v Speaker 1>of what's going on at any given moment within a scene.

0:19:41.560 --> 0:19:45.520
<v Speaker 1>So um, yeah, it's another way of making sure that

0:19:45.680 --> 0:19:48.560
<v Speaker 1>the data being captured is as precise as possible. I

0:19:48.560 --> 0:19:50.200
<v Speaker 1>mean that is, of course, the goal is to try

0:19:50.200 --> 0:19:55.240
<v Speaker 1>and recreate the physical movements as truthfully as you possibly

0:19:55.320 --> 0:19:59.880
<v Speaker 1>can given all the limitations involved. Yeah, and if you're

0:20:00.000 --> 0:20:04.600
<v Speaker 1>looking for a real-life, easy-to-find example

0:20:04.680 --> 0:20:08.679
<v Speaker 1>of this, you would look no farther than your local

0:20:08.760 --> 0:20:13.879
<v Speaker 1>video game store. Um, because the Xbox Kinect, uh, uses

0:20:14.000 --> 0:20:18.119
<v Speaker 1>very much that that exact uh form of technology. It

0:20:18.320 --> 0:20:21.480
<v Speaker 1>is using an infrared emitter, um, and it has cameras

0:20:21.520 --> 0:20:24.240
<v Speaker 1>that it uses to pick it up uh, the information

0:20:24.440 --> 0:20:27.040
<v Speaker 1>pick the information up that is coming back from what

0:20:27.200 --> 0:20:29.720
<v Speaker 1>is being reflected around the room, and anybody who who

0:20:29.760 --> 0:20:33.080
<v Speaker 1>has one is also aware that lighting is very much

0:20:33.119 --> 0:20:36.040
<v Speaker 1>an issue. Um, the way that your room is lit

0:20:36.119 --> 0:20:39.160
<v Speaker 1>affects the information that the Kinect is able to relay

0:20:39.200 --> 0:20:42.200
<v Speaker 1>to the Xbox. Now, it's not, while it is sophisticated,

0:20:42.280 --> 0:20:44.399
<v Speaker 1>is not as sophisticated as the kind of equipment that

0:20:44.440 --> 0:20:47.159
<v Speaker 1>they might use in making a movie or making a

0:20:47.280 --> 0:20:51.000
<v Speaker 1>video game. But it is very very similar technology, and

0:20:51.000 --> 0:20:53.960
<v Speaker 1>in some ways I would argue that it's more sophisticated

0:20:53.960 --> 0:20:57.080
<v Speaker 1>than some of those early, uh, systems simply because it

0:20:57.160 --> 0:21:00.800
<v Speaker 1>is able to capture a lot of information uh, Whereas

0:21:00.880 --> 0:21:03.560
<v Speaker 1>you know, the very early optical systems were only using

0:21:03.760 --> 0:21:08.480
<v Speaker 1>a handful of data points. So um, it's it's a

0:21:08.520 --> 0:21:11.800
<v Speaker 1>pretty neat device. Um, you know, not only used for gaming.

0:21:11.800 --> 0:21:13.679
<v Speaker 1>Now the hacker community has fallen in love with it

0:21:13.720 --> 0:21:16.280
<v Speaker 1>too because it can do so much and can be

0:21:16.359 --> 0:21:20.000
<v Speaker 1>used for so many things and is you know, fairly inexpensive. Yeah.

0:21:20.000 --> 0:21:22.680
<v Speaker 1>The cool thing about the Kinect is that rather than

0:21:22.720 --> 0:21:25.480
<v Speaker 1>have to obviously, if you've if you've ever played an

0:21:25.640 --> 0:21:27.959
<v Speaker 1>Xbox with the Kinect, you know, you don't have to

0:21:28.040 --> 0:21:31.000
<v Speaker 1>go out and buy a snug body suit covered in

0:21:31.040 --> 0:21:33.800
<v Speaker 1>reflective markers in order to play. I mean, it doesn't hurt,

0:21:34.600 --> 0:21:36.720
<v Speaker 1>but uh, you know, if you're if you can pull

0:21:36.760 --> 0:21:39.000
<v Speaker 1>that look off. There are very few of us who can.

0:21:39.280 --> 0:21:42.560
<v Speaker 1>I count myself among them. But you don't have to

0:21:42.600 --> 0:21:45.080
<v Speaker 1>do that because what it's doing is it's actually projecting

0:21:45.200 --> 0:21:50.120
<v Speaker 1>essentially a grid, uh in infrared light, so you can't

0:21:50.160 --> 0:21:52.880
<v Speaker 1>see the grid, but it's being projected into the room.

0:21:53.040 --> 0:21:56.760
<v Speaker 1>And then when you move uh within the space, you

0:21:56.800 --> 0:21:59.440
<v Speaker 1>are deforming that grid. You know, the camera that's picking

0:21:59.520 --> 0:22:04.560
<v Speaker 1>up the the reflections of that infrared light can detect

0:22:04.600 --> 0:22:08.359
<v Speaker 1>when the grid's being deformed by a physical object interrupting

0:22:08.400 --> 0:22:11.440
<v Speaker 1>the grid. So as you move, you interrupt different parts

0:22:11.480 --> 0:22:13.800
<v Speaker 1>of the grid, and it can start to interpret those

0:22:13.880 --> 0:22:19.600
<v Speaker 1>as motions and commands. It's not, uh, it's not as

0:22:19.640 --> 0:22:22.800
<v Speaker 1>precise as what we're talking about with the optical systems

0:22:22.840 --> 0:22:25.560
<v Speaker 1>that are used in movies and video games, uh, to

0:22:25.560 --> 0:22:29.200
<v Speaker 1>to create them, that is, not to to play them. Um,

0:22:29.240 --> 0:22:32.200
<v Speaker 1>it's not as precise as those, but it also has

0:22:32.280 --> 0:22:35.520
<v Speaker 1>other elements that help balance it out, Like it has

0:22:36.200 --> 0:22:42.120
<v Speaker 1>regular optical cameras that can have some other software that

0:22:42.520 --> 0:22:46.640
<v Speaker 1>aids it in recognizing things like facial recognition software, which

0:22:46.640 --> 0:22:50.960
<v Speaker 1>does not necessarily rely upon that infrared grid. It relies

0:22:51.000 --> 0:22:55.680
<v Speaker 1>more on the traditional camera functions, but has the software

0:22:55.720 --> 0:23:00.280
<v Speaker 1>included that lets the programs within recognize who is

0:23:00.320 --> 0:23:04.280
<v Speaker 1>standing in front of it, so that combination, uh increases

0:23:04.320 --> 0:23:06.560
<v Speaker 1>the precision, which of course is very important whenever you're

0:23:06.560 --> 0:23:09.120
<v Speaker 1>playing a game. I mean, anyone who's played any sort

0:23:09.200 --> 0:23:12.560
<v Speaker 1>of game where you're using a faulty controller, or it's

0:23:12.600 --> 0:23:16.479
<v Speaker 1>just a system that hasn't been fully uh it's not

0:23:16.520 --> 0:23:19.080
<v Speaker 1>finished yet, it's just in prototype stage or whatever, you

0:23:19.119 --> 0:23:22.359
<v Speaker 1>may have noticed that it could be very frustrating to

0:23:22.480 --> 0:23:26.399
<v Speaker 1>try and control something where the actual controller is not

0:23:27.440 --> 0:23:31.239
<v Speaker 1>as responsive as you would hope. It's um not a

0:23:31.280 --> 0:23:34.880
<v Speaker 1>fun experience. But anyway, that is kind of related to

0:23:34.960 --> 0:23:41.560
<v Speaker 1>this whole motion capture technology. UM, I'm sorry what you

0:23:41.320 --> 0:23:43.639
<v Speaker 1>were... You look like you have something to say. Well, no,

0:23:43.760 --> 0:23:46.320
<v Speaker 1>I was, I was going to say that. Um, you know,

0:23:46.359 --> 0:23:51.240
<v Speaker 1>we really hadn't other than my earlier statement about sports. UM,

0:23:51.240 --> 0:23:52.879
<v Speaker 1>you know, we've we've been talking about it in an

0:23:53.040 --> 0:23:59.120
<v Speaker 1>entertainment context, about the ability to capture motion

0:23:59.200 --> 0:24:02.800
<v Speaker 1>to make characters more realistic. And um, that

0:24:02.800 --> 0:24:05.479
<v Speaker 1>that is exactly what they want to do when they

0:24:05.480 --> 0:24:08.399
<v Speaker 1>are using this in sports medicine. UM. Jonathan alluded to

0:24:08.480 --> 0:24:13.119
<v Speaker 1>earlier the difficulty in, uh, capturing all the little

0:24:13.160 --> 0:24:18.200
<v Speaker 1>subtle motions that go into UM into a Major League

0:24:18.200 --> 0:24:22.080
<v Speaker 1>baseball player's pitching. Um. And you know when somebody, when

0:24:22.080 --> 0:24:27.359
<v Speaker 1>somebody gets hurt, UM, sometimes they go through uh extensive surgery.

0:24:27.920 --> 0:24:31.399
<v Speaker 1>The Tommy John procedure is, uh, famous. You know, they

0:24:31.600 --> 0:24:35.720
<v Speaker 1>do a ligament transplant to help rebuild a pitcher's elbow,

0:24:36.119 --> 0:24:39.119
<v Speaker 1>and that can really throw off, um, the mechanics of

0:24:39.119 --> 0:24:42.680
<v Speaker 1>a pitcher's motion. So they use this motion capture technology

0:24:42.680 --> 0:24:47.280
<v Speaker 1>to really get an idea of how, UM, how that

0:24:47.359 --> 0:24:50.520
<v Speaker 1>person is throwing, going about the mechanics of their

0:24:50.560 --> 0:24:53.560
<v Speaker 1>typical game play. And and that's exactly the same kind

0:24:53.600 --> 0:24:55.440
<v Speaker 1>of thing that they're doing when they create these very

0:24:55.840 --> 0:24:59.960
<v Speaker 1>realistic sports games. UM. But you know, in this case,

0:25:00.000 --> 0:25:02.680
<v Speaker 1>they're using it for sports medicine to see if they can, UH,

0:25:02.720 --> 0:25:06.439
<v Speaker 1>they can go back and recreate some of the motions

0:25:06.440 --> 0:25:10.400
<v Speaker 1>that made them so successful before they were injured. Now, UM, ironically,

0:25:10.600 --> 0:25:17.560
<v Speaker 1>in in UH entertainment purposes, especially video, UM, you can

0:25:17.720 --> 0:25:22.400
<v Speaker 1>get too realistic. Um. The Japanese professor Masahiro Mori

0:25:22.600 --> 0:25:26.800
<v Speaker 1>is famous for his Uncanny Valley, um, which has been

0:25:26.920 --> 0:25:31.760
<v Speaker 1>used as a robotics term for a robot that

0:25:31.800 --> 0:25:35.080
<v Speaker 1>looks so much and moves so much like a human

0:25:35.520 --> 0:25:39.000
<v Speaker 1>that it it creeps us out. It looks a little

0:25:39.200 --> 0:25:41.919
<v Speaker 1>too realistic. And I can think of we're actually recording

0:25:41.920 --> 0:25:45.320
<v Speaker 1>this in December of and um. One of the movies

0:25:45.359 --> 0:25:46.960
<v Speaker 1>that comes on about this time of year is The

0:25:47.000 --> 0:25:54.480
<v Speaker 1>Polar Express, which is known, loved and reviled both for

0:25:54.560 --> 0:25:57.560
<v Speaker 1>its story and its, um, and the way that they

0:25:57.640 --> 0:25:59.720
<v Speaker 1>use motion capture, because the characters in there are so

0:25:59.800 --> 0:26:02.760
<v Speaker 1>realistic they're downright creepy. Yeah, it's it's one of those

0:26:02.760 --> 0:26:07.560
<v Speaker 1>things where they are almost but not quite able to

0:26:07.640 --> 0:26:12.359
<v Speaker 1>pass for a real person, so that there's just enough

0:26:12.640 --> 0:26:17.680
<v Speaker 1>off about them to be unsettling. Now, this does bring

0:26:17.760 --> 0:26:20.200
<v Speaker 1>up something else that's kind of interesting. We have an

0:26:20.280 --> 0:26:25.000
<v Speaker 1>article on how stuff works dot com about MotionScan technology,

0:26:25.080 --> 0:26:28.800
<v Speaker 1>which is, as I said earlier, a proprietary technology. It's

0:26:29.480 --> 0:26:33.840
<v Speaker 1>it's more specific than just motion capture. It's specifically meant

0:26:34.080 --> 0:26:40.240
<v Speaker 1>to capture facial motion activity. So when an actor is speaking,

0:26:40.280 --> 0:26:44.200
<v Speaker 1>when they're delivering lines, the way that they furrow their

0:26:44.280 --> 0:26:49.760
<v Speaker 1>brow or move their eyes or smile, or they give

0:26:49.760 --> 0:26:54.280
<v Speaker 1>a facial take, anything like that. This system is designed

0:26:54.320 --> 0:26:57.520
<v Speaker 1>to pick that up so that it can be recreated

0:26:57.760 --> 0:27:01.200
<v Speaker 1>virtually in a game, and it was used to great effect,

0:27:01.240 --> 0:27:06.480
<v Speaker 1>in my opinion, in L.A. Noire. L.A. Noire was

0:27:06.600 --> 0:27:08.600
<v Speaker 1>a video game that came out in two thousand eleven,

0:27:08.760 --> 0:27:12.200
<v Speaker 1>and it was a game in which you played a well,

0:27:12.280 --> 0:27:14.520
<v Speaker 1>you played a couple of different characters, but the one

0:27:14.600 --> 0:27:18.680
<v Speaker 1>you played for most of the game spoiler alert was

0:27:18.680 --> 0:27:22.200
<v Speaker 1>was a a police detective. And you're kind of rising

0:27:22.240 --> 0:27:26.359
<v Speaker 1>through the ranks, uh, in L.A., uh, during the

0:27:26.600 --> 0:27:33.320
<v Speaker 1>uh early part of the twentieth century. And it's it's um,

0:27:33.359 --> 0:27:38.640
<v Speaker 1>it's notable in that you are, uh, you're spending most

0:27:38.680 --> 0:27:42.679
<v Speaker 1>of the game looking at people's reactions. You know. The

0:27:42.760 --> 0:27:45.800
<v Speaker 1>idea behind L.A. Noire, it was a new type

0:27:45.800 --> 0:27:51.199
<v Speaker 1>of video game where you would interrogate characters throughout your investigations,

0:27:51.480 --> 0:27:54.199
<v Speaker 1>and as you interrogate them, you had to watch the

0:27:54.320 --> 0:27:58.239
<v Speaker 1>characters' facial reactions to kind of get an idea of

0:27:58.280 --> 0:28:00.720
<v Speaker 1>whether the character was trying to be evasive or

0:28:00.760 --> 0:28:03.520
<v Speaker 1>if they were telling the truth. And you would do

0:28:03.600 --> 0:28:06.359
<v Speaker 1>things like watch for their eyes and if they weren't

0:28:06.359 --> 0:28:09.439
<v Speaker 1>able to maintain eye contact, that was an indication that

0:28:09.480 --> 0:28:13.119
<v Speaker 1>perhaps they were being less than truthful. Or if they would,

0:28:13.359 --> 0:28:16.320
<v Speaker 1>you know, twitch their mouth or clench their jaw, these

0:28:16.359 --> 0:28:20.800
<v Speaker 1>would be little little hints that perhaps there's more going

0:28:20.920 --> 0:28:25.600
<v Speaker 1>on than what they're letting on. And obviously, if your

0:28:25.640 --> 0:28:29.240
<v Speaker 1>gameplay depends upon trying to determine whether or not a

0:28:29.440 --> 0:28:32.359
<v Speaker 1>virtual character is telling the truth, you have to be

0:28:32.400 --> 0:28:38.400
<v Speaker 1>able to represent those facial expressions as closely to reality

0:28:38.440 --> 0:28:42.000
<v Speaker 1>as possible, or else the game does not work. So

0:28:42.200 --> 0:28:45.400
<v Speaker 1>they used this MotionScan technology, and the way that

0:28:45.640 --> 0:28:49.760
<v Speaker 1>they did this was that they had a very brightly

0:28:49.920 --> 0:28:54.520
<v Speaker 1>lit studio that had lights trained on an actor from

0:28:54.560 --> 0:28:57.160
<v Speaker 1>just about every angle and the purpose of that was

0:28:57.200 --> 0:29:00.320
<v Speaker 1>to try and eliminate shadows, because any sort of shadows

0:29:00.320 --> 0:29:03.280
<v Speaker 1>you would have there would of course affect the actual capture.

0:29:03.960 --> 0:29:06.760
<v Speaker 1>It was really all about the light. And they used

0:29:07.160 --> 0:29:11.760
<v Speaker 1>thirty two high definition cameras. So think about that, thirty

0:29:11.760 --> 0:29:15.560
<v Speaker 1>two high definition cameras just to capture an actor's facial

0:29:15.680 --> 0:29:18.760
<v Speaker 1>performance. Like, that's it. There's no other movement. The

0:29:18.920 --> 0:29:22.680
<v Speaker 1>actor is seated at the time and um and had

0:29:22.720 --> 0:29:26.120
<v Speaker 1>to remain as still as possible and just do all

0:29:26.160 --> 0:29:29.880
<v Speaker 1>the acting with their face, which for anyone out there

0:29:29.880 --> 0:29:34.560
<v Speaker 1>who's done any sort of acting, you know, that's incredibly

0:29:34.680 --> 0:29:39.200
<v Speaker 1>challenging because actors are trained to use their whole body

0:29:39.640 --> 0:29:42.280
<v Speaker 1>when they are making a performance. They're trained to

0:29:42.760 --> 0:29:46.000
<v Speaker 1>to really think about movement. I mean, if you're if

0:29:46.040 --> 0:29:48.880
<v Speaker 1>you're really serious about acting, you've probably taken movement classes.

0:29:49.240 --> 0:29:51.920
<v Speaker 1>And to suddenly have all of that taken away and

0:29:52.080 --> 0:29:54.960
<v Speaker 1>all of your acting is restricted to just your face,

0:29:55.480 --> 0:29:59.480
<v Speaker 1>it's pretty that's pretty dramatic. It's tough to do, but anyway,

0:29:59.520 --> 0:30:01.080
<v Speaker 1>that's what the actors had to do. They had to

0:30:01.080 --> 0:30:05.120
<v Speaker 1>sit down and restrict their acting to just their facial

0:30:05.960 --> 0:30:09.880
<v Speaker 1>expressions without it going like over the top crazy, because

0:30:09.920 --> 0:30:13.160
<v Speaker 1>that would be just as distracting as not enough performance

0:30:13.160 --> 0:30:17.400
<v Speaker 1>at all. And these thirty two cameras were paired up,

0:30:17.520 --> 0:30:20.880
<v Speaker 1>so sixteen pairs of cameras. There's technically there was a

0:30:20.920 --> 0:30:23.760
<v Speaker 1>thirty third camera as well that the director used to

0:30:23.880 --> 0:30:28.400
<v Speaker 1>watch the scene and give directions to the actors um

0:30:28.480 --> 0:30:32.080
<v Speaker 1>but these these pairs of cameras were trained on all

0:30:32.080 --> 0:30:34.880
<v Speaker 1>these different angles of the face in order to capture

0:30:35.000 --> 0:30:39.200
<v Speaker 1>that that performance so that in the virtual world they

0:30:39.200 --> 0:30:43.520
<v Speaker 1>could recreate it accurately, which to me is phenomenal. And

0:30:43.640 --> 0:30:47.720
<v Speaker 1>apparently the way the system works is you get that

0:30:47.920 --> 0:30:54.280
<v Speaker 1>virtual version of the person's face and head almost instantly,

0:30:54.720 --> 0:30:59.600
<v Speaker 1>which is kind of creepy but also awesome. It's it's

0:30:59.640 --> 0:31:02.959
<v Speaker 1>funny too that uh they used that many cameras in

0:31:03.160 --> 0:31:07.240
<v Speaker 1>the creation of a video game, because uh, elsewhere in

0:31:07.240 --> 0:31:11.640
<v Speaker 1>that article, it notes that, um, Serkis, who was playing

0:31:11.760 --> 0:31:20.320
<v Speaker 1>Gollum um only had only had cameras on on him,

0:31:20.360 --> 0:31:23.080
<v Speaker 1>but in doing so, they were able to, uh to

0:31:23.960 --> 0:31:28.320
<v Speaker 1>create roughly, you know, ten thousand different kinds or identify

0:31:28.440 --> 0:31:31.160
<v Speaker 1>ten thousand different kinds of facial movements that they could

0:31:31.280 --> 0:31:37.200
<v Speaker 1>use in in animating the character on screen. So um, clearly, uh,

0:31:37.240 --> 0:31:41.600
<v Speaker 1>you know, this is very very high tech and painstaking

0:31:41.600 --> 0:31:44.760
<v Speaker 1>procedure to do, but in doing so they can they

0:31:44.800 --> 0:31:47.360
<v Speaker 1>can create very very realistic movements. Yeah, there's a lot

0:31:47.400 --> 0:31:52.240
<v Speaker 1>of number crunching involved, and frankly, the the part that

0:31:52.360 --> 0:31:56.400
<v Speaker 1>takes place after you've captured the data can be

0:31:56.480 --> 0:31:59.840
<v Speaker 1>dramatically different from one case to the next. In some cases,

0:32:00.120 --> 0:32:06.200
<v Speaker 1>you may have already created, uh, an animated figure pretty much

0:32:06.240 --> 0:32:08.480
<v Speaker 1>from start to finish, you might not have completely put

0:32:08.560 --> 0:32:13.000
<v Speaker 1>textures on it or or something. But you might have

0:32:13.400 --> 0:32:16.480
<v Speaker 1>essentially the way the character is going to look in

0:32:16.520 --> 0:32:21.000
<v Speaker 1>the finished product, uh, and then you just map it

0:32:21.080 --> 0:32:23.800
<v Speaker 1>to the movements that you've captured and it's and there

0:32:23.800 --> 0:32:26.720
<v Speaker 1>it goes. And in other cases you might see that

0:32:26.880 --> 0:32:29.600
<v Speaker 1>what they do is they capture the motions and then

0:32:29.640 --> 0:32:33.560
<v Speaker 1>you essentially have what looks like a very primitive stick

0:32:33.600 --> 0:32:37.200
<v Speaker 1>figure skeleton that moves in the way that the actor moved,

0:32:37.520 --> 0:32:41.040
<v Speaker 1>but there's no definition, there's no character there yet. And

0:32:41.520 --> 0:32:44.480
<v Speaker 1>you may have animators who build the character somewhat based

0:32:44.560 --> 0:32:47.280
<v Speaker 1>upon the way the actor moved through the space, so

0:32:47.360 --> 0:32:51.040
<v Speaker 1>that perhaps the character's design is not finalized until you've

0:32:51.080 --> 0:32:54.479
<v Speaker 1>captured that that performance, and the performance helps guide the

0:32:54.520 --> 0:32:58.680
<v Speaker 1>design of the character. It all depends on the specific

0:32:58.720 --> 0:33:03.080
<v Speaker 1>technology that's being used and the preference of the crew

0:33:03.200 --> 0:33:05.880
<v Speaker 1>that's that's designing whatever it is that they're making, whether

0:33:05.920 --> 0:33:08.880
<v Speaker 1>it's a video game or movie, TV show, commercial, whatever

0:33:08.920 --> 0:33:12.840
<v Speaker 1>it happens to be. UH. In the case of digital puppetry,

0:33:12.880 --> 0:33:16.720
<v Speaker 1>obviously you would already have the the full character realized,

0:33:16.880 --> 0:33:21.280
<v Speaker 1>so that just by using whatever control mechanism happens to

0:33:21.360 --> 0:33:25.000
<v Speaker 1>be there, you would be able to make the puppet

0:33:25.000 --> 0:33:29.320
<v Speaker 1>move in real time, otherwise it's not really puppetry. Um.

0:33:29.400 --> 0:33:31.440
<v Speaker 1>And again that's sort of like the if you've been

0:33:31.440 --> 0:33:33.880
<v Speaker 1>to that Turtle Talk thing I talked about, at

0:33:34.760 --> 0:33:38.400
<v Speaker 1>Disney World or Disneyland. Um. I'm sure there are other

0:33:38.560 --> 0:33:42.040
<v Speaker 1>similar ones. I think Monsters Inc. Laugh Factory has a

0:33:42.080 --> 0:33:45.720
<v Speaker 1>similar setup where you've got a digital character on a

0:33:45.800 --> 0:33:49.120
<v Speaker 1>screen that can react in real time to things that

0:33:49.120 --> 0:33:52.600
<v Speaker 1>are happening within the physical environment. So they interact with

0:33:52.600 --> 0:33:55.640
<v Speaker 1>the audience like they'll specifically single people out and chat

0:33:55.720 --> 0:33:59.680
<v Speaker 1>with people in the audience. And, um, to kids,

0:33:59.720 --> 0:34:02.360
<v Speaker 1>this is amazing. I mean, the cartoon character acting in

0:34:02.440 --> 0:34:05.840
<v Speaker 1>real time, it's a real person. Now, uh, to adults,

0:34:05.840 --> 0:34:10.719
<v Speaker 1>it's fascinating because they're like, how the heck did that happen? Um,

0:34:10.760 --> 0:34:13.319
<v Speaker 1>But yeah, that's it's all based on this same sort

0:34:13.360 --> 0:34:17.160
<v Speaker 1>of technology. Um, and it's really interesting to me

0:34:17.360 --> 0:34:21.880
<v Speaker 1>to see how the field is evolving over time, because

0:34:21.880 --> 0:34:25.120
<v Speaker 1>things like the Kinect show that we are adapting the

0:34:25.239 --> 0:34:27.880
<v Speaker 1>same sort of technology in different ways. We're using different

0:34:27.880 --> 0:34:31.840
<v Speaker 1>implementations to essentially do the same thing, and that perhaps

0:34:32.000 --> 0:34:35.560
<v Speaker 1>we will get to a point where we won't have

0:34:35.680 --> 0:34:40.239
<v Speaker 1>to worry about all the sensors so much. Um, you

0:34:40.280 --> 0:34:44.799
<v Speaker 1>can maybe have an actor who's not completely coated in

0:34:44.880 --> 0:34:48.560
<v Speaker 1>stickers perform and and you could capture all that data

0:34:48.680 --> 0:34:52.240
<v Speaker 1>without having to worry about, you know, tracking these little dots.

0:34:52.719 --> 0:34:54.520
<v Speaker 1>That might be something that we see in the future.

0:34:54.560 --> 0:34:56.080
<v Speaker 1>I mean, MotionScan is kind of like that,

0:34:56.120 --> 0:35:03.400
<v Speaker 1>because before MotionScan, with that facial acting, uh, technology, uh,

0:35:03.440 --> 0:35:07.200
<v Speaker 1>whenever I saw anyone who was having their face tracked

0:35:07.239 --> 0:35:11.040
<v Speaker 1>for a performance, they always were wearing those tiny little

0:35:11.120 --> 0:35:14.200
<v Speaker 1>white stickers all over their face to track. I mean,

0:35:14.200 --> 0:35:15.759
<v Speaker 1>we've got a lot of muscles in our face. There's

0:35:15.760 --> 0:35:18.320
<v Speaker 1>something like nineteen muscles or something that you have to track,

0:35:18.400 --> 0:35:21.440
<v Speaker 1>so um, you would have all these little dots on

0:35:21.480 --> 0:35:23.600
<v Speaker 1>your face to track those motions. Well, with MotionScan

0:35:23.680 --> 0:35:27.200
<v Speaker 1>you don't need those anymore. So maybe we'll see something

0:35:27.239 --> 0:35:30.240
<v Speaker 1>like that. Of course, that would really depend upon perhaps

0:35:30.320 --> 0:35:34.560
<v Speaker 1>the lighting, which could if you're shooting a virtual character

0:35:34.640 --> 0:35:36.600
<v Speaker 1>that's next to real characters like in The Lord of

0:35:36.640 --> 0:35:40.560
<v Speaker 1>the Rings, real being I guess you know, your mileage

0:35:40.600 --> 0:35:43.680
<v Speaker 1>may vary. I mean, they're hobbits, but anyway, when you're

0:35:43.719 --> 0:35:46.239
<v Speaker 1>next to real people, clearly you can't mess with the

0:35:46.360 --> 0:35:48.960
<v Speaker 1>lighting too much or it'll just make the whole scene

0:35:49.000 --> 0:35:54.760
<v Speaker 1>look strange. Speaking of strange, UM, well, you might think

0:35:54.840 --> 0:36:00.640
<v Speaker 1>that the techniques used in motion capture uh, um, you know,

0:36:00.680 --> 0:36:03.239
<v Speaker 1>bringing film into a uh you know, adding a lot

0:36:03.239 --> 0:36:08.000
<v Speaker 1>of advancement to to film. Um basically uh, some people

0:36:08.120 --> 0:36:12.720
<v Speaker 1>sort of regard this as cheating. Yeah. I did some research

0:36:12.840 --> 0:36:17.319
<v Speaker 1>that that indicated that, um, although some other types of

0:36:17.360 --> 0:36:22.320
<v Speaker 1>animation are considered you know, considered more artful, UM, motion

0:36:22.360 --> 0:36:26.840
<v Speaker 1>capture is sort of not everyone. But some people say, well,

0:36:26.880 --> 0:36:29.200
<v Speaker 1>you know it's it's not oscar worthy because you were

0:36:29.280 --> 0:36:34.840
<v Speaker 1>using these computer-aided animation techniques that really, um,

0:36:35.520 --> 0:36:38.600
<v Speaker 1>are simulating human motion, and it's just it's just not real.

0:36:39.080 --> 0:36:42.160
<v Speaker 1>And uh. The argument that I've seen used against it is, well,

0:36:42.320 --> 0:36:47.560
<v Speaker 1>you consider rotoscoping okay; why don't you consider motion capture,

0:36:47.560 --> 0:36:51.479
<v Speaker 1>which is a kind of descendant from this technology. Why

0:36:51.520 --> 0:36:54.360
<v Speaker 1>why isn't that okay to uh, you know, to consider

0:36:54.440 --> 0:36:59.520
<v Speaker 1>for, um, quality and for awards. But um,

0:36:59.560 --> 0:37:03.680
<v Speaker 1>apparently it's a it's sort of a hot topic among um,

0:37:03.719 --> 0:37:07.680
<v Speaker 1>among movie makers. Yeah. I can see where an animator, a

0:37:07.680 --> 0:37:11.799
<v Speaker 1>traditional animator, or even a computer animator. I mean that's

0:37:12.280 --> 0:37:15.719
<v Speaker 1>closer and closer to becoming traditional already but either hand

0:37:15.800 --> 0:37:19.200
<v Speaker 1>drawn animation or computer animation. Someone who goes through the

0:37:19.200 --> 0:37:22.200
<v Speaker 1>trouble of animating these things and doing a lot of

0:37:22.200 --> 0:37:26.520
<v Speaker 1>this work. Uh, by hand seems like it's the wrong term,

0:37:26.560 --> 0:37:30.440
<v Speaker 1>but but personally going through and creating these performances, I

0:37:30.480 --> 0:37:33.520
<v Speaker 1>can see where they might feel that way. Um, I

0:37:33.560 --> 0:37:35.520
<v Speaker 1>have a completely different perspective on it. Of course, I'm

0:37:35.520 --> 0:37:37.960
<v Speaker 1>not an animator, so that's part of it, but I

0:37:38.000 --> 0:37:40.520
<v Speaker 1>think of it as creating a performance. And in the

0:37:40.560 --> 0:37:42.840
<v Speaker 1>sense of creating a performance, I think it's a completely

0:37:42.920 --> 0:37:48.880
<v Speaker 1>legitimate tool because you're still relying on an actor to

0:37:49.280 --> 0:37:54.319
<v Speaker 1>create a performance that that that people will relate to,

0:37:54.400 --> 0:37:58.160
<v Speaker 1>whether it's a character that you're supposed to love or

0:37:58.239 --> 0:38:03.800
<v Speaker 1>hate or fear, that all is dependent upon the animator

0:38:03.840 --> 0:38:08.320
<v Speaker 1>and the actor and several other people working to create

0:38:08.400 --> 0:38:13.320
<v Speaker 1>this this performance. And uh, I don't see anything wrong

0:38:13.360 --> 0:38:16.960
<v Speaker 1>with that. That to me is a completely legitimate form

0:38:17.080 --> 0:38:22.399
<v Speaker 1>of creating the art of entertainment. So um, I mean,

0:38:22.440 --> 0:38:25.839
<v Speaker 1>I do understand from an artistic perspective where some people

0:38:25.880 --> 0:38:27.560
<v Speaker 1>could have a problem with it. But but if you

0:38:27.680 --> 0:38:30.480
<v Speaker 1>take a bigger picture look and not not just you

0:38:30.520 --> 0:38:33.680
<v Speaker 1>know what technique you're using, but the end goal of creating,

0:38:34.600 --> 0:38:36.520
<v Speaker 1>whether you want to call it art or not, but

0:38:36.719 --> 0:38:40.920
<v Speaker 1>creating something that has an impact to the viewer or

0:38:41.040 --> 0:38:43.880
<v Speaker 1>player in the case of a video game, I think

0:38:43.960 --> 0:38:47.960
<v Speaker 1>that's more important. But then again, I'm like I said,

0:38:47.960 --> 0:38:50.239
<v Speaker 1>I'm not an animator, so I don't have that kind

0:38:50.280 --> 0:38:53.319
<v Speaker 1>of emotional attachment, you know, I'm not vested in it

0:38:53.360 --> 0:38:56.399
<v Speaker 1>in that way. So UM, I'll be curious to hear

0:38:56.440 --> 0:38:58.960
<v Speaker 1>what our listeners think if they think that there is

0:38:59.040 --> 0:39:03.560
<v Speaker 1>motion capture? Is that cheating? Is it? Uh? Is it

0:39:03.680 --> 0:39:07.440
<v Speaker 1>as Red versus Blue would have you say, a legitimate strategy?

0:39:07.960 --> 0:39:10.000
<v Speaker 1>What what do you think? What do you consider a

0:39:10.000 --> 0:39:13.680
<v Speaker 1>motion capture? You should let us know. Yeah, I, um, I

0:39:14.040 --> 0:39:21.120
<v Speaker 1>do see where UM it might make a traditional animator concerned,

0:39:21.239 --> 0:39:25.160
<v Speaker 1>but I don't. I don't really think it diminishes their UM,

0:39:25.239 --> 0:39:29.960
<v Speaker 1>their artistic value, to to UM, to a work whatever

0:39:29.960 --> 0:39:32.400
<v Speaker 1>it may be that they are working on. UM. And

0:39:32.440 --> 0:39:35.439
<v Speaker 1>there are certain times I'm sure where uh you would

0:39:35.520 --> 0:39:39.600
<v Speaker 1>argue that using these techniques is completely inappropriate to what

0:39:40.040 --> 0:39:43.279
<v Speaker 1>they might do UM. But yeah, I mean it's it's

0:39:43.320 --> 0:39:47.680
<v Speaker 1>always a concern when UM you start saying, well, the

0:39:47.680 --> 0:39:49.800
<v Speaker 1>machine can do it, and we don't really need people

0:39:49.840 --> 0:39:53.200
<v Speaker 1>to do it, so get out. Yeah, I don't think

0:39:53.200 --> 0:39:57.319
<v Speaker 1>that's ever gonna be, um, always fully the case,

0:39:57.360 --> 0:40:00.160
<v Speaker 1>because you're going to have certain characters within movies that

0:40:00.239 --> 0:40:04.480
<v Speaker 1>are going to be so different from the way humans

0:40:04.680 --> 0:40:09.160
<v Speaker 1>are built, so to speak, that that, uh, that motion

0:40:09.200 --> 0:40:12.399
<v Speaker 1>capture would not be practical. For example, like let's say

0:40:12.400 --> 0:40:17.880
<v Speaker 1>that the character that you're creating has really super long arms,

0:40:18.080 --> 0:40:20.080
<v Speaker 1>and you know, you've got an actor who's pretty lanky,

0:40:20.160 --> 0:40:22.640
<v Speaker 1>but but their arms are not as long as the

0:40:22.719 --> 0:40:25.640
<v Speaker 1>character's arms. Uh, if you were just to do a direct

0:40:25.640 --> 0:40:29.799
<v Speaker 1>translation of the actor's movements into the animation, it might

0:40:29.840 --> 0:40:34.879
<v Speaker 1>not look right because the character has different dimensions, their

0:40:34.920 --> 0:40:38.720
<v Speaker 1>body is built differently than the actor. And so without

0:40:38.800 --> 0:40:41.400
<v Speaker 1>tweaking it, without having an animator go in there and

0:40:41.440 --> 0:40:45.040
<v Speaker 1>adjust this and make it look correct compared to what

0:40:45.080 --> 0:40:47.200
<v Speaker 1>the you know, the the vision is for the movie,

0:40:47.760 --> 0:40:50.600
<v Speaker 1>it doesn't come out correctly, it doesn't look right. So

0:40:51.040 --> 0:40:55.839
<v Speaker 1>I think there's very little risk of motion capture ever

0:40:56.000 --> 0:41:01.200
<v Speaker 1>taking that away completely. Plus, there is something to you know,

0:41:01.320 --> 0:41:06.279
<v Speaker 1>creating a performance through traditional animation that you know, it

0:41:06.320 --> 0:41:09.160
<v Speaker 1>does feel different than motion capture, but that's not a

0:41:09.160 --> 0:41:13.040
<v Speaker 1>bad thing, Like it just depends upon the the vision

0:41:13.040 --> 0:41:17.000
<v Speaker 1>of the director and what the tone of the piece

0:41:17.080 --> 0:41:19.680
<v Speaker 1>needs to be. And in some cases motion capture is

0:41:19.680 --> 0:41:21.160
<v Speaker 1>going to be the best way to achieve that. In

0:41:21.200 --> 0:41:25.879
<v Speaker 1>other cases, motion capture would make it distracting. So um yeah,

0:41:25.920 --> 0:41:30.120
<v Speaker 1>I think as long as we maintain this desire for

0:41:30.640 --> 0:41:34.960
<v Speaker 1>different types of of entertainment and techniques, then there's not

0:41:35.040 --> 0:41:40.160
<v Speaker 1>really any risk of making one disappear. I agree completely,

0:41:40.200 --> 0:41:44.520
<v Speaker 1>I really do. It's it's just a different animal. Yep, yep.

0:41:44.560 --> 0:41:48.080
<v Speaker 1>But hey, guys, if you want to chime in on

0:41:48.120 --> 0:41:51.480
<v Speaker 1>this motion capture discussion, please do. Let us know, send us

0:41:51.480 --> 0:41:54.439
<v Speaker 1>an email. Our address is tech stuff at Discovery dot com,

0:41:54.600 --> 0:41:57.360
<v Speaker 1>or get in touch with us on Twitter or Facebook.

0:41:57.640 --> 0:42:00.719
<v Speaker 1>Our handle at both of those is tech stuff H. S.

0:42:01.000 --> 0:42:02.800
<v Speaker 1>W, and Chris and I will talk to you again

0:42:03.480 --> 0:42:07.640
<v Speaker 1>really soon. For more on this and thousands of other topics,

0:42:07.920 --> 0:42:14.279
<v Speaker 1>visit how stuff works dot com. Brought to you

0:42:14.320 --> 0:42:16.440
<v Speaker 1>by the two thousand twelve Toyota Camry.