WEBVTT - How the Virtual First-Down Line Works

0:00:00.560 --> 0:00:03.600
<v Speaker 1>Welcome to BrainStuff from HowStuffWorks.com,

0:00:03.600 --> 0:00:08.440
<v Speaker 1>where smart happens. Hi, I'm Marshall Brain with today's question:

0:00:08.840 --> 0:00:12.360
<v Speaker 1>How do they superimpose the first-down line onto the

0:00:12.400 --> 0:00:15.880
<v Speaker 1>field on televised football games? This is one of those

0:00:15.920 --> 0:00:19.120
<v Speaker 1>things that sounds really simple in theory, but it ends

0:00:19.200 --> 0:00:22.520
<v Speaker 1>up being incredibly complicated when you actually try to do it.

0:00:22.960 --> 0:00:25.799
<v Speaker 1>The system that ESPN uses to paint the line is

0:00:25.840 --> 0:00:29.000
<v Speaker 1>called First and Ten, and it's created by a company

0:00:29.040 --> 0:00:33.200
<v Speaker 1>called Sportvision. The simplest description of the system is this:

0:00:33.800 --> 0:00:36.559
<v Speaker 1>the first-down line is drawn on the field with

0:00:36.640 --> 0:00:40.000
<v Speaker 1>the computer so that viewers seeing the game on TV

0:00:40.280 --> 0:00:42.839
<v Speaker 1>can see the line as though it were painted on

0:00:42.880 --> 0:00:45.320
<v Speaker 1>the field. Here are some of the problems that have

0:00:45.440 --> 0:00:49.000
<v Speaker 1>to be solved in order for this system to work. First,

0:00:49.280 --> 0:00:52.040
<v Speaker 1>the system has to know the orientation of the field

0:00:52.159 --> 0:00:54.960
<v Speaker 1>with respect to the camera so that it can paint

0:00:55.000 --> 0:00:57.680
<v Speaker 1>the first-down line with the correct perspective from the

0:00:57.720 --> 0:01:01.400
<v Speaker 1>camera's point of view. Second, the system has to know,

0:01:01.720 --> 0:01:07.760
<v Speaker 1>in the same perspective framework, exactly where every yard line is. Third,

0:01:07.800 --> 0:01:10.720
<v Speaker 1>given that the camera person can move the camera, the

0:01:10.840 --> 0:01:15.360
<v Speaker 1>system has to be able to sense the camera's movement (tilt, pan, zoom,

0:01:15.360 --> 0:01:21.360
<v Speaker 1>and focus) and understand the perspective change resulting from that movement. Fourth,

0:01:21.640 --> 0:01:25.440
<v Speaker 1>a football field is not flat. It crests very gently

0:01:25.480 --> 0:01:28.160
<v Speaker 1>in the middle to help rainwater run off, so the

0:01:28.240 --> 0:01:33.360
<v Speaker 1>line calculated by the system has to appropriately follow that curve. Fifth,

0:01:33.880 --> 0:01:37.400
<v Speaker 1>the football game is shot by multiple cameras at different

0:01:37.440 --> 0:01:40.200
<v Speaker 1>places in the stadium, so the system has to do

0:01:40.280 --> 0:01:44.520
<v Speaker 1>all this work for several different cameras. Sixth, the system

0:01:44.600 --> 0:01:47.640
<v Speaker 1>has to be able to sense when players, referees, or

0:01:47.680 --> 0:01:50.680
<v Speaker 1>the ball cross over the first-down line so it

0:01:50.720 --> 0:01:53.840
<v Speaker 1>doesn't paint the line on top of them. And seventh,

0:01:54.200 --> 0:01:57.440
<v Speaker 1>the system has to be aware of superimposed graphics that

0:01:57.480 --> 0:02:00.960
<v Speaker 1>the network might overlay on the scene. There are probably

0:02:01.040 --> 0:02:05.040
<v Speaker 1>several other complications as well. It's a tough problem. To

0:02:05.160 --> 0:02:08.680
<v Speaker 1>solve these problems, the creators of the First and Ten system

0:02:08.720 --> 0:02:13.360
<v Speaker 1>combine hardware and software. First, each camera must have very

0:02:13.360 --> 0:02:18.320
<v Speaker 1>sensitive encoders attached so it can read the camera's angle, tilt, zoom,

0:02:18.320 --> 0:02:21.720
<v Speaker 1>and so on and send that information to the system.

0:02:21.760 --> 0:02:25.079
<v Speaker 1>The system must also have a detailed 3D model

0:02:25.120 --> 0:02:27.960
<v Speaker 1>of the field so that it knows where each yard

0:02:28.000 --> 0:02:32.120
<v Speaker 1>line is. By integrating the tilt, pan, and zoom information

0:02:32.160 --> 0:02:35.200
<v Speaker 1>with the 3D model, the system can calculate where

0:02:35.200 --> 0:02:38.799
<v Speaker 1>the line should go. Then the system uses color palettes

0:02:38.880 --> 0:02:42.080
<v Speaker 1>for the field and for the players, referees, and balls

0:02:42.120 --> 0:02:46.280
<v Speaker 1>to recognize, pixel by pixel, whether it's looking at the

0:02:46.320 --> 0:02:50.280
<v Speaker 1>field or something else. This way, only the field gets painted.

0:02:50.600 --> 0:02:53.800
<v Speaker 1>According to the Sportvision website, all of this computation

0:02:53.919 --> 0:02:57.680
<v Speaker 1>requires a lot of equipment. There are eight computers, three

0:02:57.720 --> 0:03:01.440
<v Speaker 1>sets of special encoders, and a lot of wiring dedicated

0:03:01.480 --> 0:03:05.320
<v Speaker 1>to generating the virtual first-down line in video format.

0:03:05.720 --> 0:03:08.320
<v Speaker 1>Who would have thought that it would be this complicated?

0:03:09.360 --> 0:03:12.239
<v Speaker 1>Do you have any ideas or suggestions for this podcast?

0:03:12.639 --> 0:03:15.320
<v Speaker 1>If so, please send me an email at podcast at

0:03:15.320 --> 0:03:18.040
<v Speaker 1>HowStuffWorks.com. For more on this and

0:03:18.080 --> 0:03:23.200
<v Speaker 1>thousands of other topics, go to HowStuffWorks.com.