WEBVTT - Deepfakes and the Future of Truth

0:00:15.250 --> 0:00:26.810
<v Speaker 1>Pushkin. You're listening to Brave New Planet, a podcast about

0:00:26.850 --> 0:00:31.010
<v Speaker 1>amazing new technologies that could dramatically improve our world, or,

0:00:31.410 --> 0:00:33.970
<v Speaker 1>if we don't make wise choices, could leave us a

0:00:33.970 --> 0:00:44.690
<v Speaker 1>lot worse off. Utopia or dystopia. It's up to us.

0:00:45.810 --> 0:00:52.810
<v Speaker 1>On July sixteenth, nineteen sixty nine, Apollo

0:00:52.850 --> 0:00:58.290
<v Speaker 1>eleven blasted off from the Kennedy Space Center near Cape Canaveral, Florida.

0:00:58.650 --> 0:01:03.290
<v Speaker 1>Twenty five million Americans watched on television as the spacecraft

0:01:03.410 --> 0:01:08.890
<v Speaker 1>ascended toward the heavens, carrying Commander Neil Armstrong, Lunar Module

0:01:08.930 --> 0:01:14.410
<v Speaker 1>pilot Buzz Aldrin, and Command Module pilot Michael Collins. Their

0:01:14.410 --> 0:01:18.330
<v Speaker 1>mission: to be the first humans in history to set

0:01:18.410 --> 0:01:23.810
<v Speaker 1>foot on the Moon. Four days later, on Sunday, July twentieth,

0:01:24.210 --> 0:01:27.970
<v Speaker 1>the lunar module separated from the command ship and soon

0:01:28.210 --> 0:01:34.490
<v Speaker 1>fired its rockets to begin its lunar descent. Five minutes later,

0:01:35.330 --> 0:01:40.250
<v Speaker 1>disaster struck about a mile above the Moon's surface. Program

0:01:40.290 --> 0:01:43.890
<v Speaker 1>alarms twelve O one and twelve O two sounded loudly,

0:01:43.930 --> 0:01:49.810
<v Speaker 1>indicating that the mission computer was overloaded, and then, well,

0:01:51.130 --> 0:01:59.530
<v Speaker 1>every American knows what happened next. Lost date of five

0:02:05.770 --> 0:02:10.130
<v Speaker 1>Good evening, my fellow Americans. President Richard Nixon addressed a

0:02:10.250 --> 0:02:14.970
<v Speaker 1>grieving nation. Fate has ordained that the men who went

0:02:15.010 --> 0:02:19.010
<v Speaker 1>to the Moon to explore in peace will stay on

0:02:19.050 --> 0:02:25.930
<v Speaker 1>the Moon to rest in peace. These brave men, Neil

0:02:26.250 --> 0:02:32.130
<v Speaker 1>Armstrong and Edwin Aldrin, know that there's no hope for

0:02:32.210 --> 0:02:37.290
<v Speaker 1>their recovery, but they also know that there is hope

0:02:37.410 --> 0:02:42.450
<v Speaker 1>for mankind in their sacrifice. He ended with the now

0:02:42.570 --> 0:02:46.370
<v Speaker 1>famous words: For every human being who looks up at

0:02:46.370 --> 0:02:49.530
<v Speaker 1>the Moon in the nights to come will know that

0:02:49.530 --> 0:02:54.610
<v Speaker 1>there is some corner of another world that is forever mankind.

0:02:58.130 --> 0:03:02.130
<v Speaker 1>Wait a minute, that never happened. The Moon mission was

0:03:02.170 --> 0:03:07.170
<v Speaker 1>a historic success. The three astronauts returned safely to ticker

0:03:07.210 --> 0:03:11.130
<v Speaker 1>tape parades and a celebrity thirty eight day world tour.

0:03:11.930 --> 0:03:15.730
<v Speaker 1>Those alarms actually did sound, but they turned out to

0:03:15.730 --> 0:03:21.130
<v Speaker 1>be harmless. Nixon never delivered that speech. His speechwriter had

0:03:21.130 --> 0:03:24.450
<v Speaker 1>written it, but it sat in a folder labeled In

0:03:24.530 --> 0:03:30.650
<v Speaker 1>Event of Moon Disaster until now. The Nixon you just

0:03:30.810 --> 0:03:34.050
<v Speaker 1>heard is a deep fake, part of a seven minute

0:03:34.090 --> 0:03:39.650
<v Speaker 1>film created by artificial intelligence deep learning algorithms. The fake

0:03:39.810 --> 0:03:43.090
<v Speaker 1>was made by the Center for Advanced Virtuality at the

0:03:43.090 --> 0:03:46.930
<v Speaker 1>Massachusetts Institute of Technology as part of an art exhibit

0:03:47.210 --> 0:03:50.930
<v Speaker 1>to raise awareness about the power of synthesized media. Not

0:03:51.050 --> 0:03:54.570
<v Speaker 1>long ago, something like this would have taken a lot

0:03:54.730 --> 0:03:58.410
<v Speaker 1>of time and money. But now it's getting easy. You

0:03:58.450 --> 0:04:01.450
<v Speaker 1>can make new paintings in the style of French Impressionism,

0:04:01.770 --> 0:04:06.330
<v Speaker 1>revive dead movie stars, help patients with neurodegenerative disease,

0:04:06.810 --> 0:04:09.570
<v Speaker 1>or soon maybe take a class on a tour of

0:04:09.610 --> 0:04:14.090
<v Speaker 1>ancient Rome. But as the technology quickly becomes democratized, we're

0:04:14.130 --> 0:04:17.690
<v Speaker 1>getting to the point where almost anyone can create a

0:04:17.730 --> 0:04:21.210
<v Speaker 1>fake video of a friend, an ex lover, a stranger,

0:04:21.650 --> 0:04:26.650
<v Speaker 1>or a public figure that's embarrassing, pornographic, or perhaps capable

0:04:26.690 --> 0:04:31.450
<v Speaker 1>of causing international chaos. Some argue that in a culture

0:04:31.490 --> 0:04:36.290
<v Speaker 1>where fake news spreads like wildfire and political leaders deny

0:04:36.410 --> 0:04:40.530
<v Speaker 1>the veracity of hard facts, deep fake media may do

0:04:40.570 --> 0:04:47.090
<v Speaker 1>a lot more harm than good. Today's big question: will

0:04:47.170 --> 0:04:52.570
<v Speaker 1>synthesized media unleash a new wave of creativity or will

0:04:52.570 --> 0:04:57.170
<v Speaker 1>it erode the already tenuous role of truth in our democracy?

0:04:57.290 --> 0:05:00.850
<v Speaker 1>And is there anything we can do to keep it

0:05:00.890 --> 0:05:12.090
<v Speaker 1>in check? My name is Eric Lander. I'm a scientist

0:05:12.090 --> 0:05:15.010
<v Speaker 1>who works on ways to improve human health. I helped

0:05:15.050 --> 0:05:17.770
<v Speaker 1>lead the Human Genome Project, and today I lead the

0:05:17.810 --> 0:05:21.930
<v Speaker 1>Broad Institute of MIT and Harvard. In the twenty first century,

0:05:22.290 --> 0:05:26.770
<v Speaker 1>powerful technologies have been appearing at a breathtaking pace related

0:05:26.810 --> 0:05:31.290
<v Speaker 1>to the Internet, artificial intelligence, genetic engineering, and more. They

0:05:31.330 --> 0:05:35.650
<v Speaker 1>have amazing potential upsides, but we can't ignore the risks

0:05:35.730 --> 0:05:38.770
<v Speaker 1>that come with them. The decisions aren't just up to

0:05:38.890 --> 0:05:42.730
<v Speaker 1>scientists or politicians. Whether we like it or not, we,

0:05:43.450 --> 0:05:46.530
<v Speaker 1>all of us, are the stewards of a brave new planet.

0:05:47.090 --> 0:05:50.850
<v Speaker 1>This generation's choices will shape the future as never before.

0:05:53.410 --> 0:05:57.490
<v Speaker 1>Coming up on today's episode of Brave New Planet, I

0:05:57.530 --> 0:06:01.850
<v Speaker 1>speak with some of the leaders behind advances in synthesized media.

0:06:02.050 --> 0:06:05.050
<v Speaker 1>You could, certainly, by the way, generate stories that could

0:06:05.570 --> 0:06:09.370
<v Speaker 1>be fresh and interesting and new and personal for every child.

0:06:09.610 --> 0:06:13.410
<v Speaker 1>We got emails from people who were quadriplegic and they

0:06:13.450 --> 0:06:16.090
<v Speaker 1>asked us if we could make them dance. We hear

0:06:16.170 --> 0:06:19.810
<v Speaker 1>from experts about some of the frightening ways that bad

0:06:19.850 --> 0:06:23.850
<v Speaker 1>actors can use deep fakes. Redditors would chime in and say,

0:06:24.090 --> 0:06:26.970
<v Speaker 1>you can absolutely make a deep fake sex video of

0:06:26.970 --> 0:06:29.770
<v Speaker 1>your ex with thirty pictures. I've done it with twenty.

0:06:30.010 --> 0:06:31.850
<v Speaker 1>Here's the thing that keeps me up at night, right:

0:06:32.530 --> 0:06:35.410
<v Speaker 1>a video of Donald Trump saying, I've launched nuclear weapons

0:06:35.410 --> 0:06:38.770
<v Speaker 1>against Iran, and before anybody gets around to figuring out

0:06:38.810 --> 0:06:40.890
<v Speaker 1>whether this is real or not, we have global nuclear

0:06:40.970 --> 0:06:45.410
<v Speaker 1>meltdown. And we explore how we might prevent the worst abuses.

0:06:46.250 --> 0:06:52.050
<v Speaker 1>It's important that younger people advocate for the Internet that

0:06:52.090 --> 0:06:54.690
<v Speaker 1>they want. We have to fight for it. We have

0:06:54.770 --> 0:07:04.650
<v Speaker 1>to ask for different things. Stay with us. Chapter one:

0:07:05.210 --> 0:07:10.050
<v Speaker 1>Abraham Lincoln's Head. To begin to understand the significance

0:07:10.050 --> 0:07:13.530
<v Speaker 1>of deep fake technology, I went to San Francisco to

0:07:13.610 --> 0:07:17.170
<v Speaker 1>speak with a world expert on synthetic media. My name

0:07:17.330 --> 0:07:22.170
<v Speaker 1>is Alexei, or sometimes called Alyosha, Efros, and I'm a

0:07:22.170 --> 0:07:26.650
<v Speaker 1>professor at UC Berkeley in the Computer Science and Electrical Engineering Department.

0:07:27.170 --> 0:07:33.050
<v Speaker 1>My research is on computer vision, computer graphics, machine learning,

0:07:33.770 --> 0:07:39.570
<v Speaker 1>various aspects of artificial intelligence. Where'd you grow up? I

0:07:39.650 --> 0:07:43.530
<v Speaker 1>grew up in Saint Petersburg in Russia. I was one

0:07:43.570 --> 0:07:47.970
<v Speaker 1>of those geeky kids playing around with computers or dreaming

0:07:47.970 --> 0:07:55.170
<v Speaker 1>about computers. My first computer was actually the first Soviet

0:07:55.650 --> 0:07:59.610
<v Speaker 1>personal computer. So you actually are involved in making sort

0:07:59.650 --> 0:08:04.890
<v Speaker 1>of synthetic content, synthetic media? That's right. Alexei has invented

0:08:04.970 --> 0:08:08.970
<v Speaker 1>powerful artificial intelligence tools, but his lab also has a

0:08:09.250 --> 0:08:13.130
<v Speaker 1>wonderful ability to use computers to enhance the human experience.

0:08:13.850 --> 0:08:17.250
<v Speaker 1>I was struck by a remarkable video on YouTube created

0:08:17.290 --> 0:08:21.090
<v Speaker 1>by his team at Berkeley. So this was a project

0:08:21.130 --> 0:08:26.530
<v Speaker 1>that actually was done by my students who didn't even

0:08:26.570 --> 0:08:30.290
<v Speaker 1>think of this as anything but a silly little toy

0:08:30.370 --> 0:08:34.370
<v Speaker 1>project of trying to see if we could get a

0:08:34.490 --> 0:08:39.290
<v Speaker 1>geeky computer science student to move like a ballerina. In

0:08:39.330 --> 0:08:43.050
<v Speaker 1>the video, one of the students, Caroline Chan, dances with

0:08:43.130 --> 0:08:46.730
<v Speaker 1>the skill and grace of a professional despite never having

0:08:46.730 --> 0:08:50.890
<v Speaker 1>studied ballet. The idea is, you take a source actor

0:08:50.970 --> 0:08:55.090
<v Speaker 1>like a ballerina. There is a way to detect the

0:08:55.250 --> 0:08:59.770
<v Speaker 1>limbs of the dancer, have a kind of a skeleton extracted,

0:09:00.210 --> 0:09:04.090
<v Speaker 1>and also have my student just move around and do

0:09:04.210 --> 0:09:08.050
<v Speaker 1>some geeky moves. And now we're basically just going to

0:09:08.090 --> 0:09:14.090
<v Speaker 1>try to synthesize the appearance of my student driven by

0:09:14.130 --> 0:09:16.930
<v Speaker 1>the skeleton of the ballerina. Put it all together, and

0:09:16.970 --> 0:09:21.810
<v Speaker 1>then we have our grad student dancing pirouettes like a ballerina.

0:09:23.290 --> 0:09:27.850
<v Speaker 1>Through artificial intelligence, Caroline's body is puppeteered by the dancer.

0:09:28.130 --> 0:09:30.890
<v Speaker 1>We weren't even going to publish it, but we just

0:09:31.370 --> 0:09:35.610
<v Speaker 1>released a video on YouTube called Everybody Dance Now, and

0:09:36.490 --> 0:09:40.650
<v Speaker 1>somehow it really touched a nerve. While there's been an

0:09:40.650 --> 0:09:44.890
<v Speaker 1>explosion recently of new ways to manipulate media, Alexei notes

0:09:44.970 --> 0:09:49.410
<v Speaker 1>that the idea itself isn't new. It has a long history.

0:09:49.770 --> 0:09:53.450
<v Speaker 1>I can't help but ask, given that you come from Russia.

0:09:53.770 --> 0:09:58.770
<v Speaker 1>One of the premier users of doctoring photographs I think

0:09:58.930 --> 0:10:03.930
<v Speaker 1>was Stalin, who used the ability to manipulate images for

0:10:04.130 --> 0:10:07.650
<v Speaker 1>political effect. How did they do that? Can you think

0:10:07.650 --> 0:10:10.650
<v Speaker 1>of examples of this and, like, what was the technology then?

0:10:11.370 --> 0:10:17.890
<v Speaker 1>The urge to change photographs has been around basically since

0:10:17.890 --> 0:10:20.930
<v Speaker 1>the invention of photography. For example, there is a photograph

0:10:21.010 --> 0:10:25.650
<v Speaker 1>of Abraham Lincoln that still hangs in many classrooms. That's fake.

0:10:25.730 --> 0:10:30.450
<v Speaker 1>It's actually Calhoun with Lincoln's head attached to it. Alexei's

0:10:30.490 --> 0:10:34.570
<v Speaker 1>referring to John C. Calhoun, the South Carolina senator and

0:10:34.770 --> 0:10:39.970
<v Speaker 1>champion of slavery. A Civil War portrait artist superimposed a

0:10:40.050 --> 0:10:44.530
<v Speaker 1>photo of Lincoln's head onto an engraving of Calhoun's body

0:10:45.090 --> 0:10:49.650
<v Speaker 1>because he thought Lincoln's gangly frame wasn't dignified enough. And

0:10:49.730 --> 0:10:52.570
<v Speaker 1>so they just said, okay, we can use Calhoun. Let's

0:10:52.730 --> 0:10:55.690
<v Speaker 1>slap the Lincoln's head on his body. And then, of course,

0:10:56.250 --> 0:10:59.250
<v Speaker 1>as soon as you go into the twentieth century, as

0:10:59.250 --> 0:11:03.370
<v Speaker 1>soon as you get to dictatorships, this is a wonderful

0:11:04.050 --> 0:11:07.850
<v Speaker 1>toy for a dictator to use. So again, Stalin was

0:11:08.690 --> 0:11:12.370
<v Speaker 1>big fan of this. He would get rid of people

0:11:12.450 --> 0:11:15.850
<v Speaker 1>in photographs once they were out of favor, or once

0:11:15.890 --> 0:11:20.170
<v Speaker 1>they got jailed or killed. He would just basically get

0:11:20.170 --> 0:11:25.810
<v Speaker 1>them scratched out with reasonably crude techniques. Hitler did it,

0:11:25.930 --> 0:11:29.090
<v Speaker 1>Mao did it, Castro did it, Brezhnev did it. I'm

0:11:29.090 --> 0:11:32.530
<v Speaker 1>sure US agencies have done it. Also, we have always

0:11:32.570 --> 0:11:36.610
<v Speaker 1>manipulated images with a desire to change history. This is

0:11:36.650 --> 0:11:39.930
<v Speaker 1>Hany Farid. He's also a professor at Berkeley and

0:11:39.970 --> 0:11:43.130
<v Speaker 1>a friend of Alexei's. I'm a professor of computer science

0:11:43.130 --> 0:11:47.210
<v Speaker 1>and I'm an expert in digital forensics. Where Alexei works

0:11:47.250 --> 0:11:50.930
<v Speaker 1>on making synthetic media, Hany has devoted his career to

0:11:51.170 --> 0:11:55.250
<v Speaker 1>identifying when synthetic media is being used to fool people,

0:11:55.810 --> 0:12:00.730
<v Speaker 1>that is, spotting fakes. He regularly collaborates on this mission

0:12:00.730 --> 0:12:05.130
<v Speaker 1>with Alexei. So I met Alyosha Efros ten, twenty years ago.

0:12:05.570 --> 0:12:11.090
<v Speaker 1>He is really incredibly creative and clever guy, and he

0:12:11.690 --> 0:12:14.050
<v Speaker 1>has done what I consider some of the most interesting

0:12:14.090 --> 0:12:16.650
<v Speaker 1>work in computer vision and computer graphics over the last

0:12:16.690 --> 0:12:20.810
<v Speaker 1>two decades. And if you really want to do forensics, well,

0:12:20.930 --> 0:12:23.290
<v Speaker 1>you have to partner with somebody like Alyosha. You have

0:12:23.370 --> 0:12:26.570
<v Speaker 1>to partner with a world class mind who knows how

0:12:26.570 --> 0:12:29.410
<v Speaker 1>to think about the synthesis side so that you can

0:12:29.450 --> 0:12:32.890
<v Speaker 1>synthesize the absolute best content and then think about how

0:12:32.930 --> 0:12:35.370
<v Speaker 1>to detect it. I think it's interesting that if you're

0:12:35.410 --> 0:12:38.250
<v Speaker 1>somebody on the synthesis side and developing the forensics, there's

0:12:38.290 --> 0:12:39.890
<v Speaker 1>a little bit of a Jekyll and Hyde there, and I

0:12:39.930 --> 0:12:44.290
<v Speaker 1>think it's really fascinating. You know, the idea of altering photos,

0:12:44.930 --> 0:12:47.850
<v Speaker 1>it's not entirely new. How far back does this go?

0:12:48.730 --> 0:12:51.450
<v Speaker 1>So we used to have in the days of Stalin,

0:12:51.970 --> 0:12:57.690
<v Speaker 1>highly talented, highly skilled, time consuming, difficult process of manipulating images,

0:12:58.170 --> 0:13:03.090
<v Speaker 1>removing somebody, erasing something from the image, splicing faces together.

0:13:03.650 --> 0:13:06.810
<v Speaker 1>And then we moved into the digital age where now

0:13:06.850 --> 0:13:10.290
<v Speaker 1>a highly talented digital artist could remove one face and

0:13:10.330 --> 0:13:13.290
<v Speaker 1>add another face, but it was still time consuming

0:13:13.330 --> 0:13:17.090
<v Speaker 1>and required skill. In nineteen ninety four, the makers of

0:13:17.090 --> 0:13:21.170
<v Speaker 1>the movie Forrest Gump won an Oscar for Visual Effects

0:13:21.250 --> 0:13:25.410
<v Speaker 1>for their representations of the title character interacting with historical

0:13:25.530 --> 0:13:29.610
<v Speaker 1>figures like President John F. Kennedy. Congratulations. How does it

0:13:29.690 --> 0:13:33.690
<v Speaker 1>feel being an All-American? It's very good. Congratulations. How do

0:13:33.730 --> 0:13:40.090
<v Speaker 1>you feel? I gotta pee. I believe he said he had to pee. Now

0:13:40.450 --> 0:13:42.770
<v Speaker 1>computers are doing all of the heavy lifting of what

0:13:42.930 --> 0:13:46.290
<v Speaker 1>used to be relegated to talented artists. The average person

0:13:46.370 --> 0:13:50.130
<v Speaker 1>now can use sophisticated technology to not just capture the recording,

0:13:50.170 --> 0:13:53.170
<v Speaker 1>but also manipulate it and then distribute it. The tools

0:13:53.330 --> 0:13:57.290
<v Speaker 1>used to create synthetic media have grown by leaps and bounds,

0:13:57.410 --> 0:14:00.330
<v Speaker 1>especially in the past few years, and so now we

0:14:00.410 --> 0:14:04.290
<v Speaker 1>have technology broadly called deep fake, but more specifically should

0:14:04.330 --> 0:14:08.490
<v Speaker 1>be called synthesized content, where you point an image or

0:14:08.530 --> 0:14:11.850
<v Speaker 1>a video or an audio to an AI or machine

0:14:11.930 --> 0:14:15.210
<v Speaker 1>learning system and it will replace the face for you.

0:14:15.450 --> 0:14:16.850
<v Speaker 1>I mean it can do that in an image, it

0:14:16.890 --> 0:14:19.210
<v Speaker 1>can do that in a video, or it can synthesize

0:14:19.250 --> 0:14:25.890
<v Speaker 1>audio for you in a particular person's voice. It's become

0:14:25.970 --> 0:14:30.810
<v Speaker 1>straightforward to swap people's faces. There's a popular YouTube video

0:14:31.170 --> 0:14:35.170
<v Speaker 1>that features tech pioneer Elon Musk's adult face on a

0:14:35.250 --> 0:14:39.290
<v Speaker 1>baby's body, and there's a famous meme where actor Nicolas

0:14:39.330 --> 0:14:43.570
<v Speaker 1>Cage's face replaces those of leading movie actors, both male

0:14:43.650 --> 0:14:47.130
<v Speaker 1>and female. You can put words into people's mouths and

0:14:47.210 --> 0:14:50.970
<v Speaker 1>make them jump and dance and run. You can even

0:14:51.010 --> 0:14:55.290
<v Speaker 1>resurrect powerful figures and have them deliver a fake speech

0:14:55.810 --> 0:15:04.570
<v Speaker 1>about a fake tragedy from an altered history. Chapter two:

0:15:05.530 --> 0:15:10.890
<v Speaker 1>Creating Nixon. The text of Nixon's Moon disaster speech that

0:15:10.930 --> 0:15:13.330
<v Speaker 1>we heard at the top of the show is actually

0:15:13.370 --> 0:15:16.330
<v Speaker 1>not fake. As I mentioned, it was written for President

0:15:16.450 --> 0:15:20.690
<v Speaker 1>Nixon as a contingency speech and thankfully never had to

0:15:20.690 --> 0:15:23.970
<v Speaker 1>be delivered. It's an amazing piece of writing. It was

0:15:24.010 --> 0:15:28.130
<v Speaker 1>written by Bill Safire, who was one of Nixon's speech writers.

0:15:28.530 --> 0:15:32.090
<v Speaker 1>This is artist and journalist Francesca Panetta. She's the co-

0:15:32.250 --> 0:15:36.850
<v Speaker 1>director of the Nixon fake for MIT's Moon Disaster team.

0:15:37.490 --> 0:15:42.410
<v Speaker 1>She's also the creative director in MIT's Center for Advanced Virtuality.

0:15:42.770 --> 0:15:47.330
<v Speaker 1>I was doing experimental journalism at the Guardian newspaper. I

0:15:47.410 --> 0:15:50.490
<v Speaker 1>ran the Guardian's virtual reality studio for the last three years.

0:15:50.770 --> 0:15:53.250
<v Speaker 1>The second half of the Moon Disaster team is sound

0:15:53.370 --> 0:15:57.650
<v Speaker 1>artist Halsey Burgund. My name is Halsey Burgund. I am

0:15:57.690 --> 0:16:00.690
<v Speaker 1>a sound artist and technologist, and I've had a lot

0:16:00.690 --> 0:16:04.890
<v Speaker 1>of experience with lots of sorts of audio enhanced with technology,

0:16:05.170 --> 0:16:08.530
<v Speaker 1>though this is my first experience with synthetic media, especially

0:16:08.610 --> 0:16:12.090
<v Speaker 1>since I typically focus on authenticity of voice and now

0:16:12.130 --> 0:16:15.570
<v Speaker 1>I'm kind of doing the opposite. So together, Halsey and

0:16:15.650 --> 0:16:19.530
<v Speaker 1>Francesca chose to automate a tragic moment in history that

0:16:19.770 --> 0:16:22.930
<v Speaker 1>never actually happened. I think it all started with it

0:16:23.010 --> 0:16:25.370
<v Speaker 1>being the fiftieth anniversary of the moon landing last year,

0:16:25.490 --> 0:16:28.290
<v Speaker 1>and add on top of that an election cycle in

0:16:28.330 --> 0:16:31.530
<v Speaker 1>this country, and dealing with disinformation, which is obviously

0:16:32.170 --> 0:16:36.730
<v Speaker 1>very important in election cycles. It was like lightbulbs went

0:16:36.810 --> 0:16:40.250
<v Speaker 1>on and we got very excited about pursuing it. It's

0:16:40.290 --> 0:16:43.810
<v Speaker 1>possible to make mediocre fakes pretty quickly and cheaply, but

0:16:43.930 --> 0:16:47.690
<v Speaker 1>Francesca and Halsey wanted high production values. So how does

0:16:47.730 --> 0:16:51.410
<v Speaker 1>one go about making a first rate fake presidential address?

0:16:51.890 --> 0:16:55.090
<v Speaker 1>There are two components. There's the visuals and there's the audio,

0:16:55.130 --> 0:16:59.370
<v Speaker 1>and they are completely different processes. So we decided to

0:16:59.410 --> 0:17:03.650
<v Speaker 1>go with a video dialogue replacement company called Canny AI,

0:17:04.090 --> 0:17:06.650
<v Speaker 1>who would do the visuals for us, and then we

0:17:06.730 --> 0:17:10.810
<v Speaker 1>decided to go with Respeecher, who are a dialogue

0:17:10.850 --> 0:17:15.690
<v Speaker 1>replacement company for the voice of Nixon. They tackled the

0:17:15.770 --> 0:17:19.090
<v Speaker 1>voice first, the more challenging of the two mediums. What

0:17:19.130 --> 0:17:21.850
<v Speaker 1>we were told to do was to get two to

0:17:21.930 --> 0:17:25.450
<v Speaker 1>three hours worth of Nixon talking. That was pretty easy

0:17:25.450 --> 0:17:28.810
<v Speaker 1>because the Nixon Library has hours and hours of Nixon,

0:17:29.370 --> 0:17:33.690
<v Speaker 1>mainly giving Vietnam speeches. The Communist armies of North Vietnam

0:17:33.810 --> 0:17:37.290
<v Speaker 1>launched a massive invasion of South Vietnam. That audio was

0:17:37.370 --> 0:17:41.050
<v Speaker 1>then chopped up into chunks between one and three seconds long.

0:17:41.770 --> 0:17:46.610
<v Speaker 1>We found this incredibly patient actor called Lewis D. Wheeler.

0:17:47.250 --> 0:17:51.050
<v Speaker 1>Lewis would listen to the one second clip and then

0:17:51.210 --> 0:17:59.850
<v Speaker 1>he would repeat that and do what I believe was right.

0:18:00.250 --> 0:18:02.410
<v Speaker 1>Respeecher would say to us things like, we need

0:18:02.450 --> 0:18:07.370
<v Speaker 1>to change the diagonal attention, which meant nothing to us. Yes,

0:18:07.530 --> 0:18:12.450
<v Speaker 1>we have a whole lot of potential band names going forward. Yeah,

0:18:14.210 --> 0:18:17.410
<v Speaker 1>Synthetic Nixon is another good one. So once we have

0:18:17.810 --> 0:18:21.210
<v Speaker 1>our Nixon model made out of these thousands of tiny clips,

0:18:21.530 --> 0:18:25.890
<v Speaker 1>it means that whatever our actor says will come out

0:18:26.050 --> 0:18:29.450
<v Speaker 1>then in Nixon's voice. So then what we did was

0:18:29.530 --> 0:18:34.130
<v Speaker 1>record the contingency speech of Nixon, and it meant that

0:18:34.170 --> 0:18:39.810
<v Speaker 1>we got Lewis's actual performance but in Nixon's voice. What

0:18:39.850 --> 0:18:42.930
<v Speaker 1>about the video part? I mean, the video was much easier.

0:18:42.970 --> 0:18:45.010
<v Speaker 1>We're talking a couple of days here and a tiny

0:18:45.010 --> 0:18:50.610
<v Speaker 1>amount of data just with Lewis's iPhone. We filmed him

0:18:50.650 --> 0:18:54.370
<v Speaker 1>reading the contingency speech once a couple of minutes of

0:18:54.450 --> 0:18:57.930
<v Speaker 1>him just chatting to camera, and that was it. Fate has ordained

0:18:59.130 --> 0:19:01.650
<v Speaker 1>that the men who went to the Moon to explore

0:19:01.810 --> 0:19:07.770
<v Speaker 1>in peace will stay on... You know, we were told

0:19:07.770 --> 0:19:11.050
<v Speaker 1>by Canny AI that everything would be the same in

0:19:11.090 --> 0:19:14.410
<v Speaker 1>the video apart from just the area around the mouth.

0:19:14.850 --> 0:19:18.010
<v Speaker 1>So every gesture of the hand, every blink, every time

0:19:18.050 --> 0:19:20.810
<v Speaker 1>he moved his face, all of that would stay the same,

0:19:21.330 --> 0:19:25.970
<v Speaker 1>but just the mouth basically would change. So we used

0:19:26.210 --> 0:19:30.330
<v Speaker 1>Nixon's resignation speech. To have served in this office is

0:19:30.450 --> 0:19:35.050
<v Speaker 1>to have felt a very personal sense of... It was

0:19:35.370 --> 0:19:38.570
<v Speaker 1>the speech of Nixon that looked the most somber, where

0:19:38.610 --> 0:19:40.810
<v Speaker 1>he seemed to have the most emotion in his face.

0:19:41.490 --> 0:19:45.690
<v Speaker 1>So what actually went on in the computer? Artificial intelligence

0:19:45.970 --> 0:19:50.610
<v Speaker 1>sometimes sounds inscrutable, but the basic ideas are quite simple.

0:19:51.290 --> 0:19:54.290
<v Speaker 1>In this case, it uses a type of computer program

0:19:54.330 --> 0:19:59.010
<v Speaker 1>called an autoencoder. It's trained to take complicated things,

0:19:59.450 --> 0:20:04.010
<v Speaker 1>say spoken sentences or pictures, encode them in a much

0:20:04.050 --> 0:20:08.250
<v Speaker 1>simpler form, and then decode them to recover the original

0:20:08.290 --> 0:20:12.210
<v Speaker 1>as best it can. The encoder tries to reduce things to

0:20:12.250 --> 0:20:16.490
<v Speaker 1>their essence, throwing away most of the information but keeping

0:20:16.610 --> 0:20:19.690
<v Speaker 1>enough to do a good job of reconstructing it. To

0:20:19.690 --> 0:20:23.330
<v Speaker 1>make a deep fake, here's the trick: train a speech

0:20:23.370 --> 0:20:27.090
<v Speaker 1>autoencoder for Nixon to Nixon, and a speech

0:20:27.210 --> 0:20:31.330
<v Speaker 1>autoencoder for actor to actor, but force them to use

0:20:31.450 --> 0:20:37.970
<v Speaker 1>the same encoder. Then you can input the actor and decode it

0:20:38.330 --> 0:20:41.730
<v Speaker 1>as Nixon. If you have enough data, it's a piece

0:20:41.770 --> 0:20:48.330
<v Speaker 1>of cake. Around their carefully created video, the Moon Disaster
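
NOTE
Editor's sketch, not part of the spoken audio. The shared-encoder trick described in the cues above can be illustrated with a toy model. Everything here is an assumption made for illustration: the two-number "recordings" (content, style), the names encode, decode and train, and the tiny linear maps standing in for neural networks. It is not the system used by the Moon Disaster team or its vendors.
```python
# Toy data: each sample is (content, style); style is one fixed value per speaker.
CONTENT = [-1.0, -0.5, 0.0, 0.5, 1.0]
STYLE = {"actor": 1.0, "nixon": -1.0}
DATA = {name: [(c, s) for c in CONTENT] for name, s in STYLE.items()}
# Shared encoder: squeezes a sample into one number, z = w[0]*content + w[1]*style.
w = [0.5, 0.0]
# One decoder per speaker: rebuilds (content, style) as (a*z + b, d*z + e).
dec = {name: {"a": 0.5, "b": 0.0, "d": 0.0, "e": 0.0} for name in STYLE}
def encode(x):
    return w[0] * x[0] + w[1] * x[1]
def decode(name, z):
    p = dec[name]
    return (p["a"] * z + p["b"], p["d"] * z + p["e"])
def train(steps=3000, lr=0.2):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    n = sum(len(samples) for samples in DATA.values())
    for _ in range(steps):
        gw = [0.0, 0.0]
        gdec = {name: {"a": 0.0, "b": 0.0, "d": 0.0, "e": 0.0} for name in STYLE}
        for name, samples in DATA.items():
            p = dec[name]
            for x in samples:
                z = encode(x)
                y = decode(name, z)
                # Derivative of the mean squared error w.r.t. each output.
                r0 = 2.0 * (y[0] - x[0]) / n
                r1 = 2.0 * (y[1] - x[1]) / n
                g = gdec[name]
                g["a"] += r0 * z
                g["b"] += r0
                g["d"] += r1 * z
                g["e"] += r1
                # Both speakers push gradients into the same encoder weights.
                gz = r0 * p["a"] + r1 * p["d"]
                gw[0] += gz * x[0]
                gw[1] += gz * x[1]
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        for name in STYLE:
            for key in dec[name]:
                dec[name][key] -= lr * gdec[name][key]
train()
# The swap: encode an actor sample, then decode it with the Nixon decoder.
actor_sample = (0.5, STYLE["actor"])
fake = decode("nixon", encode(actor_sample))
print(fake)
```
In this toy setup, the printed pair should keep the actor's content value while taking on Nixon's style value, which is the swap the narration describes. Real systems use deep networks and far messier data.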

0:20:48.450 --> 0:20:53.170
<v Speaker 1>created an entire art installation: a nineteen sixties living

0:20:53.210 --> 0:20:56.890
<v Speaker 1>room with a fake vintage newspaper sharing the fake tragic

0:20:56.930 --> 0:21:01.730
<v Speaker 1>news while a fake Nixon speaks solemnly on a vintage

0:21:01.770 --> 0:21:05.530
<v Speaker 1>black and white television. Some people, when they were watching

0:21:05.570 --> 0:21:08.570
<v Speaker 1>the installation, they watched a number of times. You'd see them,

0:21:08.610 --> 0:21:10.610
<v Speaker 1>they'd watch it once, then they would watch it again,

0:21:11.410 --> 0:21:13.450
<v Speaker 1>staring at the lips to see if they could see

0:21:13.490 --> 0:21:17.770
<v Speaker 1>any lack of synchronicity. We had some people who thought

0:21:17.810 --> 0:21:21.850
<v Speaker 1>that perhaps Nixon had actually recorded this speech as a

0:21:21.890 --> 0:21:24.810
<v Speaker 1>contingency speech for it to go onto television. Lots of

0:21:24.850 --> 0:21:28.610
<v Speaker 1>folks who were listening, viewing, and even press folks just

0:21:28.690 --> 0:21:30.690
<v Speaker 1>immediately said, oh, the voice is real, or whatever. They

0:21:30.810 --> 0:21:34.770
<v Speaker 1>said these things that weren't accurate because they just felt

0:21:34.770 --> 0:21:37.330
<v Speaker 1>like there wasn't even a question. I suppose that is

0:21:37.330 --> 0:21:39.490
<v Speaker 1>what we wanted to achieve, But at the same time,

0:21:39.530 --> 0:21:42.530
<v Speaker 1>it was a little bit eye opening and like a

0:21:42.530 --> 0:21:50.890
<v Speaker 1>little scary, you know, that that could happen. Chapter three:

0:21:51.610 --> 0:21:55.930
<v Speaker 1>Everybody Dance. What do you see as just the wonderful

0:21:57.130 --> 0:22:01.050
<v Speaker 1>upside of having technologies like this? Yeah, I mean,

0:22:01.210 --> 0:22:06.850
<v Speaker 1>AI and art is becoming a whole field in itself, so creatively,

0:22:07.170 --> 0:22:11.010
<v Speaker 1>there is enormous potential. One of the potential positive educational

0:22:11.090 --> 0:22:14.730
<v Speaker 1>uses of deep fake technology would be to bring historical

0:22:14.770 --> 0:22:17.930
<v Speaker 1>figures back to life to make learning more durable. I

0:22:17.970 --> 0:22:21.010
<v Speaker 1>think one could do that with bringing Abraham Lincoln back

0:22:21.010 --> 0:22:23.770
<v Speaker 1>to life and having him deliver speeches. Film companies are

0:22:23.770 --> 0:22:26.930
<v Speaker 1>really excited about re enactments. We're already beginning to see

0:22:26.970 --> 0:22:31.010
<v Speaker 1>this in films like Star Wars, when we're bringing people

0:22:31.010 --> 0:22:33.770
<v Speaker 1>like Carrie Fisher back to life. I mean that is

0:22:33.970 --> 0:22:37.090
<v Speaker 1>at the moment not being done through deep fake technologies.

0:22:37.130 --> 0:22:40.650
<v Speaker 1>This is using fairly traditional techniques of CGI at the moment,

0:22:40.890 --> 0:22:43.570
<v Speaker 1>So we still have to see our first deep fake

0:22:43.770 --> 0:22:47.330
<v Speaker 1>big cinema screen release. But this is just to come

0:22:47.450 --> 0:22:50.450
<v Speaker 1>like the technology is getting better and better. Not only

0:22:50.490 --> 0:22:53.530
<v Speaker 1>will we be able to potentially bring back actors and

0:22:53.570 --> 0:22:56.690
<v Speaker 1>actresses who are no longer alive and have them star

0:22:56.770 --> 0:22:59.610
<v Speaker 1>in movies, but an actor could make a model of

0:22:59.650 --> 0:23:02.210
<v Speaker 1>their own voice and then sell the use of that

0:23:02.330 --> 0:23:06.290
<v Speaker 1>voice to anybody to do a voiceover of whatever is wanted,

0:23:06.410 --> 0:23:09.410
<v Speaker 1>and so they could have twenty of these going

0:23:09.410 --> 0:23:11.410
<v Speaker 1>on at the same time, and the sort of restriction

0:23:11.490 --> 0:23:15.170
<v Speaker 1>of their physical presence is no longer there. And that

0:23:15.250 --> 0:23:17.410
<v Speaker 1>might mean that, you know, Brad Pitt is in everything,

0:23:18.370 --> 0:23:21.530
<v Speaker 1>or it might just mean that lower budget films can

0:23:21.570 --> 0:23:24.050
<v Speaker 1>afford to have some of the higher cost talent. At

0:23:24.090 --> 0:23:26.370
<v Speaker 1>that point, you know, the top twenty actors could just

0:23:26.450 --> 0:23:29.210
<v Speaker 1>do everything. Yes, there's no doubt that there will be

0:23:29.210 --> 0:23:33.050
<v Speaker 1>winners and losers from these technologies, but the potential of

0:23:33.090 --> 0:23:36.530
<v Speaker 1>synthetic media goes way beyond the arts. There are possible

0:23:36.610 --> 0:23:40.730
<v Speaker 1>medical and therapeutic applications. There are companies that are working

0:23:40.810 --> 0:23:44.210
<v Speaker 1>very hard to allow people who have either lost their

0:23:44.290 --> 0:23:46.330
<v Speaker 1>voice or who never had a voice, to be able

0:23:46.370 --> 0:23:49.890
<v Speaker 1>to speak in a way that is either how they

0:23:49.970 --> 0:23:52.690
<v Speaker 1>used to speak or in a way that isn't a

0:23:52.690 --> 0:23:56.570
<v Speaker 1>canned voice that everybody has. Alexei Efros and his students

0:23:56.690 --> 0:24:01.410
<v Speaker 1>discovered potential uses of synthetic media in medicine quite unintentionally

0:24:01.810 --> 0:24:06.050
<v Speaker 1>while working on their Everybody Dance Now project that could

0:24:06.050 --> 0:24:10.970
<v Speaker 1>turn anyone into a ballerina. We were kind of surprised for

0:24:11.210 --> 0:24:14.650
<v Speaker 1>all the positive feedback we got. We've got emails from

0:24:14.770 --> 0:24:17.650
<v Speaker 1>people who were quadriplegic and they asked us if we

0:24:17.690 --> 0:24:21.450
<v Speaker 1>could make them dance, and it was very unexpected. So

0:24:21.490 --> 0:24:24.570
<v Speaker 1>now we are trying to get the software to be

0:24:24.650 --> 0:24:27.890
<v Speaker 1>in a state where people can use it, because yeah,

0:24:27.930 --> 0:24:33.530
<v Speaker 1>it's somehow it did hit a nerve with folks. Chapter

0:24:33.690 --> 0:24:39.250
<v Speaker 1>four, Unicorns in the Andes. The past few years have

0:24:39.370 --> 0:24:43.250
<v Speaker 1>seen amazing advances in the creation of synthetic media through

0:24:43.330 --> 0:24:47.690
<v Speaker 1>artificial intelligence. The technology now goes far beyond fitting one

0:24:47.730 --> 0:24:51.610
<v Speaker 1>face over another face in a video. A recent breakthrough

0:24:51.690 --> 0:24:56.130
<v Speaker 1>has made it possible to create entirely new and very

0:24:56.250 --> 0:25:01.370
<v Speaker 1>convincing content out of thin air. The breakthrough called generative

0:25:01.530 --> 0:25:06.690
<v Speaker 1>adversarial networks, or GANs, came from a machine learning researcher

0:25:06.730 --> 0:25:11.530
<v Speaker 1>at Google named Ian Goodfellow. Like autoencoders, the basic

0:25:11.610 --> 0:25:15.930
<v Speaker 1>idea is simple but brilliant. Suppose you want to create

0:25:16.050 --> 0:25:20.570
<v Speaker 1>amazingly realistic photos of people who don't exist. Well, you

0:25:20.610 --> 0:25:24.810
<v Speaker 1>build a GAN consisting of two computer programs, a photo

0:25:25.010 --> 0:25:29.170
<v Speaker 1>generator that learns to generate fake photos and a photo

0:25:29.330 --> 0:25:35.170
<v Speaker 1>discriminator that learns to discriminate or identify fake photos from

0:25:35.170 --> 0:25:38.850
<v Speaker 1>a vast collection of real photos. You then let the

0:25:38.850 --> 0:25:43.690
<v Speaker 1>two programs compete, continually tweaking their code to outsmart each other.

0:25:44.490 --> 0:25:48.330
<v Speaker 1>By the time they're done, the GAN can generate amazingly

0:25:48.330 --> 0:25:51.730
<v Speaker 1>convincing fakes. You can see for yourself if you go

0:25:51.770 --> 0:25:55.810
<v Speaker 1>to the website This Person Does Not Exist dot com.

0:25:56.650 --> 0:25:59.890
<v Speaker 1>Every time you refresh the page, you're shown a new

0:26:00.130 --> 0:26:04.410
<v Speaker 1>uncanny image of a person who, as the website says,

0:26:04.890 --> 0:26:09.450
<v Speaker 1>does not and never did exist. Francesca and I actually tried

0:26:09.530 --> 0:26:17.650
<v Speaker 1>out the website. This young Asian woman. She's got great complexion.

0:26:17.890 --> 0:26:20.850
<v Speaker 1>Envious of that. Neat black hair with a fringe, pink

0:26:20.850 --> 0:26:24.130
<v Speaker 1>lipstick and a slightly dreamy look as she's kind of

0:26:24.130 --> 0:26:30.330
<v Speaker 1>gazing off to her left. Oh, here's a woman who

0:26:30.330 --> 0:26:32.770
<v Speaker 1>looks like she could be a neighbor of mine in Cambridge,

0:26:33.410 --> 0:26:38.090
<v Speaker 1>probably about sixty five. She's got nice wire framed glasses,

0:26:38.370 --> 0:26:43.610
<v Speaker 1>layered hair. Her earrings don't actually match, but that could

0:26:43.610 --> 0:26:46.690
<v Speaker 1>just be her distinctive style. I mean, of course, she

0:26:47.050 --> 0:26:52.330
<v Speaker 1>doesn't really exist. It's hard to argue that GANs aren't

0:26:52.330 --> 0:26:57.970
<v Speaker 1>creating original art. In fact, an artist collective recently used

0:26:58.010 --> 0:27:03.050
<v Speaker 1>a GAN to create a French Impressionist style portrait. When

0:27:03.170 --> 0:27:07.210
<v Speaker 1>Christie's sold it at auction, it fetched an eye popping

0:27:07.290 --> 0:27:11.970
<v Speaker 1>four hundred and thirty two thousand dollars. Alexei Efros, the

0:27:12.010 --> 0:27:17.130
<v Speaker 1>Berkeley professor, recently pushed GANs a step further, creating something

0:27:17.210 --> 0:27:21.770
<v Speaker 1>called CycleGANs. By connecting two GANs together in a

0:27:21.770 --> 0:27:27.290
<v Speaker 1>clever way, CycleGANs can transform a Monet painting into

0:27:27.290 --> 0:27:31.210
<v Speaker 1>what's seemingly a photograph of the same scene, or turn

0:27:31.250 --> 0:27:34.930
<v Speaker 1>a summer landscape into a winter landscape of the same view.

0:27:35.810 --> 0:27:39.290
<v Speaker 1>Alexei's CycleGANs seem like magic. If you were to

0:27:39.290 --> 0:27:44.810
<v Speaker 1>add in virtual reality, the possibilities become mind blowing. You

0:27:45.090 --> 0:27:51.490
<v Speaker 1>may be reminiscing about walking down Saint-Germain in Paris

0:27:51.570 --> 0:27:54.450
<v Speaker 1>and with a few clicks, you are there, and you're

0:27:54.530 --> 0:27:57.410
<v Speaker 1>walking down the boulevard, and you're looking at all the buildings,

0:27:57.450 --> 0:28:00.970
<v Speaker 1>and maybe you can even switch to a different year.

0:28:01.130 --> 0:28:06.330
<v Speaker 1>And I think that is I think very exciting as

0:28:06.370 --> 0:28:10.490
<v Speaker 1>a way to mentally travel to different places. So if

0:28:10.490 --> 0:28:12.650
<v Speaker 1>you do this in VR, I mean, can you imagine

0:28:13.010 --> 0:28:16.810
<v Speaker 1>classes going on a class visit to ancient Rome. That's right,

0:28:17.170 --> 0:28:21.890
<v Speaker 1>you could imagine from how a particular city like Rome

0:28:22.290 --> 0:28:24.770
<v Speaker 1>looks now, trying to extrapolate to how it looked in

0:28:24.810 --> 0:28:29.210
<v Speaker 1>the past. It turns out that GANs aren't just transforming images.

0:28:29.890 --> 0:28:32.690
<v Speaker 1>I spoke with a friend who's very familiar with another

0:28:32.810 --> 0:28:37.250
<v Speaker 1>remarkable application of the technology. My name is Reid Hoffman.

0:28:37.370 --> 0:28:40.650
<v Speaker 1>I'm a podcaster of Masters of Scale. I'm a partner at Greylock,

0:28:40.690 --> 0:28:43.570
<v Speaker 1>which is where we're sitting right now, co founder of LinkedIn,

0:28:43.770 --> 0:28:47.810
<v Speaker 1>and then a variety of other eccentric hobbies. Reid is

0:28:47.850 --> 0:28:52.170
<v Speaker 1>a board member of an unusual organization called OpenAI.

0:28:52.650 --> 0:28:55.850
<v Speaker 1>OpenAI is highly concerned with artificial general intelligence,

0:28:55.930 --> 0:28:59.890
<v Speaker 1>human-level intelligence. I helped Sam Altman and Elon Musk

0:29:00.130 --> 0:29:05.410
<v Speaker 1>stand it up. The basic concern was that if one company

0:29:05.490 --> 0:29:09.850
<v Speaker 1>created and deployed that, that could be disbalancing in

0:29:09.890 --> 0:29:12.970
<v Speaker 1>all kinds of ways. And so the thought is, if

0:29:12.970 --> 0:29:15.770
<v Speaker 1>it could be created, we should make sure that there

0:29:15.850 --> 0:29:19.090
<v Speaker 1>is essentially a nonprofit that is creating this and that

0:29:19.170 --> 0:29:23.530
<v Speaker 1>can make that technology available at selective time slices to

0:29:24.250 --> 0:29:28.690
<v Speaker 1>industry as a whole, government, etc. Last year, OpenAI

0:29:28.890 --> 0:29:33.490
<v Speaker 1>released a program that uses GANs to write language from

0:29:33.490 --> 0:29:38.450
<v Speaker 1>a short opening prompt. The system, called GPT-2, can

0:29:38.490 --> 0:29:42.130
<v Speaker 1>spin a convincing article or story. Instead of a deep

0:29:42.170 --> 0:29:47.010
<v Speaker 1>fake video, it's deep fake text. It's pretty amazing actually.

0:29:47.450 --> 0:29:52.770
<v Speaker 1>For example, OpenAI researchers gave the program the following prompt.

0:29:53.650 --> 0:29:56.970
<v Speaker 1>In a shocking finding, scientists discovered a herd of unicorns

0:29:57.050 --> 0:30:00.610
<v Speaker 1>living in a remote, previously unexplored valley in the Andes Mountains.

0:30:01.290 --> 0:30:04.250
<v Speaker 1>Even more surprising to the researchers was the fact that

0:30:04.290 --> 0:30:08.290
<v Speaker 1>the unicorns spoke perfect English. GPT-2 took it

0:30:08.330 --> 0:30:13.250
<v Speaker 1>from there, delivering nine crisp paragraphs on the landmark discovery.

0:30:13.850 --> 0:30:16.050
<v Speaker 1>I asked Franz to read a bit from the story.

0:30:16.610 --> 0:30:20.490
<v Speaker 1>Doctor Jorge Perez, an evolutionary biologist from the University of

0:30:20.570 --> 0:30:25.130
<v Speaker 1>La Paz, and several companions were exploring the Andes Mountains when

0:30:25.130 --> 0:30:28.530
<v Speaker 1>they found a small valley with no other animals or humans.

0:30:29.210 --> 0:30:31.810
<v Speaker 1>Perez noticed that the valley had what appeared to be

0:30:31.970 --> 0:30:35.370
<v Speaker 1>a natural fountain, surrounded by two peaks of rock and

0:30:35.450 --> 0:30:39.570
<v Speaker 1>silver snow. Perez and the others then ventured further into

0:30:39.610 --> 0:30:41.930
<v Speaker 1>the valley. By the time we reached the top of

0:30:41.930 --> 0:30:44.690
<v Speaker 1>one peak, the water looked blue with some crystals on top,

0:30:44.770 --> 0:30:48.130
<v Speaker 1>said Perez. Perez and his friends were astonished to see

0:30:48.130 --> 0:30:55.250
<v Speaker 1>the unicorn herd. Tell me some of the great things

0:30:55.290 --> 0:30:59.410
<v Speaker 1>you can do with language generation. Well, say, for example, entertainment:

0:30:59.850 --> 0:31:03.530
<v Speaker 1>generate stories that could be fresh and interesting and new

0:31:03.610 --> 0:31:08.130
<v Speaker 1>and personal for every child. Embed educational things in those

0:31:08.170 --> 0:31:11.130
<v Speaker 1>stories of the on into the fact that the story

0:31:11.250 --> 0:31:14.690
<v Speaker 1>is involving them and their friends, but also now brings

0:31:14.730 --> 0:31:19.170
<v Speaker 1>in grammar and math and other kinds of things as

0:31:19.490 --> 0:31:24.010
<v Speaker 1>they're doing it. Generate explanatory material of this kind of

0:31:24.170 --> 0:31:27.890
<v Speaker 1>education that works best for this audience, for this kind

0:31:27.890 --> 0:31:29.370
<v Speaker 1>of people, like we want to have this kind of

0:31:29.410 --> 0:31:31.130
<v Speaker 1>math or this kind of physics, or this kind of

0:31:31.170 --> 0:31:34.730
<v Speaker 1>history or this kind of poetry explained in the right way,

0:31:34.850 --> 0:31:37.530
<v Speaker 1>and also the style of language right like you know

0:31:37.730 --> 0:31:41.890
<v Speaker 1>native city X language. When OpenAI announced its breakthrough

0:31:41.930 --> 0:31:45.970
<v Speaker 1>program for text generation, it took the unusual step of

0:31:46.090 --> 0:31:49.130
<v Speaker 1>not releasing the full-powered version because it was worried

0:31:49.170 --> 0:31:52.810
<v Speaker 1>about the possible consequences. Now, part of the OpenAI

0:31:52.890 --> 0:31:56.330
<v Speaker 1>decision to say we're going to release a smaller model

0:31:56.570 --> 0:31:59.410
<v Speaker 1>than the one we did is because we think that

0:31:59.450 --> 0:32:02.250
<v Speaker 1>the deep fake problem hasn't been solved. And by the way,

0:32:02.450 --> 0:32:04.570
<v Speaker 1>some people complained about that, because they said, well, you're

0:32:04.570 --> 0:32:07.250
<v Speaker 1>slowing down our ability to do progress. And so for

0:32:07.410 --> 0:32:09.490
<v Speaker 1>the answer and say, look, when these are released

0:32:09.650 --> 0:32:13.130
<v Speaker 1>to the entire public, we cannot control the downside as

0:32:13.130 --> 0:32:20.650
<v Speaker 1>well as upsides. Downsides. From art to therapy to virtual

0:32:20.730 --> 0:32:26.850
<v Speaker 1>time travel, personalized stories and education, synthetic media has amazing upsides.

0:32:27.530 --> 0:32:34.690
<v Speaker 1>What could possibly go wrong? Chapter five, What could possibly

0:32:34.770 --> 0:32:39.330
<v Speaker 1>go wrong? The downsides are actually not hard to find.

0:32:39.890 --> 0:32:44.570
<v Speaker 1>The ability to reshape reality brings extraordinary power, and people

0:32:44.690 --> 0:32:49.210
<v Speaker 1>inevitably use power to control other people. It should be

0:32:49.250 --> 0:32:52.650
<v Speaker 1>no surprise, therefore, that ninety six percent of fake videos

0:32:52.690 --> 0:32:58.650
<v Speaker 1>posted online are non consensual pornography videos, almost always of

0:32:58.650 --> 0:33:03.450
<v Speaker 1>women manipulated to depict sex acts that never actually occurred.

0:33:04.210 --> 0:33:07.450
<v Speaker 1>I spoke with a professor who studies deep fakes, including

0:33:07.450 --> 0:33:11.930
<v Speaker 1>digital attempts to control women's bodies. I'm Danielle Citron and

0:33:12.170 --> 0:33:15.410
<v Speaker 1>I am a law professor at Boston University School of Law.

0:33:15.610 --> 0:33:20.610
<v Speaker 1>I write about privacy, technology, automation. My newest work and

0:33:20.690 --> 0:33:23.130
<v Speaker 1>my next book is going to be about sexual privacy.

0:33:24.130 --> 0:33:27.450
<v Speaker 1>So I've worked in and around consumer privacy, individual rights,

0:33:27.490 --> 0:33:30.490
<v Speaker 1>civil rights. I write a lot about free speech and

0:33:30.530 --> 0:33:34.890
<v Speaker 1>then automated systems. When did you first become aware of

0:33:35.010 --> 0:33:38.050
<v Speaker 1>deep fakes? Do you remember when this crossed your radar? I did. So,

0:33:38.570 --> 0:33:41.730
<v Speaker 1>there was a Reddit thread devoted to, you know, fake

0:33:42.090 --> 0:33:46.130
<v Speaker 1>pornography movies of Gal Gadot, Emma Watson. But the Reddit

0:33:46.170 --> 0:33:50.690
<v Speaker 1>thread sort of spooled not just from celebrities but ordinary people,

0:33:51.170 --> 0:33:53.570
<v Speaker 1>and so you had redditors asking each other, how do

0:33:53.650 --> 0:33:55.890
<v Speaker 1>I make a deep fake sex video of my ex-girlfriend?

0:33:55.930 --> 0:33:58.770
<v Speaker 1>I have thirty pictures? And then other redditors would chime

0:33:58.810 --> 0:34:01.850
<v Speaker 1>in and say, look at this YouTube tutorial. You can

0:34:01.930 --> 0:34:04.850
<v Speaker 1>absolutely make a deep fake sex video of your ex

0:34:05.330 --> 0:34:08.250
<v Speaker 1>with thirty pictures. I've done it with twenty. In November

0:34:08.370 --> 0:34:13.690
<v Speaker 1>two thousand seventeen, an anonymous redditor began posting synthesized porn

0:34:13.810 --> 0:34:18.130
<v Speaker 1>videos under the pseudonym deep fakes, perhaps a nod to

0:34:18.210 --> 0:34:21.450
<v Speaker 1>the deep learning technology used to create them as well

0:34:21.490 --> 0:34:26.250
<v Speaker 1>as the nineteen seventies porn film Deep Throat. The Internet

0:34:26.530 --> 0:34:30.570
<v Speaker 1>quickly adopted the term deep fakes and broadened its meanings

0:34:30.570 --> 0:34:34.970
<v Speaker 1>beyond pornography. To create the videos, he used celebrity faces

0:34:35.050 --> 0:34:39.410
<v Speaker 1>from Google image search and YouTube videos and then trained

0:34:39.450 --> 0:34:44.130
<v Speaker 1>an algorithm on that content together with pornographic videos. Have

0:34:44.250 --> 0:34:49.690
<v Speaker 1>you seen deep fake pornography videos? Yes, so still pretty crude,

0:34:49.730 --> 0:34:53.050
<v Speaker 1>so you probably can tell that it's a fake, but

0:34:53.370 --> 0:34:57.810
<v Speaker 1>for the person who's inserted into pornography, it's devastating. You

0:34:57.970 --> 0:35:03.170
<v Speaker 1>use the neural network technology, the artificial intelligence technology to

0:35:03.210 --> 0:35:08.930
<v Speaker 1>create out of digital whole cloth pornography videos using probably

0:35:09.370 --> 0:35:13.050
<v Speaker 1>real pornography and then inserting the person in the pornography

0:35:13.250 --> 0:35:15.970
<v Speaker 1>so they become the female actress. If it's a female,

0:35:16.010 --> 0:35:20.690
<v Speaker 1>it's usually a female in that video. My name is

0:35:20.810 --> 0:35:26.690
<v Speaker 1>Noel Martin and I am an activist and law reform

0:35:26.810 --> 0:35:31.650
<v Speaker 1>campaigner in Australia. Noel is twenty six years old and

0:35:31.770 --> 0:35:36.330
<v Speaker 1>she lives in Perth, Australia. So the first time that

0:35:36.490 --> 0:35:43.050
<v Speaker 1>I discovered myself on pornographic sites was when I was

0:35:43.290 --> 0:35:48.890
<v Speaker 1>eighteen and out of curiosity, decided to Google image reverse

0:35:48.890 --> 0:35:53.210
<v Speaker 1>search myself. In an instant, like in less than

0:35:53.250 --> 0:35:58.170
<v Speaker 1>a millisecond, my life completely changed. At first, it started

0:35:58.170 --> 0:36:02.610
<v Speaker 1>with photos, still images stolen from Noel's social media accounts.

0:36:03.090 --> 0:36:08.730
<v Speaker 1>They were then doctoring my face from ordinary images and

0:36:09.450 --> 0:36:14.650
<v Speaker 1>superimposing those onto the bodies of women depicting me having

0:36:14.690 --> 0:36:19.370
<v Speaker 1>sexual intercourse. It proved impossible to identify who was manipulating

0:36:19.450 --> 0:36:23.050
<v Speaker 1>Noel's image in this way. It's still unclear today, which

0:36:23.050 --> 0:36:25.850
<v Speaker 1>made it difficult for her to seek legal action. I

0:36:25.890 --> 0:36:31.730
<v Speaker 1>went to the police soon after, I contacted government agencies,

0:36:32.650 --> 0:36:36.770
<v Speaker 1>tried getting a private investigator. Essentially, there's nothing that they

0:36:36.770 --> 0:36:40.850
<v Speaker 1>could do. The sites are hosted overseas, the perpetrators are

0:36:40.890 --> 0:36:44.530
<v Speaker 1>probably overseas. The reaction was at the end of the day,

0:36:44.570 --> 0:36:48.050
<v Speaker 1>I think you can contact the webmasters to try and

0:36:48.090 --> 0:36:51.370
<v Speaker 1>get things deleted. You know, you can adjust your privacy

0:36:51.410 --> 0:36:56.210
<v Speaker 1>settings so that nothing is available to anyone publicly. It

0:36:56.250 --> 0:37:01.370
<v Speaker 1>was an unwinnable situation. Then things started to escalate. In

0:37:01.410 --> 0:37:05.570
<v Speaker 1>twenty eighteen, Noel saw a synthesized pornographic video of

0:37:05.570 --> 0:37:09.570
<v Speaker 1>herself and I believe that it was done for the

0:37:09.610 --> 0:37:16.050
<v Speaker 1>purposes of silencing me because I've been very public about

0:37:16.130 --> 0:37:19.970
<v Speaker 1>my story and advocating for change. So I had actually

0:37:20.050 --> 0:37:25.170
<v Speaker 1>gotten email from a fake email address, and you know,

0:37:25.210 --> 0:37:28.570
<v Speaker 1>I clicked the link. I was actually at work. It

0:37:28.650 --> 0:37:33.330
<v Speaker 1>was a video of me having sexual intercourse. The title

0:37:33.410 --> 0:37:36.770
<v Speaker 1>had my name, the face of the woman in it

0:37:37.330 --> 0:37:41.290
<v Speaker 1>was edited so that it was my face, and you know,

0:37:41.330 --> 0:37:46.050
<v Speaker 1>all the tags were like Noel Martin Australia, feminist, and

0:37:46.970 --> 0:37:51.930
<v Speaker 1>it didn't look real, but the context of everything with

0:37:52.010 --> 0:37:56.730
<v Speaker 1>the title my face, with the tags all points to

0:37:57.330 --> 0:38:00.730
<v Speaker 1>me being depicted in this video. The fakes were of

0:38:00.810 --> 0:38:05.290
<v Speaker 1>poor quality, but porn consumers aren't a discriminating lot,

0:38:05.730 --> 0:38:08.210
<v Speaker 1>and many people reacted to them as if they were real.

0:38:08.410 --> 0:38:12.330
<v Speaker 1>The public reaction was horrifying to me. I was

0:38:12.370 --> 0:38:16.050
<v Speaker 1>victim-blamed and slut-shamed, and it's definitely limited the course

0:38:16.090 --> 0:38:20.570
<v Speaker 1>of where I can go in terms of career and employment.

0:38:21.090 --> 0:38:24.810
<v Speaker 1>Noel finished a degree in law and began campaigning to

0:38:24.810 --> 0:38:29.650
<v Speaker 1>criminalize this sort of content. My advocacy and my activism

0:38:29.850 --> 0:38:32.730
<v Speaker 1>started off because I had a lived experience of this,

0:38:32.890 --> 0:38:36.570
<v Speaker 1>and I experienced it at a time where it wasn't

0:38:36.610 --> 0:38:43.450
<v Speaker 1>criminalized in Australia. The distribution of altered intimate images or

0:38:43.610 --> 0:38:49.090
<v Speaker 1>altered intimate videos and so I had to petition, meet

0:38:49.130 --> 0:38:53.210
<v Speaker 1>with my politicians in my area. I wrote a number

0:38:53.210 --> 0:38:55.970
<v Speaker 1>of articles, I spoke to the media, and I was

0:38:56.450 --> 0:39:00.130
<v Speaker 1>involved in the law reform in Australia in a number

0:39:00.130 --> 0:39:04.050
<v Speaker 1>of jurisdictions in Western Australia and New South Wales, and

0:39:04.490 --> 0:39:08.290
<v Speaker 1>I ended up being involved in two press conferences with

0:39:08.370 --> 0:39:12.570
<v Speaker 1>the attorneys general of each state at the announcement of

0:39:12.610 --> 0:39:18.530
<v Speaker 1>the law that was criminalizing this abuse. Today, in part

0:39:18.570 --> 0:39:22.090
<v Speaker 1>because of Noel's activism, it is illegal in Australia to

0:39:22.170 --> 0:39:27.210
<v Speaker 1>distribute intimate images without consent, including intimate images and videos

0:39:27.410 --> 0:39:31.530
<v Speaker 1>that have been altered. Although it doesn't encompass all malicious

0:39:31.570 --> 0:39:39.650
<v Speaker 1>synthetic media, Noel has made a solid start. Chapter six,

0:39:40.170 --> 0:39:45.810
<v Speaker 1>Scissors and Glue. The videos depicting Noel Martin were nowhere

0:39:45.930 --> 0:39:49.850
<v Speaker 1>near as sophisticated as those made by the Moon Disaster team.

0:39:50.290 --> 0:39:54.410
<v Speaker 1>They were more cheap fakes than deep fakes, and yet

0:39:54.490 --> 0:39:56.930
<v Speaker 1>the porn didn't have to be perfect to be devastating.

0:39:57.690 --> 0:40:00.690
<v Speaker 1>The same turns out to be true in politics. To

0:40:00.770 --> 0:40:05.490
<v Speaker 1>understand the power of fakes, you have to understand human psychology.

0:40:05.570 --> 0:40:08.250
<v Speaker 1>It turns out that people are pretty easy to fool.

0:40:09.250 --> 0:40:12.250
<v Speaker 1>Kerry was running for President of the US. His

0:40:12.650 --> 0:40:16.450
<v Speaker 1>stance on the Vietnam War was controversial. Jane Fonda, of course,

0:40:16.530 --> 0:40:18.770
<v Speaker 1>was a very controversial figure back then because of her

0:40:18.810 --> 0:40:21.890
<v Speaker 1>anti war stand. What have we become as a nation

0:40:21.930 --> 0:40:23.890
<v Speaker 1>if we call the men heroes that were used by

0:40:23.890 --> 0:40:26.410
<v Speaker 1>the Pentagon to try to exterminate an entire people? What

0:40:26.490 --> 0:40:28.610
<v Speaker 1>business have we to try to exterminate a people? And

0:40:28.690 --> 0:40:30.810
<v Speaker 1>somebody had created a photo of the two of them

0:40:30.890 --> 0:40:33.770
<v Speaker 1>sharing a stage at an anti-war rally with the

0:40:33.810 --> 0:40:37.090
<v Speaker 1>hopes of damaging the Kerry campaign. The photo was fake.

0:40:37.330 --> 0:40:39.650
<v Speaker 1>They had never shared a stage together. They just took

0:40:39.690 --> 0:40:42.770
<v Speaker 1>two images, probably put it into some standard photo editing

0:40:42.770 --> 0:40:46.090
<v Speaker 1>software like a Photoshop, and just put a headline around it,

0:40:46.170 --> 0:40:48.690
<v Speaker 1>and out to the world it went. And I will

0:40:48.730 --> 0:40:51.810
<v Speaker 1>tell you I remember the most fascinating interview I've heard

0:40:51.850 --> 0:40:55.650
<v Speaker 1>in a long time was right after the election, Kerry

0:40:55.690 --> 0:40:58.930
<v Speaker 1>of course lost, and a voter was being interviewed and

0:40:59.010 --> 0:41:01.570
<v Speaker 1>asked how they voted, and he said he couldn't vote

0:41:01.610 --> 0:41:04.050
<v Speaker 1>for Kerry, and the interviewer said, well, why not? And

0:41:04.170 --> 0:41:06.610
<v Speaker 1>the gentleman said, I couldn't get that photo of John

0:41:06.690 --> 0:41:09.810
<v Speaker 1>Kerry and Jane Fonda out of my head. And the interviewer said, well,

0:41:09.970 --> 0:41:12.090
<v Speaker 1>you know, that photo is fake, and the guy said,

0:41:12.170 --> 0:41:15.010
<v Speaker 1>much to my surprise, yes, but I couldn't get it

0:41:15.050 --> 0:41:18.330
<v Speaker 1>out of my mind. And this shows you the power

0:41:18.370 --> 0:41:21.450
<v Speaker 1>of visual imagery, Like even after I tell you something

0:41:21.530 --> 0:41:24.330
<v Speaker 1>is fake, it still had an impact on somebody, and

0:41:24.450 --> 0:41:27.650
<v Speaker 1>I thought, Wow, we're in a lot of trouble because

0:41:27.890 --> 0:41:30.050
<v Speaker 1>it's very very hard to put the cat back into

0:41:30.050 --> 0:41:32.810
<v Speaker 1>the bag. Once that content is out there, you can't

0:41:32.890 --> 0:41:37.650
<v Speaker 1>undo it. So seeing is believing, even above thinking? Yeah,

0:41:37.690 --> 0:41:40.050
<v Speaker 1>that seems to be the rule. There is very good

0:41:40.050 --> 0:41:43.330
<v Speaker 1>evidence from the social science literature that it's very very

0:41:43.370 --> 0:41:46.570
<v Speaker 1>difficult to correct the record after the mistakes are out there.

0:41:46.930 --> 0:41:50.650
<v Speaker 1>Law professor Danielle Citron also notes that humans tend to

0:41:50.690 --> 0:41:55.530
<v Speaker 1>pass on information without thinking, which triggers what she calls

0:41:55.930 --> 0:42:00.570
<v Speaker 1>information cascades. Information cascades is a phenomenon where we have

0:42:00.650 --> 0:42:03.850
<v Speaker 1>so much information overload that when someone sends us something,

0:42:03.970 --> 0:42:06.570
<v Speaker 1>some information, and we trust that person, we pass it on.

0:42:06.890 --> 0:42:10.610
<v Speaker 1>We don't even check its veracity, and so information can

0:42:10.650 --> 0:42:16.250
<v Speaker 1>go viral fairly quickly because we're not terribly reflective, because

0:42:16.250 --> 0:42:21.130
<v Speaker 1>we act on impulse. Danielle says that information cascades have

0:42:21.210 --> 0:42:24.010
<v Speaker 1>been given new life in the twenty first century through

0:42:24.090 --> 0:42:27.770
<v Speaker 1>social media. Think about the twentieth century phenomenon: where did

0:42:27.770 --> 0:42:33.090
<v Speaker 1>we get most of our information? From trusted sources, trusted newspapers,

0:42:33.530 --> 0:42:37.130
<v Speaker 1>a couple of trusted major TV channels. Growing up, we only

0:42:37.170 --> 0:42:40.210
<v Speaker 1>had you know, we didn't have a million, and they

0:42:41.130 --> 0:42:44.930
<v Speaker 1>were adhering to journalistic ethics and commitments to truth and

0:42:44.970 --> 0:42:48.570
<v Speaker 1>neutrality and the notion that you can't publish something without checking it.

0:42:49.090 --> 0:42:52.810
<v Speaker 1>Now we are publishing information that most people say. We're

0:42:52.850 --> 0:42:56.330
<v Speaker 1>relying on our peers and our friends. Social media platforms

0:42:56.370 --> 0:43:00.130
<v Speaker 1>are designed to tailor our information diet to what we

0:43:00.210 --> 0:43:03.610
<v Speaker 1>want and to our pre existing views, so we're locked

0:43:03.610 --> 0:43:07.490
<v Speaker 1>in a digital echo chamber. We think everybody agrees with us.

0:43:07.930 --> 0:43:12.010
<v Speaker 1>We pass on that information. We haven't checked the veracity.

0:43:12.090 --> 0:43:15.250
<v Speaker 1>It goes wild and we're especially likely to pass it

0:43:15.290 --> 0:43:18.850
<v Speaker 1>on if it's negative and novel. Why's that? It's just

0:43:19.010 --> 0:43:22.530
<v Speaker 1>like it's one of our weaknesses. We know how gossip

0:43:22.570 --> 0:43:26.850
<v Speaker 1>goes like wildfire online. So like Hillary Clinton is running

0:43:27.010 --> 0:43:31.530
<v Speaker 1>a sex ring. That's crazy. Oh my god, Eric, did

0:43:31.570 --> 0:43:34.170
<v Speaker 1>you hear about that. I'll post it on Facebook. Eric,

0:43:34.210 --> 0:43:37.410
<v Speaker 1>you pass it on. We just can't help ourselves, and

0:43:37.530 --> 0:43:40.650
<v Speaker 1>it is much in the way that we love sweets

0:43:40.650 --> 0:43:44.970
<v Speaker 1>and fats and pizza. You know, we indulge. We don't think.

0:43:45.890 --> 0:43:49.570
<v Speaker 1>In some sense, this phenomenon is an old phenomenon. Right,

0:43:49.770 --> 0:43:54.250
<v Speaker 1>there's the famous observation by Mark Twain about how a

0:43:54.330 --> 0:43:57.170
<v Speaker 1>lie gets halfway around the world before the truth gets

0:43:57.170 --> 0:43:59.010
<v Speaker 1>its pants on. Yeah, the truth's still in the bedroom

0:43:59.010 --> 0:44:03.130
<v Speaker 1>getting dressed, and we often will see the lie, but

0:44:03.210 --> 0:44:08.650
<v Speaker 1>the rebuttal is not seen. It's often lost in the

0:44:08.770 --> 0:44:12.650
<v Speaker 1>noise ways of the defamatory statements. That is not new.

0:44:12.730 --> 0:44:15.930
<v Speaker 1>But what is new is a number of things about

0:44:15.970 --> 0:44:28.610
<v Speaker 1>our information ecosystem are force multipliers. Chapter seven, Truth Decay.

0:44:30.090 --> 0:44:34.050
<v Speaker 1>Many experts are worried that the rapid advances in making fakes,

0:44:34.090 --> 0:44:38.650
<v Speaker 1>combined with a catalyst of information cascades, will undermine democracy.

0:44:39.330 --> 0:44:44.250
<v Speaker 1>The biggest concerns have focused on elections. Globally, we are

0:44:44.250 --> 0:44:50.610
<v Speaker 1>looking at highly polarized situations where this kind of manipulated

0:44:50.690 --> 0:44:52.770
<v Speaker 1>media can be used as a weapon. One of the

0:44:52.770 --> 0:44:56.690
<v Speaker 1>main reasons Francesca and Halsey made their Nixon deep fake

0:44:57.290 --> 0:45:01.050
<v Speaker 1>was to spread awareness about the risks of misinformation campaigns

0:45:01.530 --> 0:45:05.690
<v Speaker 1>before the twenty twenty US presidential election. Similarly, a group

0:45:05.770 --> 0:45:09.170
<v Speaker 1>showcased the power of deep fakes by making videos in the

0:45:09.290 --> 0:45:12.690
<v Speaker 1>run up to the UK parliamentary election showing the two

0:45:12.730 --> 0:45:17.730
<v Speaker 1>bitter rivals, Boris Johnson and Jeremy Corbyn, each endorsing the other.

0:45:18.330 --> 0:45:21.330
<v Speaker 1>I wish to rise above this divide and endorse my

0:45:21.410 --> 0:45:25.530
<v Speaker 1>worthy opponent, the Right Honorable Jeremy Corbyn, to be Prime Minister

0:45:25.770 --> 0:45:29.810
<v Speaker 1>of our United Kingdom. Back Boris Johnson to continue as

0:45:29.810 --> 0:45:33.610
<v Speaker 1>our Prime Minister. But you know what, don't listen to me.

0:45:33.930 --> 0:45:36.210
<v Speaker 1>I think I may be one of the thousands of

0:45:36.290 --> 0:45:40.530
<v Speaker 1>deep fakes on the Internet, using powerful technologies to tell

0:45:40.650 --> 0:45:44.850
<v Speaker 1>stories that aren't so. This just kind of indicates how

0:45:45.250 --> 0:45:50.130
<v Speaker 1>candidates and political figures can be misrepresented, and you just

0:45:50.170 --> 0:45:54.770
<v Speaker 1>need to feed them into people's social media feeds for

0:45:54.810 --> 0:45:57.370
<v Speaker 1>them to be seeing this at times when the stakes

0:45:57.370 --> 0:46:01.570
<v Speaker 1>are pretty high. So far, we haven't yet seen sophisticated

0:46:01.650 --> 0:46:05.770
<v Speaker 1>deep fakes in US or UK politics. That might be

0:46:05.770 --> 0:46:08.770
<v Speaker 1>because fakes will be most effective if they're timed for

0:46:09.170 --> 0:46:13.130
<v Speaker 1>maximum chaos, say close to election day, when newsrooms won't

0:46:13.130 --> 0:46:16.570
<v Speaker 1>have the time to investigate and debunk them. But another

0:46:16.610 --> 0:46:20.810
<v Speaker 1>reason might be that cheap fakes made with basic video

0:46:20.970 --> 0:46:25.210
<v Speaker 1>editing software are actually pretty effective. Remember the video that

0:46:25.330 --> 0:46:28.850
<v Speaker 1>surfaced of House Speaker Nancy Pelosi, in which she appeared

0:46:28.890 --> 0:46:34.050
<v Speaker 1>intoxicated and confused. We want to give this president the

0:46:34.250 --> 0:46:42.010
<v Speaker 1>opportunity to do something historic for our country. Both President Trump

0:46:42.050 --> 0:46:45.290
<v Speaker 1>and Rudy Giuliani shared the video as fact on Twitter.

0:46:45.930 --> 0:46:48.930
<v Speaker 1>The video is just a cheap fake. Someone just slowed down

0:46:49.010 --> 0:46:53.810
<v Speaker 1>Pelosi's speech to make her seem incompetent. But maybe elections

0:46:54.450 --> 0:46:58.010
<v Speaker 1>won't be the biggest targets. Some people worry that deep

0:46:58.090 --> 0:47:03.690
<v Speaker 1>fakes could be weaponized to foment international conflict. Berkeley professor

0:47:03.730 --> 0:47:06.610
<v Speaker 1>Hany Farid has been working with the US government's Media

0:47:06.690 --> 0:47:11.330
<v Speaker 1>Forensics program to address this issue. DARPA, the Defense Department's

0:47:11.570 --> 0:47:13.730
<v Speaker 1>research arm, has been pouring a lot of money over

0:47:13.770 --> 0:47:16.970
<v Speaker 1>the last five years into this program. They are very

0:47:17.050 --> 0:47:21.290
<v Speaker 1>concerned about how this technology can be a threat to

0:47:21.370 --> 0:47:24.490
<v Speaker 1>national security and also how when we get images and

0:47:24.570 --> 0:47:26.770
<v Speaker 1>videos from around the world in areas of conflict, do

0:47:26.850 --> 0:47:28.850
<v Speaker 1>we know if they're real or not? Is this really

0:47:28.850 --> 0:47:32.290
<v Speaker 1>an image of a US soldier who has been taken hostage?

0:47:32.570 --> 0:47:34.770
<v Speaker 1>How do we know? So? What do you see as

0:47:34.850 --> 0:47:38.090
<v Speaker 1>some of the worst case scenarios. Here's the things that

0:47:38.170 --> 0:47:41.170
<v Speaker 1>keep me up at night. Right. A video of Donald

0:47:41.250 --> 0:47:45.010
<v Speaker 1>Trump saying I've launched nuclear weapons against Iran, and before

0:47:45.010 --> 0:47:46.970
<v Speaker 1>anybody gets around to figuring out whether this is real

0:47:47.050 --> 0:47:49.250
<v Speaker 1>or not, we have global nuclear meltdown. And here's

0:47:49.250 --> 0:47:52.570
<v Speaker 1>the thing. I don't think that that's likely, but I

0:47:52.610 --> 0:47:55.730
<v Speaker 1>also don't think that the probability of that is zero.

0:47:56.330 --> 0:48:00.730
<v Speaker 1>And that should worry us because while it's not likely,

0:48:00.810 --> 0:48:06.010
<v Speaker 1>the consequences are spectacularly bad. Lawyer Danielle Citron worries about

0:48:06.010 --> 0:48:10.290
<v Speaker 1>an even more plausible scenario. And imagine a deep fake

0:48:10.690 --> 0:48:15.050
<v Speaker 1>of a well known American general burning a Koran and

0:48:15.170 --> 0:48:19.410
<v Speaker 1>it is timed at a very tense moment in a

0:48:19.450 --> 0:48:25.330
<v Speaker 1>particular Muslim, you know, country, whether it's Afghanistan. It could

0:48:25.370 --> 0:48:28.650
<v Speaker 1>then lead to physical violence. And you think this could

0:48:28.730 --> 0:48:32.370
<v Speaker 1>be made. No general, no Koran actually used in the video,

0:48:32.490 --> 0:48:37.890
<v Speaker 1>just programmed. You can use the technology to mine existing photographs.

0:48:37.930 --> 0:48:40.410
<v Speaker 1>Kind of easy, especially with someone like take Jim Mattis

0:48:40.810 --> 0:48:44.490
<v Speaker 1>when he was our defense secretary. Of Jim Mattis, you know,

0:48:44.770 --> 0:48:47.130
<v Speaker 1>actually taking a Koran and ripping it in half and

0:48:47.130 --> 0:48:52.810
<v Speaker 1>saying all Muslims should die. Imagine the chaos in diplomacy,

0:48:53.210 --> 0:48:58.050
<v Speaker 1>the chaos of our soldiers abroad in Muslim countries. It

0:48:58.130 --> 0:49:01.810
<v Speaker 1>would be inciting violence without question. While we haven't yet

0:49:01.810 --> 0:49:06.250
<v Speaker 1>seen spectacular fake videos used to disrupt elections or create

0:49:06.290 --> 0:49:12.970
<v Speaker 1>international chaos, we have seen increasingly sophisticated attacks on public policymaking.

0:49:13.530 --> 0:49:16.690
<v Speaker 1>So we've got an example in twenty seventeen where the

0:49:16.810 --> 0:49:21.970
<v Speaker 1>FCC solicited public comment on the proposal to repeal net neutrality.

0:49:22.450 --> 0:49:26.570
<v Speaker 1>Net neutrality is the principle that internet service providers should

0:49:26.610 --> 0:49:31.370
<v Speaker 1>be a neutral public utility. They shouldn't discriminate between websites,

0:49:31.650 --> 0:49:35.530
<v Speaker 1>say slowing down Netflix streaming to encourage you to purchase

0:49:35.570 --> 0:49:40.410
<v Speaker 1>a different online video service. As President Barack Obama described

0:49:40.450 --> 0:49:44.210
<v Speaker 1>in twenty fourteen, there are no gatekeepers deciding which sites

0:49:44.250 --> 0:49:47.210
<v Speaker 1>you get to access. There are no toll roads on

0:49:47.250 --> 0:49:51.770
<v Speaker 1>the information superhighway. Federal communications policy had long supported

0:49:51.850 --> 0:49:56.730
<v Speaker 1>net neutrality, but in twenty seventeen, the Trump administration favored

0:49:56.810 --> 0:50:01.130
<v Speaker 1>repealing the policy. There were twenty two million comments that

0:50:01.290 --> 0:50:05.890
<v Speaker 1>the FCC received, but ninety six percent of those were

0:50:05.930 --> 0:50:10.930
<v Speaker 1>actually fake. The interesting thing is the real comments were

0:50:10.930 --> 0:50:15.330
<v Speaker 1>opposed to repeal, whereas the fake comments were in favor.

0:50:15.690 --> 0:50:19.810
<v Speaker 1>A Wall Street Journal investigation exposed that the fake public

0:50:19.850 --> 0:50:24.210
<v Speaker 1>comments were generated by bots. It found similar problems with

0:50:24.250 --> 0:50:28.250
<v Speaker 1>public comments about payday lending. The bots varied their

0:50:28.290 --> 0:50:33.050
<v Speaker 1>comments in a combinatorial fashion so that the content wasn't identical.

0:50:33.690 --> 0:50:36.090
<v Speaker 1>With a little sleuthing, though, you could see that they

0:50:36.090 --> 0:50:40.570
<v Speaker 1>were generated by computers. But with the technology increasingly able

0:50:40.610 --> 0:50:45.210
<v Speaker 1>to generate completely original writing, like OpenAI's program that

0:50:45.250 --> 0:50:48.570
<v Speaker 1>wrote the story about unicorns in the Andes, it's going

0:50:48.610 --> 0:50:51.930
<v Speaker 1>to become hard to spot the fakes. So there was

0:50:51.970 --> 0:50:55.850
<v Speaker 1>this Harvard student, Max Weiss, who used GPT two to

0:50:55.930 --> 0:50:58.330
<v Speaker 1>kind of demonstrate this, And I went on his site

0:50:58.410 --> 0:51:01.970
<v Speaker 1>yesterday and he's got this little test where you need

0:51:02.130 --> 0:51:07.130
<v Speaker 1>to decide whether a comment is real or fake. So

0:51:07.290 --> 0:51:09.330
<v Speaker 1>you go on and you read it and you decide

0:51:09.370 --> 0:51:11.570
<v Speaker 1>whether it's been written by a bot or by a human.

0:51:12.450 --> 0:51:15.570
<v Speaker 1>So I did this, and the ones that seemed to

0:51:15.610 --> 0:51:19.370
<v Speaker 1>be really well written and quite narrative, discursive, generally I

0:51:19.410 --> 0:51:21.970
<v Speaker 1>was picking them as human. I was wrong almost all

0:51:21.970 --> 0:51:26.170
<v Speaker 1>the time. It was amazing and alarming. In our democracy,

0:51:26.250 --> 0:51:29.370
<v Speaker 1>public comments have been an important way in which citizens

0:51:29.410 --> 0:51:33.370
<v Speaker 1>can make their voices heard, but now it's becoming easy

0:51:33.490 --> 0:51:38.210
<v Speaker 1>to drown out those voices with millions of fake opinions. Now,

0:51:38.250 --> 0:51:41.610
<v Speaker 1>the downfall of truth likely won't come with a bang,

0:51:41.970 --> 0:51:46.810
<v Speaker 1>but a whimper, a slow, steady erosion that some call

0:51:47.410 --> 0:51:50.090
<v Speaker 1>truth decay. If you can't believe anything you read, or

0:51:50.130 --> 0:51:52.250
<v Speaker 1>hear or see anymore, I don't know how you have

0:51:52.290 --> 0:51:54.770
<v Speaker 1>a democracy. I don't know, frankly, how we have

0:51:54.850 --> 0:51:57.650
<v Speaker 1>civilized society if everybody's going to live in an echo

0:51:57.730 --> 0:52:00.850
<v Speaker 1>chamber believing their own version of events. How do we

0:52:00.890 --> 0:52:03.370
<v Speaker 1>have a dialogue if we can't agree on basic facts.

0:52:03.850 --> 0:52:07.490
<v Speaker 1>In the end, the most insidious impact of deep fakes

0:52:07.930 --> 0:52:11.170
<v Speaker 1>may not be the deep fake content itself, but the

0:52:11.170 --> 0:52:15.450
<v Speaker 1>ability to claim that real content is fake. It's something

0:52:15.450 --> 0:52:19.730
<v Speaker 1>that Danielle Citron refers to as the liar's dividend. The

0:52:19.850 --> 0:52:23.050
<v Speaker 1>liar's dividend is that the more you educate people about

0:52:23.090 --> 0:52:26.570
<v Speaker 1>the phenomenon of deep fakes, the more the wrongdoer can

0:52:26.610 --> 0:52:31.210
<v Speaker 1>disclaim reality. Think about what President Trump did with the

0:52:31.290 --> 0:52:34.770
<v Speaker 1>Access Hollywood tape. You know, I'm automatically attracted to beautiful

0:52:34.810 --> 0:52:37.330
<v Speaker 1>I just start kissing them. It's like a magnet. Just kiss.

0:52:38.370 --> 0:52:40.090
<v Speaker 1>I don't even wait. And when you're a star, they

0:52:40.210 --> 0:52:42.530
<v Speaker 1>let you do it. You can do anything whatever you want.

0:52:42.610 --> 0:52:47.490
<v Speaker 1>Grab them by the pro I can do anything. Initially,

0:52:47.690 --> 0:52:51.730
<v Speaker 1>Trump apologized for the remarks. Anyone who knows me knows

0:52:51.810 --> 0:52:55.690
<v Speaker 1>these words don't reflect who I am. I said it,

0:52:56.130 --> 0:53:00.250
<v Speaker 1>I was wrong, and I apologize. But in twenty seventeen,

0:53:00.770 --> 0:53:04.330
<v Speaker 1>a year after his initial apology and with the idea

0:53:04.330 --> 0:53:08.570
<v Speaker 1>of deep fake content starting to gain attention, Trump changed

0:53:08.650 --> 0:53:11.770
<v Speaker 1>his tune. Upon reflection, he said, they're not real. That

0:53:11.890 --> 0:53:14.330
<v Speaker 1>wasn't me. I don't think that was my voice. That's

0:53:14.370 --> 0:53:18.810
<v Speaker 1>the liar's dividend in practice. The Trump comment about Access

0:53:18.810 --> 0:53:24.050
<v Speaker 1>Hollywood was remarkable. Slightly more subtle than that, he said,

0:53:24.490 --> 0:53:27.290
<v Speaker 1>I'm not sure that was me. Right, Well, that's the

0:53:27.370 --> 0:53:42.370
<v Speaker 1>corrosive gaslighting. Chapter eight, A Life Stored in the Cloud.

0:53:44.290 --> 0:53:48.650
<v Speaker 1>Deep fakes have the potential to devastate individuals and harm society.

0:53:49.250 --> 0:53:53.050
<v Speaker 1>The question is can we stop them from spreading before

0:53:53.090 --> 0:53:56.330
<v Speaker 1>they get out of control. To do so, we'd need

0:53:56.410 --> 0:54:00.570
<v Speaker 1>reliable ways to spot deep fakes. So the good news

0:54:00.650 --> 0:54:03.530
<v Speaker 1>is there are still artifacts in the synthesized content, whether

0:54:03.570 --> 0:54:06.170
<v Speaker 1>those are images, audio, or a video, that we as

0:54:06.210 --> 0:54:09.170
<v Speaker 1>the experts, can tell apart. So when, for example, The

0:54:09.170 --> 0:54:11.370
<v Speaker 1>New York Times wants to run a story with a video,

0:54:11.930 --> 0:54:14.610
<v Speaker 1>we can help them validate it. What are the real

0:54:14.730 --> 0:54:18.970
<v Speaker 1>sophisticated experts looking at? Yeah, so the eyes are really wonderful

0:54:19.130 --> 0:54:22.850
<v Speaker 1>forensically because they reflect back to you what is in

0:54:22.850 --> 0:54:26.290
<v Speaker 1>the scene. So I'm sitting now right now in a studio,

0:54:26.450 --> 0:54:28.770
<v Speaker 1>there's maybe about a dozen or so lights around me,

0:54:28.810 --> 0:54:31.170
<v Speaker 1>and you can see this very complex set of reflections

0:54:31.170 --> 0:54:35.210
<v Speaker 1>in my eyes. So we can analyze fairly complex lighting patterns,

0:54:35.210 --> 0:54:38.170
<v Speaker 1>for example, to determine if this is one person's head

0:54:38.250 --> 0:54:41.170
<v Speaker 1>spliced onto another person's body, or if the two people

0:54:41.210 --> 0:54:45.530
<v Speaker 1>standing next to each other were digitally inserted from another photograph.

0:54:45.650 --> 0:54:48.250
<v Speaker 1>I could spend another hour telling you about the many

0:54:48.250 --> 0:54:52.130
<v Speaker 1>different forensic techniques that we've developed. There's no silver bullet here.

0:54:52.130 --> 0:54:55.050
<v Speaker 1>It really is sort of a time consuming and deliberate

0:54:55.090 --> 0:54:58.450
<v Speaker 1>and thoughtful and it requires many many tools, and it

0:54:58.490 --> 0:55:00.530
<v Speaker 1>requires people with a fair amount of skill to do this.

0:55:01.090 --> 0:55:04.370
<v Speaker 1>Hany Farid also has quite a few detection techniques that

0:55:04.450 --> 0:55:07.050
<v Speaker 1>he won't speak about publicly for fear that the deep

0:55:07.050 --> 0:55:10.450
<v Speaker 1>fake creators will learn how to beat his tests. I don't

0:55:10.490 --> 0:55:12.770
<v Speaker 1>create a GitHub repository and give my code to all

0:55:12.810 --> 0:55:16.050
<v Speaker 1>my adversaries. I don't have just one forensic technique. I

0:55:16.090 --> 0:55:18.730
<v Speaker 1>have a couple dozen of them. So that means you,

0:55:18.970 --> 0:55:21.250
<v Speaker 1>as the person creating this now have to go back

0:55:21.250 --> 0:55:24.410
<v Speaker 1>and implement twenty different techniques and you have to do

0:55:24.450 --> 0:55:27.330
<v Speaker 1>it just perfectly, and that makes the landscape a little

0:55:27.330 --> 0:55:30.570
<v Speaker 1>bit more tricky for you to manage. As technology makes

0:55:30.570 --> 0:55:34.050
<v Speaker 1>it easier to create deep fakes, a big problem will

0:55:34.090 --> 0:55:37.450
<v Speaker 1>be the sheer amounts of content to review. So the

0:55:37.490 --> 0:55:41.330
<v Speaker 1>average person can download software repositories, and so it's getting

0:55:41.330 --> 0:55:44.370
<v Speaker 1>to the point now where the average person can just

0:55:44.490 --> 0:55:47.170
<v Speaker 1>run these as if they're running any standard piece of software.

0:55:47.210 --> 0:55:50.050
<v Speaker 1>There's also websites that have popped up where you can

0:55:50.090 --> 0:55:52.530
<v Speaker 1>pay them twenty bucks and you tell them, please put

0:55:52.530 --> 0:55:54.890
<v Speaker 1>this person's face into this person's video, and they will

0:55:54.930 --> 0:55:57.330
<v Speaker 1>do that for you. And so it doesn't take a

0:55:57.370 --> 0:55:59.850
<v Speaker 1>lot to get access to these tools. Now, I will

0:55:59.890 --> 0:56:02.210
<v Speaker 1>say that the output of those are not quite as

0:56:02.250 --> 0:56:04.690
<v Speaker 1>good as what we can create inside the lab. And

0:56:04.770 --> 0:56:06.490
<v Speaker 1>you just know what the trend is. You just know

0:56:06.570 --> 0:56:08.770
<v Speaker 1>it's going to get better and cheaper and faster and

0:56:08.810 --> 0:56:12.130
<v Speaker 1>easy to use. Detecting deep fakes will be a never

0:56:12.370 --> 0:56:17.210
<v Speaker 1>ending cat and mouse game. Remember how generative adversarial networks

0:56:17.290 --> 0:56:21.650
<v Speaker 1>or GANs, are built by training a fake generator to

0:56:21.730 --> 0:56:27.250
<v Speaker 1>outsmart a detector. Well. As detectors get better, fake generators

0:56:27.330 --> 0:56:31.850
<v Speaker 1>will be trained to keep pace. Still, detectives like Hany

0:56:31.970 --> 0:56:36.090
<v Speaker 1>and platforms like Facebook are working to develop automated ways

0:56:36.130 --> 0:56:40.810
<v Speaker 1>to spot deep fakes rapidly and reliably. That's important because

0:56:40.850 --> 0:56:44.970
<v Speaker 1>more than five hundred additional hours of video are being

0:56:45.090 --> 0:56:48.690
<v Speaker 1>uploaded to YouTube every minute. I don't mean to sound

0:56:48.730 --> 0:56:51.730
<v Speaker 1>defeatist about this, but I'm going to lose this war.

0:56:51.970 --> 0:56:54.850
<v Speaker 1>I know this because it's always going to be easier

0:56:54.890 --> 0:56:57.130
<v Speaker 1>to create content than it is to detect it. But

0:56:58.130 --> 0:57:00.290
<v Speaker 1>here's where I will win. I will take it out

0:57:00.290 --> 0:57:03.050
<v Speaker 1>of the hands of the average person. So think about,

0:57:03.090 --> 0:57:06.690
<v Speaker 1>for example, the creation of counterfeit currency. With the latest

0:57:06.810 --> 0:57:10.090
<v Speaker 1>innovations brought on by the Treasury Department, it is hard

0:57:10.130 --> 0:57:12.290
<v Speaker 1>for the average person to take their inkjet printer and

0:57:12.370 --> 0:57:15.730
<v Speaker 1>create compelling fake currency. And I think that's going to

0:57:15.770 --> 0:57:18.130
<v Speaker 1>be the same trend here is that if you're using

0:57:18.130 --> 0:57:20.330
<v Speaker 1>some off the shelf tool, if you're paying somebody on

0:57:20.370 --> 0:57:22.090
<v Speaker 1>the website, we're going to find you, and we're going

0:57:22.130 --> 0:57:24.490
<v Speaker 1>to find you quickly. But if you are a dedicated,

0:57:24.690 --> 0:57:27.410
<v Speaker 1>highly skilled actor with the time and the effort to create it,

0:57:27.690 --> 0:57:29.530
<v Speaker 1>we are going to have to work really hard to

0:57:29.570 --> 0:57:34.210
<v Speaker 1>detect those. Given the challenges of detecting fake content, some

0:57:34.290 --> 0:57:38.130
<v Speaker 1>people envision a different kind of techno fix. They propose

0:57:38.250 --> 0:57:42.930
<v Speaker 1>developing airtight ways for content creators to mark their own

0:57:42.930 --> 0:57:48.370
<v Speaker 1>original video as real. That way, we could instantly recognize

0:57:48.410 --> 0:57:51.850
<v Speaker 1>an altered version if it wasn't identical. Now there's ways

0:57:51.850 --> 0:57:54.410
<v Speaker 1>of authenticating at the point of recording, and these are

0:57:54.450 --> 0:57:58.010
<v Speaker 1>what are called controlled capture systems. So here's the idea.

0:57:58.250 --> 0:58:01.330
<v Speaker 1>You use a special app on your mobile device that,

0:58:01.410 --> 0:58:05.250
<v Speaker 1>at the point of capture, cryptographically signs the image

0:58:05.250 --> 0:58:07.970
<v Speaker 1>or the video or the audio. It puts that signature

0:58:07.970 --> 0:58:09.930
<v Speaker 1>onto the blockchain. The only thing you have to know

0:58:09.930 --> 0:58:13.370
<v Speaker 1>about the blockchain is that it is an immutable distributed ledger,

0:58:13.450 --> 0:58:16.930
<v Speaker 1>which means that that signature is essentially impossible to manipulate.

0:58:17.490 --> 0:58:20.850
<v Speaker 1>And now all of that happened at the point of recording.

0:58:21.330 --> 0:58:23.370
<v Speaker 1>If I was running a campaign today and I was

0:58:23.450 --> 0:58:27.610
<v Speaker 1>worried about my candidate's likeness being misused, absolutely every public

0:58:27.610 --> 0:58:29.730
<v Speaker 1>event that they were at, I would record with a

0:58:29.770 --> 0:58:31.770
<v Speaker 1>controlled capture system and I'd be able to prove what

0:58:31.810 --> 0:58:35.250
<v Speaker 1>they actually said or did at any point in the future.

0:58:35.690 --> 0:58:39.370
<v Speaker 1>So this approach would shift the burden of authentication to

0:58:39.410 --> 0:58:43.650
<v Speaker 1>the people creating the videos rather than publishers or consumers.

0:58:44.530 --> 0:58:48.170
<v Speaker 1>Law professor Danielle Citron has explored how this solution could

0:58:48.290 --> 0:58:52.410
<v Speaker 1>quickly become dystopian. We might see the emergence of an

0:58:52.530 --> 0:58:54.810
<v Speaker 1>essentially an audit trail of everything you do and say

0:58:54.850 --> 0:58:57.810
<v Speaker 1>all of the time. Danielle refers to the business model

0:58:57.930 --> 0:59:02.010
<v Speaker 1>as immutable lifelogs in the cloud. In a way we've

0:59:02.090 --> 0:59:05.410
<v Speaker 1>sort of already seen it. There are health plans that

0:59:05.490 --> 0:59:07.570
<v Speaker 1>if you wear a fitbit all the time and you

0:59:07.650 --> 0:59:10.290
<v Speaker 1>let yourself be monitored, it lowers your insurance, you know, your

0:59:10.330 --> 0:59:13.370
<v Speaker 1>health insurance rates. But you can see how if the

0:59:13.450 --> 0:59:17.530
<v Speaker 1>incentives are there in the market to self surveil, whether

0:59:17.570 --> 0:59:22.050
<v Speaker 1>it's for health insurance, life insurance, car insurance, we're going

0:59:22.090 --> 0:59:25.650
<v Speaker 1>to see the unraveling of privacy by ourselves.

0:59:26.050 --> 0:59:30.850
<v Speaker 1>You know, corporations may very well, because the CEO is

0:59:30.930 --> 0:59:35.050
<v Speaker 1>so valuable, they may say you've got to have a log,

0:59:35.370 --> 0:59:37.530
<v Speaker 1>an immutable audit trail of everything you do and say.

0:59:37.570 --> 0:59:39.770
<v Speaker 1>So when that deep fake comes up the night before

0:59:39.770 --> 0:59:43.970
<v Speaker 1>the IPO, you can say, look, the CEO wasn't taking

0:59:44.010 --> 0:59:47.290
<v Speaker 1>the bribe wasn't having sex with a prostitute, and so

0:59:47.330 --> 0:59:50.930
<v Speaker 1>we have proof because we have an audit trail, we

0:59:50.930 --> 0:59:53.690
<v Speaker 1>have a log. So when we were imagining, we were

0:59:53.690 --> 0:59:58.410
<v Speaker 1>imagining a business model that hasn't quite come up, but

0:59:58.450 --> 1:00:02.570
<v Speaker 1>we have gotten a number of requests from insurance companies

1:00:03.010 --> 1:00:06.730
<v Speaker 1>as well as companies to say we're interested in this idea.

1:00:06.850 --> 1:00:08.490
<v Speaker 1>So how much has to be in that log? Does

1:00:08.490 --> 1:00:10.610
<v Speaker 1>this have to be a whole video of your life?

1:00:10.650 --> 1:00:13.570
<v Speaker 1>That is a great question, one that terrifies us. So

1:00:13.610 --> 1:00:19.130
<v Speaker 1>it may be that you're logging geolocation, you're logging videos,

1:00:19.170 --> 1:00:22.050
<v Speaker 1>you see people talking and who they're interacting with, and

1:00:22.130 --> 1:00:25.170
<v Speaker 1>that might be good enough to prevent the mischief that

1:00:25.450 --> 1:00:30.970
<v Speaker 1>would hijack the IPO. Your whole life online, yes, stored securely,

1:00:31.250 --> 1:00:35.170
<v Speaker 1>locked down, protected in the cloud. It is, at

1:00:35.250 --> 1:00:38.290
<v Speaker 1>least for a privacy scholar. There are so many reasons

1:00:38.290 --> 1:00:42.210
<v Speaker 1>why we ought to have privacy that aren't about hiding things.

1:00:42.850 --> 1:00:47.530
<v Speaker 1>It's about creating spaces and managing boundaries around ourselves and

1:00:47.650 --> 1:00:51.650
<v Speaker 1>our intimates and our loved ones. So I worry that

1:00:51.690 --> 1:00:55.570
<v Speaker 1>if we entirely unravel privacy, A, in the wrong hands

1:00:56.050 --> 1:01:00.970
<v Speaker 1>it's very dangerous, right? B, it changes how we think

1:01:00.970 --> 1:01:10.770
<v Speaker 1>about ourselves and humanity. Chapter nine, Section two thirty. So

1:01:10.890 --> 1:01:15.730
<v Speaker 1>technofixes are complicated. What about passing laws to ban deep

1:01:15.810 --> 1:01:19.090
<v Speaker 1>fakes, or at least deep fakes that don't disclose they're

1:01:19.090 --> 1:01:22.850
<v Speaker 1>fake? So video and audio is speech, and our

1:01:22.850 --> 1:01:26.730
<v Speaker 1>First Amendment doctrine is very much protective of free speech,

1:01:27.010 --> 1:01:30.610
<v Speaker 1>and the Supreme Court has explained that lies just lies

1:01:30.650 --> 1:01:34.930
<v Speaker 1>themselves without harm is protected speech. When lies cause certain

1:01:34.970 --> 1:01:42.250
<v Speaker 1>kinds of harm, we can regulate it. Defamation of private people, threats, incitement, fraud,

1:01:42.890 --> 1:01:47.410
<v Speaker 1>impersonation of government officials. What about lies concerning public figures

1:01:47.450 --> 1:01:53.010
<v Speaker 1>like politicians? California and Texas, for instance, recently passed laws

1:01:53.090 --> 1:01:56.490
<v Speaker 1>making it illegal to publish deep fakes of a candidate

1:01:56.770 --> 1:01:59.370
<v Speaker 1>in the weeks leading up to an election. It's not

1:01:59.490 --> 1:02:03.650
<v Speaker 1>clear yet whether the laws will pass constitutional muster. As

1:02:03.690 --> 1:02:07.810
<v Speaker 1>you're saying, in an American context, we are just not

1:02:07.930 --> 1:02:10.730
<v Speaker 1>going to be able to outlaw deep fakes. Yeah, we

1:02:10.770 --> 1:02:12.810
<v Speaker 1>can't have a flat ban, and I don't think we should.

1:02:12.970 --> 1:02:16.970
<v Speaker 1>It would fail on doctrinal grounds, but ultimately it would

1:02:17.010 --> 1:02:23.850
<v Speaker 1>prevent the positive uses. Interestingly, in January twenty twenty, China,

1:02:24.050 --> 1:02:28.970
<v Speaker 1>which has no First Amendment protecting free speech promulgated regulations

1:02:29.370 --> 1:02:33.450
<v Speaker 1>banning deep fakes. The use of AI or virtual reality now

1:02:33.570 --> 1:02:36.650
<v Speaker 1>needs to be clearly marked in a prominent manner, and

1:02:36.770 --> 1:02:40.090
<v Speaker 1>the failure to do so is considered a criminal offense.

1:02:40.930 --> 1:02:43.970
<v Speaker 1>To explore other options for the US, I went to

1:02:44.010 --> 1:02:47.650
<v Speaker 1>speak with a public policy expert. My name is Joan Donovan,

1:02:47.770 --> 1:02:51.370
<v Speaker 1>and I work at Harvard Kennedy's Shorenstein Center, where I

1:02:51.490 --> 1:02:54.730
<v Speaker 1>lead a team of researchers looking at media manipulation and

1:02:54.770 --> 1:02:58.530
<v Speaker 1>disinformation campaigns. Joan is head of the Technology and Social

1:02:58.610 --> 1:03:03.210
<v Speaker 1>Change Research Project, and her staff studies how social media

1:03:03.530 --> 1:03:07.610
<v Speaker 1>gives rise to hoaxes and scams. Her team is particularly

1:03:07.610 --> 1:03:13.450
<v Speaker 1>interested in precisely how misinformation spreads across the Internet. Ultimately,

1:03:13.570 --> 1:03:16.730
<v Speaker 1>underneath all of this is the distribution mechanism, which is

1:03:16.970 --> 1:03:23.690
<v Speaker 1>social media and platforms. And platforms have to rethink the

1:03:23.730 --> 1:03:27.530
<v Speaker 1>openness of their design because that has now become a

1:03:27.650 --> 1:03:32.450
<v Speaker 1>territory for information warfare. In early twenty twenty, Facebook announced

1:03:32.490 --> 1:03:38.330
<v Speaker 1>a major policy change about synthesized content. Facebook has issued policies

1:03:38.450 --> 1:03:41.530
<v Speaker 1>now on deep fakes, saying that if it is an

1:03:41.530 --> 1:03:46.650
<v Speaker 1>AI generated video and it's misleading in some other contextual way,

1:03:47.410 --> 1:03:52.970
<v Speaker 1>then they will remove it. Interestingly, Facebook banned the Moon

1:03:53.050 --> 1:03:56.410
<v Speaker 1>Disaster Team's Nixon video even though it was made for

1:03:56.530 --> 1:04:00.810
<v Speaker 1>educational purposes, but didn't remove the slowed down version of

1:04:00.890 --> 1:04:05.690
<v Speaker 1>Nancy Pelosi, which was made to mislead the public. Why?

1:04:05.810 --> 1:04:10.730
<v Speaker 1>Because the Pelosi video wasn't created with artificial intelligence. For now,

1:04:11.170 --> 1:04:14.970
<v Speaker 1>Facebook is choosing to target deep fakes, but not cheap fakes.

1:04:15.410 --> 1:04:18.170
<v Speaker 1>One way to push platforms to take a stronger stance

1:04:18.290 --> 1:04:21.170
<v Speaker 1>might be to remove some of the legal protections that

1:04:21.250 --> 1:04:25.570
<v Speaker 1>they currently enjoy. Under Section two thirty of the Communications

1:04:25.610 --> 1:04:30.490
<v Speaker 1>Decency Act, passed in nineteen ninety six, platforms aren't legally

1:04:30.610 --> 1:04:34.890
<v Speaker 1>liable for content posted by their users. The fact that

1:04:34.970 --> 1:04:38.890
<v Speaker 1>platforms have no responsibility for the content they host has

1:04:38.930 --> 1:04:42.530
<v Speaker 1>an upside. It's led to the massive diversity of online

1:04:42.610 --> 1:04:47.050
<v Speaker 1>content we enjoy today, but it also allows a dangerous

1:04:47.210 --> 1:04:51.330
<v Speaker 1>escalation of fake news. Is it time to change Section

1:04:51.370 --> 1:04:56.290
<v Speaker 1>two thirty to create incentives for platforms to police false content?

1:04:57.090 --> 1:05:00.250
<v Speaker 1>I asked the former head of a major platform, LinkedIn

1:05:00.410 --> 1:05:04.170
<v Speaker 1>co-founder Reid Hoffman. For example, let's take my view

1:05:04.250 --> 1:05:07.570
<v Speaker 1>of what the response to the Christchurch shooting should be

1:05:07.730 --> 1:05:10.970
<v Speaker 1>is to say, well, we want you to solve not

1:05:11.130 --> 1:05:16.970
<v Speaker 1>having terrorism, murder, or murderers displayed to people. So we're

1:05:17.010 --> 1:05:19.570
<v Speaker 1>simply going to do a fine of ten thousand dollars

1:05:19.650 --> 1:05:23.850
<v Speaker 1>per view. Two shootings occurred at mosques in Christchurch, New

1:05:23.890 --> 1:05:28.050
<v Speaker 1>Zealand in March twenty nineteen. Graphic videos of the event

1:05:28.490 --> 1:05:32.210
<v Speaker 1>were soon posted online. Five people saw it, that's fifty

1:05:32.250 --> 1:05:35.050
<v Speaker 1>thousand dollars. But if it becomes a meme and a

1:05:35.130 --> 1:05:39.730
<v Speaker 1>million people see it, that's ten billion dollars. Yes, right,

1:05:39.890 --> 1:05:42.490
<v Speaker 1>So what it's really trying to do is get you to say,

1:05:43.010 --> 1:05:45.930
<v Speaker 1>let's make sure that the meme never happens. Okay, So

1:05:45.970 --> 1:05:50.730
<v Speaker 1>that's a governance mechanism there. Yes, you fine the channel,

1:05:50.770 --> 1:05:54.130
<v Speaker 1>the platform, based on number of views would be a

1:05:54.410 --> 1:05:57.290
<v Speaker 1>very general way to say, now you guys have to solve.

1:05:57.450 --> 1:06:01.330
<v Speaker 1>Now you solve, you figure it out. What about other solutions?

1:06:01.610 --> 1:06:05.050
<v Speaker 1>If we are to make regulation, it should be about

1:06:05.090 --> 1:06:09.450
<v Speaker 1>the amount of staff in proportion to the amount of

1:06:09.530 --> 1:06:13.050
<v Speaker 1>users so that they can get a handle on the content.

1:06:13.530 --> 1:06:17.010
<v Speaker 1>But can they be fast enough? Maybe the viral spread

1:06:17.050 --> 1:06:21.010
<v Speaker 1>should be slowed down enough to allow them to moderate.

1:06:21.090 --> 1:06:25.210
<v Speaker 1>Let's put it this way. The stock market has certain

1:06:25.850 --> 1:06:29.770
<v Speaker 1>governors built in. When there's massive changes in a stock price,

1:06:30.250 --> 1:06:33.370
<v Speaker 1>there are decelerators that kick in, brakes that kick in.

1:06:33.770 --> 1:06:37.010
<v Speaker 1>Should the platforms have brakes that kick in before something

1:06:37.010 --> 1:06:41.970
<v Speaker 1>can go fully viral? So in terms of deceleration, there

1:06:41.970 --> 1:06:44.930
<v Speaker 1>are things that they do already that accelerate the process

1:06:45.010 --> 1:06:48.210
<v Speaker 1>that they need to think differently about, especially when it

1:06:48.250 --> 1:06:52.810
<v Speaker 1>comes to something turning into a trending topic. So there

1:06:52.850 --> 1:06:56.930
<v Speaker 1>needs to be an intervening moment before things get to

1:06:56.970 --> 1:07:00.130
<v Speaker 1>the homepage and get to trending, where there is a

1:07:00.170 --> 1:07:04.250
<v Speaker 1>content review. So much to say here, but I want

1:07:04.290 --> 1:07:08.170
<v Speaker 1>to think particularly about listeners who are in their twenties

1:07:08.170 --> 1:07:12.010
<v Speaker 1>and thirties, are very tech savvy. They're going to be

1:07:12.090 --> 1:07:15.530
<v Speaker 1>part of the solution here. What would you say to

1:07:15.650 --> 1:07:21.090
<v Speaker 1>them about what they can do? I think it's important

1:07:21.650 --> 1:07:26.690
<v Speaker 1>that younger people advocate for the Internet that they want.

1:07:27.010 --> 1:07:29.050
<v Speaker 1>We have to fight for it, we have to ask

1:07:29.130 --> 1:07:33.850
<v Speaker 1>for different things, and that kind of agitation can come

1:07:33.890 --> 1:07:38.570
<v Speaker 1>in the form of posting on the platform, writing letters,

1:07:39.370 --> 1:07:42.770
<v Speaker 1>joining groups like Fight for the Future, and trying to

1:07:44.170 --> 1:07:48.570
<v Speaker 1>work on getting platforms to do better and to advocate

1:07:48.570 --> 1:07:51.210
<v Speaker 1>for the kind of content that you want to see

1:07:51.570 --> 1:07:56.770
<v Speaker 1>more of. The important thing is that our society is

1:07:56.810 --> 1:08:00.650
<v Speaker 1>shaped by these platforms and so we're not going to

1:08:00.770 --> 1:08:03.730
<v Speaker 1>do away with them, but we don't have to make

1:08:03.890 --> 1:08:17.050
<v Speaker 1>do with them either. Conclusion: Choose Your Planet. So there

1:08:17.090 --> 1:08:20.650
<v Speaker 1>you have it, stewards of the brave new planet. Synthetic

1:08:20.730 --> 1:08:25.850
<v Speaker 1>media or deep fakes. People have been manipulating content for

1:08:26.010 --> 1:08:29.770
<v Speaker 1>more than a hundred years, but recent advances in AI

1:08:29.890 --> 1:08:33.250
<v Speaker 1>have taken it to a whole new level of verisimilitude.

1:08:33.810 --> 1:08:38.530
<v Speaker 1>The technology could transform movies and television, favored actors from

1:08:38.610 --> 1:08:42.290
<v Speaker 1>years past starring in new narratives, along with actors who

1:08:42.330 --> 1:08:46.490
<v Speaker 1>never existed, patients regaining the ability to speak in their

1:08:46.490 --> 1:08:52.290
<v Speaker 1>own voices, personalized stories created on demand for any child

1:08:52.330 --> 1:08:56.370
<v Speaker 1>around the globe, matching their interests, written in their dialect,

1:08:56.690 --> 1:09:02.170
<v Speaker 1>representing their communities. But there's also great potential for harm,

1:09:02.330 --> 1:09:07.210
<v Speaker 1>the ability to cast anyone in a pornographic video, weaponized

1:09:07.290 --> 1:09:11.810
<v Speaker 1>media dropping days before an election, or provoking international conflicts.

1:09:12.650 --> 1:09:15.610
<v Speaker 1>Are we going to be able to tell fact from fiction?

1:09:16.170 --> 1:09:21.890
<v Speaker 1>Will truth survive? And what does it mean for our democracy? Better

1:09:21.930 --> 1:09:25.010
<v Speaker 1>fake detection may help, but it'll be hard for it

1:09:25.050 --> 1:09:28.570
<v Speaker 1>to keep up, and logging our lives in blockchain to

1:09:28.650 --> 1:09:34.970
<v Speaker 1>protect against misrepresentation doesn't sound like an attractive idea. Outright

1:09:35.090 --> 1:09:37.850
<v Speaker 1>bans on deep fakes are being tried in some countries,

1:09:38.210 --> 1:09:41.530
<v Speaker 1>but they're tricky in the US given our constitutional protections

1:09:41.570 --> 1:09:45.450
<v Speaker 1>for free speech. Maybe the best solution is to put

1:09:45.490 --> 1:09:50.090
<v Speaker 1>the liability on platforms like Facebook and YouTube. If we

1:09:50.210 --> 1:09:54.090
<v Speaker 1>can. Joan Donovan's right. To get the future you want,

1:09:54.490 --> 1:09:57.010
<v Speaker 1>you're going to have to fight for it. You don't

1:09:57.050 --> 1:09:59.410
<v Speaker 1>have to be an expert, and you don't have to

1:09:59.450 --> 1:10:02.850
<v Speaker 1>do it alone. When enough people get engaged, we make

1:10:02.890 --> 1:10:07.170
<v Speaker 1>wise choices. Deep fakes are a problem that everyone can

1:10:07.210 --> 1:10:11.050
<v Speaker 1>engage with. Brainstorm with your friends about what should be done.

1:10:11.490 --> 1:10:15.330
<v Speaker 1>Use social media. Tweet at your elected representatives to ask

1:10:15.370 --> 1:10:18.810
<v Speaker 1>if they're working on laws, like in California and Texas.

1:10:19.490 --> 1:10:23.210
<v Speaker 1>And if you work for a tech company, ask yourself

1:10:23.490 --> 1:10:27.250
<v Speaker 1>and your colleagues if you're doing enough. You can find

1:10:27.410 --> 1:10:31.090
<v Speaker 1>lots of resources and ideas at our website Brave New

1:10:31.130 --> 1:10:35.850
<v Speaker 1>Planet dot org. It's time to choose our planet. The

1:10:35.970 --> 1:10:49.890
<v Speaker 1>future is up to us. Brave New Planet is a

1:10:49.930 --> 1:10:53.050
<v Speaker 1>co-production of the Broad Institute of MIT and Harvard, Pushkin

1:10:53.130 --> 1:10:56.530
<v Speaker 1>Industries, and the Boston Globe, with support from the Alfred P.

1:10:56.690 --> 1:11:00.490
<v Speaker 1>Sloan Foundation. Our show is produced by Rebecca Lee Douglas

1:11:00.610 --> 1:11:05.290
<v Speaker 1>with Mary Doo. Theme song composed by Ned Porter, mastering

1:11:05.330 --> 1:11:09.210
<v Speaker 1>and sound design by James Garver, fact checking by as

1:11:09.210 --> 1:11:13.290
<v Speaker 1>If Fridman and a Stitt and Enchant. Special thanks to

1:11:13.370 --> 1:11:17.970
<v Speaker 1>Christine Heenan and Rachel Roberts at Clarendon Communications, to Lee McGuire,

1:11:18.170 --> 1:11:21.530
<v Speaker 1>Kristen Zarelli and Justine Levin-Allerhand at the Broad, to

1:11:21.730 --> 1:11:25.930
<v Speaker 1>Mia Lobel and Heather Fain at Pushkin, and to Eli and

1:11:26.090 --> 1:11:30.170
<v Speaker 1>Edythe Broad, who made the Broad Institute possible. This is

1:11:30.210 --> 1:11:32.650
<v Speaker 1>Brave New Planet. I'm Eric Lander.