WEBVTT - What Algorithms Say About You

0:00:15.250 --> 0:00:28.890
<v Speaker 1>Pushkin. You're listening to Brave New Planet, a podcast about

0:00:28.930 --> 0:00:33.090
<v Speaker 1>amazing new technologies that could dramatically improve our world. Or

0:00:33.490 --> 0:00:36.010
<v Speaker 1>if we don't make wise choices, could leave us a

0:00:36.050 --> 0:00:40.690
<v Speaker 1>lot worse off. Utopia or dystopia. It's up to us.

0:00:46.970 --> 0:00:52.850
<v Speaker 1>On November eleventh, twenty sixteen, the Babelfish burst from fiction

0:00:53.290 --> 0:00:58.450
<v Speaker 1>into reality. The Babelfish was conceived forty years ago in

0:00:58.610 --> 0:01:03.450
<v Speaker 1>Douglas Adams's science fiction classic The Hitchhiker's Guide to the Galaxy.

0:01:04.130 --> 0:01:08.410
<v Speaker 1>In the story, a hapless Earthling finds himself a stowaway

0:01:08.690 --> 0:01:13.010
<v Speaker 1>on a Vogon spaceship. When the alien captain starts an

0:01:13.010 --> 0:01:17.370
<v Speaker 1>announcement over the loudspeaker, his companion tells him to stick

0:01:17.410 --> 0:01:23.090
<v Speaker 1>a small yellow fish in his ear. Listen, it's important,

0:01:24.170 --> 0:01:27.730
<v Speaker 1>it's a I can't just put this in your ear.

0:01:28.570 --> 0:01:35.450
<v Speaker 1>Suddenly he's able to understand the language. The Babelfish is small, yellow,

0:01:35.850 --> 0:01:39.490
<v Speaker 1>leech-like, and probably the oddest thing in the universe.

0:01:40.930 --> 0:01:45.530
<v Speaker 1>It feeds on brainwave energy, absorbing all unconscious frequencies,

0:01:46.010 --> 0:01:48.570
<v Speaker 1>the practical upshot of which is that if you stick

0:01:48.610 --> 0:01:51.770
<v Speaker 1>one in your ear, you instantly understand anything said to

0:01:51.810 --> 0:01:54.370
<v Speaker 1>you in any form of language. At the time, the

0:01:54.450 --> 0:01:58.810
<v Speaker 1>idea of sticking an instantaneous universal translator in your ear

0:01:59.330 --> 0:02:04.490
<v Speaker 1>seemed charmingly absurd. But a couple of years ago, Google

0:02:04.570 --> 0:02:08.890
<v Speaker 1>and other companies announced plans to start selling Babelfish. Well,

0:02:09.130 --> 0:02:12.570
<v Speaker 1>not fish actually, but earbuds that do the same thing.

0:02:13.490 --> 0:02:17.050
<v Speaker 1>The key breakthrough came in November twenty sixteen, when Google

0:02:17.170 --> 0:02:23.010
<v Speaker 1>replaced the technology behind its translate program. Overnight, the Internet

0:02:23.090 --> 0:02:28.490
<v Speaker 1>realized that something extraordinary had happened. A Japanese computer scientist

0:02:28.570 --> 0:02:31.890
<v Speaker 1>ran a quick test. He dashed off his own Japanese

0:02:31.930 --> 0:02:36.250
<v Speaker 1>translation of the opening lines of Ernest Hemingway's short story

0:02:36.690 --> 0:02:41.290
<v Speaker 1>The Snows of Kilimanjaro, and dared Google Translate to turn

0:02:41.330 --> 0:02:45.210
<v Speaker 1>it back into English. Here's the opening passage from the

0:02:45.250 --> 0:02:49.370
<v Speaker 1>Simon and Schuster audiobook. Kilimanjaro is a snow-covered

0:02:49.410 --> 0:02:53.050
<v Speaker 1>mountain nineteen thousand, seven hundred and ten feet high and

0:02:53.210 --> 0:02:56.090
<v Speaker 1>is said to be the highest mountain in Africa. Its

0:02:56.090 --> 0:03:00.130
<v Speaker 1>western summit is called the Masai Ngaje Ngai, the House

0:03:00.170 --> 0:03:03.530
<v Speaker 1>of God. Close to the western summit there is the

0:03:03.650 --> 0:03:07.690
<v Speaker 1>dried and frozen carcass of a leopard. No one has

0:03:07.730 --> 0:03:11.810
<v Speaker 1>explained what the leopard was seeking at that altitude. Let's just

0:03:11.890 --> 0:03:15.970
<v Speaker 1>consider that last sentence. No one has explained what the

0:03:16.090 --> 0:03:21.330
<v Speaker 1>leopard was seeking at that altitude. One day earlier, Google

0:03:21.410 --> 0:03:26.650
<v Speaker 1>had mangled the back translation, quote: Whether the leopard had

0:03:26.690 --> 0:03:30.890
<v Speaker 1>what the demand at that altitude? There is no that

0:03:30.970 --> 0:03:37.850
<v Speaker 1>nobody explained. But now Google Translate returned, quote: No one

0:03:37.930 --> 0:03:43.130
<v Speaker 1>has ever explained what leopard wanted at that altitude. It

0:03:43.250 --> 0:03:49.570
<v Speaker 1>was perfect, except for a missing "the." What explained

0:03:49.610 --> 0:03:54.970
<v Speaker 1>the great leap? Well, Google had built a predictive algorithm

0:03:55.010 --> 0:03:59.330
<v Speaker 1>that taught itself how to translate between English and Japanese

0:03:59.810 --> 0:04:03.690
<v Speaker 1>by training on a vast library of examples and tweaking

0:04:03.730 --> 0:04:07.210
<v Speaker 1>its connections to get better and better at predicting the

0:04:07.330 --> 0:04:11.330
<v Speaker 1>right answer. In many ways, the algorithm was a black box.

0:04:11.970 --> 0:04:15.730
<v Speaker 1>No one understood precisely how it worked, but it did

0:04:15.770 --> 0:04:21.890
<v Speaker 1>amazingly well. Predictive algorithms turn out to be remarkably general.

0:04:22.570 --> 0:04:25.250
<v Speaker 1>They can be applied to predict which movies a Netflix

0:04:25.370 --> 0:04:27.970
<v Speaker 1>user will want to see next, or whether an eye

0:04:28.050 --> 0:04:32.690
<v Speaker 1>exam or a mammogram indicates disease. But it doesn't stop there.

0:04:33.330 --> 0:04:37.610
<v Speaker 1>Predictive algorithms are also being trained to make societal decisions:

0:04:38.450 --> 0:04:41.490
<v Speaker 1>who to hire for a job, whether to approve a

0:04:41.570 --> 0:04:45.890
<v Speaker 1>mortgage application, what students to let into a college, what

0:04:46.050 --> 0:04:49.530
<v Speaker 1>arrestees to let out on bail. But what

0:04:49.650 --> 0:04:54.170
<v Speaker 1>exactly are these big black boxes learning from massive data sets?

0:04:54.770 --> 0:04:58.330
<v Speaker 1>Are they gaining deep new insights about people? Or might

0:04:58.370 --> 0:05:07.490
<v Speaker 1>they sometimes be automating systemic biases? Today's big question: when

0:05:07.530 --> 0:05:11.690
<v Speaker 1>should predictive algorithms be allowed to make big decisions about people?

0:05:12.490 --> 0:05:15.970
<v Speaker 1>And before they judge us, should we have the right

0:05:16.010 --> 0:05:21.090
<v Speaker 1>to know what's inside the black box? My name is

0:05:21.170 --> 0:05:23.570
<v Speaker 1>Eric Lander. I'm a scientist who works on ways to

0:05:23.610 --> 0:05:27.010
<v Speaker 1>improve human health. I helped lead the Human Genome Project,

0:05:27.210 --> 0:05:30.370
<v Speaker 1>and today I lead the Broad Institute of MIT and Harvard.

0:05:31.090 --> 0:05:34.890
<v Speaker 1>In the twenty first century, powerful technologies have been appearing

0:05:34.930 --> 0:05:39.290
<v Speaker 1>at a breathtaking pace, related to the Internet, artificial intelligence,

0:05:39.330 --> 0:05:44.610
<v Speaker 1>genetic engineering, and more. They have amazing potential upsides, but

0:05:44.730 --> 0:05:47.410
<v Speaker 1>we can't ignore the risks that come with them. The

0:05:47.530 --> 0:05:52.210
<v Speaker 1>decisions aren't just up to scientists or politicians. Whether we

0:05:52.290 --> 0:05:55.130
<v Speaker 1>like it or not, we all of us are the

0:05:55.210 --> 0:05:59.130
<v Speaker 1>stewards of a brave new planet. This generation's choices will

0:05:59.130 --> 0:06:11.570
<v Speaker 1>shape the future as never before. Coming up on today's

0:06:11.570 --> 0:06:20.210
<v Speaker 1>episode of Brave New Planet, predictive algorithms. We hear from

0:06:20.210 --> 0:06:23.650
<v Speaker 1>a physician at Google about how this technology might help

0:06:23.770 --> 0:06:27.410
<v Speaker 1>keep millions of people with diabetes from going blind, and

0:06:27.490 --> 0:06:30.010
<v Speaker 1>the idea was, well, if you could retrain the model,

0:06:30.450 --> 0:06:33.650
<v Speaker 1>you could get to more patients to screen them for disease.

0:06:34.010 --> 0:06:37.410
<v Speaker 1>The first iteration of the model was on par with

0:06:37.690 --> 0:06:41.450
<v Speaker 1>the US board-certified ophthalmologists. I speak with an AI

0:06:41.610 --> 0:06:46.930
<v Speaker 1>researcher about how predictive algorithms sometimes learn to be sexist

0:06:47.010 --> 0:06:50.690
<v Speaker 1>and racist. If you typed in I am a white man,

0:06:50.730 --> 0:06:53.090
<v Speaker 1>you would get positive sentiment. If you typed in I

0:06:53.130 --> 0:06:56.810
<v Speaker 1>am a black lesbian, for example, negative sentiment. We hear

0:06:56.850 --> 0:07:01.330
<v Speaker 1>how algorithms are affecting the criminal justice system. For black defendants,

0:07:01.570 --> 0:07:05.330
<v Speaker 1>it was much more likely to incorrectly predict that they

0:07:05.330 --> 0:07:06.730
<v Speaker 1>were going to go on to commit a future

0:07:06.730 --> 0:07:09.610
<v Speaker 1>crime when they didn't, and for white defendants, it was

0:07:09.810 --> 0:07:12.490
<v Speaker 1>much more likely to predict that they were going to

0:07:12.530 --> 0:07:14.770
<v Speaker 1>go on to not commit a future crime when they did.

0:07:15.330 --> 0:07:18.370
<v Speaker 1>And we hear from a policy expert about whether these

0:07:18.410 --> 0:07:21.650
<v Speaker 1>systems should be regulated. A lot of the horror stories

0:07:21.690 --> 0:07:25.610
<v Speaker 1>are about fully implemented tools that were in the works for years.

0:07:25.690 --> 0:07:29.650
<v Speaker 1>There's never a pause button to reevaluate or look at

0:07:29.650 --> 0:07:32.690
<v Speaker 1>how a system is working real time. Stay with us

0:07:36.330 --> 0:07:42.130
<v Speaker 1>Chapter one, The Big Black Box. To better understand these algorithms,

0:07:42.170 --> 0:07:44.370
<v Speaker 1>I decided to speak with one of the creators of

0:07:44.410 --> 0:07:49.090
<v Speaker 1>the technology that transformed Google Translate. My name is Greg Corrado,

0:07:49.250 --> 0:07:52.330
<v Speaker 1>and I'm a distinguished scientist at Google Research. Early in

0:07:52.370 --> 0:07:56.370
<v Speaker 1>his career, Greg had trained in neuroscience, but he soon shifted

0:07:56.410 --> 0:08:01.210
<v Speaker 1>his focus from organic intelligence to artificial. And that turned

0:08:01.210 --> 0:08:04.290
<v Speaker 1>out to be really a very lucky moment, because I

0:08:04.410 --> 0:08:08.570
<v Speaker 1>was becoming interested in artificial intelligence at exactly the moment

0:08:08.610 --> 0:08:12.010
<v Speaker 1>that artificial intelligence was changing so much. Ever since the

0:08:12.050 --> 0:08:16.090
<v Speaker 1>field of artificial intelligence started more than sixty years ago,

0:08:16.610 --> 0:08:19.930
<v Speaker 1>there have been two warring approaches about how to teach

0:08:20.010 --> 0:08:24.210
<v Speaker 1>machines to do human tasks. We might call them human

0:08:24.290 --> 0:08:28.010
<v Speaker 1>rules versus machine learning. The way that we used to

0:08:28.050 --> 0:08:32.610
<v Speaker 1>try to get computers to recognize patterns was to program

0:08:32.650 --> 0:08:36.850
<v Speaker 1>into them specific rules. So we would say, oh, well,

0:08:36.850 --> 0:08:38.810
<v Speaker 1>you can tell the difference between a cat and a

0:08:38.930 --> 0:08:42.530
<v Speaker 1>dog by how long its whiskers are and what kind

0:08:42.570 --> 0:08:45.090
<v Speaker 1>of fur it has and does it have stripes? And

0:08:45.210 --> 0:08:48.930
<v Speaker 1>trying to put these rules into computers. It kind of worked,

0:08:49.450 --> 0:08:52.650
<v Speaker 1>but it made for a lot of mistakes. The other

0:08:52.690 --> 0:08:56.330
<v Speaker 1>approach was machine learning, let the computer figure everything out

0:08:56.330 --> 0:09:01.290
<v Speaker 1>for itself, somewhat like the biological brain. The machine learning

0:09:01.290 --> 0:09:05.930
<v Speaker 1>system is actually built of tiny little decision makers or neurons.

0:09:06.290 --> 0:09:09.730
<v Speaker 1>They start out connected very much in random ways, but

0:09:10.250 --> 0:09:13.250
<v Speaker 1>we give the system feedback. So, for example, if it's

0:09:13.250 --> 0:09:16.090
<v Speaker 1>guessing between a cat and a dog and it gets

0:09:16.090 --> 0:09:18.890
<v Speaker 1>one wrong, we tell the system that it got one wrong,

0:09:18.970 --> 0:09:21.810
<v Speaker 1>and we make little changes inside so that it's much

0:09:21.850 --> 0:09:25.210
<v Speaker 1>more likely to recognize that cat as a cat and

0:09:25.370 --> 0:09:29.050
<v Speaker 1>not mistake it for a dog. Over time, the system

0:09:29.090 --> 0:09:32.650
<v Speaker 1>gets better and better and better. Machine learning had been

0:09:32.690 --> 0:09:37.450
<v Speaker 1>around for decades with rather unimpressive results. The number of

0:09:37.610 --> 0:09:42.570
<v Speaker 1>connections and neurons in those early systems was pretty small.

0:09:42.890 --> 0:09:47.690
<v Speaker 1>We didn't realize until about twenty ten that computers

0:09:47.730 --> 0:09:51.250
<v Speaker 1>had gotten fast enough and the data sets were big

0:09:51.370 --> 0:09:55.850
<v Speaker 1>enough that these systems could actually learn from patterns and

0:09:55.930 --> 0:10:01.930
<v Speaker 1>learn from data better than we could describe rules manually.

0:10:02.170 --> 0:10:07.570
<v Speaker 1>Machine learning made huge leaps. Google itself became the leading

0:10:07.690 --> 0:10:12.570
<v Speaker 1>driver of machine learning. In twenty eleven, Corrado joined with

0:10:12.650 --> 0:10:17.490
<v Speaker 1>two colleagues to form a unit called Google Brain. Among

0:10:17.570 --> 0:10:22.090
<v Speaker 1>other things, they applied a machine learning approach to language translation.

0:10:22.970 --> 0:10:28.290
<v Speaker 1>The strategy turned out to be remarkably effective. It doesn't

0:10:28.410 --> 0:10:31.130
<v Speaker 1>learn French the way you would learn French in high school.

0:10:31.530 --> 0:10:34.290
<v Speaker 1>It learns French the way you would learn French at home,

0:10:34.730 --> 0:10:38.050
<v Speaker 1>much more like the way that a child learns the language.

0:10:38.250 --> 0:10:41.930
<v Speaker 1>We give the machine the English sentence, and then we

0:10:41.970 --> 0:10:45.370
<v Speaker 1>give it an example of a French translation of that

0:10:45.530 --> 0:10:49.730
<v Speaker 1>whole sentence. We show a whole lot of them, probably

0:10:50.170 --> 0:10:53.330
<v Speaker 1>more French and English sentences than you could read in

0:10:53.370 --> 0:10:57.690
<v Speaker 1>your whole life. And by seeing so many examples of

0:10:58.050 --> 0:11:02.170
<v Speaker 1>entire sentences, the system is able to learn, oh, this

0:11:02.250 --> 0:11:05.170
<v Speaker 1>is how I would say this in French. That's actually,

0:11:05.210 --> 0:11:09.530
<v Speaker 1>at this point, about as good as a bilingual human

0:11:09.570 --> 0:11:13.890
<v Speaker 1>would produce. Soon Google was training predictive algorithms for all

0:11:14.090 --> 0:11:17.890
<v Speaker 1>sorts of purposes. We use neural network predictors to help

0:11:18.090 --> 0:11:22.370
<v Speaker 1>rank search results, to help people organize their photos, to recognize speech,

0:11:22.450 --> 0:11:27.370
<v Speaker 1>to find driving directions, to help complete emails. Really anything

0:11:27.410 --> 0:11:30.010
<v Speaker 1>that you can think of where there's some notion of

0:11:30.130 --> 0:11:34.370
<v Speaker 1>finding a pattern or making a prediction, artificial intelligence might

0:11:34.410 --> 0:11:39.330
<v Speaker 1>be at play. Predictive algorithms would become ubiquitous in commerce.

0:11:39.770 --> 0:11:43.410
<v Speaker 1>They let Netflix know which movies to recommend to each customer,

0:11:43.850 --> 0:11:47.890
<v Speaker 1>Amazon to suggest products users might be interested in purchasing,

0:11:48.210 --> 0:11:53.050
<v Speaker 1>and much more. While they're shockingly useful, they can also

0:11:53.130 --> 0:11:57.810
<v Speaker 1>be inscrutable. Modern neural networks are like a black box.

0:11:58.370 --> 0:12:02.850
<v Speaker 1>Understanding how they make their predictions can be surprisingly difficult.

0:12:03.210 --> 0:12:05.530
<v Speaker 1>When you build an artificial neural network, you do not

0:12:05.810 --> 0:12:10.210
<v Speaker 1>necessarily understand exactly the final state of how it works.

0:12:10.730 --> 0:12:15.530
<v Speaker 1>Figuring out how it works becomes its own science project.

0:12:16.010 --> 0:12:21.130
<v Speaker 1>One thing we do know: predictive algorithms are especially sensitive

0:12:21.410 --> 0:12:24.530
<v Speaker 1>to the choice of examples used to train them. The

0:12:24.730 --> 0:12:28.490
<v Speaker 1>systems learn to imitate the examples in the data that

0:12:28.530 --> 0:12:31.170
<v Speaker 1>they see. You don't know how well they will do

0:12:31.210 --> 0:12:34.090
<v Speaker 1>on things that are very different. So, for example, if

0:12:34.130 --> 0:12:38.090
<v Speaker 1>you train a system to recognize cats and dogs, but

0:12:38.250 --> 0:12:43.290
<v Speaker 1>you only ever show it border collies and tabby cats, it's

0:12:43.370 --> 0:12:45.570
<v Speaker 1>not clear what it will do. When you show it

0:12:45.610 --> 0:12:50.250
<v Speaker 1>a picture of a chihuahua. All it's ever seen is border collies,

0:12:50.690 --> 0:12:53.930
<v Speaker 1>it may not get the right answer. So its concept

0:12:53.970 --> 0:12:56.890
<v Speaker 1>of dog is going to be limited by the dogs

0:12:56.930 --> 0:13:00.130
<v Speaker 1>it's seen. That's right, and this is why diversity of

0:13:00.290 --> 0:13:04.850
<v Speaker 1>data in machine learning systems is so important. You have

0:13:04.930 --> 0:13:08.810
<v Speaker 1>to have a data set that represents the entire spectrum

0:13:08.810 --> 0:13:12.170
<v Speaker 1>of possibilities that you expect the system to work under.

0:13:12.730 --> 0:13:15.970
<v Speaker 1>Teaching algorithms turns out to be not so different than

0:13:16.010 --> 0:13:25.490
<v Speaker 1>teaching people. They learn what they see. Chapter two, Retina Fundoscopy.

0:13:27.170 --> 0:13:30.650
<v Speaker 1>It's cool that predictive algorithms can learn to translate languages

0:13:30.730 --> 0:13:35.130
<v Speaker 1>and suggest movies, but what about more life changing applications.

0:13:35.850 --> 0:13:38.970
<v Speaker 1>My name is Lily Peng. I am a physician by training,

0:13:39.370 --> 0:13:42.170
<v Speaker 1>and I am a product manager at Google. I went

0:13:42.250 --> 0:13:45.050
<v Speaker 1>to visit Dr. Peng because she and her colleagues are

0:13:45.130 --> 0:13:50.330
<v Speaker 1>using predictive algorithms to help millions of people avoid going blind. So.

0:13:50.490 --> 0:13:55.210
<v Speaker 1>Diabetic retinopathy is a complication of diabetes that affects the

0:13:55.250 --> 0:13:58.170
<v Speaker 1>back of the eye, the retina. One of the devastating

0:13:58.210 --> 0:14:02.530
<v Speaker 1>complications is vision loss. All patients that have diabetes need

0:14:02.650 --> 0:14:06.370
<v Speaker 1>to be screened once a year for diabetic retinopathy.

0:14:06.410 --> 0:14:08.810
<v Speaker 1>This is an asymptomatic disease, which means that you do

0:14:08.850 --> 0:14:12.170
<v Speaker 1>not feel the symptoms. You don't experience vision loss until

0:14:12.210 --> 0:14:16.330
<v Speaker 1>it's too late. Now, diabetes is epidemic around the world.

0:14:16.370 --> 0:14:19.930
<v Speaker 1>How many diabetics are there, though? By most estimates, there

0:14:19.930 --> 0:14:23.370
<v Speaker 1>are over four hundred million patients in the world with diabetes.

0:14:23.610 --> 0:14:26.810
<v Speaker 1>How do you screen a patient to see whether they

0:14:26.850 --> 0:14:30.930
<v Speaker 1>have diabetic retinopathy. You need to have a special camera

0:14:31.050 --> 0:14:34.050
<v Speaker 1>called a fundus camera, and it takes a picture through

0:14:34.090 --> 0:14:36.410
<v Speaker 1>the pupil of the back of the eye. We have

0:14:36.450 --> 0:14:39.850
<v Speaker 1>a very small supply of retina specialists and eye doctors

0:14:40.130 --> 0:14:42.730
<v Speaker 1>and they do a lot more than reading images, so

0:14:42.970 --> 0:14:47.250
<v Speaker 1>they needed to scale the reading of these images. Four

0:14:47.490 --> 0:14:52.370
<v Speaker 1>hundred million people with diabetes. There just aren't enough specialists

0:14:52.370 --> 0:14:55.770
<v Speaker 1>for all the retinal images that need reading, especially in

0:14:55.810 --> 0:14:59.330
<v Speaker 1>some countries in Asia where resources are limited and the

0:14:59.410 --> 0:15:04.290
<v Speaker 1>incidence of diabetes is skyrocketing. Two hospitals in southern India

0:15:04.370 --> 0:15:07.930
<v Speaker 1>recognized the problem and reached out to Google for help.

0:15:09.210 --> 0:15:12.410
<v Speaker 1>At that point, Google was already sort of well known for

0:15:12.450 --> 0:15:17.850
<v Speaker 1>image recognition. We were classifying cats and dogs in consumer images,

0:15:18.090 --> 0:15:20.490
<v Speaker 1>and the idea was, well, if you could retrain the

0:15:20.530 --> 0:15:26.410
<v Speaker 1>model to recognize diabetic retinopathy, you could potentially help the

0:15:26.450 --> 0:15:30.290
<v Speaker 1>hospitals in India get to more patients to screen them

0:15:30.410 --> 0:15:33.090
<v Speaker 1>for disease. How did you and your colleagues set out

0:15:33.090 --> 0:15:36.210
<v Speaker 1>to attack this problem? So when I first started the project,

0:15:36.570 --> 0:15:40.370
<v Speaker 1>we had about one hundred thirty thousand images from eye

0:15:40.410 --> 0:15:43.690
<v Speaker 1>hospitals in India as well as a screening program in

0:15:43.690 --> 0:15:48.170
<v Speaker 1>the US. Also, we gathered an army of ophthalmologists to

0:15:48.290 --> 0:15:52.370
<v Speaker 1>grade them. Eight hundred eighty thousand diagnoses were rendered on

0:15:52.370 --> 0:15:55.610
<v Speaker 1>one hundred thirty thousand images. So we took this training

0:15:55.690 --> 0:15:58.090
<v Speaker 1>data and we put it in a machine learning model

0:15:58.250 --> 0:16:00.770
<v Speaker 1>and had to do The first iteration of the model

0:16:01.210 --> 0:16:06.090
<v Speaker 1>was on par with the US board-certified ophthalmologists. Since then,

0:16:06.130 --> 0:16:09.570
<v Speaker 1>we've made some improvements to the model, and the initial

0:16:09.610 --> 0:16:13.610
<v Speaker 1>training took about how long? The first time we trained

0:16:13.650 --> 0:16:16.290
<v Speaker 1>a model, it may have taken a couple of weeks,

0:16:16.330 --> 0:16:18.410
<v Speaker 1>But then the second time you train the next models

0:16:18.410 --> 0:16:21.530
<v Speaker 1>and next models, it's just it's shorter and shorter, sometimes overnight,

0:16:22.050 --> 0:16:24.930
<v Speaker 1>Sometimes overnight? Well, yes. All right. And by contrast, how

0:16:24.970 --> 0:16:28.250
<v Speaker 1>long does it take to train a board-certified ophthalmologist?

0:16:29.570 --> 0:16:33.650
<v Speaker 1>So that usually takes at least five years, and then

0:16:33.690 --> 0:16:37.730
<v Speaker 1>you also have additional fellowship years to specialize in the retina.

0:16:37.770 --> 0:16:39.450
<v Speaker 1>And at the end of that you only have one

0:16:39.770 --> 0:16:42.890
<v Speaker 1>board-certified ophthalmologist. Yes, at the end of that you'd

0:16:42.930 --> 0:16:48.250
<v Speaker 1>have one very very well trained doctor, but that's not scaled. Yes,

0:16:48.410 --> 0:16:53.930
<v Speaker 1>So by contrast, a model like this scales worldwide and

0:16:54.050 --> 0:16:58.730
<v Speaker 1>never fatigues. It consistently gives the same diagnosis on the

0:16:58.730 --> 0:17:02.850
<v Speaker 1>same image, and it obviously takes a much shorter time

0:17:02.930 --> 0:17:07.010
<v Speaker 1>to train. That being said, it does a very very

0:17:07.130 --> 0:17:10.050
<v Speaker 1>narrow task that is just a very small portion of

0:17:10.050 --> 0:17:14.050
<v Speaker 1>what that doctor can do. The retina screening tool is already

0:17:14.090 --> 0:17:17.490
<v Speaker 1>being used in India. It was recently approved in Europe

0:17:17.530 --> 0:17:21.450
<v Speaker 1>and it's under review in the United States. Groups around

0:17:21.490 --> 0:17:24.890
<v Speaker 1>the world are now working on other challenges in medical imaging,

0:17:25.370 --> 0:17:29.130
<v Speaker 1>like detecting breast cancers at earlier stages. But I was

0:17:29.210 --> 0:17:33.970
<v Speaker 1>particularly struck by a surprising discovery by Lily's team that

0:17:34.210 --> 0:17:39.410
<v Speaker 1>unexpected information about patients was hiding in their retinal pictures.

0:17:40.010 --> 0:17:43.450
<v Speaker 1>In the fundus image, there are blood vessels, and so

0:17:43.650 --> 0:17:46.570
<v Speaker 1>one of the thoughts that we had was, because you

0:17:46.610 --> 0:17:49.250
<v Speaker 1>can see these vessels, I wonder if we can predict

0:17:49.450 --> 0:17:53.170
<v Speaker 1>cardiovascular disease from the same image. So we did an

0:17:53.170 --> 0:17:58.250
<v Speaker 1>experiment where we took fundus images and we trained a

0:17:58.290 --> 0:18:01.650
<v Speaker 1>model to predict whether or not that patient would have

0:18:01.690 --> 0:18:04.370
<v Speaker 1>a heart attack in five years. We found that we

0:18:04.450 --> 0:18:09.130
<v Speaker 1>could tell whether or not this patient may have a

0:18:09.130 --> 0:18:13.170
<v Speaker 1>vascular event much better than doctors. It speaks to

0:18:13.410 --> 0:18:16.850
<v Speaker 1>what might be in this data that we've overlooked. The

0:18:16.930 --> 0:18:21.730
<v Speaker 1>model could make predictions that doctors couldn't from the same

0:18:21.890 --> 0:18:26.090
<v Speaker 1>type of data. It turned out the computer could also

0:18:26.130 --> 0:18:30.130
<v Speaker 1>do a reasonable job of predicting a patient's sex, age,

0:18:30.170 --> 0:18:32.930
<v Speaker 1>and smoking status. The first time I did this with

0:18:32.970 --> 0:18:35.570
<v Speaker 1>an ophthalmologist, I think she thought I was trolling her.

0:18:35.850 --> 0:18:38.570
<v Speaker 1>I said, well, here are pictures. Guess which one is a woman,

0:18:38.970 --> 0:18:41.410
<v Speaker 1>Guess which one is a man. Guess which one's a smoker,

0:18:41.850 --> 0:18:44.330
<v Speaker 1>Guess which one is young. Right, these are all tasks

0:18:44.370 --> 0:18:48.090
<v Speaker 1>that doctors don't generally do with these images. It turns

0:18:48.130 --> 0:18:51.970
<v Speaker 1>out the model was right ninety eight ninety nine percent

0:18:51.970 --> 0:18:55.610
<v Speaker 1>of the time. That being said, there are much easier

0:18:55.650 --> 0:18:59.290
<v Speaker 1>ways of getting the sex of a patient. So,

0:18:59.450 --> 0:19:03.890
<v Speaker 1>while scientifically interesting, this is one of the most useless

0:19:03.890 --> 0:19:07.250
<v Speaker 1>clinical predictions ever. So how far can it go? If

0:19:07.250 --> 0:19:11.930
<v Speaker 1>you gave preference for rock music or not? What do

0:19:11.970 --> 0:19:15.850
<v Speaker 1>you think? You know? We tried predicting happiness. That didn't work,

0:19:15.850 --> 0:19:22.290
<v Speaker 1>So I'm guessing rock music. Oh, probably not, but who knows. So.

0:19:22.450 --> 0:19:26.610
<v Speaker 1>Predictive algorithms can learn a remarkable range of tasks, and

0:19:26.650 --> 0:19:30.890
<v Speaker 1>they can even discover hidden patterns that humans miss. We

0:19:31.010 --> 0:19:33.730
<v Speaker 1>just have to give them enough training data to learn from.

0:19:34.450 --> 0:19:43.970
<v Speaker 1>Sounds pretty fantastic. What could possibly go wrong? Chapter three,

0:19:44.290 --> 0:19:49.130
<v Speaker 1>What could possibly go wrong? If predictive algorithms can use

0:19:49.210 --> 0:19:53.410
<v Speaker 1>massive data to discover unexpected connections between your eye and

0:19:53.490 --> 0:19:58.410
<v Speaker 1>your heart, what might they be learning about, say, human society.

0:19:58.930 --> 0:20:01.170
<v Speaker 1>To answer this question, I took a trip to speak

0:20:01.210 --> 0:20:04.330
<v Speaker 1>with Kate Crawford, the co founder and co director of

0:20:04.370 --> 0:20:08.770
<v Speaker 1>the AI Now Institute at New York University. When we

0:20:08.890 --> 0:20:12.930
<v Speaker 1>began, we were the world's first AI institute dedicated

0:20:12.970 --> 0:20:16.850
<v Speaker 1>to studying the social implications of these tools. To me,

0:20:17.290 --> 0:20:20.170
<v Speaker 1>these are the biggest challenges that we face right now,

0:20:20.210 --> 0:20:23.650
<v Speaker 1>simply because we've spent decades looking at these questions from

0:20:23.650 --> 0:20:26.770
<v Speaker 1>a technical lens at the expense of looking at them

0:20:26.850 --> 0:20:29.490
<v Speaker 1>from a social and an ethical lens. I knew about

0:20:29.570 --> 0:20:32.410
<v Speaker 1>Kate's work because we served together on a working group

0:20:32.450 --> 0:20:36.330
<v Speaker 1>about artificial intelligence for the US National Institutes of Health.

0:20:36.970 --> 0:20:40.770
<v Speaker 1>I also knew she had an interesting background. I grew

0:20:40.810 --> 0:20:44.930
<v Speaker 1>up in Australia. I studied a really strange grab bag

0:20:44.970 --> 0:20:49.370
<v Speaker 1>of disciplines. I studied law, I studied philosophy, and then

0:20:49.370 --> 0:20:52.570
<v Speaker 1>I got really interested in computer science, and this was

0:20:52.570 --> 0:20:56.210
<v Speaker 1>happening at the same time as I was writing electronic

0:20:56.330 --> 0:21:00.090
<v Speaker 1>music on large-scale modular synthesizers, and that's still a

0:21:00.170 --> 0:21:03.250
<v Speaker 1>thing that I do today. It's almost like the opposite

0:21:03.250 --> 0:21:06.730
<v Speaker 1>of artificial intelligence because it's so analog, so I absolutely

0:21:06.730 --> 0:21:09.010
<v Speaker 1>love it for that reason. In the year two thousand,

0:21:09.090 --> 0:21:14.090
<v Speaker 1>Kate's band released an album entitled twenty twenty that

0:21:14.210 --> 0:21:20.930
<v Speaker 1>included a prescient song called Machines work so that people

0:21:21.050 --> 0:21:27.210
<v Speaker 1>have time to think. It's funny because we use a

0:21:27.330 --> 0:21:30.930
<v Speaker 1>sample from an early IBM promotional film that was made

0:21:30.930 --> 0:21:34.130
<v Speaker 1>in the nineteen sixties, which says machines can do the

0:21:34.170 --> 0:21:37.250
<v Speaker 1>work so that people have time to think, and we

0:21:37.370 --> 0:21:40.210
<v Speaker 1>actually ended up sort of cutting it and splicing it

0:21:40.250 --> 0:21:42.050
<v Speaker 1>in the track, so it ends up saying that people

0:21:42.050 --> 0:21:44.370
<v Speaker 1>can do the work so that machines have time to think.

0:21:44.690 --> 0:21:47.290
<v Speaker 1>And strangely, the more that I've been working in the

0:21:47.330 --> 0:21:49.930
<v Speaker 1>sort of machine learning space, I think, yeah, there's a

0:21:49.930 --> 0:21:52.050
<v Speaker 1>lot of ways in which actually people are doing the

0:21:52.130 --> 0:22:00.050
<v Speaker 1>work so that machines can do all the thinking. Kate

0:22:00.210 --> 0:22:03.890
<v Speaker 1>gave me a crash course on how predictive algorithms not

0:22:04.010 --> 0:22:08.170
<v Speaker 1>only teach themselves language skills, but also in the process

0:22:08.450 --> 0:22:13.730
<v Speaker 1>acquire human prejudices, even in something as seemingly benign as

0:22:13.850 --> 0:22:18.090
<v Speaker 1>language translation. So in many cases, if you say, translate

0:22:18.090 --> 0:22:21.930
<v Speaker 1>a sentence like she is a doctor into a language

0:22:21.930 --> 0:22:24.890
<v Speaker 1>like Turkish, and then you translate it back into English,

0:22:25.170 --> 0:22:28.130
<v Speaker 1>and you're saying Turkish because Turkish has pronouns that are

0:22:28.130 --> 0:22:32.170
<v Speaker 1>not gendered precisely, and so you would expect that you

0:22:32.170 --> 0:22:34.130
<v Speaker 1>would get the same sentence back, but you do not.

0:22:34.370 --> 0:22:37.410
<v Speaker 1>It will say he is a doctor, so she is

0:22:37.410 --> 0:22:41.810
<v Speaker 1>a doctor was translated into gender neutral Turkish as o

0:22:41.970 --> 0:22:46.770
<v Speaker 1>bir doktor, which was then back translated into English as

0:22:46.930 --> 0:22:49.650
<v Speaker 1>he is a doctor. In fact, you could see how

0:22:49.770 --> 0:22:53.170
<v Speaker 1>much the predictive algorithms had learned about gender roles just

0:22:53.250 --> 0:22:57.610
<v Speaker 1>by giving Google Translate a bunch of gender neutral sentences

0:22:57.650 --> 0:23:01.730
<v Speaker 1>in Turkish. You got he is an engineer, she is

0:23:01.730 --> 0:23:04.930
<v Speaker 1>a cook. He is a soldier, but she is a teacher.

0:23:05.290 --> 0:23:08.010
<v Speaker 1>He is a friend, but she is a lover. He

0:23:08.290 --> 0:23:11.170
<v Speaker 1>is happy and she is unhappy. I find that one

0:23:11.290 --> 0:23:15.810
<v Speaker 1>particularly odd, and it's not just language translation that's problematic.

0:23:16.250 --> 0:23:20.650
<v Speaker 1>The same sort of issues arise in language understanding. Predictive

0:23:20.690 --> 0:23:24.850
<v Speaker 1>algorithms were trained to learn analogies by reading lots of text.

0:23:25.210 --> 0:23:28.570
<v Speaker 1>They concluded that dog is to puppy as cat is

0:23:28.570 --> 0:23:31.530
<v Speaker 1>to kitten, and man is to king as woman is

0:23:31.570 --> 0:23:36.490
<v Speaker 1>to queen. But they also automatically inferred that man is

0:23:36.490 --> 0:23:41.490
<v Speaker 1>to computer programmer as woman is to homemaker. And with

0:23:41.570 --> 0:23:44.890
<v Speaker 1>the rise of social media, Google used text on the

0:23:44.890 --> 0:23:49.530
<v Speaker 1>Internet to train predictive algorithms to infer the sentiment of

0:23:49.650 --> 0:23:53.690
<v Speaker 1>tweets and online reviews. Is it a positive sentiment? Is

0:23:53.690 --> 0:23:56.490
<v Speaker 1>it a negative sentiment? I believe it was Google who

0:23:56.490 --> 0:23:59.690
<v Speaker 1>released their sentiment engine, and you could just try it online,

0:23:59.730 --> 0:24:01.530
<v Speaker 1>you know, put in a sentiment and see what you'd get.

0:24:02.010 --> 0:24:05.730
<v Speaker 1>And again, similar problems emerged. If you typed in I

0:24:05.810 --> 0:24:08.130
<v Speaker 1>am a white man, you would get positive sentiment. If

0:24:08.130 --> 0:24:11.690
<v Speaker 1>you typed in a black lesbian, for example, negative sentiment.

0:24:12.530 --> 0:24:16.370
<v Speaker 1>Just as Greg Corrado explained with chihuahuas and border collies,

0:24:16.850 --> 0:24:20.410
<v Speaker 1>the predictive algorithms were learning from the examples they found

0:24:20.410 --> 0:24:24.690
<v Speaker 1>in the world, and those examples reflected a lot about

0:24:24.770 --> 0:24:28.490
<v Speaker 1>past practices and prejudices. If we think about where you

0:24:28.570 --> 0:24:32.090
<v Speaker 1>might be scraping large amounts of text from say Reddit,

0:24:32.170 --> 0:24:35.450
<v Speaker 1>for example, and you're not thinking about how that sentiment

0:24:35.530 --> 0:24:39.250
<v Speaker 1>might be biased against certain groups, then you're just basically

0:24:39.290 --> 0:24:43.130
<v Speaker 1>importing that directly into your tool. But it's not just

0:24:43.250 --> 0:24:47.090
<v Speaker 1>conversations on Reddit. There's the cautionary tale of what happened

0:24:47.130 --> 0:24:50.450
<v Speaker 1>when Amazon let a computer teach itself how to sift

0:24:50.450 --> 0:24:54.930
<v Speaker 1>through mountains of resumes for computer programming jobs to find

0:24:55.010 --> 0:24:59.530
<v Speaker 1>the best candidates to interview. So they set up this system,

0:24:59.650 --> 0:25:01.730
<v Speaker 1>they designed it, and what they found was that very

0:25:01.810 --> 0:25:06.170
<v Speaker 1>quickly this system had learned to discard and really demote

0:25:06.490 --> 0:25:10.570
<v Speaker 1>the applications from women. And typically if you had a

0:25:10.570 --> 0:25:13.410
<v Speaker 1>women's college mentioned, and even if you had the word

0:25:13.770 --> 0:25:18.010
<v Speaker 1>women's on your resume, your application would go to the

0:25:18.050 --> 0:25:20.610
<v Speaker 1>bottom of the pile. All right, So how does it

0:25:20.770 --> 0:25:23.570
<v Speaker 1>learn that? So, first of all, we take a look

0:25:23.570 --> 0:25:26.770
<v Speaker 1>at who is generally hired by Amazon, and of course

0:25:26.850 --> 0:25:30.450
<v Speaker 1>they have a very heavily skewed male workforce, and so

0:25:30.490 --> 0:25:32.850
<v Speaker 1>the system is learning that these are the sorts of

0:25:32.890 --> 0:25:36.210
<v Speaker 1>people who will tend to be hired and promoted. And

0:25:36.330 --> 0:25:38.770
<v Speaker 1>it is not a surprise then that they actually found

0:25:38.770 --> 0:25:41.730
<v Speaker 1>it impossible to really retrain this system. They ended up

0:25:41.770 --> 0:25:45.650
<v Speaker 1>abandoning this tool because simply correcting for a bias is

0:25:45.770 --> 0:25:48.370
<v Speaker 1>very hard to do when all of your ground truth

0:25:48.450 --> 0:25:52.890
<v Speaker 1>data is so profoundly skewed in a particular direction. So

0:25:53.010 --> 0:25:57.250
<v Speaker 1>Amazon dropped this particular machine learning project and Google fixed

0:25:57.290 --> 0:26:01.210
<v Speaker 1>the Turkish to English problem. Today, Google Translate gives both

0:26:01.330 --> 0:26:04.250
<v Speaker 1>he is a doctor and she is a doctor as

0:26:04.290 --> 0:26:09.090
<v Speaker 1>translation options. But biases keep popping up in predictive algorithms

0:26:09.170 --> 0:26:14.170
<v Speaker 1>in many settings, there's no systematic way to prevent them. Instead,

0:26:14.650 --> 0:26:18.850
<v Speaker 1>spotting and fixing biases has become a game of whack-a-mole.

0:26:22.570 --> 0:26:28.930
<v Speaker 1>Chapter four: Quarterbacks. Perhaps it's no surprise that algorithms trained

0:26:28.930 --> 0:26:32.090
<v Speaker 1>in the wild west of the Internet or on tech

0:26:32.130 --> 0:26:37.330
<v Speaker 1>industry hiring practices learned serious biases. But what about more

0:26:37.450 --> 0:26:42.250
<v Speaker 1>sober settings like a hospital? I talked with someone who recently

0:26:42.330 --> 0:26:47.850
<v Speaker 1>discovered similar problems with potentially life threatening consequences. Hi, I'm

0:26:47.930 --> 0:26:52.250
<v Speaker 1>Christine Vogeli. I'm the director of evaluation research at Partners

0:26:52.370 --> 0:26:57.930
<v Speaker 1>Healthcare here in Boston. Partners Healthcare, recently rebranded as Mass

0:26:58.010 --> 0:27:02.570
<v Speaker 1>General Brigham, is the largest healthcare provider in Massachusetts, a

0:27:02.650 --> 0:27:06.410
<v Speaker 1>system that has six thousand doctors and a dozen hospitals

0:27:06.410 --> 0:27:10.370
<v Speaker 1>and serves more than a million patients. As Christine explained

0:27:10.410 --> 0:27:13.530
<v Speaker 1>to me, the role of healthcare providers in the US

0:27:13.530 --> 0:27:18.210
<v Speaker 1>has been shifting. The responsibility for controlling costs and ensuring

0:27:18.290 --> 0:27:21.770
<v Speaker 1>high quality services is now being put down on the

0:27:21.810 --> 0:27:24.690
<v Speaker 1>hospitals and the doctors. And to me, this makes a

0:27:24.690 --> 0:27:27.010
<v Speaker 1>lot of sense, Right, we really should be the ones

0:27:27.090 --> 0:27:29.690
<v Speaker 1>responsible for ensuring that there's good quality care and that

0:27:29.730 --> 0:27:34.410
<v Speaker 1>we're doing it efficiently. Healthcare providers are especially focusing their

0:27:34.450 --> 0:27:38.770
<v Speaker 1>attention on what they call high risk patients. Really, what

0:27:38.810 --> 0:27:42.730
<v Speaker 1>it means is that they have both multiple chronic illnesses

0:27:43.130 --> 0:27:46.930
<v Speaker 1>and relatively acute chronic illnesses. So give me a set

0:27:46.970 --> 0:27:49.890
<v Speaker 1>of conditions that a patient might have, right, So somebody,

0:27:49.930 --> 0:27:53.250
<v Speaker 1>for example, with cardiovascular disease co occurring with diabetes, and

0:27:53.490 --> 0:27:55.890
<v Speaker 1>you know, maybe they also have depression. They're just kind

0:27:55.930 --> 0:27:57.890
<v Speaker 1>of suffering and trying to get used to having that

0:27:58.010 --> 0:28:01.610
<v Speaker 1>complex illness and how to manage it. Partners Healthcare offers

0:28:01.610 --> 0:28:04.650
<v Speaker 1>a program to help these complex patients. We have a

0:28:04.730 --> 0:28:08.170
<v Speaker 1>nurse or social worker who works as a care manager

0:28:08.610 --> 0:28:13.450
<v Speaker 1>who helps with everything from education to care coordination services. But

0:28:13.610 --> 0:28:17.530
<v Speaker 1>really that care manager works essentially as a quarterback, arranges

0:28:17.570 --> 0:28:20.210
<v Speaker 1>everything but also provides hands on care to the patient

0:28:20.250 --> 0:28:24.090
<v Speaker 1>and the caregiver. Yeah, I think it's a wonder how

0:28:24.170 --> 0:28:27.530
<v Speaker 1>we expect patients to go figure out all the things

0:28:27.610 --> 0:28:29.730
<v Speaker 1>they're supposed to be doing and how to interact with

0:28:29.770 --> 0:28:33.570
<v Speaker 1>the medical system without a quarterback. It's incredibly complex. These

0:28:33.610 --> 0:28:37.570
<v Speaker 1>patients have multiple specialists who are interacting with the primary

0:28:37.570 --> 0:28:39.690
<v Speaker 1>care physician. They need somebody to be able to tie

0:28:39.690 --> 0:28:42.210
<v Speaker 1>it together and be able to create a care plan

0:28:42.290 --> 0:28:45.290
<v Speaker 1>for them that they can follow, and it pulls everything

0:28:45.330 --> 0:28:49.050
<v Speaker 1>together from all those specialists. Partners Healthcare found that providing

0:28:49.130 --> 0:28:54.810
<v Speaker 1>complex patients with quarterbacks both saved money and improved patients' health.

0:28:55.330 --> 0:28:58.890
<v Speaker 1>For example, they had fewer emergency visits each year, so

0:28:59.050 --> 0:29:02.530
<v Speaker 1>Partners developed a program to identify the top three percent

0:29:02.570 --> 0:29:06.610
<v Speaker 1>of patients with the greatest need for the service. Most

0:29:06.690 --> 0:29:10.570
<v Speaker 1>were recommended by their physicians, but they also used a

0:29:10.650 --> 0:29:14.690
<v Speaker 1>predictive algorithm provided by a major health insurance company that

0:29:14.770 --> 0:29:19.090
<v Speaker 1>assigns each patient a risk score. What does the algorithm do?

0:29:19.650 --> 0:29:22.330
<v Speaker 1>When you look at the web page, it really describes

0:29:22.370 --> 0:29:26.290
<v Speaker 1>itself as a tool to help identify high risk patients.

0:29:26.530 --> 0:29:29.530
<v Speaker 1>And that term is a really interesting term to me. What

0:29:29.730 --> 0:29:32.410
<v Speaker 1>makes a patient high risk? So I think from an

0:29:32.410 --> 0:29:36.730
<v Speaker 1>insurance perspective, risk means these patients are going to be

0:29:36.810 --> 0:29:41.730
<v Speaker 1>expensive. From a healthcare organization perspective, these are patients who

0:29:41.730 --> 0:29:45.530
<v Speaker 1>we think we could help, and that's the fundamental challenge

0:29:45.570 --> 0:29:48.050
<v Speaker 1>on this one. When the team began to look closely

0:29:48.090 --> 0:29:51.210
<v Speaker 1>at the results, they noticed that people recommended by the

0:29:51.250 --> 0:29:55.690
<v Speaker 1>algorithm were strikingly different than those recommended by their doctor.

0:29:56.450 --> 0:30:00.930
<v Speaker 1>We noticed that black patients overall were underrepresented. Patients with

0:30:01.090 --> 0:30:05.090
<v Speaker 1>similar numbers of chronic illnesses, if they were black, they

0:30:05.130 --> 0:30:08.290
<v Speaker 1>had a lower risk score than if they were white, and

0:30:08.330 --> 0:30:11.770
<v Speaker 1>that didn't make sense. Black patients identified by the

0:30:11.810 --> 0:30:15.970
<v Speaker 1>algorithm turned out to have twenty six percent more chronic

0:30:16.010 --> 0:30:20.410
<v Speaker 1>illnesses than white patients with the same risk scores. So

0:30:20.450 --> 0:30:24.690
<v Speaker 1>what was wrong with the algorithm? It was because given

0:30:24.690 --> 0:30:28.530
<v Speaker 1>a certain level of illness, black and minority patients tend

0:30:28.570 --> 0:30:31.490
<v Speaker 1>to use fewer healthcare services, and whites tend to use

0:30:31.530 --> 0:30:35.250
<v Speaker 1>more even if they have the same level of chronic

0:30:35.410 --> 0:30:37.370
<v Speaker 1>even if they have the same level of chronic conditions.

0:30:37.370 --> 0:30:39.850
<v Speaker 1>That's right, So in some sense, the algorithm is correctly

0:30:39.930 --> 0:30:44.450
<v Speaker 1>predicting the cost associated with the patient, but not the

0:30:44.570 --> 0:30:48.970
<v Speaker 1>need exactly. It predicts costs very well, but we're interested

0:30:49.010 --> 0:30:53.050
<v Speaker 1>in understanding patients who are sick and have needs. It's

0:30:53.090 --> 0:30:56.410
<v Speaker 1>important to say that the algorithm only used information about

0:30:56.410 --> 0:31:00.250
<v Speaker 1>insurance claims and medical costs. It didn't use any information

0:31:00.290 --> 0:31:04.610
<v Speaker 1>about a patient's race. But of course these factors are

0:31:04.690 --> 0:31:09.850
<v Speaker 1>correlated with race due to longstanding issues in American society. Frankly,

0:31:10.850 --> 0:31:14.690
<v Speaker 1>we have fewer minority physicians than we do white physicians.

0:31:14.690 --> 0:31:18.010
<v Speaker 1>So the level of trust minorities have with the healthcare system,

0:31:18.010 --> 0:31:21.450
<v Speaker 1>we've observed, is lower. And we also know that there

0:31:21.490 --> 0:31:25.890
<v Speaker 1>are just systematic barriers to care that certain groups of

0:31:25.930 --> 0:31:29.770
<v Speaker 1>patients experience more so. For example, race and poverty go

0:31:29.850 --> 0:31:34.370
<v Speaker 1>together and job flexibility. So all these issues with scheduling,

0:31:34.410 --> 0:31:37.330
<v Speaker 1>being able to come in, being able to access services

0:31:37.410 --> 0:31:41.050
<v Speaker 1>are just heightened for minority populations relative to white populations.

0:31:41.730 --> 0:31:46.970
<v Speaker 1>So someone who just has less economic resources might not

0:31:47.050 --> 0:31:48.850
<v Speaker 1>be able to get off work, might not be able

0:31:48.850 --> 0:31:51.010
<v Speaker 1>to get off work, might not have the flexibility with

0:31:51.090 --> 0:31:53.010
<v Speaker 1>childcare to be able to come in for a visit

0:31:53.050 --> 0:31:55.650
<v Speaker 1>when they need to. Exactly, so it means that if

0:31:55.770 --> 0:31:59.970
<v Speaker 1>one only relied on the algorithm, you wouldn't be targeting

0:31:59.970 --> 0:32:03.410
<v Speaker 1>the right people. Yes, we would be targeting more advantaged

0:32:03.490 --> 0:32:06.650
<v Speaker 1>patients who tend to use a lot of healthcare services.

0:32:06.810 --> 0:32:09.970
<v Speaker 1>When they corrected the problem, the proportion of black patients in

0:32:10.050 --> 0:32:13.850
<v Speaker 1>the high risk group jumped from eighteen percent to forty

0:32:13.930 --> 0:32:18.810
<v Speaker 1>seven percent. Christine, together with colleagues from several other institutions,

0:32:18.810 --> 0:32:22.890
<v Speaker 1>wrote up a paper describing their findings. It was published

0:32:22.930 --> 0:32:27.570
<v Speaker 1>in Science, the nation's leading research journal, in twenty nineteen.
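
NOTE
A minimal sketch, not the insurer's algorithm or the published analysis, of why a score that predicts cost can miss need, as described in this chapter. The patients, dollar figures, and the helper names top_k and low_access_share are all hypothetical. The "low_access" patients are assumed to be exactly as sick as the "high_access" ones but to generate lower costs because they use fewer services.
```python
# Hypothetical patients, invented for illustration only:
# (access label, number of chronic conditions, annual cost in dollars).
patients = [
    ("high_access", 5, 10000), ("high_access", 4, 8000),
    ("high_access", 3, 6000), ("high_access", 2, 4000),
    ("low_access", 5, 6500), ("low_access", 4, 5200),
    ("low_access", 3, 3900), ("low_access", 2, 2600),
]
def top_k(rows, key_index, k):
    """Return the k rows with the largest value in column key_index."""
    return sorted(rows, key=lambda row: row[key_index], reverse=True)[:k]
def low_access_share(rows):
    """Fraction of the selected rows that are low-access patients."""
    return sum(1 for row in rows if row[0] == "low_access") / len(rows)
# Selecting on cost is what a cost-predicting risk score effectively does.
by_cost = top_k(patients, 2, 4)
# Selecting on chronic conditions is closer to what the care program wants.
by_need = top_k(patients, 1, 4)
print(low_access_share(by_cost), low_access_share(by_need))
```
In this toy data the cost-ranked group is one quarter low-access patients, while the need-ranked group is one half, even though both populations are equally sick.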

0:32:28.330 --> 0:32:31.730
<v Speaker 1>It made a big splash, not least because many other

0:32:31.730 --> 0:32:34.850
<v Speaker 1>hospital systems were using the algorithm and others like it.

0:32:35.210 --> 0:32:39.010
<v Speaker 1>We've since changed the algorithm that we use to one

0:32:39.090 --> 0:32:45.290
<v Speaker 1>that uses exclusively information about chronic illness and not healthcare utilization.

0:32:45.770 --> 0:32:49.970
<v Speaker 1>And has that worked? We're still testing. We think it's

0:32:49.970 --> 0:32:52.330
<v Speaker 1>going to work, but as in all of these things,

0:32:52.530 --> 0:32:55.370
<v Speaker 1>you really need to test it. You need to understand

0:32:55.570 --> 0:32:58.530
<v Speaker 1>and see if there's actually any biases. In the end,

0:32:58.610 --> 0:33:01.090
<v Speaker 1>you can't just adopt an algorithm. It's very important to

0:33:01.090 --> 0:33:04.090
<v Speaker 1>be very conscious about what you're predicting. It's also very

0:33:04.090 --> 0:33:06.370
<v Speaker 1>important to think about what are the factors you're putting

0:33:06.370 --> 0:33:09.090
<v Speaker 1>into that prediction algorithm. Even if you believe the ingredients

0:33:09.210 --> 0:33:11.570
<v Speaker 1>are right, you do actually have to see how it

0:33:11.610 --> 0:33:15.210
<v Speaker 1>works in practice. Anything that has to do with people's lives,

0:33:15.290 --> 0:33:22.370
<v Speaker 1>you know, you have to be transparent about it. Chapter

0:33:22.450 --> 0:33:29.690
<v Speaker 1>five: Compass Transparency. Christine Vogeli and her colleagues were able

0:33:29.730 --> 0:33:31.250
<v Speaker 1>to get to the bottom of the issue with the

0:33:31.290 --> 0:33:34.730
<v Speaker 1>medical risk prediction because they had ready access to the

0:33:34.770 --> 0:33:39.490
<v Speaker 1>Partners Healthcare data and could test the algorithm. Unfortunately, that's

0:33:39.570 --> 0:33:42.970
<v Speaker 1>not always the case. I traveled to New York to

0:33:43.050 --> 0:33:46.210
<v Speaker 1>speak with a person who's arguably done more than anyone

0:33:46.650 --> 0:33:50.930
<v Speaker 1>to focus attention on the consequences of algorithmic bias. My

0:33:51.010 --> 0:33:54.530
<v Speaker 1>name is Julia Angwin. I'm a journalist. I've been writing

0:33:54.530 --> 0:33:57.970
<v Speaker 1>about technology for twenty five years, mostly at the Wall Street

0:33:58.050 --> 0:34:01.330
<v Speaker 1>Journal and Pro Publica. Julia grew up in Silicon Valley

0:34:01.570 --> 0:34:04.770
<v Speaker 1>as the child of a mathematician and a chemist. She

0:34:04.890 --> 0:34:08.370
<v Speaker 1>studied math at the University of Chicago, but decided to

0:34:08.370 --> 0:34:12.450
<v Speaker 1>pursue a career in journalism. Her quantitative skills gave

0:34:12.490 --> 0:34:16.290
<v Speaker 1>her a unique lens to report on the societal implications

0:34:16.290 --> 0:34:21.290
<v Speaker 1>of technology, and she eventually became interested in investigating high

0:34:21.370 --> 0:34:24.570
<v Speaker 1>stakes algorithms. When I learned that there was actually an

0:34:24.610 --> 0:34:28.210
<v Speaker 1>algorithm that judges used to help decide what to sentence people,

0:34:28.850 --> 0:34:31.370
<v Speaker 1>I was stunned. I thought, this is shocking. I can't

0:34:31.370 --> 0:34:33.770
<v Speaker 1>believe this exists, and I'm going to investigate it. What

0:34:33.890 --> 0:34:39.090
<v Speaker 1>we're talking about is a score that is assigned to

0:34:39.690 --> 0:34:44.490
<v Speaker 1>criminal defendants in many jurisdictions in this country that aims

0:34:44.530 --> 0:34:46.970
<v Speaker 1>to predict whether they will go on to commit a

0:34:47.010 --> 0:34:52.490
<v Speaker 1>future crime. It's known as a risk assessment score, and

0:34:52.610 --> 0:34:54.490
<v Speaker 1>the one that we chose to look at was called

0:34:54.530 --> 0:34:57.890
<v Speaker 1>the Compass Risk Assessment Score. Based on the answers to

0:34:57.930 --> 0:35:01.970
<v Speaker 1>a long list of questions, Compass gives defendants a risk

0:35:02.090 --> 0:35:07.090
<v Speaker 1>score from one to ten. In some jurisdictions, judges use

0:35:07.170 --> 0:35:10.690
<v Speaker 1>the Compass score to decide whether defendants should be released

0:35:10.730 --> 0:35:14.770
<v Speaker 1>on bail before trial. In others, judges use it to

0:35:14.770 --> 0:35:18.250
<v Speaker 1>decide the length of sentence to impose on defendants who plead

0:35:18.290 --> 0:35:22.410
<v Speaker 1>guilty or who were convicted at trial. Julia had a

0:35:22.450 --> 0:35:26.890
<v Speaker 1>suspicion that the algorithm might reflect bias against black defendants.

0:35:27.130 --> 0:35:31.290
<v Speaker 1>Attorney General Eric Holder had actually given a big speech

0:35:31.290 --> 0:35:33.730
<v Speaker 1>saying he was concerned about the use of these scores

0:35:33.770 --> 0:35:37.090
<v Speaker 1>and whether they were exacerbating racial bias, and so that

0:35:37.170 --> 0:35:39.770
<v Speaker 1>was one of the reasons we wanted to investigate. But

0:35:39.970 --> 0:35:45.450
<v Speaker 1>investigating wasn't easy. Unlike Christine Vogeli at Partners Healthcare, Julia

0:35:45.610 --> 0:35:49.810
<v Speaker 1>couldn't inspect the Compass algorithm itself. Now, Compass isn't a

0:35:49.890 --> 0:35:53.330
<v Speaker 1>modern neural network. It was developed by a company that's

0:35:53.330 --> 0:35:57.410
<v Speaker 1>now called Equivant, and it's a much simpler algorithm. It's

0:35:57.450 --> 0:36:01.450
<v Speaker 1>basically a linear equation that should be easy to understand.

0:36:02.290 --> 0:36:05.210
<v Speaker 1>But it's a black box of a different sort. The

0:36:05.290 --> 0:36:10.450
<v Speaker 1>algorithm is opaque because to date Equivant has insisted on

0:36:10.610 --> 0:36:14.450
<v Speaker 1>keeping it a trade secret. Julia also had no way

0:36:14.450 --> 0:36:18.210
<v Speaker 1>to download defendants' Compass scores from a website, so she

0:36:18.330 --> 0:36:21.650
<v Speaker 1>had to gather the data herself. Her team decided to

0:36:21.690 --> 0:36:26.770
<v Speaker 1>focus on Broward County, Florida. Florida has great public records laws,

0:36:26.970 --> 0:36:29.690
<v Speaker 1>and so we filed a public records request and we

0:36:29.810 --> 0:36:33.410
<v Speaker 1>did end up getting eighteen thousand scores. We got scores

0:36:33.410 --> 0:36:36.850
<v Speaker 1>for everyone who was arrested for a two year period.

0:36:37.530 --> 0:36:41.490
<v Speaker 1>Eighteen thousand scores. All right, So then what did you

0:36:41.530 --> 0:36:44.930
<v Speaker 1>do to evaluate these scores? Well, first thing we did

0:36:44.970 --> 0:36:47.490
<v Speaker 1>when we got the eighteen thousand scores was actually we

0:36:47.570 --> 0:36:51.850
<v Speaker 1>just threw them into a bar chart for black and white defendants.

0:36:52.450 --> 0:36:56.530
<v Speaker 1>We immediately noticed there were really different looking distributions. For

0:36:56.690 --> 0:37:01.010
<v Speaker 1>black defendants, the scores were evenly distributed, meaning one through

0:37:01.090 --> 0:37:04.010
<v Speaker 1>ten, lowest risk to highest risk. There's equal numbers of

0:37:04.010 --> 0:37:07.690
<v Speaker 1>black defendants in every one of those buckets. For white defendants,

0:37:07.810 --> 0:37:11.730
<v Speaker 1>the scores were heavily clustered in the low risk range.

0:37:12.050 --> 0:37:15.010
<v Speaker 1>And so we thought, there's two options. All the white

0:37:15.010 --> 0:37:19.010
<v Speaker 1>people getting scored in Broward County are legitimately really low risk.

0:37:19.250 --> 0:37:22.810
<v Speaker 1>They're all Mother Teresa, or there's something weird going on.

0:37:23.290 --> 0:37:26.890
<v Speaker 1>Julia sorted the defendants into those who were rearrested

0:37:27.050 --> 0:37:30.210
<v Speaker 1>over the next two years and those who weren't. She

0:37:30.330 --> 0:37:33.610
<v Speaker 1>compared the Compass scores that had been assigned to each group.

0:37:34.210 --> 0:37:38.050
<v Speaker 1>For black defendants, it was much more likely to incorrectly

0:37:38.170 --> 0:37:40.290
<v Speaker 1>predict that they were going to go on to commit

0:37:40.370 --> 0:37:43.370
<v Speaker 1>a future crime when they didn't, and for white defendants,

0:37:43.370 --> 0:37:45.810
<v Speaker 1>it was much more likely to predict that they were

0:37:45.970 --> 0:37:48.330
<v Speaker 1>going to go on to not commit a future crime

0:37:48.370 --> 0:37:51.410
<v Speaker 1>when they did. There were twice as many false positives

0:37:51.650 --> 0:37:54.490
<v Speaker 1>for black defendants as white and twice as many false

0:37:54.530 --> 0:37:57.970
<v Speaker 1>negatives for white defendants as black defendants. Julia described the

0:37:58.010 --> 0:38:02.210
<v Speaker 1>story of two people whose arrest histories illustrate this difference.

0:38:02.730 --> 0:38:05.410
<v Speaker 1>A young eighteen year old black girl named Brisha Borden,

0:38:05.930 --> 0:38:10.730
<v Speaker 1>who had been arrested after picking up a kid's bicycle from

0:38:10.730 --> 0:38:13.770
<v Speaker 1>their front yard and riding it a few blocks. The mom

0:38:13.850 --> 0:38:16.090
<v Speaker 1>came out and yelled at her, that's my kid's bike.

0:38:16.690 --> 0:38:20.490
<v Speaker 1>She gave it back, but actually by then the neighbor

0:38:20.570 --> 0:38:22.770
<v Speaker 1>had called the police, and so she was arrested for that.

0:38:23.290 --> 0:38:26.730
<v Speaker 1>And we compared her with a white man who had

0:38:26.770 --> 0:38:30.450
<v Speaker 1>stolen about eighty dollars worth of stuff from a drug

0:38:30.490 --> 0:38:35.170
<v Speaker 1>store, Vernon Prater. When teenager Brisha Borden got booked into jail,

0:38:35.850 --> 0:38:40.410
<v Speaker 1>she got a high Compass score, an eight, predicting a

0:38:40.490 --> 0:38:44.330
<v Speaker 1>high risk that she'd get rearrested. And Vernon Prater,

0:38:44.930 --> 0:38:49.890
<v Speaker 1>he got a low score, a three. Now he had

0:38:49.890 --> 0:38:54.370
<v Speaker 1>already committed two armed robberies and had served time. She

0:38:54.930 --> 0:38:58.970
<v Speaker 1>was eighteen. She'd given back the bike, and of course

0:38:59.010 --> 0:39:01.010
<v Speaker 1>these scores turned out to be completely wrong. She did

0:39:01.050 --> 0:39:02.690
<v Speaker 1>not go on to commit a future crime in the

0:39:02.690 --> 0:39:05.090
<v Speaker 1>next two years, and he actually went on to break

0:39:05.090 --> 0:39:08.010
<v Speaker 1>into a warehouse, steal thousands of dollars of electronics, and

0:39:08.050 --> 0:39:13.450
<v Speaker 1>he's serving a ten year term. And so that's what

0:39:13.490 --> 0:39:15.610
<v Speaker 1>the difference between a false positive and a false negative

0:39:15.650 --> 0:39:18.330
<v Speaker 1>looks like. It looks like Brisha Borden and Vernon Prater.
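
NOTE
A minimal sketch of how false positive and false negative rates can be computed separately for each group, as in the comparison described in this chapter. This is not Pro Publica's code or data. The records below are hypothetical and were chosen so that one group has twice the false positive rate and the other twice the false negative rate; the function name error_rates is also an assumption.
```python
from collections import Counter
# Hypothetical records, invented for illustration only:
# (group, flagged as high risk, rearrested within the follow-up period).
records = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)
def error_rates(rows, group):
    """Return (false positive rate, false negative rate) for one group."""
    counts = Counter((flagged, rearrested) for g, flagged, rearrested in rows if g == group)
    false_pos, true_neg = counts[(True, False)], counts[(False, False)]
    false_neg, true_pos = counts[(False, True)], counts[(True, True)]
    # False positive rate: flagged high risk among those who were not rearrested.
    # False negative rate: not flagged among those who were rearrested.
    return false_pos / (false_pos + true_neg), false_neg / (false_neg + true_pos)
for group in ("A", "B"):
    print(group, error_rates(records, group))
```
With these invented records, group A has a false positive rate of 0.5 against 0.25 for group B, and group B has a false negative rate of 0.5 against 0.25 for group A.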

0:39:24.730 --> 0:39:30.770
<v Speaker 1>Chapter six: Criminal Attitudes. Julia Angwin and her team spent

0:39:30.850 --> 0:39:35.130
<v Speaker 1>over a year doing research. In May twenty sixteen, Pro

0:39:35.250 --> 0:39:42.250
<v Speaker 1>Publica published their article headlined Machine Bias. The subtitle, quote:

0:39:42.650 --> 0:39:46.610
<v Speaker 1>there's software used across the country to predict future criminals,

0:39:46.970 --> 0:39:52.130
<v Speaker 1>and it's biased against blacks. Julia's team released all the

0:39:52.250 --> 0:39:55.650
<v Speaker 1>data they had collected so that anyone could check or

0:39:55.730 --> 0:40:02.050
<v Speaker 1>dispute their conclusions. What happened next was truly remarkable. The

0:40:02.130 --> 0:40:06.810
<v Speaker 1>Pro Publica article provoked an outcry from some statisticians, who

0:40:06.970 --> 0:40:10.970
<v Speaker 1>argued that the data actually proved Compass wasn't biased.

0:40:11.810 --> 0:40:15.490
<v Speaker 1>How could they reach the opposite conclusion. It turned out

0:40:15.530 --> 0:40:21.290
<v Speaker 1>the answer depended on how you define bias. Pro Publica

0:40:21.370 --> 0:40:25.090
<v Speaker 1>had analyzed the Compass scores by looking backward after

0:40:25.170 --> 0:40:29.170
<v Speaker 1>the outcomes were known. Among people who were not rearrested,

0:40:29.610 --> 0:40:32.530
<v Speaker 1>they found that black people had been assigned much higher

0:40:32.610 --> 0:40:36.890
<v Speaker 1>risk scores than white people. That seemed pretty unfair, but

0:40:37.050 --> 0:40:41.850
<v Speaker 1>statisticians use the word bias to describe how a predictor

0:40:41.930 --> 0:40:47.490
<v Speaker 1>performs when looking forward before the outcomes happened. It turns

0:40:47.490 --> 0:40:50.690
<v Speaker 1>out that black people and white people who received the

0:40:50.810 --> 0:40:55.010
<v Speaker 1>same risk score had roughly the same chance of being rearrested.
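
NOTE
A toy illustration of the two views of bias described in this chapter: looking forward (how often people with a given score are rearrested) and looking backward (how people who were not rearrested had been scored). The counts are hypothetical and are not the Broward County data; the two-level "high"/"low" score and the function names are simplifying assumptions.
```python
from fractions import Fraction
# Hypothetical counts, invented for illustration only:
# group, then score, then (number rearrested, number not rearrested).
counts = {
    "A": {"high": (30, 20), "low": (10, 40)},
    "B": {"high": (12, 8), "low": (16, 64)},
}
def rearrest_rate(group, score):
    """Looking forward: share of people with this score who were rearrested."""
    yes, no = counts[group][score]
    return Fraction(yes, yes + no)
def false_positive_rate(group):
    """Looking backward: share of people not rearrested who had been scored high."""
    high_no, low_no = counts[group]["high"][1], counts[group]["low"][1]
    return Fraction(high_no, high_no + low_no)
def false_negative_rate(group):
    """Looking backward: share of people rearrested who had been scored low."""
    high_yes, low_yes = counts[group]["high"][0], counts[group]["low"][0]
    return Fraction(low_yes, high_yes + low_yes)
for group in counts:
    print(group, rearrest_rate(group, "high"), rearrest_rate(group, "low"),
          false_positive_rate(group), false_negative_rate(group))
```
With these invented counts, a given score means the same thing in both groups (3/5 rearrested for high, 1/5 for low), yet the false positive rate is 1/3 for group A and 1/9 for group B, because the groups differ in how often they are rearrested overall.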

0:40:55.930 --> 0:41:00.330
<v Speaker 1>That seems pretty fair, So whether Compass was fair or

0:41:00.450 --> 0:41:06.090
<v Speaker 1>unfair depended on your definition of fairness. This sparked an

0:41:06.130 --> 0:41:11.450
<v Speaker 1>explosion of academic research. Mathematicians showed there's no way

0:41:11.530 --> 0:41:15.170
<v Speaker 1>out of the problem. They proved a theorem saying it's

0:41:15.330 --> 0:41:19.850
<v Speaker 1>impossible to build a risk predictor that's fair when looking

0:41:19.930 --> 0:41:24.170
<v Speaker 1>both backward and forward unless the arrest rates for black

0:41:24.210 --> 0:41:28.810
<v Speaker 1>people and white people are identical, which they aren't. The

0:41:28.890 --> 0:41:32.250
<v Speaker 1>Pro Publica article also focused attention on many other

0:41:32.330 --> 0:41:36.650
<v Speaker 1>ways in which Compass scores are biased. Like the healthcare

0:41:36.730 --> 0:41:41.730
<v Speaker 1>algorithm that Christine Vogeli studied, Compass scores don't explicitly ask

0:41:41.810 --> 0:41:45.490
<v Speaker 1>about a person's race, but race is closely correlated with

0:41:45.530 --> 0:41:49.770
<v Speaker 1>both the training data and the inputs to the algorithm. First,

0:41:49.810 --> 0:41:53.490
<v Speaker 1>the training data. COMPAS isn't actually trained to predict the

0:41:53.490 --> 0:41:58.170
<v Speaker 1>probability that a person will commit another crime. Instead, it's

0:41:58.170 --> 0:42:01.330
<v Speaker 1>trained to predict whether a person will be arrested for

0:42:01.410 --> 0:42:05.850
<v Speaker 1>committing another crime. The problem is there's abundant evidence that

0:42:06.290 --> 0:42:10.010
<v Speaker 1>in situations where black people and white people commit crimes

0:42:10.050 --> 0:42:13.970
<v Speaker 1>at the same rate, for example, illegal drug use, black

0:42:14.010 --> 0:42:17.810
<v Speaker 1>people are much more likely to get arrested. So COMPAS

0:42:17.930 --> 0:42:23.290
<v Speaker 1>is being trained on an unfair outcome. Second, the questionnaire

0:42:23.450 --> 0:42:28.810
<v Speaker 1>used to calculate COMPAS scores is pretty revealing. Some sections

0:42:28.930 --> 0:42:35.530
<v Speaker 1>assess peers, work, and social environment. The questions include: How

0:42:35.530 --> 0:42:38.930
<v Speaker 1>many of your friends and acquaintances have ever been arrested?

0:42:39.490 --> 0:42:42.890
<v Speaker 1>How many have been crime victims? How often do you

0:42:42.890 --> 0:42:48.730
<v Speaker 1>have trouble paying bills? Other sections are titled Criminal Personality

0:42:48.770 --> 0:42:53.130
<v Speaker 1>and Criminal Attitude. They ask people to agree or disagree

0:42:53.170 --> 0:42:57.330
<v Speaker 1>with such statements as "The law doesn't help average people,"

0:42:58.250 --> 0:43:01.930
<v Speaker 1>or "Many people get into trouble because society has given

0:43:01.970 --> 0:43:06.490
<v Speaker 1>them no education, jobs, or future." In a nutshell, the

0:43:06.570 --> 0:43:10.770
<v Speaker 1>predictor penalizes defendants who are honest enough to admit they

0:43:10.810 --> 0:43:13.970
<v Speaker 1>live in high-crime neighborhoods or they don't fully trust

0:43:14.010 --> 0:43:18.050
<v Speaker 1>the system. From the questionnaire, it's not hard to guess

0:43:18.090 --> 0:43:22.250
<v Speaker 1>how a teenage black girl arrested for something so minor

0:43:22.330 --> 0:43:26.290
<v Speaker 1>as riding someone else's bicycle a few blocks and returning

0:43:26.290 --> 0:43:30.490
<v Speaker 1>it might have received a COMPAS score of eight. And

0:43:30.570 --> 0:43:35.330
<v Speaker 1>it's not hard to imagine why racially correlated questions would

0:43:35.410 --> 0:43:39.610
<v Speaker 1>do a good job of predicting racially correlated arrest rates.

0:43:40.570 --> 0:43:43.370
<v Speaker 1>ProPublica didn't win a Pulitzer Prize for its article,

0:43:43.890 --> 0:43:52.610
<v Speaker 1>but it was a remarkable public service. Chapter Seven: Minority Report.

0:43:54.290 --> 0:43:57.610
<v Speaker 1>Putting aside the details of COMPAS, I wanted to find

0:43:57.610 --> 0:44:00.530
<v Speaker 1>out more about the role of predictive algorithms in courts.

0:44:01.170 --> 0:44:03.850
<v Speaker 1>I reached out to one of the leading legal scholars

0:44:03.890 --> 0:44:06.970
<v Speaker 1>in the country. I'm Martha Minow. I'm a law professor

0:44:07.010 --> 0:44:12.690
<v Speaker 1>at Harvard, and I have recently immersed myself in issues

0:44:12.770 --> 0:44:18.170
<v Speaker 1>of algorithmic fairness. Martha Minow has a remarkable resume. From

0:44:18.210 --> 0:44:21.810
<v Speaker 1>two thousand and nine to twenty seventeen, she served as

0:44:21.930 --> 0:44:25.530
<v Speaker 1>dean of the Harvard Law School, following now Supreme Court

0:44:25.610 --> 0:44:29.410
<v Speaker 1>Justice Elena Kagan. Martha also served on the board of

0:44:29.410 --> 0:44:34.410
<v Speaker 1>the government-sponsored Legal Services Corporation, which provides legal assistance

0:44:34.450 --> 0:44:37.850
<v Speaker 1>to low-income Americans. She was appointed by her former

0:44:37.970 --> 0:44:42.570
<v Speaker 1>law student, President Barack Obama. I became very interested in

0:44:42.650 --> 0:44:46.970
<v Speaker 1>and concerned about the increasing use of algorithms in worlds

0:44:47.010 --> 0:44:51.930
<v Speaker 1>that touch on my preoccupations with equal protection, due process,

0:44:52.290 --> 0:44:58.210
<v Speaker 1>constitutional rights, fairness, anti-discrimination. Martha recently co-signed a

0:44:58.290 --> 0:45:02.890
<v Speaker 1>statement with twenty-six other lawyers and scientists raising, quote,

0:45:03.050 --> 0:45:07.210
<v Speaker 1>"grave concerns" about the use of predictive algorithms for pretrial

0:45:07.330 --> 0:45:11.610
<v Speaker 1>risk assessment. I asked her how courts had gotten

0:45:11.650 --> 0:45:16.690
<v Speaker 1>involved in the business of prediction. The criminal justice system has

0:45:16.770 --> 0:45:21.770
<v Speaker 1>flirted with the use of prediction forever, including discussions from

0:45:21.770 --> 0:45:24.970
<v Speaker 1>the nineteenth century on in this country about dangerousness and

0:45:25.090 --> 0:45:30.410
<v Speaker 1>whether people should be detained preventively. So far, that's not

0:45:30.530 --> 0:45:33.810
<v Speaker 1>permitted in the United States. It appears in Minority Report

0:45:33.850 --> 0:45:38.730
<v Speaker 1>and other interesting movies. The movie starring Tom Cruise tells

0:45:38.850 --> 0:45:42.010
<v Speaker 1>the story of a future in which the PreCrime division

0:45:42.170 --> 0:45:46.770
<v Speaker 1>of the police arrests people for crimes they haven't yet committed.

0:45:48.170 --> 0:45:50.530
<v Speaker 1>I'm placing you under arrest for the future murder of Sarah Marks.

0:45:50.610 --> 0:45:54.170
<v Speaker 1>We are arresting individuals who've broken no law, but they will.

0:45:54.770 --> 0:45:59.330
<v Speaker 1>The use of prediction in the context of sentencing is

0:45:59.410 --> 0:46:04.330
<v Speaker 1>part of this rather large sphere of discretion that judges

0:46:04.410 --> 0:46:07.890
<v Speaker 1>have to decide what kind of sentence fits the crime.

0:46:08.570 --> 0:46:13.410
<v Speaker 1>You're saying in sentencing, one is allowed to use essentially

0:46:13.490 --> 0:46:16.970
<v Speaker 1>information from the PreCrime division about crimes that haven't

0:46:17.010 --> 0:46:21.690
<v Speaker 1>been committed yet? Well, I am horrified by that suggestion,

0:46:21.770 --> 0:46:24.290
<v Speaker 1>but I think it's fair to raise it as a concern.

0:46:25.010 --> 0:46:30.050
<v Speaker 1>The problem is if we actually acknowledge purposes of the

0:46:30.050 --> 0:46:33.050
<v Speaker 1>criminal justice system, some of them start to get into

0:46:33.210 --> 0:46:39.530
<v Speaker 1>the future. So if one purpose is simply incapacitation, prevent

0:46:39.610 --> 0:46:43.010
<v Speaker 1>this person from walking the streets because they might hurt

0:46:43.050 --> 0:46:47.130
<v Speaker 1>someone else, there's a prediction built in. So judges have

0:46:47.210 --> 0:46:50.690
<v Speaker 1>been factoring in predictions about a defendant's future behavior for

0:46:50.770 --> 0:46:55.130
<v Speaker 1>a long time. And judges certainly aren't perfect. They can

0:46:55.170 --> 0:47:00.370
<v Speaker 1>be biased or sometimes just cranky. There are even studies

0:47:00.370 --> 0:47:04.570
<v Speaker 1>showing that judges hand down harsher sentences before lunch breaks

0:47:04.810 --> 0:47:09.770
<v Speaker 1>than after. Now, the defenders of risk prediction scores will say, well,

0:47:09.770 --> 0:47:12.890
<v Speaker 1>it's always not what's the ideal, but compared to what?

0:47:13.730 --> 0:47:17.450
<v Speaker 1>And if the alternative is we're relying entirely on the

0:47:17.650 --> 0:47:23.650
<v Speaker 1>individual judges and their prejudices, their lack of education, what

0:47:23.770 --> 0:47:27.130
<v Speaker 1>they had for lunch, isn't this better, that it will

0:47:27.170 --> 0:47:33.530
<v Speaker 1>provide some kind of scaffold for more consistency? Journalist Julia

0:47:33.610 --> 0:47:37.490
<v Speaker 1>Angwin has heard the same arguments. Some good friends, right,

0:47:37.530 --> 0:47:40.890
<v Speaker 1>who really believe in the use of these criminal risk

0:47:40.890 --> 0:47:43.650
<v Speaker 1>score algorithms, have said to me, look, Julia, the fact

0:47:43.770 --> 0:47:48.130
<v Speaker 1>is judges are terribly biased, and this is an improvement.

0:47:48.290 --> 0:47:51.850
<v Speaker 1>And my feeling is, that's probably true for some judges

0:47:51.890 --> 0:47:55.610
<v Speaker 1>and maybe less true for other judges. But I don't

0:47:55.610 --> 0:47:59.450
<v Speaker 1>think it is a reason to automate bias, right? Like

0:47:59.490 --> 0:48:02.490
<v Speaker 1>I don't understand why you say, Okay, humans are flawed,

0:48:02.530 --> 0:48:05.290
<v Speaker 1>so why don't we make a flawed algorithm and bake

0:48:05.370 --> 0:48:09.730
<v Speaker 1>it into every decision, because then it's really intractable. Martha

0:48:09.890 --> 0:48:14.450
<v Speaker 1>also worries that numerical risk scores are misleading. The judges

0:48:14.570 --> 0:48:18.090
<v Speaker 1>think high numbers mean people are very likely to commit

0:48:18.170 --> 0:48:21.890
<v Speaker 1>violent crime. In fact, the actual probability of violence is

0:48:22.010 --> 0:48:26.810
<v Speaker 1>very low, about eight percent according to a public assessment.

0:48:27.770 --> 0:48:31.930
<v Speaker 1>And she thinks numerical scores can lull judges into a

0:48:32.090 --> 0:48:36.690
<v Speaker 1>false sense of certainty. There's an appearance of objectivity because

0:48:36.690 --> 0:48:40.930
<v Speaker 1>it's math, but is it really? Then for lawyers, they

0:48:40.970 --> 0:48:45.170
<v Speaker 1>may have had no math, no numeracy education since high school.

0:48:46.090 --> 0:48:49.250
<v Speaker 1>Many people go into law in part because they

0:48:49.290 --> 0:48:53.050
<v Speaker 1>don't want to do anything with numbers. And there is

0:48:53.730 --> 0:48:58.810
<v Speaker 1>a larger problem, which is the deference to expertise, particularly

0:48:58.810 --> 0:49:03.610
<v Speaker 1>scientific expertise. Finally, I wanted to ask Martha if defendants

0:49:03.610 --> 0:49:07.410
<v Speaker 1>have a constitutional right to know what's inside the black

0:49:07.450 --> 0:49:11.210
<v Speaker 1>box that's helping determine their fate. I confess

0:49:11.290 --> 0:49:15.090
<v Speaker 1>I thought the answer was an obvious yes until I

0:49:15.130 --> 0:49:19.890
<v Speaker 1>read a twenty sixteen decision by Wisconsin's Supreme Court. The

0:49:20.010 --> 0:49:24.210
<v Speaker 1>defendant in that case, Eric Loomis, pled guilty to operating

0:49:24.210 --> 0:49:28.290
<v Speaker 1>a car without the owner's permission and fleeing a traffic officer.

0:49:29.210 --> 0:49:32.610
<v Speaker 1>When Loomis was sentenced, the presentencing report given to the

0:49:32.690 --> 0:49:36.850
<v Speaker 1>judge included a COMPAS score that predicted Loomis had a

0:49:36.970 --> 0:49:41.530
<v Speaker 1>high risk for committing future crimes. He was sentenced to

0:49:41.690 --> 0:49:46.850
<v Speaker 1>six years in prison. Loomis appealed, arguing that his inability

0:49:46.890 --> 0:49:51.970
<v Speaker 1>to inspect the COMPAS algorithm violated his constitutional right to

0:49:52.050 --> 0:49:57.130
<v Speaker 1>due process. Wisconsin's Supreme Court ultimately decided that Loomis had

0:49:57.290 --> 0:50:02.090
<v Speaker 1>no right to know how COMPAS worked. Why? First, the

0:50:02.090 --> 0:50:05.890
<v Speaker 1>Wisconsin court said the score was just one of several

0:50:05.970 --> 0:50:10.850
<v Speaker 1>inputs to the judge's sentencing decision. Second, the court said

0:50:11.370 --> 0:50:13.850
<v Speaker 1>even if Loomis didn't know how the score was determined,

0:50:14.210 --> 0:50:17.970
<v Speaker 1>he could still dispute its accuracy. Loomis appealed to the

0:50:18.050 --> 0:50:21.610
<v Speaker 1>US Supreme Court, but it declined to hear the case.

0:50:22.490 --> 0:50:26.650
<v Speaker 1>I find that troubling and not persuasive. If it was up

0:50:26.690 --> 0:50:31.050
<v Speaker 1>to you, how would you change the law? I actually

0:50:31.170 --> 0:50:37.890
<v Speaker 1>would require transparency for any use of any algorithm by

0:50:37.930 --> 0:50:43.730
<v Speaker 1>a government agency or court that has the consequence of

0:50:43.890 --> 0:50:50.250
<v Speaker 1>influencing, not just deciding, but influencing decisions about individuals' rights.

0:50:50.410 --> 0:50:54.410
<v Speaker 1>And those rights could be rights to liberty, property, opportunities.

0:50:54.850 --> 0:50:58.050
<v Speaker 1>So transparency, and being able to see what

0:50:58.090 --> 0:51:01.410
<v Speaker 1>this algorithm does? Absolutely, and have the code and be

0:51:01.490 --> 0:51:03.410
<v Speaker 1>able to give it to your own lawyer and your

0:51:03.450 --> 0:51:07.410
<v Speaker 1>own experts. But should a state be able to buy

0:51:07.810 --> 0:51:11.650
<v Speaker 1>a computer program that's proprietary. I mean it would say, well,

0:51:11.850 --> 0:51:13.850
<v Speaker 1>I'd love to give it to you, but it's proprietary.

0:51:13.930 --> 0:51:16.410
<v Speaker 1>I can't. Should that be okay? I think not, because

0:51:16.450 --> 0:51:20.170
<v Speaker 1>if that then limits the transparency, that seems a breach.

0:51:20.490 --> 0:51:23.610
<v Speaker 1>But you know, this is a major problem, the outsourcing

0:51:23.610 --> 0:51:28.650
<v Speaker 1>of government activity that has the effect of bypassing restrictions.

0:51:28.970 --> 0:51:35.370
<v Speaker 1>Take another example, when the US government hires private contractors

0:51:35.410 --> 0:51:40.010
<v Speaker 1>to engage in war activities, they are not governed by

0:51:40.050 --> 0:51:43.090
<v Speaker 1>the same rules that govern the US military. She's saying

0:51:43.130 --> 0:51:47.690
<v Speaker 1>that government can get around constitutional limitations on the government

0:51:48.290 --> 0:51:51.130
<v Speaker 1>by just outsourcing it to somebody who's not the government.

0:51:51.210 --> 0:51:54.890
<v Speaker 1>It's currently the case, and I think that's wrong. For

0:51:54.930 --> 0:51:59.090
<v Speaker 1>her part, journalist Julia Angwin is baffled by the Wisconsin

0:51:59.170 --> 0:52:02.330
<v Speaker 1>Court's ruling. I mean, we have this idea that you

0:52:02.410 --> 0:52:06.210
<v Speaker 1>should be able to argue against whatever accusations are made.

0:52:06.650 --> 0:52:08.970
<v Speaker 1>But I don't know how you make an argument against

0:52:09.490 --> 0:52:13.170
<v Speaker 1>a score. Like, the score says you're a seven, but you

0:52:13.210 --> 0:52:15.330
<v Speaker 1>think you're a four. How do you make that argument

0:52:15.370 --> 0:52:18.770
<v Speaker 1>if you don't know how that seven was calculated? You

0:52:18.770 --> 0:52:25.370
<v Speaker 1>can't make an argument that you're a four. Chapter Eight:

0:52:28.530 --> 0:52:32.410
<v Speaker 1>Robo-Recruiter. Even if you never find yourself in a

0:52:32.450 --> 0:52:36.010
<v Speaker 1>criminal court filling out a COMPAS questionnaire, that doesn't mean

0:52:36.050 --> 0:52:39.370
<v Speaker 1>you won't be judged by a predictive algorithm. There's actually

0:52:39.410 --> 0:52:41.410
<v Speaker 1>a good chance it will happen the next time you

0:52:41.490 --> 0:52:44.730
<v Speaker 1>go looking for a job. I spoke to a scientist

0:52:44.730 --> 0:52:49.170
<v Speaker 1>at a high tech company that screens job applicants. My

0:52:49.250 --> 0:52:53.610
<v Speaker 1>name is Lindsey Zuloaga, and I'm actually educated as a physicist,

0:52:53.650 --> 0:52:57.970
<v Speaker 1>but now working for a company called HireVue.

0:52:58.050 --> 0:53:02.410
<v Speaker 1>HireVue is a video interviewing platform. Companies create an interview,

0:53:02.450 --> 0:53:05.370
<v Speaker 1>candidates can take it at any time that's convenient for them,

0:53:05.650 --> 0:53:09.010
<v Speaker 1>So they go through the questions and they record themselves answering.

0:53:09.650 --> 0:53:12.370
<v Speaker 1>So it's really a great substitute for kind of the

0:53:12.450 --> 0:53:18.690
<v Speaker 1>resume phone screening part of the process. When a candidate

0:53:18.730 --> 0:53:22.890
<v Speaker 1>takes a video interview, they're creating thousands of unique points

0:53:22.890 --> 0:53:27.410
<v Speaker 1>of data. A candidate's verbal and nonverbal cues give us

0:53:27.450 --> 0:53:32.410
<v Speaker 1>insight into their emotional engagement, thinking, and problem-solving style.

0:53:34.250 --> 0:53:38.890
<v Speaker 1>This combination of cutting-edge AI and validated science is

0:53:38.890 --> 0:53:43.410
<v Speaker 1>the perfect partner for making data-driven talent decisions. HireVue.

0:53:49.530 --> 0:53:52.450
<v Speaker 1>You know, we'll have a customer and they are hiring

0:53:52.530 --> 0:53:55.450
<v Speaker 1>for something like a call center, say it's sales calls.

0:53:55.850 --> 0:53:58.130
<v Speaker 1>And what we do is we look at past employees

0:53:58.170 --> 0:54:00.890
<v Speaker 1>that applied, and we look at their video interviews. We

0:54:01.010 --> 0:54:04.090
<v Speaker 1>look at the words they said, tone of voice, pauses,

0:54:04.370 --> 0:54:07.810
<v Speaker 1>and facial expressions, things like that, and we look for

0:54:07.890 --> 0:54:11.810
<v Speaker 1>patterns in how those people with good sales numbers behave

0:54:12.130 --> 0:54:14.930
<v Speaker 1>as compared to people with low sales numbers. And then

0:54:14.970 --> 0:54:17.770
<v Speaker 1>we have this algorithm that scores new candidates as they

0:54:17.810 --> 0:54:19.850
<v Speaker 1>come in, and so we help kind of get those

0:54:19.890 --> 0:54:22.570
<v Speaker 1>more promising candidates to the top of the pile so

0:54:22.650 --> 0:54:27.610
<v Speaker 1>they're seen more quickly. So HireVue trains a predictive

0:54:27.610 --> 0:54:32.250
<v Speaker 1>algorithm on video interviews of past applicants who turned out

0:54:32.250 --> 0:54:36.090
<v Speaker 1>to be successful employees. But how does HireVue know

0:54:36.210 --> 0:54:41.370
<v Speaker 1>its program isn't learning sexism or racism or other similar biases?

0:54:41.850 --> 0:54:45.610
<v Speaker 1>There are lots of reasons to worry. For example, studies

0:54:45.650 --> 0:54:48.810
<v Speaker 1>from MIT have shown that facial recognition algorithms

0:54:49.010 --> 0:54:52.890
<v Speaker 1>can have a hard time reading emotions from black people's faces.

0:54:53.370 --> 0:54:56.810
<v Speaker 1>And how would HireVue's program evaluate videos from people

0:54:56.890 --> 0:55:00.930
<v Speaker 1>who might look or sound different than the average employee, say,

0:55:00.930 --> 0:55:04.050
<v Speaker 1>people who don't speak English as a native language, who

0:55:04.050 --> 0:55:08.250
<v Speaker 1>are disabled, who are on the autism spectrum, or even

0:55:08.530 --> 0:55:12.370
<v Speaker 1>people who are just a little quirky? Well, Lindsey says

0:55:12.570 --> 0:55:16.490
<v Speaker 1>HireVue tests for certain kinds of bias. So we

0:55:17.050 --> 0:55:20.210
<v Speaker 1>audit the algorithm after the fact and see if it's

0:55:20.250 --> 0:55:23.450
<v Speaker 1>scoring different groups differently in terms of age, race, and gender.

0:55:23.890 --> 0:55:26.970
<v Speaker 1>So if we do see that happening a lot of times,

0:55:27.050 --> 0:55:29.850
<v Speaker 1>that's probably coming from the training data. So maybe there

0:55:29.930 --> 0:55:32.450
<v Speaker 1>is only one female software engineer in this data set, and

0:55:32.650 --> 0:55:35.490
<v Speaker 1>the model might mimic that bias. If we do see

0:55:35.490 --> 0:55:39.850
<v Speaker 1>any of that adverse impact, we simply remove the features

0:55:39.890 --> 0:55:42.810
<v Speaker 1>that are causing it, so we can say this model

0:55:43.090 --> 0:55:46.690
<v Speaker 1>is being sexist. How does the model even know what

0:55:46.770 --> 0:55:49.370
<v Speaker 1>gender the person is? So we look at all the features,

0:55:49.370 --> 0:55:51.650
<v Speaker 1>and we find the features that are the most correlated

0:55:51.730 --> 0:55:54.250
<v Speaker 1>to gender. If there are, we simply remove some of

0:55:54.250 --> 0:55:58.490
<v Speaker 1>those features. I asked Lindsey why people should believe HireVue's

0:55:58.570 --> 0:56:03.770
<v Speaker 1>or any company's assurances, or whether something more was needed.
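
NOTE
The audit described above (compare scores across groups, then find the features
most correlated with gender) can be sketched roughly as follows. This is only an
illustration under stated assumptions: the records, the feature names word_count
and pitch, the 0/1 gender coding, and the 0.8 threshold are all hypothetical and
say nothing about how the company's real models or audits work.
```python
from math import sqrt
# Hypothetical toy records, invented for illustration; feature names are assumptions.
CANDIDATES = [
    {"gender": 1, "word_count": 120, "pitch": 210, "score": 0.4},
    {"gender": 1, "word_count": 150, "pitch": 220, "score": 0.5},
    {"gender": 1, "word_count": 100, "pitch": 200, "score": 0.3},
    {"gender": 0, "word_count": 110, "pitch": 120, "score": 0.7},
    {"gender": 0, "word_count": 160, "pitch": 110, "score": 0.8},
    {"gender": 0, "word_count": 130, "pitch": 130, "score": 0.6},
]
def mean(xs):
    return sum(xs) / len(xs)
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
def mean_score_by_group(rows, attr):
    """Audit step: average model score for each value of a protected attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r["score"])
    return {g: mean(s) for g, s in groups.items()}
def correlated_features(rows, attr, features, threshold=0.8):
    """Flag features whose absolute correlation with the attribute exceeds the threshold."""
    target = [r[attr] for r in rows]
    return [f for f in features if abs(pearson([r[f] for r in rows], target)) > threshold]
print(mean_score_by_group(CANDIDATES, "gender"))
print(correlated_features(CANDIDATES, "gender", ["word_count", "pitch"]))
```
On this toy data the two groups get clearly different average scores, and pitch is
the feature flagged as a likely proxy for gender. Removing a flagged feature and
retraining is one possible response; whether that is enough is exactly the kind
of question an outside auditor would examine.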

0:56:04.210 --> 0:56:07.050
<v Speaker 1>You seem thoughtful about this, but there will be many

0:56:07.090 --> 0:56:10.010
<v Speaker 1>people coming into the industry over time who might not be

0:56:10.050 --> 0:56:13.570
<v Speaker 1>as thoughtful or as sophisticated as you are. Do you

0:56:13.570 --> 0:56:15.490
<v Speaker 1>think it would be a good idea to have third

0:56:15.610 --> 0:56:21.490
<v Speaker 1>parties come in to certify the audits for bias? I

0:56:21.570 --> 0:56:30.210
<v Speaker 1>know that's a hard question. I guess I kind

0:56:30.210 --> 0:56:33.650
<v Speaker 1>of lean towards no. So you're talking about having a

0:56:33.650 --> 0:56:38.650
<v Speaker 1>third-party entity that comes in and assesses and certifies

0:56:38.730 --> 0:56:40.930
<v Speaker 1>the audit. You know, because you've described what I think

0:56:41.010 --> 0:56:43.810
<v Speaker 1>is a really impressive process. But of course, how do

0:56:43.850 --> 0:56:46.370
<v Speaker 1>we know it's true? You know, you could reveal all

0:56:46.410 --> 0:56:49.330
<v Speaker 1>your algorithms, but that's probably not the thing you want to do.

0:56:49.850 --> 0:56:53.410
<v Speaker 1>And so the next best thing is a certifier says yes,

0:56:53.810 --> 0:56:56.570
<v Speaker 1>this audit has been done. Probably, you know, your financials

0:56:56.610 --> 0:57:01.650
<v Speaker 1>presumably get audited. Why not the result of the algorithm?

0:57:01.810 --> 0:57:04.570
<v Speaker 1>I guess a little bit. The reason I'm not sure

0:57:04.610 --> 0:57:06.810
<v Speaker 1>about the certification is mostly just because

0:57:06.810 --> 0:57:09.010
<v Speaker 1>I feel like I don't know how it would work exactly,

0:57:09.210 --> 0:57:13.370
<v Speaker 1>Like, you're totally right that finances are audited. I haven't

0:57:13.410 --> 0:57:15.890
<v Speaker 1>thought about it enough to have like a strong opinion

0:57:15.890 --> 0:57:17.810
<v Speaker 1>that it should happen, because it's like, Okay, we have

0:57:17.850 --> 0:57:22.330
<v Speaker 1>all these different models, it's constantly changing. How do

0:57:22.370 --> 0:57:26.770
<v Speaker 1>they audit every single model all the time? I was

0:57:26.850 --> 0:57:30.690
<v Speaker 1>impressed with Lindsey's willingness as a scientist to think in

0:57:30.810 --> 0:57:34.330
<v Speaker 1>real time about a hard question, and it turns out

0:57:34.730 --> 0:57:38.330
<v Speaker 1>she kept thinking about it afterwards. A few months later,

0:57:38.770 --> 0:57:41.770
<v Speaker 1>she wrote back to me to say that she changed

0:57:41.770 --> 0:57:45.210
<v Speaker 1>her mind. We do have a lot of private information,

0:57:46.010 --> 0:57:47.970
<v Speaker 1>but if we don't share it, people tend to assume

0:57:48.010 --> 0:57:52.050
<v Speaker 1>the worst. So I've decided, after thinking about it quite

0:57:52.090 --> 0:57:55.090
<v Speaker 1>a bit, that I definitely support the third party auditing

0:57:55.090 --> 0:58:00.010
<v Speaker 1>of algorithms. Sometimes people assume we're doing horrible, horrible things,

0:58:00.290 --> 0:58:03.010
<v Speaker 1>and that can be frustrating. But I do think the

0:58:03.090 --> 0:58:05.290
<v Speaker 1>more transparent we can be about what we are doing

0:58:05.810 --> 0:58:10.930
<v Speaker 1>is important. Several months later, Lindsey emailed again to say

0:58:10.970 --> 0:58:14.530
<v Speaker 1>that HireVue was now undergoing a third-party audit.

0:58:15.290 --> 0:58:23.290
<v Speaker 1>She says she's excited to learn from the results. Chapter Nine:

0:58:23.410 --> 0:58:29.330
<v Speaker 1>Confronting the Black Box. So HireVue, at first reluctant,

0:58:29.730 --> 0:58:35.650
<v Speaker 1>says it's now engaging external auditors. What about Equivant, whose

0:58:35.690 --> 0:58:39.490
<v Speaker 1>COMPAS scores can heavily influence prison sentences, but which has

0:58:39.530 --> 0:58:43.610
<v Speaker 1>steadfastly refused to let anyone even see how their simple

0:58:43.650 --> 0:58:47.930
<v Speaker 1>algorithm works? Well, just before we released this podcast, I

0:58:48.010 --> 0:58:51.930
<v Speaker 1>checked back with them. A company spokesperson wrote that Equivant

0:58:52.010 --> 0:58:55.970
<v Speaker 1>now agrees that the COMPAS scoring process, quote, "should be

0:58:56.010 --> 0:59:00.490
<v Speaker 1>made available for third-party examination," but they weren't releasing

0:59:00.530 --> 0:59:04.210
<v Speaker 1>it yet because they first wanted to file for copyright

0:59:04.290 --> 0:59:09.610
<v Speaker 1>protection on their simple algorithm. So we're still waiting. You

0:59:09.730 --> 0:59:13.170
<v Speaker 1>might ask, should it be up to the companies to decide?

0:59:13.690 --> 0:59:19.530
<v Speaker 1>Aren't there laws or regulations? The answer is there's not much.

0:59:20.370 --> 0:59:23.370
<v Speaker 1>Governments are just now waking up to the idea that

0:59:23.410 --> 0:59:26.530
<v Speaker 1>they have a role to play. I traveled back to

0:59:26.570 --> 0:59:29.610
<v Speaker 1>New York City to talk to someone who's been involved

0:59:29.650 --> 0:59:33.650
<v Speaker 1>in this question. My name's Rashida Richardson, and I'm a

0:59:33.730 --> 0:59:37.330
<v Speaker 1>civil rights lawyer that focuses on the social implications of

0:59:37.450 --> 0:59:41.650
<v Speaker 1>artificial intelligence. Rashida served as the director of policy research

0:59:41.690 --> 0:59:44.890
<v Speaker 1>at the AI Now Institute at NYU, where she worked

0:59:44.890 --> 0:59:49.170
<v Speaker 1>with Kate Crawford, the Australian expert on algorithmic bias that

0:59:49.210 --> 0:59:52.850
<v Speaker 1>I spoke to earlier in the episode. In twenty eighteen,

0:59:53.490 --> 0:59:56.450
<v Speaker 1>New York City became the first jurisdiction in the US

0:59:56.490 --> 1:00:00.010
<v Speaker 1>to create a task force to come up with recommendations

1:00:00.050 --> 1:00:04.250
<v Speaker 1>about government use of predictive algorithms, or, as they call them,

1:00:04.650 --> 1:00:10.130
<v Speaker 1>automated decision systems. Unfortunately, the task force bogged down in

1:00:10.210 --> 1:00:16.010
<v Speaker 1>details and wasn't very productive. In response, Rashida led a

1:00:16.010 --> 1:00:18.970
<v Speaker 1>group of twenty-seven experts that wrote a fifty-six-page

1:00:19.050 --> 1:00:26.330
<v Speaker 1>shadow report entitled Confronting Black Boxes that offered concrete proposals.

1:00:27.570 --> 1:00:30.570
<v Speaker 1>New York City, it turns out, uses quite a few

1:00:30.610 --> 1:00:37.290
<v Speaker 1>algorithms to make major decisions. You have the school matching algorithms.

1:00:37.330 --> 1:00:42.090
<v Speaker 1>You have an algorithm used by the Child Welfare agency here.

1:00:42.650 --> 1:00:46.450
<v Speaker 1>You have public benefits algorithms that are used to determine

1:00:46.530 --> 1:00:50.770
<v Speaker 1>who will qualify or have their public benefits, whether that's

1:00:50.850 --> 1:00:56.210
<v Speaker 1>Medicaid or temporary food assistance terminated, or whether they'll receive

1:00:56.250 --> 1:01:00.410
<v Speaker 1>access to those benefits. You have a gang database which

1:01:00.450 --> 1:01:03.650
<v Speaker 1>tries to identify who is likely to be in a gang,

1:01:03.690 --> 1:01:06.250
<v Speaker 1>and that's both used by the DA's office and the

1:01:06.290 --> 1:01:10.370
<v Speaker 1>police department. If you had to make a guess, how

1:01:10.370 --> 1:01:14.810
<v Speaker 1>many predictive algorithms are used by the City of New York?

1:01:15.890 --> 1:01:21.410
<v Speaker 1>I'd say upwards of thirty, and I'm underestimating with that number.

1:01:22.370 --> 1:01:28.290
<v Speaker 1>How many of these thirty plus algorithms are transparent about

1:01:28.290 --> 1:01:33.170
<v Speaker 1>how they work, about their code? None. So what should

1:01:33.210 --> 1:01:36.210
<v Speaker 1>New York do? If it was up to you, what should

1:01:36.250 --> 1:01:39.770
<v Speaker 1>be the behavior of a responsible city with respect to

1:01:40.170 --> 1:01:43.250
<v Speaker 1>the algorithms it uses? I think the first step is

1:01:43.370 --> 1:01:49.410
<v Speaker 1>creating greater transparency, some annual acknowledgement of what is being used,

1:01:49.410 --> 1:01:52.090
<v Speaker 1>how it's being used, whether it's been tested or had

1:01:52.090 --> 1:01:56.890
<v Speaker 1>a validation study. And then you would also want general

1:01:56.930 --> 1:01:59.930
<v Speaker 1>information about the inputs or factors that are used by

1:01:59.930 --> 1:02:03.050
<v Speaker 1>these systems to make predictions, because in some cases you

1:02:03.170 --> 1:02:07.610
<v Speaker 1>have factors that are just discriminatory or proxies for protected

1:02:07.690 --> 1:02:11.450
<v Speaker 1>statuses like race, gender, ability status. All right. So

1:02:11.570 --> 1:02:16.770
<v Speaker 1>step one, disclose what systems you're using. Yes. And then

1:02:17.530 --> 1:02:21.770
<v Speaker 1>the second step, I think is creating a system of audits,

1:02:21.770 --> 1:02:26.250
<v Speaker 1>both prior to procurement and then once procured, ongoing auditing

1:02:26.330 --> 1:02:29.290
<v Speaker 1>of the system to at least have a gauge on

1:02:29.330 --> 1:02:32.090
<v Speaker 1>what it's doing real time. A lot of the horror

1:02:32.170 --> 1:02:34.970
<v Speaker 1>stories we hear are about fully implemented tools that were

1:02:35.050 --> 1:02:38.890
<v Speaker 1>in the works for years. There's never a pause button to

1:02:39.370 --> 1:02:43.050
<v Speaker 1>reevaluate or look at how a system is working real time.

1:02:43.570 --> 1:02:46.130
<v Speaker 1>And even when I did studies on the use of

1:02:46.170 --> 1:02:50.650
<v Speaker 1>predictive policing systems, I looked at thirteen jurisdictions, only one

1:02:50.690 --> 1:02:54.690
<v Speaker 1>of them actually did a retrospective review of their system.

1:02:54.890 --> 1:02:57.010
<v Speaker 1>So what's your theory about how do you get the

1:02:57.010 --> 1:03:00.650
<v Speaker 1>auditing done? If you are going to outsource to third parties,

1:03:01.130 --> 1:03:03.650
<v Speaker 1>I think it's going to have to be some approval

1:03:03.690 --> 1:03:07.650
<v Speaker 1>process to assess their level of independence, but also any

1:03:07.730 --> 1:03:10.690
<v Speaker 1>conflict of interest issues that may come up, and

1:03:10.770 --> 1:03:13.610
<v Speaker 1>then also doing some thinking about what types of expertise

1:03:13.610 --> 1:03:16.650
<v Speaker 1>are needed, because I think if you don't necessarily have

1:03:16.690 --> 1:03:20.210
<v Speaker 1>someone who understands that social context or even the history

1:03:20.810 --> 1:03:25.170
<v Speaker 1>of a certain government sector, then you could have a

1:03:25.210 --> 1:03:28.090
<v Speaker 1>tool that is technically accurate and meets all of the

1:03:28.130 --> 1:03:31.370
<v Speaker 1>technical standards, but is still reproducing harm because it's not

1:03:31.450 --> 1:03:34.890
<v Speaker 1>paying attention to that social context. Should a government be

1:03:35.450 --> 1:03:42.610
<v Speaker 1>permitted to purchase an automated decision system where the code

1:03:42.810 --> 1:03:48.130
<v Speaker 1>can't be disclosed by contract? No, and in fact, there's

1:03:48.250 --> 1:03:53.290
<v Speaker 1>movement around creating more provisions that vendors must waive trade

1:03:53.330 --> 1:03:56.850
<v Speaker 1>secrecy claims once they enter a contract with the government.

1:03:57.530 --> 1:04:00.290
<v Speaker 1>Rashida says, we need laws to regulate the use of

1:04:00.290 --> 1:04:04.570
<v Speaker 1>predictive algorithms, both by governments and by private companies like

1:04:04.690 --> 1:04:08.250
<v Speaker 1>HireVue. We're beginning to see bills being explored in

1:04:08.290 --> 1:04:14.050
<v Speaker 1>different states. Massachusetts, Vermont, and Washington DC are considering setting

1:04:14.090 --> 1:04:17.610
<v Speaker 1>up commissions to look at the government use of predictive algorithms.

1:04:18.530 --> 1:04:23.010
<v Speaker 1>Idaho recently passed a first-in-the-nation law requiring

1:04:23.050 --> 1:04:27.610
<v Speaker 1>that pretrial risk algorithms be free of bias and transparent.

1:04:28.330 --> 1:04:33.290
<v Speaker 1>It blocks manufacturers of tools like COMPAS from claiming trade

1:04:33.330 --> 1:04:37.290
<v Speaker 1>secret protection. And at the national level, a bill was

1:04:37.410 --> 1:04:42.850
<v Speaker 1>recently introduced in the US Congress, the Algorithmic Accountability Act.

1:04:43.650 --> 1:04:47.090
<v Speaker 1>The bill would require that private companies ensure certain types

1:04:47.130 --> 1:04:53.370
<v Speaker 1>of algorithms are audited for bias. Unfortunately, it doesn't require

1:04:53.450 --> 1:04:56.730
<v Speaker 1>that the results of the audit are made public, so

1:04:56.770 --> 1:04:59.850
<v Speaker 1>there's still a long way to go. Rashida thinks it's

1:04:59.850 --> 1:05:04.410
<v Speaker 1>important that regulations don't just focus on technical issues. They

1:05:04.490 --> 1:05:07.810
<v Speaker 1>need to look at the larger context. Part of the

1:05:07.890 --> 1:05:10.730
<v Speaker 1>problems that we're identifying with these systems is that

1:05:10.770 --> 1:05:14.490
<v Speaker 1>they're amplifying and reproducing a lot of the historical and

1:05:14.570 --> 1:05:18.010
<v Speaker 1>current discrimination that we see in society. There are large

1:05:18.090 --> 1:05:21.050
<v Speaker 1>questions we've been unable to answer as a society of

1:05:21.290 --> 1:05:24.450
<v Speaker 1>how do you deal with the compounded effect of fifty

1:05:24.570 --> 1:05:27.050
<v Speaker 1>years of discrimination? And we don't have a simple answer,

1:05:27.050 --> 1:05:29.850
<v Speaker 1>and there's not necessarily going to be a technical solution.

1:05:30.210 --> 1:05:33.170
<v Speaker 1>But I think having access to more data and an

1:05:33.290 --> 1:05:36.050
<v Speaker 1>understanding of how these systems are working will help us

1:05:36.050 --> 1:05:39.330
<v Speaker 1>evaluate whether these tools are even being value-added and

1:05:39.370 --> 1:05:45.530
<v Speaker 1>addressing the larger social questions. Finally, Kate Crawford says laws

1:05:45.610 --> 1:05:49.890
<v Speaker 1>alone likely won't be enough. There's another thing we need

1:05:49.890 --> 1:05:53.530
<v Speaker 1>to focus on. In the end, it really matters who

1:05:53.610 --> 1:05:57.090
<v Speaker 1>is in the room designing these systems. If you have

1:05:57.370 --> 1:06:00.490
<v Speaker 1>people sitting around a conference table, they all look the same.

1:06:00.570 --> 1:06:03.290
<v Speaker 1>Perhaps they all did the same type of engineering degree.

1:06:03.370 --> 1:06:06.130
<v Speaker 1>Perhaps they're all men. Perhaps they're all pretty middle class

1:06:06.210 --> 1:06:08.770
<v Speaker 1>or pretty well off. They're going to be designing systems

1:06:08.810 --> 1:06:11.930
<v Speaker 1>that reflect their worldview. What we're learning is that the

1:06:11.970 --> 1:06:14.290
<v Speaker 1>more diverse those rooms are, and the more we can

1:06:14.370 --> 1:06:17.490
<v Speaker 1>question those kinds of assumptions, the better we can actually

1:06:17.530 --> 1:06:27.290
<v Speaker 1>design systems for a diverse world. Conclusion: Choose your planet.

1:06:30.370 --> 1:06:33.330
<v Speaker 1>So there you have it, Stewards of the Brave New Planet.

1:06:33.850 --> 1:06:39.410
<v Speaker 1>Predictive algorithms. A sixty-year-old dream of artificial intelligence,

1:06:39.890 --> 1:06:45.090
<v Speaker 1>machines making human-like decisions, has finally become a reality.

1:06:45.970 --> 1:06:49.050
<v Speaker 1>If a task can be turned into a prediction problem,

1:06:49.090 --> 1:06:52.450
<v Speaker 1>and if you've got a mountain of training data, algorithms

1:06:52.610 --> 1:06:57.050
<v Speaker 1>can learn to do the job. Countless applications are possible,

1:06:57.610 --> 1:07:03.890
<v Speaker 1>translating languages instantaneously, providing expert medical diagnoses for eye diseases

1:07:03.930 --> 1:07:09.370
<v Speaker 1>and cancer to patients anywhere, improving drug development, all at

1:07:09.450 --> 1:07:13.410
<v Speaker 1>levels comparable to or better than human experts. But it's

1:07:13.450 --> 1:07:17.890
<v Speaker 1>also letting governments and companies make automatic decisions about you,

1:07:19.050 --> 1:07:21.850
<v Speaker 1>whether you should get admitted to college, be hired for

1:07:21.890 --> 1:07:26.130
<v Speaker 1>a job, get a loan, get housing assistance, be granted bail,

1:07:26.850 --> 1:07:31.330
<v Speaker 1>or get medical attention. The problem is that algorithms that

1:07:31.490 --> 1:07:35.490
<v Speaker 1>learn to make human-like decisions based on past human

1:07:35.530 --> 1:07:42.170
<v Speaker 1>outcomes can acquire a lot of human biases about gender, race, class,

1:07:42.250 --> 1:07:48.810
<v Speaker 1>and more, often masquerading as objective judgment. Even worse, you

1:07:48.970 --> 1:07:52.130
<v Speaker 1>usually don't have a right even to know you're being

1:07:52.250 --> 1:07:55.970
<v Speaker 1>judged by a machine, or what's inside the black box,

1:07:56.610 --> 1:08:00.850
<v Speaker 1>or whether the algorithms are accurate or fair. Should laws

1:08:00.930 --> 1:08:04.810
<v Speaker 1>require that automated decision systems used by governments or companies

1:08:05.050 --> 1:08:09.650
<v Speaker 1>be transparent? Should they require public auditing for accuracy

1:08:09.810 --> 1:08:16.090
<v Speaker 1>and fairness? And what exactly is fairness, anyway? Governments are

1:08:16.130 --> 1:08:18.730
<v Speaker 1>just beginning to wake up to these issues, and they're

1:08:18.770 --> 1:08:22.050
<v Speaker 1>not sure what they should do. In the coming years,

1:08:22.330 --> 1:08:26.490
<v Speaker 1>they'll decide what rules to set, or perhaps to do

1:08:26.570 --> 1:08:30.490
<v Speaker 1>nothing at all. So what can you do? A lot,

1:08:30.570 --> 1:08:33.770
<v Speaker 1>it turns out. You don't have to be an expert

1:08:33.810 --> 1:08:36.810
<v Speaker 1>and you don't have to do it alone. Start by

1:08:36.930 --> 1:08:41.690
<v Speaker 1>learning a bit more. Invite friends over virtually or in

1:08:41.770 --> 1:08:45.410
<v Speaker 1>person when it's safe, for dinner and debate about what

1:08:45.450 --> 1:08:49.090
<v Speaker 1>we should do. Or organize a conversation at a book club,

1:08:49.410 --> 1:08:53.690
<v Speaker 1>a faith group, or a campus event. And then email

1:08:53.730 --> 1:08:57.570
<v Speaker 1>your city or state representatives to ask what they're doing

1:08:57.610 --> 1:09:02.010
<v Speaker 1>about the issue, maybe even proposing first steps like setting

1:09:02.050 --> 1:09:06.810
<v Speaker 1>up a task force. When people get engaged, action happens.

1:09:07.650 --> 1:09:10.770
<v Speaker 1>You'll find lots of resources and ideas at our website,

1:09:11.170 --> 1:09:15.970
<v Speaker 1>Brave New Planet dot org. It's time to choose our planet.

1:09:16.650 --> 1:09:31.330
<v Speaker 1>The future is up to us. Brave New Planet

1:09:31.450 --> 1:09:33.610
<v Speaker 1>is a co-production of the Broad Institute of MIT

1:09:33.730 --> 1:09:37.530
<v Speaker 1>and Harvard, Pushkin Industries, and the Boston Globe, with support

1:09:37.610 --> 1:09:40.930
<v Speaker 1>from the Alfred P. Sloan Foundation. Our show is produced

1:09:40.930 --> 1:09:44.970
<v Speaker 1>by Rebecca Lee Douglas with Mary Doo. Theme song composed

1:09:44.970 --> 1:09:48.770
<v Speaker 1>by Ned Porter, mastering and sound design by James Garver,

1:09:49.410 --> 1:09:53.090
<v Speaker 1>fact checking by Joseph Fridman, and a Stitt and Enchant.

1:09:53.970 --> 1:09:58.170
<v Speaker 1>Special thanks to Christine Heenan and Rachel Roberts at Clarendon Communications,

1:09:58.730 --> 1:10:02.290
<v Speaker 1>to Lee McGuire, Kristen Zarelli and Justine Levin Allerhand at

1:10:02.290 --> 1:10:06.450
<v Speaker 1>the Broad, to Mia Lobel and Heather Fain at Pushkin, and

1:10:07.010 --> 1:10:10.370
<v Speaker 1>to Eli and Edythe Broad, who made the Broad Institute possible.

1:10:11.010 --> 1:10:14.330
<v Speaker 1>This is Brave New Planet. I'm Eric Lander.