WEBVTT - Why We Need More Black Data Scientists w/ Matthew Finney 0:00:01.560 --> 0:00:04.320 So why isn't fairness part of our process here? 0:00:04.880 --> 0:00:08.760 It's because, well, as data scientists and statisticians and researchers, 0:00:08.840 --> 0:00:12.560 we had good intentions. We lack those mechanisms for action. 0:00:13.080 --> 0:00:15.520 We lack things in our process that force us to 0:00:15.560 --> 0:00:18.880 consider hard questions. We need to use our brains a 0:00:18.920 --> 0:00:21.440 little bit more than we need to for other problems 0:00:21.440 --> 0:00:24.400 that we solve every day. And so why don't we 0:00:24.440 --> 0:00:27.920 solve these hard problems? It's because we lack incentives as 0:00:27.960 --> 0:00:32.360 a community of data scientists to do something. It's a 0:00:32.400 --> 0:00:36.000 hard problem, and we have no transparency and no accountability 0:00:36.320 --> 0:00:38.920 for the models that we produce. Right, so that means 0:00:38.960 --> 0:00:42.800 that we have little hard business reason to prioritize fairness 0:00:42.800 --> 0:00:46.320 and to spend time working on addressing this hard problem. Will 0:00:46.320 --> 0:00:49.160 Lucas, Black Tech Green Money. Let's talk about 0:00:49.400 --> 0:00:53.320 algorithmic bias. You're probably like, yo Will, what in the 0:00:53.360 --> 0:00:58.400 world is algorithmic bias? Wikipedia says it describes systematic 0:00:58.720 --> 0:01:03.240 and repeatable errors in a computer system that create unfair outcomes, 0:01:03.280 --> 0:01:07.480 such as privileging one category over another in ways different 0:01:07.720 --> 0:01:11.440 from the intended function of the algorithm. Now we can 0:01:11.680 --> 0:01:14.959 debate whether these things are intended or not intended. But 0:01:14.959 --> 0:01:17.679 that's a different conversation for another day. 
But these can 0:01:17.680 --> 0:01:20.800 have a direct impact on you when they determine which 0:01:21.040 --> 0:01:24.720 political ads you see, or how many cops are deployed 0:01:24.720 --> 0:01:28.480 in your neighborhood, or even your insurance premiums, how much 0:01:28.520 --> 0:01:32.319 you pay for insurance. There was a study that showed 0:01:32.840 --> 0:01:36.200 even though Black Americans are four times more likely to 0:01:36.240 --> 0:01:40.399 have kidney failure, an algorithm to determine the priority of 0:01:40.480 --> 0:01:44.360 patients on a kidney transplant list put Black patients lower 0:01:44.400 --> 0:01:46.800 on the list than white patients, even when all other 0:01:46.840 --> 0:01:50.640 factors remain identical. So today on Black Tech Green Money, 0:01:50.680 --> 0:01:53.640 we're hearing from Matthew Finney, who's a data scientist and 0:01:53.680 --> 0:01:57.000 strategy consultant at Harvard. He was speaking from AfroTech 0:01:57.040 --> 0:02:00.760 World, and in his day job, he develops AI 0:02:01.000 --> 0:02:04.760 decision systems to help large organizations make an impact 0:02:04.760 --> 0:02:09.040 on their most challenging business and mission problems. I can sometimes 0:02:09.040 --> 0:02:12.560 be a reluctant technologist. Don't get me wrong: in the 0:02:12.639 --> 0:02:16.799 last decade we have made some amazing feats with artificial intelligence. 0:02:17.160 --> 0:02:19.400 We've been able to figure out what you want to 0:02:19.520 --> 0:02:22.760 buy before you knew you wanted it. We can have 0:02:23.040 --> 0:02:27.080 a self-driving, artificially intelligent electric car, and if that 0:02:27.160 --> 0:02:30.440 wasn't enough, we put it in space. 
We've trained AI 0:02:30.639 --> 0:02:34.760 to read mammograms with particular skill at diagnosing a set 0:02:34.800 --> 0:02:38.840 of highly invasive cancers that radiologists had missed, but we 0:02:38.919 --> 0:02:42.359 still haven't figured out how to make our technology treat 0:02:42.400 --> 0:02:44.440 others the way that we would want to be treated. 0:02:44.760 --> 0:02:47.000 So I promise I'm not just gonna stick to that 0:02:47.080 --> 0:02:51.000 gloom and doom topic today. So what are we gonna do? First, 0:02:51.000 --> 0:02:54.880 we're gonna define and measure algorithmic bias. Then we're gonna 0:02:54.960 --> 0:02:57.520 figure out how we can isolate the root causes of 0:02:57.560 --> 0:03:00.520 poor algorithm behavior, and finally, we're going to learn how 0:03:00.560 --> 0:03:03.160 we can all take action to make algorithms more fair. 0:03:03.639 --> 0:03:07.280 So let's get started. I want to evaluate algorithmic bias 0:03:07.360 --> 0:03:10.280 here through the lens of a case study, and we'll 0:03:10.360 --> 0:03:13.760 learn how to, through this case study, apply the tools 0:03:13.800 --> 0:03:18.320 more generally. Kidneys are really important, obviously. Their main function 0:03:18.400 --> 0:03:21.440 in our body is to help us filter out waste, 0:03:21.840 --> 0:03:24.280 and so there's a metric of kidney function called the 0:03:24.280 --> 0:03:29.880 glomerular filtration rate that's very important for diagnosing kidney 0:03:29.919 --> 0:03:33.680 disease. However, this metric is really hard to measure directly. 0:03:34.320 --> 0:03:36.320 If you were going to measure it directly, you'd need to 0:03:36.360 --> 0:03:38.720 collect the waste from the kidney over the period of 0:03:38.760 --> 0:03:41.839 twenty-four hours. So it's not practical, it's not fun 0:03:41.920 --> 0:03:44.840 for anyone. That's why in the seventies they developed an 0:03:44.880 --> 0:03:50.520 algorithmic way to estimate this metric. 
Doctors can take 0:03:50.560 --> 0:03:54.120 a sample of your blood and measure the level of 0:03:54.320 --> 0:03:59.680 a substance called creatinine that's in your blood sample, and there's 0:03:59.680 --> 0:04:03.480 a regression equation that takes that creatinine metric and turns 0:04:03.480 --> 0:04:07.120 it into a kidney function index. About this creatinine metric 0:04:07.280 --> 0:04:10.960 that they use: when researchers were developing the model, they 0:04:11.000 --> 0:04:15.600 realized that creatinine is highly sensitive to someone's muscle mass, 0:04:15.920 --> 0:04:18.800 you know, given that it's actually a byproduct of muscle activity. 0:04:19.000 --> 0:04:21.480 And so when they were trying to make the algorithm 0:04:21.640 --> 0:04:25.719 as accurate as they could, researchers determined that because African 0:04:25.760 --> 0:04:30.720 Americans have higher muscle mass, they have higher baseline creatinine levels, 0:04:30.800 --> 0:04:33.200 and so they decided that they were going to adjust 0:04:33.279 --> 0:04:36.720 the CKD-EPI algorithm, this kidney function algorithm, 0:04:36.760 --> 0:04:41.080 to increase kidney function index scores for African Americans to 0:04:41.400 --> 0:04:45.360 control for this muscle difference. Here, a higher kidney function 0:04:45.440 --> 0:04:49.440 score indicates that your kidney is healthier, so African Americans 0:04:49.440 --> 0:04:52.520 were being given kidney index scores that were showing their 0:04:52.560 --> 0:04:55.400 kidneys were healthier than a white person with the same 0:04:55.600 --> 0:04:59.760 observable metrics. Interestingly, the United States is the only place 0:04:59.800 --> 0:05:03.000 in the world that we do this race correction for 0:05:03.160 --> 0:05:05.960 kidney function, and there are many other places in the 0:05:06.000 --> 0:05:08.279 world where we have a large population of people with 0:05:08.360 --> 0:05:13.159 African heritage. 
This is because people see that this correction 0:05:13.360 --> 0:05:17.320 is unfair. There are two specific definitions of fairness that 0:05:17.360 --> 0:05:21.400 we use in the algorithm community. The first is group fairness, 0:05:22.200 --> 0:05:24.920 and the idea behind group fairness is that in your 0:05:25.000 --> 0:05:28.400 data set, you have groups that are identifiable and they 0:05:28.440 --> 0:05:32.240 should be treated similarly to the population as a whole. Right, 0:05:32.320 --> 0:05:35.080 so a group could be all people with blue eyes, 0:05:35.680 --> 0:05:40.200 people with red hair, everyone who lives in Minnesota, all men, 0:05:40.920 --> 0:05:45.520 people of Latin heritage. All those are examples of groups. 0:05:45.560 --> 0:05:47.920 And if you have an algorithm that is group fair, 0:05:48.279 --> 0:05:51.800 that means that the algorithm treats all of these groups 0:05:51.839 --> 0:05:55.200 similarly to the rest of the population, regardless of whether 0:05:55.279 --> 0:05:58.880 or not the algorithm has that information about the sensitive attribute 0:05:58.880 --> 0:06:01.080 that means someone's in a group or not. So let's 0:06:01.080 --> 0:06:05.760 look at the second definition, individual fairness. Individual fairness means 0:06:05.800 --> 0:06:09.799 that similar individuals should be treated similarly. An example 0:06:09.839 --> 0:06:13.080 of that is, let's say you have two people who 0:06:13.120 --> 0:06:17.279 have equal incomes and equal credit history, and they're applying 0:06:17.320 --> 0:06:19.480 for credit at a bank, and the bank uses an 0:06:19.520 --> 0:06:23.760 algorithmic decision system to determine whether or not to extend 0:06:23.800 --> 0:06:27.039 credit and a certain credit limit to the customers. 
So, 0:06:27.160 --> 0:06:29.880 given that they had the same income and the same 0:06:29.920 --> 0:06:33.520 credit history, even though one is male and the other's female, 0:06:33.800 --> 0:06:36.760 both individuals should get the same credit limit if the 0:06:36.760 --> 0:06:39.919 algorithm is individually fair. So now let's dive into this 0:06:40.000 --> 0:06:44.720 kidney function algorithm again and let's think: is this algorithm fair? 0:06:45.080 --> 0:06:47.279 So first we'll look at the group fairness of the 0:06:47.360 --> 0:06:51.120 CKD-EPI algorithm. The chart 0:06:51.160 --> 0:06:53.599 here on the right is taking a look at the 0:06:53.640 --> 0:06:57.640 median number of days that adults in the United States 0:06:57.880 --> 0:07:01.159 who received kidney transplants spent on the waiting list for 0:07:01.160 --> 0:07:05.960 a kidney before they received the transplant. Something stands 0:07:05.960 --> 0:07:10.040 out almost immediately here, and it's that African Americans can 0:07:10.120 --> 0:07:15.760 spend over twice as long as Caucasians on the waiting 0:07:15.800 --> 0:07:18.920 list for a kidney in the United States. Right, so 0:07:19.120 --> 0:07:22.440 African Americans are spending years on the waiting list, and 0:07:22.520 --> 0:07:25.080 part of this is because of the CKD-EPI 0:07:25.280 --> 0:07:29.920 algorithm that's giving them higher kidney function scores 0:07:30.320 --> 0:07:33.000 even though their kidney might not be functioning well, and 0:07:33.040 --> 0:07:35.280 that puts them at a lower priority on the waiting 0:07:35.320 --> 0:07:38.960 list for a kidney. So this is treating African Americans 0:07:39.040 --> 0:07:42.640 as a group differently from groups of other Americans, and 0:07:42.680 --> 0:07:45.720 that's something we should be concerned about. This algorithm is 0:07:45.720 --> 0:07:49.320 not group fair. So now let's consider: is this algorithm 0:07:49.360 --> 0:07:54.200 individually fair? 
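The group comparison described above, each group's median outcome set against the population as a whole, can be sketched in a few lines of Python. This is only an illustration and not the speaker's analysis: the function name `group_median_ratio`, the group labels "A" and "B", and the wait times are made-up placeholders, not transplant data.

```python
from statistics import median


def group_median_ratio(records):
    """Return {group: group median / overall median} for (group, value) pairs."""
    overall = median(value for _, value in records)
    by_group = {}
    for group, value in records:
        by_group.setdefault(group, []).append(value)
    return {group: median(values) / overall for group, values in by_group.items()}


# Hypothetical wait times in days for two placeholder groups; NOT real data.
toy_waits = [("A", 400), ("A", 500), ("A", 600),
             ("B", 900), ("B", 1000), ("B", 1100)]

ratios = group_median_ratio(toy_waits)
for group, ratio in sorted(ratios.items()):
    print(f"group {group}: median wait is {ratio:.2f}x the overall median")
```

A ratio far from 1 for an identifiable group is a signal worth investigating. On its own it does not say why the gap exists.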
Individual fairness means that we treat similar individuals similarly. 0:07:54.440 --> 0:07:57.800 And in this algorithm, we can have two individuals who 0:07:57.840 --> 0:08:00.560 have the same muscle mass and the same level of 0:08:00.560 --> 0:08:03.360 creatinine measured in their blood. But if one of 0:08:03.400 --> 0:08:05.480 them is white and one of them is Black, they're 0:08:05.480 --> 0:08:09.280 going to get different scores for their kidney function, such 0:08:09.360 --> 0:08:12.080 that the Black person will get a score indicating a 0:08:12.120 --> 0:08:17.120 healthier kidney than the white person. This is concerning, right? 0:08:17.240 --> 0:08:21.280 This is not individually fair, and the medical community is starting 0:08:21.320 --> 0:08:24.200 to come around to this. So last year in the 0:08:24.280 --> 0:08:27.880 Journal of the American Medical Association, they published an article 0:08:28.320 --> 0:08:31.760 asking to reconsider the use of race in the kidney 0:08:31.760 --> 0:08:34.600 function algorithm. And there was a sentence here that I 0:08:34.640 --> 0:08:37.240 thought was really important: with the eGFR 0:08:37.280 --> 0:08:41.640 equation that's being used, it asserts that existing organ function 0:08:42.240 --> 0:08:46.120 is different between individuals who are identical except for race. 0:08:46.960 --> 0:08:51.880 Race is causing African Americans to get unfavorable scores on 0:08:51.920 --> 0:08:55.200 their kidney function measurement that might lead them to get 0:08:55.240 --> 0:08:57.719 a lower priority on the waiting list to receive an 0:08:57.800 --> 0:09:02.240 organ that's desperately needed. This might seem obvious, that these 0:09:02.320 --> 0:09:07.120 types of scenarios are bad, right, and we shouldn't be 0:09:07.240 --> 0:09:10.440 using race for something that could have unfair outcomes that 0:09:10.520 --> 0:09:15.319 cause life or death situations for people. 
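The individual-fairness failure described above can be made concrete with a short sketch. The talk does not state the equation itself; the coefficients below are my recollection of the 2009 CKD-EPI creatinine equation and should be checked against the original publication before any use. The function name `egfr_ckd_epi_2009` and the patient values are illustrative assumptions, and none of this is for clinical use.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age, female, black):
    """Estimated GFR (mL/min/1.73 m^2) from serum creatinine.

    Coefficients are assumed from the 2009 CKD-EPI creatinine equation.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient the talk is criticizing
    return egfr


# Two hypothetical patients who are identical except for the race flag.
shared = dict(scr_mg_dl=1.4, age=50, female=False)
score_white = egfr_ckd_epi_2009(black=False, **shared)
score_black = egfr_ckd_epi_2009(black=True, **shared)
print(f"white: {score_white:.1f}  Black: {score_black:.1f}")
```

With every measured input held equal, the only thing separating the two scores is the fixed multiplier applied when the race flag is set. That is the situation individual fairness rules out.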
But this keeps 0:09:15.360 --> 0:09:18.600 happening over and over again. Any week you can open 0:09:18.679 --> 0:09:22.080 up the newspaper and see a new algorithm that was 0:09:22.200 --> 0:09:25.640 racist or sexist. You know, name your -ism, there's an algorithm 0:09:25.720 --> 0:09:29.480 that is suffering from it. So let's talk about how 0:09:29.520 --> 0:09:32.000 and why this happens. First, I want to just talk 0:09:32.040 --> 0:09:36.040 about how we make models. Algorithmic models are a function of 0:09:36.160 --> 0:09:41.439 three things: technology, people, and process. On the technical front, 0:09:41.679 --> 0:09:44.320 you know, that's where we consider the data that you're 0:09:44.400 --> 0:09:47.840 using to train your model and the specific algorithm. For example, 0:09:48.080 --> 0:09:50.160 that could be a neural network, that could be 0:09:50.200 --> 0:09:53.480 a linear regression, that could be anything in between. On 0:09:53.559 --> 0:09:56.480 the people front, you know, that's where we consider the 0:09:56.640 --> 0:10:00.480 role of people like myself, data scientists, business owners who 0:10:00.840 --> 0:10:03.640 come up with the business requirements for these algorithms, and 0:10:03.679 --> 0:10:06.880 the end users who actually take the algorithms and put 0:10:06.920 --> 0:10:10.000 them into practice to make decisions. And the last component 0:10:10.080 --> 0:10:13.720 here is the processes, the processes that we use to 0:10:13.800 --> 0:10:17.199 train our models, to evaluate our models, and apply them 0:10:17.200 --> 0:10:21.840 in practice. And by breaking down the process of building 0:10:21.840 --> 0:10:25.319 a model into these three components, we can evaluate them 0:10:25.360 --> 0:10:28.400 individually when we want to determine the root cause of 0:10:28.440 --> 0:10:32.480 algorithmic fairness or algorithmic bias. 
So how did we make 0:10:32.600 --> 0:10:35.720 a biased kidney function model in the context of these 0:10:35.760 --> 0:10:40.400 three components? First, let's look at technology. So when researchers 0:10:40.440 --> 0:10:43.479 were developing the CKD-EPI algorithm, 0:10:43.520 --> 0:10:46.840 they had many different ways that they could consider that 0:10:46.920 --> 0:10:50.920 were technologically feasible to measure and estimate eGFR. 0:10:51.080 --> 0:10:54.280 There was a direct way of measuring the glomerular 0:10:54.440 --> 0:10:59.400 filtration rate, which was very difficult but not impossible, and 0:10:59.440 --> 0:11:02.000 we could have gone with that as a medical community. There 0:11:02.000 --> 0:11:05.120 were other alternatives among things that we can measure in 0:11:05.160 --> 0:11:08.800 the blood. Beyond looking at creatinine, which is sensitive 0:11:08.840 --> 0:11:11.920 to muscle mass, we could have instead decided to look 0:11:11.960 --> 0:11:15.440 at cystatin C, which is another indicator of kidney 0:11:15.440 --> 0:11:18.880 function that has no sensitivity to muscle mass. And 0:11:19.040 --> 0:11:23.000 there were also better ways of measuring muscle mass that 0:11:23.040 --> 0:11:27.080 were technologically possible beyond just looking at someone's race to 0:11:27.240 --> 0:11:31.080 estimate muscle mass. Right, so technology wasn't the constraint here 0:11:31.080 --> 0:11:35.400 that led us to have an unfair algorithm for measuring 0:11:35.480 --> 0:11:39.160 kidney function. Let's evaluate the people. Now it's gonna sound 0:11:39.160 --> 0:11:41.320 like I'm glossing over this one, but I really do 0:11:41.480 --> 0:11:45.240 want to assume the researchers' best intentions here when they 0:11:45.280 --> 0:11:48.880 decided to build this regression model for measuring kidney function. 
0:11:49.360 --> 0:11:52.080 And I also want to assume that the doctors have 0:11:52.640 --> 0:11:55.520 only the best intentions and the best interests of their 0:11:55.559 --> 0:11:58.440 patients in mind when they make decisions on ordering this 0:11:58.559 --> 0:12:02.360 test and recommending patients for kidney transplants. So I don't 0:12:02.400 --> 0:12:05.080 think that people are the constraint here either that led 0:12:05.160 --> 0:12:07.480 us to have a biased model. So now let's look 0:12:07.520 --> 0:12:11.280 at the process. The process here for building this model 0:12:11.600 --> 0:12:15.160 was optimized for overall accuracy of the model. So we 0:12:15.240 --> 0:12:19.120 mentioned how when researchers decided to include race in the 0:12:19.200 --> 0:12:22.840 model that they were training, they got a slight overall 0:12:22.880 --> 0:12:25.760 accuracy boost in the model, and that was the driving 0:12:25.760 --> 0:12:28.360 factor in the decision to include race as a predictor 0:12:28.400 --> 0:12:31.280 of kidney function. That process, that's where I want to 0:12:31.280 --> 0:12:35.840 dive deeper. That's where our failure was. We had a 0:12:35.920 --> 0:12:41.720 process that was optimized for accuracy and not for fairness objectives, 0:12:42.080 --> 0:12:46.720 and that's how researchers developed a kidney 0:12:46.720 --> 0:12:49.880 function model that was racially biased and led to 0:12:50.000 --> 0:13:05.040 unfair outcomes. A couple of years ago, the US Department 0:13:05.040 --> 0:13:09.560 of Education Civil Rights Data Collection released information showing that 0:13:09.720 --> 0:13:13.240 Black and Latino students lack access at the high school 0:13:13.320 --> 0:13:18.360 level to high-level science and math classes. In predominantly 0:13:18.360 --> 0:13:22.600 white schools, calculus was offered across fifty percent of them. 
0:13:23.360 --> 0:13:29.080 In predominantly minority schools, just thirty-three percent. Physics: sixty-seven 0:13:29.120 --> 0:13:34.160 percent for white, forty percent for minority. Algebra: eight fo 0:13:34.280 --> 0:13:38.000 percent for white, seventy-one percent for minority. Now this 0:13:38.120 --> 0:13:42.080 matters because these have downstream effects. High aptitude in these 0:13:42.160 --> 0:13:46.520 STEM fields means higher representation in STEM careers. So when 0:13:46.559 --> 0:13:50.040 we're not represented well, the systems don't get built for 0:13:50.160 --> 0:13:54.520 us or even with our input appropriately considered. So how 0:13:54.520 --> 0:13:57.640 can these systems that weren't built with our input play 0:13:57.679 --> 0:14:01.840 out negatively in our communities? As data scientists, you know, 0:14:02.360 --> 0:14:06.080 we are in a profession where there's a high emphasis 0:14:06.120 --> 0:14:10.920 on overall accuracy and a number of procedural and technical controls 0:14:10.960 --> 0:14:14.560 that promote that. On the technical side, we have many 0:14:14.679 --> 0:14:20.920 metrics, like just overall vanilla accuracy, MSE, precision, recall, you 0:14:21.040 --> 0:14:25.120 name it, specialized metrics to measure the accuracy of our models. 0:14:25.400 --> 0:14:28.800 And then we have procedures like A/B testing that help 0:14:28.880 --> 0:14:32.000 us make determinations about whether or not we should deploy 0:14:32.040 --> 0:14:35.160 a certain model into practice. But we don't have that 0:14:35.280 --> 0:14:39.680 same infrastructure for fairness. As someone who's been in 0:14:39.680 --> 0:14:42.360 the room where it happens, you know, I can tell 0:14:42.400 --> 0:14:45.960 you where I think, specifically, this type of process breakdown 0:14:46.280 --> 0:14:50.040 affected the kidney function model that we've been evaluating. 0:14:50.520 --> 0:14:54.400 So let's look at specific things that they missed. 
First, 0:14:54.520 --> 0:14:57.160 let's address this chart here on the right. This is 0:14:57.200 --> 0:15:00.840 a chart that shows muscle mass by race among a 0:15:00.920 --> 0:15:04.640 population of US adults. The blue line represents white 0:15:04.680 --> 0:15:08.680 Americans and the red line represents Black Americans. So we 0:15:08.720 --> 0:15:12.120 can see that while, on average, Black Americans have a 0:15:12.160 --> 0:15:16.720 slightly higher muscle mass than white Americans, this shift 0:15:16.880 --> 0:15:20.480 is so slight that the distributions of muscle mass by 0:15:20.600 --> 0:15:24.280 race overlap almost entirely. What this tells me as a 0:15:24.360 --> 0:15:27.840 data scientist and a statistician is that an individual's race 0:15:27.920 --> 0:15:32.080 tells me next to nothing about that person's muscle mass. 0:15:32.240 --> 0:15:36.600 And so, as a researcher developing a kidney function algorithm, 0:15:36.720 --> 0:15:39.280 if I was concerned about muscle mass, I would have 0:15:39.360 --> 0:15:42.000 seen this chart and said, wow, race is not a 0:15:42.040 --> 0:15:44.560 predictor for muscle mass that's going to help us 0:15:44.720 --> 0:15:47.440 improve the accuracy of our algorithm in a way that's fair, 0:15:48.120 --> 0:15:51.880 because, you know, if we treat individuals as just members 0:15:51.960 --> 0:15:54.440 of a race, we're actually not going to give that 0:15:54.520 --> 0:15:57.880 person the best healthcare. So nothing in their process forced 0:15:57.880 --> 0:16:01.840 them to look at whether or not race is predictive, 0:16:02.360 --> 0:16:05.720 in a broad sense, for their objective, which 0:16:05.760 --> 0:16:09.280 was to control for muscle mass. Nothing also forced them 0:16:09.320 --> 0:16:13.160 to consider what the impact of using race would be 0:16:13.240 --> 0:16:16.920 on the fairness of their model. 
So they didn't consider 0:16:17.400 --> 0:16:21.320 the societal impacts of using race in healthcare. They also 0:16:21.360 --> 0:16:26.160 didn't consider how that would impact individuals, you know, 0:16:26.200 --> 0:16:28.280 who are on the waiting list for a kidney, and 0:16:28.320 --> 0:16:31.960 how that might lead to individuals who are equally qualified 0:16:31.960 --> 0:16:37.240 to receive a kidney being differentially prioritized on 0:16:37.320 --> 0:16:41.800 the list to receive that kidney based on race. So 0:16:41.960 --> 0:16:46.120 why isn't fairness part of our process here? It's 0:16:46.160 --> 0:16:49.960 because, well, as data scientists and statisticians and researchers, we 0:16:50.040 --> 0:16:54.200 had good intentions. We lack those mechanisms for action. We 0:16:54.360 --> 0:16:57.120 lack things in our process that force us to consider 0:16:57.480 --> 0:17:02.239 hard questions. It would be really easy to say 0:17:02.280 --> 0:17:05.640 that we have biased algorithms because there are biased individuals 0:17:05.720 --> 0:17:09.080 who want to encode their bias in the algorithms. 0:17:09.080 --> 0:17:11.679 And while I can't rule that out completely, let me 0:17:11.720 --> 0:17:15.520 tell you that of the time that is not the case. Right. 0:17:15.840 --> 0:17:22.760 Here's my hypothesis. Fairness is context specific, meaning that 0:17:23.119 --> 0:17:27.120 depending on what type of algorithm we're training, there might 0:17:27.119 --> 0:17:30.800 be a different fairness objective, and there might be different 0:17:30.880 --> 0:17:34.600 rules for what's fair and what's unfair. 
So, for example, 0:17:34.840 --> 0:17:37.879 there could be some healthcare scenarios where race is actually 0:17:37.920 --> 0:17:41.280 an important predictor of a person's overall health 0:17:41.359 --> 0:17:46.080 or risk for a disease, and those scenarios might 0:17:46.080 --> 0:17:50.480 be areas where it's fair to include race in an algorithm. 0:17:50.520 --> 0:17:53.439 But in something like this kidney function algorithm, we can 0:17:53.480 --> 0:17:56.840 see that including race is clearly unfair. And it's 0:17:56.880 --> 0:17:59.760 because there are these multiple notions of fairness with 0:18:00.040 --> 0:18:04.840 different context dependencies that fairness is actually a hard problem 0:18:04.920 --> 0:18:08.399 to solve. And for data scientists, you know, this is 0:18:08.400 --> 0:18:12.879 a hard problem without a unique, closed-form mathematical solution, 0:18:13.480 --> 0:18:15.600 meaning we need to use our brains a little bit 0:18:15.600 --> 0:18:17.879 more than we need to for other problems that we 0:18:17.920 --> 0:18:21.160 solve every day. And so why don't we solve these 0:18:21.200 --> 0:18:24.800 hard problems? It's because we lack incentives as a community 0:18:24.840 --> 0:18:29.280 of data scientists to do something. It's a hard problem, 0:18:29.359 --> 0:18:32.840 and we have no transparency and no accountability for the 0:18:32.880 --> 0:18:35.480 models that we produce. Right, so that means that we 0:18:35.520 --> 0:18:39.480 have little hard business reason to prioritize fairness and to 0:18:39.480 --> 0:18:42.840 spend time working on addressing this hard problem if no 0:18:42.840 --> 0:18:44.959 one's ever going to be able to see, you know, 0:18:45.280 --> 0:18:47.520 the steps that we took to address it and the 0:18:47.520 --> 0:18:53.119 impact of our work. So, considering this process and mechanism 0:18:53.160 --> 0:18:57.520 failure for fairness, how will we end algorithmic bias? 
So 0:18:57.600 --> 0:19:01.520 I want to return to this idea that algorithmic 0:19:01.560 --> 0:19:06.679 models are a function of three major components: technology, people, 0:19:07.000 --> 0:19:10.560 and process. This is actually a question I ask often, 0:19:11.040 --> 0:19:14.720 and I've asked it in conversations about algorithmic fairness with 0:19:14.760 --> 0:19:20.760 all kinds of people: technologists, computer scientists, mathematicians, lawyers, ethicists, activists, 0:19:21.359 --> 0:19:25.280 policy makers, sociologists, and many more. Right, and so 0:19:25.400 --> 0:19:27.800 I found through these conversations and through some of my 0:19:27.880 --> 0:19:31.720 own research that there are many existing approaches to addressing 0:19:31.720 --> 0:19:35.760 algorithmic bias, and they generally fall in the technology and 0:19:35.840 --> 0:19:39.680 people veins. And so that's what we're looking at here, 0:19:40.400 --> 0:19:43.480 just a couple of those different approaches that are already 0:19:43.480 --> 0:19:47.879 out there that allow us to address algorithmic fairness. On the 0:19:47.880 --> 0:19:52.040 technology front, I want to highlight that we already do 0:19:52.240 --> 0:19:56.040 have a class of algorithms that are always fair, or fair 0:19:56.080 --> 0:20:00.480 within certain constraints, and we're not always using them in our work. 0:20:00.720 --> 0:20:04.600 That's the problem. But there are tools out there that 0:20:04.680 --> 0:20:08.600 allow us to implement these very directly. So IBM, for example, 0:20:08.720 --> 0:20:12.920 recently released a toolkit called AI Fairness 360, 0:20:12.960 --> 0:20:17.080 and it has fair machine learning algorithms and machine learning 0:20:17.280 --> 0:20:21.919 diagnostics already implemented in Python that can be adapted to 0:20:22.320 --> 0:20:25.480 any other type of prediction problem. 
Now, if you're a 0:20:25.480 --> 0:20:29.440 little bit more adventurous, there's also a community of academics 0:20:29.760 --> 0:20:33.800 who are on the cutting edge of research on algorithmic fairness. 0:20:33.880 --> 0:20:36.840 And I'll point out the Symposium on the Foundations of 0:20:36.960 --> 0:20:40.000 Responsible Computing as one place where you can go and 0:20:40.119 --> 0:20:43.160 learn about a lot of those really cutting-edge 0:20:43.240 --> 0:20:46.840 research topics. All these videos from the symposium are actually 0:20:46.920 --> 0:20:50.000 publicly available on YouTube, so that you can, at your 0:20:50.080 --> 0:20:53.720 leisure, learn about these topics from the academics who developed 0:20:53.760 --> 0:20:57.480 them themselves. On the people front, right, we have a 0:20:57.560 --> 0:21:02.000 lot of existing organizations that are tackling education and tackling 0:21:02.040 --> 0:21:04.600 the social movement component of this as well. Just to 0:21:04.680 --> 0:21:08.119 name a few organizations that are doing many great things: 0:21:08.560 --> 0:21:11.359 we have Data for Black Lives and the Algorithmic 0:21:11.520 --> 0:21:15.199 Justice League, which are tackling that social movement and social 0:21:15.200 --> 0:21:21.680 activism approach to encouraging algorithmic fairness. And then there's also 0:21:21.720 --> 0:21:25.360 an organization called AI4ALL that is tackling 0:21:25.400 --> 0:21:28.359 the education. So given that we see a lot of 0:21:28.440 --> 0:21:31.920 existing work out there on the technology and people fronts, 0:21:32.200 --> 0:21:34.960 I want to turn our attention to process, where there's 0:21:35.040 --> 0:21:40.359 relatively less existing work, and that's where the focus of 0:21:40.400 --> 0:21:43.800 my research is: what mechanisms can help us to build 0:21:43.840 --> 0:21:48.320 fair algorithmic models? 
I'll return to those challenges that we 0:21:48.400 --> 0:21:52.240 discussed before, the fact that algorithmic fairness is hard to 0:21:52.240 --> 0:21:55.000 define and hard to measure, and because of a lack 0:21:55.000 --> 0:21:59.080 of transparency and accountability, we have few incentives to 0:21:59.119 --> 0:22:02.080 actually go in and tackle the hard problem. So 0:22:02.240 --> 0:22:04.840 first I want to propose an approach that will allow 0:22:04.960 --> 0:22:07.880 us to make this hard problem a little bit easier 0:22:07.920 --> 0:22:11.360 for us to solve. And it's called a fairness statement. 0:22:11.720 --> 0:22:15.680 So what is a fairness statement? It's an application-specific 0:22:15.720 --> 0:22:20.159 commitment to defined and measurable fairness goals. The scope of 0:22:20.160 --> 0:22:23.720 this fairness statement is going to include defining the relevant 0:22:23.760 --> 0:22:27.600 fairness objective or constraint for the specific algorithm that we're 0:22:27.600 --> 0:22:31.840 working on developing. So, for example, that could be: we 0:22:31.880 --> 0:22:35.480 want to make sure that African American people and white 0:22:35.480 --> 0:22:41.879 people receive similar kidney function scores for similar actual kidney function. 0:22:42.640 --> 0:22:46.040 Now that we've defined a fairness objective, we can document 0:22:46.080 --> 0:22:50.240 potential sources of bias that might impact our fairness objective 0:22:50.720 --> 0:22:55.240 and also the downstream impact we'll see to individuals or groups, right? 0:22:55.359 --> 0:22:57.680 So this might be the place where we raise: well, 0:22:57.720 --> 0:23:01.800 if our algorithm is racially biased, we might see African 0:23:01.840 --> 0:23:05.719 Americans placed at a lower priority on the kidney 0:23:05.720 --> 0:23:09.040 waiting list, and that might lead to adverse healthcare outcomes 0:23:09.080 --> 0:23:14.840 for that population. 
Finally, once we've documented the sources of bias, 0:23:15.080 --> 0:23:19.600 we can identify appropriate procedural and technical controls that we 0:23:19.600 --> 0:23:23.520 would take to mitigate the unacceptable risks. Right, so 0:23:23.640 --> 0:23:26.640 that could be, for example, implementing one of the classes 0:23:26.640 --> 0:23:30.240 of fair algorithms that we discussed before. One of the 0:23:30.359 --> 0:23:33.160 key benefits of the fairness statement is that it gives 0:23:33.280 --> 0:23:36.720 data scientists a named goal they can work towards, and 0:23:36.840 --> 0:23:40.000 that helps them make informed choices and trade-offs in the 0:23:40.080 --> 0:23:45.360 development of algorithms and the deployment. So, for example, if 0:23:45.400 --> 0:23:48.720 we had a fairness statement that was in place for 0:23:48.760 --> 0:23:51.880 the researchers who developed the CKD-EPI algorithm 0:23:51.960 --> 0:23:56.120 for kidney function, that might have helped them say, hey, 0:23:56.240 --> 0:23:58.840 we could include race and have a slight bump in 0:23:59.000 --> 0:24:04.960 overall accuracy for our algorithm, but that presents a high 0:24:05.160 --> 0:24:09.480 risk of unfair outcomes. Therefore, the cost of this solution 0:24:09.920 --> 0:24:15.440 outweighs the small benefit of controlling for race in measuring 0:24:15.560 --> 0:24:20.080 kidney function. Now, the other key benefit here is 0:24:20.119 --> 0:24:26.680 that this allows algorithm developers to catch problems early, at 0:24:26.720 --> 0:24:29.240 the stage when the algorithm is still in development and 0:24:29.280 --> 0:24:32.960 before it's been deployed into the world. This might mean 0:24:33.000 --> 0:24:36.520 that we catch an issue before it actually creates harm 0:24:36.720 --> 0:24:39.879 for people in real life. 
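The parts of a fairness statement listed above (a fairness objective, documented sources of bias, downstream impacts, and mitigating controls) could be recorded as a small structured object. The sketch below is one hypothetical way to do that. The class name `FairnessStatement`, its field names, and the `is_complete` check are my own illustrative choices, not a format proposed in the talk, and the example entries paraphrase the kidney case study.

```python
from dataclasses import dataclass, field


@dataclass
class FairnessStatement:
    """Hypothetical record of an application-specific fairness commitment."""
    objective: str
    bias_sources: list = field(default_factory=list)
    downstream_impacts: list = field(default_factory=list)
    controls: list = field(default_factory=list)

    def is_complete(self):
        # Complete only when every part of the statement has been filled in.
        return bool(self.objective and self.bias_sources
                    and self.downstream_impacts and self.controls)


kidney_statement = FairnessStatement(
    objective="Similar kidney function scores for similar actual kidney function",
    bias_sources=["Race used as a stand-in for muscle mass"],
    downstream_impacts=["Lower priority on the kidney waiting list"],
    controls=["Use one of the existing classes of fair algorithms"],
)
print(kidney_statement.is_complete())
```

A check like `is_complete` could sit in a review step before deployment, so that a model without a filled-in statement gets flagged while it is still in development.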
So now that we've talked 0:24:39.880 --> 0:24:43.800 about how we can make the fairness problem a 0:24:43.800 --> 0:24:46.480 little bit less hard, let's talk about how we 0:24:46.520 --> 0:24:50.560 can incentivize people to actually tackle it. I want to 0:24:50.600 --> 0:24:55.159 propose an approach called the algorithmic practice audit. So what 0:24:55.359 --> 0:24:58.399 is this? It's an independent third-party review of an 0:24:58.480 --> 0:25:03.320 organization's algorithmic processes and outcomes. On the process front, we 0:25:03.400 --> 0:25:07.359 might evaluate questions like: are we using a representative training 0:25:07.440 --> 0:25:10.679 data set to train our model? We might also question 0:25:10.680 --> 0:25:14.280 whether or not the organization is using fair classes of 0:25:14.320 --> 0:25:19.920 algorithms, when they exist, to train models. On the outcome front, 0:25:20.160 --> 0:25:24.080