1 00:00:00,080 --> 00:00:04,520 Speaker 1: Yeah. Welcome to How to Citizen with Baratunde, a 2 00:00:04,600 --> 00:00:08,400 Speaker 1: podcast that reimagines citizen as a verb, not a legal status. 3 00:00:09,240 --> 00:00:11,639 Speaker 1: This season is all about tech and how it can 4 00:00:11,680 --> 00:00:15,960 Speaker 1: bring us together instead of tearing us apart. We're bringing 5 00:00:16,000 --> 00:00:18,720 Speaker 1: you the people using technology for so much more than 6 00:00:18,800 --> 00:00:23,000 Speaker 1: revenue and user growth. They're using it to help us citizen. 7 00:00:34,440 --> 00:00:37,080 Speaker 1: I have been working over the past year to try 8 00:00:37,120 --> 00:00:41,040 Speaker 1: to integrate my own thinking around technology, and last year 9 00:00:41,080 --> 00:00:46,080 Speaker 1: I wrote a bit of a manifesto. Back in I 10 00:00:46,120 --> 00:00:49,600 Speaker 1: was invited to speak at Google I/O, an annual developer 11 00:00:49,640 --> 00:00:53,239 Speaker 1: conference held by Google. They wanted me to share my 12 00:00:53,360 --> 00:00:55,800 Speaker 1: thoughts on what the future of technology could look like. 13 00:00:56,600 --> 00:00:59,040 Speaker 1: I went on a journey to try to understand how 14 00:00:59,120 --> 00:01:04,080 Speaker 1: all my data existed amongst the major platforms, amongst app developers, 15 00:01:04,200 --> 00:01:06,520 Speaker 1: and what came out of that was a set of 16 00:01:06,560 --> 00:01:12,119 Speaker 1: principles to help guide us more conscientiously into the future. Now, 17 00:01:12,160 --> 00:01:16,240 Speaker 1: the first principle of my manifesto is all about transparency. 18 00:01:16,560 --> 00:01:20,120 Speaker 1: Like, I wanted to understand what was going on inside 19 00:01:20,200 --> 00:01:22,800 Speaker 1: the apps, behind the websites I was spending all my 20 00:01:22,880 --> 00:01:25,959 Speaker 1: time on.
When I want to know what's in my food, 21 00:01:26,360 --> 00:01:29,320 Speaker 1: I don't drag a chemistry set to the grocery store 22 00:01:29,640 --> 00:01:33,240 Speaker 1: and inspect every item point by point. I read the 23 00:01:33,280 --> 00:01:37,800 Speaker 1: nutrition label. I know the content, the calories, the ratings. 24 00:01:38,280 --> 00:01:41,720 Speaker 1: I shouldn't have to guess about what's inside the product. 25 00:01:41,760 --> 00:01:45,120 Speaker 1: I certainly shouldn't have to read a thirty-three-thousand-word 26 00:01:45,240 --> 00:01:49,880 Speaker 1: legalese terms of service to figure out what's really happening inside. 27 00:01:50,640 --> 00:01:54,080 Speaker 1: It's pretty simple. We make better decisions about the things 28 00:01:54,120 --> 00:01:57,720 Speaker 1: we consume when we know what's in them. So if 29 00:01:57,720 --> 00:01:59,440 Speaker 1: I'm checking out an app on the App Store, right, 30 00:01:59,560 --> 00:02:01,560 Speaker 1: and I see upfront that it's going to harvest my 31 00:02:01,640 --> 00:02:05,240 Speaker 1: data and slang it on some digital street corner, can 32 00:02:05,240 --> 00:02:11,640 Speaker 1: I interest you in some data? I can ask myself, hey, self, 33 00:02:12,200 --> 00:02:14,480 Speaker 1: are you okay with this app harvesting your data and 34 00:02:14,480 --> 00:02:18,040 Speaker 1: slanging it on a digital street corner? And then, having 35 00:02:18,080 --> 00:02:21,360 Speaker 1: asked myself that question, I can decide whether or not 36 00:02:21,440 --> 00:02:23,880 Speaker 1: to download it. I don't have to hope that it 37 00:02:23,880 --> 00:02:28,480 Speaker 1: won't screw me over. I can know. But check it out. 38 00:02:28,720 --> 00:02:32,800 Speaker 1: This nutrition label idea hasn't just existed in the vacuum 39 00:02:32,800 --> 00:02:36,240 Speaker 1: of my own brain. It's a real thing.
There are 40 00:02:36,400 --> 00:02:39,600 Speaker 1: actual people making nutrition labels in the world of tech. 41 00:02:40,760 --> 00:02:43,639 Speaker 1: In the same way that I walk into a bakery 42 00:02:43,760 --> 00:02:45,799 Speaker 1: and I see a cake that's been baked, and I 43 00:02:45,880 --> 00:02:49,400 Speaker 1: might think to myself, I wonder what's in that cake, 44 00:02:49,840 --> 00:02:51,360 Speaker 1: we would want the same thing for a data set, 45 00:02:51,440 --> 00:02:54,560 Speaker 1: where even if you encounter that data set in the wild, you, 46 00:02:54,600 --> 00:02:57,360 Speaker 1: as a data practitioner, will think to yourself, I wonder 47 00:02:57,400 --> 00:03:01,440 Speaker 1: if this is representative. Kasia Chmielinski is one of 48 00:03:01,440 --> 00:03:04,760 Speaker 1: those people. These labels are a little different from what 49 00:03:04,800 --> 00:03:08,480 Speaker 1: I proposed at Google I/O. Their data nutrition labels 50 00:03:08,480 --> 00:03:11,000 Speaker 1: aren't for consumers like me and you at the end 51 00:03:11,000 --> 00:03:13,959 Speaker 1: of the assembly line. Instead, they're for the people at the 52 00:03:14,040 --> 00:03:19,000 Speaker 1: very beginning: the data scientists. Now, Kasia's data nutrition labels 53 00:03:19,200 --> 00:03:22,040 Speaker 1: are an easy-to-use tool to help data scientists 54 00:03:22,280 --> 00:03:24,680 Speaker 1: pick the data that's right for the thing they're making. 55 00:03:26,840 --> 00:03:30,680 Speaker 1: We interact with algorithms every day, even when we're not 56 00:03:30,720 --> 00:03:34,600 Speaker 1: aware of it. They affect the decisions we make about hiring, 57 00:03:34,639 --> 00:03:38,440 Speaker 1: about policing, pretty much everything. And in the same way 58 00:03:38,480 --> 00:03:42,000 Speaker 1: that we, the people, ensure our well-being through government 59 00:03:42,040 --> 00:03:46,920 Speaker 1: standards and regulations on business activities, for example, data science 60 00:03:46,920 --> 00:03:52,640 Speaker 1: needs standards too. Kasia is fighting for standards that will 61 00:03:52,680 --> 00:03:56,480 Speaker 1: make sure that artificial intelligence works for our collective benefit 62 00:03:57,240 --> 00:04:04,200 Speaker 1: or at least doesn't undermine it. Hi. Hello, how are 63 00:04:04,240 --> 00:04:07,440 Speaker 1: you feeling right now, Kasia? I'm feeling pretty good, the 64 00:04:07,440 --> 00:04:10,040 Speaker 1: beginning of another week. Kasia is the co-founder and 65 00:04:10,200 --> 00:04:13,840 Speaker 1: lead of the Data Nutrition Project, the team behind those labels. 66 00:04:14,320 --> 00:04:17,400 Speaker 1: They've also worked as a digital services technologist in the 67 00:04:17,400 --> 00:04:21,799 Speaker 1: White House, on COVID analytics at McKinsey, and in communications 68 00:04:22,160 --> 00:04:27,599 Speaker 1: at Google. Yeah. Yeah, so I've kind of jumped around. Yeah, 69 00:04:30,000 --> 00:04:33,000 Speaker 1: so why don't you introduce yourself and just tell me 70 00:04:33,080 --> 00:04:36,279 Speaker 1: what you do. My name is Kasia Chmielinski, and I 71 00:04:36,320 --> 00:04:40,479 Speaker 1: am a technologist working on the ethics of data. And 72 00:04:40,880 --> 00:04:43,919 Speaker 1: I'd say, you know, importantly to me, although I have 73 00:04:43,960 --> 00:04:47,040 Speaker 1: always been a nerd and I studied physics a long time ago, 74 00:04:47,240 --> 00:04:50,359 Speaker 1: I come from a family of artists. Actually, the painting 75 00:04:50,560 --> 00:04:53,160 Speaker 1: behind me is by my brother. There's another one in 76 00:04:53,200 --> 00:04:55,720 Speaker 1: the room by my mom, um. And so I come 77 00:04:55,720 --> 00:04:58,520 Speaker 1: from a really kind of multidisciplinary group of people who 78 00:04:58,640 --> 00:05:01,240 Speaker 1: are driven by our passions.
And that's kind of what 79 00:05:01,279 --> 00:05:02,920 Speaker 1: I've tried to do too, and it's just led me 80 00:05:03,200 --> 00:05:07,320 Speaker 1: on many different paths. Where does the interest in technology 81 00:05:07,360 --> 00:05:10,479 Speaker 1: come from for you? You know, I don't think that 82 00:05:10,520 --> 00:05:12,880 Speaker 1: it's really an interest in technology. It's just that we're 83 00:05:12,920 --> 00:05:16,760 Speaker 1: in a technological time. And so when I graduated from 84 00:05:16,839 --> 00:05:20,640 Speaker 1: university with this physics degree, I had a few options, 85 00:05:21,040 --> 00:05:24,520 Speaker 1: and none of them really seemed great. Uh, you know, 86 00:05:24,560 --> 00:05:27,039 Speaker 1: I could go into defense work, I could become a spy, 87 00:05:27,279 --> 00:05:29,719 Speaker 1: or I could make weapons, and that really wasn't so 88 00:05:29,800 --> 00:05:35,600 Speaker 1: interesting to me. Was spy really an option? Uh, yes, 89 00:05:37,400 --> 00:05:39,919 Speaker 1: so, you know, I could do that, um, but I 90 00:05:39,960 --> 00:05:42,520 Speaker 1: didn't, and none of these were really interesting because 91 00:05:42,560 --> 00:05:45,320 Speaker 1: I wanted to make an impact and I wanted to 92 00:05:45,360 --> 00:05:47,719 Speaker 1: drive change, and I think that was around, you know, um, 93 00:05:47,800 --> 00:05:51,240 Speaker 1: the early two thousands, and technology was the place to be. That's 94 00:05:51,240 --> 00:05:53,240 Speaker 1: where you could really have the most impact and solve 95 00:05:53,279 --> 00:05:55,920 Speaker 1: really big problems. Um. And so that's where I ended up. 96 00:05:56,520 --> 00:05:58,800 Speaker 1: So I actually don't think that it's really about the 97 00:05:58,839 --> 00:06:00,920 Speaker 1: technology at all.
I think that the technology is just 98 00:06:01,000 --> 00:06:03,359 Speaker 1: a tool that you can use to kind of 99 00:06:03,400 --> 00:06:06,320 Speaker 1: make an impact in the world. I love the way 100 00:06:06,320 --> 00:06:09,920 Speaker 1: you describe the interest in technology as really just an 101 00:06:09,920 --> 00:06:13,000 Speaker 1: interest in the world. So do you remember some of 102 00:06:13,040 --> 00:06:16,320 Speaker 1: the first steps that led you to what you are 103 00:06:16,360 --> 00:06:20,880 Speaker 1: doing now? So when I graduated, I actually applied to 104 00:06:20,920 --> 00:06:23,479 Speaker 1: many things and didn't get them. And what I realized 105 00:06:23,520 --> 00:06:25,159 Speaker 1: was that I really didn't know how to do it at all, 106 00:06:25,160 --> 00:06:28,240 Speaker 1: how to tell a story. Um, and coming out of a 107 00:06:28,520 --> 00:06:32,760 Speaker 1: fairly technical path, I couldn't really make eye contact. I 108 00:06:32,800 --> 00:06:35,680 Speaker 1: hadn't talked to a variety of people. I mean, I 109 00:06:35,720 --> 00:06:38,000 Speaker 1: was definitely one of the only people who had my 110 00:06:38,040 --> 00:06:41,120 Speaker 1: identity in that discipline at that time. I went 111 00:06:41,160 --> 00:06:43,200 Speaker 1: to a school where the head of the school 112 00:06:43,200 --> 00:06:45,919 Speaker 1: at the time was saying that women might not be 113 00:06:45,960 --> 00:06:49,120 Speaker 1: able to do science because biologically they were inferior in 114 00:06:49,160 --> 00:06:52,880 Speaker 1: some way. Oh, that's nice, very welcoming environment. Oh yeah, 115 00:06:52,920 --> 00:06:55,080 Speaker 1: super welcoming. And I was studying physics, and at the time, 116 00:06:55,080 --> 00:06:57,520 Speaker 1: I, you know, was female-identified. I now identify 117 00:06:57,560 --> 00:07:00,440 Speaker 1: as non-binary. Um.
But it wasn't like a great 118 00:07:00,440 --> 00:07:02,839 Speaker 1: place to be doing science, and I just felt like 119 00:07:02,839 --> 00:07:05,440 Speaker 1: coming out of that, I was, um, I didn't know 120 00:07:05,440 --> 00:07:07,000 Speaker 1: how to talk to people. I didn't know what it 121 00:07:07,000 --> 00:07:08,760 Speaker 1: was like to be part of a great community. And 122 00:07:08,760 --> 00:07:11,200 Speaker 1: so I actually went into communications at Google, which was 123 00:07:11,240 --> 00:07:16,600 Speaker 1: strange. I went from this super nerdy, very 124 00:07:16,640 --> 00:07:19,640 Speaker 1: male-dominated place to, like, kind of the 125 00:07:19,680 --> 00:07:23,040 Speaker 1: party wing of technology at the time. Right? So, 126 00:07:23,080 --> 00:07:25,560 Speaker 1: people who were doing a lot of marketing and communications 127 00:07:25,600 --> 00:07:27,800 Speaker 1: and talking to journalists and telling stories and trying to 128 00:07:27,840 --> 00:07:30,200 Speaker 1: figure out, like, what's interesting, how does this fit into 129 00:07:30,240 --> 00:07:39,200 Speaker 1: the greater narratives of our time. So while at Google, 130 00:07:39,280 --> 00:07:42,000 Speaker 1: I got to see inside of so many different projects, 131 00:07:42,320 --> 00:07:45,280 Speaker 1: which I think was a great benefit of being part 132 00:07:45,280 --> 00:07:48,200 Speaker 1: of that strategy team.
So I got to work on 133 00:07:48,360 --> 00:07:50,680 Speaker 1: core Search, I got to work on image search, I 134 00:07:50,720 --> 00:07:55,200 Speaker 1: got to work on Gmail and Calendar, and I started 135 00:07:55,200 --> 00:07:59,960 Speaker 1: to see the importance of, first of all, knowing why 136 00:08:00,120 --> 00:08:03,480 Speaker 1: you're building something before you start to build it, right? 137 00:08:03,520 --> 00:08:06,080 Speaker 1: And there were so many times that I saw really, 138 00:08:06,280 --> 00:08:08,800 Speaker 1: really cool products at the end of the day, something, 139 00:08:09,000 --> 00:08:11,760 Speaker 1: an algorithm or something technical that was just really cool, 140 00:08:12,160 --> 00:08:15,080 Speaker 1: but there was no reason that it needed to exist, 141 00:08:15,280 --> 00:08:18,360 Speaker 1: right? From a person perspective, from a society perspective. 142 00:08:26,760 --> 00:08:29,680 Speaker 1: I am relieved to hear you say that. That's been 143 00:08:29,680 --> 00:08:32,520 Speaker 1: one of my critiques of this industry for quite some time. 144 00:08:32,559 --> 00:08:35,760 Speaker 1: It's like, whose problems are you trying to solve? And 145 00:08:35,800 --> 00:08:37,440 Speaker 1: so you were at the epicenter of one of the 146 00:08:37,840 --> 00:08:41,719 Speaker 1: major companies, seeing some of this firsthand. Yeah, that's exactly right. 147 00:08:41,800 --> 00:08:44,000 Speaker 1: And it was endemic. I mean, it just happens all 148 00:08:44,000 --> 00:08:46,920 Speaker 1: the time, and it's not the fault of anyone in particular. 149 00:08:46,960 --> 00:08:48,800 Speaker 1: You just put a bunch of really smart engineers on 150 00:08:48,840 --> 00:08:52,200 Speaker 1: a technical problem and they just find amazing ways to 151 00:08:52,240 --> 00:08:53,960 Speaker 1: solve that. But then at the end of the day 152 00:08:54,000 --> 00:08:55,400 Speaker 1: you say, well, how are we actually going to use this?
153 00:08:56,240 --> 00:08:58,800 Speaker 1: And that would fall to the comms team, right, or 154 00:08:58,800 --> 00:09:00,880 Speaker 1: the marketing team to say, okay, now what are we 155 00:09:00,920 --> 00:09:02,560 Speaker 1: going to do with this? Um, so that was one 156 00:09:02,600 --> 00:09:04,520 Speaker 1: thing, and that's why I actually ended up moving into 157 00:09:04,520 --> 00:09:07,439 Speaker 1: product management, where I could think about why we want 158 00:09:07,480 --> 00:09:09,160 Speaker 1: to build something to begin with, and to make sure 159 00:09:09,200 --> 00:09:11,720 Speaker 1: we're building the right thing. Um, so I got closer 160 00:09:11,760 --> 00:09:16,240 Speaker 1: to the technology after that job. The second thing that 161 00:09:16,400 --> 00:09:20,160 Speaker 1: I became aware of is the importance of considering the 162 00:09:20,200 --> 00:09:22,880 Speaker 1: whole pipeline of the thing that you build, because the 163 00:09:22,880 --> 00:09:25,360 Speaker 1: thing that you build, its DNA is in 164 00:09:25,720 --> 00:09:28,400 Speaker 1: the initial data that you put into it. And I'm 165 00:09:28,400 --> 00:09:33,640 Speaker 1: talking specifically about algorithmic systems here. So one example I 166 00:09:33,679 --> 00:09:36,120 Speaker 1: have from my days when I was at Google: I 167 00:09:36,160 --> 00:09:38,959 Speaker 1: actually worked out of the London office, and there 168 00:09:39,000 --> 00:09:42,319 Speaker 1: was a new search capability, and it was trained entirely 169 00:09:42,360 --> 00:09:47,080 Speaker 1: on one particular accent, and then when other people tried 170 00:09:47,120 --> 00:09:49,640 Speaker 1: to use that, if they didn't have that very specific accent, 171 00:09:49,679 --> 00:09:52,560 Speaker 1: it wasn't working so well. And I really didn't know 172 00:09:52,640 --> 00:09:55,240 Speaker 1: much about AI at the time.
I hadn't studied it, 173 00:09:55,760 --> 00:09:58,200 Speaker 1: but I realized, you know, bias in, bias out, like 174 00:09:58,200 --> 00:10:01,240 Speaker 1: garbage in, garbage out. You feed this machine something, 175 00:10:01,280 --> 00:10:03,719 Speaker 1: the machine is going to look exactly like what you 176 00:10:03,840 --> 00:10:10,160 Speaker 1: fed it. Right? You are what you eat. We'll be 177 00:10:10,280 --> 00:10:24,439 Speaker 1: right back. We use these terms, data, we use these 178 00:10:24,640 --> 00:10:29,240 Speaker 1: terms, algorithm and artificial intelligence. And so before we keep going, 179 00:10:29,320 --> 00:10:32,840 Speaker 1: I'd love for you to pause and kind of explain 180 00:10:32,920 --> 00:10:38,320 Speaker 1: what these things are and their relationship to each other. Data, algorithms, 181 00:10:38,880 --> 00:10:44,960 Speaker 1: artificial intelligence. How does Kasia define these? Yeah, thank 182 00:10:44,960 --> 00:10:47,280 Speaker 1: you for taking a moment. I think that, um, 183 00:10:47,320 --> 00:10:50,079 Speaker 1: something that's so important in technology is that people feel 184 00:10:50,080 --> 00:10:53,360 Speaker 1: like they aren't allowed to have an opinion or have 185 00:10:53,480 --> 00:10:56,760 Speaker 1: thoughts about it because they, quote unquote, don't understand it. 186 00:10:57,640 --> 00:11:03,000 Speaker 1: But you're right, it's often just a definitional issue. So, 187 00:11:03,559 --> 00:11:09,480 Speaker 1: data is anything that is programmatically accessible, that is probably 188 00:11:09,480 --> 00:11:13,320 Speaker 1: in enough volume to be used for something by a system. 189 00:11:13,360 --> 00:11:16,000 Speaker 1: So it could be records of something, it could be 190 00:11:16,160 --> 00:11:19,440 Speaker 1: weather information.
It could be the notes taken by a 191 00:11:19,480 --> 00:11:22,680 Speaker 1: doctor that then get turned into something that's programmatically accessible. 192 00:11:22,960 --> 00:11:25,440 Speaker 1: There's a lot of stuff, and you can feed that 193 00:11:25,480 --> 00:11:30,240 Speaker 1: to a machine. I'm really interested in algorithms because it's 194 00:11:30,320 --> 00:11:33,479 Speaker 1: kind of the practical way of understanding something like AI. 195 00:11:33,640 --> 00:11:36,520 Speaker 1: It's a mathematical formula, and it takes some 196 00:11:36,600 --> 00:11:39,440 Speaker 1: stuff and then it outputs something. So that could be 197 00:11:39,480 --> 00:11:44,120 Speaker 1: something like you input where you live and your name, 198 00:11:44,679 --> 00:11:47,560 Speaker 1: and then the algorithm will churn and spit out something 199 00:11:47,600 --> 00:11:50,000 Speaker 1: like, you know, what race or ethnicity it thinks you are. 200 00:11:50,440 --> 00:11:54,120 Speaker 1: And that algorithm, in order to make whatever guesses 201 00:11:54,200 --> 00:11:56,920 Speaker 1: it's making, needs to be fed a bunch of data 202 00:11:57,520 --> 00:12:01,160 Speaker 1: so that it can start to recognize patterns. When you 203 00:12:01,200 --> 00:12:04,240 Speaker 1: deploy that algorithm out in the world, you feed 204 00:12:04,280 --> 00:12:06,480 Speaker 1: it some data and it will spit out what it 205 00:12:06,520 --> 00:12:09,320 Speaker 1: believes is the pattern that it recognizes based on what 206 00:12:09,360 --> 00:12:16,800 Speaker 1: it knows. You know, there are different flavors of AI. I 207 00:12:16,800 --> 00:12:19,040 Speaker 1: think a lot of people are very afraid of kind 208 00:12:19,080 --> 00:12:25,079 Speaker 1: of the Terminator-type AI. I'll be back. As 209 00:12:25,120 --> 00:12:27,319 Speaker 1: we should be, because the Terminator is very scary.
I've 210 00:12:27,360 --> 00:12:29,880 Speaker 1: seen the documentary many times and I don't want to 211 00:12:29,880 --> 00:12:35,520 Speaker 1: live in that world. Yeah, legitimately very scary. Um, and 212 00:12:35,559 --> 00:12:37,800 Speaker 1: so there's this question of, okay, is the 213 00:12:37,840 --> 00:12:40,480 Speaker 1: AI going to come to eat our lunch? Right? Are 214 00:12:40,480 --> 00:12:42,120 Speaker 1: they smarter than us at all the things that we 215 00:12:42,160 --> 00:12:45,640 Speaker 1: can do? And that's, like, you know, generalized AI, or 216 00:12:45,840 --> 00:12:49,200 Speaker 1: even kind of super AI. We're not quite there yet. Currently, 217 00:12:49,240 --> 00:12:52,040 Speaker 1: we're in the phase where we have discrete AI that 218 00:12:52,080 --> 00:12:55,440 Speaker 1: makes discrete decisions, and we leverage those to help us 219 00:12:55,520 --> 00:12:59,199 Speaker 1: in our daily lives, or to hurt us sometimes. Data 220 00:12:59,280 --> 00:13:02,800 Speaker 1: as food for algorithms, I think it's a really useful metaphor. 221 00:13:03,040 --> 00:13:05,959 Speaker 1: And a lot of us out in the wild who 222 00:13:06,000 --> 00:13:09,400 Speaker 1: aren't specialized in this, I think we're not encouraged to 223 00:13:09,480 --> 00:13:13,439 Speaker 1: understand that relationship. I agree. And I think the 224 00:13:13,480 --> 00:13:16,320 Speaker 1: relationship between what you feed the algorithm and what it 225 00:13:16,360 --> 00:13:20,440 Speaker 1: gives you is so direct, and people don't necessarily know 226 00:13:20,600 --> 00:13:23,120 Speaker 1: that or see that. And what you see is the 227 00:13:23,679 --> 00:13:26,640 Speaker 1: harm or the output that comes out of the system, 228 00:13:26,679 --> 00:13:28,800 Speaker 1: and what you don't see is all the work that 229 00:13:28,800 --> 00:13:31,680 Speaker 1: went into building that system.
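The point that what an algorithm gives you mirrors what it was fed, including the search feature Kasia describes that was trained on a single accent, can be illustrated with a toy nearest-centroid classifier. This is a minimal sketch under stated assumptions: the two-number "acoustic features", the accent groups, the words "yes"/"no", and every function name are hypothetical and do not describe how any real speech system works. A model trained only on accent A scores perfectly on accent A and drops to 50% on an accent it never saw.

```python
from statistics import mean

# Hypothetical toy data: each example is ((feature_1, feature_2), word).
# Only speakers with "accent A" appear in the training set.
TRAIN_ACCENT_A = [
    ((1.0, 1.2), "yes"), ((0.8, 1.0), "yes"),
    ((3.0, 2.8), "no"), ((3.2, 3.0), "no"),
]
TEST_ACCENT_A = [((1.1, 0.9), "yes"), ((2.9, 3.1), "no")]
# Assumption for illustration: accent B shifts both features up by about 2.
TEST_ACCENT_B = [((3.0, 3.1), "yes"), ((5.0, 5.0), "no")]


def train_centroids(examples):
    """Learn one average feature vector (centroid) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Pick the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)


def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == label for f, label in examples)
    return correct / len(examples)


model = train_centroids(TRAIN_ACCENT_A)
print("accent A accuracy:", accuracy(model, TEST_ACCENT_A))  # 1.0
print("accent B accuracy:", accuracy(model, TEST_ACCENT_B))  # 0.5
```

Nothing in the formula is "wrong"; the accent B speaker saying "yes" lands closest to what the model learned "no" sounds like, because accent B was never on the menu.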
You have someone who decided 230 00:13:31,679 --> 00:13:33,480 Speaker 1: in the beginning they wanted to use AI, and then 231 00:13:33,520 --> 00:13:35,839 Speaker 1: you have somebody who went and found the data, and 232 00:13:35,920 --> 00:13:38,280 Speaker 1: you have somebody else who cleaned the data, and you've 233 00:13:38,320 --> 00:13:42,040 Speaker 1: got somebody or somebodies who then built the algorithm and 234 00:13:42,080 --> 00:13:44,520 Speaker 1: trained the algorithm, and then you have the somebodies who 235 00:13:44,559 --> 00:13:47,079 Speaker 1: coded that up, and then you have the somebodies that 236 00:13:47,080 --> 00:13:49,320 Speaker 1: deployed that, and then you have people who are running that. 237 00:13:50,400 --> 00:13:52,480 Speaker 1: And so when the algorithm comes out the end and 238 00:13:52,559 --> 00:13:54,640 Speaker 1: there's a decision that's made: you get the loan, you 239 00:13:54,679 --> 00:13:58,559 Speaker 1: didn't get the loan; the algorithm recognizes your speech, doesn't 240 00:13:58,559 --> 00:14:02,640 Speaker 1: recognize your speech; sees you, doesn't see you. People think, oh, 241 00:14:02,720 --> 00:14:05,280 Speaker 1: just change the algorithm. Oh no, you have to go 242 00:14:05,400 --> 00:14:13,280 Speaker 1: all the way back to the beginning, because you have 243 00:14:13,360 --> 00:14:15,240 Speaker 1: that long chain of people who are doing so many 244 00:14:15,240 --> 00:14:18,000 Speaker 1: different things, and it becomes very complicated to try to 245 00:14:18,040 --> 00:14:21,520 Speaker 1: fix that. So the more that we can understand that 246 00:14:21,560 --> 00:14:25,360 Speaker 1: the process begins with the question, do I need AI 247 00:14:25,480 --> 00:14:28,680 Speaker 1: for this? And then, very quickly after, where are we 248 00:14:28,680 --> 00:14:30,760 Speaker 1: going to get the data to feed that, so that 249 00:14:30,840 --> 00:14:34,080 Speaker 1: we make the right decision?
The sooner we understand that 250 00:14:34,120 --> 00:14:36,360 Speaker 1: as a society, I think the easier it's going to 251 00:14:36,440 --> 00:14:38,640 Speaker 1: be for us to build better AI, because we're not 252 00:14:38,680 --> 00:14:40,560 Speaker 1: just catching the issues at the very end of what 253 00:14:40,600 --> 00:14:44,000 Speaker 1: can be a years-long process. Mm-hmm. So 254 00:14:44,080 --> 00:14:47,840 Speaker 1: what problems does the Data Nutrition Project aim to tackle? 255 00:14:48,240 --> 00:14:50,240 Speaker 1: We've kind of talked about them all in pieces. At 256 00:14:50,240 --> 00:14:53,320 Speaker 1: its core, the Data Nutrition Project, which is this research 257 00:14:53,440 --> 00:14:56,520 Speaker 1: organization that I co-founded with a bunch of very smart people. 258 00:14:56,960 --> 00:14:59,520 Speaker 1: We were all part of a fellowship that was looking 259 00:14:59,600 --> 00:15:02,600 Speaker 1: at the ethics and governance of AI. And so when 260 00:15:02,640 --> 00:15:05,000 Speaker 1: we sat down to say, what are the real things 261 00:15:05,080 --> 00:15:08,640 Speaker 1: that we can do to drive change, um, as practitioners, 262 00:15:08,640 --> 00:15:10,320 Speaker 1: as people in the space, as people who had built 263 00:15:10,360 --> 00:15:14,760 Speaker 1: AI before, we decided, let's just go really small. And 264 00:15:14,760 --> 00:15:17,520 Speaker 1: obviously it's actually a huge problem, and it's very challenging. 265 00:15:17,600 --> 00:15:20,440 Speaker 1: But instead of saying, let's look at the harms that 266 00:15:20,520 --> 00:15:23,920 Speaker 1: come out of an AI system, let's just think about 267 00:15:23,920 --> 00:15:26,680 Speaker 1: what goes in. And I think we were maybe eating a 268 00:15:26,720 --> 00:15:28,520 Speaker 1: lot of snacks. We were holed up at the 269 00:15:28,520 --> 00:15:30,320 Speaker 1: MIT Media Lab, right? So we were just all 270 00:15:30,360 --> 00:15:32,920 Speaker 1: in this room for many, many hours, many, many days, 271 00:15:33,280 --> 00:15:35,520 Speaker 1: and I think somebody at some point picked up, you know, 272 00:15:35,560 --> 00:15:39,840 Speaker 1: a snack package, and we were like, what if you just 273 00:15:39,920 --> 00:15:43,000 Speaker 1: had a nutritional label like the one you have on food, 274 00:15:43,160 --> 00:15:45,240 Speaker 1: you just put that on a data set? What would 275 00:15:45,280 --> 00:15:48,480 Speaker 1: that do? I mean, is it possible, right? But if 276 00:15:48,560 --> 00:15:50,960 Speaker 1: it is possible, would that actually change things? And 277 00:15:51,000 --> 00:15:52,880 Speaker 1: we started talking it over and we thought, you know, 278 00:15:52,960 --> 00:15:57,200 Speaker 1: we think it would. In our experience in data science 279 00:15:57,200 --> 00:16:02,640 Speaker 1: as practitioners, we know that data doesn't come with standardized documentation, 280 00:16:03,520 --> 00:16:06,400 Speaker 1: and often you get a data set and you don't 281 00:16:06,400 --> 00:16:08,120 Speaker 1: know how you're supposed to use it or not use it. 282 00:16:08,840 --> 00:16:11,160 Speaker 1: There may or may not be tools that you use 283 00:16:11,600 --> 00:16:13,960 Speaker 1: to look at things that will tell you whether that 284 00:16:14,040 --> 00:16:16,120 Speaker 1: data set is healthy for the thing that you want 285 00:16:16,160 --> 00:16:22,120 Speaker 1: to do with it. The standard process would be: a 286 00:16:22,120 --> 00:16:25,440 Speaker 1: product manager or CEO would come over to the desk of 287 00:16:25,840 --> 00:16:28,760 Speaker 1: a data scientist and say, look, we have all this information 288 00:16:28,800 --> 00:16:31,000 Speaker 1: about this new product we want to sell.
We need 289 00:16:31,040 --> 00:16:35,640 Speaker 1: to map the marketing information to the demographics of people 290 00:16:35,720 --> 00:16:38,200 Speaker 1: who are likely to want to buy our product or 291 00:16:38,200 --> 00:16:41,160 Speaker 1: click on our product. Go make it happen. And the 292 00:16:41,200 --> 00:16:43,320 Speaker 1: data scientist goes, okay, and the person goes, oh yeah, 293 00:16:43,320 --> 00:16:47,360 Speaker 1: by Tuesday, and the person's like, oh, okay, let me 294 00:16:47,400 --> 00:16:50,320 Speaker 1: go find the right data for that. There's a whole world. 295 00:16:50,320 --> 00:16:53,080 Speaker 1: You just google a bunch of stuff and then you 296 00:16:53,160 --> 00:16:55,360 Speaker 1: get the data, and then you kind of poke around 297 00:16:55,400 --> 00:16:57,840 Speaker 1: and you think, this seems pretty good, and then you 298 00:16:57,960 --> 00:17:01,160 Speaker 1: use it and you build your algorithm on that. Your 299 00:17:01,160 --> 00:17:04,680 Speaker 1: algorithm is going to determine which demographics or what geographies 300 00:17:04,760 --> 00:17:07,440 Speaker 1: or whatever it is you're trying to do. You train 301 00:17:07,520 --> 00:17:09,760 Speaker 1: it on that data you found, and then you deploy 302 00:17:09,840 --> 00:17:12,960 Speaker 1: that algorithm and it starts to work in production. And 303 00:17:13,320 --> 00:17:16,000 Speaker 1: you know, no fault of anybody, really, but the industry 304 00:17:16,000 --> 00:17:19,080 Speaker 1: has grown up so much faster than the structures and 305 00:17:19,119 --> 00:17:22,280 Speaker 1: the scaffolding to keep that industry doing the right thing. 306 00:17:23,000 --> 00:17:25,560 Speaker 1: So there might be documentation on some of the data; 307 00:17:25,600 --> 00:17:27,639 Speaker 1: there might not be, in some cases.
We were working with 308 00:17:27,640 --> 00:17:30,320 Speaker 1: a data partner that was very concerned about how people were 309 00:17:30,320 --> 00:17:33,040 Speaker 1: going to use their data. The data set documentation was 310 00:17:33,080 --> 00:17:37,320 Speaker 1: an eighty-page PDF. Now, that data scientist who's on 311 00:17:37,359 --> 00:17:40,040 Speaker 1: deadline for Tuesday is not going to read eighty pages. 312 00:17:40,800 --> 00:17:45,080 Speaker 1: So our thought was, hey, can we distill the most 313 00:17:45,119 --> 00:17:49,080 Speaker 1: important components of a data set and its usage to 314 00:17:49,160 --> 00:17:52,000 Speaker 1: something that is maybe one sheet, two sheets, right, using 315 00:17:52,000 --> 00:17:54,679 Speaker 1: the analogy of the nutrition label, put it on a 316 00:17:54,720 --> 00:17:57,240 Speaker 1: data set, and then make that the standard, so that 317 00:17:57,280 --> 00:17:59,520 Speaker 1: anybody who is picking up a data set to decide 318 00:17:59,520 --> 00:18:01,560 Speaker 1: whether or not to use it will very quickly be 319 00:18:01,600 --> 00:18:03,520 Speaker 1: able to assess: is this healthy for the thing I 320 00:18:03,560 --> 00:18:06,600 Speaker 1: want to do? It's a novel application of a thing 321 00:18:06,680 --> 00:18:09,639 Speaker 1: that so many of us understand. What are some of 322 00:18:09,680 --> 00:18:13,080 Speaker 1: the harms you've seen, some of the harms you're trying 323 00:18:13,119 --> 00:18:16,320 Speaker 1: to avoid, from the data scientists who are building these 324 00:18:16,359 --> 00:18:21,520 Speaker 1: services not having access to healthy data? Yeah.
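To make the one- or two-sheet idea concrete, here is a minimal sketch of a label generator. It is not the Data Nutrition Project's actual label format. The fields it reports (record count, missing values, numeric ranges, and how well expected groups are represented), the 20% warning threshold, and the function name `make_label` are all assumptions for illustration. The sample rows are hypothetical and mimic a heart-attack data set drawn only from men in their sixties.

```python
from collections import Counter


def make_label(name, intended_use, rows, group_column, expected_groups, min_share=0.2):
    """Summarize a data set (a list of dicts) as a short, readable label.

    group_column / expected_groups: which population groups should appear.
    min_share: assumed threshold; expected groups below this share get a warning.
    """
    total = len(rows)
    columns = sorted({key for row in rows for key in row})

    # Technical health: how many records are missing each field.
    missing = {col: sum(row.get(col) is None for row in rows) for col in columns}

    # Numeric ranges (bools are skipped even though bool subclasses int).
    ranges = {}
    for col in columns:
        values = [
            row[col] for row in rows
            if isinstance(row.get(col), (int, float)) and not isinstance(row.get(col), bool)
        ]
        if values:
            ranges[col] = (min(values), max(values))

    # Representation: who is, and who is not, in the data.
    counts = Counter(row.get(group_column) for row in rows)
    warnings = []
    for group in expected_groups:
        share = counts.get(group, 0) / total if total else 0.0
        if share < min_share:
            warnings.append(f"{group_column}={group}: {share:.0%} of records")

    return {
        "name": name,
        "intended_use": intended_use,
        "records": total,
        "missing_values": {col: n for col, n in missing.items() if n},
        "numeric_ranges": ranges,
        "group_counts": dict(counts),
        "warnings": warnings,
    }


rows = [  # hypothetical records
    {"age": 61, "sex": "male", "heart_attack": True},
    {"age": 64, "sex": "male", "heart_attack": False},
    {"age": 67, "sex": "male", "heart_attack": True},
    {"age": 62, "sex": "male", "heart_attack": None},
]
label = make_label(
    "heart-outcomes-sample", "early warning for heart attacks",
    rows, group_column="sex", expected_groups=["female", "male"],
)
for field, value in label.items():
    print(f"{field}: {value}")
```

On these rows the label warns `sex=female: 0% of records` and reports an age range of 61 to 67. That is the "who's not represented here" question, answered on one sheet instead of in an eighty-page PDF.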
Let's say 325 00:18:21,560 --> 00:18:24,640 Speaker 1: you have a data set of health outcomes and you're 326 00:18:24,680 --> 00:18:27,359 Speaker 1: looking at people who have had heart attacks or something 327 00:18:27,440 --> 00:18:31,320 Speaker 1: like that, and you realize that the data was only 328 00:18:31,359 --> 00:18:35,440 Speaker 1: taken from men in their sixties. If you are now 329 00:18:35,480 --> 00:18:38,200 Speaker 1: going to use this as a data set to train 330 00:18:38,200 --> 00:18:42,160 Speaker 1: an algorithm to provide early warning signs for who might 331 00:18:42,240 --> 00:18:46,600 Speaker 1: have a heart attack, you're gonna miss entire demographics of people, 332 00:18:46,680 --> 00:18:49,000 Speaker 1: which may or may not matter. That's a question. Does 333 00:18:49,040 --> 00:18:52,160 Speaker 1: that matter? I don't know. But perhaps it matters what 334 00:18:52,240 --> 00:18:54,679 Speaker 1: the average size of a body is, or the average 335 00:18:54,680 --> 00:18:57,399 Speaker 1: age of a body is, or maybe there's something that 336 00:18:57,560 --> 00:19:00,080 Speaker 1: is gender- or sex-related, and you will miss 337 00:19:00,080 --> 00:19:02,119 Speaker 1: all of that if you just take the data at 338 00:19:02,160 --> 00:19:04,840 Speaker 1: face value and don't think about who's not represented here. 339 00:19:05,440 --> 00:19:09,080 Speaker 1: I remember an example that I used to cite in some talks. 340 00:19:09,119 --> 00:19:14,320 Speaker 1: It was the Amazon hiring decisions. Amazon software engineers recently 341 00:19:14,400 --> 00:19:18,680 Speaker 1: uncovered a big problem. Their new online recruiting tool did 342 00:19:18,760 --> 00:19:24,720 Speaker 1: not like women.
It had an automated screening system for resumes, 343 00:19:24,760 --> 00:19:28,359 Speaker 1: and that system ignored all the women because the data 344 00:19:28,440 --> 00:19:32,840 Speaker 1: set showed that successful job candidates at Amazon were men. 345 00:19:33,720 --> 00:19:36,159 Speaker 1: And so the computer, like garbage in, garbage out, the 346 00:19:36,200 --> 00:19:39,840 Speaker 1: way we've discussed, said, well, you've defined success as male. 347 00:19:40,640 --> 00:19:44,080 Speaker 1: You've fed me a bunch of females. That's not success. 348 00:19:44,200 --> 00:19:49,320 Speaker 1: Therefore my formula dictates they get rejected. And that affects 349 00:19:49,320 --> 00:19:51,600 Speaker 1: people's job prospects. You know, that affects people's sense of 350 00:19:51,640 --> 00:19:54,359 Speaker 1: their self-worth and self-esteem. That could open up 351 00:19:54,400 --> 00:19:57,560 Speaker 1: the company to liability, all kinds of harms in a 352 00:19:57,600 --> 00:20:03,040 Speaker 1: system that was supposed to bring efficiency and help. Yeah, 353 00:20:03,080 --> 00:20:05,640 Speaker 1: that's a great example, and it's, you know, a very 354 00:20:05,680 --> 00:20:09,600 Speaker 1: true one. And I think that one was pretty high-profile. 355 00:20:09,840 --> 00:20:12,720 Speaker 1: Imagine all the situations that either have never been caught 356 00:20:13,280 --> 00:20:15,000 Speaker 1: or were kind of too low-profile to make it 357 00:20:15,000 --> 00:20:18,879 Speaker 1: into the news. It happens all the time because the 358 00:20:19,000 --> 00:20:21,280 Speaker 1: algorithm is kind of a reflection of whatever you've 359 00:20:21,280 --> 00:20:24,720 Speaker 1: fed it.
So in that case, you had historical bias, 360 00:20:24,800 --> 00:20:27,920 Speaker 1: and so the historical bias in the resumes that they 361 00:20:27,920 --> 00:20:30,880 Speaker 1: were using to feed the algorithm showed that men were 362 00:20:30,960 --> 00:20:34,520 Speaker 1: hired more frequently, and that was success. It also comes 363 00:20:34,520 --> 00:20:37,040 Speaker 1: down to how you're defining things in terms of the metrics. 364 00:20:37,240 --> 00:20:40,600 Speaker 1: If your definition of success is that someone was hired, 365 00:20:41,240 --> 00:20:44,359 Speaker 1: you're not necessarily saying that that person 366 00:20:44,400 --> 00:20:47,280 Speaker 1: ended up being a good worker. Or 367 00:20:47,320 --> 00:20:49,800 Speaker 1: maybe you're looking at the person's performance reviews and 368 00:20:49,840 --> 00:20:52,960 Speaker 1: saying success would be that we hire somebody who performs well. 369 00:20:53,680 --> 00:20:57,080 Speaker 1: But historically you hired more men than women. So even then, 370 00:20:57,160 --> 00:20:59,920 Speaker 1: if your success metric is someone who performed well, you're 371 00:21:00,080 --> 00:21:02,520 Speaker 1: already building in the historical bias that there are 372 00:21:02,520 --> 00:21:05,359 Speaker 1: more men than women who are hired. So there are 373 00:21:05,359 --> 00:21:07,919 Speaker 1: all different kinds of biases that are being captured in 374 00:21:07,960 --> 00:21:15,160 Speaker 1: the data. Something that the Data Nutrition Project is trying 375 00:21:15,240 --> 00:21:17,880 Speaker 1: to do with the label that we've built is highlight 376 00:21:18,000 --> 00:21:20,960 Speaker 1: these kinds of historical issues as well as the technical 377 00:21:20,960 --> 00:21:23,359 Speaker 1: issues in the data, and that, I think, is an 378 00:21:23,359 --> 00:21:27,040 Speaker 1: important balance to strike.
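To make the historical-bias point concrete, here is a minimal Python sketch, assuming a small, hypothetical table of past hiring decisions (invented for illustration; it is not Amazon's data). It computes, per group, the share of records labeled "success" (here: hired). Any model trained to reproduce those labels starts from the same gap. The lowest-to-highest ratio is one common summary; US hiring guidance uses a "four-fifths" rule of thumb that treats a ratio below 0.8 as a warning sign, although the episode itself does not mention it.

```python
from collections import defaultdict

# Hypothetical historical records: (gender, hired). Illustrative only.
records = [
    ("male", True), ("male", True), ("male", False), ("male", True),
    ("female", False), ("female", True), ("female", False), ("female", False),
]

def selection_rates(rows):
    """Return the fraction of each group labeled as 'success' (hired)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, hired in rows:
        totals[group] += 1
        positives[group] += int(hired)
    return {g: positives[g] / totals[g] for g in totals}

rates = selection_rates(records)
# Lowest group rate divided by highest group rate: a model that learns
# to imitate these labels inherits the same disparity.
ratio = min(rates.values()) / max(rates.values())

print(rates)            # {'male': 0.75, 'female': 0.25}
print(round(ratio, 2))  # 0.33
```

Swapping "hired" for "performed well" does not remove the gap in this toy setup, because performance is only recorded for the people who were hired in the first place.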
It's not just about what you 379 00:21:27,080 --> 00:21:29,520 Speaker 1: can see in the data. It's also about what you 380 00:21:29,560 --> 00:21:33,560 Speaker 1: cannot see in the data. So in the case that 381 00:21:33,640 --> 00:21:35,960 Speaker 1: you just called out there with the resumes, you would 382 00:21:35,960 --> 00:21:39,040 Speaker 1: be able to see that it's not representative with respect to gender, 383 00:21:39,680 --> 00:21:41,760 Speaker 1: and maybe you'd be able to see things like these 384 00:21:41,760 --> 00:21:45,000 Speaker 1: are all English-language resumes. But what you would not 385 00:21:45,080 --> 00:21:49,000 Speaker 1: be able to see are things like socioeconomic differences 386 00:21:49,160 --> 00:21:52,040 Speaker 1: or people who never applied, or, you know, what the 387 00:21:52,160 --> 00:21:55,720 Speaker 1: job market looked like whenever these resumes were collected. So 388 00:21:55,800 --> 00:21:57,119 Speaker 1: you won't really be able to see any of 389 00:21:57,160 --> 00:21:59,280 Speaker 1: that if you just take a purely technical approach to 390 00:21:59,359 --> 00:22:02,080 Speaker 1: what's in the data. So the data set nutrition 391 00:22:02,119 --> 00:22:04,680 Speaker 1: label tries to highlight those things as well, to say to data 392 00:22:04,720 --> 00:22:08,600 Speaker 1: practitioners, before you use this data set, here 393 00:22:08,640 --> 00:22:12,400 Speaker 1: are some things you should consider. And sometimes we'll even 394 00:22:12,440 --> 00:22:15,080 Speaker 1: go as far as to say you probably shouldn't use 395 00:22:15,119 --> 00:22:18,199 Speaker 1: this data set for this particular thing because we just 396 00:22:18,359 --> 00:22:21,320 Speaker 1: know that it's not good for that, and that's always 397 00:22:21,320 --> 00:22:23,840 Speaker 1: an option: to say don't use it, right?
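The "what you can see" half of that distinction can be checked mechanically. Below is a minimal Python sketch, assuming hypothetical column values and reference population shares (all invented for illustration, echoing the earlier heart-attack example). It flags any group whose share of the data falls below a chosen fraction (here, half) of its expected share. The `representation_report` function and its `tolerance` parameter are hypothetical names, not part of any real tool.

```python
from collections import Counter

def representation_report(values, reference, tolerance=0.5):
    """Compare each group's observed share with a reference share.

    A group is flagged when its observed share is below `tolerance`
    times its reference share, so a group that is absent is always flagged.
    """
    counts = Counter(values)
    n = len(values)
    report = {}
    for group, expected in reference.items():
        observed = counts.get(group, 0) / n if n else 0.0
        report[group] = {
            "observed": observed,
            "expected": expected,
            "underrepresented": observed < tolerance * expected,
        }
    return report

# Hypothetical heart-attack data set in which every patient is a man in his sixties.
sex = ["male"] * 200
age_band = ["60-69"] * 200

# Hypothetical reference shares, assumed purely for illustration.
sex_ref = {"male": 0.5, "female": 0.5}
age_ref = {"40-49": 0.3, "50-59": 0.3, "60-69": 0.4}

for name, column, ref in [("sex", sex, sex_ref), ("age", age_band, age_ref)]:
    report = representation_report(column, ref)
    flagged = [g for g, r in report.items() if r["underrepresented"]]
    print(name, flagged)
# sex ['female']
# age ['40-49', '50-59']
```

As the speaker points out, a check like this only covers groups that appear as columns you thought to compare; it cannot reveal people who never applied or the job market at collection time.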
It 398 00:22:23,880 --> 00:22:25,560 Speaker 1: doesn't mean people won't do it, but at least we 399 00:22:25,600 --> 00:22:27,600 Speaker 1: can give you a warning, and we kind of hope 400 00:22:27,600 --> 00:22:29,680 Speaker 1: that people have the best of intentions and are trying 401 00:22:29,720 --> 00:22:32,119 Speaker 1: to do the right thing. So it's about explaining what 402 00:22:32,280 --> 00:22:35,119 Speaker 1: is in the data set or in the data so 403 00:22:35,160 --> 00:22:37,320 Speaker 1: that you can decide as a practitioner whether or not 404 00:22:37,400 --> 00:22:43,720 Speaker 1: it is healthy for your usage. After the break, it's 405 00:22:43,720 --> 00:23:00,000 Speaker 1: snack time. I'm back, hungry. So I'm holding a package 406 00:23:00,000 --> 00:23:03,800 Speaker 1: of food right now, and I'm looking at the nutrition 407 00:23:03,880 --> 00:23:08,240 Speaker 1: label: Nutrition Facts. It's got servings per container, the size 408 00:23:08,280 --> 00:23:12,880 Speaker 1: of a serving, and then numbers and percentages in terms 409 00:23:12,920 --> 00:23:17,960 Speaker 1: of the percent daily value of total fat, cholesterol, sodium, carbohydrates, protein, 410 00:23:18,560 --> 00:23:21,000 Speaker 1: and a set of vitamins that I can expect in 411 00:23:21,040 --> 00:23:23,239 Speaker 1: a single serving of this product. And then I can 412 00:23:23,280 --> 00:23:27,280 Speaker 1: make an informed choice about whether and how much of 413 00:23:27,320 --> 00:23:30,080 Speaker 1: that foodstuff I want to put in my body, 414 00:23:30,680 --> 00:23:32,600 Speaker 1: how much garbage I want to let in. In this case, 415 00:23:32,680 --> 00:23:36,440 Speaker 1: it's pretty healthy stuff. It's, uh, dried mangoes, if you're curious. 416 00:23:38,080 --> 00:23:43,360 Speaker 1: What's on your data nutrition label? Yeah, a great question. 417 00:23:43,520 --> 00:23:45,000 Speaker 1: And now I'm, like, kind of hungry.
I'm like, oh, 418 00:23:45,000 --> 00:23:49,400 Speaker 1: it's snack time. I feel like it's snack time. Um, 419 00:23:49,440 --> 00:23:52,320 Speaker 1: the hardest part of this project, to me, 420 00:23:52,480 --> 00:23:56,720 Speaker 1: is what the right level of metadata is. So what 421 00:23:56,800 --> 00:23:58,680 Speaker 1: are the right elements that you want to call out 422 00:23:58,720 --> 00:24:01,639 Speaker 1: for our nutritional label? You know, what are the fats 423 00:24:01,640 --> 00:24:04,080 Speaker 1: and the sodiums and these kinds of things? Because, you know, 424 00:24:04,119 --> 00:24:07,080 Speaker 1: the complication here is that there are so many 425 00:24:07,080 --> 00:24:09,440 Speaker 1: different kinds of data sets. I can have a data 426 00:24:09,480 --> 00:24:11,399 Speaker 1: set about trees in Central Park, and I can have 427 00:24:11,440 --> 00:24:14,880 Speaker 1: a data set about people in prison. So we've kind 428 00:24:14,880 --> 00:24:19,000 Speaker 1: of identified that the harms that we're most worried about 429 00:24:19,160 --> 00:24:22,480 Speaker 1: have to do with people. Um, not to say that 430 00:24:22,520 --> 00:24:25,200 Speaker 1: we are, you know, not worried about things like the 431 00:24:25,359 --> 00:24:29,159 Speaker 1: environment or other things, but when it touches people or 432 00:24:29,240 --> 00:24:32,359 Speaker 1: communities is when we see the greatest harms from an 433 00:24:32,400 --> 00:24:36,560 Speaker 1: algorithmic standpoint in society. And so we kind of have 434 00:24:36,640 --> 00:24:39,399 Speaker 1: a badge system that should be very quick, kind of 435 00:24:39,600 --> 00:24:42,400 Speaker 1: icon-based, that says this data set is about people or 436 00:24:42,440 --> 00:24:46,760 Speaker 1: not. This data set includes subpopulation data, so, you know, 437 00:24:46,880 --> 00:24:51,440 Speaker 1: it includes information about race or gender or whatever status.
Right, 438 00:24:52,040 --> 00:24:55,520 Speaker 1: this data set can be used for commercial purposes or not. 439 00:24:55,880 --> 00:24:58,840 Speaker 1: We've identified, let's say, ten to fifteen things that we 440 00:24:58,880 --> 00:25:01,640 Speaker 1: think are kind of high-level, almost like little food 441 00:25:01,720 --> 00:25:05,600 Speaker 1: warning symbols that you would see on something, like organic, 442 00:25:05,680 --> 00:25:09,520 Speaker 1: or it's got a Surgeon General's warning, right? Exactly. So 443 00:25:09,560 --> 00:25:11,680 Speaker 1: at a very high level we have these kinds of icons. 444 00:25:12,080 --> 00:25:16,119 Speaker 1: And then underneath that there are additional, very important questions 445 00:25:16,119 --> 00:25:18,879 Speaker 1: that we've highlighted, which the people who own the 446 00:25:18,920 --> 00:25:21,840 Speaker 1: data set will answer. And then finally there's a section that says, 447 00:25:21,880 --> 00:25:23,960 Speaker 1: here's the reason the data set was made, 448 00:25:23,960 --> 00:25:27,080 Speaker 1: which is probably an intended use. Here are some other use 449 00:25:27,119 --> 00:25:29,560 Speaker 1: cases that are possible, or ways that other people have 450 00:25:29,680 --> 00:25:31,439 Speaker 1: used it, and then here are some things that you 451 00:25:31,480 --> 00:25:36,200 Speaker 1: just shouldn't do. So how do we make this approach 452 00:25:36,440 --> 00:25:40,200 Speaker 1: more mainstream? Mainstream is a tough word because we're talking 453 00:25:40,240 --> 00:25:42,600 Speaker 1: about people who build AI, and I think that is 454 00:25:42,640 --> 00:25:46,480 Speaker 1: becoming more mainstream for sure. Um, but we're really focused 455 00:25:46,480 --> 00:25:49,679 Speaker 1: on data practitioners, so people who are taking data and 456 00:25:49,680 --> 00:25:52,119 Speaker 1: then building things on that data.
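The label's three layers (quick yes/no badges, questions answered by the data set's owners, and a section on intended, other, and inadvisable uses) map naturally onto a small data structure. The sketch below is hypothetical: `DatasetLabel` and all of its field names are invented for illustration and are not the Data Nutrition Project's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class DatasetLabel:
    """A hypothetical, simplified 'nutrition label' for a data set."""
    name: str
    # Layer 1: quick, icon-style badges (only three of the "ten to fifteen").
    about_people: bool
    has_subpopulation_data: bool
    commercial_use_ok: bool
    # Layer 2: questions answered by the people who own the data set.
    owner_answers: dict[str, str] = field(default_factory=dict)
    # Layer 3: why it was made, other known uses, and uses to avoid.
    intended_use: str = ""
    other_uses: list[str] = field(default_factory=list)
    not_recommended: list[str] = field(default_factory=list)

    def badges(self) -> list[str]:
        """Render the yes/no badges as short strings for a one-glance summary."""
        flags = {
            "About people": self.about_people,
            "Subpopulation data": self.has_subpopulation_data,
            "Commercial use OK": self.commercial_use_ok,
        }
        return [f"{label}: {'yes' if value else 'no'}" for label, value in flags.items()]

    def warns_against(self, use: str) -> bool:
        """Return True if the owners listed this use as one to avoid."""
        return use in self.not_recommended

# Hypothetical example label for a resume data set.
label = DatasetLabel(
    name="historical-resumes (hypothetical)",
    about_people=True,
    has_subpopulation_data=True,
    commercial_use_ok=False,
    owner_answers={"Was consent obtained?": "Unknown"},
    intended_use="Studying past hiring patterns",
    not_recommended=["Training an automated resume screener"],
)
print(label.badges())
# ['About people: yes', 'Subpopulation data: yes', 'Commercial use OK: no']
print(label.warns_against("Training an automated resume screener"))  # True
```

Keeping the badges as plain booleans makes the top layer cheap to scan, while the free-text layers carry the context that a purely technical summary would miss.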
But there's kind of 457 00:25:52,119 --> 00:25:56,000 Speaker 1: a bottom-up approach. It's very anti-establishment in some ways, 458 00:25:56,040 --> 00:25:59,280 Speaker 1: very hacker culture. And so we've been working with a 459 00:25:59,320 --> 00:26:01,800 Speaker 1: lot of data practitioners to say what works, what doesn't, 460 00:26:02,080 --> 00:26:04,200 Speaker 1: is it useful or is it not. Make it open source, right? 461 00:26:04,280 --> 00:26:07,160 Speaker 1: Open licenses, use it if you want, and just hope 462 00:26:07,160 --> 00:26:09,239 Speaker 1: that if we make a good thing, people will use it. 463 00:26:09,600 --> 00:26:12,399 Speaker 1: A rising tide lifts all boats, we think, so, you know, 464 00:26:12,920 --> 00:26:15,320 Speaker 1: we're not cagey about it because we just want better data. 465 00:26:15,400 --> 00:26:17,159 Speaker 1: We want better data out there, and if people have 466 00:26:17,240 --> 00:26:19,159 Speaker 1: the expectation that they're going to see something like this, 467 00:26:19,240 --> 00:26:22,520 Speaker 1: that's awesome. There's also the top-down approach, which is 468 00:26:22,840 --> 00:26:26,160 Speaker 1: regulation, policy. And I could imagine a world in which, 469 00:26:26,160 --> 00:26:30,480 Speaker 1: in the future, if you deploy an algorithm, especially in 470 00:26:30,480 --> 00:26:32,960 Speaker 1: the public sector, you would have to include some kind 471 00:26:33,000 --> 00:26:35,080 Speaker 1: of labeling on that, right? To talk about the data 472 00:26:35,080 --> 00:26:36,800 Speaker 1: that it was trained on and provide a label for that. 473 00:26:36,880 --> 00:26:39,520 Speaker 1: So it's kind of a two-way approach, you know.
Yeah, no, 474 00:26:39,560 --> 00:26:41,760 Speaker 1: I mean, when I think of analogues, like, most of 475 00:26:41,840 --> 00:26:46,560 Speaker 1: us don't know civil engineers personally, but we interact with 476 00:26:46,640 --> 00:26:49,520 Speaker 1: their work on a regular basis through a system of trust, 477 00:26:49,880 --> 00:26:54,920 Speaker 1: through standards, through approvals, through certifications. And data scientists are 478 00:26:55,200 --> 00:26:57,440 Speaker 1: on par with, like, a civil engineer in my mind, 479 00:26:57,480 --> 00:27:00,960 Speaker 1: in that they erect structures that we inhabit on a regular basis. 480 00:27:01,520 --> 00:27:04,480 Speaker 1: But I have no idea what rules they're operating by. 481 00:27:04,600 --> 00:27:06,639 Speaker 1: I don't know what's in this algorithm, you know. I 482 00:27:06,680 --> 00:27:08,600 Speaker 1: don't know what ingredients you used to put this 483 00:27:08,640 --> 00:27:12,080 Speaker 1: together that's determining whether I get a job or a vaccination. 484 00:27:13,280 --> 00:27:16,400 Speaker 1: What's your biggest dream for the Data Nutrition Project? Where 485 00:27:16,400 --> 00:27:20,600 Speaker 1: does it go? So I could easily say, you know, 486 00:27:20,880 --> 00:27:22,560 Speaker 1: our dream would be that every data set comes with 487 00:27:22,600 --> 00:27:26,280 Speaker 1: a label. Cool. But more than that, I think we're 488 00:27:26,320 --> 00:27:28,960 Speaker 1: trying to drive awareness and change, so even if there 489 00:27:29,040 --> 00:27:31,520 Speaker 1: isn't a label, you're thinking, I wonder what's in 490 00:27:31,560 --> 00:27:33,199 Speaker 1: this, and I wish it had a label on it.
491 00:27:35,480 --> 00:27:38,360 Speaker 1: In the same way that I walk into a bakery 492 00:27:38,480 --> 00:27:40,560 Speaker 1: and I see a cake that's been baked, and I 493 00:27:40,640 --> 00:27:44,840 Speaker 1: might think to myself, I wonder what's in that cake, 494 00:27:45,800 --> 00:27:50,080 Speaker 1: and I wonder, you know, if it has this much 495 00:27:50,119 --> 00:27:52,640 Speaker 1: of something, or maybe I should consider this when I 496 00:27:52,720 --> 00:27:54,960 Speaker 1: decide whether to have four or five pieces of cake. 497 00:27:55,320 --> 00:27:56,879 Speaker 1: We would want the same thing for a data set, 498 00:27:56,960 --> 00:27:59,639 Speaker 1: where even if you encounter that data set in the wild, 499 00:28:00,320 --> 00:28:03,600 Speaker 1: someone's created it, you just downloaded it from some repository 500 00:28:03,640 --> 00:28:07,280 Speaker 1: on GitHub, there's no documentation, you, as a 501 00:28:07,359 --> 00:28:10,040 Speaker 1: data practitioner, will think to yourself, I wonder if this 502 00:28:10,080 --> 00:28:13,520 Speaker 1: is representative. I wonder if the thing I'm trying to 503 00:28:13,520 --> 00:28:18,320 Speaker 1: do with this data is responsible, considering the data: where 504 00:28:18,320 --> 00:28:20,960 Speaker 1: it came from, who touched it, who funded it, where 505 00:28:20,960 --> 00:28:24,159 Speaker 1: it lives, how often it's updated, whether they got consent 506 00:28:24,560 --> 00:28:28,120 Speaker 1: from people when they took their data. And so we're 507 00:28:28,119 --> 00:28:33,040 Speaker 1: trying to drive a culture change. I love that, and 508 00:28:33,280 --> 00:28:35,760 Speaker 1: I love the idea that when I go to a bakery, 509 00:28:36,480 --> 00:28:39,880 Speaker 1: one of the questions I'm not asking myself is, is 510 00:28:39,920 --> 00:28:43,360 Speaker 1: that muffin safe to eat? Right? Is that cake gonna 511 00:28:43,480 --> 00:28:46,960 Speaker 1: kill me?
It literally doesn't enter my mind because there's 512 00:28:47,000 --> 00:28:51,040 Speaker 1: such a level of earned trust in the system overall 513 00:28:51,720 --> 00:28:54,680 Speaker 1: that, you know, these people are getting inspected, that there's 514 00:28:54,720 --> 00:28:57,040 Speaker 1: some kind of oversight, that they were trained in a 515 00:28:57,080 --> 00:29:00,800 Speaker 1: reasonable way, so I know there's not arsenic in the muffins. 516 00:29:01,760 --> 00:29:04,520 Speaker 1: So this brings me to zooming out a little bit 517 00:29:04,560 --> 00:29:08,600 Speaker 1: further to artificial intelligence and the idea of standards, because 518 00:29:08,640 --> 00:29:10,880 Speaker 1: I'm getting this picture from you that there's kind of 519 00:29:10,880 --> 00:29:13,080 Speaker 1: a Wild West in terms of what we're feeding into 520 00:29:13,080 --> 00:29:16,720 Speaker 1: the systems that ultimately become some form of AI. What 521 00:29:16,840 --> 00:29:20,280 Speaker 1: does the world look like when we have more standards 522 00:29:20,920 --> 00:29:24,760 Speaker 1: in the tools and components that create AI? I think 523 00:29:24,760 --> 00:29:29,160 Speaker 1: that our understanding of what AI is and what kinds 524 00:29:29,200 --> 00:29:32,600 Speaker 1: of AI there are is going to mature. I imagine 525 00:29:32,680 --> 00:29:36,800 Speaker 1: that there is a system of classification where some AI 526 00:29:37,000 --> 00:29:40,720 Speaker 1: is very high-risk and some AI is lower-risk, 527 00:29:41,200 --> 00:29:43,760 Speaker 1: and we start to have a stratified view of what 528 00:29:43,920 --> 00:29:47,720 Speaker 1: needs to occur at each level in order to reach 529 00:29:47,800 --> 00:29:50,920 Speaker 1: an understanding that there's no arsenic in the muffins. So 530 00:29:51,840 --> 00:29:54,360 Speaker 1: at the highest level, when it's super, super risky, maybe 531 00:29:54,400 --> 00:29:57,520 Speaker 1: we just don't use AI.
This seems to be something 532 00:29:57,520 --> 00:29:59,440 Speaker 1: that people forget: that we can decide whether or 533 00:29:59,480 --> 00:30:02,479 Speaker 1: not to use it. Like, would you want an AI 534 00:30:02,600 --> 00:30:07,240 Speaker 1: performing surgery on you with no human around, even if it's 535 00:30:07,280 --> 00:30:09,440 Speaker 1: really, really good? Do you want that? Do you want 536 00:30:09,480 --> 00:30:11,400 Speaker 1: to assume that risk? I mean, that is dealing with 537 00:30:11,440 --> 00:30:16,040 Speaker 1: your literal organs, your heart. So I think that, you know, 538 00:30:16,200 --> 00:30:19,240 Speaker 1: ideally what happens is you've got a good combination of 539 00:30:19,960 --> 00:30:23,200 Speaker 1: regulation and oversight, which I do believe in, but then 540 00:30:23,240 --> 00:30:27,880 Speaker 1: also training and, you know, good human intention to do 541 00:30:27,920 --> 00:30:33,160 Speaker 1: the right thing. So when I think about these algorithms, 542 00:30:33,640 --> 00:30:36,000 Speaker 1: I think of them as kind of automated decision-makers, 543 00:30:36,440 --> 00:30:39,080 Speaker 1: and I think they can pose a challenge to our 544 00:30:39,160 --> 00:30:44,680 Speaker 1: ideas of free will and self-determination, because we are 545 00:30:44,720 --> 00:30:48,840 Speaker 1: increasingly living in this world where we think we're making choices, 546 00:30:49,280 --> 00:30:52,400 Speaker 1: but we're actually operating within a narrow set of recommendations. 547 00:30:53,320 --> 00:30:57,000 Speaker 1: What do you think about human agency in the age 548 00:30:57,000 --> 00:31:00,920 Speaker 1: of algorithms? Whoa. These are the big questions. Um, well, 549 00:31:00,960 --> 00:31:03,200 Speaker 1: I mean, I think that we have to be careful 550 00:31:03,400 --> 00:31:06,560 Speaker 1: not to give the machines more agency than they have.
551 00:31:07,200 --> 00:31:10,800 Speaker 1: And there are people who are making those machines. So 552 00:31:10,880 --> 00:31:13,760 Speaker 1: when we talk about, you know, the free will of 553 00:31:14,000 --> 00:31:17,760 Speaker 1: people versus machines, it's like the free will of people 554 00:31:18,440 --> 00:31:22,440 Speaker 1: versus the people who made the machines. To me, technology 555 00:31:22,520 --> 00:31:26,800 Speaker 1: is just a tool, and I personally don't want to 556 00:31:26,880 --> 00:31:30,240 Speaker 1: live in a world that has no algorithms and no 557 00:31:30,320 --> 00:31:33,640 Speaker 1: technology, because these are useful tools. But I want to 558 00:31:33,640 --> 00:31:36,080 Speaker 1: decide when I'm using them and what I use them for. 559 00:31:37,000 --> 00:31:40,440 Speaker 1: And so my perspective is really from the point of 560 00:31:40,520 --> 00:31:43,920 Speaker 1: view of a person who has been making the tools, 561 00:31:44,760 --> 00:31:46,840 Speaker 1: and I think that we need to make sure that 562 00:31:46,880 --> 00:31:50,160 Speaker 1: those folks have the free will to say, no, I 563 00:31:50,160 --> 00:31:52,880 Speaker 1: don't want to make those tools, or this should not 564 00:31:52,920 --> 00:31:55,760 Speaker 1: be used in this way, or we need to modify 565 00:31:55,800 --> 00:31:57,920 Speaker 1: this tool in this way, so those tools don't run 566 00:31:57,920 --> 00:32:02,320 Speaker 1: away from us. Um, so I guess I kind 567 00:32:02,320 --> 00:32:05,960 Speaker 1: of disagree with the premise that it's people versus machines, 568 00:32:06,360 --> 00:32:08,520 Speaker 1: because people are making the machines and we're not at 569 00:32:08,520 --> 00:32:12,240 Speaker 1: the Terminator stage yet. Currently it's people and people, right? 570 00:32:12,280 --> 00:32:14,640 Speaker 1: So let's, like, work together to make 571 00:32:14,680 --> 00:32:19,600 Speaker 1: the right things, um, for people.
Yes. Kasha, thank you 572 00:32:19,680 --> 00:32:21,840 Speaker 1: so much for spending this time with me. I've learned 573 00:32:21,880 --> 00:32:24,560 Speaker 1: a lot, and now I'm just thinking about arsenic in 574 00:32:24,600 --> 00:32:27,120 Speaker 1: my muffins. Thanks so much for having me. I've really 575 00:32:27,280 --> 00:32:34,840 Speaker 1: enjoyed it. Garbage in, garbage out. It's a cycle that 576 00:32:34,880 --> 00:32:37,160 Speaker 1: we see that doesn't just apply to the world of 577 00:32:37,280 --> 00:32:41,680 Speaker 1: artificial intelligence, but everywhere. If I feed my body junk, 578 00:32:42,200 --> 00:32:45,880 Speaker 1: it turns to junk. If I fill my planet with filth, 579 00:32:46,480 --> 00:32:50,080 Speaker 1: it turns to filth. If I inject my Twitter feed 580 00:32:50,120 --> 00:32:55,880 Speaker 1: with hatred, that breeds more hatred. It's pretty straightforward, but 581 00:32:55,960 --> 00:33:00,160 Speaker 1: it doesn't have to be this way. In essence, Kasha seeks 582 00:33:00,240 --> 00:33:06,080 Speaker 1: to standardize thoughtfulness, and that fills me with so much hope. 583 00:33:06,960 --> 00:33:12,080 Speaker 1: We're all responsible for something or someone, so let's always 584 00:33:12,080 --> 00:33:15,719 Speaker 1: do our best to really consider what they need to thrive. 585 00:33:16,760 --> 00:33:19,880 Speaker 1: If we put a little more goodness into our AI, 586 00:33:20,200 --> 00:33:25,640 Speaker 1: our bodies, our planet, our relationships, and everything else, we'll 587 00:33:25,640 --> 00:33:29,320 Speaker 1: see goodness come out. And that's a cycle I can 588 00:33:29,320 --> 00:33:36,680 Speaker 1: get behind: goodness in, goodness out. This is just one 589 00:33:36,800 --> 00:33:40,760 Speaker 1: part of the How to Citizen conversation about data. Who 590 00:33:40,840 --> 00:33:45,960 Speaker 1: does data ultimately benefit?
If the data is not benefiting 591 00:33:46,400 --> 00:33:50,480 Speaker 1: the people, the individuals, the communities that provided that data, 592 00:33:51,320 --> 00:33:55,360 Speaker 1: then who are we uplifting at the cost of others' justice? 593 00:33:56,000 --> 00:33:58,400 Speaker 1: Next week, we dive deeper into how it's collected in 594 00:33:58,400 --> 00:34:02,000 Speaker 1: the first place, and we meet an Indigenous geneticist reclaiming 595 00:34:02,080 --> 00:34:14,440 Speaker 1: data for her people. See you then. We asked Kasha 596 00:34:14,760 --> 00:34:17,319 Speaker 1: what we should have you do, and they came up 597 00:34:17,320 --> 00:34:20,600 Speaker 1: with a lot. So here's a whole bunch of beautiful 598 00:34:20,640 --> 00:34:26,799 Speaker 1: options for citizening. Think about this. Like people, machines are 599 00:34:26,800 --> 00:34:29,520 Speaker 1: shaped by the context in which they're created. So if 600 00:34:29,520 --> 00:34:32,319 Speaker 1: we think of machines and algorithmic systems as children who 601 00:34:32,320 --> 00:34:35,200 Speaker 1: are learning from us, we're the parents. What kind of 602 00:34:35,200 --> 00:34:37,560 Speaker 1: parents do we want to be? How do we want 603 00:34:37,560 --> 00:34:41,360 Speaker 1: to raise our machines to be considerate, fair, and to 604 00:34:41,400 --> 00:34:43,360 Speaker 1: build a better world than the one we're in today? 605 00:34:44,640 --> 00:34:49,160 Speaker 1: Watch Coded Bias.
It's a documentary that explores the fallout 606 00:34:49,360 --> 00:34:52,840 Speaker 1: around MIT Media Lab researcher Joy Buolamwini's 607 00:34:52,880 --> 00:34:56,520 Speaker 1: discovery that facial recognition systems don't see dark-skinned faces as well, 608 00:34:57,080 --> 00:35:00,200 Speaker 1: and this film captures her journey to push for 609 00:35:00,200 --> 00:35:03,120 Speaker 1: the first-ever legislation in the US that will govern 610 00:35:03,160 --> 00:35:07,160 Speaker 1: against bias in the algorithms that impact us all. Check 611 00:35:07,200 --> 00:35:11,239 Speaker 1: out this online buying resource called the Privacy Not Included 612 00:35:11,320 --> 00:35:15,239 Speaker 1: Buying Guide. Mozilla built this shopping guide, which tells you 613 00:35:15,280 --> 00:35:19,000 Speaker 1: the data practices of the app or product that you're considering, 614 00:35:19,280 --> 00:35:22,719 Speaker 1: and it's basically the product reviews we need in this 615 00:35:22,800 --> 00:35:27,560 Speaker 1: hyperconnected era of data theft and hoarding and 616 00:35:27,600 --> 00:35:33,279 Speaker 1: non-consensual monetization. Donate. If you've got money, you can distribute 617 00:35:33,320 --> 00:35:36,320 Speaker 1: some power through dollars to these groups that are ensuring 618 00:35:36,360 --> 00:35:39,520 Speaker 1: that the future of AI is human and just: the 619 00:35:39,600 --> 00:35:43,000 Speaker 1: Algorithmic Justice League, the ACLU, and the 620 00:35:43,040 --> 00:35:49,040 Speaker 1: Electronic Frontier Foundation. If you take any of these actions, 621 00:35:49,080 --> 00:35:52,560 Speaker 1: please brag about yourself online. Use the hashtag How to Citizen. 622 00:35:53,040 --> 00:35:56,480 Speaker 1: Tag us up on Instagram at How to Citizen.
We 623 00:35:56,520 --> 00:36:00,319 Speaker 1: will accept general, direct feedback to our inbox, comments at 624 00:36:00,320 --> 00:36:02,759 Speaker 1: how to citizen dot com. And make sure you go 625 00:36:02,800 --> 00:36:04,799 Speaker 1: ahead and visit how to citizen dot com, because that's 626 00:36:04,800 --> 00:36:07,880 Speaker 1: the brand-new kid in town. We have a spanking 627 00:36:08,040 --> 00:36:11,959 Speaker 1: new website. It's very interactive. We have an email list 628 00:36:12,080 --> 00:36:15,440 Speaker 1: you can join. If you like this show, tell somebody 629 00:36:15,480 --> 00:36:19,800 Speaker 1: about it. Thanks. How to Citizen with Baratunde is a 630 00:36:19,840 --> 00:36:23,240 Speaker 1: production of iHeartRadio Podcasts and Dustlight Productions. 631 00:36:23,239 --> 00:36:27,720 Speaker 1: Our executive producers are me, Baratunde Thurston, Elizabeth Stewart, 632 00:36:27,719 --> 00:36:31,120 Speaker 1: and Misha Yusuf. Our senior producer is Tamika Adams, our 633 00:36:31,160 --> 00:36:34,000 Speaker 1: producer is Ali Kilts, and our assistant producer is Sam Paulson. 634 00:36:34,400 --> 00:36:38,239 Speaker 1: Stephanie Cohen is our editor, Valentino Rivera is our senior engineer, 635 00:36:38,400 --> 00:36:42,280 Speaker 1: and Matthew Lai is our apprentice. Original music by Andrew Eathan, 636 00:36:42,440 --> 00:36:45,400 Speaker 1: with additional original music for season three from Andrew Clausen. 637 00:36:46,080 --> 00:36:49,000 Speaker 1: This episode was produced and sound designed by Sam Paulson. 638 00:36:49,440 --> 00:36:52,040 Speaker 1: Special thanks to Joel Smith from iHeartRadio and 639 00:36:52,120 --> 00:36:54,000 Speaker 1: Rachel Garcia at Dustlight Productions.