1 00:00:15,356 --> 00:00:22,596 Speaker 1: Pushkin. There are a lot of reasons that I'm excited 2 00:00:22,636 --> 00:00:25,836 Speaker 1: about today's show. I'm going to tell you three right now. 3 00:00:26,796 --> 00:00:30,836 Speaker 1: Number one, the show is about this whole dimension of 4 00:00:30,996 --> 00:00:37,596 Speaker 1: medicine that I essentially didn't know existed, acoustic biomarkers, basically 5 00:00:38,076 --> 00:00:41,996 Speaker 1: using a person's voice to assess their health. Second thing 6 00:00:42,036 --> 00:00:45,076 Speaker 1: I'm excited about: the show is about the intersection of 7 00:00:45,436 --> 00:00:51,196 Speaker 1: AI and healthcare, one of my top, say, five intersections. 8 00:00:51,676 --> 00:00:57,316 Speaker 1: Love that intersection. And three, today's guest, Doctor Yael Bensoussan, 9 00:00:57,796 --> 00:01:01,436 Speaker 1: gave me what was truly the best excuse that anyone 10 00:01:01,436 --> 00:01:04,076 Speaker 1: has ever given me for canceling an interview at the 11 00:01:04,156 --> 00:01:04,916 Speaker 1: last minute. 12 00:01:05,156 --> 00:01:07,116 Speaker 2: Yeah, so I'm really sorry for having to cancel on 13 00:01:07,156 --> 00:01:07,916 Speaker 2: you yesterday. 14 00:01:09,196 --> 00:01:12,756 Speaker 1: What was the, what was the surgery you had to 15 00:01:12,756 --> 00:01:13,356 Speaker 1: do yesterday? 16 00:01:13,596 --> 00:01:17,916 Speaker 2: So yesterday, we call it airway surgery, where I take 17 00:01:17,956 --> 00:01:20,076 Speaker 2: a patient to the OR and I have to open 18 00:01:20,196 --> 00:01:22,996 Speaker 2: up their windpipe or their trachea, because there's scar 19 00:01:23,116 --> 00:01:26,716 Speaker 2: tissue that's blocking them from breathing.
So I have to 20 00:01:26,716 --> 00:01:29,356 Speaker 2: go with a laser and cut the scar tissue out 21 00:01:30,276 --> 00:01:32,796 Speaker 2: and then take a balloon and open up their windpipe 22 00:01:32,916 --> 00:01:35,996 Speaker 2: so that they can wake up and breathe better, and 23 00:01:36,076 --> 00:01:38,756 Speaker 2: that translates to a different sound when they're breathing. 24 00:01:38,836 --> 00:01:39,516 Speaker 3: So when they're not 25 00:01:39,596 --> 00:01:41,956 Speaker 2: breathing because of the scar tissue, it can sound like, 26 00:01:43,116 --> 00:01:45,756 Speaker 2: you know, very noisy breathing. We call it the Darth 27 00:01:45,836 --> 00:01:49,236 Speaker 2: Vader breathing. And then when they wake up 28 00:01:49,236 --> 00:01:51,956 Speaker 2: from surgery and they're done, they have silent breathing, which 29 00:01:51,956 --> 00:01:53,916 Speaker 2: means that I know that I did a good job. 30 00:02:00,236 --> 00:02:02,556 Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the 31 00:02:02,596 --> 00:02:05,156 Speaker 1: show where I talk to people who are trying to make 32 00:02:05,236 --> 00:02:09,796 Speaker 1: technological progress. Doctor Yael Bensoussan runs the Health 33 00:02:09,916 --> 00:02:13,076 Speaker 1: Voice Center at the University of South Florida, and she 34 00:02:13,156 --> 00:02:16,596 Speaker 1: is also leading a team of researchers that's building a 35 00:02:16,716 --> 00:02:21,316 Speaker 1: giant database of human voices and breaths and health information.
36 00:02:22,116 --> 00:02:25,236 Speaker 1: Her problem is this: how do you record the voices 37 00:02:25,276 --> 00:02:29,876 Speaker 1: of thousands of people without violating patient privacy laws while 38 00:02:29,876 --> 00:02:34,276 Speaker 1: building a giant public database that could someday allow your 39 00:02:34,356 --> 00:02:37,996 Speaker 1: phone to warn you, based solely on your voice, that 40 00:02:38,036 --> 00:02:40,996 Speaker 1: you may be getting sick? Yael told me that 41 00:02:41,116 --> 00:02:43,996 Speaker 1: she got into this field in part because she used 42 00:02:43,996 --> 00:02:44,636 Speaker 1: to be a singer. 43 00:02:46,636 --> 00:02:50,316 Speaker 2: So growing up, you know, I always was in 44 00:02:50,356 --> 00:02:53,116 Speaker 2: a very musical family. I took singing lessons when I 45 00:02:53,196 --> 00:02:57,636 Speaker 2: was a kid, and then I started singing more professionally 46 00:02:57,996 --> 00:03:00,956 Speaker 2: around eighteen years old, and I had a short but 47 00:03:01,076 --> 00:03:05,356 Speaker 2: exciting singing career. I wrote pop folk music. We had 48 00:03:05,356 --> 00:03:08,236 Speaker 2: a band, and we toured. We had an album out 49 00:03:08,276 --> 00:03:12,316 Speaker 2: in two thousand and twelve. Yeah, and I mean it 50 00:03:12,436 --> 00:03:15,836 Speaker 2: was a lot of fun. And actually the reason I 51 00:03:15,956 --> 00:03:18,876 Speaker 2: was able to have that short and exciting career was 52 00:03:19,276 --> 00:03:21,876 Speaker 2: because I met a speech pathologist when I was fifteen. 53 00:03:21,956 --> 00:03:25,836 Speaker 2: So I was taking singing classes and one day my 54 00:03:25,916 --> 00:03:28,036 Speaker 2: teacher looked at me and she said, there's something wrong 55 00:03:28,076 --> 00:03:28,756 Speaker 2: with your voice. 56 00:03:28,996 --> 00:03:29,876 Speaker 3: Go get checked.
57 00:03:31,716 --> 00:03:35,236 Speaker 2: And I met a laryngologist who put a camera down 58 00:03:35,636 --> 00:03:37,996 Speaker 2: and said, you have nodules on your vocal cords 59 00:03:38,036 --> 00:03:39,956 Speaker 2: and you might not be able to sing again if 60 00:03:39,996 --> 00:03:42,316 Speaker 2: you don't take this seriously. And I went to see 61 00:03:42,316 --> 00:03:45,236 Speaker 2: a speech pathologist. I did rehabilitation with my voice for 62 00:03:45,316 --> 00:03:47,596 Speaker 2: six months, and I was able to sing again. And 63 00:03:47,636 --> 00:03:49,916 Speaker 2: I mean, that's what led me to then become a 64 00:03:49,916 --> 00:03:54,156 Speaker 2: speech pathologist, and then eventually go to med 65 00:03:54,156 --> 00:03:56,476 Speaker 2: school and then decide to become a laryngologist. 66 00:03:56,756 --> 00:03:58,596 Speaker 3: So it was kind of all interconnected. 67 00:03:59,916 --> 00:04:03,676 Speaker 1: So I know that your research now, and most of 68 00:04:04,356 --> 00:04:08,716 Speaker 1: what I'm really interested to talk with you about, is 69 00:04:08,716 --> 00:04:12,836 Speaker 1: around acoustic biomarkers. So just to start, I mean, 70 00:04:13,796 --> 00:04:15,196 Speaker 1: what's an acoustic biomarker? 71 00:04:16,076 --> 00:04:19,116 Speaker 2: Very good question. So what is a biomarker? First, a 72 00:04:19,116 --> 00:04:24,196 Speaker 2: biomarker is something that indicates the presence of a disease, right? 73 00:04:25,396 --> 00:04:28,836 Speaker 2: So if you think about a biomarker for a cancer, 74 00:04:28,916 --> 00:04:31,716 Speaker 2: so different cancers have different types of biomarkers. For example, 75 00:04:31,756 --> 00:04:35,476 Speaker 2: for ovarian cancer, we're looking for a specific thing, you know, 76 00:04:35,676 --> 00:04:38,756 Speaker 2: called CA-125, in your blood.
For different types of cancers, 77 00:04:38,756 --> 00:04:41,316 Speaker 2: they could take a blood draw and find a specific biomarker. 78 00:04:41,316 --> 00:04:44,676 Speaker 2: It's an indicator of a disease. An acoustic biomarker is 79 00:04:45,116 --> 00:04:47,756 Speaker 2: something that can indicate the presence of a disease, but 80 00:04:47,836 --> 00:04:50,316 Speaker 2: that you can hear. So that's the definition of an 81 00:04:50,356 --> 00:04:54,036 Speaker 2: acoustic biomarker. So I always say, you know, when you 82 00:04:54,156 --> 00:04:57,676 Speaker 2: have people in your family that are not well, you 83 00:04:57,716 --> 00:05:00,836 Speaker 2: will always notice first and you'll say, you don't sound 84 00:05:00,876 --> 00:05:04,996 Speaker 2: good, right, or you sound funny. And I have the 85 00:05:05,116 --> 00:05:08,036 Speaker 2: luxury to know that because I'm a voice doctor. So 86 00:05:08,196 --> 00:05:10,756 Speaker 2: then people will bring me their family members, or people 87 00:05:10,796 --> 00:05:13,356 Speaker 2: will come saying, I don't know what's wrong with me, 88 00:05:13,796 --> 00:05:16,276 Speaker 2: but my wife told me to come because my voice 89 00:05:16,316 --> 00:05:20,756 Speaker 2: is not good. And sometimes it's because their vocal cords 90 00:05:20,916 --> 00:05:23,116 Speaker 2: are not working, but a lot of times it's because 91 00:05:23,156 --> 00:05:25,956 Speaker 2: they can have a neurological issue or a cardiac issue 92 00:05:26,316 --> 00:05:27,956 Speaker 2: that is affecting their voice. 93 00:05:28,196 --> 00:05:34,036 Speaker 1: So, more broadly, what's going on with AI and acoustic biomarkers? 94 00:05:34,756 --> 00:05:37,436 Speaker 2: Yeah, so, so many exciting things are going on. I 95 00:05:37,476 --> 00:05:42,036 Speaker 2: think that's the first answer.
There are so many startups, 96 00:05:42,076 --> 00:05:46,876 Speaker 2: so many companies, industry researchers, academic researchers that are working 97 00:05:46,876 --> 00:05:50,956 Speaker 2: and looking into voice AI. And the reason is it's 98 00:05:50,996 --> 00:05:54,916 Speaker 2: really cheap to collect. Right? Think about this: if 99 00:05:54,916 --> 00:05:56,796 Speaker 2: you have a phone, it's really cheap to collect compared 100 00:05:56,836 --> 00:05:56,996 Speaker 2: to this. 101 00:05:57,036 --> 00:05:59,556 Speaker 1: You don't have to take a blood sample. You have, 102 00:05:59,636 --> 00:06:02,276 Speaker 1: exactly, just, you've got the phone. You've got the device 103 00:06:02,796 --> 00:06:05,236 Speaker 1: literally in your hand already. All you have to do 104 00:06:05,356 --> 00:06:07,636 Speaker 1: is talk, and you're talking already. 105 00:06:07,436 --> 00:06:09,756 Speaker 2: And you're talking already, so it's cheap. That's why 106 00:06:09,836 --> 00:06:12,956 Speaker 2: the pharmaceutical industry is also very interested, and there's a lot 107 00:06:12,956 --> 00:06:16,156 Speaker 2: of pharmaceutical projects around it. So there are a lot 108 00:06:16,276 --> 00:06:21,196 Speaker 2: of projects that are going on, and the 109 00:06:21,236 --> 00:06:24,476 Speaker 2: current landscape is that there's tons of people 110 00:06:24,556 --> 00:06:29,316 Speaker 2: working on very similar things in very interesting and various diseases. 111 00:06:29,356 --> 00:06:31,876 Speaker 2: So I kind of categorize them in three 112 00:06:31,916 --> 00:06:36,076 Speaker 2: categories of diseases that are being studied. One is 113 00:06:36,276 --> 00:06:41,276 Speaker 2: diseases that affect the voice box. Okay, so vocal cord paralysis, absolutely, 114 00:06:41,316 --> 00:06:43,756 Speaker 2: it's intuitive.
There are going to be vocal biomarkers in that. 115 00:06:44,476 --> 00:06:48,356 Speaker 2: Voice box cancer, right, that's easy. Then there are voice- 116 00:06:48,396 --> 00:06:52,316 Speaker 2: and speech-affecting disorders, so disorders that don't affect the 117 00:06:52,396 --> 00:06:55,436 Speaker 2: voice box, but that have an impact on the voice 118 00:06:55,436 --> 00:06:59,316 Speaker 2: and the speech. Parkinson's is one of them, right? Alzheimer's 119 00:06:59,356 --> 00:07:01,996 Speaker 2: is one of them. A stroke: somebody having a stroke, 120 00:07:02,156 --> 00:07:04,076 Speaker 2: they don't have a problem with their voice box, but 121 00:07:04,116 --> 00:07:06,236 Speaker 2: their speech is going to be altered. So these are 122 00:07:06,316 --> 00:07:09,476 Speaker 2: voice- and speech-affecting conditions. So lots of work is 123 00:07:09,476 --> 00:07:11,916 Speaker 2: being done in that field. And the third one is 124 00:07:12,116 --> 00:07:15,636 Speaker 2: diseases that you don't think would affect speech, but still 125 00:07:15,676 --> 00:07:17,556 Speaker 2: people are doing research on that. So there was a 126 00:07:17,596 --> 00:07:21,316 Speaker 2: really interesting study on diabetes. They're saying that there was 127 00:07:21,356 --> 00:07:24,356 Speaker 2: a group that published that they could diagnose people who 128 00:07:24,396 --> 00:07:28,516 Speaker 2: were diabetic versus non-diabetic based on their speech. 129 00:07:28,836 --> 00:07:34,356 Speaker 1: So this third group is one presumably where there's at 130 00:07:34,436 --> 00:07:39,476 Speaker 1: least the potential for AI to detect differences that even 131 00:07:39,876 --> 00:07:43,196 Speaker 1: experts like you cannot detect, right? I mean, is that 132 00:07:43,276 --> 00:07:45,316 Speaker 1: what's going on there?
133 00:07:45,596 --> 00:07:48,316 Speaker 2: So AI is not magical, you know. I think 134 00:07:48,436 --> 00:07:50,156 Speaker 2: it does a lot of things. But what AI does 135 00:07:50,236 --> 00:07:53,316 Speaker 2: that the layperson doesn't do is that it can analyze 136 00:07:53,316 --> 00:07:54,596 Speaker 2: a lot more data faster. 137 00:07:55,516 --> 00:07:56,076 Speaker 1: Yeah. 138 00:07:56,196 --> 00:07:59,676 Speaker 2: Right. So AI has the possibility, if you have a 139 00:07:59,836 --> 00:08:03,916 Speaker 2: large data set, to then find small differences in these 140 00:08:04,036 --> 00:08:06,316 Speaker 2: data sets that we can't. I mean, I would 141 00:08:06,356 --> 00:08:09,076 Speaker 2: have to listen to, you know, thousands and thousands of 142 00:08:09,196 --> 00:08:10,916 Speaker 2: voices and compare them statistically. 143 00:08:11,116 --> 00:08:13,356 Speaker 1: It might, it might, right. It might also be able 144 00:08:13,396 --> 00:08:16,716 Speaker 1: to detect differences that are not even audible. 145 00:08:17,156 --> 00:08:20,636 Speaker 2: It could, exactly. I can give you an example. There's 146 00:08:20,676 --> 00:08:25,156 Speaker 2: a company looking at atrial fibrillation, and I cannot validate 147 00:08:25,196 --> 00:08:27,796 Speaker 2: their data, because that's one of the limitations that we're 148 00:08:27,836 --> 00:08:30,076 Speaker 2: going to talk about. But obviously their data set is 149 00:08:30,076 --> 00:08:33,316 Speaker 2: not public. But they're saying that they can diagnose atrial 150 00:08:33,356 --> 00:08:36,316 Speaker 2: fibrillation based on the voice. And their explanation is that 151 00:08:36,756 --> 00:08:39,396 Speaker 2: our voice vibrates to the sound of our heartbeats. 152 00:08:40,796 --> 00:08:42,756 Speaker 1: Big if true? Fun if true?
153 00:08:43,076 --> 00:08:45,916 Speaker 2: I mean you know, again, the limitation here is that 154 00:08:45,996 --> 00:08:48,356 Speaker 2: it's there's a lot of things you can't validate. But 155 00:08:48,716 --> 00:08:52,276 Speaker 2: they say that they've been validating it with EKGs and 156 00:08:52,396 --> 00:08:54,476 Speaker 2: that they can see it. They can hear a difference 157 00:08:54,516 --> 00:08:56,436 Speaker 2: in the voice between patient patients with a. 158 00:08:56,476 --> 00:09:00,476 Speaker 1: Fib atrial fibrillation. It puts you at risk for a stroke, right, 159 00:09:00,516 --> 00:09:04,156 Speaker 1: it can go undiagnosed. So like, if if this works, 160 00:09:04,196 --> 00:09:08,636 Speaker 1: that would be very helpful to many people, right, absolutely, absolutely. 161 00:09:09,116 --> 00:09:13,196 Speaker 1: So you're mentioning like that's super interesting. It's it's interesting 162 00:09:13,236 --> 00:09:17,716 Speaker 1: more generally. So, so you're building a giant database, right, 163 00:09:18,756 --> 00:09:21,196 Speaker 1: and I find that interesting for a lot of reasons. 164 00:09:21,036 --> 00:09:23,996 Speaker 1: It happens. I don't have you come across the work 165 00:09:24,036 --> 00:09:27,636 Speaker 1: of faith A Lee. Absolutely, yeses, So I talked to 166 00:09:27,636 --> 00:09:30,796 Speaker 1: faith A Lee for this show not long ago. Wow. Right, 167 00:09:30,916 --> 00:09:35,916 Speaker 1: she's like nerd famous, right yeah, And so you know, 168 00:09:36,036 --> 00:09:40,756 Speaker 1: as you know, she built this giant database of images 169 00:09:40,996 --> 00:09:44,516 Speaker 1: about ten years ago a little more now called image net. 
170 00:09:44,956 --> 00:09:50,156 Speaker 1: And that giant database was what allowed these 171 00:09:50,316 --> 00:09:55,316 Speaker 1: early machine learning models, AI models, to, you know, start 172 00:09:55,476 --> 00:10:02,076 Speaker 1: recognizing images, right? And so the database was this necessary tool, 173 00:10:02,916 --> 00:10:05,996 Speaker 1: necessary thing for the AI to really work, right? And 174 00:10:06,116 --> 00:10:12,796 Speaker 1: so are you building the acoustic biomarker version of that? 175 00:10:13,636 --> 00:10:16,636 Speaker 2: So the short answer is yes, but I'd 176 00:10:16,676 --> 00:10:18,716 Speaker 2: like to start by saying that I am not building 177 00:10:19,156 --> 00:10:20,116 Speaker 2: it; it's our consortium. 178 00:10:20,596 --> 00:10:22,916 Speaker 1: Yes, yes, you all are. 179 00:10:23,116 --> 00:10:24,276 Speaker 3: Actually, I'll just 180 00:10:24,316 --> 00:10:27,036 Speaker 2: first start by recognizing here that it's 181 00:10:27,116 --> 00:10:29,196 Speaker 2: a huge team. So the Bridge2AI 182 00:10:29,316 --> 00:10:33,116 Speaker 2: Voice Consortium is a team of fifty investigators across the 183 00:10:33,236 --> 00:10:36,556 Speaker 2: US and Canada. We're funded by the NIH through the 184 00:10:36,596 --> 00:10:40,076 Speaker 2: Bridge2AI program. And the goal, absolutely, this 185 00:10:40,156 --> 00:10:41,916 Speaker 2: is the first time I hear the analogy to the 186 00:10:41,956 --> 00:10:43,356 Speaker 2: ImageNet database. 187 00:10:43,396 --> 00:10:43,756 Speaker 3: I like it. 188 00:10:43,796 --> 00:10:47,076 Speaker 2: I usually give the example of the genomic database, the 189 00:10:47,196 --> 00:10:48,996 Speaker 2: Human Genome Project. Huge 190 00:10:49,076 --> 00:10:52,196 Speaker 1: project, more famous, more famous. They're 191 00:10:51,716 --> 00:10:53,716 Speaker 3: both very famous. But I like this analogy.
192 00:10:53,876 --> 00:10:56,196 Speaker 1: Well, ImageNet is maybe a little bit closer of 193 00:10:56,236 --> 00:11:00,156 Speaker 1: an analogy, but maybe less, yeah, sexy, yeah. 194 00:10:59,836 --> 00:11:02,316 Speaker 2: Well, but I mean it's interesting, because the genome project 195 00:11:02,356 --> 00:11:06,116 Speaker 2: has also very interesting ethical particularities, like voice, right? The 196 00:11:06,196 --> 00:11:08,996 Speaker 2: image has a little bit less of the ethical constraints. 197 00:11:08,996 --> 00:11:11,036 Speaker 3: When we talk about whole genome 198 00:11:10,716 --> 00:11:15,636 Speaker 2: sequencing or genomics data, people kind of understand that voice 199 00:11:15,636 --> 00:11:18,436 Speaker 2: has similar concerns in terms of privacy. 200 00:11:18,476 --> 00:11:20,596 Speaker 1: We want to get to the concerns, but I want 201 00:11:20,596 --> 00:11:23,276 Speaker 1: to first talk about what you're doing, and then 202 00:11:23,316 --> 00:11:28,716 Speaker 1: we can talk about, you know, not doing anything wrong. Yeah. 203 00:11:28,836 --> 00:11:32,676 Speaker 1: So broadly, if it becomes the thing you hope it 204 00:11:32,756 --> 00:11:35,076 Speaker 1: will be, what is it going to be? What 205 00:11:35,196 --> 00:11:38,636 Speaker 1: is the Bridge2AI voice database going to be?
206 00:11:39,436 --> 00:11:42,516 Speaker 2: So it's going to be this large database of thousands 207 00:11:42,516 --> 00:11:47,796 Speaker 2: of human voices linked to other health information that is 208 00:11:47,876 --> 00:11:52,716 Speaker 2: going to be available to researchers, and potentially people other 209 00:11:52,796 --> 00:11:56,756 Speaker 2: than researchers as well, to be able to make discoveries, right? 210 00:11:56,916 --> 00:12:00,756 Speaker 2: To learn to use voice AI, to train, you know, 211 00:12:00,796 --> 00:12:02,956 Speaker 2: the next generation of people on how to learn to 212 00:12:03,196 --> 00:12:07,516 Speaker 2: build models on voice AI, to help pharmaceutical companies develop 213 00:12:07,556 --> 00:12:11,676 Speaker 2: products, or learn even to develop products, right? And 214 00:12:11,716 --> 00:12:15,876 Speaker 2: the other really important thing is to teach people what 215 00:12:15,956 --> 00:12:18,716 Speaker 2: type of standards we need. Right now, across a lot 216 00:12:18,796 --> 00:12:21,916 Speaker 2: of different projects, there's really a lack of standards. People 217 00:12:21,996 --> 00:12:25,116 Speaker 2: collect voice in different ways. That's why it's really hard 218 00:12:25,116 --> 00:12:29,956 Speaker 2: to pull data together. So our dream was really to say, like, hey, 219 00:12:30,196 --> 00:12:33,956 Speaker 2: you want to do voice research? Here's a manual, my friend, right? 220 00:12:34,036 --> 00:12:36,236 Speaker 2: Like, here is how you collect the voice to make 221 00:12:36,276 --> 00:12:39,356 Speaker 2: it accurate. These are the protocols with the tasks that 222 00:12:39,396 --> 00:12:43,956 Speaker 2: we think, based on our studies, give the best biomarkers.
Right, 223 00:12:44,316 --> 00:12:46,556 Speaker 2: these are the type of biomarkers you can look for, 224 00:12:46,836 --> 00:12:50,116 Speaker 2: and this is the data you can train on, so really 225 00:12:50,156 --> 00:12:52,716 Speaker 2: create a manual of operations, also for people to be 226 00:12:52,756 --> 00:12:55,596 Speaker 2: able to make discoveries. And that's the goal: to have 227 00:12:56,476 --> 00:12:57,916 Speaker 2: the most impact on patient care. 228 00:12:58,116 --> 00:13:00,396 Speaker 1: So what are the biomarkers? What are you asking people 229 00:13:00,396 --> 00:13:01,396 Speaker 1: to do? What are you collecting? 230 00:13:02,356 --> 00:13:07,316 Speaker 2: So I separate things out. There are respiratory biomarkers, 231 00:13:08,356 --> 00:13:13,196 Speaker 2: voice biomarkers, speech biomarkers, and linguistic biomarkers, and they're 232 00:13:13,236 --> 00:13:16,476 Speaker 2: all different. So let's go over why these are different. 233 00:13:16,876 --> 00:13:20,076 Speaker 2: So respiratory is easy, right? So we ask people to breathe, 234 00:13:20,556 --> 00:13:23,916 Speaker 2: to cough, to take big breaths in, and that has 235 00:13:24,036 --> 00:13:27,236 Speaker 2: a lot of information on our pulmonary capacity, on how 236 00:13:27,276 --> 00:13:31,836 Speaker 2: our windpipe is shaped. Okay, that's respiratory. Then voice and 237 00:13:31,876 --> 00:13:34,236 Speaker 2: speech, what's the difference? So voice is really the sound 238 00:13:34,316 --> 00:13:38,156 Speaker 2: that we make when our vocal cords come together. So 239 00:13:38,436 --> 00:13:42,836 Speaker 2: when we say, like, birds can voice, but they can't speak. 240 00:13:43,636 --> 00:13:46,436 Speaker 2: If you have a bird that speaks, then you'll be very 241 00:13:46,316 --> 00:13:50,236 Speaker 1: rich, or you have a parrot.
242 00:13:52,676 --> 00:13:55,236 Speaker 2: So when we do voice tasks, we ask 243 00:13:55,356 --> 00:13:57,516 Speaker 2: patients to say "eee" or 244 00:13:57,516 --> 00:13:59,436 Speaker 3: "ahh." 245 00:13:59,516 --> 00:14:00,116 Speaker 1: I get the difference. 246 00:14:01,716 --> 00:14:06,636 Speaker 2: Birds. And voice biomarkers will be impacted when our voice 247 00:14:06,636 --> 00:14:11,436 Speaker 2: box is changed or our respiration is changed. Right? So 248 00:14:11,476 --> 00:14:14,916 Speaker 2: somebody with pneumonia probably cannot hold a note for very long. 249 00:14:15,036 --> 00:14:18,596 Speaker 2: So that's voice biomarkers. When we talk about speech biomarkers, 250 00:14:18,636 --> 00:14:22,476 Speaker 2: then you go into articulation. So some people, for example, 251 00:14:22,476 --> 00:14:25,796 Speaker 2: who have neurological deficits or their mouth is not working correctly, 252 00:14:25,796 --> 00:14:28,236 Speaker 2: they're going to have trouble articulating. They're going to have 253 00:14:28,236 --> 00:14:31,916 Speaker 2: trouble saying some words. So these are biomarkers we can extract. 254 00:14:32,196 --> 00:14:36,436 Speaker 2: And then lastly, there are linguistic biomarkers. So what type of 255 00:14:36,476 --> 00:14:41,156 Speaker 2: words are people using, what type of semantics, how fast 256 00:14:41,476 --> 00:14:44,076 Speaker 2: do they speak, for example? These are all different types 257 00:14:44,076 --> 00:14:48,316 Speaker 2: of biomarkers 258 00:14:46,596 --> 00:14:47,476 Speaker 3: that we can extract. 259 00:14:47,476 --> 00:14:49,636 Speaker 2: So to give you a very tangible example, I was 260 00:14:49,676 --> 00:14:53,316 Speaker 2: reading a paper from a group looking at biomarkers of depression, 261 00:14:54,836 --> 00:14:58,196 Speaker 2: and rate of speech was one of the important biomarkers 262 00:14:58,196 --> 00:15:01,236 Speaker 2: they found.
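[Editor's note: two of the biomarkers described above, how long a sustained vowel lasts (a voice biomarker) and rate of speech (a linguistic biomarker), can be illustrated as toy measurements. This is a minimal sketch under assumed definitions (an RMS-energy voicing threshold and whitespace word counting); the function names and thresholds are illustrative, not the consortium's actual pipeline.]

```python
import numpy as np

def phonation_time(signal: np.ndarray, sr: int, frame_ms: int = 25,
                   threshold: float = 0.01) -> float:
    """Seconds of a sustained vowel whose frame RMS energy exceeds a
    silence threshold. A shortened phonation time can reflect reduced
    pulmonary capacity (the pneumonia example above). Threshold and
    framing are illustrative assumptions."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float((rms > threshold).sum() * frame_len / sr)

def words_per_second(transcript: str, duration_s: float) -> float:
    """Crude rate-of-speech biomarker: whitespace-separated word count
    over clip duration. A lower rate was associated with depression in
    the study discussed below."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript.split()) / duration_s

# Synthetic "eee": 2 s of a 220 Hz tone followed by 1 s of silence at 16 kHz.
sr = 16_000
t = np.arange(2 * sr) / sr
audio = np.concatenate([0.3 * np.sin(2 * np.pi * 220 * t), np.zeros(sr)])
print(phonation_time(audio, sr))              # prints 2.0
print(words_per_second("uh I feel fine", 2))  # 4 words / 2 s -> prints 2.0
```

A real system would work from recorded audio and forced-aligned transcripts rather than synthetic tones, but the measurements reduce to numbers this simple.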
So people who are sad or depressed will 263 00:15:01,276 --> 00:15:06,956 Speaker 2: speak at a slower pace, so words per second is lower. 264 00:15:06,996 --> 00:15:08,676 Speaker 2: So that's simple when you think about it; it's a 265 00:15:08,676 --> 00:15:12,996 Speaker 2: simple biomarker, right? So that's to give a tangible example. 266 00:15:13,276 --> 00:15:14,956 Speaker 2: So, in terms of, I think I didn't answer your 267 00:15:15,036 --> 00:15:18,516 Speaker 2: question fully. So what are we asking patients? So we 268 00:15:18,636 --> 00:15:21,916 Speaker 2: ask people to do all these tasks, so coughing, breathing, 269 00:15:22,316 --> 00:15:27,516 Speaker 2: saying "ahh" and "eee." Then we make them read those validated passages, 270 00:15:27,596 --> 00:15:30,956 Speaker 2: and we also ask open questions. And then when we 271 00:15:31,036 --> 00:15:33,836 Speaker 2: ask open questions, we have to ask some questions that 272 00:15:34,116 --> 00:15:37,676 Speaker 2: make them emotional and some that don't make them emotional, 273 00:15:37,716 --> 00:15:40,236 Speaker 2: because if you trigger emotion, that causes a bias on 274 00:15:40,276 --> 00:15:41,516 Speaker 2: how your voice will sound. 275 00:15:42,676 --> 00:15:45,636 Speaker 1: What question do you ask to make people emotional? 276 00:15:46,036 --> 00:15:47,116 Speaker 3: So it's really interesting. 277 00:15:47,396 --> 00:15:50,676 Speaker 2: So at first we would ask, you know, our first 278 00:15:50,756 --> 00:15:53,436 Speaker 2: question was, you know, can you talk to me about 279 00:15:53,836 --> 00:15:56,276 Speaker 2: something that makes you sad? It could be somebody that 280 00:15:56,356 --> 00:15:58,796 Speaker 2: died in your family, or, you know. So that was 281 00:15:58,836 --> 00:16:03,756 Speaker 2: our prompt. And then our question without emotion was: tell 282 00:16:03,836 --> 00:16:06,356 Speaker 2: us about your disease. 283 00:16:07,036 --> 00:16:09,676 Speaker 1: Only a doctor
would think that's not an emotional question. 284 00:16:09,836 --> 00:16:10,316 Speaker 3: Exactly. 285 00:16:10,396 --> 00:16:12,196 Speaker 2: I mean, but it's like, when you think about it, 286 00:16:12,236 --> 00:16:14,916 Speaker 2: like, our consortium is like tons of experts that put 287 00:16:14,956 --> 00:16:16,516 Speaker 2: their minds together to develop 288 00:16:16,276 --> 00:16:19,516 Speaker 1: Tell me about having Parkinson's. That's the unemotional question we're 289 00:16:19,516 --> 00:16:19,956 Speaker 1: going to ask. 290 00:16:19,916 --> 00:16:22,196 Speaker 3: And then we, I mean, we ask, like, why are you here? 291 00:16:22,236 --> 00:16:23,796 Speaker 2: I think it was not that obvious, but it's like, 292 00:16:24,556 --> 00:16:27,116 Speaker 2: tell us about why you're here to see your doctor today. 293 00:16:27,316 --> 00:16:29,956 Speaker 2: And then, analyzing the data, because we do pilots, right, 294 00:16:30,036 --> 00:16:33,316 Speaker 2: we audit our data, we realized that people were starting 295 00:16:33,356 --> 00:16:36,236 Speaker 2: to tear up. Like, we had people crying while talking 296 00:16:36,276 --> 00:16:38,516 Speaker 2: about why they were coming to the doctor today, which is 297 00:16:38,516 --> 00:16:41,436 Speaker 1: supposed to be the example of unemotional. 298 00:16:40,836 --> 00:16:42,636 Speaker 3: Sure, correct. So we had to change that. 299 00:16:45,396 --> 00:16:50,556 Speaker 1: Yes, interesting. So okay, this is great. So you're getting 300 00:16:50,596 --> 00:16:55,836 Speaker 1: a lot of auditory information from every patient. What other 301 00:16:55,876 --> 00:16:58,356 Speaker 1: information are you getting from each person? So much. 302 00:16:58,676 --> 00:17:00,876 Speaker 2: So to give you an idea, our full protocol is 303 00:17:00,876 --> 00:17:01,516 Speaker 2: about one 304 00:17:01,396 --> 00:17:05,396 Speaker 1: hour. Okay, so with the patient, with the patient.
305 00:17:05,076 --> 00:17:08,236 Speaker 2: With the patient, it's an iPad. So everything is based 306 00:17:08,276 --> 00:17:10,756 Speaker 2: on an iPad, and there's a helper right now, a 307 00:17:10,756 --> 00:17:14,476 Speaker 2: research assistant. So we collect data. We collect very extensive 308 00:17:14,516 --> 00:17:18,396 Speaker 2: demographics in terms of, you know, age, race, geographical location. 309 00:17:19,356 --> 00:17:22,556 Speaker 2: We collect language. So what language do you speak? How 310 00:17:22,596 --> 00:17:25,636 Speaker 2: many languages do you speak? What languages do you write? 311 00:17:26,436 --> 00:17:28,196 Speaker 2: You know, what part of the world are you from? 312 00:17:28,356 --> 00:17:32,156 Speaker 2: That's really important. Then we collect about disabilities. Are you 313 00:17:32,236 --> 00:17:34,836 Speaker 2: hearing impaired? Are you visually impaired? Because that makes a 314 00:17:34,916 --> 00:17:40,076 Speaker 2: change in your voice. Your smoking status, your hydration status, 315 00:17:40,396 --> 00:17:43,796 Speaker 2: your fatigue status. So we kind of 316 00:17:43,836 --> 00:17:47,196 Speaker 2: thought about anything that could affect voice, right? Your socioeconomic 317 00:17:47,236 --> 00:17:50,996 Speaker 2: status, because if you think about it, that's going 318 00:17:51,036 --> 00:17:54,956 Speaker 2: to affect, you know, your linguistics as well. And then, 319 00:17:55,036 --> 00:17:59,916 Speaker 2: other than that extensive demographics, we collect confounders. So 320 00:17:59,956 --> 00:18:02,196 Speaker 2: we think about anything that could change your voice. Do 321 00:18:02,236 --> 00:18:05,556 Speaker 2: you have allergies? Do you have dental issues? 322 00:18:05,596 --> 00:18:09,636 Speaker 2: Do you wear braces? So everybody gets a basic test 323 00:18:09,756 --> 00:18:13,036 Speaker 2: about whether they are depressed.
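[Editor's note: the collection protocol described above, extensive demographics plus anything that could confound the voice, amounts to a structured record attached to each recording session. A hypothetical sketch of what one such record might look like; every field name here is an illustrative assumption, not the actual Bridge2AI-Voice schema.]

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SessionRecord:
    """Toy record linking one recording session to its metadata.

    Field names are invented for illustration only.
    """
    participant_id: str
    age: int
    languages_spoken: List[str]
    hearing_impaired: bool
    visually_impaired: bool
    smoking_status: str  # e.g. "never", "former", "current"
    # Anything that could change the voice: allergies, braces, dental issues...
    confounders: List[str] = field(default_factory=list)
    # Everyone gets the basic depression screen, whatever their primary disease.
    depression_screen_score: Optional[int] = None

rec = SessionRecord(
    participant_id="P0001",
    age=60,
    languages_spoken=["English", "Spanish"],
    hearing_impaired=False,
    visually_impaired=False,
    smoking_status="former",
    confounders=["seasonal allergies"],
)
print(rec.participant_id, len(rec.confounders))  # prints: P0001 1
```

The point of a fixed schema like this is the standardization she describes: if every site records the same fields the same way, the resulting voice data can actually be pooled.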
So no matter what disease 324 00:18:13,076 --> 00:18:15,716 Speaker 2: you have, you kind of get the basic tests for 325 00:18:15,796 --> 00:18:18,756 Speaker 2: all the other diseases, to measure whether it's possible that 326 00:18:18,836 --> 00:18:21,356 Speaker 2: you have concurrent diseases at the same time. 327 00:18:21,396 --> 00:18:24,716 Speaker 1: Because presumably, because people are in fact complex, and there 328 00:18:24,756 --> 00:18:27,676 Speaker 1: are many people who have depression and Parkinson's, and you 329 00:18:27,716 --> 00:18:29,716 Speaker 1: want to understand what's going on there. 330 00:18:30,116 --> 00:18:33,836 Speaker 2: I mean, most people are complex, right? It's really rare 331 00:18:33,916 --> 00:18:36,396 Speaker 2: to have, and people who go to the doctor are 332 00:18:36,436 --> 00:18:39,636 Speaker 2: not twenty years old and healthy. Right? Most of the 333 00:18:39,676 --> 00:18:42,596 Speaker 2: people who will use our technology or will benefit from 334 00:18:42,636 --> 00:18:45,716 Speaker 2: this database will be your typical sixty-year-old chronic 335 00:18:45,756 --> 00:18:48,196 Speaker 2: disease patient that comes into the doctor, and 336 00:18:48,316 --> 00:18:50,076 Speaker 2: they don't have a clean bill of health. 337 00:18:50,916 --> 00:18:53,556 Speaker 1: How many people do you want to have in the database? Like, 338 00:18:53,636 --> 00:18:55,436 Speaker 1: is there a final number you're going for? 339 00:18:55,876 --> 00:18:58,476 Speaker 2: So at the beginning, we were aiming for thirty thousand, 340 00:19:00,436 --> 00:19:03,796 Speaker 2: which is extremely ambitious, I think, to be fair. 341 00:19:03,836 --> 00:19:06,236 Speaker 2: I mean, if after four years we get to ten thousand, 342 00:19:06,276 --> 00:19:10,356 Speaker 2: I think it'll be a huge success. Okay. And you 343 00:19:10,356 --> 00:19:13,636 Speaker 2: know, the data collection.
I think what we're learning is 344 00:19:13,676 --> 00:19:17,556 Speaker 2: that data collection is very resource intensive. To have good 345 00:19:17,676 --> 00:19:20,196 Speaker 2: data is very resource intensive. 346 00:19:20,996 --> 00:19:25,436 Speaker 1: So what happened that made you realize that thirty thousand 347 00:19:25,636 --> 00:19:27,876 Speaker 1: was maybe harder than you thought? 348 00:19:28,796 --> 00:19:31,756 Speaker 2: So, I think we thought that we wanted to collect 349 00:19:31,796 --> 00:19:34,196 Speaker 2: as much data as possible, and our original plan was 350 00:19:34,236 --> 00:19:38,996 Speaker 2: to collect a lot of shorter protocols, you know, like shorter clips. 351 00:19:40,236 --> 00:19:43,516 Speaker 2: But as we started working with patients, we realized that 352 00:19:44,076 --> 00:19:48,076 Speaker 2: by getting more data from the same patients, we can 353 00:19:48,116 --> 00:19:51,636 Speaker 2: actually have a lot more information, and it provides a 354 00:19:51,716 --> 00:19:55,276 Speaker 2: lot of interesting, you know, biomarkers. So we're focusing more 355 00:19:55,316 --> 00:19:58,716 Speaker 2: on getting more data from a smaller number of patients, 356 00:19:59,156 --> 00:20:02,356 Speaker 2: and really the right data, 357 00:20:02,436 --> 00:20:04,636 Speaker 2: with a lot of clinical information attached to it. 358 00:20:08,036 --> 00:20:10,516 Speaker 1: After the break, what the world will look like in 359 00:20:10,556 --> 00:20:22,876 Speaker 1: a few years if everything goes well. So this is 360 00:20:22,916 --> 00:20:25,356 Speaker 1: a big project that Yael and her colleagues have 361 00:20:25,356 --> 00:20:27,556 Speaker 1: embarked on. It's a four year project. They're about a 362 00:20:27,676 --> 00:20:31,356 Speaker 1: year in, and there will be interim data releases along 363 00:20:31,356 --> 00:20:34,156 Speaker 1: the way.
So I asked her, how long will it 364 00:20:34,196 --> 00:20:37,036 Speaker 1: take for this project to advance the state of the 365 00:20:37,116 --> 00:20:39,396 Speaker 1: science in acoustic biomarkers. 366 00:20:39,876 --> 00:20:43,036 Speaker 2: Yeah, I would say to say at the end of 367 00:20:43,076 --> 00:20:46,956 Speaker 2: the four years would probably be the best answer. 368 00:20:47,036 --> 00:20:49,236 Speaker 2: I think at the end of the four years. But 369 00:20:49,316 --> 00:20:51,356 Speaker 2: I don't think that, you know, you can just say, oh, 370 00:20:51,356 --> 00:20:53,196 Speaker 2: we'll just start training models at the end of the 371 00:20:53,196 --> 00:20:55,396 Speaker 2: four years once we have all the data. Right, it's 372 00:20:55,396 --> 00:20:57,636 Speaker 2: not just about, you know, building one model that will 373 00:20:57,636 --> 00:21:02,036 Speaker 2: answer your question. It's about continuously training models to understand 374 00:21:02,196 --> 00:21:05,756 Speaker 2: which biomarkers to extract and then build products that work. 375 00:21:06,356 --> 00:21:13,756 Speaker 1: So, if things go well, what will this world 376 00:21:13,796 --> 00:21:16,596 Speaker 1: look like in, whatever, five years? 377 00:21:17,276 --> 00:21:21,156 Speaker 2: Yes. So, I mean, there's a few things that 378 00:21:21,196 --> 00:21:24,036 Speaker 2: this can help with, in general, voice biomarkers. Let's not 379 00:21:24,076 --> 00:21:28,596 Speaker 2: talk about just our project. Diagnosis is one thing, right, 380 00:21:28,676 --> 00:21:35,276 Speaker 2: early diagnosis, but that's probably the hardest thing. Screening 381 00:21:35,516 --> 00:21:38,996 Speaker 2: is almost more important.
So when we think about screening, 382 00:21:39,076 --> 00:21:41,596 Speaker 2: it means, let's say you live really far, you 383 00:21:41,596 --> 00:21:43,956 Speaker 2: don't have access to a doctor, but you have 384 00:21:43,956 --> 00:21:46,356 Speaker 2: an iPhone, and you can talk into the iPhone and 385 00:21:46,356 --> 00:21:49,076 Speaker 2: it can say, hey, something's wrong. You know, you need 386 00:21:49,236 --> 00:21:53,156 Speaker 2: a neurological specialist, for example. So to help screen and triage, 387 00:21:53,236 --> 00:21:55,956 Speaker 2: I think that's probably what we're looking at in the next 388 00:21:55,996 --> 00:22:00,476 Speaker 2: five years, something definitely possible. The other product that I 389 00:22:00,516 --> 00:22:03,596 Speaker 2: think will be very possible within five years is tracking 390 00:22:03,596 --> 00:22:07,196 Speaker 2: of diseases. If you want to monitor the evolution of 391 00:22:07,236 --> 00:22:11,876 Speaker 2: Parkinson's, or how people respond to drugs. That's why pharmaceutical 392 00:22:11,876 --> 00:22:13,196 Speaker 2: companies are very interested. 393 00:22:13,396 --> 00:22:16,716 Speaker 1: Right. So the acoustic biomarker is not just a binary 394 00:22:16,796 --> 00:22:19,676 Speaker 1: signal of disease, no disease. It can tell you a 395 00:22:19,716 --> 00:22:23,796 Speaker 1: lot about the status of disease. Is it getting better, 396 00:22:23,836 --> 00:22:24,556 Speaker 1: is it getting worse? 397 00:22:24,956 --> 00:22:27,996 Speaker 2: Evolution, especially if you train it on your own voice. Right, 398 00:22:28,476 --> 00:22:32,596 Speaker 2: it's even easier to detect changes in somebody's voice as 399 00:22:32,716 --> 00:22:36,676 Speaker 2: they progress, like your Siri, for example, or Alexa that 400 00:22:36,756 --> 00:22:39,716 Speaker 2: listens to your voice. So that's going to be 401 00:22:39,716 --> 00:22:42,396 Speaker 2: a really good tool for pharmaceutical companies.
That's why they're 402 00:22:42,396 --> 00:22:44,716 Speaker 2: investing in it, right, to see how you respond to 403 00:22:44,756 --> 00:22:47,756 Speaker 2: a drug, how you respond to a treatment. And when 404 00:22:47,796 --> 00:22:51,396 Speaker 2: you think about telehealth at home, right, more and 405 00:22:51,476 --> 00:22:55,796 Speaker 2: more we're going to talk about remotely monitoring people. There 406 00:22:55,796 --> 00:22:57,676 Speaker 2: are just too many people on this earth to all 407 00:22:57,716 --> 00:22:59,236 Speaker 2: be in hospitals when we're sick. 408 00:22:59,876 --> 00:23:02,356 Speaker 1: Well, and if you can stay out of the hospital 409 00:23:02,356 --> 00:23:04,636 Speaker 1: when you're sick, that's better, right? You don't want to 410 00:23:04,636 --> 00:23:06,956 Speaker 1: go to the hospital unless you have to. Yeah, or. 411 00:23:07,756 --> 00:23:10,836 Speaker 2: Or, you know, your Alexa detects when your voice 412 00:23:10,836 --> 00:23:14,036 Speaker 2: starts deteriorating and sends you a nurse before you 413 00:23:14,076 --> 00:23:15,236 Speaker 2: need to go to the hospital. 414 00:23:15,716 --> 00:23:19,396 Speaker 1: So there's a more general version of that one, right, 415 00:23:19,476 --> 00:23:22,716 Speaker 1: that you could imagine, which is you get your, whatever, 416 00:23:22,836 --> 00:23:25,716 Speaker 1: your iPhone, your Android phone, and you have a choice 417 00:23:25,716 --> 00:23:27,956 Speaker 1: when you're setting up your phone, like do you want 418 00:23:27,956 --> 00:23:32,596 Speaker 1: to opt in to the phone listening and telling 419 00:23:32,636 --> 00:23:34,676 Speaker 1: you if you need to go talk to your doctor, right, 420 00:23:34,836 --> 00:23:39,156 Speaker 1: just like a very broad based thing that you could 421 00:23:39,196 --> 00:23:43,236 Speaker 1: opt into. Like, I would probably opt into that.
I mean, 422 00:23:43,316 --> 00:23:44,916 Speaker 1: is that a thing that you think about? 423 00:23:45,276 --> 00:23:48,076 Speaker 2: So, I mean, yes, I'm sure that, you know, Apple 424 00:23:48,196 --> 00:23:49,636 Speaker 2: is working on that already. 425 00:23:49,756 --> 00:23:52,196 Speaker 3: They are, you know. 426 00:23:52,516 --> 00:23:55,996 Speaker 2: The question is, there has to be technology that's being 427 00:23:56,036 --> 00:23:59,916 Speaker 2: developed as well to ensure privacy of not only you, 428 00:24:00,396 --> 00:24:03,676 Speaker 2: but your environment. Right, because when it's your phone, then 429 00:24:03,676 --> 00:24:04,956 Speaker 2: it's your environment as well. 430 00:24:05,476 --> 00:24:07,916 Speaker 1: So you brought up privacy in that context. We can 431 00:24:08,276 --> 00:24:10,876 Speaker 1: knock out privacy in the context of the database 432 00:24:10,916 --> 00:24:17,436 Speaker 1: as well. The question of how it could go wrong, building a 433 00:24:17,516 --> 00:24:23,196 Speaker 1: database of thousands of people's voices with tons of data 434 00:24:23,236 --> 00:24:25,916 Speaker 1: about them, sort of answers itself. 435 00:24:26,156 --> 00:24:28,516 Speaker 2: Yeah, it can go wrong in many ways. And I 436 00:24:28,796 --> 00:24:31,156 Speaker 2: just came out of like two hours of meetings on this. 437 00:24:31,276 --> 00:24:33,956 Speaker 2: So at the Bridge2AI program, we have 438 00:24:34,036 --> 00:24:37,956 Speaker 2: a huge group of bioethicists, and one of our big 439 00:24:38,036 --> 00:24:41,756 Speaker 2: aims as a group is really to ensure patient privacy 440 00:24:41,796 --> 00:24:44,356 Speaker 2: and to answer these questions of how do we protect 441 00:24:44,396 --> 00:24:47,276 Speaker 2: patient privacy in the context of open data. Right, so, 442 00:24:47,676 --> 00:24:50,316 Speaker 2: you are absolutely right, tons of things can go wrong.
443 00:24:50,876 --> 00:24:55,996 Speaker 2: People can potentially be reidentified through their voice. So one 444 00:24:56,036 --> 00:24:59,236 Speaker 2: of our biggest goals this year is to determine what part 445 00:24:59,276 --> 00:25:02,916 Speaker 2: of the voice is identifiable and which part is not. Okay, 446 00:25:03,316 --> 00:25:05,596 Speaker 2: and all of this is based on the HIPAA law. 447 00:25:05,796 --> 00:25:08,036 Speaker 2: The HIPAA law is from the nineteen nineties. 448 00:25:08,316 --> 00:25:13,796 Speaker 1: HIPAA, the law that governs sharing and security of people's medical information. 449 00:25:13,676 --> 00:25:17,116 Speaker 2: Correct. Protected health information, PHI, we call that. And that 450 00:25:17,196 --> 00:25:20,796 Speaker 2: law was made in the nineteen nineties, right, and back then 451 00:25:20,876 --> 00:25:23,276 Speaker 2: they listed a list of things of what they called 452 00:25:23,396 --> 00:25:27,956 Speaker 2: PHI, or identifiers, that cannot be shared openly and that 453 00:25:27,956 --> 00:25:31,996 Speaker 2: should stay in the hospital. And voice prints are listed. 454 00:25:32,756 --> 00:25:35,116 Speaker 2: When you go into what the definition of a voice 455 00:25:35,116 --> 00:25:39,876 Speaker 2: print is, it's very nebulous. It's, you know, we don't know. 456 00:25:39,956 --> 00:25:42,036 Speaker 2: So because of that nebubularity. 457 00:25:42,116 --> 00:25:45,596 Speaker 1: Is that a word? Or is it nebulosity? I'm afraid 458 00:25:45,596 --> 00:25:46,156 Speaker 1: I don't know. 459 00:25:49,196 --> 00:25:52,676 Speaker 2: Because it's so nebulous, a lot of institutions, a lot 460 00:25:52,676 --> 00:25:56,316 Speaker 2: of hospitals will say, well, you know, voice 461 00:25:56,996 --> 00:25:59,956 Speaker 2: is not an identifier as long as you don't say, hi, 462 00:26:00,076 --> 00:26:02,916 Speaker 2: I'm John Doe and I live at four twenty five, 463 00:26:02,956 --> 00:26:06,036 Speaker 2: blah blah blah.
Other universities will say, no, no, no, 464 00:26:06,156 --> 00:26:09,436 Speaker 2: voice is always an identifier. You can never really release 465 00:26:09,516 --> 00:26:12,876 Speaker 2: voice data. So what our group is doing right now 466 00:26:13,036 --> 00:26:16,276 Speaker 2: is really looking at why the HIPAA law says this, 467 00:26:16,516 --> 00:26:20,996 Speaker 2: what are the actual legal implications of sharing voice? And 468 00:26:21,556 --> 00:26:24,236 Speaker 2: we always grade it in terms of risk. Right, if 469 00:26:24,316 --> 00:26:27,116 Speaker 2: I talk about all the things that we collect, you 470 00:26:27,156 --> 00:26:30,836 Speaker 2: can think that the respiratory sounds are probably very safe 471 00:26:30,836 --> 00:26:36,236 Speaker 2: to share, versus a speech sample. What we call free speech 472 00:26:37,316 --> 00:26:41,196 Speaker 2: is probably the most identifying, if you have to grade it, right. 473 00:26:41,756 --> 00:26:45,436 Speaker 2: And we're kind of looking at, well, where is the balance? 474 00:26:45,516 --> 00:26:49,156 Speaker 2: How much can we release? And also we can transform 475 00:26:49,236 --> 00:26:51,796 Speaker 2: the data. So for example, we can change the 476 00:26:52,156 --> 00:26:58,036 Speaker 2: audio data into what we call visual spectrograms. 477 00:26:56,156 --> 00:26:57,036 Speaker 1: Like a waveform. 478 00:26:58,076 --> 00:27:01,476 Speaker 2: Yeah, it's a sort of waveform that machine learning can use. 479 00:27:02,236 --> 00:27:08,076 Speaker 2: We can extract acoustic features, right, like loudness, frequency, stuff like 480 00:27:08,036 --> 00:27:10,716 Speaker 1: that. And basically you're trying to figure out how to make 481 00:27:10,756 --> 00:27:15,876 Speaker 1: a person not identifiable based on their voice without 482 00:27:16,036 --> 00:27:19,636 Speaker 1: messing up the database.
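[Editor's note: the spectrogram transformation and the acoustic features mentioned above, loudness and frequency, can be sketched in a few lines. This is a minimal illustration in Python with numpy, not the Bridge2AI pipeline; the function names and parameters are illustrative assumptions.]

```python
import numpy as np

def spectrogram(signal, frame=1024, hop=512):
    """Magnitude spectrogram: the 'visual' form of audio that ML models consume."""
    window = np.hanning(frame)
    frames = np.array([signal[i:i + frame] * window
                       for i in range(0, len(signal) - frame, hop)])
    # One FFT per windowed frame; shape is (n_frames, frame // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def acoustic_features(signal, sr, frame=1024):
    """Two simple features of the kind mentioned: a loudness proxy and dominant frequency."""
    spec = spectrogram(signal, frame=frame)
    rms = float(np.sqrt(np.mean(signal ** 2)))             # loudness proxy (RMS energy)
    freqs = np.fft.rfftfreq(frame, 1 / sr)
    dominant = float(freqs[np.argmax(spec.mean(axis=0))])  # strongest average frequency
    return rms, dominant

# A synthetic 440 Hz tone stands in for a voice recording.
sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
rms, dominant = acoustic_features(tone, sr)
```

[Sharing only such derived features or spectrograms, rather than raw audio, is one way a dataset can reduce how identifiable a speaker is, which is the trade-off discussed next.]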
Like that's the balance, right? Like 483 00:27:19,716 --> 00:27:22,636 Speaker 1: if you monkey with their voice too much, then you 484 00:27:22,676 --> 00:27:24,916 Speaker 1: monkey with the data, the database that we care the 485 00:27:24,956 --> 00:27:27,436 Speaker 1: most about. Like that seems like a hard trade off. 486 00:27:28,236 --> 00:27:31,076 Speaker 1: So if we go farther out into the future, you 487 00:27:31,156 --> 00:27:33,996 Speaker 1: solve all these problems, you build your giant database, the 488 00:27:34,076 --> 00:27:36,956 Speaker 1: models get really good. All of these things seem like 489 00:27:36,996 --> 00:27:42,876 Speaker 1: things that may well happen. I'm curious about, you know, 490 00:27:43,036 --> 00:27:48,596 Speaker 1: AI doing some chunk of what you do now. Right, 491 00:27:48,636 --> 00:27:51,276 Speaker 1: we see this happening, say, in radiology already. AI is 492 00:27:51,316 --> 00:27:54,596 Speaker 1: clearly very good at doing some of the technical work 493 00:27:54,636 --> 00:27:58,836 Speaker 1: that radiologists do in diagnosing scans of patients. Right, how 494 00:27:58,836 --> 00:28:02,516 Speaker 1: do you think about the future of AI, you know, 495 00:28:02,676 --> 00:28:06,876 Speaker 1: using acoustic biomarkers to make diagnoses in a way that 496 00:28:06,956 --> 00:28:09,436 Speaker 1: is similar to what you do now as a human being. 497 00:28:10,156 --> 00:28:11,836 Speaker 2: Yeah. So, I mean, I don't think I'm going 498 00:28:11,916 --> 00:28:14,156 Speaker 2: to lose my job yet, because I would say that 499 00:28:14,316 --> 00:28:18,836 Speaker 2: my primary goal as a doctor is not necessarily to 500 00:28:19,156 --> 00:28:22,356 Speaker 2: do that, right? Like, my primary goal is, yes, to diagnose, 501 00:28:22,396 --> 00:28:24,436 Speaker 2: but it's to treat patients. So for now, AI is 502 00:28:24,476 --> 00:28:26,676 Speaker 2: not going to treat the patient.
So I think what 503 00:28:26,716 --> 00:28:28,836 Speaker 2: it's going to do is it's going to support a 504 00:28:28,876 --> 00:28:35,036 Speaker 2: lot of the workforce. So for example, I'm an academic laryngologist. 505 00:28:35,116 --> 00:28:36,836 Speaker 2: I'm a super specialist. 506 00:28:37,116 --> 00:28:37,276 Speaker 3: Right. 507 00:28:37,716 --> 00:28:39,876 Speaker 2: For people to get to me, they often see like 508 00:28:39,916 --> 00:28:40,996 Speaker 2: five different doctors. 509 00:28:41,276 --> 00:28:43,956 Speaker 1: So instead of going through four specialists who can't figure 510 00:28:43,956 --> 00:28:45,876 Speaker 1: it out, you go to your primary care doctor, or 511 00:28:45,916 --> 00:28:49,036 Speaker 1: even you just talk to your phone, and your phone 512 00:28:49,076 --> 00:28:50,876 Speaker 1: says you better talk to your primary care doctor, and 513 00:28:50,876 --> 00:28:54,356 Speaker 1: your primary care doctor sends the patient directly. 514 00:28:54,076 --> 00:28:57,756 Speaker 2: To me, correct, correct, right, to say like, hey. Because again, 515 00:28:58,196 --> 00:29:00,276 Speaker 2: most of what we do, for a very long time, 516 00:29:00,356 --> 00:29:04,916 Speaker 2: will need a gold standard diagnosis, right. So often, 517 00:29:05,116 --> 00:29:08,196 Speaker 2: you know, it's a biopsy, or it's 518 00:29:08,276 --> 00:29:09,876 Speaker 2: imaging. 519 00:29:09,996 --> 00:29:11,116 Speaker 3: You need a gold standard. 520 00:29:11,276 --> 00:29:16,316 Speaker 1: The acoustic biomarker is not a clear enough diagnostic technique. 521 00:29:16,356 --> 00:29:18,236 Speaker 1: You need something more reliable. 522 00:29:18,716 --> 00:29:20,436 Speaker 2: So I don't think it's going to, you know, no 523 00:29:20,556 --> 00:29:22,676 Speaker 2: doctor will say, oh, well, based on this, this is 524 00:29:22,716 --> 00:29:26,716 Speaker 2: your diagnosis, start chemotherapy.
That's not where we're going. I 525 00:29:26,756 --> 00:29:32,236 Speaker 2: wouldn't take chemotherapy based on an acoustic biomarker. But it's 526 00:29:32,316 --> 00:29:35,676 Speaker 2: hopefully going to support a lot of primary care and 527 00:29:35,756 --> 00:29:38,636 Speaker 2: access to care, to get to the right person faster. 528 00:29:39,916 --> 00:29:43,196 Speaker 1: Great. Anything else we should talk about? 529 00:29:44,156 --> 00:29:46,396 Speaker 2: The one thing we didn't talk about, I guess, I 530 00:29:46,436 --> 00:29:48,116 Speaker 2: talk about this all day, so sometimes it's hard to 531 00:29:48,356 --> 00:29:52,636 Speaker 2: remember what I've said and what I haven't said. But the 532 00:29:52,716 --> 00:29:57,436 Speaker 2: implications of probably all this new telehealth, you know, online 533 00:29:57,756 --> 00:30:01,196 Speaker 2: world that we live in: a lot of industries are 534 00:30:01,236 --> 00:30:07,116 Speaker 2: already integrating tools. So, for example, Canary Speech is a 535 00:30:07,156 --> 00:30:09,876 Speaker 2: startup that sold a product. I think they're working with 536 00:30:09,956 --> 00:30:15,756 Speaker 2: Teams to capture if there are signs in your voice of depression. 537 00:30:16,356 --> 00:30:19,916 Speaker 1: Teams meaning Microsoft Teams, like Microsoft's version of Zoom. 538 00:30:20,076 --> 00:30:22,076 Speaker 2: Yeah, yeah. So I think, and don't quote me on 539 00:30:22,116 --> 00:30:24,316 Speaker 2: the particulars, maybe, you know, I'm not getting it exactly right, 540 00:30:24,316 --> 00:30:27,236 Speaker 2: but I know there's a few startups that are starting 541 00:30:27,236 --> 00:30:31,156 Speaker 2: to integrate products in Zoom or in Teams to let 542 00:30:31,316 --> 00:30:34,396 Speaker 2: employers know that, hey, your employee is not doing well 543 00:30:34,436 --> 00:30:36,636 Speaker 2: based on his voice, for example. Right, and.
544 00:30:36,636 --> 00:30:40,956 Speaker 1: What is your view of the efficacy of those? 545 00:30:41,836 --> 00:30:45,236 Speaker 2: So, I mean, the quick answer 546 00:30:45,396 --> 00:30:50,996 Speaker 2: is it probably works partially. Yeah. But the question is 547 00:30:50,996 --> 00:30:53,156 Speaker 2: not if it works fully or not. The question is, 548 00:30:53,196 --> 00:30:57,196 Speaker 2: does it make a difference? Right, so let's say. 549 00:30:57,556 --> 00:30:59,436 Speaker 1: Does it do what they say it does is a question that 550 00:30:59,476 --> 00:31:03,196 Speaker 1: matters to me, right? Like, are the claims valid? 551 00:31:03,516 --> 00:31:05,396 Speaker 1: Seems like a reasonable starting point. Yeah. 552 00:31:05,236 --> 00:31:05,876 Speaker 3: I think so. 553 00:31:05,876 --> 00:31:08,276 Speaker 2: So, I just reviewed an article from 554 00:31:08,316 --> 00:31:11,396 Speaker 2: one of the startups that's looking at, like, depression, 555 00:31:11,476 --> 00:31:14,396 Speaker 2: and I mean, their numbers look great. I do think 556 00:31:14,396 --> 00:31:16,796 Speaker 2: that the results that a lot of these projects 557 00:31:16,796 --> 00:31:19,956 Speaker 2: are getting are definitely positive and promising. 558 00:31:19,996 --> 00:31:25,996 Speaker 1: Absolutely. We'll be back in a minute with the lightning round. 559 00:31:28,436 --> 00:31:29,636 Speaker 3: Mm-hm. 560 00:31:37,636 --> 00:31:41,676 Speaker 1: Now, as promised, we're back with the lightning round. What 561 00:31:41,756 --> 00:31:42,636 Speaker 1: was your band called? 562 00:31:43,356 --> 00:31:48,316 Speaker 2: Ha. My stage name was Ella Bence, Ella Bence, 563 00:31:48,596 --> 00:31:50,636 Speaker 2: because my last name is Bensusan, so that was 564 00:31:50,676 --> 00:31:51,276 Speaker 2: too long. 565 00:31:51,876 --> 00:31:53,516 Speaker 1: And Yael became Ella.
566 00:31:53,956 --> 00:31:55,756 Speaker 3: Yeah, I could say my first name. 567 00:31:57,276 --> 00:32:00,876 Speaker 1: So you had a hit song in French? What 568 00:32:00,956 --> 00:32:01,476 Speaker 1: was it called? 569 00:32:04,196 --> 00:32:05,916 Speaker 2: I wouldn't call it a hit song. It was called 570 00:32:05,916 --> 00:32:10,036 Speaker 2: Aller simple, which means, I guess in English, it's like a 571 00:32:10,116 --> 00:32:10,916 Speaker 2: one-way flight. 572 00:32:12,236 --> 00:32:13,276 Speaker 1: Can you sing a line? 573 00:32:13,516 --> 00:32:17,156 Speaker 3: No, that's my previous life. 574 00:32:17,756 --> 00:32:22,836 Speaker 1: Can you just say a line? Yeah? 575 00:32:22,956 --> 00:32:25,876 Speaker 3: It's in French, though. I know it'll 576 00:32:25,916 --> 00:32:29,676 Speaker 3: sound great. Yeah. Aller simple. 577 00:32:33,036 --> 00:32:35,916 Speaker 2: Means get me a one-way flight to the other 578 00:32:36,036 --> 00:32:38,636 Speaker 2: side of the world. I hope people are really happy there. 579 00:32:39,276 --> 00:32:42,436 Speaker 1: Well, you're here now, you're in Tampa now. 580 00:32:42,476 --> 00:32:43,836 Speaker 1: Did it work out as hoped? 581 00:32:44,116 --> 00:32:46,796 Speaker 2: Oh, yeah. I mean, I have the best 582 00:32:46,836 --> 00:32:49,476 Speaker 2: job in the world, you know. My 583 00:32:49,556 --> 00:32:53,316 Speaker 2: mom raised us, me and my brother, saying, you guys 584 00:32:53,356 --> 00:32:56,956 Speaker 2: need two jobs, one that makes money and 585 00:32:56,996 --> 00:33:00,276 Speaker 2: the other one that makes you really happy. And if 586 00:33:00,316 --> 00:33:02,996 Speaker 2: you manage to have both in one job, then you'll 587 00:33:02,996 --> 00:33:04,996 Speaker 2: have made it, you know.
And I get to be 588 00:33:05,116 --> 00:33:10,596 Speaker 2: a surgeon and work with voice and voice professionals, and, 589 00:33:10,636 --> 00:33:13,556 Speaker 2: you know, it's been my passion pretty much all my life. 590 00:33:13,556 --> 00:33:14,876 Speaker 3: So yeah. 591 00:33:14,876 --> 00:33:17,916 Speaker 1: Do you work with professional singers as a physician? 592 00:33:18,316 --> 00:33:22,276 Speaker 2: Absolutely. I mean, I love treating my professional singers. So yeah, 593 00:33:22,316 --> 00:33:23,116 Speaker 2: I love that part 594 00:33:22,916 --> 00:33:23,396 Speaker 3: of my job. 595 00:33:23,836 --> 00:33:26,076 Speaker 1: Have you treated anybody famous? 596 00:33:26,156 --> 00:33:26,836 Speaker 3: Yes, but I can't tell. 597 00:33:28,716 --> 00:33:30,116 Speaker 1: What's Taylor Swift really like? 598 00:33:30,476 --> 00:33:32,276 Speaker 3: Oh, that I don't know. I wish. No. 599 00:33:32,676 --> 00:33:35,116 Speaker 2: She sounds fine, though. She probably doesn't need a laryngologist. 600 00:33:35,756 --> 00:33:37,516 Speaker 1: What's the best cure for a sore throat? 601 00:33:39,196 --> 00:33:43,356 Speaker 2: Voice rest and Advil. It takes the inflammation away. 602 00:33:43,956 --> 00:33:48,676 Speaker 1: Uh huh. Advil, just ibuprofen, and don't talk, voice rest. 603 00:33:48,836 --> 00:33:49,036 Speaker 3: Yes. 604 00:33:49,396 --> 00:33:55,196 Speaker 1: Okay. Are you just always involuntarily diagnosing people based on 605 00:33:55,236 --> 00:33:57,076 Speaker 1: their voice all the time? 606 00:33:57,156 --> 00:33:58,836 Speaker 3: So, funny fun fact. 607 00:33:58,916 --> 00:34:03,356 Speaker 2: Two months ago, my girlfriend from residency called me. I 608 00:34:03,356 --> 00:34:05,036 Speaker 2: hadn't spoken to her in like a year and a 609 00:34:05,076 --> 00:34:07,916 Speaker 2: half, and she called me and she said hi, and 610 00:34:07,956 --> 00:34:09,076 Speaker 2: I said, you're pregnant?
611 00:34:09,996 --> 00:34:10,436 Speaker 1: Really? 612 00:34:11,676 --> 00:34:15,276 Speaker 2: And I could hear it, because pregnancy gives you like this, 613 00:34:15,676 --> 00:34:18,076 Speaker 2: you know, you get stuffy in a certain way in 614 00:34:18,116 --> 00:34:20,876 Speaker 2: your nose. We call it rhinitis of pregnancy. 615 00:34:21,276 --> 00:34:22,676 Speaker 3: And I knew her voice very well. 616 00:34:22,716 --> 00:34:24,956 Speaker 2: She was my girlfriend for a long time, you know, 617 00:34:24,996 --> 00:34:28,356 Speaker 2: we studied together, and I just knew it, you know. 618 00:34:28,556 --> 00:34:30,796 Speaker 2: And I think she says, hey, how are you, 619 00:34:30,836 --> 00:34:32,556 Speaker 2: I wanted to talk to you, and I'm like, you're pregnant, 620 00:34:32,596 --> 00:34:33,276 Speaker 2: and she's like, how 621 00:34:33,156 --> 00:34:35,196 Speaker 3: did you know? 622 00:34:35,356 --> 00:34:37,356 Speaker 1: Yes, that's amazing. 623 00:34:37,596 --> 00:34:40,756 Speaker 2: I mean, I was listening to the political debates, you know, 624 00:34:41,276 --> 00:34:43,436 Speaker 2: and I'm like, ooh, this guy needs a laryngologist. 625 00:34:43,436 --> 00:34:45,196 Speaker 2: I'm diagnosing people all the time. 626 00:34:46,076 --> 00:34:52,036 Speaker 1: Well, they should give you a call. Yeah, absolutely. Okay, 627 00:34:53,076 --> 00:34:54,076 Speaker 1: lovely to talk with you. 628 00:34:54,996 --> 00:34:58,956 Speaker 2: It was one of the funnest interviews I've done. 629 00:35:00,156 --> 00:35:02,836 Speaker 1: Yael Bensusan runs the Health Voice Center at the 630 00:35:02,956 --> 00:35:06,796 Speaker 1: University of South Florida. She's also a principal investigator on 631 00:35:06,836 --> 00:35:11,076 Speaker 1: the Bridge2AI Voice Project.
Today's show was produced 632 00:35:11,076 --> 00:35:14,116 Speaker 1: by Gabriel Hunter Chang, edited by Lydia Jeane Kott, and 633 00:35:14,356 --> 00:35:17,716 Speaker 1: engineered by Sarah Bruguer. Just a quick note: we're going 634 00:35:17,756 --> 00:35:20,036 Speaker 1: to be taking a break for the next couple of weeks, 635 00:35:20,476 --> 00:35:23,276 Speaker 1: but we will have an episode in our feed next 636 00:35:23,276 --> 00:35:26,356 Speaker 1: week from our colleagues over at the Happiness Lab that 637 00:35:26,556 --> 00:35:31,156 Speaker 1: is timed, not coincidentally, to World Happiness Day, which I'm 638 00:35:31,196 --> 00:35:34,836 Speaker 1: informed is on March twentieth. I'm Jacob Goldstein, and we'll 639 00:35:34,836 --> 00:35:48,276 Speaker 1: be back soon with more episodes of What's Your Problem.