WEBVTT - Can Your Phone Tell When You're Getting Sick?

0:00:15.356 --> 0:00:22.596
<v Speaker 1>Pushkin. There are a lot of reasons that I'm excited

0:00:22.636 --> 0:00:25.836
<v Speaker 1>about today's show. I'm going to tell you three right now.

0:00:26.796 --> 0:00:30.836
<v Speaker 1>Number One, the show is about this whole dimension of

0:00:30.996 --> 0:00:37.596
<v Speaker 1>medicine that I essentially didn't know existed, acoustic biomarkers, basically

0:00:38.076 --> 0:00:41.996
<v Speaker 1>using a person's voice to assess their health. Second thing

0:00:42.036 --> 0:00:45.076
<v Speaker 1>I'm excited about: the show is about the intersection of

0:00:45.436 --> 0:00:51.196
<v Speaker 1>AI and healthcare, one of my top, say, five intersections.

0:00:51.676 --> 0:00:57.316
<v Speaker 1>Love that intersection. And three, today's guest, Dr. Yael Bensoussan,

0:00:57.796 --> 0:01:01.436
<v Speaker 1>gave me what was truly the best excuse that anyone

0:01:01.436 --> 0:01:04.076
<v Speaker 1>has ever given me for canceling an interview at the

0:01:04.156 --> 0:01:04.916
<v Speaker 1>last minute.

0:01:05.156 --> 0:01:07.116
<v Speaker 2>Yeah, so I'm really sorry for having to cancel on

0:01:07.156 --> 0:01:07.916
<v Speaker 2>you yesterday.

0:01:09.196 --> 0:01:12.756
<v Speaker 1>What was the what was the surgery you had to

0:01:12.756 --> 0:01:13.356
<v Speaker 1>do yesterday?

0:01:13.596 --> 0:01:17.916
<v Speaker 2>So yesterday we call it airway surgery, where I take

0:01:17.956 --> 0:01:20.076
<v Speaker 2>a patient to the OR and I have to open

0:01:20.196 --> 0:01:22.996
<v Speaker 2>up their windpipe or their trachea because there's scar

0:01:23.116 --> 0:01:26.716
<v Speaker 2>tissue that's blocking them from breathing. So I have to

0:01:26.716 --> 0:01:29.356
<v Speaker 2>go with a laser and cut the scar tissue out

0:01:30.276 --> 0:01:32.796
<v Speaker 2>and then take a balloon and open up their windpipe

0:01:32.916 --> 0:01:35.996
<v Speaker 2>so that they can wake up and breathe better, and

0:01:36.076 --> 0:01:38.756
<v Speaker 2>that translates to a different sound when they're breathing.

0:01:38.836 --> 0:01:39.516
<v Speaker 3>So when they're not

0:01:39.596 --> 0:01:41.956
<v Speaker 2>breathing because of the scar tissue, it can sound like,

0:01:43.116 --> 0:01:45.756
<v Speaker 2>you know, very noisy breathing. We call it the Darth

0:01:45.836 --> 0:01:49.236
<v Speaker 2>Vader breathing. And then when they wake up

0:01:49.236 --> 0:01:51.956
<v Speaker 2>from surgery and they're done, they have silent breathing, which

0:01:51.956 --> 0:01:53.916
<v Speaker 2>means that I know that I did a good job.

0:02:00.236 --> 0:02:02.556
<v Speaker 1>I'm Jacob Goldstein, and this is What's Your Problem?, the

0:02:02.596 --> 0:02:05.156
<v Speaker 1>show where I talk to people who are trying to make

0:02:05.236 --> 0:02:09.796
<v Speaker 1>technological progress. Dr. Yael Bensoussan runs the Health

0:02:09.916 --> 0:02:13.076
<v Speaker 1>Voice Center at the University of South Florida, and she

0:02:13.156 --> 0:02:16.596
<v Speaker 1>is also leading a team of researchers that's building a

0:02:16.716 --> 0:02:21.316
<v Speaker 1>giant database of human voices and breaths and health information.

0:02:22.116 --> 0:02:25.236
<v Speaker 1>Her problem is this: how do you record the voices

0:02:25.276 --> 0:02:29.876
<v Speaker 1>of thousands of people without violating patient privacy laws while

0:02:29.876 --> 0:02:34.276
<v Speaker 1>building a giant public database that could someday allow your

0:02:34.356 --> 0:02:37.996
<v Speaker 1>phone to warn you, based solely on your voice that

0:02:38.036 --> 0:02:40.996
<v Speaker 1>you may be getting sick? Yael told me that

0:02:41.116 --> 0:02:43.996
<v Speaker 1>she got into this field in part because she used

0:02:43.996 --> 0:02:44.636
<v Speaker 1>to be a singer.

0:02:46.636 --> 0:02:50.316
<v Speaker 2>So I growing up, you know, I always was in

0:02:50.356 --> 0:02:53.116
<v Speaker 2>a very musical family. I took singing lessons when I

0:02:53.196 --> 0:02:57.636
<v Speaker 2>was a kid, and then I started singing more professionally

0:02:57.996 --> 0:03:00.956
<v Speaker 2>around eighteen years old, and I had a short but

0:03:01.076 --> 0:03:05.356
<v Speaker 2>exciting singing career. I wrote pop folk music. We had

0:03:05.356 --> 0:03:08.236
<v Speaker 2>a band, and we toured. We had an album out

0:03:08.276 --> 0:03:12.316
<v Speaker 2>in two thousand and twelve. Yeah, and I mean it

0:03:12.436 --> 0:03:15.836
<v Speaker 2>was a lot of fun. And actually the reason I

0:03:15.956 --> 0:03:18.876
<v Speaker 2>was able to have that short and exciting career was

0:03:19.276 --> 0:03:21.876
<v Speaker 2>because I met a speech pathologist when I was fifteen.

0:03:21.956 --> 0:03:25.836
<v Speaker 2>So I was taking singing classes and one day my

0:03:25.916 --> 0:03:28.036
<v Speaker 2>teacher looked at me and she said, there's something wrong

0:03:28.076 --> 0:03:28.756
<v Speaker 2>with your voice.

0:03:28.996 --> 0:03:29.876
<v Speaker 3>Go get checked.

0:03:31.716 --> 0:03:35.236
<v Speaker 2>And I met a laryngologist who put her camera down

0:03:35.636 --> 0:03:37.996
<v Speaker 2>and she said, you have nodules on your vocal cords

0:03:38.036 --> 0:03:39.956
<v Speaker 2>and you might not be able to sing again if

0:03:39.996 --> 0:03:42.316
<v Speaker 2>you don't take this seriously. And I went to see

0:03:42.316 --> 0:03:45.236
<v Speaker 2>a speech pathologist. I did rehabilitation with my voice for

0:03:45.316 --> 0:03:47.596
<v Speaker 2>six months, and I was able to sing again. And

0:03:47.636 --> 0:03:49.916
<v Speaker 2>I mean that's what led me to then become a

0:03:49.916 --> 0:03:54.156
<v Speaker 2>speech pathologist growing up, and then eventually go to med

0:03:54.156 --> 0:03:56.476
<v Speaker 2>school and then decide to become a laryngologist.

0:03:56.756 --> 0:03:58.596
<v Speaker 3>So it was kind of all interconnected.

0:03:59.916 --> 0:04:03.676
<v Speaker 1>So I know that your research now and most of

0:04:04.356 --> 0:04:08.716
<v Speaker 1>what I'm really interested to talk with you about is

0:04:08.716 --> 0:04:12.836
<v Speaker 1>around acoustic biomarkers. So just to start, I mean,

0:04:13.796 --> 0:04:15.196
<v Speaker 1>what's an acoustic biomarker?

0:04:16.076 --> 0:04:19.116
<v Speaker 2>Very good question. So what is a biomarker? First, A

0:04:19.116 --> 0:04:24.196
<v Speaker 2>biomarker is something that indicates the presence of a disease, right,

0:04:25.396 --> 0:04:28.836
<v Speaker 2>So if you think about a biomarker for a cancer,

0:04:28.916 --> 0:04:31.716
<v Speaker 2>so different cancers have different types of biomarkers. For example,

0:04:31.756 --> 0:04:35.476
<v Speaker 2>for ovarian cancer, we're looking for a specific thing you know,

0:04:35.676 --> 0:04:38.756
<v Speaker 2>called CA-125 in your blood. For different types of cancers,

0:04:38.756 --> 0:04:41.316
<v Speaker 2>they could take a blood draw and find a specific biomarker.

0:04:41.316 --> 0:04:44.676
<v Speaker 2>It's an indicator of a disease. An acoustic biomarker is

0:04:45.116 --> 0:04:47.756
<v Speaker 2>something that can indicate a presence of a disease, but

0:04:47.836 --> 0:04:50.316
<v Speaker 2>that you can hear. So that's the definition of an

0:04:50.356 --> 0:04:54.036
<v Speaker 2>acoustic biomarker. So I always say, you know, when you

0:04:54.156 --> 0:04:57.676
<v Speaker 2>have people in your family that are not well, you

0:04:57.716 --> 0:05:00.836
<v Speaker 2>will always notice first and you'll say you don't sound

0:05:00.876 --> 0:05:04.996
<v Speaker 2>good right, or you sound funny. And I have the

0:05:05.116 --> 0:05:08.036
<v Speaker 2>luxury to know that because I'm a voice doctor. So

0:05:08.196 --> 0:05:10.756
<v Speaker 2>then people will bring me their family members or people

0:05:10.796 --> 0:05:13.356
<v Speaker 2>will come saying, I don't know what's wrong with me,

0:05:13.796 --> 0:05:16.276
<v Speaker 2>but my wife told me to come because my voice

0:05:16.316 --> 0:05:20.756
<v Speaker 2>is not good. And sometimes it's because their vocal cords

0:05:20.916 --> 0:05:23.116
<v Speaker 2>are not working, but a lot of times it's because

0:05:23.156 --> 0:05:25.956
<v Speaker 2>they can have a neurological issue or a cardiac issue

0:05:26.316 --> 0:05:27.956
<v Speaker 2>that is affecting their voice.

0:05:28.196 --> 0:05:34.036
<v Speaker 1>So, more broadly, what's going on with AI and acoustic biomarkers?

0:05:34.756 --> 0:05:37.436
<v Speaker 2>Yeah, so, so many exciting things are going on. I

0:05:37.476 --> 0:05:42.036
<v Speaker 2>think that's the first answer. There are so many startups,

0:05:42.076 --> 0:05:46.876
<v Speaker 2>so many companies, industry researchers, academic researchers that are working

0:05:46.876 --> 0:05:50.956
<v Speaker 2>and looking into voice AI. And the reason is it's

0:05:50.996 --> 0:05:54.916
<v Speaker 2>really cheap to collect, right? Think about this: if

0:05:54.916 --> 0:05:56.796
<v Speaker 2>you have a phone, it's really cheap to collect compared

0:05:56.836 --> 0:05:56.996
<v Speaker 2>to this.

0:05:57.036 --> 0:05:59.556
<v Speaker 1>You don't have to take a blood sample. You have

0:05:59.636 --> 0:06:02.276
<v Speaker 1>exactly just you've got the phone. You've got the device

0:06:02.796 --> 0:06:05.236
<v Speaker 1>literally in your hand already. All you have to do

0:06:05.356 --> 0:06:07.636
<v Speaker 1>is talk, and you're talking already.

0:06:07.436 --> 0:06:09.756
<v Speaker 2>And you're talking already, so it's cheap. That's why

0:06:09.836 --> 0:06:12.956
<v Speaker 2>pharmaceutical industries are also very interested, and there's a lot

0:06:12.956 --> 0:06:16.156
<v Speaker 2>of pharmaceutical projects around it. So there are a lot

0:06:16.276 --> 0:06:21.196
<v Speaker 2>of projects that are going on and the state or

0:06:21.236 --> 0:06:24.476
<v Speaker 2>the current landscape is that there's tons of people

0:06:24.556 --> 0:06:29.316
<v Speaker 2>working on very similar things and very interesting and various diseases.

0:06:29.356 --> 0:06:31.876
<v Speaker 2>So I always I kind of categorize them in three

0:06:31.916 --> 0:06:36.076
<v Speaker 2>categories of diseases that are being studied. One is the

0:06:36.276 --> 0:06:41.276
<v Speaker 2>disease that affects the voice box. Okay, so vocal cord paralysis, absolutely,

0:06:41.316 --> 0:06:43.756
<v Speaker 2>it's intuitive. There's going to be vocal biomarkers in that.

0:06:44.476 --> 0:06:48.356
<v Speaker 2>Voice box cancer, right, that's easy. Then there's voice

0:06:48.396 --> 0:06:52.316
<v Speaker 2>and speech affecting disorders, so disorders that don't affect the

0:06:52.396 --> 0:06:55.436
<v Speaker 2>voice box, but that have an impact on the voice

0:06:55.436 --> 0:06:59.316
<v Speaker 2>and the speech. Parkinson's is one of them, right, Alzheimer's

0:06:59.356 --> 0:07:01.996
<v Speaker 2>is one of them. A stroke somebody having a stroke,

0:07:02.156 --> 0:07:04.076
<v Speaker 2>they don't have a problem with their voice box, but

0:07:04.116 --> 0:07:06.236
<v Speaker 2>their speech is going to be altered. So these are

0:07:06.316 --> 0:07:09.476
<v Speaker 2>voice and speech affecting conditions. So lots of work is

0:07:09.476 --> 0:07:11.916
<v Speaker 2>being done in that field. And the third one is

0:07:12.116 --> 0:07:15.636
<v Speaker 2>diseases that you don't think would affect speech, but still

0:07:15.676 --> 0:07:17.556
<v Speaker 2>people are doing research on that. So there was a

0:07:17.596 --> 0:07:21.316
<v Speaker 2>really interesting study on diabetes. They're saying that there was

0:07:21.356 --> 0:07:24.356
<v Speaker 2>a group that published that they could diagnose people that

0:07:24.396 --> 0:07:28.516
<v Speaker 2>were diabetic versus non-diabetic based on their speech and this.

0:07:28.836 --> 0:07:34.356
<v Speaker 1>So this third group is one presumably where there's at

0:07:34.436 --> 0:07:39.476
<v Speaker 1>least the potential for AI to detect differences that even

0:07:39.876 --> 0:07:43.196
<v Speaker 1>experts like you cannot detect, right, I mean, is that

0:07:43.276 --> 0:07:45.316
<v Speaker 1>what's going on there? What?

0:07:45.596 --> 0:07:48.316
<v Speaker 2>So AI is not magical, you know, I think it's

0:07:48.436 --> 0:07:50.156
<v Speaker 2>It does a lot of things. But what AI does

0:07:50.236 --> 0:07:53.316
<v Speaker 2>that the layperson doesn't do is that it can analyze

0:07:53.316 --> 0:07:54.596
<v Speaker 2>a lot more data faster.

0:07:55.516 --> 0:07:56.076
<v Speaker 1>Yeah.

0:07:56.196 --> 0:07:59.676
<v Speaker 2>Right, So AI has the possibility, if you have a

0:07:59.836 --> 0:08:03.916
<v Speaker 2>large data set, to then find small differences in these

0:08:04.036 --> 0:08:06.316
<v Speaker 2>data sets that we don't have. I mean, I would

0:08:06.356 --> 0:08:09.076
<v Speaker 2>have to listen to, you know, thousands and thousands of

0:08:09.196 --> 0:08:10.916
<v Speaker 2>voices and compare them statistically.

0:08:11.116 --> 0:08:13.356
<v Speaker 1>It might it might, right. It might also be able

0:08:13.396 --> 0:08:16.716
<v Speaker 1>to detect differences that are not even audible.

0:08:17.156 --> 0:08:20.636
<v Speaker 2>It could, exactly. I can give you an example. There's

0:08:20.676 --> 0:08:25.156
<v Speaker 2>a company looking at atrial fibrillation, and I cannot validate

0:08:25.196 --> 0:08:27.796
<v Speaker 2>their data because that's one of the limitations that we're

0:08:27.836 --> 0:08:30.076
<v Speaker 2>going to talk about. But obviously their data set is

0:08:30.076 --> 0:08:33.316
<v Speaker 2>not public. But they're saying that they can diagnose atrial

0:08:33.356 --> 0:08:36.316
<v Speaker 2>fibrillation based on the voice. And their explanation is that

0:08:36.756 --> 0:08:39.396
<v Speaker 2>our voice vibrates to the sound of our heartbeats.

0:08:40.796 --> 0:08:42.756
<v Speaker 1>Big if true? Fun if true?

0:08:43.076 --> 0:08:45.916
<v Speaker 2>I mean you know, again, the limitation here is that

0:08:45.996 --> 0:08:48.356
<v Speaker 2>it's there's a lot of things you can't validate. But

0:08:48.716 --> 0:08:52.276
<v Speaker 2>they say that they've been validating it with EKGs and

0:08:52.396 --> 0:08:54.476
<v Speaker 2>that they can see it. They can hear a difference

0:08:54.516 --> 0:08:56.436
<v Speaker 2>in the voice between patients with AFib.

0:08:56.476 --> 0:09:00.476
<v Speaker 1>Atrial fibrillation, it puts you at risk for a stroke, right,

0:09:00.516 --> 0:09:04.156
<v Speaker 1>it can go undiagnosed. So like, if if this works,

0:09:04.196 --> 0:09:08.636
<v Speaker 1>that would be very helpful to many people, right, absolutely, absolutely.

0:09:09.116 --> 0:09:13.196
<v Speaker 1>So you're mentioning like that's super interesting. It's it's interesting

0:09:13.236 --> 0:09:17.716
<v Speaker 1>more generally. So, so you're building a giant database, right,

0:09:18.756 --> 0:09:21.196
<v Speaker 1>and I find that interesting for a lot of reasons.

0:09:21.036 --> 0:09:23.996
<v Speaker 1>As it happens, I don't know, have you come across the work

0:09:24.036 --> 0:09:27.636
<v Speaker 1>of Fei-Fei Li? Absolutely, yes. So I talked to

0:09:27.636 --> 0:09:30.796
<v Speaker 1>Fei-Fei Li for this show not long ago. Wow. Right,

0:09:30.916 --> 0:09:35.916
<v Speaker 1>she's like nerd famous, right yeah, And so you know,

0:09:36.036 --> 0:09:40.756
<v Speaker 1>as you know, she built this giant database of images

0:09:40.996 --> 0:09:44.516
<v Speaker 1>about ten years ago, a little more now, called ImageNet.

0:09:44.956 --> 0:09:50.156
<v Speaker 1>And that was that giant database was what allowed these

0:09:50.316 --> 0:09:55.316
<v Speaker 1>early machine learning models AI models to you know, start

0:09:55.476 --> 0:10:02.076
<v Speaker 1>recognizing images, right, and so the database was this necessary tool,

0:10:02.916 --> 0:10:05.996
<v Speaker 1>necessary thing for the AI to really work, right, And

0:10:06.116 --> 0:10:12.796
<v Speaker 1>so are you building the acoustic biomarker version of that?

0:10:13.636 --> 0:10:16.636
<v Speaker 2>So the first the short answer is yes, but I'd

0:10:16.676 --> 0:10:18.716
<v Speaker 2>like to start by saying that I am not building

0:10:19.156 --> 0:10:20.116
<v Speaker 2>it, it's our consortium.

0:10:20.596 --> 0:10:22.916
<v Speaker 1>Yes, yes, you all are.

0:10:23.116 --> 0:10:24.276
<v Speaker 3>Actually, I'll just.

0:10:24.316 --> 0:10:27.036
<v Speaker 2>First start by recognizing here that it's it's a it's

0:10:27.116 --> 0:10:29.196
<v Speaker 2>a huge team. So the Bridge2AI-

0:10:29.316 --> 0:10:33.116
<v Speaker 2>Voice Consortium is a team of fifty investigators across the

0:10:33.236 --> 0:10:36.556
<v Speaker 2>US and Canada. We're funded by the NIH through the

0:10:36.596 --> 0:10:40.076
<v Speaker 2>Bridge2AI program, and the goal, absolutely, this

0:10:40.156 --> 0:10:41.916
<v Speaker 2>is the first time I hear the analogy to the

0:10:41.956 --> 0:10:43.356
<v Speaker 2>ImageNet database.

0:10:43.396 --> 0:10:43.756
<v Speaker 3>I like it.

0:10:43.796 --> 0:10:47.076
<v Speaker 2>I usually give the example of the genomic database, the

0:10:47.196 --> 0:10:48.996
<v Speaker 2>Human Genome Project, huge.

0:10:49.076 --> 0:10:52.196
<v Speaker 1>Project, more famous, more famous, they're.

0:10:51.716 --> 0:10:53.716
<v Speaker 3>Both very famous. But I like this analogy.

0:10:53.876 --> 0:10:56.196
<v Speaker 1>Well, ImageNet is maybe a little bit closer of

0:10:56.236 --> 0:11:00.156
<v Speaker 1>an analogy, but maybe less Yeah yeah, sexy, yeah.

0:10:59.836 --> 0:11:02.316
<v Speaker 2>Well, but I mean it's interesting because the genome project

0:11:02.356 --> 0:11:06.116
<v Speaker 2>has also very interesting ethical particularities like voice, right, the

0:11:06.196 --> 0:11:08.996
<v Speaker 2>image has a little bit less of the ethical constraints.

0:11:08.996 --> 0:11:11.036
<v Speaker 3>Whereas when we talk about whole genome

0:11:10.716 --> 0:11:15.636
<v Speaker 2>sequencing or genomics data, people kind of understand that voice

0:11:15.636 --> 0:11:18.436
<v Speaker 2>has similar concerns in terms of process.

0:11:18.476 --> 0:11:20.596
<v Speaker 1>We want to get to the concerns, but I want

0:11:20.596 --> 0:11:23.276
<v Speaker 1>to first talk about what you're doing and and then

0:11:23.316 --> 0:11:28.716
<v Speaker 1>we can talk about you know, not doing anything wrong. Yeah.

0:11:28.836 --> 0:11:32.676
<v Speaker 1>So broadly, if it becomes the thing you hope it

0:11:32.756 --> 0:11:35.076
<v Speaker 1>will be, what, what is it going to be? What

0:11:35.196 --> 0:11:38.636
<v Speaker 1>is the Bridge2AI-Voice database going to be?

0:11:39.436 --> 0:11:42.516
<v Speaker 2>So it's going to be this large database of thousands

0:11:42.516 --> 0:11:47.796
<v Speaker 2>of human voices linked to other health information that are

0:11:47.876 --> 0:11:52.716
<v Speaker 2>going to be available to researchers and potentially people other

0:11:52.796 --> 0:11:56.756
<v Speaker 2>than researchers as well, to be able to make discoveries, right,

0:11:56.916 --> 0:12:00.756
<v Speaker 2>to learn to use a voice AI, to train you know,

0:12:00.796 --> 0:12:02.956
<v Speaker 2>the next generation of people on how to learn to

0:12:03.196 --> 0:12:07.516
<v Speaker 2>build models on voice AI, to help pharmaceutical companies develop

0:12:07.556 --> 0:12:11.676
<v Speaker 2>products or learn even to to develop products, right, And

0:12:11.716 --> 0:12:15.876
<v Speaker 2>the other really important thing is to teach people what

0:12:15.956 --> 0:12:18.716
<v Speaker 2>type of standards we need, right? Right now, in a lot

0:12:18.796 --> 0:12:21.916
<v Speaker 2>of different projects, there's really a lack of standards. People

0:12:21.996 --> 0:12:25.116
<v Speaker 2>collect voice in different ways. That's why it's really hard

0:12:25.116 --> 0:12:29.956
<v Speaker 2>to pool data together. So our dream was really to say, like, hey,

0:12:30.196 --> 0:12:33.956
<v Speaker 2>you want to do voice research, here's a manual, my friend, right,

0:12:34.036 --> 0:12:36.236
<v Speaker 2>like here is how you collect the voice to make

0:12:36.276 --> 0:12:39.356
<v Speaker 2>it accurate. These are the protocols with the tasks that

0:12:39.396 --> 0:12:43.956
<v Speaker 2>we think, based on our studies, give the best biomarkers. Right,

0:12:44.316 --> 0:12:46.556
<v Speaker 2>These are the type of biomarkers you can look for

0:12:46.836 --> 0:12:50.116
<v Speaker 2>and this is the data you can train, so really

0:12:50.156 --> 0:12:52.716
<v Speaker 2>create a manual of operations also for people to be

0:12:52.756 --> 0:12:55.596
<v Speaker 2>able to make discoveries, and that's the goal to have

0:12:56.476 --> 0:12:57.916
<v Speaker 2>the most impact on patient care.

0:12:58.116 --> 0:13:00.396
<v Speaker 1>So what are the biomarkers? What are you asking people

0:13:00.396 --> 0:13:01.396
<v Speaker 1>to do? What are you collecting?

0:13:02.356 --> 0:13:07.316
<v Speaker 2>So I separate things between. So there are respiratory biomarkers,

0:13:08.356 --> 0:13:13.196
<v Speaker 2>voice biomarkers, speech biomarkers, and linguistic biomarkers, and they're

0:13:13.236 --> 0:13:16.476
<v Speaker 2>all different. So let's go about why these are different.

0:13:16.876 --> 0:13:20.076
<v Speaker 2>So respiratory is easy, right, So we ask people to breathe,

0:13:20.556 --> 0:13:23.916
<v Speaker 2>to cough, to take big breaths in and that has

0:13:24.036 --> 0:13:27.236
<v Speaker 2>a lot of information on our pulmonary capacity, on how

0:13:27.276 --> 0:13:31.836
<v Speaker 2>our windpipe is shaped. Okay, that's respiratory. Then voice and

0:13:31.876 --> 0:13:34.236
<v Speaker 2>speech what's the difference. So voice is really the sound

0:13:34.316 --> 0:13:38.156
<v Speaker 2>that we make when our vocal cords come together. So

0:13:38.436 --> 0:13:42.836
<v Speaker 2>when we say, like birds can voice, but they can't speak.

0:13:43.636 --> 0:13:46.436
<v Speaker 2>If you have a bird that speaks, then you'll be very

0:13:46.316 --> 0:13:50.236
<v Speaker 1>rich. Or you have a parrot.

0:13:52.676 --> 0:13:55.236
<v Speaker 2>So when we when we do voice tasks, we ask

0:13:55.356 --> 0:13:57.516
<v Speaker 2>patients to say "E" or

0:13:57.516 --> 0:13:59.436
<v Speaker 3>"ah" or...

0:13:59.516 --> 0:14:00.116
<v Speaker 1>I get the difference.

0:14:01.716 --> 0:14:06.636
<v Speaker 2>Birds. And voice biomarkers will be impacted when our voice

0:14:06.636 --> 0:14:11.436
<v Speaker 2>box is changed or our resp is changed. Right, So

0:14:11.476 --> 0:14:14.916
<v Speaker 2>somebody with pneumonia probably cannot hold a note for very long,

0:14:15.036 --> 0:14:18.596
<v Speaker 2>So that's voice biomarkers. When we talk about speech biomarkers,

0:14:18.636 --> 0:14:22.476
<v Speaker 2>then you go into articulation. So some people, for example,

0:14:22.476 --> 0:14:25.796
<v Speaker 2>who have neurological deficits or their mouth is not working correctly,

0:14:25.796 --> 0:14:28.236
<v Speaker 2>they're going to have trouble articulating. They're going to have

0:14:28.236 --> 0:14:31.916
<v Speaker 2>trouble saying some words. So these are biomarkers we can extract.

0:14:32.196 --> 0:14:36.436
<v Speaker 2>And then lastly there's linguistic biomarkers. So what type of

0:14:36.476 --> 0:14:41.156
<v Speaker 2>words are people using, what type of semantics, how fast

0:14:41.476 --> 0:14:44.076
<v Speaker 2>do they speak, for example? These are all different types

0:14:44.076 --> 0:14:48.316
<v Speaker 2>of biomarkers

0:14:46.596 --> 0:14:47.476
<v Speaker 3>that we can extract.

0:14:47.476 --> 0:14:49.636
<v Speaker 2>So to give you a very tangible example, I was

0:14:49.676 --> 0:14:53.316
<v Speaker 2>reading a paper from a group looking at biomarkers of depression,

0:14:54.836 --> 0:14:58.196
<v Speaker 2>and rate of speech was one of the important biomarkers

0:14:58.196 --> 0:15:01.236
<v Speaker 2>they found. So people who are sad or depressed will

0:15:01.276 --> 0:15:06.956
<v Speaker 2>speak at a slower pace, so words per second is smaller.

0:15:06.996 --> 0:15:08.676
<v Speaker 2>So that's simple when you think about it, it's a

0:15:08.676 --> 0:15:12.996
<v Speaker 2>simple biomarker, right? So that's to give you tangible examples.

0:15:13.276 --> 0:15:14.956
<v Speaker 2>So in terms of I think I didn't answer your

0:15:15.036 --> 0:15:18.516
<v Speaker 2>question fully, So what are we asking patients? So we

0:15:18.636 --> 0:15:21.916
<v Speaker 2>ask people to do all these tasks so coughing, breathing,

0:15:22.316 --> 0:15:27.516
<v Speaker 2>"ah," "E." Then we make them read those validated passages,

0:15:27.596 --> 0:15:30.956
<v Speaker 2>and we also ask open questions. And then when we

0:15:31.036 --> 0:15:33.836
<v Speaker 2>ask open questions, we have to ask about questions that

0:15:34.116 --> 0:15:37.676
<v Speaker 2>make them emotional and some that don't make them emotional,

0:15:37.716 --> 0:15:40.236
<v Speaker 2>because if you trigger emotion, that causes a bias on

0:15:40.276 --> 0:15:41.516
<v Speaker 2>how your voice will sound.

0:15:42.676 --> 0:15:45.636
<v Speaker 1>What what question do you ask to make people emotional?

0:15:46.036 --> 0:15:47.116
<v Speaker 3>So it's really interesting.

0:15:47.396 --> 0:15:50.676
<v Speaker 2>So at first we would ask, you know, our first

0:15:50.756 --> 0:15:53.436
<v Speaker 2>question was, you know, can you talk to me about

0:15:53.836 --> 0:15:56.276
<v Speaker 2>something that makes you sad? It could be somebody that

0:15:56.356 --> 0:15:58.796
<v Speaker 2>died in your family or you know, So that was

0:15:58.836 --> 0:16:03.756
<v Speaker 2>our prompt. And then our question without emotion was tell

0:16:03.836 --> 0:16:06.356
<v Speaker 2>us about your disease and.

0:16:07.036 --> 0:16:09.676
<v Speaker 1>Only a doctor would think that's the unemotional question.

0:16:09.836 --> 0:16:10.316
<v Speaker 3>Exactly.

0:16:10.396 --> 0:16:12.196
<v Speaker 2>I mean, but it's like when you think about it,

0:16:12.236 --> 0:16:14.916
<v Speaker 2>like Our consortium is like tons of experts that put

0:16:14.956 --> 0:16:16.516
<v Speaker 2>their minds together to develop.

0:16:16.276 --> 0:16:19.516
<v Speaker 1>Tell me about having Parkinson's. That's the unemotional question we're

0:16:19.516 --> 0:16:19.956
<v Speaker 1>going to ask.

0:16:19.916 --> 0:16:22.196
<v Speaker 3>And then we I mean, we like, why are you here?

0:16:22.236 --> 0:16:23.796
<v Speaker 2>I think it was not that obvious, but it's like,

0:16:24.556 --> 0:16:27.116
<v Speaker 2>tell us about why you're here to see your doctor today.

0:16:27.316 --> 0:16:29.956
<v Speaker 2>And then analyzing the data, because we do pilots, right,

0:16:30.036 --> 0:16:33.316
<v Speaker 2>we audit our data. We realized that people were starting

0:16:33.356 --> 0:16:36.236
<v Speaker 2>to tear up, like we had people crying while talking

0:16:36.276 --> 0:16:38.516
<v Speaker 2>about why they were coming to the doctor today, which is

0:16:38.516 --> 0:16:41.436
<v Speaker 1>supposed to be the example of unemotional.

0:16:40.836 --> 0:16:42.636
<v Speaker 3>Sure, correct, So we had to change that.

0:16:45.396 --> 0:16:50.556
<v Speaker 1>Yes, interesting, So okay, this is great. So you're getting

0:16:50.596 --> 0:16:55.836
<v Speaker 1>a lot of auditory information from every patient. What other

0:16:55.876 --> 0:16:58.356
<v Speaker 1>information are you getting from each person? So much.

0:16:58.676 --> 0:17:00.876
<v Speaker 2>So to give you an idea, our full protocol is

0:17:00.876 --> 0:17:01.516
<v Speaker 2>about one

0:17:01.396 --> 0:17:05.396
<v Speaker 1>hour. Okay, so of the patient, with the patient?

0:17:05.076 --> 0:17:08.236
<v Speaker 2>With the patient. It's an iPad, so everything is based

0:17:08.276 --> 0:17:10.756
<v Speaker 2>on an iPad and there's a helper right now or

0:17:10.756 --> 0:17:14.476
<v Speaker 2>research assistant. So we collect data. We collect very extensive

0:17:14.516 --> 0:17:18.396
<v Speaker 2>demographics in terms of you know, age, race, geographical location.

0:17:19.356 --> 0:17:22.556
<v Speaker 2>We collect language, So what language do you speak? How

0:17:22.596 --> 0:17:25.636
<v Speaker 2>many languages do you speak, what languages do you write?

0:17:26.436 --> 0:17:28.196
<v Speaker 2>You know, what part of the world are you from?

0:17:28.356 --> 0:17:32.156
<v Speaker 2>That's really important. Then we collect about disabilities. Are you

0:17:32.236 --> 0:17:34.836
<v Speaker 2>hearing impaired? Are you visually impaired? Because that makes a

0:17:34.916 --> 0:17:40.076
<v Speaker 2>change in your voice, your smoking status, your hydration status,

0:17:40.396 --> 0:17:43.796
<v Speaker 2>your fatigue status, because that's so we're we kind of

0:17:43.836 --> 0:17:47.196
<v Speaker 2>thought about anything that could affect voice, right? Your

0:17:47.236 --> 0:17:50.996
<v Speaker 2>socioeconomic status, because if you think about it, that's going

0:17:51.036 --> 0:17:54.956
<v Speaker 2>to affect you know, your linguistics as well. And then

0:17:55.036 --> 0:17:59.916
<v Speaker 2>so other than that extensive demographics, then we collect confounders, so

0:17:59.956 --> 0:18:02.196
<v Speaker 2>we think about anything that could change your voice. Do

0:18:02.236 --> 0:18:05.556
<v Speaker 2>you have allergies? Do you do you have dental issues?

0:18:05.596 --> 0:18:09.636
<v Speaker 2>Do you wear braces? So everybody gets a basic test

0:18:09.756 --> 0:18:13.036
<v Speaker 2>about if they are depressed. So no matter what disease

0:18:13.076 --> 0:18:15.716
<v Speaker 2>you have, you kind of get the basic tests for

0:18:15.796 --> 0:18:18.756
<v Speaker 2>all the other diseases to measure if it's possible that

0:18:18.836 --> 0:18:21.356
<v Speaker 2>you have concurrent diseases at the same time.

0:18:21.396 --> 0:18:24.716
<v Speaker 1>Because presumably because people are in fact complex, and there

0:18:24.756 --> 0:18:27.676
<v Speaker 1>are many people who have depression and Parkinson's and you

0:18:27.716 --> 0:18:29.716
<v Speaker 1>want to understand what's going on there.

0:18:30.116 --> 0:18:33.836
<v Speaker 2>I mean, most people are complex, right, It's really rare

0:18:33.916 --> 0:18:36.396
<v Speaker 2>to have and people that go to the doctor are

0:18:36.436 --> 0:18:39.636
<v Speaker 2>not twenty years old and healthy, right? Most of the

0:18:39.676 --> 0:18:42.596
<v Speaker 2>people who will use our technology or will benefit from

0:18:42.636 --> 0:18:45.716
<v Speaker 2>this database will be your typical sixty-year-old chronic

0:18:45.756 --> 0:18:48.196
<v Speaker 2>disease patient that comes into the doctor and they're not

0:18:48.316 --> 0:18:50.076
<v Speaker 2>they don't have a sterile bill of health.

0:18:50.916 --> 0:18:53.556
<v Speaker 1>How many people do you want to have in the database? Like,

0:18:53.636 --> 0:18:55.436
<v Speaker 1>is there a final number you're going for?

0:18:55.876 --> 0:18:58.476
<v Speaker 2>So at the beginning, we were aiming for thirty thousand,

0:19:00.436 --> 0:19:03.796
<v Speaker 2>which is extremely, it's extremely ambitious. I think, to be fair,

0:19:03.836 --> 0:19:06.236
<v Speaker 2>I mean, if after four years we get to ten thousand,

0:19:06.276 --> 0:19:10.356
<v Speaker 2>I think it'll be a huge success. Okay, And you

0:19:10.356 --> 0:19:13.636
<v Speaker 2>know the data collection. I think what we're learning is

0:19:13.676 --> 0:19:17.556
<v Speaker 2>that data collection is very resource intensive. To have good

0:19:17.676 --> 0:19:20.196
<v Speaker 2>data is very resource intensive.

0:19:20.996 --> 0:19:25.436
<v Speaker 1>So what happened that made you realize that thirty thousand

0:19:25.636 --> 0:19:27.876
<v Speaker 1>was maybe harder than you thought?

0:19:28.796 --> 0:19:31.756
<v Speaker 2>So, I think we thought that we wanted to collect

0:19:31.796 --> 0:19:34.196
<v Speaker 2>as much data as possible, and our original plan was

0:19:34.236 --> 0:19:38.996
<v Speaker 2>to collect a lot shorter protocols, you know, like shorter clips.

0:19:40.236 --> 0:19:43.516
<v Speaker 2>But as we started working with patients, we realized that

0:19:44.076 --> 0:19:48.076
<v Speaker 2>by getting more data from the same patients, we can

0:19:48.116 --> 0:19:51.636
<v Speaker 2>actually have a lot more information and it provides a

0:19:51.716 --> 0:19:55.276
<v Speaker 2>lot of interesting, you know, biomarkers. So we're focusing more

0:19:55.316 --> 0:19:58.716
<v Speaker 2>on getting more data from a smaller amount of patients

0:19:59.156 --> 0:20:02.356
<v Speaker 2>and really with the right data, kind of right data

0:20:02.436 --> 0:20:04.636
<v Speaker 2>with a lot of clinical information attached to it.

0:20:08.036 --> 0:20:10.516
<v Speaker 1>After the break, what the world will look like in

0:20:10.556 --> 0:20:22.876
<v Speaker 1>a few years if everything goes well. So this is

0:20:22.916 --> 0:20:25.356
<v Speaker 1>a big project that Yael and her colleagues are

0:20:25.356 --> 0:20:27.556
<v Speaker 1>embarked on. It's a four-year project. They're about a

0:20:27.676 --> 0:20:31.356
<v Speaker 1>year in and there will be interim data releases along

0:20:31.356 --> 0:20:34.156
<v Speaker 1>the way. So I asked her, how long will it

0:20:34.196 --> 0:20:37.036
<v Speaker 1>take for this project to advance the state of the

0:20:37.116 --> 0:20:39.396
<v Speaker 1>science in acoustic biomarkers.

0:20:39.876 --> 0:20:43.036
<v Speaker 2>Yeah, I would say to say at the end of

0:20:43.076 --> 0:20:46.956
<v Speaker 2>the four years would be probably the best answer.

0:20:47.036 --> 0:20:49.236
<v Speaker 2>I think at the end of the four years. But

0:20:49.316 --> 0:20:51.356
<v Speaker 2>I think that, you know, you can't just say, oh,

0:20:51.356 --> 0:20:53.196
<v Speaker 2>we'll just start training models at the end of the

0:20:53.196 --> 0:20:55.396
<v Speaker 2>four years once we have all the data. Right? It's

0:20:55.396 --> 0:20:57.636
<v Speaker 2>not just about, you know, building one model that'll

0:20:57.636 --> 0:21:02.036
<v Speaker 2>answer your question. It's about continuously training models to understand

0:21:02.196 --> 0:21:05.756
<v Speaker 2>which biomarkers to extract, to then build products that work.

0:21:06.356 --> 0:21:13.756
<v Speaker 1>So, so, if things go well, what will this world

0:21:13.796 --> 0:21:16.596
<v Speaker 1>look like in whatever five years?

0:21:17.276 --> 0:21:21.156
<v Speaker 2>Yes, So, I mean there's there's a few things that

0:21:21.196 --> 0:21:24.036
<v Speaker 2>this can help with in general, voice biomarkers. Let's not

0:21:24.076 --> 0:21:28.596
<v Speaker 2>talk about just our project. Diagnosis is one thing, right,

0:21:28.676 --> 0:21:35.276
<v Speaker 2>early diagnosis, but that's probably the hardest thing, Huh. Screening

0:21:35.516 --> 0:21:38.996
<v Speaker 2>is more important. So when we think about screening,

0:21:39.076 --> 0:21:41.596
<v Speaker 2>it means you, let's say you live really far you

0:21:41.596 --> 0:21:43.956
<v Speaker 2>don't have access to a doctor, but your doctor has

0:21:43.956 --> 0:21:46.356
<v Speaker 2>an iPhone and you can talk into the iPhone and

0:21:46.356 --> 0:21:49.076
<v Speaker 2>it can say, hey, something's wrong. You know, you need

0:21:49.236 --> 0:21:53.156
<v Speaker 2>a neurological specialist, for example. So to help screen and triage.

0:21:53.236 --> 0:21:55.956
<v Speaker 2>I think this probably we're looking at in the next

0:21:55.996 --> 0:22:00.476
<v Speaker 2>five years, something definitely possible. The other product that I

0:22:00.516 --> 0:22:03.596
<v Speaker 2>think will be very possible within five years is tracking

0:22:03.596 --> 0:22:07.196
<v Speaker 2>of diseases. If you want to monitor the evolution of

0:22:07.236 --> 0:22:11.876
<v Speaker 2>Parkinson's or how people respond to drugs. That's why pharmaceutical

0:22:11.876 --> 0:22:13.196
<v Speaker 2>companies are very interested.

0:22:13.396 --> 0:22:16.716
<v Speaker 1>Right. So the acoustic biomarker is not just a binary

0:22:16.796 --> 0:22:19.676
<v Speaker 1>signal of disease, no disease. It can tell you a

0:22:19.716 --> 0:22:23.796
<v Speaker 1>lot about the status of disease. Is it getting better,

0:22:23.836 --> 0:22:24.556
<v Speaker 1>is it getting worse?

0:22:24.956 --> 0:22:27.996
<v Speaker 2>Evolution, especially if you train it on your own voice. Right,

0:22:28.476 --> 0:22:32.596
<v Speaker 2>it's even easier to detect changes in somebody's voice as

0:22:32.716 --> 0:22:36.676
<v Speaker 2>they progress, like your Siri, for example, or Alexa that

0:22:36.756 --> 0:22:39.716
<v Speaker 2>learns listens to your voice. So that's going to be

0:22:39.716 --> 0:22:42.396
<v Speaker 2>a really good tool for pharmaceutical companies. That's why they're

0:22:42.396 --> 0:22:44.716
<v Speaker 2>investing in it, right, to see how you respond to

0:22:44.756 --> 0:22:47.756
<v Speaker 2>a drug, how you respond to a treatment. And when

0:22:47.796 --> 0:22:51.396
<v Speaker 2>you think about telehealth at home, right, so more and

0:22:51.476 --> 0:22:55.796
<v Speaker 2>more we're going to talk about remote monitoring people. There

0:22:55.796 --> 0:22:57.676
<v Speaker 2>are just too many people on this earth to all

0:22:57.716 --> 0:22:59.236
<v Speaker 2>be in hospitals when we're sick.

0:22:59.876 --> 0:23:02.356
<v Speaker 1>Well, and if you can stay out of the hospital

0:23:02.356 --> 0:23:04.636
<v Speaker 1>when you're sick, that's better, Right, You don't want to

0:23:04.636 --> 0:23:06.956
<v Speaker 1>go to the hospital unless you have to. Yeah, or

0:23:07.756 --> 0:23:10.836
<v Speaker 2>that, you know, your Alexa detects when your voice

0:23:10.836 --> 0:23:14.036
<v Speaker 2>starts deteriorating and sends you a nurse before you

0:23:14.076 --> 0:23:15.236
<v Speaker 2>need to go to the hospital.

0:23:15.716 --> 0:23:19.396
<v Speaker 1>So there's a more general version of that one, right

0:23:19.476 --> 0:23:22.716
<v Speaker 1>that you could imagine, which is you get your whatever,

0:23:22.836 --> 0:23:25.716
<v Speaker 1>your iPhone, your Android phone, and you have a choice

0:23:25.716 --> 0:23:27.956
<v Speaker 1>when you're setting up your phone, like do you want

0:23:27.956 --> 0:23:32.596
<v Speaker 1>to opt in to the phone listening and to tell

0:23:32.636 --> 0:23:34.676
<v Speaker 1>you if you need to go talk to your doctor, right,

0:23:34.836 --> 0:23:39.156
<v Speaker 1>just like a very broad based thing that you could

0:23:39.196 --> 0:23:43.236
<v Speaker 1>opt into, like I would probably opt into that. I mean,

0:23:43.316 --> 0:23:44.916
<v Speaker 1>is that a thing that you think about?

0:23:45.276 --> 0:23:48.076
<v Speaker 2>So, I mean yes, I'm sure that you know Apple

0:23:48.196 --> 0:23:49.636
<v Speaker 2>is working on that already.

0:23:49.756 --> 0:23:52.196
<v Speaker 3>They are, you know.

0:23:52.516 --> 0:23:55.996
<v Speaker 2>The question is there has to be technology that's being

0:23:56.036 --> 0:23:59.916
<v Speaker 2>developed as well to ensure privacy of not only you,

0:24:00.396 --> 0:24:03.676
<v Speaker 2>but your environment. Right, because when it's your phone, then

0:24:03.676 --> 0:24:04.956
<v Speaker 2>it's your environment as well.

0:24:05.476 --> 0:24:07.916
<v Speaker 1>So you brought up privacy in that context, we can

0:24:08.276 --> 0:24:10.876
<v Speaker 1>knock out privacy in the context of the database

0:24:10.916 --> 0:24:17.436
<v Speaker 1>as well. Here, how could it go wrong? Building a

0:24:17.516 --> 0:24:23.196
<v Speaker 1>database of thousands of people's voices with tons of data

0:24:23.236 --> 0:24:25.916
<v Speaker 1>about them sort of answers itself.

0:24:26.156 --> 0:24:28.516
<v Speaker 2>Yeah, it can go wrong in many ways. And I

0:24:28.796 --> 0:24:31.156
<v Speaker 2>just came out of like two hours of meetings of this.

0:24:31.276 --> 0:24:33.956
<v Speaker 2>So at the Bridge to AI program, we have

0:24:34.036 --> 0:24:37.956
<v Speaker 2>a huge group of bioethicists and one of our big

0:24:38.036 --> 0:24:41.756
<v Speaker 2>aims as a group is really to ensure patient privacy

0:24:41.796 --> 0:24:44.356
<v Speaker 2>and to answer these questions of how do we protect

0:24:44.396 --> 0:24:47.276
<v Speaker 2>patient privacy in the context of open data. Right, So,

0:24:47.676 --> 0:24:50.316
<v Speaker 2>you are absolutely right, tons of things can go wrong.

0:24:50.876 --> 0:24:55.996
<v Speaker 2>People can be potentially reidentified through their voice. So one

0:24:56.036 --> 0:24:59.236
<v Speaker 2>of our biggest goals this year is to determine what part

0:24:59.276 --> 0:25:02.916
<v Speaker 2>of the voice is identifiable and which part is not. Okay.

0:25:03.316 --> 0:25:05.596
<v Speaker 2>And all of this is based on the HIPAA law.

0:25:05.796 --> 0:25:08.036
<v Speaker 2>The HIPAA law is from the nineteen nineties.

0:25:08.316 --> 0:25:13.796
<v Speaker 1>HIPAA, the law that governs sharing and security of people's medical information.

0:25:13.676 --> 0:25:17.116
<v Speaker 2>Correct. Protected health information, PHI, we call that, and that

0:25:17.196 --> 0:25:20.796
<v Speaker 2>law was made in the nineteen nineties, right? And back then

0:25:20.876 --> 0:25:23.276
<v Speaker 2>they listed a list of things of what they called

0:25:23.396 --> 0:25:27.956
<v Speaker 2>PHI or identifiers that cannot be shared openly and that

0:25:27.956 --> 0:25:31.996
<v Speaker 2>should stay in the hospital. And voice prints are listed.

0:25:32.756 --> 0:25:35.116
<v Speaker 2>When you go into what the definition of a voice

0:25:35.116 --> 0:25:39.876
<v Speaker 2>print is, it's very nebulous. It's you know, we don't know.

0:25:39.956 --> 0:25:42.036
<v Speaker 2>So because of that nebubularity.

0:25:42.116 --> 0:25:45.596
<v Speaker 1>If I have that word, if that's nebulosity, I'm afraid

0:25:45.596 --> 0:25:46.156
<v Speaker 1>I don't know.

0:25:49.196 --> 0:25:52.676
<v Speaker 2>Because it's so nebulous. A lot of institutions, a lot

0:25:52.676 --> 0:25:56.316
<v Speaker 2>of hospitals will say, well, you know, voice is not

0:25:56.996 --> 0:25:59.956
<v Speaker 2>is not an identifier as long as you don't say hi,

0:26:00.076 --> 0:26:02.916
<v Speaker 2>I'm John Doe and I live at four twenty five,

0:26:02.956 --> 0:26:06.036
<v Speaker 2>blah blah blah. Other universities will say, no, no, no,

0:26:06.156 --> 0:26:09.436
<v Speaker 2>voice is always an identifier. You can never release

0:26:09.516 --> 0:26:12.876
<v Speaker 2>voice data. So what our group is doing right now

0:26:13.036 --> 0:26:16.276
<v Speaker 2>is really looking at why the HIPAA law says this,

0:26:16.516 --> 0:26:20.996
<v Speaker 2>what are the actual legal implications of sharing voice? And

0:26:21.556 --> 0:26:24.236
<v Speaker 2>we always grade it in terms of risk, Right, if

0:26:24.316 --> 0:26:27.116
<v Speaker 2>I talked about all the things that we collect, you

0:26:27.156 --> 0:26:30.836
<v Speaker 2>can think that the respiratory sounds are probably very safe

0:26:30.836 --> 0:26:36.236
<v Speaker 2>to share versus a speech sample. As we say free speech,

0:26:37.316 --> 0:26:41.196
<v Speaker 2>it's probably the most identifying if you have to grade it, right,

0:26:41.756 --> 0:26:45.436
<v Speaker 2>And we're kind of looking at, well, where is the balance?

0:26:45.516 --> 0:26:49.156
<v Speaker 2>How much can we release? And also we can transform

0:26:49.236 --> 0:26:51.796
<v Speaker 2>the data, so for example, we can change the data,

0:26:52.156 --> 0:26:58.036
<v Speaker 2>the audio data, into what we call visual spectrograms.

0:26:56.156 --> 0:26:57.036
<v Speaker 1>Like a waveform.

0:26:58.076 --> 0:27:01.476
<v Speaker 2>Yeah, it's a sort of waveform that machine learning can use.

0:27:02.236 --> 0:27:08.076
<v Speaker 2>We can extract acoustic features, right, like loudness, frequency, stuff like.

0:27:08.036 --> 0:27:10.716
<v Speaker 1>That, and basically trying to figure out how to make

0:27:10.756 --> 0:27:15.876
<v Speaker 1>a person be not identifiable based on their voice without

0:27:16.036 --> 0:27:19.636
<v Speaker 1>messing up the database. Like that's the balance, right, Like

0:27:19.716 --> 0:27:22.636
<v Speaker 1>if you monkey with their voice too much, then you're

0:27:22.676 --> 0:27:24.916
<v Speaker 1>monkeying with the data, the database that we care the

0:27:24.956 --> 0:27:27.436
<v Speaker 1>most about. Like that seems like a hard trade off.

0:27:28.236 --> 0:27:31.076
<v Speaker 1>So if we go farther out into the future, you

0:27:31.156 --> 0:27:33.996
<v Speaker 1>solve all these problems, you build your giant database, the

0:27:34.076 --> 0:27:36.956
<v Speaker 1>models get really good. All of these things seem like

0:27:36.996 --> 0:27:42.876
<v Speaker 1>things that may well happen. I'm curious about, you know,

0:27:43.036 --> 0:27:48.596
<v Speaker 1>AI doing some chunk of what you do now. Right,

0:27:48.636 --> 0:27:51.276
<v Speaker 1>we see this happening, say in radiology already. AI is

0:27:51.316 --> 0:27:54.596
<v Speaker 1>clearly very good at doing some of the technical work

0:27:54.636 --> 0:27:58.836
<v Speaker 1>that radiologists do in diagnosing scans of patients. Right, how

0:27:58.836 --> 0:28:02.516
<v Speaker 1>do you think about the future of AI, you know,

0:28:02.676 --> 0:28:06.876
<v Speaker 1>using acoustic biomarkers to make diagnoses in a way that

0:28:06.956 --> 0:28:09.436
<v Speaker 1>is similar to what you do now as a human being.

0:28:10.156 --> 0:28:11.836
<v Speaker 2>Yeah. So, I mean, I don't think I'm going

0:28:11.916 --> 0:28:14.156
<v Speaker 2>to lose my job yet because I would say that

0:28:14.316 --> 0:28:18.836
<v Speaker 2>my primary goal as a doctor is not to necessarily

0:28:19.156 --> 0:28:22.356
<v Speaker 2>do that, right, Like, my primary goal is, yes, to diagnose,

0:28:22.396 --> 0:28:24.436
<v Speaker 2>but it's to treat patients. So for now, AI is

0:28:24.476 --> 0:28:26.676
<v Speaker 2>not going to treat the patient. So I think what

0:28:26.716 --> 0:28:28.836
<v Speaker 2>it's going to do is it's going to support a

0:28:28.876 --> 0:28:35.036
<v Speaker 2>lot of the workforce. So for example, I'm an academic laryngologist.

0:28:35.116 --> 0:28:36.836
<v Speaker 2>I'm a super, super specialist.

0:28:37.116 --> 0:28:37.276
<v Speaker 3>Right.

0:28:37.716 --> 0:28:39.876
<v Speaker 2>For people to get to me, they often see like

0:28:39.916 --> 0:28:40.996
<v Speaker 2>five different doctors.

0:28:41.276 --> 0:28:43.956
<v Speaker 1>So instead of going through four specialists who can't figure

0:28:43.956 --> 0:28:45.876
<v Speaker 1>it out, you go to your primary care doctor or

0:28:45.916 --> 0:28:49.036
<v Speaker 1>even you just talk to your phone, and your phone

0:28:49.076 --> 0:28:50.876
<v Speaker 1>says you better talk to your primary care doctor, and

0:28:50.876 --> 0:28:54.356
<v Speaker 1>your primary care doctor sends the patient directly

0:28:54.076 --> 0:28:57.756
<v Speaker 2>To you, correct, correct, right to say like hey. Because again,

0:28:58.196 --> 0:29:00.276
<v Speaker 2>most of what we do for a very long time

0:29:00.356 --> 0:29:04.916
<v Speaker 2>will need a gold standard, right, diagnosis. So often,

0:29:05.116 --> 0:29:08.196
<v Speaker 2>you know, it's a biopsy, or it's

0:29:08.276 --> 0:29:09.876
<v Speaker 2>an imaging.

0:29:09.996 --> 0:29:11.116
<v Speaker 3>You need a gold standard.

0:29:11.276 --> 0:29:16.316
<v Speaker 1>The acoustic biomarker is not a clear enough diagnostic technique.

0:29:16.356 --> 0:29:18.236
<v Speaker 1>You need something more reliable.

0:29:18.716 --> 0:29:20.436
<v Speaker 2>So I don't think it's going to you know, no

0:29:20.556 --> 0:29:22.676
<v Speaker 2>doctor will say, oh, well, based on this, this is

0:29:22.716 --> 0:29:26.716
<v Speaker 2>your diagnosis, start chemotherapy. That's not where we're going. I

0:29:26.756 --> 0:29:32.236
<v Speaker 2>wouldn't take chemotherapy based on an acoustic biomarker. But it's

0:29:32.316 --> 0:29:35.676
<v Speaker 2>hopefully going to support a lot of primary care and

0:29:35.756 --> 0:29:38.636
<v Speaker 2>access to care to get to the right person faster.

0:29:39.916 --> 0:29:43.196
<v Speaker 1>Great. Anything else we should talk about?

0:29:44.156 --> 0:29:46.396
<v Speaker 2>The one thing we didn't talk about, I guess I

0:29:46.436 --> 0:29:48.116
<v Speaker 2>talk about this all day, so sometimes it's hard to

0:29:48.356 --> 0:29:52.636
<v Speaker 2>remember what I've said and what I haven't said. But the

0:29:52.716 --> 0:29:57.436
<v Speaker 2>implication for probably all this new telehealth you know, online

0:29:57.756 --> 0:30:01.196
<v Speaker 2>world that we live in, a lot of industries are

0:30:01.236 --> 0:30:07.116
<v Speaker 2>already integrating tools. So, for example, Canary Speech is a

0:30:07.156 --> 0:30:09.876
<v Speaker 2>startup that sold a product. I think they're working with

0:30:09.956 --> 0:30:15.756
<v Speaker 2>Teams to capture if there's signs in your voice of depression.

0:30:16.356 --> 0:30:19.916
<v Speaker 1>Teams meaning Microsoft Teams, like Microsoft's version of Zoom.

0:30:20.076 --> 0:30:22.076
<v Speaker 2>Yeah, yeah, so I think And don't quote me on

0:30:22.116 --> 0:30:24.316
<v Speaker 2>the particular. Maybe I'm you know, I'm not giving, but

0:30:24.316 --> 0:30:27.236
<v Speaker 2>I know there's a few startups that are starting

0:30:27.236 --> 0:30:31.156
<v Speaker 2>to integrate products in Zoom or in Teams to let

0:30:31.316 --> 0:30:34.396
<v Speaker 2>employers know that, hey, your employee is not doing well

0:30:34.436 --> 0:30:36.636
<v Speaker 2>based on his voice, for example, Right, and.

0:30:36.636 --> 0:30:40.956
<v Speaker 1>What is your view of the efficacy of those?

0:30:41.836 --> 0:30:45.236
<v Speaker 2>So, I mean, the easy, the quick answer

0:30:45.396 --> 0:30:50.996
<v Speaker 2>is it probably works partially. Yeah, But the question is

0:30:50.996 --> 0:30:53.156
<v Speaker 2>not if it works fully or not. The question is

0:30:53.196 --> 0:30:57.196
<v Speaker 2>does it make a difference? Right, So let's say let's.

0:30:57.556 --> 0:30:59.436
<v Speaker 1>Do what they say it does. Is a question that

0:30:59.476 --> 0:31:03.196
<v Speaker 1>matters to me, right, like does it are the claims valid?

0:31:03.516 --> 0:31:05.396
<v Speaker 1>Seems like a reasonable starting Yeah.

0:31:05.236 --> 0:31:05.876
<v Speaker 3>I think so.

0:31:05.876 --> 0:31:08.276
<v Speaker 2>So I just reviewed an article of

0:31:08.316 --> 0:31:11.396
<v Speaker 2>one of the startups Fantastic that's looking at like depression,

0:31:11.476 --> 0:31:14.396
<v Speaker 2>and I mean their numbers look great. I do think

0:31:14.396 --> 0:31:16.796
<v Speaker 2>that the results that a lot of these projects

0:31:16.796 --> 0:31:19.956
<v Speaker 2>are getting are definitely positive and promising.

0:31:19.996 --> 0:31:25.996
<v Speaker 1>Absolutely. We'll be back in a minute with the lightning round.

0:31:28.436 --> 0:31:29.636
<v Speaker 3>M h.

0:31:37.636 --> 0:31:41.676
<v Speaker 1>Now, as promised, we're back with the lightning round. What

0:31:41.756 --> 0:31:42.636
<v Speaker 1>was your band called?

0:31:43.356 --> 0:31:48.316
<v Speaker 2>Ha, my stage name was Ella Bence. Ella Bence,

0:31:48.596 --> 0:31:50.636
<v Speaker 2>because my last name is Bensusan, so that was

0:31:50.676 --> 0:31:51.276
<v Speaker 2>too long.

0:31:51.876 --> 0:31:53.516
<v Speaker 1>And Yael became Ella.

0:31:53.956 --> 0:31:55.756
<v Speaker 3>Yeah, I could say my first name.

0:31:57.276 --> 0:32:00.876
<v Speaker 1>What did you have a hit song in French? What

0:32:00.956 --> 0:32:01.476
<v Speaker 1>was it called?

0:32:04.196 --> 0:32:05.916
<v Speaker 2>I wouldn't call it a hit song. It was called

0:32:05.916 --> 0:32:10.036
<v Speaker 2>"Un aller simple." It means, I guess in English, it's like a

0:32:10.116 --> 0:32:10.916
<v Speaker 2>one way flight.

0:32:12.236 --> 0:32:13.276
<v Speaker 1>Can you sing a line?

0:32:13.516 --> 0:32:17.156
<v Speaker 3>No, that's my previous life.

0:32:17.756 --> 0:32:22.836
<v Speaker 1>Can you just say a line? Yeah?

0:32:22.956 --> 0:32:25.876
<v Speaker 3>I was Uh, it's in French, though I know it'll

0:32:25.916 --> 0:32:29.676
<v Speaker 3>sound great. Yeah. Un aller simple.

0:32:33.036 --> 0:32:35.916
<v Speaker 2>Means get me a one way flight for the other

0:32:36.036 --> 0:32:38.636
<v Speaker 2>side of the world. I hope people are really happy there.

0:32:39.276 --> 0:32:42.436
<v Speaker 1>Well, you're here now, you're in Tampa now.

0:32:42.476 --> 0:32:43.836
<v Speaker 1>Did it work out as hoped?

0:32:44.116 --> 0:32:46.796
<v Speaker 2>Oh yeah, I mean, I have the best

0:32:46.836 --> 0:32:49.476
<v Speaker 2>job in the world, you know. My

0:32:49.556 --> 0:32:53.316
<v Speaker 2>mom raised us, me and my brother, saying, you guys

0:32:53.356 --> 0:32:56.956
<v Speaker 2>need two jobs, one that makes money and

0:32:56.996 --> 0:33:00.276
<v Speaker 2>the other one that makes you really happy. And if

0:33:00.316 --> 0:33:02.996
<v Speaker 2>you manage to have both in one job, then you'll

0:33:02.996 --> 0:33:04.996
<v Speaker 2>have made it, you know. And I get to be

0:33:05.116 --> 0:33:10.596
<v Speaker 2>a surgeon and work with voice and voice professionals, and

0:33:10.636 --> 0:33:13.556
<v Speaker 2>you know, it's been my passion pretty much all my life.

0:33:13.556 --> 0:33:14.876
<v Speaker 3>So yeah, do.

0:33:14.876 --> 0:33:17.916
<v Speaker 1>You work with professional singers as a physician?

0:33:18.316 --> 0:33:22.276
<v Speaker 2>Absolutely? I mean I love treating my professional singers, so yeah,

0:33:22.316 --> 0:33:23.116
<v Speaker 2>I love that part.

0:33:22.916 --> 0:33:23.396
<v Speaker 3>Of my job.

0:33:23.836 --> 0:33:26.076
<v Speaker 1>Have you treated anybody famous? Yes?

0:33:26.156 --> 0:33:26.836
<v Speaker 3>But I can't tell.

0:33:28.716 --> 0:33:30.116
<v Speaker 1>What's Taylor Swift really like?

0:33:30.476 --> 0:33:32.276
<v Speaker 3>Oh that I don't know. I wish No.

0:33:32.676 --> 0:33:35.116
<v Speaker 2>She sounds fine, though she probably doesn't need a laryngologist.

0:33:35.756 --> 0:33:37.516
<v Speaker 1>What's the best cure for a sore throat?

0:33:39.196 --> 0:33:43.356
<v Speaker 2>Voice rest and Advil. It takes the inflammation away.

0:33:43.956 --> 0:33:48.676
<v Speaker 1>Uh huh? Advil? Just ibuprofen and don't talk and voice rest.

0:33:48.836 --> 0:33:49.036
<v Speaker 3>Yes?

0:33:49.396 --> 0:33:55.196
<v Speaker 1>Okay? Are you just always involuntarily diagnosing people based on

0:33:55.236 --> 0:33:57.076
<v Speaker 1>their voice all the time? Okay?

0:33:57.156 --> 0:33:58.836
<v Speaker 3>So funny funny fun fact.

0:33:58.916 --> 0:34:03.356
<v Speaker 2>Two months ago, my girlfriend from residency called me. I

0:34:03.356 --> 0:34:05.036
<v Speaker 2>hadn't spoken to her in like a year and a

0:34:05.076 --> 0:34:07.916
<v Speaker 2>half and she called me and she said hi, And

0:34:07.956 --> 0:34:09.076
<v Speaker 2>I said, you're pregnant?

0:34:09.996 --> 0:34:10.436
<v Speaker 1>Really?

0:34:11.676 --> 0:34:15.276
<v Speaker 2>And I could hear it because pregnancy gives you like this,

0:34:15.676 --> 0:34:18.076
<v Speaker 2>you know, you get stuffy in a certain way in

0:34:18.116 --> 0:34:20.876
<v Speaker 2>your nose, like we call it rhinitis of pregnancy.

0:34:21.276 --> 0:34:22.676
<v Speaker 3>And I knew her voice very well.

0:34:22.716 --> 0:34:24.956
<v Speaker 2>She was my girlfriend for a long time, you know,

0:34:24.996 --> 0:34:28.356
<v Speaker 2>we studied together, and she just I knew it, you know.

0:34:28.556 --> 0:34:30.796
<v Speaker 2>And and I think she says, hey, how are you.

0:34:30.836 --> 0:34:32.556
<v Speaker 2>I wanted to talk to you, and I'm like, you're pregnant,

0:34:32.596 --> 0:34:33.276
<v Speaker 2>and She's like, how.

0:34:33.156 --> 0:34:35.196
<v Speaker 3>Did you know? So?

0:34:35.356 --> 0:34:37.356
<v Speaker 1>Yes, that's amazing people.

0:34:37.596 --> 0:34:40.756
<v Speaker 2>I mean I was listening to the political debates, you know,

0:34:41.276 --> 0:34:43.436
<v Speaker 2>and I'm like, ooh, this guy needs a laryngologist. I

0:34:43.436 --> 0:34:45.196
<v Speaker 2>could I'm diagnosing people all the time.

0:34:46.076 --> 0:34:52.036
<v Speaker 1>Well, they should give you a call. Yeah, absolutely, Okay,

0:34:53.076 --> 0:34:54.076
<v Speaker 1>lovely to talk with you.

0:34:54.996 --> 0:34:58.956
<v Speaker 2>It was one of the funnest interviews I've done.

0:35:00.156 --> 0:35:02.836
<v Speaker 1>Yael Bensusan runs the Health Voice Center at the

0:35:02.956 --> 0:35:06.796
<v Speaker 1>University of South Florida. She's also a principal investigator on

0:35:06.836 --> 0:35:11.076
<v Speaker 1>the Bridge to AI Voice Project. Today's show was produced

0:35:11.076 --> 0:35:14.116
<v Speaker 1>by Gabriel Hunter Chang, edited by Lydia Jean Kott, and

0:35:14.356 --> 0:35:17.716
<v Speaker 1>engineered by Sarah Bruguiere. Just a quick note: we're going

0:35:17.756 --> 0:35:20.036
<v Speaker 1>to be taking a break for the next couple of weeks,

0:35:20.476 --> 0:35:23.276
<v Speaker 1>but we will have an episode in our feed next

0:35:23.276 --> 0:35:26.356
<v Speaker 1>week from our colleagues over at the Happiness Lab that

0:35:26.556 --> 0:35:31.156
<v Speaker 1>is timed not coincidentally to World Happiness Day, which I'm

0:35:31.196 --> 0:35:34.836
<v Speaker 1>informed is on March twentieth. I'm Jacob Goldstein and we'll

0:35:34.836 --> 0:35:48.276
<v Speaker 1>be back soon with more episodes of What's Your Problem?