1 00:00:15,356 --> 00:00:22,596 Speaker 1: Pushkin. There are a lot of reasons that I'm excited 2 00:00:22,636 --> 00:00:25,836 Speaker 1: about today's show. I'm going to tell you three right now. 3 00:00:26,796 --> 00:00:30,836 Speaker 1: Number one, the show is about this whole dimension of 4 00:00:30,996 --> 00:00:37,596 Speaker 1: medicine that I essentially didn't know existed, acoustic biomarkers, basically 5 00:00:38,076 --> 00:00:41,996 Speaker 1: using a person's voice to assess their health. Second thing 6 00:00:42,036 --> 00:00:45,076 Speaker 1: I'm excited about: the show is about the intersection of 7 00:00:45,436 --> 00:00:51,196 Speaker 1: AI and healthcare, one of my top, say, five intersections. 8 00:00:51,676 --> 00:00:57,316 Speaker 1: Love that intersection. And three, today's guest, Doctor Yael Bensoussan, 9 00:00:57,796 --> 00:01:01,436 Speaker 1: gave me what was truly the best excuse that anyone 10 00:01:01,436 --> 00:01:04,076 Speaker 1: has ever given me for canceling an interview at the 11 00:01:04,156 --> 00:01:04,916 Speaker 1: last minute. 12 00:01:05,156 --> 00:01:07,116 Speaker 2: Yeah, so I'm really sorry for having to cancel on 13 00:01:07,156 --> 00:01:07,916 Speaker 2: you yesterday. 14 00:01:09,196 --> 00:01:12,756 Speaker 1: What was the, what was the surgery you had to 15 00:01:12,756 --> 00:01:13,356 Speaker 1: do yesterday? 16 00:01:13,596 --> 00:01:17,916 Speaker 2: So yesterday, we call it airway surgery, where I take 17 00:01:17,956 --> 00:01:20,076 Speaker 2: a patient to the OR and I have to open 18 00:01:20,196 --> 00:01:22,996 Speaker 2: up their windpipe or their trachea, because there's scar 19 00:01:23,116 --> 00:01:26,716 Speaker 2: tissue that's blocking them from breathing.
So I have to 20 00:01:26,716 --> 00:01:29,356 Speaker 2: go with a laser and cut the scar tissue out 21 00:01:30,276 --> 00:01:32,796 Speaker 2: and then take a balloon and open up their windpipe 22 00:01:32,916 --> 00:01:35,996 Speaker 2: so that they can wake up and breathe better, and 23 00:01:36,076 --> 00:01:38,756 Speaker 2: that translates to a different sound when they're breathing. 24 00:01:38,836 --> 00:01:39,516 Speaker 3: So when they're not 25 00:01:39,596 --> 00:01:41,956 Speaker 2: breathing because of the scar tissue, it can sound like, 26 00:01:43,116 --> 00:01:45,756 Speaker 2: you know, very noisy breathing. We call it the Darth 27 00:01:45,836 --> 00:01:49,236 Speaker 2: Vader breathing. And then when they wake up 28 00:01:49,236 --> 00:01:51,956 Speaker 2: from surgery and they're done, they have silent breathing, which 29 00:01:51,956 --> 00:01:53,916 Speaker 2: means that I know that I did a good job. 30 00:02:00,236 --> 00:02:02,556 Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the 31 00:02:02,596 --> 00:02:05,156 Speaker 1: show where I talk to people who are trying to make 32 00:02:05,236 --> 00:02:09,796 Speaker 1: technological progress. Doctor Yael Bensoussan runs the Health 33 00:02:09,916 --> 00:02:13,076 Speaker 1: Voice Center at the University of South Florida, and she 34 00:02:13,156 --> 00:02:16,596 Speaker 1: is also leading a team of researchers that's building a 35 00:02:16,716 --> 00:02:21,316 Speaker 1: giant database of human voices and breaths and health information.
36 00:02:22,116 --> 00:02:25,236 Speaker 1: Her problem is this: how do you record the voices 37 00:02:25,276 --> 00:02:29,876 Speaker 1: of thousands of people without violating patient privacy laws while 38 00:02:29,876 --> 00:02:34,276 Speaker 1: building a giant public database that could someday allow your 39 00:02:34,356 --> 00:02:37,996 Speaker 1: phone to warn you, based solely on your voice, that 40 00:02:38,036 --> 00:02:40,996 Speaker 1: you may be getting sick? Yael told me that 41 00:02:41,116 --> 00:02:43,996 Speaker 1: she got into this field in part because she used 42 00:02:43,996 --> 00:02:44,636 Speaker 1: to be a singer. 43 00:02:46,636 --> 00:02:50,316 Speaker 2: So growing up, you know, I always was in 44 00:02:50,356 --> 00:02:53,116 Speaker 2: a very musical family. I took singing lessons when I 45 00:02:53,196 --> 00:02:57,636 Speaker 2: was a kid, and then I started singing more professionally 46 00:02:57,996 --> 00:03:00,956 Speaker 2: around eighteen years old, and I had a short but 47 00:03:01,076 --> 00:03:05,356 Speaker 2: exciting singing career. I wrote pop folk music. We had 48 00:03:05,356 --> 00:03:08,236 Speaker 2: a band, and we toured. We had an album out 49 00:03:08,276 --> 00:03:12,316 Speaker 2: in two thousand and twelve. Yeah, and I mean it 50 00:03:12,436 --> 00:03:15,836 Speaker 2: was a lot of fun. And actually the reason I 51 00:03:15,956 --> 00:03:18,876 Speaker 2: was able to have that short and exciting career was 52 00:03:19,276 --> 00:03:21,876 Speaker 2: because I met a speech pathologist when I was fifteen. 53 00:03:21,956 --> 00:03:25,836 Speaker 2: So I was taking singing classes and one day my 54 00:03:25,916 --> 00:03:28,036 Speaker 2: teacher looked at me and she said, there's something wrong 55 00:03:28,076 --> 00:03:28,756 Speaker 2: with your voice. 56 00:03:28,996 --> 00:03:29,876 Speaker 3: Go get checked.
57 00:03:31,716 --> 00:03:35,236 Speaker 2: And I met a laryngologist who put a camera down 58 00:03:35,636 --> 00:03:37,996 Speaker 2: and said, you have nodules on your vocal cords 59 00:03:38,036 --> 00:03:39,956 Speaker 2: and you might not be able to sing again if 60 00:03:39,996 --> 00:03:42,316 Speaker 2: you don't take this seriously. And I went to see 61 00:03:42,316 --> 00:03:45,236 Speaker 2: a speech pathologist. I did rehabilitation with my voice for 62 00:03:45,316 --> 00:03:47,596 Speaker 2: six months, and I was able to sing again. And 63 00:03:47,636 --> 00:03:49,916 Speaker 2: I mean, that's what led me to then become a 64 00:03:49,916 --> 00:03:54,156 Speaker 2: speech pathologist, and then eventually go to med 65 00:03:54,156 --> 00:03:56,476 Speaker 2: school and then decide to become a laryngologist. 66 00:03:56,756 --> 00:03:58,596 Speaker 3: So it was kind of all interconnected. 67 00:03:59,916 --> 00:04:03,676 Speaker 1: So I know that your research now, and most of 68 00:04:04,356 --> 00:04:08,716 Speaker 1: what I'm really interested to talk with you about, is 69 00:04:08,716 --> 00:04:12,836 Speaker 1: around acoustic biomarkers. So just to start, I mean, 70 00:04:13,796 --> 00:04:15,196 Speaker 1: what's an acoustic biomarker? 71 00:04:16,076 --> 00:04:19,116 Speaker 2: Very good question. So what is a biomarker? First, a 72 00:04:19,116 --> 00:04:24,196 Speaker 2: biomarker is something that indicates the presence of a disease, right? 73 00:04:25,396 --> 00:04:28,836 Speaker 2: So if you think about a biomarker for a cancer, 74 00:04:28,916 --> 00:04:31,716 Speaker 2: so different cancers have different types of biomarkers. For example, 75 00:04:31,756 --> 00:04:35,476 Speaker 2: for ovarian cancer, we're looking for a specific thing, you know, 76 00:04:35,676 --> 00:04:38,756 Speaker 2: called CA-125, in your blood.
For different types of cancers, 77 00:04:38,756 --> 00:04:41,316 Speaker 2: they could take a blood draw and find a specific biomarker. 78 00:04:41,316 --> 00:04:44,676 Speaker 2: It's an indicator of a disease. An acoustic biomarker is 79 00:04:45,116 --> 00:04:47,756 Speaker 2: something that can indicate the presence of a disease, but 80 00:04:47,836 --> 00:04:50,316 Speaker 2: that you can hear. So that's the definition of an 81 00:04:50,356 --> 00:04:54,036 Speaker 2: acoustic biomarker. So I always say, you know, when you 82 00:04:54,156 --> 00:04:57,676 Speaker 2: have people in your family that are not well, you 83 00:04:57,716 --> 00:05:00,836 Speaker 2: will always notice first and you'll say, you don't sound 84 00:05:00,876 --> 00:05:04,996 Speaker 2: good, right, or you sound funny. And I have the 85 00:05:05,116 --> 00:05:08,036 Speaker 2: luxury to know that because I'm a voice doctor. So 86 00:05:08,196 --> 00:05:10,756 Speaker 2: then people will bring me their family members, or people 87 00:05:10,796 --> 00:05:13,356 Speaker 2: will come saying, I don't know what's wrong with me, 88 00:05:13,796 --> 00:05:16,276 Speaker 2: but my wife told me to come because my voice 89 00:05:16,316 --> 00:05:20,756 Speaker 2: is not good. And sometimes it's because their vocal cords 90 00:05:20,916 --> 00:05:23,116 Speaker 2: are not working, but a lot of times it's because 91 00:05:23,156 --> 00:05:25,956 Speaker 2: they can have a neurological issue or a cardiac issue 92 00:05:26,316 --> 00:05:27,956 Speaker 2: that is affecting their voice. 93 00:05:28,196 --> 00:05:34,036 Speaker 1: So, more broadly, what's going on with AI and acoustic biomarkers? 94 00:05:34,756 --> 00:05:37,436 Speaker 2: Yeah, so, so many exciting things are going on. I 95 00:05:37,476 --> 00:05:42,036 Speaker 2: think that's the first answer.
There are so many startups, 96 00:05:42,076 --> 00:05:46,876 Speaker 2: so many companies, industry researchers, academic researchers that are working 97 00:05:46,876 --> 00:05:50,956 Speaker 2: and looking into voice AI. And the reason is it's 98 00:05:50,996 --> 00:05:54,916 Speaker 2: really cheap to collect. Right? Think about this: if 99 00:05:54,916 --> 00:05:56,796 Speaker 2: you have a phone, it's really cheap to collect compared 100 00:05:56,836 --> 00:05:56,996 Speaker 2: to this. 101 00:05:57,036 --> 00:05:59,556 Speaker 1: You don't have to take a blood sample. You have, 102 00:05:59,636 --> 00:06:02,276 Speaker 1: exactly, just, you've got the phone. You've got the device 103 00:06:02,796 --> 00:06:05,236 Speaker 1: literally in your hand already. All you have to do 104 00:06:05,356 --> 00:06:07,636 Speaker 1: is talk, and you're talking already. 105 00:06:07,436 --> 00:06:09,756 Speaker 2: And you're talking already, so it's cheap. That's why 106 00:06:09,836 --> 00:06:12,956 Speaker 2: the pharmaceutical industry is also very interested, and there's a lot 107 00:06:12,956 --> 00:06:16,156 Speaker 2: of pharmaceutical projects around it. So there are a lot 108 00:06:16,276 --> 00:06:21,196 Speaker 2: of projects that are going on, and the 109 00:06:21,236 --> 00:06:24,476 Speaker 2: current landscape is that there's tons of people 110 00:06:24,556 --> 00:06:29,316 Speaker 2: working on very similar things in very interesting and various diseases. 111 00:06:29,356 --> 00:06:31,876 Speaker 2: So I kind of categorize them in three 112 00:06:31,916 --> 00:06:36,076 Speaker 2: categories of diseases that are being studied. One is 113 00:06:36,276 --> 00:06:41,276 Speaker 2: diseases that affect the voice box. Okay, so vocal cord paralysis, absolutely, 114 00:06:41,316 --> 00:06:43,756 Speaker 2: it's intuitive.
There are going to be vocal biomarkers in that. 115 00:06:44,476 --> 00:06:48,356 Speaker 2: Voice box cancer, right, that's easy. Then there are voice- 116 00:06:48,396 --> 00:06:52,316 Speaker 2: and speech-affecting disorders, so disorders that don't affect the 117 00:06:52,396 --> 00:06:55,436 Speaker 2: voice box, but that have an impact on the voice 118 00:06:55,436 --> 00:06:59,316 Speaker 2: and the speech. Parkinson's is one of them, right? Alzheimer's 119 00:06:59,356 --> 00:07:01,996 Speaker 2: is one of them. A stroke: somebody having a stroke, 120 00:07:02,156 --> 00:07:04,076 Speaker 2: they don't have a problem with their voice box, but 121 00:07:04,116 --> 00:07:06,236 Speaker 2: their speech is going to be altered. So these are 122 00:07:06,316 --> 00:07:09,476 Speaker 2: voice- and speech-affecting conditions. So lots of work is 123 00:07:09,476 --> 00:07:11,916 Speaker 2: being done in that field. And the third one is 124 00:07:12,116 --> 00:07:15,636 Speaker 2: diseases that you don't think would affect speech, but still 125 00:07:15,676 --> 00:07:17,556 Speaker 2: people are doing research on that. So there was a 126 00:07:17,596 --> 00:07:21,316 Speaker 2: really interesting study on diabetes. They're saying that there was 127 00:07:21,356 --> 00:07:24,356 Speaker 2: a group that published that they could diagnose people who 128 00:07:24,396 --> 00:07:28,516 Speaker 2: were diabetic versus non-diabetic based on their speech. 129 00:07:28,836 --> 00:07:34,356 Speaker 1: So this third group is one presumably where there's at 130 00:07:34,436 --> 00:07:39,476 Speaker 1: least the potential for AI to detect differences that even 131 00:07:39,876 --> 00:07:43,196 Speaker 1: experts like you cannot detect, right? I mean, is that 132 00:07:43,276 --> 00:07:45,316 Speaker 1: what's going on there?
133 00:07:45,596 --> 00:07:48,316 Speaker 2: So AI is not magical, you know. I think 134 00:07:48,436 --> 00:07:50,156 Speaker 2: it does a lot of things. But what AI does 135 00:07:50,236 --> 00:07:53,316 Speaker 2: that the layperson doesn't do is that it can analyze 136 00:07:53,316 --> 00:07:54,596 Speaker 2: a lot more data faster. 137 00:07:55,516 --> 00:07:56,076 Speaker 1: Yeah. 138 00:07:56,196 --> 00:07:59,676 Speaker 2: Right. So AI has the possibility, if you have a 139 00:07:59,836 --> 00:08:03,916 Speaker 2: large data set, to then find small differences in these 140 00:08:04,036 --> 00:08:06,316 Speaker 2: data sets that we can't. I mean, I would 141 00:08:06,356 --> 00:08:09,076 Speaker 2: have to listen to, you know, thousands and thousands of 142 00:08:09,196 --> 00:08:10,916 Speaker 2: voices and compare them statistically. 143 00:08:11,116 --> 00:08:13,356 Speaker 1: It might, it might, right. It might also be able 144 00:08:13,396 --> 00:08:16,716 Speaker 1: to detect differences that are not even audible. 145 00:08:17,156 --> 00:08:20,636 Speaker 2: It could, exactly. I can give you an example. There's 146 00:08:20,676 --> 00:08:25,156 Speaker 2: a company looking at atrial fibrillation, and I cannot validate 147 00:08:25,196 --> 00:08:27,796 Speaker 2: their data, because that's one of the limitations that we're 148 00:08:27,836 --> 00:08:30,076 Speaker 2: going to talk about. But obviously their data set is 149 00:08:30,076 --> 00:08:33,316 Speaker 2: not public. But they're saying that they can diagnose atrial 150 00:08:33,356 --> 00:08:36,316 Speaker 2: fibrillation based on the voice. And their explanation is that 151 00:08:36,756 --> 00:08:39,396 Speaker 2: our voice vibrates to the sound of our heartbeats. 152 00:08:40,796 --> 00:08:42,756 Speaker 1: Big if true? Fun if true?
153 00:08:43,076 --> 00:08:45,916 Speaker 2: I mean you know, again, the limitation here is that 154 00:08:45,996 --> 00:08:48,356 Speaker 2: it's there's a lot of things you can't validate. But 155 00:08:48,716 --> 00:08:52,276 Speaker 2: they say that they've been validating it with EKGs and 156 00:08:52,396 --> 00:08:54,476 Speaker 2: that they can see it. They can hear a difference 157 00:08:54,516 --> 00:08:56,436 Speaker 2: in the voice between patient patients with a. 158 00:08:56,476 --> 00:09:00,476 Speaker 1: Fib atrial fibrillation. It puts you at risk for a stroke, right, 159 00:09:00,516 --> 00:09:04,156 Speaker 1: it can go undiagnosed. So like, if if this works, 160 00:09:04,196 --> 00:09:08,636 Speaker 1: that would be very helpful to many people, right, absolutely, absolutely. 161 00:09:09,116 --> 00:09:13,196 Speaker 1: So you're mentioning like that's super interesting. It's it's interesting 162 00:09:13,236 --> 00:09:17,716 Speaker 1: more generally. So, so you're building a giant database, right, 163 00:09:18,756 --> 00:09:21,196 Speaker 1: and I find that interesting for a lot of reasons. 164 00:09:21,036 --> 00:09:23,996 Speaker 1: It happens. I don't have you come across the work 165 00:09:24,036 --> 00:09:27,636 Speaker 1: of faith A Lee. Absolutely, yeses, So I talked to 166 00:09:27,636 --> 00:09:30,796 Speaker 1: faith A Lee for this show not long ago. Wow. Right, 167 00:09:30,916 --> 00:09:35,916 Speaker 1: she's like nerd famous, right yeah, And so you know, 168 00:09:36,036 --> 00:09:40,756 Speaker 1: as you know, she built this giant database of images 169 00:09:40,996 --> 00:09:44,516 Speaker 1: about ten years ago a little more now called image net. 
170 00:09:44,956 --> 00:09:50,156 Speaker 1: And that giant database was what allowed these 171 00:09:50,316 --> 00:09:55,316 Speaker 1: early machine learning models, AI models, to, you know, start 172 00:09:55,476 --> 00:10:02,076 Speaker 1: recognizing images, right? And so the database was this necessary tool, 173 00:10:02,916 --> 00:10:05,996 Speaker 1: necessary thing for the AI to really work, right? And 174 00:10:06,116 --> 00:10:12,796 Speaker 1: so are you building the acoustic biomarker version of that? 175 00:10:13,636 --> 00:10:16,636 Speaker 2: So the short answer is yes, but I'd 176 00:10:16,676 --> 00:10:18,716 Speaker 2: like to start by saying that I am not building 177 00:10:19,156 --> 00:10:20,116 Speaker 2: it; it's our consortium. 178 00:10:20,596 --> 00:10:22,916 Speaker 1: Yes, yes, you all are. 179 00:10:23,116 --> 00:10:24,276 Speaker 3: Actually, I'll just 180 00:10:24,316 --> 00:10:27,036 Speaker 2: first start by recognizing here that it's 181 00:10:27,116 --> 00:10:29,196 Speaker 2: a huge team. So the Bridge2AI 182 00:10:29,316 --> 00:10:33,116 Speaker 2: Voice Consortium is a team of fifty investigators across the 183 00:10:33,236 --> 00:10:36,556 Speaker 2: US and Canada. We're funded by the NIH through the 184 00:10:36,596 --> 00:10:40,076 Speaker 2: Bridge2AI program. And the goal, absolutely, this 185 00:10:40,156 --> 00:10:41,916 Speaker 2: is the first time I hear the analogy to the 186 00:10:41,956 --> 00:10:43,356 Speaker 2: ImageNet database. 187 00:10:43,396 --> 00:10:43,756 Speaker 3: I like it. 188 00:10:43,796 --> 00:10:47,076 Speaker 2: I usually give the example of the genomic database, the 189 00:10:47,196 --> 00:10:48,996 Speaker 2: Human Genome Project. Huge 190 00:10:49,076 --> 00:10:52,196 Speaker 1: project, more famous, more famous. They're 191 00:10:51,716 --> 00:10:53,716 Speaker 3: both very famous. But I like this analogy.
192 00:10:53,876 --> 00:10:56,196 Speaker 1: Well, ImageNet is maybe a little bit closer of 193 00:10:56,236 --> 00:11:00,156 Speaker 1: an analogy, but maybe less, yeah, sexy, yeah. 194 00:10:59,836 --> 00:11:02,316 Speaker 2: Well, but I mean it's interesting, because the genome project 195 00:11:02,356 --> 00:11:06,116 Speaker 2: has also very interesting ethical particularities, like voice, right? The 196 00:11:06,196 --> 00:11:08,996 Speaker 2: image has a little bit less of the ethical constraints. 197 00:11:08,996 --> 00:11:11,036 Speaker 3: When we talk about whole genome 198 00:11:10,716 --> 00:11:15,636 Speaker 2: sequencing or genomics data, people kind of understand that voice 199 00:11:15,636 --> 00:11:18,436 Speaker 2: has similar concerns in terms of privacy. 200 00:11:18,476 --> 00:11:20,596 Speaker 1: We want to get to the concerns, but I want 201 00:11:20,596 --> 00:11:23,276 Speaker 1: to first talk about what you're doing, and then 202 00:11:23,316 --> 00:11:28,716 Speaker 1: we can talk about, you know, not doing anything wrong. Yeah. 203 00:11:28,836 --> 00:11:32,676 Speaker 1: So broadly, if it becomes the thing you hope it 204 00:11:32,756 --> 00:11:35,076 Speaker 1: will be, what is it going to be? What 205 00:11:35,196 --> 00:11:38,636 Speaker 1: is the Bridge2AI voice database going to be?
206 00:11:39,436 --> 00:11:42,516 Speaker 2: So it's going to be this large database of thousands 207 00:11:42,516 --> 00:11:47,796 Speaker 2: of human voices linked to other health information that is 208 00:11:47,876 --> 00:11:52,716 Speaker 2: going to be available to researchers, and potentially people other 209 00:11:52,796 --> 00:11:56,756 Speaker 2: than researchers as well, to be able to make discoveries, right? 210 00:11:56,916 --> 00:12:00,756 Speaker 2: To learn to use voice AI, to train, you know, 211 00:12:00,796 --> 00:12:02,956 Speaker 2: the next generation of people on how to learn to 212 00:12:03,196 --> 00:12:07,516 Speaker 2: build models on voice AI, to help pharmaceutical companies develop 213 00:12:07,556 --> 00:12:11,676 Speaker 2: products, or learn even to develop products, right? And 214 00:12:11,716 --> 00:12:15,876 Speaker 2: the other really important thing is to teach people what 215 00:12:15,956 --> 00:12:18,716 Speaker 2: type of standards we need. Right now, across a lot 216 00:12:18,796 --> 00:12:21,916 Speaker 2: of different projects, there's really a lack of standards. People 217 00:12:21,996 --> 00:12:25,116 Speaker 2: collect voice in different ways. That's why it's really hard 218 00:12:25,116 --> 00:12:29,956 Speaker 2: to pull data together. So our dream was really to say, like, hey, 219 00:12:30,196 --> 00:12:33,956 Speaker 2: you want to do voice research? Here's a manual, my friend, right? 220 00:12:34,036 --> 00:12:36,236 Speaker 2: Like, here is how you collect the voice to make 221 00:12:36,276 --> 00:12:39,356 Speaker 2: it accurate. These are the protocols with the tasks that 222 00:12:39,396 --> 00:12:43,956 Speaker 2: we think, based on our studies, give the best biomarkers.
Right, 223 00:12:44,316 --> 00:12:46,556 Speaker 2: these are the type of biomarkers you can look for, 224 00:12:46,836 --> 00:12:50,116 Speaker 2: and this is the data you can train on, so really 225 00:12:50,156 --> 00:12:52,716 Speaker 2: create a manual of operations, also for people to be 226 00:12:52,756 --> 00:12:55,596 Speaker 2: able to make discoveries. And that's the goal: to have 227 00:12:56,476 --> 00:12:57,916 Speaker 2: the most impact on patient care. 228 00:12:58,116 --> 00:13:00,396 Speaker 1: So what are the biomarkers? What are you asking people 229 00:13:00,396 --> 00:13:01,396 Speaker 1: to do? What are you collecting? 230 00:13:02,356 --> 00:13:07,316 Speaker 2: So I separate things out. There are respiratory biomarkers, 231 00:13:08,356 --> 00:13:13,196 Speaker 2: voice biomarkers, speech biomarkers, and linguistic biomarkers, and they're 232 00:13:13,236 --> 00:13:16,476 Speaker 2: all different. So let's go over why these are different. 233 00:13:16,876 --> 00:13:20,076 Speaker 2: So respiratory is easy, right? So we ask people to breathe, 234 00:13:20,556 --> 00:13:23,916 Speaker 2: to cough, to take big breaths in, and that has 235 00:13:24,036 --> 00:13:27,236 Speaker 2: a lot of information on our pulmonary capacity, on how 236 00:13:27,276 --> 00:13:31,836 Speaker 2: our windpipe is shaped. Okay, that's respiratory. Then voice and 237 00:13:31,876 --> 00:13:34,236 Speaker 2: speech, what's the difference? So voice is really the sound 238 00:13:34,316 --> 00:13:38,156 Speaker 2: that we make when our vocal cords come together. So 239 00:13:38,436 --> 00:13:42,836 Speaker 2: when we say, like, birds can voice, but they can't speak. 240 00:13:43,636 --> 00:13:46,436 Speaker 2: If you have a bird that speaks, then you'll be very 241 00:13:46,316 --> 00:13:50,236 Speaker 1: rich, or you have a parrot.
242 00:13:52,676 --> 00:13:55,236 Speaker 2: So when we do voice tasks, we ask 243 00:13:55,356 --> 00:13:57,516 Speaker 2: patients to say "eee" or 244 00:13:57,516 --> 00:13:59,436 Speaker 3: "ahh." 245 00:13:59,516 --> 00:14:00,116 Speaker 1: I get the difference. 246 00:14:01,716 --> 00:14:06,636 Speaker 2: Birds. And voice biomarkers will be impacted when our voice 247 00:14:06,636 --> 00:14:11,436 Speaker 2: box is changed or our respiration is changed. Right? So 248 00:14:11,476 --> 00:14:14,916 Speaker 2: somebody with pneumonia probably cannot hold a note for very long. 249 00:14:15,036 --> 00:14:18,596 Speaker 2: So that's voice biomarkers. When we talk about speech biomarkers, 250 00:14:18,636 --> 00:14:22,476 Speaker 2: then you go into articulation. So some people, for example, 251 00:14:22,476 --> 00:14:25,796 Speaker 2: who have neurological deficits or their mouth is not working correctly, 252 00:14:25,796 --> 00:14:28,236 Speaker 2: they're going to have trouble articulating. They're going to have 253 00:14:28,236 --> 00:14:31,916 Speaker 2: trouble saying some words. So these are biomarkers we can extract. 254 00:14:32,196 --> 00:14:36,436 Speaker 2: And then lastly, there are linguistic biomarkers. So what type of 255 00:14:36,476 --> 00:14:41,156 Speaker 2: words are people using, what type of semantics, how fast 256 00:14:41,476 --> 00:14:44,076 Speaker 2: do they speak, for example? These are all different types 257 00:14:44,076 --> 00:14:48,316 Speaker 2: of biomarkers 258 00:14:46,596 --> 00:14:47,476 Speaker 3: that we can extract. 259 00:14:47,476 --> 00:14:49,636 Speaker 2: So to give you a very tangible example, I was 260 00:14:49,676 --> 00:14:53,316 Speaker 2: reading a paper from a group looking at biomarkers of depression, 261 00:14:54,836 --> 00:14:58,196 Speaker 2: and rate of speech was one of the important biomarkers 262 00:14:58,196 --> 00:15:01,236 Speaker 2: they found.
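[Editor's note: two of the biomarkers described above, how long a sustained vowel lasts (a voice biomarker) and rate of speech (a linguistic biomarker), can be illustrated as toy measurements. This is a minimal sketch under assumed definitions (an RMS-energy voicing threshold and whitespace word counting); the function names and thresholds are illustrative, not the consortium's actual pipeline.]

```python
import numpy as np

def phonation_time(signal: np.ndarray, sr: int, frame_ms: int = 25,
                   threshold: float = 0.01) -> float:
    """Seconds of a sustained vowel whose frame RMS energy exceeds a
    silence threshold. A shortened phonation time can reflect reduced
    pulmonary capacity (the pneumonia example above). Threshold and
    framing are illustrative assumptions."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float((rms > threshold).sum() * frame_len / sr)

def words_per_second(transcript: str, duration_s: float) -> float:
    """Crude rate-of-speech biomarker: whitespace-separated word count
    over clip duration. A lower rate was associated with depression in
    the study discussed below."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript.split()) / duration_s

# Synthetic "eee": 2 s of a 220 Hz tone followed by 1 s of silence at 16 kHz.
sr = 16_000
t = np.arange(2 * sr) / sr
audio = np.concatenate([0.3 * np.sin(2 * np.pi * 220 * t), np.zeros(sr)])
print(phonation_time(audio, sr))              # prints 2.0
print(words_per_second("uh I feel fine", 2))  # 4 words / 2 s -> prints 2.0
```

A real system would work from recorded audio and forced-aligned transcripts rather than synthetic tones, but the measurements reduce to numbers this simple.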
So people who are sad or depressed will 263 00:15:01,276 --> 00:15:06,956 Speaker 2: speak at a slower pace, so words per second is lower. 264 00:15:06,996 --> 00:15:08,676 Speaker 2: So that's simple when you think about it; it's a 265 00:15:08,676 --> 00:15:12,996 Speaker 2: simple biomarker, right? So that's to give a tangible example. 266 00:15:13,276 --> 00:15:14,956 Speaker 2: So, in terms of, I think I didn't answer your 267 00:15:15,036 --> 00:15:18,516 Speaker 2: question fully. So what are we asking patients? So we 268 00:15:18,636 --> 00:15:21,916 Speaker 2: ask people to do all these tasks, so coughing, breathing, 269 00:15:22,316 --> 00:15:27,516 Speaker 2: saying "ahh" and "eee." Then we make them read those validated passages, 270 00:15:27,596 --> 00:15:30,956 Speaker 2: and we also ask open questions. And then when we 271 00:15:31,036 --> 00:15:33,836 Speaker 2: ask open questions, we have to ask some questions that 272 00:15:34,116 --> 00:15:37,676 Speaker 2: make them emotional and some that don't make them emotional, 273 00:15:37,716 --> 00:15:40,236 Speaker 2: because if you trigger emotion, that causes a bias on 274 00:15:40,276 --> 00:15:41,516 Speaker 2: how your voice will sound. 275 00:15:42,676 --> 00:15:45,636 Speaker 1: What question do you ask to make people emotional? 276 00:15:46,036 --> 00:15:47,116 Speaker 3: So it's really interesting. 277 00:15:47,396 --> 00:15:50,676 Speaker 2: So at first we would ask, you know, our first 278 00:15:50,756 --> 00:15:53,436 Speaker 2: question was, you know, can you talk to me about 279 00:15:53,836 --> 00:15:56,276 Speaker 2: something that makes you sad? It could be somebody that 280 00:15:56,356 --> 00:15:58,796 Speaker 2: died in your family, or, you know. So that was 281 00:15:58,836 --> 00:16:03,756 Speaker 2: our prompt. And then our question without emotion was: tell 282 00:16:03,836 --> 00:16:06,356 Speaker 2: us about your disease. 283 00:16:07,036 --> 00:16:09,676 Speaker 1: Only a doctor
would think that's not an emotional question. 284 00:16:09,836 --> 00:16:10,316 Speaker 3: Exactly. 285 00:16:10,396 --> 00:16:12,196 Speaker 2: I mean, but it's like, when you think about it, 286 00:16:12,236 --> 00:16:14,916 Speaker 2: like, our consortium is like tons of experts that put 287 00:16:14,956 --> 00:16:16,516 Speaker 2: their minds together to develop 288 00:16:16,276 --> 00:16:19,516 Speaker 1: Tell me about having Parkinson's. That's the unemotional question we're 289 00:16:19,516 --> 00:16:19,956 Speaker 1: going to ask. 290 00:16:19,916 --> 00:16:22,196 Speaker 3: And then we, I mean, we ask, like, why are you here? 291 00:16:22,236 --> 00:16:23,796 Speaker 2: I think it was not that obvious, but it's like, 292 00:16:24,556 --> 00:16:27,116 Speaker 2: tell us about why you're here to see your doctor today. 293 00:16:27,316 --> 00:16:29,956 Speaker 2: And then, analyzing the data, because we do pilots, right, 294 00:16:30,036 --> 00:16:33,316 Speaker 2: we audit our data, we realized that people were starting 295 00:16:33,356 --> 00:16:36,236 Speaker 2: to tear up. Like, we had people crying while talking 296 00:16:36,276 --> 00:16:38,516 Speaker 2: about why they were coming to the doctor today, which is 297 00:16:38,516 --> 00:16:41,436 Speaker 1: supposed to be the example of unemotional. 298 00:16:40,836 --> 00:16:42,636 Speaker 3: Sure, correct. So we had to change that. 299 00:16:45,396 --> 00:16:50,556 Speaker 1: Yes, interesting. So okay, this is great. So you're getting 300 00:16:50,596 --> 00:16:55,836 Speaker 1: a lot of auditory information from every patient. What other 301 00:16:55,876 --> 00:16:58,356 Speaker 1: information are you getting from each person? So much. 302 00:16:58,676 --> 00:17:00,876 Speaker 2: So to give you an idea, our full protocol is 303 00:17:00,876 --> 00:17:01,516 Speaker 2: about one 304 00:17:01,396 --> 00:17:05,396 Speaker 1: hour. Okay, so with the patient, with the patient.
305 00:17:05,076 --> 00:17:08,236 Speaker 2: With the patient, it's an iPad. So everything is based 306 00:17:08,276 --> 00:17:10,756 Speaker 2: on an iPad, and there's a helper right now, a 307 00:17:10,756 --> 00:17:14,476 Speaker 2: research assistant. So we collect data. We collect very extensive 308 00:17:14,516 --> 00:17:18,396 Speaker 2: demographics in terms of, you know, age, race, geographical location. 309 00:17:19,356 --> 00:17:22,556 Speaker 2: We collect language. So what language do you speak? How 310 00:17:22,596 --> 00:17:25,636 Speaker 2: many languages do you speak? What languages do you write? 311 00:17:26,436 --> 00:17:28,196 Speaker 2: You know, what part of the world are you from? 312 00:17:28,356 --> 00:17:32,156 Speaker 2: That's really important. Then we collect about disabilities. Are you 313 00:17:32,236 --> 00:17:34,836 Speaker 2: hearing impaired? Are you visually impaired? Because that makes a 314 00:17:34,916 --> 00:17:40,076 Speaker 2: change in your voice. Your smoking status, your hydration status, 315 00:17:40,396 --> 00:17:43,796 Speaker 2: your fatigue status. So we kind of 316 00:17:43,836 --> 00:17:47,196 Speaker 2: thought about anything that could affect voice, right? Your socioeconomic 317 00:17:47,236 --> 00:17:50,996 Speaker 2: status, because if you think about it, that's going 318 00:17:51,036 --> 00:17:54,956 Speaker 2: to affect, you know, your linguistics as well. And then, 319 00:17:55,036 --> 00:17:59,916 Speaker 2: other than that extensive demographics, we collect confounders. So 320 00:17:59,956 --> 00:18:02,196 Speaker 2: we think about anything that could change your voice. Do 321 00:18:02,236 --> 00:18:05,556 Speaker 2: you have allergies? Do you have dental issues? 322 00:18:05,596 --> 00:18:09,636 Speaker 2: Do you wear braces? So everybody gets a basic test 323 00:18:09,756 --> 00:18:13,036 Speaker 2: about whether they are depressed.
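[Editor's note: the collection protocol described above, extensive demographics plus anything that could confound the voice, amounts to a structured record attached to each recording session. A hypothetical sketch of what one such record might look like; every field name here is an illustrative assumption, not the actual Bridge2AI-Voice schema.]

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SessionRecord:
    """Toy record linking one recording session to its metadata.

    Field names are invented for illustration only.
    """
    participant_id: str
    age: int
    languages_spoken: List[str]
    hearing_impaired: bool
    visually_impaired: bool
    smoking_status: str  # e.g. "never", "former", "current"
    # Anything that could change the voice: allergies, braces, dental issues...
    confounders: List[str] = field(default_factory=list)
    # Everyone gets the basic depression screen, whatever their primary disease.
    depression_screen_score: Optional[int] = None

rec = SessionRecord(
    participant_id="P0001",
    age=60,
    languages_spoken=["English", "Spanish"],
    hearing_impaired=False,
    visually_impaired=False,
    smoking_status="former",
    confounders=["seasonal allergies"],
)
print(rec.participant_id, len(rec.confounders))  # prints: P0001 1
```

The point of a fixed schema like this is the standardization she describes: if every site records the same fields the same way, the resulting voice data can actually be pooled.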
So no matter what disease 324 00:18:13,076 --> 00:18:15,716 Speaker 2: you have, you kind of get the basic tests for 325 00:18:15,796 --> 00:18:18,756 Speaker 2: all the other diseases, to measure whether it's possible that 326 00:18:18,836 --> 00:18:21,356 Speaker 2: you have concurrent diseases at the same time. 327 00:18:21,396 --> 00:18:24,716 Speaker 1: Because presumably, because people are in fact complex, and there 328 00:18:24,756 --> 00:18:27,676 Speaker 1: are many people who have depression and Parkinson's, and you 329 00:18:27,716 --> 00:18:29,716 Speaker 1: want to understand what's going on there. 330 00:18:30,116 --> 00:18:33,836 Speaker 2: I mean, most people are complex, right? It's really rare 331 00:18:33,916 --> 00:18:36,396 Speaker 2: to have, and people who go to the doctor are 332 00:18:36,436 --> 00:18:39,636 Speaker 2: not twenty years old and healthy. Right? Most of the 333 00:18:39,676 --> 00:18:42,596 Speaker 2: people who will use our technology or will benefit from 334 00:18:42,636 --> 00:18:45,716 Speaker 2: this database will be your typical sixty-year-old chronic 335 00:18:45,756 --> 00:18:48,196 Speaker 2: disease patient that comes into the doctor, and 336 00:18:48,316 --> 00:18:50,076 Speaker 2: they don't have a clean bill of health. 337 00:18:50,916 --> 00:18:53,556 Speaker 1: How many people do you want to have in the database? Like, 338 00:18:53,636 --> 00:18:55,436 Speaker 1: is there a final number you're going for? 339 00:18:55,876 --> 00:18:58,476 Speaker 2: So at the beginning, we were aiming for thirty thousand, 340 00:19:00,436 --> 00:19:03,796 Speaker 2: which is extremely ambitious, I think, to be fair. 341 00:19:03,836 --> 00:19:06,236 Speaker 2: I mean, if after four years we get to ten thousand, 342 00:19:06,276 --> 00:19:10,356 Speaker 2: I think it'll be a huge success. Okay. And you 343 00:19:10,356 --> 00:19:13,636 Speaker 2: know, the data collection.
I think what we're learning is 344 00:19:13,676 --> 00:19:17,556 Speaker 2: that data collection is very resource intensive. To have good 345 00:19:17,676 --> 00:19:20,196 Speaker 2: data is very resource intensive. 346 00:19:20,996 --> 00:19:25,436 Speaker 1: So what happened that made you realize that thirty thousand 347 00:19:25,636 --> 00:19:27,876 Speaker 1: was maybe harder than you thought? 348 00:19:28,796 --> 00:19:31,756 Speaker 2: So, I think we thought that we wanted to collect 349 00:19:31,796 --> 00:19:34,196 Speaker 2: as much data as possible, and our original plan was 350 00:19:34,236 --> 00:19:38,996 Speaker 2: to collect a lot of shorter protocols, you know, like shorter clips. 351 00:19:40,236 --> 00:19:43,516 Speaker 2: But as we started working with patients, we realized that 352 00:19:44,076 --> 00:19:48,076 Speaker 2: by getting more data from the same patients, we can 353 00:19:48,116 --> 00:19:51,636 Speaker 2: actually have a lot more information, and it provides a 354 00:19:51,716 --> 00:19:55,276 Speaker 2: lot of interesting, you know, biomarkers. So we're focusing more 355 00:19:55,316 --> 00:19:58,716 Speaker 2: on getting more data from a smaller number of patients, 356 00:19:59,156 --> 00:20:02,356 Speaker 2: and really the right data, 357 00:20:02,436 --> 00:20:04,636 Speaker 2: with a lot of clinical information attached to it. 358 00:20:08,036 --> 00:20:10,516 Speaker 1: After the break, what the world will look like in 359 00:20:10,556 --> 00:20:22,876 Speaker 1: a few years if everything goes well. So this is 360 00:20:22,916 --> 00:20:25,356 Speaker 1: a big project that Yael and her colleagues have 361 00:20:25,356 --> 00:20:27,556 Speaker 1: embarked on. It's a four year project. They're about a 362 00:20:27,676 --> 00:20:31,356 Speaker 1: year in, and there will be interim data releases along 363 00:20:31,356 --> 00:20:34,156 Speaker 1: the way.
So I asked her, how long will it 364 00:20:34,196 --> 00:20:37,036 Speaker 1: take for this project to advance the state of the 365 00:20:37,116 --> 00:20:39,396 Speaker 1: science in acoustic biomarkers. 366 00:20:39,876 --> 00:20:43,036 Speaker 2: Yeah, I would say to say at the end of 367 00:20:43,076 --> 00:20:46,956 Speaker 2: the four years would probably be the best answer. 368 00:20:47,036 --> 00:20:49,236 Speaker 2: I think at the end of the four years. But 369 00:20:49,316 --> 00:20:51,356 Speaker 2: I don't think that, you know, you can just say, oh, 370 00:20:51,356 --> 00:20:53,196 Speaker 2: we'll just start training models at the end of the 371 00:20:53,196 --> 00:20:55,396 Speaker 2: four years once we have all the data. Right, it's 372 00:20:55,396 --> 00:20:57,636 Speaker 2: not just about, you know, building one model that will 373 00:20:57,636 --> 00:21:02,036 Speaker 2: answer your question. It's about continuously training models to understand 374 00:21:02,196 --> 00:21:05,756 Speaker 2: which biomarkers to extract and then build products that work. 375 00:21:06,356 --> 00:21:13,756 Speaker 1: So, if things go well, what will this world 376 00:21:13,796 --> 00:21:16,596 Speaker 1: look like in, whatever, five years? 377 00:21:17,276 --> 00:21:21,156 Speaker 2: Yes. So, I mean, there's a few things that 378 00:21:21,196 --> 00:21:24,036 Speaker 2: this can help with, in general, voice biomarkers. Let's not 379 00:21:24,076 --> 00:21:28,596 Speaker 2: talk about just our project. Diagnosis is one thing, right, 380 00:21:28,676 --> 00:21:35,276 Speaker 2: early diagnosis, but that's probably the hardest thing. Screening 381 00:21:35,516 --> 00:21:38,996 Speaker 2: is almost more important.
So when we think about screening, 382 00:21:39,076 --> 00:21:41,596 Speaker 2: it means, let's say you live really far, you 383 00:21:41,596 --> 00:21:43,956 Speaker 2: don't have access to a doctor, but you have 384 00:21:43,956 --> 00:21:46,356 Speaker 2: an iPhone, and you can talk into the iPhone and 385 00:21:46,356 --> 00:21:49,076 Speaker 2: it can say, hey, something's wrong. You know, you need 386 00:21:49,236 --> 00:21:53,156 Speaker 2: a neurological specialist, for example. So to help screen and triage, 387 00:21:53,236 --> 00:21:55,956 Speaker 2: I think that's probably what we're looking at in the next 388 00:21:55,996 --> 00:22:00,476 Speaker 2: five years, something definitely possible. The other product that I 389 00:22:00,516 --> 00:22:03,596 Speaker 2: think will be very possible within five years is tracking 390 00:22:03,596 --> 00:22:07,196 Speaker 2: of diseases. If you want to monitor the evolution of 391 00:22:07,236 --> 00:22:11,876 Speaker 2: Parkinson's, or how people respond to drugs. That's why pharmaceutical 392 00:22:11,876 --> 00:22:13,196 Speaker 2: companies are very interested. 393 00:22:13,396 --> 00:22:16,716 Speaker 1: Right. So the acoustic biomarker is not just a binary 394 00:22:16,796 --> 00:22:19,676 Speaker 1: signal of disease, no disease. It can tell you a 395 00:22:19,716 --> 00:22:23,796 Speaker 1: lot about the status of disease. Is it getting better, 396 00:22:23,836 --> 00:22:24,556 Speaker 1: is it getting worse? 397 00:22:24,956 --> 00:22:27,996 Speaker 2: Evolution, especially if you train it on your own voice. Right, 398 00:22:28,476 --> 00:22:32,596 Speaker 2: it's even easier to detect changes in somebody's voice as 399 00:22:32,716 --> 00:22:36,676 Speaker 2: they progress, like your Siri, for example, or Alexa that 400 00:22:36,756 --> 00:22:39,716 Speaker 2: listens to your voice. So that's going to be 401 00:22:39,716 --> 00:22:42,396 Speaker 2: a really good tool for pharmaceutical companies.
That's why they're 402 00:22:42,396 --> 00:22:44,716 Speaker 2: investing in it, right, to see how you respond to 403 00:22:44,756 --> 00:22:47,756 Speaker 2: a drug, how you respond to a treatment. And when 404 00:22:47,796 --> 00:22:51,396 Speaker 2: you think about telehealth at home, right, more and 405 00:22:51,476 --> 00:22:55,796 Speaker 2: more we're going to talk about remotely monitoring people. There 406 00:22:55,796 --> 00:22:57,676 Speaker 2: are just too many people on this earth to all 407 00:22:57,716 --> 00:22:59,236 Speaker 2: be in hospitals when we're sick. 408 00:22:59,876 --> 00:23:02,356 Speaker 1: Well, and if you can stay out of the hospital 409 00:23:02,356 --> 00:23:04,636 Speaker 1: when you're sick, that's better, right? You don't want to 410 00:23:04,636 --> 00:23:06,956 Speaker 1: go to the hospital unless you have to. Yeah, or. 411 00:23:07,756 --> 00:23:10,836 Speaker 2: Or, you know, your Alexa detects when your voice 412 00:23:10,836 --> 00:23:14,036 Speaker 2: starts deteriorating and sends you a nurse before you 413 00:23:14,076 --> 00:23:15,236 Speaker 2: need to go to the hospital. 414 00:23:15,716 --> 00:23:19,396 Speaker 1: So there's a more general version of that one, right, 415 00:23:19,476 --> 00:23:22,716 Speaker 1: that you could imagine, which is you get your, whatever, 416 00:23:22,836 --> 00:23:25,716 Speaker 1: your iPhone, your Android phone, and you have a choice 417 00:23:25,716 --> 00:23:27,956 Speaker 1: when you're setting up your phone, like do you want 418 00:23:27,956 --> 00:23:32,596 Speaker 1: to opt in to the phone listening and telling 419 00:23:32,636 --> 00:23:34,676 Speaker 1: you if you need to go talk to your doctor, right, 420 00:23:34,836 --> 00:23:39,156 Speaker 1: just like a very broad based thing that you could 421 00:23:39,196 --> 00:23:43,236 Speaker 1: opt into. Like, I would probably opt into that.
I mean, 422 00:23:43,316 --> 00:23:44,916 Speaker 1: is that a thing that you think about? 423 00:23:45,276 --> 00:23:48,076 Speaker 2: So, I mean, yes, I'm sure that, you know, Apple 424 00:23:48,196 --> 00:23:49,636 Speaker 2: is working on that already. 425 00:23:49,756 --> 00:23:52,196 Speaker 3: They are, you know. 426 00:23:52,516 --> 00:23:55,996 Speaker 2: The question is, there has to be technology that's being 427 00:23:56,036 --> 00:23:59,916 Speaker 2: developed as well to ensure privacy of not only you, 428 00:24:00,396 --> 00:24:03,676 Speaker 2: but your environment. Right, because when it's your phone, then 429 00:24:03,676 --> 00:24:04,956 Speaker 2: it's your environment as well. 430 00:24:05,476 --> 00:24:07,916 Speaker 1: So you brought up privacy in that context. We can 431 00:24:08,276 --> 00:24:10,876 Speaker 1: knock out privacy in the context of the database 432 00:24:10,916 --> 00:24:17,436 Speaker 1: as well. The question of how it could go wrong, building a 433 00:24:17,516 --> 00:24:23,196 Speaker 1: database of thousands of people's voices with tons of data 434 00:24:23,236 --> 00:24:25,916 Speaker 1: about them, sort of answers itself. 435 00:24:26,156 --> 00:24:28,516 Speaker 2: Yeah, it can go wrong in many ways. And I 436 00:24:28,796 --> 00:24:31,156 Speaker 2: just came out of like two hours of meetings on this. 437 00:24:31,276 --> 00:24:33,956 Speaker 2: So at the Bridge2AI program, we have 438 00:24:34,036 --> 00:24:37,956 Speaker 2: a huge group of bioethicists, and one of our big 439 00:24:38,036 --> 00:24:41,756 Speaker 2: aims as a group is really to ensure patient privacy 440 00:24:41,796 --> 00:24:44,356 Speaker 2: and to answer these questions of how do we protect 441 00:24:44,396 --> 00:24:47,276 Speaker 2: patient privacy in the context of open data. Right, so, 442 00:24:47,676 --> 00:24:50,316 Speaker 2: you are absolutely right, tons of things can go wrong.
443 00:24:50,876 --> 00:24:55,996 Speaker 2: People can potentially be reidentified through their voice. So one 444 00:24:56,036 --> 00:24:59,236 Speaker 2: of our biggest goals this year is to determine what part 445 00:24:59,276 --> 00:25:02,916 Speaker 2: of the voice is identifiable and which part is not. Okay, 446 00:25:03,316 --> 00:25:05,596 Speaker 2: and all of this is based on the HIPAA law. 447 00:25:05,796 --> 00:25:08,036 Speaker 2: The HIPAA law is from the nineteen nineties. 448 00:25:08,316 --> 00:25:13,796 Speaker 1: HIPAA, the law that governs sharing and security of people's medical information. 449 00:25:13,676 --> 00:25:17,116 Speaker 2: Correct. Protected health information, PHI, we call that. And that 450 00:25:17,196 --> 00:25:20,796 Speaker 2: law was made in the nineteen nineties, right, and back then 451 00:25:20,876 --> 00:25:23,276 Speaker 2: they listed a list of things of what they called 452 00:25:23,396 --> 00:25:27,956 Speaker 2: PHI, or identifiers, that cannot be shared openly and that 453 00:25:27,956 --> 00:25:31,996 Speaker 2: should stay in the hospital. And voice prints are listed. 454 00:25:32,756 --> 00:25:35,116 Speaker 2: When you go into what the definition of a voice 455 00:25:35,116 --> 00:25:39,876 Speaker 2: print is, it's very nebulous. It's, you know, we don't know. 456 00:25:39,956 --> 00:25:42,036 Speaker 2: So because of that nebubularity. 457 00:25:42,116 --> 00:25:45,596 Speaker 1: Is that a word? Or is it nebulosity? I'm afraid 458 00:25:45,596 --> 00:25:46,156 Speaker 1: I don't know. 459 00:25:49,196 --> 00:25:52,676 Speaker 2: Because it's so nebulous, a lot of institutions, a lot 460 00:25:52,676 --> 00:25:56,316 Speaker 2: of hospitals will say, well, you know, voice 461 00:25:56,996 --> 00:25:59,956 Speaker 2: is not an identifier as long as you don't say, hi, 462 00:26:00,076 --> 00:26:02,916 Speaker 2: I'm John Doe and I live at four twenty five, 463 00:26:02,956 --> 00:26:06,036 Speaker 2: blah blah blah.
Other universities will say, no, no, no, 464 00:26:06,156 --> 00:26:09,436 Speaker 2: voice is always an identifier. You can never really release 465 00:26:09,516 --> 00:26:12,876 Speaker 2: voice data. So what our group is doing right now 466 00:26:13,036 --> 00:26:16,276 Speaker 2: is really looking at why the HIPAA law says this, 467 00:26:16,516 --> 00:26:20,996 Speaker 2: what are the actual legal implications of sharing voice? And 468 00:26:21,556 --> 00:26:24,236 Speaker 2: we always grade it in terms of risk. Right, if 469 00:26:24,316 --> 00:26:27,116 Speaker 2: I talk about all the things that we collect, you 470 00:26:27,156 --> 00:26:30,836 Speaker 2: can think that the respiratory sounds are probably very safe 471 00:26:30,836 --> 00:26:36,236 Speaker 2: to share, versus a speech sample. What we call free speech 472 00:26:37,316 --> 00:26:41,196 Speaker 2: is probably the most identifying, if you have to grade it, right. 473 00:26:41,756 --> 00:26:45,436 Speaker 2: And we're kind of looking at, well, where is the balance? 474 00:26:45,516 --> 00:26:49,156 Speaker 2: How much can we release? And also we can transform 475 00:26:49,236 --> 00:26:51,796 Speaker 2: the data. So for example, we can change the 476 00:26:52,156 --> 00:26:58,036 Speaker 2: audio data into what we call visual spectrograms. 477 00:26:56,156 --> 00:26:57,036 Speaker 1: Like a waveform. 478 00:26:58,076 --> 00:27:01,476 Speaker 2: Yeah, it's a sort of waveform that machine learning can use. 479 00:27:02,236 --> 00:27:08,076 Speaker 2: We can extract acoustic features, right, like loudness, frequency, stuff like 480 00:27:08,036 --> 00:27:10,716 Speaker 1: that. And basically you're trying to figure out how to make 481 00:27:10,756 --> 00:27:15,876 Speaker 1: a person not identifiable based on their voice without 482 00:27:16,036 --> 00:27:19,636 Speaker 1: messing up the database.
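[Editor's note: the spectrogram transformation and the acoustic features mentioned above, loudness and frequency, can be sketched in a few lines. This is a minimal illustration in Python with numpy, not the Bridge2AI pipeline; the function names and parameters are illustrative assumptions.]

```python
import numpy as np

def spectrogram(signal, frame=1024, hop=512):
    """Magnitude spectrogram: the 'visual' form of audio that ML models consume."""
    window = np.hanning(frame)
    frames = np.array([signal[i:i + frame] * window
                       for i in range(0, len(signal) - frame, hop)])
    # One FFT per windowed frame; shape is (n_frames, frame // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def acoustic_features(signal, sr, frame=1024):
    """Two simple features of the kind mentioned: a loudness proxy and dominant frequency."""
    spec = spectrogram(signal, frame=frame)
    rms = float(np.sqrt(np.mean(signal ** 2)))             # loudness proxy (RMS energy)
    freqs = np.fft.rfftfreq(frame, 1 / sr)
    dominant = float(freqs[np.argmax(spec.mean(axis=0))])  # strongest average frequency
    return rms, dominant

# A synthetic 440 Hz tone stands in for a voice recording.
sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
rms, dominant = acoustic_features(tone, sr)
```

[Sharing only such derived features or spectrograms, rather than raw audio, is one way a dataset can reduce how identifiable a speaker is, which is the trade-off discussed next.]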
Like that's the balance, right? Like 483 00:27:19,716 --> 00:27:22,636 Speaker 1: if you monkey with their voice too much, then you 484 00:27:22,676 --> 00:27:24,916 Speaker 1: monkey with the data, the database that we care the 485 00:27:24,956 --> 00:27:27,436 Speaker 1: most about. Like that seems like a hard trade off. 486 00:27:28,236 --> 00:27:31,076 Speaker 1: So if we go farther out into the future, you 487 00:27:31,156 --> 00:27:33,996 Speaker 1: solve all these problems, you build your giant database, the 488 00:27:34,076 --> 00:27:36,956 Speaker 1: models get really good. All of these things seem like 489 00:27:36,996 --> 00:27:42,876 Speaker 1: things that may well happen. I'm curious about, you know, 490 00:27:43,036 --> 00:27:48,596 Speaker 1: AI doing some chunk of what you do now. Right, 491 00:27:48,636 --> 00:27:51,276 Speaker 1: we see this happening, say, in radiology already. AI is 492 00:27:51,316 --> 00:27:54,596 Speaker 1: clearly very good at doing some of the technical work 493 00:27:54,636 --> 00:27:58,836 Speaker 1: that radiologists do in diagnosing scans of patients. Right, how 494 00:27:58,836 --> 00:28:02,516 Speaker 1: do you think about the future of AI, you know, 495 00:28:02,676 --> 00:28:06,876 Speaker 1: using acoustic biomarkers to make diagnoses in a way that 496 00:28:06,956 --> 00:28:09,436 Speaker 1: is similar to what you do now as a human being. 497 00:28:10,156 --> 00:28:11,836 Speaker 2: Yeah. So, I mean, I don't think I'm going 498 00:28:11,916 --> 00:28:14,156 Speaker 2: to lose my job yet, because I would say that 499 00:28:14,316 --> 00:28:18,836 Speaker 2: my primary goal as a doctor is not necessarily to 500 00:28:19,156 --> 00:28:22,356 Speaker 2: do that, right? Like, my primary goal is, yes, to diagnose, 501 00:28:22,396 --> 00:28:24,436 Speaker 2: but it's to treat patients. So for now, AI is 502 00:28:24,476 --> 00:28:26,676 Speaker 2: not going to treat the patient.
So I think what 503 00:28:26,716 --> 00:28:28,836 Speaker 2: it's going to do is it's going to support a 504 00:28:28,876 --> 00:28:35,036 Speaker 2: lot of the workforce. So for example, I'm an academic laryngologist. 505 00:28:35,116 --> 00:28:36,836 Speaker 2: I'm a super specialist. 506 00:28:37,116 --> 00:28:37,276 Speaker 3: Right. 507 00:28:37,716 --> 00:28:39,876 Speaker 2: For people to get to me, they often see like 508 00:28:39,916 --> 00:28:40,996 Speaker 2: five different doctors. 509 00:28:41,276 --> 00:28:43,956 Speaker 1: So instead of going through four specialists who can't figure 510 00:28:43,956 --> 00:28:45,876 Speaker 1: it out, you go to your primary care doctor, or 511 00:28:45,916 --> 00:28:49,036 Speaker 1: even you just talk to your phone, and your phone 512 00:28:49,076 --> 00:28:50,876 Speaker 1: says you better talk to your primary care doctor, and 513 00:28:50,876 --> 00:28:54,356 Speaker 1: your primary care doctor sends the patient directly. 514 00:28:54,076 --> 00:28:57,756 Speaker 2: To me, correct, correct, right, to say like, hey. Because again, 515 00:28:58,196 --> 00:29:00,276 Speaker 2: most of what we do, for a very long time, 516 00:29:00,356 --> 00:29:04,916 Speaker 2: will need a gold standard diagnosis, right. So often, 517 00:29:05,116 --> 00:29:08,196 Speaker 2: you know, it's a biopsy, or it's 518 00:29:08,276 --> 00:29:09,876 Speaker 2: imaging. 519 00:29:09,996 --> 00:29:11,116 Speaker 3: You need a gold standard. 520 00:29:11,276 --> 00:29:16,316 Speaker 1: The acoustic biomarker is not a clear enough diagnostic technique. 521 00:29:16,356 --> 00:29:18,236 Speaker 1: You need something more reliable. 522 00:29:18,716 --> 00:29:20,436 Speaker 2: So I don't think it's going to, you know, no 523 00:29:20,556 --> 00:29:22,676 Speaker 2: doctor will say, oh, well, based on this, this is 524 00:29:22,716 --> 00:29:26,716 Speaker 2: your diagnosis, start chemotherapy.
That's not where we're going. I 525 00:29:26,756 --> 00:29:32,236 Speaker 2: wouldn't take chemotherapy based on an acoustic biomarker. But it's 526 00:29:32,316 --> 00:29:35,676 Speaker 2: hopefully going to support a lot of primary care and 527 00:29:35,756 --> 00:29:38,636 Speaker 2: access to care, to get to the right person faster. 528 00:29:39,916 --> 00:29:43,196 Speaker 1: Great. Anything else we should talk about? 529 00:29:44,156 --> 00:29:46,396 Speaker 2: The one thing we didn't talk about, I guess, I 530 00:29:46,436 --> 00:29:48,116 Speaker 2: talk about this all day, so sometimes it's hard to 531 00:29:48,356 --> 00:29:52,636 Speaker 2: remember what I've said and what I haven't said. But the 532 00:29:52,716 --> 00:29:57,436 Speaker 2: implications of probably all this new telehealth, you know, online 533 00:29:57,756 --> 00:30:01,196 Speaker 2: world that we live in: a lot of industries are 534 00:30:01,236 --> 00:30:07,116 Speaker 2: already integrating tools. So, for example, Canary Speech is a 535 00:30:07,156 --> 00:30:09,876 Speaker 2: startup that sold a product. I think they're working with 536 00:30:09,956 --> 00:30:15,756 Speaker 2: Teams to capture if there are signs in your voice of depression. 537 00:30:16,356 --> 00:30:19,916 Speaker 1: Teams meaning Microsoft Teams, like Microsoft's version of Zoom. 538 00:30:20,076 --> 00:30:22,076 Speaker 2: Yeah, yeah. So I think, and don't quote me on 539 00:30:22,116 --> 00:30:24,316 Speaker 2: the particulars, maybe, you know, I'm not getting it exactly right, 540 00:30:24,316 --> 00:30:27,236 Speaker 2: but I know there's a few startups that are starting 541 00:30:27,236 --> 00:30:31,156 Speaker 2: to integrate products in Zoom or in Teams to let 542 00:30:31,316 --> 00:30:34,396 Speaker 2: employers know that, hey, your employee is not doing well 543 00:30:34,436 --> 00:30:36,636 Speaker 2: based on his voice, for example. Right, and.
544 00:30:36,636 --> 00:30:40,956 Speaker 1: What is your view of the efficacy of those? 545 00:30:41,836 --> 00:30:45,236 Speaker 2: So, I mean, the quick answer 546 00:30:45,396 --> 00:30:50,996 Speaker 2: is it probably works partially. Yeah. But the question is 547 00:30:50,996 --> 00:30:53,156 Speaker 2: not if it works fully or not. The question is, 548 00:30:53,196 --> 00:30:57,196 Speaker 2: does it make a difference? Right, so let's say. 549 00:30:57,556 --> 00:30:59,436 Speaker 1: Does it do what they say it does is a question that 550 00:30:59,476 --> 00:31:03,196 Speaker 1: matters to me, right? Like, are the claims valid? 551 00:31:03,516 --> 00:31:05,396 Speaker 1: Seems like a reasonable starting point. Yeah. 552 00:31:05,236 --> 00:31:05,876 Speaker 3: I think so. 553 00:31:05,876 --> 00:31:08,276 Speaker 2: So, I just reviewed an article from 554 00:31:08,316 --> 00:31:11,396 Speaker 2: one of the startups that's looking at, like, depression, 555 00:31:11,476 --> 00:31:14,396 Speaker 2: and I mean, their numbers look great. I do think 556 00:31:14,396 --> 00:31:16,796 Speaker 2: that the results that a lot of these projects 557 00:31:16,796 --> 00:31:19,956 Speaker 2: are getting are definitely positive and promising. 558 00:31:19,996 --> 00:31:25,996 Speaker 1: Absolutely. We'll be back in a minute with the lightning round. 559 00:31:28,436 --> 00:31:29,636 Speaker 3: Mm-hm. 560 00:31:37,636 --> 00:31:41,676 Speaker 1: Now, as promised, we're back with the lightning round. What 561 00:31:41,756 --> 00:31:42,636 Speaker 1: was your band called? 562 00:31:43,356 --> 00:31:48,316 Speaker 2: Ha. My stage name was Ella Bence, Ella Bence, 563 00:31:48,596 --> 00:31:50,636 Speaker 2: because my last name is Bensusan, so that was 564 00:31:50,676 --> 00:31:51,276 Speaker 2: too long. 565 00:31:51,876 --> 00:31:53,516 Speaker 1: And Yael became Ella.
566 00:31:53,956 --> 00:31:55,756 Speaker 3: Yeah, I could say my first name. 567 00:31:57,276 --> 00:32:00,876 Speaker 1: So you had a hit song in French? What 568 00:32:00,956 --> 00:32:01,476 Speaker 1: was it called? 569 00:32:04,196 --> 00:32:05,916 Speaker 2: I wouldn't call it a hit song. It was called 570 00:32:05,916 --> 00:32:10,036 Speaker 2: Aller simple, which means, I guess in English, it's like a 571 00:32:10,116 --> 00:32:10,916 Speaker 2: one-way flight. 572 00:32:12,236 --> 00:32:13,276 Speaker 1: Can you sing a line? 573 00:32:13,516 --> 00:32:17,156 Speaker 3: No, that's my previous life. 574 00:32:17,756 --> 00:32:22,836 Speaker 1: Can you just say a line? Yeah? 575 00:32:22,956 --> 00:32:25,876 Speaker 3: It's in French, though. I know it'll 576 00:32:25,916 --> 00:32:29,676 Speaker 3: sound great. Yeah. Aller simple. 577 00:32:33,036 --> 00:32:35,916 Speaker 2: Means get me a one-way flight to the other 578 00:32:36,036 --> 00:32:38,636 Speaker 2: side of the world. I hope people are really happy there. 579 00:32:39,276 --> 00:32:42,436 Speaker 1: Well, you're here now, you're in Tampa now. 580 00:32:42,476 --> 00:32:43,836 Speaker 1: Did it work out as hoped? 581 00:32:44,116 --> 00:32:46,796 Speaker 2: Oh, yeah. I mean, I have the best 582 00:32:46,836 --> 00:32:49,476 Speaker 2: job in the world, you know. My 583 00:32:49,556 --> 00:32:53,316 Speaker 2: mom raised us, me and my brother, saying, you guys 584 00:32:53,356 --> 00:32:56,956 Speaker 2: need two jobs, one that makes money and 585 00:32:56,996 --> 00:33:00,276 Speaker 2: the other one that makes you really happy. And if 586 00:33:00,316 --> 00:33:02,996 Speaker 2: you manage to have both in one job, then you'll 587 00:33:02,996 --> 00:33:04,996 Speaker 2: have made it, you know.
And I get to be 588 00:33:05,116 --> 00:33:10,596 Speaker 2: a surgeon and work with voice and voice professionals, and, 589 00:33:10,636 --> 00:33:13,556 Speaker 2: you know, it's been my passion pretty much all my life. 590 00:33:13,556 --> 00:33:14,876 Speaker 3: So yeah. 591 00:33:14,876 --> 00:33:17,916 Speaker 1: Do you work with professional singers as a physician? 592 00:33:18,316 --> 00:33:22,276 Speaker 2: Absolutely. I mean, I love treating my professional singers. So yeah, 593 00:33:22,316 --> 00:33:23,116 Speaker 2: I love that part 594 00:33:22,916 --> 00:33:23,396 Speaker 3: of my job. 595 00:33:23,836 --> 00:33:26,076 Speaker 1: Have you treated anybody famous? 596 00:33:26,156 --> 00:33:26,836 Speaker 3: Yes, but I can't tell. 597 00:33:28,716 --> 00:33:30,116 Speaker 1: What's Taylor Swift really like? 598 00:33:30,476 --> 00:33:32,276 Speaker 3: Oh, that I don't know. I wish. No. 599 00:33:32,676 --> 00:33:35,116 Speaker 2: She sounds fine, though. She probably doesn't need a laryngologist. 600 00:33:35,756 --> 00:33:37,516 Speaker 1: What's the best cure for a sore throat? 601 00:33:39,196 --> 00:33:43,356 Speaker 2: Voice rest and Advil. It takes the inflammation away. 602 00:33:43,956 --> 00:33:48,676 Speaker 1: Uh huh. Advil, just ibuprofen, and don't talk, voice rest. 603 00:33:48,836 --> 00:33:49,036 Speaker 3: Yes. 604 00:33:49,396 --> 00:33:55,196 Speaker 1: Okay. Are you just always involuntarily diagnosing people based on 605 00:33:55,236 --> 00:33:57,076 Speaker 1: their voice all the time? 606 00:33:57,156 --> 00:33:58,836 Speaker 3: So, funny fun fact. 607 00:33:58,916 --> 00:34:03,356 Speaker 2: Two months ago, my girlfriend from residency called me. I 608 00:34:03,356 --> 00:34:05,036 Speaker 2: hadn't spoken to her in like a year and a 609 00:34:05,076 --> 00:34:07,916 Speaker 2: half, and she called me and she said hi, and 610 00:34:07,956 --> 00:34:09,076 Speaker 2: I said, you're pregnant?
611 00:34:09,996 --> 00:34:10,436 Speaker 1: Really? 612 00:34:11,676 --> 00:34:15,276 Speaker 2: And I could hear it, because pregnancy gives you like this, 613 00:34:15,676 --> 00:34:18,076 Speaker 2: you know, you get stuffy in a certain way in 614 00:34:18,116 --> 00:34:20,876 Speaker 2: your nose. We call it rhinitis of pregnancy. 615 00:34:21,276 --> 00:34:22,676 Speaker 3: And I knew her voice very well. 616 00:34:22,716 --> 00:34:24,956 Speaker 2: She was my girlfriend for a long time, you know, 617 00:34:24,996 --> 00:34:28,356 Speaker 2: we studied together, and I just knew it, you know. 618 00:34:28,556 --> 00:34:30,796 Speaker 2: And I think she says, hey, how are you, 619 00:34:30,836 --> 00:34:32,556 Speaker 2: I wanted to talk to you, and I'm like, you're pregnant, 620 00:34:32,596 --> 00:34:33,276 Speaker 2: and she's like, how 621 00:34:33,156 --> 00:34:35,196 Speaker 3: did you know? 622 00:34:35,356 --> 00:34:37,356 Speaker 1: Yes, that's amazing. 623 00:34:37,596 --> 00:34:40,756 Speaker 2: I mean, I was listening to the political debates, you know, 624 00:34:41,276 --> 00:34:43,436 Speaker 2: and I'm like, ooh, this guy needs a laryngologist. 625 00:34:43,436 --> 00:34:45,196 Speaker 2: I'm diagnosing people all the time. 626 00:34:46,076 --> 00:34:52,036 Speaker 1: Well, they should give you a call. Yeah, absolutely. Okay, 627 00:34:53,076 --> 00:34:54,076 Speaker 1: lovely to talk with you. 628 00:34:54,996 --> 00:34:58,956 Speaker 2: It was one of the funnest interviews I've done. 629 00:35:00,156 --> 00:35:02,836 Speaker 1: Yael Bensusan runs the Health Voice Center at the 630 00:35:02,956 --> 00:35:06,796 Speaker 1: University of South Florida. She's also a principal investigator on 631 00:35:06,836 --> 00:35:11,076 Speaker 1: the Bridge2AI Voice Project.
Today's show was produced 632 00:35:11,076 --> 00:35:14,116 Speaker 1: by Gabriel Hunter Chang, edited by Lydia Jeane Kott, and 633 00:35:14,356 --> 00:35:17,716 Speaker 1: engineered by Sarah Bruguer. Just a quick note: we're going 634 00:35:17,756 --> 00:35:20,036 Speaker 1: to be taking a break for the next couple of weeks, 635 00:35:20,476 --> 00:35:23,276 Speaker 1: but we will have an episode in our feed next 636 00:35:23,276 --> 00:35:26,356 Speaker 1: week from our colleagues over at the Happiness Lab that 637 00:35:26,556 --> 00:35:31,156 Speaker 1: is timed, not coincidentally, to World Happiness Day, which I'm 638 00:35:31,196 --> 00:35:34,836 Speaker 1: informed is on March twentieth. I'm Jacob Goldstein, and we'll 639 00:35:34,836 --> 00:35:48,276 Speaker 1: be back soon with more episodes of What's Your Problem.