1
00:00:04,240 --> 00:00:07,240
Speaker 1: Welcome to Tech Stuff, a production of iHeartRadio's

2
00:00:07,320 --> 00:00:13,880
Speaker 1: How Stuff Works. Hey there, and welcome to Tech Stuff.

3
00:00:13,880 --> 00:00:17,400
Speaker 1: I'm your host, Jonathan Strickland. I'm an executive producer with

4
00:00:17,600 --> 00:00:19,560
Speaker 1: iHeartRadio and How Stuff Works, and I love

5
00:00:19,600 --> 00:00:24,200
Speaker 1: all things tech. And I'm sitting in the audience of

6
00:00:24,239 --> 00:00:28,240
Speaker 1: a local theater, like stage theater, not long ago. I'm

7
00:00:28,240 --> 00:00:31,440
Speaker 1: waiting for the show to start, and there's a song

8
00:00:31,720 --> 00:00:34,440
Speaker 1: that's playing over the sound system, and I'm really kind

9
00:00:34,440 --> 00:00:37,479
Speaker 1: of digging the song, but I totally don't recognize it.

10
00:00:38,040 --> 00:00:40,920
Speaker 1: And I glanced down at my phone and I see

11
00:00:41,240 --> 00:00:44,320
Speaker 1: that on the phone below the time on the locked

12
00:00:44,400 --> 00:00:48,600
Speaker 1: phone screen, it says that the song is Danger! High

13
00:00:48,680 --> 00:00:52,280
Speaker 1: Voltage by Electric Six. Now this is obviously a hypothetical

14
00:00:52,280 --> 00:00:54,680
Speaker 1: example because I would recognize that song anywhere, but you

15
00:00:54,720 --> 00:00:57,959
Speaker 1: get the point. Anyway, I'm thinking, that's so cool. My

16
00:00:58,000 --> 00:01:01,640
Speaker 1: phone knows what songs are playing around me. That's so neat.

17
00:01:02,360 --> 00:01:05,000
Speaker 1: I didn't even have to tell it to do anything. And

18
00:01:05,040 --> 00:01:07,760
Speaker 1: then a couple of hours later, as I think back

19
00:01:07,800 --> 00:01:11,560
Speaker 1: on this moment, uncertainty and dread start to seep in.

20
00:01:11,680 --> 00:01:15,240
Speaker 1: Wait a minute, if my phone can identify a song

21
00:01:15,440 --> 00:01:18,400
Speaker 1: that's playing around me, that means my phone is actually

22
00:01:18,440 --> 00:01:21,319
Speaker 1: listening to stuff. It wouldn't be able to tell me

23
00:01:21,680 --> 00:01:23,920
Speaker 1: the song title otherwise. It has to be able to

24
00:01:23,959 --> 00:01:26,959
Speaker 1: pick up the audio. I didn't activate any app. I

25
00:01:26,959 --> 00:01:30,880
Speaker 1: didn't turn on Shazam or ask my phone or anything.

26
00:01:30,920 --> 00:01:33,560
Speaker 1: My phone did it by itself. So my phone is

27
00:01:33,600 --> 00:01:36,800
Speaker 1: detecting the sounds around it even when it's not in

28
00:01:36,920 --> 00:01:41,280
Speaker 1: an active mode. Now, on a similar note, I'm sure

29
00:01:41,440 --> 00:01:45,640
Speaker 1: we all have had these personal assistant experiences out there.

30
00:01:45,680 --> 00:01:48,520
Speaker 1: Whether we use one ourselves, we've been around when someone

31
00:01:48,520 --> 00:01:52,880
Speaker 1: else uses them, things like Google Assistant or Alexa or

32
00:01:52,920 --> 00:01:56,120
Speaker 1: Siri or Cortana. There's more of them out there. You

33
00:01:56,160 --> 00:01:59,200
Speaker 1: can activate these assistants with a specific word or phrase,

34
00:01:59,560 --> 00:02:01,640
Speaker 1: and then you speak to them to carry out some

35
00:02:01,680 --> 00:02:04,560
Speaker 1: sort of task or to get you some sort of

36
00:02:04,560 --> 00:02:07,400
Speaker 1: information or something along those lines. We've got a Google

37
00:02:07,440 --> 00:02:10,200
Speaker 1: Home device in our house, so we might use it

38
00:02:10,240 --> 00:02:13,480
Speaker 1: to get a quick rundown on the weather report. We

39
00:02:13,560 --> 00:02:15,360
Speaker 1: might ask it to play a track off an album

40
00:02:15,360 --> 00:02:19,000
Speaker 1: by the jazz fusion band Weather Report. But wait, that

41
00:02:19,080 --> 00:02:22,120
Speaker 1: means that device is listening too. We didn't have to

42
00:02:22,120 --> 00:02:24,280
Speaker 1: take any physical action. We didn't have to push a

43
00:02:24,320 --> 00:02:27,560
Speaker 1: button to make it work. We just spoke the keyword

44
00:02:27,720 --> 00:02:31,160
Speaker 1: or a key phrase, and off it goes. And then

45
00:02:31,200 --> 00:02:34,760
Speaker 1: we get into stuff that seems super creepy. And I'm

46
00:02:34,800 --> 00:02:37,240
Speaker 1: sure most of you have had some sort of experience

47
00:02:37,280 --> 00:02:40,840
Speaker 1: like this. Say you're chatting with friends, maybe you're at

48
00:02:40,880 --> 00:02:44,400
Speaker 1: a restaurant or you're just hanging out, and you're talking

49
00:02:44,440 --> 00:02:47,480
Speaker 1: about this new snack food you just heard about, and

50
00:02:47,520 --> 00:02:50,519
Speaker 1: this is just one part of a conversation that rambles

51
00:02:50,560 --> 00:02:55,200
Speaker 1: all over the place. But then you talk a little

52
00:02:55,200 --> 00:02:56,840
Speaker 1: bit about the snack food for a couple of minutes.

53
00:02:56,840 --> 00:02:58,760
Speaker 1: You're like, you've heard about it, you wanted to try it,

54
00:02:58,880 --> 00:03:01,080
Speaker 1: you haven't tried it yet. Later on, you pop on

55
00:03:01,120 --> 00:03:03,079
Speaker 1: over to Facebook, and as you're scrolling through your feed,

56
00:03:03,160 --> 00:03:06,440
Speaker 1: there it is. There's an ad for the very same

57
00:03:06,480 --> 00:03:09,480
Speaker 1: snack food you mentioned to your friends just a little

58
00:03:09,480 --> 00:03:13,240
Speaker 1: earlier that day. You've never purchased the snack as far

59
00:03:13,280 --> 00:03:15,520
Speaker 1: as you remember, you haven't even searched for it on

60
00:03:15,560 --> 00:03:19,240
Speaker 1: the web, and there's the ad. So is Facebook listening

61
00:03:19,280 --> 00:03:22,200
Speaker 1: in on your conversation in an effort to serve up

62
00:03:22,240 --> 00:03:26,680
Speaker 1: a laser-focused targeted ad? In this episode, we're gonna

63
00:03:26,680 --> 00:03:29,840
Speaker 1: take a look at the technology that allows our devices

64
00:03:29,880 --> 00:03:33,320
Speaker 1: to listen in on us, and we'll explore the studies

65
00:03:33,320 --> 00:03:36,200
Speaker 1: about whether or not anything hinky is going on and

66
00:03:36,200 --> 00:03:40,400
Speaker 1: try to separate fact from FUD. F-U-D, that's fear,

67
00:03:40,520 --> 00:03:44,240
Speaker 1: uncertainty and doubt. And we'll also chat about some recent

68
00:03:44,320 --> 00:03:47,120
Speaker 1: news stories about how big companies have been handing over

69
00:03:47,160 --> 00:03:51,280
Speaker 1: audio messages to third party human contractors and what that

70
00:03:51,360 --> 00:03:55,680
Speaker 1: means in terms of privacy and ethics. Now, first, let's

71
00:03:55,720 --> 00:04:00,160
Speaker 1: address a big reason why devices aren't constantly recording or

72
00:04:00,200 --> 00:04:05,520
Speaker 1: broadcasting all the sounds within an environment that's reachable by microphone.

73
00:04:06,320 --> 00:04:10,840
Speaker 1: It's because that's truly enormous. Like, that's a huge amount

74
00:04:10,960 --> 00:04:14,040
Speaker 1: of data. So let's just take Facebook as an example.

75
00:04:14,680 --> 00:04:18,360
Speaker 1: There are more than two billion people using Facebook every month.

76
00:04:18,880 --> 00:04:21,080
Speaker 1: At least one and a half billion people pop on

77
00:04:21,080 --> 00:04:24,400
Speaker 1: Facebook every single day. Now that's not necessarily the same

78
00:04:24,880 --> 00:04:27,680
Speaker 1: one and a half billion people every day, but every

79
00:04:27,760 --> 00:04:31,640
Speaker 1: day one point five billion people check Facebook, and out

80
00:04:31,640 --> 00:04:35,400
Speaker 1: of that number, nearly one billion of them are accessing

81
00:04:35,440 --> 00:04:40,360
Speaker 1: Facebook on mobile devices. So, just from a data management standpoint,

82
00:04:41,040 --> 00:04:45,240
Speaker 1: there's no way any company, even one as large as Facebook,

83
00:04:45,400 --> 00:04:49,279
Speaker 1: could be actively monitoring, recording, or even analyzing all that

84
00:04:49,360 --> 00:04:54,080
Speaker 1: audio that would be coming in from a billion mobile handsets.
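To put a rough number on that claim, here is a minimal back-of-envelope sketch. Only the one-billion-handset figure comes from the episode; the sample rate and sample size are assumptions chosen for illustration (uncompressed speech-quality audio), and the function names are hypothetical.

```python
# Back-of-envelope estimate of continuous audio capture at Facebook's scale.
# The audio parameters below are assumptions, not figures from the episode.

HANDSETS = 1_000_000_000      # "a billion mobile handsets" (from the episode)
SAMPLE_RATE_HZ = 16_000       # assumed: speech-quality sampling rate
BYTES_PER_SAMPLE = 2          # assumed: 16-bit mono PCM, uncompressed
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second_per_handset() -> int:
    """Raw audio bytes produced by one always-on microphone each second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def total_bytes_per_day(handsets: int = HANDSETS) -> int:
    """Raw audio bytes produced by every handset over one full day."""
    return handsets * bytes_per_second_per_handset() * SECONDS_PER_DAY


if __name__ == "__main__":
    per_day = total_bytes_per_day()
    print(f"{per_day / 1e18:.2f} exabytes of raw audio per day")
```

Under these assumptions the total comes to roughly 2.8 exabytes of raw audio per day. Compression would shrink that considerably, but the order of magnitude is the point: it would all still have to be uploaded, stored, and analyzed.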

85
00:04:54,960 --> 00:04:56,960
Speaker 1: We are in the age of big data, but we

86
00:04:57,040 --> 00:04:59,640
Speaker 1: still have our limits. Plus you'd have to figure out

87
00:05:00,240 --> 00:05:03,520
Speaker 1: that, you know, that large amount of data, most

88
00:05:03,560 --> 00:05:06,640
Speaker 1: of it wouldn't be useful to Facebook. Now, don't get

89
00:05:06,640 --> 00:05:08,880
Speaker 1: me wrong. At the end of the day, you and

90
00:05:08,960 --> 00:05:14,000
Speaker 1: I are the products being bought and sold on Facebook

91
00:05:14,080 --> 00:05:19,240
Speaker 1: and Google and other providers out there. We're potential customers

92
00:05:19,279 --> 00:05:22,720
Speaker 1: for all of the advertisers that use those companies like

93
00:05:22,760 --> 00:05:26,839
Speaker 1: Facebook as a platform. So it benefits the advertisers and

94
00:05:27,040 --> 00:05:31,120
Speaker 1: Facebook and sometimes even us as customers to match the

95
00:05:31,200 --> 00:05:34,360
Speaker 1: right ads to the right people. So there's definitely an

96
00:05:34,400 --> 00:05:37,880
Speaker 1: incentive to learn as much about users as possible to

97
00:05:38,000 --> 00:05:42,200
Speaker 1: leverage their interests and potentially convert them into paying customers

98
00:05:42,240 --> 00:05:45,960
Speaker 1: to an advertiser. Now, this is the very basic foundation

99
00:05:46,080 --> 00:05:50,520
Speaker 1: of Facebook's business model. So if Facebook could do this

100
00:05:50,839 --> 00:05:54,160
Speaker 1: from a technical standpoint, and if the company could get

101
00:05:54,200 --> 00:05:58,400
Speaker 1: away with it from a public perception standpoint, I think

102
00:05:58,400 --> 00:06:03,000
Speaker 1: there's little doubt that Facebook would do it. But honestly,

103
00:06:03,000 --> 00:06:05,440
Speaker 1: it's just way too much information to process and to

104
00:06:05,480 --> 00:06:09,200
Speaker 1: boil down into actionable plans. We talk about a lot

105
00:06:09,200 --> 00:06:12,080
Speaker 1: of stuff in our day, you know, and some of

106
00:06:12,120 --> 00:06:14,159
Speaker 1: it we may not really be interested in. We're just

107
00:06:14,200 --> 00:06:17,839
Speaker 1: talking about something, So it wouldn't do Facebook any good

108
00:06:17,839 --> 00:06:20,239
Speaker 1: to serve up ads for stuff that we weren't actually

109
00:06:20,279 --> 00:06:22,880
Speaker 1: really interested in, So it has to pick and choose

110
00:06:22,880 --> 00:06:27,360
Speaker 1: its moments. Facebook has denied using phone microphones in this way.

111
00:06:27,720 --> 00:06:30,320
Speaker 1: In a June second, two thousand sixteen blog post on

112
00:06:30,360 --> 00:06:34,280
Speaker 1: the Facebook newsroom site, a company representative wrote this, and

113
00:06:34,320 --> 00:06:39,720
Speaker 1: here's a quote. Facebook does not use your phone's microphone

114
00:06:39,760 --> 00:06:42,359
Speaker 1: to inform ads or to change what you see in

115
00:06:42,440 --> 00:06:45,800
Speaker 1: news feed. Some recent articles have suggested that we must

116
00:06:45,839 --> 00:06:48,280
Speaker 1: be listening to people's conversations in order to show them

117
00:06:48,279 --> 00:06:52,360
Speaker 1: relevant ads. This is not true. We show ads based

118
00:06:52,400 --> 00:06:56,400
Speaker 1: on people's interests and other profile information, not what you're

119
00:06:56,400 --> 00:07:00,160
Speaker 1: talking out loud about. We only access your microphone if

120
00:07:00,200 --> 00:07:02,560
Speaker 1: you have given our app permission, and if you are

121
00:07:02,600 --> 00:07:06,560
Speaker 1: actively using a specific feature that requires audio. This might

122
00:07:06,600 --> 00:07:09,600
Speaker 1: include recording a video or using an optional feature we

123
00:07:09,640 --> 00:07:12,560
Speaker 1: introduced two years ago to include music or other audio

124
00:07:12,600 --> 00:07:18,240
Speaker 1: in your status updates. End quote. Now, it's understandable that

125
00:07:18,320 --> 00:07:22,200
Speaker 1: people would be a bit skeptical regarding Facebook's claims of innocence

126
00:07:22,520 --> 00:07:25,840
Speaker 1: in this regard. The company has had several high profile

127
00:07:25,920 --> 00:07:29,840
Speaker 1: scandals and issues with privacy and security. Zuckerberg himself once

128
00:07:29,960 --> 00:07:35,240
Speaker 1: famously declared that privacy is dead. Also, he simultaneously does

129
00:07:35,280 --> 00:07:38,400
Speaker 1: his best to preserve his own privacy. But that's commentary

130
00:07:38,440 --> 00:07:42,400
Speaker 1: for another episode. So I don't blame people for thinking

131
00:07:42,440 --> 00:07:45,480
Speaker 1: that Facebook might actually be listening in on conversations because

132
00:07:45,480 --> 00:07:48,880
Speaker 1: the company has already proven it hasn't been the best

133
00:07:49,000 --> 00:07:52,640
Speaker 1: steward of user privacy in the past. But that doesn't

134
00:07:52,680 --> 00:07:56,040
Speaker 1: mean the company has actually been spying on people. It

135
00:07:56,080 --> 00:08:00,480
Speaker 1: doesn't have to, at least not in that way. And

136
00:08:00,720 --> 00:08:03,680
Speaker 1: this is where we get into some troubling territory because

137
00:08:03,720 --> 00:08:06,200
Speaker 1: it's where we start to learn how services like Google

138
00:08:06,280 --> 00:08:10,880
Speaker 1: and Facebook and others can glean information about us, whether

139
00:08:10,960 --> 00:08:14,240
Speaker 1: we have consciously shared that information or not, and it

140
00:08:14,240 --> 00:08:17,840
Speaker 1: helps explain how these companies can advertise to us so effectively.

141
00:08:18,640 --> 00:08:22,200
Speaker 1: One way Facebook does this is with an innovation called

142
00:08:22,360 --> 00:08:26,640
Speaker 1: Facebook Pixel. Now, this is a piece of code that

143
00:08:27,000 --> 00:08:32,320
Speaker 1: Facebook's clients, advertisers really, can put on their own websites.

144
00:08:32,720 --> 00:08:35,600
Speaker 1: So it's the type of code you would insert into

145
00:08:35,640 --> 00:08:38,040
Speaker 1: the website for a business. So let's say you own

146
00:08:38,080 --> 00:08:42,359
Speaker 1: a specialty niche marketing shop. We'll say you sell figurines

147
00:08:42,400 --> 00:08:46,200
Speaker 1: based off of iconic horror movie monsters and characters, and

148
00:08:46,240 --> 00:08:49,200
Speaker 1: you're going to advertise on Facebook. The pixel code is

149
00:08:49,240 --> 00:08:52,920
Speaker 1: one way Facebook can optimize that experience. The code pulls

150
00:08:52,960 --> 00:08:57,320
Speaker 1: information off of user behavior on your website and sends

151
00:08:57,320 --> 00:09:00,760
Speaker 1: it to Facebook. If people click over to your site

152
00:09:00,760 --> 00:09:03,560
Speaker 1: because of an ad on Facebook, Pixel will register it.

153
00:09:04,000 --> 00:09:07,120
Speaker 1: This helps you see how effective or ineffective your ads

154
00:09:07,200 --> 00:09:10,800
Speaker 1: are on the site. It also can target your ads

155
00:09:10,920 --> 00:09:13,520
Speaker 1: to people on Facebook who would be most likely to

156
00:09:13,600 --> 00:09:17,160
Speaker 1: click on those ads. It might analyze the traits common

157
00:09:17,200 --> 00:09:19,600
Speaker 1: to people who are interacting with your ads, and then

158
00:09:19,640 --> 00:09:22,760
Speaker 1: extrapolate that to target people who have similar traits and

159
00:09:22,880 --> 00:09:27,920
Speaker 1: behaviors but they haven't yet seen your advertisements. Facebook, meanwhile,

160
00:09:28,040 --> 00:09:30,360
Speaker 1: can also use that data to serve up ads from

161
00:09:30,400 --> 00:09:33,559
Speaker 1: other companies to users based on similar findings, and it

162
00:09:33,640 --> 00:09:36,400
Speaker 1: can track other stuff too. Let's say you click over

163
00:09:36,480 --> 00:09:38,880
Speaker 1: to an article on a blog or news site that

164
00:09:38,960 --> 00:09:42,680
Speaker 1: incorporates Facebook Pixel in the site's code. Facebook can see

165
00:09:42,679 --> 00:09:45,160
Speaker 1: how long you were on that article, which in turn

166
00:09:45,200 --> 00:09:48,600
Speaker 1: indicates your interest and investment level in that topic. Then

167
00:09:48,640 --> 00:09:51,640
Speaker 1: Facebook can serve up ads related to the contents of

168
00:09:51,679 --> 00:09:54,920
Speaker 1: that article to you. In the end, it's all about

169
00:09:54,920 --> 00:09:58,760
Speaker 1: analyzing user behavior to get the biggest return on investment,

170
00:09:59,080 --> 00:10:01,800
Speaker 1: and it doesn't require using the microphone to do it.

171
00:10:02,160 --> 00:10:05,000
Speaker 1: They can just look at who you are, where you've been,

172
00:10:05,440 --> 00:10:09,280
Speaker 1: both in real life if it's tracking your location and

173
00:10:09,360 --> 00:10:12,720
Speaker 1: on the Internet if it's tracking your browsing and

174
00:10:12,800 --> 00:10:15,600
Speaker 1: who your friends are. And all of this information combined

175
00:10:16,000 --> 00:10:19,240
Speaker 1: gives Facebook a ton of data about what kind of

176
00:10:19,280 --> 00:10:21,920
Speaker 1: ads to target towards you. Now, on top of that,

177
00:10:22,200 --> 00:10:26,120
Speaker 1: Facebook can purchase information from data brokers to supplement its

178
00:10:26,120 --> 00:10:29,400
Speaker 1: own gargantuan database. There are companies that manage

179
00:10:29,400 --> 00:10:33,160
Speaker 1: stuff like loyalty programs, which also track what you buy.

180
00:10:33,360 --> 00:10:36,000
Speaker 1: They have to for the loyalty programs to work, and

181
00:10:36,040 --> 00:10:39,400
Speaker 1: those purchases are linked to you as a person. They know, Oh,

182
00:10:39,480 --> 00:10:42,480
Speaker 1: Jonathan goes to Starbucks all the time and he always

183
00:10:42,480 --> 00:10:45,520
Speaker 1: gets those Nitro cold brews, So let's put an ad

184
00:10:46,000 --> 00:10:49,720
Speaker 1: that targets him based on that information. Now, that data

185
00:10:49,800 --> 00:10:51,920
Speaker 1: isn't just being used to help you get the best

186
00:10:52,200 --> 00:10:56,080
Speaker 1: deal on whatever it happens to be. That information is valuable.

187
00:10:56,559 --> 00:11:00,480
Speaker 1: So companies that manage these loyalty programs can and do

188
00:11:00,840 --> 00:11:03,600
Speaker 1: buy and sell that data. You know, our spending

189
00:11:03,640 --> 00:11:07,400
Speaker 1: habits are part of this sort of encyclopedia entry about

190
00:11:07,400 --> 00:11:11,080
Speaker 1: our interests, priorities, and behaviors. Now, none of this needs

191
00:11:11,200 --> 00:11:15,200
Speaker 1: to use a microphone to spy on us. So in

192
00:11:15,240 --> 00:11:17,800
Speaker 1: the case of seeing that snack food pop up on

193
00:11:17,800 --> 00:11:20,480
Speaker 1: the Facebook feed, it could simply be that you exhibit

194
00:11:20,559 --> 00:11:23,520
Speaker 1: behaviors similar to ones that people who have bought that

195
00:11:23,600 --> 00:11:26,200
Speaker 1: snack food tend to have as well. You've liked the

196
00:11:26,240 --> 00:11:29,480
Speaker 1: same sort of pages. You may even have a lot

197
00:11:29,520 --> 00:11:32,080
Speaker 1: of friends who have already bought this stuff. You may

198
00:11:32,120 --> 00:11:34,959
Speaker 1: live in a region where it has recently been introduced.

199
00:11:35,360 --> 00:11:37,600
Speaker 1: These are the kinds of points of data that Facebook

200
00:11:37,679 --> 00:11:39,320
Speaker 1: might use in order to serve that ad up to

201
00:11:39,360 --> 00:11:41,840
Speaker 1: you that have nothing to do with your microphone. So

202
00:11:41,880 --> 00:11:44,640
Speaker 1: you got the ad not because you talked about the

203
00:11:44,640 --> 00:11:47,760
Speaker 1: snack food, but because Facebook has sussed out you're the

204
00:11:47,760 --> 00:11:50,640
Speaker 1: type of person who would like that snack food because

205
00:11:51,400 --> 00:11:54,360
Speaker 1: spoiler alert, You're not as special as you think you are,

206
00:11:54,880 --> 00:11:57,600
Speaker 1: and I'm not as special as I think I am.

207
00:11:57,640 --> 00:12:00,080
Speaker 1: Now you could argue, and I would agree with you

208
00:12:00,160 --> 00:12:03,480
Speaker 1: on this, that what Facebook is doing is at least

209
00:12:03,559 --> 00:12:06,520
Speaker 1: as creepy as listening in on a microphone, perhaps even

210
00:12:06,600 --> 00:12:10,760
Speaker 1: more so. Facebook has filed patents that focus on technology

211
00:12:10,840 --> 00:12:13,200
Speaker 1: that's meant to predict where you're going to go next

212
00:12:13,559 --> 00:12:16,400
Speaker 1: based on your history of location data. So, in other words,

213
00:12:16,640 --> 00:12:19,160
Speaker 1: Facebook is trying to figure out where you're going to

214
00:12:19,240 --> 00:12:23,000
Speaker 1: go before you go there. And it's not just you,

215
00:12:23,160 --> 00:12:25,680
Speaker 1: it's all the people you know who are using Facebook

216
00:12:25,720 --> 00:12:29,440
Speaker 1: two and so it's not just predicting where you'll go,

217
00:12:30,120 --> 00:12:33,600
Speaker 1: it's also predicting which people you may be running into,

218
00:12:33,679 --> 00:12:35,800
Speaker 1: because it's predicting those people are going to go to

219
00:12:35,840 --> 00:12:38,560
Speaker 1: that same place and whether or not you might encounter

220
00:12:38,679 --> 00:12:41,199
Speaker 1: one another. It can also use that to make suggestions

221
00:12:41,240 --> 00:12:44,480
Speaker 1: to add people on Facebook who are going to those

222
00:12:44,520 --> 00:12:48,240
Speaker 1: same places so that they become your friends online. Now

223
00:12:48,240 --> 00:12:51,400
Speaker 1: why does Facebook care who your friends are? Because the

224
00:12:51,440 --> 00:12:55,120
Speaker 1: more people who use Facebook and the more interconnected they become,

225
00:12:55,640 --> 00:12:59,480
Speaker 1: the more useful the information they generate for Facebook. That

226
00:12:59,720 --> 00:13:03,640
Speaker 1: ends up becoming more valuable to the company. So

227
00:13:05,040 --> 00:13:07,480
Speaker 1: it is pretty creepy and invasive, and it doesn't have

228
00:13:07,520 --> 00:13:10,439
Speaker 1: to use the microphone. But when we come back, I'll

229
00:13:10,440 --> 00:13:13,040
Speaker 1: talk a bit more about these sound activated features and

230
00:13:13,080 --> 00:13:15,439
Speaker 1: what's actually going on, because there is some stuff we've

231
00:13:15,480 --> 00:13:17,760
Speaker 1: got to be worried about. But first, let's take a

232
00:13:17,880 --> 00:13:28,240
Speaker 1: quick break. When I opened this show, I talked about

233
00:13:28,240 --> 00:13:30,920
Speaker 1: how my phone could listen in on music and identify

234
00:13:31,000 --> 00:13:34,320
Speaker 1: the song even when the phone was in its locked mode.

235
00:13:34,800 --> 00:13:38,200
Speaker 1: Now that's because I have a Pixel 2 XL phone.

236
00:13:38,240 --> 00:13:41,839
Speaker 1: It's an Android phone. It's actually a flagship Google phone,

237
00:13:42,160 --> 00:13:45,400
Speaker 1: and there's a feature on the Pixel 2 that's called

238
00:13:45,640 --> 00:13:48,560
Speaker 1: Now Playing. You have to activate this feature, you have

239
00:13:48,600 --> 00:13:51,679
Speaker 1: to choose to opt into it. So I want to make

240
00:13:51,720 --> 00:13:54,679
Speaker 1: that clear. I chose to activate this feature. It's not

241
00:13:54,760 --> 00:13:59,240
Speaker 1: just active by default, and with it active, the phone

242
00:13:59,240 --> 00:14:01,920
Speaker 1: can identify music that's playing, and it can tell me

243
00:14:01,960 --> 00:14:04,720
Speaker 1: the title even when the phone is in its locked position.

244
00:14:04,800 --> 00:14:08,360
Speaker 1: So what gives Well, this is not as creepy and

245
00:14:08,440 --> 00:14:12,040
Speaker 1: invasive as it sounds at first glance, because this feature,

246
00:14:12,480 --> 00:14:16,480
Speaker 1: this is incredible to me, is actually entirely local to

247
00:14:16,600 --> 00:14:21,320
Speaker 1: the Pixel 2 phones. It works on the phone itself.

248
00:14:21,360 --> 00:14:24,320
Speaker 1: It's not consulting the cloud at all, it's not sending

249
00:14:24,360 --> 00:14:28,760
Speaker 1: any information. So how can that be possible? How can

250
00:14:29,320 --> 00:14:32,400
Speaker 1: all this information exist on the phone already? Well, let's

251
00:14:32,440 --> 00:14:35,960
Speaker 1: boil it down. First, if you've ever played with any

252
00:14:36,000 --> 00:14:40,920
Speaker 1: digital sound recording software, you've likely seen sound recorded as

253
00:14:40,920 --> 00:14:44,880
Speaker 1: a waveform, a visualization of sound, and typically it's

254
00:14:44,880 --> 00:14:47,120
Speaker 1: pretty simple stuff like if you're using a very basic

255
00:14:47,240 --> 00:14:51,920
Speaker 1: sound recording system, you're mostly looking at changes in amplitude

256
00:14:52,280 --> 00:14:55,119
Speaker 1: or volume. In other words, so you see a continuous

257
00:14:55,200 --> 00:14:57,520
Speaker 1: series of peaks and valleys over the course of a

258
00:14:57,560 --> 00:15:02,200
Speaker 1: sound recording. Those represent the loudest and the quietest parts

259
00:15:02,240 --> 00:15:05,200
Speaker 1: of the recording, the changes in volume. You can also

260
00:15:05,240 --> 00:15:09,480
Speaker 1: graph frequency or pitch, and you can if you zoom

261
00:15:09,520 --> 00:15:12,480
Speaker 1: way in, see shapes in the waveform that indicate

262
00:15:12,480 --> 00:15:17,080
Speaker 1: specific phonetics and sounds. Anyone who has worked in audio

263
00:15:17,240 --> 00:15:20,760
Speaker 1: editing for a while can identify at a glance certain

264
00:15:20,800 --> 00:15:26,000
Speaker 1: distinctive sounds. Tari, my producer, can probably tell you just

265
00:15:26,160 --> 00:15:29,520
Speaker 1: by looking at a waveform of my recording which moments

266
00:15:29,560 --> 00:15:34,400
Speaker 1: represent the irritating mouth sounds she removes before publishing an episode.

267
00:15:35,080 --> 00:15:37,680
Speaker 1: It doesn't take long before you can do this yourself.

268
00:15:38,040 --> 00:15:40,560
Speaker 1: It's actually pretty easy to identify, say, like a

269
00:15:40,640 --> 00:15:46,000
Speaker 1: hi-hat cymbal in a music recording, because it's very distinctive. Now,

270
00:15:46,080 --> 00:15:49,200
Speaker 1: that means that songs have these distinctive features like a

271
00:15:49,240 --> 00:15:53,400
Speaker 1: fingerprint that represent the sound of the song, and if

272
00:15:53,440 --> 00:15:56,800
Speaker 1: you can recognize the fingerprint, you can identify the song

273
00:15:57,040 --> 00:15:59,600
Speaker 1: even if you're not listening to the song at that moment.

274
00:16:00,040 --> 00:16:03,000
Speaker 1: And you could look at a printout of a

275
00:16:03,000 --> 00:16:06,280
Speaker 1: waveform of a song and you can try and

276
00:16:06,360 --> 00:16:10,760
Speaker 1: match it against a library of printouts. That's essentially

277
00:16:10,840 --> 00:16:14,280
Speaker 1: what the Pixel 2 is doing. The program runs in

278
00:16:14,320 --> 00:16:17,960
Speaker 1: the background, It activates when the sound profile indicates that

279
00:16:18,000 --> 00:16:22,160
Speaker 1: there's music present, so it then analyzes the sound that's

280
00:16:22,160 --> 00:16:24,800
Speaker 1: coming in through the microphone and it creates one of

281
00:16:24,800 --> 00:16:28,400
Speaker 1: these digital fingerprints that I was just saying. Then, just

282
00:16:28,440 --> 00:16:31,040
Speaker 1: like you would with a crime scene fingerprint, the Pixel

283
00:16:31,080 --> 00:16:34,760
Speaker 1: 2 will compare the digital analysis of the song that's

284
00:16:34,760 --> 00:16:38,560
Speaker 1: playing against a local database on the phone of fingerprints

285
00:16:38,600 --> 00:16:42,640
Speaker 1: that represent thousands of popular songs for your region. Now

286
00:16:42,680 --> 00:16:45,920
Speaker 1: exactly how many hasn't really been released, but supposedly in

287
00:16:45,960 --> 00:16:49,560
Speaker 1: the tens of thousands of songs range. And if the

288
00:16:49,560 --> 00:16:51,920
Speaker 1: Pixel 2 finds a match between the song that is

289
00:16:51,960 --> 00:16:55,200
Speaker 1: currently playing and the one that's in the database, it

290
00:16:55,280 --> 00:16:58,200
Speaker 1: returns the result. This works even if the phone has

291
00:16:58,200 --> 00:17:01,840
Speaker 1: cellular and WiFi data turned off, because again it's all local.
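The matching step described above can be sketched in a few lines. This is a toy illustration, not Google's actual Now Playing implementation, which the episode does not detail. It assumes the audio has already been reduced to one dominant frequency bin per frame (a hypothetical input), builds a fingerprint by pairing nearby frames, loosely in the spirit of landmark-style audio fingerprinting, and looks for the best overlap in an on-device dictionary. All names, thresholds, and the "Song A" and "Song B" entries are made up for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# A fingerprint is a set of (bin_now, bin_later, frames_apart) landmarks.
Fingerprint = Set[Tuple[int, int, int]]


def fingerprint(peaks: List[int], fan_out: int = 3) -> Fingerprint:
    """Pair each frame's dominant frequency bin with the next few frames.

    `peaks` is a hypothetical pre-processed input: one bin index per frame.
    """
    marks: Fingerprint = set()
    for i, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if i + dt < len(peaks):
                marks.add((f1, peaks[i + dt], dt))
    return marks


def best_match(
    sample: Fingerprint,
    database: Dict[str, Fingerprint],
    min_score: float = 0.5,  # assumed threshold, chosen for illustration
) -> Optional[str]:
    """Return the title whose reference shares the most landmarks, if any."""
    best_title: Optional[str] = None
    best_score = 0.0
    for title, reference in database.items():
        # Fraction of the sample's landmarks that also occur in the reference.
        score = len(sample & reference) / max(len(sample), 1)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= min_score else None


if __name__ == "__main__":
    # The "local database": reference fingerprints stored on the device.
    database = {
        "Song A": fingerprint([3, 7, 7, 2, 9, 4, 4, 8, 1, 6]),
        "Song B": fingerprint([5, 5, 1, 8, 2, 2, 6, 3, 9, 7]),
    }
    snippet = fingerprint([7, 2, 9, 4, 4, 8])  # excerpt from the middle of Song A
    print(best_match(snippet, database))
```

Because the lookup only touches the dictionary held in memory, nothing in this sketch needs a network connection, which mirrors the point that the feature keeps working with cellular and WiFi data turned off.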

292
00:17:02,440 --> 00:17:06,480
Speaker 1: Now the Now Playing feature doesn't run constantly because that

293
00:17:06,520 --> 00:17:10,119
Speaker 1: would drain battery life like crazy. Instead, it samples the

294
00:17:10,160 --> 00:17:14,600
Speaker 1: audio approximately every sixty seconds, and it takes time to

295
00:17:14,680 --> 00:17:17,560
Speaker 1: match a song to an entry in the database. The

296
00:17:17,600 --> 00:17:20,959
Speaker 1: cleaner the audio, in other words, the less background noise

297
00:17:21,040 --> 00:17:24,800
Speaker 1: and less interference that's present, the faster this process tends

298
00:17:24,800 --> 00:17:28,440
Speaker 1: to be. This means that when songs transition from one

299
00:17:28,480 --> 00:17:31,200
Speaker 1: song to another, it can take a little bit before

300
00:17:31,240 --> 00:17:33,879
Speaker 1: the phone registers the change. It all depends on the

301
00:17:33,920 --> 00:17:38,040
Speaker 1: acoustic quality of the environment and where in this sampling

302
00:17:38,160 --> 00:17:42,440
Speaker 1: cycle the phone is at any given time, so that's

303
00:17:42,480 --> 00:17:45,840
Speaker 1: not quite as creepy because everything's local on the device.

304
00:17:45,920 --> 00:17:49,159
Speaker 1: It's not sending any data out anywhere else. It's not

305
00:17:49,280 --> 00:17:52,240
Speaker 1: listening to what I'm listening to and alerting Google

306
00:17:52,400 --> 00:17:55,359
Speaker 1: to let them know, hey, Jonathan's once again listening to

307
00:17:55,400 --> 00:17:59,960
Speaker 1: the soundtrack to Be More Chill, which would be an

308
00:18:00,040 --> 00:18:03,000
Speaker 1: accurate suggestion that it would make because I do listen

309
00:18:03,040 --> 00:18:05,840
Speaker 1: to that a lot. Anyway, you can use this feature

310
00:18:06,520 --> 00:18:09,560
Speaker 1: to learn more about the track, the artist, the album,

311
00:18:09,600 --> 00:18:13,320
Speaker 1: including potentially purchasing that music. And those features do connect

312
00:18:13,359 --> 00:18:16,679
Speaker 1: to the outside world through WiFi or cellular connections, but

313
00:18:16,760 --> 00:18:20,639
Speaker 1: that requires an extra step on the part of the user. Also,

314
00:18:20,680 --> 00:18:23,520
Speaker 1: Google pushes out updates to this database with the most

315
00:18:23,520 --> 00:18:27,560
Speaker 1: popular songs, and these are regionalized to reflect the country

316
00:18:27,560 --> 00:18:31,240
Speaker 1: you're in, because you're less likely to run into, say

317
00:18:31,600 --> 00:18:35,320
Speaker 1: a Peruvian pop song when you're in Scotland. The push

318
00:18:35,440 --> 00:18:39,320
Speaker 1: updates do happen over WiFi or cellular local connections. But

319
00:18:39,960 --> 00:18:42,920
Speaker 1: this is just the reference data that analyzed music

320
00:18:42,960 --> 00:18:47,080
Speaker 1: gets compared against. An app like Shazam, on the other hand,

321
00:18:47,520 --> 00:18:50,400
Speaker 1: connects to the cloud, but you also have to activate

322
00:18:50,440 --> 00:18:52,760
Speaker 1: the app to have it listen to the audio, so

323
00:18:53,160 --> 00:18:56,439
Speaker 1: it's a user choice to have the app listen. So

324
00:18:56,480 --> 00:18:59,040
Speaker 1: this is more like a push to talk device, except

325
00:18:59,040 --> 00:19:02,439
Speaker 1: it's push to listen. Shazam is also analyzing music to

326
00:19:02,480 --> 00:19:05,399
Speaker 1: suss out a digital fingerprint for the audio, but it

327
00:19:05,480 --> 00:19:09,480
Speaker 1: can compare the sampled audio against a much larger database

328
00:19:09,800 --> 00:19:13,239
Speaker 1: consisting of millions of songs, rather than the tens of

329
00:19:13,280 --> 00:19:16,439
Speaker 1: thousands you would find on the Pixel 2's Now Playing feature.

330
00:19:17,040 --> 00:19:20,320
Speaker 1: More importantly, I think it's fair to say this isn't

331
00:19:20,359 --> 00:19:23,679
Speaker 1: a creepy use of the technology, since the listening feature

332
00:19:23,760 --> 00:19:27,240
Speaker 1: only activates on the user's command rather than just being

333
00:19:27,320 --> 00:19:30,320
Speaker 1: on by default. Now, this isn't that much different than

334
00:19:30,359 --> 00:19:34,440
Speaker 1: what virtual assistants are doing when you use them. Clearly,

335
00:19:35,000 --> 00:19:38,359
Speaker 1: the microphone on a virtual assistant like Google Home or

336
00:19:38,440 --> 00:19:41,960
Speaker 1: Siri or whatever, it has to be active all the time,

337
00:19:42,040 --> 00:19:44,879
Speaker 1: otherwise you wouldn't get a response when you used whatever

338
00:19:44,920 --> 00:19:48,800
Speaker 1: the keyword or phrase was to activate the assistant. I'm

339
00:19:48,800 --> 00:19:52,440
Speaker 1: going to try and avoid saying any of those phrases,

340
00:19:52,520 --> 00:19:54,399
Speaker 1: by the way, because I don't want those of you

341
00:19:54,520 --> 00:19:57,280
Speaker 1: who have those devices to deal with the frustration of

342
00:19:57,320 --> 00:20:01,200
Speaker 1: them going off in response to something I say. Now,

343
00:20:01,200 --> 00:20:05,000
Speaker 1: those words or phrases have a specific sound, just like

344
00:20:05,240 --> 00:20:09,040
Speaker 1: music does. In this case, we're talking about phonemes, which

345
00:20:09,040 --> 00:20:12,440
Speaker 1: are recognizable sounds found in language. So in English there

346
00:20:12,480 --> 00:20:16,560
Speaker 1: are forty four phonemes. The order and combination of those

347
00:20:16,560 --> 00:20:19,560
Speaker 1: phonemes are the key. So if you say something that

348
00:20:19,680 --> 00:20:23,000
Speaker 1: has those phonemes in the right order, or if it's

349
00:20:23,119 --> 00:20:26,440
Speaker 1: close enough, if it's in a noisy environment, this can

350
00:20:26,480 --> 00:20:30,560
Speaker 1: activate the virtual assistant. It's like a key fitting into

351
00:20:30,600 --> 00:20:33,640
Speaker 1: a lock. Now, if you're saying other stuff, it's like

352
00:20:33,680 --> 00:20:37,000
Speaker 1: the wrong key is inserted and nothing happens. It's only

353
00:20:37,000 --> 00:20:39,720
Speaker 1: when you say something that fits the lock that the

354
00:20:39,760 --> 00:20:45,000
Speaker 1: assistant activates. This process continues after activation. When you talk

355
00:20:45,080 --> 00:20:48,960
Speaker 1: to the virtual assistant, it analyzes your speech by phonemes.

356
00:20:49,920 --> 00:20:53,000
Speaker 1: Software processes those to figure out what words you are

357
00:20:53,080 --> 00:20:56,520
Speaker 1: actually saying. Well for the first step, that is, because

358
00:20:56,560 --> 00:21:00,199
Speaker 1: it's actually more complicated than that. So, for example, there

359
00:21:00,240 --> 00:21:03,440
Speaker 1: are homonyms. These are words that have a similar sound

360
00:21:03,760 --> 00:21:08,480
Speaker 1: but different meanings and often different spellings. An easy example

361
00:21:08,600 --> 00:21:12,080
Speaker 1: is the number eight and the past tense of to eat,

362
00:21:12,520 --> 00:21:16,520
Speaker 1: such as I ate an entire bowl of cao. Mm

363
00:21:16,600 --> 00:21:22,840
Speaker 1: hmm, okay. So those two words, eight and ate, sound

364
00:21:22,920 --> 00:21:26,199
Speaker 1: exactly the same, but they have different meanings. Now that

365
00:21:26,240 --> 00:21:29,400
Speaker 1: means the software can't rely on just the sounds you're

366
00:21:29,440 --> 00:21:32,000
Speaker 1: making when you speak to figure out what you mean,

367
00:21:32,480 --> 00:21:36,120
Speaker 1: it has to actually analyze syntax and context and make judgment

368
00:21:36,160 --> 00:21:38,960
Speaker 1: calls about what you are actually meaning when you say

369
00:21:38,960 --> 00:21:43,040
Speaker 1: these things. Sometimes it gets things right, sometimes it gets

370
00:21:43,040 --> 00:21:45,840
Speaker 1: things wrong. But don't be too hard on it. Because

371
00:21:46,160 --> 00:21:50,000
Speaker 1: humans misunderstand other humans all the time. Even when we

372
00:21:50,040 --> 00:21:52,719
Speaker 1: are both communicating in the same language, we

373
00:21:52,760 --> 00:21:56,600
Speaker 1: can misunderstand each other. Now, this is still just the

374
00:21:56,680 --> 00:22:00,000
Speaker 1: first step. You can think of this as essentially speech

375
00:22:00,000 --> 00:22:02,960
Speaker 1: to text. From there, you have to determine what

376
00:22:03,160 --> 00:22:06,320
Speaker 1: is actually being asked by the speaker, what is the

377
00:22:06,400 --> 00:22:11,600
Speaker 1: intent behind the words. If someone speaks French very slowly

378
00:22:11,640 --> 00:22:14,199
Speaker 1: to me, I might be able to spell out what

379
00:22:14,359 --> 00:22:17,400
Speaker 1: is being said phonetically, but that doesn't mean I understand

380
00:22:17,440 --> 00:22:21,360
Speaker 1: the actual content of what was spoken. And to complicate matters,

381
00:22:21,640 --> 00:22:23,560
Speaker 1: there are a lot of different ways to ask for

382
00:22:23,600 --> 00:22:27,199
Speaker 1: the same information. I might say what's the weather for

383
00:22:27,240 --> 00:22:30,280
Speaker 1: this week? Or will I need an umbrella today, or

384
00:22:30,320 --> 00:22:32,879
Speaker 1: one of a dozen other ways to inquire about the weather.

385
00:22:33,359 --> 00:22:36,479
Speaker 1: The software has to be able to determine what the

386
00:22:36,560 --> 00:22:40,960
Speaker 1: intent was behind my question, and then there's another step,

387
00:22:41,280 --> 00:22:45,280
Speaker 1: which is matching intent with action. The assistant has to

388
00:22:45,359 --> 00:22:48,679
Speaker 1: respond to my request, and hopefully it does so in

389
00:22:48,680 --> 00:22:51,320
Speaker 1: a way that's relevant to whatever I was asking about

390
00:22:51,320 --> 00:22:53,840
Speaker 1: in the first place. So if I ask my virtual

391
00:22:53,880 --> 00:22:56,720
Speaker 1: assistant for an update on the weather, I'm not going

392
00:22:56,760 --> 00:22:59,679
Speaker 1: to be impressed if it instead tells me about the

393
00:22:59,720 --> 00:23:03,720
Speaker 1: traffic, or vice versa. And as assistants get connected

394
00:23:03,760 --> 00:23:08,320
Speaker 1: into more systems like security systems, lights, apps, and more,

395
00:23:08,760 --> 00:23:12,520
Speaker 1: the software has to send appropriate commands to these other

396
00:23:12,600 --> 00:23:16,679
Speaker 1: elements to produce the expected results. Now, this is all impressive,

397
00:23:17,000 --> 00:23:20,040
Speaker 1: and because it's impressive, it could be a little scary

398
00:23:20,160 --> 00:23:23,639
Speaker 1: when we think about assistants as hanging on our every word.

399
00:23:23,760 --> 00:23:27,440
Speaker 1: Are they always listening? Are they always paying attention? Now,

400
00:23:27,480 --> 00:23:30,760
Speaker 1: they're always monitoring sound, but they're not doing so in

401
00:23:30,800 --> 00:23:34,520
Speaker 1: an effort to broadcast or record information. They are on

402
00:23:34,720 --> 00:23:39,399
Speaker 1: alert for that initiating phrase or word. They ignore everything else.

403
00:23:40,200 --> 00:23:43,399
Speaker 1: More on that a little bit later. Now that being said,

404
00:23:43,800 --> 00:23:47,280
Speaker 1: there are ways in which someone could hack an assistant

405
00:23:47,560 --> 00:23:51,199
Speaker 1: or a phone, or really any connected device that has

406
00:23:51,240 --> 00:23:55,719
Speaker 1: a microphone in order to eavesdrop using that device's microphone.

407
00:23:56,359 --> 00:23:59,280
Speaker 1: Edward Snowden revealed that the NSA used such

408
00:23:59,320 --> 00:24:03,520
Speaker 1: tactics in the agency's surveillance efforts. Apps that have access

409
00:24:03,560 --> 00:24:06,600
Speaker 1: to your phone's camera and microphone for the purposes of

410
00:24:06,640 --> 00:24:10,680
Speaker 1: sharing video, audio, and related features can do some disturbing

411
00:24:10,720 --> 00:24:13,800
Speaker 1: stuff if they're compromised. They can also do some disturbing

412
00:24:13,800 --> 00:24:16,520
Speaker 1: stuff if they're not compromised, but if the party behind

413
00:24:16,560 --> 00:24:22,240
Speaker 1: it is malicious. Felix Krause made such an app as

414
00:24:22,280 --> 00:24:26,159
Speaker 1: a proof of concept for iOS devices. The app, like

415
00:24:26,240 --> 00:24:29,679
Speaker 1: many others, asked the user for permission to access the camera.

416
00:24:30,040 --> 00:24:32,639
Speaker 1: Krause stated that once a user agreed to this, the

417
00:24:32,640 --> 00:24:36,240
Speaker 1: app could access both the front and back camera anytime

418
00:24:36,280 --> 00:24:38,800
Speaker 1: the app was in the foreground of the iOS device.

419
00:24:39,160 --> 00:24:42,159
Speaker 1: It could take videos and pictures with no indication to

420
00:24:42,200 --> 00:24:44,560
Speaker 1: the user that such a thing was happening, and it

421
00:24:44,600 --> 00:24:47,360
Speaker 1: could upload that data to a remote server. It could

422
00:24:47,400 --> 00:24:51,639
Speaker 1: even run real time facial recognition software. Now does this

423
00:24:51,720 --> 00:24:56,360
Speaker 1: mean apps like Facebook's Messenger or YouTube are doing this? Well,

424
00:24:56,359 --> 00:24:59,480
Speaker 1: not necessarily, but it does mean it's at least possible

425
00:24:59,600 --> 00:25:03,639
Speaker 1: to do, and nothing is stopping a more, let's say,

426
00:25:03,680 --> 00:25:08,399
Speaker 1: ethically unconcerned app from doing just that. So what can

427
00:25:08,440 --> 00:25:12,480
Speaker 1: you do to protect yourself from bad actors? Uh, here's

428
00:25:12,520 --> 00:25:16,160
Speaker 1: the bad news. Not much. You could go without using

429
00:25:16,160 --> 00:25:19,480
Speaker 1: such devices and apps in the first place. That's pretty

430
00:25:19,560 --> 00:25:23,520
Speaker 1: darn restrictive. Krause recommended using camera covers to obscure the

431
00:25:23,520 --> 00:25:27,440
Speaker 1: phone's cameras when you weren't actively using them, or revoking

432
00:25:27,520 --> 00:25:30,800
Speaker 1: camera access to the various apps on the phone. And

433
00:25:30,920 --> 00:25:35,000
Speaker 1: that's about it. Yikes. Now, when we come back, I'll

434
00:25:35,040 --> 00:25:38,479
Speaker 1: cover a related topic that's been in the news lately.

435
00:25:38,520 --> 00:25:49,280
Speaker 1: But first let's take another quick break. Okay, so we

436
00:25:49,400 --> 00:25:52,720
Speaker 1: know it's possible to use cameras and microphones against people,

437
00:25:52,960 --> 00:25:56,560
Speaker 1: either with malware or what amounts to a security loophole

438
00:25:56,680 --> 00:26:00,240
Speaker 1: between handset hardware and apps. But there's something else we

439
00:26:00,240 --> 00:26:03,760
Speaker 1: need to chat about, and that's humans listening in on

440
00:26:03,840 --> 00:26:08,160
Speaker 1: what were assumed to be private conversations and messages. Now

441
00:26:08,160 --> 00:26:12,440
Speaker 1: here's the context. In August two thousand nineteen, several major

442
00:26:12,480 --> 00:26:17,480
Speaker 1: media outlets reported an upsetting revelation, namely that Facebook had

443
00:26:17,480 --> 00:26:20,520
Speaker 1: been sending out audio files that users were creating in

444
00:26:20,720 --> 00:26:24,760
Speaker 1: Facebook Messenger, for example. And these were audio clips sent

445
00:26:24,960 --> 00:26:28,720
Speaker 1: through Messenger itself, so it's akin to a private text

446
00:26:28,840 --> 00:26:32,000
Speaker 1: to a friend. And Facebook was sending these audio files

447
00:26:32,040 --> 00:26:36,359
Speaker 1: to a third party contractor to transcribe that audio. So

448
00:26:36,400 --> 00:26:40,159
Speaker 1: imagine having a private text message thread sent to a

449
00:26:40,320 --> 00:26:43,600
Speaker 1: complete stranger for review. It was similar to that, except

450
00:26:43,600 --> 00:26:47,080
Speaker 1: it was audio, not text. So what's actually going on? Well,

451
00:26:47,320 --> 00:26:49,520
Speaker 1: Facebook said this all had to do with users who

452
00:26:49,560 --> 00:26:54,200
Speaker 1: had opted into having their audio messages transcribed automatically. Essentially,

453
00:26:54,960 --> 00:26:59,360
Speaker 1: it was all about using the voice to text option

454
00:26:59,800 --> 00:27:06,320
Speaker 1: in Facebook. Now, according to Express Computer, this option didn't

455
00:27:06,359 --> 00:27:09,720
Speaker 1: really have a warning that let you know that those

456
00:27:10,359 --> 00:27:13,560
Speaker 1: audio files you were creating through this voice to text

457
00:27:13,640 --> 00:27:18,040
Speaker 1: feature would go to be heard by any humans out there.

458
00:27:18,560 --> 00:27:21,760
Speaker 1: In fact, they said that the warning that would pop up,

459
00:27:21,840 --> 00:27:25,800
Speaker 1: or the notification that popped up said, turn on voice

460
00:27:25,840 --> 00:27:31,199
Speaker 1: to text in this chat using Facebook Messenger, and above

461
00:27:31,280 --> 00:27:34,119
Speaker 1: the no and yes buttons where you would choose one

462
00:27:34,160 --> 00:27:38,040
Speaker 1: of these options. Facebook further would describe the option display

463
00:27:38,200 --> 00:27:41,720
Speaker 1: text of voice clips you send and receive. You can

464
00:27:41,720 --> 00:27:45,240
Speaker 1: control whether text is visible to you for each chat.

465
00:27:46,359 --> 00:27:49,520
Speaker 1: So again it makes it sound like, oh, this is

466
00:27:49,520 --> 00:27:52,080
Speaker 1: all automated. If I use voice to text, I just

467
00:27:52,320 --> 00:27:55,760
Speaker 1: say a phrase, the text shows up. I might have

468
00:27:55,800 --> 00:27:58,840
Speaker 1: to make some adjustments to the text, maybe it has

469
00:27:58,960 --> 00:28:01,560
Speaker 1: misinterpreted one of the words or whatever. But sort of

470
00:28:01,600 --> 00:28:06,520
Speaker 1: a hands free approach to sending messages in Messenger. Lots

471
00:28:06,560 --> 00:28:09,520
Speaker 1: of apps use voice to text features, and in theory

472
00:28:10,000 --> 00:28:12,760
Speaker 1: it's a pretty great feature. You can dictate a message

473
00:28:12,800 --> 00:28:15,280
Speaker 1: to be sent to your friend without having to stare

474
00:28:15,359 --> 00:28:18,520
Speaker 1: at the screen and type or swipe on a keyboard.

475
00:28:19,200 --> 00:28:22,800
Speaker 1: Tons of folks use features like this if they want

476
00:28:22,840 --> 00:28:25,680
Speaker 1: to interact with an app while they're driving, for example,

477
00:28:25,720 --> 00:28:29,440
Speaker 1: to minimize the distractions they have as they putter around.

478
00:28:30,000 --> 00:28:34,200
Speaker 1: But you'll notice those messages don't seem to indicate anywhere

479
00:28:34,960 --> 00:28:37,800
Speaker 1: that the voice to text recordings could be sent to

480
00:28:38,000 --> 00:28:42,959
Speaker 1: a human being for review. Express Computer further explains that

481
00:28:43,160 --> 00:28:47,200
Speaker 1: even on a supplemental page explaining the voice to text feature,

482
00:28:48,040 --> 00:28:51,280
Speaker 1: Facebook fails to mention that human beings will be reviewing

483
00:28:51,320 --> 00:28:56,040
Speaker 1: that material. Instead. The supplemental page talks about how voice

484
00:28:56,040 --> 00:28:59,680
Speaker 1: to text uses machine learning to get better at interpreting

485
00:28:59,680 --> 00:29:02,160
Speaker 1: what you're saying, so that it becomes more useful to

486
00:29:02,200 --> 00:29:05,840
Speaker 1: you the more you actually use the feature. So the

487
00:29:05,880 --> 00:29:10,520
Speaker 1: concept here was that some voice recognition software would transcribe

488
00:29:10,560 --> 00:29:13,880
Speaker 1: this audio. Google Voice also used to do this for

489
00:29:14,000 --> 00:29:17,760
Speaker 1: voice messages. I remember getting voicemails from my mother, who

490
00:29:17,840 --> 00:29:21,600
Speaker 1: has a Southern US dialect as do I, but hers

491
00:29:21,720 --> 00:29:25,520
Speaker 1: is more pronounced. The Google Voice speech to text program

492
00:29:25,640 --> 00:29:30,840
Speaker 1: had problems interpreting my mother's messages, and frequently the transcription

493
00:29:30,880 --> 00:29:34,520
Speaker 1: would be hilariously off track, and most of the time

494
00:29:34,720 --> 00:29:37,200
Speaker 1: I wouldn't even be able to guess what the original

495
00:29:37,240 --> 00:29:40,800
Speaker 1: message was based off the transcription. It meant that I

496
00:29:40,840 --> 00:29:43,240
Speaker 1: would listen to the voicemail and then I would shake

497
00:29:43,280 --> 00:29:46,240
Speaker 1: my head a lot as I would read the transcription

498
00:29:46,320 --> 00:29:48,520
Speaker 1: at the same time and just see how far off

499
00:29:48,600 --> 00:29:53,320
Speaker 1: it was. This is a big challenge for voice recognition programs.

500
00:29:53,560 --> 00:29:57,280
Speaker 1: There are a lot of different dialects and accents. People

501
00:29:57,320 --> 00:30:01,080
Speaker 1: from different regions within the same country can sound very

502
00:30:01,160 --> 00:30:04,680
Speaker 1: different even if they're speaking the exact same language. If

503
00:30:04,680 --> 00:30:08,760
Speaker 1: you get someone from Savannah, Georgia, a native of Savannah, Georgia,

504
00:30:09,000 --> 00:30:12,960
Speaker 1: and a native from Boston, Massachusetts, they're going to be

505
00:30:13,000 --> 00:30:15,600
Speaker 1: able to have a conversation with each other, but they

506
00:30:15,640 --> 00:30:19,280
Speaker 1: will end up saying the same words very differently from

507
00:30:19,280 --> 00:30:22,880
Speaker 1: one another. And that's before you even start talking about

508
00:30:22,960 --> 00:30:26,760
Speaker 1: people who have a different native language, who have learned

509
00:30:26,800 --> 00:30:30,560
Speaker 1: English and have a foreign accent on top of the

510
00:30:30,560 --> 00:30:34,120
Speaker 1: English they speak. There's no hard and fast rule you

511
00:30:34,160 --> 00:30:37,640
Speaker 1: can create for a voice recognition program to follow to

512
00:30:37,800 --> 00:30:42,040
Speaker 1: interpret speech correctly throughout a language. Because there's so much

513
00:30:42,120 --> 00:30:45,000
Speaker 1: variation in how the words in that language are said,

514
00:30:45,600 --> 00:30:49,479
Speaker 1: training the model becomes a challenge. So one thing you

515
00:30:49,560 --> 00:30:53,960
Speaker 1: can do is you have a human being transcribe spoken

516
00:30:54,000 --> 00:30:59,600
Speaker 1: words and then compare the human transcription against the machine

517
00:30:59,680 --> 00:31:03,120
Speaker 1: produced transcription in an effort to train your model to

518
00:31:03,200 --> 00:31:07,840
Speaker 1: be more effective. Humans are pretty good, though not perfect,

519
00:31:08,000 --> 00:31:11,800
Speaker 1: at figuring out what some other human says, assuming both

520
00:31:11,840 --> 00:31:15,200
Speaker 1: parties are fluent in the same language. By comparing these

521
00:31:15,200 --> 00:31:17,800
Speaker 1: two records against each other and then making corrections to

522
00:31:17,840 --> 00:31:21,560
Speaker 1: the model, computer scientists can tweak their voice recognition software

523
00:31:21,560 --> 00:31:25,479
Speaker 1: models to be more accurate. Now, ideally you would do

524
00:31:25,520 --> 00:31:29,440
Speaker 1: this before unleashing such a system on the public, but

525
00:31:29,760 --> 00:31:33,360
Speaker 1: that's not really that practical. There is no in lab

526
00:31:33,520 --> 00:31:36,280
Speaker 1: project that is going to come close to generating the

527
00:31:36,360 --> 00:31:39,800
Speaker 1: amount of data and the sheer variety that you will

528
00:31:39,880 --> 00:31:43,360
Speaker 1: encounter out in the real world. Improving the model would

529
00:31:43,360 --> 00:31:47,360
Speaker 1: happen much faster with a larger sample of subjects using

530
00:31:47,480 --> 00:31:50,520
Speaker 1: the model, and a billion or so people is a

531
00:31:50,560 --> 00:31:55,400
Speaker 1: pretty darn big sample size. But that means sending these

532
00:31:55,440 --> 00:31:59,320
Speaker 1: audio files to humans in the first place. And Facebook

533
00:31:59,320 --> 00:32:02,520
Speaker 1: has said that the files were anonymized so that there

534
00:32:02,560 --> 00:32:06,240
Speaker 1: was no identifiable name or anything associated with each of

535
00:32:06,240 --> 00:32:09,440
Speaker 1: the audio files being sent for human review. But hey,

536
00:32:09,600 --> 00:32:12,360
Speaker 1: I hear you say. Earlier in this episode, you pointed

537
00:32:12,360 --> 00:32:14,480
Speaker 1: out how it's possible to really get an idea about

538
00:32:14,480 --> 00:32:18,640
Speaker 1: a person just from the other data they provide, and

539
00:32:18,720 --> 00:32:22,520
Speaker 1: you'd be right. These audio files had all sorts of

540
00:32:22,560 --> 00:32:25,480
Speaker 1: different types of content in them, some of it was

541
00:32:25,600 --> 00:32:30,719
Speaker 1: likely upsetting, disturbing, or inappropriate. Contractors who had been hired

542
00:32:30,760 --> 00:32:34,320
Speaker 1: to do the transcription came forward anonymously, I might add,

543
00:32:34,320 --> 00:32:36,520
Speaker 1: because they didn't want to get fired from their jobs,

544
00:32:36,920 --> 00:32:40,040
Speaker 1: and said they felt that the practice was an unethical one.

545
00:32:40,280 --> 00:32:42,680
Speaker 1: And media outlets looked into it and their conclusions were

546
00:32:42,680 --> 00:32:45,480
Speaker 1: pretty much the same right down the board: Facebook was

547
00:32:45,600 --> 00:32:49,440
Speaker 1: not transparent about what was happening with this audio, and

548
00:32:49,440 --> 00:32:52,680
Speaker 1: there were no clear indications to users that their audio

549
00:32:52,680 --> 00:32:55,480
Speaker 1: files might get sent to some stranger for the purposes

550
00:32:55,520 --> 00:32:59,280
Speaker 1: of transcription. Now, for its part, Facebook said it halted

551
00:32:59,280 --> 00:33:03,080
Speaker 1: the practice in early August two thousand nineteen, and third

552
00:33:03,120 --> 00:33:06,280
Speaker 1: party contractors have said that that is true that they

553
00:33:06,320 --> 00:33:09,480
Speaker 1: no longer are doing this work for Facebook. Facebook isn't

554
00:33:09,480 --> 00:33:11,680
Speaker 1: the only company to come under scrutiny for this kind

555
00:33:11,720 --> 00:33:15,320
Speaker 1: of thing. Google, Apple, and Microsoft have also been under

556
00:33:15,320 --> 00:33:18,880
Speaker 1: the microscope for very similar practices. Now, on the one hand,

557
00:33:19,320 --> 00:33:22,160
Speaker 1: it's understandable that these companies want to improve their voice

558
00:33:22,200 --> 00:33:26,280
Speaker 1: recognition capabilities. It's what makes these apps and products useful

559
00:33:26,720 --> 00:33:29,640
Speaker 1: and makes it more useful to a wider variety of

560
00:33:29,680 --> 00:33:33,120
Speaker 1: people by training the models on this stuff. But the

561
00:33:33,160 --> 00:33:37,040
Speaker 1: privacy concerns remain and it's something that isn't just troubling

562
00:33:37,080 --> 00:33:39,640
Speaker 1: to users, but to the people actually being paid to

563
00:33:39,720 --> 00:33:42,480
Speaker 1: transcribe the stuff in the first place. Now, it would

564
00:33:42,520 --> 00:33:46,160
Speaker 1: be another matter if the companies were transparent about this practice.

565
00:33:46,480 --> 00:33:50,040
Speaker 1: If users knew that there's a chance a real, live

566
00:33:50,120 --> 00:33:52,200
Speaker 1: human being would be listening in on some of those

567
00:33:52,240 --> 00:33:55,680
Speaker 1: voice messages for the purposes of quality control for the

568
00:33:55,760 --> 00:33:59,000
Speaker 1: voice to text feature, maybe they wouldn't opt into using

569
00:33:59,000 --> 00:34:01,239
Speaker 1: the voice to text in the first place, or they

570
00:34:01,320 --> 00:34:05,080
Speaker 1: might opt in and not care. In some cases, I'm

571
00:34:05,080 --> 00:34:07,120
Speaker 1: sure there'd be no shortage of people who would actually

572
00:34:07,160 --> 00:34:11,680
Speaker 1: say truly terrible things, hoping that some poor contractor would

573
00:34:11,719 --> 00:34:13,760
Speaker 1: have to listen to it all and check the audio

574
00:34:13,800 --> 00:34:18,480
Speaker 1: against the automated transcription, because some people would just play nasty.

575
00:34:18,880 --> 00:34:21,480
Speaker 1: Don't be nasty. By the way, there are better ways

576
00:34:21,480 --> 00:34:24,759
Speaker 1: to entertain yourself than by making some other person's life miserable.

577
00:34:25,560 --> 00:34:30,480
Speaker 1: Facebook could potentially face some serious charges based on this practice.

578
00:34:30,880 --> 00:34:34,279
Speaker 1: The company had settled with the Federal Trade Commission, or FTC,

579
00:34:35,000 --> 00:34:38,320
Speaker 1: earlier in the summer of two thousand nineteen. The settlement

580
00:34:38,400 --> 00:34:43,040
Speaker 1: was for an incredible five billion dollars, and it largely

581
00:34:43,040 --> 00:34:47,400
Speaker 1: revolved around the company's rather abysmal record with privacy. The

582
00:34:47,520 --> 00:34:50,520
Speaker 1: charges date all the way back to two thousand twelve,

583
00:34:50,800 --> 00:34:55,440
Speaker 1: when the FTC brought eight privacy related allegations against Facebook.

584
00:34:55,920 --> 00:34:59,239
Speaker 1: And again, this isn't a big surprise. Zuckerberg had already

585
00:34:59,360 --> 00:35:03,759
Speaker 1: cavalierly proclaimed privacy dead a couple of years before that. Now,

586
00:35:03,760 --> 00:35:07,120
Speaker 1: in the settlement, Facebook agreed to adhere to some rules.

587
00:35:07,400 --> 00:35:11,440
Speaker 1: Those rules said that Facebook was prohibited from making misrepresentations

588
00:35:11,520 --> 00:35:15,920
Speaker 1: about the privacy or security of consumers' information, prohibited from

589
00:35:15,960 --> 00:35:20,120
Speaker 1: misrepresenting the extent to which it shares personal data, and

590
00:35:20,239 --> 00:35:24,560
Speaker 1: it required Facebook to implement a reasonable privacy program. Now

591
00:35:24,600 --> 00:35:28,319
Speaker 1: I'm no legal expert, not by a long shot, but

592
00:35:28,400 --> 00:35:32,200
Speaker 1: it seems to me that Facebook's failure to alert users

593
00:35:32,280 --> 00:35:34,640
Speaker 1: that their voice to text data could be sent to

594
00:35:34,760 --> 00:35:39,440
Speaker 1: non Facebook employees for review is in violation of this agreement.

595
00:35:39,880 --> 00:35:43,080
Speaker 1: That Facebook agreed to these terms in July two thousand nineteen,

596
00:35:43,520 --> 00:35:47,640
Speaker 1: and then continued the practice into August is a big problem.

597
00:35:47,680 --> 00:35:50,160
Speaker 1: Whether or not it will result in further legal action

598
00:35:50,480 --> 00:35:53,840
Speaker 1: against this company is unknown as I record this episode,

599
00:35:54,040 --> 00:35:57,440
Speaker 1: but it seems like it's at least possible, So I'm

600
00:35:57,440 --> 00:36:00,160
Speaker 1: gonna wrap this up. We know that microphones can sit

601
00:36:00,239 --> 00:36:02,440
Speaker 1: in on us without our knowledge. The NSA

602
00:36:02,560 --> 00:36:05,759
Speaker 1: worked on programs in the United States that did exactly that.

603
00:36:06,239 --> 00:36:09,120
Speaker 1: And while companies with virtual personal assistants tell us that

604
00:36:09,160 --> 00:36:13,399
Speaker 1: those assistants only activate when certain phrases are spoken, it's

605
00:36:13,440 --> 00:36:16,760
Speaker 1: also possible that that list of phrases could go well

606
00:36:16,840 --> 00:36:20,480
Speaker 1: beyond the ones published by the company. So, in other words,

607
00:36:20,880 --> 00:36:24,799
Speaker 1: I might know that to wake up my hypothetical virtual assistant,

608
00:36:25,080 --> 00:36:28,759
Speaker 1: I would have to say the alert phrase Skynet, awaken,

609
00:36:29,200 --> 00:36:31,520
Speaker 1: and then it pays attention. But what if there's a

610
00:36:31,560 --> 00:36:35,680
Speaker 1: whole laundry list of other words or phrases that could

611
00:36:35,719 --> 00:36:38,880
Speaker 1: wake it up so that it records or transcribes whatever

612
00:36:38,960 --> 00:36:43,040
Speaker 1: audio follows? What if, for example, the phrase shopping or

613
00:36:43,280 --> 00:36:48,240
Speaker 1: going shopping activates it so that whatever follows gets registered

614
00:36:48,280 --> 00:36:50,320
Speaker 1: by the device? So if I tell a friend, tomorrow

615
00:36:50,360 --> 00:36:53,839
Speaker 1: I'm going shopping for some new sneakers, the device has

616
00:36:53,880 --> 00:36:57,279
Speaker 1: registered the phrase new sneakers because it paid attention once

617
00:36:57,320 --> 00:37:00,200
Speaker 1: I said the words going shopping, and then I start seeing

618
00:37:00,200 --> 00:37:03,359
Speaker 1: ads pop up everywhere I go online for sneakers. Now,

619
00:37:03,440 --> 00:37:08,759
Speaker 1: is that something that's possible? Well, yeah, it's possible. That

620
00:37:08,800 --> 00:37:12,399
Speaker 1: doesn't mean it's happening, but it could be. It's also

621
00:37:12,440 --> 00:37:15,440
Speaker 1: possible that my other behaviors have indicated that I'm on

622
00:37:15,480 --> 00:37:19,160
Speaker 1: the lookout for some new kicks. Coincidence is a thing,

623
00:37:19,480 --> 00:37:23,319
Speaker 1: and it's frustrating because without seeing behind the scenes, it's

624
00:37:23,360 --> 00:37:28,120
Speaker 1: hard to draw any firm conclusions. Most of us, myself included,

625
00:37:28,400 --> 00:37:32,000
Speaker 1: have a limited understanding of exactly how much data we're

626
00:37:32,040 --> 00:37:34,719
Speaker 1: generating in our day to day lives and how that

627
00:37:34,840 --> 00:37:38,719
Speaker 1: data can be analyzed for patterns and predictions. We may

628
00:37:38,760 --> 00:37:42,080
Speaker 1: not even be aware that we're heading toward a particular

629
00:37:42,120 --> 00:37:46,840
Speaker 1: decision before an algorithm draws that conclusion, and it's spooky

630
00:37:46,960 --> 00:37:50,080
Speaker 1: and disturbing. But it doesn't necessarily mean that we're being

631
00:37:50,160 --> 00:37:53,440
Speaker 1: spied on by a microphone. It may mean we're just

632
00:37:53,520 --> 00:37:57,880
Speaker 1: broadcasting our decisions before we've known that we've made a decision,

633
00:37:58,600 --> 00:38:01,640
Speaker 1: and it does indicate that there is some sort of

634
00:38:02,000 --> 00:38:05,800
Speaker 1: eavesdropping going on, just not necessarily audio eavesdropping.

635
00:38:05,800 --> 00:38:09,800
Speaker 1: It's more about all of our other behaviors that humans

636
00:38:09,840 --> 00:38:11,919
Speaker 1: don't pick up on, so we've never had to worry

637
00:38:11,960 --> 00:38:14,840
Speaker 1: about it before, but machines can analyze it at a

638
00:38:14,920 --> 00:38:19,080
Speaker 1: level that is disturbing. In fact, an actual study at

639
00:38:19,120 --> 00:38:22,560
Speaker 1: Northeastern University looked into the possibility of whether or not

640
00:38:22,719 --> 00:38:26,960
Speaker 1: phones were getting activated by clandestine phrases and listening in

641
00:38:27,000 --> 00:38:30,400
Speaker 1: on conversations, and it found that there was no evidence

642
00:38:30,480 --> 00:38:32,920
Speaker 1: that this was happening. They did find that a lot

643
00:38:33,000 --> 00:38:36,360
Speaker 1: of apps were taking screenshots of stuff on phones and

644
00:38:36,400 --> 00:38:39,080
Speaker 1: sending those screenshots to third parties, though, so you know,

645
00:38:39,560 --> 00:38:44,600
Speaker 1: that's also disturbing, But it doesn't appear that these devices

646
00:38:44,600 --> 00:38:48,320
Speaker 1: are actively listening to you all the time and recording

647
00:38:48,400 --> 00:38:54,120
Speaker 1: or transcribing or broadcasting that information anywhere. There's a lot

648
00:38:54,200 --> 00:38:59,600
Speaker 1: to lose from doing that approach. The problem is it

649
00:38:59,800 --> 00:39:03,239
Speaker 1: is something that is possible, and the other problem is

650
00:39:03,280 --> 00:39:06,239
Speaker 1: that there are other behaviors we're doing that are just

651
00:39:06,320 --> 00:39:09,719
Speaker 1: as revealing, if not more so, than recording what it

652
00:39:09,840 --> 00:39:13,919
Speaker 1: is we're saying, and that without being aware of that,

653
00:39:14,360 --> 00:39:18,040
Speaker 1: we are just giving away more and more information about

654
00:39:18,040 --> 00:39:21,200
Speaker 1: ourselves and more and more control over our own lives.

655
00:39:21,360 --> 00:39:23,360
Speaker 1: And we're going to see more and more targeted ads

656
00:39:23,360 --> 00:39:27,400
Speaker 1: that seem super creepy because they're mentioning things that we

657
00:39:27,400 --> 00:39:31,359
Speaker 1: didn't think anyone knew about, because most people wouldn't pick

658
00:39:31,440 --> 00:39:35,080
Speaker 1: up on it. Fun times. So I don't think this

659
00:39:35,160 --> 00:39:39,800
Speaker 1: was a particularly you know, um, I don't think this

660
00:39:39,880 --> 00:39:44,440
Speaker 1: show really helps allay any fears. It may just switch

661
00:39:44,520 --> 00:39:48,759
Speaker 1: fears from microphones to everything else. But I did want

662
00:39:48,760 --> 00:39:50,920
Speaker 1: to cover this because a lot of people have been

663
00:39:50,960 --> 00:39:53,319
Speaker 1: talking about it for the last few years, and with

664
00:39:53,520 --> 00:39:59,560
Speaker 1: these transcription services that has brought the whole conversation back

665
00:39:59,640 --> 00:40:02,120
Speaker 1: into the forefront. So I wanted to take an

666
00:40:02,160 --> 00:40:05,080
Speaker 1: opportunity to really tackle it here on the show. If

667
00:40:05,120 --> 00:40:08,080
Speaker 1: you have a suggestion for a future episode of tech Stuff,

668
00:40:08,320 --> 00:40:10,920
Speaker 1: send me an email. The address is tech Stuff at how

669
00:40:11,000 --> 00:40:13,319
Speaker 1: stuff works dot com, or drop me a line by

670
00:40:13,640 --> 00:40:16,760
Speaker 1: going to tech stuff podcast dot com. You will find

671
00:40:16,920 --> 00:40:20,239
Speaker 1: there a link to all of our archived episodes, as

672
00:40:20,280 --> 00:40:23,120
Speaker 1: well as links to our presence on social media where

673
00:40:23,160 --> 00:40:25,120
Speaker 1: you can get in touch with us, and also a

674
00:40:25,160 --> 00:40:27,640
Speaker 1: link to our online store, where every purchase you make

675
00:40:27,760 --> 00:40:30,880
Speaker 1: goes to help the show. We greatly appreciate your support

676
00:40:31,400 --> 00:40:39,359
Speaker 1: and I will talk to you again really soon. Tech

677
00:40:39,400 --> 00:40:42,040
Speaker 1: Stuff is a production of I Heart Radio's How Stuff Works.

678
00:40:42,200 --> 00:40:45,040
Speaker 1: For more podcasts from I Heart Radio, visit the i

679
00:40:45,160 --> 00:40:48,360
Speaker 1: heart Radio app, Apple Podcasts, or wherever you listen to

680
00:40:48,400 --> 00:40:49,360
Speaker 1: your favorite shows.