WEBVTT - Shh! The Tech is Listening!

0:00:04.240 --> 0:00:07.240
<v Speaker 1>Welcome to TechStuff, a production of iHeartRadio's

0:00:07.320 --> 0:00:13.880
<v Speaker 1>HowStuffWorks. Hey there, and welcome to TechStuff.

0:00:13.880 --> 0:00:17.400
<v Speaker 1>I'm your host, Jonathan Strickland. I'm an executive producer with

0:00:17.600 --> 0:00:19.560
<v Speaker 1>iHeartRadio and HowStuffWorks, and I love

0:00:19.600 --> 0:00:24.200
<v Speaker 1>all things tech. And I'm sitting in the audience of

0:00:24.239 --> 0:00:28.240
<v Speaker 1>a local theater, like, a stage theater, not long ago. I'm

0:00:28.240 --> 0:00:31.440
<v Speaker 1>waiting for the show to start, and there's a song

0:00:31.720 --> 0:00:34.440
<v Speaker 1>that's playing over the sound system, and I'm really kind

0:00:34.440 --> 0:00:37.479
<v Speaker 1>of digging the song, but I totally don't recognize it.

0:00:38.040 --> 0:00:40.920
<v Speaker 1>And I glance down at my phone and I see

0:00:41.240 --> 0:00:44.320
<v Speaker 1>that on the phone below the time on the locked

0:00:44.400 --> 0:00:48.600
<v Speaker 1>phone screen, it says that the song is "Danger! High

0:00:48.680 --> 0:00:52.280
<v Speaker 1>Voltage" by Electric Six. Now this is obviously a hypothetical

0:00:52.280 --> 0:00:54.680
<v Speaker 1>example because I would recognize that song anywhere, but you

0:00:54.720 --> 0:00:57.959
<v Speaker 1>get the point. Anyway, I'm thinking, that's so cool. My

0:00:58.000 --> 0:01:01.640
<v Speaker 1>phone knows what songs are playing around me. That's so neat.

0:01:02.360 --> 0:01:05.000
<v Speaker 1>I didn't even have to tell it to do anything. And

0:01:05.040 --> 0:01:07.760
<v Speaker 1>then a couple of hours later, as I think back

0:01:07.800 --> 0:01:11.560
<v Speaker 1>on this moment, uncertainty and dread start to seep in:

0:01:11.680 --> 0:01:15.240
<v Speaker 1>wait a minute, if my phone can identify a song

0:01:15.440 --> 0:01:18.400
<v Speaker 1>that's playing around me, that means my phone is actually

0:01:18.440 --> 0:01:21.319
<v Speaker 1>listening to stuff. It wouldn't be able to tell me

0:01:21.680 --> 0:01:23.920
<v Speaker 1>the song title otherwise. It has to be able to

0:01:23.959 --> 0:01:26.959
<v Speaker 1>pick up the audio. I didn't activate any app. I

0:01:26.959 --> 0:01:30.880
<v Speaker 1>didn't turn on Shazam or ask my phone or anything.

0:01:30.920 --> 0:01:33.560
<v Speaker 1>My phone did it by itself. So my phone is

0:01:33.600 --> 0:01:36.800
<v Speaker 1>detecting the sounds around it even when it's not in

0:01:36.920 --> 0:01:41.280
<v Speaker 1>an active mode. Now, on a similar note, I'm sure

0:01:41.440 --> 0:01:45.640
<v Speaker 1>we all have had these personal assistant experiences out there.

0:01:45.680 --> 0:01:48.520
<v Speaker 1>Whether we use one ourselves or we've been around when someone

0:01:48.520 --> 0:01:52.880
<v Speaker 1>else uses them, things like Google Assistant or Alexa or

0:01:52.920 --> 0:01:56.120
<v Speaker 1>Siri or Cortana. There's more of them out there. You

0:01:56.160 --> 0:01:59.200
<v Speaker 1>can activate these assistants with a specific word or phrase,

0:01:59.560 --> 0:02:01.640
<v Speaker 1>and then you speak to them to carry out some

0:02:01.680 --> 0:02:04.560
<v Speaker 1>sort of task or to get you some sort of

0:02:04.560 --> 0:02:07.400
<v Speaker 1>information or something along those lines. We've got a Google

0:02:07.440 --> 0:02:10.200
<v Speaker 1>Home device in our house, so we might use it

0:02:10.240 --> 0:02:13.480
<v Speaker 1>to get a quick rundown on the weather report. We

0:02:13.560 --> 0:02:15.360
<v Speaker 1>might ask it to play a track off an album

0:02:15.360 --> 0:02:19.000
<v Speaker 1>by the jazz fusion band Weather Report. But wait, that

0:02:19.080 --> 0:02:22.120
<v Speaker 1>means that device is listening too. We didn't have to

0:02:22.120 --> 0:02:24.280
<v Speaker 1>take any physical action. We didn't have to push a

0:02:24.320 --> 0:02:27.560
<v Speaker 1>button to make it work. We just spoke the keyword

0:02:27.720 --> 0:02:31.160
<v Speaker 1>or a key phrase, and off it goes. And then

0:02:31.200 --> 0:02:34.760
<v Speaker 1>we get into stuff that seems super creepy. And I'm

0:02:34.800 --> 0:02:37.240
<v Speaker 1>sure most of you have had some sort of experience

0:02:37.280 --> 0:02:40.840
<v Speaker 1>like this. Say you're chatting with friends, maybe you're at

0:02:40.880 --> 0:02:44.400
<v Speaker 1>a restaurant or you're just hanging out, and you're talking

0:02:44.440 --> 0:02:47.480
<v Speaker 1>about this new snack food you just heard about, and

0:02:47.520 --> 0:02:50.519
<v Speaker 1>this is just one part of a conversation that rambles

0:02:50.560 --> 0:02:55.200
<v Speaker 1>all over the place. But then you talk a little

0:02:55.200 --> 0:02:56.840
<v Speaker 1>bit about the snack food for a couple of minutes.

0:02:56.840 --> 0:02:58.760
<v Speaker 1>You're like, you've heard about it, you wanted to try it,

0:02:58.880 --> 0:03:01.080
<v Speaker 1>you haven't tried it yet. Later on, you pop on

0:03:01.120 --> 0:03:03.079
<v Speaker 1>over to Facebook, and as you're scrolling through your feed,

0:03:03.160 --> 0:03:06.440
<v Speaker 1>there it is. There's an ad for the very same

0:03:06.480 --> 0:03:09.480
<v Speaker 1>snack food you mentioned to your friends just a little

0:03:09.480 --> 0:03:13.240
<v Speaker 1>earlier that day. You've never purchased the snack as far

0:03:13.280 --> 0:03:15.520
<v Speaker 1>as you remember, you haven't even searched for it on

0:03:15.560 --> 0:03:19.240
<v Speaker 1>the web, and there's the ad. So is Facebook listening

0:03:19.280 --> 0:03:22.200
<v Speaker 1>in on your conversation in an effort to serve up

0:03:22.240 --> 0:03:26.680
<v Speaker 1>a laser-focused targeted ad? On this episode, we're gonna

0:03:26.680 --> 0:03:29.840
<v Speaker 1>take a look at the technology that allows our devices

0:03:29.880 --> 0:03:33.320
<v Speaker 1>to listen in on us, and we'll explore the studies

0:03:33.320 --> 0:03:36.200
<v Speaker 1>about whether or not anything hinky is going on and

0:03:36.200 --> 0:03:40.400
<v Speaker 1>try to separate fact from FUD, F-U-D. That's fear,

0:03:40.520 --> 0:03:44.240
<v Speaker 1>uncertainty and doubt. And we'll also chat about some recent

0:03:44.320 --> 0:03:47.120
<v Speaker 1>news stories about how big companies have been handing over

0:03:47.160 --> 0:03:51.280
<v Speaker 1>audio messages to third party human contractors and what that

0:03:51.360 --> 0:03:55.680
<v Speaker 1>means in terms of privacy and ethics. Now, first, let's

0:03:55.720 --> 0:04:00.160
<v Speaker 1>address a big reason why devices aren't constantly recording or

0:04:00.200 --> 0:04:05.520
<v Speaker 1>broadcasting all the sounds within an environment that's reachable by microphone.

0:04:06.320 --> 0:04:10.840
<v Speaker 1>It's because that's truly enormous. Like, that's a huge amount

0:04:10.960 --> 0:04:14.040
<v Speaker 1>of data. So let's just take Facebook as an example.

0:04:14.680 --> 0:04:18.360
<v Speaker 1>There are more than two billion people using Facebook every month.

0:04:18.880 --> 0:04:21.080
<v Speaker 1>At least one and a half billion people pop on

0:04:21.080 --> 0:04:24.400
<v Speaker 1>Facebook every single day. Now that's not necessarily the same

0:04:24.880 --> 0:04:27.680
<v Speaker 1>one and a half billion people every day, but every

0:04:27.760 --> 0:04:31.640
<v Speaker 1>day one point five billion people check Facebook, and out

0:04:31.640 --> 0:04:35.400
<v Speaker 1>of that number, nearly one billion of them are accessing

0:04:35.440 --> 0:04:40.360
<v Speaker 1>Facebook on mobile devices. So, just from a data management standpoint,

0:04:41.040 --> 0:04:45.240
<v Speaker 1>there's no way any company, even one as large as Facebook,

0:04:45.400 --> 0:04:49.279
<v Speaker 1>could be actively monitoring, recording, or even analyzing all that

0:04:49.360 --> 0:04:54.080
<v Speaker 1>audio that would be coming in from a billion mobile handsets.

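NOTE
A rough back-of-envelope sketch of the scale described above. The sample rate, bit depth, lack of compression, and nonstop recording are all assumptions picked for illustration; only the round figure of a billion handsets comes from the episode.
```python
# Back-of-envelope estimate. Every constant except HANDSETS is an assumption, not a figure from the episode.
SAMPLE_RATE_HZ = 16_000          # assumed speech-quality sample rate
BYTES_PER_SAMPLE = 2             # assumed 16-bit mono PCM, uncompressed
SECONDS_PER_DAY = 24 * 60 * 60   # assumes the microphone never stops recording
HANDSETS = 1_000_000_000         # "a billion mobile handsets"
def raw_audio_bytes_per_day(handsets: int = HANDSETS) -> int:
    """Bytes of uncompressed audio produced per day if every handset recorded nonstop."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * SECONDS_PER_DAY * handsets
per_phone_gb = raw_audio_bytes_per_day(1) / 1e9
total_eb = raw_audio_bytes_per_day() / 1e18
print(f"{per_phone_gb:.2f} GB per phone per day, {total_eb:.2f} EB per day across all handsets")
```
Under these assumptions that is about 2.76 GB per phone per day, or roughly 2.76 exabytes per day in total. Compression would shrink the number, but the order of magnitude is the point.
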
0:04:54.960 --> 0:04:56.960
<v Speaker 1>We are in the age of big data, but we

0:04:57.040 --> 0:04:59.640
<v Speaker 1>still have our limits. Plus you'd have to figure out

0:05:00.240 --> 0:05:03.520
<v Speaker 1>that, you know, of that large amount of data, most

0:05:03.560 --> 0:05:06.640
<v Speaker 1>of it wouldn't be useful to Facebook. Now, don't get

0:05:06.640 --> 0:05:08.880
<v Speaker 1>me wrong. At the end of the day, you and

0:05:08.960 --> 0:05:14.000
<v Speaker 1>I are the products being bought and sold on Facebook

0:05:14.080 --> 0:05:19.240
<v Speaker 1>and Google and other providers out there. We're potential customers

0:05:19.279 --> 0:05:22.720
<v Speaker 1>for all of the advertisers that use those companies like

0:05:22.760 --> 0:05:26.839
<v Speaker 1>Facebook as a platform. So it benefits the advertisers and

0:05:27.040 --> 0:05:31.120
<v Speaker 1>Facebook and sometimes even us as customers to match the

0:05:31.200 --> 0:05:34.360
<v Speaker 1>right ads to the right people. So there's definitely an

0:05:34.400 --> 0:05:37.880
<v Speaker 1>incentive to learn as much about users as possible to

0:05:38.000 --> 0:05:42.200
<v Speaker 1>leverage their interests and potentially convert them into paying customers

0:05:42.240 --> 0:05:45.960
<v Speaker 1>to an advertiser. Now, this is the very basic foundation

0:05:46.080 --> 0:05:50.520
<v Speaker 1>of Facebook's business model. So if Facebook could do this

0:05:50.839 --> 0:05:54.160
<v Speaker 1>from a technical standpoint, and if the company could get

0:05:54.200 --> 0:05:58.400
<v Speaker 1>away with it from a public perception standpoint, I think

0:05:58.400 --> 0:06:03.000
<v Speaker 1>there's little doubt that Facebook would do it. But honestly,

0:06:03.000 --> 0:06:05.440
<v Speaker 1>it's just way too much information to process and to

0:06:05.480 --> 0:06:09.200
<v Speaker 1>boil down into actionable plans. We talk about a lot

0:06:09.200 --> 0:06:12.080
<v Speaker 1>of stuff in our day, you know, and some of

0:06:12.120 --> 0:06:14.159
<v Speaker 1>it we may not really be interested in. We're just

0:06:14.200 --> 0:06:17.839
<v Speaker 1>talking about something, So it wouldn't do Facebook any good

0:06:17.839 --> 0:06:20.239
<v Speaker 1>to serve up ads for stuff that we weren't actually

0:06:20.279 --> 0:06:22.880
<v Speaker 1>really interested in, So it has to pick and choose

0:06:22.880 --> 0:06:27.360
<v Speaker 1>its moments. Facebook has denied using phone microphones in this way.

0:06:27.720 --> 0:06:30.320
<v Speaker 1>In a June 2, 2016 blog post on

0:06:30.360 --> 0:06:34.280
<v Speaker 1>the Facebook newsroom site, a company representative wrote this, and

0:06:34.320 --> 0:06:39.720
<v Speaker 1>here's a quote. Facebook does not use your phone's microphone

0:06:39.760 --> 0:06:42.359
<v Speaker 1>to inform ads or to change what you see in

0:06:42.440 --> 0:06:45.800
<v Speaker 1>news feed. Some recent articles have suggested that we must

0:06:45.839 --> 0:06:48.280
<v Speaker 1>be listening to people's conversations in order to show them

0:06:48.279 --> 0:06:52.360
<v Speaker 1>relevant ads. This is not true. We show ads based

0:06:52.400 --> 0:06:56.400
<v Speaker 1>on people's interests and other profile information, not what you're

0:06:56.400 --> 0:07:00.160
<v Speaker 1>talking out loud about. We only access your microphone if

0:07:00.200 --> 0:07:02.560
<v Speaker 1>you have given our app permission, and if you are

0:07:02.600 --> 0:07:06.560
<v Speaker 1>actively using a specific feature that requires audio. This might

0:07:06.600 --> 0:07:09.600
<v Speaker 1>include recording a video or using an optional feature we

0:07:09.640 --> 0:07:12.560
<v Speaker 1>introduced two years ago to include music or other audio

0:07:12.600 --> 0:07:18.240
<v Speaker 1>in your status updates. End quote. Now, it's understandable that

0:07:18.320 --> 0:07:22.200
<v Speaker 1>people would be a bit skeptical regarding Facebook's claims of innocence

0:07:22.520 --> 0:07:25.840
<v Speaker 1>in this regard. The company has had several high-profile

0:07:25.920 --> 0:07:29.840
<v Speaker 1>scandals and issues with privacy and security. Zuckerberg himself once

0:07:29.960 --> 0:07:35.240
<v Speaker 1>famously declared that privacy is dead. Also, he simultaneously does

0:07:35.280 --> 0:07:38.400
<v Speaker 1>his best to preserve his own privacy. But that's commentary

0:07:38.440 --> 0:07:42.400
<v Speaker 1>for another episode. So I don't blame people for thinking

0:07:42.440 --> 0:07:45.480
<v Speaker 1>that Facebook might actually be listening in on conversations because

0:07:45.480 --> 0:07:48.880
<v Speaker 1>the company has already proven it hasn't been the best

0:07:49.000 --> 0:07:52.640
<v Speaker 1>steward of user privacy in the past. But that doesn't

0:07:52.680 --> 0:07:56.040
<v Speaker 1>mean the company has actually been spying on people. It

0:07:56.080 --> 0:08:00.480
<v Speaker 1>doesn't have to, at least not in that way. And

0:08:00.720 --> 0:08:03.680
<v Speaker 1>this is where we get into some troubling territory because

0:08:03.720 --> 0:08:06.200
<v Speaker 1>it's where we start to learn how services like Google

0:08:06.280 --> 0:08:10.880
<v Speaker 1>and Facebook and others can glean information about us, whether

0:08:10.960 --> 0:08:14.240
<v Speaker 1>we have consciously shared that information or not, and it

0:08:14.240 --> 0:08:17.840
<v Speaker 1>helps explain how these companies can advertise to us so effectively.

0:08:18.640 --> 0:08:22.200
<v Speaker 1>One way Facebook does this is with an innovation called

0:08:22.360 --> 0:08:26.640
<v Speaker 1>Facebook Pixel. Now, this is a piece of code that

0:08:27.000 --> 0:08:32.320
<v Speaker 1>Facebook's clients, advertisers really, can put on their own websites.

0:08:32.720 --> 0:08:35.600
<v Speaker 1>So it's the type of code you would insert into

0:08:35.640 --> 0:08:38.040
<v Speaker 1>the website for a business. So let's say you own

0:08:38.080 --> 0:08:42.359
<v Speaker 1>a specialty niche marketing shop. We'll say you sell figurines

0:08:42.400 --> 0:08:46.200
<v Speaker 1>based off of iconic horror movie monsters and characters, and

0:08:46.240 --> 0:08:49.200
<v Speaker 1>you're going to advertise on Facebook. The Pixel code is

0:08:49.240 --> 0:08:52.920
<v Speaker 1>one way Facebook can optimize that experience. The code pulls

0:08:52.960 --> 0:08:57.320
<v Speaker 1>information off of user behavior on your website and sends

0:08:57.320 --> 0:09:00.760
<v Speaker 1>it to Facebook. If people click over to your site

0:09:00.760 --> 0:09:03.560
<v Speaker 1>because of an ad on Facebook, Pixel will register it.

0:09:04.000 --> 0:09:07.120
<v Speaker 1>This helps you see how effective or ineffective your ads

0:09:07.200 --> 0:09:10.800
<v Speaker 1>are on the site. It also can target your ads

0:09:10.920 --> 0:09:13.520
<v Speaker 1>to people on Facebook who would be most likely to

0:09:13.600 --> 0:09:17.160
<v Speaker 1>click on those ads. It might analyze the traits common

0:09:17.200 --> 0:09:19.600
<v Speaker 1>to people who are interacting with your ads, and then

0:09:19.640 --> 0:09:22.760
<v Speaker 1>extrapolate that to target people who have similar traits and

0:09:22.880 --> 0:09:27.920
<v Speaker 1>behaviors but they haven't yet seen your advertisements. Facebook, meanwhile,

0:09:28.040 --> 0:09:30.360
<v Speaker 1>can also use that data to serve up ads from

0:09:30.400 --> 0:09:33.559
<v Speaker 1>other companies to users based on similar findings, and it

0:09:33.640 --> 0:09:36.400
<v Speaker 1>can track other stuff too. Let's say you click over

0:09:36.480 --> 0:09:38.880
<v Speaker 1>to an article on a blog or news site that

0:09:38.960 --> 0:09:42.680
<v Speaker 1>incorporates Facebook Pixel in the site's code. Facebook can see

0:09:42.679 --> 0:09:45.160
<v Speaker 1>how long you were on that article, which in turn

0:09:45.200 --> 0:09:48.600
<v Speaker 1>indicates your interest and investment level in that topic. Then

0:09:48.640 --> 0:09:51.640
<v Speaker 1>Facebook can serve up ads related to the contents of

0:09:51.679 --> 0:09:54.920
<v Speaker 1>that article to you. In the end, it's all about

0:09:54.920 --> 0:09:58.760
<v Speaker 1>analyzing user behavior to get the biggest return on investment,

0:09:59.080 --> 0:10:01.800
<v Speaker 1>and it doesn't require using the microphone to do it.

0:10:02.160 --> 0:10:05.000
<v Speaker 1>They can just look at who you are, where you've been,

0:10:05.440 --> 0:10:09.280
<v Speaker 1>both in real life if it's tracking your location and

0:10:09.360 --> 0:10:12.720
<v Speaker 1>on the Internet if it's tracking your browsing and

0:10:12.800 --> 0:10:15.600
<v Speaker 1>who your friends are. And all of this information combined

0:10:16.000 --> 0:10:19.240
<v Speaker 1>gives Facebook a ton of data about what kind of

0:10:19.280 --> 0:10:21.920
<v Speaker 1>ads to target towards you. Now, on top of that,

0:10:22.200 --> 0:10:26.120
<v Speaker 1>Facebook can purchase information from data brokers to supplement its

0:10:26.120 --> 0:10:29.400
<v Speaker 1>own gargantuan database. There are companies that manage

0:10:29.400 --> 0:10:33.160
<v Speaker 1>stuff like loyalty programs, which also track what you buy.

0:10:33.360 --> 0:10:36.000
<v Speaker 1>They have to for the loyalty programs to work, and

0:10:36.040 --> 0:10:39.400
<v Speaker 1>those purchases are linked to you as a person. They know, Oh,

0:10:39.480 --> 0:10:42.480
<v Speaker 1>Jonathan goes to Starbucks all the time and he always

0:10:42.480 --> 0:10:45.520
<v Speaker 1>gets those Nitro cold brews, So let's put an ad

0:10:46.000 --> 0:10:49.720
<v Speaker 1>that targets him based on that information. Now, that data

0:10:49.800 --> 0:10:51.920
<v Speaker 1>isn't just being used to help you get the best

0:10:52.200 --> 0:10:56.080
<v Speaker 1>deal on whatever it happens to be. That information is valuable.

0:10:56.559 --> 0:11:00.480
<v Speaker 1>So companies that manage these loyalty programs can and do

0:11:00.840 --> 0:11:03.600
<v Speaker 1>buy and sell that data. You know, our spending

0:11:03.640 --> 0:11:07.400
<v Speaker 1>habits are part of this sort of encyclopedia entry about

0:11:07.400 --> 0:11:11.080
<v Speaker 1>our interests, priorities, and behaviors. Now, none of this needs

0:11:11.200 --> 0:11:15.200
<v Speaker 1>to use a microphone to spy on us. So in

0:11:15.240 --> 0:11:17.800
<v Speaker 1>the case of seeing that snack food pop up on

0:11:17.800 --> 0:11:20.480
<v Speaker 1>the Facebook feed, it could simply be that you exhibit

0:11:20.559 --> 0:11:23.520
<v Speaker 1>behaviors similar to ones that people who have bought that

0:11:23.600 --> 0:11:26.200
<v Speaker 1>snack food tend to have as well. You've liked the

0:11:26.240 --> 0:11:29.480
<v Speaker 1>same sort of pages. You may even have a lot

0:11:29.520 --> 0:11:32.080
<v Speaker 1>of friends who have already bought this stuff. You may

0:11:32.120 --> 0:11:34.959
<v Speaker 1>live in a region where it has recently been introduced.

0:11:35.360 --> 0:11:37.600
<v Speaker 1>These are the kinds of points of data that Facebook

0:11:37.679 --> 0:11:39.320
<v Speaker 1>might use in order to serve that ad up to

0:11:39.360 --> 0:11:41.840
<v Speaker 1>you that have nothing to do with your microphone. So

0:11:41.880 --> 0:11:44.640
<v Speaker 1>you got the ad not because you talked about the

0:11:44.640 --> 0:11:47.760
<v Speaker 1>snack food, but because Facebook has sussed out you're the

0:11:47.760 --> 0:11:50.640
<v Speaker 1>type of person who would like that snack food because

0:11:51.400 --> 0:11:54.360
<v Speaker 1>spoiler alert, You're not as special as you think you are,

0:11:54.880 --> 0:11:57.600
<v Speaker 1>and I'm not as special as I think I am.

0:11:57.640 --> 0:12:00.080
<v Speaker 1>Now you could argue, and I would agree with you

0:12:00.160 --> 0:12:03.480
<v Speaker 1>on this, that what Facebook is doing is at least

0:12:03.559 --> 0:12:06.520
<v Speaker 1>as creepy as listening in on a microphone, perhaps even

0:12:06.600 --> 0:12:10.760
<v Speaker 1>more so. Facebook has filed patents that focus on technologies

0:12:10.840 --> 0:12:13.200
<v Speaker 1>meant to predict where you're going to go next

0:12:13.559 --> 0:12:16.400
<v Speaker 1>based on your history of location data. So, in other words,

0:12:16.640 --> 0:12:19.160
<v Speaker 1>Facebook is trying to figure out where you're going to

0:12:19.240 --> 0:12:23.000
<v Speaker 1>go before you go there. And it's not just you,

0:12:23.160 --> 0:12:25.680
<v Speaker 1>it's all the people you know who are using Facebook

0:12:25.720 --> 0:12:29.440
<v Speaker 1>two and so it's not just predicting where you'll go,

0:12:30.120 --> 0:12:33.600
<v Speaker 1>it's also predicting which people you may be running into,

0:12:33.679 --> 0:12:35.800
<v Speaker 1>because it's predicting those people are going to go to

0:12:35.840 --> 0:12:38.560
<v Speaker 1>that same place and whether or not you might encounter

0:12:38.679 --> 0:12:41.199
<v Speaker 1>one another. It can also use that to make suggestions

0:12:41.240 --> 0:12:44.480
<v Speaker 1>to add people on Facebook who are going to those

0:12:44.520 --> 0:12:48.240
<v Speaker 1>same places so that they become your friends online. Now

0:12:48.240 --> 0:12:51.400
<v Speaker 1>why does Facebook care who your friends are? Because the

0:12:51.440 --> 0:12:55.120
<v Speaker 1>more people who use Facebook and the more interconnected they become,

0:12:55.640 --> 0:12:59.480
<v Speaker 1>the more useful the information they generate for Facebook. That

0:12:59.720 --> 0:13:03.640
<v Speaker 1>ends up becoming more valuable to the company. So

0:13:05.040 --> 0:13:07.480
<v Speaker 1>it is pretty creepy and invasive, and it doesn't have

0:13:07.520 --> 0:13:10.439
<v Speaker 1>to use the microphone. But when we come back, I'll

0:13:10.440 --> 0:13:13.040
<v Speaker 1>talk a bit more about these sound activated features and

0:13:13.080 --> 0:13:15.439
<v Speaker 1>what's actually going on, because there is some stuff we've

0:13:15.480 --> 0:13:17.760
<v Speaker 1>got to be worried about. But first, let's take a

0:13:17.880 --> 0:13:28.240
<v Speaker 1>quick break. When I opened this show, I talked about

0:13:28.240 --> 0:13:30.920
<v Speaker 1>how my phone could listen in on music and identify

0:13:31.000 --> 0:13:34.320
<v Speaker 1>the song even when the phone was in its locked mode.

0:13:34.800 --> 0:13:38.200
<v Speaker 1>Now that's because I have a Pixel 2 XL phone.

0:13:38.240 --> 0:13:41.839
<v Speaker 1>It's an Android phone. It's actually a flagship Google phone,

0:13:42.160 --> 0:13:45.400
<v Speaker 1>and there's a feature on the Pixel 2 that's called

0:13:45.640 --> 0:13:48.560
<v Speaker 1>Now Playing. You have to activate this feature, you have

0:13:48.600 --> 0:13:51.679
<v Speaker 1>to choose to opt into it. So I want to make

0:13:51.720 --> 0:13:54.679
<v Speaker 1>that clear. I chose to activate this feature. It's not

0:13:54.760 --> 0:13:59.240
<v Speaker 1>just active by default, and with it active, the phone

0:13:59.240 --> 0:14:01.920
<v Speaker 1>can identify music that's playing, and it can tell me

0:14:01.960 --> 0:14:04.720
<v Speaker 1>the title even when the phone is in its locked position.

0:14:04.800 --> 0:14:08.360
<v Speaker 1>So what gives? Well, this is not as creepy and

0:14:08.440 --> 0:14:12.040
<v Speaker 1>invasive as it sounds at first glance, because this feature,

0:14:12.480 --> 0:14:16.480
<v Speaker 1>this is incredible to me, is actually entirely local to

0:14:16.600 --> 0:14:21.320
<v Speaker 1>the Pixel 2 phones. It works on the phone itself.

0:14:21.360 --> 0:14:24.320
<v Speaker 1>It's not consulting the cloud at all, it's not sending

0:14:24.360 --> 0:14:28.760
<v Speaker 1>any information. So how can that be possible? How can

0:14:29.320 --> 0:14:32.400
<v Speaker 1>all this information exist on the phone already? Well, let's

0:14:32.440 --> 0:14:35.960
<v Speaker 1>boil it down. First, if you've ever played with any

0:14:36.000 --> 0:14:40.920
<v Speaker 1>digital sound recording software, you've likely seen sound recorded as

0:14:40.920 --> 0:14:44.880
<v Speaker 1>a waveform, a visualization of sound, and typically it's

0:14:44.880 --> 0:14:47.120
<v Speaker 1>pretty simple stuff like if you're using a very basic

0:14:47.240 --> 0:14:51.920
<v Speaker 1>sound recording system, you're mostly looking at changes in amplitude

0:14:52.280 --> 0:14:55.119
<v Speaker 1>or volume. In other words, so you see a continuous

0:14:55.200 --> 0:14:57.520
<v Speaker 1>series of peaks and valleys over the course of a

0:14:57.560 --> 0:15:02.200
<v Speaker 1>sound recording. Those represent the loudest and the quietest parts

0:15:02.240 --> 0:15:05.200
<v Speaker 1>of the recording, the changes in volume. You can also

0:15:05.240 --> 0:15:09.480
<v Speaker 1>graph frequency or pitch, and you can if you zoom

0:15:09.520 --> 0:15:12.480
<v Speaker 1>way in, see shapes in the waveform that indicate

0:15:12.480 --> 0:15:17.080
<v Speaker 1>specific phonetics and sounds. Anyone who has worked in audio

0:15:17.240 --> 0:15:20.760
<v Speaker 1>editing for a while can identify at a glance certain

0:15:20.800 --> 0:15:26.000
<v Speaker 1>distinctive sounds. Tari, my producer, can probably tell you just

0:15:26.160 --> 0:15:29.520
<v Speaker 1>by looking at a waveform of my recording which moments

0:15:29.560 --> 0:15:34.400
<v Speaker 1>represent the irritating mouth sounds she removes before publishing an episode.

0:15:35.080 --> 0:15:37.680
<v Speaker 1>It doesn't take long before you can do this yourself.

0:15:38.040 --> 0:15:40.560
<v Speaker 1>It's actually pretty easy to identify, say, like, a

0:15:40.640 --> 0:15:46.000
<v Speaker 1>hi-hat cymbal in a music recording, because it's very distinctive. Now,

0:15:46.080 --> 0:15:49.200
<v Speaker 1>that means that songs have these distinctive features like a

0:15:49.240 --> 0:15:53.400
<v Speaker 1>fingerprint that represent the sound of the song, and if

0:15:53.440 --> 0:15:56.800
<v Speaker 1>you can recognize the fingerprint, you can identify the song

0:15:57.040 --> 0:15:59.600
<v Speaker 1>even if you're not listening to the song at that moment.

0:16:00.040 --> 0:16:03.000
<v Speaker 1>And you could look at a print out of a

0:16:03.000 --> 0:16:06.280
<v Speaker 1>waveform of a song and you can try and

0:16:06.360 --> 0:16:10.760
<v Speaker 1>match it against a library of print outs. That's essentially

0:16:10.840 --> 0:16:14.280
<v Speaker 1>what the Pixel 2 is doing. The program runs in

0:16:14.320 --> 0:16:17.960
<v Speaker 1>the background, It activates when the sound profile indicates that

0:16:18.000 --> 0:16:22.160
<v Speaker 1>there's music present, so it then analyzes the sound that's

0:16:22.160 --> 0:16:24.800
<v Speaker 1>coming in through the microphone and it creates one of

0:16:24.800 --> 0:16:28.400
<v Speaker 1>these digital fingerprints that I was just saying. Then, just

0:16:28.440 --> 0:16:31.040
<v Speaker 1>like you would with a crime scene fingerprint, the Pixel

0:16:31.080 --> 0:16:34.760
<v Speaker 1>2 will compare the digital analysis of the song that's

0:16:34.760 --> 0:16:38.560
<v Speaker 1>playing against a local database on the phone of fingerprints

0:16:38.600 --> 0:16:42.640
<v Speaker 1>that represent thousands of popular songs for your region. Now

0:16:42.680 --> 0:16:45.920
<v Speaker 1>exactly how many hasn't really been released, but it's supposedly in

0:16:45.960 --> 0:16:49.560
<v Speaker 1>the tens of thousands of songs range. And if the

0:16:49.560 --> 0:16:51.920
<v Speaker 1>Pixel 2 finds a match between the song that is

0:16:51.960 --> 0:16:55.200
<v Speaker 1>currently playing and the one that's in the database, it

0:16:55.280 --> 0:16:58.200
<v Speaker 1>returns the result. This works even if the phone has

0:16:58.200 --> 0:17:01.840
<v Speaker 1>cellular and WiFi data turned off, because again it's all local.

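NOTE
A toy sketch of the matching step described above, under loud assumptions: this is not Google's Now Playing algorithm. The energy-trend fingerprint, the Hamming-distance comparison, and every name here (fingerprint, hamming, identify, the song titles) are hypothetical stand-ins that only illustrate reducing audio to a compact fingerprint and comparing it against a database stored on the device.
```python
# Toy illustration only: NOT the real Now Playing method. All names and data are hypothetical.
from typing import Optional
def fingerprint(samples: list[float], frame: int = 4) -> tuple[int, ...]:
    """Return 1/0 bits marking whether each frame's energy rose versus the previous frame."""
    energies = [sum(x * x for x in samples[i:i + frame]) for i in range(0, len(samples) - frame + 1, frame)]
    return tuple(1 if later > earlier else 0 for earlier, later in zip(energies, energies[1:]))
def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Count positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))
def identify(samples: list[float], database: dict[str, tuple[int, ...]], max_distance: int = 1) -> Optional[str]:
    """Return the closest title in the local database, or None if nothing is close enough."""
    probe = fingerprint(samples)
    candidates = [(hamming(probe, fp), title) for title, fp in database.items() if len(fp) == len(probe)]
    if not candidates:
        return None
    distance, title = min(candidates)
    return title if distance <= max_distance else None
# Made-up "songs": five frames of four samples each, so each fingerprint is four bits long.
song_a = [0.1] * 4 + [0.5] * 4 + [0.2] * 4 + [0.9] * 4 + [0.3] * 4
song_b = [0.9] * 4 + [0.5] * 4 + [0.2] * 4 + [0.1] * 4 + [0.8] * 4
local_db = {"Song A": fingerprint(song_a), "Song B": fingerprint(song_b)}
noisy_a = [x + 0.01 for x in song_a]  # same song with a little added noise
print(identify(noisy_a, local_db))
```
The lookup never touches the network: the probe and the reference fingerprints all live in local memory, which is the property the episode highlights.
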
0:17:02.440 --> 0:17:06.480
<v Speaker 1>Now, the Now Playing feature doesn't run constantly because that

0:17:06.520 --> 0:17:10.119
<v Speaker 1>would drain battery life like crazy. Instead, it samples the

0:17:10.160 --> 0:17:14.600
<v Speaker 1>audio approximately every sixty seconds, and it takes time to

0:17:14.680 --> 0:17:17.560
<v Speaker 1>match a song to an entry in the database. The

0:17:17.600 --> 0:17:20.959
<v Speaker 1>cleaner the audio, in other words, the less background noise

0:17:21.040 --> 0:17:24.800
<v Speaker 1>and less interference that's present, the faster this process tends

0:17:24.800 --> 0:17:28.440
<v Speaker 1>to be. This means that when songs transition from one

0:17:28.480 --> 0:17:31.200
<v Speaker 1>song to another, it can take a little bit before

0:17:31.240 --> 0:17:33.879
<v Speaker 1>the phone registers the change. It all depends on the

0:17:33.920 --> 0:17:38.040
<v Speaker 1>acoustic quality of the environment and where in this sampling

0:17:38.160 --> 0:17:42.440
<v Speaker 1>cycle the phone is at any given time, so that's

0:17:42.480 --> 0:17:45.840
<v Speaker 1>not quite as creepy because everything's local on the device.

0:17:45.920 --> 0:17:49.159
<v Speaker 1>It's not sending any data out anywhere else. It's not

0:17:49.280 --> 0:17:52.240
<v Speaker 1>listening to what I'm listening to and alerting Google

0:17:52.400 --> 0:17:55.359
<v Speaker 1>to let them know, hey, Jonathan's once again listening to

0:17:55.400 --> 0:17:59.960
<v Speaker 1>the soundtrack to be More Chill, which would be an

0:18:00.040 --> 0:18:03.000
<v Speaker 1>accurate suggestion that it would make because I do listen

0:18:03.040 --> 0:18:05.840
<v Speaker 1>to that a lot. Anyway, you can use this feature

0:18:06.520 --> 0:18:09.560
<v Speaker 1>to learn more about the track, the artist, the album,

0:18:09.600 --> 0:18:13.320
<v Speaker 1>including potentially purchasing that music. And those features do connect

0:18:13.359 --> 0:18:16.679
<v Speaker 1>to the outside world through WiFi or cellular connections, but

0:18:16.760 --> 0:18:20.639
<v Speaker 1>that requires an extra step on the part of the user. Also,

0:18:20.680 --> 0:18:23.520
<v Speaker 1>Google pushes out updates to this database with the most

0:18:23.520 --> 0:18:27.560
<v Speaker 1>popular songs, and these are regionalized to reflect the country

0:18:27.560 --> 0:18:31.240
<v Speaker 1>you're in, because you're less likely to run into, say

0:18:31.600 --> 0:18:35.320
<v Speaker 1>a Peruvian pop song when you're in Scotland. The push

0:18:35.440 --> 0:18:39.320
<v Speaker 1>updates do happen over WiFi or cellular local connections. But

0:18:39.960 --> 0:18:42.920
<v Speaker 1>this is just the reference data that analyzed music

0:18:42.960 --> 0:18:47.080
<v Speaker 1>gets compared against. An app like Shazam, on the other hand,

0:18:47.520 --> 0:18:50.400
<v Speaker 1>connects to the cloud, but you also have to activate

0:18:50.440 --> 0:18:52.760
<v Speaker 1>the app to have it listen to the audio, so

0:18:53.160 --> 0:18:56.439
<v Speaker 1>it's a user choice to have the app listen. So

0:18:56.480 --> 0:18:59.040
<v Speaker 1>this is more like a push to talk device, except

0:18:59.040 --> 0:19:02.439
<v Speaker 1>it's push to listen. Shazam is also analyzing music to

0:19:02.480 --> 0:19:05.399
<v Speaker 1>suss out a digital fingerprint for the audio, but it

0:19:05.480 --> 0:19:09.480
<v Speaker 1>can compare the sampled audio against a much larger database

0:19:09.800 --> 0:19:13.239
<v Speaker 1>consisting of millions of songs, rather than the tens of

0:19:13.280 --> 0:19:16.439
<v Speaker 1>thousands you would find on the Pixel 2's Now Playing feature.

0:19:17.040 --> 0:19:20.320
<v Speaker 1>More importantly, I think it's fair to say this isn't

0:19:20.359 --> 0:19:23.679
<v Speaker 1>a creepy use of the technology, since the listening feature

0:19:23.760 --> 0:19:27.240
<v Speaker 1>only activates on the user's command rather than just being

0:19:27.320 --> 0:19:30.320
<v Speaker 1>on by default. Now, this isn't that much different than

0:19:30.359 --> 0:19:34.440
<v Speaker 1>what virtual assistants are doing when you use them. Clearly,

0:19:35.000 --> 0:19:38.359
<v Speaker 1>the microphone on a virtual assistant like Google Home or

0:19:38.440 --> 0:19:41.960
<v Speaker 1>Siri or whatever, it has to be active all the time,

0:19:42.040 --> 0:19:44.879
<v Speaker 1>otherwise you wouldn't get a response when you used whatever

0:19:44.920 --> 0:19:48.800
<v Speaker 1>the keyword or phrase was to activate the assistant. I'm

0:19:48.800 --> 0:19:52.440
<v Speaker 1>going to try and avoid saying any of those phrases,

0:19:52.520 --> 0:19:54.399
<v Speaker 1>by the way, because I don't want those of you

0:19:54.520 --> 0:19:57.280
<v Speaker 1>who have those devices to deal with the frustration of

0:19:57.320 --> 0:20:01.200
<v Speaker 1>them going off in response to something I say. Now,

0:20:01.200 --> 0:20:05.000
<v Speaker 1>those words or phrases have a specific sound, just like

0:20:05.240 --> 0:20:09.040
<v Speaker 1>music does. In this case, we're talking about phonemes, which

0:20:09.040 --> 0:20:12.440
<v Speaker 1>are recognizable sounds found in language. So in English there

0:20:12.480 --> 0:20:16.560
<v Speaker 1>are forty four phonemes. The order and combination of those

0:20:16.560 --> 0:20:19.560
<v Speaker 1>phonemes are the key. So if you say something that

0:20:19.680 --> 0:20:23.000
<v Speaker 1>has those phonemes in the right order, or if it's

0:20:23.119 --> 0:20:26.440
<v Speaker 1>close enough, if it's in a noisy environment, this can

0:20:26.480 --> 0:20:30.560
<v Speaker 1>activate the virtual assistant. It's like a key fitting into

0:20:30.600 --> 0:20:33.640
<v Speaker 1>a lock. Now, if you're saying other stuff, it's like

0:20:33.680 --> 0:20:37.000
<v Speaker 1>the wrong key is inserted and nothing happens. It's only

0:20:37.000 --> 0:20:39.720
<v Speaker 1>when you say something that fits the lock that the

0:20:39.760 --> 0:20:45.000
<v Speaker 1>assistant activates. This process continues after activation. When you talk

0:20:45.080 --> 0:20:48.960
<v Speaker 1>to the virtual assistant, it analyzes your speech by phonemes.

0:20:49.920 --> 0:20:53.000
<v Speaker 1>Software processes those to figure out what words you are

0:20:53.080 --> 0:20:56.520
<v Speaker 1>actually saying. Well for the first step, that is, because

0:20:56.560 --> 0:21:00.199
<v Speaker 1>it's actually more complicated than that. So, for example, there

0:21:00.240 --> 0:21:03.440
<v Speaker 1>are homonyms. These are words that have a similar sound

0:21:03.760 --> 0:21:08.480
<v Speaker 1>but different meanings and often different spellings. An easy example

0:21:08.600 --> 0:21:12.080
<v Speaker 1>is the number eight and the past tense for to eat,

0:21:12.520 --> 0:21:16.520
<v Speaker 1>such as I ate an entire bowl of cao. Mm

0:21:16.600 --> 0:21:22.840
<v Speaker 1>hmm okay. So those two words eight and ate sound

0:21:22.920 --> 0:21:26.199
<v Speaker 1>exactly the same, but they have different meanings. Now that

0:21:26.240 --> 0:21:29.400
<v Speaker 1>means the software can't rely on just the sounds you're

0:21:29.440 --> 0:21:32.000
<v Speaker 1>making when you speak to figure out what you mean.

0:21:32.480 --> 0:21:36.120
<v Speaker 1>It has to actually analyze syntax and context and make judgment

0:21:36.160 --> 0:21:38.960
<v Speaker 1>calls about what you are actually meaning when you say

0:21:38.960 --> 0:21:43.040
<v Speaker 1>these things. Sometimes it gets things right, sometimes it gets

0:21:43.040 --> 0:21:45.840
<v Speaker 1>things wrong. But don't be too hard on it. Because

0:21:46.160 --> 0:21:50.000
<v Speaker 1>humans misunderstand other humans all the time. Even when we

0:21:50.040 --> 0:21:52.719
<v Speaker 1>are both communicating with it in the same language, we

0:21:52.760 --> 0:21:56.600
<v Speaker 1>can misunderstand each other. Now, this is still just the

0:21:56.680 --> 0:22:00.000
<v Speaker 1>first step. You can think of this as essentially speech

0:22:00.000 --> 0:22:02.960
<v Speaker 1>to text. From there, you have to determine what

0:22:03.160 --> 0:22:06.320
<v Speaker 1>is actually being asked by the speaker, what is the

0:22:06.400 --> 0:22:11.600
<v Speaker 1>intent behind the words. If someone speaks French very slowly

0:22:11.640 --> 0:22:14.199
<v Speaker 1>to me, I might be able to spell out what

0:22:14.359 --> 0:22:17.400
<v Speaker 1>is being said phonetically, but that doesn't mean I understand

0:22:17.440 --> 0:22:21.360
<v Speaker 1>the actual content of what was spoken. And to complicate matters,

0:22:21.640 --> 0:22:23.560
<v Speaker 1>there are a lot of different ways to ask for

0:22:23.600 --> 0:22:27.199
<v Speaker 1>the same information. I might say what's the weather for

0:22:27.240 --> 0:22:30.280
<v Speaker 1>this week? Or will I need an umbrella today, or

0:22:30.320 --> 0:22:32.879
<v Speaker 1>one of a dozen other ways to inquire about the weather.

0:22:33.359 --> 0:22:36.479
<v Speaker 1>The software has to be able to determine what the

0:22:36.560 --> 0:22:40.960
<v Speaker 1>intent was behind my question, and then there's another step,

0:22:41.280 --> 0:22:45.280
<v Speaker 1>which is matching intent with action. The assistant has to

0:22:45.359 --> 0:22:48.679
<v Speaker 1>respond to my request, and hopefully it does so in

0:22:48.680 --> 0:22:51.320
<v Speaker 1>a way that's relevant to whatever I was asking about

0:22:51.320 --> 0:22:53.840
<v Speaker 1>in the first place. So if I ask my virtual

0:22:53.880 --> 0:22:56.720
<v Speaker 1>assistant for an update on the weather, I'm not going

0:22:56.760 --> 0:22:59.679
<v Speaker 1>to be impressed if it instead tells me about the

0:22:59.720 --> 0:23:03.720
<v Speaker 1>traffic or vice versa. And as assistants get connected

0:23:03.760 --> 0:23:08.320
<v Speaker 1>into more systems like security systems, lights, apps, and more,

0:23:08.760 --> 0:23:12.520
<v Speaker 1>the software has to send appropriate commands to these other

0:23:12.600 --> 0:23:16.679
<v Speaker 1>elements to produce the expected results. Now, this is all impressive,

0:23:17.000 --> 0:23:20.040
<v Speaker 1>and because it's impressive, it could be a little scary

0:23:20.160 --> 0:23:23.639
<v Speaker 1>when we think about assistants as hanging on our every word.

0:23:23.760 --> 0:23:27.440
<v Speaker 1>Are they always listening? Are they always paying attention? Now,

0:23:27.480 --> 0:23:30.760
<v Speaker 1>they're always monitoring sound, but they're not doing so in

0:23:30.800 --> 0:23:34.520
<v Speaker 1>an effort to broadcast or record information. They are on

0:23:34.720 --> 0:23:39.399
<v Speaker 1>alert for that initiating phrase or word. They ignore everything else.

0:23:40.200 --> 0:23:43.399
<v Speaker 1>More on that a little bit later. Now that being said,

0:23:43.800 --> 0:23:47.280
<v Speaker 1>there are ways in which someone could hack an assistant

0:23:47.560 --> 0:23:51.199
<v Speaker 1>or a phone, or really any connected device that has

0:23:51.240 --> 0:23:55.719
<v Speaker 1>a microphone in order to eavesdrop using that device's microphone.

0:23:56.359 --> 0:23:59.280
<v Speaker 1>Edward Snowden revealed that the NSA used such

0:23:59.320 --> 0:24:03.520
<v Speaker 1>tactics in the agency's surveillance efforts. Apps that have access

0:24:03.560 --> 0:24:06.600
<v Speaker 1>to your phone's camera and microphone for the purposes of

0:24:06.640 --> 0:24:10.680
<v Speaker 1>sharing video, audio, and related features can do some disturbing

0:24:10.720 --> 0:24:13.800
<v Speaker 1>stuff if they're compromised. They can also do some disturbing

0:24:13.800 --> 0:24:16.520
<v Speaker 1>stuff if they're not compromised, but if the party behind

0:24:16.560 --> 0:24:22.240
<v Speaker 1>it is malicious. Felix Krause made such an app as

0:24:22.280 --> 0:24:26.159
<v Speaker 1>a proof of concept for iOS devices. The app, like

0:24:26.240 --> 0:24:29.679
<v Speaker 1>many others, asked the user for permission to access the camera.

0:24:30.040 --> 0:24:32.639
<v Speaker 1>Krause stated that once a user agreed to this, the

0:24:32.640 --> 0:24:36.240
<v Speaker 1>app could access both the front and back camera anytime

0:24:36.280 --> 0:24:38.800
<v Speaker 1>the app was in the foreground of the iOS device.

0:24:39.160 --> 0:24:42.159
<v Speaker 1>It could take videos and pictures with no indication to

0:24:42.200 --> 0:24:44.560
<v Speaker 1>the user that such a thing was happening, and it

0:24:44.600 --> 0:24:47.360
<v Speaker 1>could upload that data to a remote server. It could

0:24:47.400 --> 0:24:51.639
<v Speaker 1>even run real time facial recognition software. Now does this

0:24:51.720 --> 0:24:56.360
<v Speaker 1>mean apps like Facebook's Messenger or YouTube are doing this? Well,

0:24:56.359 --> 0:24:59.480
<v Speaker 1>not necessarily, but it does mean it's at least possible

0:24:59.600 --> 0:25:03.639
<v Speaker 1>to do, and nothing is stopping a more, let's say,

0:25:03.680 --> 0:25:08.399
<v Speaker 1>ethically unconcerned app from doing just that. So what can

0:25:08.440 --> 0:25:12.480
<v Speaker 1>you do to protect yourself from bad actors? Uh, here's

0:25:12.520 --> 0:25:16.160
<v Speaker 1>the bad news. Not much. You could go without using

0:25:16.160 --> 0:25:19.480
<v Speaker 1>such devices and apps in the first place. That's pretty

0:25:19.560 --> 0:25:23.520
<v Speaker 1>darn restrictive. Krause recommended using camera covers to obscure the

0:25:23.520 --> 0:25:27.440
<v Speaker 1>phone's cameras when you weren't actively using them, or revoking

0:25:27.520 --> 0:25:30.800
<v Speaker 1>camera access to the various apps on the phone. And

0:25:30.920 --> 0:25:35.000
<v Speaker 1>that's about it. Yikes. Now, when we come back, I'll

0:25:35.040 --> 0:25:38.479
<v Speaker 1>cover a related topic that's been in the news lately.

0:25:38.520 --> 0:25:49.280
<v Speaker 1>But first let's take another quick break. Okay, so we

0:25:49.400 --> 0:25:52.720
<v Speaker 1>know it's possible to use cameras and microphones against people,

0:25:52.960 --> 0:25:56.560
<v Speaker 1>either with malware or what amounts to a security loophole

0:25:56.680 --> 0:26:00.240
<v Speaker 1>between handset hardware and apps. But there's something else we

0:26:00.240 --> 0:26:03.760
<v Speaker 1>need to chat about, and that's humans listening in on

0:26:03.840 --> 0:26:08.160
<v Speaker 1>what were assumed to be private conversations and messages. Now

0:26:08.160 --> 0:26:12.440
<v Speaker 1>here's the context. In August two thousand nineteen, several major

0:26:12.480 --> 0:26:17.480
<v Speaker 1>media outlets reported an upsetting revelation, namely that Facebook had

0:26:17.480 --> 0:26:20.520
<v Speaker 1>been sending out audio files that users were creating in

0:26:20.720 --> 0:26:24.760
<v Speaker 1>Facebook Messenger, for example. And these were audio clips sent

0:26:24.960 --> 0:26:28.720
<v Speaker 1>through Messenger itself, so it's akin to a private text

0:26:28.840 --> 0:26:32.000
<v Speaker 1>to a friend. And Facebook was sending these audio files

0:26:32.040 --> 0:26:36.359
<v Speaker 1>to a third party contractor to transcribe that audio. So

0:26:36.400 --> 0:26:40.159
<v Speaker 1>imagine having a private text message thread sent to a

0:26:40.320 --> 0:26:43.600
<v Speaker 1>complete stranger for review. It was similar to that, except

0:26:43.600 --> 0:26:47.080
<v Speaker 1>it was audio, not text. So what's actually going on? Well,

0:26:47.320 --> 0:26:49.520
<v Speaker 1>Facebook said this all had to do with users who

0:26:49.560 --> 0:26:54.200
<v Speaker 1>had opted into having their audio messages transcribed automatically. Essentially,

0:26:54.960 --> 0:26:59.360
<v Speaker 1>it was all about using the voice to text option

0:26:59.800 --> 0:27:06.320
<v Speaker 1>in Facebook. Now, according to Express Computer, this option didn't

0:27:06.359 --> 0:27:09.720
<v Speaker 1>really have a warning that let you know that those

0:27:10.359 --> 0:27:13.560
<v Speaker 1>audio files you were creating through this voice to text

0:27:13.640 --> 0:27:18.040
<v Speaker 1>feature would go to be heard by any humans out there.

0:27:18.560 --> 0:27:21.760
<v Speaker 1>In fact, they said that the warning that would pop up,

0:27:21.840 --> 0:27:25.800
<v Speaker 1>or the notification that popped up said, turn on voice

0:27:25.840 --> 0:27:31.199
<v Speaker 1>to text in this chat using Facebook Messenger, and above

0:27:31.280 --> 0:27:34.119
<v Speaker 1>the no and yes buttons where you would choose one

0:27:34.160 --> 0:27:38.040
<v Speaker 1>of these options. Facebook further would describe the option display

0:27:38.200 --> 0:27:41.720
<v Speaker 1>text of voice clips you send and receive. You can

0:27:41.720 --> 0:27:45.240
<v Speaker 1>control whether text is visible to you for each chat.

0:27:46.359 --> 0:27:49.520
<v Speaker 1>So again it makes it sound like, oh, this is

0:27:49.520 --> 0:27:52.080
<v Speaker 1>all automated. If I use voice to text, I just

0:27:52.320 --> 0:27:55.760
<v Speaker 1>say a phrase, the text shows up. I might have

0:27:55.800 --> 0:27:58.840
<v Speaker 1>to make some adjustments to the text, maybe it has

0:27:58.960 --> 0:28:01.560
<v Speaker 1>misinterpreted one of the words or whatever. But sort of

0:28:01.600 --> 0:28:06.520
<v Speaker 1>a hands free approach to sending messages in Messenger. Lots

0:28:06.560 --> 0:28:09.520
<v Speaker 1>of apps use voice to text features, and in theory

0:28:10.000 --> 0:28:12.760
<v Speaker 1>it's a pretty great feature. You can dictate a message

0:28:12.800 --> 0:28:15.280
<v Speaker 1>to be sent to your friend without having to stare

0:28:15.359 --> 0:28:18.520
<v Speaker 1>at the screen and type or swipe on a keyboard.

0:28:19.200 --> 0:28:22.800
<v Speaker 1>Tons of folks use features like this if they want

0:28:22.840 --> 0:28:25.680
<v Speaker 1>to interact with an app while they're driving, for example,

0:28:25.720 --> 0:28:29.440
<v Speaker 1>to minimize the distractions they have as they putter around.

0:28:30.000 --> 0:28:34.200
<v Speaker 1>But you'll notice those messages don't seem to indicate anywhere

0:28:34.960 --> 0:28:37.800
<v Speaker 1>that the voice to text recordings could be sent to

0:28:38.000 --> 0:28:42.959
<v Speaker 1>a human being for review. Express Computer further explains that

0:28:43.160 --> 0:28:47.200
<v Speaker 1>even on a supplemental page explaining the voice to text feature,

0:28:48.040 --> 0:28:51.280
<v Speaker 1>Facebook fails to mention that human beings will be reviewing

0:28:51.320 --> 0:28:56.040
<v Speaker 1>that material. Instead. The supplemental page talks about how voice

0:28:56.040 --> 0:28:59.680
<v Speaker 1>to text uses machine learning to get better at interpreting

0:28:59.680 --> 0:29:02.160
<v Speaker 1>what you're saying, so that it becomes more useful to

0:29:02.200 --> 0:29:05.840
<v Speaker 1>you the more you actually use the feature. So the

0:29:05.880 --> 0:29:10.520
<v Speaker 1>concept here was that some voice recognition software would transcribe

0:29:10.560 --> 0:29:13.880
<v Speaker 1>this audio. Google Voice also used to do this for

0:29:14.000 --> 0:29:17.760
<v Speaker 1>voice messages. I remember getting voicemails from my mother, who

0:29:17.840 --> 0:29:21.600
<v Speaker 1>has a Southern US dialect as do I, but hers

0:29:21.720 --> 0:29:25.520
<v Speaker 1>is more pronounced. The Google Voice speech to text program

0:29:25.640 --> 0:29:30.840
<v Speaker 1>had problems interpreting my mother's messages, and frequently the transcription

0:29:30.880 --> 0:29:34.520
<v Speaker 1>would be hilariously off track, and most of the time

0:29:34.720 --> 0:29:37.200
<v Speaker 1>I wouldn't even be able to guess what the original

0:29:37.240 --> 0:29:40.800
<v Speaker 1>message was based off the transcription. It meant that I

0:29:40.840 --> 0:29:43.240
<v Speaker 1>would listen to the voicemail and then I would shake

0:29:43.280 --> 0:29:46.240
<v Speaker 1>my head a lot as I would read the transcription

0:29:46.320 --> 0:29:48.520
<v Speaker 1>at the same time and just see how far off

0:29:48.600 --> 0:29:53.320
<v Speaker 1>it was. This is a big challenge for voice recognition programs.

0:29:53.560 --> 0:29:57.280
<v Speaker 1>There are a lot of different dialects and accents. People

0:29:57.320 --> 0:30:01.080
<v Speaker 1>from different regions within the same country can sound very

0:30:01.160 --> 0:30:04.680
<v Speaker 1>different even if they're speaking the exact same language. If

0:30:04.680 --> 0:30:08.760
<v Speaker 1>you get someone from Savannah, Georgia, a native of Savannah, Georgia,

0:30:09.000 --> 0:30:12.960
<v Speaker 1>and a native from Boston, Massachusetts, they're going to be

0:30:13.000 --> 0:30:15.600
<v Speaker 1>able to have a conversation with each other, but they

0:30:15.640 --> 0:30:19.280
<v Speaker 1>will end up saying the same words very differently from

0:30:19.280 --> 0:30:22.880
<v Speaker 1>one another. And that's before you even start talking about

0:30:22.960 --> 0:30:26.760
<v Speaker 1>people who have a different native language, who have learned

0:30:26.800 --> 0:30:30.560
<v Speaker 1>English and have a foreign accent on top of the

0:30:30.560 --> 0:30:34.120
<v Speaker 1>English they speak. There's no hard and fast rule you

0:30:34.160 --> 0:30:37.640
<v Speaker 1>can create for a voice recognition program to follow to

0:30:37.800 --> 0:30:42.040
<v Speaker 1>interpret speech correctly throughout a language. Because there's so much

0:30:42.120 --> 0:30:45.000
<v Speaker 1>variation in how the words in that language are said,

0:30:45.600 --> 0:30:49.479
<v Speaker 1>training the model becomes a challenge. So one thing you

0:30:49.560 --> 0:30:53.960
<v Speaker 1>can do is you have a human being transcribe spoken

0:30:54.000 --> 0:30:59.600
<v Speaker 1>words and then compare the human transcription against the machine

0:30:59.680 --> 0:31:03.120
<v Speaker 1>produced transcription in an effort to train your model to

0:31:03.200 --> 0:31:07.840
<v Speaker 1>be more effective. Humans are pretty good, though not perfect,

0:31:08.000 --> 0:31:11.800
<v Speaker 1>at figuring out what some other human says, assuming both

0:31:11.840 --> 0:31:15.200
<v Speaker 1>parties are fluent in the same language. By comparing these

0:31:15.200 --> 0:31:17.800
<v Speaker 1>two records against each other and then making corrections to

0:31:17.840 --> 0:31:21.560
<v Speaker 1>the model, computer scientists can tweak their voice recognition software

0:31:21.560 --> 0:31:25.479
<v Speaker 1>models to be more accurate. Now, ideally you would do

0:31:25.520 --> 0:31:29.440
<v Speaker 1>this before unleashing such a system on the public, but

0:31:29.760 --> 0:31:33.360
<v Speaker 1>that's not really that practical. There is no in lab

0:31:33.520 --> 0:31:36.280
<v Speaker 1>project that is going to come close to generating the

0:31:36.360 --> 0:31:39.800
<v Speaker 1>amount of data and the sheer variety that you will

0:31:39.880 --> 0:31:43.360
<v Speaker 1>encounter out in the real world. Improving the model would

0:31:43.360 --> 0:31:47.360
<v Speaker 1>happen much faster with a larger sample of subjects using

0:31:47.480 --> 0:31:50.520
<v Speaker 1>the model, and a billion or so people is a

0:31:50.560 --> 0:31:55.400
<v Speaker 1>pretty darn big sample size. But that means sending these

0:31:55.440 --> 0:31:59.320
<v Speaker 1>audio files to humans in the first place. And Facebook

0:31:59.320 --> 0:32:02.520
<v Speaker 1>has said that the files were anonymized so that there

0:32:02.560 --> 0:32:06.240
<v Speaker 1>was no identifiable name or anything associated with each of

0:32:06.240 --> 0:32:09.440
<v Speaker 1>the audio files being sent for human review. But hey,

0:32:09.600 --> 0:32:12.360
<v Speaker 1>I hear you say. Earlier in this episode, you pointed

0:32:12.360 --> 0:32:14.480
<v Speaker 1>out how it's possible to really get an idea about

0:32:14.480 --> 0:32:18.640
<v Speaker 1>a person just from the other data they provide, and

0:32:18.720 --> 0:32:22.520
<v Speaker 1>you'd be right. These audio files had all sorts of

0:32:22.560 --> 0:32:25.480
<v Speaker 1>different types of content in them, some of it was

0:32:25.600 --> 0:32:30.719
<v Speaker 1>likely upsetting, disturbing, or inappropriate. Contractors who had been hired

0:32:30.760 --> 0:32:34.320
<v Speaker 1>to do the transcription came forward anonymously, I might add,

0:32:34.320 --> 0:32:36.520
<v Speaker 1>because they didn't want to get fired from their jobs,

0:32:36.920 --> 0:32:40.040
<v Speaker 1>and said they felt that the practice was an unethical one.

0:32:40.280 --> 0:32:42.680
<v Speaker 1>And media outlets looked into it and their conclusions were

0:32:42.680 --> 0:32:45.480
<v Speaker 1>pretty much the same right down the board. Facebook was

0:32:45.600 --> 0:32:49.440
<v Speaker 1>not transparent about what was happening with this audio, and

0:32:49.440 --> 0:32:52.680
<v Speaker 1>there were no clear indications to users that their audio

0:32:52.680 --> 0:32:55.480
<v Speaker 1>files might get sent to some stranger for the purposes

0:32:55.520 --> 0:32:59.280
<v Speaker 1>of transcription. Now, for its part, Facebook said it halted

0:32:59.280 --> 0:33:03.080
<v Speaker 1>the practice in early August two thousand nineteen, and third

0:33:03.120 --> 0:33:06.280
<v Speaker 1>party contractors have said that that is true that they

0:33:06.320 --> 0:33:09.480
<v Speaker 1>no longer are doing this work for Facebook. Facebook isn't

0:33:09.480 --> 0:33:11.680
<v Speaker 1>the only company to come under scrutiny for this kind

0:33:11.720 --> 0:33:15.320
<v Speaker 1>of thing. Google, Apple, and Microsoft have also been under

0:33:15.320 --> 0:33:18.880
<v Speaker 1>the microscope for very similar practices. Now, on the one hand,

0:33:19.320 --> 0:33:22.160
<v Speaker 1>it's understandable that these companies want to improve their voice

0:33:22.200 --> 0:33:26.280
<v Speaker 1>recognition capabilities. It's what makes these apps and products useful

0:33:26.720 --> 0:33:29.640
<v Speaker 1>and makes it more useful to a wider variety of

0:33:29.680 --> 0:33:33.120
<v Speaker 1>people by training the models on this stuff. But the

0:33:33.160 --> 0:33:37.040
<v Speaker 1>privacy concerns remain and it's something that isn't just troubling

0:33:37.080 --> 0:33:39.640
<v Speaker 1>to users, but to the people actually being paid to

0:33:39.720 --> 0:33:42.480
<v Speaker 1>transcribe the stuff in the first place. Now, it would

0:33:42.520 --> 0:33:46.160
<v Speaker 1>be another matter if the companies were transparent about this practice.

0:33:46.480 --> 0:33:50.040
<v Speaker 1>If users knew that there's a chance a real, live

0:33:50.120 --> 0:33:52.200
<v Speaker 1>human being would be listening in on some of those

0:33:52.240 --> 0:33:55.680
<v Speaker 1>voice messages for the purposes of quality control for the

0:33:55.760 --> 0:33:59.000
<v Speaker 1>voice to text feature, maybe they wouldn't opt into using

0:33:59.000 --> 0:34:01.239
<v Speaker 1>the voice to text in the first place, or they

0:34:01.320 --> 0:34:05.080
<v Speaker 1>might opt in and not care. In some cases, I'm

0:34:05.080 --> 0:34:07.120
<v Speaker 1>sure there'd be no shortage of people who would actually

0:34:07.160 --> 0:34:11.680
<v Speaker 1>say truly terrible things, hoping that some poor contractor would

0:34:11.719 --> 0:34:13.760
<v Speaker 1>have to listen to it all and check the audio

0:34:13.800 --> 0:34:18.480
<v Speaker 1>against the automated transcription, because some people would just play nasty.

0:34:18.880 --> 0:34:21.480
<v Speaker 1>Don't be nasty. By the way, there are better ways

0:34:21.480 --> 0:34:24.759
<v Speaker 1>to entertain yourself than by making some other person's life miserable.

0:34:25.560 --> 0:34:30.480
<v Speaker 1>Facebook could potentially face some serious charges based on this practice.

0:34:30.880 --> 0:34:34.279
<v Speaker 1>The company had settled with the Federal Trade Commission, or FTC,

0:34:35.000 --> 0:34:38.320
<v Speaker 1>earlier in the summer of two thousand nineteen. The settlement

0:34:38.400 --> 0:34:43.040
<v Speaker 1>was for an incredible five billion dollars, and it largely

0:34:43.040 --> 0:34:47.400
<v Speaker 1>revolved around the company's rather abysmal record with privacy. The

0:34:47.520 --> 0:34:50.520
<v Speaker 1>charges date all the way back to two thousand twelve,

0:34:50.800 --> 0:34:55.440
<v Speaker 1>when the FTC brought eight privacy related allegations against Facebook.

0:34:55.920 --> 0:34:59.239
<v Speaker 1>And again, this isn't a big surprise. Zuckerberg had already

0:34:59.360 --> 0:35:03.759
<v Speaker 1>cavalierly proclaimed privacy dead a couple of years before that. Now,

0:35:03.760 --> 0:35:07.120
<v Speaker 1>in the settlement, Facebook agreed to adhere to some rules.

0:35:07.400 --> 0:35:11.440
<v Speaker 1>Those rules said that Facebook was prohibited from making misrepresentations

0:35:11.520 --> 0:35:15.920
<v Speaker 1>about the privacy or security of consumers information, prohibited from

0:35:15.960 --> 0:35:20.120
<v Speaker 1>misrepresenting the extent to which it shares personal data, and

0:35:20.239 --> 0:35:24.560
<v Speaker 1>it required Facebook to implement a reasonable privacy program. Now

0:35:24.600 --> 0:35:28.319
<v Speaker 1>I'm no legal expert, not by a long shot, but

0:35:28.400 --> 0:35:32.200
<v Speaker 1>it seems to me that Facebook's failure to alert users

0:35:32.280 --> 0:35:34.640
<v Speaker 1>that their voice to text data could be sent to

0:35:34.760 --> 0:35:39.440
<v Speaker 1>non Facebook employees for review is in violation of this agreement.

0:35:39.880 --> 0:35:43.080
<v Speaker 1>That Facebook agreed to these terms in July two thousand nineteen,

0:35:43.520 --> 0:35:47.640
<v Speaker 1>and then continued the practice into August is a big problem.

0:35:47.680 --> 0:35:50.160
<v Speaker 1>Whether or not it will result in further legal action

0:35:50.480 --> 0:35:53.840
<v Speaker 1>against this company is unknown as I record this episode,

0:35:54.040 --> 0:35:57.440
<v Speaker 1>but it seems like it's at least possible, So I'm

0:35:57.440 --> 0:36:00.160
<v Speaker 1>gonna wrap this up. We know that microphones can sit

0:36:00.239 --> 0:36:02.440
<v Speaker 1>in on us without our knowledge. The NSA

0:36:02.560 --> 0:36:05.759
<v Speaker 1>worked on programs in the United States that did exactly that.

0:36:06.239 --> 0:36:09.120
<v Speaker 1>And while companies with virtual personal assistants tell us that

0:36:09.160 --> 0:36:13.399
<v Speaker 1>those assistants only activate when certain phrases are spoken, it's

0:36:13.440 --> 0:36:16.760
<v Speaker 1>also possible that that list of phrases could go well

0:36:16.840 --> 0:36:20.480
<v Speaker 1>beyond the ones published by the company. So, in other words,

0:36:20.880 --> 0:36:24.799
<v Speaker 1>I might know that to wake up my hypothetical virtual assistant,

0:36:25.080 --> 0:36:28.759
<v Speaker 1>I would have to say the alert phrase Skynet, awaken,

0:36:29.200 --> 0:36:31.520
<v Speaker 1>and then it pays attention. But what if there's a

0:36:31.560 --> 0:36:35.680
<v Speaker 1>whole laundry list of other words or phrases that could

0:36:35.719 --> 0:36:38.880
<v Speaker 1>wake it up so that it records or transcribes whatever

0:36:38.960 --> 0:36:43.040
<v Speaker 1>audio follows. What if, for example, the phrase shopping or

0:36:43.280 --> 0:36:48.240
<v Speaker 1>going shopping activates it so that whatever follows gets registered

0:36:48.280 --> 0:36:50.320
<v Speaker 1>by the device. So if I tell a friend tomorrow,

0:36:50.360 --> 0:36:53.839
<v Speaker 1>I'm going shopping for some new sneakers, the device has

0:36:53.880 --> 0:36:57.279
<v Speaker 1>registered the phrase new sneakers because it paid attention once

0:36:57.320 --> 0:37:00.200
<v Speaker 1>I said the words going shopping, and then I start seeing

0:37:00.200 --> 0:37:03.359
<v Speaker 1>ads pop up everywhere I go online for sneakers. Now,

0:37:03.440 --> 0:37:08.759
<v Speaker 1>is that something that's possible, Well, yeah, it's possible. That

0:37:08.800 --> 0:37:12.399
<v Speaker 1>doesn't mean it's happening, but it could be. It's also

0:37:12.440 --> 0:37:15.440
<v Speaker 1>possible that my other behaviors have indicated that I'm on

0:37:15.480 --> 0:37:19.160
<v Speaker 1>the lookout for some new kicks. Coincidence is a thing,

0:37:19.480 --> 0:37:23.319
<v Speaker 1>and it's frustrating because without seeing behind the scenes, it's

0:37:23.360 --> 0:37:28.120
<v Speaker 1>hard to draw any firm conclusions. Most of us, myself included,

0:37:28.400 --> 0:37:32.000
<v Speaker 1>have a limited understanding of exactly how much data we're

0:37:32.040 --> 0:37:34.719
<v Speaker 1>generating in our day to day lives and how that

0:37:34.840 --> 0:37:38.719
<v Speaker 1>data can be analyzed for patterns and predictions. We may

0:37:38.760 --> 0:37:42.080
<v Speaker 1>not even be aware that we're heading toward a particular

0:37:42.120 --> 0:37:46.840
<v Speaker 1>decision before an algorithm draws that conclusion, and it's spooky

0:37:46.960 --> 0:37:50.080
<v Speaker 1>and disturbing. But it doesn't necessarily mean that we're being

0:37:50.160 --> 0:37:53.440
<v Speaker 1>spied on by a microphone. It may mean we're just

0:37:53.520 --> 0:37:57.880
<v Speaker 1>broadcasting our decisions before we've known that we've made a decision,

0:37:58.600 --> 0:38:01.640
<v Speaker 1>and it does indicate that there is some sort of

0:38:02.000 --> 0:38:05.800
<v Speaker 1>eavesdropping going on, just not necessarily audio eavesdropping.

0:38:05.800 --> 0:38:09.800
<v Speaker 1>It's more about all of our other behaviors that humans

0:38:09.840 --> 0:38:11.919
<v Speaker 1>don't pick up on, so we've never had to worry

0:38:11.960 --> 0:38:14.840
<v Speaker 1>about it before, but machines can analyze it at a

0:38:14.920 --> 0:38:19.080
<v Speaker 1>level that is disturbing. In fact, an actual study at

0:38:19.120 --> 0:38:22.560
<v Speaker 1>Northeastern University looked into the possibility of whether or not

0:38:22.719 --> 0:38:26.960
<v Speaker 1>phones were getting activated by clandestine phrases and listening in

0:38:27.000 --> 0:38:30.400
<v Speaker 1>on conversations, and it found that there was no evidence

0:38:30.480 --> 0:38:32.920
<v Speaker 1>that this was happening. They did find that a lot

0:38:33.000 --> 0:38:36.360
<v Speaker 1>of apps were taking screenshots of stuff on phones and

0:38:36.400 --> 0:38:39.080
<v Speaker 1>sending those screenshots to third parties, though, so you know,

0:38:39.560 --> 0:38:44.600
<v Speaker 1>that's also disturbing. But it doesn't appear that these devices

0:38:44.600 --> 0:38:48.320
<v Speaker 1>are actively listening to you all the time and recording

0:38:48.400 --> 0:38:54.120
<v Speaker 1>or transcribing or broadcasting that information anywhere. There's a lot

0:38:54.200 --> 0:38:59.600
<v Speaker 1>to lose from doing that approach. The problem is it

0:38:59.800 --> 0:39:03.239
<v Speaker 1>is something that is possible, and the other problem is

0:39:03.280 --> 0:39:06.239
<v Speaker 1>that there are other behaviors we're doing that are just

0:39:06.320 --> 0:39:09.719
<v Speaker 1>as revealing, if not more so, than recording what it

0:39:09.840 --> 0:39:13.919
<v Speaker 1>is we're saying, and that without being aware of that,

0:39:14.360 --> 0:39:18.040
<v Speaker 1>we are just giving away more and more information about

0:39:18.040 --> 0:39:21.200
<v Speaker 1>ourselves and more and more control over our own lives.

0:39:21.360 --> 0:39:23.360
<v Speaker 1>And we're going to see more and more targeted ads

0:39:23.360 --> 0:39:27.400
<v Speaker 1>that seem super creepy because they're mentioning things that we

0:39:27.400 --> 0:39:31.359
<v Speaker 1>didn't think anyone knew about, because most people wouldn't pick

0:39:31.440 --> 0:39:35.080
<v Speaker 1>up on it. Fun times. So I don't think this

0:39:35.160 --> 0:39:39.800
<v Speaker 1>was a particularly you know, um, I don't think this

0:39:39.880 --> 0:39:44.440
<v Speaker 1>show really helps allay any fears. It may just switch

0:39:44.520 --> 0:39:48.759
<v Speaker 1>fears from microphones to everything else. But I did want

0:39:48.760 --> 0:39:50.920
<v Speaker 1>to cover this because a lot of people have been

0:39:50.960 --> 0:39:53.319
<v Speaker 1>talking about it for the last few years, and with

0:39:53.520 --> 0:39:59.560
<v Speaker 1>these transcription services that has brought the whole conversation back

0:39:59.640 --> 0:40:02.120
<v Speaker 1>into the forefront. So I wanted to take an

0:40:02.160 --> 0:40:05.080
<v Speaker 1>opportunity to really tackle it here on the show. If

0:40:05.120 --> 0:40:08.080
<v Speaker 1>you have a suggestion for a future episode of tech Stuff,

0:40:08.320 --> 0:40:10.920
<v Speaker 1>send me an email. The address is tech Stuff at how

0:40:11.000 --> 0:40:13.319
<v Speaker 1>stuff works dot com, or drop me a line by

0:40:13.640 --> 0:40:16.760
<v Speaker 1>going to tech stuff podcast dot com. You will find

0:40:16.920 --> 0:40:20.239
<v Speaker 1>there a link to all of our archived episodes, as

0:40:20.280 --> 0:40:23.120
<v Speaker 1>well as links to our presence on social media where

0:40:23.160 --> 0:40:25.120
<v Speaker 1>you can get in touch with us, and also a

0:40:25.160 --> 0:40:27.640
<v Speaker 1>link to our online store, where every purchase you make

0:40:27.760 --> 0:40:30.880
<v Speaker 1>goes to help the show. We greatly appreciate your support

0:40:31.400 --> 0:40:39.359
<v Speaker 1>and I will talk to you again really soon. Tech

0:40:39.400 --> 0:40:42.040
<v Speaker 1>Stuff is a production of I Heart Radio's How Stuff Works.

0:40:42.200 --> 0:40:45.040
<v Speaker 1>For more podcasts from I Heart Radio, visit the I

0:40:45.160 --> 0:40:48.360
<v Speaker 1>Heart Radio app, Apple Podcasts, or wherever you listen to

0:40:48.400 --> 0:40:49.360
<v Speaker 1>your favorite shows.