WEBVTT - Zoom and Enhance: The Sound Edition 0:00:00.160 --> 0:00:07.160 Brought to you by Toyota. Let's go places. Welcome to 0:00:07.360 --> 0:00:14.920 Forward Thinking. Hey there and welcome to Forward Thinking, the 0:00:15.000 --> 0:00:17.680 podcast that looks at the future and says, but my 0:00:17.800 --> 0:00:21.239 words like silent raindrops fell and echoed in the wells 0:00:21.360 --> 0:00:27.960 of silence. I'm Jonathan Strickland and I'm Joe McCormick. And today, Yes, Joe, 0:00:28.800 --> 0:00:32.680 we're going to be zooming in and enhancing the topic 0:00:32.720 --> 0:00:35.440 we've talked about before. Yeah. We we talked about zoom 0:00:35.440 --> 0:00:39.120 and enhance a while ago, right, we did one way 0:00:39.159 --> 0:00:43.440 back in Uh, you know, I haven't gone back and 0:00:43.479 --> 0:00:46.600 listened to any of our episodes from in a while, 0:00:46.680 --> 0:00:50.800 but I don't know. I feel like I'd be embarrassed 0:00:50.800 --> 0:00:54.440 to hear myself then. We have learned a lot about podcasting. 0:00:55.480 --> 0:01:00.240 I've forgotten more than I've ever learned. Wait what No, anyway, So, yeah, 0:01:00.280 --> 0:01:02.960 we we recorded one about zoom and enhance with image 0:01:03.600 --> 0:01:06.480 imagery back in August two thousand thirteen. We're recording this 0:01:06.520 --> 0:01:08.680 at the beginning of September two thousand fifteen, so it's 0:01:08.680 --> 0:01:11.639 been more than two years. So clearly, first of all, 0:01:12.400 --> 0:01:15.040 we finally solved that zoom and enhance for video and 0:01:15.400 --> 0:01:18.280 pictures right, That's that's done. Now you can take like 0:01:18.319 --> 0:01:22.680 a highly pixelated JPEG and turn it into a full 0:01:22.720 --> 0:01:25.360 motion three D video and it doesn't matter if that 0:01:25.480 --> 0:01:29.280 image was taken seventy years ago. You can still do it, 0:01:29.720 --> 0:01:33.320 you know. I came across just recently. 
Uh, my wife 0:01:33.440 --> 0:01:36.480 Rachel and I have been going back through The X 0:01:36.520 --> 0:01:41.360 Files because it's up on Netflix, and and and uh Christian, 0:01:41.440 --> 0:01:43.720 our our colleague Christian who's also on Stuff to Blow 0:01:43.760 --> 0:01:45.640 Your Mind with me and Lauren, there have been like, 0:01:45.680 --> 0:01:48.280 you've got to watch The X Files, so we have, 0:01:48.760 --> 0:01:52.920 and we came across an episode that has probably the 0:01:52.920 --> 0:01:56.680 most egregious and ridiculous case of zoom and enhance I 0:01:56.760 --> 0:02:00.680 have ever seen in any show. It's an episode where 0:02:00.720 --> 0:02:05.400 there is a character who does psychic photography, so he 0:02:05.520 --> 0:02:08.079 like gets near a camera and whatever is going on 0:02:08.160 --> 0:02:11.200 in his head ends up on the film, and it's 0:02:11.280 --> 0:02:17.560 this Polaroid of somebody's vision of some hellish nightmarescape 0:02:17.600 --> 0:02:21.480 with these ghosts called howlers screaming around a woman's face, 0:02:21.800 --> 0:02:25.079 and then in the background there's this little blur and 0:02:25.080 --> 0:02:27.520 and Mulder goes to the lab where they zoom and 0:02:27.639 --> 0:02:30.280 enhance the little blur and then they get this perfectly 0:02:30.360 --> 0:02:34.239 resolved picture of a guy's face from a Polaroid. Yeah, 0:02:34.320 --> 0:02:37.960 seems like that might be a little far fetched. Um. Yeah, 0:02:38.000 --> 0:02:40.920 as it turns out, we do not have this magical capability. 0:02:40.960 --> 0:02:44.679 Things have improved since two thousand thirteen. 
You know, the 0:02:45.040 --> 0:02:48.680 basic premise of zoom and enhance is all about you take 0:02:48.919 --> 0:02:53.320 you take an existing video or image, you concentrate your 0:02:53.400 --> 0:02:57.600 view on one particular sector of that image, and then 0:02:57.680 --> 0:03:00.320 you zoom that in and you are enhancing 0:03:00.360 --> 0:03:03.000 the picture so that it is more uh, that you 0:03:03.040 --> 0:03:05.399 can see what it is. And for one thing, we've 0:03:05.440 --> 0:03:09.280 got super high resolution cameras now, so there's some that 0:03:09.320 --> 0:03:12.560 are so high resolution that a normal view you'd be 0:03:12.600 --> 0:03:15.560 looking at you know, like a like a YouTube video, 0:03:15.720 --> 0:03:18.480 something along that size. And then it turns out that 0:03:18.560 --> 0:03:22.160 the resolution is so high you could digitally punch into 0:03:22.200 --> 0:03:25.359 parts of that video and not lose a lot of resolution. 0:03:25.400 --> 0:03:27.480 So it gives you the effect of zoom and enhance. 0:03:27.480 --> 0:03:29.960 But really it's not enhancing. It's just that information is 0:03:30.000 --> 0:03:33.320 already there. Yeah, and that's the key. The information is 0:03:33.360 --> 0:03:37.280 already there. We have never gotten and we will never 0:03:37.360 --> 0:03:41.720 get to a point where there can be information retrieved 0:03:41.800 --> 0:03:44.680 from an image that was not recorded in the image. 0:03:44.680 --> 0:03:47.320 Now we can have it simulated. Right. That's exactly what 0:03:47.360 --> 0:03:50.880 we're going to talk about today, is finding ways to 0:03:51.000 --> 0:03:54.280 use what information is recorded in the image and do 0:03:54.520 --> 0:03:57.760 very smart things with it with computer programs. Right. 
And 0:03:57.800 --> 0:04:01.040 the really cool thing is that we're not talking about 0:04:01.400 --> 0:04:05.680 actual like images as the end result in this case, right, 0:04:05.800 --> 0:04:11.280 we are talking about using images to reconstruct sound. Yeah. 0:04:11.320 --> 0:04:14.920 And this all comes to us because we watched a 0:04:15.000 --> 0:04:19.080 TED talk in which a computer scientist named Abe Davis 0:04:19.480 --> 0:04:25.080 talked about a really interesting project that involved cameras and 0:04:25.640 --> 0:04:30.080 uh not necessarily well, I guess inanimate objects really and 0:04:30.200 --> 0:04:34.240 being able to reconstruct sound that took place near that 0:04:34.279 --> 0:04:39.040 inanimate object as if that inanimate object itself were a microphone. 0:04:40.160 --> 0:04:43.800 Say that all again? All right, so, um, there's this 0:04:43.800 --> 0:04:46.520 guy named Abe Davis. No no, no, no, no. The 0:04:46.600 --> 0:04:50.800 relevant part it does what sound? So yeah, it acts 0:04:50.800 --> 0:04:54.520 as if any object itself is a microphone, not in 0:04:54.560 --> 0:04:58.400 the sense of amplifying what was said, but in the 0:04:58.480 --> 0:05:02.400 sense of of being being a record of what was said, 0:05:02.440 --> 0:05:07.120 Like the vibrations of the material itself are able to 0:05:07.320 --> 0:05:11.599 inform us enough that we can we can replicate the 0:05:11.680 --> 0:05:14.480 sound that was created. Yeah, he's he's part of this 0:05:14.560 --> 0:05:17.279 team that includes other researchers at MIT. He's 0:05:17.320 --> 0:05:19.800 a he's a grad student or possibly a graduate of 0:05:19.880 --> 0:05:22.880 MIT, uh, and also scientists over at Microsoft 0:05:22.920 --> 0:05:26.560 and Adobe. And this is so cool. Yes, this, So 0:05:26.600 --> 0:05:31.479 this is a synesthesia machine. It sees sound. This it's 0:05:31.520 --> 0:05:34.560 like it took a bunch of LSD and began to 0:05:34.680 --> 0:05:39.440 see the music. 
Except that's literally what they can do now. Yeah, 0:05:39.440 --> 0:05:42.040 it's and and when you start to break down what's 0:05:42.080 --> 0:05:46.080 going on, it starts to be less magical, but no 0:05:46.400 --> 0:05:49.640 less amazing. All right, So let's let's take some of 0:05:49.680 --> 0:05:52.560 the mysticism and magic out first. And to do that, 0:05:52.640 --> 0:05:54.880 we have to talk about what sound is, which we've 0:05:54.920 --> 0:05:56.960 done on this show before, but I'm going to give 0:05:56.960 --> 0:06:01.800 a quick rundown. So sound is really the energy of vibration, right, 0:06:01.839 --> 0:06:05.440 So when something vibrates, that's when it's creating a sound. 0:06:06.000 --> 0:06:09.360 And as long as there is some sort of medium 0:06:09.560 --> 0:06:13.120 for that sound to travel through, such as air exactly, 0:06:13.440 --> 0:06:16.080 water or the wood of a table that you've put 0:06:16.080 --> 0:06:19.160 your head down on, right, any of these things. Metal 0:06:19.200 --> 0:06:27.840 mask around your head hatchet. Now you're just giving Aaron 0:06:27.880 --> 0:06:31.839 Cooper fodder for the next next image. Alright, So but yes, 0:06:31.880 --> 0:06:34.240 as long as there's a medium through which sound can travel, 0:06:34.279 --> 0:06:36.840 it will travel as far as it possibly can before 0:06:36.839 --> 0:06:41.040 the energy has essentially dissipated. And this is why sound 0:06:41.040 --> 0:06:43.679 does not really does not travel in space because space 0:06:43.760 --> 0:06:47.159 is effectively a vacuum. So there are no particles, there's 0:06:47.160 --> 0:06:50.080 no medium through which the sound can travel. But here 0:06:50.160 --> 0:06:53.560 on Earth we've got air, thank goodness, because this is 0:06:53.560 --> 0:06:57.039 where I keep all my stuff, and air can act 0:06:57.120 --> 0:07:00.200 as a medium through which sound can travel. 
So what 0:07:00.279 --> 0:07:03.679 happens is when something vibrates, h, it begins to pull 0:07:03.720 --> 0:07:07.400 and push uh, the air around it. So if you 0:07:07.440 --> 0:07:10.560 imagine a vibration, some of those vibrations are going to 0:07:10.720 --> 0:07:15.320 move inward based on your perspective, that's gonna pull air 0:07:15.400 --> 0:07:18.920 toward it. Sometimes it's going to be moving outward, pushing 0:07:18.960 --> 0:07:22.360 air away from it. So think of like a a 0:07:22.480 --> 0:07:26.600 vibrating string on a guitar or a vibrating drumhead. Uh, 0:07:26.640 --> 0:07:30.440 that's going to be pushing and pulling air. Now that air, 0:07:30.680 --> 0:07:32.640 in turn, is going to be pushing and pulling the 0:07:32.640 --> 0:07:35.200 air molecules around it, and so on and so forth. 0:07:35.200 --> 0:07:38.160 It's this great, big chain reaction because our atmosphere is 0:07:38.160 --> 0:07:41.400 a giant fluid, right, It's it's a gas, but it's 0:07:41.400 --> 0:07:44.960 it's a It acts as a fluid. So these various 0:07:44.960 --> 0:07:48.400 molecules will continue to push and pull, and then eventually 0:07:48.520 --> 0:07:52.000 that motion will make it into the air inside of 0:07:52.040 --> 0:07:55.120 your ears. So it's not that the air molecules that 0:07:55.160 --> 0:07:57.600 were next to the strumming string on the guitar have 0:07:57.760 --> 0:08:00.520 magically made their way to your ear. It's rather that 0:08:00.520 --> 0:08:04.400 that motion has continued to move up at the speed 0:08:04.440 --> 0:08:07.960 of push to your to the air inside your ears. 
Sure, 0:08:08.320 --> 0:08:11.160 now at that point it ends up vibrating your ear drum, 0:08:11.240 --> 0:08:15.120 which then goes through this whole complicated series of maneuvers 0:08:15.160 --> 0:08:17.760 where you're talking about tiny bones and the cochlea and 0:08:17.960 --> 0:08:19.960 fluid and we're not going to get into this to 0:08:20.120 --> 0:08:23.080 how hearing works. You can actually read an amazing article 0:08:23.080 --> 0:08:25.160 at how stuff works dot com on how hearing works 0:08:25.200 --> 0:08:30.960 that explains it. But our brains ultimately interpret this motion 0:08:31.440 --> 0:08:35.000 as sound. Now, of course, the key fact here is 0:08:35.080 --> 0:08:39.840 that sound is vibration, and that vibration is something that 0:08:39.920 --> 0:08:43.400 you could in theory see. Yeah, if you could see 0:08:43.440 --> 0:08:45.880 fast enough, and if you could tell what you were 0:08:45.880 --> 0:08:49.719 looking at. Yeah, if you could see with the ability 0:08:49.760 --> 0:08:55.120 to really notice minute changes. I said fast enough, I 0:08:55.160 --> 0:08:58.240 should have said I guess fast enough, enough frames per 0:08:58.280 --> 0:09:01.960 second and with enough resolution. Yeah right. So, uh, 0:09:02.000 --> 0:09:04.760 you know this is like there's certain videos where if 0:09:04.800 --> 0:09:08.880 you do you know, high speed photography, high speed film, 0:09:08.960 --> 0:09:12.520 you can see how how something like a tuning fork, 0:09:13.240 --> 0:09:16.000 when you strike it and it's vibrating, you can actually 0:09:16.040 --> 0:09:21.080 see how it's moving in and out of its normal alignment, 0:09:21.160 --> 0:09:24.520 and it looks really freaky because when you just look 0:09:24.559 --> 0:09:27.600 at it with our normal eyes, our normal ability to perceive, 0:09:28.200 --> 0:09:31.360 it doesn't really have that, you know, you don't see 0:09:31.360 --> 0:09:33.839 it distorting like it does in that high speed video. 
0:09:34.760 --> 0:09:39.240 But uh, if we could see it, and if we 0:09:39.320 --> 0:09:42.640 could then interpret those vibrations, if we if we knew, 0:09:43.120 --> 0:09:46.920 all right, it's vibrating at this speed and this amplitude, 0:09:47.320 --> 0:09:50.679 that would tell us the pitch and volume of the 0:09:50.760 --> 0:09:53.880 sound that was affecting it, if we knew enough of 0:09:53.920 --> 0:09:58.800 the properties of the material itself. So that's the basis 0:09:58.920 --> 0:10:01.640 of the experiment that these folks from MIT 0:10:01.800 --> 0:10:06.200 were following, and it was all about kind of pointing 0:10:06.280 --> 0:10:10.840 a camera at an object, a camera that was capable 0:10:10.960 --> 0:10:15.920 of detecting these minute changes, these these movements of that object, 0:10:16.360 --> 0:10:19.320 and then feeding that through a computer that had an 0:10:19.360 --> 0:10:24.000 algorithm that could interpret those changes as sound and then 0:10:25.080 --> 0:10:29.440 reconstruct the sound that must have happened to produce those changes. 0:10:30.000 --> 0:10:33.920 And the results are pretty amazing. Yeah, I have to 0:10:33.960 --> 0:10:39.079 say I was really impressed. I am astonished. Yeah. Yeah, 0:10:39.160 --> 0:10:42.240 it was one of those things where well, first of all, uh, 0:10:42.480 --> 0:10:45.800 they decided to use Mary Had a Little Lamb as 0:10:45.920 --> 0:10:49.120 a lot of their their you know what they would 0:10:49.120 --> 0:10:53.480 try to record, right, which is a throwback to experiments 0:10:53.520 --> 0:10:56.880 that Edison was doing way back when. Yeah. The first, 0:10:57.000 --> 0:11:00.280 the earliest recording that we know of that Thomas 0:11:00.400 --> 0:11:04.520 Edison made dates to eighteen seventy eight. It was on 0:11:04.559 --> 0:11:11.319 a device that recorded messages onto tinfoil. 
Interestingly, the scholarship 0:11:11.360 --> 0:11:14.280 suggests that it wasn't Edison himself that provided the voice. 0:11:14.320 --> 0:11:18.480 It was probably someone else, but the voice says he made. 0:11:20.480 --> 0:11:23.559 The voice says Mary had a little lamb, And it's 0:11:23.679 --> 0:11:27.520 very loud and very deliberate because the technology was brand 0:11:27.520 --> 0:11:30.560 new and it was not high fidelity by any stretch 0:11:30.559 --> 0:11:33.280 of the imagination. So this is kind of a a 0:11:34.720 --> 0:11:37.439 sort of genuflecting to history, saying, well, that this was 0:11:37.480 --> 0:11:42.160 a significant moment in history. We're going to use that same, uh, 0:11:42.200 --> 0:11:46.720 that same idea when we're trying this new experiment. And 0:11:46.760 --> 0:11:49.720 it worked like that. They did both tones of the 0:11:49.760 --> 0:11:52.960 song Mary Had a Little Lamb, and they also did 0:11:53.080 --> 0:11:56.720 spoken variations of Mary Had a Little Lamb, which in 0:11:56.800 --> 0:11:58.719 the TED talk I highly recommend you watch it. It's 0:11:58.800 --> 0:12:02.920 very entertaining. Abe Davis is actually a very entertaining presenter. He 0:12:03.000 --> 0:12:05.560 talks about how he he shot a you know, one 0:12:05.559 --> 0:12:08.479 of the videos. He shows the video of him shouting 0:12:08.920 --> 0:12:12.360 at an empty bag of potato chips. Yeah. Yeah, it 0:12:12.480 --> 0:12:16.000 is a very technical experiment that definitely involves MIT 0:12:16.240 --> 0:12:19.560 grad students yelling the lyrics to Mary Had a Little 0:12:19.640 --> 0:12:22.800 Lamb at empty bags of potato chips. Right. He even 0:12:22.840 --> 0:12:25.640 talks about how, you know, originally they wanted to have 0:12:26.280 --> 0:12:30.079 the best possible um a chance to be able to 0:12:30.120 --> 0:12:33.120 pick up these vibrations. 
They knew that these uh, these 0:12:33.400 --> 0:12:36.439 vibrations were going to be tiny like a micrometer, like 0:12:36.440 --> 0:12:39.880 like a tenth of a micrometer. Yeah, that's super tiny. 0:12:40.559 --> 0:12:42.520 So they wanted to be able to get that with 0:12:42.760 --> 0:12:45.480 a pretty high resolution high speed camera, and they had 0:12:45.520 --> 0:12:48.240 to use a lot of light because these high speed 0:12:48.280 --> 0:12:50.400 cameras that you know, the shutter speed is so fast 0:12:50.760 --> 0:12:52.200 that you need a lot of light to light your 0:12:52.240 --> 0:12:54.520 scene in order to get an image of what you're 0:12:54.760 --> 0:12:57.680 you're pointing the camera at. And he even talked about 0:12:57.720 --> 0:13:00.000 how the lights were so hot that on a previous 0:13:00.200 --> 0:13:04.080 experiment they melted the bag, the empty bag of potato 0:13:04.160 --> 0:13:06.040 chips as a result of this. So it was a 0:13:06.040 --> 0:13:09.760 lot of trial and error early on, but it worked. Yeah. So, 0:13:09.760 --> 0:13:13.120 so they were using objects like like bags of potato chips, 0:13:13.160 --> 0:13:17.079 empty bags of potato chips, and potted plants. And the 0:13:17.120 --> 0:13:20.240 camera that they were using for these first experiments was 0:13:20.240 --> 0:13:24.440 was a high speed camera could capture at two thousand 0:13:24.480 --> 0:13:27.640 to six thousand frames per second, which is a higher 0:13:27.679 --> 0:13:31.520 frequency than the audio signal. But it certainly isn't like 0:13:31.520 --> 0:13:35.320 like the highest possible end high speed camera on the market, 0:13:35.400 --> 0:13:38.680 Phantom or anything. Yeah. Yeah, yeah, the the highest speed 0:13:38.679 --> 0:13:42.280 cameras run something like a thousand a hundred thousand frames 0:13:42.280 --> 0:13:47.079 per second. Sorry, And the software could could pick up 0:13:47.120 --> 0:13:52.160 these tiny, these tiny, tiny movements. 
Um a tenth of 0:13:52.160 --> 0:13:56.000 a micrometer is something like five thousandths of a pixel, 0:13:56.960 --> 0:14:00.680 and it could do that thanks to very subtle changes 0:14:00.800 --> 0:14:04.080 in each pixel's color values at the edges of 0:14:04.120 --> 0:14:08.600 the objects that were being studied. So yeah, and it's 0:14:08.640 --> 0:14:11.719 also that he pointed out in the TED talk that 0:14:12.200 --> 0:14:15.120 it's not like the camera was pointed at one particular, 0:14:15.240 --> 0:14:18.120 tiny little edge of one of these objects. It could 0:14:18.200 --> 0:14:22.520 actually take into account all of the different vibrations happening 0:14:22.520 --> 0:14:26.520 across the object, and that collectively that provided the data 0:14:26.600 --> 0:14:30.000 necessary for them to be able to reconstruct the audio. Right. 0:14:30.080 --> 0:14:33.720 It's rooted in research from from MIT's 0:14:33.760 --> 0:14:38.080 Computer Science and Artificial Intelligence Laboratory, and the software that 0:14:38.080 --> 0:14:41.320 that team was developing was originally intended to amplify color 0:14:41.440 --> 0:14:43.480 changes in video, but then they realized that that it 0:14:43.520 --> 0:14:46.840 could thereby amplify motion, and so they bent it to 0:14:47.680 --> 0:14:52.920 to uh tasks like monitoring blood flow unobtrusively and then 0:14:53.080 --> 0:14:55.200 show that in the video right that they show the 0:14:55.280 --> 0:14:59.160 pulse of someone's arm. Because of these minute changes, they're 0:14:59.200 --> 0:15:01.360 able to amplify that to the point where you 0:15:01.360 --> 0:15:03.960 can actually see the pulse, which is, by the way, 0:15:04.160 --> 0:15:08.960 a little freaky sure, but also pretty cool. 
And and 0:15:09.040 --> 0:15:11.640 so this this new team, this this acoustic team in 0:15:13.160 --> 0:15:16.440 built on top of that software, adding the algorithms that 0:15:16.480 --> 0:15:20.520 would identify the whole object and monitor its overall movements 0:15:20.560 --> 0:15:27.360 in order to create the the sound goodness. And it 0:15:27.400 --> 0:15:30.680 was interesting because once they determined that they could capture 0:15:31.320 --> 0:15:35.640 the motion under those quote unquote ideal circumstances with the 0:15:35.760 --> 0:15:39.120 bright lighting and the high speed camera, they started to 0:15:39.560 --> 0:15:43.720 test how far outside of those ideal circumstances they could 0:15:43.760 --> 0:15:47.600 still capture meaningful information and be able to replicate the 0:15:47.680 --> 0:15:51.960 sound that occurred next to that physical object. And you know, 0:15:52.080 --> 0:15:54.040 the idea being that you would be able to replicate 0:15:54.040 --> 0:15:57.720 sound even if there were no microphones, no official microphones working. 0:15:58.560 --> 0:16:04.120 So they ended up testing it with normal daylight providing 0:16:04.160 --> 0:16:07.480 the lighting and shooting through a soundproofed window. So the 0:16:07.520 --> 0:16:10.080 camera was on one side of the window, the object 0:16:10.160 --> 0:16:12.040 was on the other side of the window, another empty 0:16:12.080 --> 0:16:14.880 bag of chips, yep, and and that's where the sound 0:16:14.920 --> 0:16:17.360 generator was, on the other side of the soundproof window. 0:16:17.360 --> 0:16:20.040 So uh, in theory, there shouldn't have been any sound 0:16:20.080 --> 0:16:23.880 bleed over into the camera and they could still pick 0:16:23.960 --> 0:16:28.240 up sound that way clearly. Yeah. Um, and with normal 0:16:28.560 --> 0:16:32.120 indoor light. 
They filmed a pair of earbuds like normal plastic, 0:16:32.200 --> 0:16:35.520 cheapo earbuds, and then reconstructed the music that they 0:16:35.520 --> 0:16:40.280 were playing well enough that they successfully Shazamed the music 0:16:40.480 --> 0:16:42.960 and it was Under Pressure, which I realized now that 0:16:42.960 --> 0:16:44.960 I should have used a lyric from that from the 0:16:45.000 --> 0:16:48.640 beginning of the show, but never mind that. Uh They 0:16:48.720 --> 0:16:51.760 also furthermore found that they could use a standard camera, 0:16:51.920 --> 0:16:55.880 not a highfalutin camera. And we're talking standard like 0:16:55.880 --> 0:16:59.440 like sixty frames per second smartphone camera or you know something 0:16:59.480 --> 0:17:02.360 you could run out and buy at Target. And this 0:17:02.440 --> 0:17:05.280 is thanks to a quirk in how standard digital cameras 0:17:05.320 --> 0:17:09.720 handle fast moving objects. It would be more accurate for 0:17:09.720 --> 0:17:12.520 for them to read measurements off of your whole array 0:17:12.600 --> 0:17:15.720 of photo detectors at the same time, but that is 0:17:15.800 --> 0:17:20.400 kind of expensive, so uh so cameras that are less 0:17:20.400 --> 0:17:24.160 expensive than than super high speed cameras instead read off 0:17:24.200 --> 0:17:27.480 of their photo detectors one row at a time, sort 0:17:27.520 --> 0:17:29.840 of like scan line televisions. And it does this very 0:17:29.920 --> 0:17:33.720 quickly but not instantaneously sure, and it can lead to 0:17:33.760 --> 0:17:36.040 that weird lag that you might have noticed in some 0:17:36.200 --> 0:17:39.200 videos of high speed objects, like sort of jagged edges 0:17:39.359 --> 0:17:42.199 or extra pixelation when the object is moving faster than 0:17:42.240 --> 0:17:45.440 the software can can handle. Yeah. 
I think in the 0:17:45.560 --> 0:17:48.600 MIT article we read, they use the example 0:17:48.600 --> 0:17:51.800 of a rotor blade of a helicopter. Right. Sure, sure, 0:17:52.240 --> 0:17:54.760 it might be spinning so fast that it's not going 0:17:54.840 --> 0:17:57.239 to capture it the same way your eye would see it, 0:17:57.280 --> 0:17:59.560 but it's going to scan the blade in a different 0:17:59.560 --> 0:18:03.520 position each line, right, and uh on, on a much 0:18:03.720 --> 0:18:08.240 smaller level, invisible to the naked eye. This flaw in 0:18:08.640 --> 0:18:12.640 in normal cameras creates visual artifacts that the researchers found 0:18:12.680 --> 0:18:16.600 out that they could use in order to measure subtle vibrations. 0:18:16.720 --> 0:18:19.639 So the audio reconstructions that they got out of this 0:18:19.720 --> 0:18:23.720 experiment weren't as close to the original audio, but the 0:18:23.760 --> 0:18:26.639 researchers did report that they could probably still identify like 0:18:26.680 --> 0:18:29.160 the number of speakers in a room, or the identity 0:18:29.240 --> 0:18:32.320 of a speaker, given that you have an audio profile 0:18:32.359 --> 0:18:35.400 of the person's voice to begin with. Yeah, so, uh 0:18:35.440 --> 0:18:39.639 definitely a little more like um muted and a little 0:18:39.640 --> 0:18:42.359 more distorted, to the point where if you had not 0:18:42.480 --> 0:18:44.800 already heard what was being said, you might not be 0:18:44.840 --> 0:18:48.800 able to necessarily reconstruct it. Um. Our brains are interesting, right, 0:18:48.880 --> 0:18:51.520 Like if we hear what we're supposed to hear, and 0:18:51.560 --> 0:18:54.480 then we hear the sound played, we're more likely to 0:18:54.520 --> 0:18:57.480 pick it out. Uh. This is something that you find 0:18:57.480 --> 0:19:00.440 in people who claim for for like you know, hidden 0:19:00.480 --> 0:19:04.280 messages and backmasking and that kind of stuff. 
Yeah, if 0:19:04.280 --> 0:19:08.000 you if you listen to the raw sound file without 0:19:08.040 --> 0:19:11.720 any prompting, you may end up saying I didn't make 0:19:11.720 --> 0:19:13.560 anything out and then someone says, oh, you need to 0:19:13.600 --> 0:19:17.080 listen for the phrase, um help me, Jonathan has me 0:19:17.119 --> 0:19:20.080 trapped in the basement. You might end up hearing it. 0:19:20.680 --> 0:19:23.880 I try very hard to make sure don't hear it, 0:19:24.880 --> 0:19:30.960 but you might hear it. Okay. So anyway, that's when 0:19:31.000 --> 0:19:38.400 you play under pressure backwards. Oh vo vovo oh, Freddie Mercury. 0:19:39.200 --> 0:19:41.440 Other methods that you can use to pick up sound 0:19:41.440 --> 0:19:45.000 from a distance. Uh. Some of them use a similar method, 0:19:45.040 --> 0:19:48.880 Like there's one that uses lasers, and in fact, at 0:19:48.880 --> 0:19:52.639 the TED talk, Davis actually says this particular approach, a 0:19:52.720 --> 0:19:55.359 lot of people might immediately spring to the conclusion that 0:19:55.400 --> 0:19:57.440 you would use it to spy on someone, you would 0:19:58.000 --> 0:20:01.120 aim a camera in at something that would be kind 0:20:01.119 --> 0:20:03.359 of unobtrusive, like a potted plant that happens to be 0:20:03.400 --> 0:20:05.920 near a person's desk, and then you could end up 0:20:05.920 --> 0:20:09.520 replicating conversations that went on inside that room by just 0:20:09.680 --> 0:20:12.679 measuring the vibrations of that plant and then running it 0:20:12.680 --> 0:20:16.760 through this algorithm to recreate the sound that happened. Most 0:20:16.800 --> 0:20:19.920 evil plots take place in a room with a potted plant. Yeah, 0:20:20.000 --> 0:20:24.240 you know. Uh, yeah, that's why I only use cacti now, 0:20:24.280 --> 0:20:29.320 But even they have have been deceptive. So the the 0:20:29.359 --> 0:20:32.439 point that Davis makes is that this is pretty low fidelity. 
0:20:32.520 --> 0:20:33.920 If you want to do something like that, and if 0:20:33.920 --> 0:20:37.080 you really want to do that, there are alternatives that 0:20:37.200 --> 0:20:42.359 provide much higher quality recordings, uh, mainly using laser microphones. Now, 0:20:42.560 --> 0:20:44.920 you guys heard of laser microphones before we started looking 0:20:44.920 --> 0:20:47.639 into this, No, I don't think I had. It was 0:20:47.680 --> 0:20:49.280 one of those things that I had heard about only 0:20:49.320 --> 0:20:53.200 because I was looking through the Spy Museum in Washington, 0:20:53.280 --> 0:20:56.600 d C. And And read about them. So they work 0:20:56.600 --> 0:21:00.119 on a very similar principle, except instead of detecting the 0:21:00.200 --> 0:21:03.520 vibrations optically, what they're doing is they're using it's still optical, 0:21:03.600 --> 0:21:07.480 but it's not you know, like a visual approach. Um 0:21:07.560 --> 0:21:11.040 you're shooting a laser out at an object and as 0:21:11.040 --> 0:21:15.000 that object vibrates when it is exposed to sound the 0:21:15.080 --> 0:21:17.520 returning laser light. Because you know, it's all based on 0:21:17.680 --> 0:21:20.120 shooting a laser out and then detecting when it comes back, 0:21:20.560 --> 0:21:24.159 the returning laser light will have slightly different arrival times 0:21:24.160 --> 0:21:27.160 than when it was sent out based on those vibrations 0:21:27.160 --> 0:21:29.040 of it's vibrating out, it's going to be a little 0:21:29.080 --> 0:21:32.479 shorter than if it's vibrating in. And while that sounds 0:21:32.480 --> 0:21:36.560 incredibly minute, and it is, it's enough to be able 0:21:36.680 --> 0:21:40.400 to take that data, feed it back through and create 0:21:40.480 --> 0:21:43.679 a sound file based on it, so you could replicate 0:21:43.800 --> 0:21:46.199 things that are being said or other sounds that are 0:21:46.240 --> 0:21:48.879 going on. 
And in fact, there are a lot of 0:21:48.880 --> 0:21:53.280 places that that due to their classified nature, due to 0:21:53.320 --> 0:21:56.359 the secrecy of stuff that goes on inside, they take 0:21:56.440 --> 0:22:01.480 great pains to try and obfuscate any view into 0:22:01.600 --> 0:22:04.840 the place, whether that's creating sort of a double glazing 0:22:04.840 --> 0:22:07.080 on the window so it disperses the laser beam and 0:22:07.080 --> 0:22:10.119 thus the laser beam can't get a good read um 0:22:10.240 --> 0:22:14.119 or other elements as well. So that's super spy stuff 0:22:14.160 --> 0:22:16.320 that most of us don't have to worry about. But 0:22:16.400 --> 0:22:21.240 as Davis would say, that is a more relevant fear 0:22:21.400 --> 0:22:24.760 than someone using a camera to look in Um, that's 0:22:24.840 --> 0:22:29.200 not as likely. I mean, why would you do. It's expensive, 0:22:31.840 --> 0:22:35.440 but you do have to get your access to the 0:22:35.480 --> 0:22:38.119 algorithm though. That's true, that's true. No, I've got another 0:22:38.119 --> 0:22:41.920 technology in mind. That's where you just grab somebody from 0:22:42.000 --> 0:22:44.280 the room and throw them in a van and demand 0:22:44.320 --> 0:22:45.960 to know what was said. That's not so much a 0:22:46.000 --> 0:22:51.280 technology as it is a valid strategy. Hey, vans are 0:22:51.320 --> 0:22:54.960 a technology. You could just as easily throw them in 0:22:55.000 --> 0:22:57.240 the back of a horse drawn carriage. I mean, it's 0:22:57.240 --> 0:23:00.880 just which are also a technology? Options are limitless. Uh, 0:23:00.920 --> 0:23:04.520 there are of course also long range microphones. You probably have 0:23:04.560 --> 0:23:06.800 seen these advertised in the back of a comic book. 0:23:07.160 --> 0:23:10.080 Parabolic mic. Yeah, that's that's the really popular one. 
Like 0:23:10.160 --> 0:23:12.800 that's if you ever see, like the spy kits that 0:23:12.840 --> 0:23:14.760 are made for kids who are interested in this kind 0:23:14.760 --> 0:23:16.919 of stuff, there's usually some sort of parabolic mic and 0:23:17.080 --> 0:23:21.040 involved in that. Parabolic mics are meant to be sensitive 0:23:21.080 --> 0:23:24.399 and directional. They're not terribly good at being directional, but 0:23:24.440 --> 0:23:27.879 the really high powered ones are fairly sensitive. Uh. So 0:23:27.960 --> 0:23:30.480 the idea is that you're you're concentrating on a specific 0:23:30.480 --> 0:23:32.600 area to try and pick up sound from that area 0:23:32.680 --> 0:23:36.000 while trying to block sound from as other from all 0:23:36.040 --> 0:23:38.479 the other directions as much as possible. Right, Because as 0:23:38.600 --> 0:23:41.720 as you keep turning up sensitivity in any kind of recorder, 0:23:41.760 --> 0:23:44.080 you're going to increase your noise. Yeah, it's it's like 0:23:44.119 --> 0:23:46.840 turning up the gain on a microphone. If you crank 0:23:46.920 --> 0:23:48.840