WEBVTT - Techstuff Classic: The Dirt on Digital Audio 0:00:04.160 --> 0:00:07.200 Get in touch with technology with TechStuff from How 0:00:07.240 --> 0:00:14.720 Stuff Works dot com. Hey everybody, it's Jonathan Strickland here 0:00:14.920 --> 0:00:19.120 with TechStuff classic episodes. We're doing some Saturday morning 0:00:19.239 --> 0:00:22.439 reruns for you guys. This is a special series where 0:00:22.440 --> 0:00:25.440 we're going to dig up some classic episodes of Tech 0:00:25.520 --> 0:00:28.400 Stuff and present them to you guys who may not 0:00:28.520 --> 0:00:30.480 have had a chance to listen to them, especially if 0:00:30.480 --> 0:00:33.400 you're a brand new listener. First of all, welcome. If 0:00:33.440 --> 0:00:36.480 that's the case, I hope you enjoy these episodes. This 0:00:36.520 --> 0:00:39.800 one is called The Dirt on Digital Audio, and it 0:00:39.880 --> 0:00:43.960 is an episode all about the actual technical process of 0:00:44.040 --> 0:00:48.200 recording audio into a digital format and what that requires 0:00:48.240 --> 0:00:51.680 because it's very different from the analog style. I hope 0:00:51.720 --> 0:00:55.280 you guys enjoy it. This episode was originally published on 0:00:55.360 --> 0:00:58.960 November twenty three, two thousand sixteen. And just in case 0:00:58.960 --> 0:01:01.120 you're listening to this one in the far future, I'm 0:01:01.160 --> 0:01:04.440 recording this in two thousand eighteen, so we're gonna time 0:01:04.440 --> 0:01:08.000 travel a bit and listen to this classic episode The 0:01:08.080 --> 0:01:12.080 Dirt on Digital Audio. So to start it all off, 0:01:12.880 --> 0:01:15.840 we all have to take a quick trip to Germany, 0:01:16.000 --> 0:01:19.520 so anyone who is not in Germany, get your passport. 0:01:20.240 --> 0:01:22.280 I was actually in Germany not that long ago. I 0:01:22.319 --> 0:01:25.800 got to visit Berlin and had a wonderful time.
And 0:01:25.880 --> 0:01:29.279 in Germany there's a company called Fraunhofer-Gesellschaft. 0:01:30.080 --> 0:01:32.000 And you might wonder, well, what does this company do? 0:01:32.440 --> 0:01:38.040 They think. I joke that my profession, that my title 0:01:38.200 --> 0:01:40.800 that I should put on my business card, it should 0:01:40.800 --> 0:01:44.399 say professional smart person. Well, no joke, that's what these 0:01:44.400 --> 0:01:49.920 people are. They specialize in research and development, applied research. 0:01:51.080 --> 0:01:54.520 It's a whole company that specializes in applied research. And 0:01:54.640 --> 0:01:58.960 it's huge. It encompasses sixty seven institutes and research units 0:01:59.040 --> 0:02:04.480 across Germany. Well, back in the eighties, there was 0:02:04.520 --> 0:02:10.320 a researcher named Karlheinz Brandenburg, and Karlheinz made 0:02:10.720 --> 0:02:16.720 a breakthrough, uh, and came up with this clever 0:02:16.880 --> 0:02:21.360 idea about encoding audio. He was actually working towards creating 0:02:21.360 --> 0:02:25.640 a way that would allow for high audio quality transfer 0:02:26.040 --> 0:02:30.239 but having a low bit rate sampling, so that file 0:02:30.360 --> 0:02:34.160 sizes and transfer times wouldn't get out of control. Because 0:02:34.160 --> 0:02:36.040 you got to remember, this is the eighties, this is 0:02:36.120 --> 0:02:39.200 before the World Wide Web was a thing. That 0:02:39.360 --> 0:02:42.400 wouldn't happen until the early nineties, so the Internet 0:02:42.440 --> 0:02:44.280 was very young. In fact, they weren't even looking at 0:02:44.320 --> 0:02:47.320 the Internet as a method of distribution for this particular 0:02:47.400 --> 0:02:50.639 type of encoded audio. They were looking at using this 0:02:51.000 --> 0:02:54.880 to transmit across telephone lines. So they needed to have 0:02:54.960 --> 0:02:58.320 something that was going to be high quality but low space.
0:02:59.520 --> 0:03:01.600 So what the heck does that mean? All right? Well, 0:03:02.560 --> 0:03:06.960 digital audio and analog audio are very different things. So 0:03:07.000 --> 0:03:10.320 to understand that, we need to look at how sound 0:03:10.360 --> 0:03:14.679 works and how we describe sound, because that informs how 0:03:14.720 --> 0:03:19.440 we can capture sound and replicate those qualities digitally. So 0:03:19.520 --> 0:03:22.560 stick with me. We're gonna go back to school for 0:03:22.720 --> 0:03:27.400 some basic sound science. And this goes back to the 0:03:27.400 --> 0:03:31.720 way sound physically moves through a medium, whether that's a 0:03:31.800 --> 0:03:36.640 solid or through the air or through water. Sound is vibration. 0:03:37.480 --> 0:03:42.640 Now we sense this primarily through hearing it or sometimes 0:03:42.880 --> 0:03:45.720 feeling it. If it's the right frequency and the right amplitude, 0:03:45.720 --> 0:03:49.240 we can actually feel sound. Anyone who stood close to, 0:03:49.280 --> 0:03:52.320 say, a subwoofer that was really blasting out bass notes, 0:03:52.400 --> 0:03:54.600 you know what I'm talking about. You can feel it 0:03:54.720 --> 0:03:58.720 pressing against you. Well, sound travels through the air when 0:03:58.760 --> 0:04:03.320 molecules vibrate against each other, and this creates instances of 0:04:03.560 --> 0:04:07.840 increased pressure and decreased pressure at what is a hyper 0:04:08.000 --> 0:04:11.000 local level. We're not talking about weather maps here, we're 0:04:11.040 --> 0:04:15.160 talking about tiny little areas. So this increase and decrease 0:04:15.160 --> 0:04:17.760 in pressure is something that we can sense as sound. 0:04:18.320 --> 0:04:21.520 When those changes in pressure affect a diaphragm, such as 0:04:21.640 --> 0:04:25.640 one that's in a microphone or maybe your ear drum, 0:04:25.839 --> 0:04:30.040 for example, it causes the diaphragm to actually move.
So 0:04:30.200 --> 0:04:36.839 increased pressure pushes the diaphragm in and decreased pressure doesn't 0:04:36.920 --> 0:04:39.440 really pull the diaphragm out. I mean, you could say 0:04:39.440 --> 0:04:42.080 it pulls the diaphragm out, but to be more accurate, 0:04:42.120 --> 0:04:46.080 the diaphragm actually pushes outward because the pressure on the 0:04:46.080 --> 0:04:48.640 outside is lower than the pressure on the inside. But 0:04:48.680 --> 0:04:51.839 you get what I'm saying. The diaphragm begins to 0:04:52.440 --> 0:04:56.200 flex inward and outward depending upon the amount of pressure 0:04:56.279 --> 0:04:59.720 that it's encountering. You can imagine this being kind 0:04:59.720 --> 0:05:01.919 of like a drum, not an ear drum, but 0:05:01.920 --> 0:05:04.800 an actual drum and striking it. Uh, that's the same 0:05:04.839 --> 0:05:08.920 sort of thing. So sound is the fluctuations of pressure, 0:05:09.480 --> 0:05:12.760 which we can diagram as a wave, or a wave 0:05:12.839 --> 0:05:16.440 form, on an x y axis. So 0:05:16.480 --> 0:05:21.560 the horizontal line, that axis, represents time that has passed, 0:05:21.960 --> 0:05:26.000 and the vertical axis represents the amplitude or the volume 0:05:26.560 --> 0:05:29.880 of the sound wave. The wave length of the sound, 0:05:30.240 --> 0:05:32.760 which is the distance between successive points on a wave, 0:05:32.839 --> 0:05:36.359 such as like the successive crests on a wave, that 0:05:36.440 --> 0:05:39.600 tells you a lot about the frequency. So sound moves 0:05:39.640 --> 0:05:43.280 at a constant rate through a given medium, but it 0:05:43.320 --> 0:05:47.200 moves at different rates through different media. So in other words, 0:05:47.440 --> 0:05:49.760 it moves at a different speed through a solid than it 0:05:49.800 --> 0:05:53.400 does through air.
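To put rough numbers on the link between wavelength, frequency, and the speed of sound in a medium, here is a minimal Python sketch. The relationship used is wavelength = speed / frequency. The speeds in the table are approximate textbook values and are assumptions for illustration, not figures from the episode, and the function name `wavelength_m` is hypothetical.

```python
# Approximate speeds of sound in metres per second.
# These are illustrative assumptions, not values quoted in the episode.
SPEED_OF_SOUND_M_PER_S = {"air": 343.0, "water": 1480.0, "steel": 5960.0}


def wavelength_m(frequency_hz: float, medium: str = "air") -> float:
    """Return the wavelength in metres: speed of sound in the medium / frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_PER_S[medium] / frequency_hz


# Low frequencies have long wavelengths, high frequencies have short ones.
for f in (20.0, 440.0, 20_000.0):
    print(f"{f:>8.0f} Hz in air: {wavelength_m(f):.4f} m")
```

Because the speed is fixed for a given medium, a higher frequency always means crests that sit closer together, and the same note has a longer wavelength in water than in air.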
If the crests of each sound wave 0:05:53.480 --> 0:05:57.479 are really close together, that's a high frequency sound. More 0:05:57.520 --> 0:06:00.920 waves will pass through an arbitrary point within a second. 0:06:01.360 --> 0:06:04.240 The waves that are spaced further apart, that would be 0:06:04.279 --> 0:06:07.559 a lower frequency sound. Higher frequency sounds have a higher 0:06:07.600 --> 0:06:10.719 pitch than lower frequency sounds. So if you hold a 0:06:10.800 --> 0:06:14.360 single note at a constant frequency, you'll have what is 0:06:14.400 --> 0:06:18.520 called a simple harmonic motion. That means the vibrations are 0:06:18.600 --> 0:06:21.880 moving at a constant rate inward and outward. The cycle 0:06:22.120 --> 0:06:25.680 is constant. A tuning fork is a good example of this. 0:06:26.800 --> 0:06:31.080 So if you hear a clear C note played on 0:06:31.120 --> 0:06:34.360 a musical instrument, that could be a simple harmonic motion. 0:06:34.600 --> 0:06:36.720 It won't be, but it could be. I'll tell you 0:06:36.720 --> 0:06:39.080 why it won't be in a minute. So the frequency 0:06:39.120 --> 0:06:42.240 of vibration doesn't change, and so you would get this 0:06:42.480 --> 0:06:44.840 very clear note as a result, And if you were 0:06:44.839 --> 0:06:49.760 to diagram it, you would have very regular crests and troughs, 0:06:49.800 --> 0:06:53.600 all of the same amplitude and distance from each other. 0:06:53.640 --> 0:06:58.240 The frequency and volume would remain constant, assuming of course, 0:06:58.320 --> 0:07:02.480 that you're not trying to change the frequency or volume. Now, 0:07:02.520 --> 0:07:05.320 this is where I point out most musical instruments don't 0:07:05.400 --> 0:07:09.920 produce a single clear note, even if played expertly. They 0:07:09.920 --> 0:07:15.360 actually create several resonant frequencies. So every physical object resonates 0:07:15.400 --> 0:07:19.400 at several different frequencies. 
You've probably seen this in various programs. 0:07:19.440 --> 0:07:22.840 MythBusters did one about bridges, the idea being that if 0:07:22.880 --> 0:07:25.840 you were to have a group of people marching on 0:07:25.880 --> 0:07:28.960 a bridge at the bridge's resonant frequency, it could cause 0:07:29.000 --> 0:07:33.600 the bridge to start to vibrate and swing out of control. Well, 0:07:33.640 --> 0:07:35.480 there's a reason for this. You may have also seen 0:07:35.560 --> 0:07:39.160 videos of people singing a certain note and causing a 0:07:39.240 --> 0:07:43.640 crystal glass to shatter. That's because that crystal glass does 0:07:43.680 --> 0:07:45.880 have a resonant frequency, and if you can hit that 0:07:45.920 --> 0:07:49.200 resonant frequency at the right volume, you can cause the 0:07:49.240 --> 0:07:52.360 glass to start to deform, or the crystal in this case, 0:07:52.440 --> 0:07:55.160 to deform to a point where it loses integrity and 0:07:55.160 --> 0:08:00.840 it shatters as a result. Well, the resonation of an 0:08:00.840 --> 0:08:04.560 object is dependent upon lots of different factors, and in fact, 0:08:04.720 --> 0:08:09.760 most stuff will resonate at different frequencies, but at different intensities. 0:08:10.040 --> 0:08:14.239 Like there might be one sweet spot, one specific frequency 0:08:14.320 --> 0:08:18.600 that will have the greatest effect, but other related frequencies 0:08:18.640 --> 0:08:20.480 may also have an effect. It will just be to 0:08:20.520 --> 0:08:24.040 a lesser extent. Well, if you were to pluck a 0:08:24.040 --> 0:08:28.240 guitar string, just you've tuned it to whatever note doesn't matter. 0:08:28.320 --> 0:08:31.640 Let's say it's you tuned it to to G and 0:08:31.880 --> 0:08:35.920 you play the G string on your guitar. 
The note 0:08:35.960 --> 0:08:38.960 that you will hear really over all others will be 0:08:39.000 --> 0:08:40.640 G. That is going to be the one that 0:08:40.640 --> 0:08:43.320 will sound the loudest, but it will also play resonant 0:08:43.320 --> 0:08:47.240 frequencies at a decreased amplitude. In other words, at decreased 0:08:47.360 --> 0:08:51.440 volume, so you still hear the intended note above everything else, 0:08:51.480 --> 0:08:54.600 above all the other resonant frequencies. This is called a 0:08:54.679 --> 0:08:58.800 complex tone, and that collection of frequencies and their amplitudes 0:08:59.000 --> 0:09:03.720 is called the spectrum of sound. You get a full spectrum. Now, 0:09:03.760 --> 0:09:09.280 some of the components of that complex tone will be, uh, 0:09:09.320 --> 0:09:12.360 imperceptible to you. They'll be so quiet that you 0:09:12.400 --> 0:09:15.640 wouldn't really notice them. They might affect the overall quality 0:09:15.640 --> 0:09:17.280 of the sound, but in such a subtle way that 0:09:17.320 --> 0:09:19.160 it may be difficult for you to even put it 0:09:19.240 --> 0:09:23.200 into words. Each of those little components is called a partial. 0:09:23.640 --> 0:09:25.959 So in the example of a guitar string, the partials 0:09:26.000 --> 0:09:30.080 are all integer multiples of the same fundamental frequency, and the 0:09:30.160 --> 0:09:34.680 sound has a harmonic spectrum. But as you get further 0:09:34.760 --> 0:09:39.600 away from that fundamental frequency, the amplitude decreases significantly. So, 0:09:39.679 --> 0:09:42.719 like I said, you get far enough away, they are 0:09:42.760 --> 0:09:47.040 technically there, but they might be imperceptible to you. Now, 0:09:47.080 --> 0:09:51.520 some sounds have frequencies that aren't integer multiples of a fundamental 0:09:51.559 --> 0:09:55.320 frequency and are inharmonic. Uh.
Certain bells, like if you 0:09:55.320 --> 0:09:57.240 hear a bell ring, you can probably pick out a 0:09:57.240 --> 0:10:00.840 couple of different frequencies there that are not harmonic frequencies. 0:10:01.679 --> 0:10:04.439 These are very complex sounds, and to our perception, if 0:10:04.480 --> 0:10:07.480 it's complex enough, it can seem like there's no single 0:10:07.559 --> 0:10:12.480 discernible pitch, like there's no fundamental frequency over all 0:10:12.559 --> 0:10:16.640 the others. If it's complex enough, we call it noise. 0:10:17.360 --> 0:10:21.040 That is the technical term. It is noise. Now, the 0:10:21.160 --> 0:10:26.319 unit we use to measure frequency is the hertz, uh, 0:10:26.559 --> 0:10:29.839 H E R T Z. Typical human hearing ranges from 0:10:29.880 --> 0:10:33.840 twenty hertz, which means a wave will pass a given 0:10:33.960 --> 0:10:37.400 arbitrary point twenty times within a second, all the way 0:10:37.480 --> 0:10:40.439 up to twenty kilohertz, which means a wave will 0:10:40.440 --> 0:10:44.520 pass a particular point in time twenty thousand times in 0:10:44.520 --> 0:10:47.320 a second, or a particular point on your wave form twenty 0:10:47.320 --> 0:10:50.880 thousand times in a second. And most of our sensitivity 0:10:51.040 --> 0:10:54.800 tends to be between one or two kilohertz up 0:10:54.840 --> 0:10:58.000 to four or five kilohertz. That's generally where we 0:10:58.240 --> 0:11:02.280 have human voices, and we've really gotten good at picking 0:11:02.280 --> 0:11:04.800 those out over everything else. So our sensitivity of 0:11:04.880 --> 0:11:07.520 hearing is really concentrated between one kilohertz and four 0:11:07.600 --> 0:11:10.640 kilohertz, or two and five, depending upon whom you ask. 0:11:12.840 --> 0:11:16.200 Now we get back over to amplitude. That is referring 0:11:16.240 --> 0:11:18.520 to the height of the wave.
It also refers to 0:11:18.559 --> 0:11:23.679 the volume, the loudness of something. Amplitude means bigness. So 0:11:23.720 --> 0:11:27.199 how big is the sound? Well, the greater the amplitude, 0:11:27.240 --> 0:11:30.319 the louder it is. And amplitudes can have an enormous 0:11:30.480 --> 0:11:34.080 range and affect how we perceive sounds. So, for example, 0:11:34.559 --> 0:11:38.720 take a really complicated classical piece of music. It's just 0:11:38.840 --> 0:11:42.120 easy to explain it in that term. You might have 0:11:42.160 --> 0:11:45.760 a stretch in that classical piece of music in which 0:11:45.840 --> 0:11:48.360 all the instruments are more or less playing at a 0:11:48.440 --> 0:11:52.000 similar volume, so the sound from each instrument section has 0:11:52.040 --> 0:11:55.319 a similar amplitude. But then there might be one segment 0:11:55.400 --> 0:11:58.920 where an instrument group or maybe even a single soloist 0:11:59.559 --> 0:12:03.600 has an increased amplitude, an increased volume. It rises over 0:12:03.640 --> 0:12:06.960 the rest of the orchestra, and that peak of the 0:12:07.000 --> 0:12:10.600 amplitude is called the attack of the sound, and the 0:12:10.880 --> 0:12:16.040 entire range of amplitudes is called the amplitude envelope. Now 0:12:16.040 --> 0:12:18.920 this is important when we get to MP3s 0:12:18.960 --> 0:12:23.640 because the way we perceive these sounds, uh, that 0:12:23.679 --> 0:12:26.240 has everything to do with the way the MP3 0:12:26.360 --> 0:12:29.600 was designed. The whole point of the MP3 was 0:12:29.640 --> 0:12:34.800 to try and create a small file size to represent 0:12:34.880 --> 0:12:37.880 what we can hear and kind of ignore everything else. 0:12:38.120 --> 0:12:40.760 We'll get to that in a little bit more time. 0:12:40.920 --> 0:12:43.880 So this is really interesting to me.
If you take 0:12:44.240 --> 0:12:49.679 a sound and you double its amplitude, you increase the 0:12:49.720 --> 0:12:54.080 amplitude by twofold, a listener would not necessarily feel that 0:12:54.120 --> 0:12:59.400 the sound is twice as loud. Human hearing is incredibly subjective, 0:13:00.040 --> 0:13:04.079 and typically for most listeners, it would require much more 0:13:04.880 --> 0:13:08.760 than doubling the sounds amplitude for them to feel that 0:13:08.880 --> 0:13:12.400 the sound itself was twice as loud. This perception of 0:13:12.480 --> 0:13:14.920 volume is important when we get to the lossy formats 0:13:14.920 --> 0:13:19.839 for audio files. Now I've given you all this information, 0:13:20.040 --> 0:13:22.760 and I know everyone is probably thinking, you know, I 0:13:23.120 --> 0:13:26.480 learned this in primary school, elementary school. All of this 0:13:26.559 --> 0:13:29.800 is really familiar to me, and you're maybe rolling your 0:13:29.800 --> 0:13:32.840 eyes because it's so basic. But I think it's important 0:13:33.280 --> 0:13:36.560 to have that refresher so that you can understand the 0:13:36.600 --> 0:13:41.240 difference between sound as we experience it and sound as 0:13:41.320 --> 0:13:45.960 the way we encode it digitally and replicate it digitally. 0:13:46.840 --> 0:13:49.840 For one thing, this illustrates how sound in the real 0:13:49.880 --> 0:13:54.640 world is a continuum. It's a continuum both in frequency 0:13:54.679 --> 0:13:59.920 and amplitude. You can have sound changing in frequency very 0:14:00.280 --> 0:14:04.520 smoothly from one pitch to another. You can also have 0:14:04.600 --> 0:14:09.199 sound increase or decrease in amplitude in a very smooth way. 0:14:09.360 --> 0:14:14.240 And it is continuous, it's unbroken. It can have smooth transitions. 
0:14:14.240 --> 0:14:17.240 And these qualities provide challenges when we want to describe 0:14:17.280 --> 0:14:22.960 something digitally because at the heart of digital information is 0:14:23.400 --> 0:14:28.120 the bit, the basic unit of information. It is a 0:14:28.240 --> 0:14:31.840 unit of information that only has two states zero or 0:14:32.000 --> 0:14:36.120 one is essentially off or on. When you get down 0:14:36.160 --> 0:14:41.040 to defining information in just two states, then you start 0:14:41.080 --> 0:14:44.040 to look at something that is continuous and you realize 0:14:44.560 --> 0:14:46.240 this is going to be a challenge. How do I 0:14:46.320 --> 0:14:52.160 describe a continuous experience in very discrete amounts of information. 0:14:53.160 --> 0:14:57.280 And that's when we get to the methodology we've developed 0:14:57.920 --> 0:15:01.280 to digitally encode sound. I'm going to get into that 0:15:01.320 --> 0:15:04.680 in just a minute, but before I do that, let's 0:15:04.720 --> 0:15:16.240 take a quick break to thank our sponsor. All right, 0:15:16.360 --> 0:15:20.400 let's get back into it. So we've talked about the 0:15:20.480 --> 0:15:23.400 nature of sound. Analog sound, by the way, tries to 0:15:23.440 --> 0:15:27.560 replicate exactly what we would experience in nature. It tries 0:15:27.560 --> 0:15:32.640 to create this continuous experience, so you get these smooth 0:15:32.720 --> 0:15:38.320 waves of frequencies and amplitudes. And that's why some people 0:15:38.480 --> 0:15:43.920 argue that that analog styles of of sound recordings are 0:15:44.000 --> 0:15:48.800 superior to digital ones. I don't necessarily think they're right, 0:15:49.360 --> 0:15:52.800 but they often feel that way. So something like a 0:15:52.920 --> 0:15:58.000 vinyl album, which is an analog format of digital or sorry, 0:15:58.040 --> 0:16:02.280 an analog format of music storage should say sound storage. 
Uh, 0:16:02.320 --> 0:16:04.960 they think that that is superior to say a c D, 0:16:05.320 --> 0:16:10.320 which is a digital storage format. Uh. And who's to say. 0:16:10.400 --> 0:16:14.440 I mean, like, if your sense of hearing is incredibly 0:16:14.720 --> 0:16:18.040 well tuned, you might be able to pick up on 0:16:18.120 --> 0:16:22.160 some differences. Or if someone did a really terrible job 0:16:22.680 --> 0:16:28.000 encoding music digitally, then that might reveal itself to you 0:16:28.040 --> 0:16:30.760 as well. Uh. But this is one of those things 0:16:30.800 --> 0:16:32.960 that I think a lot of people feel they can 0:16:32.960 --> 0:16:34.760 tell the difference, but if they would do a double 0:16:34.800 --> 0:16:39.360 blind test, they might be surprised at how difficult it is. 0:16:39.840 --> 0:16:43.200 If things if everything's working the way it should, then 0:16:43.440 --> 0:16:48.000 there shouldn't be a perceptible difference at any rate. Digital 0:16:48.040 --> 0:16:54.360 audio has two really important factors. Sample rate and bit depth, 0:16:55.160 --> 0:16:57.640 or to another extent, bit rate. We'll talk about bit 0:16:57.760 --> 0:17:02.280 rate as well. So the sample rate refers to how 0:17:02.320 --> 0:17:05.919 many times you reference an analog sound to create the 0:17:05.960 --> 0:17:09.760 digital version. So sound, like I said, is uninterrupted in 0:17:09.800 --> 0:17:14.840 the analog world, you've got that that nice wave form. 0:17:14.920 --> 0:17:18.040 In the analog world, that's not how digital world works. 0:17:18.119 --> 0:17:21.320 Digital world, we have to describe that sound in a 0:17:21.400 --> 0:17:27.600 series of discrete snippets of sound. It's probably easiest to 0:17:27.640 --> 0:17:33.840 describe this with an analogy to movies on film. 
If 0:17:33.880 --> 0:17:37.359 you work with film, like you're creating a movie on film, 0:17:37.840 --> 0:17:41.040 then you know that you're not looking at a real 0:17:41.240 --> 0:17:44.240 moving picture when you see the film played out at 0:17:44.280 --> 0:17:47.560 the cinema. Instead, what you're looking at is a series 0:17:47.640 --> 0:17:52.160 of photographs. If you take a film strip and you 0:17:52.240 --> 0:17:56.280 look at it under a light, you'll see it's one 0:17:56.359 --> 0:18:00.760 after another photograph. It's just a series of pictures. It's 0:18:00.760 --> 0:18:02.920 only when you play them back at the right speed 0:18:03.520 --> 0:18:05.800 and you projected onto a screen that you get the 0:18:05.880 --> 0:18:10.520 illusion of continuous motion. But it's not really continuous. It's 0:18:10.560 --> 0:18:13.800 just this series of photographs played at twenty four frames 0:18:13.800 --> 0:18:18.840 per second in the case of actual film. So that 0:18:19.040 --> 0:18:22.200 ends up being very analogous to the way we encode 0:18:22.200 --> 0:18:26.040 digital audio. You take the analog recording and you take 0:18:26.359 --> 0:18:31.840 snapshots of sound. The more frequently you take those snapshots, 0:18:32.240 --> 0:18:34.480 the higher your sample rates. So in other words, if 0:18:34.480 --> 0:18:37.639 you did one a second, your sample rate would be awful. 0:18:38.400 --> 0:18:40.600 You would have a sample rate of one. But the 0:18:40.680 --> 0:18:43.680 higher the sample rate, the closer your digital representation will 0:18:43.680 --> 0:18:47.280 be to the frequency in the analog sound format. Actually, 0:18:47.760 --> 0:18:50.000 what's really important to remember is that your sample rate 0:18:50.040 --> 0:18:52.480 has to be about twice actually does have to be 0:18:52.520 --> 0:18:56.920 twice what the highest frequency sound is in your recording. 
0:18:58.440 --> 0:19:01.480 It has to be, because if it's not, it 0:19:01.640 --> 0:19:07.720 cannot encode that sound accurately. It's kind of interesting, and 0:19:07.840 --> 0:19:09.919 you might wonder, how do we take these snapshots in 0:19:09.920 --> 0:19:12.920 the first place? Well, if you're capturing audio, let's say 0:19:12.960 --> 0:19:16.360 we're recording to digital. So we've got a microphone set 0:19:16.440 --> 0:19:20.960 up and we're recording to a digital media storage. Like 0:19:21.040 --> 0:19:23.280 let's just say we're recording straight to someone's hard drive. 0:19:23.440 --> 0:19:26.760 So we're talking into a microphone recording to a hard drive. 0:19:27.720 --> 0:19:31.440 So you're using an analog microphone, let's say. You would 0:19:31.440 --> 0:19:35.760 need an analog to digital converter. Now this particular component 0:19:36.040 --> 0:19:40.800 can receive discrete voltages from another device like your microphone. 0:19:41.040 --> 0:19:47.920 So your microphone is converting sound into, uh, differences in voltage. 0:19:48.000 --> 0:19:50.880 That's essentially how it communicates, so that it can then 0:19:51.040 --> 0:19:54.080 send that to some other element. In this case, it's 0:19:54.119 --> 0:19:57.720 sending it to the analog to digital converter so 0:19:57.760 --> 0:20:00.400 that it can be stored digitally on your hard drive. 0:20:01.480 --> 0:20:08.560 So this analog to digital converter references, or samples, the discrete 0:20:08.680 --> 0:20:12.240 voltage many times every second in order to create a 0:20:12.280 --> 0:20:16.760 digital representation of the analog sound. It converts the voltages 0:20:16.840 --> 0:20:21.399 into numbers in a process called quantization, and we express 0:20:21.480 --> 0:20:24.480 those numbers in bits, so these are zeros and ones. 0:20:25.080 --> 0:20:27.760 When you want to play the digital audio, a digital 0:20:27.800 --> 0:20:31.840 to analog converter does the same process in reverse.
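The sample-then-quantize process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pure sine wave stands in for the microphone voltage, the voltage is assumed to stay between -1 and 1, and the function names `sample_and_quantize` and `reconstruct` are hypothetical. A real analog to digital converter does this in hardware.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, bit_depth, duration_s):
    """Take regular snapshots of a sine wave and round each one to an integer code."""
    max_code = 2 ** bit_depth - 1          # largest code that fits in bit_depth bits
    n_samples = int(sample_rate_hz * duration_s)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz             # time of the n-th snapshot, in seconds
        # Stand-in for the microphone voltage, assumed to lie in [-1, 1].
        voltage = math.sin(2 * math.pi * freq_hz * t)
        # Quantization: map [-1, 1] onto the integer codes 0..max_code.
        codes.append(round((voltage + 1) / 2 * max_code))
    return codes


def reconstruct(codes, bit_depth):
    """Reverse step, as a digital to analog converter would: codes back to voltages."""
    max_code = 2 ** bit_depth - 1
    return [code / max_code * 2 - 1 for code in codes]


# One millisecond of a 1 kHz tone at a 44.1 kHz sample rate and 16 bits per sample.
codes = sample_and_quantize(1_000, 44_100, 16, 0.001)
print(len(codes), "samples, first few codes:", codes[:5])
```

The reconstructed voltages differ slightly from the originals because of the rounding step. More bits per sample make that rounding error smaller.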
So 0:20:32.080 --> 0:20:35.800 it takes this digital information, these zeros and ones, and 0:20:35.880 --> 0:20:39.600 converts it into a series of discrete voltages, which then 0:20:39.840 --> 0:20:43.520 can be amplified and sent to a speaker and create sound. 0:20:44.760 --> 0:20:47.360 So all of that's really important. But now let's 0:20:47.359 --> 0:20:49.960 talk about some concrete examples, and the best way to 0:20:49.960 --> 0:20:53.240 do this is to go with compact discs. Because we 0:20:53.320 --> 0:20:57.119 have a standard sample rate for compact discs, and that 0:20:57.280 --> 0:21:00.560 standard sample rate is forty four point one kilohertz 0:21:00.680 --> 0:21:04.200 to create CD quality audio. That means that the audio 0:21:04.280 --> 0:21:10.000 is sampled forty four thousand, one hundred times every second. 0:21:10.880 --> 0:21:12.840 But wait, I hear you say, the range of human 0:21:12.880 --> 0:21:15.359 hearing, you said, only goes from twenty hertz to twenty 0:21:15.440 --> 0:21:18.280 kilohertz. If it only goes up to twenty kilohertz, 0:21:18.280 --> 0:21:21.080 why are you sampling at forty four thousand, one hundred 0:21:21.160 --> 0:21:25.560 times every second? If it's twenty thousand times a second 0:21:25.600 --> 0:21:28.919 for the frequency, why go up to forty four thousand, one 0:21:29.000 --> 0:21:31.520 hundred? Is there some relationship between that and the 0:21:31.640 --> 0:21:34.680 CD sample rate? And the answer is yes. So there 0:21:34.800 --> 0:21:40.000 is a theorem called the Nyquist Shannon sampling theorem, and 0:21:40.080 --> 0:21:42.760 that states that the sample rate must be at least twice the 0:21:42.840 --> 0:21:46.000 maximum frequency of a recording in order to describe the 0:21:46.040 --> 0:21:50.240 frequency properly. So the general thought is the maximum frequency 0:21:50.320 --> 0:21:52.919 most humans can hear is twenty kilohertz.
And for that reason, 0:21:52.960 --> 0:21:55.760 Philips and Sony, when they were working to create the 0:21:55.960 --> 0:21:59.879 CD format to make it a standard, they decided on 0:22:00.040 --> 0:22:02.879 forty four point one kilohertz as that standard sample 0:22:03.000 --> 0:22:05.399 rate for CD audio. It was more than double 0:22:05.440 --> 0:22:08.040 the top frequency generally considered to be in the upper 0:22:08.119 --> 0:22:11.200 level of human hearing. But what happens if you were 0:22:11.200 --> 0:22:14.400 to lower the sampling rate? What if you didn't sample 0:22:14.480 --> 0:22:19.600 at, what if you sampled at, let's say, sixteen kilohertz, 0:22:19.600 --> 0:22:23.120 so sixteen thousand times a second you sample it? Well, 0:22:23.400 --> 0:22:25.560 that means you would only be able to record and 0:22:25.600 --> 0:22:29.200 replicate any sound with a frequency up to eight 0:22:29.280 --> 0:22:34.280 kilohertz or less, so eight thousand hertz or less. But 0:22:34.440 --> 0:22:37.640 if you had any sound that was greater than eight 0:22:37.640 --> 0:22:42.080 thousand hertz or eight kilohertz, anything higher than that, 0:22:43.080 --> 0:22:46.400 it would be folded down to fit below the eight 0:22:46.480 --> 0:22:50.200 kilohertz limit. Perceptually, that means the sounds you would 0:22:50.240 --> 0:22:53.199 hear in the playback could include frequencies that were not 0:22:53.359 --> 0:22:58.160 present in the original performance of that sound. So let's 0:22:58.160 --> 0:23:02.600 say that I'm using a sample rate of sixteen, uh, 0:23:02.640 --> 0:23:06.359 you know, kilohertz, and someone is playing a musical 0:23:06.440 --> 0:23:09.200 instrument and they play a note that's at a nine 0:23:09.280 --> 0:23:14.760 kilohertz frequency. Well, because I'm sampling at sixteen kilohertz, 0:23:15.400 --> 0:23:19.679 my limit for frequencies is eight kilohertz.
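The folding in this example can be worked out with a little arithmetic. Here is a minimal Python sketch, assuming a single pure tone and ideal sampling with no filtering before the converter. The function name `aliased_frequency_hz` is hypothetical.

```python
def aliased_frequency_hz(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency that ends up in the recording when a pure tone is sampled."""
    nyquist = sample_rate_hz / 2
    # Sampling cannot tell a tone apart from one shifted by a whole sample rate.
    folded = freq_hz % sample_rate_hz
    if folded > nyquist:
        # Anything above the Nyquist limit is reflected back below it.
        folded = sample_rate_hz - folded
    return folded


# A 9 kHz note sampled at 16 kHz: the limit is 8 kHz, the note is 1 kHz over,
# so it lands 1 kHz under the limit.
print(aliased_frequency_hz(9_000, 16_000))   # 7000

# At the CD rate of 44.1 kHz, a 20 kHz tone is below the limit and is unchanged.
print(aliased_frequency_hz(20_000, 44_100))  # 20000
```

In practice recording gear filters out content above the limit before sampling, which is an extra step this sketch leaves out.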
If you 0:23:19.720 --> 0:23:22.600 play something at nine kilohertz, what happens is 0:23:22.920 --> 0:23:27.280 the recording seems to fold the sound back, and it 0:23:27.400 --> 0:23:31.879 folds it back by the same amount that the sound 0:23:31.960 --> 0:23:37.119 goes over the sample rate, or rather the Nyquist limit, I 0:23:37.119 --> 0:23:39.639 should say, not the sample rate itself, but the Nyquist limit. 0:23:40.760 --> 0:23:45.760 So nine kilohertz sound played, my limit is eight 0:23:45.840 --> 0:23:49.000 kilohertz. Well, nine kilohertz is one kilohertz 0:23:49.000 --> 0:23:52.040 more than eight, so it folds it back and the 0:23:52.080 --> 0:23:55.359 sound you would hear on the recording would be seven 0:23:55.440 --> 0:23:59.040 kilohertz. So the original sound is nine kilohertz. 0:23:59.119 --> 0:24:03.480 The playback sound is seven kilohertz, and you would 0:24:03.560 --> 0:24:07.719 hear something recorded that wasn't actually played. That's why you 0:24:07.760 --> 0:24:10.840 have to have a really high sample rate so that 0:24:10.880 --> 0:24:14.720 you don't have these instances where sound gets folded back 0:24:15.520 --> 0:24:20.399 into the frequency range, because otherwise what you were hearing 0:24:20.560 --> 0:24:24.560 is not an accurate representation of what was actually generated, 0:24:24.800 --> 0:24:28.960 what you were trying to record. This whole phenomenon, by 0:24:29.000 --> 0:24:32.320 the way, is called fold over or sometimes aliasing. 0:24:33.720 --> 0:24:36.880 So that's sample rate. But then we've got bit depth. Now, 0:24:36.920 --> 0:24:41.159 this is all about measuring the volume or amplitude of 0:24:41.160 --> 0:24:44.440 a sound. So you have a range. You just make 0:24:44.440 --> 0:24:48.280 an arbitrary range to say, like we're gonna go quietest 0:24:48.320 --> 0:24:51.320 to loudest, and you just define what that range is. 0:24:51.440 --> 0:24:54.160 It could literally be any range.
Let's say you say 0:24:54.240 --> 0:24:58.360 zero to one hundred. Zero is dead silence, no sound at all. 0:24:58.840 --> 0:25:02.560 One hundred is as loud as the sound ever gets. 0:25:02.680 --> 0:25:06.480 It's the peak volume of sound. That means you can 0:25:06.560 --> 0:25:11.399 describe all the different volumes within that recording as a 0:25:11.520 --> 0:25:15.359 number between zero and one hundred. But let's say you 0:25:15.440 --> 0:25:18.800 take that same recording and instead of making the range 0:25:19.000 --> 0:25:22.840 zero to one hundred, you say it's zero to two thousand. 0:25:23.240 --> 0:25:26.840 You haven't made the volume louder. The volume is still 0:25:27.080 --> 0:25:29.679 the exact same as it was when you called the 0:25:29.760 --> 0:25:32.800 range zero to one hundred. But what you have done 0:25:33.240 --> 0:25:38.160 is added more units. You've created more precise steps between 0:25:38.400 --> 0:25:43.520 absolute silence and as loud as it gets. So you've 0:25:43.560 --> 0:25:45.359 just increased the size of the range so that you 0:25:45.400 --> 0:25:48.959 can be more precise in the differences in volume. And 0:25:48.960 --> 0:25:52.440