WEBVTT - The Dirt on Digital Audio 0:00:04.160 --> 0:00:07.160 Get in touch with technology with TechStuff from 0:00:07.240 --> 0:00:13.920 HowStuffWorks.com. Hey there, everyone, this is Jonathan 0:00:13.960 --> 0:00:18.279 Strickland with TechStuff, and today we're gonna tackle a 0:00:18.360 --> 0:00:21.600 subject that I've talked about in the past. Actually, way 0:00:21.640 --> 0:00:25.200 back in two thousand and eight, back when you were 0:00:25.200 --> 0:00:27.760 knee-high to a grasshopper, Chris Pollette and I 0:00:27.800 --> 0:00:31.880 did an episode called How MP3 Files Work, and 0:00:31.920 --> 0:00:35.440 we talked about the lossy file format, and we actually 0:00:35.479 --> 0:00:38.120 revisited it in two thousand eleven. We did an episode 0:00:38.120 --> 0:00:42.000 about the iPod and about MP3 players, but I 0:00:42.000 --> 0:00:44.480 really thought it would be a good idea to revisit 0:00:45.120 --> 0:00:48.640 MP3 files, MP3 players, digital audio in general, 0:00:48.720 --> 0:00:51.519 the difference between digital audio and analog, and all of 0:00:51.520 --> 0:00:54.760 that history, uh, to really give a deep dive, because 0:00:54.800 --> 0:00:57.200 back in those days we did really short episodes and 0:00:57.240 --> 0:00:59.600 so we weren't able to give it the full coverage 0:00:59.600 --> 0:01:04.360 that I think it deserved. Um, and we actually reached 0:01:04.360 --> 0:01:08.720 a point in history that I did not anticipate. And 0:01:08.760 --> 0:01:11.759 I am, of course, talking about the day when I said, 0:01:11.800 --> 0:01:15.120 you know what, I don't need to carry a smartphone 0:01:15.200 --> 0:01:18.120 and an MP3 player. I held out for a 0:01:18.120 --> 0:01:20.320 really long time. 
You guys who have been longtime 0:01:20.319 --> 0:01:23.280 listeners of TechStuff might remember that I really liked 0:01:23.360 --> 0:01:26.959 dedicated devices. Like, I really liked having a digital camera, 0:01:27.080 --> 0:01:29.760 and I really liked having an MP3 player, and 0:01:29.800 --> 0:01:32.319 I really liked having a phone that was a phone. 0:01:33.000 --> 0:01:34.880 And now I'm like, no, I'm good with just one 0:01:34.880 --> 0:01:38.440 device doing all that kind of thing. So, uh, since 0:01:38.440 --> 0:01:41.360 we've reached that point, the point where our machines are 0:01:41.360 --> 0:01:45.400 sophisticated enough to either have enough storage space to carry 0:01:45.440 --> 0:01:50.640 an impressive music collection, or, more likely, as 0:01:50.840 --> 0:01:53.520 things have changed these days, um, access to a 0:01:53.560 --> 0:01:57.720 streaming service where I don't even have stuff stored permanently 0:01:58.080 --> 0:02:01.840 or, like, in any, you know, lasting format on 0:02:01.880 --> 0:02:04.880 the phone itself. Instead, I'm streaming a file over the 0:02:04.920 --> 0:02:08.919 Internet to listen to dynamically. I thought, why not talk 0:02:08.960 --> 0:02:12.000 about the MP3, because who knows, in a few 0:02:12.080 --> 0:02:15.760 years that might just be a distant memory. So 0:02:16.080 --> 0:02:18.840 this is going to be the first of a three 0:02:18.880 --> 0:02:21.600 part series, and I want to let you guys know, 0:02:21.639 --> 0:02:25.120 I'm not going to record all of these and publish 0:02:25.160 --> 0:02:27.440 them all one right after the other. So it's not 0:02:27.480 --> 0:02:30.800 gonna be MP3 Part One, Part Two, Part Three 0:02:30.880 --> 0:02:33.639 in a row. Uh, in this episode, we're gonna look 0:02:33.639 --> 0:02:36.560 at how digital audio works in general and how it's 0:02:36.600 --> 0:02:40.320 different from analog audio. 
Uh, and we're also gonna talk 0:02:40.320 --> 0:02:43.080 about how the MP3 was created and what it does. 0:02:43.639 --> 0:02:46.959 In the next episode, I'm gonna take a deeper dive 0:02:47.480 --> 0:02:51.639 into how an MP3 file works, how it compresses audio. 0:02:52.120 --> 0:02:56.120 It gets really technical. And in the final episode of 0:02:56.120 --> 0:02:58.760 the series, we're gonna explore the history of the MP3 0:02:58.880 --> 0:03:02.200 player and how Apple ended up dominating that space 0:03:02.240 --> 0:03:04.200 for so long, to the point that we have things 0:03:04.240 --> 0:03:09.880 called podcasts. But don't worry, I have other episodes to 0:03:09.919 --> 0:03:12.320 divide up this content. So, like I said, it's not 0:03:12.360 --> 0:03:14.560 all gonna be in a row. I don't want you 0:03:14.639 --> 0:03:19.520 to have a month of MP3-related episodes, but, 0:03:20.200 --> 0:03:23.519 you know, every couple of episodes, expect one of these. 0:03:24.160 --> 0:03:28.720 It's kind of an interesting subject, I think. So to 0:03:28.840 --> 0:03:31.640 start it all off, we all have to take a 0:03:31.720 --> 0:03:34.760 quick trip to Germany. So anyone who is not in 0:03:34.840 --> 0:03:38.960 Germany, get your passport. I was actually in Germany not 0:03:39.080 --> 0:03:41.400 that long ago. I got to visit Berlin and had 0:03:41.440 --> 0:03:45.000 a wonderful time. And in Germany there's a company called 0:03:45.160 --> 0:03:48.640 Fraunhofer-Gesellschaft, and you might wonder, well, what 0:03:48.680 --> 0:03:54.360 does this company do? They think. I joke that my profession, 0:03:54.720 --> 0:03:57.160 that my title that I should put on my business 0:03:57.160 --> 0:04:01.200 card, should say professional smart person. And well, no joke, 0:04:01.320 --> 0:04:05.240 that's what these people are. They specialize in research 0:04:05.360 --> 0:04:10.880 and development, applied research. 
It's a whole company that specializes 0:04:10.920 --> 0:04:14.760 in applied research. And it's huge. It encompasses sixty-seven 0:04:14.800 --> 0:04:20.200 institutes and research units across Germany. Well, back in the 0:04:20.240 --> 0:04:25.880 eighties, there was a researcher named Karlheinz Brandenburg, 0:04:26.440 --> 0:04:33.000 and Karlheinz made a breakthrough round seven, uh, and 0:04:33.160 --> 0:04:37.480 came up with this clever idea about encoding audio. He 0:04:37.520 --> 0:04:40.239 was actually working towards creating a way that would allow 0:04:40.640 --> 0:04:45.000 for high audio quality transfer but having a low bit 0:04:45.120 --> 0:04:50.400 rate sampling so that file sizes and transfer times wouldn't 0:04:50.440 --> 0:04:52.520 get out of control. Because you got to remember, this 0:04:52.560 --> 0:04:55.599 is the eighties, this is before the World Wide Web was 0:04:55.640 --> 0:04:58.839 a thing. That wouldn't happen until the early nineties, 0:04:59.240 --> 0:05:01.240 so the Internet is very young. In fact, they weren't 0:05:01.240 --> 0:05:03.839 even looking at the Internet as a method of distribution 0:05:03.880 --> 0:05:07.520 for this particular type of encoded audio. They were looking 0:05:07.560 --> 0:05:11.960 at using this to transmit across telephone lines, so they 0:05:12.000 --> 0:05:13.760 needed to have something that was going to be high 0:05:13.839 --> 0:05:18.440 quality but low space. So what the heck does that mean? 0:05:18.520 --> 0:05:22.920 All right. Well, digital audio and analog audio are very 0:05:23.000 --> 0:05:26.920 different things. So to understand that, we need to look 0:05:27.000 --> 0:05:31.000 at how sound works and how we describe sound, because 0:05:31.000 --> 0:05:34.760 that informs how we can capture sound and replicate those 0:05:34.839 --> 0:05:39.120 qualities digitally. So stick with me. We're gonna go back 0:05:39.160 --> 0:05:44.480 to school for some basic sound science. 
And this goes 0:05:44.560 --> 0:05:48.320 back to the way sound physically moves through a medium, 0:05:48.320 --> 0:05:51.839 whether that's a solid or through the air or through water. 0:05:52.320 --> 0:05:58.640 Sound is vibration. Now, we sense this primarily through hearing 0:05:58.680 --> 0:06:01.720 it, or sometimes feeling it. If it's the right 0:06:01.760 --> 0:06:04.720 frequency and the right amplitude, we can actually feel sound. 0:06:05.040 --> 0:06:08.120 Anyone who stood close to, say, a subwoofer that 0:06:08.160 --> 0:06:10.480 was really blasting out bass notes, you know what I'm 0:06:10.520 --> 0:06:14.159 talking about. You can feel it pressing against you. Well, 0:06:14.200 --> 0:06:18.760 sound travels through the air when molecules vibrate against each other, 0:06:19.360 --> 0:06:23.680 and this creates instances of increased pressure and decreased pressure 0:06:24.080 --> 0:06:27.760 at what is a hyperlocal level. We're not talking about 0:06:27.800 --> 0:06:31.000 weather maps here. We're talking about tiny, little areas. So 0:06:31.279 --> 0:06:33.839 this increase and decrease in pressure is something that we 0:06:33.920 --> 0:06:37.679 can sense as sound. When those changes in pressure affect 0:06:37.760 --> 0:06:41.119 a diaphragm, such as one that's in a microphone or 0:06:41.839 --> 0:06:45.919 maybe your eardrum, for example, it causes the diaphragm 0:06:45.960 --> 0:06:50.120 to actually move. So increased pressure pushes the diaphragm in, 0:06:51.080 --> 0:06:56.400 and decreased pressure doesn't really pull the diaphragm out. I mean, 0:06:56.440 --> 0:06:58.680 you could say it pulls the diaphragm out, but 0:06:58.680 --> 0:07:02.680 to be more accurate, the diaphragm actually pushes outward because 0:07:02.720 --> 0:07:05.440 the pressure on the outside is lower than the pressure 0:07:05.480 --> 0:07:07.760 on the inside. But you get what I'm saying. 
The 0:07:07.880 --> 0:07:12.320 diaphragm begins to flex inward and outward depending upon 0:07:12.680 --> 0:07:16.360 the amount of pressure that it's encountering. You could 0:07:16.360 --> 0:07:18.720 imagine this being kind of like a drum, not 0:07:18.800 --> 0:07:20.960 an eardrum, but an actual drum, and striking it. 0:07:21.800 --> 0:07:24.720 That's the same sort of thing. So sound is the 0:07:24.760 --> 0:07:29.280 fluctuations of pressure, which we can diagram as a wave, 0:07:29.880 --> 0:07:32.720 or a wavelength, a waveform, on an X-Y 0:07:32.840 --> 0:07:37.760 axis. So the horizontal line, that axis, represents 0:07:37.840 --> 0:07:41.320 time that has passed, and the vertical axis represents the 0:07:41.400 --> 0:07:46.200 amplitude or the volume of the sound wave. The wavelength 0:07:46.320 --> 0:07:49.560 of the sound, which is the distance between successive 0:07:49.600 --> 0:07:52.800 points on a wave, such as, like, the successive crests 0:07:52.840 --> 0:07:55.480 on a wave, that tells you a lot about the frequency. 0:07:56.400 --> 0:08:00.520 So sound moves at a constant rate through a given medium, 0:08:00.520 --> 0:08:04.080 but it moves at different rates through different media. So, 0:08:04.120 --> 0:08:06.640 in other words, it moves at a different speed through a 0:08:06.680 --> 0:08:09.880 solid than it does through air. If the crests of 0:08:09.960 --> 0:08:13.080 each sound wave are really close together, that's a high 0:08:13.160 --> 0:08:17.320 frequency sound. More waves will pass through an arbitrary point 0:08:17.560 --> 0:08:21.080 within a second than waves that are spaced further apart. 0:08:21.440 --> 0:08:24.600 That would be a lower frequency sound. Higher frequency sounds 0:08:24.600 --> 0:08:27.800 have a higher pitch than lower frequency sounds. So if 0:08:27.800 --> 0:08:31.440 you hold a single note at a constant frequency, you'll 0:08:31.440 --> 0:08:34.880 have what is called a simple harmonic motion. 
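A minimal sketch of the relationship just described, where closely spaced crests mean a high frequency. It assumes sound travels through air at roughly 343 meters per second (a room-temperature figure that is not from the episode), and the function names are made up for illustration.

```python
# Assumed speed of sound in air at about 20 degrees Celsius.
# This number is an illustrative assumption, not a figure from the episode.
SPEED_OF_SOUND_AIR_M_PER_S = 343.0


def frequency_from_wavelength(wavelength_m, speed_m_per_s=SPEED_OF_SOUND_AIR_M_PER_S):
    """Return the frequency in hertz of a wave with the given crest spacing."""
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    # Crests per second = distance covered per second / distance between crests.
    return speed_m_per_s / wavelength_m


def wavelength_from_frequency(frequency_hz, speed_m_per_s=SPEED_OF_SOUND_AIR_M_PER_S):
    """Return the distance in meters between successive crests."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / frequency_hz


if __name__ == "__main__":
    for wavelength in (3.43, 0.343, 0.0343):
        print(wavelength, "m ->", frequency_from_wavelength(wavelength), "Hz")
```

Passing a different speed (for a solid or for water, say) changes the frequency you get for the same crest spacing, which is the point about sound moving at different rates through different media.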
That means 0:08:34.920 --> 0:08:38.840 the vibrations are moving at a constant rate inward and outward. 0:08:38.880 --> 0:08:42.400 The cycle is constant. A tuning fork is a good 0:08:42.440 --> 0:08:46.640 example of this. So if you hear a clear C 0:08:46.920 --> 0:08:50.640 note played on a musical instrument, that could be a 0:08:50.679 --> 0:08:53.480 simple harmonic motion. It won't be, but it could be. 0:08:53.600 --> 0:08:55.520 I'll tell you why it won't be in a minute. 0:08:55.840 --> 0:08:59.160 So the frequency of vibration doesn't change, and so you 0:08:59.160 --> 0:09:01.959 would get this very clear note as a result, And 0:09:02.000 --> 0:09:04.800 if you were to diagram it, you would have very 0:09:04.840 --> 0:09:10.040 regular crests and troughs, all of the same amplitude and 0:09:10.120 --> 0:09:13.800 distance from each other. The frequency and volume would remain constant, 0:09:15.040 --> 0:09:17.880 assuming of course, that you're not trying to change the 0:09:17.920 --> 0:09:21.160 frequency or volume. Now, this is where I point out 0:09:21.480 --> 0:09:25.839 most musical instruments don't produce a single clear note, even 0:09:25.880 --> 0:09:30.640 if played expertly. They actually create several resonant frequencies. So 0:09:30.720 --> 0:09:35.319 every physical object resonates at several different frequencies. You've probably 0:09:35.360 --> 0:09:38.960 seen this in various programs. MythBusters did one about bridges, 0:09:39.440 --> 0:09:42.080 the idea being that if you were to have a 0:09:42.080 --> 0:09:44.760 group of people marching on a bridge at the bridge's 0:09:44.800 --> 0:09:48.040 resonant frequency, it could cause the bridge to start to 0:09:48.120 --> 0:09:51.839 vibrate and swing out of control. Well, there's a reason 0:09:51.880 --> 0:09:53.960 for this. You may have also seen videos of people 0:09:54.080 --> 0:09:58.280 singing a certain note and causing a crystal glass to shatter. 
0:09:58.880 --> 0:10:02.360 That's because that crystal glass does have a resonant frequency, 0:10:02.400 --> 0:10:04.640 and if you can hit that resonant frequency at the 0:10:04.760 --> 0:10:08.600 right volume, you can cause the glass to start to deform, 0:10:08.720 --> 0:10:11.120 or the crystal in this case, to deform to a 0:10:11.160 --> 0:10:15.120 point where it loses integrity and it shatters as a result. Well, 0:10:16.240 --> 0:10:20.679 the resonation of an object is dependent upon lots of 0:10:20.720 --> 0:10:23.760 different factors, and in fact, most stuff will resonate at 0:10:23.840 --> 0:10:28.240 different frequencies but at different intensities. Like there might be 0:10:28.320 --> 0:10:32.480 one sweet spot, one specific frequency that will have the 0:10:32.559 --> 0:10:37.360 greatest effect, but other related frequencies may also have an effect. 0:10:37.360 --> 0:10:40.720 It will just be to a lesser extent. Well, if 0:10:40.760 --> 0:10:44.200 you were to pluck a guitar string, just you've tuned 0:10:44.200 --> 0:10:46.640 it to whatever note doesn't matter. Let's say it's you've 0:10:46.679 --> 0:10:50.439 tuned it to to G and you play the G 0:10:50.679 --> 0:10:53.960 string on your guitar. Uh, the note that you will 0:10:54.000 --> 0:10:57.280 hear really over all others will be g that that 0:10:57.400 --> 0:10:59.240 is going to be the one that will sound the loudest, 0:10:59.280 --> 0:11:03.679 But it will also play resonant frequencies at a decreased amplitude, 0:11:03.720 --> 0:11:06.839 in other words, of decreased volume, so you still hear 0:11:06.880 --> 0:11:09.679 the intended note above everything else, above all the other 0:11:09.679 --> 0:11:14.320 resonant frequencies. This is called a complex tone, and that 0:11:14.360 --> 0:11:18.040 collection of frequencies in their amplitudes is called the spectrum 0:11:18.240 --> 0:11:21.640 of sound. You get a full spectrum. 
Now, some of 0:11:21.679 --> 0:11:27.640 the components of that complex tone will be, uh, imperceptible 0:11:27.679 --> 0:11:30.360 to you. They'll be so quiet that you wouldn't 0:11:30.440 --> 0:11:33.320 really notice them. They might affect the overall quality of 0:11:33.320 --> 0:11:34.960 the sound, but in such a subtle way that it 0:11:35.000 --> 0:11:38.120 may be difficult for you to even put it into words. 0:11:38.160 --> 0:11:41.360 Each of those little components is called a partial. So 0:11:41.400 --> 0:11:43.679 in the example of a guitar string, the partials are 0:11:43.720 --> 0:11:48.040 all integers of the same fundamental frequency, and the sound 0:11:48.080 --> 0:11:52.680 has a harmonic spectrum. But as you get further away 0:11:52.760 --> 0:11:57.400 from that fundamental frequency, the amplitude decreases significantly. So, like 0:11:57.440 --> 0:12:01.199 I said, you get far enough away, they are technically there, 0:12:01.360 --> 0:12:05.200 but they might be imperceptible to you. Now, some sounds 0:12:05.240 --> 0:12:09.880 have frequencies that aren't integers of a fundamental frequency and 0:12:09.920 --> 0:12:13.120 are inharmonic. Uh, certain bells, like, if you hear a 0:12:13.120 --> 0:12:15.160 bell ring, you can probably pick out a couple of 0:12:15.200 --> 0:12:19.560 different frequencies there that are not harmonic frequencies. These are 0:12:19.679 --> 0:12:23.400 very complex sounds, and to our perception, if it's complex enough, 0:12:23.440 --> 0:12:26.959 it can seem like there's no single discernible pitch. There's, 0:12:27.080 --> 0:12:31.040 like, no fundamental frequency over all the others. If 0:12:31.040 --> 0:12:35.320 it's complex enough, we call it noise. That is the 0:12:35.360 --> 0:12:39.440 technical term. It is noise. Now, the unit we use 0:12:39.600 --> 0:12:44.719 to measure frequency is the hertz, uh, H-E-R- 0:12:44.840 --> 0:12:49.240 T-Z. 
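The harmonic partials described above can be sketched in a few lines. This is only an illustration under stated assumptions: the open G string is taken to be about 196 hertz (standard tuning, not a figure from the episode), the one-over-n amplitude falloff is a made-up stand-in for "the amplitude decreases," and both function names are hypothetical.

```python
def harmonic_partials(fundamental_hz, count):
    """Return (frequency_hz, relative_amplitude) pairs for a harmonic spectrum.

    Each partial sits at a whole-number multiple of the fundamental.
    The 1/n amplitude falloff is an illustrative assumption only.
    """
    return [(fundamental_hz * n, 1.0 / n) for n in range(1, count + 1)]


def is_harmonic(partial_hz, fundamental_hz, tolerance=1e-6):
    """True if partial_hz is a whole-number multiple of fundamental_hz."""
    ratio = partial_hz / fundamental_hz
    nearest = round(ratio)
    return nearest >= 1 and abs(ratio - nearest) <= tolerance


if __name__ == "__main__":
    # Assumed fundamental for an open G string in standard tuning.
    for frequency, amplitude in harmonic_partials(196.0, 5):
        print(f"{frequency:7.1f} Hz at relative amplitude {amplitude:.2f}")
```

A bell-like inharmonic sound would include partials for which `is_harmonic` returns False.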
Typical human hearing ranges from twenty hertz, which 0:12:49.280 --> 0:12:52.760 means a wave will pass a given arbitrary point twenty 0:12:52.840 --> 0:12:55.640 times within a second, all the way up to twenty 0:12:55.760 --> 0:12:59.040 kilohertz, which means a wave will pass a particular 0:12:59.440 --> 0:13:02.640 point in time twenty thousand times in a second, or 0:13:02.800 --> 0:13:05.560 a particular point on your waveform twenty thousand times in 0:13:05.559 --> 0:13:09.559 a second. And most of our sensitivity tends to be 0:13:09.559 --> 0:13:12.920 between one or two kilohertz up to four or 0:13:12.960 --> 0:13:17.320 five kilohertz. That's generally where we have human voices, 0:13:17.800 --> 0:13:20.400 and we've really gotten good at picking those out 0:13:20.480 --> 0:13:23.160 over everything else. So our sensitivity of hearing is really 0:13:23.200 --> 0:13:26.240 concentrated between one kilohertz and four kilohertz, or 0:13:26.400 --> 0:13:30.680 two and five, depending upon whom you ask. Now we 0:13:30.720 --> 0:13:34.040 get back over to amplitude. That is referring to the 0:13:34.080 --> 0:13:36.800 height of the wave. It also refers to the volume, 0:13:37.080 --> 0:13:41.960 the loudness of something. Amplitude means bigness. So how big 0:13:42.160 --> 0:13:45.400 is the sound? Well, the greater the amplitude, the louder 0:13:45.440 --> 0:13:48.480 it is, and amplitudes can have an enormous range and 0:13:48.520 --> 0:13:52.480 affect how we perceive sounds. So, for example, take a 0:13:52.559 --> 0:13:56.840 really complicated classical piece of music. It's just easy to 0:13:56.920 --> 0:14:00.319 explain it in that term. You might have a stretch 0:14:01.080 --> 0:14:03.640 in that classical piece of music in which all the 0:14:03.720 --> 0:14:06.920 instruments are more or less playing at a similar volume, 0:14:07.000 --> 0:14:10.720 so the sound from each instrument section has a similar amplitude. 
0:14:11.240 --> 0:14:14.240 But then there might be one segment where an instrument 0:14:14.280 --> 0:14:18.599 group or maybe even a single soloist has an increased 0:14:18.600 --> 0:14:21.640 amplitude, an increased volume. It rises over the rest of 0:14:21.680 --> 0:14:25.480 the orchestra, and that peak of the amplitude is called 0:14:25.520 --> 0:14:29.720 the attack of the sound, and the entire range of 0:14:29.760 --> 0:14:34.280 amplitudes is called the amplitude envelope. Now, this is important 0:14:34.320 --> 0:14:38.120 when we get to MP3s, because the way 0:14:38.120 --> 0:14:42.040 we perceive these sounds, uh, that has everything to 0:14:42.120 --> 0:14:44.720 do with the way the MP3 was designed. The 0:14:44.760 --> 0:14:47.720 whole point of the MP3 was to try and 0:14:47.760 --> 0:14:53.040 create a small file size to represent what we can 0:14:53.120 --> 0:14:56.080 hear and kind of ignore everything else. But we'll get 0:14:56.120 --> 0:14:58.640 to that in a little bit more time. So 0:14:59.160 --> 0:15:01.880 this is really interesting to me. If you take a 0:15:02.000 --> 0:15:07.920 sound and you double its amplitude, you increase the amplitude 0:15:07.920 --> 0:15:11.760 by twofold, a listener would not necessarily feel that the 0:15:11.800 --> 0:15:16.960 sound is twice as loud. Human hearing is incredibly subjective, 0:15:17.560 --> 0:15:21.640 and typically, for most listeners, it would require much more 0:15:22.440 --> 0:15:26.320 than doubling the sound's amplitude for them to feel that 0:15:26.440 --> 0:15:29.960 the sound itself was twice as loud. This perception of 0:15:30.040 --> 0:15:32.480 volume is important when we get to the lossy formats 0:15:32.480 --> 0:15:37.440 for audio files. Now, I've given you all this information, 0:15:37.640 --> 0:15:40.600 and I know everyone is probably thinking, you know, I 0:15:40.680 --> 0:15:44.040 learned this in primary school, elementary school. 
All of this 0:15:44.120 --> 0:15:47.360 is really familiar to me, and you're maybe rolling your 0:15:47.360 --> 0:15:50.400 eyes because it's so basic. But I think it's important 0:15:50.840 --> 0:15:54.120 to have that refresher so that you can understand the 0:15:54.160 --> 0:15:58.800 difference between sound as we experience it and sound as 0:15:58.880 --> 0:16:03.520 the way we hold it digitally and replicate it digitally. 0:16:04.400 --> 0:16:07.400 For one thing, this illustrates how sound in the real 0:16:07.440 --> 0:16:12.200 world is a continuum. It's a continuum both in frequency 0:16:12.240 --> 0:16:17.800 and amplitude. You can have sound changing in frequency very 0:16:17.800 --> 0:16:22.080 smoothly from one pitch to another. You can also have 0:16:22.200 --> 0:16:26.800 sound increase or decrease in amplitude in a very smooth way. 0:16:26.920 --> 0:16:31.800 And it is continuous, it's unbroken, it can have smooth transitions. 0:16:31.800 --> 0:16:34.800 And these qualities provide challenges when we want to describe 0:16:34.840 --> 0:16:40.520 something digitally, because at the heart of digital information is 0:16:40.960 --> 0:16:45.680 the bit, the basic unit of information. It is a 0:16:45.800 --> 0:16:49.440 unit of information that only has two states, zero or 0:16:49.560 --> 0:16:53.720 one, essentially off or on. When you get down 0:16:53.760 --> 0:16:58.600 to defining information in just two states, then you start 0:16:58.640 --> 0:17:02.320 to look at something that's continuous and you realize this 0:17:02.400 --> 0:17:04.359 is going to be a challenge. How do I describe 0:17:04.400 --> 0:17:10.840 a continuous experience in very discrete amounts of information? And 0:17:10.920 --> 0:17:15.520 that's when we get to the methodology we've developed to 0:17:15.920 --> 0:17:19.359 digitally encode sound. 
I'm going to get into that in 0:17:19.640 --> 0:17:22.880 just a minute, but before I do that, let's take 0:17:22.880 --> 0:17:34.520 a quick break to thank our sponsor. All right, let's 0:17:34.560 --> 0:17:38.800 get back into it. So we've talked about the nature 0:17:38.840 --> 0:17:42.120 of sound. Analog sound, by the way, tries to replicate 0:17:42.359 --> 0:17:45.600 exactly what we would experience in nature. It tries to 0:17:45.600 --> 0:17:51.200 create this continuous experience, so you get these smooth waves 0:17:51.240 --> 0:17:56.800 of frequencies and amplitudes. And that's why some people argue 0:17:56.880 --> 0:18:02.760 that that analog styles of of sound recordings are superior 0:18:02.840 --> 0:18:07.399 to digital ones. I don't necessarily think they're right, but 0:18:07.560 --> 0:18:12.280 they often feel that way. So something like a vinyl album, 0:18:12.320 --> 0:18:16.080 which is an analog format of digital or sorry, an 0:18:16.080 --> 0:18:20.240 analog format of music storage I should say sound storage. Uh, 0:18:20.280 --> 0:18:22.960 they think that that is superior to say a CD, 0:18:23.280 --> 0:18:28.280 which is a digital storage format. Uh. And who's to say. 0:18:28.359 --> 0:18:32.399 I mean, like, if your sense of hearing is incredibly 0:18:32.680 --> 0:18:36.040 well tuned, you might be able to pick up on 0:18:36.080 --> 0:18:40.080 some differences. Or if someone did a really terrible job 0:18:40.640 --> 0:18:45.960 encoding music digitally, then that might reveal itself to you 0:18:46.000 --> 0:18:48.760 as well. Uh. But this is one of those things 0:18:48.760 --> 0:18:50.920 that I think a lot of people feel they can 0:18:50.920 --> 0:18:52.720 tell the difference, but if they would do a double 0:18:52.760 --> 0:18:57.280 blind test, they might be surprised at how difficult it is. 
0:18:57.760 --> 0:19:01.160 If things if everything's working the way it should, then 0:19:01.400 --> 0:19:05.960 there shouldn't be a perceptible difference at any rate. Digital 0:19:05.960 --> 0:19:12.320 audio has two really important factors, sample rate and bit depth, 0:19:13.119 --> 0:19:15.600 or to another extent, bit rate. We'll talk about bit 0:19:15.720 --> 0:19:20.240 rate as well. So the sample rate refers to how 0:19:20.280 --> 0:19:23.840 many times you reference an analog sound to create the 0:19:23.920 --> 0:19:27.720 digital version. So sound like I said, is uninterrupted. In 0:19:27.760 --> 0:19:32.840 the analog world, you've got that that nice wave form. 0:19:32.880 --> 0:19:36.000 In the analog world, that's not how digital world works. 0:19:36.080 --> 0:19:39.280 Digital world, we have to describe that sound in a 0:19:39.359 --> 0:19:45.560 series of discrete snippets of sound. It's probably easiest to 0:19:45.600 --> 0:19:51.800 describe this with an analogy to movies on film. If 0:19:51.840 --> 0:19:55.320 you work with film, like you're creating a movie on film, 0:19:55.800 --> 0:19:58.960 then you know that you're not looking at a real 0:19:59.200 --> 0:20:02.200 moving picture when you see the film played out at 0:20:02.200 --> 0:20:05.480 the cinema. Instead, what you're looking at is a series 0:20:05.600 --> 0:20:10.120 of photographs. If you take a film strip and you 0:20:10.160 --> 0:20:14.200 look at it under a light, you'll see it's one 0:20:14.320 --> 0:20:18.720 after another photograph. It's just a series of pictures. It's 0:20:18.720 --> 0:20:20.880 only when you play them back at the right speed 0:20:21.480 --> 0:20:23.760 and you projected onto a screen that you get the 0:20:23.840 --> 0:20:28.480 illusion of continuous motion. But it's not really continuous. It's 0:20:28.520 --> 0:20:31.720 just this series of photographs played at twenty four frames 0:20:31.760 --> 0:20:36.800 per second in the case of actual film. 
So that 0:20:37.000 --> 0:20:40.119 ends up being very analogous to the way we encode 0:20:40.160 --> 0:20:44.000 digital audio. You take the analog recording and you take 0:20:44.280 --> 0:20:49.800 snapshots of sound. The more frequently you take those snapshots, 0:20:50.200 --> 0:20:52.440 the higher your sample rate. So in other words, if 0:20:52.440 --> 0:20:55.600 you did one a second, your sample rate would be awful. 0:20:56.320 --> 0:20:58.560 You would have a sample rate of one. But the 0:20:58.640 --> 0:21:01.400 higher the sample rate, the closer your digital representation 0:21:01.440 --> 0:21:05.240 will be to the frequency in the analog sound format. Actually, 0:21:05.720 --> 0:21:07.960 what's really important to remember is that your sample rate 0:21:08.000 --> 0:21:10.399 has to be about twice, actually does have to be 0:21:10.480 --> 0:21:14.879 twice, what the highest frequency sound is in your recording. 0:21:16.359 --> 0:21:20.119 It has to be, because if it's not, it cannot 0:21:20.280 --> 0:21:25.879 encode that sound accurately. It's kind of interesting, and you 0:21:25.960 --> 0:21:27.960 might wonder, how do we take these snapshots in the 0:21:27.960 --> 0:21:31.080 first place? Well, if you're capturing audio, let's say we're 0:21:31.119 --> 0:21:34.560 recording to digital. So we've got a microphone set up, 0:21:34.920 --> 0:21:39.240 and we're recording to a digital media storage. Like, let's 0:21:39.240 --> 0:21:41.480 just say we're recording straight to someone's hard drive. So 0:21:41.520 --> 0:21:44.720 we're talking into a microphone, recording to a hard drive. 0:21:45.640 --> 0:21:49.400 So you're using an analog microphone, let's say. You would 0:21:49.400 --> 0:21:53.720 need an analog-to-digital converter. Now, this particular component 0:21:54.000 --> 0:21:58.719 can receive discrete voltages from another device, like your microphone. 
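The snapshot idea above can be sketched as taking evenly spaced readings of a tone. This is a simplified illustration rather than how any particular recorder works: the tone is an idealized sine wave, and the function name and the example rates are made up.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Take evenly spaced snapshots of an idealized sine tone.

    Returns one amplitude reading per sample period, the way frames on a
    film strip capture one picture per frame period.
    """
    sample_count = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(sample_count)
    ]


if __name__ == "__main__":
    # Hypothetical example values: a 440 Hz tone read 8,000 times a second.
    snapshots = sample_sine(440.0, 8000, 0.01)
    print(len(snapshots), "snapshots in a hundredth of a second")
```

With a sample rate of one, `sample_sine(440.0, 1, 1.0)` returns a single reading for the whole second, which is why that rate would be awful.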
0:21:59.000 --> 0:22:05.720 So your microphone is converting sound into, uh, differences in voltage. 0:22:05.960 --> 0:22:08.840 That's essentially how it communicates, so that it can then 0:22:09.000 --> 0:22:12.040 send that to some other element. In this case, it's 0:22:12.080 --> 0:22:15.679 sending it to the analog-to-digital converter so 0:22:15.720 --> 0:22:18.359 that it can be stored digitally on your hard drive. 0:22:19.400 --> 0:22:26.560 So this analog-to-digital converter references, or samples, the discrete 0:22:26.640 --> 0:22:30.199 voltage many times every second in order to create a 0:22:30.240 --> 0:22:34.720 digital representation of the analog sound. It converts the voltages 0:22:34.800 --> 0:22:39.360 into numbers in a process called quantization, and we express 0:22:39.400 --> 0:22:42.439 those numbers in bits. So these are zeros and ones. 0:22:43.000 --> 0:22:45.720 When you want to play the digital audio, a digital- 0:22:45.760 --> 0:22:49.760 to-analog converter does the same process in reverse. So 0:22:50.040 --> 0:22:53.720 it takes this digital information, these zeros and ones, and 0:22:53.840 --> 0:22:57.520 converts it into a series of discrete voltages, which then 0:22:57.800 --> 0:23:01.480 can be amplified and sent to a speaker and create sound. 0:23:02.720 --> 0:23:05.280 So all of that's really important. But now let's 0:23:05.320 --> 0:23:07.879 talk about some concrete examples. And the best way to 0:23:07.920 --> 0:23:11.199 do this is to go with compact discs, because we 0:23:11.280 --> 0:23:15.080 have a standard sample rate for compact discs, and that 0:23:15.240 --> 0:23:18.520 standard sample rate is forty-four point one kilohertz 0:23:18.600 --> 0:23:22.119 to create CD-quality audio. That means that the audio 0:23:22.240 --> 0:23:27.960 is sampled forty-four thousand, one hundred times every second. 0:23:28.840 --> 0:23:30.800 But wait, I hear 
you say, the range of human 0:23:30.840 --> 0:23:33.280 hearing, you said, only goes from twenty hertz to twenty 0:23:33.359 --> 0:23:36.240 kilohertz. If it only goes up to twenty kilohertz, 0:23:36.240 --> 0:23:39.000 why are you sampling at forty-four thousand, one hundred 0:23:39.119 --> 0:23:43.520 times every second? If it's twenty thousand times a second 0:23:43.560 --> 0:23:46.680 for the frequency, why go up to forty-four thousand, 0:23:46.760 --> 0:23:49.359 one hundred? Is there some relationship between that and the 0:23:49.400 --> 0:23:52.640 CD sample rate? And the answer is yes. So there 0:23:52.760 --> 0:23:57.959 is a theorem called the Nyquist-Shannon sampling theorem, and 0:23:58.040 --> 0:24:00.719 that states that the sample rate must be twice the 0:24:00.760 --> 0:24:03.960 maximum frequency of a recording in order to describe the 0:24:04.000 --> 0:24:08.200 frequency properly. So the general thought is the maximum frequency 0:24:08.240 --> 0:24:10.879 most humans can hear is twenty kilohertz. And for that reason, 0:24:10.920 --> 0:24:13.760 Philips and Sony, when they were working to create the 0:24:13.920 --> 0:24:17.919 CD format to make it a standard, they decided on 0:24:17.960 --> 0:24:20.840 forty-four point one kilohertz as that standard sample 0:24:20.920 --> 0:24:23.359 rate for CD audio. It was more than double 0:24:23.400 --> 0:24:26.000 the top frequency generally considered to be in the upper 0:24:26.080 --> 0:24:29.120 level of human hearing. But what happens if you were 0:24:29.160 --> 0:24:32.360 to lower the sampling rate? What if you didn't sample 0:24:32.440 --> 0:24:37.520 at... what if you sampled at, let's say, sixteen kilohertz, 0:24:37.560 --> 0:24:41.040 so sixteen thousand times a second you sample it? 
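A minimal sketch of the sampling theorem arithmetic for the rates mentioned here. The fold-back formula is the standard aliasing behavior of an idealized sampler with no anti-aliasing filter in front of it, which is an assumption beyond what the episode states, and both function names are hypothetical.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2.0


def aliased_frequency(frequency_hz, sample_rate_hz):
    """Frequency that a pure tone appears at after idealized sampling.

    Assumes no anti-aliasing filter. Tones at or below the Nyquist limit
    come through unchanged; anything higher is mirrored back below it.
    """
    folded = frequency_hz % sample_rate_hz
    if folded > sample_rate_hz / 2.0:
        folded = sample_rate_hz - folded
    return folded


if __name__ == "__main__":
    print("CD limit:", nyquist_limit(44100), "Hz")
    print("16 kHz limit:", nyquist_limit(16000), "Hz")
    for tone in (5000, 10000, 15000):
        print(tone, "Hz sampled at 16 kHz appears at", aliased_frequency(tone, 16000), "Hz")
```

At the CD rate the limit works out to 22,050 hertz, a little above the twenty kilohertz generally given as the top of human hearing.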
Well, 0:24:41.359 --> 0:24:43.520 that means you would only be able to record and 0:24:43.560 --> 0:24:47.119 replicate any sound with a frequency up to eight 0:24:47.200 --> 0:24:52.240 kilohertz or less, so eight thousand hertz or less. But 0:24:52.400 --> 0:24:55.560 if you had any sound that was greater than eight 0:24:55.600 --> 0:24:59.879 thousand hertz, or eight kilohertz, anything higher than that, 0:25:00.000 --> 0:25:04.360 it would be folded down to fit below the eight 0:25:04.440 --> 0:25:08.160 kilohertz limit. Perceptually, that means the sounds you would 0:25:08.200 --> 0:25:11.159 hear in the playback could include frequencies that were not 0:25:11.320 --> 0:25:16.120 present in the original performance of that sound. So let's 0:25:16.119 --> 0:25:20.560 say that I'm using a sample rate of sixteen, uh, 0:25:20.600 --> 0:25:24.359 you know, kilohertz, and someone is playing a musical 0:25:24.400 --> 0:25:27.160