1 00:00:04,160 --> 00:00:07,160 Speaker 1: Get in touch with technology with TechStuff from How 2 00:00:07,240 --> 00:00:13,920 Speaker 1: Stuff Works dot com. Hey there, everyone, this is Jonathan 3 00:00:13,960 --> 00:00:18,279 Speaker 1: Strickland with TechStuff, and today we're gonna tackle a 4 00:00:18,360 --> 00:00:21,600 Speaker 1: subject that I've talked about in the past. Actually, way 5 00:00:21,640 --> 00:00:25,200 Speaker 1: back in two thousand and eight, back when you were 6 00:00:25,200 --> 00:00:27,760 Speaker 1: knee-high to a grasshopper, Chris Pollette and I 7 00:00:27,800 --> 00:00:31,880 Speaker 1: did an episode called How MP three Files Work, and 8 00:00:31,920 --> 00:00:35,440 Speaker 1: we talked about the lossy file format, and we actually 9 00:00:35,479 --> 00:00:38,120 Speaker 1: revisited it in two thousand eleven, when we did an episode 10 00:00:38,120 --> 00:00:42,000 Speaker 1: about the iPod and about MP three players, but I 11 00:00:42,000 --> 00:00:44,480 Speaker 1: really thought it would be a good idea to revisit 12 00:00:45,120 --> 00:00:48,640 Speaker 1: MP three files, MP three players, digital audio in general, 13 00:00:48,720 --> 00:00:51,519 Speaker 1: the difference between digital audio and analog and all of 14 00:00:51,520 --> 00:00:54,760 Speaker 1: that history. Uh, to really give a deep dive, because 15 00:00:54,800 --> 00:00:57,200 Speaker 1: back in those days we did really short episodes and 16 00:00:57,240 --> 00:00:59,600 Speaker 1: so we weren't able to give it the full coverage 17 00:00:59,600 --> 00:01:04,360 Speaker 1: that I think it deserved. Um and we actually reached 18 00:01:04,360 --> 00:01:08,720 Speaker 1: a point in history that I did not anticipate. 
And 19 00:01:08,760 --> 00:01:11,759 Speaker 1: I am, of course talking about the day when I said, 20 00:01:11,800 --> 00:01:15,120 Speaker 1: you know what, I don't need to carry a smartphone 21 00:01:15,200 --> 00:01:18,120 Speaker 1: and an MP three player. I held out for a 22 00:01:18,120 --> 00:01:20,320 Speaker 1: really long time. You guys who have been long time 23 00:01:20,319 --> 00:01:23,280 Speaker 1: listeners of TechStuff might remember that I really liked 24 00:01:23,360 --> 00:01:26,959 Speaker 1: dedicated devices, Like I really liked having a digital camera, 25 00:01:27,080 --> 00:01:29,760 Speaker 1: and I really liked having an MP three player, and 26 00:01:29,800 --> 00:01:32,319 Speaker 1: I really liked having a phone that was a phone. 27 00:01:33,000 --> 00:01:34,880 Speaker 1: And now I'm like, no, I'm good with just one 28 00:01:34,880 --> 00:01:38,440 Speaker 1: device doing all that kind of thing. So, uh, since 29 00:01:38,440 --> 00:01:41,360 Speaker 1: we've reached that point, the point where our machines are 30 00:01:41,360 --> 00:01:45,400 Speaker 1: sophisticated enough to either have enough storage space to carry 31 00:01:45,440 --> 00:01:50,640 Speaker 1: an impressive music collection, or more likely, as 32 00:01:50,840 --> 00:01:53,520 Speaker 1: things have changed these days, um access to a 33 00:01:53,560 --> 00:01:57,720 Speaker 1: streaming service where I don't even have stuff stored permanently 34 00:01:58,080 --> 00:02:01,840 Speaker 1: or like in any, you know, lasting format on 35 00:02:01,880 --> 00:02:04,880 Speaker 1: the phone itself. Instead, I'm streaming a file over the 36 00:02:04,920 --> 00:02:08,919 Speaker 1: Internet to listen to dynamically. I thought, why not talk 37 00:02:08,960 --> 00:02:12,000 Speaker 1: about the MP three because who knows, in a few 38 00:02:12,080 --> 00:02:15,760 Speaker 1: years that might just be a distant memory. 
So 39 00:02:16,080 --> 00:02:18,840 Speaker 1: this is going to be the first of a three 40 00:02:18,880 --> 00:02:21,600 Speaker 1: part series, and I want to let you guys know, 41 00:02:21,639 --> 00:02:25,120 Speaker 1: I'm not going to record all of these and publish 42 00:02:25,160 --> 00:02:27,440 Speaker 1: them all one right after the other. So it's not 43 00:02:27,480 --> 00:02:30,800 Speaker 1: gonna be MP three Part one, Part two, Part three 44 00:02:30,880 --> 00:02:33,639 Speaker 1: in a row. Uh. In this episode, we're gonna look 45 00:02:33,639 --> 00:02:36,560 Speaker 1: at how digital audio works in general and how it's 46 00:02:36,600 --> 00:02:40,320 Speaker 1: different from analog audio. Uh, and we're also gonna talk 47 00:02:40,320 --> 00:02:43,080 Speaker 1: about how the MP three was created and what it does. 48 00:02:43,639 --> 00:02:46,959 Speaker 1: In the next episode, I'm gonna take a deeper dive 49 00:02:47,480 --> 00:02:51,639 Speaker 1: into how an MP three file works, how it compresses audio. 50 00:02:52,120 --> 00:02:56,120 Speaker 1: It gets really technical. And in the final episode of 51 00:02:56,120 --> 00:02:58,760 Speaker 1: the series, we're gonna explore the history of the MP 52 00:02:58,880 --> 00:03:02,200 Speaker 1: three player and how Apple ended up dominating that space 53 00:03:02,240 --> 00:03:04,200 Speaker 1: for so long, to the point that we have things 54 00:03:04,240 --> 00:03:09,880 Speaker 1: called podcasts. But don't worry, I have other episodes to 55 00:03:09,919 --> 00:03:12,320 Speaker 1: divide up this content. So, like I said, it's not 56 00:03:12,360 --> 00:03:14,560 Speaker 1: all gonna be in a row. I don't want you 57 00:03:14,639 --> 00:03:19,520 Speaker 1: to have a month of MP three related episodes, but 58 00:03:20,200 --> 00:03:23,519 Speaker 1: you know, every couple of episodes, expect one of these. 59 00:03:24,160 --> 00:03:28,720 Speaker 1: It's kind of an interesting subject, I think. 
So to 60 00:03:28,840 --> 00:03:31,640 Speaker 1: start it all off, we all have to take a 61 00:03:31,720 --> 00:03:34,760 Speaker 1: quick trip to Germany. So anyone who is not in 62 00:03:34,840 --> 00:03:38,960 Speaker 1: Germany get your passport. I was actually in Germany not 63 00:03:39,080 --> 00:03:41,400 Speaker 1: that long ago. I got to visit Berlin and had 64 00:03:41,440 --> 00:03:45,000 Speaker 1: a wonderful time. And in Germany there's a company called 65 00:03:45,160 --> 00:03:48,640 Speaker 1: Fraunhofer-Gesellschaft and you might wonder, well, what 66 00:03:48,680 --> 00:03:54,360 Speaker 1: does this company do? They think. I joke that my profession, 67 00:03:54,720 --> 00:03:57,160 Speaker 1: that my title that I should put on my business 68 00:03:57,160 --> 00:04:01,200 Speaker 1: card, should say professional smart person. And well, no joke, 69 00:04:01,320 --> 00:04:05,240 Speaker 1: that's what these people are. They specialize in research 70 00:04:05,360 --> 00:04:10,880 Speaker 1: and development, applied research. It's a whole company that specializes 71 00:04:10,920 --> 00:04:14,760 Speaker 1: in applied research. And it's huge. It encompasses sixty-seven 72 00:04:14,800 --> 00:04:20,200 Speaker 1: institutes and research units across Germany. Well back in the 73 00:04:20,240 --> 00:04:25,880 Speaker 1: eighties there was a researcher named Karlheinz Brandenburg, 74 00:04:26,440 --> 00:04:33,000 Speaker 1: and Karlheinz made a breakthrough round seven uh and 75 00:04:33,160 --> 00:04:37,480 Speaker 1: came up with this clever idea about encoding audio. He 76 00:04:37,520 --> 00:04:40,239 Speaker 1: was actually working towards creating a way that would allow 77 00:04:40,640 --> 00:04:45,000 Speaker 1: for high audio quality transfer but having a low bit 78 00:04:45,120 --> 00:04:50,400 Speaker 1: rate sampling so that file sizes and transfer times wouldn't 79 00:04:50,440 --> 00:04:52,520 Speaker 1: get out of control. 
Because you got to remember, this 80 00:04:52,560 --> 00:04:55,599 Speaker 1: is the eighties, this is before the World Wide Web was 81 00:04:55,640 --> 00:04:58,839 Speaker 1: a thing. That wouldn't happen until the early nineties, 82 00:04:59,240 --> 00:05:01,240 Speaker 1: so the Internet is very young. In fact, they weren't 83 00:05:01,240 --> 00:05:03,839 Speaker 1: even looking at the Internet as a method of distribution 84 00:05:03,880 --> 00:05:07,520 Speaker 1: for this particular type of encoded audio. They were looking 85 00:05:07,560 --> 00:05:11,960 Speaker 1: at using this to transmit across telephone lines, so they 86 00:05:12,000 --> 00:05:13,760 Speaker 1: needed to have something that was going to be high 87 00:05:13,839 --> 00:05:18,440 Speaker 1: quality but low space. So what the heck does that mean? 88 00:05:18,520 --> 00:05:22,920 Speaker 1: All right, Well, digital audio and analog audio are very 89 00:05:23,000 --> 00:05:26,920 Speaker 1: different things. So to understand that, we need to look 90 00:05:27,000 --> 00:05:31,000 Speaker 1: at how sound works and how we describe sound, because 91 00:05:31,000 --> 00:05:34,760 Speaker 1: that informs how we can capture sound and replicate those 92 00:05:34,839 --> 00:05:39,120 Speaker 1: qualities digitally. So stick with me. We're gonna go back 93 00:05:39,160 --> 00:05:44,480 Speaker 1: to school for some basic sound science. And this goes 94 00:05:44,560 --> 00:05:48,320 Speaker 1: back to the way sound physically moves through a medium, 95 00:05:48,320 --> 00:05:51,839 Speaker 1: whether that's a solid or through the air or through water. 96 00:05:52,320 --> 00:05:58,640 Speaker 1: Sound is vibration. Now we sense this primarily through hearing 97 00:05:58,680 --> 00:06:01,720 Speaker 1: it or sometimes feeling it. If it's the right 98 00:06:01,760 --> 00:06:04,720 Speaker 1: frequency and the right amplitude, we can actually feel sound. 
99 00:06:05,040 --> 00:06:08,120 Speaker 1: Anyone who's stood close to, say, a subwoofer that 100 00:06:08,160 --> 00:06:10,480 Speaker 1: was really blasting out bass notes, you know what I'm 101 00:06:10,520 --> 00:06:14,159 Speaker 1: talking about. You can feel it pressing against you. Well, 102 00:06:14,200 --> 00:06:18,760 Speaker 1: sound travels through the air when molecules vibrate against each other, 103 00:06:19,360 --> 00:06:23,680 Speaker 1: and this creates instances of increased pressure and decreased pressure 104 00:06:24,080 --> 00:06:27,760 Speaker 1: at what is a hyperlocal level. We're not talking about 105 00:06:27,800 --> 00:06:31,000 Speaker 1: weather maps here, We're talking about tiny, little areas. So 106 00:06:31,279 --> 00:06:33,839 Speaker 1: this increase and decrease in pressure is something that we 107 00:06:33,920 --> 00:06:37,679 Speaker 1: can sense as sound. When those changes in pressure affect 108 00:06:37,760 --> 00:06:41,119 Speaker 1: a diaphragm, such as one that's in a microphone or 109 00:06:41,839 --> 00:06:45,919 Speaker 1: maybe your ear drum, for example, it causes the diaphragm 110 00:06:45,960 --> 00:06:50,120 Speaker 1: to actually move. So increased pressure pushes the diaphragm in, 111 00:06:51,080 --> 00:06:56,400 Speaker 1: and decreased pressure doesn't really pull the diaphragm out. I mean, 112 00:06:56,440 --> 00:06:58,680 Speaker 1: you could say it pulls the diaphragm out, but 113 00:06:58,680 --> 00:07:02,680 Speaker 1: to be more accurate, the diaphragm actually pushes outward because 114 00:07:02,720 --> 00:07:05,440 Speaker 1: the pressure on the outside is lower than the pressure 115 00:07:05,480 --> 00:07:07,760 Speaker 1: on the inside. But you get what I'm saying. The 116 00:07:07,880 --> 00:07:12,320 Speaker 1: diaphragm begins to flex inward and outward depending upon 117 00:07:12,680 --> 00:07:16,360 Speaker 1: the amount of pressure that it's encountering. 
You could 118 00:07:16,360 --> 00:07:18,720 Speaker 1: imagine this being kind of like a drum, not 119 00:07:18,800 --> 00:07:20,960 Speaker 1: an ear drum, but an actual drum and striking it. 120 00:07:21,800 --> 00:07:24,720 Speaker 1: That's the same sort of thing. So sound is the 121 00:07:24,760 --> 00:07:29,280 Speaker 1: fluctuations of pressure, which we can diagram as a wave, 122 00:07:29,880 --> 00:07:32,720 Speaker 1: or a waveform, on an X 123 00:07:32,840 --> 00:07:37,760 Speaker 1: Y axis. So the horizontal line, that axis, represents 124 00:07:37,840 --> 00:07:41,320 Speaker 1: time that has passed, and the vertical axis represents the 125 00:07:41,400 --> 00:07:46,200 Speaker 1: amplitude or the volume of the sound wave. The 126 00:07:46,320 --> 00:07:49,560 Speaker 1: wavelength of the sound, which is the distance between successive 127 00:07:49,600 --> 00:07:52,800 Speaker 1: points on a wave, such as the successive crests 128 00:07:52,840 --> 00:07:55,480 Speaker 1: on a wave, tells you a lot about the frequency. 129 00:07:56,400 --> 00:08:00,520 Speaker 1: So sound moves at a constant rate through a given medium, 130 00:08:00,520 --> 00:08:04,080 Speaker 1: but it moves at different rates through different media. So, 131 00:08:04,120 --> 00:08:06,640 Speaker 1: in other words, it moves at a different speed through a 132 00:08:06,680 --> 00:08:09,880 Speaker 1: solid than it does through air. If the crests of 133 00:08:09,960 --> 00:08:13,080 Speaker 1: each sound wave are really close together, that's a high 134 00:08:13,160 --> 00:08:17,320 Speaker 1: frequency sound. More waves will pass through an arbitrary point 135 00:08:17,560 --> 00:08:21,080 Speaker 1: within a second than waves that are spaced further apart. 136 00:08:21,440 --> 00:08:24,600 Speaker 1: That would be a lower frequency sound. 
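The link between wavelength and frequency described here can be put into a few lines of code: frequency is the speed of sound in the medium divided by the wavelength. This is only a rough sketch. The speeds used (about 343 metres per second in room-temperature air, about 1480 in water) are approximate textbook values assumed for illustration, not figures from the episode, and the function name is made up.

```python
# Approximate speeds of sound in metres per second (assumed values).
SPEED_OF_SOUND = {"air": 343.0, "water": 1480.0}


def frequency_hz(wavelength_m: float, medium: str = "air") -> float:
    """Return the frequency of a sound wave: f = speed / wavelength."""
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    return SPEED_OF_SOUND[medium] / wavelength_m


# Crests that are closer together (shorter wavelength) mean a higher frequency.
for wavelength in (17.15, 1.0, 0.01715):
    print(f"{wavelength} m in air is roughly {frequency_hz(wavelength):.0f} Hz")
```

With these assumed speeds, the same one-metre wavelength comes out at a higher frequency in water than in air, because the wave moves faster through water.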
Higher frequency sounds 137 00:08:24,600 --> 00:08:27,800 Speaker 1: have a higher pitch than lower frequency sounds. So if 138 00:08:27,800 --> 00:08:31,440 Speaker 1: you hold a single note at a constant frequency, you'll 139 00:08:31,440 --> 00:08:34,880 Speaker 1: have what is called a simple harmonic motion. That means 140 00:08:34,920 --> 00:08:38,840 Speaker 1: the vibrations are moving at a constant rate inward and outward. 141 00:08:38,880 --> 00:08:42,400 Speaker 1: The cycle is constant. A tuning fork is a good 142 00:08:42,440 --> 00:08:46,640 Speaker 1: example of this. So if you hear a clear C 143 00:08:46,920 --> 00:08:50,640 Speaker 1: note played on a musical instrument, that could be a 144 00:08:50,679 --> 00:08:53,480 Speaker 1: simple harmonic motion. It won't be, but it could be. 145 00:08:53,600 --> 00:08:55,520 Speaker 1: I'll tell you why it won't be in a minute. 146 00:08:55,840 --> 00:08:59,160 Speaker 1: So the frequency of vibration doesn't change, and so you 147 00:08:59,160 --> 00:09:01,959 Speaker 1: would get this very clear note as a result, And 148 00:09:02,000 --> 00:09:04,800 Speaker 1: if you were to diagram it, you would have very 149 00:09:04,840 --> 00:09:10,040 Speaker 1: regular crests and troughs, all of the same amplitude and 150 00:09:10,120 --> 00:09:13,800 Speaker 1: distance from each other. The frequency and volume would remain constant, 151 00:09:15,040 --> 00:09:17,880 Speaker 1: assuming of course, that you're not trying to change the 152 00:09:17,920 --> 00:09:21,160 Speaker 1: frequency or volume. Now, this is where I point out 153 00:09:21,480 --> 00:09:25,839 Speaker 1: most musical instruments don't produce a single clear note, even 154 00:09:25,880 --> 00:09:30,640 Speaker 1: if played expertly. They actually create several resonant frequencies. So 155 00:09:30,720 --> 00:09:35,319 Speaker 1: every physical object resonates at several different frequencies. 
You've probably 156 00:09:35,360 --> 00:09:38,960 Speaker 1: seen this in various programs. MythBusters did one about bridges, 157 00:09:39,440 --> 00:09:42,080 Speaker 1: the idea being that if you were to have a 158 00:09:42,080 --> 00:09:44,760 Speaker 1: group of people marching on a bridge at the bridge's 159 00:09:44,800 --> 00:09:48,040 Speaker 1: resonant frequency, it could cause the bridge to start to 160 00:09:48,120 --> 00:09:51,839 Speaker 1: vibrate and swing out of control. Well, there's a reason 161 00:09:51,880 --> 00:09:53,960 Speaker 1: for this. You may have also seen videos of people 162 00:09:54,080 --> 00:09:58,280 Speaker 1: singing a certain note and causing a crystal glass to shatter. 163 00:09:58,880 --> 00:10:02,360 Speaker 1: That's because that crystal glass does have a resonant frequency, 164 00:10:02,400 --> 00:10:04,640 Speaker 1: and if you can hit that resonant frequency at the 165 00:10:04,760 --> 00:10:08,600 Speaker 1: right volume, you can cause the glass to start to deform, 166 00:10:08,720 --> 00:10:11,120 Speaker 1: or the crystal in this case, to deform to a 167 00:10:11,160 --> 00:10:15,120 Speaker 1: point where it loses integrity and it shatters as a result. Well, 168 00:10:16,240 --> 00:10:20,679 Speaker 1: the resonation of an object is dependent upon lots of 169 00:10:20,720 --> 00:10:23,760 Speaker 1: different factors, and in fact, most stuff will resonate at 170 00:10:23,840 --> 00:10:28,240 Speaker 1: different frequencies but at different intensities. Like there might be 171 00:10:28,320 --> 00:10:32,480 Speaker 1: one sweet spot, one specific frequency that will have the 172 00:10:32,559 --> 00:10:37,360 Speaker 1: greatest effect, but other related frequencies may also have an effect. 173 00:10:37,360 --> 00:10:40,720 Speaker 1: It will just be to a lesser extent. 
Well, if 174 00:10:40,760 --> 00:10:44,200 Speaker 1: you were to pluck a guitar string, say you've tuned 175 00:10:44,200 --> 00:10:46,640 Speaker 1: it to whatever note, doesn't matter. Let's say you've 176 00:10:46,679 --> 00:10:50,439 Speaker 1: tuned it to G and you play the G 177 00:10:50,679 --> 00:10:53,960 Speaker 1: string on your guitar. Uh, the note that you will 178 00:10:54,000 --> 00:10:57,280 Speaker 1: hear really over all others will be G. That 179 00:10:57,400 --> 00:10:59,240 Speaker 1: is going to be the one that will sound the loudest, 180 00:10:59,280 --> 00:11:03,679 Speaker 1: But it will also play resonant frequencies at a decreased amplitude, 181 00:11:03,720 --> 00:11:06,839 Speaker 1: in other words, at decreased volume, so you still hear 182 00:11:06,880 --> 00:11:09,679 Speaker 1: the intended note above everything else, above all the other 183 00:11:09,679 --> 00:11:14,320 Speaker 1: resonant frequencies. This is called a complex tone, and that 184 00:11:14,360 --> 00:11:18,040 Speaker 1: collection of frequencies and their amplitudes is called the spectrum 185 00:11:18,240 --> 00:11:21,640 Speaker 1: of sound. You get a full spectrum. Now, some of 186 00:11:21,679 --> 00:11:27,640 Speaker 1: the components of that complex tone will be uh imperceptible 187 00:11:27,679 --> 00:11:30,360 Speaker 1: to you. They'll be so quiet that you wouldn't 188 00:11:30,440 --> 00:11:33,320 Speaker 1: really notice them. They might affect the overall quality of 189 00:11:33,320 --> 00:11:34,960 Speaker 1: the sound, but in such a subtle way that it 190 00:11:35,000 --> 00:11:38,120 Speaker 1: may be difficult for you to even put it into words. 191 00:11:38,160 --> 00:11:41,360 Speaker 1: Each of those little components is called a partial. 
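To make the idea of a complex tone concrete, here is a small sketch that builds one from partials. It assumes an idealized string whose partials sit at whole-number multiples of the fundamental, uses 196 hertz for a guitar's open G string, and picks amplitudes that simply shrink with each partial. The amplitudes and the function names are illustrative assumptions, not measurements of a real guitar.

```python
import math


def partial_frequencies(fundamental_hz: float, count: int) -> list[float]:
    """Frequencies of the first `count` harmonic partials: 1x, 2x, 3x, ..."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def complex_tone(fundamental_hz: float, amplitudes: list[float], t: float) -> float:
    """Value of the summed waveform at time t (seconds).

    amplitudes[0] scales the fundamental, amplitudes[1] the second partial,
    and so on. Each partial is a plain sine wave.
    """
    return sum(
        amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
        for k, amp in enumerate(amplitudes)
    )


G_STRING_HZ = 196.0                      # open G string (G3)
AMPLITUDES = [1.0, 0.5, 0.33, 0.25]      # made-up, decreasing amplitudes

print(partial_frequencies(G_STRING_HZ, len(AMPLITUDES)))
print(complex_tone(G_STRING_HZ, AMPLITUDES, t=0.001))
```

The list of frequencies paired with the list of amplitudes is a bare-bones version of the spectrum: the fundamental dominates, and the higher partials contribute less and less.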
So 192 00:11:41,400 --> 00:11:43,679 Speaker 1: in the example of a guitar string, the partials are 193 00:11:43,720 --> 00:11:48,040 Speaker 1: all integer multiples of the same fundamental frequency, and the sound 194 00:11:48,080 --> 00:11:52,680 Speaker 1: has a harmonic spectrum. But as you get further away 195 00:11:52,760 --> 00:11:57,400 Speaker 1: from that fundamental frequency, the amplitude decreases significantly. So, like 196 00:11:57,440 --> 00:12:01,199 Speaker 1: I said, you get far enough away, they are technically there, 197 00:12:01,360 --> 00:12:05,200 Speaker 1: but they might be imperceptible to you. Now, some sounds 198 00:12:05,240 --> 00:12:09,880 Speaker 1: have frequencies that aren't integer multiples of a fundamental frequency and 199 00:12:09,920 --> 00:12:13,120 Speaker 1: are inharmonic. Uh, certain bells, like if you hear a 200 00:12:13,120 --> 00:12:15,160 Speaker 1: bell ring, you can probably pick out a couple of 201 00:12:15,200 --> 00:12:19,560 Speaker 1: different frequencies there that are not harmonic frequencies. These are 202 00:12:19,679 --> 00:12:23,400 Speaker 1: very complex sounds, and to our perception, if it's complex enough, 203 00:12:23,440 --> 00:12:26,959 Speaker 1: it can seem like there's no single discernible pitch. There's, 204 00:12:27,080 --> 00:12:31,040 Speaker 1: like, no fundamental frequency over all the others. If 205 00:12:31,040 --> 00:12:35,320 Speaker 1: it's complex enough, we call it noise. That is the 206 00:12:35,360 --> 00:12:39,440 Speaker 1: technical term. It is noise. Now, the unit we use 207 00:12:39,600 --> 00:12:44,719 Speaker 1: to measure frequency is the hertz, uh, H E R 208 00:12:44,840 --> 00:12:49,240 Speaker 1: T Z. 
Typical human hearing ranges from twenty hertz, which 209 00:12:49,280 --> 00:12:52,760 Speaker 1: means a wave will pass a given arbitrary point twenty 210 00:12:52,840 --> 00:12:55,640 Speaker 1: times within a second, all the way up to twenty 211 00:12:55,760 --> 00:12:59,040 Speaker 1: kilohertz, which means a wave will pass a particular 212 00:12:59,440 --> 00:13:02,640 Speaker 1: point twenty thousand times in a second, or a 213 00:13:02,800 --> 00:13:05,560 Speaker 1: particular point on your waveform twenty thousand times in 214 00:13:05,559 --> 00:13:09,559 Speaker 1: a second. And most of our sensitivity tends to be 215 00:13:09,559 --> 00:13:12,920 Speaker 1: between one or two kilohertz up to four or 216 00:13:12,960 --> 00:13:17,320 Speaker 1: five kilohertz. That's generally where we have human voices, 217 00:13:17,800 --> 00:13:20,400 Speaker 1: and we've really gotten good at picking those out 218 00:13:20,480 --> 00:13:23,160 Speaker 1: over everything else. So our sensitivity of hearing is really 219 00:13:23,200 --> 00:13:26,240 Speaker 1: concentrated between one kilohertz and four kilohertz or 220 00:13:26,400 --> 00:13:30,680 Speaker 1: two and five depending upon whom you ask. Now we 221 00:13:30,720 --> 00:13:34,040 Speaker 1: get back over to amplitude. That is referring to the 222 00:13:34,080 --> 00:13:36,800 Speaker 1: height of the wave. It also refers to the volume, 223 00:13:37,080 --> 00:13:41,960 Speaker 1: the loudness of something. Amplitude means bigness, So how big 224 00:13:42,160 --> 00:13:45,400 Speaker 1: is the sound? Well, the greater the amplitude, the louder 225 00:13:45,440 --> 00:13:48,480 Speaker 1: it is, and amplitudes can have an enormous range and 226 00:13:48,520 --> 00:13:52,480 Speaker 1: affect how we perceive sounds. So, for example, take a 227 00:13:52,559 --> 00:13:56,840 Speaker 1: really complicated classical piece of music. 
It's just easy to 228 00:13:56,920 --> 00:14:00,319 Speaker 1: explain it in those terms. You might have a stretch 229 00:14:01,080 --> 00:14:03,640 Speaker 1: in that classical piece of music in which all the 230 00:14:03,720 --> 00:14:06,920 Speaker 1: instruments are more or less playing at a similar volume, 231 00:14:07,000 --> 00:14:10,720 Speaker 1: so the sound from each instrument section has a similar amplitude. 232 00:14:11,240 --> 00:14:14,240 Speaker 1: But then there might be one segment where an instrument 233 00:14:14,280 --> 00:14:18,599 Speaker 1: group or maybe even a single soloist has an increased 234 00:14:18,600 --> 00:14:21,640 Speaker 1: amplitude, an increased volume. It rises over the rest of 235 00:14:21,680 --> 00:14:25,480 Speaker 1: the orchestra, and that peak of the amplitude is called 236 00:14:25,520 --> 00:14:29,720 Speaker 1: the attack of the sound, and the entire range of 237 00:14:29,760 --> 00:14:34,280 Speaker 1: amplitudes is called the amplitude envelope. Now this is important 238 00:14:34,320 --> 00:14:38,120 Speaker 1: when we get to MP threes because the way 239 00:14:38,120 --> 00:14:42,040 Speaker 1: we perceive these sounds, uh, that has everything to 240 00:14:42,120 --> 00:14:44,720 Speaker 1: do with the way the MP three was designed. The 241 00:14:44,760 --> 00:14:47,720 Speaker 1: whole point of the MP three was to try and 242 00:14:47,760 --> 00:14:53,040 Speaker 1: create a small file size to represent what we can 243 00:14:53,120 --> 00:14:56,080 Speaker 1: hear and kind of ignore everything else. But we'll get 244 00:14:56,120 --> 00:14:58,640 Speaker 1: to that in a little bit more time. So 245 00:14:59,160 --> 00:15:01,880 Speaker 1: this is really interesting to me. 
If you take a 246 00:15:02,000 --> 00:15:07,920 Speaker 1: sound and you double its amplitude, you increase the amplitude 247 00:15:07,920 --> 00:15:11,760 Speaker 1: by twofold, a listener would not necessarily feel that the 248 00:15:11,800 --> 00:15:16,960 Speaker 1: sound is twice as loud. Human hearing is incredibly subjective, 249 00:15:17,560 --> 00:15:21,640 Speaker 1: and typically for most listeners, it would require much more 250 00:15:22,440 --> 00:15:26,320 Speaker 1: than doubling the sounds amplitude for them to feel that 251 00:15:26,440 --> 00:15:29,960 Speaker 1: the sound itself was twice as loud. This perception of 252 00:15:30,040 --> 00:15:32,480 Speaker 1: volume is important when we get to the lossy formats 253 00:15:32,480 --> 00:15:37,440 Speaker 1: for audio files. Now I've given you all this information, 254 00:15:37,640 --> 00:15:40,600 Speaker 1: and I know everyone is probably thinking, you know, I 255 00:15:40,680 --> 00:15:44,040 Speaker 1: learned this in primary school, elementary school. All of this 256 00:15:44,120 --> 00:15:47,360 Speaker 1: is really familiar to me, and you're maybe rolling your 257 00:15:47,360 --> 00:15:50,400 Speaker 1: eyes because it's so basic. But I think it's important 258 00:15:50,840 --> 00:15:54,120 Speaker 1: to have that refresher so that you can understand the 259 00:15:54,160 --> 00:15:58,800 Speaker 1: difference between sound as we experience it and sound as 260 00:15:58,880 --> 00:16:03,520 Speaker 1: the way we hold it digitally and replicate it digitally. 261 00:16:04,400 --> 00:16:07,400 Speaker 1: For one thing, this illustrates how sound in the real 262 00:16:07,440 --> 00:16:12,200 Speaker 1: world is a continuum. It's a continuum both in frequency 263 00:16:12,240 --> 00:16:17,800 Speaker 1: and amplitude. You can have sound changing in frequency very 264 00:16:17,800 --> 00:16:22,080 Speaker 1: smoothly from one pitch to another. 
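The point about doubling amplitude can be put in numbers using decibels, the usual logarithmic scale for sound level. Doubling the amplitude adds only about six decibels. A common rule of thumb, which is an approximation that varies by listener and by sound, is that it takes roughly a ten decibel increase before something seems twice as loud. The sketch below works through that arithmetic; the function names are invented for this example.

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to a level change in decibels."""
    return 20.0 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)


# Doubling the amplitude is a fairly small step on the decibel scale.
print(f"2x amplitude is {amplitude_ratio_to_db(2.0):.2f} dB")

# Rule-of-thumb figure (assumed): about +10 dB to sound "twice as loud".
print(f"+10 dB needs about {db_to_amplitude_ratio(10.0):.2f}x the amplitude")
```

Under that rule of thumb, the amplitude has to grow by a bit more than three times, not two, before a typical listener reports a doubling in loudness.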
You can also have 265 00:16:22,200 --> 00:16:26,800 Speaker 1: sound increase or decrease in amplitude in a very smooth way. 266 00:16:26,920 --> 00:16:31,800 Speaker 1: And it is continuous, it's unbroken, it can have smooth transitions. 267 00:16:31,800 --> 00:16:34,800 Speaker 1: And these qualities provide challenges when we want to describe 268 00:16:34,840 --> 00:16:40,520 Speaker 1: something digitally, because at the heart of digital information is 269 00:16:40,960 --> 00:16:45,680 Speaker 1: the bit, the basic unit of information. It is a 270 00:16:45,800 --> 00:16:49,440 Speaker 1: unit of information that only has two states, zero or 271 00:16:49,560 --> 00:16:53,720 Speaker 1: one, essentially off or on. When you get down 272 00:16:53,760 --> 00:16:58,600 Speaker 1: to defining information in just two states, then you start 273 00:16:58,640 --> 00:17:02,320 Speaker 1: to look at something that's continuous and you realize this 274 00:17:02,400 --> 00:17:04,359 Speaker 1: is going to be a challenge. How do I describe 275 00:17:04,400 --> 00:17:10,840 Speaker 1: a continuous experience in very discrete amounts of information? And 276 00:17:10,920 --> 00:17:15,520 Speaker 1: that's when we get to the methodology we've developed to 277 00:17:15,920 --> 00:17:19,359 Speaker 1: digitally encode sound. I'm going to get into that in 278 00:17:19,640 --> 00:17:22,880 Speaker 1: just a minute, but before I do that, let's take 279 00:17:22,880 --> 00:17:34,520 Speaker 1: a quick break to thank our sponsor. All right, let's 280 00:17:34,560 --> 00:17:38,800 Speaker 1: get back into it. So we've talked about the nature 281 00:17:38,840 --> 00:17:42,120 Speaker 1: of sound. Analog sound, by the way, tries to replicate 282 00:17:42,359 --> 00:17:45,600 Speaker 1: exactly what we would experience in nature. 
It tries to 283 00:17:45,600 --> 00:17:51,200 Speaker 1: create this continuous experience, so you get these smooth waves 284 00:17:51,240 --> 00:17:56,800 Speaker 1: of frequencies and amplitudes. And that's why some people argue 285 00:17:56,880 --> 00:18:02,760 Speaker 1: that analog styles of sound recording are superior 286 00:18:02,840 --> 00:18:07,399 Speaker 1: to digital ones. I don't necessarily think they're right, but 287 00:18:07,560 --> 00:18:12,280 Speaker 1: they often feel that way. So something like a vinyl album, 288 00:18:12,320 --> 00:18:16,080 Speaker 1: which is an analog format of digital or sorry, an 289 00:18:16,080 --> 00:18:20,240 Speaker 1: analog format of music storage, I should say sound storage. Uh, 290 00:18:20,280 --> 00:18:22,960 Speaker 1: they think that that is superior to say a CD, 291 00:18:23,280 --> 00:18:28,280 Speaker 1: which is a digital storage format. Uh. And who's to say. 292 00:18:28,359 --> 00:18:32,399 Speaker 1: I mean, like, if your sense of hearing is incredibly 293 00:18:32,680 --> 00:18:36,040 Speaker 1: well tuned, you might be able to pick up on 294 00:18:36,080 --> 00:18:40,080 Speaker 1: some differences. Or if someone did a really terrible job 295 00:18:40,640 --> 00:18:45,960 Speaker 1: encoding music digitally, then that might reveal itself to you 296 00:18:46,000 --> 00:18:48,760 Speaker 1: as well. Uh. But this is one of those things 297 00:18:48,760 --> 00:18:50,920 Speaker 1: where I think a lot of people feel they can 298 00:18:50,920 --> 00:18:52,720 Speaker 1: tell the difference, but if they did a double 299 00:18:52,760 --> 00:18:57,280 Speaker 1: blind test, they might be surprised at how difficult it is. 300 00:18:57,760 --> 00:19:01,160 Speaker 1: If everything's working the way it should, then 301 00:19:01,400 --> 00:19:05,960 Speaker 1: there shouldn't be a perceptible difference at any rate. 
Digital 302 00:19:05,960 --> 00:19:12,320 Speaker 1: audio has two really important factors, sample rate and bit depth, 303 00:19:13,119 --> 00:19:15,600 Speaker 1: or to another extent, bit rate. We'll talk about bit 304 00:19:15,720 --> 00:19:20,240 Speaker 1: rate as well. So the sample rate refers to how 305 00:19:20,280 --> 00:19:23,840 Speaker 1: many times you reference an analog sound to create the 306 00:19:23,920 --> 00:19:27,720 Speaker 1: digital version. So sound, like I said, is uninterrupted. In 307 00:19:27,760 --> 00:19:32,840 Speaker 1: the analog world, you've got that nice waveform. 308 00:19:32,880 --> 00:19:36,000 Speaker 1: That's not how the digital world works. 309 00:19:36,080 --> 00:19:39,280 Speaker 1: In the digital world, we have to describe that sound in a 310 00:19:39,359 --> 00:19:45,560 Speaker 1: series of discrete snippets of sound. It's probably easiest to 311 00:19:45,600 --> 00:19:51,800 Speaker 1: describe this with an analogy to movies on film. If 312 00:19:51,840 --> 00:19:55,320 Speaker 1: you work with film, like you're creating a movie on film, 313 00:19:55,800 --> 00:19:58,960 Speaker 1: then you know that you're not looking at a real 314 00:19:59,200 --> 00:20:02,200 Speaker 1: moving picture when you see the film played out at 315 00:20:02,200 --> 00:20:05,480 Speaker 1: the cinema. Instead, what you're looking at is a series 316 00:20:05,600 --> 00:20:10,120 Speaker 1: of photographs. If you take a film strip and you 317 00:20:10,160 --> 00:20:14,200 Speaker 1: look at it under a light, you'll see it's one 318 00:20:14,320 --> 00:20:18,720 Speaker 1: photograph after another. It's just a series of pictures. It's 319 00:20:18,720 --> 00:20:20,880 Speaker 1: only when you play them back at the right speed 320 00:20:21,480 --> 00:20:23,760 Speaker 1: and you project them onto a screen that you get the 321 00:20:23,840 --> 00:20:28,480 Speaker 1: illusion of continuous motion. But it's not really continuous. 
It's 322 00:20:28,520 --> 00:20:31,720 Speaker 1: just this series of photographs played at twenty four frames 323 00:20:31,760 --> 00:20:36,800 Speaker 1: per second in the case of actual film. So that 324 00:20:37,000 --> 00:20:40,119 Speaker 1: ends up being very analogous to the way we encode 325 00:20:40,160 --> 00:20:44,000 Speaker 1: digital audio. You take the analog recording and you take 326 00:20:44,280 --> 00:20:49,800 Speaker 1: snapshots of sound. The more frequently you take those snapshots, 327 00:20:50,200 --> 00:20:52,440 Speaker 1: the higher your sample rate. So in other words, if 328 00:20:52,440 --> 00:20:55,600 Speaker 1: you did one a second, your sample rate would be awful. 329 00:20:56,320 --> 00:20:58,560 Speaker 1: You would have a sample rate of one. But the 330 00:20:58,640 --> 00:21:01,400 Speaker 1: higher the sample rate, the closer your digital representation 331 00:21:01,440 --> 00:21:05,240 Speaker 1: will be to the frequency in the analog sound format. Actually, 332 00:21:05,720 --> 00:21:07,960 Speaker 1: what's really important to remember is that your sample rate 333 00:21:08,000 --> 00:21:10,399 Speaker 1: has to be about twice, actually it does have to be 334 00:21:10,480 --> 00:21:14,879 Speaker 1: at least twice, what the highest frequency sound is in your recording. 335 00:21:16,359 --> 00:21:20,119 Speaker 1: It has to be because if it's not, it cannot 336 00:21:20,280 --> 00:21:25,879 Speaker 1: encode that sound accurately. It's kind of interesting, and you 337 00:21:25,960 --> 00:21:27,960 Speaker 1: might wonder, how do we take these snapshots in the 338 00:21:27,960 --> 00:21:31,080 Speaker 1: first place? Well, if you're capturing audio, let's say we're 339 00:21:31,119 --> 00:21:34,560 Speaker 1: recording to digital. So we've got a microphone set up, 340 00:21:34,920 --> 00:21:39,240 Speaker 1: and we're recording to digital media storage.
Like let's 341 00:21:39,240 --> 00:21:41,480 Speaker 1: just say we're recording straight to someone's hard drive. So 342 00:21:41,520 --> 00:21:44,720 Speaker 1: we're talking into a microphone recording to a hard drive. 343 00:21:45,640 --> 00:21:49,400 Speaker 1: So you're using an analog microphone, let's say. You would 344 00:21:49,400 --> 00:21:53,720 Speaker 1: need an analog to digital converter. Now, this particular component 345 00:21:54,000 --> 00:21:58,719 Speaker 1: can receive discrete voltages from another device like your microphone. 346 00:21:59,000 --> 00:22:05,720 Speaker 1: So your microphone is converting sound into differences in voltage. 347 00:22:05,960 --> 00:22:08,840 Speaker 1: That's essentially how it communicates, so that it can then 348 00:22:09,000 --> 00:22:12,040 Speaker 1: send that to some other element. In this case, it's 349 00:22:12,080 --> 00:22:15,679 Speaker 1: sending it to the analog to digital converter so 350 00:22:15,720 --> 00:22:18,359 Speaker 1: that it can be stored digitally on your hard drive. 351 00:22:19,400 --> 00:22:26,560 Speaker 1: So this analog to digital converter references, or samples, the discrete 352 00:22:26,640 --> 00:22:30,199 Speaker 1: voltage many times every second in order to create a 353 00:22:30,240 --> 00:22:34,720 Speaker 1: digital representation of the analog sound. It converts the voltages 354 00:22:34,800 --> 00:22:39,360 Speaker 1: into numbers in a process called quantization, and we express 355 00:22:39,400 --> 00:22:42,439 Speaker 1: those numbers in bits, so these are zeros and ones. 356 00:22:43,000 --> 00:22:45,720 Speaker 1: When you want to play the digital audio, a digital 357 00:22:45,760 --> 00:22:49,760 Speaker 1: to analog converter does the same process in reverse.
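To make that round trip a little more concrete, here is a minimal Python sketch of sampling a voltage, quantizing it to whole numbers, and turning those numbers back into voltages. Everything in it is an assumption for illustration: the function names `adc`, `dac`, and `record`, the 440 hertz test tone, the one volt full scale, and the 8,000 samples per second rate are not from the episode, and real converter hardware is considerably more involved.

```python
import math

BIT_DEPTH = 16                          # assumed: signed 16-bit samples
MAX_CODE = 2 ** (BIT_DEPTH - 1) - 1     # largest positive code, 32767


def adc(voltage, full_scale=1.0):
    """Quantize one voltage reading to the nearest integer code."""
    clipped = max(-full_scale, min(full_scale, voltage))
    return round(clipped / full_scale * MAX_CODE)


def dac(code, full_scale=1.0):
    """Turn an integer code back into a voltage."""
    return code / MAX_CODE * full_scale


def record(signal, sample_rate, num_samples):
    """Read the signal at evenly spaced instants and quantize each reading."""
    return [adc(signal(n / sample_rate)) for n in range(num_samples)]


def tone(t):
    """Hypothetical microphone output: a 440 hertz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


codes = record(tone, sample_rate=8000, num_samples=80)   # 10 ms of audio
voltages = [dac(code) for code in codes]                  # playback side
```

The stored `codes` are plain integers, which is all the hard drive ever sees. The voltages that come back out differ from the originals by at most half a quantization step.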
So 358 00:22:50,040 --> 00:22:53,720 Speaker 1: it takes this digital information, these zeros and ones, and 359 00:22:53,840 --> 00:22:57,520 Speaker 1: converts it into a series of discrete voltages, which then 360 00:22:57,800 --> 00:23:01,480 Speaker 1: can be amplified and sent to a speaker to create sound. 361 00:23:02,720 --> 00:23:05,280 Speaker 1: So all of that's really important. But now let's 362 00:23:05,320 --> 00:23:07,879 Speaker 1: talk about some concrete examples. And the best way to 363 00:23:07,920 --> 00:23:11,199 Speaker 1: do this is to go with compact discs, because we 364 00:23:11,280 --> 00:23:15,080 Speaker 1: have a standard sample rate for compact discs, and that 365 00:23:15,240 --> 00:23:18,520 Speaker 1: standard sample rate is forty four point one kilohertz 366 00:23:18,600 --> 00:23:22,119 Speaker 1: to create CD quality audio. That means that the audio 367 00:23:22,240 --> 00:23:27,960 Speaker 1: is sampled forty four thousand, one hundred times every second. 368 00:23:28,840 --> 00:23:30,800 Speaker 1: But wait, I hear you say, the range of human 369 00:23:30,840 --> 00:23:33,280 Speaker 1: hearing, you said, only goes from twenty hertz to twenty 370 00:23:33,359 --> 00:23:36,240 Speaker 1: kilohertz. If it only goes up to twenty kilohertz, 371 00:23:36,240 --> 00:23:39,000 Speaker 1: why are you sampling at forty four thousand, one hundred 372 00:23:39,119 --> 00:23:43,520 Speaker 1: times every second? If it's twenty thousand times a second 373 00:23:43,560 --> 00:23:46,680 Speaker 1: for the frequency, why go up to forty four thousand, 374 00:23:46,760 --> 00:23:49,359 Speaker 1: one hundred? Is there some relationship between that and the 375 00:23:49,400 --> 00:23:52,640 Speaker 1: CD sample rate? And the answer is yes.
So there 376 00:23:52,760 --> 00:23:57,959 Speaker 1: is a theorem called the Nyquist-Shannon sampling theorem, and 377 00:23:58,040 --> 00:24:00,719 Speaker 1: that states that the sample rate must be at least twice the 378 00:24:00,760 --> 00:24:03,960 Speaker 1: maximum frequency of a recording in order to describe the 379 00:24:04,000 --> 00:24:08,200 Speaker 1: frequency properly. So the general thought is the maximum frequency 380 00:24:08,240 --> 00:24:10,879 Speaker 1: most humans can hear is twenty kilohertz. And for that reason, 381 00:24:10,920 --> 00:24:13,760 Speaker 1: Philips and Sony, when they were working to create the 382 00:24:13,920 --> 00:24:17,919 Speaker 1: CD format to make it a standard, decided on 383 00:24:17,960 --> 00:24:20,840 Speaker 1: forty four point one kilohertz as that standard sample 384 00:24:20,920 --> 00:24:23,359 Speaker 1: rate for CD audio. It was more than double 385 00:24:23,400 --> 00:24:26,000 Speaker 1: the top frequency generally considered to be the upper 386 00:24:26,080 --> 00:24:29,120 Speaker 1: level of human hearing. But what happens if you were 387 00:24:29,160 --> 00:24:32,360 Speaker 1: to lower the sampling rate? What if you didn't sample 388 00:24:32,440 --> 00:24:37,520 Speaker 1: at forty four point one kilohertz? What if you sampled at, let's say, sixteen kilohertz, 389 00:24:37,560 --> 00:24:41,040 Speaker 1: so sixteen thousand times a second you sample it? Well, 390 00:24:41,359 --> 00:24:43,520 Speaker 1: that means you would only be able to record and 391 00:24:43,560 --> 00:24:47,119 Speaker 1: replicate any sound with a frequency of eight 392 00:24:47,200 --> 00:24:52,240 Speaker 1: kilohertz or less, so eight thousand hertz or less.
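The relationship between sample rate and the highest frequency you can capture comes down to a division by two. Here is a small Python sketch of that check. The function names `nyquist_limit` and `can_capture` are made up for illustration, and treating a tone sitting exactly at the limit as not capturable is an assumption on the strict side.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can describe: half the rate."""
    return sample_rate_hz / 2


def can_capture(frequency_hz, sample_rate_hz):
    """True if the tone sits below the Nyquist limit.

    Assumption: a tone exactly at the limit is a borderline case,
    so this sketch counts it as not reliably capturable.
    """
    return frequency_hz < nyquist_limit(sample_rate_hz)


cd_limit = nyquist_limit(44_100)         # 22050.0 hertz, above the 20 kilohertz ceiling of hearing
low_rate_limit = nyquist_limit(16_000)   # 8000.0 hertz
```

With the CD rate, `can_capture(20_000, 44_100)` is true, which is the headroom Philips and Sony were after. At sixteen kilohertz, anything above eight kilohertz fails the check.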
But 393 00:24:52,400 --> 00:24:55,560 Speaker 1: if you had any sound that was greater than eight 394 00:24:55,600 --> 00:24:59,879 Speaker 1: thousand hertz or eight kilohertz, anything higher than that, 395 00:25:00,000 --> 00:25:04,360 Speaker 1: it would be folded down to fit below the eight 396 00:25:04,440 --> 00:25:08,160 Speaker 1: kilohertz limit. Perceptually, that means the sounds you would 397 00:25:08,200 --> 00:25:11,159 Speaker 1: hear in the playback could include frequencies that were not 398 00:25:11,320 --> 00:25:16,120 Speaker 1: present in the original performance of that sound. So let's 399 00:25:16,119 --> 00:25:20,560 Speaker 1: say that I'm using a sample rate of sixteen, 400 00:25:20,600 --> 00:25:24,359 Speaker 1: you know, kilohertz, and someone is playing a musical 401 00:25:24,400 --> 00:25:27,160 Speaker 1: instrument and they play a note that's at a nine 402 00:25:27,200 --> 00:25:32,720 Speaker 1: kilohertz frequency. Well, because I'm sampling at sixteen kilohertz, 403 00:25:33,320 --> 00:25:37,639 Speaker 1: my limit for frequencies is eight kilohertz. If you 404 00:25:37,680 --> 00:25:40,560 Speaker 1: play something at nine kilohertz, what happens is 405 00:25:40,880 --> 00:25:45,240 Speaker 1: the recording seems to fold the sound back, and it 406 00:25:45,359 --> 00:25:49,840 Speaker 1: folds it back by the same amount that the sound 407 00:25:49,880 --> 00:25:54,960 Speaker 1: goes over the sample rate, or rather the Nyquist limit, 408 00:25:55,000 --> 00:25:57,560 Speaker 1: I should say, not the sample rate itself but the Nyquist limit. 409 00:25:58,720 --> 00:26:03,720 Speaker 1: So a nine kilohertz sound is played. My limit is eight 410 00:26:03,800 --> 00:26:06,960 Speaker 1: kilohertz.
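That folding can be checked numerically. The Python sketch below computes where a too-high tone lands and then confirms that, once sampled at sixteen kilohertz, the nine kilohertz tone produces the very same sample values as the tone it folds down to. The helper names `alias_frequency` and `sample_cosine` are illustrative, and using a cosine test tone is an assumption chosen so the two sets of samples match exactly.

```python
import math


def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency that a sampled tone appears to have after folding."""
    folded = frequency_hz % sample_rate_hz
    # Anything above the Nyquist limit mirrors back below it.
    return min(folded, sample_rate_hz - folded)


def sample_cosine(frequency_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine tone at evenly spaced instants."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


heard_hz = alias_frequency(9_000, 16_000)
original = sample_cosine(9_000, 16_000, 32)
folded = sample_cosine(heard_hz, 16_000, 32)

# The two tones are indistinguishable once sampled.
same_samples = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, folded)
)
```

Because the samples match, nothing downstream can tell which tone was actually played. That is why recording chains filter out content above the Nyquist limit before sampling.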
Well, nine kilohertz is one kilohertz 411 00:26:06,960 --> 00:26:10,000 Speaker 1: more than eight, so it folds it back and the 412 00:26:10,040 --> 00:26:13,320 Speaker 1: sound you would hear on the recording would be seven 413 00:26:13,400 --> 00:26:17,000 Speaker 1: kilohertz. So the original sound is nine kilohertz, 414 00:26:17,080 --> 00:26:21,480 Speaker 1: the playback sound is seven kilohertz, and you would 415 00:26:21,520 --> 00:26:25,639 Speaker 1: hear something recorded that wasn't actually played. That's why you 416 00:26:25,680 --> 00:26:28,800 Speaker 1: have to have a really high sample rate, so that 417 00:26:28,840 --> 00:26:32,679 Speaker 1: you don't have these instances where sound gets folded back 418 00:26:33,480 --> 00:26:38,359 Speaker 1: into the frequency range, because otherwise what you are hearing 419 00:26:38,520 --> 00:26:42,480 Speaker 1: is not an accurate representation of what was actually generated, 420 00:26:42,760 --> 00:26:46,919 Speaker 1: what you were trying to record. This whole phenomenon, by 421 00:26:46,920 --> 00:26:51,800 Speaker 1: the way, is called fold over or sometimes aliasing. So 422 00:26:51,840 --> 00:26:54,800 Speaker 1: that's sample rate. But then we've got bit depth. Now, 423 00:26:54,840 --> 00:26:59,080 Speaker 1: this is all about measuring the volume or amplitude of 424 00:26:59,119 --> 00:27:02,359 Speaker 1: a sound. So you have a range. You just make 425 00:27:02,400 --> 00:27:06,240 Speaker 1: an arbitrary range to say, like, we're gonna go quietest 426 00:27:06,280 --> 00:27:09,199 Speaker 1: to loudest, and you just define what that range is. 427 00:27:09,400 --> 00:27:12,120 Speaker 1: It could literally be any range. Let's say you say 428 00:27:12,200 --> 00:27:15,960 Speaker 1: zero to one hundred. Zero is dead silence, no sound 429 00:27:16,000 --> 00:27:19,560 Speaker 1: at all. One hundred is as loud as the sound 430 00:27:19,720 --> 00:27:24,160 Speaker 1: ever gets.
It's the peak volume of sound. That means 431 00:27:24,200 --> 00:27:28,560 Speaker 1: you can describe all the different volumes within that recording 432 00:27:29,119 --> 00:27:33,000 Speaker 1: as a number between zero and one hundred. But let's 433 00:27:33,000 --> 00:27:36,320 Speaker 1: say you take that same recording and instead of making 434 00:27:36,320 --> 00:27:39,679 Speaker 1: the range zero to one hundred, you say it's zero 435 00:27:39,760 --> 00:27:43,919 Speaker 1: to two thousand. You haven't made the volume louder. The 436 00:27:44,000 --> 00:27:47,080 Speaker 1: volume is still the exact same as it was when 437 00:27:47,119 --> 00:27:49,879 Speaker 1: you called the range zero to one hundred. But what 438 00:27:50,000 --> 00:27:53,720 Speaker 1: you have done is added more units. You have created 439 00:27:53,880 --> 00:27:58,880 Speaker 1: more precise steps between absolute silence and as loud as 440 00:27:58,920 --> 00:28:02,720 Speaker 1: it gets. So you've just increased the size of the 441 00:28:02,800 --> 00:28:04,760 Speaker 1: range so that you can be more precise in the 442 00:28:04,800 --> 00:28:09,280 Speaker 1: differences in volume. And this is really important. So let's 443 00:28:09,320 --> 00:28:11,800 Speaker 1: say that you've got a sound that you rank at 444 00:28:11,880 --> 00:28:15,440 Speaker 1: seventy eight and another sound that you rank at seventy nine, 445 00:28:16,080 --> 00:28:18,920 Speaker 1: and that's gonna be the same for both of these ranges. Uh, 446 00:28:19,040 --> 00:28:21,880 Speaker 1: just two different examples.
Actually, so you've got your zero 447 00:28:21,880 --> 00:28:25,840 Speaker 1: to one hundred range, and a seventy eight would be seventy 448 00:28:25,840 --> 00:28:29,760 Speaker 1: eight percent of the loudest sound in the entire recording, 449 00:28:30,280 --> 00:28:33,159 Speaker 1: and a seventy nine would be seventy nine percent of 450 00:28:33,200 --> 00:28:36,960 Speaker 1: the loudest sound in the entire recording. That's actually a 451 00:28:36,960 --> 00:28:39,760 Speaker 1: pretty hefty jump. But let's say we instead went with 452 00:28:39,800 --> 00:28:42,920 Speaker 1: that zero to two thousand range and you still had 453 00:28:42,920 --> 00:28:47,160 Speaker 1: seventy eight and seventy nine. Well, seventy eight would represent 454 00:28:47,280 --> 00:28:50,840 Speaker 1: three point nine percent of the full volume and seventy 455 00:28:50,920 --> 00:28:54,480 Speaker 1: nine would represent three point nine five percent of the 456 00:28:54,520 --> 00:28:57,640 Speaker 1: full volume. In other words, you'd be able to mark 457 00:28:57,960 --> 00:29:02,280 Speaker 1: much more subtle differences in volume, and that means you 458 00:29:02,280 --> 00:29:06,680 Speaker 1: can have more nuance in your recording. And since we're 459 00:29:06,680 --> 00:29:09,800 Speaker 1: talking about a natural sound to start off with, so 460 00:29:09,840 --> 00:29:12,360 Speaker 1: you're taking a natural sound and you're trying to digitize it, 461 00:29:13,160 --> 00:29:17,800 Speaker 1: smooth changes in amplitude are possible in natural sound. Using 462 00:29:17,800 --> 00:29:21,000 Speaker 1: a broader range to describe the volume is best if 463 00:29:21,000 --> 00:29:25,320 Speaker 1: you want to get an accurate representation or resolution of 464 00:29:25,360 --> 00:29:28,880 Speaker 1: that sound. Going back to that zero to one hundred range, 465 00:29:29,200 --> 00:29:32,240 Speaker 1: changes in volume would be more chunky.
Two sounds that 466 00:29:32,280 --> 00:29:36,440 Speaker 1: have slight differences in amplitude would end up being defined 467 00:29:36,520 --> 00:29:40,680 Speaker 1: as being identical because you wouldn't have the precision. You know, 468 00:29:40,720 --> 00:29:42,760 Speaker 1: you couldn't say this one's seventy eight and a half. 469 00:29:43,240 --> 00:29:45,520 Speaker 1: It would either be seventy eight or seventy nine. So 470 00:29:45,600 --> 00:29:48,960 Speaker 1: you could have two sounds where, with greater precision, 471 00:29:49,120 --> 00:29:52,680 Speaker 1: you could tell the difference between their volumes. But if 472 00:29:52,720 --> 00:29:57,240 Speaker 1: you have that lower, that more shallow bit depth, you 473 00:29:57,240 --> 00:29:58,800 Speaker 1: wouldn't be able to tell the difference. You 474 00:29:58,840 --> 00:30:01,840 Speaker 1: would lose that nuance, that subtlety. This is part 475 00:30:01,880 --> 00:30:06,000 Speaker 1: of the reason why people say, like, a lot of 476 00:30:06,040 --> 00:30:10,480 Speaker 1: the modern music has lower ranges and changes in volume, 477 00:30:10,600 --> 00:30:14,480 Speaker 1: like the loudest loud parts and the softest soft parts. 478 00:30:14,520 --> 00:30:18,480 Speaker 1: That range has decreased over time, which a lot of 479 00:30:18,480 --> 00:30:21,320 Speaker 1: people have argued has meant that music has gotten less 480 00:30:21,880 --> 00:30:27,120 Speaker 1: complex and therefore, in some minds, less interesting. That's 481 00:30:27,160 --> 00:30:31,160 Speaker 1: a related kind of philosophy to what I'm talking 482 00:30:31,200 --> 00:30:36,640 Speaker 1: about here. So you want to have those smaller steps 483 00:30:36,720 --> 00:30:40,760 Speaker 1: between each unit so you can create greater resolution, more 484 00:30:40,880 --> 00:30:46,360 Speaker 1: smoothness to the recorded audio.
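The zero to one hundred versus zero to two thousand comparison can be played out in a few lines of Python. This is only a sketch: `quantize` is a made-up helper, and the two amplitudes, given as fractions of the loudest sound, were picked for illustration because they differ by less than one coarse step.

```python
def quantize(amplitude, top_step):
    """Snap an amplitude (0.0 = silence, 1.0 = loudest) to the nearest whole step."""
    return round(amplitude * top_step)


quiet = 0.781            # hypothetical amplitude, 78.1 percent of peak
slightly_louder = 0.784  # hypothetical amplitude, 78.4 percent of peak

# On the zero to one hundred scale both sounds are forced onto step 78.
coarse = (quantize(quiet, 100), quantize(slightly_louder, 100))

# On the zero to two thousand scale they land on different steps, 1562 and 1568.
fine = (quantize(quiet, 2000), quantize(slightly_louder, 2000))
```

The sounds themselves never changed. Only the ruler did, and the finer ruler is the one that keeps the difference between them.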
And it's actually the bit 485 00:30:46,480 --> 00:30:49,480 Speaker 1: rate in CD audio that will help make the sound 486 00:30:49,680 --> 00:30:53,480 Speaker 1: seem smooth. So if you ever listened to eight bit music, 487 00:30:53,880 --> 00:30:56,480 Speaker 1: you know, like the kind from old video game consoles, 488 00:30:56,520 --> 00:30:59,520 Speaker 1: that sound is really harsh and sort of chunky and 489 00:30:59,640 --> 00:31:03,040 Speaker 1: has an appeal, but it's not, you know, it's not 490 00:31:03,440 --> 00:31:07,160 Speaker 1: smooth at all. It can create an amazing effect, but 491 00:31:07,200 --> 00:31:10,960 Speaker 1: if you want to represent true analog sound, it's not awesome. 492 00:31:11,960 --> 00:31:15,920 Speaker 1: If you went up to sixteen bit, that's CD quality 493 00:31:16,000 --> 00:31:21,080 Speaker 1: bit depth, it's much better. Professional recording studios will 494 00:31:21,120 --> 00:31:25,240 Speaker 1: do twenty four bit or thirty two bit because they're gonna 495 00:31:25,280 --> 00:31:28,800 Speaker 1: do a lot of post processing work on those audio files. 496 00:31:29,080 --> 00:31:31,000 Speaker 1: And when you do that post processing work, if you 497 00:31:31,040 --> 00:31:34,840 Speaker 1: do it at sixteen bit, the stuff you're doing, the 498 00:31:34,920 --> 00:31:37,840 Speaker 1: changes you make can become noticeable, and most times you 499 00:31:37,880 --> 00:31:40,440 Speaker 1: don't want that. You don't want it to be, you know, 500 00:31:40,680 --> 00:31:42,600 Speaker 1: you don't want it to stand out from the rest 501 00:31:42,600 --> 00:31:45,320 Speaker 1: of the audio file. But that's the only reason they 502 00:31:45,320 --> 00:31:47,360 Speaker 1: go up to twenty four bit or thirty two bit.
503 00:31:47,720 --> 00:31:51,320 Speaker 1: There'd be no point in playing it back at that rate, 504 00:31:51,440 --> 00:31:57,160 Speaker 1: that bit depth, because human hearing is not so adept 505 00:31:57,320 --> 00:32:00,280 Speaker 1: as to tell the difference, at least not for most human ears. 506 00:32:01,120 --> 00:32:04,240 Speaker 1: So if you played back a recording at sixteen bit 507 00:32:04,560 --> 00:32:07,080 Speaker 1: and another one at twenty four bit, and it's the same piece, 508 00:32:07,760 --> 00:32:10,080 Speaker 1: most people would not be able to tell the difference 509 00:32:10,120 --> 00:32:14,360 Speaker 1: because you've already reached a resolution that equals the precision 510 00:32:14,440 --> 00:32:18,120 Speaker 1: of human hearing. Keeping in mind, again, human hearing is subjective, 511 00:32:18,360 --> 00:32:21,680 Speaker 1: not everyone is equal. There are some people who have incredible 512 00:32:21,720 --> 00:32:24,680 Speaker 1: hearing who may be able to pick out that difference. 513 00:32:25,520 --> 00:32:27,880 Speaker 1: I am not one of those people, but I am 514 00:32:27,920 --> 00:32:30,200 Speaker 1: a person who's going to tell you we'll get to 515 00:32:30,240 --> 00:32:34,200 Speaker 1: the last section in just a bit, but first let's 516 00:32:34,200 --> 00:32:45,440 Speaker 1: take another quick break to thank our sponsor. All right, 517 00:32:45,520 --> 00:32:48,320 Speaker 1: so bit depth, what we just talked about, that can 518 00:32:48,360 --> 00:32:51,800 Speaker 1: be thought of as how well the sound is described, 519 00:32:52,640 --> 00:32:55,680 Speaker 1: and the sampling rate is how frequently or how much 520 00:32:56,040 --> 00:33:00,560 Speaker 1: the sound is described. And CD audio quality has sixteen 521 00:33:00,560 --> 00:33:05,000 Speaker 1: bit audio.
That means that they actually have sixty five thousand, 522 00:33:05,160 --> 00:33:09,480 Speaker 1: five hundred thirty six different levels of volume that they 523 00:33:09,480 --> 00:33:13,800 Speaker 1: can describe within an audio track. So my example of 524 00:33:13,880 --> 00:33:18,280 Speaker 1: zero to two thousand, that is primitive compared to 525 00:33:18,440 --> 00:33:22,600 Speaker 1: CD audio because it has the sixteen bit style, sixty five thousand, five 526 00:33:22,640 --> 00:33:26,160 Speaker 1: hundred thirty six different levels. And how is that possible? Well, 527 00:33:28,000 --> 00:33:31,840 Speaker 1: when we say sixteen bit, remember a bit represents two 528 00:33:31,880 --> 00:33:34,240 Speaker 1: states, zero or one. So you take the number two 529 00:33:34,960 --> 00:33:39,480 Speaker 1: and then you raise it to the power of sixteen, 530 00:33:39,680 --> 00:33:43,800 Speaker 1: so you multiply two by itself sixteen times and you 531 00:33:43,840 --> 00:33:47,280 Speaker 1: get sixty five thousand, five hundred thirty six. So that's 532 00:33:47,280 --> 00:33:51,160 Speaker 1: where that number comes from. Now, with your digital sample, 533 00:33:52,080 --> 00:33:54,840 Speaker 1: you have a collection of points that roughly replicate the 534 00:33:54,920 --> 00:33:57,760 Speaker 1: shape of an analog sound wave. It's gonna look a 535 00:33:57,760 --> 00:34:01,080 Speaker 1: little funky, but you'll be able to see what the 536 00:34:01,240 --> 00:34:05,760 Speaker 1: frequency and amplitude generally were in the original recording if 537 00:34:05,760 --> 00:34:08,800 Speaker 1: you were to plot this on an X Y axis.
538 00:34:09,600 --> 00:34:12,439 Speaker 1: But if you were just to connect each successive point 539 00:34:12,480 --> 00:34:15,759 Speaker 1: with a straight line, even as close together as they 540 00:34:15,800 --> 00:34:18,040 Speaker 1: would be, because you're looking at forty four thousand one hundred 541 00:34:18,400 --> 00:34:22,080 Speaker 1: times a second, it would sound pretty awful. So we 542 00:34:22,120 --> 00:34:26,440 Speaker 1: actually use an algorithm called interpolation to join the points 543 00:34:26,719 --> 00:34:29,960 Speaker 1: smoothly to imitate a sound wave form, and that gives 544 00:34:30,000 --> 00:34:33,640 Speaker 1: a musical playback program the ability to replicate an analog 545 00:34:33,680 --> 00:34:38,040 Speaker 1: wave form. And that's actually called pulse code modulation or 546 00:34:38,160 --> 00:34:45,200 Speaker 1: PCM. And if you store audio intact this way, 547 00:34:45,320 --> 00:34:48,560 Speaker 1: you would have what we call a lossless audio file, 548 00:34:48,960 --> 00:34:51,640 Speaker 1: which means exactly what it sounds like. None of that 549 00:34:51,760 --> 00:34:54,919 Speaker 1: data would ever get filtered out of the file. Even 550 00:34:54,960 --> 00:34:57,800 Speaker 1: if the sounds were beyond the range of human hearing, 551 00:34:57,800 --> 00:35:01,000 Speaker 1: they would be recorded, and you would have a lossless 552 00:35:01,200 --> 00:35:05,080 Speaker 1: file format. Those files tend to be quite big, depending 553 00:35:05,160 --> 00:35:08,520 Speaker 1: upon how long a recording you make, of course. All right. 554 00:35:09,000 --> 00:35:11,400 Speaker 1: So now here's where it gets a little confusing. And 555 00:35:11,440 --> 00:35:13,080 Speaker 1: I think I even said bit rate a couple of 556 00:35:13,080 --> 00:35:16,000 Speaker 1: times when I really meant bit depth earlier. But up 557 00:35:16,040 --> 00:35:19,319 Speaker 1: to this point, I really was talking bit depth.
So 558 00:35:19,680 --> 00:35:22,120 Speaker 1: my apologies to all of you out there if a 559 00:35:22,160 --> 00:35:24,719 Speaker 1: bit rate slipped through, because I did not mean it. 560 00:35:24,800 --> 00:35:27,520 Speaker 1: Now I'm going to talk about bit rate and show 561 00:35:27,560 --> 00:35:32,080 Speaker 1: you how it's different than bit depth. Bit rate refers 562 00:35:32,120 --> 00:35:35,799 Speaker 1: to the amount of data audio uses per second, or 563 00:35:35,840 --> 00:35:39,960 Speaker 1: requires per second of recording, and you derive bit rate 564 00:35:40,360 --> 00:35:43,960 Speaker 1: from the bit depth and the sampling rate. It's represented 565 00:35:44,040 --> 00:35:47,520 Speaker 1: as bits per second. So again let's go to CD 566 00:35:47,560 --> 00:35:52,200 Speaker 1: quality sound. That makes it easy. You have forty four thousand one hundred samples 567 00:35:52,360 --> 00:35:57,160 Speaker 1: per second, you've got sixteen bits, or two bytes, because 568 00:35:57,239 --> 00:36:00,800 Speaker 1: remember a byte is eight bits, so you need two bytes 569 00:36:00,840 --> 00:36:07,320 Speaker 1: to describe each sample. So two bytes times forty four thousand one hundred samples per second. 570 00:36:08,200 --> 00:36:11,040 Speaker 1: Plus you probably are gonna have to multiply that 571 00:36:11,160 --> 00:36:14,719 Speaker 1: by two because you're probably recording in stereo, so you 572 00:36:14,760 --> 00:36:18,680 Speaker 1: have to do that once for each track. So you get 573 00:36:18,719 --> 00:36:21,080 Speaker 1: that number, then you have to multiply that by sixty 574 00:36:21,080 --> 00:36:23,839 Speaker 1: seconds to determine how much data per minute you are 575 00:36:23,960 --> 00:36:27,520 Speaker 1: creating when you're recording, and with CD quality audio that 576 00:36:27,600 --> 00:36:30,400 Speaker 1: ends up being about ten megabytes of data per minute.
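Here is that arithmetic, along with the sixteen bit level count from earlier, written out as a short Python sketch. The constant names are just labels chosen for illustration, and the final figure uses decimal megabytes (one million bytes), which is an assumption about the unit the episode has in mind.

```python
SAMPLE_RATE_HZ = 44_100   # CD samples per second, per channel
BIT_DEPTH = 16            # bits per sample
CHANNELS = 2              # stereo: left and right tracks

# Two states per bit, sixteen bits per sample.
levels = 2 ** BIT_DEPTH                                  # 65,536 volume levels

# Bit rate is derived from bit depth, sample rate, and channel count.
bits_per_second = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS  # 1,411,200

bytes_per_second = bits_per_second // 8                  # 176,400 (8 bits per byte)
bytes_per_minute = bytes_per_second * 60                 # 10,584,000
megabytes_per_minute = bytes_per_minute / 1_000_000      # 10.584
```

So the "about ten megabytes per minute" figure works out to roughly 10.6 megabytes, and the uncompressed CD bit rate is about 1,411 kilobits per second.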
577 00:36:31,239 --> 00:36:34,000 Speaker 1: Now these days that's not really that big a deal 578 00:36:34,680 --> 00:36:38,719 Speaker 1: because we're dealing with super fast Internet speeds and enormous 579 00:36:38,800 --> 00:36:42,600 Speaker 1: hard drives. But just a few years ago, that was 580 00:36:42,640 --> 00:36:46,000 Speaker 1: considered to be a really sizeable file, I mean an 581 00:36:46,120 --> 00:36:48,920 Speaker 1: enormous file. And so if you wanted to find a 582 00:36:48,920 --> 00:36:51,879 Speaker 1: way to distribute digital audio so it didn't take up 583 00:36:51,920 --> 00:36:55,840 Speaker 1: too much space, you had to figure out how you 584 00:36:55,880 --> 00:37:00,000 Speaker 1: could compress those files and make them smaller, make them 585 00:37:00,000 --> 00:37:04,239 Speaker 1: more manageable. And now we can finally get back to 586 00:37:04,320 --> 00:37:08,240 Speaker 1: Germany and Herr Brandenburg. You thought we left him behind? 587 00:37:08,960 --> 00:37:12,680 Speaker 1: We didn't. He was just part of a flashback. So 588 00:37:12,840 --> 00:37:15,320 Speaker 1: let's go to the MP three. First of all, it 589 00:37:15,360 --> 00:37:19,040 Speaker 1: gets its name from the Moving Picture Experts Group, also 590 00:37:19,160 --> 00:37:23,600 Speaker 1: known as MPEG. It was part of a project that 591 00:37:23,719 --> 00:37:26,840 Speaker 1: MPEG was doing that was looking at ways of compressing 592 00:37:26,880 --> 00:37:30,239 Speaker 1: audio along with the work that they were doing with 593 00:37:30,360 --> 00:37:35,279 Speaker 1: video files. It's actually named after the process that they developed, 594 00:37:35,680 --> 00:37:39,120 Speaker 1: called MPEG Audio Layer three. So yes, there was a 595 00:37:39,200 --> 00:37:42,279 Speaker 1: layer one and a layer two.
Layer three was a 596 00:37:42,320 --> 00:37:44,719 Speaker 1: refinement of the approach and was the one that was 597 00:37:44,760 --> 00:37:49,520 Speaker 1: actually successful in the market. Now, Brandenburg was working with 598 00:37:49,600 --> 00:37:53,680 Speaker 1: an instructor. He was pursuing, Brandenburg was pursuing, a PhD 599 00:37:53,760 --> 00:37:55,880 Speaker 1: at the time and trying to come up with a 600 00:37:55,880 --> 00:37:59,799 Speaker 1: practical means of transmitting digital audio across phone lines, and 601 00:37:59,800 --> 00:38:03,000 Speaker 1: in the process he began to experiment with algorithms that 602 00:38:03,040 --> 00:38:08,360 Speaker 1: could take digital audio information and determine which bits are significant. 603 00:38:08,840 --> 00:38:13,040 Speaker 1: Anything that was deemed insignificant could be discarded. So the 604 00:38:13,120 --> 00:38:16,560 Speaker 1: thinking was that information we cannot perceive as human beings 605 00:38:16,680 --> 00:38:19,799 Speaker 1: is worthless. There's no point in preserving it in an 606 00:38:19,800 --> 00:38:22,880 Speaker 1: audio file format. It's just taking up space that we 607 00:38:22,960 --> 00:38:26,000 Speaker 1: can't even perceive when we play it back, so there's 608 00:38:26,040 --> 00:38:28,960 Speaker 1: no reason to replicate it. There's no reason to record it. 609 00:38:29,080 --> 00:38:32,000 Speaker 1: Leave it out, and that way you could compress digital 610 00:38:32,040 --> 00:38:35,680 Speaker 1: audio files. Or to put it another way, if the 611 00:38:35,719 --> 00:38:38,239 Speaker 1: algorithm determined that a sound was outside the range of 612 00:38:38,280 --> 00:38:41,880 Speaker 1: human hearing, it would drop it from the encoding process, 613 00:38:41,920 --> 00:38:44,640 Speaker 1: so you get a sound file much smaller than the 614 00:38:44,680 --> 00:38:49,239 Speaker 1: more accurate representative version.
So the lossless version would be 615 00:38:49,440 --> 00:38:53,360 Speaker 1: more accurate to the original sound. But this new version, 616 00:38:53,480 --> 00:38:56,640 Speaker 1: what we would call a lossy version, a compressed file, 617 00:38:57,120 --> 00:39:00,359 Speaker 1: would be able to replicate it pretty well if it's 618 00:39:00,400 --> 00:39:03,640 Speaker 1: designed properly, and maybe to a point, if you 619 00:39:03,680 --> 00:39:06,440 Speaker 1: design it well enough, that you couldn't tell the difference 620 00:39:06,480 --> 00:39:10,040 Speaker 1: between the two. That took some time. That was 621 00:39:10,080 --> 00:39:15,320 Speaker 1: not easy to do. So the new file, the new version, 622 00:39:15,400 --> 00:39:18,560 Speaker 1: the compressed one, the lossy format, would only have the 623 00:39:18,600 --> 00:39:22,280 Speaker 1: actual relevant data, and from that point forward, the challenge 624 00:39:22,360 --> 00:39:26,719 Speaker 1: was to determine what the benchmarks are to figure out 625 00:39:26,840 --> 00:39:30,480 Speaker 1: what is relevant versus what is irrelevant, because if you 626 00:39:30,520 --> 00:39:33,520 Speaker 1: lose too much information, you change the quality of the recording, 627 00:39:33,960 --> 00:39:37,280 Speaker 1: meaning it's no longer an accurate representation of the original sound. 628 00:39:37,880 --> 00:39:41,360 Speaker 1: So you might say that any sound below twenty hertz 629 00:39:41,640 --> 00:39:44,880 Speaker 1: isn't relevant because it's below the range of your typical 630 00:39:45,000 --> 00:39:49,080 Speaker 1: human's ability to hear. You might say that anything 631 00:39:49,080 --> 00:39:54,200 Speaker 1: above twenty thousand hertz or twenty kilohertz is irrelevant 632 00:39:54,280 --> 00:39:59,080 Speaker 1: because humans typically can't hear sounds above that frequency.
You 633 00:39:59,160 --> 00:40:02,640 Speaker 1: might say that sounds at a certain amplitude or lower 634 00:40:03,200 --> 00:40:08,040 Speaker 1: are irrelevant because they're so quiet that humans wouldn't hear them. 635 00:40:08,239 --> 00:40:11,320 Speaker 1: Or you might say that if a certain sound is 636 00:40:11,360 --> 00:40:14,120 Speaker 1: at a lower amplitude and a different sound is at 637 00:40:14,120 --> 00:40:18,279 Speaker 1: a higher amplitude, the higher amplitude sound is drowning out 638 00:40:18,320 --> 00:40:21,680 Speaker 1: the lower amplitude sound, and so we humans don't really 639 00:40:21,680 --> 00:40:24,799 Speaker 1: perceive the lower amplitude sound. This is where we get 640 00:40:24,800 --> 00:40:28,000 Speaker 1: into psychoacoustics. It's not just what we hear, but how 641 00:40:28,040 --> 00:40:32,200 Speaker 1: we perceive the sound itself. And a lot of that 642 00:40:32,280 --> 00:40:35,520 Speaker 1: went into formulating the algorithms to figure out how to 643 00:40:35,560 --> 00:40:38,480 Speaker 1: compress this music in a way where you get a 644 00:40:38,560 --> 00:40:44,359 Speaker 1: recording that represents the original without, you know, compromising 645 00:40:44,360 --> 00:40:46,920 Speaker 1: too much, while still getting the file down to a 646 00:40:47,040 --> 00:40:50,640 Speaker 1: manageable size. And these are the decisions you have to 647 00:40:50,680 --> 00:40:53,200 Speaker 1: make to figure out which bits of information you keep 648 00:40:53,200 --> 00:40:57,040 Speaker 1: and which ones you ditch. Well, Brandenburg and a team 649 00:40:57,040 --> 00:40:59,080 Speaker 1: were working on refining this approach in the late 650 00:40:59,120 --> 00:41:02,839 Speaker 1: eighties and early nineties.
And he said, at one point 651 00:41:02,880 --> 00:41:05,120 Speaker 1: he thought he had nailed it, and then he heard 652 00:41:05,120 --> 00:41:10,280 Speaker 1: an a cappella song. It was Tom's Diner by Suzanne Vega, 653 00:41:10,800 --> 00:41:14,000 Speaker 1: and then he listened to the compressed MP three version 654 00:41:14,160 --> 00:41:17,520 Speaker 1: of that song, using the version of MP three 655 00:41:17,560 --> 00:41:20,440 Speaker 1: that had been developed up to that point, and he said, 656 00:41:21,120 --> 00:41:25,360 Speaker 1: it ruined the song. It trashed it. It sounded terrible. 657 00:41:25,680 --> 00:41:29,279 Speaker 1: He said that other representations of music seemed fine with 658 00:41:29,360 --> 00:41:32,360 Speaker 1: this particular approach, but when they went with this stripped 659 00:41:32,360 --> 00:41:36,520 Speaker 1: down a cappella song with this particular kind of you're in 660 00:41:36,560 --> 00:41:39,600 Speaker 1: the middle of a space, listening to Suzanne Vega sing, 661 00:41:40,280 --> 00:41:43,440 Speaker 1: it ruined her voice, and so the team began to 662 00:41:43,440 --> 00:41:47,080 Speaker 1: tweak the compression algorithms to correct for this problem, and 663 00:41:47,120 --> 00:41:49,760 Speaker 1: it took a lot of work to figure out, Okay, well, 664 00:41:49,800 --> 00:41:53,279 Speaker 1: what are the elements of sound that we messed with 665 00:41:53,960 --> 00:41:56,920 Speaker 1: that have created this issue, and ultimately they were finally 666 00:41:56,920 --> 00:41:59,440 Speaker 1: able to create an MP three file that didn't distort 667 00:41:59,560 --> 00:42:02,440 Speaker 1: or ruin the recording.
Brandenburg said he listened to that 668 00:42:02,520 --> 00:42:05,880 Speaker 1: song somewhere between five hundred and a thousand times, and 669 00:42:05,920 --> 00:42:09,440 Speaker 1: then he saw Suzanne Vega perform live and he was 670 00:42:09,480 --> 00:42:14,520 Speaker 1: able to recognize all of those subtle changes in her 671 00:42:14,600 --> 00:42:18,160 Speaker 1: voice because he had paid such close attention to it 672 00:42:18,280 --> 00:42:22,200 Speaker 1: during the process of tweaking this algorithm. He said, Ultimately, 673 00:42:22,680 --> 00:42:25,200 Speaker 1: the real telling thing is he still enjoyed the song, 674 00:42:26,719 --> 00:42:29,720 Speaker 1: which says a lot about him. Me, I can't stand 675 00:42:29,760 --> 00:42:33,319 Speaker 1: that song, but maybe it's just because to me, there's 676 00:42:33,320 --> 00:42:34,880 Speaker 1: a point where it just sounds like someone is just 677 00:42:34,920 --> 00:42:37,879 Speaker 1: singing about what they're doing, and I do that every day. 678 00:42:38,400 --> 00:42:41,280 Speaker 1: No one gave me a record deal. All right. So getting 679 00:42:41,280 --> 00:42:46,480 Speaker 1: back to MP three, they had finalized the file format 680 00:42:46,520 --> 00:42:49,920 Speaker 1: and created the standard, but it was just one of 681 00:42:50,080 --> 00:42:54,279 Speaker 1: several possibilities for encoding audio, and it didn't immediately take off. 682 00:42:54,320 --> 00:43:01,080 Speaker 1: It wasn't immediately adopted by consumers. The team had identified 683 00:43:01,080 --> 00:43:04,480 Speaker 1: the Internet as a possible distribution method for MP 684 00:43:04,560 --> 00:43:07,839 Speaker 1: three files, rather than just over telephone lines. They said, well, 685 00:43:08,000 --> 00:43:11,080 Speaker 1: technically we could send MP threes across the Internet, 686 00:43:11,480 --> 00:43:16,280 Speaker 1: so you could send manageable sized files across this network.
687 00:43:17,560 --> 00:43:22,280 Speaker 1: On July fourteenth, they created the file extension dot MP three. 688 00:43:23,680 --> 00:43:26,920 Speaker 1: Now it would take a little bit longer for software 689 00:43:26,960 --> 00:43:29,440 Speaker 1: to take advantage of this. One of the early programs 690 00:43:29,480 --> 00:43:33,560 Speaker 1: was Winamp, which made MP three decoding accessible, and from 691 00:43:33,560 --> 00:43:36,920 Speaker 1: that point the file format began to take off. To 692 00:43:37,080 --> 00:43:40,440 Speaker 1: follow would be dedicated MP three players and sites that 693 00:43:40,480 --> 00:43:44,160 Speaker 1: allowed people to upload and download compressed audio files, which 694 00:43:44,280 --> 00:43:50,200 Speaker 1: also indicated a rise in piracy. And then in response 695 00:43:50,280 --> 00:43:52,640 Speaker 1: to the rise in piracy, we saw an increase in 696 00:43:52,800 --> 00:43:56,960 Speaker 1: DRM strategies, digital rights management, or copy protection 697 00:43:57,000 --> 00:44:00,319 Speaker 1: if you prefer, and that all really ended up 698 00:44:00,360 --> 00:44:04,640 Speaker 1: shaping a lot of the policies and strategies that affect 699 00:44:04,640 --> 00:44:07,479 Speaker 1: the Internet today. So you could say that the MP 700 00:44:07,640 --> 00:44:11,520 Speaker 1: three is one of the reasons why the Internet is 701 00:44:11,560 --> 00:44:14,200 Speaker 1: the way it is right now, and why arguments both 702 00:44:14,239 --> 00:44:19,399 Speaker 1: for and against net neutrality have been formulated in certain ways. 703 00:44:19,440 --> 00:44:21,439 Speaker 1: A lot of it is shaped by the MP three. 704 00:44:22,480 --> 00:44:26,240 Speaker 1: So that kind of wraps up this discussion about digital 705 00:44:26,280 --> 00:44:29,560 Speaker 1: audio in general and a little bit on MP three files.
706 00:44:29,560 --> 00:44:32,560 Speaker 1: In the next episode of this series, I will dive 707 00:44:32,640 --> 00:44:36,040 Speaker 1: into a more technical explanation of what is actually going 708 00:44:36,120 --> 00:44:39,440 Speaker 1: on with the MP three compression algorithms. And I bet 709 00:44:39,520 --> 00:44:44,440 Speaker 1: you can't wait to learn all about fast Fourier transforms. 710 00:44:44,600 --> 00:44:47,480 Speaker 1: I know I can't. And like I said, I'll have 711 00:44:47,600 --> 00:44:50,400 Speaker 1: other episodes to sprinkle in between this one and the 712 00:44:50,440 --> 00:44:53,239 Speaker 1: next one and then the third one, so that way 713 00:44:53,280 --> 00:44:56,239 Speaker 1: you won't just get digital audio overload. And if you 714 00:44:56,280 --> 00:44:59,839 Speaker 1: guys have any comments or questions or suggestions for show 715 00:44:59,880 --> 00:45:03,040 Speaker 1: topics or people I should interview, or maybe people 716 00:45:03,040 --> 00:45:05,520 Speaker 1: I should have on as a guest host, shoot them 717 00:45:05,520 --> 00:45:09,240 Speaker 1: my way. My email is tech Stuff at how stuff 718 00:45:09,280 --> 00:45:12,000 Speaker 1: works dot com, or you can always drop me a 719 00:45:12,040 --> 00:45:15,000 Speaker 1: line on Facebook or Twitter with the handle tech stuff 720 00:45:15,239 --> 00:45:18,960 Speaker 1: HSW, and I'll talk to you guys again really 721 00:45:19,000 --> 00:45:25,200 Speaker 1: soon. For more on this and thousands of other topics, 722 00:45:25,440 --> 00:45:36,520 Speaker 1: visit how stuff works dot com.