WEBVTT - The Dirt on Digital Audio

0:00:04.160 --> 0:00:07.160
<v Speaker 1>Get in touch with technology with TechStuff from

0:00:07.240 --> 0:00:13.920
<v Speaker 1>HowStuffWorks.com. Hey there, everyone, this is Jonathan

0:00:13.960 --> 0:00:18.279
<v Speaker 1>Strickland with TechStuff, and today we're gonna tackle a

0:00:18.360 --> 0:00:21.600
<v Speaker 1>subject that I've talked about in the past. Actually, way

0:00:21.640 --> 0:00:25.200
<v Speaker 1>back in two thousand and eight, back when you were

0:00:25.200 --> 0:00:27.760
<v Speaker 1>knee high to a grasshopper, Chris Pollette and I

0:00:27.800 --> 0:00:31.880
<v Speaker 1>did an episode called How MP three Files Work, and

0:00:31.920 --> 0:00:35.440
<v Speaker 1>we talked about the lossy file format, and we actually

0:00:35.479 --> 0:00:38.120
<v Speaker 1>revisited it in two thousand eleven. We did an episode

0:00:38.120 --> 0:00:42.000
<v Speaker 1>about the iPod and about MP three players, but I

0:00:42.000 --> 0:00:44.480
<v Speaker 1>really thought it would be a good idea to revisit

0:00:45.120 --> 0:00:48.640
<v Speaker 1>MP three files, MP three players, digital audio in general,

0:00:48.720 --> 0:00:51.519
<v Speaker 1>the difference between digital audio and analog and all of

0:00:51.520 --> 0:00:54.760
<v Speaker 1>that history. Uh, to really give a deep dive, because

0:00:54.800 --> 0:00:57.200
<v Speaker 1>back in those days we did really short episodes and

0:00:57.240 --> 0:00:59.600
<v Speaker 1>so we weren't able to give it the full coverage

0:00:59.600 --> 0:01:04.360
<v Speaker 1>that I think it deserved. Um, and we actually reached

0:01:04.360 --> 0:01:08.720
<v Speaker 1>a point in history that I did not anticipate. And

0:01:08.760 --> 0:01:11.759
<v Speaker 1>I am, of course talking about the day when I said,

0:01:11.800 --> 0:01:15.120
<v Speaker 1>you know what, I don't need to carry a smartphone

0:01:15.200 --> 0:01:18.120
<v Speaker 1>and an MP three player. I held out for a

0:01:18.120 --> 0:01:20.320
<v Speaker 1>really long time. You guys who have been long time

0:01:20.319 --> 0:01:23.280
<v Speaker 1>listeners of TechStuff might remember that I really liked

0:01:23.360 --> 0:01:26.959
<v Speaker 1>dedicated devices. Like, I really liked having a digital camera,

0:01:27.080 --> 0:01:29.760
<v Speaker 1>and I really liked having an MP three player, and

0:01:29.800 --> 0:01:32.319
<v Speaker 1>I really liked having a phone that was a phone.

0:01:33.000 --> 0:01:34.880
<v Speaker 1>And now I'm like, no, I'm good with just one

0:01:34.880 --> 0:01:38.440
<v Speaker 1>device doing all that kind of thing. So, uh, since

0:01:38.440 --> 0:01:41.360
<v Speaker 1>we've reached that point, the point where our machines are

0:01:41.360 --> 0:01:45.400
<v Speaker 1>sophisticated enough to either have enough storage space to carry

0:01:45.440 --> 0:01:50.640
<v Speaker 1>an impressive music collection, or more likely, as

0:01:50.840 --> 0:01:53.520
<v Speaker 1>things have changed these days, um, access to a

0:01:53.560 --> 0:01:57.720
<v Speaker 1>streaming service where I don't even have stuff stored permanently

0:01:58.080 --> 0:02:01.840
<v Speaker 1>or, like, in any, you know, lasting format on

0:02:01.880 --> 0:02:04.880
<v Speaker 1>the phone itself. Instead, I'm streaming a file over the

0:02:04.920 --> 0:02:08.919
<v Speaker 1>Internet to listen to dynamically. I thought, why not talk

0:02:08.960 --> 0:02:12.000
<v Speaker 1>about the MP three because who knows, in a few

0:02:12.080 --> 0:02:15.760
<v Speaker 1>years that might just be a distant memory. So

0:02:16.080 --> 0:02:18.840
<v Speaker 1>this is going to be the first of a three

0:02:18.880 --> 0:02:21.600
<v Speaker 1>part series, and I want to let you guys know,

0:02:21.639 --> 0:02:25.120
<v Speaker 1>I'm not going to record all of these and publish

0:02:25.160 --> 0:02:27.440
<v Speaker 1>them all one right after the other. So it's not

0:02:27.480 --> 0:02:30.800
<v Speaker 1>gonna be MP three Part one, Part two, Part three

0:02:30.880 --> 0:02:33.639
<v Speaker 1>in a row. Uh. In this episode, we're gonna look

0:02:33.639 --> 0:02:36.560
<v Speaker 1>at how digital audio works in general and how it's

0:02:36.600 --> 0:02:40.320
<v Speaker 1>different from analog audio. Uh, and we're also gonna talk

0:02:40.320 --> 0:02:43.080
<v Speaker 1>about how the MP three was created and what it does.

0:02:43.639 --> 0:02:46.959
<v Speaker 1>In the next episode, I'm gonna take a deeper dive

0:02:47.480 --> 0:02:51.639
<v Speaker 1>into how an MP three file works, how it compresses audio.

0:02:52.120 --> 0:02:56.120
<v Speaker 1>It gets really technical. And in the final episode of

0:02:56.120 --> 0:02:58.760
<v Speaker 1>the series, we're gonna explore the history of the MP

0:02:58.880 --> 0:03:02.200
<v Speaker 1>three player and how Apple ended up dominating that space

0:03:02.240 --> 0:03:04.200
<v Speaker 1>for so long, to the point that we have things

0:03:04.240 --> 0:03:09.880
<v Speaker 1>called podcasts. But don't worry, I have other episodes to

0:03:09.919 --> 0:03:12.320
<v Speaker 1>divide up this content. So, like I said, it's not

0:03:12.360 --> 0:03:14.560
<v Speaker 1>all gonna be in a row. I don't want you

0:03:14.639 --> 0:03:19.520
<v Speaker 1>to have a month of MP three related episodes, but

0:03:20.200 --> 0:03:23.519
<v Speaker 1>you know, every couple of episodes, expect one of these.

0:03:24.160 --> 0:03:28.720
<v Speaker 1>It's kind of an interesting subject, I think. So to

0:03:28.840 --> 0:03:31.640
<v Speaker 1>start it all off, we all have to take a

0:03:31.720 --> 0:03:34.760
<v Speaker 1>quick trip to Germany. So anyone who is not in

0:03:34.840 --> 0:03:38.960
<v Speaker 1>Germany get your passport. I was actually in Germany not

0:03:39.080 --> 0:03:41.400
<v Speaker 1>that long ago. I got to visit Berlin and had

0:03:41.440 --> 0:03:45.000
<v Speaker 1>a wonderful time. And in Germany there's a company called

0:03:45.160 --> 0:03:48.640
<v Speaker 1>Fraunhofer-Gesellschaft, and you might wonder, well, what

0:03:48.680 --> 0:03:54.360
<v Speaker 1>does this company do? They think. I joke that my profession,

0:03:54.720 --> 0:03:57.160
<v Speaker 1>that my title that I should put on my business

0:03:57.160 --> 0:04:01.200
<v Speaker 1>card it should say professional smart person. And well, no joke,

0:04:01.320 --> 0:04:05.240
<v Speaker 1>that's what these people are. They specialize in research

0:04:05.360 --> 0:04:10.880
<v Speaker 1>and development, applied research. It's a whole company that specializes

0:04:10.920 --> 0:04:14.760
<v Speaker 1>in applied research. And it's huge. It encompasses sixty seven

0:04:14.800 --> 0:04:20.200
<v Speaker 1>institutes and research units across Germany. Well back in the

0:04:20.240 --> 0:04:25.880
<v Speaker 1>eighties there was a researcher named Karlheinz Brandenburg,

0:04:26.440 --> 0:04:33.000
<v Speaker 1>and Karlheinz made a breakthrough around eighty seven, uh, and

0:04:33.160 --> 0:04:37.480
<v Speaker 1>came up with this clever idea about encoding audio. He

0:04:37.520 --> 0:04:40.239
<v Speaker 1>was actually working towards creating a way that would allow

0:04:40.640 --> 0:04:45.000
<v Speaker 1>for high audio quality transfer but having a low bit

0:04:45.120 --> 0:04:50.400
<v Speaker 1>rate sampling so that file sizes and transfer times wouldn't

0:04:50.440 --> 0:04:52.520
<v Speaker 1>get out of control. Because you got to remember, this

0:04:52.560 --> 0:04:55.599
<v Speaker 1>is the eighties, this is before the World Wide Web was

0:04:55.640 --> 0:04:58.839
<v Speaker 1>a thing. That wouldn't happen until the early nineties,

0:04:59.240 --> 0:05:01.240
<v Speaker 1>so the Internet is very young. In fact, they weren't

0:05:01.240 --> 0:05:03.839
<v Speaker 1>even looking at the Internet as a method of distribution

0:05:03.880 --> 0:05:07.520
<v Speaker 1>for this particular type of encoded audio. They were looking

0:05:07.560 --> 0:05:11.960
<v Speaker 1>at using this to transmit across telephone lines, so they

0:05:12.000 --> 0:05:13.760
<v Speaker 1>needed to have something that was going to be high

0:05:13.839 --> 0:05:18.440
<v Speaker 1>quality but low space. So what the heck does that mean?

0:05:18.520 --> 0:05:22.920
<v Speaker 1>All right, Well, digital audio and analog audio are very

0:05:23.000 --> 0:05:26.920
<v Speaker 1>different things. So to understand that, we need to look

0:05:27.000 --> 0:05:31.000
<v Speaker 1>at how sound works and how we describe sound, because

0:05:31.000 --> 0:05:34.760
<v Speaker 1>that informs how we can capture sound and replicate those

0:05:34.839 --> 0:05:39.120
<v Speaker 1>qualities digitally. So stick with me. We're gonna go back

0:05:39.160 --> 0:05:44.480
<v Speaker 1>to school for some basic sound science. And this goes

0:05:44.560 --> 0:05:48.320
<v Speaker 1>back to the way sound physically moves through a medium,

0:05:48.320 --> 0:05:51.839
<v Speaker 1>whether that's a solid or through the air or through water.

0:05:52.320 --> 0:05:58.640
<v Speaker 1>Sound is vibration. Now we sense this primarily through hearing

0:05:58.680 --> 0:06:01.720
<v Speaker 1>it or sometimes feeling it. If it's the right

0:06:01.760 --> 0:06:04.720
<v Speaker 1>frequency and the right amplitude, we can actually feel sound.

0:06:05.040 --> 0:06:08.120
<v Speaker 1>Anyone who's stood close to, say, a subwoofer that

0:06:08.160 --> 0:06:10.480
<v Speaker 1>was really blasting out bass notes, you know what I'm

0:06:10.520 --> 0:06:14.159
<v Speaker 1>talking about, You can feel it pressing against you. Well,

0:06:14.200 --> 0:06:18.760
<v Speaker 1>sound travels through the air when molecules vibrate against each other,

0:06:19.360 --> 0:06:23.680
<v Speaker 1>and this creates instances of increased pressure and decreased pressure

0:06:24.080 --> 0:06:27.760
<v Speaker 1>at what is a hyperlocal level. We're not talking about

0:06:27.800 --> 0:06:31.000
<v Speaker 1>weather maps here, We're talking about tiny, little areas. So

0:06:31.279 --> 0:06:33.839
<v Speaker 1>this increase and decrease in pressure is something that we

0:06:33.920 --> 0:06:37.679
<v Speaker 1>can sense as sound. When those changes in pressure affect

0:06:37.760 --> 0:06:41.119
<v Speaker 1>a diaphragm, such as one that's in a microphone or

0:06:41.839 --> 0:06:45.919
<v Speaker 1>maybe your ear drum, for example, it causes the diaphragm

0:06:45.960 --> 0:06:50.120
<v Speaker 1>to actually move. So increased pressure pushes the diaphragm in,

0:06:51.080 --> 0:06:56.400
<v Speaker 1>and decreased pressure doesn't really pull the diaphragm out. I mean,

0:06:56.440 --> 0:06:58.680
<v Speaker 1>you could say it pulls the diaphragm out, but

0:06:58.680 --> 0:07:02.680
<v Speaker 1>to be more accurate, the diaphragm actually pushes outward because

0:07:02.720 --> 0:07:05.440
<v Speaker 1>the pressure on the outside is lower than the pressure

0:07:05.480 --> 0:07:07.760
<v Speaker 1>on the inside. But you get what I'm saying. The

0:07:07.880 --> 0:07:12.320
<v Speaker 1>diaphragm begins to flex inward and outward depending upon

0:07:12.680 --> 0:07:16.360
<v Speaker 1>the amount of pressure that it's encountering. You could

0:07:16.360 --> 0:07:18.720
<v Speaker 1>imagine this being kind of like a drum, not

0:07:18.800 --> 0:07:20.960
<v Speaker 1>an ear drum, but an actual drum and striking it.

0:07:21.800 --> 0:07:24.720
<v Speaker 1>That's the same sort of thing. So sound is the

0:07:24.760 --> 0:07:29.280
<v Speaker 1>fluctuations of pressure, which we can diagram as a wave

0:07:29.880 --> 0:07:32.720
<v Speaker 1>or a wavelength, a waveform, on an X

0:07:32.840 --> 0:07:37.760
<v Speaker 1>Y axis. So the horizontal line, that axis, that represents

0:07:37.840 --> 0:07:41.320
<v Speaker 1>time that has passed, and the vertical axis represents the

0:07:41.400 --> 0:07:46.200
<v Speaker 1>amplitude or the volume of the sound wave. The

0:07:46.320 --> 0:07:49.560
<v Speaker 1>wavelength of the sound, which is the distance between successive

0:07:49.600 --> 0:07:52.800
<v Speaker 1>points on a wave, such as like the successive crests

0:07:52.840 --> 0:07:55.480
<v Speaker 1>on a wave. That tells you a lot about the frequency.

0:07:56.400 --> 0:08:00.520
<v Speaker 1>So sound moves at a constant rate through a given medium,

0:08:00.520 --> 0:08:04.080
<v Speaker 1>but it moves at different rates through different media. So,

0:08:04.120 --> 0:08:06.640
<v Speaker 1>in other words, it moves at a different speed through a

0:08:06.680 --> 0:08:09.880
<v Speaker 1>solid than it does through air. If the crests of

0:08:09.960 --> 0:08:13.080
<v Speaker 1>each sound wave are really close together, that's a high

0:08:13.160 --> 0:08:17.320
<v Speaker 1>frequency sound. More waves will pass through an arbitrary point

0:08:17.560 --> 0:08:21.080
<v Speaker 1>within a second than waves that are spaced further apart.
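
NOTE
The link between crest spacing and frequency described above can be sketched in Python. This is a minimal illustration, not something from the episode: it assumes sound travelling through air at roughly 343 metres per second, and the names SPEED_OF_SOUND_AIR and wavelength_m are made up for the example.
```python
# Sketch: wavelength from frequency, assuming sound in air at about 343 m/s.
# SPEED_OF_SOUND_AIR and wavelength_m are illustrative names only.
SPEED_OF_SOUND_AIR = 343.0  # metres per second, assumed value for air
def wavelength_m(frequency_hz: float, speed_m_s: float = SPEED_OF_SOUND_AIR) -> float:
    """Distance between successive crests: speed divided by frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz
low = wavelength_m(100.0)    # low-pitched sound, crests far apart
high = wavelength_m(1000.0)  # higher pitch, crests ten times closer together
print(f"100 Hz: {low:.3f} m, 1000 Hz: {high:.3f} m")
```
Closer crests mean more of them pass a fixed point each second, which is the higher frequency the episode describes.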

0:08:21.440 --> 0:08:24.600
<v Speaker 1>That would be a lower frequency sound. Higher frequency sounds

0:08:24.600 --> 0:08:27.800
<v Speaker 1>have a higher pitch than lower frequency sounds. So if

0:08:27.800 --> 0:08:31.440
<v Speaker 1>you hold a single note at a constant frequency, you'll

0:08:31.440 --> 0:08:34.880
<v Speaker 1>have what is called a simple harmonic motion. That means

0:08:34.920 --> 0:08:38.840
<v Speaker 1>the vibrations are moving at a constant rate inward and outward.

0:08:38.880 --> 0:08:42.400
<v Speaker 1>The cycle is constant. A tuning fork is a good

0:08:42.440 --> 0:08:46.640
<v Speaker 1>example of this. So if you hear a clear C

0:08:46.920 --> 0:08:50.640
<v Speaker 1>note played on a musical instrument, that could be a

0:08:50.679 --> 0:08:53.480
<v Speaker 1>simple harmonic motion. It won't be, but it could be.

0:08:53.600 --> 0:08:55.520
<v Speaker 1>I'll tell you why it won't be in a minute.
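
NOTE
A single note held at a constant frequency and volume can be sketched as a sine wave. The Python below is only an illustration under that assumption. The function name sine_samples and its parameters are hypothetical, chosen for the example rather than taken from the episode.
```python
import math
def sine_samples(frequency_hz, amplitude, points_per_second, count):
    """Return `count` points of amplitude * sin(2*pi*f*t), spaced 1/points_per_second apart."""
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / points_per_second) for n in range(count)]
# One full cycle of a 1 Hz tone plotted with 8 points: up to a crest, down to a trough.
cycle = sine_samples(frequency_hz=1.0, amplitude=1.0, points_per_second=8.0, count=8)
print([round(v, 3) for v in cycle])
```
Every cycle produced this way has the same crest height and the same spacing, which is the regular picture of simple harmonic motion.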

0:08:55.840 --> 0:08:59.160
<v Speaker 1>So the frequency of vibration doesn't change, and so you

0:08:59.160 --> 0:09:01.959
<v Speaker 1>would get this very clear note as a result, And

0:09:02.000 --> 0:09:04.800
<v Speaker 1>if you were to diagram it, you would have very

0:09:04.840 --> 0:09:10.040
<v Speaker 1>regular crests and troughs, all of the same amplitude and

0:09:10.120 --> 0:09:13.800
<v Speaker 1>distance from each other. The frequency and volume would remain constant,

0:09:15.040 --> 0:09:17.880
<v Speaker 1>assuming of course, that you're not trying to change the

0:09:17.920 --> 0:09:21.160
<v Speaker 1>frequency or volume. Now, this is where I point out

0:09:21.480 --> 0:09:25.839
<v Speaker 1>most musical instruments don't produce a single clear note, even

0:09:25.880 --> 0:09:30.640
<v Speaker 1>if played expertly. They actually create several resonant frequencies. So

0:09:30.720 --> 0:09:35.319
<v Speaker 1>every physical object resonates at several different frequencies. You've probably

0:09:35.360 --> 0:09:38.960
<v Speaker 1>seen this in various programs. MythBusters did one about bridges,

0:09:39.440 --> 0:09:42.080
<v Speaker 1>the idea being that if you were to have a

0:09:42.080 --> 0:09:44.760
<v Speaker 1>group of people marching on a bridge at the bridge's

0:09:44.800 --> 0:09:48.040
<v Speaker 1>resonant frequency, it could cause the bridge to start to

0:09:48.120 --> 0:09:51.839
<v Speaker 1>vibrate and swing out of control. Well, there's a reason

0:09:51.880 --> 0:09:53.960
<v Speaker 1>for this. You may have also seen videos of people

0:09:54.080 --> 0:09:58.280
<v Speaker 1>singing a certain note and causing a crystal glass to shatter.

0:09:58.880 --> 0:10:02.360
<v Speaker 1>That's because that crystal glass does have a resonant frequency,

0:10:02.400 --> 0:10:04.640
<v Speaker 1>and if you can hit that resonant frequency at the

0:10:04.760 --> 0:10:08.600
<v Speaker 1>right volume, you can cause the glass to start to deform,

0:10:08.720 --> 0:10:11.120
<v Speaker 1>or the crystal in this case, to deform to a

0:10:11.160 --> 0:10:15.120
<v Speaker 1>point where it loses integrity and it shatters as a result. Well,

0:10:16.240 --> 0:10:20.679
<v Speaker 1>the resonance of an object is dependent upon lots of

0:10:20.720 --> 0:10:23.760
<v Speaker 1>different factors, and in fact, most stuff will resonate at

0:10:23.840 --> 0:10:28.240
<v Speaker 1>different frequencies but at different intensities. Like there might be

0:10:28.320 --> 0:10:32.480
<v Speaker 1>one sweet spot, one specific frequency that will have the

0:10:32.559 --> 0:10:37.360
<v Speaker 1>greatest effect, but other related frequencies may also have an effect.

0:10:37.360 --> 0:10:40.720
<v Speaker 1>It will just be to a lesser extent. Well, if

0:10:40.760 --> 0:10:44.200
<v Speaker 1>you were to pluck a guitar string, just you've tuned

0:10:44.200 --> 0:10:46.640
<v Speaker 1>it to whatever note, doesn't matter. Let's say you've

0:10:46.679 --> 0:10:50.439
<v Speaker 1>tuned it to G and you play the G

0:10:50.679 --> 0:10:53.960
<v Speaker 1>string on your guitar. Uh, the note that you will

0:10:54.000 --> 0:10:57.280
<v Speaker 1>hear really over all others will be G. That

0:10:57.400 --> 0:10:59.240
<v Speaker 1>is going to be the one that will sound the loudest,

0:10:59.280 --> 0:11:03.679
<v Speaker 1>But it will also play resonant frequencies at a decreased amplitude,

0:11:03.720 --> 0:11:06.839
<v Speaker 1>in other words, of decreased volume, so you still hear

0:11:06.880 --> 0:11:09.679
<v Speaker 1>the intended note above everything else, above all the other

0:11:09.679 --> 0:11:14.320
<v Speaker 1>resonant frequencies. This is called a complex tone, and that

0:11:14.360 --> 0:11:18.040
<v Speaker 1>collection of frequencies and their amplitudes is called the spectrum

0:11:18.240 --> 0:11:21.640
<v Speaker 1>of sound. You get a full spectrum. Now, some of

0:11:21.679 --> 0:11:27.640
<v Speaker 1>the components of that complex tone will be uh imperceptible

0:11:27.679 --> 0:11:30.360
<v Speaker 1>to you. They'll be so quiet that you wouldn't

0:11:30.440 --> 0:11:33.320
<v Speaker 1>really notice them. They might affect the overall quality of

0:11:33.320 --> 0:11:34.960
<v Speaker 1>the sound, but in such a subtle way that it

0:11:35.000 --> 0:11:38.120
<v Speaker 1>may be difficult for you to even put it into words.
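
NOTE
The complex tone and its spectrum can be sketched as a list of frequency and amplitude pairs that are added together. This is an illustration under stated assumptions, not the episode's own method: the components are taken to be whole-number multiples of the loudest frequency, the 1/n amplitude falloff is invented for the example, and the names harmonic_spectrum and complex_tone are hypothetical. The 196 Hz figure is roughly the open G string of a guitar in standard tuning.
```python
import math
def harmonic_spectrum(fundamental_hz, component_count):
    """List (frequency, amplitude) pairs; the 1/n amplitude falloff is an assumed shape."""
    return [(fundamental_hz * n, 1.0 / n) for n in range(1, component_count + 1)]
def complex_tone(spectrum, t_seconds):
    """Pressure value at time t: the sum of every component sine wave."""
    return sum(a * math.sin(2 * math.pi * f * t_seconds) for f, a in spectrum)
spectrum = harmonic_spectrum(196.0, 4)
for freq, amp in spectrum:
    print(f"{freq:7.1f} Hz  amplitude {amp:.2f}")
```
The first entry has the largest amplitude, so it is the note you hear above the quieter components.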

0:11:38.160 --> 0:11:41.360
<v Speaker 1>Each of those little components is called a partial. So

0:11:41.400 --> 0:11:43.679
<v Speaker 1>in the example of a guitar string, the partials are

0:11:43.720 --> 0:11:48.040
<v Speaker 1>all integer multiples of the same fundamental frequency, and the sound

0:11:48.080 --> 0:11:52.680
<v Speaker 1>has a harmonic spectrum. But as you get further away

0:11:52.760 --> 0:11:57.400
<v Speaker 1>from that fundamental frequency, the amplitude decreases significantly. So, like

0:11:57.440 --> 0:12:01.199
<v Speaker 1>I said, you get far enough away, they are technically there,

0:12:01.360 --> 0:12:05.200
<v Speaker 1>but they might be imperceptible to you. Now, some sounds

0:12:05.240 --> 0:12:09.880
<v Speaker 1>have frequencies that aren't integer multiples of a fundamental frequency and

0:12:09.920 --> 0:12:13.120
<v Speaker 1>are inharmonic. Uh, certain bells, like, if you hear a

0:12:13.120 --> 0:12:15.160
<v Speaker 1>bell ring, you can probably pick out a couple of

0:12:15.200 --> 0:12:19.560
<v Speaker 1>different frequencies there that are not harmonic frequencies. These are

0:12:19.679 --> 0:12:23.400
<v Speaker 1>very complex sounds, and to our perception, if it's complex enough,

0:12:23.440 --> 0:12:26.959
<v Speaker 1>it can seem like there's no single discernible pitch there,

0:12:27.080 --> 0:12:31.040
<v Speaker 1>like there's no fundamental frequency over all the others. If

0:12:31.040 --> 0:12:35.320
<v Speaker 1>it's complex enough, we call it noise. That is the

0:12:35.360 --> 0:12:39.440
<v Speaker 1>technical term. It is noise. Now, the unit we use

0:12:39.600 --> 0:12:44.719
<v Speaker 1>to measure frequency is the hertz, uh, H E R

0:12:44.840 --> 0:12:49.240
<v Speaker 1>T Z. Typical human hearing ranges from twenty hertz, which

0:12:49.280 --> 0:12:52.760
<v Speaker 1>means a wave will pass a given arbitrary point twenty

0:12:52.840 --> 0:12:55.640
<v Speaker 1>times within a second, all the way up to twenty

0:12:55.760 --> 0:12:59.040
<v Speaker 1>kilohertz, which means a wave will pass a particular

0:12:59.440 --> 0:13:02.640
<v Speaker 1>point in time twenty thousand times in a second, or

0:13:02.800 --> 0:13:05.560
<v Speaker 1>particular point on your wave form twenty thousand times in

0:13:05.559 --> 0:13:09.559
<v Speaker 1>the second. And most of our sensitivity tends to be

0:13:09.559 --> 0:13:12.920
<v Speaker 1>between one or two kilohertz up to four or

0:13:12.960 --> 0:13:17.320
<v Speaker 1>five kilohertz. That's generally where we have human voices,

0:13:17.800 --> 0:13:20.400
<v Speaker 1>and we've really gotten good at picking those out

0:13:20.480 --> 0:13:23.160
<v Speaker 1>over everything else. So our sensitivity of hearing is really

0:13:23.200 --> 0:13:26.240
<v Speaker 1>concentrated between one kilohertz and four kilohertz or

0:13:26.400 --> 0:13:30.680
<v Speaker 1>two and five depending upon whom you ask. Now we

0:13:30.720 --> 0:13:34.040
<v Speaker 1>get back over to amplitude. That is referring to the

0:13:34.080 --> 0:13:36.800
<v Speaker 1>height of the wave. It also refers to the volume

0:13:37.080 --> 0:13:41.960
<v Speaker 1>the loudness of something. Amplitude means bigness, So how big

0:13:42.160 --> 0:13:45.400
<v Speaker 1>is the sound? Well, the greater the amplitude, the louder

0:13:45.440 --> 0:13:48.480
<v Speaker 1>it is, and amplitudes can have an enormous range and

0:13:48.520 --> 0:13:52.480
<v Speaker 1>affect how we perceive sounds. So, for example, take a

0:13:52.559 --> 0:13:56.840
<v Speaker 1>really complicated classical piece of music. It's just easy to

0:13:56.920 --> 0:14:00.319
<v Speaker 1>explain it in that term. You might have a stretch

0:14:01.080 --> 0:14:03.640
<v Speaker 1>in that classical piece of music in which all the

0:14:03.720 --> 0:14:06.920
<v Speaker 1>instruments are more or less playing at a similar volume,

0:14:07.000 --> 0:14:10.720
<v Speaker 1>so the sound from each instrument section has a similar amplitude.

0:14:11.240 --> 0:14:14.240
<v Speaker 1>But then there might be one segment where an instrument

0:14:14.280 --> 0:14:18.599
<v Speaker 1>group or maybe even a single soloist has an increased

0:14:18.600 --> 0:14:21.640
<v Speaker 1>amplitude, an increased volume. It rises over the rest of

0:14:21.680 --> 0:14:25.480
<v Speaker 1>the orchestra, and that peak of the amplitude is called

0:14:25.520 --> 0:14:29.720
<v Speaker 1>the attack of the sound, and the entire range of

0:14:29.760 --> 0:14:34.280
<v Speaker 1>amplitudes is called the amplitude envelope. Now this is important

0:14:34.320 --> 0:14:38.120
<v Speaker 1>when we get to MP threes because the way

0:14:38.120 --> 0:14:42.040
<v Speaker 1>we perceive these sounds, uh, that has everything to

0:14:42.120 --> 0:14:44.720
<v Speaker 1>do with the way the MP three was designed. The

0:14:44.760 --> 0:14:47.720
<v Speaker 1>whole point of the MP three was to try and

0:14:47.760 --> 0:14:53.040
<v Speaker 1>create a small file size to represent what we can

0:14:53.120 --> 0:14:56.080
<v Speaker 1>hear and kind of ignore everything else. But we'll get

0:14:56.120 --> 0:14:58.640
<v Speaker 1>to that in a little bit more time. So

0:14:59.160 --> 0:15:01.880
<v Speaker 1>this is really interesting to me. If you take a

0:15:02.000 --> 0:15:07.920
<v Speaker 1>sound and you double its amplitude, you increase the amplitude

0:15:07.920 --> 0:15:11.760
<v Speaker 1>by twofold, a listener would not necessarily feel that the

0:15:11.800 --> 0:15:16.960
<v Speaker 1>sound is twice as loud. Human hearing is incredibly subjective,

0:15:17.560 --> 0:15:21.640
<v Speaker 1>and typically for most listeners, it would require much more

0:15:22.440 --> 0:15:26.320
<v Speaker 1>than doubling the sound's amplitude for them to feel that

0:15:26.440 --> 0:15:29.960
<v Speaker 1>the sound itself was twice as loud. This perception of

0:15:30.040 --> 0:15:32.480
<v Speaker 1>volume is important when we get to the lossy formats

0:15:32.480 --> 0:15:37.440
<v Speaker 1>for audio files. Now I've given you all this information,

0:15:37.640 --> 0:15:40.600
<v Speaker 1>and I know everyone is probably thinking, you know, I

0:15:40.680 --> 0:15:44.040
<v Speaker 1>learned this in primary school, elementary school. All of this

0:15:44.120 --> 0:15:47.360
<v Speaker 1>is really familiar to me, and you're maybe rolling your

0:15:47.360 --> 0:15:50.400
<v Speaker 1>eyes because it's so basic. But I think it's important

0:15:50.840 --> 0:15:54.120
<v Speaker 1>to have that refresher so that you can understand the

0:15:54.160 --> 0:15:58.800
<v Speaker 1>difference between sound as we experience it and sound as

0:15:58.880 --> 0:16:03.520
<v Speaker 1>the way we hold it digitally and replicate it digitally.

0:16:04.400 --> 0:16:07.400
<v Speaker 1>For one thing, this illustrates how sound in the real

0:16:07.440 --> 0:16:12.200
<v Speaker 1>world is a continuum. It's a continuum both in frequency

0:16:12.240 --> 0:16:17.800
<v Speaker 1>and amplitude. You can have sound changing in frequency very

0:16:17.800 --> 0:16:22.080
<v Speaker 1>smoothly from one pitch to another. You can also have

0:16:22.200 --> 0:16:26.800
<v Speaker 1>sound increase or decrease in amplitude in a very smooth way.

0:16:26.920 --> 0:16:31.800
<v Speaker 1>And it is continuous, it's unbroken, it can have smooth transitions.

0:16:31.800 --> 0:16:34.800
<v Speaker 1>And these qualities provide challenges when we want to describe

0:16:34.840 --> 0:16:40.520
<v Speaker 1>something digitally, because at the heart of digital information is

0:16:40.960 --> 0:16:45.680
<v Speaker 1>the bit, the basic unit of information. It is a

0:16:45.800 --> 0:16:49.440
<v Speaker 1>unit of information that only has two states zero or

0:16:49.560 --> 0:16:53.720
<v Speaker 1>one is essentially off or on. When you get down

0:16:53.760 --> 0:16:58.600
<v Speaker 1>to defining information in just two states, then you start

0:16:58.640 --> 0:17:02.320
<v Speaker 1>to look at something that's continuous and you realize this

0:17:02.400 --> 0:17:04.359
<v Speaker 1>is going to be a challenge. How do I describe

0:17:04.400 --> 0:17:10.840
<v Speaker 1>a continuous experience in very discrete amounts of information? And

0:17:10.920 --> 0:17:15.520
<v Speaker 1>that's when we get to the methodology we've developed to

0:17:15.920 --> 0:17:19.359
<v Speaker 1>digitally encode sound. I'm going to get into that in

0:17:19.640 --> 0:17:22.880
<v Speaker 1>just a minute, but before I do that, let's take

0:17:22.880 --> 0:17:34.520
<v Speaker 1>a quick break to thank our sponsor. All right, let's

0:17:34.560 --> 0:17:38.800
<v Speaker 1>get back into it. So we've talked about the nature

0:17:38.840 --> 0:17:42.120
<v Speaker 1>of sound. Analog sound, by the way, tries to replicate

0:17:42.359 --> 0:17:45.600
<v Speaker 1>exactly what we would experience in nature. It tries to

0:17:45.600 --> 0:17:51.200
<v Speaker 1>create this continuous experience, so you get these smooth waves

0:17:51.240 --> 0:17:56.800
<v Speaker 1>of frequencies and amplitudes. And that's why some people argue

0:17:56.880 --> 0:18:02.760
<v Speaker 1>that analog styles of sound recordings are superior

0:18:02.840 --> 0:18:07.399
<v Speaker 1>to digital ones. I don't necessarily think they're right, but

0:18:07.560 --> 0:18:12.280
<v Speaker 1>they often feel that way. So something like a vinyl album,

0:18:12.320 --> 0:18:16.080
<v Speaker 1>which is an analog format of digital or sorry, an

0:18:16.080 --> 0:18:20.240
<v Speaker 1>analog format of music storage I should say sound storage. Uh,

0:18:20.280 --> 0:18:22.960
<v Speaker 1>they think that that is superior to say a CD,

0:18:23.280 --> 0:18:28.280
<v Speaker 1>which is a digital storage format. Uh. And who's to say.

0:18:28.359 --> 0:18:32.399
<v Speaker 1>I mean, like, if your sense of hearing is incredibly

0:18:32.680 --> 0:18:36.040
<v Speaker 1>well tuned, you might be able to pick up on

0:18:36.080 --> 0:18:40.080
<v Speaker 1>some differences. Or if someone did a really terrible job

0:18:40.640 --> 0:18:45.960
<v Speaker 1>encoding music digitally, then that might reveal itself to you

0:18:46.000 --> 0:18:48.760
<v Speaker 1>as well. Uh. But this is one of those things

0:18:48.760 --> 0:18:50.920
<v Speaker 1>that I think a lot of people feel they can

0:18:50.920 --> 0:18:52.720
<v Speaker 1>tell the difference, but if they would do a double

0:18:52.760 --> 0:18:57.280
<v Speaker 1>blind test, they might be surprised at how difficult it is.

0:18:57.760 --> 0:19:01.160
<v Speaker 1>If everything's working the way it should, then

0:19:01.400 --> 0:19:05.960
<v Speaker 1>there shouldn't be a perceptible difference at any rate. Digital

0:19:05.960 --> 0:19:12.320
<v Speaker 1>audio has two really important factors, sample rate and bit depth,

0:19:13.119 --> 0:19:15.600
<v Speaker 1>or to another extent, bit rate. We'll talk about bit

0:19:15.720 --> 0:19:20.240
<v Speaker 1>rate as well. So the sample rate refers to how

0:19:20.280 --> 0:19:23.840
<v Speaker 1>many times you reference an analog sound to create the

0:19:23.920 --> 0:19:27.720
<v Speaker 1>digital version. So sound like I said, is uninterrupted. In

0:19:27.760 --> 0:19:32.840
<v Speaker 1>the analog world, you've got that nice waveform.

0:19:32.880 --> 0:19:36.000
<v Speaker 1>That's not how the digital world works.

0:19:36.080 --> 0:19:39.280
<v Speaker 1>In the digital world, we have to describe that sound in a

0:19:39.359 --> 0:19:45.560
<v Speaker 1>series of discrete snippets of sound. It's probably easiest to

0:19:45.600 --> 0:19:51.800
<v Speaker 1>describe this with an analogy to movies on film. If

0:19:51.840 --> 0:19:55.320
<v Speaker 1>you work with film, like you're creating a movie on film,

0:19:55.800 --> 0:19:58.960
<v Speaker 1>then you know that you're not looking at a real

0:19:59.200 --> 0:20:02.200
<v Speaker 1>moving picture when you see the film played out at

0:20:02.200 --> 0:20:05.480
<v Speaker 1>the cinema. Instead, what you're looking at is a series

0:20:05.600 --> 0:20:10.120
<v Speaker 1>of photographs. If you take a film strip and you

0:20:10.160 --> 0:20:14.200
<v Speaker 1>look at it under a light, you'll see it's one

0:20:14.320 --> 0:20:18.720
<v Speaker 1>after another photograph. It's just a series of pictures. It's

0:20:18.720 --> 0:20:20.880
<v Speaker 1>only when you play them back at the right speed

0:20:21.480 --> 0:20:23.760
<v Speaker 1>and you project it onto a screen that you get the

0:20:23.840 --> 0:20:28.480
<v Speaker 1>illusion of continuous motion. But it's not really continuous. It's

0:20:28.520 --> 0:20:31.720
<v Speaker 1>just this series of photographs played at twenty four frames

0:20:31.760 --> 0:20:36.800
<v Speaker 1>per second in the case of actual film. So that

0:20:37.000 --> 0:20:40.119
<v Speaker 1>ends up being very analogous to the way we encode

0:20:40.160 --> 0:20:44.000
<v Speaker 1>digital audio. You take the analog recording and you take

0:20:44.280 --> 0:20:49.800
<v Speaker 1>snapshots of sound. The more frequently you take those snapshots,

0:20:50.200 --> 0:20:52.440
<v Speaker 1>the higher your sample rate. So in other words, if

0:20:52.440 --> 0:20:55.600
<v Speaker 1>you did one a second, your sample rate would be awful.

0:20:56.320 --> 0:20:58.560
<v Speaker 1>You would have a sample rate of one. But the

0:20:58.640 --> 0:21:01.400
<v Speaker 1>higher the sample rate, the closer your digital representation

0:21:01.440 --> 0:21:05.240
<v Speaker 1>will be to the frequency in the analog sound format. Actually,

0:21:05.720 --> 0:21:07.960
<v Speaker 1>what's really important to remember is that your sample rate

0:21:08.000 --> 0:21:10.399
<v Speaker 1>has to be about twice, actually does have to be

0:21:10.480 --> 0:21:14.879
<v Speaker 1>twice what the highest frequency sound is in your recording.

0:21:16.359 --> 0:21:20.119
<v Speaker 1>It has to be because if it's not, it cannot

0:21:20.280 --> 0:21:25.879
<v Speaker 1>encode that sound accurately. It's kind of interesting and you

0:21:25.960 --> 0:21:27.960
<v Speaker 1>might wonder, how do we take these snapshots in the

0:21:27.960 --> 0:21:31.080
<v Speaker 1>first place. Well, if you're capturing audio, let's say we're

0:21:31.119 --> 0:21:34.560
<v Speaker 1>recording to digital. So we've got a microphone set up,

0:21:34.920 --> 0:21:39.240
<v Speaker 1>and we're recording to digital storage media. Like, let's

0:21:39.240 --> 0:21:41.480
<v Speaker 1>just say we're recording straight to someone's hard drive. So

0:21:41.520 --> 0:21:44.720
<v Speaker 1>we're talking into a microphone recording to a hard drive.

0:21:45.640 --> 0:21:49.400
<v Speaker 1>So you're using an analog microphone, let's say. You would

0:21:49.400 --> 0:21:53.720
<v Speaker 1>need an analog to digital converter. Now, this particular component

0:21:54.000 --> 0:21:58.719
<v Speaker 1>can receive discrete voltages from another device like your microphone.

0:21:59.000 --> 0:22:05.720
<v Speaker 1>So your microphone is converting sound into uh differences in voltage.

0:22:05.960 --> 0:22:08.840
<v Speaker 1>That's essentially how it communicates, so that it can then

0:22:09.000 --> 0:22:12.040
<v Speaker 1>send that to some other element. In this case, it's

0:22:12.080 --> 0:22:15.679
<v Speaker 1>sending it to the analog to digital converter so

0:22:15.720 --> 0:22:18.359
<v Speaker 1>that it can be stored digitally on your hard drive.

0:22:19.400 --> 0:22:26.560
<v Speaker 1>So this analog to digital converter references, or samples, the discrete

0:22:26.640 --> 0:22:30.199
<v Speaker 1>voltage many times every second in order to create a

0:22:30.240 --> 0:22:34.720
<v Speaker 1>digital representation of the analog sound. It converts the voltages

0:22:34.800 --> 0:22:39.360
<v Speaker 1>into numbers in a process called quantization, and we express

0:22:39.400 --> 0:22:42.439
<v Speaker 1>those numbers in bits. So these are zeros and ones.
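
NOTE
Editor's sketch, not part of the episode: a minimal Python illustration of the sampling and quantization steps just described.
The sine tone stands in for the microphone voltage. The function name, the 1 kHz tone, and the 8 kHz sample rate are assumptions chosen only to keep the numbers small.
```python
import math
def sample_and_quantize(freq_hz, sample_rate_hz, bit_depth, n_samples):
    """Sample a full-scale sine tone and round each snapshot to a signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive code, 32767 for sixteen bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th snapshot, in seconds
        voltage = math.sin(2 * math.pi * freq_hz * t)  # stand-in for the microphone voltage, in [-1, 1]
        codes.append(round(voltage * max_code))  # quantization: nearest integer code
    return codes
codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, bit_depth=16, n_samples=8)
print(codes)
```
Each printed number is one snapshot. Raising sample_rate_hz gives more snapshots per second, and raising bit_depth gives finer codes for each snapshot.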

0:22:43.000 --> 0:22:45.720
<v Speaker 1>When you want to play the digital audio, a digital

0:22:45.760 --> 0:22:49.760
<v Speaker 1>to analog converter does the same process in reverse. So

0:22:50.040 --> 0:22:53.720
<v Speaker 1>it takes this digital information, these zeros and ones and

0:22:53.840 --> 0:22:57.520
<v Speaker 1>converts it into a series of discrete voltages, which then

0:22:57.800 --> 0:23:01.480
<v Speaker 1>can be amplified and sent to a speaker and create sound.

0:23:02.720 --> 0:23:05.280
<v Speaker 1>So all of that's really important. But now let's

0:23:05.320 --> 0:23:07.879
<v Speaker 1>talk about some concrete examples. And the best way to

0:23:07.920 --> 0:23:11.199
<v Speaker 1>do this is to go with compact discs. Because we

0:23:11.280 --> 0:23:15.080
<v Speaker 1>have a standard sample rate for compact discs, and that

0:23:15.240 --> 0:23:18.520
<v Speaker 1>standard sample rate is forty four point one kilohertz

0:23:18.600 --> 0:23:22.119
<v Speaker 1>to create CD quality audio. That means that the audio

0:23:22.240 --> 0:23:27.960
<v Speaker 1>is sampled forty four thousand, one hundred times every second.

0:23:28.840 --> 0:23:30.800
<v Speaker 1>But wait, I hear you say, the range of human

0:23:30.840 --> 0:23:33.280
<v Speaker 1>hearing, you said, only goes from twenty hertz to twenty

0:23:33.359 --> 0:23:36.240
<v Speaker 1>kilohertz. If it only goes up to twenty kilohertz,

0:23:36.240 --> 0:23:39.000
<v Speaker 1>why are you sampling at forty four thousand, one hundred

0:23:39.119 --> 0:23:43.520
<v Speaker 1>times every second? If it's twenty thousand times a second

0:23:43.560 --> 0:23:46.680
<v Speaker 1>for the frequency, why go up to forty four thousand,

0:23:46.760 --> 0:23:49.359
<v Speaker 1>one hundred Is there some relationship between that and the

0:23:49.400 --> 0:23:52.640
<v Speaker 1>CD sample rate? And the answer is yes. So there

0:23:52.760 --> 0:23:57.959
<v Speaker 1>is a theorem called the Nyquist-Shannon sampling theorem, and

0:23:58.040 --> 0:24:00.719
<v Speaker 1>that states that the sample rate must be twice the

0:24:00.760 --> 0:24:03.960
<v Speaker 1>maximum frequency of a recording in order to describe the

0:24:04.000 --> 0:24:08.200
<v Speaker 1>frequency properly. So the general thought is the maximum frequency

0:24:08.240 --> 0:24:10.879
<v Speaker 1>most humans can hear is twenty kilohertz. And for that reason,

0:24:10.920 --> 0:24:13.760
<v Speaker 1>Philips and Sony, when they were working to create the

0:24:13.920 --> 0:24:17.919
<v Speaker 1>CD format to make it a standard, they decided on

0:24:17.960 --> 0:24:20.840
<v Speaker 1>forty four point one kilohertz as that standard sample

0:24:20.920 --> 0:24:23.359
<v Speaker 1>rate for CD audio. It was more than double

0:24:23.400 --> 0:24:26.000
<v Speaker 1>the top frequency generally considered to be in the upper

0:24:26.080 --> 0:24:29.120
<v Speaker 1>level of human hearing. But what happens if you were

0:24:29.160 --> 0:24:32.360
<v Speaker 1>to lower the sampling rate. What if you didn't sample

0:24:32.440 --> 0:24:37.520
<v Speaker 1>at, what if you sampled at, let's say, sixteen kilohertz,

0:24:37.560 --> 0:24:41.040
<v Speaker 1>so sixteen thousand times a second you sample it. Well,

0:24:41.359 --> 0:24:43.520
<v Speaker 1>that means you would only be able to record and

0:24:43.560 --> 0:24:47.119
<v Speaker 1>replicate any sound with a frequency up to eight kilohertz

0:24:47.200 --> 0:24:52.240
<v Speaker 1>or less, so eight thousand hertz or less. But

0:24:52.400 --> 0:24:55.560
<v Speaker 1>if you had any sound that was greater than eight

0:24:55.600 --> 0:24:59.879
<v Speaker 1>thousand hertz or eight kilohertz, anything higher than that,

0:25:00.000 --> 0:25:04.360
<v Speaker 1>it would be folded down to fit below the eight

0:25:04.440 --> 0:25:08.160
<v Speaker 1>kilohertz limit. Perceptually, that means the sounds you would

0:25:08.200 --> 0:25:11.159
<v Speaker 1>hear in the playback could include frequencies that were not

0:25:11.320 --> 0:25:16.120
<v Speaker 1>present in the original performance of that sound. So let's

0:25:16.119 --> 0:25:20.560
<v Speaker 1>say that I'm using a sample rate of sixteen uh,

0:25:20.600 --> 0:25:24.359
<v Speaker 1>you know, kilohertz, and someone is playing a musical

0:25:24.400 --> 0:25:27.160
<v Speaker 1>instrument and they play a note that's at a nine

0:25:27.200 --> 0:25:32.720
<v Speaker 1>kilohertz frequency. Well, because I'm sampling at sixteen kilohertz,

0:25:33.320 --> 0:25:37.639
<v Speaker 1>my limit for frequencies is eight kilohertz. If you

0:25:37.680 --> 0:25:40.560
<v Speaker 1>play something at nine kilohertz, what happens is

0:25:40.880 --> 0:25:45.240
<v Speaker 1>the recording seems to fold the sound back, and it

0:25:45.359 --> 0:25:49.840
<v Speaker 1>folds it back at the same limit that the sound

0:25:49.880 --> 0:25:54.960
<v Speaker 1>goes over. The sample rate, or rather the Nyquist limit,

0:25:55.000 --> 0:25:57.560
<v Speaker 1>I should say, not the sample rate itself but the Nyquist limit,

0:25:58.720 --> 0:26:03.720
<v Speaker 1>so nine kilohertz sound played. My limit is eight

0:26:03.800 --> 0:26:06.960
<v Speaker 1>kilohertz. Well, nine kilohertz is one kilohertz

0:26:06.960 --> 0:26:10.000
<v Speaker 1>more than eight, so it folds it back and the

0:26:10.040 --> 0:26:13.320
<v Speaker 1>sound you would hear on the recording would be seven

0:26:13.400 --> 0:26:17.000
<v Speaker 1>kilohertz. So the original sound is nine kilohertz,

0:26:17.080 --> 0:26:21.480
<v Speaker 1>the playback sound is seven kilohertz, and you would

0:26:21.520 --> 0:26:25.639
<v Speaker 1>hear something recorded that wasn't actually played. That's why you

0:26:25.680 --> 0:26:28.800
<v Speaker 1>have to have a really high sample rate so that

0:26:28.840 --> 0:26:32.679
<v Speaker 1>you don't have these instances where sound gets folded back

0:26:33.480 --> 0:26:38.359
<v Speaker 1>into the frequency range, because otherwise what you are hearing

0:26:38.520 --> 0:26:42.480
<v Speaker 1>is not an accurate representation of what was actually generated

0:26:42.760 --> 0:26:46.919
<v Speaker 1>what you were trying to record. This whole phenomenon, by

0:26:46.920 --> 0:26:51.800
<v Speaker 1>the way, is called foldover or sometimes aliasing. So
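
NOTE
Editor's sketch, not part of the episode: the fold-back arithmetic described above, written as a small Python function.
It assumes a single pure tone and no anti-aliasing filter in front of the converter. The function name is hypothetical.
```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Return the frequency heard on playback when a pure tone is sampled without filtering."""
    nyquist_hz = sample_rate_hz / 2  # highest frequency this sample rate can represent
    folded = freq_hz % sample_rate_hz  # sampled tones repeat every sample_rate_hz
    if folded > nyquist_hz:
        folded = sample_rate_hz - folded  # reflect back below the Nyquist limit
    return folded
print(aliased_frequency(9_000, 16_000))  # 7000, the nine kilohertz example from the episode
print(aliased_frequency(5_000, 16_000))  # 5000, already below the limit so unchanged
```
The nine kilohertz tone sits one kilohertz above the eight kilohertz limit, so it comes back one kilohertz below it.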

0:26:51.840 --> 0:26:54.800
<v Speaker 1>that's sample rate. But then we've got bit depth. Now,

0:26:54.840 --> 0:26:59.080
<v Speaker 1>this is all about measuring the volume or amplitude of

0:26:59.119 --> 0:27:02.359
<v Speaker 1>a sound. So you have a range. You just make

0:27:02.400 --> 0:27:06.240
<v Speaker 1>an arbitrary range to say, like we're gonna go quietest

0:27:06.280 --> 0:27:09.199
<v Speaker 1>to loudest, and you just define what that range is.

0:27:09.400 --> 0:27:12.120
<v Speaker 1>It could literally be any range. Let's say you say

0:27:12.200 --> 0:27:15.960
<v Speaker 1>zero to one hundred. Zero is dead silence, no sound

0:27:16.000 --> 0:27:19.560
<v Speaker 1>at all. One hundred is as loud as the sound

0:27:19.720 --> 0:27:24.160
<v Speaker 1>ever gets. It's the peak volume of sound. That means

0:27:24.200 --> 0:27:28.560
<v Speaker 1>you can describe all the different volumes within that recording

0:27:29.119 --> 0:27:33.000
<v Speaker 1>at a number between zero and one hundred. But let's

0:27:33.000 --> 0:27:36.320
<v Speaker 1>say you take that same recording and instead of making

0:27:36.320 --> 0:27:39.679
<v Speaker 1>the range zero to one hundred, you say it's zero

0:27:39.760 --> 0:27:43.919
<v Speaker 1>to two thousand. You haven't made the volume louder. The

0:27:44.000 --> 0:27:47.080
<v Speaker 1>volume is still the exact same as it was when

0:27:47.119 --> 0:27:49.879
<v Speaker 1>you called the range zero to one hundred. But what

0:27:50.000 --> 0:27:53.720
<v Speaker 1>you have done is added more units. You have created

0:27:53.880 --> 0:27:58.880
<v Speaker 1>more precise steps between absolute silence and as loud as

0:27:58.920 --> 0:28:02.720
<v Speaker 1>it gets. So you've just increased the size of the

0:28:02.800 --> 0:28:04.760
<v Speaker 1>range so that you can be more precise in the

0:28:04.800 --> 0:28:09.280
<v Speaker 1>differences in volume. And this is really important. So let's

0:28:09.320 --> 0:28:11.800
<v Speaker 1>say that you've got a sound that you rank at

0:28:11.880 --> 0:28:15.440
<v Speaker 1>seventy eight and another sound that you rank at seventy nine,

0:28:16.080 --> 0:28:18.920
<v Speaker 1>and that's gonna be the same for both of these ranges. Uh,

0:28:19.040 --> 0:28:21.880
<v Speaker 1>just two different examples. Actually, So you've got your zero

0:28:21.880 --> 0:28:25.840
<v Speaker 1>to one hundred range, and a seventy eight would be seventy

0:28:25.840 --> 0:28:29.760
<v Speaker 1>eight percent of the loudest sound in the entire recording,

0:28:30.280 --> 0:28:33.159
<v Speaker 1>and a seventy nine would be seventy nine percent of

0:28:33.200 --> 0:28:36.960
<v Speaker 1>the loudest sound in the entire recording. That's an actually

0:28:36.960 --> 0:28:39.760
<v Speaker 1>pretty hefty jump. But let's say we instead went with

0:28:39.800 --> 0:28:42.920
<v Speaker 1>that zero to two thousand range and you still had

0:28:42.920 --> 0:28:47.160
<v Speaker 1>seventy eight and seventy nine. Well, seventy eight would represent

0:28:47.280 --> 0:28:50.840
<v Speaker 1>three point nine percent of the full volume and seventy

0:28:50.920 --> 0:28:54.480
<v Speaker 1>nine would represent three point nine five percent of

0:28:54.520 --> 0:28:57.640
<v Speaker 1>full volume. In other words, you'd be able to mark

0:28:57.960 --> 0:29:02.280
<v Speaker 1>much more subtle differences in volume, and that means you

0:29:02.280 --> 0:29:06.680
<v Speaker 1>can have more nuance in your recording. And since we're

0:29:06.680 --> 0:29:09.800
<v Speaker 1>talking about a natural sound to start off with, so

0:29:09.840 --> 0:29:12.360
<v Speaker 1>you're taking a natural sound and you're trying to digitize it.
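
NOTE
Editor's sketch, not part of the episode: the percentage arithmetic from the two example ranges above, in Python.
The function name is hypothetical. The ranges of one hundred and two thousand are the ones used in the episode.
```python
def percent_of_full_scale(level, top_of_range):
    """Express a level on a zero-to-top_of_range scale as a percentage of the loudest sound."""
    return 100 * level / top_of_range
for top in (100, 2000):
    step = percent_of_full_scale(1, top)  # size of one step on this scale
    print(top, percent_of_full_scale(78, top), percent_of_full_scale(79, top), step)
```
On the zero to one hundred scale each step is one percent of full volume. On the zero to two thousand scale each step is five hundredths of a percent, which is where the extra nuance comes from.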

0:29:13.160 --> 0:29:17.800
<v Speaker 1>Smooth changes in amplitude are possible in natural sound. Using

0:29:17.800 --> 0:29:21.000
<v Speaker 1>a broader range to describe the volume is best if

0:29:21.000 --> 0:29:25.320
<v Speaker 1>you want to get an accurate representation or resolution of

0:29:25.360 --> 0:29:28.880
<v Speaker 1>that sound. Going back to that zero to one hundred range,

0:29:29.200 --> 0:29:32.240
<v Speaker 1>changes in volume would be more chunky. Two sounds that

0:29:32.280 --> 0:29:36.440
<v Speaker 1>have slight differences in amplitude would end up being defined

0:29:36.520 --> 0:29:40.680
<v Speaker 1>as being identical because you wouldn't have the precision. You know,

0:29:40.720 --> 0:29:42.760
<v Speaker 1>you couldn't say this one's seventy eight and a half.

0:29:43.240 --> 0:29:45.520
<v Speaker 1>It would either be seventy eight or seventy nine. So

0:29:45.600 --> 0:29:48.960
<v Speaker 1>you could have two sounds that in a greater precision

0:29:49.120 --> 0:29:52.680
<v Speaker 1>you could tell the difference between their volumes. But if

0:29:52.720 --> 0:29:57.240
<v Speaker 1>you have that lower, that more shallow bit depth, you

0:29:57.240 --> 0:29:58.800
<v Speaker 1>wouldn't be able to tell the difference of it. You

0:29:58.840 --> 0:30:01.840
<v Speaker 1>would lose that nuance, that subtlety. This is part

0:30:01.880 --> 0:30:06.000
<v Speaker 1>of the reason why people say, like a lot of

0:30:06.040 --> 0:30:10.480
<v Speaker 1>the modern music has uh lower ranges and changes in volume,

0:30:10.600 --> 0:30:14.480
<v Speaker 1>like the loudest loud parts and the softest soft parts.

0:30:14.520 --> 0:30:18.480
<v Speaker 1>That range has decreased over time, which a lot of

0:30:18.480 --> 0:30:21.320
<v Speaker 1>people have argued has meant that music has gotten less

0:30:21.880 --> 0:30:27.120
<v Speaker 1>complex and therefore, in some minds, less interesting. That's on

0:30:27.160 --> 0:30:31.160
<v Speaker 1>a related uh kind of philosophy to what I'm talking

0:30:31.200 --> 0:30:36.640
<v Speaker 1>about here. So you want to have those smaller steps

0:30:36.720 --> 0:30:40.760
<v Speaker 1>between each unit so you can create greater resolution, more

0:30:40.880 --> 0:30:46.360
<v Speaker 1>smoothness to the recorded audio. And it's actually the bit

0:30:46.480 --> 0:30:49.480
<v Speaker 1>rate in CD audio that will help make the sound

0:30:49.680 --> 0:30:53.480
<v Speaker 1>seem smooth. So if you ever listened to eight bit music,

0:30:53.880 --> 0:30:56.480
<v Speaker 1>you know, like the kind from old video game consoles,

0:30:56.520 --> 0:30:59.520
<v Speaker 1>that sound is really harsh and sort of chunky and

0:30:59.640 --> 0:31:03.040
<v Speaker 1>has an appeal, but it's not you know, it's not

0:31:03.440 --> 0:31:07.160
<v Speaker 1>smooth at all. It can create an amazing effect, but

0:31:07.200 --> 0:31:10.960
<v Speaker 1>if you want to represent true analog sound, it's not awesome.

0:31:11.960 --> 0:31:15.920
<v Speaker 1>If you went up to sixteen bit, that's CD quality

0:31:16.000 --> 0:31:21.080
<v Speaker 1>bit depth, it's much better. Uh, Professional recording studios will

0:31:21.120 --> 0:31:25.240
<v Speaker 1>do twenty four bit or thirty two bit because they're gonna

0:31:25.280 --> 0:31:28.800
<v Speaker 1>do a lot of post processing work on those audio files.
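
NOTE
Editor's sketch, not part of the episode: how many distinct volume levels each bit depth mentioned above can describe.
It treats every format as plain integers, which is a simplifying assumption (thirty two bit studio audio is often stored as floating point in practice). The function name is hypothetical.
```python
def levels(bit_depth):
    """Number of distinct amplitude values a sample of this many bits can take."""
    return 2 ** bit_depth
for bits in (8, 16, 24, 32):
    print(bits, levels(bits))
```
Every added bit doubles the number of levels, so the jump from eight bit to sixteen bit is a factor of two hundred fifty six.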

0:31:29.080 --> 0:31:31.000
<v Speaker 1>And when you do that post processing work, if you

0:31:31.040 --> 0:31:34.840
<v Speaker 1>do it at sixteen bit, the stuff you're doing, the

0:31:34.920 --> 0:31:37.840
<v Speaker 1>changes you make can become noticeable, and most times you

0:31:37.880 --> 0:31:40.440
<v Speaker 1>don't want that. You don't want it to be you know,

0:31:40.680 --> 0:31:42.600
<v Speaker 1>you don't want it to stand out from the rest

0:31:42.600 --> 0:31:45.320
<v Speaker 1>of the audio file. But that's the only reason they

0:31:45.320 --> 0:31:47.360
<v Speaker 1>go up to twenty four bit or thirty two bit.

0:31:47.720 --> 0:31:51.320
<v Speaker 1>There'd be no point in playing it back at that rate,

0:31:51.440 --> 0:31:57.160
<v Speaker 1>that bit depth, because human hearing is not so adept

0:31:57.320 --> 0:32:00.280
<v Speaker 1>to tell the difference, at least not for most human ears.

0:32:01.120 --> 0:32:04.240
<v Speaker 1>So if you played back a recording at sixteen bit

0:32:04.560 --> 0:32:07.080
<v Speaker 1>and another one at twenty four bit, it's the same piece.

0:32:07.760 --> 0:32:10.080
<v Speaker 1>Most people would not be able to tell the difference

0:32:10.120 --> 0:32:14.360
<v Speaker 1>because you've already reached a resolution that equals the precision

0:32:14.440 --> 0:32:18.120
<v Speaker 1>of human hearing. Keeping in mind again, human hearing is subjective,

0:32:18.360 --> 0:32:21.680
<v Speaker 1>not everyone is equal. There's some people who have incredible

0:32:21.720 --> 0:32:24.680
<v Speaker 1>hearing who may be able to pick out that difference.

0:32:25.520 --> 0:32:27.880
<v Speaker 1>I am not one of those people, but I am

0:32:27.920 --> 0:32:30.200
<v Speaker 1>a person who's going to tell you we'll get to

0:32:30.240 --> 0:32:34.200
<v Speaker 1>the last section in just a bit, but first let's

0:32:34.200 --> 0:32:45.440
<v Speaker 1>take another quick break to thank our sponsor. All Right,

0:32:45.520 --> 0:32:48.320
<v Speaker 1>So bit depth, what we just talked about, that can

0:32:48.360 --> 0:32:51.800
<v Speaker 1>be thought of as how well the sound is described,

0:32:52.640 --> 0:32:55.680
<v Speaker 1>and the sampling rate is how frequently or how much

0:32:56.040 --> 0:33:00.560
<v Speaker 1>the sound is described. And CD Audio quality has sixteen

0:33:00.560 --> 0:33:05.000
<v Speaker 1>bit audio. That means that they actually have sixty five thousand,

0:33:05.160 --> 0:33:09.480
<v Speaker 1>five hundred thirty six different levels of volume that they

0:33:09.480 --> 0:33:13.800
<v Speaker 1>can describe within an audio track. So my example of

0:33:13.880 --> 0:33:18.280
<v Speaker 1>zero to two thousand, that is primitive compared to CD

0:33:18.440 --> 0:33:22.600
<v Speaker 1>audio, because it has the sixteen bit style, sixty five thousand, five

0:33:22.640 --> 0:33:26.160
<v Speaker 1>hundred thirty six different levels. And how is that possible? Well,

0:33:28.000 --> 0:33:31.840
<v Speaker 1>when we say sixteen bit, remember a bit represents two

0:33:31.880 --> 0:33:34.240
<v Speaker 1>states zero or one. So you take the number two

0:33:34.960 --> 0:33:39.480
<v Speaker 1>and then you raise it to the power of sixteen, uh,

0:33:39.680 --> 0:33:43.800
<v Speaker 1>so you multiply two by itself sixteen times and you

0:33:43.840 --> 0:33:47.280
<v Speaker 1>get sixty five thousand, five hundred thirty six. So that's

0:33:47.280 --> 0:33:51.160
<v Speaker 1>that's where that number comes from. Now, with your digital sample,

0:33:52.080 --> 0:33:54.840
<v Speaker 1>you have a collection of points that roughly replicate the

0:33:54.920 --> 0:33:57.760
<v Speaker 1>shape of an analog sound wave. It's gonna look a

0:33:57.760 --> 0:34:01.080
<v Speaker 1>little funky, but you'll be able to see what the

0:34:01.240 --> 0:34:05.760
<v Speaker 1>frequency and amplitude generally was of the original recording if

0:34:05.760 --> 0:34:08.800
<v Speaker 1>you were to plot this on an x-y axis.

0:34:09.600 --> 0:34:12.439
<v Speaker 1>But if you were just to connect each successive point

0:34:12.480 --> 0:34:15.759
<v Speaker 1>with a straight line, even as close together as they

0:34:15.800 --> 0:34:18.040
<v Speaker 1>would be, because you're looking at forty four thousand one hundred

0:34:18.400 --> 0:34:22.080
<v Speaker 1>times a second, it would sound pretty awful. So we

0:34:22.120 --> 0:34:26.440
<v Speaker 1>actually use an algorithm called interpolation to join the points

0:34:26.719 --> 0:34:29.960
<v Speaker 1>smoothly to imitate a sound waveform, and that gives

0:34:30.000 --> 0:34:33.640
<v Speaker 1>a musical playback program the ability to replicate an analog

0:34:33.680 --> 0:34:38.040
<v Speaker 1>waveform. And that's actually called pulse code modulation or

0:34:38.160 --> 0:34:45.200
<v Speaker 1>PCM. And if you store audio, uh, intact this way,

0:34:45.320 --> 0:34:48.560
<v Speaker 1>you would have what we call a lossless audio file,

0:34:48.960 --> 0:34:51.640
<v Speaker 1>which means exactly what it sounds like. None of that

0:34:51.760 --> 0:34:54.919
<v Speaker 1>data would ever get filtered out of the file, even

0:34:54.960 --> 0:34:57.800
<v Speaker 1>if the sounds were beyond the range of human hearing,

0:34:57.800 --> 0:35:01.000
<v Speaker 1>they would be recorded, and you would have a lossless

0:35:01.200 --> 0:35:05.080
<v Speaker 1>file format. Those files tend to be quite big, depending

0:35:05.160 --> 0:35:08.520
<v Speaker 1>upon how long a recording you make. Of course, all right.

0:35:09.000 --> 0:35:11.400
<v Speaker 1>So now here's where it gets a little confusing. And

0:35:11.440 --> 0:35:13.080
<v Speaker 1>I think I even said bit rate a couple of

0:35:13.080 --> 0:35:16.000
<v Speaker 1>times when I really meant bit depth earlier. But up

0:35:16.040 --> 0:35:19.319
<v Speaker 1>to this point, I really was talking bit depth. So

0:35:19.680 --> 0:35:22.120
<v Speaker 1>my apologies to all of you out there if a

0:35:22.160 --> 0:35:24.719
<v Speaker 1>bit rate slipped through because I did not mean it.

0:35:24.800 --> 0:35:27.520
<v Speaker 1>Now I'm going to talk about bit rate and show

0:35:27.560 --> 0:35:32.080
<v Speaker 1>you how it's different than bit depth. Bit rate refers

0:35:32.120 --> 0:35:35.799
<v Speaker 1>to the amount of data audio uses per second or

0:35:35.840 --> 0:35:39.960
<v Speaker 1>requires per second of recording, and you derive bit rate

0:35:40.360 --> 0:35:43.960
<v Speaker 1>from the bit depth and the sampling rate. It's represented

0:35:44.040 --> 0:35:47.520
<v Speaker 1>as bits per second. So again let's go to CD

0:35:47.560 --> 0:35:52.200
<v Speaker 1>quality sound. That makes it easy. You have forty four thousand one hundred samples

0:35:52.360 --> 0:35:57.160
<v Speaker 1>per second, you've got sixteen bits or two bytes because

0:35:57.239 --> 0:36:00.800
<v Speaker 1>remember a byte is eight bits, so you need two bytes

0:36:00.840 --> 0:36:07.320
<v Speaker 1>to describe each sample. So two bytes times forty four thousand one hundred samples per second.

0:36:08.200 --> 0:36:11.040
<v Speaker 1>Uh plus you probably are gonna have to multiply that

0:36:11.160 --> 0:36:14.719
<v Speaker 1>by two because you're probably recording in stereo, so you

0:36:14.760 --> 0:36:18.680
<v Speaker 1>have to do that once for each track. So you get

0:36:18.719 --> 0:36:21.080
<v Speaker 1>that number, then you have to multiply that by sixty

0:36:21.080 --> 0:36:23.839
<v Speaker 1>seconds to determine how much data per minute you are

0:36:23.960 --> 0:36:27.520
<v Speaker 1>creating when you're recording, and with CD quality audio that

0:36:27.600 --> 0:36:30.400
<v Speaker 1>ends up being about ten megabytes of data per minute.
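
NOTE
Editor's sketch, not part of the episode: the bit rate arithmetic just walked through, in Python.
The constants are the CD figures quoted in the episode. The variable names are hypothetical, and a megabyte is taken as one million bytes.
```python
SAMPLE_RATE_HZ = 44_100  # CD samples per second, per channel
BIT_DEPTH = 16  # bits per sample
CHANNELS = 2  # stereo
bits_per_second = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS
bytes_per_minute = bits_per_second // 8 * 60
print(bits_per_second)  # 1411200
print(bytes_per_minute)  # 10584000
print(bytes_per_minute / 1_000_000)  # 10.584
```
That is roughly 1.4 megabits per second, or a little over ten megabytes for each minute of stereo audio.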

0:36:31.239 --> 0:36:34.000
<v Speaker 1>Now these days that's not really that big a deal

0:36:34.680 --> 0:36:38.719
<v Speaker 1>because we're dealing with super fast Internet speeds and enormous

0:36:38.800 --> 0:36:42.600
<v Speaker 1>hard drives. But just a few years ago, that was

0:36:42.640 --> 0:36:46.000
<v Speaker 1>considered to be a really sizeable file, I mean an

0:36:46.120 --> 0:36:48.920
<v Speaker 1>enormous file. And so if you wanted to find a

0:36:48.920 --> 0:36:51.879
<v Speaker 1>way to distribute digital audio so it didn't take up

0:36:51.920 --> 0:36:55.840
<v Speaker 1>too much space, you had to figure out how you

0:36:55.880 --> 0:37:00.000
<v Speaker 1>could compress those files and make them smaller, make them

0:37:00.000 --> 0:37:04.239
<v Speaker 1>more manageable. And now we can finally get back to

0:37:04.320 --> 0:37:08.240
<v Speaker 1>Germany and Herr Brandenburg. You thought we left him behind,

0:37:08.960 --> 0:37:12.680
<v Speaker 1>We didn't. He was just part of a flashback. So

0:37:12.840 --> 0:37:15.320
<v Speaker 1>let's go to the MP three. First of all, it

0:37:15.360 --> 0:37:19.040
<v Speaker 1>gets its name from the Motion Picture Experts Group, also

0:37:19.160 --> 0:37:23.600
<v Speaker 1>known as MPEG. It was part of a project that

0:37:23.719 --> 0:37:26.840
<v Speaker 1>MPEG was doing that was looking at ways of compressing

0:37:26.880 --> 0:37:30.239
<v Speaker 1>audio along with the work that they were doing with

0:37:30.360 --> 0:37:35.279
<v Speaker 1>video files. It's actually named after the process that they developed,

0:37:35.680 --> 0:37:39.120
<v Speaker 1>called MPEG Audio Layer three. So yes, there was a

0:37:39.200 --> 0:37:42.279
<v Speaker 1>layer one and a layer two. Layer three was a

0:37:42.320 --> 0:37:44.719
<v Speaker 1>refinement of the approach and was the one that was

0:37:44.760 --> 0:37:49.520
<v Speaker 1>actually successful in the market. Now, Brandenburg was working with

0:37:49.600 --> 0:37:53.680
<v Speaker 1>an instructor. Brandenburg was pursuing a PhD

0:37:53.760 --> 0:37:55.880
<v Speaker 1>at the time and trying to come up with a

0:37:55.880 --> 0:37:59.799
<v Speaker 1>practical means of transmitting digital audio across phone lines, and

0:37:59.800 --> 0:38:03.000
<v Speaker 1>in the process he began to experiment with algorithms that

0:38:03.040 --> 0:38:08.360
<v Speaker 1>could take digital audio information and determine which bits are significant.

0:38:08.840 --> 0:38:13.040
<v Speaker 1>Anything that was deemed insignificant could be discarded. So the

0:38:13.120 --> 0:38:16.560
<v Speaker 1>thinking was that information we cannot perceive as human beings

0:38:16.680 --> 0:38:19.799
<v Speaker 1>is worthless. There's no point in preserving it in an

0:38:19.800 --> 0:38:22.880
<v Speaker 1>audio file format. It's just taking up space that we

0:38:22.960 --> 0:38:26.000
<v Speaker 1>can't even perceive when we play it back, So there's

0:38:26.040 --> 0:38:28.960
<v Speaker 1>no reason to replicate it. There's no reason to record it.

0:38:29.080 --> 0:38:32.000
<v Speaker 1>Leave it out, and that way you could compress digital

0:38:32.040 --> 0:38:35.680
<v Speaker 1>audio files. Or to put it another way, if the

0:38:35.719 --> 0:38:38.239
<v Speaker 1>algorithm determined that a sound was outside the range of

0:38:38.280 --> 0:38:41.880
<v Speaker 1>human hearing, it would drop it from the encoding process,

0:38:41.920 --> 0:38:44.640
<v Speaker 1>so you get a sound file much smaller than the

0:38:44.680 --> 0:38:49.239
<v Speaker 1>more accurate representative version. So the lossless version would be

0:38:49.440 --> 0:38:53.360
<v Speaker 1>more accurate to the original sound. But this new version,

0:38:53.480 --> 0:38:56.640
<v Speaker 1>what we would call a lossy version, a compressed file

0:38:57.120 --> 0:39:00.359
<v Speaker 1>would be able to replicate it pretty well if it's

0:39:00.400 --> 0:39:03.640
<v Speaker 1>designed properly, and maybe to a point, if you

0:39:03.680 --> 0:39:06.440
<v Speaker 1>design it well enough that you couldn't tell the difference

0:39:06.480 --> 0:39:10.040
<v Speaker 1>between the two. Uh. That took some time. That was

0:39:10.080 --> 0:39:15.320
<v Speaker 1>not easy to do. So the new file, the new version,

0:39:15.400 --> 0:39:18.560
<v Speaker 1>the compressed one, the lossy format, would only have the

0:39:18.600 --> 0:39:22.280
<v Speaker 1>actual relevant data, and from that point forward, the challenge

0:39:22.360 --> 0:39:26.719
<v Speaker 1>was to determine what are the benchmarks to figure out

0:39:26.840 --> 0:39:30.480
<v Speaker 1>what is relevant versus what is irrelevant, because if you

0:39:30.520 --> 0:39:33.520
<v Speaker 1>lose too much information, you change the quality of the recording,

0:39:33.960 --> 0:39:37.280
<v Speaker 1>meaning it's no longer an accurate representation of the original sound.

0:39:37.880 --> 0:39:41.360
<v Speaker 1>So you might say that any sound below twenty hertz

0:39:41.640 --> 0:39:44.880
<v Speaker 1>isn't relevant because it's below the range of your typical

0:39:45.000 --> 0:39:49.080
<v Speaker 1>human's ability to hear. You might say that anything

0:39:49.080 --> 0:39:54.200
<v Speaker 1>above twenty thousand hertz or twenty kilohertz is irrelevant

0:39:54.280 --> 0:39:59.080
<v Speaker 1>because humans typically can't hear sounds above that frequency. You

0:39:59.160 --> 0:40:02.640
<v Speaker 1>might say that sounds at a certain amplitude or lower

0:40:03.200 --> 0:40:08.040
<v Speaker 1>are irrelevant because they're so quiet that humans wouldn't hear them.
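
NOTE
Editor's sketch, not part of the episode and not the actual MP3 encoder: a toy Python filter for the "relevant versus irrelevant" idea described above.
It assumes the sound has already been broken into (frequency, amplitude) pairs. The function name, the sample data, and the min_amplitude threshold are all hypothetical.
```python
AUDIBLE_LOW_HZ = 20  # lower edge of typical human hearing, as quoted in the episode
AUDIBLE_HIGH_HZ = 20_000  # upper edge of typical human hearing, as quoted in the episode
def keep_relevant(components, min_amplitude):
    """Keep only the (frequency_hz, amplitude) pairs a listener could plausibly hear."""
    return [
        (freq, amp)
        for freq, amp in components
        if AUDIBLE_LOW_HZ <= freq <= AUDIBLE_HIGH_HZ and amp >= min_amplitude
    ]
components = [(10, 0.5), (440, 0.8), (1_000, 0.001), (25_000, 0.3)]
print(keep_relevant(components, min_amplitude=0.01))  # [(440, 0.8)]
```
Only the 440 hertz component survives: the others are too low, too high, or too quiet. A real encoder makes far subtler decisions than this, but the file gets smaller for the same reason.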

0:40:08.239 --> 0:40:11.320
<v Speaker 1>Or you might say that if a certain sound is

0:40:11.360 --> 0:40:14.120
<v Speaker 1>at a lower amplitude and a different sound is at

0:40:14.120 --> 0:40:18.279
<v Speaker 1>a higher amplitude, the higher amplitude sound is drowning out

0:40:18.320 --> 0:40:21.680
<v Speaker 1>the lower amplitude sound, and so we humans don't really

0:40:21.680 --> 0:40:24.799
<v Speaker 1>perceive the lower amplitude sound. This is where we get

0:40:24.800 --> 0:40:28.000
<v Speaker 1>into psychoacoustics. It's not just what we hear, but how

0:40:28.040 --> 0:40:32.200
<v Speaker 1>we perceive the sound itself. And a lot of that

0:40:32.280 --> 0:40:35.520
<v Speaker 1>went into formulating the algorithms to figure out how to

0:40:35.560 --> 0:40:38.480
<v Speaker 1>compress this music in a way where you get a

0:40:38.560 --> 0:40:44.359
<v Speaker 1>recording that represents the original without uh, you know, compromising

0:40:44.360 --> 0:40:46.920
<v Speaker 1>too much and still getting the file size to a

0:40:47.040 --> 0:40:50.640
<v Speaker 1>manageable size. And these are the decisions you have to

0:40:50.680 --> 0:40:53.200
<v Speaker 1>make to figure out which bits of information you keep

0:40:53.200 --> 0:40:57.040
<v Speaker 1>and which ones you ditch. Well, Brandenburg and a team

0:40:57.040 --> 0:40:59.080
<v Speaker 1>were working on refining this approach in the late

0:40:59.120 --> 0:41:02.839
<v Speaker 1>eighties and early nineties. And he said, at one point

0:41:02.880 --> 0:41:05.120
<v Speaker 1>he thought he had nailed it, and then he heard

0:41:05.120 --> 0:41:10.280
<v Speaker 1>an a cappella song. It was Tom's Diner by Suzanne Vega,

0:41:10.800 --> 0:41:14.000
<v Speaker 1>and then he listened to the compressed MP three version

0:41:14.160 --> 0:41:17.520
<v Speaker 1>of that song, using the version of MP three

0:41:17.560 --> 0:41:20.440
<v Speaker 1>that had been developed up to that point, and he said,

0:41:21.120 --> 0:41:25.360
<v Speaker 1>it ruined the song. It trashed it. It sounded terrible.

0:41:25.680 --> 0:41:29.279
<v Speaker 1>He said that other representations of music seemed fine with

0:41:29.360 --> 0:41:32.360
<v Speaker 1>this particular approach, but when they went with this stripped

0:41:32.360 --> 0:41:36.520
<v Speaker 1>down a cappella song with this particular kind of you're in

0:41:36.560 --> 0:41:39.600
<v Speaker 1>the middle of a space, listening to Suzanne Vega sing,

0:41:40.280 --> 0:41:43.440
<v Speaker 1>it ruined her voice, and so the team began to

0:41:43.440 --> 0:41:47.080
<v Speaker 1>tweak the compression algorithms to correct for this problem, and

0:41:47.120 --> 0:41:49.760
<v Speaker 1>it took a lot of work to figure out, Okay, well,

0:41:49.800 --> 0:41:53.279
<v Speaker 1>what are the elements of sound that we messed with

0:41:53.960 --> 0:41:56.920
<v Speaker 1>that have created this issue, and ultimately they were finally

0:41:56.920 --> 0:41:59.440
<v Speaker 1>able to create an MP three file that didn't distort

0:41:59.560 --> 0:42:02.440
<v Speaker 1>or ruin the recording. Brandenburg said he listened to that

0:42:02.520 --> 0:42:05.880
<v Speaker 1>song somewhere between five hundred and a thousand times, and

0:42:05.920 --> 0:42:09.440
<v Speaker 1>then he saw Suzanne Vega perform live and he was

0:42:09.480 --> 0:42:14.520
<v Speaker 1>able to recognize all of those subtle changes in her

0:42:14.600 --> 0:42:18.160
<v Speaker 1>voice because he had paid so close attention to it

0:42:18.280 --> 0:42:22.200
<v Speaker 1>during the process of tweaking this algorithm. He said, Ultimately,

0:42:22.680 --> 0:42:25.200
<v Speaker 1>the real telling thing is he still enjoyed the song,

0:42:26.719 --> 0:42:29.720
<v Speaker 1>which says a lot about him. Me, I can't stand

0:42:29.760 --> 0:42:33.319
<v Speaker 1>that song, but maybe it's just because to me, there's

0:42:33.320 --> 0:42:34.880
<v Speaker 1>a point where it just sounds like someone is just

0:42:34.920 --> 0:42:37.879
<v Speaker 1>singing about what they're doing, and I do that every day.

0:42:38.400 --> 0:42:41.280
<v Speaker 1>No one gave me a record deal. All right, so getting

0:42:41.280 --> 0:42:46.480
<v Speaker 1>back to MP three, they had finalized the file format

0:42:46.520 --> 0:42:49.920
<v Speaker 1>and created the standard, but it was just one of

0:42:50.080 --> 0:42:54.279
<v Speaker 1>several possibilities for encoding audio, and it didn't immediately take off.

0:42:54.320 --> 0:43:01.080
<v Speaker 1>It wasn't immediately adopted by consumers. The team had identified

0:43:01.080 --> 0:43:04.480
<v Speaker 1>the Internet as a possible distribution method for MP

0:43:04.560 --> 0:43:07.839
<v Speaker 1>three files, rather than just over telephone lines. They said, well,

0:43:08.000 --> 0:43:11.080
<v Speaker 1>technically we could send MP threes across the Internet,

0:43:11.480 --> 0:43:16.280
<v Speaker 1>so you could send manageable sized files across this network.

0:43:17.560 --> 0:43:22.280
<v Speaker 1>On July fourteenth, they created the file extension dot MP three.

0:43:23.680 --> 0:43:26.920
<v Speaker 1>Now it would take a little bit longer for software

0:43:26.960 --> 0:43:29.440
<v Speaker 1>to take advantage of this. One of the early programs

0:43:29.480 --> 0:43:33.560
<v Speaker 1>was Winamp, which made MP three decoding accessible, and from

0:43:33.560 --> 0:43:36.920
<v Speaker 1>that point the file format began to take off. To

0:43:37.080 --> 0:43:40.440
<v Speaker 1>follow would be dedicated MP three players and sites that

0:43:40.480 --> 0:43:44.160
<v Speaker 1>allowed people to upload and download compressed audio files, which

0:43:44.280 --> 0:43:50.200
<v Speaker 1>also indicated a rise in piracy. And then in response

0:43:50.280 --> 0:43:52.640
<v Speaker 1>to the rise in piracy, we saw an increase in

0:43:52.800 --> 0:43:56.960
<v Speaker 1>DRM strategies, digital rights management or copy protection

0:43:57.000 --> 0:44:00.319
<v Speaker 1>if you prefer, and that all really ended up

0:44:00.360 --> 0:44:04.640
<v Speaker 1>shaping a lot of the policies and strategies that affect

0:44:04.640 --> 0:44:07.479
<v Speaker 1>the Internet today, So you could say that the MP

0:44:07.640 --> 0:44:11.520
<v Speaker 1>three is one of the reasons why the Internet is

0:44:11.560 --> 0:44:14.200
<v Speaker 1>the way it is right now, and why arguments both

0:44:14.239 --> 0:44:19.399
<v Speaker 1>for and against net neutrality have formulated in certain ways.

0:44:19.440 --> 0:44:21.439
<v Speaker 1>A lot of it is shaped by the MP three.

0:44:22.480 --> 0:44:26.240
<v Speaker 1>So that kind of wraps up this discussion about digital

0:44:26.280 --> 0:44:29.560
<v Speaker 1>audio in general and a little bit on MP three files.

0:44:29.560 --> 0:44:32.560
<v Speaker 1>In the next episode of this series, I will dive

0:44:32.640 --> 0:44:36.040
<v Speaker 1>into a more technical explanation of what is actually going

0:44:36.120 --> 0:44:39.440
<v Speaker 1>on with the MP three compression algorithms. And I bet

0:44:39.520 --> 0:44:44.440
<v Speaker 1>you can't wait to learn all about fast Fourier transforms.

0:44:44.600 --> 0:44:47.480
<v Speaker 1>I know I can't, And like I said, I'll have

0:44:47.600 --> 0:44:50.400
<v Speaker 1>other episodes to sprinkle in between this one and the

0:44:50.440 --> 0:44:53.239
<v Speaker 1>next one and then the third one, so that way

0:44:53.280 --> 0:44:56.239
<v Speaker 1>you won't just get digital audio overload. And if you

0:44:56.280 --> 0:44:59.839
<v Speaker 1>guys have any comments or questions or suggestions for show

0:44:59.880 --> 0:45:03.040
<v Speaker 1>topics or people I should interview, or maybe people

0:45:03.040 --> 0:45:05.520
<v Speaker 1>I should have on as a guest host, shoot 'em

0:45:05.520 --> 0:45:09.240
<v Speaker 1>my way. My email is tech Stuff at how stuff

0:45:09.280 --> 0:45:12.000
<v Speaker 1>works dot com, or you can always drop me a

0:45:12.040 --> 0:45:15.000
<v Speaker 1>line on Facebook or Twitter with the handle tech stuff

0:45:15.239 --> 0:45:18.960
<v Speaker 1>HSW, and I'll talk to you guys again really

0:45:19.000 --> 0:45:25.200
<v Speaker 1>soon. For more on this and thousands of other topics,

0:45:25.440 --> 0:45:36.520
<v Speaker 1>visit how stuff works dot com.