1
00:00:04,160 --> 00:00:07,160
Speaker 1: Get in touch with technology with TechStuff from

2
00:00:07,240 --> 00:00:13,920
Speaker 1: HowStuffWorks.com. Hey there, everyone, this is Jonathan

3
00:00:13,960 --> 00:00:18,279
Speaker 1: Strickland with TechStuff, and today we're gonna tackle a

4
00:00:18,360 --> 00:00:21,600
Speaker 1: subject that I've talked about in the past. Actually, way

5
00:00:21,640 --> 00:00:25,200
Speaker 1: back in two thousand and eight, back when you were

6
00:00:25,200 --> 00:00:27,760
Speaker 1: knee-high to a grasshopper, Chris Pollette and I

7
00:00:27,800 --> 00:00:31,880
Speaker 1: did an episode called How MP three Files Work, and

8
00:00:31,920 --> 00:00:35,440
Speaker 1: we talked about the lossy file format, and we actually

9
00:00:35,479 --> 00:00:38,120
Speaker 1: revisited it in two thousand eleven. We did an episode

10
00:00:38,120 --> 00:00:42,000
Speaker 1: about the iPod and about MP three players, but I

11
00:00:42,000 --> 00:00:44,480
Speaker 1: really thought it would be a good idea to revisit

12
00:00:45,120 --> 00:00:48,640
Speaker 1: MP three files, MP three players, digital audio in general,

13
00:00:48,720 --> 00:00:51,519
Speaker 1: the difference between digital audio and analog and all of

14
00:00:51,520 --> 00:00:54,760
Speaker 1: that history. Uh, to really give a deep dive, because

15
00:00:54,800 --> 00:00:57,200
Speaker 1: back in those days we did really short episodes and

16
00:00:57,240 --> 00:00:59,600
Speaker 1: so we weren't able to give it the full coverage

17
00:00:59,600 --> 00:01:04,360
Speaker 1: that I think it deserved. Um, and we actually reached

18
00:01:04,360 --> 00:01:08,720
Speaker 1: a point in history that I did not anticipate. And

19
00:01:08,760 --> 00:01:11,759
Speaker 1: I am, of course talking about the day when I said,

20
00:01:11,800 --> 00:01:15,120
Speaker 1: you know what, I don't need to carry a smartphone

21
00:01:15,200 --> 00:01:18,120
Speaker 1: and an MP three player. I held out for a

22
00:01:18,120 --> 00:01:20,320
Speaker 1: really long time. You guys who have been long time

23
00:01:20,319 --> 00:01:23,280
Speaker 1: listeners of TechStuff might remember that I really liked

24
00:01:23,360 --> 00:01:26,959
Speaker 1: dedicated devices, Like I really liked having a digital camera,

25
00:01:27,080 --> 00:01:29,760
Speaker 1: and I really liked having an MP three player, and

26
00:01:29,800 --> 00:01:32,319
Speaker 1: I really liked having a phone that was a phone.

27
00:01:33,000 --> 00:01:34,880
Speaker 1: And now I'm like, no, I'm good with just one

28
00:01:34,880 --> 00:01:38,440
Speaker 1: device doing all that kind of thing. So, uh, since

29
00:01:38,440 --> 00:01:41,360
Speaker 1: we've reached that point, the point where our machines are

30
00:01:41,360 --> 00:01:45,400
Speaker 1: sophisticated enough to either have enough storage space to carry

31
00:01:45,440 --> 00:01:50,640
Speaker 1: an impressive music collection, or more likely, as

32
00:01:50,840 --> 00:01:53,520
Speaker 1: things have changed these days, um, access to a

33
00:01:53,560 --> 00:01:57,720
Speaker 1: streaming service where I don't even have stuff stored permanently

34
00:01:58,080 --> 00:02:01,840
Speaker 1: or like in any any you know, lasting format on

35
00:02:01,880 --> 00:02:04,880
Speaker 1: the phone itself. Instead, I'm streaming a file over the

36
00:02:04,920 --> 00:02:08,919
Speaker 1: Internet to listen to dynamically. I thought, why not talk

37
00:02:08,960 --> 00:02:12,000
Speaker 1: about the MP three because who knows, in a few

38
00:02:12,080 --> 00:02:15,760
Speaker 1: years and that might just be a distant memory. So

39
00:02:16,080 --> 00:02:18,840
Speaker 1: this is going to be the first of a three

40
00:02:18,880 --> 00:02:21,600
Speaker 1: part series, and I want to let you guys know,

41
00:02:21,639 --> 00:02:25,120
Speaker 1: I'm not going to record all of these and publish

42
00:02:25,160 --> 00:02:27,440
Speaker 1: them all one right after the other. So it's not

43
00:02:27,480 --> 00:02:30,800
Speaker 1: gonna be MP three Part one, Part two, Part three

44
00:02:30,880 --> 00:02:33,639
Speaker 1: in a row. Uh. In this episode, we're gonna look

45
00:02:33,639 --> 00:02:36,560
Speaker 1: at how digital audio works in general and how it's

46
00:02:36,600 --> 00:02:40,320
Speaker 1: different from analog audio. Uh, and we're also gonna talk

47
00:02:40,320 --> 00:02:43,080
Speaker 1: about how the MP three was created and what it does.

48
00:02:43,639 --> 00:02:46,959
Speaker 1: In the next episode, I'm gonna take a deeper dive

49
00:02:47,480 --> 00:02:51,639
Speaker 1: into how an MP three file works, how it compresses audio.

50
00:02:52,120 --> 00:02:56,120
Speaker 1: It gets really technical. And in the final episode of

51
00:02:56,120 --> 00:02:58,760
Speaker 1: the series, we're gonna explore the history of the MP

52
00:02:58,880 --> 00:03:02,200
Speaker 1: three player and how Apple ended up dominating that space

53
00:03:02,240 --> 00:03:04,200
Speaker 1: for so long, to the point that we have things

54
00:03:04,240 --> 00:03:09,880
Speaker 1: called podcasts. But don't worry, I have other episodes to

55
00:03:09,919 --> 00:03:12,320
Speaker 1: divide up this content. So, like I said, it's not

56
00:03:12,360 --> 00:03:14,560
Speaker 1: all gonna be in a row. I don't want you

57
00:03:14,639 --> 00:03:19,520
Speaker 1: to have a month of MP three related episodes, but

58
00:03:20,200 --> 00:03:23,519
Speaker 1: you know, every couple of episodes, expect one of these.

59
00:03:24,160 --> 00:03:28,720
Speaker 1: It's kind of an interesting subject, I think. So to

60
00:03:28,840 --> 00:03:31,640
Speaker 1: start it all off, we all have to take a

61
00:03:31,720 --> 00:03:34,760
Speaker 1: quick trip to Germany. So anyone who is not in

62
00:03:34,840 --> 00:03:38,960
Speaker 1: Germany get your passport. I was actually in Germany not

63
00:03:39,080 --> 00:03:41,400
Speaker 1: that long ago. I got to visit Berlin and had

64
00:03:41,440 --> 00:03:45,000
Speaker 1: a wonderful time. And in Germany there's a company called

65
00:03:45,160 --> 00:03:48,640
Speaker 1: Fraunhofer-Gesellschaft, and you might wonder, well, what

66
00:03:48,680 --> 00:03:54,360
Speaker 1: does this company do? They think. I joke that my profession,

67
00:03:54,720 --> 00:03:57,160
Speaker 1: that my title that I should put on my business

68
00:03:57,160 --> 00:04:01,200
Speaker 1: card it should say professional smart person. And well, no joke,

69
00:04:01,320 --> 00:04:05,240
Speaker 1: that's what these people are. They they specialize in research

70
00:04:05,360 --> 00:04:10,880
Speaker 1: and development, applied research. It's a whole company that specializes

71
00:04:10,920 --> 00:04:14,760
Speaker 1: in applied research. And it's huge. It encompasses sixty-seven

72
00:04:14,800 --> 00:04:20,200
Speaker 1: institutes and research units across Germany. Well back in the

73
00:04:20,240 --> 00:04:25,880
Speaker 1: eighties, there was a researcher named Karlheinz Brandenburg,

74
00:04:26,440 --> 00:04:33,000
Speaker 1: and Karlheinz made a breakthrough around eighty-seven, uh, and

75
00:04:33,160 --> 00:04:37,480
Speaker 1: came up with this clever idea about encoding audio. He

76
00:04:37,520 --> 00:04:40,239
Speaker 1: was actually working towards creating a way that would allow

77
00:04:40,640 --> 00:04:45,000
Speaker 1: for high audio quality transfer but having a low bit

78
00:04:45,120 --> 00:04:50,400
Speaker 1: rate sampling so that file sizes and transfer times wouldn't

79
00:04:50,440 --> 00:04:52,520
Speaker 1: get out of control. Because you got to remember, this

80
00:04:52,560 --> 00:04:55,599
Speaker 1: is the eighties, this is before the World Wide Web was

81
00:04:55,640 --> 00:04:58,839
Speaker 1: a thing. That wouldn't happen until the early nineties,

82
00:04:59,240 --> 00:05:01,240
Speaker 1: so the Internet is very young. In fact, they weren't

83
00:05:01,240 --> 00:05:03,839
Speaker 1: even looking at the Internet as a method of distribution

84
00:05:03,880 --> 00:05:07,520
Speaker 1: for this particular type of encoded audio. They were looking

85
00:05:07,560 --> 00:05:11,960
Speaker 1: at using this to transmit across telephone lines, so they

86
00:05:12,000 --> 00:05:13,760
Speaker 1: needed to have something that was going to be high

87
00:05:13,839 --> 00:05:18,440
Speaker 1: quality but low space. So what the heck does that mean?

88
00:05:18,520 --> 00:05:22,920
Speaker 1: All right, Well, digital audio and analog audio are very

89
00:05:23,000 --> 00:05:26,920
Speaker 1: different things. So to understand that, we need to look

90
00:05:27,000 --> 00:05:31,000
Speaker 1: at how sound works and how we describe sound, because

91
00:05:31,000 --> 00:05:34,760
Speaker 1: that informs how we can capture sound and replicate those

92
00:05:34,839 --> 00:05:39,120
Speaker 1: qualities digitally. So stick with me. We're gonna go back

93
00:05:39,160 --> 00:05:44,480
Speaker 1: to school for some basic sound science. And this goes

94
00:05:44,560 --> 00:05:48,320
Speaker 1: back to the way sound physically moves through a medium,

95
00:05:48,320 --> 00:05:51,839
Speaker 1: whether that's a solid or through the air or through water.

96
00:05:52,320 --> 00:05:58,640
Speaker 1: Sound is vibration. Now we sense this primarily through hearing

97
00:05:58,680 --> 00:06:01,720
Speaker 1: it, or sometimes feeling it. If it's the right

98
00:06:01,760 --> 00:06:04,720
Speaker 1: frequency and the right amplitude, we can actually feel sound.

99
00:06:05,040 --> 00:06:08,120
Speaker 1: Anyone who stood close to, say, a subwoofer that

100
00:06:08,160 --> 00:06:10,480
Speaker 1: was really blasting out bass notes, you know what I'm

101
00:06:10,520 --> 00:06:14,159
Speaker 1: talking about, You can feel it pressing against you. Well,

102
00:06:14,200 --> 00:06:18,760
Speaker 1: sound travels through the air when molecules vibrate against each other,

103
00:06:19,360 --> 00:06:23,680
Speaker 1: and this creates instances of increased pressure and decreased pressure

104
00:06:24,080 --> 00:06:27,760
Speaker 1: at what is a hyperlocal level. We're not talking about

105
00:06:27,800 --> 00:06:31,000
Speaker 1: weather maps here, We're talking about tiny, little areas. So

106
00:06:31,279 --> 00:06:33,839
Speaker 1: this increase and decrease in pressure is something that we

107
00:06:33,920 --> 00:06:37,679
Speaker 1: can sense as sound. When those changes in pressure affect

108
00:06:37,760 --> 00:06:41,119
Speaker 1: a diaphragm, such as one that's in a microphone or

109
00:06:41,839 --> 00:06:45,919
Speaker 1: maybe your ear drum, for example, it causes the diaphragm

110
00:06:45,960 --> 00:06:50,120
Speaker 1: to actually move. So increased pressure pushes the diaphragm in,

111
00:06:51,080 --> 00:06:56,400
Speaker 1: and decreased pressure doesn't really pull the diaphragm out. I mean,

112
00:06:56,440 --> 00:06:58,680
Speaker 1: you could say it pulls the diaphragm out, but

113
00:06:58,680 --> 00:07:02,680
Speaker 1: to be more accurate, the diaphragm actually pushes outward because

114
00:07:02,720 --> 00:07:05,440
Speaker 1: the pressure on the outside is lower than the pressure

115
00:07:05,480 --> 00:07:07,760
Speaker 1: on the inside. But you get what I'm saying. The

116
00:07:07,880 --> 00:07:12,320
Speaker 1: diaphragm begins to flex inward and outward depending upon

117
00:07:12,680 --> 00:07:16,360
Speaker 1: the amount of pressure that it's encountering. You could

118
00:07:16,360 --> 00:07:18,720
Speaker 1: imagine this being kind of like a drum, not

119
00:07:18,800 --> 00:07:20,960
Speaker 1: an ear drum, but an actual drum and striking it.

120
00:07:21,800 --> 00:07:24,720
Speaker 1: That's the same sort of thing. So sound is the

121
00:07:24,760 --> 00:07:29,280
Speaker 1: fluctuations of pressure, which we can diagram as a wave

122
00:07:29,880 --> 00:07:32,720
Speaker 1: or a wavelength, a waveform, on an X

123
00:07:32,840 --> 00:07:37,760
Speaker 1: Y axis. So the horizontal line, that axis, represents

124
00:07:37,840 --> 00:07:41,320
Speaker 1: time that has passed, and the vertical axis represents the

125
00:07:41,400 --> 00:07:46,200
Speaker 1: amplitude or the volume of the sound wave. The wave

126
00:07:46,320 --> 00:07:49,560
Speaker 1: length of the sound, which is the distance between successive

127
00:07:49,600 --> 00:07:52,800
Speaker 1: points on a wave, such as like the successive crests

128
00:07:52,840 --> 00:07:55,480
Speaker 1: on a wave. That tells you a lot about the frequency.

129
00:07:56,400 --> 00:08:00,520
Speaker 1: So sound moves at a constant rate through a given medium,

130
00:08:00,520 --> 00:08:04,080
Speaker 1: but it moves at different rates through different media. So,

131
00:08:04,120 --> 00:08:06,640
Speaker 1: in other words, it moves at a different speed through a

132
00:08:06,680 --> 00:08:09,880
Speaker 1: solid than it does through air. If the crests of

133
00:08:09,960 --> 00:08:13,080
Speaker 1: each sound wave are really close together, that's a high

134
00:08:13,160 --> 00:08:17,320
Speaker 1: frequency sound. More waves will pass through an arbitrary point

135
00:08:17,560 --> 00:08:21,080
Speaker 1: within a second than waves that are spaced further apart.
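[Editor's sketch, not part of the recording] The relationship described here between crest spacing and frequency can be put into a few lines of Python. This is only an illustration: the speed-of-sound figures are rough, assumed values, and the names `SPEED_OF_SOUND_M_PER_S`, `wavelength_m`, and `frequency_from_wavelength` are made up for this example.

```python
# Rough, assumed speeds of sound in metres per second; real values vary
# with temperature, pressure, and the exact material.
SPEED_OF_SOUND_M_PER_S = {"air": 343.0, "water": 1480.0, "steel": 5960.0}


def wavelength_m(frequency_hz, medium="air"):
    """Distance between successive crests: speed divided by frequency."""
    return SPEED_OF_SOUND_M_PER_S[medium] / frequency_hz


def frequency_from_wavelength(wavelength, medium="air"):
    """Crests passing a fixed point per second: speed divided by wavelength."""
    return SPEED_OF_SOUND_M_PER_S[medium] / wavelength


# Crests that sit closer together mean a higher frequency in the same medium.
for f in (20.0, 440.0, 20000.0):
    print(f"{f:>8.0f} Hz in air -> {wavelength_m(f):.4f} m between crests")
```

Because the speed is fixed for a given medium, halving the crest spacing doubles the frequency, which is the same point the episode makes in words.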

136
00:08:21,440 --> 00:08:24,600
Speaker 1: That would be a lower frequency sound. Higher frequency sounds

137
00:08:24,600 --> 00:08:27,800
Speaker 1: have a higher pitch than lower frequency sounds. So if

138
00:08:27,800 --> 00:08:31,440
Speaker 1: you hold a single note at a constant frequency, you'll

139
00:08:31,440 --> 00:08:34,880
Speaker 1: have what is called a simple harmonic motion. That means

140
00:08:34,920 --> 00:08:38,840
Speaker 1: the vibrations are moving at a constant rate inward and outward.

141
00:08:38,880 --> 00:08:42,400
Speaker 1: The cycle is constant. A tuning fork is a good

142
00:08:42,440 --> 00:08:46,640
Speaker 1: example of this. So if you hear a clear C

143
00:08:46,920 --> 00:08:50,640
Speaker 1: note played on a musical instrument, that could be a

144
00:08:50,679 --> 00:08:53,480
Speaker 1: simple harmonic motion. It won't be, but it could be.

145
00:08:53,600 --> 00:08:55,520
Speaker 1: I'll tell you why it won't be in a minute.

146
00:08:55,840 --> 00:08:59,160
Speaker 1: So the frequency of vibration doesn't change, and so you

147
00:08:59,160 --> 00:09:01,959
Speaker 1: would get this very clear note as a result, And

148
00:09:02,000 --> 00:09:04,800
Speaker 1: if you were to diagram it, you would have very

149
00:09:04,840 --> 00:09:10,040
Speaker 1: regular crests and troughs, all of the same amplitude and

150
00:09:10,120 --> 00:09:13,800
Speaker 1: distance from each other. The frequency and volume would remain constant,

151
00:09:15,040 --> 00:09:17,880
Speaker 1: assuming of course, that you're not trying to change the

152
00:09:17,920 --> 00:09:21,160
Speaker 1: frequency or volume. Now, this is where I point out

153
00:09:21,480 --> 00:09:25,839
Speaker 1: most musical instruments don't produce a single clear note, even

154
00:09:25,880 --> 00:09:30,640
Speaker 1: if played expertly. They actually create several resonant frequencies. So

155
00:09:30,720 --> 00:09:35,319
Speaker 1: every physical object resonates at several different frequencies. You've probably

156
00:09:35,360 --> 00:09:38,960
Speaker 1: seen this in various programs. MythBusters did one about bridges,

157
00:09:39,440 --> 00:09:42,080
Speaker 1: the idea being that if you were to have a

158
00:09:42,080 --> 00:09:44,760
Speaker 1: group of people marching on a bridge at the bridge's

159
00:09:44,800 --> 00:09:48,040
Speaker 1: resonant frequency, it could cause the bridge to start to

160
00:09:48,120 --> 00:09:51,839
Speaker 1: vibrate and swing out of control. Well, there's a reason

161
00:09:51,880 --> 00:09:53,960
Speaker 1: for this. You may have also seen videos of people

162
00:09:54,080 --> 00:09:58,280
Speaker 1: singing a certain note and causing a crystal glass to shatter.

163
00:09:58,880 --> 00:10:02,360
Speaker 1: That's because that crystal glass does have a resonant frequency,

164
00:10:02,400 --> 00:10:04,640
Speaker 1: and if you can hit that resonant frequency at the

165
00:10:04,760 --> 00:10:08,600
Speaker 1: right volume, you can cause the glass to start to deform,

166
00:10:08,720 --> 00:10:11,120
Speaker 1: or the crystal in this case, to deform to a

167
00:10:11,160 --> 00:10:15,120
Speaker 1: point where it loses integrity and it shatters as a result. Well,

168
00:10:16,240 --> 00:10:20,679
Speaker 1: the resonation of an object is dependent upon lots of

169
00:10:20,720 --> 00:10:23,760
Speaker 1: different factors, and in fact, most stuff will resonate at

170
00:10:23,840 --> 00:10:28,240
Speaker 1: different frequencies but at different intensities. Like there might be

171
00:10:28,320 --> 00:10:32,480
Speaker 1: one sweet spot, one specific frequency that will have the

172
00:10:32,559 --> 00:10:37,360
Speaker 1: greatest effect, but other related frequencies may also have an effect.

173
00:10:37,360 --> 00:10:40,720
Speaker 1: It will just be to a lesser extent. Well, if

174
00:10:40,760 --> 00:10:44,200
Speaker 1: you were to pluck a guitar string, just you've tuned

175
00:10:44,200 --> 00:10:46,640
Speaker 1: it to whatever note doesn't matter. Let's say it's you've

176
00:10:46,679 --> 00:10:50,439
Speaker 1: tuned it to G and you play the G

177
00:10:50,679 --> 00:10:53,960
Speaker 1: string on your guitar. Uh, the note that you will

178
00:10:54,000 --> 00:10:57,280
Speaker 1: hear really over all others will be G. That

179
00:10:57,400 --> 00:10:59,240
Speaker 1: is going to be the one that will sound the loudest,

180
00:10:59,280 --> 00:11:03,679
Speaker 1: But it will also play resonant frequencies at a decreased amplitude,

181
00:11:03,720 --> 00:11:06,839
Speaker 1: in other words, of decreased volume, so you still hear

182
00:11:06,880 --> 00:11:09,679
Speaker 1: the intended note above everything else, above all the other

183
00:11:09,679 --> 00:11:14,320
Speaker 1: resonant frequencies. This is called a complex tone, and that

184
00:11:14,360 --> 00:11:18,040
Speaker 1: collection of frequencies and their amplitudes is called the spectrum

185
00:11:18,240 --> 00:11:21,640
Speaker 1: of sound. You get a full spectrum. Now, some of

186
00:11:21,679 --> 00:11:27,640
Speaker 1: the components of that complex tone will be, uh, imperceptible

187
00:11:27,679 --> 00:11:30,360
Speaker 1: to you. They'll be so quiet that you wouldn't

188
00:11:30,440 --> 00:11:33,320
Speaker 1: really notice them. They might affect the overall quality of

189
00:11:33,320 --> 00:11:34,960
Speaker 1: the sound, but in such a subtle way that it

190
00:11:35,000 --> 00:11:38,120
Speaker 1: may be difficult for you to even put it into words.
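[Editor's sketch, not part of the recording] A complex tone like the plucked string described here can be modelled as one loud component plus quieter ones added together. The sketch below rests on assumptions that are not from the episode: the fundamental is set to 196 Hz (roughly an open G string), the quieter components are placed at whole-number multiples of it (an idealized string), and the 1/n amplitude falloff is an arbitrary choice for illustration. The names `partials` and `complex_tone` are hypothetical.

```python
import math

FUNDAMENTAL_HZ = 196.0  # assumed: roughly the open G string on a guitar


def partials(fundamental_hz, count=6):
    """Return (frequency, relative amplitude) pairs for an idealized string.

    Frequencies are whole-number multiples of the fundamental; the 1/n
    amplitude falloff is an assumed shape, not a measured one.
    """
    return [(fundamental_hz * n, 1.0 / n) for n in range(1, count + 1)]


def complex_tone(t_seconds, spectrum):
    """Pressure value at time t: the sum of every component's sine wave."""
    return sum(a * math.sin(2 * math.pi * f * t_seconds) for f, a in spectrum)


spectrum = partials(FUNDAMENTAL_HZ)
for frequency, amplitude in spectrum:
    print(f"{frequency:7.1f} Hz at relative amplitude {amplitude:.2f}")
```

The list printed at the end is a tiny version of the spectrum mentioned above: the first entry dominates, and the later ones get quiet enough that a listener may not pick them out individually.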

191
00:11:38,160 --> 00:11:41,360
Speaker 1: Each of those little components is called a partial. So

192
00:11:41,400 --> 00:11:43,679
Speaker 1: in the example of a guitar string, the partials are

193
00:11:43,720 --> 00:11:48,040
Speaker 1: all integer multiples of the same fundamental frequency, and the sound

194
00:11:48,080 --> 00:11:52,680
Speaker 1: has a harmonic spectrum. But as you get further away

195
00:11:52,760 --> 00:11:57,400
Speaker 1: from that fundamental frequency, the amplitude decreases significantly. So, like

196
00:11:57,440 --> 00:12:01,199
Speaker 1: I said, you get far enough away, they are technically there,

197
00:12:01,360 --> 00:12:05,200
Speaker 1: but they might be imperceptible to you. Now, some sounds

198
00:12:05,240 --> 00:12:09,880
Speaker 1: have frequencies that aren't integer multiples of a fundamental frequency and

199
00:12:09,920 --> 00:12:13,120
Speaker 1: are inharmonic. Uh, certain bells, like if you hear a

200
00:12:13,120 --> 00:12:15,160
Speaker 1: bell ring, you can probably pick out a couple of

201
00:12:15,200 --> 00:12:19,560
Speaker 1: different frequencies there that are not harmonic frequencies. These are

202
00:12:19,679 --> 00:12:23,400
Speaker 1: very complex sounds, and to our perception, if it's complex enough,

203
00:12:23,440 --> 00:12:26,959
Speaker 1: it can seem like there's no single discernible pitch there,

204
00:12:27,080 --> 00:12:31,040
Speaker 1: like there's no fundamental frequency over all the others. If

205
00:12:31,040 --> 00:12:35,320
Speaker 1: it's complex enough, we call it noise. That is the

206
00:12:35,360 --> 00:12:39,440
Speaker 1: technical term. It is noise. Now, the unit we use

207
00:12:39,600 --> 00:12:44,719
Speaker 1: to measure frequency is the hertz, uh, H-E-R

208
00:12:44,840 --> 00:12:49,240
Speaker 1: T-Z. Typical human hearing ranges from twenty hertz, which

209
00:12:49,280 --> 00:12:52,760
Speaker 1: means a wave will pass a given arbitrary point twenty

210
00:12:52,840 --> 00:12:55,640
Speaker 1: times within a second, all the way up to twenty

211
00:12:55,760 --> 00:12:59,040
Speaker 1: kilohertz, which means a wave will pass a particular

212
00:12:59,440 --> 00:13:02,640
Speaker 1: point in time twenty thousand times in a second, or

213
00:13:02,800 --> 00:13:05,560
Speaker 1: particular point on your wave form twenty thousand times in

214
00:13:05,559 --> 00:13:09,559
Speaker 1: the second. And most of our sensitivity tends to be

215
00:13:09,559 --> 00:13:12,920
Speaker 1: between one or two kilohertz up to four or

216
00:13:12,960 --> 00:13:17,320
Speaker 1: five kilohertz. That's generally where we have human voices,

217
00:13:17,800 --> 00:13:20,400
Speaker 1: and we've really gotten good at picking those out of

218
00:13:20,480 --> 00:13:23,160
Speaker 1: over everything else. So our sensitivity of hearing is really

219
00:13:23,200 --> 00:13:26,240
Speaker 1: concentrated between one kilohertz and four kilohertz or

220
00:13:26,400 --> 00:13:30,680
Speaker 1: two and five depending upon whom you ask. Now we

221
00:13:30,720 --> 00:13:34,040
Speaker 1: get back over to amplitude. That is referring to the

222
00:13:34,080 --> 00:13:36,800
Speaker 1: height of the wave. It also refers to the volume

223
00:13:37,080 --> 00:13:41,960
Speaker 1: the loudness of something. Amplitude means bigness, So how big

224
00:13:42,160 --> 00:13:45,400
Speaker 1: is the sound? Well, the greater the amplitude, the louder

225
00:13:45,440 --> 00:13:48,480
Speaker 1: it is, and amplitudes can have an enormous range and

226
00:13:48,520 --> 00:13:52,480
Speaker 1: affect how we perceive sounds. So, for example, take a

227
00:13:52,559 --> 00:13:56,840
Speaker 1: really complicated classical piece of music. It's just easy to

228
00:13:56,920 --> 00:14:00,319
Speaker 1: explain it in that term. You might have a stretch

229
00:14:01,080 --> 00:14:03,640
Speaker 1: in that classical piece of music in which all the

230
00:14:03,720 --> 00:14:06,920
Speaker 1: instruments are more or less playing at a similar volume,

231
00:14:07,000 --> 00:14:10,720
Speaker 1: so the sound from each instrument section has a similar amplitude.

232
00:14:11,240 --> 00:14:14,240
Speaker 1: But then there might be one segment where an instrument

233
00:14:14,280 --> 00:14:18,599
Speaker 1: group or maybe even a single soloist has an increased

234
00:14:18,600 --> 00:14:21,640
Speaker 1: amplitude and increased volume. It rises over the rest of

235
00:14:21,680 --> 00:14:25,480
Speaker 1: the orchestra, and that peak of the amplitude is called

236
00:14:25,520 --> 00:14:29,720
Speaker 1: the attack of the sound, and the entire range of

237
00:14:29,760 --> 00:14:34,280
Speaker 1: amplitudes is called the amplitude envelope. Now this is important

238
00:14:34,320 --> 00:14:38,120
Speaker 1: when we get to MP threes because the way

239
00:14:38,120 --> 00:14:42,040
Speaker 1: we perceive these sounds, uh that that has everything to

240
00:14:42,120 --> 00:14:44,720
Speaker 1: do with the way the MP three was designed. The

241
00:14:44,760 --> 00:14:47,720
Speaker 1: whole point of the MP three was to try and

242
00:14:47,760 --> 00:14:53,040
Speaker 1: create a small file size to represent what we can

243
00:14:53,120 --> 00:14:56,080
Speaker 1: hear and kind of ignore everything else. But we'll get

244
00:14:56,120 --> 00:14:58,640
Speaker 1: to that in a little bit more time. So

245
00:14:59,160 --> 00:15:01,880
Speaker 1: this is really interesting to me. If you take a

246
00:15:02,000 --> 00:15:07,920
Speaker 1: sound and you double its amplitude, you increase the amplitude

247
00:15:07,920 --> 00:15:11,760
Speaker 1: by twofold, a listener would not necessarily feel that the

248
00:15:11,800 --> 00:15:16,960
Speaker 1: sound is twice as loud. Human hearing is incredibly subjective,

249
00:15:17,560 --> 00:15:21,640
Speaker 1: and typically for most listeners, it would require much more

250
00:15:22,440 --> 00:15:26,320
Speaker 1: than doubling the sound's amplitude for them to feel that

251
00:15:26,440 --> 00:15:29,960
Speaker 1: the sound itself was twice as loud. This perception of

252
00:15:30,040 --> 00:15:32,480
Speaker 1: volume is important when we get to the lossy formats

253
00:15:32,480 --> 00:15:37,440
Speaker 1: for audio files. Now I've given you all this information,

254
00:15:37,640 --> 00:15:40,600
Speaker 1: and I know everyone is probably thinking, you know, I

255
00:15:40,680 --> 00:15:44,040
Speaker 1: learned this in primary school, elementary school. All of this

256
00:15:44,120 --> 00:15:47,360
Speaker 1: is really familiar to me, and you're maybe rolling your

257
00:15:47,360 --> 00:15:50,400
Speaker 1: eyes because it's so basic. But I think it's important

258
00:15:50,840 --> 00:15:54,120
Speaker 1: to have that refresher so that you can understand the

259
00:15:54,160 --> 00:15:58,800
Speaker 1: difference between sound as we experience it and sound as

260
00:15:58,880 --> 00:16:03,520
Speaker 1: the way we hold it digitally and replicate it digitally.

261
00:16:04,400 --> 00:16:07,400
Speaker 1: For one thing, this illustrates how sound in the real

262
00:16:07,440 --> 00:16:12,200
Speaker 1: world is a continuum. It's a continuum both in frequency

263
00:16:12,240 --> 00:16:17,800
Speaker 1: and amplitude. You can have sound changing in frequency very

264
00:16:17,800 --> 00:16:22,080
Speaker 1: smoothly from one pitch to another. You can also have

265
00:16:22,200 --> 00:16:26,800
Speaker 1: sound increase or decrease in amplitude in a very smooth way.

266
00:16:26,920 --> 00:16:31,800
Speaker 1: And it is continuous, it's unbroken, it can have smooth transitions.

267
00:16:31,800 --> 00:16:34,800
Speaker 1: And these qualities provide challenges when we want to describe

268
00:16:34,840 --> 00:16:40,520
Speaker 1: something digitally, because at the heart of digital information is

269
00:16:40,960 --> 00:16:45,680
Speaker 1: the bit, the basic unit of information. It is a

270
00:16:45,800 --> 00:16:49,440
Speaker 1: unit of information that only has two states zero or

271
00:16:49,560 --> 00:16:53,720
Speaker 1: one, essentially off or on. When you get down

272
00:16:53,760 --> 00:16:58,600
Speaker 1: to defining information in just two states, then you start

273
00:16:58,640 --> 00:17:02,320
Speaker 1: to look at something that's continuous and you realize this

274
00:17:02,400 --> 00:17:04,359
Speaker 1: is going to be a challenge. How do I describe

275
00:17:04,400 --> 00:17:10,840
Speaker 1: a continuous experience in very discrete amounts of information? And

276
00:17:10,920 --> 00:17:15,520
Speaker 1: that's when we get to the methodology we've developed to

277
00:17:15,920 --> 00:17:19,359
Speaker 1: digitally encode sound. I'm going to get into that in

278
00:17:19,640 --> 00:17:22,880
Speaker 1: just a minute, but before I do that, let's take

279
00:17:22,880 --> 00:17:34,520
Speaker 1: a quick break to thank our sponsor. All right, let's

280
00:17:34,560 --> 00:17:38,800
Speaker 1: get back into it. So we've talked about the nature

281
00:17:38,840 --> 00:17:42,120
Speaker 1: of sound. Analog sound, by the way, tries to replicate

282
00:17:42,359 --> 00:17:45,600
Speaker 1: exactly what we would experience in nature. It tries to

283
00:17:45,600 --> 00:17:51,200
Speaker 1: create this continuous experience, so you get these smooth waves

284
00:17:51,240 --> 00:17:56,800
Speaker 1: of frequencies and amplitudes. And that's why some people argue

285
00:17:56,880 --> 00:18:02,760
Speaker 1: that analog styles of sound recordings are superior

286
00:18:02,840 --> 00:18:07,399
Speaker 1: to digital ones. I don't necessarily think they're right, but

287
00:18:07,560 --> 00:18:12,280
Speaker 1: they often feel that way. So something like a vinyl album,

288
00:18:12,320 --> 00:18:16,080
Speaker 1: which is an analog format of digital or sorry, an

289
00:18:16,080 --> 00:18:20,240
Speaker 1: analog format of music storage I should say sound storage. Uh,

290
00:18:20,280 --> 00:18:22,960
Speaker 1: they think that that is superior to say a CD,

291
00:18:23,280 --> 00:18:28,280
Speaker 1: which is a digital storage format. Uh. And who's to say.

292
00:18:28,359 --> 00:18:32,399
Speaker 1: I mean, like, if your sense of hearing is incredibly

293
00:18:32,680 --> 00:18:36,040
Speaker 1: well tuned, you might be able to pick up on

294
00:18:36,080 --> 00:18:40,080
Speaker 1: some differences. Or if someone did a really terrible job

295
00:18:40,640 --> 00:18:45,960
Speaker 1: encoding music digitally, then that might reveal itself to you

296
00:18:46,000 --> 00:18:48,760
Speaker 1: as well. Uh. But this is one of those things

297
00:18:48,760 --> 00:18:50,920
Speaker 1: that I think a lot of people feel they can

298
00:18:50,920 --> 00:18:52,720
Speaker 1: tell the difference, but if they would do a double

299
00:18:52,760 --> 00:18:57,280
Speaker 1: blind test, they might be surprised at how difficult it is.

300
00:18:57,760 --> 00:19:01,160
Speaker 1: If everything's working the way it should, then

301
00:19:01,400 --> 00:19:05,960
Speaker 1: there shouldn't be a perceptible difference. At any rate, digital

302
00:19:05,960 --> 00:19:12,320
Speaker 1: audio has two really important factors, sample rate and bit depth,

303
00:19:13,119 --> 00:19:15,600
Speaker 1: or to another extent, bit rate. We'll talk about bit

304
00:19:15,720 --> 00:19:20,240
Speaker 1: rate as well. So the sample rate refers to how

305
00:19:20,280 --> 00:19:23,840
Speaker 1: many times you reference an analog sound to create the

306
00:19:23,920 --> 00:19:27,720
Speaker 1: digital version. So sound, like I said, is uninterrupted. In

307
00:19:27,760 --> 00:19:32,840
Speaker 1: the analog world, you've got that nice waveform.

308
00:19:32,880 --> 00:19:36,000
Speaker 1: That's not how the digital world works.

309
00:19:36,080 --> 00:19:39,280
Speaker 1: In the digital world, we have to describe that sound in a

310
00:19:39,359 --> 00:19:45,560
Speaker 1: series of discrete snippets of sound. It's probably easiest to

311
00:19:45,600 --> 00:19:51,800
Speaker 1: describe this with an analogy to movies on film. If

312
00:19:51,840 --> 00:19:55,320
Speaker 1: you work with film, like you're creating a movie on film,

313
00:19:55,800 --> 00:19:58,960
Speaker 1: then you know that you're not looking at a real

314
00:19:59,200 --> 00:20:02,200
Speaker 1: moving picture when you see the film played out at

315
00:20:02,200 --> 00:20:05,480
Speaker 1: the cinema. Instead, what you're looking at is a series

316
00:20:05,600 --> 00:20:10,120
Speaker 1: of photographs. If you take a film strip and you

317
00:20:10,160 --> 00:20:14,200
Speaker 1: look at it under a light, you'll see it's one

318
00:20:14,320 --> 00:20:18,720
Speaker 1: after another photograph. It's just a series of pictures. It's

319
00:20:18,720 --> 00:20:20,880
Speaker 1: only when you play them back at the right speed

320
00:20:21,480 --> 00:20:23,760
Speaker 1: and you project them onto a screen that you get the

321
00:20:23,840 --> 00:20:28,480
Speaker 1: illusion of continuous motion. But it's not really continuous. It's

322
00:20:28,520 --> 00:20:31,720
Speaker 1: just this series of photographs played at twenty four frames

323
00:20:31,760 --> 00:20:36,800
Speaker 1: per second in the case of actual film. So that

324
00:20:37,000 --> 00:20:40,119
Speaker 1: ends up being very analogous to the way we encode

325
00:20:40,160 --> 00:20:44,000
Speaker 1: digital audio. You take the analog recording and you take

326
00:20:44,280 --> 00:20:49,800
Speaker 1: snapshots of sound. The more frequently you take those snapshots,

327
00:20:50,200 --> 00:20:52,440
Speaker 1: the higher your sample rate. So in other words, if

328
00:20:52,440 --> 00:20:55,600
Speaker 1: you did one a second, your sample rate would be awful.

329
00:20:56,320 --> 00:20:58,560
Speaker 1: You would have a sample rate of one. But the

330
00:20:58,640 --> 00:21:01,400
Speaker 1: higher the sample rate, the closer your digital representation

331
00:21:01,440 --> 00:21:05,240
Speaker 1: will be to the frequency in the analog sound format. Actually,

332
00:21:05,720 --> 00:21:07,960
Speaker 1: what's really important to remember is that your sample rate

333
00:21:08,000 --> 00:21:10,399
Speaker 1: has to be about twice, actually does have to be

334
00:21:10,480 --> 00:21:14,879
Speaker 1: twice what the highest frequency sound is in your recording.

335
00:21:16,359 --> 00:21:20,119
Speaker 1: It has to be because if it's not, it cannot

336
00:21:20,280 --> 00:21:25,879
Speaker 1: encode that sound accurately. It's kind of interesting and you

337
00:21:25,960 --> 00:21:27,960
Speaker 1: might wonder, how do we take these snapshots in the

338
00:21:27,960 --> 00:21:31,080
Speaker 1: first place. Well, if you're capturing audio, let's say we're

339
00:21:31,119 --> 00:21:34,560
Speaker 1: recording to digital, So we've got a microphone set up,

340
00:21:34,920 --> 00:21:39,240
Speaker 1: and we're recording to a digital media storage. Like let's

341
00:21:39,240 --> 00:21:41,480
Speaker 1: just say we're recording straight to someone's hard drive. So

342
00:21:41,520 --> 00:21:44,720
Speaker 1: we're talking into a microphone recording to a hard drive.

343
00:21:45,640 --> 00:21:49,400
Speaker 1: So you're using an analog microphone, let's say. You would

344
00:21:49,400 --> 00:21:53,720
Speaker 1: need an analog to digital converter. Now, this particular component

345
00:21:54,000 --> 00:21:58,719
Speaker 1: can receive discrete voltages from another device like your microphone.

346
00:21:59,000 --> 00:22:05,720
Speaker 1: So your microphone is converting sound into uh differences in voltage.

347
00:22:05,960 --> 00:22:08,840
Speaker 1: That's essentially how it communicates. So that it can then

348
00:22:09,000 --> 00:22:12,040
Speaker 1: send that to some other element. In this case, it's

349
00:22:12,080 --> 00:22:15,679
Speaker 1: sending it to the analog to digital converter so

350
00:22:15,720 --> 00:22:18,359
Speaker 1: that it can be stored digitally on your hard drive.

351
00:22:19,400 --> 00:22:26,560
Speaker 1: So this analog to digital converter references or samples the discrete

352
00:22:26,640 --> 00:22:30,199
Speaker 1: voltage many times every second in order to create a

353
00:22:30,240 --> 00:22:34,720
Speaker 1: digital representation of the analog sound. It converts the voltages

354
00:22:34,800 --> 00:22:39,360
Speaker 1: into numbers in a process called quantization, and we express

355
00:22:39,400 --> 00:22:42,439
Speaker 1: those numbers in bits. So these are zeros and ones.

356
00:22:43,000 --> 00:22:45,720
Speaker 1: When you want to play the digital audio, a digital

357
00:22:45,760 --> 00:22:49,760
Speaker 1: to analog converter does the same process in reverse. So

358
00:22:50,040 --> 00:22:53,720
Speaker 1: it takes this digital information, these zeros and ones and

359
00:22:53,840 --> 00:22:57,520
Speaker 1: converts it into a series of discrete voltages, which then

360
00:22:57,800 --> 00:23:01,480
Speaker 1: can be amplified and sent to a speaker and create sound.
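
As a rough sketch of the sampling and quantization steps just described, the Python below samples a sine wave standing in for the microphone voltage, rounds each sample to one of 2^bit_depth integer codes, and then maps those codes back to voltages the way a digital to analog converter would. The function names, the -1.0 to 1.0 voltage range, and the one kilohertz test tone are assumptions made for illustration; real converters do this in hardware and the episode does not specify any of these details.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, bit_depth, duration_s):
    """Sample a sine wave (a stand-in for the microphone voltage) and
    quantize each sample to an integer code between 0 and 2**bit_depth - 1."""
    levels = 2 ** bit_depth
    count = int(sample_rate_hz * duration_s)
    codes = []
    for n in range(count):
        t = n / sample_rate_hz                          # time of this snapshot
        voltage = math.sin(2 * math.pi * freq_hz * t)   # assumed range -1.0 .. 1.0
        code = round((voltage + 1.0) / 2.0 * (levels - 1))
        codes.append(code)
    return codes


def to_voltages(codes, bit_depth):
    """Reverse step: turn integer codes back into voltages in -1.0 .. 1.0."""
    levels = 2 ** bit_depth
    return [code / (levels - 1) * 2.0 - 1.0 for code in codes]


if __name__ == "__main__":
    # One millisecond of a 1 kHz tone at the CD sample rate and bit depth.
    codes = sample_and_quantize(1000, 44100, 16, 0.001)
    print(len(codes), "samples, first few codes:", codes[:4])
    print("first few voltages:", to_voltages(codes[:4], 16))
```

The rounding step is where precision is lost: each reconstructed voltage can differ from the original by up to half a quantization step.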

361
00:23:02,720 --> 00:23:05,280
Speaker 1: So all of that's really important. But now let's

362
00:23:05,320 --> 00:23:07,879
Speaker 1: talk about some concrete examples. And the best way to

363
00:23:07,920 --> 00:23:11,199
Speaker 1: do this is to go with compact discs. Because we

364
00:23:11,280 --> 00:23:15,080
Speaker 1: have a standard sample rate for compact discs, and that

365
00:23:15,240 --> 00:23:18,520
Speaker 1: standard sample rate is forty four point one kilohertz

366
00:23:18,600 --> 00:23:22,119
Speaker 1: to create CD quality audio. That means that the audio

367
00:23:22,240 --> 00:23:27,960
Speaker 1: is sampled forty four thousand, one hundred times every second.

368
00:23:28,840 --> 00:23:30,800
Speaker 1: But wait, I hear you say, the range of human

369
00:23:30,840 --> 00:23:33,280
Speaker 1: hearing, you said, only goes from twenty hertz to twenty

370
00:23:33,359 --> 00:23:36,240
Speaker 1: kilohertz. If it only goes up to twenty kilohertz,

371
00:23:36,240 --> 00:23:39,000
Speaker 1: why are you sampling at forty four thousand, one hundred

372
00:23:39,119 --> 00:23:43,520
Speaker 1: times every second? If it's twenty thousand times a second

373
00:23:43,560 --> 00:23:46,680
Speaker 1: for the frequency, why go up to forty four thousand,

374
00:23:46,760 --> 00:23:49,359
Speaker 1: one hundred? Is there some relationship between that and the

375
00:23:49,400 --> 00:23:52,640
Speaker 1: CD sample rate? And the answer is yes. So there

376
00:23:52,760 --> 00:23:57,959
Speaker 1: is a theorem called the Nyquist-Shannon sampling theorem, and

377
00:23:58,040 --> 00:24:00,719
Speaker 1: that states that the sample rate must be at least twice the

378
00:24:00,760 --> 00:24:03,960
Speaker 1: maximum frequency of a recording in order to describe the

379
00:24:04,000 --> 00:24:08,200
Speaker 1: frequency properly. So the general thought is the maximum frequency

380
00:24:08,240 --> 00:24:10,879
Speaker 1: most humans can hear is twenty kilohertz. And for that reason,

381
00:24:10,920 --> 00:24:13,760
Speaker 1: Philips and Sony, when they were working to create the

382
00:24:13,920 --> 00:24:17,919
Speaker 1: CD format to make it a standard, they decided on

383
00:24:17,960 --> 00:24:20,840
Speaker 1: forty four point one kilohertz as that standard sample

384
00:24:20,920 --> 00:24:23,359
Speaker 1: rate for CD audio. It was more than double

385
00:24:23,400 --> 00:24:26,000
Speaker 1: the top frequency generally considered to be in the upper

386
00:24:26,080 --> 00:24:29,120
Speaker 1: level of human hearing. But what happens if you were

387
00:24:29,160 --> 00:24:32,360
Speaker 1: to lower the sampling rate? What if you didn't sample

388
00:24:32,440 --> 00:24:37,520
Speaker 1: at that rate? What if you sampled at, let's say, sixteen kilohertz,

389
00:24:37,560 --> 00:24:41,040
Speaker 1: so sixteen thousand times a second you sample it. Well,

390
00:24:41,359 --> 00:24:43,520
Speaker 1: that means you would only be able to record and

391
00:24:43,560 --> 00:24:47,119
Speaker 1: replicate any sound with a frequency up to eight kilohertz

392
00:24:47,200 --> 00:24:52,240
Speaker 1: or less, so eight thousand hertz or less. But

393
00:24:52,400 --> 00:24:55,560
Speaker 1: if you had any sound that was greater than eight

394
00:24:55,600 --> 00:24:59,879
Speaker 1: thousand hertz or eight kilohertz, anything higher than that,

395
00:25:00,000 --> 00:25:04,360
Speaker 1: it would be folded down to fit below the eight

396
00:25:04,440 --> 00:25:08,160
Speaker 1: kilohertz limit. Perceptually, that means the sounds you would

397
00:25:08,200 --> 00:25:11,159
Speaker 1: hear in the playback could include frequencies that were not

398
00:25:11,320 --> 00:25:16,120
Speaker 1: present in the original performance of that sound. So let's

399
00:25:16,119 --> 00:25:20,560
Speaker 1: say that I'm using a sample rate of sixteen uh,

400
00:25:20,600 --> 00:25:24,359
Speaker 1: you know, kilohertz, and someone is playing a musical

401
00:25:24,400 --> 00:25:27,160
Speaker 1: instrument and they play a note that's at a nine

402
00:25:27,200 --> 00:25:32,720
Speaker 1: kilohertz frequency. Well, because I'm sampling at sixteen kilohertz,

403
00:25:33,320 --> 00:25:37,639
Speaker 1: my limit for frequencies is eight kilohertz. If you

404
00:25:37,680 --> 00:25:40,560
Speaker 1: play something at nine kilohertz, what happens is

405
00:25:40,880 --> 00:25:45,240
Speaker 1: the recording seems to fold the sound back, and it

406
00:25:45,359 --> 00:25:49,840
Speaker 1: folds it back by the same amount that the sound

407
00:25:49,880 --> 00:25:54,960
Speaker 1: goes over the sample rate, or rather the Nyquist limit,

408
00:25:55,000 --> 00:25:57,560
Speaker 1: I should say, not the sample rate itself but the Nyquist limit,

409
00:25:58,720 --> 00:26:03,720
Speaker 1: so a nine kilohertz sound is played. My limit is eight

410
00:26:03,800 --> 00:26:06,960
Speaker 1: kilohertz. Well, nine kilohertz is one kilohertz

411
00:26:06,960 --> 00:26:10,000
Speaker 1: more than eight, so it folds it back and the

412
00:26:10,040 --> 00:26:13,320
Speaker 1: sound you would hear on the recording would be seven

413
00:26:13,400 --> 00:26:17,000
Speaker 1: kilohertz. So the original sound is nine kilohertz,

414
00:26:17,080 --> 00:26:21,480
Speaker 1: the playback sound is seven kilohertz, and you would

415
00:26:21,520 --> 00:26:25,639
Speaker 1: hear something recorded that wasn't actually played. That's why you

416
00:26:25,680 --> 00:26:28,800
Speaker 1: have to have a really high sample rate so that

417
00:26:28,840 --> 00:26:32,679
Speaker 1: you don't have these instances where sound gets folded back

418
00:26:33,480 --> 00:26:38,359
Speaker 1: into the frequency range, because otherwise what you are hearing

419
00:26:38,520 --> 00:26:42,480
Speaker 1: is not an accurate representation of what was actually generated,

420
00:26:42,760 --> 00:26:46,919
Speaker 1: what you were trying to record. This whole phenomenon, by

421
00:26:46,920 --> 00:26:51,800
Speaker 1: the way, is called fold over or sometimes aliasing.
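
The fold-over arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule described here for a single pure tone; the function names are made up for this example, and it assumes no filtering happens before sampling (real recording chains typically filter out content above the Nyquist limit first).

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def aliased_frequency(freq_hz, sample_rate_hz):
    """Frequency that ends up in the recording for a pure tone at freq_hz.

    Tones at or below the Nyquist limit come through unchanged. Tones above
    it are folded back by the amount they exceed the limit.
    """
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded


if __name__ == "__main__":
    rate = 16000
    print("Nyquist limit:", nyquist_limit(rate))
    for tone in (5000, 8000, 9000):
        print(tone, "Hz is recorded as", aliased_frequency(tone, rate), "Hz")
```

With a sixteen kilohertz sample rate, the nine kilohertz tone from the example comes back as seven kilohertz, while a five kilohertz tone is left alone.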

422
00:26:51,840 --> 00:26:54,800
Speaker 1: That's sample rate. But then we've got bit depth. Now,

423
00:26:54,840 --> 00:26:59,080
Speaker 1: this is all about measuring the volume or amplitude of

424
00:26:59,119 --> 00:27:02,359
Speaker 1: a sound. So you have a range. You just make

425
00:27:02,400 --> 00:27:06,240
Speaker 1: an arbitrary range to say, like we're gonna go quietest

426
00:27:06,280 --> 00:27:09,199
Speaker 1: to loudest, and you just define what that range is.

427
00:27:09,400 --> 00:27:12,120
Speaker 1: It could literally be any range. Let's say you say

428
00:27:12,200 --> 00:27:15,960
Speaker 1: zero to one hundred. Zero is dead silence, no sound

429
00:27:16,000 --> 00:27:19,560
Speaker 1: at all. One hundred is as loud as the sound

430
00:27:19,720 --> 00:27:24,160
Speaker 1: ever gets. It's the peak volume of sound. That means

431
00:27:24,200 --> 00:27:28,560
Speaker 1: you can describe all the different volumes within that recording

432
00:27:29,119 --> 00:27:33,000
Speaker 1: as a number between zero and one hundred. But let's

433
00:27:33,000 --> 00:27:36,320
Speaker 1: say you take that same recording and instead of making

434
00:27:36,320 --> 00:27:39,679
Speaker 1: the range zero to one hundred, you say it's zero

435
00:27:39,760 --> 00:27:43,919
Speaker 1: to two thousand. You haven't made the volume louder. The

436
00:27:44,000 --> 00:27:47,080
Speaker 1: volume is still the exact same as it was when

437
00:27:47,119 --> 00:27:49,879
Speaker 1: you called the range zero to one hundred. But what

438
00:27:50,000 --> 00:27:53,720
Speaker 1: you have done is added more units. You have created

439
00:27:53,880 --> 00:27:58,880
Speaker 1: more precise steps between absolute silence and as loud as

440
00:27:58,920 --> 00:28:02,720
Speaker 1: it gets. So you've just increased the size of the

441
00:28:02,800 --> 00:28:04,760
Speaker 1: range so that you can be more precise in the

442
00:28:04,800 --> 00:28:09,280
Speaker 1: differences in volume. And this is really important. So let's

443
00:28:09,320 --> 00:28:11,800
Speaker 1: say that you've got a sound that you rank at

444
00:28:11,880 --> 00:28:15,440
Speaker 1: seventy eight and another sound that you rank at seventy nine,

445
00:28:16,080 --> 00:28:18,920
Speaker 1: and that's gonna be the same for both of these ranges. Uh,

446
00:28:19,040 --> 00:28:21,880
Speaker 1: just two different examples, actually. So you've got your zero

447
00:28:21,880 --> 00:28:25,840
Speaker 1: to one hundred range, and a seventy eight would be seventy

448
00:28:25,840 --> 00:28:29,760
Speaker 1: eight percent of the loudest sound in the entire recording,

449
00:28:30,280 --> 00:28:33,159
Speaker 1: and a seventy nine would be seventy nine percent of

450
00:28:33,200 --> 00:28:36,960
Speaker 1: the loudest sound in the entire recording. That's actually a

451
00:28:36,960 --> 00:28:39,760
Speaker 1: pretty hefty jump. But let's say we instead went with

452
00:28:39,800 --> 00:28:42,920
Speaker 1: that zero to two thousand range and you still had

453
00:28:42,920 --> 00:28:47,160
Speaker 1: seventy eight and seventy nine. Well, seventy eight would represent

454
00:28:47,280 --> 00:28:50,840
Speaker 1: three point nine percent of the full volume and seventy

455
00:28:50,920 --> 00:28:54,480
Speaker 1: nine would represent three point nine five percent of the

456
00:28:54,520 --> 00:28:57,640
Speaker 1: full volume. In other words, you'd be able to mark

457
00:28:57,960 --> 00:29:02,280
Speaker 1: much more subtle differences in volume, and that means you

458
00:29:02,280 --> 00:29:06,680
Speaker 1: can have more nuance in your recording. And since we're

459
00:29:06,680 --> 00:29:09,800
Speaker 1: talking about a natural sound to start off with, so

460
00:29:09,840 --> 00:29:12,360
Speaker 1: you're taking a natural sound and you're trying to digitize it.

461
00:29:13,160 --> 00:29:17,800
Speaker 1: Smooth changes in amplitude are possible in natural sound. Using

462
00:29:17,800 --> 00:29:21,000
Speaker 1: a broader range to describe the volume is best if

463
00:29:21,000 --> 00:29:25,320
Speaker 1: you want to get an accurate representation or resolution of

464
00:29:25,360 --> 00:29:28,880
Speaker 1: that sound. Going back to that zero to one hundred range,

465
00:29:29,200 --> 00:29:32,240
Speaker 1: changes in volume would be more chunky. Two sounds that

466
00:29:32,280 --> 00:29:36,440
Speaker 1: have slight differences in amplitude would end up being defined

467
00:29:36,520 --> 00:29:40,680
Speaker 1: as being identical because you wouldn't have the precision. You know,

468
00:29:40,720 --> 00:29:42,760
Speaker 1: you couldn't say this one's seventy eight and a half.
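
To make the zero to one hundred versus zero to two thousand comparison concrete, here is a small Python sketch. It assumes loudness is expressed as a fraction of the peak (0.0 for silence, 1.0 for the loudest sound) and that each value is rounded to the nearest whole step; the function names and the two sample loudness values are invented for illustration.

```python
def to_step(loudness_fraction, top_of_range):
    """Round a loudness (0.0 = silence, 1.0 = peak) to the nearest whole
    step on a scale that runs from 0 to top_of_range."""
    return round(loudness_fraction * top_of_range)


def step_percent(top_of_range):
    """Size of one step, as a percentage of full volume."""
    return 100 / top_of_range


if __name__ == "__main__":
    quiet, slightly_louder = 0.780, 0.783   # two nearly identical sounds
    for top in (100, 2000):
        print(
            "range 0 to", top,
            "| one step =", step_percent(top), "percent",
            "| steps:", to_step(quiet, top), to_step(slightly_louder, top),
        )
```

On the coarse scale both sounds land on the same step, so the difference between them is lost; on the finer scale they land on different steps.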

469
00:29:43,240 --> 00:29:45,520
Speaker 1: It would either be seventy eight or seventy nine. So

470
00:29:45,600 --> 00:29:48,960
Speaker 1: you could have two sounds where, with greater precision,

471
00:29:49,120 --> 00:29:52,680
Speaker 1: you could tell the difference between their volumes. But if

472
00:29:52,720 --> 00:29:57,240
Speaker 1: you have that lower, that more shallow bit depth, you

473
00:29:57,240 --> 00:29:58,800
Speaker 1: wouldn't be able to tell the difference of it. You

474
00:29:58,840 --> 00:30:01,840
Speaker 1: would lose that nuance, that subtlety. This is part

475
00:30:01,880 --> 00:30:06,000
Speaker 1: of the reason why people say, like a lot of

476
00:30:06,040 --> 00:30:10,480
Speaker 1: the modern music has uh lower ranges and changes in volume,

477
00:30:10,600 --> 00:30:14,480
Speaker 1: like the loudest loud parts and the softest soft parts.

478
00:30:14,520 --> 00:30:18,480
Speaker 1: That range has decreased over time, which a lot of

479
00:30:18,480 --> 00:30:21,320
Speaker 1: people have argued has meant that music has gotten less

480
00:30:21,880 --> 00:30:27,120
Speaker 1: complex and therefore, in some minds, less interesting. That's on

481
00:30:27,160 --> 00:30:31,160
Speaker 1: a related uh kind of philosophy to what I'm talking

482
00:30:31,200 --> 00:30:36,640
Speaker 1: about here. So you want to have those smaller steps

483
00:30:36,720 --> 00:30:40,760
Speaker 1: between each unit so you can create greater resolution, more

484
00:30:40,880 --> 00:30:46,360
Speaker 1: smoothness to the recorded audio. And it's actually the bit

485
00:30:46,480 --> 00:30:49,480
Speaker 1: rate in CD audio that will help make the sound

486
00:30:49,680 --> 00:30:53,480
Speaker 1: seem smooth. So if you ever listened to eight bit music,

487
00:30:53,880 --> 00:30:56,480
Speaker 1: you know, like the kind from old video game consoles,

488
00:30:56,520 --> 00:30:59,520
Speaker 1: that sound is really harsh and sort of chunky and

489
00:30:59,640 --> 00:31:03,040
Speaker 1: has an appeal, but it's not you know, it's not

490
00:31:03,440 --> 00:31:07,160
Speaker 1: smooth at all. It can create an amazing effect, but

491
00:31:07,200 --> 00:31:10,960
Speaker 1: if you want to represent true analog sound, it's not awesome.

492
00:31:11,960 --> 00:31:15,920
Speaker 1: If you went up to sixteen bit, that's CD quality

493
00:31:16,000 --> 00:31:21,080
Speaker 1: bit depth, it's much better. Uh, Professional recording studios will

494
00:31:21,120 --> 00:31:25,240
Speaker 1: do twenty four bit or thirty two bit because they're gonna

495
00:31:25,280 --> 00:31:28,800
Speaker 1: do a lot of post processing work on those audio files.

496
00:31:29,080 --> 00:31:31,000
Speaker 1: And when you do that post processing work, if you

497
00:31:31,040 --> 00:31:34,840
Speaker 1: do it at sixteen bit, the stuff you're doing, the

498
00:31:34,920 --> 00:31:37,840
Speaker 1: changes you make can become noticeable, and most times you

499
00:31:37,880 --> 00:31:40,440
Speaker 1: don't want that. You don't want it to be you know,

500
00:31:40,680 --> 00:31:42,600
Speaker 1: you don't want it to stand out from the rest

501
00:31:42,600 --> 00:31:45,320
Speaker 1: of the audio file. But that's the only reason they

502
00:31:45,320 --> 00:31:47,360
Speaker 1: go up to twenty four bit or thirty two bit.
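
For a sense of scale, the number of distinct volume levels at each bit depth mentioned here is two raised to the number of bits. The short Python sketch below just prints that count. It treats every bit depth as a plain integer code, which is a simplifying assumption (thirty two bit studio audio is often stored as floating point instead), and the function name is made up for this example.

```python
def levels(bit_depth):
    """Number of distinct integer values that bit_depth bits can encode."""
    return 2 ** bit_depth


if __name__ == "__main__":
    for bits in (8, 16, 24, 32):
        print(bits, "bit:", levels(bits), "levels")
```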

503
00:31:47,720 --> 00:31:51,320
Speaker 1: There'd be no point in playing it back at that rate,

504
00:31:51,440 --> 00:31:57,160
Speaker 1: that bit depth, because human hearing is not so adept

505
00:31:57,320 --> 00:32:00,280
Speaker 1: to tell the difference, at least not for most human ears.

506
00:32:01,120 --> 00:32:04,240
Speaker 1: So if you played back a recording at sixteen bit

507
00:32:04,560 --> 00:32:07,080
Speaker 1: and another one at twenty four bit, it's the same piece.

508
00:32:07,760 --> 00:32:10,080
Speaker 1: Most people would not be able to tell the difference

509
00:32:10,120 --> 00:32:14,360
Speaker 1: because you've already reached a resolution that equals the precision

510
00:32:14,440 --> 00:32:18,120
Speaker 1: of human hearing. Keeping in mind again, human hearing is subjective,

511
00:32:18,360 --> 00:32:21,680
Speaker 1: not everyone is equal. There's some people who have incredible

512
00:32:21,720 --> 00:32:24,680
Speaker 1: hearing who may be able to pick out that difference.

513
00:32:25,520 --> 00:32:27,880
Speaker 1: I am not one of those people, but I am

514
00:32:27,920 --> 00:32:30,200
Speaker 1: a person who's going to tell you. We'll get to

515
00:32:30,240 --> 00:32:34,200
Speaker 1: the last section in just a bit, but first let's

516
00:32:34,200 --> 00:32:45,440
Speaker 1: take another quick break to thank our sponsor. All Right,

517
00:32:45,520 --> 00:32:48,320
Speaker 1: So bit depth, what we just talked about, that can

518
00:32:48,360 --> 00:32:51,800
Speaker 1: be thought of as how well the sound is described,

519
00:32:52,640 --> 00:32:55,680
Speaker 1: and the sampling rate is how frequently or how much

520
00:32:56,040 --> 00:33:00,560
Speaker 1: the sound is described. And CD Audio quality has sixteen

521
00:33:00,560 --> 00:33:05,000
Speaker 1: bit audio. That means that they actually have sixty five thousand,

522
00:33:05,160 --> 00:33:09,480
Speaker 1: five hundred thirty six different levels of volume that they

523
00:33:09,480 --> 00:33:13,800
Speaker 1: can describe within an audio track. So my example of

524
00:33:13,880 --> 00:33:18,280
Speaker 1: zero to two thousand, that is primitive compared to CD

525
00:33:18,440 --> 00:33:22,600
Speaker 1: audio because it has the sixteen bit, sixty five thousand, five

526
00:33:22,640 --> 00:33:26,160
Speaker 1: hundred thirty six different levels. And how is that possible? Well,

527
00:33:28,000 --> 00:33:31,840
Speaker 1: when we say sixteen bit, remember a bit represents two

528
00:33:31,880 --> 00:33:34,240
Speaker 1: states zero or one. So you take the number two

529
00:33:34,960 --> 00:33:39,480
Speaker 1: and then you raise it to the power of sixteen, uh,

530
00:33:39,680 --> 00:33:43,800
Speaker 1: so you multiply two by itself sixteen times and you

531
00:33:43,840 --> 00:33:47,280
Speaker 1: get sixty five thousand, five hundred thirty six. So that's

532
00:33:47,280 --> 00:33:51,160
Speaker 1: where that number comes from. Now, with your digital sample,

533
00:33:52,080 --> 00:33:54,840
Speaker 1: you have a collection of points that roughly replicate the

534
00:33:54,920 --> 00:33:57,760
Speaker 1: shape of an analog sound wave. It's gonna look a

535
00:33:57,760 --> 00:34:01,080
Speaker 1: little funky, but you'll be able to see what the

536
00:34:01,240 --> 00:34:05,760
Speaker 1: frequency and amplitude generally was of the original recording if

537
00:34:05,760 --> 00:34:08,800
Speaker 1: you were to plot this on x and y axes.

538
00:34:09,600 --> 00:34:12,439
Speaker 1: But if you were just to connect each successive point

539
00:34:12,480 --> 00:34:15,759
Speaker 1: with a straight line, even as close together as they

540
00:34:15,800 --> 00:34:18,040
Speaker 1: would be, because you're looking at forty four thousand one hundred

541
00:34:18,400 --> 00:34:22,080
Speaker 1: times a second, it would sound pretty awful. So we

542
00:34:22,120 --> 00:34:26,440
Speaker 1: actually use an algorithm called interpolation to join the points

543
00:34:26,719 --> 00:34:29,960
Speaker 1: smoothly to imitate a sound wave form, and that gives

544
00:34:30,000 --> 00:34:33,640
Speaker 1: a musical playback program the ability to replicate an analog

545
00:34:33,680 --> 00:34:38,040
Speaker 1: wave form. And that's actually called pulse code modulation or

546
00:34:38,160 --> 00:34:45,200
Speaker 1: PCM. And if you store audio, uh, intact this way,

547
00:34:45,320 --> 00:34:48,560
Speaker 1: you would have what we call a lossless audio file,

548
00:34:48,960 --> 00:34:51,640
Speaker 1: which means exactly what it sounds like. None of that

549
00:34:51,760 --> 00:34:54,919
Speaker 1: data would ever get filtered out of the file, even

550
00:34:54,960 --> 00:34:57,800
Speaker 1: if the sounds were beyond the range of human hearing,

551
00:34:57,800 --> 00:35:01,000
Speaker 1: they would be recorded, and you would have a lossless

552
00:35:01,200 --> 00:35:05,080
Speaker 1: file format. Those files tend to be quite big, depending

553
00:35:05,160 --> 00:35:08,520
Speaker 1: upon how long a recording you make. Of course, all right.

554
00:35:09,000 --> 00:35:11,400
Speaker 1: So now here's where it gets a little confusing. And

555
00:35:11,440 --> 00:35:13,080
Speaker 1: I think I even said bit rate a couple of

556
00:35:13,080 --> 00:35:16,000
Speaker 1: times when I really meant bit depth earlier. But up

557
00:35:16,040 --> 00:35:19,319
Speaker 1: to this point, I really was talking bit depth. So

558
00:35:19,680 --> 00:35:22,120
Speaker 1: my apologies to all of you out there if a

559
00:35:22,160 --> 00:35:24,719
Speaker 1: bit rate slipped through because I did not mean it.

560
00:35:24,800 --> 00:35:27,520
Speaker 1: Now I'm going to talk about bit rate and show

561
00:35:27,560 --> 00:35:32,080
Speaker 1: you how it's different than bit depth. Bit rate refers

562
00:35:32,120 --> 00:35:35,799
Speaker 1: to the amount of data audio uses per second or

563
00:35:35,840 --> 00:35:39,960
Speaker 1: requires per second of recording, and you derive bit rate

564
00:35:40,360 --> 00:35:43,960
Speaker 1: from the bit depth and the sampling rate. It's represented

565
00:35:44,040 --> 00:35:47,520
Speaker 1: as bits per second. So again let's go to CD

566
00:35:47,560 --> 00:35:52,200
Speaker 1: quality sound. That makes it easy. You have forty four thousand, one hundred samples

567
00:35:52,360 --> 00:35:57,160
Speaker 1: per second, you've got sixteen bits or two bytes because

568
00:35:57,239 --> 00:36:00,800
Speaker 1: remember a byte is eight bits, so you need two bytes

569
00:36:00,840 --> 00:36:07,320
Speaker 1: to describe each sample. So two bytes times forty four thousand, one hundred samples per second.

570
00:36:08,200 --> 00:36:11,040
Speaker 1: Uh plus you probably are gonna have to multiply that

571
00:36:11,160 --> 00:36:14,719
Speaker 1: by two because you're probably recording in stereo, so you

572
00:36:14,760 --> 00:36:18,680
Speaker 1: have to do that once for each track. So you get

573
00:36:18,719 --> 00:36:21,080
Speaker 1: that number, then you have to multiply that by sixty

574
00:36:21,080 --> 00:36:23,839
Speaker 1: seconds to determine how much data per minute you are

575
00:36:23,960 --> 00:36:27,520
Speaker 1: creating when you're recording. And with CD quality audio, that

576
00:36:27,600 --> 00:36:30,400
Speaker 1: ends up being about ten megabytes of data per minute.
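
The arithmetic behind that figure can be written out as a short Python sketch. It uses the CD numbers given here (forty four thousand, one hundred samples per second, sixteen bits per sample, two channels for stereo); the function names are made up for this example, and "megabytes" is taken to mean millions of bytes.

```python
def bits_per_second(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed audio: samples x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels


def bytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Storage needed for one minute of uncompressed audio, in bytes."""
    return bits_per_second(sample_rate_hz, bit_depth, channels) // 8 * 60


if __name__ == "__main__":
    rate, depth, channels = 44_100, 16, 2   # CD quality, stereo
    print("bit rate:", bits_per_second(rate, depth, channels), "bits per second")
    per_minute = bytes_per_minute(rate, depth, channels)
    print("per minute:", per_minute, "bytes, or about",
          round(per_minute / 1_000_000, 1), "megabytes")
```

That works out to 1,411,200 bits per second and 10,584,000 bytes per minute, which matches the rough ten megabytes per minute quoted here.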

577
00:36:31,239 --> 00:36:34,000
Speaker 1: Now these days that's not really that big a deal

578
00:36:34,680 --> 00:36:38,719
Speaker 1: because we're dealing with super fast Internet speeds and enormous

579
00:36:38,800 --> 00:36:42,600
Speaker 1: hard drives. But just a few years ago, that was

580
00:36:42,640 --> 00:36:46,000
Speaker 1: considered to be a really sizeable file, I mean an

581
00:36:46,120 --> 00:36:48,920
Speaker 1: enormous file. And so if you wanted to find a

582
00:36:48,920 --> 00:36:51,879
Speaker 1: way to distribute digital audio so it didn't take up

583
00:36:51,920 --> 00:36:55,840
Speaker 1: too much space, you had to figure out how you

584
00:36:55,880 --> 00:37:00,000
Speaker 1: could compress those files and make them smaller, make them

585
00:37:00,000 --> 00:37:04,239
Speaker 1: more manageable. And now we can finally get back to

586
00:37:04,320 --> 00:37:08,240
Speaker 1: Germany and Herr Brandenburg. You thought we left him behind,

587
00:37:08,960 --> 00:37:12,680
Speaker 1: We didn't. He was just part of a flashback. So

588
00:37:12,840 --> 00:37:15,320
Speaker 1: let's go to the MP three. First of all, it

589
00:37:15,360 --> 00:37:19,040
Speaker 1: gets its name from the Moving Picture Experts Group, also

590
00:37:19,160 --> 00:37:23,600
Speaker 1: known as MPEG. It was part of a project that

591
00:37:23,719 --> 00:37:26,840
Speaker 1: MPEG was doing that was looking at ways of compressing

592
00:37:26,880 --> 00:37:30,239
Speaker 1: audio along with the work that they were doing with

593
00:37:30,360 --> 00:37:35,279
Speaker 1: video files. It's actually named after the process that they developed,

594
00:37:35,680 --> 00:37:39,120
Speaker 1: called MPEG Audio Layer three. So yes, there was a

595
00:37:39,200 --> 00:37:42,279
Speaker 1: layer one and a layer two. Layer three was a

596
00:37:42,320 --> 00:37:44,719
Speaker 1: refinement of the approach and was the one that was

597
00:37:44,760 --> 00:37:49,520
Speaker 1: actually successful in the market. Now, Brandenburg was working with

598
00:37:49,600 --> 00:37:53,680
Speaker 1: an instructor. Brandenburg was pursuing a PhD

599
00:37:53,760 --> 00:37:55,880
Speaker 1: at the time and trying to come up with a

600
00:37:55,880 --> 00:37:59,799
Speaker 1: practical means of transmitting digital audio across phone lines, and

601
00:37:59,800 --> 00:38:03,000
Speaker 1: in the process he began to experiment with algorithms that

602
00:38:03,040 --> 00:38:08,360
Speaker 1: could take digital audio information and determine which bits are significant.

603
00:38:08,840 --> 00:38:13,040
Speaker 1: Anything that was deemed insignificant could be discarded. So the

604
00:38:13,120 --> 00:38:16,560
Speaker 1: thinking was that information we cannot perceive as human beings

605
00:38:16,680 --> 00:38:19,799
Speaker 1: is worthless. There's no point in preserving it in an

606
00:38:19,800 --> 00:38:22,880
Speaker 1: audio file format. It's just taking up space that we

607
00:38:22,960 --> 00:38:26,000
Speaker 1: can't even perceive when we play it back, So there's

608
00:38:26,040 --> 00:38:28,960
Speaker 1: no reason to replicate it. There's no reason to record it.

609
00:38:29,080 --> 00:38:32,000
Speaker 1: Leave it out, and that way you could compress digital

610
00:38:32,040 --> 00:38:35,680
Speaker 1: audio files. Or to put it another way, if the

611
00:38:35,719 --> 00:38:38,239
Speaker 1: algorithm determined that a sound was outside the range of

612
00:38:38,280 --> 00:38:41,880
Speaker 1: human hearing, it would drop it from the encoding process,

613
00:38:41,920 --> 00:38:44,640
Speaker 1: so you get a sound file much smaller than the

614
00:38:44,680 --> 00:38:49,239
Speaker 1: more accurate representative version. So the lossless version would be

615
00:38:49,440 --> 00:38:53,360
Speaker 1: more accurate to the original sound. But this new version,

616
00:38:53,480 --> 00:38:56,640
Speaker 1: what we would call a lossy version, a compressed file

617
00:38:57,120 --> 00:39:00,359
Speaker 1: would be able to replicate it pretty well if it's

618
00:39:00,400 --> 00:39:03,640
Speaker 1: designed properly, and maybe to a point, if you

619
00:39:03,680 --> 00:39:06,440
Speaker 1: design it well enough that you couldn't tell the difference

620
00:39:06,480 --> 00:39:10,040
Speaker 1: between the two. Uh. That took some time. That was

621
00:39:10,080 --> 00:39:15,320
Speaker 1: not easy to do. So the new file, the new version,

622
00:39:15,400 --> 00:39:18,560
Speaker 1: the compressed one, the lossy format, would only have the

623
00:39:18,600 --> 00:39:22,280
Speaker 1: actual relevant data, and from that point forward, the challenge

624
00:39:22,360 --> 00:39:26,719
Speaker 1: was to determine what are the benchmarks to figure out

625
00:39:26,840 --> 00:39:30,480
Speaker 1: what is relevant versus what is irrelevant, because if you

626
00:39:30,520 --> 00:39:33,520
Speaker 1: lose too much information, you change the quality of the recording,

627
00:39:33,960 --> 00:39:37,280
Speaker 1: meaning it's no longer an accurate representation of the original sound.

628
00:39:37,880 --> 00:39:41,360
Speaker 1: So you might say that any sound below twenty hertz

629
00:39:41,640 --> 00:39:44,880
Speaker 1: isn't relevant because it's below the range of your typical

630
00:39:45,000 --> 00:39:49,080
Speaker 1: human's ability to hear. You might say that anything

631
00:39:49,080 --> 00:39:54,200
Speaker 1: above twenty thousand hertz, or twenty kilohertz, is irrelevant

632
00:39:54,280 --> 00:39:59,080
Speaker 1: because humans typically can't hear sounds above that frequency. You

633
00:39:59,160 --> 00:40:02,640
Speaker 1: might say that sounds at a certain amplitude or lower

634
00:40:03,200 --> 00:40:08,040
Speaker 1: are irrelevant because they're so quiet that humans wouldn't hear them.

635
00:40:08,239 --> 00:40:11,320
Speaker 1: Or you might say that if a certain sound is

636
00:40:11,360 --> 00:40:14,120
Speaker 1: at a lower amplitude and a different sound is at

637
00:40:14,120 --> 00:40:18,279
Speaker 1: a higher amplitude, the higher amplitude sound is drowning out

638
00:40:18,320 --> 00:40:21,680
Speaker 1: the lower amplitude sound, and so we humans don't really

639
00:40:21,680 --> 00:40:24,799
Speaker 1: perceive the lower amplitude sound. This is where we get

640
00:40:24,800 --> 00:40:28,000
Speaker 1: into psychoacoustics. It's not just what we hear, but how

641
00:40:28,040 --> 00:40:32,200
Speaker 1: we perceive the sound itself. And a lot of that

642
00:40:32,280 --> 00:40:35,520
Speaker 1: went into formulating the algorithms to figure out how to

643
00:40:35,560 --> 00:40:38,480
Speaker 1: compress this music in a way where you get a

644
00:40:38,560 --> 00:40:44,359
Speaker 1: recording that represents the original without uh, you know, compromising

645
00:40:44,360 --> 00:40:46,920
Speaker 1: too much and still getting the file size to a

646
00:40:47,040 --> 00:40:50,640
Speaker 1: manageable size. And these are the decisions you have to

647
00:40:50,680 --> 00:40:53,200
Speaker 1: make to figure out which bits of information you keep

648
00:40:53,200 --> 00:40:57,040
Speaker 1: and which ones you ditch. Well, Brandenburg and a team

649
00:40:57,040 --> 00:40:59,080
Speaker 1: were working on refining this approach in the late

650
00:40:59,120 --> 00:41:02,839
Speaker 1: eighties and early nineties. And he said, at one point

651
00:41:02,880 --> 00:41:05,120
Speaker 1: he thought he had nailed it, and then he heard

652
00:41:05,120 --> 00:41:10,280
Speaker 1: an a cappella song. It was Tom's Diner by Suzanne Vega,

653
00:41:10,800 --> 00:41:14,000
Speaker 1: and then he listened to the compressed MP three version

654
00:41:14,160 --> 00:41:17,520
Speaker 1: of that song, using the version of MP three

655
00:41:17,560 --> 00:41:20,440
Speaker 1: that had been developed up to that point, and he said,

656
00:41:21,120 --> 00:41:25,360
Speaker 1: it ruined the song. It trashed it. It sounded terrible.

657
00:41:25,680 --> 00:41:29,279
Speaker 1: He said that other representations of music seemed fine with

658
00:41:29,360 --> 00:41:32,360
Speaker 1: this particular approach, but when they went with this stripped

659
00:41:32,360 --> 00:41:36,520
Speaker 1: down a cappella song with this particular kind of you're in

660
00:41:36,560 --> 00:41:39,600
Speaker 1: the middle of a space, listening to Suzanne Vega sing,

661
00:41:40,280 --> 00:41:43,440
Speaker 1: it ruined her voice, and so the team began to

662
00:41:43,440 --> 00:41:47,080
Speaker 1: tweak the compression algorithms to correct for this problem, and

663
00:41:47,120 --> 00:41:49,760
Speaker 1: it took a lot of work to figure out, Okay, well,

664
00:41:49,800 --> 00:41:53,279
Speaker 1: what are the elements of sound that we messed with

665
00:41:53,960 --> 00:41:56,920
Speaker 1: that have created this issue, and ultimately they were finally

666
00:41:56,920 --> 00:41:59,440
Speaker 1: able to create an MP three file that didn't distort

667
00:41:59,560 --> 00:42:02,440
Speaker 1: or ruin the recording. Brandenburg said he listened to that

668
00:42:02,520 --> 00:42:05,880
Speaker 1: song somewhere between five hundred and a thousand times, and

669
00:42:05,920 --> 00:42:09,440
Speaker 1: then he saw Suzanne Vega perform it live, and he was

670
00:42:09,480 --> 00:42:14,520
Speaker 1: able to recognize all of those subtle changes in her

671
00:42:14,600 --> 00:42:18,160
Speaker 1: voice because he had paid so close attention to it

672
00:42:18,280 --> 00:42:22,200
Speaker 1: during the process of tweaking this algorithm. He said, ultimately,

673
00:42:22,680 --> 00:42:25,200
Speaker 1: the real telling thing is he still enjoyed the song,

674
00:42:26,719 --> 00:42:29,720
Speaker 1: which says a lot about him. Me, I can't stand

675
00:42:29,760 --> 00:42:33,319
Speaker 1: that song, but maybe it's just because to me, there's

676
00:42:33,320 --> 00:42:34,880
Speaker 1: a point where it just sounds like someone is just

677
00:42:34,920 --> 00:42:37,879
Speaker 1: singing about what they're doing, and I do that every day.

678
00:42:38,400 --> 00:42:41,280
Speaker 1: No one gave me a record deal. All right, so getting

679
00:42:41,280 --> 00:42:46,480
Speaker 1: back to MP three, they had finalized the file format

680
00:42:46,520 --> 00:42:49,920
Speaker 1: and created the standard, but it was just one of

681
00:42:50,080 --> 00:42:54,279
Speaker 1: several possibilities for encoding audio, and it didn't immediately take off.

682
00:42:54,320 --> 00:43:01,080
Speaker 1: It wasn't immediately adopted by consumers. The team had identified

683
00:43:01,080 --> 00:43:04,480
Speaker 1: the Internet as a possible distribution method for MP

684
00:43:04,560 --> 00:43:07,839
Speaker 1: three files, rather than just over telephone lines. They said, well,

685
00:43:08,000 --> 00:43:11,080
Speaker 1: technically we could send MP threes across the Internet,

686
00:43:11,480 --> 00:43:16,280
Speaker 1: so you could send manageable sized files across this network.

687
00:43:17,560 --> 00:43:22,280
Speaker 1: On July fourteenth, they created the file extension dot MP three.

688
00:43:23,680 --> 00:43:26,920
Speaker 1: Now it would take a little bit longer for software

689
00:43:26,960 --> 00:43:29,440
Speaker 1: to take advantage of this. One of the early programs

690
00:43:29,480 --> 00:43:33,560
Speaker 1: was Winamp, which made MP three decoding accessible, and from

691
00:43:33,560 --> 00:43:36,920
Speaker 1: that point the file format began to take off. To

692
00:43:37,080 --> 00:43:40,440
Speaker 1: follow would be dedicated MP three players and sites that

693
00:43:40,480 --> 00:43:44,160
Speaker 1: allowed people to upload and download compressed audio files, which

694
00:43:44,280 --> 00:43:50,200
Speaker 1: also indicated a rise in piracy. And then in response

695
00:43:50,280 --> 00:43:52,640
Speaker 1: to the rise in piracy, we saw an increase in

696
00:43:52,800 --> 00:43:56,960
Speaker 1: DRM strategies, digital rights management, or copy protection

697
00:43:57,000 --> 00:44:00,319
Speaker 1: if you prefer, and that all really ended up

698
00:44:00,360 --> 00:44:04,640
Speaker 1: shaping a lot of the policies and strategies that affect

699
00:44:04,640 --> 00:44:07,479
Speaker 1: the Internet today, So you could say that the MP

700
00:44:07,640 --> 00:44:11,520
Speaker 1: three is one of the reasons why the Internet is

701
00:44:11,560 --> 00:44:14,200
Speaker 1: the way it is right now, and why arguments both

702
00:44:14,239 --> 00:44:19,399
Speaker 1: for and against net neutrality have formulated in certain ways.

703
00:44:19,440 --> 00:44:21,439
Speaker 1: A lot of it is shaped by the MP three.

704
00:44:22,480 --> 00:44:26,240
Speaker 1: So that kind of wraps up this discussion about digital

705
00:44:26,280 --> 00:44:29,560
Speaker 1: audio in general and a little bit on MP three files.

706
00:44:29,560 --> 00:44:32,560
Speaker 1: In the next episode of this series, I will dive

707
00:44:32,640 --> 00:44:36,040
Speaker 1: into a more technical explanation of what is actually going

708
00:44:36,120 --> 00:44:39,440
Speaker 1: on with the MP three compression algorithms. And I bet

709
00:44:39,520 --> 00:44:44,440
Speaker 1: you can't wait to learn all about fast Fourier transforms.

710
00:44:44,600 --> 00:44:47,480
Speaker 1: I know I can't, And like I said, I'll have

711
00:44:47,600 --> 00:44:50,400
Speaker 1: other episodes to sprinkle in between this one and the

712
00:44:50,440 --> 00:44:53,239
Speaker 1: next one and then the third one, so that way

713
00:44:53,280 --> 00:44:56,239
Speaker 1: you won't just get digital audio overload. And if you

714
00:44:56,280 --> 00:44:59,839
Speaker 1: guys have any comments or questions or suggestions for show

715
00:44:59,880 --> 00:45:03,040
Speaker 1: topics or people I should interview, or maybe people

716
00:45:03,040 --> 00:45:05,520
Speaker 1: I should have on as a guest host, shoot 'em

717
00:45:05,520 --> 00:45:09,240
Speaker 1: my way. My email is tech Stuff at how stuff

718
00:45:09,280 --> 00:45:12,000
Speaker 1: works dot com, or you can always drop me a

719
00:45:12,040 --> 00:45:15,000
Speaker 1: line on Facebook or Twitter with the handle tech stuff

720
00:45:15,239 --> 00:45:18,960
Speaker 1: HSW, and I'll talk to you guys again really

721
00:45:19,000 --> 00:45:25,200
Speaker 1: soon. For more on this and thousands of other topics,

722
00:45:25,440 --> 00:45:36,520
Speaker 1: visit how stuff works dot com.