1
00:00:04,440 --> 00:00:12,479
Speaker 1: Welcome to TechStuff, a production from iHeartRadio. Hey there,

2
00:00:12,520 --> 00:00:15,720
Speaker 1: and welcome to TechStuff. I'm your host, Jonathan Strickland.

3
00:00:15,720 --> 00:00:18,799
Speaker 1: I'm an executive producer with iHeart Podcasts, and how the

4
00:00:18,880 --> 00:00:22,239
Speaker 1: tech are you? So I'm getting ready to go on vacation,

5
00:00:22,560 --> 00:00:26,280
Speaker 1: which means we've got some classic episodes lined up for you. Actually,

6
00:00:26,280 --> 00:00:30,040
Speaker 1: these aren't that classic. These came out last year and

7
00:00:30,080 --> 00:00:33,959
Speaker 1: today I thought I would bring a short one for you.

8
00:00:34,440 --> 00:00:38,840
Speaker 1: This one was published originally on June seventh, twenty twenty three.

9
00:00:39,200 --> 00:00:42,479
Speaker 1: It's a fun little episode. It is titled What Was

10
00:00:42,960 --> 00:00:46,560
Speaker 1: the First MP3? This is like one of those

11
00:00:46,600 --> 00:00:50,200
Speaker 1: pub-trivia-style TechStuff topics. I hope you enjoy.

12
00:00:52,280 --> 00:00:54,240
Speaker 1: It's time for a TechStuff Tidbits. I'm going to

13
00:00:54,320 --> 00:00:58,720
Speaker 1: answer the question: what was the first MP3? Well,

14
00:00:58,800 --> 00:01:01,960
Speaker 1: here's the too long didn't answer. It was Tom's Diner

15
00:01:02,040 --> 00:01:06,360
Speaker 1: by Suzanne Vega. It's a song I personally do not like.

16
00:01:07,000 --> 00:01:09,800
Speaker 1: It's not to say it's a bad song. Just because

17
00:01:09,920 --> 00:01:12,679
Speaker 1: I don't like something doesn't mean it's bad. I just

18
00:01:12,720 --> 00:01:16,000
Speaker 1: mean I personally do not find this song at all appealing.

19
00:01:16,640 --> 00:01:19,520
Speaker 1: But it was, in fact, the first MP3. Now,

20
00:01:19,520 --> 00:01:23,320
Speaker 1: if you don't know Tom's Diner, it features Vega giving

21
00:01:23,360 --> 00:01:26,360
Speaker 1: a little slice-of-life moment from the perspective of

22
00:01:26,400 --> 00:01:29,240
Speaker 1: a man sitting in a diner who feels kind of

23
00:01:29,280 --> 00:01:32,960
Speaker 1: distanced from the world around him. In case you need

24
00:01:33,000 --> 00:01:38,000
Speaker 1: a reminder, here's the first verse of the song. I

25
00:01:38,040 --> 00:01:41,240
Speaker 1: am sitting in the morning at the diner on the corner.

26
00:01:41,480 --> 00:01:44,120
Speaker 1: I am waiting at the counter for the man to

27
00:01:44,240 --> 00:01:47,840
Speaker 1: pour the coffee, and he fills it only halfway and

28
00:01:47,920 --> 00:01:51,400
Speaker 1: before I even argue, he is looking out the window

29
00:01:51,840 --> 00:01:58,480
Speaker 1: at somebody coming in. Now that song doesn't work for me.

30
00:01:59,160 --> 00:02:01,880
Speaker 1: I get that it got really popular, especially after someone

31
00:02:01,920 --> 00:02:04,800
Speaker 1: did an unauthorized remix of it, which is the version

32
00:02:04,920 --> 00:02:08,160
Speaker 1: most people know. But it turned out to be an

33
00:02:08,320 --> 00:02:12,919
Speaker 1: absolutely perfect song to test the MP3 compression algorithm.

34
00:02:13,360 --> 00:02:16,800
Speaker 1: To understand why, we need to learn about the purpose

35
00:02:16,880 --> 00:02:20,040
Speaker 1: of the MP3 compression algorithm in the first place.

36
00:02:20,360 --> 00:02:23,440
Speaker 1: So in this case, the compression we're talking about is

37
00:02:23,520 --> 00:02:27,200
Speaker 1: relating to file size. There's an interesting side note. There's

38
00:02:27,240 --> 00:02:30,960
Speaker 1: a different kind of audio compression. This refers to the

39
00:02:31,000 --> 00:02:35,360
Speaker 1: reduction of dynamic range in a recording, and by that

40
00:02:35,480 --> 00:02:40,720
Speaker 1: I mean reducing the volume distance between the loudest and

41
00:02:40,800 --> 00:02:44,240
Speaker 1: the softest parts of a recording. That can actually play

42
00:02:44,600 --> 00:02:49,639
Speaker 1: a part in file compression as well, but we're

43
00:02:49,639 --> 00:02:52,080
Speaker 1: going to set it aside. Just put a pin in that,

44
00:02:52,280 --> 00:02:54,920
Speaker 1: take a look at it later on. But with file

45
00:02:55,000 --> 00:02:58,840
Speaker 1: compression generally, the whole goal is to find ways to

46
00:02:58,960 --> 00:03:03,800
Speaker 1: pack information into smaller file sizes. That makes those files

47
00:03:03,840 --> 00:03:07,600
Speaker 1: easier to manage. That's important if you are dealing with

48
00:03:07,639 --> 00:03:11,000
Speaker 1: a limited amount of storage, or maybe you want to

49
00:03:11,080 --> 00:03:13,639
Speaker 1: send the file from one machine to another and you've

50
00:03:13,639 --> 00:03:17,200
Speaker 1: got limited bandwidth so you need smaller file sizes, or

51
00:03:17,240 --> 00:03:19,760
Speaker 1: else the process is going to take way too long,

52
00:03:20,200 --> 00:03:23,200
Speaker 1: But how do you do it? Well, one approach to

53
00:03:23,520 --> 00:03:27,160
Speaker 1: file compression is to take a real good look at

54
00:03:27,160 --> 00:03:31,480
Speaker 1: the file you're trying to compress, and you ask the question,

55
00:03:32,360 --> 00:03:35,560
Speaker 1: is all the information that is inside this file necessary?

56
00:03:36,080 --> 00:03:38,840
Speaker 1: Or could I get rid of some of that information

57
00:03:39,400 --> 00:03:43,200
Speaker 1: and still have a usable file on the other side

58
00:03:43,200 --> 00:03:47,080
Speaker 1: of it? With music, that means figuring out which bits

59
00:03:47,160 --> 00:03:50,640
Speaker 1: of data you can drop without it having a noticeable

60
00:03:50,680 --> 00:03:55,640
Speaker 1: effect on the audio quality. Ideally the compressed file would

61
00:03:55,680 --> 00:04:00,600
Speaker 1: be indistinguishable from the original raw audio, but since you're tossing

62
00:04:00,680 --> 00:04:05,160
Speaker 1: out information, that's not necessarily a guarantee. This is what

63
00:04:05,320 --> 00:04:10,520
Speaker 1: makes the MP3 a lossy file format. MP3

64
00:04:10,600 --> 00:04:14,120
Speaker 1: is just one example of a lossy file format.

65
00:04:14,160 --> 00:04:17,159
Speaker 1: There are others, and the word lossy means just

66
00:04:17,320 --> 00:04:21,039
Speaker 1: exactly what you think. It means that some information is

67
00:04:21,160 --> 00:04:25,359
Speaker 1: tossed aside or lost in the process of compressing the

68
00:04:25,360 --> 00:04:28,520
Speaker 1: file to a smaller size. The folks who worked on

69
00:04:28,560 --> 00:04:32,360
Speaker 1: the MP3 format had to figure out which information

70
00:04:32,920 --> 00:04:35,359
Speaker 1: was most likely to have little to no impact on

71
00:04:35,480 --> 00:04:39,600
Speaker 1: audio quality within an audio file. To do that, they

72
00:04:39,640 --> 00:04:43,880
Speaker 1: had to take into account human psychology and the limitations

73
00:04:44,000 --> 00:04:49,159
Speaker 1: of human hearing. So psychoacoustics played a big part in

74
00:04:49,240 --> 00:04:54,359
Speaker 1: determining the MP3 compression algorithm. So for example, by that,

75
00:04:54,480 --> 00:04:58,039
Speaker 1: I mean, let's think of the range of human hearing

76
00:04:58,080 --> 00:05:01,160
Speaker 1: in terms of frequencies for a second. So your typical

77
00:05:01,240 --> 00:05:06,360
Speaker 1: human is able to hear frequencies as low as twenty

78
00:05:06,440 --> 00:05:11,000
Speaker 1: hertz and as high as twenty thousand hertz, or twenty

79
00:05:11,080 --> 00:05:15,760
Speaker 1: kilohertz. Hertz in this case references an oscillation per

80
00:05:15,760 --> 00:05:19,039
Speaker 1: second, or a vibration per second. So twenty hertz means

81
00:05:19,560 --> 00:05:24,479
Speaker 1: that something is effectively vibrating twenty times per second. So

82
00:05:24,520 --> 00:05:27,640
Speaker 1: if you had a string that when you plucked, it

83
00:05:27,680 --> 00:05:31,160
Speaker 1: would vibrate twenty times per second. That string is vibrating

84
00:05:31,200 --> 00:05:35,360
Speaker 1: at twenty hertz. That would be a very, very low note.

85
00:05:35,960 --> 00:05:38,880
Speaker 1: The higher the frequency, the higher the pitch, and as

86
00:05:38,920 --> 00:05:41,200
Speaker 1: we age, we tend to lose the ability to hear

87
00:05:41,240 --> 00:05:44,200
Speaker 1: some of those higher pitches, which is why you would

88
00:05:44,240 --> 00:05:48,120
Speaker 1: hear about some convenience stores experimenting with playing very high

89
00:05:48,200 --> 00:05:52,480
Speaker 1: pitched noises to discourage young punks who wanted to loiter

90
00:05:52,640 --> 00:05:56,960
Speaker 1: in the joint. So human hearing has limitations, and in

91
00:05:57,040 --> 00:06:01,880
Speaker 1: theory you can eliminate sounds that would fall outside of

92
00:06:01,920 --> 00:06:05,960
Speaker 1: those limitations. If a sound file contains frequencies that are

93
00:06:06,000 --> 00:06:09,520
Speaker 1: at twenty-one kilohertz, but your typical person can't

94
00:06:09,520 --> 00:06:14,560
Speaker 1: hear anything above twenty kilohertz, well, at least theoretically,

95
00:06:14,600 --> 00:06:17,599
Speaker 1: you can just toss that information and it won't change anything.

96
00:06:17,880 --> 00:06:21,240
Speaker 1: If a sound file contains a sound but no one

97
00:06:21,320 --> 00:06:24,279
Speaker 1: has the capacity to hear it, does a tree fall

98
00:06:24,360 --> 00:06:28,440
Speaker 1: in the forest? Might be getting a little lost in

99
00:06:28,480 --> 00:06:32,760
Speaker 1: the woods here anyway. That frequency example, that's just one

100
00:06:32,839 --> 00:06:35,520
Speaker 1: example of a sound that humans would have trouble hearing.

101
00:06:35,920 --> 00:06:40,120
Speaker 1: So another is when we hear a very soft sound

102
00:06:40,200 --> 00:06:44,400
Speaker 1: that immediately follows a very loud sound, we don't actually

103
00:06:44,440 --> 00:06:48,200
Speaker 1: perceive the soft one. The loud sound we hear eclipses

104
00:06:48,279 --> 00:06:51,400
Speaker 1: the soft sound, and it turns out we can't hear

105
00:06:51,440 --> 00:06:54,839
Speaker 1: the soft one at all. So again, if we can't

106
00:06:54,839 --> 00:06:58,120
Speaker 1: hear that soft sound that played immediately after a loud one,

107
00:06:58,680 --> 00:07:00,640
Speaker 1: why would you keep it? You know, you might as

108
00:07:00,680 --> 00:07:02,480
Speaker 1: well just get rid of the information. You can't hear

109
00:07:02,520 --> 00:07:05,720
Speaker 1: it anyway. Just get rid of it, save the space.

110
00:07:06,520 --> 00:07:10,120
Speaker 1: This psychoacoustic approach to sound would lead the developers of

111
00:07:10,120 --> 00:07:13,360
Speaker 1: the MP3 format to create a strategy regarding what

112
00:07:13,520 --> 00:07:17,600
Speaker 1: information to keep and what information to ditch. On top

113
00:07:17,680 --> 00:07:22,000
Speaker 1: of that, the algorithm had sort of a sliding scale,

114
00:07:22,640 --> 00:07:25,680
Speaker 1: So maybe you want to keep as much information as possible,

115
00:07:25,720 --> 00:07:27,920
Speaker 1: so you select that when you create the MP3.

116
00:07:28,400 --> 00:07:32,000
Speaker 1: So you're losing less information in the process. You're still

117
00:07:32,000 --> 00:07:34,160
Speaker 1: compressing the file, but not to the extent that you

118
00:07:34,320 --> 00:07:38,640
Speaker 1: could if you chose. Maybe the most important thing to

119
00:07:38,720 --> 00:07:41,080
Speaker 1: you is that you reduce the file size as much

120
00:07:41,080 --> 00:07:44,440
Speaker 1: as you can, so you crank the compression up. Now,

121
00:07:44,480 --> 00:07:47,880
Speaker 1: obviously the harder you go, the more likely you're going

122
00:07:47,920 --> 00:07:50,680
Speaker 1: to lose information that will make a noticeable difference in

123
00:07:50,720 --> 00:07:54,760
Speaker 1: the playback of the audio file, and you would say, oh,

124
00:07:54,800 --> 00:07:57,400
Speaker 1: the quality here is not as good as I thought

125
00:07:57,440 --> 00:08:01,000
Speaker 1: it would be. This is where Tom's Diner comes in.

126
00:08:01,840 --> 00:08:05,160
Speaker 1: Karlheinz Brandenburg, who was one of the leads on

127
00:08:05,280 --> 00:08:09,800
Speaker 1: creating the MP3 format, used Tom's Diner to listen

128
00:08:09,840 --> 00:08:13,920
Speaker 1: back to compressed files and determine how the compression was

129
00:08:13,960 --> 00:08:18,960
Speaker 1: affecting the audio quality. So it was a great track

130
00:08:19,040 --> 00:08:23,920
Speaker 1: to use because the actual qualities of the recording itself

131
00:08:24,600 --> 00:08:27,920
Speaker 1: were such that it was easy to detect if something

132
00:08:28,160 --> 00:08:32,200
Speaker 1: was not quite right. The original recording of Tom's Diner

133
00:08:32,320 --> 00:08:36,200
Speaker 1: is not the one that has the catchy beat and

134
00:08:36,240 --> 00:08:38,720
Speaker 1: the horns in it. It's a very simple a cappella

135
00:08:38,800 --> 00:08:42,200
Speaker 1: recording of Suzanne Vega singing her tale of looking at

136
00:08:42,200 --> 00:08:44,760
Speaker 1: the world from a male perspective through a sense of

137
00:08:44,800 --> 00:08:49,800
Speaker 1: distance and detachment. Brandenburg would use that track while tweaking

138
00:08:49,800 --> 00:08:52,920
Speaker 1: the algorithm, trying to create the thin line between an

139
00:08:53,000 --> 00:08:57,440
Speaker 1: effective data compression technique and a minimal impact on sound quality,

140
00:08:57,559 --> 00:09:00,000
Speaker 1: and for her contributions to the effort, although she made

141
00:09:00,160 --> 00:09:04,600
Speaker 1: them unknowingly, Brandenburg would name Suzanne Vega the mother

142
00:09:04,880 --> 00:09:08,800
Speaker 1: of the MP3. Interestingly, Ryan Maguire decided to take

143
00:09:08,840 --> 00:09:12,280
Speaker 1: a sort of negative image of the compressed Tom's Diner.

144
00:09:12,320 --> 00:09:15,960
Speaker 1: He identified sounds that were deleted in the process of

145
00:09:16,000 --> 00:09:19,280
Speaker 1: creating a lossy version of Tom's Diner, and then he

146
00:09:19,360 --> 00:09:22,480
Speaker 1: created a new recording that contained only the bits that

147
00:09:22,679 --> 00:09:26,640
Speaker 1: had been cut from the file. And it's almost like

148
00:09:26,760 --> 00:09:29,439
Speaker 1: listening to the ghost of a song. In fact, I

149
00:09:29,480 --> 00:09:33,160
Speaker 1: think they called the project the Ghost of the MP3.

150
00:09:33,240 --> 00:09:35,640
Speaker 1: It's pretty creepy stuff. It would not be out of

151
00:09:35,679 --> 00:09:38,480
Speaker 1: place in a horror movie. The fact that lossy files,

152
00:09:38,480 --> 00:09:41,479
Speaker 1: by definition lose information in the process of data compression

153
00:09:42,120 --> 00:09:45,480
Speaker 1: meant that audiophiles dismissed the MP3 format as

154
00:09:45,559 --> 00:09:48,800
Speaker 1: inherently inferior to others, at least as far as listening

155
00:09:48,880 --> 00:09:52,440
Speaker 1: experiences go, and there are arguments that some of the

156
00:09:52,520 --> 00:09:56,720
Speaker 1: lost information, while potentially being imperceptible within the song itself,

157
00:09:57,040 --> 00:10:00,720
Speaker 1: helped shape the overall sound and tone of the piece. So

158
00:10:00,760 --> 00:10:05,040
Speaker 1: though you can't directly hear the stuff that's being cut,

159
00:10:05,600 --> 00:10:09,480
Speaker 1: that stuff actually influences how you perceive other things, so

160
00:10:09,960 --> 00:10:13,840
Speaker 1: you still change the experience of hearing the finished audio.

161
00:10:14,080 --> 00:10:17,120
Speaker 1: But the MP3 format created the opportunity to store

162
00:10:17,200 --> 00:10:20,120
Speaker 1: and transfer audio files without having to deal with massive

163
00:10:20,200 --> 00:10:23,960
Speaker 1: raw audio formats, and back in the day that was

164
00:10:24,000 --> 00:10:27,840
Speaker 1: not a trivial thing. And so that is the answer

165
00:10:27,840 --> 00:10:32,320
Speaker 1: to the question. Tom's Diner: the first MP3. Hope

166
00:10:32,320 --> 00:10:36,400
Speaker 1: you're all well and I'll talk to you again really soon.

167
00:10:42,880 --> 00:10:47,520
Speaker 1: TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio,

168
00:10:47,840 --> 00:10:51,559
Speaker 1: visit the iHeartRadio app, Apple Podcasts, or wherever you listen

169
00:10:51,600 --> 00:10:56,200
Speaker 1: to your favorite shows.