WEBVTT - What was the first mp3?

0:00:04.440 --> 0:00:12.360
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio. Hey there,

0:00:12.360 --> 0:00:15.840
<v Speaker 1>and welcome to TechStuff. I'm your host, Jonathan Strickland.

0:00:15.840 --> 0:00:18.920
<v Speaker 1>I'm an executive producer with iHeartRadio. And how the tech

0:00:18.960 --> 0:00:22.000
<v Speaker 1>are you? It's time for a TechStuff Tidbits. I'm

0:00:22.000 --> 0:00:26.680
<v Speaker 1>going to answer the question: what was the first MP3? Well,

0:00:26.760 --> 0:00:29.520
<v Speaker 1>here's the too long, didn't listen answer. It was Tom's

0:00:29.560 --> 0:00:33.680
<v Speaker 1>Diner by Suzanne Vega. It's a song I personally do

0:00:33.760 --> 0:00:36.400
<v Speaker 1>not like. That's not to say it's a bad song.

0:00:37.320 --> 0:00:39.520
<v Speaker 1>Just because I don't like something doesn't mean it's bad.

0:00:40.320 --> 0:00:42.840
<v Speaker 1>I just mean I personally do not find this song

0:00:43.120 --> 0:00:46.640
<v Speaker 1>at all appealing. But it was, in fact, the first

0:00:46.720 --> 0:00:49.839
<v Speaker 1>MP3. Now, if you don't know Tom's Diner, it

0:00:49.920 --> 0:00:53.040
<v Speaker 1>features Vega giving a little slice-of-life moment from

0:00:53.120 --> 0:00:56.240
<v Speaker 1>the perspective of a man sitting in a diner who

0:00:56.280 --> 0:01:00.400
<v Speaker 1>feels kind of distanced from the world around him. In

0:01:00.440 --> 0:01:04.800
<v Speaker 1>case you need a reminder, here's the first verse of

0:01:04.840 --> 0:01:08.039
<v Speaker 1>the song. I am sitting in the morning at the

0:01:08.120 --> 0:01:10.840
<v Speaker 1>diner on the corner. I am waiting at the counter

0:01:11.319 --> 0:01:14.279
<v Speaker 1>for the man to pour the coffee, and he fills

0:01:14.319 --> 0:01:18.240
<v Speaker 1>it only halfway and before I even argue he is

0:01:18.280 --> 0:01:24.520
<v Speaker 1>looking out the window at somebody coming in. Now that

0:01:24.920 --> 0:01:28.000
<v Speaker 1>song doesn't work for me. I get that it got

0:01:28.000 --> 0:01:31.720
<v Speaker 1>really popular, especially after someone did an unauthorized remix of it,

0:01:31.760 --> 0:01:35.160
<v Speaker 1>which is the version most people know. But it turned

0:01:35.200 --> 0:01:38.800
<v Speaker 1>out to be an absolutely perfect song to test the

0:01:38.959 --> 0:01:43.520
<v Speaker 1>MP3 compression algorithm. To understand why, we need to

0:01:43.600 --> 0:01:47.119
<v Speaker 1>learn about the purpose of the MP3 compression algorithm

0:01:47.200 --> 0:01:50.240
<v Speaker 1>in the first place. So in this case, the compression

0:01:50.280 --> 0:01:53.880
<v Speaker 1>we're talking about relates to file size. As an

0:01:53.920 --> 0:01:57.360
<v Speaker 1>interesting side note, there's a different kind of audio compression.

0:01:57.800 --> 0:02:01.840
<v Speaker 1>This refers to the reduction of dynamic range in a recording,

0:02:02.960 --> 0:02:07.760
<v Speaker 1>and by that I mean reducing the difference in volume between

0:02:07.800 --> 0:02:11.240
<v Speaker 1>the loudest and the softest parts of a recording. That

0:02:11.280 --> 0:02:16.400
<v Speaker 1>can actually play a part in file compression as well,

0:02:16.440 --> 0:02:19.399
<v Speaker 1>but we're going to set that aside. Just put

0:02:19.400 --> 0:02:21.360
<v Speaker 1>a pin in that, and we'll take a look at it later on.

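NOTE
Dynamic range compression is a separate idea from the file-size compression MP3 performs, but a minimal sketch may make the distinction concrete. It models the static gain curve of a simple downward compressor: levels at or below a threshold pass through unchanged, and anything louder rises above the threshold by only 1 dB for every `ratio` dB of input. The -20 dB threshold and 4:1 ratio are assumed example values, and real compressors also smooth the gain over time (attack and release), which this sketch leaves out.
```python
def compress_level(input_db: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Static gain curve of a toy downward compressor (all levels in dB)."""
    if input_db <= threshold_db:
        return input_db  # quiet material passes through untouched
    # Above the threshold, each `ratio` dB of input becomes only 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio
loud_db, soft_db = -4.0, -40.0  # hypothetical loudest and softest moments
print(loud_db - soft_db)  # 36.0 dB between loudest and softest before compression
print(compress_level(loud_db) - compress_level(soft_db))  # 24.0 dB afterwards
```
The loud passage is pulled down from -4 dB to -16 dB while the soft one stays at -40 dB, so the gap between them shrinks, which is the reduction in dynamic range described above.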
0:02:22.040 --> 0:02:25.960
<v Speaker 1>But with file compression generally, the whole goal is to

0:02:26.040 --> 0:02:30.760
<v Speaker 1>find ways to pack information into smaller file sizes. That

0:02:30.800 --> 0:02:34.880
<v Speaker 1>makes those files easier to manage. That's important if you

0:02:34.960 --> 0:02:38.400
<v Speaker 1>are dealing with a limited amount of storage, or maybe

0:02:38.400 --> 0:02:41.160
<v Speaker 1>you want to send the file from one machine to another,

0:02:41.240 --> 0:02:45.000
<v Speaker 1>and you've got limited bandwidth, so you need smaller file sizes,

0:02:45.080 --> 0:02:47.680
<v Speaker 1>or else the process is going to take way too long.

0:02:48.120 --> 0:02:51.160
<v Speaker 1>But how do you do it? Well, one approach to

0:02:51.440 --> 0:02:55.080
<v Speaker 1>file compression is to take a real good look at

0:02:55.120 --> 0:02:59.400
<v Speaker 1>the file you're trying to compress, and you ask the question,

0:03:00.280 --> 0:03:03.480
<v Speaker 1>is all the information that is inside this file necessary?

0:03:04.000 --> 0:03:06.800
<v Speaker 1>Or could I get rid of some of that information

0:03:07.320 --> 0:03:11.160
<v Speaker 1>and still have a usable file on the other side

0:03:11.160 --> 0:03:15.080
<v Speaker 1>of it? With music, that means figuring out which bits

0:03:15.080 --> 0:03:18.560
<v Speaker 1>of data you can drop without it having a noticeable

0:03:18.600 --> 0:03:23.600
<v Speaker 1>effect on the audio quality. Ideally, the compressed file would

0:03:23.600 --> 0:03:28.040
<v Speaker 1>be indistinguishable from the original raw audio, but since you're

0:03:28.120 --> 0:03:32.919
<v Speaker 1>tossing out information, that's not necessarily a guarantee. This is

0:03:33.000 --> 0:03:37.760
<v Speaker 1>what makes MP3 a lossy file format.

0:03:38.320 --> 0:03:41.120
<v Speaker 1>MP3 is just one example of a lossy

0:03:41.320 --> 0:03:44.360
<v Speaker 1>file format. There are others, and the word lossy

0:03:44.600 --> 0:03:47.600
<v Speaker 1>means just exactly what you think. It means that some

0:03:47.800 --> 0:03:52.480
<v Speaker 1>information is tossed aside or lost in the process of

0:03:52.560 --> 0:03:56.080
<v Speaker 1>compressing the file to a smaller size. The folks who

0:03:56.120 --> 0:03:58.960
<v Speaker 1>worked on the MP3 format had to figure out

0:03:59.320 --> 0:04:02.480
<v Speaker 1>which information was most likely to have little to no

0:04:02.680 --> 0:04:06.840
<v Speaker 1>impact on audio quality within an audio file. To do that,

0:04:07.400 --> 0:04:10.840
<v Speaker 1>they had to take into account human psychology and the

0:04:10.880 --> 0:04:16.920
<v Speaker 1>limitations of human hearing. So psychoacoustics played a big part

0:04:17.040 --> 0:04:21.520
<v Speaker 1>in determining the MP3 compression algorithm. Here's an example

0:04:22.000 --> 0:04:25.320
<v Speaker 1>of what I mean. Let's think about the range of

0:04:25.400 --> 0:04:28.240
<v Speaker 1>human hearing in terms of frequencies for a second.

0:04:28.440 --> 0:04:33.520
<v Speaker 1>Your typical human is able to hear frequencies as low

0:04:33.680 --> 0:04:38.480
<v Speaker 1>as twenty hertz and as high as twenty thousand hertz

0:04:38.560 --> 0:04:42.760
<v Speaker 1>or twenty kilohertz. Hertz in this case refers to an

0:04:42.800 --> 0:04:46.240
<v Speaker 1>oscillation per second or a vibration per second, so twenty

0:04:46.360 --> 0:04:52.160
<v Speaker 1>hertz means that something is effectively vibrating twenty times per second.

0:04:52.360 --> 0:04:55.480
<v Speaker 1>So if you had a string that, when you plucked it,

0:04:55.520 --> 0:04:58.560
<v Speaker 1>it would vibrate twenty times per second, that string is

0:04:58.640 --> 0:05:02.559
<v Speaker 1>vibrating at twenty hertz. That would be a very, very

0:05:02.640 --> 0:05:05.800
<v Speaker 1>low note. The higher the frequency, the higher the pitch,

0:05:06.360 --> 0:05:08.800
<v Speaker 1>and as we age we tend to lose the ability

0:05:08.839 --> 0:05:11.800
<v Speaker 1>to hear some of those higher pitches, which is why

0:05:11.839 --> 0:05:15.520
<v Speaker 1>you would hear about some convenience stores experimenting with playing

0:05:15.640 --> 0:05:19.800
<v Speaker 1>very high-pitched noises to discourage young punks who wanted

0:05:19.800 --> 0:05:24.400
<v Speaker 1>to loiter in the joint. So human hearing has limitations,

0:05:24.480 --> 0:05:28.559
<v Speaker 1>and in theory you can eliminate sounds that would fall

0:05:28.680 --> 0:05:33.279
<v Speaker 1>outside of those limitations. If a sound file contains frequencies

0:05:33.640 --> 0:05:36.800
<v Speaker 1>that are at twenty-one kilohertz, but your typical

0:05:36.839 --> 0:05:41.640
<v Speaker 1>person can't hear anything above twenty kilohertz, well, at

0:05:41.720 --> 0:05:44.479
<v Speaker 1>least theoretically, you can just toss that information and it

0:05:44.560 --> 0:05:47.920
<v Speaker 1>won't change anything. If a sound file contains a sound

0:05:48.560 --> 0:05:51.479
<v Speaker 1>but no one has the capacity to hear it, does

0:05:51.480 --> 0:05:55.720
<v Speaker 1>a tree fall in the forest? I might be getting a

0:05:55.720 --> 0:05:59.839
<v Speaker 1>little lost in the woods here. Anyway, that frequency example,

0:05:59.839 --> 0:06:02.479
<v Speaker 1>that's just one example of a sound that humans would

0:06:02.480 --> 0:06:06.920
<v Speaker 1>have trouble hearing. Another is when we hear a

0:06:07.080 --> 0:06:11.240
<v Speaker 1>very soft sound that immediately follows a very loud sound,

0:06:11.680 --> 0:06:14.560
<v Speaker 1>we don't actually perceive the soft one. The loud sound

0:06:14.640 --> 0:06:18.360
<v Speaker 1>we hear eclipses the soft sound, and it turns out

0:06:18.760 --> 0:06:21.640
<v Speaker 1>we can't hear the soft one at all. So again,

0:06:22.160 --> 0:06:25.120
<v Speaker 1>if we can't hear that soft sound that played immediately

0:06:25.200 --> 0:06:28.200
<v Speaker 1>after a loud one, why would you keep it? You know,

0:06:28.240 --> 0:06:29.880
<v Speaker 1>you might as well just get rid of that information.

0:06:29.960 --> 0:06:32.520
<v Speaker 1>You can't hear it anyway, so just get rid of it.

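NOTE
The "soft sound right after a loud one" rule is an instance of temporal masking. Below is a toy sketch of how an encoder could use such a rule to decide what to drop, assuming each sound has already been reduced to an onset time and a level in dB. The 100 ms window and 30 dB margin are hypothetical numbers chosen for illustration; the psychoacoustic model in a real MP3 encoder works per frequency band and is considerably more involved.
```python
from dataclasses import dataclass
@dataclass
class SoundEvent:
    time_ms: float   # when the sound starts
    level_db: float  # how loud it is
MASK_WINDOW_MS = 100.0  # hypothetical: how long a loud sound hides what follows it
MASK_MARGIN_DB = 30.0   # hypothetical: how much quieter a follower must be to be hidden
def drop_masked(events: list[SoundEvent]) -> list[SoundEvent]:
    """Keep only the events that are not hidden by a much louder event just before them."""
    kept = []
    for e in events:
        masked = any(
            0 < e.time_ms - m.time_ms <= MASK_WINDOW_MS
            and m.level_db - e.level_db >= MASK_MARGIN_DB
            for m in events
        )
        if not masked:
            kept.append(e)
    return kept
events = [SoundEvent(0, 90), SoundEvent(20, 40), SoundEvent(500, 40)]
print([e.time_ms for e in drop_masked(events)])  # [0, 500]
```
The quiet sound at 20 ms is discarded because it arrives 20 ms after a sound that is 50 dB louder, while the identical quiet sound at 500 ms survives because nothing loud precedes it closely enough.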
0:06:32.839 --> 0:06:37.320
<v Speaker 1>Save the space. This psychoacoustic approach to sound would lead

0:06:37.360 --> 0:06:39.640
<v Speaker 1>the developers of the MP3 format to create a

0:06:39.680 --> 0:06:44.200
<v Speaker 1>strategy regarding what information to keep and what information to ditch.

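NOTE
The frequency-range rule mentioned earlier (throwing away content above roughly 20 kHz) can be sketched with a plain discrete Fourier transform: transform a short block of samples, zero every frequency bin above the cutoff, and transform back. This is only an illustration of the idea under assumed values (48 kHz sample rate, 96-sample block, 20 kHz cutoff); MP3 itself analyzes audio with a filterbank rather than one naive DFT, and this O(n^2) transform is far too slow for real use.
```python
import cmath
import math
SAMPLE_RATE = 48_000  # assumed sample rate in Hz
N = 96                # 2 ms block, so each DFT bin is 48_000 / 96 = 500 Hz wide
CUTOFF_HZ = 20_000    # rough upper limit of typical human hearing
def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]
def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n for t in range(n)]
def drop_above(x, cutoff_hz):
    """Zero every frequency component above cutoff_hz, then rebuild the samples."""
    X = dft(x)
    n = len(X)
    for k in range(n):
        # For a real signal, bins k and n - k represent the same physical frequency.
        freq_hz = min(k, n - k) * SAMPLE_RATE / n
        if freq_hz > cutoff_hz:
            X[k] = 0
    return idft(X)
# A 1 kHz tone we can hear plus a quieter 21 kHz tone most people cannot.
times = [i / SAMPLE_RATE for i in range(N)]
audible = [math.sin(2 * math.pi * 1_000 * t) for t in times]
signal = [a + 0.5 * math.sin(2 * math.pi * 21_000 * t) for a, t in zip(audible, times)]
filtered = drop_above(signal, CUTOFF_HZ)
print(max(abs(f - a) for f, a in zip(filtered, audible)) < 1e-9)  # True: only the 1 kHz tone remains
```
Both tones complete a whole number of cycles in the 96-sample block, so each lands exactly on one bin; with real audio, energy spreads across neighboring bins and a hard cutoff like this can introduce audible artifacts.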
0:06:45.160 --> 0:06:48.520
<v Speaker 1>On top of that, the algorithm had sort of a

0:06:48.560 --> 0:06:52.480
<v Speaker 1>sliding scale, so maybe you want to keep as much

0:06:52.520 --> 0:06:55.080
<v Speaker 1>information as possible, so you select that when you create

0:06:55.120 --> 0:06:59.480
<v Speaker 1>the MP3, so you're losing less information in the process.

0:06:59.520 --> 0:07:01.800
<v Speaker 1>You're still compressing the file, but not to the extent

0:07:01.839 --> 0:07:06.320
<v Speaker 1>that you could if you chose. Or maybe the most important

0:07:06.320 --> 0:07:08.640
<v Speaker 1>thing to you is that you reduce the file size

0:07:08.680 --> 0:07:12.400
<v Speaker 1>as much as you can, so you crank the compression up. Now,

0:07:12.440 --> 0:07:15.800
<v Speaker 1>obviously the harder you go, the more likely you're going

0:07:15.840 --> 0:07:18.600
<v Speaker 1>to lose information that will make a noticeable difference in

0:07:18.640 --> 0:07:22.680
<v Speaker 1>the playback of the audio file, and you would say, oh,

0:07:22.720 --> 0:07:25.360
<v Speaker 1>the quality here is not as good as I thought

0:07:25.360 --> 0:07:28.920
<v Speaker 1>it would be. This is where Tom's Diner comes in.

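NOTE
In practice that sliding scale is usually exposed as a bitrate (or quality) setting on the encoder. A rough sketch of the file-size arithmetic, assuming constant-bitrate encoding, CD-quality audio (44.1 kHz, 16-bit, stereo) as the uncompressed reference, and a hypothetical 3-minute track; headers and metadata are ignored.
```python
def size_bytes(bits_per_second: float, seconds: float) -> float:
    """Size of a constant-bitrate stream, ignoring headers and metadata."""
    return bits_per_second * seconds / 8
cd_bps = 44_100 * 16 * 2  # samples/s x bits per sample x channels = 1,411,200 bits/s
duration_s = 180          # hypothetical 3-minute track
for label, bps in [("uncompressed CD audio", cd_bps), ("MP3 at 320 kbps", 320_000), ("MP3 at 128 kbps", 128_000)]:
    megabytes = size_bytes(bps, duration_s) / 1_000_000
    print(f"{label}: {megabytes:.2f} MB (about {cd_bps / bps:.1f}:1)")
```
Under these assumptions the raw track is about 31.75 MB, while the 128 kbps version is about 2.88 MB, roughly an 11:1 reduction; turning the bitrate down further shrinks the file more but discards more information.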
0:07:29.760 --> 0:07:33.080
<v Speaker 1>Karlheinz Brandenburg, who was one of the leads on

0:07:33.240 --> 0:07:37.720
<v Speaker 1>creating the MP3 format, used Tom's Diner to listen

0:07:37.760 --> 0:07:41.840
<v Speaker 1>back to compressed files and determine how the compression was

0:07:41.880 --> 0:07:46.920
<v Speaker 1>affecting the audio quality. So it was a great track

0:07:47.000 --> 0:07:51.840
<v Speaker 1>to use because the actual qualities of the recording itself

0:07:52.520 --> 0:07:55.880
<v Speaker 1>were such that it was easy to detect if something

0:07:56.120 --> 0:07:59.960
<v Speaker 1>was not quite right. The original recording of Tom's Diner

0:08:00.240 --> 0:08:04.120
<v Speaker 1>is not the one that has the catchy beat and

0:08:04.160 --> 0:08:06.640
<v Speaker 1>the horns in it. It's a very simple a cappella

0:08:06.720 --> 0:08:10.160
<v Speaker 1>recording of Suzanne Vega singing her tale of looking at

0:08:10.160 --> 0:08:12.680
<v Speaker 1>the world from a male perspective through a sense of

0:08:12.760 --> 0:08:17.320
<v Speaker 1>distance and detachment. Brandenburg would use that track while

0:08:17.320 --> 0:08:20.720
<v Speaker 1>tweaking the algorithm, trying to strike a balance between

0:08:20.760 --> 0:08:24.440
<v Speaker 1>effective data compression and minimal impact on

0:08:24.560 --> 0:08:27.679
<v Speaker 1>sound quality. And for her contributions to the effort, although

0:08:27.680 --> 0:08:32.200
<v Speaker 1>she made them unknowingly, Brandenburg would name Suzanne Vega the

0:08:32.280 --> 0:08:36.559
<v Speaker 1>mother of the MP3. Interestingly, Ryan Maguire decided to

0:08:36.600 --> 0:08:40.200
<v Speaker 1>take a sort of negative image of the compressed Tom's Diner.

0:08:40.280 --> 0:08:43.920
<v Speaker 1>He identified sounds that were deleted in the process of

0:08:43.920 --> 0:08:47.240
<v Speaker 1>creating a lossy version of Tom's Diner, and then he

0:08:47.320 --> 0:08:50.440
<v Speaker 1>created a new recording that contained only the bits that

0:08:50.600 --> 0:08:54.600
<v Speaker 1>had been cut from the file. And it's almost like

0:08:54.720 --> 0:08:57.400
<v Speaker 1>listening to the ghost of a song. In fact, I

0:08:57.400 --> 0:09:01.040
<v Speaker 1>think he called the project The Ghost in the MP3.

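NOTE
The simplest way to express the "keep only what was cut" idea is a residual: subtract the decoded lossy audio from the original, sample by sample. This sketch is not necessarily how Maguire built his project; the sample values are made up, and a real attempt would first have to align the two signals (MP3 encoding and decoding add delay and padding), which is not shown here.
```python
# Hypothetical sample values; in practice these would come from the original
# recording and from decoding its lossy version, aligned sample for sample.
original = [0.50, -0.25, 0.75, 0.10, -0.60]
decoded = [0.48, -0.25, 0.70, 0.00, -0.55]  # pretend output of a lossy codec
def residual(orig, dec):
    """What the codec threw away: original minus decoded, sample by sample."""
    return [o - d for o, d in zip(orig, dec)]
ghost = residual(original, decoded)
print([round(g, 2) for g in ghost])  # [0.02, 0.0, 0.05, 0.1, -0.05]
# Adding the ghost back to the decoded audio recovers the original (up to rounding).
print(all(abs(d + g - o) < 1e-12 for o, d, g in zip(original, decoded, ghost)))  # True
```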
0:09:01.160 --> 0:09:03.520
<v Speaker 1>It's pretty creepy stuff. It would not be out of

0:09:03.600 --> 0:09:06.360
<v Speaker 1>place in a horror movie. The fact that lossy files

0:09:06.400 --> 0:09:09.400
<v Speaker 1>by definition lose information in the process of data compression

0:09:10.040 --> 0:09:13.440
<v Speaker 1>meant that audio files in the MP3 format are

0:09:13.480 --> 0:09:16.800
<v Speaker 1>inherently inferior to others, at least as far as listening

0:09:16.840 --> 0:09:20.360
<v Speaker 1>experiences go. And there are arguments that some of the

0:09:20.440 --> 0:09:24.679
<v Speaker 1>lost information, while potentially being imperceptible within the song itself,

0:09:25.000 --> 0:09:28.520
<v Speaker 1>helps shape the overall sound and tone of the piece.

0:09:28.559 --> 0:09:33.000
<v Speaker 1>So though you can't directly hear the stuff that's being cut,

0:09:33.559 --> 0:09:37.400
<v Speaker 1>that stuff actually influences how you perceive other things, so

0:09:37.920 --> 0:09:41.760
<v Speaker 1>cutting it still changes the experience of hearing the finished audio.

0:09:42.000 --> 0:09:45.079
<v Speaker 1>But the MP3 format created the opportunity to store

0:09:45.120 --> 0:09:48.040
<v Speaker 1>and transfer audio files without having to deal with massive

0:09:48.160 --> 0:09:51.880
<v Speaker 1>raw audio formats, and back in the day that was

0:09:51.960 --> 0:09:55.720
<v Speaker 1>not a trivial thing. And so that is the answer

0:09:55.800 --> 0:10:00.280
<v Speaker 1>to the question: Tom's Diner was the first MP3. Hope

0:10:00.280 --> 0:10:04.360
<v Speaker 1>you're all well and I'll talk to you again really soon.

0:10:10.800 --> 0:10:15.480
<v Speaker 1>TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio,

0:10:15.800 --> 0:10:19.520
<v Speaker 1>visit the iHeartRadio app, Apple Podcasts, or wherever you listen

0:10:19.520 --> 0:10:20.600
<v Speaker 1>to your favorite shows.