WEBVTT - Rerun: What was the first mp3?

0:00:04.440 --> 0:00:12.479
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio. Hey there,

0:00:12.520 --> 0:00:15.720
<v Speaker 1>and welcome to TechStuff. I'm your host, Jonathan Strickland.

0:00:15.720 --> 0:00:18.799
<v Speaker 1>I'm an executive producer with iHeart Podcasts. And how the

0:00:18.880 --> 0:00:22.239
<v Speaker 1>tech are you? So I'm getting ready to go on vacation,

0:00:22.560 --> 0:00:26.280
<v Speaker 1>which means we've got some classic episodes lined up for you. Actually,

0:00:26.280 --> 0:00:30.040
<v Speaker 1>these aren't that classic. These came out last year, and

0:00:30.080 --> 0:00:33.959
<v Speaker 1>today I thought I would bring a short one for you.

0:00:34.440 --> 0:00:38.840
<v Speaker 1>This one was published originally on June seventh, twenty twenty three.

0:00:39.200 --> 0:00:42.479
<v Speaker 1>It's a fun little episode. It is titled What Was

0:00:42.960 --> 0:00:46.560
<v Speaker 1>the First MP Three? This is like one of those

0:00:46.600 --> 0:00:50.200
<v Speaker 1>pub-trivia-style TechStuff topics. I hope you enjoy.

0:00:52.280 --> 0:00:54.240
<v Speaker 1>It's time for a TechStuff Tidbits. I'm going to

0:00:54.320 --> 0:00:58.720
<v Speaker 1>answer the question, what was the first MP three? Well,

0:00:58.800 --> 0:01:01.960
<v Speaker 1>here's the TL;DR answer: it was Tom's Diner

0:01:02.040 --> 0:01:06.360
<v Speaker 1>by Suzanne Vega. It's a song I personally do not like.

0:01:07.000 --> 0:01:09.800
<v Speaker 1>It's not to say it's a bad song. Just because

0:01:09.920 --> 0:01:12.679
<v Speaker 1>I don't like something doesn't mean it's bad. I just

0:01:12.720 --> 0:01:16.000
<v Speaker 1>mean I personally do not find this song at all appealing.

0:01:16.640 --> 0:01:19.520
<v Speaker 1>But it was, in fact, the first MP three. Now,

0:01:19.520 --> 0:01:23.320
<v Speaker 1>if you don't know Tom's Diner, it features Vega giving

0:01:23.360 --> 0:01:26.360
<v Speaker 1>a little slice-of-life moment from the perspective of

0:01:26.400 --> 0:01:29.240
<v Speaker 1>a man sitting in a diner who feels kind of

0:01:29.280 --> 0:01:32.960
<v Speaker 1>distanced from the world around him. In case you need

0:01:33.000 --> 0:01:38.000
<v Speaker 1>a reminder, here's the first verse of the song. I

0:01:38.040 --> 0:01:41.240
<v Speaker 1>am sitting in the morning at the diner on the corner.

0:01:41.480 --> 0:01:44.120
<v Speaker 1>I am waiting at the counter for the man to

0:01:44.240 --> 0:01:47.840
<v Speaker 1>pour the coffee, and he fills it only halfway and

0:01:47.920 --> 0:01:51.400
<v Speaker 1>before I even argue, he is looking out the window

0:01:51.840 --> 0:01:58.480
<v Speaker 1>at somebody coming in. Now, that song doesn't work for me.

0:01:59.160 --> 0:02:01.880
<v Speaker 1>I get that it got really popular, especially after someone

0:02:01.920 --> 0:02:04.800
<v Speaker 1>did an unauthorized remix of it, which is the version

0:02:04.920 --> 0:02:08.160
<v Speaker 1>most people know. But it turned out to be an

0:02:08.320 --> 0:02:12.919
<v Speaker 1>absolutely perfect song to test the MP three compression algorithm.

0:02:13.360 --> 0:02:16.800
<v Speaker 1>To understand why, we need to learn about the purpose

0:02:16.880 --> 0:02:20.040
<v Speaker 1>of the MP three compression algorithm in the first place.

0:02:20.360 --> 0:02:23.440
<v Speaker 1>So in this case, the compression we're talking about is

0:02:23.520 --> 0:02:27.200
<v Speaker 1>relating to file size. Here's an interesting side note: there's

0:02:27.240 --> 0:02:30.960
<v Speaker 1>a different kind of audio compression. This refers to the

0:02:31.000 --> 0:02:35.360
<v Speaker 1>reduction of dynamic range in a recording, and by that

0:02:35.480 --> 0:02:40.720
<v Speaker 1>I mean reducing the volume distance between the loudest and

0:02:40.800 --> 0:02:44.240
<v Speaker 1>the softest parts of a recording. That can actually play

0:02:44.600 --> 0:02:49.639
<v Speaker 1>a part in file compression as well, but we're

0:02:49.639 --> 0:02:52.080
<v Speaker 1>going to set it aside. Just put a pin in that,

0:02:52.280 --> 0:02:54.920
<v Speaker 1>take a look at it later on. But with file

0:02:55.000 --> 0:02:58.840
<v Speaker 1>compression generally, the whole goal is to find ways to

0:02:58.960 --> 0:03:03.800
<v Speaker 1>pack information into smaller file sizes. That makes those files

0:03:03.840 --> 0:03:07.600
<v Speaker 1>easier to manage. That's important if you are dealing with

0:03:07.639 --> 0:03:11.000
<v Speaker 1>a limited amount of storage, or maybe you want to

0:03:11.080 --> 0:03:13.639
<v Speaker 1>send the file from one machine to another and you've

0:03:13.639 --> 0:03:17.200
<v Speaker 1>got limited bandwidth so you need smaller file sizes, or

0:03:17.240 --> 0:03:19.760
<v Speaker 1>else the process is going to take way too long.

0:03:20.200 --> 0:03:23.200
<v Speaker 1>But how do you do it? Well, one approach to

0:03:23.520 --> 0:03:27.160
<v Speaker 1>file compression is to take a real good look at

0:03:27.160 --> 0:03:31.480
<v Speaker 1>the file you're trying to compress, and you ask the question,

0:03:32.360 --> 0:03:35.560
<v Speaker 1>is all the information that is inside this file necessary?

0:03:36.080 --> 0:03:38.840
<v Speaker 1>Or could I get rid of some of that information

0:03:39.400 --> 0:03:43.200
<v Speaker 1>and still have a usable file on the other side

0:03:43.200 --> 0:03:47.080
<v Speaker 1>of it? With music, that means figuring out which bits

0:03:47.160 --> 0:03:50.640
<v Speaker 1>of data you can drop without it having a noticeable

0:03:50.680 --> 0:03:55.640
<v Speaker 1>effect on the audio quality. Ideally the compressed file would

0:03:55.680 --> 0:04:00.600
<v Speaker 1>be indistinguishable from the original raw audio, but since you're tossing

0:04:00.680 --> 0:04:05.160
<v Speaker 1>out information, that's not necessarily a guarantee. This is what

0:04:05.320 --> 0:04:10.520
<v Speaker 1>makes the MP three a lossy file format. MP

0:04:10.600 --> 0:04:14.120
<v Speaker 1>three is just one example of a lossy file format.

0:04:14.160 --> 0:04:17.159
<v Speaker 1>There are others, and the word lossy means just

0:04:17.320 --> 0:04:21.039
<v Speaker 1>exactly what you think. It means that some information is

0:04:21.160 --> 0:04:25.359
<v Speaker 1>tossed aside or lost in the process of compressing the

0:04:25.360 --> 0:04:28.520
<v Speaker 1>file to a smaller size. The folks who worked on

0:04:28.560 --> 0:04:32.360
<v Speaker 1>the MP three format had to figure out which information

0:04:32.920 --> 0:04:35.359
<v Speaker 1>was most likely to have little to no impact on

0:04:35.480 --> 0:04:39.600
<v Speaker 1>audio quality within an audio file. To do that, they

0:04:39.640 --> 0:04:43.880
<v Speaker 1>had to take into account human psychology and the limitations

0:04:44.000 --> 0:04:49.159
<v Speaker 1>of human hearing. So psychoacoustics played a big part in

0:04:49.240 --> 0:04:54.359
<v Speaker 1>determining the MP three compression algorithm. To give an example of what

0:04:54.480 --> 0:04:58.039
<v Speaker 1>I mean, let's think of the range of human hearing

0:04:58.080 --> 0:05:01.160
<v Speaker 1>in terms of frequencies for a second. Your typical

0:05:01.240 --> 0:05:06.360
<v Speaker 1>human is able to hear frequencies as low as twenty

0:05:06.440 --> 0:05:11.000
<v Speaker 1>hertz and as high as twenty thousand hertz, or twenty

0:05:11.080 --> 0:05:15.760
<v Speaker 1>kilohertz. Hertz in this case refers to one oscillation per

0:05:15.760 --> 0:05:19.039
<v Speaker 1>second, or one vibration per second, so twenty hertz means

0:05:19.560 --> 0:05:24.479
<v Speaker 1>that something is effectively vibrating twenty times per second. So

0:05:24.520 --> 0:05:27.640
<v Speaker 1>if you had a string that, when you plucked it,

0:05:27.680 --> 0:05:31.160
<v Speaker 1>would vibrate twenty times per second, that string is vibrating

0:05:31.200 --> 0:05:35.360
<v Speaker 1>at twenty hertz. That would be a very, very low note.
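Since one hertz is one oscillation per second, the time a single oscillation takes (its period) is just the reciprocal of the frequency. A minimal Python sketch; the function name `period_seconds` is hypothetical and only here for illustration:

```python
def period_seconds(frequency_hz: float) -> float:
    """Return how long one full oscillation lasts, in seconds (period = 1 / frequency)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# The rough limits of typical human hearing mentioned above.
for f in (20.0, 20_000.0):
    print(f"{f:>8.0f} Hz -> one cycle every {period_seconds(f) * 1000:.3f} ms")
```

So a 20 hertz string completes one vibration every 50 milliseconds, while a 20 kilohertz tone repeats a thousand times faster.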

0:05:35.960 --> 0:05:38.880
<v Speaker 1>The higher the frequency, the higher the pitch, and as

0:05:38.920 --> 0:05:41.200
<v Speaker 1>we age, we tend to lose the ability to hear

0:05:41.240 --> 0:05:44.200
<v Speaker 1>some of those higher pitches, which is why you would

0:05:44.240 --> 0:05:48.120
<v Speaker 1>hear about some convenience stores experimenting with playing very

0:05:48.200 --> 0:05:52.480
<v Speaker 1>high-pitched noises to discourage young punks who wanted to loiter

0:05:52.640 --> 0:05:56.960
<v Speaker 1>in the joint. So human hearing has limitations, and in

0:05:57.040 --> 0:06:01.880
<v Speaker 1>theory you can eliminate sounds that would fall outside of

0:06:01.920 --> 0:06:05.960
<v Speaker 1>those limitations. If a sound file contains frequencies that are

0:06:06.000 --> 0:06:09.520
<v Speaker 1>at twenty-one kilohertz, but your typical person can't

0:06:09.520 --> 0:06:14.560
<v Speaker 1>hear anything above twenty kilohertz, well, at least theoretically,

0:06:14.600 --> 0:06:17.599
<v Speaker 1>you can just toss that information and it won't change anything.
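Here is a deliberately naive sketch of "tossing" frequencies above the hearing limit: take a discrete Fourier transform, zero every frequency bin above a cutoff, and transform back. This is an illustration under assumptions, not how MP3 encoders work internally (Layer III splits audio into frequency bands with a filterbank and a modified discrete cosine transform, guided by a psychoacoustic model). The function name `lowpass_dft`, the 48 kHz sample rate, and the test tones are all made up for the example.

```python
import cmath
import math


def lowpass_dft(samples, sample_rate, cutoff_hz):
    """Remove frequency components above cutoff_hz using a naive O(n^2) DFT."""
    n = len(samples)
    # Forward DFT: one complex coefficient per frequency bin.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # For real input, bins k and n - k describe the same physical frequency.
        bin_freq = min(k, n - k) * sample_rate / n
        if bin_freq > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT; the input was real, so keep only the real part.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


rate = 48_000
n = 48  # 48 samples at 48 kHz puts the bins exactly 1 kHz apart
audible = [math.sin(2 * math.pi * 1_000 * t / rate) for t in range(n)]
ultrasonic = [0.5 * math.sin(2 * math.pi * 21_000 * t / rate) for t in range(n)]
mixed = [a + u for a, u in zip(audible, ultrasonic)]

filtered = lowpass_dft(mixed, rate, cutoff_hz=20_000)
# Only floating-point rounding separates the filtered signal from the 1 kHz tone.
print(max(abs(f - a) for f, a in zip(filtered, audible)))
```

The 21 kHz component disappears and the 1 kHz tone survives unchanged, which is the "in theory it won't change anything" case; real encoders have to be far more careful than a hard cutoff like this.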

0:06:17.880 --> 0:06:21.240
<v Speaker 1>If a sound file contains a sound but no one

0:06:21.320 --> 0:06:24.279
<v Speaker 1>has the capacity to hear it, does a tree fall

0:06:24.360 --> 0:06:28.440
<v Speaker 1>in the forest? Might be getting a little lost in

0:06:28.480 --> 0:06:32.760
<v Speaker 1>the woods here. Anyway, that frequency example, that's just one

0:06:32.839 --> 0:06:35.520
<v Speaker 1>example of a sound that humans would have trouble hearing.

0:06:35.920 --> 0:06:40.120
<v Speaker 1>Another example is when we hear a very soft sound

0:06:40.200 --> 0:06:44.400
<v Speaker 1>that immediately follows a very loud sound, we don't actually

0:06:44.440 --> 0:06:48.200
<v Speaker 1>perceive the soft one. The loud sound we hear eclipses

0:06:48.279 --> 0:06:51.400
<v Speaker 1>the soft sound, and it turns out we can't hear

0:06:51.440 --> 0:06:54.839
<v Speaker 1>the soft one at all. So again, if we can't

0:06:54.839 --> 0:06:58.120
<v Speaker 1>hear that soft sound that played immediately after a loud one,

0:06:58.680 --> 0:07:00.640
<v Speaker 1>why would you keep it? You know, you might as

0:07:00.680 --> 0:07:02.480
<v Speaker 1>well just get rid of the information. You can't hear

0:07:02.520 --> 0:07:05.720
<v Speaker 1>it anyway. Just get rid of it, save the space.
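The loud-then-soft effect described above can be sketched as a toy rule: drop any sound that starts shortly after a much louder one. The names `Sound` and `drop_masked`, and the thresholds `window_ms` and `gap_db`, are hypothetical values chosen for illustration; a real psychoacoustic model is far more detailed and also takes frequency into account.

```python
from dataclasses import dataclass


@dataclass
class Sound:
    start_ms: float
    level_db: float


def drop_masked(sounds, window_ms=100.0, gap_db=30.0):
    """Keep only sounds that are not masked by a much louder sound just before them.

    Toy rule (an assumption, not the MP3 model): a sound is masked if another
    sound started within window_ms before it and was at least gap_db louder.
    """
    kept = []
    for s in sounds:
        masked = any(
            0 < s.start_ms - other.start_ms <= window_ms
            and other.level_db - s.level_db >= gap_db
            for other in sounds
        )
        if not masked:
            kept.append(s)
    return kept


events = [
    Sound(start_ms=0, level_db=90),    # loud hit
    Sound(start_ms=20, level_db=40),   # soft, right after the loud hit: dropped
    Sound(start_ms=500, level_db=40),  # soft, long after the loud hit: kept
]
print(drop_masked(events))
```

Everything the rule drops is data the encoder no longer has to store, which is where the space savings come from.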

0:07:06.520 --> 0:07:10.120
<v Speaker 1>This psychoacoustic approach to sound would lead the developers of

0:07:10.120 --> 0:07:13.360
<v Speaker 1>the MP three format to create a strategy regarding what

0:07:13.520 --> 0:07:17.600
<v Speaker 1>information to keep and what information to ditch. On top

0:07:17.680 --> 0:07:22.000
<v Speaker 1>of that, the algorithm had sort of a sliding scale.

0:07:22.640 --> 0:07:25.680
<v Speaker 1>So maybe you want to keep as much information as possible,

0:07:25.720 --> 0:07:27.920
<v Speaker 1>so you select that when you create the MP three,

0:07:28.400 --> 0:07:32.000
<v Speaker 1>so you're losing less information in the process. You're still

0:07:32.000 --> 0:07:34.160
<v Speaker 1>compressing the file, but not to the extent that you

0:07:34.320 --> 0:07:38.640
<v Speaker 1>could if you chose to. Or maybe the most important thing to

0:07:38.720 --> 0:07:41.080
<v Speaker 1>you is that you reduce the file size as much

0:07:41.080 --> 0:07:44.440
<v Speaker 1>as you can, so you crank the compression up. Now,

0:07:44.480 --> 0:07:47.880
<v Speaker 1>obviously the harder you go, the more likely you're going

0:07:47.920 --> 0:07:50.680
<v Speaker 1>to lose information that will make a noticeable difference in

0:07:50.720 --> 0:07:54.760
<v Speaker 1>the playback of the audio file, and you would say, oh,

0:07:54.800 --> 0:07:57.400
<v Speaker 1>the quality here is not as good as I thought

0:07:57.440 --> 0:08:01.000
<v Speaker 1>it would be. This is where Tom's Diner comes in.

0:08:01.840 --> 0:08:05.160
<v Speaker 1>Karlheinz Brandenburg, who was one of the leads on

0:08:05.280 --> 0:08:09.800
<v Speaker 1>creating the MP three format, used Tom's Diner to listen

0:08:09.840 --> 0:08:13.920
<v Speaker 1>back to compressed files and determine how the compression was

0:08:13.960 --> 0:08:18.960
<v Speaker 1>affecting the audio quality. So it was a great track

0:08:19.040 --> 0:08:23.920
<v Speaker 1>to use because the actual qualities of the recording itself

0:08:24.600 --> 0:08:27.920
<v Speaker 1>were such that it was easy to detect if something

0:08:28.160 --> 0:08:32.200
<v Speaker 1>was not quite right. The original recording of Tom's Diner

0:08:32.320 --> 0:08:36.200
<v Speaker 1>is not the one that has the catchy beat and

0:08:36.240 --> 0:08:38.720
<v Speaker 1>the horns in it. It's a very simple a cappella

0:08:38.800 --> 0:08:42.200
<v Speaker 1>recording of Suzanne Vega singing her tale of looking at

0:08:42.200 --> 0:08:44.760
<v Speaker 1>the world from a male perspective through a sense of

0:08:44.800 --> 0:08:49.800
<v Speaker 1>distance and detachment. Brandenburg would use that track while tweaking

0:08:49.800 --> 0:08:52.920
<v Speaker 1>the algorithm, trying to walk the thin line between an

0:08:53.000 --> 0:08:57.440
<v Speaker 1>effective data compression technique and a minimal impact on sound quality,

0:08:57.559 --> 0:09:00.000
<v Speaker 1>and for her contributions to the effort, although she made

0:09:00.160 --> 0:09:04.600
<v Speaker 1>them unknowingly, Brandenburg would name Suzanne Vega the mother

0:09:04.880 --> 0:09:08.800
<v Speaker 1>of the MP three. Interestingly, Ryan Maguire decided to take

0:09:08.840 --> 0:09:12.280
<v Speaker 1>a sort of negative image of the compressed Tom's Diner.

0:09:12.320 --> 0:09:15.960
<v Speaker 1>He identified sounds that were deleted in the process of

0:09:16.000 --> 0:09:19.280
<v Speaker 1>creating a lossy version of Tom's Diner, and then he

0:09:19.360 --> 0:09:22.480
<v Speaker 1>created a new recording that contained only the bits that

0:09:22.679 --> 0:09:26.640
<v Speaker 1>had been cut from the file. And it's almost like

0:09:26.760 --> 0:09:29.439
<v Speaker 1>listening to the ghost of a song. In fact, I

0:09:29.480 --> 0:09:33.160
<v Speaker 1>think he called the project The Ghost in the MP three.

0:09:33.240 --> 0:09:35.640
<v Speaker 1>It's pretty creepy stuff. It would not be out of

0:09:35.679 --> 0:09:38.480
<v Speaker 1>place in a horror movie. The fact that lossy files,

0:09:38.480 --> 0:09:41.479
<v Speaker 1>by definition, lose information in the process of data compression

0:09:42.120 --> 0:09:45.480
<v Speaker 1>meant that audiophiles dismissed the MP three format as

0:09:45.559 --> 0:09:48.800
<v Speaker 1>inherently inferior to others, at least as far as listening

0:09:48.880 --> 0:09:52.440
<v Speaker 1>experiences go, and there are arguments that some of the

0:09:52.520 --> 0:09:56.720
<v Speaker 1>lost information, while potentially being imperceptible within the song itself,

0:09:57.040 --> 0:10:00.720
<v Speaker 1>helped shape the overall sound and tone of the piece. So

0:10:00.760 --> 0:10:05.040
<v Speaker 1>though you can't directly hear the stuff that's being cut,

0:10:05.600 --> 0:10:09.480
<v Speaker 1>that stuff actually influences how you perceive other things, so

0:10:09.960 --> 0:10:13.840
<v Speaker 1>you still change the experience of hearing the finished audio.
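To put rough numbers on the size side of that trade-off: CD-quality stereo audio (44.1 kHz sample rate, 16 bits per sample, two channels) runs at about 1,411 kilobits per second, while commonly used MP3 bitrates include 128, 192, and 320 kilobits per second. The sketch below only does that arithmetic; the function names are hypothetical, and the bitrates are typical settings rather than figures from the episode.

```python
def kilobits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def megabytes_per_minute(kbps):
    """Storage needed for one minute of audio at the given bitrate (1 MB = 10**6 bytes)."""
    return kbps * 1000 * 60 / 8 / 1_000_000


raw = kilobits_per_second(44_100, 16, 2)  # CD audio: 1411.2 kbit/s
print(f"raw PCM: {raw:.1f} kbit/s, {megabytes_per_minute(raw):.1f} MB per minute")

# Points along the MP3 "sliding scale": lower bitrate means a smaller file and more discarded data.
for mp3_kbps in (128, 192, 320):
    ratio = raw / mp3_kbps
    print(
        f"MP3 at {mp3_kbps} kbit/s: {megabytes_per_minute(mp3_kbps):.2f} MB per minute, "
        f"about {ratio:.1f}x smaller than raw"
    )
```

At 128 kbit/s, a minute of audio drops from roughly 10.6 MB to under 1 MB, which mattered a great deal for the storage and download speeds of the time.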

0:10:14.080 --> 0:10:17.120
<v Speaker 1>But the MP three format created the opportunity to store

0:10:17.200 --> 0:10:20.120
<v Speaker 1>and transfer audio files without having to deal with massive

0:10:20.200 --> 0:10:23.960
<v Speaker 1>raw audio formats, and back in the day that was

0:10:24.000 --> 0:10:27.840
<v Speaker 1>not a trivial thing. And so that is the answer

0:10:27.840 --> 0:10:32.320
<v Speaker 1>to the question. Tom's Diner was the first MP three. Hope

0:10:32.320 --> 0:10:36.400
<v Speaker 1>you're all well and I'll talk to you again really soon.

0:10:42.880 --> 0:10:47.520
<v Speaker 1>Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio,

0:10:47.840 --> 0:10:51.559
<v Speaker 1>visit the iHeartRadio app, Apple Podcasts, or wherever you listen

0:10:51.600 --> 0:10:56.200
<v Speaker 1>to your favorite shows.