WEBVTT - What was the first mp3? 0:00:04.440 --> 0:00:12.360 Welcome to TechStuff, a production from iHeartRadio. Hey there, 0:00:12.360 --> 0:00:15.840 and welcome to TechStuff. I'm your host, Jonathan Strickland. 0:00:15.840 --> 0:00:18.920 I'm an executive producer with iHeartRadio. And how the tech 0:00:18.960 --> 0:00:22.000 are you? It's time for a TechStuff tidbits. I'm 0:00:22.000 --> 0:00:26.680 going to answer the question: what was the first MP3? Well, 0:00:26.760 --> 0:00:29.520 here's the too long, didn't listen answer. It was Tom's 0:00:29.560 --> 0:00:33.680 Diner by Suzanne Vega. It's a song I personally do 0:00:33.760 --> 0:00:36.400 not like. That's not to say it's a bad song. 0:00:37.320 --> 0:00:39.520 Just because I don't like something doesn't mean it's bad. 0:00:40.320 --> 0:00:42.840 I just mean I personally do not find this song 0:00:43.120 --> 0:00:46.640 at all appealing. But it was, in fact, the first 0:00:46.720 --> 0:00:49.839 MP3. Now, if you don't know Tom's Diner, it 0:00:49.920 --> 0:00:53.040 features Vega giving a little slice-of-life moment from 0:00:53.120 --> 0:00:56.240 the perspective of a man sitting in a diner who 0:00:56.280 --> 0:01:00.400 feels kind of distanced from the world around him. In 0:01:00.440 --> 0:01:04.800 case you need a reminder, here's the first verse of 0:01:04.840 --> 0:01:08.039 the song. I am sitting in the morning at the 0:01:08.120 --> 0:01:10.840 diner on the corner. I am waiting at the counter 0:01:11.319 --> 0:01:14.279 for the man to pour the coffee, and he fills 0:01:14.319 --> 0:01:18.240 it only halfway and before I even argue he is 0:01:18.280 --> 0:01:24.520 looking out the window at somebody coming in. Now that 0:01:24.920 --> 0:01:28.000 song doesn't work for me. I get that it got 0:01:28.000 --> 0:01:31.720 really popular, especially after someone did an unauthorized remix of it, 0:01:31.760 --> 0:01:35.160 which is the version most people know.
But it turned 0:01:35.200 --> 0:01:38.800 out to be an absolutely perfect song to test the 0:01:38.959 --> 0:01:43.520 MP3 compression algorithm. To understand why, we need to 0:01:43.600 --> 0:01:47.119 learn about the purpose of the MP3 compression algorithm 0:01:47.200 --> 0:01:50.240 in the first place. So in this case, the compression 0:01:50.280 --> 0:01:53.880 we're talking about relates to file size. There's an 0:01:53.920 --> 0:01:57.360 interesting side note. There's a different kind of audio compression. 0:01:57.800 --> 0:02:01.840 This refers to the reduction of dynamic range in a recording, 0:02:02.960 --> 0:02:07.760 and by that I mean reducing the volume distance between 0:02:07.800 --> 0:02:11.240 the loudest and the softest parts of a recording. That 0:02:11.280 --> 0:02:16.400 can actually play a part in file compression as well, 0:02:16.440 --> 0:02:19.399 but we're going to set that aside. Just put 0:02:19.400 --> 0:02:21.360 a pin in that, take a look at it later on. 0:02:22.040 --> 0:02:25.960 But with file compression generally, the whole goal is to 0:02:26.040 --> 0:02:30.760 find ways to pack information into smaller file sizes. That 0:02:30.800 --> 0:02:34.880 makes those files easier to manage. That's important if you 0:02:34.960 --> 0:02:38.400 are dealing with a limited amount of storage, or maybe 0:02:38.400 --> 0:02:41.160 you want to send the file from one machine to another 0:02:41.240 --> 0:02:45.000 and you've got limited bandwidth, so you need smaller file sizes, 0:02:45.080 --> 0:02:47.680 or else the process is going to take way too long. 0:02:48.120 --> 0:02:51.160 But how do you do it well? One approach to 0:02:51.440 --> 0:02:55.080 file compression is to take a real good look at 0:02:55.120 --> 0:02:59.400 the file you're trying to compress, and you ask the question, 0:03:00.280 --> 0:03:03.480 is all the information that is inside this file necessary?
0:03:04.000 --> 0:03:06.800 Or could I get rid of some of that information 0:03:07.320 --> 0:03:11.160 and still have a usable file on the other side 0:03:11.160 --> 0:03:15.080 of it? With music, that means figuring out which bits 0:03:15.080 --> 0:03:18.560 of data you can drop without it having a noticeable 0:03:18.600 --> 0:03:23.600 effect on the audio quality. Ideally, the compressed file would 0:03:23.600 --> 0:03:28.040 be indistinguishable from the original raw audio, but since you're 0:03:28.120 --> 0:03:32.919 tossing out information, that's not necessarily a guarantee. This is 0:03:33.000 --> 0:03:37.760 what makes the MP3 a lossy file format. 0:03:38.320 --> 0:03:41.120 MP3 is just one example of a lossy 0:03:41.320 --> 0:03:44.360 file format. There are others, and the word lossy 0:03:44.600 --> 0:03:47.600 means just exactly what you think. It means that some 0:03:47.800 --> 0:03:52.480 information is tossed aside or lost in the process of 0:03:52.560 --> 0:03:56.080 compressing the file to a smaller size. The folks who 0:03:56.120 --> 0:03:58.960 worked on the MP3 format had to figure out 0:03:59.320 --> 0:04:02.480 which information was most likely to have little to no 0:04:02.680 --> 0:04:06.840 impact on audio quality within an audio file. To do that, 0:04:07.400 --> 0:04:10.840 they had to take into account human psychology and the 0:04:10.880 --> 0:04:16.920 limitations of human hearing. So psychoacoustics played a big part 0:04:17.040 --> 0:04:21.520 in determining the MP3 compression algorithm. So for example, 0:04:22.000 --> 0:04:25.320 by that, I mean, let's think of the range of 0:04:25.400 --> 0:04:28.240 human hearing in terms of frequencies for a second. So 0:04:28.440 --> 0:04:33.520 your typical human is able to hear frequencies as low 0:04:33.680 --> 0:04:38.480 as twenty hertz and as high as twenty thousand hertz, 0:04:38.560 --> 0:04:42.760 or twenty kilohertz.
Hertz in this case refers to an 0:04:42.800 --> 0:04:46.240 oscillation per second or a vibration per second. So twenty 0:04:46.360 --> 0:04:52.160 hertz means that something is effectively vibrating twenty times per second. 0:04:52.360 --> 0:04:55.480 So if you had a string that when you plucked, 0:04:55.520 --> 0:04:58.560 it would vibrate twenty times per second, that string is 0:04:58.640 --> 0:05:02.559 vibrating at twenty hertz. That would be a very, very 0:05:02.640 --> 0:05:05.800 low note. The higher the frequency, the higher the pitch, 0:05:06.360 --> 0:05:08.800 and as we age we tend to lose the ability 0:05:08.839 --> 0:05:11.800 to hear some of those higher pitches, which is why 0:05:11.839 --> 0:05:15.520 you would hear about some convenience stores experimenting with playing 0:05:15.640 --> 0:05:19.800 very high-pitched noises to discourage young punks who wanted 0:05:19.800 --> 0:05:24.400 to loiter in the joint. So human hearing has limitations, 0:05:24.480 --> 0:05:28.559 and in theory you can eliminate sounds that would fall 0:05:28.680 --> 0:05:33.279 outside of those limitations. If a sound file contains frequencies 0:05:33.640 --> 0:05:36.800 that are at twenty-one kilohertz, but your typical 0:05:36.839 --> 0:05:41.640 person can't hear anything above twenty kilohertz, well, at 0:05:41.720 --> 0:05:44.479 least theoretically, you can just toss that information and it 0:05:44.560 --> 0:05:47.920 won't change anything. If a sound file contains a sound 0:05:48.560 --> 0:05:51.479 but no one has the capacity to hear it, does 0:05:51.480 --> 0:05:55.720 a tree fall in the forest? Might be getting a 0:05:55.720 --> 0:05:59.839 little lost in the woods here. Anyway, that frequency example, 0:05:59.839 --> 0:06:02.479 that's just one example of a sound that humans would 0:06:02.480 --> 0:06:06.920 have trouble hearing.
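To make the frequency example concrete, here is a minimal sketch in Python of throwing away anything outside the nominal hearing range of twenty hertz to twenty kilohertz. It assumes the sound has already been broken down into a list of (frequency, amplitude) pairs. That list, the constant names, and the function name audible_only are all made up for illustration. A real MP3 encoder is far more involved than this.

```python
# Hypothetical sketch: drop frequency components outside the nominal range
# of human hearing (20 Hz to 20 kHz), as described in the episode.
# This is NOT how an actual MP3 encoder is implemented.
LOW_HZ = 20.0
HIGH_HZ = 20_000.0


def audible_only(components):
    """Keep only (frequency_hz, amplitude) pairs inside the audible range."""
    return [(f, a) for f, a in components if LOW_HZ <= f <= HIGH_HZ]


# Illustrative, made-up spectrum: one rumble below 20 Hz, two audible tones,
# and one component at 21 kHz that a typical listener would not hear.
spectrum = [(15.0, 0.2), (440.0, 1.0), (19_500.0, 0.1), (21_000.0, 0.05)]
kept = audible_only(spectrum)
print(kept)
```

Only the 440 hertz and 19.5 kilohertz components survive, so there is less data to store.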
So another is when we hear a 0:06:07.080 --> 0:06:11.240 very soft sound that immediately follows a very loud sound, 0:06:11.680 --> 0:06:14.560 we don't actually perceive the soft one. The loud sound 0:06:14.640 --> 0:06:18.360 we hear eclipses the soft sound, and it turns out 0:06:18.760 --> 0:06:21.640 we can't hear the soft one at all. So again, 0:06:22.160 --> 0:06:25.120 if we can't hear that soft sound that played immediately 0:06:25.200 --> 0:06:28.200 after a loud one, why would you keep it? You know, 0:06:28.240 --> 0:06:29.880 you might as well just get rid of that information. 0:06:29.960 --> 0:06:32.520 You can't hear it anyway. Just get rid of it. 0:06:32.839 --> 0:06:37.320 Save the space. This psychoacoustic approach to sound would lead 0:06:37.360 --> 0:06:39.640 the developers of the MP3 format to create a 0:06:39.680 --> 0:06:44.200 strategy regarding what information to keep and what information to ditch. 0:06:45.160 --> 0:06:48.520 On top of that, the algorithm had sort of a 0:06:48.560 --> 0:06:52.480 sliding scale, so maybe you want to keep as much 0:06:52.520 --> 0:06:55.080 information as possible, so you select that when you create 0:06:55.120 --> 0:06:59.480 the MP3, so you're losing less information in the process. 0:06:59.520 --> 0:07:01.800 You're still compressing the file, but not to the extent 0:07:01.839 --> 0:07:06.320 that you could if you chose. Maybe the most important 0:07:06.320 --> 0:07:08.640 thing to you is that you reduce the file size 0:07:08.680 --> 0:07:12.400 as much as you can, so you crank the compression up. Now, 0:07:12.440 --> 0:07:15.800 obviously the harder you go, the more likely you're going 0:07:15.840 --> 0:07:18.600 to lose information that will make a noticeable difference in 0:07:18.640 --> 0:07:22.680 the playback of the audio file, and you would say, oh, 0:07:22.720 --> 0:07:25.360 the quality here is not as good as I thought 0:07:25.360 --> 0:07:28.920 it would be.
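To put rough numbers on that sliding scale, here is a small back-of-the-envelope sketch in Python. The episode does not name any specific settings, so everything here is an assumption for illustration: the raw audio is taken to be CD-quality stereo (44,100 samples per second, 16 bits per sample, two channels), and the MP3 settings are two commonly used bitrates, 128 and 320 kilobits per second. The function names are made up.

```python
# Hypothetical arithmetic sketch: how file size shrinks as compression is
# cranked up. All parameters are assumptions, not values from the episode.
CD_SAMPLE_RATE_HZ = 44_100   # samples per second, per channel (assumed)
CD_BITS_PER_SAMPLE = 16      # bits per sample (assumed)
CD_CHANNELS = 2              # stereo (assumed)


def raw_bits_per_second():
    """Bit rate of uncompressed audio under the assumptions above."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def file_size_bytes(bits_per_second, seconds):
    """Approximate size of an audio stream, ignoring headers and metadata."""
    return bits_per_second * seconds // 8


def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than the raw stream."""
    return raw_bits_per_second() / (mp3_kbps * 1000)


three_minutes = 180  # seconds
print(file_size_bytes(raw_bits_per_second(), three_minutes))  # raw audio
print(file_size_bytes(128_000, three_minutes))                # harder compression
print(file_size_bytes(320_000, three_minutes))                # gentler compression
print(compression_ratio(128), compression_ratio(320))
```

Under these assumptions a three-minute track goes from roughly 32 megabytes raw to under 3 megabytes at the lower bitrate. That is the trade: the smaller you go, the more information you have to throw away.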
This is where Tom's Diner comes in. 0:07:29.760 --> 0:07:33.080 Karlheinz Brandenburg, who is one of the leads on 0:07:33.240 --> 0:07:37.720 creating the MP3 format, used Tom's Diner to listen 0:07:37.760 --> 0:07:41.840 back to compressed files and determine how the compression was 0:07:41.880 --> 0:07:46.920 affecting the audio quality. So it was a great track 0:07:47.000 --> 0:07:51.840 to use because the actual qualities of the recording itself 0:07:52.520 --> 0:07:55.880 were such that it was easy to detect if something 0:07:56.120 --> 0:07:59.960 was not quite right. The original recording of Tom's Diner 0:08:00.240 --> 0:08:04.120 is not the one that has the catchy beat and 0:08:04.160 --> 0:08:06.640 the horns in it. It's a very simple a cappella 0:08:06.720 --> 0:08:10.160 recording of Suzanne Vega singing her tale of looking at 0:08:10.160 --> 0:08:12.680 the world from a male perspective through a sense of 0:08:12.760 --> 0:08:17.320 distance and detachment. Brandenburg would use that track while 0:08:17.320 --> 0:08:20.720 tweaking the algorithm, trying to find the thin line between 0:08:20.760 --> 0:08:24.440 an effective data compression technique and a minimal impact on 0:08:24.560 --> 0:08:27.679 sound quality. And for her contributions to the effort, although 0:08:27.680 --> 0:08:32.200 she made them unknowingly, Brandenburg would name Suzanne Vega the 0:08:32.280 --> 0:08:36.559 mother of the MP3. Interestingly, Ryan Maguire decided to 0:08:36.600 --> 0:08:40.200 take a sort of negative image of the compressed Tom's Diner. 0:08:40.280 --> 0:08:43.920 He identified sounds that were deleted in the process of 0:08:43.920 --> 0:08:47.240 creating a lossy version of Tom's Diner, and then he 0:08:47.320 --> 0:08:50.440 created a new recording that contained only the bits that 0:08:50.600 --> 0:08:54.600 had been cut from the file. And it's almost like 0:08:54.720 --> 0:08:57.400 listening to the ghost of a song.
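The negative image idea can be sketched very roughly in Python as a sample-by-sample difference: take the original audio, subtract what came back out of the lossy file, and whatever is left over is the part that was cut. This is only an illustration of the concept and is not a description of how the actual project was made. The function name residual and the sample values are invented, and the sketch assumes both recordings are already decoded to integer samples of the same length and perfectly aligned in time.

```python
# Hypothetical sketch of a "negative image" of a lossy file: the difference
# between the original samples and the samples decoded from the lossy copy.
# Assumes equal-length, time-aligned integer sample lists (an assumption).
def residual(original, decoded):
    """Return what the lossy round trip removed, sample by sample."""
    if len(original) != len(decoded):
        raise ValueError("recordings must have the same number of samples")
    return [o - d for o, d in zip(original, decoded)]


# Made-up sample values for illustration only.
original = [1000, -500, 1500, 0]
decoded = [990, -480, 1500, 10]
ghost = residual(original, decoded)
print(ghost)

# Adding the residual back to the decoded audio recovers the original.
restored = [d + g for d, g in zip(decoded, ghost)]
print(restored == original)
```

Played on its own, a residual like this contains only the material the compression discarded.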
In fact, I 0:08:57.400 --> 0:09:01.040 think they called the project The Ghost in the MP3. 0:09:01.160 --> 0:09:03.520 It's pretty creepy stuff. It would not be out of 0:09:03.600 --> 0:09:06.360 place in a horror movie. The fact that lossy files 0:09:06.400 --> 0:09:09.400 by definition lose information in the process of data compression 0:09:10.040 --> 0:09:13.440 meant that audiophiles dismissed the MP3 format as 0:09:13.480 --> 0:09:16.800 inherently inferior to others, at least as far as listening 0:09:16.840 --> 0:09:20.360 experiences go. And there are arguments that some of the 0:09:20.440 --> 0:09:24.679 lost information, while potentially being imperceptible within the song itself, 0:09:25.000 --> 0:09:28.520 helps shape the overall sound and tone of the piece. 0:09:28.559 --> 0:09:33.000 So though you can't directly hear the stuff that's being cut, 0:09:33.559 --> 0:09:37.400 that stuff actually influences how you perceive other things, so 0:09:37.920 --> 0:09:41.760 you still change the experience of hearing the finished audio. 0:09:42.000 --> 0:09:45.079 But the MP3 format created the opportunity to store 0:09:45.120 --> 0:09:48.040 and transfer audio files without having to deal with massive 0:09:48.160 --> 0:09:51.880 raw audio formats, and back in the day that was 0:09:51.960 --> 0:09:55.720 not a trivial thing. And so that is the answer 0:09:55.800 --> 0:10:00.280 to the question: Tom's Diner, the first MP3. Hope 0:10:00.280 --> 0:10:04.360 you're all well, and I'll talk to you again really soon. 0:10:10.800 --> 0:10:15.480 TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, 0:10:15.800 --> 0:10:19.520 visit the iHeartRadio app, Apple Podcasts, or wherever you listen 0:10:19.520 --> 0:10:20.600 to your favorite shows.