WEBVTT - Rerun: What was the first MP3? 0:00:04.440 --> 0:00:12.479 Welcome to TechStuff, a production from iHeartRadio. Hey there, 0:00:12.520 --> 0:00:15.720 and welcome to TechStuff. I'm your host, Jonathan Strickland. 0:00:15.720 --> 0:00:18.799 I'm an executive producer with iHeart Podcasts, and how the 0:00:18.880 --> 0:00:22.239 tech are you? So I'm getting ready to go on vacation, 0:00:22.560 --> 0:00:26.280 which means we've got some classic episodes lined up for you. Actually, 0:00:26.280 --> 0:00:30.040 these aren't that classic. These came out last year, and 0:00:30.080 --> 0:00:33.959 today I thought I would bring a short one for you. 0:00:34.440 --> 0:00:38.840 This one was originally published on June seventh, twenty twenty three. 0:00:39.200 --> 0:00:42.479 It's a fun little episode. It is titled What Was 0:00:42.960 --> 0:00:46.560 the First MP3? This is like one of those 0:00:46.600 --> 0:00:50.200 pub-trivia-style TechStuff topics. I hope you enjoy. 0:00:52.280 --> 0:00:54.240 It's time for TechStuff Tidbits. I'm going to 0:00:54.320 --> 0:00:58.720 answer the question: what was the first MP3? Well, 0:00:58.800 --> 0:01:01.960 here's the too-long-didn't-read answer. It was Tom's Diner 0:01:02.040 --> 0:01:06.360 by Suzanne Vega. It's a song I personally do not like. 0:01:07.000 --> 0:01:09.800 That's not to say it's a bad song. Just because 0:01:09.920 --> 0:01:12.679 I don't like something doesn't mean it's bad. I just 0:01:12.720 --> 0:01:16.000 mean I personally do not find this song at all appealing. 0:01:16.640 --> 0:01:19.520 But it was, in fact, the first MP3. Now, 0:01:19.520 --> 0:01:23.320 if you don't know Tom's Diner, it features Vega giving 0:01:23.360 --> 0:01:26.360 a little slice-of-life moment from the perspective of 0:01:26.400 --> 0:01:29.240 a man sitting in a diner who feels kind of 0:01:29.280 --> 0:01:32.960 distanced from the world around him.
In case you need 0:01:33.000 --> 0:01:38.000 a reminder, here's the first verse of the song: "I 0:01:38.040 --> 0:01:41.240 am sitting in the morning at the diner on the corner. 0:01:41.480 --> 0:01:44.120 I am waiting at the counter for the man to 0:01:44.240 --> 0:01:47.840 pour the coffee, and he fills it only halfway, and 0:01:47.920 --> 0:01:51.400 before I even argue, he is looking out the window 0:01:51.840 --> 0:01:58.480 at somebody coming in." Now, that song doesn't work for me. 0:01:59.160 --> 0:02:01.880 I get that it got really popular, especially after someone 0:02:01.920 --> 0:02:04.800 did an unauthorized remix of it, which is the version 0:02:04.920 --> 0:02:08.160 most people know. But it turned out to be an 0:02:08.320 --> 0:02:12.919 absolutely perfect song to test the MP3 compression algorithm. 0:02:13.360 --> 0:02:16.800 To understand why, we need to learn about the purpose 0:02:16.880 --> 0:02:20.040 of the MP3 compression algorithm in the first place. 0:02:20.360 --> 0:02:23.440 In this case, the compression we're talking about 0:02:23.520 --> 0:02:27.200 relates to file size. Here's an interesting side note: there's 0:02:27.240 --> 0:02:30.960 a different kind of audio compression. It refers to the 0:02:31.000 --> 0:02:35.360 reduction of dynamic range in a recording, and by that 0:02:35.480 --> 0:02:40.720 I mean reducing the volume distance between the loudest and 0:02:40.800 --> 0:02:44.240 the softest parts of a recording. That can actually play 0:02:44.600 --> 0:02:49.639 a part in file compression as well, but we're 0:02:49.639 --> 0:02:52.080 going to set it aside. Just put a pin in that; 0:02:52.280 --> 0:02:54.920 we'll take a look at it later on. But with file 0:02:55.000 --> 0:02:58.840 compression generally, the whole goal is to find ways to 0:02:58.960 --> 0:03:03.800 pack information into smaller file sizes. That makes those files 0:03:03.840 --> 0:03:07.600 easier to manage.
That's important if you are dealing with 0:03:07.639 --> 0:03:11.000 a limited amount of storage, or maybe you want to 0:03:11.080 --> 0:03:13.639 send the file from one machine to another and you've 0:03:13.639 --> 0:03:17.200 got limited bandwidth, so you need smaller file sizes or 0:03:17.240 --> 0:03:19.760 else the process is going to take way too long. 0:03:20.200 --> 0:03:23.200 But how do you do it? Well, one approach to 0:03:23.520 --> 0:03:27.160 file compression is to take a real good look at 0:03:27.160 --> 0:03:31.480 the file you're trying to compress and ask the question: 0:03:32.360 --> 0:03:35.560 is all the information inside this file necessary? 0:03:36.080 --> 0:03:38.840 Or could I get rid of some of that information 0:03:39.400 --> 0:03:43.200 and still have a usable file on the other side 0:03:43.200 --> 0:03:47.080 of it? With music, that means figuring out which bits 0:03:47.160 --> 0:03:50.640 of data you can drop without having a noticeable 0:03:50.680 --> 0:03:55.640 effect on the audio quality. Ideally, the compressed file would 0:03:55.680 --> 0:04:00.600 be indistinguishable from the original raw audio, but since you're tossing 0:04:00.680 --> 0:04:05.160 out information, that's not guaranteed. This is what 0:04:05.320 --> 0:04:10.520 makes the MP3 a lossy file format. 0:04:10.600 --> 0:04:14.120 MP3 is just one example of a lossy file format. 0:04:14.160 --> 0:04:17.159 There are others, and the word lossy means just 0:04:17.320 --> 0:04:21.039 what you think. It means that some information is 0:04:21.160 --> 0:04:25.359 tossed aside, or lost, in the process of compressing the 0:04:25.360 --> 0:04:28.520 file to a smaller size. The folks who worked on 0:04:28.560 --> 0:04:32.360 the MP3 format had to figure out which information 0:04:32.920 --> 0:04:35.359 was most likely to have little to no impact on 0:04:35.480 --> 0:04:39.600 audio quality within an audio file.
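The difference between lossless and lossy compression fits in a few lines of Python. This is a minimal sketch, not how MP3 works internally: `zlib` stands in for a lossless compressor, and the hypothetical `lossy_encode`/`lossy_decode` pair simply drops the four least significant bits of each 16-bit sample. Each sample then needs 12 bits instead of 16, and the dropped detail cannot be recovered.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib: every byte comes back unchanged."""
    return zlib.decompress(zlib.compress(data))

# Hypothetical toy lossy codec: keep only the top 12 bits of each 16-bit sample.
DROP_BITS = 4

def lossy_encode(samples: list[int]) -> list[int]:
    # Arithmetic right shift discards the low bits; they are never stored.
    return [s >> DROP_BITS for s in samples]

def lossy_decode(codes: list[int]) -> list[int]:
    # Shift back up; the discarded low bits come back as zeros.
    return [c << DROP_BITS for c in codes]

samples = [1000, -1234, 32767, -32768, 5]  # made-up 16-bit PCM values
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

assert lossless_roundtrip(raw) == raw
decoded = lossy_decode(lossy_encode(samples))
print(decoded)                                    # [992, -1248, 32752, -32768, 0]
print([a - b for a, b in zip(samples, decoded)])  # [8, 14, 15, 0, 5]
```

The point is the asymmetry: the zlib round trip is exact, while the decoded samples are close but not identical, and no decoder can restore the missing bits. A real perceptual codec differs in *which* information it chooses to drop.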
To decide what to cut, they 0:04:39.640 --> 0:04:43.880 had to take into account human psychology and the limitations 0:04:44.000 --> 0:04:49.159 of human hearing. So psychoacoustics played a big part in 0:04:49.240 --> 0:04:54.359 determining the MP3 compression algorithm. For example, 0:04:54.480 --> 0:04:58.039 let's think of the range of human hearing 0:04:58.080 --> 0:05:01.160 in terms of frequencies for a second. Your typical 0:05:01.240 --> 0:05:06.360 human is able to hear frequencies as low as twenty 0:05:06.440 --> 0:05:11.000 hertz and as high as twenty thousand hertz, or twenty 0:05:11.080 --> 0:05:15.760 kilohertz. Hertz in this case refers to oscillations per 0:05:15.760 --> 0:05:19.039 second, or vibrations per second, so twenty hertz means 0:05:19.560 --> 0:05:24.479 that something is effectively vibrating twenty times per second. So 0:05:24.520 --> 0:05:27.640 if you had a string that, when you plucked it, 0:05:27.680 --> 0:05:31.160 would vibrate twenty times per second, that string is vibrating 0:05:31.200 --> 0:05:35.360 at twenty hertz. That would be a very, very low note. 0:05:35.960 --> 0:05:38.880 The higher the frequency, the higher the pitch, and as 0:05:38.920 --> 0:05:41.200 we age, we tend to lose the ability to hear 0:05:41.240 --> 0:05:44.200 some of those higher pitches, which is why you would 0:05:44.240 --> 0:05:48.120 hear about some convenience stores experimenting with playing very 0:05:48.200 --> 0:05:52.480 high-pitched noises to discourage young punks who wanted to loiter 0:05:52.640 --> 0:05:56.960 in the joint. So human hearing has limitations, and in 0:05:57.040 --> 0:06:01.880 theory you can eliminate sounds that fall outside of 0:06:01.920 --> 0:06:05.960 those limitations.
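Eliminating sounds outside the range of hearing can be illustrated with a discrete Fourier transform. This is a toy under stated assumptions, not the MP3 algorithm (MP3 encoders use a filter bank plus a modified discrete cosine transform and allocate bits with a psychoacoustic model). The sketch mixes an audible 1 kHz tone with a 21 kHz tone at an assumed 48 kHz sample rate, zeroes every frequency bin above 20 kHz, and checks that what remains is the 1 kHz tone. All names and parameters here are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 48_000  # assumed sample rate; Nyquist limit is 24 kHz, so 21 kHz fits
N = 480               # 10 ms of audio; DFT bins are SAMPLE_RATE / N = 100 Hz apart

def tone(freq_hz: float, amp: float) -> list[float]:
    return [amp * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(N)]

def dft(x: list[float]) -> list[complex]:
    # Naive O(n^2) discrete Fourier transform; fine for a few hundred samples.
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum: list[complex]) -> list[float]:
    n = len(spectrum)
    return [(sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n).real
            for k in range(n)]

def remove_above(x: list[float], cutoff_hz: float) -> list[float]:
    spectrum = dft(x)
    n = len(spectrum)
    for m in range(n):
        # Bins past n/2 mirror the negative frequencies of a real signal.
        freq_hz = min(m, n - m) * SAMPLE_RATE / n
        if freq_hz > cutoff_hz:
            spectrum[m] = 0j
    return idft(spectrum)

audible = tone(1_000, 1.0)
inaudible = tone(21_000, 0.5)
mixed = [a + b for a, b in zip(audible, inaudible)]
filtered = remove_above(mixed, 20_000)

deviation = max(abs(f - a) for f, a in zip(filtered, audible))
print(f"largest deviation from the 1 kHz tone alone: {deviation:.1e}")
```

Both test tones complete a whole number of cycles in the 10 ms window, so each lands exactly on one bin. Real recordings are messier, which is one reason practical encoders work on short, windowed, overlapping blocks instead of one big transform.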
If a sound file contains frequencies that are 0:06:06.000 --> 0:06:09.520 at twenty-one kilohertz, but your typical person can't 0:06:09.520 --> 0:06:14.560 hear anything above twenty kilohertz, well, at least theoretically, 0:06:14.600 --> 0:06:17.599 you can just toss that information and it won't change anything. 0:06:17.880 --> 0:06:21.240 If a sound file contains a sound but no one 0:06:21.320 --> 0:06:24.279 has the capacity to hear it, does a tree fall 0:06:24.360 --> 0:06:28.440 in the forest? I might be getting a little lost in 0:06:28.480 --> 0:06:32.760 the woods here. Anyway, that frequency example is just one 0:06:32.839 --> 0:06:35.520 example of a sound that humans would have trouble hearing. 0:06:35.920 --> 0:06:40.120 Another is when we hear a very soft sound 0:06:40.200 --> 0:06:44.400 that immediately follows a very loud sound: we don't actually 0:06:44.440 --> 0:06:48.200 perceive the soft one. The loud sound we hear eclipses 0:06:48.279 --> 0:06:51.400 the soft sound, and it turns out we can't hear 0:06:51.440 --> 0:06:54.839 the soft one at all. So again, if we can't 0:06:54.839 --> 0:06:58.120 hear that soft sound that played immediately after a loud one, 0:06:58.680 --> 0:07:00.640 why would you keep it? You know, you might as 0:07:00.680 --> 0:07:02.480 well just get rid of the information. You can't hear 0:07:02.520 --> 0:07:05.720 it anyway. Just get rid of it and save the space. 0:07:06.520 --> 0:07:10.120 This psychoacoustic approach to sound would lead the developers of 0:07:10.120 --> 0:07:13.360 the MP3 format to create a strategy regarding what 0:07:13.520 --> 0:07:17.600 information to keep and what information to ditch.
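The loud-then-soft case, which psychoacoustics calls temporal or forward masking, can be written as a simple filtering rule. The 100 ms window and 20 dB margin below are made-up illustration values, not MP3 parameters, and `Event`, `is_masked`, and `drop_post_masked` are hypothetical names. A real encoder folds masking effects into its psychoacoustic model rather than deleting whole sound events.

```python
from dataclasses import dataclass

# Illustrative assumptions, not measured psychoacoustic constants.
MASK_WINDOW_MS = 100.0  # how long after a loud sound a quiet one may be hidden
MASK_MARGIN_DB = 20.0   # how much quieter the follower must be to count as masked

@dataclass
class Event:
    time_ms: float
    level_db: float

def is_masked(event: Event, events: list[Event]) -> bool:
    # Masked if some earlier event inside the window is much louder.
    return any(
        0 < event.time_ms - other.time_ms <= MASK_WINDOW_MS
        and other.level_db - event.level_db >= MASK_MARGIN_DB
        for other in events
    )

def drop_post_masked(events: list[Event]) -> list[Event]:
    ordered = sorted(events, key=lambda ev: ev.time_ms)
    return [ev for ev in ordered if not is_masked(ev, events)]

events = [
    Event(0.0, 90.0),    # loud hit
    Event(30.0, 40.0),   # soft sound right after it: masked
    Event(50.0, 80.0),   # only 10 dB quieter: kept
    Event(200.0, 40.0),  # soft, but well outside the window: kept
]
kept = drop_post_masked(events)
print([ev.time_ms for ev in kept])  # [0.0, 50.0, 200.0]
```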
On top 0:07:17.680 --> 0:07:22.000 of that, the algorithm had sort of a sliding scale. 0:07:22.640 --> 0:07:25.680 Maybe you want to keep as much information as possible, 0:07:25.720 --> 0:07:27.920 so you select that when you create the MP3, 0:07:28.400 --> 0:07:32.000 and you're losing less information in the process. You're still 0:07:32.000 --> 0:07:34.160 compressing the file, but not to the extent that you 0:07:34.320 --> 0:07:38.640 could if you chose. Or maybe the most important thing to 0:07:38.720 --> 0:07:41.080 you is that you reduce the file size as much 0:07:41.080 --> 0:07:44.440 as you can, so you crank the compression up. Now, 0:07:44.480 --> 0:07:47.880 obviously, the harder you go, the more likely you're going 0:07:47.920 --> 0:07:50.680 to lose information that will make a noticeable difference in 0:07:50.720 --> 0:07:54.760 the playback of the audio file, and you would say, "Oh, 0:07:54.800 --> 0:07:57.400 the quality here is not as good as I thought 0:07:57.440 --> 0:08:01.000 it would be." This is where Tom's Diner comes in. 0:08:01.840 --> 0:08:05.160 Karlheinz Brandenburg, who was one of the leads on 0:08:05.280 --> 0:08:09.800 creating the MP3 format, used Tom's Diner to listen 0:08:09.840 --> 0:08:13.920 back to compressed files and determine how the compression was 0:08:13.960 --> 0:08:18.960 affecting the audio quality. It was a great track 0:08:19.040 --> 0:08:23.920 to use because the qualities of the recording itself 0:08:24.600 --> 0:08:27.920 made it easy to detect if something 0:08:28.160 --> 0:08:32.200 was not quite right. The original recording of Tom's Diner 0:08:32.320 --> 0:08:36.200 is not the one that has the catchy beat and 0:08:36.240 --> 0:08:38.720 the horns in it.
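The sliding scale trades file size against fidelity. MP3 encoders expose this knob as a bitrate, such as 128 or 320 kbps, rather than as bits per sample. As a rough stand-in, the sketch below quantizes a made-up 440 Hz test tone with fewer and fewer bits per sample and reports the storage those samples would need next to the error introduced. The `quantize` helper is hypothetical and much cruder than a real encoder; the takeaway is only the trend.

```python
import math

def quantize(x: list[float], bits: int) -> list[float]:
    """Round samples in [-1, 1] to a signed grid with the given bit depth (bits >= 2)."""
    steps = 2 ** (bits - 1) - 1
    return [round(v * steps) / steps for v in x]

def rms_error(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# 0.1 s of a 440 Hz tone at 44.1 kHz (made-up test signal)
signal = [0.8 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4_410)]

results = []
for bits in (16, 8, 4, 2):
    size_bytes = len(signal) * bits // 8
    err = rms_error(signal, quantize(signal, bits))
    results.append((bits, size_bytes, err))
    print(f"{bits:2d} bits/sample: {size_bytes:6d} bytes, RMS error {err:.6f}")
```

Cranking the knob toward fewer bits shrinks the data, and the error grows with every step, which is the audible "not as good as I thought" effect in miniature.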
The original is a very simple a cappella 0:08:38.800 --> 0:08:42.200 recording of Suzanne Vega singing her tale of looking at 0:08:42.200 --> 0:08:44.760 the world from a male perspective through a sense of 0:08:44.800 --> 0:08:49.800 distance and detachment. Brandenburg would use that track while tweaking 0:08:49.800 --> 0:08:52.920 the algorithm, trying to walk the thin line between an 0:08:53.000 --> 0:08:57.440 effective data compression technique and a minimal impact on sound quality, 0:08:57.559 --> 0:09:00.000 and for her contributions to the effort, although she made 0:09:00.160 --> 0:09:04.600 them unknowingly, Brandenburg would name Suzanne Vega the mother 0:09:04.880 --> 0:09:08.800 of the MP3. Interestingly, Ryan Maguire decided to take 0:09:08.840 --> 0:09:12.280 a sort of negative image of the compressed Tom's Diner. 0:09:12.320 --> 0:09:15.960 He identified sounds that were deleted in the process of 0:09:16.000 --> 0:09:19.280 creating a lossy version of Tom's Diner, and then he 0:09:19.360 --> 0:09:22.480 created a new recording that contained only the bits that 0:09:22.679 --> 0:09:26.640 had been cut from the file. It's almost like 0:09:26.760 --> 0:09:29.439 listening to the ghost of a song. In fact, I 0:09:29.480 --> 0:09:33.160 think he called the project The Ghost in the MP3. 0:09:33.240 --> 0:09:35.640 It's pretty creepy stuff. It would not be out of 0:09:35.679 --> 0:09:38.480 place in a horror movie.
The fact that lossy files, 0:09:38.480 --> 0:09:41.479 by definition, lose information in the process of data compression 0:09:42.120 --> 0:09:45.480 meant that audiophiles dismissed the MP3 format as 0:09:45.559 --> 0:09:48.800 inherently inferior to others, at least as far as listening 0:09:48.880 --> 0:09:52.440 experiences go, and there are arguments that some of the 0:09:52.520 --> 0:09:56.720 lost information, while potentially imperceptible within the song itself, 0:09:57.040 --> 0:10:00.720 helped shape the overall sound and tone of the piece. So 0:10:00.760 --> 0:10:05.040 even though you can't directly hear the stuff that's being cut, 0:10:05.600 --> 0:10:09.480 that stuff actually influences how you perceive other things, so 0:10:09.960 --> 0:10:13.840 cutting it still changes the experience of hearing the finished audio. 0:10:14.080 --> 0:10:17.120 But the MP3 format created the opportunity to store 0:10:17.200 --> 0:10:20.120 and transfer audio files without having to deal with massive 0:10:20.200 --> 0:10:23.960 raw audio formats, and back in the day that was 0:10:24.000 --> 0:10:27.840 not a trivial thing. And so that is the answer 0:10:27.840 --> 0:10:32.320 to the question: Tom's Diner, the first MP3. Hope 0:10:32.320 --> 0:10:36.400 you're all well, and I'll talk to you again really soon. 0:10:42.880 --> 0:10:47.520 TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, 0:10:47.840 --> 0:10:51.559 visit the iHeartRadio app, Apple Podcasts, or wherever you listen 0:10:51.600 --> 0:10:56.200 to your favorite shows.
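To put the "massive raw audio formats" in numbers: uncompressed CD-quality audio is 44,100 samples per second, 16 bits per sample, two channels. The sketch below does that arithmetic for a hypothetical four-minute track and compares it with a few common MP3 bitrates. It ignores container headers and tags and uses decimal megabytes.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio
BITS_PER_SAMPLE = 16
CHANNELS = 2
SONG_SECONDS = 4 * 60    # hypothetical four-minute track

def pcm_bitrate_bps(rate: int = SAMPLE_RATE_HZ, bits: int = BITS_PER_SAMPLE,
                    channels: int = CHANNELS) -> int:
    return rate * bits * channels

def size_mb(bitrate_bps: float, seconds: float) -> float:
    # bits -> bytes -> decimal megabytes; ignores headers and tags
    return bitrate_bps * seconds / 8 / 1_000_000

raw_bps = pcm_bitrate_bps()  # 1,411,200 bits per second
print(f"raw PCM: {size_mb(raw_bps, SONG_SECONDS):.1f} MB")  # raw PCM: 42.3 MB
for kbps in (128, 192, 320):
    mp3_bps = kbps * 1_000
    print(f"{kbps} kbps MP3: {size_mb(mp3_bps, SONG_SECONDS):.2f} MB, "
          f"about {raw_bps / mp3_bps:.1f}x smaller than raw")
```

At 128 kbps the track shrinks by roughly a factor of eleven, which made a real difference for the storage and dial-up bandwidth of the era.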