WEBVTT - TechStuff Gets Auto-Tuned (Again) 0:00:04.440 --> 0:00:12.280 Welcome to TechStuff, a production from iHeartRadio. Hey there, 0:00:12.280 --> 0:00:15.240 and welcome to TechStuff. I'm your host, Jonathan Strickland. 0:00:15.240 --> 0:00:18.439 I'm an executive producer with iHeart Podcasts and how the 0:00:18.520 --> 0:00:21.000 tech are you? You might be able to tell from 0:00:21.000 --> 0:00:24.880 my voice that I have a cold, so I apologize 0:00:24.880 --> 0:00:28.000 for that. But we're going to soldier on because I'm 0:00:28.040 --> 0:00:30.520 back from vacation. It's time to get back to work, 0:00:30.960 --> 0:00:34.880 and I love to talk about the intersection of technology 0:00:35.200 --> 0:00:38.800 and music. So in past episodes, I've done shows about 0:00:38.840 --> 0:00:42.239 how electric guitars work, the history of the Moog or 0:00:42.400 --> 0:00:47.040 Mogue synthesizer, the evolution of various kinds of recordable media, 0:00:47.520 --> 0:00:51.440 and much, much more. But way back, like back in 0:00:51.560 --> 0:00:55.240 two thousand and nine, my co-host at the time, 0:00:55.360 --> 0:00:58.080 Chris Pollette, and I did a little episode of 0:00:58.120 --> 0:01:00.760 TechStuff about Auto-Tune, and I thought it would be 0:01:00.840 --> 0:01:03.480 fun to go back and revisit that topic. So this 0:01:03.560 --> 0:01:06.440 is not a rerun. It's an all-new episode about 0:01:06.480 --> 0:01:09.520 the same subject. I haven't even listened to the old episode, 0:01:09.600 --> 0:01:12.080 so I have no idea how much of what I 0:01:12.160 --> 0:01:14.440 have to say is going to be a repeat. I 0:01:14.440 --> 0:01:16.560 imagine a lot of it will be new, but I 0:01:16.640 --> 0:01:18.840 don't know for sure.
I figure there will be fewer 0:01:19.000 --> 0:01:22.319 puns in this version compared to the last one, because, 0:01:22.760 --> 0:01:26.400 contrary to popular belief, it was actually Chris Pollette who 0:01:26.440 --> 0:01:29.680 made the most puns on TechStuff back in the day. 0:01:29.760 --> 0:01:32.360 I got a reputation for it, and don't get 0:01:32.360 --> 0:01:34.839 me wrong, I won't shy away from a good pun, 0:01:35.160 --> 0:01:37.920 and by that I mean a terrible pun. I love them, 0:01:38.319 --> 0:01:42.720 but Chris, like, he loved them the way I love, 0:01:43.440 --> 0:01:48.080 you know, rich Indian food. He dined upon puns. So 0:01:48.160 --> 0:01:50.720 probably not as many in this one. But let's talk 0:01:50.760 --> 0:01:54.600 about Auto-Tune now. I think just about everyone knows, 0:01:54.880 --> 0:01:57.240 or I think a lot of people know, that Cher's 0:01:57.480 --> 0:02:01.600 song Believe, which came out in nineteen ninety eight, was the 0:02:01.640 --> 0:02:05.920 first major song to prominently use Auto-Tune in an 0:02:05.960 --> 0:02:10.280 effort to achieve a particular artistic effect, but the technology 0:02:10.280 --> 0:02:13.200 had been around for more than a year at that point, 0:02:13.360 --> 0:02:16.560 and the original intention wasn't to make a tool that 0:02:16.560 --> 0:02:20.119 would actually draw attention to itself. Rather, as the name 0:02:20.200 --> 0:02:24.799 Auto-Tune suggests, it was intended to automatically nudge the pitch 0:02:24.919 --> 0:02:27.640 of a musical note in the right direction so that 0:02:27.840 --> 0:02:31.160 it would be in tune. That way, the occasional wrong 0:02:31.240 --> 0:02:35.480 note could be subtly pushed into place, and it wouldn't 0:02:35.520 --> 0:02:38.320 require you to do another take and then try to 0:02:38.400 --> 0:02:42.000 splice together a great master recording. But even all of 0:02:42.040 --> 0:02:45.560 that is getting way ahead of ourselves.
To understand the 0:02:45.639 --> 0:02:50.720 history of Auto-Tune, we must first learn about reflection seismology 0:02:51.080 --> 0:02:54.080 as well as the oil industry. And I am being serious, 0:02:54.120 --> 0:02:56.480 I'm not making a joke about this. As it turns out, 0:02:57.040 --> 0:02:59.960 reflection seismology has a lot to do with our story 0:03:00.200 --> 0:03:03.680 because the man who would go on to found the 0:03:03.760 --> 0:03:07.560 company that would create Auto-Tune was Dr. Andy Hildebrand, 0:03:07.919 --> 0:03:11.760 who had made a career in using sound and complex 0:03:11.840 --> 0:03:17.440 mathematical calculations to help oil companies, namely Exxon, locate oil 0:03:17.520 --> 0:03:23.440 deposits underground. So reflection seismology is in some ways similar 0:03:23.560 --> 0:03:27.919 to sonar. So with a sonar system, you would beam 0:03:28.000 --> 0:03:30.560 out pulses of sound waves. Typically we talk about this 0:03:30.760 --> 0:03:34.440 in water, right, like using sonar on a boat or 0:03:34.480 --> 0:03:37.200 on a submarine, that kind of thing. You would pulse 0:03:37.240 --> 0:03:40.960 out these sound waves and those sound waves travel outward from 0:03:41.040 --> 0:03:45.080 the source, from the speaker, essentially the transmitter, and if 0:03:45.080 --> 0:03:48.200 there's something solid in the way of those sound waves, well, 0:03:48.480 --> 0:03:50.720 the sound waves that hit that solid object, they're going 0:03:50.800 --> 0:03:54.040 to reflect back toward the source. They'll become an echo. 0:03:54.440 --> 0:03:56.320 This is what we get when we hear an echo. 0:03:56.560 --> 0:03:58.680 If you are ever in a place where you make 0:03:58.920 --> 0:04:01.120 a loud noise, then you hear the echo. It's because 0:04:01.160 --> 0:04:04.160 the sound waves have traveled out from you, bounced off 0:04:04.160 --> 0:04:07.560 something and come back to you.
Well, if you if 0:04:07.560 --> 0:04:10.280 you measure the amount of time it took for a 0:04:10.360 --> 0:04:13.760 sound to leave you and then reflect off something else 0:04:14.000 --> 0:04:16.760 and come back to you, you can figure out how 0:04:16.760 --> 0:04:20.320 far away you are from that thing, right, because sound 0:04:20.360 --> 0:04:23.360 is going to travel a specific speed away from you 0:04:23.920 --> 0:04:26.080 and then hit the thing and then travel back to you. 0:04:26.320 --> 0:04:28.400 So if you know how long it took, you can 0:04:28.480 --> 0:04:31.640 do some very simple math and figure out how far 0:04:31.720 --> 0:04:34.600 away that object is. So, for example, if you're on 0:04:34.640 --> 0:04:37.320 a ship and use sonar to measure the distance between 0:04:37.360 --> 0:04:40.400 you and the sea floor, you do a little math. Right, 0:04:40.440 --> 0:04:43.640 you have to divide by two because it took a 0:04:43.640 --> 0:04:46.000 certain amount of time to travel down then back up, 0:04:46.120 --> 0:04:49.159 And you have to know how fast sound travels through water. 0:04:49.320 --> 0:04:52.080 You have to have all these important bits of information 0:04:52.279 --> 0:04:53.960 in your mind when you do this, but then you 0:04:53.960 --> 0:04:57.080 can suss that out. You can say how deep the 0:04:57.080 --> 0:05:00.160 ocean floor is from the surface. That saves you the 0:05:00.160 --> 0:05:02.320 trouble of having to do it the really old fashioned way, 0:05:02.320 --> 0:05:05.880 which typically involved lowering a weight on the end of 0:05:05.920 --> 0:05:10.040 a knotted line, a knotted rope, and then you use 0:05:10.080 --> 0:05:12.400 the knots to keep count of how deep in the 0:05:12.400 --> 0:05:15.320 ocean you were. That's a sounding line. That's the other 0:05:15.360 --> 0:05:18.440 way to see how how far down the ocean floor is. 
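The "very simple math" behind an echo measurement can be sketched in a few lines of Python. This is only an illustration: the function name `echo_distance` is made up for this example, and the figure of roughly 1,500 meters per second for sound in seawater is a ballpark assumption, since the real speed varies with temperature, salinity, and depth.

```python
def echo_distance(round_trip_seconds, speed_m_per_s):
    """Return the one-way distance to whatever produced the echo.

    The pulse travels out and back, so the round-trip time covers
    twice the distance. That is why we divide by two.
    """
    if round_trip_seconds < 0 or speed_m_per_s <= 0:
        raise ValueError("time must be non-negative and speed must be positive")
    return speed_m_per_s * round_trip_seconds / 2.0


# Assumed ballpark speed of sound in seawater (not a figure from the episode).
SPEED_OF_SOUND_SEAWATER = 1500.0  # meters per second

# A ping that takes 4 seconds to come back implies a sea floor about 3 km down.
depth_m = echo_distance(4.0, SPEED_OF_SOUND_SEAWATER)
print(depth_m)  # 3000.0
```

The same function works for an echo in air if you pass in the speed of sound in air instead, which is the point about the medium that comes up next.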
0:05:18.640 --> 0:05:22.080 But sonar made it way simpler, especially once you were 0:05:22.120 --> 0:05:26.680 able to build that math into the sonar workstations. Reflection 0:05:26.839 --> 0:05:32.040 seismology does something similar, but with seismic waves, and those 0:05:32.080 --> 0:05:35.240 are waves that pass through the earth, and we typically 0:05:35.360 --> 0:05:39.479 talk about seismic waves in connection with earthquakes or like 0:05:39.560 --> 0:05:42.440 volcanic eruptions, that kind of thing, and in fact earthquakes 0:05:42.440 --> 0:05:46.120 were what inspired smarty-pantses to say, hey, if 0:05:46.160 --> 0:05:50.120 we made something that could, you know, create a huge 0:05:50.200 --> 0:05:53.040 vibration through the earth, and something else that could detect 0:05:53.200 --> 0:05:58.440 those vibrations, and we were able to calculate how long 0:05:58.480 --> 0:06:01.840 it took for the instrumentation to pick up on the 0:06:01.920 --> 0:06:06.400 echoes of that initial vibration event, we might be able 0:06:06.440 --> 0:06:10.920 to figure out stuff that's actually underground. We could figure 0:06:10.920 --> 0:06:13.600 out what is underground without having to dig it up 0:06:13.640 --> 0:06:17.920 and see. Now, that's because the seismic waves will travel 0:06:18.000 --> 0:06:21.360 at different speeds depending upon the density of the material 0:06:21.440 --> 0:06:24.680 that they travel through. You've probably heard things like, you know, 0:06:24.800 --> 0:06:27.599 sound travels at a consistent speed. That's true, but that 0:06:27.680 --> 0:06:31.279 consistent speed is dependent upon the medium through which the 0:06:31.320 --> 0:06:34.479 sound is traveling. So sound travels at a different speed 0:06:34.480 --> 0:06:37.120 through the water than it does through the air or 0:06:37.160 --> 0:06:40.960 through solid objects. You know, vibrations travel at different speeds 0:06:40.960 --> 0:06:43.719 depending upon the medium.
So at a very basic level, 0:06:43.760 --> 0:06:46.720 a seismic wave will travel at a constant rate through 0:06:46.839 --> 0:06:51.200 one kind of, say, rocky soil. But let's say there's 0:06:51.360 --> 0:06:57.000 a place underground where that rocky soil gives way to 0:06:57.279 --> 0:07:03.600 a different material, say petroleum, for example. Well, then the 0:07:03.720 --> 0:07:07.000 speed of those sound waves is going to change. Moreover, 0:07:07.480 --> 0:07:10.640 as the sound waves hit that barrier between one type 0:07:10.760 --> 0:07:13.560 of material and another, some of the sound waves are 0:07:13.560 --> 0:07:16.120 going to reflect off of that and become an echo. 0:07:16.960 --> 0:07:19.240 Some of the sound waves will continue to penetrate through 0:07:19.440 --> 0:07:23.880 the new material, and through lots of observations, we gradually 0:07:23.920 --> 0:07:27.040 began to learn about the different rates at which a 0:07:27.120 --> 0:07:31.160 seismic wave will travel depending upon the medium it's traveling through, 0:07:31.720 --> 0:07:34.520 and if it hits something really solid like bedrock, it 0:07:34.560 --> 0:07:39.760 pretty much just echoes back. So here's how reflection seismology works 0:07:39.880 --> 0:07:43.160 from a very high level. You set up sensitive equipment 0:07:43.320 --> 0:07:47.040 at different distances from a blast site, and yeah, you're 0:07:47.160 --> 0:07:49.920 likely to use something like explosives or maybe a really 0:07:50.000 --> 0:07:52.600 powerful air gun. It has to be something that's going 0:07:52.680 --> 0:07:56.040 to give a real jolt to the ground in order 0:07:56.080 --> 0:07:58.240 to do this, because that's essentially what you're doing is 0:07:58.280 --> 0:08:02.400 creating like a very localized earthquake.
So this vibration travels 0:08:02.440 --> 0:08:04.960 through the earth, and because you know how far away 0:08:04.960 --> 0:08:08.120 you've set up your measuring equipment from that blast site, 0:08:08.280 --> 0:08:10.560 you already have distance figured out, right. You know how 0:08:10.600 --> 0:08:14.960 far away it is from the original source of the vibration, 0:08:15.400 --> 0:08:17.680 and you measure the time it takes for your equipment 0:08:17.720 --> 0:08:22.360 to pick up the echoes from that particular vibration event. 0:08:22.800 --> 0:08:25.680 So you've got distance and now you have time. Now 0:08:25.720 --> 0:08:28.560 you've got those variables sorted, so you can start to 0:08:28.600 --> 0:08:33.040 work out what material is actually under the ground that 0:08:33.240 --> 0:08:36.679 produces this particular result. And by doing that, you're kind 0:08:36.720 --> 0:08:39.360 of like working backwards. You're using this information to draw 0:08:39.440 --> 0:08:42.600 conclusions about what's under there, and that's where you can 0:08:42.679 --> 0:08:46.360 start to make a determination as to whether or not 0:08:46.400 --> 0:08:48.720 you're standing on top of a Beverly Hillbillies-like 0:08:48.800 --> 0:08:52.080 oil deposit, or maybe you're just on top of a 0:08:52.120 --> 0:08:54.800 bunch of rocks or whatever. Now, in order to do 0:08:54.920 --> 0:08:58.840 that, what I just described, it's actually incredibly complicated. It 0:08:58.880 --> 0:09:02.800 involves an awful lot of calculations in math, and it's 0:09:02.800 --> 0:09:04.560 a lot of work. But then you have to think 0:09:04.559 --> 0:09:07.600 that drilling for oil is even more work. That's a 0:09:07.679 --> 0:09:11.440 huge endeavor. It costs a lot of time and money 0:09:11.480 --> 0:09:14.120 and effort to do it, and, like, if you drill 0:09:14.120 --> 0:09:16.920 in the wrong place, like, that's a huge loss.
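To give a feel for the "working backwards" step, here is a heavily simplified Python sketch. It assumes a single flat reflecting boundary under ground with one constant wave speed, which is a textbook idealization and nowhere near the real workflow described in the episode. The function names and the numbers are hypothetical. Under that assumption, the echo time t at a sensor placed x meters from the blast satisfies t² = (x² + 4d²) / v², so two sensors at different distances are enough to recover both the wave speed v and the depth d.

```python
import math


def reflection_time(offset_m, depth_m, velocity_m_s):
    """Forward model: echo arrival time for one flat reflector.

    The wave goes down to the boundary and back up to a sensor that
    sits offset_m away from the blast, so the path length is
    sqrt(offset^2 + (2 * depth)^2).
    """
    return math.sqrt(offset_m ** 2 + 4.0 * depth_m ** 2) / velocity_m_s


def solve_layer(offset1_m, time1_s, offset2_m, time2_s):
    """Work backwards from two (distance, time) readings.

    t^2 is a straight line in x^2: slope = 1 / v^2 and
    intercept = 4 * d^2 / v^2. Returns (velocity, depth).
    """
    slope = (time2_s ** 2 - time1_s ** 2) / (offset2_m ** 2 - offset1_m ** 2)
    velocity = 1.0 / math.sqrt(slope)
    intercept = time1_s ** 2 - slope * offset1_m ** 2
    depth = velocity * math.sqrt(intercept) / 2.0
    return velocity, depth


# Made-up ground truth, used only to generate two example readings.
true_velocity, true_depth = 2500.0, 1000.0
t_near = reflection_time(200.0, true_depth, true_velocity)
t_far = reflection_time(800.0, true_depth, true_velocity)

velocity, depth = solve_layer(200.0, t_near, 800.0, t_far)
print(round(velocity), round(depth))
```

A recovered speed can then be compared against the speeds known for different materials, which is the "what's down there" conclusion. Real surveys deal with many layers, noise, and tilted boundaries, which is where the heavy math comes in.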
So 0:09:17.200 --> 0:09:19.880 you want the best possible information before you select a 0:09:19.960 --> 0:09:23.880 drilling site, and reflection seismology is one way to obtain 0:09:23.960 --> 0:09:27.640 information and to help make a decision. So Dr. Hildebrand 0:09:27.840 --> 0:09:30.120 was making a really good living out of this work, 0:09:30.679 --> 0:09:34.520 but companies like Exxon were saving hundreds of millions of 0:09:34.559 --> 0:09:39.040 dollars through Hildebrand's approach of narrowing down potential drill sites, 0:09:39.240 --> 0:09:42.679 and Hildebrand thought, you know, I'm not doing badly. I'm 0:09:42.679 --> 0:09:46.040 making a decent living. But you know, Exxon is making 0:09:46.080 --> 0:09:49.040 out like a bandit. They're saving like half a billion 0:09:49.120 --> 0:09:52.840 dollars a year or whatever using this technology. Maybe if 0:09:52.920 --> 0:09:58.880 I apply my knowledge and skill set in a company 0:09:59.080 --> 0:10:02.800 that I own, I might actually, you know, do better 0:10:02.880 --> 0:10:06.599 than just working for Exxon. So Hildebrand left Exxon in 0:10:06.679 --> 0:10:10.640 nineteen seventy nine and he founded a company called Landmark Graphics, 0:10:10.920 --> 0:10:13.760 which at first sounds like, you know, it's a company 0:10:13.760 --> 0:10:17.839 that makes computer graphics, which is not untrue, but that 0:10:17.960 --> 0:10:21.480 wasn't... it wasn't just general graphics. This company was still 0:10:21.600 --> 0:10:25.360 rooted in the oil industry. Hildebrand's team developed and produced 0:10:25.400 --> 0:10:30.200 workstations that could take incoming seismic information from these, 0:10:30.280 --> 0:10:33.560 you know, soundings that they do and generate three dimensional 0:10:33.679 --> 0:10:37.840 seismic maps based upon the data. And again, it was 0:10:37.880 --> 0:10:41.960 incredibly complicated.
You had to analyze so many different points 0:10:41.960 --> 0:10:46.280 of information in order to create this three dimensional representation 0:10:46.360 --> 0:10:49.520 of what's under the ground. But it worked and it 0:10:49.520 --> 0:10:52.440 made Hildebrand very successful. He stuck with it for a 0:10:52.520 --> 0:10:56.960 decade until nineteen eighty nine, whereupon he retired and he 0:10:57.040 --> 0:11:00.840 decided to return his attention to a different passion he 0:11:00.920 --> 0:11:04.800 had had since he was a kid, which was music. 0:11:05.200 --> 0:11:08.840 Now Hildebrand wasn't just a music fan, he was a musician. 0:11:08.960 --> 0:11:12.680 He had played flute professionally. He had been a studio 0:11:12.800 --> 0:11:15.520 musician for some time. He had paid his way through 0:11:15.520 --> 0:11:20.520 college partly by giving flute lessons to musicians, So he 0:11:20.640 --> 0:11:23.760 decided he would go back to school as a retiree 0:11:24.120 --> 0:11:27.720 and study composition and techniques. He attended Rice University to 0:11:27.760 --> 0:11:31.400 do this. While he was back in college, he encountered 0:11:31.920 --> 0:11:36.200 some newer technologies in the music space, like music samplers 0:11:36.240 --> 0:11:39.560 and synthesizers. So these were machines designed to take a 0:11:39.720 --> 0:11:43.720 sample of a sound like a flute, and then allow 0:11:43.840 --> 0:11:47.120 a keyboard musician to recreate those sounds on a synthesizer. 0:11:47.480 --> 0:11:50.960 The only thing is that Hildebrand thought they sounded terrible, 0:11:51.600 --> 0:11:54.760 and partly it was because there was a limitation on 0:11:54.920 --> 0:11:58.360 how much data a synthesizer could actually handle, so it 0:11:58.400 --> 0:12:04.160 couldn't really replicate sound naturally. 
The sound it replicated would 0:12:04.200 --> 0:12:07.440 be like a gross approximation of the original sound. So 0:12:07.520 --> 0:12:10.559 Hildebrand wasn't really impressed, but he thought that there was 0:12:10.640 --> 0:12:13.960 room for improvement, and he developed a technique to compress 0:12:14.080 --> 0:12:18.640 audio data so that synthesizers could more effectively handle information 0:12:19.200 --> 0:12:23.240 and produce notes that sounded more natural 0:12:23.320 --> 0:12:27.360 and less synthetic. He released his software as a product 0:12:27.360 --> 0:12:32.320 called Infinity, and while this tool would revolutionize the orchestration 0:12:32.400 --> 0:12:35.199 process for stuff like film and television, it did not 0:12:36.240 --> 0:12:40.240 revolutionize Dr. Hildebrand's bank account. He didn't actually see much 0:12:40.320 --> 0:12:43.439 of that success himself because what actually happened was other 0:12:43.520 --> 0:12:47.040 companies purchased copies of Infinity and then bundled it with 0:12:47.120 --> 0:12:50.760 their own audio processing tools, and then sold those audio 0:12:50.880 --> 0:12:54.480 processing packages to other people and companies, and it kind 0:12:54.480 --> 0:12:58.520 of cut Hildebrand out of the picture. So while others 0:12:58.520 --> 0:13:02.640 were benefiting from his work, he did not see that 0:13:02.760 --> 0:13:07.320 much success.
It did, however, again have an enormous impact 0:13:07.520 --> 0:13:12.400 on orchestrations. Like, according to Dr. Hildebrand, he was the 0:13:12.440 --> 0:13:16.040 reason why the Los Angeles Orchestra hit real hard times 0:13:16.080 --> 0:13:20.360 in the nineteen nineties, because his tools allowed composers to 0:13:20.559 --> 0:13:25.640 sample various musical instruments and create a natural enough representation 0:13:26.240 --> 0:13:28.880 of those sounds to be able to create a synthetic 0:13:29.040 --> 0:13:32.960 orchestra that sounded more or less like a real one. 0:13:33.040 --> 0:13:34.720 So there was no need to go and hire a 0:13:34.720 --> 0:13:38.199 real orchestra to orchestrate your film or TV project. You 0:13:38.240 --> 0:13:41.360 could do it yourself. I've actually heard some of 0:13:41.400 --> 0:13:45.040 my favorite music scores, and when I listen closely, I can 0:13:45.120 --> 0:13:49.800 tell, like, oh, that's not a real cellist. That's a 0:13:49.880 --> 0:13:54.800 synthesizer playing a sample of a cello that sounds almost, 0:13:54.800 --> 0:13:58.400 but not quite, like the real thing. Anyway, we can 0:13:58.480 --> 0:14:01.360 thank Dr. Hildebrand for that. I'll talk more about what 0:14:01.440 --> 0:14:05.640 we could thank Dr. Hildebrand for, specifically Auto-Tune, but 0:14:05.720 --> 0:14:07.960 first let's take a quick break so we could thank 0:14:08.000 --> 0:14:21.160 some other people, namely our sponsors. We'll be right back. Okay, 0:14:21.320 --> 0:14:24.000 so before we left off, I was talking about how 0:14:24.040 --> 0:14:29.000 Dr. Hildebrand had released a program called Infinity that improved 0:14:29.160 --> 0:14:34.440 the performance of synthesizers and samplers. But in nineteen ninety 0:14:34.440 --> 0:14:37.040 he decided to take an extra step. He founded a 0:14:37.240 --> 0:14:41.840 new company.
He called it Antares Audio Technology, and this 0:14:41.880 --> 0:14:46.440 would be his music company, his music technology company that 0:14:46.560 --> 0:14:50.760 would ultimately produce Auto-Tune. And he knew that technology was 0:14:50.840 --> 0:14:53.480 poised to make a huge impact on the music industry 0:14:53.520 --> 0:14:56.240 and already had been. Like, that's kind of the history 0:14:56.240 --> 0:14:58.960 of modern music, is how technology has shaped it. But 0:14:59.000 --> 0:15:01.440 he knew we were on the brink of another revolution. 0:15:01.520 --> 0:15:03.880 He just wasn't exactly sure how that was going to 0:15:03.960 --> 0:15:08.280 manifest. Now, according to an article by Simon Reynolds, it's 0:15:08.280 --> 0:15:12.320 titled How Auto-Tune Revolutionized the Sound of Popular Music, and 0:15:12.360 --> 0:15:16.760 it was published in Pitchfork, the actual birth of Hildebrand's 0:15:16.800 --> 0:15:20.560 idea for Auto-Tune grew out of a casual lunch with 0:15:20.640 --> 0:15:24.160 some of his friends and peers back in nineteen ninety 0:15:24.160 --> 0:15:29.520 five during a National Association of Music Merchants conference. So 0:15:29.600 --> 0:15:32.720 he's at this conference, he's meeting with other people in 0:15:32.760 --> 0:15:36.360 the music and technology spheres, and at this lunch, one 0:15:36.400 --> 0:15:41.320 of the attendees jokingly suggested that what Hildebrand should do 0:15:41.440 --> 0:15:43.960 next is develop a technology that would allow her to 0:15:44.040 --> 0:15:47.000 sing on key, like, can you make a box that 0:15:47.120 --> 0:15:51.120 lets me sing well? And while this was presented as 0:15:51.160 --> 0:15:55.600 a joke, ultimately Hildebrand would think, huh, could I do 0:15:55.760 --> 0:16:00.400 that now?
According to Zachary Crockett's article, which is The 0:16:00.440 --> 0:16:04.080 Mathematical Genius of Auto-Tune, this one in Priceonomics, 0:16:04.480 --> 0:16:07.760 this wasn't like a light bulb moment where the moment 0:16:07.840 --> 0:16:11.600 this woman says the thing, Hildebrand immediately thinks, ah, that's 0:16:11.640 --> 0:16:14.480 what I shall do. Actually, it took like another six 0:16:14.560 --> 0:16:18.560 months before Hildebrand really kind of revisited the concept and thought, 0:16:18.920 --> 0:16:21.880 maybe there's something here. But in order to do that, 0:16:22.600 --> 0:16:24.800 he would have to develop a technology that could do 0:16:24.920 --> 0:16:27.680 a few things really well, all of which are a 0:16:27.760 --> 0:16:30.640 bit tricky. One is it would need to detect the 0:16:30.720 --> 0:16:33.200 pitch that someone was singing in, for example, if you're 0:16:33.280 --> 0:16:36.160 using it for vocals, and so you would need to 0:16:36.160 --> 0:16:40.440 be able to detect exactly the frequency that was being sung. 0:16:40.840 --> 0:16:44.640 You would need to then also be able to have 0:16:44.880 --> 0:16:49.800 a list of tones that were in whatever key 0:16:49.880 --> 0:16:52.120 you were supposed to be singing in. So I don't 0:16:52.120 --> 0:16:54.520 want to get into music theory, because goodness knows, I 0:16:54.520 --> 0:16:56.720 don't know that much about it myself, and I would 0:16:56.720 --> 0:16:59.240 just mess things up. But you know, if you're singing 0:16:59.240 --> 0:17:02.160 in a specific key, there are particular tones that belong 0:17:02.240 --> 0:17:04.920 to that key. And often when we sing and we're 0:17:05.119 --> 0:17:07.480 a little off pitch, what we need is to be 0:17:07.920 --> 0:17:11.560 gently nudged a little up or a little down, a 0:17:11.600 --> 0:17:14.480 little sharp or a little flat, in order to hit 0:17:14.600 --> 0:17:18.080 a semitone that belongs in that key.
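The "list of tones in the key" idea can be sketched in Python as follows. This is an illustration only, not how Auto-Tune itself is implemented. It assumes standard equal-temperament tuning with A4 at 440 hertz and the usual MIDI note numbering, and it hard-codes the key of C major. The names `C_MAJOR` and `nearest_note_in_key` are invented for this example.

```python
import math

A4_HZ = 440.0  # assumed tuning reference (MIDI note 69)

# Pitch classes (note number modulo 12) that belong to C major:
# C, D, E, F, G, A, B.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}


def freq_to_midi(freq_hz):
    """Convert a frequency to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_freq(midi_note):
    """Convert a MIDI note number back to a frequency in hertz."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


def nearest_note_in_key(freq_hz, pitch_classes=C_MAJOR):
    """Find the in-key note closest to the sung frequency.

    Returns (midi_note, target_frequency_hz).
    """
    exact = freq_to_midi(freq_hz)
    center = round(exact)
    # Look half an octave either side and keep only notes in the key.
    candidates = [n for n in range(center - 6, center + 7)
                  if n % 12 in pitch_classes]
    target = min(candidates, key=lambda n: abs(n - exact))
    return target, midi_to_freq(target)


# A singer aiming for A (440 Hz) who lands a bit sharp at 450 Hz
# gets nudged back down to 440 Hz.
print(nearest_note_in_key(450.0))  # (69, 440.0)
```

Knowing the target is only half the job. Actually bending the recorded audio to that target without making it sound unnatural is a separate and harder problem.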
So it needs 0:17:18.119 --> 0:17:22.600 to also, quote unquote, know which tones are appropriate, and 0:17:22.600 --> 0:17:25.480 then it has to be able to digitally alter the 0:17:25.600 --> 0:17:30.440 incoming pitch, the actual sung note, and then guide it 0:17:30.720 --> 0:17:34.119 to match that of a target note. Now, ultimately that 0:17:34.200 --> 0:17:37.080 all sounds like a pretty simple idea, but in reality, 0:17:37.560 --> 0:17:42.199 to achieve this was incredibly complex. Ultimately, also, the 0:17:42.240 --> 0:17:45.159 tool would need to work in real time for live performances. 0:17:45.240 --> 0:17:47.840 Like it's one thing to have this for the studio, right, 0:17:47.880 --> 0:17:50.800 because even if you don't have it automatic, you could 0:17:50.840 --> 0:17:53.800 have a tool where an engineer could fiddle with some 0:17:53.920 --> 0:17:57.919 controls and gently alter the pitch of a performance to 0:17:57.960 --> 0:18:00.520 get it closer to being where it needs to be. 0:18:00.880 --> 0:18:03.119 It would be preferable to have that automated so that 0:18:03.160 --> 0:18:04.800 you don't have to go through there and do the 0:18:04.800 --> 0:18:08.800 manual process. But even so, like in a recording setting, 0:18:08.880 --> 0:18:11.199 you don't have to have it be real time necessarily, 0:18:11.280 --> 0:18:13.119 but if you're doing a live performance, you do have 0:18:13.160 --> 0:18:15.560 to have it real time. If someone's up there singing 0:18:15.920 --> 0:18:19.280 and they just hit a flat note when they're not 0:18:19.320 --> 0:18:23.280 supposed to, that could really be a memorable moment, and 0:18:23.320 --> 0:18:25.439 not in a great way. So having a tool that 0:18:25.520 --> 0:18:28.920 could gently account for that and fix it in real 0:18:29.000 --> 0:18:32.880 time would be really helpful.
But this would mean that 0:18:33.200 --> 0:18:35.439 this tool would have to be able to process a 0:18:35.560 --> 0:18:40.920 huge amount of sound data extremely quickly to make millisecond 0:18:40.920 --> 0:18:45.880 decisions, like split millisecond decisions, relating to how to shape 0:18:45.920 --> 0:18:49.879 a note moment by moment. Now it does help if 0:18:49.920 --> 0:18:53.320 we also think of sound in terms of mathematics. We 0:18:53.480 --> 0:18:56.080 describe sound in different ways, right, but some of those 0:18:56.400 --> 0:19:00.280 relate specifically to how sound looks to us if we 0:19:00.400 --> 0:19:05.280 plot sound on like a wave chart, right. For example, 0:19:05.400 --> 0:19:07.760 sounds can be really loud or they can be really quiet, 0:19:08.119 --> 0:19:12.040 and that is volume, but it can also relate to amplitude. 0:19:12.440 --> 0:19:14.920 When you think of a sound wave, the amplitude of 0:19:14.960 --> 0:19:18.240 a sound wave describes how tall those peaks are or 0:19:18.280 --> 0:19:22.720 how low the valleys are. The distance between the furthest 0:19:23.080 --> 0:19:26.440 point of a peak or valley and the zero line, 0:19:26.800 --> 0:19:30.320 that's your amplitude. But we also describe sound in terms 0:19:30.320 --> 0:19:34.880 of pitch or frequencies. Higher frequencies correspond to higher pitches. 0:19:35.200 --> 0:19:37.560 And if we plot a sound wave, let's say that 0:19:37.600 --> 0:19:40.919 we plot it so that the x axis is a 0:19:41.000 --> 0:19:46.639 demarcation of time, so we have one second listed there, 0:19:46.880 --> 0:19:49.640 like the x axis is one second. If there's one 0:19:49.760 --> 0:19:52.399 wave that we draw so that the wave begins at 0:19:52.440 --> 0:19:54.880 the zero point and ends at the one second point, 0:19:55.119 --> 0:19:58.760 then we have a one hertz sound wave. A hertz 0:19:59.200 --> 0:20:03.040 is just a measurement of frequency.
It refers to one 0:20:03.240 --> 0:20:06.960 cycle per second. So if a wave is one hertz, 0:20:06.960 --> 0:20:09.240 it means it takes one second for one of those 0:20:09.240 --> 0:20:13.120 sound waves to fully pass a given point where you're 0:20:13.160 --> 0:20:17.920 measuring the sound waves, right. If two waves pass that 0:20:18.040 --> 0:20:20.679 point within one second, then you're talking about two hertz, 0:20:21.040 --> 0:20:23.440 you know. Just so that we know, the typical human 0:20:23.480 --> 0:20:28.080 hearing range is anywhere between twenty and twenty thousand hertz. 0:20:28.320 --> 0:20:31.359 So a one or two hertz sound, we wouldn't even perceive it, 0:20:31.359 --> 0:20:33.720 at least not as sound. If it was a great 0:20:33.800 --> 0:20:36.840 enough amplitude, you could potentially perceive it as vibration, so 0:20:36.880 --> 0:20:40.359 you would feel it, you wouldn't hear it. But between 0:20:40.359 --> 0:20:43.960 twenty and twenty thousand hertz, that falls into the typical 0:20:44.040 --> 0:20:46.120 range of human hearing. Of course, as we get older, 0:20:46.119 --> 0:20:49.440 we start to lose the ability to hear those higher frequencies. 0:20:50.040 --> 0:20:53.600 These days, I think my hearing tops out around sixteen 0:20:53.640 --> 0:20:57.040 to seventeen thousand hertz, somewhere around there. Like once you 0:20:57.080 --> 0:21:00.320 get beyond that, I don't hear anything, whereas younger people 0:21:00.320 --> 0:21:04.120 could hear it. Anyway, Hildebrand was working with music on 0:21:04.160 --> 0:21:07.840 this mathematical level. He was analyzing music to recognize where 0:21:07.840 --> 0:21:11.520 the frequencies were and where they should be, and to 0:21:11.600 --> 0:21:14.760 then shape the sound wave so that it would fit 0:21:15.440 --> 0:21:18.879 what the ideal would be, where it would be on key.
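The "cycles per second" definition can be tried out directly. The Python sketch below builds a pure tone as a list of samples and then counts how many full waves pass in one second by counting upward crossings of the zero line. The helper names and the 44,100 samples per second rate are assumptions chosen for illustration, and counting zero crossings only works this cleanly on a pure tone, not on a real voice.

```python
import math


def sine_wave(freq_hz, seconds, sample_rate=44100):
    """Generate samples of a pure tone with amplitude 1."""
    total = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(total)]


def count_cycles(samples):
    """Count upward zero crossings, one per full cycle of a pure tone."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if prev <= 0.0 < cur)


def estimate_hz(samples, sample_rate=44100):
    """Frequency in hertz = cycles counted / seconds of audio."""
    return count_cycles(samples) / (len(samples) / sample_rate)


tone = sine_wave(440.0, 1.0)   # one second of the note A above middle C
print(estimate_hz(tone))       # roughly 440 cycles in that one second
print(max(tone))               # the amplitude, close to 1.0
```

Real singing is a mix of many frequencies at once, so finding "the" pitch takes more than counting crossings, which is the hard part discussed next.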
0:21:19.520 --> 0:21:22.840 He was not the first person to attempt to do this, however. 0:21:22.920 --> 0:21:27.400 Earlier engineers had largely abandoned the quest because the signal 0:21:27.480 --> 0:21:32.800 processing and statistical analysis needs were so high. They were 0:21:32.840 --> 0:21:37.000 so extreme that you would need a supercomputer dedicated to 0:21:37.040 --> 0:21:39.119 the task to be able to do it. There's just 0:21:39.200 --> 0:21:43.399 too much data to process in too little time to 0:21:43.480 --> 0:21:46.720 be able to do anything meaningful with it. Hildebrand determined 0:21:46.720 --> 0:21:49.639 that, yeah, to fully analyze music, you would have to 0:21:49.720 --> 0:21:54.280 run thousands or millions of calculations, but many of those 0:21:54.320 --> 0:21:57.280 calculations were actually redundant at the end of the day, 0:21:57.440 --> 0:22:00.919 and eliminating the redundancy would not affect the quality of 0:22:00.960 --> 0:22:04.240 the outcome, and so in his words he, quote, changed 0:22:04.400 --> 0:22:09.560 a million multiply-adds into just four. It was a trick, 0:22:10.080 --> 0:22:14.040 a mathematical trick. End quote. That's from the article I 0:22:14.119 --> 0:22:19.479 mentioned earlier by Zachary Crockett in Priceonomics. So yeah, pretty 0:22:20.119 --> 0:22:24.520 phenomenal that he was able to recognize that ultimately he 0:22:24.600 --> 0:22:28.879 just needed these four processes to really be able to 0:22:29.240 --> 0:22:33.560 zero in on pitch correction. So Hildebrand developed the Auto-Tune 0:22:33.600 --> 0:22:36.760 technology in nineteen ninety six. He actually used a customized 0:22:36.840 --> 0:22:40.080 Mac computer, or specialized Mac computer is the way I've 0:22:40.119 --> 0:22:43.119 seen it explained. I don't know in what way it 0:22:43.160 --> 0:22:46.000 was specialized. I just know it was a Mac.
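To get a sense of why the brute-force version needs so many multiply-adds, here is a Python sketch of one common textbook way to detect pitch, called autocorrelation. The episode does not name the method Hildebrand used, so treat the choice of autocorrelation as an assumption, and note that this is the slow, unoptimized form, not his shortcut. The function name and the search range of 80 to 1,000 hertz are also made up for the example.

```python
import math


def detect_pitch_hz(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Brute-force autocorrelation pitch estimate.

    Slide a copy of the signal against itself and multiply-add the
    overlap. The shift (lag) where the copy lines up best is taken as
    one period of the wave. Returns (pitch_hz, multiply_add_count).
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    if len(samples) <= max_lag:
        raise ValueError("need more samples than the longest lag")

    best_lag, best_score, multiply_adds = min_lag, float("-inf"), 0
    for lag in range(min_lag, max_lag + 1):
        overlap = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(overlap))
        multiply_adds += overlap
        # The overlap shrinks as the lag grows, which mildly favors the
        # shortest matching lag over its multiples.
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, multiply_adds


# A tenth of a second of a 200 Hz tone at a low 8,000 samples per second.
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(800)]

pitch, work = detect_pitch_hz(tone, rate)
print(pitch)  # 200.0
print(work)   # tens of thousands of multiply-adds for 0.1 s of audio
```

Even at this low sample rate, a tenth of a second of audio costs tens of thousands of multiply-adds, and it has to be redone continuously as the note changes. That gives a rough feel for why trimming the redundant work mattered for doing this in real time.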
And 0:22:46.200 --> 0:22:50.000 he brought his software to the next National Association of 0:22:50.119 --> 0:22:52.840 Music Merchants conference. If you remember, that was the same 0:22:52.920 --> 0:22:56.639 conference where one of his lunch companions had inspired the 0:22:56.880 --> 0:22:59.639 idea for Auto-Tune in the first place. To say that 0:22:59.680 --> 0:23:03.040 he found interest in his product at this conference is 0:23:03.119 --> 0:23:07.639 really underselling it, and it's understandable why. So let's 0:23:07.760 --> 0:23:12.119 talk about the process of creating a master recording for 0:23:12.200 --> 0:23:15.520 a song. If you want to get a perfect take 0:23:16.040 --> 0:23:20.800 of a song, where this is the master recording, this 0:23:20.840 --> 0:23:23.879 is what you want to use in order to, you know, 0:23:23.960 --> 0:23:28.560 create your album, you can't just hope that everything lines 0:23:28.640 --> 0:23:32.560 up when you hit record and that everyone is playing 0:23:32.640 --> 0:23:37.080 seamlessly together and no one makes a mistake. Invariably something 0:23:37.359 --> 0:23:40.159 is going to be off. Maybe one of the musicians 0:23:40.200 --> 0:23:42.560 is lagging behind the others and it might not even 0:23:42.600 --> 0:23:46.159 be detectable at first, but upon closer examination you're like, ooh, 0:23:46.200 --> 0:23:49.280 you came in late, or you came in too early 0:23:49.359 --> 0:23:52.960 or whatever. Or the drummer is not keeping perfect time, 0:23:53.040 --> 0:23:55.680 whatever it may be. Maybe someone hits a wrong note, 0:23:56.000 --> 0:23:59.280 either while playing an instrument or while singing, or maybe both. 0:23:59.760 --> 0:24:02.760 But what it means for engineers is that they'll need 0:24:02.800 --> 0:24:06.760 to get another take where that mistake isn't there, and 0:24:06.800 --> 0:24:09.840 they'll probably need another take and another take, and if 0:24:09.840 --> 0:24:12.800