WEBVTT - TechStuff Gets Auto-Tuned (Again)

0:00:04.440 --> 0:00:12.280
<v Speaker 1>Welcome to TechStuff, a production from iHeartRadio. Hey there,

0:00:12.280 --> 0:00:15.240
<v Speaker 1>and welcome to TechStuff. I'm your host, Jonathan Strickland.

0:00:15.240 --> 0:00:18.439
<v Speaker 1>I'm an executive producer with iHeart Podcasts and how the

0:00:18.520 --> 0:00:21.000
<v Speaker 1>tech are you? You might be able to tell from

0:00:21.000 --> 0:00:24.880
<v Speaker 1>my voice that I have a cold, so I apologize

0:00:24.880 --> 0:00:28.000
<v Speaker 1>for that. But we're going to soldier on because I'm

0:00:28.040 --> 0:00:30.520
<v Speaker 1>back from vacation. It's time to get back to work,

0:00:30.960 --> 0:00:34.880
<v Speaker 1>and I love to talk about the intersection of technology

0:00:35.200 --> 0:00:38.800
<v Speaker 1>and music. So in past episodes, I've done shows about

0:00:38.840 --> 0:00:42.239
<v Speaker 1>how electric guitars work, the history of the Moog or

0:00:42.400 --> 0:00:47.040
<v Speaker 1>Mogue synthesizer, the evolution of various kinds of recordable media,

0:00:47.520 --> 0:00:51.440
<v Speaker 1>and much much more. But way back, like back in

0:00:51.560 --> 0:00:55.240
<v Speaker 1>two thousand and nine, my co-host at the time,

0:00:55.360 --> 0:00:58.080
<v Speaker 1>Chris Pollette, and I did a little episode of

0:00:58.120 --> 0:01:00.760
<v Speaker 1>TechStuff about Auto-Tune, and I thought it would be

0:01:00.840 --> 0:01:03.480
<v Speaker 1>fun to go back and revisit that topic. So this

0:01:03.560 --> 0:01:06.440
<v Speaker 1>is not a rerun. It's an all-new episode about

0:01:06.480 --> 0:01:09.520
<v Speaker 1>the same subject. I haven't even listened to the old episode,

0:01:09.600 --> 0:01:12.080
<v Speaker 1>so I have no idea how much of what I

0:01:12.160 --> 0:01:14.440
<v Speaker 1>have to say is going to be a repeat. I

0:01:14.440 --> 0:01:16.560
<v Speaker 1>imagine a lot of it will be new, but I

0:01:16.640 --> 0:01:18.840
<v Speaker 1>don't know for sure. I figure there will be fewer

0:01:19.000 --> 0:01:22.319
<v Speaker 1>puns in this version compared to the last one, because,

0:01:22.760 --> 0:01:26.400
<v Speaker 1>contrary to popular belief, it was actually Chris Pollette who

0:01:26.440 --> 0:01:29.680
<v Speaker 1>made the most puns on TechStuff back in the day.

0:01:29.760 --> 0:01:32.360
<v Speaker 1>I got a reputation for it, and I don't get

0:01:32.360 --> 0:01:34.839
<v Speaker 1>me wrong, I won't shy away from a good pun,

0:01:35.160 --> 0:01:37.920
<v Speaker 1>and by that I mean a terrible pun. I love them,

0:01:38.319 --> 0:01:42.720
<v Speaker 1>but Chris, like, he loved them the way I love,

0:01:43.440 --> 0:01:48.080
<v Speaker 1>you know, rich Indian food. He dined upon puns. So

0:01:48.160 --> 0:01:50.720
<v Speaker 1>probably not as many in this one. But let's talk

0:01:50.760 --> 0:01:54.600
<v Speaker 1>about Auto-Tune. Now, I think just about everyone knows,

0:01:54.880 --> 0:01:57.240
<v Speaker 1>or I think a lot of people know that Cher's

0:01:57.480 --> 0:02:01.600
<v Speaker 1>song "Believe," which came out in nineteen ninety eight, was the

0:02:01.640 --> 0:02:05.920
<v Speaker 1>first major song to prominently use Auto-Tune in an

0:02:05.960 --> 0:02:10.280
<v Speaker 1>effort to achieve a particular artistic effect, but the technology

0:02:10.280 --> 0:02:13.200
<v Speaker 1>had been around for more than a year at that point,

0:02:13.360 --> 0:02:16.560
<v Speaker 1>and the original intention wasn't to make a tool that

0:02:16.560 --> 0:02:20.119
<v Speaker 1>would actually draw attention to itself. Rather, as the name

0:02:20.200 --> 0:02:24.799
<v Speaker 1>Auto-Tune suggests, it was intended to automatically nudge the pitch

0:02:24.919 --> 0:02:27.640
<v Speaker 1>of a musical note in the right direction so that

0:02:27.840 --> 0:02:31.160
<v Speaker 1>it would be in tune. That way, the occasional wrong

0:02:31.240 --> 0:02:35.480
<v Speaker 1>note could be subtly pushed into place, and it wouldn't

0:02:35.520 --> 0:02:38.320
<v Speaker 1>require you to do another take and then try to

0:02:38.400 --> 0:02:42.000
<v Speaker 1>splice together a great master recording. But even all of

0:02:42.040 --> 0:02:45.560
<v Speaker 1>that is getting way ahead of ourselves. To understand the

0:02:45.639 --> 0:02:50.720
<v Speaker 1>history of Auto-Tune, we must first learn about reflection seismology

0:02:51.080 --> 0:02:54.080
<v Speaker 1>as well as the oil industry. And I am being serious,

0:02:54.120 --> 0:02:56.480
<v Speaker 1>I'm not making a joke about this. As it turns out,

0:02:57.040 --> 0:02:59.960
<v Speaker 1>reflection seismology has a lot to do with our story

0:03:00.200 --> 0:03:03.680
<v Speaker 1>because the man who would go on to found the

0:03:03.760 --> 0:03:07.560
<v Speaker 1>company that would create Auto-Tune was a Dr. Andy Hildebrand,

0:03:07.919 --> 0:03:11.760
<v Speaker 1>who had made a career in using sound and complex

0:03:11.840 --> 0:03:17.440
<v Speaker 1>mathematical calculations to help oil companies, namely Exxon, locate oil

0:03:17.520 --> 0:03:23.440
<v Speaker 1>deposits underground. So reflection seismology is in some ways similar

0:03:23.560 --> 0:03:27.919
<v Speaker 1>to sonar. So with a sonar system, you would beam

0:03:28.000 --> 0:03:30.560
<v Speaker 1>out pulses of sound waves. Typically we talk about this

0:03:30.760 --> 0:03:34.440
<v Speaker 1>in water, right, Like using sonar on a boat or

0:03:34.480 --> 0:03:37.200
<v Speaker 1>on a submarine, that kind of thing. You would pulse

0:03:37.240 --> 0:03:40.960
<v Speaker 1>out these sound waves, and those sound waves travel outward from

0:03:41.040 --> 0:03:45.080
<v Speaker 1>the source, from the speaker, essentially the transmitter, and if

0:03:45.080 --> 0:03:48.200
<v Speaker 1>there's something solid in the way of those sound waves, well,

0:03:48.480 --> 0:03:50.720
<v Speaker 1>the sound waves that hit that solid object, they're going

0:03:50.800 --> 0:03:54.040
<v Speaker 1>to reflect back toward the source. They'll become an echo.

0:03:54.440 --> 0:03:56.320
<v Speaker 1>This is what we get when we hear an echo.

0:03:56.560 --> 0:03:58.680
<v Speaker 1>If you are ever in a place where you make

0:03:58.920 --> 0:04:01.120
<v Speaker 1>a loud noise, then you hear the echo. It's because

0:04:01.160 --> 0:04:04.160
<v Speaker 1>the sound waves have traveled out from you, bounced off

0:04:04.160 --> 0:04:07.560
<v Speaker 1>something, and come back to you. Well, if

0:04:07.560 --> 0:04:10.280
<v Speaker 1>you measure the amount of time it took for a

0:04:10.360 --> 0:04:13.760
<v Speaker 1>sound to leave you and then reflect off something else

0:04:14.000 --> 0:04:16.760
<v Speaker 1>and come back to you, you can figure out how

0:04:16.760 --> 0:04:20.320
<v Speaker 1>far away you are from that thing, right, because sound

0:04:20.360 --> 0:04:23.360
<v Speaker 1>is going to travel a specific speed away from you

0:04:23.920 --> 0:04:26.080
<v Speaker 1>and then hit the thing and then travel back to you.

0:04:26.320 --> 0:04:28.400
<v Speaker 1>So if you know how long it took, you can

0:04:28.480 --> 0:04:31.640
<v Speaker 1>do some very simple math and figure out how far

0:04:31.720 --> 0:04:34.600
<v Speaker 1>away that object is. So, for example, if you're on

0:04:34.640 --> 0:04:37.320
<v Speaker 1>a ship and use sonar to measure the distance between

0:04:37.360 --> 0:04:40.400
<v Speaker 1>you and the sea floor, you do a little math. Right,

0:04:40.440 --> 0:04:43.640
<v Speaker 1>you have to divide by two because it took a

0:04:43.640 --> 0:04:46.000
<v Speaker 1>certain amount of time to travel down then back up,

0:04:46.120 --> 0:04:49.159
<v Speaker 1>And you have to know how fast sound travels through water.
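
NOTE
Editor's sketch, not part of the spoken audio: the divide-by-two math described above, written out in Python. The 1500 m/s figure is an assumed typical speed of sound in seawater (the episode gives no number), and depth_from_echo is a hypothetical helper name chosen for illustration.
```python
# Sketch: echo-sounding depth from a round-trip travel time.
# Assumption: roughly 1500 m/s for sound in seawater (not from the episode).
SPEED_OF_SOUND_SEAWATER_M_S = 1500.0
def depth_from_echo(round_trip_s, speed_m_s=SPEED_OF_SOUND_SEAWATER_M_S):
    """Return the one-way distance in meters for a round-trip echo time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels down and back up, so halve the total path length.
    return speed_m_s * round_trip_s / 2.0
print(depth_from_echo(4.0))  # 3000.0
```
In practice the speed of sound in water shifts with temperature, salinity, and depth, so a single constant is only a rough starting point.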

0:04:49.320 --> 0:04:52.080
<v Speaker 1>You have to have all these important bits of information

0:04:52.279 --> 0:04:53.960
<v Speaker 1>in your mind when you do this, but then you

0:04:53.960 --> 0:04:57.080
<v Speaker 1>can suss that out. You can say how deep the

0:04:57.080 --> 0:05:00.160
<v Speaker 1>ocean floor is from the surface. That saves you the

0:05:00.160 --> 0:05:02.320
<v Speaker 1>trouble of having to do it the really old fashioned way,

0:05:02.320 --> 0:05:05.880
<v Speaker 1>which typically involved lowering a weight on the end of

0:05:05.920 --> 0:05:10.040
<v Speaker 1>a knotted line, a knotted rope, and then you use

0:05:10.080 --> 0:05:12.400
<v Speaker 1>the knots to keep count of how deep in the

0:05:12.400 --> 0:05:15.320
<v Speaker 1>ocean you were. That's a sounding line. That's the other

0:05:15.360 --> 0:05:18.440
<v Speaker 1>way to see how far down the ocean floor is.

0:05:18.640 --> 0:05:22.080
<v Speaker 1>But sonar made it way simpler, especially once you were

0:05:22.120 --> 0:05:26.680
<v Speaker 1>able to build that math into the sonar workstations. Reflection

0:05:26.839 --> 0:05:32.040
<v Speaker 1>seismology does something similar, but with seismic waves, and those

0:05:32.080 --> 0:05:35.240
<v Speaker 1>are waves that pass through the earth, and we typically

0:05:35.360 --> 0:05:39.479
<v Speaker 1>talk about seismic waves in connection with earthquakes or like

0:05:39.560 --> 0:05:42.440
<v Speaker 1>volcanic eruptions that kind of thing, and in fact earthquakes

0:05:42.440 --> 0:05:46.120
<v Speaker 1>were what inspired smarty-pantses to say, hey, if

0:05:46.160 --> 0:05:50.120
<v Speaker 1>we made something that could you know, create a huge

0:05:50.200 --> 0:05:53.040
<v Speaker 1>vibration through the earth, and something else that could detect

0:05:53.200 --> 0:05:58.440
<v Speaker 1>those vibrations, and we were able to calculate how long

0:05:58.480 --> 0:06:01.840
<v Speaker 1>it took for the instrumentation to pick up on the

0:06:01.920 --> 0:06:06.400
<v Speaker 1>echoes of that initial vibration event, we might be able

0:06:06.440 --> 0:06:10.920
<v Speaker 1>to figure out stuff that's actually underground. We could figure

0:06:10.920 --> 0:06:13.600
<v Speaker 1>out what is underground without having to dig it up

0:06:13.640 --> 0:06:17.920
<v Speaker 1>and see. Now, that's because the seismic waves will travel

0:06:18.000 --> 0:06:21.360
<v Speaker 1>at different speeds depending upon the density of the material

0:06:21.440 --> 0:06:24.680
<v Speaker 1>that they travel through. You've probably heard things like, you know,

0:06:24.800 --> 0:06:27.599
<v Speaker 1>sound travels at a consistent speed. That's true, but that

0:06:27.680 --> 0:06:31.279
<v Speaker 1>consistent speed is dependent upon the medium through which the

0:06:31.320 --> 0:06:34.479
<v Speaker 1>sound is traveling. So sound travels at a different speed

0:06:34.480 --> 0:06:37.120
<v Speaker 1>through the water than it does through the air or

0:06:37.160 --> 0:06:40.960
<v Speaker 1>through solid objects. You know, vibrations travel at different speeds

0:06:40.960 --> 0:06:43.719
<v Speaker 1>depending upon the medium. So at a very basic level,

0:06:43.760 --> 0:06:46.720
<v Speaker 1>a seismic wave will travel at a constant rate through

0:06:46.839 --> 0:06:51.200
<v Speaker 1>one kind of say rocky soil. But let's say there's

0:06:51.360 --> 0:06:57.000
<v Speaker 1>a place underground where that rocky soil gives way to

0:06:57.279 --> 0:07:03.600
<v Speaker 1>a different material, say, petroleum, for example. Well, then the

0:07:03.720 --> 0:07:07.000
<v Speaker 1>speed of those sound waves is going to change. Moreover,

0:07:07.480 --> 0:07:10.640
<v Speaker 1>as the sound waves hit that barrier between one type

0:07:10.760 --> 0:07:13.560
<v Speaker 1>of material and another, some of the sound waves are

0:07:13.560 --> 0:07:16.120
<v Speaker 1>going to reflect off of that and become an echo.

0:07:16.960 --> 0:07:19.240
<v Speaker 1>Some of the sound waves will continue to penetrate through

0:07:19.440 --> 0:07:23.880
<v Speaker 1>the new material. And through lots of observations, we gradually

0:07:23.920 --> 0:07:27.040
<v Speaker 1>began to learn about the different rates at which a

0:07:27.120 --> 0:07:31.160
<v Speaker 1>seismic wave will travel depending upon the medium it's traveling through,

0:07:31.720 --> 0:07:34.520
<v Speaker 1>and if it hits something really solid like bedrock, it

0:07:34.560 --> 0:07:39.760
<v Speaker 1>pretty much just echoes back. So here's how reflection seismology works.

0:07:39.880 --> 0:07:43.160
<v Speaker 1>From a very high level. You set up sensitive equipment

0:07:43.320 --> 0:07:47.040
<v Speaker 1>at different distances from a blast site, and yeah, you're

0:07:47.160 --> 0:07:49.920
<v Speaker 1>likely to use something like explosives or maybe a really

0:07:50.000 --> 0:07:52.600
<v Speaker 1>powerful air gun. It has to be something that's going

0:07:52.680 --> 0:07:56.040
<v Speaker 1>to give a real jolt to the ground in order

0:07:56.080 --> 0:07:58.240
<v Speaker 1>to do this, because that's essentially what you're doing is

0:07:58.280 --> 0:08:02.400
<v Speaker 1>creating like a very localized earthquake. So this vibration travels

0:08:02.440 --> 0:08:04.960
<v Speaker 1>through the earth, and because you know how far away

0:08:04.960 --> 0:08:08.120
<v Speaker 1>you've set up your measuring equipment from that blast site,

0:08:08.280 --> 0:08:10.560
<v Speaker 1>you already have distance figured out, right. You know how

0:08:10.600 --> 0:08:14.960
<v Speaker 1>far away it is from the original source of the vibration,

0:08:15.400 --> 0:08:17.680
<v Speaker 1>and you measure the time it takes for your equipment

0:08:17.720 --> 0:08:22.360
<v Speaker 1>to pick up the echoes from that particular vibration event.
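
NOTE
Editor's sketch, not part of the spoken audio: one simplified way the known distances and measured echo times could be turned into numbers. It assumes a single flat, horizontal reflector under uniform material, which real surveys cannot assume, and it is not presented as the method Hildebrand or any oil company used. The function and parameter names are hypothetical.
```python
import math
def velocity_and_depth(x1_m, t1_s, x2_m, t2_s):
    """Estimate wave speed (m/s) and reflector depth (m) from two receivers.
    Assumes one flat, horizontal reflector beneath uniform material, so each
    echo path is two straight legs: (v * t)**2 == (2 * depth)**2 + x**2.
    """
    if x1_m == x2_m or t1_s == t2_s:
        raise ValueError("need two receivers at different offsets and times")
    # Subtracting the two path equations cancels the unknown depth.
    v_squared = (x2_m**2 - x1_m**2) / (t2_s**2 - t1_s**2)
    if v_squared <= 0:
        raise ValueError("times are inconsistent with a single flat reflector")
    v = math.sqrt(v_squared)
    # Put the speed back into the first receiver's equation to get depth.
    depth = 0.5 * math.sqrt(max((v * t1_s) ** 2 - x1_m**2, 0.0))
    return v, depth
v, depth = velocity_and_depth(0.0, 1.0, 1500.0, 1.25)
print(round(v), round(depth))  # 2000 1000
```
The estimated speed is what hints at the material, since different rock types carry seismic waves at different rates. Real processing handles many layers, many receivers, and noisy data, which is where the heavy math comes in.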

0:08:22.800 --> 0:08:25.680
<v Speaker 1>So you've got distance and now you have time. Now

0:08:25.720 --> 0:08:28.560
<v Speaker 1>you've got those variables sorted, so you can start to

0:08:28.600 --> 0:08:33.040
<v Speaker 1>work out what material is actually under the ground that

0:08:33.240 --> 0:08:36.679
<v Speaker 1>produces this particular result. And by doing that, you're kind

0:08:36.720 --> 0:08:39.360
<v Speaker 1>of like working backwards. You're using this information to draw

0:08:39.440 --> 0:08:42.600
<v Speaker 1>conclusions about what's under there, and that's where you can

0:08:42.679 --> 0:08:46.360
<v Speaker 1>start to make a determination as to whether or not

0:08:46.400 --> 0:08:48.720
<v Speaker 1>you're standing on top of a Beverly Hillbillies-like

0:08:48.800 --> 0:08:52.080
<v Speaker 1>oil deposit, or maybe you're just on top of a

0:08:52.120 --> 0:08:54.800
<v Speaker 1>bunch of rocks or whatever. Now, in order to do

0:08:54.920 --> 0:08:58.840
<v Speaker 1>that what I just described, it's actually incredibly complicated. It

0:08:58.880 --> 0:09:02.800
<v Speaker 1>involves an awful lot of calculations in math, and it's

0:09:02.800 --> 0:09:04.560
<v Speaker 1>a lot of work. But then you have to think

0:09:04.559 --> 0:09:07.600
<v Speaker 1>that drilling for oil is even more work. That's a

0:09:07.679 --> 0:09:11.440
<v Speaker 1>huge endeavor. It costs a lot of time and money

0:09:11.480 --> 0:09:14.120
<v Speaker 1>and effort to do it, and like if you drill

0:09:14.120 --> 0:09:16.920
<v Speaker 1>in the wrong place, like that's a huge loss. So

0:09:17.200 --> 0:09:19.880
<v Speaker 1>you want the best possible information before you select a

0:09:19.960 --> 0:09:23.880
<v Speaker 1>drilling site, and reflection seismology is one way to obtain

0:09:23.960 --> 0:09:27.640
<v Speaker 1>information and to help make a decision. So Dr. Hildebrand

0:09:27.840 --> 0:09:30.120
<v Speaker 1>was making a really good living out of this work,

0:09:30.679 --> 0:09:34.520
<v Speaker 1>but companies like Exxon were saving hundreds of millions of

0:09:34.559 --> 0:09:39.040
<v Speaker 1>dollars through Hildebrand's approach of narrowing down potential drill sites,

0:09:39.240 --> 0:09:42.679
<v Speaker 1>and Hildebrand thought, you know, I'm not doing badly. I'm

0:09:42.679 --> 0:09:46.040
<v Speaker 1>making a decent living. But you know, Exxon is making

0:09:46.080 --> 0:09:49.040
<v Speaker 1>out like a bandit. They're saving like half a billion

0:09:49.120 --> 0:09:52.840
<v Speaker 1>dollars a year or whatever using this technology. Maybe if

0:09:52.920 --> 0:09:58.880
<v Speaker 1>I apply my knowledge and skill set in a company

0:09:59.080 --> 0:10:02.800
<v Speaker 1>that I own, I might actually, you know, do better

0:10:02.880 --> 0:10:06.599
<v Speaker 1>than just working for Exxon. So Hildebrand left Exxon in

0:10:06.679 --> 0:10:10.640
<v Speaker 1>nineteen seventy nine and he founded a company called Landmark Graphics,

0:10:10.920 --> 0:10:13.760
<v Speaker 1>which at first sounds like, you know, it's a company

0:10:13.760 --> 0:10:17.839
<v Speaker 1>that makes computer graphics, which is not untrue, but that

0:10:17.960 --> 0:10:21.480
<v Speaker 1>wasn't, it wasn't just general graphics. This company was still

0:10:21.600 --> 0:10:25.360
<v Speaker 1>rooted in the oil industry. Hildebrand's team developed and produced

0:10:25.400 --> 0:10:30.200
<v Speaker 1>workstations that could take incoming seismic information from these,

0:10:30.280 --> 0:10:33.560
<v Speaker 1>you know, soundings that they do and generate three dimensional

0:10:33.679 --> 0:10:37.840
<v Speaker 1>seismic maps based upon the data. And again, it was

0:10:37.880 --> 0:10:41.960
<v Speaker 1>incredibly complicated. You had to analyze so many different points

0:10:41.960 --> 0:10:46.280
<v Speaker 1>of information in order to create this three dimensional representation

0:10:46.360 --> 0:10:49.520
<v Speaker 1>of what's under the ground. But it worked and it

0:10:49.520 --> 0:10:52.440
<v Speaker 1>made Hildebrand very successful. He stuck with it for a

0:10:52.520 --> 0:10:56.960
<v Speaker 1>decade until nineteen eighty nine, whereupon he retired and he

0:10:57.040 --> 0:11:00.840
<v Speaker 1>decided to return his attention to a different passion he

0:11:00.920 --> 0:11:04.800
<v Speaker 1>had had since he was a kid, which was music.

0:11:05.200 --> 0:11:08.840
<v Speaker 1>Now Hildebrand wasn't just a music fan, he was a musician.

0:11:08.960 --> 0:11:12.680
<v Speaker 1>He had played flute professionally. He had been a studio

0:11:12.800 --> 0:11:15.520
<v Speaker 1>musician for some time. He had paid his way through

0:11:15.520 --> 0:11:20.520
<v Speaker 1>college partly by giving flute lessons to musicians. So he

0:11:20.640 --> 0:11:23.760
<v Speaker 1>decided he would go back to school as a retiree

0:11:24.120 --> 0:11:27.720
<v Speaker 1>and study composition and techniques. He attended Rice University to

0:11:27.760 --> 0:11:31.400
<v Speaker 1>do this. While he was back in college, he encountered

0:11:31.920 --> 0:11:36.200
<v Speaker 1>some newer technologies in the music space, like music samplers

0:11:36.240 --> 0:11:39.560
<v Speaker 1>and synthesizers. So these were machines designed to take a

0:11:39.720 --> 0:11:43.720
<v Speaker 1>sample of a sound like a flute, and then allow

0:11:43.840 --> 0:11:47.120
<v Speaker 1>a keyboard musician to recreate those sounds on a synthesizer.

0:11:47.480 --> 0:11:50.960
<v Speaker 1>The only thing is that Hildebrand thought they sounded terrible,

0:11:51.600 --> 0:11:54.760
<v Speaker 1>and partly it was because there was a limitation on

0:11:54.920 --> 0:11:58.360
<v Speaker 1>how much data a synthesizer could actually handle, so it

0:11:58.400 --> 0:12:04.160
<v Speaker 1>couldn't really replicate sound naturally. The sound it replicated would

0:12:04.200 --> 0:12:07.440
<v Speaker 1>be like a gross approximation of the original sound, So

0:12:07.520 --> 0:12:10.559
<v Speaker 1>Hildebrand wasn't really impressed, but he thought that there was

0:12:10.640 --> 0:12:13.960
<v Speaker 1>room for improvement, and he developed a technique to compress

0:12:14.080 --> 0:12:18.640
<v Speaker 1>audio data so that synthesizers could more effectively handle information

0:12:19.200 --> 0:12:23.240
<v Speaker 1>and make notes, to produce notes that sounded more natural

0:12:23.320 --> 0:12:27.360
<v Speaker 1>and less synthetic. He released his software as a product

0:12:27.360 --> 0:12:32.320
<v Speaker 1>called Infinity, and while this tool would revolutionize the orchestration

0:12:32.400 --> 0:12:35.199
<v Speaker 1>process for stuff like film and television, it did not

0:12:36.240 --> 0:12:40.240
<v Speaker 1>revolutionize Dr. Hildebrand's bank account. He didn't actually see much

0:12:40.320 --> 0:12:43.439
<v Speaker 1>of that success himself because what actually happened was other

0:12:43.520 --> 0:12:47.040
<v Speaker 1>companies purchased copies of Infinity and then bundled it with

0:12:47.120 --> 0:12:50.760
<v Speaker 1>their own audio processing tools, and then sold those audio

0:12:50.880 --> 0:12:54.480
<v Speaker 1>processing packages to other people and companies, and it kind

0:12:54.480 --> 0:12:58.520
<v Speaker 1>of cut Hildebrand out of the picture. So while others

0:12:58.520 --> 0:13:02.640
<v Speaker 1>were benefiting from his work, he did not see that

0:13:02.760 --> 0:13:07.320
<v Speaker 1>much success. It did, however, again have an enormous impact

0:13:07.520 --> 0:13:12.400
<v Speaker 1>on orchestrations. Like, according to Dr. Hildebrand, he was the

0:13:12.440 --> 0:13:16.040
<v Speaker 1>reason why the Los Angeles Orchestra hit real hard times

0:13:16.080 --> 0:13:20.360
<v Speaker 1>in the nineteen nineties because his tools allowed composers to

0:13:20.559 --> 0:13:25.640
<v Speaker 1>sample various musical instruments and create a natural enough representation

0:13:26.240 --> 0:13:28.880
<v Speaker 1>of those sounds to be able to create a synthetic

0:13:29.040 --> 0:13:32.960
<v Speaker 1>orchestra that sounded more or less like a real one.

0:13:33.040 --> 0:13:34.720
<v Speaker 1>So there was no need to go and hire a

0:13:34.720 --> 0:13:38.199
<v Speaker 1>real orchestra to orchestrate your film or TV project. You

0:13:38.240 --> 0:13:41.360
<v Speaker 1>could do it yourself. I've actually heard some of

0:13:41.400 --> 0:13:45.040
<v Speaker 1>my favorite music scores, and when I listen closely, I can

0:13:45.120 --> 0:13:49.800
<v Speaker 1>tell like, oh, that's not a real cellist. That's a

0:13:49.880 --> 0:13:54.800
<v Speaker 1>synthesizer playing a sample of a cello that sounds almost,

0:13:54.800 --> 0:13:58.400
<v Speaker 1>but not quite like the real thing. Anyway, we can

0:13:58.480 --> 0:14:01.360
<v Speaker 1>thank Dr. Hildebrand for that. I'll talk more about what

0:14:01.440 --> 0:14:05.640
<v Speaker 1>we could thank Dr. Hildebrand for, specifically Auto-Tune, but

0:14:05.720 --> 0:14:07.960
<v Speaker 1>first let's take a quick break so we could thank

0:14:08.000 --> 0:14:21.160
<v Speaker 1>some other people, namely our sponsors. We'll be right back. Okay,

0:14:21.320 --> 0:14:24.000
<v Speaker 1>So before we left off, I was talking about how

0:14:24.040 --> 0:14:29.000
<v Speaker 1>Dr. Hildebrand had released a program called Infinity that improved

0:14:29.160 --> 0:14:34.440
<v Speaker 1>the performance of synthesizers and samplers. But in nineteen ninety

0:14:34.440 --> 0:14:37.040
<v Speaker 1>he decided to take an extra step. He founded a

0:14:37.240 --> 0:14:41.840
<v Speaker 1>new company. He called it Antares Audio Technologies, and this

0:14:41.880 --> 0:14:46.440
<v Speaker 1>would be his music company, his music technology company that

0:14:46.560 --> 0:14:50.760
<v Speaker 1>would ultimately produce Auto-Tune. And he knew that technology was

0:14:50.840 --> 0:14:53.480
<v Speaker 1>poised to make a huge impact on the music industry

0:14:53.520 --> 0:14:56.240
<v Speaker 1>and already had been. Like, that's kind of the history

0:14:56.240 --> 0:14:58.960
<v Speaker 1>of modern music is how technology has shaped it. But

0:14:59.000 --> 0:15:01.440
<v Speaker 1>he knew we were on the brink of another revolution.

0:15:01.520 --> 0:15:03.880
<v Speaker 1>He just wasn't exactly sure how that was going to

0:15:03.960 --> 0:15:08.280
<v Speaker 1>manifest. Now, according to an article by Simon Reynolds, it's

0:15:08.280 --> 0:15:12.320
<v Speaker 1>titled "How Auto-Tune Revolutionized the Sound of Popular Music," and

0:15:12.360 --> 0:15:16.760
<v Speaker 1>it was published in Pitchfork, the actual birth of Hildebrand's

0:15:16.800 --> 0:15:20.560
<v Speaker 1>idea for Auto-Tune grew out of a casual lunch with

0:15:20.640 --> 0:15:24.160
<v Speaker 1>some of his friends and peers back in nineteen ninety

0:15:24.160 --> 0:15:29.520
<v Speaker 1>five during a National Association of Music Merchants conference. So

0:15:29.600 --> 0:15:32.720
<v Speaker 1>he's at this conference, he's meeting with other people in

0:15:32.760 --> 0:15:36.360
<v Speaker 1>the music and technology spheres, and at this lunch, one

0:15:36.400 --> 0:15:41.320
<v Speaker 1>of the attendees jokingly suggested that what Hildebrand should do

0:15:41.440 --> 0:15:43.960
<v Speaker 1>next is develop a technology that would allow her to

0:15:44.040 --> 0:15:47.000
<v Speaker 1>sing on key, like, can you make a box that

0:15:47.120 --> 0:15:51.120
<v Speaker 1>lets me sing well? And while this was presented as

0:15:51.160 --> 0:15:55.600
<v Speaker 1>a joke, ultimately Hildebrand would think, huh, could I do

0:15:55.760 --> 0:16:00.400
<v Speaker 1>that? Now, according to Zachary Crockett's article, which is "The

0:16:00.440 --> 0:16:04.080
<v Speaker 1>Mathematical Genius of Auto-Tune," this one in Priceonomics,

0:16:04.480 --> 0:16:07.760
<v Speaker 1>this wasn't like a light bulb moment where the moment

0:16:07.840 --> 0:16:11.600
<v Speaker 1>this woman says the thing, Hildebrand immediately thinks, ah, that's

0:16:11.640 --> 0:16:14.480
<v Speaker 1>what I shall do. Actually, it took like another six

0:16:14.560 --> 0:16:18.560
<v Speaker 1>months before Hildebrand really kind of revisited the concept and thought,

0:16:18.920 --> 0:16:21.880
<v Speaker 1>maybe there's something here. But in order to do that,

0:16:22.600 --> 0:16:24.800
<v Speaker 1>he would have to develop a technology that could do

0:16:24.920 --> 0:16:27.680
<v Speaker 1>a few things really well, all of which are a

0:16:27.760 --> 0:16:30.640
<v Speaker 1>bit tricky. One is it would need to detect the

0:16:30.720 --> 0:16:33.200
<v Speaker 1>pitch that someone was singing in, for example, if you're

0:16:33.280 --> 0:16:36.160
<v Speaker 1>using it for vocals, and so you would need to

0:16:36.160 --> 0:16:40.440
<v Speaker 1>be able to detect exactly the frequency that was being sung.

0:16:40.840 --> 0:16:44.640
<v Speaker 1>You would need to then also be able to have

0:16:44.880 --> 0:16:49.800
<v Speaker 1>a list of tones that were in whatever key

0:16:49.880 --> 0:16:52.120
<v Speaker 1>you were supposed to be singing in. So I don't

0:16:52.120 --> 0:16:54.520
<v Speaker 1>want to get into music theory, because goodness knows, I

0:16:54.520 --> 0:16:56.720
<v Speaker 1>don't know that much about it myself, and I would

0:16:56.720 --> 0:16:59.240
<v Speaker 1>just mess things up. But you know, if you're singing

0:16:59.240 --> 0:17:02.160
<v Speaker 1>in a specific key, there are particular tones that belong

0:17:02.240 --> 0:17:04.920
<v Speaker 1>to that key. And often when we sing and we're

0:17:05.119 --> 0:17:07.480
<v Speaker 1>a little off pitch, what we need is to be

0:17:07.920 --> 0:17:11.560
<v Speaker 1>gently nudged a little up or a little down, a

0:17:11.600 --> 0:17:14.480
<v Speaker 1>little sharp or a little flat in order to hit

0:17:14.600 --> 0:17:18.080
<v Speaker 1>a semitone that belongs in that key. So it needs

0:17:18.119 --> 0:17:22.600
<v Speaker 1>to also quote unquote know which tones are appropriate, and

0:17:22.600 --> 0:17:25.480
<v Speaker 1>then it has to be able to digitally alter the

0:17:25.600 --> 0:17:30.440
<v Speaker 1>incoming pitch the actual sung note, and then guide it

0:17:30.720 --> 0:17:34.119
<v Speaker 1>to match that of a target note. Now, ultimately that

0:17:34.200 --> 0:17:37.080
<v Speaker 1>all sounds like a pretty simple idea, but in reality

0:17:37.560 --> 0:17:42.199
<v Speaker 1>to achieve this it was incredibly complex. Ultimately, also, the

0:17:42.240 --> 0:17:45.159
<v Speaker 1>tool would need to work in real time for live performances.
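
NOTE
Editor's sketch, not part of the spoken audio: a toy version of the second and third requirements described above, namely knowing which tones belong to a key and choosing a target note for a detected pitch. It assumes the pitch has already been detected, uses A4 = 440 Hz and the C major scale as illustrative defaults, and is not Antares's algorithm. The name snap_to_scale is hypothetical.
```python
import math
A4_HZ = 440.0  # assumed tuning reference
C_MAJOR = frozenset({0, 2, 4, 5, 7, 9, 11})  # pitch classes C D E F G A B
def snap_to_scale(freq_hz, scale=C_MAJOR):
    """Return (midi_note, target_hz) for the in-scale note nearest freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI note number: 69 is A4, and each semitone adds 1.
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)
    base = round(midi)
    # Look one octave either side so any non-empty scale has a candidate.
    candidates = [n for n in range(base - 12, base + 13) if n % 12 in scale]
    target = min(candidates, key=lambda n: abs(n - midi))
    return target, A4_HZ * 2 ** ((target - 69) / 12)
note, hz = snap_to_scale(450.0)  # a slightly sharp A4
print(note, hz)  # 69 440.0
```
This only picks the target. Detecting the sung pitch in the first place and then shifting the audio toward the target without artifacts, fast enough for live use, are the hard parts the episode goes on to describe.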

0:17:45.240 --> 0:17:47.840
<v Speaker 1>Like it's one thing to have this for the studio, right,

0:17:47.880 --> 0:17:50.800
<v Speaker 1>because even if you don't have an automatic, you could

0:17:50.840 --> 0:17:53.800
<v Speaker 1>have a tool where an engineer could fiddle with some

0:17:53.920 --> 0:17:57.919
<v Speaker 1>controls and gently alter the pitch of a performance to

0:17:57.960 --> 0:18:00.520
<v Speaker 1>get it closer to being where it needs to be.

0:18:00.880 --> 0:18:03.119
<v Speaker 1>It would be preferable to have that automated so that

0:18:03.160 --> 0:18:04.800
<v Speaker 1>you don't have to go through there and do the

0:18:04.800 --> 0:18:08.800
<v Speaker 1>manual process. But even so, like in a recording setting,

0:18:08.880 --> 0:18:11.199
<v Speaker 1>you don't have to have it be real time necessarily,

0:18:11.280 --> 0:18:13.119
<v Speaker 1>but if you're doing a live performance, you do have

0:18:13.160 --> 0:18:15.560
<v Speaker 1>to have it be real time. If someone's up there singing

0:18:15.920 --> 0:18:19.280
<v Speaker 1>and they just hit a flat note when they're not

0:18:19.320 --> 0:18:23.280
<v Speaker 1>supposed to, that could really be a memorable moment and

0:18:23.320 --> 0:18:25.439
<v Speaker 1>not in a great way. So having a tool that

0:18:25.520 --> 0:18:28.920
<v Speaker 1>could gently account for that and fix it in real

0:18:29.000 --> 0:18:32.880
<v Speaker 1>time would be really helpful. But this would mean that

0:18:33.200 --> 0:18:35.439
<v Speaker 1>this tool would have to be able to process a

0:18:35.560 --> 0:18:40.920
<v Speaker 1>huge amount of sound data extremely quickly to make millisecond

0:18:40.920 --> 0:18:45.880
<v Speaker 1>decisions, like split-millisecond decisions, relating to how to shape

0:18:45.920 --> 0:18:49.879
<v Speaker 1>a note moment by moment. Now, it does help if

0:18:49.920 --> 0:18:53.320
<v Speaker 1>we also think of sound in terms of mathematics. We

0:18:53.480 --> 0:18:56.080
<v Speaker 1>describe sound in different ways, right, But some of those

0:18:56.400 --> 0:19:00.280
<v Speaker 1>relate specifically to how sound looks to us. If we

0:19:00.400 --> 0:19:05.280
<v Speaker 1>plot sound on like a wave chart, right. For example,

0:19:05.400 --> 0:19:07.760
<v Speaker 1>sounds can be really loud or they can be really quiet,

0:19:08.119 --> 0:19:12.040
<v Speaker 1>and that is volume, But it can also relate to amplitude.

0:19:12.440 --> 0:19:14.920
<v Speaker 1>When you think of a sound wave, the amplitude of

0:19:14.960 --> 0:19:18.240
<v Speaker 1>a sound wave describes how tall those peaks are or

0:19:18.280 --> 0:19:22.720
<v Speaker 1>how low the valleys are. The distance between the furthest

0:19:23.080 --> 0:19:26.440
<v Speaker 1>point of a peak or valley and the zero line.

0:19:26.800 --> 0:19:30.320
<v Speaker 1>That's your amplitude. But we also describe sound in terms

0:19:30.320 --> 0:19:34.880
<v Speaker 1>of pitch or frequencies. Higher frequencies correspond to higher pitches,

0:19:35.200 --> 0:19:37.560
<v Speaker 1>And if we plot a sound wave, let's say that

0:19:37.600 --> 0:19:40.919
<v Speaker 1>we plot it so that the x axis is a

0:19:41.000 --> 0:19:46.639
<v Speaker 1>demarcation of time, so we have one second listed there,

0:19:46.880 --> 0:19:49.640
<v Speaker 1>like the x axis is one second. If there's one

0:19:49.760 --> 0:19:52.399
<v Speaker 1>wave that we draw so that the wave begins at

0:19:52.440 --> 0:19:54.880
<v Speaker 1>the zero point and ends at the one second point,

0:19:55.119 --> 0:19:58.760
<v Speaker 1>then we have a one hertz sound wave. A hertz

0:19:59.200 --> 0:20:03.040
<v Speaker 1>is just a measurement of frequency. It refers to one

0:20:03.240 --> 0:20:06.960
<v Speaker 1>cycle per second. So if a wave is one hertz,

0:20:06.960 --> 0:20:09.240
<v Speaker 1>it means it takes one second for one of those

0:20:09.240 --> 0:20:13.120
<v Speaker 1>sound waves to fully pass a given point where you're

0:20:13.160 --> 0:20:17.920
<v Speaker 1>measuring the sound waves, right, If two waves pass that

0:20:18.040 --> 0:20:20.679
<v Speaker 1>point within one second, then you're talking about two hertz,

0:20:21.040 --> 0:20:23.440
<v Speaker 1>you know. Just so that we know, the typical human

0:20:23.480 --> 0:20:28.080
<v Speaker 1>hearing range is anywhere between twenty and twenty thousand hertz.

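NOTE
Editor's sketch, not part of the spoken audio: the frequency arithmetic described above. A hertz is one cycle per second, so a single cycle lasts 1 / frequency seconds. The function names are hypothetical, and the 20 Hz and 20,000 Hz limits are simply the typical hearing range quoted in the episode.
```python
# Frequency in hertz is cycles per second, so one cycle lasts 1 / frequency seconds.
def period_seconds(frequency_hz):
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz
# Typical human hearing range as quoted in the episode: 20 Hz to 20,000 Hz.
def is_typically_audible(frequency_hz, low_hz=20.0, high_hz=20000.0):
    return low_hz <= frequency_hz <= high_hz
for hz in (1, 2, 440, 17000, 25000):
    print(hz, "Hz:", period_seconds(hz), "s per cycle, audible:", is_typically_audible(hz))
```
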
0:20:28.320 --> 0:20:31.359
<v Speaker 1>So a one or two hertz sound, we wouldn't even perceive it,

0:20:31.359 --> 0:20:33.720
<v Speaker 1>at least not as sound. If it was a great

0:20:33.800 --> 0:20:36.840
<v Speaker 1>enough amplitude, you could potentially perceive it as vibration, but

0:20:36.880 --> 0:20:40.359
<v Speaker 1>you would feel it, you wouldn't hear it. But between

0:20:40.359 --> 0:20:43.960
<v Speaker 1>twenty and twenty thousand hertz, that falls into the typical

0:20:44.040 --> 0:20:46.120
<v Speaker 1>range of human hearing. Of course, as we get older,

0:20:46.119 --> 0:20:49.440
<v Speaker 1>we start to lose the ability to hear those higher frequencies.

0:20:50.040 --> 0:20:53.600
<v Speaker 1>These days, I think my hearing tops out around sixteen

0:20:53.640 --> 0:20:57.040
<v Speaker 1>to seventeen thousand hertz, somewhere around there. Like once you

0:20:57.080 --> 0:21:00.320
<v Speaker 1>get beyond that, I don't hear anything, whereas younger people

0:21:00.320 --> 0:21:04.120
<v Speaker 1>could hear it. Anyway, Hildebrand was working with music on

0:21:04.160 --> 0:21:07.840
<v Speaker 1>this mathematical level. He was analyzing music to recognize where

0:21:07.840 --> 0:21:11.520
<v Speaker 1>the frequencies were and where they should be, and to

0:21:11.600 --> 0:21:14.760
<v Speaker 1>then shape the sound wave so that it would fit

0:21:15.440 --> 0:21:18.879
<v Speaker 1>what the ideal would be where it would be on key.

0:21:19.520 --> 0:21:22.840
<v Speaker 1>He was not the first person to attempt to do this, however,

0:21:22.920 --> 0:21:27.400
<v Speaker 1>Earlier engineers had largely abandoned the quest because the signal

0:21:27.480 --> 0:21:32.800
<v Speaker 1>processing and statistical analysis needs were so high. They were

0:21:32.840 --> 0:21:37.000
<v Speaker 1>so extreme that you would need a supercomputer dedicated to

0:21:37.040 --> 0:21:39.119
<v Speaker 1>the task to be able to do it. There's just

0:21:39.200 --> 0:21:43.399
<v Speaker 1>too much data to process in too little time to

0:21:43.480 --> 0:21:46.720
<v Speaker 1>be able to do anything meaningful with it. Hildebrand determined

0:21:46.720 --> 0:21:49.639
<v Speaker 1>that yeah, to fully analyze music, you would have to

0:21:49.720 --> 0:21:54.280
<v Speaker 1>run thousands or millions of calculations, but many of those

0:21:54.320 --> 0:21:57.280
<v Speaker 1>calculations were actually redundant at the end of the day,

0:21:57.440 --> 0:22:00.919
<v Speaker 1>and eliminating the redundancy would not affect the quality of

0:22:00.960 --> 0:22:04.240
<v Speaker 1>the outcome, and so in his words he quote changed

0:22:04.400 --> 0:22:09.560
<v Speaker 1>a million multiply-adds into just four. It was a trick,

0:22:10.080 --> 0:22:14.040
<v Speaker 1>a mathematical trick. End quote. That's from the article I

0:22:14.119 --> 0:22:19.479
<v Speaker 1>mentioned earlier by Zachary Crockett. So yeah, in Priceonomics, pretty

0:22:20.119 --> 0:22:24.520
<v Speaker 1>phenomenal that he was able to recognize that ultimately he

0:22:24.600 --> 0:22:28.879
<v Speaker 1>just needed these four processes to really be able to

0:22:29.240 --> 0:22:33.560
<v Speaker 1>zero in on pitch correction. So Hildebrand developed the autotune

0:22:33.600 --> 0:22:36.760
<v Speaker 1>technology in nineteen ninety six. He actually used a customized

0:22:36.840 --> 0:22:40.080
<v Speaker 1>Mac computer or specialized Mac computer, is the way I've

0:22:40.119 --> 0:22:43.119
<v Speaker 1>seen it explained. I don't know in what way it

0:22:43.160 --> 0:22:46.000
<v Speaker 1>was specialized. I just know it was a Mac. And

0:22:46.200 --> 0:22:50.000
<v Speaker 1>he brought his software to the next National Association of

0:22:50.119 --> 0:22:52.840
<v Speaker 1>Music Merchants conference. If you remember, that was the same

0:22:52.920 --> 0:22:56.639
<v Speaker 1>conference where one of his lunch companions had inspired the

0:22:56.880 --> 0:22:59.639
<v Speaker 1>idea for autotune in the first place. To say that

0:22:59.680 --> 0:23:03.040
<v Speaker 1>he found interest in his product at this conference is

0:23:03.119 --> 0:23:07.639
<v Speaker 1>really underselling it, and it's understandable why. So let's

0:23:07.760 --> 0:23:12.119
<v Speaker 1>talk about the process of creating a master recording for

0:23:12.200 --> 0:23:15.520
<v Speaker 1>a song. If you want to get a perfect take

0:23:16.040 --> 0:23:20.800
<v Speaker 1>of a song, where this is the master recording, this

0:23:20.840 --> 0:23:23.879
<v Speaker 1>is what you want to use in order to you know,

0:23:23.960 --> 0:23:28.560
<v Speaker 1>create your album. You can't just hope that everything lines

0:23:28.640 --> 0:23:32.560
<v Speaker 1>up when you hit record and that everyone is playing

0:23:32.640 --> 0:23:37.080
<v Speaker 1>seamlessly together and no one makes a mistake. Invariably something

0:23:37.359 --> 0:23:40.159
<v Speaker 1>is going to be off. Maybe one of the musicians

0:23:40.200 --> 0:23:42.560
<v Speaker 1>is lagging behind the others and it might not even

0:23:42.600 --> 0:23:46.159
<v Speaker 1>be detectable at first, but upon closer examination you're like, ooh,

0:23:46.200 --> 0:23:49.280
<v Speaker 1>you came in late, or you came in too early

0:23:49.359 --> 0:23:52.960
<v Speaker 1>or whatever. Or the drummer is not keeping perfect time,

0:23:53.040 --> 0:23:55.680
<v Speaker 1>whatever it may be. Maybe someone hits a wrong note,

0:23:56.000 --> 0:23:59.280
<v Speaker 1>either while playing an instrument or while singing, or maybe both.

0:23:59.760 --> 0:24:02.760
<v Speaker 1>But what it means for engineers is that they'll need

0:24:02.800 --> 0:24:06.760
<v Speaker 1>to get another take where that mistake isn't there, and

0:24:06.800 --> 0:24:09.840
<v Speaker 1>they'll probably need another take and another take, And if

0:24:09.840 --> 0:24:12.800
<v Speaker 1>you want the perfect performance, this could mean recording the

0:24:12.800 --> 0:24:16.879
<v Speaker 1>same track dozens or gosh even hundreds of times and

0:24:16.920 --> 0:24:20.840
<v Speaker 1>then slowly picking apart each recording in order to piece

0:24:20.920 --> 0:24:24.919
<v Speaker 1>together a perfect edit. And that alone is hard because

0:24:24.960 --> 0:24:29.520
<v Speaker 1>just lining up the different takes isn't always the easiest

0:24:29.520 --> 0:24:32.160
<v Speaker 1>thing to do. You don't always have a seamless point

0:24:32.200 --> 0:24:34.520
<v Speaker 1>where you could line up take one with take two.

0:24:34.600 --> 0:24:37.720
<v Speaker 1>Like again, if the band is playing at a slightly

0:24:37.840 --> 0:24:41.200
<v Speaker 1>different pace in the second take. You can't easily line

0:24:41.280 --> 0:24:43.680
<v Speaker 1>up the two different ones to you know, even if

0:24:43.800 --> 0:24:46.359
<v Speaker 1>like one had an accident and the other one didn't,

0:24:46.760 --> 0:24:49.480
<v Speaker 1>you can't necessarily put them together to make the perfect recording.

0:24:49.520 --> 0:24:55.240
<v Speaker 1>So this is a really laborious, time consuming and expensive process.

0:24:55.640 --> 0:25:02.000
<v Speaker 1>Expensive because studio time is limited, so expensive. Hildebrand's invention

0:25:02.480 --> 0:25:05.359
<v Speaker 1>would take a ton of that effort off the table,

0:25:05.400 --> 0:25:08.439
<v Speaker 1>at least for vocals, because rather than re recording a

0:25:08.520 --> 0:25:12.280
<v Speaker 1>billion times, you could get maybe just one good take,

0:25:12.800 --> 0:25:16.159
<v Speaker 1>one decent take even and then use pitch correction for

0:25:16.200 --> 0:25:18.399
<v Speaker 1>any little flubs that might have found their way in

0:25:18.520 --> 0:25:21.880
<v Speaker 1>during the recording process. So it was a huge time saver,

0:25:22.080 --> 0:25:27.680
<v Speaker 1>and time is money. So immediately studios recognized the value

0:25:28.200 --> 0:25:32.560
<v Speaker 1>of Hildebrand's product and they rushed to get in on that,

0:25:33.040 --> 0:25:37.480
<v Speaker 1>and the tool absolutely revolutionized the recording industry. Studios that

0:25:37.560 --> 0:25:41.760
<v Speaker 1>incorporated auto tune were able to work much faster than

0:25:41.800 --> 0:25:44.639
<v Speaker 1>their competitors. They were able to cycle clients in and

0:25:44.680 --> 0:25:48.080
<v Speaker 1>out of their studios more quickly. That meant getting more

0:25:48.080 --> 0:25:52.120
<v Speaker 1>work done and more money coming in, and efficiency skyrocketed.

0:25:52.200 --> 0:25:55.720
<v Speaker 1>So studios that were not on the auto tune train

0:25:56.040 --> 0:25:59.560
<v Speaker 1>soon found themselves getting out competed, and they ended up

0:25:59.600 --> 0:26:02.720
<v Speaker 1>adopting the technology as well, because it was either adopt it

0:26:03.280 --> 0:26:06.600
<v Speaker 1>or go out of business. It also wasn't enough to

0:26:06.720 --> 0:26:10.240
<v Speaker 1>just be able to change the pitch of a note.

0:26:10.440 --> 0:26:13.040
<v Speaker 1>Auto tune would also have to be able to adjust

0:26:13.080 --> 0:26:17.000
<v Speaker 1>that pitch on a sliding scale of rapidity. That is,

0:26:17.240 --> 0:26:21.399
<v Speaker 1>the sound would be unnatural if you were to correct

0:26:21.440 --> 0:26:24.439
<v Speaker 1>a note instantaneously. It would be the effect that we

0:26:24.480 --> 0:26:27.800
<v Speaker 1>associate with autotune, that robotic effect. That's if you were

0:26:27.840 --> 0:26:32.440
<v Speaker 1>to change the pitch correction super fast. You don't want

0:26:32.440 --> 0:26:36.840
<v Speaker 1>to do that if you want the tool to remain unnoticed. So,

0:26:37.080 --> 0:26:39.800
<v Speaker 1>particularly for stuff like slow ballads, you would want to have

0:26:40.000 --> 0:26:44.840
<v Speaker 1>a more gradual approach to correcting a pitch. So Hildebrand

0:26:44.880 --> 0:26:47.640
<v Speaker 1>wanted a tool that would let users determine how quickly

0:26:48.119 --> 0:26:51.000
<v Speaker 1>the note would get nudged to the correct pitch, and

0:26:51.119 --> 0:26:53.879
<v Speaker 1>the scale essentially went from zero to ten. The higher

0:26:53.920 --> 0:26:57.080
<v Speaker 1>settings would have longer adjustment times, so for a really

0:26:57.240 --> 0:27:00.320
<v Speaker 1>slow song, you might go with a nine or a

0:27:00.440 --> 0:27:02.919
<v Speaker 1>ten to let the note find its way to the

0:27:02.960 --> 0:27:06.320
<v Speaker 1>right pitch more gradually. Faster songs like rock and roll

0:27:06.440 --> 0:27:09.679
<v Speaker 1>type stuff, or rap or R and B, you

0:27:09.800 --> 0:27:13.200
<v Speaker 1>might require a lower setting, like fast rock songs, you

0:27:13.280 --> 0:27:15.439
<v Speaker 1>might need a two, three, or maybe even down to

0:27:15.520 --> 0:27:19.600
<v Speaker 1>a one. The zero setting. Really, Hildebrand just added that

0:27:19.640 --> 0:27:22.840
<v Speaker 1>for kicks, So essentially the software would immediately correct the

0:27:22.880 --> 0:27:27.639
<v Speaker 1>pitch upon detecting an incoming signal. And this sounded weird

0:27:27.920 --> 0:27:31.240
<v Speaker 1>and unnatural, and it was obvious that something was going on.

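NOTE
Editor's sketch, not part of the spoken audio: one way the zero to ten retune dial described above could behave. This is not Hildebrand's actual algorithm. The semitone snapping, the exponential glide, the A4 = 440 Hz reference, and the names nearest_semitone_hz, retune, frame_s and seconds_per_step are all assumptions made for illustration. Setting 0 jumps straight to the target pitch, which is the robotic case; higher settings glide toward it more slowly.
```python
import math
A4_HZ = 440.0  # assumed reference tuning
def nearest_semitone_hz(freq_hz):
    # Snap a frequency to the nearest equal-tempered semitone relative to A4.
    semitones = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones / 12)
def retune(detected_hz, setting, frame_s=0.01, seconds_per_step=0.02):
    # detected_hz: one detected pitch per analysis frame (hypothetical input).
    # seconds_per_step is an invented mapping from the dial to a glide time.
    current = detected_hz[0]
    corrected = []
    for freq in detected_hz:
        target = nearest_semitone_hz(freq)
        if setting == 0:
            current = target  # instant correction
        else:
            alpha = 1 - math.exp(-frame_s / (setting * seconds_per_step))
            current += alpha * (target - current)  # gradual nudge toward the target
        corrected.append(current)
    return corrected
flat_note = [430.0] * 20  # a sustained note sung a little flat of A4
print("setting 0:", retune(flat_note, 0)[:3])
print("setting 9:", [round(f, 2) for f in retune(flat_note, 9)[:3]])
```
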
0:27:31.600 --> 0:27:35.000
<v Speaker 1>So this was more for fun than an intent to

0:27:35.040 --> 0:27:37.760
<v Speaker 1>create a new tool for musicians. But it turned out

0:27:38.080 --> 0:27:41.240
<v Speaker 1>that's exactly what autotune was really destined for, to become

0:27:41.280 --> 0:27:45.440
<v Speaker 1>a tool for a process called pitch quantization. But again,

0:27:45.680 --> 0:27:48.200
<v Speaker 1>that wasn't what Hildebrand set out to do. In fact,

0:27:48.200 --> 0:27:51.479
<v Speaker 1>according to that Pitchfork article I mentioned earlier, the idea

0:27:51.560 --> 0:27:54.440
<v Speaker 1>here was to aim for perfection, at least in terms

0:27:54.520 --> 0:27:57.520
<v Speaker 1>of being in the right key and on pitch. That

0:27:57.720 --> 0:28:01.520
<v Speaker 1>imperfections would somehow interfere with an emotional connection to the music,

0:28:01.920 --> 0:28:04.199
<v Speaker 1>and you want that music to be perfect so that

0:28:04.280 --> 0:28:07.879
<v Speaker 1>you can have that emotional impact. Now, personally, I disagree

0:28:07.920 --> 0:28:11.320
<v Speaker 1>with that take. Some of my favorite recordings are with

0:28:11.520 --> 0:28:15.760
<v Speaker 1>artists who have imperfect voices. They weren't screeching or caterwauling.

0:28:15.800 --> 0:28:18.280
<v Speaker 1>It wasn't like it was unpleasant to listen to them,

0:28:18.480 --> 0:28:21.399
<v Speaker 1>but they aren't pitch perfect either, and to me, that

0:28:21.480 --> 0:28:24.520
<v Speaker 1>adds a lot of character and emotion. So as an example,

0:28:24.960 --> 0:28:28.399
<v Speaker 1>Warren Zevon, who you know did the song Werewolves

0:28:28.440 --> 0:28:31.119
<v Speaker 1>of London and tons of other stuff. I mean, prolific

0:28:31.240 --> 0:28:34.560
<v Speaker 1>musician who tragically passed away several years ago. He has

0:28:34.600 --> 0:28:37.280
<v Speaker 1>a great cover of the song Back in the High

0:28:37.280 --> 0:28:40.720
<v Speaker 1>Life Again, which is a pretty cheesy song, but

0:28:41.080 --> 0:28:44.040
<v Speaker 1>Warren Zevon's cover is really emotional. It's great, and it's

0:28:44.080 --> 0:28:47.760
<v Speaker 1>a little bit raw, and to me it resonates far

0:28:47.840 --> 0:28:51.320
<v Speaker 1>more than a note perfect performance would have. But I

0:28:51.320 --> 0:28:53.320
<v Speaker 1>do understand where Hildebrand and his team were

0:28:53.360 --> 0:28:55.160
<v Speaker 1>coming from. You know, if you have a

0:28:55.240 --> 0:28:58.560
<v Speaker 1>take from a recording session that is almost but not

0:28:58.800 --> 0:29:01.600
<v Speaker 1>quite right, maybe there was a transition where the wrong

0:29:01.640 --> 0:29:03.480
<v Speaker 1>note came out, or you know, just a moment where

0:29:03.520 --> 0:29:06.160
<v Speaker 1>it took an artist a little bit longer to slide

0:29:06.320 --> 0:29:09.200
<v Speaker 1>to find the right pitch. A tool that could smooth

0:29:09.200 --> 0:29:14.000
<v Speaker 1>things out a little while not being, you know, noticeable,

0:29:14.280 --> 0:29:17.120
<v Speaker 1>while you know, slipping under the radar. That could prevent

0:29:17.200 --> 0:29:20.760
<v Speaker 1>listeners from being distracted by something that was unintentional. But

0:29:20.800 --> 0:29:23.160
<v Speaker 1>what if you took that tool that was meant to

0:29:23.240 --> 0:29:27.680
<v Speaker 1>fix errors and used it to create unintended effects. That's

0:29:27.680 --> 0:29:29.800
<v Speaker 1>what we're going to talk about when we come back

0:29:30.000 --> 0:29:42.640
<v Speaker 1>from this quick break. So we talked about how auto

0:29:42.720 --> 0:29:48.080
<v Speaker 1>tune was meant to fix little imperfections in music recordings

0:29:48.080 --> 0:29:51.600
<v Speaker 1>and live performance. But as I mentioned, if you had

0:29:51.640 --> 0:29:55.720
<v Speaker 1>that setting set to zero so that it would instantaneously

0:29:55.760 --> 0:30:00.440
<v Speaker 1>attempt to correct pitches, then you could create an almost

0:30:00.760 --> 0:30:05.360
<v Speaker 1>robotic vocalization. So instead of shying away from the artificial

0:30:05.440 --> 0:30:07.400
<v Speaker 1>sounds that could come out if you were to use

0:30:07.400 --> 0:30:10.920
<v Speaker 1>it improperly, you leaned into it. That's what happened in

0:30:11.000 --> 0:30:14.600
<v Speaker 1>nineteen ninety eight with Cher's song Believe. Setting that dial to

0:30:14.720 --> 0:30:17.280
<v Speaker 1>zero would create the robotic like effect, which in this

0:30:17.360 --> 0:30:19.560
<v Speaker 1>case was the goal in the first place, and that

0:30:19.720 --> 0:30:24.200
<v Speaker 1>song was a smash success. I couldn't stand it, and

0:30:24.280 --> 0:30:26.400
<v Speaker 1>it was everywhere. I couldn't stand it, not because of

0:30:26.440 --> 0:30:29.120
<v Speaker 1>the auto tune. I just didn't vibe with the song.

0:30:29.600 --> 0:30:35.040
<v Speaker 1>No shade on Cher, phenomenal artist, you know, incredibly talented,

0:30:35.560 --> 0:30:39.720
<v Speaker 1>Just that song didn't jibe with me. And the interesting

0:30:39.760 --> 0:30:42.800
<v Speaker 1>thing was that this huge success not only pulled the

0:30:42.840 --> 0:30:44.880
<v Speaker 1>curtain back on a tool that was meant to correct

0:30:44.880 --> 0:30:49.040
<v Speaker 1>little mistakes and thus create a whole conversation around whether

0:30:49.120 --> 0:30:52.040
<v Speaker 1>or not artists were quote unquote cheating by using it,

0:30:52.400 --> 0:30:54.760
<v Speaker 1>but it launched a whole new way to create music

0:30:54.760 --> 0:30:57.400
<v Speaker 1>in the first place. I personally think the artist who

0:30:57.480 --> 0:31:02.000
<v Speaker 1>is most associated with auto tune is one who adopted

0:31:02.040 --> 0:31:04.880
<v Speaker 1>the technology and made it an intrinsic part of his brand.

0:31:05.280 --> 0:31:07.600
<v Speaker 1>That would be T-Pain. He came to the party

0:31:07.640 --> 0:31:10.320
<v Speaker 1>a little bit late. He became interested in autotune around

0:31:10.320 --> 0:31:13.760
<v Speaker 1>two thousand and four, and he wasn't looking for something

0:31:13.760 --> 0:31:17.000
<v Speaker 1>to help compensate for his singing ability, because he actually

0:31:17.000 --> 0:31:19.920
<v Speaker 1>sings very well. But he liked the thought of the

0:31:19.920 --> 0:31:23.719
<v Speaker 1>technology that would set him apart from other artists, and

0:31:23.760 --> 0:31:27.360
<v Speaker 1>he could forge a vocal identity using this tool to

0:31:27.400 --> 0:31:30.000
<v Speaker 1>create a sound that no one else was really embracing

0:31:30.040 --> 0:31:34.920
<v Speaker 1>at that point. So he jumped wholeheartedly into autotune, and

0:31:34.960 --> 0:31:38.840
<v Speaker 1>he made liberal use of the technology and achieved tremendous

0:31:38.880 --> 0:31:43.360
<v Speaker 1>success along the way, selling like Platinum records by using

0:31:43.400 --> 0:31:46.600
<v Speaker 1>this technology. His love of the software led to an

0:31:46.600 --> 0:31:50.360
<v Speaker 1>official partnership with Hildebrand's company for a few years, and

0:31:50.440 --> 0:31:53.640
<v Speaker 1>Antares licensed the technology to T-Pain for an app

0:31:53.720 --> 0:31:56.600
<v Speaker 1>called I Am T-Pain, which you could use to

0:31:56.960 --> 0:32:01.560
<v Speaker 1>do autotune right there on your smartphone. Three dollars initially,

0:32:01.720 --> 0:32:04.360
<v Speaker 1>and it was downloaded by millions of users. That generated

0:32:04.680 --> 0:32:07.040
<v Speaker 1>quite a lot of revenue just on its own. Now,

0:32:07.080 --> 0:32:11.480
<v Speaker 1>eventually T-Pain and Antares parted ways, and T-Pain

0:32:11.640 --> 0:32:15.640
<v Speaker 1>ultimately partnered with a different pitch correction company called iZotope.

0:32:15.880 --> 0:32:20.360
<v Speaker 1>The T-Pain story also led to a lawsuit against Antares,

0:32:20.560 --> 0:32:24.480
<v Speaker 1>and Antares filed a countersuit against T-Pain, and ultimately

0:32:24.520 --> 0:32:27.240
<v Speaker 1>the whole thing was settled out of court and everyone

0:32:27.360 --> 0:32:31.240
<v Speaker 1>signed an NDA. So I have no details about, you know,

0:32:31.320 --> 0:32:34.880
<v Speaker 1>how that shook out in the end, but it was

0:32:34.920 --> 0:32:37.040
<v Speaker 1>one of those things where it was kind of a

0:32:37.160 --> 0:32:43.400
<v Speaker 1>smudge on the Antares name. At least it was awkward, right?

0:32:43.600 --> 0:32:47.440
<v Speaker 1>But a much larger threat to Hildebrand's company was Apple.

0:32:47.840 --> 0:32:53.000
<v Speaker 1>Apple had purchased a German company called Emagic. Emagic also

0:32:53.480 --> 0:32:56.840
<v Speaker 1>had a pitch correction tool. In fact, it was a

0:32:56.840 --> 0:33:02.880
<v Speaker 1>pitch correction tool that, according to Hildebrand, essentially copied autotune technology.

0:33:03.160 --> 0:33:07.600
<v Speaker 1>This was possible because Antares had failed to protect its

0:33:07.760 --> 0:33:13.960
<v Speaker 1>German patent properly, and so Emagic was able to appropriate

0:33:14.200 --> 0:33:18.840
<v Speaker 1>that technology or copy that technology without fear of legal recourse.

0:33:19.440 --> 0:33:23.760
<v Speaker 1>So then Apple acquires Emagic, which means Apple is then

0:33:23.800 --> 0:33:28.280
<v Speaker 1>able to incorporate Emagic's technology into their own products, including

0:33:28.480 --> 0:33:32.880
<v Speaker 1>their own sound editing software. And this meant that autotune

0:33:32.920 --> 0:33:39.320
<v Speaker 1>effectively got incorporated into Apple software without having to license

0:33:39.360 --> 0:33:43.560
<v Speaker 1>the technology from Antares, because again they got it by

0:33:43.560 --> 0:33:47.360
<v Speaker 1>acquiring this German company. Now, Antares could technically have still

0:33:47.440 --> 0:33:50.160
<v Speaker 1>sued Apple. There's no guarantee that they would have won,

0:33:50.560 --> 0:33:54.240
<v Speaker 1>but they could have sued them. However, Hildebrand explained that

0:33:54.320 --> 0:33:59.920
<v Speaker 1>they didn't really have that option because Apple has enormously

0:34:00.080 --> 0:34:05.400
<v Speaker 1>deep pockets. Apple is just an incredibly rich company,

0:34:05.840 --> 0:34:10.760
<v Speaker 1>and Apple could easily just outwait Antares in the legal system,

0:34:10.760 --> 0:34:14.840
<v Speaker 1>while Antares would drain its resources trying to sue Apple.

0:34:15.120 --> 0:34:17.000
<v Speaker 1>So even if Antares was in the right of it,

0:34:17.200 --> 0:34:20.120
<v Speaker 1>even if they would have won a judgment against Apple,

0:34:20.440 --> 0:34:23.000
<v Speaker 1>the chance was that Antares would go out of business

0:34:23.080 --> 0:34:25.360
<v Speaker 1>just trying to pay for all the legal fees for

0:34:25.560 --> 0:34:29.719
<v Speaker 1>the whole battle in the first place. So ultimately, Antares

0:34:29.760 --> 0:34:32.000
<v Speaker 1>didn't go after Apple. It just would have been

0:34:32.000 --> 0:34:37.080
<v Speaker 1>a death sentence. Culturally, autotune began to face resistance in

0:34:37.120 --> 0:34:42.239
<v Speaker 1>the late two thousands. Some artists expressed disdain for the technology,

0:34:42.280 --> 0:34:45.160
<v Speaker 1>going so far as to say it ruined Western music.

0:34:45.640 --> 0:34:49.840
<v Speaker 1>This was partly due to an oversaturation problem. The success

0:34:49.840 --> 0:34:53.200
<v Speaker 1>of T-Pain, as well as the earlier instances of autotune,

0:34:53.600 --> 0:34:58.080
<v Speaker 1>inspired countless others to embrace the technology while not necessarily

0:34:58.120 --> 0:35:01.560
<v Speaker 1>doing very much else to differentiate themselves from other artists.

0:35:01.640 --> 0:35:03.799
<v Speaker 1>In other words, they were kind of leaning on it

0:35:03.840 --> 0:35:06.080
<v Speaker 1>as a crutch or a gimmick. So there was a

0:35:06.160 --> 0:35:09.600
<v Speaker 1>glut of auto tune robotic voiced vocals and music in

0:35:09.640 --> 0:35:11.920
<v Speaker 1>the early to mid two thousands, and by the late

0:35:11.960 --> 0:35:14.600
<v Speaker 1>two thousands some folks were absolutely fed up with this

0:35:14.719 --> 0:35:17.360
<v Speaker 1>and there was a backlash. It actually kind of reminds

0:35:17.360 --> 0:35:20.719
<v Speaker 1>me about how people began to turn against disco in

0:35:20.800 --> 0:35:23.640
<v Speaker 1>the nineteen seventies, and that in some ways the punk

0:35:23.719 --> 0:35:28.400
<v Speaker 1>rock movement was partly a reaction to disco or a

0:35:28.480 --> 0:35:31.799
<v Speaker 1>rejection of disco. I would only say partly because punk

0:35:31.880 --> 0:35:34.000
<v Speaker 1>rock also has its roots in glam rock, and I

0:35:34.000 --> 0:35:37.400
<v Speaker 1>think glam rock also kind of helped inspire disco. So

0:35:37.640 --> 0:35:40.759
<v Speaker 1>it's a complicated set of relationships, as you might say

0:35:40.840 --> 0:35:44.400
<v Speaker 1>on Facebook. But bands like Death Cab for Cutie actually

0:35:44.600 --> 0:35:48.440
<v Speaker 1>actively spoke out against autotune. So again, some artists were

0:35:48.560 --> 0:35:52.759
<v Speaker 1>arguing that autotune was being used by people to compensate

0:35:52.800 --> 0:35:55.400
<v Speaker 1>for a lack of ability, So they're kind of casting

0:35:55.520 --> 0:35:57.960
<v Speaker 1>shade on fellow artists saying, well, yeah, they have to

0:35:58.040 --> 0:36:00.959
<v Speaker 1>use autotune because they can't sing, or others would say

0:36:00.960 --> 0:36:04.360
<v Speaker 1>like it was making music less genuine and sincere, like

0:36:04.440 --> 0:36:08.560
<v Speaker 1>less human because it was going through this digital processing process.

0:36:08.960 --> 0:36:12.920
<v Speaker 1>Jay Z famously released a song titled Death of Autotune

0:36:12.920 --> 0:36:14.839
<v Speaker 1>in two thousand and nine, the same year when our

0:36:14.960 --> 0:36:18.239
<v Speaker 1>original Tech Stuff episode about autotune came out. As you

0:36:18.320 --> 0:36:21.640
<v Speaker 1>might imagine, Jay Z's song had some pretty strong opinions

0:36:21.719 --> 0:36:25.279
<v Speaker 1>about the technology inside of it. It resonated enough to

0:36:25.320 --> 0:36:28.120
<v Speaker 1>win him a Grammy for it, so other people agreed.

0:36:28.440 --> 0:36:31.759
<v Speaker 1>But despite all that backlash, autotune continues to be a thing.

0:36:32.040 --> 0:36:35.760
<v Speaker 1>It did not, in fact die. It's been incorporated into

0:36:36.040 --> 0:36:40.680
<v Speaker 1>software and digital audio workstations. It and similar pitch manipulation

0:36:40.760 --> 0:36:44.640
<v Speaker 1>technologies are often found in everything from professional audio engineering

0:36:44.680 --> 0:36:48.279
<v Speaker 1>software suites to free programs that you can download online. So,

0:36:48.360 --> 0:36:52.759
<v Speaker 1>for example, I sometimes use a program called Audacity, and

0:36:53.200 --> 0:36:56.120
<v Speaker 1>Audacity has an option under its effects where I can

0:36:56.280 --> 0:36:59.760
<v Speaker 1>manually adjust the pitch of a recorded piece of audio.

0:37:00.080 --> 0:37:02.480
<v Speaker 1>I can set what the pitch should be. Now that's

0:37:02.480 --> 0:37:06.160
<v Speaker 1>not autotune, right, because by definition I'm not using an

0:37:06.239 --> 0:37:10.520
<v Speaker 1>auto feature. I'm manually changing the pitch, but it's using

0:37:10.600 --> 0:37:14.000
<v Speaker 1>similar approaches to get an effect. I've actually even made

0:37:14.040 --> 0:37:16.200
<v Speaker 1>use of that tool while I was editing my friend

0:37:16.239 --> 0:37:19.680
<v Speaker 1>Shay's podcast, Kadi Womple with the Shadow People. Shay does

0:37:19.760 --> 0:37:22.680
<v Speaker 1>nearly all the voices on that show. I've actually voiced

0:37:22.680 --> 0:37:25.960
<v Speaker 1>two characters on that show. So if you're eager to

0:37:26.000 --> 0:37:30.640
<v Speaker 1>hear other output from me, that's not a technology podcast,

0:37:30.760 --> 0:37:32.920
<v Speaker 1>go listen to Kadi Womple with the Shadow People. I

0:37:33.000 --> 0:37:35.480
<v Speaker 1>voice a couple of characters on that, but I edit

0:37:35.560 --> 0:37:38.839
<v Speaker 1>the show, so I use pitch adjustment tools in order

0:37:38.880 --> 0:37:42.360
<v Speaker 1>to make some of Shay's voices sound like different people.

0:37:42.640 --> 0:37:45.440
<v Speaker 1>So it's still her doing the voice, but I digitally

0:37:45.560 --> 0:37:50.800
<v Speaker 1>manipulate the voice to give certain characters their own distinct sound.

0:37:51.239 --> 0:37:53.960
<v Speaker 1>It's pretty neat stuff. I have no idea what it

0:37:53.960 --> 0:37:56.800
<v Speaker 1>would sound like if I actually used an auto tune tool.

0:37:57.200 --> 0:38:00.279
<v Speaker 1>That probably would sound very different, But I have a

0:38:00.360 --> 0:38:04.719
<v Speaker 1>lot of fun playing with these pitch manipulation tools. Now,

0:38:04.760 --> 0:38:09.000
<v Speaker 1>to get more into the cultural and social impact of autotune,

0:38:09.200 --> 0:38:13.520
<v Speaker 1>I highly recommend that article in Pitchfork by Simon Reynolds. Again,

0:38:13.560 --> 0:38:17.680
<v Speaker 1>that's titled How Autotune Revolutionized the Sound of Popular Music.

0:38:17.840 --> 0:38:20.680
<v Speaker 1>It's a long form article, it's well worth your time

0:38:20.719 --> 0:38:23.360
<v Speaker 1>to read it, as is Zachary Crockett's article that I

0:38:23.400 --> 0:38:26.680
<v Speaker 1>mentioned earlier. Both of those are great articles about autotune

0:38:26.719 --> 0:38:30.200
<v Speaker 1>and not just the technology, but its impact on music

0:38:30.239 --> 0:38:33.960
<v Speaker 1>in general and society and culture as well. And Reynolds

0:38:33.960 --> 0:38:36.480
<v Speaker 1>goes into much deeper detail about how the technology has

0:38:36.520 --> 0:38:39.200
<v Speaker 1>had an impact on the recording industry and the backlash

0:38:39.239 --> 0:38:41.960
<v Speaker 1>that came out as a result of that, as well

0:38:41.960 --> 0:38:45.799
<v Speaker 1>as sort of a counter movement against autotune. So check

0:38:45.840 --> 0:38:48.759
<v Speaker 1>those out. They are well worth your time. And I

0:38:48.800 --> 0:38:51.719
<v Speaker 1>could go on, but really I feel like those articles

0:38:52.000 --> 0:38:55.480
<v Speaker 1>do a much better job than I would of describing

0:38:55.520 --> 0:38:57.400
<v Speaker 1>all of that, So check those out when you have

0:38:57.480 --> 0:39:00.239
<v Speaker 1>some time. That's it for today. I hope all of

0:39:00.239 --> 0:39:02.799
<v Speaker 1>you out there are doing well, and I will talk

0:39:02.800 --> 0:39:13.560
<v Speaker 1>to you again really soon. Tech Stuff is an iHeartRadio production.

0:39:13.840 --> 0:39:18.879
<v Speaker 1>For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts,

0:39:19.000 --> 0:39:21.000
<v Speaker 1>or wherever you listen to your favorite shows.