WEBVTT - The Machine Speaks

0:00:03.040 --> 0:00:07.000
<v Speaker 1>Welcome to Stuff to Blow Your Mind, a production of iHeartRadio.

0:00:12.800 --> 0:00:14.520
<v Speaker 2>Hey, welcome to Stuff to Blow Your Mind.

0:00:14.600 --> 0:00:18.000
<v Speaker 3>My name is Robert Lamb and I am Joe McCormick,

0:00:18.040 --> 0:00:21.480
<v Speaker 3>and today we're going to be talking about some early

0:00:22.120 --> 0:00:26.000
<v Speaker 3>voice synthesis machines. Rob I actually got interested in this

0:00:26.120 --> 0:00:30.560
<v Speaker 3>topic because last week, when we were watching the Weird

0:00:30.560 --> 0:00:33.680
<v Speaker 3>House Cinema movie The Black Hole, I was thinking about

0:00:33.840 --> 0:00:36.879
<v Speaker 3>Roddy McDowall's voice when he's doing a voice for the

0:00:36.960 --> 0:00:40.280
<v Speaker 3>robot character who shares a lot of proverbs with the

0:00:40.360 --> 0:00:44.360
<v Speaker 3>human characters. And I kept listening to his line delivery

0:00:44.400 --> 0:00:47.159
<v Speaker 3>and I couldn't decide if he was trying to do

0:00:47.320 --> 0:00:50.360
<v Speaker 3>quote robot voice or not. He seemed to kind of

0:00:50.360 --> 0:00:51.559
<v Speaker 3>dip in and out of it. You know what I

0:00:51.600 --> 0:00:54.200
<v Speaker 3>mean when I say robot voice, where a character's playing

0:00:54.240 --> 0:00:57.640
<v Speaker 3>a robot and they say things like this?

0:00:57.440 --> 0:00:59.600
<v Speaker 2>Well, of course I know what you are talking about, Joe.

0:01:00.360 --> 0:01:03.760
<v Speaker 3>I got kind of interested in the history of robot voice.

0:01:03.840 --> 0:01:06.440
<v Speaker 3>I was like, where does that come from? And I

0:01:06.480 --> 0:01:08.160
<v Speaker 3>was digging around a little. I'm sure there is a

0:01:08.160 --> 0:01:10.800
<v Speaker 3>good answer on that, but I don't know. My short

0:01:10.800 --> 0:01:14.120
<v Speaker 3>search didn't really turn up anything interesting, but it did

0:01:14.319 --> 0:01:18.160
<v Speaker 3>lead me indirectly to what we're talking about today, which is,

0:01:18.280 --> 0:01:23.280
<v Speaker 3>of course, we have the voice synthesis systems that are

0:01:23.400 --> 0:01:26.080
<v Speaker 3>largely digital today. Before that you had a lot of

0:01:26.520 --> 0:01:32.319
<v Speaker 3>electrical and electromechanical systems for synthesizing human voices. But

0:01:32.400 --> 0:01:35.039
<v Speaker 3>actually there is an even earlier generation, which are the

0:01:35.200 --> 0:01:40.600
<v Speaker 3>purely mechanical voice synthesizers before electricity even came into the picture.

0:01:40.959 --> 0:01:44.440
<v Speaker 3>And that is what really stole my heart, especially one

0:01:44.520 --> 0:01:46.760
<v Speaker 3>particular machine of this type that I'm going to talk

0:01:46.760 --> 0:01:48.560
<v Speaker 3>about in the second half of this episode, I think.

0:01:48.560 --> 0:01:51.720
<v Speaker 2>Yeah, this is a fascinating topic, in part

0:01:51.800 --> 0:01:54.280
<v Speaker 2>because look at it. Look at where we are now, right,

0:01:54.320 --> 0:01:57.360
<v Speaker 2>it's easy today in our Internet age for just the

0:01:57.400 --> 0:02:01.120
<v Speaker 2>average Internet user to engage with their as chat bots

0:02:01.160 --> 0:02:05.920
<v Speaker 2>generative AI, text-to-speech, and so forth. And

0:02:05.960 --> 0:02:09.640
<v Speaker 2>you know, so we're able to interact with an artifact,

0:02:09.680 --> 0:02:12.400
<v Speaker 2>a thing that reflects human will, that has been designed

0:02:12.440 --> 0:02:15.440
<v Speaker 2>to do key and telling things that have long been

0:02:15.480 --> 0:02:19.680
<v Speaker 2>the hallmarks of human activity, artistic generation, creative writing and

0:02:19.840 --> 0:02:24.600
<v Speaker 2>conversation or especially speech. And of course it's you know,

0:02:24.600 --> 0:02:27.520
<v Speaker 2>it's easy nowadays to do that, right, to transform into

0:02:27.680 --> 0:02:31.640
<v Speaker 2>audible or even video content what is either written by

0:02:31.720 --> 0:02:34.520
<v Speaker 2>a human or created with some sort of a chat

0:02:34.720 --> 0:02:39.160
<v Speaker 2>bot machine, and the results may be amusing, they

0:02:39.480 --> 0:02:42.280
<v Speaker 2>may be disastrous. But we're in this age where the

0:02:42.320 --> 0:02:45.240
<v Speaker 2>idea of the machine speaking is not in and of

0:02:45.280 --> 0:02:49.240
<v Speaker 2>itself groundbreaking, or at least if it is groundbreaking, or

0:02:49.240 --> 0:02:53.600
<v Speaker 2>if it's amazing, it's a lower level

0:02:53.600 --> 0:02:55.720
<v Speaker 2>of amazement compared to previous ages.

0:02:56.320 --> 0:02:59.040
<v Speaker 3>Well, as you say, it's very integrated into modern technology.

0:02:59.080 --> 0:03:02.040
<v Speaker 3>So there's, you know, Siri and Alexa, all these like

0:03:02.240 --> 0:03:06.080
<v Speaker 3>home devices that speak, GPS devices for the car, you know,

0:03:06.160 --> 0:03:09.040
<v Speaker 3>that speak to you, but almost all of them are

0:03:09.080 --> 0:03:11.880
<v Speaker 3>still the subject of amusement if you actually pay attention

0:03:11.919 --> 0:03:14.880
<v Speaker 3>to what the voice sounds like, you know, like reading

0:03:14.960 --> 0:03:17.400
<v Speaker 3>emotions into the voice that's telling you what to do

0:03:17.440 --> 0:03:20.480
<v Speaker 3>as you're driving. That always makes me laugh because it

0:03:20.520 --> 0:03:21.960
<v Speaker 3>always seems a little bit annoyed.

0:03:22.680 --> 0:03:25.919
<v Speaker 2>Yeah, yeah, what's Siri's whole deal, that sort of thing, right?

0:03:27.760 --> 0:03:30.280
<v Speaker 2>You know. The other interesting angle on all this is

0:03:30.320 --> 0:03:35.560
<v Speaker 2>that our modern technological advancements here, or even some of

0:03:35.600 --> 0:03:38.720
<v Speaker 2>the historic technological advancements like they are kind of the

0:03:38.760 --> 0:03:42.960
<v Speaker 2>echo of a more ancient longing for this sort of thing.

0:03:43.720 --> 0:03:47.880
<v Speaker 2>It connects to something that's just fascinated us for a

0:03:47.920 --> 0:03:52.440
<v Speaker 2>long time, the idea generally of nonhuman entities engaging

0:03:52.480 --> 0:03:55.040
<v Speaker 2>in speech, and you could go absolutely wild

0:03:55.120 --> 0:03:59.280
<v Speaker 2>chasing down the various divisions of this right, the various myths, legends,

0:03:59.280 --> 0:04:04.920
<v Speaker 2>and traditions concerning the speech of animals, plants, inorganic materials,

0:04:04.960 --> 0:04:10.800
<v Speaker 2>supernatural entities. You know, voices seemingly internal but also external

0:04:11.000 --> 0:04:12.520
<v Speaker 2>to our individual experience.

0:04:12.960 --> 0:04:15.120
<v Speaker 3>Though I would say there is an interesting thing about

0:04:15.360 --> 0:04:19.680
<v Speaker 3>machines or human or automata or human artifacts in general,

0:04:20.000 --> 0:04:24.520
<v Speaker 3>when compared to imagining an animal speak or any other

0:04:24.839 --> 0:04:27.680
<v Speaker 3>usually non-speaking thing starting to speak, which is, if

0:04:27.680 --> 0:04:30.360
<v Speaker 3>you're talking about a machine that does it, that means

0:04:30.520 --> 0:04:33.440
<v Speaker 3>somebody has to make that machine, and somebody has to

0:04:33.480 --> 0:04:36.120
<v Speaker 3>work that machine. And it kind of reminds me of

0:04:36.240 --> 0:04:39.520
<v Speaker 3>the idea of grammar in language. You know. The interesting

0:04:39.560 --> 0:04:42.640
<v Speaker 3>thing about grammar is that when we use language, we

0:04:42.680 --> 0:04:45.880
<v Speaker 3>all use grammar, so we have an intuitive grasp of

0:04:45.920 --> 0:04:49.320
<v Speaker 3>the rules of grammar. But without serious study, people can't

0:04:49.320 --> 0:04:51.560
<v Speaker 3>actually tell you what those rules are. And so, like

0:04:51.600 --> 0:04:53.599
<v Speaker 3>you know, that had to be in a sense a

0:04:53.880 --> 0:04:57.479
<v Speaker 3>science to back engineer the rules of grammar that we

0:04:57.600 --> 0:05:01.280
<v Speaker 3>use intuitively to like make them systematic and you know,

0:05:01.360 --> 0:05:04.599
<v Speaker 3>actually discover what those rules are. The same thing could

0:05:04.600 --> 0:05:08.480
<v Speaker 3>be said about the phonetic rules that produce the intelligible speech.

0:05:08.839 --> 0:05:11.400
<v Speaker 3>We can all do it if we can speak, but

0:05:11.680 --> 0:05:16.720
<v Speaker 3>we don't necessarily understand what the individual physical properties of

0:05:16.800 --> 0:05:19.800
<v Speaker 3>a word are, and so we wouldn't necessarily know how

0:05:19.839 --> 0:05:22.359
<v Speaker 3>to make that same word come out of a machine.

0:05:23.160 --> 0:05:25.120
<v Speaker 2>Yeah, there are all these things that you have to

0:05:25.160 --> 0:05:28.800
<v Speaker 2>deconstruct before you can attempt to reproduce it artificially. And

0:05:28.839 --> 0:05:32.000
<v Speaker 2>we see that time and time again in robotics,

0:05:32.000 --> 0:05:35.560
<v Speaker 2>for example. You know, things that we take for granted

0:05:35.880 --> 0:05:39.600
<v Speaker 2>concerning human movement, just about anything else you can imagine,

0:05:39.640 --> 0:05:42.200
<v Speaker 2>it becomes so much more difficult to try and reproduce

0:05:42.200 --> 0:05:45.039
<v Speaker 2>that you've got to understand what it actually is on

0:05:45.080 --> 0:05:46.599
<v Speaker 2>an entirely new level. First.

0:05:47.279 --> 0:05:50.279
<v Speaker 3>Now, I am to understand that before anybody actually made

0:05:50.320 --> 0:05:53.719
<v Speaker 3>a machine that could approximate or synthesize a human voice

0:05:53.760 --> 0:05:57.320
<v Speaker 3>and produce intelligible speech, people were thinking about this as

0:05:57.320 --> 0:05:58.080
<v Speaker 3>a concept.

0:05:58.960 --> 0:06:01.920
<v Speaker 2>Yeah, yeah, this is not surprising, you know, this is uh,

0:06:01.960 --> 0:06:05.520
<v Speaker 2>this is kind of the meat of science fiction.

0:06:05.680 --> 0:06:07.400
<v Speaker 2>Right? Before we can do it, we dream of it.

0:06:07.960 --> 0:06:11.680
<v Speaker 2>One way or another, no matter what our exact grasp

0:06:11.720 --> 0:06:14.600
<v Speaker 2>of science happens to be. It always reminds me of

0:06:14.640 --> 0:06:19.840
<v Speaker 2>that line in William Gibson's Neuromancer where the character has

0:06:19.880 --> 0:06:22.880
<v Speaker 2>made a deal, a pact with a powerful AI, and

0:06:22.920 --> 0:06:24.960
<v Speaker 2>it's pointed out like, this is the sort of thing

0:06:25.040 --> 0:06:28.520
<v Speaker 2>that in you know, centuries ago, people only dreamed of

0:06:28.640 --> 0:06:31.520
<v Speaker 2>making a deal with a devil, and now we've made

0:06:31.560 --> 0:06:37.840
<v Speaker 2>it possible through our ingenuity and invention. Congratulations, yeah, so

0:06:37.839 --> 0:06:41.680
<v Speaker 2>so yeah. Narrowing down here into generally the realm of

0:06:43.360 --> 0:06:49.919
<v Speaker 2>alleged human creations that through at least partial technology, but

0:06:50.080 --> 0:06:53.520
<v Speaker 2>also sometimes wizardry and alchemy and other things that are

0:06:53.600 --> 0:06:56.520
<v Speaker 2>kind of like you know, bunched in there together with

0:06:56.520 --> 0:07:00.880
<v Speaker 2>with actual technology to create some sort of a device

0:07:01.000 --> 0:07:03.440
<v Speaker 2>capable of speech. And then there are some also some

0:07:03.520 --> 0:07:05.719
<v Speaker 2>related things that are tied in there as well, and

0:07:05.800 --> 0:07:08.040
<v Speaker 2>a lot of it comes down to the idea of

0:07:08.440 --> 0:07:12.440
<v Speaker 2>a head, an artificial head that speaks.

0:07:13.080 --> 0:07:16.400
<v Speaker 3>I found something so loaded and revealing about that as

0:07:16.400 --> 0:07:19.360
<v Speaker 3>a fact: in the history of these machines, so many of

0:07:19.400 --> 0:07:23.600
<v Speaker 3>them had, whether real or imagined, these machines, so many

0:07:23.640 --> 0:07:27.480
<v Speaker 3>of them early on had heads or faces, so like

0:07:27.520 --> 0:07:30.000
<v Speaker 3>it wouldn't just be a speaker like you would have today.

0:07:30.080 --> 0:07:33.080
<v Speaker 3>That's you know, it's just a mechanical device for making

0:07:33.120 --> 0:07:36.120
<v Speaker 3>the sound. It's like that the presence of a head

0:07:36.240 --> 0:07:39.840
<v Speaker 3>or a face was considered important or at least desirable.

0:07:40.600 --> 0:07:42.440
<v Speaker 2>Yeah, and I wondered to what extent part of it

0:07:42.480 --> 0:07:46.560
<v Speaker 2>is just an echo of these earlier ideas. So going

0:07:46.720 --> 0:07:50.280
<v Speaker 2>to run through a few of these here. One of

0:07:50.320 --> 0:07:53.920
<v Speaker 2>the most famous, mainly from a literary tradition, as we'll

0:07:54.080 --> 0:07:57.560
<v Speaker 2>discuss here, is the idea of the brazen head. And

0:07:58.000 --> 0:08:00.080
<v Speaker 2>ultimately I guess there's more than one brazen head. We

0:08:00.120 --> 0:08:05.320
<v Speaker 2>can say brazen heads: artificial heads that could speak.

0:08:05.360 --> 0:08:08.960
<v Speaker 2>Basically, a lot of these stories concern thirteenth-century

0:08:09.000 --> 0:08:13.160
<v Speaker 2>English philosopher and Franciscan friar Roger Bacon, who's come up

0:08:13.160 --> 0:08:16.920
<v Speaker 2>on the show before, though this particular version of the

0:08:16.920 --> 0:08:20.000
<v Speaker 2>story doesn't seem to emerge until the sixteenth century, and

0:08:20.080 --> 0:08:23.240
<v Speaker 2>it does so within the works of contemporary drama.

0:08:23.760 --> 0:08:25.960
<v Speaker 3>I think we talked about Roger Bacon at length in

0:08:26.000 --> 0:08:28.960
<v Speaker 3>an episode we did about the invention of fireworks, which

0:08:29.560 --> 0:08:31.920
<v Speaker 3>may come back and feature again in the feed soon.

0:08:32.600 --> 0:08:34.559
<v Speaker 2>Yeah. Yeah, I believe you're right, Yeah, I think Bacon

0:08:34.559 --> 0:08:37.800
<v Speaker 2>did come up in that he had a reputation as

0:08:37.840 --> 0:08:41.559
<v Speaker 2>not only a very learned man in both natural philosophy

0:08:41.559 --> 0:08:44.920
<v Speaker 2>and theology, and, I should drive home, definitely existed. I

0:08:44.920 --> 0:08:47.160
<v Speaker 2>don't think there's any doubt that there was a Roger Bacon.

0:08:47.640 --> 0:08:49.880
<v Speaker 2>But then there are all these other stories that he

0:08:50.000 --> 0:08:54.439
<v Speaker 2>was also potentially a wizard who was capable of producing

0:08:54.559 --> 0:09:00.480
<v Speaker 2>fabulous automata, either through amazing feats of clockwork ingenuity that

0:09:00.520 --> 0:09:03.760
<v Speaker 2>I think many would say was ultimately, you know,

0:09:04.240 --> 0:09:08.840
<v Speaker 2>impossible during his time period, or failing that he was

0:09:08.880 --> 0:09:11.920
<v Speaker 2>into alchemy and of course dark dank necromancy.

0:09:12.520 --> 0:09:14.600
<v Speaker 3>I think the way I conceive of Roger Bacon is

0:09:14.600 --> 0:09:17.920
<v Speaker 3>that he of course was a real figure. He was

0:09:18.679 --> 0:09:23.080
<v Speaker 3>of great intellectual note and significance, but much about his

0:09:23.200 --> 0:09:26.880
<v Speaker 3>sort of general reputation is kind of legendary, if that

0:09:26.880 --> 0:09:28.400
<v Speaker 3>makes sense. I mean, there are many things we know

0:09:28.480 --> 0:09:32.160
<v Speaker 3>about him that are true, but there's also just sort

0:09:32.160 --> 0:09:34.720
<v Speaker 3>of an aura or a vibe about him that is

0:09:34.760 --> 0:09:36.160
<v Speaker 3>not really based in reality.

0:09:36.840 --> 0:09:39.920
<v Speaker 2>Yeah, I mean, he becomes a character in literature, especially

0:09:39.960 --> 0:09:42.800
<v Speaker 2>in these accounts, so you can sort of look at

0:09:42.800 --> 0:09:46.199
<v Speaker 2>the different phases, like historic individual, ideas and, you know,

0:09:46.760 --> 0:09:51.040
<v Speaker 2>misunderstandings of said real-life individual, and then eventually that

0:09:51.080 --> 0:09:53.920
<v Speaker 2>echoes into the fictional version of the person.

0:09:53.920 --> 0:09:55.960
<v Speaker 3>Which that's the more like wizard version.

0:09:56.720 --> 0:09:59.400
<v Speaker 2>Yeah, yeah, And so there are a few different examples

0:09:59.400 --> 0:10:01.760
<v Speaker 2>of this. This is like a popular motif for a while.

0:10:02.160 --> 0:10:06.040
<v Speaker 2>There's a sixteenth-century prose romance titled The Famous History

0:10:06.040 --> 0:10:08.520
<v Speaker 2>of Friar Bacon, and it tells of Bacon trying to

0:10:08.960 --> 0:10:11.480
<v Speaker 2>give a replica of a human head speech and having

0:10:11.520 --> 0:10:15.360
<v Speaker 2>to call in the devil for help. Cool. Other versions

0:10:15.360 --> 0:10:17.520
<v Speaker 2>of this tale describe it as an artificial head given

0:10:17.600 --> 0:10:21.480
<v Speaker 2>life by demons, which was capable of spontaneous speech and

0:10:21.559 --> 0:10:23.199
<v Speaker 2>of course telling the future.

0:10:24.600 --> 0:10:26.400
<v Speaker 3>I mean, what else would you tell?

0:10:26.480 --> 0:10:31.400
<v Speaker 2>Right, right. Robert Greene's sixteen thirty play Friar Bacon and

0:10:31.440 --> 0:10:36.920
<v Speaker 2>Friar Bungay mentions this several times, citing, quote, "Bacon's necromantic

0:10:37.000 --> 0:10:41.880
<v Speaker 2>skill" and heads of brass that, quote, "can utter any voice."

0:10:42.200 --> 0:10:44.560
<v Speaker 2>The idea that's explored in both of these works is that

0:10:44.600 --> 0:10:47.520
<v Speaker 2>Bacon wished to build a wall of brass around Britain

0:10:47.800 --> 0:10:51.280
<v Speaker 2>with the help of the Brazen Head. He fails and

0:10:51.320 --> 0:10:52.199
<v Speaker 2>the head explodes.

0:10:54.640 --> 0:10:56.480
<v Speaker 3>Why have I never heard this cited as, like,

0:10:56.520 --> 0:10:58.199
<v Speaker 3>an early science fiction tale?

0:10:58.679 --> 0:11:00.720
<v Speaker 2>I don't know. I'm probably not doing due diligence on

0:11:00.840 --> 0:11:03.640
<v Speaker 2>exactly what happens and everything. It's like saying, well,

0:11:03.640 --> 0:11:06.319
<v Speaker 2>in Star Wars, the bad guys make one planet to

0:11:06.320 --> 0:11:08.520
<v Speaker 2>blow up another planet, and then the planet they made

0:11:08.559 --> 0:11:11.360
<v Speaker 2>blows up. You know, That's that's really skipping over a

0:11:11.360 --> 0:11:13.640
<v Speaker 2>lot of the nuance. And so I think there's there's

0:11:13.679 --> 0:11:15.839
<v Speaker 2>inevitably more nuance here, but I just I didn't get

0:11:15.840 --> 0:11:19.840
<v Speaker 2>into it. Okay, so this idea of the satanic brass head

0:11:19.880 --> 0:11:23.040
<v Speaker 2>of Roger Bacon persists despite the fact that there's no

0:11:23.080 --> 0:11:27.040
<v Speaker 2>indication that anything like this even created purely through technology

0:11:27.120 --> 0:11:30.760
<v Speaker 2>and not satanic wizardry was part of Bacon's world. He

0:11:30.840 --> 0:11:35.320
<v Speaker 2>was interested in optics and certainly various instruments scientific instruments

0:11:35.320 --> 0:11:37.240
<v Speaker 2>of brass of the day, but there's no indication that

0:11:37.280 --> 0:11:39.679
<v Speaker 2>he ever built an artificial head and tried to get

0:11:39.720 --> 0:11:40.320
<v Speaker 2>it to speak.

0:11:40.640 --> 0:11:42.880
<v Speaker 3>Okay, so this is part of the wizard aura, not

0:11:43.000 --> 0:11:44.920
<v Speaker 3>part of his biography.

0:11:44.840 --> 0:11:47.600
<v Speaker 2>Right, though, you know, we have to drive home, too:

0:11:47.840 --> 0:11:51.200
<v Speaker 2>is it possible that Roger Bacon as a hobby did

0:11:51.200 --> 0:11:53.120
<v Speaker 2>what he could to create you know, I mean, it's

0:11:53.160 --> 0:11:55.760
<v Speaker 2>possible it's not, you know, I don't think he would

0:11:55.760 --> 0:11:58.040
<v Speaker 2>have gotten it to speak. But there are various sort of

0:11:58.360 --> 0:12:00.880
<v Speaker 2>ways you could interpret this as having some basis in

0:12:00.960 --> 0:12:06.040
<v Speaker 2>reality that doesn't involve magic or super science of the day. Okay, now,

0:12:06.240 --> 0:12:08.600
<v Speaker 2>I went to my bookshelf and I pulled off my

0:12:08.679 --> 0:12:12.240
<v Speaker 2>dusty copy of Brewer's Dictionary of Phrase and Fable, and it provides

0:12:12.280 --> 0:12:15.360
<v Speaker 2>a little more insight on the legend. Quote: It was

0:12:15.360 --> 0:12:18.520
<v Speaker 2>said if Bacon heard it speak, he would succeed in

0:12:18.559 --> 0:12:22.720
<v Speaker 2>his projects, if not, he would fail. His familiar Miles

0:12:22.840 --> 0:12:25.560
<v Speaker 2>was set to watch, and while Bacon slept, the head

0:12:25.640 --> 0:12:29.960
<v Speaker 2>spoke thrice: "Time is." Half an hour later, it said,

0:12:30.320 --> 0:12:34.440
<v Speaker 2>"Time was." In another half hour, it said, "Time's past,"

0:12:34.880 --> 0:12:35.640
<v Speaker 2>fell down,

0:12:35.440 --> 0:12:39.199
<v Speaker 3>and was broken to atoms. To atoms? To atoms. Yes,

0:12:40.000 --> 0:12:43.720
<v Speaker 3>surely atoms means something different here. Atoms weren't discovered

0:12:43.760 --> 0:12:46.200
<v Speaker 3>at the time. I think it just means, like, small

0:12:46.240 --> 0:12:47.080
<v Speaker 3>parts or something.

0:12:47.200 --> 0:12:49.839
<v Speaker 2>Yeah, yes, yeah, that.

0:12:49.760 --> 0:12:52.559
<v Speaker 3>Would be hilarious if it was literally broken to atoms.

0:12:53.160 --> 0:12:54.960
<v Speaker 2>Yeah. So I don't know if it works, though. It

0:12:54.960 --> 0:12:57.280
<v Speaker 2>sounds like it's kind of an alarm clock that explodes.

0:12:57.679 --> 0:13:01.160
<v Speaker 3>Well, but I don't understand the difference between time was

0:13:01.360 --> 0:13:03.960
<v Speaker 3>and time's past. They're both past tense.

0:13:04.480 --> 0:13:08.280
<v Speaker 2>Hmm, that's a good point. Time is, time was, time's past.

0:13:08.360 --> 0:13:11.120
<v Speaker 2>It seems like you would want the present there somewhere.

0:13:11.160 --> 0:13:15.920
<v Speaker 2>But yeah, that's what it allegedly said. And

0:13:15.920 --> 0:13:21.160
<v Speaker 2>you'll find woodcuts that have this motif on them

0:13:21.160 --> 0:13:23.000
<v Speaker 2>as well.

0:13:22.200 --> 0:13:24.320
<v Speaker 3>Like it would make more sense if it said the

0:13:24.360 --> 0:13:27.560
<v Speaker 3>three things were time will be, time is, time was,

0:13:28.480 --> 0:13:31.000
<v Speaker 3>but this seems more like time is, time was, time

0:13:31.240 --> 0:13:31.719
<v Speaker 3>was was.

0:13:33.840 --> 0:13:36.880
<v Speaker 2>Now, Brewer's notes that references to the

0:13:36.920 --> 0:13:40.400
<v Speaker 2>brazen head are just common in literature, appearing frequently in

0:13:40.480 --> 0:13:45.640
<v Speaker 2>early romances but with Eastern origins, though it doesn't get

0:13:45.679 --> 0:13:48.720
<v Speaker 2>into that a lot. Elsewhere in the volume, it's also

0:13:48.880 --> 0:13:54.600
<v Speaker 2>noted that artificial heads that speak occur elsewhere as well.

0:13:54.640 --> 0:13:56.520
<v Speaker 2>And some of these are brazen heads, and some of

0:13:56.520 --> 0:13:58.240
<v Speaker 2>these are other things, but I think

0:13:58.240 --> 0:14:01.040
<v Speaker 2>it's important to run through briefly some of these examples

0:14:01.040 --> 0:14:03.640
<v Speaker 2>because they kind of paint a picture of not only

0:14:03.679 --> 0:14:07.160
<v Speaker 2>some of these other ideas of artificial heads speaking and

0:14:07.200 --> 0:14:14.800
<v Speaker 2>telling the future, but related non-technological non-artifacts that

0:14:15.200 --> 0:14:20.520
<v Speaker 2>kind of help inform what we think technology can do. Okay, okay,

0:14:20.560 --> 0:14:22.520
<v Speaker 2>so one of them is a brazen head in the

0:14:22.560 --> 0:14:25.520
<v Speaker 2>possession of Pope Sylvester the Second in the tenth century,

0:14:25.800 --> 0:14:30.280
<v Speaker 2>which he also constructed, and misinterpretations of its utterances could

0:14:30.320 --> 0:14:31.320
<v Speaker 2>prove disastrous.

0:14:31.800 --> 0:14:35.840
<v Speaker 3>Oh, is this also believed to be satanic in some way?

0:14:37.200 --> 0:14:42.480
<v Speaker 2>I didn't go too deep on satanic implications, but possibly.

0:14:42.440 --> 0:14:44.800
<v Speaker 3>I guess it would depend on if this legend is associated

0:14:44.840 --> 0:14:47.480
<v Speaker 3>with pro-Pope Sylvester or anti-Pope Sylvester sources.

0:14:47.600 --> 0:14:51.040
<v Speaker 2>Right, right, but you can definitely see that there,

0:14:51.160 --> 0:14:53.080
<v Speaker 2>in the head itself, regardless of what's supposed to be

0:14:53.120 --> 0:14:56.520
<v Speaker 2>powering it, like, this ties, too, into oracular traditions.

0:14:56.560 --> 0:14:59.360
<v Speaker 2>You know, the idea that here is this thing that

0:14:59.440 --> 0:15:02.640
<v Speaker 2>can give you cryptic wisdom if you have the

0:15:02.680 --> 0:15:05.880
<v Speaker 2>wisdom to decipher what it's telling you. Another example that's

0:15:05.880 --> 0:15:09.680
<v Speaker 2>brought up in Brewer's is the Colossi of Memnon,

0:15:10.240 --> 0:15:12.280
<v Speaker 2>which we did at least a whole I don't know.

0:15:12.280 --> 0:15:14.400
<v Speaker 2>I can't remember if it was one episode or multiple episodes, but

0:15:14.440 --> 0:15:17.840
<v Speaker 2>we discussed this on Stuff to Blow Your Mind. This

0:15:17.880 --> 0:15:19.960
<v Speaker 2>is a fascinating topic in and of itself.

0:15:20.200 --> 0:15:22.560
<v Speaker 3>This was basically, I think a statue or a pair

0:15:22.600 --> 0:15:25.680
<v Speaker 3>of statues a part of sort of a ruins complex

0:15:25.840 --> 0:15:30.680
<v Speaker 3>that was famous in Roman Egypt, basically because

0:15:30.720 --> 0:15:33.640
<v Speaker 3>it would make sounds, and there were different theories about

0:15:33.680 --> 0:15:35.160
<v Speaker 3>how it made sounds and why.

0:15:35.480 --> 0:15:38.480
<v Speaker 2>Yeah, yeah, it seems like I think some said it

0:15:38.520 --> 0:15:40.880
<v Speaker 2>was capable of speech, but generally it's described as singing

0:15:41.040 --> 0:15:44.160
<v Speaker 2>or some sort of a note. And as we discussed,

0:15:44.400 --> 0:15:47.400
<v Speaker 2>while there are some I think unlikely theories regarding the

0:15:47.480 --> 0:15:51.280
<v Speaker 2>use of some sort of intentional sound generating device or devices,

0:15:51.520 --> 0:15:53.920
<v Speaker 2>it seems like a more likely explanation would have to

0:15:54.000 --> 0:15:57.440
<v Speaker 2>do with peculiarities of the stone as it heated in

0:15:57.520 --> 0:16:00.480
<v Speaker 2>the sun and then cooled at night. Anyway, go back

0:16:00.520 --> 0:16:01.760
<v Speaker 2>and listen to that episode if you want to know

0:16:01.800 --> 0:16:04.160
<v Speaker 2>about them. They have a pretty fascinating history.

0:16:04.360 --> 0:16:06.200
<v Speaker 3>We'll remember better in the original.

0:16:07.000 --> 0:16:12.080
<v Speaker 2>Yes, there's the head of Orpheus at Lesbos, predicting the

0:16:12.080 --> 0:16:15.640
<v Speaker 2>doom and death of Cyrus the Great. However, I believe

0:16:15.680 --> 0:16:17.960
<v Speaker 2>this is generally thought to be the actual head of

0:16:18.000 --> 0:16:20.840
<v Speaker 2>the hero Orpheus after he was torn apart by the

0:16:20.880 --> 0:16:25.080
<v Speaker 2>maenads of Dionysus during a bacchanalia for the sin

0:16:25.160 --> 0:16:28.600
<v Speaker 2>of worshiping Apollo or having worshiped Apollo. I'm not sure

0:16:28.600 --> 0:16:31.960
<v Speaker 2>what the exact charge was, but still a prophetic disembodied

0:16:32.000 --> 0:16:35.760
<v Speaker 2>head that still continues to speak. Brewers also mentions the

0:16:35.800 --> 0:16:40.000
<v Speaker 2>head of Minos brought by Odin to Scandinavia, which I

0:16:40.040 --> 0:16:42.640
<v Speaker 2>didn't know what to make of this, because Minos is,

0:16:42.680 --> 0:16:49.400
<v Speaker 2>of course the mythical king of Crete, who we've discussed

0:16:49.400 --> 0:16:51.920
<v Speaker 2>on the show before as well. I think the actual

0:16:52.040 --> 0:16:56.120
<v Speaker 2>figure in reference here might be Mímir, the god of wisdom,

0:16:56.160 --> 0:17:01.720
<v Speaker 2>that is beheaded in the Æsir-Vanir War. Odin claims this

0:17:01.840 --> 0:17:05.000
<v Speaker 2>head and it continues to speak secret wisdom. Again, this

0:17:05.080 --> 0:17:07.240
<v Speaker 2>is another one that's not a mechanical head. It's the

0:17:07.240 --> 0:17:10.879
<v Speaker 2>head of an actual defeated divine being that continues to

0:17:10.920 --> 0:17:15.040
<v Speaker 2>live on and to speak. There are tales of Albertus

0:17:15.080 --> 0:17:18.800
<v Speaker 2>Magnus having an earthen head which during the thirteenth century

0:17:18.960 --> 0:17:22.720
<v Speaker 2>was said to speak and move until Thomas Aquinas breaks

0:17:22.720 --> 0:17:25.639
<v Speaker 2>it by accident, and Magnus says, there goes the labor

0:17:25.680 --> 0:17:30.360
<v Speaker 2>of thirty years because now it's broken. So I don't

0:17:30.359 --> 0:17:32.560
<v Speaker 2>know what to make of that one either completely. But

0:17:32.600 --> 0:17:36.120
<v Speaker 2>again we see this motif of a fabulous artificial head

0:17:36.119 --> 0:17:39.520
<v Speaker 2>that speaks, that manages to break one way or another,

0:17:39.560 --> 0:17:43.840
<v Speaker 2>either something fails, somebody knocks it over, or you know

0:17:43.920 --> 0:17:47.440
<v Speaker 2>it explodes after you hit the snooze alarm twice. Then

0:17:47.440 --> 0:17:51.560
<v Speaker 2>there's Alexander's statue of Asclepius, the Greek god of medicine,

0:17:51.800 --> 0:17:55.440
<v Speaker 2>that was said to speak, but Lucian wrote that the

0:17:55.480 --> 0:17:59.680
<v Speaker 2>sounds came via a concealed man who spoke through tubes.

0:18:00.280 --> 0:18:05.120
<v Speaker 2>So here's an example of some sort of a creation.

0:18:06.359 --> 0:18:08.199
<v Speaker 2>I guess it depends how you look. Either a statue

0:18:08.240 --> 0:18:13.480
<v Speaker 2>that isn't intended to speak, or through supernatural machinations speaks,

0:18:13.640 --> 0:18:16.800
<v Speaker 2>but according to Lucian, it's neither of those. It's

0:18:16.840 --> 0:18:19.280
<v Speaker 2>just tubes and some guy like hiding in the bushes

0:18:19.320 --> 0:18:22.560
<v Speaker 2>speaking through the tubes, which is still clever and still technological,

0:18:23.320 --> 0:18:24.280
<v Speaker 2>but is trickery.

0:18:24.920 --> 0:18:28.440
<v Speaker 3>Nonetheless, I think the Lucian you're alluding to there is

0:18:28.520 --> 0:18:30.359
<v Speaker 3>Lucian of Samosata, is that right?

0:18:30.680 --> 0:18:31.119
<v Speaker 2>I believe?

0:18:31.200 --> 0:18:31.240
<v Speaker 1>So?

0:18:31.440 --> 0:18:34.240
<v Speaker 3>Yes, yeah, this is like this was an ancient satirist

0:18:34.280 --> 0:18:38.879
<v Speaker 3>from Syria who is quite hilarious and was kind of

0:18:38.920 --> 0:18:42.440
<v Speaker 3>a skeptic debunker of, like, the second century CE,

0:18:42.600 --> 0:18:46.200
<v Speaker 3>which is sort of strange, but he was in that

0:18:46.240 --> 0:18:50.480
<v Speaker 3>mold and he made like vicious mockery of people of

0:18:50.520 --> 0:18:54.119
<v Speaker 3>all sorts and different philosophies and stuff, and also wrote

0:18:54.160 --> 0:18:56.600
<v Speaker 3>a satire that some people have considered one of the

0:18:56.640 --> 0:18:58.119
<v Speaker 3>earliest forms of science fiction.

0:18:59.640 --> 0:19:02.080
<v Speaker 2>Now this also reminds me this is not I mean,

0:19:02.200 --> 0:19:05.560
<v Speaker 2>I guess, if memory serves, maybe it did speak. But

0:19:06.520 --> 0:19:09.400
<v Speaker 2>there was of course the man-faced serpent god Glycon

0:19:09.880 --> 0:19:13.000
<v Speaker 2>of the second century that is often held up as

0:19:13.040 --> 0:19:16.680
<v Speaker 2>being a hoax, like it was actually a puppet according

0:19:17.280 --> 0:19:21.920
<v Speaker 2>to commentators. But I've always wondered what to make of that,

0:19:22.040 --> 0:19:26.119
<v Speaker 2>because it kind of if someone is performing puppetry and

0:19:26.160 --> 0:19:29.560
<v Speaker 2>people are having an emotional or even religious reaction to it,

0:19:29.560 --> 0:19:31.840
<v Speaker 2>it kind of depends how it's presented. Right, are you

0:19:31.920 --> 0:19:35.800
<v Speaker 2>presenting Glycon the man-faced serpent as, like, this is it,

0:19:35.960 --> 0:19:39.080
<v Speaker 2>this is an actual man-faced serpent god, come take

0:19:39.080 --> 0:19:42.320
<v Speaker 2>a look, that it's alive is proof that he is real?

0:19:43.040 --> 0:19:46.240
<v Speaker 2>Or is it something else? Is it more like performance

0:19:46.400 --> 0:19:48.919
<v Speaker 2>or is it more like reinterpretation? You know, because you

0:19:48.960 --> 0:19:53.480
<v Speaker 2>have plenty of examples where people will carry out performances

0:19:53.480 --> 0:19:56.639
<v Speaker 2>in which people dress as divine and semi divine figures.

0:19:57.160 --> 0:19:59.760
<v Speaker 2>It's not supposed to be like, look at the proof here,

0:20:00.119 --> 0:20:03.200
<v Speaker 2>here is this hero on the stage. This means God

0:20:03.240 --> 0:20:03.560
<v Speaker 2>is real?

0:20:03.880 --> 0:20:06.560
<v Speaker 3>Funny enough, I think Glycon was also written about by

0:20:06.640 --> 0:20:11.159
<v Speaker 3>Lucian of Samosata. But the, I guess, the crucial question

0:20:11.320 --> 0:20:13.919
<v Speaker 3>is like is there an attempt at trickery or not?

0:20:14.160 --> 0:20:17.440
<v Speaker 3>Like, do you want the audience to believe there

0:20:17.520 --> 0:20:19.560
<v Speaker 3>is not somebody behind the mask?

0:20:20.080 --> 0:20:23.159
<v Speaker 2>Right? And you know that's interesting because that still kind

0:20:23.200 --> 0:20:25.679
<v Speaker 2>of applies to a lot of what's going on in

0:20:25.680 --> 0:20:29.200
<v Speaker 2>the world today with things like chatbots and so forth.

0:20:29.240 --> 0:20:33.280
<v Speaker 2>And you know this idea that if we you know,

0:20:34.000 --> 0:20:36.439
<v Speaker 2>what is coming out of the box, what is coming

0:20:36.440 --> 0:20:40.360
<v Speaker 2>out of the artificial head, and you know, we how

0:20:40.359 --> 0:20:43.199
<v Speaker 2>are we interpreting it? And are we thinking there is

0:20:43.240 --> 0:20:46.639
<v Speaker 2>something there that is not? So it's like on what

0:20:46.760 --> 0:20:49.800
<v Speaker 2>level is there trickery? And then there is like interpretation

0:20:49.880 --> 0:20:53.480
<v Speaker 2>of the trickery and so forth. But at any rate,

0:20:53.640 --> 0:20:55.520
<v Speaker 2>I think, you know, some of these examples they proved

0:20:55.520 --> 0:20:58.080
<v Speaker 2>that well before people could make any kind of a

0:20:58.119 --> 0:21:01.640
<v Speaker 2>mechanical thing, be it a head or not a head, that could speak,

0:21:01.920 --> 0:21:04.480
<v Speaker 2>we were still capable of dreaming about it. And I

0:21:04.480 --> 0:21:07.280
<v Speaker 2>think there's ample evidence that long before anyone attempted to

0:21:07.320 --> 0:21:10.680
<v Speaker 2>make a head that could talk through mechanical means, individuals

0:21:10.680 --> 0:21:14.720
<v Speaker 2>sought and sometimes found a voice emerging from disembodied heads,

0:21:15.000 --> 0:21:18.600
<v Speaker 2>either real ones, you know, the remains of human beings

0:21:19.080 --> 0:21:24.200
<v Speaker 2>or other animals, or likenesses of human heads, either attached

0:21:24.320 --> 0:21:27.640
<v Speaker 2>or detached from statues, and so forth. And I think

0:21:27.640 --> 0:21:30.159
<v Speaker 2>there's room between trickery and belief, you know, for the

0:21:30.200 --> 0:21:35.200
<v Speaker 2>suspension of disbelief and ritual as well to take into account.

0:21:42.400 --> 0:21:45.160
<v Speaker 3>But of course later on people would end up building real,

0:21:45.320 --> 0:21:49.560
<v Speaker 3>operable machines that were at least attempting to produce speech

0:21:49.680 --> 0:21:51.480
<v Speaker 3>that could be understood by humans.

0:21:51.960 --> 0:21:54.160
<v Speaker 2>That's right, And this is where we get more into

0:21:54.200 --> 0:21:58.600
<v Speaker 2>the deconstruction of what human speech is, which in and

0:21:58.640 --> 0:22:03.960
<v Speaker 2>of itself is a whole subject, but there are key

0:22:04.040 --> 0:22:06.960
<v Speaker 2>moments where we see some major advancements being made here.

0:22:07.560 --> 0:22:10.680
<v Speaker 2>So another major entry to discuss in all of this

0:22:10.760 --> 0:22:13.520
<v Speaker 2>is the work of German born Russian doctor, physicist and

0:22:13.560 --> 0:22:18.360
<v Speaker 2>engineer Christian Gottlieb Kratzenstein, who lived seventeen twenty three through

0:22:18.400 --> 0:22:22.840
<v Speaker 2>seventeen ninety five. So he was a man of various interests,

0:22:22.840 --> 0:22:26.200
<v Speaker 2>including the use of electricity in medicine, and the

0:22:26.240 --> 0:22:29.159
<v Speaker 2>Saint Petersburg Science Academy at one point offered a prize

0:22:29.240 --> 0:22:33.879
<v Speaker 2>for advancements made in researching the mechanisms behind the vowels

0:22:33.960 --> 0:22:37.800
<v Speaker 2>A, E, I, O, and U in human speech. So in seventeen seventy

0:22:37.880 --> 0:22:42.600
<v Speaker 2>nine he presented his vowel organ to the university. The

0:22:43.640 --> 0:22:47.480
<v Speaker 2>vowel organ consisted of a series of resonators that produced

0:22:47.600 --> 0:22:52.520
<v Speaker 2>vowel like sounds on a constant pitch when excited by

0:22:52.560 --> 0:22:57.840
<v Speaker 2>a reed. I found some illustrations of these basic resonators

0:22:58.440 --> 0:23:05.320
<v Speaker 2>via the UCL Psychology and Language Sciences Department. Here I

0:23:05.359 --> 0:23:11.440
<v Speaker 2>also found a website linked at this website where you

0:23:11.480 --> 0:23:15.080
<v Speaker 2>can find instructions for how to make your own resonators

0:23:15.080 --> 0:23:19.159
<v Speaker 2>out of plumbing supplies, which I found rather insightful. I

0:23:19.200 --> 0:23:22.639
<v Speaker 2>did not attempt it, but if you're into plumbing supplies

0:23:23.000 --> 0:23:26.560
<v Speaker 2>and vowel sounds, it seems like a natural craft choice.

0:23:26.840 --> 0:23:30.000
<v Speaker 3>But the key insight being here that by changing the

0:23:30.119 --> 0:23:33.520
<v Speaker 3>shape of a physical resonating cavity, you can change the

0:23:33.680 --> 0:23:35.840
<v Speaker 3>sound of the vowel produced.

0:23:36.280 --> 0:23:39.560
<v Speaker 2>Right right. Another take on this, I was reading the

0:23:39.560 --> 0:23:42.720
<v Speaker 2>BBC Future article The Machines That Learned to Listen by

0:23:42.800 --> 0:23:47.480
<v Speaker 2>Katia Moskvitch, and it describes these as resonance tubes connected

0:23:47.520 --> 0:23:50.919
<v Speaker 2>to organ pipes. So you know, this is not to

0:23:50.960 --> 0:23:52.520
<v Speaker 2>say that we had, this is not on, like,

0:23:52.560 --> 0:23:55.120
<v Speaker 2>the same level as some sort of imaginary brazen head

0:23:55.160 --> 0:23:57.280
<v Speaker 2>that's going to speak on its own and spout out,

0:23:57.520 --> 0:23:59.879
<v Speaker 2>spit out wisdom for you to interpret. This is

0:24:00.320 --> 0:24:04.679
<v Speaker 2>just figuring out how these vowel sounds are produced and

0:24:04.720 --> 0:24:10.600
<v Speaker 2>reproducing them through a basic mechanical system. Moskvitch also points

0:24:10.600 --> 0:24:12.760
<v Speaker 2>out a few other key individuals in the advancement of

0:24:12.800 --> 0:24:17.560
<v Speaker 2>this technology. There's Wolfgang von Kempelen in Vienna, who created

0:24:17.600 --> 0:24:22.439
<v Speaker 2>a similar acoustic mechanical speech machine about ten years after Kratzenstein.

0:24:23.160 --> 0:24:27.280
<v Speaker 2>And then she also mentions English inventor Charles Wheatstone, who

0:24:27.320 --> 0:24:29.600
<v Speaker 2>would improve on this in the early nineteenth century.

0:24:30.000 --> 0:24:32.720
<v Speaker 3>Charles Wheatstone. I'm going to mention him again in a minute,

0:24:32.760 --> 0:24:35.560
<v Speaker 3>but he's also notable because he was one of the

0:24:35.640 --> 0:24:41.000
<v Speaker 3>inventors of the first commercially successful form of the telegraph.

0:24:41.480 --> 0:24:43.600
<v Speaker 3>So we talked about him in our episode on You

0:24:44.480 --> 0:24:47.200
<v Speaker 3>of the Telegraph. But when it comes to the one

0:24:47.480 --> 0:24:50.240
<v Speaker 3>you mentioned before, that von Kempelen's machine, this is interesting

0:24:50.240 --> 0:24:57.119
<v Speaker 3>because I read that while this machine was allegedly real,

0:24:57.200 --> 0:24:59.639
<v Speaker 3>it was a real attempt to make a machine that

0:24:59.680 --> 0:25:04.920
<v Speaker 3>would speak, von Kempelen is now known for essentially being a hoaxer,

0:25:05.000 --> 0:25:08.879
<v Speaker 3>because he tried to create other automata, including a chess

0:25:08.960 --> 0:25:12.119
<v Speaker 3>playing automaton that was actually a hoax. It had a

0:25:12.200 --> 0:25:14.879
<v Speaker 3>human inside it doing the moves, so it was a

0:25:14.920 --> 0:25:16.159
<v Speaker 3>fake robot.

0:25:16.240 --> 0:25:19.480
<v Speaker 2>Though as a fake still really impressive. It's interesting where

0:25:19.520 --> 0:25:21.600
<v Speaker 2>you get in, like what sometimes you're wondering you have

0:25:21.640 --> 0:25:24.600
<v Speaker 2>to wonder what the line is between, you know, the

0:25:24.640 --> 0:25:28.560
<v Speaker 2>actual technological innovation and trickery. I mean, obviously it's deception,

0:25:28.960 --> 0:25:31.720
<v Speaker 2>and if you have a secret chamber in which there's

0:25:31.760 --> 0:25:34.360
<v Speaker 2>a whole person doing stuff, you know, that's a real

0:25:34.400 --> 0:25:37.680
<v Speaker 2>red flag there as well. But still the trickery is

0:25:37.720 --> 0:25:38.720
<v Speaker 2>pretty ingenious too.

0:25:39.200 --> 0:25:42.439
<v Speaker 3>Yeah. Well yeah, I mean it takes skill to be

0:25:42.440 --> 0:25:43.160
<v Speaker 3>a good magician.

0:25:43.520 --> 0:25:44.000
<v Speaker 2>Yeah.

0:25:44.080 --> 0:25:46.760
<v Speaker 3>Anyway, this brings us to the example that I was

0:25:46.760 --> 0:25:50.000
<v Speaker 3>really excited to talk about in today's episode, which is

0:25:50.040 --> 0:25:55.400
<v Speaker 3>the speaking machine of a nineteenth century inventor named Joseph

0:25:55.680 --> 0:25:59.199
<v Speaker 3>Faber. So one of my main sources here is just

0:25:59.320 --> 0:26:02.440
<v Speaker 3>generally a good source on the history of speech synthesis

0:26:02.520 --> 0:26:06.560
<v Speaker 3>and talking machines. It was a book chapter in the

0:26:06.640 --> 0:26:10.680
<v Speaker 3>Routledge Handbook of Phonetics from twenty nineteen by an author

0:26:10.760 --> 0:26:14.560
<v Speaker 3>named Brad H. Story, who is part of the faculty

0:26:14.640 --> 0:26:17.320
<v Speaker 3>of the Department of Speech, Language and Hearing Sciences at

0:26:17.359 --> 0:26:22.560
<v Speaker 3>the University of Arizona, and Story in this chapter traces

0:26:22.640 --> 0:26:26.159
<v Speaker 3>the history of speech synthesis from the mechanical methods of

0:26:26.200 --> 0:26:29.680
<v Speaker 3>the eighteenth and nineteenth centuries to the digital techniques of

0:26:29.720 --> 0:26:31.920
<v Speaker 3>the present. So it's the whole sort of modern arc

0:26:32.119 --> 0:26:34.920
<v Speaker 3>of these machines. But the thing I really want to

0:26:34.960 --> 0:26:37.560
<v Speaker 3>focus in on here now is this machine that I

0:26:37.600 --> 0:26:41.919
<v Speaker 3>mentioned a minute ago by the nineteenth century German inventor

0:26:42.119 --> 0:26:45.800
<v Speaker 3>Joseph Faber. This features heavily at the beginning of Story's

0:26:45.880 --> 0:26:49.520
<v Speaker 3>chapter here. So this machine was at various different times

0:26:49.680 --> 0:26:53.760
<v Speaker 3>called the Marvelous Talking-Machine, you've got a hyphen between

0:26:53.800 --> 0:26:58.399
<v Speaker 3>talking and machine, and also the Euphonia, from the Greek meaning

0:26:58.480 --> 0:27:02.639
<v Speaker 3>good sound or sweet sound. We'll see about that as

0:27:02.680 --> 0:27:06.480
<v Speaker 3>we go on. Rob, I included one illustration of

0:27:06.520 --> 0:27:08.320
<v Speaker 3>the machine for you to look at here. I think

0:27:08.359 --> 0:27:10.960
<v Speaker 3>this may have been from some kind of promotional material

0:27:11.040 --> 0:27:13.440
<v Speaker 3>when this machine was featured in an exhibit that I'll

0:27:13.480 --> 0:27:14.280
<v Speaker 3>describe in a bit.

0:27:14.640 --> 0:27:18.040
<v Speaker 2>I love it in part because right there is this

0:27:18.160 --> 0:27:24.000
<v Speaker 2>angelic human face like right there on the machine, seemingly

0:27:24.160 --> 0:27:28.000
<v Speaker 2>as like decoration or maybe tribute. I'm not sure, but

0:27:28.400 --> 0:27:31.400
<v Speaker 2>I'm not sure if it's actually necessary to the mechanics

0:27:31.400 --> 0:27:31.919
<v Speaker 2>of the device.

0:27:31.920 --> 0:27:35.320
<v Speaker 3>Here, I think it sort of is. Well, I'll explain.

0:27:35.840 --> 0:27:40.960
<v Speaker 3>So Story introduces Faber's machine through the eyes of another

0:27:41.200 --> 0:27:44.879
<v Speaker 3>inventor and scientist of the day named Joseph Henry. A

0:27:44.880 --> 0:27:50.160
<v Speaker 3>different Joseph, a researcher on electromagnetic induction and also the

0:27:50.200 --> 0:27:56.560
<v Speaker 3>inaugural secretary of the Smithsonian Institution. Henry encountered Faber's marvelous

0:27:56.600 --> 0:28:01.520
<v Speaker 3>talking machine at a private exhibition in Philadelphia on December twentieth,

0:28:01.640 --> 0:28:06.159
<v Speaker 3>eighteen forty five, and he described the demonstration in a

0:28:06.280 --> 0:28:09.480
<v Speaker 3>letter to a colleague named H. M. Alexander. So we have

0:28:09.680 --> 0:28:12.800
<v Speaker 3>contemporaneous notes on what it was doing and what it

0:28:12.840 --> 0:28:16.639
<v Speaker 3>looked like in this private demonstration. So here's how it worked.

0:28:17.320 --> 0:28:22.359
<v Speaker 3>It was controlled by an operator, mainly by

0:28:22.520 --> 0:28:26.480
<v Speaker 3>foot pedals and a keyboard, essentially just like an organ,

0:28:26.640 --> 0:28:29.680
<v Speaker 3>like a chamber organ, and in fact the device could

0:28:29.720 --> 0:28:33.359
<v Speaker 3>in some ways be considered a modified organ. So you

0:28:33.400 --> 0:28:36.679
<v Speaker 3>had a foot pedal that operated a bellows and that

0:28:36.720 --> 0:28:41.040
<v Speaker 3>would supply airflow to the whole system, and the bellows

0:28:41.120 --> 0:28:45.120
<v Speaker 3>pumped air through an artificial larynx that had vocal cords

0:28:45.200 --> 0:28:47.000
<v Speaker 3>that were in this source said to be made of

0:28:47.120 --> 0:28:53.080
<v Speaker 3>rubber and these so this artificial glottis or artificial vocal

0:28:53.120 --> 0:28:57.719
<v Speaker 3>cords would vibrate to produce the fundamental sound of the

0:28:57.760 --> 0:29:00.840
<v Speaker 3>machine's voice when air was flowing through them. And then

0:29:00.880 --> 0:29:05.120
<v Speaker 3>you had sixteen keys on the keyboard which were connected

0:29:05.120 --> 0:29:09.720
<v Speaker 3>by strings and levers to the various components that controlled

0:29:09.920 --> 0:29:12.719
<v Speaker 3>the shaping of that sound of that, you know, the

0:29:12.720 --> 0:29:16.560
<v Speaker 3>resonating sound from that airflow through the glottis into speech.

0:29:17.040 --> 0:29:19.600
<v Speaker 3>One of the interesting things is, as we've been saying,

0:29:19.640 --> 0:29:22.560
<v Speaker 3>this device actually had a face. So the face was

0:29:22.640 --> 0:29:26.520
<v Speaker 3>made of carved wood, essentially a large doll head, but

0:29:26.600 --> 0:29:29.400
<v Speaker 3>it had a hinged jaw. So maybe you should think

0:29:29.400 --> 0:29:32.160
<v Speaker 3>of it more like a ventriloquist's dummy. You're loving this,

0:29:32.320 --> 0:29:35.880
<v Speaker 3>aren't you? Yeah, Night of the Living Dummy. But it

0:29:36.160 --> 0:29:40.560
<v Speaker 3>can actually speak. And so inside the dummy's mouth there

0:29:40.640 --> 0:29:44.959
<v Speaker 3>was an ivory tongue that could be moved around inside

0:29:44.960 --> 0:29:49.200
<v Speaker 3>the oral cavity to control the shape of the resonating chamber.

0:29:49.920 --> 0:29:52.920
<v Speaker 3>And by controlling these different elements like the mouth and

0:29:52.960 --> 0:29:56.000
<v Speaker 3>the tongue and all that, with the keys on the keyboard,

0:29:57.160 --> 0:30:01.280
<v Speaker 3>it, quote, imposed time-varying change to the air cavity

0:30:01.640 --> 0:30:08.320
<v Speaker 3>appropriate for generating apparently convincing renditions of connected speech. So

0:30:08.400 --> 0:30:11.520
<v Speaker 3>it may not have sounded perfect or even pleasant, but

0:30:11.800 --> 0:30:15.640
<v Speaker 3>apparently people in the room could understand what the machine

0:30:15.680 --> 0:30:19.040
<v Speaker 3>was saying when Faber operated it. So this is eighteen

0:30:19.120 --> 0:30:23.360
<v Speaker 3>forty five and the machine is speaking intelligible words. Henry

0:30:23.840 --> 0:30:27.200
<v Speaker 3>in this letter compares it favorably to a different talking machine,

0:30:27.200 --> 0:30:29.360
<v Speaker 3>one he had seen years before. This was one of

0:30:29.400 --> 0:30:31.640
<v Speaker 3>the ones you mentioned, Rob, the one built by the

0:30:31.680 --> 0:30:35.960
<v Speaker 3>English scientist and inventor Charles Wheatstone, again the telegraph guy.

0:30:36.880 --> 0:30:41.920
<v Speaker 3>Wheatstone's talking machine was capable of being understood for the

0:30:42.040 --> 0:30:45.360
<v Speaker 3>set of words it could produce, but Faber's machine was

0:30:45.600 --> 0:30:50.120
<v Speaker 3>far superior because its speech repertoire was infinitely variable, so

0:30:50.160 --> 0:30:54.280
<v Speaker 3>he could speak whole sentences, and those sentences could contain

0:30:54.400 --> 0:30:57.120
<v Speaker 3>any words and any sounds he wanted, as long as

0:30:57.120 --> 0:31:00.080
<v Speaker 3>they were in one of the covered languages. Obviously it

0:31:00.120 --> 0:31:03.880
<v Speaker 3>couldn't do, you know, like tonal languages, or like speak

0:31:03.920 --> 0:31:06.760
<v Speaker 3>Mandarin or something, But it seems like mainly it was

0:31:06.760 --> 0:31:09.280
<v Speaker 3>speaking German and English. It was said at the time

0:31:09.320 --> 0:31:12.200
<v Speaker 3>that it could speak any European language. Now, I think

0:31:12.240 --> 0:31:14.560
<v Speaker 3>one thing that's really worth noting here is that if

0:31:14.640 --> 0:31:17.440
<v Speaker 3>you imagine how a machine like this would work, the

0:31:17.680 --> 0:31:23.240
<v Speaker 3>success of the performance would depend heavily on the skill

0:31:23.600 --> 0:31:27.000
<v Speaker 3>of the operator, since the speech patterns are not like

0:31:27.200 --> 0:31:32.360
<v Speaker 3>programmed and you know, not sort of expressed automatically, but

0:31:32.560 --> 0:31:36.880
<v Speaker 3>expressed in real time by the player operating the bellows

0:31:36.920 --> 0:31:38.760
<v Speaker 3>and the keys. And I think also there were some

0:31:39.240 --> 0:31:42.560
<v Speaker 3>screws and stuff that would manipulate pitch and things like that,

0:31:42.680 --> 0:31:45.280
<v Speaker 3>So you have to play this just like you would

0:31:45.280 --> 0:31:49.560
<v Speaker 3>play a musical instrument. So different players using the same

0:31:49.640 --> 0:31:54.200
<v Speaker 3>machine would probably produce fairly different sounding speech, even if

0:31:54.200 --> 0:31:57.760
<v Speaker 3>they had memorized which keys corresponded to which phonetic units.

0:31:58.280 --> 0:32:01.480
<v Speaker 3>So nobody I ever read says this, but, you know, I'm

0:32:01.560 --> 0:32:04.800
<v Speaker 3>kind of picturing Faber as a sort of phantom of

0:32:04.800 --> 0:32:07.040
<v Speaker 3>the opera at the organ keyboard. You know,

0:32:07.080 --> 0:32:09.040
<v Speaker 3>he's not just like pressing the keys, but giving a

0:32:09.040 --> 0:32:12.880
<v Speaker 3>real passionate and dramatic performance when somebody

0:32:12.960 --> 0:32:15.640
<v Speaker 3>yells out, yeah, make it say comment allez-vous or whatever.

0:32:16.840 --> 0:32:19.120
<v Speaker 3>It also sang songs, by the way, I'll get into

0:32:19.120 --> 0:32:21.600
<v Speaker 3>that in a minute. But I was wondering, what did

0:32:21.640 --> 0:32:24.120
<v Speaker 3>what are people asking, you know, what's the equivalent in

0:32:24.160 --> 0:32:27.160
<v Speaker 3>eighteen forty five of yelling out, you know, play Freebird?

0:32:27.200 --> 0:32:29.040
<v Speaker 3>And I was thinking, maybe it's people are yelling for

0:32:29.120 --> 0:32:30.400
<v Speaker 3>Tippecanoe and Tyler Too.

0:32:31.000 --> 0:32:32.520
<v Speaker 2>Oh yeah.

0:32:32.600 --> 0:32:35.520
<v Speaker 3>So an interesting detail that Story includes in this chapter

0:32:35.720 --> 0:32:38.760
<v Speaker 3>is that this was not the first time Faber had

0:32:38.800 --> 0:32:41.640
<v Speaker 3>built a talking machine. In fact, this was not the

0:32:41.680 --> 0:32:45.680
<v Speaker 3>first time Faber had built this exact talking machine. There

0:32:45.720 --> 0:32:48.600
<v Speaker 3>was an earlier version of it that was destroyed by

0:32:48.760 --> 0:32:52.640
<v Speaker 3>Faber himself, quote, in a bout of depression and intoxication.

0:32:53.520 --> 0:32:56.000
<v Speaker 3>I should say that nearly every source I read on

0:32:56.080 --> 0:33:01.920
<v Speaker 3>Faber mentions something about him being disheveled or haunted, obsessed

0:33:02.000 --> 0:33:05.360
<v Speaker 3>with his machine, and generally emotionally unwell or at the

0:33:05.440 --> 0:33:07.880
<v Speaker 3>very least having a really rough time a lot of

0:33:07.920 --> 0:33:11.920
<v Speaker 3>the time. Multiple writers describe him in terms containing a

0:33:11.920 --> 0:33:17.160
<v Speaker 3>lot of pity. But so, it took Faber apparently twenty

0:33:17.360 --> 0:33:20.560
<v Speaker 3>years to perfect the first version of the machine, the

0:33:20.600 --> 0:33:24.160
<v Speaker 3>one that he drunkenly destroyed, but he was able to

0:33:24.400 --> 0:33:27.680
<v Speaker 3>recreate the second version within a year of that. And

0:33:27.720 --> 0:33:30.640
<v Speaker 3>this kind of suggests to me the possibility that the

0:33:30.680 --> 0:33:34.560
<v Speaker 3>original creation of the machine may have really been a

0:33:34.560 --> 0:33:38.360
<v Speaker 3>project of fundamental research about phonetics more than it was

0:33:38.400 --> 0:33:41.920
<v Speaker 3>about engineering. And so once he had the knowledge in

0:33:42.040 --> 0:33:44.800
<v Speaker 3>hand of how each sound was produced, like what the

0:33:44.880 --> 0:33:48.360
<v Speaker 3>shape of the oral cavity, you know, how that corresponded

0:33:48.360 --> 0:33:51.880
<v Speaker 3>to the sounds, recreating the machine itself might have been

0:33:51.920 --> 0:33:54.800
<v Speaker 3>a relatively simple proposition. Really what you needed was

0:33:54.840 --> 0:33:58.800
<v Speaker 3>the knowledge about how phonetics correspond to physical shapes.

0:33:59.360 --> 0:34:02.239
<v Speaker 2>Yeah, if he had that, and certainly if he had notes on the

0:34:02.240 --> 0:34:06.120
<v Speaker 2>matter and his designs recorded, it would be easier to

0:34:06.160 --> 0:34:07.400
<v Speaker 2>come back and reproduce that.

0:34:07.840 --> 0:34:13.520
<v Speaker 3>Yeah. So Joseph Henry's letter about Faber's talking machine demonstration

0:34:13.840 --> 0:34:17.440
<v Speaker 3>it also includes speculation about the uses to which a

0:34:17.480 --> 0:34:21.000
<v Speaker 3>machine like this could be put. One interesting idea he

0:34:21.080 --> 0:34:24.240
<v Speaker 3>has is what if you could take a spoken message

0:34:24.280 --> 0:34:29.400
<v Speaker 3>at one location and code that spoken message into inputs

0:34:29.400 --> 0:34:34.600
<v Speaker 3>on this keyboard on this machine, and then, through electromagnetic means,

0:34:35.080 --> 0:34:40.520
<v Speaker 3>transmit those keystrokes across wires to a totally separate second location,

0:34:41.239 --> 0:34:45.040
<v Speaker 3>and then those electrical signals could operate the speech organs

0:34:45.080 --> 0:34:48.600
<v Speaker 3>of the doll-faced machine in the second location. You

0:34:48.640 --> 0:34:55.080
<v Speaker 3>would essentially be transmitting speech itself across great distance. Notable

0:34:55.080 --> 0:34:58.640
<v Speaker 3>that Henry's idea here is roughly thirty years before Alexander

0:34:58.680 --> 0:35:02.120
<v Speaker 3>Graham Bell demonstrates the principle of the telephone. But there

0:35:02.160 --> 0:35:06.000
<v Speaker 3>is a very important difference, which is that while Bell's

0:35:06.000 --> 0:35:09.480
<v Speaker 3>telephone, and these are Story's words here, quote, transmitted an

0:35:09.560 --> 0:35:14.680
<v Speaker 3>electrical analog of the speech pressure wave. Henry's description alluded

0:35:14.719 --> 0:35:20.000
<v Speaker 3>to representing speech in compressed form based on slowly varying

0:35:20.080 --> 0:35:23.879
<v Speaker 3>movements of the operator's hands, fingers, and feet as they

0:35:23.920 --> 0:35:27.920
<v Speaker 3>formed the keystroke sequences required to produce an utterance, a

0:35:27.960 --> 0:35:31.719
<v Speaker 3>signal processing technique that would not be implemented into telephone

0:35:31.719 --> 0:35:36.160
<v Speaker 3>transmission systems for nearly another century. So the interesting thing

0:35:36.200 --> 0:35:39.839
<v Speaker 3>about Henry here is that he's not just imagining converting

0:35:39.920 --> 0:35:43.200
<v Speaker 3>the sound of a voice into an impulse that travels

0:35:43.239 --> 0:35:47.319
<v Speaker 3>along the wire. He's imagining a coding process. It's put

0:35:47.400 --> 0:35:50.719
<v Speaker 3>into code for the transmission and then decoded by the

0:35:50.760 --> 0:35:51.960
<v Speaker 3>machine at the other end.

0:35:52.640 --> 0:35:56.160
<v Speaker 2>I can't help but try to imagine this alternate past

0:35:56.440 --> 0:36:00.560
<v Speaker 2>in which instead of early telephones, people all had this

0:36:01.320 --> 0:36:06.120
<v Speaker 2>weird cherub head mounted on the wall that then speaks

0:36:06.160 --> 0:36:09.920
<v Speaker 2>to you in this, I'm assuming, slightly haunting voice.

0:36:10.160 --> 0:36:12.160
<v Speaker 3>Oh, I'll get to the haunting voice in a second,

0:36:12.960 --> 0:36:16.319
<v Speaker 3>but anyway, Story flags it as historically significant that this

0:36:16.400 --> 0:36:21.040
<v Speaker 3>one invention had both succeeded in producing generally intelligible synthetic

0:36:21.080 --> 0:36:23.880
<v Speaker 3>speech to people in the room with it, and it

0:36:23.960 --> 0:36:27.800
<v Speaker 3>had inspired at least one onlooker to start considering ideas

0:36:27.800 --> 0:36:31.200
<v Speaker 3>for the electrical transmission of low bandwidth speech from one

0:36:31.239 --> 0:36:36.160
<v Speaker 3>place to another. But neither of these possibilities really went anywhere.

0:36:36.360 --> 0:36:39.239
<v Speaker 3>Henry did not devote any more effort to musing about

0:36:39.280 --> 0:36:44.440
<v Speaker 3>the electrical transmission, and Faber's machine ended up being a

0:36:44.440 --> 0:36:49.760
<v Speaker 3>circus sideshow, almost literally. So after this, Faber needed money,

0:36:50.280 --> 0:36:53.840
<v Speaker 3>and beginning in eighteen forty six, to get money, he

0:36:54.000 --> 0:36:57.360
<v Speaker 3>signed on to demonstrate his machine for P. T. Barnum.

0:36:57.480 --> 0:37:00.959
<v Speaker 3>Got to have something for everybody, even people who want

0:37:01.000 --> 0:37:05.080
<v Speaker 3>a talking doll head operated by a disheveled German organ master.

0:37:06.280 --> 0:37:10.239
<v Speaker 3>So Faber committed to exhibit the Marvelous Speaking Machine for

0:37:10.320 --> 0:37:14.239
<v Speaker 3>Barnum at the Egyptian Hall in London. This was like

0:37:14.360 --> 0:37:18.200
<v Speaker 3>a general exhibition hall in Piccadilly, which hosted all kinds

0:37:18.239 --> 0:37:20.799
<v Speaker 3>of shows, but I think especially in the latter part

0:37:20.840 --> 0:37:23.920
<v Speaker 3>of the nineteenth century, it was known for showing like

0:37:23.960 --> 0:37:29.479
<v Speaker 3>a lot of mountebanks and fraudulent spiritualist demonstrators. Yeah, I'll

0:37:29.520 --> 0:37:32.920
<v Speaker 3>reveal to you that you're actually a reincarnation of Cleopatra.

0:37:34.320 --> 0:37:37.600
<v Speaker 3>Lucky you. But by noting that that's just a random thing,

0:37:37.600 --> 0:37:40.719
<v Speaker 3>I'm not trying to cast aspersions on Faber, because I

0:37:40.719 --> 0:37:43.680
<v Speaker 3>want to stress that it seems totally clear that Faber

0:37:43.960 --> 0:37:46.640
<v Speaker 3>was no con artist. As best we can tell, his

0:37:46.760 --> 0:37:51.280
<v Speaker 3>machine really did work, and when played correctly, it did

0:37:51.400 --> 0:37:54.879
<v Speaker 3>really speak original sentences that people could, for the most

0:37:54.880 --> 0:37:56.000
<v Speaker 3>part understand.

0:37:56.360 --> 0:37:56.640
<v Speaker 2>Though.

0:37:56.800 --> 0:38:00.799
<v Speaker 3>One thing that emerges from reading descriptions of this is

0:38:00.840 --> 0:38:07.120
<v Speaker 3>that coding intelligible information and sounding like speech are two

0:38:07.160 --> 0:38:10.640
<v Speaker 3>completely different things. So it seems that a lot of

0:38:10.680 --> 0:38:14.480
<v Speaker 3>people could tell what the machine was saying, but still

0:38:14.560 --> 0:38:18.480
<v Speaker 3>they were not very impressed by what they heard. And

0:38:18.560 --> 0:38:23.400
<v Speaker 3>I found a spectacularly evocative description of what the machine

0:38:23.440 --> 0:38:26.560
<v Speaker 3>was like, recorded in a book called Instruments and

0:38:26.600 --> 0:38:30.600
<v Speaker 3>the Imagination by Thomas L. Hankins and Robert J. Silverman,

0:38:30.760 --> 0:38:34.319
<v Speaker 3>Princeton University Press, nineteen ninety nine. But the main thing

0:38:34.360 --> 0:38:37.319
<v Speaker 3>here is that they're quoting a person who saw the

0:38:37.360 --> 0:38:40.200
<v Speaker 3>machine in person in eighteen forty six, I believe, and

0:38:40.440 --> 0:38:42.960
<v Speaker 3>then wrote about it in a memoir. But generally the

0:38:43.000 --> 0:38:45.520
<v Speaker 3>authors here they note that there were like some satirical

0:38:45.640 --> 0:38:49.640
<v Speaker 3>articles making reference to Faber's machine, suggesting, for example, that

0:38:49.760 --> 0:38:52.319
<v Speaker 3>it could be used to replace the speaker of the

0:38:52.400 --> 0:38:57.319
<v Speaker 3>House of Commons. Yuk, yuk, those wacky politicians. But then

0:38:57.960 --> 0:38:59.720
<v Speaker 3>well they do kind of make a funny point. Actually,

0:38:59.719 --> 0:39:01.839
<v Speaker 3>they say, like you could just program it to say

0:39:02.040 --> 0:39:04.080
<v Speaker 3>order order at ten minute intervals.

0:39:06.040 --> 0:39:07.760
<v Speaker 2>Well that's pretty good, that's funny today.

0:39:08.200 --> 0:39:10.960
<v Speaker 3>Yeah. But anyway, then there's a part of the book

0:39:11.000 --> 0:39:15.000
<v Speaker 3>where they're including this evocative written account which is from

0:39:15.080 --> 0:39:20.120
<v Speaker 3>a London theater manager named John Hollingshead who saw this

0:39:20.200 --> 0:39:22.960
<v Speaker 3>machine in person when he was nineteen years old and

0:39:23.000 --> 0:39:25.880
<v Speaker 3>then wrote about it in a memoir or some book.

0:39:26.120 --> 0:39:30.760
<v Speaker 3>But anyway, this is Hollingshead's account. The exhibitor, Professor Faber,

0:39:31.239 --> 0:39:34.880
<v Speaker 3>was a sad faced man, dressed in respectable, well worn

0:39:34.920 --> 0:39:39.000
<v Speaker 3>clothes that were soiled by contact with tools, wood and machinery.

0:39:39.600 --> 0:39:43.160
<v Speaker 3>The room looked like a laboratory and workshop, which it was.

0:39:43.800 --> 0:39:46.560
<v Speaker 3>The professor was not too clean, and his hair and

0:39:46.600 --> 0:39:50.239
<v Speaker 3>beard sadly wanted the attention of a barber. I have

0:39:50.320 --> 0:39:53.000
<v Speaker 3>no doubt that he slept in the same room as

0:39:53.040 --> 0:39:57.480
<v Speaker 3>his figure, his scientific Frankenstein monster. Note: I guess the

0:39:57.520 --> 0:40:00.560
<v Speaker 3>novel would have only been a few decades old this time.

0:40:00.960 --> 0:40:03.359
<v Speaker 2>Yeah, yeah, eighteen eighteen on Frankenstein there.

0:40:03.600 --> 0:40:07.200
<v Speaker 3>Yeah, sorry, going on with Hollingshead: And I felt

0:40:07.239 --> 0:40:10.000
<v Speaker 3>the secret influence of an idea that the two were

0:40:10.080 --> 0:40:12.160
<v Speaker 3>destined to live and die together.

0:40:12.760 --> 0:40:15.800
<v Speaker 2>Oh my god, those are pretty strong words.

0:40:16.160 --> 0:40:20.319
<v Speaker 3>Yes, the professor, with a slight German accent, put his

0:40:20.440 --> 0:40:24.440
<v Speaker 3>wonderful toy in motion. He explained its action. It was

0:40:24.480 --> 0:40:28.520
<v Speaker 3>not necessary to prove the absence of deception. One keyboard

0:40:28.640 --> 0:40:32.680
<v Speaker 3>touched by the professor produced words which, slowly and deliberately,

0:40:32.800 --> 0:40:36.480
<v Speaker 3>in a hoarse, sepulchral voice, came from the mouth of

0:40:36.520 --> 0:40:39.200
<v Speaker 3>the figure, as if from the depths of a tomb.

0:40:39.760 --> 0:40:43.040
<v Speaker 3>It wanted little imagination to make the very few visitors

0:40:43.120 --> 0:40:46.959
<v Speaker 3>believe that the figure contained an imprisoned human or half

0:40:47.080 --> 0:40:51.560
<v Speaker 3>human being, bound to speak slowly when tormented by the

0:40:51.640 --> 0:40:55.680
<v Speaker 3>unseen power outside. No one thought for a moment that

0:40:55.719 --> 0:40:58.160
<v Speaker 3>they were being fooled by a second edition of The

0:40:58.360 --> 0:41:02.200
<v Speaker 3>Invisible Girl fraud. And by the way, the reference

0:41:02.239 --> 0:41:04.719
<v Speaker 3>to the Invisible Girl fraud, I believe is about the

0:41:05.080 --> 0:41:08.680
<v Speaker 3>many fake machines and fake automata that were actually worked

0:41:08.719 --> 0:41:11.919
<v Speaker 3>by having a human hidden inside operating it. But going

0:41:11.960 --> 0:41:14.880
<v Speaker 3>on, so Hollingshead says nobody thought that there was

0:41:14.920 --> 0:41:17.920
<v Speaker 3>an invisible girl operating this. It's clear, this is real.

0:41:18.200 --> 0:41:22.120
<v Speaker 3>He goes on, there were truth, laborious invention, and good

0:41:22.200 --> 0:41:25.719
<v Speaker 3>faith in every part of the melancholy room. As a

0:41:25.760 --> 0:41:29.960
<v Speaker 3>crowning display, the head sang a sepulchral version of God

0:41:30.080 --> 0:41:34.640
<v Speaker 3>Save the Queen, which suggested, inevitably, God save the inventor.

0:41:35.280 --> 0:41:39.279
<v Speaker 3>This extraordinary effect was achieved by the professor working two keyboards,

0:41:39.600 --> 0:41:43.040
<v Speaker 3>one for the words and one for the music. Never

0:41:43.160 --> 0:41:46.880
<v Speaker 3>probably before or since, has the national anthem been so sung.

0:41:47.520 --> 0:41:51.360
<v Speaker 3>Sadder and wiser, I and the few visitors crept slowly

0:41:51.440 --> 0:41:54.440
<v Speaker 3>from the place, leaving the professor with his one and

0:41:54.520 --> 0:41:58.920
<v Speaker 3>only treasure, his child of infinite labor and unmeasurable sorrow.

0:41:59.320 --> 0:42:03.960
<v Speaker 2>Oh wow, that is a lot. I mean, obviously, he

0:42:04.040 --> 0:42:06.560
<v Speaker 2>lays it on really thick about the sadness of

0:42:06.600 --> 0:42:09.600
<v Speaker 2>the inventor here. And then also there's the idea, like,

0:42:09.640 --> 0:42:12.520
<v Speaker 2>this was no hoax, this was real and it was depressing.

0:42:12.960 --> 0:42:16.719
<v Speaker 3>Yeah, I thought, it's a weird mix of like like

0:42:16.920 --> 0:42:20.920
<v Speaker 3>pity but real admiration, you know that, Like, there's something

0:42:21.280 --> 0:42:24.640
<v Speaker 3>beautiful and honest and true about this machine and his

0:42:24.680 --> 0:42:27.799
<v Speaker 3>devotion to it and the genius it took to create it.

0:42:28.000 --> 0:42:32.040
<v Speaker 3>But also it makes everybody feel bad and nobody wants

0:42:32.080 --> 0:42:34.040
<v Speaker 3>to look at it or listen to it, and everybody

0:42:34.120 --> 0:42:38.560
<v Speaker 3>leaves feeling depressed. Yeah, something about that struck me as

0:42:38.560 --> 0:42:41.879
<v Speaker 3>actually quite poignant and meaningful. Maybe we can come back

0:42:41.880 --> 0:42:43.840
<v Speaker 3>to that in a minute, but I did want to

0:42:43.840 --> 0:42:47.520
<v Speaker 3>flag that there was one notable visitor who, coming back

0:42:47.520 --> 0:42:51.640
<v Speaker 3>to the Invisible Girl suspicion, he did at first suspect fraud,

0:42:51.760 --> 0:42:54.960
<v Speaker 3>and that was the Duke of Wellington. I was reading

0:42:54.960 --> 0:42:57.240
<v Speaker 3>about this in a book called The Shows of London

0:42:57.320 --> 0:43:02.000
<v Speaker 3>by Richard Daniel Altick, and Altick recounts that Wellington,

0:43:02.320 --> 0:43:05.319
<v Speaker 3>when he first went to the demonstration, he was so

0:43:05.480 --> 0:43:09.719
<v Speaker 3>impressed by Faber's speaking machine that he asked to be

0:43:09.840 --> 0:43:12.600
<v Speaker 3>allowed to touch the keys with his own fingers, you know,

0:43:12.640 --> 0:43:15.359
<v Speaker 3>so he could see that it was genuine. And then

0:43:15.520 --> 0:43:17.840
<v Speaker 3>he did confirm that it was genuine, and then he

0:43:17.920 --> 0:43:20.799
<v Speaker 3>insisted that he be taught how to use it, so

0:43:21.040 --> 0:43:24.080
<v Speaker 3>Faber taught the Duke to play the machine in both

0:43:24.160 --> 0:43:28.120
<v Speaker 3>German and English, and Wellington did get it. Like, he could,

0:43:28.160 --> 0:43:30.640
<v Speaker 3>he could make it speak sentences in German and English,

0:43:30.800 --> 0:43:33.680
<v Speaker 3>and he was amazed, writing in the visitor's log of

0:43:33.760 --> 0:43:37.840
<v Speaker 3>the exhibit that the speaking machine, or the Euphonia, was

0:43:37.920 --> 0:43:52.600
<v Speaker 3>quote an extraordinary production of mechanical genius. Faber's machine also

0:43:52.640 --> 0:43:56.400
<v Speaker 3>got rave reviews in The Times, in the Illustrated London News.

0:43:56.440 --> 0:44:00.160
<v Speaker 3>A lot of people like looked at it and they

0:44:00.000 --> 0:44:03.000
<v Speaker 3>they thought that, like, yeah, this is a work of genius.

0:44:03.239 --> 0:44:06.560
<v Speaker 3>It's incredible that he's done this. But at the same time,

0:44:07.120 --> 0:44:11.279
<v Speaker 3>audiences really were not into it. Barnum himself noticed that

0:44:11.320 --> 0:44:14.600
<v Speaker 3>Faber's machine was not attracting crowds, it was not selling

0:44:14.640 --> 0:44:19.040
<v Speaker 3>tickets and not generating revenue, and so eventually he took

0:44:19.160 --> 0:44:23.560
<v Speaker 3>Faber's machine out of the Egyptian Hall in London and

0:44:23.760 --> 0:44:26.360
<v Speaker 3>added it to a traveling exhibit that went around the

0:44:26.360 --> 0:44:31.440
<v Speaker 3>English countryside doing performances. And from here Faber himself seems

0:44:31.440 --> 0:44:34.880
<v Speaker 3>to kind of disappear from the historical record. Some sources

0:44:34.920 --> 0:44:38.640
<v Speaker 3>indicate that he may have died by suicide during this period,

0:44:38.880 --> 0:44:42.720
<v Speaker 3>though that isn't known for sure. But after historical sources

0:44:42.760 --> 0:44:46.440
<v Speaker 3>stop mentioning Faber himself, they still make references to his

0:44:46.480 --> 0:44:51.240
<v Speaker 3>machine. Reading from Story here, quote: Although his talking machine

0:44:51.520 --> 0:44:54.560
<v Speaker 3>continued to make sideshow-like appearances in Europe and

0:44:54.640 --> 0:44:58.600
<v Speaker 3>North America over the next thirty years, it seems a relative,

0:44:58.719 --> 0:45:01.560
<v Speaker 3>perhaps a niece or nephew, may have inherited the

0:45:01.600 --> 0:45:04.160
<v Speaker 3>machine and performed with it to generate income.

0:45:05.160 --> 0:45:08.400
<v Speaker 2>So maybe no matter whatever happened to him, maybe a

0:45:08.440 --> 0:45:11.719
<v Speaker 2>relative with a little more showmanship like stepped in and

0:45:11.920 --> 0:45:14.239
<v Speaker 2>was able to make at least some sort of an

0:45:14.239 --> 0:45:15.040
<v Speaker 2>income off of it.

0:45:15.560 --> 0:45:18.240
<v Speaker 3>Yes, but then again, like I'm struck by the strange,

0:45:18.280 --> 0:45:23.600
<v Speaker 3>ironic sadness of this. This this was actually a scientifically

0:45:23.680 --> 0:45:27.520
<v Speaker 3>significant invention, Like he had done something kind of amazing,

0:45:28.640 --> 0:45:31.880
<v Speaker 3>but it just never really went anywhere under his mastery.

0:45:31.920 --> 0:45:35.520
<v Speaker 3>And then yeah, maybe a relative was a better carnival

0:45:35.560 --> 0:45:39.520
<v Speaker 3>barker, essentially, to perform with the machine and make some

0:45:39.560 --> 0:45:40.319
<v Speaker 3>money off of it.

0:45:40.520 --> 0:45:42.799
<v Speaker 2>I mean, it reminds me of so many advancements in

0:45:43.000 --> 0:45:48.480
<v Speaker 2>say robotics, that we've seen over the years, where oftentimes,

0:45:48.719 --> 0:45:51.480
<v Speaker 2>you know, to a certain extent, unfairly, there'll just be

0:45:51.600 --> 0:45:54.160
<v Speaker 2>one little clip of it that goes viral and people

0:45:54.200 --> 0:45:57.360
<v Speaker 2>react to it, be it some sort of, you know, a

0:45:58.120 --> 0:46:01.480
<v Speaker 2>human likeness with facial features that seem to be moving

0:46:01.600 --> 0:46:04.520
<v Speaker 2>or operating in an uncanny way, or something like the

0:46:04.600 --> 0:46:11.160
<v Speaker 2>various dog robots from Boston Dynamics that are very impressive

0:46:11.200 --> 0:46:14.560
<v Speaker 2>but also may be interpreted as being a bit creepy.

0:46:14.880 --> 0:46:17.000
<v Speaker 2>And so even though they are these, you know, they're

0:46:17.000 --> 0:46:23.000
<v Speaker 2>often examples of a real impressive technological advancement setting aside

0:46:23.320 --> 0:46:26.719
<v Speaker 2>actual applications, you can have a situation where something like

0:46:26.760 --> 0:46:32.919
<v Speaker 2>that is not as comforting, not as entertaining as say

0:46:32.920 --> 0:46:35.960
<v Speaker 2>an act of puppetry, or even an act of just

0:46:36.320 --> 0:46:40.080
<v Speaker 2>outright well maybe not fraud, but say a robot or

0:46:40.120 --> 0:46:46.000
<v Speaker 2>a costume depicting a robot may be ultimately maybe more reassuring,

0:46:46.120 --> 0:46:48.399
<v Speaker 2>maybe more fun compared to the actual thing.

0:46:48.920 --> 0:46:51.279
<v Speaker 3>Well yeah, which may which may just be fun or

0:46:51.280 --> 0:46:54.080
<v Speaker 3>may in fact be fraud, depending on what exactly they're

0:46:54.120 --> 0:46:54.839
<v Speaker 3>saying about it.

0:46:55.200 --> 0:46:56.799
<v Speaker 2>Yeah.

0:46:56.760 --> 0:46:58.480
<v Speaker 3>But this is a great point, and it brings me to, I

0:46:58.520 --> 0:47:01.160
<v Speaker 3>just wanted to mention a few of the general

0:47:01.239 --> 0:47:05.200
<v Speaker 3>notes about the history of speech synthesis from the end

0:47:05.200 --> 0:47:09.719
<v Speaker 3>of this book chapter by Brad Story. Story writes that,

0:47:09.800 --> 0:47:14.120
<v Speaker 3>you know, while there are technological use cases for speech synthesizers,

0:47:14.560 --> 0:47:16.759
<v Speaker 3>you know, we've got a number of them operating in

0:47:16.760 --> 0:47:21.080
<v Speaker 3>consumer technology today, and even before you had you know,

0:47:21.360 --> 0:47:24.680
<v Speaker 3>personal digital assistants and stuff, there would be use cases

0:47:24.719 --> 0:47:28.920
<v Speaker 3>for speech synthesizers for, for example, people who have a

0:47:28.920 --> 0:47:32.400
<v Speaker 3>disability that makes it difficult or impossible for them to speak.

0:47:32.800 --> 0:47:35.759
<v Speaker 3>Another one is that apparently this was actually used by

0:47:35.800 --> 0:47:38.000
<v Speaker 3>the Allies in World War Two. There were some forms

0:47:38.000 --> 0:47:42.880
<v Speaker 3>of speech synthesis that would allow sort of covert coded

0:47:43.080 --> 0:47:46.600
<v Speaker 3>transmissions of something like a phone call, so you could

0:47:46.600 --> 0:47:49.680
<v Speaker 3>have a phone call between like FDR and Winston Churchill.

0:47:50.000 --> 0:47:52.600
<v Speaker 3>It's not really a phone call, it's like a transmitted

0:47:53.200 --> 0:47:56.759
<v Speaker 3>synthesized bit of speech, and so it's very secure. But

0:47:56.840 --> 0:47:59.919
<v Speaker 3>it doesn't sound like the person talking. It sounds maybe

0:48:00.080 --> 0:48:03.320
<v Speaker 3>more like the Euphonia, kind of robotic and unnatural and

0:48:03.960 --> 0:48:07.920
<v Speaker 3>maybe making the presidents giggle a bit. A president and a prime minister.

0:48:09.200 --> 0:48:12.240
<v Speaker 3>But anyway, so what Story says is that a large

0:48:12.320 --> 0:48:15.839
<v Speaker 3>number of these systems have actually been primarily used as

0:48:16.480 --> 0:48:20.600
<v Speaker 3>research tools, as scientific tools for understanding the nature of

0:48:20.680 --> 0:48:25.960
<v Speaker 3>human speech. It's by trying to reproduce human speech and

0:48:26.160 --> 0:48:29.919
<v Speaker 3>failing at it that we come closer to understanding how

0:48:30.040 --> 0:48:34.000
<v Speaker 3>speech actually works in the human body. But the second

0:48:34.040 --> 0:48:36.560
<v Speaker 3>general observation that I thought is interesting, and this seems

0:48:36.600 --> 0:48:41.000
<v Speaker 3>to be very much reflected in the Faber machine example.

0:48:41.520 --> 0:48:44.319
<v Speaker 3>It is much easier to create a machine that can

0:48:44.360 --> 0:48:50.440
<v Speaker 3>speak intelligibly than one that can speak naturally. So that

0:48:50.480 --> 0:48:53.600
<v Speaker 3>indicates that when we talk, there's actually more than one

0:48:53.719 --> 0:48:58.240
<v Speaker 3>thing going on. Yes, we are conveying mental information coded

0:48:58.280 --> 0:49:02.480
<v Speaker 3>in words, and the substance of that coding is phonetic.

0:49:02.640 --> 0:49:04.880
<v Speaker 3>It's a series of sounds. But of course, you know,

0:49:04.920 --> 0:49:08.279
<v Speaker 3>the ironic thing to people who are used to thinking

0:49:08.280 --> 0:49:11.319
<v Speaker 3>about words as text is that the phonetic core of

0:49:11.400 --> 0:49:14.680
<v Speaker 3>language long predates writing, so like the written text of

0:49:14.719 --> 0:49:17.759
<v Speaker 3>a word is a visual code for the sound of

0:49:17.800 --> 0:49:21.000
<v Speaker 3>the word, which is the code for its meaning. But anyway,

0:49:21.239 --> 0:49:24.520
<v Speaker 3>so machines for hundreds of years have been able to

0:49:24.680 --> 0:49:29.440
<v Speaker 3>produce more or less intelligible phonetic code. They can speak words,

0:49:29.440 --> 0:49:32.360
<v Speaker 3>and people can understand what the words are supposed to be,

0:49:33.440 --> 0:49:37.239
<v Speaker 3>But it doesn't necessarily mean that people perceive these machines

0:49:37.320 --> 0:49:42.000
<v Speaker 3>as speaking, because there's another important quality to speech that

0:49:42.120 --> 0:49:45.200
<v Speaker 3>was not really captured by these early machines, and you

0:49:45.239 --> 0:49:48.160
<v Speaker 3>could argue is still somewhat lacking in the best speech

0:49:48.640 --> 0:49:52.400
<v Speaker 3>synthesis of today, and that is the natural character of

0:49:52.520 --> 0:49:58.920
<v Speaker 3>continuous speech. These machines always produced speech that sounded stilted, unreal, alien.

0:49:59.000 --> 0:50:01.799
<v Speaker 3>It was never something that would make you feel like

0:50:01.880 --> 0:50:05.200
<v Speaker 3>you were actually being talked to, as much as sort

0:50:05.239 --> 0:50:11.120
<v Speaker 3>of receiving a weird alien code in your language. And

0:50:11.160 --> 0:50:14.560
<v Speaker 3>here I just want to read from Story's chapter, quote:

0:50:14.760 --> 0:50:18.560
<v Speaker 3>As a result, synthesis often presents itself as an aural

0:50:18.880 --> 0:50:22.800
<v Speaker 3>caricature that can be perceived as an unnatural and sometimes

0:50:22.920 --> 0:50:26.920
<v Speaker 3>amusing rendition of a desired utterance or speech sound. It

0:50:27.000 --> 0:50:31.279
<v Speaker 3>is particularly unique to phonetics and speech science that the

0:50:31.400 --> 0:50:35.759
<v Speaker 3>models used as tools to understand the scientific aspects of

0:50:35.800 --> 0:50:40.000
<v Speaker 3>a complex system produce a signal intended to be heard

0:50:40.200 --> 0:50:43.160
<v Speaker 3>as if it were a human. As such, the quality

0:50:43.160 --> 0:50:47.080
<v Speaker 3>of a speech synthesis can be rather harshly judged because

0:50:47.200 --> 0:50:50.160
<v Speaker 3>the model on which it is based has not accounted

0:50:50.200 --> 0:50:54.239
<v Speaker 3>for the myriad of subtle variations and details that combine

0:50:54.440 --> 0:50:58.640
<v Speaker 3>in natural human speech. So to paraphrase, speech is so

0:50:58.880 --> 0:51:02.279
<v Speaker 3>much more than just the words, And even if you

0:51:02.360 --> 0:51:05.839
<v Speaker 3>can get the words right, there's still something that is

0:51:06.280 --> 0:51:08.319
<v Speaker 3>that is lacking and this is going to take a

0:51:08.400 --> 0:51:10.320
<v Speaker 3>lot of work to try to capture.

0:51:10.640 --> 0:51:13.080
<v Speaker 2>Yeah, this is this is fascinating to think about, you know,

0:51:13.120 --> 0:51:16.640
<v Speaker 2>especially given what you mentioned earlier about the importance

0:51:16.800 --> 0:51:21.400
<v Speaker 2>of speech synthesizer technology to aid people who cannot

0:51:21.560 --> 0:51:24.719
<v Speaker 2>speak or have lost the ability to speak. You know,

0:51:24.760 --> 0:51:27.279
<v Speaker 2>I guess probably one of the most famous, if not

0:51:27.320 --> 0:51:30.279
<v Speaker 2>the most famous examples of this is, of course, the

0:51:30.320 --> 0:51:36.840
<v Speaker 2>speech synthesizer used by theoretical physicist Stephen Hawking. Like one of

0:51:36.640 --> 0:51:39.760
<v Speaker 2>the interesting things about his story with it, as I remember,

0:51:39.880 --> 0:51:42.640
<v Speaker 2>is that just me mentioning it, you can probably sort

0:51:42.640 --> 0:51:45.520
<v Speaker 2>of hear the voice, the synthesized voice of Stephen Hawking

0:51:45.800 --> 0:51:49.120
<v Speaker 2>in your head. And I know that at some point

0:51:49.239 --> 0:51:52.399
<v Speaker 2>like that was you know, an early system he got there,

0:51:52.440 --> 0:51:54.560
<v Speaker 2>and later on in life he had he could have

0:51:54.600 --> 0:51:56.920
<v Speaker 2>switched the voice up, he could have changed the voice,

0:51:57.320 --> 0:52:00.840
<v Speaker 2>and I'm assuming could have maybe improved upon it, but

0:52:00.960 --> 0:52:03.920
<v Speaker 2>by that point he felt that this was his voice.

0:52:03.920 --> 0:52:05.799
<v Speaker 2>You know, you can't switch it up. You know, this

0:52:05.880 --> 0:52:08.200
<v Speaker 2>is this is how I speak, and this is how

0:52:08.239 --> 0:52:11.640
<v Speaker 2>I hear myself. So I always found that that interesting,

0:52:11.680 --> 0:52:13.560
<v Speaker 2>and especially when and then you can compare that to

0:52:13.600 --> 0:52:16.760
<v Speaker 2>some other cases like, you know, film critic Roger Ebert,

0:52:16.840 --> 0:52:18.800
<v Speaker 2>late in life, you know, he could no longer speak,

0:52:19.080 --> 0:52:21.480
<v Speaker 2>but had I think they had a more robust system

0:52:21.520 --> 0:52:24.839
<v Speaker 2>put together based on samples of you know, the the

0:52:24.880 --> 0:52:29.319
<v Speaker 2>great catalog of his own recorded speeches and reviews and

0:52:29.320 --> 0:52:32.000
<v Speaker 2>so forth that they could draw upon, and then looking

0:52:32.040 --> 0:52:35.200
<v Speaker 2>into the future, you have situations like James Earl Jones's

0:52:35.280 --> 0:52:41.160
<v Speaker 2>Darth Vader voice, that being you know, sort of archived

0:52:41.320 --> 0:52:45.120
<v Speaker 2>and prepared for so that in the future you can

0:52:45.400 --> 0:52:49.480
<v Speaker 2>you can basically have like a machine synthesized version of

0:52:49.480 --> 0:52:52.840
<v Speaker 2>that voice that will stand in as a sort of

0:52:52.920 --> 0:52:55.920
<v Speaker 2>one to one replication of what James Earl Jones did

0:52:55.960 --> 0:52:59.480
<v Speaker 2>in life with the voice acting.

0:52:59.480 --> 0:53:01.680
<v Speaker 3>Or at least so the proponents of the technology would say. I'm sure there

0:53:01.719 --> 0:53:03.480
<v Speaker 3>would be critics who would say it's never going to

0:53:03.520 --> 0:53:04.759
<v Speaker 3>be at one to one.

0:53:05.040 --> 0:53:07.160
<v Speaker 2>Right, right, And then of course there's also the argument

0:53:07.360 --> 0:53:11.480
<v Speaker 2>specifically with, only with Darth Vader here am I discussing this,

0:53:11.600 --> 0:53:14.719
<v Speaker 2>but obviously the case can be made that like we

0:53:14.840 --> 0:53:19.720
<v Speaker 2>we shouldn't reproduce, you know, deceased actors' voices to continue

0:53:19.719 --> 0:53:24.200
<v Speaker 2>a fictional role. We should employ new living actors and

0:53:24.280 --> 0:53:27.360
<v Speaker 2>existing living voice actors who can do the voice. I

0:53:27.400 --> 0:53:30.000
<v Speaker 2>think with Darth Vader in particular, you could make a

0:53:30.000 --> 0:53:32.160
<v Speaker 2>strong case for that because there are other voice actors

0:53:32.160 --> 0:53:36.000
<v Speaker 2>who do officially voice act that character and do a

0:53:36.000 --> 0:53:39.719
<v Speaker 2>great job with it. What does it mean if if

0:53:39.719 --> 0:53:44.120
<v Speaker 2>that individual's job is potentially taken by this sort of

0:53:44.239 --> 0:53:48.880
<v Speaker 2>machine likeness of that voice that is authorized based on

0:53:49.239 --> 0:53:53.280
<v Speaker 2>the voice of a you know, of a retired or

0:53:53.400 --> 0:53:55.600
<v Speaker 2>or in some cases you know, deceased individual.

0:53:55.920 --> 0:53:58.160
<v Speaker 3>Well, we're going a little off topic now, but I

0:53:58.200 --> 0:54:00.520
<v Speaker 3>will say that I stand by what I've said before,

0:54:00.520 --> 0:54:03.040
<v Speaker 3>which is I'm firmly in the camp that I prefer

0:54:03.360 --> 0:54:07.600
<v Speaker 3>recasting with a different actor as opposed to using technology

0:54:07.640 --> 0:54:10.560
<v Speaker 3>to try to synthesize the voice or appearance of an

0:54:10.600 --> 0:54:14.279
<v Speaker 3>actor who, for whatever reason cannot be present. People have

0:54:14.280 --> 0:54:17.160
<v Speaker 3>been recasting the same role with different actors for decades.

0:54:17.200 --> 0:54:19.319
<v Speaker 3>That happens all the time. Like, what's the problem with it?

0:54:19.840 --> 0:54:23.400
<v Speaker 2>Yeah? I agree, I agree, But in some cases, is

0:54:23.440 --> 0:54:28.239
<v Speaker 2>it possible that a role that's been established by a

0:54:28.280 --> 0:54:34.439
<v Speaker 2>living actor could not be just masterfully redone by a

0:54:34.640 --> 0:54:38.840
<v Speaker 2>clunky machine with the face of a cherub that is

0:54:39.960 --> 0:54:42.719
<v Speaker 2>manipulated by a sad German man who needs a haircut.

0:54:42.920 --> 0:54:45.560
<v Speaker 2>I think there's some potential there, like I don't know

0:54:45.640 --> 0:54:46.600
<v Speaker 2>the next James Bond.

0:54:46.640 --> 0:54:50.040
<v Speaker 3>Maybe this is the only film genre I'm interested in

0:54:50.080 --> 0:54:55.239
<v Speaker 3>from now on, high tension espionage movies starring the Euphonia.

0:54:57.040 --> 0:55:00.120
<v Speaker 2>So there you have it: The Machine Speaks. Obviously, I'd

0:55:00.120 --> 0:55:01.400
<v Speaker 2>love to hear from everyone out there if you have

0:55:01.440 --> 0:55:03.480
<v Speaker 2>thoughts on all of this, and certainly anyone out there

0:55:03.480 --> 0:55:08.680
<v Speaker 2>who has direct experience with speech synthesizer technology for one

0:55:08.800 --> 0:55:11.040
<v Speaker 2>use or another, write in. We would love to hear

0:55:11.120 --> 0:55:11.480
<v Speaker 2>from you.

0:55:12.080 --> 0:55:17.280
<v Speaker 3>Just a reminder, I just the speech synthesis or speech

0:55:17.360 --> 0:55:21.120
<v Speaker 3>synthesizer is one of the hardest pairs of words to enunciate,

0:55:21.200 --> 0:55:22.680
<v Speaker 3>and I've had to say it so many times in

0:55:22.719 --> 0:55:26.600
<v Speaker 3>this episode. I just want to be recognized, especially for

0:55:26.640 --> 0:55:27.920
<v Speaker 3>the times I probably did it wrong.

0:55:28.719 --> 0:55:31.600
<v Speaker 2>Yes, well, it's easy for the baby-faced machines, though. Yeah,

0:55:31.680 --> 0:55:34.279
<v Speaker 2>so at any rate. Yeah, if you want to listen

0:55:34.280 --> 0:55:35.920
<v Speaker 2>to other episodes of Stuff to Blow Your Mind, you

0:55:35.920 --> 0:55:37.239
<v Speaker 2>will find them in the Stuff to Blow Your Mind

0:55:37.239 --> 0:55:41.160
<v Speaker 2>podcast feed with our core episodes on Tuesdays and Thursdays. Mondays,

0:55:41.160 --> 0:55:43.359
<v Speaker 2>we do listener mail. Wednesdays, we do a short

0:55:43.360 --> 0:55:45.919
<v Speaker 2>form Artifact or Monster Fact, and then on Fridays we set

0:55:45.920 --> 0:55:48.560
<v Speaker 2>aside most serious concerns to just talk about a weird

0:55:48.560 --> 0:55:50.239
<v Speaker 2>film on Weird House Cinema.

0:55:50.400 --> 0:55:54.399
<v Speaker 3>Huge thanks to our excellent audio producer JJ Posway. If

0:55:54.400 --> 0:55:55.799
<v Speaker 3>you would like to get in touch with us with

0:55:55.880 --> 0:55:58.360
<v Speaker 3>feedback on this episode or any other, to suggest a

0:55:58.400 --> 0:56:00.440
<v Speaker 3>topic for the future, or just to say hello, you

0:56:00.480 --> 0:56:03.200
<v Speaker 3>can email us at contact at Stuff to Blow Your

0:56:03.239 --> 0:56:11.320
<v Speaker 3>Mind dot com.

0:56:11.360 --> 0:56:14.319
<v Speaker 1>Stuff to Blow Your Mind is a production of iHeartRadio. For

0:56:14.400 --> 0:56:17.200
<v Speaker 1>more podcasts from iHeartRadio, visit the iHeartRadio app,

0:56:17.360 --> 0:56:34.320
<v Speaker 1>Apple Podcasts, or wherever you listen to your favorite shows.