WEBVTT - From the Vault: The Machine Speaks

0:00:05.920 --> 0:00:09.160
<v Speaker 1>Hey, welcome to Stuff to Blow Your Mind. This is

0:00:09.240 --> 0:00:12.520
<v Speaker 1>Robert Lamb, and hey, it's Saturday. It's time to venture

0:00:12.560 --> 0:00:15.480
<v Speaker 1>into the vault once more for one of our core

0:00:15.560 --> 0:00:18.520
<v Speaker 1>episodes from the past. This is going to be The

0:00:18.560 --> 0:00:24.120
<v Speaker 1>Machine Speaks, originally published on six twenty nine, twenty twenty three.

0:00:24.600 --> 0:00:27.600
<v Speaker 1>We really hope you enjoy this one. It's a journey

0:00:27.680 --> 0:00:32.080
<v Speaker 1>through everything from ancient tales of artifice and wizardry to

0:00:32.159 --> 0:00:37.040
<v Speaker 1>the early breakthroughs in speech synthesis technology. Let's dive right in.

0:00:40.479 --> 0:00:44.640
<v Speaker 2>Welcome to Stuff to Blow Your Mind, a production of iHeartRadio.

0:00:50.240 --> 0:00:52.360
<v Speaker 1>Hey, welcome to Stuff to Blow Your Mind. My name

0:00:52.400 --> 0:00:53.000
<v Speaker 1>is Robert.

0:00:52.840 --> 0:00:56.560
<v Speaker 3>Lamb, and I am Joe McCormick, and today we're going

0:00:56.640 --> 0:01:02.160
<v Speaker 3>to be talking about some early voice synthesis machines. Rob,

0:01:02.160 --> 0:01:05.440
<v Speaker 3>I actually got interested in this topic because last week,

0:01:05.480 --> 0:01:09.000
<v Speaker 3>when we were watching the Weird House Cinema movie The

0:01:09.040 --> 0:01:12.800
<v Speaker 3>Black Hole, I was thinking about Roddy McDowall's voice when

0:01:12.840 --> 0:01:16.399
<v Speaker 3>he's doing a voice for the robot character who shares

0:01:16.400 --> 0:01:19.280
<v Speaker 3>a lot of proverbs with the human characters. And I

0:01:19.400 --> 0:01:23.400
<v Speaker 3>kept listening to his line delivery and I couldn't decide

0:01:23.440 --> 0:01:26.640
<v Speaker 3>if he was trying to do quote robot voice or not.

0:01:27.040 --> 0:01:28.560
<v Speaker 3>He seemed to kind of dip in and out of it.

0:01:28.600 --> 0:01:30.480
<v Speaker 3>You know what I mean when I say robot voice,

0:01:30.480 --> 0:01:33.480
<v Speaker 3>where a character's playing a robot and they say things

0:01:33.680 --> 0:01:35.040
<v Speaker 3>like this.

0:01:34.880 --> 0:01:37.119
<v Speaker 1>Well, of course I know what you are talking about, Joe.

0:01:37.760 --> 0:01:41.199
<v Speaker 3>I got kind of interested in the history of robot voice.

0:01:41.280 --> 0:01:43.880
<v Speaker 3>I was like, where does that come from? And I

0:01:43.920 --> 0:01:45.560
<v Speaker 3>was digging around a little. I'm sure there is a

0:01:45.600 --> 0:01:48.200
<v Speaker 3>good answer on that, but I don't know. My short

0:01:48.240 --> 0:01:51.560
<v Speaker 3>search didn't really turn up anything interesting, but it did

0:01:51.760 --> 0:01:55.560
<v Speaker 3>lead me indirectly to what we're talking about today, which is,

0:01:55.680 --> 0:02:00.520
<v Speaker 3>of course, we have the voice synthesis systems that

0:02:00.560 --> 0:02:03.320
<v Speaker 3>are largely digital today. Before that, you had a lot

0:02:03.400 --> 0:02:09.400
<v Speaker 3>of electrical and electromechanical systems for synthesizing human voices.

0:02:09.680 --> 0:02:12.359
<v Speaker 3>But actually there is an even earlier generation, which are

0:02:12.400 --> 0:02:17.520
<v Speaker 3>the purely mechanical voice synthesizers before electricity even came into

0:02:17.560 --> 0:02:21.040
<v Speaker 3>the picture. And that is what really stole my heart,

0:02:21.160 --> 0:02:23.920
<v Speaker 3>especially one particular machine of this type that I'm going

0:02:23.960 --> 0:02:25.960
<v Speaker 3>to talk about in the second half of this episode, I think.

0:02:26.000 --> 0:02:29.120
<v Speaker 1>Yeah, this is a fascinating topic, in part

0:02:29.200 --> 0:02:31.359
<v Speaker 1>because look at it. Look at where we are now,

0:02:31.520 --> 0:02:34.640
<v Speaker 1>right? It's easy today in our Internet age for just

0:02:34.680 --> 0:02:38.640
<v Speaker 1>the average Internet user to engage with various chatbots and

0:02:38.720 --> 0:02:43.760
<v Speaker 1>generative AI, text to speech and so forth. And so

0:02:43.800 --> 0:02:47.560
<v Speaker 1>we're able to interact with an artifact, a thing that

0:02:47.600 --> 0:02:50.480
<v Speaker 1>reflects human will, that has been designed to do key

0:02:50.639 --> 0:02:53.480
<v Speaker 1>and telling things that have long been the hallmarks of

0:02:53.560 --> 0:02:59.800
<v Speaker 1>human activity, artistic generation, creative writing and conversation, or especially speech.

0:03:00.840 --> 0:03:03.240
<v Speaker 1>And of course it's you know, it's easy nowadays to

0:03:03.280 --> 0:03:06.320
<v Speaker 1>do that, right, to transform into audible or even video

0:03:06.440 --> 0:03:11.000
<v Speaker 1>content what is either written by a human or created

0:03:11.040 --> 0:03:15.080
<v Speaker 1>with some sort of a chat bot machine. And the

0:03:15.160 --> 0:03:18.480
<v Speaker 1>results may be amusing, they may be disastrous. But

0:03:18.639 --> 0:03:20.680
<v Speaker 1>we're in this age where the idea of the machine

0:03:20.720 --> 0:03:24.440
<v Speaker 1>speaking is not in and of itself groundbreaking, or at

0:03:24.520 --> 0:03:28.119
<v Speaker 1>least if it is groundbreaking, or if it's amazing, it's

0:03:28.960 --> 0:03:33.160
<v Speaker 1>that it's a lower level of amazement compared to previous ages.

0:03:33.760 --> 0:03:36.440
<v Speaker 3>Well, as you say, it's very integrated into modern technology.

0:03:36.520 --> 0:03:39.480
<v Speaker 3>So there's you know, Siri and Alexa, all these like

0:03:39.680 --> 0:03:43.520
<v Speaker 3>home devices that speak, GPS devices for the car, you know,

0:03:43.600 --> 0:03:46.480
<v Speaker 3>that speak to you, but almost all of them are

0:03:46.520 --> 0:03:49.320
<v Speaker 3>still the subject of amusement if you actually pay attention

0:03:49.360 --> 0:03:52.320
<v Speaker 3>to what the voice sounds like, you know, like reading

0:03:52.360 --> 0:03:54.800
<v Speaker 3>emotions into the voice that's telling you what to do

0:03:54.840 --> 0:03:57.920
<v Speaker 3>as you're driving. That always makes me laugh because it

0:03:57.920 --> 0:03:59.400
<v Speaker 3>always seems a little bit annoyed.

0:04:00.120 --> 0:04:03.360
<v Speaker 1>Yeah, yeah, what's Siri's whole deal, that sort of thing, right?

0:04:05.200 --> 0:04:07.240
<v Speaker 1>You know. The other interesting angle on all of this

0:04:07.600 --> 0:04:12.880
<v Speaker 1>is that our modern technological advancements here, or even some

0:04:12.920 --> 0:04:15.960
<v Speaker 1>of the historic technological advancements like they are kind of

0:04:16.000 --> 0:04:20.040
<v Speaker 1>the echo of a more ancient longing for this sort

0:04:20.040 --> 0:04:25.120
<v Speaker 1>of thing. It connects to something that's just fascinated us

0:04:25.120 --> 0:04:28.359
<v Speaker 1>for a long time, the idea generally of non human

0:04:28.520 --> 0:04:31.560
<v Speaker 1>entities engaging in speech. And you could go

0:04:31.600 --> 0:04:35.280
<v Speaker 1>absolutely wild chasing down the various divisions of this, right,

0:04:35.360 --> 0:04:40.160
<v Speaker 1>the various myths, legends, and traditions concerning the speech of animals, plants,

0:04:40.560 --> 0:04:47.080
<v Speaker 1>inorganic materials, supernatural entities, you know, voices seemingly internal but

0:04:47.160 --> 0:04:49.920
<v Speaker 1>also external to our individual experience.

0:04:50.400 --> 0:04:52.599
<v Speaker 3>Though I would say there is an interesting thing about

0:04:52.800 --> 0:04:57.120
<v Speaker 3>machines or human or automata or human artifacts in general

0:04:57.440 --> 0:05:01.920
<v Speaker 3>when compared to imagining an animal speaker or any other

0:05:02.279 --> 0:05:05.080
<v Speaker 3>usually not speaking things starting to speak, which is, if

0:05:05.120 --> 0:05:07.799
<v Speaker 3>you're talking about a machine that does it. That means

0:05:07.960 --> 0:05:10.839
<v Speaker 3>somebody has to make that machine and somebody has to

0:05:10.920 --> 0:05:13.560
<v Speaker 3>work that machine, and it kind of reminds me of

0:05:13.680 --> 0:05:16.960
<v Speaker 3>the idea of grammar in language. You know, the interesting

0:05:16.960 --> 0:05:20.080
<v Speaker 3>thing about grammar is that when we use language, we

0:05:20.120 --> 0:05:23.279
<v Speaker 3>all use grammar, so we have an intuitive grasp of

0:05:23.320 --> 0:05:26.760
<v Speaker 3>the rules of grammar, but without serious study, people can't

0:05:26.760 --> 0:05:29.000
<v Speaker 3>actually tell you what those rules are. And so, like,

0:05:29.040 --> 0:05:31.000
<v Speaker 3>you know, that had to be in a sense a

0:05:31.279 --> 0:05:34.919
<v Speaker 3>science to back engineer the rules of grammar that we

0:05:35.000 --> 0:05:38.680
<v Speaker 3>use intuitively to like make them systematic and you know,

0:05:38.800 --> 0:05:42.000
<v Speaker 3>actually discover what those rules are. The same thing could

0:05:42.040 --> 0:05:45.880
<v Speaker 3>be said about the phonetic rules that produce the intelligible speech.

0:05:46.279 --> 0:05:48.839
<v Speaker 3>We can all do it if we can speak, but

0:05:49.120 --> 0:05:54.160
<v Speaker 3>we don't necessarily understand what the individual physical properties of

0:05:54.200 --> 0:05:57.200
<v Speaker 3>a word are, and so we wouldn't necessarily know how

0:05:57.240 --> 0:05:59.800
<v Speaker 3>to make that same word come out of a machine.

0:06:00.600 --> 0:06:02.520
<v Speaker 1>Yeah, there are all these things that you have to

0:06:02.560 --> 0:06:06.200
<v Speaker 1>deconstruct before you can attempt to reproduce it artificially. And

0:06:06.279 --> 0:06:09.440
<v Speaker 1>we see that time and time again in robotics,

0:06:09.440 --> 0:06:13.000
<v Speaker 1>for example. You know, things that we take for granted

0:06:13.320 --> 0:06:17.000
<v Speaker 1>concerning human movement, just about anything else you could imagine

0:06:17.080 --> 0:06:19.600
<v Speaker 1>it becomes so much more difficult to try and reproduce

0:06:19.640 --> 0:06:22.479
<v Speaker 1>that you've got to understand what it actually is on

0:06:22.520 --> 0:06:23.640
<v Speaker 1>an entirely new level first.

0:06:23.680 --> 0:06:27.480
<v Speaker 3>Now, I am to understand that before anybody actually

0:06:27.520 --> 0:06:30.760
<v Speaker 3>made a machine that could approximate or synthesize a human

0:06:30.839 --> 0:06:34.479
<v Speaker 3>voice and produce intelligible speech, people were thinking about this

0:06:34.600 --> 0:06:35.480
<v Speaker 3>as a concept.

0:06:36.360 --> 0:06:38.680
<v Speaker 1>Yeah yeah, and this is not surprising. You know, this

0:06:38.880 --> 0:06:43.559
<v Speaker 1>is kind of the meat of science fiction, right? Before

0:06:43.560 --> 0:06:45.839
<v Speaker 1>we can do it, we dream of it one way

0:06:45.960 --> 0:06:49.679
<v Speaker 1>or another, no matter what our exact grasp of science

0:06:49.720 --> 0:06:53.080
<v Speaker 1>happens to be. It always reminds me of that line

0:06:53.120 --> 0:06:57.880
<v Speaker 1>in William Gibson's Neuromancer where the character has made a deal,

0:06:57.960 --> 0:07:01.320
<v Speaker 1>a pact with a powerful AI, and it's pointed out like,

0:07:01.360 --> 0:07:04.000
<v Speaker 1>this is the sort of thing that in you know,

0:07:04.080 --> 0:07:07.200
<v Speaker 1>centuries ago, people only dreamed of making a deal with

0:07:07.320 --> 0:07:10.120
<v Speaker 1>a devil, and now we've made it possible through our

0:07:10.280 --> 0:07:11.320
<v Speaker 1>ingenuity and invention.

0:07:11.760 --> 0:07:12.880
<v Speaker 3>Congratulations.

0:07:13.240 --> 0:07:18.360
<v Speaker 1>Yeah, so so yeah. Narrowing down here into generally the

0:07:18.360 --> 0:07:27.200
<v Speaker 1>realm of alleged human creations that through at least partial technology,

0:07:27.280 --> 0:07:30.800
<v Speaker 1>but also sometimes wizardry and alchemy and other things that

0:07:30.840 --> 0:07:33.360
<v Speaker 1>are kind of like you know, bunched in there together

0:07:33.520 --> 0:07:37.760
<v Speaker 1>with actual technology to create some sort of a

0:07:37.840 --> 0:07:40.600
<v Speaker 1>device capable of speech. And then there are some also

0:07:40.720 --> 0:07:42.880
<v Speaker 1>some related things that are tied in there as well.

0:07:43.080 --> 0:07:44.960
<v Speaker 1>And a lot of it comes down to the idea

0:07:45.320 --> 0:07:49.840
<v Speaker 1>of a head, an artificial head that speaks.

0:07:50.480 --> 0:07:53.800
<v Speaker 3>I found something so loaded and revealing about that as

0:07:53.840 --> 0:07:56.800
<v Speaker 3>a fact. The history of these machines, so many of

0:07:56.840 --> 0:08:00.680
<v Speaker 3>them had... whether real or imagined, these machines, so

0:08:00.800 --> 0:08:04.800
<v Speaker 3>many of them early on had heads or faces, so

0:08:04.840 --> 0:08:07.000
<v Speaker 3>like it wouldn't just be a speaker like you would

0:08:07.000 --> 0:08:10.000
<v Speaker 3>have today, that's you know, it's just a mechanical device

0:08:10.040 --> 0:08:13.040
<v Speaker 3>for making the sound. It's like that the presence of

0:08:13.080 --> 0:08:16.320
<v Speaker 3>a head or a face was considered important or at

0:08:16.360 --> 0:08:17.240
<v Speaker 3>least desirable.

0:08:18.000 --> 0:08:19.880
<v Speaker 1>Yeah, and I wondered to what extent part of it

0:08:19.920 --> 0:08:24.000
<v Speaker 1>is just an echo of these earlier ideas. So going

0:08:24.040 --> 0:08:27.680
<v Speaker 1>to run through a few of these here. One of

0:08:27.720 --> 0:08:31.400
<v Speaker 1>the most famous, mainly from a literary tradition, as we'll

0:08:31.520 --> 0:08:35.000
<v Speaker 1>discuss here, is the idea of the brazen head. And

0:08:35.440 --> 0:08:37.480
<v Speaker 1>ultimately I guess there's more than one brazen head. We

0:08:37.520 --> 0:08:42.720
<v Speaker 1>can say brazen heads artificial heads that could speak. There's

0:08:42.800 --> 0:08:46.400
<v Speaker 1>basically a lot of these stories that concern the thirteenth century

0:08:46.440 --> 0:08:50.600
<v Speaker 1>English philosopher and Franciscan friar Roger Bacon, who's come up

0:08:50.600 --> 0:08:54.360
<v Speaker 1>on the show before, though this particular version of the

0:08:54.360 --> 0:08:57.440
<v Speaker 1>story doesn't seem to emerge until the sixteenth century, and

0:08:57.520 --> 0:09:00.680
<v Speaker 1>it does so within the works of contemporary drama.

0:09:01.160 --> 0:09:03.400
<v Speaker 3>I think we talked about Roger Bacon at length in

0:09:03.440 --> 0:09:06.360
<v Speaker 3>an episode we did about the invention of fireworks, which

0:09:07.000 --> 0:09:09.360
<v Speaker 3>may come back and feature again in the feed soon.

0:09:10.040 --> 0:09:12.000
<v Speaker 1>Yeah, yeah, I believe you're right. Yeah. I think Bacon

0:09:12.000 --> 0:09:15.199
<v Speaker 1>did come up in that. He had a reputation as

0:09:15.280 --> 0:09:18.960
<v Speaker 1>not only a very learned man in both natural philosophy

0:09:19.000 --> 0:09:22.319
<v Speaker 1>and theology, and, I should drive home, he definitely existed. I

0:09:22.360 --> 0:09:24.560
<v Speaker 1>don't think there's any doubt that there was a Roger Bacon.

0:09:25.080 --> 0:09:27.280
<v Speaker 1>But then there are all these other stories that he

0:09:27.400 --> 0:09:31.880
<v Speaker 1>was also potentially a wizard who was capable of producing

0:09:32.000 --> 0:09:37.880
<v Speaker 1>fabulous automata, either through amazing feats of clockwork ingenuity that

0:09:37.960 --> 0:09:42.280
<v Speaker 1>I think many would say was ultimately, you know, impossible

0:09:42.360 --> 0:09:46.600
<v Speaker 1>during his time period, or failing that, he was into

0:09:46.679 --> 0:09:49.320
<v Speaker 1>alchemy and of course dark dank necromancy.

0:09:49.960 --> 0:09:52.000
<v Speaker 3>I think the way I conceive of Roger Bacon is

0:09:52.040 --> 0:09:55.360
<v Speaker 3>that he of course was a real figure. He was

0:09:56.120 --> 0:10:00.520
<v Speaker 3>of great intellectual note and significance, but much about his

0:10:00.600 --> 0:10:04.280
<v Speaker 3>sort of general reputation is kind of legendary, if that

0:10:04.320 --> 0:10:05.840
<v Speaker 3>makes sense. I mean, there are many things we know

0:10:05.920 --> 0:10:09.559
<v Speaker 3>about him that are true, but there's also just sort

0:10:09.600 --> 0:10:12.160
<v Speaker 3>of an aura or a vibe about him that is

0:10:12.200 --> 0:10:13.560
<v Speaker 3>not really based in reality.

0:10:14.280 --> 0:10:17.360
<v Speaker 1>Yeah, I mean, he becomes a character in literature, especially

0:10:17.360 --> 0:10:19.680
<v Speaker 1>in these accounts, so you can you can sort of

0:10:19.920 --> 0:10:23.320
<v Speaker 1>look at the different phases like historic individual ideas and

0:10:23.360 --> 0:10:27.880
<v Speaker 1>you know, misunderstandings of said real life individual, and then

0:10:27.920 --> 0:10:31.320
<v Speaker 1>eventually that echoes into the fictional version of the person.

0:10:31.360 --> 0:10:33.400
<v Speaker 3>Which that's the more like wizard version.

0:10:34.120 --> 0:10:36.800
<v Speaker 1>Yeah, yeah, And so there are a few different examples

0:10:36.840 --> 0:10:39.160
<v Speaker 1>of this. This was like a popular motif for a while.

0:10:39.559 --> 0:10:43.480
<v Speaker 1>There's a sixteenth century prose romance titled The Famous History

0:10:43.480 --> 0:10:45.920
<v Speaker 1>of Friar Bacon, and it tells of Bacon trying to

0:10:46.400 --> 0:10:48.920
<v Speaker 1>give a replica of a human head speech and having

0:10:48.960 --> 0:10:52.800
<v Speaker 1>to call in the devil for help. Cool. Other versions

0:10:52.800 --> 0:10:54.959
<v Speaker 1>of this tale describe it as an artificial head given

0:10:55.040 --> 0:10:58.920
<v Speaker 1>life by demons, which was capable of spontaneous speech and

0:10:58.960 --> 0:11:02.840
<v Speaker 1>of course telling the future. I mean, what else would

0:11:02.880 --> 0:11:08.160
<v Speaker 1>you tell, right? Right. Robert Greene's sixteen thirty play Friar

0:11:08.240 --> 0:11:12.880
<v Speaker 1>Bacon and Friar Bungay mentions this several times, citing quote

0:11:13.000 --> 0:11:18.199
<v Speaker 1>Bacon's necromantic skill and heads of brass that quote can

0:11:18.280 --> 0:11:21.160
<v Speaker 1>utter any voice. The idea that's explored in both of these

0:11:21.200 --> 0:11:23.679
<v Speaker 1>works is that Bacon wished to build a wall of

0:11:23.720 --> 0:11:27.000
<v Speaker 1>brass around Britain with the help of the Brazen Head.

0:11:27.720 --> 0:11:29.640
<v Speaker 1>He fails and the head explodes.

0:11:32.080 --> 0:11:33.920
<v Speaker 3>Why have I never heard this cited as like

0:11:33.960 --> 0:11:35.640
<v Speaker 3>an early science fiction tale?

0:11:36.120 --> 0:11:38.120
<v Speaker 1>I don't know. I'm probably not doing due diligence on

0:11:38.280 --> 0:11:41.040
<v Speaker 1>exactly what happens and everything. It's like saying, well,

0:11:41.080 --> 0:11:43.719
<v Speaker 1>in Star Wars, the bad guys make one planet to

0:11:43.720 --> 0:11:45.920
<v Speaker 1>blow up another planet, and then the planet they made

0:11:46.000 --> 0:11:48.760
<v Speaker 1>blows up. You know, That's that's really skipping over a

0:11:48.760 --> 0:11:51.040
<v Speaker 1>lot of the nuance. And so I think there's there's

0:11:51.080 --> 0:11:53.280
<v Speaker 1>inevitably more nuance here, but I just I didn't get

0:11:53.280 --> 0:11:57.280
<v Speaker 1>into it. Okay, so this idea of the Satanic brass head

0:11:57.280 --> 0:12:00.480
<v Speaker 1>of Roger Bacon persists despite the fact that there's no

0:12:00.520 --> 0:12:04.520
<v Speaker 1>indication that anything like this even created purely through technology

0:12:04.559 --> 0:12:08.160
<v Speaker 1>and not Satanic wizardry was part of Bacon's world. He

0:12:08.280 --> 0:12:12.720
<v Speaker 1>was interested in optics and certainly various instruments scientific instruments

0:12:12.720 --> 0:12:14.680
<v Speaker 1>of brass of the day. But there's no indication that

0:12:14.720 --> 0:12:17.080
<v Speaker 1>he ever built an artificial head and tried to get

0:12:17.120 --> 0:12:17.760
<v Speaker 1>it to speak.

0:12:18.040 --> 0:12:20.320
<v Speaker 3>Okay, so this is part of the wizard aura, not

0:12:20.400 --> 0:12:22.360
<v Speaker 3>part of his biography.

0:12:22.280 --> 0:12:25.000
<v Speaker 1>Right. Though, you know, we have to drive home too,

0:12:25.240 --> 0:12:28.600
<v Speaker 1>is it possible that Roger Bacon, as a hobby did

0:12:28.640 --> 0:12:31.040
<v Speaker 1>what he could to create you know, I mean it's possible,

0:12:31.080 --> 0:12:33.280
<v Speaker 1>it's not. You know, I don't think he would have

0:12:33.400 --> 0:12:36.120
<v Speaker 1>gotten it to speak. But there are various sort of ways

0:12:36.160 --> 0:12:38.880
<v Speaker 1>you could interpret this as having some basis in reality

0:12:39.320 --> 0:12:43.480
<v Speaker 1>that doesn't involve magic or superscience of the day. Okay, Now,

0:12:43.640 --> 0:12:46.000
<v Speaker 1>I went to my bookshelf and I pulled off my

0:12:46.120 --> 0:12:49.680
<v Speaker 1>dusty copy of Brewer's Dictionary of Phrase and Fable. It provides

0:12:49.679 --> 0:12:52.760
<v Speaker 1>a little more insight on the legend. Quote. It was

0:12:52.800 --> 0:12:55.960
<v Speaker 1>said if Bacon heard it speak, he would succeed in

0:12:55.960 --> 0:12:59.920
<v Speaker 1>his projects; if not, he would fail. His familiar, Miles,

0:13:00.280 --> 0:13:03.000
<v Speaker 1>was set to watch, and while Bacon slept, the head

0:13:03.040 --> 0:13:07.400
<v Speaker 1>spoke thrice: Time is. Half an hour later, it said,

0:13:07.760 --> 0:13:11.840
<v Speaker 1>Time was. In another half hour, it said Time's past,

0:13:12.280 --> 0:13:16.640
<v Speaker 1>fell down, and was broken to atoms. To atoms? To atoms. Yes,

0:13:17.440 --> 0:13:21.160
<v Speaker 1>surely atoms means something different here. Atoms hadn't been discovered

0:13:21.200 --> 0:13:23.600
<v Speaker 1>at the time. I think it just means like small

0:13:23.640 --> 0:13:24.520
<v Speaker 1>parts or something.

0:13:24.640 --> 0:13:28.400
<v Speaker 3>Yeah, yes, yeah, that would be hilarious if it was

0:13:28.480 --> 0:13:29.960
<v Speaker 3>literally broken to atoms.

0:13:30.600 --> 0:13:32.280
<v Speaker 1>Yeah. So I don't know if it if it works though,

0:13:32.320 --> 0:13:34.720
<v Speaker 1>it sounds like it's kind of an alarm clock that explodes.

0:13:35.080 --> 0:13:38.560
<v Speaker 3>Well, but I don't understand the difference between time was

0:13:38.800 --> 0:13:41.360
<v Speaker 3>and time's past. They're both past tense.

0:13:41.920 --> 0:13:44.400
<v Speaker 1>Hmm. That's a good point. Time is, time was,

0:13:45.080 --> 0:13:47.840
<v Speaker 1>time's past. It seems like you would want the present

0:13:47.920 --> 0:13:52.439
<v Speaker 1>there somewhere, but yeah, that's what it allegedly said.

0:13:52.840 --> 0:13:57.679
<v Speaker 1>And you'll find woodcuts that have this

0:13:57.840 --> 0:13:59.240
<v Speaker 1>motif on them as well.

0:13:59.600 --> 0:14:01.160
<v Speaker 3>Like it would have made more sense if it said

0:14:01.679 --> 0:14:05.000
<v Speaker 3>the three things were time will be, time is, time was,

0:14:05.920 --> 0:14:08.440
<v Speaker 3>but this seems more like time is, time was, time

0:14:08.679 --> 0:14:09.160
<v Speaker 3>was was.

0:14:11.240 --> 0:14:14.360
<v Speaker 1>Now, Brewer's notes that references to the

0:14:14.360 --> 0:14:17.840
<v Speaker 1>Brazen Head are just common in literature, appearing frequently in

0:14:17.880 --> 0:14:23.080
<v Speaker 1>early romances but with Eastern origins, though it doesn't get

0:14:23.080 --> 0:14:26.200
<v Speaker 1>into that a lot. Elsewhere in the volume, it's also

0:14:26.320 --> 0:14:32.040
<v Speaker 1>noted that artificial heads that speak occur elsewhere as well.

0:14:32.080 --> 0:14:33.880
<v Speaker 1>And some of these are brazen heads, and some of

0:14:33.960 --> 0:14:35.680
<v Speaker 1>these are other things, but they're kind of I think

0:14:35.680 --> 0:14:38.440
<v Speaker 1>it's important to run through briefly some of these examples

0:14:38.480 --> 0:14:41.080
<v Speaker 1>because they kind of paint a picture of not only

0:14:41.120 --> 0:14:44.600
<v Speaker 1>some of these other ideas of artificial heads speaking and

0:14:44.640 --> 0:14:52.240
<v Speaker 1>telling the future, but related non technological non artifacts that

0:14:52.640 --> 0:14:57.960
<v Speaker 1>kind of help inform what we think technology can do. Okay, okay,

0:14:58.000 --> 0:14:59.920
<v Speaker 1>So one of them is a brazen head in the

0:15:00.000 --> 0:15:02.960
<v Speaker 1>possession of Pope Sylvester the Second in the tenth century,

0:15:03.240 --> 0:15:07.680
<v Speaker 1>which he also constructed, and misinterpretations of its utterances could

0:15:07.760 --> 0:15:08.720
<v Speaker 1>prove disastrous.

0:15:09.200 --> 0:15:13.280
<v Speaker 3>Oh, is this also believed to be Satanic in some way?

0:15:14.600 --> 0:15:19.920
<v Speaker 1>I didn't go too deep on Satanic implications, but possibly.

0:15:19.880 --> 0:15:22.240
<v Speaker 3>I guess it would depend on if this legend is associated

0:15:22.280 --> 0:15:25.920
<v Speaker 3>with pro Pope Sylvester or anti Pope Sylvester sources.

0:15:26.040 --> 0:15:28.680
<v Speaker 1>Right, right. But you can definitely see that there in

0:15:28.720 --> 0:15:31.040
<v Speaker 1>the head itself, regardless of what's supposed to be powering it.

0:15:31.120 --> 0:15:34.160
<v Speaker 1>Like, this ties into oracular traditions. You know,

0:15:34.240 --> 0:15:37.320
<v Speaker 1>the idea that here is this thing that can give

0:15:37.400 --> 0:15:41.120
<v Speaker 1>you cryptic wisdom if you have the wisdom to decipher

0:15:41.160 --> 0:15:43.760
<v Speaker 1>what it's telling you. Another example that's brought up in

0:15:43.800 --> 0:15:48.560
<v Speaker 1>Brewer's is the Colossi of Memnon, which we did

0:15:48.640 --> 0:15:50.240
<v Speaker 1>at least a whole I don't know, I can't remember

0:15:50.240 --> 0:15:52.560
<v Speaker 1>if it was one episode or multiple episodes, but we discussed this

0:15:53.080 --> 0:15:56.280
<v Speaker 1>on Stuff to Blow Your Mind. This is a fascinating

0:15:56.280 --> 0:15:57.400
<v Speaker 1>topic in and of itself.

0:15:57.640 --> 0:15:59.920
<v Speaker 3>This was basically, I think a statue or a pair

0:15:59.920 --> 0:16:03.320
<v Speaker 3>of statues, part of sort of a ruins complex that

0:16:03.520 --> 0:16:08.200
<v Speaker 3>was famous in Roman Egypt basically because it

0:16:08.240 --> 0:16:11.280
<v Speaker 3>would make sounds, and there were different theories about how

0:16:11.320 --> 0:16:12.600
<v Speaker 3>it made sounds and why.

0:16:12.920 --> 0:16:15.920
<v Speaker 1>Yeah, yeah, it seems like I think some said it

0:16:15.960 --> 0:16:18.320
<v Speaker 1>was capable of speech, but generally it's described as singing

0:16:18.440 --> 0:16:21.600
<v Speaker 1>or some sort of a note. And as we discussed,

0:16:21.840 --> 0:16:24.840
<v Speaker 1>while there are some I think unlikely theories regarding the

0:16:24.920 --> 0:16:28.640
<v Speaker 1>use of some sort of intentional sound generating device or devices,

0:16:28.920 --> 0:16:31.400
<v Speaker 1>it seems like a more likely explanation would have to

0:16:31.400 --> 0:16:34.880
<v Speaker 1>do with peculiarities of the stone as it heated in

0:16:34.960 --> 0:16:37.880
<v Speaker 1>the sun and then cooled at night. Anyway, go back

0:16:37.920 --> 0:16:39.200
<v Speaker 1>and listen to that episode if you want to know

0:16:39.200 --> 0:16:41.560
<v Speaker 1>about them. They have a pretty fascinating history.

0:16:42.040 --> 0:16:43.640
<v Speaker 3>We'll remember better in the original.

0:16:44.440 --> 0:16:49.480
<v Speaker 1>Yes, there's the head of Orpheus at Lesbos, predicting the

0:16:49.520 --> 0:16:53.080
<v Speaker 1>doom and death of Cyrus the Great. However, I believe

0:16:53.120 --> 0:16:55.400
<v Speaker 1>this is generally thought to be the actual head of

0:16:55.440 --> 0:16:58.280
<v Speaker 1>the hero Orpheus, after he was torn apart by the

0:16:58.280 --> 0:17:02.480
<v Speaker 1>maenads of Dionysus during a bacchanalia for the sin

0:17:02.560 --> 0:17:06.000
<v Speaker 1>of worshiping Apollo or having worshiped Apollo. I'm not sure

0:17:06.040 --> 0:17:09.400
<v Speaker 1>what the exact charge was, but still a prophetic, disembodied

0:17:09.440 --> 0:17:13.199
<v Speaker 1>head that still continues to speak. Brewer's also mentions the

0:17:13.240 --> 0:17:17.439
<v Speaker 1>head of Minos brought by Odin to Scandinavia, which I

0:17:17.480 --> 0:17:20.040
<v Speaker 1>didn't know what to make of this, because Minos is

0:17:20.119 --> 0:17:26.840
<v Speaker 1>of course the mythical king of Crete that we've discussed

0:17:26.840 --> 0:17:29.320
<v Speaker 1>on the show before as well. I think the actual

0:17:29.480 --> 0:17:33.040
<v Speaker 1>figure in reference here might be Mimir, the god of

0:17:33.080 --> 0:17:39.080
<v Speaker 1>wisdom that is beheaded in the Aesir-Vanir War. Odin claims

0:17:39.080 --> 0:17:42.240
<v Speaker 1>this head and it continues to speak secret wisdom. Again,

0:17:42.320 --> 0:17:44.560
<v Speaker 1>this is another one that's not a mechanical head. It's

0:17:44.560 --> 0:17:48.160
<v Speaker 1>the head of an actual defeated divine being that continues

0:17:48.200 --> 0:17:51.840
<v Speaker 1>to live on and to speak. There are tales of

0:17:51.920 --> 0:17:55.760
<v Speaker 1>Albertus Magnus having an earthen head, which during the thirteenth

0:17:55.840 --> 0:17:59.760
<v Speaker 1>century was said to speak and move until Thomas Aquinas

0:17:59.760 --> 0:18:02.720
<v Speaker 1>breaks it by accident, and Magnus says, there goes the

0:18:02.760 --> 0:18:07.560
<v Speaker 1>labor of thirty years, because now it's broken. So I

0:18:07.560 --> 0:18:09.880
<v Speaker 1>don't know what to make of that one either completely.

0:18:09.920 --> 0:18:13.359
<v Speaker 1>But again we see this motif of a fabulous artificial

0:18:13.359 --> 0:18:16.919
<v Speaker 1>head that speaks, that manages to break one way or another,

0:18:17.000 --> 0:18:21.280
<v Speaker 1>either something fails, somebody knocks it over, or you know,

0:18:21.359 --> 0:18:24.840
<v Speaker 1>it explodes after you hit the snooze alarm twice. Then

0:18:24.880 --> 0:18:29.000
<v Speaker 1>there's Alexander's statue of Asclepius, the Greek god of medicine,

0:18:29.240 --> 0:18:32.840
<v Speaker 1>that was said to speak, but Lucian wrote that the

0:18:32.880 --> 0:18:37.119
<v Speaker 1>sounds came via a concealed man who spoke through tubes.

0:18:37.720 --> 0:18:42.560
<v Speaker 1>So here's an example of some sort of a creation.

0:18:43.800 --> 0:18:45.240
<v Speaker 1>I guess it depends on how you look at it, either a

0:18:45.240 --> 0:18:50.920
<v Speaker 1>statue that isn't intended to speak, or through supernatural machinations speaks,

0:18:51.040 --> 0:18:54.240
<v Speaker 1>But according to Lucian, it's neither of those. It's

0:18:54.240 --> 0:18:56.720
<v Speaker 1>just tubes and some guy like hiding in the bushes

0:18:56.760 --> 0:18:59.840
<v Speaker 1>speaking through the tubes, which is still clever and still technological,

0:19:00.760 --> 0:19:01.680
<v Speaker 1>but is trickery nonetheless.

0:19:02.359 --> 0:19:05.879
<v Speaker 3>I think the Lucian you're alluding to there is

0:19:05.960 --> 0:19:07.080
<v Speaker 3>Lucian of Samosata. Is that right?

0:19:07.280 --> 0:19:09.760
<v Speaker 1>I believe so, yes.

0:19:09.800 --> 0:19:12.760
<v Speaker 3>Yeah, this is, like, this was an ancient satirist from Syria who

0:19:12.960 --> 0:19:17.560
<v Speaker 3>is quite hilarious and was kind of a skeptic debunker

0:19:17.680 --> 0:19:20.399
<v Speaker 3>of, like, the second century CE, which is

0:19:20.840 --> 0:19:24.119
<v Speaker 3>sort of strange, but he was in that mold and

0:19:24.160 --> 0:19:28.560
<v Speaker 3>he made like vicious mockery of people of all sorts

0:19:28.600 --> 0:19:32.240
<v Speaker 3>and different philosophies and stuff, and also wrote a satire

0:19:32.359 --> 0:19:34.840
<v Speaker 3>that some people have considered one of the earliest forms

0:19:34.840 --> 0:19:35.560
<v Speaker 3>of science fiction.

0:19:37.080 --> 0:19:39.480
<v Speaker 1>Now this also reminds me this is not I mean,

0:19:39.640 --> 0:19:43.000
<v Speaker 1>I guess, if memory serves, maybe it did speak. But

0:19:43.960 --> 0:19:46.800
<v Speaker 1>there was of course the man-faced serpent god Glycon

0:19:47.320 --> 0:19:50.440
<v Speaker 1>of the second century that is often held up as

0:19:50.480 --> 0:19:54.120
<v Speaker 1>being a hoax, like it was actually a puppet according

0:19:54.119 --> 0:19:59.360
<v Speaker 1>to commentators. But I've always wondered what to make of that,

0:19:59.440 --> 0:20:03.560
<v Speaker 1>because it kind of if someone is performing puppetry and

0:20:03.560 --> 0:20:07.000
<v Speaker 1>people are having an emotional or even religious reaction to it,

0:20:07.000 --> 0:20:09.280
<v Speaker 1>it kind of depends how it's presented. Right, are you

0:20:09.320 --> 0:20:13.240
<v Speaker 1>presenting Glycon the man-faced serpent as, like, this is it,

0:20:13.400 --> 0:20:16.520
<v Speaker 1>this is an actual man-faced serpent god, come take

0:20:16.520 --> 0:20:19.760
<v Speaker 1>a look, that its life is proof that he is real?

0:20:20.480 --> 0:20:23.680
<v Speaker 1>Or is it something else? Is it more like performance

0:20:23.840 --> 0:20:26.359
<v Speaker 1>or is it more like reinterpretation? You know, because you

0:20:26.400 --> 0:20:30.920
<v Speaker 1>have plenty of examples where people will carry out performances

0:20:30.920 --> 0:20:34.040
<v Speaker 1>in which people dress as divine and semi divine figures.

0:20:34.600 --> 0:20:37.240
<v Speaker 1>It's not supposed to be like, look at the proof here,

0:20:37.440 --> 0:20:40.600
<v Speaker 1>here is this hero on the stage. This means God

0:20:40.680 --> 0:20:41.880
<v Speaker 1>is real.

0:20:41.880 --> 0:20:44.440
<v Speaker 3>Funny enough, I think Glycon was also written about by Lucian of

0:20:44.480 --> 0:20:49.080
<v Speaker 3>Samosata. But I guess the crucial question is like

0:20:49.720 --> 0:20:51.840
<v Speaker 3>is there an attempt at trickery or not? Like do

0:20:52.720 --> 0:20:55.960
<v Speaker 3>you want the audience to believe there is not somebody

0:20:56.080 --> 0:20:56.960
<v Speaker 3>behind the mask?

0:20:57.480 --> 0:21:00.640
<v Speaker 1>Right? And you know that's interesting because that still kind

0:21:00.640 --> 0:21:03.080
<v Speaker 1>of applies to a lot of what's going on in

0:21:03.119 --> 0:21:06.000
<v Speaker 1>the world today with things like chatbots and

0:21:06.040 --> 0:21:09.639
<v Speaker 1>so forth. And you know, this idea that if we

0:21:10.400 --> 0:21:13.480
<v Speaker 1>know what is coming out of the box, what is

0:21:13.560 --> 0:21:17.200
<v Speaker 1>coming out of the artificial head? And you know, we

0:21:17.640 --> 0:21:20.520
<v Speaker 1>how are we interpreting it? And are we thinking there

0:21:20.560 --> 0:21:23.919
<v Speaker 1>is something there that is not. So it's like, on

0:21:23.960 --> 0:21:26.359
<v Speaker 1>what level is there trickery, and then there is like

0:21:26.560 --> 0:21:30.920
<v Speaker 1>interpretation of the trickery and so forth. But at any rate,

0:21:31.080 --> 0:21:32.920
<v Speaker 1>I think, you know, some of these examples they proved

0:21:32.960 --> 0:21:35.520
<v Speaker 1>that well before people could make any kind of a

0:21:35.560 --> 0:21:38.359
<v Speaker 1>mechanical thing, be it a head or not a head, that

0:21:38.440 --> 0:21:41.159
<v Speaker 1>could speak, we were still capable of dreaming about it.

0:21:41.560 --> 0:21:44.080
<v Speaker 1>And I think there's ample evidence that long before anyone

0:21:44.119 --> 0:21:46.919
<v Speaker 1>attempted to make a head that could talk through mechanical means,

0:21:47.400 --> 0:21:52.159
<v Speaker 1>individuals sought and sometimes found a voice emerging from disembodied heads,

0:21:52.440 --> 0:21:55.240
<v Speaker 1>either real ones, you know, the remains of

0:21:55.280 --> 0:22:00.000
<v Speaker 1>human beings or other animals, or likenesses of human heads,

0:22:00.880 --> 0:22:04.560
<v Speaker 1>either attached or detached from statues, and so forth. And

0:22:04.800 --> 0:22:07.600
<v Speaker 1>I think there's room between trickery and belief for the

0:22:07.600 --> 0:22:18.199
<v Speaker 1>suspension of disbelief and ritual as well to take into account.

0:22:19.840 --> 0:22:22.560
<v Speaker 3>But of course, later on people would end up building real,

0:22:22.760 --> 0:22:27.000
<v Speaker 3>operable machines that were at least attempting to produce speech

0:22:27.119 --> 0:22:28.920
<v Speaker 3>that could be understood by humans.

0:22:29.400 --> 0:22:31.560
<v Speaker 1>That's right, And this is where we get more into

0:22:31.640 --> 0:22:36.040
<v Speaker 1>the deconstruction of what human speech is, which in and

0:22:36.040 --> 0:22:41.400
<v Speaker 1>of itself is a whole subject, but there are key

0:22:41.480 --> 0:22:44.440
<v Speaker 1>moments where we see some major advancements being made here.

0:22:44.960 --> 0:22:48.080
<v Speaker 1>So another major entry to discuss in all of this

0:22:48.160 --> 0:22:50.919
<v Speaker 1>is the work of German born Russian doctor, physicist and

0:22:50.960 --> 0:22:55.760
<v Speaker 1>engineer Christian Gottlieb Kratzenstein, who lived seventeen twenty three through

0:22:55.840 --> 0:23:00.280
<v Speaker 1>seventeen ninety five. So he was a man of various interests,

0:22:59.800 --> 0:23:03.639
<v Speaker 1>including the use of electricity in medicine, and the

0:23:03.640 --> 0:23:06.640
<v Speaker 1>Saint Petersburg Science Academy at one point offered a prize

0:23:06.640 --> 0:23:12.000
<v Speaker 1>for advancements made in researching the mechanisms behind the vowels A, E, I, O,

0:23:12.080 --> 0:23:15.520
<v Speaker 1>and U in human speech. So in seventeen seventy nine

0:23:15.560 --> 0:23:21.320
<v Speaker 1>he presented his vowel organ to the university. The vowel

0:23:21.400 --> 0:23:25.480
<v Speaker 1>organ consisted of a series of resonators that produced vowel

0:23:25.560 --> 0:23:30.440
<v Speaker 1>like sounds on a constant pitch when excited by a reed.

0:23:31.240 --> 0:23:36.119
<v Speaker 1>I found some illustrations of these basic resonators via the

0:23:36.240 --> 0:23:44.000
<v Speaker 1>UCL Psychology and Language Sciences Department. Here I also found

0:23:44.880 --> 0:23:49.280
<v Speaker 1>a website linked at this website where you can find

0:23:49.280 --> 0:23:52.760
<v Speaker 1>instructions for how to make your own resonators out of

0:23:52.800 --> 0:23:56.879
<v Speaker 1>plumbing supplies, which I found rather insightful. I did not

0:23:56.920 --> 0:24:01.560
<v Speaker 1>attempt it, but if you're into plumbing supplies and vowel sounds,

0:24:01.800 --> 0:24:04.000
<v Speaker 1>it seems like a natural craft choice.

0:24:04.280 --> 0:24:07.399
<v Speaker 3>But the key insight being here that by changing the

0:24:07.520 --> 0:24:10.919
<v Speaker 3>shape of a physical resonating cavity, you can change the

0:24:11.119 --> 0:24:13.280
<v Speaker 3>sound of the vowel produced.

0:24:13.720 --> 0:24:16.960
<v Speaker 1>Right right. Another take on this. I was reading the

0:24:17.000 --> 0:24:20.160
<v Speaker 1>BBC Future article The Machines That Learned to Listen by

0:24:20.240 --> 0:24:24.960
<v Speaker 1>Katia Moskvitch, and it describes these as resonance tubes connected

0:24:24.960 --> 0:24:28.320
<v Speaker 1>to organ pipes. So you know, this is not to

0:24:28.359 --> 0:24:29.960
<v Speaker 1>say that we have this is not on this like

0:24:30.000 --> 0:24:32.560
<v Speaker 1>the same level as some sort of imaginary brazen head

0:24:32.600 --> 0:24:34.680
<v Speaker 1>that's going to speak of its own and spout out,

0:24:34.960 --> 0:24:37.679
<v Speaker 1>spit out wisdom for you to interpret. This is about

0:24:37.720 --> 0:24:41.320
<v Speaker 1>just figuring out, you know, how these vowel sounds are

0:24:41.440 --> 0:24:47.400
<v Speaker 1>produced and reproducing them through a basic mechanical system. Moskvitch

0:24:47.480 --> 0:24:49.560
<v Speaker 1>also points out a few other key individuals in the

0:24:49.600 --> 0:24:54.439
<v Speaker 1>advancement of this technology. There's Wolfgang von Kempelen in Vienna,

0:24:54.480 --> 0:24:57.640
<v Speaker 1>who created a similar acoustic mechanical speech machine about ten

0:24:57.720 --> 0:25:02.880
<v Speaker 1>years after Kratzenstein. And then she also mentions English inventor

0:25:03.160 --> 0:25:06.120
<v Speaker 1>Charles Wheatstone, who would improve on this in the early

0:25:06.200 --> 0:25:07.040
<v Speaker 1>nineteenth century.

0:25:07.440 --> 0:25:10.159
<v Speaker 3>Charles Wheatstone. I'm going to mention him again in a minute,

0:25:10.200 --> 0:25:13.000
<v Speaker 3>but he's also notable because he was one of the

0:25:13.040 --> 0:25:18.439
<v Speaker 3>inventors of the first commercially successful form of the telegraph.

0:25:18.920 --> 0:25:21.040
<v Speaker 3>So we talked about him in our episode on the

0:25:21.200 --> 0:25:24.399
<v Speaker 3>invention of the telegraph. But when it comes to the

0:25:24.400 --> 0:25:27.199
<v Speaker 3>one you mentioned before that, von Kempelen's machine, this is

0:25:27.240 --> 0:25:34.560
<v Speaker 3>interesting because I read that while this machine was allegedly real,

0:25:34.640 --> 0:25:37.119
<v Speaker 3>it was a real attempt to make a machine that

0:25:37.160 --> 0:25:41.720
<v Speaker 3>would speak. Von Kempelen is now known for essentially being

0:25:41.760 --> 0:25:45.320
<v Speaker 3>a hoaxer because he tried to create other automata, including

0:25:45.720 --> 0:25:49.280
<v Speaker 3>a chess playing automaton that was actually a hoax. It

0:25:49.320 --> 0:25:52.080
<v Speaker 3>had a human inside it doing the moves, so it

0:25:52.119 --> 0:25:53.640
<v Speaker 3>was a fake robot.

0:25:53.680 --> 0:25:56.920
<v Speaker 1>Though as a fake still really impressive. It's interesting where

0:25:56.960 --> 0:25:59.000
<v Speaker 1>you get in like what sometimes you're wondering. You have

0:25:59.040 --> 0:26:02.040
<v Speaker 1>to wonder what the line is between, you know, the

0:26:02.080 --> 0:26:06.040
<v Speaker 1>actual technological innovation and trickery. I mean, obviously it's deception,

0:26:06.400 --> 0:26:09.160
<v Speaker 1>and if you have a secret chamber in which there's

0:26:09.200 --> 0:26:11.800
<v Speaker 1>a whole person doing stuff, you know, that's a real

0:26:11.840 --> 0:26:15.119
<v Speaker 1>red flag there as well. But still the trickery is

0:26:15.119 --> 0:26:16.160
<v Speaker 1>pretty ingenious too.

0:26:16.640 --> 0:26:19.879
<v Speaker 3>Yeah, well, yeah, I mean it takes skill to be

0:26:19.880 --> 0:26:20.600
<v Speaker 3>a good magician.

0:26:20.960 --> 0:26:21.440
<v Speaker 1>Yeah.

0:26:21.480 --> 0:26:24.159
<v Speaker 3>Anyway, this brings us to the example that I was

0:26:24.200 --> 0:26:27.439
<v Speaker 3>really excited to talk about in today's episode, which is

0:26:27.480 --> 0:26:33.520
<v Speaker 3>the speaking machine of a nineteenth century inventor named Joseph Faber.

0:26:34.320 --> 0:26:37.160
<v Speaker 3>So one of my main sources here is just generally

0:26:37.160 --> 0:26:40.080
<v Speaker 3>a good source on the history of speech synthesis and

0:26:40.280 --> 0:26:44.639
<v Speaker 3>talking machines. It was a book chapter in the Routledge

0:26:44.640 --> 0:26:48.480
<v Speaker 3>Handbook of Phonetics from twenty nineteen by an author named

0:26:48.520 --> 0:26:52.159
<v Speaker 3>Brad H. Story, who is part of the faculty of

0:26:52.200 --> 0:26:54.840
<v Speaker 3>the Department of Speech, Language and Hearing Sciences at the

0:26:54.920 --> 0:27:00.199
<v Speaker 3>University of Arizona. And Story, in this chapter traces the

0:27:00.280 --> 0:27:03.679
<v Speaker 3>history of speech synthesis from the mechanical methods of the

0:27:03.720 --> 0:27:07.560
<v Speaker 3>eighteenth and nineteenth centuries to the digital techniques of the present.

0:27:07.600 --> 0:27:10.960
<v Speaker 3>So it's the whole sort of modern arc of these machines.

0:27:11.359 --> 0:27:13.000
<v Speaker 3>But the thing I really want to focus in on

0:27:13.040 --> 0:27:15.960
<v Speaker 3>here now is this machine that I mentioned a minute ago,

0:27:16.000 --> 0:27:21.200
<v Speaker 3>by the nineteenth century German inventor Joseph Faber. This features

0:27:21.200 --> 0:27:24.639
<v Speaker 3>heavily at the beginning of Story's chapter here. So this

0:27:24.800 --> 0:27:29.920
<v Speaker 3>machine was at various different times called the Marvelous Talking-Machine,

0:27:30.119 --> 0:27:33.320
<v Speaker 3>you've got a hyphen between talking and machine, and also the

0:27:33.600 --> 0:27:38.040
<v Speaker 3>Euphonia, from the Greek meaning good sound or sweet sound.

0:27:38.960 --> 0:27:40.960
<v Speaker 3>We'll see about that as we go on.

0:27:41.400 --> 0:27:44.800
<v Speaker 3>Rob, I included one illustration of the machine for you to

0:27:44.840 --> 0:27:46.640
<v Speaker 3>look at here. I think this may have been from

0:27:46.680 --> 0:27:49.760
<v Speaker 3>some kind of promotional material when this machine was featured

0:27:49.800 --> 0:27:51.720
<v Speaker 3>in an exhibit that I'll describe in a bit.

0:27:52.080 --> 0:27:55.439
<v Speaker 1>I love it in part because right there is this

0:27:55.600 --> 0:28:01.440
<v Speaker 1>angelic human face like right there on the machine, seemingly

0:28:01.560 --> 0:28:06.040
<v Speaker 1>as decoration or maybe tribute. I'm not sure, but I'm

0:28:06.040 --> 0:28:08.879
<v Speaker 1>not sure if it's actually necessary to the mechanics of

0:28:08.920 --> 0:28:09.320
<v Speaker 1>the device here.

0:28:09.359 --> 0:28:13.359
<v Speaker 3>I think it sort of is. I'll explain. So

0:28:13.600 --> 0:28:19.160
<v Speaker 3>Story introduces Faber's machine through the eyes of another inventor

0:28:19.200 --> 0:28:23.120
<v Speaker 3>and scientist of the day named Joseph Henry, a different Joseph,

0:28:23.520 --> 0:28:28.879
<v Speaker 3>a researcher on electromagnetic induction and also the inaugural secretary

0:28:28.920 --> 0:28:34.840
<v Speaker 3>of the Smithsonian Institution. Henry encountered Faber's Marvelous Talking Machine

0:28:34.880 --> 0:28:39.480
<v Speaker 3>at a private exhibition in Philadelphia on December twentieth, eighteen

0:28:39.640 --> 0:28:44.000
<v Speaker 3>forty five, and he described the demonstration in a letter

0:28:44.160 --> 0:28:46.959
<v Speaker 3>to a colleague named H. M. Alexander. So we have

0:28:47.080 --> 0:28:50.240
<v Speaker 3>contemporaneous notes on what it was doing and what it

0:28:50.240 --> 0:28:54.080
<v Speaker 3>looked like in this private demonstration. So here's how it worked.

0:28:54.760 --> 0:28:59.720
<v Speaker 3>It was controlled by an operator, mainly by

0:29:00.040 --> 0:29:03.880
<v Speaker 3>foot pedals and a keyboard, essentially just like an organ,

0:29:04.040 --> 0:29:07.120
<v Speaker 3>like a chamber organ, and in fact the device could

0:29:07.160 --> 0:29:10.800
<v Speaker 3>in some ways be considered a modified organ. So you

0:29:10.840 --> 0:29:14.120
<v Speaker 3>had a foot pedal that operated a bellows and that

0:29:14.120 --> 0:29:18.480
<v Speaker 3>would supply airflow to the whole system, and the bellows

0:29:18.560 --> 0:29:22.560
<v Speaker 3>pumped air through an artificial larynx that had vocal cords

0:29:22.600 --> 0:29:24.440
<v Speaker 3>that were in this source said to be made of

0:29:24.560 --> 0:29:30.520
<v Speaker 3>rubber, and so this artificial glottis or artificial vocal

0:29:30.560 --> 0:29:35.120
<v Speaker 3>cords would vibrate to produce the fundamental sound of the

0:29:35.160 --> 0:29:38.280
<v Speaker 3>machine's voice when air was flowing through them. And then

0:29:38.320 --> 0:29:42.520
<v Speaker 3>you had sixteen keys on the keyboard which were connected

0:29:42.560 --> 0:29:47.160
<v Speaker 3>by strings and levers to the various components that controlled

0:29:47.320 --> 0:29:50.120
<v Speaker 3>the shaping of that sound of that, you know, the

0:29:50.160 --> 0:29:54.000
<v Speaker 3>resonating sound from that airflow through the glottis into speech.

0:29:54.480 --> 0:29:57.040
<v Speaker 3>One of the interesting things is, as we've been saying,

0:29:57.040 --> 0:29:59.920
<v Speaker 3>this device actually had a face, so the face was

0:30:00.120 --> 0:30:03.920
<v Speaker 3>made of carved wood, essentially a large doll head, but

0:30:04.040 --> 0:30:06.800
<v Speaker 3>it had a hinged jaw, so maybe you should think

0:30:06.840 --> 0:30:09.640
<v Speaker 3>of it more like a ventriloquist dummy. You're loving this,

0:30:09.720 --> 0:30:13.560
<v Speaker 3>aren't you? Yeah, Night of the Living Dummy. But it

0:30:13.600 --> 0:30:18.000
<v Speaker 3>can actually speak. And so inside the dummy's mouth there

0:30:18.080 --> 0:30:22.360
<v Speaker 3>was an ivory tongue that could be moved around inside

0:30:22.400 --> 0:30:26.640
<v Speaker 3>the oral cavity to control the shape of the resonating chamber.

0:30:27.360 --> 0:30:30.360
<v Speaker 3>And by controlling these different elements like the mouth and

0:30:30.400 --> 0:30:33.400
<v Speaker 3>the tongue and all that with the keys on the keyboard,

0:30:34.600 --> 0:30:38.720
<v Speaker 3>it quote imposed time varying changes to the air cavity

0:30:39.040 --> 0:30:45.760
<v Speaker 3>appropriate for generating apparently convincing renditions of connected speech. So

0:30:45.800 --> 0:30:48.960
<v Speaker 3>it may not have sounded perfect or even pleasant, but

0:30:49.240 --> 0:30:53.080
<v Speaker 3>apparently people in the room could understand what the machine

0:30:53.120 --> 0:30:55.200
<v Speaker 3>was saying when Faber operated it.

0:30:55.640 --> 0:30:56.160
<v Speaker 3>So this is

0:30:56.080 --> 0:30:59.520
<v Speaker 3>eighteen forty five and the machine is speaking intelligible words.

0:31:00.360 --> 0:31:03.800
<v Speaker 3>Henry in this letter compares it favorably to a different

0:31:03.840 --> 0:31:06.600
<v Speaker 3>talking machine, one he had seen years before. This was

0:31:06.640 --> 0:31:08.760
<v Speaker 3>one of the ones you mentioned, Rob, the one built

0:31:08.800 --> 0:31:12.200
<v Speaker 3>by the English scientist and inventor Charles Wheatstone. Again, the

0:31:12.520 --> 0:31:18.760
<v Speaker 3>telegraph guy. Wheatstone's talking machine was capable of being understood

0:31:18.920 --> 0:31:22.240
<v Speaker 3>for the set of words it could produce, but Faber's

0:31:22.280 --> 0:31:27.400
<v Speaker 3>machine was far superior because its speech repertoire was infinitely variable,

0:31:27.440 --> 0:31:31.280
<v Speaker 3>so it could speak whole sentences, and those sentences could

0:31:31.320 --> 0:31:34.360
<v Speaker 3>contain any words and any sounds you wanted, as long

0:31:34.400 --> 0:31:37.440
<v Speaker 3>as they were in one of the covered languages. Obviously

0:31:37.480 --> 0:31:40.840
<v Speaker 3>it couldn't do, you know, like tonal languages or like

0:31:41.000 --> 0:31:44.120
<v Speaker 3>speak Mandarin or something, but it seems like mainly it

0:31:44.160 --> 0:31:46.520
<v Speaker 3>was speaking German and English. It was said at the

0:31:46.520 --> 0:31:49.400
<v Speaker 3>time that it could speak any European language. Now, I

0:31:49.440 --> 0:31:51.680
<v Speaker 3>think one thing that's really worth noting here is that

0:31:51.800 --> 0:31:54.240
<v Speaker 3>if you imagine how a machine like this would work,

0:31:54.800 --> 0:31:59.880
<v Speaker 3>the success of the performance would depend heavily on the

0:32:00.240 --> 0:32:04.120
<v Speaker 3>skill of the operator, since the speech patterns are not

0:32:04.320 --> 0:32:09.160
<v Speaker 3>like programmed, and you know, it's not sort of expressed automatically,

0:32:09.720 --> 0:32:13.760
<v Speaker 3>but expressed in real time by the player operating the

0:32:13.760 --> 0:32:16.000
<v Speaker 3>bellows and the keys. And I think also there were

0:32:16.040 --> 0:32:19.640
<v Speaker 3>some screws and stuff that would manipulate pitch and things

0:32:19.680 --> 0:32:22.280
<v Speaker 3>like that, So you have to play this just like

0:32:22.360 --> 0:32:26.600
<v Speaker 3>you would play a musical instrument. So different players using

0:32:26.640 --> 0:32:30.840
<v Speaker 3>the same machine would probably produce fairly different sounding speech,

0:32:31.240 --> 0:32:34.200
<v Speaker 3>even if they had memorized which keys corresponded to which

0:32:34.240 --> 0:32:38.600
<v Speaker 3>phonetic units. So nobody I've read says this, but you know,

0:32:38.760 --> 0:32:42.120
<v Speaker 3>I'm kind of picturing Faber as a sort of Phantom

0:32:42.160 --> 0:32:44.480
<v Speaker 3>of the Opera at the organ keyboard. You know,

0:32:44.520 --> 0:32:46.440
<v Speaker 3>he's not just like pressing the keys, but giving a

0:32:46.480 --> 0:32:51.000
<v Speaker 3>real passionate and dramatic performance when somebody sells it. Yeah, yeah,

0:32:51.120 --> 0:32:54.680
<v Speaker 3>make it say comment allez-vous or whatever. It also

0:32:54.720 --> 0:32:56.760
<v Speaker 3>sang songs, by the way. I'll get into that in

0:32:56.800 --> 0:32:59.120
<v Speaker 3>a minute. But I was wondering, what were

0:32:59.400 --> 0:33:01.840
<v Speaker 3>people asking? You know, what's the equivalent in eighteen

0:33:02.000 --> 0:33:04.720
<v Speaker 3>forty five of yelling out, you know, play Freebird? And

0:33:04.760 --> 0:33:07.080
<v Speaker 3>I was thinking, maybe it's people yelling for Tippecanoe

0:33:07.120 --> 0:33:07.840
<v Speaker 3>and Tyler Too.

0:33:08.400 --> 0:33:09.880
<v Speaker 1>Oh yeah.

0:33:10.040 --> 0:33:12.960
<v Speaker 3>So an interesting detail that Story includes in this chapter

0:33:13.160 --> 0:33:16.200
<v Speaker 3>is that this was not the first time Faber had

0:33:16.240 --> 0:33:19.080
<v Speaker 3>built a talking machine. In fact, this was not the

0:33:19.080 --> 0:33:23.040
<v Speaker 3>first time Faber had built this exact talking machine. There

0:33:23.160 --> 0:33:26.040
<v Speaker 3>was an earlier version of it that was destroyed by

0:33:26.160 --> 0:33:30.080
<v Speaker 3>Faber himself, quote, in a bout of depression and intoxication.

0:33:30.960 --> 0:33:33.400
<v Speaker 3>I should say that nearly every source I read on

0:33:33.480 --> 0:33:38.200
<v Speaker 3>Faber mentions something about him being disheveled or even haunted,

0:33:38.840 --> 0:33:42.719
<v Speaker 3>obsessed with his machine, and generally emotionally unwell or at

0:33:42.720 --> 0:33:45.240
<v Speaker 3>the very least having a really rough time a lot

0:33:45.240 --> 0:33:49.280
<v Speaker 3>of the time. Multiple writers describe him in terms containing

0:33:49.280 --> 0:33:54.080
<v Speaker 3>a lot of pity. But so, it took Faber apparently

0:33:54.160 --> 0:33:57.880
<v Speaker 3>twenty years to perfect the first version of the machine,

0:33:57.920 --> 0:34:01.400
<v Speaker 3>the one that he drunkenly destroyed, but he was able

0:34:01.480 --> 0:34:04.480
<v Speaker 3>to recreate the second version within a year of that.

0:34:05.000 --> 0:34:07.360
<v Speaker 3>And this kind of suggests to me the possibility that

0:34:07.960 --> 0:34:11.560
<v Speaker 3>the original creation of the machine may have really been

0:34:11.880 --> 0:34:15.600
<v Speaker 3>a project of fundamental research about phonetics more than it

0:34:15.719 --> 0:34:19.280
<v Speaker 3>was about engineering. And so once he had the knowledge

0:34:19.280 --> 0:34:22.080
<v Speaker 3>in hand of how each sound was produced, like what

0:34:22.160 --> 0:34:25.000
<v Speaker 3>the shape of the oral cavity, you know, how that

0:34:25.040 --> 0:34:29.120
<v Speaker 3>corresponded to the sounds, recreating the machine itself might have

0:34:29.200 --> 0:34:32.080
<v Speaker 3>been a relatively simple proposition. Really, what you needed

0:34:32.160 --> 0:34:36.240
<v Speaker 3>was the knowledge about how phonetics correspond to physical shapes.

0:34:36.800 --> 0:34:38.840
<v Speaker 1>Yeah, And if he had that, and certainly if he

0:34:38.880 --> 0:34:42.400
<v Speaker 1>had notes on the matter and his designs recorded, it

0:34:42.440 --> 0:34:45.600
<v Speaker 1>would be easier to come back and reproduce that. Yeah.

0:34:46.360 --> 0:34:51.360
<v Speaker 3>So Joseph Henry's letter about Faber's talking machine demonstration, it

0:34:51.400 --> 0:34:55.320
<v Speaker 3>also includes speculation about the uses to which a machine

0:34:55.360 --> 0:34:59.080
<v Speaker 3>like this could be put. One interesting idea he has is,

0:34:59.280 --> 0:35:02.080
<v Speaker 3>what if you could take a spoken message at one

0:35:02.120 --> 0:35:07.080
<v Speaker 3>location and code that spoken message into inputs on this

0:35:07.239 --> 0:35:13.120
<v Speaker 3>keyboard on this machine, and then, through electromagnetic means, transmit

0:35:13.239 --> 0:35:17.920
<v Speaker 3>those keystrokes across wires to a totally separate second location,

0:35:18.680 --> 0:35:22.480
<v Speaker 3>and then those electrical signals could operate the speech organs

0:35:22.520 --> 0:35:26.040
<v Speaker 3>of the doll faced machine. In the second location. You

0:35:26.080 --> 0:35:32.480
<v Speaker 3>would essentially be transmitting speech itself across great distance. Notable

0:35:32.480 --> 0:35:36.080
<v Speaker 3>that Henry's idea here is roughly thirty years before Alexander

0:35:36.080 --> 0:35:39.560
<v Speaker 3>Graham Bell demonstrates the principle of the telephone. But there

0:35:39.600 --> 0:35:43.400
<v Speaker 3>is a very important difference, which is that while Bell's

0:35:43.440 --> 0:35:46.960
<v Speaker 3>telephone, and these are Story's words here, quote, transmitted an

0:35:47.000 --> 0:35:52.080
<v Speaker 3>electrical analog of the speech pressure wave, Henry's description alluded

0:35:52.120 --> 0:35:57.400
<v Speaker 3>to representing speech in compressed form based on slowly varying

0:35:57.480 --> 0:36:01.319
<v Speaker 3>movements of the operator's hands, fingers, and feet as they

0:36:01.360 --> 0:36:05.319
<v Speaker 3>formed the keystroke sequences required to produce an utterance, a

0:36:05.360 --> 0:36:09.120
<v Speaker 3>signal processing technique that would not be implemented into telephone

0:36:09.160 --> 0:36:13.600
<v Speaker 3>transmission systems for nearly another century. So the interesting thing

0:36:13.640 --> 0:36:17.279
<v Speaker 3>about Henry here is that he's not just imagining converting

0:36:17.320 --> 0:36:20.600
<v Speaker 3>the sound of a voice into an impulse that travels

0:36:20.640 --> 0:36:24.719
<v Speaker 3>along the wire. He's imagining a coding process. It's put

0:36:24.840 --> 0:36:28.160
<v Speaker 3>into code for the transmission and then decoded by the

0:36:28.200 --> 0:36:29.399
<v Speaker 3>machine at the other end.

0:36:30.080 --> 0:36:33.560
<v Speaker 1>I can't help but try to imagine this alternate past

0:36:33.840 --> 0:36:38.000
<v Speaker 1>in which instead of early telephones, people all had this

0:36:38.760 --> 0:36:43.560
<v Speaker 1>weird cherub head mounted on the wall that then speaks

0:36:43.600 --> 0:36:47.680
<v Speaker 1>to you in this, I'm assuming, slightly haunting voice.

0:36:47.760 --> 0:36:50.880
<v Speaker 3>Oh, I'll get to the haunting voice in a second, but anyway,

0:36:50.960 --> 0:36:54.520
<v Speaker 3>Story flags it as historically significant that this one invention

0:36:54.680 --> 0:36:58.960
<v Speaker 3>had both succeeded in producing generally intelligible synthetic speech to

0:36:59.040 --> 0:37:02.279
<v Speaker 3>people in the room with it, and it had inspired

0:37:02.360 --> 0:37:05.480
<v Speaker 3>at least one onlooker to start considering ideas for the

0:37:05.560 --> 0:37:09.560
<v Speaker 3>electrical transmission of low bandwidth speech from one place to another.

0:37:10.480 --> 0:37:14.359
<v Speaker 3>But neither of these possibilities really went anywhere. Henry did

0:37:14.440 --> 0:37:18.239
<v Speaker 3>not devote any more effort to musing about the electrical transmission,

0:37:18.840 --> 0:37:22.880
<v Speaker 3>and Faber's machine ended up being a circus sideshow

0:37:23.040 --> 0:37:28.480
<v Speaker 3>almost literally. So after this, Faber needed money, and beginning

0:37:28.480 --> 0:37:31.880
<v Speaker 3>in eighteen forty six, to get money, he signed on

0:37:32.080 --> 0:37:35.279
<v Speaker 3>to demonstrate his machine for P. T. Barnum. Gotta have

0:37:35.320 --> 0:37:39.319
<v Speaker 3>something for everybody, even people who want a talking doll

0:37:39.400 --> 0:37:44.360
<v Speaker 3>head operated by a disheveled German organ master. So Faber

0:37:44.400 --> 0:37:48.759
<v Speaker 3>committed to exhibit the marvelous Speaking Machine for Barnum at

0:37:48.800 --> 0:37:52.400
<v Speaker 3>the Egyptian Hall in London. This was like a general

0:37:52.480 --> 0:37:56.280
<v Speaker 3>exhibition hall in Piccadilly which hosted all kinds of shows,

0:37:56.320 --> 0:37:58.680
<v Speaker 3>but I think, especially in the latter part of the

0:37:58.760 --> 0:38:01.680
<v Speaker 3>nineteenth century, it was known for showing like a lot

0:38:01.719 --> 0:38:07.440
<v Speaker 3>of mountebanks and fraudulent spiritualist demonstrators. Yeah, I'll reveal to

0:38:07.480 --> 0:38:10.359
<v Speaker 3>you that you're actually a reincarnation of Cleopatra.

0:38:11.760 --> 0:38:12.200
<v Speaker 1>Lucky you.

0:38:12.840 --> 0:38:15.160
<v Speaker 3>But by noting that that's just a random thing, I'm

0:38:15.200 --> 0:38:18.279
<v Speaker 3>not trying to cast aspersions on Faber because I want

0:38:18.320 --> 0:38:21.480
<v Speaker 3>to stress that it seems totally clear that Faber was

0:38:21.640 --> 0:38:24.680
<v Speaker 3>no con artist. As best we can tell, his machine

0:38:24.760 --> 0:38:29.160
<v Speaker 3>really did work, and when played correctly, it did really

0:38:29.200 --> 0:38:34.080
<v Speaker 3>speak original sentences that people could, for the most part understand. Though,

0:38:34.200 --> 0:38:38.200
<v Speaker 3>one thing that emerges from reading descriptions of this is

0:38:38.239 --> 0:38:44.520
<v Speaker 3>that coding intelligible information and sounding like speech are two

0:38:44.560 --> 0:38:48.080
<v Speaker 3>completely different things. So it seems that a lot of

0:38:48.080 --> 0:38:51.880
<v Speaker 3>people could tell what the machine was saying, but still

0:38:52.000 --> 0:38:55.920
<v Speaker 3>they were not very impressed by what they heard. And

0:38:55.960 --> 0:39:00.840
<v Speaker 3>I found a spectacularly evocative description of what the machine

0:39:00.880 --> 0:39:04.000
<v Speaker 3>was like, recorded in a book called Instruments and

0:39:04.040 --> 0:39:08.040
<v Speaker 3>the Imagination by Thomas L. Hankins and Robert J. Silverman,

0:39:08.200 --> 0:39:11.759
<v Speaker 3>Princeton University Press, nineteen ninety nine. But the main thing

0:39:11.800 --> 0:39:14.759
<v Speaker 3>here is that they're quoting a person who saw the

0:39:14.760 --> 0:39:17.640
<v Speaker 3>machine in person in eighteen forty six, I believe, and

0:39:17.880 --> 0:39:20.400
<v Speaker 3>then wrote about it in a memoir. But generally the

0:39:20.440 --> 0:39:22.960
<v Speaker 3>authors here they note that there were like some satirical

0:39:23.080 --> 0:39:27.080
<v Speaker 3>articles making reference to Faber's machine, suggesting, for example, that

0:39:27.200 --> 0:39:29.759
<v Speaker 3>it could be used to replace the speaker of the

0:39:29.800 --> 0:39:34.759
<v Speaker 3>House of Commons. Yuk yuk, those wacky politicians. But then

0:39:34.800 --> 0:39:37.160
<v Speaker 3>they well, they do kind of make a funny point. Actually,

0:39:37.239 --> 0:39:39.279
<v Speaker 3>they say, like you could just program it to say

0:39:39.440 --> 0:39:41.520
<v Speaker 3>order order at ten minute intervals.

0:39:43.480 --> 0:39:45.200
<v Speaker 1>Well that's pretty good, that's funny today.

0:39:45.600 --> 0:39:48.399
<v Speaker 3>Yeah. But anyway, then there's a part of the book

0:39:48.400 --> 0:39:52.359
<v Speaker 3>where they're including this evocative written account which is from

0:39:52.520 --> 0:39:57.560
<v Speaker 3>a London theater manager named John Hollingshead who saw this

0:39:57.640 --> 0:40:00.400
<v Speaker 3>machine in person when he was nineteen years old and

0:40:00.440 --> 0:40:03.239
<v Speaker 3>then wrote about it in a memoir or some book.

0:40:03.520 --> 0:40:08.200
<v Speaker 3>But anyway, this is Hollingshead's account. The exhibitor, Professor Faber,

0:40:08.640 --> 0:40:12.280
<v Speaker 3>was a sad faced man, dressed in respectable, well worn

0:40:12.320 --> 0:40:16.440
<v Speaker 3>clothes that were soiled by contact with tools, wood and machinery.

0:40:17.040 --> 0:40:20.600
<v Speaker 3>The room looked like a laboratory and workshop, which it was.

0:40:21.200 --> 0:40:23.960
<v Speaker 3>The professor was not too clean, and his hair and

0:40:24.040 --> 0:40:27.680
<v Speaker 3>beard sadly wanted the attention of a barber. I have

0:40:27.760 --> 0:40:30.439
<v Speaker 3>no doubt that he slept in the same room as

0:40:30.440 --> 0:40:33.720
<v Speaker 3>his figure, his scientific Frankenstein Monster.

0:40:34.320 --> 0:40:34.520
<v Speaker 1>Note.

0:40:34.520 --> 0:40:36.319
<v Speaker 3>I guess the novel would have only been a few

0:40:36.320 --> 0:40:38.000
<v Speaker 3>decades old at this time.

0:40:38.400 --> 0:40:40.879
<v Speaker 1>Yeah, yeah, eighteen eighteen on Frankenstein there.

0:40:41.040 --> 0:40:44.680
<v Speaker 3>Yeah, sorry, going on with Hollingshead: and I felt

0:40:44.680 --> 0:40:47.399
<v Speaker 3>the secret influence of an idea that the two were

0:40:47.520 --> 0:40:49.600
<v Speaker 3>destined to live and die together.

0:40:50.160 --> 0:40:53.839
<v Speaker 1>Oh my god, those are pretty strong words. Yes.

0:40:54.880 --> 0:40:58.320
<v Speaker 3>The professor, with a slight German accent, put his wonderful

0:40:58.360 --> 0:41:02.000
<v Speaker 3>toy in motion. He explained its action. It was not

0:41:02.200 --> 0:41:06.400
<v Speaker 3>necessary to prove the absence of deception. One keyboard, touched

0:41:06.440 --> 0:41:10.279
<v Speaker 3>by the professor, produced words which slowly and deliberately, in

0:41:10.360 --> 0:41:14.440
<v Speaker 3>a hoarse, sepulchral voice, came from the mouth of the figure,

0:41:14.680 --> 0:41:17.640
<v Speaker 3>as if from the depths of a tomb. It wanted

0:41:17.680 --> 0:41:21.160
<v Speaker 3>little imagination to make the very few visitors believe that

0:41:21.200 --> 0:41:25.480
<v Speaker 3>the figure contained an imprisoned human or half human being

0:41:26.200 --> 0:41:30.719
<v Speaker 3>bound to speak slowly when tormented by the unseen power outside.

0:41:31.480 --> 0:41:33.640
<v Speaker 3>No one thought for a moment that they were being

0:41:33.680 --> 0:41:37.520
<v Speaker 3>fooled by a second edition of the Invisible Girl fraud.

0:41:38.440 --> 0:41:41.240
<v Speaker 3>And by the way, the reference to the Invisible Girl fraud,

0:41:41.280 --> 0:41:44.120
<v Speaker 3>I believe is about the many fake machines and fake

0:41:44.160 --> 0:41:47.520
<v Speaker 3>automata that were actually worked by having a human hidden

0:41:47.520 --> 0:41:50.800
<v Speaker 3>inside operating it. But going on, so Hollingshead says,

0:41:51.160 --> 0:41:53.759
<v Speaker 3>nobody thought that there was an invisible girl operating. This

0:41:53.880 --> 0:41:57.440
<v Speaker 3>is clear, this is real. He goes on. There were truth,

0:41:57.760 --> 0:42:01.200
<v Speaker 3>laborious invention, and good faith in every part of the

0:42:01.239 --> 0:42:05.239
<v Speaker 3>melancholy room. As a crowning display, the head sang a

0:42:05.280 --> 0:42:10.440
<v Speaker 3>sepulchral version of God Save the Queen, which suggested, inevitably,

0:42:10.680 --> 0:42:14.799
<v Speaker 3>God save the inventor. This extraordinary effect was achieved by

0:42:14.840 --> 0:42:18.319
<v Speaker 3>the professor working two keyboards, one for the words and

0:42:18.360 --> 0:42:22.239
<v Speaker 3>one for the music. Never probably before or since, has

0:42:22.280 --> 0:42:26.279
<v Speaker 3>the national anthem been so sung. Sadder and wiser, I

0:42:26.680 --> 0:42:30.360
<v Speaker 3>and the few visitors crept slowly from the place, leaving

0:42:30.400 --> 0:42:33.560
<v Speaker 3>the Professor with his one and only treasure, his child

0:42:33.640 --> 0:42:36.320
<v Speaker 3>of infinite labor and unmeasurable sorrow.

0:42:36.760 --> 0:42:41.279
<v Speaker 1>Oh wow, that is a lot. I mean, obviously he

0:42:41.680 --> 0:42:44.160
<v Speaker 1>lays it on really thick about the sadness of the

0:42:44.160 --> 0:42:47.120
<v Speaker 1>inventor here. And then also there's the idea, like, this

0:42:47.239 --> 0:42:49.960
<v Speaker 1>was no hoax, this was real and it was depressing.

0:42:50.360 --> 0:42:54.840
<v Speaker 3>Yeah, it's a weird mix of like like pity but

0:42:55.080 --> 0:42:59.319
<v Speaker 3>real admiration, you know that, Like, there's something beautiful and

0:42:59.440 --> 0:43:02.799
<v Speaker 3>honest and true about this machine and his devotion to

0:43:02.840 --> 0:43:05.520
<v Speaker 3>it and the genius it took to create it. But

0:43:05.640 --> 0:43:09.600
<v Speaker 3>also it makes everybody feel bad and nobody wants to

0:43:09.600 --> 0:43:11.880
<v Speaker 3>look at it or listen to it, and everybody leaves

0:43:11.880 --> 0:43:16.400
<v Speaker 3>feeling depressed. Yeah, something about that struck me as actually

0:43:16.760 --> 0:43:19.400
<v Speaker 3>quite poignant and meaningful. Maybe we can come back to

0:43:19.440 --> 0:43:21.600
<v Speaker 3>that in a minute, but I did want to flag

0:43:21.640 --> 0:43:25.040
<v Speaker 3>that there was one notable visitor who, coming back to

0:43:25.040 --> 0:43:29.080
<v Speaker 3>the Invisible Girl suspicion, he did at first suspect fraud,

0:43:29.200 --> 0:43:32.359
<v Speaker 3>and that was the Duke of Wellington. I was reading

0:43:32.360 --> 0:43:34.680
<v Speaker 3>about this in a book called The Shows of London

0:43:34.760 --> 0:43:39.440
<v Speaker 3>by Richard Daniel Altick, and Altick recounts that Wellington,

0:43:39.760 --> 0:43:42.719
<v Speaker 3>when he first went to the demonstration, he was so

0:43:42.880 --> 0:43:47.160
<v Speaker 3>impressed by Faber's speaking machine that he asked to be

0:43:47.239 --> 0:43:50.000
<v Speaker 3>allowed to touch the keys with his own fingers, you know,

0:43:50.080 --> 0:43:52.799
<v Speaker 3>so he could see that it was genuine. And then

0:43:52.920 --> 0:43:55.239
<v Speaker 3>he did confirm that it was genuine, and then he

0:43:55.320 --> 0:43:58.239
<v Speaker 3>insisted that he'd be taught how to use it. So

0:43:58.480 --> 0:44:01.520
<v Speaker 3>Faber taught the Duke to play the machine in both

0:44:01.600 --> 0:44:05.560
<v Speaker 3>German and English, and Wellington did get it like he could.

0:44:05.600 --> 0:44:08.080
<v Speaker 3>He could make it speak sentences in German and English,

0:44:08.200 --> 0:44:11.120
<v Speaker 3>and he was amazed, writing in the visitor's log of

0:44:11.200 --> 0:44:15.239
<v Speaker 3>the exhibit that the speaking machine, or the Euphonia, was

0:44:15.360 --> 0:44:30.040
<v Speaker 3>quote an extraordinary production of mechanical genius. Faber's machine also

0:44:30.080 --> 0:44:33.840
<v Speaker 3>got rave reviews in The Times, in the Illustrated London News.

0:44:33.880 --> 0:44:37.440
<v Speaker 3>A lot of people like looked at it and they

0:44:37.920 --> 0:44:40.440
<v Speaker 3>thought that, like, yeah, this is a work of genius.

0:44:40.280 --> 0:44:43.959
<v Speaker 3>It's incredible that he's done this. But at the same time,

0:44:44.560 --> 0:44:48.680
<v Speaker 3>audiences really were not into it. Barnum himself noticed that

0:44:48.719 --> 0:44:52.040
<v Speaker 3>Faber's machine was not attracting crowds, it was not selling

0:44:52.080 --> 0:44:56.520
<v Speaker 3>tickets and not generating revenue, and so eventually he took

0:44:56.600 --> 0:45:00.360
<v Speaker 3>Faber's machine out of the Egyptian Hall in London

0:45:00.880 --> 0:45:03.680
<v Speaker 3>and added it to a traveling exhibit that went around

0:45:03.719 --> 0:45:08.480
<v Speaker 3>the English countryside doing performances. And from here Faber himself

0:45:08.520 --> 0:45:11.759
<v Speaker 3>seems to kind of disappear from the historical record. Some

0:45:11.840 --> 0:45:15.439
<v Speaker 3>sources indicate that he may have died by suicide during

0:45:15.480 --> 0:45:18.880
<v Speaker 3>this period, though that isn't known for sure. But after

0:45:19.320 --> 0:45:23.480
<v Speaker 3>historical sources stopped mentioning Faber himself, they still make references

0:45:23.520 --> 0:45:27.600
<v Speaker 3>to his machine, reading from Story here, quote: Although his

0:45:27.760 --> 0:45:31.440
<v Speaker 3>talking machine continued to make sideshow-like appearances in

0:45:31.480 --> 0:45:35.000
<v Speaker 3>Europe and North America over the next thirty years. It

0:45:35.080 --> 0:45:38.239
<v Speaker 3>seems a relative, perhaps a niece or nephew, may have

0:45:38.320 --> 0:45:41.600
<v Speaker 3>inherited the machine and performed with it to generate income.

0:45:42.600 --> 0:45:45.800
<v Speaker 1>So maybe, no matter whatever happened to him, maybe a

0:45:45.840 --> 0:45:49.160
<v Speaker 1>relative with a little more showmanship like stepped in and

0:45:49.360 --> 0:45:51.640
<v Speaker 1>was able to make at least some sort of an

0:45:51.680 --> 0:45:52.440
<v Speaker 1>income off of it.

0:45:53.000 --> 0:45:55.680
<v Speaker 3>Yes, But then again, like I'm struck by the strange

0:45:55.719 --> 0:46:02.400
<v Speaker 3>ironic sadness of this, this was actually a scientifically significant invention,

0:46:02.719 --> 0:46:06.440
<v Speaker 3>like he had done something kind of amazing, but it

0:46:06.560 --> 0:46:10.040
<v Speaker 3>just never really went anywhere under his mastery. And then yeah,

0:46:10.120 --> 0:46:15.280
<v Speaker 3>maybe a relative was a better carnival barker, essentially, to

0:46:15.320 --> 0:46:17.759
<v Speaker 3>perform with the machine and make some money off of it.

0:46:17.920 --> 0:46:20.239
<v Speaker 1>I mean, it reminds me of so many advancements in

0:46:20.440 --> 0:46:25.960
<v Speaker 1>say robotics that we've seen over the years, where oftentimes,

0:46:26.160 --> 0:46:28.920
<v Speaker 1>you know, to a certain extent, unfairly, there'll just be

0:46:29.000 --> 0:46:31.560
<v Speaker 1>one little clip of it that goes viral and people

0:46:31.600 --> 0:46:35.880
<v Speaker 1>react to be it some sort of you know, human

0:46:36.000 --> 0:46:39.080
<v Speaker 1>likeness with facial features that seem to be moving or

0:46:39.120 --> 0:46:42.440
<v Speaker 1>operating in an uncanny way, or something like the various

0:46:43.239 --> 0:46:48.680
<v Speaker 1>dog robots from Boston Dynamics that are very impressive but

0:46:48.760 --> 0:46:52.560
<v Speaker 1>also may be interpreted as being a bit creepy. And so

0:46:52.760 --> 0:46:54.680
<v Speaker 1>even though they are these, you know, they are often

0:46:54.920 --> 0:47:00.440
<v Speaker 1>examples of a real impressive technological advancement. Setting aside

0:47:00.760 --> 0:47:04.160
<v Speaker 1>actual applications, you can have a situation where something like

0:47:04.200 --> 0:47:10.359
<v Speaker 1>that is not as comforting, not as entertaining as say

0:47:10.360 --> 0:47:13.360
<v Speaker 1>an act of puppetry or even an act of just

0:47:13.719 --> 0:47:17.520
<v Speaker 1>outright well, maybe not fraud, but say a robot or

0:47:17.560 --> 0:47:23.480
<v Speaker 1>a costume depicting a robot may be ultimately, maybe, more reassuring,

0:47:23.560 --> 0:47:25.799
<v Speaker 1>maybe more fun compared to the actual thing.

0:47:26.360 --> 0:47:28.680
<v Speaker 3>Well yeah, which may which may just be fun or

0:47:28.719 --> 0:47:31.520
<v Speaker 3>may in fact be fraud, depending on what exactly they're

0:47:31.520 --> 0:47:34.840
<v Speaker 3>saying about it. Yeah, but this is a great point

0:47:34.880 --> 0:47:36.680
<v Speaker 3>and it brings me to I just wanted to mention

0:47:36.719 --> 0:47:40.120
<v Speaker 3>a few of the the general notes about the history

0:47:40.160 --> 0:47:43.959
<v Speaker 3>of speech synthesis from the end of this this book

0:47:44.040 --> 0:47:47.719
<v Speaker 3>chapter by Brad Story. Story writes that, you know, while

0:47:47.719 --> 0:47:52.200
<v Speaker 3>there are technological use cases for speech synthesizers, we've you know,

0:47:52.239 --> 0:47:55.600
<v Speaker 3>we've got a number of them operating in consumer technology today,

0:47:56.520 --> 0:48:00.360
<v Speaker 3>and even before you had, you know, personal digital assistants

0:48:00.400 --> 0:48:03.680
<v Speaker 3>and stuff, there would be use cases for speech synthesizers,

0:48:04.239 --> 0:48:07.440
<v Speaker 3>for example, people who have a disability that makes it

0:48:07.480 --> 0:48:10.960
<v Speaker 3>difficult or impossible for them to speak. Another one is

0:48:11.000 --> 0:48:13.880
<v Speaker 3>that apparently this was actually used by the Allies in

0:48:13.920 --> 0:48:16.480
<v Speaker 3>World War Two. There were some forms of speech synthesis

0:48:16.480 --> 0:48:22.600
<v Speaker 3>that would allow sort of covert coded transmissions of something

0:48:22.800 --> 0:48:24.480
<v Speaker 3>like a phone call, So you could have a phone

0:48:24.480 --> 0:48:28.040
<v Speaker 3>call between like FDR and Winston Churchill. It's not really

0:48:28.040 --> 0:48:32.400
<v Speaker 3>a phone call. It's like a transmitted synthesized bit of speech,

0:48:32.520 --> 0:48:35.600
<v Speaker 3>and so it's very secure, but it doesn't sound like

0:48:35.640 --> 0:48:38.560
<v Speaker 3>the person talking. It sounds maybe more like the euphonia,

0:48:38.920 --> 0:48:43.320
<v Speaker 3>kind of robotic and unnatural and maybe making the presidents

0:48:43.560 --> 0:48:47.520
<v Speaker 3>giggle a bit, a president, prime minister. But anyway, so

0:48:48.000 --> 0:48:50.439
<v Speaker 3>what Story says is that a large number of these

0:48:50.480 --> 0:48:55.040
<v Speaker 3>systems have actually been primarily used as research tools, as

0:48:55.080 --> 0:49:00.520
<v Speaker 3>scientific tools for understanding the nature of human speech. It's in

0:49:00.600 --> 0:49:05.000
<v Speaker 3>trying to reproduce human speech and failing at it, that

0:49:05.080 --> 0:49:09.040
<v Speaker 3>we come closer to understanding how speech actually works in

0:49:09.560 --> 0:49:12.600
<v Speaker 3>the human body. But the second general observation that I

0:49:12.600 --> 0:49:15.239
<v Speaker 3>thought is interesting, and this seems to be very much

0:49:15.280 --> 0:49:20.280
<v Speaker 3>reflected in the Faber's machine example. It is much easier

0:49:20.320 --> 0:49:24.359
<v Speaker 3>to create a machine that can speak intelligibly than one

0:49:24.400 --> 0:49:29.440
<v Speaker 3>that can speak naturally. So that indicates that when we talk,

0:49:29.600 --> 0:49:32.640
<v Speaker 3>there's actually more than one thing going on. Yes, we

0:49:32.800 --> 0:49:37.960
<v Speaker 3>are conveying mental information coded in words, and the substance

0:49:38.000 --> 0:49:41.360
<v Speaker 3>of that coding is phonetic. It's a series of sounds.

0:49:41.400 --> 0:49:44.879
<v Speaker 3>But of course, you know, the ironic thing to people

0:49:44.880 --> 0:49:47.120
<v Speaker 3>who are used to thinking about words as text is

0:49:47.120 --> 0:49:50.680
<v Speaker 3>that the phonetic core of language long predates writing, so

0:49:50.800 --> 0:49:53.760
<v Speaker 3>like the written text of a word is a visual

0:49:53.840 --> 0:49:56.400
<v Speaker 3>code for the sound of the word, which is the

0:49:56.440 --> 0:50:00.120
<v Speaker 3>code for its meaning. But anyway, so machines for hundreds

0:50:00.200 --> 0:50:03.720
<v Speaker 3>of years have been able to produce more or less

0:50:04.040 --> 0:50:07.399
<v Speaker 3>intelligible phonetic code. They can speak words, and people can

0:50:07.520 --> 0:50:11.520
<v Speaker 3>understand what the words are supposed to be. But it

0:50:11.560 --> 0:50:15.920
<v Speaker 3>doesn't necessarily mean that people perceive these machines as speaking,

0:50:16.520 --> 0:50:20.239
<v Speaker 3>because there's another important quality to speech that was not

0:50:20.320 --> 0:50:23.160
<v Speaker 3>really captured by these early machines, and you could argue

0:50:23.239 --> 0:50:27.080
<v Speaker 3>is still somewhat lacking in the best speech synthesis of today,

0:50:27.520 --> 0:50:31.640
<v Speaker 3>and that is the natural character of continuous speech. These

0:50:31.680 --> 0:50:36.520
<v Speaker 3>machines always produced speech that sounded stilted, unreal, alien. It

0:50:36.560 --> 0:50:39.400
<v Speaker 3>was never something that would make you feel like you

0:50:39.440 --> 0:50:42.799
<v Speaker 3>were actually being talked to, as much as sort of

0:50:42.880 --> 0:50:48.759
<v Speaker 3>receiving a weird alien code in your language. And here

0:50:48.800 --> 0:50:51.560
<v Speaker 3>I just want to read from Story's chapter, quote:

0:50:52.200 --> 0:50:56.000
<v Speaker 3>As a result, synthesis often presents itself as an aural

0:50:56.320 --> 0:50:59.880
<v Speaker 3>caricature that can be perceived as an unnatural and

0:51:00.040 --> 0:51:03.920
<v Speaker 3>sometimes amusing rendition of a desired utterance or speech sound.

0:51:04.280 --> 0:51:08.560
<v Speaker 3>It is particularly unique to phonetics and speech science that

0:51:08.600 --> 0:51:13.080
<v Speaker 3>the models used as tools to understand the scientific aspects

0:51:13.120 --> 0:51:17.040
<v Speaker 3>of a complex system produce a signal intended to be

0:51:17.160 --> 0:51:20.080
<v Speaker 3>heard as if it were a human. As such, the

0:51:20.160 --> 0:51:23.800
<v Speaker 3>quality of a speech synthesis can be rather harshly judged

0:51:23.920 --> 0:51:27.040
<v Speaker 3>because the model on which it is based has not

0:51:27.080 --> 0:51:31.000
<v Speaker 3>accounted for the myriad of subtle variations and details that

0:51:31.120 --> 0:51:35.880
<v Speaker 3>combine in natural human speech. So to paraphrase, speech is

0:51:35.960 --> 0:51:39.520
<v Speaker 3>so much more than just the words, And even if

0:51:39.600 --> 0:51:42.840
<v Speaker 3>you can get the words right, there's still something that

0:51:43.120 --> 0:51:45.719
<v Speaker 3>is that is lacking and is going to take a

0:51:45.840 --> 0:51:47.720
<v Speaker 3>lot of work to try to capture.

0:51:48.080 --> 0:51:51.279
<v Speaker 1>Yeah, this is fascinating to think about, and especially given

0:51:51.880 --> 0:51:55.080
<v Speaker 1>what you mentioned earlier about the importance of speech

0:51:55.280 --> 0:51:59.520
<v Speaker 1>synthesizer technology to aid people who cannot speak or

0:51:59.560 --> 0:52:03.280
<v Speaker 1>have lost the ability to speak. You know, I guess probably

0:52:03.560 --> 0:52:05.400
<v Speaker 1>one of the most famous, if not the most famous

0:52:05.440 --> 0:52:09.359
<v Speaker 1>examples of this is, of course, the speech synthesizer used

0:52:09.400 --> 0:52:15.080
<v Speaker 1>by theoretical physicist Stephen Hawking. Like one of the interesting things

0:52:15.120 --> 0:52:17.520
<v Speaker 1>about his story with it, as I remember, is that

0:52:18.040 --> 0:52:20.359
<v Speaker 1>just me mentioning it, you can probably sort of hear

0:52:20.400 --> 0:52:23.880
<v Speaker 1>the voice the synthesized voice of Stephen Hawking in your head.

0:52:24.560 --> 0:52:27.120
<v Speaker 1>And I know that at some point like that was

0:52:27.239 --> 0:52:30.239
<v Speaker 1>you know, an early system he got there, and later

0:52:30.320 --> 0:52:32.480
<v Speaker 1>on in life he had he could have switched the

0:52:32.560 --> 0:52:35.640
<v Speaker 1>voice up, he could have changed the voice and and

0:52:35.800 --> 0:52:38.520
<v Speaker 1>I'm assuming could have maybe improved upon it, but by

0:52:38.560 --> 0:52:41.520
<v Speaker 1>that point he felt that this was his voice. You know,

0:52:41.560 --> 0:52:43.719
<v Speaker 1>you can't switch it up. You know, this is this

0:52:43.800 --> 0:52:46.600
<v Speaker 1>is how I speak, and this is how I hear myself.

0:52:47.360 --> 0:52:49.839
<v Speaker 1>So I always found that that interesting, and especially when

0:52:49.920 --> 0:52:51.840
<v Speaker 1>and then you can compare that to some other cases,

0:52:51.880 --> 0:52:54.799
<v Speaker 1>like, you know, film critic Roger Ebert late in life,

0:52:54.840 --> 0:52:56.960
<v Speaker 1>you know, he could no longer speak, but had, I

0:52:56.960 --> 0:52:59.799
<v Speaker 1>think they had a more robust system put together based

0:52:59.840 --> 0:53:03.239
<v Speaker 1>on samples of you know, the great catalog of his

0:53:03.320 --> 0:53:07.680
<v Speaker 1>own recorded speeches and reviews and so forth that they

0:53:07.680 --> 0:53:10.160
<v Speaker 1>could draw upon. And then looking into the future, you

0:53:10.160 --> 0:53:14.160
<v Speaker 1>have situations like James Earl Jones's Darth Vader voice, that

0:53:14.280 --> 0:53:21.200
<v Speaker 1>being you know, sort of archived and prepared for so

0:53:21.239 --> 0:53:23.839
<v Speaker 1>that in the future you can you can basically have

0:53:24.200 --> 0:53:28.120
<v Speaker 1>like a machine synthesized version of that voice that will

0:53:28.160 --> 0:53:31.600
<v Speaker 1>stand in as a sort of one to one replication

0:53:31.760 --> 0:53:34.239
<v Speaker 1>of what James Earl Jones did in life with the

0:53:34.360 --> 0:53:35.319
<v Speaker 1>voice acting.

0:53:36.280 --> 0:53:38.680
<v Speaker 3>Or at least so the proponents of the technology would say,

0:53:38.680 --> 0:53:40.440
<v Speaker 3>I'm sure there would be critics who would say, it's

0:53:40.480 --> 0:53:41.920
<v Speaker 3>never going to be a one to one.

0:53:42.440 --> 0:53:44.600
<v Speaker 1>Right, right, And then of course there's also the argument,

0:53:44.800 --> 0:53:48.920
<v Speaker 1>not specifically only with Darth Vader here am I discussing this,

0:53:49.040 --> 0:53:52.239
<v Speaker 1>but obviously the case can be made that like, well,

0:53:52.280 --> 0:53:57.160
<v Speaker 1>we shouldn't reproduce, you know, deceased actors' voices to continue

0:53:57.160 --> 0:54:01.600
<v Speaker 1>a fictional role. We should employ new living actors and

0:54:01.719 --> 0:54:04.799
<v Speaker 1>existing living voice actors who can do the voice. I

0:54:04.800 --> 0:54:07.400
<v Speaker 1>think with Darth Vader in particular, you could make a

0:54:07.440 --> 0:54:09.600
<v Speaker 1>strong case for that because there are other voice actors

0:54:09.600 --> 0:54:13.400
<v Speaker 1>who do officially voice act that character and do a

0:54:13.440 --> 0:54:17.319
<v Speaker 1>great job with it. What does it mean if that

0:54:17.560 --> 0:54:22.120
<v Speaker 1>individual's job is potentially taken by this sort of machine

0:54:22.480 --> 0:54:26.719
<v Speaker 1>likeness of that voice that is authorized based on the

0:54:26.840 --> 0:54:31.040
<v Speaker 1>voice of a you know, of a retired or in

0:54:31.080 --> 0:54:33.040
<v Speaker 1>some cases you know, deceased individual.

0:54:33.320 --> 0:54:35.600
<v Speaker 3>Well, we're going a little off topic now, but I

0:54:35.640 --> 0:54:37.919
<v Speaker 3>will say that I stand by what I've said before,

0:54:37.960 --> 0:54:40.480
<v Speaker 3>which is I'm firmly in the camp that I prefer

0:54:40.760 --> 0:54:45.040
<v Speaker 3>recasting with a different actor, as opposed to using technology

0:54:45.080 --> 0:54:48.000
<v Speaker 3>to try to synthesize the voice or appearance of an

0:54:48.000 --> 0:54:51.520
<v Speaker 3>actor who, for whatever reason cannot be present. Right, people

0:54:51.600 --> 0:54:54.600
<v Speaker 3>have been recasting the same role with different actors for decades.

0:54:54.640 --> 0:54:56.719
<v Speaker 3>That happens all the time. Like, what's the problem with it?

0:54:57.280 --> 0:55:00.680
<v Speaker 1>Yeah, I agree, I agree. But in some cases,

0:55:00.719 --> 0:55:04.560
<v Speaker 1>is it possible that a role that's been established by

0:55:05.040 --> 0:55:10.719
<v Speaker 1>by a living actor could not be just masterfully redone

0:55:11.280 --> 0:55:15.560
<v Speaker 1>by a clunky machine with the face of a cherub,

0:55:15.960 --> 0:55:19.200
<v Speaker 1>that is, that is manipulated by a sad German man

0:55:19.239 --> 0:55:21.879
<v Speaker 1>who needs a haircut. I think there's some potential there,

0:55:21.960 --> 0:55:24.040
<v Speaker 1>Like I don't know the next James Bond.

0:55:24.080 --> 0:55:27.480
<v Speaker 3>Maybe this is the only film genre I'm interested in

0:55:27.520 --> 0:55:32.680
<v Speaker 3>from now on. Yeah, high tension espionage movies starring the euphonia.

0:55:34.480 --> 0:55:37.480
<v Speaker 1>So there you have it. The machine speaks. Obviously, we'd

0:55:37.520 --> 0:55:38.840
<v Speaker 1>love to hear from everyone out there if you have

0:55:38.880 --> 0:55:40.880
<v Speaker 1>thoughts on all of this, and certainly anyone out there

0:55:40.920 --> 0:55:44.879
<v Speaker 1>who has you know, direct experience with speech synthesizer technology

0:55:45.680 --> 0:55:48.160
<v Speaker 1>for one use or another. Right in, we would love

0:55:48.200 --> 0:55:48.880
<v Speaker 1>to hear from you.

0:55:49.520 --> 0:55:54.720
<v Speaker 3>Just a reminder, I just the speech synthesis or speech

0:55:54.800 --> 0:55:58.600
<v Speaker 3>synthesizer is one of the hardest pairs of words to enunciate,

0:55:58.640 --> 0:56:00.160
<v Speaker 3>and I've had to say it so many times in

0:56:00.160 --> 0:56:04.040
<v Speaker 3>this episode. I just want to be recognized, especially for

0:56:04.080 --> 0:56:05.319
<v Speaker 3>the times I probably did it wrong.

0:56:06.160 --> 0:56:09.040
<v Speaker 1>Yes, well it's easy for the babyface machines that yeah.

0:56:09.120 --> 0:56:11.719
<v Speaker 1>So at any rate. Yeah. If you want to listen

0:56:11.719 --> 0:56:13.320
<v Speaker 1>to other episodes of Stuff to Blow Your Mind, you

0:56:13.360 --> 0:56:14.680
<v Speaker 1>will find them in the Stuff to Blow Your Mind

0:56:14.680 --> 0:56:17.759
<v Speaker 1>podcast feed with our core episodes on Tuesdays and Thursdays.

0:56:18.120 --> 0:56:20.520
<v Speaker 1>Mondays we do a listener mail, Wednesdays we do a

0:56:20.560 --> 0:56:23.080
<v Speaker 1>short form Artifact or Monster Fact, and then on Fridays we

0:56:23.160 --> 0:56:25.719
<v Speaker 1>set aside most serious concerns to just talk about a

0:56:25.760 --> 0:56:27.680
<v Speaker 1>weird film on Weird House Cinema.

0:56:27.840 --> 0:56:31.799
<v Speaker 3>Huge thanks to our excellent audio producer JJ Posway. If

0:56:31.840 --> 0:56:33.239
<v Speaker 3>you would like to get in touch with us with

0:56:33.280 --> 0:56:36.120
<v Speaker 3>feedback on this episode or any other, to suggest a topic

0:56:36.160 --> 0:56:38.040
<v Speaker 3>for the future, or just to say hello, you can

0:56:38.120 --> 0:56:48.680
<v Speaker 3>email us at contact at Stuff to Blow Your Mind dot com.

0:56:48.800 --> 0:56:51.759
<v Speaker 2>Stuff to Blow Your Mind is production of iHeartRadio. For

0:56:51.840 --> 0:56:54.640
<v Speaker 2>more podcasts from iHeartRadio, visit the iHeartRadio app,

0:56:54.760 --> 0:57:12.080
<v Speaker 2>Apple Podcasts, or wherever you listen to your favorite shows.