WEBVTT - Talking Tech Interview Special: Laureate Fellow and Scientia Professor of Artificial Intelligence, Toby Walsh

0:00:21.798 --> 0:00:26.388
<v S1>Hi folks, and welcome to a Vision Australia Radio special podcast.

0:00:26.388 --> 0:00:31.008
<v S1>And today I'm joined by Professor Toby Walsh. And I've

0:00:31.008 --> 0:00:32.658
<v S1>already forgotten what I'm going to call you, so I'll

0:00:32.658 --> 0:00:37.068
<v S1>just call you, um, chief science person from the AI

0:00:37.068 --> 0:00:39.678
<v S1>Institute of University of New South Wales, which I probably

0:00:39.678 --> 0:00:43.308
<v S1>just strangled it. Um, so, Toby, thanks for coming on

0:00:43.308 --> 0:00:46.248
<v S1>the program today. And look, I did have seven questions

0:00:46.248 --> 0:00:49.308
<v S1>that I'll get to very, very quickly. But my first one,

0:00:49.578 --> 0:00:52.488
<v S1>and it makes me want to grind my teeth a lot,

0:00:52.488 --> 0:00:57.258
<v S1>is why is there a need in even with AI

0:00:57.258 --> 0:01:00.318
<v S1>itself and the people behind it, to actually try and

0:01:00.318 --> 0:01:05.808
<v S1>make AI systems sound and react like human beings?

0:01:05.808 --> 0:01:11.568
<v S2>That's a fantastic question. It's it. And it's a it's a fatal

0:01:11.568 --> 0:01:15.078
<v S2>mistake that we're making all the time because it confuses

0:01:15.078 --> 0:01:19.128
<v S2>people and it messes with people. Um, we give them,

0:01:19.128 --> 0:01:23.358
<v S2>we give them names, often female names. Um, um, you know,

0:01:23.358 --> 0:01:26.988
<v S2>we shouldn't call them Siri or Cortana. We should call

0:01:26.988 --> 0:01:33.738
<v S2>them robot or computer or or or or, you know,

0:01:33.738 --> 0:01:36.678
<v S2>names that certainly aren't gendered. You know, you could call

0:01:36.678 --> 0:01:40.518
<v S2>them Alpha or Omega or, you know, Greek letters of

0:01:40.518 --> 0:01:45.318
<v S2>the Greek alphabet, whatever it is. Yeah. Um, but we

0:01:45.318 --> 0:01:49.398
<v S2>it's a it's a bad human trait. We anthropomorphize, we,

0:01:49.398 --> 0:01:56.058
<v S2>we apply human attributes to things that are not human. Um, no.

0:01:56.418 --> 0:02:00.738
<v S1>Exactly. And because in the in the demo from OpenAI, um,

0:02:00.738 --> 0:02:03.948
<v S1>last week, as we record this week, they had it

0:02:03.948 --> 0:02:06.708
<v S1>doing things like I think it was supposedly flirting with

0:02:06.708 --> 0:02:10.968
<v S1>the person. Um, it was laughing at its own jokes.

0:02:11.328 --> 0:02:15.528
<v S1>And I kept thinking, you're a computer program. This is

0:02:15.528 --> 0:02:19.998
<v S1>all false and not real. And that's why I keep

0:02:19.998 --> 0:02:22.068
<v S1>thinking of, um, I read your book, which is the,

0:02:22.098 --> 0:02:26.748
<v S1>you know, the Faking It, um, excellent book about faking intelligence.

0:02:26.748 --> 0:02:30.258
<v S1>And it this just reminds me that this is just

0:02:30.258 --> 0:02:33.618
<v S1>so up there amongst faking it. I just don't know

0:02:33.618 --> 0:02:36.918
<v S1>why people don't shrink away from this stuff and go, look, guys,

0:02:36.918 --> 0:02:39.258
<v S1>I think we really are getting a bit carried away.

0:02:39.948 --> 0:02:42.018
<v S2>This. The story is actually even worse than that. So

0:02:42.018 --> 0:02:45.768
<v S2>actually today they've actually had to retire the female voice

0:02:45.768 --> 0:02:51.528
<v S2>for the new OpenAI chat bot because, um, they tried

0:02:51.528 --> 0:02:54.708
<v S2>to buy the rights for Scarlett Johansson. She was the

0:02:54.708 --> 0:02:58.728
<v S2>voice on the movie Her, where it was her voice.

0:02:58.728 --> 0:03:01.218
<v S2>She was nominated for an Oscar, indeed, for the part

0:03:01.458 --> 0:03:05.358
<v S2>where she was the AI operating system. She declined to

0:03:05.358 --> 0:03:09.018
<v S2>license her voice to OpenAI. But the voice that they

0:03:09.018 --> 0:03:12.828
<v S2>had was eerily like her voice. Lots of people said

0:03:12.828 --> 0:03:16.758
<v S2>it reminded them of the movie, and many of

0:03:16.758 --> 0:03:20.388
<v S2>Scarlett Johansson's friends apparently said they thought it was her.

0:03:20.388 --> 0:03:24.828
<v S2>Right now. Um, because I think she's probably threatened to sue.

0:03:24.858 --> 0:03:30.528
<v S2>She's probably taken a lawyer's letters, legal, um, action and

0:03:30.528 --> 0:03:34.338
<v S2>the bad publicity. They've actually had to take the voice down. Yeah.

0:03:34.848 --> 0:03:37.308
<v S1>Because, I mean, as a blind person, I've had talking

0:03:37.308 --> 0:03:41.508
<v S1>computers since 1980. So, you know, in one form or another,

0:03:41.838 --> 0:03:45.048
<v S1>I've had computers speak to me. So is it this

0:03:45.378 --> 0:03:47.988
<v S1>thing that I don't know for a, for sighted people,

0:03:47.988 --> 0:03:52.308
<v S1>it's really novel to have a computer quote speak to you.

0:03:52.308 --> 0:03:55.578
<v S1>Or is it the fact that it's supposedly this real

0:03:55.578 --> 0:03:58.788
<v S1>thing behind your computer screen that's really talking to you?

0:04:00.978 --> 0:04:03.858
<v S2>It is a deceit. I mean, it is fooling us

0:04:03.858 --> 0:04:07.698
<v S2>that maybe there's more behind the scenes than there actually is.

0:04:07.908 --> 0:04:10.218
<v S2>As an example, another example, if you use one of

0:04:10.218 --> 0:04:14.448
<v S2>these chatbots like ChatGPT, it slowly types out the answer

0:04:14.448 --> 0:04:18.378
<v S2>to your question. If you actually understand how it works,

0:04:18.378 --> 0:04:21.288
<v S2>it has the answer in a flash. It doesn't actually

0:04:21.288 --> 0:04:24.168
<v S2>have to slowly type out the answer, but they've they've

0:04:24.168 --> 0:04:26.208
<v S2>actually built the interface. So it does that. So you

0:04:26.208 --> 0:04:29.028
<v S2>get the feeling that there's a person behind the screen

0:04:29.028 --> 0:04:32.628
<v S2>that's slowly typing away to you to make it more personalized,

0:04:32.628 --> 0:04:35.298
<v S2>to try and engage you more, to to fool you

0:04:35.298 --> 0:04:39.018
<v S2>more that it's smarter, more intelligent, maybe even more sentient

0:04:39.018 --> 0:04:39.918
<v S2>than it is.

0:04:39.918 --> 0:04:42.438
<v S1>That's right. And a lot of the the nicer the voice,

0:04:42.438 --> 0:04:45.288
<v S1>the more I tend to actually, um, dislike it more

0:04:45.288 --> 0:04:48.558
<v S1>because as far as I'm concerned, you're a computer. Don't

0:04:48.558 --> 0:04:50.988
<v S1>care what you sound like. Give me the information and

0:04:50.988 --> 0:04:54.048
<v S1>make it accurate. And the one of the issues that

0:04:54.048 --> 0:04:57.468
<v S1>I've got, particularly with generative AI, is that if it's

0:04:57.468 --> 0:05:04.818
<v S1>just doing straight, you know, word, sentence, paragraph, page prediction, um,

0:05:04.818 --> 0:05:08.148
<v S1>how can it actually get things so badly incorrect sometimes?

0:05:08.148 --> 0:05:11.538
<v S1>I mean, yes, it's almost like a default. Sometimes. Most

0:05:11.538 --> 0:05:14.928
<v S1>of the time it'll get it correct almost accidentally, I

0:05:14.928 --> 0:05:17.838
<v S1>think sometimes based on its data. But when it gets

0:05:17.838 --> 0:05:20.388
<v S1>things wrong, it really gets things wrong.

0:05:21.138 --> 0:05:24.078
<v S2>Well, because it's, it's that's the thing that we should

0:05:24.078 --> 0:05:28.428
<v S2>remember that it's artificial intelligence. It's not human intelligence. It's

0:05:28.428 --> 0:05:30.408
<v S2>not going to break like humans do. I mean, the

0:05:30.408 --> 0:05:35.538
<v S2>remarkable thing about human intelligence is, is how robust we are.

0:05:35.628 --> 0:05:39.828
<v S2>Whereas computers, people who program computers quickly realize computers are

0:05:39.828 --> 0:05:43.518
<v S2>very brittle. They break and they break very catastrophically. They

0:05:43.518 --> 0:05:46.938
<v S2>don't have all of our common sense. They don't have

0:05:46.938 --> 0:05:49.848
<v S2>our remarkable adaptability. I can take you, I can drop you

0:05:49.848 --> 0:05:52.608
<v S2>into a new circumstance and you could start doing stuff.

0:05:52.998 --> 0:05:56.298
<v S2>Whereas a computer, you change the input very slightly and

0:05:56.298 --> 0:06:01.308
<v S2>it falls over completely. Um, because it's an artificial, very

0:06:01.308 --> 0:06:05.298
<v S2>different intelligence to human intelligence. And we we're quick, especially

0:06:05.298 --> 0:06:06.978
<v S2>when it sounds like us to think it's going to

0:06:06.978 --> 0:06:09.708
<v S2>be like us, but not realizing. No, actually it's quite

0:06:09.708 --> 0:06:13.188
<v S2>a different type, quite a different flavor of intelligence.

0:06:13.188 --> 0:06:13.668
<v S1>I can remember

0:06:13.668 --> 0:06:16.518
<v S1>this. I can't remember what Star Trek episode it was,

0:06:16.518 --> 0:06:18.708
<v S1>but I think it was one of the original episodes

0:06:18.708 --> 0:06:21.378
<v S1>with Spock involved because he had to sort of like,

0:06:21.378 --> 0:06:24.708
<v S1>you know, he had to do these, um, his, you know,

0:06:24.708 --> 0:06:27.768
<v S1>mind meld. But he did it with a computer, and

0:06:27.768 --> 0:06:30.468
<v S1>I can remember this computer. It was some sort of

0:06:30.468 --> 0:06:33.888
<v S1>bastardized name of Voyager, because NASA sent out the Voyager

0:06:33.888 --> 0:06:37.728
<v S1>probe and this thing came back, which was this alien spacecraft.

0:06:37.728 --> 0:06:40.728
<v S1>And it was interesting because what they were basically saying

0:06:40.728 --> 0:06:42.678
<v S1>in the episode is that you can have all the

0:06:42.678 --> 0:06:45.648
<v S1>knowledge in the world. It actually doesn't mean you're intelligent.

0:06:45.648 --> 0:06:49.668
<v S1>And I just think it's such a misused word because

0:06:49.668 --> 0:06:51.978
<v S1>I know it's artificial, but I think it's a bit

0:06:51.978 --> 0:06:55.008
<v S1>bold to actually use the word intelligence because it's actually not.

0:06:55.008 --> 0:06:58.578
<v S1>You actually just all it's doing is just looking up facts.

0:06:58.578 --> 0:07:02.118
<v S1>It's not really extrapolating anything. And I just, I just

0:07:02.118 --> 0:07:04.458
<v S1>want to actually have a new name for artificial intelligence

0:07:04.458 --> 0:07:05.748
<v S1>because it's not intelligent.

0:07:05.958 --> 0:07:08.178
<v S2>Well, it nearly got called many other things. It nearly

0:07:08.178 --> 0:07:13.608
<v S2>got called cybernetics. In Europe, they call it informatics. Um, yeah.

0:07:13.608 --> 0:07:16.218
<v S2>It's at the end of the day, it's a terrible name.

0:07:16.398 --> 0:07:18.858
<v S2>Intelligence is very poorly defined. So what the heck could

0:07:18.858 --> 0:07:23.058
<v S2>artificial intelligence be? But Spock is a good example because,

0:07:23.058 --> 0:07:26.118
<v S2>you know, he was quite a smart guy, but he

0:07:26.118 --> 0:07:30.498
<v S2>was somewhat lacking in emotional intelligence. And you realize there

0:07:30.498 --> 0:07:33.948
<v S2>are many different facets to intelligence: our social intelligence, our

0:07:33.948 --> 0:07:38.268
<v S2>emotional intelligence, our creativity that are very important, really important

0:07:38.268 --> 0:07:42.438
<v S2>to to our interactions with each other. And Spock, despite

0:07:42.438 --> 0:07:46.038
<v S2>he was clearly a very smart cookie. Mm. Was also

0:07:46.038 --> 0:07:49.038
<v S2>quite lacking in his interpersonal skills. Yes.

0:07:49.038 --> 0:07:51.198
<v S1>Yeah. No it was it used to be quite funny.

0:07:51.198 --> 0:07:51.828
<v S1>It was great.

0:07:51.948 --> 0:07:54.918
<v S2>And the same would be the same is true of machines.

0:07:54.918 --> 0:07:58.968
<v S2>They do have, um, you know, some cognitive intelligence. They have

0:07:58.968 --> 0:08:01.848
<v S2>some ability to, you know, answer general knowledge questions and

0:08:01.848 --> 0:08:04.788
<v S2>do maths and those things. But they are very severely

0:08:04.788 --> 0:08:08.508
<v S2>lacking in emotional intelligence. And that's likely to be a

0:08:08.508 --> 0:08:12.948
<v S2>significant handicap that they have for a long time going forwards,

0:08:13.188 --> 0:08:16.278
<v S2>in part because they're not they don't have emotions themselves.

0:08:16.278 --> 0:08:19.428
<v S2>I mean, obviously they don't have emotions. Emotions are biochemical.

0:08:19.428 --> 0:08:23.418
<v S2>So they they're electrical devices. So they don't have anything

0:08:23.418 --> 0:08:26.208
<v S2>like that. And, and one of the great things that

0:08:26.208 --> 0:08:28.548
<v S2>we have is that, well, we can reflect on, on

0:08:28.548 --> 0:08:31.158
<v S2>our own emotions. We can say, well, how would I

0:08:31.158 --> 0:08:34.788
<v S2>feel if, um, you know, someone said that to me

0:08:34.788 --> 0:08:37.848
<v S2>and then I could think, oh, I'd be really upset. Um,

0:08:37.848 --> 0:08:41.328
<v S2>and so machines can't do that. They don't have any

0:08:41.328 --> 0:08:43.848
<v S2>insight to reflect and think about how I would feel,

0:08:43.848 --> 0:08:46.908
<v S2>because they have no feelings themselves. So they're going to

0:08:46.908 --> 0:08:51.618
<v S2>be very severely handicapped, certainly in that facet of their intelligence. Um,

0:08:51.618 --> 0:08:52.818
<v S2>for a long time. Absolutely.

0:08:53.208 --> 0:08:56.718
<v S1>I, I can remember the first, um, thing I played

0:08:56.718 --> 0:08:58.728
<v S1>on my little Apple IIe with speech back in the

0:08:58.728 --> 0:09:05.568
<v S1>early 1980s. Was that horrible, ridiculous psychotherapy program. Eliza. Um.

0:09:05.748 --> 0:09:08.508
<v S1>And I used to try and trick it. I'd say, oh, look, my,

0:09:08.508 --> 0:09:11.958
<v S1>my my bed is feeling very sad today. And it's like, oh,

0:09:11.958 --> 0:09:14.898
<v S1>when was the last time your your bed felt sad?

0:09:15.678 --> 0:09:18.348
<v S1>And it just got more and more ridiculous. And I thought, yeah,

0:09:18.348 --> 0:09:20.508
<v S1>all you're really doing is just word prediction. You're just

0:09:20.508 --> 0:09:24.618
<v S1>doing straight reflection. Um, so where are smart speakers these days,

0:09:24.618 --> 0:09:28.548
<v S1>are they? Because I'm assuming, you know, our traditional smart

0:09:28.548 --> 0:09:31.578
<v S1>speakers like, so the, the Apple ones, the Amazon ones

0:09:31.578 --> 0:09:33.918
<v S1>and the Google ones, I'm assuming at the moment they're

0:09:33.918 --> 0:09:37.458
<v S1>still not using large language models. I'm still they've sort

0:09:37.458 --> 0:09:39.378
<v S1>of got a I'm assuming they've got like some sort

0:09:39.378 --> 0:09:42.738
<v S1>of set series of instructions and that's all they're currently

0:09:42.738 --> 0:09:43.848
<v S1>working on at the moment.

0:09:44.388 --> 0:09:49.068
<v S2>Yes. Something like Alexa. Very it's very heavily programmed, very

0:09:49.068 --> 0:09:53.568
<v S2>heavily scripted. Mm. Um, but the future is going to

0:09:53.568 --> 0:09:56.988
<v S2>be that ultimately AI is going to be the operating

0:09:56.988 --> 0:10:00.588
<v S2>system of those devices. And indeed all our devices, your

0:10:00.588 --> 0:10:05.448
<v S2>your smartwatch, your smart phone, your smart home, your smart toaster,

0:10:05.448 --> 0:10:08.298
<v S2>your smart front door, your smart light switch, they're all

0:10:08.298 --> 0:10:11.808
<v S2>going to have, uh, it it is actually when we

0:10:11.808 --> 0:10:13.428
<v S2>come back to Scarlett Johansson, is is going to be

0:10:13.428 --> 0:10:16.608
<v S2>like that movie, Her. AI is going to be the

0:10:16.608 --> 0:10:20.178
<v S2>operating system of all of those devices that's going to

0:10:20.178 --> 0:10:22.308
<v S2>allow you to interact with them so you can

0:10:22.308 --> 0:10:25.248
<v S2>talk to them. They understand at a much higher level

0:10:25.248 --> 0:10:27.558
<v S2>what you want them to do, and then do that

0:10:27.558 --> 0:10:30.558
<v S2>stuff for you. So it is, um, you know, we're

0:10:30.558 --> 0:10:33.888
<v S2>only at the beginnings of that journey where, um, devices

0:10:33.888 --> 0:10:37.638
<v S2>get upgraded with more and more AI that allows us

0:10:37.638 --> 0:10:44.368
<v S2>to have a richer and richer conversation with. Um, so that,

0:10:44.368 --> 0:10:47.668
<v S2>you know, ultimately, you know, people will, I think, become

0:10:47.668 --> 0:10:50.848
<v S2>somewhat attached to these devices that they're always talking to.

0:10:50.878 --> 0:10:54.298
<v S1>No. Yeah. The one thing that always gets me very

0:10:54.298 --> 0:10:58.498
<v S1>nervous about, um, anything to do with AI is when

0:10:58.498 --> 0:11:01.918
<v S1>it comes to computer vision. Um, because I know in

0:11:01.918 --> 0:11:04.438
<v S1>your book you were talking about, you know, um, you know,

0:11:04.438 --> 0:11:08.698
<v S1>there's lots of data being trained on, on, on medical stuff, on, uh,

0:11:08.698 --> 0:11:11.128
<v S1>doing stuff for radiology and all that sort of stuff.

0:11:11.398 --> 0:11:15.088
<v S1>The problem I have as a blind person is, you know,

0:11:15.088 --> 0:11:17.998
<v S1>whenever I use my smartphone to take a picture of

0:11:17.998 --> 0:11:20.728
<v S1>something in the backyard or out the front and so on,

0:11:21.148 --> 0:11:25.228
<v S1>there's still about a 75 or maybe even 80% chance

0:11:25.228 --> 0:11:27.988
<v S1>of if I was being nice, that what the camera's

0:11:27.988 --> 0:11:32.278
<v S1>actually telling me what the object is is actually incorrect.

0:11:32.278 --> 0:11:35.068
<v S1>And I just thought, is that just the state of

0:11:35.068 --> 0:11:38.878
<v S1>computer vision? Or is the fact that, you know, it

0:11:38.878 --> 0:11:42.928
<v S1>doesn't have enough data to base the fact that you know,

0:11:42.928 --> 0:11:46.948
<v S1>my red garbage bin is not a fire hydrant on

0:11:46.948 --> 0:11:48.808
<v S1>my grass verge, for example?

0:11:48.808 --> 0:11:53.638
<v S2>Yeah, it's it's getting better and better as we train

0:11:53.638 --> 0:11:56.578
<v S2>them on more and more data. Um, you know, the

0:11:56.578 --> 0:11:58.858
<v S2>idea ultimately of computer vision is to be able to

0:11:58.858 --> 0:12:02.098
<v S2>understand the world so that computers can robots, for example,

0:12:02.098 --> 0:12:05.488
<v S2>or autonomous cars can navigate around it. Mm. Uh, and

0:12:05.488 --> 0:12:09.148
<v S2>I'm pretty confident we will get there, um, in part because,

0:12:09.148 --> 0:12:13.108
<v S2>for example, um, we're driving increasing. You know, if you're

0:12:13.108 --> 0:12:16.138
<v S2>driving a Tesla, you're helping to train the next generation

0:12:16.138 --> 0:12:19.048
<v S2>of Tesla. Yeah. Um, and what and the other thing

0:12:19.048 --> 0:12:22.648
<v S2>that's really making a big difference is, interestingly enough, is

0:12:22.648 --> 0:12:26.068
<v S2>the is that, in fact, actually Teslas do more of

0:12:26.068 --> 0:12:30.088
<v S2>their computer vision training on simulators than they do in

0:12:30.088 --> 0:12:34.138
<v S2>the real world that the simulators now, the car driving

0:12:34.138 --> 0:12:39.118
<v S2>simulators are so good, so realistic. Mm. Um, that they drive,

0:12:39.118 --> 0:12:41.968
<v S2>you know, more than ten times the distance every night

0:12:41.968 --> 0:12:43.948
<v S2>in simulators. And of course, if you're doing it in

0:12:43.948 --> 0:12:46.138
<v S2>a simulator, you can do it faster than real time.

0:12:46.138 --> 0:12:48.898
<v S2>You can you can speed up the world ten times

0:12:48.898 --> 0:12:53.578
<v S2>or 100 times, um, and train the computer systems on those.

0:12:53.578 --> 0:12:55.378
<v S2>And then the other great thing about simulators, of course,

0:12:55.378 --> 0:12:57.598
<v S2>is you can you can do things that wouldn't, you

0:12:57.598 --> 0:12:59.338
<v S2>wouldn't be allowed to do in the real world. You know,

0:12:59.338 --> 0:13:02.518
<v S2>you can have you can practice accidents. You can make

0:13:02.518 --> 0:13:06.658
<v S2>the conditions really difficult. Yeah. Um, you know, you can

0:13:06.658 --> 0:13:08.878
<v S2>put the sun into the eye of the driver and

0:13:08.878 --> 0:13:11.038
<v S2>you make the road wet. You can do things that

0:13:11.038 --> 0:13:13.198
<v S2>you know, where you might be risking the life of

0:13:13.198 --> 0:13:16.678
<v S2>the driver and and cause an accident, and then no

0:13:16.678 --> 0:13:19.018
<v S2>one dies, which is great. And then, of course, what

0:13:19.018 --> 0:13:23.008
<v S2>you can do in a simulator is that you can say, well, okay, um, the,

0:13:23.008 --> 0:13:25.168
<v S2>the computer failed to do it right that time,

0:13:25.168 --> 0:13:27.448
<v S2>but let's train the algorithm again. Let's try and see

0:13:27.448 --> 0:13:29.698
<v S2>if we can correct it and then run it again

0:13:29.698 --> 0:13:35.338
<v S2>in exactly the same circumstances. You. Mm. Replication of the experiment,

0:13:35.338 --> 0:13:36.628
<v S2>which you can't of course, do in the real world.

0:13:36.628 --> 0:13:39.868
<v S2>You can never replicate exactly those conditions again. No, you

0:13:39.868 --> 0:13:41.968
<v S2>can say let's repeat it until the computer gets it

0:13:41.968 --> 0:13:45.028
<v S2>right and doesn't kill the driver. No. That's true. And

0:13:45.028 --> 0:13:48.628
<v S2>so so being able to train computers in simulators is

0:13:48.628 --> 0:13:52.228
<v S2>actually going to really help us move forwards in leaps

0:13:52.228 --> 0:13:52.558
<v S2>and bounds.

0:13:52.768 --> 0:13:54.718
<v S1>So is there any, I mean, are there any

0:13:54.718 --> 0:13:58.708
<v S1>true level five self-driving cars on the road, or are

0:13:58.708 --> 0:14:01.138
<v S1>they sort of back down to the level?

0:14:01.138 --> 0:14:03.118
<v S1>I think it's a level two isn't it? They're not

0:14:03.118 --> 0:14:05.188
<v S1>really self autonomous vehicles at the moment.

0:14:05.368 --> 0:14:08.728
<v S2>2 to 3. Yes. Um, yeah. I mean, the bad

0:14:08.728 --> 0:14:12.148
<v S2>news there is that unfortunately sighted people are going to get, um,

0:14:12.148 --> 0:14:17.878
<v S2>autonomous driving before people with, um, visual impairment because it's

0:14:17.878 --> 0:14:21.508
<v S2>slowly being put into our cars without us realizing. I

0:14:21.508 --> 0:14:23.878
<v S2>was driving my car the other day, and I realized

0:14:23.878 --> 0:14:27.298
<v S2>I'm driving less. I'm not looking over my shoulder as

0:14:27.298 --> 0:14:30.148
<v S2>often as I. I suspect it's going to be our

0:14:30.148 --> 0:14:33.658
<v S2>children who don't drive cars. They will, but they won't

0:14:33.658 --> 0:14:36.808
<v S2>get around to get their licenses. And in our case,

0:14:36.808 --> 0:14:39.928
<v S2>for people who already can drive. Mm. It's going to

0:14:39.928 --> 0:14:43.708
<v S2>slowly happen to us. Mhm. So um, I imagine at

0:14:43.708 --> 0:14:45.238
<v S2>some point in the future I'm going to go to

0:14:45.238 --> 0:14:48.658
<v S2>the RTA to renew my driving license, and they're going

0:14:48.658 --> 0:14:50.878
<v S2>to say to me, well, Mr. Walsh, we, um, we

0:14:50.878 --> 0:14:53.278
<v S2>checked your computer records and it seems that you haven't

0:14:53.278 --> 0:14:57.028
<v S2>been doing much driving recently, but it's actually been the car,

0:14:57.028 --> 0:14:59.758
<v S2>been doing all the driving, all the assists, all the

0:14:59.758 --> 0:15:02.998
<v S2>autonomous assistance in the car. And you really don't have

0:15:02.998 --> 0:15:06.268
<v S2>the hours under your belt. So you have two choices now, Mr. Walsh.

0:15:06.268 --> 0:15:08.878
<v S2>You can either you can either take your test again

0:15:08.878 --> 0:15:10.888
<v S2>to prove that you can still actually drive the car.

0:15:10.888 --> 0:15:14.038
<v S2>It's not just the all the intelligent assistants, the the

0:15:14.038 --> 0:15:16.708
<v S2>lane following and the automatic braking and all of that

0:15:16.708 --> 0:15:19.978
<v S2>and the automatic steering, the parking. Or we can give

0:15:19.978 --> 0:15:24.388
<v S2>you this new, um, non-driving driving license that, that and

0:15:24.388 --> 0:15:26.368
<v S2>I think I expect well, I it was too painful

0:15:26.368 --> 0:15:28.078
<v S2>getting my license in the first place. I think I'll

0:15:28.078 --> 0:15:30.538
<v S2>get the non-driving license. And they say, oh, by the way, Mr. Walsh,

0:15:30.538 --> 0:15:32.248
<v S2>you'll get a discount on your insurance now.

0:15:32.338 --> 0:15:34.708
<v S3>Yeah. That's right. Yes.

0:15:35.188 --> 0:15:37.348
<v S2>Remove the dangerous human from the loop.

0:15:37.348 --> 0:15:40.108
<v S1>That's right. My my wife often says it's it's actually

0:15:40.108 --> 0:15:42.088
<v S1>the other drivers that, um, are the.

0:15:42.838 --> 0:15:44.218
<v S3>And that is part.

0:15:44.218 --> 0:15:46.468
<v S2>Of the problem. Yes, dealing with the uncertainty of other drivers.

0:15:46.558 --> 0:15:49.078
<v S2>Once vehicles can talk to each other, then there would

0:15:49.108 --> 0:15:51.808
<v S2>be much less uncertainty. But I think it's slowly creeping

0:15:51.808 --> 0:15:54.508
<v S2>up on us. And then there will be it will

0:15:54.508 --> 0:15:57.118
<v S2>happen in special places. So there'll be the, you know,

0:15:57.118 --> 0:16:00.568
<v S2>inner city congestion charging zone where you're only allowed to

0:16:00.568 --> 0:16:03.088
<v S2>go in if you're in an electric car that's autonomous

0:16:03.088 --> 0:16:06.208
<v S2>or the high-speed lane of the highway, you'll only be

0:16:06.208 --> 0:16:09.838
<v S2>able to enter the high-speed lane of the highway where

0:16:09.838 --> 0:16:12.118
<v S2>you can platoon the cars together if you've got the

0:16:12.118 --> 0:16:13.378
<v S2>autonomous aids.

0:16:13.618 --> 0:16:17.788
<v S1>So the only question I've got to do with autonomous cars

0:16:17.788 --> 0:16:21.578
<v S1>is it's fine to take you somewhere. But there seems

0:16:21.578 --> 0:16:24.608
<v S1>to be no mention at the moment about how I

0:16:24.638 --> 0:16:27.308
<v S1>could then say, okay, so the car's parked you in

0:16:27.308 --> 0:16:32.018
<v S1>the car park at Woolworths. I'm now going to actually

0:16:32.018 --> 0:16:35.408
<v S1>let you I'm going to guide you to where you

0:16:35.408 --> 0:16:40.118
<v S1>can start entering into the actual shop itself inside the complex.

0:16:40.478 --> 0:16:43.208
<v S1>Is that the sort of stuff that AI could actually

0:16:43.208 --> 0:16:45.338
<v S1>do for people that are blind? So your GPS gets

0:16:45.338 --> 0:16:48.278
<v S1>you to the, or the car gets you to the location,

0:16:48.278 --> 0:16:50.468
<v S1>but then, then the AI has then got to get

0:16:50.468 --> 0:16:52.778
<v S1>you out of the car park into the building.

0:16:54.108 --> 0:16:57.708
<v S2>Yeah, yeah, it's the last 100m or the or the

0:16:57.708 --> 0:17:01.218
<v S2>last kilometer. That is a problem. And of course, GPS

0:17:01.218 --> 0:17:06.678
<v S2>doesn't work inside. No. And GPS doesn't have the accuracy or,

0:17:06.678 --> 0:17:10.368
<v S2>you know, really to do that, you know, very high precision. Um,

0:17:10.368 --> 0:17:14.358
<v S2>last 100m, which is exactly where you want AI to

0:17:14.358 --> 0:17:17.088
<v S2>step in, which is exactly why we work on things

0:17:17.088 --> 0:17:18.798
<v S2>like computer vision. Because you have to see the world.

0:17:18.798 --> 0:17:20.418
<v S2>You have to see where the door handle is. You

0:17:20.418 --> 0:17:23.328
<v S2>have to see there's a, you know, another pedestrian coming

0:17:23.328 --> 0:17:27.918
<v S2>out of the door and someone's left, um, a dog

0:17:27.918 --> 0:17:29.178
<v S2>tied to the railings. And you.

0:17:29.568 --> 0:17:30.108
<v S3>Yeah.

0:17:30.738 --> 0:17:33.798
<v S2>Um, that's why you have to treat, teach computers how

0:17:33.798 --> 0:17:37.008
<v S2>to see the world so you can navigate those last 100m.

0:17:37.128 --> 0:17:37.278
<v S3>Mm.

0:17:37.308 --> 0:17:39.168
<v S1>So is there any is there any sort of, like,

0:17:39.168 --> 0:17:40.818
<v S1>3D modeling.

0:17:40.818 --> 0:17:41.898
<v S3>Or.

0:17:41.898 --> 0:17:46.038
<v S1>Sort of computer vision type stuff where, um, I don't know,

0:17:46.038 --> 0:17:48.648
<v S1>a shopping center could say, well, look, here's a complete

0:17:48.648 --> 0:17:53.328
<v S1>3D visual map of the whole complex, and then we'll,

0:17:53.328 --> 0:17:56.538
<v S1>we'll actually use that data, imagery, that sort of stuff,

0:17:56.538 --> 0:18:00.888
<v S1>then to guide the person around. So I can say, okay, so, um,

0:18:00.888 --> 0:18:03.948
<v S1>the software or the AI knows that I'm in the building.

0:18:03.948 --> 0:18:08.808
<v S1>I want to go to the storage section in Woolworths,

0:18:09.168 --> 0:18:12.408
<v S1>so, AI, you can see where I am. Can you,

0:18:12.408 --> 0:18:14.868
<v S1>can you guide me to that spot just based on

0:18:14.868 --> 0:18:17.118
<v S1>the stuff that you, right, the stuff that it actually

0:18:17.118 --> 0:18:18.048
<v S1>sees around me?

0:18:19.798 --> 0:18:20.038
<v S3>Yeah.

0:18:21.568 --> 0:18:25.078
<v S2>The real problem, um, with that is that the world

0:18:25.078 --> 0:18:28.588
<v S2>keeps on changing. So they have high precision maps and

0:18:28.588 --> 0:18:33.358
<v S2>high precision, uh, 3D models of the world. But the

0:18:33.358 --> 0:18:37.618
<v S2>world has always changed. So one of the things that

0:18:37.618 --> 0:18:41.248
<v S2>we try and do in, in AI is what's called SLAM.

0:18:41.248 --> 0:18:45.838
<v S2>It's simultaneous localization and mapping, which is look at

0:18:45.838 --> 0:18:49.228
<v S2>the world, understand where you are, and also map the

0:18:49.228 --> 0:18:52.528
<v S2>world at the same time. So actually the computer vision

0:18:52.528 --> 0:18:56.188
<v S2>is not only working out where you are with respect to,

0:18:56.188 --> 0:18:58.558
<v S2>you know, the front door and the door handle, but

0:18:58.558 --> 0:19:01.798
<v S2>also working out what is in the world to see actually, oh,

0:19:01.798 --> 0:19:03.718
<v S2>there's a dog over there now, which was obviously not

0:19:03.718 --> 0:19:06.508
<v S2>in my model of the world because it's just stepped

0:19:06.508 --> 0:19:09.808
<v S2>into frame. Mm. Um, so that's something that, you know,

0:19:09.808 --> 0:19:11.758
<v S2>we put a lot of effort in AI trying to

0:19:11.758 --> 0:19:15.118
<v S2>build systems that can actually situate themselves in the world,

0:19:15.118 --> 0:19:18.028
<v S2>but also map the world as they find it, because

0:19:18.028 --> 0:19:20.608
<v S2>the world keeps changing and you can never have, you know,

0:19:20.608 --> 0:19:22.498
<v S2>up to date maps and models of the world. They're

0:19:22.498 --> 0:19:25.408
<v S2>always out of date. You really got to go there

0:19:25.408 --> 0:19:29.038
<v S2>and actually perceive the world and work out what state

0:19:29.038 --> 0:19:30.178
<v S2>is the world now in.

0:19:30.328 --> 0:19:33.148
<v S1>Yeah. And look, that that's my ultimate thing to do

0:19:33.148 --> 0:19:35.368
<v S1>with AI. I mean, I can, I can sort of take

0:19:35.368 --> 0:19:38.248
<v S1>or leave self-driving cars. But if I knew I had

0:19:38.248 --> 0:19:45.208
<v S1>a computer vision system that could 100% independently navigate me around, um,

0:19:45.208 --> 0:19:51.418
<v S1>into shopping centers or public infrastructure, um, transport hubs, the airport,

0:19:51.418 --> 0:19:54.688
<v S1>that sort of stuff. That would be absolutely amazing. So

0:19:54.688 --> 0:19:55.708
<v S1>for me, the.

0:19:55.708 --> 0:19:57.628
<v S2>Good news is that's coming, right? Okay.

0:19:57.628 --> 0:19:58.378
<v S3>So it's not.

0:19:58.738 --> 0:20:00.238
<v S1>It's not pie in the sky.

0:20:00.238 --> 0:20:00.718
<v S3>Stuff.

0:20:00.718 --> 0:20:03.268
<v S2>It's not magic because we do it right. We do

0:20:03.268 --> 0:20:07.138
<v S2>it with our two eyes. Stereoscopic vision. Yeah. We managed.

0:20:07.138 --> 0:20:11.398
<v S2>Humans have managed to do that. Um, um, and we

0:20:11.398 --> 0:20:15.358
<v S2>are slowly getting to the point where, increasingly with increasing accuracy,

0:20:15.388 --> 0:20:17.998
<v S2>we can get computers to do the same thing. Mm.

0:20:18.328 --> 0:20:20.188
<v S2>And the great thing about computers, of course, as well,

0:20:20.188 --> 0:20:23.548
<v S2>is that they're not limited to the visual spectrum. So

0:20:23.548 --> 0:20:25.888
<v S2>they can also see the world in microwaves and infrared.

0:20:25.888 --> 0:20:29.278
<v S2>And um, so they can see. And so if it's

0:20:29.278 --> 0:20:32.758
<v S2>low light or bad weather, then they can also still

0:20:32.758 --> 0:20:36.628
<v S2>see the world when, um, uh, where vision alone may,

0:20:36.628 --> 0:20:39.868
<v S2>may make it a really challenging problem. So AI has the

0:20:39.868 --> 0:20:44.668
<v S2>possibility ultimately of actually helping us to see the world better

0:20:44.758 --> 0:20:46.198
<v S2>than we can see the world, because they can do it

0:20:46.198 --> 0:20:48.988
<v S2>even if it was dark or even if it was

0:20:48.988 --> 0:20:49.948
<v S2>pouring with rain.

0:20:50.338 --> 0:20:50.818
<v S3>Okay.

0:20:51.148 --> 0:20:55.258
<v S1>And just finally, has there been much research into how

0:20:55.258 --> 0:20:59.158
<v S1>AI could impact people with a disability moving forward into

0:20:59.158 --> 0:20:59.878
<v S1>the future?

0:21:00.908 --> 0:21:04.418
<v S2>But, I mean, there has been work. Um, not enough work, actually,

0:21:04.418 --> 0:21:09.608
<v S2>I'm sure. Um, and, um, because when I think about

0:21:09.818 --> 0:21:13.028
<v S2>what are the technologies that are going to help people

0:21:13.358 --> 0:21:16.838
<v S2>with limited hearing or with limited vision, they are exactly

0:21:16.838 --> 0:21:19.928
<v S2>AI. AI is the technology that allows computers to

0:21:19.928 --> 0:21:22.598
<v S2>hear the world and see the world, and then convey

0:21:22.598 --> 0:21:26.738
<v S2>that information to those people who would otherwise be more

0:21:26.738 --> 0:21:30.128
<v S2>isolated than they need be. So. So I, I have

0:21:30.128 --> 0:21:32.948
<v S2>a lot of hope. The problem. The problem, of course,

0:21:32.948 --> 0:21:36.248
<v S2>the fundamental problem is not a technical problem. It's a

0:21:36.248 --> 0:21:38.408
<v S2>societal and a financial one, which is how do we

0:21:38.408 --> 0:21:42.218
<v S2>ensure that there are incentives for the tech companies and

0:21:42.218 --> 0:21:44.888
<v S2>business to do that? Because to begin with, it's going

0:21:44.888 --> 0:21:45.848
<v S2>to cost them money.

0:21:45.998 --> 0:21:46.178
<v S3>Mm.

0:21:46.418 --> 0:21:49.178
<v S2>It's much easier for them to cater for the, you know,

0:21:49.178 --> 0:21:52.418
<v S2>the vast majority of people who are, uh, you know,

0:21:52.418 --> 0:21:57.758
<v S2>normally sighted and, um, normally hearing and not, um, you know,

0:21:57.758 --> 0:21:59.888
<v S2>invest the time and effort and actually use this as

0:21:59.888 --> 0:22:02.828
<v S2>a way of making, um, it more accessible for people

0:22:02.828 --> 0:22:04.088
<v S2>with disability.

0:22:04.268 --> 0:22:04.778
<v S3>No.

0:22:04.928 --> 0:22:07.898
<v S1>And I get this asked a lot about robotics and

0:22:08.018 --> 0:22:13.568
<v S1>artificial intelligence, but there's some interesting stuff coming out for mobility,

0:22:13.598 --> 0:22:16.358
<v S1>stuff to do for blind and low vision. And one

0:22:16.358 --> 0:22:19.928
<v S1>of the major products is called Glide, from Glidance.

0:22:19.928 --> 0:22:22.658
<v S1>And what it is, it's basically a well, people say

0:22:22.658 --> 0:22:24.458
<v S1>it looks like a little vacuum cleaner. You've got a little

0:22:24.698 --> 0:22:26.978
<v S1>two-wheeled robot with a handle on it, and the

0:22:26.978 --> 0:22:30.878
<v S1>blind person hangs onto the handle, and then it steers

0:22:30.878 --> 0:22:33.998
<v S1>you around obstacles. And because it's connected to your smartphone,

0:22:33.998 --> 0:22:36.158
<v S1>you've got an app that will tell you what's around you,

0:22:36.548 --> 0:22:40.028
<v S1>as in businesses and that sort of stuff. But I'm

0:22:40.028 --> 0:22:46.298
<v S1>just wondering how trustworthy is such a system, because I'm

0:22:46.298 --> 0:22:50.738
<v S1>assuming it's using lidar and radar and infrared and all

0:22:50.738 --> 0:22:54.338
<v S1>sorts of amazing stuff, but I just, I just think

0:22:54.338 --> 0:22:56.528
<v S1>at the end of the day, it it it doesn't.

0:22:56.528 --> 0:22:58.028
<v S1>And I come back to that original thing that I

0:22:58.028 --> 0:23:00.278
<v S1>talked about. It doesn't have the level, a level of

0:23:00.278 --> 0:23:04.928
<v S1>human intelligence or thinking outside of the box, if you'll

0:23:05.138 --> 0:23:07.418
<v S1>pardon the pun, if the wheels fall off, for example.

0:23:10.018 --> 0:23:10.828
<v S3>Yes.

0:23:10.918 --> 0:23:15.658
<v S2>Yeah, it's obviously it doesn't replace what those wonderful guide

0:23:15.658 --> 0:23:20.638
<v S2>dogs do. Mhm. Uh, because if you know, I've had

0:23:20.638 --> 0:23:24.448
<v S2>the pleasure to meet some blind people with their

0:23:24.448 --> 0:23:28.678
<v S2>dogs and they have a wonderful relationship with their dog. Mm. Um,

0:23:28.678 --> 0:23:31.468
<v S2>and it is about that as much as anything that

0:23:31.468 --> 0:23:34.198
<v S2>they can trust the dog to, to, to guide them across,

0:23:34.198 --> 0:23:36.448
<v S2>you know, a busy road and putting their life in

0:23:36.448 --> 0:23:41.968
<v S2>the hands of the dog, literally. Um, and, uh, that

0:23:41.968 --> 0:23:48.418
<v S2>is a significant technical, um, milestone to me. Mm. But equally.

0:23:48.598 --> 0:23:52.558
<v S2>I am also aware of how expensive it is to

0:23:52.558 --> 0:23:57.298
<v S2>train guide dogs and how limited in supply they are. So, um,

0:23:57.298 --> 0:24:00.448
<v S2>I do think, you know, there is a possibility there

0:24:00.448 --> 0:24:04.348
<v S2>that we might be able to provide that mobility that perhaps,

0:24:04.348 --> 0:24:06.808
<v S2>you know, some people can't because we don't have enough blind.

0:24:06.838 --> 0:24:09.178
<v S2>You know, we don't have enough guide dogs to help

0:24:09.178 --> 0:24:11.248
<v S2>people around. It would be wonderful if we had more

0:24:11.248 --> 0:24:14.308
<v S2>and we could afford more. We trained more. But yeah,

0:24:14.398 --> 0:24:17.128
<v S2>I'm not sure that's the world that we're in. And so.

0:24:17.338 --> 0:24:17.668
<v S3>Um.

0:24:17.668 --> 0:24:20.548
<v S2>Even if it is perhaps not such a good solution.

0:24:20.548 --> 0:24:22.618
<v S3>Yeah, we might look.

0:24:22.618 --> 0:24:24.568
<v S1>And the other thing too, that I always think about,

0:24:24.568 --> 0:24:27.508
<v S1>like I'm always thinking about the. Yes, but if then

0:24:27.508 --> 0:24:29.818
<v S1>when type stuff all the time. Because to me the

0:24:29.818 --> 0:24:33.418
<v S1>world's not black and white. It's got various shades of gray. And,

0:24:33.688 --> 0:24:37.288
<v S1>you know, it's fine. A self-driving robot type thing might

0:24:37.288 --> 0:24:40.168
<v S1>be fine for just trundling down the footpath and maybe

0:24:40.168 --> 0:24:42.988
<v S1>going around the occasional car that's parked on the footpath,

0:24:42.988 --> 0:24:46.168
<v S1>or it sees a branch and it goes, oh, hang

0:24:46.168 --> 0:24:51.298
<v S1>on a minute. Um, I know my, my, my user is, um, 2.2m.

0:24:51.298 --> 0:24:54.448
<v S1>That branch is two metres. I'll, I'll stop and go

0:24:54.448 --> 0:24:57.358
<v S1>around it or do something like that. The problem is

0:24:57.358 --> 0:25:00.358
<v S1>when you get into the highly unpredictable things. So I

0:25:00.358 --> 0:25:02.548
<v S1>remember I was yelling at my guide dog one day

0:25:02.548 --> 0:25:04.378
<v S1>because she wouldn't go in a straight line. She kept

0:25:04.378 --> 0:25:07.678
<v S1>zigzagging and, um, I got to you and I went

0:25:07.858 --> 0:25:10.528
<v S1>freaking bloody my dog. She was doing all sorts of

0:25:10.528 --> 0:25:14.068
<v S1>nut stuff. And they went, yeah, somebody was actually repainting

0:25:14.068 --> 0:25:16.948
<v S1>the manhole covers, and they were lifted off down that footpath.

0:25:16.948 --> 0:25:19.588
<v S1>And I went, oh, what a good guide dog my

0:25:19.588 --> 0:25:20.368
<v S1>guide dog was.

0:25:22.228 --> 0:25:23.758
<v S3>So you're right.

0:25:23.758 --> 0:25:27.598
<v S2>These, these unpredictable things, these black swans, these corner cases,

0:25:27.598 --> 0:25:29.638
<v S2>those are going to be the challenging ones. And which

0:25:29.638 --> 0:25:31.888
<v S2>is why you're going to see these things turn up

0:25:31.888 --> 0:25:36.178
<v S2>in more controlled environments. So so suppose you live you

0:25:36.178 --> 0:25:39.418
<v S2>know on an estate, you know, an old person's estate

0:25:39.418 --> 0:25:42.868
<v S2>where it's, you know, a gated community and it's, you know,

0:25:42.868 --> 0:25:45.898
<v S2>it's a much more controlled environment. And then we're trundling

0:25:45.898 --> 0:25:50.308
<v S2>around that environment. Um, you might be quite safe with

0:25:50.308 --> 0:25:53.398
<v S2>a robot, whereas, you know, as soon as you leave

0:25:53.398 --> 0:25:55.828
<v S2>the gates of that community, you're out in the big,

0:25:55.828 --> 0:25:59.578
<v S2>wild world where there are people who inconveniently leave manhole

0:25:59.578 --> 0:26:03.298
<v S2>covers off. You might be better off, um, you know, using

0:26:03.298 --> 0:26:06.538
<v S2>the using the services of a of a of a

0:26:06.538 --> 0:26:09.688
<v S2>guide dog. Um, so I can see it, you know,

0:26:09.688 --> 0:26:12.808
<v S2>those more constrained settings being the ones where this sort

0:26:12.808 --> 0:26:14.818
<v S2>of technology will first turn up, where you can be

0:26:14.818 --> 0:26:17.608
<v S2>more sure that there aren't, you know, all of these

0:26:17.608 --> 0:26:20.518
<v S2>black swan events, manhole covers, waiting

0:26:20.548 --> 0:26:22.948
<v S2>to trip you up or. Um.

0:26:23.398 --> 0:26:23.788
<v S3>Yeah.

0:26:23.788 --> 0:26:27.388
<v S1>And it just seems with the whole of, um, AI,

0:26:27.388 --> 0:26:30.958
<v S1>there's no I remember that famous saying I can't remember

0:26:30.958 --> 0:26:34.348
<v S1>it exactly, but something about who will guard those self-same guardians.

0:26:34.348 --> 0:26:37.378
<v S1>And I'm just wondering, is there any sort of fallback

0:26:37.378 --> 0:26:41.368
<v S1>system that says, no, the AI system has completely lost

0:26:41.368 --> 0:26:44.518
<v S1>the plot. It's wrong. Um, it's got the information wrong.

0:26:44.518 --> 0:26:48.478
<v S1>It's got the orientation and mobility wrong. It's got everything wrong.

0:26:48.478 --> 0:26:50.698
<v S1>Now it's time to stop and just put a human

0:26:50.698 --> 0:26:51.748
<v S1>being in charge.

0:26:52.558 --> 0:26:56.158
<v S2>Quis custodiet ipsos custodes? Who guards the guards?

0:26:56.158 --> 0:26:56.938
<v S3>That's the one.

0:26:59.068 --> 0:27:02.038
<v S2>My, my, my classical education finally got.

0:27:02.158 --> 0:27:02.758
<v S3>Well done.

0:27:03.958 --> 0:27:06.418
<v S2>All of that. Studying my Latin and Greek finally got

0:27:06.418 --> 0:27:07.468
<v S2>used once.

0:27:08.608 --> 0:27:08.998
<v S3>Uh.

0:27:09.388 --> 0:27:13.258
<v S2>Yes. At the end of the day, um, we ultimately

0:27:13.258 --> 0:27:15.208
<v S2>humans have to be in charge because only humans can

0:27:15.208 --> 0:27:19.948
<v S2>be held accountable. So, um, um, we there are plentiful

0:27:19.948 --> 0:27:22.468
<v S2>places where I think we're going to have to make

0:27:22.468 --> 0:27:26.818
<v S2>sure that humans are left with, you know, overall responsibility.

0:27:28.178 --> 0:27:32.198
<v S1>Yeah. No, look, I tend to agree because, um, I mean,

0:27:32.198 --> 0:27:34.208
<v S1>every time I look at AI and it mostly gets

0:27:34.208 --> 0:27:37.868
<v S1>it wrong or somebody says to me, this O&M system

0:27:37.868 --> 0:27:40.028
<v S1>is the best thing since sliced bread, and it still

0:27:40.028 --> 0:27:41.798
<v S1>gets it wrong. Um.

0:27:42.158 --> 0:27:45.308
<v S2>You you're right. It gets it's still making mistakes. It's

0:27:45.308 --> 0:27:48.848
<v S2>still not perfect, but it's also it's easy to be

0:27:48.848 --> 0:27:52.328
<v S2>forgetful of how it has advanced. I mean, I remember

0:27:52.328 --> 0:27:57.398
<v S2>speech recognition systems 20 years ago. Mm. Incredibly painful. They

0:27:57.398 --> 0:28:01.868
<v S2>had to be speaker trained. Mm. Um, that you you

0:28:01.868 --> 0:28:03.458
<v S2>had to train them on your voice. You had to

0:28:03.458 --> 0:28:05.888
<v S2>train them. They didn't work in the wild. You had

0:28:05.888 --> 0:28:10.538
<v S2>to use the proper microphone and and quiet environment. And

0:28:10.538 --> 0:28:14.198
<v S2>now people expect to, you know, buy a new smartphone,

0:28:14.198 --> 0:28:16.718
<v S2>open it up, walk down the street and start talking. Right.

0:28:17.228 --> 0:28:17.858
<v S3>True.

0:28:17.858 --> 0:28:20.888
<v S2>And it doesn't do a perfect job of transcribing what

0:28:20.888 --> 0:28:23.438
<v S2>people say, but it does a pretty good job. Mm.

0:28:23.528 --> 0:28:27.818
<v S2>And and you're just thinking of the advance that we've got.

0:28:27.848 --> 0:28:31.358
<v S2>I mean, that was just unthought of 20 years ago

0:28:31.358 --> 0:28:33.608
<v S2>to think the idea that it wouldn't be trained for

0:28:33.608 --> 0:28:36.248
<v S2>your voice, it would work in the wild with all

0:28:36.248 --> 0:28:39.338
<v S2>the street noise and wind around you. And you could

0:28:39.338 --> 0:28:41.528
<v S2>just you could just start talking and it would get,

0:28:41.528 --> 0:28:44.798
<v S2>you know, 90% of the words. Right. Mm. Is pretty,

0:28:45.308 --> 0:28:46.688
<v S2>you know, as someone who's been working in the field for

0:28:46.688 --> 0:28:49.508
<v S2>those 20 years, I find that pretty amazing. You know,

0:28:49.508 --> 0:28:51.728
<v S2>it's still not it's still not good enough. It's still

0:28:51.728 --> 0:28:55.478
<v S2>not perfect. You still you still, you know, shouldn't stake

0:28:55.508 --> 0:28:58.778
<v S2>your life on it. Um, but for, you know, wandering

0:28:58.778 --> 0:29:03.458
<v S2>around a town, strange environment, and, you know, a country

0:29:03.458 --> 0:29:05.528
<v S2>where they speak a different language, it's good enough to,

0:29:05.528 --> 0:29:07.148
<v S2>you know, make yourself understood.

0:29:07.148 --> 0:29:07.418
<v S3>Well.

0:29:07.418 --> 0:29:09.788
<v S1>That's right. Yeah. And look, and I know when I

0:29:09.788 --> 0:29:12.698
<v S1>first had a look at the Kurzweil personal reader that, um,

0:29:12.698 --> 0:29:15.788
<v S1>Ray Kurzweil brought out in the mid, the mid late 70s,

0:29:16.298 --> 0:29:18.758
<v S1>I mean, that thing wasn't perfect either. And I was

0:29:18.758 --> 0:29:21.038
<v S1>sitting there going, oh my God, who the, who the

0:29:21.038 --> 0:29:24.488
<v S1>blazes can afford something that's worth about $55,000 Australian? And

0:29:24.488 --> 0:29:27.428
<v S1>now it's in our pocket. Yeah. Um, and then I

0:29:27.428 --> 0:29:30.698
<v S1>remember when I got my first Apple IIe, the synthesizer

0:29:30.698 --> 0:29:34.028
<v S1>was even worse than the original Daleks on Doctor Who,

0:29:34.868 --> 0:29:37.058
<v S1>because for a long time it kept saying to me

0:29:37.058 --> 0:29:40.898
<v S1>there was an unclosed error on the Apple IIe. And

0:29:40.898 --> 0:29:44.048
<v S1>it wasn't until somebody said it's actually saying unknown,

0:29:44.048 --> 0:29:48.368
<v S1>but it's actually saying the K as a C, right. Um,

0:29:48.368 --> 0:29:50.768
<v S1>and I thought that was only that was only 40

0:29:50.768 --> 0:29:51.518
<v S1>years ago.

0:29:51.518 --> 0:29:53.408
<v S3>Yeah. Oh another example.

0:29:53.408 --> 0:29:56.438
<v S2>I mean subtitles on the TV now. I mean, it

0:29:56.438 --> 0:29:58.448
<v S2>used to be that people had to transcribe it. That was the

0:29:58.448 --> 0:30:01.418
<v S2>only way you could get anything reasonable as a subtitle out.

0:30:01.418 --> 0:30:04.838
<v S2>Now it's done pretty automatically. Not done perfectly, but it's

0:30:04.838 --> 0:30:07.118
<v S2>good enough to, you know, be able to work out

0:30:07.118 --> 0:30:08.978
<v S2>what's happening if you can't hear the TV.

0:30:09.188 --> 0:30:12.608
<v S1>No, no. Exactly, exactly. And look, I'm I mean, I'm

0:30:12.608 --> 0:30:14.318
<v S1>because I'm an Apple geek. I mean, I'm really looking

0:30:14.318 --> 0:30:17.288
<v S1>forward to, um, what's coming up at the Worldwide Developers

0:30:17.288 --> 0:30:20.558
<v S1>Conference at WWDC to see what Siri morphs into.

0:30:21.128 --> 0:30:23.288
<v S2>Apple has been very slow, though, on the uptake of

0:30:23.288 --> 0:30:29.288
<v S2>AI. That's going to change. Um, and the,

0:30:29.288 --> 0:30:30.848
<v S2>the great thing about Apple is they're going to do

0:30:30.848 --> 0:30:32.648
<v S2>it more and more on your device.

0:30:32.798 --> 0:30:32.978
<v S3>Mm.

0:30:34.248 --> 0:30:36.798
<v S1>Exactly. Yeah. And that's what I'm looking forward to because

0:30:36.798 --> 0:30:41.118
<v S1>I really don't want my, you know, my conversations or

0:30:41.118 --> 0:30:43.008
<v S1>whatever else I might be looking at via an image

0:30:43.008 --> 0:30:45.648
<v S1>going out to a cloud somewhere, because you've got no

0:30:45.648 --> 0:30:47.808
<v S1>idea what where the information is going to end up

0:30:47.808 --> 0:30:50.478
<v S1>on the cloud. So the more stuff is done locally,

0:30:50.928 --> 0:30:52.728
<v S1>and to me that's going to be more appropriate.

0:30:52.728 --> 0:30:55.218
<v S2>But that's that's going to be the future. And certainly

0:30:55.218 --> 0:30:58.038
<v S2>Apple is one of the companies that's been, um, you know,

0:30:58.038 --> 0:31:01.638
<v S2>promoting that. But the idea is that increasingly you don't

0:31:01.638 --> 0:31:05.088
<v S2>want to share your data with everyone and anyone. And increasingly,

0:31:05.088 --> 0:31:08.508
<v S2>the sophisticated AI algorithms will be small enough

0:31:08.508 --> 0:31:12.708
<v S2>and smart enough to run actually on your device. Um,

0:31:12.708 --> 0:31:14.598
<v S2>that solves lots of other problems, and it solves the

0:31:14.598 --> 0:31:17.268
<v S2>privacy problem, but it also solves the latency problem. So

0:31:17.268 --> 0:31:19.998
<v S2>there's lots of places where you don't have the connectivity

0:31:20.148 --> 0:31:22.338
<v S2>you're in, you know, an urban canyon, you're in a tunnel,

0:31:22.338 --> 0:31:26.748
<v S2>whatever it is. Yeah. You can't depend upon, um, the connectivity

0:31:26.748 --> 0:31:28.308
<v S2>to be able to, you know, send the data to

0:31:28.308 --> 0:31:31.338
<v S2>the cloud and have it transcribed or the computer vision

0:31:31.338 --> 0:31:33.108
<v S2>on your car to do stuff. You need to be

0:31:33.108 --> 0:31:33.348
<v S2>able to.

0:31:33.348 --> 0:31:34.488
<v S3>Do it on.

0:31:34.488 --> 0:31:38.358
<v S2>The device. So but that's the future of AI. Increasingly,

0:31:38.358 --> 0:31:41.628
<v S2>it's going to be powerful enough to run on the

0:31:41.628 --> 0:31:44.898
<v S2>limited amount of hardware that you're actually carrying on your person.

0:31:45.108 --> 0:31:47.388
<v S1>No. And look, that's what I'm looking forward to. I mean,

0:31:47.388 --> 0:31:49.938
<v S1>I've got I've got about ten questions that I always

0:31:49.938 --> 0:31:53.238
<v S1>keep checking in with AI every, every three months or so.

0:31:53.328 --> 0:31:57.228
<v S1>And at the moment it's it it's getting better. Um,

0:31:57.228 --> 0:31:59.568
<v S1>the first couple of ones that actually broke dramatically on

0:31:59.568 --> 0:32:03.138
<v S1>my, on my questions. Um, it was funny because, um,

0:32:03.138 --> 0:32:05.838
<v S1>when I was going to interview you, I checked up

0:32:05.838 --> 0:32:08.388
<v S1>and it doesn't think that you're a poker player anymore,

0:32:08.388 --> 0:32:09.198
<v S1>which is cool.

0:32:09.918 --> 0:32:10.368
<v S3>Um.

0:32:10.938 --> 0:32:13.278
<v S1>It knows how many B's are in the, in the

0:32:13.278 --> 0:32:15.288
<v S1>webinar now, which is also pretty cool.

0:32:15.348 --> 0:32:17.688
<v S2>Oh, Taramasalata will call it now, though.

0:32:18.438 --> 0:32:19.758
<v S3>All right.

0:32:20.868 --> 0:32:24.408
<v S2>Although I must admit I struggle to spell Taramasalata myself.

0:32:24.408 --> 0:32:25.398
<v S3>But anyway, yeah.

0:32:25.398 --> 0:32:28.368
<v S1>Actually my favorite word at university for philosophy was, um,

0:32:28.368 --> 0:32:32.418
<v S1>reductio ad absurdum. I love that word. Um, I was like, oh,

0:32:32.418 --> 0:32:34.668
<v S1>that's pretty cool. Um, all right. So look, if people

0:32:34.668 --> 0:32:36.918
<v S1>want to find out more about your books because I

0:32:36.918 --> 0:32:42.228
<v S1>know there's, there's Faking It, there's 2062, uh, there's

0:32:42.228 --> 0:32:45.708
<v S1>Machines Behaving Badly, which I absolutely love. Um, now there's

0:32:45.708 --> 0:32:47.478
<v S1>a fourth book, which, for the life of me, I

0:32:47.478 --> 0:32:48.468
<v S1>can't remember.

0:32:48.738 --> 0:32:52.878
<v S2>It's Alive!: Artificial Intelligence from the Logic Piano to Killer Robots. Actually,

0:32:52.878 --> 0:32:54.108
<v S2>the first book I wrote, which is.

0:32:54.108 --> 0:32:54.678
<v S3>Oh, okay.

0:32:54.678 --> 0:32:55.578
<v S2>About the history of.

0:32:55.578 --> 0:32:57.228
<v S3>AI. Okay, now I've.

0:32:57.228 --> 0:32:59.748
<v S1>Got all of them on Kindle. Have you done anything

0:32:59.748 --> 0:33:01.698
<v S1>with Audible? So people that want to sort of sit

0:33:01.698 --> 0:33:02.538
<v S1>back with a glass of wine?

0:33:02.568 --> 0:33:03.438
<v S3>Good news is.

0:33:03.438 --> 0:33:10.338
<v S2>Coming out next month. I have just recorded my, um, uh, book.

0:33:10.338 --> 0:33:12.378
<v S2>It's just being put out by Bolinda. Um, Faking It

0:33:12.378 --> 0:33:16.008
<v S2>is coming out in audio, and I have to say,

0:33:16.158 --> 0:33:20.058
<v S2>I'm an absolute convert. I, I think it is the

0:33:20.058 --> 0:33:25.938
<v S2>best version of my book. Um, because I realized that,

0:33:25.938 --> 0:33:28.488
<v S2>you know, I've written it in some places. I write

0:33:28.488 --> 0:33:31.368
<v S2>it with, you know, I'm writing things and I'm annoyed

0:33:31.368 --> 0:33:35.628
<v S2>and upset or excited. And if you're if you read

0:33:35.628 --> 0:33:38.148
<v S2>the page carefully, maybe, hopefully you can tell that in

0:33:38.148 --> 0:33:41.298
<v S2>what I've written. Um, but in the audio book,

0:33:41.298 --> 0:33:43.968
<v S2>it's very clear when I, when I spoke the book,

0:33:43.968 --> 0:33:48.738
<v S2>I had the privilege and pleasure of speaking my own book. I,

0:33:48.738 --> 0:33:50.958
<v S2>you know, I could laugh at the jokes in the book.

0:33:50.958 --> 0:33:53.868
<v S2>I can I can express disgust with what the tech

0:33:53.868 --> 0:33:57.018
<v S2>companies are doing with my data. Um, so there's a

0:33:57.018 --> 0:33:59.928
<v S2>lot more information conveyed in the audible book that's not

0:33:59.928 --> 0:34:01.638
<v S2>conveyed in the written book. Maybe.

0:34:02.738 --> 0:34:05.018
<v S1>And, uh, because I, when I was reading the stuff

0:34:05.018 --> 0:34:08.888
<v S1>about the, um, the lethal autonomous weapon systems, I thought

0:34:08.888 --> 0:34:12.908
<v S1>that's just absolutely appalling. Giving, giving drones the capability to

0:34:12.908 --> 0:34:15.458
<v S1>be able to actually kill people without any human intervention.

0:34:15.458 --> 0:34:17.918
<v S1>That's just getting a bit beyond the pale. Basically it.

0:34:17.918 --> 0:34:18.248
<v S3>Is.

0:34:18.638 --> 0:34:21.158
<v S1>Um, and I just thought, oh, I wonder what you

0:34:21.158 --> 0:34:25.148
<v S1>would have sounded like reading that particular paragraph, because.

0:34:25.298 --> 0:34:26.858
<v S2>Next month you could discover.

0:34:26.858 --> 0:34:28.448
<v S3>There you go, I will.

0:34:28.838 --> 0:34:32.048
<v S2>The great thing about the book was I actually only

0:34:32.048 --> 0:34:33.308
<v S2>speak half the book.

0:34:33.518 --> 0:34:34.508
<v S3>Oh, because the.

0:34:34.508 --> 0:34:38.318
<v S2>Book is all about how ChatGPT and examples like that

0:34:38.318 --> 0:34:44.948
<v S2>make mistakes and get things wrong and, um, amuse us and, and, um,

0:34:45.098 --> 0:34:47.348
<v S2>I said to the publisher, I said, well, I could

0:34:47.348 --> 0:34:50.588
<v S2>read this. Um, but why should I read this? Why

0:34:50.588 --> 0:34:52.718
<v S2>not get the computer to read it? So I say

0:34:52.718 --> 0:34:54.938
<v S2>half of it and ChatGPT says the other half.

0:34:55.088 --> 0:34:57.698
<v S1>Yeah. Does it use your voice to do the the

0:34:57.698 --> 0:34:59.948
<v S1>second bit, or does it just use a generic computer

0:34:59.948 --> 0:35:00.998
<v S1>synthesizer voice?

0:35:00.998 --> 0:35:01.238
<v S3>Oh, no.

0:35:01.238 --> 0:35:04.988
<v S2>They've, um, they're getting a very nice, uh, synthesized voice

0:35:04.988 --> 0:35:07.898
<v S2>for that. Um, so the publisher is very excited. They

0:35:07.898 --> 0:35:10.418
<v S2>said this is the first book. They've they've really, um,

0:35:10.418 --> 0:35:14.288
<v S2>embraced the technology. Right. And it's so, so it is

0:35:14.288 --> 0:35:18.398
<v S2>a conversation between me and the computer, and I hope

0:35:18.398 --> 0:35:18.878
<v S2>you enjoy it.