WEBVTT - Teaching Cars to Think Like People

0:00:15.316 --> 0:00:22.316
<v Speaker 1>Pushkin. It seems like we've been hearing for I don't

0:00:22.316 --> 0:00:25.956
<v Speaker 1>know ten years now that self driving cars are two

0:00:26.036 --> 0:00:29.756
<v Speaker 1>years away. And yes, I know Teslas are amazing because

0:00:29.796 --> 0:00:33.076
<v Speaker 1>Tesla owners keep telling me so. But we are clearly

0:00:33.396 --> 0:00:37.516
<v Speaker 1>not yet at that magic transformative moment when cars can

0:00:37.636 --> 0:00:41.556
<v Speaker 1>truly drive themselves. We're not yet in a world where

0:00:41.716 --> 0:00:44.076
<v Speaker 1>a human driver seems as out of place as a

0:00:44.476 --> 0:00:48.796
<v Speaker 1>human elevator operator. So the question for today's show is this,

0:00:49.716 --> 0:00:52.596
<v Speaker 1>what is the most important problem that we have to

0:00:52.676 --> 0:00:55.556
<v Speaker 1>solve for that to happen? What's it going to take

0:00:55.556 --> 0:00:57.796
<v Speaker 1>to get to a world where I'm heading north on

0:00:57.916 --> 0:01:00.356
<v Speaker 1>I-5 and I just climb in the back of

0:01:00.356 --> 0:01:05.276
<v Speaker 1>my car and take a nap? I'm Jacob Goldstein, and

0:01:05.356 --> 0:01:07.476
<v Speaker 1>this is What's Your Problem, the show where we talk

0:01:07.556 --> 0:01:10.556
<v Speaker 1>to entrepreneurs and engineers about the future they're going to

0:01:10.636 --> 0:01:14.036
<v Speaker 1>build once they solve a few problems. My guest today

0:01:14.196 --> 0:01:17.836
<v Speaker 1>is Aisha Evans, CEO of the autonomous car company Zoox,

0:01:18.236 --> 0:01:23.036
<v Speaker 1>Z-O-O-X. Zoox was acquired by Amazon back in twenty twenty,

0:01:23.196 --> 0:01:25.436
<v Speaker 1>and I wanted to talk to Aisha because Zoox is

0:01:25.516 --> 0:01:29.956
<v Speaker 1>really all in on this self driving car dream. They're

0:01:29.956 --> 0:01:32.636
<v Speaker 1>not going for half measures. They're not going for baby steps.

0:01:32.956 --> 0:01:36.996
<v Speaker 1>They're trying to build a truly fully AI take the

0:01:36.996 --> 0:01:40.116
<v Speaker 1>wheel self-driving car. So Aisha seemed like the

0:01:40.156 --> 0:01:43.356
<v Speaker 1>perfect person to talk about the problems engineers are going

0:01:43.396 --> 0:01:45.916
<v Speaker 1>to have to solve to build a car that can

0:01:45.956 --> 0:01:50.996
<v Speaker 1>truly drive itself. We started our conversation talking about the

0:01:51.036 --> 0:01:54.316
<v Speaker 1>self-driving car that Zoox is building now, which really

0:01:54.636 --> 0:01:58.796
<v Speaker 1>isn't exactly a car. It looks like a toaster on wheels.

0:01:58.796 --> 0:02:02.516
<v Speaker 1>It's very boxy. There are sliding doors on either side,

0:02:02.836 --> 0:02:06.276
<v Speaker 1>and inside there are two bench seats facing each other,

0:02:06.396 --> 0:02:09.996
<v Speaker 1>and that's kind of it. There's no driver seat, there's

0:02:10.036 --> 0:02:15.196
<v Speaker 1>no dashboard, there's no steering wheel. If you step into

0:02:15.196 --> 0:02:19.116
<v Speaker 1>our vehicle and you think about driving, we consider that

0:02:19.156 --> 0:02:23.556
<v Speaker 1>we failed. We're transporting you. No pedals, no steering wheel.

0:02:23.596 --> 0:02:25.956
<v Speaker 1>You are not involved in the driving. You're a rider,

0:02:25.996 --> 0:02:29.236
<v Speaker 1>and we conceived it for you. There's also no front

0:02:29.276 --> 0:02:32.956
<v Speaker 1>or back, right? No, fully symmetrical. When you think again

0:02:33.436 --> 0:02:37.276
<v Speaker 1>about the effectiveness and the efficiency of transportation, especially in

0:02:37.396 --> 0:02:41.276
<v Speaker 1>dense urban environment, imagine pulling into a narrow alley and

0:02:41.476 --> 0:02:43.516
<v Speaker 1>not having to make a U turn and just flipping

0:02:43.556 --> 0:02:46.476
<v Speaker 1>the lights and the vehicle goes in the other direction.

0:02:46.676 --> 0:02:48.676
<v Speaker 1>So the thing I like about it, not having a

0:02:48.716 --> 0:02:51.796
<v Speaker 1>steering wheel, not having a front or back is just

0:02:51.876 --> 0:02:56.556
<v Speaker 1>the way the thing looks. The physical thing itself is

0:02:56.636 --> 0:03:00.116
<v Speaker 1>this manifestation of how all in you are. Right. It's

0:03:00.156 --> 0:03:02.476
<v Speaker 1>not like, let's take a car and make a robot

0:03:02.556 --> 0:03:05.116
<v Speaker 1>drive it. It's what does the world look like? What

0:03:05.156 --> 0:03:08.356
<v Speaker 1>does a thing look like if there is no driver? Yes,

0:03:08.876 --> 0:03:12.836
<v Speaker 1>if you you know, you become dispassionate no matter what

0:03:12.916 --> 0:03:14.996
<v Speaker 1>car you have today, a personal car, and you look

0:03:15.036 --> 0:03:19.676
<v Speaker 1>at it purely from an engineering standpoint. It's architected and

0:03:19.916 --> 0:03:24.316
<v Speaker 1>designed with the concept of a human driver. Yeah. What

0:03:24.316 --> 0:03:28.556
<v Speaker 1>we're doing, by the way, is really okay. If AI

0:03:28.756 --> 0:03:32.556
<v Speaker 1>is going to be responsible for the driving amongst other

0:03:32.716 --> 0:03:37.716
<v Speaker 1>human drivers, how do you architect and design the vehicle

0:03:38.476 --> 0:03:42.756
<v Speaker 1>so that AI is the best and safest driver possible?

0:03:43.076 --> 0:03:45.476
<v Speaker 1>That's our point of view, and then you work backward

0:03:45.516 --> 0:03:51.316
<v Speaker 1>from there. So so how's it going to work? Basically,

0:03:51.996 --> 0:03:53.716
<v Speaker 1>you have the app, you say I want to go

0:03:53.916 --> 0:03:55.716
<v Speaker 1>from here to there. We come, we pick you up,

0:03:55.756 --> 0:03:58.516
<v Speaker 1>you sit down, you buckle up, push start. That's a

0:03:58.556 --> 0:04:01.356
<v Speaker 1>safety thing, and then we take you to point B

0:04:02.116 --> 0:04:05.156
<v Speaker 1>and you unbuckle, step off, and by that time we

0:04:05.276 --> 0:04:09.396
<v Speaker 1>probably already know the other passenger. When can I call

0:04:09.436 --> 0:04:13.636
<v Speaker 1>one? Depends on where you are. So if you're in

0:04:13.716 --> 0:04:17.516
<v Speaker 1>Las Vegas fairly soon, much sooner than you think on

0:04:17.596 --> 0:04:19.796
<v Speaker 1>the strip, you'll be able to call one. Does that

0:04:19.876 --> 0:04:23.276
<v Speaker 1>mean this year? Next year? It won't be this year.

0:04:24.716 --> 0:04:28.956
<v Speaker 1>Can you tell me it'll happen by twenty twenty five? Yes.

0:04:29.836 --> 0:04:32.716
<v Speaker 1>And once we do that, then we'll go city by city.

0:04:32.756 --> 0:04:36.396
<v Speaker 1>We've already been public saying that we'll go Las Vegas

0:04:36.556 --> 0:04:40.436
<v Speaker 1>and then San Francisco. Then we want to move east,

0:04:40.956 --> 0:04:43.396
<v Speaker 1>but then as you're moving east you have to handle snow.

0:04:43.716 --> 0:04:46.036
<v Speaker 1>And then we want to be global, and as you

0:04:46.156 --> 0:04:48.956
<v Speaker 1>go global, you get a whole new set of parameters.

0:04:49.076 --> 0:04:53.156
<v Speaker 1>The roundabouts in London come to mind, and so is

0:04:53.196 --> 0:04:57.876
<v Speaker 1>the idea that Zoox vehicles will be like cabs are now,

0:04:58.036 --> 0:05:00.916
<v Speaker 1>like I have a car, and then if I'm in

0:05:00.956 --> 0:05:02.676
<v Speaker 1>the city and I need to get from one place

0:05:02.716 --> 0:05:05.676
<v Speaker 1>to another, maybe I'll take a cab. Is it like that?

0:05:05.956 --> 0:05:09.476
<v Speaker 1>It's like that with a little bit deeper philosophy, which is, look,

0:05:10.356 --> 0:05:13.636
<v Speaker 1>we know that, let's start with the United States, right,

0:05:13.996 --> 0:05:16.876
<v Speaker 1>it's around two and a half cars roughly per family.

0:05:17.356 --> 0:05:19.916
<v Speaker 1>We're not saying it'll be zero overnight, obviously you have

0:05:19.956 --> 0:05:22.636
<v Speaker 1>to be rational, but we can do better than two

0:05:22.676 --> 0:05:25.436
<v Speaker 1>and a half cars per family. So let's say we

0:05:25.516 --> 0:05:27.676
<v Speaker 1>get to one or one and a half. We feel

0:05:27.716 --> 0:05:31.476
<v Speaker 1>that we will have done something. After the break: the

0:05:31.516 --> 0:05:34.476
<v Speaker 1>technical problems that Zoox and, for that matter, all the

0:05:34.516 --> 0:05:37.636
<v Speaker 1>other self driving car companies are still trying to fix.

0:05:38.276 --> 0:05:41.356
<v Speaker 1>One surprisingly hard one, how to figure out who goes

0:05:41.396 --> 0:05:44.156
<v Speaker 1>first when two cars pull up to an intersection at

0:05:44.156 --> 0:05:47.196
<v Speaker 1>the same time. Aisha calls it the you go, I

0:05:47.356 --> 0:05:54.876
<v Speaker 1>go problem. Now let's get back to What's Your Problem. So, okay,

0:05:54.876 --> 0:05:57.156
<v Speaker 1>the world has been talking about self driving cars for

0:05:57.596 --> 0:06:00.356
<v Speaker 1>well over ten years now, and we know that to

0:06:00.476 --> 0:06:02.516
<v Speaker 1>get to this self driving car world, there's going to

0:06:02.556 --> 0:06:05.876
<v Speaker 1>be regulatory hurdles. There's going to be people worried about safety.

0:06:05.956 --> 0:06:08.156
<v Speaker 1>I feel like all of that is pretty familiar by now.

0:06:08.716 --> 0:06:11.196
<v Speaker 1>The thing I really wanted to talk with Aisha about

0:06:11.396 --> 0:06:15.076
<v Speaker 1>is the technical side. You know, like, what problem exactly

0:06:15.156 --> 0:06:18.036
<v Speaker 1>do engineers have to solve to build cars that can

0:06:18.156 --> 0:06:23.156
<v Speaker 1>really drive themselves? All of the companies can drive in

0:06:23.236 --> 0:06:27.676
<v Speaker 1>the normal, very kind of constrained circumstances. The thing is

0:06:28.236 --> 0:06:30.836
<v Speaker 1>all of these scenarios that can pop up, how do

0:06:30.876 --> 0:06:32.476
<v Speaker 1>you deal with them? And by the way, how do

0:06:32.516 --> 0:06:36.196
<v Speaker 1>you deal with them knowing you have human drivers around you,

0:06:36.276 --> 0:06:41.156
<v Speaker 1>they have their own learned behaviors and learned expectations as

0:06:41.156 --> 0:06:43.556
<v Speaker 1>to what's going to come from a human driver. And

0:06:43.596 --> 0:06:47.196
<v Speaker 1>the etiquette, the etiquette of it too, that we're not

0:06:47.236 --> 0:06:50.836
<v Speaker 1>fully there yet. Tell me about the etiquette. Well, depending

0:06:50.836 --> 0:06:53.036
<v Speaker 1>on which parts of the city, like if you're in

0:06:53.116 --> 0:06:56.796
<v Speaker 1>more of a neighborhood area versus more of a business area,

0:06:57.156 --> 0:07:00.516
<v Speaker 1>the behaviors are slightly different in how you approach things.

0:07:00.516 --> 0:07:04.156
<v Speaker 1>So for example, we call it you go, I go. The

0:07:04.356 --> 0:07:06.676
<v Speaker 1>you go, I go is a little bit more assertive on the

0:07:06.716 --> 0:07:09.996
<v Speaker 1>business side of things versus in a place where that's

0:07:09.996 --> 0:07:13.276
<v Speaker 1>a little bit more residential. So all that is again

0:07:13.636 --> 0:07:16.476
<v Speaker 1>the long tail of scenarios that we have to deal

0:07:16.516 --> 0:07:18.876
<v Speaker 1>with and be ready for. The you go, I go

0:07:19.156 --> 0:07:23.836
<v Speaker 1>is a really interesting one because that's not about the

0:07:23.876 --> 0:07:27.596
<v Speaker 1>formal like rules of the road, right, that is very

0:07:27.716 --> 0:07:31.876
<v Speaker 1>much a cultural thing that's gonna even vary from town

0:07:31.916 --> 0:07:34.396
<v Speaker 1>to town. Right. I've lived different places and you go,

0:07:34.596 --> 0:07:37.956
<v Speaker 1>I go. It's totally different in you know, Brooklyn than

0:07:37.996 --> 0:07:41.556
<v Speaker 1>it is in Bozeman, Montana. So, like, how's an AI

0:07:41.676 --> 0:07:44.436
<v Speaker 1>going to figure that out? This is where training is

0:07:44.476 --> 0:07:48.796
<v Speaker 1>important, because it'll be a long, long, long, long

0:07:48.876 --> 0:07:52.476
<v Speaker 1>time before you just deploy what we call generic AI,

0:07:52.556 --> 0:07:54.316
<v Speaker 1>which is you come in for the first time, you

0:07:54.436 --> 0:07:56.396
<v Speaker 1>drop in and boom, you know how to do it.

0:07:56.756 --> 0:07:59.396
<v Speaker 1>So a lot of a lot of what we do

0:07:59.556 --> 0:08:01.316
<v Speaker 1>is learn and train, and that's why you know it's

0:08:01.316 --> 0:08:04.676
<v Speaker 1>called deep learning for example. And so we do it

0:08:04.996 --> 0:08:09.396
<v Speaker 1>enough times enough forms that the stack knows what to

0:08:09.476 --> 0:08:11.436
<v Speaker 1>look for to make the call. And we have those

0:08:11.476 --> 0:08:15.156
<v Speaker 1>examples today. For example, how we drive in San Francisco

0:08:15.316 --> 0:08:18.476
<v Speaker 1>versus how we drive on the campus we're on today,

0:08:18.956 --> 0:08:22.516
<v Speaker 1>it's totally different. We're a lot more assertive in San

0:08:22.556 --> 0:08:27.676
<v Speaker 1>Francisco because that's what's expected. The buffers are smaller because

0:08:27.676 --> 0:08:31.196
<v Speaker 1>that's also what's expected, than they are on the campus.

0:08:31.236 --> 0:08:33.596
<v Speaker 1>So all that is built into the stack, both with

0:08:33.716 --> 0:08:38.076
<v Speaker 1>control logic as well as with algorithms logic based on

0:08:38.316 --> 0:08:42.316
<v Speaker 1>our training models. So I'm trying to sort of boil

0:08:42.396 --> 0:08:44.436
<v Speaker 1>down what you're saying. I mean, it's interesting, like on

0:08:44.476 --> 0:08:47.596
<v Speaker 1>a certain level, you're saying you've solved the kind of

0:08:47.636 --> 0:08:53.196
<v Speaker 1>big macro technical problems for computers to drive cars basically,

0:08:53.636 --> 0:08:57.556
<v Speaker 1>But you're also saying there are a million edge cases

0:08:57.636 --> 0:09:01.956
<v Speaker 1>double parked cars and bikes doing weird things, and who knows,

0:09:02.036 --> 0:09:06.596
<v Speaker 1>people are weird, the world is weird, and those you

0:09:06.676 --> 0:09:09.516
<v Speaker 1>haven't solved exactly. You're just saying you just have to

0:09:09.556 --> 0:09:11.956
<v Speaker 1>practice more to get it. You just have to do

0:09:12.076 --> 0:09:14.796
<v Speaker 1>it like that. Why am I unsatisfied by that answer?

0:09:15.876 --> 0:09:19.076
<v Speaker 1>Because it's to be fair, you have to practice. But

0:09:19.276 --> 0:09:24.316
<v Speaker 1>practice is a feedback loop of doing yeah, finding errors

0:09:25.476 --> 0:09:28.196
<v Speaker 1>and figuring out how to deal with the errors, doing

0:09:28.236 --> 0:09:31.276
<v Speaker 1>it again. So it's a continuous feedback loop of that,

0:09:31.476 --> 0:09:34.276
<v Speaker 1>and that is what is left to solve. Can you

0:09:34.276 --> 0:09:36.716
<v Speaker 1>give me an example of a version of that loop

0:09:36.716 --> 0:09:39.596
<v Speaker 1>you've just completed where you found an error and then

0:09:39.676 --> 0:09:42.236
<v Speaker 1>fixed it. What's an example of that? Yeah, this one

0:09:42.316 --> 0:09:48.556
<v Speaker 1>is very personal. My first Zoox ride ever, so that's

0:09:48.596 --> 0:09:51.556
<v Speaker 1>three and a half years ago. When the vehicle saw

0:09:51.596 --> 0:09:55.956
<v Speaker 1>a double parked vehicle in front of it, but there

0:09:56.116 --> 0:09:59.156
<v Speaker 1>was it's a single lane and so you have basically

0:09:59.236 --> 0:10:03.556
<v Speaker 1>a double yellow line. The vehicle says, oh, the rule

0:10:03.676 --> 0:10:07.036
<v Speaker 1>says through the AI stack that I cannot cross a

0:10:07.076 --> 0:10:12.196
<v Speaker 1>double yellow line. Right. We would actually stop, disengage and

0:10:12.276 --> 0:10:16.196
<v Speaker 1>do it manually and go around it. But over time

0:10:16.276 --> 0:10:19.516
<v Speaker 1>we've now learned through all of the DPVs we've seen,

0:10:19.956 --> 0:10:23.236
<v Speaker 1>and so there are many times now when the app

0:10:23.436 --> 0:10:27.996
<v Speaker 1>is a vehicle, yeah, sorry, when there's a double

0:10:28.036 --> 0:10:32.036
<v Speaker 1>parked vehicle in front of the vehicle. Now the vehicle

0:10:32.116 --> 0:10:34.836
<v Speaker 1>is able to look through the sensor pod, look

0:10:34.876 --> 0:10:39.076
<v Speaker 1>at oncoming traffic, look at everything else, speeds and feeds

0:10:39.076 --> 0:10:42.636
<v Speaker 1>and everything else around it, and many times now it

0:10:42.796 --> 0:10:45.356
<v Speaker 1>is able to just smoothly like a human would do,

0:10:46.076 --> 0:10:48.836
<v Speaker 1>make the decision itself. That's a good one because there's

0:10:48.836 --> 0:10:51.396
<v Speaker 1>a clear rule you can't cross a double yellow line.

0:10:51.436 --> 0:10:53.596
<v Speaker 1>We all learn that. But yeah, we also all know

0:10:53.636 --> 0:10:55.916
<v Speaker 1>if you're driving along and somebody's double parked in front

0:10:55.916 --> 0:10:58.036
<v Speaker 1>of you, got to swerve over and nobody's coming the

0:10:58.076 --> 0:11:00.556
<v Speaker 1>other way, like, you're allowed to do that. Exactly. So

0:11:00.996 --> 0:11:03.716
<v Speaker 1>how does an AI figure out when it's okay to

0:11:03.836 --> 0:11:07.956
<v Speaker 1>do that? So we through again the algorithms, through a

0:11:07.996 --> 0:11:10.956
<v Speaker 1>lot of training, meaning all the scenarios. If it's seen

0:11:11.076 --> 0:11:14.876
<v Speaker 1>over and over again, it's able to say, okay, I looked.

0:11:15.476 --> 0:11:17.516
<v Speaker 1>And this is where again the placement of the sensor

0:11:17.596 --> 0:11:20.876
<v Speaker 1>architecture is very important, because in our vehicle, since it's

0:11:20.916 --> 0:11:23.036
<v Speaker 1>on each of the four top corners and you have a two hundred

0:11:23.036 --> 0:11:26.156
<v Speaker 1>and seventy degree view, it's able to basically say, hey,

0:11:26.236 --> 0:11:28.636
<v Speaker 1>I'm looking in front of me. I have plenty of space.

0:11:28.836 --> 0:11:31.636
<v Speaker 1>There's no car coming on the oncoming lane, and I

0:11:31.676 --> 0:11:34.236
<v Speaker 1>see that I have space. Also in front of the

0:11:34.716 --> 0:11:38.276
<v Speaker 1>double-parked vehicle, there are no pedestrians, there's no bike

0:11:38.316 --> 0:11:40.876
<v Speaker 1>coming up. Oh yeah, I can do this. It goes ahead.

0:11:41.116 --> 0:11:44.316
<v Speaker 1>So you're telling me, basically, like the AI, the machine,

0:11:44.316 --> 0:11:46.316
<v Speaker 1>the vehicle already knows how to drive. It just needs

0:11:46.316 --> 0:11:49.596
<v Speaker 1>to practice, just needs to practice for millions of hours.

0:11:50.036 --> 0:11:52.116
<v Speaker 1>It's funny you should say this. I have a sixteen

0:11:52.196 --> 0:11:54.836
<v Speaker 1>year old daughter, so she started the journey of driving,

0:11:55.716 --> 0:11:58.956
<v Speaker 1>and you know, at first we were just within a

0:11:58.996 --> 0:12:02.756
<v Speaker 1>mile of the house in the neighborhood, then maybe some

0:12:02.836 --> 0:12:05.036
<v Speaker 1>areas where there are stop signs, and now we're up

0:12:05.076 --> 0:12:08.476
<v Speaker 1>to a sort of the supermarket, and very soon we'll

0:12:08.476 --> 0:12:10.916
<v Speaker 1>be up to going to school. It is the case

0:12:11.036 --> 0:12:16.476
<v Speaker 1>that human beings can learn to drive pretty well in

0:12:16.556 --> 0:12:23.116
<v Speaker 1>like, what, forty hours or so, and computers clearly cannot. Right.

0:12:24.276 --> 0:12:29.116
<v Speaker 1>And I used to think people were bad drivers, right,

0:12:29.196 --> 0:12:32.836
<v Speaker 1>Like it seems obvious to me frankly, that people are

0:12:32.836 --> 0:12:35.036
<v Speaker 1>bad drivers, right. You know, we look at our phones

0:12:35.076 --> 0:12:38.116
<v Speaker 1>while we're driving. We overrate our driving abilities. We have

0:12:38.196 --> 0:12:43.436
<v Speaker 1>literal blind spots. On the other hand, Zoox vehicles can

0:12:43.556 --> 0:12:45.876
<v Speaker 1>see way better than people can see all the way

0:12:45.876 --> 0:12:48.076
<v Speaker 1>around in three hundred and sixty degrees. They have like

0:12:48.396 --> 0:12:51.116
<v Speaker 1>the smartest people in the world trying to teach them

0:12:51.116 --> 0:12:53.996
<v Speaker 1>how to drive. They have millions of hours of practicing,

0:12:54.276 --> 0:12:58.636
<v Speaker 1>and still they're worse than human drivers. So, like, I

0:12:58.676 --> 0:13:01.196
<v Speaker 1>don't know how to parse that, right, Like, are people

0:13:01.276 --> 0:13:05.876
<v Speaker 1>actually way better drivers than I thought? Well, so I

0:13:05.996 --> 0:13:08.276
<v Speaker 1>think that we conflate a lot of things when we

0:13:08.316 --> 0:13:10.996
<v Speaker 1>talk about driving. So let's go back to my daughter.

0:13:11.076 --> 0:13:13.876
<v Speaker 1>I don't think I dispute a little bit that my

0:13:14.356 --> 0:13:19.196
<v Speaker 1>child started learning to drive at sixteen. She started learning

0:13:19.196 --> 0:13:22.036
<v Speaker 1>to drive the first day she was in a car, Yeah,

0:13:22.156 --> 0:13:25.196
<v Speaker 1>or maybe even the first day she was alive. That's

0:13:25.276 --> 0:13:29.956
<v Speaker 1>my seeing physics and seeing the world and understanding people.

0:13:30.396 --> 0:13:33.316
<v Speaker 1>I mean, that's maybe the most interesting thing for me

0:13:33.356 --> 0:13:36.716
<v Speaker 1>in this conversation, right, is like the hard part in

0:13:36.796 --> 0:13:39.436
<v Speaker 1>teaching computers to drive is teaching them to figure out

0:13:39.476 --> 0:13:44.996
<v Speaker 1>people. Exactly. And ecosystems that are built around and for people.

0:13:45.556 --> 0:13:47.756
<v Speaker 1>That in a way is the ultimate problem you're trying

0:13:47.756 --> 0:13:50.636
<v Speaker 1>to solve, right, Like try and teach a computer to

0:13:50.716 --> 0:13:54.196
<v Speaker 1>think like a person. That's exactly right. And I think

0:13:54.196 --> 0:13:58.036
<v Speaker 1>when I put my firstborn in the car seat coming

0:13:58.076 --> 0:14:01.876
<v Speaker 1>home from the hospital, she was already starting to learn. Huh,

0:14:02.556 --> 0:14:05.676
<v Speaker 1>say more. You know you're in a car, you know

0:14:05.756 --> 0:14:09.316
<v Speaker 1>it's moving, you know there's a driver, your parents, you're

0:14:09.356 --> 0:14:13.556
<v Speaker 1>looking around you. You're already starting to your internal algorithms

0:14:13.556 --> 0:14:16.796
<v Speaker 1>are already starting to learn and take inputs. You know

0:14:16.916 --> 0:14:18.916
<v Speaker 1>that you need to stop at a light before you're

0:14:18.956 --> 0:14:21.996
<v Speaker 1>an actual driver, right, and then all of the weirdness

0:14:22.036 --> 0:14:24.716
<v Speaker 1>of like if somebody is nodding, or if somebody's pointing,

0:14:24.796 --> 0:14:27.196
<v Speaker 1>or if somebody's waving, and the different things a wave

0:14:27.276 --> 0:14:29.636
<v Speaker 1>can mean, Like there can be like the nice guy point,

0:14:29.676 --> 0:14:32.276
<v Speaker 1>like hey, nice job, or like the angry point, like

0:14:32.356 --> 0:14:35.436
<v Speaker 1>what are you doing when you're driving? Certainly in the city,

0:14:35.476 --> 0:14:38.236
<v Speaker 1>you need to know what those different things mean. That

0:14:38.396 --> 0:14:42.036
<v Speaker 1>is exactly right, and that is the essence of the

0:14:42.116 --> 0:14:45.156
<v Speaker 1>problem that is left to solve. And how do you

0:14:45.196 --> 0:14:52.396
<v Speaker 1>solve it? Practice train in code, figuring out to figure

0:14:52.436 --> 0:14:57.276
<v Speaker 1>out a way to give the computers as many inputs

0:14:57.316 --> 0:15:01.116
<v Speaker 1>as possible, teaching it how to make decisions, and very important,

0:15:02.236 --> 0:15:05.396
<v Speaker 1>making sure that you teach the computer to know what

0:15:05.436 --> 0:15:08.876
<v Speaker 1>it is it doesn't know, so that when it doesn't

0:15:08.876 --> 0:15:12.436
<v Speaker 1>know something, it tells us or it shows us. So

0:15:12.476 --> 0:15:14.836
<v Speaker 1>then we can sit down and say, Okay, that's a problem.

0:15:14.916 --> 0:15:17.076
<v Speaker 1>How do we get around that? How do we solve

0:15:17.116 --> 0:15:22.116
<v Speaker 1>for that? That's really interesting. Fundamentally, this is a discussion

0:15:22.156 --> 0:15:25.316
<v Speaker 1>about risk, right. I mean, what you're saying is Zoox

0:15:25.396 --> 0:15:30.276
<v Speaker 1>vehicles can basically drive themselves now, but not safely enough.

0:15:30.636 --> 0:15:34.116
<v Speaker 1>That's exactly right, And so I mean one question is

0:15:36.756 --> 0:15:38.876
<v Speaker 1>how safe are they going to have to be? Right?

0:15:38.956 --> 0:15:42.516
<v Speaker 1>I imagine as good as human drivers is not good enough.

0:15:43.276 --> 0:15:46.316
<v Speaker 1>This is clearly a super high stakes, literally a life

0:15:46.316 --> 0:15:52.596
<v Speaker 1>and death question. No system is infinitely safe. How good

0:15:52.676 --> 0:15:56.156
<v Speaker 1>is good enough? We have to do way, way, way

0:15:56.236 --> 0:16:01.556
<v Speaker 1>better than humans today, both on the crashes and on

0:16:01.956 --> 0:16:08.596
<v Speaker 1>the number of miles driven per incident, basically and twice

0:16:08.716 --> 0:16:11.836
<v Speaker 1>twice as good? That's like, how much, how good does it

0:16:11.876 --> 0:16:14.596
<v Speaker 1>have to be? Like in a purely rational world, a

0:16:14.596 --> 0:16:16.836
<v Speaker 1>little bit better would be good enough. Right. Human beings

0:16:16.876 --> 0:16:19.316
<v Speaker 1>are not just purely rational. They are very rational, but

0:16:19.356 --> 0:16:22.956
<v Speaker 1>they are not purely rational, and so again, I don't.

0:16:22.996 --> 0:16:25.356
<v Speaker 1>None of us in the industry have given our metrics,

0:16:25.356 --> 0:16:26.716
<v Speaker 1>so I'm not going to be the first one to

0:16:26.716 --> 0:16:29.516
<v Speaker 1>do that. There's a reason we haven't. But we have

0:16:29.556 --> 0:16:32.036
<v Speaker 1>to be much safer than humans. This is not a

0:16:32.236 --> 0:16:34.876
<v Speaker 1>be-as-safe-as type of situation. A little bit better,

0:16:34.956 --> 0:16:37.396
<v Speaker 1>I would not consider that to be a responsible thing

0:16:37.436 --> 0:16:40.076
<v Speaker 1>to do. In a minute, in the lightning round, Aisha

0:16:40.116 --> 0:16:42.836
<v Speaker 1>tells us where, of all the places she's lived,

0:16:43.036 --> 0:16:47.076
<v Speaker 1>human drivers are the worst. Also one domain where AI

0:16:47.196 --> 0:16:55.836
<v Speaker 1>will never be better than humans. Okay, let's get back

0:16:55.836 --> 0:16:58.156
<v Speaker 1>to the show. We're going to close with the Lightning round.

0:16:58.556 --> 0:17:01.316
<v Speaker 1>So I love this. I love the conversation. I love

0:17:01.356 --> 0:17:04.916
<v Speaker 1>talking about big, complicated technical things. But I also love

0:17:05.116 --> 0:17:09.876
<v Speaker 1>asking lots of questions really fast at the end of

0:17:09.916 --> 0:17:14.356
<v Speaker 1>an interview. Are you ready? I am. What is the

0:17:14.356 --> 0:17:16.676
<v Speaker 1>one piece of advice you'd give to somebody who is

0:17:16.716 --> 0:17:20.636
<v Speaker 1>trying to solve a hard problem? Break it down. What's

0:17:20.676 --> 0:17:24.836
<v Speaker 1>the biggest misconception people have about self driving cars? That

0:17:24.996 --> 0:17:31.676
<v Speaker 1>cameras only, that cameras are enough. That's the Tesla view? Yes,

0:17:32.036 --> 0:17:34.916
<v Speaker 1>what driving advice have you given your sixteen year old

0:17:34.956 --> 0:17:38.796
<v Speaker 1>daughter who is learning to drive? Relax and pay attention.

0:17:39.636 --> 0:17:43.676
<v Speaker 1>What is one domain where AI will never be better

0:17:43.756 --> 0:17:48.796
<v Speaker 1>than humans? And don't say love. Understanding the soul of

0:17:48.836 --> 0:17:54.396
<v Speaker 1>a human. Empathy, empathy. Of all the places you've lived,

0:17:54.516 --> 0:17:57.436
<v Speaker 1>where were the drivers the worst? You're gonna get me

0:17:57.476 --> 0:18:02.516
<v Speaker 1>in trouble. Israel was pretty tough from a driving environment standpoint. I

0:18:02.596 --> 0:18:05.636
<v Speaker 1>love the country. I would live there again. Driving there

0:18:05.716 --> 0:18:09.276
<v Speaker 1>is pretty tough. Interesting. In terms of, like, norms and I go,

0:18:09.276 --> 0:18:12.156
<v Speaker 1>you go, right? It's basically I go, I go. Pretty much.

0:18:13.036 --> 0:18:19.636
<v Speaker 1>So where have AI engineers underrated humans? Self-driving. And

0:18:19.676 --> 0:18:23.596
<v Speaker 1>where have AI engineers overrated humans? What are humans worse at? Oh?

0:18:23.636 --> 0:18:26.996
<v Speaker 1>We cannot process what I call linear known processing where

0:18:26.996 --> 0:18:31.396
<v Speaker 1>the rules are clear, the algorithm is clear, we will

0:18:31.436 --> 0:18:36.236
<v Speaker 1>never beat the machine. Yeah, so we're terrible at chess,

0:18:36.276 --> 0:18:39.996
<v Speaker 1>but actually pretty good at driving. That's exactly right. Well,

0:18:40.076 --> 0:18:43.516
<v Speaker 1>this was delightful. Thank you for your time. I really

0:18:43.636 --> 0:18:45.756
<v Speaker 1>enjoyed it too. Thank you. I have to thank you

0:18:45.836 --> 0:18:48.316
<v Speaker 1>for something that nobody has done, and I do a

0:18:48.356 --> 0:18:52.236
<v Speaker 1>lot of these over the years. You pushed me to roots,

0:18:52.516 --> 0:18:55.596
<v Speaker 1>to the root of and to finding a way to

0:18:56.596 --> 0:18:59.956
<v Speaker 1>decipher the essence of self driving and why it's hard,

0:18:59.956 --> 0:19:04.676
<v Speaker 1>and I really appreciate it. I learned something today. Aisha

0:19:04.756 --> 0:19:08.796
<v Speaker 1>Evans is the CEO of Zoox. Today's show was produced

0:19:08.796 --> 0:19:11.996
<v Speaker 1>by Edith Russelo. It was edited by Kate Parkinson Morgan

0:19:12.036 --> 0:19:14.756
<v Speaker 1>and Robert Smith, and it was engineered by Amanda Kay Wong.

0:19:15.116 --> 0:19:17.836
<v Speaker 1>Theme music by Luis Gara. Our development team is Lee,

0:19:17.916 --> 0:19:20.876
<v Speaker 1>Tom Mulad and Justine Lang. A huge team of people

0:19:20.956 --> 0:19:24.036
<v Speaker 1>makes What's Your Problem possible. That team includes, but is

0:19:24.076 --> 0:19:28.116
<v Speaker 1>not limited to, Jacob Weisberg, Na Lobel, Heather Fame, John Schnars,

0:19:28.196 --> 0:19:31.836
<v Speaker 1>Kerry Brodie, Carli mcgleory, Christina Sullivan, Jason Gambrel, Grant Hayes,

0:19:32.596 --> 0:19:35.956
<v Speaker 1>Eric Sandler, Maggie Taylor, Morgan Ratner, Nicole Morano, Mary Beth Smith,

0:19:36.076 --> 0:19:39.916
<v Speaker 1>Royston Breserve, Maya Kanig, Daniella Lakhan, Kazia Tan and David Blefer.

0:19:40.316 --> 0:19:43.036
<v Speaker 1>What's Your Problem is a co production of Pushkin Industries

0:19:43.076 --> 0:19:46.436
<v Speaker 1>and iHeartMedia. To find more Pushkin podcasts, listen on the

0:19:46.516 --> 0:19:51.276
<v Speaker 1>iHeartRadio app, Apple Podcasts, or wherever. I'm Jacob Goldstein

0:19:51.316 --> 0:19:53.796
<v Speaker 1>and I'll be back next week with another episode of

0:19:53.916 --> 0:19:54.636
<v Speaker 1>What's Your Problem