WEBVTT - Luis von Ahn Explains How Computers and Humans Learn From Each Other

0:00:02.480 --> 0:00:15.760
<v Speaker 1>Bloomberg Audio Studios. Podcasts. Radio. News.

0:00:17.920 --> 0:00:21.200
<v Speaker 2>Hello and welcome to another episode of The Odd Lots podcast.

0:00:21.280 --> 0:00:24.439
<v Speaker 2>I'm Tracy Alloway and I'm Joe Weisenthal. Joe, you know,

0:00:24.640 --> 0:00:27.480
<v Speaker 2>I had a life realization recently.

0:00:27.720 --> 0:00:30.240
<v Speaker 3>Okay, this should be good, go on.

0:00:30.800 --> 0:00:34.440
<v Speaker 2>It struck me that I am spending a non-negligible

0:00:34.520 --> 0:00:37.440
<v Speaker 2>amount of my time proving that I am in fact

0:00:37.880 --> 0:00:38.680
<v Speaker 2>a human being.

0:00:39.120 --> 0:00:42.120
<v Speaker 3>It's getting harder and harder. I know what you're talking about.

0:00:42.159 --> 0:00:44.320
<v Speaker 3>So we're talking. You know, you go to a website

0:00:44.360 --> 0:00:46.160
<v Speaker 3>and you have to enter in the captcha and it's

0:00:46.240 --> 0:00:49.600
<v Speaker 3>like click all these squares that has like a crosswalk

0:00:49.640 --> 0:00:51.920
<v Speaker 3>on them or a truck, and like it feels like

0:00:51.960 --> 0:00:54.400
<v Speaker 3>it's just getting harder. And sometimes I'm like, no, trust me,

0:00:54.440 --> 0:00:54.960
<v Speaker 3>I'm a human.

0:00:55.760 --> 0:00:58.080
<v Speaker 2>This is it. And every time it happens, I kind

0:00:58.080 --> 0:01:01.040
<v Speaker 2>of have a moment of self doubt whether or not

0:01:02.000 --> 0:01:05.319
<v Speaker 2>is it just me? Am I particularly bad at picking

0:01:05.360 --> 0:01:08.720
<v Speaker 2>out all the motorcycles in a set of pictures? Or

0:01:08.800 --> 0:01:13.559
<v Speaker 2>are they just becoming increasingly weird or perhaps increasingly sophisticated

0:01:13.760 --> 0:01:16.679
<v Speaker 2>in the face of new types of technology.

0:01:17.040 --> 0:01:19.600
<v Speaker 3>It's not just you. I've heard this from multiple people.

0:01:19.800 --> 0:01:24.360
<v Speaker 3>In fact, prepping for this episode, I heard people talking

0:01:24.400 --> 0:01:27.280
<v Speaker 3>about exactly this. But you know, it's like a big problem.

0:01:27.319 --> 0:01:29.240
<v Speaker 3>You know, we did that Worldcoin episode, like everyone

0:01:29.400 --> 0:01:32.039
<v Speaker 3>is trying to figure out, like how in a world

0:01:32.080 --> 0:01:35.240
<v Speaker 3>of AI and bots and artificial intelligence, all that stuff,

0:01:35.560 --> 0:01:38.039
<v Speaker 3>how do you know whether someone you're interacting with is

0:01:38.080 --> 0:01:38.880
<v Speaker 3>in fact a person.

0:01:39.120 --> 0:01:43.039
<v Speaker 2>Yeah, and I'm glad you mentioned AI because obviously part

0:01:43.080 --> 0:01:46.119
<v Speaker 2>of this dynamic is AI seems to be getting better

0:01:46.240 --> 0:01:50.560
<v Speaker 2>at solving these particular types of problems, but also they're

0:01:50.600 --> 0:01:54.720
<v Speaker 2>being used more right to train AI models. So at

0:01:54.720 --> 0:01:56.840
<v Speaker 2>this point, I think we all know why we're constantly

0:01:57.000 --> 0:02:00.640
<v Speaker 2>trying to identify bikes in a bunch of photos. But

0:02:00.960 --> 0:02:06.680
<v Speaker 2>the whole idea behind CAPTCHAs is or was that humans

0:02:06.720 --> 0:02:09.320
<v Speaker 2>still have an edge. So there are some things that

0:02:09.400 --> 0:02:13.480
<v Speaker 2>humans are better able to do versus machines. And one

0:02:13.480 --> 0:02:15.400
<v Speaker 2>of the things that we used to talk about humans

0:02:15.440 --> 0:02:18.840
<v Speaker 2>having an edge in was linguistics. So there is this

0:02:18.919 --> 0:02:23.000
<v Speaker 2>idea that human language was so complex, so nuanced, that

0:02:23.080 --> 0:02:27.359
<v Speaker 2>machines would maybe never be able to fully appreciate all

0:02:27.360 --> 0:02:31.160
<v Speaker 2>the intricacies and subtleties of the human language. But obviously,

0:02:31.200 --> 0:02:35.200
<v Speaker 2>since the arrival of generative AI and natural language processing,

0:02:35.560 --> 0:02:38.640
<v Speaker 2>I think there's more of a question mark around that. Yeah.

0:02:38.720 --> 0:02:41.440
<v Speaker 3>I mean, look, I think like a typical chat bot

0:02:41.520 --> 0:02:44.080
<v Speaker 3>right now is probably better than most people at just

0:02:44.200 --> 0:02:47.280
<v Speaker 3>typing out several paragraphs. It all sort of seems,

0:02:47.280 --> 0:02:48.880
<v Speaker 3>as they say on the internet, kind

0:02:48.919 --> 0:02:51.080
<v Speaker 3>of mid-curve to me. It never, like, strikes me

0:02:51.120 --> 0:02:55.919
<v Speaker 3>as like incredibly intelligent, but clearly computers can talk about

0:02:55.919 --> 0:02:58.480
<v Speaker 3>as well as humans, and so it raises all sorts

0:02:58.480 --> 0:03:01.320
<v Speaker 3>of interesting questions. You mentioned that part of CAPTCHA is

0:03:01.400 --> 0:03:04.200
<v Speaker 3>part of this, like, training computers. A big part of

0:03:04.240 --> 0:03:07.440
<v Speaker 3>these chatbots is the so-called, like, real-life human feedback,

0:03:07.480 --> 0:03:09.680
<v Speaker 3>where people say this answer is better than another, this

0:03:09.720 --> 0:03:12.240
<v Speaker 3>answer is better than another, as they refine the models, et cetera.

0:03:12.720 --> 0:03:13.840
<v Speaker 3>So I think there's, like,

0:03:13.800 --> 0:03:16.920
<v Speaker 3>an interesting moment where, like, we're learning from computers and

0:03:16.960 --> 0:03:21.720
<v Speaker 3>computers are learning from us, maybe collaboratively, the two sides,

0:03:22.240 --> 0:03:25.120
<v Speaker 3>carbon and silicon, working together.

0:03:25.680 --> 0:03:27.560
<v Speaker 2>I think that's a great way of putting it. Also,

0:03:27.800 --> 0:03:31.880
<v Speaker 2>mid-curve is such an underappreciated insult, like calling people

0:03:31.960 --> 0:03:34.760
<v Speaker 2>top of the bell curve is one of my favorite

0:03:34.760 --> 0:03:37.320
<v Speaker 2>things to do online. Anyway, I am very pleased to

0:03:37.400 --> 0:03:41.240
<v Speaker 2>say that today we actually have the perfect guest. We're

0:03:41.280 --> 0:03:45.680
<v Speaker 2>going to be speaking to someone who was very instrumental

0:03:45.800 --> 0:03:49.440
<v Speaker 2>in the development of things like CAPTCHA and someone who

0:03:49.520 --> 0:03:53.440
<v Speaker 2>is doing a lot with AI, particularly in the field

0:03:53.600 --> 0:03:56.800
<v Speaker 2>of linguistics and language. Right now, we're going to be

0:03:56.800 --> 0:03:59.440
<v Speaker 2>speaking with Luis von Ahn. He is, of course, the

0:03:59.560 --> 0:04:02.920
<v Speaker 2>CEO and co-founder of Duolingo. So, Luis, thank

0:04:03.000 --> 0:04:04.200
<v Speaker 2>you so much for coming on Odd Lots.

0:04:04.040 --> 0:04:06.119
<v Speaker 4>Thank you, thank you for having me.

0:04:06.800 --> 0:04:09.680
<v Speaker 2>So maybe to begin with talk to us about the

0:04:09.800 --> 0:04:14.160
<v Speaker 2>idea behind CAPTCHA and why it seems to have become

0:04:14.320 --> 0:04:17.039
<v Speaker 2>I don't want to say a significant portion of my life,

0:04:17.080 --> 0:04:20.080
<v Speaker 2>but I certainly spend a couple minutes every day doing

0:04:20.080 --> 0:04:21.000
<v Speaker 2>at least one version.

0:04:21.680 --> 0:04:24.359
<v Speaker 4>Yeah. So the original CAPTCHA, the idea of a CAPTCHA

0:04:24.560 --> 0:04:28.279
<v Speaker 4>was a test to distinguish humans from computers. There are reasons

0:04:28.320 --> 0:04:31.120
<v Speaker 4>why you may want to distinguish whether you're interacting with

0:04:31.160 --> 0:04:34.000
<v Speaker 4>a human or a computer online. For example, and this

0:04:34.120 --> 0:04:37.240
<v Speaker 4>is kind of the original motivation for it, companies offer

0:04:37.279 --> 0:04:40.359
<v Speaker 4>free email services, and you know they have the problem

0:04:40.440 --> 0:04:43.599
<v Speaker 4>that if you allow anything to sign up for a

0:04:43.600 --> 0:04:46.920
<v Speaker 4>free email service, like either a computer or human, somebody could

0:04:46.960 --> 0:04:49.560
<v Speaker 4>write a program to obtain millions of free email accounts,

0:04:49.880 --> 0:04:53.919
<v Speaker 4>whereas humans, because they are usually not that patient, cannot

0:04:54.240 --> 0:04:56.520
<v Speaker 4>get millions of email accounts for themselves. They can only

0:04:56.520 --> 0:05:00.360
<v Speaker 4>get one or two. So the original motivation for CAPTCHA

0:05:00.440 --> 0:05:02.359
<v Speaker 4>was to make a test to make sure that whoever

0:05:02.640 --> 0:05:04.760
<v Speaker 4>is getting a free email account is actually a human and

0:05:04.800 --> 0:05:07.760
<v Speaker 4>not a computer program that was written to obtain millions

0:05:07.760 --> 0:05:11.000
<v Speaker 4>of email accounts. So, you know, the way it worked,

0:05:11.120 --> 0:05:13.400
<v Speaker 4>there are many kinds of tests. Originally, the way it

0:05:13.440 --> 0:05:16.560
<v Speaker 4>worked is distorted letters, So you would get a bunch

0:05:16.600 --> 0:05:18.800
<v Speaker 4>of letters that were pre-distorted and you had to type

0:05:18.800 --> 0:05:21.640
<v Speaker 4>what they were. And the reason that worked is because

0:05:22.240 --> 0:05:25.560
<v Speaker 4>human beings are very good at reading distorted letters. But at

0:05:25.600 --> 0:05:27.720
<v Speaker 4>the time this was, you know, more than twenty years ago,

0:05:28.000 --> 0:05:31.720
<v Speaker 4>computers just could not recognize distorted letters very well. So

0:05:31.760 --> 0:05:34.280
<v Speaker 4>that was a great test to determine whether you were

0:05:34.279 --> 0:05:36.880
<v Speaker 4>talking to a human or a computer. But what happened

0:05:36.880 --> 0:05:40.880
<v Speaker 4>is over time, computers got quite good at this, at

0:05:40.920 --> 0:05:45.919
<v Speaker 4>deciphering distorted text, so it was no longer possible

0:05:45.960 --> 0:05:48.640
<v Speaker 4>to give an image with distorted text and distinguish a

0:05:48.680 --> 0:05:50.840
<v Speaker 4>human from a computer, because computers pretty much got as

0:05:50.880 --> 0:05:54.360
<v Speaker 4>good as a human. At that point, these tests started

0:05:54.520 --> 0:05:56.480
<v Speaker 4>changing to other things. I mean, one of the more

0:05:56.520 --> 0:05:59.400
<v Speaker 4>popular ones that you see nowadays is kind of clicking

0:05:59.520 --> 0:06:02.520
<v Speaker 4>on the images of something. So you can see a grid,

0:06:02.680 --> 0:06:05.240
<v Speaker 4>like a four-by-four grid, and it may say

0:06:05.640 --> 0:06:07.920
<v Speaker 4>click on all the traffic lights, or click on all

0:06:07.960 --> 0:06:12.960
<v Speaker 4>the bicycles, et cetera. And by clicking on them, you know,

0:06:13.120 --> 0:06:17.279
<v Speaker 4>you're showing that you can actually recognize these things.

0:06:17.480 --> 0:06:20.520
<v Speaker 4>And the reason they're getting harder is because computers are

0:06:20.520 --> 0:06:23.960
<v Speaker 4>getting better and better at deciphering which ones are traffic lights,

0:06:24.000 --> 0:06:27.160
<v Speaker 4>et cetera. And by now, what you're getting here are

0:06:27.200 --> 0:06:30.280
<v Speaker 4>the things that we still think computers are not very

0:06:30.279 --> 0:06:33.920
<v Speaker 4>good at. So the image may be very blurry, or

0:06:34.080 --> 0:06:35.920
<v Speaker 4>you know, you may just get a tiny little corner

0:06:36.000 --> 0:06:38.599
<v Speaker 4>of it and things like that. So that's why they're

0:06:38.600 --> 0:06:40.960
<v Speaker 4>getting harder, and I expect that to continue happening.

0:06:41.680 --> 0:06:45.040
<v Speaker 3>So you founded a company called

0:06:45.200 --> 0:06:49.000
<v Speaker 3>reCAPTCHA, which you sold to Google several years ago.

0:06:49.480 --> 0:06:52.040
<v Speaker 3>Is there gonna be a point where, I mean, I

0:06:52.080 --> 0:06:56.479
<v Speaker 3>assume computer vision and their ability to decode images or

0:06:56.520 --> 0:06:59.800
<v Speaker 3>recognize images is not done improving. I assume it's going

0:06:59.800 --> 0:07:03.479
<v Speaker 3>to get better, whereas humans' ability to decode images, I

0:07:03.560 --> 0:07:06.159
<v Speaker 3>doubt it's really getting any better. We've probably been about

0:07:06.160 --> 0:07:09.120
<v Speaker 3>the same for a couple thousand years now. Like, is

0:07:09.160 --> 0:07:11.920
<v Speaker 3>there going to be a point in which it's impossible

0:07:12.040 --> 0:07:14.600
<v Speaker 3>to create a visual test that humans are better at

0:07:14.600 --> 0:07:15.480
<v Speaker 3>than computers?

0:07:15.680 --> 0:07:18.320
<v Speaker 4>I believe that will happen at some point. Yeah, it's

0:07:18.480 --> 0:07:21.840
<v Speaker 4>very hard to say when exactly, but you know, you

0:07:21.880 --> 0:07:24.440
<v Speaker 4>can just see at this point it's getting you know,

0:07:24.480 --> 0:07:27.200
<v Speaker 4>computers are getting better and better. And you know, the

0:07:27.240 --> 0:07:30.120
<v Speaker 4>other thing that is important to mention is this type

0:07:30.120 --> 0:07:33.200
<v Speaker 4>of test has extra constraints. It also has to be

0:07:33.280 --> 0:07:36.360
<v Speaker 4>the case that it's not just that humans can do

0:07:36.400 --> 0:07:38.040
<v Speaker 4>it. It's like, really, humans should be able to do it

0:07:38.040 --> 0:07:43.600
<v Speaker 4>pretty quickly and, you know, successfully.

0:07:43.360 --> 0:07:46.400
<v Speaker 3>Quickly, and on a mobile phone and a very small

0:07:46.480 --> 0:07:48.840
<v Speaker 3>screen in which like my thumb is like half the

0:07:48.880 --> 0:07:49.520
<v Speaker 3>size of the screen.

0:07:49.640 --> 0:07:51.600
<v Speaker 4>Yeah, yeah. And it may not be, you know, quickly.

0:07:51.680 --> 0:07:53.520
<v Speaker 4>I mean it may take you, I don't know, thirty

0:07:53.520 --> 0:07:55.480
<v Speaker 4>seconds or a minute. But we cannot make a test

0:07:55.480 --> 0:07:59.200
<v Speaker 4>that takes you an hour. We can't do that. So

0:07:59.240 --> 0:08:01.200
<v Speaker 4>it has to be quick. It has to be done

0:08:01.200 --> 0:08:02.480
<v Speaker 4>on a mobile phone. It has to be the case

0:08:02.480 --> 0:08:04.440
<v Speaker 4>that the computer should be able to grade it. The computer

0:08:04.480 --> 0:08:06.160
<v Speaker 4>should be able to know what the right answer was,

0:08:06.280 --> 0:08:09.400
<v Speaker 4>even though it can't solve it. So because of all

0:08:09.400 --> 0:08:11.880
<v Speaker 4>of these constraints, I mean, my sense is at some

0:08:12.000 --> 0:08:14.160
<v Speaker 4>point this is just going to be impossible. I mean,

0:08:14.320 --> 0:08:17.360
<v Speaker 4>we knew this when we started the original CAPTCHA, that

0:08:17.400 --> 0:08:19.640
<v Speaker 4>at some point computers were going to get good enough,

0:08:20.800 --> 0:08:22.960
<v Speaker 4>but we just had no idea how long it was

0:08:23.000 --> 0:08:25.520
<v Speaker 4>going to take. And I still don't know how long

0:08:25.600 --> 0:08:27.520
<v Speaker 4>it's going to take. But you know, I would not

0:08:27.600 --> 0:08:29.960
<v Speaker 4>be surprised if in five to ten years there's just

0:08:30.040 --> 0:08:32.679
<v Speaker 4>not much that you can do that is really quick

0:08:33.080 --> 0:08:36.200
<v Speaker 4>online to be able to differentiate humans from computers.

0:08:36.360 --> 0:08:39.760
<v Speaker 2>Yeah, that's when we get the eyeball-scanning orbs. But

0:08:40.000 --> 0:08:42.360
<v Speaker 2>I mean you mentioned that you can't have a test

0:08:42.679 --> 0:08:45.760
<v Speaker 2>that takes an hour or something like that. But this

0:08:45.880 --> 0:08:49.160
<v Speaker 2>kind of begs the question in my mind of why

0:08:49.200 --> 0:08:51.839
<v Speaker 2>are people using these tests at all? So, like, Okay,

0:08:51.920 --> 0:08:56.160
<v Speaker 2>obviously you want to distinguish between humans and robots, but

0:08:56.280 --> 0:08:59.160
<v Speaker 2>I sometimes get the sense that these are basically free

0:08:59.240 --> 0:09:03.600
<v Speaker 2>labor AI training programs, right? So even if you can

0:09:03.760 --> 0:09:07.439
<v Speaker 2>verify identity in some other way, why not get people

0:09:07.679 --> 0:09:10.920
<v Speaker 2>on a mass scale to spend two minutes training self

0:09:11.000 --> 0:09:11.680
<v Speaker 2>driving cars.

0:09:12.200 --> 0:09:14.240
<v Speaker 4>Yeah, I mean, this is what these things are doing.

0:09:14.240 --> 0:09:17.320
<v Speaker 4>That was the original idea of reCAPTCHA, which was my company.

0:09:17.400 --> 0:09:21.120
<v Speaker 4>The idea was that you could, at the same time

0:09:21.160 --> 0:09:23.000
<v Speaker 4>as you were proving that you are a human, you

0:09:23.040 --> 0:09:25.400
<v Speaker 4>could be doing something that computers could not yet do,

0:09:25.800 --> 0:09:29.080
<v Speaker 4>and that data could be used to improve computer programs

0:09:29.080 --> 0:09:32.520
<v Speaker 4>to do it. So certainly, when you're clicking on bicycles

0:09:32.600 --> 0:09:35.280
<v Speaker 4>or when you're clicking on traffic lights or whatever, that

0:09:35.440 --> 0:09:38.600
<v Speaker 4>is likely data that is being used. I say likely

0:09:38.600 --> 0:09:40.800
<v Speaker 4>because, you know, I don't know what CAPTCHA you're using.

0:09:41.000 --> 0:09:42.360
<v Speaker 4>There may be some that are not doing that, but

0:09:42.800 --> 0:09:47.000
<v Speaker 4>overall that data is being used to improve things like

0:09:47.559 --> 0:09:51.800
<v Speaker 4>self-driving cars, image recognition programs, et cetera. So that

0:09:51.920 --> 0:09:54.800
<v Speaker 4>is happening, and that's you know, generally a good thing

0:09:54.840 --> 0:09:59.000
<v Speaker 4>because that's making basically AI smarter and smarter. But you know,

0:09:59.480 --> 0:10:01.520
<v Speaker 4>we still need it to be the case that it's a

0:10:01.559 --> 0:10:05.480
<v Speaker 4>good security mechanism. So if at some point just computers

0:10:05.480 --> 0:10:09.080
<v Speaker 4>can do that, then you know, that's just not a

0:10:09.080 --> 0:10:10.959
<v Speaker 4>great security mechanism and it's not going to be used.

0:10:10.960 --> 0:10:13.480
<v Speaker 4>And my sense is if we're gonna want to do something,

0:10:13.480 --> 0:10:16.280
<v Speaker 4>we are going to need something like real identity, Like

0:10:16.600 --> 0:10:18.040
<v Speaker 4>I don't know if it's going to be eyeball scanning

0:10:18.120 --> 0:10:20.520
<v Speaker 4>or whatever, but we're gonna... You know, the

0:10:20.840 --> 0:10:23.360
<v Speaker 4>nice thing about a CAPTCHA is it doesn't tie you

0:10:23.400 --> 0:10:26.040
<v Speaker 4>to you. It just proves that you're a human. Right,

0:10:26.440 --> 0:10:29.040
<v Speaker 4>We're probably going to need something that ties you to you.

0:10:29.760 --> 0:10:31.760
<v Speaker 4>We're probably going to need something that says, well, I

0:10:31.960 --> 0:10:35.400
<v Speaker 4>just know this is this specific person because you know whatever,

0:10:35.800 --> 0:10:39.040
<v Speaker 4>we're scanning their eyeball, we're looking at their fingerprint, whatever

0:10:39.080 --> 0:10:41.040
<v Speaker 4>it is, and it is actually a real person, and

0:10:41.080 --> 0:10:42.000
<v Speaker 4>it is this person.

0:10:43.000 --> 0:10:45.280
<v Speaker 3>Why don't we sort of zoom out and back up

0:10:45.320 --> 0:10:48.240
<v Speaker 3>for a second. So currently you are the CEO of

0:10:48.360 --> 0:10:54.120
<v Speaker 3>Duolingo, the popular language-learning app, a publicly traded company.

0:10:54.600 --> 0:10:58.160
<v Speaker 3>It's done much better, sort of stock-wise, than many companies that

0:10:58.240 --> 0:11:01.480
<v Speaker 3>came public in twenty twenty-one. I'd have expected, you know,

0:11:01.640 --> 0:11:03.760
<v Speaker 3>there was a boom when people had a bunch of time

0:11:03.800 --> 0:11:06.520
<v Speaker 3>on their hands, that it'd gone down. You're also sort of one

0:11:06.520 --> 0:11:10.240
<v Speaker 3>of the most respected sort of computer science thinkers coming

0:11:10.240 --> 0:11:13.520
<v Speaker 3>out of Carnegie Mellon University. What is the through

0:11:13.600 --> 0:11:16.120
<v Speaker 3>line of your work or how would you characterize that

0:11:16.200 --> 0:11:20.280
<v Speaker 3>connects something like CAPTCHAs to language learning at Duolingo?

0:11:20.760 --> 0:11:23.600
<v Speaker 4>It's similar to what you were talking about. I was smiling when

0:11:23.600 --> 0:11:25.320
<v Speaker 4>you were mentioning that. I mean, I think the general

0:11:25.360 --> 0:11:29.319
<v Speaker 4>through line is a combination of humans learning from computers

0:11:29.320 --> 0:11:32.480
<v Speaker 4>and computers learning from humans. And you know, CAPTCHA had

0:11:32.520 --> 0:11:35.480
<v Speaker 4>that: while you were typing a CAPTCHA, computers were learning

0:11:35.520 --> 0:11:38.040
<v Speaker 4>from what you were doing. In the case of Duolingo,

0:11:38.600 --> 0:11:41.760
<v Speaker 4>it's really a symbiotic thing that both are learning, in

0:11:41.800 --> 0:11:45.160
<v Speaker 4>that humans are learning a language and in the case

0:11:45.160 --> 0:11:47.080
<v Speaker 4>of Duolingo, Duolingo is learning how to teach

0:11:47.160 --> 0:11:51.520
<v Speaker 4>humans better by interacting with humans a lot. So you know,

0:11:51.600 --> 0:11:54.960
<v Speaker 4>Duolingo just gets better with time because we figure

0:11:55.000 --> 0:11:58.520
<v Speaker 4>out different ways in which humans are just learning better.

0:11:59.160 --> 0:12:01.440
<v Speaker 4>You know, humans are getting better with a language, and

0:12:01.520 --> 0:12:03.439
<v Speaker 4>Duolingo is getting better at teaching you languages.

0:12:19.120 --> 0:12:20.640
<v Speaker 2>Joe, have you used Duolingo?

0:12:21.400 --> 0:12:25.520
<v Speaker 3>I haven't. Well, okay, I hadn't up until recently. So

0:12:26.080 --> 0:12:29.040
<v Speaker 3>last week, as it turns out, I visited my mother

0:12:29.120 --> 0:12:32.199
<v Speaker 3>who lives in Guatemala, which, Luis, I understand you're from.

0:12:32.280 --> 0:12:35.280
<v Speaker 3>And oh, wow, yeah, she's she is. Uh, she's not

0:12:35.360 --> 0:12:38.440
<v Speaker 3>from there, but she visited a friend there eight years

0:12:38.440 --> 0:12:39.880
<v Speaker 3>ago and she loved it, and she's like, I'm just

0:12:39.920 --> 0:12:42.720
<v Speaker 3>gonna stay and she has a little never left. She

0:12:42.800 --> 0:12:44.440
<v Speaker 3>loved it so much, and so I visited her for

0:12:44.480 --> 0:12:47.240
<v Speaker 3>the first time at her house near Lake Atitlan, and

0:12:47.240 --> 0:12:48.679
<v Speaker 3>then I was like, oh, there's a great life and

0:12:48.720 --> 0:12:51.640
<v Speaker 3>maybe one day I'll even have that house. And I

0:12:51.640 --> 0:12:54.560
<v Speaker 3>should learn Spanish. And so I did, partly because of

0:12:54.559 --> 0:12:57.280
<v Speaker 3>that trip and partly to prepare for this episode. I

0:12:57.400 --> 0:12:59.880
<v Speaker 3>downloaded it and have started. I know a little bit

0:12:59.920 --> 0:13:02.280
<v Speaker 3>of Spanish, not much. Like, I can, you know, ask

0:13:02.320 --> 0:13:04.079
<v Speaker 3>for the bill and stuff, but it's like, oh, I should,

0:13:04.120 --> 0:13:05.040
<v Speaker 3>I should start to learn it.

0:13:05.160 --> 0:13:09.160
<v Speaker 2>That's funny because I also started learning Spanish right before

0:13:09.280 --> 0:13:12.040
<v Speaker 2>a trip to Guatemala. There you go with Duolingo, and

0:13:12.280 --> 0:13:16.000
<v Speaker 2>I'm not the best advertisement for the app. I'm afraid,

0:13:16.080 --> 0:13:18.959
<v Speaker 2>like the only thing I remember is basically like Quisiera

0:13:19.120 --> 0:13:24.000
<v Speaker 2>una hapatas personas. That's all I remember from.

0:13:23.920 --> 0:13:24.600
<v Speaker 3>It's pretty good.

0:13:25.000 --> 0:13:26.160
<v Speaker 4>Thanks, that's pretty good.

0:13:26.920 --> 0:13:28.720
<v Speaker 2>All right, I need to get back on it. But

0:13:29.080 --> 0:13:31.600
<v Speaker 2>why don't you talk to us a little bit about

0:13:31.640 --> 0:13:37.040
<v Speaker 2>the opportunity with AI in this sort of language learning space,

0:13:37.280 --> 0:13:41.280
<v Speaker 2>because intuitively, it would seem like things like chat bots

0:13:41.320 --> 0:13:44.800
<v Speaker 2>and generative AI and natural language processing and things like

0:13:44.840 --> 0:13:48.840
<v Speaker 2>that would be an amazing fit for this type of business.

0:13:49.120 --> 0:13:51.600
<v Speaker 4>Yeah, it's a really good fit. So okay, So you know,

0:13:51.600 --> 0:13:55.320
<v Speaker 4>we teach languages at Duolingo. Historically, you know,

0:13:55.400 --> 0:13:57.720
<v Speaker 4>learning a language just has a lot of different components.

0:13:57.760 --> 0:14:00.440
<v Speaker 4>You got to learn how to read a language.

0:14:00.440 --> 0:14:02.760
<v Speaker 4>You got to learn some vocabulary, you got to learn

0:14:02.760 --> 0:14:05.480
<v Speaker 4>how to listen to it. If there's a different writing system,

0:14:05.520 --> 0:14:07.839
<v Speaker 4>you've got to learn the writing system, you got to

0:14:07.920 --> 0:14:09.800
<v Speaker 4>learn how to have a conversation. There's a lot of

0:14:09.800 --> 0:14:14.480
<v Speaker 4>different skills that are required in learning a language. Historically,

0:14:14.520 --> 0:14:17.720
<v Speaker 4>we have done pretty well in all the skills except

0:14:17.760 --> 0:14:21.080
<v Speaker 4>for one of them, which is having a multi-turn

0:14:21.120 --> 0:14:24.960
<v Speaker 4>fluid conversation. So we could teach you, you know, historically, we

0:14:25.000 --> 0:14:27.320
<v Speaker 4>could teach you vocabulary really well.

0:14:27.360 --> 0:14:29.000
<v Speaker 4>We could teach you how to listen to a language.

0:14:29.040 --> 0:14:30.880
<v Speaker 4>It's you know, generally just by just getting you to

0:14:30.880 --> 0:14:32.920
<v Speaker 4>listen a lot to something. So we could teach you

0:14:32.960 --> 0:14:37.280
<v Speaker 4>all the things, but being able to practice actual multi-turn

0:14:37.320 --> 0:14:40.160
<v Speaker 4>conversation was not something that we could do with

0:14:40.320 --> 0:14:42.840
<v Speaker 4>just a computer. Historically, that needed us to pair you

0:14:42.880 --> 0:14:45.240
<v Speaker 4>with another human. Now, at Duolingo, we never

0:14:45.280 --> 0:14:47.280
<v Speaker 4>paired people up with other humans, because it turns out

0:14:47.800 --> 0:14:50.400
<v Speaker 4>a very small fraction of people actually want to be

0:14:50.480 --> 0:14:53.600
<v Speaker 4>paired with a random person over the internet who speaks

0:14:53.600 --> 0:14:56.720
<v Speaker 4>a different language. It's just it's kind of too embarrassing

0:14:56.760 --> 0:15:00.640
<v Speaker 4>for most people. I never did that. Well, it may

0:15:00.680 --> 0:15:04.640
<v Speaker 4>be dangerous, yes, but it also it's just it's like

0:15:04.720 --> 0:15:08.320
<v Speaker 4>ninety percent of people are just not extroverted enough, yeah, to

0:15:08.400 --> 0:15:11.120
<v Speaker 4>do that. I just don't want to do it. So

0:15:11.600 --> 0:15:14.440
<v Speaker 4>we always, you know, kind of we did these kind

0:15:14.440 --> 0:15:18.000
<v Speaker 4>of wonky things to try to emulate short conversations, but

0:15:18.040 --> 0:15:20.360
<v Speaker 4>we could never do anything like what we can do

0:15:20.480 --> 0:15:24.720
<v Speaker 4>now because with large language models, we really can get

0:15:24.760 --> 0:15:27.840
<v Speaker 4>you to practice you know, it may not be a

0:15:27.920 --> 0:15:30.160
<v Speaker 4>three hour conversation, but we can get you to practice

0:15:30.160 --> 0:15:32.440
<v Speaker 4>a multi turn, you know, ten minute conversation and it's

0:15:32.480 --> 0:15:34.680
<v Speaker 4>pretty good. So that's what we're doing with

0:15:34.680 --> 0:15:38.680
<v Speaker 4>Duolingo. We're using it to help you learn conversational

0:15:38.720 --> 0:15:41.000
<v Speaker 4>skills a lot better, and that's helping out quite a bit.

0:15:41.840 --> 0:15:44.320
<v Speaker 3>There are so many questions I have, and I you know,

0:15:44.880 --> 0:15:46.920
<v Speaker 3>I think my mom will really like this episode because,

0:15:46.960 --> 0:15:50.320
<v Speaker 3>in addition to the Guatemala connection, she is a linguist.

0:15:50.520 --> 0:15:54.440
<v Speaker 3>She speaks like seven languages, including Spanish, and like basically

0:15:55.240 --> 0:15:57.080
<v Speaker 3>you know all the others, not all the others, but

0:15:57.680 --> 0:16:01.040
<v Speaker 3>all the others, many many others. But you know something

0:16:01.080 --> 0:16:03.600
<v Speaker 3>that I was curious about, and maybe this is a

0:16:03.640 --> 0:16:05.600
<v Speaker 3>little bit of a random jumping-off point, you know. I think

0:16:05.640 --> 0:16:09.480
<v Speaker 3>about like chess computers, and originally they were sort of

0:16:09.520 --> 0:16:12.680
<v Speaker 3>trained on a corpus of famous chess games, and then

0:16:12.720 --> 0:16:13.240
<v Speaker 3>with some.

0:16:13.120 --> 0:16:14.120
<v Speaker 4>Computer they got better.

0:16:14.120 --> 0:16:18.720
<v Speaker 3>And then the new generation essentially relearned chess from just

0:16:18.800 --> 0:16:21.640
<v Speaker 3>the rules from first principles, and it turns out that

0:16:21.640 --> 0:16:24.560
<v Speaker 3>they're way better. And I'm wondering, if you're learning through

0:16:24.560 --> 0:16:26.520
<v Speaker 3>the process of building out Duolingo and improving it, like,

0:16:27.160 --> 0:16:30.960
<v Speaker 3>are there forms of pedagogy that in language learning, whether

0:16:31.040 --> 0:16:33.960
<v Speaker 3>it's the need for immersion or the need for rote drills,

0:16:34.040 --> 0:16:37.640
<v Speaker 3>or certain things that linguists have always thought were necessary

0:16:37.640 --> 0:16:41.880
<v Speaker 3>components of good language learning that when rebuilding education from

0:16:41.920 --> 0:16:46.240
<v Speaker 3>the ground up, like old dictums just turn out to

0:16:46.240 --> 0:16:49.000
<v Speaker 3>be completely wrong, And when you rebuild the process from

0:16:49.040 --> 0:16:52.600
<v Speaker 3>the beginning, like novel forms of pedagogy emerge.

0:16:53.160 --> 0:16:56.240
<v Speaker 4>It's a great question, and it's a hard question to

0:16:56.280 --> 0:16:59.840
<v Speaker 4>answer for the following reason. At least for us, we

0:17:00.200 --> 0:17:04.760
<v Speaker 4>teach a language from an app. Historically, the way people

0:17:04.840 --> 0:17:08.280
<v Speaker 4>learn languages is basically by practicing with another human or

0:17:08.400 --> 0:17:10.840
<v Speaker 4>being in a classroom or whatever. Whereas we teach from

0:17:10.880 --> 0:17:14.240
<v Speaker 4>an app, the setting is just very different for one

0:17:14.680 --> 0:17:18.600
<v Speaker 4>key reason, which is that it is so easy to

0:17:18.720 --> 0:17:21.879
<v Speaker 4>leave the app, whereas leaving a classroom it's just not

0:17:21.920 --> 0:17:23.720
<v Speaker 4>that easy. You kind of have to go. You're usually

0:17:23.720 --> 0:17:25.800
<v Speaker 4>forced by your parents to go to a classroom, and like,

0:17:26.119 --> 0:17:29.760
<v Speaker 4>you know, so generally, the thing about learning something by

0:17:29.800 --> 0:17:33.240
<v Speaker 4>yourself when you're just learning it through a computer is

0:17:33.280 --> 0:17:37.439
<v Speaker 4>that the hardest thing is motivation. It turns out that

0:17:37.320 --> 0:17:41.040
<v Speaker 4>the pedagogy is important, of course it is, but much

0:17:41.119 --> 0:17:44.359
<v Speaker 4>like exercising, what matters the most is that you're actually

0:17:44.440 --> 0:17:46.720
<v Speaker 4>motivated to do it every day. So like, is the

0:17:46.760 --> 0:17:51.560
<v Speaker 4>elliptical better than the step climber or better than the treadmill? Like, yeah,

0:17:51.600 --> 0:17:55.000
<v Speaker 4>there are probably differences, but the reality is what's most important

0:17:55.040 --> 0:17:57.280
<v Speaker 4>is that you kind of do it often. And so

0:17:57.760 --> 0:17:59.760
<v Speaker 4>what we have found with Duolingo is that if

0:17:59.800 --> 0:18:01.960
<v Speaker 4>we're going to teach it with an app, there are

0:18:01.960 --> 0:18:05.480
<v Speaker 4>a lot of things that historically, you know, language teachers

0:18:05.640 --> 0:18:09.920
<v Speaker 4>or linguists didn't think were the best ways to teach languages,

0:18:10.000 --> 0:18:11.359
<v Speaker 4>but if you're going to do it with an app,

0:18:11.359 --> 0:18:13.960
<v Speaker 4>you have to make it engaging. And we've had to

0:18:13.960 --> 0:18:16.320
<v Speaker 4>do it that way, and we have found that we

0:18:16.359 --> 0:18:20.320
<v Speaker 4>can do some things significantly better than human teachers, and

0:18:20.359 --> 0:18:23.560
<v Speaker 4>some things not as good because it's a very different system.

0:18:23.640 --> 0:18:26.040
<v Speaker 4>But again, the most important thing is just to keep

0:18:26.080 --> 0:18:29.480
<v Speaker 4>you motivated. So examples of things that we've had to

0:18:29.480 --> 0:18:32.320
<v Speaker 4>do to keep people motivated are quote unquote classes, which

0:18:32.359 --> 0:18:35.000
<v Speaker 4>is a lesson on Duolingo. They're not thirty minutes

0:18:35.080 --> 0:18:37.280
<v Speaker 4>or forty five minutes, they're two and a half minutes.

0:18:38.119 --> 0:18:41.960
<v Speaker 4>If they're any longer, we start losing people's attention. So

0:18:42.000 --> 0:18:44.359
<v Speaker 4>stuff like that I think has been really important. Now

0:18:44.400 --> 0:18:47.440
<v Speaker 4>I'll say, related to your question, one thing that has

0:18:47.440 --> 0:18:50.160
<v Speaker 4>been amazing is that, you know, we start out with

0:18:50.840 --> 0:18:53.720
<v Speaker 4>language experts, you know, people with PhDs in second

0:18:53.760 --> 0:18:56.200
<v Speaker 4>language acquisition, who tell us how to best teach something.

0:18:56.240 --> 0:18:58.280
<v Speaker 4>But then it takes it from there and the computer

0:18:58.359 --> 0:19:01.760
<v Speaker 4>optimizes it, and so the computer starts finding different ways.

0:19:01.800 --> 0:19:05.399
<v Speaker 4>There are different orderings of things that are actually better

0:19:05.880 --> 0:19:09.520
<v Speaker 4>than what the people with PhDs in second language acquisition thought.

0:19:09.600 --> 0:19:12.040
<v Speaker 4>But it's because they just didn't have the data to

0:19:12.119 --> 0:19:14.240
<v Speaker 4>optimize this, whereas now, you know, at Duolingo,

0:19:14.320 --> 0:19:17.239
<v Speaker 4>we have, it's something like one billion exercises. One

0:19:17.280 --> 0:19:20.600
<v Speaker 4>billion exercises are solved every day by people using Duolingo,

0:19:21.119 --> 0:19:22.840
<v Speaker 4>and that just has a lot of data that helps

0:19:22.880 --> 0:19:23.480
<v Speaker 4>us teach better.

0:19:23.880 --> 0:19:26.280
<v Speaker 2>This is exactly what I wanted to ask you, which

0:19:26.320 --> 0:19:30.119
<v Speaker 2>is how iterative is this technology? So how much is

0:19:30.119 --> 0:19:33.320
<v Speaker 2>it about the AI model sort of developing off the

0:19:33.400 --> 0:19:36.199
<v Speaker 2>data that you feed it, and then the AI model

0:19:36.480 --> 0:19:41.600
<v Speaker 2>improving the outcome for users and thereby generating more data

0:19:41.680 --> 0:19:42.600
<v Speaker 2>from which it can train?

0:19:43.000 --> 0:19:47.080
<v Speaker 4>We're doing exactly that, and in particular, one

0:19:47.119 --> 0:19:49.480
<v Speaker 4>of the things that we've been able to optimize a

0:19:49.520 --> 0:19:53.000
<v Speaker 4>lot is which exercise we give to which person. So

0:19:53.040 --> 0:19:54.840
<v Speaker 4>when you start a lesson on Duolingo, you

0:19:54.880 --> 0:19:56.800
<v Speaker 4>may think that all lessons are the same for everybody.

0:19:56.840 --> 0:20:00.119
<v Speaker 4>They're absolutely not. When you use Duolingo, we

0:20:00.200 --> 0:20:04.040
<v Speaker 4>watch what you do, and you know, the computer makes

0:20:04.040 --> 0:20:06.680
<v Speaker 4>a model of you as a student, so it sees

0:20:06.760 --> 0:20:08.879
<v Speaker 4>everything you get right, everything you get wrong, and based

0:20:08.880 --> 0:20:11.080
<v Speaker 4>on that, it starts realizing you're not very good at

0:20:11.080 --> 0:20:14.000
<v Speaker 4>the past tense, or you're not very good at the

0:20:14.000 --> 0:20:16.639
<v Speaker 4>future tense or whatever. And whenever you start a lesson,

0:20:17.160 --> 0:20:19.560
<v Speaker 4>it uses that model specifically for you, and it knows

0:20:19.560 --> 0:20:21.119
<v Speaker 4>that you're not very good at the past tense, so

0:20:21.119 --> 0:20:24.080
<v Speaker 4>it may give you more past tense or it does

0:20:24.119 --> 0:20:26.560
<v Speaker 4>stuff like that. And that definitely gets better with more

0:20:26.560 --> 0:20:28.600
<v Speaker 4>and more data. And I'll say another thing that is

0:20:28.640 --> 0:20:31.240
<v Speaker 4>really important. If we were to give you a lesson

0:20:32.000 --> 0:20:35.280
<v Speaker 4>only with the things that you're not good at, that

0:20:35.320 --> 0:20:38.560
<v Speaker 4>would be a horrible lesson because that would be extremely frustrating.

0:20:38.600 --> 0:20:40.239
<v Speaker 4>It's just basically, here are the things you're bad at,

0:20:40.400 --> 0:20:42.479
<v Speaker 4>just that we do a lot more of that. So

0:20:42.680 --> 0:20:45.080
<v Speaker 4>in addition to that, we have a system that tries

0:20:45.119 --> 0:20:48.119
<v Speaker 4>to and it gets better and better over time. It

0:20:48.200 --> 0:20:51.359
<v Speaker 4>is tuned. For every exercise we have on Duolingo that

0:20:51.640 --> 0:20:54.159
<v Speaker 4>we could give you, it knows the probability that you're going

0:20:54.200 --> 0:20:57.520
<v Speaker 4>to get that exercise correct. And whenever we are giving

0:20:57.520 --> 0:21:00.720
<v Speaker 4>you an exercise, we optimize so that we try to

0:21:00.840 --> 0:21:03.160
<v Speaker 4>only give you exercises that you have about an eighty

0:21:03.200 --> 0:21:06.760
<v Speaker 4>percent chance of getting right. And that has been quite

0:21:06.800 --> 0:21:09.320
<v Speaker 4>good because it turns out eighty percent is kind of

0:21:09.359 --> 0:21:13.080
<v Speaker 4>at this zone of proximal development where basically it's not

0:21:13.760 --> 0:21:16.399
<v Speaker 4>too easy. Having a one hundred

0:21:16.400 --> 0:21:18.440
<v Speaker 4>percent chance of getting it right if it's too easy

0:21:18.480 --> 0:21:20.600
<v Speaker 4>has two problems. Not only is it boring that it's

0:21:20.640 --> 0:21:23.399
<v Speaker 4>too easy, but also you're probably not learning anything if

0:21:23.400 --> 0:21:24.920
<v Speaker 4>you have a hundred percent chance of getting it right.

0:21:25.240 --> 0:21:28.400
<v Speaker 4>And it's also not too hard because humans get frustrated

0:21:28.440 --> 0:21:30.880
<v Speaker 4>if you're getting things right only thirty percent of the time.

0:21:31.160 --> 0:21:32.720
<v Speaker 4>So it turns out that we should give you things

0:21:32.720 --> 0:21:34.320
<v Speaker 4>that you have an eighty percent chance of getting right,

0:21:34.320 --> 0:21:36.879
<v Speaker 4>and that has been really successful, and you know, we

0:21:37.000 --> 0:21:40.080
<v Speaker 4>keep getting better and better at finding that exact

0:21:40.119 --> 0:21:42.400
<v Speaker 4>exercise that you have an eighty percent chance of getting right.

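NOTE
Illustrative sketch, not from the episode: a minimal way to express the selection rule described above, which picks the exercise whose predicted chance of a correct answer is closest to eighty percent. The Exercise class, the p_correct values, and the pick_exercise helper are hypothetical and are not Duolingo's actual system.
```python
from dataclasses import dataclass
@dataclass
class Exercise:
    name: str
    p_correct: float  # hypothetical model estimate that this learner answers correctly
def pick_exercise(candidates, target=0.8):
    """Return the candidate whose predicted success probability is closest to target."""
    return min(candidates, key=lambda e: abs(e.p_correct - target))
# Made-up candidates: one too hard, one near the target, one too easy.
candidates = [
    Exercise("past tense, hard", 0.30),
    Exercise("past tense, medium", 0.78),
    Exercise("greetings, easy", 0.99),
]
print(pick_exercise(candidates).name)
```
Picking the nearest candidate avoids both the thirty percent case (frustrating) and the one hundred percent case (boring, little learning) that the speaker mentions.
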
0:21:42.840 --> 0:21:46.000
<v Speaker 3>Okay, I have another, I guess I would say, theory

0:21:46.040 --> 0:21:48.840
<v Speaker 3>of language question, and I think I read in one

0:21:48.840 --> 0:21:51.200
<v Speaker 3>of your interviews that, you know, as part of the process

0:21:51.240 --> 0:21:54.000
<v Speaker 3>of making the Duolingo app better, you're always A/B

0:21:54.080 --> 0:21:57.639
<v Speaker 3>testing things like should people learn vocabulary first, should

0:21:57.640 --> 0:22:01.040
<v Speaker 3>people learn adjectives before adverbs or adverbs before verbs,

0:22:01.160 --> 0:22:04.000
<v Speaker 3>whatever it is, and that there's this constant process of

0:22:04.320 --> 0:22:08.639
<v Speaker 3>what is the correct sequence? Do rules about the sequence

0:22:08.720 --> 0:22:12.400
<v Speaker 3>of what you learn differ across languages? So let's say

0:22:12.400 --> 0:22:16.240
<v Speaker 3>someone learning Portuguese may have a different optimal path of

0:22:16.280 --> 0:22:20.040
<v Speaker 3>what to learn first grammatically or vocabulary wise, versus say

0:22:20.160 --> 0:22:24.640
<v Speaker 3>someone learning Chinese or Polish, because I'm curious about whether

0:22:24.680 --> 0:22:29.119
<v Speaker 3>we can uncover deep facts about common grammar and language

0:22:29.480 --> 0:22:33.240
<v Speaker 3>from the sort of learning sequence that is optimal across languages.

0:22:33.960 --> 0:22:38.159
<v Speaker 4>Yes, they definitely vary a lot based on the language

0:22:38.200 --> 0:22:41.200
<v Speaker 4>that you're learning, and even more so, they also vary

0:22:41.280 --> 0:22:45.040
<v Speaker 4>based on your native language. So we actually have a

0:22:45.119 --> 0:22:50.159
<v Speaker 4>different course to learn English for Spanish speakers than the

0:22:50.200 --> 0:22:52.639
<v Speaker 4>course we have to learn English for Chinese speakers. They

0:22:52.640 --> 0:22:55.359
<v Speaker 4>are different courses, and there's a reason for that. It

0:22:55.400 --> 0:22:58.320
<v Speaker 4>turns out that what's hard for Spanish speakers in learning

0:22:58.359 --> 0:23:01.760
<v Speaker 4>English is different than what's hard for Chinese speakers in

0:23:01.840 --> 0:23:05.280
<v Speaker 4>learning English. Typically, you know, the things that are common

0:23:05.440 --> 0:23:08.159
<v Speaker 4>between languages are easy, and the things that are very

0:23:08.160 --> 0:23:10.840
<v Speaker 4>different between languages are hard. So just a stupid example,

0:23:10.880 --> 0:23:14.679
<v Speaker 4>I mean, when you're learning English from Spanish, there's you know,

0:23:14.720 --> 0:23:18.600
<v Speaker 4>a couple of thousand cognates. That's words that are the

0:23:18.640 --> 0:23:21.359
<v Speaker 4>same or very close to the same, so you immediately

0:23:21.400 --> 0:23:23.600
<v Speaker 4>know those. We don't even need to teach you those words

0:23:23.640 --> 0:23:26.240
<v Speaker 4>if you're learning English from Spanish, because you already, you

0:23:26.560 --> 0:23:30.120
<v Speaker 4>know them automatically because they are the same word. That's

0:23:30.160 --> 0:23:33.639
<v Speaker 4>not quite true from Chinese. Other examples are, you know,

0:23:33.720 --> 0:23:36.720
<v Speaker 4>for me in particular, I started learning German, and for me,

0:23:37.000 --> 0:23:40.000
<v Speaker 4>German was quite hard to learn because Spanish, you know,

0:23:40.040 --> 0:23:43.680
<v Speaker 4>my native language is Spanish. Spanish just does not have

0:23:44.119 --> 0:23:48.040
<v Speaker 4>a very developed concept of grammatical cases, whereas German does.

0:23:48.920 --> 0:23:52.919
<v Speaker 4>But learning German from like from Russian, that's just not

0:23:53.000 --> 0:23:56.679
<v Speaker 4>a very hard concept to grasp. So it kind of

0:23:56.680 --> 0:24:00.320
<v Speaker 4>depends on what concepts your language has, you know, also

0:24:00.480 --> 0:24:03.960
<v Speaker 4>not exactly concepts. But in terms of pronunciation, everybody says

0:24:03.960 --> 0:24:06.840
<v Speaker 4>that Spanish pronunciation is really easy, and it's true. Vowels

0:24:06.840 --> 0:24:09.320
<v Speaker 4>in Spanish are really easy because there's only really about

0:24:09.359 --> 0:24:11.040
<v Speaker 4>five vowel sounds. It's a little more than that, but

0:24:11.040 --> 0:24:13.639
<v Speaker 4>it's about five vowel sounds, whereas you know, there are

0:24:13.640 --> 0:24:16.399
<v Speaker 4>other languages that have, you know, fifteen vowel sounds. So

0:24:16.520 --> 0:24:18.879
<v Speaker 4>learning Spanish is easy, but vice versa. If you're a

0:24:18.960 --> 0:24:22.240
<v Speaker 4>native Spanish speaker, learning the languages that have a lot

0:24:22.240 --> 0:24:24.640
<v Speaker 4>of vowel sounds is really hard because you don't even

0:24:24.720 --> 0:24:27.280
<v Speaker 4>you can't even hear the difference. You know, it's very

0:24:27.280 --> 0:24:30.479
<v Speaker 4>funny when you're learning English as a native Spanish speaker,

0:24:30.480 --> 0:24:33.439
<v Speaker 4>you cannot hear the difference between beach and bitch. You

0:24:33.480 --> 0:24:36.960
<v Speaker 4>cannot hear that difference, and you know, people make funny

0:24:37.000 --> 0:24:37.879
<v Speaker 4>mistakes because of that.

0:24:37.960 --> 0:24:39.640
<v Speaker 2>But I think there are a lot of T shirts

0:24:39.720 --> 0:24:43.120
<v Speaker 2>that involve that at one point in time.

0:24:43.720 --> 0:24:46.479
<v Speaker 4>Well, because really, if you're a native Spanish speaker, you

0:24:46.520 --> 0:24:47.600
<v Speaker 4>just cannot hear that difference.

0:24:48.280 --> 0:24:50.439
<v Speaker 2>So one thing I wanted to ask you is the

0:24:50.640 --> 0:24:53.600
<v Speaker 2>type of model that you're actually using. So I believe

0:24:53.680 --> 0:24:58.160
<v Speaker 2>you're using GPT four for some things like your premium

0:24:58.200 --> 0:25:01.640
<v Speaker 2>subscription Duolingo Max, but then you've also developed

0:25:01.640 --> 0:25:06.360
<v Speaker 2>your own proprietary AI model called Birdbrain. And I'm

0:25:06.440 --> 0:25:10.520
<v Speaker 2>curious about the decision to both use an off the

0:25:10.520 --> 0:25:15.720
<v Speaker 2>shelf solution or platform and to also develop your own

0:25:15.800 --> 0:25:18.800
<v Speaker 2>model at the same time. How did you end up

0:25:18.840 --> 0:25:20.160
<v Speaker 2>going down that path.

0:25:20.680 --> 0:25:22.879
<v Speaker 4>Yeah, it's a great question. I mean, I think the

0:25:23.040 --> 0:25:27.000
<v Speaker 4>difference is these are just very different. The last,

0:25:27.040 --> 0:25:29.640
<v Speaker 4>since, I don't know, two years ago, large language

0:25:29.640 --> 0:25:35.120
<v Speaker 4>models or generative AI became very popular. Before that, there

0:25:35.119 --> 0:25:37.600
<v Speaker 4>were just different things that AI could be used

0:25:37.760 --> 0:25:40.280
<v Speaker 4>for. We were not using AI, for example, for

0:25:40.440 --> 0:25:44.000
<v Speaker 4>practicing conversation. But we were using AI to determine which

0:25:44.000 --> 0:25:47.959
<v Speaker 4>exercise to give to which person. That, we built our

0:25:48.000 --> 0:25:50.840
<v Speaker 4>own. That is, the Birdbrain model is a model

0:25:50.840 --> 0:25:52.680
<v Speaker 4>that tries to figure out which exercise to give to

0:25:52.720 --> 0:25:55.959
<v Speaker 4>which person. You know, for

0:25:56.000 --> 0:25:58.600
<v Speaker 4>the last two years or so, when people talk about models,

0:25:58.680 --> 0:26:02.480
<v Speaker 4>they usually mean language models, and it's this, it's

0:26:02.520 --> 0:26:05.560
<v Speaker 4>this specific type of AI model that what it does

0:26:05.560 --> 0:26:08.560
<v Speaker 4>is it predicts the next word given the previous words.

0:26:08.600 --> 0:26:11.639
<v Speaker 4>That's what a language model does. The large language models

0:26:11.640 --> 0:26:14.919
<v Speaker 4>are particularly good at doing this, and we did not

0:26:15.000 --> 0:26:18.320
<v Speaker 4>develop our own large language model. We decided it's a

0:26:18.320 --> 0:26:21.199
<v Speaker 4>lot easier to just use something like GPT four. But

0:26:21.280 --> 0:26:23.000
<v Speaker 4>we have our own model for something else that is

0:26:23.040 --> 0:26:24.840
<v Speaker 4>not a language model, but it is

0:26:24.880 --> 0:26:28.679
<v Speaker 4>an AI model to predict what exercise to give to

0:26:28.680 --> 0:26:46.960
<v Speaker 4>which user, which is a pretty pretty different problem.

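NOTE
Illustrative sketch, not from the episode: the "predict the next word given the previous words" idea can be shown with a toy bigram counter. This is far simpler than a large language model such as GPT-4, which conditions on long contexts with a neural network; here only the single previous word is used, and the corpus and function names are made up for illustration.
```python
from collections import Counter, defaultdict
def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows
def predict_next(follows, word):
    """Return the most frequent follower of word, or None if no follower was seen."""
    counts = follows.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None
corpus = "the cat sat on the mat and the cat slept"  # made-up training text
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```
In this corpus "the" is followed by "cat" twice and "mat" once, so the sketch predicts "cat".
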
0:26:47.240 --> 0:26:51.119
<v Speaker 3>Speaking of AI, all these, especially the really big companies,

0:26:51.480 --> 0:26:55.560
<v Speaker 3>are making an extraordinary show of almost bragging about how much

0:26:55.600 --> 0:26:58.280
<v Speaker 3>money they give to Jensen Huang and Nvidia. It's like,

0:26:58.440 --> 0:27:01.480
<v Speaker 3>we just spent you know, we're spending twenty billion dollars

0:27:01.480 --> 0:27:04.040
<v Speaker 3>over the next two years to just acquire H100

0:27:04.080 --> 0:27:07.119
<v Speaker 3>chips or whatever it is, and it almost seems

0:27:07.200 --> 0:27:09.719
<v Speaker 3>like there's, like, an arms race. And then there is also

0:27:09.880 --> 0:27:14.879
<v Speaker 3>this view that actually the best models will not necessarily

0:27:14.920 --> 0:27:17.240
<v Speaker 3>be the ones strictly with the access to the most

0:27:17.280 --> 0:27:21.719
<v Speaker 3>compute but the access to data sets that other models

0:27:21.720 --> 0:27:24.520
<v Speaker 3>simply don't have. And I'm curious sort of like you know,

0:27:24.720 --> 0:27:28.359
<v Speaker 3>you as Duolingo must have an extraordinary amount of

0:27:28.440 --> 0:27:32.959
<v Speaker 3>proprietary data just from all of your user interactions. In

0:27:33.040 --> 0:27:35.760
<v Speaker 3>your experience, when you think about who the winners will

0:27:35.760 --> 0:27:38.159
<v Speaker 3>be in this space, is it going to be the

0:27:38.200 --> 0:27:41.600
<v Speaker 3>ones that just have the most electricity and energy and chips,

0:27:41.720 --> 0:27:44.560
<v Speaker 3>or is it going to be who has access to

0:27:44.600 --> 0:27:46.560
<v Speaker 3>some sort of data that they can fine tune their

0:27:46.560 --> 0:27:48.320
<v Speaker 3>model on that the other models can't?

0:27:49.000 --> 0:27:52.000
<v Speaker 4>It depends on what you're talking about. You know, certainly

0:27:52.040 --> 0:27:54.720
<v Speaker 4>we at Duolingo have a lot of, you know, data

0:27:54.880 --> 0:27:57.640
<v Speaker 4>nobody else has, which is the data on how each

0:27:57.680 --> 0:28:00.919
<v Speaker 4>person's learning language. I mean that's not data you can

0:28:00.960 --> 0:28:02.600
<v Speaker 4>find on the web or anything like that. That is

0:28:02.680 --> 0:28:04.399
<v Speaker 4>just the data that we have that we're generating, and

0:28:04.440 --> 0:28:07.040
<v Speaker 4>we're going to train our own models for that. I

0:28:07.040 --> 0:28:12.200
<v Speaker 4>don't think there's enough electricity to train a model without

0:28:12.200 --> 0:28:14.800
<v Speaker 4>this data to be as good as ours with our data,

0:28:15.320 --> 0:28:19.760
<v Speaker 4>but it is for specifically language learning. If you're talking

0:28:19.800 --> 0:28:23.399
<v Speaker 4>about training a general model, that is going to be something,

0:28:23.440 --> 0:28:26.520
<v Speaker 4>you know, a language model that is general for being

0:28:26.520 --> 0:28:29.960
<v Speaker 4>able to have conversations, et cetera. Usually you can get

0:28:30.000 --> 0:28:32.679
<v Speaker 4>that from, there's pretty good data out there. You know,

0:28:32.760 --> 0:28:35.840
<v Speaker 4>YouTube videos that are free or a lot of kind

0:28:35.840 --> 0:28:38.880
<v Speaker 4>of Reddit conversations or whatever. There's there's a lot of

0:28:39.000 --> 0:28:42.200
<v Speaker 4>data in there. Probably power is going to matter.

0:28:42.720 --> 0:28:44.239
<v Speaker 4>So it depends on what you're going to use your

0:28:44.280 --> 0:28:46.400
<v Speaker 4>model for. If you're using it

0:28:46.440 --> 0:28:49.440
<v Speaker 4>for a very specific purpose and you have very specific

0:28:49.520 --> 0:28:52.160
<v Speaker 4>data for that that is proprietary, that's going to be

0:28:52.160 --> 0:28:56.000
<v Speaker 4>better for the specific purpose. But my sense is that

0:28:56.080 --> 0:28:58.800
<v Speaker 4>you know both are going to matter. You know what

0:28:59.000 --> 0:29:02.840
<v Speaker 4>data you have and also how much electricity you spend.

0:29:03.360 --> 0:29:06.080
<v Speaker 4>But I also think that over time, hopefully we're going

0:29:06.160 --> 0:29:08.239
<v Speaker 4>to get better and better at these algorithms. And if

0:29:08.280 --> 0:29:10.600
<v Speaker 4>you think about it, the human brain uses something like

0:29:10.640 --> 0:29:13.680
<v Speaker 4>thirty watts. The human brain is pretty good, and

0:29:13.720 --> 0:29:15.760
<v Speaker 4>we don't need, you know, some of these models, people

0:29:15.760 --> 0:29:17.960
<v Speaker 4>are saying, oh, this uses the amount of

0:29:18.000 --> 0:29:20.640
<v Speaker 4>electricity that all of New York City uses. We use

0:29:20.720 --> 0:29:24.440
<v Speaker 4>that to train a model. You know, our brain uses much, much,

0:29:24.520 --> 0:29:28.520
<v Speaker 4>much less electricity than that, and you know, it's pretty good.

0:29:28.760 --> 0:29:31.600
<v Speaker 4>So my sense is that also over time, hopefully we'll

0:29:31.640 --> 0:29:34.200
<v Speaker 4>be able to get to the point where we're not

0:29:34.480 --> 0:29:36.960
<v Speaker 4>as crazy about using electricity as we are today.

0:29:37.280 --> 0:29:40.160
<v Speaker 2>I'm glad our brains are energy efficient. That's nice to know.

0:29:40.800 --> 0:29:42.800
<v Speaker 4>A lot better than computers.

0:29:43.760 --> 0:29:46.280
<v Speaker 2>We've been talking a lot about the use of AI

0:29:46.720 --> 0:29:51.440
<v Speaker 2>in the product itself, so improving the experience of learning

0:29:51.480 --> 0:29:55.080
<v Speaker 2>a language. But one of the things that we hear

0:29:55.200 --> 0:29:58.320
<v Speaker 2>a lot about nowadays is also, you know, angst over

0:29:58.560 --> 0:30:01.880
<v Speaker 2>the role of AI in the wider economy in terms

0:30:01.880 --> 0:30:05.040
<v Speaker 2>of the labor force, job security, and stuff like that,

0:30:05.120 --> 0:30:08.480
<v Speaker 2>as companies try to be more efficient. So I guess

0:30:08.520 --> 0:30:12.440
<v Speaker 2>I'm wondering, on the sort of corporate side, how much

0:30:12.480 --> 0:30:15.760
<v Speaker 2>does AI play into the business model right now in

0:30:15.880 --> 0:30:21.000
<v Speaker 2>terms of streamlining things like costs or reducing workforce. And

0:30:21.040 --> 0:30:23.760
<v Speaker 2>I believe there were quite a few headlines around Duolingo

0:30:23.880 --> 0:30:26.600
<v Speaker 2>on this exact topic late last year.

0:30:26.960 --> 0:30:29.360
<v Speaker 4>Yeah, first of all, those headlines were upsetting to me

0:30:29.440 --> 0:30:31.320
<v Speaker 4>because they were wrong. You know, there were a lot

0:30:31.320 --> 0:30:33.320
<v Speaker 4>of headlines saying that we had done a massive layoff.

0:30:33.760 --> 0:30:37.120
<v Speaker 4>That was not actually true. So what is true is that,

0:30:37.160 --> 0:30:39.360
<v Speaker 4>you know, we really are leaning into AI. You know,

0:30:39.640 --> 0:30:42.480
<v Speaker 4>it just it makes sense. This is a very transformative technology,

0:30:42.520 --> 0:30:44.720
<v Speaker 4>so we're leaning into it. And it is also true

0:30:44.760 --> 0:30:48.280
<v Speaker 4>that many workflows are a lot more efficient. And so

0:30:48.320 --> 0:30:51.959
<v Speaker 4>what happened late last year was that we realized we

0:30:52.000 --> 0:30:54.040
<v Speaker 4>have full-time employees, but we also have some

0:30:54.200 --> 0:30:58.120
<v Speaker 4>hourly contractors. We realized that we need fewer hourly

0:30:58.200 --> 0:31:01.360
<v Speaker 4>contractors, and so for, you know, a small fraction of

0:31:01.360 --> 0:31:03.760
<v Speaker 4>our hourly contractors, we did not renew their contract because

0:31:03.800 --> 0:31:06.840
<v Speaker 4>we realized we need fewer of them for doing

0:31:06.920 --> 0:31:10.440
<v Speaker 4>some tasks that, you know, honestly, computers were just as

0:31:10.440 --> 0:31:13.200
<v Speaker 4>good as a human, and that's, you know, that

0:31:13.320 --> 0:31:16.080
<v Speaker 4>may be true for something like an hourly

0:31:16.120 --> 0:31:18.800
<v Speaker 4>contractor force that was being asked to do, they were

0:31:18.840 --> 0:31:22.800
<v Speaker 4>basically being asked to do very rote kind of language

0:31:22.840 --> 0:31:25.760
<v Speaker 4>tasks that computers just got very good at. I think

0:31:25.800 --> 0:31:28.959
<v Speaker 4>if you're talking about you know, our full time employees

0:31:29.000 --> 0:31:31.360
<v Speaker 4>and people who are who are not necessarily just doing

0:31:31.560 --> 0:31:36.280
<v Speaker 4>rote repetitive stuff that's going to take a while to replace.

0:31:36.320 --> 0:31:38.200
<v Speaker 4>I don't think, and certainly this is not what we

0:31:38.240 --> 0:31:39.800
<v Speaker 4>want to do as a company. You know, I heard

0:31:39.800 --> 0:31:43.080
<v Speaker 4>a really good saying recently, which is, your job's not

0:31:43.120 --> 0:31:44.840
<v Speaker 4>going to be replaced by AI. It's going to be

0:31:44.840 --> 0:31:48.000
<v Speaker 4>replaced by somebody who knows how to use AI. So

0:31:48.120 --> 0:31:50.120
<v Speaker 4>what we're seeing in the company, at least for our

0:31:50.120 --> 0:31:52.880
<v Speaker 4>full time employees, is not that we're able or even

0:31:52.920 --> 0:31:55.160
<v Speaker 4>want to replace them. What we're seeing is just way

0:31:55.160 --> 0:31:58.920
<v Speaker 4>more productivity, to the point where people are able to

0:31:59.000 --> 0:32:03.120
<v Speaker 4>concentrate on kind of higher level cognitive tasks rather than

0:32:03.200 --> 0:32:06.280
<v Speaker 4>rote things. I don't know, one hundred years ago, people

0:32:06.320 --> 0:32:10.240
<v Speaker 4>were being hired to add numbers or multiply numbers. The

0:32:10.280 --> 0:32:13.200
<v Speaker 4>original, quote unquote, computers were actually humans who were being

0:32:13.720 --> 0:32:17.600
<v Speaker 4>hired to multiply numbers. We were able to mechanize that

0:32:17.880 --> 0:32:19.680
<v Speaker 4>and use an actual computer to do that so that

0:32:19.720 --> 0:32:21.880
<v Speaker 4>people didn't have to do that. Instead, they spend time,

0:32:22.000 --> 0:32:25.719
<v Speaker 4>you know, planning something at a higher level rather than

0:32:25.760 --> 0:32:29.000
<v Speaker 4>having to do the multiplication. We're seeing something similar to

0:32:29.120 --> 0:32:31.800
<v Speaker 4>that now. And the other thing that we're seeing

0:32:31.960 --> 0:32:35.080
<v Speaker 4>that is really amazing: we are saving costs because

0:32:35.160 --> 0:32:38.840
<v Speaker 4>a single person can do more, but also we're

0:32:38.840 --> 0:32:42.640
<v Speaker 4>able to do things much much faster, and in particular

0:32:42.640 --> 0:32:44.400
<v Speaker 4>in data creation. I mean, one of the ways in

0:32:44.400 --> 0:32:45.960
<v Speaker 4>which we teach you how to read is you read

0:32:46.000 --> 0:32:48.760
<v Speaker 4>short stories, and we need to

0:32:48.760 --> 0:32:51.280
<v Speaker 4>create a lot of short stories. We used to be

0:32:51.280 --> 0:32:54.720
<v Speaker 4>able to create short stories, you know, at a certain pace.

0:32:55.080 --> 0:32:58.400
<v Speaker 4>We can now create them like ten times faster. And

0:32:58.440 --> 0:33:00.880
<v Speaker 4>what's beautiful about being able to create them ten times

0:33:00.920 --> 0:33:03.920
<v Speaker 4>faster is that you can actually make the quality better

0:33:03.960 --> 0:33:06.360
<v Speaker 4>because if you create them once ten times faster and

0:33:06.520 --> 0:33:08.280
<v Speaker 4>you don't like it, you can start over and do

0:33:08.360 --> 0:33:11.240
<v Speaker 4>it again with certain changes and then oh you didn't

0:33:11.280 --> 0:33:13.320
<v Speaker 4>like it, Okay, try it again, So you can you

0:33:13.320 --> 0:33:16.400
<v Speaker 4>can try ten times at this, you know, whereas before

0:33:16.400 --> 0:33:18.840
<v Speaker 4>you can only try once, and generally you don't have

0:33:18.880 --> 0:33:20.240
<v Speaker 4>to try ten times. You have to try a few

0:33:20.320 --> 0:33:21.800
<v Speaker 4>times. So this is able to, at the same

0:33:21.880 --> 0:33:25.800
<v Speaker 4>time lower costs for us, but also make the speed

0:33:25.840 --> 0:33:28.360
<v Speaker 4>faster and the quality better. So I mean, we're very

0:33:28.360 --> 0:33:30.280
<v Speaker 4>happy with that, in terms of the corporate side.

0:33:30.440 --> 0:33:33.360
<v Speaker 3>Could you talk more about benchmarking AI, because there's all

0:33:33.400 --> 0:33:36.160
<v Speaker 3>these tests, right and you see these websites and they're like,

0:33:36.160 --> 0:33:38.560
<v Speaker 3>well this one got this on the LSATs, and

0:33:38.560 --> 0:33:40.360
<v Speaker 3>this one got this on the SATs and I can

0:33:40.360 --> 0:33:42.880
<v Speaker 3>never quite tell. And a lot of it seems inscrutable

0:33:43.000 --> 0:33:46.320
<v Speaker 3>to me from your perspective, Like, what are sort of

0:33:46.360 --> 0:33:51.560
<v Speaker 3>your basic approaches to benchmarking different models and determining when

0:33:51.600 --> 0:33:54.640
<v Speaker 3>it like, okay, this makes sense as some sort of

0:33:54.760 --> 0:33:58.719
<v Speaker 3>task to employ AI instead of a person doing it.

0:33:59.000 --> 0:34:01.200
<v Speaker 4>Yeah, I have felt the same as you have. There's

0:34:01.240 --> 0:34:02.520
<v Speaker 4>a lot of... my sense is that a lot of these

0:34:02.560 --> 0:34:05.920
<v Speaker 4>benchmarks are from marketing teams. You know, what we do

0:34:06.080 --> 0:34:09.600
<v Speaker 4>internally is two things. First of all, we just try stuff,

0:34:09.680 --> 0:34:11.160
<v Speaker 4>and then we look at it, and we look at

0:34:11.280 --> 0:34:13.759
<v Speaker 4>the very specific you know, it's nice that an AI

0:34:13.800 --> 0:34:15.759
<v Speaker 4>can pass the LSAT or whatever, but we're you know,

0:34:15.760 --> 0:34:18.120
<v Speaker 4>we're not in the business of passing LSATs. We're

0:34:18.160 --> 0:34:19.799
<v Speaker 4>in the business of doing whatever it is we're doing,

0:34:19.880 --> 0:34:22.279
<v Speaker 4>you know, creating short stories or whatever. So whatever task,

0:34:22.360 --> 0:34:25.800
<v Speaker 4>we just try it and then we judge the quality ourselves.

0:34:25.880 --> 0:34:28.719
<v Speaker 4>So far, we have found that the quality of the

0:34:28.760 --> 0:34:31.680
<v Speaker 4>OpenAI models is a little better than everybody else's,

0:34:32.360 --> 0:34:35.600
<v Speaker 4>but not that much better. I mean two years ago

0:34:35.680 --> 0:34:38.239
<v Speaker 4>it was way better. It seems like everybody else is

0:34:38.280 --> 0:34:40.520
<v Speaker 4>catching up. But so far we have found that that's

0:34:40.600 --> 0:34:42.799
<v Speaker 4>just when we do our tests. And again, this is

0:34:43.280 --> 0:34:45.400
<v Speaker 4>you know, just an n of one, one company. I'm

0:34:45.400 --> 0:34:47.600
<v Speaker 4>sure that other companies are finding maybe different stuff, but

0:34:47.640 --> 0:34:50.879
<v Speaker 4>for us, for our specific use cases, we find time

0:34:50.920 --> 0:34:53.920
<v Speaker 4>and again that GPT-4 does better. And I don't know,

0:34:53.960 --> 0:34:56.000
<v Speaker 4>of course, everybody's now announcing like there's going to be

0:34:56.040 --> 0:34:58.040
<v Speaker 4>GPT-5 et cetera, et cetera. I don't know how

0:34:58.040 --> 0:35:00.840
<v Speaker 4>those will be, but that's what we're finding. You know, generally,

0:35:00.880 --> 0:35:01.960
<v Speaker 4>we just do our own testing.

0:35:02.160 --> 0:35:04.640
<v Speaker 3>Yeah, Tracy, I find that so fascinating, especially, I think

0:35:04.640 --> 0:35:07.560
<v Speaker 3>we've talked about this, like it definitely seems like TBD

0:35:07.760 --> 0:35:10.480
<v Speaker 3>whether like one model would just prove to be head

0:35:10.480 --> 0:35:12.840
<v Speaker 3>and shoulders better than the others, the way that Google

0:35:12.960 --> 0:35:15.160
<v Speaker 3>was just head and shoulders above everyone else for twenty

0:35:15.280 --> 0:35:18.880
<v Speaker 3>years basically and still is kind of like, it's unclear

0:35:18.920 --> 0:35:19.960
<v Speaker 3>to me whether that'll be the case with AI.

0:35:19.920 --> 0:35:21.719
<v Speaker 2>Right, the idea that we're in

0:35:21.760 --> 0:35:25.080
<v Speaker 2>the, I don't know, the Bing era of chat models

0:35:25.080 --> 0:35:28.360
<v Speaker 2>and eventually we're all going to migrate to something else. Luis,

0:35:28.440 --> 0:35:30.600
<v Speaker 2>one thing I wanted to ask you, and this is

0:35:30.640 --> 0:35:32.400
<v Speaker 2>sort of going back to the very beginning of the

0:35:32.440 --> 0:35:36.360
<v Speaker 2>conversation and some of the you know, older thoughts around language.

0:35:36.400 --> 0:35:38.680
<v Speaker 2>There used to be I don't want to say a consensus,

0:35:38.719 --> 0:35:42.040
<v Speaker 2>but there used to be some thinking that language was

0:35:42.520 --> 0:35:45.359
<v Speaker 2>very complicated in many ways, and so much of it

0:35:45.440 --> 0:35:49.719
<v Speaker 2>was sort of ambiguous or maybe context dependent, that it

0:35:49.760 --> 0:35:52.640
<v Speaker 2>would be very hard for AI to sort of wrap

0:35:52.680 --> 0:35:55.960
<v Speaker 2>its head around it. And I'm wondering now, with something

0:35:56.000 --> 0:35:59.880
<v Speaker 2>like Duolingo, how do your models take into account

0:36:00.239 --> 0:36:04.320
<v Speaker 2>that sort of context dependency? And I'm thinking, you know,

0:36:04.400 --> 0:36:09.279
<v Speaker 2>I'm thinking specifically about things like Mandarin, where the pronunciation

0:36:09.600 --> 0:36:13.720
<v Speaker 2>is kind of tricky and a lot of understanding depends

0:36:13.760 --> 0:36:17.560
<v Speaker 2>on the context in which a particular word is said.

0:36:17.680 --> 0:36:19.600
<v Speaker 2>So how do you sort of deal with that?

0:36:20.000 --> 0:36:21.719
<v Speaker 4>Yeah, I mean it's an interesting thing. You know, when

0:36:21.719 --> 0:36:24.319
<v Speaker 4>you meant when you were asking the question, I thought

0:36:24.320 --> 0:36:26.880
<v Speaker 4>of this thing. You know, I've been around AI since

0:36:26.960 --> 0:36:31.200
<v Speaker 4>the late nineties, and I remember just it's just this

0:36:31.280 --> 0:36:34.000
<v Speaker 4>moving goalpost. I remember. Everybody just kept on saying, look,

0:36:34.080 --> 0:36:36.640
<v Speaker 4>if a computer can play chess, surely we all agree

0:36:36.760 --> 0:36:39.279
<v Speaker 4>it has human level intelligence. This is kind of what

0:36:39.320 --> 0:36:42.319
<v Speaker 4>everybody said. Then it turned out computers could play chess,

0:36:42.320 --> 0:36:44.680
<v Speaker 4>and nobody agreed that it had human level intelligence. It's

0:36:44.719 --> 0:36:47.120
<v Speaker 4>just like, oh, very fine, it can play chess, next thing.

0:36:47.360 --> 0:36:49.120
<v Speaker 4>And it would just keep coming up with stuff like,

0:36:49.200 --> 0:36:51.839
<v Speaker 4>surely if a computer can you know, play the game

0:36:51.880 --> 0:36:54.160
<v Speaker 4>of go, or if a computer could do this, then

0:36:54.680 --> 0:36:56.920
<v Speaker 4>you know, And one of the last few things was

0:36:57.239 --> 0:37:01.840
<v Speaker 4>if a computer can, whatever, write poetry so well or

0:37:01.960 --> 0:37:05.800
<v Speaker 4>understand text, then surely it's intelligent. And at this point,

0:37:06.320 --> 0:37:09.839
<v Speaker 4>models like GPT-4 are really good at doing things,

0:37:09.880 --> 0:37:11.480
<v Speaker 4>certainly better than the average human. They may not be

0:37:11.520 --> 0:37:13.400
<v Speaker 4>as good as the best poet in the world, but

0:37:13.440 --> 0:37:16.439
<v Speaker 4>certainly better than the average human writing poetry, certainly better

0:37:16.440 --> 0:37:19.680
<v Speaker 4>than the average human at almost anything with text manipulation. Actually,

0:37:19.719 --> 0:37:21.360
<v Speaker 4>if you look at your average human, they're just not

0:37:21.400 --> 0:37:24.080
<v Speaker 4>particularly good at writing.

0:37:23.719 --> 0:37:26.160
<v Speaker 3>So many professional writers. Oh yeah, yeah.

0:37:26.120 --> 0:37:28.680
<v Speaker 4>Yeah, I mean just these models are excellent. And in fact,

0:37:28.680 --> 0:37:30.799
<v Speaker 4>you can write something that is half well written and

0:37:30.840 --> 0:37:32.520
<v Speaker 4>you can ask the model to make it better and

0:37:32.560 --> 0:37:35.600
<v Speaker 4>it does that. It like makes your text better. So

0:37:35.880 --> 0:37:39.479
<v Speaker 4>it's this funny thing that just AI. We keep coming

0:37:39.560 --> 0:37:41.600
<v Speaker 4>up with things that like if AI can crack that,

0:37:41.600 --> 0:37:43.719
<v Speaker 4>that's it, that's it. You know, I don't know what

0:37:43.760 --> 0:37:45.480
<v Speaker 4>the next one will be, but you know, we keep

0:37:45.520 --> 0:37:47.440
<v Speaker 4>coming up with stuff like that. You know, in terms

0:37:47.440 --> 0:37:51.160
<v Speaker 4>of the language, it just turns out that language can

0:37:51.200 --> 0:37:55.920
<v Speaker 4>be mostly captured by these models. It turns out that

0:37:55.960 --> 0:37:58.920
<v Speaker 4>if you make a neural network architecture and this you know,

0:37:58.960 --> 0:38:01.719
<v Speaker 4>nobody could have guessed this, but it just turns out

0:38:02.120 --> 0:38:04.919
<v Speaker 4>that if you make this neural network architecture that's

0:38:04.960 --> 0:38:08.760
<v Speaker 4>called the transformer, and you train it with a gazillion

0:38:09.080 --> 0:38:11.759
<v Speaker 4>pieces of text, it just turns out it pretty much

0:38:11.760 --> 0:38:14.200
<v Speaker 4>can capture almost any nuance of the language. Again,

0:38:14.239 --> 0:38:15.919
<v Speaker 4>nobody could have figured this out, but it just turns

0:38:15.960 --> 0:38:17.640
<v Speaker 4>out that this is the case. So at this point,

0:38:17.640 --> 0:38:19.200
<v Speaker 4>when you know, when you ask about you know, what

0:38:19.320 --> 0:38:23.080
<v Speaker 4>we do with context or whatever, it just works when

0:38:23.080 --> 0:38:26.040
<v Speaker 4>you're you know, some of it we do with handwritten

0:38:26.080 --> 0:38:28.520
<v Speaker 4>rules because we write the rules. But generally, if you're

0:38:28.520 --> 0:38:31.319
<v Speaker 4>going to use an AI, it just works. And you

0:38:31.360 --> 0:38:32.960
<v Speaker 4>can ask me why it works. And I don't know

0:38:33.000 --> 0:38:35.640
<v Speaker 4>why it works. I don't think anybody does. But it turns

0:38:35.640 --> 0:38:37.919
<v Speaker 4>out that the statistics are kind of strong enough there

0:38:38.000 --> 0:38:40.880
<v Speaker 4>that if you train it with a gazillion pieces of text,

0:38:41.239 --> 0:38:42.000
<v Speaker 4>it just works.

0:38:43.120 --> 0:38:45.399
<v Speaker 3>I just want to go back to the sort of

0:38:45.480 --> 0:38:48.920
<v Speaker 3>like you know where AI is going and you mentioned

0:38:48.960 --> 0:38:52.640
<v Speaker 3>that AI can generate thousands or ten, you know, very

0:38:52.760 --> 0:38:55.560
<v Speaker 3>rapidly numerous short stories, and then a human can say, Okay,

0:38:55.640 --> 0:38:57.200
<v Speaker 3>these are the good ones. We can improve and so

0:38:57.280 --> 0:39:00.560
<v Speaker 3>you not only get the efficiency savings, actually can get

0:39:00.600 --> 0:39:03.399
<v Speaker 3>a better higher quality for the lessons and so forth.

0:39:03.480 --> 0:39:05.920
<v Speaker 3>But you know, sort of like I'm moving up the

0:39:06.000 --> 0:39:09.279
<v Speaker 3>abstraction layer, like, will there be a point at some

0:39:09.440 --> 0:39:13.480
<v Speaker 3>point in the future in which the entire concept of

0:39:13.560 --> 0:39:16.880
<v Speaker 3>learning a language or the entire sequence is almost entirely

0:39:16.960 --> 0:39:20.320
<v Speaker 3>something that AI can do from scratch? Again, I'm thinking

0:39:20.400 --> 0:39:23.360
<v Speaker 3>sort of back to that chess analogy of not having

0:39:23.400 --> 0:39:27.040
<v Speaker 3>to use the entire history of games to learn, but

0:39:27.200 --> 0:39:29.839
<v Speaker 3>just knowing the basic rules and then coming up with

0:39:29.880 --> 0:39:32.759
<v Speaker 3>something further like, will AI eventually be able to sort

0:39:32.800 --> 0:39:35.520
<v Speaker 3>of like design the architecture of what it means to

0:39:35.600 --> 0:39:36.520
<v Speaker 3>learn a language?

0:39:37.000 --> 0:39:39.160
<v Speaker 4>I mean sure, I think at some point AI is

0:39:39.160 --> 0:39:41.000
<v Speaker 4>going to be able to do pretty much everything.

0:39:41.360 --> 0:39:41.560
<v Speaker 2>Right.

0:39:41.600 --> 0:39:44.239
<v Speaker 4>It's very hard to know how long this will take.

0:39:44.239 --> 0:39:47.279
<v Speaker 4>I mean, it's just very hard, and honestly for our

0:39:47.320 --> 0:39:50.000
<v Speaker 4>own society, I'm hoping that the process is gradual and

0:39:50.080 --> 0:39:52.600
<v Speaker 4>not from one day to the next, because if we

0:39:52.680 --> 0:39:55.239
<v Speaker 4>find that at some point AI really goes from if

0:39:55.280 --> 0:39:57.560
<v Speaker 4>tomorrow somebody announces, okay, I have an AI that can

0:39:57.560 --> 0:40:00.680
<v Speaker 4>pretty much do everything perfectly. I think this will be

0:40:00.719 --> 0:40:04.000
<v Speaker 4>a major societal problem because we won't know what to do.

0:40:04.040 --> 0:40:06.680
<v Speaker 4>But if this process takes twenty thirty years, at least

0:40:06.680 --> 0:40:09.440
<v Speaker 4>we'll be able to as a society figure out what

0:40:09.480 --> 0:40:13.000
<v Speaker 4>to do with ourselves. But generally, I mean, I think

0:40:13.000 --> 0:40:14.319
<v Speaker 4>at some point AI is going to be able to

0:40:14.320 --> 0:40:15.080
<v Speaker 4>do everything we can.

0:40:16.640 --> 0:40:19.640
<v Speaker 2>What's the big challenge when it comes to AI at

0:40:19.640 --> 0:40:22.680
<v Speaker 2>the moment? I realize we've been talking a lot about opportunities,

0:40:22.719 --> 0:40:24.840
<v Speaker 2>but what are some of the issues that you're trying

0:40:24.840 --> 0:40:28.560
<v Speaker 2>to surmount at the moment, Whether it's something like getting

0:40:28.640 --> 0:40:34.200
<v Speaker 2>enough compute or securing the best engineers, or I guess

0:40:34.280 --> 0:40:37.440
<v Speaker 2>being in competition with a number of other companies that

0:40:37.480 --> 0:40:40.800
<v Speaker 2>are also using AI, maybe in the same business.

0:40:41.400 --> 0:40:44.440
<v Speaker 4>I mean, certainly, securing good engineers has been a challenge

0:40:44.480 --> 0:40:47.200
<v Speaker 4>for anything related to engineering for a while. You know,

0:40:47.520 --> 0:40:49.359
<v Speaker 4>you want the best engineers, and there's just not very

0:40:49.360 --> 0:40:51.200
<v Speaker 4>many of them, so there's a lot of competition. So

0:40:51.200 --> 0:40:54.880
<v Speaker 4>that's certainly true in terms of AI in particular, I

0:40:54.880 --> 0:40:57.360
<v Speaker 4>would say that, I don't know, it depends on what

0:40:57.400 --> 0:41:00.760
<v Speaker 4>you're trying to achieve. These models are getting better and better.

0:41:01.480 --> 0:41:06.040
<v Speaker 4>What they're not yet quite exhibiting is actual kind of

0:41:06.200 --> 0:41:09.160
<v Speaker 4>deduction and understanding as good as we would want them

0:41:09.200 --> 0:41:11.240
<v Speaker 4>to do. I mean, so you still see really because

0:41:11.280 --> 0:41:12.719
<v Speaker 4>of the way they work, I mean, these are just

0:41:12.719 --> 0:41:15.719
<v Speaker 4>predicting the next word. Because of the way they work,

0:41:15.760 --> 0:41:18.439
<v Speaker 4>you can see them do funky stuff like they get

0:41:18.560 --> 0:41:22.440
<v Speaker 4>adding numbers wrong sometimes because they're not actually adding numbers.

0:41:22.480 --> 0:41:24.680
<v Speaker 4>They're just predicting the next word. And it turns out

0:41:24.719 --> 0:41:26.600
<v Speaker 4>you can predict a lot of things you know you

0:41:26.640 --> 0:41:29.520
<v Speaker 4>may not, So it doesn't quite have a concept of addition, doesn't.

0:41:29.680 --> 0:41:31.600
<v Speaker 4>So I think, you know, if what you're looking for

0:41:31.719 --> 0:41:34.480
<v Speaker 4>is kind of general intelligence, I think there's some amount

0:41:34.600 --> 0:41:38.560
<v Speaker 4>that's going to be required in terms of actually understanding

0:41:38.600 --> 0:41:41.919
<v Speaker 4>certain concepts that these models don't yet have. And that's,

0:41:42.080 --> 0:41:43.959
<v Speaker 4>you know, my sense is that new ideas are needed

0:41:44.000 --> 0:41:45.600
<v Speaker 4>for that. I don't know what they are. If I knew,

0:41:45.640 --> 0:41:48.640
<v Speaker 4>I would do them, but new ideas are needed for that.

0:41:48.920 --> 0:41:51.080
<v Speaker 3>Yeah, it's still like mind blowing, Like you see the

0:41:51.200 --> 0:41:54.920
<v Speaker 3>AI produce some sort of amazing output or explanation and

0:41:54.960 --> 0:41:57.160
<v Speaker 3>then it'll like get wrong. Like a question of like

0:41:57.200 --> 0:42:00.840
<v Speaker 3>what weighs more, a kilogram of feathers or a kilo of steel,

0:42:00.920 --> 0:42:02.800
<v Speaker 3>like something really led.

0:42:02.680 --> 0:42:05.040
<v Speaker 4>Or, yeah, because it doesn't. Yeah, right, because.

0:42:05.600 --> 0:42:07.799
<v Speaker 3>There's no one, there's no actual intuition. I just have

0:42:07.880 --> 0:42:11.520
<v Speaker 3>one last question, and it's sort of There are not

0:42:11.680 --> 0:42:15.000
<v Speaker 3>many sort of like cutting edge tech companies based in Pittsburgh.

0:42:15.040 --> 0:42:18.920
<v Speaker 3>I understand like CMU has historically been a bastion of

0:42:19.000 --> 0:42:22.160
<v Speaker 3>advanced AI research. I think at one point, like Uber

0:42:22.280 --> 0:42:24.719
<v Speaker 3>bought out like the entire robotics department when it was

0:42:24.719 --> 0:42:27.719
<v Speaker 3>trying to do self driving cars. But how do you

0:42:27.760 --> 0:42:29.920
<v Speaker 3>see that when it comes to this sort of recruiting

0:42:30.200 --> 0:42:34.160
<v Speaker 3>of talent and it's already scarce. What are the advantages

0:42:34.239 --> 0:42:37.560
<v Speaker 3>and disadvantages of being based in Pittsburgh rather than the

0:42:37.560 --> 0:42:38.680
<v Speaker 3>Bay Area or somewhere else.

0:42:38.920 --> 0:42:41.560
<v Speaker 4>Yeah, we've been headquartered in Pittsburgh since the beginning.

0:42:41.600 --> 0:42:43.759
<v Speaker 4>We've loved being there. There are good things and bad things.

0:42:43.800 --> 0:42:45.920
<v Speaker 4>I mean, certainly a good thing is being close to

0:42:45.920 --> 0:42:48.960
<v Speaker 4>Carnegie Mellon. Carnegie Mellon produces, you know, some of the

0:42:48.960 --> 0:42:51.880
<v Speaker 4>best engineers in the world, and certainly relating to AI.

0:42:52.719 --> 0:42:55.560
<v Speaker 4>Another good thing about being in a city like Pittsburgh

0:42:55.640 --> 0:42:58.439
<v Speaker 4>is that two good things. One of them is that

0:42:58.640 --> 0:43:01.799
<v Speaker 4>people don't leave jobs that easily. And you know, when

0:43:01.800 --> 0:43:03.560
<v Speaker 4>you're in a place like Silicon Valley, you get these

0:43:03.600 --> 0:43:07.200
<v Speaker 4>people that leave jobs every eighteen months. Our average employee

0:43:07.200 --> 0:43:10.240
<v Speaker 4>stays around for a very long time, and that's actually

0:43:10.239 --> 0:43:12.600
<v Speaker 4>a major advantage because you don't have to retrain them.

0:43:12.640 --> 0:43:14.440
<v Speaker 4>They really know how to do the job because they've

0:43:14.480 --> 0:43:16.360
<v Speaker 4>been doing it for the last seven years. So that

0:43:16.600 --> 0:43:19.719
<v Speaker 4>that's been an advantage. And I think another advantage that

0:43:19.760 --> 0:43:22.760
<v Speaker 4>we've had is in terms of Silicon Valley, there's usually

0:43:22.800 --> 0:43:24.880
<v Speaker 4>one or two companies that are kind of the darlings

0:43:24.880 --> 0:43:27.319
<v Speaker 4>of Silicon Valley, and everybody wants to work there, and

0:43:27.400 --> 0:43:30.640
<v Speaker 4>the darling company changes every two, three years, and

0:43:30.640 --> 0:43:32.520
<v Speaker 4>then kind of all the good people go there. The

0:43:32.560 --> 0:43:36.000
<v Speaker 4>good news in Pittsburgh is that fad type thing doesn't happen.

0:43:36.040 --> 0:43:38.359
<v Speaker 4>So there have been times. We're lucky that right now

0:43:38.360 --> 0:43:40.279
<v Speaker 4>our stock is doing very well, so we're kind of

0:43:40.280 --> 0:43:42.840
<v Speaker 4>a fad company. But there have been times when we

0:43:42.960 --> 0:43:45.319
<v Speaker 4>just weren't, but we still were able to get really

0:43:45.360 --> 0:43:48.200
<v Speaker 4>good talent. So I think that's been really good. You know.

0:43:48.239 --> 0:43:50.480
<v Speaker 4>On the flip side, of course, there are certain roles

0:43:50.560 --> 0:43:52.720
<v Speaker 4>for which it is hard to hire people in Pittsburgh.

0:43:52.760 --> 0:43:56.400
<v Speaker 4>Particularly product managers are hard to hire in Pittsburgh. So

0:43:56.640 --> 0:43:58.359
<v Speaker 4>because of that, we have an office in New York,

0:43:58.520 --> 0:44:00.440
<v Speaker 4>and we complement that, we have a pretty large office

0:44:00.440 --> 0:44:02.560
<v Speaker 4>in New York, and we complement that.

0:44:03.120 --> 0:44:05.840
<v Speaker 2>All right, Luis von Ahn from Duolingo, thank you

0:44:05.840 --> 0:44:07.800
<v Speaker 2>so much for coming on Odd Lots. That was great.

0:44:08.480 --> 0:44:23.719
<v Speaker 2>Thank you. Excellent. Joe, I enjoyed that conversation. You know

0:44:23.760 --> 0:44:26.600
<v Speaker 2>what I was thinking about when Luis was talking about,

0:44:26.680 --> 0:44:28.439
<v Speaker 2>it's not that AI is going to take your job,

0:44:28.480 --> 0:44:30.719
<v Speaker 2>it's someone who knows how to use AI is going

0:44:30.760 --> 0:44:33.400
<v Speaker 2>to take your job. I was thinking about just before

0:44:33.440 --> 0:44:35.759
<v Speaker 2>we came on this recording, you were telling me that

0:44:35.840 --> 0:44:39.120
<v Speaker 2>you used, was it ChatGPT or Claude, to learn

0:44:39.200 --> 0:44:40.560
<v Speaker 2>something that I normally do.

0:44:40.880 --> 0:44:43.560
<v Speaker 3>Oh yeah. So for those who don't know, we have

0:44:43.600 --> 0:44:47.040
<v Speaker 3>a weekly Odd Lots newsletter and it usually comes out

0:44:47.080 --> 0:44:50.840
<v Speaker 3>every Friday. You should go to subscribe and Tracy usually

0:44:50.840 --> 0:44:52.600
<v Speaker 3>sends an email to one of the guests each week

0:44:52.719 --> 0:44:56.040
<v Speaker 3>asking what books they recommend, you know, people like reading books.

0:44:56.400 --> 0:44:58.680
<v Speaker 3>And then she goes into MS Paint and then like

0:44:58.719 --> 0:45:03.480
<v Speaker 3>puts the chatbooks of like the four books together, and

0:45:03.560 --> 0:45:05.879
<v Speaker 3>I did it because Tracy was out a couple weeks ago.

0:45:06.360 --> 0:45:08.799
<v Speaker 3>And I am not, like, I've never like learned Photoshop

0:45:08.880 --> 0:45:11.160
<v Speaker 3>or even MS Paint, so just like I'm very dumb,

0:45:11.239 --> 0:45:13.760
<v Speaker 3>Like just like the process of putting four images together

0:45:14.520 --> 0:45:16.640
<v Speaker 3>was not something I exactly knew how to do. So

0:45:16.680 --> 0:45:18.560
<v Speaker 3>I went to Claude and I said, I'm putting together

0:45:18.640 --> 0:45:21.560
<v Speaker 3>four book images in an MS Paint thing. Please tell

0:45:21.560 --> 0:45:23.360
<v Speaker 3>me how to do it and to walk through the steps.

0:45:23.400 --> 0:45:24.040
<v Speaker 1>And I did it.

0:45:24.080 --> 0:45:25.080
<v Speaker 3>Tracy, you were proud of me, right?

0:45:24.840 --> 0:45:27.360
<v Speaker 2>I was very proud. I do think

0:45:27.480 --> 0:45:30.960
<v Speaker 2>it's somewhat ironic that the pinnacle of AI usage is

0:45:30.960 --> 0:45:34.080
<v Speaker 2>teaching someone how to use MS Paint. But it's fine,

0:45:34.120 --> 0:45:36.279
<v Speaker 2>I'll take it. Yeah, No, there's so much to pull

0:45:36.320 --> 0:45:38.719
<v Speaker 2>out of that conversation. One thing I'll say, and maybe

0:45:38.719 --> 0:45:41.360
<v Speaker 2>it's a little bit trite, but it does seem like

0:45:42.040 --> 0:45:46.120
<v Speaker 2>language learning is sort of ground zero for the application

0:45:46.400 --> 0:45:50.879
<v Speaker 2>of a lot of this natural language and chatbot technology.

0:45:50.960 --> 0:45:52.840
<v Speaker 2>So it was interesting to come at it from a

0:45:52.880 --> 0:45:56.319
<v Speaker 2>sort of pure language or linguistics perspective.

0:45:57.000 --> 0:45:58.839
<v Speaker 3>Yeah, I mean, I like, I feel like we could

0:45:58.880 --> 0:46:03.200
<v Speaker 3>have talked to Luis for hours, just on like theory

0:46:03.239 --> 0:46:08.040
<v Speaker 3>of language itself, which I find endlessly fascinating, and I

0:46:08.120 --> 0:46:10.160
<v Speaker 3>really I can only speak one language. I used to

0:46:10.200 --> 0:46:12.200
<v Speaker 3>be able to speak French, so I don't know if

0:46:12.200 --> 0:46:15.360
<v Speaker 3>I told you, but I did one semester in Geneva, Switzerland,

0:46:15.360 --> 0:46:17.440
<v Speaker 3>and I lived with a family that only spoke French,

0:46:17.800 --> 0:46:19.680
<v Speaker 3>and I'd never spoken a word of French before I

0:46:19.680 --> 0:46:22.080
<v Speaker 3>got there. And after one semester, I came home and

0:46:22.120 --> 0:46:24.800
<v Speaker 3>I passed out of four years worth of my college

0:46:24.840 --> 0:46:27.560
<v Speaker 3>requirements from that four months living there. And then I

0:46:27.600 --> 0:46:29.560
<v Speaker 3>didn't speak French again for twenty years and I lost

0:46:29.600 --> 0:46:32.000
<v Speaker 3>it all. But I was gonna go somewhere with that.

0:46:32.040 --> 0:46:32.960
<v Speaker 3>I don't really know.

0:46:33.080 --> 0:46:36.400
<v Speaker 2>It's okay. I too speak multiple languages poorly.

0:46:36.800 --> 0:46:38.760
<v Speaker 3>But you know the other thing I was thinking about,

0:46:39.040 --> 0:46:41.319
<v Speaker 3>you know, so Duolingo has obviously been around for

0:46:41.400 --> 0:46:43.960
<v Speaker 3>quite a long time before anyone was talking about generative

0:46:44.000 --> 0:46:45.919
<v Speaker 3>AI or anything. And one of the things you hear,

0:46:46.800 --> 0:46:49.400
<v Speaker 3>and it's sort of used pejoratively, is like some company

0:46:49.440 --> 0:46:52.560
<v Speaker 3>will be called like a ChatGPT wrapper, right, so

0:46:52.680 --> 0:46:56.560
<v Speaker 3>basically they're just taking GPT-4, whatever the latest model is,

0:46:56.880 --> 0:46:59.680
<v Speaker 3>and then building some slick interface to do a specific

0:46:59.719 --> 0:47:02.279
<v Speaker 3>task on top of it. And what's interesting about

0:47:02.360 --> 0:47:04.960
<v Speaker 3>Duolingo is it feels like it's backwards or going in

0:47:05.000 --> 0:47:09.600
<v Speaker 3>the opposite sequence where they already had this extremely popular

0:47:10.440 --> 0:47:16.120
<v Speaker 3>app for language learning, and then over time they incorporate

0:47:16.160 --> 0:47:18.520
<v Speaker 3>more so rather than being starting off as a wrapper

0:47:18.560 --> 0:47:22.080
<v Speaker 3>for someone else's technology, they already have the audience, they

0:47:22.080 --> 0:47:24.480
<v Speaker 3>already have the thing, and then they find more ways

0:47:24.960 --> 0:47:28.000
<v Speaker 3>that the AI can be used to actually like rebuild

0:47:28.200 --> 0:47:28.799
<v Speaker 3>the core app.

0:47:29.200 --> 0:47:30.960
<v Speaker 2>Yeah, that's a really good way of putting it. And

0:47:31.000 --> 0:47:34.160
<v Speaker 2>also just the iterative nature of all of this technology,

0:47:34.239 --> 0:47:36.920
<v Speaker 2>So the idea that you know, you're sort of training it,

0:47:37.040 --> 0:47:39.520
<v Speaker 2>I know, again it's sort of an obvious point, yeah,

0:47:39.560 --> 0:47:43.480
<v Speaker 2>but also I didn't realize how customized a lot of

0:47:43.480 --> 0:47:45.880
<v Speaker 2>the Duolingo stuff is at this point. And the

0:47:45.920 --> 0:47:48.640
<v Speaker 2>idea that if you speak one language, the way you learn,

0:47:48.920 --> 0:47:52.200
<v Speaker 2>say German, is going to be completely different to someone

0:47:52.280 --> 0:47:56.040
<v Speaker 2>who grew up speaking another language. And I'm very intrigued

0:47:56.280 --> 0:47:59.640
<v Speaker 2>by the amount of data that something like a Duolingo

0:48:00.200 --> 0:48:02.919
<v Speaker 2>must have at this point, and I guess maybe we should

0:48:02.920 --> 0:48:05.879
<v Speaker 2>have asked Luis about this. But also other business opportunities

0:48:05.880 --> 0:48:09.120
<v Speaker 2>in terms of like licensing that data or maybe I

0:48:09.160 --> 0:48:11.799
<v Speaker 2>don't know. I think they were doing a partnership for

0:48:11.840 --> 0:48:15.399
<v Speaker 2>a while with BuzzFeed where they were, where the

0:48:15.440 --> 0:48:19.360
<v Speaker 2>captcha was like actually translating news articles or something.

0:48:19.520 --> 0:48:21.879
<v Speaker 3>Right, there was going to be something like that, I think.

0:48:21.880 --> 0:48:24.480
<v Speaker 3>I recall it didn't really take off, but the idea

0:48:24.600 --> 0:48:27.960
<v Speaker 3>was BuzzFeed would get its news articles translated into Spanish

0:48:28.000 --> 0:48:31.280
<v Speaker 3>other languages from the process of Duolingo users

0:48:31.360 --> 0:48:34.520
<v Speaker 3>learning that process. I forget why it didn't take off,

0:48:34.520 --> 0:48:35.560
<v Speaker 3>but yeah, absolutely.

0:48:35.840 --> 0:48:38.960
<v Speaker 2>I also I find it funny like in some senses

0:48:39.239 --> 0:48:43.120
<v Speaker 2>that we're sort of I guess the thing that AI

0:48:43.280 --> 0:48:46.160
<v Speaker 2>is feeding off of now right, And like all those

0:48:46.320 --> 0:48:50.239
<v Speaker 2>minutes which I'm sure add up to days eventually of

0:48:50.320 --> 0:48:54.160
<v Speaker 2>going through captcha, it's all sort of unpaid labor for

0:48:54.320 --> 0:48:56.279
<v Speaker 2>training our future AI overlords.

0:48:56.400 --> 0:48:59.680
<v Speaker 3>So he mentioned that he was upset about headlines last

0:48:59.719 --> 0:49:02.319
<v Speaker 3>year implying that they had laid off a bunch of

0:49:02.360 --> 0:49:05.560
<v Speaker 3>people due to AI. But he did say that there

0:49:05.560 --> 0:49:08.040
<v Speaker 3>are people who they were contractors, so they weren't full

0:49:08.080 --> 0:49:11.640
<v Speaker 3>time employees. But it sounds like a very crisp example

0:49:11.800 --> 0:49:14.279
<v Speaker 3>of AI being able to do a job even if

0:49:14.320 --> 0:49:17.120
<v Speaker 3>they were contractors, that was done by humans. And I'm

0:49:17.239 --> 0:49:20.480
<v Speaker 3>generally skeptical of most articles that I read where

0:49:20.520 --> 0:49:23.000
<v Speaker 3>a company says, oh, we're getting like cut all this

0:49:23.200 --> 0:49:25.680
<v Speaker 3>labor savings and we're gonna do AI, because I sort

0:49:25.680 --> 0:49:28.200
<v Speaker 3>of think that is often a smokescreen for just like

0:49:28.360 --> 0:49:30.680
<v Speaker 3>a business that wants to cut jobs and make it

0:49:30.719 --> 0:49:33.799
<v Speaker 3>sound like they're progressive. But here it did sound like an

0:49:33.840 --> 0:49:37.000
<v Speaker 3>actual example in which there was some form of human

0:49:37.120 --> 0:49:41.000
<v Speaker 3>labor that is no longer needed because of AI.

0:49:41.320 --> 0:49:44.440
<v Speaker 2>Yes, AI will come for us all. Shall we leave

0:49:44.440 --> 0:49:44.640
<v Speaker 2>it there?

0:49:44.719 --> 0:49:45.479
<v Speaker 4>Let's leave it there.

0:49:45.719 --> 0:49:48.400
<v Speaker 2>This has been another episode of the Odd Lots podcast.

0:49:48.520 --> 0:49:51.800
<v Speaker 2>I'm Tracy Alloway. You can follow me at Tracy Alloway.

0:49:51.520 --> 0:49:54.400
<v Speaker 3>And I'm Joe Wisenthal. You can follow me at the Stalwart.

0:49:54.480 --> 0:49:57.279
<v Speaker 3>Follow our guest Luis von Ahn. He's at Luis von Ahn.

0:49:57.760 --> 0:50:01.120
<v Speaker 3>Follow our producers Carmen Rodriguez at Carmen Arman, Dashiell

0:50:01.120 --> 0:50:04.360
<v Speaker 3>Bennett at Dashbot, and Kail Brooks at Kail Brooks. Thank you

0:50:04.400 --> 0:50:07.400
<v Speaker 3>to our producer Moses Ondam. For more Odd Lots content, go

0:50:07.440 --> 0:50:10.399
<v Speaker 3>to Bloomberg dot com slash odd Lots, where we have transcripts,

0:50:10.480 --> 0:50:13.359
<v Speaker 3>a blog and a newsletter, and you can chat about all

0:50:13.360 --> 0:50:16.120
<v Speaker 3>of these topics twenty four seven in the Discord.

0:50:16.200 --> 0:50:19.200
<v Speaker 3>In fact, this episode came about because someone in the

0:50:19.200 --> 0:50:22.440
<v Speaker 3>Discord wanted to hear an interview with Luis von Ahn.

0:50:22.920 --> 0:50:24.960
<v Speaker 3>So you can go there, you can talk about AI,

0:50:25.080 --> 0:50:27.600
<v Speaker 3>you can suggest future episodes.

0:50:28.120 --> 0:50:31.279
<v Speaker 2>Check it out. And if you enjoy Odd Lots, if

0:50:31.320 --> 0:50:35.200
<v Speaker 2>you like it when we speak bad Spanish, I guess,

0:50:35.440 --> 0:50:38.400
<v Speaker 2>then please leave us a positive review on your favorite

0:50:38.400 --> 0:50:42.279
<v Speaker 2>podcast platform. And remember, if you are a Bloomberg subscriber,

0:50:42.360 --> 0:50:45.560
<v Speaker 2>you can listen to all of our episodes absolutely ad free.

0:50:45.840 --> 0:50:48.440
<v Speaker 2>All you need to do is connect your Bloomberg subscription

0:50:48.719 --> 0:50:51.200
<v Speaker 2>with Apple Podcasts. Thanks for listening.
