WEBVTT - Luis von Ahn Explains How Computers and Humans Learn From Each Other 0:00:02.480 --> 0:00:15.760 Bloomberg Audio Studios. Podcasts, Radio, News. 0:00:17.920 --> 0:00:21.200 Hello and welcome to another episode of the Odd Lots podcast. 0:00:21.280 --> 0:00:24.439 I'm Tracy Alloway and I'm Joe Weisenthal. Joe, you know, 0:00:24.640 --> 0:00:27.480 I had a life realization recently. 0:00:27.720 --> 0:00:30.240 Okay, this should be good, go on. 0:00:30.800 --> 0:00:34.440 It struck me that I am spending a non-negligible 0:00:34.520 --> 0:00:37.440 amount of my time proving that I am in fact 0:00:37.880 --> 0:00:38.680 a human being. 0:00:39.120 --> 0:00:42.120 It's getting harder and harder. I know what you're talking about. 0:00:42.159 --> 0:00:44.320 So we're talking, you know, you go to a website 0:00:44.360 --> 0:00:46.160 and you have to enter in the CAPTCHA and it's 0:00:46.240 --> 0:00:49.600 like, click all these squares that have like a crosswalk 0:00:49.640 --> 0:00:51.920 on them or a truck, and like it feels like 0:00:51.960 --> 0:00:54.400 it's just getting harder. And sometimes I'm like, no, trust me, 0:00:54.440 --> 0:00:54.960 I'm a human. 0:00:55.760 --> 0:00:58.080 This is it. And every time it happens, I kind 0:00:58.080 --> 0:01:01.040 of have a moment of self-doubt whether or not, 0:01:02.000 --> 0:01:05.319 is it just me? Am I particularly bad at picking 0:01:05.360 --> 0:01:08.720 out all the motorcycles in a set of pictures? Or 0:01:08.800 --> 0:01:13.559 are they just becoming increasingly weird or perhaps increasingly sophisticated 0:01:13.760 --> 0:01:16.679 in the face of new types of technology? 0:01:17.040 --> 0:01:19.600 It's not just you. I've heard this from multiple people. 0:01:19.800 --> 0:01:24.360 In fact, prepping for this episode, I heard people talking 0:01:24.400 --> 0:01:27.280 about exactly this. But you know, it's like a big problem. 
0:01:27.319 --> 0:01:29.240 You know, we did that Worldcoin episode, like everyone 0:01:29.400 --> 0:01:32.039 is trying to figure out, like how in a world 0:01:32.080 --> 0:01:35.240 of AI and bots and artificial intelligence, all that stuff, 0:01:35.560 --> 0:01:38.039 how do you know whether someone you're interacting with is 0:01:38.080 --> 0:01:38.880 in fact a person? 0:01:39.120 --> 0:01:43.039 Yeah, and I'm glad you mentioned AI because obviously part 0:01:43.080 --> 0:01:46.119 of this dynamic is AI seems to be getting better 0:01:46.240 --> 0:01:50.560 at solving these particular types of problems, but also they're 0:01:50.600 --> 0:01:54.720 being used more, right, to train AI models. So at 0:01:54.720 --> 0:01:56.840 this point, I think we all know why we're constantly 0:01:57.000 --> 0:02:00.640 trying to identify bikes in a bunch of photos. But 0:02:00.960 --> 0:02:06.680 the whole idea behind CAPTCHAs is or was that humans 0:02:06.720 --> 0:02:09.320 still have an edge. So there are some things that 0:02:09.400 --> 0:02:13.480 humans are better able to do versus machines. And one 0:02:13.480 --> 0:02:15.400 of the things that we used to talk about humans 0:02:15.440 --> 0:02:18.840 having an edge in was linguistics. So there is this 0:02:18.919 --> 0:02:23.000 idea that human language was so complex, so nuanced, that 0:02:23.080 --> 0:02:27.359 machines would maybe never be able to fully appreciate all 0:02:27.360 --> 0:02:31.160 the intricacies and subtleties of the human language. But obviously, 0:02:31.200 --> 0:02:35.200 since the arrival of generative AI and natural language processing, 0:02:35.560 --> 0:02:38.640 I think there's more of a question mark around that. Yeah. 0:02:38.720 --> 0:02:41.440 I mean, look, I think like a typical chatbot 0:02:41.520 --> 0:02:44.080 right now is probably better than most people at just 0:02:44.200 --> 0:02:47.280 typing out several paragraphs. 
It's all sort of like seemed 0:02:47.280 --> 0:02:48.880 to sort of, as they say on the internet, kind 0:02:48.919 --> 0:02:51.080 of mid-curve to me. It never like strikes me 0:02:51.120 --> 0:02:55.919 as like incredibly intelligent, but clearly computers can talk about 0:02:55.919 --> 0:02:58.480 as well as humans, and so it raises all sorts 0:02:58.480 --> 0:03:01.320 of interesting questions. You mentioned that part of CAPTCHA is 0:03:01.400 --> 0:03:04.200 part of this, like training computers. A big part of 0:03:04.240 --> 0:03:07.440 these chatbots is the so-called, like, real-life human feedback 0:03:07.480 --> 0:03:09.680 where people say this answer is better than another, this 0:03:09.720 --> 0:03:12.240 answer is better than another, as they refine the models, et cetera. 0:03:12.720 --> 0:03:13.840 So I think there's like 0:03:13.800 --> 0:03:16.920 an interesting moment where like we're learning from computers and 0:03:16.960 --> 0:03:21.720 computers are learning from us, maybe collaboratively, the two sides 0:03:22.240 --> 0:03:25.120 in a carbon and silicon working together. 0:03:25.680 --> 0:03:27.560 I think that's a great way of putting it. Also, 0:03:27.800 --> 0:03:31.880 mid-curve is such an underappreciated insult, like calling people 0:03:31.960 --> 0:03:34.760 top of the bell curve is one of my favorite 0:03:34.760 --> 0:03:37.320 things to do online. Anyway, I am very pleased to 0:03:37.400 --> 0:03:41.240 say that today we actually have the perfect guest. We're 0:03:41.280 --> 0:03:45.680 going to be speaking to someone who was very instrumental 0:03:45.800 --> 0:03:49.440 in the development of things like CAPTCHA and someone who 0:03:49.520 --> 0:03:53.440 is doing a lot with AI, particularly in the field 0:03:53.600 --> 0:03:56.800 of linguistics and language right now. We're going to be 0:03:56.800 --> 0:03:59.440 speaking with Luis von Ahn. He is, of course, the 0:03:59.560 --> 0:04:02.920 CEO and co-founder of Duolingo. 
So, Luis, thank 0:04:03.000 --> 0:04:04.200 you so much for coming on 0:04:04.040 --> 0:06:06.119 Odd Lots. Thank you, thank you for having me. 0:04:06.800 --> 0:04:09.680 So maybe to begin with, talk to us about the 0:04:09.800 --> 0:04:14.160 idea behind CAPTCHA and why it seems to have become, 0:04:14.320 --> 0:04:17.039 I don't want to say a significant portion of my life, 0:04:17.080 --> 0:04:20.080 but I certainly spend a couple minutes every day doing 0:04:20.080 --> 0:04:21.000 at least one version. 0:04:21.680 --> 0:04:24.359 Yeah. So the original CAPTCHA, the idea of a CAPTCHA 0:04:24.560 --> 0:04:28.279 was a test to distinguish humans from computers. The reasons 0:04:28.320 --> 0:04:31.120 why you may want to distinguish whether you're interacting with 0:04:31.160 --> 0:04:34.000 a human or a computer online, for example, and this 0:04:34.120 --> 0:04:37.240 is kind of the original motivation for it: companies offer 0:04:37.279 --> 0:04:40.359 free email services, and you know they have the problem 0:04:40.440 --> 0:04:43.599 that if you allow anything to sign up for a 0:04:43.600 --> 0:04:46.920 free email service, like either a computer or human, somebody could 0:04:46.960 --> 0:04:49.560 write a program to obtain millions of free email accounts, 0:04:49.880 --> 0:04:53.919 whereas humans, because they are usually not that patient, cannot 0:04:54.240 --> 0:04:56.520 get millions of email accounts for themselves. They can only 0:04:56.520 --> 0:05:00.360 get one or two. So the original motivation for CAPTCHA 0:05:00.440 --> 0:05:02.359 was to make a test to make sure that whoever 0:05:02.640 --> 0:05:04.760 is getting a free email account is actually a human and 0:05:04.800 --> 0:05:07.760 not a computer program that was written to obtain millions 0:05:07.760 --> 0:05:11.000 of email accounts. So, you know, and the way it worked, 0:05:11.120 --> 0:05:13.400 there's many kinds of tests. 
Originally, the way it 0:05:13.440 --> 0:05:16.560 worked is distorted letters. So you would get a bunch 0:05:16.600 --> 0:05:18.800 of letters that were pre-distorted and you had to type 0:05:18.800 --> 0:05:21.640 what they were. And the reason that worked is because 0:05:22.240 --> 0:05:25.560 human beings are very good at reading distorted letters. But at 0:05:25.600 --> 0:05:27.720 the time, this was, you know, more than twenty years ago, 0:05:28.000 --> 0:05:31.720 computers just could not recognize distorted letters very well. So 0:05:31.760 --> 0:05:34.280 that was a great test to determine whether you were 0:05:34.279 --> 0:05:36.880 talking to a human or a computer. But what happened 0:05:36.880 --> 0:05:40.880 is over time, computers got quite good at this, trying 0:05:40.920 --> 0:05:45.919 to decipher distorted text, so it was no longer possible 0:05:45.960 --> 0:05:48.640 to give an image with distorted text and distinguish a 0:05:48.680 --> 0:05:50.840 human from a computer, because computers pretty much got as 0:05:50.880 --> 0:05:54.360 good as a human. At that point, these tests started 0:05:54.520 --> 0:05:56.480 changing to other things. I mean, one of the more 0:05:56.520 --> 0:05:59.400 popular ones that you see nowadays is kind of clicking 0:05:59.520 --> 0:06:02.520 on the images of something. So you can see a grid, 0:06:02.680 --> 0:06:05.240 like a four by four grid, and it may say 0:06:05.640 --> 0:06:07.920 click on all the traffic lights, or click on all 0:06:07.960 --> 0:06:12.960 the bicycles, et cetera. And by clicking on them, you know, 0:06:13.120 --> 0:06:17.279 you're showing that you can actually recognize these things. 0:06:17.480 --> 0:06:20.520 And the reason they're getting harder is because computers are 0:06:20.520 --> 0:06:23.960 getting better and better at deciphering which ones are traffic lights, 0:06:24.000 --> 0:06:27.160 et cetera. 
And by now, what you're getting here are 0:06:27.200 --> 0:06:30.280 the things that we still think computers are not very 0:06:30.279 --> 0:06:33.920 good at. So the image may be very blurry, or 0:06:34.080 --> 0:06:35.920 you know, you may just get a tiny little corner 0:06:36.000 --> 0:06:38.599 of it and things like that. So that's why they're 0:06:38.600 --> 0:06:40.960 getting harder, and I expect that to continue happening. 0:06:41.680 --> 0:06:45.040 So you founded a company called 0:06:45.200 --> 0:06:49.000 reCAPTCHA, which you sold to Google several years ago. 0:06:49.480 --> 0:06:52.040 Is there gonna be a point where, I mean, I 0:06:52.080 --> 0:06:56.479 assume computer vision and their ability to decode images or 0:06:56.520 --> 0:06:59.800 recognize images is not done improving. I assume it's going 0:06:59.800 --> 0:07:03.479 to get better, whereas humans' ability to decode images, I 0:07:03.560 --> 0:07:06.159 doubt it's really getting any better. We've probably been about 0:07:06.160 --> 0:07:09.120 the same for a couple thousand years now. Like, is 0:07:09.160 --> 0:07:11.920 there going to be a point in which it's impossible 0:07:12.040 --> 0:07:14.600 to create a visual test that humans are better at 0:07:14.600 --> 0:07:15.480 than computers? 0:07:15.680 --> 0:07:18.320 I believe that will happen at some point. Yeah, it's 0:07:18.480 --> 0:07:21.840 very hard to say when exactly, but you know, you 0:07:21.880 --> 0:07:24.440 can just see at this point it's getting, you know, 0:07:24.480 --> 0:07:27.200 computers are getting better and better. And you know, the 0:07:27.240 --> 0:07:30.120 other thing that is important to mention is this type 0:07:30.120 --> 0:07:33.200 of test has extra constraints. It also has to be 0:07:33.280 --> 0:07:36.360 the case that it's not just that humans can do 0:07:36.400 --> 0:07:38.040 it. 
It's like, really, humans should be able to do it 0:07:38.040 --> 0:07:43.600 pretty quickly and you know, success. 0:07:43.360 --> 0:07:46.400 Quickly, and on a mobile phone and a very small 0:07:46.480 --> 0:07:48.840 screen in which like my thumb is like half the 0:07:48.880 --> 0:07:49.520 size of the screen. 0:07:49.640 --> 0:07:51.600 Yeah. Yeah, and it may not be, you know, quickly. 0:07:51.680 --> 0:07:53.520 I mean it may take you, I don't know, thirty 0:07:53.520 --> 0:07:55.480 seconds or a minute. But we cannot make a test 0:07:55.480 --> 0:07:59.200 that takes you an hour. We can't do that. So 0:07:59.240 --> 0:08:01.200 it has to be quick. It has to be done 0:08:01.200 --> 0:08:02.480 on a mobile phone. It has to be the case 0:08:02.480 --> 0:08:04.440 that the computer should be able to grade it. The computer 0:08:04.480 --> 0:08:06.160 should be able to know what the right answer was, 0:08:06.280 --> 0:08:09.400 even though it can't solve it. So because of all 0:08:09.400 --> 0:08:11.880 of these constraints, I mean, my sense is at some 0:08:12.000 --> 0:08:14.160 point this is just going to be impossible. I mean, 0:08:14.320 --> 0:08:17.360 we knew this when we started the original CAPTCHA, that 0:08:17.400 --> 0:08:19.640 at some point computers were going to get good enough, 0:08:20.800 --> 0:08:22.960 but we just had no idea how long it was 0:08:23.000 --> 0:08:25.520 going to take. And I still don't know how long 0:08:25.600 --> 0:08:27.520 it's going to take. But you know, I would not 0:08:27.600 --> 0:08:29.960 be surprised if in five to ten years there's just 0:08:30.040 --> 0:08:32.679 not much that you can do that is really quick 0:08:33.080 --> 0:08:36.200 online to be able to differentiate humans from computers. 0:08:36.360 --> 0:08:39.760 Yeah, that's when we get the eyeball-scanning orbs. But 0:08:40.000 --> 0:08:42.360 I mean you mentioned that you can't have a test 0:08:42.679 --> 0:08:45.760 that takes an hour or something like that. 
But this 0:08:45.880 --> 0:08:49.160 kind of begs the question in my mind of why 0:08:49.200 --> 0:08:51.839 are people using these tests at all? So, like, okay, 0:08:51.920 --> 0:08:56.160 obviously you want to distinguish between humans and robots, but 0:08:56.280 --> 0:08:59.160 I sometimes get the sense that these are basically free 0:08:59.240 --> 0:09:03.600 labor AI training programs, right? So even if you can 0:09:03.760 --> 0:09:07.439 verify identity in some other way, why not get people 0:09:07.679 --> 0:09:10.920 on a mass scale to spend two minutes training self-driving 0:09:11.000 --> 0:09:11.680 cars? 0:09:12.200 --> 0:09:14.240 Yeah, I mean, this is what these things are doing. 0:09:14.240 --> 0:09:17.320 That was the original idea of reCAPTCHA, which was my company. 0:09:17.400 --> 0:09:21.120 The idea was that you could, at the same time 0:09:21.160 --> 0:09:23.000 as you were proving that you are a human, you 0:09:23.040 --> 0:09:25.400 could be doing something that computers could not yet do, 0:09:25.800 --> 0:09:29.080 and that data could be used to improve computer programs 0:09:29.080 --> 0:09:32.520 to do it. So certainly, when you're clicking on bicycles 0:09:32.600 --> 0:09:35.280 or when you're clicking on traffic lights or whatever, that 0:09:35.440 --> 0:09:38.600 is likely data that is being used. I say likely 0:09:38.600 --> 0:09:40.800 because you know, I don't know what CAPTCHA you're using. 0:09:41.000 --> 0:09:42.360 There may be some that are not doing that, but 0:09:42.800 --> 0:09:47.000 overall that data is being used to improve things like 0:09:47.559 --> 0:09:51.800 self-driving cars, image recognition programs, et cetera. So that 0:09:51.920 --> 0:09:54.800 is happening, and that's, you know, generally a good thing 0:09:54.840 --> 0:09:59.000 because that's making basically AI smarter and smarter. 
But you know, 0:09:59.480 --> 0:10:01.520 we still need it to be the case that it's a 0:10:01.559 --> 0:10:05.480 good security mechanism. So if at some point just computers 0:10:05.480 --> 0:10:09.080 can do that, then you know, that's just not a 0:10:09.080 --> 0:10:10.959 great security mechanism and it's not going to be used. 0:10:10.960 --> 0:10:13.480 And my sense is if we're gonna want to do something, 0:10:13.480 --> 0:10:16.280 we are going to need something like real identity. Like, 0:10:16.600 --> 0:10:18.040 I don't know if it's going to be eyeball scanning 0:10:18.120 --> 0:10:20.520 or whatever, but it's good. We're gonna, you know, the 0:10:20.840 --> 0:10:23.360 nice thing about a CAPTCHA is it doesn't tie you 0:10:23.400 --> 0:10:26.040 to you. It just proves that you're a human, right? 0:10:26.440 --> 0:10:29.040 We're probably going to need something that ties you to you. 0:10:29.760 --> 0:10:31.760 We're probably going to need something that says, well, I 0:10:31.960 --> 0:10:35.400 just know this is this specific person because, you know, whatever, 0:10:35.800 --> 0:10:39.040 we're scanning their eyeball, we're looking at their fingerprint, whatever 0:10:39.080 --> 0:10:41.040 it is, and it is actually a real person, and 0:10:41.080 --> 0:10:42.000 it is this person. 0:10:43.000 --> 0:10:45.280 Why don't we sort of zoom out and back up 0:10:45.320 --> 0:10:48.240 for a second. So currently you are the CEO of 0:10:48.360 --> 0:10:54.120 Duolingo, the popular language learning app, publicly traded company. 0:10:54.600 --> 0:10:58.160 Done much better sort of stockwise than many companies that 0:10:58.240 --> 0:11:01.480 came public in twenty twenty one. I have expected, you know, 0:11:01.640 --> 0:11:03.760 there was a boom when people had a bunch of time 0:11:03.800 --> 0:11:06.520 on their hands, gone down. 
You're also sort of one 0:11:06.520 --> 0:11:10.240 of the most respected sort of computer science thinkers coming 0:11:10.240 --> 0:11:13.520 out of Carnegie Mellon University. What is the through 0:11:13.600 --> 0:11:16.120 line of your work, or how would you characterize what 0:11:16.200 --> 0:11:20.280 connects something like CAPTCHAs to language learning at Duolingo? 0:11:20.760 --> 0:11:23.600 It's similar to what you were talking about; I was smiling when 0:11:23.600 --> 0:11:25.320 you were mentioning that. I mean, I think the general 0:11:25.360 --> 0:11:29.319 through line is a combination of humans learning from computers 0:11:29.320 --> 0:11:32.480 and computers learning from humans. And you know, CAPTCHA had 0:11:32.520 --> 0:11:35.480 that: while you were typing a CAPTCHA, computers were learning 0:11:35.520 --> 0:11:38.040 from what you were doing. In the case of Duolingo, 0:11:38.600 --> 0:11:41.760 it's really a symbiotic thing that both are learning, in 0:11:41.800 --> 0:11:45.160 that humans are learning a language and, in the case 0:11:45.160 --> 0:11:47.080 of Duolingo, Duolingo's learning how to teach 0:11:47.160 --> 0:11:51.520 humans better by interacting with humans a lot. So you know, 0:11:51.600 --> 0:11:54.960 Duolingo just gets better with time because we figure 0:11:55.000 --> 0:11:58.520 out different ways in which humans are just learning better. 0:11:59.160 --> 0:12:01.440 You know, humans are getting better with a language, and 0:12:01.520 --> 0:12:03.439 Duolingo's getting better at teaching you languages. 0:12:19.120 --> 0:12:20.640 Joe, have you used Duolingo? 0:12:21.400 --> 0:12:25.520 I haven't. Well, okay, I hadn't up until recently. So 0:12:26.080 --> 0:12:29.040 last week, as it turns out, I visited my mother 0:12:29.120 --> 0:12:32.199 who lives in Guatemala, which, Luis, I understand you're from. 0:12:32.280 --> 0:12:35.280 And, oh wow. Yeah, she's, she is. 
Uh, she's not 0:12:35.360 --> 0:12:38.440 from there, but she visited a friend there eight years 0:12:38.440 --> 0:12:39.880 ago and she loved it, and she's like, I'm just 0:12:39.920 --> 0:12:42.720 gonna stay, and she has literally never left. She 0:12:42.800 --> 0:12:44.440 loved it so much, and so I visited her for 0:12:44.480 --> 0:12:47.240 the first time at her house near Lake Atitlán, and 0:12:47.240 --> 0:12:48.679 then I was like, oh, there's a great life and 0:12:48.720 --> 0:12:51.640 maybe one day I'll even have that house. And I 0:12:51.640 --> 0:12:54.560 should learn Spanish. And so I did, partly because of 0:12:54.559 --> 0:12:57.280 that trip and partly to prepare for this episode. I 0:12:57.400 --> 0:12:59.880 downloaded it and have started. I know a little bit 0:12:59.920 --> 0:13:02.280 of Spanish, not much, like I can, you know, ask 0:13:02.320 --> 0:13:04.079 for the bill and stuff, but it's like, oh, I should, 0:13:04.120 --> 0:13:05.040 I should start to learn it. 0:13:05.160 --> 0:13:09.160 That's funny because I also started learning Spanish right before 0:13:09.280 --> 0:13:12.040 a trip to Guatemala. There you go. With Duolingo, and 0:13:12.280 --> 0:13:16.000 I'm not the best advertisement for the app, I'm afraid. 0:13:16.080 --> 0:13:18.959 Like, the only thing I remember is basically like Kissierra 0:13:19.120 --> 0:13:24.000 una hapatas personas. That's all I remember from it. 0:13:23.920 --> 0:13:24.600 It's pretty good. 0:13:25.000 --> 0:13:26.160 Thanks, that's pretty good. 0:13:26.920 --> 0:13:28.720 All right, I need to get back on it. 
But 0:13:29.080 --> 0:13:31.600 why don't you talk to us a little bit about 0:13:31.640 --> 0:13:37.040 the opportunity with AI in this sort of language learning space, 0:13:37.280 --> 0:13:41.280 because intuitively, it would seem like things like chatbots 0:13:41.320 --> 0:13:44.800 and generative AI and natural language processing and things like 0:13:44.840 --> 0:13:48.840 that would be an amazing fit for this type of business. 0:13:49.120 --> 0:13:51.600 Yeah, it's a really good fit. So okay, so you know, 0:13:51.600 --> 0:13:55.320 we teach languages at Duolingo. Historically, you know, 0:13:55.400 --> 0:13:57.720 learning a language just has a lot of different components. 0:13:57.760 --> 0:14:00.440 You've got to learn how to read a language. 0:14:00.440 --> 0:14:02.760 You've got to learn some vocabulary, you've got to learn 0:14:02.760 --> 0:14:05.480 how to listen to it. If there's a different writing system, 0:14:05.520 --> 0:14:07.839 you've got to learn the writing system, you've got to 0:14:07.920 --> 0:14:09.800 learn how to have a conversation. There's a lot of 0:14:09.800 --> 0:14:14.480 different skills that are required in learning a language. Historically, 0:14:14.520 --> 0:14:17.720 we have done pretty well in all the skills except 0:14:17.760 --> 0:14:21.080 for one of them, which is having a multi-turn 0:14:21.120 --> 0:14:24.960 fluid conversation. So we could teach you, you know, historically, we 0:14:25.000 --> 0:14:27.320 could teach you, we could teach you vocabulary really well. 0:14:27.360 --> 0:14:29.000 We could teach you how to listen to a language. 0:14:29.040 --> 0:14:30.880 It's, you know, generally just by getting you to 0:14:30.880 --> 0:14:32.920 listen a lot to something. So we could teach you 0:14:32.960 --> 0:14:37.280 all the things, but being able to practice actual multi-turn 0:14:37.320 --> 0:14:40.160 conversation was not something that we could do with 0:14:40.320 --> 0:14:42.840 just a computer. 
Historically, that needed us to pair you 0:14:42.880 --> 0:14:45.240 with another human. Now at Duolingo, we never 0:14:45.280 --> 0:14:47.280 paired people up with other humans, because it turns out 0:14:47.800 --> 0:14:50.400 a very small fraction of people actually want to be 0:14:50.480 --> 0:14:53.600 paired with a random person over the internet who speaks 0:14:53.600 --> 0:14:56.720 a different language. It's just, it's kind of too embarrassing 0:14:56.760 --> 0:15:00.640 for most people. I never did that. Well, it may 0:15:00.680 --> 0:15:04.640 be dangerous, yes, but it also, it's just, it's like 0:15:04.720 --> 0:15:08.320 ninety percent of people just not extroverted enough, yeah, to 0:15:08.400 --> 0:15:11.120 do that. I just don't want to do it. So 0:15:11.600 --> 0:15:14.440 we always, you know, kind of, we did these kind 0:15:14.440 --> 0:15:18.000 of wonky things to try to emulate short conversations, but 0:15:18.040 --> 0:15:20.360 we could never do anything like what we can do 0:15:20.480 --> 0:15:24.720 now, because with large language models, we really can get 0:15:24.760 --> 0:15:27.840 you to practice, you know, it may not be a 0:15:27.920 --> 0:15:30.160 three hour conversation, but we can get you to practice 0:15:30.160 --> 0:15:32.440 a multi-turn, you know, ten minute conversation and it's 0:15:32.480 --> 0:15:34.680 pretty good. So that's what we're doing with 0:15:34.680 --> 0:15:38.680 Duolingo. We're using it to help you learn conversational 0:15:38.720 --> 0:15:41.000 skills a lot better, and that's helping out quite a bit. 0:15:41.840 --> 0:15:44.320 There are so many questions I have, and I, you know, 0:15:44.880 --> 0:15:46.920 I think my mom will really like this episode because, 0:15:46.960 --> 0:15:50.320 in addition to the Guatemala connection, she is a linguist. 
0:15:50.520 --> 0:15:54.440 She speaks like seven languages, including Spanish, and like basically 0:15:55.240 --> 0:15:57.080 you know all the others, not all the others, but 0:15:57.680 --> 0:16:01.040 all the others, many many others. But you know something 0:16:01.080 --> 0:16:03.600 that I was curious about, and maybe this is a 0:16:03.640 --> 0:16:05.600 little bit of random jumping point, you know. I think 0:16:05.640 --> 0:16:09.480 about like chess computers, and originally they were sort of 0:16:09.520 --> 0:16:12.680 trained on a corpus of famous chess games, and then 0:16:12.720 --> 0:16:13.240 with some. 0:16:13.120 --> 0:16:14.120 Computer they got better. 0:16:14.120 --> 0:16:18.720 And then the new generation essentially relearned chess from just 0:16:18.800 --> 0:16:21.640 the rules, from first principles, and it turns out that 0:16:21.640 --> 0:16:24.560 they're way better. And I'm wondering, if you're learning through 0:16:24.560 --> 0:16:26.520 the process of building out Duolingo improvement, like, 0:16:27.160 --> 0:16:30.960 are there forms of pedagogy in language learning, whether 0:16:31.040 --> 0:16:33.960 it's the need for immersion or the need for rote drills, 0:16:34.040 --> 0:16:37.640 or certain things that linguists have always thought were necessary 0:16:37.640 --> 0:16:41.880 components of good language learning, that when rebuilding education from 0:16:41.920 --> 0:16:46.240 the ground up, like old dictums just turn out to 0:16:46.240 --> 0:16:49.000 be completely wrong? And when you rebuild the process from 0:16:49.040 --> 0:16:52.600 the beginning, like novel forms of pedagogy emerge? 0:16:53.160 --> 0:16:56.240 It's a great question, and it's a hard question to 0:16:56.280 --> 0:16:59.840 answer for the following reason, at least for us: we 0:17:00.200 --> 0:17:04.760 teach a language from an app. 
Historically, the way people 0:17:04.840 --> 0:17:08.280 learn languages is basically by practicing with another human or 0:17:08.400 --> 0:17:10.840 being in a classroom or whatever. Whereas we teach from 0:17:10.880 --> 0:17:14.240 an app, the setting is just very different for one 0:17:14.680 --> 0:17:18.600 key reason, which is that it is so easy to 0:17:18.720 --> 0:17:21.879 leave the app, whereas leaving a classroom, it's just not 0:17:21.920 --> 0:17:23.720 that easy. You kind of have to go. You're usually 0:17:23.720 --> 0:17:25.800 forced by your parents to go to a classroom, and like, 0:17:26.119 --> 0:17:29.760 you know, so generally, the thing about learning something by 0:17:29.800 --> 0:17:33.240 yourself, when you're just learning it through a computer, is 0:17:33.280 --> 0:17:37.439 that the hardest thing is motivation. It turns out that 0:17:37.320 --> 0:17:41.040 the pedagogy is important, of course it is, but much 0:17:41.119 --> 0:17:44.359 like exercising, what matters the most is that you're actually 0:17:44.440 --> 0:17:46.720 motivated to do it every day. So like, is the 0:17:46.760 --> 0:17:51.560 elliptical better than the step climber or better than the treadmill? Like, yeah, 0:17:51.600 --> 0:17:55.000 there are probably differences, but the reality is what's most important 0:17:55.040 --> 0:17:57.280 is that you kind of do it often. And so 0:17:57.760 --> 0:17:59.760 what we have found with Duolingo is that if 0:17:59.800 --> 0:18:01.960 we're going to teach it with an app, there are 0:18:01.960 --> 0:18:05.480 a lot of things that historically, you know, language teachers 0:18:05.640 --> 0:18:09.920 or linguists didn't think were the best ways to teach languages, 0:18:10.000 --> 0:18:11.359 but if you're going to do it with an app, 0:18:11.359 --> 0:18:13.960 you have to make it engaging. 
And we've had to 0:18:13.960 --> 0:18:16.320 do it that way, and we have found that we 0:18:16.359 --> 0:18:20.320 can do some things significantly better than human teachers, and 0:18:20.359 --> 0:18:23.560 some things not as good, because it's a very different system. 0:18:23.640 --> 0:18:26.040 But again, the most important thing is just to keep 0:18:26.080 --> 0:18:29.480 you motivated. So examples of things that we've had to 0:18:29.480 --> 0:18:32.320 do to keep people motivated are quote unquote classes, which 0:18:32.359 --> 0:18:35.000 is a lesson on Duolingo. They're not thirty minutes 0:18:35.080 --> 0:18:37.280 or forty five minutes, they're two and a half minutes. 0:18:38.119 --> 0:18:41.960 If they're any longer, we start losing people's attention. So 0:18:42.000 --> 0:18:44.359 stuff like that I think has been really important. Now 0:18:44.400 --> 0:18:47.440 I'll say, related to your question, one thing that has 0:18:47.440 --> 0:18:50.160 been amazing is that, you know, we start out with 0:18:50.840 --> 0:18:53.720 language experts, you know, people with PhDs in second 0:18:53.760 --> 0:18:56.200 language acquisition, who tell us how to best teach something. 0:18:56.240 --> 0:18:58.280 But then it takes it from there and the computer 0:18:58.359 --> 0:19:01.760 optimizes it, and so the computer starts finding different ways. 0:19:01.800 --> 0:19:05.399 There are different orderings of things that are actually better 0:19:05.880 --> 0:19:09.520 than what the people with PhDs in second language acquisition thought. 0:19:09.600 --> 0:19:12.040 But it's because they just didn't have the data to 0:19:12.119 --> 0:19:14.240 optimize this, whereas now, you know, at Duolingo, 0:19:14.320 --> 0:19:17.239 we have, it's something like one billion exercises. 
It's one 0:19:17.280 --> 0:19:20.600 billion exercises that are solved every day by people using Duolingo, 0:19:21.119 --> 0:19:22.840 and that just has a lot of data that helps 0:19:22.880 --> 0:19:23.480 us teach better. 0:19:23.880 --> 0:19:26.280 This is exactly what I wanted to ask you, which 0:19:26.320 --> 0:19:30.119 is how iterative is this technology? So how much is 0:19:30.119 --> 0:19:33.320 it about the AI model sort of developing off the 0:19:33.400 --> 0:19:36.199 data that you feed it, and then the AI model 0:19:36.480 --> 0:19:41.600 improving the outcome for users and thereby generating more data 0:19:41.680 --> 0:19:42.600 from which it can train? 0:19:43.000 --> 0:19:47.080 It's exactly, we're exactly doing that, and in particular, one 0:19:47.119 --> 0:19:49.480 of the things that we've been able to optimize a 0:19:49.520 --> 0:19:53.000 lot is which exercise we give to which person. So 0:19:53.040 --> 0:19:54.840 when you start a lesson on Duolingo, you 0:19:54.880 --> 0:19:56.800 may think that all lessons are the same for everybody. 0:19:56.840 --> 0:20:00.119 They're absolutely not. When you use Duolingo, we 0:20:00.200 --> 0:20:04.040 watch what you do, and you know, the computer makes 0:20:04.040 --> 0:20:06.680 a model of you as a student, so it sees 0:20:06.760 --> 0:20:08.879 everything you get right, everything you get wrong, and based 0:20:08.880 --> 0:20:11.080 on that, it starts realizing you're not very good at 0:20:11.080 --> 0:20:14.000 the past tense, or you're not very good at the 0:20:14.000 --> 0:20:16.639 future tense or whatever. And whenever you start a lesson, 0:20:17.160 --> 0:20:19.560 it uses that model specifically for you, and it knows 0:20:19.560 --> 0:20:21.119 that you're not very good at the past tense, so 0:20:21.119 --> 0:20:24.080 it may give you more past tense, or it does 0:20:24.119 --> 0:20:26.560 stuff like that. And that definitely gets better with more 0:20:26.560 --> 0:20:28.600 and more data. 
And I'll say another thing that is 0:20:28.640 --> 0:20:31.240 really important. If we were to give you a lesson 0:20:32.000 --> 0:20:35.280 only with the things that you're not good at, that 0:20:35.320 --> 0:20:38.560 would be a horrible lesson because that would be extremely frustrating. 0:20:38.600 --> 0:20:40.239