WEBVTT - This Is How to Tell if Writing Was Made by AI 0:00:02.720 --> 0:00:16.360 Bloomberg Audio Studios, Podcasts, Radio News. 0:00:18.480 --> 0:00:21.840 Hello and welcome to another episode of the Odd Lots podcast. 0:00:21.920 --> 0:00:24.119 I'm Joe Weisenthal. And I'm Tracy Alloway. 0:00:24.360 --> 0:00:27.280 So, Tracy, you know, do you ever come across some writing where 0:00:28.160 --> 0:00:31.720 you can't articulate exactly why, but you're like, I'm pretty 0:00:31.760 --> 0:00:32.720 sure AI wrote this? 0:00:33.120 --> 0:00:34.160 Does this happen to you much? 0:00:34.280 --> 0:00:38.600 So, full disclosure, I haven't really thought about it that much. Yeah, 0:00:38.640 --> 0:00:41.640 because the thing is, I probably should think about it more, 0:00:41.960 --> 0:00:43.960 but there's a lot of bad writing out there, and 0:00:44.000 --> 0:00:46.440 I've become sort of numb to it. And I 0:00:46.479 --> 0:00:49.680 also think that, I don't know, trying to figure out 0:00:49.680 --> 0:00:53.880 whether or not something was generated by AI nowadays, if 0:00:53.880 --> 0:00:55.960 you actually dedicate a lot of your own time to 0:00:56.120 --> 0:01:00.880 doing that, that is a huge mental burden to take on. 0:01:01.000 --> 0:01:03.480 Especially since you and I are in the journalism industry. How 0:01:03.560 --> 0:01:05.760 many of the pitches do you think that we get 0:01:05.800 --> 0:01:08.440 from PRs right now are being generated by AI? 0:01:08.760 --> 0:01:11.520 Imagine if you're reading each one of those and trying 0:01:11.560 --> 0:01:13.440 to figure it out on a daily basis. 0:01:13.560 --> 0:01:15.360 You know when I suppose I think about it the 0:01:15.360 --> 0:01:18.200 most? Someone will respond to a tweet, yeah, and 0:01:18.240 --> 0:01:19.800 I'll be like, well, if this is a real person, 0:01:19.800 --> 0:01:22.319 then maybe this person deserves some engagement, and if they ask a 0:01:22.400 --> 0:01:24.560 question, I want to respond.
But if it's a 0:01:24.560 --> 0:01:26.880 bot, then obviously I don't. And that's 0:01:26.880 --> 0:01:28.399 where I'm like, you know what, I want to figure 0:01:28.400 --> 0:01:30.800 it out. I would like to know the answer. 0:01:30.959 --> 0:01:31.160 You know, 0:01:31.200 --> 0:01:34.200 I have a controversial view about AI writing, by the way, 0:01:34.240 --> 0:01:36.640 which is that it's pretty good. I mean, like, by 0:01:36.680 --> 0:01:38.920 and large, and I said this, I think, maybe in 0:01:38.920 --> 0:01:41.640 a recent episode. When you consider the fact that, I 0:01:41.680 --> 0:01:44.679 don't know, the majority of the population doesn't know 0:01:44.680 --> 0:01:47.600 where to put a comma within a sentence. Well, this 0:01:47.680 --> 0:01:48.120 is my point. 0:01:48.320 --> 0:01:48.960 It's pretty good. 0:01:48.960 --> 0:01:49.400 I mean, yeah. 0:01:49.400 --> 0:01:51.160 One thing I'll say about AI is it never gets 0:01:51.240 --> 0:01:52.480 the placement of a comma wrong. 0:01:52.840 --> 0:01:54.160 On some level, it's perfect. 0:01:54.320 --> 0:01:56.000 Did you do that test? I think it was in the 0:01:56.000 --> 0:01:57.560 New York Times. 0:01:57.600 --> 0:01:58.360 I kind of hated that. 0:01:58.560 --> 0:02:01.200 Okay, why? Well, because, I'll tell you, first of all, 0:02:01.160 --> 0:02:02.240 it's five examples. 0:02:02.280 --> 0:02:04.520 There's not very many. Two, it asked the reader, which 0:02:04.520 --> 0:02:05.080 do you prefer? 0:02:05.160 --> 0:02:07.120 But I think they were different subjects as well. 0:02:07.200 --> 0:02:07.440 Yeah. 0:02:07.600 --> 0:02:10.080 Also, I think most people probably treated that as: can 0:02:10.120 --> 0:02:11.959 you guess which one is the human? Because everyone wants 0:02:12.000 --> 0:02:14.320 to say they prefer the human. I didn't think it 0:02:14.400 --> 0:02:18.400 was, like, a great test.
Nonetheless, look, not only is 0:02:18.400 --> 0:02:22.360 it often indistinguishable, it's often fine writing. 0:02:22.840 --> 0:02:25.359 Sometimes AI can come up with a really remarkable turn 0:02:25.400 --> 0:02:27.760 of phrase. Yeah, but I still, by and large, don't 0:02:27.840 --> 0:02:30.000 like it. You read, like, a thing, especially a long 0:02:30.040 --> 0:02:32.760 text that's AI, and it's like, even if you can't articulate 0:02:32.360 --> 0:02:33.880 it, it's like, this feels AI. 0:02:33.960 --> 0:02:36.640 It has a certain sickly sweetness to it that is 0:02:36.680 --> 0:02:37.320 often annoying. 0:02:37.320 --> 0:02:38.160 It's annoying. 0:02:38.400 --> 0:02:41.000 What I notice about it is it doesn't do style 0:02:41.200 --> 0:02:43.240 very well, right? So if you ask it to write 0:02:43.240 --> 0:02:45.840 something in the style of a writer, if you choose 0:02:45.880 --> 0:02:49.240 anything other than something really obvious like Shakespeare, it really 0:02:49.480 --> 0:02:53.120 suffers. But the text that it actually outputs is 0:02:53.160 --> 0:02:58.519 pretty clear. Yeah, right, like, for basic understanding. Totally. It's 0:02:58.639 --> 0:03:01.440 probably better than a lot of what's on the internet. 0:03:01.760 --> 0:03:03.200 The real people who are going to have to worry 0:03:03.240 --> 0:03:07.840 about this are, like, teachers, obviously, universities, and lawyers, student 0:03:07.919 --> 0:03:11.040 lawyers, and maybe... It's fun, but there are times when 0:03:11.080 --> 0:03:12.800 it's like, okay, did someone write this or not? 0:03:13.000 --> 0:03:14.920 And it'd be nice if we 0:03:14.960 --> 0:03:16.120 could know the answer.
0:03:16.320 --> 0:03:19.280 Well, the other thing that's starting to happen is, have 0:03:19.400 --> 0:03:21.840 you seen any books out there that actually come with 0:03:21.960 --> 0:03:25.240 a disclosure or disclaimer that says this book has been 0:03:25.280 --> 0:03:26.760 written only by humans? 0:03:26.800 --> 0:03:26.880 No? 0:03:27.000 --> 0:03:28.079 No AI used at all. 0:03:28.120 --> 0:03:29.720 I saw that for the first time on a book 0:03:29.760 --> 0:03:32.000 that we actually read for an Odd Lots episode. I 0:03:32.000 --> 0:03:33.600 don't think it's come out yet, but that kind of 0:03:33.639 --> 0:03:33.960 threw me. 0:03:34.320 --> 0:03:34.519 Yeah. 0:03:34.639 --> 0:03:37.480 No, it's more and more. Anyway, as we enter a 0:03:37.520 --> 0:03:40.400 world in which the vast majority, if not already, of 0:03:40.480 --> 0:03:43.120 words written are written by AI, people are going to be 0:03:43.200 --> 0:03:45.760 interested in this question of whether we can know. Anyway, there's 0:03:45.800 --> 0:03:48.520 this company called Pangram Labs, and they have a little 0:03:48.560 --> 0:03:50.440 thing, and you can pay for it, but there's also a 0:03:50.440 --> 0:03:52.600 free service where you can drop, like, a text in 0:03:53.320 --> 0:03:55.320 and it'll tell you the odds that it was written by a human 0:03:55.440 --> 0:03:58.320 or AI. And I'm pretty impressed by it. I, like, 0:03:58.360 --> 0:04:01.320 did some samples of my own writing and then AI 0:04:01.440 --> 0:04:03.560 outputs, and it got them all right. But then I did 0:04:03.560 --> 0:04:05.680 something further, like, I tried to stump it to 0:04:05.720 --> 0:04:07.720 see if, like... So, what I did was I took 0:04:07.760 --> 0:04:10.280 a piece of AI writing and then I had it 0:04:10.320 --> 0:04:13.600 translated into Chinese, okay, and then I had it translate 0:04:13.640 --> 0:04:16.400 that into High Chinese, so it's like, okay, imagine this 0:04:16.480 --> 0:04:19.160 is being written in a more formal register.
And then 0:04:19.200 --> 0:04:21.920 I had that translated into Hebrew, and then I had 0:04:21.960 --> 0:04:24.960 that translated into English. So I put the original thing through this 0:04:25.080 --> 0:04:27.920 series of AI telephone, through various translations, and then I 0:04:27.960 --> 0:04:30.240 put that output back into Pangram. 0:04:30.360 --> 0:04:31.640 It got that right. It said it was AI. 0:04:31.720 --> 0:04:35.240 So even after a series of transformations designed 0:04:35.279 --> 0:04:39.280 to obfuscate the original style of the piece, to see 0:04:39.320 --> 0:04:41.600 if, you know, eventually it would emerge as something else, it still caught it. 0:04:41.839 --> 0:04:44.160 So I was pretty impressed. It seems to work. And, 0:04:44.240 --> 0:04:46.400 you know, I think that's interesting for a couple of reasons. 0:04:46.400 --> 0:04:49.320 One is maybe there is something that you can just tell. 0:04:49.680 --> 0:04:52.120 But two, it sort of worries me because, you know, 0:04:52.320 --> 0:04:54.480 there have been articles where they'll say, like, this was 0:04:54.480 --> 0:04:56.360 written by AI. And I think one of my big 0:04:56.360 --> 0:04:58.240 fears would be that I write something. 0:04:58.600 --> 0:04:59.760 I like to use an em dash. 0:05:00.000 --> 0:05:02.520 I've always been an em dash fan. I love em dashes. 0:05:02.600 --> 0:05:03.520 That's how people talk. 0:05:03.640 --> 0:05:04.200 I'm sorry. 0:05:04.400 --> 0:05:06.400 And then what if it says this was written by AI, 0:05:06.640 --> 0:05:08.560 and I'm like, I didn't? And then here's this black 0:05:08.600 --> 0:05:11.680 box that is suddenly, like, judge, jury, and executioner for my 0:05:12.279 --> 0:05:15.880 career, potentially. Who wrote this? AI, the lab says, so 0:05:16.440 --> 0:05:18.640 you are now done? Like, that worries me.
So I 0:05:18.640 --> 0:05:21.680 think this raises a lot of very interesting questions about 0:05:21.680 --> 0:05:23.960 these little detection models, and I want to learn 0:05:23.960 --> 0:05:24.640 more about how well they work. 0:05:24.640 --> 0:05:27.440 There are also a lot of philosophical questions about just what 0:05:27.480 --> 0:05:30.919 we value in writing as well. True. Because no one's 0:05:30.960 --> 0:05:33.760 going to yell at you for using spell check or 0:05:33.800 --> 0:05:36.039 something like that, right? Like, it's kind of crazy to 0:05:36.040 --> 0:05:39.000 think that reputational risk is going to hinge on whether 0:05:39.120 --> 0:05:41.640 or not you might have used a platform, a chat 0:05:41.680 --> 0:05:44.760 platform, to, like, do some basic copy editing. 0:05:45.000 --> 0:05:47.320 Totally. Well, very happy to say, we do, in fact, 0:05:47.360 --> 0:05:48.160 have the perfect guest. 0:05:48.440 --> 0:05:50.120 We're going to be speaking with Max Spero. 0:05:50.240 --> 0:05:52.880 He is the founder and CEO of Pangram Labs, and 0:05:52.880 --> 0:05:54.720 he can answer all of our questions. So, Max, thank 0:05:54.720 --> 0:05:55.600 you so much for coming on 0:05:55.560 --> 0:05:56.919 Odd Lots. Thanks for having me. 0:05:57.160 --> 0:05:58.120 How do you know it's right? 0:05:58.279 --> 0:06:00.600 So someone puts in a piece of text, and we'll 0:06:00.600 --> 0:06:02.440 get into the method in a second. But someone puts 0:06:02.440 --> 0:06:05.440 in a piece of text and it says human or AI. 0:06:06.320 --> 0:06:08.719 What makes you believe that you have a very good 0:06:08.560 --> 0:06:09.760 track record? Oh, this question. 0:06:09.960 --> 0:06:12.520 So when we started Pangram, we started by doing this 0:06:12.560 --> 0:06:15.840 thing we call a human baseline, which is: how well 0:06:16.120 --> 0:06:19.680 can we, as humans, predict whether something's AI or not?
0:06:19.960 --> 0:06:23.039 That's the first step of, like, learning: is this problem tractable? 0:06:23.440 --> 0:06:25.800 How hard or easy is it? And I found, like, 0:06:26.120 --> 0:06:29.240 me personally, I was able to get about ninety percent accuracy, 0:06:29.720 --> 0:06:32.680 and so we figured an AI model should be able 0:06:32.720 --> 0:06:33.279 to do much 0:06:33.120 --> 0:06:33.599 better than that. 0:06:33.920 --> 0:06:37.359 So I have a bunch of methodology questions, which we 0:06:37.400 --> 0:06:40.440 can get into. But just before we get into any 0:06:40.440 --> 0:06:44.240 of that, why is AI slop bad, in your opinion? 0:06:44.279 --> 0:06:46.480 Why does it need to be tracked and identified? 0:06:46.760 --> 0:06:48.680 I think the problem is it's just so easy to 0:06:48.760 --> 0:06:51.720 generate, and so it's very difficult to know, like, 0:06:52.240 --> 0:06:56.080 what is the intent behind it? Basically, like, right now, 0:06:56.360 --> 0:06:58.560 I think we're actually pretty lucky. We live in 0:06:58.640 --> 0:07:02.039 a world where the signal-to-noise ratio on the Internet 0:07:02.040 --> 0:07:03.279 and in our information 0:07:02.920 --> 0:07:03.920 channels is pretty high. 0:07:04.040 --> 0:07:06.839 We have pretty high signal to noise. But any bad 0:07:06.839 --> 0:07:10.520 actor can come in and just flood our information channels 0:07:10.560 --> 0:07:15.000 with AI slop that looks legitimate. It looks like somebody put 0:07:15.040 --> 0:07:18.760 actual effort and thought into it, but really it was 0:07:18.880 --> 0:07:21.440 just, like, a single prompt, which could have also been automated.
0:07:21.600 --> 0:07:23.679 This is something that I think about a lot, which 0:07:23.720 --> 0:07:26.239 is that there was a point in time, and maybe 0:07:26.280 --> 0:07:28.960 it still is that point in time, where if you read 0:07:29.000 --> 0:07:33.120 something that was grammatically correct, where the punctuation was strong, 0:07:33.400 --> 0:07:36.640 where the spelling was strong, there was reason to think 0:07:36.680 --> 0:07:39.400 that the person who wrote it was a person of 0:07:39.560 --> 0:07:43.240 a certain seriousness and a certain intelligence. 0:07:43.560 --> 0:07:45.640 And I think that the issue that you're 0:07:45.520 --> 0:07:48.600 identifying is that that link is now being severed, so 0:07:48.640 --> 0:07:51.800 that we can't use these heuristics anymore, such as the 0:07:51.840 --> 0:07:55.640 strict quality of the prose, to know, in fact, whether 0:07:55.920 --> 0:07:59.000 this was published by someone who was, like, a serious, 0:07:59.200 --> 0:08:00.320 intelligent actor or not. 0:08:00.480 --> 0:08:03.600 And now you have people inserting typos into their copy. 0:08:04.000 --> 0:08:06.680 That's true, they are. Yeah. 0:08:06.680 --> 0:08:09.840 Sorry, just to go back to my original question. So 0:08:09.880 --> 0:08:12.480 you mentioned, okay, you're able to get it ninety percent right, 0:08:12.480 --> 0:08:14.320 but now it's being used a lot more, and you 0:08:14.320 --> 0:08:19.040 have people paying for your software, presumably teachers and journalists, etc. 0:08:20.160 --> 0:08:23.280 Given all of that, getting from ninety percent to one hundred... 0:08:23.320 --> 0:08:25.160 I mean, if you get one out of ten wrong, 0:08:25.200 --> 0:08:28.240 that's clearly an unacceptable error rate for a piece of 0:08:28.240 --> 0:08:31.640 commercial software that could accuse someone of using AI. So 0:08:31.680 --> 0:08:33.360 you have to do a lot better than ninety percent.
0:08:33.800 --> 0:08:36.360 Talk to us about, like, what you've seen so far 0:08:36.559 --> 0:08:39.920 in your data, since releasing it as commercial software, that 0:08:40.040 --> 0:08:43.600 makes you believe the software is doing a correct job 0:08:43.679 --> 0:08:45.720 of allocating between the two categories. 0:08:45.760 --> 0:08:49.679 So we've built out really comprehensive evals, okay. And so, 0:08:49.880 --> 0:08:54.240 in our evaluations, there are two kinds of errors. There's a false positive, 0:08:54.520 --> 0:08:56.920 which is when something is written by a human and 0:08:56.960 --> 0:08:58.720 then we say that it's written by an AI, okay. 0:08:58.760 --> 0:09:00.839 And there's a false negative, which is if it was 0:09:00.920 --> 0:09:03.840 AI-written and we don't catch it. And so we 0:09:04.040 --> 0:09:07.839 track our numbers for both of these, and for human 0:09:07.559 --> 0:09:09.079 writing, we're actually pretty fortunate. 0:09:09.240 --> 0:09:11.080 We have, like, millions and millions of samples, so we 0:09:11.120 --> 0:09:13.640 can get a false positive number that we have 0:09:13.679 --> 0:09:16.080 a very high degree of confidence in. And our number 0:09:16.160 --> 0:09:19.080 right now is about one in ten thousand. Okay. So 0:09:19.160 --> 0:09:22.760 if we scan ten thousand documents, on average one will 0:09:22.800 --> 0:09:23.480 come back as 0:09:23.840 --> 0:09:25.240 AI when it was actually human. 0:09:25.440 --> 0:09:27.319 And what about in the other direction, false negatives? 0:09:27.720 --> 0:09:31.760 I would say around ninety-nine percent accuracy, so, like, 0:09:32.120 --> 0:09:35.080 around a one percent false negative rate. I think this depends 0:09:35.080 --> 0:09:38.440 a little bit more on, like, how adversarial the prompting is, 0:09:38.640 --> 0:09:40.720 how much they're trying to evade. 0:09:40.720 --> 0:09:44.280 What I did, exactly: sending it through multiple translations to 0:09:44.360 --> 0:09:47.600 obfuscate the original output.
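The error rates quoted above can be turned into expected counts with simple arithmetic. The sketch below is illustrative only. It plugs in the stated false positive rate of about one in ten thousand and the false negative rate of about one percent. The function names and the share of AI-written documents in a batch (`share_ai`) are hypothetical inputs, not Pangram figures or methodology. The second function applies Bayes' rule to ask how often a document flagged as AI was actually written by a human. That answer depends heavily on how common AI text is in the batch.

```python
# Illustrative arithmetic only. The two rates are the figures quoted in the
# interview; function names, batch sizes, and share_ai are hypothetical inputs.

FALSE_POSITIVE_RATE = 1 / 10_000  # human text wrongly flagged as AI
FALSE_NEGATIVE_RATE = 0.01        # AI text that is not caught


def expected_errors(n_human: int, n_ai: int,
                    fpr: float = FALSE_POSITIVE_RATE,
                    fnr: float = FALSE_NEGATIVE_RATE) -> tuple[float, float]:
    """Return (expected false positives, expected false negatives) for a batch."""
    false_positives = n_human * fpr  # human documents flagged as AI
    false_negatives = n_ai * fnr     # AI documents that slip through
    return false_positives, false_negatives


def prob_human_given_flag(share_ai: float,
                          fpr: float = FALSE_POSITIVE_RATE,
                          fnr: float = FALSE_NEGATIVE_RATE) -> float:
    """Bayes' rule: P(document is human | detector flags it as AI)."""
    true_positive_rate = 1.0 - fnr
    p_flag = share_ai * true_positive_rate + (1.0 - share_ai) * fpr
    return (1.0 - share_ai) * fpr / p_flag


if __name__ == "__main__":
    fp, fn = expected_errors(n_human=10_000, n_ai=10_000)
    print(f"10,000 human + 10,000 AI docs: ~{fp:.0f} false positive, ~{fn:.0f} false negatives")

    fp_large, _ = expected_errors(n_human=100_000_000, n_ai=0)
    print(f"100 million human docs: ~{fp_large:,.0f} false positives")

    for share_ai in (0.5, 0.1, 0.01):
        p = prob_human_given_flag(share_ai)
        print(f"share_ai={share_ai}: P(human | flagged) ~ {p:.2%}")
```

Under these assumptions, scanning one hundred million human-written documents would produce roughly ten thousand false positives. If only 1% of a batch is AI-written, roughly one flag in a hundred would land on a human writer.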
That would be an example of 0:09:47.640 --> 0:09:49.240 adversarial prompting? Exactly. 0:09:49.480 --> 0:09:52.079 But in, like, the general case, where we're just looking 0:09:52.120 --> 0:09:55.880 at straight outputs from AI, it's above ninety-nine percent. 0:09:55.960 --> 0:09:59.000 Okay, okay. So what is your model looking for, exactly, 0:09:59.040 --> 0:10:02.120 when it's evaluating a text? Because, as we mentioned in 0:10:02.160 --> 0:10:05.560 the intro, you know, syntax and grammar tend to be 0:10:05.679 --> 0:10:10.599 pretty good in AI-generated copy. The style is sometimes 0:10:10.640 --> 0:10:14.760 more of an identifier, I would argue, to your point, Joe. Like, 0:10:14.960 --> 0:10:19.320 sometimes it reads very saccharine and kind of overly earnest 0:10:19.640 --> 0:10:22.280 in some ways. So what exactly are you focusing on here? 0:10:22.280 --> 0:10:23.000 What are the tells? 0:10:23.200 --> 0:10:26.120 Yeah, so the style and the word choices are definitely 0:10:26.200 --> 0:10:27.760 part of it. But I think what a lot of 0:10:27.760 --> 0:10:30.200 people don't realize is they're actually making a lot of 0:10:30.559 --> 0:10:33.720 decisions when they write a piece of text. So there are, 0:10:33.840 --> 0:10:36.800 you know, dozens or hundreds of ways to phrase every 0:10:36.840 --> 0:10:39.680 single phrase, and over the course of fifty or one 0:10:39.720 --> 0:10:43.240 hundred or two hundred words, you're actually making thousands of decisions. 0:10:43.679 --> 0:10:46.400 And so what we're doing is we're learning the patterns 0:10:46.400 --> 0:10:49.880 in how these frontier models make these decisions. And 0:10:49.960 --> 0:10:53.000 if the vast majority of these decisions line up with 0:10:53.040 --> 0:10:56.160 how the frontier models are doing it, then it's vanishingly 0:10:56.240 --> 0:10:58.600 unlikely that this was written by a human. You would
You would 0:10:58.640 --> 0:11:01.240 have to just happen to make the same exact decisions 0:11:01.240 --> 0:11:03.240 that the LM does hundreds of times. 0:11:03.280 --> 0:11:04.280 Interesting, Okay, this. 0:11:04.320 --> 0:11:05.480 Is a really important point. 0:11:05.559 --> 0:11:08.200 So everyone at this point has some feel for let 0:11:08.280 --> 0:11:11.400 go the M dash tell right, But my understanding is 0:11:11.440 --> 0:11:13.640 it's not like you don't go in in like hard 0:11:13.679 --> 0:11:15.960 code if you see a bunch of M dashes. This 0:11:16.080 --> 0:11:19.920 is the thing these decisions. In many cases, I imagine, 0:11:19.960 --> 0:11:24.840 neither you nor the model itself can articulate in English 0:11:25.080 --> 0:11:27.720 what the decisions are. All you know is that the 0:11:27.760 --> 0:11:29.160 decision pattern exists. 0:11:29.240 --> 0:11:29.880 Is this correct? 0:11:30.000 --> 0:11:30.679 This is correct? 0:11:30.720 --> 0:11:31.840 Okay? Can you explain? 0:11:32.000 --> 0:11:35.120 So therefore, what does it mean that your model has 0:11:35.280 --> 0:11:37.079 learned these decision? 0:11:37.480 --> 0:11:39.920 So what we're doing on the very broad scale is 0:11:40.080 --> 0:11:42.920 we're training a deep learning model. So it's a pretty 0:11:42.920 --> 0:11:46.400 big black box, but it has the base model of 0:11:47.040 --> 0:11:50.040 a language model, and then instead of predicting the next token, 0:11:50.520 --> 0:11:53.880 it's predicting whether it the text is AI or not. Okay, 0:11:53.960 --> 0:11:56.800 And how we train it is we train on tens 0:11:56.840 --> 0:11:59.960 of millions of examples, so it sees millions and milli 0:12:00.160 --> 0:12:02.959 of human examples, and for each human example, we also 0:12:03.000 --> 0:12:05.920 show it an AI example. So, for example, let's say 0:12:05.920 --> 0:12:09.000 one of these is a five star review for Denny's 0:12:09.200 --> 0:12:11.959 that's seventy eight words long. 
Then we'll ask an AI 0:12:12.200 --> 0:12:14.120 to write a five-star review about Denny's that's seventy 0:12:14.120 --> 0:12:16.240 eight words long, in the style of the first one. 0:12:16.440 --> 0:12:18.840 And obviously these two will be different, and so our 0:12:18.880 --> 0:12:22.080 model is able to learn, through contrast, what the 0:12:22.080 --> 0:12:23.000 difference is between 0:12:22.720 --> 0:12:24.840 them. And the important thing, sorry, just to be clear here, 0:12:25.000 --> 0:12:26.960 is that you and I might not be able to 0:12:27.040 --> 0:12:30.439 articulate the difference. There will be some difference in, maybe, 0:12:30.520 --> 0:12:33.240 the sentence length, there will be some difference in word choice, 0:12:33.240 --> 0:12:36.480 there'll be some difference in punctuation, syntax, whatever, but you 0:12:36.600 --> 0:12:40.240 and I wouldn't obviously spot it. However, after millions of 0:12:40.280 --> 0:12:43.640 examples of these side-by-sides, the model learns what 0:12:43.679 --> 0:12:44.640 the difference is. Exactly. 0:12:44.720 --> 0:12:46.560 I think the best that a human can do is 0:12:46.720 --> 0:12:49.800 look for some of these, like, really obvious tells. Like, ChatGPT 0:12:49.880 --> 0:12:53.440 loves that "it's not just X, it's Y" framing. 0:12:53.800 --> 0:12:57.240 Earlier models really liked some specific words, like tapestry and 0:12:57.320 --> 0:12:58.760 intricate and delve. 0:12:58.840 --> 0:13:00.360 Yeah, delve, tapestry. Yeah. 0:13:00.400 --> 0:13:00.960 But yeah. 0:13:01.000 --> 0:13:03.079 I think by training Pangram, we're able to go much 0:13:03.120 --> 0:13:05.640 deeper than this and look past the high-level 0:13:05.640 --> 0:13:08.120 signs, at, like, the document-level signs.
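A deliberately crude toy model can illustrate the claim that hundreds of matching decisions make human authorship "vanishingly unlikely". The sketch below assumes that every wording decision is independent and agrees with a model's preferred choice with a fixed probability `p_match`. Real decisions are correlated, and Pangram's detector is a trained neural classifier, not this formula. The decision counts and agreement rates are hypothetical.

```python
from math import comb


def prob_at_least_matches(n_decisions: int, k_matches: int, p_match: float) -> float:
    """Return P(X >= k_matches) where X ~ Binomial(n_decisions, p_match).

    Toy assumption: each wording decision independently agrees with the
    model's preferred choice with probability p_match.
    """
    return sum(
        comb(n_decisions, k) * p_match**k * (1.0 - p_match) ** (n_decisions - k)
        for k in range(k_matches, n_decisions + 1)
    )


if __name__ == "__main__":
    n, k = 200, 190  # hypothetical: 190 of 200 choices match the model
    typical = prob_at_least_matches(n, k, p_match=0.5)  # coin-flip agreement
    modal = prob_at_least_matches(n, k, p_match=0.9)    # habitually "average" phrasing
    print(f"typical writer: {typical:.1e}")
    print(f"mode-seeking writer: {modal:.1e}")
```

Under these toy assumptions, consider a writer who agrees with the model's choices about half the time. That writer's chance of matching 190 of 200 decisions is on the order of 10^-44. A writer who habitually reaches for the most common phrasing (`p_match = 0.9`) gets there on the order of 1% of the time. That second case is the kind of human text that can surface as a false positive.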
0:13:23.960 --> 0:13:26.080 So one thing this kind of reminds me of, and 0:13:26.120 --> 0:13:28.559 I'm thinking about how to phrase this, but it reminds me 0:13:28.600 --> 0:13:31.800 of, you know, those exercises people used to do where 0:13:31.800 --> 0:13:34.000 you would take a bunch of different faces and meld 0:13:34.040 --> 0:13:37.200 them all together and come up with, like, one face 0:13:37.320 --> 0:13:41.120 that was supposedly attractive. So, like, to what extent is 0:13:41.160 --> 0:13:46.560 this basically a distributional detector, in the sense that you're 0:13:46.600 --> 0:13:50.960 looking for, like, certain paths that you think AI would choose? 0:13:51.800 --> 0:13:54.239 And I guess, like, could you get a false positive 0:13:54.840 --> 0:13:57.440 just from someone who's choosing, like, the average of the 0:13:57.480 --> 0:14:00.320 average of the average way to state a 0:14:00.320 --> 0:14:01.200 particular sentence? 0:14:03.360 --> 0:14:06.400 Maybe. There's a reason our false positive rate 0:14:06.440 --> 0:14:08.840 is one in ten thousand and not zero. It's because, 0:14:09.200 --> 0:14:12.319 you know, sometimes we look at a false positive and 0:14:12.360 --> 0:14:15.559 it's like, oh, it reads exactly like an AI-generated 0:14:15.720 --> 0:14:18.600 review or essay, except that it was written in twenty nineteen. 0:14:18.640 --> 0:14:21.000 So it was probably a human who just happened to 0:14:21.800 --> 0:14:24.840 find the exact, like, mode-collapsed 0:14:24.640 --> 0:14:26.720 type of way that... Like, yeah. That's right. Yeah, I 0:14:26.760 --> 0:14:27.400 would say, yeah. 0:14:27.480 --> 0:14:29.440 I think it's a good way to think about the 0:14:29.480 --> 0:14:32.840 distribution of writing, or writing as a distribution, where, like, 0:14:32.920 --> 0:14:35.520 you know, there's the space of all human writing, and 0:14:35.560 --> 0:14:37.920 then AI writing is really just 0:14:37.920 --> 0:14:39.840 like a small point within this space.
0:14:39.880 --> 0:14:42.360 It's very... No matter how much you prompt it, it 0:14:42.400 --> 0:14:46.160 doesn't go that far from where it was trained to be. 0:14:46.440 --> 0:14:48.120 Yeah. Okay, what's in the black box? 0:14:48.200 --> 0:14:50.520 So I built a little model myself. I built this 0:14:50.560 --> 0:14:53.080 thing where you can upload text and it says whether 0:14:53.120 --> 0:14:56.600 it more closely resembles the written word or the spoken word. 0:14:57.040 --> 0:14:59.600 Oh, I saw that, yeah. Yeah. And I used BERT, 0:14:59.640 --> 0:15:02.480 which is, like, one of these open-source models, the one 0:15:02.480 --> 0:15:02.960 from Google. 0:15:03.000 --> 0:15:04.800 What is the core model that 0:15:04.720 --> 0:15:07.280 you trained on? Or is it something, or did you 0:15:07.320 --> 0:15:08.120 build it yourself? 0:15:08.200 --> 0:15:08.960 Like, talk to us about that. 0:15:09.000 --> 0:15:11.760 Our very first model was actually built on BERT, but 0:15:11.960 --> 0:15:17.360 for future models we needed to up our capacity. So basically 0:15:17.440 --> 0:15:20.480 we were running into capacity limits with our model. It 0:15:20.840 --> 0:15:23.840 was capping out at a certain false positive and false negative rate. 0:15:24.040 --> 0:15:26.600 It wasn't learning the deeper signals, so we had to 0:15:26.800 --> 0:15:28.960 ten-x and then one-hundred-x the parameter count 0:15:29.160 --> 0:15:32.400 so that it can learn, like, really deeply, like, how these 0:15:32.400 --> 0:15:33.400 frontier models... 0:15:33.200 --> 0:15:36.920 Right. Have you noticed any interesting differences between how the 0:15:36.960 --> 0:15:40.760 models write? And actually, is your model trained 0:15:40.800 --> 0:15:44.080 to identify different models, as well as whether or not 0:15:44.120 --> 0:15:46.440 this is just broadly AI-generated? 0:15:46.560 --> 0:15:50.520 So we don't specifically train it on different models.
We 0:15:50.520 --> 0:15:52.720 don't say, like, hey, this one is Claude 3 and 0:15:52.760 --> 0:15:56.400 this one is ChatGPT, or GPT-5. What we have done, 0:15:56.680 --> 0:16:00.040 we've done some interpretability work to look at, basically, the 0:16:00.080 --> 0:16:02.720 output embeddings of the model, and we find that 0:16:02.920 --> 0:16:05.880 it actually learns which model the text came from. So 0:16:05.920 --> 0:16:08.360 you can see, like, little clusters: this is the 0:16:08.440 --> 0:16:11.440 Claude cluster, and, like, all the Claudes, yeah, cluster around here, 0:16:11.440 --> 0:16:13.760 and then these are, like, DeepSeek and Qwen, 0:16:13.840 --> 0:16:15.760 and then this is, like, ChatGPT, and they all 0:16:15.840 --> 0:16:19.680 kind of cluster into different regions of embedding space. 0:16:20.240 --> 0:16:22.640 So clearly the model is able to learn what the 0:16:22.640 --> 0:16:24.320 difference is between these frontier models. 0:16:24.520 --> 0:16:27.480 Actually, since you mentioned Qwen, I'm very interested. Is 0:16:27.480 --> 0:16:31.040 there anything, like, distinct in terms of how Qwen generates 0:16:31.080 --> 0:16:34.600 text versus platforms that have been developed in the US? 0:16:35.120 --> 0:16:37.640 I think Qwen is unique because it's trained on a 0:16:37.680 --> 0:16:40.640 lot more Chinese and multilingual tokens than other models. 0:16:41.360 --> 0:16:44.200 So, you know, I've heard from Chinese friends that it's 0:16:44.320 --> 0:16:49.680 much better at, like, being conversationally fluent in Chinese. 0:16:50.320 --> 0:16:52.400 Beyond that, I don't know that I can tell. 0:16:52.760 --> 0:16:54.280 It would be hard for me to look at a 0:16:54.320 --> 0:16:57.360 text and say, like, I know that's Qwen. But I 0:16:57.360 --> 0:16:59.680 think somebody who's more familiar with it might be able to.
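A nearest-centroid check is one way to probe the kind of clustering described above. Average the embeddings of texts from each source model, then see which average a new text's embedding lands closest to. The sketch below assumes you already have fixed-length embedding vectors from some detector. The two-dimensional points, labels, and function names are made up for illustration. This is a generic probe, not Pangram's interpretability method.

```python
import math

Vector = tuple[float, ...]


def centroids(labeled: list[tuple[str, Vector]]) -> dict[str, Vector]:
    """Average the embedding vectors that share a source-model label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for label, vec in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc) for label, acc in sums.items()}


def nearest_cluster(vec: Vector, cents: dict[str, Vector]) -> str:
    """Assign a vector to the label whose centroid is closest (Euclidean)."""
    return min(cents, key=lambda label: math.dist(vec, cents[label]))


if __name__ == "__main__":
    # Made-up 2-D points standing in for a detector's output embeddings.
    toy = [
        ("claude", (0.9, 0.1)), ("claude", (1.1, -0.1)),
        ("chatgpt", (-1.0, 0.2)), ("chatgpt", (-0.8, -0.2)),
        ("deepseek", (0.0, 1.0)), ("deepseek", (0.2, 1.2)),
    ]
    cents = centroids(toy)
    print(nearest_cluster((0.95, 0.05), cents))
    print(nearest_cluster((0.1, 0.9), cents))
```

Suppose held-out texts from each model reliably land nearest their own model's centroid. That suggests the embedding separates sources even though, as Max says, the detector was never trained on model labels.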
0:17:00.200 --> 0:17:02.880 Let's talk about some of the philosophical or 0:17:02.920 --> 0:17:04.720 societal implications of this work. 0:17:05.240 --> 0:17:06.040 Have you had 0:17:05.920 --> 0:17:10.120 anyone whose text has been judged to be AI-written 0:17:10.160 --> 0:17:12.840 by Pangram, and they're like, I swear to God, this 0:17:12.880 --> 0:17:15.639 isn't AI? They, like, really insist. And what do 0:17:15.640 --> 0:17:17.399 you think about this situation? What do you do? 0:17:17.440 --> 0:17:18.200 Talk to us about that. 0:17:18.359 --> 0:17:20.439 I've had this happen a couple of times. There have 0:17:20.440 --> 0:17:22.600 been times where I genuinely believe that, you know, this 0:17:22.720 --> 0:17:24.879 is just a false positive. We scan hundreds of millions 0:17:24.880 --> 0:17:27.040 of documents, so, like, at a certain scale, this 0:17:27.040 --> 0:17:30.359 will happen. But I also get people who, all the 0:17:30.400 --> 0:17:32.720 time, are just like, AI detectors don't work. 0:17:32.840 --> 0:17:34.040 It's, like, a total fraud. 0:17:34.280 --> 0:17:37.040 And then whatever they're putting out on LinkedIn is just 0:17:37.080 --> 0:17:38.760 one hundred percent AI-generated, 0:17:38.440 --> 0:17:40.120 and they're just, like, mad that they're getting called out. 0:17:40.440 --> 0:17:43.200 And then you look back farther into their past and 0:17:43.200 --> 0:17:45.600 their history, and, like, everything they're putting out is AI-generated 0:17:46.000 --> 0:17:49.320 until about, like, twenty twenty three. Like, for everyone, if 0:17:49.359 --> 0:17:52.120 you look historically, there are a lot of, like, slop accounts 0:17:52.119 --> 0:17:54.800 that are putting out total slop, and you can tell 0:17:54.800 --> 0:17:57.800 either they, like, weren't posting as much before, and if 0:17:57.880 --> 0:18:00.479 you scan back in time, then you see that they 0:18:00.480 --> 0:18:02.160