1 00:00:02,720 --> 00:00:16,360 Speaker 1: Bloomberg Audio Studios, Podcasts, Radio News. 2 00:00:18,480 --> 00:00:21,840 Speaker 2: Hello and welcome to another episode of The Odd Lots podcast. 3 00:00:21,920 --> 00:00:24,119 Speaker 3: I'm Joe Weisenthal and I'm Tracy Alloway. 4 00:00:24,360 --> 00:00:27,280 Speaker 2: So, Tracy, you know, you ever come across some writing 5 00:00:28,160 --> 00:00:31,720 Speaker 2: you can't articulate exactly why, but you're like, I'm pretty 6 00:00:31,760 --> 00:00:32,720 Speaker 2: sure AI wrote this? 7 00:00:33,120 --> 00:00:34,160 Speaker 3: Does this happen too much? 8 00:00:34,280 --> 00:00:38,600 Speaker 4: So, full disclosure, I haven't really thought about it that much. Yeah, 9 00:00:38,640 --> 00:00:41,640 Speaker 4: because the thing is I probably should think about it more, 10 00:00:41,960 --> 00:00:43,960 Speaker 4: but there's a lot of bad writing out there, and 11 00:00:44,000 --> 00:00:46,440 Speaker 4: I've become sort of inured to it. And I 12 00:00:46,479 --> 00:00:49,680 Speaker 4: also think that, I don't know, trying to figure out 13 00:00:49,680 --> 00:00:53,880 Speaker 4: whether or not something was generated by AI nowadays, if 14 00:00:53,880 --> 00:00:55,960 Speaker 4: you actually dedicate a lot of your own time to 15 00:00:56,120 --> 00:01:00,880 Speaker 4: doing that, that is a huge mental burden to be attempting. 16 00:01:01,000 --> 00:01:03,480 Speaker 4: Especially you and I are in the journalism industry. How 17 00:01:03,560 --> 00:01:05,760 Speaker 4: many of the pitches do you think that we get 18 00:01:05,800 --> 00:01:08,440 Speaker 4: from PRs right now are being generated by AI? 19 00:01:08,760 --> 00:01:11,520 Speaker 4: Imagine if you're reading each one of those and trying 20 00:01:11,560 --> 00:01:13,440 Speaker 4: to figure it out on a daily basis.
21 00:01:13,560 --> 00:01:15,360 Speaker 2: You know where I suppose I think about it the 22 00:01:15,360 --> 00:01:18,200 Speaker 2: most is someone will respond to a tweet, yeah, and 23 00:01:18,240 --> 00:01:19,800 Speaker 2: I'll be like, well, if this is a real person, 24 00:01:19,800 --> 00:01:22,319 Speaker 2: then maybe this person deserves some engagement and I'll ask a 25 00:01:22,400 --> 00:01:24,560 Speaker 2: question or I want to respond. But if it's a 26 00:01:24,560 --> 00:01:26,880 Speaker 2: bot and not a person, then obviously I don't. And that's 27 00:01:26,880 --> 00:01:28,399 Speaker 2: where I'm like, you know what, I want to figure 28 00:01:28,400 --> 00:01:30,800 Speaker 2: it out. I would like to know the answer. 29 00:01:30,959 --> 00:01:31,160 Speaker 3: You know. 30 00:01:31,200 --> 00:01:34,200 Speaker 2: I have a controversial view about AI writing, by the way, 31 00:01:34,240 --> 00:01:36,640 Speaker 2: which is that it's pretty good. I mean, like, by 32 00:01:36,680 --> 00:01:38,920 Speaker 2: and large, and I said this, I think, maybe in 33 00:01:38,920 --> 00:01:41,640 Speaker 2: a recent episode. When you consider the fact that, I 34 00:01:41,680 --> 00:01:44,679 Speaker 2: don't know, the majority of the population, like, doesn't know 35 00:01:44,680 --> 00:01:47,600 Speaker 2: where to put a comma within the sentence. Well, this 36 00:01:47,680 --> 00:01:48,120 Speaker 2: is my point. 37 00:01:48,320 --> 00:01:48,960 Speaker 3: It's pretty good. 38 00:01:48,960 --> 00:01:49,400 Speaker 5: I mean, yeah. 39 00:01:49,400 --> 00:01:51,160 Speaker 2: One thing I'll say about AI is it never gets 40 00:01:51,240 --> 00:01:52,480 Speaker 2: the placement of a comma wrong. 41 00:01:52,840 --> 00:01:54,160 Speaker 3: On some level, it's perfect. 42 00:01:54,320 --> 00:01:56,000 Speaker 6: Did you do that, I think it was in the 43 00:01:56,000 --> 00:01:57,560 Speaker 6: New York Times, the test?
44 00:01:57,600 --> 00:01:58,360 Speaker 3: I kind of hated that. 45 00:01:58,560 --> 00:02:01,200 Speaker 2: Okay, why? Well, because I'll tell you. First of all, 46 00:02:01,160 --> 00:02:02,240 Speaker 2: it's five examples. 47 00:02:02,280 --> 00:02:04,520 Speaker 3: There's not very many. Two, it asked the reader, which 48 00:02:04,520 --> 00:02:05,080 Speaker 3: do you prefer? 49 00:02:05,160 --> 00:02:07,120 Speaker 4: But I think they were different subjects as well. 50 00:02:07,200 --> 00:02:07,440 Speaker 3: Yeah. 51 00:02:07,600 --> 00:02:10,080 Speaker 2: Also, I think most people probably treated that as, can 52 00:02:10,120 --> 00:02:11,959 Speaker 2: you guess which one is a human? Because everyone wants 53 00:02:12,000 --> 00:02:14,320 Speaker 2: to say they prefer the human. I didn't think it 54 00:02:14,400 --> 00:02:18,400 Speaker 2: was like a great test. Nonetheless, look, not only is 55 00:02:18,400 --> 00:02:22,360 Speaker 2: it often indistinguishable, it is often fine writing. 56 00:02:22,840 --> 00:02:25,359 Speaker 2: Sometimes AI could come up with a really remarkable turn 57 00:02:25,400 --> 00:02:27,760 Speaker 2: of phrase. Yeah, but I still, by and large, don't 58 00:02:27,840 --> 00:02:30,000 Speaker 2: like it. You read like a thing, especially a long 59 00:02:30,040 --> 00:02:32,760 Speaker 2: text that's AI, and it's like, even if you can't articulate 60 00:02:32,360 --> 00:02:33,880 Speaker 3: it, it's like, this feels AI. 61 00:02:33,960 --> 00:02:36,640 Speaker 2: It has a certain sickly sweetness to it that is 62 00:02:36,680 --> 00:02:37,320 Speaker 2: often annoying. 63 00:02:37,320 --> 00:02:38,160 Speaker 3: It's annoying.
64 00:02:38,400 --> 00:02:41,000 Speaker 4: What I notice about it is it doesn't do style 65 00:02:41,200 --> 00:02:43,240 Speaker 4: very well, right? So if you ask it to write 66 00:02:43,240 --> 00:02:45,840 Speaker 4: something in the style of a writer, if you choose 67 00:02:45,880 --> 00:02:49,240 Speaker 4: anything other than something really obvious like Shakespeare, it really 68 00:02:49,480 --> 00:02:53,120 Speaker 4: suffers. But the text that it actually outputs is 69 00:02:53,160 --> 00:02:58,519 Speaker 4: pretty clear. Yeah, right, like for basic understanding. Totally. It's 70 00:02:58,639 --> 00:03:01,440 Speaker 4: probably better than a lot of what's on the internet. 71 00:03:01,760 --> 00:03:03,200 Speaker 2: The real people who are going to have to worry 72 00:03:03,240 --> 00:03:07,840 Speaker 2: about this are like teachers obviously, universities and lawyers, student 73 00:03:07,919 --> 00:03:11,040 Speaker 2: lawyers and maybe at it's fun, but there are sometimes 74 00:03:11,080 --> 00:03:12,800 Speaker 2: it's like, okay, did someone write this or not? 75 00:03:13,000 --> 00:03:14,920 Speaker 3: And there has to be, it'd be nice if we 76 00:03:14,960 --> 00:03:16,120 Speaker 3: could know the answer. 77 00:03:16,320 --> 00:03:19,280 Speaker 4: Well, the other thing that's starting to happen is, have 78 00:03:19,400 --> 00:03:21,840 Speaker 4: you seen any books out there that actually come with 79 00:03:21,960 --> 00:03:25,240 Speaker 4: a disclosure or disclaimer that says this book has been 80 00:03:25,280 --> 00:03:26,760 Speaker 4: written only by humans, 81 00:03:26,800 --> 00:03:26,880 Speaker 5: no 82 00:03:27,000 --> 00:03:28,079 Speaker 6: AI used at all? 83 00:03:28,120 --> 00:03:29,720 Speaker 4: I saw that for the first time on a book 84 00:03:29,760 --> 00:03:32,000 Speaker 4: that we actually read for an Odd Lots episode.
I 85 00:03:32,000 --> 00:03:33,600 Speaker 4: don't think it's come out yet, but that kind of 86 00:03:33,639 --> 00:03:33,960 Speaker 4: threw me. 87 00:03:34,320 --> 00:03:34,519 Speaker 1: Yeah. 88 00:03:34,639 --> 00:03:37,480 Speaker 2: No, it's more and more. Anyway, as we enter a 89 00:03:37,520 --> 00:03:40,400 Speaker 2: world in which the vast majority, if not already, of 90 00:03:40,480 --> 00:03:43,120 Speaker 2: words written are written by AI, it's going to be 91 00:03:43,200 --> 00:03:45,760 Speaker 2: interesting, this question of whether we know. Anyway, there's 92 00:03:45,800 --> 00:03:48,520 Speaker 2: this company called Pangram Labs, and they have a little 93 00:03:48,560 --> 00:03:50,440 Speaker 2: thing, and you can pay for it, but there's also a 94 00:03:50,440 --> 00:03:52,600 Speaker 2: free service where you can drop like a text in 95 00:03:53,320 --> 00:03:55,320 Speaker 2: and it'll say the odds that it's written by a human 96 00:03:55,440 --> 00:03:58,320 Speaker 2: or AI. And I'm pretty impressed by it. I like 97 00:03:58,360 --> 00:04:01,320 Speaker 2: did some samples of my own writing and then AI 98 00:04:01,440 --> 00:04:03,560 Speaker 2: outputs. It got them all right. But then I did 99 00:04:03,560 --> 00:04:05,680 Speaker 2: some like further, like I tried to stump it to 100 00:04:05,720 --> 00:04:07,720 Speaker 2: see if, like. So, what I did was I took 101 00:04:07,760 --> 00:04:10,280 Speaker 2: a piece of AI writing and then I had it 102 00:04:10,320 --> 00:04:13,600 Speaker 2: translated into Chinese, okay, and then I had it translate 103 00:04:13,640 --> 00:04:16,400 Speaker 2: that into High Chinese, so it's like, okay, imagine this 104 00:04:16,480 --> 00:04:19,160 Speaker 2: is being written in a more formal register. And then 105 00:04:19,200 --> 00:04:21,920 Speaker 2: I had that translated into Hebrew, and then I had 106 00:04:21,960 --> 00:04:24,960 Speaker 2: that translated into English.
So the original thing went through this 107 00:04:25,080 --> 00:04:27,920 Speaker 2: series of AI telephone, through various translations, and then I 108 00:04:27,960 --> 00:04:30,240 Speaker 2: put that output back into Pangram. 109 00:04:30,360 --> 00:04:31,640 Speaker 3: It got that right. It said it was AI. 110 00:04:31,720 --> 00:04:35,240 Speaker 2: So even after a series of sort of transformations designed 111 00:04:35,279 --> 00:04:39,280 Speaker 2: to obfuscate the original style of the piece, to see 112 00:04:39,320 --> 00:04:41,600 Speaker 2: if, you know, eventually it would emerge as something else. 113 00:04:41,839 --> 00:04:44,160 Speaker 2: So I was pretty impressed. It seems to work. And 114 00:04:44,240 --> 00:04:46,400 Speaker 2: you know, I think that's interesting for a couple of reasons, 115 00:04:46,400 --> 00:04:49,320 Speaker 2: which is maybe there is something that you can just tell. 116 00:04:49,680 --> 00:04:52,120 Speaker 2: But two, it sort of worries me because, you know, 117 00:04:52,320 --> 00:04:54,480 Speaker 2: there have been articles and they'll say like, this is 118 00:04:54,480 --> 00:04:56,360 Speaker 2: written by AI. And I think one of my big 119 00:04:56,360 --> 00:04:58,240 Speaker 2: fears would be that I write something. 120 00:04:58,600 --> 00:04:59,760 Speaker 3: I like to use an em dash. 121 00:05:00,000 --> 00:05:02,520 Speaker 4: I've always been an em dash fan. I love em dashes. 122 00:05:02,600 --> 00:05:03,520 Speaker 4: That's how people talk. 123 00:05:03,640 --> 00:05:04,200 Speaker 6: I'm sorry. 124 00:05:04,400 --> 00:05:06,400 Speaker 2: And then what if it says you wrote this with AI, 125 00:05:06,640 --> 00:05:08,560 Speaker 2: and I'm like, I didn't. And then here's this black 126 00:05:08,600 --> 00:05:11,680 Speaker 2: box that is suddenly like judge, jury, and executioner for my 127 00:05:12,279 --> 00:05:15,880 Speaker 2: career, potentially. Who wrote this?
AI, the lab says, so 128 00:05:16,440 --> 00:05:18,640 Speaker 2: you are now done. Like, that worries me. So I 129 00:05:18,640 --> 00:05:21,680 Speaker 2: think this raises a lot of very interesting questions about 130 00:05:21,680 --> 00:05:23,960 Speaker 2: these little detection things, and I want to learn 131 00:05:23,960 --> 00:05:24,640 Speaker 2: more about how. 132 00:05:24,640 --> 00:05:27,440 Speaker 4: Well, there's also a lot of philosophical questions about just what 133 00:05:27,480 --> 00:05:30,919 Speaker 4: we value in writing as well, because no one's 134 00:05:30,960 --> 00:05:33,760 Speaker 4: going to yell at you for using spell check or 135 00:05:33,800 --> 00:05:36,039 Speaker 4: something like that, right? Like, it's kind of crazy to 136 00:05:36,040 --> 00:05:39,000 Speaker 4: think that reputational risk is going to hinge on whether 137 00:05:39,120 --> 00:05:41,640 Speaker 4: or not you might have used a platform, a chat 138 00:05:41,680 --> 00:05:44,760 Speaker 4: platform, to like do some basic copy editing. 139 00:05:45,000 --> 00:05:47,320 Speaker 2: Totally. Well, very happy to say, we do, in fact, 140 00:05:47,360 --> 00:05:48,160 Speaker 2: have the perfect guest. 141 00:05:48,440 --> 00:05:50,120 Speaker 3: We're going to be speaking with Max Spero. 142 00:05:50,240 --> 00:05:52,880 Speaker 2: He is the founder and CEO of Pangram Labs, and 143 00:05:52,880 --> 00:05:54,720 Speaker 2: he can answer all of our questions. So Max, thank 144 00:05:54,720 --> 00:05:55,600 Speaker 2: you so much for coming on 145 00:05:55,560 --> 00:05:56,919 Speaker 5: Odd Lots. Thanks for having me. 146 00:05:57,160 --> 00:05:58,120 Speaker 3: How do you know it's right? 147 00:05:58,279 --> 00:06:00,600 Speaker 2: So someone puts in a piece of text, and we'll 148 00:06:00,600 --> 00:06:02,440 Speaker 2: get into the method in a second.
But someone puts 149 00:06:02,440 --> 00:06:05,440 Speaker 2: in a piece of text and it says human or AI. 150 00:06:06,320 --> 00:06:08,719 Speaker 2: What makes you believe that you have a very good 151 00:06:08,560 --> 00:06:09,760 Speaker 3: track record on this question? 152 00:06:09,960 --> 00:06:12,520 Speaker 7: So when we started Pangram, we started by doing this 153 00:06:12,560 --> 00:06:15,840 Speaker 7: thing we call a human baseline, which is how well 154 00:06:16,120 --> 00:06:19,680 Speaker 7: can we as humans predict whether something's AI or not? 155 00:06:19,960 --> 00:06:23,039 Speaker 7: That's the first step of, like, learning: is this problem tractable? 156 00:06:23,440 --> 00:06:25,800 Speaker 5: How hard or easy is it? And I found, like, 157 00:06:26,120 --> 00:06:29,240 Speaker 7: me personally, I was able to get about ninety percent accuracy, 158 00:06:29,720 --> 00:06:32,680 Speaker 7: and so we figured an AI model should be able 159 00:06:32,720 --> 00:06:33,279 Speaker 7: to do much 160 00:06:33,120 --> 00:06:33,599 Speaker 5: better than that. 161 00:06:33,920 --> 00:06:37,359 Speaker 4: So I have a bunch of methodology questions which we 162 00:06:37,400 --> 00:06:40,440 Speaker 4: can get into. But just before we get into any 163 00:06:40,440 --> 00:06:44,240 Speaker 4: of that, why is AI slop bad in your opinion? 164 00:06:44,279 --> 00:06:46,480 Speaker 4: Why does it need to be tracked and identified? 165 00:06:46,760 --> 00:06:48,680 Speaker 7: I think the problem is it's just so easy to 166 00:06:48,760 --> 00:06:51,720 Speaker 7: generate, and so like it's very difficult to know, like, 167 00:06:52,240 --> 00:06:56,080 Speaker 7: what is the, like, intent behind it? Basically, like, right now, 168 00:06:56,360 --> 00:06:58,560 Speaker 7: I think we're actually pretty lucky living.
We live in 169 00:06:58,640 --> 00:07:02,039 Speaker 7: a world where the signal-to-noise ratio on the Internet 170 00:07:02,040 --> 00:07:03,279 Speaker 7: and in our information 171 00:07:02,920 --> 00:07:03,920 Speaker 5: channels is pretty high. 172 00:07:04,040 --> 00:07:06,839 Speaker 7: We have pretty high signal to noise. But any bad 173 00:07:06,839 --> 00:07:10,520 Speaker 7: actor can come in and just flood our information channels 174 00:07:10,560 --> 00:07:15,000 Speaker 7: with AI slop that looks legitimate. It looks like somebody put 175 00:07:15,040 --> 00:07:18,760 Speaker 7: actual effort and thought into it, but really it was 176 00:07:18,880 --> 00:07:21,440 Speaker 7: just like a single prompt, which could have also been automated. 177 00:07:21,600 --> 00:07:23,679 Speaker 2: This is something that I think about a lot, which 178 00:07:23,720 --> 00:07:26,239 Speaker 2: is that there was a point in time, and maybe 179 00:07:26,280 --> 00:07:28,960 Speaker 2: still is the point in time, where if you read 180 00:07:29,000 --> 00:07:33,120 Speaker 2: something that was grammatically correct, where the punctuation was strong, 181 00:07:33,400 --> 00:07:36,640 Speaker 2: where the spelling was strong, there was reason to think 182 00:07:36,680 --> 00:07:39,400 Speaker 2: that the person who wrote it was a person of, 183 00:07:39,560 --> 00:07:43,240 Speaker 2: like, a certain seriousness, with a certain intelligence behind it. 184 00:07:43,560 --> 00:07:45,640 Speaker 3: And I think that the issue that you're
185 00:07:45,520 --> 00:07:48,600 Speaker 2: identifying is that that link is now being severed, so 186 00:07:48,640 --> 00:07:51,800 Speaker 2: that we can't use these heuristics anymore, such as the 187 00:07:51,840 --> 00:07:55,640 Speaker 2: strict quality of the prose, to know in fact whether 188 00:07:55,920 --> 00:07:59,000 Speaker 2: this was published by someone who was like a serious actor, 189 00:07:59,200 --> 00:08:00,320 Speaker 2: intelligent or not. 190 00:08:00,480 --> 00:08:03,600 Speaker 4: And now you have people inserting typos into their card 191 00:08:04,000 --> 00:08:06,680 Speaker 4: that's true that they are Yeah boyd. 192 00:08:06,680 --> 00:08:09,840 Speaker 2: Sorry, just to go back to my original question. So 193 00:08:09,880 --> 00:08:12,480 Speaker 2: you mentioned, okay, you're able to get it ninety percent right, 194 00:08:12,480 --> 00:08:14,320 Speaker 2: but now it's been used a lot more and you 195 00:08:14,320 --> 00:08:19,040 Speaker 2: have people paying for your software, presumably teachers and journalists, etc. 196 00:08:20,160 --> 00:08:23,280 Speaker 2: Given all of that, getting from ninety percent to one hundred, 197 00:08:23,320 --> 00:08:25,160 Speaker 2: I mean, if you get one out of ten wrong, 198 00:08:25,200 --> 00:08:28,240 Speaker 2: it's clearly an unacceptable error rate for a piece of 199 00:08:28,240 --> 00:08:31,640 Speaker 2: commercial software that could call someone an AI creator. So 200 00:08:31,680 --> 00:08:33,360 Speaker 2: you have to do a lot better than ninety percent. 201 00:08:33,800 --> 00:08:36,360 Speaker 2: Talk to us about, like, what you've seen so far 202 00:08:36,559 --> 00:08:39,920 Speaker 2: in your data since releasing it as commercial software that 203 00:08:40,040 --> 00:08:43,600 Speaker 2: makes you believe the software is doing a correct job 204 00:08:43,679 --> 00:08:45,720 Speaker 2: of allocating between the two categories.
205 00:08:45,760 --> 00:08:49,679 Speaker 7: So we've built out really comprehensive evals, okay, and so 206 00:08:49,880 --> 00:08:54,240 Speaker 7: in our evaluations, there's two kinds of errors. There's a false positive, 207 00:08:54,520 --> 00:08:56,920 Speaker 7: which is when something is written by a human and 208 00:08:56,960 --> 00:08:58,720 Speaker 7: then we say that it's written by an AI, okay. 209 00:08:58,760 --> 00:09:00,839 Speaker 7: And there's a false negative, which is if it was 210 00:09:00,920 --> 00:09:03,840 Speaker 7: AI written and we don't catch it. And so we 211 00:09:04,040 --> 00:09:07,839 Speaker 7: track our numbers for both of these, and for human 212 00:09:07,559 --> 00:09:09,079 Speaker 5: writing, we're actually pretty fortunate. 213 00:09:09,240 --> 00:09:11,080 Speaker 7: We have like millions and millions of samples, so we 214 00:09:11,120 --> 00:09:13,640 Speaker 7: can get like a false positive number that we have 215 00:09:13,679 --> 00:09:16,080 Speaker 7: a very high degree of confidence in. And our number 216 00:09:16,160 --> 00:09:19,080 Speaker 7: right now is about one in ten thousand. OK. So, 217 00:09:19,160 --> 00:09:22,760 Speaker 7: if we scan ten thousand documents, on average, one will 218 00:09:22,800 --> 00:09:23,480 Speaker 7: come back as 219 00:09:23,840 --> 00:09:25,240 Speaker 5: AI when it was actually human. 220 00:09:25,440 --> 00:09:27,319 Speaker 3: And what about in the other direction, false negative? 221 00:09:27,720 --> 00:09:31,760 Speaker 7: I would say around ninety nine percent accuracy, so like 222 00:09:32,120 --> 00:09:35,080 Speaker 7: around one percent false negative rate. I think this depends 223 00:09:35,080 --> 00:09:38,440 Speaker 7: a little bit more on, like, how adversarial the prompting is, 224 00:09:38,640 --> 00:09:40,720 Speaker 7: how much they're trying to ev.
225 00:09:40,720 --> 00:09:44,280 Speaker 2: What I did, exactly, sending it through multiple filtrations to 226 00:09:44,360 --> 00:09:47,600 Speaker 2: obfuscate the original output. That would be an example of 227 00:09:47,640 --> 00:09:49,240 Speaker 2: adversarial prompting. Exactly. 228 00:09:49,480 --> 00:09:52,079 Speaker 7: But in, like, the general case, where we're just looking 229 00:09:52,120 --> 00:09:55,880 Speaker 7: at straight outputs from AI, it's above ninety nine percent. 230 00:09:55,960 --> 00:09:59,000 Speaker 4: Okay, okay. So what is your model looking for exactly 231 00:09:59,040 --> 00:10:02,120 Speaker 4: when it's evaluating a text? Because, as we mentioned in 232 00:10:02,160 --> 00:10:05,560 Speaker 4: the intro, you know, syntax and grammar tend to be 233 00:10:05,679 --> 00:10:10,599 Speaker 4: pretty good on AI generated copy. The style is sometimes 234 00:10:10,640 --> 00:10:14,760 Speaker 4: more of an identifier, I would argue. To your point, Joe, like, 235 00:10:14,960 --> 00:10:19,320 Speaker 4: sometimes it reads very saccharine and kind of overly earnest 236 00:10:19,640 --> 00:10:22,280 Speaker 4: in some ways. So what exactly are you focusing on here? 237 00:10:22,280 --> 00:10:23,000 Speaker 4: What are the tells? 238 00:10:23,200 --> 00:10:26,120 Speaker 7: Yeah, so the style and the word choices are definitely 239 00:10:26,200 --> 00:10:27,760 Speaker 7: part of it. But I think what a lot of 240 00:10:27,760 --> 00:10:30,200 Speaker 7: people don't realize is they're actually making a lot of 241 00:10:30,559 --> 00:10:33,720 Speaker 7: decisions when they write a piece of text.
So there's, 242 00:10:33,840 --> 00:10:36,800 Speaker 7: you know, dozens or hundreds of ways to phrase every 243 00:10:36,840 --> 00:10:39,680 Speaker 7: single phrase, and over the course of fifty or one 244 00:10:39,720 --> 00:10:43,240 Speaker 7: hundred or two hundred words, you're making thousands of decisions, actually. 245 00:10:43,679 --> 00:10:46,400 Speaker 7: And so what we're doing is we're learning the patterns 246 00:10:46,400 --> 00:10:49,880 Speaker 7: in how, like, these frontier models make these decisions. And 247 00:10:49,960 --> 00:10:53,000 Speaker 7: if the vast majority of these decisions line up with 248 00:10:53,040 --> 00:10:56,160 Speaker 7: how the frontier models are doing it, then it's vanishingly 249 00:10:56,240 --> 00:10:58,600 Speaker 7: unlikely that this was written by a human. You would 250 00:10:58,640 --> 00:11:01,240 Speaker 7: have to just happen to make the same exact decisions 251 00:11:01,240 --> 00:11:03,240 Speaker 7: that the LLM does hundreds of times. 252 00:11:03,280 --> 00:11:04,280 Speaker 6: Interesting. Okay, this 253 00:11:04,320 --> 00:11:05,480 Speaker 3: is a really important point. 254 00:11:05,559 --> 00:11:08,200 Speaker 2: So everyone at this point has some feel for, like, 255 00:11:08,280 --> 00:11:11,400 Speaker 2: the em dash tell, right? But my understanding is 256 00:11:11,440 --> 00:11:13,640 Speaker 2: it's not like you go in and, like, hard 257 00:11:13,679 --> 00:11:15,960 Speaker 2: code, if you see a bunch of em dashes, this 258 00:11:16,080 --> 00:11:19,920 Speaker 2: is the thing. These decisions, in many cases, I imagine, 259 00:11:19,960 --> 00:11:24,840 Speaker 2: neither you nor the model itself can articulate in English 260 00:11:25,080 --> 00:11:27,720 Speaker 2: what the decisions are. All you know is that the 261 00:11:27,760 --> 00:11:29,160 Speaker 2: decision pattern exists. 262 00:11:29,240 --> 00:11:29,880 Speaker 3: Is this correct?
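The "thousands of decisions" argument above is, at heart, arithmetic about accumulated evidence. The following is a minimal sketch of that intuition only, not Pangram's method: it assumes each phrasing decision is independent (a simplification real text does not satisfy), and the function names and probabilities are hypothetical, chosen purely for illustration.

```python
import math


def chance_of_matching_all(per_decision_match_prob, num_decisions):
    """Probability of matching the model's choice at every decision.

    ASSUMPTION: decisions are independent and share one match probability.
    """
    return per_decision_match_prob ** num_decisions


def log_likelihood_ratio(p_choice_given_ai, p_choice_given_human):
    """Sum per-decision log odds; a positive total favors 'AI wrote this'.

    Each list holds the probability of the observed choice under one
    hypothesis (illustrative numbers, not measured values).
    """
    return sum(
        math.log(p_ai / p_human)
        for p_ai, p_human in zip(p_choice_given_ai, p_choice_given_human)
    )


if __name__ == "__main__":
    # Even a coin-flip chance per decision collapses quickly over many decisions.
    print(chance_of_matching_all(0.5, 10))
    print(chance_of_matching_all(0.5, 200))
    # Two decisions, each twice as likely under the AI hypothesis.
    print(log_likelihood_ratio([0.8, 0.8], [0.4, 0.4]))
```

The point of the sketch is that no single choice is decisive; the total only becomes large when many small pieces of evidence point the same way.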
263 00:11:30,000 --> 00:11:30,679 Speaker 5: This is correct. 264 00:11:30,720 --> 00:11:31,840 Speaker 3: Okay. Can you explain? 265 00:11:32,000 --> 00:11:35,120 Speaker 2: So therefore, what does it mean that your model has 266 00:11:35,280 --> 00:11:37,079 Speaker 2: learned these decisions? 267 00:11:37,480 --> 00:11:39,920 Speaker 7: So what we're doing on the very broad scale is 268 00:11:40,080 --> 00:11:42,920 Speaker 7: we're training a deep learning model. So it's a pretty 269 00:11:42,920 --> 00:11:46,400 Speaker 7: big black box, but it has the base model of 270 00:11:47,040 --> 00:11:50,040 Speaker 7: a language model, and then instead of predicting the next token, 271 00:11:50,520 --> 00:11:53,880 Speaker 7: it's predicting whether the text is AI or not. Okay, 272 00:11:53,960 --> 00:11:56,800 Speaker 7: and how we train it is we train on tens 273 00:11:56,840 --> 00:11:59,960 Speaker 7: of millions of examples, so it sees millions and millions 274 00:12:00,160 --> 00:12:02,959 Speaker 7: of human examples, and for each human example, we also 275 00:12:03,000 --> 00:12:05,920 Speaker 7: show it an AI example. So, for example, let's say 276 00:12:05,920 --> 00:12:09,000 Speaker 7: one of these is a five star review for Denny's 277 00:12:09,200 --> 00:12:11,959 Speaker 7: that's seventy eight words long. Then we'll ask an AI 278 00:12:12,200 --> 00:12:14,120 Speaker 7: to write a five star review about Denny's that's seventy 279 00:12:14,120 --> 00:12:16,240 Speaker 7: eight words long in the style of the first one. 280 00:12:16,440 --> 00:12:18,840 Speaker 7: And obviously these two will be different, and so our 281 00:12:18,880 --> 00:12:22,080 Speaker 7: model is able to learn through contrast, what is the 282 00:12:22,080 --> 00:12:23,000 Speaker 7: difference between.
283 00:12:22,720 --> 00:12:24,840 Speaker 2: And the important thing, sorry, just to be clear here, 284 00:12:25,000 --> 00:12:26,960 Speaker 2: is that you and I might not be able to 285 00:12:27,040 --> 00:12:30,439 Speaker 2: articulate the difference. There will be some difference in maybe 286 00:12:30,520 --> 00:12:33,240 Speaker 2: the sentence length, there will be some difference in word choice, 287 00:12:33,240 --> 00:12:36,480 Speaker 2: there'll be some difference in punctuation, syntax, whatever, but you 288 00:12:36,600 --> 00:12:40,240 Speaker 2: and I wouldn't obviously spot it. However, after millions of 289 00:12:40,280 --> 00:12:43,640 Speaker 2: examples of these side by sides, the model learns what 290 00:12:43,679 --> 00:12:44,640 Speaker 2: the difference is. Exactly. 291 00:12:44,720 --> 00:12:46,560 Speaker 7: I think the best that a human can do is 292 00:12:46,720 --> 00:12:49,800 Speaker 7: look for some of these like really obvious tells, like 293 00:12:49,880 --> 00:12:53,440 Speaker 7: ChatGPT loves that, like, "it's not just X, it's Y" framing. 294 00:12:53,800 --> 00:12:57,240 Speaker 7: Earlier models really liked some specific words like tapestry and 295 00:12:57,320 --> 00:12:58,760 Speaker 7: intricate and delve. 296 00:12:58,840 --> 00:13:00,360 Speaker 3: Yeah, delve, tapestry. Yeah. 297 00:13:00,400 --> 00:13:00,960 Speaker 5: But yeah. 298 00:13:01,000 --> 00:13:03,079 Speaker 7: I think by training Pangram, we're able to go much 299 00:13:03,120 --> 00:13:05,640 Speaker 7: deeper than this and look deeper than the high level 300 00:13:05,640 --> 00:13:08,120 Speaker 7: signs, at the, like, document level signs.
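The paired training data described above (one human text, one AI text matched on topic and length) can be sketched as follows. This is an illustrative assumption about how such pairs might be assembled, not Pangram's pipeline: `Example`, `mirror_prompt`, `build_pairs`, and the stand-in `fake_generate` are all hypothetical names, and no real LLM API is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    text: str
    label: int  # 0 = human-written, 1 = AI-generated


def mirror_prompt(human_text: str, topic: str) -> str:
    """Build a prompt asking for an AI counterpart matched on topic and length."""
    n_words = len(human_text.split())
    return (
        f"Write {topic} that is about {n_words} words long, "
        f"in the style of the following text:\n\n{human_text}"
    )


def build_pairs(
    human_samples: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[Example]:
    """Turn (topic, human_text) samples into labeled human/AI pairs."""
    examples: List[Example] = []
    for topic, human_text in human_samples:
        ai_text = generate(mirror_prompt(human_text, topic))
        examples.append(Example(human_text, 0))
        examples.append(Example(ai_text, 1))
    return examples


def fake_generate(prompt: str) -> str:
    """Stand-in for a real LLM call (hypothetical; returns canned text)."""
    return "This diner is not just a restaurant, it is a tapestry of flavors."


if __name__ == "__main__":
    samples = [
        ("a five star review of a diner",
         "Pancakes were great, coffee kept coming, will be back."),
    ]
    for ex in build_pairs(samples, fake_generate):
        print(ex.label, ex.text)
```

Matching each pair on topic and length means a classifier trained on the result cannot separate the classes by subject matter or size alone, which is what pushes it toward the subtler differences discussed in the conversation.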
301 00:13:23,960 --> 00:13:26,080 Speaker 4: So one thing this kind of reminds me of, and 302 00:13:26,120 --> 00:13:28,559 Speaker 4: I'm thinking how to phrase this, but it reminds me 303 00:13:28,600 --> 00:13:31,800 Speaker 4: of, you know, those exercises people used to do where 304 00:13:31,800 --> 00:13:34,000 Speaker 4: you would take a bunch of different faces and meld 305 00:13:34,040 --> 00:13:37,200 Speaker 4: them all together and come up with like one face 306 00:13:37,320 --> 00:13:41,120 Speaker 4: that was supposedly attractive. So, like, to what extent is 307 00:13:41,160 --> 00:13:46,560 Speaker 4: this basically a distributional detector, in the sense that you're 308 00:13:46,600 --> 00:13:50,960 Speaker 4: looking for like certain paths that you think AI would choose? 309 00:13:51,800 --> 00:13:54,239 Speaker 4: And I guess, like, could you get a false positive 310 00:13:54,840 --> 00:13:57,440 Speaker 4: just from someone who's choosing like the average of the 311 00:13:57,480 --> 00:14:00,320 Speaker 4: average of the average in a way to state a 312 00:14:00,320 --> 00:14:01,200 Speaker 4: particular sentence? 313 00:14:03,360 --> 00:14:06,400 Speaker 7: Maybe. There's a reason our false positive rate 314 00:14:06,440 --> 00:14:08,840 Speaker 7: is one in ten thousand and not zero. It's because, 315 00:14:09,200 --> 00:14:12,319 Speaker 7: you know, sometimes we look at the false positive and 316 00:14:12,360 --> 00:14:15,559 Speaker 7: it's like, oh, it reads exactly like an AI generated 317 00:14:15,720 --> 00:14:18,600 Speaker 7: review or essay, except that it was written in twenty nineteen. 318 00:14:18,640 --> 00:14:21,000 Speaker 7: So it was probably a human who just happened to 319 00:14:21,800 --> 00:14:24,840 Speaker 7: find the exact, like, mode collapsed 320 00:14:24,640 --> 00:14:26,720 Speaker 5: type of way that, like, yeah, that's right. Yeah, I 321 00:14:26,760 --> 00:14:27,400 Speaker 5: would say, yeah.
322 00:14:27,480 --> 00:14:29,440 Speaker 7: I think it's a good way to think about the 323 00:14:29,480 --> 00:14:32,840 Speaker 7: distribution of writing, or writing as a distribution, where, like, 324 00:14:32,920 --> 00:14:35,520 Speaker 7: you know, there's the space of all human writing, and 325 00:14:35,560 --> 00:14:37,920 Speaker 7: then AI writing is really just 326 00:14:37,920 --> 00:14:39,840 Speaker 5: like a small point within this space. 327 00:14:39,880 --> 00:14:42,360 Speaker 7: It's very, no matter how much you prompt it, it 328 00:14:42,400 --> 00:14:46,160 Speaker 7: doesn't go that far from where it was trained to be. 329 00:14:46,440 --> 00:14:48,120 Speaker 3: Yeah, okay, what's the black box? 330 00:14:48,200 --> 00:14:50,520 Speaker 2: So I built a little model myself. I built this 331 00:14:50,560 --> 00:14:53,080 Speaker 2: thing that detects, you can upload text and it says whether 332 00:14:53,120 --> 00:14:56,600 Speaker 2: it's more resemblant of the written word or the spoken word. 333 00:14:57,040 --> 00:14:59,600 Speaker 2: Oh I saw that, yeah, yeah. And I used BERT, 334 00:14:59,640 --> 00:15:02,480 Speaker 2: which is like one of these open source ones 335 00:15:02,480 --> 00:15:02,960 Speaker 2: from Google. 336 00:15:03,000 --> 00:15:04,800 Speaker 3: What is the core model that 337 00:15:04,720 --> 00:15:07,280 Speaker 2: you trained on, or is it something, or did you 338 00:15:07,320 --> 00:15:08,120 Speaker 2: build it yourself? 339 00:15:08,200 --> 00:15:08,960 Speaker 3: Like, talk to us about that. 340 00:15:09,000 --> 00:15:11,760 Speaker 7: Our very first model was actually built on BERT, but 341 00:15:11,960 --> 00:15:17,360 Speaker 7: for future models we needed to up our capacity. So basically 342 00:15:17,440 --> 00:15:20,480 Speaker 7: we were running into capacity limits with our model. It 343 00:15:20,840 --> 00:15:23,840 Speaker 7: was capping out at a certain false positive, false negative rate.
344 00:15:24,040 --> 00:15:26,600 Speaker 7: It wasn't learning the deeper signals, so we had to 345 00:15:26,800 --> 00:15:28,960 Speaker 7: ten x and then one hundred x the parameter count 346 00:15:29,160 --> 00:15:32,400 Speaker 7: so that it can learn like really deeply, like how these 347 00:15:32,400 --> 00:15:33,400 Speaker 7: frontier models 348 00:15:33,200 --> 00:15:36,920 Speaker 4: write. Have you noticed any interesting differences between how the 349 00:15:36,960 --> 00:15:40,760 Speaker 4: models write? Can you, and actually, is your model trained 350 00:15:40,800 --> 00:15:44,080 Speaker 4: to identify different models as well as whether or not 351 00:15:44,120 --> 00:15:46,440 Speaker 4: this is just broadly AI generated? 352 00:15:46,560 --> 00:15:50,520 Speaker 7: So we don't specifically train it on different models. We 353 00:15:50,520 --> 00:15:52,720 Speaker 7: don't say like, hey, this one is Claude 3 and 354 00:15:52,760 --> 00:15:56,400 Speaker 7: this one is ChatGPT or GPT-5. What we've done, 355 00:15:56,680 --> 00:16:00,040 Speaker 7: we've done some interpretability work to look at basically the 356 00:16:00,080 --> 00:16:02,720 Speaker 7: output embeddings of the model, and we find that 357 00:16:02,920 --> 00:16:05,880 Speaker 7: it actually learns which model the text came from. So 358 00:16:05,920 --> 00:16:08,360 Speaker 7: you could see like little clusters, like this is the 359 00:16:08,440 --> 00:16:11,440 Speaker 7: Claude cluster and like all the Claudes, yeah, cluster around here, 360 00:16:11,440 --> 00:16:13,760 Speaker 7: and then these are like the DeepSeek and Qwen, 361 00:16:13,840 --> 00:16:15,760 Speaker 7: and then this is like ChatGPT, and they all 362 00:16:15,840 --> 00:16:19,680 Speaker 7: kind of like cluster into different spaces in embedding space.
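The clustering observation above can be illustrated with a toy nearest-centroid check. This is a sketch under stated assumptions, not the interpretability work described in the conversation: the two-dimensional vectors and the labels `model_a` and `model_b` are made up, and real embeddings would have hundreds or thousands of dimensions.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def fit_centroids(embeddings, sources):
    """Group embeddings by source label and return one centroid per source."""
    groups = defaultdict(list)
    for vec, src in zip(embeddings, sources):
        groups[src].append(vec)
    return {src: centroid(vecs) for src, vecs in groups.items()}


def nearest_source(vec, centroids):
    """Return the source whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda src: math.dist(vec, centroids[src]))


if __name__ == "__main__":
    # Made-up 2-D "embeddings" for texts from two hypothetical generators.
    embeddings = [[0.9, 0.1], [1.1, -0.1], [-1.0, 1.0], [-0.8, 1.2]]
    sources = ["model_a", "model_a", "model_b", "model_b"]
    centroids = fit_centroids(embeddings, sources)
    print(nearest_source([0.8, 0.2], centroids))
    print(nearest_source([-0.7, 0.9], centroids))
```

If a simple rule like this recovers the generating model from embeddings alone, that is evidence the detector has encoded model identity even though it was never given those labels.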
363 00:16:20,240 --> 00:16:22,640 Speaker 7: So clearly the model is able to learn what the 364 00:16:22,640 --> 00:16:24,320 Speaker 7: difference is between these frontier models. 365 00:16:24,520 --> 00:16:27,480 Speaker 4: Well, actually, since you mentioned Qwen, I'm very interested, is 366 00:16:27,480 --> 00:16:31,040 Speaker 4: there anything like distinct in terms of how Qwen generates 367 00:16:31,080 --> 00:16:34,600 Speaker 4: text versus platforms that have been developed in the US? 368 00:16:35,120 --> 00:16:37,640 Speaker 7: I think Qwen is unique because it's trained on a 369 00:16:37,680 --> 00:16:40,640 Speaker 7: lot more Chinese and multilingual tokens than other models. 370 00:16:41,360 --> 00:16:44,200 Speaker 7: So you know, I've heard from Chinese friends that it's 371 00:16:44,320 --> 00:16:49,680 Speaker 7: it's much better at like being conversationally fluent in Chinese. 372 00:16:50,320 --> 00:16:52,400 Speaker 5: Beyond that, I don't know that I can tell. 373 00:16:52,760 --> 00:16:54,280 Speaker 7: It would be hard for me to look at a 374 00:16:54,320 --> 00:16:57,360 Speaker 7: text and say, like, I know that's Qwen. But I 375 00:16:57,360 --> 00:16:59,680 Speaker 7: think somebody who's more familiar with it might be able to. 376 00:17:00,200 --> 00:17:02,880 Speaker 2: Let's talk about sort of some of the philosophical or 377 00:17:02,920 --> 00:17:04,720 Speaker 2: societal implications of this work. 378 00:17:05,240 --> 00:17:06,040 Speaker 3: Have you had 379 00:17:05,920 --> 00:17:10,120 Speaker 2: anyone whose text has been judged to be AI written 380 00:17:10,160 --> 00:17:12,840 Speaker 2: by Pangram and they're like, I swear to God, this 381 00:17:12,880 --> 00:17:15,639 Speaker 2: isn't, you're wrong? They like, really insist, and what do 382 00:17:15,640 --> 00:17:17,399 Speaker 2: you think about this situation? What do you do, or 383 00:17:17,440 --> 00:17:18,200 Speaker 2: talk to us about that.
384 00:17:18,359 --> 00:17:20,439 Speaker 7: I've had a couple of times this happened. There have 385 00:17:20,440 --> 00:17:22,600 Speaker 7: been times where I genuinely believe that you know this 386 00:17:22,720 --> 00:17:24,879 Speaker 7: is just a false positive. We scan hundreds of millions 387 00:17:24,880 --> 00:17:27,040 Speaker 7: of documents, so like, at a certain scale like this 388 00:17:27,040 --> 00:17:30,359 Speaker 7: will happen. But I also get people who all the 389 00:17:30,400 --> 00:17:32,720 Speaker 7: time they're just like AI detectors don't work. 390 00:17:32,840 --> 00:17:34,040 Speaker 5: It's like a total fraud. 391 00:17:34,280 --> 00:17:37,040 Speaker 7: And then whatever they're putting out on LinkedIn is just 392 00:17:37,080 --> 00:17:38,760 Speaker 7: one hundred percent AI generated. 393 00:17:38,440 --> 00:17:40,120 Speaker 5: And they're just like mad that they're getting called out. 394 00:17:40,440 --> 00:17:43,200 Speaker 7: And then you look back farther into their past and 395 00:17:43,200 --> 00:17:45,600 Speaker 7: their history, like everything they're putting out is AI generated 396 00:17:46,000 --> 00:17:49,320 Speaker 7: until about like twenty twenty three, Like for everyone, if 397 00:17:49,359 --> 00:17:52,120 Speaker 7: you look historically, there's a lot of like slop accounts 398 00:17:52,119 --> 00:17:54,800 Speaker 7: that are putting out total slop, and you can tell 399 00:17:54,800 --> 00:17:57,800 Speaker 7: either they like weren't posting as much before, and if 400 00:17:57,880 --> 00:18:00,479 Speaker 7: you scan back in time, then you see that they 401 00:18:00,480 --> 00:18:02,160 Speaker 7: were writing human text at some point. 
402 00:18:02,240 --> 00:18:04,800 Speaker 2: So there's a number of accounts out there that basically 403 00:18:04,960 --> 00:18:07,840 Speaker 2: right around the beginning of twenty twenty three, where if 404 00:18:07,880 --> 00:18:10,840 Speaker 2: you scan the entire corpus of their work, it very 405 00:18:10,960 --> 00:18:12,640 Speaker 2: clearly shows a switch 406 00:18:12,359 --> 00:18:13,920 Speaker 3: right around early twenty twenty three. 407 00:18:14,119 --> 00:18:17,280 Speaker 7: Yeah, it really like depends on the account. I think 408 00:18:17,400 --> 00:18:19,520 Speaker 7: one thing we saw that was interesting was there is 409 00:18:19,600 --> 00:18:22,720 Speaker 7: a writer for The Guardian that was covering the Winter Olympics, 410 00:18:22,920 --> 00:18:25,040 Speaker 7: and somebody was like, hey, this article is like total 411 00:18:25,080 --> 00:18:27,840 Speaker 7: AI slop. Ran it through Pangram, it was AI. The 412 00:18:27,880 --> 00:18:30,520 Speaker 7: Guardian was like, no, of course, our writers don't use AI. 413 00:18:30,760 --> 00:18:34,080 Speaker 7: And then we, so we scanned this single writer's history 414 00:18:34,520 --> 00:18:36,760 Speaker 7: and we found that they really did start picking up 415 00:18:36,800 --> 00:18:39,400 Speaker 7: AI like mid to late twenty twenty four, and were 416 00:18:39,480 --> 00:18:41,240 Speaker 7: using it more and more in their articles. 417 00:18:41,560 --> 00:18:44,240 Speaker 4: I mean, just to play devil's advocate for a second.
Does 418 00:18:44,280 --> 00:18:48,359 Speaker 4: intent matter when it comes to identifying AI slop in 419 00:18:48,400 --> 00:18:50,679 Speaker 4: the sense that, Okay, I get you can have a 420 00:18:50,720 --> 00:18:54,800 Speaker 4: bad actor who's maybe trying to influence how people feel 421 00:18:54,800 --> 00:18:57,720 Speaker 4: about a particular topic, and maybe they've created a bunch 422 00:18:57,760 --> 00:19:01,320 Speaker 4: of bots on Twitter slash x and they're using AI 423 00:19:01,480 --> 00:19:04,160 Speaker 4: to just flood the zone with a bunch of AI 424 00:19:04,240 --> 00:19:08,960 Speaker 4: slop supporting their particular viewpoints. On the other hand, if 425 00:19:08,960 --> 00:19:12,479 Speaker 4: you're a journalist and your business is to write, you know, 426 00:19:12,600 --> 00:19:16,520 Speaker 4: like basic understandable copy about a news topic. 427 00:19:16,800 --> 00:19:17,880 Speaker 6: Just to be clear, I'm. 428 00:19:17,680 --> 00:19:21,440 Speaker 4: Not advocating this at all, but that intent is very 429 00:19:21,440 --> 00:19:25,040 Speaker 4: different to I'm going to try to influence something by 430 00:19:25,280 --> 00:19:26,800 Speaker 4: just you know, sheer volume. 431 00:19:27,240 --> 00:19:29,680 Speaker 7: Yeah, I mean, definitely these are like one is a 432 00:19:29,720 --> 00:19:32,239 Speaker 7: lot more severe than the other. But I think at 433 00:19:32,240 --> 00:19:34,280 Speaker 7: the same time, if you're a journalist and you're using 434 00:19:34,760 --> 00:19:38,000 Speaker 7: AI to basically shirk your work and like not do 435 00:19:38,080 --> 00:19:40,240 Speaker 7: your work, I think that's also a problem. And I 436 00:19:40,240 --> 00:19:42,880 Speaker 7: think it's a reputational risk to the outlet because people 437 00:19:42,960 --> 00:19:44,879 Speaker 7: can tell and people are going to call you out. 
438 00:19:45,440 --> 00:19:46,840 Speaker 7: There's a lot of people who don't want to read 439 00:19:46,840 --> 00:19:49,240 Speaker 7: AI slop kind of regardless of where it's from. 440 00:19:49,520 --> 00:19:52,840 Speaker 2: Yeah, this is definitely true. Are you ever going 441 00:19:52,880 --> 00:19:55,240 Speaker 2: to run out of human material to train on? 442 00:19:55,400 --> 00:19:55,520 Speaker 5: Right? 443 00:19:55,560 --> 00:19:57,920 Speaker 2: Like you could be pretty confident that if you find 444 00:19:57,960 --> 00:20:00,879 Speaker 2: some piece of text that was published on the internet 445 00:20:00,880 --> 00:20:03,960 Speaker 2: prior to twenty twenty three, but certainly prior to like 446 00:20:04,000 --> 00:20:06,840 Speaker 2: twenty nineteen or something like that, you can be extremely 447 00:20:06,880 --> 00:20:11,240 Speaker 2: sure that this is human generated. Do you worry that 448 00:20:11,400 --> 00:20:14,040 Speaker 2: in the future, like it's going to be harder to 449 00:20:14,200 --> 00:20:16,840 Speaker 2: even establish the provenance of your training data? 450 00:20:17,200 --> 00:20:18,800 Speaker 5: Uh, yeah, it's definitely a concern for us. 451 00:20:18,920 --> 00:20:20,280 Speaker 3: Talk to us about how to think about this. 452 00:20:20,359 --> 00:20:23,440 Speaker 7: So we have a near infinite data reservoir of pre 453 00:20:23,560 --> 00:20:26,600 Speaker 7: twenty twenty three data. There's just like more than enough 454 00:20:26,600 --> 00:20:28,280 Speaker 7: for us to train on for a long long time. 455 00:20:28,920 --> 00:20:31,080 Speaker 7: But part of the problem is we also want to 456 00:20:31,080 --> 00:20:33,560 Speaker 7: train on modern text.
We want to, there's all this 457 00:20:33,640 --> 00:20:36,840 Speaker 7: talk about, like if somebody's writing about LLMs or about AI, 458 00:20:36,920 --> 00:20:39,560 Speaker 7: we don't want to incorrectly flag that as AI because 459 00:20:39,760 --> 00:20:43,399 Speaker 7: our training data has no sense of this topic. So 460 00:20:44,040 --> 00:20:46,040 Speaker 7: I think we're looking at different ways to do this, 461 00:20:46,160 --> 00:20:48,760 Speaker 7: but most of them are just like figuring out like 462 00:20:48,800 --> 00:20:49,960 Speaker 7: who is a trusted actor? 463 00:20:50,000 --> 00:20:51,160 Speaker 5: Who do we know is 464 00:20:51,160 --> 00:20:53,919 Speaker 7: putting out human written content? And we could use our 465 00:20:53,960 --> 00:20:56,080 Speaker 7: model for that, like to some degree. And then so 466 00:20:56,200 --> 00:20:58,600 Speaker 7: we have known actors, we know they're putting out human 467 00:20:58,640 --> 00:21:00,560 Speaker 7: written content, and then we could use theirs as well. 468 00:21:00,920 --> 00:21:03,680 Speaker 4: Slightly random question, but using your model, are you able 469 00:21:03,680 --> 00:21:06,919 Speaker 4: to quantify like what percentage of the Internet at the 470 00:21:06,960 --> 00:21:08,240 Speaker 4: moment is AI slop? 471 00:21:08,600 --> 00:21:12,920 Speaker 2: It's about forty percent based on why you're just how'd 472 00:21:12,920 --> 00:21:13,639 Speaker 2: you get that number? 473 00:21:13,960 --> 00:21:16,960 Speaker 7: So a lot of the Internet is just like SEO 474 00:21:17,080 --> 00:21:20,480 Speaker 7: written articles and like, yeah, it's articles written for search 475 00:21:20,560 --> 00:21:22,440 Speaker 7: basically so that your website comes up more often in 476 00:21:22,440 --> 00:21:24,919 Speaker 7: search because it's targeting certain keywords.
And a lot of 477 00:21:24,920 --> 00:21:28,280 Speaker 7: that industry has switched over to using AI because then 478 00:21:28,320 --> 00:21:30,480 Speaker 7: instead of having to pay writers you could turn out 479 00:21:30,560 --> 00:21:33,520 Speaker 7: articles for pennies on the dollar, but I think that 480 00:21:33,600 --> 00:21:36,280 Speaker 7: kind of results in a lot of the Internet being 481 00:21:36,359 --> 00:21:39,399 Speaker 7: AI written. It's a little bit, it's also kind of 482 00:21:39,440 --> 00:21:43,040 Speaker 7: platform dependent. It's about forty percent from like an Internet 483 00:21:43,040 --> 00:21:46,600 Speaker 7: page perspective. About a year and a half ago, we 484 00:21:46,640 --> 00:21:49,600 Speaker 7: looked at Medium and found that over fifty percent of 485 00:21:49,840 --> 00:21:54,240 Speaker 7: newly written Medium articles were AI generated, which was a crazy 486 00:21:54,320 --> 00:21:54,840 Speaker 7: high number. 487 00:21:54,880 --> 00:21:55,520 Speaker 3: What about Reddit? 488 00:21:56,160 --> 00:21:58,679 Speaker 7: Reddit, it was seven percent a year ago, I believe 489 00:21:58,920 --> 00:21:59,879 Speaker 7: a little over ten percent. 490 00:22:00,400 --> 00:22:03,280 Speaker 4: Well, actually this reminds me. So I'm on Reddit a 491 00:22:03,280 --> 00:22:05,840 Speaker 4: lot and I really enjoy it nowadays as a platform, 492 00:22:05,880 --> 00:22:07,600 Speaker 4: but I do worry about how much of it is 493 00:22:07,640 --> 00:22:11,280 Speaker 4: being generated by AI. And the thing I don't necessarily 494 00:22:11,359 --> 00:22:16,000 Speaker 4: understand is what are the economic incentives to actually write 495 00:22:16,040 --> 00:22:18,480 Speaker 4: a bunch of AI generated posts on Reddit and get 496 00:22:18,600 --> 00:22:22,439 Speaker 4: upvoted, like why does that system or motivation even exist?
497 00:22:22,760 --> 00:22:25,200 Speaker 7: So there are startups, I'm not going to name names 498 00:22:25,240 --> 00:22:27,520 Speaker 7: because I don't want to promote them, but they will 499 00:22:28,119 --> 00:22:30,480 Speaker 7: sell a promise to companies that we're going to get 500 00:22:30,480 --> 00:22:33,719 Speaker 7: you organic mentions on Reddit. We're going to run our 501 00:22:33,760 --> 00:22:37,320 Speaker 7: AI bots that seem organic, and they're just going to, 502 00:22:37,640 --> 00:22:40,280 Speaker 7: you know, naturally recommend your product or you know, just 503 00:22:40,359 --> 00:22:43,119 Speaker 7: mention your product in the comments or in a post. 504 00:22:43,600 --> 00:22:46,399 Speaker 7: And so I've seen evidence of this. We can find 505 00:22:46,440 --> 00:22:51,520 Speaker 7: these, like they're basically like bot farms that are mostly engaging, 506 00:22:52,000 --> 00:22:55,000 Speaker 7: seemingly organically, just like doing a short reply, and then 507 00:22:55,040 --> 00:22:57,560 Speaker 7: sometimes they're doing this brand mention. And so that's why 508 00:22:57,560 --> 00:22:58,840 Speaker 7: these posts are very valuable. 509 00:22:58,840 --> 00:22:59,680 Speaker 6: That's really interesting. 510 00:22:59,720 --> 00:23:02,280 Speaker 2: I have to, you also imagine it's valuable because all 511 00:23:02,359 --> 00:23:05,280 Speaker 2: of the models train on Reddit, right, and if you 512 00:23:05,359 --> 00:23:09,399 Speaker 2: want your product's name to appear in model outputs, it's like, 513 00:23:09,680 --> 00:23:13,520 Speaker 2: what is the best, you know, nose hair trimmer or whatever, 514 00:23:13,960 --> 00:23:16,320 Speaker 2: and there's a bunch of bots that on Reddit talked 515 00:23:16,320 --> 00:23:18,920 Speaker 2: about this nose hair trimmer, and then that's probably more 516 00:23:18,800 --> 00:23:21,639 Speaker 3: likely to show up in a ChatGPT request, right?
517 00:23:21,760 --> 00:23:23,920 Speaker 7: Yeah, yeah, it's been weirdly gamed. You know, you used 518 00:23:23,920 --> 00:23:26,200 Speaker 7: to just google best nose hair trimmer, and now there's 519 00:23:26,240 --> 00:23:27,160 Speaker 7: like a thousand. 520 00:23:27,400 --> 00:23:29,959 Speaker 4: The Reddit search results like show up first nowadays. 521 00:23:30,080 --> 00:23:31,240 Speaker 6: Yeah, that's where people are looking. 522 00:23:31,480 --> 00:23:34,439 Speaker 7: Yeah, and then people start searching best nose trimmer Reddit 523 00:23:35,280 --> 00:23:37,280 Speaker 7: to get their Reddit comments on it. And now it's 524 00:23:37,680 --> 00:23:39,800 Speaker 7: people have realized that that's what people are searching for. 525 00:23:40,119 --> 00:23:43,480 Speaker 7: So you need to populate Reddit with your advertisements. 526 00:23:44,760 --> 00:23:46,600 Speaker 4: I'm on the Men's health Are you looking for nose 527 00:23:46,640 --> 00:23:47,240 Speaker 4: hair trimmers? 528 00:23:47,440 --> 00:23:50,440 Speaker 2: The Panasonic ear and nose hair trimmer is the number 529 00:23:50,480 --> 00:23:53,800 Speaker 2: one choice on men's health pros. Easy to hold anyway, 530 00:23:53,960 --> 00:23:54,240 Speaker 2: it's not. 531 00:23:54,440 --> 00:23:57,800 Speaker 5: Yeah, it's all these affiliate links. Yeah, just destroyed the Internet. 532 00:23:57,920 --> 00:24:00,760 Speaker 2: I know it's it's too bad, but whatever, talk to 533 00:24:00,840 --> 00:24:03,280 Speaker 2: us more about the whole pipeline. So, I'm very fascinated 534 00:24:03,280 --> 00:24:05,879 Speaker 2: by this idea. It's like, Okay, you see this review 535 00:24:05,960 --> 00:24:08,480 Speaker 2: for Denny's. You have the AI model. 536 00:24:08,600 --> 00:24:10,879 Speaker 3: Try to replicate it as best as it could. Movie 537 00:24:10,880 --> 00:24:13,000 Speaker 3: these subtle differences. 
Talk to us, though, about, like 538 00:24:13,040 --> 00:24:14,000 Speaker 3: the whole pipeline. 539 00:24:14,000 --> 00:24:16,640 Speaker 2: What are the other tests that you're using to get 540 00:24:16,680 --> 00:24:19,760 Speaker 2: the true, you know, because what I imagine you're trying to 541 00:24:19,800 --> 00:24:22,879 Speaker 2: do is get the most similar data sets with an 542 00:24:22,880 --> 00:24:26,760 Speaker 2: almost imperceptible difference to really stress test. Yeah, talk to 543 00:24:26,840 --> 00:24:28,120 Speaker 2: us really about the whole pipeline. 544 00:24:28,160 --> 00:24:28,320 Speaker 4: Yeah. 545 00:24:28,359 --> 00:24:30,359 Speaker 7: So what we're really trying to do here is we're 546 00:24:30,240 --> 00:24:33,240 Speaker 3: As a model maker myself, no, no, sorry, keep going. 547 00:24:33,320 --> 00:24:35,159 Speaker 5: Yeah, as an AI expert, yeah, yeah. 548 00:24:35,000 --> 00:24:36,920 Speaker 3: As an AI expert, I need to hear some tips 549 00:24:36,960 --> 00:24:37,520 Speaker 3: of the field. 550 00:24:38,600 --> 00:24:41,399 Speaker 7: Uh yeah, so what we're really looking for is examples 551 00:24:41,400 --> 00:24:43,800 Speaker 7: that are as close to the boundary between human and 552 00:24:43,840 --> 00:24:47,000 Speaker 7: AI as possible so that our model learns better. Something that's 553 00:24:47,119 --> 00:24:50,399 Speaker 7: very obviously AI is, you know, our model's not learning 554 00:24:50,400 --> 00:24:53,639 Speaker 7: as much, same thing for something that's obviously human. And 555 00:24:53,720 --> 00:24:57,879 Speaker 7: so step one is creating this data set with synthetic 556 00:24:57,920 --> 00:25:00,639 Speaker 7: mirrors of human examples, and then we train a model, 557 00:25:00,960 --> 00:25:03,920 Speaker 7: and then step two is something called active learning.
So 558 00:25:03,960 --> 00:25:06,840 Speaker 7: we then take this model and use it to scan 559 00:25:06,960 --> 00:25:10,920 Speaker 7: a much larger corpus of data and look for errors, 560 00:25:11,200 --> 00:25:14,440 Speaker 7: false positives, false negatives, and then we pull those back 561 00:25:14,480 --> 00:25:17,080 Speaker 7: into our training set and are able to train a 562 00:25:17,160 --> 00:25:20,919 Speaker 7: much better model because it's seen these errors, which, and 563 00:25:20,960 --> 00:25:23,119 Speaker 7: these errors we believe are just much closer to the 564 00:25:23,520 --> 00:25:24,840 Speaker 7: boundary between human and AI. 565 00:25:25,080 --> 00:25:28,040 Speaker 2: So sorry, just to be clear, the first pass is like, okay, 566 00:25:28,080 --> 00:25:31,800 Speaker 2: you have known human writing and known AI writing. You 567 00:25:31,840 --> 00:25:34,760 Speaker 2: train a model, and then the next pass is once 568 00:25:34,800 --> 00:25:38,199 Speaker 2: again known human and known AI writing. So you already 569 00:25:38,240 --> 00:25:41,600 Speaker 2: know the answer of each of these and therefore you 570 00:25:41,640 --> 00:25:44,000 Speaker 2: could come up with a list of which it got wrong, 571 00:25:44,400 --> 00:25:46,840 Speaker 2: and then that gets fed back into the first pass. 572 00:25:46,640 --> 00:25:50,000 Speaker 7: Exactly, and so that makes, once we retrain, then 573 00:25:50,040 --> 00:25:52,760 Speaker 7: the model gets much much better, and then we could 574 00:25:52,840 --> 00:25:55,600 Speaker 7: do this as many times as we want to, kind 575 00:25:55,600 --> 00:25:58,800 Speaker 7: of just have a self improving model that gets better 576 00:25:58,880 --> 00:26:01,600 Speaker 7: with every training run. I can also go 577 00:26:01,640 --> 00:26:04,840 Speaker 7: a little bit more into how we deal with AI edits, 578 00:26:05,000 --> 00:26:08,840 Speaker 7: because I think that's increasingly important.
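The train, scan, pull-errors-back loop described above can be sketched in a few lines. This is a toy under heavy assumptions, not Pangram's pipeline: each document is reduced to a single invented numeric feature, the "model" is just a threshold halfway between the class means, and all names and numbers are hypothetical.

```python
from statistics import mean

def train(examples):
    """Toy model: threshold halfway between the human mean and the AI mean."""
    human = [x for x, label in examples if label == 0]
    ai = [x for x, label in examples if label == 1]
    return (mean(human) + mean(ai)) / 2

def predict(threshold, x):
    """Label 1 (AI) if the feature is above the threshold, else 0 (human)."""
    return 1 if x > threshold else 0

def mine_errors(threshold, pool):
    """Scan a labelled pool and return the examples the model gets wrong."""
    return [(x, label) for x, label in pool if predict(threshold, x) != label]

def active_learning(seed, pool, rounds=3):
    """Retrain repeatedly, each time adding the errors found in the pool."""
    train_set = list(seed)
    threshold = train(train_set)
    for _ in range(rounds):
        errors = [e for e in mine_errors(threshold, pool) if e not in train_set]
        if not errors:
            break  # nothing new to learn from in this pool
        train_set.extend(errors)
        threshold = train(train_set)
    return threshold, train_set

# (feature, label) pairs with label 0 = human, 1 = AI; values are made up.
seed = [(0.1, 0), (0.9, 1)]
pool = [(0.2, 0), (0.55, 0), (0.6, 0), (0.7, 1), (0.8, 1), (0.95, 1)]

first = train(seed)
final, train_set = active_learning(seed, pool)
print(len(mine_errors(first, pool)), len(mine_errors(final, pool)))
```

The examples that get pulled back in are exactly the ones near the threshold, which matches the point made in the conversation: errors tend to sit close to the human/AI boundary, so retraining on them moves the boundary where it matters.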
Problem is, like I 579 00:26:08,880 --> 00:26:12,080 Speaker 7: think most writing will be AI assisted in the future. 580 00:26:12,440 --> 00:26:14,719 Speaker 7: I think it's already in Google Docs and it's in 581 00:26:15,040 --> 00:26:15,760 Speaker 7: Google Keyboard. 582 00:26:16,000 --> 00:26:18,359 Speaker 4: Grammarly arguably has been doing this for a while. 583 00:26:18,520 --> 00:26:18,879 Speaker 5: Exactly. 584 00:26:18,960 --> 00:26:22,480 Speaker 7: Yeah, Grammarly uses LLMs on the back end, and we 585 00:26:22,760 --> 00:26:25,400 Speaker 7: don't want to just say, like, all writing is AI now. 586 00:26:25,520 --> 00:26:28,000 Speaker 7: We want to be able to differentiate between AI assisted 587 00:26:28,280 --> 00:26:30,560 Speaker 7: and AI generated. So what we do is we also 588 00:26:30,640 --> 00:26:34,720 Speaker 7: have different prompts. So rather than saying, so for our 589 00:26:34,960 --> 00:26:38,679 Speaker 7: human review of Denny's, rather than saying, generate a review 590 00:26:38,800 --> 00:26:41,439 Speaker 7: like this, we could say, help improve this, make it 591 00:26:41,480 --> 00:26:43,920 Speaker 7: more formal, make it more like, clean up the grammar. 592 00:26:44,080 --> 00:26:47,320 Speaker 7: And so we have like a long list of AI 593 00:26:47,520 --> 00:26:51,680 Speaker 7: editing prompts, and then we're able to look at basically 594 00:26:51,680 --> 00:26:56,280 Speaker 7: the cosine difference, the distance between the original human text and 595 00:26:56,600 --> 00:26:59,240 Speaker 3: in that hyper multidimensional space. 596 00:26:59,080 --> 00:27:03,800 Speaker 7: Exactly. So how much did AI change this text?
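For readers unfamiliar with the measure, cosine distance between two embedding vectors is one minus the cosine of the angle between them. The sketch below computes it and then maps the result to a coarse label. The three-dimensional vectors and the `light`/`heavy` cutoffs are placeholders chosen for the example, not values from Pangram.

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle) between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def assistance_level(distance, light=0.1, heavy=0.4):
    """Bucket a distance into a label; both cutoffs are hypothetical."""
    if distance < light:
        return "light"
    if distance < heavy:
        return "moderate"
    return "heavy"

# Stand-ins for embeddings of an original text and an AI-edited version.
original = [1.0, 0.0, 0.0]
edited = [0.9, 0.3, 0.0]
d = cosine_distance(original, edited)
print(round(d, 3), assistance_level(d))
```

A small distance means the edited text stayed close to what the person wrote, and a large one means the rewrite moved it substantially, which is the quantity behind the question "how much did AI change this text?"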
And 597 00:27:03,840 --> 00:27:06,119 Speaker 7: then we're able to train our model to say, like 598 00:27:06,760 --> 00:27:09,080 Speaker 7: we're just going to like put a point on this 599 00:27:09,119 --> 00:27:11,960 Speaker 7: distance and say like this is moderate AI assistance, this is 600 00:27:12,040 --> 00:27:14,240 Speaker 7: light AI assistance, and this is heavy AI assistance. 601 00:27:14,560 --> 00:27:16,919 Speaker 4: Interesting. I'm going to do something I don't think I've 602 00:27:16,960 --> 00:27:20,600 Speaker 4: ever done before, which is ask a founder about their 603 00:27:20,680 --> 00:27:24,760 Speaker 4: corporate mission. But you know, you've set up this company, 604 00:27:25,320 --> 00:27:27,359 Speaker 4: and when you think about what you're trying to do here, 605 00:27:27,520 --> 00:27:30,520 Speaker 4: is it just basic AI detection in the sense that 606 00:27:30,560 --> 00:27:32,600 Speaker 4: there might be, you know, a few groups of people 607 00:27:32,720 --> 00:27:35,960 Speaker 4: like teachers that find this very valuable, or is the 608 00:27:36,000 --> 00:27:40,399 Speaker 4: mission something broader where you're actually trying to improve the 609 00:27:40,480 --> 00:27:42,720 Speaker 4: Internet and what people see on it? 610 00:27:43,000 --> 00:27:46,800 Speaker 7: I believe the technology of being able to detect AI 611 00:27:46,840 --> 00:27:51,439 Speaker 7: generated content is immensely valuable, and it's valuable not just 612 00:27:51,480 --> 00:27:55,680 Speaker 7: for teachers, but for basically everybody in every profession. Lawyers, 613 00:27:56,040 --> 00:28:00,560 Speaker 7: publishers, just an individual who consumes content on the Internet. 614 00:28:00,760 --> 00:28:04,480 Speaker 7: I think it's valuable for all these people.
But ultimately, yeah, 615 00:28:04,520 --> 00:28:07,719 Speaker 7: our high level goal is to help mitigate some of 616 00:28:07,760 --> 00:28:11,119 Speaker 7: these negative effects of growing AI content. 617 00:28:11,440 --> 00:28:16,280 Speaker 4: But for instance, just using the product review example, is 618 00:28:16,320 --> 00:28:19,520 Speaker 4: the vision that like a Yelp, for instance, would want 619 00:28:19,520 --> 00:28:22,119 Speaker 4: to use this technology to make sure that its system 620 00:28:22,280 --> 00:28:25,520 Speaker 4: isn't being gamed, or is the vision like, if I 621 00:28:25,560 --> 00:28:28,720 Speaker 4: am a particularly diligent consumer who has a lot of 622 00:28:28,720 --> 00:28:30,800 Speaker 4: time on my hands and I'm looking to go out 623 00:28:30,840 --> 00:28:34,440 Speaker 4: to a restaurant, I can run all these individual restaurant 624 00:28:34,480 --> 00:28:38,400 Speaker 4: reviews through Pangram and then like actually figure out if 625 00:28:38,440 --> 00:28:39,680 Speaker 4: it's real hype or not? 626 00:28:40,280 --> 00:28:42,800 Speaker 7: So I think right now it's a lot of the former. 627 00:28:42,880 --> 00:28:46,000 Speaker 7: We work with platforms. One of our biggest customers is Quora, 628 00:28:46,600 --> 00:28:49,120 Speaker 7: and they run a bunch of content through Pangram. But 629 00:28:49,160 --> 00:28:52,480 Speaker 7: we have a lot of different platforms that use Pangram 630 00:28:52,560 --> 00:28:56,440 Speaker 7: to help moderate and find AI bad actors and get 631 00:28:56,440 --> 00:28:58,640 Speaker 7: them off their platform. But I also think, yeah, the 632 00:28:58,760 --> 00:29:01,920 Speaker 7: individual consumer case has been growing a lot, and we're 633 00:29:01,920 --> 00:29:03,560 Speaker 7: really interested in pushing here. 634 00:29:03,240 --> 00:29:23,320 Speaker 2: The free version of pangram dot com.
Like you 635 00:29:23,360 --> 00:29:26,160 Speaker 2: get a handful of tests a day or something like that. 636 00:29:26,800 --> 00:29:32,440 Speaker 2: If someone had an unlimited number of Pangram responses and 637 00:29:32,840 --> 00:29:36,240 Speaker 2: maybe had access to the Pangram API at infinite scale, 638 00:29:36,960 --> 00:29:40,959 Speaker 2: could they theoretically learn a prompt that they would then 639 00:29:41,040 --> 00:29:43,880 Speaker 2: be able to put into an AI to generate human style writing? 640 00:29:43,920 --> 00:29:46,479 Speaker 7: I actually had a friend do that. He put his 641 00:29:46,560 --> 00:29:49,640 Speaker 7: Claude Code on a loop. I gave him some API credits, 642 00:29:49,680 --> 00:29:53,120 Speaker 7: and then his Claude Code just basically worked overnight writing 643 00:29:53,120 --> 00:29:55,480 Speaker 7: a prompt trying to get it to output something that's 644 00:29:55,520 --> 00:29:58,360 Speaker 7: human written, or that came back from Pangram 645 00:29:58,480 --> 00:30:01,680 Speaker 7: as human written. They got there, but the text was 646 00:30:01,720 --> 00:30:06,760 Speaker 7: pretty like, uh, incoherent, so so like, yeah, it was 647 00:30:06,920 --> 00:30:11,680 Speaker 7: producing more or less long gibberish. It was like grammatically incorrect. 648 00:30:12,600 --> 00:30:14,600 Speaker 7: A lot of the words just didn't really make sense.
649 00:30:14,680 --> 00:30:16,600 Speaker 2: Because this was my first thought, like when I saw it, 650 00:30:16,640 --> 00:30:18,680 Speaker 2: I was like, that would be like a fun experiment 651 00:30:19,120 --> 00:30:21,800 Speaker 2: to see if you could take all the outputs, find 652 00:30:21,800 --> 00:30:24,400 Speaker 2: the difference and just keep iterating on the prompt you 653 00:30:24,400 --> 00:30:27,560 Speaker 2: would have to tell AI in order to eventually get 654 00:30:27,560 --> 00:30:31,240 Speaker 2: an output that looked to Pangram like it was human generated. 655 00:30:31,360 --> 00:30:32,920 Speaker 7: Yeah, I think there's a way to do it if 656 00:30:32,960 --> 00:30:36,080 Speaker 7: you also had like an LLM judge on coherency and 657 00:30:36,200 --> 00:30:40,040 Speaker 7: used like Pangram and the coherency judge both to score 658 00:30:40,160 --> 00:30:43,280 Speaker 7: your text. I think it's definitely possible, and I'm excited 659 00:30:43,280 --> 00:30:44,960 Speaker 7: for someone to try to do it, because we could 660 00:30:44,960 --> 00:30:46,840 Speaker 7: make our model a lot better and more robust if 661 00:30:46,840 --> 00:30:47,480 Speaker 7: this existed. 662 00:30:47,640 --> 00:30:49,719 Speaker 4: So I want to know what your personal like token 663 00:30:49,760 --> 00:30:52,880 Speaker 4: budget is nowadays that you're even like contemplating some of 664 00:30:52,880 --> 00:30:53,360 Speaker 4: this stuff. 665 00:30:53,360 --> 00:30:56,000 Speaker 2: What, I feel like, I have the Claude Max plan, 666 00:30:56,040 --> 00:30:59,400 Speaker 2: you know, and I don't work, like when I'm at work, 667 00:31:00,000 --> 00:31:02,080 Speaker 2: I don't work on any of my vibe coding projects. 668 00:31:02,160 --> 00:31:03,680 Speaker 3: And you know, like when we were kids.
669 00:31:03,840 --> 00:31:06,000 Speaker 2: I don't know if you remember, like if you didn't 670 00:31:06,000 --> 00:31:08,480 Speaker 2: eat all your food, like someone would say, oh, there's 671 00:31:08,480 --> 00:31:09,760 Speaker 2: like starving kids in the world. 672 00:31:10,080 --> 00:31:13,120 Speaker 4: Yeah, I'm like, oh, it's starving vibe coder. 673 00:31:14,280 --> 00:31:15,280 Speaker 3: It's like, oh, you didn't. 674 00:31:15,320 --> 00:31:17,720 Speaker 2: Like I have this four hour token window and I'm 675 00:31:17,760 --> 00:31:20,520 Speaker 2: almost never maxing it out, and I'm just thinking, like, 676 00:31:20,880 --> 00:31:22,600 Speaker 2: there are kids on the other side of the world 677 00:31:22,600 --> 00:31:25,160 Speaker 2: that wish they had your tokens and you're, you're not 678 00:31:25,320 --> 00:31:27,040 Speaker 2: using all of your tokens for the window. 679 00:31:27,120 --> 00:31:27,680 Speaker 3: How dare you? 680 00:31:27,760 --> 00:31:30,360 Speaker 2: I feel a little guilty when I don't max 681 00:31:30,400 --> 00:31:32,760 Speaker 2: out my Claude Max token program. 682 00:31:32,840 --> 00:31:35,400 Speaker 7: I also have Claude Max and yeah, most days I'm 683 00:31:35,640 --> 00:31:37,720 Speaker 7: not doing much coding at all, I'm not maxing it out, 684 00:31:37,840 --> 00:31:39,480 Speaker 7: and then some days I'm going, 685 00:31:39,520 --> 00:31:42,520 Speaker 2: You feel a lot guilty about that though? It's like, yeah, yeah. So can 686 00:31:42,600 --> 00:31:45,960 Speaker 2: I, I just feel like writing is kind of interesting, but like, 687 00:31:46,200 --> 00:31:49,960 Speaker 2: what are the prospects of this being able to work on, say, 688 00:31:50,840 --> 00:31:53,160 Speaker 2: and you must get this a lot, image and video generation? 689 00:31:53,960 --> 00:31:56,680 Speaker 2: Is it all theoretically similar? Is there a reason 690 00:31:56,800 --> 00:31:59,360 Speaker 2: to think that it will be replicable?
Or is this 691 00:31:59,480 --> 00:32:00,960 Speaker 2: just a different beast of a problem? 692 00:32:01,040 --> 00:32:03,760 Speaker 7: I think the approach is definitely doable. I think some 693 00:32:03,840 --> 00:32:06,760 Speaker 7: of the economics change, especially if we look at video 694 00:32:06,840 --> 00:32:09,400 Speaker 7: and the cost of generating video today. Okay, we can't 695 00:32:09,440 --> 00:32:11,920 Speaker 7: generate video at the same scale that we can generate text, 696 00:32:12,400 --> 00:32:14,320 Speaker 7: and so we might need a kind of different approach. 697 00:32:14,680 --> 00:32:17,320 Speaker 7: But I also believe that if we're able to solve 698 00:32:17,360 --> 00:32:21,120 Speaker 7: this for image plus maybe like audio, that could be 699 00:32:21,240 --> 00:32:22,840 Speaker 7: enough to just solve it for video as well. 700 00:32:22,920 --> 00:32:24,000 Speaker 5: Huh, zero shot. 701 00:32:24,120 --> 00:32:27,040 Speaker 4: Could you ever envision, I don't know, launching some sort 702 00:32:27,040 --> 00:32:30,880 Speaker 4: of like certification program for video, because this seems to 703 00:32:30,920 --> 00:32:33,920 Speaker 4: be, my dad's a boomer, spends a lot of time 704 00:32:33,960 --> 00:32:36,960 Speaker 4: on Facebook, like this seems to be what society needs, right, 705 00:32:37,080 --> 00:32:39,240 Speaker 4: like a video that comes with a little thing that 706 00:32:39,280 --> 00:32:42,680 Speaker 4: says this is not AI generated and someone has actually 707 00:32:42,760 --> 00:32:44,320 Speaker 4: like rubber stamped that? 708 00:32:44,360 --> 00:32:47,240 Speaker 7: So there's an organization called C2PA, and I think they're 709 00:32:47,280 --> 00:32:52,000 Speaker 7: doing pretty good work on content provenance.
Basically, they are 710 00:32:52,040 --> 00:32:57,520 Speaker 7: working with phone makers and hardware makers to basically embed 711 00:32:57,640 --> 00:33:02,080 Speaker 7: like hardware signatures to prove that image and video were 712 00:33:02,080 --> 00:33:03,120 Speaker 7: truly taken from 713 00:33:03,000 --> 00:33:05,120 Speaker 4: the hardware. Like watermarks, basically. 714 00:33:04,840 --> 00:33:07,720 Speaker 7: Yeah, exactly. So rather than marking the AI outputs, yeah, 715 00:33:07,760 --> 00:33:11,400 Speaker 7: we're instead embedding like a proof of authenticity in the 716 00:33:12,360 --> 00:33:15,080 Speaker 7: like thing that's real and is captured 717 00:33:14,760 --> 00:33:15,200 Speaker 5: in real life. 718 00:33:15,280 --> 00:33:19,480 Speaker 3: That's interesting, all right, So big picture, where's the Internet going? 719 00:33:19,640 --> 00:33:21,440 Speaker 2: You know, you mentioned forty percent of the Internet is 720 00:33:21,440 --> 00:33:24,560 Speaker 2: already AI generated, but maybe that's not the end of the world, like, 721 00:33:25,000 --> 00:33:26,719 Speaker 2: you know, if it's just a bunch of SEO pages 722 00:33:26,760 --> 00:33:29,160 Speaker 2: that I never read, I don't know, whatever. But like 723 00:33:29,560 --> 00:33:31,840 Speaker 2: give us some thoughts high level about like the 724 00:33:31,880 --> 00:33:35,800 Speaker 2: trajectory of the Internet, regardless of the uptake of Pangram 725 00:33:35,800 --> 00:33:37,360 Speaker 2: and other AI detection models. 726 00:33:37,560 --> 00:33:40,600 Speaker 5: I'm a little bit worried about the state of the Internet. 727 00:33:40,600 --> 00:33:41,440 Speaker 5: I'm gonna be honest.
728 00:33:41,880 --> 00:33:44,720 Speaker 7: I think like right now, there's still like so much 729 00:33:44,760 --> 00:33:47,400 Speaker 7: of it is built around trust and norms in a 730 00:33:47,440 --> 00:33:50,480 Speaker 7: way that like we're not really well equipped to 731 00:33:50,680 --> 00:33:53,720 Speaker 7: suddenly deal with an onslaught of bots at a completely 732 00:33:53,720 --> 00:33:55,320 Speaker 7: different scale than we've dealt with before. 733 00:33:55,920 --> 00:33:58,240 Speaker 5: There's maybe like a good case and a bad case. 734 00:33:58,480 --> 00:34:00,560 Speaker 7: I would say, like the bad case is the Internet 735 00:34:00,680 --> 00:34:04,240 Speaker 7: goes the way of dead internet theory, just like every 736 00:34:04,280 --> 00:34:07,280 Speaker 7: space that's open and accessible is just flooded by bots, 737 00:34:07,600 --> 00:34:10,000 Speaker 7: and then the only place people are able to communicate 738 00:34:10,040 --> 00:34:14,239 Speaker 7: authentically is in like very walled garden like closed servers 739 00:34:14,280 --> 00:34:17,280 Speaker 7: like Discord servers for example, where you know everybody's 740 00:34:17,360 --> 00:34:19,000 Speaker 7: identity is known and you know you don't 741 00:34:18,800 --> 00:34:21,600 Speaker 5: have bots in here. So that's maybe the like bad scenario. 742 00:34:21,920 --> 00:34:24,399 Speaker 2: Can I share an insane thought that I've had? Go on. 743 00:34:25,360 --> 00:34:28,440 Speaker 2: You're gonna get a kick out of this. So when like I 744 00:34:28,480 --> 00:34:30,799 Speaker 2: forget what they call like this idea of like for 745 00:34:30,880 --> 00:34:31,880 Speaker 2: the bad actors, it's 746 00:34:31,680 --> 00:34:34,200 Speaker 3: called like heaven mode or heaven banning. Have you heard 747 00:34:34,200 --> 00:34:36,640 Speaker 3: of this? So there's this thought that one way
748 00:34:36,520 --> 00:34:40,319 Speaker 2: You could deal with bad actors on the Internet is 749 00:34:41,280 --> 00:34:44,480 Speaker 2: suddenly they're on a version of say Twitter, in which 750 00:34:44,520 --> 00:34:47,480 Speaker 2: they're only bots and everyone always agrees with them on 751 00:34:47,520 --> 00:34:50,080 Speaker 2: everything and it drives them crazy and stuff like that, 752 00:34:50,320 --> 00:34:52,239 Speaker 2: and they would never know it because they're like, oh, 753 00:34:52,239 --> 00:34:54,160 Speaker 2: there's call, everyone's there, and then it's so like slowly 754 00:34:54,200 --> 00:34:56,040 Speaker 2: like yeah, they just this is like a way you 755 00:34:56,080 --> 00:34:58,279 Speaker 2: could punish people by putting them on an internet where 756 00:34:58,320 --> 00:34:59,480 Speaker 2: they will never get any fight. 757 00:35:00,120 --> 00:35:02,560 Speaker 7: Band and put into basically jail. You're talking a bunch. 758 00:35:02,360 --> 00:35:04,040 Speaker 3: Of that's right, that's right, that would be jail. But 759 00:35:04,080 --> 00:35:04,799 Speaker 3: you're heaven banned. 760 00:35:04,920 --> 00:35:07,080 Speaker 2: But I thought, and again, this is you know, like 761 00:35:07,080 --> 00:35:09,000 Speaker 2: I built this little am model myself and I like 762 00:35:09,000 --> 00:35:11,399 Speaker 2: showed it to my friends, like, oh, it's really cool, Joe. 763 00:35:11,400 --> 00:35:13,719 Speaker 2: I'm really oppressed, Like I'm really impressed by like that 764 00:35:13,760 --> 00:35:16,239 Speaker 2: you're able to do this. And I was like, are 765 00:35:16,280 --> 00:35:18,520 Speaker 2: people being honest with me? Have I been heaven banned? 766 00:35:18,520 --> 00:35:20,799 Speaker 2: Because I just like, like, you can be honest with 767 00:35:20,840 --> 00:35:21,560 Speaker 2: me if it sucks. 768 00:35:21,560 --> 00:35:23,400 Speaker 3: And I sort of have the fear. 
769 00:35:23,360 --> 00:35:26,840 Speaker 4: The biggest humble braggad this thing and everyone thought it 770 00:35:26,880 --> 00:35:27,399 Speaker 4: was not great. 771 00:35:27,520 --> 00:35:29,279 Speaker 3: I'm just saying, like people are like I think people. 772 00:35:29,320 --> 00:35:31,560 Speaker 3: I'm worried that like people bring nice to me because like, 773 00:35:31,560 --> 00:35:33,400 Speaker 3: oh cool, Yeah that's repressed. You like did that. 774 00:35:33,560 --> 00:35:36,440 Speaker 2: And I have this like deep anxiety that like people 775 00:35:36,440 --> 00:35:38,520 Speaker 2: aren't giving it to me straight about it. I know 776 00:35:38,560 --> 00:35:40,120 Speaker 2: that sounds like a humble brag, but it's really not. 777 00:35:40,320 --> 00:35:42,120 Speaker 7: That's why you can never get like too successful, like 778 00:35:42,200 --> 00:35:45,080 Speaker 7: Maya West surrounded by a bunch of you never get. 779 00:35:44,880 --> 00:35:47,799 Speaker 2: Like, oh, this is his first try doing something with 780 00:35:47,960 --> 00:35:50,080 Speaker 2: vibe coding. I'm like deeply anxious, Like, no, you could 781 00:35:50,120 --> 00:35:52,480 Speaker 2: just tell me if it sucks, that's fine, that's my worry. 782 00:35:53,000 --> 00:35:53,920 Speaker 6: I don't worry about this. 783 00:35:54,040 --> 00:35:56,439 Speaker 4: If I tweet that I'm eating a steak, I will 784 00:35:56,440 --> 00:35:59,520 Speaker 4: get like a hundred people criticized and you didn't. 785 00:35:59,360 --> 00:35:59,839 Speaker 3: Put the meat. 786 00:36:00,120 --> 00:36:00,520 Speaker 2: Yeah. 787 00:36:00,560 --> 00:36:00,960 Speaker 5: Yeah. 
788 00:36:01,000 --> 00:36:02,839 Speaker 2: So that's the other thing, which is that the two 789 00:36:02,920 --> 00:36:06,560 Speaker 2: things you are never allowed to tweet about meat preparation 790 00:36:07,160 --> 00:36:09,640 Speaker 2: and enjoying life, because if you ever enjoy life, then 791 00:36:09,640 --> 00:36:11,600 Speaker 2: if you ever enjoy it, and if you ever prepare. 792 00:36:11,360 --> 00:36:14,280 Speaker 3: Meat, people will flip out at you on the internet. 793 00:36:14,360 --> 00:36:16,279 Speaker 3: Those are the two things that you're not allowed to 794 00:36:16,360 --> 00:36:17,080 Speaker 3: do online. 795 00:36:17,280 --> 00:36:19,759 Speaker 4: Very true, this sort of related question, But just going 796 00:36:19,800 --> 00:36:22,600 Speaker 4: back to the methodology, if you're focused on this sort 797 00:36:22,600 --> 00:36:26,000 Speaker 4: of like path dependent idea, I'm kind of envisioning it 798 00:36:26,040 --> 00:36:29,279 Speaker 4: as like a giant decision tree, right, is there a 799 00:36:29,320 --> 00:36:32,839 Speaker 4: possibility that as the models get better and better, and 800 00:36:32,880 --> 00:36:35,839 Speaker 4: we know that they're already injecting like some degree of 801 00:36:36,120 --> 00:36:39,800 Speaker 4: randomness into their output. Although I know there's going to 802 00:36:39,800 --> 00:36:42,000 Speaker 4: be a pedant out there who like messages me and 803 00:36:42,040 --> 00:36:44,880 Speaker 4: says like, well, you know computers can't do like true randomness. 804 00:36:44,880 --> 00:36:49,480 Speaker 4: But setting that aside, setting that aside, like, we know 805 00:36:49,560 --> 00:36:53,640 Speaker 4: that they're adjusting, they're becoming more sophisticated at an incredible rate. 806 00:36:53,719 --> 00:36:57,480 Speaker 4: We know that they're trying to adjust and inject some 807 00:36:57,719 --> 00:37:01,000 Speaker 4: randomness in order to avoid exactly this kind of detection. 
808 00:37:01,880 --> 00:37:05,160 Speaker 4: Do you worry about their own adaptation at all? 809 00:37:05,480 --> 00:37:08,600 Speaker 7: I have noticed that the models as they get more capable, 810 00:37:08,880 --> 00:37:12,279 Speaker 7: I believe that their output distribution gets more complex. It's 811 00:37:12,320 --> 00:37:14,920 Speaker 7: harder to learn with a simple model, which is why 812 00:37:14,960 --> 00:37:18,560 Speaker 7: we've been increasing our model size to capture a higher 813 00:37:18,600 --> 00:37:22,319 Speaker 7: complexity function that can capture the LLM outputs. So I 814 00:37:22,320 --> 00:37:25,719 Speaker 7: think we may have to continue to make our models better. 815 00:37:25,960 --> 00:37:27,359 Speaker 7: We're gonna have to work to keep up with it. 816 00:37:27,719 --> 00:37:29,400 Speaker 7: We can't just rest on our laurels. 817 00:37:29,560 --> 00:37:31,399 Speaker 3: What are burstiness and perplexity? 818 00:37:31,760 --> 00:37:34,799 Speaker 7: Yeah, so this is a metric that's used by some 819 00:37:34,920 --> 00:37:37,960 Speaker 7: AI detectors, but not Pangram, okay. And so I can 820 00:37:38,000 --> 00:37:41,319 Speaker 7: explain a bit about how it works. So perplexity is 821 00:37:41,480 --> 00:37:42,799 Speaker 7: basically a measure of 822 00:37:42,800 --> 00:37:45,040 Speaker 2: This is not perplexity dot AI the website. This is a 823 00:37:45,080 --> 00:37:45,680 Speaker 2: technical term. 824 00:37:45,719 --> 00:37:48,640 Speaker 7: Okay, this is a metric. This is a measure of 825 00:37:48,719 --> 00:37:52,760 Speaker 7: how confusing a piece of text is to a language model. 826 00:37:53,320 --> 00:37:58,080 Speaker 7: So basically, for example, with every token, we can 827 00:37:58,120 --> 00:38:00,800 Speaker 7: calculate some perplexity, which is basically like how expected 828 00:38:00,840 --> 00:38:03,600 Speaker 7: this is.
So for example, like if it's I went 829 00:38:03,640 --> 00:38:06,560 Speaker 7: home to my pet and then the next token is chinchilla, 830 00:38:06,840 --> 00:38:09,000 Speaker 7: that'd be a much higher perplexity token 831 00:38:08,960 --> 00:38:09,880 Speaker 5: than my pet dog. 832 00:38:10,600 --> 00:38:16,000 Speaker 7: So low perplexity text, or really like LLM outputs tend 833 00:38:16,000 --> 00:38:19,040 Speaker 7: to be low perplexity. They're not going to produce outputs 834 00:38:19,080 --> 00:38:22,960 Speaker 7: that are surprising to themselves. So this is a decent 835 00:38:23,000 --> 00:38:26,160 Speaker 7: way to get an AI detector that's around ninety to 836 00:38:26,239 --> 00:38:30,000 Speaker 7: ninety five percent accurate. But it has some problems. The 837 00:38:30,000 --> 00:38:33,920 Speaker 7: main one is that you can't improve upon it. Basically 838 00:38:34,160 --> 00:38:38,160 Speaker 7: it has false positives. Text written by non native English 839 00:38:38,160 --> 00:38:41,440 Speaker 7: speakers often is low perplexity just because when you're late. 840 00:38:41,440 --> 00:38:42,880 Speaker 3: Don't take as many risks. Exactly. 841 00:38:43,000 --> 00:38:46,400 Speaker 7: Yeah, interesting, Yeah, So that's why a lot of the 842 00:38:46,440 --> 00:38:49,440 Speaker 7: early AI detectors had a bunch of false positives with 843 00:38:49,800 --> 00:38:53,640 Speaker 7: ESL speakers. It's because their text was low perplexity. So 844 00:38:54,080 --> 00:38:56,600 Speaker 7: I think, like, this is a very cool metric, but 845 00:38:56,800 --> 00:38:59,120 Speaker 7: it is not the path for Pangram. 846 00:38:59,120 --> 00:39:01,520 Speaker 5: Instead, we went the deep approach, so we can do 847 00:39:01,600 --> 00:39:02,120 Speaker 5: better than. 848 00:39:02,040 --> 00:39:04,359 Speaker 3: And what's burstiness? Is that just the opposite side 849 00:39:04,360 --> 00:39:04,759 Speaker 3: of the coin?
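The perplexity idea described above can be sketched in a few lines. This is a toy illustration only, not how Pangram or any named detector works: the probabilities are invented, the helper names (`surprisal`, `perplexity`, `looks_machine_written`) are hypothetical, and the threshold is arbitrary. It assumes the common definition of perplexity as the exponential of the average negative log-probability per token.

```python
import math

# Hypothetical probabilities a language model might assign to the next token
# after "I went home to my pet". These numbers are made up for illustration;
# a real detector would query an actual model.
NEXT_TOKEN_PROBS = {"dog": 0.60, "cat": 0.30, "chinchilla": 0.001}


def surprisal(prob):
    """Surprisal in nats: the less expected the token, the larger the value."""
    return -math.log(prob)


def perplexity(token_probs):
    """Perplexity of a sequence: exp of the mean per-token surprisal."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)


def looks_machine_written(token_probs, threshold=3.0):
    """Naive rule: flag text whose perplexity is below a threshold.

    The threshold is an arbitrary assumption. As the guest notes, a rule like
    this also flags cautious human writing, for example by ESL speakers.
    """
    return perplexity(token_probs) < threshold


# Made-up probabilities for the shared context tokens, then the final token.
context = [0.5, 0.4, 0.6, 0.5, 0.7, 0.6]
dog_sentence = context + [NEXT_TOKEN_PROBS["dog"]]
chinchilla_sentence = context + [NEXT_TOKEN_PROBS["chinchilla"]]

if __name__ == "__main__":
    print("dog:", round(perplexity(dog_sentence), 2))
    print("chinchilla:", round(perplexity(chinchilla_sentence), 2))
```

In this sketch the "chinchilla" ending raises the perplexity of the whole sentence, so only the predictable "dog" sentence falls under the threshold. That same mechanism is why predictable but human-written text can be flagged as a false positive.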
850 00:39:05,239 --> 00:39:09,040 Speaker 7: Yeah, burstiness is basically, actually, yeah, I don't know if 851 00:39:09,040 --> 00:39:09,600 Speaker 7: I can define it. 852 00:39:09,719 --> 00:39:13,319 Speaker 4: Okay, fine, burstiness just sounds like one of those like 853 00:39:13,560 --> 00:39:16,960 Speaker 4: sort of I guess manosphere terms, doesn't it? Like, oh, 854 00:39:17,040 --> 00:39:17,520 Speaker 4: yeah he 855 00:39:17,480 --> 00:39:20,320 Speaker 6: has like he's been looksmaxing with high burstiness or 856 00:39:20,360 --> 00:39:20,759 Speaker 6: something like that. 857 00:39:21,440 --> 00:39:22,200 Speaker 3: Yeah, that's great. 858 00:39:22,239 --> 00:39:24,080 Speaker 7: Yeah, I think it might just be like a measure 859 00:39:24,160 --> 00:39:27,840 Speaker 7: of like sentence length and the ups and downs of 860 00:39:27,880 --> 00:39:28,320 Speaker 7: the text. 861 00:39:28,960 --> 00:39:32,279 Speaker 4: If we assume that the world is collectively concerned about 862 00:39:32,280 --> 00:39:34,960 Speaker 4: AI slop and wants to do something about it, what 863 00:39:35,000 --> 00:39:39,120 Speaker 4: would be like the single biggest change to the system, 864 00:39:39,480 --> 00:39:42,080 Speaker 4: either in terms of like the economics of the internet 865 00:39:42,160 --> 00:39:46,120 Speaker 4: or regulation or technology like what you're developing that would 866 00:39:46,160 --> 00:39:48,160 Speaker 4: actually help reduce slop? 867 00:39:48,440 --> 00:39:51,080 Speaker 7: Yeah, I think the biggest one is norms. So there 868 00:39:51,080 --> 00:39:53,400 Speaker 7: have been a couple of great blog posts written about 869 00:39:53,440 --> 00:39:58,120 Speaker 7: how it is rude to send other people undisclosed AI outputs, 870 00:39:58,719 --> 00:40:02,359 Speaker 7: and I think I like completely agree here.
I think, 871 00:40:02,480 --> 00:40:04,239 Speaker 7: you know, if somebody like asks a question on the 872 00:40:04,239 --> 00:40:06,759 Speaker 7: Internet and then somebody else like goes and puts it into 873 00:40:06,800 --> 00:40:08,960 Speaker 7: ChatGPT and then like pastes the answer, it's kind 874 00:40:08,960 --> 00:40:10,560 Speaker 7: of rude, like I was going here to ask 875 00:40:10,800 --> 00:40:13,879 Speaker 7: the opinions of my friends or you know, my followers, not 876 00:40:14,080 --> 00:40:16,520 Speaker 5: just like, not ChatGPT. I could have done that myself. 877 00:40:16,840 --> 00:40:19,640 Speaker 7: And so I think, like building this norm is something 878 00:40:19,680 --> 00:40:22,120 Speaker 7: that you know, it's very new technology, so we need 879 00:40:22,160 --> 00:40:23,040 Speaker 7: to do it quickly. 880 00:40:23,080 --> 00:40:25,760 Speaker 5: But I think this would help a lot for society. 881 00:40:25,800 --> 00:40:27,880 Speaker 2: Well, that actually just gets to a question that I 882 00:40:27,920 --> 00:40:30,680 Speaker 2: have then, which is I feel as though the major 883 00:40:30,719 --> 00:40:34,560 Speaker 2: Internet platforms are actually moving the exact opposite direction. I mean, 884 00:40:34,560 --> 00:40:38,320 Speaker 2: I'm stunned. Maybe I accidentally clicked on something at some point, 885 00:40:38,600 --> 00:40:41,520 Speaker 2: but the frequency with which I get an email and then 886 00:40:41,560 --> 00:40:43,759 Speaker 2: I open it up to respond in Gmail, and there's 887 00:40:43,800 --> 00:40:47,000 Speaker 2: that ghost text there that, do you just want 888 00:40:47,040 --> 00:40:48,279 Speaker 2: Gemini to respond to this? 889 00:40:48,640 --> 00:40:49,680 Speaker 3: I've never done 890 00:40:49,480 --> 00:40:52,040 Speaker 2: that. I also consider, I think that would be extremely rude.
891 00:40:52,040 --> 00:40:56,719 Speaker 2: I've never responded to any email with an AI response. But 892 00:40:56,760 --> 00:40:59,239 Speaker 2: they're basically telling you to do that. They're doing the 893 00:40:59,239 --> 00:41:01,720 Speaker 2: exact opposite, blowing up these norms. And so I'm curious 894 00:41:01,719 --> 00:41:04,680 Speaker 2: from your perspective, you managed to work with Quora, but 895 00:41:04,920 --> 00:41:09,400 Speaker 2: from your impression, do the major internet platforms think this 896 00:41:09,560 --> 00:41:12,279 Speaker 2: is a problem worth solving or from their consider and 897 00:41:12,280 --> 00:41:14,320 Speaker 2: it is like you know what, Yeah, it feels content 898 00:41:14,400 --> 00:41:14,759 Speaker 2: the better. 899 00:41:14,840 --> 00:41:17,800 Speaker 4: There's mixed incentives for the big company. 900 00:41:17,800 --> 00:41:20,360 Speaker 7: It's funny because like Google seems to be playing both sides. 901 00:41:20,640 --> 00:41:23,680 Speaker 7: So like, on one hand, they had that advertisement which 902 00:41:23,680 --> 00:41:25,680 Speaker 7: people kind of blew up about where it's like, oh, 903 00:41:25,800 --> 00:41:29,480 Speaker 7: children can now send their heroes notes on like how 904 00:41:29,560 --> 00:41:31,799 Speaker 7: much they respect them by using AI instead of like 905 00:41:32,040 --> 00:41:34,160 Speaker 7: writing the note themselves, and like this is wrong, this 906 00:41:34,239 --> 00:41:37,560 Speaker 7: is like societally bad. But at the same time, they're 907 00:41:37,600 --> 00:41:40,799 Speaker 7: working very hard to deal with the AI slop on 908 00:41:40,880 --> 00:41:43,520 Speaker 7: the Internet in search results to make sure people get 909 00:41:43,560 --> 00:41:45,040 Speaker 7: served real content and not 910 00:41:45,000 --> 00:41:45,960 Speaker 5: AI slop content.
911 00:41:46,640 --> 00:41:49,279 Speaker 7: So I think, I mean, I think obviously there's a 912 00:41:49,320 --> 00:41:51,640 Speaker 7: lot of incentives that play up around like product people 913 00:41:51,680 --> 00:41:55,000 Speaker 7: who are incentivized to push AI because that is the 914 00:41:55,040 --> 00:41:59,359 Speaker 7: corporate mandate. But yeah, I think overall, even like in 915 00:41:59,400 --> 00:42:02,000 Speaker 7: my sphere, a bunch of people who are AI researchers, 916 00:42:02,640 --> 00:42:06,520 Speaker 7: generally consensus is that like AI is a powerful tool, 917 00:42:06,560 --> 00:42:07,600 Speaker 7: but like slop is bad. 918 00:42:07,880 --> 00:42:10,840 Speaker 4: This reminds me my parents used to make me do 919 00:42:10,960 --> 00:42:15,080 Speaker 4: these like handmade greeting cards for every you know, for Christmas, 920 00:42:15,120 --> 00:42:17,160 Speaker 4: for like all relatives and stuff. And it was supposed 921 00:42:17,200 --> 00:42:22,319 Speaker 4: to be a demonstration of my commitment to communicating family. No, no, 922 00:42:22,400 --> 00:42:25,799 Speaker 4: it traumatized me forever. And I hate greeting cards as 923 00:42:25,840 --> 00:42:28,680 Speaker 4: a result of them of doing this, just spending hours 924 00:42:28,800 --> 00:42:31,840 Speaker 4: manufacturing these things. But then, secondly, the funniest thing was 925 00:42:31,920 --> 00:42:36,040 Speaker 4: once we got E cards, my parents immediately switched to 926 00:42:36,200 --> 00:42:40,080 Speaker 4: using e cards and just and now this is also 927 00:42:40,120 --> 00:42:40,879 Speaker 4: the funniest thing. 928 00:42:41,080 --> 00:42:42,359 Speaker 6: My dad uses E card. 929 00:42:42,400 --> 00:42:44,480 Speaker 4: He figured out that the E card system can tell 930 00:42:44,560 --> 00:42:46,680 Speaker 4: him whether or not you opened it, so he just 931 00:42:46,800 --> 00:42:48,680 Speaker 4: uses it as like day to day communication. 
932 00:42:48,880 --> 00:42:51,840 Speaker 5: Now that's so funny. 933 00:42:51,880 --> 00:42:54,839 Speaker 3: Just send an email to your daughter E card. 934 00:42:55,120 --> 00:42:56,840 Speaker 4: It's like, I noticed you haven't opened up my E 935 00:42:57,000 --> 00:43:01,640 Speaker 4: card for International Hot Dog Day. Please let me know 936 00:43:01,920 --> 00:43:02,560 Speaker 4: what's going on. 937 00:43:02,640 --> 00:43:05,640 Speaker 2: I had terrible handwriting as a kid, and my mother made 938 00:43:05,640 --> 00:43:08,480 Speaker 2: me write all of these handwritten notes to thank people 939 00:43:08,520 --> 00:43:09,440 Speaker 2: for the gifts I got for 940 00:43:09,480 --> 00:43:10,400 Speaker 3: my bar mitzvah. 941 00:43:10,480 --> 00:43:12,839 Speaker 2: Yeah, I hated it, but you know what, I have 942 00:43:12,960 --> 00:43:14,359 Speaker 2: kept connections with all of 943 00:43:14,320 --> 00:43:16,360 Speaker 3: those people that have lasted over the years. 944 00:43:16,760 --> 00:43:19,400 Speaker 2: In that miserable one week where I just wrote and 945 00:43:19,440 --> 00:43:21,600 Speaker 2: I got, you know, hand cramped, I think it 946 00:43:21,560 --> 00:43:22,520 Speaker 3: paid off, all right. 947 00:43:22,520 --> 00:43:27,400 Speaker 4: Well, imagine doing that for like sixteen years basically in 948 00:43:27,400 --> 00:43:28,960 Speaker 4: a never ending stream. 949 00:43:29,000 --> 00:43:31,360 Speaker 3: Max Spero, thank you so much for coming on Odd Lots. 950 00:43:31,400 --> 00:43:33,600 Speaker 3: That was a lot of fun. I'm fascinated by this conversation. 951 00:43:33,800 --> 00:43:35,759 Speaker 7: Thanks so much for having me. Yeah, really exciting to 952 00:43:35,800 --> 00:43:38,480 Speaker 7: talk about this. And I think slop is a growing problem, 953 00:43:38,520 --> 00:43:40,160 Speaker 7: so hopefully awesome RAPK deal with it. 954 00:43:41,120 --> 00:43:42,200 Speaker 6: Of the internet, I.
955 00:43:42,200 --> 00:43:44,040 Speaker 4: Can't tell if I'm surprised by that oring on. 956 00:43:44,280 --> 00:43:45,960 Speaker 3: And what's it going to be next year at this time? 957 00:43:46,280 --> 00:43:47,399 Speaker 5: Oh man, I don't know. 958 00:43:47,760 --> 00:43:49,800 Speaker 3: It'll be like hard to stay over with Georgian that 959 00:43:49,880 --> 00:43:50,280 Speaker 3: for sure. 960 00:43:50,719 --> 00:43:52,120 Speaker 5: Yeah, almost certainly crazy. 961 00:43:52,400 --> 00:43:53,480 Speaker 3: All right, thanks for coming on. 962 00:43:53,440 --> 00:44:02,560 Speaker 5: Oudlin, Thanks. 963 00:44:07,440 --> 00:44:08,920 Speaker 3: Tracy. I love that conversation. 964 00:44:09,000 --> 00:44:10,799 Speaker 2: I just think it's like a really fun puzzle, right, 965 00:44:11,719 --> 00:44:15,840 Speaker 2: It's very like it seems like a fun question to solve, 966 00:44:15,920 --> 00:44:19,520 Speaker 2: And I'm fascinated by this idea of how like with 967 00:44:19,719 --> 00:44:24,239 Speaker 2: both humans and AI, there's gonna be this gap inevitable 968 00:44:24,480 --> 00:44:27,319 Speaker 2: between what we know and what we can articulate because 969 00:44:27,360 --> 00:44:29,479 Speaker 2: you and I both setting aside a a versus text, 970 00:44:29,640 --> 00:44:32,160 Speaker 2: there are things that we both know. 
For example, this 971 00:44:32,200 --> 00:44:34,760 Speaker 2: is newsworthy, and this is a good episode 972 00:44:34,800 --> 00:44:37,880 Speaker 2: of a podcast, this is a credible sounding guest, and 973 00:44:37,920 --> 00:44:41,040 Speaker 2: this isn't. The gap between that and then being able 974 00:44:41,080 --> 00:44:43,520 Speaker 2: to explain why, it's like, well, you just sort of 975 00:44:43,600 --> 00:44:45,560 Speaker 2: know it, right? You just sort of have this feeling there, 976 00:44:46,560 --> 00:44:50,479 Speaker 2: and that intuition is built up from numerous examples, which 977 00:44:50,520 --> 00:44:52,200 Speaker 2: is the same way in a sense that like the 978 00:44:52,239 --> 00:44:53,240 Speaker 2: AI is trained. 979 00:44:53,320 --> 00:44:54,360 Speaker 3: It's like these 980 00:44:54,239 --> 00:44:56,760 Speaker 2: things that you only know from patterns and you can 981 00:44:57,160 --> 00:45:00,520 Speaker 2: see them without fully being able to, like articulate exactly 982 00:45:00,560 --> 00:45:01,160 Speaker 2: what's going on. 983 00:45:01,280 --> 00:45:02,360 Speaker 6: Well, the other 984 00:45:02,239 --> 00:45:05,239 Speaker 4: question I would have on that is, is it even 985 00:45:05,280 --> 00:45:07,680 Speaker 4: going to matter in the long run if you think about, 986 00:45:07,719 --> 00:45:10,960 Speaker 4: like so much of the Internet is already built on 987 00:45:11,040 --> 00:45:14,120 Speaker 4: bots and the sort of like false attention economy, like 988 00:45:14,800 --> 00:45:21,680 Speaker 4: if our entire like worldview becomes shaped by AI driven drivel, yeah, 989 00:45:22,560 --> 00:45:25,440 Speaker 4: does it matter if like the economics of the Internet 990 00:45:25,560 --> 00:45:28,759 Speaker 4: are still attached to individual bot accounts and things like that? 991 00:45:28,760 --> 00:45:31,640 Speaker 6: I don't know if I'm explaining this, but.
992 00:45:31,760 --> 00:45:33,040 Speaker 2: No, no, I think it makes a lot of sense, 993 00:45:33,080 --> 00:45:36,160 Speaker 2: and I do think like it is important, like we're. 994 00:45:36,040 --> 00:45:37,799 Speaker 3: Going to have to change the entire way with them. 995 00:45:38,000 --> 00:45:40,399 Speaker 2: And Max said at the beginning, which is, and I've 996 00:45:40,400 --> 00:45:42,759 Speaker 2: thought about this, which is that it used to be 997 00:45:43,120 --> 00:45:45,000 Speaker 2: that if you came across a piece of writing and 998 00:45:45,040 --> 00:45:49,120 Speaker 2: the punctuation was excellent and the spelling was excellent, and 999 00:45:49,160 --> 00:45:51,680 Speaker 2: it was like cogent sounding, you're like, okay, this has 1000 00:45:51,680 --> 00:45:55,239 Speaker 2: been written by a smart person. I will read the seriously, right, 1001 00:45:55,880 --> 00:45:59,160 Speaker 2: And now there is this complete severance of sort of 1002 00:45:59,200 --> 00:46:02,440 Speaker 2: like craft and out put because you could and you 1003 00:46:02,520 --> 00:46:05,759 Speaker 2: did this, Like, ask Claude to write an argument in 1004 00:46:05,800 --> 00:46:10,239 Speaker 2: favor of the most absurd proposition imaginable. 
Ask Claude to 1005 00:46:11,000 --> 00:46:15,520 Speaker 2: write an argument for me that the reason why Reagan 1006 00:46:15,560 --> 00:46:18,200 Speaker 2: wanted to do tax cuts in the early nineteen eighties 1007 00:46:18,680 --> 00:46:22,200 Speaker 2: related to these reports of UFO sightings in the nineteen seventies, 1008 00:46:22,600 --> 00:46:25,719 Speaker 2: and it will write something that not only is it 1009 00:46:25,760 --> 00:46:28,319 Speaker 2: grammatically correct, it'll actually like strain to come up with 1010 00:46:28,360 --> 00:46:31,000 Speaker 2: the best version of this argument. Before, and again if 1011 00:46:31,080 --> 00:46:33,560 Speaker 2: prior to that, having read and like, maybe the person 1012 00:46:34,160 --> 00:46:37,000 Speaker 2: like this person took this argument seriously, but now this 1013 00:46:37,080 --> 00:46:39,880 Speaker 2: argument is just created ex nihilo. We're going to 1014 00:46:39,960 --> 00:46:42,440 Speaker 2: have to really like change our heuristics about this stuff. 1015 00:46:42,480 --> 00:46:46,600 Speaker 4: We've created an unlimited stream of basically cranks with 1016 00:46:46,680 --> 00:46:47,480 Speaker 4: really good grammar. 1017 00:46:47,640 --> 00:46:50,520 Speaker 2: Yeah, that's right, that's right, because it used to be 1018 00:46:50,560 --> 00:46:52,360 Speaker 2: we knew the crank because they had bad grammar, or 1019 00:46:52,360 --> 00:46:55,239 Speaker 2: they would email us and like half the words would 1020 00:46:55,239 --> 00:46:57,520 Speaker 2: be in yellow and the other half would be underlined green. 1021 00:46:57,560 --> 00:47:01,279 Speaker 4: Inlastic exams, the tools that we use to just like, oh, 1022 00:47:01,280 --> 00:47:03,920 Speaker 4: this person's a crank, they like, you know, half the 1023 00:47:04,000 --> 00:47:05,799 Speaker 4: words are in all caps and stuff like that. 1024 00:47:06,200 --> 00:47:07,280 Speaker 3: Those don't work anymore.
1025 00:47:07,320 --> 00:47:09,440 Speaker 4: All right, on that note, shall we leave it there? 1026 00:47:09,520 --> 00:47:10,160 Speaker 3: Let's leave it there. 1027 00:47:10,320 --> 00:47:12,799 Speaker 4: This has been another episode of the Odd Lots podcast. I'm 1028 00:47:12,840 --> 00:47:15,600 Speaker 4: Tracy Alloway. You can follow me at Tracy Alloway. 1029 00:47:15,320 --> 00:47:18,080 Speaker 2: And I'm Joe Weisenthal. You can follow me at The Stalwart. 1030 00:47:18,400 --> 00:47:22,480 Speaker 2: Follow our guest Max Spero. He's at Max Underscore Spero Underscore. 1031 00:47:22,719 --> 00:47:25,960 Speaker 2: Follow our producers Carmen Rodriguez at Carmen Arman, Dashiell 1032 00:47:26,040 --> 00:47:29,120 Speaker 2: Bennett at Dashbot, and Kail Brooks at Kail Brooks. And for 1033 00:47:29,239 --> 00:47:32,359 Speaker 2: more Odd Lots content, go to Bloomberg dot com slash odd lots, 1034 00:47:32,360 --> 00:47:34,799 Speaker 2: where we have a daily newsletter and all of our episodes, and 1035 00:47:34,840 --> 00:47:36,640 Speaker 2: you can chat about all of these topics twenty four 1036 00:47:36,680 --> 00:47:40,279 Speaker 2: seven in our Discord, discord dot gg slash odd 1037 00:47:40,280 --> 00:47:41,000 Speaker 2: lots. 1038 00:47:41,080 --> 00:47:43,279 Speaker 4: And if you enjoy Odd Lots, if you like it when we 1039 00:47:43,320 --> 00:47:46,120 Speaker 4: talk about how the Internet is forty percent slop, then 1040 00:47:46,160 --> 00:47:49,280 Speaker 4: please leave us a positive review on your favorite podcast platform. 1041 00:47:49,520 --> 00:47:51,880 Speaker 4: And remember, if you are a Bloomberg subscriber, you can 1042 00:47:51,920 --> 00:47:55,000 Speaker 4: listen to all of our episodes absolutely ad free. All 1043 00:47:55,000 --> 00:47:56,920 Speaker 4: you need to do is find the Bloomberg channel on 1044 00:47:57,000 --> 00:47:59,239 Speaker 4: Apple Podcasts and follow the instructions there.
1045 00:47:59,640 --> 00:48:00,480 Speaker 6: Thanks for listening.