WEBVTT - This Is How to Tell if Writing Was Made by AI

0:00:02.720 --> 0:00:16.360
<v Speaker 1>Bloomberg Audio Studios, Podcasts, Radio News.

0:00:18.480 --> 0:00:21.840
<v Speaker 2>Hello and welcome to another episode of the Odd Lots podcast.

0:00:21.920 --> 0:00:24.119
<v Speaker 3>I'm Joe Weisenthal. And I'm Tracy Alloway.

0:00:24.360 --> 0:00:27.280
<v Speaker 2>So, Tracy, you know, you ever come across some writing

0:00:28.160 --> 0:00:31.720
<v Speaker 2>you can't articulate exactly why, but you're like, I'm pretty

0:00:31.760 --> 0:00:32.720
<v Speaker 2>sure AI wrote this?

0:00:33.120 --> 0:00:34.160
<v Speaker 3>Does this happen too much?

0:00:34.280 --> 0:00:38.600
<v Speaker 4>So, full disclosure, I haven't really thought about it that much. Yeah,

0:00:38.640 --> 0:00:41.640
<v Speaker 4>because the thing is I probably should think about it more,

0:00:41.960 --> 0:00:43.960
<v Speaker 4>but there's a lot of bad writing out there, and

0:00:44.000 --> 0:00:46.440
<v Speaker 4>I've become sort of inured to it. And I

0:00:46.479 --> 0:00:49.680
<v Speaker 4>also think that, I don't know, trying to figure out

0:00:49.680 --> 0:00:53.880
<v Speaker 4>whether or not something was generated by AI nowadays, if

0:00:53.880 --> 0:00:55.960
<v Speaker 4>you actually dedicate a lot of your own time to

0:00:56.120 --> 0:01:00.880
<v Speaker 4>doing that, that is a huge mental burden to be attempting.

0:01:01.000 --> 0:01:03.480
<v Speaker 4>Especially you and I are in the journalism industry. How

0:01:03.560 --> 0:01:05.760
<v Speaker 4>many of the pitches do you think that we get

0:01:05.800 --> 0:01:08.440
<v Speaker 4>from PRs right now are being generated by AI?

0:01:08.760 --> 0:01:11.520
<v Speaker 4>Imagine if you're reading each one of those and trying

0:01:11.560 --> 0:01:13.440
<v Speaker 4>to figure it out on a daily basis.

0:01:13.560 --> 0:01:15.360
<v Speaker 2>You know what I suppose I think about it the

0:01:15.360 --> 0:01:18.200
<v Speaker 2>most is someone will respond to a tweet yeah, and

0:01:18.240 --> 0:01:19.800
<v Speaker 2>I'll be like, well, if this is a real person,

0:01:19.800 --> 0:01:22.319
<v Speaker 2>then maybe this person deserves some engagement and ask a

0:01:22.400 --> 0:01:24.560
<v Speaker 2>question or I want to respond. But if there's a

0:01:24.560 --> 0:01:26.880
<v Speaker 2>person in the bot, then obviously I don't. And that's

0:01:26.880 --> 0:01:28.399
<v Speaker 2>where I look, you know what, I want to figure

0:01:28.400 --> 0:01:30.800
<v Speaker 2>it out. I would like to know the answer.

0:01:30.959 --> 0:01:31.160
<v Speaker 3>You know.

0:01:31.200 --> 0:01:34.200
<v Speaker 2>I have a controversial view about AI writing, by the way,

0:01:34.240 --> 0:01:36.640
<v Speaker 2>which is that it's pretty good. I mean, like, by

0:01:36.680 --> 0:01:38.920
<v Speaker 2>and large, and I said this, I think maybe in

0:01:38.920 --> 0:01:41.640
<v Speaker 2>a recent episode. When you consider the fact that I

0:01:41.680 --> 0:01:44.679
<v Speaker 2>don't know, the majority of the population, like, doesn't know

0:01:44.680 --> 0:01:47.600
<v Speaker 2>where to put a comma within the sentence. Well, this

0:01:47.680 --> 0:01:48.120
<v Speaker 2>is my point.

0:01:48.320 --> 0:01:48.960
<v Speaker 3>It's pretty good.

0:01:48.960 --> 0:01:49.400
<v Speaker 5>I mean, yeah.

0:01:49.400 --> 0:01:51.160
<v Speaker 2>One thing I'll say about AI is it never gets

0:01:51.240 --> 0:01:52.480
<v Speaker 2>the placement of a comma wrong.

0:01:52.840 --> 0:01:54.160
<v Speaker 3>On some level, it's perfect.

0:01:54.320 --> 0:01:56.000
<v Speaker 6>Did you do that? I think it was in the

0:01:56.000 --> 0:01:57.560
<v Speaker 6>New York Times the test.

0:01:57.600 --> 0:01:58.360
<v Speaker 3>I kind of hated that.

0:01:58.560 --> 0:02:01.200
<v Speaker 2>Okay, why? Well, because I'll tell you. First of all,

0:02:01.160 --> 0:02:02.240
<v Speaker 2>it's five examples.

0:02:02.280 --> 0:02:04.520
<v Speaker 3>There's not very many. Two, it asked the reader, which

0:02:04.520 --> 0:02:05.080
<v Speaker 3>do you prefer?

0:02:05.160 --> 0:02:07.120
<v Speaker 4>But I think they were different subjects as well.

0:02:07.200 --> 0:02:07.440
<v Speaker 3>Yeah.

0:02:07.600 --> 0:02:10.080
<v Speaker 2>Also, I think most people probably treated that as can

0:02:10.120 --> 0:02:11.959
<v Speaker 2>you guess which one is a human? Because everyone wants

0:02:12.000 --> 0:02:14.320
<v Speaker 2>to say they prefer the human. I didn't think it

0:02:14.400 --> 0:02:18.400
<v Speaker 2>was like a great test. Nonetheless, look, not only is

0:02:18.400 --> 0:02:22.360
<v Speaker 2>it often indistinguishable, it is often fine writing.

0:02:22.840 --> 0:02:25.359
<v Speaker 2>Sometimes AI could come up with a really remarkable turn

0:02:25.400 --> 0:02:27.760
<v Speaker 2>of phrase. Yeah, but I still, by and large, don't

0:02:27.840 --> 0:02:30.000
<v Speaker 2>like it. You read like a thing, especially a long

0:02:30.040 --> 0:02:32.760
<v Speaker 2>text that's AI, and it's like, even if you can't articulate

0:02:32.360 --> 0:02:33.880
<v Speaker 3>It, it's like this feels AI.

0:02:33.960 --> 0:02:36.640
<v Speaker 2>It has a certain sickly sweetness to it that is

0:02:36.680 --> 0:02:37.320
<v Speaker 2>often annoying.

0:02:37.320 --> 0:02:38.160
<v Speaker 3>It's annoying.

0:02:38.400 --> 0:02:41.000
<v Speaker 4>What I notice about it is it doesn't do style

0:02:41.200 --> 0:02:43.240
<v Speaker 4>very well, right? So if you ask it to write

0:02:43.240 --> 0:02:45.840
<v Speaker 4>something in the style of a writer, if you choose

0:02:45.880 --> 0:02:49.240
<v Speaker 4>anything other than something really obvious like Shakespeare, it really

0:02:49.480 --> 0:02:53.120
<v Speaker 4>it suffers. But the text that it actually outputs is

0:02:53.160 --> 0:02:58.519
<v Speaker 4>pretty clear. Yeah, right, like for basic understanding. Totally. It's

0:02:58.639 --> 0:03:01.440
<v Speaker 4>probably better than a lot of what's on the internet.

0:03:01.760 --> 0:03:03.200
<v Speaker 2>The real people who are going to have to worry

0:03:03.240 --> 0:03:07.840
<v Speaker 2>about this are like teachers obviously, universities and lawyers, student

0:03:07.919 --> 0:03:11.040
<v Speaker 2>lawyers and maybe at it's fun, but there are sometimes

0:03:11.080 --> 0:03:12.800
<v Speaker 2>it's like, Okay, did someone write this or not?

0:03:13.000 --> 0:03:14.920
<v Speaker 3>And there has to be it'd be nice if we

0:03:14.960 --> 0:03:16.120
<v Speaker 3>could know the answer.

0:03:16.320 --> 0:03:19.280
<v Speaker 4>Well, the other thing that's starting to happen is have

0:03:19.400 --> 0:03:21.840
<v Speaker 4>you seen any books out there that actually come with

0:03:21.960 --> 0:03:25.240
<v Speaker 4>a disclosure or disclaimer that say this book has been

0:03:25.280 --> 0:03:26.760
<v Speaker 4>written only by humans?

0:03:26.800 --> 0:03:26.880
<v Speaker 5>No

0:03:27.000 --> 0:03:28.079
<v Speaker 6>AI used at all.

0:03:28.120 --> 0:03:29.720
<v Speaker 4>I saw that for the first time on a book

0:03:29.760 --> 0:03:32.000
<v Speaker 4>that we actually read for an Odd Lots episode. I

0:03:32.000 --> 0:03:33.600
<v Speaker 4>don't think it's come out yet, but that kind of

0:03:33.639 --> 0:03:33.960
<v Speaker 4>threw me.

0:03:34.320 --> 0:03:34.519
<v Speaker 1>Yeah.

0:03:34.639 --> 0:03:37.480
<v Speaker 2>No, it's more and more anyway, as we enter a

0:03:37.520 --> 0:03:40.400
<v Speaker 2>world in which the vast majority, if not already, of

0:03:40.480 --> 0:03:43.120
<v Speaker 2>words written are written by AI, is going to be

0:03:43.200 --> 0:03:45.760
<v Speaker 2>interested in this question of whether we know. Anyway, there's

0:03:45.800 --> 0:03:48.520
<v Speaker 2>this company called Pangram Labs, and they have a little

0:03:48.560 --> 0:03:50.440
<v Speaker 2>thing and you can pay for it, but also a

0:03:50.440 --> 0:03:52.600
<v Speaker 2>free service where you can drop like a text in

0:03:53.320 --> 0:03:55.320
<v Speaker 2>and it'll say the odds that it's written by a human

0:03:55.440 --> 0:03:58.320
<v Speaker 2>or AI. And I'm pretty impressed by it. I like

0:03:58.360 --> 0:04:01.320
<v Speaker 2>did some samples of my own writing and then AI

0:04:01.440 --> 0:04:03.560
<v Speaker 2>outputs. It got them all right. But then I did

0:04:03.560 --> 0:04:05.680
<v Speaker 2>some like further, like I tried to stump it to

0:04:05.720 --> 0:04:07.720
<v Speaker 2>see if like. So, what I did was I took

0:04:07.760 --> 0:04:10.280
<v Speaker 2>a piece of AI writing and then I had it

0:04:10.320 --> 0:04:13.600
<v Speaker 2>translated into Chinese, okay, and then I had it translate

0:04:13.640 --> 0:04:16.400
<v Speaker 2>that into High Chinese, so it's like, okay, imagine this

0:04:16.480 --> 0:04:19.160
<v Speaker 2>is being written in a more formal register. And then

0:04:19.200 --> 0:04:21.920
<v Speaker 2>I had that translated into Hebrew, and then I had

0:04:21.960 --> 0:04:24.960
<v Speaker 2>that translated into English. So the original thing through this

0:04:25.080 --> 0:04:27.920
<v Speaker 2>series of AI telephone, through various translations, and then I

0:04:27.960 --> 0:04:30.240
<v Speaker 2>put that output back into Pangram.

0:04:30.360 --> 0:04:31.640
<v Speaker 3>It got that right. It said it was AI.

0:04:31.720 --> 0:04:35.240
<v Speaker 2>So even after a series of sort of transformations designed

0:04:35.279 --> 0:04:39.280
<v Speaker 2>to obfuscate the original style of the piece to see

0:04:39.320 --> 0:04:41.600
<v Speaker 2>if you know, eventually it would emerge in something else.

0:04:41.839 --> 0:04:44.160
<v Speaker 2>So I was pretty impressed. It seems to work. And

0:04:44.240 --> 0:04:46.400
<v Speaker 2>you know, I think that's interesting for a couple of reasons,

0:04:46.400 --> 0:04:49.320
<v Speaker 2>which is maybe there is something that you can just tell.

0:04:49.680 --> 0:04:52.120
<v Speaker 2>But two, it sort of worries me because you know,

0:04:52.320 --> 0:04:54.480
<v Speaker 2>there have been articles and they'll say like, this is

0:04:54.480 --> 0:04:56.360
<v Speaker 2>written by AI. And I think one of my big

0:04:56.360 --> 0:04:58.240
<v Speaker 2>fears would be that I write something.

0:04:58.600 --> 0:04:59.760
<v Speaker 3>I like to use an em dash.

0:05:00.000 --> 0:05:02.520
<v Speaker 4>I've always been an em dash fan. I love em dashes.

0:05:02.600 --> 0:05:03.520
<v Speaker 4>That's how people talk.

0:05:03.640 --> 0:05:04.200
<v Speaker 6>I'm sorry.

0:05:04.400 --> 0:05:06.400
<v Speaker 2>And then what if it says you wrote this by AI,

0:05:06.640 --> 0:05:08.560
<v Speaker 2>and I'm like, I didn't, And then here's this black

0:05:08.600 --> 0:05:11.680
<v Speaker 2>box that is suddenly like judge, jury, and executioner for my

0:05:12.279 --> 0:05:15.880
<v Speaker 2>career, potentially. Who wrote this? AI, the lab says, so

0:05:16.440 --> 0:05:18.640
<v Speaker 2>you are now done? Like that worries me. So I

0:05:18.640 --> 0:05:21.680
<v Speaker 2>think this raises a lot of very interesting questions about

0:05:21.680 --> 0:05:23.960
<v Speaker 2>these little model detection things, and I want to learn

0:05:23.960 --> 0:05:24.640
<v Speaker 2>more about how well.

0:05:24.640 --> 0:05:27.440
<v Speaker 4>There's also a lot of philosophical questions about just what

0:05:27.480 --> 0:05:30.919
<v Speaker 4>we value in writing true as well, because no one's

0:05:30.960 --> 0:05:33.760
<v Speaker 4>going to yell at you for using spell check or

0:05:33.800 --> 0:05:36.039
<v Speaker 4>something like that, right, Like, it's kind of crazy to

0:05:36.040 --> 0:05:39.000
<v Speaker 4>think that reputational risk is going to hinge on whether

0:05:39.120 --> 0:05:41.640
<v Speaker 4>or not you might have used a platform, a chat

0:05:41.680 --> 0:05:44.760
<v Speaker 4>platform to like do some basic copy editing.

0:05:45.000 --> 0:05:47.320
<v Speaker 2>Totally. Well, very happy to say we do, in fact,

0:05:47.360 --> 0:05:48.160
<v Speaker 2>have the perfect guest.

0:05:48.440 --> 0:05:50.120
<v Speaker 3>We're going to be speaking with Max Spero.

0:05:50.240 --> 0:05:52.880
<v Speaker 2>He is the founder and CEO of Pangram Labs, and

0:05:52.880 --> 0:05:54.720
<v Speaker 2>he can answer all of our questions. So Max, thank

0:05:54.720 --> 0:05:55.600
<v Speaker 2>you so much for coming on.

0:05:55.560 --> 0:05:56.919
<v Speaker 5>Odd Lots. Thanks for having me.

0:05:57.160 --> 0:05:58.120
<v Speaker 3>How do you know it's right?

0:05:58.279 --> 0:06:00.600
<v Speaker 2>So someone puts in a piece of text and we'll

0:06:00.600 --> 0:06:02.440
<v Speaker 2>get into the method in a second. But someone puts

0:06:02.440 --> 0:06:05.440
<v Speaker 2>in a piece of text and it says human AI,

0:06:06.320 --> 0:06:08.719
<v Speaker 2>what makes you believe that you have a very good

0:06:08.560 --> 0:06:09.760
<v Speaker 3>track record on this question?

0:06:09.960 --> 0:06:12.520
<v Speaker 7>So when we started Pangram, we started by doing this

0:06:12.560 --> 0:06:15.840
<v Speaker 7>thing we call a human baseline, which is how well

0:06:16.120 --> 0:06:19.680
<v Speaker 7>can we as a human predict whether something's AI or not?

0:06:19.960 --> 0:06:23.039
<v Speaker 7>That's the first step of, like, learning: is this problem tractable?

0:06:23.440 --> 0:06:25.800
<v Speaker 5>How hard or easy is it? And I found, like.

0:06:26.120 --> 0:06:29.240
<v Speaker 7>Me personally, I was able to get about ninety percent accuracy,

0:06:29.720 --> 0:06:32.680
<v Speaker 7>and so we figured an AI model should be able

0:06:32.720 --> 0:06:33.279
<v Speaker 7>to do much.

0:06:33.120 --> 0:06:33.599
<v Speaker 5>Better than that.

0:06:33.920 --> 0:06:37.359
<v Speaker 4>So I have a bunch of methodology questions which we

0:06:37.400 --> 0:06:40.440
<v Speaker 4>can get into. But just before we get into any

0:06:40.440 --> 0:06:44.240
<v Speaker 4>of that, why is AI slop bad in your opinion?

0:06:44.279 --> 0:06:46.480
<v Speaker 4>Why does it need to be tracked and identified?

0:06:46.760 --> 0:06:48.680
<v Speaker 7>I think the problem is it's just so easy to

0:06:48.760 --> 0:06:51.720
<v Speaker 7>generate and so like it's very difficult to know, like

0:06:52.240 --> 0:06:56.080
<v Speaker 7>what is the like intent behind it? Basically, Like right now,

0:06:56.360 --> 0:06:58.560
<v Speaker 7>I think we're actually pretty lucky living. We live in

0:06:58.640 --> 0:07:02.039
<v Speaker 7>a world where the signal-to-noise ratio on the Internet

0:07:02.040 --> 0:07:03.279
<v Speaker 7>and in our information.

0:07:02.920 --> 0:07:03.920
<v Speaker 5>Channels is pretty high.

0:07:04.040 --> 0:07:06.839
<v Speaker 7>We have pretty high signal to noise, But any bad

0:07:06.839 --> 0:07:10.520
<v Speaker 7>actor can come in and just flood our information channels

0:07:10.560 --> 0:07:15.000
<v Speaker 7>with AI slop that looks legitimate. It looks like somebody put

0:07:15.040 --> 0:07:18.760
<v Speaker 7>actual effort and thought into it, but really it was

0:07:18.880 --> 0:07:21.440
<v Speaker 7>just like a single prompt which could have also been automated.

0:07:21.600 --> 0:07:23.679
<v Speaker 2>This is something that I think about a lot, which

0:07:23.720 --> 0:07:26.239
<v Speaker 2>is that there was a point in time and maybe

0:07:26.280 --> 0:07:28.960
<v Speaker 2>still is the point in time where if you read

0:07:29.000 --> 0:07:33.120
<v Speaker 2>something that was grammatically correct, where the punctuation was strong,

0:07:33.400 --> 0:07:36.640
<v Speaker 2>where the spelling was strong, there was reason to think

0:07:36.680 --> 0:07:39.400
<v Speaker 2>that the person who wrote it was a person of

0:07:39.560 --> 0:07:43.240
<v Speaker 2>like certain seriousness and a certain intelligence behind it.

0:07:43.560 --> 0:07:45.640
<v Speaker 3>And I think that the issue that you're.

0:07:45.520 --> 0:07:48.600
<v Speaker 2>Identifying is that that link is now being severed so

0:07:48.640 --> 0:07:51.800
<v Speaker 2>that we can't use these heuristics anymore, such as the

0:07:51.840 --> 0:07:55.640
<v Speaker 2>strict quality of the prose to know in fact whether

0:07:55.920 --> 0:07:59.000
<v Speaker 2>this was published by someone who was like a serious actor,

0:07:59.200 --> 0:08:00.320
<v Speaker 2>intelligent, or not.

0:08:00.480 --> 0:08:03.600
<v Speaker 4>And now you have people inserting typos into their copy.

0:08:04.000 --> 0:08:06.680
<v Speaker 4>That's true, they are. Yeah, boy.

0:08:06.680 --> 0:08:09.840
<v Speaker 2>Sorry just to go back to my original question. So

0:08:09.880 --> 0:08:12.480
<v Speaker 2>you mentioned, okay, you're able to get it ninety percent right,

0:08:12.480 --> 0:08:14.320
<v Speaker 2>but now it's been used a lot more and you

0:08:14.320 --> 0:08:19.040
<v Speaker 2>have people paying for your software, presumably teachers and journalists, etc.

0:08:20.160 --> 0:08:23.280
<v Speaker 2>Given all of that, getting from ninety percent to one hundred,

0:08:23.320 --> 0:08:25.160
<v Speaker 2>I mean, if you could make one out of ten

0:08:25.200 --> 0:08:28.240
<v Speaker 2>it's clearly an unacceptable error rate for a piece of

0:08:28.240 --> 0:08:31.640
<v Speaker 2>commercial software that could call someone an AI creator. So

0:08:31.680 --> 0:08:33.360
<v Speaker 2>you have to do a lot better than ninety percent.

0:08:33.800 --> 0:08:36.360
<v Speaker 2>Talk to us about like what you've seen so far

0:08:36.559 --> 0:08:39.920
<v Speaker 2>in your data since releasing it as commercial software that

0:08:40.040 --> 0:08:43.600
<v Speaker 2>makes you believe the software is doing a correct job

0:08:43.679 --> 0:08:45.720
<v Speaker 2>of allocating between the two categories.

0:08:45.760 --> 0:08:49.679
<v Speaker 7>So we've built out really comprehensive evals, okay, and so

0:08:49.880 --> 0:08:54.240
<v Speaker 7>our evaluations. There's two kinds of errors. There's a false positive,

0:08:54.520 --> 0:08:56.920
<v Speaker 7>which is when something is written by a human and

0:08:56.960 --> 0:08:58.720
<v Speaker 7>then we say that it's written by an AI, okay.

0:08:58.760 --> 0:09:00.839
<v Speaker 7>And there's a false negative, which is if it was

0:09:00.920 --> 0:09:03.840
<v Speaker 7>AI written and we don't catch it. And so we

0:09:04.040 --> 0:09:07.839
<v Speaker 7>track our numbers for both of these, and for human.

0:09:07.559 --> 0:09:09.079
<v Speaker 5>Writing, we're actually pretty fortunate.

0:09:09.240 --> 0:09:11.080
<v Speaker 7>We have like millions and millions of samples, so we

0:09:11.120 --> 0:09:13.640
<v Speaker 7>can get like a false positive number that we have

0:09:13.679 --> 0:09:16.080
<v Speaker 7>a very high degree of confidence in. And our number

0:09:16.160 --> 0:09:19.080
<v Speaker 7>right now is about one in ten thousand. Okay. So,

0:09:19.160 --> 0:09:22.760
<v Speaker 7>if we scan ten thousand documents on average, one will

0:09:22.800 --> 0:09:23.480
<v Speaker 7>come back as.

0:09:23.840 --> 0:09:25.240
<v Speaker 5>AI when it was actually human.

0:09:25.440 --> 0:09:27.319
<v Speaker 3>And what about in the other direction false negative?

0:09:27.720 --> 0:09:31.760
<v Speaker 7>I would say around ninety nine percent accuracy, So like

0:09:32.120 --> 0:09:35.080
<v Speaker 7>around one percent false negative rate. I think this depends

0:09:35.080 --> 0:09:38.440
<v Speaker 7>a little bit more on like how adversarial the prompting is,

0:09:38.640 --> 0:09:40.720
<v Speaker 7>how much they're trying to ev.

0:09:40.720 --> 0:09:44.280
<v Speaker 2>What I did exact send it through multiple filtrations to

0:09:44.360 --> 0:09:47.600
<v Speaker 2>obfuscate the original output. That would be an example of

0:09:47.640 --> 0:09:49.240
<v Speaker 2>adversarial prompting? Exactly.

0:09:49.480 --> 0:09:52.079
<v Speaker 7>But in like the general case where we're just looking

0:09:52.120 --> 0:09:55.880
<v Speaker 7>at straight outputs from AI, it's above ninety nine percent.
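
NOTE
A minimal sketch, assuming the two error rates quoted in this exchange, of what they imply for a single flagged document. The one in ten thousand false positive rate and the roughly one percent false negative rate come from the conversation. The ten percent share of AI text and every function name are assumptions made up for illustration.
```python
def expected_false_flags(num_human_docs: int, false_positive_rate: float) -> float:
    """Expected number of human-written documents wrongly flagged as AI."""
    return num_human_docs * false_positive_rate
#
def prob_ai_given_flag(base_rate_ai: float, false_positive_rate: float,
                       false_negative_rate: float) -> float:
    """Bayes' rule: probability a text is AI-written given that it was flagged."""
    true_positive_rate = 1.0 - false_negative_rate
    flagged_ai = true_positive_rate * base_rate_ai
    flagged_human = false_positive_rate * (1.0 - base_rate_ai)
    return flagged_ai / (flagged_ai + flagged_human)
#
FPR = 1 / 10_000          # quoted in the episode
FNR = 0.01                # "around one percent", quoted in the episode
ASSUMED_BASE_RATE = 0.10  # assumption: one document in ten is AI-written
#
if __name__ == "__main__":
    print(expected_false_flags(10_000, FPR))
    print(prob_ai_given_flag(ASSUMED_BASE_RATE, FPR, FNR))
```
The second function shows why the base rate matters: the same detector is less conclusive when very little of the scanned text is AI-written to begin with.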

0:09:55.960 --> 0:09:59.000
<v Speaker 4>Okay, okay, So what is your model looking for exactly

0:09:59.040 --> 0:10:02.120
<v Speaker 4>when it's evaluating a text? Because, as we mentioned in

0:10:02.160 --> 0:10:05.560
<v Speaker 4>the intro, you know, syntax and grammar tends to be

0:10:05.679 --> 0:10:10.599
<v Speaker 4>pretty good on AI generated copy. The style is sometimes

0:10:10.640 --> 0:10:14.760
<v Speaker 4>more of an identifier, I would argue to your point, Joe, like,

0:10:14.960 --> 0:10:19.320
<v Speaker 4>sometimes it reads very saccharine and kind of overly earnest

0:10:19.640 --> 0:10:22.280
<v Speaker 4>in some ways. So what exactly are you focusing on here?

0:10:22.280 --> 0:10:23.000
<v Speaker 4>What are the tells?

0:10:23.200 --> 0:10:26.120
<v Speaker 7>Yeah, so the style and the word choices are definitely

0:10:26.200 --> 0:10:27.760
<v Speaker 7>part of it. But I think what a lot of

0:10:27.760 --> 0:10:30.200
<v Speaker 7>people don't realize is they're actually making a lot of

0:10:30.559 --> 0:10:33.720
<v Speaker 7>decisions when they write a piece of text. So there's

0:10:33.840 --> 0:10:36.800
<v Speaker 7>you know, dozens or hundreds of ways to phrase every

0:10:36.840 --> 0:10:39.680
<v Speaker 7>single phrase, and over the course of fifty or one

0:10:39.720 --> 0:10:43.240
<v Speaker 7>hundred or two hundred words, you're making thousands of decisions actually,

0:10:43.679 --> 0:10:46.400
<v Speaker 7>And so what we're doing is we're learning the patterns

0:10:46.400 --> 0:10:49.880
<v Speaker 7>in how, like, these frontier models make these decisions. And

0:10:49.960 --> 0:10:53.000
<v Speaker 7>if the vast majority of these decisions line up with

0:10:53.040 --> 0:10:56.160
<v Speaker 7>how the frontier models are doing it, then it's vanishingly

0:10:56.240 --> 0:10:58.600
<v Speaker 7>unlikely that this was written by a human. You would

0:10:58.640 --> 0:11:01.240
<v Speaker 7>have to just happen to make the same exact decisions

0:11:01.240 --> 0:11:03.240
<v Speaker 7>that the LLM does hundreds of times.
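
NOTE
A rough worked example of the "vanishingly unlikely" argument, not the detector's actual method. It assumes each writing choice is independent and that a human happens to match the model's choice with a fixed probability p. Both assumptions, the numbers, and the function name are illustrative only.
```python
from math import comb
#
def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent choices match,
    when each one matches with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
#
if __name__ == "__main__":
    # Assumed numbers: 200 choices, a human matches any single one half the time,
    # and "the vast majority" is taken to mean at least 180 of them.
    print(prob_at_least(180, 200, 0.5))
```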

0:11:03.280 --> 0:11:04.280
<v Speaker 6>Interesting, Okay, this.

0:11:04.320 --> 0:11:05.480
<v Speaker 3>Is a really important point.

0:11:05.559 --> 0:11:08.200
<v Speaker 2>So everyone at this point has some feel for let

0:11:08.280 --> 0:11:11.400
<v Speaker 2>So everyone at this point has some feel for, like, oh, the em dash tell, right? But my understanding is

0:11:11.440 --> 0:11:13.640
<v Speaker 2>it's not like, you don't go in and, like, hard

0:11:13.679 --> 0:11:15.960
<v Speaker 2>code if you see a bunch of em dashes. This

0:11:16.080 --> 0:11:19.920
<v Speaker 2>is the thing: these decisions, in many cases, I imagine,

0:11:19.960 --> 0:11:24.840
<v Speaker 2>neither you nor the model itself can articulate in English

0:11:25.080 --> 0:11:27.720
<v Speaker 2>what the decisions are. All you know is that the

0:11:27.760 --> 0:11:29.160
<v Speaker 2>decision pattern exists.

0:11:29.240 --> 0:11:29.880
<v Speaker 3>Is this correct?

0:11:30.000 --> 0:11:30.679
<v Speaker 5>This is correct?

0:11:30.720 --> 0:11:31.840
<v Speaker 3>Okay? Can you explain?

0:11:32.000 --> 0:11:35.120
<v Speaker 2>So therefore, what does it mean that your model has

0:11:35.280 --> 0:11:37.079
<v Speaker 2>learned these decisions?

0:11:37.480 --> 0:11:39.920
<v Speaker 7>So what we're doing on the very broad scale is

0:11:40.080 --> 0:11:42.920
<v Speaker 7>we're training a deep learning model. So it's a pretty

0:11:42.920 --> 0:11:46.400
<v Speaker 7>big black box, but it has the base model of

0:11:47.040 --> 0:11:50.040
<v Speaker 7>a language model, and then instead of predicting the next token,

0:11:50.520 --> 0:11:53.880
<v Speaker 7>it's predicting whether the text is AI or not. Okay,

0:11:53.960 --> 0:11:56.800
<v Speaker 7>And how we train it is we train on tens

0:11:56.840 --> 0:11:59.960
<v Speaker 7>of millions of examples, so it sees millions and millions

0:12:00.160 --> 0:12:02.959
<v Speaker 7>of human examples, and for each human example, we also

0:12:03.000 --> 0:12:05.920
<v Speaker 7>show it an AI example. So, for example, let's say

0:12:05.920 --> 0:12:09.000
<v Speaker 7>one of these is a five star review for Denny's

0:12:09.200 --> 0:12:11.959
<v Speaker 7>that's seventy eight words long. Then we'll ask an AI

0:12:12.200 --> 0:12:14.120
<v Speaker 7>to write a five star review about Denny's that's seventy

0:12:14.120 --> 0:12:16.240
<v Speaker 7>eight words long in the style of the first one.

0:12:16.440 --> 0:12:18.840
<v Speaker 7>And obviously these two will be different, and so our

0:12:18.880 --> 0:12:22.080
<v Speaker 7>model is able to learn through contrast, what is the

0:12:22.080 --> 0:12:23.000
<v Speaker 7>difference between.

0:12:22.720 --> 0:12:24.840
<v Speaker 2>And the important thing, sorry, just to be clear here,

0:12:25.000 --> 0:12:26.960
<v Speaker 2>is that you and I might not be able to

0:12:27.040 --> 0:12:30.439
<v Speaker 2>articulate the difference. There will be some difference in maybe

0:12:30.520 --> 0:12:33.240
<v Speaker 2>the sentence length, there will be some difference in word choice,

0:12:33.240 --> 0:12:36.480
<v Speaker 2>there'll be some difference in punctuation, syntax, whatever, but you

0:12:36.600 --> 0:12:40.240
<v Speaker 2>and I wouldn't obviously spot it. However, after millions of

0:12:40.280 --> 0:12:43.640
<v Speaker 2>examples of these side by sides, the model learns what

0:12:43.679 --> 0:12:44.640
<v Speaker 2>the difference is exactly.
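
NOTE
The paired training data described above can be sketched as follows. This is a sketch under stated assumptions, not Pangram's pipeline: the Example class, mirror_prompt, build_pairs, and the generate callable are hypothetical names, and generate stands in for whatever AI model would write the mirrored text.
```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
#
@dataclass
class Example:
    text: str
    label: int  # 0 = human-written, 1 = AI-generated
#
def mirror_prompt(human_text: str, topic: str) -> str:
    """Build a prompt asking for an AI text with the same topic, length, and style."""
    word_count = len(human_text.split())
    return (
        f"Write {topic} that is about {word_count} words long, "
        f"in the style of this example:\n{human_text}"
    )
#
def build_pairs(human_samples: List[Tuple[str, str]],
                generate: Callable[[str], str]) -> List[Example]:
    """Pair every (text, topic) human sample with one AI-generated counterpart."""
    dataset: List[Example] = []
    for text, topic in human_samples:
        dataset.append(Example(text, 0))
        dataset.append(Example(generate(mirror_prompt(text, topic)), 1))
    return dataset
```
Because each AI example shares its topic and length with a human one, a classifier trained on these pairs cannot lean on subject matter or length and has to pick up on how the text is written.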

0:12:44.720 --> 0:12:46.560
<v Speaker 7>I think the best that a human can do is

0:12:46.720 --> 0:12:49.800
<v Speaker 7>look for some of these, like, really obvious tells. Like, ChatGPT

0:12:49.880 --> 0:12:53.440
<v Speaker 7>loves that, like, "it's not just X, it's Y" framing.

0:12:53.800 --> 0:12:57.240
<v Speaker 7>Earlier models really liked some specific words like tapestry and

0:12:57.320 --> 0:12:58.760
<v Speaker 7>intricate and delve.

0:12:58.840 --> 0:13:00.360
<v Speaker 3>Yeah, delve tapestry. Yeah.

0:13:00.400 --> 0:13:00.960
<v Speaker 5>But yeah.

0:13:01.000 --> 0:13:03.079
<v Speaker 7>I think by training Pangram, we're able to go much

0:13:03.120 --> 0:13:05.640
<v Speaker 7>deeper than this and look deeper than the high level

0:13:05.640 --> 0:13:08.120
<v Speaker 7>signs, at the, like, document-level signs.

0:13:23.960 --> 0:13:26.080
<v Speaker 4>So one thing this kind of reminds me of and

0:13:26.120 --> 0:13:28.559
<v Speaker 4>I'm thinking how to phrase this, but it reminds me

0:13:28.600 --> 0:13:31.800
<v Speaker 4>of you know those exercises people used to do where

0:13:31.800 --> 0:13:34.000
<v Speaker 4>you would take a bunch of different faces and meld

0:13:34.040 --> 0:13:37.200
<v Speaker 4>them all together and come up with like one face

0:13:37.320 --> 0:13:41.120
<v Speaker 4>that was supposedly attractive. So, like, to what extent is

0:13:41.160 --> 0:13:46.560
<v Speaker 4>this basically a distributional detector in the sense that you're

0:13:46.600 --> 0:13:50.960
<v Speaker 4>looking for like certain paths that you think AI would choose.

0:13:51.800 --> 0:13:54.239
<v Speaker 4>And I guess, like, could you get a false positive

0:13:54.840 --> 0:13:57.440
<v Speaker 4>just from someone who's choosing like the average of the

0:13:57.480 --> 0:14:00.320
<v Speaker 4>average of the average in a way to state a

0:14:00.320 --> 0:14:01.200
<v Speaker 4>particular sentence.

0:14:03.360 --> 0:14:06.400
<v Speaker 7>Maybe there's a reason we have our false positive rate

0:14:06.440 --> 0:14:08.840
<v Speaker 7>is one in ten thousand and not zero. It's because

0:14:09.200 --> 0:14:12.319
<v Speaker 7>you know, sometimes we look at the false positive and

0:14:12.360 --> 0:14:15.559
<v Speaker 7>it's like, oh, it reads exactly like an AI generated

0:14:15.720 --> 0:14:18.600
<v Speaker 7>review or essay, except that it was written in twenty nineteen.

0:14:18.640 --> 0:14:21.000
<v Speaker 7>So it was probably a human who just happened to

0:14:21.800 --> 0:14:24.840
<v Speaker 7>find the exact, like, mode-collapsed

0:14:24.640 --> 0:14:26.720
<v Speaker 5>type of way that, like, yeah, that's right. Yeah, I

0:14:26.760 --> 0:14:27.400
<v Speaker 5>would say, yeah.

0:14:27.480 --> 0:14:29.440
<v Speaker 7>I think it's a good way to think about the

0:14:29.480 --> 0:14:32.840
<v Speaker 7>distribution of writing or writing as a distribution where like,

0:14:32.920 --> 0:14:35.520
<v Speaker 7>you know, there's the space of all human writing, and

0:14:35.560 --> 0:14:37.920
<v Speaker 7>then AI writing is really just.

0:14:37.920 --> 0:14:39.840
<v Speaker 5>Like a small point within this space.

0:14:39.880 --> 0:14:42.360
<v Speaker 7>It's very no matter how much you prompt it, it

0:14:42.400 --> 0:14:46.160
<v Speaker 7>doesn't go that far from where it was trained to be.

0:14:46.440 --> 0:14:48.120
<v Speaker 3>Yeah. Okay, what's the black box?

0:14:48.200 --> 0:14:50.520
<v Speaker 2>So I built a little model myself. I built this

0:14:50.560 --> 0:14:53.080
<v Speaker 2>thing that detects, you can upload text and it says whether

0:14:53.120 --> 0:14:56.600
<v Speaker 2>it's more resemblant of the written word or the spoken word.

0:14:57.040 --> 0:14:59.600
<v Speaker 2>Oh I saw that, yeah, yeah. And I used BERT,

0:14:59.640 --> 0:15:02.480
<v Speaker 2>which is like one of these things open source one

0:15:02.480 --> 0:15:02.960
<v Speaker 2>from Google.

0:15:03.000 --> 0:15:04.800
<v Speaker 3>What is the core model that.

0:15:04.720 --> 0:15:07.280
<v Speaker 2>You trained on or is it something or did you

0:15:07.320 --> 0:15:08.120
<v Speaker 2>build it yourself?

0:15:08.200 --> 0:15:08.960
<v Speaker 3>Like, talk to us about that.

0:15:09.000 --> 0:15:11.760
<v Speaker 7>Our very first model was actually built on BERT, but

0:15:11.960 --> 0:15:17.360
<v Speaker 7>future models we needed to up our capacity. So basically

0:15:17.440 --> 0:15:20.480
<v Speaker 7>we were running into capacity limits with our model. It

0:15:20.840 --> 0:15:23.840
<v Speaker 7>was capping out at a certain false positive false negative rate.

0:15:24.040 --> 0:15:26.600
<v Speaker 7>It wasn't learning the deeper signals, so we had to

0:15:26.800 --> 0:15:28.960
<v Speaker 7>ten x and then one hundred x the parameter count

0:15:29.160 --> 0:15:32.400
<v Speaker 7>so that it can learn like really deeply, like how these

0:15:32.400 --> 0:15:33.400
<v Speaker 7>frontier models

0:15:33.200 --> 0:15:36.920
<v Speaker 4>write. Have you noticed any interesting differences between how the

0:15:36.960 --> 0:15:40.760
<v Speaker 4>models write? Can you, and actually, is your model trained

0:15:40.800 --> 0:15:44.080
<v Speaker 4>to identify different models as well as whether or not

0:15:44.120 --> 0:15:46.440
<v Speaker 4>this is just broadly AI-generated?

0:15:46.560 --> 0:15:50.520
<v Speaker 7>So we don't specifically train it on different models. We

0:15:50.520 --> 0:15:52.720
<v Speaker 7>don't say like hey, this one is Claude 3 and

0:15:52.760 --> 0:15:56.400
<v Speaker 7>this one is ChatGPT or GPT-5. What we've done

0:15:56.680 --> 0:16:00.040
<v Speaker 7>we've done some interpretability work to look at basically the

0:16:00.080 --> 0:16:02.720
<v Speaker 7>output embeddings of the model, and we find that

0:16:02.920 --> 0:16:05.880
<v Speaker 7>it actually learns which model the text came from. So

0:16:05.920 --> 0:16:08.360
<v Speaker 7>you could see like little clusters like this is the

0:16:08.440 --> 0:16:11.440
<v Speaker 7>Claude cluster and, like, all the Claudes, yeah, cluster around here,

0:16:11.440 --> 0:16:13.760
<v Speaker 7>and then these are like the deep Seek and Quinn

0:16:13.840 --> 0:16:15.760
<v Speaker 7>and then this is like Chat schipt and they all

0:16:15.840 --> 0:16:19.680
<v Speaker 7>kind of like cluster into different spaces and embedding space.

0:16:20.240 --> 0:16:22.640
<v Speaker 7>So clearly the model is able to learn what the

0:16:22.640 --> 0:16:24.320
<v Speaker 7>difference is between these frontier models.
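
NOTE
Editor's sketch, not Pangram's code. The guest describes detector embeddings
falling into per-model clusters. Assuming such embeddings are available as plain
vectors (the 2-D toy values and labels below are made up), one simple way to ask
which cluster a new text lands in is a nearest-centroid lookup:
```python
import math
def centroid(vectors):
    # Mean of each coordinate across the vectors in one cluster.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
def nearest_cluster(embedding, centroids):
    # Return the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))
# Toy 2-D "embeddings"; real detector embeddings would have far more dimensions.
toy = {
    "claude": [[0.9, 0.1], [1.0, 0.2]],
    "qwen": [[0.1, 0.9], [0.2, 1.0]],
}
centroids = {label: centroid(vs) for label, vs in toy.items()}
print(nearest_cluster([0.8, 0.3], centroids))
```
This only illustrates the idea of clusters in embedding space; the interview does
not say how the clusters were actually inspected.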

0:16:24.520 --> 0:16:27.480
<v Speaker 4>Well, actually, since you mentioned Qwen, I'm very interested: is

0:16:27.480 --> 0:16:31.040
<v Speaker 4>there anything like distinct in terms of how Qwen generates

0:16:31.080 --> 0:16:34.600
<v Speaker 4>text versus platforms that have been developed in the US.

0:16:35.120 --> 0:16:37.640
<v Speaker 7>I think Qwen is unique because it's trained on a

0:16:37.680 --> 0:16:40.640
<v Speaker 7>lot more Chinese and multilingual tokens than other models.

0:16:41.360 --> 0:16:44.200
<v Speaker 7>So you know, I've heard from Chinese friends that it's

0:16:44.320 --> 0:16:49.680
<v Speaker 7>much better at like being conversationally fluent in Chinese.

0:16:50.320 --> 0:16:52.400
<v Speaker 5>Beyond that, I don't know that I can tell.

0:16:52.760 --> 0:16:54.280
<v Speaker 7>It would be hard for me to look at a

0:16:54.320 --> 0:16:57.360
<v Speaker 7>text and say, like, I know that's Qwen. But I

0:16:57.360 --> 0:16:59.680
<v Speaker 7>think somebody who's more familiar with it might be able to.

0:17:00.200 --> 0:17:02.880
<v Speaker 2>Let's talk about sort of some of the philosophical or

0:17:02.920 --> 0:17:04.720
<v Speaker 2>societal implications of this work.

0:17:05.240 --> 0:17:06.040
<v Speaker 3>Have you had

0:17:05.920 --> 0:17:10.120
<v Speaker 2>anyone whose text has been judged to be AI written

0:17:10.160 --> 0:17:12.840
<v Speaker 2>by Pangram and they're like, I swear to God, this

0:17:12.880 --> 0:17:15.639
<v Speaker 2>isn't AI. They, like, really insist. And what do

0:17:15.640 --> 0:17:17.399
<v Speaker 2>you think about this situation? What do you do? Or

0:17:17.440 --> 0:17:18.200
<v Speaker 2>talk to us about that.

0:17:18.359 --> 0:17:20.439
<v Speaker 7>I've had a couple of times this happened. There have

0:17:20.440 --> 0:17:22.600
<v Speaker 7>been times where I genuinely believe that you know this

0:17:22.720 --> 0:17:24.879
<v Speaker 7>is just a false positive. We scan hundreds of millions

0:17:24.880 --> 0:17:27.040
<v Speaker 7>of documents, so like, at a certain scale like this

0:17:27.040 --> 0:17:30.359
<v Speaker 7>will happen. But I also get people who all the

0:17:30.400 --> 0:17:32.720
<v Speaker 7>time they're just like AI detectors don't work.

0:17:32.840 --> 0:17:34.040
<v Speaker 5>It's like a total fraud.

0:17:34.280 --> 0:17:37.040
<v Speaker 7>And then whatever they're putting out on LinkedIn is just

0:17:37.080 --> 0:17:38.760
<v Speaker 7>one hundred percent AI generated.

0:17:38.440 --> 0:17:40.120
<v Speaker 5>And they're just like mad that they're getting called out.

0:17:40.440 --> 0:17:43.200
<v Speaker 7>And then you look back farther into their past and

0:17:43.200 --> 0:17:45.600
<v Speaker 7>their history, like everything they're putting out is AI generated

0:17:46.000 --> 0:17:49.320
<v Speaker 7>until about like twenty twenty three, Like for everyone, if

0:17:49.359 --> 0:17:52.120
<v Speaker 7>you look historically, there's a lot of like slop accounts

0:17:52.119 --> 0:17:54.800
<v Speaker 7>that are putting out total slop, and you can tell

0:17:54.800 --> 0:17:57.800
<v Speaker 7>either they like weren't posting as much before, and if

0:17:57.880 --> 0:18:00.479
<v Speaker 7>you scan back in time, then you see that they

0:18:00.480 --> 0:18:02.160
<v Speaker 7>were writing human text at some point.

0:18:02.240 --> 0:18:04.800
<v Speaker 2>So there's a number of accounts out there that basically

0:18:04.960 --> 0:18:07.840
<v Speaker 2>right around the beginning of twenty twenty three, where if

0:18:07.880 --> 0:18:10.840
<v Speaker 2>you scan the entire corpus of their work, it very

0:18:10.960 --> 0:18:12.640
<v Speaker 2>clearly shows a switch.

0:18:12.359 --> 0:18:13.920
<v Speaker 3>Right around early twenty twenty three.

0:18:14.119 --> 0:18:17.280
<v Speaker 7>Yeah, it really like depends on the account. I think

0:18:17.400 --> 0:18:19.520
<v Speaker 7>one thing we saw that was interesting was there is

0:18:19.600 --> 0:18:22.720
<v Speaker 7>a writer for The Guardian that was covering the Winter Olympics,

0:18:22.920 --> 0:18:25.040
<v Speaker 7>and somebody was like, hey, this article is like total

0:18:25.080 --> 0:18:27.840
<v Speaker 7>AI slop. Ran it through Pangram, it was AI. The

0:18:27.880 --> 0:18:30.520
<v Speaker 7>Guardian was like, no, of course, our writers don't use AI.

0:18:30.760 --> 0:18:34.080
<v Speaker 7>And so we scanned this single writer's history

0:18:34.520 --> 0:18:36.760
<v Speaker 7>and we found that they really did start picking up

0:18:36.800 --> 0:18:39.400
<v Speaker 7>AI like mid to late twenty twenty four, and were

0:18:39.480 --> 0:18:41.240
<v Speaker 7>using it more and more in their articles.

0:18:41.560 --> 0:18:44.240
<v Speaker 4>I mean, just to play devil's advocate for a second. Does

0:18:44.280 --> 0:18:48.359
<v Speaker 4>intent matter when it comes to identifying AI slop in

0:18:48.400 --> 0:18:50.679
<v Speaker 4>the sense that, Okay, I get you can have a

0:18:50.720 --> 0:18:54.800
<v Speaker 4>bad actor who's maybe trying to influence how people feel

0:18:54.800 --> 0:18:57.720
<v Speaker 4>about a particular topic, and maybe they've created a bunch

0:18:57.760 --> 0:19:01.320
<v Speaker 4>of bots on Twitter slash x and they're using AI

0:19:01.480 --> 0:19:04.160
<v Speaker 4>to just flood the zone with a bunch of AI

0:19:04.240 --> 0:19:08.960
<v Speaker 4>slop supporting their particular viewpoints. On the other hand, if

0:19:08.960 --> 0:19:12.479
<v Speaker 4>you're a journalist and your business is to write, you know,

0:19:12.600 --> 0:19:16.520
<v Speaker 4>like basic understandable copy about a news topic.

0:19:16.800 --> 0:19:17.880
<v Speaker 6>Just to be clear, I'm

0:19:17.680 --> 0:19:21.440
<v Speaker 4>not advocating this at all, but that intent is very

0:19:21.440 --> 0:19:25.040
<v Speaker 4>different to I'm going to try to influence something by

0:19:25.280 --> 0:19:26.800
<v Speaker 4>just you know, sheer volume.

0:19:27.240 --> 0:19:29.680
<v Speaker 7>Yeah, I mean, definitely these are like one is a

0:19:29.720 --> 0:19:32.239
<v Speaker 7>lot more severe than the other. But I think at

0:19:32.240 --> 0:19:34.280
<v Speaker 7>the same time, if you're a journalist and you're using

0:19:34.760 --> 0:19:38.000
<v Speaker 7>AI to basically shirk your work and like not do

0:19:38.080 --> 0:19:40.240
<v Speaker 7>your work, I think that's also a problem. And I

0:19:40.240 --> 0:19:42.880
<v Speaker 7>think it's a reputational risk to the outlet because people

0:19:42.960 --> 0:19:44.879
<v Speaker 7>can tell and people are going to call you out.

0:19:45.440 --> 0:19:46.840
<v Speaker 7>There's a lot of people who don't want to read

0:19:46.840 --> 0:19:49.240
<v Speaker 7>AI slop kind of regardless of where it's from.

0:19:49.520 --> 0:19:52.840
<v Speaker 2>Yeah, this is definitely true. Are you ever going

0:19:52.880 --> 0:19:55.240
<v Speaker 2>to run out of human material to train on?

0:19:55.400 --> 0:19:55.520
<v Speaker 5>Right?

0:19:55.560 --> 0:19:57.920
<v Speaker 2>Like you could be pretty confident that if you find

0:19:57.960 --> 0:20:00.879
<v Speaker 2>some piece of text that was published on the internet

0:20:00.880 --> 0:20:03.960
<v Speaker 2>prior to twenty twenty three, but certainly prior to like

0:20:04.000 --> 0:20:06.840
<v Speaker 2>twenty nineteen or something like that, you can be extremely

0:20:06.880 --> 0:20:11.240
<v Speaker 2>sure that this is human generated. Do you worry that

0:20:11.400 --> 0:20:14.040
<v Speaker 2>in the future, like it's going to be harder to

0:20:14.200 --> 0:20:16.840
<v Speaker 2>even establish the provenance of your training data.

0:20:17.200 --> 0:20:18.800
<v Speaker 5>Uh, Yeah, it's definitely a concern for us.

0:20:18.920 --> 0:20:20.280
<v Speaker 3>Talk to us about how to think about this.

0:20:20.359 --> 0:20:23.440
<v Speaker 7>So we have a near infinite data reservoir of pre

0:20:23.560 --> 0:20:26.600
<v Speaker 7>twenty twenty three data, there's just like more than enough

0:20:26.600 --> 0:20:28.280
<v Speaker 7>for us to train on for a long long time.

0:20:28.920 --> 0:20:31.080
<v Speaker 7>But part of the problem is we also want to

0:20:31.080 --> 0:20:33.560
<v Speaker 7>train on modern text. We want to there's all this

0:20:33.640 --> 0:20:36.840
<v Speaker 7>talk about like if somebody's writing about LLMs or about AI,

0:20:36.920 --> 0:20:39.560
<v Speaker 7>we don't want to incorrectly flag that as AI because

0:20:39.760 --> 0:20:43.399
<v Speaker 7>our training data has no sense of this topic. So

0:20:44.040 --> 0:20:46.040
<v Speaker 7>I think we're looking at different ways to do this,

0:20:46.160 --> 0:20:48.760
<v Speaker 7>but most of them are just like figuring out like

0:20:48.800 --> 0:20:49.960
<v Speaker 7>who is a trusted actor?

0:20:50.000 --> 0:20:51.160
<v Speaker 5>Who do we know is.

0:20:51.160 --> 0:20:53.919
<v Speaker 7>putting out human-written content, and we could use our

0:20:53.960 --> 0:20:56.080
<v Speaker 7>model for that, like to some degree. And then so

0:20:56.200 --> 0:20:58.600
<v Speaker 7>we have known actors, we know they're putting out human

0:20:58.640 --> 0:21:00.560
<v Speaker 7>written content, and then we could use theirs as well.

0:21:00.920 --> 0:21:03.680
<v Speaker 4>Slightly random question, but using your model, are you able

0:21:03.680 --> 0:21:06.919
<v Speaker 4>to quantify like what percentage of the Internet at the

0:21:06.960 --> 0:21:08.240
<v Speaker 4>moment is AI slop?

0:21:08.600 --> 0:21:12.920
<v Speaker 2>It's about forty percent. Based on what? Like, how'd

0:21:12.920 --> 0:21:13.639
<v Speaker 2>you get that number?

0:21:13.960 --> 0:21:16.960
<v Speaker 7>So a lot of the Internet is just like SEO

0:21:17.080 --> 0:21:20.480
<v Speaker 7>written articles and like, yeah, it's articles written for search

0:21:20.560 --> 0:21:22.440
<v Speaker 7>basically so that your website comes up more often in

0:21:22.440 --> 0:21:24.919
<v Speaker 7>search because it's targeting certain keywords. And a lot of

0:21:24.920 --> 0:21:28.280
<v Speaker 7>that industry has switched over to using AI because then

0:21:28.320 --> 0:21:30.480
<v Speaker 7>instead of having to pay writers you could churn out

0:21:30.560 --> 0:21:33.520
<v Speaker 7>articles for pennies on the dollar, but I think that

0:21:33.600 --> 0:21:36.280
<v Speaker 7>kind of results in a lot of the Internet being

0:21:36.359 --> 0:21:39.399
<v Speaker 7>AI written. It's also kind of

0:21:39.440 --> 0:21:43.040
<v Speaker 7>platform dependent. It's about forty percent from like an Internet

0:21:43.040 --> 0:21:46.600
<v Speaker 7>page perspective. About a year and a half ago, we

0:21:46.640 --> 0:21:49.600
<v Speaker 7>looked at Medium and found that over fifty percent of

0:21:49.840 --> 0:21:54.240
<v Speaker 7>newly written Medium articles were generated, which was a crazy

0:21:54.320 --> 0:21:54.840
<v Speaker 7>high number.

0:21:54.880 --> 0:21:55.520
<v Speaker 3>What about Reddit?

0:21:56.160 --> 0:21:58.679
<v Speaker 7>Reddit, it was seven percent a year ago, I believe

0:21:58.920 --> 0:21:59.879
<v Speaker 7>a little over ten percent.

0:22:00.400 --> 0:22:03.280
<v Speaker 4>Well, actually this reminds me. So I'm on Reddit a

0:22:03.280 --> 0:22:05.840
<v Speaker 4>lot and I really enjoy it nowadays as a platform,

0:22:05.880 --> 0:22:07.600
<v Speaker 4>but I do worry about how much of it is

0:22:07.640 --> 0:22:11.280
<v Speaker 4>being generated by AI. And the thing I don't necessarily

0:22:11.359 --> 0:22:16.000
<v Speaker 4>understand is what are the economic incentives to actually write

0:22:16.040 --> 0:22:18.480
<v Speaker 4>a bunch of AI generated posts on Reddit and get

0:22:18.600 --> 0:22:22.439
<v Speaker 4>upvoted? Like, why does that system or motivation even exist?

0:22:22.760 --> 0:22:25.200
<v Speaker 7>So there are startups I'm not going to name names

0:22:25.240 --> 0:22:27.520
<v Speaker 7>because I don't want to promote them, but they will

0:22:28.119 --> 0:22:30.480
<v Speaker 7>sell a promise to companies that we're going to get

0:22:30.480 --> 0:22:33.719
<v Speaker 7>you organic mentions on Reddit. We're going to run our

0:22:33.760 --> 0:22:37.320
<v Speaker 7>AI bots that seem organic, and they're just going to,

0:22:37.640 --> 0:22:40.280
<v Speaker 7>you know, naturally recommend your product or you know, just

0:22:40.359 --> 0:22:43.119
<v Speaker 7>mention your product in the comments or in a post.

0:22:43.600 --> 0:22:46.399
<v Speaker 7>And so I've seen evidence of this. We can find

0:22:46.440 --> 0:22:51.520
<v Speaker 7>these, like, they're basically like bot farms that are mostly engaging,

0:22:52.000 --> 0:22:55.000
<v Speaker 7>seemingly organically, just like doing a short reply, and then

0:22:55.040 --> 0:22:57.560
<v Speaker 7>sometimes they're doing this brand mention. And so that's why

0:22:57.560 --> 0:22:58.840
<v Speaker 7>these posts are very valuable.

0:22:58.840 --> 0:22:59.680
<v Speaker 6>That's really interesting.

0:22:59.720 --> 0:23:02.280
<v Speaker 2>I have to also imagine it's valuable because all

0:23:02.359 --> 0:23:05.280
<v Speaker 2>of the models train on Reddit, right, and if you

0:23:05.359 --> 0:23:09.399
<v Speaker 2>want your product's name to appear in model outputs, it's like,

0:23:09.680 --> 0:23:13.520
<v Speaker 2>what is the best you know, nose hair trimmer or whatever,

0:23:13.960 --> 0:23:16.320
<v Speaker 2>And there's a bunch of bots that on Reddit talked

0:23:16.320 --> 0:23:18.920
<v Speaker 2>about this nose hair trimmer, and then that's probably more.

0:23:18.800 --> 0:23:21.639
<v Speaker 3>likely to show up in a ChatGPT request, right.

0:23:21.760 --> 0:23:23.920
<v Speaker 7>Yeah, yeah, it's been weirdly gamed. You know, you used

0:23:23.920 --> 0:23:26.200
<v Speaker 7>to just google best nose hair trimmer, and now there's

0:23:26.240 --> 0:23:27.160
<v Speaker 7>like a thousand.

0:23:27.400 --> 0:23:29.959
<v Speaker 4>The Reddit search results like show up first nowadays.

0:23:30.080 --> 0:23:31.240
<v Speaker 6>Yeah, that's where people are looking.

0:23:31.480 --> 0:23:34.439
<v Speaker 7>Yeah, and then people start searching best nose trimmer Reddit

0:23:35.280 --> 0:23:37.280
<v Speaker 7>to get their Reddit comments on it. And now it's

0:23:37.680 --> 0:23:39.800
<v Speaker 7>people have realized that that's what people are searching for.

0:23:40.119 --> 0:23:43.480
<v Speaker 7>So you need to populate Reddit with your advertisements.

0:23:44.760 --> 0:23:46.600
<v Speaker 4>I'm on the Men's Health. Are you looking for nose

0:23:46.640 --> 0:23:47.240
<v Speaker 4>hair trimmers?

0:23:47.440 --> 0:23:50.440
<v Speaker 2>The Panasonic ear and nose hair trimmer is the number

0:23:50.480 --> 0:23:53.800
<v Speaker 2>one choice on Men's Health. Pros: easy to hold. Anyway,

0:23:53.960 --> 0:23:54.240
<v Speaker 2>it's not.

0:23:54.440 --> 0:23:57.800
<v Speaker 5>Yeah, it's all these affiliate links. Yeah, just destroyed the Internet.

0:23:57.920 --> 0:24:00.760
<v Speaker 2>I know, it's too bad, but whatever, talk to

0:24:00.840 --> 0:24:03.280
<v Speaker 2>us more about the whole pipeline. So, I'm very fascinated

0:24:03.280 --> 0:24:05.879
<v Speaker 2>by this idea. It's like, Okay, you see this review

0:24:05.960 --> 0:24:08.480
<v Speaker 2>for Denny's. You have the AI model

0:24:08.600 --> 0:24:10.879
<v Speaker 3>try to replicate it as best as it could. Maybe

0:24:10.880 --> 0:24:13.000
<v Speaker 3>these subtle differences. Talk to us, though, about, like

0:24:13.040 --> 0:24:14.000
<v Speaker 3>the whole pipeline.

0:24:14.000 --> 0:24:16.640
<v Speaker 2>What are the other tests that you're using to get

0:24:16.680 --> 0:24:19.760
<v Speaker 2>the true you know, because what I imagine you're trying to

0:24:19.800 --> 0:24:22.879
<v Speaker 2>do is get the most similar data sets with an

0:24:22.880 --> 0:24:26.760
<v Speaker 2>almost imperceptible difference to really stress test. Yeah, talk to

0:24:26.840 --> 0:24:28.120
<v Speaker 2>us really about the whole pipeline.

0:24:28.160 --> 0:24:28.320
<v Speaker 4>Yeah.

0:24:28.359 --> 0:24:30.359
<v Speaker 7>So what we're really trying to do here is we're as.

0:24:30.240 --> 0:24:33.240
<v Speaker 3>A model maker myself, no, no, sorry, keep going.

0:24:33.320 --> 0:24:35.159
<v Speaker 5>Yeah, as an AI expert, Yeah, yeah.

0:24:35.000 --> 0:24:36.920
<v Speaker 3>As an AI expert. I need to hear some tips

0:24:36.960 --> 0:24:37.520
<v Speaker 3>from the field.

0:24:38.600 --> 0:24:41.399
<v Speaker 7>Uh yeah, So what we're really looking for is examples

0:24:41.400 --> 0:24:43.800
<v Speaker 7>that are as close to the boundary between human and

0:24:43.840 --> 0:24:47.000
<v Speaker 7>AI as possible so that our model learns better. Something that's

0:24:47.119 --> 0:24:50.399
<v Speaker 7>very obviously AI is, you know, our model's not learning

0:24:50.400 --> 0:24:53.639
<v Speaker 7>as much same thing for something that's obviously human. And

0:24:53.720 --> 0:24:57.879
<v Speaker 7>so step one is creating this data set with synthetic

0:24:57.920 --> 0:25:00.639
<v Speaker 7>mirrors of human examples, and then we train a model,

0:25:00.960 --> 0:25:03.920
<v Speaker 7>and then step two is something called active learning. So

0:25:03.960 --> 0:25:06.840
<v Speaker 7>we then take this model and use it to scan

0:25:06.960 --> 0:25:10.920
<v Speaker 7>a much larger corpus of data and look for errors,

0:25:11.200 --> 0:25:14.440
<v Speaker 7>false positives, false negatives, and then we pull those back

0:25:14.480 --> 0:25:17.080
<v Speaker 7>into our training set and are able to train a

0:25:17.160 --> 0:25:20.919
<v Speaker 7>much better model because it's seen these errors, which and

0:25:20.960 --> 0:25:23.119
<v Speaker 7>these errors we believe are just much closer to the

0:25:23.520 --> 0:25:24.840
<v Speaker 7>boundary between human and AI.
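
NOTE
Editor's sketch, not Pangram's pipeline. The two steps described here (train on
an initial labeled set, then scan a larger labeled pool and pull the model's
mistakes back into training) can be illustrated with a toy one-feature detector.
The train() function, the feature values, and the labels are all hypothetical:
```python
def train(examples):
    # Toy "detector": threshold halfway between the mean feature of each class.
    ai = [x for x, label in examples if label == "ai"]
    human = [x for x, label in examples if label == "human"]
    threshold = (sum(ai) / len(ai) + sum(human) / len(human)) / 2
    return lambda x: "ai" if x >= threshold else "human"
def mine_errors(model, labeled_pool):
    # Keep only pool examples the current model misclassifies.
    return [(x, label) for x, label in labeled_pool if model(x) != label]
def active_learning(train_set, labeled_pool, rounds=3):
    # Retrain on the seed set plus mined errors until no new errors remain.
    model = train(train_set)
    for _ in range(rounds):
        errors = [e for e in mine_errors(model, labeled_pool) if e not in train_set]
        if not errors:
            break
        train_set = train_set + errors
        model = train(train_set)
    return model, train_set
seed = [(0.1, "human"), (0.9, "ai")]
pool = [(0.45, "ai"), (0.2, "human"), (0.8, "ai"), (0.3, "human")]
model, final_set = active_learning(seed, pool)
print(len(final_set), len(mine_errors(model, pool)))
```
In this toy run the one misclassified pool example sits near the decision
boundary, which matches the intuition that mined errors are the informative ones.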

0:25:25.080 --> 0:25:28.040
<v Speaker 2>So sorry, just to be clear, the first pass is like, okay,

0:25:28.080 --> 0:25:31.800
<v Speaker 2>you have known human writing and known AI writing. You

0:25:31.840 --> 0:25:34.760
<v Speaker 2>train a model, and then the next pass is once

0:25:34.800 --> 0:25:38.199
<v Speaker 2>again on known human and known AI writing. So you already

0:25:38.240 --> 0:25:41.600
<v Speaker 2>know the answer of each of these and therefore you

0:25:41.640 --> 0:25:44.000
<v Speaker 2>could come up with a list of which it got wrong,

0:25:44.400 --> 0:25:46.840
<v Speaker 2>and then that gets fed back into the first.

0:25:46.640 --> 0:25:50.000
<v Speaker 7>Exactly. And so once we retrain, then

0:25:50.040 --> 0:25:52.760
<v Speaker 7>the model gets much much better, and then we could

0:25:52.840 --> 0:25:55.600
<v Speaker 7>do this as many times as we want to, kind

0:25:55.600 --> 0:25:58.800
<v Speaker 7>of just have a self improving model that gets better

0:25:58.880 --> 0:26:01.600
<v Speaker 7>with every training run. I can also go

0:26:01.640 --> 0:26:04.840
<v Speaker 7>a little bit more into how we deal with AI edits,

0:26:05.000 --> 0:26:08.840
<v Speaker 7>because I think that's an increasingly important problem. Like, I

0:26:08.880 --> 0:26:12.080
<v Speaker 7>think most writing will be AI assisted in the future.

0:26:12.440 --> 0:26:14.719
<v Speaker 7>I think it's already in Google Docs and it's in

0:26:15.040 --> 0:26:15.760
<v Speaker 7>Google Keyboard.

0:26:16.000 --> 0:26:18.359
<v Speaker 4>Grammarly arguably has been doing this for a while.

0:26:18.520 --> 0:26:18.879
<v Speaker 5>Exactly.

0:26:18.960 --> 0:26:22.480
<v Speaker 7>Yeah, Grammarly uses LLMs on the back end, and we

0:26:22.760 --> 0:26:25.400
<v Speaker 7>don't want to just say, like, all writing is AI now.

0:26:25.520 --> 0:26:28.000
<v Speaker 7>We want to be able to differentiate between AI assisted

0:26:28.280 --> 0:26:30.560
<v Speaker 7>and AI generated. So what we do is we also

0:26:30.640 --> 0:26:34.720
<v Speaker 7>have different prompts. So rather than saying so for our

0:26:34.960 --> 0:26:38.679
<v Speaker 7>human review of Denny's, rather than saying, generate a review

0:26:38.800 --> 0:26:41.439
<v Speaker 7>like this, we could say, help improve this, make it

0:26:41.480 --> 0:26:43.920
<v Speaker 7>more formal, make it more like, clean up the grammar.

0:26:44.080 --> 0:26:47.320
<v Speaker 7>And so we have like a long list of AI

0:26:47.520 --> 0:26:51.680
<v Speaker 7>editing prompts, and then we're able to look at basically

0:26:51.680 --> 0:26:56.280
<v Speaker 7>the cosine distance between the original human text and the edited text.

0:26:56.600 --> 0:26:59.240
<v Speaker 3>In that hyper multidimensional space.

0:26:59.080 --> 0:27:03.800
<v Speaker 7>Exactly. So how much did AI change this text? And

0:27:03.840 --> 0:27:06.119
<v Speaker 7>then we're able to train our model to say, like

0:27:06.760 --> 0:27:09.080
<v Speaker 7>we're just going to like put a point on this

0:27:09.119 --> 0:27:11.960
<v Speaker 7>distance and say like this is moderate AI assistance, this is

0:27:12.040 --> 0:27:14.240
<v Speaker 7>light AI assistance, and this is heavy AI assistance.
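
NOTE
Editor's sketch, not Pangram's code. The idea of measuring how far an AI edit
moved a text, then bucketing that distance into light, moderate, or heavy
assistance, might look like the following. It assumes some embedding model has
already turned each text into a vector; the vectors and cutoff values below are
invented for illustration:
```python
import math
def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means same direction, larger means more change.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
def assistance_level(distance, light_max=0.05, moderate_max=0.20):
    # Cutoffs are made-up illustrations, not values from the interview.
    if distance <= light_max:
        return "light"
    if distance <= moderate_max:
        return "moderate"
    return "heavy"
original = [1.0, 0.0]   # hypothetical embedding of the human review
edited = [1.0, 0.5]     # hypothetical embedding after an AI editing prompt
print(assistance_level(cosine_distance(original, edited)))
```
Labels produced this way could then serve as training targets, which is how the
guest describes using the distance.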

0:27:14.560 --> 0:27:16.919
<v Speaker 4>Interesting. I'm going to do something I don't think I've

0:27:16.960 --> 0:27:20.600
<v Speaker 4>ever done before, which is ask a founder about their

0:27:20.680 --> 0:27:24.760
<v Speaker 4>corporate mission. But you know, you've set up this company,

0:27:25.320 --> 0:27:27.359
<v Speaker 4>and when you think about what you're trying to do here,

0:27:27.520 --> 0:27:30.520
<v Speaker 4>is it just basic AI detection in the sense that

0:27:30.560 --> 0:27:32.600
<v Speaker 4>there might be you know, a few groups of people

0:27:32.720 --> 0:27:35.960
<v Speaker 4>like teachers that find this very valuable, or is the

0:27:36.000 --> 0:27:40.399
<v Speaker 4>mission something broader where you're actually trying to improve the

0:27:40.480 --> 0:27:42.720
<v Speaker 4>Internet and what people see on it.

0:27:43.000 --> 0:27:46.800
<v Speaker 7>I believe the technology of being able to detect AI

0:27:46.840 --> 0:27:51.439
<v Speaker 7>generated content is immensely valuable, and it's valuable not just

0:27:51.480 --> 0:27:55.680
<v Speaker 7>for teachers, but for basically everybody in every profession. Lawyers,

0:27:56.040 --> 0:28:00.560
<v Speaker 7>publishers, just an individual who consumes content on the Internet.

0:28:00.760 --> 0:28:04.480
<v Speaker 7>I think it's valuable for all these people. But ultimately, yeah,

0:28:04.520 --> 0:28:07.719
<v Speaker 7>our high level goal is to help mitigate some of

0:28:07.760 --> 0:28:11.119
<v Speaker 7>these negative effects of growing AI content.

0:28:11.440 --> 0:28:16.280
<v Speaker 4>But for instance, just using the product review example, is

0:28:16.320 --> 0:28:19.520
<v Speaker 4>the vision that like a Yelp, for instance, would want

0:28:19.520 --> 0:28:22.119
<v Speaker 4>to use this technology to make sure that its system

0:28:22.280 --> 0:28:25.520
<v Speaker 4>isn't being gamed or is the vision Like if I

0:28:25.560 --> 0:28:28.720
<v Speaker 4>am a particularly diligent consumer who has a lot of

0:28:28.720 --> 0:28:30.800
<v Speaker 4>time on my hands and I'm looking to go out

0:28:30.840 --> 0:28:34.440
<v Speaker 4>to a restaurant, I can run all these individual restaurant

0:28:34.480 --> 0:28:38.400
<v Speaker 4>reviews through Pangram and then like actually figure out if

0:28:38.440 --> 0:28:39.680
<v Speaker 4>it's real hype or not.

0:28:40.280 --> 0:28:42.800
<v Speaker 7>So I think right now it's a lot of the former.

0:28:42.880 --> 0:28:46.000
<v Speaker 7>We work with platforms. One of our biggest customers is Quora,

0:28:46.600 --> 0:28:49.120
<v Speaker 7>and they run a bunch of content through Pangram. But

0:28:49.160 --> 0:28:52.480
<v Speaker 7>we have a lot of different platforms that use Pangram

0:28:52.560 --> 0:28:56.440
<v Speaker 7>to help moderate and find AI bad actors and get

0:28:56.440 --> 0:28:58.640
<v Speaker 7>them off their platform. But I also think, yeah, the

0:28:58.760 --> 0:29:01.920
<v Speaker 7>individual consumer case has been growing a lot, and we're

0:29:01.920 --> 0:29:03.560
<v Speaker 7>really interested in pushing here.

0:29:03.240 --> 0:29:23.320
<v Speaker 2>The free version of pangram dot com. Like, you

0:29:23.360 --> 0:29:26.160
<v Speaker 2>get a handful of tests a day or something like that.

0:29:26.800 --> 0:29:32.440
<v Speaker 2>If someone had an unlimited number of Pangram responses and

0:29:32.840 --> 0:29:36.240
<v Speaker 2>maybe had access to the Pangram API at infinite scale,

0:29:36.960 --> 0:29:40.959
<v Speaker 2>could they theoretically learn a prompt that they would then

0:29:41.040 --> 0:29:43.880
<v Speaker 2>be able to put into an AI to generate human-style writing?

0:29:43.920 --> 0:29:46.479
<v Speaker 7>I actually had a friend do that. He put his

0:29:46.560 --> 0:29:49.640
<v Speaker 7>Claude Code on a loop. I gave him some API credits,

0:29:49.680 --> 0:29:53.120
<v Speaker 7>and then his Claude Code just basically worked overnight writing

0:29:53.120 --> 0:29:55.480
<v Speaker 7>a prompt, trying to get it to output something that's

0:29:55.520 --> 0:29:58.360
<v Speaker 7>human written, or rather that came back from Pangram

0:29:58.480 --> 0:30:01.680
<v Speaker 7>as human written. They got there, but the text was

0:30:01.720 --> 0:30:06.760
<v Speaker 7>pretty like uh incoherent, so so like, yeah, it was

0:30:06.920 --> 0:30:11.680
<v Speaker 7>producing more or less long gibberish. It was like grammatically incorrect.

0:30:12.600 --> 0:30:14.600
<v Speaker 7>A lot of the words just didn't really make sense.

0:30:14.680 --> 0:30:16.600
<v Speaker 2>Because this was my first thought, like when I saw it,

0:30:16.640 --> 0:30:18.680
<v Speaker 2>I was like, that would be like a fun experiment

0:30:19.120 --> 0:30:21.800
<v Speaker 2>to see if you could take all the outputs, find

0:30:21.800 --> 0:30:24.400
<v Speaker 2>the difference and just keep iterating on the prompt you

0:30:24.400 --> 0:30:27.560
<v Speaker 2>would have to tell AI in order to eventually get

0:30:27.560 --> 0:30:31.240
<v Speaker 2>an output that looked to Pangram like it was human generated.

0:30:31.360 --> 0:30:32.920
<v Speaker 7>Yeah, I think there's a way to do it if

0:30:32.960 --> 0:30:36.080
<v Speaker 7>you also had like an LLM judge on coherency and

0:30:36.200 --> 0:30:40.040
<v Speaker 7>used, like, Pangram and the coherency judge both to score

0:30:40.160 --> 0:30:43.280
<v Speaker 7>your text. I think it's definitely possible, and I'm excited

0:30:43.280 --> 0:30:44.960
<v Speaker 7>for someone to try to do it, because we could

0:30:44.960 --> 0:30:46.840
<v Speaker 7>make our model a lot better and more robust if

0:30:46.840 --> 0:30:47.480
<v Speaker 7>this existed.

0:30:47.640 --> 0:30:49.719
<v Speaker 4>So I want to know what your personal like token

0:30:49.760 --> 0:30:52.880
<v Speaker 4>budget is nowadays that you're even like contemplating some of

0:30:52.880 --> 0:30:53.360
<v Speaker 4>this stuff.

0:30:53.360 --> 0:30:56.000
<v Speaker 2>Well, I feel like, I have the Claude Max plan,

0:30:56.040 --> 0:30:59.400
<v Speaker 2>you know, and I don't work like when I'm at work,

0:31:00.000 --> 0:31:02.080
<v Speaker 2>I don't work on any of my Vibe coding projects.

0:31:02.160 --> 0:31:03.680
<v Speaker 3>And you know, like when we were kids.

0:31:03.840 --> 0:31:06.000
<v Speaker 2>I don't know if you remember, like if you didn't

0:31:06.000 --> 0:31:08.480
<v Speaker 2>eat all your food, like, someone would say, oh, there's

0:31:08.480 --> 0:31:09.760
<v Speaker 2>like starving kids in the world.

0:31:10.080 --> 0:31:13.120
<v Speaker 4>Yeah, I'm like, oh, it's starving Vibe coder.

0:31:14.280 --> 0:31:15.280
<v Speaker 3>It's like, oh, you didn't.

0:31:15.320 --> 0:31:17.720
<v Speaker 2>Like I have this four hour token window and I'm

0:31:17.760 --> 0:31:20.520
<v Speaker 2>almost never maxing it out, and I'm just thinking, like,

0:31:20.880 --> 0:31:22.600
<v Speaker 2>there are kids on the other side of the world

0:31:22.600 --> 0:31:25.160
<v Speaker 2>that wish they had your tokens and you're not

0:31:25.320 --> 0:31:27.040
<v Speaker 2>using all of your tokens for the window.

0:31:27.120 --> 0:31:27.680
<v Speaker 3>How dare you?

0:31:27.760 --> 0:31:30.360
<v Speaker 2>I feel a little guilty when I don't max

0:31:30.400 --> 0:31:32.760
<v Speaker 2>out my Claude Max token program.

0:31:32.840 --> 0:31:35.400
<v Speaker 7>I also have Claude Max and yeah, most days I'm

0:31:35.640 --> 0:31:37.720
<v Speaker 7>not doing much coding at all, I'm not maxing it out,

0:31:37.840 --> 0:31:39.480
<v Speaker 7>and then some days I'm going...

0:31:39.520 --> 0:31:42.520
<v Speaker 2>You feel a lot guilty about that though? It's like, yeah, yeah. So can

0:31:42.600 --> 0:31:45.960
<v Speaker 2>I just feel like writing is kind of interesting, but like,

0:31:46.200 --> 0:31:49.960
<v Speaker 2>what are the prospects of this being able to work on? Say,

0:31:50.840 --> 0:31:53.160
<v Speaker 2>and you must get this a lot, image and video generation?

0:31:53.960 --> 0:31:56.680
<v Speaker 2>Is it all theoretically similar? Is there a reason

0:31:56.800 --> 0:31:59.360
<v Speaker 2>to think that it will be replicable? Or is this

0:31:59.480 --> 0:32:00.960
<v Speaker 2>just a different beast of a problem.

0:32:01.040 --> 0:32:03.760
<v Speaker 7>I think the approach is definitely doable. I think some

0:32:03.840 --> 0:32:06.760
<v Speaker 7>of the economics change, especially if we look at video

0:32:06.840 --> 0:32:09.400
<v Speaker 7>and the cost of generating video today. Okay, we can't

0:32:09.440 --> 0:32:11.920
<v Speaker 7>generate video at the same scale that we can generate text,

0:32:12.400 --> 0:32:14.320
<v Speaker 7>and so we might need a kind of different approach.

0:32:14.680 --> 0:32:17.320
<v Speaker 7>But I also believe that if we're able to solve

0:32:17.360 --> 0:32:21.120
<v Speaker 7>this for image plus maybe like audio, that could be

0:32:21.240 --> 0:32:22.840
<v Speaker 7>enough to just solve it for video as well.

0:32:22.920 --> 0:32:24.000
<v Speaker 5>Huh, zero shot.

0:32:24.120 --> 0:32:27.040
<v Speaker 4>Could you ever envision, I don't know, launching some sort

0:32:27.040 --> 0:32:30.880
<v Speaker 4>of like certification program for video because this seems to

0:32:30.920 --> 0:32:33.920
<v Speaker 4>be my dad's a boomer spends a lot of time

0:32:33.960 --> 0:32:36.960
<v Speaker 4>on Facebook, Like this seems to be what society needs, right,

0:32:37.080 --> 0:32:39.240
<v Speaker 4>Like a video that comes with a little thing that

0:32:39.280 --> 0:32:42.680
<v Speaker 4>says this is not AI generated and someone has actually

0:32:42.760 --> 0:32:44.320
<v Speaker 4>like rubber stamped that, so.

0:32:44.360 --> 0:32:47.240
<v Speaker 7>There's an organization called c TWOPA, and I think they're

0:32:47.280 --> 0:32:52.000
<v Speaker 7>doing pretty good work on content provenance. Basically, they are

0:32:52.040 --> 0:32:57.520
<v Speaker 7>working with phone makers and hardware makers to basically embed

0:32:57.640 --> 0:33:02.080
<v Speaker 7>like hardware signatures to prove that image and video were

0:33:02.080 --> 0:33:03.120
<v Speaker 7>truly taken from.

0:33:03.000 --> 0:33:05.120
<v Speaker 4>The hardware like watermarks basically.

0:33:04.840 --> 0:33:07.720
<v Speaker 7>Yeah, exactly so, So rather than marking the AI outputs, yeah,

0:33:07.760 --> 0:33:11.400
<v Speaker 7>we're instead embedding like a proof of authenticity in the

0:33:12.360 --> 0:33:15.080
<v Speaker 7>the like thing that's real and is captured.

0:33:14.760 --> 0:33:15.200
<v Speaker 5>In real life.

0:33:15.280 --> 0:33:19.480
<v Speaker 3>That's interesting, all right, So big picture, where's the Internet going?

0:33:19.640 --> 0:33:21.440
<v Speaker 2>You know, you mentioned forty percent of the Internet is

0:33:21.440 --> 0:33:24.560
<v Speaker 2>already AI generated, but maybe that's not the end of the world, like,

0:33:25.000 --> 0:33:26.719
<v Speaker 2>you know, if it's just a bunch of SEO pages

0:33:26.760 --> 0:33:29.160
<v Speaker 2>that I never read, I don't know whatever, But like

0:33:29.560 --> 0:33:31.840
<v Speaker 2>give us some thoughts high level about like what's the

0:33:31.880 --> 0:33:35.800
<v Speaker 2>trajectory of the Internet. Regardless of the uptake of Pangram

0:33:35.800 --> 0:33:37.360
<v Speaker 2>and other AI detection models.

0:33:37.560 --> 0:33:40.600
<v Speaker 5>I'm a little bit worried about the state of the Internet.

0:33:40.600 --> 0:33:41.440
<v Speaker 5>I'm gonna be honest.

0:33:41.880 --> 0:33:44.720
<v Speaker 7>I think like right now, there's still like so much

0:33:44.760 --> 0:33:47.400
<v Speaker 7>of it is built around trust and norms in a

0:33:47.440 --> 0:33:50.480
<v Speaker 7>way that like we're we're not really well equipped to

0:33:50.680 --> 0:33:53.720
<v Speaker 7>suddenly deal with an onslaught of bots at a completely

0:33:53.720 --> 0:33:55.320
<v Speaker 7>different scale than we've dealt with before.

0:33:55.920 --> 0:33:58.240
<v Speaker 5>There's maybe like a good case and a bad case.

0:33:58.480 --> 0:34:00.560
<v Speaker 7>I would say, like the bad case is the Internet

0:34:00.680 --> 0:34:04.240
<v Speaker 7>goes the way of dead internet theory, just like every

0:34:04.280 --> 0:34:07.280
<v Speaker 7>space that's open and accessible is just flooded by bots,

0:34:07.600 --> 0:34:10.000
<v Speaker 7>and then the only place people are able to communicate

0:34:10.040 --> 0:34:14.239
<v Speaker 7>authentically is in like very walled garden like closed servers

0:34:14.280 --> 0:34:17.280
<v Speaker 7>like Discord servers, for example, where you know everybody's

0:34:17.360 --> 0:34:19.000
<v Speaker 7>identity is known and you know you don't.

0:34:18.800 --> 0:34:21.600
<v Speaker 5>Have bots in here. So that's maybe the like bad scenario.

0:34:21.920 --> 0:34:24.399
<v Speaker 2>Can I do an insane thought that I've had? Go on.

0:34:25.360 --> 0:34:28.440
<v Speaker 2>You're gonna get a kick out of this. So when like I

0:34:28.480 --> 0:34:30.799
<v Speaker 2>forget what they call like this idea of like for

0:34:30.880 --> 0:34:31.880
<v Speaker 2>the bad actors, it's.

0:34:31.680 --> 0:34:34.200
<v Speaker 3>Called like heaven mode or heaven banning. Have you heard

0:34:34.200 --> 0:34:36.640
<v Speaker 3>of this? So there's this thought that one way.

0:34:36.520 --> 0:34:40.319
<v Speaker 2>You could deal with bad actors on the Internet is

0:34:41.280 --> 0:34:44.480
<v Speaker 2>suddenly they're on a version of say Twitter, in which

0:34:44.520 --> 0:34:47.480
<v Speaker 2>there are only bots and everyone always agrees with them on

0:34:47.520 --> 0:34:50.080
<v Speaker 2>everything and it drives them crazy and stuff like that,

0:34:50.320 --> 0:34:52.239
<v Speaker 2>and they would never know it because they're like, oh,

0:34:52.239 --> 0:34:54.160
<v Speaker 2>there's call, everyone's there, and then it's so like slowly

0:34:54.200 --> 0:34:56.040
<v Speaker 2>like yeah, they just this is like a way you

0:34:56.080 --> 0:34:58.279
<v Speaker 2>could punish people by putting them on an internet where

0:34:58.320 --> 0:34:59.480
<v Speaker 2>they will never get any fight.

0:35:00.120 --> 0:35:02.560
<v Speaker 7>Banned and put into basically jail. You're talking to a bunch.

0:35:02.360 --> 0:35:04.040
<v Speaker 3>Of that's right, that's right, that would be jail. But

0:35:04.080 --> 0:35:04.799
<v Speaker 3>you're heaven banned.

0:35:04.920 --> 0:35:07.080
<v Speaker 2>But I thought, and again, this is you know, like

0:35:07.080 --> 0:35:09.000
<v Speaker 2>I built this little AI model myself and I like

0:35:09.000 --> 0:35:11.399
<v Speaker 2>showed it to my friends, like, oh, it's really cool, Joe.

0:35:11.400 --> 0:35:13.719
<v Speaker 2>I'm really impressed, like, I'm really impressed by like that

0:35:13.760 --> 0:35:16.239
<v Speaker 2>you're able to do this. And I was like, are

0:35:16.280 --> 0:35:18.520
<v Speaker 2>people being honest with me? Have I been heaven banned?

0:35:18.520 --> 0:35:20.799
<v Speaker 2>Because I just like, like, you can be honest with

0:35:20.840 --> 0:35:21.560
<v Speaker 2>me if it sucks.

0:35:21.560 --> 0:35:23.400
<v Speaker 3>And I sort of have the fear.

0:35:23.360 --> 0:35:26.840
<v Speaker 4>The biggest humble brag: I did this thing and everyone thought it

0:35:26.880 --> 0:35:27.399
<v Speaker 4>was not great.

0:35:27.520 --> 0:35:29.279
<v Speaker 3>I'm just saying, like people are like I think people.

0:35:29.320 --> 0:35:31.560
<v Speaker 3>I'm worried that like people are being nice to me because like,

0:35:31.560 --> 0:35:33.400
<v Speaker 3>oh cool, yeah, that's impressive. You like did that.

0:35:33.560 --> 0:35:36.440
<v Speaker 2>And I have this like deep anxiety that like people

0:35:36.440 --> 0:35:38.520
<v Speaker 2>aren't giving it to me straight about it. I know

0:35:38.560 --> 0:35:40.120
<v Speaker 2>that sounds like a humble brag, but it's really not.

0:35:40.320 --> 0:35:42.120
<v Speaker 7>That's why you can never get like too successful, like

0:35:42.200 --> 0:35:45.080
<v Speaker 7>Kanye West surrounded by a bunch of you never get.

0:35:44.880 --> 0:35:47.799
<v Speaker 2>Like, oh, this is his first try doing something with

0:35:47.960 --> 0:35:50.080
<v Speaker 2>vibe coding. I'm like deeply anxious, Like, no, you could

0:35:50.120 --> 0:35:52.480
<v Speaker 2>just tell me if it sucks, that's fine, that's my worry.

0:35:53.000 --> 0:35:53.920
<v Speaker 6>I don't worry about this.

0:35:54.040 --> 0:35:56.439
<v Speaker 4>If I tweet that I'm eating a steak, I will

0:35:56.440 --> 0:35:59.520
<v Speaker 4>get like a hundred people criticizing: you didn't

0:35:59.360 --> 0:35:59.839
<v Speaker 3>Put the meat.

0:36:00.120 --> 0:36:00.520
<v Speaker 2>Yeah.

0:36:00.560 --> 0:36:00.960
<v Speaker 5>Yeah.

0:36:01.000 --> 0:36:02.839
<v Speaker 2>So that's the other thing, which is that the two

0:36:02.920 --> 0:36:06.560
<v Speaker 2>things you are never allowed to tweet about: meat preparation

0:36:07.160 --> 0:36:09.640
<v Speaker 2>and enjoying life, because if you ever enjoy life, then

0:36:09.640 --> 0:36:11.600
<v Speaker 2>if you ever enjoy it, and if you ever prepare.

0:36:11.360 --> 0:36:14.280
<v Speaker 3>Meat, people will flip out at you on the internet.

0:36:14.360 --> 0:36:16.279
<v Speaker 3>Those are the two things that you're not allowed to

0:36:16.360 --> 0:36:17.080
<v Speaker 3>do online.

0:36:17.280 --> 0:36:19.759
<v Speaker 4>Very true. This is a sort of related question, but just going

0:36:19.800 --> 0:36:22.600
<v Speaker 4>back to the methodology, if you're focused on this sort

0:36:22.600 --> 0:36:26.000
<v Speaker 4>of like path dependent idea, I'm kind of envisioning it

0:36:26.040 --> 0:36:29.279
<v Speaker 4>as like a giant decision tree, right, is there a

0:36:29.320 --> 0:36:32.839
<v Speaker 4>possibility that as the models get better and better, and

0:36:32.880 --> 0:36:35.839
<v Speaker 4>we know that they're already injecting like some degree of

0:36:36.120 --> 0:36:39.800
<v Speaker 4>randomness into their output. Although I know there's going to

0:36:39.800 --> 0:36:42.000
<v Speaker 4>be a pedant out there who like messages me and

0:36:42.040 --> 0:36:44.880
<v Speaker 4>says like, well, you know computers can't do like true randomness.

0:36:44.880 --> 0:36:49.480
<v Speaker 4>But setting that aside, setting that aside, like, we know

0:36:49.560 --> 0:36:53.640
<v Speaker 4>that they're adjusting, they're becoming more sophisticated at an incredible rate.

0:36:53.719 --> 0:36:57.480
<v Speaker 4>We know that they're trying to adjust and inject some

0:36:57.719 --> 0:37:01.000
<v Speaker 4>randomness in order to avoid exactly this kind of detection.

0:37:01.880 --> 0:37:05.160
<v Speaker 4>Do you worry about their own adaptation at all?

0:37:05.480 --> 0:37:08.600
<v Speaker 7>I have noticed that the models as they get more capable,

0:37:08.880 --> 0:37:12.279
<v Speaker 7>I believe that their output distribution gets more complex. It's

0:37:12.320 --> 0:37:14.920
<v Speaker 7>harder to learn with a simple model, which is why

0:37:14.960 --> 0:37:18.560
<v Speaker 7>we've been increasing our model size to capture a higher

0:37:18.600 --> 0:37:22.319
<v Speaker 7>complexity function that can capture the LLM outputs. So I

0:37:22.320 --> 0:37:25.719
<v Speaker 7>think we may have to continue to make our models better.

0:37:25.960 --> 0:37:27.359
<v Speaker 7>We're gonna have to work to keep up with it.

0:37:27.719 --> 0:37:29.400
<v Speaker 7>We can't just rest on our laurels.

0:37:29.560 --> 0:37:31.399
<v Speaker 3>What are burstiness and perplexity?

0:37:31.760 --> 0:37:34.799
<v Speaker 7>Yeah, so this is a metric that's used by some

0:37:34.920 --> 0:37:37.960
<v Speaker 7>AI detectors, but not Pangram. Okay. And so I can

0:37:38.000 --> 0:37:41.319
<v Speaker 7>explain a bit about how it works. So perplexity is

0:37:41.480 --> 0:37:42.799
<v Speaker 7>basically a measure of...

0:37:42.800 --> 0:37:45.040
<v Speaker 2>This is not Perplexity dot AI, the website. This is a

0:37:45.080 --> 0:37:45.680
<v Speaker 2>technical term.

0:37:45.719 --> 0:37:48.640
<v Speaker 7>Okay, this is a metric. This is a measure of

0:37:48.719 --> 0:37:52.760
<v Speaker 7>how confusing a piece of text is to a language model.

0:37:53.320 --> 0:37:58.080
<v Speaker 7>So basically, if, for example, with every token, we can

0:37:58.120 --> 0:38:00.800
<v Speaker 7>calculate some perplexity, which is basically like how expected

0:38:00.840 --> 0:38:03.600
<v Speaker 7>this is. So for example, like if it's I went

0:38:03.640 --> 0:38:06.560
<v Speaker 7>home to my pet and then the next token is chinchilla,

0:38:06.840 --> 0:38:09.000
<v Speaker 7>that'd be a much higher perplexity token.

0:38:08.960 --> 0:38:09.880
<v Speaker 5>Than my pet dog.
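
NOTE
Illustrative sketch, not part of the recording: a minimal Python example of the
per-token idea described above, assuming made-up next-token probabilities.
NEXT_TOKEN_PROBS, surprisal and perplexity are hypothetical names, and no real
language model or detector is involved. Perplexity is used in its standard
sense: the exponential of the average negative log-probability per token.
```python
import math
# Hypothetical next-token probabilities after "I went home to my pet";
# the numbers are invented for illustration, not taken from any real model.
NEXT_TOKEN_PROBS = {"dog": 0.5, "cat": 0.3, "chinchilla": 0.001}
def surprisal(prob):
    """Negative log-probability of one token: higher means less expected."""
    return -math.log(prob)
def perplexity(token_probs):
    """Exponential of the mean surprisal over a list of token probabilities."""
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)
# "chinchilla" is the more surprising continuation under these assumed numbers.
print(surprisal(NEXT_TOKEN_PROBS["chinchilla"]) > surprisal(NEXT_TOKEN_PROBS["dog"]))
```
A real perplexity-based detector would read these probabilities from a language
model rather than a hand-written table.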

0:38:10.600 --> 0:38:16.000
<v Speaker 7>So low perplexity text, or really like LLM outputs tend

0:38:16.000 --> 0:38:19.040
<v Speaker 7>to be low perplexity. They're not going to produce outputs

0:38:19.080 --> 0:38:22.960
<v Speaker 7>that are surprising to themselves. So this is a decent

0:38:23.000 --> 0:38:26.160
<v Speaker 7>way to get an AI detector that's around ninety to

0:38:26.239 --> 0:38:30.000
<v Speaker 7>ninety five percent accurate. But it has some problems. The

0:38:30.000 --> 0:38:33.920
<v Speaker 7>main one is that you can't improve upon it. Basically

0:38:34.160 --> 0:38:38.160
<v Speaker 7>it has false positives. Text written by non native English

0:38:38.160 --> 0:38:41.440
<v Speaker 7>speakers often is low perplexity just because when you're late.

0:38:41.440 --> 0:38:42.880
<v Speaker 3>Don't take as many risks. Exactly.

0:38:43.000 --> 0:38:46.400
<v Speaker 7>Yeah, interesting, Yeah, So that's why a lot of the

0:38:46.440 --> 0:38:49.440
<v Speaker 7>early AI detectors had a bunch of false positives. With

0:38:49.800 --> 0:38:53.640
<v Speaker 7>ESL speakers. It's because their text was low perplexity. So

0:38:54.080 --> 0:38:56.600
<v Speaker 7>I think, like, this is a very cool metric, but

0:38:56.800 --> 0:38:59.120
<v Speaker 7>it is not the path for Pangram.

0:38:59.120 --> 0:39:01.520
<v Speaker 5>Instead, we went the deep learning approach, so we can do

0:39:01.600 --> 0:39:02.120
<v Speaker 5>better than.

0:39:02.040 --> 0:39:04.359
<v Speaker 3>And what's burstiness? Is that just the opposite side

0:39:04.360 --> 0:39:04.759
<v Speaker 3>of the coin.

0:39:05.239 --> 0:39:09.040
<v Speaker 7>Yeah, burstiness is basically, actually, yeah, I don't know if

0:39:09.040 --> 0:39:09.600
<v Speaker 7>I can define it.

0:39:09.719 --> 0:39:13.319
<v Speaker 4>Okay, fine, burstiness just sounds like one of those like

0:39:13.560 --> 0:39:16.960
<v Speaker 4>sort of I guess manosphere terms, doesn't it like, oh,

0:39:17.040 --> 0:39:17.520
<v Speaker 4>yeah he.

0:39:17.480 --> 0:39:20.320
<v Speaker 6>Has like he's been looksmaxing with high burst nets or

0:39:20.360 --> 0:39:20.759
<v Speaker 6>something like that.

0:39:21.440 --> 0:39:22.200
<v Speaker 3>Yeah, that's great.

0:39:22.239 --> 0:39:24.080
<v Speaker 7>Yeah, I think it might just be like a measure

0:39:24.160 --> 0:39:27.840
<v Speaker 7>of like sentence length and the ups and downs of

0:39:27.880 --> 0:39:28.320
<v Speaker 7>the text.

0:39:28.960 --> 0:39:32.279
<v Speaker 4>If we assume that the world is collectively concerned about

0:39:32.280 --> 0:39:34.960
<v Speaker 4>AI slop and wants to do something about it, what

0:39:35.000 --> 0:39:39.120
<v Speaker 4>would be like the single biggest change to the system,

0:39:39.480 --> 0:39:42.080
<v Speaker 4>either in terms of like the economics of the internet

0:39:42.160 --> 0:39:46.120
<v Speaker 4>or regulation or technology like what you're developing that would

0:39:46.160 --> 0:39:48.160
<v Speaker 4>actually help reduce slop.

0:39:48.440 --> 0:39:51.080
<v Speaker 7>Yeah, I think the biggest one is norms. So there

0:39:51.080 --> 0:39:53.400
<v Speaker 7>have been a couple of great blog posts written about

0:39:53.440 --> 0:39:58.120
<v Speaker 7>how it is rude to send other people undisclosed AI outputs,

0:39:58.719 --> 0:40:02.359
<v Speaker 7>and I think I like completely agree here. I think,

0:40:02.480 --> 0:40:04.239
<v Speaker 7>you know, if somebody like asks the question on the

0:40:04.239 --> 0:40:06.759
<v Speaker 7>Internet and then somebody else like goes and puts into

0:40:06.800 --> 0:40:08.960
<v Speaker 7>ChatGPT and then like pastes the answer, it's kind

0:40:08.960 --> 0:40:10.560
<v Speaker 7>of rude, Like like I was going here to ask

0:40:10.800 --> 0:40:13.879
<v Speaker 7>the opinions of my friends or you know, my followers, not.

0:40:14.080 --> 0:40:16.520
<v Speaker 5>Just like not chat GPT. I could have done that myself.

0:40:16.840 --> 0:40:19.640
<v Speaker 7>And so I think, like building this norm is something

0:40:19.680 --> 0:40:22.120
<v Speaker 7>that you know, it's very new technology, so we need

0:40:22.160 --> 0:40:23.040
<v Speaker 7>to do it quickly.

0:40:23.080 --> 0:40:25.760
<v Speaker 5>But I think this would help a lot for society.

0:40:25.800 --> 0:40:27.880
<v Speaker 2>Well, that actually just gets to a question that I

0:40:27.920 --> 0:40:30.680
<v Speaker 2>have then, which is I feel as though the major

0:40:30.719 --> 0:40:34.560
<v Speaker 2>Internet platforms are actually moving the exact opposite direction. I mean,

0:40:34.560 --> 0:40:38.320
<v Speaker 2>I'm stunned. Maybe I accidentally clicked on something at some point,

0:40:38.600 --> 0:40:41.520
<v Speaker 2>but the frequency with which I get an email and then

0:40:41.560 --> 0:40:43.759
<v Speaker 2>I open it up to respond in Gmail, and there's

0:40:43.800 --> 0:40:47.000
<v Speaker 2>that ghost text there that says, do you just want Gemini

0:40:47.040 --> 0:40:48.279
<v Speaker 2>to respond to this?

0:40:48.640 --> 0:40:49.680
<v Speaker 3>I've never done.

0:40:49.480 --> 0:40:52.040
<v Speaker 2>That, I also consider, I think that would be extremely rude.

0:40:52.040 --> 0:40:56.719
<v Speaker 2>I've never responded to any email with an AI response. But

0:40:56.760 --> 0:40:59.239
<v Speaker 2>they're basically telling you to do that. They're doing the

0:40:59.239 --> 0:41:01.720
<v Speaker 2>exact opposite, blowing up these norms. And so I'm curious

0:41:01.719 --> 0:41:04.680
<v Speaker 2>from your perspective, you managed to work with Quora. But

0:41:04.920 --> 0:41:09.400
<v Speaker 2>from your impression, do the major internet platforms think this

0:41:09.560 --> 0:41:12.279
<v Speaker 2>is a problem worth solving or from their consider and

0:41:12.280 --> 0:41:14.320
<v Speaker 2>it is like you know what, Yeah, it feels content

0:41:14.400 --> 0:41:14.759
<v Speaker 2>the better.

0:41:14.840 --> 0:41:17.800
<v Speaker 4>There's mixed incentives for the big company.

0:41:17.800 --> 0:41:20.360
<v Speaker 7>It's funny because like Google seems to be playing both sides.

0:41:20.640 --> 0:41:23.680
<v Speaker 7>So like, on one hand, they had that advertisement which

0:41:23.680 --> 0:41:25.680
<v Speaker 7>people kind of blew up about where it's like, oh,

0:41:25.800 --> 0:41:29.480
<v Speaker 7>children can now send their heroes notes on like how

0:41:29.560 --> 0:41:31.799
<v Speaker 7>much they respect them by using AI instead of like

0:41:32.040 --> 0:41:34.160
<v Speaker 7>writing the note themselves, and like this is wrong, This

0:41:34.239 --> 0:41:37.560
<v Speaker 7>is like societally bad. But at the same time, they're

0:41:37.600 --> 0:41:40.799
<v Speaker 7>working very hard to deal with the AI slop on

0:41:40.880 --> 0:41:43.520
<v Speaker 7>the Internet in search results to make sure people get

0:41:43.560 --> 0:41:45.040
<v Speaker 7>served real content and not.

0:41:45.000 --> 0:41:45.960
<v Speaker 5>AI slop content.

0:41:46.640 --> 0:41:49.279
<v Speaker 7>So I think, I mean, I think obviously there's a

0:41:49.320 --> 0:41:51.640
<v Speaker 7>lot of incentives at play around like product people

0:41:51.680 --> 0:41:55.000
<v Speaker 7>who are incentivized to push AI because that is the

0:41:55.040 --> 0:41:59.359
<v Speaker 7>corporate mandate. But yeah, I think overall, even like in

0:41:59.400 --> 0:42:02.000
<v Speaker 7>my sphere, a bunch of people who are AI researchers,

0:42:02.640 --> 0:42:06.520
<v Speaker 7>the general consensus is that like AI is a powerful tool,

0:42:06.560 --> 0:42:07.600
<v Speaker 7>but like slop is bad.

0:42:07.880 --> 0:42:10.840
<v Speaker 4>This reminds me my parents used to make me do

0:42:10.960 --> 0:42:15.080
<v Speaker 4>these like handmade greeting cards for every you know, for Christmas,

0:42:15.120 --> 0:42:17.160
<v Speaker 4>for like all relatives and stuff. And it was supposed

0:42:17.200 --> 0:42:22.319
<v Speaker 4>to be a demonstration of my commitment to communicating with family. No, no,

0:42:22.400 --> 0:42:25.799
<v Speaker 4>it traumatized me forever. And I hate greeting cards as

0:42:25.840 --> 0:42:28.680
<v Speaker 4>a result of doing this, just spending hours

0:42:28.800 --> 0:42:31.840
<v Speaker 4>manufacturing these things. But then, secondly, the funniest thing was

0:42:31.920 --> 0:42:36.040
<v Speaker 4>once we got E cards, my parents immediately switched to

0:42:36.200 --> 0:42:40.080
<v Speaker 4>using e cards and just and now this is also

0:42:40.120 --> 0:42:40.879
<v Speaker 4>the funniest thing.

0:42:41.080 --> 0:42:42.359
<v Speaker 6>My dad uses E card.

0:42:42.400 --> 0:42:44.480
<v Speaker 4>He figured out that the E card system can tell

0:42:44.560 --> 0:42:46.680
<v Speaker 4>him whether or not you opened it, so he just

0:42:46.800 --> 0:42:48.680
<v Speaker 4>uses it as like day to day communication.

0:42:48.880 --> 0:42:51.840
<v Speaker 5>Now that's so funny.

0:42:51.880 --> 0:42:54.839
<v Speaker 3>Just send an email to your daughter E card.

0:42:55.120 --> 0:42:56.840
<v Speaker 4>It's like, I noticed you haven't opened up my E

0:42:57.000 --> 0:43:01.640
<v Speaker 4>card for International Hot Dog Day. Please let me know

0:43:01.920 --> 0:43:02.560
<v Speaker 4>what's going on.

0:43:02.640 --> 0:43:05.640
<v Speaker 2>I had terrible handwriting as a kid, and my mother made

0:43:05.640 --> 0:43:08.480
<v Speaker 2>me write all of these handwritten notes to thank people

0:43:08.520 --> 0:43:09.440
<v Speaker 2>for the gifts I got for.

0:43:09.480 --> 0:43:10.400
<v Speaker 3>My bar mitzvah.

0:43:10.480 --> 0:43:12.839
<v Speaker 2>Yeah, I hated it, but you know what, I have

0:43:12.960 --> 0:43:14.359
<v Speaker 2>deep connections with all of

0:43:14.320 --> 0:43:16.360
<v Speaker 3>Those people that have lasted over the years.

0:43:16.760 --> 0:43:19.400
<v Speaker 2>In that miserable one week where I just wrote and

0:43:19.440 --> 0:43:21.600
<v Speaker 2>I got, you know, hand cramped, I think it

0:43:21.560 --> 0:43:22.520
<v Speaker 3>Paid off, all right.

0:43:22.520 --> 0:43:27.400
<v Speaker 4>Well, imagine doing that for like sixteen years basically in

0:43:27.400 --> 0:43:28.960
<v Speaker 4>a never ending stream.

0:43:29.000 --> 0:43:31.360
<v Speaker 3>Max Spero, thank you so much for coming on Odd Lots.

0:43:31.400 --> 0:43:33.600
<v Speaker 3>That was a lot of fun. I'm fascinated by this conversation.

0:43:33.800 --> 0:43:35.759
<v Speaker 7>Thanks so much for having me. Yeah, really exciting to

0:43:35.800 --> 0:43:38.480
<v Speaker 7>talk about this. And I think slop is a growing problem,

0:43:38.520 --> 0:43:40.160
<v Speaker 7>so hopefully we can deal with it.

0:43:41.120 --> 0:43:42.200
<v Speaker 6>Of the internet, I.

0:43:42.200 --> 0:43:44.040
<v Speaker 4>Can't tell if I'm surprised by that oring on.

0:43:44.280 --> 0:43:45.960
<v Speaker 3>And what's it going to be next year at this time?

0:43:46.280 --> 0:43:47.399
<v Speaker 5>Oh man, I don't know.

0:43:47.760 --> 0:43:49.800
<v Speaker 3>It'll be like hard to stay over with Georgian that

0:43:49.880 --> 0:43:50.280
<v Speaker 3>for sure.

0:43:50.719 --> 0:43:52.120
<v Speaker 5>Yeah, almost certainly crazy.

0:43:52.400 --> 0:43:53.480
<v Speaker 3>All right, thanks for coming on.

0:43:53.440 --> 0:44:02.560
<v Speaker 5>Odd Lots. Thanks.

0:44:07.440 --> 0:44:08.920
<v Speaker 3>Tracy. I love that conversation.

0:44:09.000 --> 0:44:10.799
<v Speaker 2>I just think it's like a really fun puzzle, right,

0:44:11.719 --> 0:44:15.840
<v Speaker 2>It's very like it seems like a fun question to solve,

0:44:15.920 --> 0:44:19.520
<v Speaker 2>And I'm fascinated by this idea of how like with

0:44:19.719 --> 0:44:24.239
<v Speaker 2>both humans and AI, there's gonna be this gap inevitable

0:44:24.480 --> 0:44:27.319
<v Speaker 2>between what we know and what we can articulate because

0:44:27.360 --> 0:44:29.479
<v Speaker 2>you and I both, setting aside AI versus text,

0:44:29.640 --> 0:44:32.160
<v Speaker 2>there are things that we both know. For example, this

0:44:32.200 --> 0:44:34.760
<v Speaker 2>is newsworthy, and this is this is a good episode

0:44:34.800 --> 0:44:37.880
<v Speaker 2>of a podcast, This is a credible sounding guest, and

0:44:37.920 --> 0:44:41.040
<v Speaker 2>there's this gap between that and then being able

0:44:41.080 --> 0:44:43.520
<v Speaker 2>to explain why, it's like, well, you just sort of

0:44:43.600 --> 0:44:45.560
<v Speaker 2>know it, right, You just sort of have this feeling there,

0:44:46.560 --> 0:44:50.479
<v Speaker 2>and that intuition is built up from numerous examples, which

0:44:50.520 --> 0:44:52.200
<v Speaker 2>is the same way in a sense that like the

0:44:52.239 --> 0:44:53.240
<v Speaker 2>AI is trained.

0:44:53.320 --> 0:44:54.360
<v Speaker 3>It's like these.

0:44:54.239 --> 0:44:56.760
<v Speaker 2>Things that you only know from patterns and you can

0:44:57.160 --> 0:45:00.520
<v Speaker 2>see them without fully being able to, like, articulate exactly

0:45:00.560 --> 0:45:01.160
<v Speaker 2>what's going on.

0:45:01.280 --> 0:45:02.360
<v Speaker 6>Well, the other.

0:45:02.239 --> 0:45:05.239
<v Speaker 4>Question I would have on that is is it even

0:45:05.280 --> 0:45:07.680
<v Speaker 4>going to matter in the long run if you think about,

0:45:07.719 --> 0:45:10.960
<v Speaker 4>like so much of the Internet is already built on

0:45:11.040 --> 0:45:14.120
<v Speaker 4>bots and the sort of like false attention economy, Like

0:45:14.800 --> 0:45:21.680
<v Speaker 4>if our entire like worldview becomes shaped by AI driven drivel, yeah,

0:45:22.560 --> 0:45:25.440
<v Speaker 4>does it matter if like the economics of the Internet

0:45:25.560 --> 0:45:28.759
<v Speaker 4>are still attached to individual bot accounts and things like that.

0:45:28.760 --> 0:45:31.640
<v Speaker 6>I don't know if I'm if I'm explaining this, but.

0:45:31.760 --> 0:45:33.040
<v Speaker 2>No, no, I think it makes a lot of sense,

0:45:33.080 --> 0:45:36.160
<v Speaker 2>and I do think like it is important, like we're.

0:45:36.040 --> 0:45:37.799
<v Speaker 3>Going to have to change the entire way with them.

0:45:38.000 --> 0:45:40.399
<v Speaker 2>And Max said at the beginning, which is, and I've

0:45:40.400 --> 0:45:42.759
<v Speaker 2>thought about this, which is that it used to be

0:45:43.120 --> 0:45:45.000
<v Speaker 2>that if you came across a piece of writing and

0:45:45.040 --> 0:45:49.120
<v Speaker 2>the punctuation was excellent and the spelling was excellent, and

0:45:49.160 --> 0:45:51.680
<v Speaker 2>it was like cogent sounding, you're like, okay, this has

0:45:51.680 --> 0:45:55.239
<v Speaker 2>been written by a smart person. I will read this seriously, right?

0:45:55.880 --> 0:45:59.160
<v Speaker 2>And now there is this complete severance of sort of

0:45:59.200 --> 0:46:02.440
<v Speaker 2>like craft and output because you could and you

0:46:02.520 --> 0:46:05.759
<v Speaker 2>did this, Like, ask Claude to write an argument in

0:46:05.800 --> 0:46:10.239
<v Speaker 2>favor of the most absurd proposition imaginable. Ask Claude to

0:46:11.000 --> 0:46:15.520
<v Speaker 2>write an argument for me that the reason why Reagan

0:46:15.560 --> 0:46:18.200
<v Speaker 2>wanted to do tax cuts in the early nineteen eighties

0:46:18.680 --> 0:46:22.200
<v Speaker 2>was related to these reports of UFO sightings in the nineteen seventies,

0:46:22.600 --> 0:46:25.719
<v Speaker 2>and it will write something that not only is it

0:46:25.760 --> 0:46:28.319
<v Speaker 2>grammatically correct, it'll actually like strain to come up with

0:46:28.360 --> 0:46:31.000
<v Speaker 2>the best version of this argument before and again if

0:46:31.080 --> 0:46:33.560
<v Speaker 2>prior to that, having read and like, maybe the person

0:46:34.160 --> 0:46:37.000
<v Speaker 2>like this person took this argument seriously, but now this

0:46:37.080 --> 0:46:39.880
<v Speaker 2>argument is just created ex nihilo. We're going to

0:46:39.960 --> 0:46:42.440
<v Speaker 2>have to really like change our heuristics about this stuff.

0:46:42.480 --> 0:46:46.600
<v Speaker 4>We've created an unlimited stream of basically cranks with

0:46:46.680 --> 0:46:47.480
<v Speaker 4>really good grammar.

0:46:47.640 --> 0:46:50.520
<v Speaker 2>Yeah, that's right, that's right, because it used to be

0:46:50.560 --> 0:46:52.360
<v Speaker 2>we knew the crank because they had bad grammar, or

0:46:52.360 --> 0:46:55.239
<v Speaker 2>they would email us and like half the words would

0:46:55.239 --> 0:46:57.520
<v Speaker 2>be in yellow and the other half would be underlined green.

0:46:57.560 --> 0:47:01.279
<v Speaker 4>Inlastic exams, the tools that we use to just like, oh,

0:47:01.280 --> 0:47:03.920
<v Speaker 4>this person's a crank, they like, you know, half the

0:47:04.000 --> 0:47:05.799
<v Speaker 4>words are in all caps and stuff like that.

0:47:06.200 --> 0:47:07.280
<v Speaker 3>Those don't work anymore.

0:47:07.320 --> 0:47:09.440
<v Speaker 4>All right, on that note, shall we leave it there?

0:47:09.520 --> 0:47:10.160
<v Speaker 3>Let's leave it there.

0:47:10.320 --> 0:47:12.799
<v Speaker 4>This has been another episode of the Odd Lots podcast. I'm

0:47:12.840 --> 0:47:15.600
<v Speaker 4>Tracy Alloway. You can follow me at Tracy Alloway.

0:47:15.320 --> 0:47:18.080
<v Speaker 2>And I'm Joe Weisenthal. You can follow me at the Stalwart.

0:47:18.400 --> 0:47:22.480
<v Speaker 2>Follow our guest Max Spero. He's at Max Underscore Spero Underscore.

0:47:22.719 --> 0:47:25.960
<v Speaker 2>Follow our producers Carmen Rodriguez at Carmen Arman, Dashiell

0:47:26.040 --> 0:47:29.120
<v Speaker 2>Bennett at Dashbot, and Kail Brooks at Kailbrooks. And for

0:47:29.239 --> 0:47:32.359
<v Speaker 2>more Odd Lots content, go to Bloomberg dot com slash odd lots,

0:47:32.360 --> 0:47:34.799
<v Speaker 2>where we have a daily newsletter and all of our episodes, and

0:47:34.840 --> 0:47:36.640
<v Speaker 2>you can chat about all of these topics twenty four

0:47:36.680 --> 0:47:40.279
<v Speaker 2>to seven in our discord discord dot gg slash od

0:47:40.280 --> 0:47:41.000
<v Speaker 2>lots And.

0:47:41.080 --> 0:47:43.279
<v Speaker 4>If you enjoy odlots, if you like it when we

0:47:43.320 --> 0:47:46.120
<v Speaker 4>talk about how the Internet is forty percent slop, then

0:47:46.160 --> 0:47:49.280
<v Speaker 4>please leave us a positive review on your favorite podcast platform.

0:47:49.520 --> 0:47:51.880
<v Speaker 4>And remember, if you are a Bloomberg subscriber, you can

0:47:51.920 --> 0:47:55.000
<v Speaker 4>listen to all of our episodes absolutely ad free. All

0:47:55.000 --> 0:47:56.920
<v Speaker 4>you need to do is find the Bloomberg channel on

0:47:57.000 --> 0:47:59.239
<v Speaker 4>Apple Podcasts and follow the instructions there.

0:47:59.640 --> 0:48:00.480
<v Speaker 6>Thanks for listening.