1
00:00:02,720 --> 00:00:16,360
Speaker 1: Bloomberg Audio Studios, Podcasts, Radio News.

2
00:00:18,480 --> 00:00:21,840
Speaker 2: Hello and welcome to another episode of the Odd Lots podcast.

3
00:00:21,920 --> 00:00:24,119
Speaker 3: I'm Joe Weisenthal. And I'm Tracy Alloway.

4
00:00:24,360 --> 00:00:27,280
Speaker 2: So, Tracy, you know, you ever come across some writing

5
00:00:28,160 --> 00:00:31,720
Speaker 2: you can't articulate exactly why, but you're like, I'm pretty

6
00:00:31,760 --> 00:00:32,720
Speaker 2: sure AI wrote this?

7
00:00:33,120 --> 00:00:34,160
Speaker 3: Does this happen to you much?

8
00:00:34,280 --> 00:00:38,600
Speaker 4: So, full disclosure, I haven't really thought about it that much. Yeah,

9
00:00:38,640 --> 00:00:41,640
Speaker 4: because the thing is I probably should think about it more,

10
00:00:41,960 --> 00:00:43,960
Speaker 4: but there's a lot of bad writing out there, and

11
00:00:44,000 --> 00:00:46,440
Speaker 4: I've become sort of inured to it. And I

12
00:00:46,479 --> 00:00:49,680
Speaker 4: also think that I don't know trying to figure out

13
00:00:49,680 --> 00:00:53,880
Speaker 4: whether or not something was generated by AI nowadays, if

14
00:00:53,880 --> 00:00:55,960
Speaker 4: you actually dedicate a lot of your own time to

15
00:00:56,120 --> 00:01:00,880
Speaker 4: doing that, that is a huge mental burden to be attempting.

16
00:01:01,000 --> 00:01:03,480
Speaker 4: Especially you and I are in the journalism industry. How

17
00:01:03,560 --> 00:01:05,760
Speaker 4: many of the pitches do you think that we get

18
00:01:05,800 --> 00:01:08,440
Speaker 4: from PRs right now are being generated by AI?

19
00:01:08,760 --> 00:01:11,520
Speaker 4: Imagine if you're reading each one of those and trying

20
00:01:11,560 --> 00:01:13,440
Speaker 4: to figure it out on a daily basis.

21
00:01:13,560 --> 00:01:15,360
Speaker 2: You know where I suppose I think about it the

22
00:01:15,360 --> 00:01:18,200
Speaker 2: most is someone will respond to a tweet, yeah, and

23
00:01:18,240 --> 00:01:19,800
Speaker 2: I'll be like, well, if this is a real person,

24
00:01:19,800 --> 00:01:22,319
Speaker 2: then maybe this person deserves some engagement and ask a

25
00:01:22,400 --> 00:01:24,560
Speaker 2: question or I want to respond. But if the

26
00:01:24,560 --> 00:01:26,880
Speaker 2: person is a bot, then obviously I don't. And that's

27
00:01:26,880 --> 00:01:28,399
Speaker 2: where I'm like, you know what, I want to figure

28
00:01:28,400 --> 00:01:30,800
Speaker 2: it out. I would like to know the answer.

29
00:01:30,959 --> 00:01:31,160
Speaker 3: You know.

30
00:01:31,200 --> 00:01:34,200
Speaker 2: I have a controversial view about AI writing, by the way,

31
00:01:34,240 --> 00:01:36,640
Speaker 2: which is that it's pretty good. I mean, like, by

32
00:01:36,680 --> 00:01:38,920
Speaker 2: and large, and I said this, I think maybe in

33
00:01:38,920 --> 00:01:41,640
Speaker 2: a recent episode. When you consider the fact that I

34
00:01:41,680 --> 00:01:44,679
Speaker 2: don't know, the majority of the population, like, doesn't know

35
00:01:44,680 --> 00:01:47,600
Speaker 2: where to put a comma within a sentence. Well, this

36
00:01:47,680 --> 00:01:48,120
Speaker 2: is my point.

37
00:01:48,320 --> 00:01:48,960
Speaker 3: It's pretty good.

38
00:01:48,960 --> 00:01:49,400
Speaker 5: I mean, yeah.

39
00:01:49,400 --> 00:01:51,160
Speaker 2: One thing I'll say about AI is it never gets

40
00:01:51,240 --> 00:01:52,480
Speaker 2: the placement of a comma wrong.

41
00:01:52,840 --> 00:01:54,160
Speaker 3: On some level, it's perfect.

42
00:01:54,320 --> 00:01:56,000
Speaker 6: Did you do that? I think it was in the

43
00:01:56,000 --> 00:01:57,560
Speaker 6: New York Times, the test?

44
00:01:57,600 --> 00:01:58,360
Speaker 3: I kind of hated that.

45
00:01:58,560 --> 00:02:01,200
Speaker 2: Okay, why? Well, because I'll tell you. First of all,

46
00:02:01,160 --> 00:02:02,240
Speaker 2: it's five examples.

47
00:02:02,280 --> 00:02:04,520
Speaker 3: There's not very many. Two, it asked the reader, which

48
00:02:04,520 --> 00:02:05,080
Speaker 3: do you prefer?

49
00:02:05,160 --> 00:02:07,120
Speaker 4: But I think they were different subjects as well.

50
00:02:07,200 --> 00:02:07,440
Speaker 3: Yeah.

51
00:02:07,600 --> 00:02:10,080
Speaker 2: Also, I think most people probably treated that as can

52
00:02:10,120 --> 00:02:11,959
Speaker 2: you guess which one is a human? Because everyone wants

53
00:02:12,000 --> 00:02:14,320
Speaker 2: to say they prefer the human. I didn't think it

54
00:02:14,400 --> 00:02:18,400
Speaker 2: was like a great test. Nonetheless, look, not only is

55
00:02:18,400 --> 00:02:22,360
Speaker 2: it often indistinguishable, it is often fine writing.

56
00:02:22,840 --> 00:02:25,359
Speaker 2: Sometimes AI could come up with a really remarkable turn

57
00:02:25,400 --> 00:02:27,760
Speaker 2: of phrase. Yeah, but I still by and large don't

58
00:02:27,840 --> 00:02:30,000
Speaker 2: like it. You read like a thing, especially a long

59
00:02:30,040 --> 00:02:32,760
Speaker 2: text that's AI, and it's like, even if you can't articulate

60
00:02:32,360 --> 00:02:33,880
Speaker 3: it, it's like, this feels AI.

61
00:02:33,960 --> 00:02:36,640
Speaker 2: It has a certain sickly sweetness to it that is

62
00:02:36,680 --> 00:02:37,320
Speaker 2: often annoying.

63
00:02:37,320 --> 00:02:38,160
Speaker 3: It's annoying.

64
00:02:38,400 --> 00:02:41,000
Speaker 4: What I notice about it is it doesn't do style

65
00:02:41,200 --> 00:02:43,240
Speaker 4: very well, right, So if you ask it to write

66
00:02:43,240 --> 00:02:45,840
Speaker 4: something in the style of a writer, if you choose

67
00:02:45,880 --> 00:02:49,240
Speaker 4: anything other than something really obvious like Shakespeare, it really

68
00:02:49,480 --> 00:02:53,120
Speaker 4: it suffers. But the text that it actually outputs is

69
00:02:53,160 --> 00:02:58,519
Speaker 4: pretty clear. Yeah, right, like for basic understanding. Totally. It's

70
00:02:58,639 --> 00:03:01,440
Speaker 4: probably better than a lot of what's on the internet.

71
00:03:01,760 --> 00:03:03,200
Speaker 2: The real people who are going to have to worry

72
00:03:03,240 --> 00:03:07,840
Speaker 2: about this are like teachers obviously, universities and lawyers, student

73
00:03:07,919 --> 00:03:11,040
Speaker 2: lawyers and maybe at it's fun, but there are sometimes

74
00:03:11,080 --> 00:03:12,800
Speaker 2: it's like, Okay, did someone write this or not?

75
00:03:13,000 --> 00:03:14,920
Speaker 3: And there has to be it'd be nice if we

76
00:03:14,960 --> 00:03:16,120
Speaker 3: could know the answer.

77
00:03:16,320 --> 00:03:19,280
Speaker 4: Well, the other thing that's starting to happen is have

78
00:03:19,400 --> 00:03:21,840
Speaker 4: you seen any books out there that actually come with

79
00:03:21,960 --> 00:03:25,240
Speaker 4: a disclosure or disclaimer that say this book has been

80
00:03:25,280 --> 00:03:26,760
Speaker 4: written only by humans?

81
00:03:26,800 --> 00:03:26,880
Speaker 5: No?

82
00:03:27,000 --> 00:03:28,079
Speaker 6: AI used at all.

83
00:03:28,120 --> 00:03:29,720
Speaker 4: I saw that for the first time on a book

84
00:03:29,760 --> 00:03:32,000
Speaker 4: that we actually read for an Odd Lots episode. I

85
00:03:32,000 --> 00:03:33,600
Speaker 4: don't think it's come out yet, but that kind of

86
00:03:33,639 --> 00:03:33,960
Speaker 4: threw me.

87
00:03:34,320 --> 00:03:34,519
Speaker 1: Yeah.

88
00:03:34,639 --> 00:03:37,480
Speaker 2: No, it's more and more anyway, as we enter a

89
00:03:37,520 --> 00:03:40,400
Speaker 2: world in which the vast majority, if not already, of

90
00:03:40,480 --> 00:03:43,120
Speaker 2: words written are written by AI, it's going to be

91
00:03:43,200 --> 00:03:45,760
Speaker 2: interesting, this question of whether we know. Anyway, there's

92
00:03:45,800 --> 00:03:48,520
Speaker 2: this company called Pangram Labs, and they have a little

93
00:03:48,560 --> 00:03:50,440
Speaker 2: thing and you can pay for it, but also a

94
00:03:50,440 --> 00:03:52,600
Speaker 2: free service where you can drop like a text in

95
00:03:53,320 --> 00:03:55,320
Speaker 2: and it'll say the odds that it is written by a human

96
00:03:55,440 --> 00:03:58,320
Speaker 2: or AI. And I'm pretty impressed by it. I like

97
00:03:58,360 --> 00:04:01,320
Speaker 2: did some samples of my own writing and then AI

98
00:04:01,440 --> 00:04:03,560
Speaker 2: outputs. It got them all right. But then I did

99
00:04:03,560 --> 00:04:05,680
Speaker 2: some like further, like I tried to stump it to

100
00:04:05,720 --> 00:04:07,720
Speaker 2: see if like. So, what I did was I took

101
00:04:07,760 --> 00:04:10,280
Speaker 2: a piece of AI writing and then I had it

102
00:04:10,320 --> 00:04:13,600
Speaker 2: translated into Chinese, okay, and then I had it translate

103
00:04:13,640 --> 00:04:16,400
Speaker 2: that into High Chinese, so it's like, okay, imagine this

104
00:04:16,480 --> 00:04:19,160
Speaker 2: is being written by a more formal register. And then

105
00:04:19,200 --> 00:04:21,920
Speaker 2: I had that translated into Hebrew, and then I had

106
00:04:21,960 --> 00:04:24,960
Speaker 2: that translated into English. So the original thing through this

107
00:04:25,080 --> 00:04:27,920
Speaker 2: series of AI telephone, through various translations, and then I

108
00:04:27,960 --> 00:04:30,240
Speaker 2: put that output back into Pangram.

109
00:04:30,360 --> 00:04:31,640
Speaker 3: It got that right. It said it was AI.

110
00:04:31,720 --> 00:04:35,240
Speaker 2: So even after a series of sort of transformations designed

111
00:04:35,279 --> 00:04:39,280
Speaker 2: to obfuscate the original style of the piece to see

112
00:04:39,320 --> 00:04:41,600
Speaker 2: if you know, eventually it would emerge in something else.

113
00:04:41,839 --> 00:04:44,160
Speaker 2: So I was pretty impressed. It seems to work. And

114
00:04:44,240 --> 00:04:46,400
Speaker 2: you know, I think that's interesting for a couple of reasons,

115
00:04:46,400 --> 00:04:49,320
Speaker 2: which is maybe there is something that you can just tell.

116
00:04:49,680 --> 00:04:52,120
Speaker 2: But two, it sort of worries me because you know,

117
00:04:52,320 --> 00:04:54,480
Speaker 2: there have been articles and they'll say like, this is

118
00:04:54,480 --> 00:04:56,360
Speaker 2: written by AI. And I think one of my big

119
00:04:56,360 --> 00:04:58,240
Speaker 2: fears would be that I write something.

120
00:04:58,600 --> 00:04:59,760
Speaker 3: I like to use an em dash.

121
00:05:00,000 --> 00:05:02,520
Speaker 4: I've always been an em dash fan. I love em dashes.

122
00:05:02,600 --> 00:05:03,520
Speaker 4: That's how people talk.

123
00:05:03,640 --> 00:05:04,200
Speaker 6: I'm sorry.

124
00:05:04,400 --> 00:05:06,400
Speaker 2: And then what if it says you wrote this by AI,

125
00:05:06,640 --> 00:05:08,560
Speaker 2: and I'm like, I didn't, And then here's this black

126
00:05:08,600 --> 00:05:11,680
Speaker 2: box that is suddenly like judge, jury, and executioner for my

127
00:05:12,279 --> 00:05:15,880
Speaker 2: career, potentially. Who wrote this? AI, the lab says, so

128
00:05:16,440 --> 00:05:18,640
Speaker 2: you are now done? Like that worries me. So I

129
00:05:18,640 --> 00:05:21,680
Speaker 2: think this raises a lot of very interesting questions about

130
00:05:21,680 --> 00:05:23,960
Speaker 2: these model, little detection things, and I want to learn

131
00:05:23,960 --> 00:05:24,640
Speaker 2: more about how well.

132
00:05:24,640 --> 00:05:27,440
Speaker 4: There's also a lot of philosophical questions about just what

133
00:05:27,480 --> 00:05:30,919
Speaker 4: we value in writing true as well, because no one's

134
00:05:30,960 --> 00:05:33,760
Speaker 4: going to yell at you for using spell check or

135
00:05:33,800 --> 00:05:36,039
Speaker 4: something like that, right, Like, it's kind of crazy to

136
00:05:36,040 --> 00:05:39,000
Speaker 4: think that reputational risk is going to hinge on whether

137
00:05:39,120 --> 00:05:41,640
Speaker 4: or not you might have used a platform, a chat

138
00:05:41,680 --> 00:05:44,760
Speaker 4: platform to like do some basic copy editing.

139
00:05:45,000 --> 00:05:47,320
Speaker 2: Totally. Well, very happy to say, we do, in fact,

140
00:05:47,360 --> 00:05:48,160
Speaker 2: have the perfect guest.

141
00:05:48,440 --> 00:05:50,120
Speaker 3: We're going to be speaking with Max Spero.

142
00:05:50,240 --> 00:05:52,880
Speaker 2: He is the founder and CEO of Pangram Labs, and

143
00:05:52,880 --> 00:05:54,720
Speaker 2: he can answer all of our questions. So Max, thank

144
00:05:54,720 --> 00:05:55,600
Speaker 2: you so much for coming on.

145
00:05:55,560 --> 00:05:56,919
Speaker 5: Odd Lots. Thanks for having me.

146
00:05:57,160 --> 00:05:58,120
Speaker 3: How do you know it's right?

147
00:05:58,279 --> 00:06:00,600
Speaker 2: So someone puts in a piece of text, and we'll

148
00:06:00,600 --> 00:06:02,440
Speaker 2: get into the method in a second. But someone puts

149
00:06:02,440 --> 00:06:05,440
Speaker 2: in a piece of text and it says human or AI.

150
00:06:06,320 --> 00:06:08,719
Speaker 2: What makes you believe that you have a very good

151
00:06:08,560 --> 00:06:09,760
Speaker 3: Track record all this question.

152
00:06:09,960 --> 00:06:12,520
Speaker 7: So when we started Pangram, we started by doing this

153
00:06:12,560 --> 00:06:15,840
Speaker 7: thing we call a human baseline, which is how well

154
00:06:16,120 --> 00:06:19,680
Speaker 7: can we as a human predict whether something's AI or not?

155
00:06:19,960 --> 00:06:23,039
Speaker 7: That's the first step, like, learning: is this problem tractable?

156
00:06:23,440 --> 00:06:25,800
Speaker 5: How hard or easy is it? And I found, like.

157
00:06:26,120 --> 00:06:29,240
Speaker 7: Me personally, I was able to get about ninety percent accuracy,

158
00:06:29,720 --> 00:06:32,680
Speaker 7: and so we figured an AI model should be able

159
00:06:32,720 --> 00:06:33,279
Speaker 7: to do much.

160
00:06:33,120 --> 00:06:33,599
Speaker 5: Better than that.

161
00:06:33,920 --> 00:06:37,359
Speaker 4: So I have a bunch of methodology questions which we

162
00:06:37,400 --> 00:06:40,440
Speaker 4: can get into. But just before we get into any

163
00:06:40,440 --> 00:06:44,240
Speaker 4: of that, why is AI slop bad in your opinion?

164
00:06:44,279 --> 00:06:46,480
Speaker 4: Why does it need to be tracked and identified?

165
00:06:46,760 --> 00:06:48,680
Speaker 7: I think the problem is it's just so easy to

166
00:06:48,760 --> 00:06:51,720
Speaker 7: generate and so like it's very difficult to know, like

167
00:06:52,240 --> 00:06:56,080
Speaker 7: what is the like intent behind it? Basically, Like right now,

168
00:06:56,360 --> 00:06:58,560
Speaker 7: I think we're actually pretty lucky living. We live in

169
00:06:58,640 --> 00:07:02,039
Speaker 7: a world where the signal-to-noise ratio on the internet

170
00:07:02,040 --> 00:07:03,279
Speaker 7: and in our information.

171
00:07:02,920 --> 00:07:03,920
Speaker 5: Channels is pretty high.

172
00:07:04,040 --> 00:07:06,839
Speaker 7: We have pretty high signal to noise, But any bad

173
00:07:06,839 --> 00:07:10,520
Speaker 7: actor can come in and just flood our information channels

174
00:07:10,560 --> 00:07:15,000
Speaker 7: with AI slop that looks legitimate. It looks like somebody put

175
00:07:15,040 --> 00:07:18,760
Speaker 7: actual effort and thought into it, but really it was

176
00:07:18,880 --> 00:07:21,440
Speaker 7: just like a single prompt which could have also been automated.

177
00:07:21,600 --> 00:07:23,679
Speaker 2: This is something that I think about a lot, which

178
00:07:23,720 --> 00:07:26,239
Speaker 2: is that there was a point in time and maybe

179
00:07:26,280 --> 00:07:28,960
Speaker 2: still is the point in time where if you read

180
00:07:29,000 --> 00:07:33,120
Speaker 2: something that was grammatically correct, where the punctuation was strong,

181
00:07:33,400 --> 00:07:36,640
Speaker 2: where the spelling was strong, there was reason to think

182
00:07:36,680 --> 00:07:39,400
Speaker 2: that the person who wrote it was a person of

183
00:07:39,560 --> 00:07:43,240
Speaker 2: like certain seriousness and a certain intelligence behind it.

184
00:07:43,560 --> 00:07:45,640
Speaker 3: And I think that the issue that you're.

185
00:07:45,520 --> 00:07:48,600
Speaker 2: Identifying is that that link is now being severed so

186
00:07:48,640 --> 00:07:51,800
Speaker 2: that we can't use these heuristics anymore, such as the

187
00:07:51,840 --> 00:07:55,640
Speaker 2: strict quality of the prose to know in fact whether

188
00:07:55,920 --> 00:07:59,000
Speaker 2: this was published by someone who was like a serious actor,

189
00:07:59,200 --> 00:08:00,320
Speaker 2: intelligent or not.

190
00:08:00,480 --> 00:08:03,600
Speaker 4: And now you have people inserting typos into their card

191
00:08:04,000 --> 00:08:06,680
Speaker 4: that's true that they are Yeah boyd.

192
00:08:06,680 --> 00:08:09,840
Speaker 2: Sorry, just to go back to my original question. So

193
00:08:09,880 --> 00:08:12,480
Speaker 2: you mentioned, okay, you're able to get it ninety percent right,

194
00:08:12,480 --> 00:08:14,320
Speaker 2: but now it's been used a lot more and you

195
00:08:14,320 --> 00:08:19,040
Speaker 2: have people paying for your software, presumably teachers and journalists, etc.

196
00:08:20,160 --> 00:08:23,280
Speaker 2: Given all of that, getting from ninety percent to one hundred,

197
00:08:23,320 --> 00:08:25,160
Speaker 2: I mean, if you could make one out of ten

198
00:08:25,200 --> 00:08:28,240
Speaker 2: it's clearly an unacceptable error rate for a piece of

199
00:08:28,240 --> 00:08:31,640
Speaker 2: commercial software that could call someone an AI creator. So

200
00:08:31,680 --> 00:08:33,360
Speaker 2: you have to do a lot better than ninety percent.

201
00:08:33,800 --> 00:08:36,360
Speaker 2: Talk to us about like what you've seen so far

202
00:08:36,559 --> 00:08:39,920
Speaker 2: in your data since releasing it as commercial software that

203
00:08:40,040 --> 00:08:43,600
Speaker 2: makes you believe the software is doing a correct job

204
00:08:43,679 --> 00:08:45,720
Speaker 2: of allocating between the two categories.

205
00:08:45,760 --> 00:08:49,679
Speaker 7: So we've built out really comprehensive evals, okay, and so

206
00:08:49,880 --> 00:08:54,240
Speaker 7: in our evaluations, there's two kinds of errors. There's a false positive,

207
00:08:54,520 --> 00:08:56,920
Speaker 7: which is when something is written by a human and

208
00:08:56,960 --> 00:08:58,720
Speaker 7: then we say that it's written by an AI, okay.

209
00:08:58,760 --> 00:09:00,839
Speaker 7: And there's a false negative, which is if it was

210
00:09:00,920 --> 00:09:03,840
Speaker 7: AI written and we don't catch it. And so we

211
00:09:04,040 --> 00:09:07,839
Speaker 7: track our numbers for both of these, and for human.

212
00:09:07,559 --> 00:09:09,079
Speaker 5: Writing, we're actually pretty fortunate.

213
00:09:09,240 --> 00:09:11,080
Speaker 7: We have like millions and millions of samples, so we

214
00:09:11,120 --> 00:09:13,640
Speaker 7: can get like a false positive number that we have

215
00:09:13,679 --> 00:09:16,080
Speaker 7: a very high degree of confidence in. And our number

216
00:09:16,160 --> 00:09:19,080
Speaker 7: right now is about one in ten thousand. Okay. So,

217
00:09:19,160 --> 00:09:22,760
Speaker 7: if we scan ten thousand documents on average, one will

218
00:09:22,800 --> 00:09:23,480
Speaker 7: come back as.

219
00:09:23,840 --> 00:09:25,240
Speaker 5: AI when it was actually human.

220
00:09:25,440 --> 00:09:27,319
Speaker 3: And what about in the other direction false negative?

221
00:09:27,720 --> 00:09:31,760
Speaker 7: I would say around ninety nine percent accuracy, So like

222
00:09:32,120 --> 00:09:35,080
Speaker 7: around one percent false negative rate. I think this depends

223
00:09:35,080 --> 00:09:38,440
Speaker 7: a little bit more on like how adversarial the prompting is,

224
00:09:38,640 --> 00:09:40,720
Speaker 7: how much they're trying to ev.

225
00:09:40,720 --> 00:09:44,280
Speaker 2: What I did, exactly, send it through multiple filtrations to

226
00:09:44,360 --> 00:09:47,600
Speaker 2: obfuscate the original output. That would be an example of

227
00:09:47,640 --> 00:09:49,240
Speaker 2: adversarial prompting. Exactly.

228
00:09:49,480 --> 00:09:52,079
Speaker 7: But in like the general case where we're just looking

229
00:09:52,120 --> 00:09:55,880
Speaker 7: at straight outputs from AI, it's above ninety nine percent.

230
00:09:55,960 --> 00:09:59,000
Speaker 4: Okay, okay, So what is your model looking for exactly

231
00:09:59,040 --> 00:10:02,120
Speaker 4: when it's evaluating a text? Because, as we mentioned in

232
00:10:02,160 --> 00:10:05,560
Speaker 4: the intro, you know, syntax and grammar tends to be

233
00:10:05,679 --> 00:10:10,599
Speaker 4: pretty good on AI generated copy. The style is sometimes

234
00:10:10,640 --> 00:10:14,760
Speaker 4: more of an identifier, I would argue to your point, Joe, like,

235
00:10:14,960 --> 00:10:19,320
Speaker 4: sometimes it reads very saccharine and kind of overly earnest

236
00:10:19,640 --> 00:10:22,280
Speaker 4: in some ways. So what exactly are you focusing on here?

237
00:10:22,280 --> 00:10:23,000
Speaker 4: What are the tells?

238
00:10:23,200 --> 00:10:26,120
Speaker 7: Yeah, so the style and the word choices are definitely

239
00:10:26,200 --> 00:10:27,760
Speaker 7: part of it. But I think what a lot of

240
00:10:27,760 --> 00:10:30,200
Speaker 7: people don't realize is they're actually making a lot of

241
00:10:30,559 --> 00:10:33,720
Speaker 7: decisions when they write a piece of text. So there's

242
00:10:33,840 --> 00:10:36,800
Speaker 7: you know, dozens or hundreds of ways to phrase every

243
00:10:36,840 --> 00:10:39,680
Speaker 7: single phrase, and over the course of fifty or one

244
00:10:39,720 --> 00:10:43,240
Speaker 7: hundred or two hundred words, you're making thousands of decisions actually,

245
00:10:43,679 --> 00:10:46,400
Speaker 7: And so what we're doing is we're learning the patterns

246
00:10:46,400 --> 00:10:49,880
Speaker 7: and how like these frontier models make these decisions. And

247
00:10:49,960 --> 00:10:53,000
Speaker 7: if the vast majority of these decisions line up with

248
00:10:53,040 --> 00:10:56,160
Speaker 7: how the frontier models are doing it, then it's vanishingly

249
00:10:56,240 --> 00:10:58,600
Speaker 7: unlikely that this was written by a human. You would

250
00:10:58,640 --> 00:11:01,240
Speaker 7: have to just happen to make the same exact decisions

251
00:11:01,240 --> 00:11:03,240
Speaker 7: that the LLM does hundreds of times.
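
The "hundreds of matching decisions" argument above is a compounding-probability argument. A rough Python sketch under strong simplifying assumptions that are not from the conversation: every phrasing decision is treated as independent, a human is assumed to match the model's choice with a fixed probability, and all decisions must match (the guest only says "the vast majority"). The function name and the 0.9 figure are hypothetical.

```python
def chance_of_coincidence(p_match: float, n_decisions: int) -> float:
    """Probability that n independent choices all match, each with probability p_match."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be between 0 and 1")
    return p_match ** n_decisions


if __name__ == "__main__":
    # Even with a generous 90% chance of matching each single decision,
    # the chance of matching all of them shrinks quickly as text gets longer.
    for n in (10, 100, 300):
        print(n, chance_of_coincidence(0.9, n))
```

Real writing decisions are correlated, so this is an intuition pump, not an estimate of any detector's error rate.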

252
00:11:03,280 --> 00:11:04,280
Speaker 6: Interesting, Okay, this.

253
00:11:04,320 --> 00:11:05,480
Speaker 3: Is a really important point.

254
00:11:05,559 --> 00:11:08,200
Speaker 2: So everyone at this point has some feel for, like,

255
00:11:08,280 --> 00:11:11,400
Speaker 2: oh, the em dash tell, right? But my understanding is

256
00:11:11,440 --> 00:11:13,640
Speaker 2: it's not like you go in and like hard

257
00:11:13,679 --> 00:11:15,960
Speaker 2: code, "if you see a bunch of em dashes." This

258
00:11:16,080 --> 00:11:19,920
Speaker 2: is the thing: these decisions, in many cases, I imagine,

259
00:11:19,960 --> 00:11:24,840
Speaker 2: neither you nor the model itself can articulate in English

260
00:11:25,080 --> 00:11:27,720
Speaker 2: what the decisions are. All you know is that the

261
00:11:27,760 --> 00:11:29,160
Speaker 2: decision pattern exists.

262
00:11:29,240 --> 00:11:29,880
Speaker 3: Is this correct?

263
00:11:30,000 --> 00:11:30,679
Speaker 5: This is correct?

264
00:11:30,720 --> 00:11:31,840
Speaker 3: Okay? Can you explain?

265
00:11:32,000 --> 00:11:35,120
Speaker 2: So therefore, what does it mean that your model has

266
00:11:35,280 --> 00:11:37,079
Speaker 2: learned these decisions?

267
00:11:37,480 --> 00:11:39,920
Speaker 7: So what we're doing on the very broad scale is

268
00:11:40,080 --> 00:11:42,920
Speaker 7: we're training a deep learning model. So it's a pretty

269
00:11:42,920 --> 00:11:46,400
Speaker 7: big black box, but it has the base model of

270
00:11:47,040 --> 00:11:50,040
Speaker 7: a language model, and then instead of predicting the next token,

271
00:11:50,520 --> 00:11:53,880
Speaker 7: it's predicting whether the text is AI or not. Okay,

272
00:11:53,960 --> 00:11:56,800
Speaker 7: And how we train it is we train on tens

273
00:11:56,840 --> 00:11:59,960
Speaker 7: of millions of examples, so it sees millions and millions

274
00:12:00,160 --> 00:12:02,959
Speaker 7: of human examples, and for each human example, we also

275
00:12:03,000 --> 00:12:05,920
Speaker 7: show it an AI example. So, for example, let's say

276
00:12:05,920 --> 00:12:09,000
Speaker 7: one of these is a five star review for Denny's

277
00:12:09,200 --> 00:12:11,959
Speaker 7: that's seventy eight words long. Then we'll ask an AI

278
00:12:12,200 --> 00:12:14,120
Speaker 7: to write a five star review about Denny's that's seventy

279
00:12:14,120 --> 00:12:16,240
Speaker 7: eight words long in the style of the first one.

280
00:12:16,440 --> 00:12:18,840
Speaker 7: And obviously these two will be different, and so our

281
00:12:18,880 --> 00:12:22,080
Speaker 7: model is able to learn through contrast, what is the

282
00:12:22,080 --> 00:12:23,000
Speaker 7: difference between.

283
00:12:22,720 --> 00:12:24,840
Speaker 2: And the important thing, sorry, just to be clear here,

284
00:12:25,000 --> 00:12:26,960
Speaker 2: is that you and I might not be able to

285
00:12:27,040 --> 00:12:30,439
Speaker 2: articulate the difference. There will be some difference in maybe

286
00:12:30,520 --> 00:12:33,240
Speaker 2: the sentence length, there will be some difference in word choice,

287
00:12:33,240 --> 00:12:36,480
Speaker 2: there'll be some difference in punctuation, syntax, whatever, but you

288
00:12:36,600 --> 00:12:40,240
Speaker 2: and I wouldn't obviously spot it. However, after millions of

289
00:12:40,280 --> 00:12:43,640
Speaker 2: examples of these side by sides, the model learns what

290
00:12:43,679 --> 00:12:44,640
Speaker 2: the difference is. Exactly.

291
00:12:44,720 --> 00:12:46,560
Speaker 7: I think the best that a human can do is

292
00:12:46,720 --> 00:12:49,800
Speaker 7: look for some of these, like, really obvious tells, like

293
00:12:49,880 --> 00:12:53,440
Speaker 7: ChatGPT loves that, like, "it's not just X, it's Y" framing.

294
00:12:53,800 --> 00:12:57,240
Speaker 7: Earlier models really liked some specific words like tapestry and

295
00:12:57,320 --> 00:12:58,760
Speaker 7: intricate and delve.

296
00:12:58,840 --> 00:13:00,360
Speaker 3: Yeah, delve tapestry. Yeah.

297
00:13:00,400 --> 00:13:00,960
Speaker 5: But yeah.

298
00:13:01,000 --> 00:13:03,079
Speaker 7: I think by training Pangram, we're able to go much

299
00:13:03,120 --> 00:13:05,640
Speaker 7: deeper than this and look deeper than the high level

300
00:13:05,640 --> 00:13:08,120
Speaker 7: signs at the, like, document-level signs.
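
The paired-example training data described above (one AI "mirror" per human sample, matched on topic, length, and style) can be sketched in Python. This is a minimal illustration, not Pangram's pipeline: the `Example` class, `mirror_prompt`, `build_pairs`, the prompt wording, and the `generate` callable (a stand-in for whatever language model produces the AI text) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    text: str
    label: int  # 0 = human-written, 1 = AI-generated


def mirror_prompt(human_text: str, description: str) -> str:
    """Build a prompt asking for an AI counterpart of the same length and style."""
    n_words = len(human_text.split())
    return (
        f"Write {description} that is {n_words} words long, "
        f"in the style of this example:\n\n{human_text}"
    )


def build_pairs(
    human_samples: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[Example]:
    """For each (text, description) human sample, emit the sample and its AI mirror."""
    dataset: List[Example] = []
    for text, description in human_samples:
        dataset.append(Example(text=text, label=0))
        ai_text = generate(mirror_prompt(text, description))
        dataset.append(Example(text=ai_text, label=1))
    return dataset
```

Because each pair shares topic and length, a classifier trained on the result has to pick up on how the text is written, not on what it is about.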

301
00:13:23,960 --> 00:13:26,080
Speaker 4: So one thing this kind of reminds me of and

302
00:13:26,120 --> 00:13:28,559
Speaker 4: I'm thinking how to phrase this, but it reminds me

303
00:13:28,600 --> 00:13:31,800
Speaker 4: of you know those exercises people used to do where

304
00:13:31,800 --> 00:13:34,000
Speaker 4: you would take a bunch of different faces and meld

305
00:13:34,040 --> 00:13:37,200
Speaker 4: them all together and come up with like one face

306
00:13:37,320 --> 00:13:41,120
Speaker 4: that was supposedly attractive. So, like, to what extent is

307
00:13:41,160 --> 00:13:46,560
Speaker 4: this basically a distributional detector in the sense that you're

308
00:13:46,600 --> 00:13:50,960
Speaker 4: looking for like certain paths that you think AI would choose.

309
00:13:51,800 --> 00:13:54,239
Speaker 4: And I guess, like, could you get a false positive

310
00:13:54,840 --> 00:13:57,440
Speaker 4: just from someone who's choosing like the average of the

311
00:13:57,480 --> 00:14:00,320
Speaker 4: average of the average in a way to state a

312
00:14:00,320 --> 00:14:01,200
Speaker 4: particular sentence.

313
00:14:03,360 --> 00:14:06,400
Speaker 7: Maybe there's a reason we have our false positive rate

314
00:14:06,440 --> 00:14:08,840
Speaker 7: is one in ten thousand and not zero. It's because

315
00:14:09,200 --> 00:14:12,319
Speaker 7: you know, sometimes we look at the false positive and

316
00:14:12,360 --> 00:14:15,559
Speaker 7: it's like, oh, it reads exactly like an AI generated

317
00:14:15,720 --> 00:14:18,600
Speaker 7: review or essay, except that it was written in twenty nineteen.

318
00:14:18,640 --> 00:14:21,000
Speaker 7: So it was probably a human who just happened to

319
00:14:21,800 --> 00:14:24,840
Speaker 7: find the exact like mode collapsed.

320
00:14:24,640 --> 00:14:26,720
Speaker 5: type of way that, like, yeah, that's right. Yeah, I

321
00:14:26,760 --> 00:14:27,400
Speaker 5: would say, yeah.

322
00:14:27,480 --> 00:14:29,440
Speaker 7: I think it's a good way to think about the

323
00:14:29,480 --> 00:14:32,840
Speaker 7: distribution of writing or writing as a distribution where like,

324
00:14:32,920 --> 00:14:35,520
Speaker 7: you know, there's the space of all human writing, and

325
00:14:35,560 --> 00:14:37,920
Speaker 7: then AI writing is really just.

326
00:14:37,920 --> 00:14:39,840
Speaker 5: Like a small point within this space.

327
00:14:39,880 --> 00:14:42,360
Speaker 7: It's very no matter how much you prompt it, it

328
00:14:42,400 --> 00:14:46,160
Speaker 7: doesn't go that far from where it was trained to be.

329
00:14:46,440 --> 00:14:48,120
Speaker 3: Yeah, okay, what's the black box?

330
00:14:48,200 --> 00:14:50,520
Speaker 2: So I built a little model myself. I built this

331
00:14:50,560 --> 00:14:53,080
Speaker 2: thing that detects, you can upload text and it says whether

332
00:14:53,120 --> 00:14:56,600
Speaker 2: it's more resemblant of the written word or the spoken word.

333
00:14:57,040 --> 00:14:59,600
Speaker 2: Oh I saw that, yeah, yeah, And I used BERT,

334
00:14:59,640 --> 00:15:02,480
Speaker 2: which is like one of these things open source one

335
00:15:02,480 --> 00:15:02,960
Speaker 2: from Google.

336
00:15:03,000 --> 00:15:04,800
Speaker 3: What is the core model that.

337
00:15:04,720 --> 00:15:07,280
Speaker 2: You trained on or is it something or did you

338
00:15:07,320 --> 00:15:08,120
Speaker 2: build it yourself?

339
00:15:08,200 --> 00:15:08,960
Speaker 3: Like, talk to us about that.

340
00:15:09,000 --> 00:15:11,760
Speaker 7: Our very first model was actually built on BERT, but

341
00:15:11,960 --> 00:15:17,360
Speaker 7: future models we needed to up our capacity. So basically

342
00:15:17,440 --> 00:15:20,480
Speaker 7: we were running into capacity limits with our model. It

343
00:15:20,840 --> 00:15:23,840
Speaker 7: was capping out at a certain false positive false negative rate.

344
00:15:24,040 --> 00:15:26,600
Speaker 7: It wasn't learning the deeper signals, so we had to

345
00:15:26,800 --> 00:15:28,960
Speaker 7: ten x and then one hundred x the parameter count

346
00:15:29,160 --> 00:15:32,400
Speaker 7: so that it can learn like really deeply, like how these

347
00:15:32,400 --> 00:15:33,400
Speaker 7: frontier models write.

348
00:15:33,200 --> 00:15:36,920
Speaker 4: Have you noticed any interesting differences between how the

349
00:15:36,960 --> 00:15:40,760
Speaker 4: models write? Can you, and actually, is your model trained

350
00:15:40,800 --> 00:15:44,080
Speaker 4: to identify different models as well as whether or not

351
00:15:44,120 --> 00:15:46,440
Speaker 4: this is just broadly AI generated?

352
00:15:46,560 --> 00:15:50,520
Speaker 7: So we don't specifically train it on different models. We

353
00:15:50,520 --> 00:15:52,720
Speaker 7: don't say like hey, this one is Claude 3 and

354
00:15:52,760 --> 00:15:56,400
Speaker 7: this one is ChatGPT or GPT-5. What we've done,

355
00:15:56,680 --> 00:16:00,040
Speaker 7: we've done some interpretability work to look at basically the

356
00:16:00,080 --> 00:16:02,720
Speaker 7: output embeddings of the model, and we find that

357
00:16:02,920 --> 00:16:05,880
Speaker 7: it actually learns which model the text came from. So

358
00:16:05,920 --> 00:16:08,360
Speaker 7: you could see like little clusters like this is the

359
00:16:08,440 --> 00:16:11,440
Speaker 7: Claude cluster and like all the Claudes, yeah, cluster around here,

360
00:16:11,440 --> 00:16:13,760
Speaker 7: and then these are like the DeepSeek and Qwen

361
00:16:13,840 --> 00:16:15,760
Speaker 7: and then this is like ChatGPT and they all

362
00:16:15,840 --> 00:16:19,680
Speaker 7: kind of like cluster into different spaces in embedding space.

363
00:16:20,240 --> 00:16:22,640
Speaker 7: So clearly the model is able to learn what the

364
00:16:22,640 --> 00:16:24,320
Speaker 7: difference is between these frontier models.

365
00:16:24,520 --> 00:16:27,480
Speaker 4: We actually, since you mentioned Qwen, I'm very interested, is

366
00:16:27,480 --> 00:16:31,040
Speaker 4: there anything like distinct in terms of how Qwen generates

367
00:16:31,080 --> 00:16:34,600
Speaker 4: text versus platforms that have been developed in the US.

368
00:16:35,120 --> 00:16:37,640
Speaker 7: I think Qwen is unique because it's trained on a

369
00:16:37,680 --> 00:16:40,640
Speaker 7: lot more Chinese and multilingual tokens than other models.

370
00:16:41,360 --> 00:16:44,200
Speaker 7: So you know, I've heard from Chinese friends that it's

371
00:16:44,320 --> 00:16:49,680
Speaker 7: it's much better at like being conversationally fluent in Chinese.

372
00:16:50,320 --> 00:16:52,400
Speaker 5: Beyond that, I don't know that I can tell.

373
00:16:52,760 --> 00:16:54,280
Speaker 7: It would be hard for me to look at a

374
00:16:54,320 --> 00:16:57,360
Speaker 7: text and say, like, I know that's Qwen. But I

375
00:16:57,360 --> 00:16:59,680
Speaker 7: think somebody who's more familiar with it might be able to.

376
00:17:00,200 --> 00:17:02,880
Speaker 2: Let's talk about sort of some of the philosophical or

377
00:17:02,920 --> 00:17:04,720
Speaker 2: societal implications of this work.

378
00:17:05,240 --> 00:17:06,040
Speaker 3: Have you had

379
00:17:05,920 --> 00:17:10,120
Speaker 2: anyone whose text has been judged to be AI written

380
00:17:10,160 --> 00:17:12,840
Speaker 2: by Pangram and they're like, I swear to God, this

381
00:17:12,880 --> 00:17:15,639
Speaker 2: isn't, you're wrong? They, like, really insist, and what do

382
00:17:15,640 --> 00:17:17,399
Speaker 2: you think about this situation? What do you do or

383
00:17:17,440 --> 00:17:18,200
Speaker 2: talk to us about that.

384
00:17:18,359 --> 00:17:20,439
Speaker 7: I've had a couple of times this happened. There have

385
00:17:20,440 --> 00:17:22,600
Speaker 7: been times where I genuinely believe that you know this

386
00:17:22,720 --> 00:17:24,879
Speaker 7: is just a false positive. We scan hundreds of millions

387
00:17:24,880 --> 00:17:27,040
Speaker 7: of documents, so like, at a certain scale like this

388
00:17:27,040 --> 00:17:30,359
Speaker 7: will happen. But I also get people who all the

389
00:17:30,400 --> 00:17:32,720
Speaker 7: time they're just like AI detectors don't work.

390
00:17:32,840 --> 00:17:34,040
Speaker 5: It's like a total fraud.

391
00:17:34,280 --> 00:17:37,040
Speaker 7: And then whatever they're putting out on LinkedIn is just

392
00:17:37,080 --> 00:17:38,760
Speaker 7: one hundred percent AI generated.

393
00:17:38,440 --> 00:17:40,120
Speaker 5: And they're just like mad that they're getting called out.

394
00:17:40,440 --> 00:17:43,200
Speaker 7: And then you look back farther into their past and

395
00:17:43,200 --> 00:17:45,600
Speaker 7: their history, like everything they're putting out is AI generated

396
00:17:46,000 --> 00:17:49,320
Speaker 7: until about like twenty twenty three, Like for everyone, if

397
00:17:49,359 --> 00:17:52,120
Speaker 7: you look historically, there's a lot of like slop accounts

398
00:17:52,119 --> 00:17:54,800
Speaker 7: that are putting out total slop, and you can tell

399
00:17:54,800 --> 00:17:57,800
Speaker 7: either they like weren't posting as much before, and if

400
00:17:57,880 --> 00:18:00,479
Speaker 7: you scan back in time, then you see that they

401
00:18:00,480 --> 00:18:02,160
Speaker 7: were writing human text at some point.

402
00:18:02,240 --> 00:18:04,800
Speaker 2: So there's a number of accounts out there that basically

403
00:18:04,960 --> 00:18:07,840
Speaker 2: right around the beginning of twenty twenty three, where if

404
00:18:07,880 --> 00:18:10,840
Speaker 2: you scan the entire corpus of their work, it very

405
00:18:10,960 --> 00:18:12,640
Speaker 2: clearly shows a switch.

406
00:18:12,359 --> 00:18:13,920
Speaker 3: Right around early twenty twenty three.

407
00:18:14,119 --> 00:18:17,280
Speaker 7: Yeah, it really like depends on the account. I think

408
00:18:17,400 --> 00:18:19,520
Speaker 7: one thing we saw that was interesting was there is

409
00:18:19,600 --> 00:18:22,720
Speaker 7: a writer for The Guardian that was covering the Winter Olympics,

410
00:18:22,920 --> 00:18:25,040
Speaker 7: and somebody was like, hey, this article is like total

411
00:18:25,080 --> 00:18:27,840
Speaker 7: AI slop. Ran it through Pangram, it was AI. The

412
00:18:27,880 --> 00:18:30,520
Speaker 7: Guardian was like, no, of course, our writers don't use AI.

413
00:18:30,760 --> 00:18:34,080
Speaker 7: And then we so we scanned this single writer's history

414
00:18:34,520 --> 00:18:36,760
Speaker 7: and we found that they really did start picking up

415
00:18:36,800 --> 00:18:39,400
Speaker 7: AI like mid to late twenty twenty four, and were

416
00:18:39,480 --> 00:18:41,240
Speaker 7: using it more and more in their articles.

417
00:18:41,560 --> 00:18:44,240
Speaker 4: I mean, just to play devil's advocate for a second. Does

418
00:18:44,280 --> 00:18:48,359
Speaker 4: intent matter when it comes to identifying AI slop in

419
00:18:48,400 --> 00:18:50,679
Speaker 4: the sense that, Okay, I get you can have a

420
00:18:50,720 --> 00:18:54,800
Speaker 4: bad actor who's maybe trying to influence how people feel

421
00:18:54,800 --> 00:18:57,720
Speaker 4: about a particular topic, and maybe they've created a bunch

422
00:18:57,760 --> 00:19:01,320
Speaker 4: of bots on Twitter slash x and they're using AI

423
00:19:01,480 --> 00:19:04,160
Speaker 4: to just flood the zone with a bunch of AI

424
00:19:04,240 --> 00:19:08,960
Speaker 4: slop supporting their particular viewpoints. On the other hand, if

425
00:19:08,960 --> 00:19:12,479
Speaker 4: you're a journalist and your business is to write, you know,

426
00:19:12,600 --> 00:19:16,520
Speaker 4: like basic understandable copy about a news topic.

427
00:19:16,800 --> 00:19:17,880
Speaker 6: Just to be clear, I'm.

428
00:19:17,680 --> 00:19:21,440
Speaker 4: Not advocating this at all, but that intent is very

429
00:19:21,440 --> 00:19:25,040
Speaker 4: different to I'm going to try to influence something by

430
00:19:25,280 --> 00:19:26,800
Speaker 4: just you know, sheer volume.

431
00:19:27,240 --> 00:19:29,680
Speaker 7: Yeah, I mean, definitely these are like one is a

432
00:19:29,720 --> 00:19:32,239
Speaker 7: lot more severe than the other. But I think at

433
00:19:32,240 --> 00:19:34,280
Speaker 7: the same time, if you're a journalist and you're using

434
00:19:34,760 --> 00:19:38,000
Speaker 7: AI to basically shirk your work and like not do

435
00:19:38,080 --> 00:19:40,240
Speaker 7: your work, I think that's also a problem. And I

436
00:19:40,240 --> 00:19:42,880
Speaker 7: think it's a reputational risk to the outlet because people

437
00:19:42,960 --> 00:19:44,879
Speaker 7: can tell and people are going to call you out.

438
00:19:45,440 --> 00:19:46,840
Speaker 7: There's a lot of people who don't want to read

439
00:19:46,840 --> 00:19:49,240
Speaker 7: AI slop kind of regardless of where it's from.

440
00:19:49,520 --> 00:19:52,840
Speaker 2: Yeah, this is definitely true. Are you ever going

441
00:19:52,880 --> 00:19:55,240
Speaker 2: to run out of human material to train on?

442
00:19:55,400 --> 00:19:55,520
Speaker 5: Right?

443
00:19:55,560 --> 00:19:57,920
Speaker 2: Like you could be pretty confident that if you find

444
00:19:57,960 --> 00:20:00,879
Speaker 2: some piece of text that was published on the internet

445
00:20:00,880 --> 00:20:03,960
Speaker 2: prior to twenty twenty three, but certainly prior to like

446
00:20:04,000 --> 00:20:06,840
Speaker 2: twenty nineteen or something like that, you can be extremely

447
00:20:06,880 --> 00:20:11,240
Speaker 2: sure that this is human generated. Do you worry that

448
00:20:11,400 --> 00:20:14,040
Speaker 2: in the future, like it's going to be harder to

449
00:20:14,200 --> 00:20:16,840
Speaker 2: even establish the provenance of your training data.

450
00:20:17,200 --> 00:20:18,800
Speaker 5: Uh, Yeah, it's definitely a concern for us.

451
00:20:18,920 --> 00:20:20,280
Speaker 3: Talk to us about how to think about this.

452
00:20:20,359 --> 00:20:23,440
Speaker 7: So we have a near infinite data reservoir of pre

453
00:20:23,560 --> 00:20:26,600
Speaker 7: twenty twenty three data, there's just like more than enough

454
00:20:26,600 --> 00:20:28,280
Speaker 7: for us to train on for a long long time.

455
00:20:28,920 --> 00:20:31,080
Speaker 7: But part of the problem is we also want to

456
00:20:31,080 --> 00:20:33,560
Speaker 7: train on modern text. We want to there's all this

457
00:20:33,640 --> 00:20:36,840
Speaker 7: talk about like if somebody's writing about LLMs or about AI,

458
00:20:36,920 --> 00:20:39,560
Speaker 7: we don't want to incorrectly flag that as AI because

459
00:20:39,760 --> 00:20:43,399
Speaker 7: our training data has no sense of this topic. So

460
00:20:44,040 --> 00:20:46,040
Speaker 7: I think we're looking at different ways to do this,

461
00:20:46,160 --> 00:20:48,760
Speaker 7: but most of them are just like figuring out like

462
00:20:48,800 --> 00:20:49,960
Speaker 7: who is a trusted actor?

463
00:20:50,000 --> 00:20:51,160
Speaker 5: Who do we know is.

464
00:20:51,160 --> 00:20:53,919
Speaker 7: putting out human written content and we could use our

465
00:20:53,960 --> 00:20:56,080
Speaker 7: model for that, like to some degree. And then so

466
00:20:56,200 --> 00:20:58,600
Speaker 7: we have known actors, we know they're putting out human

467
00:20:58,640 --> 00:21:00,560
Speaker 7: written content, and then we could use theirs as well.

468
00:21:00,920 --> 00:21:03,680
Speaker 4: Slightly random question, but using your model, are you able

469
00:21:03,680 --> 00:21:06,919
Speaker 4: to quantify like what percentage of the Internet at the

470
00:21:06,960 --> 00:21:08,240
Speaker 4: moment is AI slop?

471
00:21:08,600 --> 00:21:12,920
Speaker 2: It's about forty percent based on why you're just how'd

472
00:21:12,920 --> 00:21:13,639
Speaker 2: you get that number?

473
00:21:13,960 --> 00:21:16,960
Speaker 7: So a lot of the Internet is just like SEO

474
00:21:17,080 --> 00:21:20,480
Speaker 7: written articles and like, yeah, it's articles written for search

475
00:21:20,560 --> 00:21:22,440
Speaker 7: basically so that your website comes up more often in

476
00:21:22,440 --> 00:21:24,919
Speaker 7: search because it's targeting certain keywords. And a lot of

477
00:21:24,920 --> 00:21:28,280
Speaker 7: that industry has switched over to using AI because then

478
00:21:28,320 --> 00:21:30,480
Speaker 7: instead of having to pay writers you could turn out

479
00:21:30,560 --> 00:21:33,520
Speaker 7: articles for pennies on the dollar, but I think that

480
00:21:33,600 --> 00:21:36,280
Speaker 7: kind of results in a lot of the Internet being

481
00:21:36,359 --> 00:21:39,399
Speaker 7: AI written. It's also a little bit kind of

482
00:21:39,440 --> 00:21:43,040
Speaker 7: platform dependent. It's about forty percent from like an Internet

483
00:21:43,040 --> 00:21:46,600
Speaker 7: page perspective. About a year and a half ago, we

484
00:21:46,640 --> 00:21:49,600
Speaker 7: looked at Medium and found that over fifty percent of

485
00:21:49,840 --> 00:21:54,240
Speaker 7: newly written Medium articles were generated, which was a crazy

486
00:21:54,320 --> 00:21:54,840
Speaker 7: high number.

487
00:21:54,880 --> 00:21:55,520
Speaker 3: What about Reddit?

488
00:21:56,160 --> 00:21:58,679
Speaker 7: Reddit, it was seven percent a year ago, I believe

489
00:21:58,920 --> 00:21:59,879
Speaker 7: a little over ten percent.

490
00:22:00,400 --> 00:22:03,280
Speaker 4: Well, actually this reminds me. So I'm on Reddit a

491
00:22:03,280 --> 00:22:05,840
Speaker 4: lot and I really enjoy it nowadays as a platform,

492
00:22:05,880 --> 00:22:07,600
Speaker 4: but I do worry about how much of it is

493
00:22:07,640 --> 00:22:11,280
Speaker 4: being generated by AI. And the thing I don't necessarily

494
00:22:11,359 --> 00:22:16,000
Speaker 4: understand is what are the economic incentives to actually write

495
00:22:16,040 --> 00:22:18,480
Speaker 4: a bunch of AI generated posts on Reddit and get

496
00:22:18,600 --> 00:22:22,439
Speaker 4: upvoted, like why does that system or motivation even exist?

497
00:22:22,760 --> 00:22:25,200
Speaker 7: So there are startups I'm not going to name names

498
00:22:25,240 --> 00:22:27,520
Speaker 7: because I don't want to promote them, but they will

499
00:22:28,119 --> 00:22:30,480
Speaker 7: sell a promise to companies that we're going to get

500
00:22:30,480 --> 00:22:33,719
Speaker 7: you organic mentions on Reddit. We're going to run our

501
00:22:33,760 --> 00:22:37,320
Speaker 7: AI bots that seem organic, and they're just going to,

502
00:22:37,640 --> 00:22:40,280
Speaker 7: you know, naturally recommend your product or you know, just

503
00:22:40,359 --> 00:22:43,119
Speaker 7: mention your product in the comments or in a post.

504
00:22:43,600 --> 00:22:46,399
Speaker 7: And so I've seen evidence of this. We can find

505
00:22:46,440 --> 00:22:51,520
Speaker 7: these like, they're basically like bot farms that are mostly engaging,

506
00:22:52,000 --> 00:22:55,000
Speaker 7: seemingly organically, just like doing a short reply, and then

507
00:22:55,040 --> 00:22:57,560
Speaker 7: sometimes they're doing this brand mention. And so that's why

508
00:22:57,560 --> 00:22:58,840
Speaker 7: these posts are very valuable.

509
00:22:58,840 --> 00:22:59,680
Speaker 6: That's really interesting.

510
00:22:59,720 --> 00:23:02,280
Speaker 2: I have to, you'd also imagine it's valuable because all

511
00:23:02,359 --> 00:23:05,280
Speaker 2: of the models train on Reddit, right, and if you

512
00:23:05,359 --> 00:23:09,399
Speaker 2: want your product's name to appear in model outputs, it's like,

513
00:23:09,680 --> 00:23:13,520
Speaker 2: what is the best you know, nose hair trimmer or whatever,

514
00:23:13,960 --> 00:23:16,320
Speaker 2: And there's a bunch of bots that on Reddit talked

515
00:23:16,320 --> 00:23:18,920
Speaker 2: about this nose hair trimmer, and then that's probably more

516
00:23:18,800 --> 00:23:21,639
Speaker 3: likely to show up in a ChatGPT request, right?

517
00:23:21,760 --> 00:23:23,920
Speaker 7: Yeah, yeah, it's been weirdly gamed. You know, you used

518
00:23:23,920 --> 00:23:26,200
Speaker 7: to just google best nose hair trimmer, and now there's

519
00:23:26,240 --> 00:23:27,160
Speaker 7: like a thousand.

520
00:23:27,400 --> 00:23:29,959
Speaker 4: The Reddit search results like show up first nowadays.

521
00:23:30,080 --> 00:23:31,240
Speaker 6: Yeah, that's where people are looking.

522
00:23:31,480 --> 00:23:34,439
Speaker 7: Yeah, and then people start searching best nose trimmer Reddit

523
00:23:35,280 --> 00:23:37,280
Speaker 7: to get their Reddit comments on it. And now it's

524
00:23:37,680 --> 00:23:39,800
Speaker 7: people have realized that that's what people are searching for.

525
00:23:40,119 --> 00:23:43,480
Speaker 7: So you need to populate Reddit with your advertisements.

526
00:23:44,760 --> 00:23:46,600
Speaker 4: I'm on the Men's Health. Are you looking for nose

527
00:23:46,640 --> 00:23:47,240
Speaker 4: hair trimmers?

528
00:23:47,440 --> 00:23:50,440
Speaker 2: The Panasonic ear and nose hair trimmer is the number

529
00:23:50,480 --> 00:23:53,800
Speaker 2: one choice on Men's Health. Pros: easy to hold. Anyway,

530
00:23:53,960 --> 00:23:54,240
Speaker 2: it's not.

531
00:23:54,440 --> 00:23:57,800
Speaker 5: Yeah, it's all these affiliate links. Yeah, just destroyed the Internet.

532
00:23:57,920 --> 00:24:00,760
Speaker 2: I know it's it's too bad, but whatever, talk to

533
00:24:00,840 --> 00:24:03,280
Speaker 2: us more about the whole pipeline. So, I'm very fascinated

534
00:24:03,280 --> 00:24:05,879
Speaker 2: by this idea. It's like, Okay, you see this review

535
00:24:05,960 --> 00:24:08,480
Speaker 2: for Denny's. You have the AI model.

536
00:24:08,600 --> 00:24:10,879
Speaker 3: Try to replicate it as best as it could. Movie

537
00:24:10,880 --> 00:24:13,000
Speaker 3: these subtle differences. Talk to us, though, about, like

538
00:24:13,040 --> 00:24:14,000
Speaker 3: the whole pipeline.

539
00:24:14,000 --> 00:24:16,640
Speaker 2: What are the other tests that you're using to get

540
00:24:16,680 --> 00:24:19,760
Speaker 2: the true you know, because what I imagine you're trying to

541
00:24:19,800 --> 00:24:22,879
Speaker 2: do is get the most similar data sets with an

542
00:24:22,880 --> 00:24:26,760
Speaker 2: almost imperceptible difference to really stress test. Yeah, talk to

543
00:24:26,840 --> 00:24:28,120
Speaker 2: us really about the whole pipeline.

544
00:24:28,160 --> 00:24:28,320
Speaker 4: Yeah.

545
00:24:28,359 --> 00:24:30,359
Speaker 7: So what we're really trying to do here is we're as.

546
00:24:30,240 --> 00:24:33,240
Speaker 3: A model maker myself, no, no, sorry, keep going.

547
00:24:33,320 --> 00:24:35,159
Speaker 5: Yeah, as an AI expert, Yeah, yeah.

548
00:24:35,000 --> 00:24:36,920
Speaker 3: As an AI expert. I need to hear some tips

549
00:24:36,960 --> 00:24:37,520
Speaker 3: of the field.

550
00:24:38,600 --> 00:24:41,399
Speaker 7: Uh yeah, So what we're really looking for is examples

551
00:24:41,400 --> 00:24:43,800
Speaker 7: that are as close to the boundary between human and

552
00:24:43,840 --> 00:24:47,000
Speaker 7: AI as possible so that our model learns better. Something that's

553
00:24:47,119 --> 00:24:50,399
Speaker 7: very obviously AI is, you know, our model's not learning

554
00:24:50,400 --> 00:24:53,639
Speaker 7: as much same thing for something that's obviously human. And

555
00:24:53,720 --> 00:24:57,879
Speaker 7: so step one is creating this data set with synthetic

556
00:24:57,920 --> 00:25:00,639
Speaker 7: mirrors of human examples, and then we train a model,

557
00:25:00,960 --> 00:25:03,920
Speaker 7: and then step two is something called active learning. So

558
00:25:03,960 --> 00:25:06,840
Speaker 7: we then take this model and use it to scan

559
00:25:06,960 --> 00:25:10,920
Speaker 7: a much larger corpus of data and look for errors,

560
00:25:11,200 --> 00:25:14,440
Speaker 7: false positives, false negatives, and then we pull those back

561
00:25:14,480 --> 00:25:17,080
Speaker 7: into our training set and are able to train a

562
00:25:17,160 --> 00:25:20,919
Speaker 7: much better model because it's seen these errors, which and

563
00:25:20,960 --> 00:25:23,119
Speaker 7: these errors we believe are just much closer to the

564
00:25:23,520 --> 00:25:24,840
Speaker 7: boundary between human and AI.
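The two-step loop described here (train on a seed set, scan a larger labelled pool, pull the mistakes back into training) can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not Pangram's pipeline: a nearest-centroid classifier stands in for the real model, and the feature vectors, labels, and function names are all hypothetical.

```python
from statistics import fmean

LABELS = ("human", "ai")

def train_centroids(examples):
    """Toy stand-in for model training: one mean vector per label.
    `examples` is a list of (features, label) pairs."""
    centroids = {}
    for label in LABELS:
        rows = [feats for feats, lab in examples if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def mine_errors(centroids, pool):
    """Scan a labelled pool and keep the false positives and negatives."""
    return [(f, lab) for f, lab in pool if predict(centroids, f) != lab]

def active_learning(seed, pool, rounds=3):
    """Retrain repeatedly, each time adding newly found errors."""
    train = list(seed)
    model = train_centroids(train)
    for _ in range(rounds):
        new_errors = [ex for ex in mine_errors(model, pool) if ex not in train]
        if not new_errors:
            break  # nothing new to learn from this pool
        train.extend(new_errors)
        model = train_centroids(train)
    return model, train

# Hypothetical 1-D features: obvious examples in the seed,
# one near-boundary AI example in the pool.
seed = [([0.0], "human"), ([1.0], "human"), ([9.0], "ai"), ([10.0], "ai")]
pool = [([4.5], "ai"), ([3.0], "human"), ([8.0], "ai")]
model, train = active_learning(seed, pool)
```

In this toy run the seed-only model mislabels the near-boundary example, and one round of pulling that error back into the training set is enough to fix it.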

565
00:25:25,080 --> 00:25:28,040
Speaker 2: So sorry, just to be clear, the first pass is like, okay,

566
00:25:28,080 --> 00:25:31,800
Speaker 2: you have known human writing and known AI writing. You

567
00:25:31,840 --> 00:25:34,760
Speaker 2: train a model, and then the next pass is once

568
00:25:34,800 --> 00:25:38,199
Speaker 2: again known human and known AI writing. So you already

569
00:25:38,240 --> 00:25:41,600
Speaker 2: know the answer of each of these and therefore you

570
00:25:41,640 --> 00:25:44,000
Speaker 2: could come up with a list of which it got wrong,

571
00:25:44,400 --> 00:25:46,840
Speaker 2: and then that gets fed back into the first.

572
00:25:46,640 --> 00:25:50,000
Speaker 7: Exactly. And so once we retrain, then

573
00:25:50,040 --> 00:25:52,760
Speaker 7: the model gets much much better, and then we could

574
00:25:52,840 --> 00:25:55,600
Speaker 7: do this as many times as we want to, kind

575
00:25:55,600 --> 00:25:58,800
Speaker 7: of just have a self improving model that gets better

576
00:25:58,880 --> 00:26:01,600
Speaker 7: with every training run. I can also go

577
00:26:01,640 --> 00:26:04,840
Speaker 7: a little bit more into how we deal with AI edits,

578
00:26:05,000 --> 00:26:08,840
Speaker 7: because I think that's an increasingly important problem. Like, I

579
00:26:08,880 --> 00:26:12,080
Speaker 7: think most writing will be AI assisted in the future.

580
00:26:12,440 --> 00:26:14,719
Speaker 7: I think it's already in Google Docs and it's in

581
00:26:15,040 --> 00:26:15,760
Speaker 7: Google Keyboard.

582
00:26:16,000 --> 00:26:18,359
Speaker 4: Grammarly arguably has been doing this for a while.

583
00:26:18,520 --> 00:26:18,879
Speaker 5: Exactly.

584
00:26:18,960 --> 00:26:22,480
Speaker 7: Yeah, Grammarly uses LLMs on the back end, and we

585
00:26:22,760 --> 00:26:25,400
Speaker 7: don't want to just say, like, all writing is AI now.

586
00:26:25,520 --> 00:26:28,000
Speaker 7: We want to be able to differentiate between AI assisted

587
00:26:28,280 --> 00:26:30,560
Speaker 7: and AI generated. So what we do is we also

588
00:26:30,640 --> 00:26:34,720
Speaker 7: have different prompts. So rather than saying so for our

589
00:26:34,960 --> 00:26:38,679
Speaker 7: human review of Denny's, rather than saying, generate a review

590
00:26:38,800 --> 00:26:41,439
Speaker 7: like this, we could say, help improve this, make it

591
00:26:41,480 --> 00:26:43,920
Speaker 7: more formal, make it more like, clean up the grammar.

592
00:26:44,080 --> 00:26:47,320
Speaker 7: And so we have like a long list of AI

593
00:26:47,520 --> 00:26:51,680
Speaker 7: editing prompts, and then we're able to look at basically

594
00:26:51,680 --> 00:26:56,280
Speaker 7: the cosine difference, the distance between the original human text and

595
00:26:56,600 --> 00:26:59,240
Speaker 3: The in that hyper multidimensional space.

596
00:26:59,080 --> 00:27:03,800
Speaker 7: Exactly, So how much did AI change this text? And

597
00:27:03,840 --> 00:27:06,119
Speaker 7: then we're able to train our model to say, like

598
00:27:06,760 --> 00:27:09,080
Speaker 7: we're just going to like put a point on this

599
00:27:09,119 --> 00:27:11,960
Speaker 7: distance and say like this is moderate AI assistance, this is

600
00:27:12,040 --> 00:27:14,240
Speaker 7: light AI assistance, and this is heavy AI assistance.
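The labelling idea described here (measure how far an AI edit moved the text in embedding space, then bucket that distance) might look something like the sketch below. It assumes the embeddings are already available as plain lists of floats. The cutoff values and the names `LIGHT_MAX`, `MODERATE_MAX`, and `assistance_level` are hypothetical, chosen only for illustration, since the conversation does not give actual thresholds.

```python
import math

# Hypothetical cutoffs on cosine distance; not stated in the conversation.
LIGHT_MAX = 0.05
MODERATE_MAX = 0.20

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def assistance_level(original_vec, edited_vec):
    """Bucket the distance between the human original and the AI-edited text."""
    d = cosine_distance(original_vec, edited_vec)
    if d <= LIGHT_MAX:
        return "light"
    if d <= MODERATE_MAX:
        return "moderate"
    return "heavy"

# Made-up 2-D "embeddings" of an original review and three edited versions.
original = [1.0, 0.0]
print(assistance_level(original, [1.0, 0.0]))   # identical direction
print(assistance_level(original, [1.0, 0.5]))   # somewhat shifted
print(assistance_level(original, [0.0, 1.0]))   # orthogonal
```

Labels produced this way could then serve as training targets, so the detector learns to predict the assistance level directly from the edited text.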

601
00:27:14,560 --> 00:27:16,919
Speaker 4: Interesting. I'm going to do something I don't think I've

602
00:27:16,960 --> 00:27:20,600
Speaker 4: ever done before, which is ask a founder about their

603
00:27:20,680 --> 00:27:24,760
Speaker 4: corporate mission. But you know, you've set up this company,

604
00:27:25,320 --> 00:27:27,359
Speaker 4: and when you think about what you're trying to do here,

605
00:27:27,520 --> 00:27:30,520
Speaker 4: is it just basic AI detection in the sense that

606
00:27:30,560 --> 00:27:32,600
Speaker 4: there might be you know, a few groups of people

607
00:27:32,720 --> 00:27:35,960
Speaker 4: like teachers that find this very valuable, or is the

608
00:27:36,000 --> 00:27:40,399
Speaker 4: mission something broader where you're actually trying to improve the

609
00:27:40,480 --> 00:27:42,720
Speaker 4: Internet and what people see on it.

610
00:27:43,000 --> 00:27:46,800
Speaker 7: I believe the technology of being able to detect AI

611
00:27:46,840 --> 00:27:51,439
Speaker 7: generated content is immensely valuable, and it's valuable not just

612
00:27:51,480 --> 00:27:55,680
Speaker 7: for teachers, but for basically everybody in every profession. Lawyers,

613
00:27:56,040 --> 00:28:00,560
Speaker 7: publishers, just an individual who consumes content on the Internet.

614
00:28:00,760 --> 00:28:04,480
Speaker 7: I think it's valuable for all these people. But ultimately, yeah,

615
00:28:04,520 --> 00:28:07,719
Speaker 7: our high level goal is to help mitigate some of

616
00:28:07,760 --> 00:28:11,119
Speaker 7: these negative effects of growing AI content.

617
00:28:11,440 --> 00:28:16,280
Speaker 4: But for instance, just using the product review example, is

618
00:28:16,320 --> 00:28:19,520
Speaker 4: the vision that like a Yelp, for instance, would want

619
00:28:19,520 --> 00:28:22,119
Speaker 4: to use this technology to make sure that its system

620
00:28:22,280 --> 00:28:25,520
Speaker 4: isn't being gamed or is the vision Like if I

621
00:28:25,560 --> 00:28:28,720
Speaker 4: am a particularly diligent consumer who has a lot of

622
00:28:28,720 --> 00:28:30,800
Speaker 4: time on my hands and I'm looking to go out

623
00:28:30,840 --> 00:28:34,440
Speaker 4: to a restaurant, I can run all these individual restaurant

624
00:28:34,480 --> 00:28:38,400
Speaker 4: reviews through Pangram and then like actually figure out if

625
00:28:38,440 --> 00:28:39,680
Speaker 4: it's real hype or not.

626
00:28:40,280 --> 00:28:42,800
Speaker 7: So I think right now it's a lot of the former.

627
00:28:42,880 --> 00:28:46,000
Speaker 7: We work with platforms. One of our biggest customers is Quora,

628
00:28:46,600 --> 00:28:49,120
Speaker 7: and they run a bunch of content through Pangram. But

629
00:28:49,160 --> 00:28:52,480
Speaker 7: we have a lot of different platforms that use Pangram

630
00:28:52,560 --> 00:28:56,440
Speaker 7: to help moderate and find AI bad actors and get

631
00:28:56,440 --> 00:28:58,640
Speaker 7: them off their platform. But I also think, yeah, the

632
00:28:58,760 --> 00:29:01,920
Speaker 7: individual consumer case has been growing a lot, and we're

633
00:29:01,920 --> 00:29:03,560
Speaker 7: really interested in pushing here.

634
00:29:03,240 --> 00:29:23,320
Speaker 2: The free version of Pangram dot com, like, you

635
00:29:23,360 --> 00:29:26,160
Speaker 2: get a handful of tests a day or something like that.

636
00:29:26,800 --> 00:29:32,440
Speaker 2: If someone had an unlimited number of Pangram responses and

637
00:29:32,840 --> 00:29:36,240
Speaker 2: maybe had access to the Pangram API at infinite scale,

638
00:29:36,960 --> 00:29:40,959
Speaker 2: could they theoretically learn a prompt that they would then

639
00:29:41,040 --> 00:29:43,880
Speaker 2: be able to put into an AI to generate human style writing?

640
00:29:43,920 --> 00:29:46,479
Speaker 7: I actually had a friend do that. He put his

641
00:29:46,560 --> 00:29:49,640
Speaker 7: Claude Code on a loop. I gave him some API credits,

642
00:29:49,680 --> 00:29:53,120
Speaker 7: and then his Claude Code just basically worked overnight writing

643
00:29:53,120 --> 00:29:55,480
Speaker 7: a prompt trying to get it to put something that's

644
00:29:55,520 --> 00:29:58,360
Speaker 7: human written, or that came back from Pangram

645
00:29:58,480 --> 00:30:01,680
Speaker 7: as human written. They got there, but the text was

646
00:30:01,720 --> 00:30:06,760
Speaker 7: pretty like uh incoherent, so so like, yeah, it was

647
00:30:06,920 --> 00:30:11,680
Speaker 7: producing more or less long gibberish. It was like grammatically incorrect.

648
00:30:12,600 --> 00:30:14,600
Speaker 7: A lot of the words just didn't really make sense.

649
00:30:14,680 --> 00:30:16,600
Speaker 2: Because this was my first thought, like when I saw it,

650
00:30:16,640 --> 00:30:18,680
Speaker 2: I was like, that would be like a fun experiment

651
00:30:19,120 --> 00:30:21,800
Speaker 2: to see if you could take all the outputs, find

652
00:30:21,800 --> 00:30:24,400
Speaker 2: the difference and just keep iterating on the prompt you

653
00:30:24,400 --> 00:30:27,560
Speaker 2: would have to tell AI in order to eventually get

654
00:30:27,560 --> 00:30:31,240
Speaker 2: an output that looked to Pangram like it was human generated.

655
00:30:31,360 --> 00:30:32,920
Speaker 7: Yeah, I think there's a way to do it if

656
00:30:32,960 --> 00:30:36,080
Speaker 7: you also had like an LLM judge on coherency and

657
00:30:36,200 --> 00:30:40,040
Speaker 7: used, like, Pangram and the coherency judge both to score

658
00:30:40,160 --> 00:30:43,280
Speaker 7: your text. I think it's definitely possible, and I'm excited

659
00:30:43,280 --> 00:30:44,960
Speaker 7: for someone to try to do it, because we could

660
00:30:44,960 --> 00:30:46,840
Speaker 7: make our model a lot better and more robust if

661
00:30:46,840 --> 00:30:47,480
Speaker 7: this existed.

662
00:30:47,640 --> 00:30:49,719
Speaker 4: So I want to know what your personal like token

663
00:30:49,760 --> 00:30:52,880
Speaker 4: budget is nowadays that you're even like contemplating some of

664
00:30:52,880 --> 00:30:53,360
Speaker 4: that stuff.

665
00:30:53,360 --> 00:30:56,000
Speaker 2: Well, I feel like, I have the Claude Max plan,

666
00:30:56,040 --> 00:30:59,400
Speaker 2: you know, and I don't work like when I'm at work,

667
00:31:00,000 --> 00:31:02,080
Speaker 2: I don't work on any of my Vibe coding projects.

668
00:31:02,160 --> 00:31:03,680
Speaker 3: And you know, like when we were kids.

669
00:31:03,840 --> 00:31:06,000
Speaker 2: I don't know if you remember, like if you didn't

670
00:31:06,000 --> 00:31:08,480
Speaker 2: eat all your food, like someone would say, oh, there's

671
00:31:08,480 --> 00:31:09,760
Speaker 2: like starving kids in the world.

672
00:31:10,080 --> 00:31:13,120
Speaker 4: Yeah, I'm like, oh, it's starving Vibe coder.

673
00:31:14,280 --> 00:31:15,280
Speaker 3: It's like, oh, you didn't.

674
00:31:15,320 --> 00:31:17,720
Speaker 2: Like I have this four hour token window and I'm

675
00:31:17,760 --> 00:31:20,520
Speaker 2: almost never maxing it out, and I'm just thinking, like,

676
00:31:20,880 --> 00:31:22,600
Speaker 2: there are kids on the other side of the world

677
00:31:22,600 --> 00:31:25,160
Speaker 2: that wish they had your tokens and you're not

678
00:31:25,320 --> 00:31:27,040
Speaker 2: using all of your tokens for the window.

679
00:31:27,120 --> 00:31:27,680
Speaker 3: How dare you?

680
00:31:27,760 --> 00:31:30,360
Speaker 2: I feel a little guilty when I don't out max

681
00:31:30,400 --> 00:31:32,760
Speaker 2: out my Claude Max token program.

682
00:31:32,840 --> 00:31:35,400
Speaker 7: I also have Claude Max and yeah, most days I'm

683
00:31:35,640 --> 00:31:37,720
Speaker 7: not doing much coding at all, I'm not maxing it out,

684
00:31:37,840 --> 00:31:39,480
Speaker 7: and then some days I'm going you feel a lot.

685
00:31:39,520 --> 00:31:42,520
Speaker 2: Guilty about that though, it's like, yeah, yeah, so can

686
00:31:42,600 --> 00:31:45,960
Speaker 2: I just feel like writing is kind of interesting, but like,

687
00:31:46,200 --> 00:31:49,960
Speaker 2: what are the prospects of this being able to work on? Say,

688
00:31:50,840 --> 00:31:53,160
Speaker 2: and you must get this a lot, image and video generation?

689
00:31:53,960 --> 00:31:56,680
Speaker 2: Is it at all theoretically similar? Is there a reason

690
00:31:56,800 --> 00:31:59,360
Speaker 2: to think that it will be replicable? Or is this

691
00:31:59,480 --> 00:32:00,960
Speaker 2: just a different beast of a problem.

692
00:32:01,040 --> 00:32:03,760
Speaker 7: I think the approach is definitely doable. I think some

693
00:32:03,840 --> 00:32:06,760
Speaker 7: of the economics change, especially if we look at video

694
00:32:06,840 --> 00:32:09,400
Speaker 7: and the cost of generating video today. Okay, we can't

695
00:32:09,440 --> 00:32:11,920
Speaker 7: generate video at the same scale that we can generate text,

696
00:32:12,400 --> 00:32:14,320
Speaker 7: and so we might need a kind of different approach.

697
00:32:14,680 --> 00:32:17,320
Speaker 7: But I also believe that if we're able to solve

698
00:32:17,360 --> 00:32:21,120
Speaker 7: this for image plus maybe like audio, that could be

699
00:32:21,240 --> 00:32:22,840
Speaker 7: enough to just solve it for video as well.

700
00:32:22,920 --> 00:32:24,000
Speaker 5: Huh, zero shot.

701
00:32:24,120 --> 00:32:27,040
Speaker 4: Could you ever envision, I don't know, launching some sort

702
00:32:27,040 --> 00:32:30,880
Speaker 4: of like certification program for video because this seems to

703
00:32:30,920 --> 00:32:33,920
Speaker 4: be my dad's a boomer spends a lot of time

704
00:32:33,960 --> 00:32:36,960
Speaker 4: on Facebook, Like this seems to be what society needs, right,

705
00:32:37,080 --> 00:32:39,240
Speaker 4: Like a video that comes with a little thing that

706
00:32:39,280 --> 00:32:42,680
Speaker 4: says this is not AI generated and someone has actually

707
00:32:42,760 --> 00:32:44,320
Speaker 4: like rubber stamped that, so.

708
00:32:44,360 --> 00:32:47,240
Speaker 7: There's an organization called C2PA, and I think they're

709
00:32:47,280 --> 00:32:52,000
Speaker 7: doing pretty good work on content provenance. Basically, they are

710
00:32:52,040 --> 00:32:57,520
Speaker 7: working with phone makers and hardware makers to basically embed

711
00:32:57,640 --> 00:33:02,080
Speaker 7: like hardware signatures to prove that image and video were

712
00:33:02,080 --> 00:33:03,120
Speaker 7: truly taken from.

713
00:33:03,000 --> 00:33:05,120
Speaker 4: The hardware like watermarks basically.

714
00:33:04,840 --> 00:33:07,720
Speaker 7: Yeah, exactly. So rather than marking the AI outputs, yeah,

715
00:33:07,760 --> 00:33:11,400
Speaker 7: we're instead embedding like a proof of authenticity in the

716
00:33:12,360 --> 00:33:15,080
Speaker 7: like thing that's real and is captured.

717
00:33:14,760 --> 00:33:15,200
Speaker 5: In real life.

718
00:33:15,280 --> 00:33:19,480
Speaker 3: That's interesting, all right, So big picture, where's the Internet going?

719
00:33:19,640 --> 00:33:21,440
Speaker 2: You know, you mentioned forty percent of the Internet is

720
00:33:21,440 --> 00:33:24,560
Speaker 2: already AI generated, but maybe that's not the end of the world. Like,

721
00:33:25,000 --> 00:33:26,719
Speaker 2: you know, if it's just a bunch of SEO pages

722
00:33:26,760 --> 00:33:29,160
Speaker 2: that I never read, I don't know whatever, But like

723
00:33:29,560 --> 00:33:31,840
Speaker 2: give us some thoughts, high level, about, like, what's the

724
00:33:31,880 --> 00:33:35,800
Speaker 2: trajectory of the Internet. Regardless of the uptake of Pangram

725
00:33:35,800 --> 00:33:37,360
Speaker 2: and other AI detection models.

726
00:33:37,560 --> 00:33:40,600
Speaker 5: I'm a little bit worried about the state of the Internet.

727
00:33:40,600 --> 00:33:41,440
Speaker 5: I'm gonna be honest.

728
00:33:41,880 --> 00:33:44,720
Speaker 7: I think like right now, there's still like so much

729
00:33:44,760 --> 00:33:47,400
Speaker 7: of it is built around trust and norms in a

730
00:33:47,440 --> 00:33:50,480
Speaker 7: way that, like, we're not really well equipped to

731
00:33:50,680 --> 00:33:53,720
Speaker 7: suddenly deal with an onslaught of bots at a completely

732
00:33:53,720 --> 00:33:55,320
Speaker 7: different scale than we've dealt with before.

733
00:33:55,920 --> 00:33:58,240
Speaker 5: There's maybe like a good case and a bad case.

734
00:33:58,480 --> 00:34:00,560
Speaker 7: I would say, like the bad case is the Internet

735
00:34:00,680 --> 00:34:04,240
Speaker 7: goes the way of dead internet theory, just like every

736
00:34:04,280 --> 00:34:07,280
Speaker 7: space that's open and accessible is just flooded by bots,

737
00:34:07,600 --> 00:34:10,000
Speaker 7: and then the only place people are able to communicate

738
00:34:10,040 --> 00:34:14,239
Speaker 7: authentically is in like very walled garden like closed servers

739
00:34:14,280 --> 00:34:17,280
Speaker 7: like Discord servers, for example, where you know everybody's

740
00:34:17,360 --> 00:34:19,000
Speaker 7: identity is known and you know you don't.

741
00:34:18,800 --> 00:34:21,600
Speaker 5: Have bots in here. So that's maybe the like bad scenario.

742
00:34:21,920 --> 00:34:24,399
Speaker 2: Can I do an insane thought that I've had? Go on.

743
00:34:25,360 --> 00:34:28,440
Speaker 2: You're gonna get a kick out of this. So, like, I

744
00:34:28,480 --> 00:34:30,799
Speaker 2: forget what they call like this idea of like for

745
00:34:30,880 --> 00:34:31,880
Speaker 2: the bad actors, it's.

746
00:34:31,680 --> 00:34:34,200
Speaker 3: Called like heaven mode or heaven banning. Have you heard

747
00:34:34,200 --> 00:34:36,640
Speaker 3: of this? So there's this thought that one way.

748
00:34:36,520 --> 00:34:40,319
Speaker 2: You could deal with bad actors on the Internet is

749
00:34:41,280 --> 00:34:44,480
Speaker 2: suddenly they're on a version of say Twitter, in which

750
00:34:44,520 --> 00:34:47,480
Speaker 2: there are only bots and everyone always agrees with them on

751
00:34:47,520 --> 00:34:50,080
Speaker 2: everything and it drives them crazy and stuff like that,

752
00:34:50,320 --> 00:34:52,239
Speaker 2: and they would never know it because they're like, oh,

753
00:34:52,239 --> 00:34:54,160
Speaker 2: there's call, everyone's there, and then it's so like slowly

754
00:34:54,200 --> 00:34:56,040
Speaker 2: like yeah, they just this is like a way you

755
00:34:56,080 --> 00:34:58,279
Speaker 2: could punish people by putting them on an internet where

756
00:34:58,320 --> 00:34:59,480
Speaker 2: they will never get any fight.

757
00:35:00,120 --> 00:35:02,560
Speaker 7: Banned and put into basically jail. You're talking to a bunch

758
00:35:02,360 --> 00:35:04,040
Speaker 3: Of that's right, that's right, that would be jail. But

759
00:35:04,080 --> 00:35:04,799
Speaker 3: you're heaven banned.

760
00:35:04,920 --> 00:35:07,080
Speaker 2: But I thought, and again, this is you know, like

761
00:35:07,080 --> 00:35:09,000
Speaker 2: I built this little AI model myself and I like

762
00:35:09,000 --> 00:35:11,399
Speaker 2: showed it to my friends, like, oh, it's really cool, Joe.

763
00:35:11,400 --> 00:35:13,719
Speaker 2: I'm really impressed, like, I'm really impressed by, like, that

764
00:35:13,760 --> 00:35:16,239
Speaker 2: you're able to do this. And I was like, are

765
00:35:16,280 --> 00:35:18,520
Speaker 2: people being honest with me? Have I been heaven banned?

766
00:35:18,520 --> 00:35:20,799
Speaker 2: Because I just like, like, you can be honest with

767
00:35:20,840 --> 00:35:21,560
Speaker 2: me if it sucks.

768
00:35:21,560 --> 00:35:23,400
Speaker 3: And I sort of have the fear.

769
00:35:23,360 --> 00:35:26,840
Speaker 4: The biggest humble brag: I did this thing and everyone thought it

770
00:35:26,880 --> 00:35:27,399
Speaker 4: was not great.

771
00:35:27,520 --> 00:35:29,279
Speaker 3: I'm just saying, like people are like I think people.

772
00:35:29,320 --> 00:35:31,560
Speaker 3: I'm worried that, like, people are being nice to me because, like,

773
00:35:31,560 --> 00:35:33,400
Speaker 3: oh cool, yeah, that's impressive, you, like, did that.

774
00:35:33,560 --> 00:35:36,440
Speaker 2: And I have this like deep anxiety that like people

775
00:35:36,440 --> 00:35:38,520
Speaker 2: aren't giving it to me straight about it. I know

776
00:35:38,560 --> 00:35:40,120
Speaker 2: that sounds like a humble brag, but it's really not.

777
00:35:40,320 --> 00:35:42,120
Speaker 7: That's why you can never get like too successful, like

778
00:35:42,200 --> 00:35:45,080
Speaker 7: Maya West surrounded by a bunch of you never get.

779
00:35:44,880 --> 00:35:47,799
Speaker 2: Like, oh, this is his first try doing something with

780
00:35:47,960 --> 00:35:50,080
Speaker 2: vibe coding. I'm like deeply anxious, Like, no, you could

781
00:35:50,120 --> 00:35:52,480
Speaker 2: just tell me if it sucks, that's fine, that's my worry.

782
00:35:53,000 --> 00:35:53,920
Speaker 6: I don't worry about this.

783
00:35:54,040 --> 00:35:56,439
Speaker 4: If I tweet that I'm eating a steak, I will

784
00:35:56,440 --> 00:35:59,520
Speaker 4: get like a hundred people criticizing it, like, you didn't

785
00:35:59,360 --> 00:35:59,839
Speaker 3: Put the meat.

786
00:36:00,120 --> 00:36:00,520
Speaker 2: Yeah.

787
00:36:00,560 --> 00:36:00,960
Speaker 5: Yeah.

788
00:36:01,000 --> 00:36:02,839
Speaker 2: So that's the other thing, which is that the two

789
00:36:02,920 --> 00:36:06,560
Speaker 2: things you are never allowed to tweet about: meat preparation

790
00:36:07,160 --> 00:36:09,640
Speaker 2: and enjoying life, because if you ever enjoy life, then

791
00:36:09,640 --> 00:36:11,600
Speaker 2: if you ever enjoy it, and if you ever prepare.

792
00:36:11,360 --> 00:36:14,280
Speaker 3: Meat, people will flip out at you on the internet.

793
00:36:14,360 --> 00:36:16,279
Speaker 3: Those are the two things that you're not allowed to

794
00:36:16,360 --> 00:36:17,080
Speaker 3: do online.

795
00:36:17,280 --> 00:36:19,759
Speaker 4: Very true. This is a sort of related question, but just going

796
00:36:19,800 --> 00:36:22,600
Speaker 4: back to the methodology, if you're focused on this sort

797
00:36:22,600 --> 00:36:26,000
Speaker 4: of like path dependent idea, I'm kind of envisioning it

798
00:36:26,040 --> 00:36:29,279
Speaker 4: as like a giant decision tree, right, is there a

799
00:36:29,320 --> 00:36:32,839
Speaker 4: possibility that as the models get better and better, and

800
00:36:32,880 --> 00:36:35,839
Speaker 4: we know that they're already injecting like some degree of

801
00:36:36,120 --> 00:36:39,800
Speaker 4: randomness into their output. Although I know there's going to

802
00:36:39,800 --> 00:36:42,000
Speaker 4: be a pedant out there who like messages me and

803
00:36:42,040 --> 00:36:44,880
Speaker 4: says like, well, you know computers can't do like true randomness.

804
00:36:44,880 --> 00:36:49,480
Speaker 4: But setting that aside, setting that aside, like, we know

805
00:36:49,560 --> 00:36:53,640
Speaker 4: that they're adjusting, they're becoming more sophisticated at an incredible rate.

806
00:36:53,719 --> 00:36:57,480
Speaker 4: We know that they're trying to adjust and inject some

807
00:36:57,719 --> 00:37:01,000
Speaker 4: randomness in order to avoid exactly this kind of detection.

808
00:37:01,880 --> 00:37:05,160
Speaker 4: Do you worry about their own adaptation at all?

809
00:37:05,480 --> 00:37:08,600
Speaker 7: I have noticed that the models as they get more capable,

810
00:37:08,880 --> 00:37:12,279
Speaker 7: I believe that their output distribution gets more complex. It's

811
00:37:12,320 --> 00:37:14,920
Speaker 7: harder to learn with a simple model, which is why

812
00:37:14,960 --> 00:37:18,560
Speaker 7: we've been increasing our model size to capture a higher

813
00:37:18,600 --> 00:37:22,319
Speaker 7: complexity function that can capture the LLM outputs. So I

814
00:37:22,320 --> 00:37:25,719
Speaker 7: think we may have to continue to make our models better.

815
00:37:25,960 --> 00:37:27,359
Speaker 7: We're gonna have to work to keep up with it.

816
00:37:27,719 --> 00:37:29,400
Speaker 7: We can't just rest on our laurels.

817
00:37:29,560 --> 00:37:31,399
Speaker 3: What are burstiness and perplexity?

818
00:37:31,760 --> 00:37:34,799
Speaker 7: Yeah, so this is a metric that's used by some

819
00:37:34,920 --> 00:37:37,960
Speaker 7: AI detectors, but not Pangram. Okay. And so I can

820
00:37:38,000 --> 00:37:41,319
Speaker 7: explain a bit about how it works. So perplexity is

821
00:37:41,480 --> 00:37:42,799
Speaker 7: basically a measure of... This

822
00:37:42,800 --> 00:37:45,040
Speaker 2: Is not perplexity dot AI the website. This is a

823
00:37:45,080 --> 00:37:45,680
Speaker 2: technical term.

824
00:37:45,719 --> 00:37:48,640
Speaker 7: Okay, this is a metric. This is a measure of

825
00:37:48,719 --> 00:37:52,760
Speaker 7: how confusing a piece of text is to a language model.

826
00:37:53,320 --> 00:37:58,080
Speaker 7: So basically, if, for example, with every token, we can

827
00:37:58,120 --> 00:38:00,800
Speaker 7: calculate some perplexity, which is basically like how expected is

828
00:38:00,840 --> 00:38:03,600
Speaker 7: this. So for example, like, if it's I went

829
00:38:03,640 --> 00:38:06,560
Speaker 7: home to my pet and then the next token is chinchilla,

830
00:38:06,840 --> 00:38:09,000
Speaker 7: that'd be a much higher perplexity token.

831
00:38:08,960 --> 00:38:09,880
Speaker 5: Than my pet dog.
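
[Editor's sketch] The per-token idea described here can be shown with a toy example. This is only a minimal sketch under stated assumptions: it is not Pangram's method (the guest says Pangram does not use perplexity), and it is not any particular detector's implementation. The table `NEXT_TOKEN_PROBS` and its numbers are invented for illustration; a real system would read these probabilities from a language model.

```python
import math

# Hypothetical next-token probabilities after the context "I went home to my pet".
# These numbers are made up for illustration; a real language model would supply them.
NEXT_TOKEN_PROBS = {"dog": 0.55, "cat": 0.40, "chinchilla": 0.001}


def surprisal(prob: float) -> float:
    """Negative log-probability of one token, in nats. Higher means more surprising."""
    return -math.log(prob)


def perplexity(token_probs: list[float]) -> float:
    """Exponential of the average surprisal over a sequence of token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)


# The unlikely continuation carries far more surprisal than the likely one.
dog_surprisal = surprisal(NEXT_TOKEN_PROBS["dog"])
chinchilla_surprisal = surprisal(NEXT_TOKEN_PROBS["chinchilla"])
```

Under these assumed numbers, "chinchilla" scores as much more surprising than "dog", which is the sense in which it would be the higher-perplexity token.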

832
00:38:10,600 --> 00:38:16,000
Speaker 7: So low perplexity text, or really, like, LLM outputs tend

833
00:38:16,000 --> 00:38:19,040
Speaker 7: to be low perplexity. They're not going to produce outputs

834
00:38:19,080 --> 00:38:22,960
Speaker 7: that are surprising to themselves. So this is a decent

835
00:38:23,000 --> 00:38:26,160
Speaker 7: way to get an AI detector that's around ninety to

836
00:38:26,239 --> 00:38:30,000
Speaker 7: ninety five percent accurate. But it has some problems. The

837
00:38:30,000 --> 00:38:33,920
Speaker 7: main one is that you can't improve upon it. Basically

838
00:38:34,160 --> 00:38:38,160
Speaker 7: it has false positives. Text written by non native English

839
00:38:38,160 --> 00:38:41,440
Speaker 7: speakers often is low perplexity just because when you're learning, you

840
00:38:41,440 --> 00:38:42,880
Speaker 3: Don't take as many risks. Exactly.

841
00:38:43,000 --> 00:38:46,400
Speaker 7: Yeah, interesting, Yeah, So that's why a lot of the

842
00:38:46,440 --> 00:38:49,440
Speaker 7: early AI detectors had a bunch of false positives. With

843
00:38:49,800 --> 00:38:53,640
Speaker 7: ESL speakers. It's because their text was low perplexity. So

844
00:38:54,080 --> 00:38:56,600
Speaker 7: I think, like, this is a very cool metric, but

845
00:38:56,800 --> 00:38:59,120
Speaker 7: it is not the path for pangram.

846
00:38:59,120 --> 00:39:01,520
Speaker 5: Instead, we went the deep learning approach, so we can do

847
00:39:01,600 --> 00:39:02,120
Speaker 5: better than.

848
00:39:02,040 --> 00:39:04,359
Speaker 3: And what's burstiness? Is that just the opposite side

849
00:39:04,360 --> 00:39:04,759
Speaker 3: of the coin.

850
00:39:05,239 --> 00:39:09,040
Speaker 7: Yeah, burstiness is basically, actually, yeah, I don't know if

851
00:39:09,040 --> 00:39:09,600
Speaker 7: I can define it.

852
00:39:09,719 --> 00:39:13,319
Speaker 4: Okay, fine. Burstiness just sounds like one of those like

853
00:39:13,560 --> 00:39:16,960
Speaker 4: sort of I guess manosphere terms, doesn't it like, oh,

854
00:39:17,040 --> 00:39:17,520
Speaker 4: yeah he.

855
00:39:17,480 --> 00:39:20,320
Speaker 6: Has like he's been looksmaxing with high burst nets or

856
00:39:20,360 --> 00:39:20,759
Speaker 6: something like that.

857
00:39:21,440 --> 00:39:22,200
Speaker 3: Yeah, that's great.

858
00:39:22,239 --> 00:39:24,080
Speaker 7: Yeah, I think it might just be like a measure

859
00:39:24,160 --> 00:39:27,840
Speaker 7: of, like, sentence length and, like, the ups and downs of

860
00:39:27,880 --> 00:39:28,320
Speaker 7: the text.
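
[Editor's sketch] Taking the guest's tentative description at face value, burstiness can be approximated as how much sentence length varies across a text. This is an assumption made for illustration, not a definition the guest commits to and not a metric Pangram uses. The function names and the choice of coefficient of variation (standard deviation divided by mean) are hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length; 0.0 means perfectly uniform."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Text whose sentences are all the same length scores 0.0 on this measure, while text that mixes short and long sentences scores higher.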

861
00:39:28,960 --> 00:39:32,279
Speaker 4: If we assume that the world is collectively concerned about

862
00:39:32,280 --> 00:39:34,960
Speaker 4: AI slop and wants to do something about it, what

863
00:39:35,000 --> 00:39:39,120
Speaker 4: would be like the single biggest change to the system,

864
00:39:39,480 --> 00:39:42,080
Speaker 4: either in terms of like the economics of the internet

865
00:39:42,160 --> 00:39:46,120
Speaker 4: or regulation or technology like what you're developing that would

866
00:39:46,160 --> 00:39:48,160
Speaker 4: actually help reduce slop.

867
00:39:48,440 --> 00:39:51,080
Speaker 7: Yeah, I think the biggest one is norms. So there

868
00:39:51,080 --> 00:39:53,400
Speaker 7: have been a couple of great blog posts written about

869
00:39:53,440 --> 00:39:58,120
Speaker 7: how it is rude to send other people undisclosed AI outputs,

870
00:39:58,719 --> 00:40:02,359
Speaker 7: and I think I like completely agree here. I think,

871
00:40:02,480 --> 00:40:04,239
Speaker 7: you know, if somebody like asks the question on the

872
00:40:04,239 --> 00:40:06,759
Speaker 7: Internet and then somebody else like goes and puts into

873
00:40:06,800 --> 00:40:08,960
Speaker 7: ChatGPT and then, like, pastes the answer, it's kind

874
00:40:08,960 --> 00:40:10,560
Speaker 7: of rude. Like, I was going here to ask

875
00:40:10,800 --> 00:40:13,879
Speaker 7: the opinions of my friends or you know, my followers, not.

876
00:40:14,080 --> 00:40:16,520
Speaker 5: Just like, not ChatGPT. I could have done that myself.

877
00:40:16,840 --> 00:40:19,640
Speaker 7: And so I think, like building this norm is something

878
00:40:19,680 --> 00:40:22,120
Speaker 7: that you know, it's very new technology, so we need

879
00:40:22,160 --> 00:40:23,040
Speaker 7: to do it quickly.

880
00:40:23,080 --> 00:40:25,760
Speaker 5: But I think this would help a lot for society.

881
00:40:25,800 --> 00:40:27,880
Speaker 2: Well, that actually just gets to a question that I

882
00:40:27,920 --> 00:40:30,680
Speaker 2: have then, which is I feel as though the major

883
00:40:30,719 --> 00:40:34,560
Speaker 2: Internet platforms are actually moving the exact opposite direction. I mean,

884
00:40:34,560 --> 00:40:38,320
Speaker 2: I'm stunned. Maybe I accidentally clicked on something at some point,

885
00:40:38,600 --> 00:40:41,520
Speaker 2: but the frequency with which I get an email and then

886
00:40:41,560 --> 00:40:43,759
Speaker 2: I open it up to respond in Gmail, and there's

887
00:40:43,800 --> 00:40:47,000
Speaker 2: that ghost text there that, do you just want Gemini

888
00:40:47,040 --> 00:40:48,279
Speaker 2: to respond to this?

889
00:40:48,640 --> 00:40:49,680
Speaker 3: I've never done.

890
00:40:49,480 --> 00:40:52,040
Speaker 2: That, I also consider, I think that would be extremely rude.

891
00:40:52,040 --> 00:40:56,719
Speaker 2: I've never responded to any email with an AI response. But

892
00:40:56,760 --> 00:40:59,239
Speaker 2: they're basically telling you to do that. They're doing the

893
00:40:59,239 --> 00:41:01,720
Speaker 2: exact opposite, blowing up these norms. And so I'm curious,

894
00:41:01,719 --> 00:41:04,680
Speaker 2: from your perspective, you managed to work with Quora. But

895
00:41:04,920 --> 00:41:09,400
Speaker 2: from your impression, do the major internet platforms think this

896
00:41:09,560 --> 00:41:12,279
Speaker 2: is a problem worth solving, or from their consideration

897
00:41:12,280 --> 00:41:14,320
Speaker 2: is it like, you know what, yeah, the more content

898
00:41:14,400 --> 00:41:14,759
Speaker 2: the better?

899
00:41:14,840 --> 00:41:17,800
Speaker 4: There's mixed incentives for the big company.

900
00:41:17,800 --> 00:41:20,360
Speaker 7: It's funny because like Google seems to be playing both sides.

901
00:41:20,640 --> 00:41:23,680
Speaker 7: So like, on one hand, they had that advertisement which

902
00:41:23,680 --> 00:41:25,680
Speaker 7: people kind of blew up about where it's like, oh,

903
00:41:25,800 --> 00:41:29,480
Speaker 7: children can now send their heroes notes on like how

904
00:41:29,560 --> 00:41:31,799
Speaker 7: much they respect them by using AI instead of like

905
00:41:32,040 --> 00:41:34,160
Speaker 7: writing the note themselves, and like this is wrong, This

906
00:41:34,239 --> 00:41:37,560
Speaker 7: is like societally bad. But at the same time, they're

907
00:41:37,600 --> 00:41:40,799
Speaker 7: working very hard to deal with the AI slop on

908
00:41:40,880 --> 00:41:43,520
Speaker 7: the Internet in search results to make sure people get

909
00:41:43,560 --> 00:41:45,040
Speaker 7: served real content and not.

910
00:41:45,000 --> 00:41:45,960
Speaker 5: AI slop content.

911
00:41:46,640 --> 00:41:49,279
Speaker 7: So I think, I mean, I think obviously there's a

912
00:41:49,320 --> 00:41:51,640
Speaker 7: lot of incentives at play around, like, product people

913
00:41:51,680 --> 00:41:55,000
Speaker 7: who are incentivized to push AI because that is the

914
00:41:55,040 --> 00:41:59,359
Speaker 7: corporate mandate. But yeah, I think overall, even like in

915
00:41:59,400 --> 00:42:02,000
Speaker 7: my sphere, a bunch of people who are AI researchers,

916
00:42:02,640 --> 00:42:06,520
Speaker 7: the general consensus is that, like, AI is a powerful tool,

917
00:42:06,560 --> 00:42:07,600
Speaker 7: but like slop is bad.

918
00:42:07,880 --> 00:42:10,840
Speaker 4: This reminds me my parents used to make me do

919
00:42:10,960 --> 00:42:15,080
Speaker 4: these like handmade greeting cards for every you know, for Christmas,

920
00:42:15,120 --> 00:42:17,160
Speaker 4: for like all relatives and stuff. And it was supposed

921
00:42:17,200 --> 00:42:22,319
Speaker 4: to be a demonstration of my commitment to communicating with family. No, no,

922
00:42:22,400 --> 00:42:25,799
Speaker 4: it traumatized me forever. And I hate greeting cards as

923
00:42:25,840 --> 00:42:28,680
Speaker 4: a result of doing this, just spending hours

924
00:42:28,800 --> 00:42:31,840
Speaker 4: manufacturing these things. But then, secondly, the funniest thing was

925
00:42:31,920 --> 00:42:36,040
Speaker 4: once we got E cards, my parents immediately switched to

926
00:42:36,200 --> 00:42:40,080
Speaker 4: using e cards and just and now this is also

927
00:42:40,120 --> 00:42:40,879
Speaker 4: the funniest thing.

928
00:42:41,080 --> 00:42:42,359
Speaker 6: My dad uses E card.

929
00:42:42,400 --> 00:42:44,480
Speaker 4: He figured out that the E card system can tell

930
00:42:44,560 --> 00:42:46,680
Speaker 4: him whether or not you opened it, so he just

931
00:42:46,800 --> 00:42:48,680
Speaker 4: uses it as like day to day communication.

932
00:42:48,880 --> 00:42:51,840
Speaker 5: Now that's so funny.

933
00:42:51,880 --> 00:42:54,839
Speaker 3: Just send an email to your daughter E card.

934
00:42:55,120 --> 00:42:56,840
Speaker 4: It's like, I noticed you haven't opened up my E

935
00:42:57,000 --> 00:43:01,640
Speaker 4: card for International Hot Dog Day. Please let me know

936
00:43:01,920 --> 00:43:02,560
Speaker 4: what's going on.

937
00:43:02,640 --> 00:43:05,640
Speaker 2: I had terrible handwriting as a kid, and my mother made

938
00:43:05,640 --> 00:43:08,480
Speaker 2: me write all of these handwritten notes to thank people

939
00:43:08,520 --> 00:43:09,440
Speaker 2: for the gifts I got for.

940
00:43:09,480 --> 00:43:10,400
Speaker 3: My bar mitzvah.

941
00:43:10,480 --> 00:43:12,839
Speaker 2: Yeah, I hated it, but you know what, I have

942
00:43:12,960 --> 00:43:14,359
Speaker 2: deep connections with all of

943
00:43:14,320 --> 00:43:16,360
Speaker 3: Those people that have lasted over the years.

944
00:43:16,760 --> 00:43:19,400
Speaker 2: In that miserable one week where I just wrote and

945
00:43:19,440 --> 00:43:21,600
Speaker 2: I got, you know, hand cramps, I think it

946
00:43:21,560 --> 00:43:22,520
Speaker 3: Paid off, all right.

947
00:43:22,520 --> 00:43:27,400
Speaker 4: Well, imagine doing that for like sixteen years basically in

948
00:43:27,400 --> 00:43:28,960
Speaker 4: a never ending stream.

949
00:43:29,000 --> 00:43:31,360
Speaker 3: Max Spero, thank you so much for coming on Odd Lots.

950
00:43:31,400 --> 00:43:33,600
Speaker 3: That was a lot of fun. I'm fascinated by this conversation.

951
00:43:33,800 --> 00:43:35,759
Speaker 7: Thanks so much for having me. Yeah, really exciting to

952
00:43:35,800 --> 00:43:38,480
Speaker 7: talk about this. And I think slop is a growing problem,

953
00:43:38,520 --> 00:43:40,160
Speaker 7: so hopefully awesome RAPK deal with it.

954
00:43:41,120 --> 00:43:42,200
Speaker 6: Of the internet, I.

955
00:43:42,200 --> 00:43:44,040
Speaker 4: Can't tell if I'm surprised by that oring on.

956
00:43:44,280 --> 00:43:45,960
Speaker 3: And what's it going to be next year at this time?

957
00:43:46,280 --> 00:43:47,399
Speaker 5: Oh man, I don't know.

958
00:43:47,760 --> 00:43:49,800
Speaker 3: It'll be like hard to stay over with Georgian that

959
00:43:49,880 --> 00:43:50,280
Speaker 3: for sure.

960
00:43:50,719 --> 00:43:52,120
Speaker 5: Yeah, almost certainly crazy.

961
00:43:52,400 --> 00:43:53,480
Speaker 3: All right, thanks for coming on.

962
00:43:53,440 --> 00:44:02,560
Speaker 5: Odd Lots. Thanks.

963
00:44:07,440 --> 00:44:08,920
Speaker 3: Tracy. I love that conversation.

964
00:44:09,000 --> 00:44:10,799
Speaker 2: I just think it's like a really fun puzzle, right,

965
00:44:11,719 --> 00:44:15,840
Speaker 2: It's very like it seems like a fun question to solve,

966
00:44:15,920 --> 00:44:19,520
Speaker 2: And I'm fascinated by this idea of how like with

967
00:44:19,719 --> 00:44:24,239
Speaker 2: both humans and AI, there's gonna be this gap inevitable

968
00:44:24,480 --> 00:44:27,319
Speaker 2: between what we know and what we can articulate because

969
00:44:27,360 --> 00:44:29,479
Speaker 2: you and I both, setting aside AI versus text,

970
00:44:29,640 --> 00:44:32,160
Speaker 2: there are things that we both know. For example, this

971
00:44:32,200 --> 00:44:34,760
Speaker 2: is newsworthy, and this is a good episode

972
00:44:34,800 --> 00:44:37,880
Speaker 2: of a podcast, This is a credible sounding guest, and

973
00:44:37,920 --> 00:44:41,040
Speaker 2: this isn't. The gap between that and then being able

974
00:44:41,080 --> 00:44:43,520
Speaker 2: to explain why, it's like, well, you just sort of

975
00:44:43,600 --> 00:44:45,560
Speaker 2: know it, right, You just sort of have this feeling there,

976
00:44:46,560 --> 00:44:50,479
Speaker 2: and that intuition is built up from numerous examples, which

977
00:44:50,520 --> 00:44:52,200
Speaker 2: is the same way in a sense that like the

978
00:44:52,239 --> 00:44:53,240
Speaker 2: AI is trained.

979
00:44:53,320 --> 00:44:54,360
Speaker 3: It's like these.

980
00:44:54,239 --> 00:44:56,760
Speaker 2: Things that you only know from patterns and you can

981
00:44:57,160 --> 00:45:00,520
Speaker 2: see them without fully being able to, like, articulate exactly

982
00:45:00,560 --> 00:45:01,160
Speaker 2: what's going on.

983
00:45:01,280 --> 00:45:02,360
Speaker 6: Well, the other.

984
00:45:02,239 --> 00:45:05,239
Speaker 4: Question I would have on that is is it even

985
00:45:05,280 --> 00:45:07,680
Speaker 4: going to matter in the long run if you think about,

986
00:45:07,719 --> 00:45:10,960
Speaker 4: like so much of the Internet is already built on

987
00:45:11,040 --> 00:45:14,120
Speaker 4: bots and the sort of like false attention economy, Like

988
00:45:14,800 --> 00:45:21,680
Speaker 4: if our entire like worldview becomes shaped by AI driven drivel, yeah,

989
00:45:22,560 --> 00:45:25,440
Speaker 4: does it matter if like the economics of the Internet

990
00:45:25,560 --> 00:45:28,759
Speaker 4: are still attached to individual bot accounts and things like that.

991
00:45:28,760 --> 00:45:31,640
Speaker 6: I don't know if I'm explaining this, but.

992
00:45:31,760 --> 00:45:33,040
Speaker 2: No, no, I think it makes a lot of sense,

993
00:45:33,080 --> 00:45:36,160
Speaker 2: and I do think like it is important, like we're.

994
00:45:36,040 --> 00:45:37,799
Speaker 3: Going to have to change the entire way with them.

995
00:45:38,000 --> 00:45:40,399
Speaker 2: And Max said at the beginning, which is, and I've

996
00:45:40,400 --> 00:45:42,759
Speaker 2: thought about this, which is that it used to be

997
00:45:43,120 --> 00:45:45,000
Speaker 2: that if you came across a piece of writing and

998
00:45:45,040 --> 00:45:49,120
Speaker 2: the punctuation was excellent and the spelling was excellent, and

999
00:45:49,160 --> 00:45:51,680
Speaker 2: it was like cogent sounding, you're like, okay, this has

1000
00:45:51,680 --> 00:45:55,239
Speaker 2: been written by a smart person. I will read this seriously, right?

1001
00:45:55,880 --> 00:45:59,160
Speaker 2: And now there is this complete severance of sort of

1002
00:45:59,200 --> 00:46:02,440
Speaker 2: like craft and output, because you could, and you

1003
00:46:02,520 --> 00:46:05,759
Speaker 2: did this, Like, ask Claude to write an argument in

1004
00:46:05,800 --> 00:46:10,239
Speaker 2: favor of the most absurd proposition imaginable. Ask Claude to

1005
00:46:11,000 --> 00:46:15,520
Speaker 2: write an argument for me that the reason why Reagan

1006
00:46:15,560 --> 00:46:18,200
Speaker 2: wanted to do tax cuts in the early nineteen eighties

1007
00:46:18,680 --> 00:46:22,200
Speaker 2: related to these reports of UFO sightings in the nineteen seventies,

1008
00:46:22,600 --> 00:46:25,719
Speaker 2: and it will write something that not only is it

1009
00:46:25,760 --> 00:46:28,319
Speaker 2: grammatically correct, it'll actually like strain to come up with

1010
00:46:28,360 --> 00:46:31,000
Speaker 2: the best version of this argument before and again if

1011
00:46:31,080 --> 00:46:33,560
Speaker 2: prior to that, having read and like, maybe the person

1012
00:46:34,160 --> 00:46:37,000
Speaker 2: like this person took this argument seriously, but now this

1013
00:46:37,080 --> 00:46:39,880
Speaker 2: argument is just created ex nihilo. We're going to

1014
00:46:39,960 --> 00:46:42,440
Speaker 2: have to really like change our heuristics about this stuff.

1015
00:46:42,480 --> 00:46:46,600
Speaker 4: We've created an unlimited stream of basically cranks with

1016
00:46:46,680 --> 00:46:47,480
Speaker 4: really good grammar.

1017
00:46:47,640 --> 00:46:50,520
Speaker 2: Yeah, that's right, that's right, because it used to be

1018
00:46:50,560 --> 00:46:52,360
Speaker 2: we knew the crank because they had bad grammar, or

1019
00:46:52,360 --> 00:46:55,239
Speaker 2: they would email us and like half the words would

1020
00:46:55,239 --> 00:46:57,520
Speaker 2: be in yellow and the other half would be underlined green.

1021
00:46:57,560 --> 00:47:01,279
Speaker 4: Inlastic exams, the tools that we use to just like, oh,

1022
00:47:01,280 --> 00:47:03,920
Speaker 4: this person's a crank, they like, you know, half the

1023
00:47:04,000 --> 00:47:05,799
Speaker 4: words are in all caps and stuff like that.

1024
00:47:06,200 --> 00:47:07,280
Speaker 3: Those don't work anymore.

1025
00:47:07,320 --> 00:47:09,440
Speaker 4: All right, on that note, shall we leave it there?

1026
00:47:09,520 --> 00:47:10,160
Speaker 3: Let's leave it there.

1027
00:47:10,320 --> 00:47:12,799
Speaker 4: This has been another episode of the Odd Lots podcast. I'm

1028
00:47:12,840 --> 00:47:15,600
Speaker 4: Tracy Alloway. You can follow me at Tracy Alloway.

1029
00:47:15,320 --> 00:47:18,080
Speaker 2: And I'm Joe Weisenthal. You can follow me at The Stalwart.

1030
00:47:18,400 --> 00:47:22,480
Speaker 2: Follow our guest Max Spiro. He's at Max Underscore Spiro Underscore.

1031
00:47:22,719 --> 00:47:25,960
Speaker 2: Follow our producers Carmen Rodriguez at Carmen Arman, Dashiell

1032
00:47:26,040 --> 00:47:29,120
Speaker 2: Bennett at Dashbot, and Kail Brooks at Kail Brooks. And for

1033
00:47:29,239 --> 00:47:32,359
Speaker 2: more Odd Lots content, go to Bloomberg dot com slash odd lots,

1034
00:47:32,360 --> 00:47:34,799
Speaker 2: where we have a daily newsletter and all of our episodes, and

1035
00:47:34,840 --> 00:47:36,640
Speaker 2: you can chat about all of these topics twenty four

1036
00:47:36,680 --> 00:47:40,279
Speaker 2: seven in our Discord, discord dot gg slash odd

1037
00:47:40,280 --> 00:47:41,000
Speaker 2: lots.

1038
00:47:41,080 --> 00:47:43,279
Speaker 4: And if you enjoy Odd Lots, if you like it when we

1039
00:47:43,320 --> 00:47:46,120
Speaker 4: talk about how the Internet is forty percent slop, then

1040
00:47:46,160 --> 00:47:49,280
Speaker 4: please leave us a positive review on your favorite podcast platform.

1041
00:47:49,520 --> 00:47:51,880
Speaker 4: And remember, if you are a Bloomberg subscriber, you can

1042
00:47:51,920 --> 00:47:55,000
Speaker 4: listen to all of our episodes absolutely ad free. All

1043
00:47:55,000 --> 00:47:56,920
Speaker 4: you need to do is find the Bloomberg channel on

1044
00:47:57,000 --> 00:47:59,239
Speaker 4: Apple Podcasts and follow the instructions there.

1045
00:47:59,640 --> 00:48:00,480
Speaker 6: Thanks for listening.