Speaker 1: Pushkin. Earlier this year, an employee working in Hong Kong for an international company got a weird message from one of his colleagues. He was supposed to make a secret transfer of millions of dollars. It seems sketchy. It obviously seems sketchy, so he got on a video call with a bunch of people, including the company's CFO, the chief financial officer. The CFO said the request was legit, so the employee did what he was told. He transferred roughly twenty-five million dollars to several bank accounts. As it turned out, the CFO on the video call was not really the CFO. It was a deepfake, an AI-generated twin created from publicly available audio and video of the real CFO. By the time the company figured out what was going on, it was too late. The money was gone.

Speaker 1: I'm Jacob Goldstein and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Ali Shahriyari. He's the co-founder and chief technology officer at the audaciously named Reality Defender. Ali's problem is this: how can you use AI to protect the world from AI? More specifically, how do you build a set of models to spot the difference between reality and AI-generated deepfakes? How'd you get into the defending reality business?

Speaker 2: Yeah, so when I started, it was around actually generating videos and deepfakes.

Speaker 1: So you were attacking reality before you were defending it.

Speaker 2: I wouldn't say we were attacking anything, but we were definitely looking into this technology. And this is way back, before all this stuff kind of went crazy. This is back in like twenty nineteen, around that time. So we were building digital twins, and we were looking at, how do you make it so that it looks realistic? Is it a cartoon-looking thing? Is it like a Unity 3D thing?
Speaker 2: And then that's when we started to see these early research papers where they were taking someone's face and putting it on a video and blending it in, and it looked really good, and we were like, oh, maybe we can do the digital twins that way. And while we were in that business, we were like, you know, probably in a few years someone can download an app and just make anything very easily. And that's kind of the origins of how we started. We're very mission driven. What we're trying to do here is really protect the world and people from the dangers of AI, but in a way where, you know, we want people not to abuse the technology. We love AI, we just don't want it to be abused.

Speaker 1: So let's talk about this sort of deepfake detection, gen AI detection market more generally. Who's selling deepfake detection right now, and who's buying? What does the sort of market landscape look like?

Speaker 2: The type of clients that we have right now are banks. For example, we are currently live with one of the largest banks in the world. When you call that bank, the audio goes through our deepfake detection models and we're able to tell the call center this person might be a deepfake. And part of that is, that's actually happened. Someone's called the bank and they've transferred money out. And actually, this goes back to twenty nineteen, so the first incident of deepfake fraud...

Speaker 1: Twenty nineteen, that we're aware of. Right, right, exactly. So what happened in twenty nineteen?
Speaker 2: Yeah, so this is back when this was early and nobody really knew about this. And there was a CEO that called a smaller company. It was a parent company calling the child company, the CEO calling the other CEO, and he wanted to transfer some money out, and it sounded like him, and the guy transferred, I think it was in the UK, about two hundred to three hundred thousand dollars out. And that was like one of the first ones that we know of.

Speaker 1: And they got away with it, I believe.

Speaker 2: I believe so, yeah.

Speaker 1: And there was an instance earlier this year, right, where, I think it was in Hong Kong, some employee was on a Zoom call with the company's CFO, and the CFO was like, you know, wire twenty-five million dollars or something to some bank account. And then the employee did it, and it turned out the CFO on the call was a deepfake, right?

Speaker 2: Yeah.

Speaker 1: Were they your client?

Speaker 2: They were not our clients, unfortunately. But this shows how quickly the technology is evolving. You know, twenty nineteen, audio. Fast forward a few years, now you've got a Zoom call, there's a bunch of people on it, and they all look like people you know, and they're all deepfakes.

Speaker 1: So you were starting to mention banks are some of your main clients. Who are some of your other main clients?

Speaker 2: Media companies. Think of some of the big ones; they're using our product this year, especially with the election. You know, back in twenty twenty, we thought it would be a problem. It wasn't. This year we think it is a big problem, for sure. I think we were early, but it's already happening everywhere, even this year. This year is the largest election year in the world, more than fifty percent of the people are voting, and we already have documented cases of election issues with deepfakes.

Speaker 1: Okay. Media companies, banks. Any other kind of big categories of clients?
Speaker 2: Yeah, so other ones are government agencies. But in the end, we believe everyone needs this product. It shouldn't be up to the people to decide or figure out if something's a deepfake. If you're on a social media platform, you shouldn't have to figure out, hey, is this person real or not. It should just be built in, and anyone should be able to use it.

Speaker 1: Well, are social media companies either buying or building deepfake detection tools, or do they want to stay out of that business and be like, no, we don't want to be in the business of saying yes, this is real, no, this isn't real?

Speaker 2: I can tell you we've been in contact and have talked to some social media platforms. I think one issue is they don't have to flag these things. It's up to them, right; there's not a lot of regulation. So I know they're thinking about it. We've chatted with some, but that's the extent of it.

Speaker 1: So okay. So let's talk about how it works, and there's two ways that I want to talk about how it works. One is from the point of view of the user, whoever that may be, and the other is sort of what's going on under the hood, right? So let's start with the point of view of the user. If I'm a whatever, a bank, a university, a media company who is paying for your service, how does it work for me?

Speaker 2: Depends on exactly the user and the use case. If, let's say, it's a media company, they're looking at maybe filtering through a lot of content, so content moderation. Actually, that would be more like a social media company: they're looking at content moderation. Maybe they're looking at millions of assets and they want to quickly flag those things, if they were in that business. The bank, for the example I gave, the issue is someone could call in and biometrics fail. By the way, if you call a bank, some banks say, repeat after me: my voice is my passport. That actually fails now, what do you think? So a bank wants to make sure the person calling in is actually that person. This is more relevant to private banking, where there's actually a one-on-one relationship between the client and the bank.
Speaker 1: And so in that case, so let's take that case. So in that case, someone calls in and talks to their banker. They're a rich person who has a private banker, basically, is what you're talking about, right? So this rich person calls in and talks to their private banker. Is the system just always running in the background in that case? And, like, how does it work from the point of view of the private banker?

Speaker 2: Sure, and I have to be careful what I say here, but the high level is, the models are listening, and if they detect a potential deepfake, they will alert the call center. That person will get a notification, so it's integrated into their existing workflow. They'll get a notification that says, hey, this...

Speaker 1: Does the person get, like, a text or a Slack or something they're using, saying, you're talking to a deepfake?

Speaker 2: No, they're using software for the bank. They're still using their software, and there's a dashboard. In that scenario, they escalate. So they might say, let me ask you some more questions, or let me call you back.

Speaker 1: Huh. Let me call you back is a super safe one, right? Because if they have a relationship, probably they know the number. They just call them back.

Speaker 2: Yeah, absolutely.

Speaker 1: Okay. And then how does it work for, like, when you say... I presume, by the way, that you can't name your clients. You said a media company and a bank. It's secret who they are?

Speaker 2: Yeah, we're not allowed to.

Speaker 1: Okay. So let's say a media company. How's it work for a media company?

Speaker 2: Their use case is slightly different, especially right now, as I mentioned, around the election. So there might be something that's starting to go viral in the news, and they want to check, hey, is this real or not? I will say, with something like this, usually when something goes viral the damage is already done.
Speaker 1: Yes. Although if you're, whatever, the New York Times or the Wall Street Journal, you don't want to repeat the viral lie. Part of your business model is people are paying to subscribe to you because you are more reliable.

Speaker 2: Right, exactly. So that's why they come to us. They upload the assets, and our web app returns the results.

Speaker 1: I see. So it's just like, you just go to whatever, Reality Defender dot whatever, and you upload the viral video, and your machine says it's a fake.

Speaker 2: Yeah. So we give results and probabilities; we don't have the ground truth, so we give a probability. There's several different models running, so we use an ensemble of models. We have different models looking at different things, and we give an overall score averaging those. In the case of a video, we actually highlight the areas of a deepfake. If the person is speaking and they're a fake, there'll be a red box around them. If they're real, there'll be a green box around them.

Speaker 1: Well, that latter part sounds more binary, as opposed to probabilistic.

Speaker 2: We give both. So yeah, there's a probability score and there's the visual.

Speaker 1: And so the probabilistic score is basically, according to our model, there's a seventy percent chance that this is fake, something of that nature.

Speaker 2: According to our ensemble of models.

Speaker 1: Yes, yeah. Our model of models, our fund of funds of models.

Speaker 2: Exactly.
Speaker 1: So okay, so you're actually leading us toward what's under the hood, right? I'm interested in discussing this on a few levels. There is the sort of broad, beyond-Reality-Defender level: you know, what are the basic ways that the technology works? Like, how does deepfake detection, gen AI detection, work in a broad way? Can you talk me through that?

Speaker 2: Absolutely, yeah. There's currently two ways people are looking at this problem. Number one is provenance. For example, you watermark media that you create. Maybe you watermark it or you digitally sign it, maybe you put it on a blockchain somewhere, or something like that. But basically there's a source of truth that this video is real, and there's a watermark. That's number one.

Speaker 1: But we're concerned, we're concerned with instances where that is not the case, right? Our world is full of videos today that are not clearly watermarked, blockchained, whatever, for provenance. So we have this problem. What are the ways people are solving it?

Speaker 2: Yeah, the second way is how we're solving it, which is basically, we use AI to detect AI, which we call inference. So we train AI models, as I mentioned, a bunch of them, to look at various aspects of, let's say, video.

Speaker 1: So, like, is a sort of generative adversarial network the right term? I mean, it seems like, if I were making up how to do this, I'd be like, well, I'm gonna have one model that's cranking out really good deepfakes, but I'll know which ones are the deepfakes, and then I'm gonna feed the deepfakes and the real ones to my other model and score it on how well it does, and it'll get really good at figuring out the difference.

Speaker 2: Yeah, that's actually exactly how a lot of these work. For example, there's a website you can go to where it just generates a person every time you go to it, a GAN, right, and that's actually using a GAN to generate that person. So, the way we detect, and I can give a little bit more detail here: for example, one of our models, which we actually removed, was looking at blood flow. So yeah, imagine, in this video, if lighting and conditions are right, we can actually detect the heartbeat and the blood flow in the veins, the way we're looking at each other.

Speaker 1: As I'm looking at myself, weirdly, today, maybe because it's hot or because of the lighting here, I can actually see a vein bulging on my forehead. So, like, you're saying an AI could measure my pulse from that, or something.

Speaker 2: In the right conditions. Now, that model has a lot of limitations, and you need to have the right... it basically has a lot of bias, right? So we tossed that.

Speaker 1: Wait, you're saying it didn't work. You're saying it didn't work.

Speaker 2: It worked in the right conditions and with the right skin tone; otherwise it was biased. So this was experimental and we tossed it.
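A toy version of the blood-flow idea just described (the model Reality Defender says it ultimately discarded): remote photoplethysmography estimates a pulse from tiny periodic color changes in facial skin. This sketch assumes the mean green-channel value of a face region has already been extracted for every frame; a real system would also need face tracking, motion compensation, and a way to handle the lighting and skin-tone limitations mentioned in the conversation.

```python
import numpy as np

def estimate_pulse_bpm(green_means, fps):
    """Toy pulse estimate from per-frame mean green intensity of a face patch.

    green_means: 1-D array, one value per video frame (assumed pre-extracted).
    fps: frame rate. Returns the dominant frequency in a heart-rate band, in BPM.
    """
    signal = np.asarray(green_means, dtype=float)
    signal = signal - signal.mean()                     # drop the DC component
    spectrum = np.abs(np.fft.rfft(signal))              # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)   # frequencies in Hz
    band = (freqs >= 0.7) & (freqs <= 3.0)              # roughly 42 to 180 BPM
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz

# Synthetic check: a 1.2 Hz (72 BPM) pulse buried in noise, 30 fps, 10-second clip.
t = np.arange(300) / 30.0
fake_green = 0.5 * np.sin(2 * np.pi * 1.2 * t) + np.random.normal(0, 0.3, t.size)
print(estimate_pulse_bpm(fake_green, fps=30))  # ~72
```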
Speaker 1: A lot of things didn't work. So you tried it, and in a broad way it didn't work. It worked in narrow conditions, but you need things that work more broadly. What's another thing you tried that didn't work?

Speaker 2: Well, I can tell you, every month we may be throwing away models.

Speaker 1: Well, presumably there's things that work for a while and then they don't, right? It's kind of like antibiotics versus bacteria, right, like your adversaries are getting better every day.

Speaker 2: Basically, what we like to say is we're like an antivirus company. So every month there's a new generative technique. Maybe we detect it, but maybe it's something we don't anticipate and we don't detect, and so we have to make sure we quickly update our models. And then a model that worked last year, it's completely irrelevant now.

Speaker 1: So what else, like, what else is happening technologically on the reality defense side, on the detection side?

Speaker 2: Okay, so we have a few different products. One is, as I mentioned, real-time audio, scanning and listening for telephone calls. The other one is a place where a journalist or any user can go and upload not just videos; we also detect images, we also detect audio, we also detect text, like ChatGPT. And these tools also explain to a user why something is a deepfake. We don't just give a score. For an image, we might put a heat map and say, these are the areas that set the model off. For text, we might highlight areas and say, these are the areas that appear to be generated.

Speaker 1: There's a case study you have about a university that is a client of yours that, among other things, uses your service to tell when students are turning in papers written by ChatGPT, basically, as I read it. Right? Like, I just assume that everybody writes papers with ChatGPT now and there's nothing anybody can do about it. But is that not true? Like, if I have GPT write my paper and then I change a few words, does that sort of let me sail past your defense?

Speaker 2: It depends, depends how much you change. If you change, like, over fifty percent, maybe it would. So it depends.

Speaker 1: Over fifty percent is more than a few words. And so can you talk, I mean, I know you can't name the university, but do you know how they use it in practice? So, you know, some professor runs the students' papers through your software and it says, for one student, there's a whatever, sixty percent chance that this was created using a large language model. I mean, do you know, in practice... obviously the professor could do whatever they want, or the university could have whatever policy, but do you know, in practice, what do they do with this information? Like, that's in a way a harder one to figure out than the banker who's like, oh, it might be a deepfake on the phone, I'll call you right back for security. Like, if my... I don't have a banker, but if I had a banker and they did that, I'd be like, oh, that's cool, I'm glad my bank is doing this thing. Whereas with the professor and the student, that's a much more sort of fraught situation, right, and harder to think of how to deal with, again, the probabilistic nature of the output of the model.

Speaker 2: Yes. I think a couple more things here. First of all, I think even universities are trying to figure out this problem. How do you solve it, you know. But the second thing to note: most of our users are not interested in a text detector. That seems to be a much smaller market. The biggest one is actually audio. Imagine you get a call from a loved one saying, send me money, and you send money, and then you realize it's not who you thought, it was a deepfake, right? That's actually a much more widely used system.

Speaker 1: That's the big one in terms of the business. It's interesting. I mean, I wonder if that's partly, like, relative; we think about the video more. But is it partly because deepfake audio is now quite good, and there are lots of instances where people will transfer lots of money based solely on audio?

Speaker 2: Deepfake audio is the best, and it's getting better, right? It used to be that to make your voice, maybe I needed a minute. Now I need just a few seconds and I can make your voice. It's getting exponentially better. All of them are, but audio is definitely top of the list right now.

Speaker 1: Huh. And how are you keeping up?

Speaker 2: Yeah, I mean, when we detect audio, it's tricky. There's a lot of factors to think about: a person's accent, right; is the model biased, does it not understand, or is there an issue where it detects one person with a certain type of accent always as a deepfake? There's also issues of, like, noise. When there's a lot of background noise, the model could be impacted. When there's crosstalk, multiple people speaking at the same time, that could impact the model. So there's a variety of factors. And the other thing to think about is, our models support multiple languages; we don't just do English. And so all of these kind of make it very complicated. So when we detect something, there's what's called pre-processing: a whole bunch of steps applied to the audio before it actually goes to our AI models, where we have to clean up the audio and do certain types of transformations before we push it to the models.

Speaker 1: And is that happening in real time with these companies? Huh. And what is the frontier of pre-processing? Like, is it an efficiency and speed problem, because you're trying to do it in real time, and so you're just trying to make the sort of algorithmic part of it as fast and efficient as possible?

Speaker 2: Yeah, I mean, this is a challenge, and there's a lot to be done, so that's ongoing research: how do we continue to speed up not just the pre-processing, but the inference. And there's a variety of... one thing that's called a foundation model, I'm not sure if you've heard what those are, but these are extremely large pre-trained models. GPT is a foundation model; it is a pre-trained model. And these models can be useful in some parts of the pre-processing, where they can quickly extract certain features for us, and then we can use those further down the pipeline.

Speaker 1: Still to come on the show: the problems that Ali is trying to solve now.

Speaker 1: How good are you at detecting deepfakes? Can you quantify how good you are?

Speaker 2: So the way they usually do this is they look at benchmarks, right? There's public data sets which we can take and run, and we're in the nineties. But, you know, that's not the real world.

Speaker 1: When you say you're in the nineties, you mean, in a binary sense, you guess correctly ninety percent of the time?

Speaker 2: Yeah, so on a public benchmark, we're in the nineties. There's accuracy, precision, and recall. Accuracy is, how accurate are we: let's say the sample set is one hundred, maybe fifty are fake, fifty are real, right? The accuracy is, you take, okay, how many of those did you get right, how many of the real and the fake, divided by the total. Right, that's the accuracy. The problem with that is an unbalanced data set: maybe only two are fake and the other ninety-eight are real. So in that case, if we had said, okay, everything is real, we would be at ninety-eight percent, right? That's not very useful, because you missed the deepfakes. So that's why precision and recall come in. They look specifically at how you did on that specific class, the fakes or the reals. So there's more than just accuracy; there's also other factors to look at.
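A small worked version of the imbalance point just made: on a set of 98 real and 2 fake samples, a detector that calls everything real scores 98 percent accuracy but catches zero fakes, which is exactly what precision and recall on the fake class expose.

```python
def accuracy_precision_recall(y_true, y_pred, positive="fake"):
    """Basic metrics for a fake/real classifier, treating 'fake' as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged items, how many were fake
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual fakes, how many were caught
    return accuracy, precision, recall

# The unbalanced example from the conversation: 98 real samples, 2 fakes.
truth = ["real"] * 98 + ["fake"] * 2
lazy_detector = ["real"] * 100                      # a detector that calls everything real
print(accuracy_precision_recall(truth, lazy_detector))  # (0.98, 0.0, 0.0)
```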
Speaker 1: So it's kind of like the false positive, false negative challenge with medical tests, right? You want a test that both says you have the disease when you have the disease, and also says you don't have the disease when you don't have the disease. And that actually ends up being a really complicated problem given the nature of baselines, right? Like, in your universe, certainly in the universe of people calling their banker, almost everybody calling their banker is a real person, right? But there are these very high-stakes, presumably very rare cases where it is a deepfake. And so that's a complicated problem.

Speaker 2: It actually is, it absolutely is. And it's something, as we work with each customer, we have to tweak those. Some want higher false positives, some higher false negatives. It depends on each use case. In the case of a bank, they want to be a bit more cautious. But that also causes, it could cause, a lot of pain, depending on the volume.

Speaker 1: Right. Because if with every client it's like, oh, sorry, I've got to call you back to make sure you're not a deepfake, like, that's not great.

Speaker 2: Yeah. And if you have thousands of calls a day, and even one percent is a false positive or negative, that creates a lot of work, yeah, because it adds up.

Speaker 1: How do you solve that? What do you do about that?

Speaker 2: So the way it works is, it's all about adjusting. You can think of thresholds, right? We can adjust a variety of parameters on the output of a model, not just the model itself. For example, in audio, as we speak, you know, we could look at, okay, how long do you want to listen before you give an answer? And the longer you listen, the more confident...

Speaker 1: That's smart. That makes sense, right, because it's essentially more data for the model.

Speaker 2: Exactly, yeah.

Speaker 1: What are you trying to figure out now? Like, what is the frontier?

Speaker 2: What's really the latest now, and it's just amazing how quickly it's going, is videos. So the videos that we detect are, like, a face swap: you're sitting there speaking and another person's face is on there. That's a face swap. But now you can generate an entire video completely from scratch; you just type in the description and the video comes out. You can take... I can take your voice, a few seconds of your voice, and I can then have you say anything I want. Which, you can clearly see, a bad person can misuse these tools. So the latest is, these things are getting really good.

Speaker 1: And over time, like, with those videos, how is your reliability and accuracy changing? Are you getting better or worse or staying the same as the technology to create the deepfakes improves?

Speaker 2: So what's interesting is, it has slowed down in terms of, like, the signatures. Like, we don't need as much data as we used to. So of course there's still a lot of work and we're never going to stop, but it is stabilizing a little bit.

Speaker 1: When you say "it," what is stabilizing a little bit?

Speaker 2: So, like, the deepfake signatures are stabilizing.

Speaker 1: The signatures, meaning the giveaways, the things that I can't see, but that your models can see.

Speaker 2: Exactly. So our models, going back and giving a bit more detail: they're looking at different attributes of a piece of media, and they pull out those attributes and then send those to our in-house neural networks that study those attributes.

Speaker 1: Like, one that you have mentioned, that the company has mentioned publicly, is the sync of audio and video, right? Maybe that's one where it's gotten better and it doesn't matter anymore, but from what I understand, from what I've read, there was at least a time when the sync of the audio and video tended to be off in deepfake videos, right? Is that an example of a signature?

Speaker 2: So the way that works is, we train the model. We say, hey, here's a bunch of people speaking, here's what they look like, look at the sync. Here's a bunch of people that are deepfakes, look at the sync. And we tune the model so we can tell the difference. That's also happening with video, by the way. If you look at Sora and some of these new models, where someone's walking, for example, their legs are not, like, you know, they're not really smooth, or they don't look right. So you can look at that as well. That's the temporal dynamics, we call it.

Speaker 1: Temporal dynamics is basically, are things proceeding in time in a natural way?

Speaker 2: Exactly, how things change over time.
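A toy illustration of the lip-sync signature mentioned here: given a per-frame "mouth openness" signal from a face tracker and the audio loudness over the same frames (both assumed to be already extracted), a genuine recording tends to show strong correlation between the two at a small lag, while a badly synced or fully generated clip often does not. This is a generic sketch, not Reality Defender's model.

```python
import numpy as np

def sync_score(mouth_openness, audio_energy, max_lag=5):
    """Best normalized cross-correlation between mouth motion and audio loudness
    within +/- max_lag frames. Values near 1.0 suggest plausible sync."""
    m = np.asarray(mouth_openness, dtype=float)
    a = np.asarray(audio_energy, dtype=float)
    m = (m - m.mean()) / (m.std() + 1e-9)
    a = (a - a.mean()) / (a.std() + 1e-9)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.mean(m[lag:] * a[:len(a) - lag])
        else:
            c = np.mean(m[:lag] * a[-lag:])
        best = max(best, float(c))
    return best

# Synthetic check: mouth motion that follows the audio with a 2-frame delay (plausible sync)
# versus mouth motion that has nothing to do with the audio.
rng = np.random.default_rng(0)
audio = np.abs(rng.normal(size=200))
good_mouth = np.roll(audio, 2) + rng.normal(0, 0.1, 200)
bad_mouth = np.abs(rng.normal(size=200))
print(sync_score(good_mouth, audio), sync_score(bad_mouth, audio))  # roughly 0.99 vs 0.1
```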
Speaker 1: So, yeah, all of these seem like things that are going to be fleeting, right? Like, my baseline assumption is it'll all get solved. How long do you think you'll be able to defend reality for?

Speaker 2: You know, this question comes up all the time. There is always a giveaway, or there is always a new way to look at the problem. We're not just looking always at the raw pixels, right; we could look at different aspects. We could look at the frequency, for example. If you look at an image, you can actually break it down into frequencies.

Speaker 1: When you say frequency, what do you mean? When you say you can look at the frequency, what does that mean?

Speaker 2: So, for example, okay, let's go with audio. You know, you can use something called Fourier transforms to actually break up audio into individual wavelengths, sines and cosines, and look at those. You can do the same with an image, for example; you can break that up.

Speaker 1: Like the analogy of a waveform of audio.

Speaker 2: Yeah, it can, it can be translated into a bunch of waves. So there's multiple things that we look at, and with the AI there's always a giveaway. And again, we're also thinking outside the box, right, like the blood flow, for example. But there's other kinds of similar things we could think about.

Speaker 1: I mean, presumably, you know, Renaissance Technologies, the James Simons fund, is one of the first quant hedge funds, and they made tons of money for a long time. They wildly outperformed the market. Clearly they had a technological advantage. And the thing Simons said, the founder, this math guy, about that company: one of the things he said was, like, we actually don't want to hire finance people who have some story about why a stock is going to outperform, because if there's a story about it, then somebody else is going to know it already, right? Their thing was just, like, we just give the model all the data and let the model find these weird-ass patterns that no human even understands. But they work more often than they don't work, and we make tons of money. And I would think that would be the case for you, to some extent: that if you could think of a thing like monitoring blood flow or whatever, then the bad guys, or whatever, the people who want to make realistic gen AI, would also think of it. And the real kind of secret sauce would be in weird correlations that the model finds that we wouldn't even understand.

Speaker 2: Exactly. I mean, that is oftentimes what the model is training on. And the way it determines it: you think it's looking at certain features, but it is something that we don't even tell it, right? It determines it on its own.

Speaker 1: Like, that's the beauty of this kind of new era of, whatever, neural networks, machine learning, right? It's just, you throw everything at it and let the machine figure it out.

Speaker 2: We like to say we throw the kitchen sink at it sometimes.

Speaker 1: Yes, yes. I mean, and so when you were talking before about explainability, right, about sort of saying in your output, here's why we think it's fake: I feel like that kind of throw-everything-at-it-and-let-the-machine-figure-it-out makes it hard. Like, sometimes you don't know, right? It's just, like, well, the machine is very smart and it says this is probably fake. Like, is that a tension?

Speaker 2: That can happen. So you'll look at it: we'll show you an image and it'll say the model was looking at certain areas. And by the way, this also helps us with debugging and bias. Maybe it was, for some reason, looking at an area of the face where we couldn't tell why that would set off the model. And so in those scenarios we also investigate, like, why was this area flagged? And it could be one hundred percent correct; it's just we do have to examine it further.
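A minimal sketch of one common way to produce the kind of heat map described above: occlusion saliency. Hide one patch of the image at a time, re-score the occluded image with the detector, and record how much the fake score moves; the regions that move it most are "the areas that set the model off." The score_fake function here is a hypothetical stand-in for a real detector, not Reality Defender's method.

```python
import numpy as np

def occlusion_heatmap(image, score_fake, patch=16, stride=16, fill=0.0):
    """Occlusion saliency: how much does hiding each region change the fake score?

    image: float array of shape (H, W, C); score_fake: callable returning P(fake).
    Returns an (H, W) map where larger values mean the region mattered more."""
    h, w = image.shape[:2]
    base = score_fake(image)
    heat = np.zeros((h, w))
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill     # hide this region
            heat[y:y + patch, x:x + patch] = abs(base - score_fake(occluded))
    return heat

# Hypothetical detector: pretends the top-left corner is what looks "fake".
def toy_score_fake(img):
    return float(img[:32, :32].mean())

img = np.random.default_rng(1).random((64, 64, 3))
heat = occlusion_heatmap(img, toy_score_fake)
print(heat.shape)  # (64, 64); only the top-left cells end up with nonzero heat
```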
Speaker 1: Could you create a deepfake that would fool your deepfake detector?

Speaker 2: Yes.

Speaker 1: Haha. Well, if you could do it, somebody else could do it, don't you think?

Speaker 2: I could do it because I have access to a lot more knowledge, right? Like, you know, if I was running an antivirus company, I could probably write a virus, if I knew exactly what... we're constantly actually trying to do that, by the way.

Speaker 1: Yeah. I mean, in a sense, that's the whole adversarial network thing, right? Like, I guess you have to do that for your detection models, your suite of models, to get better, right?

Speaker 2: Yeah. So we have what's called red teaming, both black box and with understanding of the code. So we're trying to break the models. That's part of what we do.
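A very simplified picture of the black-box side of that red teaming: take clips you already know are fake, apply the kinds of perturbations an attacker might use (added noise, gain changes, trimming), and measure how often the detector still flags them. Everything below, the toy_detector, the perturbations, and the clip, is a hypothetical stand-in, not Reality Defender's tooling.

```python
import numpy as np

def add_noise(x, rng):    # crude stand-in for re-encoding artifacts
    return x + rng.normal(0, 0.01, x.shape)

def change_gain(x, rng):  # attacker turns the volume up or down
    return x * rng.uniform(0.5, 1.5)

def trim_edges(x, rng):   # attacker clips the start and end of the clip
    cut = int(rng.integers(1, len(x) // 10))
    return x[cut:-cut]

def red_team(detector, known_fakes, perturbations, trials=20, seed=0):
    """Black-box probe: fraction of perturbed known-fake clips the detector still flags."""
    rng = np.random.default_rng(seed)
    caught = total = 0
    for clip in known_fakes:
        for perturb in perturbations:
            for _ in range(trials):
                caught += detector(perturb(clip, rng)) >= 0.5  # still flagged as fake?
                total += 1
    return caught / total

# Hypothetical detector: flags clips whose sample-to-sample variation looks too smooth.
def toy_detector(clip):
    return 1.0 if np.abs(np.diff(clip)).mean() < 0.05 else 0.0

fakes = [np.sin(np.linspace(0, 40, 4000))]  # a synthetic, suspiciously smooth "voice"
print(red_team(toy_detector, fakes, [add_noise, change_gain, trim_edges]))  # e.g. 1.0
```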
Every cyber security company does 605 00:31:44,756 --> 00:31:49,396 Speaker 2: its best to defend, right, but we do not promise 606 00:31:49,476 --> 00:31:53,316 Speaker 2: one hundred percent. Our models are always a probability. 607 00:31:54,076 --> 00:31:57,436 Speaker 1: Who's the best at making deep fakes that you're 608 00:31:57,476 --> 00:31:57,996 Speaker 1: aware of? 609 00:31:58,716 --> 00:32:01,836 Speaker 2: There's a few, right, there's like Sora from OpenAI. There's Runway, 610 00:32:01,996 --> 00:32:03,676 Speaker 2: there's Synthesia. 611 00:32:03,636 --> 00:32:06,116 Speaker 1: You better be able to catch those, right, anything I've heard of. 612 00:32:06,236 --> 00:32:08,796 Speaker 1: You better be really good at the technique. Presumably it's 613 00:32:08,836 --> 00:32:13,116 Speaker 1: like some, like, you know, Russian genius squad or, I 614 00:32:13,116 --> 00:32:15,196 Speaker 1: don't know, the North Koreans or something. I would 615 00:32:15,236 --> 00:32:17,396 Speaker 1: imagine it is some state-funded actor, but... 616 00:32:17,556 --> 00:32:20,076 Speaker 2: I would actually say, you know, we're in 617 00:32:20,076 --> 00:32:23,356 Speaker 2: a place where this problem is getting bigger. 618 00:32:23,516 --> 00:32:25,436 Speaker 2: But we're in a place where a lot of the 619 00:32:25,516 --> 00:32:28,676 Speaker 2: deep fakes coming out are actually from entertainment and they're not, 620 00:32:28,996 --> 00:32:31,316 Speaker 2: like, used for evil. You know, you've seen the famous 621 00:32:31,356 --> 00:32:36,196 Speaker 2: Tom Cruise or other actors running around doing things, 622 00:32:36,196 --> 00:32:38,196 Speaker 2: and those are deep fakes, right, those are actually pretty good. 623 00:32:38,236 --> 00:32:39,916 Speaker 2: We detect them, but they're actually very good. 624 00:32:40,596 --> 00:32:42,556 Speaker 1: What are you thinking about in the context of 625 00:32:42,756 --> 00:32:44,436 Speaker 1: the election in the US this year, and do 626 00:32:44,476 --> 00:32:48,436 Speaker 1: you have particular clients who are especially focused on election 627 00:32:48,556 --> 00:32:50,436 Speaker 1: related deep fakes? 628 00:32:51,436 --> 00:32:56,556 Speaker 2: Yeah, the media companies are the main ones, and we're ready. 629 00:32:56,996 --> 00:33:02,356 Speaker 2: We detect the best, the best deep fakes, right, everything that's 630 00:33:02,396 --> 00:33:05,516 Speaker 2: coming out we detect. So we're ready and we want 631 00:33:05,596 --> 00:33:09,716 Speaker 2: to make sure we're there as one avenue of people 632 00:33:10,396 --> 00:33:13,596 Speaker 2: verifying content. I believe late last year there was an 633 00:33:13,636 --> 00:33:17,876 Speaker 2: election in Slovakia where there was an audio of one 634 00:33:17,916 --> 00:33:21,796 Speaker 2: of the candidates saying he's gonna double the price of beer. Yeah, 635 00:33:21,836 --> 00:33:25,756 Speaker 2: and that actually was a deep fake. It was caught, but 636 00:33:25,876 --> 00:33:28,556 Speaker 2: it kind of caused some damage. So it's starting to 637 00:33:28,596 --> 00:33:28,996 Speaker 2: happen now. 638 00:33:29,076 --> 00:33:31,356 Speaker 1: It's an awesomely stupid deep fake. I mean, to me, 639 00:33:31,596 --> 00:33:35,516 Speaker 1: the real risk of deep fakes is not people believing 640 00:33:35,556 --> 00:33:39,636 Speaker 1: something that's false. It's people ceasing to believe anything, right.
641 00:33:39,716 --> 00:33:43,116 Speaker 1: It's just saying, oh, that's probably just a deep fake, 642 00:33:43,196 --> 00:33:45,716 Speaker 1: right? Like that, actually, to me seems like the bigger 643 00:33:45,796 --> 00:33:49,236 Speaker 1: risk: nothing is true anymore. Nobody cares about the 644 00:33:49,276 --> 00:33:49,996 Speaker 1: truth anymore. 645 00:33:50,556 --> 00:33:56,316 Speaker 2: That's definitely a problem as well. Now we're seeing people saying, oh, 646 00:33:56,396 --> 00:33:58,996 Speaker 2: this is a deep fake. That's actually happened. There's a few. 647 00:33:59,996 --> 00:34:03,156 Speaker 2: I believe it was a Kate Middleton video, if I'm correct, 648 00:34:03,156 --> 00:34:04,876 Speaker 2: that was earlier this year, where everyone thought that was 649 00:34:04,876 --> 00:34:08,316 Speaker 2: a deep fake and it wasn't. So this kind of problem 650 00:34:08,356 --> 00:34:09,356 Speaker 2: is happening. 651 00:34:08,956 --> 00:34:12,476 Speaker 1: Like, that's because people want to believe things that 652 00:34:12,516 --> 00:34:14,836 Speaker 1: are consistent with their prior beliefs, and they don't want 653 00:34:14,876 --> 00:34:18,356 Speaker 1: to believe things that call their prior beliefs into question, right, 654 00:34:18,436 --> 00:34:20,796 Speaker 1: and so deep fakes in a way are an easy 655 00:34:20,836 --> 00:34:23,676 Speaker 1: out, where if you see something you like, you assume 656 00:34:23,676 --> 00:34:25,476 Speaker 1: it's true. If you see something you don't like, you 657 00:34:25,516 --> 00:34:27,716 Speaker 1: assume it's not true, or you assume everything's just kind 658 00:34:27,716 --> 00:34:29,756 Speaker 1: of bullshit. Like, that to me seems like a big 659 00:34:29,836 --> 00:34:32,476 Speaker 1: kind of societal level risk of deep fakes. 660 00:34:32,636 --> 00:34:36,276 Speaker 2: We'll never fix that. That's something that we'll never solve. Yeah, 661 00:34:36,836 --> 00:34:40,076 Speaker 2: people have their own beliefs. You can show them anything, 662 00:34:40,236 --> 00:34:44,156 Speaker 2: the facts, math, that's not going to fix it at all. Yeah. 663 00:34:44,196 --> 00:34:46,476 Speaker 1: No, I guess that's a human nature problem, if not 664 00:34:46,556 --> 00:34:52,636 Speaker 1: an AI problem. We'll be back in a minute with 665 00:34:52,676 --> 00:35:08,556 Speaker 1: the lightning round. Okay, let's close with a lightning round. 666 00:35:09,196 --> 00:35:09,516 Speaker 2: Okay. 667 00:35:10,236 --> 00:35:13,396 Speaker 1: How often do people applying to work at Reality Defender 668 00:35:13,756 --> 00:35:16,076 Speaker 1: use generative AI to write cover letters? 669 00:35:16,476 --> 00:35:19,116 Speaker 2: Oh, that's a good one. Not a lot, but 670 00:35:19,116 --> 00:35:21,396 Speaker 2: we've seen it for sure. I would say maybe about 671 00:35:21,676 --> 00:35:22,356 Speaker 2: three percent. 672 00:35:22,836 --> 00:35:27,316 Speaker 1: Okay. If I want to use generative AI to write 673 00:35:27,316 --> 00:35:29,756 Speaker 1: a cover letter to apply to work at Reality Defender, 674 00:35:29,836 --> 00:35:32,116 Speaker 1: but I don't want to get caught, what should I do? 675 00:35:33,156 --> 00:35:35,276 Speaker 2: Change about seventy five percent 676 00:35:34,996 --> 00:35:39,356 Speaker 1: of the words? Okay, who is Gabe Reagan? 677 00:35:41,476 --> 00:35:45,316 Speaker 2: Gabe was, I think, our VP of public 678 00:35:45,596 --> 00:35:48,476 Speaker 2: relations or something like that. He's a deep fake.
We 679 00:35:48,516 --> 00:35:50,996 Speaker 2: created him as kind 680 00:35:51,036 --> 00:35:54,196 Speaker 2: of a fun joke. But obviously we tell everyone. 681 00:35:54,516 --> 00:35:56,636 Speaker 1: Tell me, tell me a little bit more about that. 682 00:35:57,756 --> 00:36:01,316 Speaker 2: If you go on certain websites where you put your 683 00:36:01,316 --> 00:36:05,436 Speaker 2: photo and maybe your job experience, there's quite a large 684 00:36:05,516 --> 00:36:10,036 Speaker 2: number of deep fake profiles on these websites, like LinkedIn. 685 00:36:11,276 --> 00:36:16,996 Speaker 1: Yes, huh. Why would people be doing that? 686 00:36:17,076 --> 00:36:18,276 Speaker 2: Sorry, scammers. 687 00:36:18,676 --> 00:36:20,276 Speaker 1: I'm trying to think, how do you get money out 688 00:36:20,276 --> 00:36:22,196 Speaker 1: of people by having a fake LinkedIn account? 689 00:36:22,316 --> 00:36:25,396 Speaker 2: Oh, I can tell you. Let's say, the 690 00:36:25,476 --> 00:36:28,956 Speaker 2: most popular one that I'm aware of is, like, cryptocurrency. 691 00:36:28,996 --> 00:36:31,116 Speaker 2: Maybe you create a coin and you're like, here's a 692 00:36:31,236 --> 00:36:33,956 Speaker 2: CEO and here's this person, and they have these great 693 00:36:33,956 --> 00:36:36,716 Speaker 2: LinkedIn profiles. Here's their photo, and they're not real, but 694 00:36:36,876 --> 00:36:38,516 Speaker 2: it sells a story. Right. 695 00:36:41,076 --> 00:36:45,356 Speaker 1: Is it right that you founded a clothing company? 696 00:36:46,116 --> 00:36:46,436 Speaker 2: I did. 697 00:36:46,516 --> 00:36:49,316 Speaker 1: Yes, what's one thing you learned about fashion from doing that? 698 00:36:50,876 --> 00:36:54,196 Speaker 2: It's much different than software development. 699 00:36:54,956 --> 00:36:57,196 Speaker 1: Sure, I don't think you needed to start a company 700 00:36:57,236 --> 00:37:00,756 Speaker 1: to learn that. I mean, the marginal cost is not 701 00:37:00,956 --> 00:37:01,956 Speaker 1: zero, for one thing. 702 00:37:02,556 --> 00:37:05,436 Speaker 2: Yeah, the software is easy, you write some... It's not 703 00:37:05,636 --> 00:37:08,116 Speaker 2: easy at all. But what I mean is you're writing 704 00:37:08,196 --> 00:37:11,636 Speaker 2: some code and you ship it. Versus in fashion, you 705 00:37:11,676 --> 00:37:13,396 Speaker 2: have to, like, you've got to source the fabric. 706 00:37:13,436 --> 00:37:16,356 Speaker 2: You gotta design it, you gotta make the patterns, 707 00:37:16,356 --> 00:37:18,156 Speaker 2: you gotta cut it, sew it, make sure it fits. 708 00:37:18,236 --> 00:37:19,076 Speaker 2: It's a lot more work. 709 00:37:22,196 --> 00:37:24,196 Speaker 1: What are the chances that we exist in a simulation? 710 00:37:25,836 --> 00:37:27,236 Speaker 2: You know, I used to think this was kind of 711 00:37:27,276 --> 00:37:30,396 Speaker 2: a joke, but I don't know. I'm seeing every 712 00:37:30,796 --> 00:37:34,716 Speaker 2: month it seems to get higher, from my perspective. 713 00:37:35,196 --> 00:37:36,076 Speaker 1: Why do you say that?
714 00:37:37,356 --> 00:37:40,156 Speaker 2: I'm seeing what's happening with tech and what we're building, 715 00:37:40,316 --> 00:37:43,076 Speaker 2: and you can see there was one paper 716 00:37:43,116 --> 00:37:46,036 Speaker 2: where they took a bunch of agents and they gave 717 00:37:46,076 --> 00:37:47,836 Speaker 2: them all a job and they started to do it, 718 00:37:47,876 --> 00:37:50,036 Speaker 2: and they just started to, like, create their own kind 719 00:37:50,116 --> 00:37:52,996 Speaker 2: of, like, workflows. Right, I don't know, it should 720 00:37:53,036 --> 00:37:53,556 Speaker 2: be getting there. 721 00:37:53,876 --> 00:37:56,236 Speaker 1: So it's like, well, if we can create a 722 00:37:56,316 --> 00:38:00,196 Speaker 1: simulation that seems like reality, maybe someone created a simulation 723 00:38:00,356 --> 00:38:05,156 Speaker 1: that is our reality. Exactly. Yeah. What do you wish 724 00:38:05,236 --> 00:38:06,756 Speaker 1: more people understood about AI? 725 00:38:07,716 --> 00:38:09,796 Speaker 2: I mean, it's a tool, and I don't think people 726 00:38:09,796 --> 00:38:12,556 Speaker 2: should be afraid of it. They should embrace it. And 727 00:38:12,996 --> 00:38:16,036 Speaker 2: you know, there's people just running away from it. 728 00:38:16,036 --> 00:38:20,676 Speaker 2: It's fantastic, it's great. Embrace it. Just be careful. One 729 00:38:20,676 --> 00:38:22,836 Speaker 2: thing I'd like to tell, like, my friends and family, 730 00:38:23,036 --> 00:38:25,476 Speaker 2: especially with the deep fake audio: have a safe word. 731 00:38:25,516 --> 00:38:28,716 Speaker 2: If somebody calls you and you're like, that's weird, you know, 732 00:38:29,236 --> 00:38:31,356 Speaker 2: call them back or ask for the safe word. 733 00:38:31,876 --> 00:38:39,236 Speaker 1: What do you wish more people understood about reality? 734 00:38:39,916 --> 00:38:42,876 Speaker 2: Reality? I would say, just be aware that you exist. And 735 00:38:43,036 --> 00:38:45,316 Speaker 2: every day's a gift, so you should be excited that 736 00:38:45,476 --> 00:38:48,276 Speaker 2: you're here. Like, the chances of you existing, it's like 737 00:38:48,596 --> 00:38:52,116 Speaker 2: you've won the lottery a million times, so every day's 738 00:38:52,116 --> 00:38:52,476 Speaker 2: a gift. 739 00:38:56,836 --> 00:39:00,996 Speaker 1: Ali Shakiyari is the co founder and CTO at Reality Defender. 740 00:39:01,956 --> 00:39:05,236 Speaker 1: Today's show was produced by Gabriel Hunter Chang. It was 741 00:39:05,476 --> 00:39:08,916 Speaker 1: edited by Lydia Jean Kott and engineered by Sarah Bruguer. 742 00:39:09,396 --> 00:39:13,036 Speaker 1: You can email us at problem at Pushkin dot fm. 743 00:39:13,116 --> 00:39:15,476 Speaker 1: I'm Jacob Goldstein and we'll be back next week with 744 00:39:15,516 --> 00:39:26,996 Speaker 1: another episode of What's Your Problem.