WEBVTT - Detecting Deepfakes With AI

0:00:15.356 --> 0:00:23.116
<v Speaker 1>Pushkin. Earlier this year, an employee working in Hong Kong

0:00:23.196 --> 0:00:26.516
<v Speaker 1>for an international company got a weird message from one

0:00:26.516 --> 0:00:29.516
<v Speaker 1>of his colleagues. He was supposed to make a secret

0:00:29.556 --> 0:00:34.236
<v Speaker 1>transfer of millions of dollars. It seems sketchy. It obviously

0:00:34.316 --> 0:00:36.956
<v Speaker 1>seems sketchy, so he got on a video call with

0:00:36.996 --> 0:00:40.516
<v Speaker 1>a bunch of people, including the company's CFO, the chief

0:00:40.516 --> 0:00:44.516
<v Speaker 1>financial officer. The CFO said the request was legit, so

0:00:44.716 --> 0:00:47.716
<v Speaker 1>the employee did what he was told. He transferred roughly

0:00:47.916 --> 0:00:52.076
<v Speaker 1>twenty five million dollars to several bank accounts. As it

0:00:52.116 --> 0:00:54.396
<v Speaker 1>turned out, the CFO on the video call was not

0:00:54.876 --> 0:00:58.796
<v Speaker 1>really the CFO. It was a deep fake, an AI

0:00:58.956 --> 0:01:03.116
<v Speaker 1>generated twin created from publicly available audio and video of

0:01:03.156 --> 0:01:06.756
<v Speaker 1>the real CFO. By the time the company figured out

0:01:06.756 --> 0:01:09.516
<v Speaker 1>what was going on, it was too late, the money

0:01:09.756 --> 0:01:18.316
<v Speaker 1>was gone. I'm Jacob Goldstein and this is What's Your Problem,

0:01:18.476 --> 0:01:20.236
<v Speaker 1>the show where I talk to people who are trying

0:01:20.276 --> 0:01:24.676
<v Speaker 1>to make technological progress. My guest today is Ali Shahriyari.

0:01:24.916 --> 0:01:28.156
<v Speaker 1>He's the co-founder and chief technology officer at the

0:01:28.236 --> 0:01:34.076
<v Speaker 1>audaciously named Reality Defender. Ali's problem is this, how can

0:01:34.116 --> 0:01:39.516
<v Speaker 1>you use AI to protect the world from AI? More specifically,

0:01:39.956 --> 0:01:41.996
<v Speaker 1>how do you build a set of models to spot

0:01:41.996 --> 0:01:48.076
<v Speaker 1>the difference between reality and AI generated deep fakes? How'd

0:01:48.076 --> 0:01:50.476
<v Speaker 1>you get into the defending reality business?

0:01:51.796 --> 0:01:57.556
<v Speaker 2>Yeah, so when I started, it was around actually generating

0:01:57.756 --> 0:02:00.236
<v Speaker 2>videos and deep fakes.

0:02:00.396 --> 0:02:03.036
<v Speaker 1>So you were attacking reality before you were defending it.

0:02:04.556 --> 0:02:06.796
<v Speaker 2>I wouldn't say we were attacking anything, but we were

0:02:06.796 --> 0:02:10.756
<v Speaker 2>definitely looking into this technology. And this is way

0:02:10.796 --> 0:02:14.036
<v Speaker 2>back before all this stuff kind of went crazy. This

0:02:14.076 --> 0:02:17.036
<v Speaker 2>is back in like twenty nineteen, around that time. So

0:02:17.076 --> 0:02:21.236
<v Speaker 2>we were building digital twins and we were looking at how

0:02:21.236 --> 0:02:23.476
<v Speaker 2>do you make it so that it looks realistic? Is

0:02:23.516 --> 0:02:27.116
<v Speaker 2>it a cartoon looking thing? Is it like a Unity

0:02:27.116 --> 0:02:29.596
<v Speaker 2>3D thing? And then that's when we started to

0:02:29.636 --> 0:02:32.796
<v Speaker 2>see like these early research papers where they were taking

0:02:32.836 --> 0:02:36.236
<v Speaker 2>like someone's face and putting it on a video and

0:02:36.316 --> 0:02:39.436
<v Speaker 2>blending it in and it looked really good, and we

0:02:39.436 --> 0:02:41.556
<v Speaker 2>were like, oh, maybe we can do the digital twins

0:02:41.916 --> 0:02:46.276
<v Speaker 2>that way. And while we were like in that business,

0:02:46.516 --> 0:02:49.116
<v Speaker 2>we were like, you know, probably in a few years

0:02:49.116 --> 0:02:53.836
<v Speaker 2>someone can download an app and just make anything very easily.

0:02:53.876 --> 0:02:57.356
<v Speaker 2>And that's kind of the origins of how we started.

0:02:57.996 --> 0:03:00.116
<v Speaker 2>We're very mission driven. What we're trying to do here

0:03:00.236 --> 0:03:05.356
<v Speaker 2>is really protect the world and people from the dangers

0:03:05.836 --> 0:03:08.716
<v Speaker 2>of AI, but in a way where you know, we

0:03:08.756 --> 0:03:11.996
<v Speaker 2>want people not to abuse the technology. We're very we

0:03:12.036 --> 0:03:15.116
<v Speaker 2>love AI, we just don't want it to be abused.

0:03:16.036 --> 0:03:19.876
<v Speaker 1>So let's talk about this sort of deep fake detection

0:03:20.516 --> 0:03:25.476
<v Speaker 1>kind of, you know, gen AI detection market more generally,

0:03:25.716 --> 0:03:30.796
<v Speaker 1>like, who's selling deep fake detection right now,

0:03:30.796 --> 0:03:32.676
<v Speaker 1>and who's buying? What's the sort of market

0:03:32.796 --> 0:03:33.676
<v Speaker 1>landscape look like?

0:03:34.636 --> 0:03:38.116
<v Speaker 2>The type of clients that we have right now are banks.

0:03:39.036 --> 0:03:42.316
<v Speaker 2>For example, we are currently live with one of the

0:03:42.436 --> 0:03:45.436
<v Speaker 2>largest banks in the world. When you call that bank,

0:03:46.436 --> 0:03:50.316
<v Speaker 2>the audio goes through our deep fake detection models and we're

0:03:50.356 --> 0:03:53.356
<v Speaker 2>able to tell the call center this person might be

0:03:53.396 --> 0:03:57.036
<v Speaker 2>a deep fake. And part of that is that's actually happened.

0:03:57.076 --> 0:04:01.676
<v Speaker 2>Someone's called the bank and they've transferred money out and

0:04:01.956 --> 0:04:05.316
<v Speaker 2>actually this goes back to twenty nineteen, so the

0:04:05.396 --> 0:04:09.596
<v Speaker 2>first incident of deep fake fraud actually happened back in

0:04:09.596 --> 0:04:13.316
<v Speaker 1>twenty nineteen. That we're aware of, right? You're right, exactly.

0:04:13.676 --> 0:04:15.556
<v Speaker 1>So what happened in twenty nineteen?

0:04:16.156 --> 0:04:18.836
<v Speaker 2>Yeah, so this is back where this is early and

0:04:18.876 --> 0:04:22.596
<v Speaker 2>nobody really knew about this, and there was a CEO

0:04:22.756 --> 0:04:26.756
<v Speaker 2>that called a smaller company. The CEO was a parent

0:04:26.796 --> 0:04:29.756
<v Speaker 2>company calling the child company. The CEO calling the other

0:04:29.796 --> 0:04:32.676
<v Speaker 2>CEO and he wanted to transfer some money out and

0:04:33.196 --> 0:04:36.196
<v Speaker 2>it sounded like him and the guy transferred I think

0:04:36.196 --> 0:04:37.916
<v Speaker 2>it was in the UK, about two hundred to three hundred

0:04:37.916 --> 0:04:39.636
<v Speaker 2>thousand dollars out. And that was like the first one,

0:04:39.636 --> 0:04:40.796
<v Speaker 2>one of the first ones that we

0:04:40.836 --> 0:04:44.036
<v Speaker 1>know of. And they got away with it? I believe

0:04:44.076 --> 0:04:44.676
<v Speaker 2>so. Yeah.

0:04:44.716 --> 0:04:46.876
<v Speaker 1>And there was an instance earlier this year right where

0:04:47.156 --> 0:04:49.636
<v Speaker 1>I think it was in Hong Kong and some employee

0:04:49.676 --> 0:04:51.836
<v Speaker 1>was on a Zoom call with the company's CFO and

0:04:51.876 --> 0:04:54.356
<v Speaker 1>the CFO was like, you know, wire twenty five

0:04:54.356 --> 0:04:57.036
<v Speaker 1>million dollars or something to some bank account. And then

0:04:57.076 --> 0:04:59.236
<v Speaker 1>the employee did it and it turned out the CFO

0:04:59.276 --> 0:05:00.836
<v Speaker 1>on the call was a deep fake, right.

0:05:01.076 --> 0:05:05.676
<v Speaker 2>Yeah, so fast. Were they your client? They were not

0:05:05.916 --> 0:05:09.516
<v Speaker 2>our clients, unfortunately. But this shows how quickly

0:05:09.556 --> 0:05:13.756
<v Speaker 2>the technology is evolving. You know, twenty nineteen, audio. Fast

0:05:13.756 --> 0:05:15.676
<v Speaker 2>forward a few years, now you've got a Zoom call.

0:05:15.996 --> 0:05:17.956
<v Speaker 2>There's a bunch of people on it and they all

0:05:17.996 --> 0:05:20.476
<v Speaker 2>look like people you know, and they're all deep fakes.

0:05:21.156 --> 0:05:23.276
<v Speaker 1>So you were starting to mention banks are some of

0:05:23.276 --> 0:05:25.196
<v Speaker 1>your main clients. Who are some of your other main

0:05:25.076 --> 0:05:28.756
<v Speaker 2>clients? Media companies. I think, think of some of the

0:05:28.756 --> 0:05:31.876
<v Speaker 2>big ones, they're using our product this year, especially with

0:05:31.996 --> 0:05:35.076
<v Speaker 2>the election. You know, back in twenty twenty, we thought it

0:05:35.076 --> 0:05:37.836
<v Speaker 2>would be a problem. It wasn't. This year we think

0:05:37.916 --> 0:05:40.116
<v Speaker 2>is a big problem, for sure. I think we were early,

0:05:40.636 --> 0:05:45.156
<v Speaker 2>but already this is happening everywhere, even this year.

0:05:45.756 --> 0:05:47.756
<v Speaker 2>This year is the largest election year in the world.

0:05:47.876 --> 0:05:49.996
<v Speaker 2>More than fifty percent of the people are voting, and

0:05:50.076 --> 0:05:54.756
<v Speaker 2>we already have documented cases of election issues with deep fakes.

0:05:55.556 --> 0:05:59.436
<v Speaker 1>Okay, media companies, banks, any other kind of big categories

0:05:59.436 --> 0:05:59.956
<v Speaker 1>of clients.

0:06:00.636 --> 0:06:06.916
<v Speaker 2>Yeah, so other ones are government agencies. But in the end,

0:06:07.276 --> 0:06:11.996
<v Speaker 2>everyone, we think, we believe everyone needs this product. It's not,

0:06:12.076 --> 0:06:14.356
<v Speaker 2>It shouldn't be up to the people to decide or

0:06:14.356 --> 0:06:16.676
<v Speaker 2>figure out if something's a deep fake. If you're on the

0:06:16.756 --> 0:06:19.876
<v Speaker 2>social media platform, you shouldn't have to figure out, hey,

0:06:19.916 --> 0:06:21.556
<v Speaker 2>is this person real or not. It should just be

0:06:21.596 --> 0:06:23.596
<v Speaker 2>built in and anyone should be able to use it.

0:06:24.156 --> 0:06:29.156
<v Speaker 1>Well, are social media companies either buying or building deep

0:06:29.196 --> 0:06:31.836
<v Speaker 1>fake detection tools or do they want to like stay

0:06:31.876 --> 0:06:33.436
<v Speaker 1>out of that business and be like no, we don't

0:06:33.436 --> 0:06:35.396
<v Speaker 1>want to be in the business of saying yes, this

0:06:35.516 --> 0:06:36.596
<v Speaker 1>is real, no, this isn't real.

0:06:37.356 --> 0:06:39.556
<v Speaker 2>I can tell you we've been in contact and have

0:06:39.676 --> 0:06:44.636
<v Speaker 2>talked to some social media platforms. I think one issue

0:06:44.676 --> 0:06:49.116
<v Speaker 2>is they don't have to flag these things. It's up

0:06:49.156 --> 0:06:53.076
<v Speaker 2>to them, right, there's not a lot of regulation, so

0:06:53.516 --> 0:06:55.436
<v Speaker 2>I know they're thinking about it. We've chatted with some,

0:06:56.076 --> 0:06:57.596
<v Speaker 2>but that's the extent of it.

0:06:58.316 --> 0:07:00.236
<v Speaker 1>So okay, So let's talk about how it works. And

0:07:00.276 --> 0:07:02.036
<v Speaker 1>there's two ways that I want to talk about how

0:07:02.036 --> 0:07:03.396
<v Speaker 1>it works. So one is from the point of view

0:07:03.436 --> 0:07:06.356
<v Speaker 1>of the user, whoever that may be, and then the

0:07:06.396 --> 0:07:08.476
<v Speaker 1>other is sort of what's going on under the hood. Right,

0:07:08.756 --> 0:07:11.316
<v Speaker 1>So let's start with the point of view of the user.

0:07:11.596 --> 0:07:15.156
<v Speaker 1>If I'm a whatever, a bank, university, a media company

0:07:15.196 --> 0:07:18.196
<v Speaker 1>who is paying for your service, how does it work

0:07:18.236 --> 0:07:18.436
<v Speaker 1>for me?

0:07:19.436 --> 0:07:22.436
<v Speaker 2>Depends on exactly the user and the use case. If

0:07:22.476 --> 0:07:26.076
<v Speaker 2>let's say it's a media company, Uh, they're looking at

0:07:26.196 --> 0:07:30.596
<v Speaker 2>maybe filtering through a lot of content, so content moderation.

0:07:30.916 --> 0:07:32.916
<v Speaker 2>Actually that would be like a social media company. They're

0:07:32.956 --> 0:07:36.516
<v Speaker 2>looking at content moderation. Maybe they're looking at

0:07:36.556 --> 0:07:39.716
<v Speaker 2>millions of assets and they want to quickly flag those

0:07:39.716 --> 0:07:42.396
<v Speaker 2>things if they were in that business. Uh, the bank

0:07:42.796 --> 0:07:46.196
<v Speaker 2>there, for the example I gave, the issue is someone could

0:07:46.236 --> 0:07:48.836
<v Speaker 2>call and biometrics fail. By the way, if you call

0:07:48.876 --> 0:07:52.116
<v Speaker 2>a bank, some banks say, repeat after me, my

0:07:52.196 --> 0:07:54.196
<v Speaker 2>voice is my passport. That actually fails now. What do

0:07:54.316 --> 0:07:57.436
<v Speaker 2>you think? So a bank wants to make sure the

0:07:57.476 --> 0:08:00.276
<v Speaker 2>person calling in is actually that person. This is more

0:08:00.476 --> 0:08:03.876
<v Speaker 2>relevant to private banking, where there's actually a

0:08:03.916 --> 0:08:06.996
<v Speaker 2>one on one relationship between the client and the bank.

0:08:07.236 --> 0:08:09.436
<v Speaker 1>And so in that case, So let's take that case.

0:08:09.476 --> 0:08:12.876
<v Speaker 1>So in that case, someone calls in and talks to

0:08:12.996 --> 0:08:15.716
<v Speaker 1>their banker. They're a rich person who has a private banker.

0:08:15.756 --> 0:08:18.316
<v Speaker 1>Basically that's what you're talking about, right? So this rich

0:08:18.356 --> 0:08:21.996
<v Speaker 1>person calls in and talks to their private banker, and

0:08:22.356 --> 0:08:25.436
<v Speaker 1>is the system just always running in the background

0:08:25.556 --> 0:08:28.156
<v Speaker 1>in that case? And like, how does it work from

0:08:28.156 --> 0:08:30.276
<v Speaker 1>the point of view of the private banker?

0:08:30.916 --> 0:08:33.556
<v Speaker 2>Sure, and I have to be careful what I say here,

0:08:33.836 --> 0:08:38.076
<v Speaker 2>But the high level is the models are listening and

0:08:38.116 --> 0:08:41.476
<v Speaker 2>if they detect a potential deep fake, the

0:08:41.636 --> 0:08:44.476
<v Speaker 2>call center person will get a notification. So it's

0:08:44.836 --> 0:08:48.716
<v Speaker 2>integrated into their existing workflow. They'll get a notification that says, hey, this

0:08:48.596 --> 0:08:51.116
<v Speaker 1>person... They get like a text or a Slack or something

0:08:51.316 --> 0:08:53.796
<v Speaker 1>they're using? You're talking to a deep fake?

0:08:54.596 --> 0:08:56.876
<v Speaker 2>No, they're using software. For the bank, they're

0:08:56.916 --> 0:09:00.516
<v Speaker 2>still using software, and there's a dashboard. In that scenario,

0:09:00.636 --> 0:09:03.436
<v Speaker 2>they escalate, so they might say, let me

0:09:03.436 --> 0:09:05.436
<v Speaker 2>ask you some more questions or let me call you back.

0:09:05.876 --> 0:09:07.756
<v Speaker 1>Huh. Let me call you back is a super safe one,

0:09:07.796 --> 0:09:09.676
<v Speaker 1>right because if they have a relationship, probably they know

0:09:09.716 --> 0:09:13.276
<v Speaker 1>the number. They just call them back. Yeah, absolutely, okay,

0:09:13.596 --> 0:09:16.236
<v Speaker 1>And then how does it work? How does it work

0:09:16.276 --> 0:09:18.796
<v Speaker 1>for like when you say, like I presume by the

0:09:18.836 --> 0:09:21.276
<v Speaker 1>way that you can't name your clients. You said a

0:09:21.316 --> 0:09:24.396
<v Speaker 1>media company and a bank. It's secret that they're

0:09:24.476 --> 0:09:26.036
<v Speaker 2>Yeah, we're not allowed to. Okay.

0:09:26.036 --> 0:09:28.196
<v Speaker 1>So let's say a media company. How's it work for

0:09:28.236 --> 0:09:30.036
<v Speaker 1>a media company?

0:09:29.916 --> 0:09:33.196
<v Speaker 2>Their use case is slightly different, especially right now,

0:09:33.236 --> 0:09:35.676
<v Speaker 2>as I mentioned, around the election. So there might

0:09:35.716 --> 0:09:38.076
<v Speaker 2>be something that's starting to go viral in the

0:09:38.156 --> 0:09:40.876
<v Speaker 2>news and they want to check, hey, is this

0:09:40.996 --> 0:09:43.916
<v Speaker 2>real or not? I would like to say like something

0:09:44.036 --> 0:09:47.556
<v Speaker 2>like this is usually when something goes viral, the damage

0:09:47.556 --> 0:09:48.276
<v Speaker 2>is already done.

0:09:48.996 --> 0:09:51.156
<v Speaker 1>Yes, although if you're whatever, the New York

0:09:51.196 --> 0:09:52.956
<v Speaker 1>Times or the Wall Street Journal, you don't want to

0:09:52.996 --> 0:09:55.796
<v Speaker 1>repeat the viral lie. Part of your business model is

0:09:55.876 --> 0:09:59.156
<v Speaker 1>people are paying to subscribe to you because you are

0:09:59.636 --> 0:10:00.916
<v Speaker 1>more reliable.

0:10:00.516 --> 0:10:02.716
<v Speaker 2>Right, exactly. So that's why they come to us. They

0:10:02.796 --> 0:10:05.796
<v Speaker 2>upload the assets and our web app returns the

0:10:05.876 --> 0:10:06.636
<v Speaker 2>results. I see.

0:10:06.676 --> 0:10:09.036
<v Speaker 1>So it's just like you just go to whatever, Reality

0:10:09.676 --> 0:10:13.156
<v Speaker 1>Defender dot whatever and you upload the viral video and

0:10:13.396 --> 0:10:15.956
<v Speaker 1>your machine says it's a fake.

0:10:16.676 --> 0:10:19.516
<v Speaker 2>Yeah, so we give results and probabilities. We don't

0:10:19.516 --> 0:10:22.756
<v Speaker 2>have the ground truth, so we give a probability. There's

0:10:22.796 --> 0:10:25.876
<v Speaker 2>several different models running, so we use an ensemble of models.

0:10:25.876 --> 0:10:29.596
<v Speaker 2>We have different models looking at different things, and we

0:10:29.676 --> 0:10:32.676
<v Speaker 2>give an overall score averaging those. In the case of

0:10:32.716 --> 0:10:35.636
<v Speaker 2>a video, we actually highlight the areas of a deep fake.

0:10:36.116 --> 0:10:38.036
<v Speaker 2>If the person is speaking and they're a fake, there'll

0:10:38.036 --> 0:10:39.836
<v Speaker 2>be a red box around them. If they're

0:10:39.916 --> 0:10:41.196
<v Speaker 2>real, there'll be a green box around them.

0:10:41.516 --> 0:10:46.236
<v Speaker 1>And well, that latter part sounds more binary as opposed

0:10:46.276 --> 0:10:47.276
<v Speaker 1>to probabilistic.

0:10:47.476 --> 0:10:50.076
<v Speaker 2>We give both. So yeah, there's a probability

0:10:50.236 --> 0:10:52.436
<v Speaker 2>score and there's just the visual.

0:10:52.276 --> 0:10:55.356
<v Speaker 1>And so the probabilistic score is basically according to our model,

0:10:55.396 --> 0:10:58.596
<v Speaker 1>there's a seventy percent chance that this is fake something

0:10:58.756 --> 0:10:59.836
<v Speaker 1>of that nature.

0:10:59.676 --> 0:11:01.796
<v Speaker 2>According to our ensemble of models.

0:11:01.916 --> 0:11:04.836
<v Speaker 1>Yes, yeah, our model of models, our fund of funds

0:11:04.836 --> 0:11:09.316
<v Speaker 1>of models. Exactly. So okay, so you're actually leading

0:11:09.436 --> 0:11:12.876
<v Speaker 1>us toward what's under the hood, right? I'm interested in

0:11:12.916 --> 0:11:15.796
<v Speaker 1>discussing this on a few levels. Right, there is the

0:11:15.836 --> 0:11:20.036
<v Speaker 1>sort of broad, beyond Reality Defender. You know, what are

0:11:20.076 --> 0:11:22.636
<v Speaker 1>the basic ways that the technology works, Like how does

0:11:23.036 --> 0:11:26.596
<v Speaker 1>deepfake detection gen AI detection work? In a broad way?

0:11:26.636 --> 0:11:27.636
<v Speaker 1>Like can you talk me through that?

0:11:27.676 --> 0:11:31.196
<v Speaker 2>Absolutely. Yeah. There's currently two ways people are looking at

0:11:31.236 --> 0:11:35.836
<v Speaker 2>this problem. Number one is provenance. For example, you

0:11:35.996 --> 0:11:39.636
<v Speaker 2>watermark media that you create. Maybe you watermark it

0:11:40.036 --> 0:11:42.156
<v Speaker 2>or you digitally sign it, maybe you put it on a

0:11:42.156 --> 0:11:44.436
<v Speaker 2>blockchain somewhere or something like that. But basically there's a

0:11:44.476 --> 0:11:47.036
<v Speaker 2>source of truth that this video is real. Yeah, and

0:11:47.076 --> 0:11:48.556
<v Speaker 2>there's a watermark. That's number one.

0:11:50.156 --> 0:11:52.916
<v Speaker 1>But we're concerned. We're concerned with instances where that is

0:11:52.916 --> 0:11:54.916
<v Speaker 1>not the case. Right. Our world is full of videos

0:11:54.956 --> 0:12:00.436
<v Speaker 1>today that are not clearly watermarked, blockchained, whatever, for provenance.

0:12:00.476 --> 0:12:02.676
<v Speaker 1>So we have this problem. What are the ways people

0:12:02.676 --> 0:12:03.236
<v Speaker 1>are solving it?

0:12:03.516 --> 0:12:05.676
<v Speaker 2>Yeah. The second way is how we're solving it, which

0:12:05.716 --> 0:12:08.556
<v Speaker 2>is basically we use AI to detect AI,

0:12:09.596 --> 0:12:13.196
<v Speaker 2>which we call inference. So we train AI models, as

0:12:13.236 --> 0:12:16.116
<v Speaker 2>I mentioned, a bunch of them, to look at

0:12:17.636 --> 0:12:20.036
<v Speaker 2>various aspects of, let's say, video.

0:12:20.476 --> 0:12:24.836
<v Speaker 1>So like, is a sort of generative adversarial network

0:12:25.636 --> 0:12:27.436
<v Speaker 1>the right term? I mean, it

0:12:27.476 --> 0:12:29.276
<v Speaker 1>seems like if I were making up how to do this,

0:12:29.316 --> 0:12:32.156
<v Speaker 1>I'd be like, well, I'm gonna have one model that's

0:12:32.236 --> 0:12:35.196
<v Speaker 1>like cranking out really good deep fakes, but I'll know

0:12:35.236 --> 0:12:36.876
<v Speaker 1>which ones are the deep fakes, and then I'm gonna

0:12:36.876 --> 0:12:38.476
<v Speaker 1>feed the deep fakes and the real ones to my

0:12:38.516 --> 0:12:41.076
<v Speaker 1>other model, and I'll score it on how well it does,

0:12:41.116 --> 0:12:43.356
<v Speaker 1>and it'll get really good at figuring out the difference.

0:12:43.796 --> 0:12:46.596
<v Speaker 2>Yeah, that's actually exactly how a lot of these work.

0:12:46.676 --> 0:12:48.956
<v Speaker 2>For example, there's a website you can

0:12:48.956 --> 0:12:51.316
<v Speaker 2>go to where it just generates a person every time you

0:12:51.396 --> 0:12:53.396
<v Speaker 2>go to it, right, and that's actually using a GAN

0:12:53.596 --> 0:12:56.916
<v Speaker 2>to generate that person. So the way we detect, and

0:12:56.956 --> 0:12:58.516
<v Speaker 2>I can give a little bit more detail here.

0:12:58.556 --> 0:13:02.276
<v Speaker 2>So for example, one of our models which we actually removed,

0:13:02.636 --> 0:13:07.636
<v Speaker 2>was looking at blood flow. So yeah, so imagine actually

0:13:07.676 --> 0:13:11.716
<v Speaker 2>in this video, if lighting and conditions are right, we can

0:13:11.796 --> 0:13:14.476
<v Speaker 2>actually detect the heartbeat and the blood flow and the

0:13:14.556 --> 0:13:16.476
<v Speaker 2>veins the way we're looking at each other.

0:13:16.916 --> 0:13:19.556
<v Speaker 1>As I'm looking at mine, weirdly, today, maybe because it's

0:13:19.556 --> 0:13:21.436
<v Speaker 1>hot or because of the light here, I can actually see

0:13:21.436 --> 0:13:24.156
<v Speaker 1>a vein bulging on my forehead. So, like you're saying,

0:13:24.156 --> 0:13:28.036
<v Speaker 1>an AI could like measure my pulse from that or something.

0:13:27.996 --> 0:13:30.956
<v Speaker 2>In the right conditions. Now, that model has a lot

0:13:30.996 --> 0:13:36.116
<v Speaker 2>of limitations, and you need to have the right... It's

0:13:36.156 --> 0:13:39.276
<v Speaker 2>basically it has a lot of bias. Right, So we

0:13:39.356 --> 0:13:39.796
<v Speaker 2>tossed that.

0:13:40.156 --> 0:13:42.396
<v Speaker 1>Wait, you're saying it didn't work. You're saying it didn't work.

0:13:42.636 --> 0:13:45.876
<v Speaker 2>It worked in the right conditions and the right skin tone,

0:13:46.316 --> 0:13:49.516
<v Speaker 2>so yeah, so otherwise it was biased. So this

0:13:49.676 --> 0:13:52.476
<v Speaker 2>was experimental and we tossed it.

0:13:52.396 --> 0:13:54.596
<v Speaker 1>A lot of things. It didn't work. So you tried

0:13:54.596 --> 0:13:56.356
<v Speaker 1>it and in a broad way it didn't work. It

0:13:56.396 --> 0:13:58.436
<v Speaker 1>worked in narrow conditions, but you need things that work

0:13:58.476 --> 0:14:01.196
<v Speaker 1>more broadly. What's another thing you tried that didn't work?

0:14:02.356 --> 0:14:05.396
<v Speaker 2>Well, I can tell you every month we may be

0:14:05.516 --> 0:14:06.636
<v Speaker 2>throwing away models.

0:14:06.836 --> 0:14:09.196
<v Speaker 1>Well, presumably there's things that work for a while and

0:14:09.236 --> 0:14:13.236
<v Speaker 1>then they don't. Right, It's kind of like antibiotics versus bacteria, right,

0:14:13.356 --> 0:14:16.236
<v Speaker 1>like your adversaries are getting better every day.

0:14:16.596 --> 0:14:19.116
<v Speaker 2>Basically, what we use, what we like to use is

0:14:19.156 --> 0:14:21.196
<v Speaker 2>we like to say we're like an antivirus company.

0:14:21.476 --> 0:14:25.316
<v Speaker 2>So every month there's a new generative technique,

0:14:25.516 --> 0:14:28.156
<v Speaker 2>maybe we detect it. But maybe it's something we

0:14:28.196 --> 0:14:30.516
<v Speaker 2>don't anticipate and we don't detect, and so we have

0:14:30.556 --> 0:14:33.436
<v Speaker 2>to make sure we quickly update our models. And

0:14:33.436 --> 0:14:36.676
<v Speaker 2>then a model that worked last year, it's completely irrelevant now.

0:14:37.156 --> 0:14:40.316
<v Speaker 1>So what else, like, what else is happening technologically on

0:14:40.436 --> 0:14:42.956
<v Speaker 1>the reality defense side, on the detection side.

0:14:43.556 --> 0:14:46.476
<v Speaker 2>Okay, so we have a few different products.

0:14:46.556 --> 0:14:50.436
<v Speaker 2>One is, as I mentioned, real-time audio, like scanning

0:14:50.436 --> 0:14:53.236
<v Speaker 2>and listening for telephone calls. The other one is a

0:14:53.276 --> 0:14:55.836
<v Speaker 2>place where a journalist or any user can go and

0:14:55.996 --> 0:14:59.516
<v Speaker 2>upload not just videos, but we also detect images. We

0:14:59.556 --> 0:15:03.036
<v Speaker 2>also detect audio. We also detect text, like ChatGPT,

0:15:03.596 --> 0:15:06.956
<v Speaker 2>and these tools also explain to a user why something

0:15:07.196 --> 0:15:09.036
<v Speaker 2>is a deep fake. We don't just give a score.

0:15:09.236 --> 0:15:11.476
<v Speaker 2>For an image, we might put a heat map

0:15:11.476 --> 0:15:13.876
<v Speaker 2>and say these are the areas that set the model off.

0:15:14.956 --> 0:15:17.796
<v Speaker 2>For text, we might highlight areas and say these are the

0:15:17.876 --> 0:15:19.996
<v Speaker 2>areas that appear to be generated.

0:15:20.196 --> 0:15:22.756
<v Speaker 1>There's a case study you have about a university that

0:15:22.876 --> 0:15:27.316
<v Speaker 1>is a client of yours that, among other things, uses

0:15:28.116 --> 0:15:32.636
<v Speaker 1>your service to tell when students are turning in

0:15:32.676 --> 0:15:36.396
<v Speaker 1>papers written by ChatGPT, basically, as I read it. Right, like,

0:15:36.556 --> 0:15:39.236
<v Speaker 1>I just assume that, like, everybody writes papers with

0:15:39.316 --> 0:15:41.516
<v Speaker 1>ChatGPT now and there's nothing anybody can do about it.

0:15:41.516 --> 0:15:43.716
<v Speaker 1>But is that not true? Like if I like have

0:15:43.836 --> 0:15:46.196
<v Speaker 1>GPT write my paper and then I like change a

0:15:46.196 --> 0:15:49.876
<v Speaker 1>few words, does that sort of let me

0:15:50.116 --> 0:15:51.796
<v Speaker 1>sail past your defense?

0:15:52.316 --> 0:15:55.236
<v Speaker 2>It depends. Depends how much you change. Yeah, if

0:15:55.276 --> 0:15:58.756
<v Speaker 2>you change like over fifty percent, maybe it would. So

0:15:59.036 --> 0:16:00.756
<v Speaker 2>it depends.

0:16:00.476 --> 0:16:02.796
<v Speaker 1>Over fifty percent is more than a few words. And

0:16:02.876 --> 0:16:04.996
<v Speaker 1>so can you talk? I mean, I know you can't

0:16:05.036 --> 0:16:07.196
<v Speaker 1>name the university, but in practice you know how they

0:16:07.276 --> 0:16:11.356
<v Speaker 1>use it. So you know, some professor runs the students' papers

0:16:11.396 --> 0:16:14.316
<v Speaker 1>through your software and it says of one student there's

0:16:14.316 --> 0:16:18.596
<v Speaker 1>a whatever sixty percent chance that this was created using

0:16:18.596 --> 0:16:22.236
<v Speaker 1>a large language model. I mean, do you know in practice?

0:16:22.276 --> 0:16:24.316
<v Speaker 1>Obviously the professor could do whatever they want or the

0:16:24.436 --> 0:16:26.636
<v Speaker 1>university could have whatever policy, but do you know in practice,

0:16:26.756 --> 0:16:30.156
<v Speaker 1>what do they do with this information? Like, that's

0:16:30.196 --> 0:16:31.876
<v Speaker 1>in a way a harder one to figure out than

0:16:31.916 --> 0:16:34.436
<v Speaker 1>the like banker who's like, oh, it might be a

0:16:34.436 --> 0:16:36.076
<v Speaker 1>deep fake on the phone. I'll call you right back

0:16:36.116 --> 0:16:38.756
<v Speaker 1>for security. Like if my I don't have a banker,

0:16:38.876 --> 0:16:40.636
<v Speaker 1>but if I had a banker and they did that,

0:16:40.676 --> 0:16:43.076
<v Speaker 1>I'd be like, oh, that's cool. I'm glad my bank

0:16:43.196 --> 0:16:45.796
<v Speaker 1>is doing this thing. Whereas with like the professor and

0:16:45.836 --> 0:16:51.436
<v Speaker 1>the student, that's a much more sort of fraught situation, right,

0:16:52.276 --> 0:16:55.036
<v Speaker 1>and harder to think of how to deal with again

0:16:55.116 --> 0:16:59.156
<v Speaker 1>the probabilistic nature of the output of the model.

0:16:59.476 --> 0:17:01.556
<v Speaker 2>Yes, I think a couple more things here. First of all,

0:17:01.836 --> 0:17:05.436
<v Speaker 2>I think even universities are trying to figure out this problem.

0:17:05.476 --> 0:17:08.076
<v Speaker 2>How do you solve it? You know. But the second

0:17:08.236 --> 0:17:13.156
<v Speaker 2>thing to note, most of our users are not interested

0:17:13.236 --> 0:17:15.636
<v Speaker 2>in a text detector. That seems to be a much

0:17:15.676 --> 0:17:20.076
<v Speaker 2>smaller market. The biggest one is actually audio. It's becoming

0:17:20.916 --> 0:17:22.516
<v Speaker 2>imagine you get a call from a loved one saying,

0:17:22.836 --> 0:17:24.876
<v Speaker 2>send me money, and you send money, then you realize

0:17:24.956 --> 0:17:27.516
<v Speaker 2>it's not who you thought. It was a deep fake, right? That's actually

0:17:27.556 --> 0:17:30.636
<v Speaker 2>a much more widely used system.

0:17:31.116 --> 0:17:34.436
<v Speaker 1>That's the big one in terms of the business. It's interesting.

0:17:34.476 --> 0:17:36.556
<v Speaker 1>I mean, I wonder if that's partly like relative we

0:17:36.596 --> 0:17:38.796
<v Speaker 1>think about the video more, but is it partly because

0:17:39.116 --> 0:17:41.916
<v Speaker 1>deep fake audio is now quite good and there are

0:17:41.956 --> 0:17:44.876
<v Speaker 1>lots of instances where people will transfer lots of money

0:17:44.916 --> 0:17:46.476
<v Speaker 1>based solely on audio?

0:17:47.116 --> 0:17:49.196
<v Speaker 2>Deep fake audio is the best and it's getting better,

0:17:49.276 --> 0:17:51.996
<v Speaker 2>right? It used to be, to make your voice,

0:17:51.996 --> 0:17:54.196
<v Speaker 2>maybe I needed a minute. Now I need just a

0:17:54.196 --> 0:17:56.996
<v Speaker 2>few seconds and I can make your voice. It's getting

0:17:57.436 --> 0:18:00.396
<v Speaker 2>exponentially better. All of them are, but audio is definitely

0:18:00.676 --> 0:18:01.596
<v Speaker 2>top of the list right now.

0:18:01.716 --> 0:18:05.316
<v Speaker 1>Huh. And how are you keeping up?

0:18:06.236 --> 0:18:09.516
<v Speaker 2>Yeah, I mean, so when we detect audio, it's tricky.

0:18:09.516 --> 0:18:13.116
<v Speaker 2>There's a lot of factors to think about: a person's accent,

0:18:13.636 --> 0:18:17.076
<v Speaker 2>right? Is the model biased? Does it not understand, or

0:18:17.196 --> 0:18:20.076
<v Speaker 2>is there an issue where it detects one

0:18:20.076 --> 0:18:22.236
<v Speaker 2>person with a certain type of accent always as a deep fake?

0:18:22.756 --> 0:18:26.116
<v Speaker 2>There's also issues of, like, noise. When there's

0:18:26.116 --> 0:18:28.436
<v Speaker 2>a lot of background noise, the model could be impacted.

0:18:28.556 --> 0:18:31.236
<v Speaker 2>When there's crosstalk, multiple people speaking at the same time,

0:18:31.556 --> 0:18:34.956
<v Speaker 2>that could impact the model. So there's a variety of factors.

0:18:35.116 --> 0:18:37.036
<v Speaker 2>And the other thing you think about is our models

0:18:37.036 --> 0:18:40.956
<v Speaker 2>are more... they support multiple languages, so we don't just

0:18:40.956 --> 0:18:43.636
<v Speaker 2>do English, and so all of these kind of make

0:18:43.676 --> 0:18:46.956
<v Speaker 2>it very complicated. So when we detect, there's something called

0:18:46.996 --> 0:18:50.036
<v Speaker 2>preprocessing: there's a whole bunch of steps applied to the

0:18:50.116 --> 0:18:52.796
<v Speaker 2>audio before it actually goes to our AI models where

0:18:52.796 --> 0:18:56.076
<v Speaker 2>we have to clean up the audio, do certain types

0:18:56.076 --> 0:18:58.276
<v Speaker 2>of transformations before we push it to the models.

0:18:58.316 --> 0:19:01.076
<v Speaker 1>And is that happening in real time with these companies?

0:19:01.156 --> 0:19:07.036
<v Speaker 1>Uh huh. And, like, what is the

0:19:07.076 --> 0:19:10.236
<v Speaker 1>frontier of preprocessing? Like is it is it an efficiency

0:19:10.236 --> 0:19:12.356
<v Speaker 1>and speed problem because you're trying to do it in

0:19:12.356 --> 0:19:14.876
<v Speaker 1>real time and so you're just trying to kind of

0:19:15.156 --> 0:19:18.036
<v Speaker 1>make the sort of algorithmic part of it as fast

0:19:18.036 --> 0:19:19.156
<v Speaker 1>and efficient as possible?

0:19:19.876 --> 0:19:22.636
<v Speaker 2>Yeah, I mean this is a challenge. There's a lot

0:19:22.676 --> 0:19:25.716
<v Speaker 2>to be done. So that's ongoing research: how do

0:19:25.796 --> 0:19:28.836
<v Speaker 2>we continue to speed up not just the preprocessing, but

0:19:28.876 --> 0:19:33.156
<v Speaker 2>the inference? And there's a variety of things. One thing is what's

0:19:33.156 --> 0:19:35.076
<v Speaker 2>called a foundation model. I'm not sure if you've heard

0:19:35.236 --> 0:19:37.556
<v Speaker 2>what those are, but these are extremely large pre-trained

0:19:37.636 --> 0:19:40.196
<v Speaker 2>models. GPT is a foundation model; it's a pre-trained model.

0:19:40.596 --> 0:19:43.556
<v Speaker 2>And so these models can be useful in some parts

0:19:43.556 --> 0:19:47.956
<v Speaker 2>of the preprocessing where they can quickly extract certain features

0:19:47.956 --> 0:19:50.436
<v Speaker 2>for us, and then we can use those, too, down

0:19:50.436 --> 0:19:55.516
<v Speaker 2>the pipeline.

0:19:54.036 --> 0:19:56.436
<v Speaker 1>Still to come on the show. The problems that Ali

0:19:56.636 --> 0:20:09.996
<v Speaker 1>is trying to solve now. How good are you at

0:20:09.996 --> 0:20:12.996
<v Speaker 1>detecting deep fakes? Can you quantify how good you are?

0:20:13.956 --> 0:20:16.236
<v Speaker 2>So the way they usually do this is they look

0:20:16.276 --> 0:20:19.876
<v Speaker 2>at benchmarks. Right, there's public data sets which we can

0:20:19.916 --> 0:20:23.396
<v Speaker 2>take and run and we're in the nineties and then

0:20:23.556 --> 0:20:25.476
<v Speaker 2>but you know that's not the real world.

0:20:25.516 --> 0:20:27.556
<v Speaker 1>When you say you're in the nineties, you mean you

0:20:29.156 --> 0:20:33.836
<v Speaker 1>in a binary sense, you guess correctly ninety percent of

0:20:33.876 --> 0:20:34.316
<v Speaker 1>the time?

0:20:35.036 --> 0:20:37.636
<v Speaker 2>Yeah, So on a public benchmark, we're in the nineties.

0:20:37.636 --> 0:20:41.956
<v Speaker 2>There's accuracy, precision and recall. Accuracy is how accurate are

0:20:41.996 --> 0:20:45.436
<v Speaker 2>we. Let's say there is one hundred, the sample set is

0:20:45.436 --> 0:20:50.076
<v Speaker 2>one hundred; maybe fifty is fake, fifty is real, right?

0:20:50.116 --> 0:20:52.396
<v Speaker 2>The accuracy is, you take, okay, how many of those

0:20:52.396 --> 0:20:55.276
<v Speaker 2>did you get right, how many of the real and fake, divided by the total, right?

0:20:55.836 --> 0:20:58.676
<v Speaker 2>That's the accuracy. The problem with that is,

0:20:58.796 --> 0:21:02.596
<v Speaker 2>like, with an unbalanced data set, maybe only two are fake

0:21:03.156 --> 0:21:06.636
<v Speaker 2>and then the other ninety eight are real. So in

0:21:06.676 --> 0:21:11.076
<v Speaker 2>that case, the accuracy... see, if we had said that, okay,

0:21:11.116 --> 0:21:14.076
<v Speaker 2>everything is real, we would be at ninety eight percent, right?

0:21:14.156 --> 0:21:17.156
<v Speaker 2>That's not very useful because you missed the deep fakes. So

0:21:17.236 --> 0:21:20.996
<v Speaker 2>that's why precision and recall come in. They look specifically at

0:21:21.276 --> 0:21:24.476
<v Speaker 2>how did you do on that specific class, like the fakes

0:21:24.596 --> 0:21:28.196
<v Speaker 2>or the reals, So there's more than just accuracy. There's

0:21:28.236 --> 0:21:29.756
<v Speaker 2>also other factors to look at.
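
NOTE
A minimal sketch of the accuracy, precision, and recall point made above, in Python. The 2-fake, 98-real split is the example from the conversation. The function names and the second, imperfect detector are hypothetical illustrations, not Reality Defender's code. Precision and recall are reported as 0.0 when their denominator is zero, which is a convention chosen for this sketch.
```python
# Illustrative only: True means "fake", False means "real".
def confusion_counts(labels, predictions):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp, fp, tn, fn
#
def metrics(labels, predictions):
    """Return (accuracy, precision, recall) for the 'fake' class."""
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
#
# The unbalanced set from the conversation: 2 fakes, 98 reals.
labels = [True] * 2 + [False] * 98
# Detector A (hypothetical): calls everything real.
always_real = [False] * 100
# Detector B (hypothetical): catches one fake, misses one, and flags one real by mistake.
imperfect = [True, False] + [True] + [False] * 97
print(metrics(labels, always_real))
print(metrics(labels, imperfect))
```
Both hypothetical detectors reach 98 percent accuracy, but only precision and recall show that the first one never catches a fake.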

0:21:30.276 --> 0:21:33.116
<v Speaker 1>So there's it's kind of like the sort of false

0:21:33.156 --> 0:21:39.716
<v Speaker 1>positive false negative challenge with medical tests, right you want

0:21:39.756 --> 0:21:43.076
<v Speaker 1>a test that both says you have the thing, says

0:21:43.076 --> 0:21:45.116
<v Speaker 1>you have the disease when you have the disease, and

0:21:45.276 --> 0:21:47.436
<v Speaker 1>also says you don't have the disease when you don't

0:21:47.436 --> 0:21:49.756
<v Speaker 1>have the disease, And that actually ends up being a

0:21:49.796 --> 0:21:54.676
<v Speaker 1>really complicated problem given the nature of baselines, right like

0:21:54.716 --> 0:21:57.076
<v Speaker 1>in your universe, certainly in the universe of people calling

0:21:57.116 --> 0:22:01.956
<v Speaker 1>their banker. Almost everybody calling their banker is a real person, right,

0:22:02.716 --> 0:22:06.276
<v Speaker 1>but there are these very high stakes, presumably very rare

0:22:06.316 --> 0:22:08.156
<v Speaker 1>cases where it is a deep fake, and so that's like

0:22:08.196 --> 0:22:09.956
<v Speaker 1>a complicated problem.
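
NOTE
A rough sketch of the base-rate problem described above, in Python. Every number here is an assumption made up for illustration (one fake call in a thousand, a detector that catches 95 percent of fakes and wrongly flags 1 percent of real callers, 5,000 calls a day); none of them come from the conversation or from any real deployment.
```python
def positive_predictive_value(prevalence, recall, false_positive_rate):
    """Probability that a flagged call really is fake (Bayes' rule)."""
    true_alarms = recall * prevalence
    false_alarms = false_positive_rate * (1.0 - prevalence)
    return true_alarms / (true_alarms + false_alarms)
#
def expected_false_alarms(calls_per_day, prevalence, false_positive_rate):
    """Average number of real callers flagged per day."""
    return calls_per_day * (1.0 - prevalence) * false_positive_rate
#
# Hypothetical inputs, chosen only to show the effect of rare fakes.
ppv = positive_predictive_value(prevalence=0.001, recall=0.95, false_positive_rate=0.01)
alarms = expected_false_alarms(calls_per_day=5000, prevalence=0.001, false_positive_rate=0.01)
print(round(ppv, 3), round(alarms, 2))
```
Under these assumed numbers fewer than one flagged call in ten is actually fake, which is why rare, high-stakes cases make the false positive side so costly.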

0:22:10.316 --> 0:22:14.036
<v Speaker 2>It actually is, It absolutely is, and it's something as

0:22:14.076 --> 0:22:16.276
<v Speaker 2>we work with each customer, we have to tweak those.

0:22:16.396 --> 0:22:20.956
<v Speaker 2>Some want higher false positives, some want higher false negatives. It depends

0:22:20.956 --> 0:22:23.276
<v Speaker 2>on each use case. In the case of a bank,

0:22:23.436 --> 0:22:25.676
<v Speaker 2>they want to be a bit more cautious. But that

0:22:25.796 --> 0:22:28.316
<v Speaker 2>also causes a lot of It could cause a lot

0:22:28.356 --> 0:22:29.876
<v Speaker 2>of pain depending on the volume.

0:22:29.876 --> 0:22:32.716
<v Speaker 1>Right, because if every client it's like, oh sorry, I

0:22:32.716 --> 0:22:34.076
<v Speaker 1>got to call you back to make sure you're not

0:22:34.116 --> 0:22:36.196
<v Speaker 1>a deep fake, Like that's not great.

0:22:36.956 --> 0:22:38.956
<v Speaker 2>Yeah, And if you have thousands of calls a day

0:22:38.996 --> 0:22:42.916
<v Speaker 2>and even one percent is a false positive or negative,

0:22:42.996 --> 0:22:45.476
<v Speaker 2>that creates a lot of work, yeah, because it

0:22:45.516 --> 0:22:46.036
<v Speaker 2>adds up.

0:22:46.196 --> 0:22:47.956
<v Speaker 1>How do you solve that? What do you do about that?

0:22:49.676 --> 0:22:53.476
<v Speaker 2>So the way it works is all about adjusting. You

0:22:53.476 --> 0:22:58.436
<v Speaker 2>can think of thresholds, right? We can adjust a variety of

0:22:58.476 --> 0:23:02.196
<v Speaker 2>parameters on the output of a model, not just the

0:23:02.236 --> 0:23:08.396
<v Speaker 2>model itself, but, for example, in audio, as

0:23:08.436 --> 0:23:11.156
<v Speaker 2>we speak, you know, we could look at okay, how

0:23:11.196 --> 0:23:13.876
<v Speaker 2>long do you want to listen before you give an answer?

0:23:14.516 --> 0:23:17.316
<v Speaker 2>You know, maybe... And the longer you listen, the

0:23:17.396 --> 0:23:21.876
<v Speaker 2>more... the more confident, the more... That's smart.
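
NOTE
A small sketch of the threshold trade-off described above, in Python. The scores, labels, and candidate thresholds are invented for illustration; the assumption is only that a detector emits a score between 0 and 1 and that a score at or above the threshold is treated as "fake".
```python
def errors_at_threshold(scores, labels, threshold):
    """Return (false_positives, false_negatives) when score >= threshold means 'fake'."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn
#
# Hypothetical detector scores and ground truth (True means fake).
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.80, 0.90, 0.97]
labels = [False, False, False, False, True, False, True, True]
#
# A cautious customer picks a low threshold; a customer who hates callbacks picks a high one.
for threshold in (0.3, 0.5, 0.7, 0.95):
    print(threshold, errors_at_threshold(scores, labels, threshold))
```
Raising the threshold trades false positives for false negatives. Listening longer before answering would be a second knob of the same kind, which this sketch does not model.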

0:23:21.916 --> 0:23:24.516
<v Speaker 1>That makes sense, right, because it's essentially more data for

0:23:24.596 --> 0:23:28.396
<v Speaker 1>the model. Exactly. Yeah. What are you trying to figure

0:23:28.396 --> 0:23:31.116
<v Speaker 1>out now? Like what is the frontier?

0:23:32.236 --> 0:23:35.116
<v Speaker 2>What's really the latest now, and it's just amazing how quickly

0:23:35.156 --> 0:23:38.076
<v Speaker 2>it's going, is videos. So the videos that we detect

0:23:38.236 --> 0:23:41.476
<v Speaker 2>are like a face swap, Like you're sitting there speaking

0:23:41.676 --> 0:23:44.676
<v Speaker 2>and another person's face is on there. That's a face swap.

0:23:44.916 --> 0:23:48.676
<v Speaker 2>But now you can generate an entire video completely from scratch,

0:23:49.116 --> 0:23:52.836
<v Speaker 2>and you just type in the description and the video

0:23:52.916 --> 0:23:55.156
<v Speaker 2>comes out. You can take some you can I can

0:23:55.156 --> 0:23:57.076
<v Speaker 2>take your voice, a few seconds of your voice. I

0:23:57.116 --> 0:24:00.156
<v Speaker 2>can then have you say anything I want. You

0:24:00.156 --> 0:24:03.356
<v Speaker 2>can clearly see how a bad person can misuse these tools.

0:24:03.716 --> 0:24:06.036
<v Speaker 2>So the latest is these things are getting really good.

0:24:06.196 --> 0:24:09.636
<v Speaker 1>And over time, like with those videos, how is

0:24:09.676 --> 0:24:13.236
<v Speaker 1>your reliability and accuracy changing? Are you getting better or worse

0:24:13.356 --> 0:24:16.076
<v Speaker 1>or staying the same as the technology to create the

0:24:16.076 --> 0:24:17.036
<v Speaker 1>deep fakes improves?

0:24:17.236 --> 0:24:20.516
<v Speaker 2>So what's interesting is it has slowed down in terms

0:24:20.556 --> 0:24:23.476
<v Speaker 2>of like the signatures, Like we don't need as much

0:24:23.556 --> 0:24:27.436
<v Speaker 2>data as we used to. So of course there's still

0:24:27.436 --> 0:24:29.356
<v Speaker 2>a lot of work and we're never going to stop,

0:24:29.596 --> 0:24:31.516
<v Speaker 2>but it is stabilizing a little bit.

0:24:32.516 --> 0:24:35.436
<v Speaker 1>When you say it, what is stabilizing a little bit,

0:24:36.956 --> 0:24:37.436
<v Speaker 1>So like the.

0:24:37.396 --> 0:24:40.956
<v Speaker 2>deep fake signatures are stabilizing, the way...

0:24:40.836 --> 0:24:43.996
<v Speaker 1>The signatures, meaning the giveaways, the things that I can't see,

0:24:44.396 --> 0:24:47.636
<v Speaker 1>but that your models can see, that it's AI. Exactly.

0:24:47.676 --> 0:24:51.156
<v Speaker 2>So our models, going back to give more detail:

0:24:51.316 --> 0:24:54.396
<v Speaker 2>They're looking at different attributes of a piece of media,

0:24:54.556 --> 0:24:57.356
<v Speaker 2>and they pull out those attributes and then they send

0:24:57.396 --> 0:25:01.516
<v Speaker 2>those to our in-house neural networks that study those attributes.

0:25:01.916 --> 0:25:03.796
<v Speaker 1>Like one that you have mentioned, that the company has

0:25:03.836 --> 0:25:09.476
<v Speaker 1>mentioned publicly is the the sync of audio and video. Right, Yes,

0:25:09.556 --> 0:25:11.916
<v Speaker 1>maybe that's one where it's gotten better and it doesn't

0:25:11.956 --> 0:25:15.476
<v Speaker 1>matter anymore, but like it, from what I understand, from

0:25:15.516 --> 0:25:17.316
<v Speaker 1>what I've read, there was at least a time when

0:25:17.876 --> 0:25:20.196
<v Speaker 1>the sync of the audio and video tended to be

0:25:20.396 --> 0:25:25.756
<v Speaker 1>off in deep fake videos. Right? Is that an example

0:25:25.956 --> 0:25:26.756
<v Speaker 1>of a signature.

0:25:27.316 --> 0:25:29.556
<v Speaker 2>So the way that works is we train the model.

0:25:29.596 --> 0:25:33.356
<v Speaker 2>We say, hey, here's a bunch of people speaking, here's

0:25:33.396 --> 0:25:35.196
<v Speaker 2>what they look like. Look at the sync. Here's a

0:25:35.236 --> 0:25:37.636
<v Speaker 2>bunch of people like that who are deep fakes, and look at

0:25:37.636 --> 0:25:40.356
<v Speaker 2>the sync, and we tune the model so we can

0:25:40.396 --> 0:25:42.476
<v Speaker 2>tell the difference. That's also happening to a video. By

0:25:42.476 --> 0:25:44.636
<v Speaker 2>the way, if you look at Sora and some of

0:25:44.676 --> 0:25:49.276
<v Speaker 2>these new models where someone is walking, for example, their

0:25:49.356 --> 0:25:54.316
<v Speaker 2>legs are not like you know, they're not really smooth,

0:25:54.356 --> 0:25:56.076
<v Speaker 2>or they don't look right, So you can look at

0:25:56.076 --> 0:25:58.876
<v Speaker 2>that as well. That's the temporal dynamics we call that.

0:25:59.436 --> 0:26:03.516
<v Speaker 1>Uh, like, temporal dynamics is basically: are things proceeding in

0:26:03.636 --> 0:26:05.316
<v Speaker 1>time in a natural way?

0:26:04.956 --> 0:26:07.196
<v Speaker 2>Exactly, how things change over time.

0:26:09.756 --> 0:26:12.116
<v Speaker 1>So yeah, all of these seem like things that you

0:26:12.156 --> 0:26:14.556
<v Speaker 1>can just that are going to be fleeting, right. Like

0:26:14.636 --> 0:26:20.276
<v Speaker 1>my baseline assumption is it'll all get solved. Do you

0:26:21.036 --> 0:26:22.676
<v Speaker 1>how long do you think you'll be able to defend

0:26:22.716 --> 0:26:23.276
<v Speaker 1>reality for?

0:26:24.316 --> 0:26:26.436
<v Speaker 2>You know, this question comes up all the time where

0:26:26.956 --> 0:26:29.676
<v Speaker 2>there is always a giveaway or there is always a

0:26:29.716 --> 0:26:31.596
<v Speaker 2>new way to look at the problem. We're not just

0:26:31.636 --> 0:26:34.156
<v Speaker 2>looking always at the raw pixels, right, We could look

0:26:34.196 --> 0:26:38.396
<v Speaker 2>at different aspects. We could look at the frequency. For example,

0:26:38.396 --> 0:26:40.396
<v Speaker 2>if you look at an image, you can actually break

0:26:40.396 --> 0:26:41.476
<v Speaker 2>it down into frequencies.

0:26:41.956 --> 0:26:44.596
<v Speaker 1>When you say frequency, what do you mean when you

0:26:44.636 --> 0:26:46.156
<v Speaker 1>say you can look at the frequency? What does that mean?

0:26:46.316 --> 0:26:49.396
<v Speaker 2>So? For example, okay, so let's go with audio. You

0:26:49.476 --> 0:26:51.516
<v Speaker 2>know, you can use something called Fourier transforms

0:26:51.556 --> 0:26:54.956
<v Speaker 2>to actually break up audio into individual wavelengths, sines

0:26:54.956 --> 0:26:56.756
<v Speaker 2>and cosines, and take a look. You can do

0:26:56.796 --> 0:26:59.116
<v Speaker 2>the same for an image, for example; you can

0:26:59.156 --> 0:26:59.596
<v Speaker 2>break that up.

0:26:59.636 --> 0:27:03.356
<v Speaker 1>Like the analogy of a waveform of audio?

0:27:03.556 --> 0:27:05.716
<v Speaker 2>Yeah, it can. It can be translated into a bunch

0:27:05.716 --> 0:27:10.196
<v Speaker 2>of waves. So there's multiple things that we look at.

0:27:10.276 --> 0:27:14.556
<v Speaker 2>And with the AI there's always a giveaway, uh, and

0:27:14.556 --> 0:27:17.556
<v Speaker 2>and and again we're also thinking outside the box, right,

0:27:17.636 --> 0:27:21.156
<v Speaker 2>like the blood flow for example. Right, But there's other

0:27:21.236 --> 0:27:22.716
<v Speaker 2>kind of similar things we could think about.
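
NOTE
A minimal sketch of the frequency breakdown mentioned a few cues earlier, in Python with only the standard library. It uses a naive discrete Fourier transform on a made-up signal (a 5 Hz sine plus a weaker 12 Hz sine, sampled 64 times over one second). The signal, sample rate, and function name are assumptions for illustration; a real pipeline would use an optimized FFT and is not described in the conversation.
```python
import cmath
import math
#
def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns the magnitude of each frequency bin."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]
#
sample_rate = 64  # samples per second (hypothetical); one second of signal, so bin k is k Hz
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate) + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]
magnitudes = dft_magnitudes(signal)
# Only the first half of the bins is meaningful for a real-valued signal.
half = magnitudes[: sample_rate // 2]
strongest = sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2]
print(sorted(strongest))
```
The two strongest bins recover the two sine waves that were mixed together, which is the sense in which a signal "can be translated into a bunch of waves".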

0:27:22.916 --> 0:27:28.516
<v Speaker 1>I mean, presumably, you know, there's, you know, Renaissance, Renaissance Capital,

0:27:28.596 --> 0:27:32.196
<v Speaker 1>the James Simons fund, is one of the first quant hedge funds,

0:27:32.796 --> 0:27:37.556
<v Speaker 1>and they made tons of money for a long time.

0:27:37.636 --> 0:27:41.516
<v Speaker 1>They wildly outperformed the market. Clearly they had a technological advantage.

0:27:41.596 --> 0:27:44.716
<v Speaker 1>And the thing Simons said, the founder, this math

0:27:44.756 --> 0:27:47.556
<v Speaker 1>guy, about that company, one of the things he

0:27:47.596 --> 0:27:51.436
<v Speaker 1>said was like, we actually don't want to hire like

0:27:51.596 --> 0:27:54.476
<v Speaker 1>finance people who have some story about why a stock

0:27:54.556 --> 0:27:56.876
<v Speaker 1>is going to outperform, because if there's a story about

0:27:56.916 --> 0:27:59.756
<v Speaker 1>it, then somebody else is going to know

0:27:59.836 --> 0:28:03.116
<v Speaker 1>it already. Right. Their thing was just like, we just

0:28:03.756 --> 0:28:06.716
<v Speaker 1>give the model all the data and let the model

0:28:06.756 --> 0:28:11.756
<v Speaker 1>find these weird ass patterns that no human even understands.

0:28:11.996 --> 0:28:14.836
<v Speaker 1>But they work more often than they don't work, and

0:28:14.916 --> 0:28:17.756
<v Speaker 1>we make tons of money. And I would think that

0:28:17.796 --> 0:28:20.636
<v Speaker 1>would be the case for you to some extent that

0:28:20.636 --> 0:28:22.516
<v Speaker 1>if you could think of a thing like monitoring blood

0:28:22.516 --> 0:28:25.276
<v Speaker 1>flow or whatever, then the bad guys or whatever, the

0:28:25.276 --> 0:28:29.036
<v Speaker 1>people who want to make realistic gen AI, would also

0:28:29.076 --> 0:28:31.356
<v Speaker 1>think of it. And the real kind of secret sauce

0:28:31.396 --> 0:28:36.636
<v Speaker 1>would be in weird correlations that the model finds that

0:28:37.276 --> 0:28:39.076
<v Speaker 1>we wouldn't even understand.

0:28:40.196 --> 0:28:44.596
<v Speaker 2>Exactly. I mean, that is oftentimes what the model is

0:28:44.716 --> 0:28:48.596
<v Speaker 2>training on, and the way it determines if something is fake,

0:28:48.596 --> 0:28:52.796
<v Speaker 2>you'd think it's looking at certain features, but it is something that

0:28:52.836 --> 0:28:55.636
<v Speaker 2>we don't even tell it, right? Yeah, it determines on

0:28:55.676 --> 0:28:56.036
<v Speaker 2>its own.

0:28:56.076 --> 0:28:58.676
<v Speaker 1>Like that's the beauty of this kind of new era

0:28:58.876 --> 0:29:03.676
<v Speaker 1>of whatever, neural networks, machine learning. Right, it's just you

0:29:03.796 --> 0:29:06.956
<v Speaker 1>throw everything at it and let the machine figure it out.

0:29:07.196 --> 0:29:09.636
<v Speaker 2>We like to say we throw the kitchen sink at it sometimes.

0:29:09.716 --> 0:29:12.756
<v Speaker 1>Yes, yes, I mean, And so when you were talking

0:29:12.796 --> 0:29:17.436
<v Speaker 1>before about explainability, right about sort of saying in your output,

0:29:17.956 --> 0:29:20.636
<v Speaker 1>here's why we think it's fake. I feel like that

0:29:20.756 --> 0:29:22.516
<v Speaker 1>kind of throw everything at it and let the machine

0:29:22.556 --> 0:29:24.996
<v Speaker 1>figure it out makes it hard to like sometimes you

0:29:24.996 --> 0:29:26.996
<v Speaker 1>don't know, right, it's just like, well, the machine is

0:29:27.356 --> 0:29:30.996
<v Speaker 1>very smart and it says this is probably fake. Like... Yes.

0:29:31.116 --> 0:29:33.556
<v Speaker 1>Is that a tension? That can happen.

0:29:33.636 --> 0:29:36.116
<v Speaker 2>So you'll look at it. We'll show you an image

0:29:36.156 --> 0:29:38.796
<v Speaker 2>and it'll say the model was looking at certain areas.

0:29:38.876 --> 0:29:41.476
<v Speaker 2>And by the way, this also helps us with debugging

0:29:41.556 --> 0:29:44.316
<v Speaker 2>and bias. Right? Maybe it was, like, for some reason

0:29:44.396 --> 0:29:49.156
<v Speaker 2>looking at an area of the face where we couldn't tell

0:29:49.196 --> 0:29:51.956
<v Speaker 2>why that would set off the model. And so in

0:29:51.956 --> 0:29:54.636
<v Speaker 2>those scenarios we also investigate, like, why was this

0:29:54.756 --> 0:29:58.516
<v Speaker 2>area flagged? And it could be one hundred percent correct,

0:29:58.916 --> 0:30:01.436
<v Speaker 2>it's just we do have to examine it further.

0:30:02.516 --> 0:30:04.996
<v Speaker 1>Could you create a deep fake that would fool your

0:30:05.076 --> 0:30:10.276
<v Speaker 1>deep fake detector? Yes. Haha. Well, if you could do it,

0:30:10.316 --> 0:30:13.076
<v Speaker 1>somebody else could do it, don't you think? I could

0:30:13.116 --> 0:30:13.356
<v Speaker 1>do it?

0:30:13.396 --> 0:30:17.396
<v Speaker 2>Because I have access to a lot more knowledge, right, Like,

0:30:18.316 --> 0:30:20.476
<v Speaker 2>you know I could if I was running an anti

0:30:20.556 --> 0:30:23.836
<v Speaker 2>virus company, I could probably write a virus if I

0:30:23.916 --> 0:30:27.396
<v Speaker 2>knew exactly what... We're constantly actually trying to do that, by the way.

0:30:27.436 --> 0:30:29.516
<v Speaker 1>Yeah, I mean, in a sense, that's

0:30:29.596 --> 0:30:33.276
<v Speaker 1>the whole adversarial network thing, right, Like I guess you

0:30:33.756 --> 0:30:36.996
<v Speaker 1>have to do that for your detection models or your

0:30:37.036 --> 0:30:38.956
<v Speaker 1>suite of models to get better, right.

0:30:39.076 --> 0:30:41.916
<v Speaker 2>Yeah, so we have what's called red teaming, both black

0:30:41.956 --> 0:30:44.516
<v Speaker 2>box and with understanding of the code. So we're trying to

0:30:44.516 --> 0:30:46.956
<v Speaker 2>break the models. That's part of what we do.

0:30:47.436 --> 0:30:50.556
<v Speaker 1>Uh huh. And so are there like evil geniuses at

0:30:50.596 --> 0:30:52.636
<v Speaker 1>your company who can make killer deep fakes?

0:30:53.796 --> 0:30:57.836
<v Speaker 2>We definitely have geniuses one hundred percent, but we're in

0:30:57.876 --> 0:31:00.636
<v Speaker 2>the business of detection, right, we don't. We don't try

0:31:00.636 --> 0:31:03.396
<v Speaker 2>to generate too much other than just for training the models.

0:31:03.676 --> 0:31:08.556
<v Speaker 1>I mean I have to think like, there are many

0:31:08.596 --> 0:31:10.676
<v Speaker 1>people in the world who want to make

0:31:11.716 --> 0:31:15.916
<v Speaker 1>deep fakes for many reasons, and they're at different levels

0:31:15.956 --> 0:31:22.236
<v Speaker 1>of technological sophistication. Naively not knowing much about this, I

0:31:22.236 --> 0:31:25.076
<v Speaker 1>would think you can catch most of them. But if

0:31:25.116 --> 0:31:27.436
<v Speaker 1>you have people who can beat your models, I would

0:31:27.476 --> 0:31:31.916
<v Speaker 1>imagine that, say, state actors, countries throwing billions of dollars

0:31:31.996 --> 0:31:35.156
<v Speaker 1>at this probably also have people who could defeat your models.

0:31:36.996 --> 0:31:40.796
<v Speaker 2>Yeah, I mean that's always a case with any cybersecurity company.

0:31:40.956 --> 0:31:44.716
<v Speaker 2>We are a cybersecurity company. Every cybersecurity company does

0:31:44.756 --> 0:31:49.396
<v Speaker 2>its best to defend, right? But we do not promise

0:31:49.476 --> 0:31:53.316
<v Speaker 2>one hundred percent. Our models are always a probability.

0:31:54.076 --> 0:31:57.436
<v Speaker 1>Who's the best at making deep fakes that you're

0:31:57.476 --> 0:31:57.996
<v Speaker 1>aware of.

0:31:58.716 --> 0:32:01.836
<v Speaker 2>There's a few, right, there's like Sora from OpenAI. There's Runway,

0:32:01.996 --> 0:32:03.676
<v Speaker 2>there's Synthesia, there's...

0:32:03.636 --> 0:32:06.116
<v Speaker 1>You better be able to catch, right, anything I've heard of.

0:32:06.236 --> 0:32:08.796
<v Speaker 1>You better be really good at detecting. Presumably it's

0:32:08.836 --> 0:32:13.116
<v Speaker 1>like some like you know, Russian Genius Squad or I

0:32:13.116 --> 0:32:15.196
<v Speaker 1>don't know, the North Koreans or something. I would

0:32:15.236 --> 0:32:17.396
<v Speaker 1>imagine it is some state funded actor, but.

0:32:17.556 --> 0:32:20.076
<v Speaker 2>I would I would actually say, you know, we're in

0:32:20.076 --> 0:32:23.356
<v Speaker 2>a place where this problem is getting bigger.

0:32:23.516 --> 0:32:25.436
<v Speaker 2>But we're in a place where a lot of the

0:32:25.516 --> 0:32:28.676
<v Speaker 2>deep fakes coming out are actually for entertainment and they're not

0:32:28.996 --> 0:32:31.316
<v Speaker 2>like, used for evil. You know, you've seen the famous

0:32:31.356 --> 0:32:36.196
<v Speaker 2>Tom Cruise one, or other actors running around and doing things,

0:32:36.196 --> 0:32:38.196
<v Speaker 2>and those are deep fakes, right? Those are actually pretty good.

0:32:38.236 --> 0:32:39.916
<v Speaker 2>We detect them, but they're actually very good.

0:32:40.596 --> 0:32:42.556
<v Speaker 1>What are you thinking about in the context of the

0:32:42.756 --> 0:32:44.436
<v Speaker 1>election in the US this year? And do

0:32:44.476 --> 0:32:48.436
<v Speaker 1>you have particular clients who are especially focused on election

0:32:48.556 --> 0:32:50.436
<v Speaker 1>related deep fakes?

0:32:51.436 --> 0:32:56.556
<v Speaker 2>Yeah, the media companies are the main ones, and we're ready.

0:32:56.996 --> 0:33:02.356
<v Speaker 2>We detect the best, the best deep fakes, right? Everything that's

0:33:02.396 --> 0:33:05.516
<v Speaker 2>coming out we detect, So we're ready and we want

0:33:05.596 --> 0:33:09.716
<v Speaker 2>to make sure we're there as one avenue of people

0:33:10.396 --> 0:33:13.596
<v Speaker 2>verifying content. I believe late last year there was an

0:33:13.636 --> 0:33:17.876
<v Speaker 2>election in Slovenia where there was an audio of one

0:33:17.916 --> 0:33:21.796
<v Speaker 2>of the candidates saying he's gonna double the price of beer. Yeah,

0:33:21.836 --> 0:33:25.756
<v Speaker 2>and that actually was a deep fake. It was caught, but

0:33:25.876 --> 0:33:28.556
<v Speaker 2>it kind of caused some damage. So it's starting to

0:33:28.596 --> 0:33:28.996
<v Speaker 2>happen now.

0:33:29.076 --> 0:33:31.356
<v Speaker 1>It's an awesomely stupid deep fake. I mean, to me,

0:33:31.596 --> 0:33:35.516
<v Speaker 1>the real risk of deep fakes is not people believing

0:33:35.556 --> 0:33:39.636
<v Speaker 1>something that's false. It's people ceasing to believe anything, right.

0:33:39.716 --> 0:33:43.116
<v Speaker 1>It's just saying, oh, that's probably just a deep fake,

0:33:43.196 --> 0:33:45.716
<v Speaker 1>right like that. Actually, to me seems like the bigger

0:33:45.796 --> 0:33:49.236
<v Speaker 1>risk is nothing is true anymore. Nobody cares about the

0:33:49.276 --> 0:33:49.996
<v Speaker 1>truth anymore.

0:33:50.556 --> 0:33:56.316
<v Speaker 2>That's definitely a problem as well. Now we're seeing people saying, oh,

0:33:56.396 --> 0:33:58.996
<v Speaker 2>this is a deep fake. That's actually happened. There's a few.

0:33:59.996 --> 0:34:03.156
<v Speaker 2>I believe it was a Kate Middleton video, if I'm correct,

0:34:03.156 --> 0:34:04.876
<v Speaker 2>that was earlier this year, where everyone thought that was

0:34:04.876 --> 0:34:08.316
<v Speaker 2>a deep fake and it wasn't. So this kind of problem

0:34:08.356 --> 0:34:09.356
<v Speaker 2>is happening.

0:34:08.956 --> 0:34:12.476
<v Speaker 1>Like, that's because people want to believe things that

0:34:12.516 --> 0:34:14.836
<v Speaker 1>are consistent with their prior beliefs, and they don't want

0:34:14.876 --> 0:34:18.356
<v Speaker 1>to believe things that call their prior beliefs into question, right,

0:34:18.436 --> 0:34:20.796
<v Speaker 1>and so deep fakes in a way are an easy

0:34:20.836 --> 0:34:23.676
<v Speaker 1>out where if you see something you like, you assume

0:34:23.676 --> 0:34:25.476
<v Speaker 1>it's true. If you see something you don't like, you

0:34:25.516 --> 0:34:27.716
<v Speaker 1>assume it's not true, or you assume everything's just kind

0:34:27.716 --> 0:34:29.756
<v Speaker 1>of bullshit. Like, that to me seems like a big

0:34:29.836 --> 0:34:32.476
<v Speaker 1>kind of societal level risk of deep fakes.

0:34:32.636 --> 0:34:36.276
<v Speaker 2>We'll never fix that. That's something that we'll never solve. Yeah,

0:34:36.836 --> 0:34:40.076
<v Speaker 2>people have their own beliefs. You can show them anything,

0:34:40.236 --> 0:34:44.156
<v Speaker 2>the facts, math, that's not going to fix it all. Yeah.

0:34:44.196 --> 0:34:46.476
<v Speaker 1>No, I guess that's a human nature problem, if not

0:34:46.556 --> 0:34:52.636
<v Speaker 1>an AI problem. We'll be back in a minute with

0:34:52.676 --> 0:35:08.556
<v Speaker 1>the lightning round. Okay, let's close with a lightning round.

0:35:09.196 --> 0:35:09.516
<v Speaker 2>Okay.

0:35:10.236 --> 0:35:13.396
<v Speaker 1>How often do people applying to work at Reality Defender

0:35:13.756 --> 0:35:16.076
<v Speaker 1>use generative AI to write cover letters?

0:35:16.476 --> 0:35:19.116
<v Speaker 2>Oh, that's a good one. Not a lot, but

0:35:19.116 --> 0:35:21.396
<v Speaker 2>we've seen it for sure. I would say maybe about

0:35:21.676 --> 0:35:22.356
<v Speaker 2>three percent.

0:35:22.836 --> 0:35:27.316
<v Speaker 1>Okay. If I want to use generative AI to write

0:35:27.316 --> 0:35:29.756
<v Speaker 1>a cover letter to apply to work at Reality Defender,

0:35:29.836 --> 0:35:32.116
<v Speaker 1>but I don't want to get caught, what should I do?

0:35:33.156 --> 0:35:35.276
<v Speaker 2>Change about seventy five percent of the words.

0:35:34.996 --> 0:35:39.356
<v Speaker 1>Okay, who is Gabe Reagan?

0:35:41.476 --> 0:35:45.316
<v Speaker 2>Gabe was, I think it was, our VP of public

0:35:45.596 --> 0:35:48.476
<v Speaker 2>relations or something like that. He's a deep fake. We

0:35:48.516 --> 0:35:50.996
<v Speaker 2>created him as kind of a, as kind

0:35:51.036 --> 0:35:54.196
<v Speaker 2>of a fun joke. But obviously we tell everyone.

0:35:54.516 --> 0:35:56.636
<v Speaker 1>Tell me, tell me a little bit more about that.

0:35:57.756 --> 0:36:01.316
<v Speaker 2>If you go on certain websites where you put your

0:36:01.316 --> 0:36:05.436
<v Speaker 2>photo and maybe your job experience, there's quite a large

0:36:05.516 --> 0:36:10.036
<v Speaker 2>number of deep fake profiles on these websites like LinkedIn.

0:36:11.276 --> 0:36:16.996
<v Speaker 1>Yes, huh. Why would people be doing that?

0:36:17.076 --> 0:36:18.276
<v Speaker 2>Sorry, scammers.

0:36:18.676 --> 0:36:20.276
<v Speaker 1>I'm trying to think, how do you get money out

0:36:20.276 --> 0:36:22.196
<v Speaker 1>of people by having a fake LinkedIn account?

0:36:22.316 --> 0:36:25.396
<v Speaker 2>Oh, I can tell you. Let's say you start... the

0:36:25.476 --> 0:36:28.956
<v Speaker 2>most popular one that I'm aware of is, like, cryptocurrency.

0:36:28.996 --> 0:36:31.116
<v Speaker 2>Maybe you create a coin and you're like, here's a

0:36:31.236 --> 0:36:33.956
<v Speaker 2>CEO and here's this person and they have these great

0:36:33.956 --> 0:36:36.716
<v Speaker 2>LinkedIn profiles. Here's their photo and they're not real, but

0:36:36.876 --> 0:36:38.516
<v Speaker 2>it sells a story. Right.

0:36:41.076 --> 0:36:45.356
<v Speaker 1>Is it right that you founded a clothing company?

0:36:46.116 --> 0:36:46.436
<v Speaker 2>I did, yes.

0:36:46.516 --> 0:36:49.316
<v Speaker 1>What's one thing you learned about fashion from doing that?

0:36:50.876 --> 0:36:54.196
<v Speaker 2>It's much different than software development.

0:36:54.956 --> 0:36:57.196
<v Speaker 1>Sure, I don't think you needed to start a company

0:36:57.236 --> 0:37:00.756
<v Speaker 1>to learn that. I mean, the marginal cost is not

0:37:00.956 --> 0:37:01.956
<v Speaker 1>zero for one thing.

0:37:02.556 --> 0:37:05.436
<v Speaker 2>Yeah, the software is easy, you write some... It's not

0:37:05.636 --> 0:37:08.116
<v Speaker 2>easy at all. But what I mean is you're writing

0:37:08.196 --> 0:37:11.636
<v Speaker 2>some code and you ship it. Versus in fashion, you

0:37:11.676 --> 0:37:13.396
<v Speaker 2>have to have like you got to source the fabric.

0:37:13.436 --> 0:37:16.356
<v Speaker 2>You gotta design it, you gotta make the patterns,

0:37:16.356 --> 0:37:18.156
<v Speaker 2>you gotta cut it, sew it, make sure it fits.

0:37:18.236 --> 0:37:19.076
<v Speaker 2>It's a lot more work.

0:37:22.196 --> 0:37:24.196
<v Speaker 1>What are the chances that we exist in a simulation?

0:37:25.836 --> 0:37:27.236
<v Speaker 2>You know, I used to think this is kind of

0:37:27.276 --> 0:37:30.396
<v Speaker 2>a joke, but I don't know. I'm seeing, every

0:37:30.796 --> 0:37:34.716
<v Speaker 2>month it seems to get higher, from my perspective.

0:37:35.196 --> 0:37:36.076
<v Speaker 1>Why do you say that?

0:37:37.356 --> 0:37:40.156
<v Speaker 2>I'm seeing what's happening with tech and what we're building,

0:37:40.316 --> 0:37:43.076
<v Speaker 2>and you can see, there was one paper

0:37:43.116 --> 0:37:46.036
<v Speaker 2>where they took a bunch of agents and they gave

0:37:46.076 --> 0:37:47.836
<v Speaker 2>them all a job and they started to do it

0:37:47.876 --> 0:37:50.036
<v Speaker 2>and they just started to like create their own kind

0:37:50.116 --> 0:37:52.996
<v Speaker 2>of, like, workflows, right? I don't know, it should

0:37:53.036 --> 0:37:53.556
<v Speaker 2>be getting there.

0:37:53.876 --> 0:37:56.236
<v Speaker 1>So so it's like, well, if we can create a

0:37:56.316 --> 0:38:00.196
<v Speaker 1>simulation that seems like reality, maybe someone created a simulation

0:38:00.356 --> 0:38:05.156
<v Speaker 1>that is our reality. Exactly. Yeah. What do you wish

0:38:05.236 --> 0:38:06.756
<v Speaker 1>more people understood about AI?

0:38:07.716 --> 0:38:09.796
<v Speaker 2>I mean, it's a tool, and I don't think people

0:38:09.796 --> 0:38:12.556
<v Speaker 2>should be afraid of it. They should embrace it. And

0:38:12.996 --> 0:38:16.036
<v Speaker 2>you know, people are just running away from it.

0:38:16.036 --> 0:38:20.676
<v Speaker 2>It's fantastic, it's great. Embrace it. Just be careful. One

0:38:20.676 --> 0:38:22.836
<v Speaker 2>thing I like to tell, like, my friends and family,

0:38:23.036 --> 0:38:25.476
<v Speaker 2>especially with the deep fake audio, have a safe word.

0:38:25.516 --> 0:38:28.716
<v Speaker 2>If somebody calls you and you're like, that's weird, you know,

0:38:29.236 --> 0:38:31.356
<v Speaker 2>call him back or ask for a safe word.

0:38:31.876 --> 0:38:39.236
<v Speaker 1>What do you wish more people understood about reality?

0:38:39.916 --> 0:38:42.876
<v Speaker 2>Reality? I would say, just be aware that you exist. And

0:38:43.036 --> 0:38:45.316
<v Speaker 2>every day's a gift, So you should be excited that

0:38:45.476 --> 0:38:48.276
<v Speaker 2>you're here. Like, the chances of you existing, it's like

0:38:48.596 --> 0:38:52.116
<v Speaker 2>you've won the lottery a million times, So every day's

0:38:52.116 --> 0:38:52.476
<v Speaker 2>a gift.

0:38:56.836 --> 0:39:00.996
<v Speaker 1>Ali Shahriyari is the co founder and CTO at Reality Defender.

0:39:01.956 --> 0:39:05.236
<v Speaker 1>Today's show was produced by Gabriel Hunter Chang. It was

0:39:05.476 --> 0:39:08.916
<v Speaker 1>edited by Lydia Jean Kott and engineered by Sarah Bruguiere.

0:39:09.396 --> 0:39:13.036
<v Speaker 1>You can email us at problem at Pushkin dot fm.

0:39:13.116 --> 0:39:15.476
<v Speaker 1>I'm Jacob Goldstein and we'll be back next week with

0:39:15.516 --> 0:39:26.996
<v Speaker 1>another episode of What's Your Problem.