WEBVTT - A Conversation with Shiladitya Sircar from BlackBerry on DeepFake Threats

0:00:21.273 --> 0:00:24.423
<v S1>All right. So welcome, Shiladitya. Good to have you back

0:00:24.423 --> 0:00:25.893
<v S1>on unsupervised learning.

0:00:26.463 --> 0:00:28.623
<v S2>Yeah, I'm happy to be here. Thanks, Daniel.

0:00:29.463 --> 0:00:33.333
<v S1>So you're the senior VP of product engineering and data

0:00:33.333 --> 0:00:37.863
<v S1>science at BlackBerry, and you've been on UL before. So

0:00:37.863 --> 0:00:40.893
<v S1>good to have you back. And what I wanted to

0:00:40.893 --> 0:00:46.323
<v S1>talk to you about today is deepfakes. And basically what

0:00:46.323 --> 0:00:50.883
<v S1>you're seeing around that, and I guess starting off like

0:00:50.913 --> 0:00:54.693
<v S1>what are the main cyber threats that you see deepfakes, uh,

0:00:54.933 --> 0:00:56.463
<v S1>picking up for us?

0:00:57.303 --> 0:01:02.943
<v S2>Yeah. You know, I think we're constantly immersed in, um,

0:01:02.973 --> 0:01:06.303
<v S2>you know, this intricate dance between innovation and malicious intent.

0:01:06.303 --> 0:01:09.783
<v S2>And I think initially we were seeing that content was

0:01:09.783 --> 0:01:14.343
<v S2>getting generated, whether textual in nature, like better phishing, more

0:01:14.343 --> 0:01:18.903
<v S2>convincing phishing, I would say, or personalized phishing, if you will,

0:01:18.903 --> 0:01:23.433
<v S2>from a content perspective, trying to reflect your browsing habits.

0:01:23.463 --> 0:01:27.303
<v S2>You know, crafting a phishing email such

0:01:27.303 --> 0:01:31.083
<v S2>that you would probably click on it. And from there,

0:01:31.083 --> 0:01:35.223
<v S2>I think it's progressively gotten more sophisticated. And, you know,

0:01:35.253 --> 0:01:39.333
<v S2>with media, I think we're seeing, you know, generative AI

0:01:39.363 --> 0:01:44.373
<v S2>that is being used for generating content with multimodal

0:01:44.373 --> 0:01:49.023
<v S2>model technology. It basically revolutionized things. I mean, the idea was

0:01:49.023 --> 0:01:53.493
<v S2>that it was mostly for the entertainment and education industry.

0:01:53.493 --> 0:01:55.503
<v S2>But on the other hand, as we are seeing with

0:01:55.503 --> 0:02:00.213
<v S2>these deepfakes, it's not just limited to phishing and textual, uh,

0:02:00.213 --> 0:02:03.783
<v S2>type of attacks or social engineering attacks, but more powerful

0:02:03.783 --> 0:02:09.543
<v S2>sort of indistinguishable-reality-from-fiction type of attacks, where,

0:02:09.693 --> 0:02:14.193
<v S2>you know, deepfakes cause this dystopian vision that is becoming

0:02:14.193 --> 0:02:21.123
<v S2>a reality. Now malicious actors are creating highly convincing videos, audios,

0:02:21.453 --> 0:02:25.803
<v S2>of individuals saying things that they would never say or do.

0:02:25.833 --> 0:02:30.843
<v S2>Identity theft has been a main facet of that, which

0:02:30.843 --> 0:02:36.693
<v S2>is sort of coming into effect with these deep voice fakes.

0:02:36.693 --> 0:02:40.053
<v S2>So yeah, so I think it sort of started with

0:02:40.053 --> 0:02:44.763
<v S2>deception and is now in a full form of, um,

0:02:44.763 --> 0:02:46.563
<v S2>identity compromise.

0:02:47.073 --> 0:02:47.553
<v S3>Mhm.

0:02:48.633 --> 0:02:51.993
<v S1>Yeah. And when you, when you talk about identity compromise, uh,

0:02:51.993 --> 0:02:54.843
<v S1>what do you mean? What type of attack would that

0:02:54.843 --> 0:02:57.363
<v S1>be? What does the scenario look like?

0:02:57.963 --> 0:03:01.803
<v S2>So, um, I think, you know, if you

0:03:01.803 --> 0:03:05.553
<v S2>look at, like, you know, um, the increasing number of deepfakes

0:03:05.553 --> 0:03:11.013
<v S2>that we see in social media, even things that are, um,

0:03:11.013 --> 0:03:14.973
<v S2>you know, pretty benign, like people trying to take a few, uh,

0:03:14.973 --> 0:03:18.663
<v S2>years off, um, in terms of

0:03:18.663 --> 0:03:21.693
<v S2>looking younger or, you know,

0:03:21.723 --> 0:03:24.903
<v S2>seeing how they would look if they get older, uh,

0:03:24.903 --> 0:03:30.153
<v S2>even beyond these benign activities, the most concerning development

0:03:30.153 --> 0:03:33.813
<v S2>is the ability to take some of the same technology

0:03:33.813 --> 0:03:38.133
<v S2>and apply it to voice, creating, you know, voice cloning, uh,

0:03:38.133 --> 0:03:40.833
<v S2>and voice fakes. And the reason why it is very

0:03:40.833 --> 0:03:45.993
<v S2>disturbing as a trend is because, uh, typically what you hear,

0:03:45.993 --> 0:03:48.873
<v S2>you're being trained, through all these years, to believe

0:03:48.903 --> 0:03:53.133
<v S2>is real. Like, if you recognize somebody's voice, we, um,

0:03:53.133 --> 0:03:57.603
<v S2>you know, our, our brains are trained to associate relationship

0:03:57.603 --> 0:04:03.243
<v S2>based on, uh, audio senses. So with deep voice or

0:04:03.243 --> 0:04:08.343
<v S2>voice cloning technology coming out, this enables cybercriminals

0:04:03.243 --> 0:04:08.343
<v S2>to create fake identities, um, you know, that get people

0:04:08.343 --> 0:04:13.893
<v S2>to disclose information that they otherwise would not, or to pass some biometric, um,

0:04:19.893 --> 0:04:22.863
<v S2>you know, voice checks and things like that. That's what

0:04:22.863 --> 0:04:26.973
<v S2>I mean by identity, um, yeah, cloning.

0:04:27.543 --> 0:04:29.493
<v S1>Yeah, that makes sense. One way I like to think

0:04:29.493 --> 0:04:34.683
<v S1>about this is to imagine, um, when I think about

0:04:34.683 --> 0:04:39.243
<v S1>what can happen from a deepfake. I like to think

0:04:39.243 --> 0:04:42.813
<v S1>less about the deepfake itself and just imagine the impact

0:04:42.813 --> 0:04:46.713
<v S1>that it would have. So, for example, one, uh, one

0:04:46.713 --> 0:04:49.503
<v S1>of the big things is, uh, BEC attacks. And this

0:04:49.503 --> 0:04:54.183
<v S1>is before AI, right, or before modern AI. So it

0:04:54.183 --> 0:04:57.723
<v S1>was like, um, you just send an email and say, hey,

0:04:57.723 --> 0:04:59.943
<v S1>the boss wants you to transfer this money because we're

0:04:59.943 --> 0:05:03.333
<v S1>doing this merger and it's really important. And if the

0:05:03.333 --> 0:05:07.983
<v S1>email was, uh, convincing enough, then that money would

0:05:07.983 --> 0:05:10.863
<v S1>transfer and they would lose, you know, thousands of

0:05:10.863 --> 0:05:14.673
<v S1>dollars or millions of dollars or whatever. So that would

0:05:14.673 --> 0:05:18.483
<v S1>be one. Um, and then there's other things like, uh,

0:05:19.053 --> 0:05:22.053
<v S1>you're convinced to vote a certain way, you're convinced to

0:05:22.083 --> 0:05:25.743
<v S1>have a certain opinion, a positive or negative opinion about

0:05:25.743 --> 0:05:29.163
<v S1>a person. So I like to think about the impact

0:05:29.163 --> 0:05:31.473
<v S1>of it and then be like, okay, so how do

0:05:31.473 --> 0:05:36.453
<v S1>we defend against that? Mhm. Um, because there's multiple ways

0:05:36.453 --> 0:05:39.723
<v S1>to trick you into doing something like somebody could just

0:05:39.723 --> 0:05:42.033
<v S1>get on and it's not a deepfake at all.

0:05:42.033 --> 0:05:45.723
<v S1>They just convince you that you should transfer this money

0:05:45.933 --> 0:05:49.623
<v S1>like like oh you should buy this real estate. It's, uh,

0:05:49.653 --> 0:05:52.233
<v S1>you know, it's Oceanside, but somehow it's in the middle

0:05:52.233 --> 0:05:54.723
<v S1>of the country and there's no ocean, but they're just

0:05:54.723 --> 0:05:57.633
<v S1>really good at talking. So they convince you. So it's

0:05:57.633 --> 0:06:00.393
<v S1>like the technique to get you to do the thing

0:06:01.113 --> 0:06:03.393
<v S1>might not be the best place to look for it,

0:06:03.393 --> 0:06:07.983
<v S1>because there's so many of those techniques. The question is

0:06:07.983 --> 0:06:13.233
<v S1>the money transfer, the vote, the, um, opening up access

0:06:13.233 --> 0:06:15.573
<v S1>to an attacker to, like, hey, I need you to

0:06:15.603 --> 0:06:19.353
<v S1>turn on remote access so I can get access in. Well,

0:06:19.353 --> 0:06:21.213
<v S1>that that would be the flag, right? What do you

0:06:21.213 --> 0:06:24.093
<v S1>think about that sort of mental framework?

0:06:24.423 --> 0:06:27.243
<v S2>Yeah, I think, you know, that is

0:06:27.243 --> 0:06:32.283
<v S2>definitely the correct mental framework. I think we're talking about the, um,

0:06:33.123 --> 0:06:39.963
<v S2>the malicious intent largely has not changed. Right. Whether, like

0:06:39.963 --> 0:06:42.903
<v S2>you said, it's, you know, getting people

0:06:42.903 --> 0:06:45.363
<v S2>to do something that they otherwise would not, in

0:06:45.363 --> 0:06:50.973
<v S2>very simple terms. Right. And the speed, if you could map it

0:06:50.973 --> 0:06:55.353
<v S2>to the act of convincing, the speed at which you

0:06:55.353 --> 0:06:59.673
<v S2>convince somebody, has definitely increased because

0:06:59.673 --> 0:07:05.823
<v S2>of these audio visual, these perceptive senses, which we believe

0:07:05.853 --> 0:07:08.883
<v S2>as real. What you see is, you know, what the

0:07:08.883 --> 0:07:12.783
<v S2>reality looks like, and when that is being questioned,

0:07:12.813 --> 0:07:15.783
<v S2>you know, that sort of definitely, you know, gets it

0:07:15.783 --> 0:07:18.213
<v S2>to a point where you are now going to be

0:07:18.243 --> 0:07:22.953
<v S2>targeting the different aspects to take advantage of. You talked

0:07:22.953 --> 0:07:28.173
<v S2>about financial, social, uh, defamation, personal attacks, like, you know,

0:07:28.203 --> 0:07:32.103
<v S2>all of these put together, I think, is

0:07:32.103 --> 0:07:35.973
<v S2>now the landscape that these actors are operating with some

0:07:35.973 --> 0:07:38.433
<v S2>of these technologies. And I think, you know, the most

0:07:38.433 --> 0:07:42.363
<v S2>concerning aspect, I'm convinced that, you know, as the technology evolves, like,

0:07:42.393 --> 0:07:44.013
<v S2>you know, there will always be this cat and mouse.

0:07:44.013 --> 0:07:46.983
<v S2>But the most concerning aspect, I think, of these deepfakes

0:07:47.013 --> 0:07:53.223
<v S2>is the potential for eroding trust. Trust in systems

0:07:53.223 --> 0:07:56.793
<v S2>that are legitimate, that are true. And

0:07:56.793 --> 0:08:00.873
<v S2>I think, you know, that is one of the more intangible, uh,

0:08:00.873 --> 0:08:03.303
<v S2>effects of this technology, I think.

0:08:03.723 --> 0:08:07.413
<v S1>Yeah. So how do you see these being used in

0:08:07.413 --> 0:08:11.343
<v S1>attack chains? So we already have existing attack chains. How

0:08:11.343 --> 0:08:14.043
<v S1>do you see these being added in or like augmented

0:08:14.043 --> 0:08:15.903
<v S1>with this technology?

0:08:16.113 --> 0:08:19.473
<v S2>Yeah. So I think, you know, we started the discussion with,

0:08:19.503 --> 0:08:23.253
<v S2>you know, phishing. Um, we talked about this disturbing trend

0:08:23.283 --> 0:08:27.423
<v S2>with textual content. Now we're seeing it with video. Um,

0:08:27.423 --> 0:08:30.363
<v S2>and we're seeing it with voice. So voice we already talked about,

0:08:30.363 --> 0:08:34.143
<v S2>for example, identity masquerading, for example, you know, faking

0:08:34.143 --> 0:08:37.263
<v S2>voice identity. We've already seen in the media, for example,

0:08:37.293 --> 0:08:39.813
<v S2>some of these things playing out with millions of dollars

0:08:40.353 --> 0:08:45.933
<v S2>or potentially private information being disclosed. Right. As cyber financial crimes,

0:08:45.933 --> 0:08:51.183
<v S2>for instance. Right. Um, and in video like, you know, uh,

0:08:51.183 --> 0:08:56.943
<v S2>elections coming up in both Canada and the United States, you know, uh, this, um,

0:08:56.943 --> 0:09:00.573
<v S2>this disinformation or spread of disinformation at this, at this

0:09:00.573 --> 0:09:05.073
<v S2>speed is changing public opinion visually. Uh, these are sort

0:09:05.073 --> 0:09:09.603
<v S2>of these, uh, attack vectors, I would say, like changing perception,

0:09:09.603 --> 0:09:14.763
<v S2>changing or distorting reality. Um, in a sense, for

0:09:14.763 --> 0:09:18.873
<v S2>the masses, and also in financial crimes like, you know,

0:09:18.903 --> 0:09:25.803
<v S2>leveraging this technology, uh, to defame brands, uh, create a

0:09:25.833 --> 0:09:28.443
<v S2>direct financial impact, like, you know, you talked about millions of

0:09:25.833 --> 0:09:28.443
<v S2>dollars getting siphoned. So it's all motivated

0:09:28.443 --> 0:09:34.683
<v S2>in those areas. So, um, and we're seeing, you know,

0:09:38.883 --> 0:09:41.973
<v S2>the reality. It's no longer hypothetical. Um.

0:09:42.543 --> 0:09:46.083
<v S1>Yeah, absolutely. So there was, uh, one recently with, uh, Ferrari.

0:09:46.113 --> 0:09:48.813
<v S1>I don't know if you saw that one. It was, uh,

0:09:49.443 --> 0:09:55.743
<v S1>it was basically somebody masqueraded the CEO's voice, um, on

0:09:55.743 --> 0:09:58.143
<v S1>a phone call, and they actually had

0:09:58.143 --> 0:10:00.933
<v S1>done a bunch of stuff on WhatsApp first to get

0:10:00.933 --> 0:10:03.303
<v S1>them to the point of almost doing this thing that

0:10:03.303 --> 0:10:06.213
<v S1>they wanted. And then the final step was, hey, I

0:10:06.243 --> 0:10:08.793
<v S1>need to talk to you on the phone. So it

0:10:08.793 --> 0:10:12.813
<v S1>was Ferrari. So they're Italian, so they have the voice

0:10:12.813 --> 0:10:15.963
<v S1>of the person. They also have the right accent from

0:10:15.963 --> 0:10:20.433
<v S1>the right part of Italy. Mhm. So, um, it was

0:10:20.433 --> 0:10:23.973
<v S1>fairly convincing, but something made the executive that they were

0:10:24.003 --> 0:10:27.753
<v S1>talking to and trying to trick question it.

0:10:27.753 --> 0:10:30.813
<v S1>So they asked him a personal question that they knew

0:10:30.933 --> 0:10:35.433
<v S1>about the CEO. Mhm. And the fake CEO, the

0:10:35.463 --> 0:10:40.143
<v S1>deepfake, couldn't answer. So they ended the call. Yeah. So

0:10:40.173 --> 0:10:43.773
<v S1>so I think um that was a lucky case that

0:10:43.773 --> 0:10:48.603
<v S1>you had somebody who was suspicious. But have you seen

0:10:48.603 --> 0:10:52.353
<v S1>other similar sort of attacks where um, I guess there

0:10:52.353 --> 0:10:55.443
<v S1>was also the one where someone was convinced to send money.

0:10:55.443 --> 0:10:57.363
<v S1>I think they actually did convince them to send.

0:10:57.363 --> 0:11:01.953
<v S2>Money, at a British, um, British firm. Yeah. Yeah. That's right.

0:11:02.013 --> 0:11:04.473
<v S2>That one, yeah. You know, I think the Ferrari one is

0:11:04.473 --> 0:11:09.093
<v S2>actually a very good case study. Like, I think, you know,

0:11:09.123 --> 0:11:11.643
<v S2>in that one, I haven't looked fully into it. But

0:11:11.673 --> 0:11:15.603
<v S2>from the brief, um, articles that I read,

0:11:15.603 --> 0:11:22.563
<v S2>it seemed like the suspicious element was the subtle mechanical, uh,

0:11:22.563 --> 0:11:27.843
<v S2>intonations in the voice that were detected, which sort of

0:11:27.873 --> 0:11:32.043
<v S2>got them questioning. And, and it's great

0:11:32.043 --> 0:11:36.813
<v S2>for this person to have asked a, um, a shared secret,

0:11:36.813 --> 0:11:39.783
<v S2>if you will, or a question that an impostor

0:11:39.783 --> 0:11:43.353
<v S2>would have otherwise not known. Um, but you know what? Like,

0:11:43.383 --> 0:11:47.073
<v S2>you know, my first thought reading that was like, uh,

0:11:47.373 --> 0:11:51.693
<v S2>this is real time voice cloning, right? As you're speaking. Uh,

0:11:51.693 --> 0:11:55.743
<v S2>it's sort of generating this. Right? So the compute cycles

0:11:55.743 --> 0:11:58.953
<v S2>that are required, typically, for audio latencies to work are

0:11:58.953 --> 0:12:01.953
<v S2>anywhere from 9 to 42 milliseconds for a continuous stream

0:12:01.953 --> 0:12:06.003
<v S2>of communication. And this will get progressively better. Like if

0:12:06.003 --> 0:12:08.463
<v S2>it were a static recording that you're playing like a

0:12:08.493 --> 0:12:12.213
<v S2>video. Um, that voice overlay or an audio,

0:12:12.213 --> 0:12:16.113
<v S2>for example, uh, you know, uh, or audio voice notes,

0:12:16.113 --> 0:12:19.503
<v S2>for example. Uh, they would be spot on because it

0:12:19.503 --> 0:12:22.473
<v S2>would have had the compute power necessary, and the algorithms

0:12:22.473 --> 0:12:25.293
<v S2>that we have today that are used by these

0:12:25.293 --> 0:12:28.983
<v S2>threat actors. But I bet that this particular technology of

0:12:28.983 --> 0:12:32.763
<v S2>real time cloning, which is, as I'm speaking, it is

0:12:32.793 --> 0:12:37.863
<v S2>sort of, um, transferring the audio nuances to somebody

0:12:37.863 --> 0:12:43.173
<v S2>else's voice, would just get better, uh, over time. So

0:12:43.173 --> 0:12:46.713
<v S2>this is quite, quite concerning. But the other one, I think,

0:12:46.743 --> 0:12:49.413
<v S2>you know, there is, I came across Resemble AI, or

0:12:49.413 --> 0:12:52.083
<v S2>Resemble, I forget, like, you know, they track quite a

0:12:52.083 --> 0:12:56.163
<v S2>few of these incidents like, you know, worldwide. Um, for

0:12:56.163 --> 0:13:03.183
<v S2>many of these fakes, whether it's robocalls or AI misinformation. Um, and,

0:13:03.213 --> 0:13:05.823
<v S2>you know, I recently read a report from, from Deloitte,

0:13:05.823 --> 0:13:11.043
<v S2>I think the fastest growing forms of adversarial AI, uh, like,

0:13:11.073 --> 0:13:15.333
<v S2>you know, deepfake related, are all in financial crimes, like

0:13:15.333 --> 0:13:19.773
<v S2>financial losses. And they're projecting north of

0:13:19.773 --> 0:13:26.433
<v S2>12 billion dollars in 2023, uh, which was the case, and 40 billion, um,

0:13:26.463 --> 0:13:30.963
<v S2>by 2027 in aggregate. Which is, you know,

0:13:30.963 --> 0:13:34.203
<v S2>growing at an astounding rate, I don't know, like

0:13:34.323 --> 0:13:41.493
<v S2>12 to 40, like a 25, 30% compounding rate. Yeah. Um,

0:13:41.493 --> 0:13:44.343
<v S2>and you know, in one of those reports, you know,

0:13:44.703 --> 0:13:50.793
<v S2>especially for the financial crimes, Deloitte reported these fakes are proliferating, um,

0:13:50.823 --> 0:13:54.303
<v S2>mostly in banking and financial services as being the

0:13:54.303 --> 0:13:56.313
<v S2>main target. Hmm.

0:13:57.303 --> 0:14:03.933
<v S1>Interesting. So as the stuff gets better and like you said,

0:14:03.933 --> 0:14:09.483
<v S1>it becomes indistinguishable. Like there's no way to tell the difference. Mhm. Um,

0:14:10.383 --> 0:14:13.743
<v S1>so one thing is the voice sounds better, but

0:14:13.833 --> 0:14:18.993
<v S1>what I'm getting at is, if there's extra context, the more

0:14:19.023 --> 0:14:21.843
<v S1>the attacker knows about the target. So imagine this is

0:14:21.843 --> 0:14:25.203
<v S1>a fully automated AI attack. That's even worse. So it's

0:14:25.203 --> 0:14:28.233
<v S1>not even a voice, a real time clone of a person.

0:14:28.233 --> 0:14:32.163
<v S1>It's actually like an AI agent that's just calling and

0:14:32.163 --> 0:14:35.613
<v S1>trying to get things to happen. But it

0:14:35.613 --> 0:14:39.273
<v S1>has been given a full database about everything about you

0:14:39.273 --> 0:14:43.653
<v S1>or about me or about this, uh, Italian executive. So

0:14:43.653 --> 0:14:45.333
<v S1>it knows, like, the name of their dog because it

0:14:45.333 --> 0:14:48.483
<v S1>got it from, like, Instagram. Right? And it's got all

0:14:48.483 --> 0:14:51.393
<v S1>this personal data about like, it knows the wife's name

0:14:51.393 --> 0:14:56.373
<v S1>and everything or whatever. So how do you defend

0:14:56.373 --> 0:14:59.623
<v S1>against that? So let's say it's a perfect voice. It's

0:14:59.623 --> 0:15:04.363
<v S1>perfectly real time, but it also has deep knowledge from

0:15:04.573 --> 0:15:08.293
<v S1>open source intelligence about the actual target.

0:15:08.323 --> 0:15:09.643
<v S4>Yeah, yeah.

0:15:09.883 --> 0:15:12.703
<v S2>No, I think, you know, it's, uh, as you're saying, right?

0:15:12.733 --> 0:15:17.323
<v S2>I mean, this type of, uh, generative AI, adversarial AI,

0:15:17.353 --> 0:15:20.983
<v S2>you know, it creates new attack vectors with all of

0:15:20.983 --> 0:15:25.603
<v S2>this information, multimodal information that no one sees coming, and

0:15:25.603 --> 0:15:30.733
<v S2>it creates a more complex, nuanced threat landscape, um, that,

0:15:30.763 --> 0:15:37.063
<v S2>you know, prioritizes identity driven attacks, and it'll only get better. Um,

0:15:37.063 --> 0:15:41.713
<v S2>so in the short term, the way I think about

0:15:41.713 --> 0:15:44.983
<v S2>this, and most companies that I talk to, their

0:15:44.983 --> 0:15:48.553
<v S2>CISOs, it's like training, right? The first thing is at

0:15:48.583 --> 0:15:55.003
<v S2>least deepfake detection training, which is recognizing inconsistencies in facial expression, uh,

0:15:55.053 --> 0:16:00.933
<v S2>audio quality or video quality. Uh, before disclosing sensitive information, uh,

0:16:00.933 --> 0:16:06.033
<v S2>validate using a verification protocol, uh, of contacting them through

0:16:06.033 --> 0:16:09.783
<v S2>known means, like, for example, through their, uh, known contacts,

0:16:09.813 --> 0:16:12.723
<v S2>like phone or ask them questions like you talked about

0:16:12.723 --> 0:16:15.303
<v S2>this shared secret, like ask them something that you know

0:16:15.333 --> 0:16:18.693
<v S2>they would otherwise have not shared or disclosed. Uh, create

0:16:18.693 --> 0:16:25.023
<v S2>these validation, uh, protocols, um, and have people be

0:16:25.023 --> 0:16:31.773
<v S2>more aware of social engineering from, from an awareness perspective. Right. Um,

0:16:32.733 --> 0:16:35.433
<v S2>I mean, that's the, you know, that's just to

0:16:35.433 --> 0:16:39.843
<v S2>get to the short term problem, just with exposure and training, really.
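
The "validation protocol" idea here, a shared secret or out-of-band check, can be made concrete. One standard building block is an HMAC-based one-time password (HOTP, RFC 4226): both parties hold a pre-shared key, so a caller can be challenged for a code that a voice clone alone cannot produce. Pairing HOTP with caller verification is our illustration, not something named in the conversation; a minimal Python sketch:

```python
import hmac
import hashlib
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HMAC-based one-time password: derive a short numeric
    code from a pre-shared secret and a counter agreed out of band."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                 # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# RFC 4226 test-vector secret, used here purely for illustration.
secret = b"12345678901234567890"
print(hotp(secret, 0))  # → 755224 (RFC 4226 Appendix D test vector)
```

The callee asks the caller for the code for a fresh counter value; only someone holding the secret, not merely someone who sounds right, can answer.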

0:16:39.843 --> 0:16:45.033
<v S2>But on a broader term, I think social engineering awareness, uh,

0:16:45.033 --> 0:16:49.143
<v S2>basically should drive these regulatory verification processes. I think

0:16:49.143 --> 0:16:52.173
<v S2>the governments have a part to play here, to

0:16:52.203 --> 0:16:59.613
<v S2>make safeguards around content provenance or identity, um, you know, spoofing using AI, uh,

0:16:59.613 --> 0:17:03.333
<v S2>mandatory. Like in Canada,

0:17:03.333 --> 0:17:07.143
<v S2>for example, I can say, uh, most of the provinces

0:17:07.143 --> 0:17:11.253
<v S2>have enacted legislation on sharing non-consensual media, for example. So

0:17:11.253 --> 0:17:14.763
<v S2>the acts are already there, uh, in identity fakes, for example.

0:17:14.763 --> 0:17:18.123
<v S2>But that needs to be extended to AI. Um, the No

0:17:18.153 --> 0:17:21.873
<v S2>AI Fraud Act, uh, that was introduced in the US

0:17:21.873 --> 0:17:24.783
<v S2>House of Representatives, I think, uh, earlier this year I

0:17:24.783 --> 0:17:29.553
<v S2>believe is a good first step. Right. Uh, fraud itself.

0:17:29.613 --> 0:17:32.253
<v S2>There are regulations around that. There's laws around it. We

0:17:32.253 --> 0:17:35.763
<v S2>just need the governments to to catch up to the

0:17:35.763 --> 0:17:39.633
<v S2>level at which the technology is progressing and, and create

0:17:39.633 --> 0:17:43.773
<v S2>it within, within the AI framework. Um, yeah.

0:17:43.983 --> 0:17:46.773
<v S1>I I'm sorry. Go ahead.

0:17:47.013 --> 0:17:49.113
<v S2>No, no, I was just saying that, you know, training

0:17:49.113 --> 0:17:52.173
<v S2>and then the regulatory criteria. But there are things in

0:17:52.173 --> 0:17:54.393
<v S2>the technology side as well that we can do. But perhaps,

0:17:54.423 --> 0:18:00.003
<v S2>you know, immediately, uh, there are these two things

0:18:00.003 --> 0:18:01.173
<v S2>that come to mind.

0:18:02.223 --> 0:18:05.883
<v S1>Yeah. Well, one thing I worry about there is that

0:18:06.873 --> 0:18:09.693
<v S1>I agree that government is going to get involved and

0:18:09.693 --> 0:18:15.033
<v S1>should get involved, but I think about spam calls in

0:18:15.033 --> 0:18:16.863
<v S1>the US. I don't know about Canada in the US.

0:18:16.893 --> 0:18:21.753
<v S1>It's still very bad. Um, I basically have an allow list. Mhm. Um,

0:18:21.783 --> 0:18:26.013
<v S1>and all other calls just go directly to voicemail because

0:18:26.013 --> 0:18:30.903
<v S1>I couldn't handle it any other way. Mhm. Um spam

0:18:30.903 --> 0:18:35.283
<v S1>calls are already illegal. Fraud is already illegal. So I

0:18:35.313 --> 0:18:39.573
<v S1>if the government says it's illegal to do fraud with AI,

0:18:40.383 --> 0:18:43.113
<v S1>if it's already illegal to do fraud, I'm not sure

0:18:43.113 --> 0:18:46.983
<v S1>what exactly. Like who would be willing to do the

0:18:46.983 --> 0:18:51.843
<v S1>fraud with AI. Mhm. Even though it's illegal. But once

0:18:51.843 --> 0:18:54.153
<v S1>the new law came out they would be like oh

0:18:54.183 --> 0:18:56.133
<v S1>well now it's illegal with AI so I'm not going

0:18:56.163 --> 0:18:56.853
<v S1>to do it.

0:18:57.063 --> 0:18:58.623
<v S3>Mhm. Yeah.

0:18:58.653 --> 0:19:01.953
<v S2>No, I think, you know, and that's where, so

0:19:01.953 --> 0:19:03.963
<v S2>I think that's a great point. So the first

0:19:03.963 --> 0:19:06.693
<v S2>thing is you know educate. Right. We all need to

0:19:06.693 --> 0:19:09.903
<v S2>be aware that reality is getting distorted and be aware

0:19:09.903 --> 0:19:12.423
<v S2>of our surroundings. Yes. Number two we do need some

0:19:12.423 --> 0:19:16.053
<v S2>guardrails regulatory guardrails. Right. So that, you know, at least

0:19:16.083 --> 0:19:20.733
<v S2>it it puts some checks and balances. Or if somebody

0:19:20.763 --> 0:19:24.093
<v S2>were to to file a complaint, there is a legal

0:19:24.093 --> 0:19:26.343
<v S2>framework to act upon. So today for example, if I

0:19:26.373 --> 0:19:29.733
<v S2>went and said something like this happened. Um, the legal

0:19:29.733 --> 0:19:34.803
<v S2>framework is not there to support, uh, let's say the

0:19:34.803 --> 0:19:37.323
<v S2>legal proceedings that would follow this.

0:19:37.623 --> 0:19:38.733
<v S3>Yeah, that's a good point.

0:19:38.763 --> 0:19:41.463
<v S2>Yeah. But also there is a technology side to it, right. Like,

0:19:41.493 --> 0:19:43.413
<v S2>I mean, if you take a step back and if

0:19:43.413 --> 0:19:46.233
<v S2>you think about it, look, it will get progressively better.

0:19:46.233 --> 0:19:48.693
<v S2>And why do I think about it that way? Because,

0:19:48.723 --> 0:19:51.453
<v S2>you know, this branch of generative AI, you know, works

0:19:51.693 --> 0:19:55.563
<v S2>in a way like, you know, we talked about this,

0:19:55.563 --> 0:19:59.433
<v S2>I think, in our previous discussion about adversarial networks. Right.

0:19:59.463 --> 0:20:05.403
<v S2>Adversarial networks, GANs, or various types of autoencoders,

0:20:05.403 --> 0:20:07.263
<v S2>for example. Typically, the way it works is like you

0:20:07.263 --> 0:20:11.313
<v S2>have a pair of networks, right? One, you know, deep

0:20:11.313 --> 0:20:14.643
<v S2>neural network is called the generator that's generating this content.

0:20:14.643 --> 0:20:17.403
<v S2>And the other one is a discriminator and it's critiquing

0:20:17.403 --> 0:20:20.103
<v S2>the content. That's the simplest way to think about it. So the

0:20:20.103 --> 0:20:24.003
<v S2>generator will get better and better as the discriminator gets

0:20:24.003 --> 0:20:28.473
<v S2>better at critiquing the generation. Right. Yeah. Zero sum game.
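
The generator/discriminator loop described here can be sketched in a few lines. This is a toy 1-D GAN in numpy, our illustration only: the "real" data is an assumed Gaussian, and the learning rate, batch size, and step count are arbitrary choices, not anything from the conversation or from real deepfake systems.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
REAL_MU, REAL_SIGMA = 4.0, 1.0        # assumed "real" data distribution
g_mu, g_sigma = 0.0, 1.0              # generator: z -> g_mu + g_sigma * z
w, c = 0.1, 0.0                       # discriminator: D(x) = sigmoid(w*x + c)
lr, batch, steps = 0.02, 64, 1500     # arbitrary toy hyperparameters

for _ in range(steps):
    z = rng.standard_normal(batch)
    real = REAL_MU + REAL_SIGMA * rng.standard_normal(batch)
    fake = g_mu + g_sigma * z

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: push D(fake) toward 1 (non-saturating loss).
    d_fake = sigmoid(w * fake + c)
    g_mu += lr * np.mean((1 - d_fake) * w)
    g_sigma += lr * np.mean((1 - d_fake) * w * z)

print(round(g_mu, 2))  # generator mean, pulled toward REAL_MU by the loop
```

Each side's improvement is the other side's training signal, which is exactly why detection technology built this way also teaches generators to evade it.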

0:20:28.473 --> 0:20:31.803
<v S2>So the very technology that is sort of enabling us to

0:20:31.833 --> 0:20:34.173
<v S2>detect and identify what is a fake and what is

0:20:34.173 --> 0:20:38.703
<v S2>real is itself assisting in making the content better.

0:20:38.703 --> 0:20:41.223
<v S2>So that's why, you know, I think about this as,

0:20:41.253 --> 0:20:46.653
<v S2>as the need to accelerate the framework, the regulatory framework

0:20:46.683 --> 0:20:52.113
<v S2>for authenticity detection technology. So accelerate the innovation on cryptographically

0:20:52.113 --> 0:20:54.963
<v S2>securing generative AI content through fingerprinting.
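
A minimal sketch of the cryptographic fingerprinting idea, using only Python's standard library. Real provenance schemes (for example the C2PA standard) use public-key signatures and certificate chains; the shared-key HMAC and the key itself here are simplifying assumptions for illustration.

```python
import hmac
import hashlib

# Hypothetical key held by the content producer; a real provenance
# system would use an asymmetric key pair, not a shared secret.
PRODUCER_KEY = b"demo-signing-key"

def fingerprint(media: bytes) -> str:
    """Bind a keyed fingerprint (HMAC-SHA256) to a piece of media."""
    return hmac.new(PRODUCER_KEY, media, hashlib.sha256).hexdigest()

def verify(media: bytes, tag: str) -> bool:
    """Recompute the fingerprint and compare in constant time."""
    return hmac.compare_digest(fingerprint(media), tag)

original = b"CEO video frame bytes"
tag = fingerprint(original)
print(verify(original, tag))              # → True: content is untouched
print(verify(b"deepfaked frames", tag))   # → False: content was altered
```

A player or browser could then surface a simple pass/fail indicator, much like the SSL lock icon analogy that comes up later in the conversation.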

0:20:55.203 --> 0:20:56.733
<v S1>Okay. I like that.

0:20:56.763 --> 0:21:00.423
<v S2>Similarly, you know, the government regulatory criteria can come in

0:21:00.423 --> 0:21:03.483
<v S2>and enforce the authenticity. Like we all need to learn

0:21:03.483 --> 0:21:06.513
<v S2>how to verify, you know, what is valid and what

0:21:06.513 --> 0:21:08.763
<v S2>is authentic content. Just look at how we learned how

0:21:08.763 --> 0:21:10.983
<v S2>to check if a browser is safe, right? There is

0:21:10.983 --> 0:21:11.673
<v S2>a little.

0:21:12.063 --> 0:21:12.873
<v S3>Um yep.

0:21:13.203 --> 0:21:16.353
<v S2>Lock at the left hand side. Oh, okay. It's safe. Right?

0:21:16.383 --> 0:21:18.843
<v S2>SSL and all that. Right. The same way we need

0:21:18.843 --> 0:21:20.553
<v S2>to sort of, you know, do the same thing to

0:21:20.583 --> 0:21:25.773
<v S2>learn how to check if a content media is safe. And,

0:21:25.803 --> 0:21:29.283
<v S2>you know, we consume our content through browsers. Right. I mean,

0:21:29.283 --> 0:21:33.483
<v S2>all of this is manifesting itself in the very

0:21:33.483 --> 0:21:37.563
<v S2>vehicle that's delivering it, which is these browsers. So there.

0:21:37.563 --> 0:21:38.343
<v S3>Is a way and.

0:21:38.343 --> 0:21:39.933
<v S1>Mobile apps and mobile apps.

0:21:39.963 --> 0:21:40.503
<v S3>And mobile.

0:21:40.503 --> 0:21:42.153
<v S2>Apps. But at the end of the day, there is

0:21:42.153 --> 0:21:46.023
<v S2>a way, and we consume most of the content

0:21:46.053 --> 0:21:50.913
<v S2>using these sort of instruments. So we can see how

0:21:50.913 --> 0:21:53.883
<v S2>I'm drawing these parallels, that if we sort of accelerate

0:21:53.913 --> 0:21:59.013
<v S2>the innovation on cryptographically securing and validating this

0:21:59.013 --> 0:22:04.233
<v S2>fingerprinted content, then we should be able to tackle this technology, right?

0:22:04.263 --> 0:22:04.743
<v S2>I mean.

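The "lock icon for media" idea above can be sketched in a few lines. This is a hypothetical toy: an HMAC with a shared device key stands in for the public-key signatures a real provenance scheme would use, and all names here are illustrative.

```python
import hashlib
import hmac

# Illustrative only: real devices keep signing keys in secure hardware.
DEVICE_KEY = b"device-enclave-secret"

def fingerprint(media: bytes) -> str:
    # The "fingerprint" is simply a cryptographic hash of the media bytes.
    return hashlib.sha256(media).hexdigest()

def sign_fingerprint(media: bytes) -> str:
    # The capture device tags the fingerprint with its key.
    return hmac.new(DEVICE_KEY, fingerprint(media).encode(), hashlib.sha256).hexdigest()

def verify(media: bytes, tag: str) -> bool:
    # The "lock icon" moment: recompute and compare in constant time.
    return hmac.compare_digest(sign_fingerprint(media), tag)

clip = b"raw video frames..."
tag = sign_fingerprint(clip)
assert verify(clip, tag)             # untouched content checks out
assert not verify(clip + b"x", tag)  # any edit breaks the fingerprint
```

The point of the sketch is the workflow, not the primitive: any later consumer can re-derive the fingerprint and check the tag, the same way a browser checks a certificate before showing the lock.
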
0:22:05.073 --> 0:22:07.983
<v S1>So I like that. Yeah. So when

0:22:07.983 --> 0:22:11.193
<v S1>you're saying legislation, you're not talking about making it illegal

0:22:11.193 --> 0:22:12.993
<v S1>to do bad things, because it's already illegal. No, no.

0:22:12.993 --> 0:22:13.893
<v S3>No. Yeah. Yeah.

0:22:13.923 --> 0:22:22.083
<v S1>You're talking about requiring, uh, technology providers to have an

0:22:22.083 --> 0:22:25.593
<v S1>authenticity mechanism. It's really funny you say that because I

0:22:25.593 --> 0:22:28.143
<v S1>was going to bring up a similar point. So, um,

0:22:28.143 --> 0:22:30.903
<v S1>we're in Zoom right now. You see, when I'm talking,

0:22:30.903 --> 0:22:34.053
<v S1>you see I have a green outline around me. Mhm. When

0:22:34.053 --> 0:22:37.383
<v S1>you talk, you have a green outline. And in

0:22:37.383 --> 0:22:40.803
<v S1>this case that's to indicate that that person is talking. Right.

0:22:40.833 --> 0:22:46.713
<v S1>Obviously however I've been thinking about exactly what you said.

0:22:46.713 --> 0:22:53.313
<v S1>Which is, um, what if Apple and Android and YouTube

0:22:53.553 --> 0:22:59.583
<v S1>and all of meta, they had a mechanism where, um,

0:23:00.243 --> 0:23:03.093
<v S1>so let's say I'm talking to you on the phone. Mhm.

0:23:03.123 --> 0:23:06.843
<v S1>When I initiate my phone call to you. Mhm. Um,

0:23:06.873 --> 0:23:10.863
<v S1>it's using my secure enclave on my phone. And so

0:23:10.893 --> 0:23:14.373
<v S1>it's verifying that I got in with my face or

0:23:14.373 --> 0:23:18.033
<v S1>my fingerprint or, um, Touch ID or whatever.

0:23:18.063 --> 0:23:19.053
<v S2>Authenticity. Yeah.

0:23:19.293 --> 0:23:22.623
<v S1>Yeah. So it authenticates that. And then when the call

0:23:22.623 --> 0:23:25.623
<v S1>comes over to you, you see something, you see a

0:23:25.623 --> 0:23:28.023
<v S1>blue outline, you see a green outline, you see a

0:23:28.023 --> 0:23:31.353
<v S1>check mark, just like you said with the lock symbol.

0:23:31.383 --> 0:23:35.823
<v S1>So now when we're having a conversation, it's validated. So

0:23:35.853 --> 0:23:38.433
<v S1>same with this. Like you said, you could do

0:23:38.463 --> 0:23:41.553
<v S1>a real-time deepfake pretty soon. So

0:23:41.553 --> 0:23:44.763
<v S1>it looks like I'm talking to Shil, but it's not

0:23:44.763 --> 0:23:47.973
<v S1>actually you. But if there was a green outline or

0:23:47.973 --> 0:23:51.843
<v S1>a check mark, that would mean that some combination of

0:23:51.843 --> 0:23:58.623
<v S1>my operating system and zoom had validated and continued to validate.

0:23:58.743 --> 0:24:01.053
<v S1>So maybe in the middle of our conversation, we both

0:24:01.053 --> 0:24:04.203
<v S1>get prompts because we've been talking for ten minutes. We

0:24:04.203 --> 0:24:06.483
<v S1>have to re-authenticate the feed. Mhm.

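That periodic re-validation idea could look something like the sketch below: a call session that demands fresh attestation after a fixed interval, with a shorter interval once sensitive topics come up. All class names, fields, and thresholds here are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class CallSession:
    """Tracks when a participant's feed was last attested (times in seconds)."""
    last_attested: float = 0.0
    interval: float = 600.0  # re-attest every 10 minutes by default

    def mark_attested(self, now: float) -> None:
        self.last_attested = now

    def raise_sensitivity(self) -> None:
        # Sensitive topics up-level the policy: prompt more often.
        self.interval = min(self.interval, 120.0)

    def needs_reauth(self, now: float) -> bool:
        return now - self.last_attested >= self.interval

s = CallSession()
s.mark_attested(now=0.0)
assert not s.needs_reauth(now=300.0)  # 5 minutes in: still trusted
assert s.needs_reauth(now=600.0)      # 10 minutes: prompt both sides
s.mark_attested(now=600.0)
s.raise_sensitivity()                 # e.g. wire-transfer talk begins
assert s.needs_reauth(now=730.0)      # now only 2 minutes of trust
```

A real protocol would tie `mark_attested` to a hardware-backed check (Face ID, Touch ID) rather than a bare timestamp, but the policy logic would sit at roughly this layer.
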
0:24:07.173 --> 0:24:07.413
<v S3>Mhm.

0:24:07.443 --> 0:24:11.973
<v S2>So yeah, you know, you're spot on. And you

0:24:11.973 --> 0:24:17.313
<v S2>know I almost think of this as a joint government

0:24:17.313 --> 0:24:20.733
<v S2>and industry partnership. Like, you know, I have

0:24:20.733 --> 0:24:24.933
<v S2>the opinion that this is definitely a solvable problem. Yeah.

0:24:24.963 --> 0:24:28.713
<v S2>And as industry thought leaders, we have to establish

0:24:28.713 --> 0:24:33.963
<v S2>what this common identity assertion protocol is and standardize that.

0:24:33.963 --> 0:24:37.083
<v S2>And then all of these companies that are in the

0:24:37.083 --> 0:24:43.593
<v S2>business of creating media or transmitting media or exchanging media,

0:24:43.803 --> 0:24:48.183
<v S2>sort of, you know, um, adhere to it. Like, I mean,

0:24:48.213 --> 0:24:52.473
<v S2>you know, I'm, I'm actually quite, um, encouraged to see

0:24:52.473 --> 0:24:56.013
<v S2>DARPA move very recently, uh, to deal with at least the, uh,

0:24:56.013 --> 0:25:01.773
<v S2>face-swapping technology and the puppeteering technology, which is

0:25:01.773 --> 0:25:06.363
<v S2>also a phenomenally interesting branch of generative models, where

0:25:06.363 --> 0:25:08.943
<v S2>you are still the same person, your identity

0:25:08.943 --> 0:25:11.943
<v S2>is not swapped, but your expressions are. But to deal

0:25:11.943 --> 0:25:14.763
<v S2>with that, they initiated a new research program, I think called

0:25:14.763 --> 0:25:19.233
<v S2>the Media Forensics Research Acceleration Program, an R&amp;D program, if you will,

0:25:19.263 --> 0:25:24.963
<v S2>to identify fake digital visual media detection methods. Right. Um,

0:25:24.993 --> 0:25:28.293
<v S2>and that will tackle, you know, those sort of things.

0:25:28.293 --> 0:25:31.983
<v S2>But I think in order to deal with the identity

0:25:32.013 --> 0:25:34.143
<v S2>side of things or validating, I think, you know, what

0:25:34.173 --> 0:25:37.053
<v S2>what you're suggesting is a great way of

0:25:37.053 --> 0:25:39.663
<v S2>tackling that. In fact, I was recently reading a paper

0:25:39.693 --> 0:25:45.513
<v S2>like there is this cryptographic mechanism called um, uh, zero

0:25:45.513 --> 0:25:51.723
<v S2>knowledge proof. And the idea is quite simple. It's basically, um,

0:25:52.323 --> 0:25:55.413
<v S2>there is something that you know, and without

0:25:55.413 --> 0:25:59.673
<v S2>you disclosing what you know, I'm able to verify, uh,

0:25:59.673 --> 0:26:03.003
<v S2>whether your claims are true or not.

0:26:03.033 --> 0:26:04.413
<v S1>Almost like Diffie-Hellman.

0:26:05.193 --> 0:26:08.613
<v S2>Yeah, except in Diffie-Hellman... Yeah, yeah, except that there is.

0:26:08.823 --> 0:26:11.283
<v S1>But but, um. But the middle person doesn't get to

0:26:11.433 --> 0:26:13.113
<v S1>see the thing exchanged.

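A toy version of that idea, a Schnorr-style interactive proof where the prover convinces a verifier she knows a secret exponent x without ever revealing it, can be sketched as follows. The tiny hard-coded group parameters are purely for illustration; real deployments use large, standardized groups.

```python
import secrets

# Toy Schnorr identification protocol. Real systems use ~256-bit groups;
# p=23, g=2 (with subgroup order q=11) are chosen so the arithmetic is visible.
p, q, g = 23, 11, 2

x = 7             # prover's secret
y = pow(g, x, p)  # public key: the "claim" is knowledge of x

def prove(challenge_c: int, r: int) -> int:
    # The response reveals nothing about x on its own: s is masked by r.
    return (r + challenge_c * x) % q

def verify(t: int, c: int, s: int) -> bool:
    # Check g^s == t * y^c (mod p); holds iff the prover really knows x.
    return pow(g, s, p) == (t * pow(y, c, p)) % p

r = secrets.randbelow(q)  # prover's one-time nonce
t = pow(g, r, p)          # commitment, sent first
c = secrets.randbelow(q)  # verifier's random challenge
assert verify(t, c, prove(c, r))                 # honest prover passes
assert not verify(t, c, (prove(c, r) + 1) % q)   # a wrong response fails
```

The verifier learns only that the claim is true, never x itself, which is exactly the property the conversation is pointing at for attesting hardware without leaking private data.
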
0:26:13.263 --> 0:26:17.613
<v S2>Exactly. Yeah. And this zero knowledge proof is, is actually

0:26:17.613 --> 0:26:20.163
<v S2>applied in many different places. It's not new, but what

0:26:20.163 --> 0:26:23.433
<v S2>is new here is the application of zero knowledge proof

0:26:23.433 --> 0:26:29.403
<v S2>in authenticating and privacy-preserving hardware. Like, um, you know,

0:26:29.433 --> 0:26:32.463
<v S2>I came across a company that's sort of dabbling with

0:26:32.463 --> 0:26:37.323
<v S2>this called Snark. They're building zero-knowledge-proof microphones,

0:26:37.323 --> 0:26:40.413
<v S2>which you know, can prove the audio was indeed recorded

0:26:40.413 --> 0:26:44.223
<v S2>on that device. Some camera companies like Canon and Nikon

0:26:44.223 --> 0:26:48.093
<v S2>are dabbling with zero knowledge imaging technology, whereby they can

0:26:48.093 --> 0:26:52.173
<v S2>ascertain that this was actually captured using light rays coming

0:26:52.173 --> 0:26:53.583
<v S2>in through a camera lens.

0:26:53.613 --> 0:26:54.993
<v S3>Oh, interesting.

0:26:54.993 --> 0:26:58.173
<v S2>And or if it has been edited in a particular way.

0:26:58.173 --> 0:27:03.753
<v S2>But this whole audiovisual industry coalition around content provenance and authenticity

0:27:03.783 --> 0:27:06.393
<v S2>is a serious topic. And it is time that, you know,

0:27:06.423 --> 0:27:11.043
<v S2>we find ways to certify, uh, the source of digital content,

0:27:11.073 --> 0:27:14.943
<v S2>how it was generated. Um, and, as you just

0:27:14.943 --> 0:27:20.253
<v S2>talked about, ascertain it before these gadgets, like mobile devices and

0:27:20.253 --> 0:27:23.973
<v S2>other such things, are used to validate it. Right now,

0:27:23.973 --> 0:27:27.483
<v S2>if you think about it, privacy, uh, or identity

0:27:27.513 --> 0:27:31.263
<v S2>verification is only used for content that you own. Like

0:27:31.293 --> 0:27:34.803
<v S2>for example, my phone is going to ask

0:27:34.803 --> 0:27:36.993
<v S2>me for my password and a bunch of other things

0:27:36.993 --> 0:27:41.583
<v S2>before it shows me my stuff, right? Or opens. But

0:27:41.583 --> 0:27:44.073
<v S2>that has no bearing when I'm calling you, for example.

0:27:44.103 --> 0:27:47.763
<v S2>Like, you have no idea. Right. So this

0:27:47.763 --> 0:27:50.913
<v S2>thing, as you mentioned, is actually now very important:

0:27:50.913 --> 0:27:55.293
<v S2>all of this identity validation that happened on the phone.

0:27:55.953 --> 0:28:00.963
<v S2>This trust needs to be expanded into the other entity

0:28:00.963 --> 0:28:04.893
<v S2>that you're interacting with. So the trust network needs to be, um,

0:28:04.893 --> 0:28:10.713
<v S2>shared using whatever methodology that these companies choose. But yeah,

0:28:10.743 --> 0:28:12.033
<v S2>standardized.

0:28:12.303 --> 0:28:18.783
<v S1>Yeah. So I think that's correct. Um, I'm also thinking

0:28:18.783 --> 0:28:22.773
<v S1>about a thing that you said earlier when you mentioned puppeteering. Um, yeah.

0:28:22.803 --> 0:28:24.543
<v S1>I didn't know there was a name for this, but

0:28:24.543 --> 0:28:26.733
<v S1>I think it might be the same thing I'm thinking of.

0:28:26.763 --> 0:28:30.573
<v S1>So the avatar on the other side looks like

0:28:30.573 --> 0:28:32.283
<v S1>this young, like.

0:28:32.703 --> 0:28:33.363
<v S3>Um.

0:28:34.173 --> 0:28:38.883
<v S1>Like, uh, almost like an anime-looking, uh, influencer girl. And she's, like,

0:28:38.883 --> 0:28:41.703
<v S1>really animated and, you know, pretty and everything, and she's

0:28:41.703 --> 0:28:45.033
<v S1>talking about whatever the topic is. And then you see

0:28:45.063 --> 0:28:48.063
<v S1>right next to it, it's actually like a 47-year-old

0:28:48.093 --> 0:28:52.233
<v S1>male, and he's the one actually doing all of

0:28:52.233 --> 0:28:57.573
<v S1>the emoting and everything. And, um, it is real time

0:28:57.573 --> 0:29:04.293
<v S1>face swapping and real-time hand swapping, uh, costume, clothes, everything.

0:29:04.323 --> 0:29:09.543
<v S1>So that raises an interesting point. Uh, based on everything

0:29:09.543 --> 0:29:10.653
<v S1>we talked about.

0:29:11.103 --> 0:29:11.583
<v S3>At.

0:29:11.583 --> 0:29:18.603
<v S1>The start of the call, um, there was authentication that happened.

0:29:18.603 --> 0:29:22.893
<v S1>So they got a green box. Um, but then this

0:29:22.893 --> 0:29:26.253
<v S1>technology is now on. So now it looks like this

0:29:26.253 --> 0:29:30.723
<v S1>other person. Um, so what? This just got me thinking,

0:29:30.723 --> 0:29:33.273
<v S1>and I hadn't thought of this before. You know, this

0:29:33.273 --> 0:29:38.703
<v S1>reminds me of, uh, gaming situations where there's

0:29:38.703 --> 0:29:42.603
<v S1>so much hacking happening in games that, uh, a lot

0:29:42.633 --> 0:29:47.043
<v S1>of game vendors switched to basically having to run a rootkit.

0:29:47.313 --> 0:29:47.673
<v S3>Mhm.

0:29:48.423 --> 0:29:51.993
<v S1>So they need an end-to-end, top-to-bottom deep

0:29:51.993 --> 0:29:55.923
<v S1>kernel implementation to know that you do not have some

0:29:55.953 --> 0:30:01.473
<v S1>sort of shim. Yeah. Some sort of injection capability inside

0:30:01.473 --> 0:30:04.443
<v S1>of the thing. And it's looking at all the processes

0:30:04.443 --> 0:30:07.293
<v S1>that are running. It's looking for evidence of malware. It's

0:30:07.293 --> 0:30:11.463
<v S1>looking for evidence of tampering. So, so the question is

0:30:11.493 --> 0:30:13.533
<v S1>like if we start a video call and then I

0:30:13.533 --> 0:30:17.493
<v S1>start software like that. Mhm. That technology needs to be

0:30:17.493 --> 0:30:20.313
<v S1>able to know that I'm using the puppet technology and

0:30:20.313 --> 0:30:22.953
<v S1>that there's an interception and translation happening.

0:30:22.983 --> 0:30:23.553
<v S3>Correct.

0:30:23.583 --> 0:30:26.373
<v S2>Yeah. And I think, you know, that's a

0:30:26.373 --> 0:30:32.283
<v S2>very well-explained, uh, thought process. And

0:30:32.283 --> 0:30:36.063
<v S2>that's also one of the reasons why I think it's

0:30:36.063 --> 0:30:38.943
<v S2>not just that. And I talked about this zero knowledge

0:30:39.123 --> 0:30:44.403
<v S2>proof mechanism for authenticity built into the hardware. Um, because,

0:30:44.403 --> 0:30:49.353
<v S2>you see, if my camera is showing my video, and

0:30:49.353 --> 0:30:53.763
<v S2>if the camera hardware in the live stream authenticates

0:30:53.763 --> 0:30:58.203
<v S2>that the video stream is basically what it is processing,

0:30:58.203 --> 0:31:03.483
<v S2>using the, uh, the ZK hardware, uh, research I talked about,

0:31:03.513 --> 0:31:08.883
<v S2>then with any cross-stream or stream mixing in the middle, um,

0:31:08.883 --> 0:31:12.423
<v S2>the receiving software should be able to detect

0:31:12.423 --> 0:31:15.633
<v S2>that this is not what the camera captured. Right? Yes.

0:31:15.663 --> 0:31:18.663
<v S2>And that is the key, right? It you know, it's

0:31:18.663 --> 0:31:21.333
<v S2>not just at one level. It has to be that

0:31:21.333 --> 0:31:25.533
<v S2>the trust has to go all the way from the

0:31:25.533 --> 0:31:30.213
<v S2>physical level. Right? The lights and everything around here. To

0:31:30.243 --> 0:31:34.743
<v S2>what the camera sensor captures, to how the media gets digitized.

0:31:34.893 --> 0:31:38.583
<v S2>So just trying to tackle the digital media layer

0:31:38.583 --> 0:31:44.673
<v S2>is insufficient. It

0:31:44.673 --> 0:31:48.873
<v S2>needs to have the analog ancillary to also transport this

0:31:48.873 --> 0:31:54.603
<v S2>authenticity and validation mechanism back, for real-time communication,

0:31:55.023 --> 0:31:58.953
<v S2>whether it's an audio microphone or a video camera sensor. Yeah.

0:31:59.433 --> 0:32:01.953
<v S1>Yeah. I love what you're saying there, because I love

0:32:01.953 --> 0:32:05.283
<v S1>the fact that the hardware itself is involved. So to

0:32:05.283 --> 0:32:12.093
<v S1>your point about Canon and Nikon. So it's almost

0:32:12.093 --> 0:32:14.043
<v S1>like they would have their own version of like a

0:32:14.043 --> 0:32:17.733
<v S1>secure enclave or something similar where it's like, that's a

0:32:17.733 --> 0:32:21.333
<v S1>protected system. It's the one doing the signing

0:32:21.363 --> 0:32:24.123
<v S1>at the camera hardware level, which is part of a

0:32:24.123 --> 0:32:26.883
<v S1>later signature that is passed on. That's right. So it's

0:32:26.883 --> 0:32:29.883
<v S1>like this chain of custody where it's an unbroken thing.

0:32:29.913 --> 0:32:30.273
<v S3>Yeah.

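That unbroken chain-of-custody idea can be sketched as a hash chain, where each stage of the pipeline (sensor, OS, app) countersigns the previous link. This is a hypothetical illustration: HMACs with per-stage keys stand in for the enclave-backed public-key signatures a real pipeline would use, and the stage names are invented.

```python
import hashlib
import hmac

# Per-stage keys (illustrative; real devices would keep these in secure hardware).
KEYS = {"sensor": b"k-sensor", "os": b"k-os", "app": b"k-app"}

def link(stage: str, prev_sig: bytes, payload: bytes) -> bytes:
    # Each stage signs the previous link plus its own view of the media,
    # so tampering at any point breaks every later signature.
    return hmac.new(KEYS[stage], prev_sig + payload, hashlib.sha256).digest()

def custody_chain(media: bytes) -> bytes:
    sig = hashlib.sha256(media).digest()  # root: hash of the raw capture
    for stage in ("sensor", "os", "app"):
        sig = link(stage, sig, media)
    return sig

frame = b"frame-bytes"
expected = custody_chain(frame)
assert custody_chain(frame) == expected            # unbroken chain verifies
assert custody_chain(b"swapped-face") != expected  # any substitution breaks it
```

The design point is that the final signature only verifies if every intermediate stage saw the same bytes, which is what makes the chain "unbroken."
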
0:32:30.303 --> 0:32:32.403
<v S2>But the challenge with that, Daniel, is that, you see,

0:32:32.613 --> 0:32:35.163
<v S2>right now, the fragmentation in this space is going to

0:32:35.163 --> 0:32:39.243
<v S2>be devastating. Like, that's the worst thing that can happen. Yeah. Fragmentation.

0:32:39.243 --> 0:32:42.243
<v S2>Meaning that, okay, one company is

0:32:42.273 --> 0:32:43.833
<v S2>doing it this way. The other company is doing it

0:32:43.833 --> 0:32:46.503
<v S2>that way. And there is no, you know, common standard. Um,

0:32:46.503 --> 0:32:48.783
<v S2>so that's why I think it's very important that

0:32:48.783 --> 0:32:54.393
<v S2>the Logitech camera can interoperate with my phone camera,

0:32:54.393 --> 0:32:57.783
<v S2>for example, or, you know, the interoperability of this. So like,

0:32:57.813 --> 0:33:01.713
<v S2>imagine if your web browser did something different, uh, instead of

0:33:01.713 --> 0:33:04.983
<v S2>SSL and something else did something different. It's going to

0:33:04.983 --> 0:33:08.133
<v S2>be just chaos. So standardization of this mechanism to

0:33:08.163 --> 0:33:12.843
<v S2>tackle deepfakes, the authenticity of digital media, whether it's

0:33:12.843 --> 0:33:15.093
<v S2>stored media or media in transit.

0:33:15.393 --> 0:33:15.963
<v S3>Um, you know.

0:33:16.083 --> 0:33:19.113
<v S1>Now that I'm thinking about this, I think you're right

0:33:19.113 --> 0:33:24.003
<v S1>about that, because I think what will probably happen is

0:33:24.063 --> 0:33:26.823
<v S1>what we will agree on is we agree to

0:33:26.853 --> 0:33:32.313
<v S1>trust Zoom, and then Zoom, on each of our sides,

0:33:32.433 --> 0:33:36.903
<v S1>does the camera validation, because the camera got some sort

0:33:36.903 --> 0:33:42.693
<v S1>of certification from somebody like Apple or macOS or Windows. Correct.

0:33:42.723 --> 0:33:47.823
<v S1>So Zoom trusts the camera. Therefore Zoom signs it. Therefore

0:33:47.823 --> 0:33:52.473
<v S1>your side agrees, because Zoom signed both sides.

0:33:52.503 --> 0:33:53.853
<v S3>Sure. Something like that.

0:33:53.883 --> 0:33:57.573
<v S2>That works. That works too. And yeah, that's you know,

0:33:57.603 --> 0:33:59.793
<v S2>that is a kind of standardization. But I was

0:33:59.793 --> 0:34:02.223
<v S2>going a little bit broader. I was saying that we

0:34:02.223 --> 0:34:06.873
<v S2>should almost go down to the layers of

0:34:06.873 --> 0:34:12.933
<v S2>network communication, the same way we communicate with streams

0:34:12.933 --> 0:34:17.493
<v S2>like we do. Like, you talked about Diffie-Hellman; I'm talking about,

0:34:17.523 --> 0:34:22.653
<v S2>you know, um, stream establishment authenticity at the internet protocol layers.

0:34:22.683 --> 0:34:23.643
<v S3>Oh, sure.

0:34:23.673 --> 0:34:27.213
<v S2>So I think it's time for us to look at, like.

0:34:27.243 --> 0:34:30.693
<v S2>I mean, we can keep patching this stuff, right? We

0:34:30.693 --> 0:34:33.663
<v S2>can keep creating these glues and, you know,

0:34:33.693 --> 0:34:37.563
<v S2>but I think it's time to take a step

0:34:37.563 --> 0:34:43.293
<v S2>further and start, um, you know, building the contracts of

0:34:43.293 --> 0:34:46.233
<v S2>this authenticity of the hardware, the data, like the same

0:34:46.233 --> 0:34:50.013
<v S2>way we digitize the data, we need to embed

0:34:50.523 --> 0:34:54.003
<v S2>some of these validation mechanisms right into the protocol.

0:34:54.573 --> 0:34:58.803
<v S1>You know, honestly, we should, um, and this is not perfectly on topic,

0:34:58.803 --> 0:35:02.643
<v S1>but we should actually collaborate on this because, um, I

0:35:02.643 --> 0:35:05.013
<v S1>don't think it's going to be easy for a small

0:35:05.013 --> 0:35:07.143
<v S1>company to do this. I think this is really going

0:35:07.173 --> 0:35:11.373
<v S1>to be like a consortium. Mhm. Um, but I used

0:35:11.373 --> 0:35:14.493
<v S1>to be at Apple. Um, I still know a lot

0:35:14.523 --> 0:35:16.683
<v S1>of people over there. I know a lot of people

0:35:16.683 --> 0:35:19.983
<v S1>are thinking about this, but I am very surprised that

0:35:19.983 --> 0:35:22.203
<v S1>I have not heard more people talk about what you

0:35:22.203 --> 0:35:30.063
<v S1>just said. So for example, um, IPsec and all, like

0:35:30.093 --> 0:35:34.893
<v S1>all the fundamental protocols, uh, the fundamental algorithms, what is

0:35:34.893 --> 0:35:40.203
<v S1>an underlying base standard like TCP/IP, like TLS? Um, um,

0:35:40.773 --> 0:35:44.943
<v S1>you know, is it, uh, are we doing public key

0:35:44.973 --> 0:35:48.213
<v S1>for the exchange? Are we doing symmetric for the, uh,

0:35:48.213 --> 0:35:53.763
<v S1>the communication? Yeah. So it's like all those things need

0:35:53.793 --> 0:35:56.403
<v S1>to be considered and built into like, like you said,

0:35:56.403 --> 0:36:01.083
<v S1>a fundamental protocol which includes the authentication piece, which includes

0:36:01.083 --> 0:36:05.373
<v S1>the re-prompting for authentication over certain periods of time

0:36:05.373 --> 0:36:08.523
<v S1>based on, uh, so for example, here would be a

0:36:08.523 --> 0:36:12.273
<v S1>great like method for the, uh, thing. Uh, you have

0:36:12.273 --> 0:36:18.003
<v S1>a policy established during the initiation of the call

0:36:18.243 --> 0:36:21.483
<v S1>so that if certain things are being talked about, it

0:36:21.483 --> 0:36:25.503
<v S1>up-levels the requirements, so it prompts both sides

0:36:25.503 --> 0:36:26.463
<v S1>more often.

0:36:26.613 --> 0:36:27.633
<v S3>Mhm. Mhm.

0:36:27.663 --> 0:36:30.003
<v S1>For revalidation. Yeah. Things like that.

0:36:30.033 --> 0:36:33.063
<v S2>Yeah absolutely. And you know you talk about Apple and

0:36:33.063 --> 0:36:35.493
<v S2>I think it's interesting right. Apple is in a unique

0:36:35.493 --> 0:36:42.393
<v S2>place to really solve deepfakes, because they have

0:36:42.393 --> 0:36:48.183
<v S2>full control of the end-to-end ecosystem, if you will.

0:36:48.213 --> 0:36:49.083
<v S3>Yes.

0:36:49.173 --> 0:36:51.903
<v S2>Um, all the way from the hardware to the content

0:36:51.933 --> 0:36:54.063
<v S2>to the method by which that content distributes. And they have

0:36:54.063 --> 0:37:00.303
<v S2>statistically significant density of communities that interact with that content. Um, so,

0:37:00.303 --> 0:37:02.673
<v S2>so that, that, you know, that's one aspect. And the

0:37:02.673 --> 0:37:05.463
<v S2>other aspect is I think, you know, if you look

0:37:05.493 --> 0:37:10.143
<v S2>at the rate at which the technology is evolving, um,

0:37:11.553 --> 0:37:23.013
<v S2>deepfakes are probably significantly impacting our sense of what

0:37:23.043 --> 0:37:28.473
<v S2>reality looks like, and eroding trust in systems.

0:37:28.773 --> 0:37:29.253
<v S3>Yep.

0:37:29.283 --> 0:37:31.713
<v S2>And that is massively concerning.

0:37:32.133 --> 0:37:37.713
<v S1>Yeah I agree. Yeah. One thing I just realized is, um,

0:37:37.983 --> 0:37:41.343
<v S1>I would love, like, a little bot. I think this is

0:37:41.343 --> 0:37:43.953
<v S1>probably coming soon with AI agents. So you have like

0:37:43.983 --> 0:37:47.583
<v S1>a little bot that is watching this chat. And one

0:37:47.583 --> 0:37:52.323
<v S1>of the things it would have reported is, um, Shil's

0:37:52.323 --> 0:37:57.843
<v S1>background looks like a real background that is blurred. Daniel's

0:37:57.843 --> 0:38:03.663
<v S1>background looks to be AI generated. So I, I'm watching

0:38:03.663 --> 0:38:05.853
<v S1>him very carefully to make sure he doesn't have six

0:38:05.853 --> 0:38:08.853
<v S1>fingers or something. You know what I mean? So you

0:38:08.853 --> 0:38:10.953
<v S1>could just have an alert that's like right off the

0:38:10.953 --> 0:38:14.373
<v S1>start before we even started. It's a fake background.

0:38:14.403 --> 0:38:14.883
<v S3>Mhm.

0:38:15.183 --> 0:38:16.053
<v S1>You know what I mean.

0:38:16.083 --> 0:38:16.683
<v S3>Yeah. No.

0:38:16.683 --> 0:38:21.153
<v S2>Absolutely. Yeah. I think, um, I think there are various

0:38:21.153 --> 0:38:26.073
<v S2>ways to solve this, but, you know, um, there are

0:38:26.073 --> 0:38:27.663
<v S2>things that can be done in the short term. There

0:38:27.693 --> 0:38:29.313
<v S2>are things that can be done in the mid-term. But

0:38:29.313 --> 0:38:31.863
<v S2>I think, you know, if we're talking about thought leadership,

0:38:31.863 --> 0:38:33.993
<v S2>vision as to where we're going, I think it's time

0:38:33.993 --> 0:38:39.813
<v S2>for us to kind of, you know, uh, rethink what

0:38:39.813 --> 0:38:42.303
<v S2>we are doing and how we're going to deal with

0:38:42.333 --> 0:38:46.473
<v S2>fakes in general. Digital fakes. AI is just helping make them better.

0:38:46.593 --> 0:38:49.203
<v S3>But yeah, yeah, yeah, yeah.

0:38:49.203 --> 0:38:52.053
<v S1>I think the way you're talking about it is exactly correct.

0:38:52.053 --> 0:38:55.743
<v S1>Ultimately it's a trust issue. So anything that is eroding

0:38:55.743 --> 0:38:58.593
<v S1>that trust is really the problem. And that's where we start.

0:38:58.623 --> 0:38:59.133
<v S3>Exactly.

0:38:59.163 --> 0:39:02.283
<v S1>And then we start with that trust problem. And then

0:39:02.283 --> 0:39:06.753
<v S1>you start thinking about a trust protocol, a more fundamental

0:39:06.783 --> 0:39:11.523
<v S1>technology protocol like TCP/IP, like HTTP, something, you know,

0:39:11.553 --> 0:39:12.843
<v S1>at a deeper, more fundamental level.

0:39:12.873 --> 0:39:14.163
<v S3>Yeah, yeah.

0:39:14.193 --> 0:39:17.163
<v S2>And you know, Daniel, there is also another important thing here. Like,

0:39:17.193 --> 0:39:20.703
<v S2>you know, some of these things were developed for entertainment purposes. Like,

0:39:20.703 --> 0:39:23.673
<v S2>if you think about it, the very premise. Right. If

0:39:23.673 --> 0:39:29.553
<v S2>you go look up in GitHub and you search for, uh, FSGAN, uh,

0:39:29.553 --> 0:39:34.473
<v S2>facial expression swapping, you'll see, like, incredible research papers and

0:39:34.473 --> 0:39:39.543
<v S2>then implementations of them. And in the majority, the

0:39:39.573 --> 0:39:42.213
<v S2>goal is to demonstrate what the technology is capable of.

0:39:42.213 --> 0:39:45.453
<v S2>Some of the first applications were just for fun. Yeah.

0:39:45.453 --> 0:39:47.643
<v S2>So what if, like, you know, I send a picture or

0:39:47.643 --> 0:39:52.503
<v S2>video that looks, uh, five, ten years, um, you know,

0:39:52.533 --> 0:39:55.923
<v S2>of my age taken off, right? As long as I

0:39:55.923 --> 0:39:58.803
<v S2>do not claim otherwise, I think it's perfectly fine. As long

0:39:58.803 --> 0:40:01.713
<v S2>as they say, hey, you know, look, this is. And

0:40:01.713 --> 0:40:04.203
<v S2>there are no claims made that this is who I

0:40:04.203 --> 0:40:07.443
<v S2>am or this is what it is. The problem becomes

0:40:07.443 --> 0:40:09.363
<v S2>when someone does make that claim. So the root of the

0:40:09.363 --> 0:40:12.903
<v S2>problem is the fake itself; whether it's deep or not,

0:40:12.933 --> 0:40:15.993
<v S2>or AI generated, is, I think, a

0:40:15.993 --> 0:40:17.793
<v S2>different point altogether.

0:40:18.513 --> 0:40:19.323
<v S3>Yeah.

0:40:19.383 --> 0:40:22.563
<v S1>No, I think that's right. It's it's a great point

0:40:22.563 --> 0:40:28.323
<v S1>because there's a harmless removal of 15 years of age. Mhm.

0:40:28.803 --> 0:40:31.593
<v S1>But if it's a guy and he's trying to get

0:40:31.593 --> 0:40:36.813
<v S1>a modeling job, and the modeling company stands

0:40:36.813 --> 0:40:40.473
<v S1>to lose money from this contract being signed, now that

0:40:40.473 --> 0:40:42.153
<v S1>innocent thing is no longer innocent.

0:40:42.183 --> 0:40:43.023
<v S3>Exactly.

0:40:43.053 --> 0:40:46.323
<v S2>Yeah. Exactly. Which is why, you know, the technology is

0:40:46.323 --> 0:40:48.783
<v S2>just an enabler. And that's why my point was, we

0:40:48.783 --> 0:40:52.593
<v S2>need to find a way to deal with the technology. Mhm.

0:40:53.223 --> 0:40:56.253
<v S1>Yeah. So any any tips for people to learn more

0:40:56.253 --> 0:40:57.363
<v S1>about this.

0:40:57.663 --> 0:41:02.283
<v S2>Yeah I think you know um like we recently did

0:41:02.313 --> 0:41:04.263
<v S2>like the threat research team and the data

0:41:04.263 --> 0:41:09.693
<v S2>science team, uh, did some work to publish this, um,

0:41:09.723 --> 0:41:12.813
<v S2>piece from BlackBerry about deepfakes. I encourage people to

0:41:12.813 --> 0:41:15.063
<v S2>read it. I think they're going to find it informative.

0:41:15.063 --> 0:41:18.183
<v S2>It's written in language that is very easy to understand,

0:41:18.183 --> 0:41:21.903
<v S2>and I think right now I would encourage people to

0:41:21.933 --> 0:41:25.023
<v S2>sort of learn about these things of what's possible. Right.

0:41:25.053 --> 0:41:27.963
<v S2>That's the first thing, so that at least you're skeptical when

0:41:27.963 --> 0:41:30.633
<v S2>you see something or your antennas kind of pick up

0:41:30.633 --> 0:41:35.013
<v S2>something that you otherwise might not have. So

0:41:35.043 --> 0:41:37.563
<v S2>awareness I think, is is the key at this time.

0:41:38.133 --> 0:41:40.473
<v S1>Okay. Yeah, we'll definitely put the link to that in

0:41:40.473 --> 0:41:44.433
<v S1>the show notes. Um, any predictions for like the next

0:41:44.463 --> 0:41:46.323
<v S1>year or 2 or 3 years?

0:41:47.703 --> 0:41:52.533
<v S2>Um, well, I think this technology is going to get

0:41:52.533 --> 0:41:57.063
<v S2>progressively better. You're going to see more hyper realistic content.

0:41:57.063 --> 0:42:00.393
<v S2>In fact, you're going to start seeing full body, not

0:42:00.393 --> 0:42:04.773
<v S2>just faces and expression puppeteering. I think you're going to see,

0:42:04.803 --> 0:42:09.003
<v S2>you know, hyper realistic content. You're going to see content

0:42:09.003 --> 0:42:13.773
<v S2>interacting with other content in social settings. You're going to

0:42:13.773 --> 0:42:19.563
<v S2>see more personalized attacks through this mechanism. Uh, you know,

0:42:19.593 --> 0:42:22.653
<v S2>public figures or people you dislike. You're going to be

0:42:22.683 --> 0:42:27.933
<v S2>able to start propaganda. And the availability of these tools, like,

0:42:27.963 --> 0:42:30.843
<v S2>I mean, from $5 to $15 a month from a

0:42:30.843 --> 0:42:34.143
<v S2>subscription perspective, you can create some of this stuff, uh,

0:42:34.143 --> 0:42:36.783
<v S2>with a bit of programming. You can go download these

0:42:36.783 --> 0:42:39.573
<v S2>GitHub projects and do your own, if you will. Um,

0:42:39.573 --> 0:42:43.623
<v S2>you know, the possibilities are limitless.

0:42:43.623 --> 0:42:47.883
<v S2>So deepfake as a technology will continue to evolve because

0:42:47.883 --> 0:42:53.283
<v S2>it does stoke, uh, a reason for

0:42:53.283 --> 0:42:56.703
<v S2>why we do certain things that are, uh, not on

0:42:56.703 --> 0:42:59.043
<v S2>the best moral grounds, if you will. So it will

0:42:59.043 --> 0:43:03.753
<v S2>become more sophisticated, harder to detect. The very technology

0:43:03.753 --> 0:43:08.493
<v S2>that is required to do this, um, is going

0:43:08.493 --> 0:43:12.363
<v S2>to basically enable this growth. And the challenge will

0:43:12.363 --> 0:43:15.903
<v S2>be there in the coming years unless we as a

0:43:15.903 --> 0:43:18.393
<v S2>community do something about it.

0:43:18.903 --> 0:43:21.003
<v S1>Yeah. So the better that stuff gets, the more we're

0:43:21.003 --> 0:43:23.673
<v S1>going to need the types of controls that you talked about.

0:43:23.703 --> 0:43:24.303
<v S3>Exactly.

0:43:24.333 --> 0:43:25.893
<v S2>Yeah, absolutely.

0:43:25.923 --> 0:43:28.623
<v S1>Where can we learn more about you and your team

0:43:28.623 --> 0:43:29.973
<v S1>and the work that you're doing?

0:43:30.423 --> 0:43:33.123
<v S2>Uh, that's a great question. Like, you know, we have

0:43:33.123 --> 0:43:36.693
<v S2>a data science research blog where we publish, uh, things

0:43:36.693 --> 0:43:41.823
<v S2>that we learn, um, from time to time, uh, at BlackBerry, um,

0:43:41.853 --> 0:43:46.413
<v S2>papers that we publish. Um, so, so I welcome, uh,

0:43:46.413 --> 0:43:49.653
<v S2>people reaching out if they want. Uh, I always love

0:43:49.653 --> 0:43:52.533
<v S2>to have a great conversation. Some of these conversations we

0:43:52.533 --> 0:43:55.773
<v S2>had were very insightful. Um, yeah.

0:43:56.523 --> 0:43:59.463
<v S1>Okay. Well, awesome. Well, it's great to have you back. And, uh,

0:43:59.493 --> 0:44:02.403
<v S1>great conversation, as always. I appreciate the time.

0:44:02.583 --> 0:44:04.053
<v S2>Hey, thanks a lot, Daniel. Thanks.

0:44:04.083 --> 0:44:04.503
<v S1>All right.

0:44:04.533 --> 0:44:06.003
<v S3>Take care. Bye.