WEBVTT - TechSupport: Pixel Peeping - How to Spot AI Video

0:00:07.800 --> 0:00:17.280
<v Speaker 1>Welcome to Tech Stuff. I'm Kara Price. Today's interview

0:00:17.400 --> 0:00:20.880
<v Speaker 1>is all about Sora, the video generation tool and invite-only

0:00:20.880 --> 0:00:23.320
<v Speaker 1>social media app that OpenAI released at the

0:00:23.360 --> 0:00:26.960
<v Speaker 1>beginning of October. If you're on TikTok, Instagram, or X,

0:00:27.120 --> 0:00:30.080
<v Speaker 1>you've likely seen videos made by Sora plastered all over

0:00:30.120 --> 0:00:33.920
<v Speaker 1>your feeds. These videos range from the absurd, like cats dancing

0:00:33.960 --> 0:00:37.920
<v Speaker 1>by a dumpster with sunglasses on, to the hyperrealistic, like

0:00:38.040 --> 0:00:41.720
<v Speaker 1>Queen Elizabeth trying jerk chicken in Jamaica. When I first

0:00:41.760 --> 0:00:44.560
<v Speaker 1>saw these videos, I was entertained by the absurdist ones

0:00:44.600 --> 0:00:47.920
<v Speaker 1>and kind of floored by the realistic ones. To me,

0:00:48.479 --> 0:00:51.400
<v Speaker 1>Sora signals that we have officially entered the post-bunny-trampoline

0:00:51.440 --> 0:00:54.680
<v Speaker 1>internet. Yeah, I'm talking about the AI video of

0:00:54.720 --> 0:00:57.400
<v Speaker 1>the horde of bunnies jumping on a trampoline in the dark.

0:00:57.880 --> 0:01:00.320
<v Speaker 1>I was very convinced that this video was real, and

0:01:00.360 --> 0:01:02.960
<v Speaker 1>so were many people, which led to a mini panic.

0:01:03.800 --> 0:01:06.640
<v Speaker 1>Is it even possible to detect what's fake and what's

0:01:06.680 --> 0:01:09.720
<v Speaker 1>not anymore? That's where my guest today comes in. His

0:01:09.840 --> 0:01:12.960
<v Speaker 1>name is Jeremy Carrasco and he runs multiple social media

0:01:13.000 --> 0:01:15.679
<v Speaker 1>accounts under the name ShowToolsAI.

0:01:16.240 --> 0:01:18.360
<v Speaker 2>The idea that we can't tell what's real or not

0:01:18.880 --> 0:01:22.480
<v Speaker 2>because of AI video is so far definitely

0:01:21.959 --> 0:01:24.240
<v Speaker 1>not the case. He has only been a full-time

0:01:24.280 --> 0:01:27.000
<v Speaker 1>creator for four months, but he has become a trusted

0:01:27.040 --> 0:01:30.880
<v Speaker 1>source for dissecting viral AI videos and explaining the tells.

0:01:31.000 --> 0:01:35.160
<v Speaker 2>There is a physical truth to shooting a video with

0:01:35.200 --> 0:01:38.959
<v Speaker 2>a camera. That physical truth isn't going away, and AI

0:01:39.520 --> 0:01:43.600
<v Speaker 2>does a version that to our eyes looks like that

0:01:43.640 --> 0:01:48.400
<v Speaker 2>physical truth. But upon examination you can figure out that

0:01:48.440 --> 0:01:51.120
<v Speaker 2>these things break down. And I do think that any

0:01:51.240 --> 0:01:55.120
<v Speaker 2>normal person with decent eyesight can zoom into these AI

0:01:55.240 --> 0:01:56.600
<v Speaker 2>videos and figure that out.

0:01:57.200 --> 0:02:00.680
<v Speaker 1>So Jeremy wants his social videos to be educational. He

0:02:00.760 --> 0:02:03.120
<v Speaker 1>wants more people to get excited by what he calls

0:02:03.200 --> 0:02:06.760
<v Speaker 1>pixel peeping, and he wants to improve people's media literacy

0:02:07.320 --> 0:02:10.120
<v Speaker 1>and hopes his accounts can help people tune their AI

0:02:10.240 --> 0:02:10.880
<v Speaker 1>vibe checker.

0:02:11.360 --> 0:02:13.400
<v Speaker 2>I'm not naive to the fact that people aren't going

0:02:13.440 --> 0:02:16.160
<v Speaker 2>to be pixel peeping on the videos that they watch,

0:02:16.320 --> 0:02:20.000
<v Speaker 2>so it's just about trying to tune people's initial impressions

0:02:20.040 --> 0:02:22.440
<v Speaker 2>so that they have something in their head that says, hey,

0:02:22.520 --> 0:02:25.040
<v Speaker 2>something might not be right here, and then they can use,

0:02:25.080 --> 0:02:27.880
<v Speaker 2>hopefully, other media skills that I teach them in order

0:02:27.919 --> 0:02:28.959
<v Speaker 2>to dive a little bit deeper.

0:02:28.760 --> 0:02:31.800
<v Speaker 1>I talk to Jeremy about so many things:

0:02:31.840 --> 0:02:35.840
<v Speaker 1>how video generation tools work, how to pick up on AI tells,

0:02:36.200 --> 0:02:39.200
<v Speaker 1>why Sora is an inflection point for the Internet, and

0:02:39.240 --> 0:02:41.960
<v Speaker 1>what this signals for the future of social media. I

0:02:42.000 --> 0:02:44.880
<v Speaker 1>started out by asking Jeremy to clarify what Sora is

0:02:45.280 --> 0:02:46.440
<v Speaker 1>and what it does.

0:02:47.120 --> 0:02:52.360
<v Speaker 2>So, Sora was originally released as OpenAI's first video model.

0:02:52.680 --> 0:02:56.239
<v Speaker 2>In October twenty twenty five, they reused the Sora name

0:02:56.440 --> 0:02:59.240
<v Speaker 2>to launch their social media app. A lot of the

0:02:59.280 --> 0:03:01.680
<v Speaker 2>hype has been around the Sora app, which is currently

0:03:01.720 --> 0:03:04.720
<v Speaker 2>invite only, and then there's the Sora 2 model that

0:03:04.760 --> 0:03:08.680
<v Speaker 2>you can already access if you have API access or

0:03:08.720 --> 0:03:11.760
<v Speaker 2>if you're a developer or even a normal person. There

0:03:11.800 --> 0:03:14.360
<v Speaker 2>are tools that let you generate a video with the

0:03:14.400 --> 0:03:18.440
<v Speaker 2>Sora 2 video model without an invite. The Sora app

0:03:18.480 --> 0:03:23.520
<v Speaker 2>experience is very unique in some ways and very familiar

0:03:23.520 --> 0:03:26.200
<v Speaker 2>in others. It does feel like a TikTok For You

0:03:26.320 --> 0:03:30.160
<v Speaker 2>page, just for AI videos. You can scroll, it has

0:03:30.200 --> 0:03:32.840
<v Speaker 2>an algorithm to suggest videos. But what's gotten a lot of

0:03:32.840 --> 0:03:36.720
<v Speaker 2>the attention is the ability to cameo someone. But really,

0:03:36.840 --> 0:03:38.880
<v Speaker 2>these are just deepfakes. Like, you're creating deepfakes

0:03:38.920 --> 0:03:41.200
<v Speaker 2>of your friends. You're creating deepfakes of whoever lets

0:03:41.240 --> 0:03:42.960
<v Speaker 2>you create a deepfake of them. And you have

0:03:43.040 --> 0:03:46.840
<v Speaker 2>different levels of permissions. So, for example, Jake Paul and

0:03:46.920 --> 0:03:51.080
<v Speaker 2>Sam Altman let anyone deepfake them, whereas I let

0:03:51.120 --> 0:03:54.120
<v Speaker 2>no one deepfake me because I'm not comfortable with that.

0:03:54.600 --> 0:03:56.680
<v Speaker 1>What does it look like to let someone deepfake

0:03:56.720 --> 0:03:57.400
<v Speaker 1>you on Sora?

0:03:57.840 --> 0:04:00.800
<v Speaker 2>It looks like a version of you doing whatever they

0:04:01.520 --> 0:04:04.240
<v Speaker 2>prompted you to do. Now, there are safety features in place,

0:04:04.480 --> 0:04:08.120
<v Speaker 2>so you can't have them do anything violent, you can't

0:04:08.160 --> 0:04:11.560
<v Speaker 2>do anything sexual. But it's really up to OpenAI

0:04:11.640 --> 0:04:14.280
<v Speaker 2>to set those boundaries. And I don't think it's

0:04:14.320 --> 0:04:17.680
<v Speaker 2>completely accurate. I've made versions of myself that I think

0:04:17.720 --> 0:04:19.919
<v Speaker 2>don't look very much like me. I've made other versions

0:04:19.920 --> 0:04:22.400
<v Speaker 2>of myself that look a lot like me. That's really

0:04:22.520 --> 0:04:27.520
<v Speaker 2>up to luck, because as we'll learn, these models aren't deterministic.

0:04:27.600 --> 0:04:30.120
<v Speaker 2>There's a part of this that is random, so it's

0:04:30.160 --> 0:04:33.440
<v Speaker 2>not repeatable. So Jake Paul is a very good example.

0:04:33.480 --> 0:04:35.880
<v Speaker 2>There are a ton of AI videos of Jake Paul

0:04:35.960 --> 0:04:38.080
<v Speaker 2>right now. All of them look a little bit different,

0:04:38.120 --> 0:04:41.960
<v Speaker 2>but have his likeness, so you have to give permission

0:04:42.000 --> 0:04:43.680
<v Speaker 2>for someone to make a video of you through the

0:04:43.680 --> 0:04:44.440
<v Speaker 2>cameo feature.

0:04:44.760 --> 0:04:48.960
<v Speaker 1>So would you say that AI video generation scares you? Like,

0:04:49.360 --> 0:04:51.120
<v Speaker 1>is it something that keeps you up at night?

0:04:51.800 --> 0:04:54.159
<v Speaker 2>It's not, because I'm doing something about it now, but

0:04:54.200 --> 0:04:56.280
<v Speaker 2>it really was, and I think it is keeping people

0:04:56.320 --> 0:04:58.000
<v Speaker 2>up at night because so much of our time is

0:04:58.040 --> 0:05:01.120
<v Speaker 2>spent on these short-form video platforms, for better

0:05:01.240 --> 0:05:04.800
<v Speaker 2>or worse. I do think that it is the primary

0:05:04.839 --> 0:05:08.039
<v Speaker 2>way that people get information now. It was probably never

0:05:08.120 --> 0:05:10.520
<v Speaker 2>the best format for that information in the first place,

0:05:10.640 --> 0:05:13.080
<v Speaker 2>but here we are. So I think what keeps me

0:05:13.200 --> 0:05:16.800
<v Speaker 2>up is really general media literacy skills, and I think

0:05:16.800 --> 0:05:19.640
<v Speaker 2>of AI video as an extension of that. A lot

0:05:19.680 --> 0:05:21.560
<v Speaker 2>of people are kept up by what I think are

0:05:21.720 --> 0:05:25.480
<v Speaker 2>irrational fears about AI video. Like, in my opinion, it's

0:05:25.520 --> 0:05:28.039
<v Speaker 2>probably not going to be framing you for a crime

0:05:28.080 --> 0:05:31.400
<v Speaker 2>anytime soon, but it might turn the court of public

0:05:31.440 --> 0:05:34.320
<v Speaker 2>opinion against you. It might be spreading disinformation.

0:05:34.600 --> 0:05:34.680
<v Speaker 1>Like.

0:05:34.760 --> 0:05:38.320
<v Speaker 2>It's an extension of other media literacy problems, and it's

0:05:38.360 --> 0:05:41.440
<v Speaker 2>a very believable one, because when people are scrolling,

0:05:41.560 --> 0:05:43.400
<v Speaker 2>they're just there to tune out and scroll. They're not

0:05:43.440 --> 0:05:46.640
<v Speaker 2>there to pixel peep and really pay attention, right?

0:05:46.600 --> 0:05:48.960
<v Speaker 1>I mean, you don't think that we are living in a

0:05:49.000 --> 0:05:51.560
<v Speaker 1>world where soon people could be framed for something they

0:05:51.560 --> 0:05:53.599
<v Speaker 1>didn't do using manipulated video?

0:05:54.000 --> 0:05:56.960
<v Speaker 2>Well, I'm not a lawyer, but I've

0:05:57.000 --> 0:05:59.920
<v Speaker 2>done some looking into this, and the reality is that

0:06:00.279 --> 0:06:03.000
<v Speaker 2>in order for something to be admitted into evidence, at

0:06:03.080 --> 0:06:04.800
<v Speaker 2>least in the United States, it has to have an

0:06:04.839 --> 0:06:09.039
<v Speaker 2>extensive metadata trail. It has to be authenticated. You have

0:06:09.080 --> 0:06:11.039
<v Speaker 2>to get the person who filmed the video into the

0:06:11.040 --> 0:06:13.839
<v Speaker 2>courtroom to say that they filmed it. And we have

0:06:13.920 --> 0:06:17.840
<v Speaker 2>to understand that while our perception might be getting tricked,

0:06:18.040 --> 0:06:21.919
<v Speaker 2>there are procedural and mathematical ways that these can be detected.

0:06:22.440 --> 0:06:25.440
<v Speaker 2>So it is not undetectable yet. And anyone who says

0:06:25.440 --> 0:06:28.640
<v Speaker 2>it's undetectable is probably either selling you something or doesn't

0:06:28.640 --> 0:06:30.760
<v Speaker 2>have a good eye. And anyone who says it will

0:06:30.800 --> 0:06:34.680
<v Speaker 2>be undetectable does not know that, and frankly doesn't understand

0:06:34.760 --> 0:06:36.840
<v Speaker 2>the technology that's making these AI videos very well,

0:06:36.880 --> 0:06:40.160
<v Speaker 1>in my opinion. And right now, your likeness is not shared?

0:06:41.160 --> 0:06:46.640
<v Speaker 2>No, I have a strong, strong bias against this because

0:06:46.720 --> 0:06:49.680
<v Speaker 2>I believe that once your likeness gets out there and

0:06:49.839 --> 0:06:53.360
<v Speaker 2>is deepfakable, so to speak, it's really hard to pull

0:06:53.360 --> 0:06:56.719
<v Speaker 2>that back, not because you can't, like you can tell

0:06:56.760 --> 0:06:59.800
<v Speaker 2>people to stop, but once it's out there, I think

0:06:59.800 --> 0:07:01.880
<v Speaker 2>you lose a sense of trust. It's a line that

0:07:01.960 --> 0:07:04.440
<v Speaker 2>I just don't want to cross. I'm not comfortable crossing,

0:07:04.440 --> 0:07:06.719
<v Speaker 2>and I've actually told my followers I will never cross

0:07:06.760 --> 0:07:09.760
<v Speaker 2>that line because it's just not what I'm interested in.

0:07:10.440 --> 0:07:13.640
<v Speaker 1>So I was hoping that you could show me how

0:07:13.640 --> 0:07:15.559
<v Speaker 1>to make a video using the Sora app.

0:07:15.760 --> 0:07:19.520
<v Speaker 2>Sure, so this is the Sora desktop app. It is

0:07:19.640 --> 0:07:22.400
<v Speaker 2>not the vertical experience that you have, you know, on

0:07:22.400 --> 0:07:25.360
<v Speaker 2>the phone. It is, however, showing a lot of the

0:07:25.400 --> 0:07:28.280
<v Speaker 2>same content. So this is essentially the For You page

0:07:28.280 --> 0:07:30.400
<v Speaker 2>of Sora. And the thing to note here is that

0:07:30.440 --> 0:07:33.280
<v Speaker 2>there are Sora watermarks over each one of these videos.

0:07:33.560 --> 0:07:36.040
<v Speaker 2>In the mobile experience, those watermarks go away, but

0:07:36.160 --> 0:07:39.200
<v Speaker 2>they don't let you screen record in the mobile version,

0:07:39.320 --> 0:07:42.360
<v Speaker 2>whereas theoretically anyone could do what I'm doing right now,

0:07:42.400 --> 0:07:44.720
<v Speaker 2>like I can share my screen here, I could record

0:07:44.760 --> 0:07:47.560
<v Speaker 2>my screen. When you see Sora videos on social media,

0:07:48.080 --> 0:07:49.360
<v Speaker 2>this is how they're being made.

0:07:49.440 --> 0:07:52.040
<v Speaker 1>So let's try to make a Sora video. Let's do

0:07:53.000 --> 0:07:55.680
<v Speaker 1>skiing with candy.

0:07:56.520 --> 0:07:59.000
<v Speaker 2>Skiing with Candy. You want me to just say that

0:07:59.080 --> 0:08:02.520
<v Speaker 2>and see what it comes up with? Yes, let's do it.

0:08:02.600 --> 0:08:03.720
<v Speaker 2>I think that's a great idea.

0:08:03.760 --> 0:08:04.840
<v Speaker 1>Why do you think it's a good idea?

0:08:04.960 --> 0:08:10.120
<v Speaker 2>Because something that people aren't talking about enough with

0:08:10.240 --> 0:08:12.880
<v Speaker 2>Sora is that you can have a very simple prompt

0:08:12.960 --> 0:08:16.080
<v Speaker 2>and it can come up with something really creative. That's

0:08:16.200 --> 0:08:19.960
<v Speaker 2>really what, in my opinion, distinguishes it from other video models.

0:08:20.240 --> 0:08:22.880
<v Speaker 2>Google Veo 3 was how a lot of AI content

0:08:22.960 --> 0:08:25.320
<v Speaker 2>was made a few weeks ago. If you don't give

0:08:25.320 --> 0:08:28.480
<v Speaker 2>Google Veo 3 a good prompt, it's just boring, whereas

0:08:28.480 --> 0:08:32.040
<v Speaker 2>Sora will go through some attempts to at least make

0:08:32.080 --> 0:08:33.120
<v Speaker 2>it entertaining anyway.

0:08:33.320 --> 0:08:36.360
<v Speaker 1>It's just incredible to me that in a given three

0:08:36.400 --> 0:08:38.160
<v Speaker 1>weeks the world sort of changes.

0:08:38.480 --> 0:08:42.080
<v Speaker 2>I think that there is a misconception that the world

0:08:42.200 --> 0:08:48.440
<v Speaker 2>just changed because video AI made a huge, undetectable leap.

0:08:48.960 --> 0:08:52.160
<v Speaker 2>It did make a step towards more realism. What Sora

0:08:52.280 --> 0:08:57.000
<v Speaker 2>2 really improved were a lot of the human parts

0:08:57.120 --> 0:09:00.840
<v Speaker 2>of video AI, such as hand movement, or if they

0:09:01.160 --> 0:09:04.840
<v Speaker 2>have a missing limb, or if their teeth look weird,

0:09:05.080 --> 0:09:08.200
<v Speaker 2>or if their eyes look uncanny, or hair. There were

0:09:08.240 --> 0:09:10.880
<v Speaker 2>all these little things that people would pick on, again,

0:09:11.000 --> 0:09:14.360
<v Speaker 2>a lot of them subconscious. Sora 2 made a step towards

0:09:14.360 --> 0:09:18.199
<v Speaker 2>improving those things. It still has a lot of background issues.

0:09:18.720 --> 0:09:22.720
<v Speaker 2>It is actually a noisier or muddier looking model in

0:09:22.720 --> 0:09:25.520
<v Speaker 2>my opinion than Veo, but a lot of people aren't

0:09:25.559 --> 0:09:27.480
<v Speaker 2>looking for that. A lot of the videos that go

0:09:27.640 --> 0:09:32.720
<v Speaker 2>viral that are AI generated are security cams, or body

0:09:32.800 --> 0:09:36.960
<v Speaker 2>cams, or GoPro-looking cameras, things that people aren't

0:09:37.000 --> 0:09:40.160
<v Speaker 2>looking at every day. But it really made improvements in

0:09:41.080 --> 0:09:45.560
<v Speaker 2>how good the outputs are to watch, like, story-wise.

0:09:46.160 --> 0:09:49.080
<v Speaker 2>If you were to release Google Veo 3 as a

0:09:49.120 --> 0:09:53.440
<v Speaker 2>social media app, it would fail just entirely because people

0:09:53.440 --> 0:09:57.600
<v Speaker 2>would get on there and unless you're a good prompter, like,

0:09:57.640 --> 0:10:00.320
<v Speaker 2>you're not going to come up with anything interesting. Whereas

0:10:00.320 --> 0:10:03.520
<v Speaker 2>Sora made it so that anyone getting into AI

0:10:03.559 --> 0:10:06.280
<v Speaker 2>can come up with something interesting with a very basic prompt.

0:10:06.320 --> 0:10:09.760
<v Speaker 2>That's a really, really big innovation that they didn't talk about.

0:10:09.880 --> 0:10:12.160
<v Speaker 2>But I think that's why it's had such an impact

0:10:12.280 --> 0:10:16.920
<v Speaker 2>is because there's a huge volume of somewhat meaningful Sora

0:10:17.040 --> 0:10:19.880
<v Speaker 2>videos out there, whereas there really wasn't with Veo when

0:10:19.880 --> 0:10:22.360
<v Speaker 2>that came out, right? So, all right, it came

0:10:22.440 --> 0:10:24.120
<v Speaker 2>up with skiing with candy. Let's see

0:10:24.120 --> 0:10:24.640
<v Speaker 2>what it did here.

0:10:25.240 --> 0:10:25.520
<v Speaker 1>Look what.

0:10:27.160 --> 0:10:29.959
<v Speaker 2>A mid-slope snack, classy, and

0:10:29.920 --> 0:10:31.000
<v Speaker 3>a peppermint for the win.

0:10:31.200 --> 0:10:34.000
<v Speaker 1>Nothing like sweet fuel to keep the turns smooth. Catch

0:10:34.000 --> 0:10:35.079
<v Speaker 1>you at the bottom.

0:10:35.480 --> 0:10:37.599
<v Speaker 2>All right. What are your impressions?

0:10:37.840 --> 0:10:40.760
<v Speaker 1>I just don't, I'm sorry, is it okay

0:10:40.760 --> 0:10:42.000
<v Speaker 1>that this is blowing my mind?

0:10:42.400 --> 0:10:43.400
<v Speaker 2>It should.

0:10:43.440 --> 0:10:45.360
<v Speaker 1>Okay, good, it should blow your mind, because I feel daft,

0:10:45.520 --> 0:10:49.640
<v Speaker 1>like I feel like I can't wrap my head around this,

0:10:49.960 --> 0:10:52.160
<v Speaker 1>like I'm assuming this woman in the video with her

0:10:52.200 --> 0:10:53.960
<v Speaker 1>ski mask on is not a real person.

0:10:54.120 --> 0:10:56.120
<v Speaker 2>No, she's not a real person. And we don't know

0:10:56.480 --> 0:10:58.600
<v Speaker 2>how they invented her. They just came up with that.

0:10:58.679 --> 0:10:59.880
<v Speaker 3>What?

0:11:00.000 --> 0:11:03.000
<v Speaker 2>So there are things about this that stick out to

0:11:03.040 --> 0:11:05.720
<v Speaker 2>me as obvious AI video. And then there are things

0:11:05.720 --> 0:11:09.040
<v Speaker 2>about this that I just have to say, wow, that

0:11:09.160 --> 0:11:12.280
<v Speaker 2>is incredible. So if I can just explain what I

0:11:12.360 --> 0:11:15.360
<v Speaker 2>see here as someone who watches these. Please. She starts

0:11:15.360 --> 0:11:18.120
<v Speaker 2>out by skiing down the hill, but she's kind of

0:11:18.160 --> 0:11:21.640
<v Speaker 2>skiing like it's snowboarding. Then she stops. She has some

0:11:21.679 --> 0:11:24.240
<v Speaker 2>peppermints in her hand, she has some bags of candy

0:11:24.400 --> 0:11:27.920
<v Speaker 2>in her hand, and there are some weird things going

0:11:27.960 --> 0:11:30.280
<v Speaker 2>on here. But what it did is, without

0:11:30.320 --> 0:11:34.000
<v Speaker 2>any input, it basically made a social media video.

0:11:34.000 --> 0:11:38.200
<v Speaker 2>It's like she's promoting this candy. There's someone responding to

0:11:38.240 --> 0:11:42.319
<v Speaker 2>her in the background. It invented a story for her.

0:11:42.400 --> 0:11:45.120
<v Speaker 2>Exactly. She talks like an influencer.

0:11:45.520 --> 0:11:47.440
<v Speaker 1>I just, it really trips me up that she's not

0:11:47.480 --> 0:11:49.319
<v Speaker 1>a real person, that this person does not exist in

0:11:49.360 --> 0:11:50.800
<v Speaker 1>the world. It's really weird.

0:11:51.040 --> 0:11:53.360
<v Speaker 2>Same. I mean, I have to tell myself it's not

0:11:53.400 --> 0:11:53.920
<v Speaker 2>a real person.

0:11:54.000 --> 0:11:55.679
<v Speaker 1>I mean, it would be like if you didn't exist.

0:11:56.000 --> 0:12:00.680
<v Speaker 2>Yeah, that's the thing, it visually feels the

0:12:00.679 --> 0:12:03.640
<v Speaker 2>same as talking to another person online. Of course, there

0:12:03.640 --> 0:12:05.640
<v Speaker 2>are tells, so I'll get into those. So,

0:12:06.280 --> 0:12:08.880
<v Speaker 2>first of all, you have just the context. Why is

0:12:08.960 --> 0:12:11.800
<v Speaker 2>she skiing down the hill with a bag of candy

0:12:11.960 --> 0:12:13.880
<v Speaker 2>and why is she just putting it in her mouth

0:12:13.960 --> 0:12:17.800
<v Speaker 2>with the wrappers? Then there are some artifacts that I

0:12:17.840 --> 0:12:20.760
<v Speaker 2>can see, especially at the beginning of the generation. Her

0:12:20.880 --> 0:12:24.400
<v Speaker 2>jacket and her pants are incredibly pixelated when it starts.

0:12:24.920 --> 0:12:28.160
<v Speaker 2>But the other thing here is that it's very noisy.

0:12:28.200 --> 0:12:32.840
<v Speaker 2>If we actually zoom in, there are a lot of artifacts

0:12:33.040 --> 0:12:34.640
<v Speaker 2>in the mountains back there.

0:12:34.880 --> 0:12:36.679
<v Speaker 1>It is weird how she's eating the candy. That's a

0:12:36.679 --> 0:12:37.480
<v Speaker 1>little uncanny.

0:12:37.520 --> 0:12:41.400
<v Speaker 2>It's weird. Yeah, she's eating wrapped candy, and the bag

0:12:41.440 --> 0:12:43.960
<v Speaker 2>there just stuck to her knee. Yeah, you know, so

0:12:44.040 --> 0:12:46.480
<v Speaker 2>at first it's a ziplock bag, then it's not a

0:12:46.559 --> 0:12:50.720
<v Speaker 2>ziplock bag, then it sticks to her knee. Her feet

0:12:50.720 --> 0:12:54.199
<v Speaker 2>are backwards, like her foot there is literally backwards in

0:12:54.280 --> 0:12:56.840
<v Speaker 2>this version. She doesn't have a foot, like, you know,

0:12:56.880 --> 0:12:58.160
<v Speaker 2>you get into it. It's kind of funny.

0:12:58.200 --> 0:12:59.960
<v Speaker 1>But this is why you have such a large platform,

0:13:00.320 --> 0:13:02.320
<v Speaker 1>because like I look at this at first and I'm like, oh,

0:13:02.360 --> 0:13:05.840
<v Speaker 1>it's perfect. Like in a way, if I see the

0:13:05.880 --> 0:13:09.199
<v Speaker 1>trappings of what I think I'm seeing, I don't really

0:13:09.240 --> 0:13:10.760
<v Speaker 1>look for the detail that's wrong.

0:13:11.120 --> 0:13:14.439
<v Speaker 2>Especially when you're just scrolling on TikTok or Instagram. You're

0:13:14.440 --> 0:13:15.079
<v Speaker 2>not looking for

0:13:15.000 --> 0:13:16.719
<v Speaker 1>anything wrong. Right, which is how they want you to

0:13:16.760 --> 0:13:18.559
<v Speaker 1>look at it, or scrolling on Sora.

0:13:18.480 --> 0:13:20.679
<v Speaker 2>Or scrolling on Sora. A lot of them are leaving

0:13:20.679 --> 0:13:23.560
<v Speaker 2>Sora and making it out to all these platforms. Yeah,

0:13:24.240 --> 0:13:26.480
<v Speaker 2>you're not going to be looking for these things. I'm

0:13:26.520 --> 0:13:29.640
<v Speaker 2>totally aware of that. I mean, on first watch, are

0:13:29.679 --> 0:13:31.480
<v Speaker 2>you gonna pick out everything that's wrong with this? No.

0:13:31.600 --> 0:13:33.800
<v Speaker 2>But if you watch it five times and start zooming in,

0:13:34.240 --> 0:13:37.199
<v Speaker 2>you're gonna start noticing that her feet are literally backwards.

0:13:37.480 --> 0:13:41.000
<v Speaker 2>So yeah, when it comes down to it, I think

0:13:41.040 --> 0:13:44.800
<v Speaker 2>what's really very important about Sora is that it did

0:13:44.840 --> 0:13:47.400
<v Speaker 2>all that work for you. You didn't need to know

0:13:47.480 --> 0:13:50.160
<v Speaker 2>how to prompt the video AI. If you were to

0:13:50.240 --> 0:13:54.320
<v Speaker 2>put skiing with candy into Google Veo, it's just going

0:13:54.400 --> 0:13:56.320
<v Speaker 2>to be boring. I'll just tell you that right now.

0:13:56.679 --> 0:14:00.720
<v Speaker 1>So if I wanted this video, this exact video, from Veo 3,

0:14:01.720 --> 0:14:03.560
<v Speaker 1>what would I have to prompt it to do?

0:14:04.040 --> 0:14:06.320
<v Speaker 2>You'd have to act like a camera director. You'd have

0:14:06.400 --> 0:14:11.040
<v Speaker 2>to say: video starting with a woman skiing down the slope.

0:14:11.120 --> 0:14:15.880
<v Speaker 2>She is wearing a pink and yellow top, a turquoise bottom,

0:14:16.320 --> 0:14:18.320
<v Speaker 2>She's holding a bag of candy in her right hand,

0:14:18.360 --> 0:14:20.600
<v Speaker 2>peppermints in her left hand, and you'd have to go shot

0:14:20.640 --> 0:14:23.880
<v Speaker 2>by shot to give it. I can actually show you

0:14:24.560 --> 0:14:27.320
<v Speaker 2>something that I came up with that more clearly demonstrates

0:14:27.400 --> 0:14:29.960
<v Speaker 2>this point. So this is a video I made yesterday

0:14:30.080 --> 0:14:34.600
<v Speaker 2>with the prompt: epic anime of Diego Maradona scoring a

0:14:34.680 --> 0:14:37.000
<v Speaker 2>goal in the World Cup.

0:14:36.680 --> 0:14:41.600
<v Speaker 1>Weaving past one, still going, two defenders beaten, he won't...

0:14:42.280 --> 0:14:45.680
<v Speaker 2>This is him dribbling through an entire defense. It is

0:14:45.720 --> 0:14:49.120
<v Speaker 2>an epic-looking anime. Anime people would say it doesn't

0:14:49.160 --> 0:14:53.040
<v Speaker 2>look great, but normal people probably wouldn't notice it. And

0:14:53.320 --> 0:14:57.360
<v Speaker 2>what blew me away about this is that it created

0:14:57.720 --> 0:15:02.280
<v Speaker 2>Diego Maradona's most famous goal, and it added the announcers.

0:15:02.560 --> 0:15:04.680
<v Speaker 2>I didn't tell it to do any of that. Now,

0:15:04.720 --> 0:15:07.720
<v Speaker 2>if I compare that to what Google Veo did with

0:15:07.840 --> 0:15:15.080
<v Speaker 2>the exact same prompt, it did this.

0:15:15.000 --> 0:15:16.120
<v Speaker 1>One is b team.

0:15:16.160 --> 0:15:20.200
<v Speaker 2>It is. The quality of the video is actually better,

0:15:20.440 --> 0:15:23.840
<v Speaker 2>but it didn't make it interesting. So again, that's why

0:15:23.880 --> 0:15:25.840
<v Speaker 2>you're seeing so much Sora, as you don't need to

0:15:25.840 --> 0:15:26.600
<v Speaker 2>be very creative.

0:15:27.080 --> 0:15:30.120
<v Speaker 1>What are the implications of a social media app being

0:15:30.200 --> 0:15:34.280
<v Speaker 1>designed to house videos full of fake people? Like, it's

0:15:34.320 --> 0:15:35.760
<v Speaker 1>just crazy to me that I can watch a video

0:15:35.840 --> 0:15:37.440
<v Speaker 1>of someone who doesn't exist.

0:15:37.960 --> 0:15:42.040
<v Speaker 2>I think that we don't know the implications, and I

0:15:42.080 --> 0:15:45.920
<v Speaker 2>would push back on it being like our inevitable future

0:15:46.360 --> 0:15:49.760
<v Speaker 2>a bit, but I would say that it is normalizing

0:15:50.000 --> 0:15:53.960
<v Speaker 2>deepfaking, and I don't think we know what that

0:15:54.040 --> 0:15:56.720
<v Speaker 2>will mean for us. But I don't think it'll be good.

0:15:57.200 --> 0:15:59.960
<v Speaker 2>I think it might be entertaining, I think it might

0:15:59.960 --> 0:16:04.320
<v Speaker 2>be interesting. It is certainly a technical achievement, but I

0:16:04.320 --> 0:16:07.800
<v Speaker 2>don't consider it to be a technological advancement. I'm not

0:16:07.920 --> 0:16:11.080
<v Speaker 2>so sure it is progress. But it is a pretty

0:16:11.080 --> 0:16:14.320
<v Speaker 2>incredible thing that they've been able to pull off, and

0:16:14.840 --> 0:16:17.840
<v Speaker 2>I think that it is rational for people to look

0:16:17.880 --> 0:16:21.080
<v Speaker 2>at these videos and be pretty freaked out. And that's

0:16:21.120 --> 0:16:25.080
<v Speaker 2>what a lot of my comments are because what isn't

0:16:25.120 --> 0:16:29.320
<v Speaker 2>clear is how this is going to improve social media

0:16:29.320 --> 0:16:32.320
<v Speaker 2>in any way, or improve our media literacy skills in any way.

0:16:32.640 --> 0:16:37.760
<v Speaker 2>There are definitely tech advancements here that can improve progress

0:16:37.800 --> 0:16:42.680
<v Speaker 2>towards artificial general intelligence, like there are technical reasons that

0:16:42.720 --> 0:16:46.000
<v Speaker 2>this could be helpful in the future. But the step

0:16:46.040 --> 0:16:48.960
<v Speaker 2>that OpenAI took to release this in a social

0:16:49.040 --> 0:16:53.760
<v Speaker 2>media app was a huge jump, in my opinion, in

0:16:53.800 --> 0:16:56.920
<v Speaker 2>the wrong direction. But the technology is here to stay

0:16:56.960 --> 0:16:57.320
<v Speaker 2>for sure.

0:17:03.920 --> 0:17:08.000
<v Speaker 1>After the break, will we become desensitized to deepfakes?

0:17:08.520 --> 0:17:12.880
<v Speaker 3>Stay with us.

0:17:27.800 --> 0:17:30.120
<v Speaker 1>One thing that I can't really get over about Sora

0:17:30.240 --> 0:17:34.440
<v Speaker 1>2 is that Sam Altman is letting anybody use his likeness.

0:17:34.560 --> 0:17:37.840
<v Speaker 1>He opened his likeness to any Sora user, so I

0:17:37.840 --> 0:17:42.000
<v Speaker 1>could say, Sam Altman building a snowman, for example. Why

0:17:42.040 --> 0:17:45.320
<v Speaker 1>do this, like, as the head of the company?

0:17:45.359 --> 0:17:48.480
<v Speaker 2>I can only guess. I think that it is generally

0:17:49.080 --> 0:17:54.480
<v Speaker 2>just an attempt at normalizing deepfaking people, and I think

0:17:54.520 --> 0:17:57.240
<v Speaker 2>people should be really scared of crossing that line. I

0:17:57.240 --> 0:17:59.560
<v Speaker 2>think it's a serious thing to do, and I think

0:17:59.640 --> 0:18:03.960
<v Speaker 2>OpenAI pushing everyone in that direction before anyone was even

0:18:04.000 --> 0:18:08.320
<v Speaker 2>asking for it is really frightening. You could create

0:18:08.359 --> 0:18:11.560
<v Speaker 2>deepfakes of people before; there was the technology to do it,

0:18:11.760 --> 0:18:14.720
<v Speaker 2>but there was a lot of friction and social pressure not

0:18:14.880 --> 0:18:18.280
<v Speaker 2>to do it. That friction was helpful in keeping our

0:18:18.359 --> 0:18:22.320
<v Speaker 2>information economy healthy. Even with safety features on the Sora

0:18:22.359 --> 0:18:25.639
<v Speaker 2>app, like letting you set permissions, people are

0:18:25.640 --> 0:18:28.880
<v Speaker 2>gonna mess that up. People won't know that they can

0:18:28.880 --> 0:18:31.479
<v Speaker 2>be deepfaked, and of course that's their responsibility to know.

0:18:31.680 --> 0:18:34.120
<v Speaker 2>But you've just opened up an entire can of worms.

0:18:34.160 --> 0:18:37.200
<v Speaker 2>There are other issues here, like currently you can't delete

0:18:37.240 --> 0:18:40.639
<v Speaker 2>your Sora account without deleting your entire ChatGPT account.

0:18:40.960 --> 0:18:41.200
<v Speaker 3>Wow.

0:18:41.359 --> 0:18:43.879
<v Speaker 2>And again, like, you can't pull this back. Like, in

0:18:44.040 --> 0:18:46.280
<v Speaker 2>theory you could stop people. But if you are a

0:18:46.280 --> 0:18:48.520
<v Speaker 2>public figure and you open up this can of worms,

0:18:48.880 --> 0:18:54.040
<v Speaker 2>it could really backfire. So it's Sora accelerating this

0:18:54.080 --> 0:18:57.399
<v Speaker 2>deepfake idea into a space that just hasn't been that

0:18:57.440 --> 0:18:59.760
<v Speaker 2>fully explored yet. And I don't think I'd want to

0:18:59.760 --> 0:19:02.920
<v Speaker 2>be an early adopter of this because there's a lot of negative,

0:19:03.040 --> 0:19:05.520
<v Speaker 2>like, downside risk that I just don't think we've figured

0:19:05.600 --> 0:19:06.000
<v Speaker 2>out yet.

0:19:06.520 --> 0:19:09.040
<v Speaker 1>So you have a video where you talk about how

0:19:09.080 --> 0:19:12.600
<v Speaker 1>Sora is actually costing OpenAI about one dollar per post.

0:19:12.720 --> 0:19:15.760
<v Speaker 1>Can you explain that calculation and what it means for

0:19:15.840 --> 0:19:17.160
<v Speaker 1>Sora long term?

0:19:17.240 --> 0:19:19.520
<v Speaker 2>This was an educated guess that ended up being right.

0:19:19.840 --> 0:19:23.280
<v Speaker 2>Every video you create is basically on OpenAI's dime. So,

0:19:23.520 --> 0:19:26.840
<v Speaker 2>for example, two weeks ago, if I, as a creator

0:19:26.960 --> 0:19:30.320
<v Speaker 2>wanted to post an AI video to TikTok or Instagram,

0:19:30.680 --> 0:19:33.040
<v Speaker 2>I would have to pay a subscription to make that

0:19:33.119 --> 0:19:37.400
<v Speaker 2>video and download it or pay per post. So there

0:19:37.520 --> 0:19:42.640
<v Speaker 2>are commodity prices for these video models. For Google Veo,

0:19:42.920 --> 0:19:46.000
<v Speaker 2>it's a dollar fifty to three dollars. Sora is currently

0:19:46.080 --> 0:19:51.040
<v Speaker 2>around a dollar. But the Sora application is free, and

0:19:51.160 --> 0:19:53.760
<v Speaker 2>anytime you create an AI video on that, it is

0:19:53.880 --> 0:19:57.600
<v Speaker 2>free to you. So as always I would ask the question,

0:19:57.760 --> 0:19:59.959
<v Speaker 2>if it is free, are you the product? And in

0:20:00.080 --> 0:20:02.960
<v Speaker 2>this case, they are taking your data, they're taking your

0:20:02.960 --> 0:20:06.119
<v Speaker 2>face scans, they're taking your prompts. Right, so there's that

0:20:06.200 --> 0:20:08.040
<v Speaker 2>question of why are they doing this? Of course they're

0:20:08.040 --> 0:20:10.879
<v Speaker 2>also doing it to get users. But imagine you were

0:20:10.920 --> 0:20:15.119
<v Speaker 2>TikTok or Instagram and every single time someone posted a

0:20:15.200 --> 0:20:18.200
<v Speaker 2>video on your site you needed to pay a dollar.

0:20:18.600 --> 0:20:20.840
<v Speaker 2>How quickly is that going to add up for Sora?

0:20:21.160 --> 0:20:21.760
<v Speaker 1>Very quickly.

0:20:22.200 --> 0:20:26.280
<v Speaker 2>Would advertisers be able to make up that difference? Are

0:20:26.280 --> 0:20:28.280
<v Speaker 2>you going to need subscribers to help make up

0:20:28.320 --> 0:20:31.200
<v Speaker 2>that difference? I mean, video takes a ton of compute.

0:20:31.240 --> 0:20:34.640
<v Speaker 2>It is costing them GPU compute, it is costing them

0:20:35.119 --> 0:20:38.760
<v Speaker 2>opportunity costs. The GPUs could be used for other things, right,

0:20:39.080 --> 0:20:42.639
<v Speaker 2>So the fact that they chose a video social media

0:20:42.680 --> 0:20:45.040
<v Speaker 2>app where every time someone posts on your platform it's

0:20:45.040 --> 0:20:48.160
<v Speaker 2>costing you money is pretty confusing to me as someone

0:20:48.200 --> 0:20:52.440
<v Speaker 2>who understands that those advertiser clicks are not even close

0:20:52.480 --> 0:20:53.359
<v Speaker 2>to worth that much.

0:20:53.720 --> 0:20:59.199
<v Speaker 1>My question is, if you're Sam Altman, you oversee the

0:20:59.320 --> 0:21:04.280
<v Speaker 1>most popular AI tool on the market, why are

0:21:04.280 --> 0:21:05.600
<v Speaker 1>you going into social media?

0:21:06.160 --> 0:21:09.320
<v Speaker 2>You're asking the right question that I think even OpenAI's

0:21:09.359 --> 0:21:12.480
<v Speaker 2>own employees are asking. There has been some reporting

0:21:12.680 --> 0:21:16.280
<v Speaker 2>on even OpenAI people being confused by this. At

0:21:16.280 --> 0:21:19.560
<v Speaker 2>the end of the day, TikTok is releasing an AI generator,

0:21:19.600 --> 0:21:21.679
<v Speaker 2>I get ads for that all the time. YouTube is

0:21:21.720 --> 0:21:26.400
<v Speaker 2>putting Google Veo 3 into YouTube Shorts. Everyone's looking at

0:21:26.400 --> 0:21:29.680
<v Speaker 2>this as how do we build like the AI video feed?

0:21:29.960 --> 0:21:34.359
<v Speaker 2>And it appears to me the rationale would be to

0:21:34.600 --> 0:21:38.160
<v Speaker 2>generate some sort of advertiser revenue. I think that would

0:21:38.200 --> 0:21:41.040
<v Speaker 2>be the simple answer. But whether or not that actually

0:21:41.040 --> 0:21:43.680
<v Speaker 2>works is a huge open question.

0:21:44.200 --> 0:21:47.280
<v Speaker 1>So in the future, say, Sora, the app is running

0:21:47.320 --> 0:21:48.560
<v Speaker 1>ads between videos.

0:21:48.760 --> 0:21:50.680
<v Speaker 2>Yeah, absolutely. Interesting.

0:21:51.640 --> 0:21:54.040
<v Speaker 1>So in one of your videos, you say that AI

0:21:54.200 --> 0:21:56.720
<v Speaker 1>will end social media. What do you mean by that?

0:21:57.480 --> 0:21:59.960
<v Speaker 2>I think it has the potential to end the For

0:22:00.119 --> 0:22:03.080
<v Speaker 2>You page as we know it, unless the social media

0:22:03.119 --> 0:22:07.800
<v Speaker 2>companies figure out a way to filter AI content. Again,

0:22:08.240 --> 0:22:10.720
<v Speaker 2>we do not know how people are going to react

0:22:10.720 --> 0:22:14.719
<v Speaker 2>to this when it's deployed much wider. But it is

0:22:14.760 --> 0:22:18.840
<v Speaker 2>a rational thing to not want to only see AI

0:22:18.920 --> 0:22:21.520
<v Speaker 2>slop in your feed. And I say AI slop because

0:22:21.560 --> 0:22:24.240
<v Speaker 2>it's bad. Let's even assume that it's better. Let's assume

0:22:24.240 --> 0:22:28.560
<v Speaker 2>that AI video were indistinguishable. If that were the case,

0:22:29.000 --> 0:22:31.040
<v Speaker 2>would you actually want more of it in your feed,

0:22:31.480 --> 0:22:33.720
<v Speaker 2>or would you want to turn it off even more.

0:22:34.280 --> 0:22:36.639
<v Speaker 2>I don't think that we know the answers to these questions,

0:22:37.080 --> 0:22:41.480
<v Speaker 2>but it's very likely that if companies that are running

0:22:41.480 --> 0:22:45.280
<v Speaker 2>these platforms can't figure out a way to filter out

0:22:45.320 --> 0:22:48.600
<v Speaker 2>AI content, there's a part of the population that's going

0:22:48.640 --> 0:22:52.359
<v Speaker 2>to start tuning out. There's also advertisers that might be

0:22:52.400 --> 0:22:55.400
<v Speaker 2>scared by that, So I do think it's an existential

0:22:55.440 --> 0:22:57.800
<v Speaker 2>threat to the for you page. I think it actually

0:22:57.880 --> 0:23:01.800
<v Speaker 2>might be a boon for these subscriber or Substack-type communities,

0:23:02.040 --> 0:23:05.200
<v Speaker 2>like, I think that's interesting, when people start rushing towards people

0:23:05.200 --> 0:23:07.920
<v Speaker 2>that they trust, I think that that could be a really,

0:23:08.000 --> 0:23:11.399
<v Speaker 2>really positive thing. I'll say for me, one of the

0:23:11.440 --> 0:23:13.119
<v Speaker 2>things that I would be looking at if I were

0:23:13.119 --> 0:23:16.200
<v Speaker 2>an AI creator is the fact that, because Sora 2

0:23:16.480 --> 0:23:19.439
<v Speaker 2>is so good at making videos, it lowered the barrier

0:23:19.440 --> 0:23:22.400
<v Speaker 2>of entry so far that I don't think OpenAI

0:23:22.560 --> 0:23:25.760
<v Speaker 2>is that far from generating their own feed. You know,

0:23:25.760 --> 0:23:28.879
<v Speaker 2>if you can make an interesting video with only two sentences, well,

0:23:28.960 --> 0:23:32.840
<v Speaker 2>ChatGPT can make two sentences. They're collecting everyone's prompts,

0:23:32.880 --> 0:23:37.760
<v Speaker 2>they're seeing what gets likes and engagement on Sora. I

0:23:37.800 --> 0:23:40.119
<v Speaker 2>don't understand why they would need a human in the

0:23:40.160 --> 0:23:40.720
<v Speaker 2>loop soon.

0:23:41.200 --> 0:23:44.639
<v Speaker 1>I believe there's, actually, we were just covering a story

0:23:44.680 --> 0:23:48.080
<v Speaker 1>in the Financial Times about gen Z being less on

0:23:48.200 --> 0:23:50.840
<v Speaker 1>social media, and I think a lot of it has

0:23:50.880 --> 0:23:54.520
<v Speaker 1>to do with the sort of enshittification of the feed.

0:23:55.040 --> 0:23:57.320
<v Speaker 1>And I see a lot of people kind of resigned

0:23:57.359 --> 0:24:00.200
<v Speaker 1>to the fact that going on Instagram means scrolling through

0:24:00.359 --> 0:24:01.840
<v Speaker 1>a lot of shit, and a lot of shit that's

0:24:01.840 --> 0:24:05.320
<v Speaker 1>AI generated. It's no longer social media. It's like watching

0:24:05.359 --> 0:24:09.080
<v Speaker 1>fake video. Yeah, it's hyper-enshittification. It is the

0:24:09.200 --> 0:24:13.240
<v Speaker 1>most enshittified feed you could possibly have. And I

0:24:13.280 --> 0:24:16.960
<v Speaker 1>totally agree that there will be people who are

0:24:17.000 --> 0:24:19.560
<v Speaker 1>super down with that and who are going to enjoy it.

0:24:20.119 --> 0:24:22.520
<v Speaker 2>Again, there are people who enjoy this. I don't want

0:24:22.600 --> 0:24:25.520
<v Speaker 2>to say that they're doing the wrong thing by enjoying

0:24:25.560 --> 0:24:26.439
<v Speaker 2>AI video.

0:24:26.480 --> 0:24:28.720
<v Speaker 1>Fruit cutting another fruit, something like that.

0:24:29.040 --> 0:24:31.560
<v Speaker 2>Yeah, Like, I'm not here to judge what people are watching.

0:24:31.920 --> 0:24:36.359
<v Speaker 2>But if you play this out to its logical conclusion, here,

0:24:36.720 --> 0:24:41.720
<v Speaker 2>it looks like social media companies generating their own videos

0:24:41.800 --> 0:24:46.679
<v Speaker 2>without creators in the middle, for a hyper-enshittified feed.

0:24:47.320 --> 0:24:49.840
<v Speaker 1>So five to ten years is a huge difference. So

0:24:49.920 --> 0:24:52.600
<v Speaker 1>let's just say, five years from now, what do you

0:24:52.680 --> 0:24:55.199
<v Speaker 1>think the state of AI video looks like, and what

0:24:55.240 --> 0:24:58.080
<v Speaker 1>does it mean for the Internet, for politics, and just

0:24:58.240 --> 0:24:59.439
<v Speaker 1>us generally as a culture.

0:25:00.119 --> 0:25:05.640
<v Speaker 2>If we project the current growth out, it is indistinguishable

0:25:05.720 --> 0:25:10.200
<v Speaker 2>and everywhere. If we take a contrarian view, we can

0:25:10.280 --> 0:25:12.560
<v Speaker 2>see that people might not be into it and it

0:25:12.640 --> 0:25:15.680
<v Speaker 2>might lose a lot of money. We don't know which

0:25:15.680 --> 0:25:18.040
<v Speaker 2>direction it's going to go, and I don't claim to

0:25:18.080 --> 0:25:21.439
<v Speaker 2>be able to tell which direction we're going in. But

0:25:21.840 --> 0:25:25.000
<v Speaker 2>in that first scenario where it's indistinguishable, it'll still be

0:25:25.040 --> 0:25:30.080
<v Speaker 2>distinguishable by machine learning algorithms, it'll still be detectable by experts.

0:25:30.400 --> 0:25:34.520
<v Speaker 2>I still don't think it presents legal problems, but it

0:25:34.560 --> 0:25:39.359
<v Speaker 2>presents massive disinformation problems. I'm very scared about that. And

0:25:39.400 --> 0:25:41.920
<v Speaker 2>then there's another scenario which I think is a little

0:25:41.960 --> 0:25:44.439
<v Speaker 2>bit more optimistic, which I actually subscribe to, which is

0:25:44.440 --> 0:25:48.080
<v Speaker 2>that AI content becomes its own genre. There are companies

0:25:48.080 --> 0:25:51.360
<v Speaker 2>that figure out a way to monetize it. It stays

0:25:51.480 --> 0:25:57.720
<v Speaker 2>separate from our real feeds to whatever degree the viewer wants.

0:25:58.119 --> 0:26:00.199
<v Speaker 2>And I think that this is the optimistic vision, and

0:26:00.280 --> 0:26:02.040
<v Speaker 2>one that a lot of the tech community believes in too,

0:26:02.080 --> 0:26:04.000
<v Speaker 2>and that Sam Altman would probably say, you know, he's

0:26:04.040 --> 0:26:05.800
<v Speaker 2>been asked about this, He's been asked, how do we

0:26:05.800 --> 0:26:09.240
<v Speaker 2>tell what's real or fake? And I actually didn't hate

0:26:09.240 --> 0:26:11.600
<v Speaker 2>his answer. He said, well, just like we've always done,

0:26:11.800 --> 0:26:13.800
<v Speaker 2>we follow the people we trust, like we have human

0:26:13.880 --> 0:26:18.280
<v Speaker 2>communication networks. Now, I think that his accelerationist view is

0:26:18.400 --> 0:26:20.639
<v Speaker 2>kind of running against that a little bit, but I

0:26:20.680 --> 0:26:22.880
<v Speaker 2>do believe that at its core, that's how we're going

0:26:22.880 --> 0:26:26.320
<v Speaker 2>to figure this out, and it might push people less online.

0:26:26.440 --> 0:26:29.359
<v Speaker 2>Like I just think that there's just so many unanswered questions.

0:26:29.400 --> 0:26:32.359
<v Speaker 2>But yeah, there's a few different scenarios that right now,

0:26:32.480 --> 0:26:34.159
<v Speaker 2>I think we just have to flip a coin on

0:26:34.200 --> 0:26:35.639
<v Speaker 2>which one we believe in.

0:26:37.160 --> 0:26:40.080
<v Speaker 1>So you said the reason that you got interested in

0:26:40.280 --> 0:26:45.040
<v Speaker 1>understanding AI video was as a tool for production. When

0:26:45.040 --> 0:26:47.480
<v Speaker 1>that was the case, what were you excited about and

0:26:47.520 --> 0:26:49.520
<v Speaker 1>sort of why has that now changed for you?

0:26:50.720 --> 0:26:54.080
<v Speaker 2>I was excited about it lowering the barriers to doing

0:26:54.200 --> 0:26:57.080
<v Speaker 2>creative things. I have a green screen studio in my basement.

0:26:57.119 --> 0:26:59.199
<v Speaker 2>I was excited about it, you know, putting me in

0:26:59.200 --> 0:27:01.879
<v Speaker 2>different types of studios and different types of environments. I

0:27:01.920 --> 0:27:06.800
<v Speaker 2>was excited about it improving my graphics workflows. What started

0:27:06.840 --> 0:27:09.159
<v Speaker 2>steering me away from it was some of the

0:27:09.160 --> 0:27:11.960
<v Speaker 2>ethical concerns. I did realize that at the end of

0:27:12.000 --> 0:27:15.720
<v Speaker 2>the day, like this was mostly stolen information. It was

0:27:15.920 --> 0:27:19.320
<v Speaker 2>actually not that much more useful than the actual room

0:27:19.520 --> 0:27:21.600
<v Speaker 2>I'm in right now, like I can make a decent

0:27:21.640 --> 0:27:28.639
<v Speaker 2>studio myself. And really what made me turn was just

0:27:28.920 --> 0:27:31.680
<v Speaker 2>using the tools. I think a lot of the people

0:27:31.960 --> 0:27:36.359
<v Speaker 2>who are using them, who come from my background, realize

0:27:36.359 --> 0:27:39.320
<v Speaker 2>that they aren't very fun tools to use. It's not

0:27:39.359 --> 0:27:42.000
<v Speaker 2>a creative process for me. It's really frustrating.

0:27:42.080 --> 0:27:43.280
<v Speaker 1>Well, you just type something in.

0:27:43.400 --> 0:27:45.080
<v Speaker 2>You just type something in, and you hope it comes

0:27:45.080 --> 0:27:47.480
<v Speaker 2>back the way you want it. It's like, because

0:27:47.480 --> 0:27:49.240
<v Speaker 2>I have a history as a director, it is like

0:27:49.400 --> 0:27:52.119
<v Speaker 2>every time I needed to tell the actor exactly what

0:27:52.240 --> 0:27:55.720
<v Speaker 2>to say, exactly how to deliver it, over and over

0:27:55.840 --> 0:27:59.560
<v Speaker 2>and over. And as a creative person and as a director,

0:28:00.080 --> 0:28:02.680
<v Speaker 2>I just want to collaborate with people who bring something

0:28:02.680 --> 0:28:04.520
<v Speaker 2>to the table. I don't want to bring everything to

0:28:04.560 --> 0:28:06.479
<v Speaker 2>the table myself. I don't want to tell everyone how

0:28:06.480 --> 0:28:09.040
<v Speaker 2>to do everything, right? That's not what the process of

0:28:09.080 --> 0:28:11.919
<v Speaker 2>creating ever was. It was always about collaboration. It was

0:28:11.920 --> 0:28:14.720
<v Speaker 2>always a fun process. I find the idea of just

0:28:14.720 --> 0:28:19.119
<v Speaker 2>sitting in my basement creating AI videos with text is

0:28:19.160 --> 0:28:22.719
<v Speaker 2>just exhausting. It doesn't feel creative at all.

0:28:23.320 --> 0:28:25.960
<v Speaker 2>But I'm not saying that people should hate every AI

0:28:26.080 --> 0:28:28.680
<v Speaker 2>video they see, like some of them can be creative.

0:28:28.720 --> 0:28:32.280
<v Speaker 2>But yeah, it's just taking that opportunity to train yourself

0:28:32.320 --> 0:28:34.600
<v Speaker 2>to see what these video models look like. Because if

0:28:34.600 --> 0:28:37.439
<v Speaker 2>you're into it, that's totally fine, but then you're at

0:28:37.520 --> 0:28:40.080
<v Speaker 2>least ready for when it is used for disinformation, which

0:28:40.080 --> 0:28:41.440
<v Speaker 2>I think is inevitable at this point.

0:28:41.840 --> 0:28:45.680
<v Speaker 1>Well, thank you so much, Jeremy. I will be tuned

0:28:45.760 --> 0:28:48.680
<v Speaker 1>into your feed. You are, I don't know what I

0:28:48.680 --> 0:28:51.520
<v Speaker 1>would call you. Is it vigilante justice? I don't think so.

0:28:51.720 --> 0:28:55.520
<v Speaker 1>But you're doing some kind of public service, education.

0:28:55.640 --> 0:28:57.200
<v Speaker 3>You're an educator. Yeah, there you go.

0:28:57.320 --> 0:28:58.480
<v Speaker 1>You're an AI educator.

0:28:58.600 --> 0:29:22.040
<v Speaker 3>Yeah. For Tech Stuff,

0:29:22.240 --> 0:29:25.520
<v Speaker 1>I'm Kara Price. This episode was produced by Eliza Dennis,

0:29:25.560 --> 0:29:28.680
<v Speaker 1>Melissa Slaughter, and Tyler Hill. It was executive produced by

0:29:28.720 --> 0:29:32.720
<v Speaker 1>me, Oz Woloshyn, Julia Nutter, and Kate Osborne for Kaleidoscope

0:29:33.000 --> 0:29:36.680
<v Speaker 1>and Katrina Norvell for iHeart Podcasts. Kyle Murdoch mixed this

0:29:36.760 --> 0:29:39.680
<v Speaker 1>episode and wrote our theme song. Join us on Friday

0:29:39.720 --> 0:29:41.840
<v Speaker 1>for The Week in Tech. Oz and I will run

0:29:41.880 --> 0:29:44.640
<v Speaker 1>through the headlines you may have missed. Please rate, review,

0:29:44.680 --> 0:29:47.160
<v Speaker 1>and reach out to us at Tech Stuff Podcast at

0:29:47.160 --> 0:29:57.320
<v Speaker 1>gmail dot com.