WEBVTT - The AI that Might Take My Job (And Yours)

0:00:15.356 --> 0:00:15.796
<v Speaker 1>Pushkin.

0:00:22.036 --> 0:00:24.476
<v Speaker 2>After every interview we do for the show, we upload

0:00:24.516 --> 0:00:28.236
<v Speaker 2>the audio to a piece of software called descript. Descript

0:00:28.316 --> 0:00:31.476
<v Speaker 2>turns the audio into a transcript, and then I can

0:00:31.756 --> 0:00:35.556
<v Speaker 2>edit that transcript, cut out the boring parts, move sections around,

0:00:35.916 --> 0:00:39.876
<v Speaker 2>and when I do that, descript edits the underlying audio

0:00:40.036 --> 0:00:45.436
<v Speaker 2>to match. As software, Descript is pretty janky, buggy. It's

0:00:45.476 --> 0:00:47.996
<v Speaker 2>constantly changing in ways that can make it hard to use,

0:00:48.356 --> 0:00:51.556
<v Speaker 2>and sometimes it just blows stuff up. But we use

0:00:51.596 --> 0:00:55.716
<v Speaker 2>it anyway because Descript is an incredible.

0:00:54.956 --> 0:00:56.676
<v Speaker 1>Advance over what came before.

0:00:57.396 --> 0:01:01.956
<v Speaker 2>Before descript audio software represented audio files not as words

0:01:01.996 --> 0:01:05.436
<v Speaker 2>that you can read and edit, but as waveforms, as

0:01:05.676 --> 0:01:07.676
<v Speaker 2>squiggly lines presented.

0:01:07.316 --> 0:01:09.876
<v Speaker 1>On a timeline. So when does the script came along?

0:01:09.996 --> 0:01:12.396
<v Speaker 2>Being able to edit audio by editing words on a

0:01:12.436 --> 0:01:15.956
<v Speaker 2>screen was this huge advance, and it was an advance

0:01:15.956 --> 0:01:21.876
<v Speaker 2>made possible by artificial intelligence. Eventually, Descript expanded to allow

0:01:21.916 --> 0:01:25.036
<v Speaker 2>people to edit not just audio but also video, and

0:01:25.116 --> 0:01:28.636
<v Speaker 2>last fall, open AI, the company that makes chat GPT,

0:01:29.196 --> 0:01:30.956
<v Speaker 2>led a fifty million dollar.

0:01:30.796 --> 0:01:32.396
<v Speaker 1>Investment round in Descript.

0:01:32.876 --> 0:01:35.636
<v Speaker 2>It's a sign that Descript is moving out to the

0:01:35.676 --> 0:01:40.036
<v Speaker 2>new AI frontier, the frontier of generative AI. AI that

0:01:40.076 --> 0:01:44.076
<v Speaker 2>creates words and pictures. This is of immediate interest to me,

0:01:44.796 --> 0:01:47.836
<v Speaker 2>as in is AI gonna help me do my job?

0:01:48.236 --> 0:01:50.756
<v Speaker 2>Is AI gonna do my job?

0:01:51.276 --> 0:01:54.116
<v Speaker 1>But there is also a bigger question here what is

0:01:54.196 --> 0:01:55.076
<v Speaker 1>AI going to mean?

0:01:55.236 --> 0:01:58.956
<v Speaker 2>More broadly for people whose jobs involve writing things and

0:01:58.996 --> 0:02:02.116
<v Speaker 2>creating visuals, which is to say, what is AI going

0:02:02.196 --> 0:02:11.316
<v Speaker 2>to mean for almost all white collar workers. I'm Jacob

0:02:11.356 --> 0:02:13.676
<v Speaker 2>Goldstein and this is What's Your Problem, a show about

0:02:13.716 --> 0:02:17.556
<v Speaker 2>people trying to make technological progress. My guest today is

0:02:17.636 --> 0:02:21.516
<v Speaker 2>Andrew Mason, founder and CEO of descript or maybe it's

0:02:21.836 --> 0:02:25.156
<v Speaker 2>descript by the way, I've always said descript and I'm

0:02:25.156 --> 0:02:28.356
<v Speaker 2>pretty sure that's wrong, right, it's a descript like dtour.

0:02:28.436 --> 0:02:30.196
<v Speaker 3>Weird noncommittal on the issue.

0:02:30.316 --> 0:02:31.756
<v Speaker 2>Let's do the subjective version.

0:02:31.756 --> 0:02:33.676
<v Speaker 1>You're just one man. How do you say the name

0:02:33.716 --> 0:02:34.356
<v Speaker 1>of your company?

0:02:34.676 --> 0:02:37.836
<v Speaker 3>Yeah, I've kind of cultivated the ability to flip between

0:02:37.876 --> 0:02:38.876
<v Speaker 3>them as I speak.

0:02:39.716 --> 0:02:40.396
<v Speaker 1>You're killing me.

0:02:40.556 --> 0:02:42.036
<v Speaker 3>The world still needs a little mystery.

0:02:42.196 --> 0:02:44.436
<v Speaker 1>Okay, how about this? Say your name and your job.

0:02:45.156 --> 0:02:48.476
<v Speaker 3>My name is Andrew Mason. I work at descript. That's

0:02:49.036 --> 0:02:53.316
<v Speaker 3>dscript dscript.

0:02:53.556 --> 0:02:54.476
<v Speaker 1>Well played.

0:02:55.836 --> 0:02:58.516
<v Speaker 2>Earlier in his career. Andrew Mason was the co founder

0:02:58.556 --> 0:03:01.596
<v Speaker 2>of groupunk. He took the company public and then got

0:03:01.636 --> 0:03:05.076
<v Speaker 2>fired after its stock fell by something like seventy five percent.

0:03:05.596 --> 0:03:08.676
<v Speaker 2>After that, he started a company called Dtour or maybe

0:03:08.676 --> 0:03:12.236
<v Speaker 2>it's I don't know. The company made these highly produced

0:03:12.316 --> 0:03:15.356
<v Speaker 2>audio walking tours that you could listen to on your phone.

0:03:15.396 --> 0:03:18.276
<v Speaker 2>In that job, Andrew saw the challenges of working with

0:03:18.356 --> 0:03:20.236
<v Speaker 2>the old waveform.

0:03:19.716 --> 0:03:21.276
<v Speaker 1>Based audio editing software.

0:03:21.596 --> 0:03:25.116
<v Speaker 2>At the same time, AI generated transcripts were getting better

0:03:25.276 --> 0:03:28.476
<v Speaker 2>and cheaper, and new technology was making it possible to

0:03:28.636 --> 0:03:32.476
<v Speaker 2>automatically match a transcript to an audio file. Andrew looked

0:03:32.476 --> 0:03:35.076
<v Speaker 2>at those two developments and thought, we should make an

0:03:35.116 --> 0:03:38.396
<v Speaker 2>audio editor that works like a word processor, which he

0:03:38.516 --> 0:03:41.036
<v Speaker 2>admits was a distraction from what he was supposed to

0:03:41.036 --> 0:03:43.276
<v Speaker 2>be doing, which was making walking tours.

0:03:44.916 --> 0:03:47.396
<v Speaker 3>If I'm being honest, it was a bit of an indulgence.

0:03:47.636 --> 0:03:50.436
<v Speaker 3>It just felt like an incredibly cool problem to work on.

0:03:51.076 --> 0:03:53.956
<v Speaker 3>I went to school for music technology and worked in

0:03:53.996 --> 0:03:58.156
<v Speaker 3>a recording studio after I graduated, and just always loved tools,

0:03:58.236 --> 0:04:02.436
<v Speaker 3>and audio visual tools in particular. It was just so

0:04:02.756 --> 0:04:05.076
<v Speaker 3>fun to start thinking about this puzzle.

0:04:05.636 --> 0:04:06.316
<v Speaker 1>Uh huh.

0:04:06.356 --> 0:04:10.676
<v Speaker 3>So we told ourselves it was kind of way of diversifying,

0:04:10.716 --> 0:04:13.636
<v Speaker 3>but that's just like a ridiculous way for a product

0:04:13.676 --> 0:04:15.796
<v Speaker 3>that at like a startup that just hasn't even found

0:04:16.236 --> 0:04:19.716
<v Speaker 3>product market fit in their core product to be thinking

0:04:19.796 --> 0:04:22.196
<v Speaker 3>about the world. You know, all of the advice textbooks

0:04:22.236 --> 0:04:24.836
<v Speaker 3>will tell you not to do that, and it's probably

0:04:24.876 --> 0:04:28.956
<v Speaker 3>generally good advice, but it was just it was just irresistible.

0:04:28.596 --> 0:04:28.796
<v Speaker 1>You know.

0:04:28.876 --> 0:04:31.516
<v Speaker 2>So I am a fan of descript. I started using

0:04:31.596 --> 0:04:35.036
<v Speaker 2>it around when it came out several years ago. Certainly

0:04:35.636 --> 0:04:38.996
<v Speaker 2>I think it's great. It is kind of janky, and

0:04:39.076 --> 0:04:42.836
<v Speaker 2>it's always kind of janky, right, And my guess is

0:04:43.396 --> 0:04:46.556
<v Speaker 2>jankie meaning like a little bit unstable, things don't quite work.

0:04:46.596 --> 0:04:48.156
<v Speaker 2>It's always telling you to restart.

0:04:48.676 --> 0:04:50.356
<v Speaker 3>By the way, if I'm not sure, if I have

0:04:50.516 --> 0:04:52.476
<v Speaker 3>your side, so I may ask you to send me

0:04:52.596 --> 0:04:54.676
<v Speaker 3>this entire portion of the interview so I can share

0:04:54.676 --> 0:04:55.236
<v Speaker 3>it with the team.

0:04:55.756 --> 0:04:58.156
<v Speaker 2>So the thing is, like, I wonder, why is it

0:04:58.156 --> 0:05:00.436
<v Speaker 2>always kind of janky? Why is it never just like

0:05:00.476 --> 0:05:02.916
<v Speaker 2>stable and it works? And my guess is it's because

0:05:02.916 --> 0:05:05.436
<v Speaker 2>you're pushing forward really fast, right, You're trying to make

0:05:05.476 --> 0:05:08.476
<v Speaker 2>it better and better and better, and presumably there is

0:05:08.516 --> 0:05:11.236
<v Speaker 2>some trace off, right, like the faster you try and

0:05:11.276 --> 0:05:14.796
<v Speaker 2>push it forward, the more janky it's gonna be. You could,

0:05:14.836 --> 0:05:17.236
<v Speaker 2>I'm sure, just perfect the way it was four years ago,

0:05:17.316 --> 0:05:18.836
<v Speaker 2>but then it would never get better, but it would

0:05:18.836 --> 0:05:21.076
<v Speaker 2>be stable, right, And so this is like a big

0:05:21.116 --> 0:05:25.236
<v Speaker 2>whatever startup founder type question, like, is that some balance

0:05:25.236 --> 0:05:27.276
<v Speaker 2>you're always trying to figure out how fast do we

0:05:27.756 --> 0:05:29.756
<v Speaker 2>iterate versus how much do we try and make it

0:05:29.876 --> 0:05:31.036
<v Speaker 2>just stable and work.

0:05:31.876 --> 0:05:36.796
<v Speaker 3>Yeah, that's an astute observation, not the fact that it's janky.

0:05:36.996 --> 0:05:38.596
<v Speaker 3>That doesn't take a genius.

0:05:38.316 --> 0:05:41.236
<v Speaker 2>Respectful but respectfully as a fan.

0:05:41.076 --> 0:05:41.916
<v Speaker 1>I'm telling you it's.

0:05:43.396 --> 0:05:45.716
<v Speaker 3>But I think like your your attempt to make sense

0:05:45.756 --> 0:05:48.116
<v Speaker 3>out of it, I think like a good story to

0:05:48.196 --> 0:05:51.116
<v Speaker 3>tell here is maybe like going back to the to

0:05:51.196 --> 0:05:56.116
<v Speaker 3>the very beginning of Descript. So when we became Descript,

0:05:56.476 --> 0:06:01.236
<v Speaker 3>we sold off Detour to Bows and we decided to

0:06:01.356 --> 0:06:05.796
<v Speaker 3>just focus on building out this media word processor thing.

0:06:06.996 --> 0:06:10.956
<v Speaker 3>And some of the public radio producers who had worked

0:06:10.956 --> 0:06:14.916
<v Speaker 3>at Detour went on back into public radio and they

0:06:14.996 --> 0:06:21.596
<v Speaker 3>became some of the earliest customers of dscript. And what

0:06:21.756 --> 0:06:27.316
<v Speaker 3>we found was that they pushed it so much farther

0:06:27.876 --> 0:06:29.756
<v Speaker 3>than we were ready for.

0:06:30.236 --> 0:06:32.876
<v Speaker 1>Ah, so quickly, what do you mean by that? Like,

0:06:32.916 --> 0:06:34.036
<v Speaker 1>what is an example of that.

0:06:34.316 --> 0:06:37.076
<v Speaker 3>Yeah, I mean specifically in the case of like some

0:06:37.116 --> 0:06:40.836
<v Speaker 3>of these shows, it means putting together three to five

0:06:40.916 --> 0:06:45.756
<v Speaker 3>hour cuts of tape from many different files, with lots

0:06:45.876 --> 0:06:49.716
<v Speaker 3>like tons and tons of edits and notes mixed into

0:06:49.716 --> 0:06:54.916
<v Speaker 3>the edits, and just like stuff that we hadn't pressure

0:06:54.956 --> 0:06:58.396
<v Speaker 3>tested from a performance just giant.

0:06:58.476 --> 0:07:01.276
<v Speaker 2>The files are really big, right, Like a three hour

0:07:01.636 --> 0:07:04.276
<v Speaker 2>audio file is actually a giant file, right, And if

0:07:04.276 --> 0:07:05.836
<v Speaker 2>you're stacking up a bunch of those, so you have

0:07:05.836 --> 0:07:08.556
<v Speaker 2>all these giant files and you're making tons of cuts,

0:07:08.636 --> 0:07:14.196
<v Speaker 2>that's just like computationally intensive, that kind of thing storage intensive.

0:07:15.036 --> 0:07:17.396
<v Speaker 3>Yeah, it was just something that we hadn't that we

0:07:17.436 --> 0:07:20.876
<v Speaker 3>hadn't optimized for. It's it's an eminently solvable problem, but

0:07:20.916 --> 0:07:25.396
<v Speaker 3>it was something that in the earliest versions we hadn't done.

0:07:26.036 --> 0:07:28.676
<v Speaker 3>And so that is kind of in many ways been

0:07:28.796 --> 0:07:32.316
<v Speaker 3>the story of descript up to this point, where there's

0:07:32.396 --> 0:07:35.476
<v Speaker 3>there's been that element of it, and there were kind

0:07:35.556 --> 0:07:39.156
<v Speaker 3>of realities of needing to make quick progress that we

0:07:39.196 --> 0:07:46.636
<v Speaker 3>had to balance against stability and what we had for

0:07:46.716 --> 0:07:50.436
<v Speaker 3>our customers in terms of like the core product idea

0:07:50.996 --> 0:07:53.756
<v Speaker 3>of being able to edit by text was still for

0:07:53.916 --> 0:07:58.476
<v Speaker 3>them so much better than the alternative that there was

0:07:59.276 --> 0:08:04.516
<v Speaker 3>just a tolerance of the stability issues that honestly made

0:08:04.596 --> 0:08:06.596
<v Speaker 3>us sick to our stomachs that we had to put

0:08:06.636 --> 0:08:10.036
<v Speaker 3>people through. And it's not like wegn but it was

0:08:10.116 --> 0:08:13.596
<v Speaker 3>like we had to make trade offs there. So all

0:08:13.636 --> 0:08:16.516
<v Speaker 3>of this pushing kind of culminated with this release of

0:08:16.556 --> 0:08:20.316
<v Speaker 3>a pretty major overhaul that we did at the end

0:08:20.316 --> 0:08:25.996
<v Speaker 3>of last year and since then, since last November and

0:08:26.556 --> 0:08:29.996
<v Speaker 3>really like through the first half of this year, is

0:08:30.036 --> 0:08:32.036
<v Speaker 3>when we think we start to get to a good place.

0:08:32.796 --> 0:08:35.396
<v Speaker 3>Our goal is that if we're having this conversation, like

0:08:35.436 --> 0:08:38.516
<v Speaker 3>we're not going to be having the same conversation in

0:08:38.716 --> 0:08:41.436
<v Speaker 3>say July, for sure at the very latest, like the

0:08:41.436 --> 0:08:44.196
<v Speaker 3>conversation we'll be having with someone like you will be Wow,

0:08:44.236 --> 0:08:46.756
<v Speaker 3>it's gotten. It's like not an issue anymore.

0:08:46.876 --> 0:08:50.516
<v Speaker 2>So you say all that, but also, you just got

0:08:50.516 --> 0:08:53.916
<v Speaker 2>this big investment from open Ai. You got a thing

0:08:54.116 --> 0:08:57.756
<v Speaker 2>on descript that says sign up to try GPT four

0:08:57.876 --> 0:09:00.836
<v Speaker 2>with Descript, which I just signed up for and I'm

0:09:00.916 --> 0:09:04.636
<v Speaker 2>very curious about That doesn't sound like, oh, we've arrived

0:09:04.676 --> 0:09:06.316
<v Speaker 2>and now we've got our product and we've just got

0:09:06.316 --> 0:09:08.316
<v Speaker 2>a hone it that sounds like there's this whole giant

0:09:08.396 --> 0:09:10.356
<v Speaker 2>new universe of things you were about to try and

0:09:10.356 --> 0:09:10.916
<v Speaker 2>figure out.

0:09:11.556 --> 0:09:13.916
<v Speaker 3>That's true, And that's the funny thing about all of

0:09:13.916 --> 0:09:16.996
<v Speaker 3>this are is that at the same time that we're

0:09:16.996 --> 0:09:21.076
<v Speaker 3>turning to focus on quality, it's a moment where generative

0:09:21.076 --> 0:09:25.076
<v Speaker 3>AI has arrived at a scale and with a force

0:09:25.316 --> 0:09:28.156
<v Speaker 3>that no one really saw coming this quickly.

0:09:28.676 --> 0:09:31.396
<v Speaker 2>So so okay, I know from the beginning Descript was

0:09:31.396 --> 0:09:35.076
<v Speaker 2>built on top of AI, you know, the technology for

0:09:35.116 --> 0:09:39.516
<v Speaker 2>transcription for matching audio to text, but was Descript itself

0:09:39.596 --> 0:09:40.716
<v Speaker 2>an AI company.

0:09:41.516 --> 0:09:44.236
<v Speaker 3>So we had some really smart people on the team

0:09:44.596 --> 0:09:47.636
<v Speaker 3>in UH with machine learning experience, but I wouldn't say

0:09:47.636 --> 0:09:50.196
<v Speaker 3>in the early days we were like a company that

0:09:50.676 --> 0:09:53.956
<v Speaker 3>was with anybody that was doing like original AI research

0:09:54.036 --> 0:09:59.436
<v Speaker 3>or anything like that. We saw that as a gap

0:09:59.836 --> 0:10:05.036
<v Speaker 3>that we wanted to solve. And so I forget exactly

0:10:05.636 --> 0:10:07.956
<v Speaker 3>what year it was, it was maybe about four years

0:10:07.956 --> 0:10:12.916
<v Speaker 3>ago we saw this company called Liarbird. It was a

0:10:12.916 --> 0:10:16.436
<v Speaker 3>a company out of y Combinator with some really smart

0:10:17.076 --> 0:10:21.036
<v Speaker 3>PhD candidates. They had built model that would build a

0:10:21.076 --> 0:10:23.836
<v Speaker 3>clone of your voice based on I think about three

0:10:24.076 --> 0:10:27.476
<v Speaker 3>minutes or five minutes of training data. Of just talking

0:10:27.556 --> 0:10:27.796
<v Speaker 3>to it.

0:10:27.876 --> 0:10:31.756
<v Speaker 2>Let me just say, I know Liarbird is spelled l yri,

0:10:32.596 --> 0:10:34.476
<v Speaker 2>but I assume they're aware.

0:10:34.236 --> 0:10:35.036
<v Speaker 1>Of the hominem.

0:10:35.156 --> 0:10:37.036
<v Speaker 2>Right, this is a thing that is cloning your voice

0:10:37.076 --> 0:10:39.636
<v Speaker 2>so that you can make it sound like you're talking

0:10:39.756 --> 0:10:42.876
<v Speaker 2>even if you're not talking. And the company is called Liarbird,

0:10:43.276 --> 0:10:47.276
<v Speaker 2>and this is a somewhat fraud thing, right, Like, I

0:10:47.276 --> 0:10:49.156
<v Speaker 2>feel like they're throwing it in my face that this

0:10:49.196 --> 0:10:53.636
<v Speaker 2>is a sketchy product that they're developing.

0:10:54.436 --> 0:10:55.716
<v Speaker 1>Did it cross your mind?

0:10:56.196 --> 0:10:59.276
<v Speaker 3>Did it cross my mind? Is like the ethical quandary

0:10:59.276 --> 0:11:01.956
<v Speaker 3>that we were getting into, or like the branding implications

0:11:01.996 --> 0:11:02.596
<v Speaker 3>of the name.

0:11:03.636 --> 0:11:04.836
<v Speaker 1>More the ethical quandary.

0:11:05.396 --> 0:11:10.236
<v Speaker 3>Yeah, the ethical quandary absolutely entered our mind. And our

0:11:10.276 --> 0:11:12.756
<v Speaker 3>point of view on that, and has been our point

0:11:12.796 --> 0:11:16.316
<v Speaker 3>of view on these things in general, has been that

0:11:17.356 --> 0:11:22.196
<v Speaker 3>we don't want to be like out there paving the

0:11:22.236 --> 0:11:25.756
<v Speaker 3>way for any new paths to the apocalypse, so to speak.

0:11:27.756 --> 0:11:31.996
<v Speaker 3>We actually think, like have always felt like not really

0:11:32.036 --> 0:11:36.236
<v Speaker 3>sure how society was going to put the brakes on

0:11:36.236 --> 0:11:38.116
<v Speaker 3>this sort of thing. We just knew that we didn't

0:11:38.116 --> 0:11:40.156
<v Speaker 3>want to be part of it, and we tried to

0:11:40.156 --> 0:11:42.996
<v Speaker 3>put guardrails in place on our product. That would make

0:11:43.036 --> 0:11:46.316
<v Speaker 3>it easy to stay off the slippery slope. So in

0:11:46.356 --> 0:11:50.996
<v Speaker 3>the case of Lyyerbird, which once we bought them, we

0:11:51.276 --> 0:11:53.596
<v Speaker 3>integrated their technology and released it as something that we

0:11:53.636 --> 0:11:55.996
<v Speaker 3>call overdub. It's a way that you can clone your voice.

0:11:56.636 --> 0:11:59.556
<v Speaker 3>We require you to authenticate that it's actually you, and

0:11:59.596 --> 0:12:02.756
<v Speaker 3>we only let you clone your own voice, and that's

0:12:02.796 --> 0:12:05.476
<v Speaker 3>worked really well. We're now in a world where there's

0:12:05.476 --> 0:12:08.876
<v Speaker 3>other people that have similar models and they're not putting

0:12:08.876 --> 0:12:11.756
<v Speaker 3>those protections in place. And the use case that we've

0:12:11.756 --> 0:12:14.876
<v Speaker 3>always been the most excited about is making it possible

0:12:14.876 --> 0:12:18.916
<v Speaker 3>to edit your natural recordings, so going in and changing

0:12:18.956 --> 0:12:22.236
<v Speaker 3>an individual word, and we've built some special stuff that

0:12:22.316 --> 0:12:24.676
<v Speaker 3>will kind of listen to the audio on either sides

0:12:24.716 --> 0:12:27.396
<v Speaker 3>and make sure that it blends in. From an intonation perspective,

0:12:27.996 --> 0:12:30.196
<v Speaker 3>we started with the ability to delete stuff and move

0:12:30.236 --> 0:12:32.796
<v Speaker 3>stuff around. Now you can just type and really make

0:12:32.836 --> 0:12:34.156
<v Speaker 3>it feel like it's a word processor.

0:12:34.316 --> 0:12:39.316
<v Speaker 2>Presumably the better you get, the better the technology you

0:12:39.436 --> 0:12:43.156
<v Speaker 2>use to Clona Voice gets, the more words it can do. Right,

0:12:43.196 --> 0:12:46.196
<v Speaker 2>I mean, every week, for what's your problem? I write

0:12:46.196 --> 0:12:50.676
<v Speaker 2>a little introduction and then I read it. But presumably

0:12:50.716 --> 0:12:53.236
<v Speaker 2>at some point overdub will be good enough that no

0:12:53.276 --> 0:12:55.556
<v Speaker 2>one knows will know whether it's me reading it or

0:12:55.596 --> 0:12:56.836
<v Speaker 2>I'm just typing it right.

0:12:57.316 --> 0:13:00.196
<v Speaker 3>We have a new version of overdub that will release

0:13:01.116 --> 0:13:04.156
<v Speaker 3>in the next couple of months, and it's the first

0:13:04.196 --> 0:13:08.236
<v Speaker 3>time that I've heard my own voice doing a narration

0:13:08.356 --> 0:13:12.076
<v Speaker 3>of something that made me say, like, this sounds so

0:13:12.236 --> 0:13:16.116
<v Speaker 3>much like me in a way that it's not distracting

0:13:16.196 --> 0:13:18.156
<v Speaker 3>or the AI does not get in the way.

0:13:18.556 --> 0:13:23.236
<v Speaker 2>Can I try that new version now, like, not this minute,

0:13:23.276 --> 0:13:24.156
<v Speaker 2>but like for the show?

0:13:24.516 --> 0:13:25.196
<v Speaker 1>Yeah, for the show.

0:13:26.196 --> 0:13:27.956
<v Speaker 3>I bet we could find a way to do it.

0:13:27.956 --> 0:13:30.396
<v Speaker 3>It's just so you could hear it and stuff.

0:13:30.796 --> 0:13:33.116
<v Speaker 2>There's a universe where I say, at this moment in

0:13:33.156 --> 0:13:37.236
<v Speaker 2>the show, guess what today? That voice me reading the

0:13:37.236 --> 0:13:39.756
<v Speaker 2>intro at the top of the show that was overdubbed.

0:13:39.756 --> 0:13:40.676
<v Speaker 1>It wasn't really made.

0:13:40.916 --> 0:13:45.756
<v Speaker 3>Yeah, we tried overdubb for the voice doing the intro

0:13:45.836 --> 0:13:47.196
<v Speaker 3>at the top of the show.

0:13:47.516 --> 0:13:49.316
<v Speaker 1>And we decided it wasn't quite.

0:13:49.076 --> 0:13:51.676
<v Speaker 2>Good enough, but we decided it would work for this

0:13:51.796 --> 0:13:52.636
<v Speaker 2>part of the show.

0:13:53.156 --> 0:13:57.036
<v Speaker 1>What you're hearing right now, it's not really me. It's overdubbed.

0:13:58.116 --> 0:14:01.756
<v Speaker 2>In a minute, what overdubb and chat GPT and generative

0:14:01.796 --> 0:14:04.116
<v Speaker 2>AI will mean for descript and for the.

0:14:04.076 --> 0:14:06.396
<v Speaker 1>World and also for me.

0:14:12.436 --> 0:14:16.316
<v Speaker 2>Now back to the show, descript is expanding from podcasts

0:14:16.316 --> 0:14:18.716
<v Speaker 2>to video, and it just took a big investment from

0:14:18.836 --> 0:14:22.636
<v Speaker 2>open Ai, the company that makes chat GPT, and also

0:14:22.676 --> 0:14:26.516
<v Speaker 2>this system called Dolly that uses AI to generate images.

0:14:26.916 --> 0:14:30.116
<v Speaker 2>So Descript is clearly pointing toward a future where it's

0:14:30.156 --> 0:14:33.276
<v Speaker 2>going to be software for creating AI generated or at

0:14:33.356 --> 0:14:34.316
<v Speaker 2>least AI.

0:14:34.236 --> 0:14:35.956
<v Speaker 1>Enhanced audio and video.

0:14:36.556 --> 0:14:39.356
<v Speaker 2>And I asked Andrew, what does that future look like?

0:14:39.796 --> 0:14:42.596
<v Speaker 2>How is generative AI going to work in descript?

0:14:43.236 --> 0:14:45.676
<v Speaker 3>I don't think we know entirely yet. In a lot

0:14:45.676 --> 0:14:47.836
<v Speaker 3>of ways, it feels to me like you're letting this

0:14:48.116 --> 0:14:51.876
<v Speaker 3>alien into into your app. You're just giving it the

0:14:51.956 --> 0:14:56.716
<v Speaker 3>keys and then the interfaces. How do you find how

0:14:56.756 --> 0:15:00.916
<v Speaker 3>do you find a way to kind of give the

0:15:00.956 --> 0:15:04.556
<v Speaker 3>aliens some buttons in tier UI, give them the ability

0:15:04.636 --> 0:15:06.796
<v Speaker 3>to press the buttons, and then how do you talk

0:15:06.836 --> 0:15:07.476
<v Speaker 3>to the alien?

0:15:07.796 --> 0:15:08.796
<v Speaker 1>What do you mean? Like?

0:15:08.836 --> 0:15:13.396
<v Speaker 2>That is a striking metaphor a little scarier right. It

0:15:13.476 --> 0:15:17.036
<v Speaker 2>suggests a certain level of uncertainty and potential downside. It's

0:15:17.076 --> 0:15:18.716
<v Speaker 2>not like, oh, this is great, this is going to

0:15:18.716 --> 0:15:20.556
<v Speaker 2>solve a problem like, why do you say it's like

0:15:20.636 --> 0:15:21.716
<v Speaker 2>letting an alien.

0:15:21.396 --> 0:15:24.116
<v Speaker 3>In as opposed to letting a human in.

0:15:25.276 --> 0:15:27.836
<v Speaker 1>It's a really interesting choice of words. Tell me more

0:15:27.836 --> 0:15:28.276
<v Speaker 1>about it.

0:15:30.996 --> 0:15:35.436
<v Speaker 3>So let's start by just saying, like, very specifically, what

0:15:35.476 --> 0:15:40.276
<v Speaker 3>I mean. I think, when implemented, well, what this will

0:15:40.276 --> 0:15:43.596
<v Speaker 3>feel like is as if you had a co editor

0:15:44.236 --> 0:15:47.196
<v Speaker 3>in a document with you, in our case, in a

0:15:47.276 --> 0:15:52.436
<v Speaker 3>video or a podcast that you're working on that is smart,

0:15:52.596 --> 0:15:54.956
<v Speaker 3>knows how to do everything, definitely knows how to do

0:15:54.996 --> 0:15:59.316
<v Speaker 3>the tedios busy work, and you can kind of kind

0:15:59.356 --> 0:16:03.156
<v Speaker 3>of guide or direct through giving these tasks. You know,

0:16:03.716 --> 0:16:07.596
<v Speaker 3>it's almost like it's the production assistant or something like that,

0:16:07.636 --> 0:16:10.716
<v Speaker 3>and you're the director and you're able to just guide

0:16:10.756 --> 0:16:13.076
<v Speaker 3>it and give it feedback on how it's doing and

0:16:13.076 --> 0:16:15.196
<v Speaker 3>what it's doing well and what it's not doing well.

0:16:15.676 --> 0:16:18.076
<v Speaker 2>There's a version of it where it's like we've gotten

0:16:18.156 --> 0:16:21.716
<v Speaker 2>used to the graphical user interface, right, We've been trained

0:16:21.756 --> 0:16:24.796
<v Speaker 2>since the Magintosh computer in the mid nineteen eighties that

0:16:24.876 --> 0:16:26.836
<v Speaker 2>the way you interact with a computer is like there's

0:16:26.876 --> 0:16:28.996
<v Speaker 2>little pictures and little folders and you point.

0:16:28.756 --> 0:16:30.676
<v Speaker 1>And click one way or another, right, and.

0:16:31.276 --> 0:16:35.436
<v Speaker 2>One possibility here is the new standard interface is chat.

0:16:35.476 --> 0:16:38.996
<v Speaker 2>You just type in like whatever, please trim all the

0:16:39.116 --> 0:16:42.436
<v Speaker 2>ums from this file, or even please turn this thirty

0:16:42.476 --> 0:16:44.836
<v Speaker 2>minute interview into a twenty minute interview, and the way

0:16:44.836 --> 0:16:47.596
<v Speaker 2>that makes it most interesting, right, and you just type

0:16:47.596 --> 0:16:48.516
<v Speaker 2>that in and it happens.

0:16:48.556 --> 0:16:50.396
<v Speaker 1>I mean that's a version of what I hear you

0:16:50.396 --> 0:16:50.916
<v Speaker 1>saying there.

0:16:50.956 --> 0:16:53.316
<v Speaker 3>I think some people believe that that chat or a

0:16:53.396 --> 0:16:57.796
<v Speaker 3>text field will become the primary interface for making things.

0:16:58.476 --> 0:17:01.156
<v Speaker 3>I think of it more as like it's the primary

0:17:01.196 --> 0:17:04.276
<v Speaker 3>interface for interacting with the alien, and then you and

0:17:04.316 --> 0:17:07.396
<v Speaker 3>the alien are still going to be working, like have

0:17:07.556 --> 0:17:11.356
<v Speaker 3>other buttons that they can press. You still, sometimes you

0:17:11.436 --> 0:17:13.556
<v Speaker 3>just want to take the thing in your hands and

0:17:13.556 --> 0:17:14.236
<v Speaker 3>do it yourself.

0:17:14.796 --> 0:17:17.316
<v Speaker 2>The alien metaphor, I mean there's a real like do

0:17:17.356 --> 0:17:21.316
<v Speaker 2>we welcome our alien overlord's question? When you choose that metaphor,

0:17:21.396 --> 0:17:22.196
<v Speaker 2>it makes.

0:17:21.996 --> 0:17:24.356
<v Speaker 3>Me I mean, maybe it feels that way.

0:17:24.396 --> 0:17:26.316
<v Speaker 2>It doesn't it it doesn't make me feel better.

0:17:26.396 --> 0:17:29.996
<v Speaker 3>I'll say that I think it feels the way that

0:17:30.036 --> 0:17:33.916
<v Speaker 3>an alien arrival would probably feel, where you know, maybe

0:17:33.916 --> 0:17:37.476
<v Speaker 3>you shake its hand and immediately it has something in

0:17:37.516 --> 0:17:41.476
<v Speaker 3>its skin that cures your cancer, and you feel hopeful,

0:17:42.996 --> 0:17:45.516
<v Speaker 3>but you also want to know what they're up.

0:17:45.436 --> 0:17:48.876
<v Speaker 2>To and yeah, and cure your cancer is definitely the

0:17:48.916 --> 0:17:49.636
<v Speaker 2>happy version.

0:17:49.676 --> 0:17:51.476
<v Speaker 1>Not usually in the alien movie.

0:17:51.276 --> 0:17:53.236
<v Speaker 2>What happens, but I guess that could happen.

0:17:53.396 --> 0:17:55.356
<v Speaker 3>Well, there's the good there's a good part, right, But

0:17:55.396 --> 0:17:57.636
<v Speaker 3>you never really know, I think is the point. And

0:17:57.956 --> 0:18:00.516
<v Speaker 3>I think we're all living in this kind of like

0:18:00.596 --> 0:18:03.916
<v Speaker 3>pushing forward in this mystery, kind of kind of stuck

0:18:03.956 --> 0:18:05.876
<v Speaker 3>between awe and terror.

0:18:06.276 --> 0:18:10.156
<v Speaker 2>You sound more ambivalent than I might have thought. Why

0:18:10.196 --> 0:18:12.836
<v Speaker 2>is that because you just took a giant investment from

0:18:12.876 --> 0:18:13.356
<v Speaker 2>open AI.

0:18:14.876 --> 0:18:17.516
<v Speaker 3>I think like at moments like this, you have a

0:18:17.596 --> 0:18:23.436
<v Speaker 3>choice between either renunciation and just like stopping and out

0:18:23.476 --> 0:18:27.236
<v Speaker 3>of from a place of fear. Which maybe that's right,

0:18:27.356 --> 0:18:32.836
<v Speaker 3>you know, maybe fulfillment and happiness everything we have for

0:18:32.876 --> 0:18:36.796
<v Speaker 3>that is already here, and we should focus our

0:18:37.076 --> 0:18:39.756
<v Speaker 3>energies on making peace with our inevitable death.

0:18:41.356 --> 0:18:42.676
<v Speaker 1>In any case, we should do that.

0:18:42.836 --> 0:18:47.436
<v Speaker 3>But go on the other way to think of it

0:18:47.516 --> 0:18:52.076
<v Speaker 3>is to just forge ahead and realize that the potential

0:18:52.116 --> 0:18:54.356
<v Speaker 3>of what's on the other end of this might make

0:18:54.476 --> 0:18:59.196
<v Speaker 3>us feel in retrospect like we were just in the

0:18:59.236 --> 0:19:05.476
<v Speaker 3>earliest possible innings of our of the human experiment. So

0:19:06.756 --> 0:19:09.396
<v Speaker 3>you know, I feel like we're all going to die

0:19:09.436 --> 0:19:12.396
<v Speaker 3>one way or another, might as well forge ahead. It's

0:19:12.396 --> 0:19:15.756
<v Speaker 3>not ambivalence, but it's more just being clear eyed about

0:19:15.796 --> 0:19:18.716
<v Speaker 3>the fact that not trying to pretend that there's parts

0:19:18.716 --> 0:19:20.156
<v Speaker 3>of it that don't seem scary.

0:19:21.236 --> 0:19:23.996
<v Speaker 2>I mean, one of the things that's really striking to

0:19:24.156 --> 0:19:29.756
<v Speaker 2>me with AI, and that seems quite different from other

0:19:30.476 --> 0:19:34.636
<v Speaker 2>technologies in the past, is the people who are working

0:19:34.676 --> 0:19:38.036
<v Speaker 2>on it, the people who really understand it, seem more

0:19:38.076 --> 0:19:40.836
<v Speaker 2>scared than everybody else.

0:19:41.516 --> 0:19:44.716
<v Speaker 3>I'm not a first time founder. I went through the

0:19:44.796 --> 0:19:50.396
<v Speaker 3>experience of being a young person building, building Groupon,

0:19:51.356 --> 0:19:54.396
<v Speaker 3>telling myself a story about how it was going to

0:19:54.876 --> 0:19:59.156
<v Speaker 3>revolutionize local commerce and all the good stuff, and it

0:19:59.276 --> 0:20:01.116
<v Speaker 3>just didn't turn out that way. And I think we've

0:20:01.156 --> 0:20:06.156
<v Speaker 3>seen a generation of tech companies that just like didn't

0:20:06.556 --> 0:20:10.756
<v Speaker 3>turn out the way that the super rose-colored

0:20:10.756 --> 0:20:15.356
<v Speaker 3>glasses mission statement would have suggested. And I think we're

0:20:15.396 --> 0:20:19.316
<v Speaker 3>just trying to be we just have that experience, that

0:20:19.356 --> 0:20:24.196
<v Speaker 3>recent experience at top of mind, and are trying to

0:20:24.316 --> 0:20:27.836
<v Speaker 3>think about it in a way that has guardrails around

0:20:27.996 --> 0:20:30.356
<v Speaker 3>around repeating that history and just make sure we're really

0:20:30.396 --> 0:20:33.476
<v Speaker 3>proud of what we build. Does that make sense?

0:20:33.596 --> 0:20:34.236
<v Speaker 1>It makes sense.

0:20:34.596 --> 0:20:36.196
<v Speaker 3>Am I Am I going to regret saying all this?

0:20:37.156 --> 0:20:37.836
<v Speaker 1>I don't think so.

0:20:37.996 --> 0:20:40.676
<v Speaker 2>You haven't said anything like incriminating as far as I

0:20:40.676 --> 0:20:42.996
<v Speaker 2>can tell. You know, I heard somebody saying the other day, like,

0:20:43.076 --> 0:20:46.076
<v Speaker 2>it's an interesting question to ask somebody like, what was

0:20:46.116 --> 0:20:49.996
<v Speaker 2>the first thing you asked GPT, ChatGPT, to do?

0:20:50.596 --> 0:20:53.156
<v Speaker 2>And the first thing I asked ChatGPT to do

0:20:53.596 --> 0:20:57.236
<v Speaker 2>was write an episode of Planet Money, the podcast I used

0:20:57.236 --> 0:20:59.236
<v Speaker 2>to host, of which there are you know, a thousand

0:20:59.276 --> 0:21:01.956
<v Speaker 2>transcripts on the internet. Write an episode of Planet Money

0:21:01.996 --> 0:21:04.276
<v Speaker 2>about whether the FED is going to raise interest rates

0:21:04.276 --> 0:21:06.756
<v Speaker 2>by twenty five basis points or leave them unchanged, right,

0:21:07.396 --> 0:21:10.516
<v Speaker 2>And it wrote something that was pretty good, like not

0:21:10.596 --> 0:21:13.716
<v Speaker 2>a whole show, it's not there now, but at the

0:21:13.796 --> 0:21:16.556
<v Speaker 2>rate of current improvement, you could definitely imagine it writing

0:21:16.596 --> 0:21:20.716
<v Speaker 2>that episode pretty well in whatever a year or two

0:21:20.836 --> 0:21:22.956
<v Speaker 2>years or some amount of time when I will still

0:21:23.036 --> 0:21:26.716
<v Speaker 2>want to be gainfully employed. And like I do wonder

0:21:26.756 --> 0:21:29.796
<v Speaker 2>on this one, is there a day, slash, how far

0:21:29.916 --> 0:21:30.716
<v Speaker 2>are we from.

0:21:30.556 --> 0:21:34.596
<v Speaker 1>The day when generative AI can just make a

0:21:34.636 --> 0:21:36.276
<v Speaker 1>podcast without me?

0:21:36.836 --> 0:21:37.876
<v Speaker 3>How does that make you feel?

0:21:39.436 --> 0:21:43.876
<v Speaker 2>I mean somewhat afraid, also like interested in figuring out

0:21:43.916 --> 0:21:46.596
<v Speaker 2>how to use it, right, Like it feels like a steamroller.

0:21:46.636 --> 0:21:49.676
<v Speaker 2>It's like, oh, maybe I should go get in that steamroller.

0:21:49.716 --> 0:21:51.676
<v Speaker 2>If my choices are get in the steamroller or get

0:21:51.756 --> 0:21:52.476
<v Speaker 2>run over by it.

0:21:53.156 --> 0:21:56.796
<v Speaker 3>Yeah, I think, like before I comment on it, I

0:21:56.796 --> 0:22:01.956
<v Speaker 3>think it's important that people understand, Like it's very true that,

0:22:03.596 --> 0:22:07.236
<v Speaker 3>like it's easy to think that I'll have a bullshitty

0:22:07.276 --> 0:22:10.276
<v Speaker 3>answer to a question like this because I work at

0:22:10.316 --> 0:22:12.316
<v Speaker 3>a tech company that's working on a lot of this stuff.

0:22:12.356 --> 0:22:15.956
<v Speaker 3>But you have to remember that, like, if that's true,

0:22:16.316 --> 0:22:19.676
<v Speaker 3>we're out of jobs as soon as like a human

0:22:19.836 --> 0:22:23.676
<v Speaker 3>is no longer in the loop. That's really bad for us.

0:22:24.116 --> 0:22:26.516
<v Speaker 3>Like, does that make sense? Do you buy that?

0:22:26.836 --> 0:22:27.996
<v Speaker 1>At some margin?

0:22:28.276 --> 0:22:30.996
<v Speaker 2>Right, there's a long way between all the people who

0:22:30.996 --> 0:22:33.236
<v Speaker 2>are doing it now and zero people. There's a lot

0:22:33.236 --> 0:22:36.636
<v Speaker 2>of intermediate cases between the way it is now and

0:22:36.676 --> 0:22:40.156
<v Speaker 2>like a fully AI generated podcast, right, and like we're

0:22:40.156 --> 0:22:43.076
<v Speaker 2>already starting down the road, right, getting AI to write

0:22:43.116 --> 0:22:47.116
<v Speaker 2>show notes or something, that's basically happened now. And

0:22:48.356 --> 0:22:51.476
<v Speaker 2>you know, like I know the history of technology and

0:22:51.556 --> 0:22:54.836
<v Speaker 2>the labor market pretty well, you know, from the Industrial

0:22:54.876 --> 0:22:55.516
<v Speaker 2>Revolution on.

0:22:55.916 --> 0:22:56.796
<v Speaker 1>I'm pro.

0:22:58.276 --> 0:23:02.556
<v Speaker 2>Technological innovation. I believe in productivity gains and efficiency gains.

0:23:03.156 --> 0:23:06.196
<v Speaker 2>I'm also aware that there are instances when highly skilled

0:23:06.236 --> 0:23:09.276
<v Speaker 2>crafts people are displaced by technology. Right, that is definitely

0:23:09.276 --> 0:23:11.036
<v Speaker 2>a thing that happens. And I recognize that the pie

0:23:11.076 --> 0:23:13.156
<v Speaker 2>gets bigger and everybody's better off in the long run,

0:23:13.516 --> 0:23:16.396
<v Speaker 2>But like, I just want to not get pinched, right,

0:23:16.476 --> 0:23:18.636
<v Speaker 2>I just want to be you know, you don't want

0:23:18.676 --> 0:23:19.116
<v Speaker 2>to be the one.

0:23:19.556 --> 0:23:21.676
<v Speaker 1>I don't want to be the one. And you know.

0:23:21.956 --> 0:23:26.716
<v Speaker 2>I'm not out on using it. It's getting really good,

0:23:26.756 --> 0:23:28.316
<v Speaker 2>really fast. It's doing a lot of the things that

0:23:28.356 --> 0:23:28.796
<v Speaker 2>I can do.

0:23:29.596 --> 0:23:31.756
<v Speaker 3>There's one other thing I wanted to say, just about

0:23:31.756 --> 0:23:35.476
<v Speaker 3>the fear for your job thing, which is something we

0:23:35.516 --> 0:23:38.836
<v Speaker 3>say around here a lot, is that you should struggle

0:23:38.916 --> 0:23:41.956
<v Speaker 3>with your story and not your tools. That's almost like

0:23:42.036 --> 0:23:44.956
<v Speaker 3>a guiding light for us, is we want to take

0:23:45.356 --> 0:23:47.996
<v Speaker 3>all of the cognitive friction away from using the tools.

0:23:48.916 --> 0:23:51.716
<v Speaker 3>The funny thing about all of these things is like

0:23:51.996 --> 0:23:54.236
<v Speaker 3>there's a brief moment in time where you feel like

0:23:54.276 --> 0:23:59.196
<v Speaker 3>you have superpowers, but then everybody has them, and humans

0:23:59.236 --> 0:24:02.516
<v Speaker 3>once again become the differentiator. And we really think that,

0:24:02.636 --> 0:24:05.756
<v Speaker 3>like, making great stuff is always going to be

0:24:05.796 --> 0:24:09.116
<v Speaker 3>a thing, and great is always going to be determined

0:24:09.596 --> 0:24:11.316
<v Speaker 3>by the human that's in the loop.

0:24:11.556 --> 0:24:14.396
<v Speaker 2>I mean, you know, there's this story about chess, right,

0:24:14.516 --> 0:24:18.036
<v Speaker 2>a computer chess program beat a person a long time ago,

0:24:18.116 --> 0:24:22.396
<v Speaker 2>decades ago now. But then after that people pointed out

0:24:22.436 --> 0:24:25.956
<v Speaker 2>the fact optimistically from my point of view, that a

0:24:25.996 --> 0:24:30.636
<v Speaker 2>computer plus a person could still beat any computer. Right,

0:24:30.636 --> 0:24:32.916
<v Speaker 2>a person working with a computer was better than the

0:24:32.956 --> 0:24:34.756
<v Speaker 2>best computer in the world. And that was like the

0:24:34.796 --> 0:24:37.676
<v Speaker 2>metaphor for like, yes, if we work with machines, we

0:24:37.716 --> 0:24:40.636
<v Speaker 2>can be better. That is no longer true now. The

0:24:40.636 --> 0:24:43.476
<v Speaker 2>computers kept getting better, and now people can't make them better.

0:24:43.516 --> 0:24:46.396
<v Speaker 2>Even a person plus a computer cannot beat a computer.

0:24:47.076 --> 0:24:49.276
<v Speaker 2>And I know that chess is less complex than the

0:24:49.276 --> 0:24:52.276
<v Speaker 2>real world, and so perhaps still a reason for optimism.

0:24:52.996 --> 0:24:56.196
<v Speaker 2>I certainly think I'm clever and good at making podcasts

0:24:56.236 --> 0:24:58.516
<v Speaker 2>and hope that I can do that. I hope that

0:24:58.556 --> 0:25:01.076
<v Speaker 2>I can work with AI to make something better than

0:25:01.076 --> 0:25:03.396
<v Speaker 2>any AI, or more like me or something.

0:25:05.556 --> 0:25:08.516
<v Speaker 3>It's it might not be true, though, but here's the

0:25:08.556 --> 0:25:13.156
<v Speaker 3>amazing thing. People are still playing chess. Right. It's like true,

0:25:13.836 --> 0:25:18.756
<v Speaker 3>there's some separation. Some separation happens where the machines become

0:25:18.876 --> 0:25:20.956
<v Speaker 3>so good and we just say, okay, you you machines,

0:25:20.996 --> 0:25:23.396
<v Speaker 3>you go off and do your thing, and we're going

0:25:23.476 --> 0:25:28.316
<v Speaker 3>to be here kind of reveling in our humanity with

0:25:28.396 --> 0:25:31.276
<v Speaker 3>each other. I think what we'll see is there's there's

0:25:31.316 --> 0:25:34.236
<v Speaker 3>going to be a certain category of content that's really

0:25:34.356 --> 0:25:37.636
<v Speaker 3>just about like the transmission of bits of information from

0:25:37.836 --> 0:25:40.676
<v Speaker 3>your brain to my brain, and that's all that it's about.

0:25:41.396 --> 0:25:41.516
<v Speaker 1>That.

0:25:42.796 --> 0:25:44.796
<v Speaker 3>Maybe we do one day see humans taken out of

0:25:44.796 --> 0:25:47.516
<v Speaker 3>the loop, but I really do believe there will always

0:25:47.796 --> 0:25:53.796
<v Speaker 3>be space for like at the core great content, storytelling,

0:25:53.836 --> 0:25:57.236
<v Speaker 3>whatever you call it, it's it's about feeling connected to

0:25:57.516 --> 0:26:00.796
<v Speaker 3>the humans and other people. And as soon as machines

0:26:01.116 --> 0:26:04.116
<v Speaker 3>appear to have too heavy a hand, it's just not

0:26:04.196 --> 0:26:05.116
<v Speaker 3>interesting anymore.

0:26:08.236 --> 0:26:10.356
<v Speaker 2>We'll be back in a minute with the Lightning Round,

0:26:10.836 --> 0:26:13.516
<v Speaker 2>which includes a message from Andrew to.

0:26:13.596 --> 0:26:24.276
<v Speaker 1>His future self. That's the end of the ads. Now

0:26:24.276 --> 0:26:25.276
<v Speaker 1>we're going back to the show.

0:26:25.756 --> 0:26:28.196
<v Speaker 2>Okay, so this is the Lightning Round, now you ready.

0:26:28.676 --> 0:26:31.476
<v Speaker 2>It's just a bunch of questions. Do you use generative

0:26:31.596 --> 0:26:33.836
<v Speaker 2>AI in your life outside of work?

0:26:34.356 --> 0:26:34.596
<v Speaker 1>No?

0:26:35.196 --> 0:26:38.516
<v Speaker 3>You know what's interesting. I did something this morning where

0:26:38.516 --> 0:26:40.596
<v Speaker 3>I was actually like, I don't I don't even care

0:26:40.636 --> 0:26:42.836
<v Speaker 3>if it's wrong. I don't even care if it's.

0:26:44.036 --> 0:26:46.876
<v Speaker 2>Like the test of a theory is not is it correct?

0:26:46.916 --> 0:26:47.836
<v Speaker 1>But is it interesting?

0:26:48.796 --> 0:26:52.556
<v Speaker 3>Yeah? Exactly. I was asking it about I think, like

0:26:52.796 --> 0:26:54.516
<v Speaker 3>my son got hit in the head with a baseball,

0:26:54.556 --> 0:26:57.116
<v Speaker 3>and I was trying to I really should care about this. Actually,

0:26:57.796 --> 0:26:58.556
<v Speaker 3>you should.

0:26:58.276 --> 0:27:01.556
<v Speaker 1>Not ask chat GPT anything significant.

0:27:00.996 --> 0:27:02.996
<v Speaker 2>About that, not to give you parenting advice.

0:27:04.996 --> 0:27:08.516
<v Speaker 3>It's stuff like that, like I've I've pretty quickly been.

0:27:08.356 --> 0:27:10.356
<v Speaker 2>Able to like that you should not be asking for

0:27:10.396 --> 0:27:11.996
<v Speaker 2>medical advice about your child.

0:27:15.876 --> 0:27:17.956
<v Speaker 3>I know. But when I say stuff like that, like

0:27:18.316 --> 0:27:20.956
<v Speaker 3>I would have googled it and probably just done what

0:27:20.996 --> 0:27:22.876
<v Speaker 3>I was going to do anyway. So it was almost

0:27:22.916 --> 0:27:25.396
<v Speaker 3>just a curiosity. He was fine, He didn't.

0:27:25.156 --> 0:27:29.996
<v Speaker 2>Need to go see a doctor, not according to ChatGPT. No,

0:27:31.956 --> 0:27:36.436
<v Speaker 2>I'm curious about your time working in a recording studio, right,

0:27:36.476 --> 0:27:38.956
<v Speaker 2>You worked in a recording studio where musicians came in

0:27:39.236 --> 0:27:42.996
<v Speaker 2>and recorded. Did you see there any like moments of

0:27:43.076 --> 0:27:43.956
<v Speaker 2>musical genius?

0:27:44.196 --> 0:27:44.596
<v Speaker 1>Is there one?

0:27:44.596 --> 0:27:45.236
<v Speaker 2>In particular?

0:27:46.156 --> 0:27:51.236
<v Speaker 3>I worked for this guy named Steve Albini, who is

0:27:51.836 --> 0:27:56.796
<v Speaker 3>a pretty well known engineer producer that was in some

0:27:57.676 --> 0:28:00.236
<v Speaker 3>popular kind of punk rock bands in the in the

0:28:00.276 --> 0:28:05.036
<v Speaker 3>eighties and currently and definitely saw some cool bands. But

0:28:05.076 --> 0:28:08.596
<v Speaker 3>I think also I really feel like I learned a

0:28:08.636 --> 0:28:15.636
<v Speaker 3>ton from watching him work. He's so talented, so articulate,

0:28:15.756 --> 0:28:18.956
<v Speaker 3>so smart in many ways, like an example of what

0:28:18.996 --> 0:28:23.236
<v Speaker 3>I aspired to be at the time, and so seeing

0:28:23.316 --> 0:28:25.956
<v Speaker 3>that output, but then also seeing him every day and

0:28:26.036 --> 0:28:30.356
<v Speaker 3>how hard he worked, it was a real like, oh,

0:28:30.556 --> 0:28:33.236
<v Speaker 3>this is how it happens kind of moment for me,

0:28:34.196 --> 0:28:39.316
<v Speaker 3>and it kind of inspired me. It inspired within me

0:28:39.796 --> 0:28:41.596
<v Speaker 3>a kind of work ethic that I'm not sure I

0:28:41.596 --> 0:28:42.996
<v Speaker 3>would have gotten to otherwise.

0:28:44.676 --> 0:28:46.636
<v Speaker 2>What's the best deal you ever got from Groupon?

0:28:53.396 --> 0:28:56.836
<v Speaker 3>Man? You know, it's so funny because, like obviously I

0:28:56.876 --> 0:28:58.396
<v Speaker 3>was asked. I used to be asked that question all

0:28:58.436 --> 0:29:02.116
<v Speaker 3>the time. I think it was a sensory deprivation tank.

0:29:02.716 --> 0:29:07.716
<v Speaker 3>They had a sensory deprivation tank center somewhere in Chicago.

0:29:08.436 --> 0:29:09.996
<v Speaker 3>Had never tried it. It was really cool.

0:29:11.356 --> 0:29:14.036
<v Speaker 2>This is a descript question. Now, how will you know

0:29:14.076 --> 0:29:15.476
<v Speaker 2>when it's time to do something else?

0:29:15.996 --> 0:29:18.196
<v Speaker 1>But leave? Dude?

0:29:18.396 --> 0:29:19.476
<v Speaker 3>I don't know if I want to say this on

0:29:19.516 --> 0:29:21.996
<v Speaker 3>a podcast, because if I do decide to take the

0:29:22.036 --> 0:29:25.156
<v Speaker 3>company public, it'll come back to haunt me. But I

0:29:25.156 --> 0:29:28.836
<v Speaker 3>almost want to say it specifically for that reason, Andrew,

0:29:29.076 --> 0:29:31.916
<v Speaker 3>I'm talking to future Andrew right now. You do not

0:29:31.956 --> 0:29:35.236
<v Speaker 3>want to be a public company CEO again, Okay, hire

0:29:35.276 --> 0:29:38.996
<v Speaker 3>someone else to do that. I know you're talking yourself

0:29:38.996 --> 0:29:40.636
<v Speaker 3>into it and saying it's going to be different this time.

0:29:40.676 --> 0:29:44.676
<v Speaker 3>It's okay, but you hate it. The things that

0:29:45.196 --> 0:29:47.916
<v Speaker 3>those people are good at and are interested in

0:29:48.356 --> 0:29:51.476
<v Speaker 3>are different than you. Go do something else.

0:29:53.596 --> 0:29:54.196
<v Speaker 1>Amazing.

0:29:54.396 --> 0:29:57.476
<v Speaker 2>I've never had someone leave themselves a time capsule. I've done a

0:29:57.556 --> 0:29:58.756
<v Speaker 2>lot of podcasts before.

0:30:02.756 --> 0:30:04.276
<v Speaker 1>I'm going to send that to you. If you go public,

0:30:04.276 --> 0:30:05.596
<v Speaker 1>I'm going to have you back on the show and I'm

0:30:05.596 --> 0:30:06.356
<v Speaker 1>going to play it to you.

0:30:09.116 --> 0:30:11.276
<v Speaker 2>Thank you, Thank you for being so generous with your time.

0:30:11.916 --> 0:30:14.076
<v Speaker 2>I appreciate your candor and I'm grateful for that.

0:30:14.636 --> 0:30:17.036
<v Speaker 3>I appreciate that. I had, I had fun too. You're

0:30:17.076 --> 0:30:21.276
<v Speaker 3>good at your job in the sense that, like uh you,

0:30:21.276 --> 0:30:22.116
<v Speaker 3>you bring it out in me.

0:30:22.876 --> 0:30:25.956
<v Speaker 1>I'm better than a machine for now. It's gonna, that's

0:30:26.036 --> 0:30:28.876
<v Speaker 1>my motto: better than a machine, for now.

0:30:34.436 --> 0:30:38.316
<v Speaker 2>Andrew Mason is the founder and CEO of Descript. Today's

0:30:38.356 --> 0:30:41.996
<v Speaker 2>show was edited by Sarah Nix, produced by Edith Russolo, and.

0:30:42.036 --> 0:30:45.516
<v Speaker 1>Engineered by Amanda K. Wong. I'm Jacob Goldstein.

0:30:45.596 --> 0:30:48.316
<v Speaker 2>We'll be back next week with another episode of What's

0:30:48.316 --> 0:30:52.596
<v Speaker 2>Your Problem? And here, finally, is the top of today's show.

0:30:52.636 --> 0:30:55.836
<v Speaker 2>The intro to the show, as "read," if that's what

0:30:55.916 --> 0:31:01.276
<v Speaker 2>you'd call it, was generated by Overdub, Descript's AI-powered

0:31:01.396 --> 0:31:06.836
<v Speaker 2>voice, whatever, emulator. After every interview we do for the show,

0:31:06.956 --> 0:31:09.316
<v Speaker 2>we upload the audio to a piece of software

0:31:09.356 --> 0:31:13.516
<v Speaker 2>called Descript. Descript turns the audio into a transcript,

0:31:13.956 --> 0:31:16.316
<v Speaker 2>and then I can edit the transcript, cut out the

0:31:16.356 --> 0:31:19.276
<v Speaker 2>boring parts, move sections around, and when I do that,

0:31:19.756 --> 0:31:24.156
<v Speaker 2>Descript edits the underlying audio to match. As software, Descript

0:31:24.196 --> 0:31:27.756
<v Speaker 2>is pretty janky, it's buggy, it's constantly changing in ways

0:31:27.756 --> 0:31:30.276
<v Speaker 2>that can make it hard to use, and sometimes it

0:31:30.436 --> 0:31:34.116
<v Speaker 2>just blows stuff up. But we use it anyway because

0:31:34.156 --> 0:31:38.236
<v Speaker 2>Descript is an incredible advance over what came before. Before

0:31:38.276 --> 0:31:42.356
<v Speaker 2>Descript, audio software represented audio files not as words, but

0:31:42.436 --> 0:31:46.556
<v Speaker 2>as waveforms, squiggly lines presented on a timeline. So when

0:31:46.596 --> 0:31:50.076
<v Speaker 2>Descript came along, being able to edit audio by editing

0:31:50.156 --> 0:31:52.756
<v Speaker 2>words on a screen was a huge advance, and it

0:31:52.796 --> 0:31:55.836
<v Speaker 2>was an advance made possible by artificial intelligence.