WEBVTT - Judge AI based on Output, Not Mechanism

0:00:01.080 --> 0:00:03.240
<v S1>I'm going to have you listen to something that I think

0:00:03.240 --> 0:00:09.120
<v S1>captures extraordinarily well why arguments that AI doesn't understand anything

0:00:09.119 --> 0:00:15.120
<v S1>and can't possibly understand anything are completely misguided and empty.

0:00:16.360 --> 0:00:19.239
<v S1>This is a blues version of Without Me by Eminem.

0:00:19.800 --> 0:00:24.360
<v S1>It's from the 1950s, which means it's not real. And

0:00:24.360 --> 0:00:27.240
<v S1>he's also never done a blues version of Without Me,

0:00:27.240 --> 0:00:31.120
<v S1>to my knowledge. And so it's AI-generated, and it's

0:00:31.120 --> 0:00:35.720
<v S1>objectively a stunning piece of music, and it's quite different

0:00:35.720 --> 0:00:38.440
<v S1>from the original. So let's listen to it.

0:00:46.240 --> 0:00:59.000
<v S2>Guess who's been back again? Shady's back. Tell a friend.

0:01:00.970 --> 0:01:03.890
<v S2>Guess who's back? Guess who's back? Guess who's back. Guess

0:01:03.890 --> 0:01:06.890
<v S2>who's back. Guess who's back. Guess who's back. Guess who's back.

0:01:06.890 --> 0:01:14.289
<v S2>Guess who's back. Guess who's back. Guess who's back. No

0:01:14.290 --> 0:01:24.570
<v S2>no no no no no no no. I've created a monster.

0:01:25.050 --> 0:01:28.930
<v S2>Cause nobody wants to see my shoes no more. They

0:01:28.930 --> 0:01:32.530
<v S2>won't shake I'm chopped liver. Well, if you want Shady.

0:01:32.569 --> 0:01:34.530
<v S2>This is what I give you. A little bit of

0:01:34.530 --> 0:01:38.009
<v S2>me mixed with some hard liquor. Some vodka that'll jumpstart

0:01:38.010 --> 0:01:40.369
<v S2>my heart quicker than a shark. When I get shot

0:01:40.370 --> 0:01:43.890
<v S2>at the hospital by the doctor. When I'm not cooperating,

0:01:43.890 --> 0:01:47.290
<v S2>when I'm rocking the table while it's operating. Hey, you

0:01:47.290 --> 0:01:49.970
<v S2>waited this long to stop debating cause I'm back. I'm

0:01:49.970 --> 0:01:52.490
<v S2>on the. I know that you got a job, Miss Cheney,

0:01:52.490 --> 0:01:56.130
<v S2>but your husband's heart problems complicating. So the FCC won't

0:01:56.130 --> 0:01:59.050
<v S2>let me be. Or let me be me. So let

0:01:59.090 --> 0:02:02.370
<v S2>me see. They try to shove me down on MTV,

0:02:02.810 --> 0:02:06.730
<v S2>but it feels so empty without me.

0:02:07.450 --> 0:02:12.370
<v S1>So every time I listen to that, I feel compelled

0:02:12.370 --> 0:02:16.170
<v S1>to move. I think if music makes you dance and

0:02:16.169 --> 0:02:21.330
<v S1>feel things, it is real. If AI models and scaffolding

0:02:21.330 --> 0:02:25.370
<v S1>can be assembled into a product that can replace human workers,

0:02:25.889 --> 0:02:30.369
<v S1>it's intelligent, i.e. it has the ability to understand, pursue,

0:02:30.370 --> 0:02:35.450
<v S1>and accomplish goals. If a technology can perform a task

0:02:35.450 --> 0:02:42.889
<v S1>and produce an output that requires understanding, it understands. So

0:02:42.889 --> 0:02:45.450
<v S1>in this frame, understanding is the ability of an actor

0:02:45.450 --> 0:02:49.049
<v S1>to interpret a given task and desired outcome well enough

0:02:49.050 --> 0:02:53.810
<v S1>to create an acceptable result. AI can clearly do that

0:02:53.810 --> 0:02:58.290
<v S1>now across so many domains. It's true that if you

0:02:58.290 --> 0:03:01.660
<v S1>break open a neural net or a human brain and

0:03:01.660 --> 0:03:04.180
<v S1>start poking at it with a stick or a scalpel

0:03:04.180 --> 0:03:07.860
<v S1>or an electron microscope, there is no place to point

0:03:07.860 --> 0:03:12.500
<v S1>to and say "this is understanding" or "here is the intelligence,"

0:03:13.380 --> 0:03:18.019
<v S1>but it is there in both human brains and in

0:03:18.020 --> 0:03:21.900
<v S1>neural nets, because we see the outputs that prove that

0:03:21.900 --> 0:03:26.380
<v S1>it's there. We should stop wasting cycles on "does it

0:03:26.380 --> 0:03:30.700
<v S1>understand?" or "is it intelligent?" or "it can't be intelligent,"

0:03:30.700 --> 0:03:36.140
<v S1>because all these behaviors in both animals and technology are

0:03:36.140 --> 0:03:41.060
<v S1>the result of emergent functionality. And the core issue here

0:03:41.060 --> 0:03:46.300
<v S1>is that we still lack transparency into emergence itself, not

0:03:46.300 --> 0:03:49.980
<v S1>just for tech, not just for LLMs, not just for AI,

0:03:50.020 --> 0:03:54.460
<v S1>but for humans and other animals as well. So let's

0:03:54.460 --> 0:03:58.460
<v S1>not confuse that opacity of emergence itself, which is a

0:03:58.540 --> 0:04:04.140
<v S1>universal human problem and curiosity, with a specific implementation of

0:04:04.180 --> 0:04:08.940
<v S1>that emergence opacity in a new intelligence stack. Judge capabilities

0:04:08.940 --> 0:04:13.460
<v S1>by their ground truth outputs. In other words, in your lexicon,

0:04:13.860 --> 0:04:17.380
<v S1>did the creation of that output require understanding and/or

0:04:17.380 --> 0:04:21.739
<v S1>intelligence if it were a human doing it? And if so,

0:04:21.980 --> 0:04:27.740
<v S1>then did a non-human actually produce that? Did a non-human

0:04:27.740 --> 0:04:31.300
<v S1>technology produce that same thing that if you saw it

0:04:31.300 --> 0:04:37.060
<v S1>from someone else, it would have required intelligence? Then guess

0:04:37.060 --> 0:04:43.140
<v S1>what? That is intelligence. Intelligence was used to produce the output.

0:04:43.339 --> 0:04:46.900
<v S1>We can use the output itself, and the fact that

0:04:46.900 --> 0:04:51.020
<v S1>we have defined it as requiring intelligence, to say that

0:04:51.020 --> 0:04:55.740
<v S1>anything that could have produced it had intelligence itself. I

0:04:55.740 --> 0:04:59.230
<v S1>think this framing helps clarify the whole situation a little

0:04:59.270 --> 0:05:02.310
<v S1>bit because we can start from ground truth, which is

0:05:02.550 --> 0:05:06.190
<v S1>what we already know and accept as being the product

0:05:06.190 --> 0:05:10.070
<v S1>of intelligence, right? If you hear a song like this,

0:05:10.070 --> 0:05:13.390
<v S1>if you see a work output from an AI digital

0:05:13.390 --> 0:05:16.309
<v S1>worker or something, and you say, well, if a human

0:05:16.310 --> 0:05:18.670
<v S1>had made that, I would have thought it was

0:05:18.670 --> 0:05:22.070
<v S1>a good product. I would have thought this definitely required intelligence.

0:05:22.550 --> 0:05:26.909
<v S1>That statement there we can use as ground truth. And

0:05:26.910 --> 0:05:30.590
<v S1>then from there, it's a quick step to say anything

0:05:30.589 --> 0:05:35.590
<v S1>that can produce that then also has that intelligence. And

0:05:35.589 --> 0:05:38.270
<v S1>notice that this is completely separate from being able to

0:05:38.310 --> 0:05:41.470
<v S1>explain how it got it. We just have to remind

0:05:41.470 --> 0:05:44.830
<v S1>ourselves we don't know how we got ours either. We

0:05:44.830 --> 0:05:48.430
<v S1>have no idea how, when you look at a spongy

0:05:48.470 --> 0:05:52.270
<v S1>pink brain, you can store memories in there, how

0:05:52.270 --> 0:05:55.550
<v S1>you can have ideas, how you can have thoughts. We

0:05:55.550 --> 0:05:59.830
<v S1>have no idea where inside of that brain any of

0:05:59.830 --> 0:06:04.790
<v S1>this stuff is actually performed or stored. Now, in humans,

0:06:04.790 --> 0:06:07.230
<v S1>we are not tempted to say, well, since I can't

0:06:07.230 --> 0:06:11.350
<v S1>find it, we are clearly not doing understanding. We are

0:06:11.350 --> 0:06:14.990
<v S1>not doing intelligence. Those things are not there because I

0:06:15.029 --> 0:06:18.589
<v S1>cannot find them by looking at the substrate. We're not

0:06:18.589 --> 0:06:21.510
<v S1>tempted to say that with humans, and we're not tempted

0:06:21.510 --> 0:06:23.830
<v S1>to say it, because we can actually look at the

0:06:23.830 --> 0:06:29.470
<v S1>outputs of ourselves doing those exact things. So why are

0:06:29.470 --> 0:06:33.589
<v S1>we making this mistake with a different type of intelligence?

0:06:34.150 --> 0:06:36.950
<v S1>Why are we looking at outputs that we would judge

0:06:37.190 --> 0:06:42.790
<v S1>as being intelligent or requiring intelligence to make and saying, well,

0:06:42.830 --> 0:06:45.190
<v S1>because I can't find where it was made or how

0:06:45.190 --> 0:06:49.390
<v S1>it was made, it must not be intelligence? It just

0:06:49.390 --> 0:06:53.030
<v S1>doesn't make sense. And hopefully this frame will help you

0:06:53.029 --> 0:06:56.349
<v S1>have the conversation with yourself or with others. We'll see

0:06:56.350 --> 0:06:57.190
<v S1>you in the next one.