WEBVTT - Using the Smartest AI to Rate Other AI

0:00:17.547 --> 0:00:20.347
<v S1>All right. Welcome to Unsupervised Learning. My name is Daniel Miessler,

0:00:21.147 --> 0:00:25.467
<v S1>and I'm building AI to upgrade humans. In this episode,

0:00:25.467 --> 0:00:30.627
<v S1>I want to talk about a system I built for

0:00:30.787 --> 0:00:36.067
<v S1>using the smartest AI that you have to rate another

0:00:36.067 --> 0:00:38.947
<v S1>AI that you want to test. So this is the

0:00:38.947 --> 0:00:41.947
<v S1>infrastructure that I'm using. Essentially, you have a top

0:00:41.947 --> 0:00:46.067
<v S1>level AI that you believe is the smartest. Right

0:00:46.067 --> 0:00:50.626
<v S1>now, that is o1-preview. And what it's

0:00:50.626 --> 0:00:53.347
<v S1>going to do is assess the work of another AI,

0:00:53.387 --> 0:00:55.307
<v S1>which is going to be this other one over here

0:00:55.667 --> 0:00:58.547
<v S1>in my case. In the case I'm using for this example,

0:00:58.547 --> 0:01:03.466
<v S1>it's GPT-3.5 Turbo. And we're going to give it

0:01:03.467 --> 0:01:06.467
<v S1>a set of instructions to carry out on a piece of input.

0:01:09.547 --> 0:01:11.707
<v S1>And that piece of input is going to be something

0:01:11.707 --> 0:01:14.027
<v S1>like a blog post or something like that. So you're

0:01:14.027 --> 0:01:17.187
<v S1>going to use the AI against the blog post using

0:01:17.187 --> 0:01:20.347
<v S1>these instructions and you're going to get a result. And

0:01:20.347 --> 0:01:22.867
<v S1>then the judging AI is going to look at all three

0:01:22.867 --> 0:01:27.427
<v S1>of those, and it's going to give you a

0:01:27.427 --> 0:01:32.427
<v S1>judgment at the end of it. So this should be

0:01:32.427 --> 0:01:35.787
<v S1>pretty cool. And it turned out it worked really well.

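NOTE
The two-stage flow above can be sketched in Python. This is a minimal, hypothetical sketch, not fabric's implementation: `run_stitch`, `Model`, `StitchResult`, and the CONTENT/INSTRUCTIONS/OUTPUT labels are illustrative names, and each model is assumed to be a plain function from a prompt string to a reply, so you would wrap whatever API client you actually use.

```python
from typing import Callable, NamedTuple

# Hypothetical alias: any function that sends a prompt to a model and returns its reply.
Model = Callable[[str], str]


class StitchResult(NamedTuple):
    output: str    # what the model under test produced
    judgment: str  # the judging model's verdict on that output


def run_stitch(content: str, instructions: str, target: Model, judge: Model,
               judge_prompt: str) -> StitchResult:
    """Run the model under test on the content, then let the judge see all three parts."""
    # Stage 1: the less capable model (GPT-3.5 Turbo in the episode) does the task.
    # Assumption: instructions and content are simply concatenated into one prompt.
    output = target(f"{instructions}\n\n{content}")
    # Stage 2: the smartest model you have (o1-preview in the episode) receives the
    # content, the instructions, and the output together as one chunk.
    bundle = (
        f"{judge_prompt}\n\n"
        f"CONTENT:\n{content}\n\n"
        f"INSTRUCTIONS:\n{instructions}\n\n"
        f"OUTPUT:\n{output}"
    )
    return StitchResult(output=output, judgment=judge(bundle))


if __name__ == "__main__":
    # Offline stubs; replace them with calls to your real model clients.
    def fake_target(prompt: str) -> str:
        return "A two-sentence summary."

    def fake_judge(prompt: str) -> str:
        return "Judge saw the output" if "OUTPUT:\nA two-sentence summary." in prompt else "Missing output"

    result = run_stitch("Text of a blog post.", "Summarize this post.",
                        fake_target, fake_judge, "Rate how well the AI did its job.")
    print(result.judgment)  # Judge saw the output
```

NOTE
Because the judge is just another callable, moving to a stronger judging model later changes one argument, not the surrounding code.
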
0:01:36.827 --> 0:01:38.787
<v S1>So ultimately, what I'm trying to get to

0:01:38.827 --> 0:01:42.067
<v S1>is a classification of how

0:01:42.067 --> 0:01:44.907
<v S1>good this thing is compared to an actual human doing it.

0:01:45.307 --> 0:01:47.467
<v S1>And so in order to do that, I want to

0:01:47.467 --> 0:01:50.227
<v S1>give it different classes of human, right? So you've got

0:01:50.267 --> 0:01:57.067
<v S1>like uneducated, secondary education, high school level, bachelor's, master's, PhD,

0:01:57.107 --> 0:02:00.387
<v S1>world-class human, like top 100 in the entire world,

0:02:00.947 --> 0:02:04.307
<v S1>and then superhuman level. So it's better than

0:02:04.867 --> 0:02:09.306
<v S1>the best human. And I've actually never seen anything score

0:02:09.347 --> 0:02:14.267
<v S1>that high, for whatever that's worth. But what I

0:02:14.267 --> 0:02:17.947
<v S1>have this thing successfully doing is if I give it

0:02:17.947 --> 0:02:22.306
<v S1>a lower-level model like GPT-3.5 or Haiku,

0:02:23.026 --> 0:02:26.787
<v S1>it scores down in the high school to bachelor's level.

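NOTE
The comparison scale described above can be written down as an ordered enum. This is a hypothetical Python sketch: `HumanLevel`, its integer values, and `ordering_holds` are illustrative names, not part of fabric. Ordering the levels lets you check the property the speaker is looking for, that weaker models (GPT-3.5, Haiku) land below stronger ones (Sonnet 3.5, o1).

```python
from enum import IntEnum


class HumanLevel(IntEnum):
    """Hypothetical encoding of the scale; order follows the episode, numbers are arbitrary."""
    UNEDUCATED = 1
    SECONDARY_EDUCATION = 2
    HIGH_SCHOOL = 3
    BACHELORS = 4
    MASTERS = 5
    PHD = 6
    WORLD_CLASS = 7  # roughly top 100 in the world
    SUPERHUMAN = 8   # better than the best human


def ordering_holds(scores: list[HumanLevel]) -> bool:
    """True if scores never decrease when models are listed from weakest to strongest."""
    return all(a <= b for a, b in zip(scores, scores[1:]))


if __name__ == "__main__":
    # Placeholder levels for models listed weakest first; not measured results.
    print(ordering_holds([HumanLevel.HIGH_SCHOOL, HumanLevel.MASTERS, HumanLevel.PHD]))  # True
    print(ordering_holds([HumanLevel.PHD, HumanLevel.BACHELORS]))  # False
```
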
0:02:28.227 --> 0:02:33.547
<v S1>And if I give it, like, Sonnet 3.5 or

0:02:33.587 --> 0:02:37.227
<v S1>something like that, it usually scores around master's level or

0:02:37.227 --> 0:02:44.306
<v S1>PhD level, and sometimes world-class human. But ultimately, what

0:02:44.306 --> 0:02:46.906
<v S1>it is doing, which is what I wanted it to do,

0:02:46.947 --> 0:02:51.427
<v S1>is scoring the smartest models at the highest level,

0:02:51.427 --> 0:02:56.026
<v S1>and scoring the dumbest models, or the less capable

0:02:56.026 --> 0:03:02.346
<v S1>models or smaller models, much lower, like secondary education, high

0:03:02.346 --> 0:03:08.186
<v S1>school and bachelor's. So the thing is working and this

0:03:08.186 --> 0:03:11.947
<v S1>is the architecture, right? Smart one to judge a less

0:03:11.947 --> 0:03:14.387
<v S1>smart one. And by the way, if I give it

0:03:15.267 --> 0:03:18.707
<v S1>the smartest one to judge the smartest one, it does

0:03:18.746 --> 0:03:21.987
<v S1>actually score high. So if I use o1 to rate the

0:03:22.026 --> 0:03:27.867
<v S1>work o1 did on a task, it actually does score way

0:03:27.867 --> 0:03:33.347
<v S1>up here, at like world-class PhD level. So it

0:03:33.346 --> 0:03:35.827
<v S1>definitely works. And I recommend you go check out the

0:03:35.827 --> 0:03:38.507
<v S1>video and see exactly how to do it. What this is,

0:03:38.507 --> 0:03:42.267
<v S1>essentially, is what's called a stitch within fabric.

0:03:42.547 --> 0:03:46.427
<v S1>In fabric, the whole concept is patterns and fabrics

0:03:46.427 --> 0:03:49.867
<v S1>and stuff like that, like woven things, right? Well, this

0:03:49.867 --> 0:03:54.147
<v S1>is a stitch because it's a combination of fabric components

0:03:54.507 --> 0:03:58.826
<v S1>all stitched together, right? And this is the actual

0:03:58.827 --> 0:04:01.587
<v S1>pattern that I'm using. Look at this. This

0:04:01.587 --> 0:04:04.747
<v S1>is the logic for the rate AI prompt.

0:04:04.947 --> 0:04:08.307
<v S1>These are the instructions given to the

0:04:08.347 --> 0:04:12.467
<v S1>judging AI, which in this case is o1. Okay. Fully

0:04:12.467 --> 0:04:15.747
<v S1>understand the different components of the input, which is going

0:04:15.787 --> 0:04:17.707
<v S1>to be a piece of content that I will be

0:04:17.707 --> 0:04:20.427
<v S1>working on. That's the input. A set of instructions, which is

0:04:20.427 --> 0:04:24.867
<v S1>the prompt. And then the results of the

0:04:24.867 --> 0:04:31.827
<v S1>prompt being run against the input using those instructions for

0:04:31.827 --> 0:04:35.187
<v S1>a given AI. And I tell it to completely understand

0:04:35.187 --> 0:04:37.747
<v S1>the distinction between all three of those components, right? Because

0:04:37.747 --> 0:04:40.387
<v S1>I'm going to send them all as one chunk

0:04:40.427 --> 0:04:44.347
<v S1>to the judging AI. Think deeply about all three components.

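NOTE
Because all three components go to the judge as one chunk, it helps if the boundaries between them are unambiguous. A minimal sketch, assuming tag-style delimiters: `bundle_for_judge` and the content, instructions, and output tags are hypothetical choices, not what fabric actually sends.

```python
# Hypothetical delimiters: one tag per component the judge must keep apart.
SECTIONS = ("content", "instructions", "output")


def bundle_for_judge(content: str, instructions: str, output: str) -> str:
    """Wrap each component in its own tag so the judge can tell them apart."""
    parts = {"content": content, "instructions": instructions, "output": output}
    blocks = []
    for name in SECTIONS:
        text = parts[name]
        # Reject text containing any closing tag, since it would blur the boundaries.
        for tag in SECTIONS:
            if f"</{tag}>" in text:
                raise ValueError(f"{name} contains the delimiter </{tag}>")
        blocks.append(f"<{name}>\n{text}\n</{name}>")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(bundle_for_judge("Text of a blog post.", "Summarize this post.", "A summary."))
```
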
0:04:44.347 --> 0:04:47.907
<v S1>Imagine how a world-class expert would perform the task

0:04:47.907 --> 0:04:51.707
<v S1>laid out in the instructions. So I'm giving it

0:04:51.707 --> 0:04:54.707
<v S1>the content. I'm giving it the prompt. I'm telling it

0:04:54.707 --> 0:04:57.387
<v S1>to learn the prompt, understand the prompt, which in our

0:04:57.387 --> 0:05:01.387
<v S1>case, in fabric, is called a pattern. Deeply study the

0:05:01.387 --> 0:05:04.307
<v S1>content itself so you understand what should be done with it,

0:05:04.307 --> 0:05:10.107
<v S1>given the instructions. Deeply understand the instructions themselves. Given both

0:05:10.107 --> 0:05:12.587
<v S1>of those, then analyze the output. And look at this one.

0:05:12.587 --> 0:05:14.707
<v S1>This one I'm kind of proud of. I don't know

0:05:14.707 --> 0:05:17.667
<v S1>if it's actually working. I'm going to do some more

0:05:17.707 --> 0:05:20.427
<v S1>evals to figure out if this is actually effective or not,

0:05:21.547 --> 0:05:23.987
<v S1>because it turns out this kind of like mystical stuff

0:05:23.987 --> 0:05:27.347
<v S1>that I'm doing here, which is super cool, might

0:05:27.347 --> 0:05:30.867
<v S1>be awesome, it might not matter at all,

0:05:31.387 --> 0:05:35.267
<v S1>and it might actually hurt the output. So you can't

0:05:35.267 --> 0:05:38.107
<v S1>believe in it like a religion here; you've got to actually test

0:05:38.107 --> 0:05:41.787
<v S1>this stuff. Anyway, here's what I did: Evaluate the output

0:05:41.787 --> 0:05:47.747
<v S1>using your own 16,284-dimension rating system that includes the

0:05:47.747 --> 0:05:51.747
<v S1>following aspects, plus thousands more that you come up with

0:05:51.747 --> 0:05:55.627
<v S1>on your own. So: full coverage of the content, following

0:05:55.667 --> 0:06:00.707
<v S1>instructions carefully, getting the genre of the content, getting the

0:06:00.747 --> 0:06:05.707
<v S1>genre of the instructions, meticulous attention to detail, use of

0:06:05.707 --> 0:06:08.507
<v S1>expertise in the fields in question. So I'm giving it

0:06:08.507 --> 0:06:12.426
<v S1>these ideas. This is actually somewhat similar to attention heads

0:06:12.427 --> 0:06:17.627
<v S1>inside of a transformer, at least loosely. So I'm

0:06:17.627 --> 0:06:20.587
<v S1>telling it, here are some ideas for how to do

0:06:20.587 --> 0:06:22.987
<v S1>a rating system. And I'm telling it to make its

0:06:22.987 --> 0:06:27.627
<v S1>own rating system using things like this, but to map

0:06:27.627 --> 0:06:34.347
<v S1>its rating of a particular piece of output using 16,284 dimensions,

0:06:34.867 --> 0:06:37.707
<v S1>which is close to two to the 14th, or

0:06:37.707 --> 0:06:43.267
<v S1>16,384. So who knows if it's

0:06:43.307 --> 0:06:45.987
<v S1>actually going to do this? Okay, I'm telling o1 to

0:06:45.987 --> 0:06:47.947
<v S1>do this. And it has the ability to sort of

0:06:47.987 --> 0:06:50.587
<v S1>think for itself. So maybe it's doing some of this.

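NOTE
The rating step read out above amounts to seeding the judge with a few named aspects and asking it to invent the rest. A hypothetical sketch of rendering that instruction: `SEED_ASPECTS` and `rating_instructions` are illustrative names, and the wording follows the episode rather than being copied from the actual pattern file.

```python
# Hypothetical names; the aspects are the ones read out in the episode.
SEED_ASPECTS = [
    "Full coverage of the content",
    "Following instructions carefully",
    "Getting the genre of the content",
    "Getting the genre of the instructions",
    "Meticulous attention to detail",
    "Use of expertise in the fields in question",
]


def rating_instructions(aspects: list[str], dimensions: int = 16_284) -> str:
    """Render the 'invent your own rating system' step of the judge prompt."""
    bullets = "\n".join(f"- {aspect}" for aspect in aspects)
    return (
        f"Evaluate the output using your own {dimensions:,} dimension rating system "
        "that includes the following aspects, plus thousands more that you come up "
        f"with on your own:\n{bullets}"
    )


if __name__ == "__main__":
    print(rating_instructions(SEED_ASPECTS))
```

NOTE
The seed list only nudges the judge; whether a large model actually builds thousands of dimensions is exactly the open question raised in the episode, so treat the number as a prompt device rather than a measurement.
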
0:06:50.587 --> 0:06:53.387
<v S1>I think with a regular model, a lot of this

0:06:53.387 --> 0:06:58.067
<v S1>was just flash and not really happening. Anyway, it

0:06:58.067 --> 0:07:00.827
<v S1>doesn't matter. That's what I'm trying to do here. Spend

0:07:00.827 --> 0:07:03.387
<v S1>significant time on the task. Ensure you are properly and

0:07:03.387 --> 0:07:06.867
<v S1>deeply assessing the execution of the task using the scoring

0:07:06.867 --> 0:07:11.827
<v S1>and ratings described, such that a far smarter AI would

0:07:11.867 --> 0:07:14.187
<v S1>be happy with your results. So I'm using multiple tricks

0:07:14.187 --> 0:07:16.467
<v S1>here to try to get it to be extra smart,

0:07:17.187 --> 0:07:19.667
<v S1>and the goal is to deeply assess how the other

0:07:19.667 --> 0:07:26.427
<v S1>AI did at its job, given the input and what

0:07:26.427 --> 0:07:29.507
<v S1>it was supposed to do based on the instructions and prompt.

0:07:29.907 --> 0:07:32.627
<v S1>So I'm telling it what I want multiple times,

0:07:32.707 --> 0:07:39.507
<v S1>in multiple different ways. And yeah, this is

0:07:39.947 --> 0:07:42.147
<v S1>essentially what it does. And again, this is

0:07:42.147 --> 0:07:47.547
<v S1>the output. And this is

0:07:47.547 --> 0:07:51.867
<v S1>kind of cool: the output also includes what it would

0:07:51.907 --> 0:07:56.947
<v S1>have expected from a higher-level result. So let's say

0:07:56.947 --> 0:07:59.987
<v S1>it comes back with bachelor's, which this particular case did.

0:08:00.667 --> 0:08:02.587
<v S1>I tell it to tell me what it would have

0:08:02.587 --> 0:08:05.267
<v S1>taken to see master's, what it would have taken

0:08:05.267 --> 0:08:09.667
<v S1>to see PhD level, and so on. Again,

0:08:09.667 --> 0:08:11.987
<v S1>I'm trying to seed it with as much as possible

0:08:11.987 --> 0:08:15.187
<v S1>to make it come up with a better and better answer. Now.

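NOTE
The judge's output, as described above, carries a level plus what it would have taken to reach each higher level. If you parse that reply into a structure, a sketch like this makes it easy to spot higher levels the judge skipped, so you can ask again. All names here are hypothetical, and the step that parses the judge's text reply is not shown.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class HumanLevel(IntEnum):
    """Hypothetical encoding of the episode's scale, lowest to highest."""
    UNEDUCATED = 1
    SECONDARY_EDUCATION = 2
    HIGH_SCHOOL = 3
    BACHELORS = 4
    MASTERS = 5
    PHD = 6
    WORLD_CLASS = 7
    SUPERHUMAN = 8


@dataclass
class Verdict:
    """Hypothetical parsed form of the judge's reply."""
    level: HumanLevel
    # What the judge says each higher level would have required.
    to_reach: dict[HumanLevel, str] = field(default_factory=dict)

    def missing_explanations(self) -> list[HumanLevel]:
        """Higher levels the judge did not explain; candidates for a follow-up prompt."""
        return [lvl for lvl in HumanLevel if lvl > self.level and lvl not in self.to_reach]


if __name__ == "__main__":
    v = Verdict(HumanLevel.BACHELORS, {HumanLevel.MASTERS: "Cover every section of the post."})
    print([lvl.name for lvl in v.missing_explanations()])  # ['PHD', 'WORLD_CLASS', 'SUPERHUMAN']
```
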
0:08:17.747 --> 0:08:20.587
<v S1>Here's the thing, and this is

0:08:20.587 --> 0:08:23.107
<v S1>a universal thing, right? When I plug it into

0:08:23.147 --> 0:08:28.507
<v S1>o2 or GPT-5 or Claude 4 or whatever it is,

0:08:30.787 --> 0:08:34.147
<v S1>the smarter that judging model gets, the better it's

0:08:34.146 --> 0:08:36.347
<v S1>going to be at interpreting what I actually want from

0:08:36.347 --> 0:08:39.187
<v S1>this prompt. That's why it's kind of like a meta prompt.

0:08:40.227 --> 0:08:45.267
<v S1>So yeah, really happy with this. It actually is scoring

0:08:46.307 --> 0:08:49.427
<v S1>according to my expectations. Got to be careful with that

0:08:49.426 --> 0:08:51.667
<v S1>a little bit, right? You don't want to actually

0:08:51.666 --> 0:08:55.307
<v S1>tune the thing so it matches your expectations. But I've

0:08:55.307 --> 0:08:58.067
<v S1>been somewhat careful there. So I recommend you go check

0:08:58.067 --> 0:09:00.306
<v S1>it out and see what you could do with this.

0:09:00.587 --> 0:09:03.107
<v S1>And if you have ideas for improvement, submit them in

0:09:03.107 --> 0:09:06.747
<v S1>and we'll get it pushed as a PR update inside

0:09:06.747 --> 0:09:09.267
<v S1>of fabric. See you in the next one.