WEBVTT - Why I Believe in SOTA Models Over Custom Ones

0:00:00.040 --> 0:00:03.240
<v S1>I'm not completely sure I'm right about this, but I've

0:00:03.240 --> 0:00:06.360
<v S1>never been a big believer in training custom models. I've

0:00:06.360 --> 0:00:09.479
<v S1>also never believed in fine-tuning, going all the way

0:00:09.480 --> 0:00:12.640
<v S1>back to 2023. My intuition has always pushed me towards

0:00:12.640 --> 0:00:16.000
<v S1>the best state-of-the-art model possible, combined with

0:00:16.000 --> 0:00:20.959
<v S1>context management. I just finally crystallized my reasoning around this.

0:00:21.720 --> 0:00:24.520
<v S1>Anytime you think you're using a small model for a

0:00:24.520 --> 0:00:27.920
<v S1>small task, there's usually a whole lot more going into

0:00:27.960 --> 0:00:31.560
<v S1>a given decision than just that individual area of expertise.

0:00:32.520 --> 0:00:37.560
<v S1>For example, labeling emails, writing reports, processing security events, searching

0:00:37.560 --> 0:00:40.360
<v S1>for threats on a network. On one hand, I think

0:00:40.360 --> 0:00:42.680
<v S1>these are specialized, but the fact is, the smarter and

0:00:42.680 --> 0:00:46.080
<v S1>more experienced a human is who has this expertise, the

0:00:46.080 --> 0:00:49.159
<v S1>better job they're going to do. This is because most

0:00:49.159 --> 0:00:53.560
<v S1>specialized tasks still benefit from the general life experience of

0:00:53.560 --> 0:00:56.440
<v S1>the person doing the execution. This is why I think

0:00:56.440 --> 0:00:59.040
<v S1>the future is not a whole bunch of extremely small,

0:00:59.040 --> 0:01:03.280
<v S1>specialized models throughout the enterprise. I think what's far more

0:01:03.280 --> 0:01:07.600
<v S1>likely is more of an Opus, Sonnet, Haiku model, where

0:01:07.600 --> 0:01:10.000
<v S1>the best of the best just keeps coming down in price,

0:01:10.360 --> 0:01:14.440
<v S1>including going into open source. And those smaller models are

0:01:14.440 --> 0:01:17.160
<v S1>used in conjunction with context to perform all the different

0:01:17.160 --> 0:01:21.160
<v S1>tasks in an organization at much lower cost. But I

0:01:21.160 --> 0:01:24.960
<v S1>think they'll still be extremely general models, not tiny and

0:01:24.959 --> 0:01:28.920
<v S1>narrow custom ones. I think the TL;DR here is when

0:01:28.920 --> 0:01:32.160
<v S1>you think you're doing a narrow task, that narrow task

0:01:32.160 --> 0:01:36.440
<v S1>is actually benefiting from a ton of general experience. And

0:01:36.440 --> 0:01:38.720
<v S1>I think this applies to humans, and I think it

0:01:38.720 --> 0:01:42.720
<v S1>also applies to models. I'm not completely convinced of this.

0:01:42.760 --> 0:01:46.840
<v S1>I'm about 70% sure. But yeah, I think this is

0:01:46.840 --> 0:01:47.720
<v S1>the way it's going to go.