WEBVTT - A Conversation with Michael Brown

0:00:19.852 --> 0:00:22.212
<v S1>All right, Michael, welcome to Unsupervised Learning.

0:00:22.892 --> 0:00:24.412
<v S2>Hey, it's great to be here. Thanks for having me.

0:00:25.532 --> 0:00:29.892
<v S1>Yeah. So, uh, lots to talk about here. Uh, can

0:00:29.892 --> 0:00:31.732
<v S1>you give a quick intro on yourself?

0:00:32.492 --> 0:00:34.172
<v S2>Yeah, sure. So, uh, my name is Michael Brown. I'm

0:00:34.172 --> 0:00:36.812
<v S2>a principal security engineer at Trail of Bits. I lead our

0:00:36.812 --> 0:00:40.932
<v S2>company's AI and ML security research group. We really focus

0:00:40.932 --> 0:00:44.972
<v S2>on two kinds of, uh, intersections between AI/ML and security.

0:00:45.012 --> 0:00:51.332
<v S2>It's primarily using AI/ML technologies to solve traditional cybersecurity problems

0:00:51.332 --> 0:00:54.292
<v S2>that are really hairy and really kind of sticky, and

0:00:54.292 --> 0:00:57.972
<v S2>conventional methods have kind of failed to address. And then

0:00:57.972 --> 0:01:01.412
<v S2>we also, uh, to a smaller degree, look at, um,

0:01:01.452 --> 0:01:06.292
<v S2>the security of AI/ML-based systems. So, um, I was

0:01:06.532 --> 0:01:10.332
<v S2>also the lead designer, um, and team lead for, um,

0:01:10.372 --> 0:01:14.092
<v S2>Trail of Bits' team that entered the AI Cyber Challenge. Uh,

0:01:14.132 --> 0:01:16.892
<v S2>we built the tool called Buttercup, which took second place

0:01:16.892 --> 0:01:21.382
<v S2>overall in the AIxCC. And, um, yeah, that's

0:01:21.382 --> 0:01:21.862
<v S2>about it.

0:01:22.462 --> 0:01:26.062
<v S1>Yeah. That's perfect. And that's exactly what I'd like to

0:01:26.102 --> 0:01:32.622
<v S1>chat about. Um, so I guess, um, I guess the

0:01:32.622 --> 0:01:35.702
<v S1>thing I'm most interested in is, uh, just the design

0:01:35.702 --> 0:01:41.622
<v S1>of the system, and, um, I guess overall, what you

0:01:41.622 --> 0:01:44.542
<v S1>know about the designs of the other system. So design

0:01:44.542 --> 0:01:49.222
<v S1>versus design, system versus system. What? Whatever you want to

0:01:49.222 --> 0:01:51.541
<v S1>share or can share. Like what? What are your thoughts

0:01:51.542 --> 0:01:54.541
<v S1>on that? Um, I guess everyone releases open source. So

0:01:54.862 --> 0:01:56.462
<v S1>maybe you've had a chance to look at some of

0:01:56.462 --> 0:01:59.662
<v S1>the other offerings. Maybe you've heard them talking, maybe you know,

0:01:59.662 --> 0:02:02.742
<v S1>the teams. Uh, so I guess what kind of Intel

0:02:02.742 --> 0:02:06.502
<v S1>do you have on what everyone else was doing versus

0:02:06.502 --> 0:02:10.062
<v S1>what you guys were doing? And how do you think

0:02:10.182 --> 0:02:11.142
<v S1>that went?

0:02:12.502 --> 0:02:14.622
<v S2>Yeah. Well, um, yeah, I guess I can answer that

0:02:14.622 --> 0:02:17.992
<v S2>last part pretty easily. It went pretty well for us. Um,

0:02:18.232 --> 0:02:20.712
<v S2>so we took second place. Uh, the team that finished

0:02:20.712 --> 0:02:23.952
<v S2>in first, Team Atlanta, um, they had a pretty similar

0:02:23.952 --> 0:02:28.512
<v S2>setup to ours. Um, they had more components, more moving parts, uh,

0:02:28.512 --> 0:02:31.552
<v S2>more pieces. They had more hands. Um, larger team to

0:02:31.552 --> 0:02:34.112
<v S2>be able to kind of implement more, um, but ultimately

0:02:34.112 --> 0:02:37.952
<v S2>they had a really similar kind of set of design principles, um,

0:02:37.992 --> 0:02:41.632
<v S2>that worked out for us. The third-place finishing team, Theori, they, um,

0:02:41.672 --> 0:02:44.112
<v S2>had a bit of a deviation in terms of like

0:02:44.112 --> 0:02:47.232
<v S2>their conceptual, uh, principles that guided how they built their system.

0:02:47.232 --> 0:02:49.712
<v S2>But I can get into that in a bit. Um,

0:02:49.952 --> 0:02:51.392
<v S2>I guess I can first start off by talking a

0:02:51.392 --> 0:02:55.192
<v S2>little bit about our concept. So it's interesting. Um, you know,

0:02:55.232 --> 0:02:57.472
<v S2>the concept for Buttercup changed quite a bit over the

0:02:57.472 --> 0:03:00.032
<v S2>course of the AI Cyber Challenge.

0:03:00.032 --> 0:03:03.832
<v S2>So this got announced, um, a couple years back, and

0:03:03.832 --> 0:03:06.592
<v S2>there was a period of about 4 or 5 months, um,

0:03:06.672 --> 0:03:09.512
<v S2>after the cyber challenge was announced, but before DARPA had

0:03:09.512 --> 0:03:13.031
<v S2>really released any rules. So we didn't really know exactly

0:03:13.312 --> 0:03:15.282
<v S2>how the competition was going to be structured.

0:03:15.282 --> 0:03:16.682
<v S2>We just knew that we would have to build a

0:03:16.681 --> 0:03:22.281
<v S2>fully autonomous, AI driven system that could find and patch vulnerabilities, um,

0:03:22.322 --> 0:03:26.202
<v S2>with a high degree of accuracy. Um, so originally, the

0:03:26.202 --> 0:03:29.962
<v S2>concept that I drew up along with my co-creator Ian Smith, um,

0:03:30.562 --> 0:03:34.042
<v S2>was originally really ambitious. Lots of moving parts, lots of

0:03:34.042 --> 0:03:39.602
<v S2>static analysis, dynamic analysis, lots of, um, conventional techniques, lots

0:03:39.602 --> 0:03:42.642
<v S2>of AI/ML-based techniques. But ultimately, once the rules came out,

0:03:42.642 --> 0:03:44.442
<v S2>it kind of got pared down quite a bit. Um,

0:03:44.442 --> 0:03:47.322
<v S2>some of the things that we wanted to do, um, were,

0:03:47.522 --> 0:03:49.122
<v S2>were marked as like out of scope. Some of the

0:03:49.122 --> 0:03:52.162
<v S2>stuff we wanted to do was marked as against the rules, um,

0:03:52.162 --> 0:03:54.322
<v S2>just for the tractability of the competition.

0:03:54.322 --> 0:03:56.802
<v S1>So is that because they were, they would have been

0:03:56.802 --> 0:03:59.402
<v S1>too expensive. Didn't you have budgets you had to stay under?

0:04:00.162 --> 0:04:02.722
<v S2>Yeah. So some of it was definitely, um, budgetary and

0:04:02.722 --> 0:04:04.562
<v S2>some stuff was just, you know, flat out against the rules.

0:04:04.562 --> 0:04:07.402
<v S2>We looked at fine tuning a large language model, um,

0:04:07.442 --> 0:04:10.602
<v S2>with information about lots of open source software. And, um,

0:04:10.642 --> 0:04:15.022
<v S2>there ended up being a rule about pre-baking models. So, okay, really,

0:04:15.022 --> 0:04:17.702
<v S2>kudos to DARPA for making sure that, you know, competitors

0:04:17.702 --> 0:04:21.022
<v S2>didn't have the ability to kind of, um, skew the

0:04:21.022 --> 0:04:23.622
<v S2>systems that they build for the test, which is, you know,

0:04:23.662 --> 0:04:27.182
<v S2>finding and patching vulnerabilities in open source software. Um, so, yeah,

0:04:27.222 --> 0:04:29.541
<v S2>there was a lot of stuff that got cut down. Um,

0:04:29.582 --> 0:04:33.382
<v S2>But ultimately the design of our system, um, was,

0:04:33.382 --> 0:04:35.541
<v S2>was basically a pipeline. We kind of broke the

0:04:35.541 --> 0:04:37.491
<v S2>problem down. We realized we had to do basically 4

0:04:37.492 --> 0:04:40.421
<v S2>or 5 things really well to win this competition. We

0:04:40.422 --> 0:04:42.462
<v S2>had to be able to find vulnerabilities. And not only that,

0:04:42.462 --> 0:04:44.302
<v S2>we had to be able to prove they exist. So

0:04:44.302 --> 0:04:46.942
<v S2>it wasn't enough just to, you know, use a static

0:04:46.942 --> 0:04:49.302
<v S2>analysis scanner and say, hey, this thing thinks there's a

0:04:49.302 --> 0:04:54.501
<v S2>vulnerability on line 50 of, you know, whatever. Uh, you actually

0:04:54.502 --> 0:04:56.862
<v S2>had to have a crashing test case for the first

0:04:57.981 --> 0:05:01.582
<v S2>round of the competition, the semifinals. And in the finals,

0:05:01.702 --> 0:05:05.222
<v S2>they relaxed this requirement. But the pathway

0:05:05.222 --> 0:05:08.982
<v S2>to getting lots of points basically still required one. Um,

0:05:08.981 --> 0:05:11.462
<v S2>so you had to find vulnerabilities and also prove

0:05:11.462 --> 0:05:14.152
<v S2>they exist with a crashing input, or an input that

0:05:14.152 --> 0:05:19.232
<v S2>would trigger a sanitizer in the target function. Um, you

0:05:19.231 --> 0:05:22.032
<v S2>had to be able to contextualize and draw additional information

0:05:22.032 --> 0:05:25.952
<v S2>about this vulnerability. Otherwise, patching was doomed to fail. Um,

0:05:25.992 --> 0:05:30.272
<v S2>and then you had to actually patch the vulnerability. Um,

0:05:30.791 --> 0:05:35.072
<v S2>so this is a highly complex, uh, problem that conventional

0:05:35.072 --> 0:05:39.312
<v S2>approaches to software analysis have really kind of not addressed well,

0:05:39.312 --> 0:05:41.032
<v S2>in my opinion. And it was a great area to

0:05:41.072 --> 0:05:43.272
<v S2>use AI. And then we also, you know, finally we

0:05:43.272 --> 0:05:47.272
<v S2>had to orchestrate all of these functions and do really

0:05:47.272 --> 0:05:50.032
<v S2>high quality engineering around all of them so that the

0:05:50.032 --> 0:05:53.032
<v S2>system would stay up and running for several days. Um,

0:05:53.032 --> 0:05:54.872
<v S2>so based on those kind of 4 or 5, depending

0:05:54.872 --> 0:05:57.632
<v S2>on how you chop them up, core principles or core

0:05:57.632 --> 0:05:59.671
<v S2>tasks that we had to do, um, we kind of

0:05:59.712 --> 0:06:01.952
<v S2>decided on an approach that we kind of call the

0:06:01.952 --> 0:06:04.672
<v S2>best of both worlds, which was, you know, we knew

0:06:04.712 --> 0:06:08.752
<v S2>that conventional software analysis, whether it's dynamic, static, hybrid, whatever, um,

0:06:08.791 --> 0:06:12.162
<v S2>it really excels at certain subproblems within this pipeline, and

0:06:12.162 --> 0:06:15.722
<v S2>it really struggles with other ones. And AI/ML, and specifically

0:06:15.722 --> 0:06:19.202
<v S2>generative AI, which the competition was, was kind of heavily

0:06:19.202 --> 0:06:22.522
<v S2>skewed towards generative AI. Generative AI does really well at

0:06:22.522 --> 0:06:25.162
<v S2>certain types of subproblems in this pipeline, but also really

0:06:25.162 --> 0:06:29.282
<v S2>struggles with others. So our approach is pretty straightforward. We're

0:06:29.282 --> 0:06:31.442
<v S2>going to merge the best in class capability for each

0:06:31.442 --> 0:06:35.481
<v S2>part of this pipeline. Uh, stitch them together with high uptime,

0:06:35.481 --> 0:06:39.842
<v S2>high reliability engineering code, um, and then focus on doing really,

0:06:39.842 --> 0:06:44.042
<v S2>really well for the largest number of, um, the largest

0:06:44.041 --> 0:06:48.282
<v S2>number of possible targets that we could possibly, um, that

0:06:48.282 --> 0:06:49.522
<v S2>we could possibly do well in.
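
NOTE
Editor's sketch of the pipeline idea described above: fixed stages run in a prescribed order by ordinary orchestration code, with a retry wrapper so one failing stage does not take the system down. This is a minimal illustration, not Buttercup's code. The Artifact fields, the stage names, and the retry policy are all hypothetical.
```python
from dataclasses import dataclass, field
from typing import Callable, Optional
# Hypothetical record handed from stage to stage; field names are illustrative.
@dataclass
class Artifact:
    target: str
    crashing_input: Optional[bytes] = None
    context: dict = field(default_factory=dict)
    patch: Optional[str] = None
Stage = Callable[[Artifact], Artifact]
def run_pipeline(artifact: Artifact, stages: list[Stage], retries: int = 2) -> Artifact:
    """Run stages in a fixed order; retry a failing stage, then give up on the target."""
    for stage in stages:
        for attempt in range(retries + 1):
            try:
                artifact = stage(artifact)
                break
            except Exception:
                if attempt == retries:
                    return artifact  # abandon this target, keep the system running
    return artifact
def find_vulnerability(a: Artifact) -> Artifact:
    a.crashing_input = b"A" * 64  # stand-in for a fuzzer producing a crashing input
    return a
def contextualize(a: Artifact) -> Artifact:
    a.context["function"] = "parse_header"  # stand-in for program analysis
    return a
def generate_patch(a: Artifact) -> Artifact:
    # Stand-in for a narrowly scoped LLM call that only sees the gathered context.
    a.patch = f"add bounds check in {a.context['function']}"
    return a
result = run_pipeline(Artifact("libexample"), [find_vulnerability, contextualize, generate_patch])
print(result.patch)
```
Because each stage only depends on the Artifact it receives, a stage can be swapped for a different strategy without touching the others, which is the plug-and-play property discussed in the conversation.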

0:06:51.322 --> 0:07:01.002
<v S1>Okay. Yeah. Interesting. So would you say that, um. Basically

0:07:01.002 --> 0:07:03.241
<v S1>those things that you described in the beginning, those

0:07:03.242 --> 0:07:05.882
<v S1>are like modules and they should almost like, kind of

0:07:05.922 --> 0:07:08.442
<v S1>work independently. So you can, like, hand a task to

0:07:08.481 --> 0:07:11.372
<v S1>each of them. Is that kind of the system

0:07:11.372 --> 0:07:12.332
<v S1>design idea?

0:07:12.892 --> 0:07:15.692
<v S2>Yeah. Yeah. So we, um, part of this was just

0:07:15.732 --> 0:07:20.052
<v S2>surviving a really rapid development cycle. This wasn't really advertised

0:07:20.092 --> 0:07:22.012
<v S2>all that well, but we actually only had about three

0:07:22.012 --> 0:07:26.292
<v S2>months to develop the first version of Buttercup in the semi-finals. Um,

0:07:26.292 --> 0:07:29.452
<v S2>and we actually only had about six months to develop, um,

0:07:29.492 --> 0:07:32.332
<v S2>the final version of Buttercup or Buttercup 2.0, which, which

0:07:32.332 --> 0:07:35.132
<v S2>took second place in the finals. Um, and that was

0:07:35.132 --> 0:07:37.852
<v S2>because even though each round of the competition ran for

0:07:37.852 --> 0:07:41.012
<v S2>a year, it took DARPA a while to solicit feedback

0:07:41.012 --> 0:07:45.572
<v S2>from competitors, other stakeholders, and actually solidify the rules. Um,

0:07:45.612 --> 0:07:47.732
<v S2>and until the rules were solidified, it was really a

0:07:47.732 --> 0:07:51.652
<v S2>risk to do really kind of any development on the system. Also,

0:07:51.652 --> 0:07:54.772
<v S2>certain things like the technical specifics of the competition

0:07:54.772 --> 0:07:58.492
<v S2>API weren't available until later in these cycles. Um,

0:07:58.492 --> 0:08:01.292
<v S2>so part of the reason why we modularized each component

0:08:01.612 --> 0:08:04.812
<v S2>was so that we could take smaller subteams within my

0:08:04.812 --> 0:08:08.452
<v S2>larger team of about ten engineers, um, all working some

0:08:08.452 --> 0:08:10.862
<v S2>degree of part time on this system so we can

0:08:10.862 --> 0:08:12.982
<v S2>modularize it, keep them kind of separate. You know, it

0:08:12.982 --> 0:08:14.782
<v S2>gives us this integration problem that we have to deal

0:08:14.782 --> 0:08:15.902
<v S2>with at the end. We have to kind of put

0:08:15.902 --> 0:08:18.302
<v S2>everything together and make sure that it runs well. Um,

0:08:18.342 --> 0:08:19.822
<v S2>but it was kind

0:08:19.822 --> 0:08:21.902
<v S2>of a necessity because we had to work on developing

0:08:21.902 --> 0:08:25.342
<v S2>everything independently. We couldn't afford to just do the first block

0:08:25.622 --> 0:08:27.302
<v S2>and have it become like, you know, that meme

0:08:27.302 --> 0:08:31.302
<v S2>of the horse drawing where there's a really finely defined head, and

0:08:31.302 --> 0:08:33.222
<v S2>then as it gets towards, like, the back parts

0:08:33.222 --> 0:08:35.382
<v S2>of the animal, it turns into, like, a rough sketch.

0:08:35.502 --> 0:08:37.381
<v S2>That was what was going to happen if

0:08:37.382 --> 0:08:40.822
<v S2>we didn't modularize this. Um, but it also helped because

0:08:40.822 --> 0:08:43.742
<v S2>as we decided to change out strategies or play with

0:08:43.742 --> 0:08:45.982
<v S2>different strategies, it made it really easy to kind of plug

0:08:45.982 --> 0:08:48.262
<v S2>and play different parts to see what would work later on.

0:08:49.462 --> 0:08:52.822
<v S1>Yeah, that makes sense. So I keep having this debate

0:08:52.822 --> 0:08:56.381
<v S1>with a whole bunch of people. It's kind of around, um,

0:08:56.942 --> 0:09:00.542
<v S1>let the model do the work because the model is smarter. Um,

0:09:00.742 --> 0:09:04.781
<v S1>and it just understands what to do. And then there's uh,

0:09:05.462 --> 0:09:10.112
<v S1>the other argument, which is, um, build a robust system

0:09:10.832 --> 0:09:13.552
<v S1>and you have the model kind of just be the

0:09:13.552 --> 0:09:17.432
<v S1>intelligence that helps guide the system or moves things through

0:09:17.432 --> 0:09:22.392
<v S1>the system, or maybe routes, uh, across the system or whatever.

0:09:22.712 --> 0:09:25.432
<v S1>But the system itself should be set up really well,

0:09:26.232 --> 0:09:28.752
<v S1>and you're kind of like functioning as a router. And

0:09:28.752 --> 0:09:33.352
<v S1>then when the model gets updated, it makes the system better. Um,

0:09:33.592 --> 0:09:37.272
<v S1>but the counter to that is basically that we're just

0:09:37.272 --> 0:09:40.032
<v S1>going to design bad systems. So we should stop trying

0:09:40.032 --> 0:09:43.192
<v S1>to be rigid there and just use the model. Like

0:09:43.352 --> 0:09:44.752
<v S1>where do you guys fall on that?

0:09:45.432 --> 0:09:49.672
<v S2>Uh, I think it was probably closest to the second

0:09:49.672 --> 0:09:53.712
<v S2>one and maybe more like an an undescribed third thing.

0:09:53.712 --> 0:09:56.911
<v S2>So I'll kind of go over it. Um, you know, we've,

0:09:56.912 --> 0:09:58.792
<v S2>we've been, you know, and me in particular, I've been doing

0:09:58.792 --> 0:10:03.592
<v S2>research on like applied AI for, for security problems since before, uh,

0:10:03.592 --> 0:10:06.271
<v S2>the large language model became the predominant form of technology.

0:10:06.272 --> 0:10:12.402
<v S2>Back to, you know, 2018, 2019 time frame. Um, and uh, realistically,

0:10:12.402 --> 0:10:14.482
<v S2>like large language models are great at a good number

0:10:14.482 --> 0:10:17.842
<v S2>of things. Um, but they really struggle with certain things.

0:10:18.282 --> 0:10:20.881
<v S2>And particularly in a challenge like this where you have

0:10:20.881 --> 0:10:23.722
<v S2>to do multiple things right in sequence in order to

0:10:23.722 --> 0:10:27.082
<v S2>be successful, you have to worry about errors that start

0:10:27.122 --> 0:10:31.242
<v S2>off in early stages of an LLM heavy pipeline that

0:10:31.242 --> 0:10:33.562
<v S2>compound over time, until eventually you get to the point

0:10:33.562 --> 0:10:36.521
<v S2>where everything kind of collapses. Um, so our philosophy

0:10:36.522 --> 0:10:40.122
<v S2>on using AI, uh, specifically within the AI cyber challenge

0:10:40.122 --> 0:10:42.602
<v S2>and also kind of more broadly, um, is to use

0:10:42.602 --> 0:10:49.082
<v S2>it for, um, tightly constrained, highly contextualized problems where, um,

0:10:49.362 --> 0:10:51.842
<v S2>the models are set up for success. Um, so this

0:10:51.842 --> 0:10:54.162
<v S2>is actually kind of an interesting anecdote. Um, during the

0:10:54.162 --> 0:10:58.122
<v S2>first round of the

0:10:58.122 --> 0:11:03.202
<v S2>AI Cyber Challenge, um, the whole concept of like multi-agent systems,

0:11:03.442 --> 0:11:08.342
<v S2>systems that have, like, tools available to them, um, didn't

0:11:08.342 --> 0:11:10.622
<v S2>really exist. It was like in a couple of papers

0:11:10.622 --> 0:11:13.901
<v S2>on arXiv. And ultimately, um, the way we built our

0:11:13.902 --> 0:11:17.582
<v S2>patcher for the semi-finals and for the finals, um,

0:11:17.622 --> 0:11:20.862
<v S2>is now reflective of how LLM-driven systems are just

0:11:20.862 --> 0:11:23.742
<v S2>built today. So it's actually really vindicating. So like our

0:11:23.742 --> 0:11:28.021
<v S2>patcher is, like, a multi-agent system. It's got multiple

0:11:28.022 --> 0:11:30.662
<v S2>large language models, each with different roles to play within

0:11:30.662 --> 0:11:34.342
<v S2>this process that collaborate to generate a patch and then

0:11:34.342 --> 0:11:38.021
<v S2>validate it to make sure that it, one, will actually compile,

0:11:38.062 --> 0:11:41.582
<v S2>two, will actually fix the vulnerability that we've discovered, and

0:11:41.582 --> 0:11:44.462
<v S2>three, doesn't break other functionality within the program. So we

0:11:44.462 --> 0:11:46.342
<v S2>found that trying to ask one large language model to

0:11:46.342 --> 0:11:48.982
<v S2>do all of that didn't really work out. And also

0:11:48.982 --> 0:11:52.662
<v S2>in the semi-finals, the, the reasoning models, um, or the

0:11:52.662 --> 0:11:55.702
<v S2>thinking models, depending on, on the branding, they didn't exist,

0:11:55.702 --> 0:11:57.702
<v S2>they weren't available. They weren't even available to us to

0:11:57.742 --> 0:12:02.222
<v S2>use as, like, um, early adopter models in the AIxCC.

0:12:02.222 --> 0:12:04.262
<v S2>So we were dealing with, with simple, you know, back

0:12:04.261 --> 0:12:09.592
<v S2>and forth, um, style chat models. Um, so we actually

0:12:09.592 --> 0:12:12.391
<v S2>had to build in a lot of this reasoning as

0:12:12.392 --> 0:12:14.912
<v S2>part of this, like multi-agent architecture, we had to build

0:12:14.912 --> 0:12:18.512
<v S2>in a lot of like reliability and engineering code around

0:12:18.511 --> 0:12:23.872
<v S2>maintaining the pipeline. Um, fortunately, the process for um, discovering

0:12:23.872 --> 0:12:27.272
<v S2>artifacts and submitting them was pretty rigid. Um, so it

0:12:27.272 --> 0:12:29.912
<v S2>didn't really affect us that much in terms of or

0:12:29.912 --> 0:12:31.232
<v S2>we didn't have to, like, put a lot of really

0:12:31.232 --> 0:12:34.632
<v S2>complex reasoning in, um, but actually we ended up even

0:12:34.631 --> 0:12:36.552
<v S2>by the end of the finals, we didn't use a

0:12:36.552 --> 0:12:39.952
<v S2>reasoning or a thinking model, um, in Buttercup, because we'd

0:12:39.952 --> 0:12:42.552
<v S2>actually built it in, it was part of the circuitry

0:12:42.552 --> 0:12:45.832
<v S2>or part of like the, um, the Python code, part

0:12:45.832 --> 0:12:49.312
<v S2>of our orchestration code. Um, so we had the opportunity

0:12:49.312 --> 0:12:50.712
<v S2>in the finals to take that out and let the

0:12:50.712 --> 0:12:52.992
<v S2>model do the work. We kind of explored it a

0:12:52.992 --> 0:12:55.592
<v S2>little bit, but ultimately we decided against it because the

0:12:55.592 --> 0:12:58.672
<v S2>best case scenario was that the model would kind of

0:12:58.712 --> 0:13:01.792
<v S2>figure out on its own how to break the problem

0:13:01.792 --> 0:13:03.752
<v S2>down and how to do individual things, and what tools

0:13:03.752 --> 0:13:07.242
<v S2>to call in sequence. Uh, but we were already subject

0:13:07.242 --> 0:13:09.202
<v S2>matter experts who did it exactly the way it should

0:13:09.202 --> 0:13:12.242
<v S2>be done. So the the best case scenario is that

0:13:12.242 --> 0:13:14.762
<v S2>the model was able to replicate what we've done only

0:13:14.761 --> 0:13:17.842
<v S2>at a higher cost per call, um, or a more expensive,

0:13:17.881 --> 0:13:21.882
<v S2>like, volume of tokens. Um, so we actually kept, um, we,

0:13:21.881 --> 0:13:23.842
<v S2>we did upgrade our models. We went from the GPT

0:13:23.881 --> 0:13:28.161
<v S2>three series, um, and the Claude three, uh, series of

0:13:28.162 --> 0:13:33.122
<v S2>models and moved up to, um, the four and like

0:13:33.162 --> 0:13:36.362
<v S2>the basically the Gen four versions of models for the final.

0:13:36.362 --> 0:13:39.402
<v S2>So we, we upgraded the underlying models, but we very much, um,

0:13:39.442 --> 0:13:42.562
<v S2>kept the problems very small for the, for the AI's

0:13:42.682 --> 0:13:45.362
<v S2>or for the, um, for the AI models, so that

0:13:45.362 --> 0:13:48.122
<v S2>we would avoid this issue where you have compounding errors,

0:13:48.362 --> 0:13:51.642
<v S2>you have to worry about like these, these modulo errors of,

0:13:51.682 --> 0:13:54.082
<v S2>you know, deciding to do the wrong thing in sequence.

0:13:54.562 --> 0:13:56.682
<v S2>And that actually turns out to be really, uh, to

0:13:56.682 --> 0:14:00.242
<v S2>penalize you heavily in these long systems because, you know,

0:14:00.282 --> 0:14:03.202
<v S2>when a system decides, you know, hey, I've got to

0:14:03.202 --> 0:14:05.692
<v S2>do A, B, C, and D, and then does C before B,

0:14:06.052 --> 0:14:09.651
<v S2>all of that information involved with dealing with this, like,

0:14:09.852 --> 0:14:12.852
<v S2>out-of-sequence task, it stays in the context window.

0:14:12.852 --> 0:14:14.732
<v S2>And it kind of, for lack of a better term,

0:14:14.772 --> 0:14:17.532
<v S2>kind of pollutes the model's ability to kind of reorder

0:14:17.532 --> 0:14:19.132
<v S2>those tasks and do them correctly. It has a hard

0:14:19.132 --> 0:14:22.532
<v S2>time kind of forgetting information until it rolls out of

0:14:22.532 --> 0:14:24.692
<v S2>the context window. So it's a really long way to

0:14:24.692 --> 0:14:28.172
<v S2>say we probably did the latter version. But, um, one

0:14:28.172 --> 0:14:29.692
<v S2>thing I do want to say is like the actual

0:14:29.732 --> 0:14:33.052
<v S2>like processing of artifacts through the system, we didn't rely

0:14:33.052 --> 0:14:34.692
<v S2>on the AI to kind of figure out, okay, I've

0:14:34.692 --> 0:14:36.532
<v S2>got a vulnerability now I should patch it. That was

0:14:36.532 --> 0:14:40.772
<v S2>also all orchestrated, um, by

0:14:40.772 --> 0:14:42.172
<v S2>our larger pipeline.
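
NOTE
Editor's sketch of the three patch checks named earlier in this answer: the patch compiles, the known crashing input no longer crashes, and existing functionality still works. The ordering is fixed in plain code rather than left to a model. This is an illustration under stated assumptions, not the Buttercup patcher. The function name and the three callback parameters are hypothetical, and the toy lambdas stand in for a real build, a re-run of the crashing input, and a test suite.
```python
from typing import Callable, Iterable, Optional
def first_valid_patch(
    candidates: Iterable[str],
    compiles: Callable[[str], bool],
    still_crashes: Callable[[str], bool],
    tests_pass: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that passes all three checks, or None."""
    for patch in candidates:
        if not compiles(patch):
            continue  # check 1: the patched program must build
        if still_crashes(patch):
            continue  # check 2: the known crashing input must stop crashing
        if not tests_pass(patch):
            continue  # check 3: other functionality must keep working
        return patch
    return None
# Toy stand-ins: each candidate is just a label describing what it does.
candidates = ["syntax error", "no-op", "breaks api", "bounds check"]
chosen = first_valid_patch(
    candidates,
    compiles=lambda p: p != "syntax error",
    still_crashes=lambda p: p == "no-op",
    tests_pass=lambda p: p != "breaks api",
)
print(chosen)
```
In a setup like the one described, the candidates would come from model calls, while the checks stay deterministic so that a bad suggestion is rejected instead of compounding downstream.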

0:14:42.572 --> 0:14:46.132
<v S1>Okay. Okay. So yeah, I've seen this a lot as well.

0:14:46.172 --> 0:14:48.772
<v S1>I mean, I feel like this is a general concept

0:14:48.772 --> 0:14:54.252
<v S1>that people are coming to, which is, um, I don't

0:14:54.252 --> 0:14:59.372
<v S1>want to say legacy tech. Traditional tech is just like, deterministic. So, like,

0:14:59.372 --> 0:15:01.532
<v S1>that's the tech that you want to use to, like,

0:15:02.092 --> 0:15:05.342
<v S1>do things that matter, and then you kind of want

0:15:05.382 --> 0:15:09.702
<v S1>to use like AI for like a, um, I don't know,

0:15:09.742 --> 0:15:13.462
<v S1>like a router maybe, or like a, um, something intelligent

0:15:13.462 --> 0:15:19.222
<v S1>about choosing which standard tech to use, but not making like, choices.

0:15:19.222 --> 0:15:22.742
<v S1>Maybe necessarily. Um, I don't know. I'm trying to figure

0:15:22.742 --> 0:15:24.582
<v S1>out how to articulate that, but it's like.

0:15:24.782 --> 0:15:26.262
<v S2>Yeah, well, it's actually funny you bring this up. I've

0:15:26.262 --> 0:15:28.982
<v S2>had to kind of get good at articulating this, um,

0:15:28.982 --> 0:15:31.022
<v S2>over the last couple of years. So the way I've

0:15:31.022 --> 0:15:33.742
<v S2>explained this to people is that certain problems, particularly in

0:15:33.742 --> 0:15:37.462
<v S2>computer science, but this kind of generalizes everywhere. Certain problems

0:15:37.502 --> 0:15:43.142
<v S2>lend themselves to prescriptive solutions. So a prescriptive solution is something

0:15:43.142 --> 0:15:44.982
<v S2>that we do when we write an algorithm to solve

0:15:44.982 --> 0:15:47.502
<v S2>a problem. This could be like coming up with an

0:15:47.502 --> 0:15:50.302
<v S2>answer for the traveling salesman problem. You know, we know

0:15:50.342 --> 0:15:52.502
<v S2>it's a really difficult problem to solve, but there's greedy

0:15:52.502 --> 0:15:54.982
<v S2>algorithms that do a pretty good job and for the

0:15:54.982 --> 0:15:56.822
<v S2>most part, will get you a good answer. Maybe not

0:15:56.822 --> 0:15:58.542
<v S2>the best answer, but they'll get you a good one.
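
NOTE
Editor's sketch of a prescriptive solution of the kind just mentioned: a greedy nearest-neighbor heuristic for the traveling salesman problem. It is one well-known greedy approach, chosen here for illustration; the speaker does not name a specific algorithm. It returns a reasonable tour, not necessarily the shortest one.
```python
import math
def nearest_neighbor_tour(points: list[tuple[float, float]]) -> list[int]:
    """Greedy tour: start at point 0, always move to the closest unvisited point."""
    if not points:
        return []
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour
tour = nearest_neighbor_tour([(0, 0), (5, 0), (1, 0), (2, 0)])
print(tour)  # prints [0, 2, 3, 1]
```
Every step is spelled out in advance, which is what makes the approach prescriptive: the computer follows the recipe and never has to learn what a good tour looks like.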

0:15:59.102 --> 0:16:01.952
<v S2>So for these types of problems, you can prescribe a

0:16:01.952 --> 0:16:04.552
<v S2>set of steps to the computer and let it execute them.

0:16:04.952 --> 0:16:09.032
<v S2>Now other problems are really, really challenging to prescribe a

0:16:09.032 --> 0:16:12.952
<v S2>solution for. So these types of problems lend themselves to

0:16:12.992 --> 0:16:15.592
<v S2>AI or ML techniques because you can use a descriptive

0:16:15.712 --> 0:16:19.352
<v S2>instead of a prescriptive solution. So a good example of this

0:16:19.352 --> 0:16:22.432
<v S2>is like image recognition. So it's really really hard to

0:16:22.472 --> 0:16:25.112
<v S2>take a picture of a cat and write a computer

0:16:25.112 --> 0:16:28.832
<v S2>program that will say, okay, based on the pixel colors

0:16:28.832 --> 0:16:30.992
<v S2>of this pixel and this position, this is going to

0:16:30.992 --> 0:16:32.832
<v S2>be a cat, because a cat can be in a

0:16:32.832 --> 0:16:36.312
<v S2>million different contortions. It can have different hair, the face

0:16:36.312 --> 0:16:38.752
<v S2>can be half obscured. But what we can do is

0:16:38.792 --> 0:16:41.152
<v S2>we can describe to an AI/ML model what a

0:16:41.192 --> 0:16:43.712
<v S2>cat looks like with millions of pictures, because we have

0:16:43.712 --> 0:16:46.032
<v S2>millions of pictures of cats. And then it can do

0:16:46.032 --> 0:16:48.512
<v S2>a good job of solving that problem. Now it might

0:16:48.512 --> 0:16:51.192
<v S2>make mistakes, but this is better than the option that

0:16:51.192 --> 0:16:54.152
<v S2>you had with the traditional approach, because that approach was

0:16:54.152 --> 0:16:57.152
<v S2>awful to begin with. So a good example of a

0:16:57.152 --> 0:17:01.242
<v S2>corollary for this in Buttercup is patch generation. There's a

0:17:01.282 --> 0:17:03.282
<v S2>lot of synthetic code generation tools and a lot of

0:17:03.282 --> 0:17:06.202
<v S2>research in this area. But in terms of like automatically

0:17:06.202 --> 0:17:10.242
<v S2>generating patches to fix bugs, unless your bug is like

0:17:10.282 --> 0:17:13.321
<v S2>dead obvious, like it's missing a bounds check and it's

0:17:13.322 --> 0:17:15.402
<v S2>really easy to apply some sort of pattern matching to

0:17:15.442 --> 0:17:17.202
<v S2>figure out what the lower bound is, or the upper

0:17:17.202 --> 0:17:20.882
<v S2>bound is that needs to be checked. Um, tools to

0:17:20.922 --> 0:17:24.482
<v S2>generate patches for weird bugs. Like they just don't exist.

0:17:24.922 --> 0:17:27.402
<v S2>So this is a great place for AI/ML to help

0:17:27.402 --> 0:17:29.401
<v S2>us out. And it actually turns out, um, you know,

0:17:29.402 --> 0:17:31.922
<v S2>this is really proven true by the AI Cyber Challenge

0:17:31.922 --> 0:17:35.921
<v S2>and by Buttercup, more specifically, um, LLMs are great at

0:17:35.922 --> 0:17:38.402
<v S2>generating code, um, because it's one of the biggest value

0:17:38.402 --> 0:17:43.002
<v S2>propositions right now for the technology. So, um, generating patches

0:17:43.002 --> 0:17:45.602
<v S2>for bugs is tightly constrained. I'm not asking it

0:17:45.602 --> 0:17:48.561
<v S2>to generate all of the code that is necessary to

0:17:48.602 --> 0:17:51.042
<v S2>build this entire system that I've got a spec sheet for.

0:17:51.482 --> 0:17:54.121
<v S2>I'm only asking it given this code, and given what

0:17:54.122 --> 0:17:56.002
<v S2>we know about this vulnerability, how would you change it

0:17:56.002 --> 0:17:59.122
<v S2>to fix it? The large language models have already internalized

0:17:59.122 --> 0:18:03.382
<v S2>large numbers of incremental commits to open source code

0:18:03.382 --> 0:18:06.262
<v S2>repositories that fix bugs, so they actually have a really

0:18:06.262 --> 0:18:09.742
<v S2>good track record with, um, more than I expected, even

0:18:09.742 --> 0:18:13.022
<v S2>when we started this, uh, with generating patches. So this

0:18:13.022 --> 0:18:15.382
<v S2>is a great example of where generating a patch is

0:18:15.382 --> 0:18:18.462
<v S2>something that lends itself towards a descriptive solution and a

0:18:18.462 --> 0:18:23.621
<v S2>descriptive algorithm, uh, or an AI/ML algorithm versus something that's prescriptive, um,

0:18:23.902 --> 0:18:25.941
<v S2>which is fuzzing. Fuzzing is a good example of a

0:18:25.942 --> 0:18:28.341
<v S2>prescriptive solution. If you need to find a

0:18:28.342 --> 0:18:32.822
<v S2>vulnerability and you need a crashing input, um, you have

0:18:32.821 --> 0:18:35.061
<v S2>to be able to prove that it exists. It's really,

0:18:35.061 --> 0:18:37.262
<v S2>really hard to get an LLM to do that because

0:18:37.302 --> 0:18:42.262
<v S2>LLMs, the underlying reasoning, they don't have, like, data feedforward. Um,

0:18:42.302 --> 0:18:45.542
<v S2>they basically they look at source code like they look

0:18:45.542 --> 0:18:49.621
<v S2>at natural language. Natural language doesn't describe the activities of

0:18:49.622 --> 0:18:53.022
<v S2>an underlying state machine that runs on hardware after it

0:18:53.022 --> 0:18:55.622
<v S2>passes through a compiler. So like, you know, the source

0:18:55.622 --> 0:18:57.982
<v S2>code when looked at by a model. Models look at

0:18:57.982 --> 0:19:01.112
<v S2>source code in a really shallow way. Um, so when

0:19:01.112 --> 0:19:04.191
<v S2>we want to find, you know, a crashing input, a

0:19:04.192 --> 0:19:06.192
<v S2>fuzzer is a great way because we can prescribe a solution,

0:19:06.192 --> 0:19:10.072
<v S2>which is try everything, brute force it. Um, just come

0:19:10.071 --> 0:19:11.671
<v S2>up with different inputs, throw it in there, and then

0:19:11.672 --> 0:19:13.752
<v S2>if it crashes, well, there you go. You've proven it.

0:19:13.912 --> 0:19:16.952
<v S2>So that's why we use fuzzing heavily early on, you know, for

0:19:16.952 --> 0:19:19.192
<v S2>one type of problem, and we use patching heavily for another.
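
NOTE
Editor's sketch, not from the conversation: a minimal illustration of the
"try everything, brute force it" idea described above. The target function
parse_record and its planted bug are hypothetical, and real fuzzers (coverage
guided, corpus driven) are far more capable than this random loop. The point
is only that a crashing input, once found, is its own proof.
```python
import random
def parse_record(data: bytes) -> int:
    """Hypothetical target: trusts a length byte it should not trust."""
    if len(data) < 2 or data[0] != 0x7F:
        return 0
    length = data[1]
    payload = data[2:]
    # Bug: raises IndexError whenever length exceeds len(payload).
    return payload[length - 1] if length else 0
def fuzz(target, trials: int = 20000, seed: int = 0):
    """Feed random byte strings to target; return the first crashing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(0, 8)))
        try:
            target(data)
        except Exception:
            return data  # a concrete crashing input is the proof
    return None
if __name__ == "__main__":
    print(fuzz(parse_record))
```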

0:19:20.071 --> 0:19:24.552
<v S1>Yeah, that makes sense. And the other problem with, um,

0:19:25.632 --> 0:19:33.152
<v S1>finding vulns with, um, AI, it also seems to me that, um, they,

0:19:33.152 --> 0:19:35.992
<v S1>they want to please. They're heavily biased to be like,

0:19:35.992 --> 0:19:38.391
<v S1>this is it. This is one. Yeah. Well, this is

0:19:38.392 --> 0:19:40.831
<v S1>definitely a hit or whatever. And you look at it

0:19:40.872 --> 0:19:44.952
<v S1>and it's actually not. So I guess the intelligence is

0:19:44.952 --> 0:19:48.632
<v S1>deciding to use the fuzzer, which it could help make

0:19:48.632 --> 0:19:51.552
<v S1>that decision that a fuzzer should be used. Right.

0:19:52.512 --> 0:19:55.192
<v S2>Yeah. Yeah. So it's it's funny you bring that up.

0:19:55.192 --> 0:19:59.402
<v S2>Large language models really struggle to solve problems that aren't

0:19:59.442 --> 0:20:02.162
<v S2>rooted in some kind of ground truth. Um, it turns

0:20:02.162 --> 0:20:04.242
<v S2>out there's a huge difference there. We have some internal

0:20:04.242 --> 0:20:08.242
<v S2>research that we haven't published. Anybody could reproduce it. But, um,

0:20:08.282 --> 0:20:09.482
<v S2>so it turns out if you if you have a

0:20:09.482 --> 0:20:11.242
<v S2>bit of source code and you ask the model to

0:20:11.282 --> 0:20:15.522
<v S2>tell you where the vulnerability is, um, it will absolutely

0:20:15.561 --> 0:20:18.202
<v S2>hallucinate a vulnerability because it wants to please you. Uh,

0:20:18.202 --> 0:20:20.841
<v S2>we have one of our researchers, um, one of our

0:20:20.842 --> 0:20:23.522
<v S2>principal researchers, Artem. He's a great guy. He, um, he

0:20:23.522 --> 0:20:28.522
<v S2>downloaded the, um, formally correct, uh, the formally proven correct

0:20:28.522 --> 0:20:32.042
<v S2>portions of, uh, of Linux and asked a large language

0:20:32.042 --> 0:20:35.882
<v S2>model several hundred times. Um, here's a snippet of code.

0:20:35.882 --> 0:20:37.722
<v S2>It has a vulnerability. Where is it? And every single

0:20:37.722 --> 0:20:40.802
<v S2>time it would find, it would manufacture a vulnerability because it

0:20:40.802 --> 0:20:43.722
<v S2>wants to find the answer. So it turns out when

0:20:43.722 --> 0:20:46.601
<v S2>we started asking it, is there a vulnerability? Um, it

0:20:46.602 --> 0:20:49.162
<v S2>messed up a little less, but it would still assume

0:20:49.162 --> 0:20:51.921
<v S2>that because you're asking that there's something to find and

0:20:51.922 --> 0:20:54.322
<v S2>it would still mess up quite a bit. So that's

0:20:54.321 --> 0:20:57.012
<v S2>why, when we're in the context where we're, when we're using, um,

0:20:57.052 --> 0:21:00.252
<v S2>large language models for generating patches. It's great because we

0:21:00.252 --> 0:21:02.411
<v S2>know there's a vulnerability because we found it and we

0:21:02.412 --> 0:21:04.571
<v S2>proved it, and we can collect additional information.

0:21:04.612 --> 0:21:05.172
<v S1>Yeah.

0:21:05.412 --> 0:21:08.532
<v S2>So now I don't have to worry about asking the model. Hey,

0:21:08.532 --> 0:21:10.611
<v S2>do you think there's a vulnerability? And if so, patch it.

0:21:10.612 --> 0:21:12.972
<v S2>I say no, there is a vulnerability. It's here. This

0:21:12.972 --> 0:21:15.772
<v S2>is extra information about the code that touches it. Now

0:21:15.772 --> 0:21:18.571
<v S2>generate a patch. And the model is very good at

0:21:18.571 --> 0:21:21.012
<v S2>doing that because it takes away the decision making or,

0:21:21.332 --> 0:21:24.092
<v S2>or the judgment call that large language models are really,

0:21:24.092 --> 0:21:27.252
<v S2>really bad at because they don't actually model judgment calls underneath,

0:21:27.612 --> 0:21:31.332
<v S2>in their architecture. They, they model, you know, sequencing information,

0:21:31.571 --> 0:21:34.052
<v S2>sequencing tokens. And when you write code, you're writing a

0:21:34.052 --> 0:21:36.851
<v S2>sequence of tokens. So these problems tend to be, um,

0:21:36.892 --> 0:21:39.972
<v S2>a lot more suitable than other problems where you're asking

0:21:39.972 --> 0:21:42.292
<v S2>it to find the ground truth for you. Bad problems

0:21:42.292 --> 0:21:45.332
<v S2>for LLMs. Asking it to take ground truth and expand

0:21:45.332 --> 0:21:47.611
<v S2>upon it: great applications for LLMs.
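
NOTE
Editor's sketch, not from the conversation and not Buttercup's actual prompt:
one way to phrase a patch request so that it states the proven ground truth
instead of asking the model to judge whether a bug exists. The function name,
parameters, and wording are all hypothetical.
```python
def build_patch_prompt(vulnerable_code: str, crash_report: str, related_code: str) -> str:
    """Assemble a patch request that asserts the proven bug rather than asking if one exists."""
    return "\n".join([
        "There is a confirmed vulnerability in the code below; a crashing input proved it.",
        "Crash report:",
        crash_report,
        "Vulnerable code:",
        vulnerable_code,
        "Code that touches it:",
        related_code,
        "Generate a patch that fixes this vulnerability.",
    ])
if __name__ == "__main__":
    print(build_patch_prompt("char buf[8]; strcpy(buf, src);", "stack-buffer-overflow in copy()", "copy(argv[1]);"))
```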

0:21:48.012 --> 0:21:49.772
<v S1>Oh man, I love that. And this also goes to

0:21:49.772 --> 0:21:52.652
<v S1>your previous point of not wanting to pollute the context

0:21:52.652 --> 0:21:56.622
<v S1>for the current task at hand, which is building that patch,

0:21:57.222 --> 0:22:00.582
<v S1>because if you have like some history of like there

0:22:00.582 --> 0:22:04.182
<v S1>were previous decisions made or previous questions asked or whatever

0:22:04.222 --> 0:22:06.061
<v S1>it might get like diverted, you know?

0:22:06.942 --> 0:22:11.102
<v S2>Yeah, absolutely. It's um, it's a, it's a big challenge particularly, um,

0:22:11.582 --> 0:22:13.222
<v S2>I don't know, it's funny. I've, I've been kind of

0:22:13.262 --> 0:22:15.742
<v S2>trying to sing this gospel internally, uh, at Trail of

0:22:15.742 --> 0:22:18.222
<v S2>Bits and to other people who will listen that, um,

0:22:18.622 --> 0:22:22.502
<v S2>the increasing size of context window is not always your friend. Um,

0:22:23.061 --> 0:22:25.502
<v S2>by increasing the size of the context window. I mean,

0:22:25.502 --> 0:22:27.102
<v S2>if you think about how the large language model works

0:22:27.102 --> 0:22:29.702
<v S2>under the hood, it's using these contexts to attune the

0:22:29.702 --> 0:22:32.302
<v S2>model to certain parts of its training data that are

0:22:32.302 --> 0:22:35.262
<v S2>going to be highly relevant to solving your particular problem.

0:22:35.622 --> 0:22:37.862
<v S2>And the more words and the more tokens you put

0:22:37.862 --> 0:22:41.262
<v S2>into the context window, the more you are kind of

0:22:41.302 --> 0:22:46.821
<v S2>nulling out or, um, numbing the attention mechanism. You're forcing

0:22:46.821 --> 0:22:48.742
<v S2>it to become more and more general, because now there

0:22:48.782 --> 0:22:52.822
<v S2>are more tokens that are affecting these attuned probabilities. So

0:22:52.821 --> 0:22:56.192
<v S2>you actually are better off with using... Now, a big context window

0:22:56.232 --> 0:22:58.911
<v S2>is great because if you need, let's say a million,

0:22:59.152 --> 0:23:01.671
<v S2>you know, a million tokens in your context window to

0:23:01.712 --> 0:23:04.472
<v S2>constrain the problem, then use a million tokens. But if

0:23:04.472 --> 0:23:07.312
<v S2>you can do it with 1,000 or 10,000, you're going

0:23:07.311 --> 0:23:09.831
<v S2>to get better results because you're more likely to focus

0:23:09.832 --> 0:23:11.311
<v S2>that model where it needs to be.

0:23:12.512 --> 0:23:16.311
<v S1>Yeah, I love this. Like, by the way, this this

0:23:16.352 --> 0:23:20.392
<v S1>this is great. This is great. Um, I'm going to

0:23:20.792 --> 0:23:24.912
<v S1>create a lot of content out of this, um, because it's,

0:23:24.912 --> 0:23:30.631
<v S1>it's really crystallizing, and, like, it's starting to form something

0:23:30.632 --> 0:23:34.232
<v S1>in my mind. I'd love to work with you on it. Um, essentially,

0:23:34.232 --> 0:23:37.391
<v S1>what I'm trying to think of is, um, what are

0:23:37.392 --> 0:23:40.592
<v S1>some general statements that we could make? Um, one that

0:23:40.592 --> 0:23:42.512
<v S1>I'm sort of heading in the direction of, you tell

0:23:42.512 --> 0:23:46.872
<v S1>me if I'm wrong is like. And this might be

0:23:46.872 --> 0:23:51.032
<v S1>overstating it, but like, the system itself should be highly

0:23:51.032 --> 0:23:56.522
<v S1>modular and, as much as possible, made up

0:23:56.522 --> 0:24:01.602
<v S1>of traditional and deterministic tech. And then the way that

0:24:01.602 --> 0:24:05.082
<v S1>you use the AI is for the specific type of problem,

0:24:05.282 --> 0:24:08.121
<v S1>which we're going to articulate the way you articulated it

0:24:09.482 --> 0:24:13.762
<v S1>for those types of problems where routing is needed to

0:24:13.762 --> 0:24:18.841
<v S1>the traditional tech. Um, and it's like, don't just go

0:24:18.882 --> 0:24:22.682
<v S1>crazy with AI. Don't ask it questions that the traditional

0:24:22.682 --> 0:24:27.122
<v S1>tech should be answering. Um, it's something like that. And

0:24:27.122 --> 0:24:33.362
<v S1>then ultimately you have like this dependable deterministic system with

0:24:33.762 --> 0:24:37.081
<v S1>the minimum amount of AI that is required to move

0:24:37.402 --> 0:24:39.162
<v S1>appropriately through that system.

0:24:40.522 --> 0:24:43.561
<v S2>Yeah. So yeah, really it comes down to problem formulation.

0:24:43.561 --> 0:24:46.722
<v S2>And this is like the the great part about and

0:24:46.722 --> 0:24:48.002
<v S2>this is part of the reason why you see such

0:24:48.002 --> 0:24:50.841
<v S2>a huge overlap in interest between people from the computer

0:24:50.842 --> 0:24:53.622
<v S2>science background and people from like data science backgrounds on

0:24:53.622 --> 0:24:55.782
<v S2>here because, you know, one of the basic things you

0:24:55.782 --> 0:24:57.582
<v S2>learn in computer science, like when you get to like

0:24:57.582 --> 0:25:01.821
<v S2>the graduate level is problem formulation. It's how to recognize

0:25:02.022 --> 0:25:07.302
<v S2>your problem as a derivative, or maybe a like dressed

0:25:07.302 --> 0:25:12.102
<v S2>up version of some other problem. So, you know, right away, um, okay,

0:25:12.102 --> 0:25:13.742
<v S2>I have this problem of, okay, I've got to manage

0:25:13.742 --> 0:25:16.742
<v S2>this delivery system. How do I make this delivery system, um,

0:25:16.742 --> 0:25:20.022
<v S2>for Amazon efficient? You can recognize this right away as, oh,

0:25:20.022 --> 0:25:22.782
<v S2>this is traveling salesman. There's no good way to do this.

0:25:22.821 --> 0:25:24.222
<v S2>But what I can do is I can. I'm going

0:25:24.262 --> 0:25:26.582
<v S2>to get a good answer. I just have to accept

0:25:26.942 --> 0:25:29.142
<v S2>that my answer is going to be imprecise or not

0:25:29.142 --> 0:25:33.742
<v S2>necessarily optimal. Um, and in applying AI and ML to

0:25:33.742 --> 0:25:37.342
<v S2>security problems or any problem in general, the first step

0:25:37.342 --> 0:25:40.782
<v S2>is very much like problem formulation. It's understanding what kind

0:25:40.782 --> 0:25:42.621
<v S2>of model is going to work best for this problem,

0:25:42.662 --> 0:25:45.742
<v S2>because is this a problem that will work well with

0:25:45.742 --> 0:25:47.661
<v S2>a time series model, because my data is coming in

0:25:47.662 --> 0:25:49.861
<v S2>over time, or is this a model that's going to

0:25:49.862 --> 0:25:54.992
<v S2>work well with, um, let's say like a, like linear regression,

0:25:54.992 --> 0:25:59.472
<v S2>because there is some true underlying probability for how the

0:25:59.472 --> 0:26:02.152
<v S2>data is distributed that I'm trying to learn from? One

0:26:02.152 --> 0:26:05.432
<v S2>of like the kind of curses of large language models

0:26:05.752 --> 0:26:09.032
<v S2>is that they have abstracted all of this good data

0:26:09.032 --> 0:26:12.592
<v S2>science practice, all these good data science practices away. And

0:26:12.592 --> 0:26:15.992
<v S2>now it's great because it democratizes it. Anybody can use AI,

0:26:16.032 --> 0:26:18.071
<v S2>anybody can use an LLM. And all you have to

0:26:18.071 --> 0:26:20.272
<v S2>do is be able to articulate your problem. The problem is,

0:26:20.272 --> 0:26:23.232
<v S2>is that it also abstracts away problem formulation. And now

0:26:23.232 --> 0:26:26.311
<v S2>we're starting to use Llms because they're accessible for certain

0:26:26.311 --> 0:26:29.831
<v S2>types of problems that they're really not well formulated for. Um.

0:26:30.672 --> 0:26:31.272
<v S1>Yeah.

0:26:31.432 --> 0:26:33.391
<v S2>So this is this is kind of where we get

0:26:33.392 --> 0:26:35.712
<v S2>to the issue. So the good news is we don't

0:26:35.712 --> 0:26:38.152
<v S2>have to just like say, okay, well, I can't do

0:26:38.192 --> 0:26:40.232
<v S2>problem formulation with an LLM, so I just throw it away.

0:26:40.232 --> 0:26:42.111
<v S2>Don't use it. I have to go back to, you know,

0:26:42.152 --> 0:26:44.431
<v S2>TensorFlow and writing my own models and stuff. What we

0:26:44.432 --> 0:26:46.592
<v S2>really have to do is get to what you were describing,

0:26:46.912 --> 0:26:50.282
<v S2>which is rather than throw the LLM at a large problem.

0:26:50.282 --> 0:26:52.722
<v S2>We take it a step further. We break the problem down.

0:26:52.722 --> 0:26:56.002
<v S2>Are there subproblems that are highly amenable to AI solutions?

0:26:56.242 --> 0:26:58.762
<v S2>I have a litmus test that I, that I pass, um,

0:26:58.802 --> 0:27:01.042
<v S2>you know, problems through. And I try to encourage my

0:27:01.042 --> 0:27:04.802
<v S2>team members to use, um, which is, you know, basically

0:27:04.802 --> 0:27:06.482
<v S2>like a check to see whether a problem is good

0:27:06.482 --> 0:27:09.042
<v S2>for AI/ML. And it's usually, you know, do you have

0:27:09.042 --> 0:27:11.802
<v S2>enough data in the model that you can train? In

0:27:11.802 --> 0:27:13.722
<v S2>this case, it now becomes: is the LLM... does the

0:27:13.722 --> 0:27:15.722
<v S2>LLM have examples of this on the internet that it

0:27:15.722 --> 0:27:17.841
<v S2>can draw from, or are you asking it to do

0:27:17.842 --> 0:27:23.282
<v S2>something like reverse engineering, you know, firmware code on this

0:27:23.282 --> 0:27:26.162
<v S2>obscure chipset that like there's no examples on the internet,

0:27:26.162 --> 0:27:29.282
<v S2>bad example. Uh, it won't have, it won't have

0:27:29.282 --> 0:27:32.562
<v S2>anything to draw from. Number two, um, is there some

0:27:32.561 --> 0:27:36.361
<v S2>probabilistic nature to the data that's underlying? This actually

0:27:36.362 --> 0:27:38.401
<v S2>makes large language models really bad for a lot of

0:27:38.402 --> 0:27:42.722
<v S2>security problems, because they're what we call non-differentiable, meaning that

0:27:42.722 --> 0:27:45.082
<v S2>they don't have like this nice curved space that you

0:27:45.082 --> 0:27:49.852
<v S2>can use stochastic gradient descent or virtually any optimization function

0:27:49.852 --> 0:27:52.052
<v S2>to try and climb and find a good answer for.

0:27:52.172 --> 0:27:54.052
<v S2>It actually exists more as, like, this kind of cloud

0:27:54.052 --> 0:27:56.012
<v S2>with dots of answers all over the place. If you

0:27:56.012 --> 0:27:58.811
<v S2>were to try and imagine the answers to security questions

0:27:59.132 --> 0:28:01.252
<v S2>in like a mathematical graph.

0:28:01.732 --> 0:28:04.772
<v S1>Okay, what's an example of what's an example of one

0:28:04.772 --> 0:28:06.612
<v S1>of those? I'm, I'm trying to think of what that

0:28:06.612 --> 0:28:07.692
<v S1>space might look like.

0:28:08.172 --> 0:28:10.212
<v S2>Yeah. So a good example of like a problem that

0:28:10.212 --> 0:28:14.532
<v S2>is differentiable is like housing prices. So housing prices vary by,

0:28:14.571 --> 0:28:17.851
<v S2>you know, like the size by square footage. Yeah. Square footage,

0:28:17.852 --> 0:28:20.931
<v S2>number of rooms, zip code, quality of the schools. So

0:28:20.932 --> 0:28:22.691
<v S2>when you plot these all out you get something that

0:28:22.692 --> 0:28:24.732
<v S2>you can do linear regression on. You can see like.

0:28:24.732 --> 0:28:24.932
<v S1>A.

0:28:25.132 --> 0:28:28.052
<v S2>Little loop. And that's called a differentiable function because it's

0:28:28.052 --> 0:28:31.052
<v S2>a continuous line that you can draw through the data

0:28:31.052 --> 0:28:33.212
<v S2>that more or less minimizes the error of those points

0:28:33.212 --> 0:28:34.012
<v S2>along the line.
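
NOTE
Editor's sketch, not from the conversation: the housing-price example above,
reduced to one feature. It fits the line that minimizes squared error between
price and square footage using ordinary least squares. The listings and prices
are made up for illustration.
```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
if __name__ == "__main__":
    square_feet = [1000, 1500, 2000, 2500]       # made-up listings
    prices = [155000, 195000, 255000, 295000]    # made-up prices
    slope, intercept = fit_line(square_feet, prices)
    print(f"price is roughly {intercept:.0f} + {slope:.1f} * square_feet")
```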

0:28:34.252 --> 0:28:34.611
<v S1>Yep.

0:28:35.132 --> 0:28:37.732
<v S2>But if we want to think about, um, let's say

0:28:37.772 --> 0:28:40.332
<v S2>now optimizing a program, we can take a look at

0:28:40.332 --> 0:28:45.532
<v S2>how ordering certain steps or changing the way we implement

0:28:45.532 --> 0:28:48.342
<v S2>certain functions as changing the speed of a program up

0:28:48.342 --> 0:28:52.782
<v S2>and down, and that becomes kind of pseudo differentiable. It's

0:28:52.782 --> 0:28:54.382
<v S2>it's more like a step function where you have kind

0:28:54.382 --> 0:28:56.502
<v S2>of like little lines where if I change this one thing,

0:28:56.502 --> 0:28:59.262
<v S2>it jumps up a little bit, it's more jagged, but

0:28:59.302 --> 0:29:03.022
<v S2>there's still, um, it's close to differentiable because I can

0:29:03.062 --> 0:29:06.662
<v S2>kind of map deterministically how if I run it on,

0:29:06.982 --> 0:29:09.622
<v S2>you know, with this set of compiler optimizations or that.

0:29:09.662 --> 0:29:13.102
<v S2>It's definitely not differentiable, but it's closer. Security is just

0:29:13.102 --> 0:29:17.622
<v S2>wild because the flaws in computer programs can come from

0:29:17.622 --> 0:29:19.382
<v S2>one of a million different sources. It can be a

0:29:19.382 --> 0:29:22.142
<v S2>logic bug, it can be a misimplemented function. It

0:29:22.142 --> 0:29:23.982
<v S2>can be the use of an unsafe function, which is

0:29:23.982 --> 0:29:27.702
<v S2>easy to find. There's no way for us to take, um,

0:29:28.502 --> 0:29:32.262
<v S2>root causes for vulnerabilities in software and solutions to them

0:29:32.422 --> 0:29:35.062
<v S2>and plot them on a graph. Because they come from

0:29:35.702 --> 0:29:39.502
<v S2>they come from unquantifiable sources. Some of them like, you know,

0:29:39.542 --> 0:29:42.982
<v S2>Spectre and Meltdown and stuff. They they're resident in hardware

0:29:43.222 --> 0:29:45.942
<v S2>and the implementation there. Some are purely in software like

0:29:45.992 --> 0:29:50.312
<v S2>XSS-type vulnerabilities. We can't, they don't, they're, it's, um,

0:29:50.352 --> 0:29:52.272
<v S2>it's not even apples and oranges. It's like trying to

0:29:52.272 --> 0:29:55.512
<v S2>compare apples and fighter jets. Um.

0:29:56.752 --> 0:29:59.112
<v S1>Is it, is it a matter of, like the, the

0:29:59.272 --> 0:30:03.792
<v S1>tensor size or the, um, I think that's called tensor size.

0:30:03.792 --> 0:30:07.752
<v S1>I can't remember the, the, um, the number of dimensions

0:30:07.752 --> 0:30:10.112
<v S1>in the space, because when you're looking at square footage

0:30:10.112 --> 0:30:13.592
<v S1>and price, what, you have two, right? Is it the

0:30:13.592 --> 0:30:18.472
<v S1>problem in security that there's just so many dimensions that, um,

0:30:18.472 --> 0:30:20.912
<v S1>when you try to plot it, you try to simplify it,

0:30:20.952 --> 0:30:22.192
<v S1>it just becomes garbage.

0:30:22.912 --> 0:30:25.072
<v S2>Well, it's a matter of common dimensions. So if you

0:30:25.072 --> 0:30:28.112
<v S2>build a house, every house has square footage.

0:30:28.552 --> 0:30:29.192
<v S1>There you go.

0:30:29.712 --> 0:30:32.272
<v S2>And you can calculate the space underneath. But a cross

0:30:32.312 --> 0:30:36.792
<v S2>site request forgery vulnerability in a, um, you know, piece

0:30:36.792 --> 0:30:39.552
<v S2>of JavaScript code that exists on the web has almost

0:30:39.552 --> 0:30:42.952
<v S2>nothing in common with a memory corruption vulnerability in a

0:30:42.992 --> 0:30:47.762
<v S2>C program running on a router in your home device.

0:30:48.082 --> 0:30:51.802
<v S2>They are implemented at different levels of abstraction. You know,

0:30:51.842 --> 0:30:54.482
<v S2>like even the program representations are different because some of

0:30:54.482 --> 0:30:57.122
<v S2>the vulnerabilities might exist only in binary code after it's

0:30:57.122 --> 0:31:02.362
<v S2>been compiled versus other vulnerabilities that are resident in source

0:31:02.362 --> 0:31:06.162
<v S2>code that's interpreted via web browser. Um, so really what

0:31:06.162 --> 0:31:07.962
<v S2>it is, is it's like trying to it's like trying

0:31:07.962 --> 0:31:10.682
<v S2>to plot, you know, the prices of homes, along with

0:31:11.002 --> 0:31:14.962
<v S2>the prices of, um, I don't know, oranges in a

0:31:14.962 --> 0:31:18.722
<v S2>particular year. You know, there's very little in common between

0:31:18.762 --> 0:31:21.802
<v S2>a house and an orange other than maybe some, like,

0:31:21.842 --> 0:31:25.402
<v S2>you know, global macro effects that might show some correlation,

0:31:25.802 --> 0:31:28.122
<v S2>you know. You know, economic factors like inflation.

0:31:28.522 --> 0:31:31.202
<v S1>Or like the beating of a whale's heart to determine

0:31:31.202 --> 0:31:35.962
<v S1>whether or not it's healthy. It's it's like completely different. Uh, yeah.

0:31:36.002 --> 0:31:39.161
<v S1>Completely different sports. Yeah. Yeah, yeah.

0:31:39.522 --> 0:31:40.962
<v S2>Yeah. So, so really, it's a it's a lack of

0:31:40.962 --> 0:31:43.682
<v S2>common dimensions in cybersecurity, which is why, you know, if

0:31:43.682 --> 0:31:45.732
<v S2>we think about like if we were trying to model,

0:31:45.772 --> 0:31:47.532
<v S2>like what the data would look like, if we could

0:31:47.532 --> 0:31:50.012
<v S2>visualize it, it would just be a bunch of points

0:31:50.012 --> 0:31:54.332
<v S2>of presence out there. Um, uh, within this, like, kind

0:31:54.332 --> 0:31:57.332
<v S2>of large cloud. Um, and even then, that's another problem

0:31:57.332 --> 0:31:59.692
<v S2>that kind of makes cybersecurity really hard to model with

0:31:59.692 --> 0:32:05.092
<v S2>AI/ML is that there is really comparatively little data, um,

0:32:05.692 --> 0:32:07.412
<v S2>in terms of like the volume of data, there's tons

0:32:07.412 --> 0:32:09.452
<v S2>of vulnerabilities out there. But if you're trying to make

0:32:09.452 --> 0:32:13.732
<v S2>a model that's really, really good at, let's say, detecting, um,

0:32:14.052 --> 0:32:17.692
<v S2>buffer overflows in embedded device code, um, you're going to

0:32:17.692 --> 0:32:19.452
<v S2>find some data for that, but there's not that much.

0:32:19.452 --> 0:32:21.492
<v S2>You have to rely on, like, PoC write-ups on,

0:32:21.492 --> 0:32:23.572
<v S2>on the internet from practitioners who put it out there

0:32:23.572 --> 0:32:27.412
<v S2>for fun. Um, but there's not a million examples

0:32:27.412 --> 0:32:28.772
<v S2>of that like there is if you want to say,

0:32:28.772 --> 0:32:30.732
<v S2>I want to train a model to write the Great

0:32:30.732 --> 0:32:33.972
<v S2>American novel. There you can take, you can take every

0:32:33.972 --> 0:32:36.132
<v S2>novel ever written, throw it in there and then see

0:32:36.132 --> 0:32:38.292
<v S2>what the model comes up with. If you prompt it

0:32:38.292 --> 0:32:40.052
<v S2>with like a general plot line, it's going to do

0:32:40.052 --> 0:32:42.412
<v S2>a lot better at that because, you know, that data

0:32:42.412 --> 0:32:48.312
<v S2>fills in that space a lot more. Um, so so, yeah, it's, um. Yeah.

0:32:48.352 --> 0:32:50.992
<v S2>Like the, the, the challenges in problem formulation are, are

0:32:50.992 --> 0:32:53.112
<v S2>really big and, um, yeah, that's why I kind of

0:32:53.152 --> 0:32:55.752
<v S2>encourage people when they look at these like, okay, I

0:32:55.752 --> 0:32:58.552
<v S2>want to build an AI/ML-driven system. Um, take

0:32:58.552 --> 0:33:01.312
<v S2>a look at what subproblems are actually suitable for AI/ML.

0:33:01.592 --> 0:33:03.352
<v S2>Use them there. And I think you'll also find that

0:33:03.352 --> 0:33:05.472
<v S2>a lot of the times we have a tendency to

0:33:05.512 --> 0:33:08.152
<v S2>like say, okay, let's just kind of throw large language

0:33:08.152 --> 0:33:09.592
<v S2>models at some of these problems that we know we

0:33:09.592 --> 0:33:13.312
<v S2>could really solve with regular code. Um, and that's really

0:33:13.312 --> 0:33:16.191
<v S2>bad because of this compounding error problem. So, you know,

0:33:16.232 --> 0:33:18.432
<v S2>if I have, you know, five steps in sequence that I've

0:33:18.432 --> 0:33:20.232
<v S2>got to do, and step three is good for AI/ML

0:33:20.232 --> 0:33:23.272
<v S2>and step four is good for AI/ML. You know, like

0:33:23.272 --> 0:33:25.352
<v S2>it's like, okay, well, look, almost half of this problem is,

0:33:25.512 --> 0:33:26.792
<v S2>you know, is something I'm going to ask the model

0:33:26.792 --> 0:33:28.352
<v S2>to do anyway. I'll just ask it to do one,

0:33:28.352 --> 0:33:30.712
<v S2>two and five, too. Well, the problem is it can

0:33:30.712 --> 0:33:32.352
<v S2>make a mistake in one. It can make a mistake

0:33:32.352 --> 0:33:34.632
<v S2>in two. Those compound before you get to three and four.

0:33:34.832 --> 0:33:37.912
<v S2>So you're better off, you know, implementing one and two in code.

0:33:37.952 --> 0:33:39.912
<v S2>And then maybe you ask the model just to finish

0:33:39.912 --> 0:33:43.082
<v S2>it off and do step five because it's the final step.

0:33:43.082 --> 0:33:46.442
<v S2>It's had ground truth rooted in steps one and two. Steps

0:33:46.442 --> 0:33:49.482
<v S2>three and four, if they're well-contextualized problems, maybe the

0:33:49.482 --> 0:33:52.082
<v S2>false positive rate is low enough that you can afford

0:33:52.082 --> 0:33:53.642
<v S2>to just let the model kind of finish it up

0:33:53.642 --> 0:33:56.442
<v S2>for you. But that's the biggest that's the biggest jump

0:33:56.442 --> 0:33:59.722
<v S2>I would take. Usually that step five is, like, validation

0:33:59.722 --> 0:34:03.802
<v S2>or correctness, um, checking. And that's not something you want

0:34:03.802 --> 0:34:06.522
<v S2>to ask the model to do because it's, it's it's

0:34:07.242 --> 0:34:11.082
<v S2>it has the tendency to, um, one, be wanting to

0:34:11.122 --> 0:34:13.162
<v S2>kind of like please itself and say, oh yeah, it

0:34:13.162 --> 0:34:17.282
<v S2>looks great to me. Um, or two, um, depending on

0:34:17.282 --> 0:34:20.242
<v S2>how you phrase it, find something that doesn't exist. And

0:34:20.362 --> 0:34:23.482
<v S2>validation is a problem that typically is, uh, is pretty

0:34:23.522 --> 0:34:25.442
<v S2>amenable to like deterministic code.
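
NOTE
Editor's sketch, not from the conversation: the compounding error problem
described above, as arithmetic. It assumes each step fails independently and
uses made-up accuracies (0.9 for a model step, 1.0 for a deterministic step).
Under those assumptions, five model steps succeed end to end about 59% of the
time, while keeping the model to steps three and four gives about 81%.
```python
import math
def chain_success(step_accuracies):
    """Probability that every step in a sequence succeeds, assuming independent errors."""
    return math.prod(step_accuracies)
if __name__ == "__main__":
    all_model = chain_success([0.9] * 5)                 # model does all five steps
    hybrid = chain_success([1.0, 1.0, 0.9, 0.9, 1.0])    # code does steps 1, 2 and 5
    print(f"all-model: {all_model:.3f}, hybrid: {hybrid:.3f}")
```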

0:34:27.042 --> 0:34:33.602
<v S1>So I really love this. Um. Where this is taking

0:34:33.602 --> 0:34:38.322
<v S1>me is designing, like, a, uh, a general problem solver.

0:34:38.922 --> 0:34:43.852
<v S1>And I'm imagining, like, the smartest model that you have.

0:34:43.892 --> 0:34:47.772
<v S1>You know, opus, whatever. Or, like, the best Gemini or

0:34:47.772 --> 0:34:50.612
<v S1>whatever or whatever the best model is. But but then

0:34:50.612 --> 0:34:54.092
<v S1>what you do is you say, okay, uh, the problem

0:34:54.092 --> 0:34:59.972
<v S1>is we need to design a system that, uh, you know, properly,

0:34:59.972 --> 0:35:03.652
<v S1>deterministically solves this problem with a high level of accuracy.

0:35:04.252 --> 0:35:06.972
<v S1>For example, the vulnerability problem that you guys worked on.

0:35:07.372 --> 0:35:11.332
<v S1>And then what I love is the idea of you

0:35:11.372 --> 0:35:15.932
<v S1>present to the model all these different AI models and

0:35:15.932 --> 0:35:20.852
<v S1>all these different deterministic technologies, all as solutions. And then

0:35:20.852 --> 0:35:25.452
<v S1>you do what you said, which is you, um, break

0:35:25.452 --> 0:35:28.652
<v S1>down the problems that need to be solved at every

0:35:28.652 --> 0:35:33.852
<v S1>level of the subpieces. Right. And then you match each

0:35:33.852 --> 0:35:38.732
<v S1>of those little problems to either one or, uh, one

0:35:38.732 --> 0:35:42.022
<v S1>or many of these AIs, which are bigger or smaller,

0:35:42.062 --> 0:35:45.262
<v S1>have different weaknesses or whatever, or even ML, not even

0:35:45.302 --> 0:35:51.142
<v S1>LLM based. Yeah. Versus deterministic with the rule of like look,

0:35:51.182 --> 0:35:56.702
<v S1>use the appropriate one for this problem type. And then

0:35:56.702 --> 0:35:59.582
<v S1>maybe you have a whole bunch of training about problem

0:35:59.582 --> 0:36:03.742
<v S1>types and solution types. And then it picks which one

0:36:03.742 --> 0:36:07.382
<v S1>to use for each step. I mean is that.

0:36:08.102 --> 0:36:09.542
<v S2>You mentioned this. I think this is what some of

0:36:09.542 --> 0:36:11.862
<v S2>like the large, you know, third party ML as a

0:36:11.862 --> 0:36:14.342
<v S2>service providers like OpenAI and Anthropic are kind of trying

0:36:14.342 --> 0:36:16.422
<v S2>to do. If you've heard of like this concept of

0:36:16.422 --> 0:36:19.942
<v S2>like mixture of experts models, um, it's uh.

0:36:19.942 --> 0:36:20.462
<v S1>That's true.

0:36:20.662 --> 0:36:22.622
<v S2>Yeah. It's this concept where, you know, like, you know,

0:36:22.662 --> 0:36:25.062
<v S2>like the, the actual interface we have to, maybe, GPT-5,

0:36:25.102 --> 0:36:27.462
<v S2>and, and I haven't looked at the source code.

0:36:27.462 --> 0:36:28.942
<v S2>I don't work at OpenAI, so I have no idea

0:36:28.942 --> 0:36:30.622
<v S2>if this works underneath the hood, but it's been kind

0:36:30.622 --> 0:36:33.542
<v S2>of theorized and it's even been mentioned, you know, a

0:36:33.542 --> 0:36:36.182
<v S2>bit in terms of, um, you know, people who've kind

0:36:36.182 --> 0:36:38.422
<v S2>of looked at the models a little bit closer that,

0:36:38.462 --> 0:36:40.352
<v S2>you know, um, you know, when we, when we, we

0:36:40.392 --> 0:36:41.992
<v S2>fine tune a model to make it really good or

0:36:41.992 --> 0:36:44.872
<v S2>really suitable for a particular purpose that's amenable to AI/ML,

0:36:45.232 --> 0:36:49.232
<v S2>it can still be challenging to, um, have it interface

0:36:49.232 --> 0:36:50.912
<v S2>with the user in the way that like a high

0:36:50.912 --> 0:36:54.472
<v S2>quality chatbot would. So, yeah, using a mixture-of-experts

0:36:54.472 --> 0:36:56.912
<v S2>model suggests, like, having an interface, like a

0:36:57.432 --> 0:37:01.192
<v S2>bot that interacts with the user but then recognizes certain

0:37:01.192 --> 0:37:04.392
<v S2>classes of problems and directs them to the right expert. So, oh,

0:37:04.432 --> 0:37:07.192
<v S2>they're asking me about cyber. I'll ask, you know, um,

0:37:08.072 --> 0:37:11.112
<v S2>cyber GPT to handle this one. Oh, they're asking about,

0:37:11.392 --> 0:37:14.392
<v S2>you know, mental health, I'll ask, you know, mental health

0:37:14.392 --> 0:37:19.552
<v S2>GPT to to help out here. Um, so, you know,

0:37:19.592 --> 0:37:22.672
<v S2>this kind of like concept I think is I think

0:37:22.672 --> 0:37:24.992
<v S2>it's trying to be created, or at least it's been

0:37:24.992 --> 0:37:27.392
<v S2>thought of, um, in terms of using, like, all AI/ML

0:37:27.392 --> 0:37:30.192
<v S2>solutions. But, but yeah, I agree, like the way

0:37:30.232 --> 0:37:32.992
<v S2>forward is to have, um, you know, for, for like

0:37:32.992 --> 0:37:38.482
<v S2>rapid like prototype development have like components that do certain things. Well, um,

0:37:38.522 --> 0:37:40.842
<v S2>and honestly, it's like reflected in software, like we have

0:37:40.842 --> 0:37:44.522
<v S2>libraries for, we have libraries for sorting. No one or

0:37:44.562 --> 0:37:47.562
<v S2>we have libraries for cryptography. Nobody should be writing their

0:37:47.562 --> 0:37:50.962
<v S2>own cryptography code. Use a library. Um, you know, the

0:37:50.962 --> 0:37:54.882
<v S2>closer these high quality libraries and, um, fine tuned ML

0:37:54.882 --> 0:37:58.282
<v S2>applications or ML models for certain types of subproblems, the

0:37:58.282 --> 0:37:59.882
<v S2>closer we get to being able to kind of compose

0:37:59.882 --> 0:38:01.962
<v S2>all these together. And the good thing is, is that

0:38:01.962 --> 0:38:03.882
<v S2>an LLM is probably pretty good at writing the glue code

0:38:03.922 --> 0:38:05.362
<v S2>to sequence all this stuff together.

0:38:06.362 --> 0:38:09.082
<v S1>Yeah, yeah. Because because that's the trick for me. Because

0:38:09.122 --> 0:38:12.122
<v S1>inside of a mixture of experts, you're already inside the LLM.

0:38:12.442 --> 0:38:15.042
<v S1>What I'm thinking of this higher level model is like, look,

0:38:15.082 --> 0:38:18.282
<v S1>we're doing it. We're doing, um, matrix math over here.

0:38:18.602 --> 0:38:22.962
<v S1>We're doing multiplication over here. Um, guess what? This problem

0:38:22.962 --> 0:38:26.442
<v S1>space is not associated with an AI. We don't even

0:38:26.482 --> 0:38:29.042
<v S1>know AI will ever touch this. We hand it to

0:38:29.042 --> 0:38:34.922
<v S1>our fastest and best, you know, deterministic addition function or whatever,

0:38:34.962 --> 0:38:38.572
<v S1>you know, and it's like maybe 95% of the whole

0:38:38.572 --> 0:38:41.372
<v S1>app ends up being traditional tech that doesn't involve AI,

0:38:41.412 --> 0:38:43.092
<v S1>other than the routing to get there.
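
NOTE
A minimal sketch of the routing idea described in the cue above, not anything shown in the episode. The handler table, the route function, and the llm_fallback parameter are hypothetical names chosen for illustration. It assumes the task type is already known; classifying the request is the part that might still involve a model.
```python
import math
# Hypothetical routing table: task name mapped to a deterministic handler.
DETERMINISTIC_HANDLERS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "sqrt": lambda x: math.sqrt(x),
}
def route(task, *args, llm_fallback=None):
    """Send known problem classes to deterministic code; use the model only as a fallback."""
    handler = DETERMINISTIC_HANDLERS.get(task)
    if handler is not None:
        # Deterministic path: no model call, same answer every time.
        return handler(*args)
    if llm_fallback is None:
        raise ValueError(f"no deterministic handler for task {task!r}")
    # Probabilistic path: only reached for tasks with no known handler.
    return llm_fallback(task, *args)
print(route("add", 2, 3))
print(route("multiply", 4, 5))
```
In this sketch the model never touches the arithmetic; llm_fallback is only called for tasks that have no deterministic handler.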

0:38:43.932 --> 0:38:45.372
<v S2>Yeah, I mean, that would be ideal. I mean, anything

0:38:45.372 --> 0:38:49.852
<v S2>you can route, anything. Anything you can. Yeah, I don't know.

0:38:49.852 --> 0:38:51.972
<v S2>It's funny. It's like really what it comes down to

0:38:52.052 --> 0:38:57.412
<v S2>is like using large language models and like, solving large problems.

0:38:57.412 --> 0:39:00.692
<v S2>It becomes a conditional probability problem. And even if you

0:39:00.692 --> 0:39:03.572
<v S2>have the answer, get the right answer right at 99%

0:39:03.572 --> 0:39:07.572
<v S2>of the time. Um, over and over and over again,

0:39:08.252 --> 0:39:11.052
<v S2>you still have a high likelihood of failure by the

0:39:11.052 --> 0:39:13.892
<v S2>time you compute all the conditional probability out. It's kind

0:39:13.932 --> 0:39:15.812
<v S2>of funny. Like, I kind of learned this lesson in like,

0:39:15.852 --> 0:39:19.052
<v S2>in a completely different walk of life. Um, after I

0:39:19.052 --> 0:39:22.332
<v S2>got my bachelor's degree in CS, I, I worked for

0:39:22.332 --> 0:39:27.132
<v S2>like a year doing, um, software engineering and kind of

0:39:27.172 --> 0:39:30.852
<v S2>found it to be dull, so I, I, I did

0:39:30.852 --> 0:39:33.252
<v S2>something completely different. I joined the Army and I started

0:39:33.252 --> 0:39:37.432
<v S2>flying helicopters. Um, it's actually nice. That is, that's actually,

0:39:37.472 --> 0:39:39.712
<v S2>you know, I'm up at Camp Dwyer in

0:39:39.712 --> 0:39:43.152
<v S2>RC Southwest in Afghanistan. It's, um, a picture that was taken of

0:39:43.152 --> 0:39:45.792
<v S2>our aircraft on the flight line, and one of my

0:39:45.792 --> 0:39:48.632
<v S2>jobs as a pilot was to educate our junior pilots

0:39:48.632 --> 0:39:52.232
<v S2>on this concept of, like, mission survivability. Um, and that's

0:39:52.232 --> 0:39:55.192
<v S2>the idea that, um, you know, understanding what's called, like,

0:39:55.192 --> 0:39:57.192
<v S2>the kill chain. The kill chain has been pretty popularized

0:39:57.192 --> 0:40:00.392
<v S2>in security as well. But, you know, basically for a

0:40:00.432 --> 0:40:03.312
<v S2>for a compromise, whether it's shooting down an aircraft or

0:40:03.312 --> 0:40:05.432
<v S2>breaching a database, like a lot of things have to

0:40:05.472 --> 0:40:08.032
<v S2>happen and they all have some sort of probability. And

0:40:08.032 --> 0:40:10.312
<v S2>your goal in breaking the kill chain or breaking the

0:40:10.312 --> 0:40:13.672
<v S2>exploitation chain is to reduce any one probability down to zero,

0:40:13.912 --> 0:40:18.832
<v S2>because then the common or the conditional probability problem becomes zero. Um,

0:40:18.832 --> 0:40:20.752
<v S2>but the probabilities can be really weird. I used to

0:40:20.752 --> 0:40:22.712
<v S2>talk to my junior pilots and ask them like, hey,

0:40:22.992 --> 0:40:25.632
<v S2>what do you think is like the acceptable loss rate

0:40:25.632 --> 0:40:28.432
<v S2>on any of the missions that we fly here in theater?

0:40:28.472 --> 0:40:30.432
<v S2>And they would usually give me answers like they were

0:40:30.432 --> 0:40:35.002
<v S2>pretty close. They'd say like 90% or 95% or even 99%.

0:40:35.922 --> 0:40:37.562
<v S2>So I would actually take them to the math problem.

0:40:37.562 --> 0:40:39.681
<v S2>I'd get out the whiteboard and I'd say, okay, let's

0:40:39.682 --> 0:40:42.882
<v S2>assume it's 99%. I say, okay, how many aircraft are

0:40:42.882 --> 0:40:44.962
<v S2>we flying a day? Okay. You know, we have ten

0:40:44.962 --> 0:40:47.602
<v S2>total aircraft. We go on five missions a day. So

0:40:47.602 --> 0:40:50.242
<v S2>that's five aircraft are going out there. And let's say

0:40:50.242 --> 0:40:52.162
<v S2>there's only a 1% chance that each one of them

0:40:52.162 --> 0:40:54.162
<v S2>gets shot down. Okay. So that's five aircraft a day.

0:40:54.162 --> 0:40:55.562
<v S2>But we're going to be in we're going to be

0:40:55.562 --> 0:40:58.122
<v S2>in theater for for nine months. We'll round it off.

0:40:58.122 --> 0:40:59.882
<v S2>We'll make it a year. We're going to be here

0:40:59.882 --> 0:41:03.722
<v S2>for 365 days. So now if I take 365 by

0:41:03.762 --> 0:41:06.642
<v S2>five and multiply it by five, that's the number of

0:41:06.642 --> 0:41:09.762
<v S2>missions we're flying in the entire time we're here. This

0:41:09.762 --> 0:41:11.482
<v S2>number comes out to be pretty high. And now all

0:41:11.482 --> 0:41:14.802
<v S2>of a sudden, if I lose one aircraft for every 100,

0:41:14.842 --> 0:41:17.162
<v S2>you realize that I actually run out of aircraft in

0:41:17.162 --> 0:41:20.162
<v S2>the first two months of being in theater.

0:41:20.162 --> 0:41:21.922
<v S2>And now all of a sudden, the troops don't have,

0:41:22.082 --> 0:41:23.402
<v S2>don't have helicopters to fly.

0:41:23.402 --> 0:41:23.762
<v S1>Yeah.

0:41:24.122 --> 0:41:26.762
<v S2>So I said, actually, believe it or not, our our

0:41:26.842 --> 0:41:32.962
<v S2>acceptable loss rate is something more like 99.99999%. Um, we

0:41:32.962 --> 0:41:35.252
<v S2>can almost never lose an aircraft because. Or we can

0:41:35.252 --> 0:41:38.452
<v S2>almost never accept any type of probability. That means we

0:41:38.452 --> 0:41:41.052
<v S2>have even a remote chance of losing an aircraft because

0:41:41.052 --> 0:41:44.092
<v S2>we will deplete them. It's a limited resource. Um, solving

0:41:44.092 --> 0:41:46.372
<v S2>problems with LLMs is the same way. If you ask

0:41:46.372 --> 0:41:48.972
<v S2>them to solve 15 problems in a row, even if

0:41:48.972 --> 0:41:52.372
<v S2>it's got a 99% chance, which is which would be

0:41:52.372 --> 0:41:55.212
<v S2>amazing if any LLM could get anywhere close to that,

0:41:55.652 --> 0:41:57.812
<v S2>even if it has a 99% chance of answering every

0:41:57.812 --> 0:42:01.452
<v S2>single problem right over the course of a year, it's

0:42:01.452 --> 0:42:04.932
<v S2>probably going to give you answers that are wrong almost 80%

0:42:04.932 --> 0:42:07.652
<v S2>of the time if that chain is long enough. And

0:42:07.652 --> 0:42:09.572
<v S2>if you have enough problems that you feed through it.

0:42:10.012 --> 0:42:13.252
<v S2>So that's one thing I try to, like, um, help

0:42:13.252 --> 0:42:16.972
<v S2>people conceptualize over relying on large language models and try

0:42:16.972 --> 0:42:20.412
<v S2>to help them understand this, like compounding error problem. It's

0:42:20.412 --> 0:42:26.132
<v S2>really a conditional probability, uh, compounding conditional probability problem. And

0:42:26.132 --> 0:42:28.812
<v S2>your tolerance for false positives is actually zero. So anywhere

0:42:28.812 --> 0:42:31.252
<v S2>in this chain that you can we have to think

0:42:31.252 --> 0:42:33.942
<v S2>about this differently now because I can't reduce anything to zero.

0:42:33.982 --> 0:42:35.422
<v S2>But what I can do is I can take certain

0:42:35.422 --> 0:42:36.902
<v S2>parts of the chain and I can bump them up

0:42:36.902 --> 0:42:40.102
<v S2>to 100%, meaning my chances of getting something right when

0:42:40.102 --> 0:42:42.902
<v S2>I use a deterministic algorithm are 100%. So now I

0:42:42.942 --> 0:42:45.862
<v S2>no longer have some sort of fractional probability out of.

0:42:45.902 --> 0:42:49.102
<v S2>So this 15 step problem now let's say 12 steps

0:42:49.102 --> 0:42:52.062
<v S2>I do deterministically. Now I only have a three step chain.

0:42:52.062 --> 0:42:55.622
<v S2>And now that 99% I'm getting it right only three times.

0:42:55.702 --> 0:42:58.422
<v S2>You simplify this problem. Now I might be able to

0:42:58.422 --> 0:43:00.702
<v S2>make it through a year's worth of operations that, you know,

0:43:00.742 --> 0:43:03.382
<v S2>100 examples of the problem a day. I might be

0:43:03.382 --> 0:43:06.302
<v S2>able to make it through that with a false positive

0:43:06.302 --> 0:43:08.022
<v S2>rate of. I don't know what the math is in

0:43:08.022 --> 0:43:09.502
<v S2>my head. I'd have to I have to punch it out.

0:43:09.502 --> 0:43:11.342
<v S2>But that false positive rate might be a lot more

0:43:11.342 --> 0:43:15.422
<v S2>survivable in an operational world than, you know, 15 conditional

0:43:15.422 --> 0:43:17.862
<v S2>probability problems that are all 99%.
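
NOTE
A worked version of the arithmetic the speaker leaves open above, offered as a sketch rather than his own calculation. It uses the figures from the conversation (15 steps at 99% each, 12 of them made deterministic, 100 problems a day for a year) and assumes that steps fail independently and that a deterministic step succeeds with probability 1. The function names are hypothetical.
```python
def chain_success(p_step, n_steps):
    """Probability that every step of an n-step chain succeeds, assuming independent steps."""
    return p_step ** n_steps
def expected_failures(p_step, n_steps, runs):
    """Expected number of failed runs out of `runs` independent passes through the chain."""
    return runs * (1.0 - chain_success(p_step, n_steps))
full = chain_success(0.99, 15)    # all 15 steps handled probabilistically
short = chain_success(0.99, 3)    # 12 steps made deterministic, 3 probabilistic steps left
runs = 100 * 365                  # 100 problems a day for a year
print(f"15 steps: {full:.3f} success per run, about {expected_failures(0.99, 15, runs):.0f} failures a year")
print(f" 3 steps: {short:.3f} success per run, about {expected_failures(0.99, 3, runs):.0f} failures a year")
```
Under these assumptions the 15-step chain succeeds about 86% of the time per run and the 3-step chain about 97%, so shortening the probabilistic part of the chain cuts the expected yearly failures by roughly a factor of five.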

0:43:18.902 --> 0:43:22.902
<v S1>Yeah, yeah, I love that. The way I describe it is, um,

0:43:23.222 --> 0:43:26.782
<v S1>what's 1% of 100 metric tons of problems.

0:43:27.942 --> 0:43:28.382
<v S2>A metric.

0:43:29.062 --> 0:43:31.182
<v S1>A metric ton of problems?

0:43:31.272 --> 0:43:33.272
<v S2>Yeah, I love that. I love that.

0:43:34.072 --> 0:43:40.152
<v S1>Yeah. Yeah. Um, so, uh, we share this in common, actually. So, um,

0:43:40.152 --> 0:43:42.952
<v S1>I was, um, I was also Army, and I was at.

0:43:43.832 --> 0:43:46.912
<v S1>I was at Fort Campbell, so I was air assault,

0:43:46.912 --> 0:43:48.832
<v S1>so I had to do all the helicopter stuff.

0:43:48.872 --> 0:43:50.552
<v S2>Uh, right on, man. Hell yeah, brother.

0:43:50.992 --> 0:43:54.872
<v S1>Yeah. That's cool. Airborne air assault. Right? Um, yeah.

0:43:55.592 --> 0:43:58.792
<v S2>No. Yeah, I, I was, um, uh, this this picture

0:43:58.792 --> 0:44:01.392
<v S2>was taken when we were doing, uh, medevac chase. Uh,

0:44:01.432 --> 0:44:03.712
<v S2>we we did security for those guys over there, but

0:44:03.752 --> 0:44:05.832
<v S2>I was in an air assault battalion, so we literally

0:44:05.832 --> 0:44:07.912
<v S2>did nothing but fly you guys around, so.

0:44:07.952 --> 0:44:08.352
<v S1>Oh.

0:44:08.352 --> 0:44:10.312
<v S2>Nice, man. Small world, dude.

0:44:10.752 --> 0:44:11.432
<v S1>Yeah, yeah.

0:44:12.272 --> 0:44:14.472
<v S2>Yeah, you were over at Fort Campbell. I, I was at, um.

0:44:14.472 --> 0:44:16.712
<v S2>I was at Fort Riley, uh, in the 1st

0:44:17.112 --> 0:44:19.872
<v S2>CAB and then, um, I PCS'd from there after I

0:44:19.872 --> 0:44:22.072
<v S2>went to Afghanistan and went to the 82nd. Um, so

0:44:22.072 --> 0:44:24.272
<v S2>I never got, never quite got to Campbell, which, like,

0:44:24.312 --> 0:44:26.472
<v S2>would have been great because I live here in Ohio,

0:44:26.472 --> 0:44:29.272
<v S2>and Cincinnati is, like, where I was from. So I

0:44:29.272 --> 0:44:31.402
<v S2>was like always trying to get to Campbell because it

0:44:31.402 --> 0:44:33.202
<v S2>was like only like 4 or 5 hours from home

0:44:33.202 --> 0:44:35.322
<v S2>and be able to see family a lot easier. But

0:44:35.322 --> 0:44:38.722
<v S2>I ended up like 12 and nine hours away, respectively, so, uh.

0:44:39.042 --> 0:44:42.642
<v S1>Yeah. Well, that's super cool. Yeah, well, we need to

0:44:42.642 --> 0:44:46.802
<v S1>chat some more. Man. This is, like, really, really cool stuff. Um,

0:44:47.282 --> 0:44:49.122
<v S1>what you guys did on the team is cool, but

0:44:49.162 --> 0:44:51.762
<v S1>I'm even more excited just about the way you think

0:44:51.762 --> 0:44:58.522
<v S1>about these things. Um, I'm. I'm, uh, happy that, um,

0:44:58.762 --> 0:45:00.722
<v S1>the way you're thinking about it is similar to the

0:45:00.722 --> 0:45:03.082
<v S1>way I'm thinking about it. You've taught me a

0:45:03.082 --> 0:45:06.082
<v S1>lot just during this thing. We should we should definitely

0:45:06.082 --> 0:45:08.362
<v S1>chat more after this. Um, anything else you want to

0:45:08.362 --> 0:45:14.482
<v S1>share about the the competition or, um, lessons learned? Um.

0:45:15.522 --> 0:45:17.082
<v S2>So I think one of the things that that came

0:45:17.082 --> 0:45:31.532
<v S2>out of the competition, um, was a lot of vindication. Sorry.

0:45:31.532 --> 0:45:36.612
<v S2>I nudged my mouse and it... Oh. So, um, I'll just

0:45:36.612 --> 0:45:38.252
<v S2>I'll just go right into the answer. I assume you

0:45:38.252 --> 0:45:41.852
<v S2>can edit this later or something, but yeah. Um, so yeah,

0:45:41.852 --> 0:45:43.412
<v S2>one of the things that, um, that came out of

0:45:43.412 --> 0:45:46.572
<v S2>the competition was, was honestly a lot of vindication, um,

0:45:47.092 --> 0:45:49.252
<v S2>like I had mentioned before, you know, when we started

0:45:49.252 --> 0:45:53.092
<v S2>off this process, um, this was two years ago, which

0:45:53.092 --> 0:45:56.292
<v S2>has been two lifetimes in the development of like AI

0:45:56.292 --> 0:46:00.612
<v S2>enabled systems for any problem, much less cybersecurity. Um, so

0:46:00.612 --> 0:46:03.692
<v S2>a lot of the things that we did, like tool enabling, um,

0:46:03.732 --> 0:46:07.812
<v S2>and multi-agent systems were things that we did before, things

0:46:07.812 --> 0:46:13.252
<v S2>like MCP or um, complicated, um, libraries for supporting this existed,

0:46:13.252 --> 0:46:17.012
<v S2>like we used early versions of, um, of LangChain, uh,

0:46:17.012 --> 0:46:19.052
<v S2>for some of our multi-agent stuff, but we actually ended

0:46:19.052 --> 0:46:20.332
<v S2>up having to write a lot of and implement a

0:46:20.332 --> 0:46:23.692
<v S2>lot of our own glue code for this. Um, so

0:46:23.732 --> 0:46:26.612
<v S2>it's really vindicating to see, like, those techniques become, while

0:46:26.612 --> 0:46:29.952
<v S2>we're doing the competition, become not only one commonplace and

0:46:29.952 --> 0:46:33.832
<v S2>two supported by the major large language model providers, be

0:46:33.832 --> 0:46:37.192
<v S2>adopted and be used generally by the community. Um, you know,

0:46:37.232 --> 0:46:39.112
<v S2>it was really great that we came in second and

0:46:39.112 --> 0:46:41.512
<v S2>that also the first place finisher also used this like

0:46:41.512 --> 0:46:46.632
<v S2>kind of, um, use, um, problem solving techniques that are

0:46:46.632 --> 0:46:51.232
<v S2>well suited for the problem approach. Yeah. Don't use AI everywhere. Um,

0:46:51.792 --> 0:46:55.512
<v S2>the third-place finisher, Theori, they were a little bit more LLM-forward,

0:46:55.792 --> 0:46:58.352
<v S2>but they still had a lot of, like, traditional components.

0:46:58.512 --> 0:47:01.352
<v S2>I don't think any team really went after this. Like,

0:47:01.352 --> 0:47:04.912
<v S2>all-LLM, tried to just do everything within the LLM. Um.

0:47:05.432 --> 0:47:07.952
<v S1>I bet a lot started that way, and they they

0:47:08.112 --> 0:47:10.112
<v S1>fall back from it. Yeah.

0:47:10.152 --> 0:47:13.072
<v S2>Yeah. Yeah, I think I think at least one of them, um,

0:47:13.312 --> 0:47:14.672
<v S2>at least one team. I think All You Need Is

0:47:14.672 --> 0:47:17.872
<v S2>A Fuzzing Brain. I think in the semi-finals, their approach, um,

0:47:18.352 --> 0:47:20.752
<v S2>tried to just use an LLM to augment a fuzzer

0:47:20.752 --> 0:47:22.632
<v S2>to find vulnerabilities. And I don't think they really had

0:47:22.632 --> 0:47:24.672
<v S2>much of, like, a solution for patching, but it was

0:47:24.672 --> 0:47:26.552
<v S2>enough to get them to the finals. They had

0:47:26.552 --> 0:47:31.202
<v S2>a more well rounded system, I believe, uh, in the,

0:47:31.242 --> 0:47:33.442
<v S2>in the finals. Um, so yeah, it was kind of

0:47:33.482 --> 0:47:35.442
<v S2>vindicating to also see that all these other bright minds

0:47:35.442 --> 0:47:38.802
<v S2>out there were also similarly of the, of the mindset

0:47:38.802 --> 0:47:41.002
<v S2>to do this. But um, one of the biggest takeaways

0:47:41.002 --> 0:47:43.682
<v S2>I have that I'll, that I'll say is that was

0:47:43.682 --> 0:47:45.802
<v S2>like different than what I expected because it's really easy

0:47:45.802 --> 0:47:47.442
<v S2>to pat myself on the back and say, oh yeah,

0:47:47.482 --> 0:47:49.522
<v S2>all the plan I came up with worked great. That's

0:47:49.682 --> 0:47:52.722
<v S2>that's awesome. But, um, I will say that I was

0:47:52.722 --> 0:47:56.362
<v S2>really surprised at how good large language models eventually became

0:47:56.362 --> 0:47:59.402
<v S2>at helping us generate patches and also helping us generate

0:47:59.402 --> 0:48:02.722
<v S2>seed inputs to improve Fuzzer performance. Those were areas where

0:48:02.722 --> 0:48:04.922
<v S2>I didn't really give the LLM a lot of credit

0:48:04.922 --> 0:48:07.322
<v S2>up front, but I had to build an autonomous system,

0:48:07.322 --> 0:48:10.802
<v S2>so I had no choice. They really outperformed my expectations.

0:48:10.802 --> 0:48:12.802
<v S2>So I kind of came out of this with, um,

0:48:13.202 --> 0:48:17.122
<v S2>a bit of a healthier respect for the capabilities of

0:48:17.122 --> 0:48:20.322
<v S2>AI models. Once again, these are still highly constrained.

0:48:20.322 --> 0:48:21.322
<v S1>And yeah, yeah.

0:48:21.362 --> 0:48:23.962
<v S2>Very context rich problems that we ask them to do,

0:48:24.082 --> 0:48:26.212
<v S2>but they still did way better than I thought they

0:48:26.212 --> 0:48:27.612
<v S2>were going to do. Um.

0:48:27.892 --> 0:48:32.812
<v S1>Yeah. And also context constrained, not polluted, like a very

0:48:33.132 --> 0:48:35.612
<v S1>controlled context for that thing. Like like you were talking

0:48:35.612 --> 0:48:36.612
<v S1>about before, right?

0:48:37.092 --> 0:48:40.692
<v S2>Yeah. Yeah. Um, yeah, I think that's about it. Unfortunately,

0:48:40.692 --> 0:48:41.892
<v S2>I do have to jump off. I gotta I got

0:48:41.892 --> 0:48:45.212
<v S2>another call at 12:30, but, um. Yeah, I'd love to

0:48:45.212 --> 0:48:47.612
<v S2>chat more and talk more with you at some point.

0:48:47.612 --> 0:48:49.892
<v S2>If you want to do a follow up episode or,

0:48:49.932 --> 0:48:52.372
<v S2>I don't know, you just want to chat about other stuff. Um,

0:48:53.052 --> 0:48:54.372
<v S2>you know, we got a couple of friends in common

0:48:54.372 --> 0:48:58.652
<v S2>between Clint and, uh, between Clint and Keith, and it's, uh,

0:48:58.652 --> 0:49:00.172
<v S2>you know, I've. I've run into you a couple places

0:49:00.172 --> 0:49:03.052
<v S2>on various calls and stuff that we've been on, but, um,

0:49:03.132 --> 0:49:04.212
<v S2>it was good to get a chance to talk with

0:49:04.212 --> 0:49:06.052
<v S2>you one on one. I feel like we've been kind of, like,

0:49:06.292 --> 0:49:08.612
<v S2>circling around in the same circle for a while, but

0:49:08.612 --> 0:49:10.132
<v S2>I hadn't had a chance to, like, actually just chat

0:49:10.132 --> 0:49:10.812
<v S2>the two of us.

0:49:11.492 --> 0:49:15.452
<v S1>Yeah, absolutely. Well, thanks. Thanks for the, uh, the input.

0:49:15.452 --> 0:49:18.812
<v S1>This is just, uh, fantastic stuff. And, uh, let's definitely

0:49:18.812 --> 0:49:19.572
<v S1>catch up soon.

0:49:20.052 --> 0:49:21.372
<v S2>Yeah. Sounds good, man. Take care of yourself.

0:49:21.412 --> 0:49:22.252
<v S1>All right. Take care.