WEBVTT - The 4 AAAAs of the AI ECOSYSTEM: Assistants, APIs, Agents, and Augmented Reality

0:00:00.880 --> 0:00:05.040
<v S1>Unsupervised Learning is a podcast about trends and ideas in cybersecurity,

0:00:05.080 --> 0:00:09.960
<v S1>national security, AI, technology and society, and how best to

0:00:10.000 --> 0:00:17.640
<v S1>upgrade ourselves to be ready for what's coming. All right,

0:00:17.640 --> 0:00:19.840
<v S1>in this video, I'm going to talk about the AI

0:00:19.880 --> 0:00:22.840
<v S1>ecosystem that I think everyone is actually building and moving

0:00:22.880 --> 0:00:26.280
<v S1>towards without even realizing that they're doing so. And I'm

0:00:26.280 --> 0:00:28.600
<v S1>going to break it down into four pieces, which are

0:00:29.200 --> 0:00:34.960
<v S1>assistants, APIs, agents, and augmented reality. And I think once

0:00:34.960 --> 0:00:38.199
<v S1>you see this model, you're going to realize that the

0:00:38.200 --> 0:00:41.120
<v S1>news coming in from OpenAI and Anthropic and all these

0:00:41.120 --> 0:00:45.480
<v S1>different companies, it's all moving in this direction toward this model.

0:00:46.080 --> 0:00:48.639
<v S1>And I think it's going to be really helpful for

0:00:48.640 --> 0:00:51.400
<v S1>you to just have that mental model of it. So

0:00:51.400 --> 0:00:54.120
<v S1>let's jump into it. So I actually broke this down

0:00:54.120 --> 0:00:57.280
<v S1>into a much longer explanation that you could see above, uh,

0:00:57.280 --> 0:01:00.720
<v S1>back in December of 2023. And I wrote a kind

0:01:00.720 --> 0:01:02.720
<v S1>of a little bit of a crappy book about it

0:01:02.720 --> 0:01:05.600
<v S1>in 2016. I really just wanted to capture the ideas,

0:01:05.600 --> 0:01:08.039
<v S1>which it was decent at doing that. It was called

0:01:08.040 --> 0:01:10.200
<v S1>The Real Internet of Things. You could put a link

0:01:10.200 --> 0:01:12.400
<v S1>to that in there. Don't really need to read the book.

0:01:12.400 --> 0:01:15.800
<v S1>This is actually much better. But, um, this stuff is

0:01:15.800 --> 0:01:18.360
<v S1>actually starting to happen now. And what I want to

0:01:18.360 --> 0:01:21.400
<v S1>do today is take the concepts from the book and

0:01:21.400 --> 0:01:25.440
<v S1>from that video above, and show how they're actually falling

0:01:25.440 --> 0:01:28.679
<v S1>into this model and how it's actually starting to happen today,

0:01:29.160 --> 0:01:31.600
<v S1>which we could see throughout the news from all these

0:01:31.600 --> 0:01:36.040
<v S1>different companies. So the first A here is assistant, or

0:01:36.040 --> 0:01:38.800
<v S1>what I call a digital assistant. So a million different

0:01:38.800 --> 0:01:42.560
<v S1>companies are actually building this piece in various ways. Some

0:01:42.560 --> 0:01:46.160
<v S1>companies are building like digital companions or like smart assistants

0:01:46.160 --> 0:01:50.240
<v S1>or personal agents. But to me like this is all

0:01:50.240 --> 0:01:53.000
<v S1>actually kind of the same thing. It is basically a

0:01:53.040 --> 0:01:55.800
<v S1>piece of tech that is the most intimate to you

0:01:56.280 --> 0:01:59.200
<v S1>because it knows everything about you: your preferences, your calendar,

0:01:59.200 --> 0:02:03.250
<v S1>your contacts, like, health information, your finances. It's going to

0:02:03.250 --> 0:02:07.890
<v S1>be our best advocate, our best tutor, right? It's a

0:02:07.890 --> 0:02:10.850
<v S1>filter against like incoming stuff that you don't want to see.

0:02:10.889 --> 0:02:14.130
<v S1>And hopefully that filter and those filters are actually assigned

0:02:14.130 --> 0:02:16.930
<v S1>by you and not by someone else. But, um, yeah,

0:02:16.970 --> 0:02:19.930
<v S1>filtering out messages and emails and stuff you don't want.

0:02:19.970 --> 0:02:24.330
<v S1>It's basically figuring out exactly what you want and figuring

0:02:24.330 --> 0:02:26.690
<v S1>out how to make that happen all the time. Now,

0:02:26.730 --> 0:02:30.890
<v S1>if you're in security like me, you're probably thinking, well, that's crazy.

0:02:31.010 --> 0:02:33.170
<v S1>All this stuff on the screen here, like, knows everything

0:02:33.169 --> 0:02:38.370
<v S1>about you: history, trauma, preferences, remembers everything, has multiple agents.

0:02:38.570 --> 0:02:42.970
<v S1>Like nobody's going to actually put that information into their DA.

0:02:43.090 --> 0:02:45.090
<v S1>But I think we already know that's not true. We

0:02:45.090 --> 0:02:47.650
<v S1>already know that people are doing this. That's why these

0:02:47.650 --> 0:02:51.930
<v S1>digital companion companies are doing so well already. This is

0:02:51.930 --> 0:02:55.530
<v S1>functionality that is just so powerful. Like it's just going

0:02:55.530 --> 0:02:59.169
<v S1>to happen, right? And this is kind of the centerpiece

0:02:59.169 --> 0:03:01.130
<v S1>of this whole model that I'm going to break down.

0:03:01.290 --> 0:03:04.170
<v S1>This is the first one. This is the assistant. So

0:03:04.169 --> 0:03:07.329
<v S1>let's call our DA Kai. Ours is going to be called

0:03:07.370 --> 0:03:13.329
<v S1>Kai. Okay. So the second A is APIs. So your

0:03:13.330 --> 0:03:16.929
<v S1>DA isn't that useful if it can't do stuff for

0:03:16.930 --> 0:03:21.250
<v S1>you, right? So the way it'll do stuff is through APIs. Uh,

0:03:21.250 --> 0:03:24.410
<v S1>I didn't say in 2016 which of, like, the DA

0:03:24.410 --> 0:03:26.330
<v S1>or the APIs was going to come first because I

0:03:26.330 --> 0:03:29.730
<v S1>didn't know. And I still really kind of don't know,

0:03:29.730 --> 0:03:31.570
<v S1>but it seems like they're kind of happening at the

0:03:31.570 --> 0:03:35.530
<v S1>same time. Basically, your DA, Kai, is over here being

0:03:35.530 --> 0:03:38.690
<v S1>an agent for you, right? It's being like your advocate,

0:03:38.690 --> 0:03:42.130
<v S1>like we already talked about. It is an agent.

0:03:42.130 --> 0:03:44.690
<v S1>It has agents working for it. But ultimately it's like

0:03:44.690 --> 0:03:48.410
<v S1>one personality with a collection of agents behind it. It's

0:03:48.450 --> 0:03:51.770
<v S1>effectively kind of like one entity or one person, which

0:03:51.770 --> 0:03:54.050
<v S1>is why we give it a name. You know, we're

0:03:54.050 --> 0:03:56.650
<v S1>treating it like a person, like a friend, right?

0:03:57.130 --> 0:04:02.420
<v S1>So ultimately it's encapsulated in one personality, in one sort

0:04:02.420 --> 0:04:06.660
<v S1>of entity. So Kai is constantly looking at your state

0:04:06.660 --> 0:04:09.180
<v S1>and trying to figure out how to make it better.

0:04:09.540 --> 0:04:13.340
<v S1>That's the core concept for the DA. That's what it's doing.

0:04:13.460 --> 0:04:16.419
<v S1>Are you hungry? Are you angry? Are you stressed out?

0:04:16.460 --> 0:04:19.300
<v S1>Do you have a meeting coming up, right, that it

0:04:19.300 --> 0:04:21.700
<v S1>needs to help you prepare for. And all of this

0:04:21.700 --> 0:04:24.380
<v S1>is proactive. You haven't even asked it anything yet.

0:04:24.420 --> 0:04:26.700
<v S1>So basically, these are going to do something I talked

0:04:26.700 --> 0:04:29.299
<v S1>about a couple of weeks ago, which is managing your

0:04:29.300 --> 0:04:35.300
<v S1>current state relative to your ideal state or your desired state. Right.

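NOTE
Editor's sketch, not from the talk: one rough way to picture "current state relative to desired state" is as a gap that the DA tries to close. Everything here (State, gaps, plan, ACTIONS, the 0-to-1 scales, the tolerance) is a made-up assumption for illustration, not a real assistant API.
```python
from dataclasses import dataclass
@dataclass
class State:
    hunger: float  # 0.0 = full, 1.0 = starving (assumed scale)
    stress: float  # 0.0 = calm, 1.0 = very stressed (assumed scale)
def gaps(current: State, desired: State, tolerance: float = 0.1) -> dict:
    """Return each dimension where current exceeds desired by more than tolerance."""
    found = {}
    for name in ("hunger", "stress"):
        diff = getattr(current, name) - getattr(desired, name)
        if diff > tolerance:
            found[name] = round(diff, 2)
    return found
# Hypothetical mapping from a gap to the kind of action a DA might take.
ACTIONS = {"hunger": "find food nearby", "stress": "check surroundings"}
def plan(current: State, desired: State) -> list:
    """List actions with the largest gap first, so the most pressing need leads."""
    found = gaps(current, desired)
    return [ACTIONS[name] for name in sorted(found, key=found.get, reverse=True)]
```
With these assumptions, plan(State(0.8, 0.2), State(0.2, 0.2)) returns only the food action, because hunger is the only dimension away from the desired state.
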
0:04:35.500 --> 0:04:39.659
<v S1>How could this current situation that I am in that

0:04:39.660 --> 0:04:42.700
<v S1>Kai is monitoring, how could that be better? So if

0:04:42.700 --> 0:04:45.860
<v S1>you're hungry, Kai will go find food. If you're

0:04:45.860 --> 0:04:48.140
<v S1>worried they're going to find camera feeds to like see

0:04:48.140 --> 0:04:50.380
<v S1>around corners, like if you're worried about your security or

0:04:50.380 --> 0:04:53.780
<v S1>you're walking around or something. Or they'll listen to scanners

0:04:53.779 --> 0:04:56.420
<v S1>and see if there's, like police activity nearby. Or look

0:04:56.460 --> 0:05:00.060
<v S1>at crime stats for like the, the neighborhood that you're

0:05:00.060 --> 0:05:02.980
<v S1>in, or whatever. And that's actually what all these

0:05:02.980 --> 0:05:05.779
<v S1>APIs are over here that you see, right. These are

0:05:05.779 --> 0:05:10.140
<v S1>all the different things that Kai, that your DA, will

0:05:10.140 --> 0:05:17.539
<v S1>have access to. So APIs are essentially the representations of

0:05:17.860 --> 0:05:22.660
<v S1>people and companies and services. Basically everything becomes an API.

0:05:23.100 --> 0:05:25.700
<v S1>And we're already seeing this as we're going to talk

0:05:25.700 --> 0:05:29.820
<v S1>about, we're already seeing this with MCP. This is actually

0:05:29.820 --> 0:05:32.500
<v S1>starting to happen. So what I said in 2016

0:05:32.500 --> 0:05:38.659
<v S1>was basically everything gets an API. Every person, objects, businesses,

0:05:38.660 --> 0:05:42.380
<v S1>most importantly people and businesses, but also other objects. That's

0:05:42.380 --> 0:05:45.140
<v S1>why I called it the real Internet of Things. Basically,

0:05:45.140 --> 0:05:48.620
<v S1>everything gets an API and you have your DA navigating

0:05:48.620 --> 0:05:52.979
<v S1>those APIs for you on your behalf, right? So all

0:05:52.980 --> 0:05:55.860
<v S1>these APIs you see here, uh, except there's going

0:05:55.860 --> 0:05:59.100
<v S1>to be millions of them, right? Eventually billions. But you

0:05:59.100 --> 0:06:02.260
<v S1>start off with thousands and then millions or whatever, but

0:06:02.260 --> 0:06:05.580
<v S1>every company will be an API. Every product will be

0:06:05.580 --> 0:06:10.059
<v S1>an API. People will be broadcasting APIs of ourselves, which

0:06:10.100 --> 0:06:12.620
<v S1>I call daemons. It's just a Greek word for

0:06:12.660 --> 0:06:16.660
<v S1>like soul, basically. And think of this like your own

0:06:16.660 --> 0:06:20.420
<v S1>personal like MCP server. And these are not designed to

0:06:20.420 --> 0:06:23.180
<v S1>be used by you or me. We can't read all

0:06:23.180 --> 0:06:27.300
<v S1>these APIs. You need help reading all these APIs. You

0:06:27.300 --> 0:06:29.420
<v S1>can't walk into a mall or walk into a city,

0:06:29.420 --> 0:06:32.220
<v S1>or walk down a road or whatever, and read all

0:06:32.220 --> 0:06:34.420
<v S1>the cars and the trees and all the people and

0:06:34.420 --> 0:06:37.020
<v S1>all the businesses. You can't do that. That's why you

0:06:37.020 --> 0:06:39.700
<v S1>need your DA to do that for you. Right. So

0:06:39.700 --> 0:06:43.700
<v S1>all these systems, all these APIs here and the agents

0:06:43.700 --> 0:06:46.339
<v S1>that sort of represent them, they're all designed to be

0:06:46.339 --> 0:06:51.380
<v S1>used by your DA, right? That's their purpose. So it's

0:06:51.380 --> 0:06:53.980
<v S1>like the interface to the world, like, changes. It's no

0:06:53.980 --> 0:06:56.940
<v S1>longer about what we see with our eyes, like Google

0:06:56.980 --> 0:07:01.099
<v S1>like old Google. Now it's about what do agents see?

0:07:01.140 --> 0:07:04.270
<v S1>What do DAs see, right? That's the world that starts to

0:07:04.270 --> 0:07:06.469
<v S1>matter a lot more. And a big part of this

0:07:06.470 --> 0:07:08.950
<v S1>is going to be a bunch of APIs that are

0:07:08.950 --> 0:07:14.990
<v S1>actually just concatenations or lists or directories of other APIs,

0:07:15.310 --> 0:07:17.870
<v S1>because one of the things that my DA has to do

0:07:17.870 --> 0:07:20.470
<v S1>is it has to ask, hey, what's the best restaurant

0:07:20.470 --> 0:07:24.910
<v S1>or whatever? And I've got a few here, right? Best food. Um, yeah.

0:07:24.950 --> 0:07:27.790
<v S1>Lookups or whatever. These will all just be, you know,

0:07:27.830 --> 0:07:30.870
<v S1>third party services that do nothing but crawl all the

0:07:30.870 --> 0:07:34.430
<v S1>other ones and rate them so that when Kai reaches

0:07:34.430 --> 0:07:36.590
<v S1>out and says, hey, I need to find the best food,

0:07:36.790 --> 0:07:40.750
<v S1>you know, within like three minutes, uh, close to this location,

0:07:40.750 --> 0:07:43.670
<v S1>but it can't have chicken in it or whatever. All

0:07:43.670 --> 0:07:46.550
<v S1>those different criteria, it can find the right one, right?

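NOTE
Editor's sketch, not from the talk: the lookup-service idea (find the best food within about three minutes, but it cannot have chicken) amounts to filtering on hard criteria and then ranking by rating. Listing, best_match, LISTINGS, and every field and value below are hypothetical; no real directory API is implied.
```python
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class Listing:
    name: str
    rating: float            # score a hypothetical rating service assigned
    walk_minutes: int        # travel time from the user's current location
    ingredients: frozenset   # what is in the food
def best_match(listings, max_minutes: int, exclude: set) -> Optional[Listing]:
    """Apply the hard criteria first, then return the highest-rated survivor."""
    allowed = [
        item for item in listings
        if item.walk_minutes <= max_minutes and not (item.ingredients & exclude)
    ]
    return max(allowed, key=lambda item: item.rating, default=None)
# Made-up sample data standing in for what a directory service might return.
LISTINGS = [
    Listing("Thai Spot", 4.8, 3, frozenset({"tofu", "rice"})),
    Listing("Wing House", 4.9, 2, frozenset({"chicken"})),
    Listing("Far Sushi", 5.0, 12, frozenset({"fish", "rice"})),
]
```
Calling best_match(LISTINGS, max_minutes=3, exclude={"chicken"}) picks "Thai Spot" here: the higher-rated options are ruled out by the ingredient and distance criteria before rating is even considered.
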
0:07:46.590 --> 0:07:48.830
<v S1>So there'll be a whole bunch of like lookup service

0:07:48.870 --> 0:07:52.030
<v S1>type of things like that. Okay. So that takes us

0:07:52.030 --> 0:07:54.310
<v S1>right into the third one which is agent. So we've

0:07:54.310 --> 0:07:56.430
<v S1>got a few agents here. And the way I like

0:07:56.430 --> 0:07:59.190
<v S1>to define an agent, there's lots of different definitions. I

0:07:59.190 --> 0:08:03.150
<v S1>think the agent should be super obvious, like from the definition,

0:08:03.190 --> 0:08:05.430
<v S1>like what it actually means and why it has value.

0:08:05.910 --> 0:08:08.790
<v S1>So I say it's an AI system component that autonomously

0:08:08.790 --> 0:08:11.990
<v S1>pursues a goal by taking multiple steps that previously would

0:08:12.070 --> 0:08:14.910
<v S1>have required a human. I think that is a really

0:08:14.910 --> 0:08:20.190
<v S1>good definition. Um, it's autonomous and it's taking a goal,

0:08:20.750 --> 0:08:24.230
<v S1>and it's pursuing that with multiple steps in a way

0:08:24.230 --> 0:08:27.270
<v S1>that only humans could do before. The part that makes

0:08:27.270 --> 0:08:30.390
<v S1>an agent different than automation, this is really important. This

0:08:30.390 --> 0:08:32.230
<v S1>is why I have it in the definition. The part

0:08:32.230 --> 0:08:35.950
<v S1>that makes it different is the fact that when a

0:08:35.950 --> 0:08:38.709
<v S1>human is trying to get something done, like say you're

0:08:38.710 --> 0:08:40.630
<v S1>an assistant for your boss or whatever, and you're trying

0:08:40.630 --> 0:08:43.270
<v S1>to get something done, like you call the first place.

0:08:43.270 --> 0:08:45.870
<v S1>They don't answer the phone. You call the second place.

0:08:45.870 --> 0:08:49.230
<v S1>The phone number doesn't work. Life is just broken, right? Like,

0:08:49.230 --> 0:08:52.030
<v S1>all these different steps are broken. Now, if you have automation,

0:08:52.030 --> 0:08:56.229
<v S1>automation is static, right? It's a whole bunch of if-thens.

0:08:56.350 --> 0:08:59.510
<v S1>Agents aren't if-then. They are: I have all these

0:08:59.510 --> 0:09:02.880
<v S1>tools available. I'm going to keep going. I'm going to keep,

0:09:02.920 --> 0:09:07.720
<v S1>you know, exhausting my resources, trying different things to try

0:09:07.720 --> 0:09:11.120
<v S1>to get it done right. I will, you know, maybe,

0:09:11.120 --> 0:09:13.360
<v S1>maybe none of the things work. So I'm going to

0:09:13.360 --> 0:09:16.600
<v S1>do more research to find another, uh, API that I

0:09:16.600 --> 0:09:20.560
<v S1>could use or another service to find this person pizza.

0:09:20.600 --> 0:09:24.000
<v S1>Sarah wants pizza. I'm going to get Sarah pizza, and

0:09:24.000 --> 0:09:26.600
<v S1>she's going to do, you know, the DA is going

0:09:26.600 --> 0:09:28.480
<v S1>to do multiple things to make sure that she gets

0:09:28.480 --> 0:09:31.600
<v S1>that pizza. That's the difference between automation and agent. So

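NOTE
Editor's sketch, not from the talk: a toy contrast between static automation and the keep-trying behavior described above. The Tool type, automation, agent, find_more_tools, and the two sample tools are all hypothetical names. Real agents typically use a model to choose the next step; this only illustrates the control flow of trying, failing, researching, and trying again.
```python
from typing import Callable, Optional
Tool = Callable[[str], Optional[str]]  # takes a goal, returns a result or None on failure
def automation(goal: str, tool: Tool) -> Optional[str]:
    """Static path: one fixed step. If that step breaks, the task just fails."""
    return tool(goal)
def agent(goal: str, tools: list, find_more_tools: Callable[[], list]) -> Optional[str]:
    """Try every known tool; if all fail, research new tools once and try those."""
    queue = list(tools)
    researched = False
    while True:
        for tool in queue:
            result = tool(goal)
            if result is not None:
                return result
        if researched:
            return None  # resources exhausted, give up
        queue, researched = find_more_tools(), True
def dead_phone_line(goal: str) -> Optional[str]:
    return None  # nobody answers the phone
def delivery_api(goal: str) -> Optional[str]:
    return f"ordered: {goal}"
```
With the same broken first step, automation("pizza for Sarah", dead_phone_line) returns None, while agent("pizza for Sarah", [dead_phone_line], lambda: [delivery_api]) goes on to find another route and completes the order.
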
0:09:31.600 --> 0:09:35.160
<v S1>that's our definition here. And we see that our DA

0:09:35.240 --> 0:09:39.080
<v S1>actually has the use of multiple agents. These agents might

0:09:39.080 --> 0:09:43.120
<v S1>be like researchers. They might be like security bots uh,

0:09:43.120 --> 0:09:46.079
<v S1>to lock down your infrastructure. They could be whatever. But

0:09:46.080 --> 0:09:50.040
<v S1>they all kind of work for Kai, right? Kai is, like,

0:09:50.040 --> 0:09:52.120
<v S1>the centerpiece here. And this is going to be a

0:09:52.120 --> 0:09:55.160
<v S1>theme we're going to see throughout. Agents all over the place,

0:09:55.160 --> 0:09:58.240
<v S1>including inside of companies like we have over here with like,

0:09:58.280 --> 0:10:02.000
<v S1>United or whatever. It's the concept of you're talking to

0:10:02.040 --> 0:10:04.360
<v S1>one agent, but behind it, it has a whole bunch

0:10:04.360 --> 0:10:07.199
<v S1>of other agents. So you give it the goal and

0:10:07.200 --> 0:10:09.880
<v S1>it breaks that down into sub goals and gives that

0:10:09.880 --> 0:10:12.320
<v S1>to the smaller agents, which are then doing the other

0:10:12.320 --> 0:10:16.319
<v S1>things like building a marketing campaign, uh, hacking a website,

0:10:16.320 --> 0:10:20.320
<v S1>doing whatever it is. Right. So that's really the concept

0:10:20.320 --> 0:10:24.200
<v S1>of agents. And Google actually just released a thing called

0:10:24.200 --> 0:10:26.400
<v S1>Agent2Agent, I think was the name of it.

0:10:26.400 --> 0:10:28.440
<v S1>And what it does is it makes it so that

0:10:28.440 --> 0:10:31.319
<v S1>all these different agents here, they could talk to each

0:10:31.320 --> 0:10:34.880
<v S1>other with a common protocol, which is very similar to MCP,

0:10:35.559 --> 0:10:39.000
<v S1>where it's a common protocol for creating APIs for an

0:10:39.000 --> 0:10:43.440
<v S1>application or, you know, a company or whatever. So we're

0:10:43.440 --> 0:10:47.040
<v S1>starting to see the glue, the protocol glue that's going

0:10:47.080 --> 0:10:49.600
<v S1>to make all this stuff possible with this agent-to-agent

0:10:49.600 --> 0:10:53.760
<v S1>protocol and MCP and stuff like that. So the final

0:10:53.760 --> 0:10:57.120
<v S1>piece here, so we've got the DA, we've

0:10:57.120 --> 0:11:00.480
<v S1>got the assistant, we've got the APIs, we've got the agents.

0:11:00.880 --> 0:11:04.920
<v S1>So the final piece or the final A here in

0:11:04.920 --> 0:11:09.400
<v S1>the four A's is AR or augmented reality. And this

0:11:09.400 --> 0:11:11.200
<v S1>is the one you might be thinking is fringe or

0:11:11.200 --> 0:11:13.640
<v S1>it's like ten years away, but it's actually much closer.

0:11:14.120 --> 0:11:17.160
<v S1>Meta and Apple are currently fighting over this now. Tim

0:11:17.160 --> 0:11:19.800
<v S1>Cook just recently said, look, I'm not going to let

0:11:19.800 --> 0:11:22.760
<v S1>anyone beat us here. They want to beat Meta at

0:11:22.760 --> 0:11:27.240
<v S1>this game. Meta already has really good glasses. Um, they're

0:11:27.240 --> 0:11:29.760
<v S1>not actually displaying anything inside of it, but you can

0:11:29.760 --> 0:11:33.720
<v S1>see out of it, it takes pictures, like, it's pretty decent.

0:11:33.880 --> 0:11:36.719
<v S1>And obviously it's not big and heavy and super expensive

0:11:36.720 --> 0:11:39.760
<v S1>like the Vision Pro. So that is a battle that

0:11:39.760 --> 0:11:44.560
<v S1>is happening right now. So we are all eventually I

0:11:44.559 --> 0:11:46.120
<v S1>don't know how long this is going to take. It's

0:11:46.120 --> 0:11:50.230
<v S1>hard to make like specific predictions. Right. So 2 to

0:11:50.230 --> 0:11:54.080
<v S1>5 years, who knows. It's going to be something relatively soon.

0:11:54.280 --> 0:11:57.160
<v S1>Meta or Apple or maybe someone comes out of the

0:11:57.160 --> 0:12:00.680
<v S1>dark and just kind of crushes this. Who knows? But

0:12:00.679 --> 0:12:02.570
<v S1>the point is, we're all going to have these AR

0:12:02.650 --> 0:12:06.170
<v S1>glasses eventually, like contact lenses or something better than that.

0:12:06.170 --> 0:12:09.010
<v S1>But it's going to start off with glasses. And here's

0:12:09.010 --> 0:12:11.530
<v S1>the trick. This is how the whole ecosystem starts to

0:12:11.570 --> 0:12:14.370
<v S1>come together. Our DAs are going to be showing us

0:12:14.370 --> 0:12:19.849
<v S1>stuff that is time based, and that is contextually relevant

0:12:19.850 --> 0:12:22.770
<v S1>to whatever we're doing at that moment. Remember, our DA

0:12:22.770 --> 0:12:25.850
<v S1>is trying to optimize everything according to our goals. It's

0:12:25.850 --> 0:12:28.730
<v S1>trying to get to our desired state from our current state.

0:12:28.890 --> 0:12:31.530
<v S1>So if we're walking down a street like this here,

0:12:31.770 --> 0:12:33.809
<v S1>we're walking down a street and like we think it's

0:12:33.809 --> 0:12:37.530
<v S1>kind of dangerous. Yeah, it's going to present this interface here,

0:12:37.530 --> 0:12:40.809
<v S1>which I've got over here coming from this, this daemon

0:12:41.290 --> 0:12:44.530
<v S1>called Bastion, which is really just a company. It's

0:12:44.530 --> 0:12:47.250
<v S1>a company. It's called Bastion. And they have feeds called

0:12:47.250 --> 0:12:52.250
<v S1>get feed, poll cameras, poll microphones, query personal mics, get

0:12:52.250 --> 0:12:57.250
<v S1>local CCTV. Right. So maybe it could pull all the different, um,

0:12:57.570 --> 0:13:00.730
<v S1>people who are broadcasting their feed because people are going

0:13:00.730 --> 0:13:04.689
<v S1>to be wearing cameras as well. This is coming soon. Uh, basically.

0:13:04.850 --> 0:13:09.250
<v S1>Camera ahead of you. Camera behind you. And maybe

0:13:09.250 --> 0:13:14.090
<v S1>you sell your camera feed to Bastion. People will do this.

0:13:14.090 --> 0:13:16.809
<v S1>Trust me. It's going to happen. People are going to

0:13:16.809 --> 0:13:20.290
<v S1>sell their camera feeds to Bastion. Right. It's not going

0:13:20.290 --> 0:13:22.250
<v S1>to be for private stuff. Like it's going to get

0:13:22.250 --> 0:13:24.329
<v S1>turned off when you go home. Stuff like that. You

0:13:24.330 --> 0:13:26.970
<v S1>shouldn't trust that. You should also like cover the camera

0:13:26.970 --> 0:13:29.010
<v S1>or whatever. But the point is, if you're sitting in

0:13:29.010 --> 0:13:33.089
<v S1>Starbucks or whatever and say a fight altercation happens or

0:13:33.090 --> 0:13:36.610
<v S1>something like that, Bastion will be able to show that

0:13:36.610 --> 0:13:39.090
<v S1>to the police or show that to somebody else who's

0:13:39.090 --> 0:13:43.089
<v S1>worried about it. So my DA, while I'm walking down

0:13:43.090 --> 0:13:48.450
<v S1>the street, right? I'm walking down the street here, it's like, oh,

0:13:48.570 --> 0:13:52.450
<v S1>this neighborhood feels unsafe. That's what I'm saying. I'm

0:13:52.450 --> 0:13:55.250
<v S1>saying this neighborhood feels unsafe or it hears me say

0:13:55.250 --> 0:13:57.850
<v S1>something in a conversation where I'm just like, I don't know,

0:13:57.850 --> 0:14:00.929
<v S1>it's kind of sketchy. I'm a little worried, right? I

0:14:00.929 --> 0:14:04.020
<v S1>say anything like that or even before I say it,

0:14:04.620 --> 0:14:08.540
<v S1>the DA, Kai, goes out and looks at one of

0:14:08.540 --> 0:14:12.060
<v S1>these services to find the best security interface, the best

0:14:12.059 --> 0:14:16.540
<v S1>one for parsing feeds, uh, giving, you know, real time

0:14:16.580 --> 0:14:19.780
<v S1>HUD data and stuff like that. So it gets one back.

0:14:19.780 --> 0:14:23.540
<v S1>It's called Bastion. So it starts pulling stuff like that,

0:14:23.940 --> 0:14:27.860
<v S1>it gets back the content. It then goes to another interface,

0:14:28.220 --> 0:14:32.100
<v S1>which is a whole separate company, which is the UI

0:14:32.460 --> 0:14:35.580
<v S1>for this content. Okay. You see these red. This is

0:14:35.580 --> 0:14:39.020
<v S1>a great example. But like let's say there's data here right.

0:14:39.060 --> 0:14:42.020
<v S1>Let's say there's like, oh, how many people are around. Um,

0:14:42.060 --> 0:14:46.340
<v S1>is anyone wearing a weapon. Let's do like gait analysis

0:14:46.340 --> 0:14:48.740
<v S1>to see if they're leaning because they're carrying a gun

0:14:48.740 --> 0:14:51.940
<v S1>or something like that. Right. All this stuff, all these

0:14:51.940 --> 0:14:57.620
<v S1>different individual pieces, different companies are better at, okay, somebody

0:14:57.620 --> 0:15:01.660
<v S1>is better at making this red, cool looking interface. Somebody

0:15:01.660 --> 0:15:05.300
<v S1>is better at doing voice analysis of microphones coming from

0:15:05.300 --> 0:15:10.220
<v S1>all around you. Somebody is better at doing camera analysis of, like,

0:15:10.220 --> 0:15:13.580
<v S1>all the different dangers on the street. All of those

0:15:13.820 --> 0:15:18.220
<v S1>are these right here. This is what every company becomes.

0:15:18.220 --> 0:15:21.780
<v S1>It becomes a specialized thing at doing a thing better

0:15:21.780 --> 0:15:28.220
<v S1>than everyone else, all judged by these indexing services, these

0:15:28.220 --> 0:15:32.900
<v S1>rating services which are marketing to your DA. It is

0:15:32.900 --> 0:15:36.700
<v S1>marketing to Kai. So when I'm walking down the street

0:15:36.700 --> 0:15:39.300
<v S1>and I say, hey, show me what's going on around

0:15:39.300 --> 0:15:41.820
<v S1>me or something like that, or I don't even have

0:15:41.820 --> 0:15:44.220
<v S1>to say it. It just knows I'm freaking out. Why?

0:15:44.380 --> 0:15:47.780
<v S1>Because Kai can see my heart rate. Kai can see

0:15:47.780 --> 0:15:50.620
<v S1>that we're in a place I've never been. Um, somebody

0:15:50.620 --> 0:15:52.820
<v S1>is laying on the street with, like, a needle sticking

0:15:52.820 --> 0:15:56.020
<v S1>out of their arm. Kai figures out this is kind

0:15:56.060 --> 0:15:58.300
<v S1>of seedy. It's a little bit dangerous. I don't like it.

0:15:58.620 --> 0:16:03.990
<v S1>And obviously, my principal, Daniel, doesn't like it either. Therefore,

0:16:04.350 --> 0:16:09.550
<v S1>boom, goes and searches, finds Bastion, finds a UI. The

0:16:09.550 --> 0:16:12.869
<v S1>best UI. Okay, the best UI is called UI Wizard.

0:16:13.310 --> 0:16:16.550
<v S1>Not too creative, but whatever it's called UI Wizard. UI

0:16:16.590 --> 0:16:19.990
<v S1>Wizard pops up. That's this red interface, and it starts

0:16:19.990 --> 0:16:23.510
<v S1>filling in data. Where'd the data come from? From Bastion,

0:16:23.510 --> 0:16:27.270
<v S1>it came from the Bastion service. Where does that go?

0:16:27.470 --> 0:16:32.150
<v S1>This interface is in these glasses, which is on my face.

0:16:32.150 --> 0:16:35.710
<v S1>So now watch this. We've got other scenarios here. Okay?

0:16:36.190 --> 0:16:39.670
<v S1>You start browsing for headphones. Your DA does this. It

0:16:39.670 --> 0:16:43.550
<v S1>uses these services and it gives you back a response.

0:16:43.550 --> 0:16:46.470
<v S1>So I'm looking for headphones. It goes and investigates all

0:16:46.470 --> 0:16:49.950
<v S1>these different things. You mention to your friend that you're getting hungry.

0:16:50.110 --> 0:16:55.230
<v S1>It goes and researches all these different best food places,

0:16:55.510 --> 0:16:58.950
<v S1>parses all 713 different places in like a second and

0:16:58.950 --> 0:17:02.310
<v S1>a half. Gets back the results. Hey, you haven't had

0:17:02.430 --> 0:17:04.670
<v S1>Thai in a while. There's a great little place with

0:17:04.670 --> 0:17:07.629
<v S1>super high ratings if you take a right in two blocks.

0:17:07.990 --> 0:17:11.949
<v S1>I can call in an order if you want. Right.

0:17:11.990 --> 0:17:16.510
<v S1>This is the model. These four A's. This is the model.

0:17:16.510 --> 0:17:20.350
<v S1>This is where this is all heading. This is the direction, right?

0:17:20.550 --> 0:17:25.389
<v S1>So I'm telling you. I'm telling you this. This is

0:17:25.390 --> 0:17:31.030
<v S1>what's happening. It is absolutely exciting to see this starting

0:17:31.070 --> 0:17:34.430
<v S1>to unfold. Right. There's a million different companies working on

0:17:34.430 --> 0:17:39.070
<v S1>this part. There's multiple companies working on the AR glasses part.

0:17:39.109 --> 0:17:42.470
<v S1>Everything is turning into an API already. This is MCP

0:17:42.510 --> 0:17:46.109
<v S1>over here. This is the unification of it. And then

0:17:46.109 --> 0:17:48.590
<v S1>of course, over here we have what's happening on the

0:17:48.590 --> 0:17:52.350
<v S1>corporate side where agents are basically going to be doing

0:17:52.350 --> 0:17:55.310
<v S1>a whole bunch of work. You'll have humans kind

0:17:55.310 --> 0:17:57.909
<v S1>of in charge of things. The leaders and the in

0:17:57.950 --> 0:18:01.470
<v S1>the extreme, SMEs will be human for quite some time,

0:18:01.470 --> 0:18:03.750
<v S1>I think. I mean, it's going to be pretty hard

0:18:03.750 --> 0:18:06.030
<v S1>to automate everyone away, but a lot of the work

0:18:06.030 --> 0:18:08.710
<v S1>that was getting done is going to be getting done

0:18:08.710 --> 0:18:12.470
<v S1>by agents and teams of agents. So that's agents inside

0:18:12.470 --> 0:18:16.630
<v S1>the corporate place. But as far as the consumer side,

0:18:16.630 --> 0:18:20.150
<v S1>as far as the stuff that you're seeing, like in the,

0:18:20.950 --> 0:18:24.709
<v S1>you know, the OpenAI and Anthropic and most of the

0:18:24.710 --> 0:18:27.670
<v S1>stuff that they're talking about is mostly about the consumer

0:18:27.670 --> 0:18:33.590
<v S1>and stuff like that. This is it. This is the structure. Okay.

0:18:33.630 --> 0:18:36.590
<v S1>So another example of this is like let's say you're

0:18:36.590 --> 0:18:39.670
<v S1>in like a live conversation and you're having a conversation

0:18:39.670 --> 0:18:43.550
<v S1>with somebody and it's like somebody you've never met before

0:18:43.670 --> 0:18:45.830
<v S1>and you're considering whether to go into business with them

0:18:45.830 --> 0:18:48.110
<v S1>or whatever, and they're making a whole bunch of claims.

0:18:48.109 --> 0:18:50.510
<v S1>They're like, oh, yeah, I used to work with so-and-so

0:18:50.510 --> 0:18:52.390
<v S1>and blah blah, blah. And actually I helped him start

0:18:52.390 --> 0:18:56.510
<v S1>his business and, uh, yeah. Do you know Sarah? Yeah. Sarah.

0:18:56.670 --> 0:18:58.750
<v S1>You know, I went to college with her and blah, blah, blah.

0:18:59.590 --> 0:19:03.440
<v S1>So again, you're wearing glasses. Everyone's wearing glasses. The person

0:19:03.440 --> 0:19:07.359
<v S1>you're talking to is actually wearing these glasses as well,

0:19:07.840 --> 0:19:11.120
<v S1>and you're having this conversation. But the whole time

0:19:11.119 --> 0:19:15.040
<v S1>you're wondering like, is this actually true? Is all the

0:19:15.040 --> 0:19:18.440
<v S1>stuff that this person claimed happened or the people that

0:19:18.440 --> 0:19:21.200
<v S1>they claim they know or whatever? Is this all true? Right.

0:19:21.320 --> 0:19:25.560
<v S1>So what will happen is you'll have like something going

0:19:25.560 --> 0:19:28.880
<v S1>off to voice analysis. This depends how many things you're

0:19:28.880 --> 0:19:31.320
<v S1>subscribed to. It depends how far along we are in

0:19:31.320 --> 0:19:34.960
<v S1>this cycle, you know, what all your DA can actually do.

0:19:34.960 --> 0:19:37.800
<v S1>But this is all sort of being built

0:19:37.800 --> 0:19:42.200
<v S1>right now. So like, if there's tension in their voice, like, uh,

0:19:42.200 --> 0:19:44.840
<v S1>analyzing the claims that they're making, doing research on it,

0:19:44.880 --> 0:19:47.720
<v S1>did they actually go to college? Did Sarah and

0:19:47.720 --> 0:19:50.000
<v S1>this person actually were they in college at the same time?

0:19:50.000 --> 0:19:52.119
<v S1>That should be on LinkedIn. Let's go find that out.

0:19:52.720 --> 0:19:54.840
<v S1>So if you're waiting on a delivery and this goes

0:19:54.840 --> 0:19:56.720
<v S1>back to the AR side, if you're waiting on the

0:19:56.720 --> 0:20:00.480
<v S1>delivery, you will see a timer counting down, right? Just

0:20:00.480 --> 0:20:03.760
<v S1>like you have on your phone. Now that will

0:20:03.760 --> 0:20:05.760
<v S1>be in your interface so you don't have to pick

0:20:05.760 --> 0:20:08.480
<v S1>up a phone. The whole point with AR glasses

0:20:08.680 --> 0:20:11.840
<v S1>is to have to do much less with an actual

0:20:11.840 --> 0:20:14.480
<v S1>physical device that you pick up and have to interact with.

0:20:14.640 --> 0:20:17.560
<v S1>It'll be a lot more visual here, and you're

0:20:17.560 --> 0:20:21.000
<v S1>just talking to your DA, and your DA is doing

0:20:21.000 --> 0:20:24.440
<v S1>most of the work itself using all these different services, right?

0:20:25.000 --> 0:20:29.359
<v S1>So where is the data coming from? Right. How is

0:20:29.359 --> 0:20:32.520
<v S1>it being displayed in the glasses? That's exactly what we

0:20:32.520 --> 0:20:36.160
<v S1>talked about earlier. The data is coming from these services

0:20:36.840 --> 0:20:40.360
<v S1>moving through a UI being displayed in the glasses. Right.

0:20:41.000 --> 0:20:43.199
<v S1>So in the case of like trying to determine if

0:20:43.200 --> 0:20:46.560
<v S1>someone is lying, right? Let's say there's like a

0:20:46.560 --> 0:20:50.040
<v S1>lie-o-meter interface inside of this little UI here inside

0:20:50.040 --> 0:20:54.439
<v S1>your AR. Well, somebody is providing that lie-o-meter interface, right?

0:20:54.480 --> 0:20:57.800
<v S1>Some one of these companies is actually doing the

0:20:57.800 --> 0:21:02.240
<v S1>voice analysis, uh, providing back data. And that's just like

0:21:02.330 --> 0:21:07.050
<v S1>JSON coming back. That is like, um, saying, like the

0:21:07.090 --> 0:21:10.170
<v S1>chances of them lying about this particular thing, like,

0:21:10.210 --> 0:21:14.290
<v S1>according to voice analysis or like, according to the research

0:21:14.290 --> 0:21:17.850
<v S1>that was done. Right. All these things can be combined together.

0:21:17.890 --> 0:21:21.330
<v S1>Like that stuff could actually just be returned raw back

0:21:21.330 --> 0:21:24.770
<v S1>to Chi, and Chi could look at all that and

0:21:24.770 --> 0:21:29.370
<v S1>send another feed into the UI to update that lie-o-meter, right?

0:21:29.490 --> 0:21:32.730
<v S1>There's so many options here because these are APIs,

0:21:32.770 --> 0:21:35.170
<v S1>this is just JSON, or whatever the protocol is going

0:21:35.210 --> 0:21:38.370
<v S1>to be flowing back and forth, right? That's the power

0:21:38.369 --> 0:21:41.850
<v S1>of this entire thing. So let's zoom out again and

0:21:41.850 --> 0:21:44.730
<v S1>let's just take a look at this entire thing. Okay.
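
NOTE
A minimal sketch of the JSON flow described above, assuming each service
returns a small JSON report and the DA simply averages the estimates before
updating the lie-o-meter. The field names (source, claim, p_false), the
widget name, and the averaging rule are all hypothetical, not from the talk.
```python
import json
# Hypothetical JSON payloads, one per service the DA is subscribed to.
VOICE_JSON = '{"source": "voice_analysis", "claim": "went to college with Sarah", "p_false": 0.30}'
RESEARCH_JSON = '{"source": "research", "claim": "went to college with Sarah", "p_false": 0.10}'
def combine(payloads: list[str]) -> dict:
    """Parse each service's JSON and average the estimates into one UI update."""
    reports = [json.loads(p) for p in payloads]
    # Assumption: a plain average; a real DA could weight sources differently.
    score = sum(r["p_false"] for r in reports) / len(reports)
    return {
        "widget": "lie_o_meter",
        "claim": reports[0]["claim"],
        "score": round(score, 2),
        "sources": [r["source"] for r in reports],
    }
```
Calling combine([VOICE_JSON, RESEARCH_JSON]) yields the single update the
DA would feed into the AR interface.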

0:21:44.970 --> 0:21:48.090
<v S1>Again just as a review here. Look at the four A's.

0:21:48.250 --> 0:21:52.010
<v S1>Chi knows what you want at all times. Constantly calling APIs,

0:21:52.010 --> 0:21:55.689
<v S1>making requests, summarizing things, creating reports for you, researching the

0:21:55.690 --> 0:21:58.810
<v S1>best options, anticipating your needs throughout the day and week

0:21:58.810 --> 0:22:02.530
<v S1>and month or year or whatever. Adjusting your AR interface.

0:22:02.609 --> 0:22:06.810
<v S1>Constantly switching. Look, you're in the house. Maybe you

0:22:06.810 --> 0:22:09.570
<v S1>see your books. Maybe you see how hungry everyone is.

0:22:09.570 --> 0:22:13.010
<v S1>Like, this interface is constantly changing. Chi is changing

0:22:13.010 --> 0:22:15.810
<v S1>it for you using these different APIs. Right. And the

0:22:15.810 --> 0:22:18.689
<v S1>interface that Chi is using will be coming from multiple

0:22:18.690 --> 0:22:23.210
<v S1>companies as well. Maybe, maybe Chi's generic interface actually gets

0:22:23.250 --> 0:22:27.169
<v S1>good enough. Maybe one of these companies is a generic

0:22:27.170 --> 0:22:31.929
<v S1>interface creator, so you don't actually have to use individual ones. Right.
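
NOTE
A minimal sketch of the interface switching described above, assuming the DA
keeps a lookup from the current context to a set of HUD widgets and falls
back to a generic interface when no specific one exists. The context keys
and widget names are hypothetical, chosen only for illustration.
```python
# Hypothetical HUD layouts keyed by context.
HUDS = {
    "home": ["books", "family_hunger"],
    "conversation": ["lie_o_meter"],
}
# Assumption: a generic interface used when no context-specific HUD is known.
GENERIC_HUD = ["notifications"]
def pick_hud(context: str) -> list[str]:
    """Return the widgets for a context, falling back to the generic interface."""
    return HUDS.get(context, GENERIC_HUD)
```
For example, pick_hud("home") selects the books and hunger widgets, while an
unknown context gets the generic interface.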

0:22:31.970 --> 0:22:34.770
<v S1>And Chi just switches using that one. So when you're

0:22:34.770 --> 0:22:37.369
<v S1>looking at books you're inside of a library. It's a

0:22:37.369 --> 0:22:39.490
<v S1>different HUD. You know, walking down the street it's a

0:22:39.490 --> 0:22:42.650
<v S1>different HUD. Talking to someone, different HUD. Right. So if

0:22:42.690 --> 0:22:45.010
<v S1>you take a step back and you look at this

0:22:45.010 --> 0:22:48.169
<v S1>interface here, right, think of all the news that you've

0:22:48.170 --> 0:22:51.570
<v S1>heard from the last few years in AI. Think about

0:22:51.570 --> 0:22:54.170
<v S1>the latest news from OpenAI. They just added like long

0:22:54.250 --> 0:22:59.690
<v S1>term memory, where it's going to remember all your previous conversations. Right. Um,

0:22:59.730 --> 0:23:03.690
<v S1>think about digital companion companies, uh, digital helper apps that

0:23:03.690 --> 0:23:06.130
<v S1>will like go and do tasks for you. Siri and

0:23:06.130 --> 0:23:09.889
<v S1>Gemini on the mobile device. And of course, you all

0:23:09.890 --> 0:23:12.330
<v S1>heard the stuff about Siri. Right? Where they're trying to

0:23:12.369 --> 0:23:14.890
<v S1>make this thing a better assistant. They're trying to give

0:23:14.890 --> 0:23:18.010
<v S1>it access to more and more data about you. Right.

0:23:18.330 --> 0:23:21.010
<v S1>And they're trying to do it in a secure way, obviously.

0:23:21.250 --> 0:23:23.930
<v S1>And Gemini is competing in that space as well. Samsung

0:23:23.930 --> 0:23:27.730
<v S1>has their own version. Right. This is all like heading

0:23:27.730 --> 0:23:30.410
<v S1>in that direction of like the DA. All of them

0:23:30.410 --> 0:23:34.290
<v S1>are heading in this direction of the unified DA, right? That's

0:23:34.290 --> 0:23:37.090
<v S1>the easiest way to see this. So now think about

0:23:37.090 --> 0:23:41.050
<v S1>all the news around MCP and APIs and how they'll

0:23:41.050 --> 0:23:43.090
<v S1>be able to talk to each other with the agent-to-agent

0:23:43.130 --> 0:23:46.730
<v S1>protocol and all of those different things. Basically, that's

0:23:46.730 --> 0:23:49.610
<v S1>what we're talking about with everything that's an API. And

0:23:49.609 --> 0:23:51.530
<v S1>you already hear all the talk about agents. That's all

0:23:51.530 --> 0:23:54.210
<v S1>everyone talks about now. And then you think about the

0:23:54.210 --> 0:23:58.650
<v S1>news about Meta and Apple fighting about AR glasses. And

0:23:58.650 --> 0:24:03.300
<v S1>that's this piece over here. Right. So everyone is moving

0:24:03.540 --> 0:24:07.260
<v S1>towards these four A's. This is it. This is the

0:24:07.260 --> 0:24:11.540
<v S1>ecosystem that we're creating. It's absolutely insane. And it's starting

0:24:11.540 --> 0:24:14.820
<v S1>to fill in. And like again, now that you've seen it,

0:24:14.820 --> 0:24:17.740
<v S1>I think you're going to realize that all the new

0:24:17.740 --> 0:24:20.740
<v S1>news is just filling in pieces of this and eventually

0:24:20.740 --> 0:24:23.939
<v S1>getting us to this. And it's very sort of cyberpunk-y

0:24:23.980 --> 0:24:27.620
<v S1>and very future-oriented. But so many of these

0:24:27.619 --> 0:24:32.140
<v S1>pieces already exist. Like, this is just HTTP going

0:24:32.140 --> 0:24:35.180
<v S1>back and forth. Like, these protocols are not too difficult.

0:24:35.180 --> 0:24:37.860
<v S1>The only difficult part right now is just like the

0:24:37.859 --> 0:24:41.460
<v S1>hardware challenge of like the AR stuff is really difficult,

0:24:41.780 --> 0:24:44.340
<v S1>and that's probably the thing that's going to take the longest.

0:24:44.540 --> 0:24:48.060
<v S1>But agents are getting better. Like the AI itself is

0:24:48.060 --> 0:24:52.420
<v S1>getting smarter and smarter. Like context size is a huge

0:24:52.420 --> 0:24:56.140
<v S1>one and memory is a huge one that is dramatically

0:24:56.140 --> 0:24:59.700
<v S1>getting better. Like GPT-4.1 now has a million tokens. Um,

0:25:00.460 --> 0:25:04.900
<v S1>a couple of the Google ones now have 4 million tokens,

0:25:04.900 --> 0:25:07.500
<v S1>I think, for context. So all of this is starting

0:25:07.500 --> 0:25:09.820
<v S1>to come together. All right. So anyway, that's what I

0:25:09.820 --> 0:25:13.300
<v S1>wanted to share. And I think this is a really

0:25:13.300 --> 0:25:16.140
<v S1>powerful way of just interpreting what's coming in in terms

0:25:16.140 --> 0:25:19.619
<v S1>of the news and putting that into a context and

0:25:19.660 --> 0:25:21.820
<v S1>showing how it fits into a model. I think it's

0:25:21.820 --> 0:25:24.379
<v S1>just useful to be able to parse things in that

0:25:24.380 --> 0:25:27.020
<v S1>way and make sense of it. Keep in mind, we

0:25:27.020 --> 0:25:30.220
<v S1>don't actually know the details of any of these pieces, right?

0:25:30.500 --> 0:25:33.340
<v S1>Is MCP going to win? Who knows? That new agent

0:25:33.340 --> 0:25:35.460
<v S1>to agent protocol that Google came out with? Is that

0:25:35.460 --> 0:25:38.419
<v S1>actually going to be useful? Who knows. Like it might

0:25:38.420 --> 0:25:41.820
<v S1>not be useful at all. Like nobody could adopt it.

0:25:41.820 --> 0:25:45.540
<v S1>Like who's going to win with the glasses, right. Is

0:25:45.540 --> 0:25:46.980
<v S1>it going to be Apple? Is it going

0:25:46.980 --> 0:25:49.740
<v S1>to be Meta? Is it going to be someone completely separate?

0:25:49.780 --> 0:25:53.740
<v S1>I have no idea. Right. These things are not super predictable.

0:25:54.260 --> 0:25:59.180
<v S1>I believe that my 2016 like outlook and predictions from

0:25:59.180 --> 0:26:02.260
<v S1>back then, it's human based. It's based on the fact

0:26:02.260 --> 0:26:04.949
<v S1>that I know what I want my agent to be

0:26:04.950 --> 0:26:08.350
<v S1>able to do. And you know, I've been in tech

0:26:08.350 --> 0:26:11.270
<v S1>for so long that I understand all these protocols and stuff,

0:26:11.270 --> 0:26:13.750
<v S1>so I just assumed this is the way it was going.

0:26:13.830 --> 0:26:15.990
<v S1>And it turns out to be happening. But you can't

0:26:15.990 --> 0:26:19.230
<v S1>predict the companies. You can't predict the timelines, you can't

0:26:19.230 --> 0:26:22.030
<v S1>predict any of this. So that's what makes it so exciting.

0:26:22.310 --> 0:26:24.510
<v S1>I just hope that this model helps you sort of

0:26:24.550 --> 0:26:27.510
<v S1>make sense of the news as it comes in, and

0:26:27.510 --> 0:26:30.950
<v S1>hopefully moves it in this direction towards this model, and

0:26:30.950 --> 0:26:33.470
<v S1>just makes it easier for you to parse the news

0:26:33.630 --> 0:26:36.710
<v S1>and make sense of it. So, uh, do me a

0:26:36.710 --> 0:26:39.429
<v S1>favor and subscribe and I'll see you in the next one.

0:26:40.230 --> 0:26:44.190
<v S1>Unsupervised Learning is produced on Hindenburg Pro using an SM7B

0:26:44.230 --> 0:26:47.870
<v S1>microphone. A video version of the podcast is available

0:26:47.869 --> 0:26:51.510
<v S1>on the Unsupervised Learning YouTube channel, and the text version

0:26:51.510 --> 0:26:56.630
<v S1>with full links and notes is available at danielmiessler.com/newsletter.

0:26:57.230 --> 0:26:58.229
<v S1>We'll see you next time.