WEBVTT - Hallucinating with AI 0:00:04.440 --> 0:00:12.399 Welcome to tech Stuff, a production from iHeartRadio. Hey there, 0:00:12.440 --> 0:00:16.240 and welcome to tech Stuff. I'm your host, Jonathan Strickland. 0:00:16.280 --> 0:00:19.760 I'm an executive producer with iHeartRadio and how the tech 0:00:19.800 --> 0:00:23.480 are you. At the beginning of this year, that being 0:00:23.920 --> 0:00:26.360 twenty twenty three, I said like it felt like it 0:00:26.400 --> 0:00:28.520 was going to be the year of AI, and so 0:00:28.600 --> 0:00:31.319 far I think I'm pretty much on the money. But 0:00:31.800 --> 0:00:35.680 more specifically, twenty twenty three has been the year of 0:00:36.200 --> 0:00:43.320 generative AI. That is artificial intelligence that creates or generates something, 0:00:43.600 --> 0:00:47.199 whether it's an image, a sound, or as we're going 0:00:47.280 --> 0:00:51.680 to talk about today, text in response to some sort 0:00:51.840 --> 0:00:55.480 of input. Now, before we go any further, this is 0:00:55.480 --> 0:00:57.800 where we need to remind ourselves that while this is 0:00:57.920 --> 0:01:03.120 a type of artificial intelligence, it's not all of AI. 0:01:03.760 --> 0:01:09.600 Not every AI application involves generative processes. And while generative 0:01:09.600 --> 0:01:15.479 AI can seem fascinating, exciting, surprising, or creepy, I believe 0:01:15.520 --> 0:01:19.520 that largely stems from how generative AI appears to be 0:01:19.680 --> 0:01:26.080 mimicking humans, and it's not an indication of how sophisticated, advanced, 0:01:26.240 --> 0:01:30.680 or dangerous it really is. It's kind of an uncanny 0:01:31.000 --> 0:01:35.679 Valley thing because it appears to be behaving like a human, 0:01:36.360 --> 0:01:41.720 we start to project things on it that aren't necessarily 0:01:42.400 --> 0:01:45.319 accurate or realistic. 
I think of it kind of like 0:01:45.520 --> 0:01:48.360 the way we can be with our pets, where we 0:01:48.400 --> 0:01:51.360 will project things on our pets that may not reflect 0:01:51.440 --> 0:01:54.800 what the pet is actually experiencing, but that's how we're 0:01:54.840 --> 0:01:58.040 perceiving it. So the reason I say all of this 0:01:58.160 --> 0:01:59.840 up at the very top of this episode is that 0:01:59.840 --> 0:02:04.760 we're also seeing a lot of people expressing concern about AI, 0:02:05.000 --> 0:02:09.239 which is understandable, you know, about how it could potentially 0:02:09.360 --> 0:02:15.959 lead to harm, and these are legitimate and rational concerns. However, 0:02:16.560 --> 0:02:21.160 with the focus on stuff like ChatGPT, for example, 0:02:21.320 --> 0:02:25.280 or Google Bard, I would argue the concern is far 0:02:25.360 --> 0:02:29.800 too narrowly focused on just one aspect of AI, and 0:02:30.040 --> 0:02:34.200 in my opinion, it's not even the most dangerous implementation 0:02:34.280 --> 0:02:38.280 of AI. I mean, we have cars on the road 0:02:38.720 --> 0:02:44.040 right now that use AI for driver assists and autonomous operations. 0:02:44.520 --> 0:02:47.800 If we're worried about the robots taking us down, maybe 0:02:47.840 --> 0:02:51.639 we shouldn't make them our chauffeurs. But really that's a 0:02:51.680 --> 0:02:55.120 topic for another episode. Today, I wanted to take a 0:02:55.160 --> 0:02:58.280 look at an issue that crops up in AI 0:02:58.320 --> 0:03:01.960 chatbots like OpenAI's ChatGPT or Google Bard and similar products. 0:03:02.560 --> 0:03:05.560 This is one that is concerning because it's an issue 0:03:05.560 --> 0:03:09.800 that leads these tools to create false or misleading information 0:03:10.360 --> 0:03:13.920 while presenting that info in a way that seems authoritative 0:03:14.000 --> 0:03:17.680 and trustworthy.
And in the field of AI, the term 0:03:18.000 --> 0:03:22.640 hallucination is used to describe this situation. At least a 0:03:22.680 --> 0:03:25.400 lot of folks will use the word hallucination. As it 0:03:25.440 --> 0:03:29.160 turns out, there's actually some debate in AI circles about 0:03:29.160 --> 0:03:32.799 whether or not that should be the appropriate term. Now, 0:03:33.200 --> 0:03:36.960 for us mere mortals, a hallucination is when we have 0:03:37.000 --> 0:03:42.360 an experience in which we perceive something that isn't reflected 0:03:42.520 --> 0:03:46.480 in reality. Maybe we hear a sound but there was 0:03:46.520 --> 0:03:50.640 actually no sound present. Maybe it was that tree falling 0:03:50.680 --> 0:03:53.120 in the woods and no one was around or something, 0:03:53.720 --> 0:03:57.680 or we see something that's not really there. It can 0:03:57.720 --> 0:04:03.320 be really darn disconcerting, and sometimes it can be absolutely terrifying. 0:04:03.680 --> 0:04:07.760 I'm reminded of how many people who experience sleep paralysis 0:04:07.800 --> 0:04:13.880 often will also have hallucinations accompany this period where they're 0:04:13.960 --> 0:04:17.599 awake but they cannot move, and it's probably because sleep 0:04:17.600 --> 0:04:21.640 paralysis occurs when you're kind of caught between being asleep 0:04:21.800 --> 0:04:25.120 and being awake, so there's still some dreamlike activity 0:04:25.160 --> 0:04:28.760 going on in your brain that's trying to explain things 0:04:28.839 --> 0:04:31.719 like why you're unable to move. Oh, it's because you 0:04:31.839 --> 0:04:35.960 have this witch perched on your chest and she's pinning 0:04:36.000 --> 0:04:41.200 you to the bed. Tools like ChatGPT are not dreaming, 0:04:41.680 --> 0:04:45.279 you know, they're not perceiving anything at all. They have 0:04:46.040 --> 0:04:52.160 no senses to trigger, so they cannot hallucinate in that sense.
Instead, 0:04:52.240 --> 0:04:57.440 what they are doing is mistakenly assigning high confidence to 0:04:57.560 --> 0:05:01.200 something that they just plain made up. So they're treating 0:05:01.200 --> 0:05:05.919 it like it's a fact that they're highly confident is accurate, 0:05:06.440 --> 0:05:11.000 when really they just invented it. So it is an 0:05:11.000 --> 0:05:14.640 instance where they're really confident in something that is not 0:05:15.279 --> 0:05:19.520 coming from a reliable source in the AI's actual training data. 0:05:20.040 --> 0:05:22.200 So if we wanted to put that into human terms, 0:05:22.640 --> 0:05:25.040 it'd be kind of like if you made up a 0:05:25.120 --> 0:05:29.040 story to explain something that otherwise would either be really 0:05:29.080 --> 0:05:32.720 boring or maybe really embarrassing. So you make up a lie, 0:05:33.160 --> 0:05:35.080 in other words, to cover up something that you would 0:05:35.160 --> 0:05:38.640 rather not be known. And so you tell this lie 0:05:39.120 --> 0:05:41.400 over and over when people are asking you about this 0:05:41.440 --> 0:05:45.920 particular thing, and you repeat it often enough that gradually 0:05:45.960 --> 0:05:49.560 your brain essentially makes a pathway where this fake version 0:05:49.560 --> 0:05:53.760 of history of what actually happened becomes the real one 0:05:54.279 --> 0:05:56.960 in your head. You begin to believe your own lie, 0:05:57.080 --> 0:05:59.680 and so in future tellings of the story, you don't 0:05:59.680 --> 0:06:02.240 even realize you're lying at all. You're telling what you 0:06:02.360 --> 0:06:05.440 believe to be the real sequence of events, even though 0:06:05.440 --> 0:06:08.800 it's all a fib.
That's kind of what's happening with 0:06:08.839 --> 0:06:14.039 AI hallucinations, only it happens all at once. And for 0:06:14.120 --> 0:06:17.440 that reason, some folks prefer to use other terms to 0:06:17.520 --> 0:06:21.640 describe what AI does when it starts to invent things 0:06:21.760 --> 0:06:25.520 in response to a query from a user. So some 0:06:25.600 --> 0:06:30.840 have proposed the word confabulation as an alternative descriptor of 0:06:30.839 --> 0:06:33.640 what's going on. So this is similar to kind of 0:06:33.680 --> 0:06:38.240 the scenario I just gave, because in human psychology, 0:06:38.240 --> 0:06:41.760 a confabulation is when we have a hitch in our memory, 0:06:42.160 --> 0:06:44.680 and so we fill in a gap that's in our memory. 0:06:44.680 --> 0:06:47.840 We're not doing it consciously, it just happens, and that 0:06:47.920 --> 0:06:49.720 might mean we fill in the gap with something that doesn't at 0:06:49.760 --> 0:06:53.400 all reflect what really happened. So this can happen at 0:06:53.440 --> 0:06:56.560 any time. I've seen it happen with people who are 0:06:56.640 --> 0:07:00.360 in like a situation that was totally unexpected 0:07:00.400 --> 0:07:03.600 and high stress. I've seen it in training operations where 0:07:04.120 --> 0:07:07.560 you have a group of people and then someone bursts 0:07:07.640 --> 0:07:11.200 in as if they are a burglar or a thief or something, 0:07:11.560 --> 0:07:14.720 and then they get out, and then those people who 0:07:14.760 --> 0:07:18.840 were just subjected to this very scary situation are asked 0:07:18.880 --> 0:07:22.200 to give details about the thief's appearance, and people start 0:07:22.240 --> 0:07:26.960 to invent things, not purposefully, not with the intent to deceive, 0:07:27.400 --> 0:07:29.440 but because their memory is just trying to fill in 0:07:29.480 --> 0:07:32.760 gaps because their perception didn't really take it all in.
0:07:33.400 --> 0:07:37.440 So confabulation doesn't imply intent, and I think that might 0:07:37.480 --> 0:07:40.320 be why a lot of researchers like the word, because 0:07:40.880 --> 0:07:44.840 it's not the intention of the AI to fool people 0:07:45.320 --> 0:07:49.800 or to pass off fantasy as if it were reality. Instead, 0:07:50.120 --> 0:07:53.320 the AI is making an honest go of trying to 0:07:53.360 --> 0:07:56.240 meet the expectations of the user. So if you ask 0:07:56.320 --> 0:08:01.000 the AI about, say, a historical figure, it really tries to 0:08:01.000 --> 0:08:04.720 give you a good answer, but occasionally that answer might 0:08:04.760 --> 0:08:08.240 be wrong, not because the AI is drawing from a 0:08:08.360 --> 0:08:12.000 bad data source, but because there's actually a gap in 0:08:12.040 --> 0:08:15.200 its knowledge, and the AI just fills that gap as 0:08:15.240 --> 0:08:19.040 best it can. Unfortunately, the end result is you get 0:08:19.080 --> 0:08:23.200 an answer that seems totally cromulent, like you could just 0:08:23.280 --> 0:08:28.880 imagine reading that answer in a respectable, thoroughly fact-checked encyclopedia, 0:08:29.360 --> 0:08:33.560 but then it turns out to be garbage. So let's 0:08:33.600 --> 0:08:36.800 talk about how this happens, which will involve an overview 0:08:36.880 --> 0:08:39.920 of how these chatbot AI tools are trained and, at 0:08:39.960 --> 0:08:42.800 a very very high level, how they work. So this 0:08:42.880 --> 0:08:48.480 is going to involve some discussion about machine learning and statistics. So, 0:08:48.559 --> 0:08:54.080 first off, how do machines actually learn? I think it's 0:08:54.120 --> 0:08:57.679 pretty easy to understand how we program machines to do 0:08:58.600 --> 0:09:01.959 some specific task.
Right, we create a set of rules 0:09:02.400 --> 0:09:07.200 that this machine follows sequentially, and the machine executes those 0:09:07.320 --> 0:09:10.720 rules as directed, and then we get the result we wanted. 0:09:10.800 --> 0:09:13.920 That is easy to understand. So I'll give an example. 0:09:13.960 --> 0:09:16.439 Let's say we have a robotic arm and you've got 0:09:16.480 --> 0:09:19.200 two tables, and you put a wooden block on table 0:09:19.320 --> 0:09:23.240 number one, and you program the robotic arm to pick 0:09:23.360 --> 0:09:26.320 up this wooden block on table one and move it 0:09:26.360 --> 0:09:29.640 over to table two. Once you program it, then it 0:09:29.679 --> 0:09:31.720 should be able to do that task over and over, 0:09:31.880 --> 0:09:34.880 assuming that no one has moved the tables, no one 0:09:34.920 --> 0:09:37.560 has moved the robotic arm, and the wooden block is 0:09:37.640 --> 0:09:41.360 always in the same place and it's always the same size. Right, 0:09:41.400 --> 0:09:43.560 you haven't changed any of the parameters, so it's the 0:09:43.600 --> 0:09:46.400 exact same situation over and over and over again. You've 0:09:46.400 --> 0:09:49.640 created this simple program. It should be no surprise when 0:09:49.640 --> 0:09:52.920 the robotic arm does it successfully. But what if we 0:09:52.960 --> 0:09:55.920 wanted a robotic arm that could learn how to pick 0:09:56.000 --> 0:09:59.720 up different objects from table one and then move them 0:09:59.760 --> 0:10:02.800 to table two? These objects could be different shapes, 0:10:02.800 --> 0:10:05.880 they could be different sizes, they could weigh different amounts, 0:10:06.120 --> 0:10:08.560 they might be made of different stuff. Maybe some of 0:10:08.559 --> 0:10:12.760 them are fairly delicate and the arm would break the 0:10:12.840 --> 0:10:15.680 object if it applied too much pressure.
So how would 0:10:15.720 --> 0:10:18.640 we build a robotic arm that could deal with these 0:10:18.679 --> 0:10:23.439 different scenarios, including ones where we put something completely new 0:10:23.520 --> 0:10:26.640 to the robot on the table, something that the robot 0:10:26.679 --> 0:10:31.240 has never encountered before. Well, to do that, we would 0:10:31.320 --> 0:10:36.320 probably pursue a machine learning model in order to teach 0:10:36.440 --> 0:10:41.480 this robot the whole process of picking something up, especially 0:10:41.520 --> 0:10:45.840 something it had not encountered before. So basically, machine learning 0:10:46.080 --> 0:10:49.120 uses sets of algorithms in an effort to get better 0:10:49.600 --> 0:10:54.560 at a given task, and part of learning involves training, 0:10:54.600 --> 0:10:59.080 which really boils down to feeding a machine lots and 0:10:59.160 --> 0:11:02.480 lots and lots of information, like the more information you 0:11:02.520 --> 0:11:07.320 can feed it, the better, and then letting it process 0:11:07.360 --> 0:11:10.520 this information in an effort to get a specific result, 0:11:11.080 --> 0:11:15.280 and then going back and tweaking the model to refine 0:11:15.320 --> 0:11:18.280 it over and over and over and over again to 0:11:18.480 --> 0:11:22.360 get better at it over time. So we'll imagine a 0:11:22.440 --> 0:11:26.480 hypothetical machine learning model that is designed to do something 0:11:26.559 --> 0:11:30.760 relatively simple like recognize if an image has a cat 0:11:31.040 --> 0:11:33.760 in it or not, because this is actually something that 0:11:33.880 --> 0:11:36.760 has been done with machine learning models in the past. 0:11:36.960 --> 0:11:40.960 It's actually a fairly popular approach is does this picture 0:11:40.960 --> 0:11:43.200 have a cat in it? Or does this video have 0:11:43.240 --> 0:11:45.920 a cat in it? That kind of thing. 
Let's imagine 0:11:46.000 --> 0:11:49.520 that our machine learning model is an actual physical model, 0:11:49.679 --> 0:11:53.319 like it's a giant funnel. So on the wide end 0:11:53.320 --> 0:11:56.360 of the funnel, that's where we just dump tons of 0:11:56.400 --> 0:11:59.120 photographs with some of them have cats in them, some 0:11:59.160 --> 0:12:02.560 of them don't. Now imagine that at the narrow end 0:12:02.600 --> 0:12:05.160 of the funnel. At the bottom of the funnel, we 0:12:05.200 --> 0:12:09.200 actually have two channels. One channel leads into a bucket 0:12:09.440 --> 0:12:13.160 that says no cats here, and the other channel leads 0:12:13.160 --> 0:12:16.920 to a bucket that says, ah, sweet kitty cats. So 0:12:17.520 --> 0:12:23.280 we dump thousands, maybe millions of photographs into the top 0:12:23.320 --> 0:12:27.280 of this funnel, and the funnel starts to sort the pictures. 0:12:27.640 --> 0:12:30.080 We can't see this because it's inside the funnel, but 0:12:30.120 --> 0:12:35.119 there are channels inside that funnel where photos are directed 0:12:35.559 --> 0:12:40.240 either to go more toward the no kittycat side or 0:12:40.280 --> 0:12:44.560 the yes kittykat side, And they go through these channels 0:12:44.640 --> 0:12:47.200 all down the funnel, and at the very end of it, 0:12:47.880 --> 0:12:51.480 they start spitting out these images into the two buckets. Well, 0:12:51.480 --> 0:12:54.520 once it's done, once it has processed all the photos, 0:12:54.760 --> 0:12:56.719 we take the two buckets and we see how our 0:12:56.760 --> 0:12:59.520 model did. And maybe we see that the model caught 0:12:59.679 --> 0:13:02.240 most of the pictures with cats in them, but not 0:13:02.360 --> 0:13:05.280 all of them. Maybe we also see that there are 0:13:05.320 --> 0:13:08.160 some photos that fell into the kitty cat bucket that 0:13:08.280 --> 0:13:12.440 have exactly zero kitty cats in the picture. 
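To make that bucket check concrete, here is a minimal sketch in Python of tallying how a sorter did. Nothing here comes from the episode: the function name `score_buckets`, the bucket labels, and the eight made-up photos are all hypothetical, and it assumes we already know the right answer for every photo.

```python
def score_buckets(predictions, labels):
    """Compare the sorter's guesses against the known answers.

    predictions[i] is True if photo i landed in the "kitty cat" bucket.
    labels[i] is True if photo i really does contain a cat.
    """
    counts = {
        "caught_cats": 0,      # cat photo, cat bucket
        "missed_cats": 0,      # cat photo, no-cat bucket
        "false_alarms": 0,     # no cat, but landed in the cat bucket
        "correct_rejects": 0,  # no cat, no-cat bucket
    }
    for predicted_cat, really_cat in zip(predictions, labels):
        if predicted_cat and really_cat:
            counts["caught_cats"] += 1
        elif really_cat:
            counts["missed_cats"] += 1
        elif predicted_cat:
            counts["false_alarms"] += 1
        else:
            counts["correct_rejects"] += 1
    return counts


# Hypothetical results for eight photos (assumed data, not real output).
predictions = [True, True, False, True, False, False, True, False]
labels = [True, True, True, False, False, False, True, False]

result = score_buckets(predictions, labels)
print(result)
```

The two middle counts are the ones described above: cat photos the model missed, and photos in the cat bucket with exactly zero cats in them.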
Something is 0:13:12.480 --> 0:13:15.760 not working inside our model. So at that point we 0:13:15.920 --> 0:13:19.800 open the funnel, we take the top off or whatever 0:13:19.880 --> 0:13:23.280 we have built in a hinged latch or something, and 0:13:23.320 --> 0:13:26.480 we've opened it up. Now essentially inside our funnel, we 0:13:26.520 --> 0:13:29.320 see all those channels, and each channel is meant to 0:13:29.320 --> 0:13:31.280 look for some sort of evidence of a cat, and 0:13:31.320 --> 0:13:34.520 if it finds evidence, it pushes it closer toward the 0:13:34.559 --> 0:13:37.760 pathway of kitty cat, and if it doesn't, it pushes 0:13:37.800 --> 0:13:41.640 it closer to the pathway of no kitty cat. But 0:13:41.720 --> 0:13:44.520 there's tons of these channels. Some of them feed images 0:13:44.640 --> 0:13:48.920 back up through the whole process. Again, it's very complicated 0:13:48.920 --> 0:13:52.400 inside this funnel, and you have to go in there 0:13:52.440 --> 0:13:57.120 and start to tweak little bits of rules in these 0:13:57.720 --> 0:14:01.880 channels to adjust for whatever problem you're encountering at the 0:14:01.960 --> 0:14:05.679 end result when you're done. So, when you're training your model, 0:14:06.080 --> 0:14:11.000 you change the weights of these different decisions that are made. 0:14:11.080 --> 0:14:14.520 Some decisions perhaps have too much emphasis on them. They 0:14:14.720 --> 0:14:18.040 like they're too powerful and they're skewing the results. So 0:14:18.120 --> 0:14:22.760 you reduce the weight of that particular decision point and 0:14:22.800 --> 0:14:25.000 you increase the weight of a different one to try 0:14:25.040 --> 0:14:28.560 and get things right. 
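A rough sketch of what "changing the weights" can look like, using a perceptron, which is one of the simplest learning rules and stands in here for whatever a real image model does. The three features (whiskers, pointy ears, wheels), the five examples, and the learning rate are assumptions made up for illustration. Real image models learn from raw pixels with far more weights.

```python
def predict(weights, bias, features):
    """Return 1 for "cat" if the weighted evidence is positive, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def train(examples, epochs=20, learning_rate=0.1):
    """Nudge each weight up or down whenever the model gets a photo wrong."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical features per photo: [has_whiskers, has_pointy_ears, has_wheels]
examples = [
    ([1, 1, 0], 1),  # cat
    ([1, 0, 0], 1),  # cat with its ears out of frame
    ([0, 1, 0], 1),  # cat facing away
    ([0, 0, 1], 0),  # car
    ([0, 0, 0], 0),  # empty landscape
]

weights, bias = train(examples)
print(weights, bias)
```

A decision point that pushes photos the wrong way gets its weight reduced, and one that helps gets its weight increased, which is the tweak-and-repeat loop described above done automatically.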
It's a painstaking process and you 0:14:28.600 --> 0:14:30.800 have to do it over and over again, and these 0:14:30.840 --> 0:14:35.200 exercises repeat and you try to refine your model to 0:14:35.240 --> 0:14:38.440 get it better at deciding whether or not a photograph 0:14:39.320 --> 0:14:41.880 has got a cat in it or doesn't, and eventually, if 0:14:41.920 --> 0:14:44.880 everything is working well, it gets very very good at 0:14:44.920 --> 0:14:47.920 sorting images. Maybe once in a while, something sneaks through. 0:14:48.160 --> 0:14:50.000 Maybe there's a cloud that kind of looks like a 0:14:50.040 --> 0:14:52.440 kitty cat and it goes into the wrong bucket, or 0:14:52.600 --> 0:14:54.760 maybe there is a kitty cat that goes into the 0:14:54.800 --> 0:14:56.720 no kitty cat bucket, but the kitty cat was kind 0:14:56.720 --> 0:14:59.240 of obscured in the picture and the model just couldn't 0:14:59.240 --> 0:15:04.480 suss it out. But it succeeds more often than not. Okay, 0:15:04.840 --> 0:15:06.880 that's a baseline. When we come back, we'll talk a 0:15:06.880 --> 0:15:09.440 bit more about machine learning and how this plays into 0:15:09.800 --> 0:15:24.200 tools like ChatGPT. Okay, I laid out one version 0:15:24.400 --> 0:15:27.040 of machine learning, and I want to stress that's just 0:15:27.320 --> 0:15:30.520 one version of machine learning. It's related to things like 0:15:30.600 --> 0:15:34.920 neural networks, which are designed to kind of mimic the 0:15:34.960 --> 0:15:40.440 way our brains process information and form pathways among neurons 0:15:40.480 --> 0:15:43.920 while we're trying to suss things out. But that's just 0:15:44.080 --> 0:15:46.400 one version of machine learning. I don't mean to say 0:15:46.440 --> 0:15:49.560 that's how it all works.
There are actually lots of 0:15:49.720 --> 0:15:53.240 subfields within machine learning, neural networks being just one 0:15:53.280 --> 0:15:55.960 of them, but there's also subsets of neural networks. One 0:15:56.000 --> 0:16:00.000 of those would be deep learning, which always makes 0:16:00.040 --> 0:16:03.000 me think of MST3K and Deep Hurting. Shout-outs 0:16:03.000 --> 0:16:05.760 to any MSTies out there. Now, as you dive 0:16:05.880 --> 0:16:09.960 down to deep learning, you're really getting into an interesting 0:16:10.040 --> 0:16:13.480 field of AI and machine learning. So deep learning models 0:16:13.720 --> 0:16:17.640 can accept unstructured data. If you're going further up to 0:16:17.800 --> 0:16:22.920 less specialized machine learning models, these have to use heavily 0:16:23.040 --> 0:16:27.119 labeled data sets and heavily structured data and use supervised 0:16:27.160 --> 0:16:30.640 learning in order to improve with time. But when you 0:16:30.680 --> 0:16:34.160 get into deep learning, you're looking at a very focused 0:16:34.160 --> 0:16:38.440 approach to machine learning where you can just feed unstructured 0:16:38.520 --> 0:16:41.960 data that has no labels to it and start to 0:16:42.080 --> 0:16:45.000 use this model to do whatever it is that you 0:16:45.640 --> 0:16:48.320 want it to do. But we're still kind of talking 0:16:48.320 --> 0:16:53.080 about a channeling or funneling situation here.
The input goes 0:16:53.120 --> 0:16:56.720 into the model, the model analyzes the input and pushes 0:16:56.760 --> 0:16:59.360 it further one way or another through the system, and 0:16:59.400 --> 0:17:02.520 it comes out the end as output, which could be 0:17:02.560 --> 0:17:05.280 an image search result for kitty cats in your smartphone's 0:17:05.320 --> 0:17:07.760 photo roll, for example. So if you've ever gone into 0:17:08.600 --> 0:17:12.200 a smartphone photo collection and you just typed in a 0:17:13.040 --> 0:17:15.800 general word in search, you know, it's not that you 0:17:15.880 --> 0:17:18.040 tagged any of your photos with this. You're just like 0:17:18.160 --> 0:17:20.280 looking for photos in your roll that have a cat 0:17:20.359 --> 0:17:24.280 in them, and it returns something like that. Well, that 0:17:24.280 --> 0:17:27.600 can be the result of a machine learning process like 0:17:27.640 --> 0:17:31.000 the one I've just described, because again, the system has 0:17:31.000 --> 0:17:33.600 to figure out which of your photos have cats in them, 0:17:33.840 --> 0:17:36.560 even though you didn't tag any of those photos with cats. 0:17:36.560 --> 0:17:39.880 It doesn't have metadata. It has to analyze the photo itself. 0:17:40.480 --> 0:17:46.040 Now it's time to talk about probabilities. Large language models, or LLMs, 0:17:46.520 --> 0:17:51.360 which are what power chatbots like Google Bard and 0:17:51.880 --> 0:17:57.320 ChatGPT, work in probabilities. And there's one example 0:17:57.359 --> 0:18:01.199 of an AI using probabilistic algorithms to generate responses that 0:18:01.280 --> 0:18:06.760 I really love to reference, and that example is IBM's Watson platform.
0:18:07.440 --> 0:18:09.760 So while the world right now is struggling to figure 0:18:09.760 --> 0:18:13.280 out how to handle ChatGPT and Google Bard and such, 0:18:13.720 --> 0:18:16.520 IBM's Watson gave us a glimpse at what we could 0:18:16.560 --> 0:18:20.080 expect all the way back in twenty eleven. That's when 0:18:20.119 --> 0:18:24.240 IBM famously put Watson to the test in some exhibition 0:18:24.400 --> 0:18:29.199 games of the game show Jeopardy against former champions of 0:18:29.280 --> 0:18:33.280 that game show, human champions. So in many ways, this 0:18:33.400 --> 0:18:36.800 was an echo of IBM's Deep Blue going up against 0:18:36.960 --> 0:18:41.879 chess master Garry Kasparov in various games of chess. Putting 0:18:41.960 --> 0:18:46.000 Watson up against humans in Jeopardy was a fantastic publicity stunt, 0:18:46.160 --> 0:18:49.000 and it also was really impressive because the way Jeopardy 0:18:49.040 --> 0:18:53.719 works is players get several categories of trivia that they 0:18:53.720 --> 0:18:57.440 can choose from. Each category has different levels of questions 0:18:57.480 --> 0:19:00.960 that are designated by a dollar amount, so the higher the 0:19:01.000 --> 0:19:04.400 dollar amount, the harder the trivia question is. Generally speaking, 0:19:05.520 --> 0:19:09.040 the actual clue that the players get is given in 0:19:09.040 --> 0:19:11.720 the form of an answer, and they have to provide 0:19:12.480 --> 0:19:17.440 a question that relates to that answer. So here's an example. 0:19:17.880 --> 0:19:22.320 The answer revealed in, say, a hypothetical Jeopardy game that 0:19:22.359 --> 0:19:26.119 has the category podcasts could be something like, he was 0:19:26.240 --> 0:19:29.600 Jonathan Strickland's original co-host on the show TechStuff. 0:19:30.000 --> 0:19:32.399 The correct response would be, bipp a bip, Who is 0:19:32.480 --> 0:19:36.200 Chris Pollette?
That would be the correct answer, But Jeopardy 0:19:36.680 --> 0:19:42.040 goes beyond just trivia. Often the answers provided will include 0:19:42.359 --> 0:19:46.760 word play or images or sound cues, and players will 0:19:46.760 --> 0:19:49.879 have to think outside the box. They can't just know 0:19:50.160 --> 0:19:54.800 the answer. Sometimes there's interpretation that has to happen first. 0:19:55.359 --> 0:19:58.399 The clue to the correct response could be a pun, 0:19:58.960 --> 0:20:02.600 it could involve a rhyme to the answer. It's not 0:20:02.760 --> 0:20:07.320 always a straightforward trivia question. In other words, so Watson 0:20:07.320 --> 0:20:10.879 needed to be able to analyze the clue given, to 0:20:11.000 --> 0:20:15.199 break it apart into components to understand what exactly is 0:20:15.240 --> 0:20:17.680 being asked of it. Then it needed to search its 0:20:17.760 --> 0:20:22.600 database for relevant information. So Watson famously was not connected 0:20:22.640 --> 0:20:25.359 to the Internet during these Jeopardy games. Instead, it was 0:20:25.400 --> 0:20:30.040 relying upon a database representing millions of books filled with facts. 0:20:30.680 --> 0:20:37.640 Then it would generate hypothetical responses like a hypothetical answer 0:20:38.200 --> 0:20:42.080 that Watson should give, or rather questions we're talking about jeopardy, 0:20:42.640 --> 0:20:45.200 and it would submit these hypotheses to a second round 0:20:45.240 --> 0:20:48.760 of analysis to look at is there any evidence that 0:20:48.840 --> 0:20:53.840 supports this response as being correct? Kind of measuring like, well, 0:20:54.359 --> 0:20:58.760 here's a possible answer, how likely is this answer to 0:20:58.840 --> 0:21:01.480 be right? And that was all part of the process. 0:21:01.800 --> 0:21:04.280 So it might even produce more than one answer. 
You 0:21:04.359 --> 0:21:08.480 might have multiple potential answers, and Watson would assign each 0:21:08.520 --> 0:21:12.040 answer a probability, kind of a confidence level of how 0:21:12.080 --> 0:21:15.359 it felt that answer measured up against all the other ones. So, 0:21:16.440 --> 0:21:19.439 as an example, answer A might receive a ninety percent 0:21:19.520 --> 0:21:23.119 confidence level, so that's pretty darn confident that's the right answer. 0:21:23.840 --> 0:21:25.879 Maybe you have answer B and you're like, I'm seventy 0:21:25.920 --> 0:21:28.760 eight percent sure that this could be right. And answer 0:21:28.800 --> 0:21:32.040 C is the long shot with thirty three percent confidence. 0:21:32.240 --> 0:21:35.080 These don't add up to one hundred because 0:21:35.280 --> 0:21:38.040 it's not like a zero-sum game. It's more like, oh, 0:21:38.040 --> 0:21:39.919 it could be this or it could be that, but 0:21:40.040 --> 0:21:43.040 I feel like this is more likely than that, so 0:21:43.080 --> 0:21:45.399 I'm going to go with this. And Watson also had 0:21:45.400 --> 0:21:49.080 a threshold. If the answer it generated failed to meet 0:21:49.160 --> 0:21:53.159 a certain confidence threshold, Watson would not buzz in to 0:21:53.280 --> 0:21:58.000 try an answer. Otherwise, Watson played pretty aggressively, and even 0:21:58.040 --> 0:22:00.919 in some sticky situations with Daily Doubles, where if you 0:22:00.960 --> 0:22:04.160 get a Daily Double in Jeopardy, you don't buzz in anymore. 0:22:04.560 --> 0:22:06.440 If you are the one who chose the Daily Double, 0:22:06.520 --> 0:22:10.240 you're playing by yourself and you just have to give 0:22:10.280 --> 0:22:13.760 an answer.
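The confidence-and-threshold idea can be sketched in a few lines of Python. This is an illustration only, not how IBM built Watson: the function name `choose_response`, the 0.5 threshold, and the `must_answer` flag are assumptions, and the confidences are the ninety, seventy eight, and thirty three percent figures from the example above.

```python
def choose_response(candidates, threshold=0.5, must_answer=False):
    """Pick the most confident candidate, or return None to stay silent.

    candidates maps each possible response to an independent confidence
    between 0 and 1. The values are not required to add up to 1.
    must_answer models a Daily Double, where staying silent is not an option.
    """
    best_answer, best_confidence = max(candidates.items(),
                                       key=lambda item: item[1])
    if best_confidence < threshold and not must_answer:
        return None  # not confident enough to buzz in
    return best_answer


confident_round = {"answer A": 0.90, "answer B": 0.78, "answer C": 0.33}
shaky_round = {"answer C": 0.33, "answer D": 0.12}

print(choose_response(confident_round))                # answer A
print(choose_response(shaky_round))                    # None
print(choose_response(shaky_round, must_answer=True))  # answer C
```

The last call is the Daily Double case: there is no buzzer to skip, so even a long shot gets used.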
So in those situations, Watson got aggressive, and 0:22:13.880 --> 0:22:18.200 it would guess with very low confidence thresholds 0:22:18.240 --> 0:22:21.040 for some of these, like at the thirty percent range, 0:22:21.560 --> 0:22:24.199 and occasionally it was right. In fact, more often than 0:22:24.240 --> 0:22:26.440 not it was right until it got to Final Jeopardy, 0:22:26.440 --> 0:22:30.160 where at least the first time, things did not go 0:22:30.920 --> 0:22:34.080 totally in Watson's favor. Also, Watson had an interesting betting 0:22:34.160 --> 0:22:37.399 strategy when it came to Daily Doubles. But I'm getting 0:22:37.440 --> 0:22:40.840 way off track. So that confidence level is really what 0:22:40.920 --> 0:22:44.040 I want to hone in on here. So it was 0:22:44.119 --> 0:22:48.560 expressed in percentages, so zero percent confidence would be like 0:22:48.640 --> 0:22:51.040 I do not know the answer, I do not know 0:22:51.119 --> 0:22:54.000 what goes here. A one hundred percent confidence level would 0:22:54.000 --> 0:22:56.639 be I am absolutely certain this is the right answer. 0:22:57.359 --> 0:22:59.879 And in a way, AI chatbots like ChatGPT 0:23:00.320 --> 0:23:03.920 and Google Bard are doing the same thing, only their 0:23:04.040 --> 0:23:08.520 confidence isn't about this is the answer to your question. 0:23:08.680 --> 0:23:11.800 I'm one hundred percent certain that this answers your question. 0:23:12.440 --> 0:23:16.080 It's more granular than that, because it's 0:23:16.080 --> 0:23:18.720 more at the sentence level. It's like, I think this 0:23:18.880 --> 0:23:22.679 word is the word that needs to go next to 0:23:22.760 --> 0:23:25.800 create the sentence that I'm building. So let's talk about 0:23:25.800 --> 0:23:28.320 how these models do create sentences, and I'm not going 0:23:28.400 --> 0:23:31.760 to wade into stuff like natural language processing.
That is 0:23:32.160 --> 0:23:34.800 a major part of this, but I have done full 0:23:34.840 --> 0:23:39.280 episodes about natural language processing before. That essentially says, it's 0:23:39.320 --> 0:23:43.679 a way for machines to analyze information that's written in 0:23:44.800 --> 0:23:49.720 you know, your normal language, whether that's English or whatever. 0:23:50.200 --> 0:23:54.120 But you're not trying to create a sentence that the 0:23:54.160 --> 0:23:58.439 machine is able to parse. Right, You're not trying to 0:23:58.680 --> 0:24:02.480 work with the machine on its terms. You're just communicating 0:24:02.480 --> 0:24:04.440 with it the way you would with anyone else. It's 0:24:04.480 --> 0:24:06.840 the machines job to figure out what the heck you're saying. 0:24:07.400 --> 0:24:09.840 So we're not gonna dwell on that. Instead, we're going 0:24:09.920 --> 0:24:13.600 to talk about how a chatbot chooses how to respond 0:24:14.240 --> 0:24:18.840 to something that is said or asked of it. These 0:24:18.920 --> 0:24:22.240 chatbots are built on top of language models that have 0:24:22.320 --> 0:24:26.879 had enormous data sets fed to them during training. The 0:24:27.000 --> 0:24:29.560 data sets include stuff like basic facts. So if you 0:24:29.600 --> 0:24:32.000 ask a chatbot who was the sixteenth president of the 0:24:32.080 --> 0:24:34.840 United States, a well trained chatbot at least is going 0:24:34.920 --> 0:24:39.160 to say it was Abraham Lincoln. But that data also 0:24:39.280 --> 0:24:42.919 trains the chatbot on how we communicate with one another. 0:24:43.640 --> 0:24:49.600 So through analyzing hundreds of millions of documents, ranging from 0:24:49.640 --> 0:24:54.800 books to online social platforms like Reddit, these chatbot models 0:24:55.040 --> 0:25:01.560 learn rules of communication. They learn rules about spelling syntax. 
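The idea of learning from text which word is likely to come next can be shown with a toy sketch. It simply counts which word follows which in a tiny made-up sentence. Real chatbots do not work this way under the hood; they use neural networks trained on vastly more text, so treat this as an analogy with assumed names (`build_bigram_model`, `next_word_probabilities`) and assumed data.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def next_word_probabilities(model, word):
    """Turn the follow-up counts for one word into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


# Tiny assumed "training data"; real models see hundreds of millions of documents.
corpus = "the cat sat on the mat the cat ate the fish"
model = build_bigram_model(corpus)

probs = next_word_probabilities(model, "the")
print(probs)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Note that nothing in the counts knows whether a sentence is true. The model only knows that "cat" tends to follow "the", which is one way to picture how a fluent but made-up answer can come out with high confidence.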
0:25:01.600 --> 0:25:05.080 They learn about structure that goes from the sentence level 0:25:05.119 --> 0:25:08.800 to paragraphs like They learn how to build a sentence properly, 0:25:09.040 --> 0:25:11.880 how to build another sentence that builds on the first one, 0:25:12.160 --> 0:25:14.840 how to build a whole paragraph that gets a thought across, 0:25:15.160 --> 0:25:19.320 and then how to do a series of paragraphs to 0:25:19.359 --> 0:25:24.280 convey meaning of some sort right, how to build to 0:25:24.800 --> 0:25:30.320 like a thesis almost They learn which words typically follow 0:25:30.560 --> 0:25:34.439