WEBVTT - Did AI Write This? 0:00:04.440 --> 0:00:12.280 Welcome to Tech Stuff, a production from iHeartRadio. Hey there, 0:00:12.280 --> 0:00:15.640 and welcome to Tech Stuff. I'm your host, Jonathan Strickland. 0:00:15.680 --> 0:00:18.160 I'm an executive producer with iHeartRadio. And how the tech 0:00:18.239 --> 0:00:21.600 are you? I'm here to tell you something. You write 0:00:21.600 --> 0:00:25.480 like a robot. But that's okay because I do too. 0:00:25.880 --> 0:00:29.720 One of the founding fathers of the United States, James Madison, 0:00:30.120 --> 0:00:33.239 wrote like a robot. Robots weren't even a thing when 0:00:33.280 --> 0:00:36.080 he was writing back in the eighteenth century. All right, 0:00:36.159 --> 0:00:38.960 so really, I guess it's more fair to say that 0:00:39.159 --> 0:00:43.520 robots write like us. And while I'm having a little 0:00:43.560 --> 0:00:46.760 bit of fun using the word robots, what I'm really 0:00:46.800 --> 0:00:51.000 talking about is generative AI. You know, stuff like 0:00:51.080 --> 0:00:55.520 ChatGPT and Google Bard, that kind of thing. These 0:00:55.680 --> 0:00:59.280 AI-powered chatbots write like humans, right? That's one of 0:00:59.320 --> 0:01:02.800 the big features of the chatbots: one, that they 0:01:02.800 --> 0:01:06.560 can understand a prompt that we give them, that they 0:01:06.560 --> 0:01:09.480 can understand what we mean when we give them a prompt, 0:01:09.520 --> 0:01:12.840 and two, that they then generate a response as if 0:01:12.920 --> 0:01:15.759 it had been written by an actual person. But obviously 0:01:15.800 --> 0:01:20.399 this also creates some challenges, some issues. 
So you might 0:01:20.440 --> 0:01:25.440 remember that since ChatGPT became publicly available last year, 0:01:25.480 --> 0:01:29.319 when OpenAI opened it up and let people start playing 0:01:29.319 --> 0:01:34.200 with ChatGPT, there were people in education, teachers and 0:01:34.280 --> 0:01:37.839 administrators, that sort of thing, who raised the alarm about 0:01:37.840 --> 0:01:42.320 the possibility that students could use ChatGPT and similar 0:01:42.360 --> 0:01:47.800 tools to auto-generate essays and stuff and thus bypass 0:01:47.920 --> 0:01:51.920 school assignments. "My robot wrote it for me." Beyond the 0:01:52.040 --> 0:01:55.480 education sector, there are plenty of arenas where people are 0:01:55.520 --> 0:01:59.639 worried that the less scrupulous folks out there will attempt 0:01:59.680 --> 0:02:02.840 to pass off AI-generated text as their own writing, 0:02:03.240 --> 0:02:08.760 whether this is creative writing or business writing, whatever it 0:02:08.800 --> 0:02:13.440 may be. So this then leads us to the concept 0:02:14.040 --> 0:02:18.640 of AI writing detection tools, you know, some sort of 0:02:19.360 --> 0:02:23.280 tool to determine if a piece of text originated from 0:02:23.480 --> 0:02:27.560 a real human being or from that character that Haley 0:02:27.639 --> 0:02:31.240 Joel Osment played in that film about artificial intelligence. I 0:02:31.240 --> 0:02:35.239 forget what that movie was called. Subsequent to the release 0:02:35.680 --> 0:02:39.239 of these detection tools, we started hearing reports of teachers 0:02:39.560 --> 0:02:45.000 failing students, sometimes an entire class of students, because the 0:02:45.120 --> 0:02:49.520 detection tool indicated that the real source of the works 0:02:49.560 --> 0:02:52.040 that were being turned in by the students wasn't 0:02:52.120 --> 0:02:54.919 the students, but AI. 
Now a lot of 0:02:54.960 --> 0:02:57.359 students have actually come forward to argue that no, no, 0:02:57.520 --> 0:03:03.200 they actually wrote those pieces themselves, that they authored that work, 0:03:03.240 --> 0:03:05.440 they didn't use AI to do it, and that they 0:03:05.440 --> 0:03:09.000 are the victims of false positives, that these writing detection 0:03:09.120 --> 0:03:12.240 tools made a mistake. And as it turns out, at 0:03:12.320 --> 0:03:15.440 least some of them, and likely a lot of them, 0:03:15.560 --> 0:03:18.239 were telling the truth. And we can say that because 0:03:18.280 --> 0:03:25.160 these AI writing detection tools have abysmal accuracy rates. They 0:03:25.240 --> 0:03:29.400 are worse than chance. That's how bad these tools can be. 0:03:30.160 --> 0:03:33.000 So the success rate for an AI writing detector can 0:03:33.040 --> 0:03:36.040 be so low that it has led some of the 0:03:36.040 --> 0:03:40.320 companies to shut them down, and it has led a 0:03:40.360 --> 0:03:43.600 lot of critics to just dismiss the concept of an 0:03:43.600 --> 0:03:48.240 AI writing detection tool entirely. In fact, there are quite a 0:03:48.240 --> 0:03:51.120 few who have argued that AI writing detection tools are 0:03:51.560 --> 0:03:54.720 essentially snake oil, that there are companies that are making 0:03:54.760 --> 0:03:57.480 what they say are reliable tools that can tell the 0:03:57.480 --> 0:04:00.560 difference between text that was written by a person and text 0:04:00.640 --> 0:04:04.200 that was written by AI, but really they're just peddling 0:04:04.720 --> 0:04:08.800 a hoax or a scam, and they're trying to make 0:04:08.920 --> 0:04:13.400 money selling these tools to various organizations like schools and such, 0:04:14.160 --> 0:04:17.360 but in fact those tools don't work, or at least 0:04:17.400 --> 0:04:21.160 they don't work very well. 
Even OpenAI, which is 0:04:21.200 --> 0:04:25.640 the company that is responsible for ChatGPT, they had 0:04:26.279 --> 0:04:28.880 a tool that was meant to be a detection tool 0:04:28.920 --> 0:04:32.159 to tell whether or not something was written by AI. 0:04:32.279 --> 0:04:35.560 It was called AI Classifier, but they shut it down 0:04:36.240 --> 0:04:41.760 earlier this year. Why? Because its accuracy rate was 0:04:42.160 --> 0:04:47.960 twenty-six percent. Twenty-six percent accurate. That is bonkers. That 0:04:48.000 --> 0:04:52.320 means nearly three quarters of the time that detection tool 0:04:52.400 --> 0:04:54.920 came up with the wrong answer. Either it gave a 0:04:55.000 --> 0:04:59.400 pass to an AI-generated piece, or it accused a 0:04:59.600 --> 0:05:04.760 work that a human being actually wrote, like definitively wrote, 0:05:05.320 --> 0:05:08.560 as being the product of AI. This brings us to 0:05:08.640 --> 0:05:14.040 James Madison. James Madison wrote the US Constitution, and folks 0:05:14.080 --> 0:05:17.880 have fed the US Constitution into these AI writing detection 0:05:18.000 --> 0:05:22.440 tools and received a notification that this piece was very 0:05:22.520 --> 0:05:25.800 likely written by AI, which obviously led to lots of 0:05:26.320 --> 0:05:29.400 jocularity on the Internet, as people said, "I knew it. 0:05:29.440 --> 0:05:31.479 I knew that the founding fathers of the United States 0:05:31.520 --> 0:05:34.440 of America were really robots from the future sent back 0:05:34.480 --> 0:05:39.599 in time to create an ultra-capitalist society that preys 0:05:39.720 --> 0:05:44.440 upon the disenfranchised," or something like that. There were a lot 0:05:44.440 --> 0:05:47.120 of jokes about it, but the fact is no, it's 0:05:47.200 --> 0:05:51.680 just that this writing detection tool is completely unreliable. 
So 0:05:51.720 --> 0:05:55.039 you certainly cannot use these kinds of tools to justify 0:05:55.120 --> 0:05:59.080 flunking an entire class of students when you know that 0:05:59.200 --> 0:06:02.680 the reliability is so low. Now, I decided to do 0:06:03.120 --> 0:06:07.960 this short episode about AI writing detection tools after reading 0:06:08.160 --> 0:06:11.279 a couple of great pieces in Ars Technica. Those of 0:06:11.320 --> 0:06:14.080 y'all who listen to my show frequently know that I 0:06:14.120 --> 0:06:19.760 often reference Ars Technica because the folks there reliably post 0:06:20.240 --> 0:06:23.400 great articles. So in this case, the author of both 0:06:23.440 --> 0:06:28.320 pieces I read was Benj Edwards, B-E-N-J Edwards. 0:06:28.680 --> 0:06:31.000 And at some point I probably should reach out to 0:06:31.040 --> 0:06:33.440 them and ask if they would like to join Tech 0:06:33.480 --> 0:06:36.920 Stuff for an episode to talk about something like generative AI, 0:06:37.400 --> 0:06:41.839 because Edwards has done some really good work. Anyways, as 0:06:41.880 --> 0:06:48.280 we think about the issue of how this generative AI works, 0:06:48.680 --> 0:06:53.960 the underlying technology that powers generative AI, we start to 0:06:53.960 --> 0:06:58.800 see why there's this big reliability problem. Why are we 0:06:58.880 --> 0:07:04.240 having such issues with an automated detection tool really determining 0:07:04.400 --> 0:07:07.760 if something was written by a person or AI? And 0:07:07.760 --> 0:07:12.320 it's because the tools like ChatGPT are built on 0:07:12.400 --> 0:07:17.800 top of large language models, also known as LLMs. And 0:07:17.840 --> 0:07:21.360 if we take a moment to really understand LLMs, then 0:07:21.400 --> 0:07:23.720 we start to get a handle on why these detector 0:07:23.760 --> 0:07:27.920 tools are so unreliable. 
So first off, let's actually talk 0:07:27.920 --> 0:07:32.040 about a precursor to large language models. This would be 0:07:32.200 --> 0:07:37.080 recurrent neural networks, or RNNs. Now I've talked a 0:07:37.120 --> 0:07:39.800 lot about neural networks on this show, but just as 0:07:39.800 --> 0:07:43.640 a refresher, a neural network is an attempt to create a 0:07:43.800 --> 0:07:48.680 computer system or computer model that processes information in a 0:07:48.680 --> 0:07:53.080 way that is similar to how our brains process information. 0:07:53.640 --> 0:07:58.560 So you have layers of artificial neurons, or you can 0:07:58.560 --> 0:08:02.920 think of them as nodes. These layers connect to other 0:08:03.000 --> 0:08:07.080 artificial neurons. You have multiple connections from one neuron to other neurons, 0:08:07.480 --> 0:08:09.520 and you have layers that go from top to bottom. 0:08:09.520 --> 0:08:11.560 You can think of it like this: at the top, that's 0:08:11.560 --> 0:08:14.120 where you put input, and at the bottom, that's where 0:08:14.160 --> 0:08:18.120 you get output. So essentially, you feed information into the 0:08:18.160 --> 0:08:20.960 model and then the information goes through a series of 0:08:20.960 --> 0:08:25.160 operations in which data passes through these different nodes, and 0:08:25.200 --> 0:08:28.480 the nodes make decisions based upon the input, and then 0:08:28.520 --> 0:08:32.640 they send output to different nodes and eventually you get 0:08:33.000 --> 0:08:36.800 the ultimate output. And sometimes that output is correct. It 0:08:36.800 --> 0:08:40.480 gives you the answer that is correct. Sometimes it's wrong. 0:08:41.000 --> 0:08:43.120 And typically what that means is that you then have 0:08:43.200 --> 0:08:47.520 to adjust how those artificial neurons are making decisions. 
Those 0:08:47.559 --> 0:08:52.480 neurons apply a sort of bias to input. We call 0:08:52.520 --> 0:08:56.880 it a weight, so they will favor some types of 0:08:56.960 --> 0:08:59.800 input over others in an effort to make a decision. 0:08:59.840 --> 0:09:03.000 If they didn't, then the data would never go anywhere. 0:09:03.080 --> 0:09:05.120 You would never be able to have it processed through 0:09:05.160 --> 0:09:09.440 the system. So the weighting affects how the neuron actually 0:09:09.440 --> 0:09:11.920 processes the data and where it passes it on to. 0:09:12.559 --> 0:09:16.720 So it may say, if value is greater than X, 0:09:16.960 --> 0:09:20.679 send to node A. If value is less than X, 0:09:20.920 --> 0:09:24.839 send to node B. That could be a very basic weight. 0:09:25.240 --> 0:09:28.040 X would be the weight in that case, and maybe 0:09:28.120 --> 0:09:31.640 that would lead you to a correct outcome. So by 0:09:31.679 --> 0:09:36.400 adjusting the weighting, you can change how these neurons make decisions. 0:09:36.880 --> 0:09:39.000 And if you build a neural network for the purposes of, 0:09:39.360 --> 0:09:42.200 let's give it a hypothetical, let's say identifying pictures 0:09:42.240 --> 0:09:46.320 of cats, it's always my go-to, and you start 0:09:46.400 --> 0:09:48.640 looking at the output and you see that it is 0:09:48.760 --> 0:09:53.199 mistakenly saying that pictures of flowers are pictures of cats, 0:09:53.600 --> 0:09:56.760 you would say, all right, these artificial neural networks, the 0:09:57.040 --> 0:10:00.640 nodes in this artificial neural network are making the wrong decisions. 0:10:01.000 --> 0:10:04.280 The weighting is wrong in these nodes. I need to 0:10:04.280 --> 0:10:07.280 go and start adjusting things so that I can start 0:10:07.280 --> 0:10:12.520 to get back to this correctly saying whether or not 0:10:12.559 --> 0:10:15.400 an image has a cat in it or doesn't. 
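To make the weighting idea concrete, here is a minimal Python sketch of a single artificial neuron that learns to separate cats from flowers. Everything in it is an assumption for illustration: the two made-up features (a whisker score and a petal score), the four toy examples, the learning rate, and the classic perceptron update rule, which stands in for whatever training method a real network would use.

```python
# Toy single-neuron classifier. The features, data, and learning rate are
# invented for illustration; real image models learn from raw pixels.

def predict(weights, bias, features):
    """Return 1 ("cat") when the weighted sum of the inputs is above zero."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=10, learning_rate=0.1):
    """Perceptron rule: nudge the weights whenever the neuron answers wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # error is 0 when the guess was right, +1 or -1 when it was wrong.
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical features: (whisker score, petal score); 1 = cat, 0 = flower.
EXAMPLES = [
    ((1.0, 0.0), 1),
    ((0.9, 0.1), 1),
    ((0.0, 1.0), 0),
    ((0.2, 0.8), 0),
]

weights, bias = train(EXAMPLES)
predictions = [predict(weights, bias, features) for features, _ in EXAMPLES]
print(predictions)
```

The point of the sketch is only that "adjusting the weighting" is a mechanical loop: each wrong answer shifts the weights a little, until flowers stop being labeled as cats.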
And 0:10:16.040 --> 0:10:18.240 your goal is to train this model over and over 0:10:18.280 --> 0:10:21.200 and over again until it gets better and better at 0:10:21.200 --> 0:10:24.120 this task, so that then you can just send it 0:10:24.200 --> 0:10:26.720 any raw data you like and not have to worry 0:10:26.760 --> 0:10:31.120 about checking up on it afterward because its accuracy level 0:10:31.120 --> 0:10:34.520 will be high enough to be reliable. That's your ultimate goal, 0:10:34.880 --> 0:10:38.120 but there's a whole process of learning, of training, that 0:10:38.200 --> 0:10:41.760 you have to go through first. Now, a recurrent neural network, 0:10:41.760 --> 0:10:44.760 it's a little more specific than just an artificial neural network. 0:10:45.320 --> 0:10:50.679 Recurrent neural networks use sequential data. These networks can and 0:10:50.760 --> 0:10:55.720 do take information from earlier inputs into consideration when processing 0:10:55.920 --> 0:11:00.280 a new input. So there's a different model, the convolutional 0:11:00.520 --> 0:11:04.040 neural network, or CNN, not the news channel. This is the 0:11:04.040 --> 0:11:08.000 other big type of neural network, where every time data 0:11:08.080 --> 0:11:11.480 goes into an input, it's like a blank slate. It's 0:11:11.480 --> 0:11:15.320 its own thing. Nothing about that decision is 0:11:15.400 --> 0:11:19.880 based upon any past decision. It's an instance-by-instance 0:11:20.000 --> 0:11:22.960 kind of case. So you're starting from scratch. But with 0:11:23.160 --> 0:11:27.720 recurrent neural networks, the network can actually incorporate past inputs 0:11:28.080 --> 0:11:31.840 as part of how it processes a current input. But 0:11:31.960 --> 0:11:35.400 one issue with these types of networks, the recurrent neural 0:11:35.400 --> 0:11:38.800 networks, is that they need a full sequence before they 0:11:38.840 --> 0:11:42.600 can process the information. 
So when we're talking about text, 0:11:43.040 --> 0:11:45.880 like if we wanted to process text through a recurrent 0:11:45.920 --> 0:11:49.120 neural network, it would need to work over the entire 0:11:49.240 --> 0:11:53.240 text before producing a result in order to understand things 0:11:53.280 --> 0:11:57.000 like context. Sometimes this approach can lead to errors because 0:11:57.040 --> 0:12:01.720 the model essentially forgets the stuff that was at the 0:12:01.760 --> 0:12:04.160 beginning of the text by the time it gets to 0:12:04.200 --> 0:12:07.160 the end, which sounds a lot like me honestly, where 0:12:07.600 --> 0:12:10.440 I will finish a book and then I'll think, like 0:12:10.520 --> 0:12:13.560 I'll have a discussion with someone about a book that 0:12:13.600 --> 0:12:15.360 we've both read and they'll be like, Oh, I like 0:12:15.440 --> 0:12:18.320 that part where in early in the book blah blah 0:12:18.320 --> 0:12:20.600 blah blah blah, and it pays off much later, and meanwhile, 0:12:20.600 --> 0:12:23.320 I'm thinking, I totally forgot that that happened earlier in 0:12:23.360 --> 0:12:25.559 the book. I remember where we ended up, but I 0:12:25.600 --> 0:12:28.960 don't remember how we got there. Recurrent neural networks can 0:12:29.000 --> 0:12:33.360 fall into the same sort of trap, and so that 0:12:34.679 --> 0:12:38.520 creates a bit of a hurdle when it comes to 0:12:38.559 --> 0:12:44.640 things like analyzing text for the purposes of building natural 0:12:44.720 --> 0:12:48.960 language systems. But I'll explain how that all started to 0:12:49.040 --> 0:12:52.559 change in twenty seventeen. First, however, we need to take 0:12:52.600 --> 0:13:05.680 a quick break to thank our sponsors. 
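The forgetting problem with recurrent networks can be sketched with a deliberately stripped-down recurrent cell in Python. This is an assumption-laden toy, not how production RNNs are built: it uses a single number as the hidden state, a fixed hypothetical `recurrent_weight` of 0.5, and no activation function. It only shows how the influence of the first input shrinks as the sequence gets longer.

```python
def run_rnn(inputs, recurrent_weight=0.5):
    """Fold a sequence into one hidden state, one step at a time."""
    hidden = 0.0
    for value in inputs:
        # The new state mixes the old state (scaled down) with the new input.
        hidden = recurrent_weight * hidden + value
    return hidden


def influence_of_first_input(sequence_length, recurrent_weight=0.5):
    """How much the final state moves when only the first input changes by 1."""
    baseline = run_rnn([0.0] * sequence_length, recurrent_weight)
    nudged = run_rnn([1.0] + [0.0] * (sequence_length - 1), recurrent_weight)
    return nudged - baseline


short_text = influence_of_first_input(3)
long_text = influence_of_first_input(30)
print(short_text, long_text)
```

With these made-up numbers, the first input still accounts for a quarter of the final state after three steps, but it has all but vanished after thirty, which is the "I remember where we ended up, but not how we got there" effect.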
Okay, before the break, 0:13:05.720 --> 0:13:08.840 I was talking about recurrent neural networks and how those 0:13:08.880 --> 0:13:11.439 have certain limitations when it comes to the way they 0:13:11.440 --> 0:13:14.800 process data because it has to be sequential. Well, in 0:13:14.840 --> 0:13:18.480 twenty seventeen, a group of AI researchers who were working 0:13:18.520 --> 0:13:24.120 specifically over at Google were coming up with an alternative 0:13:24.760 --> 0:13:27.760 to this approach, and they published a paper, and the 0:13:27.800 --> 0:13:32.000 paper's title was "Attention Is All You Need," in which 0:13:32.000 --> 0:13:35.680 they suggested that you could do something differently from the 0:13:35.720 --> 0:13:39.000 recurrent neural network approach for the purposes of analyzing stuff 0:13:39.080 --> 0:13:44.360 like text. Their approach was what they called a transformer model. 0:13:45.240 --> 0:13:49.600 While your old RNN would analyze text essentially a character 0:13:49.679 --> 0:13:51.600 at a time, not even a word at a time, 0:13:51.600 --> 0:13:54.840 but a character at a time, and thus that's sequential, right? 0:13:54.880 --> 0:13:58.440 The sequential data is character by character. It builds this 0:13:58.600 --> 0:14:02.120 up and then analyzes the whole thing. The transformer model 0:14:02.160 --> 0:14:06.680 instead would tackle a sentence as a unit, as opposed 0:14:06.679 --> 0:14:10.280 to a character, or even an entire passage of text 0:14:10.440 --> 0:14:13.319 would be a single unit, and so it would analyze 0:14:13.360 --> 0:14:17.160 this to understand the context of what was being said, 0:14:17.880 --> 0:14:20.880 and that's a huge benefit. 
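A rough Python sketch of the attention idea from that paper follows. It is heavily simplified and rests on assumptions: the two-number word vectors are invented, and the queries, keys, and values are all just the raw vectors, whereas a real transformer learns separate projection matrices for each and stacks many such layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    all_weights = []
    outputs = []
    for query in vectors:
        # Score this word against every word in the passage at once.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        all_weights.append(row)
        # Each output is a weighted blend of every word's vector.
        outputs.append([sum(w * v[i] for w, v in zip(row, vectors))
                        for i in range(dim)])
    return all_weights, outputs


# Hypothetical two-number "embeddings" for a three-word passage.
TOKENS = ["going", "for", "walk"]
VECTORS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attention_weights, mixed = self_attention(VECTORS)
for token, row in zip(TOKENS, attention_weights):
    print(token, [round(w, 3) for w in row])
```

The design point is that every word looks at every other word in one pass, so nothing has to survive a long chain of sequential steps to influence the result.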
Getting a handle on 0:14:21.000 --> 0:14:26.160 context is absolutely critical to understanding what someone means, because 0:14:26.200 --> 0:14:29.400 words can have multiple meanings, right, and without context, we 0:14:29.440 --> 0:14:33.720 can't really be sure which meaning someone intended. So here's 0:14:33.720 --> 0:14:37.760 an example: the English word "late." That can mean a 0:14:37.760 --> 0:14:40.280 lot of things if you're an English speaker. So if 0:14:40.320 --> 0:14:42.280 you're talking about the time of day, if you say 0:14:42.280 --> 0:14:45.560 it's late, you usually mean it's getting close to nighttime. 0:14:45.800 --> 0:14:47.760 You could say it's late at night, which means it's 0:14:47.800 --> 0:14:51.120 actually close to morning time, or maybe it even is 0:14:51.200 --> 0:14:55.440 the morning, but because it's still dark, you think 0:14:55.480 --> 0:14:59.360 of it as night. Technically speaking, it's morning and 0:14:59.360 --> 0:15:01.760 you're just saying it's late at night. If you're saying 0:15:01.880 --> 0:15:05.800 somebody is late, you could either mean they are not 0:15:05.960 --> 0:15:10.320 on time for some appointment, or tragically, you could mean 0:15:10.360 --> 0:15:13.440 that this is a person who has passed away. They 0:15:13.480 --> 0:15:16.960 are late. But you need the rest of the sentence. 0:15:17.000 --> 0:15:21.920 You need that context to understand what meaning of "late" 0:15:22.480 --> 0:15:27.720 was actually intended. So you need that contextual vision to 0:15:27.760 --> 0:15:31.680 be able to understand the whole thing. So transformer models 0:15:32.240 --> 0:15:37.840 began to revolutionize certain types of AI applications, specifically in 0:15:37.880 --> 0:15:43.760 the realm of natural language processing and generative AI, and 0:15:43.840 --> 0:15:47.600 it's what led to the development of large language models, 0:15:47.960 --> 0:15:52.040 the LLMs. 
Essentially, a large language model is just a 0:15:52.280 --> 0:15:56.600 huge transformer model. And to make a large language model, 0:15:57.040 --> 0:16:00.760 you need a lot of text to train your model, 0:16:01.120 --> 0:16:04.960 like a lot, a lot. OpenAI trained its large 0:16:05.000 --> 0:16:09.040 language model, known as GPT, which stands for Generative 0:16:09.160 --> 0:16:15.640 Pre-trained Transformer. They trained it on countless documents, millions and 0:16:15.800 --> 0:16:22.040 millions of documents found across the web. Some authors allege 0:16:22.440 --> 0:16:26.000 that the training material included copyrighted material and that the 0:16:26.000 --> 0:16:28.840 authors did not give permission for their works to be 0:16:28.960 --> 0:16:32.200 part of the information that fed into this model. That 0:16:32.400 --> 0:16:35.400 leads into its own set of problems that are a 0:16:35.480 --> 0:16:37.760 little bit beyond the scope of what I'm talking about today, 0:16:37.760 --> 0:16:41.080 but they are big problems and they're ongoing now. Stephen 0:16:41.160 --> 0:16:45.120 King argued that his works were clearly used to train 0:16:45.240 --> 0:16:48.720 up large language models. A dead giveaway is if you 0:16:48.840 --> 0:16:53.360 ask a chatbot built on top of a large language 0:16:53.360 --> 0:16:58.360 model to recite passages from specific authors' works, and if 0:16:58.360 --> 0:17:01.560 it can do that accurately, like it's really giving 0:17:01.600 --> 0:17:06.760 you an accurate representation of that text, yeah, there's no 0:17:06.880 --> 0:17:12.240 way it could have received that information without having trained on 0:17:12.440 --> 0:17:16.399 the original text at least somewhere. Now, if it's just 0:17:16.440 --> 0:17:20.520 making stuff up, that's different. That falls into the category 0:17:20.560 --> 0:17:24.080 of hallucinations, which we might touch upon again before we 0:17:24.320 --> 0:17:30.320 finish this episode. 
Anyway, the benefit of feeding so 0:17:30.640 --> 0:17:34.480 much information to a transformer model is that the transformer model, 0:17:34.560 --> 0:17:38.000 the large language model, gets pretty darn good at sussing 0:17:38.040 --> 0:17:42.040 out context. Even stuff that you would expect would trip 0:17:42.200 --> 0:17:45.720 up an AI chatbot can become a breeze. You know, 0:17:45.800 --> 0:17:49.479 you might think that slang or idioms could trip up 0:17:49.480 --> 0:17:52.840 an AI tool, but then you have to remember that 0:17:52.920 --> 0:17:55.960 these tools rely on essentially all the stuff that's on 0:17:56.000 --> 0:17:59.320 the Internet, at least all the stuff that's publicly available, 0:17:59.320 --> 0:18:03.560 that's not locked behind something, and maybe even some stuff 0:18:03.560 --> 0:18:07.080 that is locked behind stuff, as it turns out. And 0:18:07.200 --> 0:18:09.840 as such, that means that these models have trained with 0:18:09.960 --> 0:18:12.960 data sets that originate from the same communities that are 0:18:13.000 --> 0:18:16.919 creating the culture that generates certain slang and idioms in 0:18:16.920 --> 0:18:20.400 the first place. So if your AI model is using 0:18:20.440 --> 0:18:25.320 the same source material where these turns of phrase and 0:18:25.440 --> 0:18:29.960 certain slang terms are originating from, well, of course 0:18:30.000 --> 0:18:32.119 it's going to understand it because that was part of 0:18:32.119 --> 0:18:36.240 its training, so it has that grounding. It's not like me, 0:18:36.800 --> 0:18:40.199 where I am old. I don't understand slang that the 0:18:40.280 --> 0:18:43.880 kids use these days because I'm not in those communities. 0:18:44.560 --> 0:18:47.080 You wouldn't expect me to understand. I am definitely the 0:18:48.400 --> 0:18:51.800 stereotypical out-of-touch old dude. So when I hear 0:18:51.880 --> 0:18:55.840 people talk about, you know, people rizzing up, I'm like, wait, what? 
0:18:56.880 --> 0:18:59.600 And I have to look things up. And as we 0:18:59.640 --> 0:19:03.720 all know, Urban Dictionary is not the most reliable of resources. 0:19:04.200 --> 0:19:08.679 It is frequently entertaining, usually in a way that is 0:19:08.720 --> 0:19:13.600 incredibly offensive, but it's not always accurate. Anyway, this ultimately 0:19:13.680 --> 0:19:16.680 starts to lead us to why these AI writing detection 0:19:16.800 --> 0:19:21.280 tools are not very good. The material that AI generates 0:19:21.400 --> 0:19:24.840 is built upon how we communicate. It's built on 0:19:24.880 --> 0:19:28.360 how we write. That's how it was trained. So it's 0:19:28.359 --> 0:19:33.199 not like AI, or robots, as I was facetiously saying 0:19:33.280 --> 0:19:36.080 earlier in the episode, it's not like AI has a 0:19:36.119 --> 0:19:39.320 different path toward writing than we do. The AI is 0:19:39.359 --> 0:19:43.760 not following an established set of rules that's unique to AI, right? 0:19:43.800 --> 0:19:47.760 They're not saying, "write this like artificial intelligence." So the 0:19:47.840 --> 0:19:51.639 stuff that AI produces can come across as very human, 0:19:52.040 --> 0:19:56.159 and vice versa. Now, this does not mean that it 0:19:56.280 --> 0:20:01.080 is absolutely impossible for someone like a teacher to tell 0:20:01.160 --> 0:20:04.720 if something was written by AI or a student. If 0:20:04.760 --> 0:20:07.439 the teacher is actually really familiar with the writing style 0:20:07.720 --> 0:20:12.120 of that student or students in question, it's entirely possible 0:20:12.320 --> 0:20:15.120 that the teacher might notice if that writing style were 0:20:15.160 --> 0:20:20.880 to suddenly and maybe significantly change between assignments. 
This can 0:20:20.960 --> 0:20:23.640 be a big ask, by the way, for certain teachers, 0:20:23.800 --> 0:20:26.880 because class sizes can get huge depending on where you are, 0:20:27.600 --> 0:20:30.320 and if you're talking about an overworked English teacher who's 0:20:30.359 --> 0:20:33.879 teaching multiple classes and each class has got, you know, 0:20:34.000 --> 0:20:37.720 thirty kids in it, it can be hard to really 0:20:37.920 --> 0:20:42.879 build up a working knowledge and memory of the writing 0:20:42.920 --> 0:20:45.520 styles of every single person in every single class. But 0:20:46.119 --> 0:20:48.800 that is one way that teachers can tell. If teachers 0:20:49.040 --> 0:20:52.040 read an essay and think, wow, you know, Robert didn't 0:20:52.119 --> 0:20:55.720 write like this in the essay we did last month, 0:20:56.200 --> 0:20:59.960 this is a very different approach to writing, and 0:21:00.040 --> 0:21:04.080 perhaps that's an indicator that someone else wrote the piece, 0:21:04.119 --> 0:21:07.680 whether that was AI or maybe, you know, another human being, 0:21:08.480 --> 0:21:12.320 and that can be an indication something hinky is going on. Also, 0:21:12.400 --> 0:21:15.480 I mean, obviously some people get sloppy. This happens a 0:21:15.480 --> 0:21:18.640 lot too when people just aren't paying attention as they're 0:21:18.760 --> 0:21:24.600 using AI to generate either, you know, an educational assignment 0:21:24.880 --> 0:21:28.840 or business or whatever. There have been so many examples 0:21:29.240 --> 0:21:33.280 of how people have accidentally copied and pasted not just 0:21:33.560 --> 0:21:36.760 the body of the text, but stuff that's outside the 0:21:36.800 --> 0:21:39.000 body of the text, like it might even be a 0:21:39.000 --> 0:21:42.600 little disclaimer saying it was made by AI, or it 0:21:42.640 --> 0:21:47.080 could be a command like "regenerate response." 
That's something you 0:21:47.160 --> 0:21:51.480 find in certain chatbots, and that is just what 0:21:51.520 --> 0:21:55.760 "regenerate response" means. It just means, hey, can you create 0:21:55.920 --> 0:22:03.119 a new AI response to the initial prompt I gave you? 0:22:03.320 --> 0:22:06.200 So I wrote a prompt, I had you generate a response. 0:22:07.040 --> 0:22:09.280 I want you to create a whole new response based 0:22:09.320 --> 0:22:13.280 on that original prompt. If you have "regenerate response" written 0:22:13.320 --> 0:22:18.560 in your essay, that's a dead giveaway that you 0:22:18.800 --> 0:22:22.160 copied and pasted that essay off of an AI chatbot. 0:22:22.520 --> 0:22:25.920 So there are ways that teachers can tell the difference, 0:22:26.840 --> 0:22:30.359 but it's not as granular as saying, oh, 0:22:30.600 --> 0:22:35.280 this is clearly something that was written by artificial intelligence 0:22:35.359 --> 0:22:37.719 versus this was written by a human. It's more like 0:22:38.400 --> 0:22:41.359 this is different from what I have received before from 0:22:41.440 --> 0:22:47.440 this particular student, or this contains obvious errors that reveal 0:22:47.720 --> 0:22:52.399 that the student has used AI. Now, the AI writing 0:22:52.440 --> 0:22:58.160 detection tools are at least claiming to use a couple 0:22:58.160 --> 0:23:01.040 of strategies to try and determine if something was written 0:23:01.080 --> 0:23:04.360 by AI or a human. So they're saying, we can 0:23:04.440 --> 0:23:08.359 automate that process, and we can actually analyze a block 0:23:08.400 --> 0:23:11.120 of text and give you a determination as to whether 0:23:11.240 --> 0:23:13.399 or not that was made by AI or a human, 0:23:13.760 --> 0:23:18.080 which suggests that maybe there is some sort of fundamental 0:23:18.119 --> 0:23:22.360 difference between the way AI generates content and the way 0:23:22.520 --> 0:23:27.800 people do. 
But these strategies that the AI writing detection 0:23:27.920 --> 0:23:32.199 tools are built upon have fundamental flaws, and we know 0:23:32.280 --> 0:23:35.000 that because we know the tools are bad. It was 0:23:35.040 --> 0:23:37.639 bad enough for OpenAI to shut down its version 0:23:37.760 --> 0:23:42.919 back in June. So this isn't like just us postulating 0:23:43.160 --> 0:23:45.919 that these tools are bad. We know they're bad. We 0:23:46.080 --> 0:23:49.640 know they create things like false positives. So knowing that 0:23:49.920 --> 0:23:53.639 already they are unreliable, you then have to start asking, well, 0:23:53.960 --> 0:23:56.560 why are they unreliable? What are the things that are 0:23:56.680 --> 0:24:00.480 leading these tools to make these wrong determinations? And when 0:24:00.480 --> 0:24:04.359 we come back, I'll talk about how Benj Edwards, in 0:24:04.480 --> 0:24:08.880 those Ars Technica articles, really kind of digs into two 0:24:09.359 --> 0:24:14.159 main concepts that end up leading to these writing detection 0:24:14.280 --> 0:24:17.080 tools trying to make a determination and why they are 0:24:17.760 --> 0:24:32.600 fundamentally flawed. But first let's take another quick break. So 0:24:33.320 --> 0:24:35.439 before the break, I mentioned that I was going to 0:24:35.440 --> 0:24:39.760 talk about some strategies that Benj Edwards outlines in his 0:24:39.960 --> 0:24:43.800 Ars Technica articles, and they fall into two categories. The 0:24:43.840 --> 0:24:49.600 first is called perplexity, and that really means how surprising 0:24:49.800 --> 0:24:54.480 or perplexing are the word choices, how creative are the 0:24:54.600 --> 0:24:59.640 sentences in a given piece of text compared to an 0:24:59.680 --> 0:25:04.800 AI model's training data. 
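One way to picture perplexity is with a tiny word-pair model in Python. This is only a sketch under stated assumptions: the three-sentence corpus is invented, a bigram counter with add-one smoothing stands in for a large language model, and nothing here reflects how any particular commercial detector actually computes its score.

```python
import math
from collections import Counter


def train_bigram_counts(corpus):
    """Count how often each word follows each other word in the corpus."""
    pair_counts = Counter()
    context_counts = Counter()
    vocabulary = set()
    for sentence in corpus:
        words = sentence.lower().split()
        vocabulary.update(words)
        for previous, current in zip(words, words[1:]):
            pair_counts[(previous, current)] += 1
            context_counts[previous] += 1
    return pair_counts, context_counts, vocabulary


def perplexity(sentence, pair_counts, context_counts, vocabulary):
    """Lower means the word choices look more predictable to the model."""
    words = sentence.lower().split()
    if len(words) < 2:
        raise ValueError("need at least two words to score word pairs")
    vocab_size = len(vocabulary | set(words))
    log_prob = 0.0
    for previous, current in zip(words, words[1:]):
        # Add-one smoothing gives unseen word pairs a small nonzero probability.
        probability = ((pair_counts[(previous, current)] + 1)
                       / (context_counts[previous] + vocab_size))
        log_prob += math.log(probability)
    return math.exp(-log_prob / (len(words) - 1))


# Invented training text standing in for "what the model has seen".
CORPUS = [
    "i am going to go for a walk",
    "i am going to go for a walk today",
    "she is going to go for a swim",
]
model = train_bigram_counts(CORPUS)

predictable = perplexity("i am going to go for a walk", *model)
surprising = perplexity("walk a for go to going am i", *model)
print(round(predictable, 2), round(surprising, 2))
```

A detector built on this idea would flag the low-perplexity sentence as "machine-like", which hints at the flaw: plenty of ordinary human writing is also made of common, predictable word sequences.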
So the thinking behind this is that 0:25:05.560 --> 0:25:09.600 if a block of text seems to conform to the 0:25:09.640 --> 0:25:12.880 same sort of stuff that the language model would produce, 0:25:13.760 --> 0:25:17.639 then AI probably created the text. That's the idea. 0:25:17.880 --> 0:25:22.280 They're saying essentially that if the text is really similar 0:25:22.320 --> 0:25:25.880 to what AI would create, then AI probably created it. 0:25:26.880 --> 0:25:30.119 And let's think about how some tools use autocomplete to 0:25:30.160 --> 0:25:32.600 help you write a text or sentence. Using a purely 0:25:32.680 --> 0:25:35.520 hypothetical scenario to kind of get our minds wrapped around this, 0:25:36.160 --> 0:25:38.880 let's say that you were typing into something that has 0:25:38.960 --> 0:25:43.640 autocomplete built into it, the sentence or the phrase "I'm 0:25:43.680 --> 0:25:48.800 going to go for a," and then whatever tool you're 0:25:48.840 --> 0:25:53.920 typing it into suggests the word "walk" as an autocomplete option. Well, 0:25:53.960 --> 0:25:57.800 that would be because the language model that is powering 0:25:58.280 --> 0:26:04.760 this autocomplete function has sampled millions of passages, 0:26:04.880 --> 0:26:07.879 millions and millions and millions of documents, and has found 0:26:08.440 --> 0:26:11.760 that the word "walk" has been the most common word 0:26:11.800 --> 0:26:15.199 to follow the phrase "I'm going to go for a," 0:26:16.520 --> 0:26:21.720 and so therefore it offers that as the suggestion, and 0:26:21.920 --> 0:26:24.160 maybe it would even offer you a few options. Maybe 0:26:24.160 --> 0:26:27.359 it would say "walk," maybe it'd say "swim." In the UK, 0:26:27.480 --> 0:26:32.080 maybe it'd say "a curry." Who knows? But, you know, 0:26:32.119 --> 0:26:33.959 it would give you maybe a couple of different options, 0:26:33.960 --> 0:26:35.960 but they would be the ones that would most likely 0:26:36.040 --> 0:26:40.240