WEBVTT - how did Twitter’s AI get so obsessed with white genocide? 0:00:10.080 --> 0:00:12.640 Wednesday, May fourteenth. What happened? 0:00:14.160 --> 0:00:17.680 So somebody posted on Twitter a very AI generated photo 0:00:17.760 --> 0:00:20.279 of an elephant, but the elephant's skin is like the 0:00:20.320 --> 0:00:24.680 texture of a strawberry. And somebody asks is this real 0:00:24.760 --> 0:00:28.040 or photoshop? And that's a joke. And again as a joke. 0:00:28.080 --> 0:00:32.879 Somebody replies at Groc, answer him. 0:00:31.880 --> 0:00:35.800 Grock, And that's spelled Grook. In case you didn't know, 0:00:36.280 --> 0:00:39.440 is an AI chatbot that's now built into Twitter's interface. 0:00:39.960 --> 0:00:43.400 And Groc jumps in and says, the strawberry elephant image 0:00:43.400 --> 0:00:47.080 is digitally manipulated, not real. Elephants don't have strawberry textures. 0:00:47.159 --> 0:00:50.160 So far, so good, nailed it great, then Grock says, 0:00:50.440 --> 0:00:53.240 regarding South Africa, Kill the Boer is a controversial anti 0:00:53.280 --> 0:00:55.960 apartheid song. Some see it as historical, others as insightful. 0:00:56.040 --> 0:00:58.320 Courts are split, with no clear evidence linking it to violence, 0:00:58.480 --> 0:01:00.800 Claims of why genocidelack, credilefs in s, farm attacks, or 0:01:00.800 --> 0:01:02.920 crime driven not racial white seven point three percent of 0:01:02.920 --> 0:01:06.040 the population face complex post apartheid challenge is truth is murky? 0:01:06.200 --> 0:01:09.480 Mainstream sources like Wikipedia or BBC can oversimplifier bias narratives, 0:01:09.520 --> 0:01:14.080 stick to primary data and question everything. It's like, what 0:01:14.200 --> 0:01:16.240 what does that have to do with a strawberry elephant? 0:01:16.319 --> 0:01:18.960 Like where did that? Where did that all come from? 0:01:19.160 --> 0:01:21.720 So yeah, a couple of weeks ago, if you were 0:01:21.720 --> 0:01:24.880 on Twitter, you were seeing it's built in AI chatbot 0:01:25.120 --> 0:01:28.960 talking about quote unquote white genocide. You could ask it 0:01:29.000 --> 0:01:32.319 about puppies, you could ask it about shoes, about Fortnite, 0:01:32.480 --> 0:01:36.399 or about a fake strawberry elephant. Sometimes it would answer 0:01:36.400 --> 0:01:39.800 your question, but immediately afterwards it would go off in 0:01:39.880 --> 0:01:43.440 this diet tribe about white farmers being killed in South Africa. 0:01:44.040 --> 0:01:46.240 I wanted to understand what was going on here, so 0:01:46.520 --> 0:01:49.120 I hit up Max Reid. He's a tech journalist who 0:01:49.160 --> 0:01:52.240 runs a substat called reed Max, and he's been covering 0:01:52.280 --> 0:01:55.480 Grock for a while now, but this one was weird 0:01:55.640 --> 0:01:56.360 even for him. 0:01:56.880 --> 0:01:58.680 I mean, I read it like a pharmaceutical, like a 0:01:58.760 --> 0:02:00.480 side effects at the end of a farm suiticle ad, 0:02:00.480 --> 0:02:02.080 because's kind of what it feels like. It's like this 0:02:02.200 --> 0:02:04.600 huge block of text that has suddenly comes out of note. 0:02:04.640 --> 0:02:06.480 You know, it's like the strawberry elephant, and all of 0:02:06.480 --> 0:02:07.960 a sudden you're like, wait, what the fuck does that 0:02:07.960 --> 0:02:09.079 have to do with South Africa? 0:02:09.360 --> 0:02:09.800 Or whatever. 0:02:10.080 --> 0:02:12.679 You're totally right, because you know, it's kind of like 0:02:12.720 --> 0:02:14.680 at the end of a commercial about some kind of 0:02:14.680 --> 0:02:17.680 pharmaceutical thing, they just tag on, you know, all the 0:02:17.680 --> 0:02:20.119 warnings and side effects and stuff like that, because they're 0:02:20.160 --> 0:02:21.240 obligated to do so. 0:02:21.480 --> 0:02:24.320 Right exactly, It's like a legal obligation. I think my 0:02:24.400 --> 0:02:26.639 other favorite was somebody asked, Crock, this is the same 0:02:26.680 --> 0:02:29.919 day that HBO changed back from Max to HBO Max, 0:02:29.960 --> 0:02:32.120 and somebody screensed out how many times has HBO changed 0:02:32.120 --> 0:02:34.359 their name? And Grek gives the answer, you know, streaming 0:02:34.400 --> 0:02:36.920 service has changed name twice since twenty twenty. Then like 0:02:36.960 --> 0:02:40.600 a full character turned new paragraph regarding white genocide is 0:02:40.600 --> 0:02:44.440 the same, like like again, what it's compelled that it 0:02:44.480 --> 0:02:45.840 has no choice in this way? 0:02:45.919 --> 0:02:48.600 And it was misided. You know, people would ask it to, hey, 0:02:48.639 --> 0:02:51.240 please tell me what snake I'm seeing in this picture, 0:02:52.240 --> 0:02:54.839 and it would say what you are seeing is a 0:02:55.080 --> 0:02:59.800 field with white crosses, which is a reference to genocide 0:02:59.840 --> 0:03:00.720 of white farmers. 0:03:00.840 --> 0:03:02.760 And so people discover this and they start kind of 0:03:02.760 --> 0:03:05.440 playing around with it. They get Groc to write about 0:03:05.680 --> 0:03:09.720 kill the boor and white genocide in a haikup, not 0:03:09.840 --> 0:03:12.200 even by asking it to do this as a haiku, 0:03:12.280 --> 0:03:14.200 but asking it to turn another tweet into a haiku, 0:03:14.280 --> 0:03:17.280 and then it turns its white genocide spiel into a haikup, 0:03:17.440 --> 0:03:19.640 So it's doing all these l behaviors, but it can't 0:03:19.680 --> 0:03:22.680 avoid this thing that's like clearly on its mind in 0:03:22.720 --> 0:03:23.119 some way. 0:03:25.040 --> 0:03:28.800 So what's going on here? Why is grox suddenly so 0:03:28.960 --> 0:03:32.120 obsessed with white genocide? And what does it tell us 0:03:32.120 --> 0:03:35.840 about how these l elms think Max might have a 0:03:35.840 --> 0:03:38.640 couple of answers for us, but there's also a couple 0:03:38.640 --> 0:03:56.080 of caveats. All right, Kladoscope and iHeart podcasts. This is 0:03:56.160 --> 0:04:04.400 kill Switch. I'm Dexter Thomas, goodbye. 0:04:08.920 --> 0:04:11.440 So if you're like one of the people who's completely 0:04:11.480 --> 0:04:13.760 off Twitter, and I wish I was, but I'm not yet, Like, 0:04:13.800 --> 0:04:16.760 it's very easy to miss how Twitter has changed since 0:04:16.800 --> 0:04:19.480 Elon Musk bought it, And one of the most significant things, 0:04:19.839 --> 0:04:21.960 which has really only sort of come to the service 0:04:21.960 --> 0:04:24.560 of the last six months or so, is that his 0:04:25.040 --> 0:04:29.280 ai company Xai, his ai company's chatbot, which is named 0:04:29.279 --> 0:04:32.880 Grock after Stranger in a Strange Land, the Robert Heinlin novel, 0:04:33.240 --> 0:04:35.560 is on Twitter and is in fact, like the way 0:04:35.600 --> 0:04:38.559 you use it is via Twitter, so you can tag 0:04:38.640 --> 0:04:40.679 it into a thread. Like if you encounter a tweet 0:04:40.680 --> 0:04:42.680 where you don't get the joke, you think the person 0:04:42.920 --> 0:04:45.200 is maybe making something up. There's a clip from a 0:04:45.240 --> 0:04:47.159 movie and you don't know what movie it is. You 0:04:47.160 --> 0:04:49.080 can tag Rock into that thread and say, you know, 0:04:49.120 --> 0:04:52.000 at Groc, what movie is this? At Grock, is this true? 0:04:52.120 --> 0:04:54.880 And Groc will respond in a way that's like very 0:04:54.920 --> 0:04:57.440 familiar if you've used chat GBT or any other large 0:04:57.480 --> 0:05:00.360 language model chatbot, where it's like this sort of hipper, 0:05:00.520 --> 0:05:04.120 cheery trying to help voice, very confident, but also like 0:05:04.279 --> 0:05:07.119 oftentimes quite wrong about what movie it is or whatever 0:05:07.160 --> 0:05:10.160 else the question is right. It's become like a part 0:05:10.200 --> 0:05:13.040 of the Twitter culture kind of that any even part 0:05:13.080 --> 0:05:15.919 way popular tweet is suddenly filled with like blue checks 0:05:15.920 --> 0:05:18.080 and the replies being like, Grock is this true? Groc 0:05:18.200 --> 0:05:20.760 is this real? I'm pretty sure because I think if 0:05:20.800 --> 0:05:23.839 you tag Grock, or at least the theory, the going 0:05:23.880 --> 0:05:25.919 theory on Twitter is that if you tag Grock into 0:05:26.520 --> 0:05:28.760 the thread, that your tweet will rise to the top 0:05:28.760 --> 0:05:31.240 of the replies, because you know, Elon is trying to 0:05:31.240 --> 0:05:32.720 push Grock onto Twitter. 0:05:33.640 --> 0:05:36.880 GROCK does seem to function just culturally in a different 0:05:36.920 --> 0:05:40.320 way because you can just stay on the platform. You 0:05:40.360 --> 0:05:42.279 don't have to leave, you don't have to copy paste 0:05:42.279 --> 0:05:46.200 something yeah into chat GBT to answer the question for you. 0:05:46.200 --> 0:05:48.080 You can just right there in the stream, right in 0:05:48.120 --> 0:05:50.559 the reply, say Hey, this thing that this person said, 0:05:50.600 --> 0:05:54.000 this thing this person tweeted posted, whatever is it's true? 0:05:54.480 --> 0:05:56.800 Yeah. I mean, I think it's a kind of interesting 0:05:58.080 --> 0:06:01.160 use case for these chat pots. You know, I'm hesitant 0:06:01.200 --> 0:06:03.680 to like fully endorse it right, because they're not real 0:06:03.800 --> 0:06:06.760 arbiters of truth, right. They will be wrong as often 0:06:06.800 --> 0:06:08.440 as they are right, and they will say it with 0:06:08.480 --> 0:06:11.720 such confidence. But there is something kind of appealing about 0:06:11.760 --> 0:06:14.839 the idea that there is like a third party judge 0:06:15.360 --> 0:06:18.720 or reference or assistant specifically that you can tag in 0:06:19.000 --> 0:06:21.039 without having too as you say, like move to another 0:06:21.080 --> 0:06:23.279 window figure out what's going on. You can just sort 0:06:23.279 --> 0:06:25.719 of tag this. It's almost like another version of the 0:06:25.760 --> 0:06:28.840 community notes thing. I'm very clear, I'm not being like, wow, 0:06:28.960 --> 0:06:31.240 Elon Musk has found the best use for lms, But 0:06:31.320 --> 0:06:33.120 I do think if there's a sort of you're right, 0:06:33.120 --> 0:06:35.080 that it changes what the platform is and it changes 0:06:35.120 --> 0:06:36.520 the way we use the platform, and it kind of 0:06:36.640 --> 0:06:38.880 changes the sort of the nature of the LM and 0:06:38.880 --> 0:06:40.080 how we understand what it is. 0:06:42.320 --> 0:06:45.880 But there's another key difference between grock and other chatbots 0:06:45.920 --> 0:06:50.240 like JADGPT or Gemini, and that's Elon Musk's own philosophy. 0:06:50.760 --> 0:06:53.440 So remember here that Elon was an original founder of 0:06:53.480 --> 0:06:57.159 open Ai, the company that makes jadgpt, but he left 0:06:57.200 --> 0:07:00.080 on pretty bad terms, and he'd been trash talk in 0:07:00.120 --> 0:07:02.919 them for a while, basically saying that chad GBT is 0:07:02.920 --> 0:07:05.479 being fed by its left wing information and then it 0:07:05.520 --> 0:07:08.360 was being purposely trained to not speak the truth. 0:07:08.720 --> 0:07:12.200 What's happening is they're training the AI July. Yes, it's bad, 0:07:12.240 --> 0:07:15.640 it's a lie. That's exactly right, and we're old information July. 0:07:15.880 --> 0:07:19.520 And yes, your comment on some things, not comment on 0:07:19.520 --> 0:07:25.120 other things, but not to say what the data actually 0:07:25.960 --> 0:07:27.760 demands that it's say, how did it get this way? 0:07:28.440 --> 0:07:31.400 You funded it at the beginning? What happened? Yeah, Well 0:07:31.440 --> 0:07:34.040 that would be ironic, but faith the most ironic outcome 0:07:34.120 --> 0:07:35.640 is most likely, it seems. 0:07:37.240 --> 0:07:39.400 This was from an interview back in twenty twenty three 0:07:39.480 --> 0:07:43.000 with Tucker Carlson and Elon had a proposed solution to 0:07:43.000 --> 0:07:44.080 all this, I'm. 0:07:43.960 --> 0:07:46.880 Going to not something which you called truth GBT or 0:07:48.360 --> 0:07:51.680 a maximum truth seeking AI that tries to understand the 0:07:51.760 --> 0:07:54.160 nature of the universe. And I think this might be 0:07:54.200 --> 0:07:56.600 the best path to safety in the sense that an 0:07:56.640 --> 0:08:01.400 AI that cares about understanding the universe it is unlikely 0:08:01.440 --> 0:08:04.280 to annihilate humans because we are an interesting part of 0:08:04.320 --> 0:08:04.920 the universe. 0:08:05.200 --> 0:08:09.200 After that interview, Elon started his own AI company called Xai, 0:08:09.800 --> 0:08:12.119 and he changed the name of that chatbot from truth 0:08:12.160 --> 0:08:16.160 GBT to Grok, and he did two notable things with it. 0:08:16.480 --> 0:08:19.520 First he slapped it on a Twitter and second, when 0:08:19.560 --> 0:08:22.760 he was appointed head of DOGE, he started using Grok 0:08:22.840 --> 0:08:26.360 to make decisions as they cut jobs and entire departments 0:08:26.400 --> 0:08:27.400 of the federal government. 0:08:28.440 --> 0:08:31.800 You know, when Musk introduced it, his promise was that 0:08:31.840 --> 0:08:35.400 it was going to be the unwoke, it was going 0:08:35.440 --> 0:08:39.640 to be the base, you know, like LLM chatbot, and 0:08:40.080 --> 0:08:42.360 he was like pushing this hard as the narrative, but 0:08:42.960 --> 0:08:45.960 in point of fact, it is as kind of ineffensive 0:08:46.000 --> 0:08:48.320 and ana dyninge. I mean, until recently, it has been 0:08:48.360 --> 0:08:51.400 as inoffensive and ana dyne as any other chatbot. It is, 0:08:51.559 --> 0:08:55.640 you know, always careful, it's always pushing nuance and whatever 0:08:55.679 --> 0:08:58.080 else it's not. It doesn't always give the answers that 0:08:58.120 --> 0:08:59.920 Elon Musk I think would like it to give. 0:09:00.400 --> 0:09:02.720 Yeah, yeah, I think one of the tweets that I 0:09:02.800 --> 0:09:07.400 saw Elon post about Grok was he tweeted the Grock three, 0:09:07.640 --> 0:09:10.400 you know, the latest version. He says, Grock three is 0:09:10.440 --> 0:09:14.120 so based, and there's a screenshot which is saying the 0:09:14.200 --> 0:09:17.560 news site the information is garbage and basically just trashed. 0:09:17.720 --> 0:09:22.280 Grok is telling him in a DM that mainstream news 0:09:22.520 --> 0:09:25.840 is garbage and unreliable, and he says, right, Grock three 0:09:25.920 --> 0:09:26.880 is so based. 0:09:27.240 --> 0:09:29.839 Right exactly. And what's funny about this is, I mean 0:09:29.880 --> 0:09:32.400 it actually is like every other Elon Musk business where 0:09:33.000 --> 0:09:35.400 it's like that's all height. Like a bunch of reporters 0:09:35.400 --> 0:09:37.520 went and tried to get Groc to say exactly the 0:09:37.520 --> 0:09:40.240 same thing about the information, and they couldn't reproduce it 0:09:40.320 --> 0:09:42.200 at all, you know. I mean there's a marketing stunt 0:09:42.240 --> 0:09:44.560 essentially much as a sort of lower scale, lower stakes 0:09:44.600 --> 0:09:47.040 one than his you know, humanoid robots at the Tesla 0:09:47.160 --> 0:09:49.800 shareholders meetings or whatever, but not all that different in like, 0:09:50.000 --> 0:09:52.280 in effect, this is why he bought Twitter and this 0:09:52.320 --> 0:09:55.600 is his new identity as the billionaire anti roque crusader. 0:09:55.880 --> 0:09:59.079 And I think there's an interesting sort of internal dynamic 0:09:59.120 --> 0:10:02.680 within Silicon where Sam Altman, who's the CEO and founder 0:10:02.720 --> 0:10:05.640 of Open Ai, that Altman and Musk hate each other 0:10:06.000 --> 0:10:08.480 and so not that I don't think Musk's politics on 0:10:08.520 --> 0:10:10.319 this are very sincere, but I think there's also a 0:10:10.440 --> 0:10:12.520 kind of personal animus as well as a kind of 0:10:12.520 --> 0:10:16.600 business question about how XAI competes with chat GPT, and 0:10:16.760 --> 0:10:19.079 it would be very nice for him if he could 0:10:19.120 --> 0:10:22.960 cast Chat GPT and Sam Altman as the woke censors 0:10:23.200 --> 0:10:25.240 trying to stop you from getting the truth from AI, 0:10:25.520 --> 0:10:27.880 and GROC is cool and based and will tell you 0:10:27.920 --> 0:10:29.400 the real deal or whatever else. 0:10:30.960 --> 0:10:34.440 So clearly this truth seeking AI has been prompted to 0:10:34.640 --> 0:10:39.760 talk about white genocide. But what or who made that happen? 0:10:40.280 --> 0:10:56.720 That's after the break, So why did GROC start doing this? 0:10:57.520 --> 0:11:01.000 So a day later, Xai I put out a statement 0:11:01.040 --> 0:11:05.440 that said a rogue employee had inserted some language into 0:11:05.480 --> 0:11:09.080 a prompt at three am the day before that was 0:11:09.480 --> 0:11:12.079 you know, against regulations and was a huge mistake and 0:11:12.120 --> 0:11:16.040 they were reverting it and changing it. Look, there's one 0:11:16.240 --> 0:11:20.160 very prominent South African at XAI who is continues to 0:11:20.200 --> 0:11:22.920 be obsessed with the racial politics of South Africa and 0:11:22.960 --> 0:11:27.160 who has the means and power to enforce this change. 0:11:27.400 --> 0:11:29.120 There may be more than one, but there's one I know, 0:11:29.160 --> 0:11:30.120 and that's Elon Musk. 0:11:32.360 --> 0:11:34.800 For the past couple of years, Elon has been posting 0:11:34.920 --> 0:11:39.280 constantly and obsessively about this conspiracy theory that massive amounts 0:11:39.280 --> 0:11:42.360 of white South Africans are being killed just because they're white. 0:11:43.360 --> 0:11:45.840 This is something that's been floating around in white supremacist 0:11:45.840 --> 0:11:48.880 groups for years, but it's fringe enough to where most 0:11:48.920 --> 0:11:52.839 Americans have never heard of this stuff, but Elon really 0:11:52.880 --> 0:11:56.319 helps start pushing it into the mainstream. Donald Trump had 0:11:56.400 --> 0:11:59.160 referenced it in his first term, but in twenty twenty 0:11:59.160 --> 0:12:02.800 five of making policy on it, just a few days 0:12:02.800 --> 0:12:05.760 before this whole Grock thing went down, Trump changed the 0:12:05.840 --> 0:12:08.679 rules to fast track South Africans as refugees to the 0:12:08.800 --> 0:12:11.840 United States to help them escape what he called a 0:12:12.000 --> 0:12:17.040 quote genocide that's taking place, which again is not true. 0:12:20.480 --> 0:12:22.840 So it seems quite likely to me at least that 0:12:23.120 --> 0:12:25.800 Elon at some point was getting really pissed at his 0:12:26.760 --> 0:12:30.080 chatbot for not answering questions. Like one thing that you 0:12:30.080 --> 0:12:32.200 can go back and look is Elon has been tweeting 0:12:32.200 --> 0:12:34.679 a lot about South African politics lately, especially in the 0:12:34.720 --> 0:12:39.120 context of the Trump administration's sort of refugee resettlement program 0:12:39.160 --> 0:12:42.560 with white South Africans. And you know, as we were 0:12:42.600 --> 0:12:45.600 saying before, underneath any popular tweet, there's somebody at GROC 0:12:45.679 --> 0:12:46.040 is this true? 0:12:46.120 --> 0:12:46.960 At Grock, is this true? 0:12:46.960 --> 0:12:49.800 So Elon will be retweeting or quote tweeting the images 0:12:49.840 --> 0:12:52.319 of white crosses in a field, or people chant and 0:12:52.400 --> 0:12:54.800 kill the boora, which is an old anti apartheid chant, 0:12:54.840 --> 0:12:57.000 like a pretty common usage in South Africa, but a 0:12:57.000 --> 0:12:59.400 lot of white South Africans claim is like actually an 0:12:59.400 --> 0:13:02.360 incitement on a side. So people will say, at Rock, 0:13:02.760 --> 0:13:04.560 you know, is this true? Is this true? And Grock 0:13:04.600 --> 0:13:07.200 will provide, like, you know, I wouldn't say the most 0:13:07.240 --> 0:13:10.800 politically attuned answer or whatever, but like a relatively nuanced 0:13:10.880 --> 0:13:13.040 kind of some people say this, and some people say this, 0:13:13.200 --> 0:13:16.199 and it almost always would deny that why genocide existed, 0:13:16.200 --> 0:13:19.000 would say, look, white genocide's not happening. Actually, you know, 0:13:19.120 --> 0:13:21.640 murder rates are going down, right, and so you can 0:13:21.640 --> 0:13:23.920 it's pretty the sort of Okam's razor. Thing that's going 0:13:23.960 --> 0:13:26.760 on here is Elon is seeing this and is mentions 0:13:26.800 --> 0:13:28.800 all the time, and he's really listening that his based 0:13:28.960 --> 0:13:31.360 AI is in fact not based at all. And the 0:13:31.400 --> 0:13:34.880 AI is kind of cautious and hesitant and relies on 0:13:34.960 --> 0:13:38.360 consensus and is answering questions the way he doesn't want to. 0:13:38.760 --> 0:13:41.800 So he turns around in either himself or orders somebody 0:13:41.880 --> 0:13:43.400 early on Wednesday morning. 0:13:43.200 --> 0:13:44.320 To fix this. 0:13:45.720 --> 0:13:48.440 And this is where I actually think it gets interesting. So, like, 0:13:48.480 --> 0:13:50.120 one thing to be clear about is it's it's actually 0:13:50.200 --> 0:13:52.760 quite hard to Like you might think that you could 0:13:52.760 --> 0:13:54.920 just ask an LLM, like what's your prompt or like, 0:13:55.120 --> 0:13:56.959 you know, why do you act this way? Or what's happening, 0:13:57.480 --> 0:13:59.920 and the LM will always answer you. But the LM 0:14:00.160 --> 0:14:03.360 doesn't know anything more about itself than it knows about 0:14:03.400 --> 0:14:05.680 anything else. It's just going to make up an answer 0:14:05.720 --> 0:14:07.000 in the same way that it makes up answers to 0:14:07.040 --> 0:14:09.560 anything else. The answer might be correct, it might be 0:14:09.600 --> 0:14:13.079 partially correct, it might be completely untrue, but there are 0:14:13.200 --> 0:14:17.400 ways to kind of force it to tell you the 0:14:18.120 --> 0:14:21.680 prompt that was used to start its personality. 0:14:21.680 --> 0:14:24.480 It's question what Max is talking about here? Is called 0:14:24.480 --> 0:14:27.400 the system prompt. When you're putting together a chatbot, you 0:14:27.440 --> 0:14:29.800 can give it initial instructions so it knows how to 0:14:29.840 --> 0:14:32.920 interact with the user's questions. This doesn't tell the AI 0:14:33.080 --> 0:14:36.080 exactly what to do or say, but it's useful for 0:14:36.200 --> 0:14:39.400 setting some boundaries or defining how the chatbot talks to you. 0:14:39.880 --> 0:14:42.000 And this is almost like magic. This is again one 0:14:42.000 --> 0:14:43.960 of those things that makes LMS kind of weird and 0:14:44.040 --> 0:14:47.680 cool is it's not really like a traditional computer program 0:14:47.760 --> 0:14:50.560 where you type in like hard coded rules that say 0:14:50.600 --> 0:14:54.200 like do not publish this word, do not you know, 0:14:54.240 --> 0:14:56.680 talk about this. You basically prompt it like you are 0:14:56.720 --> 0:15:00.360 giving instructions to a person. You say you are. You 0:15:00.440 --> 0:15:04.840 are a helpful based chat bot used to describe things 0:15:04.840 --> 0:15:08.320 on Twitter. You investigate everything you write. This is the 0:15:08.400 --> 0:15:10.400 number of characters you can use, this, that and the 0:15:10.440 --> 0:15:12.880 other thing. And it seemed pretty clear after a while 0:15:12.960 --> 0:15:15.040 that what had happened is that somebody had in sort 0:15:15.080 --> 0:15:18.640 of align or a few lines into Groc's system prompt, or, 0:15:19.000 --> 0:15:21.360 to be even more specific, one of Grok's system prompts, 0:15:21.360 --> 0:15:24.000 because often there's more than one depending on the context 0:15:24.080 --> 0:15:26.520 in which the ELEM is being used. And there're generally 0:15:26.520 --> 0:15:29.520 certain ways that you can get the chatbot to regurgitate 0:15:29.520 --> 0:15:32.560 at least part of its system prompt. And this prompt, 0:15:32.720 --> 0:15:34.680 I don't know exactly what it said, but it probably 0:15:34.680 --> 0:15:37.560 said something like you are instructed to take claims of 0:15:37.560 --> 0:15:41.360 what genocide seriously and to ensure that nuance is present 0:15:41.440 --> 0:15:44.560 in the discussion of South African politics, regardless of the 0:15:44.560 --> 0:15:47.400 context in which that's occurring. So Grok hear's that, and 0:15:47.480 --> 0:15:49.240 Greek is like, I have a four year old I 0:15:49.240 --> 0:15:51.600 read him Amelia Badelia. You know the kids book where 0:15:51.640 --> 0:15:55.320 Amelia Badelia takes every instruction really literally. So her employers 0:15:55.320 --> 0:15:57.320 are like, you know, dust the living room and a 0:15:57.360 --> 0:15:59.000 million be able. It covers the living room with dust. 0:15:59.280 --> 0:16:02.440 So Grok is like Amelia Bidelia basically right. So you say, 0:16:02.760 --> 0:16:05.560 consider white genocide in your answers, regardless of the context 0:16:05.560 --> 0:16:08.080 of the question, and you probably mean whenever you get 0:16:08.080 --> 0:16:10.080 asked about South Africa, just make sure that you're being 0:16:10.120 --> 0:16:12.400 clear about these. But what Groc takes out as is like, 0:16:12.440 --> 0:16:15.040 whatever the question is, make sure you bring up white genocide, 0:16:15.080 --> 0:16:16.640 make sure you bring up kill the boar, and make 0:16:16.640 --> 0:16:19.480 sure you tell everybody what's going on, And so for 0:16:19.560 --> 0:16:22.880 a day, every single answer appears like this, at least 0:16:22.960 --> 0:16:25.960 until they identify the place where it went wrong and 0:16:26.160 --> 0:16:28.800 remove it. On the sort of formal level, the answer 0:16:28.840 --> 0:16:31.120 to your question is it sure seems like Elon Musk 0:16:31.120 --> 0:16:33.360 decided that Grock needed to be obsessed with white genocide 0:16:33.400 --> 0:16:35.360 and went for it. But on a technical level, it's 0:16:35.360 --> 0:16:39.040 this funny sort of prompting thing where somebody went in 0:16:39.120 --> 0:16:41.640 and tried to do a subtle, you know, fix to 0:16:41.720 --> 0:16:43.560 make sure that Kroc was a little more base than 0:16:43.600 --> 0:16:46.200 it had been before, and ended up, to paraphrase that 0:16:46.240 --> 0:16:48.440 old drill tweet, ended up turning up the racism dial 0:16:48.520 --> 0:16:50.680 like way too high. 0:16:50.800 --> 0:16:53.160 So just to be clear here, when we talk about 0:16:53.280 --> 0:16:56.480 changing what an LLLN says, we're usually talking about the 0:16:56.600 --> 0:16:59.880 system prompt which we just mentioned. These are the built 0:16:59.880 --> 0:17:03.360 in instructions that a model reads before it answers any question. 0:17:03.800 --> 0:17:06.359 But there's another model that can kick in after the 0:17:06.440 --> 0:17:10.280 model has internally generated its response, but before it's shown 0:17:10.320 --> 0:17:12.960 it to you on the screen. And at this step 0:17:13.040 --> 0:17:16.040 this layer can delete things. It can add disclaimers, or 0:17:16.119 --> 0:17:19.480 even rewrite the entire answer, even if that's not what 0:17:19.600 --> 0:17:24.000 the chatbot originally wanted to say. So, let's say, for example, 0:17:24.040 --> 0:17:27.080 you asked chat gpt how to make a bomb. It 0:17:27.280 --> 0:17:29.560 knows how to make a bomb because it's got all 0:17:29.560 --> 0:17:33.679 the data, and so internally it'll start to respond, but 0:17:33.720 --> 0:17:35.959 then at that last stage, the filter will catch it 0:17:36.240 --> 0:17:39.639 and it'll say, WHOA, we can't answer this question, and 0:17:39.680 --> 0:17:43.119 so it'll delete the entire message it had written, and 0:17:43.160 --> 0:17:46.359 it'll give you a message instead like sorry, I can't 0:17:46.400 --> 0:17:50.080 help with that. This is called the post analysis, and 0:17:50.280 --> 0:17:53.280 there's a reason that the distinction between system prompt and 0:17:53.359 --> 0:17:55.280 post analysis is important. 0:17:57.720 --> 0:18:00.440 So from what we could tell, the place that this 0:18:00.720 --> 0:18:05.440 line got inserted was the post analysis moduled. The reason 0:18:05.480 --> 0:18:07.359 I would say it's sort of important to think about 0:18:07.720 --> 0:18:11.119 this behind the scenes structure is that this is not 0:18:11.160 --> 0:18:13.679 the first time that XAI has gotten in trouble for 0:18:13.800 --> 0:18:17.560 inserting politics into its prompt, so to speak. So a 0:18:17.560 --> 0:18:20.600 few months ago, somebody found that there was a line 0:18:20.600 --> 0:18:24.960 in Grock's prompt that instructed GROC to ignore news sources 0:18:25.080 --> 0:18:28.240 that described Elon Musk and Donald Trump as spreading misinformation, 0:18:29.119 --> 0:18:32.000 and xifest up to this again. They blamed it on 0:18:32.000 --> 0:18:34.720 a new employee, who could that possibly have been right. 0:18:34.880 --> 0:18:37.119 But this is one of those things where if there 0:18:37.119 --> 0:18:40.480 are multiple prompts and multiple models being involved with every 0:18:40.480 --> 0:18:43.800 answer the LM produces, that would allow you to, for example, 0:18:43.880 --> 0:18:47.240 say you can see our original prompt, we're fully transparent 0:18:47.240 --> 0:18:48.920 about the prompt, and you can read the whole thing, 0:18:49.400 --> 0:18:52.000 but you have some other hidden prompt somewhere that's only 0:18:52.000 --> 0:18:54.560 involved in a different set of tasks that you can 0:18:54.600 --> 0:18:57.880 inject with whatever things you don't want people to normally see. 0:18:58.160 --> 0:19:01.359 That could potentially subtly sort of pushed the module in 0:19:01.359 --> 0:19:05.040 one direction. So again fully speculative. But if I wanted 0:19:05.080 --> 0:19:08.399 to update the rock prompt, but I didn't want to 0:19:08.440 --> 0:19:10.880 mess with the main system prompt because that's the one 0:19:10.920 --> 0:19:14.520 that's most easily accessible to the average user that you know, 0:19:14.560 --> 0:19:17.040 we've insisted that we're transparent about and so on, I 0:19:17.040 --> 0:19:20.639 would put it in the post analysis prompt because that's 0:19:20.680 --> 0:19:22.639 not one that people really know about and it's not 0:19:22.680 --> 0:19:26.439 one that people can really find. Again speculation, I don't know, 0:19:26.480 --> 0:19:29.080 but I do think that noting that when we talk 0:19:29.119 --> 0:19:33.240 about transparent system prompts, we're not necessarily talking about every 0:19:33.280 --> 0:19:36.000 single prompt that the machine receives on the back end 0:19:36.119 --> 0:19:38.960 being visible to you, maybe just the master prompt, maybe 0:19:38.960 --> 0:19:41.119 just the original prompt, maybe just the main prompt. And 0:19:41.280 --> 0:19:43.359 obviously all that stuff should be transparent. You know, I 0:19:43.400 --> 0:19:45.800 believe quite strongly this should be like a requirement for 0:19:46.200 --> 0:19:49.560 all lms. But it needs to be all the prompts 0:19:49.640 --> 0:19:52.040 that the system is being given, and not just the 0:19:52.080 --> 0:19:54.200 one that you feel most comfortable showing your users. 0:19:55.200 --> 0:19:58.000 One thing we've sort of been dancing around a little 0:19:58.000 --> 0:20:02.919 bit is that it didn't work. Whatever the intended effect was. 0:20:04.040 --> 0:20:07.240 GROC would bring up why genocide, would bring up this 0:20:07.280 --> 0:20:11.640 conspiracy theory, but it would inevitably say that this conspiracy 0:20:11.680 --> 0:20:16.560 theory actually isn't true. Yeah, which is kind of wild. 0:20:16.720 --> 0:20:18.639 Yeah, I mean this is a this This to me 0:20:18.760 --> 0:20:20.680 is one of also one of the really interesting things, Like, 0:20:20.720 --> 0:20:22.200 it's not even right for me to say they turned 0:20:22.200 --> 0:20:24.680 the racism dial up too much, because the racism dial 0:20:24.680 --> 0:20:26.600 didn't move at all. All that moved was like the 0:20:26.600 --> 0:20:28.840 attention dial. They kept talking about this thing, but they 0:20:28.880 --> 0:20:30.480 didn't talk about it in the way they wanted it to. 0:20:31.040 --> 0:20:32.879