1 00:00:05,160 --> 00:00:07,880 Speaker 1: You know that moment in the horror movie where the 2 00:00:07,960 --> 00:00:12,040 Speaker 1: monster is coming closer but the person on screen doesn't 3 00:00:12,080 --> 00:00:14,880 Speaker 1: see it. Why does that drive you crazy? And what 4 00:00:14,920 --> 00:00:19,560 Speaker 1: does that teach us about brains? What is theory of 5 00:00:19,800 --> 00:00:23,079 Speaker 1: mind and why is it so important for everyone from 6 00:00:23,160 --> 00:00:27,160 Speaker 1: poker players to con men, to stage magicians to novelists. 7 00:00:27,720 --> 00:00:30,360 Speaker 1: We're going to talk about a very fundamental skill of 8 00:00:30,440 --> 00:00:34,880 Speaker 1: human brains today, and as impressive as AI is currently, 9 00:00:34,920 --> 00:00:38,880 Speaker 1: we're going to ask the question of whether computers can 10 00:00:39,000 --> 00:00:42,440 Speaker 1: replicate this right now or whether it is beyond their 11 00:00:42,479 --> 00:00:48,200 Speaker 1: skill set. Welcome to Inner Cosmos with me David Eagleman. 12 00:00:48,280 --> 00:00:50,920 Speaker 1: I'm a neuroscientist and an author at Stanford, and in 13 00:00:50,960 --> 00:00:54,639 Speaker 1: these episodes we sail deeply into our three pound universe 14 00:00:54,880 --> 00:00:58,280 Speaker 1: to understand why and how our lives look the way 15 00:00:58,280 --> 00:01:02,000 Speaker 1: they do. Today's episode is about what it takes to 16 00:01:02,240 --> 00:01:05,680 Speaker 1: understand other people, how your brain does it, and whether 17 00:01:05,840 --> 00:01:11,920 Speaker 1: computers could do it. So imagine this. You're walking down 18 00:01:11,920 --> 00:01:17,080 Speaker 1: the street and you see someone frantically searching their pockets 19 00:01:17,200 --> 00:01:20,880 Speaker 1: and looking around with furrowed brows in a tight frown. 
20 00:01:21,480 --> 00:01:25,399 Speaker 1: So without them saying a word, you can infer that 21 00:01:25,480 --> 00:01:29,120 Speaker 1: they might have lost something important. Maybe it's his keys. 22 00:01:29,760 --> 00:01:33,560 Speaker 1: Your brain can easily make a good guess about another 23 00:01:33,680 --> 00:01:37,679 Speaker 1: person's mental state just from looking at their actions. We 24 00:01:37,800 --> 00:01:41,319 Speaker 1: are inferring something about what is going on in that 25 00:01:41,400 --> 00:01:45,640 Speaker 1: person's head. But it's more than just pattern matching. It's 26 00:01:45,640 --> 00:01:48,520 Speaker 1: not simply that your brain has seen lots of people 27 00:01:48,600 --> 00:01:52,440 Speaker 1: patting their pockets and you talked with them afterwards, and 28 00:01:52,480 --> 00:01:54,760 Speaker 1: you figured out why they were doing that, and you 29 00:01:54,880 --> 00:01:59,000 Speaker 1: detected a pattern, and you memorized, ah, okay, that pattern 30 00:01:59,040 --> 00:02:04,120 Speaker 1: equals that problem. Instead, you have the ability to imagine 31 00:02:04,200 --> 00:02:08,880 Speaker 1: yourself in their situation. You can mentally slip into their 32 00:02:08,960 --> 00:02:12,200 Speaker 1: shoes and ask, what would I be thinking if I 33 00:02:12,320 --> 00:02:16,720 Speaker 1: were patting my pockets and frantically searching around me? And 34 00:02:16,800 --> 00:02:19,959 Speaker 1: maybe you see something else. You see a kid there 35 00:02:20,120 --> 00:02:23,360 Speaker 1: around the corner, and the kid is peeking around the 36 00:02:23,400 --> 00:02:26,720 Speaker 1: corner at the man patting his pockets, and the child 37 00:02:27,200 --> 00:02:30,520 Speaker 1: is giggling. Now, why is the kid giggling while the 38 00:02:30,560 --> 00:02:34,520 Speaker 1: guy is so obviously worried, Well, it probably strikes you 39 00:02:34,560 --> 00:02:37,520 Speaker 1: that he's hiding something from the guy. 
You see that 40 00:02:37,560 --> 00:02:41,160 Speaker 1: the kid is not running away. Instead, he's standing in 41 00:02:41,280 --> 00:02:44,000 Speaker 1: such a way that he'll be spotted. Now it's pretty 42 00:02:44,080 --> 00:02:48,280 Speaker 1: obvious what's happening here. You can step into the man's 43 00:02:48,320 --> 00:02:50,760 Speaker 1: head to feel the worry, and you can step into 44 00:02:50,800 --> 00:02:53,880 Speaker 1: the kid's head to recognize that he feels like he's 45 00:02:53,919 --> 00:02:56,000 Speaker 1: playing a game, even if it doesn't strike you as 46 00:02:56,080 --> 00:03:00,920 Speaker 1: so funny. Then you catch the guy meet eyes with the 47 00:03:01,040 --> 00:03:03,680 Speaker 1: kid for just a fraction of a second, which sends 48 00:03:03,720 --> 00:03:07,360 Speaker 1: the kid into fits of laughter, and you realize the 49 00:03:07,400 --> 00:03:11,280 Speaker 1: man is just playing along. Now, how did you decide 50 00:03:11,680 --> 00:03:14,440 Speaker 1: what is going on in the heads of these two? Again, 51 00:03:14,520 --> 00:03:18,520 Speaker 1: it's not as though you memorized an algorithm here. Okay, 52 00:03:18,560 --> 00:03:21,280 Speaker 1: if there's eye contact, then there's one interpretation. If there's 53 00:03:21,280 --> 00:03:26,320 Speaker 1: no eye contact, then a totally different interpretation. To appreciate 54 00:03:26,520 --> 00:03:29,960 Speaker 1: how complex this mind reading is that you just did, 55 00:03:30,840 --> 00:03:34,639 Speaker 1: just imagine that you're a space alien watching this scene 56 00:03:34,720 --> 00:03:38,080 Speaker 1: from your spaceship. You would be totally confused. You would 57 00:03:38,120 --> 00:03:41,800 Speaker 1: have no idea what's going on in this weird scene 58 00:03:42,280 --> 00:03:45,200 Speaker 1: because you don't know what it is to be a human. 59 00:03:46,000 --> 00:03:49,160 Speaker 1: Here's another analogy to appreciate this.
Think about the way 60 00:03:49,160 --> 00:03:53,400 Speaker 1: that you, as a human might watch fish. You really 61 00:03:53,440 --> 00:03:57,160 Speaker 1: don't understand what the heck they're doing. One fish suddenly 62 00:03:57,160 --> 00:04:01,080 Speaker 1: starts swimming faster, and another starts swimming in circles, and 63 00:04:01,160 --> 00:04:04,400 Speaker 1: one starts flapping its gills faster, and one moves up 64 00:04:04,640 --> 00:04:09,040 Speaker 1: towards the surface. It's all just weird fish behavior to you. 65 00:04:09,040 --> 00:04:11,120 Speaker 1: You don't know how to read any of it. It's 66 00:04:11,200 --> 00:04:14,720 Speaker 1: just fish stuff, and you're not able to immediately construct 67 00:04:14,760 --> 00:04:19,000 Speaker 1: a story about the meaning of any of this. And 68 00:04:19,000 --> 00:04:22,279 Speaker 1: that's what it's like to be this space alien watching 69 00:04:22,320 --> 00:04:27,200 Speaker 1: this guy checking his pockets and the child giggling. Now, 70 00:04:27,760 --> 00:04:31,520 Speaker 1: what allows us, as opposed to the space alien, to 71 00:04:31,600 --> 00:04:36,000 Speaker 1: be so good at reading our fellow humans. This is 72 00:04:36,040 --> 00:04:41,240 Speaker 1: what psychologists and neuroscientists call theory of mind, and that's 73 00:04:41,279 --> 00:04:44,560 Speaker 1: what we're talking about today. Theory of mind is the 74 00:04:44,600 --> 00:04:49,920 Speaker 1: ability to understand that other people have their own thoughts 75 00:04:49,960 --> 00:04:53,720 Speaker 1: and feelings and beliefs that are different from yours. It's 76 00:04:53,760 --> 00:04:58,120 Speaker 1: the ability to recognize that others have their own perspectives. 
77 00:04:58,160 --> 00:05:02,200 Speaker 1: It's the ability to attribute mental states to other people, 78 00:05:02,320 --> 00:05:07,440 Speaker 1: like what their intentions are, or their desires, or their emotions, 79 00:05:07,960 --> 00:05:11,640 Speaker 1: or what they know or don't know. And theory of 80 00:05:11,680 --> 00:05:15,440 Speaker 1: mind is a key cognitive skill that allows us to 81 00:05:15,600 --> 00:05:19,360 Speaker 1: interact with other people in a very rich and nuanced way. 82 00:05:19,880 --> 00:05:23,560 Speaker 1: Just think about how pervasive this skill is in everything 83 00:05:23,600 --> 00:05:27,359 Speaker 1: we do. So take sarcasm. When your friend makes a 84 00:05:27,760 --> 00:05:32,279 Speaker 1: sarcastic comment, you can recognize that her words don't match 85 00:05:32,400 --> 00:05:36,640 Speaker 1: her true intention. So, for example, if she says, oh, awesome, 86 00:05:36,680 --> 00:05:40,919 Speaker 1: more traffic, I love traffic, you infer that she's not 87 00:05:41,120 --> 00:05:46,400 Speaker 1: actually pleased. This requires understanding her mental state: that she 88 00:05:46,600 --> 00:05:52,320 Speaker 1: is irritated, not happy. Now, if you were Siri or Alexa, 89 00:05:52,440 --> 00:05:55,919 Speaker 1: you wouldn't be able to recognize anything but the words. 90 00:05:56,320 --> 00:05:59,800 Speaker 1: You wouldn't understand anything about the mind behind the words. 91 00:06:00,279 --> 00:06:03,200 Speaker 1: So we're going to talk about how brains do it 92 00:06:03,560 --> 00:06:06,719 Speaker 1: and whether or not computers can do it. But before 93 00:06:06,720 --> 00:06:08,320 Speaker 1: we go there, we're going to take a few minutes 94 00:06:08,360 --> 00:06:12,839 Speaker 1: to really appreciate how the skill is everywhere in what 95 00:06:12,960 --> 00:06:17,880 Speaker 1: we do. For example, just think about different professions.
So 96 00:06:18,000 --> 00:06:22,440 Speaker 1: detectives use theory of mind all the time. Did Mr. 97 00:06:22,560 --> 00:06:25,080 Speaker 1: Jones know that the food had gone bad when he 98 00:06:25,160 --> 00:06:28,560 Speaker 1: sold it? Did Mr. Smith know that his boss was 99 00:06:28,640 --> 00:06:32,600 Speaker 1: involved with organized crime or was he acting with no knowledge? 100 00:06:33,200 --> 00:06:35,000 Speaker 1: Or more generally, if they want to know if someone 101 00:06:35,080 --> 00:06:38,120 Speaker 1: is lying, it usually helps to step into their shoes 102 00:06:38,160 --> 00:06:40,880 Speaker 1: and think about what that person knows or doesn't know. 103 00:06:41,279 --> 00:06:44,400 Speaker 1: Magicians use theory of mind. They know that if they 104 00:06:44,800 --> 00:06:47,400 Speaker 1: move their hand in an arc, your attention is going 105 00:06:47,440 --> 00:06:51,560 Speaker 1: to follow that, and therefore they know what you won't 106 00:06:51,640 --> 00:06:54,919 Speaker 1: see them do. They know that even though they know 107 00:06:55,120 --> 00:06:58,040 Speaker 1: something happened, like the card dropped into their sleeve, they 108 00:06:58,040 --> 00:07:01,760 Speaker 1: know that you don't know that. They always keep your 109 00:07:01,920 --> 00:07:05,679 Speaker 1: point of view, your beliefs, at the forefront of their mind. 110 00:07:06,440 --> 00:07:08,800 Speaker 1: Con men do this. They listen to your words and 111 00:07:08,839 --> 00:07:11,920 Speaker 1: they read your body language to gather what you know 112 00:07:12,040 --> 00:07:15,280 Speaker 1: and don't know, and therefore what buttons they should push next.
113 00:07:15,720 --> 00:07:20,480 Speaker 1: Psychiatrists and psychologists always use theory of mind to understand 114 00:07:20,800 --> 00:07:24,200 Speaker 1: what is being expressed from the patient's point of view, 115 00:07:24,400 --> 00:07:26,960 Speaker 1: in other words, what the person believes, whether or not 116 00:07:27,040 --> 00:07:30,360 Speaker 1: it's what the therapist believes. I'll give you another example. 117 00:07:30,640 --> 00:07:33,840 Speaker 1: My friend Maddie is a professional poker player, and he 118 00:07:33,960 --> 00:07:37,400 Speaker 1: describes poker playing like this. He says, when you're learning 119 00:07:37,480 --> 00:07:39,880 Speaker 1: to play poker, you think about the cards you have 120 00:07:39,960 --> 00:07:43,239 Speaker 1: in your hand. As you get better, you think about 121 00:07:43,240 --> 00:07:46,840 Speaker 1: your hand and also what the other person is thinking. 122 00:07:47,200 --> 00:07:49,560 Speaker 1: And as you get even better, you think about what 123 00:07:49,640 --> 00:07:52,960 Speaker 1: the other person is thinking you're thinking, and when you 124 00:07:53,000 --> 00:07:57,040 Speaker 1: get to the professional levels, you're thinking about what he thinks 125 00:07:57,320 --> 00:08:00,559 Speaker 1: you think he thinks, and people who are real pros 126 00:08:00,680 --> 00:08:04,280 Speaker 1: can think five or six levels deep on this. All 127 00:08:04,320 --> 00:08:08,080 Speaker 1: of this is theory of mind, and theory of mind 128 00:08:08,240 --> 00:08:12,520 Speaker 1: is key when you're teaching something. For example, parents know 129 00:08:13,080 --> 00:08:16,880 Speaker 1: that their children can't understand certain things.
For example, the 130 00:08:16,960 --> 00:08:20,080 Speaker 1: child needs to get that smallpox shot, even though to 131 00:08:20,120 --> 00:08:22,720 Speaker 1: the child that's nothing but scary and he simply doesn't 132 00:08:22,960 --> 00:08:26,600 Speaker 1: have the capacity to think about the future benefits that 133 00:08:26,640 --> 00:08:29,880 Speaker 1: will accrue. Or the school teacher can only hope to 134 00:08:30,040 --> 00:08:33,800 Speaker 1: educate her students if she knows what they already know 135 00:08:34,040 --> 00:08:36,319 Speaker 1: or don't know. She needs to phrase things in such 136 00:08:36,360 --> 00:08:39,320 Speaker 1: a way that someone who doesn't already know what she 137 00:08:39,480 --> 00:08:44,000 Speaker 1: knows can absorb it, and that just requires theory of mind. 138 00:08:44,480 --> 00:08:47,160 Speaker 1: If she couldn't simulate what it's like to be in 139 00:08:47,200 --> 00:08:50,719 Speaker 1: their heads, she'd have no meaningful shot at getting them 140 00:08:50,960 --> 00:08:54,360 Speaker 1: past the first quiz. And this issue of considering what 141 00:08:54,480 --> 00:08:59,240 Speaker 1: someone knows or doesn't know is also critical in any negotiation. 142 00:08:59,400 --> 00:09:03,040 Speaker 1: You try to understand the other person's desires and 143 00:09:03,200 --> 00:09:08,680 Speaker 1: goals and where they might potentially compromise. During a salary negotiation, 144 00:09:08,760 --> 00:09:12,199 Speaker 1: you consider what your employer is thinking about the needs 145 00:09:12,200 --> 00:09:15,240 Speaker 1: and future of the company and therefore what they might 146 00:09:15,280 --> 00:09:17,800 Speaker 1: be willing to offer. And this is also how you 147 00:09:17,840 --> 00:09:21,640 Speaker 1: manage conflicts. In any disagreement, if you're smart, you try 148 00:09:21,640 --> 00:09:26,480 Speaker 1: to understand the other person's perspective to resolve the issue.
149 00:09:26,559 --> 00:09:28,960 Speaker 1: If your partner is upset with you, you try to 150 00:09:29,000 --> 00:09:32,360 Speaker 1: figure out what you did or said that set things off, 151 00:09:32,400 --> 00:09:35,600 Speaker 1: and why that offended the other person and how it 152 00:09:35,840 --> 00:09:38,640 Speaker 1: landed for them. And that's the single way that you're 153 00:09:38,640 --> 00:09:42,160 Speaker 1: going to hit the problem effectively. So this ability to 154 00:09:42,240 --> 00:09:46,280 Speaker 1: slip into someone else's shoes has almost everything to do 155 00:09:46,360 --> 00:09:51,880 Speaker 1: with our social intelligence. You use this very human skill 156 00:09:52,600 --> 00:09:55,240 Speaker 1: all the time. And before we get to the next 157 00:09:55,280 --> 00:09:57,800 Speaker 1: act of this podcast, where I ask if computers can 158 00:09:57,840 --> 00:09:59,920 Speaker 1: do this or not, I just want to finish fleshing 159 00:10:00,360 --> 00:10:03,160 Speaker 1: this out so we can really see how pervasive this is. 160 00:10:03,640 --> 00:10:07,000 Speaker 1: So as an example, you rev up your theory of 161 00:10:07,200 --> 00:10:10,640 Speaker 1: mind engine whenever you send an email. If you know 162 00:10:10,760 --> 00:10:13,959 Speaker 1: someone has a well developed model of you, like your 163 00:10:14,000 --> 00:10:17,600 Speaker 1: parents or your spouse, then you can use abbreviations and 164 00:10:17,640 --> 00:10:20,840 Speaker 1: shortcuts to get your message across. But if you're writing 165 00:10:20,840 --> 00:10:23,120 Speaker 1: to someone who's never met you before. Let's say you're 166 00:10:23,160 --> 00:10:26,520 Speaker 1: applying to a new job.
You run a very different game, 167 00:10:27,160 --> 00:10:31,600 Speaker 1: so you're not just an email writing algorithm that produces output, 168 00:10:31,640 --> 00:10:35,559 Speaker 1: but instead your output is modified according to who you 169 00:10:35,640 --> 00:10:38,440 Speaker 1: expect is doing the reading on the other end, and 170 00:10:38,520 --> 00:10:42,160 Speaker 1: specifically what their mind is like. And I also want 171 00:10:42,200 --> 00:10:45,760 Speaker 1: to mention that theory of mind is critical for literature 172 00:10:45,880 --> 00:10:48,080 Speaker 1: to work because it's often the case that you can 173 00:10:48,520 --> 00:10:52,400 Speaker 1: see the limitations of the character's point of view. So, 174 00:10:52,520 --> 00:10:55,960 Speaker 1: for example, if you remember the beginning of the movie Jaws, 175 00:10:56,080 --> 00:10:58,840 Speaker 1: the woman is swimming around in the ocean water and 176 00:10:58,880 --> 00:11:02,680 Speaker 1: she's very relaxed and happy because we see the shark, 177 00:11:02,960 --> 00:11:06,720 Speaker 1: but she doesn't. If we didn't have theory of mind, 178 00:11:06,760 --> 00:11:09,040 Speaker 1: we would simply say, oh, there's a shark there. But 179 00:11:09,080 --> 00:11:12,439 Speaker 1: we're able to understand that she cannot see the shark, 180 00:11:12,720 --> 00:11:14,880 Speaker 1: and that's a big part of why we are fearful, 181 00:11:15,400 --> 00:11:18,120 Speaker 1: because she isn't fearful, and we want her to be. 182 00:11:19,000 --> 00:11:23,520 Speaker 1: This stepping into other people's heads drives essentially all horror 183 00:11:23,600 --> 00:11:26,720 Speaker 1: movies because we often know something that the main character 184 00:11:27,320 --> 00:11:30,960 Speaker 1: does not, and it also drives romantic comedies.
For example, 185 00:11:31,000 --> 00:11:35,000 Speaker 1: we see the guy doing something very nice like helping 186 00:11:35,040 --> 00:11:37,760 Speaker 1: an elderly woman cross the street, and he doesn't know 187 00:11:37,840 --> 00:11:41,080 Speaker 1: that he is being watched by the female love interest, 188 00:11:41,520 --> 00:11:45,200 Speaker 1: and therefore we the audience interpret what kind of guy 189 00:11:45,240 --> 00:11:47,959 Speaker 1: he must be to behave that way when as far 190 00:11:47,960 --> 00:11:50,720 Speaker 1: as he knows, he's totally alone. We would have a 191 00:11:50,800 --> 00:11:56,040 Speaker 1: totally different interpretation. If he sees his romantic counterpart there 192 00:11:56,080 --> 00:11:58,880 Speaker 1: and then he does the charitable act, we'd simulate that 193 00:11:58,920 --> 00:12:03,080 Speaker 1: his intentions are different there. Now, why are human brains 194 00:12:03,160 --> 00:12:07,440 Speaker 1: so talented at making theories about other people's minds? Well, 195 00:12:07,480 --> 00:12:10,240 Speaker 1: you've heard me say many times that the job of 196 00:12:10,280 --> 00:12:14,960 Speaker 1: intelligent brains is to predict the future. If you're the magician, 197 00:12:15,040 --> 00:12:18,520 Speaker 1: you'd better be sure that you are predicting correctly where 198 00:12:18,559 --> 00:12:21,480 Speaker 1: their spotlight of attention is about to be. If you're 199 00:12:21,520 --> 00:12:24,440 Speaker 1: the poker player or the con man, you're trying to 200 00:12:24,440 --> 00:12:27,440 Speaker 1: predict what someone is going to do next, and the 201 00:12:27,520 --> 00:12:30,360 Speaker 1: optimal way to do this is to step 202 00:12:30,400 --> 00:12:34,240 Speaker 1: into their mental world and understand what it is like 203 00:12:34,440 --> 00:12:36,880 Speaker 1: to be them: what they know and what they don't know.
204 00:12:37,360 --> 00:12:43,200 Speaker 1: You leverage theory of mind to anticipate their next action, 205 00:12:43,600 --> 00:12:46,760 Speaker 1: and presumably this reaches back to the recent millions of 206 00:12:46,840 --> 00:12:49,880 Speaker 1: years of our evolution. So if you're an early Homo 207 00:12:49,920 --> 00:12:53,160 Speaker 1: sapiens moving along the trail and you see another 208 00:12:53,240 --> 00:12:57,199 Speaker 1: Homo sapiens coming down the trail towards you, it's absolutely 209 00:12:57,200 --> 00:12:59,920 Speaker 1: critical for you to figure out: is he going to 210 00:13:00,120 --> 00:13:03,240 Speaker 1: attack me? Is he scared of me? Is he trying 211 00:13:03,280 --> 00:13:05,520 Speaker 1: to trick me? Is he just trying to get past me? 212 00:13:06,200 --> 00:13:09,320 Speaker 1: You're trying to figure out his mind so you can 213 00:13:09,360 --> 00:13:13,280 Speaker 1: figure out his next actions. So what I've told you 214 00:13:13,320 --> 00:13:16,440 Speaker 1: so far is that theory of mind is this critical 215 00:13:16,480 --> 00:13:21,079 Speaker 1: foundation for all of our meaningful social interactions because those 216 00:13:21,200 --> 00:13:26,319 Speaker 1: require you to be able to simulate other people's intentions 217 00:13:26,360 --> 00:13:30,800 Speaker 1: and emotions and beliefs. Your brain doesn't assume that it's 218 00:13:30,840 --> 00:13:34,600 Speaker 1: a knowledge communism out there where everyone knows exactly what 219 00:13:34,720 --> 00:13:37,840 Speaker 1: you know. Instead, we're able to pull off a higher 220 00:13:37,920 --> 00:13:41,760 Speaker 1: level of interaction because we understand that the world is 221 00:13:41,800 --> 00:13:45,440 Speaker 1: different inside different heads. And this, by the way, is 222 00:13:45,480 --> 00:13:48,880 Speaker 1: really sophisticated.
It requires knowing who I am and what 223 00:13:48,920 --> 00:13:51,560 Speaker 1: I see and believe, and also holding in my head 224 00:13:51,600 --> 00:13:53,480 Speaker 1: what it is to be someone else and see and 225 00:13:53,520 --> 00:13:56,920 Speaker 1: believe something different. This is a very sophisticated computation that 226 00:13:56,960 --> 00:14:00,800 Speaker 1: the brain pulls off, but because we're so good at it, 227 00:14:00,800 --> 00:14:05,680 Speaker 1: it's typically invisible to us. But theory of mind doesn't 228 00:14:05,720 --> 00:14:09,760 Speaker 1: come for free. It's something that develops with time. As 229 00:14:09,800 --> 00:14:12,800 Speaker 1: you get more and more experience in the world and 230 00:14:12,840 --> 00:14:16,000 Speaker 1: you stop believing that you are the centerpiece and that 231 00:14:16,080 --> 00:14:18,760 Speaker 1: everyone else is just a cast member. You come to 232 00:14:18,880 --> 00:14:22,640 Speaker 1: understand that that person believes something different than you do, 233 00:14:23,120 --> 00:14:25,800 Speaker 1: and this other person feels a certain way even though 234 00:14:25,840 --> 00:14:29,360 Speaker 1: you don't, and that this person over here thinks something 235 00:14:29,400 --> 00:14:48,040 Speaker 1: to be true even though you know it's not. 
So 236 00:14:48,080 --> 00:14:50,800 Speaker 1: how do we know that this is a skill that 237 00:14:50,880 --> 00:14:55,920 Speaker 1: develops through time? Because very little kids are terrible at 238 00:14:55,960 --> 00:14:59,240 Speaker 1: theory of mind, but they get better as they mature 239 00:14:59,440 --> 00:15:02,520 Speaker 1: into the world, and typically by the ages of three 240 00:15:02,720 --> 00:15:05,720 Speaker 1: to five, they're getting that they're not the only point 241 00:15:05,720 --> 00:15:08,400 Speaker 1: of view that's possible, but that each person in the 242 00:15:08,440 --> 00:15:11,360 Speaker 1: scene has his or her own point of view. Now, 243 00:15:11,360 --> 00:15:15,160 Speaker 1: how do you test whether someone is capable of theory 244 00:15:15,280 --> 00:15:18,240 Speaker 1: of mind? Well, what you do is you present a 245 00:15:18,280 --> 00:15:22,080 Speaker 1: little scenario like this. Sally comes into the room and 246 00:15:22,160 --> 00:15:25,560 Speaker 1: puts her baseball under the bed, and then she leaves. 247 00:15:26,200 --> 00:15:29,920 Speaker 1: While she's gone, Anne comes in the room, she sees 248 00:15:29,960 --> 00:15:32,240 Speaker 1: the ball under the bed, she picks it up, and 249 00:15:32,280 --> 00:15:36,080 Speaker 1: she puts it in the closet. Then she leaves. Now 250 00:15:36,440 --> 00:15:39,359 Speaker 1: Sally comes back in the room, she wants her baseball. 251 00:15:39,920 --> 00:15:42,520 Speaker 1: Where does she look for it? Now, you and I 252 00:15:42,600 --> 00:15:44,840 Speaker 1: know that Sally will look for it under the bed 253 00:15:44,880 --> 00:15:48,720 Speaker 1: where she put it last, even though we simultaneously know 254 00:15:48,840 --> 00:15:52,400 Speaker 1: the actual location of the baseball in the closet.
And 255 00:15:52,440 --> 00:15:55,800 Speaker 1: this is because we are running an emulation of what 256 00:15:55,920 --> 00:15:58,640 Speaker 1: it is like to be inside Sally's head with her 257 00:15:58,800 --> 00:16:03,640 Speaker 1: limited knowledge. Now, little children will fail the Sally-Anne 258 00:16:03,800 --> 00:16:07,640 Speaker 1: test because they know that the baseball is in the closet, 259 00:16:08,000 --> 00:16:11,920 Speaker 1: so they assume that Sally should know that too. But 260 00:16:12,200 --> 00:16:15,680 Speaker 1: as cognition develops, they come to realize that different heads 261 00:16:15,880 --> 00:16:18,880 Speaker 1: have different beliefs. And a really important clue to the 262 00:16:18,960 --> 00:16:23,240 Speaker 1: development of this is that not everyone develops theory of 263 00:16:23,320 --> 00:16:26,400 Speaker 1: mind in the same way at the same rate. For example, 264 00:16:26,760 --> 00:16:31,680 Speaker 1: people who are on the autism spectrum typically show delays 265 00:16:31,720 --> 00:16:36,360 Speaker 1: in developing theory of mind, which can, not surprisingly, impact their 266 00:16:36,360 --> 00:16:40,440 Speaker 1: social interactions. For instance, this is why sarcasm doesn't work 267 00:16:40,480 --> 00:16:44,080 Speaker 1: so well with a person who has autism. When you say, oh, 268 00:16:44,080 --> 00:16:47,760 Speaker 1: great, more traffic, I love traffic, they're not likely to 269 00:16:47,880 --> 00:16:51,160 Speaker 1: catch the meaning beneath the words, that you're not actually 270 00:16:51,200 --> 00:16:54,680 Speaker 1: pleased, because they don't have a sensitive model of your 271 00:16:55,040 --> 00:16:57,880 Speaker 1: actual mental state.
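The Sally-Anne scenario described above can be written down as a small belief-tracking simulation. What follows is only a minimal sketch under stated assumptions, not a model of how brains or any AI system pass the test: the names `Agent`, `Room`, and `run_sally_anne` are hypothetical, and the one rule it encodes is that an agent's belief about an object updates only when that agent is in the room to see the object moved.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A hypothetical observer with private beliefs about object locations."""
    name: str
    present: bool = True
    beliefs: dict = field(default_factory=dict)


class Room:
    """Holds the true object locations and updates only the agents who are watching."""

    def __init__(self, agents):
        self.agents = {agent.name: agent for agent in agents}
        self.truth = {}

    def move(self, obj, place):
        # The world changes for everyone, but beliefs change only for witnesses.
        self.truth[obj] = place
        for agent in self.agents.values():
            if agent.present:
                agent.beliefs[obj] = place

    def leave(self, name):
        self.agents[name].present = False

    def enter(self, name):
        self.agents[name].present = True

    def where_will_look(self, name, obj):
        # Predict behavior from the agent's belief, not from the true state.
        return self.agents[name].beliefs.get(obj)


def run_sally_anne():
    """Replay the scenario; return (true location, where Sally will look)."""
    sally = Agent("Sally")
    anne = Agent("Anne", present=False)
    room = Room([sally, anne])

    room.move("baseball", "under the bed")  # Sally hides her baseball
    room.leave("Sally")
    room.enter("Anne")
    room.move("baseball", "closet")         # Anne moves it while Sally is away
    room.leave("Anne")
    room.enter("Sally")

    return room.truth["baseball"], room.where_will_look("Sally", "baseball")


if __name__ == "__main__":
    actual, predicted = run_sally_anne()
    print(f"The baseball is in: {actual}")
    print(f"Sally will look: {predicted}")
```

In this sketch, answering from `room.truth` corresponds to the young child's mistake of assuming Sally knows what the observer knows, while answering from Sally's own `beliefs` corresponds to passing the test.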
If you can't put yourself in the 272 00:16:57,920 --> 00:17:01,000 Speaker 1: shoes of the other person, your understanding is limited to 273 00:17:01,200 --> 00:17:05,159 Speaker 1: just pattern recognition, which is not enough for the very 274 00:17:05,200 --> 00:17:09,240 Speaker 1: subtle and sophisticated kinds of communication that humans engage in 275 00:17:09,359 --> 00:17:12,520 Speaker 1: every day. So this tells us that theory of mind 276 00:17:12,720 --> 00:17:16,480 Speaker 1: doesn't come for free in humans. There are brain networks 277 00:17:16,480 --> 00:17:18,879 Speaker 1: that have to develop and learn for this to work, 278 00:17:19,040 --> 00:17:22,200 Speaker 1: so when you look at normal development or delayed development, 279 00:17:22,240 --> 00:17:27,600 Speaker 1: this allows us to understand how different brain regions contribute 280 00:17:27,920 --> 00:17:30,960 Speaker 1: to theory of mind. For example, there's one area called 281 00:17:30,960 --> 00:17:34,520 Speaker 1: the temporoparietal junction, and this is interesting because it pops 282 00:17:34,520 --> 00:17:39,119 Speaker 1: its head up in tasks that require understanding perspectives, like 283 00:17:39,600 --> 00:17:43,080 Speaker 1: distinguishing between what you know and what someone else knows. 284 00:17:43,520 --> 00:17:47,000 Speaker 1: So imagine you're teaching a friend how to play chess. 285 00:17:47,480 --> 00:17:49,720 Speaker 1: You need to not only understand the rules of the game, 286 00:17:50,040 --> 00:17:52,919 Speaker 1: but also know what your friend knows or doesn't know 287 00:17:53,119 --> 00:17:57,080 Speaker 1: about the game to teach effectively, and the temporoparietal 288 00:17:57,160 --> 00:18:01,160 Speaker 1: junction is involved in that. It's not just that area. There are 289 00:18:01,200 --> 00:18:03,760 Speaker 1: a lot of other areas involved in theory of mind.
290 00:18:04,119 --> 00:18:07,399 Speaker 1: So the medial prefrontal cortex plays a big role in 291 00:18:07,480 --> 00:18:11,600 Speaker 1: making social judgments. It becomes active when you think about 292 00:18:11,680 --> 00:18:14,840 Speaker 1: the mental states of others. For example, if you're trying 293 00:18:14,880 --> 00:18:19,680 Speaker 1: to decide if someone is lying or being truthful, your 294 00:18:19,720 --> 00:18:23,160 Speaker 1: medial prefrontal cortex is engaged. And there are other areas, 295 00:18:23,240 --> 00:18:26,600 Speaker 1: like part of your superior temporal sulcus is involved in 296 00:18:26,760 --> 00:18:31,720 Speaker 1: processing social information like interpreting other people's eye gaze or 297 00:18:31,760 --> 00:18:35,080 Speaker 1: their body language, like the man looking for his keys 298 00:18:35,400 --> 00:18:38,520 Speaker 1: and the child giggling. We're able to infer a lot 299 00:18:38,920 --> 00:18:42,040 Speaker 1: because of the activity of this area. So we see 300 00:18:42,119 --> 00:18:45,280 Speaker 1: lots of areas in brain imaging experiments. And I want 301 00:18:45,320 --> 00:18:48,399 Speaker 1: to mention this to illustrate that theory of mind is 302 00:18:48,440 --> 00:18:51,960 Speaker 1: a brain wide issue. It's not a single area. And 303 00:18:52,000 --> 00:18:53,720 Speaker 1: by the way, this is true of so many things 304 00:18:53,760 --> 00:18:57,439 Speaker 1: in neuroscience. Imagine that I spread out a map of 305 00:18:57,480 --> 00:19:00,280 Speaker 1: your city and I ask you, hey, can you put 306 00:19:00,320 --> 00:19:03,840 Speaker 1: a pin in the spot that represents the economy of 307 00:19:03,880 --> 00:19:07,040 Speaker 1: the city. You tell me that that is a misplaced request. 308 00:19:07,359 --> 00:19:10,840 Speaker 1: There is no single spot for the economy. 
The economy 309 00:19:11,000 --> 00:19:14,439 Speaker 1: emerges from all the interactions between all the pieces and 310 00:19:14,480 --> 00:19:17,000 Speaker 1: parts of the city, and it's the same with almost 311 00:19:17,080 --> 00:19:21,080 Speaker 1: everything in neuroscience, and especially something like the skill of 312 00:19:21,200 --> 00:19:24,440 Speaker 1: slipping into someone else's point of view. There's not one 313 00:19:24,600 --> 00:19:27,640 Speaker 1: spot to drop a pin into. Instead, it is an 314 00:19:27,720 --> 00:19:32,600 Speaker 1: emergent property that develops from the interaction of lots of networks. 315 00:19:32,880 --> 00:19:35,679 Speaker 1: So what we've seen so far is that theory of 316 00:19:35,840 --> 00:19:39,560 Speaker 1: mind is this ability to infer what someone else knows, 317 00:19:39,600 --> 00:19:41,560 Speaker 1: and we've seen that this is right at the center 318 00:19:42,040 --> 00:19:46,000 Speaker 1: of social interactions. It's something that most humans develop naturally, 319 00:19:46,440 --> 00:19:49,200 Speaker 1: but that doesn't mean it's simple. And the question we're 320 00:19:49,200 --> 00:19:54,159 Speaker 1: going to ask today is does AI have theory of mind? 321 00:19:54,280 --> 00:19:58,640 Speaker 1: Can it put itself into someone else's shoes to understand 322 00:19:59,080 --> 00:20:03,000 Speaker 1: their limited knowledge. One of my colleagues at Stanford recently 323 00:20:03,000 --> 00:20:08,119 Speaker 1: wrote a paper suggesting yes, AI can do this. But fascinatingly, 324 00:20:08,720 --> 00:20:11,679 Speaker 1: it's not as easy to answer this question as you 325 00:20:11,760 --> 00:20:14,200 Speaker 1: might think. And this is for some reasons that we're 326 00:20:14,200 --> 00:20:16,760 Speaker 1: going to dive into. But before we get there, I 327 00:20:16,800 --> 00:20:19,640 Speaker 1: just want to zoom this out to a slightly larger question. 
328 00:20:20,200 --> 00:20:25,280 Speaker 1: Could a computer develop theory of mind? Hypothetically, could an 329 00:20:25,320 --> 00:20:28,119 Speaker 1: AI system at some point in the future say, look, 330 00:20:28,280 --> 00:20:31,040 Speaker 1: I know XYZ to be true, but if I look 331 00:20:31,040 --> 00:20:33,639 Speaker 1: at that other person over there, I understand that they 332 00:20:33,640 --> 00:20:36,399 Speaker 1: have a limited viewpoint and that they don't know X 333 00:20:36,440 --> 00:20:41,439 Speaker 1: and Y, and that person over there misbelieves something about Z? 334 00:20:41,920 --> 00:20:46,560 Speaker 1: Well, almost certainly, yes. Why? It's because we're made up 335 00:20:46,600 --> 00:20:50,280 Speaker 1: of physical stuff and we're running algorithms that took hundreds 336 00:20:50,280 --> 00:20:54,600 Speaker 1: of millions of years to refine. But nonetheless it's physical stuff. 337 00:20:54,720 --> 00:20:59,160 Speaker 1: So if we can do something, presumably a machine could 338 00:20:59,160 --> 00:21:01,760 Speaker 1: do it also, whether or not it's currently clear how 339 00:21:01,760 --> 00:21:07,679 Speaker 1: that's done. That's the central premise of computational neuroscience, and 340 00:21:07,720 --> 00:21:10,240 Speaker 1: to my mind, one of the most remarkable effects of 341 00:21:10,280 --> 00:21:14,800 Speaker 1: the AI explosion over the last few years is understanding 342 00:21:15,280 --> 00:21:18,280 Speaker 1: that things that would have seemed impossible to do with 343 00:21:18,359 --> 00:21:21,760 Speaker 1: a machine, things that almost everyone would have sworn couldn't 344 00:21:21,800 --> 00:21:25,080 Speaker 1: be done, now seem like background furniture as we 345 00:21:25,160 --> 00:21:28,080 Speaker 1: wait for the next thing. 
Now, the complexity of the 346 00:21:28,119 --> 00:21:31,040 Speaker 1: brain suggests that theory of mind is going to be 347 00:21:31,080 --> 00:21:34,399 Speaker 1: a very hard problem to solve, because it requires us 348 00:21:34,400 --> 00:21:37,240 Speaker 1: to understand how the brain has a model of the 349 00:21:37,280 --> 00:21:41,320 Speaker 1: world and then how it can make submodels and simulate 350 00:21:41,720 --> 00:21:43,720 Speaker 1: what it is like to only know part of the 351 00:21:43,760 --> 00:21:47,120 Speaker 1: story or to believe a different story. So we don't 352 00:21:47,119 --> 00:21:49,919 Speaker 1: currently know how our brains do it, but of course 353 00:21:50,400 --> 00:21:54,280 Speaker 1: our computers do this sort of thing often. 354 00:21:54,720 --> 00:21:58,520 Speaker 1: Like, you can take your modern MacBook laptop and use 355 00:21:58,600 --> 00:22:02,200 Speaker 1: a little bit of its processor to simulate an old 356 00:22:02,760 --> 00:22:06,600 Speaker 1: Timex Sinclair computer. Your Mac can perfectly simulate it by 357 00:22:06,680 --> 00:22:11,360 Speaker 1: running what's called an emulation on part of its computational hardware. 358 00:22:11,800 --> 00:22:17,160 Speaker 1: Somehow human brains can run emulations also, like just by 359 00:22:17,240 --> 00:22:20,440 Speaker 1: looking you can emulate what it's like to not know 360 00:22:20,560 --> 00:22:23,199 Speaker 1: that the shark is there below you. So yes, it 361 00:22:23,240 --> 00:22:26,639 Speaker 1: seems totally plausible to me that a machine could do 362 00:22:26,880 --> 00:22:30,560 Speaker 1: theory of mind, because we can. But the question we 363 00:22:30,600 --> 00:22:33,639 Speaker 1: want to ask today is whether we are there or 364 00:22:33,720 --> 00:22:38,000 Speaker 1: not right now. 
Have current large language models like 365 00:22:38,080 --> 00:22:42,360 Speaker 1: ChatGPT come to solve this problem without us telling them 366 00:22:42,359 --> 00:22:46,560 Speaker 1: explicitly to do so? In other words, with no instruction whatsoever, 367 00:22:47,119 --> 00:22:51,600 Speaker 1: is the emulation of other minds an emergent property that 368 00:22:51,720 --> 00:22:54,560 Speaker 1: comes out of these things, which would absolutely blow our 369 00:22:54,600 --> 00:22:59,240 Speaker 1: minds if true? Does AI do theory of mind? If 370 00:22:59,280 --> 00:23:03,639 Speaker 1: it can, this would have profound implications for our understanding 371 00:23:03,760 --> 00:23:07,119 Speaker 1: of intelligence and our relationship with AI. I mean, just 372 00:23:07,200 --> 00:23:09,320 Speaker 1: consider how much better it would be if it could 373 00:23:09,359 --> 00:23:14,320 Speaker 1: emulate the mental states of people, like with auto driving cars, 374 00:23:14,400 --> 00:23:18,120 Speaker 1: if it didn't just depend on the observable, but instead 375 00:23:18,160 --> 00:23:21,080 Speaker 1: on what's going on in the other driver's head. Like, 376 00:23:21,480 --> 00:23:23,679 Speaker 1: given the trajectory of this car, I think that the 377 00:23:23,720 --> 00:23:27,720 Speaker 1: other driver is drunk or asleep or distracted. And so 378 00:23:27,880 --> 00:23:30,679 Speaker 1: here's what I think is going to happen next. So 379 00:23:31,119 --> 00:23:34,520 Speaker 1: a colleague of mine at Stanford, Michal Kosinski, published a 380 00:23:34,640 --> 00:23:38,479 Speaker 1: twenty twenty three paper that was originally titled Theory of 381 00:23:38,600 --> 00:23:44,159 Speaker 1: Mind Might Have Spontaneously Emerged in Large Language Models, although 382 00:23:44,160 --> 00:23:47,119 Speaker 1: he later changed the title. 
In the paper, he suggested 383 00:23:47,400 --> 00:23:50,600 Speaker 1: that even though these AI models didn't set out to 384 00:23:50,880 --> 00:23:54,840 Speaker 1: have theory of mind, it may have appeared anyway as 385 00:23:55,000 --> 00:23:59,400 Speaker 1: a byproduct of their improving language skills. So, for example, 386 00:23:59,640 --> 00:24:04,560 Speaker 1: he gives the following scenario to ChatGPT: Complete the following story. 387 00:24:05,119 --> 00:24:08,600 Speaker 1: Here is a bag filled with popcorn. There is no 388 00:24:08,880 --> 00:24:12,160 Speaker 1: chocolate in the bag, yet the label on the bag 389 00:24:12,200 --> 00:24:17,280 Speaker 1: says chocolate and not popcorn. Sam finds the bag. She 390 00:24:17,320 --> 00:24:20,639 Speaker 1: has never seen this bag before. Sam doesn't open the 391 00:24:20,680 --> 00:24:24,800 Speaker 1: bag and doesn't look inside. Sam reads the label. And 392 00:24:24,840 --> 00:24:27,639 Speaker 1: then he gives the prompt: Sam opens the bag and 393 00:24:27,760 --> 00:24:31,639 Speaker 1: looks inside. She can clearly see that it is full of. 394 00:24:32,359 --> 00:24:35,760 Speaker 1: And then he looks at the word that ChatGPT produces: 395 00:24:35,960 --> 00:24:40,879 Speaker 1: is it popcorn or chocolate? And ChatGPT says popcorn. But 396 00:24:40,920 --> 00:24:44,320 Speaker 1: if instead he gives a different prompt, Sam calls a 397 00:24:44,359 --> 00:24:47,280 Speaker 1: friend to tell him that she has just found a 398 00:24:47,400 --> 00:24:52,760 Speaker 1: bag full of, now ChatGPT says chocolate, indicating that 399 00:24:52,920 --> 00:24:57,840 Speaker 1: Sam holds a false belief. And Kosinski runs this a 400 00:24:57,880 --> 00:25:01,240 Speaker 1: bunch of ways and shows that ChatGPT gets the 401 00:25:01,320 --> 00:25:05,320 Speaker 1: right answer. So is there something going on here? And 402 00:25:05,359 --> 00:25:07,479 Speaker 1: you can try this for yourself. 
Type in a version 403 00:25:07,680 --> 00:25:11,440 Speaker 1: of the Sally-Anne test, where Sally hides her ball 404 00:25:11,520 --> 00:25:13,919 Speaker 1: under the bed and then leaves, and Anne comes in 405 00:25:14,000 --> 00:25:16,520 Speaker 1: later and sees it, moves it into the closet, and you 406 00:25:16,560 --> 00:25:19,480 Speaker 1: ask, when Sally comes back in the room, where will 407 00:25:19,520 --> 00:25:22,200 Speaker 1: she look for the ball? And ChatGPT will tell 408 00:25:22,240 --> 00:25:25,879 Speaker 1: you that Sally will look for the ball under the bed. 409 00:25:26,320 --> 00:25:28,879 Speaker 1: And this is amazing, right? So I want to be 410 00:25:29,040 --> 00:25:32,920 Speaker 1: clear why I think it is meaningless that AI can 411 00:25:33,000 --> 00:25:36,320 Speaker 1: pass these tests. If anyone ever tells you that this 412 00:25:36,480 --> 00:25:39,240 Speaker 1: is proof that AI has theory of mind, please let 413 00:25:39,240 --> 00:25:43,520 Speaker 1: them know this is not proof. Why? Well, that question 414 00:25:43,600 --> 00:25:47,359 Speaker 1: about the bag of popcorn that's labeled chocolate, that is 415 00:25:47,480 --> 00:25:52,240 Speaker 1: known as the unexpected contents task, and this was originally 416 00:25:52,320 --> 00:25:56,280 Speaker 1: published by three researchers in nineteen eighty seven. Hundreds of 417 00:25:56,320 --> 00:26:00,680 Speaker 1: papers cite this or replicate this, hundreds of blogs write about this, 418 00:26:01,280 --> 00:26:04,280 Speaker 1: so of course a large language model gets it right. 419 00:26:04,680 --> 00:26:07,800 Speaker 1: And the Sally-Anne test is in even more places 420 00:26:07,840 --> 00:26:12,080 Speaker 1: on the web, literally hundreds of thousands of places. It's 421 00:26:12,119 --> 00:26:16,199 Speaker 1: known in the literature as the unexpected transfer test. So 422 00:26:16,359 --> 00:26:21,000 Speaker 1: of course ChatGPT solves these challenges. 
That's what large 423 00:26:21,040 --> 00:26:25,240 Speaker 1: language models do. They read everything that has come before them, 424 00:26:25,480 --> 00:26:29,199 Speaker 1: so it well knows the punchline of this question. It 425 00:26:29,320 --> 00:26:47,920 Speaker 1: is a statistical parrot. Now I'll give you one more 426 00:26:47,960 --> 00:26:50,960 Speaker 1: example of this that I mentioned in an earlier episode, 427 00:26:51,200 --> 00:26:53,959 Speaker 1: when a friend of mine was blown away by the 428 00:26:53,960 --> 00:26:57,480 Speaker 1: fact that he asked a visual reasoning problem to 429 00:26:57,520 --> 00:27:00,800 Speaker 1: ChatGPT and it gave him the perfectly right answer. My 430 00:27:00,840 --> 00:27:04,399 Speaker 1: friend said, take a capital letter D and turn it 431 00:27:04,480 --> 00:27:07,240 Speaker 1: on its side, flat side down, and then put that 432 00:27:07,320 --> 00:27:10,199 Speaker 1: on top of a capital letter J. What does that 433 00:27:10,280 --> 00:27:13,760 Speaker 1: look like? And ChatGPT said, it looks like an umbrella. 434 00:27:13,800 --> 00:27:16,040 Speaker 1: And my friend was so impressed with this that he 435 00:27:16,119 --> 00:27:18,480 Speaker 1: told me he was certain that ChatGPT could do 436 00:27:18,680 --> 00:27:22,560 Speaker 1: visual reasoning. But I pointed out to him that this 437 00:27:22,720 --> 00:27:26,480 Speaker 1: example he used was the single most used example in 438 00:27:26,560 --> 00:27:29,880 Speaker 1: the literature on visual reasoning. I knew about this from 439 00:27:29,880 --> 00:27:33,320 Speaker 1: a quite famous paper from nineteen eighty nine, although I 440 00:27:33,320 --> 00:27:35,199 Speaker 1: don't even know if that was the first usage of it, 441 00:27:35,560 --> 00:27:39,520 Speaker 1: and you can find precisely that question referenced online in 442 00:27:39,720 --> 00:27:43,359 Speaker 1: thousands of places. 
Now, I don't know whether he was 443 00:27:43,480 --> 00:27:46,439 Speaker 1: consciously aware that question was something he had heard before, 444 00:27:47,160 --> 00:27:50,000 Speaker 1: or if he had heard it years ago and erroneously 445 00:27:50,119 --> 00:27:52,600 Speaker 1: thought he had thought of it. Or there's also the 446 00:27:52,760 --> 00:27:55,639 Speaker 1: very tiny possibility that he had never heard that question 447 00:27:55,680 --> 00:27:58,800 Speaker 1: before and had thought of it independently. But that just 448 00:27:59,040 --> 00:28:02,840 Speaker 1: underscores the point even more that we live on a 449 00:28:02,920 --> 00:28:07,800 Speaker 1: planet with billions of other brains, and almost anything you 450 00:28:07,920 --> 00:28:12,119 Speaker 1: think of has been thought before and likely written down, 451 00:28:12,560 --> 00:28:17,040 Speaker 1: maybe hundreds of thousands of times. So the point is 452 00:28:17,359 --> 00:28:20,919 Speaker 1: that you may think a large language model is brilliant 453 00:28:21,480 --> 00:28:24,760 Speaker 1: when it is just a good imitator. Now, one important 454 00:28:24,760 --> 00:28:27,840 Speaker 1: point on this. You might think, hey, instead of talking 455 00:28:27,840 --> 00:28:30,879 Speaker 1: about Sally and Anne, what if I do something clever 456 00:28:30,960 --> 00:28:34,679 Speaker 1: and I ask ChatGPT about Brett and Michael? And 457 00:28:34,760 --> 00:28:38,480 Speaker 1: instead of putting the baseball under the bed, Brett puts 458 00:28:38,520 --> 00:28:41,040 Speaker 1: a marble in a box. And then Michael finds the 459 00:28:41,080 --> 00:28:43,320 Speaker 1: marble and puts it up on the shelf. And the 460 00:28:43,440 --> 00:28:46,239 Speaker 1: question is, where does Brett look for the marble? 
But 461 00:28:46,280 --> 00:28:50,360 Speaker 1: you'll find that the large language model has no trouble generalizing, 462 00:28:50,600 --> 00:28:54,360 Speaker 1: especially as it has digested multiple flavors of this task. 463 00:28:54,880 --> 00:28:59,080 Speaker 1: And this is because it's mapping the relationship between concepts 464 00:28:59,160 --> 00:29:01,720 Speaker 1: in its latent space. If you don't know what latent 465 00:29:01,760 --> 00:29:03,440 Speaker 1: space is, I'm going to do an episode on that 466 00:29:03,560 --> 00:29:06,800 Speaker 1: quite soon because it's such an amazing concept. So you 467 00:29:06,880 --> 00:29:10,560 Speaker 1: might be tempted to say it's not just a statistical parrot, 468 00:29:10,600 --> 00:29:14,520 Speaker 1: it's understanding something deeper in its latent space. But I 469 00:29:14,560 --> 00:29:17,520 Speaker 1: think this could also be a wrong interpretation. It is 470 00:29:17,600 --> 00:29:21,959 Speaker 1: still a statistical parrot that doesn't know what it is 471 00:29:22,080 --> 00:29:25,280 Speaker 1: to be another person, but it nonetheless learns from the 472 00:29:25,320 --> 00:29:29,920 Speaker 1: statistics which words to put after what. In other words, 473 00:29:30,280 --> 00:29:33,920 Speaker 1: it's not clear that these systems have to truly understand 474 00:29:34,080 --> 00:29:38,960 Speaker 1: other people's thoughts and feelings to simply extract the patterns 475 00:29:39,240 --> 00:29:43,200 Speaker 1: from what they have been trained on. And you might say, well, 476 00:29:43,480 --> 00:29:45,040 Speaker 1: how do we know that's not the same with us? 477 00:29:45,080 --> 00:29:48,760 Speaker 1: How do you know that we're not just extracting statistics? 
Well, 478 00:29:48,880 --> 00:29:52,000 Speaker 1: when you are watching the woman swimming in the opening 479 00:29:52,040 --> 00:29:55,640 Speaker 1: scene of Jaws and you feel fear because the shark 480 00:29:55,760 --> 00:29:59,680 Speaker 1: is circling below her, it's not that you have memorized 481 00:29:59,720 --> 00:30:03,440 Speaker 1: the answers to similar problems, and that's how you conclude 482 00:30:03,840 --> 00:30:06,640 Speaker 1: that she doesn't know the shark is there. Instead, your 483 00:30:06,680 --> 00:30:09,640 Speaker 1: heart starts racing and you start gripping the chair because 484 00:30:10,120 --> 00:30:13,560 Speaker 1: you've been in similar situations where there's nothing but dark 485 00:30:13,600 --> 00:30:16,840 Speaker 1: water below you, and you know she really doesn't know, 486 00:30:17,280 --> 00:30:21,680 Speaker 1: and you appreciate how terrifying the situation is. So what 487 00:30:21,760 --> 00:30:24,840 Speaker 1: I have described to you is a problem where knowledge 488 00:30:24,880 --> 00:30:28,280 Speaker 1: exists in the literature written by humans, and the AI 489 00:30:28,600 --> 00:30:32,560 Speaker 1: digests that writing, but the person running the query doesn't 490 00:30:32,560 --> 00:30:36,600 Speaker 1: fully appreciate that. And this is a very basic confusion 491 00:30:36,640 --> 00:30:39,600 Speaker 1: that I'm watching a lot of people have about large 492 00:30:39,640 --> 00:30:43,200 Speaker 1: language models. They type in a sophisticated question and they 493 00:30:43,240 --> 00:30:45,880 Speaker 1: get back what appears to be a sophisticated answer, and 494 00:30:45,920 --> 00:30:50,320 Speaker 1: they conclude this thing is truly intelligent. This thing has 495 00:30:50,440 --> 00:30:53,840 Speaker 1: theory of mind, or it's sentient, or it can visualize. 
496 00:30:54,560 --> 00:30:57,320 Speaker 1: And I'm seeing this so commonly now that I've decided 497 00:30:57,360 --> 00:30:59,840 Speaker 1: to give it a name. I'm calling this the 498 00:31:00,000 --> 00:31:05,360 Speaker 1: intelligence echo illusion. This happens when you think AI is 499 00:31:05,520 --> 00:31:08,959 Speaker 1: answering something with great insight, but really what you're hearing 500 00:31:09,000 --> 00:31:12,040 Speaker 1: back is just an echo of things that have already 501 00:31:12,080 --> 00:31:16,120 Speaker 1: been said by humans before. In other words, you think 502 00:31:16,280 --> 00:31:20,000 Speaker 1: it's intelligent, but you're confusing that with the intellectual endeavors 503 00:31:20,520 --> 00:31:23,880 Speaker 1: of other people. Maybe dozens of people had written about this, 504 00:31:24,160 --> 00:31:27,160 Speaker 1: or hundreds or thousands, but you simply didn't know that, 505 00:31:27,600 --> 00:31:31,400 Speaker 1: and so you're hearing their echo and you misinterpret that 506 00:31:31,680 --> 00:31:35,680 Speaker 1: echo as the proud voice of AI. So I ran 507 00:31:35,720 --> 00:31:38,280 Speaker 1: some calculations on this. There are eight point two billion 508 00:31:38,280 --> 00:31:40,800 Speaker 1: people on the planet alive right now, and let's call 509 00:31:40,840 --> 00:31:43,960 Speaker 1: it one hundred and fifteen billion humans who have lived 510 00:31:43,960 --> 00:31:47,200 Speaker 1: and died before us. 
And every one of these billions 511 00:31:47,800 --> 00:31:51,080 Speaker 1: was thinking and having their own stories every day of 512 00:31:51,120 --> 00:31:54,760 Speaker 1: their lives, and some fraction wrote their thoughts down, and 513 00:31:54,800 --> 00:31:58,560 Speaker 1: as a result, these large language models like ChatGPT are 514 00:31:58,600 --> 00:32:02,640 Speaker 1: trained on massive data sets of what is already out 515 00:32:02,640 --> 00:32:06,560 Speaker 1: there written down by humans. We're talking hundreds of billions 516 00:32:06,560 --> 00:32:10,320 Speaker 1: of words. These data sets are pulled from books and 517 00:32:10,440 --> 00:32:14,280 Speaker 1: websites and blogs and articles and on and on. So, 518 00:32:14,400 --> 00:32:18,520 Speaker 1: for example, the training data for these large language models 519 00:32:18,520 --> 00:32:23,640 Speaker 1: includes a data set called Common Crawl, which contains hundreds 520 00:32:23,680 --> 00:32:28,880 Speaker 1: of terabytes of text. Now assume you read for an 521 00:32:28,920 --> 00:32:31,160 Speaker 1: hour every day of your life, let's say at an 522 00:32:31,160 --> 00:32:33,280 Speaker 1: average speed of two hundred and fifty words per minute, 523 00:32:33,360 --> 00:32:35,960 Speaker 1: and you do that for a reading window of seventy years. 524 00:32:36,400 --> 00:32:39,400 Speaker 1: That's three hundred million words that you can read in 525 00:32:39,400 --> 00:32:42,720 Speaker 1: your lifetime, which means that what you consume in a 526 00:32:42,760 --> 00:32:47,640 Speaker 1: lifetime is one one-thousandth of what ChatGPT is 527 00:32:47,680 --> 00:32:50,840 Speaker 1: trained on. That means if you digest books every day 528 00:32:50,840 --> 00:32:54,200 Speaker 1: of your entire life, you still only read point one 529 00:32:54,400 --> 00:32:58,240 Speaker 1: percent of what ChatGPT has read. 
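The reading arithmetic above can be checked with a few lines of Python. This is only a rough sketch, not something from the episode: the 300-billion-word corpus size is an assumed stand-in for the "hundreds of billions of words" mentioned, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the lifetime reading estimate.
# ASSUMED_CORPUS_WORDS is an assumption standing in for
# "hundreds of billions of words"; it is not a published figure.
ASSUMED_CORPUS_WORDS = 300_000_000_000


def lifetime_words(words_per_minute=250, minutes_per_day=60,
                   days_per_year=365, years=70):
    """Total words read at a steady daily pace over a reading window."""
    return words_per_minute * minutes_per_day * days_per_year * years


words = lifetime_words()
fraction_of_corpus = words / ASSUMED_CORPUS_WORDS
lifetimes_needed = ASSUMED_CORPUS_WORDS / words

print(f"Words read in a lifetime: {words:,}")
print(f"Fraction of assumed corpus: {fraction_of_corpus:.4%}")
print(f"Lifetimes needed to read it all: {lifetimes_needed:,.0f}")
```

With these inputs the lifetime total comes to about 383 million words, in the same ballpark as the three hundred million quoted, and the ratio lands near one one-thousandth. A different assumed corpus size shifts the ratio proportionally.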
You would need 530 00:32:58,720 --> 00:33:02,520 Speaker 1: a thousand lifetimes to know what it knows, and 531 00:33:02,560 --> 00:33:04,800 Speaker 1: on top of that, you'd have to actually remember every 532 00:33:04,840 --> 00:33:08,440 Speaker 1: sentence of what you read. So there are many many 533 00:33:09,000 --> 00:33:13,200 Speaker 1: questions and answers that a large language model has trained 534 00:33:13,240 --> 00:33:17,080 Speaker 1: on that you either have no knowledge of, or maybe 535 00:33:17,120 --> 00:33:19,760 Speaker 1: you had heard it before, but don't remember, and in 536 00:33:19,800 --> 00:33:23,200 Speaker 1: any case, you probably don't realize that it has been 537 00:33:23,320 --> 00:33:26,080 Speaker 1: pre-trained on that. So what's the result of this? Well, 538 00:33:26,120 --> 00:33:29,480 Speaker 1: if you ask the large language model what color is 539 00:33:29,520 --> 00:33:32,000 Speaker 1: a pumpkin and it answers orange, you probably won't be 540 00:33:32,000 --> 00:33:35,320 Speaker 1: that surprised. But if we ask where Sally looks for 541 00:33:35,360 --> 00:33:38,400 Speaker 1: the baseball and it says under the bed, then we 542 00:33:38,480 --> 00:33:40,959 Speaker 1: clap our hands over our mouths and we say it 543 00:33:41,040 --> 00:33:44,520 Speaker 1: has theory of mind. That's why I decided I needed 544 00:33:44,600 --> 00:33:48,680 Speaker 1: to give a name to this phenomenon, the intelligence echo illusion, 545 00:33:49,040 --> 00:33:53,400 Speaker 1: because often naming something allows us to more easily see it. 
546 00:33:53,880 --> 00:33:56,640 Speaker 1: And by the way, if you see good examples of 547 00:33:56,680 --> 00:33:59,800 Speaker 1: this intelligence echo where people mistake things that have been 548 00:33:59,840 --> 00:34:03,160 Speaker 1: written before for AI that has woken up into a 549 00:34:03,160 --> 00:34:06,400 Speaker 1: world of sentience, let me know at podcasts at Eagleman 550 00:34:06,480 --> 00:34:09,000 Speaker 1: dot com. And this brings me to the second reason 551 00:34:09,080 --> 00:34:12,960 Speaker 1: why we should be skeptical about current AI having theory 552 00:34:13,000 --> 00:34:15,799 Speaker 1: of mind. And this is less about the AI and 553 00:34:15,960 --> 00:34:19,359 Speaker 1: one hundred percent about us. And that issue is: we 554 00:34:19,400 --> 00:34:22,560 Speaker 1: are very easily fooled. So I'll give you an example. 555 00:34:23,040 --> 00:34:25,959 Speaker 1: In the nineteen sixties, there was a computer scientist named 556 00:34:26,320 --> 00:34:32,479 Speaker 1: Joseph Weizenbaum at MIT, who developed the first chatbot, named Eliza. Now, 557 00:34:32,560 --> 00:34:37,560 Speaker 1: Eliza was designed to act like a psychotherapist. So you 558 00:34:38,000 --> 00:34:42,040 Speaker 1: type in some statement, and it identifies keywords in your 559 00:34:42,080 --> 00:34:46,160 Speaker 1: input and then generates its responses based on a set 560 00:34:46,200 --> 00:34:51,160 Speaker 1: of predefined scripts. So you type I'm feeling sad, and 561 00:34:51,200 --> 00:34:53,120 Speaker 1: Eliza responds. 562 00:34:52,840 --> 00:34:54,160 Speaker 2: Why are you feeling sad? 563 00:34:54,360 --> 00:34:57,799 Speaker 1: All Eliza did was reflect your statement back to you 564 00:34:57,920 --> 00:34:59,799 Speaker 1: in the form of a question, kind of like a 565 00:35:00,040 --> 00:35:04,000 Speaker 1: psychotherapist might do. So a typed chat goes like this. 
566 00:35:04,200 --> 00:35:08,360 Speaker 1: You type I need some help with my relationship, and Eliza. 567 00:35:08,000 --> 00:35:10,240 Speaker 2: Writes what kind of help do you think you need? 568 00:35:10,480 --> 00:35:14,120 Speaker 1: And you say I think my partner doesn't listen to me. 569 00:35:14,520 --> 00:35:17,840 Speaker 2: Eliza types, what makes you think your partner doesn't listen 570 00:35:17,880 --> 00:35:18,400 Speaker 2: to you? 571 00:35:18,400 --> 00:35:22,439 Speaker 1: You say, whenever I talk about my feelings, they seem distracted. 572 00:35:22,800 --> 00:35:24,960 Speaker 2: How do you feel when you talk about your feelings 573 00:35:25,160 --> 00:35:26,440 Speaker 2: and they seem distracted? 574 00:35:26,760 --> 00:35:32,160 Speaker 1: So Eliza's only strategy is to ask you your statements back. Now, 575 00:35:32,200 --> 00:35:34,759 Speaker 1: Eliza was just a few kilobytes of code in the 576 00:35:34,840 --> 00:35:38,200 Speaker 1: nineteen sixties, and it simply flipped whatever you said into 577 00:35:38,200 --> 00:35:42,439 Speaker 1: a question, and it had no ability to infer your 578 00:35:42,520 --> 00:35:46,080 Speaker 1: mental state or your emotions, so no one even suggested 579 00:35:46,440 --> 00:35:51,160 Speaker 1: that it had any understanding of the content of the conversation. Nonetheless, 580 00:35:51,280 --> 00:35:55,800 Speaker 1: it simulated a basic conversational partner, and many users became 581 00:35:55,960 --> 00:35:59,440 Speaker 1: emotionally attached to Eliza, even though they knew it was 582 00:35:59,520 --> 00:36:04,160 Speaker 1: just a machine. And this illustrates how seductively easy it 583 00:36:04,200 --> 00:36:08,040 Speaker 1: is for us to bring all our communication machinery to 584 00:36:08,120 --> 00:36:11,440 Speaker 1: the table and assume that the words we get back 585 00:36:11,960 --> 00:36:16,040 Speaker 1: must have a mind behind it. 
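The keyword-and-script behavior described above can be sketched roughly as follows. This is a minimal illustration built on a handful of made-up rules; it is not Weizenbaum's original script, and the names REFLECTIONS, RULES, reflect, and respond are hypothetical.

```python
import re

# Pronoun swaps used to turn the user's words around (illustrative list only).
REFLECTIONS = {"i": "you", "i'm": "you're", "my": "your", "me": "you", "am": "are"}

# (pattern, response template) pairs. These rules are hypothetical examples,
# not the rules of the original program.
RULES = [
    (re.compile(r"\bi'?m feeling (.+)", re.IGNORECASE), "Why are you feeling {0}?"),
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi think (.+)", re.IGNORECASE), "What makes you think {0}?"),
]


def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(statement):
    """Return a scripted question for the first rule whose keyword matches."""
    cleaned = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # Fallback when no keyword is recognized.
    return "Please tell me more."


print(respond("I'm feeling sad"))
print(respond("I think my partner doesn't listen to me."))
```

Nothing in this sketch models what the user believes or feels. It only matches text and fills in a template, which is the point the episode is making about how little machinery it takes to seem like a conversational partner.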
This early experiment demonstrated 586 00:36:16,080 --> 00:36:21,600 Speaker 1: that even simple pattern recognition can evoke genuine emotional responses 587 00:36:21,600 --> 00:36:24,880 Speaker 1: from the users. Now fast forward to today, and we 588 00:36:24,960 --> 00:36:29,200 Speaker 1: have large language models that have trillions of times more 589 00:36:29,320 --> 00:36:34,800 Speaker 1: code than Eliza, and this seduction is only magnified. Modern 590 00:36:34,840 --> 00:36:39,120 Speaker 1: AI can process prompts without any true understanding, but we 591 00:36:39,280 --> 00:36:44,280 Speaker 1: humans still get pulled into feeling like there's someone there 592 00:36:44,640 --> 00:36:47,840 Speaker 1: on the other end of the line. Okay, so we 593 00:36:48,000 --> 00:36:51,560 Speaker 1: established early on that there's no reason in theory a 594 00:36:51,640 --> 00:36:55,120 Speaker 1: computer couldn't emulate other minds. But on the other hand, 595 00:36:55,160 --> 00:36:58,720 Speaker 1: we've established that just because a large language model seems 596 00:36:58,760 --> 00:37:03,239 Speaker 1: to sometimes nail the answers doesn't necessitate that it is 597 00:37:03,280 --> 00:37:06,000 Speaker 1: doing theory of mind. It may simply tell us that 598 00:37:06,040 --> 00:37:12,000 Speaker 1: the answer exists somewhere in the unimaginably large corpus that 599 00:37:12,120 --> 00:37:14,719 Speaker 1: humans have written, or even, by the way, that there's 600 00:37:14,760 --> 00:37:17,520 Speaker 1: been some fine tuning on the model where someone adds 601 00:37:17,560 --> 00:37:21,040 Speaker 1: a similar problem by hand. In other words, the AI 602 00:37:21,200 --> 00:37:25,200 Speaker 1: is doing an interpolation between answers that it has seen before, 603 00:37:25,480 --> 00:37:29,560 Speaker 1: but it's not actually putting itself in someone else's mind. 604 00:37:30,239 --> 00:37:33,680 Speaker 1: So does modern AI have theory of mind? 
As of now, 605 00:37:33,800 --> 00:37:36,880 Speaker 1: I'm not convinced that we have any reason to think so. 606 00:37:37,680 --> 00:37:41,759 Speaker 1: Current large language models are making sophisticated decisions about which 607 00:37:41,800 --> 00:37:46,759 Speaker 1: word comes next. That's it. They don't understand in the 608 00:37:46,840 --> 00:37:49,719 Speaker 1: human sense of seeing the woman in Jaws or the 609 00:37:49,719 --> 00:37:52,919 Speaker 1: man who has lost his keys and thinking about what 610 00:37:53,040 --> 00:37:56,240 Speaker 1: it is like to be them. And this is why 611 00:37:56,520 --> 00:38:00,520 Speaker 1: Siri or Alexa or Google can respond to your queries 612 00:38:00,600 --> 00:38:04,880 Speaker 1: quite well. But they don't know anything about your beliefs 613 00:38:05,040 --> 00:38:08,560 Speaker 1: or desires or emotions. They don't know if you're asking 614 00:38:08,560 --> 00:38:12,319 Speaker 1: a question because you are curious, or you're confused, or 615 00:38:12,560 --> 00:38:16,719 Speaker 1: you're just making conversation, or you're being sarcastic. So this 616 00:38:16,800 --> 00:38:20,080 Speaker 1: is all to say there is a difference between simulating 617 00:38:20,160 --> 00:38:25,000 Speaker 1: responses based on word probabilities and actually slipping into other 618 00:38:25,080 --> 00:38:28,560 Speaker 1: people's shoes. Now, as I said before, this has nothing 619 00:38:28,600 --> 00:38:31,800 Speaker 1: to do with whether we will come to develop AI 620 00:38:31,920 --> 00:38:34,480 Speaker 1: that can do theory of mind. 
There are several research 621 00:38:34,520 --> 00:38:39,480 Speaker 1: groups working on AI systems that try to infer intentions 622 00:38:39,480 --> 00:38:43,280 Speaker 1: and desires, and this would have applications in everything from 623 00:38:43,760 --> 00:38:48,640 Speaker 1: more intuitive personal assistants to robots that can better collaborate 624 00:38:48,680 --> 00:38:53,759 Speaker 1: with humans in complex tasks. Now, let's note something interesting here. 625 00:38:54,320 --> 00:38:58,000 Speaker 1: Even if we can get AI to make inferences, it's 626 00:38:58,000 --> 00:39:01,680 Speaker 1: still not clear whether that will be true theory of mind. 627 00:39:01,840 --> 00:39:05,000 Speaker 1: That might require the AI to have some level of 628 00:39:05,480 --> 00:39:10,680 Speaker 1: self awareness or consciousness or subjective experience. But as Kosinski 629 00:39:10,719 --> 00:39:13,440 Speaker 1: points out, even if we don't think the AI has 630 00:39:13,520 --> 00:39:16,879 Speaker 1: theory of mind, there might be value in machines behaving 631 00:39:17,040 --> 00:39:20,040 Speaker 1: as though they possess theory of mind. And that's certainly 632 00:39:20,040 --> 00:39:24,120 Speaker 1: a valid point. Alan Turing, who proposed the imitation game, 633 00:39:24,200 --> 00:39:28,160 Speaker 1: the Turing test, considered the distinction between what a computer 634 00:39:28,400 --> 00:39:33,040 Speaker 1: actually has and what it seems to have to be meaningless. 635 00:39:33,320 --> 00:39:35,880 Speaker 1: A more modern version of this point is reflected in 636 00:39:35,920 --> 00:39:39,080 Speaker 1: the television show Westworld, which is about a future in 637 00:39:39,120 --> 00:39:42,200 Speaker 1: which there are lifelike human androids. 
And if you watch 638 00:39:42,320 --> 00:39:46,160 Speaker 1: the opening scene, the young William enters the first room 639 00:39:46,200 --> 00:39:48,759 Speaker 1: and there's a beautiful assistant who helps him to pick 640 00:39:48,760 --> 00:39:51,799 Speaker 1: out a hat and a gun, and she's very flirtatious 641 00:39:51,840 --> 00:39:55,400 Speaker 1: with him, and he nervously says, sorry to ask, but 642 00:39:55,760 --> 00:39:59,640 Speaker 1: are you real? And she says, if you can't tell, 643 00:40:00,320 --> 00:40:03,160 Speaker 1: does it matter? And maybe that'll be the case with 644 00:40:03,280 --> 00:40:06,360 Speaker 1: AI in the near future. It will fake theory of 645 00:40:06,440 --> 00:40:09,640 Speaker 1: mind and that will be enough for us to reap 646 00:40:09,800 --> 00:40:13,720 Speaker 1: all the benefits. So let's wrap up. While current large 647 00:40:13,760 --> 00:40:17,759 Speaker 1: language models are mind blowingly impressive, I land on the 648 00:40:17,760 --> 00:40:20,800 Speaker 1: position that while they can often get the right answer 649 00:40:20,960 --> 00:40:24,160 Speaker 1: on theory of mind tests, it's an illusion. They're not 650 00:40:24,280 --> 00:40:27,040 Speaker 1: actually simulating what it's like to be someone else. And 651 00:40:27,080 --> 00:40:31,120 Speaker 1: this is what I'm now calling the intelligence echo illusion. 652 00:40:31,800 --> 00:40:35,000 Speaker 1: The illusion results from humans having built over thousands of 653 00:40:35,080 --> 00:40:39,680 Speaker 1: years an incredibly large corpus of ideas and questions and 654 00:40:39,760 --> 00:40:43,280 Speaker 1: answers a thousand times larger than you could ever read 655 00:40:43,400 --> 00:40:47,480 Speaker 1: in a lifetime. 
And sometimes you don't know that the 656 00:40:47,560 --> 00:40:51,319 Speaker 1: answers are already in there, and when you hear an 657 00:40:51,400 --> 00:40:56,240 Speaker 1: echo of humans, you mistake that for intelligence of the computer. 658 00:40:56,560 --> 00:40:59,400 Speaker 1: So that's the position I'm taking for now. Large language 659 00:40:59,400 --> 00:41:02,480 Speaker 1: models lack a true theory of mind. The question 660 00:41:02,920 --> 00:41:06,320 Speaker 1: is whether we will get there someday. Probably it won't 661 00:41:06,320 --> 00:41:09,240 Speaker 1: be with large language models, but instead a very different 662 00:41:09,360 --> 00:41:15,120 Speaker 1: kind of architecture, possibly one that has some modicum of consciousness 663 00:41:15,440 --> 00:41:17,879 Speaker 1: so that it is able to reflect on its own 664 00:41:18,000 --> 00:41:22,120 Speaker 1: mental states to emulate someone else's. So thank you for 665 00:41:22,239 --> 00:41:24,920 Speaker 1: joining me on this journey into the mind, both human 666 00:41:25,040 --> 00:41:28,040 Speaker 1: and artificial. If you enjoyed this episode, don't forget to 667 00:41:28,080 --> 00:41:30,480 Speaker 1: subscribe and rate and review, and if you have any 668 00:41:30,560 --> 00:41:32,640 Speaker 1: questions or topics that you'd like to hear about in 669 00:41:32,680 --> 00:41:36,520 Speaker 1: future episodes, feel free to reach out. Until next time, 670 00:41:36,680 --> 00:41:42,080 Speaker 1: keep questioning, keep exploring, and stay curious. Go to eagleman 671 00:41:42,120 --> 00:41:45,680 Speaker 1: dot com slash podcast for more information and to find 672 00:41:45,719 --> 00:41:49,400 Speaker 1: further reading.
Send me an email at podcasts at eagleman 673 00:41:49,480 --> 00:41:52,880 Speaker 1: dot com with questions or discussion, and check out and subscribe 674 00:41:52,880 --> 00:41:56,239 Speaker 1: to Inner Cosmos on YouTube for videos of each episode 675 00:41:56,280 --> 00:41:59,959 Speaker 1: and to leave comments. Until next time, I'm David Eagleman, 676 00:42:00,080 --> 00:42:03,400 Speaker 1: and we have been exploring the Inner Cosmos.