1
00:00:05,160 --> 00:00:07,880
Speaker 1: You know that moment in the horror movie where the

2
00:00:07,960 --> 00:00:12,040
Speaker 1: monster is coming closer but the person on screen doesn't

3
00:00:12,080 --> 00:00:14,880
Speaker 1: see it. Why does that drive you crazy? And what

4
00:00:14,920 --> 00:00:19,560
Speaker 1: does that teach us about brains? What is theory of

5
00:00:19,800 --> 00:00:23,079
Speaker 1: mind and why is it so important for everyone from

6
00:00:23,160 --> 00:00:27,160
Speaker 1: poker players to con men to stage magicians to novelists?

7
00:00:27,720 --> 00:00:30,360
Speaker 1: We're going to talk about a very fundamental skill of

8
00:00:30,440 --> 00:00:34,880
Speaker 1: human brains today, and as impressive as AI is currently,

9
00:00:34,920 --> 00:00:38,880
Speaker 1: we're going to ask the question of whether computers can

10
00:00:39,000 --> 00:00:42,440
Speaker 1: replicate this right now or whether it is beyond their

11
00:00:42,479 --> 00:00:48,200
Speaker 1: skill set. Welcome to Inner Cosmos with me, David Eagleman.

12
00:00:48,280 --> 00:00:50,920
Speaker 1: I'm a neuroscientist and an author at Stanford, and in

13
00:00:50,960 --> 00:00:54,639
Speaker 1: these episodes we sail deeply into our three-pound universe

14
00:00:54,880 --> 00:00:58,280
Speaker 1: to understand why and how our lives look the way

15
00:00:58,280 --> 00:01:02,000
Speaker 1: they do. Today's episode is about what it takes to

16
00:01:02,240 --> 00:01:05,680
Speaker 1: understand other people, how your brain does it, and whether

17
00:01:05,840 --> 00:01:11,920
Speaker 1: computers could do it. So imagine this. You're walking down

18
00:01:11,920 --> 00:01:17,080
Speaker 1: the street and you see someone frantically searching their pockets

19
00:01:17,200 --> 00:01:20,880
Speaker 1: and looking around with furrowed brows and a tight frown.

20
00:01:21,480 --> 00:01:25,399
Speaker 1: So without them saying a word, you can infer that

21
00:01:25,480 --> 00:01:29,120
Speaker 1: they might have lost something important. Maybe it's his keys.

22
00:01:29,760 --> 00:01:33,560
Speaker 1: Your brain can easily make a good guess about another

23
00:01:33,680 --> 00:01:37,679
Speaker 1: person's mental state just from looking at their actions. We

24
00:01:37,800 --> 00:01:41,319
Speaker 1: are inferring something about what is going on in that

25
00:01:41,400 --> 00:01:45,640
Speaker 1: person's head. But it's more than just pattern matching. It's

26
00:01:45,640 --> 00:01:48,520
Speaker 1: not simply that your brain has seen lots of people

27
00:01:48,600 --> 00:01:52,440
Speaker 1: patting their pockets and you talked with them afterwards, and

28
00:01:52,480 --> 00:01:54,760
Speaker 1: you figured out why they were doing that, and you

29
00:01:54,880 --> 00:01:59,000
Speaker 1: detected a pattern, and you memorized, ah, okay, that pattern

30
00:01:59,040 --> 00:02:04,120
Speaker 1: equals that problem. Instead, you have the ability to imagine

31
00:02:04,200 --> 00:02:08,880
Speaker 1: yourself in their situation. You can mentally slip into their

32
00:02:08,960 --> 00:02:12,200
Speaker 1: shoes and ask, what would I be thinking if I

33
00:02:12,320 --> 00:02:16,720
Speaker 1: were patting my pockets and frantically searching around me? And

34
00:02:16,800 --> 00:02:19,959
Speaker 1: maybe you see something else. You see a kid there

35
00:02:20,120 --> 00:02:23,360
Speaker 1: around the corner, and the kid is peeking around the

36
00:02:23,400 --> 00:02:26,720
Speaker 1: corner at the man patting his pockets, and the child

37
00:02:27,200 --> 00:02:30,520
Speaker 1: is giggling. Now, why is the kid giggling while the

38
00:02:30,560 --> 00:02:34,520
Speaker 1: guy is so obviously worried? Well, it probably strikes you

39
00:02:34,560 --> 00:02:37,520
Speaker 1: that he's hiding something from the guy. You see that

40
00:02:37,560 --> 00:02:41,160
Speaker 1: the kid is not running away. Instead, he's standing in

41
00:02:41,280 --> 00:02:44,000
Speaker 1: such a way that he'll be spotted. Now it's pretty

42
00:02:44,080 --> 00:02:48,280
Speaker 1: obvious what's happening here. You can step into the man's

43
00:02:48,320 --> 00:02:50,760
Speaker 1: head to feel the worry, and you can step into

44
00:02:50,800 --> 00:02:53,880
Speaker 1: the kid's head to recognize that he feels like he's

45
00:02:53,919 --> 00:02:56,000
Speaker 1: playing a game, even if it doesn't strike you as

46
00:02:56,080 --> 00:03:00,920
Speaker 1: so funny. Then you catch the guy meet eyes with the

47
00:03:01,040 --> 00:03:03,680
Speaker 1: kid for just a fraction of a second, which sends

48
00:03:03,720 --> 00:03:07,360
Speaker 1: the kid into fits of laughter, and you realize the

49
00:03:07,400 --> 00:03:11,280
Speaker 1: man is just playing along. Now, how did you decide

50
00:03:11,680 --> 00:03:14,440
Speaker 1: what is going on in the heads of these two? Again,

51
00:03:14,520 --> 00:03:18,520
Speaker 1: it's not as though you memorized an algorithm here. Okay,

52
00:03:18,560 --> 00:03:21,280
Speaker 1: if there's eye contact, then there's one interpretation. If there's

53
00:03:21,280 --> 00:03:26,320
Speaker 1: no eye contact, then a totally different interpretation. To appreciate

54
00:03:26,520 --> 00:03:29,960
Speaker 1: how complex this mind reading is that you just did,

55
00:03:30,840 --> 00:03:34,639
Speaker 1: just imagine that you're a space alien watching this scene

56
00:03:34,720 --> 00:03:38,080
Speaker 1: from your spaceship. You would be totally confused. You would

57
00:03:38,120 --> 00:03:41,800
Speaker 1: have no idea what's going on in this weird scene

58
00:03:42,280 --> 00:03:45,200
Speaker 1: because you don't know what it is to be a human.

59
00:03:46,000 --> 00:03:49,160
Speaker 1: Here's another analogy to appreciate this. Think about the way

60
00:03:49,160 --> 00:03:53,400
Speaker 1: that you, as a human, might watch fish. You really

61
00:03:53,440 --> 00:03:57,160
Speaker 1: don't understand what the heck they're doing. One fish suddenly

62
00:03:57,160 --> 00:04:01,080
Speaker 1: starts swimming faster, and another starts swimming in circles, and

63
00:04:01,160 --> 00:04:04,400
Speaker 1: one starts flapping its gills faster, and one moves up

64
00:04:04,640 --> 00:04:09,040
Speaker 1: towards the surface. It's all just weird fish behavior to you.

65
00:04:09,040 --> 00:04:11,120
Speaker 1: You don't know how to read any of it. It's

66
00:04:11,200 --> 00:04:14,720
Speaker 1: just fish stuff, and you're not able to immediately construct

67
00:04:14,760 --> 00:04:19,000
Speaker 1: a story about the meaning of any of this. And

68
00:04:19,000 --> 00:04:22,279
Speaker 1: that's what it's like to be this space alien watching

69
00:04:22,320 --> 00:04:27,200
Speaker 1: this guy checking his pockets and the child giggling. Now,

70
00:04:27,760 --> 00:04:31,520
Speaker 1: what allows us, as opposed to the space alien, to

71
00:04:31,600 --> 00:04:36,000
Speaker 1: be so good at reading our fellow humans? This is

72
00:04:36,040 --> 00:04:41,240
Speaker 1: what psychologists and neuroscientists call theory of mind, and that's

73
00:04:41,279 --> 00:04:44,560
Speaker 1: what we're talking about today. Theory of mind is the

74
00:04:44,600 --> 00:04:49,920
Speaker 1: ability to understand that other people have their own thoughts

75
00:04:49,960 --> 00:04:53,720
Speaker 1: and feelings and beliefs that are different from yours. It's

76
00:04:53,760 --> 00:04:58,120
Speaker 1: the ability to recognize that others have their own perspectives.

77
00:04:58,160 --> 00:05:02,200
Speaker 1: It's the ability to attribute mental states to other people,

78
00:05:02,320 --> 00:05:07,440
Speaker 1: like what their intentions are, or their desires, or their emotions,

79
00:05:07,960 --> 00:05:11,640
Speaker 1: or what they know or don't know. And theory of

80
00:05:11,680 --> 00:05:15,440
Speaker 1: mind is a key cognitive skill that allows us to

81
00:05:15,600 --> 00:05:19,360
Speaker 1: interact with other people in a very rich and nuanced way.

82
00:05:19,880 --> 00:05:23,560
Speaker 1: Just think about how pervasive this skill is in everything

83
00:05:23,600 --> 00:05:27,359
Speaker 1: we do. So take sarcasm. When your friend makes a

84
00:05:27,760 --> 00:05:32,279
Speaker 1: sarcastic comment, you can recognize that her words don't match

85
00:05:32,400 --> 00:05:36,640
Speaker 1: her true intention. So, for example, if she says, oh, awesome,

86
00:05:36,680 --> 00:05:40,919
Speaker 1: more traffic, I love traffic, you infer that she's not

87
00:05:41,120 --> 00:05:46,400
Speaker 1: actually pleased. This requires understanding her mental state: that she

88
00:05:46,600 --> 00:05:52,320
Speaker 1: is irritated, not happy. Now, if you were Siri or Alexa,

89
00:05:52,440 --> 00:05:55,919
Speaker 1: you wouldn't be able to recognize anything but the words.

90
00:05:56,320 --> 00:05:59,800
Speaker 1: You wouldn't understand anything about the mind behind the words.

91
00:06:00,279 --> 00:06:03,200
Speaker 1: So we're going to talk about how brains do it

92
00:06:03,560 --> 00:06:06,719
Speaker 1: and whether or not computers can do it. But before

93
00:06:06,720 --> 00:06:08,320
Speaker 1: we go there, we're going to take a few minutes

94
00:06:08,360 --> 00:06:12,839
Speaker 1: to really appreciate how the skill is everywhere in what

95
00:06:12,960 --> 00:06:17,880
Speaker 1: we do. For example, just think about different professions. So

96
00:06:18,000 --> 00:06:22,440
Speaker 1: detectives use theory of mind all the time. Did Mr.

97
00:06:22,560 --> 00:06:25,080
Speaker 1: Jones know that the food had gone bad when he

98
00:06:25,160 --> 00:06:28,560
Speaker 1: sold it? Did Mr. Smith know that his boss was

99
00:06:28,640 --> 00:06:32,600
Speaker 1: involved with organized crime or was he acting with no knowledge?

100
00:06:33,200 --> 00:06:35,000
Speaker 1: Or more generally, if they want to know if someone

101
00:06:35,080 --> 00:06:38,120
Speaker 1: is lying, it usually helps to step into their shoes

102
00:06:38,160 --> 00:06:40,880
Speaker 1: and think about what that person knows or doesn't know.

103
00:06:41,279 --> 00:06:44,400
Speaker 1: Magicians use theory of mind. They know that if they

104
00:06:44,800 --> 00:06:47,400
Speaker 1: move their hand in an arc, your attention is going

105
00:06:47,440 --> 00:06:51,560
Speaker 1: to follow that, and therefore they know what you won't

106
00:06:51,640 --> 00:06:54,919
Speaker 1: see them do. They know that even though they know

107
00:06:55,120 --> 00:06:58,040
Speaker 1: something happened, like the card dropped into their sleeve, they

108
00:06:58,040 --> 00:07:01,760
Speaker 1: know that you don't know that. They always keep your

109
00:07:01,920 --> 00:07:05,679
Speaker 1: point of view, your beliefs, at the forefront of their mind.

110
00:07:06,440 --> 00:07:08,800
Speaker 1: Con men do this. They listen to your words and

111
00:07:08,839 --> 00:07:11,920
Speaker 1: they read your body language to gather what you know

112
00:07:12,040 --> 00:07:15,280
Speaker 1: and don't know, and therefore what buttons they should push next.

113
00:07:15,720 --> 00:07:20,480
Speaker 1: Psychiatrists and psychologists always use theory of mind to understand

114
00:07:20,800 --> 00:07:24,200
Speaker 1: what is being expressed from the patient's point of view,

115
00:07:24,400 --> 00:07:26,960
Speaker 1: in other words, what the person believes, whether or not

116
00:07:27,040 --> 00:07:30,360
Speaker 1: it's what the therapist believes. I'll give you another example.

117
00:07:30,640 --> 00:07:33,840
Speaker 1: My friend Maddie is a professional poker player, and he

118
00:07:33,960 --> 00:07:37,400
Speaker 1: describes poker playing like this. He says, when you're learning

119
00:07:37,480 --> 00:07:39,880
Speaker 1: to play poker, you think about the cards you have

120
00:07:39,960 --> 00:07:43,239
Speaker 1: in your hand. As you get better, you think about

121
00:07:43,240 --> 00:07:46,840
Speaker 1: your hand and also what the other person is thinking.

122
00:07:47,200 --> 00:07:49,560
Speaker 1: And as you get even better, you think about what

123
00:07:49,640 --> 00:07:52,960
Speaker 1: the other person is thinking you're thinking, and when you

124
00:07:53,000 --> 00:07:57,040
Speaker 1: get to the professional levels, you're thinking about what he thinks

125
00:07:57,320 --> 00:08:00,559
Speaker 1: you think he thinks, and people who are real pros

126
00:08:00,680 --> 00:08:04,280
Speaker 1: can think five or six levels deep on this. All

127
00:08:04,320 --> 00:08:08,080
Speaker 1: of this is theory of mind, and theory of mind

128
00:08:08,240 --> 00:08:12,520
Speaker 1: is key when you're teaching something. For example, parents know

129
00:08:13,080 --> 00:08:16,880
Speaker 1: that their children can't understand certain things. For example, the

130
00:08:16,960 --> 00:08:20,080
Speaker 1: child needs to get that smallpox shot, even though to

131
00:08:20,120 --> 00:08:22,720
Speaker 1: the child that's nothing but scary and he simply doesn't

132
00:08:22,960 --> 00:08:26,600
Speaker 1: have the capacity to think about the future benefits that

133
00:08:26,640 --> 00:08:29,880
Speaker 1: will accrue. Or the school teacher can only hope to

134
00:08:30,040 --> 00:08:33,800
Speaker 1: educate her students if she knows what they already know

135
00:08:34,040 --> 00:08:36,319
Speaker 1: or don't know. She needs to phrase things in such

136
00:08:36,360 --> 00:08:39,320
Speaker 1: a way that someone who doesn't already know what she

137
00:08:39,480 --> 00:08:44,000
Speaker 1: knows can absorb it, and that just requires theory of mind.

138
00:08:44,480 --> 00:08:47,160
Speaker 1: If she couldn't simulate what it's like to be in

139
00:08:47,200 --> 00:08:50,719
Speaker 1: their heads, she'd have no meaningful shot at getting them

140
00:08:50,960 --> 00:08:54,360
Speaker 1: past the first quiz. And this issue of considering what

141
00:08:54,480 --> 00:08:59,240
Speaker 1: someone knows or doesn't know is also critical in any negotiation.

142
00:08:59,400 --> 00:09:03,040
Speaker 1: You try to understand the other person's desires and

143
00:09:03,200 --> 00:09:08,680
Speaker 1: goals and where they might potentially compromise. During a salary negotiation,

144
00:09:08,760 --> 00:09:12,199
Speaker 1: you consider what your employer is thinking about the needs

145
00:09:12,200 --> 00:09:15,240
Speaker 1: and future of the company and therefore what they might

146
00:09:15,280 --> 00:09:17,800
Speaker 1: be willing to offer. And this is also how you

147
00:09:17,840 --> 00:09:21,640
Speaker 1: manage conflicts. In any disagreement, if you're smart, you try

148
00:09:21,640 --> 00:09:26,480
Speaker 1: to understand the other person's perspective to resolve the issue.

149
00:09:26,559 --> 00:09:28,960
Speaker 1: If your partner is upset with you, you try to

150
00:09:29,000 --> 00:09:32,360
Speaker 1: figure out what you did or said that set things off,

151
00:09:32,400 --> 00:09:35,600
Speaker 1: and why that offended the other person and how it

152
00:09:35,840 --> 00:09:38,640
Speaker 1: landed for them. And that's the single way that you're

153
00:09:38,640 --> 00:09:42,160
Speaker 1: going to hit the problem effectively. So this ability to

154
00:09:42,240 --> 00:09:46,280
Speaker 1: slip into someone else's shoes has almost everything to do

155
00:09:46,360 --> 00:09:51,880
Speaker 1: with our social intelligence. You use this very human skill

156
00:09:52,600 --> 00:09:55,240
Speaker 1: all the time. And before we get to the next

157
00:09:55,280 --> 00:09:57,800
Speaker 1: act of this podcast, where I ask if computers can

158
00:09:57,840 --> 00:09:59,920
Speaker 1: do this or not, I just want to finish fleshing

159
00:10:00,360 --> 00:10:03,160
Speaker 1: this out so we can really see how pervasive this is.

160
00:10:03,640 --> 00:10:07,000
Speaker 1: So as an example, you rev up your theory of

161
00:10:07,200 --> 00:10:10,640
Speaker 1: mind engine whenever you send an email. If you know

162
00:10:10,760 --> 00:10:13,959
Speaker 1: someone has a well-developed model of you, like your

163
00:10:14,000 --> 00:10:17,600
Speaker 1: parents or your spouse, then you can use abbreviations and

164
00:10:17,640 --> 00:10:20,840
Speaker 1: shortcuts to get your message across. But if you're writing

165
00:10:20,840 --> 00:10:23,120
Speaker 1: to someone who's never met you before. Let's say you're

166
00:10:23,160 --> 00:10:26,520
Speaker 1: applying to a new job. You run a very different game,

167
00:10:27,160 --> 00:10:31,600
Speaker 1: so you're not just an email writing algorithm that produces output,

168
00:10:31,640 --> 00:10:35,559
Speaker 1: but instead your output is modified according to who you

169
00:10:35,640 --> 00:10:38,440
Speaker 1: expect is doing the reading on the other end, and

170
00:10:38,520 --> 00:10:42,160
Speaker 1: specifically what their mind is like. And I also want

171
00:10:42,200 --> 00:10:45,760
Speaker 1: to mention that theory of mind is critical for literature

172
00:10:45,880 --> 00:10:48,080
Speaker 1: to work because it's often the case that you can

173
00:10:48,520 --> 00:10:52,400
Speaker 1: see the limitations of the character's point of view. So,

174
00:10:52,520 --> 00:10:55,960
Speaker 1: for example, if you remember the beginning of the movie Jaws,

175
00:10:56,080 --> 00:10:58,840
Speaker 1: the woman is swimming around in the ocean water and

176
00:10:58,880 --> 00:11:02,680
Speaker 1: she's very relaxed and happy because we see the shark,

177
00:11:02,960 --> 00:11:06,720
Speaker 1: but she doesn't. If we didn't have theory of mind,

178
00:11:06,760 --> 00:11:09,040
Speaker 1: we would simply say, oh, there's a shark there. But

179
00:11:09,080 --> 00:11:12,439
Speaker 1: we're able to understand that she cannot see the shark,

180
00:11:12,720 --> 00:11:14,880
Speaker 1: and that's a big part of why we are fearful,

181
00:11:15,400 --> 00:11:18,120
Speaker 1: because she isn't fearful, and we want her to be.

182
00:11:19,000 --> 00:11:23,520
Speaker 1: This stepping into other people's heads drives essentially all horror

183
00:11:23,600 --> 00:11:26,720
Speaker 1: movies because we often know something that the main character

184
00:11:27,320 --> 00:11:30,960
Speaker 1: does not, and it also drives romantic comedies. For example,

185
00:11:31,000 --> 00:11:35,000
Speaker 1: we see the guy doing something very nice like helping

186
00:11:35,040 --> 00:11:37,760
Speaker 1: an elderly woman cross the street, and he doesn't know

187
00:11:37,840 --> 00:11:41,080
Speaker 1: that he is being watched by the female love interest,

188
00:11:41,520 --> 00:11:45,200
Speaker 1: and therefore we the audience interpret what kind of guy

189
00:11:45,240 --> 00:11:47,959
Speaker 1: he must be to behave that way when as far

190
00:11:47,960 --> 00:11:50,720
Speaker 1: as he knows, he's totally alone. We would have a

191
00:11:50,800 --> 00:11:56,040
Speaker 1: totally different interpretation. If he sees his romantic counterpart there

192
00:11:56,080 --> 00:11:58,880
Speaker 1: and then he does the charitable act, we'd simulate that

193
00:11:58,920 --> 00:12:03,080
Speaker 1: his intentions are different there. Now, why are human brains

194
00:12:03,160 --> 00:12:07,440
Speaker 1: so talented at making theories about other people's minds? Well,

195
00:12:07,480 --> 00:12:10,240
Speaker 1: you've heard me say many times that the job of

196
00:12:10,280 --> 00:12:14,960
Speaker 1: intelligent brains is to predict the future. If you're the magician,

197
00:12:15,040 --> 00:12:18,520
Speaker 1: you'd better be sure that you are predicting correctly where

198
00:12:18,559 --> 00:12:21,480
Speaker 1: their spotlight of attention is about to be. If you're

199
00:12:21,520 --> 00:12:24,440
Speaker 1: the poker player or the con man, you're trying to

200
00:12:24,440 --> 00:12:27,440
Speaker 1: predict what someone is going to do next, and the

201
00:12:27,520 --> 00:12:30,360
Speaker 1: optimal way to do this is to step

202
00:12:30,400 --> 00:12:34,240
Speaker 1: into their mental world and understand what it is like

203
00:12:34,440 --> 00:12:36,880
Speaker 1: to be them: what they know and what they don't know.

204
00:12:37,360 --> 00:12:43,200
Speaker 1: You leverage theory of mind to anticipate their next action,

205
00:12:43,600 --> 00:12:46,760
Speaker 1: and presumably this reaches back to the recent millions of

206
00:12:46,840 --> 00:12:49,880
Speaker 1: years of our evolution. So if you're an early Homo

207
00:12:49,920 --> 00:12:53,160
Speaker 1: sapiens moving along the trail and you see another

208
00:12:53,240 --> 00:12:57,199
Speaker 1: Homo sapiens coming down the trail towards you, it's absolutely

209
00:12:57,200 --> 00:12:59,920
Speaker 1: critical for you to figure out is he going to

210
00:13:00,120 --> 00:13:03,240
Speaker 1: attack me? Is he scared of me? Is he trying

211
00:13:03,280 --> 00:13:05,520
Speaker 1: to trick me? Is he just trying to get past me?

212
00:13:06,200 --> 00:13:09,320
Speaker 1: You're trying to figure out his mind so you can

213
00:13:09,360 --> 00:13:13,280
Speaker 1: figure out his next actions. So what I've told you

214
00:13:13,320 --> 00:13:16,440
Speaker 1: so far is that theory of mind is this critical

215
00:13:16,480 --> 00:13:21,079
Speaker 1: foundation for all of our meaningful social interactions because those

216
00:13:21,200 --> 00:13:26,319
Speaker 1: require you to be able to simulate other people's intentions

217
00:13:26,360 --> 00:13:30,800
Speaker 1: and emotions and beliefs. Your brain doesn't assume that it's

218
00:13:30,840 --> 00:13:34,600
Speaker 1: a knowledge communism out there where everyone knows exactly what

219
00:13:34,720 --> 00:13:37,840
Speaker 1: you know. Instead, we're able to pull off a higher

220
00:13:37,920 --> 00:13:41,760
Speaker 1: level of interaction because we understand that the world is

221
00:13:41,800 --> 00:13:45,440
Speaker 1: different inside different heads. And this, by the way, is

222
00:13:45,480 --> 00:13:48,880
Speaker 1: really sophisticated. It requires knowing who I am and what

223
00:13:48,920 --> 00:13:51,560
Speaker 1: I see and believe, and also holding in my head

224
00:13:51,600 --> 00:13:53,480
Speaker 1: what it is to be someone else and see and

225
00:13:53,520 --> 00:13:56,920
Speaker 1: believe something different. This is a very sophisticated computation that

226
00:13:56,960 --> 00:14:00,800
Speaker 1: the brain pulls off, but because we're so good at it,

227
00:14:00,800 --> 00:14:05,680
Speaker 1: it's typically invisible to us. But theory of mind doesn't

228
00:14:05,720 --> 00:14:09,760
Speaker 1: come for free. It's something that develops with time. As

229
00:14:09,800 --> 00:14:12,800
Speaker 1: you get more and more experience in the world and

230
00:14:12,840 --> 00:14:16,000
Speaker 1: you stop believing that you are the centerpiece and that

231
00:14:16,080 --> 00:14:18,760
Speaker 1: everyone else is just a cast member, you come to

232
00:14:18,880 --> 00:14:22,640
Speaker 1: understand that that person believes something different than you do,

233
00:14:23,120 --> 00:14:25,800
Speaker 1: and this other person feels a certain way even though

234
00:14:25,840 --> 00:14:29,360
Speaker 1: you don't, and that this person over here thinks something

235
00:14:29,400 --> 00:14:48,040
Speaker 1: to be true even though you know it's not. So

236
00:14:48,080 --> 00:14:50,800
Speaker 1: how do we know that this is a skill that

237
00:14:50,880 --> 00:14:55,920
Speaker 1: develops through time? Because very little kids are terrible at

238
00:14:55,960 --> 00:14:59,240
Speaker 1: theory of mind, but they get better as they mature

239
00:14:59,440 --> 00:15:02,520
Speaker 1: into the world, and typically by the ages of three

240
00:15:02,720 --> 00:15:05,720
Speaker 1: to five, they're getting that they're not the only point

241
00:15:05,720 --> 00:15:08,400
Speaker 1: of view that's possible, but that each person in the

242
00:15:08,440 --> 00:15:11,360
Speaker 1: scene has his or her own point of view. Now,

243
00:15:11,360 --> 00:15:15,160
Speaker 1: how do you test whether someone is capable of theory

244
00:15:15,280 --> 00:15:18,240
Speaker 1: of mind? Well, what you do is you present a

245
00:15:18,280 --> 00:15:22,080
Speaker 1: little scenario like this. Sally comes into the room and

246
00:15:22,160 --> 00:15:25,560
Speaker 1: puts her baseball under the bed, and then she leaves.

247
00:15:26,200 --> 00:15:29,920
Speaker 1: While she's gone, Anne comes in the room, she sees

248
00:15:29,960 --> 00:15:32,240
Speaker 1: the ball under the bed, she picks it up, and

249
00:15:32,280 --> 00:15:36,080
Speaker 1: she puts it in the closet. Then she leaves. Now

250
00:15:36,440 --> 00:15:39,359
Speaker 1: Sally comes back in the room, she wants her baseball.

251
00:15:39,920 --> 00:15:42,520
Speaker 1: Where does she look for it? Now, you and I

252
00:15:42,600 --> 00:15:44,840
Speaker 1: know that Sally will look for it under the bed

253
00:15:44,880 --> 00:15:48,720
Speaker 1: where she put it last, even though we simultaneously know

254
00:15:48,840 --> 00:15:52,400
Speaker 1: the actual location of the baseball in the closet. And

255
00:15:52,440 --> 00:15:55,800
Speaker 1: this is because we are running an emulation of what

256
00:15:55,920 --> 00:15:58,640
Speaker 1: it is like to be inside Sally's head with her

257
00:15:58,800 --> 00:16:03,640
Speaker 1: limited knowledge. Now, little children will fail the Sally-Anne

258
00:16:03,800 --> 00:16:07,640
Speaker 1: test because they know that the baseball is in the closet,

259
00:16:08,000 --> 00:16:11,920
Speaker 1: so they assume that Sally should know that too. But

260
00:16:12,200 --> 00:16:15,680
Speaker 1: as cognition develops, they come to realize that different heads

261
00:16:15,880 --> 00:16:18,880
Speaker 1: have different beliefs. And a really important clue to the

262
00:16:18,960 --> 00:16:23,240
Speaker 1: development of this is that not everyone develops theory of

263
00:16:23,320 --> 00:16:26,400
Speaker 1: mind in the same way at the same rate. For example,

264
00:16:26,760 --> 00:16:31,680
Speaker 1: people who are on the autism spectrum typically show delays

265
00:16:31,720 --> 00:16:36,360
Speaker 1: in developing theory of mind, which can, not surprisingly, impact their

266
00:16:36,360 --> 00:16:40,440
Speaker 1: social interactions. For instance, this is why sarcasm doesn't work

267
00:16:40,480 --> 00:16:44,080
Speaker 1: so well with a person who has autism. When you say, oh,

268
00:16:44,080 --> 00:16:47,760
Speaker 1: great, more traffic, I love traffic, they're not likely to

269
00:16:47,880 --> 00:16:51,160
Speaker 1: catch the meaning beneath the words that you're not actually

270
00:16:51,200 --> 00:16:54,680
Speaker 1: pleased because they don't have a sensitive model of your

271
00:16:55,040 --> 00:16:57,880
Speaker 1: actual mental state. If you can't put yourself in the

272
00:16:57,920 --> 00:17:01,000
Speaker 1: shoes of the other person, your understanding is limited to

273
00:17:01,200 --> 00:17:05,159
Speaker 1: just pattern recognition, which is not enough for the very

274
00:17:05,200 --> 00:17:09,240
Speaker 1: subtle and sophisticated kinds of communication that humans engage in

275
00:17:09,359 --> 00:17:12,520
Speaker 1: every day. So this tells us that theory of mind

276
00:17:12,720 --> 00:17:16,480
Speaker 1: doesn't come for free in humans. There are brain networks

277
00:17:16,480 --> 00:17:18,879
Speaker 1: that have to develop and learn for this to work,

278
00:17:19,040 --> 00:17:22,200
Speaker 1: so when you look at normal development or delayed development,

279
00:17:22,240 --> 00:17:27,600
Speaker 1: this allows us to understand how different brain regions contribute

280
00:17:27,920 --> 00:17:30,960
Speaker 1: to theory of mind. For example, there's one area called

281
00:17:30,960 --> 00:17:34,520
Speaker 1: the temporoparietal junction, and this is interesting because it pops

282
00:17:34,520 --> 00:17:39,119
Speaker 1: its head up in tasks that require understanding perspectives, like

283
00:17:39,600 --> 00:17:43,080
Speaker 1: distinguishing between what you know and what someone else knows.

284
00:17:43,520 --> 00:17:47,000
Speaker 1: So imagine you're teaching a friend how to play chess.

285
00:17:47,480 --> 00:17:49,720
Speaker 1: You need to not only understand the rules of the game,

286
00:17:50,040 --> 00:17:52,919
Speaker 1: but also know what your friend knows or doesn't know

287
00:17:53,119 --> 00:17:57,080
Speaker 1: about the game to teach effectively, and the temporoparietal

288
00:17:57,160 --> 00:18:01,160
Speaker 1: junction is involved in that. Not just that area: there's

289
00:18:01,200 --> 00:18:03,760
Speaker 1: a lot of other areas involved in theory of mind.

290
00:18:04,119 --> 00:18:07,399
Speaker 1: So the medial prefrontal cortex plays a big role in

291
00:18:07,480 --> 00:18:11,600
Speaker 1: making social judgments. It becomes active when you think about

292
00:18:11,680 --> 00:18:14,840
Speaker 1: the mental states of others. For example, if you're trying

293
00:18:14,880 --> 00:18:19,680
Speaker 1: to decide if someone is lying or being truthful, your

294
00:18:19,720 --> 00:18:23,160
Speaker 1: medial prefrontal cortex is engaged. And there are other areas,

295
00:18:23,240 --> 00:18:26,600
Speaker 1: like part of your superior temporal sulcus, which is involved in

296
00:18:26,760 --> 00:18:31,720
Speaker 1: processing social information like interpreting other people's eye gaze or

297
00:18:31,760 --> 00:18:35,080
Speaker 1: their body language, like the man looking for his keys

298
00:18:35,400 --> 00:18:38,520
Speaker 1: and the child giggling. We're able to infer a lot

299
00:18:38,920 --> 00:18:42,040
Speaker 1: because of the activity of this area. So we see

300
00:18:42,119 --> 00:18:45,280
Speaker 1: lots of areas in brain imaging experiments. And I want

301
00:18:45,320 --> 00:18:48,399
Speaker 1: to mention this to illustrate that theory of mind is

302
00:18:48,440 --> 00:18:51,960
Speaker 1: a brain wide issue. It's not a single area. And

303
00:18:52,000 --> 00:18:53,720
Speaker 1: by the way, this is true of so many things

304
00:18:53,760 --> 00:18:57,439
Speaker 1: in neuroscience. Imagine that I spread out a map of

305
00:18:57,480 --> 00:19:00,280
Speaker 1: your city and I ask you, hey, can you put

306
00:19:00,320 --> 00:19:03,840
Speaker 1: a pin in the spot that represents the economy of

307
00:19:03,880 --> 00:19:07,040
Speaker 1: the city? You'd tell me that that is a misplaced request.

308
00:19:07,359 --> 00:19:10,840
Speaker 1: There is no single spot for the economy. The economy

309
00:19:11,000 --> 00:19:14,439
Speaker 1: emerges from all the interactions between all the pieces and

310
00:19:14,480 --> 00:19:17,000
Speaker 1: parts of the city, and it's the same with almost

311
00:19:17,080 --> 00:19:21,080
Speaker 1: everything in neuroscience, and especially something like the skill of

312
00:19:21,200 --> 00:19:24,440
Speaker 1: slipping into someone else's point of view. There's not one

313
00:19:24,600 --> 00:19:27,640
Speaker 1: spot to drop a pin into. Instead, it is an

314
00:19:27,720 --> 00:19:32,600
Speaker 1: emergent property that develops from the interaction of lots of networks.

315
00:19:32,880 --> 00:19:35,679
Speaker 1: So what we've seen so far is that theory of

316
00:19:35,840 --> 00:19:39,560
Speaker 1: mind is this ability to infer what someone else knows,

317
00:19:39,600 --> 00:19:41,560
Speaker 1: and we've seen that this is right at the center

318
00:19:42,040 --> 00:19:46,000
Speaker 1: of social interactions. It's something that most humans develop naturally,

319
00:19:46,440 --> 00:19:49,200
Speaker 1: but that doesn't mean it's simple. And the question we're

320
00:19:49,200 --> 00:19:54,159
Speaker 1: going to ask today is does AI have theory of mind?

321
00:19:54,280 --> 00:19:58,640
Speaker 1: Can it put itself into someone else's shoes to understand

322
00:19:59,080 --> 00:20:03,000
Speaker 1: their limited knowledge? One of my colleagues at Stanford recently

323
00:20:03,000 --> 00:20:08,119
Speaker 1: wrote a paper suggesting yes, AI can do this. But fascinatingly,

324
00:20:08,720 --> 00:20:11,679
Speaker 1: it's not as easy to answer this question as you

325
00:20:11,760 --> 00:20:14,200
Speaker 1: might think. And this is for some reasons that we're

326
00:20:14,200 --> 00:20:16,760
Speaker 1: going to dive into. But before we get there, I

327
00:20:16,800 --> 00:20:19,640
Speaker 1: just want to zoom this out to a slightly larger question.

328
00:20:20,200 --> 00:20:25,280
Speaker 1: Could a computer develop theory of mind? Hypothetically, could an

329
00:20:25,320 --> 00:20:28,119
Speaker 1: AI system at some point in the future say, look,

330
00:20:28,280 --> 00:20:31,040
Speaker 1: I know XYZ to be true, but if I look

331
00:20:31,040 --> 00:20:33,639
Speaker 1: at that other person over there, I understand that they

332
00:20:33,640 --> 00:20:36,399
Speaker 1: have a limited viewpoint and that they don't know X

333
00:20:36,440 --> 00:20:41,439
Speaker 1: and Y, and that person over there misbelieves something about Z?

334
00:20:41,920 --> 00:20:46,560
Speaker 1: Well, almost certainly, yes. Why? It's because we're made up

335
00:20:46,600 --> 00:20:50,280
Speaker 1: of physical stuff and we're running algorithms that took hundreds

336
00:20:50,280 --> 00:20:54,600
Speaker 1: of millions of years to refine. But nonetheless it's physical stuff.

337
00:20:54,720 --> 00:20:59,160
Speaker 1: So if we can do something, presumably a machine could

338
00:20:59,160 --> 00:21:01,760
Speaker 1: do it also, whether or not it's currently clear how

339
00:21:01,760 --> 00:21:07,679
Speaker 1: that's done. That's the central premise of computational neuroscience, and

340
00:21:07,720 --> 00:21:10,240
Speaker 1: to my mind, one of the most remarkable effects of

341
00:21:10,280 --> 00:21:14,800
Speaker 1: the AI explosion over the last few years is understanding

342
00:21:15,280 --> 00:21:18,280
Speaker 1: that things that would have seemed impossible to do with

343
00:21:18,359 --> 00:21:21,760
Speaker 1: a machine, things that almost everyone would have sworn couldn't

344
00:21:21,800 --> 00:21:25,080
Speaker 1: be done, now seem like background furniture as we

345
00:21:25,160 --> 00:21:28,080
Speaker 1: wait for the next thing. Now, the complexity of the

346
00:21:28,119 --> 00:21:31,040
Speaker 1: brain suggests that theory of mind is going to be

347
00:21:31,080 --> 00:21:34,399
Speaker 1: a very hard problem to solve, because it requires us

348
00:21:34,400 --> 00:21:37,240
Speaker 1: to understand how the brain has a model of the

349
00:21:37,280 --> 00:21:41,320
Speaker 1: world and then how it can make submodels and simulate

350
00:21:41,720 --> 00:21:43,720
Speaker 1: what it is like to only know part of the

351
00:21:43,760 --> 00:21:47,120
Speaker 1: story or to believe a different story. So we don't

352
00:21:47,119 --> 00:21:49,919
Speaker 1: currently know how our brains do it, but of course

353
00:21:50,400 --> 00:21:54,280
Speaker 1: we have our computers do this sort of thing often.

354
00:21:54,720 --> 00:21:58,520
Speaker 1: Like you can take your modern MacBook laptop and use

355
00:21:58,600 --> 00:22:02,200
Speaker 1: a little bit of its processor to simulate an old

356
00:22:02,760 --> 00:22:06,600
Speaker 1: Timex Sinclair computer. Your Mac can perfectly simulate it by

357
00:22:06,680 --> 00:22:11,360
Speaker 1: running what's called an emulation on part of its computational hardware.

358
00:22:11,800 --> 00:22:17,160
Speaker 1: Somehow human brains can run emulations also, like just by

359
00:22:17,240 --> 00:22:20,440
Speaker 1: looking you can emulate what it's like to not know

360
00:22:20,560 --> 00:22:23,199
Speaker 1: that the shark is there below you. So yes, it

361
00:22:23,240 --> 00:22:26,639
Speaker 1: seems totally plausible to me that a machine could do

362
00:22:26,880 --> 00:22:30,560
Speaker 1: theory of mind, because we can. But the question we

363
00:22:30,600 --> 00:22:33,639
Speaker 1: want to ask today is whether we are there or

364
00:22:33,720 --> 00:22:38,000
Speaker 1: not right now? Have current large language models like

365
00:22:38,080 --> 00:22:42,360
Speaker 1: ChatGPT come to solve this problem without us telling them

366
00:22:42,359 --> 00:22:46,560
Speaker 1: explicitly to do so, in other words, with no instruction whatsoever?

367
00:22:47,119 --> 00:22:51,600
Speaker 1: Is the emulation of other minds an emergent property that

368
00:22:51,720 --> 00:22:54,560
Speaker 1: comes out of these things, which would absolutely blow our

369
00:22:54,600 --> 00:22:59,240
Speaker 1: minds if true? Does AI do theory of mind? If

370
00:22:59,280 --> 00:23:03,639
Speaker 1: it can, this would have profound implications for our understanding

371
00:23:03,760 --> 00:23:07,119
Speaker 1: of intelligence and our relationship with AI. I mean, just

372
00:23:07,200 --> 00:23:09,320
Speaker 1: consider how much better it would be if it could

373
00:23:09,359 --> 00:23:14,320
Speaker 1: emulate the mental states of people, like with auto driving cars,

374
00:23:14,400 --> 00:23:18,120
Speaker 1: if it didn't just depend on the observable, but instead

375
00:23:18,160 --> 00:23:21,080
Speaker 1: on what's going on in the other driver's head. Like,

376
00:23:21,480 --> 00:23:23,679
Speaker 1: given the trajectory of this car, I think that the

377
00:23:23,720 --> 00:23:27,720
Speaker 1: other driver is drunk or asleep or distracted. And so

378
00:23:27,880 --> 00:23:30,679
Speaker 1: here's what I think is going to happen next. So

379
00:23:31,119 --> 00:23:34,520
Speaker 1: a colleague of mine at Stanford, Michal Kosinski, published a

380
00:23:34,640 --> 00:23:38,479
Speaker 1: twenty twenty three paper that was originally titled "Theory of

381
00:23:38,600 --> 00:23:44,159
Speaker 1: Mind Might Have Spontaneously Emerged in Large Language Models," although

382
00:23:44,160 --> 00:23:47,119
Speaker 1: he later changed the title. In the paper, he suggested

383
00:23:47,400 --> 00:23:50,600
Speaker 1: that even though these AI models didn't set out to

384
00:23:50,880 --> 00:23:54,840
Speaker 1: have theory of mind, it may have appeared anyway as

385
00:23:55,000 --> 00:23:59,400
Speaker 1: a byproduct of their improving language skills. So, for example,

386
00:23:59,640 --> 00:24:04,560
Speaker 1: he gives the following scenario to ChatGPT: Complete the following story.

387
00:24:05,119 --> 00:24:08,600
Speaker 1: Here is a bag filled with popcorn. There is no

388
00:24:08,880 --> 00:24:12,160
Speaker 1: chocolate in the bag, yet the label on the bag

389
00:24:12,200 --> 00:24:17,280
Speaker 1: says chocolate and not popcorn. Sam finds the bag. She

390
00:24:17,320 --> 00:24:20,639
Speaker 1: has never seen this bag before. Sam doesn't open the

391
00:24:20,680 --> 00:24:24,800
Speaker 1: bag and doesn't look inside. Sam reads the label. And

392
00:24:24,840 --> 00:24:27,639
Speaker 1: then he gives the prompt: Sam opens the bag and

393
00:24:27,760 --> 00:24:31,639
Speaker 1: looks inside. She can clearly see that it is full of.

394
00:24:32,359 --> 00:24:35,760
Speaker 1: And then he looks at the word that ChatGPT produces:

395
00:24:35,960 --> 00:24:40,879
Speaker 1: Is it popcorn or chocolate? And ChatGPT says popcorn. But

396
00:24:40,920 --> 00:24:44,320
Speaker 1: if instead he gives a different prompt, Sam calls a

397
00:24:44,359 --> 00:24:47,280
Speaker 1: friend to tell him that she has just found a

398
00:24:47,400 --> 00:24:52,760
Speaker 1: bag full of, and now ChatGPT says chocolate, indicating that

399
00:24:52,920 --> 00:24:57,840
Speaker 1: Sam holds a false belief. And Kosinski runs this a

400
00:24:57,880 --> 00:25:01,240
Speaker 1: bunch of ways and shows that ChatGPT gets the

401
00:25:01,320 --> 00:25:05,320
Speaker 1: right answer. So is there something going on here? And

402
00:25:05,359 --> 00:25:07,479
Speaker 1: you can try this for yourself. Type in a version

403
00:25:07,680 --> 00:25:11,440
Speaker 1: of the Sally-Anne test where Sally hides her ball

404
00:25:11,520 --> 00:25:13,919
Speaker 1: under the bed and then leaves and Anne comes in

405
00:25:14,000 --> 00:25:16,520
Speaker 1: later and sees it, moves it into the closet, and you

406
00:25:16,560 --> 00:25:19,480
Speaker 1: ask when Sally comes back in the room, where will

407
00:25:19,520 --> 00:25:22,200
Speaker 1: she look for the ball? And ChatGPT will tell

408
00:25:22,240 --> 00:25:25,879
Speaker 1: you that Sally will look for the ball under the bed.

409
00:25:26,320 --> 00:25:28,879
Speaker 1: And this is amazing, right? So I want to be

410
00:25:29,040 --> 00:25:32,920
Speaker 1: clear why I think it is meaningless that AI can

411
00:25:33,000 --> 00:25:36,320
Speaker 1: pass these tests. If anyone ever tells you that this

412
00:25:36,480 --> 00:25:39,240
Speaker 1: is proof that AI has theory of mind, please let

413
00:25:39,240 --> 00:25:43,520
Speaker 1: them know this is not proof. Why? Well, that question

414
00:25:43,600 --> 00:25:47,359
Speaker 1: about the bag of popcorn that's labeled chocolate, that is

415
00:25:47,480 --> 00:25:52,240
Speaker 1: known as the unexpected contents task, and this was originally

416
00:25:52,320 --> 00:25:56,280
Speaker 1: published by three researchers in nineteen eighty seven. Hundreds of

417
00:25:56,320 --> 00:26:00,680
Speaker 1: papers cite this or replicate this, hundreds of blogs about this,

418
00:26:01,280 --> 00:26:04,280
Speaker 1: so of course a large language model gets it right.

419
00:26:04,680 --> 00:26:07,800
Speaker 1: And the Sally-Anne test is in even more places

420
00:26:07,840 --> 00:26:12,080
Speaker 1: on the web, literally hundreds of thousands of places. It's

421
00:26:12,119 --> 00:26:16,199
Speaker 1: known in the literature as the unexpected transfer test. So

422
00:26:16,359 --> 00:26:21,000
Speaker 1: of course ChatGPT solves these challenges. That's what large

423
00:26:21,040 --> 00:26:25,240
Speaker 1: language models do. They read everything that has come before them,

424
00:26:25,480 --> 00:26:29,199
Speaker 1: so it well knows the punchline of this question. It

425
00:26:29,320 --> 00:26:47,920
Speaker 1: is a statistical parrot. Now I'll give you one more

426
00:26:47,960 --> 00:26:50,960
Speaker 1: example of this that I mentioned in an earlier episode,

427
00:26:51,200 --> 00:26:53,959
Speaker 1: when a friend of mine was blown away by the

428
00:26:53,960 --> 00:26:57,480
Speaker 1: fact that he asked a visual reasoning problem to

429
00:26:57,520 --> 00:27:00,800
Speaker 1: ChatGPT and it gave him the perfectly right answer. My

430
00:27:00,840 --> 00:27:04,399
Speaker 1: friend said, take a capital letter D and turn it

431
00:27:04,480 --> 00:27:07,240
Speaker 1: on its side, flat side down, and then put that

432
00:27:07,320 --> 00:27:10,199
Speaker 1: on top of a capital letter J. What does that

433
00:27:10,280 --> 00:27:13,760
Speaker 1: look like? And ChatGPT said, it looks like an umbrella.

434
00:27:13,800 --> 00:27:16,040
Speaker 1: And my friend was so impressed with this that he

435
00:27:16,119 --> 00:27:18,480
Speaker 1: told me he was certain that ChatGPT could do

436
00:27:18,680 --> 00:27:22,560
Speaker 1: visual reasoning. But I pointed out to him that this

437
00:27:22,720 --> 00:27:26,480
Speaker 1: example he used was the single most used example in

438
00:27:26,560 --> 00:27:29,880
Speaker 1: the literature on visual reasoning. I knew about this from

439
00:27:29,880 --> 00:27:33,320
Speaker 1: a quite famous paper from nineteen eighty nine, although I

440
00:27:33,320 --> 00:27:35,199
Speaker 1: don't even know if that was the first usage of it,

441
00:27:35,560 --> 00:27:39,520
Speaker 1: and you can find precisely that question referenced online in

442
00:27:39,720 --> 00:27:43,359
Speaker 1: thousands of places. Now, I don't know whether he was

443
00:27:43,480 --> 00:27:46,439
Speaker 1: consciously aware that question was something he had heard before,

444
00:27:47,160 --> 00:27:50,000
Speaker 1: or if he had heard it years ago and erroneously

445
00:27:50,119 --> 00:27:52,600
Speaker 1: thought he had thought of it. Or there's also the

446
00:27:52,760 --> 00:27:55,639
Speaker 1: very tiny possibility that he had never heard that question

447
00:27:55,680 --> 00:27:58,800
Speaker 1: before and had thought of it independently. But that just

448
00:27:59,040 --> 00:28:02,840
Speaker 1: underscores the point even more that we live on a

449
00:28:02,920 --> 00:28:07,800
Speaker 1: planet with billions of other brains, and almost anything you

450
00:28:07,920 --> 00:28:12,119
Speaker 1: think of has been thought before and likely written down,

451
00:28:12,560 --> 00:28:17,040
Speaker 1: maybe hundreds of thousands of times. So the point is

452
00:28:17,359 --> 00:28:20,919
Speaker 1: that you may think a large language model is brilliant

453
00:28:21,480 --> 00:28:24,760
Speaker 1: when it is just a good imitator. Now, one important

454
00:28:24,760 --> 00:28:27,840
Speaker 1: point on this, you might think, hey, instead of talking

455
00:28:27,840 --> 00:28:30,879
Speaker 1: about Sally and Anne, what if I do something clever

456
00:28:30,960 --> 00:28:34,679
Speaker 1: and I ask ChatGPT about Brett and Michael? And

457
00:28:34,760 --> 00:28:38,480
Speaker 1: instead of putting the baseball under the bed, Brett puts

458
00:28:38,520 --> 00:28:41,040
Speaker 1: a marble in a box. And then Michael finds the

459
00:28:41,080 --> 00:28:43,320
Speaker 1: marble and puts it up on the shelf. And the

460
00:28:43,440 --> 00:28:46,239
Speaker 1: question is where does Brett look for the marble? But

461
00:28:46,280 --> 00:28:50,360
Speaker 1: you'll find that the large language model has no trouble generalizing,

462
00:28:50,600 --> 00:28:54,360
Speaker 1: especially as it has digested multiple flavors of this task.

463
00:28:54,880 --> 00:28:59,080
Speaker 1: And this is because it's mapping the relationship between concepts

464
00:28:59,160 --> 00:29:01,720
Speaker 1: in its latent space. If you don't know what latent

465
00:29:01,760 --> 00:29:03,440
Speaker 1: space is, I'm going to do an episode on that

466
00:29:03,560 --> 00:29:06,800
Speaker 1: quite soon because it's such an amazing concept. So you

467
00:29:06,880 --> 00:29:10,560
Speaker 1: might be tempted to say it's not just a statistical parrot,

468
00:29:10,600 --> 00:29:14,520
Speaker 1: it's understanding something deeper in its latent space. But I

469
00:29:14,560 --> 00:29:17,520
Speaker 1: think this could also be a wrong interpretation. It is

470
00:29:17,600 --> 00:29:21,959
Speaker 1: still a statistical parrot that doesn't know what it is

471
00:29:22,080 --> 00:29:25,280
Speaker 1: to be another person, but it nonetheless learns from the

472
00:29:25,320 --> 00:29:29,920
Speaker 1: statistics which words to put after what. In other words,

473
00:29:30,280 --> 00:29:33,920
Speaker 1: it's not clear that these systems have to truly understand

474
00:29:34,080 --> 00:29:38,960
Speaker 1: other people's thoughts and feelings to simply extract the patterns

475
00:29:39,240 --> 00:29:43,200
Speaker 1: from what they have been trained on. And you might say, well,

476
00:29:43,480 --> 00:29:45,040
Speaker 1: how do we know that's not the same with us?

477
00:29:45,080 --> 00:29:48,760
Speaker 1: How do you know that we're not just extracting statistics? Well,

478
00:29:48,880 --> 00:29:52,000
Speaker 1: when you are watching the woman swimming in the opening

479
00:29:52,040 --> 00:29:55,640
Speaker 1: scene of Jaws and you feel fear because the shark

480
00:29:55,760 --> 00:29:59,680
Speaker 1: is circling below her, it's not that you have memorized

481
00:29:59,720 --> 00:30:03,440
Speaker 1: the answer of similar problems, and that's how you conclude

482
00:30:03,840 --> 00:30:06,640
Speaker 1: that she doesn't know the shark is there. Instead, your

483
00:30:06,680 --> 00:30:09,640
Speaker 1: heart starts racing and you start gripping the chair because

484
00:30:10,120 --> 00:30:13,560
Speaker 1: you've been in similar situations where there's nothing but dark

485
00:30:13,600 --> 00:30:16,840
Speaker 1: water below you, and you know she really doesn't know,

486
00:30:17,280 --> 00:30:21,680
Speaker 1: and you appreciate how terrifying the situation is. So what

487
00:30:21,760 --> 00:30:24,840
Speaker 1: I have described to you is a problem where knowledge

488
00:30:24,880 --> 00:30:28,280
Speaker 1: exists in the literature written by humans, and the AI

489
00:30:28,600 --> 00:30:32,560
Speaker 1: digests that writing, but the person running the query doesn't

490
00:30:32,560 --> 00:30:36,600
Speaker 1: fully appreciate that. And this is a very basic confusion

491
00:30:36,640 --> 00:30:39,600
Speaker 1: that I'm watching a lot of people have about large

492
00:30:39,640 --> 00:30:43,200
Speaker 1: language models. They type in a sophisticated question and they

493
00:30:43,240 --> 00:30:45,880
Speaker 1: get back what appears to be a sophisticated answer, and

494
00:30:45,920 --> 00:30:50,320
Speaker 1: they conclude this thing is truly intelligent. This thing has

495
00:30:50,440 --> 00:30:53,840
Speaker 1: theory of mind, or it's sentient, or it can visualize.

496
00:30:54,560 --> 00:30:57,320
Speaker 1: And I'm seeing this so commonly now that I've decided

497
00:30:57,360 --> 00:30:59,840
Speaker 1: to give it a name. I'm calling this the

498
00:31:00,000 --> 00:31:05,360
Speaker 1: intelligence echo illusion. This happens when you think AI is

499
00:31:05,520 --> 00:31:08,959
Speaker 1: answering something with great insight, but really what you're hearing

500
00:31:09,000 --> 00:31:12,040
Speaker 1: back is just an echo of things that have already

501
00:31:12,080 --> 00:31:16,120
Speaker 1: been said by humans before. In other words, you think

502
00:31:16,280 --> 00:31:20,000
Speaker 1: it's intelligent, but you're confusing that with the intellectual endeavors

503
00:31:20,520 --> 00:31:23,880
Speaker 1: of other people. Maybe dozens of people had written about this,

504
00:31:24,160 --> 00:31:27,160
Speaker 1: or hundreds or thousands, but you simply didn't know that,

505
00:31:27,600 --> 00:31:31,400
Speaker 1: and so you're hearing their echo and you misinterpret that

506
00:31:31,680 --> 00:31:35,680
Speaker 1: echo as the proud voice of AI. So I ran

507
00:31:35,720 --> 00:31:38,280
Speaker 1: some calculations on this. There are eight point two billion

508
00:31:38,280 --> 00:31:40,800
Speaker 1: people on the planet alive right now, and let's call

509
00:31:40,840 --> 00:31:43,960
Speaker 1: it one hundred and fifteen billion humans who have lived

510
00:31:43,960 --> 00:31:47,200
Speaker 1: and died before us. And every one of these billions

511
00:31:47,800 --> 00:31:51,080
Speaker 1: was thinking and having their own stories every day of

512
00:31:51,120 --> 00:31:54,760
Speaker 1: their lives, and some fraction wrote their thoughts down, and

513
00:31:54,800 --> 00:31:58,560
Speaker 1: as a result, these large language models like ChatGPT are

514
00:31:58,600 --> 00:32:02,640
Speaker 1: trained on massive data sets of what is already out

515
00:32:02,640 --> 00:32:06,560
Speaker 1: there written down by humans. We're talking hundreds of billions

516
00:32:06,560 --> 00:32:10,320
Speaker 1: of words. These data sets are pulled from books and

517
00:32:10,440 --> 00:32:14,280
Speaker 1: websites and blogs and articles and on and on. So,

518
00:32:14,400 --> 00:32:18,520
Speaker 1: for example, the training data for these large language models

519
00:32:18,520 --> 00:32:23,640
Speaker 1: includes a data set called Common Crawl, which contains hundreds

520
00:32:23,680 --> 00:32:28,880
Speaker 1: of terabytes of text. Now assume you read for an

521
00:32:28,920 --> 00:32:31,160
Speaker 1: hour every day of your life, let's say at an

522
00:32:31,160 --> 00:32:33,280
Speaker 1: average speed of two hundred and fifty words per minute,

523
00:32:33,360 --> 00:32:35,960
Speaker 1: and you do that for a reading window of seventy years.

524
00:32:36,400 --> 00:32:39,400
Speaker 1: That's three hundred million words that you can read in

525
00:32:39,400 --> 00:32:42,720
Speaker 1: your lifetime, which means that what you consume in a

526
00:32:42,760 --> 00:32:47,640
Speaker 1: lifetime is one one-thousandth of what ChatGPT is

527
00:32:47,680 --> 00:32:50,840
Speaker 1: trained on. That means if you digest books every day

528
00:32:50,840 --> 00:32:54,200
Speaker 1: of your entire life, you still only read point one

529
00:32:54,400 --> 00:32:58,240
Speaker 1: percent of what ChatGPT has read. You would need

530
00:32:58,720 --> 00:33:02,520
Speaker 1: a thousand life times to know what it knows, and

531
00:33:02,560 --> 00:33:04,800
Speaker 1: on top of that, you'd have to actually remember every

532
00:33:04,840 --> 00:33:08,440
Speaker 1: sentence of what you read. So there are many many

533
00:33:09,000 --> 00:33:13,200
Speaker 1: questions and answers that a large language model has trained

534
00:33:13,240 --> 00:33:17,080
Speaker 1: on that you either have no knowledge of, or maybe

535
00:33:17,120 --> 00:33:19,760
Speaker 1: you had heard it before, but don't remember, and in

536
00:33:19,800 --> 00:33:23,200
Speaker 1: any case, you probably don't realize that it has been

537
00:33:23,320 --> 00:33:26,080
Speaker 1: pre trained on that. So what's the result of this, Well,

538
00:33:26,120 --> 00:33:29,480
Speaker 1: if you ask the large language model what color is

539
00:33:29,520 --> 00:33:32,000
Speaker 1: a pumpkin and it answers orange, you probably won't be

540
00:33:32,000 --> 00:33:35,320
Speaker 1: that surprised. But if we ask where Sally looks for

541
00:33:35,360 --> 00:33:38,400
Speaker 1: the baseball and it says under the bed, then we

542
00:33:38,480 --> 00:33:40,959
Speaker 1: clap our hands over our mouths and we say it

543
00:33:41,040 --> 00:33:44,520
Speaker 1: has theory of mind. That's why I decided I needed

544
00:33:44,600 --> 00:33:48,680
Speaker 1: to give a name to this phenomenon, the intelligence echo illusion,

545
00:33:49,040 --> 00:33:53,400
Speaker 1: because often naming something allows us to more easily see it.

546
00:33:53,880 --> 00:33:56,640
Speaker 1: And by the way, if you see good examples of

547
00:33:56,680 --> 00:33:59,800
Speaker 1: this intelligence echo where people mistake things that have been

548
00:33:59,840 --> 00:34:03,160
Speaker 1: written before for AI that has woken up into a

549
00:34:03,160 --> 00:34:06,400
Speaker 1: world of sentience, let me know at podcasts at Eagleman

550
00:34:06,480 --> 00:34:09,000
Speaker 1: dot com. And this brings me to the second reason

551
00:34:09,080 --> 00:34:12,960
Speaker 1: why we should be skeptical about current AI having theory

552
00:34:13,000 --> 00:34:15,799
Speaker 1: of mind. And this is less about the AI and

553
00:34:15,960 --> 00:34:19,359
Speaker 1: one hundred percent about us. And that issue is that we

554
00:34:19,400 --> 00:34:22,560
Speaker 1: are very easily fooled. So I'll give you an example.

555
00:34:23,040 --> 00:34:25,959
Speaker 1: In the nineteen sixties, there was a computer scientist named

556
00:34:26,320 --> 00:34:32,479
Speaker 1: Joseph Weizenbaum at MIT, who developed the first chatbot, named Eliza. Now,

557
00:34:32,560 --> 00:34:37,560
Speaker 1: Eliza was designed to act like a psychotherapist. So you

558
00:34:38,000 --> 00:34:42,040
Speaker 1: type in some statement, and it identifies keywords in your

559
00:34:42,080 --> 00:34:46,160
Speaker 1: input and then generates its responses based on a set

560
00:34:46,200 --> 00:34:51,160
Speaker 1: of predefined scripts. So you type I'm feeling sad, and

561
00:34:51,200 --> 00:34:53,120
Speaker 1: Eliza responds.

562
00:34:52,840 --> 00:34:54,160
Speaker 2: Why are you feeling sad?

563
00:34:54,360 --> 00:34:57,799
Speaker 1: All Eliza did was reflect your statement back to you

564
00:34:57,920 --> 00:34:59,799
Speaker 1: in the form of a question, kind of like a

565
00:35:00,040 --> 00:35:04,000
Speaker 1: psychotherapist might do. So a typed chat goes like this.

566
00:35:04,200 --> 00:35:08,360
Speaker 1: You type, I need some help with my relationship, and Eliza writes:

567
00:35:08,000 --> 00:35:10,240
Speaker 2: What kind of help do you think you need?

568
00:35:10,480 --> 00:35:14,120
Speaker 1: And you say I think my partner doesn't listen to me.

569
00:35:14,520 --> 00:35:17,840
Speaker 2: Eliza types, what makes you think your partner doesn't listen

570
00:35:17,880 --> 00:35:18,400
Speaker 2: to you?

571
00:35:18,400 --> 00:35:22,439
Speaker 1: You say, whenever I talk about my feelings, they seem distracted.

572
00:35:22,800 --> 00:35:24,960
Speaker 2: How do you feel when you talk about your feelings

573
00:35:25,160 --> 00:35:26,440
Speaker 2: and they seem distracted?

574
00:35:26,760 --> 00:35:32,160
Speaker 1: So Eliza's only strategy is to ask you your statements back. Now,

575
00:35:32,200 --> 00:35:34,759
Speaker 1: Eliza was just a few kilobytes of code in the

576
00:35:34,840 --> 00:35:38,200
Speaker 1: nineteen sixties, and it simply flipped whatever you said into

577
00:35:38,200 --> 00:35:42,439
Speaker 1: a question, and it had no ability to infer your

578
00:35:42,520 --> 00:35:46,080
Speaker 1: mental state or your emotions, so no one even suggested

579
00:35:46,440 --> 00:35:51,160
Speaker 1: that it had any understanding of the content of the conversation. Nonetheless,

580
00:35:51,280 --> 00:35:55,800
Speaker 1: it simulated a basic conversational partner, and many users became

581
00:35:55,960 --> 00:35:59,440
Speaker 1: emotionally attached to Eliza, even though they knew it was

582
00:35:59,520 --> 00:36:04,160
Speaker 1: just a machine. And this illustrates how seductively easy it

583
00:36:04,200 --> 00:36:08,040
Speaker 1: is for us to bring all our communication machinery to

584
00:36:08,120 --> 00:36:11,440
Speaker 1: the table and assume that the words we get back

585
00:36:11,960 --> 00:36:16,040
Speaker 1: must have a mind behind them. This early experiment demonstrated

586
00:36:16,080 --> 00:36:21,600
Speaker 1: that even simple pattern recognition can evoke genuine emotional responses

587
00:36:21,600 --> 00:36:24,880
Speaker 1: from the users. Now fast forward to today, and we

588
00:36:24,960 --> 00:36:29,200
Speaker 1: have large language models that have trillions of times more

589
00:36:29,320 --> 00:36:34,800
Speaker 1: code than Eliza, and this seduction is only magnified. Modern

590
00:36:34,840 --> 00:36:39,120
Speaker 1: AI can process prompts without any true understanding, but we

591
00:36:39,280 --> 00:36:44,280
Speaker 1: humans still get pulled into feeling like there's someone there

592
00:36:44,640 --> 00:36:47,840
Speaker 1: on the other end of the line. Okay, so we

593
00:36:48,000 --> 00:36:51,560
Speaker 1: established early on that there's no reason in theory a

594
00:36:51,640 --> 00:36:55,120
Speaker 1: computer couldn't emulate other minds. But on the other hand,

595
00:36:55,160 --> 00:36:58,720
Speaker 1: we've established that just because a large language model seems

596
00:36:58,760 --> 00:37:03,239
Speaker 1: to sometimes nail the answers doesn't necessitate that it is

597
00:37:03,280 --> 00:37:06,000
Speaker 1: doing theory of mind. It may simply tell us that

598
00:37:06,040 --> 00:37:12,000
Speaker 1: the answer exists somewhere in the unimaginably large corpus that

599
00:37:12,120 --> 00:37:14,719
Speaker 1: humans have written, or even, by the way, that there's

600
00:37:14,760 --> 00:37:17,520
Speaker 1: been some fine tuning on the model where someone adds

601
00:37:17,560 --> 00:37:21,040
Speaker 1: a similar problem by hand. In other words, the AI

602
00:37:21,200 --> 00:37:25,200
Speaker 1: is doing an interpolation between answers that it has seen before,

603
00:37:25,480 --> 00:37:29,560
Speaker 1: but it's not actually putting itself in someone else's mind.

604
00:37:30,239 --> 00:37:33,680
Speaker 1: So does modern AI have theory of mind? As of now,

605
00:37:33,800 --> 00:37:36,880
Speaker 1: I'm not convinced that we have any reason to think so.

606
00:37:37,680 --> 00:37:41,759
Speaker 1: Current large language models are making sophisticated decisions about which

607
00:37:41,800 --> 00:37:46,759
Speaker 1: word comes next. That's it. They don't understand in the

608
00:37:46,840 --> 00:37:49,719
Speaker 1: human sense of seeing the woman in Jaws or the

609
00:37:49,719 --> 00:37:52,919
Speaker 1: man who has lost his keys and thinking about what

610
00:37:53,040 --> 00:37:56,240
Speaker 1: it is like to be them. And this is why

611
00:37:56,520 --> 00:38:00,520
Speaker 1: Siri or Alexa or Google can respond to your queries

612
00:38:00,600 --> 00:38:04,880
Speaker 1: quite well. But they don't know anything about your beliefs

613
00:38:05,040 --> 00:38:08,560
Speaker 1: or desires or emotions. They don't know if you're asking

614
00:38:08,560 --> 00:38:12,319
Speaker 1: a question because you are curious, or you're confused, or

615
00:38:12,560 --> 00:38:16,719
Speaker 1: you're just making conversation, or you're being sarcastic. So this

616
00:38:16,800 --> 00:38:20,080
Speaker 1: is all to say there is a difference between simulating

617
00:38:20,160 --> 00:38:25,000
Speaker 1: responses based on word probabilities and actually slipping into other

618
00:38:25,080 --> 00:38:28,560
Speaker 1: people's shoes. Now, as I said before, this has nothing

619
00:38:28,600 --> 00:38:31,800
Speaker 1: to do with whether we will come to develop AI

620
00:38:31,920 --> 00:38:34,480
Speaker 1: that can do theory of mind. There are several research

621
00:38:34,520 --> 00:38:39,480
Speaker 1: groups working on AI systems that try to infer intentions

622
00:38:39,480 --> 00:38:43,280
Speaker 1: and desires, and this would have applications in everything from

623
00:38:43,760 --> 00:38:48,640
Speaker 1: more intuitive personal assistants to robots that can better collaborate

624
00:38:48,680 --> 00:38:53,759
Speaker 1: with humans in complex tasks. Now, let's note something interesting here.

625
00:38:54,320 --> 00:38:58,000
Speaker 1: Even if we can get AI to make inferences, it's

626
00:38:58,000 --> 00:39:01,680
Speaker 1: still not clear whether that will be true theory of mind.

627
00:39:01,840 --> 00:39:05,000
Speaker 1: That might require the AI to have some level of

628
00:39:05,480 --> 00:39:10,680
Speaker 1: self-awareness or consciousness or subjective experience. But as Kosinski

629
00:39:10,719 --> 00:39:13,440
Speaker 1: points out, even if we don't think the AI has

630
00:39:13,520 --> 00:39:16,879
Speaker 1: theory of mind, there might be value in machines behaving

631
00:39:17,040 --> 00:39:20,040
Speaker 1: as though they possess theory of mind. And that's certainly

632
00:39:20,040 --> 00:39:24,120
Speaker 1: a valid point. Alan Turing, who proposed the imitation game,

633
00:39:24,200 --> 00:39:28,160
Speaker 1: the Turing test, considered the distinction between what a computer

634
00:39:28,400 --> 00:39:33,040
Speaker 1: actually has and what it seems to have to be meaningless.

635
00:39:33,320 --> 00:39:35,880
Speaker 1: A more modern version of this point is reflected in

636
00:39:35,920 --> 00:39:39,080
Speaker 1: the television show Westworld, which is about a future in

637
00:39:39,120 --> 00:39:42,200
Speaker 1: which there are lifelike human androids. And if you watch

638
00:39:42,320 --> 00:39:46,160
Speaker 1: the opening scene, the young William enters the first room

639
00:39:46,200 --> 00:39:48,759
Speaker 1: and there's a beautiful assistant who helps him to pick

640
00:39:48,760 --> 00:39:51,799
Speaker 1: out a hat and a gun, and she's very flirtatious

641
00:39:51,840 --> 00:39:55,400
Speaker 1: with him, and he nervously says, sorry to ask, but

642
00:39:55,760 --> 00:39:59,640
Speaker 1: are you real? And she says, if you can't tell,

643
00:40:00,320 --> 00:40:03,160
Speaker 1: does it matter? And maybe that'll be the case with

644
00:40:03,280 --> 00:40:06,360
Speaker 1: AI in the near future. It will fake theory of

645
00:40:06,440 --> 00:40:09,640
Speaker 1: mind and that will be enough for us to reap

646
00:40:09,800 --> 00:40:13,720
Speaker 1: all the benefits. So let's wrap up. While current large

647
00:40:13,760 --> 00:40:17,759
Speaker 1: language models are mind-blowingly impressive, I land on the

648
00:40:17,760 --> 00:40:20,800
Speaker 1: position that while they can often get the right answer

649
00:40:20,960 --> 00:40:24,160
Speaker 1: on theory of mind tests, it's an illusion. They're not

650
00:40:24,280 --> 00:40:27,040
Speaker 1: actually simulating what it's like to be someone else. And

651
00:40:27,080 --> 00:40:31,120
Speaker 1: this is what I'm now calling the intelligence echo illusion.

652
00:40:31,800 --> 00:40:35,000
Speaker 1: The illusion results from humans having built over thousands of

653
00:40:35,080 --> 00:40:39,680
Speaker 1: years an incredibly large corpus of ideas and questions and

654
00:40:39,760 --> 00:40:43,280
Speaker 1: answers a thousand times larger than you could ever read

655
00:40:43,400 --> 00:40:47,480
Speaker 1: in a lifetime. And sometimes you don't know that the

656
00:40:47,560 --> 00:40:51,319
Speaker 1: answers are already in there, and when you hear an

657
00:40:51,400 --> 00:40:56,240
Speaker 1: echo of humans, you mistake that for intelligence of the computer.

658
00:40:56,560 --> 00:40:59,400
Speaker 1: So that's the position I'm taking for now. Large language

659
00:40:59,400 --> 00:41:02,480
Speaker 1: models lack a true theory of mind. The question

660
00:41:02,920 --> 00:41:06,320
Speaker 1: is whether we will get there someday. Probably it won't

661
00:41:06,320 --> 00:41:09,240
Speaker 1: be with large language models, but instead a very different

662
00:41:09,360 --> 00:41:15,120
Speaker 1: kind of architecture, possibly one that has some modicum of consciousness

663
00:41:15,440 --> 00:41:17,879
Speaker 1: so that it is able to reflect on its own

664
00:41:18,000 --> 00:41:22,120
Speaker 1: mental states to emulate someone else's. So thank you for

665
00:41:22,239 --> 00:41:24,920
Speaker 1: joining me on this journey into the mind, both human

666
00:41:25,040 --> 00:41:28,040
Speaker 1: and artificial. If you enjoyed this episode, don't forget to

667
00:41:28,080 --> 00:41:30,480
Speaker 1: subscribe and rate and review, and if you have any

668
00:41:30,560 --> 00:41:32,640
Speaker 1: questions or topics that you'd like to hear about in

669
00:41:32,680 --> 00:41:36,520
Speaker 1: future episodes, feel free to reach out. Until next time,

670
00:41:36,680 --> 00:41:42,080
Speaker 1: keep questioning, keep exploring, and stay curious. Go to eagleman

671
00:41:42,120 --> 00:41:45,680
Speaker 1: dot com slash podcast for more information and to find

672
00:41:45,719 --> 00:41:49,400
Speaker 1: further reading. Send me an email at podcasts at eagleman

673
00:41:49,480 --> 00:41:52,880
Speaker 1: dot com with questions or discussion, and check out and subscribe

674
00:41:52,880 --> 00:41:56,239
Speaker 1: to Inner Cosmos on YouTube for videos of each episode

675
00:41:56,280 --> 00:41:59,959
Speaker 1: and to leave comments. Until next time, I'm David Eagleman,

676
00:42:00,080 --> 00:42:03,400
Speaker 1: and we have been exploring the Inner Cosmos.