Speaker 1: Welcome to TechStuff, a production from iHeartRadio.

Speaker 1: Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and I love all things tech. And you know what, I was gonna make today's episode a one-parter, but it turns out there's just way too much stuff, not just about the topic at hand but about the various components that make up this topic, for me to do it in one. So this is gonna likely be a two-parter. But today I thought we could look back at the development and evolution of a famous AI personality. This virtual assistant celebrated an anniversary recently, and I must apologize for being a couple of days late with this, but this particular servant debuted on October fourth, two thousand eleven, technically for the second time, but the history of the actual technology dates back much further. And of course, I'm talking about Siri, Apple's virtual assistant that can interpret voice commands and return results based on them.

Speaker 1: This is not just some dull history lesson, however. Siri really has an incredible backstory, ranging from a science fiction vision of the future to a secret project intended to augment the decision making capabilities of the United States military. Yeah, Siri had a pretty tough background. The story of Siri is complicated, and not just because of the internal history of developing the technology, but also because the tool relies on a lot of converging technological trends. There are elements of voice recognition, speech to text, natural language interpretation, and other technologies that fall under the very broad umbrella of artificial intelligence. So get settled, it's time to talk about Siri. Also, if you're listening to this near Apple devices, I apologize, because there's a good chance those devices might start talking back at me. But I refuse to do an episode where I just refer to the subject as you know who.
Speaker 1: You could argue that the origins of Siri can be found in a promotional video that Apple produced back in nineteen eighty-seven to show off a concept of an artificially intelligent smart assistant. Now that alone is interesting, but what really is amazing is that the arbitrary date they chose as the setting for this video was two thousand eleven, probably September. We know that because there is a part within the video where a character asks for information that had been published five years previously, and the published information had a publication date of two thousand six. Now this means that the actual debut of Siri as an Apple product was just one month after the fictional events in that video from nineteen eighty-seven. That's just a coincidence, but it's a cool one.

Speaker 1: The Knowledge Navigator video shows a man walking into a study, a really nice one, and unfolding a tablet style computer device. Then he walks away to stare at stuff as a virtual assistant reads off his messages and the meetings on his calendar. The virtual assistant appears as a video in a little window on the screen of the tablet, and it's, you know, shot from the shoulders up, kind of the bust of a young man, and the video takes up that one little corner of the tablet device. So in this visualization, the virtual assistant isn't just a disembodied voice. It also has a face. Also, everyone in this video is extremely white, which I guess is kind of a given for the time period and the people involved, but it just comes across as so white. I mean, we're doing this with the benefit of the glasses of twenty twenty-one. I just wanted to throw that out there. Anyway, the video goes on to have the real life man, who is a professor in this video, ask his virtual assistant to pull up lecture notes and unread articles that relate back to the lecture. He's asking for the lecture notes of a lecture he gave a year ago.
Speaker 1: He's giving essentially the same lecture now, but he wants to update it with the latest information, and he even asks the virtual assistant to summarize those unread articles that had been published in the year since his last lecture. The virtual assistant is thus aggregating information, analyzing that information for context, and then delivering summaries, which is a pretty sophisticated set of artificially intelligent tasks. The professor also uses the device and the virtual assistant to call and collaborate with a peer in real time. Now, this was not the only video that Apple would produce to showcase this kind of general idea; however, it is arguably the most famous of those videos.

Speaker 1: Now, as I said, Knowledge Navigator came out of Apple, and Steve Jobs would later play a pivotal role in how the company would introduce Siri, but this was not a Steve Jobs project, because Jobs had been ousted from Apple, or he had quit in disgust, depending upon which version of the story you're listening to. Anyway, he had left a couple of years before this video was produced. The Knowledge Navigator was something that Apple CEO John Sculley had described in a book titled Odyssey. Now, of course, in science fiction stories we have no shortage of instances where a human is interacting with a computer or some otherwise artificially intelligent device like a robot, but the Knowledge Navigator seemed to lay down the foundations for future products like Siri and the iPad, not to mention the potential uses of the Internet, which in nineteen eighty-seven was definitely a thing. It existed, but most of the mainstream public remained unaware of it, because the World Wide Web wouldn't even come along for another few years. However, while you can look at this video and say, ah, this must be where Apple got the idea, and they probably got to work right away on Siri, well, you'd be wrong, because the early work, in fact the vast bulk of the work to bring Siri to life, didn't start at Apple at all.
Speaker 1: It didn't involve the company. So our story now turns to a very different organization, the Defense Advanced Research Projects Agency, better known as DARPA. Now this is part of the United States Department of Defense. Back in nineteen fifty eight, the then President of the United States, Dwight D. Eisenhower, authorized the foundation of this agency, though at the time it was called the Advanced Research Projects Agency, or ARPA. Defense would be added later. This agency would play a critical role in the evolution of technologies in the United States, and the mission of DARPA, and ARPA before it, is, quote, to make pivotal investments in breakthrough technologies for national security, end quote, and that wording is really precise. It's easy to imagine DARPA as being housed in some enormous underground bunker filled with scientists who are building out crazy devices like robo scorpions or a blender that can also teleport or something. But in reality, DARPA is more about funding research than conducting research. Now, don't get me wrong, the agency relies heavily on experts to evaluate proposals and consider to whom the agency should send money. But the purpose of DARPA is to enable others to do important work.

Speaker 1: DARPA has played a huge role in countless technological breakthroughs this way. Much of the technology that would go on to power the Internet started with ARPANET, a kind of precursor network to the Internet and one that was funded by ARPA, thus the name. The DARPA Grand Challenge helped get self-driving cars into gear, you know, pun intended. They also created difficult scenarios for humanoid robots to go through. That was a few years ago and was really cool. The competitions DARPA hosts have specific goals and metrics, and that guides the designers and engineers who are working on them as they build out technologies. It's good to define your goal. It really gives you focus when you're trying to develop the technology to meet that goal.
Speaker 1: Winning a challenge is a big deal, though the cash prize may not even cover the amount of money participants have spent on the development of those technologies, and there are entire businesses, or at least divisions within businesses, that can be born out of these challenges. The Grand Challenges are just one way DARPA encourages technological development. Often the agency will create a specific goal, such as the design of a robotic exoskeleton that can help, you know, US soldiers carry heavy loads while they are on foot for longer distances, and then they'll send out an RFP, which is a request for proposal. The agency considers the proposals that it receives from this RFP and then decides which, if any, it will accept and then fund. Then, after a given amount of time (you know, it's dependent upon the specific project), we find out if anything comes out of it. Sometimes nothing does, as some technological problems may prove more challenging than others and may require more time for the various technologies to evolve and make it possible. So it might push the field, but you might not have a finished product at the end of it. Other times you do get a finished product.

Speaker 1: Anyway, in two thousand three, a decade and a half after the Knowledge Navigator video came out of Apple, DARPA identified a new opportunity, and this was one that was born out of necessity. The challenge was that we have access to way more information today than we did in the past. So decades ago, military commanders had to make decisions based on limited information. They'd rely a great deal on their own expertise and experience in order to make up for the fact that they only had part of the picture. And while a great commander has a better chance of making the right call than an inexperienced commander would, the limited amount of information could still contribute to disaster.
Speaker 1: You might be the greatest commander of all time, but if you're lacking a key piece of information, you might make a decision that is terrible. So flash forward to two thousand three, and now the story had kind of flip-flopped. Now military commanders would receive more information than they could reasonably handle. The challenge wasn't to use intuition to make up for blind spots, but rather, how do you synthesize all this information so that you can make the right decision? Too much information was proving to be about as big a problem as too little information, at least in some cases, and so DARPA wished to fund the development of a smart system that could help commanders make sense of all the data coming in from day to day. Now, DARPA projects tend to be labyrinthine, with lots of bits and pieces, and a lot of different companies, research labs, and other organizations might tackle all or part of one of these projects. The cognitive computing section of DARPA had a program called Personalized Assistant that Learns, or PAL, which seems nice. It was this part of the program that would fund the development of a virtual cognitive assistant. The amount of funding was twenty two million dollars. What a great PAL.

Speaker 1: The organization that landed this deal was SRI International, itself an incredibly influential organization. It's a nonprofit scientific research institution. Originally it was called the Stanford Research Institute, because it was established by the trustees of Stanford University back in nineteen forty six, though the organization would separate from the university formally in the nineteen seventies and become a standalone, nonprofit scientific research lab. The organization has played a role in advancing materials science, developing liquid crystal displays, or LCDs, creating telesurgery implementations, and more. And now it was going to tackle DARPA's request for a cognitive computer assistant.
Speaker 1: SRI International created a project called the Cognitive Assistant that Learns and Organizes, or CALO. And this appears to be another case where they landed upon the acronym first and then worked backward, as CALO seems to come from the Latin word calonis, which means soldier's servant, and I probably mispronounced that, because even though I was a medievalist, it's almost criminal that I never took Latin. The concept, however, hearkens back to some of what we would see in that Knowledge Navigator video: a system that would be able to receive and interpret information, presumably from multiple sources, and provide a meaningful presentation, or even interpretation, of that data to humans, which is a pretty tall order.

Speaker 1: Let's break down a bit of what an assistant would need to do in order to accomplish this. We'll leave the voice activation part aside for now, as that would not be absolutely critical to make this work. You know, you might have a system that gives daily briefings on its own, or you might have one that you activate through text commands or some other user interface. It wouldn't necessarily have to be voice activated. But on the back end, what has to happen for this to work? Well, presumably such a system would need to pull in data from a number of disparate sources, so the assistant wouldn't just be reciting facts and figures that were coming from a centralized data server. Instead, it might be assimilating data from numerous sources into a cohesive presentation. On top of that, the data might be in different formats, meaning the system would need to be able to analyze the information inside different types of files. This isn't an easy thing to do. There's a reason we have a lot of specialized programs for working with specific types of files. When I put together these podcasts, I use a word processor for my notes, and I use an audio editing piece of software to record and edit the podcasts.
Speaker 1: Now, I need both of those programs because neither of them can do the job that the other one does. I don't have, like, an all-purpose program that does everything. Accessing different file formats, even in the same general family of applications, is tricky. Beyond that, the way information can be presented within each file could be very different. It's very possible for us to open up multiple spreadsheets, even using the same basic spreadsheet program, let's just say Excel. It's possible for us to open up half a dozen Excel spreadsheets that are all presenting the same information but doing so in different ways, and that might not be obvious at a casual glance. You might look at one and then the other and not immediately realize, oh, these are both saying the same thing. Just think about how information could be presented as a table or a graph or a chart. The AI assistant would ideally be able to access information no matter what format it was in, no matter what version of that format it was in, be able to interpret it, and then be able to deliver a meaningful analysis to the user. Now, as data sets grow, this becomes increasingly difficult, which I should point out is the whole reason DARPA wanted to fund research into this in the first place. Military commanders were faced with a growing mountain of information that was increasingly difficult to parse.

Speaker 1: The analysis might also need to incorporate natural language recognition features. And I've talked about natural language a lot in previous episodes, but if we boil it down, it's the language that we humans use to communicate with one another. It's our natural way of expressing our thoughts. But the way we humans process and communicate information is different from how machines do it. We can be subtle. We can use stuff like metaphors and allegories and just different phrasing. Computers are, you know, a lot more literal.
Speaker 1: Hey, if you break it down to the most basic unit of machine information, you know, the bit, you see how literal computers are. A bit is either a zero or a one, or if you prefer, it's either off and on, or no and yes. But using lots of bits, we can describe information in a way that provides more subtlety than just no or yes. My point is that computers don't naturally process information the way we do, and so an entire branch of artificial intelligence called natural language processing evolved to create ways for computers to interpret what we mean when we express things within natural language. Making this more complicated is that, of course, there's no one way to say any given thing. We've got lots of ways to express the same general thought. And added to that, we have lots of different languages. There are around seven thousand different languages spoken in the world today, though you could probably get away with a couple of dozen and cover the vast majority of the world's population that way. But these languages have their own vocabularies, their own syntaxes, their own expressions. So not only do we have multiple ways of saying things within one language, we have all these different languages to worry about.

Speaker 1: If you were to send ten people into a room with an AI assistant, and those ten people have a task they're supposed to perform with the help of this AI assistant, odds are no two people are going to go about it exactly the same way. And yet a working virtual assistant needs to be able to interpret and respond to every case, and do so reliably. On the back end, an AI system needs to be able to interpret data coming from different sources that may have very different ways of expressing similar ideas. This is an enormous task. Now, when we come back, I'll talk more about what SRI was doing and how the military project would evolve ultimately into Apple's personal assistant. But first let's take a quick break.
Speaker 1: Now, I've only scratched the surface of what the creation of an AI assistant capable of accessing information from numerous sources and making that information useful really requires. Let's talk a bit about the parameters of this project itself. So if you remember, I said that the deal was initially for twenty two million dollars, and that would end up funding the creation of a five hundred person project, and the project spanned five years initially to investigate the possibility of building out such an AI system. Over time, more money would end up going into the research, and it totaled around a hundred fifty million dollars by the end of the project. The lab where it all went down would receive the charming nickname Nerd City. A large part of the project focused on creating a program that could learn a user's behaviors. So not only could this personal assistant respond to what you were asking, it would gradually learn the way you behaved, and it would adapt to you to work more effectively.

Speaker 1: Now this comes into the arena of pattern recognition. We humans are pretty darn good at recognizing patterns. In fact, we're so good that sometimes we will, quote unquote, recognize a pattern even when there isn't a pattern there. In some cases, this can come across as charming, such as when we see a face in a cloud. Right? There's not really a pattern there. We're recognizing a pattern where none really exists. It's all based on our perspective and our imaginations. Now, in other cases, it's not so charming. It can actually lead to faulty reasoning. So I'm going to give you a very basic example that I hear all the time, particularly now that we're in October and there's some full moon weirdness going on. There's a fairly widespread belief that there's a connection between full moons and an increase in the number of medical emergencies that happen.
Speaker 1: Generally speaking, the belief is that people act irresponsibly during a full moon, and that often results in injury, which means greater activity at hospitals. Now, this belief is most likely due to confirmation bias. That is, we already have a belief in place, and the belief is that full moons lead to more accidents because of people acting irresponsibly. That is what we believe. It doesn't have evidence yet. And then, when things do get busy at a hospital and there happens to be a full moon, we register that as evidence for our belief. Aha, says the mistaken person, the full moon explains it. However, on nights when it is busy but there is no full moon, there's no hit. No one takes notice of how odd it is that it's crazy busy but there's no full moon tonight. We don't do that. Likewise, if it happens to not be busy but there is a full moon, you're also not likely to notice. You're not likely to say, huh, it's not very busy tonight, but there's a full moon out. So it's only when you have the full moon and the busy hospital that the evidence appears to support your belief and confirm your bias. But in truth, when you take a step back and do an objective study, and you look at the times when a hospital is busy, and you look at when there was a full moon, and you look to see if there's any correlation, it falls apart.

Speaker 1: Now, I got a little off track there, but the point I wanted to make is that we humans are biologically attuned to recognizing patterns. It's very likely that pattern recognition is one of the traits that really helped us survive thousands of years ago, which is why it's so intrinsic to the human experience. But building programs, computer systems, that are capable of identifying patterns and separating out what is signal versus what is noise is its own really big challenge.
Speaker 1: SRI was hoping to create a program that could look for patterns in user behavior in order to respond with greater precision and accuracy to user requests, and ultimately to anticipate future requests. Now, we see this sort of pattern recognition and response in lots of technology today. There are several smart thermostats on the market right now, for example, that can track when you tend to raise or lower the temperature in your home, and after a while the thermostat learns that, hey, maybe you like it nice and chilly at night, but you want it to be warm and toasty in the morning, and so the thermostat begins to adjust itself in preparation for that based on your previous behaviors. Now, that is a very simple example. Extrapolate that out and you begin to imagine a technology that is anticipating what you need or want, perhaps before you're even aware of it yourself, which can get kind of creepy but also sort of magical. But in truth, it's because the system is detecting patterns that we aren't even able to recognize ourselves. The danger there, of course, is that the systems can sometimes mistakenly identify a pattern when in fact there's not really a pattern there, very similar to the case I was explaining with the full moon and the busy hospital. Even computer systems can make those sorts of mistakes, and depending upon the implementation, that can be a real problem. But that's an issue for a different podcast. Now, when it comes to humans, pattern recognition is so ingrained in most of us that it can actually be kind of hard to explain.
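To make the smart thermostat example from a moment ago a little more concrete, here is a rough sketch of that style of behavior learning. Everything in it, the class name, the three-adjustment threshold, and the plain averaging rule, is invented purely for illustration; real thermostats (and certainly CALO) use far more sophisticated models.

```python
# Hypothetical sketch of a thermostat that learns a user's temperature pattern.
# All names and numbers here are made up for illustration.
from collections import defaultdict

class LearningThermostat:
    def __init__(self, default_temp=70.0):
        self.default_temp = default_temp
        self.history = defaultdict(list)  # hour of day -> temperatures the user chose

    def record_manual_change(self, hour, chosen_temp):
        """Remember what the user set the thermostat to at this hour."""
        self.history[hour].append(chosen_temp)

    def target_for(self, hour):
        """Once a pattern shows up (three or more manual changes at an hour), anticipate it."""
        past = self.history[hour]
        if len(past) >= 3:
            return sum(past) / len(past)
        return self.default_temp

# After a few chilly nights' worth of manual changes...
t = LearningThermostat()
for temp in (64, 63, 65):
    t.record_manual_change(hour=23, chosen_temp=temp)

print(t.target_for(23))  # 64.0: starts cooling at night without being asked
print(t.target_for(7))   # 70.0: no pattern spotted yet, so stick with the default
```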
Speaker 1: You notice when something happens, and if that same thing happens later with the same general results as the first time, it reinforces your first perception of that thing. And if it happens over and over, your brain essentially comes to understand that when I see X happen, I can expect Y to follow, and from that you might eventually realize that there are other correlating factors that may or may not be present when this goes on. With computers, the goal is to create systems that can analyze input, whether that input is an image file or typed text or spoken words or whatever. The system first has to interpret that input; it has to identify it and figure out the defining features and attributes of that input, then compare that against known patterns to see if the input matches or doesn't match those patterns. And in a way, you can think of this as a computer system receiving input and asking the question, have I seen this before? And if so, what is the correct response? If the input matches no pattern, the system then has to have the correct response for that too. A very simple example might just be a failed state, in which case the virtual assistant might reply with something like, I'm sorry, I don't know how to do that yet, or something along those lines.

Speaker 1: Now, remember earlier I mentioned that we humans have a lot of different ways to say the same general thing. For example, with my smart speaker, I might ask it to turn the lights on full, meaning I want them to be all the way up. I might say make the lights full. I might just say make it brighter. And the system has to take this input, analyze it, and make a statistical determination to guess at what it is I actually want to have happen.
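Purely to illustrate that kind of statistical guessing, here is a tiny Python sketch. The intent names, the example phrases, and the crude word overlap score are all invented for this write-up; this is not how Siri, CALO, or any real assistant actually scores requests.

```python
# Hypothetical sketch: guessing the most likely intent behind a spoken request.
# Intent names, example phrases, and the scoring rule are all made up.

def score(utterance, example_phrases):
    """Crude similarity: best fraction of an example phrase's words found in the utterance."""
    words = set(utterance.lower().split())
    best = 0.0
    for phrase in example_phrases:
        phrase_words = phrase.lower().split()
        overlap = sum(1 for w in phrase_words if w in words) / len(phrase_words)
        best = max(best, overlap)
    return best

INTENTS = {
    "raise_brightness": ["turn the lights on full", "make the lights full", "make it brighter"],
    "lower_brightness": ["dim the lights", "make it darker"],
    "lights_off":       ["turn the lights off", "lights out"],
}

def guess_intent(utterance):
    # Score every option, then go with the most probable one. It is still a guess.
    best_score, best_name = max((score(utterance, ex), name) for name, ex in INTENTS.items())
    if best_score == 0.0:
        return "fallback"  # the "I don't know how to do that yet" failed state
    return best_name

print(guess_intent("hey, make it brighter please"))  # raise_brightness
print(guess_intent("order me a pizza"))              # fallback
```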
Speaker 1: I say guess because in each case we're really looking at a system that has multiple options when it comes to a response, and each option gets a probability assigned to it based on how closely that option matches the input. So I might say make it brighter, and the underlying system recognizes that there's a strong chance I mean increase the brightness of the lights in the room I'm in, and the system has determined that that's the most probable answer. Right? It's probably correct, so it goes with that, but it's still kind of a guess. Now, there are a lot of different ways to go about doing this, but the one you hear about a lot would be artificial neural networks. I've talked a lot about these in recent episodes, so we'll just give kind of the quick overview.

Speaker 1: So you've got a computer system that has artificial neurons. These are called nodes, and the job of a node is to accept incoming input from two or more sources. The node then performs an operation on those inputs, and then it generates an output, which it passes on to other nodes further along in the system. You can think of the nodes as existing in a series of levels, with the top level being where input comes in and the bottom level being where the ultimate output comes out. So the nodes a level down accept the incoming inputs, perform their own operations on them, and pass the results further down the chain, and so on, until ultimately you get an output, or response. Now, that's a gross oversimplification of what's going on, but generally you get the idea of the process. Now, let's complicate things a little bit. To get these sorts of neural networks to generate the results you want, one thing you can do is adjust how each node values, or weighs, each of the inputs coming into that node. So I'm going to use some names, human names, for nodes here just to make things easier to understand. Let's say we've got a node named Billy.
Speaker 1: Billy is on the second layer of nodes, so it's one layer down from where direct input comes into the system. So there are nodes above Billy that are sending information to Billy. We'll say that the two nodes that give Billy information are named Sue and Jim Bob. Sue and Jim Bob send Billy information, and it's Billy's job to determine what further information to send down the pipeline. Like, I need to do an operation based on these bits of information that are coming to me, and then I have to come up with a result. Only Billy has been told that Sue's information tends to be a little more important than Jim Bob's information is, and so if there's a question as to what to do, it's better to lean more on Sue's information than on Jim Bob's information. We would call this weighting, as in W E I G H T I N G. Computer scientists weight the inputs going into nodes in order to train a system to generate the results the scientists want. One way to do this is through a process called back propagation.

Speaker 1: Back propagation is when you know what result you want the system to arrive at. So let's use the classic example of identifying pictures that have cats in them. As a human, you can quickly determine if a photo has a cat in it or not. You'll spot it right away. So you feed a picture through this system and you wait for the system to tell you if, yes, there's a kitty cat in the picture, or no, the image is cat free. And let's say that the picture you fed to the system in fact does have a cat in it. You can see it. But when you feed it through the system, the system fails to find the cat and says, nope, there's no cat here. Well, you know that the system got it wrong.
Speaker 1: So what you might do as a computer scientist is you look at that final level of nodes, right at the output level, to see which factors led those nodes to come to the conclusion that there was no cat in the photo. You then look at the inputs that are coming into those nodes and you see how they are weighted, and you change the weights of those inputs in order to force that last level of nodes to say, oh, no, there definitely is a cat here. And so on. You move up from the output level, going level by level, tweaking the weightings of the incoming data so that the system is tuned to more accurately determine if a photo has a cat in it. Now, this takes a lot of work, and it also means using huge data sets. You know, you're feeding in hundreds of thousands or millions of images, some of them with cats, some of them without, and training the system over and over again before you start feeding it brand new images to see if it still works. And this can be a laborious process, training a machine learning system, but the result is that you end up with a system that hopefully is pretty accurate at doing whatever it was you were training it to do, you know, like recognizing cats.

Speaker 1: But that's just one approach to machine learning. There are others. Some, like the version I just described, fall into a broad category called supervised learning. Others are in unsupervised learning. In fact, CALO was largely built through unsupervised learning, meaning the machine had to train itself as it performed tasks, using inputs that hadn't been curated specifically for training purposes. It's just an enormous amount of information coming in that the system has to process. So, in other words, CALO wasn't dealing with, like, a stack of a million photos, some of which had cats and some of which didn't. CALO was working with real world information and attempting to suss out what to do with it in real time.
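Here is a minimal Python sketch of the weighted nodes plus back propagation idea described above, in the supervised, cat-or-no-cat style of training. The two made-up features, the tiny network size, the learning rate, and the number of training steps are all invented for illustration; this is not CALO's code or any production image classifier, just the basic mechanics of nudging weights until the outputs line up with the known labels.

```python
# Toy supervised learning sketch: weighted nodes trained with back propagation.
# The "images" are just two invented features (say, a pointy-ear score and a
# whisker score); label 1 means "cat", label 0 means "no cat".
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])  # four toy examples
y = np.array([[1.0], [1.0], [0.0], [0.0]])                      # known correct answers

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two levels of nodes. Each weight says how much a node leans on one of its
# inputs, like Billy leaning more on Sue than on Jim Bob.
W1 = rng.normal(size=(2, 3))  # input features -> three hidden nodes
W2 = rng.normal(size=(3, 1))  # hidden nodes   -> one output node

learning_rate = 1.0
for step in range(2000):
    # Forward pass: inputs flow down through the levels of nodes.
    hidden = sigmoid(X @ W1)
    out = sigmoid(hidden @ W2)

    # Backward pass: start at the output level, see how wrong it was, and push
    # that error back up a level, adjusting the weights as you go.
    err_out = (out - y) * out * (1 - out)
    err_hidden = (err_out @ W2.T) * hidden * (1 - hidden)
    W2 -= learning_rate * hidden.T @ err_out
    W1 -= learning_rate * X.T @ err_hidden

# After training, the outputs should sit near 1 for the cat rows and near 0 otherwise.
print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))
```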
Speaker 1: Now, to go into how unsupervised machine learning works would require a full episode on its own, but it is a fascinating and complicated subject, so I probably will tackle it at some point. I'm just gonna spare you guys for right now. The real point I'm making is that SRI International spent years building out systems that could do a wide range of tasks based on inputs. Pattern recognition was actually just one relatively small piece of that. Creating the ability to pull data from different sources in a meaningful way is its own incredibly challenging problem, as I alluded to earlier. Particularly as the number of sources you're pulling from and the variety of formats the data is in begin to increase, it becomes easier for the system to make mistakes as you throw more variety at it, and it requires a lot of refinement. Frankly, it's actually a task that's so big I have trouble grasping it.

Speaker 1: The CALO project became the largest AI program in history up to that point. It was an incredible achievement. It brought together different disciplines of artificial intelligence into a cohesive project with a solid goal. By the two thousands, artificial intelligence was a sprawling collection of computer science disciplines, each with incredible depth to them. So you might find an expert in one field of AI who would have little to no experience with another branch under the same general discipline of artificial intelligence. There was a prevailing feeling that the various branches of AI had each become so complex they would never work together. The CALO project proved that wrong. When we come back, I'll explain how part of this military project would break away to become the virtual assistant, ultimately finding its way onto iOS devices. But first let's take another quick break.
Adam Cheyer, whose name I'm likely mispronouncing, 562 00:36:19,000 --> 00:36:21,880 Speaker 1: and I apologize, was an engineer at 563 00:36:22,000 --> 00:36:24,480 Speaker 1: SRI working on CALO, and he worked with a 564 00:36:24,560 --> 00:36:27,839 Speaker 1: team that had the daunting task of assimilating the work 565 00:36:28,040 --> 00:36:31,720 Speaker 1: that was being done by twenty-seven different engineering teams 566 00:36:32,440 --> 00:36:36,839 Speaker 1: into a cohesive virtual assistant. So, as I mentioned just 567 00:36:36,960 --> 00:36:40,000 Speaker 1: before the break, the disciplines of AI had each gotten 568 00:36:40,160 --> 00:36:45,000 Speaker 1: very deep, very broad, and required a lot of specialization. 569 00:36:45,320 --> 00:36:48,759 Speaker 1: So you have these different engineering teams working within various disciplines, 570 00:36:49,280 --> 00:36:52,399 Speaker 1: and it was Cheyer's team that needed to bring all 571 00:36:52,400 --> 00:36:56,040 Speaker 1: these together and make it into a working, coherent whole. 572 00:36:56,560 --> 00:36:59,880 Speaker 1: The results were really phenomenal. Now I'll give you a 573 00:37:00,040 --> 00:37:04,799 Speaker 1: hypothetical use for CALO. Let's say that you've got a 574 00:37:04,800 --> 00:37:08,640 Speaker 1: project team and there are ten people on your team, 575 00:37:08,760 --> 00:37:12,520 Speaker 1: including you, and let's say there's a meeting that's on 576 00:37:12,560 --> 00:37:16,879 Speaker 1: the books for tomorrow morning in a particular conference room, 577 00:37:16,920 --> 00:37:19,400 Speaker 1: and it's supposed to be a status update meeting for 578 00:37:19,440 --> 00:37:22,840 Speaker 1: the project. It turns out that two out of the 579 00:37:22,920 --> 00:37:25,360 Speaker 1: ten people on your team are no longer able to 580 00:37:25,440 --> 00:37:29,960 Speaker 1: make the meeting due to last minute high priority conflicts, 581 00:37:30,040 --> 00:37:33,359 Speaker 1: so they've had to cancel out of the meeting. CALO 582 00:37:33,440 --> 00:37:36,319 Speaker 1: would be able to detect the change in status of 583 00:37:36,360 --> 00:37:38,799 Speaker 1: those two people and say, all right, these two are 584 00:37:38,880 --> 00:37:42,640 Speaker 1: no longer going to the meeting. Then CALO could determine 585 00:37:42,719 --> 00:37:46,200 Speaker 1: how important those two people were to the overall team, 586 00:37:46,320 --> 00:37:49,719 Speaker 1: essentially saying what are their roles? What role are 587 00:37:49,719 --> 00:37:53,080 Speaker 1: they performing within the context of this team, and is 588 00:37:53,120 --> 00:37:56,200 Speaker 1: it a critical role for this meeting? It could also 589 00:37:56,200 --> 00:37:58,680 Speaker 1: look at the importance of the meeting itself, like, oh, well, 590 00:37:58,719 --> 00:38:01,440 Speaker 1: this is a status update, so it's really just to 591 00:38:01,520 --> 00:38:04,600 Speaker 1: keep the team, you know, informed of what's going on. 592 00:38:05,440 --> 00:38:08,120 Speaker 1: It's not a mission critical type of meeting. It could 593 00:38:08,120 --> 00:38:11,160 Speaker 1: take all that into account.
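Here is a toy sketch, in Python, of the kind of judgment being described: gather who dropped out, how critical their roles are, and how important the meeting is, then decide whether to keep, cancel, or rebook, which is what the next part walks through. Every field, threshold, and rule here is invented for illustration; CALO's real reasoning was far more sophisticated than a handful of if-statements.

```python
# A toy sketch of the keep/cancel/rebook judgment described above.
# All names, roles, and rules are invented for illustration -- they are
# not CALO's actual data model or decision logic.
from dataclasses import dataclass

@dataclass
class Attendee:
    name: str
    role: str        # e.g. "lead", "approver", "contributor"
    attending: bool

def decide(meeting_type: str, attendees: list) -> str:
    dropped = [a for a in attendees if not a.attending]
    critical_missing = any(a.role in ("lead", "approver") for a in dropped)

    if not dropped:
        return "keep meeting as scheduled"
    if meeting_type == "status update" and not critical_missing:
        # Low-stakes meeting, non-critical absences: go ahead, brief the absent.
        return "keep meeting; send updates to absent attendees"
    # Critical people missing or a high-stakes meeting: rebook instead
    # (check calendars, reserve a room, resend invites -- stubbed out here).
    return "cancel and rebook: check calendars, reserve room, resend invites"

team = [Attendee("A", "lead", True),
        Attendee("B", "contributor", False),
        Attendee("C", "contributor", False)] + [
        Attendee(f"P{i}", "contributor", True) for i in range(7)]

print(decide("status update", team))  # -> keep meeting; send updates ...
```

The interesting part is not the rule itself but that the system makes the call on its own, without anyone asking it to.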
Then CALO could make a 594 00:38:11,160 --> 00:38:14,359 Speaker 1: determination on its own about whether or not it should keep 595 00:38:14,400 --> 00:38:17,239 Speaker 1: the meeting in place and go ahead just without those 596 00:38:17,280 --> 00:38:20,720 Speaker 1: two people, maybe just sending updates to those two people, 597 00:38:21,320 --> 00:38:24,960 Speaker 1: or cancel the meeting entirely, notifying all the participants 598 00:38:25,000 --> 00:38:28,600 Speaker 1: about it, then look at the different calendars of those participants, 599 00:38:28,920 --> 00:38:33,040 Speaker 1: book a new meeting, including securing a space for that 600 00:38:33,160 --> 00:38:36,799 Speaker 1: meeting, and send out new invites. It would even be 601 00:38:36,880 --> 00:38:39,400 Speaker 1: able to look at the purpose of the meeting and 602 00:38:39,480 --> 00:38:43,279 Speaker 1: flag information that's relevant to that meeting, essentially creating a 603 00:38:43,320 --> 00:38:47,640 Speaker 1: sort of meeting dossier on demand. So it's really, you know, 604 00:38:47,760 --> 00:38:53,000 Speaker 1: incredibly sophisticated stuff. Now, that was the fully fledged CALO, 605 00:38:53,800 --> 00:38:58,000 Speaker 1: but an offshoot of this project, or maybe it's 606 00:38:58,040 --> 00:39:00,480 Speaker 1: better to say it was a smaller sister project that 607 00:39:00,520 --> 00:39:02,960 Speaker 1: existed at the same time; it launched in two thousand three 608 00:39:03,000 --> 00:39:07,440 Speaker 1: along with CALO. This other one was called Vanguard, at 609 00:39:07,480 --> 00:39:10,000 Speaker 1: least within SRI, and it was taking a 610 00:39:10,000 --> 00:39:15,000 Speaker 1: more scaled down approach of building out an assistant and 611 00:39:15,040 --> 00:39:19,280 Speaker 1: looking at how it could be useful on mobile devices. Now, again, 612 00:39:19,280 --> 00:39:22,319 Speaker 1: this was in two thousand three, before smartphones would really 613 00:39:22,360 --> 00:39:26,120 Speaker 1: become a mainstream product, because Apple wouldn't even introduce the 614 00:39:26,160 --> 00:39:29,440 Speaker 1: iPhone until two thousand seven. But SRI was 615 00:39:29,480 --> 00:39:32,920 Speaker 1: working on implementations of a more limited virtual assistant and 616 00:39:32,960 --> 00:39:36,880 Speaker 1: then showing it off to companies like Motorola. One person 617 00:39:37,160 --> 00:39:40,840 Speaker 1: at Motorola who was really impressed with this work was 618 00:39:40,880 --> 00:39:45,319 Speaker 1: a guy named Dag Kittlaus. Kittlaus attempted to convince his 619 00:39:45,360 --> 00:39:49,239 Speaker 1: superiors at Motorola that Vanguard was a really important piece 620 00:39:49,280 --> 00:39:53,200 Speaker 1: of work, but he didn't find any real interest over 621 00:39:53,320 --> 00:39:57,279 Speaker 1: at Motorola, so he did something fairly brazen. In two 622 00:39:57,280 --> 00:40:00,600 Speaker 1: thousand seven, he quit his job at Motorola and he 623 00:40:00,719 --> 00:40:04,800 Speaker 1: joined SRI International with the intent of exploring ways to 624 00:40:04,960 --> 00:40:09,080 Speaker 1: spin off a new business that would develop an implementation 625 00:40:09,280 --> 00:40:14,480 Speaker 1: of the CALO and Vanguard virtual assistant work, but for the consumer market.
626 00:40:15,040 --> 00:40:19,080 Speaker 1: The result would be a new company called Siri, 627 00:40:19,120 --> 00:40:21,799 Speaker 1: S-I-R-I, which is kind of the way you 628 00:40:21,800 --> 00:40:24,840 Speaker 1: would say SRI if you were trying to 629 00:40:24,880 --> 00:40:27,480 Speaker 1: pronounce it as if it were an acronym as opposed 630 00:40:27,560 --> 00:40:32,280 Speaker 1: to an initialism. Adam Cheyer, after some convincing from Kittlaus, 631 00:40:32,840 --> 00:40:36,480 Speaker 1: joined the venture as the vice president of engineering. 632 00:40:36,600 --> 00:40:40,320 Speaker 1: Kittlaus would be the CEO. Tom Gruber, who had studied 633 00:40:40,320 --> 00:40:44,120 Speaker 1: computer science at Stanford and then pioneered work in various 634 00:40:44,160 --> 00:40:48,160 Speaker 1: fields of artificial intelligence, would become the chief technology officer 635 00:40:48,360 --> 00:40:53,640 Speaker 1: for the company. Interestingly, the Siri team didn't initially call 636 00:40:53,920 --> 00:41:00,000 Speaker 1: their own virtual assistant project Siri. Instead, the new spinoff company, 637 00:41:00,520 --> 00:41:04,960 Speaker 1: Siri, would call their virtual assistant HAL, H-A-L, 638 00:41:05,440 --> 00:41:08,719 Speaker 1: after the AI system in the book and film two 639 00:41:08,800 --> 00:41:11,960 Speaker 1: thousand one. They did take an extra step to reassure 640 00:41:12,000 --> 00:41:15,719 Speaker 1: people that this time HAL would behave itself. So, if 641 00:41:15,719 --> 00:41:18,480 Speaker 1: you're not familiar with the story of two thousand one, 642 00:41:19,040 --> 00:41:24,239 Speaker 1: the artificially intelligent computer system HAL begins to malfunction and 643 00:41:24,280 --> 00:41:26,880 Speaker 1: begins to interpret its mission in such a way that 644 00:41:27,000 --> 00:41:29,920 Speaker 1: it compels it to start killing off the crew inside 645 00:41:29,920 --> 00:41:33,560 Speaker 1: a spacecraft, kind of a worst case scenario with AI. 646 00:41:34,200 --> 00:41:37,480 Speaker 1: While Siri began to get off the ground, it was 647 00:41:37,560 --> 00:41:41,759 Speaker 1: licensing technologies from SRI to power the virtual assistant, 648 00:41:42,120 --> 00:41:44,839 Speaker 1: and it also began to hire the talent needed to 649 00:41:44,960 --> 00:41:48,799 Speaker 1: bring this idea to life. At the same time, Apple 650 00:41:49,160 --> 00:41:52,319 Speaker 1: was pushing the smartphone industry into the limelight with the 651 00:41:52,360 --> 00:41:54,880 Speaker 1: introduction of the first iPhone. This was all happening in 652 00:41:54,920 --> 00:41:58,200 Speaker 1: two thousand seven. It was clear that the push for 653 00:41:58,280 --> 00:42:01,480 Speaker 1: a virtual assistant was coming at just the right time, 654 00:42:01,600 --> 00:42:06,880 Speaker 1: as Apple's implementation of smartphone technology was a grand slam 655 00:42:06,920 --> 00:42:11,040 Speaker 1: home run, to use a sports analogy. It soon became 656 00:42:11,080 --> 00:42:14,239 Speaker 1: obvious that the future of computing was going to be, 657 00:42:14,320 --> 00:42:18,480 Speaker 1: at least in large part, mobile. That in turn opened 658 00:42:18,520 --> 00:42:21,640 Speaker 1: up opportunities to create new ways to interact with mobile 659 00:42:21,640 --> 00:42:24,560 Speaker 1: devices in order to do the stuff we needed to 660 00:42:24,640 --> 00:42:28,280 Speaker 1: do.
Now, it's obvious to say this, but mobile devices 661 00:42:28,320 --> 00:42:32,200 Speaker 1: have a very different user interface from your typical computer. 662 00:42:32,600 --> 00:42:35,760 Speaker 1: Interacting with a handheld computer by tapping on a screen 663 00:42:35,960 --> 00:42:40,880 Speaker 1: or talking to it creates different opportunities for crafting experiences 664 00:42:41,280 --> 00:42:44,520 Speaker 1: than someone sitting down to a computer with a keyboard 665 00:42:44,520 --> 00:42:48,799 Speaker 1: and mouse. There's a potential need for a voice activated 666 00:42:48,880 --> 00:42:51,760 Speaker 1: personal assistant that could help you carry out your tasks, 667 00:42:51,800 --> 00:42:56,720 Speaker 1: particularly ones that might need multiple steps. Siri the company 668 00:42:57,000 --> 00:43:00,160 Speaker 1: came along just as the need for Siri the app 669 00:43:00,360 --> 00:43:03,040 Speaker 1: was beginning to take shape, so it was the right 670 00:43:03,080 --> 00:43:07,280 Speaker 1: place at the right time. In two thousand seven, Apple 671 00:43:07,360 --> 00:43:10,960 Speaker 1: had not yet opened up the opportunity for independent app 672 00:43:11,000 --> 00:43:15,200 Speaker 1: developers to submit apps for the iPhone. That wouldn't actually 673 00:43:15,200 --> 00:43:18,160 Speaker 1: happen until July tenth, two thousand eight, essentially a year 674 00:43:18,200 --> 00:43:21,960 Speaker 1: after the iPhone had debuted. The Siri team was still 675 00:43:22,360 --> 00:43:25,600 Speaker 1: hard at work building out the virtual assistant app they 676 00:43:25,600 --> 00:43:28,719 Speaker 1: had in mind in two thousand and eight. While they 677 00:43:28,760 --> 00:43:32,440 Speaker 1: were licensing technology from SRI International, you know, 678 00:43:32,480 --> 00:43:35,839 Speaker 1: from the Vanguard and the CALO projects, they still 679 00:43:35,880 --> 00:43:38,120 Speaker 1: had to build out the systems that would actually power 680 00:43:38,200 --> 00:43:42,640 Speaker 1: Siri on the back end. Generally speaking, their approach was 681 00:43:42,719 --> 00:43:45,560 Speaker 1: to create an app where a person could ask Siri a 682 00:43:45,680 --> 00:43:49,319 Speaker 1: question and the app would record that request as a 683 00:43:49,360 --> 00:43:53,000 Speaker 1: little audio file, send that audio file to a server 684 00:43:53,160 --> 00:43:55,879 Speaker 1: in a data center, and the first step then would 685 00:43:55,920 --> 00:44:00,200 Speaker 1: be to transcribe the audio file into text, so we're 686 00:44:00,200 --> 00:44:03,479 Speaker 1: talking about speech to text here. Then the system would 687 00:44:03,480 --> 00:44:07,400 Speaker 1: need to parse the request. What is actually being asked here? 688 00:44:07,480 --> 00:44:11,719 Speaker 1: What is the command or request saying? Now, in some systems, 689 00:44:12,080 --> 00:44:15,440 Speaker 1: a computer will break down a sentence into its various components, 690 00:44:15,480 --> 00:44:19,000 Speaker 1: you know, a subject, verb, and object, and then try 691 00:44:19,080 --> 00:44:22,560 Speaker 1: to figure out what is actually being said. Adam Cheyer 692 00:44:22,680 --> 00:44:26,759 Speaker 1: took a different approach with his team. They taught their 693 00:44:26,800 --> 00:44:31,399 Speaker 1: system the meaning of real world objects.
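That round trip, record the request as audio, ship it to a server, transcribe it to text, then work out what is being asked, is easier to picture as a sketch. Here is a rough outline in Python; every function here is a stand-in for a much larger system, not Siri's actual back end or any real API.

```python
# A rough sketch of the request pipeline described above. Each function
# is a placeholder for a much bigger component -- the point is the shape
# of the flow, not how Siri actually implemented it.

def record_audio() -> bytes:
    # On a real phone this would capture microphone input.
    return b"fake-audio-bytes-for: where can I get linguini"

def speech_to_text(audio: bytes) -> str:
    # A real system would run a speech recognizer here.
    return "where can I get linguini"

def parse_request(text: str) -> dict:
    # Turn the transcript into something the back end can act on.
    return {"intent": "find_food", "dish": "linguini"}

def handle_on_server(audio: bytes) -> dict:
    text = speech_to_text(audio)    # step 1: speech to text
    return parse_request(text)      # step 2: figure out what's being asked

if __name__ == "__main__":
    print(handle_on_server(record_audio()))
```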
So, rather than 694 00:44:31,480 --> 00:44:34,760 Speaker 1: trying to parse out what a sentence meant by first 695 00:44:34,880 --> 00:44:38,760 Speaker 1: figuring out what's the subject, what's the verb, and what's 696 00:44:38,800 --> 00:44:42,560 Speaker 1: the object that the subject is acting upon, Siri started 697 00:44:42,560 --> 00:44:46,040 Speaker 1: off by looking at real world concepts within the request. 698 00:44:46,719 --> 00:44:50,319 Speaker 1: Siri would then map the request against a list of 699 00:44:50,400 --> 00:44:55,480 Speaker 1: possible responses and then employ that statistical probability model that 700 00:44:55,560 --> 00:44:59,120 Speaker 1: I mentioned earlier. What are the odds that someone was 701 00:44:59,160 --> 00:45:02,960 Speaker 1: asking for directions to an Italian restaurant versus asking 702 00:45:03,040 --> 00:45:06,640 Speaker 1: Siri to provide a recipe for an Italian dish, for example? 703 00:45:07,120 --> 00:45:10,439 Speaker 1: So if I activate my virtual assistant and say I 704 00:45:10,520 --> 00:45:15,279 Speaker 1: want linguini, that's a pretty broad thing to say, right? 705 00:45:15,440 --> 00:45:17,799 Speaker 1: The app has to guess at whether I mean I 706 00:45:17,880 --> 00:45:21,719 Speaker 1: want to go someplace that serves linguini or I want 707 00:45:21,719 --> 00:45:25,080 Speaker 1: to make it myself. Now, my personal app would have 708 00:45:25,200 --> 00:45:29,000 Speaker 1: learned from my behaviors that I am very lazy and 709 00:45:29,000 --> 00:45:31,960 Speaker 1: would realize that I am actually asking for someone to 710 00:45:32,000 --> 00:45:35,880 Speaker 1: bring me linguini. So there's no doubt Siri would return 711 00:45:35,920 --> 00:45:39,160 Speaker 1: results of Italian restaurants that deliver in response to 712 00:45:39,160 --> 00:45:42,359 Speaker 1: my request. And keep in mind, Siri was intended to 713 00:45:42,440 --> 00:45:45,319 Speaker 1: learn from user behaviors and tune itself to those 714 00:45:45,360 --> 00:45:50,520 Speaker 1: behaviors over time. Beyond that, Siri would pull information from 715 00:45:50,600 --> 00:45:54,320 Speaker 1: multiple sources to provide results. So if I asked about 716 00:45:54,320 --> 00:45:57,960 Speaker 1: a restaurant, Siri would provide all sorts of data about 717 00:45:58,040 --> 00:46:01,440 Speaker 1: the restaurant, from user reviews, to directions to the restaurant, 718 00:46:01,520 --> 00:46:04,640 Speaker 1: to menu items, to what price range I might expect 719 00:46:05,160 --> 00:46:08,440 Speaker 1: at that place. Siri could also tap into other stuff 720 00:46:08,480 --> 00:46:12,680 Speaker 1: like the phone's location, and thus give relevant answers based 721 00:46:12,719 --> 00:46:15,640 Speaker 1: on my location, so I wouldn't have to worry about 722 00:46:15,680 --> 00:46:19,000 Speaker 1: getting irrelevant search results if I happened to be far 723 00:46:19,120 --> 00:46:23,359 Speaker 1: from home, right? Siri wouldn't suggest that I go and 724 00:46:23,440 --> 00:46:25,480 Speaker 1: get food from a place that's right down the street 725 00:46:25,480 --> 00:46:28,320 Speaker 1: from my house in Atlanta while I happen to be 726 00:46:28,360 --> 00:46:31,719 Speaker 1: in New York City, for example. The team also gave 727 00:46:31,800 --> 00:46:35,680 Speaker 1: Siri a bit of an attitude. Siri could be sassy 728 00:46:35,840 --> 00:46:38,279 Speaker 1: and had a bit of a potty mouth.
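That concept-matching and probability-ranking step is easier to see with a toy example. Here is a rough sketch of ranking the possible meanings of "I want linguini," with learned user behavior and the phone's location tipping the scores. The intents, numbers, and adjustments are all invented for illustration; they are not Siri's actual models.

```python
# A toy illustration of ranking candidate interpretations of a request,
# then letting learned user behavior and location tip the scales.
# Every intent name and score here is made up for illustration.
def rank_interpretations(text: str, user_history: dict, near_home: bool) -> list:
    # Baseline scores for what "I want linguini" usually means.
    candidates = {
        "find_restaurant_delivery": 0.30,
        "directions_to_restaurant": 0.40,
        "show_recipe": 0.30,
    }
    # A user who almost always orders delivery shifts the odds that way.
    if user_history.get("orders_delivery_often"):
        candidates["find_restaurant_delivery"] += 0.25
        candidates["show_recipe"] -= 0.15
    # Far from home: favor nearby options over the place down the street
    # back in Atlanta.
    if not near_home:
        candidates["directions_to_restaurant"] += 0.05
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

print(rank_interpretations("I want linguini",
                           {"orders_delivery_often": True},
                           near_home=False))
```

In a real assistant those scores would come from trained models rather than hand-set numbers, but the shape of the decision, rank the plausible interpretations and act on the most likely one, is the same.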
On the personality front, Siri would in fact 729 00:46:38,320 --> 00:46:41,600 Speaker 1: occasionally drop an F bomb here or there. Now, 730 00:46:41,600 --> 00:46:45,920 Speaker 1: according to Kittlaus, the goal was eventually to offer extensions 731 00:46:45,960 --> 00:46:48,719 Speaker 1: to Siri so that end users could kind of pick 732 00:46:48,800 --> 00:46:53,600 Speaker 1: the app's personality. Maybe you wanted a no nonsense virtual 733 00:46:53,600 --> 00:46:56,920 Speaker 1: assistant that just provides the information you need and that's it. 734 00:46:57,760 --> 00:47:01,600 Speaker 1: Maybe you wanted more of a goofy sidekick, or 735 00:47:01,640 --> 00:47:04,960 Speaker 1: maybe you wanted a virtual assistant who could give you 736 00:47:05,000 --> 00:47:08,520 Speaker 1: some serious attitude on occasion. The goal down the line 737 00:47:08,600 --> 00:47:10,880 Speaker 1: was to create options for people to kind of shape 738 00:47:10,960 --> 00:47:14,040 Speaker 1: their experience, but that would end up on the cutting 739 00:47:14,120 --> 00:47:18,600 Speaker 1: room floor for one very big reason. The Siri 740 00:47:18,719 --> 00:47:24,239 Speaker 1: app made its debut in the iPhone App Store in January. 741 00:47:24,280 --> 00:47:28,120 Speaker 1: Three weeks after it debuted, Kittlaus received a phone 742 00:47:28,120 --> 00:47:32,080 Speaker 1: call from an unlisted number, a call that he almost 743 00:47:32,320 --> 00:47:35,720 Speaker 1: didn't even answer. But when he did answer, the person 744 00:47:35,800 --> 00:47:37,600 Speaker 1: on the other end of the call happened to be 745 00:47:37,719 --> 00:47:42,120 Speaker 1: Steve Jobs, the CEO of Apple. Jobs was over the 746 00:47:42,160 --> 00:47:45,040 Speaker 1: moon about Siri and wanted to meet with Kittlaus 747 00:47:45,080 --> 00:47:48,919 Speaker 1: to discuss some pretty enormous options, the biggest one being 748 00:47:48,960 --> 00:47:53,240 Speaker 1: that Apple itself would acquire Siri. Now, at the time, Siri 749 00:47:53,239 --> 00:47:56,200 Speaker 1: the company was working on developing a version of the 750 00:47:56,239 --> 00:47:59,920 Speaker 1: app for Android phones, having reached a deal with Verizon 751 00:48:00,080 --> 00:48:02,920 Speaker 1: to create a version of Siri that could be 752 00:48:03,000 --> 00:48:06,520 Speaker 1: the default app on all Verizon Android phones moving forward. 753 00:48:07,200 --> 00:48:11,680 Speaker 1: The Apple deal would ultimately derail that agreement, as Jobs 754 00:48:11,760 --> 00:48:16,080 Speaker 1: was insistent that Siri be an Apple exclusive. In fact, 755 00:48:16,400 --> 00:48:22,480 Speaker 1: when Apple would introduce Siri on October fourth, two thousand eleven, 756 00:48:23,440 --> 00:48:26,760 Speaker 1: it seemed like it was being presented as a purely 757 00:48:26,960 --> 00:48:32,600 Speaker 1: Apple product, that it didn't have a life outside of 758 00:48:32,680 --> 00:48:35,120 Speaker 1: Apple at all. It came across as if it had just been 759 00:48:35,400 --> 00:48:40,360 Speaker 1: Apple all along. And of course, the day after Apple 760 00:48:40,600 --> 00:48:45,400 Speaker 1: would introduce Siri to the public, Steve Jobs himself passed away, 761 00:48:45,680 --> 00:48:49,319 Speaker 1: on October fifth, two thousand eleven. But that part of the 762 00:48:49,320 --> 00:48:52,399 Speaker 1: story will have to wait for part two because, as 763 00:48:52,400 --> 00:48:56,480 Speaker 1: I said, this is going longer than I anticipated.
So 764 00:48:56,520 --> 00:48:59,719 Speaker 1: in our next episode we'll pick up probably a 765 00:48:59,719 --> 00:49:02,759 Speaker 1: little earlier than where I'm leaving off here, actually, because 766 00:49:02,760 --> 00:49:06,359 Speaker 1: there are still some other details we should talk about as 767 00:49:06,360 --> 00:49:10,640 Speaker 1: far as how Siri works and the actual arrangement of 768 00:49:10,719 --> 00:49:14,320 Speaker 1: Apple's acquisition, and then we'll talk about how the app 769 00:49:14,520 --> 00:49:18,800 Speaker 1: has evolved and changed under Apple's ownership, and we'll also explore, 770 00:49:18,840 --> 00:49:22,120 Speaker 1: you know, a little bit about Siri's distant cousins like 771 00:49:22,320 --> 00:49:26,799 Speaker 1: Alexa and Google Assistant and others, because all of these 772 00:49:26,840 --> 00:49:31,440 Speaker 1: work in similar ways, though they have their own specific 773 00:49:32,120 --> 00:49:36,680 Speaker 1: processes to handle requests, and so an 774 00:49:36,680 --> 00:49:40,359 Speaker 1: apples-to-apples comparison does break down ultimately once 775 00:49:40,400 --> 00:49:43,600 Speaker 1: you start getting down to how things are working in 776 00:49:43,760 --> 00:49:46,520 Speaker 1: detail on the back end. So I won't go into 777 00:49:47,040 --> 00:49:50,640 Speaker 1: full detail on those, because it would require multiple episodes. 778 00:49:50,640 --> 00:49:53,920 Speaker 1: But we will talk more about Siri and 779 00:49:54,320 --> 00:49:57,120 Speaker 1: what has happened in the years since its acquisition in 780 00:49:57,120 --> 00:50:00,120 Speaker 1: our next episode. If you guys have suggestions for future 781 00:50:00,120 --> 00:50:02,960 Speaker 1: topics I should tackle on Tech Stuff, let me know. 782 00:50:03,320 --> 00:50:05,399 Speaker 1: The best way to do that is to reach out 783 00:50:05,480 --> 00:50:08,919 Speaker 1: on Twitter. The handle we use is TechStuff HSW, 784 00:50:09,120 --> 00:50:13,120 Speaker 1: and I'll talk to you again really soon. 785 00:50:18,239 --> 00:50:21,279 Speaker 1: Tech Stuff is an I Heart Radio production. For more 786 00:50:21,360 --> 00:50:24,720 Speaker 1: podcasts from I Heart Radio, visit the I Heart Radio app, 787 00:50:24,880 --> 00:50:28,040 Speaker 1: Apple Podcasts, or wherever you listen to your favorite shows.