Speaker 1: Welcome to TechStuff, a production from iHeartRadio.

Speaker 1: Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and I love all things tech. And you know what, I was gonna make today's episode a one-parter, but it turns out there's just way too much stuff, not just about the topic at hand but about the various components that make up this topic, for me to do it in one. So this is gonna likely be a two-parter. But today I thought we could look back at the development and evolution of a famous AI personality. This virtual assistant celebrated an anniversary recently, and I must apologize for being a couple of days late with this, but this particular servant debuted on October fourth, two thousand eleven, technically for the second time, but the history of the actual technology dates back much further. And of course, I'm talking about Siri, Apple's virtual assistant that can interpret voice commands and return results based on them.

Speaker 1: This is not just some dull history lesson, however. Siri really has an incredible backstory, ranging from a science fiction vision of the future to a secret project intended to augment the decision making capabilities of the United States military. Yeah, Siri had a pretty tough background. The story of Siri is complicated, and not just because of the internal history of developing the technology, but also because the tool relies on a lot of converging technological trends. There are elements of voice recognition, speech to text, natural language interpretation, and other technologies that fall under the very broad umbrella of artificial intelligence. So get settled, it's time to talk about Siri. Also, if you're listening to this near Apple devices, I apologize, because there's a good chance those devices might start talking back at me. But I refuse to do an episode where I just refer to the subject as you know who.
Speaker 1: You could argue that the origins of Siri can be found in a promotional video that Apple produced back in nineteen eighty-seven to show off a concept of an artificially intelligent smart assistant. Now that alone is interesting, but what really is amazing is that the arbitrary date they chose as the setting for this video was two thousand eleven, probably September. We know that because there is a part within the video where a character asks for information that had been published five years previously, and the published information had a publication date of two thousand six. Now this means that the actual debut of Siri as an Apple product was just one month after the fictional events in that video from nineteen eighty-seven. That's just a coincidence, but it's a cool one.

Speaker 1: The Knowledge Navigator video shows a man walking into a study, a really nice one, and unfolding a tablet style computer device. Then he walks away to stare at stuff as a virtual assistant reads off his messages and the meetings on his calendar. The virtual assistant appears as a video in a little window on the screen of the tablet, and it's, you know, shot from the shoulders up, kind of the bust of a young man, and the video takes up that one little corner of the tablet device. So in this visualization, the virtual assistant isn't just a disembodied voice. It also has a face. Also, everyone in this video is extremely white, which I guess is kind of a given for the time period and the people involved, but it just comes across as so white. I mean, we're doing this with the benefit of the glasses of twenty twenty-one. I just wanted to throw that out there. Anyway, the video goes on to have the real life man, who is a professor in this video, ask his virtual assistant to pull up lecture notes and unread articles that relate back to the lecture. He's asking for the lecture notes of a lecture he gave a year ago.
Speaker 1: He's giving essentially the same lecture now, but he wants to update it with the latest information, and he even asks the virtual assistant to summarize those unread articles that had been published in the year since his last lecture. The virtual assistant is thus aggregating information, analyzing that information for context, and then delivering summaries, which is a pretty sophisticated set of artificially intelligent tasks. The professor also uses the device and the virtual assistant to call and collaborate with a peer in real time. Now, this was not the only video that Apple would produce to showcase this kind of general idea; however, it is arguably the most famous of those videos.

Speaker 1: Now, as I said, Knowledge Navigator came out of Apple, and Steve Jobs would later play a pivotal role in how the company would introduce Siri, but this was not a Steve Jobs project, because Jobs had been ousted from Apple, or he had quit in disgust, depending upon which version of the story you're listening to. Anyway, he had left a couple of years before this video was produced. The Knowledge Navigator was something that Apple CEO John Sculley had described in a book titled Odyssey. Now, of course, in science fiction stories we have no shortage of instances where a human is interacting with a computer or some otherwise artificially intelligent device like a robot, but the Knowledge Navigator seemed to lay down the foundations for future products like Siri and the iPad, not to mention the potential uses of the Internet, which in nineteen eighty-seven was definitely a thing. It existed, but most of the mainstream public remained unaware of it, because the World Wide Web wouldn't even come along for another few years. However, while you can look at this video and say, ah, this must be where Apple got the idea, and they probably got to work right away on Siri, well, you'd be wrong, because the early work, in fact the vast bulk of the work to bring Siri to life, didn't start at Apple at all.
Speaker 1: It didn't involve the company. So our story now turns to a very different organization, the Defense Advanced Research Projects Agency, better known as DARPA. Now this is part of the United States Department of Defense. Back in nineteen fifty eight, the then President of the United States, Dwight D. Eisenhower, authorized the foundation of this agency, though at the time it was called the Advanced Research Projects Agency, or ARPA. Defense would be added later. This agency would play a critical role in the evolution of technologies in the United States, and the mission of DARPA, and ARPA before it, is, quote, to make pivotal investments in breakthrough technologies for national security, end quote, and that wording is really precise. It's easy to imagine DARPA as being housed in some enormous underground bunker filled with scientists who are building out crazy devices like robo scorpions or a blender that can also teleport or something. But in reality, DARPA is more about funding research than conducting research. Now, don't get me wrong, the agency relies heavily on experts to evaluate proposals and consider to whom the agency should send money. But the purpose of DARPA is to enable others to do important work.

Speaker 1: DARPA has played a huge role in countless technological breakthroughs this way. Much of the technology that would go on to power the Internet started with ARPANET, a kind of precursor network to the Internet and one that was funded by ARPA, thus the name. The DARPA Grand Challenge helped get self-driving cars into gear, you know, pun intended. They also created difficult scenarios for humanoid robots to go through. That was a few years ago and was really cool. The competitions DARPA hosts have specific goals and metrics, and that guides the designers and engineers who are working on them as they build out technologies. It's good to define your goal. It really gives you focus when you're trying to develop the technology to meet that goal.
Speaker 1: Winning a challenge is a big deal, though the cash prize may not even cover the amount of money participants have spent on the development of those technologies, and there are entire businesses, or at least divisions within businesses, that can be born out of these challenges. The Grand Challenges are just one way DARPA encourages technological development. Often the agency will create a specific goal, such as the design of a robotic exoskeleton that can help, you know, US soldiers carry heavy loads while they are on foot for longer distances, and then they'll send out an RFP, which is a request for proposal. The agency considers the proposals that it receives from this RFP and then decides which, if any, it will accept and then fund. Then, after a given amount of time (you know, it's dependent upon the specific project), we find out if anything comes out of it. Sometimes nothing does, as some technological problems may prove more challenging than others and may require more time for the various technologies to evolve and make it possible. So it might push the field, but you might not have a finished product at the end of it. Other times you do get a finished product.

Speaker 1: Anyway, in two thousand three, a decade and a half after the Knowledge Navigator video came out of Apple, DARPA identified a new opportunity, and this was one that was born out of necessity. The challenge was that we have access to way more information today than we did in the past. So decades ago, military commanders had to make decisions based on limited information. They'd rely a great deal on their own expertise and experience in order to make up for the fact that they only had part of the picture. And while a great commander has a better chance of making the right call than an inexperienced commander would, the limited amount of information could still contribute to disaster.
Speaker 1: You might be the greatest commander of all time, but if you're lacking a key piece of information, you might make a decision that is terrible. So flash forward to two thousand three, and now the story had kind of flip-flopped. Now military commanders would receive more information than they could reasonably handle. The challenge wasn't to use intuition to make up for blind spots, but rather, how do you synthesize all this information so that you can make the right decision? Too much information was proving to be about as big a problem as too little information, at least in some cases, and so DARPA wished to fund the development of a smart system that could help commanders make sense of all the data coming in from day to day. Now, DARPA projects tend to be labyrinthine, with lots of bits and pieces, and a lot of different companies, research labs, and other organizations might tackle all or part of one of these projects. The cognitive computing section of DARPA had a program called Personalized Assistant that Learns, or PAL, which seems nice. It was this part of the program that would fund the development of a virtual cognitive assistant. The amount of funding was twenty two million dollars. What a great PAL.

Speaker 1: The organization that landed this deal was SRI International, itself an incredibly influential organization. It's a nonprofit scientific research institution. Originally it was called the Stanford Research Institute, because it was established by the trustees of Stanford University back in nineteen forty six, though the organization would separate from the university formally in the nineteen seventies and become a standalone, nonprofit scientific research lab. The organization has played a role in advancing materials science, developing liquid crystal displays, or LCDs, creating telesurgery implementations, and more. And now it was going to tackle DARPA's request for a cognitive computer assistant.
Speaker 1: SRI International created a project called the Cognitive Assistant that Learns and Organizes, or CALO. And this appears to be another case where they landed upon the acronym first and then worked backward, as CALO seems to come from the Latin word calonis, which means soldier's servant, and I probably mispronounced that, because even though I was a medievalist, it's almost criminal that I never took Latin. The concept, however, hearkens back to some of what we would see in that Knowledge Navigator video: a system that would be able to receive and interpret information, presumably from multiple sources, and provide a meaningful presentation, or even interpretation, of that data to humans, which is a pretty tall order.

Speaker 1: Let's break down a bit of what an assistant would need to do in order to accomplish this. We'll leave the voice activation part aside for now, as that would not be absolutely critical to make this work. You know, you might have a system that gives daily briefings on its own, or you might have one that you activate through text commands or some other user interface. It wouldn't necessarily have to be voice activated. But on the back end, what has to happen for this to work? Well, presumably such a system would need to pull in data from a number of disparate sources, so the assistant wouldn't just be reciting facts and figures that were coming from a centralized data server. Instead, it might be assimilating data from numerous sources into a cohesive presentation. On top of that, the data might be in different formats, meaning the system would need to be able to analyze the information inside different types of files. This isn't an easy thing to do. There's a reason we have a lot of specialized programs for working with specific types of files. When I put together these podcasts, I use a word processor for my notes, and I use an audio editing piece of software to record and edit the podcasts.
Speaker 1: Now, I need both of those programs because neither of them can do the job that the other one does. I don't have, like, an all-purpose program that does everything. Accessing different file formats, even in the same general family of applications, is tricky. Beyond that, the way information can be presented within each file could be very different. It's very possible for us to open up multiple spreadsheets, even using the same basic spreadsheet program, let's just say Excel. It's possible for us to open up half a dozen Excel spreadsheets that are all presenting the same information but doing so in different ways, and that might not be obvious at a casual glance. You might look at one and then the other and not immediately realize, oh, these are both saying the same thing. Just think about how information could be presented as a table or a graph or a chart. The AI assistant would ideally be able to access information no matter what format it was in, no matter what version of that format it was in, be able to interpret it, and then be able to deliver a meaningful analysis to the user. Now, as data sets grow, this becomes increasingly difficult, which I should point out is the whole reason DARPA wanted to fund research into this in the first place. Military commanders were faced with a growing mountain of information that was increasingly difficult to parse.

Speaker 1: The analysis might also need to incorporate natural language recognition features. And I've talked about natural language a lot in previous episodes, but if we boil it down, it's the language that we humans use to communicate with one another. It's our natural way of expressing our thoughts. But the way we humans process and communicate information is different from how machines do it. We can be subtle. We can use stuff like metaphors and allegories and just different phrasing. Computers are, you know, a lot more literal.
Speaker 1: Hey, if you break it down to the most basic unit of machine information, you know, the bit, you see how literal computers are. A bit is either a zero or a one, or if you prefer, it's either off and on, or no and yes. But using lots of bits, we can describe information in a way that provides more subtlety than just no or yes. My point is that computers don't naturally process information the way we do, and so an entire branch of artificial intelligence called natural language processing evolved to create ways for computers to interpret what we mean when we express things within natural language. Making this more complicated is that, of course, there's no one way to say any given thing. We've got lots of ways to express the same general thought. And added to that, we have lots of different languages. There are around seven thousand different languages spoken in the world today, though you could probably get away with a couple of dozen and cover the vast majority of the world's population that way. But these languages have their own vocabularies, their own syntaxes, their own expressions. So not only do we have multiple ways of saying things within one language, we have all these different languages to worry about.

Speaker 1: If you were to send ten people into a room with an AI assistant, and those ten people have a task they're supposed to perform with the help of this AI assistant, odds are no two people are going to go about it exactly the same way. And yet a working virtual assistant needs to be able to interpret and respond to every case, and do so reliably. On the back end, an AI system needs to be able to interpret data coming from different sources that may have very different ways of expressing similar ideas. This is an enormous task. Now, when we come back, I'll talk more about what SRI was doing and how the military project would evolve ultimately into Apple's personal assistant. But first let's take a quick break.
Speaker 1: Now, I've only scratched the surface of what the creation of an AI assistant capable of accessing information from numerous sources and making that information useful really requires. Let's talk a bit about the parameters of this project itself. So if you remember, I said that the deal was initially for twenty two million dollars, and that would end up funding the creation of a five hundred person project, and the project spanned five years initially to investigate the possibility of building out such an AI system. Over time, more money would end up going into the research, and it totaled around a hundred fifty million dollars by the end of the project. The lab where it all went down would receive the charming nickname Nerd City. A large part of the project focused on creating a program that could learn a user's behaviors. So not only could this personal assistant respond to what you were asking, it would gradually learn the way you behaved, and it would adapt to you to work more effectively.

Speaker 1: Now this comes into the arena of pattern recognition. We humans are pretty darn good at recognizing patterns. In fact, we're so good that sometimes we will, quote unquote, recognize a pattern even when there isn't a pattern there. In some cases, this can come across as charming, such as when we see a face in a cloud. Right? There's not really a pattern there. We're recognizing a pattern where none really exists. It's all based on our perspective and our imaginations. Now, in other cases, it's not so charming. It can actually lead to faulty reasoning. So I'm going to give you a very basic example that I hear all the time, particularly now that we're in October and there's some full moon weirdness going on. There's a fairly widespread belief that there's a connection between full moons and an increase in the number of medical emergencies that happen.
Speaker 1: Generally speaking, the belief is that people act irresponsibly during a full moon, and that often results in injury, which means greater activity at hospitals. Now, this belief is most likely due to confirmation bias. That is, we already have a belief in place, and the belief is that full moons lead to more accidents because of people acting irresponsibly. That is what we believe. It doesn't have evidence yet. And then, when things do get busy at a hospital and there happens to be a full moon, we register that as evidence for our belief. Aha, says the mistaken person, the full moon explains it. However, on nights when it is busy but there is no full moon, there's no hit. No one takes notice of how odd it is that it's crazy busy but there's no full moon tonight. We don't do that. Likewise, if it happens to not be busy but there is a full moon, you're also not likely to notice. You're not likely to say, huh, it's not very busy tonight, but there's a full moon out. So it's only when you have the full moon and the busy hospital that the evidence appears to support your belief and confirm your bias. But in truth, when you take a step back and do an objective study, and you look at the times when a hospital is busy, and you look at when there was a full moon, and you look to see if there's any correlation, it falls apart.

Speaker 1: Now, I got a little off track there, but the point I wanted to make is that we humans are biologically attuned to recognizing patterns. It's very likely that pattern recognition is one of the traits that really helped us survive thousands of years ago, which is why it's so intrinsic to the human experience. But building programs, computer systems, that are capable of identifying patterns and separating out what is signal versus what is noise is its own really big challenge.
Speaker 1: SRI was hoping to create a program that could look for patterns in user behavior in order to respond with greater precision and accuracy to user requests, and ultimately to anticipate future requests. Now, we see this sort of pattern recognition and response in lots of technology today. There are several smart thermostats on the market right now, for example, that can track when you tend to raise or lower the temperature in your home, and after a while the thermostat learns that, hey, maybe you like it nice and chilly at night, but you want it to be warm and toasty in the morning, and so the thermostat begins to adjust itself in preparation for that based on your previous behaviors. Now, that is a very simple example. Extrapolate that out and you begin to imagine a technology that is anticipating what you need or want, perhaps before you're even aware of it yourself, which can get kind of creepy but also sort of magical. But in truth, it's because the system is detecting patterns that we aren't even able to recognize ourselves. The danger there, of course, is that the systems can sometimes mistakenly identify a pattern when in fact there's not really a pattern there, very similar to the case I was explaining with the full moon and the busy hospital. Even computer systems can make those sorts of mistakes, and depending upon the implementation, that can be a real problem. But that's an issue for a different podcast. Now, when it comes to humans, pattern recognition is so ingrained in most of us that it can actually be kind of hard to explain.
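To make the smart thermostat example from a moment ago a little more concrete, here is a rough sketch of that style of behavior learning. Everything in it, the class name, the three-adjustment threshold, and the plain averaging rule, is invented purely for illustration; real thermostats (and certainly CALO) use far more sophisticated models.

```python
# Hypothetical sketch of a thermostat that learns a user's temperature pattern.
# All names and numbers here are made up for illustration.
from collections import defaultdict

class LearningThermostat:
    def __init__(self, default_temp=70.0):
        self.default_temp = default_temp
        self.history = defaultdict(list)  # hour of day -> temperatures the user chose

    def record_manual_change(self, hour, chosen_temp):
        """Remember what the user set the thermostat to at this hour."""
        self.history[hour].append(chosen_temp)

    def target_for(self, hour):
        """Once a pattern shows up (three or more manual changes at an hour), anticipate it."""
        past = self.history[hour]
        if len(past) >= 3:
            return sum(past) / len(past)
        return self.default_temp

# After a few chilly nights' worth of manual changes...
t = LearningThermostat()
for temp in (64, 63, 65):
    t.record_manual_change(hour=23, chosen_temp=temp)

print(t.target_for(23))  # 64.0: starts cooling at night without being asked
print(t.target_for(7))   # 70.0: no pattern spotted yet, so stick with the default
```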
Speaker 1: You notice when something happens, and if that same thing happens later with the same general results as the first time, it reinforces your first perception of that thing. And if it happens over and over, your brain essentially comes to understand that when I see X happen, I can expect Y to follow, and from that you might eventually realize that there are other correlating factors that may or may not be present when this goes on. With computers, the goal is to create systems that can analyze input, whether that input is an image file or typed text or spoken words or whatever. The system first has to interpret that input; it has to identify it and figure out the defining features and attributes of that input, then compare that against known patterns to see if the input matches or doesn't match those patterns. And in a way, you can think of this as a computer system receiving input and asking the question, have I seen this before? And if so, what is the correct response? If the input matches no pattern, the system then has to have the correct response for that too. A very simple example might just be a failed state, in which case the virtual assistant might reply with something like, I'm sorry, I don't know how to do that yet, or something along those lines.

Speaker 1: Now, remember earlier I mentioned that we humans have a lot of different ways to say the same general thing. For example, with my smart speaker, I might ask it to turn the lights on full, meaning I want them to be all the way up. I might say make the lights full. I might just say make it brighter. And the system has to take this input, analyze it, and make a statistical determination to guess at what it is I actually want to have happen.
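Purely to illustrate that kind of statistical guessing, here is a tiny Python sketch. The intent names, the example phrases, and the crude word overlap score are all invented for this write-up; this is not how Siri, CALO, or any real assistant actually scores requests.

```python
# Hypothetical sketch: guessing the most likely intent behind a spoken request.
# Intent names, example phrases, and the scoring rule are all made up.

def score(utterance, example_phrases):
    """Crude similarity: best fraction of an example phrase's words found in the utterance."""
    words = set(utterance.lower().split())
    best = 0.0
    for phrase in example_phrases:
        phrase_words = phrase.lower().split()
        overlap = sum(1 for w in phrase_words if w in words) / len(phrase_words)
        best = max(best, overlap)
    return best

INTENTS = {
    "raise_brightness": ["turn the lights on full", "make the lights full", "make it brighter"],
    "lower_brightness": ["dim the lights", "make it darker"],
    "lights_off":       ["turn the lights off", "lights out"],
}

def guess_intent(utterance):
    # Score every option, then go with the most probable one. It is still a guess.
    best_score, best_name = max((score(utterance, ex), name) for name, ex in INTENTS.items())
    if best_score == 0.0:
        return "fallback"  # the "I don't know how to do that yet" failed state
    return best_name

print(guess_intent("hey, make it brighter please"))  # raise_brightness
print(guess_intent("order me a pizza"))              # fallback
```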
Speaker 1: I say guess because in each case we're really looking at a system that has multiple options when it comes to a response, and each option gets a probability assigned to it based on how closely that option matches the input. So I might say make it brighter, and the underlying system recognizes that there's a strong chance I mean increase the brightness of the lights in the room I'm in, and the system has determined that that's the most probable answer. Right? It's probably correct, so it goes with that, but it's still kind of a guess. Now, there are a lot of different ways to go about doing this, but the one you hear about a lot would be artificial neural networks. I've talked a lot about these in recent episodes, so we'll just give kind of the quick overview.

Speaker 1: So you've got a computer system that has artificial neurons. These are called nodes, and the job of a node is to accept incoming input from two or more sources. The node then performs an operation on those inputs, and then it generates an output, which it passes on to other nodes further along in the system. You can think of the nodes as existing in a series of levels, with the top level being where input comes in and the bottom level being where the ultimate output comes out. So the nodes a level down accept the incoming inputs, perform their own operations on them, and pass the results further down the chain, and so on, until ultimately you get an output, or response. Now, that's a gross oversimplification of what's going on, but generally you get the idea of the process. Now, let's complicate things a little bit. To get these sorts of neural networks to generate the results you want, one thing you can do is adjust how each node values, or weighs, each of the inputs coming into that node. So I'm going to use some names, human names, for nodes here just to make things easier to understand. Let's say we've got a node named Billy.
Speaker 1: Billy is on the second layer of nodes, so it's one layer down from where direct input comes into the system. So there are nodes above Billy that are sending information to Billy. We'll say that the two nodes that give Billy information are named Sue and Jim Bob. Sue and Jim Bob send Billy information, and it's Billy's job to determine what further information to send down the pipeline. Like, I need to do an operation based on these bits of information that are coming to me, and then I have to come up with a result. Only Billy has been told that Sue's information tends to be a little more important than Jim Bob's information is, and so if there's a question as to what to do, it's better to lean more on Sue's information than on Jim Bob's information. We would call this weighting, as in W E I G H T I N G. Computer scientists weight the inputs going into nodes in order to train a system to generate the results the scientists want. One way to do this is through a process called back propagation.

Speaker 1: Back propagation is when you know what result you want the system to arrive at. So let's use the classic example of identifying pictures that have cats in them. As a human, you can quickly determine if a photo has a cat in it or not. You'll spot it right away. So you feed a picture through this system and you wait for the system to tell you if, yes, there's a kitty cat in the picture, or no, the image is cat free. And let's say that the picture you fed to the system in fact does have a cat in it. You can see it. But when you feed it through the system, the system fails to find the cat and says, nope, there's no cat here. Well, you know that the system got it wrong.
Speaker 1: So what you might do as a computer scientist is you look at that final level of nodes, right at the output level, to see which factors led those nodes to come to the conclusion that there was no cat in the photo. You then look at the inputs that are coming into those nodes and you see how they are weighted, and you change the weights of those inputs in order to force that last level of nodes to say, oh, no, there definitely is a cat here. And so on. You move up from the output level, going level by level, tweaking the weightings of the incoming data so that the system is tuned to more accurately determine if a photo has a cat in it. Now, this takes a lot of work, and it also means using huge data sets. You know, you're feeding in hundreds of thousands or millions of images, some of them with cats, some of them without, and training the system over and over again before you start feeding it brand new images to see if it still works. And this can be a laborious process, training a machine learning system, but the result is that you end up with a system that hopefully is pretty accurate at doing whatever it was you were training it to do, you know, like recognizing cats.

Speaker 1: But that's just one approach to machine learning. There are others. Some, like the version I just described, fall into a broad category called supervised learning. Others are in unsupervised learning. In fact, CALO was largely built through unsupervised learning, meaning the machine had to train itself as it performed tasks, using inputs that hadn't been curated specifically for training purposes. It's just an enormous amount of information coming in that the system has to process. So, in other words, CALO wasn't dealing with, like, a stack of a million photos, some of which had cats and some of which didn't. CALO was working with real world information and attempting to suss out what to do with it in real time.
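Here is a minimal Python sketch of the weighted nodes plus back propagation idea described above, in the supervised, cat-or-no-cat style of training. The two made-up features, the tiny network size, the learning rate, and the number of training steps are all invented for illustration; this is not CALO's code or any production image classifier, just the basic mechanics of nudging weights until the outputs line up with the known labels.

```python
# Toy supervised learning sketch: weighted nodes trained with back propagation.
# The "images" are just two invented features (say, a pointy-ear score and a
# whisker score); label 1 means "cat", label 0 means "no cat".
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])  # four toy examples
y = np.array([[1.0], [1.0], [0.0], [0.0]])                      # known correct answers

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two levels of nodes. Each weight says how much a node leans on one of its
# inputs, like Billy leaning more on Sue than on Jim Bob.
W1 = rng.normal(size=(2, 3))  # input features -> three hidden nodes
W2 = rng.normal(size=(3, 1))  # hidden nodes   -> one output node

learning_rate = 1.0
for step in range(2000):
    # Forward pass: inputs flow down through the levels of nodes.
    hidden = sigmoid(X @ W1)
    out = sigmoid(hidden @ W2)

    # Backward pass: start at the output level, see how wrong it was, and push
    # that error back up a level, adjusting the weights as you go.
    err_out = (out - y) * out * (1 - out)
    err_hidden = (err_out @ W2.T) * hidden * (1 - hidden)
    W2 -= learning_rate * hidden.T @ err_out
    W1 -= learning_rate * X.T @ err_hidden

# After training, the outputs should sit near 1 for the cat rows and near 0 otherwise.
print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))
```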
Speaker 1: Now, to go into how unsupervised machine learning works would require a full episode on its own, but it is a fascinating and complicated subject, so I probably will tackle it at some point. I'm just gonna spare you guys for right now. The real point I'm making is that SRI International spent years building out systems that could do a wide range of tasks based on inputs. Pattern recognition was actually just one relatively small piece of that. Creating the ability to pull data from different sources in a meaningful way is its own incredibly challenging problem, as I alluded to earlier. Particularly as the number of sources you're pulling from and the variety of formats the data is in begin to increase, it becomes easier for the system to make mistakes as you throw more variety at it, and it requires a lot of refinement. Frankly, it's actually a task that's so big I have trouble grasping it.

Speaker 1: The CALO project became the largest AI program in history up to that point. It was an incredible achievement. It brought together different disciplines of artificial intelligence into a cohesive project with a solid goal. By the two thousands, artificial intelligence was a sprawling collection of computer science disciplines, each with incredible depth to them. So you might find an expert in one field of AI who would have little to no experience with another branch under the same general discipline of artificial intelligence. There was a prevailing feeling that the various branches of AI had each become so complex they would never work together. The CALO project proved that wrong. When we come back, I'll explain how part of this military project would break away to become the virtual assistant, ultimately finding its way onto iOS devices. But first let's take another quick break.
Adam Cheyer, whose name I'm likely mispronouncing, 562 00:36:19,000 --> 00:36:21,880 Speaker 1: and I apologize, was an engineer at 563 00:36:22,000 --> 00:36:24,480 Speaker 1: SRI working on CALO, and he worked with a 564 00:36:24,560 --> 00:36:27,839 Speaker 1: team that had the daunting task of assimilating the work 565 00:36:28,040 --> 00:36:31,720 Speaker 1: that was being done by twenty-seven different engineering teams 566 00:36:32,440 --> 00:36:36,839 Speaker 1: into a cohesive virtual assistant. So, as I mentioned just 567 00:36:36,960 --> 00:36:40,000 Speaker 1: before the break, the disciplines of AI had each gotten 568 00:36:40,160 --> 00:36:45,000 Speaker 1: very deep, very broad, and required a lot of specialization. 569 00:36:45,320 --> 00:36:48,759 Speaker 1: So you have these different engineering teams working within various disciplines, 570 00:36:49,280 --> 00:36:52,399 Speaker 1: and it was Cheyer's team that needed to bring all 571 00:36:52,400 --> 00:36:56,040 Speaker 1: these together and make it into a working, coherent whole. 572 00:36:56,560 --> 00:36:59,880 Speaker 1: The results were really phenomenal. Now I'll give you a 573 00:37:00,040 --> 00:37:04,799 Speaker 1: hypothetical use for CALO. Let's say that you've got a 574 00:37:04,800 --> 00:37:08,640 Speaker 1: project team and there are ten people on your team, 575 00:37:08,760 --> 00:37:12,520 Speaker 1: including you, and let's say there's a meeting that's on 576 00:37:12,560 --> 00:37:16,879 Speaker 1: the books for tomorrow morning in a particular conference room, 577 00:37:16,920 --> 00:37:19,400 Speaker 1: and it's supposed to be a status update meeting for 578 00:37:19,440 --> 00:37:22,840 Speaker 1: the project. It turns out that two out of the 579 00:37:22,920 --> 00:37:25,360 Speaker 1: ten people on your team are no longer able to 580 00:37:25,440 --> 00:37:29,960 Speaker 1: make the meeting due to last minute high priority conflicts, 581 00:37:30,040 --> 00:37:33,359 Speaker 1: so they've had to cancel out of the meeting. CALO 582 00:37:33,440 --> 00:37:36,319 Speaker 1: would be able to detect the change in status of 583 00:37:36,360 --> 00:37:38,799 Speaker 1: those two people and say, all right, these two are 584 00:37:38,880 --> 00:37:42,640 Speaker 1: no longer going to the meeting. Then CALO could determine 585 00:37:42,719 --> 00:37:46,200 Speaker 1: how important those two people were to the overall team, 586 00:37:46,320 --> 00:37:49,719 Speaker 1: essentially saying what are their roles? What role are 587 00:37:49,719 --> 00:37:53,080 Speaker 1: they performing within the context of this team, and is 588 00:37:53,120 --> 00:37:56,200 Speaker 1: it a critical role for this meeting? It could also 589 00:37:56,200 --> 00:37:58,680 Speaker 1: look at the importance of the meeting itself, like, oh, well, 590 00:37:58,719 --> 00:38:01,440 Speaker 1: this is a status update, so it's really just to 591 00:38:01,520 --> 00:38:04,600 Speaker 1: keep the team, you know, informed of what's going on. 592 00:38:05,440 --> 00:38:08,120 Speaker 1: It's not a mission critical type of meeting. It could 593 00:38:08,120 --> 00:38:11,160 Speaker 1: take all that into account.
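Here is a toy sketch, in Python, of the kind of judgment being described: gather who dropped out, how critical their roles are, and how important the meeting is, then decide whether to keep, cancel, or rebook, which is what the next part walks through. Every field, threshold, and rule here is invented for illustration; CALO's real reasoning was far more sophisticated than a handful of if-statements.

```python
# A toy sketch of the keep/cancel/rebook judgment described above.
# All names, roles, and rules are invented for illustration -- they are
# not CALO's actual data model or decision logic.
from dataclasses import dataclass

@dataclass
class Attendee:
    name: str
    role: str        # e.g. "lead", "approver", "contributor"
    attending: bool

def decide(meeting_type: str, attendees: list) -> str:
    dropped = [a for a in attendees if not a.attending]
    critical_missing = any(a.role in ("lead", "approver") for a in dropped)

    if not dropped:
        return "keep meeting as scheduled"
    if meeting_type == "status update" and not critical_missing:
        # Low-stakes meeting, non-critical absences: go ahead, brief the absent.
        return "keep meeting; send updates to absent attendees"
    # Critical people missing or a high-stakes meeting: rebook instead
    # (check calendars, reserve a room, resend invites -- stubbed out here).
    return "cancel and rebook: check calendars, reserve room, resend invites"

team = [Attendee("A", "lead", True),
        Attendee("B", "contributor", False),
        Attendee("C", "contributor", False)] + [
        Attendee(f"P{i}", "contributor", True) for i in range(7)]

print(decide("status update", team))  # -> keep meeting; send updates ...
```

The interesting part is not the rule itself but that the system makes the call on its own, without anyone asking it to.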
Then CALO could make a 594 00:38:11,160 --> 00:38:14,359 Speaker 1: determination on its own about whether or not it should keep 595 00:38:14,400 --> 00:38:17,239 Speaker 1: the meeting in place and go ahead just without those 596 00:38:17,280 --> 00:38:20,720 Speaker 1: two people, maybe just sending updates to those two people, 597 00:38:21,320 --> 00:38:24,960 Speaker 1: or cancel the meeting entirely, notifying all the participants 598 00:38:25,000 --> 00:38:28,600 Speaker 1: about it, then look at the different calendars of those participants, 599 00:38:28,920 --> 00:38:33,040 Speaker 1: book a new meeting, including securing a space for that 600 00:38:33,160 --> 00:38:36,799 Speaker 1: meeting, and send out new invites. It would even be 601 00:38:36,880 --> 00:38:39,400 Speaker 1: able to look at the purpose of the meeting and 602 00:38:39,480 --> 00:38:43,279 Speaker 1: flag information that's relevant to that meeting, essentially creating a 603 00:38:43,320 --> 00:38:47,640 Speaker 1: sort of meeting dossier on demand. So it's really, you know, 604 00:38:47,760 --> 00:38:53,000 Speaker 1: incredibly sophisticated stuff. Now, that was the fully fledged CALO, 605 00:38:53,800 --> 00:38:58,000 Speaker 1: but an offshoot of this project, or maybe it's 606 00:38:58,040 --> 00:39:00,480 Speaker 1: better to say it was a smaller sister project that 607 00:39:00,520 --> 00:39:02,960 Speaker 1: existed at the same time; it launched in two thousand three 608 00:39:03,000 --> 00:39:07,440 Speaker 1: along with CALO. This other one was called Vanguard, at 609 00:39:07,480 --> 00:39:10,000 Speaker 1: least within SRI, and it was taking a 610 00:39:10,000 --> 00:39:15,000 Speaker 1: more scaled down approach of building out an assistant and 611 00:39:15,040 --> 00:39:19,280 Speaker 1: looking at how it could be useful on mobile devices. Now, again, 612 00:39:19,280 --> 00:39:22,319 Speaker 1: this was in two thousand three, before smartphones would really 613 00:39:22,360 --> 00:39:26,120 Speaker 1: become a mainstream product, because Apple wouldn't even introduce the 614 00:39:26,160 --> 00:39:29,440 Speaker 1: iPhone until two thousand seven. But SRI was 615 00:39:29,480 --> 00:39:32,920 Speaker 1: working on implementations of a more limited virtual assistant and 616 00:39:32,960 --> 00:39:36,880 Speaker 1: then showing it off to companies like Motorola. One person 617 00:39:37,160 --> 00:39:40,840 Speaker 1: at Motorola who was really impressed with this work was 618 00:39:40,880 --> 00:39:45,319 Speaker 1: a guy named Dag Kittlaus. Kittlaus attempted to convince his 619 00:39:45,360 --> 00:39:49,239 Speaker 1: superiors at Motorola that Vanguard was a really important piece 620 00:39:49,280 --> 00:39:53,200 Speaker 1: of work, but he didn't find any real interest over 621 00:39:53,320 --> 00:39:57,279 Speaker 1: at Motorola, so he did something fairly brazen. In two 622 00:39:57,280 --> 00:40:00,600 Speaker 1: thousand seven, he quit his job at Motorola and he 623 00:40:00,719 --> 00:40:04,800 Speaker 1: joined SRI International with the intent of exploring ways to 624 00:40:04,960 --> 00:40:09,080 Speaker 1: spin off a new business that would develop an implementation 625 00:40:09,280 --> 00:40:14,480 Speaker 1: of the CALO and Vanguard virtual assistant work, but for the consumer market.
626 00:40:15,040 --> 00:40:19,080 Speaker 1: The result would be a new company called Siri, 627 00:40:19,120 --> 00:40:21,799 Speaker 1: S-I-R-I, which is kind of the way you 628 00:40:21,800 --> 00:40:24,840 Speaker 1: would say SRI if you were trying to 629 00:40:24,880 --> 00:40:27,480 Speaker 1: pronounce it as if it were an acronym as opposed 630 00:40:27,560 --> 00:40:32,280 Speaker 1: to an initialism. Adam Cheyer, after some convincing from Kittlaus, 631 00:40:32,840 --> 00:40:36,480 Speaker 1: joined the venture as the vice president of engineering. 632 00:40:36,600 --> 00:40:40,320 Speaker 1: Kittlaus would be the CEO. Tom Gruber, who had studied 633 00:40:40,320 --> 00:40:44,120 Speaker 1: computer science at Stanford and then pioneered work in various 634 00:40:44,160 --> 00:40:48,160 Speaker 1: fields of artificial intelligence, would become the chief technology officer 635 00:40:48,360 --> 00:40:53,640 Speaker 1: for the company. Interestingly, the Siri team didn't initially call 636 00:40:53,920 --> 00:41:00,000 Speaker 1: their own virtual assistant project Siri. Instead, the new spinoff company, 637 00:41:00,520 --> 00:41:04,960 Speaker 1: Siri, would call their virtual assistant HAL, H-A-L, 638 00:41:05,440 --> 00:41:08,719 Speaker 1: after the AI system in the book and film two 639 00:41:08,800 --> 00:41:11,960 Speaker 1: thousand one. They did take an extra step to reassure 640 00:41:12,000 --> 00:41:15,719 Speaker 1: people that this time HAL would behave itself. So, if 641 00:41:15,719 --> 00:41:18,480 Speaker 1: you're not familiar with the story of two thousand one, 642 00:41:19,040 --> 00:41:24,239 Speaker 1: the artificially intelligent computer system HAL begins to malfunction and 643 00:41:24,280 --> 00:41:26,880 Speaker 1: begins to interpret its mission in such a way that 644 00:41:27,000 --> 00:41:29,920 Speaker 1: it compels it to start killing off the crew inside 645 00:41:29,920 --> 00:41:33,560 Speaker 1: a spacecraft, kind of a worst case scenario with AI. 646 00:41:34,200 --> 00:41:37,480 Speaker 1: While Siri began to get off the ground, it was 647 00:41:37,560 --> 00:41:41,759 Speaker 1: licensing technologies from SRI to power the virtual assistant, 648 00:41:42,120 --> 00:41:44,839 Speaker 1: and it also began to hire the talent needed to 649 00:41:44,960 --> 00:41:48,799 Speaker 1: bring this idea to life. At the same time, Apple 650 00:41:49,160 --> 00:41:52,319 Speaker 1: was pushing the smartphone industry into the limelight with the 651 00:41:52,360 --> 00:41:54,880 Speaker 1: introduction of the first iPhone. This was all happening in 652 00:41:54,920 --> 00:41:58,200 Speaker 1: two thousand seven. It was clear that the push for 653 00:41:58,280 --> 00:42:01,480 Speaker 1: a virtual assistant was coming at just the right time, 654 00:42:01,600 --> 00:42:06,880 Speaker 1: as Apple's implementation of smartphone technology was a grand slam 655 00:42:06,920 --> 00:42:11,040 Speaker 1: home run, to use a sports analogy. It soon became 656 00:42:11,080 --> 00:42:14,239 Speaker 1: obvious that the future of computing was going to be, 657 00:42:14,320 --> 00:42:18,480 Speaker 1: at least in large part, mobile. That in turn opened 658 00:42:18,520 --> 00:42:21,640 Speaker 1: up opportunities to create new ways to interact with mobile 659 00:42:21,640 --> 00:42:24,560 Speaker 1: devices in order to do the stuff we needed to 660 00:42:24,640 --> 00:42:28,280 Speaker 1: do.
Now, it's obvious to say this, but mobile devices 661 00:42:28,320 --> 00:42:32,200 Speaker 1: have a very different user interface from your typical computer. 662 00:42:32,600 --> 00:42:35,760 Speaker 1: Interacting with a handheld computer by tapping on a screen 663 00:42:35,960 --> 00:42:40,880 Speaker 1: or talking to it creates different opportunities for crafting experiences 664 00:42:41,280 --> 00:42:44,520 Speaker 1: than someone sitting down to a computer with a keyboard 665 00:42:44,520 --> 00:42:48,799 Speaker 1: and mouse. There's a potential need for a voice activated 666 00:42:48,880 --> 00:42:51,760 Speaker 1: personal assistant that could help you carry out your tasks, 667 00:42:51,800 --> 00:42:56,720 Speaker 1: particularly ones that might need multiple steps. Siri the company 668 00:42:57,000 --> 00:43:00,160 Speaker 1: came along just as the need for Siri the app 669 00:43:00,360 --> 00:43:03,040 Speaker 1: was beginning to take shape, so it was the right 670 00:43:03,080 --> 00:43:07,280 Speaker 1: place at the right time. In two thousand seven, Apple 671 00:43:07,360 --> 00:43:10,960 Speaker 1: had not yet opened up the opportunity for independent app 672 00:43:11,000 --> 00:43:15,200 Speaker 1: developers to submit apps for the iPhone. That wouldn't actually 673 00:43:15,200 --> 00:43:18,160 Speaker 1: happen until July tenth, two thousand eight, essentially a year 674 00:43:18,200 --> 00:43:21,960 Speaker 1: after the iPhone had debuted. The Siri team was still 675 00:43:22,360 --> 00:43:25,600 Speaker 1: hard at work building out the virtual assistant app they 676 00:43:25,600 --> 00:43:28,719 Speaker 1: had in mind in two thousand and eight. While they 677 00:43:28,760 --> 00:43:32,440 Speaker 1: were licensing technology from SRI International, you know, 678 00:43:32,480 --> 00:43:35,839 Speaker 1: from the Vanguard and the CALO projects, they still 679 00:43:35,880 --> 00:43:38,120 Speaker 1: had to build out the systems that would actually power 680 00:43:38,200 --> 00:43:42,640 Speaker 1: Siri on the back end. Generally speaking, their approach was 681 00:43:42,719 --> 00:43:45,560 Speaker 1: to create an app where a person could ask Siri a 682 00:43:45,680 --> 00:43:49,319 Speaker 1: question and the app would record that request as a 683 00:43:49,360 --> 00:43:53,000 Speaker 1: little audio file, send that audio file to a server 684 00:43:53,160 --> 00:43:55,879 Speaker 1: in a data center, and the first step then would 685 00:43:55,920 --> 00:44:00,200 Speaker 1: be to transcribe the audio file into text, so we're 686 00:44:00,200 --> 00:44:03,479 Speaker 1: talking about speech to text here. Then the system would 687 00:44:03,480 --> 00:44:07,400 Speaker 1: need to parse the request. What is actually being asked here? 688 00:44:07,480 --> 00:44:11,719 Speaker 1: What is the command or request saying? Now, in some systems, 689 00:44:12,080 --> 00:44:15,440 Speaker 1: a computer will break down a sentence into its various components, 690 00:44:15,480 --> 00:44:19,000 Speaker 1: you know, a subject, verb, and object, and then try 691 00:44:19,080 --> 00:44:22,560 Speaker 1: to figure out what is actually being said. Adam Cheyer 692 00:44:22,680 --> 00:44:26,759 Speaker 1: took a different approach with his team. They taught their 693 00:44:26,800 --> 00:44:31,399 Speaker 1: system the meaning of real world objects.
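That round trip, record the request as audio, ship it to a server, transcribe it to text, then work out what is being asked, is easier to picture as a sketch. Here is a rough outline in Python; every function here is a stand-in for a much larger system, not Siri's actual back end or any real API.

```python
# A rough sketch of the request pipeline described above. Each function
# is a placeholder for a much bigger component -- the point is the shape
# of the flow, not how Siri actually implemented it.

def record_audio() -> bytes:
    # On a real phone this would capture microphone input.
    return b"fake-audio-bytes-for: where can I get linguini"

def speech_to_text(audio: bytes) -> str:
    # A real system would run a speech recognizer here.
    return "where can I get linguini"

def parse_request(text: str) -> dict:
    # Turn the transcript into something the back end can act on.
    return {"intent": "find_food", "dish": "linguini"}

def handle_on_server(audio: bytes) -> dict:
    text = speech_to_text(audio)    # step 1: speech to text
    return parse_request(text)      # step 2: figure out what's being asked

if __name__ == "__main__":
    print(handle_on_server(record_audio()))
```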
So, rather than 694 00:44:31,480 --> 00:44:34,760 Speaker 1: trying to parse out what a sentence meant by first 695 00:44:34,880 --> 00:44:38,760 Speaker 1: figuring out what's the subject, what's the verb, and what's 696 00:44:38,800 --> 00:44:42,560 Speaker 1: the object that the subject is acting upon, Siri started 697 00:44:42,560 --> 00:44:46,040 Speaker 1: off by looking at real world concepts within the request. 698 00:44:46,719 --> 00:44:50,319 Speaker 1: Siri would then map the request against a list of 699 00:44:50,400 --> 00:44:55,480 Speaker 1: possible responses and then employ that statistical probability model that 700 00:44:55,560 --> 00:44:59,120 Speaker 1: I mentioned earlier. What are the odds that someone was 701 00:44:59,160 --> 00:45:02,960 Speaker 1: asking for directions to an Italian restaurant versus asking 702 00:45:03,040 --> 00:45:06,640 Speaker 1: Siri to provide a recipe for an Italian dish, for example? 703 00:45:07,120 --> 00:45:10,439 Speaker 1: So if I activate my virtual assistant and say I 704 00:45:10,520 --> 00:45:15,279 Speaker 1: want linguini, that's a pretty broad thing to say, right? 705 00:45:15,440 --> 00:45:17,799 Speaker 1: The app has to guess at whether I mean I 706 00:45:17,880 --> 00:45:21,719 Speaker 1: want to go someplace that serves linguini or I want 707 00:45:21,719 --> 00:45:25,080 Speaker 1: to make it myself. Now, my personal app would have 708 00:45:25,200 --> 00:45:29,000 Speaker 1: learned from my behaviors that I am very lazy and 709 00:45:29,000 --> 00:45:31,960 Speaker 1: would realize that I am actually asking for someone to 710 00:45:32,000 --> 00:45:35,880 Speaker 1: bring me linguini. So there's no doubt Siri would return 711 00:45:35,920 --> 00:45:39,160 Speaker 1: results of Italian restaurants that deliver in response to 712 00:45:39,160 --> 00:45:42,359 Speaker 1: my request. And keep in mind, Siri was intended to 713 00:45:42,440 --> 00:45:45,319 Speaker 1: learn from user behaviors and tune itself to those 714 00:45:45,360 --> 00:45:50,520 Speaker 1: behaviors over time. Beyond that, Siri would pull information from 715 00:45:50,600 --> 00:45:54,320 Speaker 1: multiple sources to provide results. So if I asked about 716 00:45:54,320 --> 00:45:57,960 Speaker 1: a restaurant, Siri would provide all sorts of data about 717 00:45:58,040 --> 00:46:01,440 Speaker 1: the restaurant, from user reviews, to directions to the restaurant, 718 00:46:01,520 --> 00:46:04,640 Speaker 1: to menu items, to what price range I might expect 719 00:46:05,160 --> 00:46:08,440 Speaker 1: at that place. Siri could also tap into other stuff 720 00:46:08,480 --> 00:46:12,680 Speaker 1: like the phone's location, and thus give relevant answers based 721 00:46:12,719 --> 00:46:15,640 Speaker 1: on my location, so I wouldn't have to worry about 722 00:46:15,680 --> 00:46:19,000 Speaker 1: getting irrelevant search results if I happened to be far 723 00:46:19,120 --> 00:46:23,359 Speaker 1: from home, right? Siri wouldn't suggest that I go and 724 00:46:23,440 --> 00:46:25,480 Speaker 1: get food from a place that's right down the street 725 00:46:25,480 --> 00:46:28,320 Speaker 1: from my house in Atlanta while I happen to be 726 00:46:28,360 --> 00:46:31,719 Speaker 1: in New York City, for example. The team also gave 727 00:46:31,800 --> 00:46:35,680 Speaker 1: Siri a bit of an attitude. Siri could be sassy 728 00:46:35,840 --> 00:46:38,279 Speaker 1: and had a bit of a potty mouth.
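That concept-matching and probability-ranking step is easier to see with a toy example. Here is a rough sketch of ranking the possible meanings of "I want linguini," with learned user behavior and the phone's location tipping the scores. The intents, numbers, and adjustments are all invented for illustration; they are not Siri's actual models.

```python
# A toy illustration of ranking candidate interpretations of a request,
# then letting learned user behavior and location tip the scales.
# Every intent name and score here is made up for illustration.
def rank_interpretations(text: str, user_history: dict, near_home: bool) -> list:
    # Baseline scores for what "I want linguini" usually means.
    candidates = {
        "find_restaurant_delivery": 0.30,
        "directions_to_restaurant": 0.40,
        "show_recipe": 0.30,
    }
    # A user who almost always orders delivery shifts the odds that way.
    if user_history.get("orders_delivery_often"):
        candidates["find_restaurant_delivery"] += 0.25
        candidates["show_recipe"] -= 0.15
    # Far from home: favor nearby options over the place down the street
    # back in Atlanta.
    if not near_home:
        candidates["directions_to_restaurant"] += 0.05
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

print(rank_interpretations("I want linguini",
                           {"orders_delivery_often": True},
                           near_home=False))
```

In a real assistant those scores would come from trained models rather than hand-set numbers, but the shape of the decision, rank the plausible interpretations and act on the most likely one, is the same.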
On the personality front, Siri would in fact 729 00:46:38,320 --> 00:46:41,600 Speaker 1: occasionally drop an F bomb here or there. Now, 730 00:46:41,600 --> 00:46:45,920 Speaker 1: according to Kittlaus, the goal was eventually to offer extensions 731 00:46:45,960 --> 00:46:48,719 Speaker 1: to Siri so that end users could kind of pick 732 00:46:48,800 --> 00:46:53,600 Speaker 1: the app's personality. Maybe you wanted a no nonsense virtual 733 00:46:53,600 --> 00:46:56,920 Speaker 1: assistant that just provides the information you need and that's it. 734 00:46:57,760 --> 00:47:01,600 Speaker 1: Maybe you wanted more of a goofy sidekick, or 735 00:47:01,640 --> 00:47:04,960 Speaker 1: maybe you wanted a virtual assistant who could give you 736 00:47:05,000 --> 00:47:08,520 Speaker 1: some serious attitude on occasion. The goal down the line 737 00:47:08,600 --> 00:47:10,880 Speaker 1: was to create options for people to kind of shape 738 00:47:10,960 --> 00:47:14,040 Speaker 1: their experience, but that would end up on the cutting 739 00:47:14,120 --> 00:47:18,600 Speaker 1: room floor for one very big reason. The Siri 740 00:47:18,719 --> 00:47:24,239 Speaker 1: app made its debut in the iPhone App Store in January. 741 00:47:24,280 --> 00:47:28,120 Speaker 1: Three weeks after it debuted, Kittlaus received a phone 742 00:47:28,120 --> 00:47:32,080 Speaker 1: call from an unlisted number, a call that he almost 743 00:47:32,320 --> 00:47:35,720 Speaker 1: didn't even answer. But when he did answer, the person 744 00:47:35,800 --> 00:47:37,600 Speaker 1: on the other end of the call happened to be 745 00:47:37,719 --> 00:47:42,120 Speaker 1: Steve Jobs, the CEO of Apple. Jobs was over the 746 00:47:42,160 --> 00:47:45,040 Speaker 1: moon about Siri and wanted to meet with Kittlaus 747 00:47:45,080 --> 00:47:48,919 Speaker 1: to discuss some pretty enormous options, the biggest one being 748 00:47:48,960 --> 00:47:53,240 Speaker 1: that Apple itself would acquire Siri. Now, at the time, Siri 749 00:47:53,239 --> 00:47:56,200 Speaker 1: the company was working on developing a version of the 750 00:47:56,239 --> 00:47:59,920 Speaker 1: app for Android phones, having reached a deal with Verizon 751 00:48:00,080 --> 00:48:02,920 Speaker 1: to create a version of Siri that could be 752 00:48:03,000 --> 00:48:06,520 Speaker 1: the default app on all Verizon Android phones moving forward. 753 00:48:07,200 --> 00:48:11,680 Speaker 1: The Apple deal would ultimately derail that agreement, as Jobs 754 00:48:11,760 --> 00:48:16,080 Speaker 1: was insistent that Siri be an Apple exclusive. In fact, 755 00:48:16,400 --> 00:48:22,480 Speaker 1: when Apple would introduce Siri on October fourth, two thousand eleven, 756 00:48:23,440 --> 00:48:26,760 Speaker 1: it seemed like it was being presented as a purely 757 00:48:26,960 --> 00:48:32,600 Speaker 1: Apple product, that it didn't have a life outside of 758 00:48:32,680 --> 00:48:35,120 Speaker 1: Apple at all. It came across as if it had just been 759 00:48:35,400 --> 00:48:40,360 Speaker 1: Apple all along. And of course, the day after Apple 760 00:48:40,600 --> 00:48:45,400 Speaker 1: would introduce Siri to the public, Steve Jobs himself passed away, 761 00:48:45,680 --> 00:48:49,319 Speaker 1: on October fifth, two thousand eleven. But that part of the 762 00:48:49,320 --> 00:48:52,399 Speaker 1: story will have to wait for part two because, as 763 00:48:52,400 --> 00:48:56,480 Speaker 1: I said, this is going longer than I anticipated.
So 764 00:48:56,520 --> 00:48:59,719 Speaker 1: in our next episode we'll pick up probably a 765 00:48:59,719 --> 00:49:02,759 Speaker 1: little earlier than where I'm leaving off here, actually, because 766 00:49:02,760 --> 00:49:06,359 Speaker 1: there are still some other details we should talk about as 767 00:49:06,360 --> 00:49:10,640 Speaker 1: far as how Siri works and the actual arrangement of 768 00:49:10,719 --> 00:49:14,320 Speaker 1: Apple's acquisition, and then we'll talk about how the app 769 00:49:14,520 --> 00:49:18,800 Speaker 1: has evolved and changed under Apple's ownership, and we'll also explore, 770 00:49:18,840 --> 00:49:22,120 Speaker 1: you know, a little bit about Siri's distant cousins like 771 00:49:22,320 --> 00:49:26,799 Speaker 1: Alexa and Google Assistant and others, because all of these 772 00:49:26,840 --> 00:49:31,440 Speaker 1: work in similar ways, though they have their own specific 773 00:49:32,120 --> 00:49:36,680 Speaker 1: processes to handle requests, and so an 774 00:49:36,680 --> 00:49:40,359 Speaker 1: apples-to-apples comparison does break down ultimately once 775 00:49:40,400 --> 00:49:43,600 Speaker 1: you start getting down to how things are working in 776 00:49:43,760 --> 00:49:46,520 Speaker 1: detail on the back end. So I won't go into 777 00:49:47,040 --> 00:49:50,640 Speaker 1: full detail on those, because it would require multiple episodes. 778 00:49:50,640 --> 00:49:53,920 Speaker 1: But we will talk more about Siri and 779 00:49:54,320 --> 00:49:57,120 Speaker 1: what has happened in the years since its acquisition in 780 00:49:57,120 --> 00:50:00,120 Speaker 1: our next episode. If you guys have suggestions for future 781 00:50:00,120 --> 00:50:02,960 Speaker 1: topics I should tackle on Tech Stuff, let me know. 782 00:50:03,320 --> 00:50:05,399 Speaker 1: The best way to do that is to reach out 783 00:50:05,480 --> 00:50:08,919 Speaker 1: on Twitter. The handle we use is TechStuff HSW, 784 00:50:09,120 --> 00:50:13,120 Speaker 1: and I'll talk to you again really soon. 785 00:50:18,239 --> 00:50:21,279 Speaker 1: Tech Stuff is an I Heart Radio production. For more 786 00:50:21,360 --> 00:50:24,720 Speaker 1: podcasts from I Heart Radio, visit the I Heart Radio app, 787 00:50:24,880 --> 00:50:28,040 Speaker 1: Apple Podcasts, or wherever you listen to your favorite shows.