WEBVTT - Happy Anniver-Siri 0:00:04.400 --> 0:00:07.800 Welcome to Tech Stuff, a production from iHeartRadio. 0:00:12.360 --> 0:00:15.600 Hey there, and welcome to Tech Stuff. I'm your host, 0:00:15.800 --> 0:00:19.040 Jonathan Strickland. I'm an executive producer with iHeartRadio, 0:00:19.079 --> 0:00:23.560 and I love all things tech. And uh, you know 0:00:23.600 --> 0:00:26.840 what, today's episode, I was gonna, I was gonna 0:00:26.840 --> 0:00:29.920 make it a one parter, but it turns out there's 0:00:30.000 --> 0:00:33.040 just way too much stuff, not just about the topic 0:00:33.080 --> 0:00:37.239 at hand, but the various components that make up this 0:00:37.440 --> 0:00:40.600 topic that require me to do more than one. So 0:00:40.760 --> 0:00:43.519 this is gonna likely be a two parter. But today 0:00:43.960 --> 0:00:47.239 I thought we could look back at the development and 0:00:47.400 --> 0:00:53.160 evolution of a famous AI personality. This virtual assistant celebrated 0:00:53.200 --> 0:00:57.480 an anniversary recently, and I must apologize for being a 0:00:57.480 --> 0:01:02.480 couple of days late with this, but this particular servant 0:01:03.280 --> 0:01:07.120 debuted on October fourth, two thousand eleven, technically for the 0:01:07.200 --> 0:01:11.680 second time, but the history of the actual technology dates 0:01:11.720 --> 0:01:15.480 back much further. And of course, I'm talking about Siri, 0:01:15.920 --> 0:01:20.640 Apple's virtual assistant that can interpret voice commands and return 0:01:20.720 --> 0:01:25.160 results based on them. This is not just some dull 0:01:25.360 --> 0:01:30.680 history lesson, however. Siri really has an incredible backstory, ranging 0:01:30.720 --> 0:01:33.959 from a science fiction vision of the future to a 0:01:34.120 --> 0:01:39.240 secret project intended to augment the decision making capabilities of 0:01:39.280 --> 0:01:45.880 the United States military.
Yeah, Siri had a pretty tough background. 0:01:46.640 --> 0:01:50.720 The story of Siri is complicated, and not just because 0:01:50.880 --> 0:01:55.280 of the internal history of developing the technology, but also 0:01:55.360 --> 0:01:59.440 because the tool relies on a lot of converging technological 0:01:59.480 --> 0:02:04.600 trends. There are elements of voice recognition, uh, speech to text, 0:02:05.080 --> 0:02:09.240 natural language interpretation, and other technologies that fall under the 0:02:09.440 --> 0:02:14.600 very broad umbrella of artificial intelligence. So get settled, it's 0:02:14.600 --> 0:02:17.760 time to talk about Siri. Also, if you're listening to 0:02:17.760 --> 0:02:22.320 this near Apple devices, I apologize because there's a good 0:02:22.400 --> 0:02:26.800 chance those devices might start talking back at me. But 0:02:26.960 --> 0:02:29.440 I refuse to do an episode where I just refer 0:02:29.560 --> 0:02:34.240 to the subject as you know who. You could argue 0:02:34.720 --> 0:02:37.799 that the origins of Siri can be found in a 0:02:37.840 --> 0:02:43.960 promotional video that Apple produced back in nineteen eighty-seven to 0:02:44.040 --> 0:02:48.440 show off a concept of an artificially intelligent smart assistant. 0:02:49.000 --> 0:02:52.520 Now that alone is interesting, but what really is amazing 0:02:52.800 --> 0:02:57.360 is that the arbitrary date they chose as the setting 0:02:57.360 --> 0:03:01.440 for this video was two thousand eleven, probably September. We 0:03:01.560 --> 0:03:05.080 know that because there is a part within the video 0:03:05.200 --> 0:03:08.840 where a character asks for information that had been published 0:03:08.919 --> 0:03:14.080 five years previously, and the published information had a publication 0:03:14.160 --> 0:03:17.480 date of two thousand six.
Now this means that the 0:03:17.560 --> 0:03:22.520 actual debut of Siri as an Apple product was just 0:03:22.760 --> 0:03:27.160 one month after the fictional events in that video from nineteen eighty-seven. 0:03:28.680 --> 0:03:31.520 That's just a coincidence, but it's a cool one. The 0:03:31.639 --> 0:03:37.000 Knowledge Navigator video shows a man walking into a study, 0:03:37.360 --> 0:03:42.080 really nice one, and unfolding a tablet style computer device. 0:03:42.560 --> 0:03:45.080 Then he walks off away to stare at stuff as 0:03:45.120 --> 0:03:49.640 a virtual assistant reads off his messages and meetings on 0:03:49.680 --> 0:03:54.560 his calendar. The virtual assistant appears as a video in 0:03:54.600 --> 0:03:57.520 a little window on the screen of the tablet, and 0:03:57.560 --> 0:03:59.760 it's, you know, like shot from the shoulders up, kind 0:03:59.800 --> 0:04:03.000 of the bust of a young man, and the 0:04:03.120 --> 0:04:06.440 video takes up that one little corner of the tablet device. 0:04:06.440 --> 0:04:10.560 So in this visualization, the virtual assistant isn't just a 0:04:10.640 --> 0:04:15.560 disembodied voice. It also has a face. Also, everyone in 0:04:15.560 --> 0:04:19.720 this video is extremely white, which I guess is kind 0:04:19.720 --> 0:04:24.080 of a given for the time period and the people involved, 0:04:24.680 --> 0:04:28.680 but it just comes across as so white. I mean, 0:04:29.160 --> 0:04:32.160 we're doing this with the benefit of the glasses of 0:04:32.200 --> 0:04:35.400 twenty twenty. I just wanted to throw that out there. Anyway, 0:04:35.560 --> 0:04:38.960 the video goes on to have the real life man, 0:04:39.160 --> 0:04:42.520 who is a professor in this video, ask his virtual 0:04:42.520 --> 0:04:47.120 assistant to pull up lecture notes, uh, and unread articles 0:04:47.160 --> 0:04:50.039 that relate back to the lecture. He's asking for 0:04:50.040 --> 0:04:52.440 the lecture notes of a lecture he gave a year ago.
0:04:52.839 --> 0:04:55.440 He's giving essentially the same lecture now, but he wants 0:04:55.440 --> 0:04:58.440 to update it with the latest information, and he even 0:04:58.480 --> 0:05:03.159 asks the virtual assistant to summarize those unread articles that 0:05:03.200 --> 0:05:06.520 had been published in the year since his last lecture. 0:05:06.760 --> 0:05:12.040 The virtual assistant is thus aggregating information, analyzing that information 0:05:12.080 --> 0:05:15.880 for context, and then delivering summaries, which is a 0:05:15.880 --> 0:05:21.279 pretty sophisticated set of artificially intelligent tasks. The 0:05:21.360 --> 0:05:25.680 professor also uses the device and virtual assistant to call and 0:05:25.760 --> 0:05:29.760 collaborate with a peer in real time. Now, this was 0:05:29.839 --> 0:05:33.840 not the only video that Apple would produce to showcase 0:05:33.920 --> 0:05:37.560 this kind of general idea. However, arguably it is the 0:05:37.600 --> 0:05:43.000 most famous of those videos. Now, as I said, Knowledge 0:05:43.080 --> 0:05:47.039 Navigator came out of Apple, and Steve Jobs would later 0:05:47.080 --> 0:05:51.880 play a pivotal role in how the company would introduce Siri, 0:05:52.560 --> 0:05:56.120 but this was not a Steve Jobs project because Jobs 0:05:56.120 --> 0:05:59.840 had been ousted from the company Apple, or he had 0:06:00.080 --> 0:06:03.240 quit in disgust, depending upon which version of the story 0:06:03.600 --> 0:06:06.159 you're listening to. Anyway, he had left a couple of 0:06:06.240 --> 0:06:10.279 years before this video was produced. The Knowledge Navigator was 0:06:10.360 --> 0:06:14.200 something that Apple CEO John Sculley had described in a 0:06:14.279 --> 0:06:18.640 book titled Odyssey.
Now, of course, in science fiction stories 0:06:19.400 --> 0:06:22.240 we have no shortage of instances where a human is 0:06:22.279 --> 0:06:26.800 interacting with a computer or otherwise artificially intelligent device like 0:06:26.839 --> 0:06:30.520 a robot, but the Knowledge Navigator seemed to lay down 0:06:30.560 --> 0:06:35.160 the foundations toward future products like Siri and the iPad, 0:06:35.440 --> 0:06:39.040 not to mention the potential uses of the Internet, which 0:06:39.040 --> 0:06:44.080 in nineteen eighty-seven was definitely a thing. It existed, but most of 0:06:44.120 --> 0:06:48.440 the mainstream public remained unaware of it because the World Wide 0:06:48.440 --> 0:06:51.919 Web wouldn't even come along for another few years. However, 0:06:52.360 --> 0:06:54.760 while you can look at this video and say, ah, 0:06:55.480 --> 0:06:59.520 this must be where Apple got that idea, they probably 0:06:59.560 --> 0:07:02.400 got to work right away on Siri, well, you'd be 0:07:02.480 --> 0:07:06.960 wrong, because the early work, in fact, the vast bulk 0:07:07.440 --> 0:07:09.880 of the work on Siri to bring it to life, 0:07:10.440 --> 0:07:14.560 didn't start at Apple at all. It didn't involve the company. 0:07:14.600 --> 0:07:19.200 So our story now turns to a very different organization, 0:07:19.600 --> 0:07:25.640 the Defense Advanced Research Projects Agency, better known as DARPA. 0:07:25.760 --> 0:07:29.600 Now this is part of the United States Department of Defense. 0:07:30.080 --> 0:07:33.120 Back in nineteen fifty eight, the then President of the 0:07:33.200 --> 0:07:39.080 United States, Dwight D. Eisenhower, authorized the foundation of this agency, 0:07:39.280 --> 0:07:42.240 though at the time it was called the Advanced Research 0:07:42.320 --> 0:07:46.880 Projects Agency or ARPA. Defense would be added later.
This 0:07:46.960 --> 0:07:49.400 agency would play a critical role in the evolution of 0:07:49.440 --> 0:07:53.960 technologies in the United States, and the mission of DARPA 0:07:54.040 --> 0:07:58.520 and ARPA before it is, quote, to make pivotal investments 0:07:58.600 --> 0:08:03.000 in breakthrough technologies for national security, end quote, and 0:08:03.040 --> 0:08:07.240 that wording is really precise. It's easy to imagine DARPA 0:08:07.320 --> 0:08:11.600 as being housed in some enormous underground bunker filled with 0:08:11.640 --> 0:08:16.520 scientists who are building out crazy devices like robo scorpions 0:08:16.640 --> 0:08:19.680 or a blender that can also teleport or something. But 0:08:19.800 --> 0:08:26.080 in reality, DARPA is more about funding research than conducting research. Now, 0:08:26.080 --> 0:08:29.520 don't get me wrong, the agency relies heavily on experts 0:08:29.560 --> 0:08:33.240 to evaluate proposals and consider to whom the agency should 0:08:33.280 --> 0:08:36.959 send money. But the purpose of DARPA is to enable 0:08:37.120 --> 0:08:41.680 others to do important work. DARPA has played a huge 0:08:41.840 --> 0:08:46.640 role in countless technological breakthroughs this way. Much of the 0:08:46.679 --> 0:08:49.960 technology that would go on to power the Internet started 0:08:50.000 --> 0:08:53.400 with ARPANET, a kind of precursor network to the 0:08:53.400 --> 0:08:57.400 Internet and one that was funded by ARPA. Thus the 0:08:57.520 --> 0:09:01.600 name. The DARPA Grand Challenges helped get self driving 0:09:01.640 --> 0:09:05.880 cars into gear. You know, pun intended. They also created 0:09:05.960 --> 0:09:09.720 difficult scenarios for humanoid robots to go through. That was 0:09:09.760 --> 0:09:13.120 a few years ago and was really cool.
The competitions 0:09:13.200 --> 0:09:17.640 DARPA hosts have specific goals and metrics, and that guides 0:09:17.720 --> 0:09:20.840 the designers and engineers who are working on them as 0:09:20.840 --> 0:09:24.720 they build out technologies. It's good to define your goal. 0:09:24.840 --> 0:09:28.080 It really gives you focus when you're trying to develop 0:09:28.160 --> 0:09:31.360 the technology to meet that goal. Winning a challenge is 0:09:31.400 --> 0:09:34.320 a big deal, though the cash prize may not even 0:09:34.360 --> 0:09:37.880 cover the amount of money participants have spent through the 0:09:37.880 --> 0:09:42.400 development of those technologies, and there are entire businesses, or 0:09:42.559 --> 0:09:46.680 at least divisions within businesses, that can be born out 0:09:46.679 --> 0:09:50.400 of these challenges. The Grand Challenges are just one way 0:09:50.520 --> 0:09:55.200 DARPA encourages technological development. Often, the agency will create a 0:09:55.240 --> 0:09:59.480 specific goal, such as the design of a robotic exoskeleton 0:09:59.559 --> 0:10:03.000 that can help, you know, US soldiers carry heavy loads 0:10:03.160 --> 0:10:06.800 while they are on foot for longer distances, and then 0:10:06.840 --> 0:10:10.439 they'll send out an RFP, which is a request for proposal. 0:10:11.120 --> 0:10:14.680 The agency considers the proposals that it receives from this 0:10:14.840 --> 0:10:19.040 RFP and then decides which, if any, they will accept 0:10:19.160 --> 0:10:22.320 and then fund. Then after a given amount of time, 0:10:22.400 --> 0:10:25.840 you know, it's dependent upon the specific project, we find 0:10:25.880 --> 0:10:28.960 out if anything comes out of it. Sometimes nothing does, 0:10:29.360 --> 0:10:33.360 as some technological problems may prove more challenging than others 0:10:33.400 --> 0:10:37.680 and may require more time to evolve the various technologies 0:10:37.720 --> 0:10:40.400 to make it possible.
So it might push the field, 0:10:40.640 --> 0:10:42.760 but you might not have a finished product at the 0:10:42.800 --> 0:10:45.120 end of it. Other times you do get a finished 0:10:45.120 --> 0:10:49.240 product. Anyway, in two thousand three, a decade and a 0:10:49.280 --> 0:10:52.840 half after the Knowledge Navigator videos came out of Apple, 0:10:53.480 --> 0:10:57.040 DARPA identified a new opportunity, and this was one that 0:10:57.120 --> 0:11:00.960 was born out of necessity. The challenge was that we 0:11:01.040 --> 0:11:04.360 have access to way more information today than we did 0:11:04.360 --> 0:11:08.440 in the past. So decades ago, military commanders had to 0:11:08.480 --> 0:11:12.960 make decisions based on limited information. They'd rely a great 0:11:13.040 --> 0:11:17.280 deal on their own expertise and experience in order to 0:11:17.400 --> 0:11:19.360 make up for the fact that they only had part 0:11:19.400 --> 0:11:22.160 of the picture. And while a great commander has a 0:11:22.200 --> 0:11:26.199 better chance of making the right call than an inexperienced 0:11:26.200 --> 0:11:30.119 commander would, the limited amount of information could still contribute 0:11:30.160 --> 0:11:33.840 to disaster. You might be the greatest commander of all time, 0:11:34.400 --> 0:11:37.319 but if you're lacking a key piece of information, you 0:11:37.400 --> 0:11:41.160 might make a decision that is terrible. So flash forward 0:11:41.200 --> 0:11:44.120 to two thousand three, and now the story had kind 0:11:44.200 --> 0:11:48.800 of flip flopped. Now military commanders would receive more information 0:11:48.840 --> 0:11:52.920 than they could reasonably handle. The challenge now wasn't to 0:11:53.120 --> 0:11:56.120 use intuition to make up for blind spots, but rather, 0:11:56.559 --> 0:11:59.600 how do you synthesize all this information so that you 0:11:59.640 --> 0:12:03.960 can make the right decision?
Too much information was proving 0:12:04.000 --> 0:12:06.640 to be kind of as big a problem as too 0:12:06.720 --> 0:12:11.240 little information, at least in some cases, and so DARPA 0:12:11.240 --> 0:12:14.240 wished to fund the development of a smart system that 0:12:14.320 --> 0:12:17.560 could help commanders make sense of all the data coming 0:12:17.600 --> 0:12:21.840 in from day to day. Now, DARPA projects tend to 0:12:21.880 --> 0:12:26.360 be labyrinthine, with lots of bits and pieces, and a 0:12:26.360 --> 0:12:30.160 lot of different companies and research labs and other organizations 0:12:30.240 --> 0:12:33.800 might tackle all or part of one of these projects. 0:12:34.400 --> 0:12:38.199 The cognitive computing section of DARPA had a program called 0:12:38.360 --> 0:12:44.640 Perceptive Assistant that Learns, or PAL, which seems nice. It 0:12:44.760 --> 0:12:47.520 was this part of the program that would fund the 0:12:47.559 --> 0:12:52.200 development of a virtual cognitive assistant. The amount of funding 0:12:52.640 --> 0:12:57.520 was twenty two million dollars. What a great PAL. The 0:12:57.640 --> 0:13:02.880 organization that landed this deal was SRI International, 0:13:03.240 --> 0:13:11.160 itself an incredibly influential organization. It's a nonprofit scientific research institution. 0:13:11.520 --> 0:13:16.319 Originally it was called the Stanford Research Institute because it 0:13:16.360 --> 0:13:20.000 was established by the trustees of Stanford University back in 0:13:20.120 --> 0:13:24.120 nineteen forty six, though the organization would separate from the 0:13:24.200 --> 0:13:28.160 university formally in the nineteen seventies and become a standalone, 0:13:28.240 --> 0:13:33.480 nonprofit scientific research lab. The organization has played a role 0:13:33.520 --> 0:13:38.120 in advancing materials science, developing liquid crystal displays or 0:13:38.160 --> 0:13:43.280 LCDs, creating telesurgery implementations, and more.
And now 0:13:43.360 --> 0:13:46.720 it was going to tackle DARPA's request for a cognitive 0:13:46.760 --> 0:13:52.360 computer assistant. SRI International created a project called 0:13:52.400 --> 0:13:58.200 the Cognitive Assistant that Learns and Organizes, or KALO or 0:13:58.400 --> 0:14:01.320 CALO if you prefer. And this appears to be another 0:14:01.360 --> 0:14:05.440 case where they landed upon that acronym first and then 0:14:05.559 --> 0:14:09.480 worked backward, as CALO seems to come from the Latin 0:14:09.520 --> 0:14:15.840 word calonis, which means soldier's servant, and I probably mispronounced 0:14:15.840 --> 0:14:19.240 that because even though I was a medievalist, it's almost 0:14:19.280 --> 0:14:23.720 criminal I never took Latin. The concept, however, hearkens back 0:14:23.720 --> 0:14:26.280 to some of what we would see in that Knowledge 0:14:26.400 --> 0:14:30.560 Navigator video from nineteen eighty-seven: a system that would be able to 0:14:30.640 --> 0:14:36.400 receive and interpret information, presumably from multiple sources, and provide 0:14:36.400 --> 0:14:41.040 a meaningful presentation or even interpretation of that data to humans, 0:14:41.760 --> 0:14:44.880 which is a pretty tall order, and let's break down 0:14:45.120 --> 0:14:47.400 a bit of what an assistant would need to do 0:14:47.520 --> 0:14:50.920 in order to accomplish this. We'll leave out the voice 0:14:50.920 --> 0:14:54.040 activation parts for now, as that would not be absolutely 0:14:54.040 --> 0:14:56.080 critical to make this work. You know, you might have 0:14:56.120 --> 0:14:59.680 a system that gives daily briefings on its own, or 0:15:00.040 --> 0:15:02.680 you might have one that you activate through text commands 0:15:02.760 --> 0:15:05.840 or some other user interface. It wouldn't necessarily have to 0:15:05.880 --> 0:15:08.840 be voice activated. But on the back end, what has 0:15:08.880 --> 0:15:11.680 to happen for this to work well?
Presumably such a 0:15:11.680 --> 0:15:14.480 system would need to pull in data from a number 0:15:14.560 --> 0:15:18.680 of disparate sources, so the assistant wouldn't just be reciting 0:15:18.680 --> 0:15:23.600 facts and figures that were coming from a centralized data server. Instead, 0:15:23.600 --> 0:15:27.040 it might be assimilating data from numerous sources into a 0:15:27.120 --> 0:15:31.000 cohesive presentation. On top of that, the data might be 0:15:31.000 --> 0:15:33.680 in different formats, meaning the system would need to be 0:15:33.680 --> 0:15:37.800 able to analyze the information inside different types of files. 0:15:38.880 --> 0:15:42.120 This isn't an easy thing to do. There's a reason 0:15:42.280 --> 0:15:45.000 we have a lot of specialized programs for working with 0:15:45.040 --> 0:15:49.120 specific types of files. When I put together these podcasts, 0:15:49.680 --> 0:15:53.000 I use a word processor for my notes, and I 0:15:53.160 --> 0:15:56.840 use an audio editing piece of software to record and 0:15:57.080 --> 0:16:00.479 edit the podcasts. Now I need both of those programs 0:16:00.680 --> 0:16:04.000 because neither of them can do the job that the 0:16:04.040 --> 0:16:06.720 other one does. I don't have, like, an all purpose 0:16:06.760 --> 0:16:11.440 program that does everything. Accessing different file formats, even in 0:16:11.480 --> 0:16:15.760 the same general family of applications, is tricky. Beyond that, 0:16:16.320 --> 0:16:20.360 the way information can be presented within each file could 0:16:20.360 --> 0:16:23.880 be very different.
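Editor's sketch: the idea of pulling data from different file formats into one cohesive view can be illustrated with a few lines of Python. This is only a toy under stated assumptions, not how CALO or Siri worked; the sample data, the field names (`unit`, `fuel_pct`, `name`, `fuel`), and the helper functions are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample inputs: similar facts stored in two different formats.
CSV_TEXT = "unit,fuel_pct\nalpha,80\nbravo,45\n"
JSON_TEXT = '[{"name": "charlie", "fuel": 0.6}]'


def from_csv(text):
    """Read CSV rows into one shared record shape: {"unit", "fuel"}."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"unit": r["unit"], "fuel": int(r["fuel_pct"]) / 100} for r in rows]


def from_json(text):
    """Read a JSON list into the same shared record shape."""
    return [{"unit": item["name"], "fuel": float(item["fuel"])}
            for item in json.loads(text)]


def summarize(records, threshold=0.5):
    """Return the names of units whose fuel level is below the threshold."""
    return sorted(r["unit"] for r in records if r["fuel"] < threshold)


# Each format needs its own reader, but the summary only sees one shape.
records = from_csv(CSV_TEXT) + from_json(JSON_TEXT)
low = summarize(records)
```

The design point is that every source gets its own small adapter, while the analysis step works on a single normalized record shape.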
It's very possible for us to open 0:16:23.960 --> 0:16:28.800 up multiple spreadsheets, and even using the same basic spreadsheet 0:16:28.800 --> 0:16:31.160 program, let's just say Excel, it's possible for us to 0:16:31.200 --> 0:16:35.240 open up half a dozen Excel spreadsheets that are all 0:16:35.280 --> 0:16:38.680 presenting the same information but doing so in different ways, 0:16:38.880 --> 0:16:41.760 and that might not be obvious at a casual glance. You 0:16:41.840 --> 0:16:44.960 might look at one and the other and not immediately realize, oh, 0:16:45.160 --> 0:16:48.200 these are both saying the same thing. Just think about 0:16:48.200 --> 0:16:51.000 how information could be presented as a table or a 0:16:51.000 --> 0:16:55.560 graph or a chart. The AI assistant would ideally be 0:16:55.640 --> 0:16:59.040 able to access information no matter what format it was in, 0:16:59.560 --> 0:17:02.880 no matter what version of that format it was in, 0:17:02.960 --> 0:17:05.199 be able to interpret it, and then be able to 0:17:05.240 --> 0:17:09.280 deliver a meaningful analysis to the user. Now, as data 0:17:09.320 --> 0:17:13.560 sets grow, this becomes increasingly difficult, which I should point 0:17:13.600 --> 0:17:16.600 out is the whole reason DARPA wanted to fund research 0:17:16.640 --> 0:17:19.800 into this in the first place. Military commanders were faced 0:17:19.840 --> 0:17:23.360 with a growing mountain of information that was increasingly difficult 0:17:23.400 --> 0:17:28.600 to parse. The analysis might also need to incorporate natural 0:17:28.720 --> 0:17:32.479 language recognition features. And I've talked about natural language a 0:17:32.480 --> 0:17:35.480 lot in previous episodes, but if we boil it down, 0:17:35.720 --> 0:17:38.679 it's the language that we humans use to communicate with 0:17:38.720 --> 0:17:43.399 one another. It's our natural way of expressing our thoughts.
0:17:43.440 --> 0:17:47.119 But the way we humans process and communicate information is 0:17:47.240 --> 0:17:51.080 different from how machines do it. We can be subtle. 0:17:51.400 --> 0:17:54.919 We can use stuff like metaphors and allegories and just 0:17:55.080 --> 0:17:59.960 different phrasing. Computers are, you know, a lot more literal. Hey, 0:18:00.119 --> 0:18:02.960 if you break it down to the most basic unit 0:18:03.240 --> 0:18:06.600 of machine information, you know, the bit, you see how 0:18:06.680 --> 0:18:10.560 literal computers are. A bit is either a zero or 0:18:10.600 --> 0:18:13.600 a one, or if you prefer, it's either off or 0:18:13.840 --> 0:18:18.159 on, or no or yes. But using lots of bits, 0:18:18.359 --> 0:18:21.359 we can describe information in a way that provides more 0:18:21.400 --> 0:18:24.320 subtlety than just no or yes. But my point is that 0:18:24.359 --> 0:18:28.520 computers don't naturally process information the way we do, and 0:18:28.600 --> 0:18:33.400 so an entire branch of artificial intelligence called natural language 0:18:33.400 --> 0:18:37.880 processing evolved to create ways for computers to interpret what 0:18:37.960 --> 0:18:42.680 we mean when we express things within natural language. Making 0:18:42.720 --> 0:18:46.080 this more complicated is that, of course, there's no one 0:18:46.240 --> 0:18:49.439 way to say any given thing. We've got lots of 0:18:49.480 --> 0:18:53.040 ways to express the same general thought. And added to that, 0:18:53.680 --> 0:18:58.400 we have lots of different languages. There are around seven 0:18:58.440 --> 0:19:02.320 thousand different languages spoken in the world today, 0:19:02.640 --> 0:19:04.919 though you could probably get away with a couple of 0:19:05.040 --> 0:19:08.399 dozen and cover the vast majority of the world's population 0:19:08.520 --> 0:19:11.840 that way.
But these languages have their own vocabularies, their 0:19:11.840 --> 0:19:16.119 own syntaxes, their own expressions. So not only do we 0:19:16.200 --> 0:19:19.320 have multiple ways of saying things within one language, we 0:19:19.400 --> 0:19:22.960 have all these different languages to worry about. If you 0:19:23.000 --> 0:19:26.320 were to send ten people into a room with an 0:19:26.320 --> 0:19:29.600 AI assistant, and those ten people have a task they're 0:19:29.640 --> 0:19:33.000 supposed to perform with the help of this AI assistant, 0:19:33.680 --> 0:19:36.240 odds are no two people are going to go about 0:19:36.280 --> 0:19:40.240 it exactly the same way. And yet a working virtual 0:19:40.280 --> 0:19:43.359 assistant needs to be able to interpret and respond to 0:19:43.560 --> 0:19:47.120 every case and do so reliably. On the back end, 0:19:47.440 --> 0:19:50.080 an AI system needs to be able to interpret data 0:19:50.119 --> 0:19:53.480 coming from different sources that may have very different ways 0:19:53.520 --> 0:19:58.720 of expressing similar ideas. This is an enormous task. Now, 0:19:58.720 --> 0:20:01.560 when we come back, I'll talk more about what 0:20:01.680 --> 0:20:04.520 SRI was doing and how the military project would 0:20:04.520 --> 0:20:08.560 evolve ultimately into Apple's Personal Assistant. But first let's take 0:20:08.880 --> 0:20:19.359 a quick break. Now I've only scratched the surface of 0:20:19.440 --> 0:20:22.840 what the creation of an AI assistant capable of 0:20:22.880 --> 0:20:27.280 accessing information from numerous sources and making that information useful 0:20:27.800 --> 0:20:32.040 really required. Let's talk a bit about the parameters of 0:20:32.080 --> 0:20:35.399 this project itself.
So if you remember, I said that 0:20:35.480 --> 0:20:38.919 the deal was initially for twenty two million dollars, and 0:20:39.000 --> 0:20:42.200 that would end up funding the creation of a five 0:20:42.400 --> 0:20:47.720 hundred person project, and the project spanned five years initially 0:20:47.880 --> 0:20:51.680 to investigate the possibility of building out such an AI system. 0:20:51.720 --> 0:20:55.159 Over time, more money would end up going into the 0:20:55.240 --> 0:20:58.760 research system, and it totaled around a hundred fifty million 0:20:58.800 --> 0:21:01.399 dollars by the end of the project. The lab 0:21:01.560 --> 0:21:04.920 where it all went down would receive the charming nickname 0:21:05.200 --> 0:21:08.760 Nerd City. A large part of the project focused on 0:21:08.840 --> 0:21:13.159 creating a program that could learn a user's behaviors. So 0:21:13.200 --> 0:21:17.359 not only could this personal assistant respond to what you 0:21:17.400 --> 0:21:22.760 were asking, it would gradually learn the way you behaved 0:21:22.840 --> 0:21:26.240 and it would adapt to you to work more effectively. 0:21:26.800 --> 0:21:31.040 Now this comes into the arena of pattern recognition. We 0:21:31.280 --> 0:21:34.840 humans are pretty darn good at recognizing patterns. In fact, 0:21:35.400 --> 0:21:39.480 we're so good that sometimes we will quote unquote recognize 0:21:39.560 --> 0:21:43.919 a pattern even when there isn't a pattern there. In 0:21:43.960 --> 0:21:47.880 some cases, this can come across as charming, such as 0:21:48.280 --> 0:21:52.040 when we see a face in a cloud, right, that's 0:21:52.560 --> 0:21:55.880 not really a pattern there. We're recognizing a pattern where 0:21:55.880 --> 0:21:58.639 none really exists. It's all based on our perspective and 0:21:58.640 --> 0:22:02.560 our imaginations. Now, in other cases, it's not so charming. 0:22:02.600 --> 0:22:05.159 It can actually lead to faulty reasoning.
So I'm going 0:22:05.200 --> 0:22:08.120 to give you a very basic example that I hear 0:22:08.200 --> 0:22:11.880 all the time, particularly now that we're in October and 0:22:11.960 --> 0:22:16.439 there's some full moon weirdness going on. So there's a 0:22:16.480 --> 0:22:21.320 fairly widespread belief that there's a connection between full moons 0:22:21.359 --> 0:22:25.280 and an increase in the number of medical emergencies that happen. 0:22:25.359 --> 0:22:29.520 Generally speaking, the idea is that people act irresponsibly during a full moon, 0:22:29.640 --> 0:22:33.760 and that often results in injury, which means greater activity 0:22:33.800 --> 0:22:38.480 at hospitals. Now, this belief is most likely due to 0:22:38.640 --> 0:22:43.680 confirmation bias. That is, we already have a belief in place, 0:22:44.040 --> 0:22:46.880 and the belief is that full moons lead to more 0:22:46.920 --> 0:22:51.000 accidents because of people acting irresponsibly. That is what we believe. 0:22:51.720 --> 0:22:55.760 It doesn't have evidence yet, and then when things do 0:22:55.920 --> 0:22:58.960 get busy at a hospital and there happens to be 0:22:59.000 --> 0:23:03.159 a full moon, we register that as evidence for our belief. Aha, 0:23:03.920 --> 0:23:07.840 says the mistaken person. The full moon explains it. However, 0:23:08.200 --> 0:23:11.080 on nights when it is busy but there is no 0:23:11.160 --> 0:23:14.160 full moon, there's no hit. No one takes 0:23:14.200 --> 0:23:17.280 notice: how odd, you know, it's crazy busy, but 0:23:17.359 --> 0:23:20.959 there's no full moon tonight. We don't do that. Likewise, 0:23:21.520 --> 0:23:25.000 if it happens to not be busy but there's a 0:23:25.040 --> 0:23:27.800 full moon, you're also not likely to notice. You're not 0:23:27.880 --> 0:23:30.159 likely to say, like, huh, it's not very busy tonight, 0:23:30.200 --> 0:23:33.560 but there's a full moon out.
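Editor's sketch: the four situations described here (full moon or not, busy or not) form a two-by-two table, and confirmation bias amounts to counting only one of its cells. The Python below uses a made-up nightly log, not real hospital data, to show why the comparison needs all four cells; the counts and the `busy_rate` helper are assumptions for illustration.

```python
# Made-up nightly log of (full_moon, busy) pairs. Not real hospital data.
nights = (
    [(True, True)] * 2        # full moon, busy: the nights people remember
    + [(True, False)] * 2     # full moon, quiet: easy to overlook
    + [(False, True)] * 28    # no full moon, busy: easy to overlook
    + [(False, False)] * 28   # no full moon, quiet
)


def busy_rate(log, full_moon):
    """Fraction of nights with the given moon state that were busy."""
    matching = [busy for moon, busy in log if moon == full_moon]
    return sum(matching) / len(matching)


# Compare like with like: the share of busy nights with and without a full moon.
rate_full = busy_rate(nights, True)
rate_other = busy_rate(nights, False)
```

In this invented log, half of the full moon nights are busy and half of the other nights are busy too, so the moon tells you nothing, even though the two memorable "busy full moon" nights feel like evidence.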
So it's only when 0:23:33.800 --> 0:23:37.120 you have the full moon and the busy hospital where 0:23:37.119 --> 0:23:41.360 the evidence appears to support your belief and confirm your bias. 0:23:42.040 --> 0:23:44.480 But in truth, when you take a step back and 0:23:44.560 --> 0:23:47.520 you do an objective study and you look at the 0:23:47.640 --> 0:23:50.440 times when a hospital is busy, and you look at 0:23:50.520 --> 0:23:52.439 when there was a full moon, and you look to 0:23:52.440 --> 0:23:56.280 see if there's any correlation, it falls apart. Now I 0:23:56.320 --> 0:23:58.959 got a little off track there, but the point I 0:23:58.960 --> 0:24:03.040 wanted to make is that we humans are biologically attuned 0:24:03.240 --> 0:24:08.080 to recognizing patterns. It's very likely that pattern recognition is 0:24:08.080 --> 0:24:11.240 one of the traits that really helped us survive thousands 0:24:11.240 --> 0:24:14.359 of years ago, which is why it's so intrinsic in 0:24:14.400 --> 0:24:19.359 the human experience. But building programs, computer systems that are 0:24:19.359 --> 0:24:23.880 capable of identifying patterns and separating out what is signal 0:24:24.119 --> 0:24:28.000 versus what is noise is its own really big challenge. 0:24:28.800 --> 0:24:31.280 SRI was hoping to create a program that 0:24:31.320 --> 0:24:34.520 could look for patterns in user behavior in order to 0:24:34.640 --> 0:24:38.879 respond with greater precision and accuracy to user requests and 0:24:39.040 --> 0:24:43.680 ultimately to anticipate future requests. Now we see this sort 0:24:43.720 --> 0:24:47.960 of pattern recognition and response in lots of technology today.
0:24:48.000 --> 0:24:51.240 There are several smart thermostats on the market right now, 0:24:51.440 --> 0:24:55.200 for example, that can track when you tend to raise 0:24:55.480 --> 0:24:58.399 or lower the temperature in your home, and after a while, 0:24:58.640 --> 0:25:01.480 the thermostat learns that, hey, maybe you like it nice 0:25:01.480 --> 0:25:03.840 and chilly at night, but you want it to be 0:25:03.960 --> 0:25:07.320 warm and toasty in the morning, and so the thermostat 0:25:07.400 --> 0:25:10.840 begins to adjust itself in preparation for that based on 0:25:10.920 --> 0:25:14.800 your previous behaviors. Now that is a very simple example. 0:25:15.359 --> 0:25:18.960 Extrapolate that out and you begin to imagine a technology 0:25:19.000 --> 0:25:22.639 that is anticipating what you need or want, perhaps before 0:25:22.680 --> 0:25:26.320 you're even aware of it yourself, which can get kind 0:25:26.359 --> 0:25:29.480 of creepy but also sort of magical. But in truth, 0:25:29.520 --> 0:25:34.639 it's because this system is detecting patterns that we aren't 0:25:34.680 --> 0:25:38.679 even able to recognize ourselves. The danger there, of course, 0:25:39.200 --> 0:25:43.159 is that the systems can sometimes mistakenly identify a pattern 0:25:43.520 --> 0:25:46.120 when in fact there's not really a pattern there, very 0:25:46.160 --> 0:25:48.720 similar to the case I was explaining with the 0:25:48.840 --> 0:25:52.800 full moon and the busy hospital. Even computer systems can 0:25:52.800 --> 0:25:56.640 make those sorts of mistakes, and depending upon the implementation, 0:25:56.920 --> 0:25:59.960 that can be a real problem. But that's 0:26:00.000 --> 0:26:02.960 an issue for a different podcast. Now, when it comes 0:26:02.960 --> 0:26:06.919 to humans, pattern recognition is so ingrained in most of 0:26:07.000 --> 0:26:09.760 us that it can actually be kind of hard to explain.
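Editor's sketch: the thermostat example can be reduced to a very small learning rule, namely remembering what the user chose at each hour and suggesting the average. The Python below is a toy model under that assumption; real smart thermostats are not described in the episode, and the class name, method names, and temperatures are all hypothetical.

```python
from collections import defaultdict


class LearningThermostat:
    """Toy model: remember manual adjustments and average them per hour."""

    def __init__(self, default_temp=70.0):
        self.default_temp = default_temp
        self.history = defaultdict(list)  # hour of day -> temperatures chosen

    def record_adjustment(self, hour, temp):
        """Store a temperature the user picked at a given hour (0-23)."""
        self.history[hour].append(temp)

    def predict(self, hour):
        """Suggest a setpoint: the average past choice, else the default."""
        choices = self.history.get(hour)
        if not choices:
            return self.default_temp
        return sum(choices) / len(choices)


thermostat = LearningThermostat()
for temp in (64, 66, 65):   # the user likes it chilly at 11 pm
    thermostat.record_adjustment(23, temp)
for temp in (72, 74):       # and warm at 7 am
    thermostat.record_adjustment(7, temp)

night_setpoint = thermostat.predict(23)
morning_setpoint = thermostat.predict(7)
noon_setpoint = thermostat.predict(12)  # no history, falls back to default
```

A rule this simple also shows the risk mentioned above: a handful of unusual adjustments would be averaged in as if they were a real preference.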
0:26:10.000 --> 0:26:13.280 You notice when something happens, and if that same thing 0:26:13.359 --> 0:26:17.080 happens later with the same general results as the first time, 0:26:17.560 --> 0:26:22.120 it reinforces your first perception of that thing, and if 0:26:22.119 --> 0:26:24.760 it happens over and over, your brain essentially comes to 0:26:24.840 --> 0:26:29.280