WEBVTT - Smart Talks With IBM: Transformations in AI: why foundation models are the future 0:00:00.120 --> 0:00:02.840 Hey everyone, it's Robert and Joe here. Today we've got 0:00:02.840 --> 0:00:04.720 something a little bit different to share with you. It 0:00:04.840 --> 0:00:08.000 is a new season of the Smart Talks with IBM 0:00:08.119 --> 0:00:09.119 podcast series. 0:00:09.600 --> 0:00:11.680 Today we are witness to one of those rare moments 0:00:11.680 --> 0:00:14.360 in history, the rise of an innovative technology with the 0:00:14.360 --> 0:00:18.680 potential to radically transform business and society forever. The technology, 0:00:18.760 --> 0:00:22.160 of course, is artificial intelligence, and it's the central focus 0:00:22.239 --> 0:00:24.800 for this new season of Smart Talks with IBM. 0:00:25.320 --> 0:00:28.400 Join hosts from your favorite Pushkin podcasts as they talk 0:00:28.480 --> 0:00:31.640 with industry experts and leaders to explore how businesses can 0:00:31.680 --> 0:00:35.360 integrate AI into their workflows and help drive real change 0:00:35.400 --> 0:00:38.160 in this new era of AI. And of course, host 0:00:38.280 --> 0:00:40.440 Malcolm Gladwell will be there to guide you through the 0:00:40.479 --> 0:00:42.640 season and throw in his two cents as well. 0:00:43.120 --> 0:00:46.120 Look out for new episodes of Smart Talks with IBM 0:00:46.400 --> 0:00:49.519 every other week on the iHeartRadio app, Apple Podcasts, or 0:00:49.560 --> 0:00:53.360 wherever you get your podcasts, and learn more at IBM 0:00:53.479 --> 0:00:55.480 dot com slash Smart Talks. 0:00:57.720 --> 0:01:01.160 Hello, hello. Welcome to Smart Talks with IBM, a podcast 0:01:01.200 --> 0:01:06.480 from Pushkin Industries, iHeartRadio and IBM. I'm Malcolm Gladwell.
This 0:01:06.560 --> 0:01:11.200 season we're continuing our conversation with new creators, visionaries who 0:01:11.240 --> 0:01:15.319 are creatively applying technology in business to drive change, but 0:01:15.440 --> 0:01:19.600 with a focus on the transformative power of artificial intelligence 0:01:19.920 --> 0:01:22.640 and what it means to leverage AI as a game 0:01:22.720 --> 0:01:27.440 changing multiplier for your business. Our guest today is Dr. 0:01:27.520 --> 0:01:32.720 David Cox, VP of AI Models at IBM Research and 0:01:32.959 --> 0:01:37.520 IBM Director of the MIT-IBM Watson AI Lab, a 0:01:37.600 --> 0:01:42.480 first-of-its-kind industry-academic collaboration between IBM and 0:01:42.640 --> 0:01:48.240 MIT focused on fundamental research in artificial intelligence. Over 0:01:48.240 --> 0:01:51.960 the course of decades, David Cox watched as the AI 0:01:52.080 --> 0:01:55.920 revolution steadily grew from the simmering ideas of a few 0:01:55.960 --> 0:02:00.840 academics and technologists into the industrial boom we are experiencing today. 0:02:01.800 --> 0:02:04.720 Having dedicated his life to pushing the field of AI 0:02:04.840 --> 0:02:09.280 towards new horizons, David has both contributed to and presided 0:02:09.320 --> 0:02:14.520 over many of the major breakthroughs in artificial intelligence. In 0:02:14.560 --> 0:02:18.680 today's episode, you'll hear David explain some of the conceptual 0:02:18.840 --> 0:02:23.760 underpinnings of the current AI landscape, things like foundation models, 0:02:23.800 --> 0:02:27.680 in surprisingly comprehensible terms, I might add. We'll also get 0:02:27.720 --> 0:02:31.320 into some of the amazing practical applications for AI in business, 0:02:31.480 --> 0:02:34.440 as well as what implications AI will have for the 0:02:34.440 --> 0:02:38.200 future of work and design. David spoke with Jacob Goldstein, 0:02:38.480 --> 0:02:42.560 host of the Pushkin podcast What's Your Problem.
A veteran 0:02:42.639 --> 0:02:46.000 business journalist, Jacob has reported for The Wall Street Journal 0:02:46.280 --> 0:02:49.160 and the Miami Herald, and was a longtime host of the 0:02:49.280 --> 0:02:58.960 NPR program Planet Money. Okay, let's get to the interview. 0:03:00.840 --> 0:03:02.799 Tell me about your job at IBM. 0:03:03.240 --> 0:03:03.360 So. 0:03:03.680 --> 0:03:06.640 I wear two hats at IBM. So one, I'm the 0:03:06.680 --> 0:03:10.080 IBM Director of the MIT-IBM Watson AI Lab. So 0:03:10.160 --> 0:03:13.040 that's a joint lab between IBM and MIT where we 0:03:13.600 --> 0:03:15.880 try and invent what's next in AI. It's been running 0:03:15.919 --> 0:03:18.920 for about five years, and then more recently I started 0:03:19.040 --> 0:03:21.800 as the vice president for AI Models, and I'm in 0:03:21.880 --> 0:03:26.400 charge of building IBM's foundation models, you know, building 0:03:26.520 --> 0:03:28.600 these big models, generative models that allow us to have 0:03:28.639 --> 0:03:30.680 all kinds of new exciting capabilities in AI. 0:03:31.240 --> 0:03:33.160 So I want to talk to you a lot 0:03:33.200 --> 0:03:36.640 about foundation models, about generative AI. But before we get 0:03:36.640 --> 0:03:38.720 to that, let's just spend a minute on 0:03:38.840 --> 0:03:44.480 the IBM-MIT collaboration. Where did that partnership start? How 0:03:44.480 --> 0:03:45.280 did it originate? 0:03:46.440 --> 0:03:49.280 Yeah, so actually it turns out that MIT and IBM 0:03:49.480 --> 0:03:52.560 have been collaborating for a very long time in the 0:03:52.600 --> 0:03:56.680 area of AI. In fact, the term artificial intelligence was 0:03:56.720 --> 0:04:00.440 coined in a nineteen fifty six workshop that was held 0:04:00.440 --> 0:04:02.440 at Dartmouth, but it was actually organized by an IBMer, 0:04:02.560 --> 0:04:05.720 Nathaniel Rochester, who led the development of the IBM 0:04:05.760 --> 0:04:08.920 701.
So we've really been together in AI 0:04:08.960 --> 0:04:13.840 since the beginning, and as AI kept accelerating more and 0:04:13.880 --> 0:04:17.320 more and more, I think there was a really interesting 0:04:17.360 --> 0:04:19.880 decision to say, let's make this a formal partnership. So 0:04:20.080 --> 0:04:22.320 IBM in twenty seventeen announced it would be committing close 0:04:22.360 --> 0:04:25.200 to a quarter billion dollars over ten years to have 0:04:25.279 --> 0:04:29.360 this joint lab with MIT, and we located ourselves right 0:04:29.400 --> 0:04:31.640 on the campus, and we've been developing very, very deep 0:04:31.680 --> 0:04:34.280 relationships where we can really get to know each other, 0:04:34.400 --> 0:04:37.719 work shoulder to shoulder, conceiving what we should work on next, 0:04:37.760 --> 0:04:41.720 and then executing the projects. And really very few 0:04:42.360 --> 0:04:46.040 entities like this exist between academia and industry. It's been really 0:04:46.080 --> 0:04:48.320 fun the last five years to be a part of it. 0:04:48.960 --> 0:04:50.400 And what do you think are some of the most 0:04:50.440 --> 0:04:54.000 important outcomes of this collaboration between IBM and MIT? 0:04:55.400 --> 0:04:58.039 Yeah, so we're really kind of the tip of the 0:04:58.080 --> 0:05:02.720 spear for IBM's AI strategy. So we're really 0:05:02.720 --> 0:05:05.560 looking at, you know, what's coming ahead, and, you know, 0:05:05.560 --> 0:05:08.239 in areas like foundation models, you know, as the field 0:05:08.400 --> 0:05:11.600 changes, and MIT people are interested in working on, 0:05:11.760 --> 0:05:13.719 you know, faculty, students and staff are interested in working 0:05:13.800 --> 0:05:15.520 on what's the latest thing, what's the next thing. We 0:05:15.560 --> 0:05:18.960 at IBM Research are very much interested in the same.
0:05:19.240 --> 0:05:21.240 So we can kind of put out feelers, you know, 0:05:21.320 --> 0:05:24.880 interesting things that we're seeing in our research, interesting things 0:05:24.920 --> 0:05:26.479 we're hearing in the field. We can go and chase 0:05:26.480 --> 0:05:29.599 those opportunities. So when something big comes, like the big 0:05:29.680 --> 0:05:32.560 change that's been happening lately with foundation models, we're ready 0:05:32.560 --> 0:05:35.000 to jump on it. That's really the purpose; that's 0:05:35.000 --> 0:05:38.120 the lab functioning the way it should. We're also really 0:05:38.120 --> 0:05:41.640 interested in how we advance, you know, AI that 0:05:41.680 --> 0:05:44.600 can help with climate change or, you know, build better 0:05:44.720 --> 0:05:47.240 materials, and all these kinds of things that are, you know, 0:05:47.320 --> 0:05:50.440 a broader aperture sometimes than what we might consider 0:05:50.600 --> 0:05:53.479 just looking at the product portfolio of IBM, and 0:05:53.480 --> 0:05:55.240 that gives us, again, a breadth where we can see 0:05:55.240 --> 0:05:58.240 connections that we might not have seen otherwise. We can, 0:05:58.520 --> 0:06:01.039 you know, think about things that help out society and also 0:06:01.040 --> 0:06:02.080 help out our customers. 0:06:03.240 --> 0:06:07.680 So in the last, whatever, six months, say, there has been 0:06:07.720 --> 0:06:13.400 this wild rise in the public's interest in AI, right, 0:06:13.480 --> 0:06:16.880 clearly coming out of these generative AI models that are 0:06:16.920 --> 0:06:21.039 really accessible, you know, certainly ChatGPT and language models like that, 0:06:21.080 --> 0:06:24.159 as well as models that generate images, like Midjourney.
0:06:24.760 --> 0:06:27.520 I mean, can you just sort of briefly talk about 0:06:27.680 --> 0:06:31.680 the breakthroughs in AI that have made this moment feel 0:06:31.760 --> 0:06:35.240 so exciting, so revolutionary for artificial intelligence? 0:06:36.320 --> 0:06:41.039 Yeah. You know, I've been studying AI basically my entire 0:06:41.080 --> 0:06:43.200 adult life. Before I came to IBM, I was a 0:06:43.200 --> 0:06:45.760 professor at Harvard. I've been doing this a long time, 0:06:46.000 --> 0:06:48.280 and I've gotten used to being surprised. It sounds like 0:06:48.320 --> 0:06:51.400 a joke, but it's serious. Like, I'm getting used to 0:06:51.440 --> 0:06:55.359 being surprised at the acceleration of the pace. Again, it 0:06:55.440 --> 0:06:58.320 tracks actually a long way back. You know, there's lots 0:06:58.360 --> 0:07:00.920 of things where there was an idea that just simmered 0:07:01.600 --> 0:07:04.960 for a really long time. Some of the key math 0:07:05.440 --> 0:07:09.000 behind the stuff that we have today, which is amazing: 0:07:09.720 --> 0:07:12.960 there's an algorithm called backpropagation, which is sort of key 0:07:13.000 --> 0:07:15.600 to training neural networks, that's been around, you know, since 0:07:15.680 --> 0:07:19.520 the eighties, in wide use. And really what happened was 0:07:19.840 --> 0:07:23.560 it simmered for a long time, and then enough data 0:07:23.800 --> 0:07:27.440 and enough compute came. So we had enough data because, 0:07:28.080 --> 0:07:31.240 you know, we all started carrying multiple cameras around with us. 0:07:31.280 --> 0:07:34.080 Our mobile phones have, you know, all these cameras, 0:07:34.160 --> 0:07:36.600 and we put everything on the Internet, and there's 0:07:36.600 --> 0:07:39.119 all this data out there.
We caught a lucky break 0:07:39.120 --> 0:07:41.400 that there was something called a graphics processing unit, which 0:07:41.680 --> 0:07:43.680 turns out to be really useful for doing these kinds 0:07:43.720 --> 0:07:46.240 of algorithms, maybe even more useful than it is for 0:07:46.320 --> 0:07:50.400 doing graphics. They're great at graphics too. And things just kept 0:07:50.480 --> 0:07:53.680 kind of adding to the snowball. So we had deep learning, 0:07:54.120 --> 0:07:57.720 which is sort of a rebrand of the neural networks that 0:07:57.760 --> 0:07:59.960 I mentioned from the eighties, and that was enabled 0:08:00.160 --> 0:08:03.800 again by data, because we digitized the world, and compute, 0:08:03.800 --> 0:08:06.240 because we kept building faster and faster and more 0:08:06.280 --> 0:08:09.520 powerful computers, and then that allowed us to make 0:08:09.520 --> 0:08:13.200 this big breakthrough. And then, you know, more recently, using 0:08:13.280 --> 0:08:17.360 the same building blocks, that inexorable rise of more and 0:08:17.400 --> 0:08:21.920 more and more data met a technology called self-supervised learning. 0:08:22.400 --> 0:08:27.120 The key difference there: in traditional deep learning, you know, 0:08:27.160 --> 0:08:29.760 for classifying images, you know, like is this a cat 0:08:29.880 --> 0:08:33.120 or is this a dog in a picture, those technologies 0:08:33.520 --> 0:08:36.880 require supervision, so you have to take what you 0:08:36.960 --> 0:08:38.320 have and then you have to label it. So you 0:08:38.320 --> 0:08:39.679 have to take a picture of a cat, and then 0:08:39.679 --> 0:08:42.240 you label it as a cat. And it turns out 0:08:42.280 --> 0:08:44.600 that, you know, that's very powerful, but it takes a 0:08:44.600 --> 0:08:47.600 lot of time to label cats and to label dogs, 0:08:47.640 --> 0:08:50.040 and there's only so many labels that exist in the world.
0:08:50.440 --> 0:08:54.000 So what really changed more recently is that we have 0:08:54.080 --> 0:08:56.560 self-supervised learning, where you don't have to have the labels. 0:08:56.559 --> 0:08:59.040 We can just take unannotated data. And what that does 0:08:59.160 --> 0:09:02.240 is it lets you use even more data. And that's 0:09:02.280 --> 0:09:06.200 really what drove this latest sort of rage. And then, 0:09:06.559 --> 0:09:08.199 all of a sudden, we start getting these 0:09:08.559 --> 0:09:14.600 really powerful models. And really, these have been simmering technologies, right? 0:09:15.120 --> 0:09:19.280 This has been happening for a while and progressively getting 0:09:19.280 --> 0:09:21.880 more and more powerful. One of the things that really 0:09:22.120 --> 0:09:27.320 happened with ChatGPT and technologies like Stable Diffusion and 0:09:27.360 --> 0:09:30.480 Midjourney was that they made it visible to the public. 0:09:31.160 --> 0:09:33.080 You know, you put it out there, the public can 0:09:33.120 --> 0:09:35.280 touch and feel, and they're like, wow, not only is 0:09:35.280 --> 0:09:38.560 there palpable change, and, wow, you know, I could 0:09:38.559 --> 0:09:40.720 talk to this thing. Wow, this thing can generate an image. 0:09:41.040 --> 0:09:43.679 Not only that, but everyone can touch and feel and try. 0:09:44.520 --> 0:09:49.160 My kids can use some of these AI art generation technologies. 0:09:49.679 --> 0:09:53.200 And that's really just, you know, it's like a 0:09:53.600 --> 0:09:57.199 slingshot that propelled us into a different regime in terms 0:09:57.200 --> 0:09:58.960 of the public awareness of these technologies. 0:09:59.640 --> 0:10:02.800 You mentioned earlier in the conversation foundation models, and I 0:10:02.840 --> 0:10:04.679 want to talk a little bit about that.
I mean, 0:10:04.720 --> 0:10:08.120 can you just tell me, you know, what are foundation 0:10:08.360 --> 0:10:11.080 models for AI, and why are they a big deal? 0:10:12.280 --> 0:10:16.000 Yeah, so this term foundation model was coined by a 0:10:16.000 --> 0:10:19.760 group at Stanford, and I think it's actually a really 0:10:19.800 --> 0:10:23.319 apt term, because remember I said, you know, one of 0:10:23.360 --> 0:10:26.680 the big things that unlocked this latest excitement was the 0:10:26.679 --> 0:10:30.280 fact that we could use large amounts of unannotated data. 0:10:30.840 --> 0:10:32.200 We could train a model. We don't have to go 0:10:32.240 --> 0:10:35.760 through the painful effort of labeling each and every example. 0:10:36.320 --> 0:10:38.560 You still need to have your model do something you 0:10:38.559 --> 0:10:40.720 want it to do; you still need to tell it what 0:10:40.800 --> 0:10:42.360 you want it to do. You can't just have a model 0:10:42.400 --> 0:10:45.600 that doesn't have any purpose, but what a foundation model 0:10:45.640 --> 0:10:49.280 does is provide a foundation, like a literal foundation. You can 0:10:49.320 --> 0:10:51.320 sort of stand on the shoulders of giants. You can 0:10:51.400 --> 0:10:53.960 have one of these massively trained models and then do 0:10:54.040 --> 0:10:56.400 a little bit on top. You know, you could use 0:10:56.440 --> 0:10:59.280 just a few examples of what you're looking for, and 0:10:59.559 --> 0:11:01.880 you can get what you want from the model. So 0:11:02.040 --> 0:11:04.040 just a little bit on top now gets you to 0:11:04.080 --> 0:11:06.120 results that used to take a huge amount of effort, 0:11:06.200 --> 0:11:08.400 you know, to get from the ground 0:11:08.520 --> 0:11:10.120 up to that level.
0:11:10.400 --> 0:11:14.320 I was trying to think of an analogy for 0:11:14.440 --> 0:11:17.480 sort of foundation models versus what came before, and I 0:11:17.480 --> 0:11:19.960 don't know that I came up with a good one, 0:11:20.000 --> 0:11:21.719 but the best I could do was this. I want 0:11:21.720 --> 0:11:24.640 you to tell me if it's plausible. It's like before 0:11:24.760 --> 0:11:28.280 foundation models, it was like you had these sort of 0:11:28.360 --> 0:11:31.560 single-use kitchen appliances. You could make a waffle iron 0:11:31.600 --> 0:11:34.120 if you wanted waffles, or you could make a toaster 0:11:34.280 --> 0:11:36.839 if you wanted to make toast. But a foundation model 0:11:36.880 --> 0:11:39.480 is like an oven with a range on top. 0:11:39.520 --> 0:11:41.319 So it's like this machine, and you could just cook 0:11:41.400 --> 0:11:43.240 anything with this machine. 0:11:43.880 --> 0:11:48.360 Yeah, that's a great analogy. They're very versatile. The other 0:11:48.480 --> 0:11:51.040 piece of it, too, is that they dramatically lower the 0:11:51.160 --> 0:11:54.320 effort that it takes to do something that you want 0:11:54.360 --> 0:11:57.640 to do. And I used to say about the old 0:11:57.679 --> 0:11:59.440 world of AI, you know, the problem with 0:11:59.480 --> 0:12:03.240 automation is that it's too labor-intensive, which sounds like 0:12:03.280 --> 0:12:04.160 I'm making a joke. 0:12:04.400 --> 0:12:08.920 Indeed. Famously, if automation does one thing, it substitutes machines 0:12:09.040 --> 0:12:12.280 or computing power for labor, right? So what does it 0:12:12.360 --> 0:12:16.640 mean to say AI, or automation, is too labor-intensive? 0:12:17.120 --> 0:12:19.120 It sounds like I'm making a joke, but I'm actually serious. 0:12:19.320 --> 0:12:22.520 What I mean is that the effort it took in the 0:12:22.640 --> 0:12:26.480 old regime to automate something was very, very high.
So 0:12:26.640 --> 0:12:29.520 if I need to go and curate all this data, 0:12:29.600 --> 0:12:32.800 collect all this data, and then carefully label all these examples, 0:12:33.200 --> 0:12:37.600 that labeling itself might be incredibly expensive and time-consuming. And 0:12:37.640 --> 0:12:40.240 we estimate that anywhere between eighty and ninety percent of the 0:12:40.320 --> 0:12:43.200 effort it takes to field an AI solution is actually 0:12:43.360 --> 0:12:46.679 just spent on data, so that has some consequences, 0:12:47.000 --> 0:12:52.360 which is the threshold for bothering. You know, if you're 0:12:52.360 --> 0:12:54.559 going to only get a little bit of value back 0:12:54.800 --> 0:12:57.040 from something, are you going to go through this huge 0:12:57.040 --> 0:13:00.560 effort to curate all this data, and then, when it 0:13:00.559 --> 0:13:02.960 comes time to train the model, you need highly skilled 0:13:03.000 --> 0:13:06.680 people who are expensive or hard to find in the labor market? 0:13:07.200 --> 0:13:08.719 You know, are you really going to do something that's 0:13:08.760 --> 0:13:10.800 just a tiny incremental thing? No, you're going to do 0:13:10.840 --> 0:13:15.840 only the highest-value things that warrant that level, because... 0:13:15.440 --> 0:13:18.840 You have to essentially build the whole machine from scratch, 0:13:19.040 --> 0:13:21.960 and there aren't many things where it's worth that much 0:13:21.960 --> 0:13:24.040 work to build a machine that's only going to do 0:13:24.160 --> 0:13:25.319 one narrow thing. 0:13:25.800 --> 0:13:28.719 That's right, and then you tackle the next problem and 0:13:28.840 --> 0:13:31.160 you basically have to start over. And, you know, there 0:13:31.160 --> 0:13:34.000 are some nuances here, like for images, you can 0:13:34.040 --> 0:13:36.560 pretrain a model on some other task and change it around.
0:13:36.559 --> 0:13:39.520 So there are some examples of this, like non-recurring 0:13:39.640 --> 0:13:42.160 cost that we have in the old world too, but 0:13:42.240 --> 0:13:44.800 by and large, it's just a lot of effort. It's hard. 0:13:45.080 --> 0:13:49.360 It takes, you know, a high level of skill to implement. 0:13:50.160 --> 0:13:52.959 One analogy that I like is, you know, think about 0:13:52.960 --> 0:13:55.080 it as, you know, you have a river of data, 0:13:55.440 --> 0:13:58.840 you know, running through your company or your institution. Traditional 0:13:58.840 --> 0:14:01.320 AI solutions are kind of like building a dam on 0:14:01.360 --> 0:14:04.840 that river. You know, dams are very expensive things to build. 0:14:05.160 --> 0:14:09.439 They require highly specialized skills and lots of planning. And 0:14:09.640 --> 0:14:11.360 you know, you're only going to put a dam on 0:14:11.720 --> 0:14:14.440 a river that's big enough that you're gonna get enough 0:14:14.520 --> 0:14:16.400 energy out of it that it was worth your trouble. 0:14:16.800 --> 0:14:18.040 You're going to get a lot of value out of 0:14:18.040 --> 0:14:19.920 that dam. So you have a river like that, you know, 0:14:20.000 --> 0:14:23.560 a river of data, but actually the vast majority 0:14:23.600 --> 0:14:26.160 of the water, you know, in your kingdom isn't 0:14:26.160 --> 0:14:30.360 in that river. It's in puddles and creeks and babbling brooks. 0:14:30.400 --> 0:14:33.840 And, you know, there's a lot of value left on 0:14:33.880 --> 0:14:36.440 the table, because it's like, well, there's nothing 0:14:36.440 --> 0:14:38.240 you can do about it. It's just too 0:14:39.240 --> 0:14:42.360 low-value, it takes too much effort, so I'm 0:14:42.400 --> 0:14:43.960 just not going to do it. The return on investment 0:14:44.320 --> 0:14:47.080 just isn't there, so you just end up not automating things. 0:14:47.320 --> 0:14:50.640 It's too much of a pain.
Now what foundation models 0:14:50.640 --> 0:14:52.720 do is they say, well, actually, no, we can train 0:14:53.600 --> 0:14:55.680 a base model, a foundation that you can work on, 0:14:55.800 --> 0:14:57.760 and we don't care, we don't have to specify what 0:14:57.760 --> 0:14:59.280 the task is ahead of time. We just need to 0:14:59.680 --> 0:15:02.440 learn about the domain of data. So if we want 0:15:02.440 --> 0:15:05.760 to build something that can understand the English language, there's a 0:15:05.800 --> 0:15:09.080 ton of English-language text available out in the world. 0:15:09.280 --> 0:15:13.040 We can now train models on huge quantities of it, 0:15:13.520 --> 0:15:17.280 and then it learns the structure, learns, you know, 0:15:17.400 --> 0:15:20.280 a good part of how language works, on all that unlabeled data, 0:15:20.360 --> 0:15:22.600 and then when you roll up with your task, you know, 0:15:22.840 --> 0:15:26.240 I want to solve this particular problem, you don't have 0:15:26.320 --> 0:15:29.000 to start from scratch. You're starting from a very, very, 0:15:29.080 --> 0:15:31.960 very high place. So that just gives you the ability 0:15:32.040 --> 0:15:34.440 to, you know, now all of a sudden, everything 0:15:34.680 --> 0:15:37.560 is accessible. All the puddles and creeks and babbling brooks 0:15:37.720 --> 0:15:42.000 and klipons, you know, those are all accessible now. And 0:15:42.040 --> 0:15:44.920 that's very exciting. It just changes the equation 0:15:45.040 --> 0:15:47.600 on what kinds of problems you could use AI to solve. 0:15:47.720 --> 0:15:53.160 And so foundation models basically mean that automating some new 0:15:53.280 --> 0:15:56.520 task is much less labor-intensive. The sort of marginal 0:15:56.600 --> 0:15:59.560 effort to do some new automation thing is much lower 0:15:59.560 --> 0:16:02.880 because you're building on top of the foundation model rather 0:16:02.960 --> 0:16:07.320 than starting from scratch.
Absolutely. So that is 0:16:07.440 --> 0:16:11.040 like the exciting good news. I do feel like there's 0:16:11.840 --> 0:16:14.400 a little bit of a countervailing idea that's worth talking 0:16:14.480 --> 0:16:16.800 about here, and that is the idea that even though 0:16:16.840 --> 0:16:20.880 there are these foundation models that are really powerful, that 0:16:20.920 --> 0:16:23.960 are relatively easy to build on top of, it's still 0:16:24.000 --> 0:16:27.200 the case, right, that there is not some one size 0:16:27.240 --> 0:16:30.920 fits all foundation model. So, you know, what does that 0:16:31.040 --> 0:16:33.280 mean, and why is that important to think about in 0:16:33.320 --> 0:16:34.040 this context? 0:16:34.640 --> 0:16:38.440 Yeah, so we believe very strongly that there isn't just 0:16:38.560 --> 0:16:41.400 one model to rule them all. There's a number of 0:16:41.480 --> 0:16:44.440 reasons why that could be true. One, which I think 0:16:44.520 --> 0:16:48.560 is important and very relevant today, is how much energy 0:16:48.880 --> 0:16:53.640 these models can consume. So these models, you know, can 0:16:53.680 --> 0:16:59.120 get very, very large. So one thing that we're starting 0:16:59.160 --> 0:17:01.880 to see, or starting to believe, is that you probably 0:17:01.920 --> 0:17:07.040 shouldn't use one giant sledgehammer model to solve every single problem, 0:17:07.200 --> 0:17:09.160 you know, like we should pick the right-size model 0:17:09.200 --> 0:17:12.000 to solve the problem. We shouldn't necessarily assume that we 0:17:12.040 --> 0:17:16.600 need the biggest, baddest model for every little use case. 0:17:17.040 --> 0:17:19.280 And we're also seeing that, you know, small models that 0:17:19.320 --> 0:17:23.520 are trained to specialize in particular domains can actually 0:17:23.520 --> 0:17:27.080 outperform much bigger models. So bigger isn't even always better.
0:17:27.440 --> 0:17:30.040 So they're more efficient, and they do the thing you 0:17:30.080 --> 0:17:31.680 want them to do better as well. 0:17:32.240 --> 0:17:35.520 That's right. So Stanford, for instance, a group at Stanford 0:17:35.520 --> 0:17:38.639 trained a model. It's a two point seven billion 0:17:38.680 --> 0:17:41.840 parameter model, which isn't terribly big by today's standards. They 0:17:41.840 --> 0:17:44.119 trained it just on the biomedical literature. You know, this 0:17:44.160 --> 0:17:46.520 is the kind of thing that universities do. And what 0:17:46.560 --> 0:17:50.080 they showed was that this model was better at answering 0:17:50.160 --> 0:17:52.680 questions about the biomedical literature than some models that are 0:17:53.200 --> 0:17:56.919 one hundred billion parameters, you know, many times larger. So 0:17:57.080 --> 0:17:59.600 it's a little bit like, you know, asking an expert 0:18:00.119 --> 0:18:03.919 for help on something versus asking the smartest person you know. Yeah, 0:18:04.000 --> 0:18:06.440 the smartest person you know may be very smart, but 0:18:06.560 --> 0:18:09.439 they're not going to be an expert. And then, as an 0:18:09.480 --> 0:18:11.919 added bonus, you know, this is now a much smaller model, 0:18:12.000 --> 0:18:13.919 it's much more efficient to run, 0:18:14.480 --> 0:18:18.400 you know, it's cheaper, so there's lots of different advantages there.
0:18:18.440 --> 0:18:22.000 So I think we're going to see a tension in the 0:18:22.080 --> 0:18:25.360 industry between vendors that say, hey, this is the one, 0:18:25.560 --> 0:18:27.920 you know, big model, and then others that say, well, actually, 0:18:28.200 --> 0:18:30.720 you know, there's lots of different tools 0:18:30.720 --> 0:18:32.720 we can use that all have this nice quality that 0:18:32.760 --> 0:18:35.440 we outlined at the beginning, and we should really 0:18:35.440 --> 0:18:36.960 pick the one that makes the most sense for the 0:18:37.320 --> 0:18:38.040 task at hand. 0:18:39.320 --> 0:18:43.720 So there's sustainability, basically efficiency. Another kind of set of 0:18:43.720 --> 0:18:46.000 issues that come up a lot with AI are 0:18:46.200 --> 0:18:50.000 bias and hallucination. Can you talk a little bit about bias 0:18:50.240 --> 0:18:52.480 and hallucination, what they are and how you're working to 0:18:52.880 --> 0:18:54.000 mitigate those problems? 0:18:54.400 --> 0:18:57.240 Yeah, so there are still lots of issues, as amazing 0:18:57.280 --> 0:19:00.240 as these technologies are, and they are amazing, let's 0:19:00.280 --> 0:19:02.720 be very clear, lots of great things we're going to 0:19:02.840 --> 0:19:06.640 enable with these kinds of technologies. Bias isn't a new problem, 0:19:07.000 --> 0:19:11.600 so, you know, basically we've seen this since the beginning 0:19:11.600 --> 0:19:14.520 of AI. If you train a model on data that 0:19:14.960 --> 0:19:17.080 has a bias in it, the model is going to 0:19:17.119 --> 0:19:21.680 recapitulate that bias when it provides its answers. So, 0:19:21.880 --> 0:19:24.399 you know, if all the text you have, you know, 0:19:24.440 --> 0:19:27.520 is more likely to refer to female nurses and male scientists, 0:19:27.560 --> 0:19:29.639 then you're going to, you know, get models that, you know...
0:19:29.720 --> 0:19:32.760 For instance, there was an example where a machine learning 0:19:32.800 --> 0:19:37.200 based translation system translated from Hungarian to English. Hungarian doesn't 0:19:37.240 --> 0:19:40.520 have gendered pronouns. English does, and when you asked it 0:19:40.560 --> 0:19:42.879 to translate, it would translate "they are a nurse" to 0:19:43.320 --> 0:19:46.160 "she is a nurse," and "they are a scientist" 0:19:46.200 --> 0:19:48.399 to "he is a scientist." And that's not because the 0:19:49.240 --> 0:19:51.800 people who wrote the algorithm were building in bias and 0:19:51.920 --> 0:19:53.640 coding in, like, oh, it's got to be this way. 0:19:53.720 --> 0:19:55.960 It's because the data was like that. You know, we 0:19:56.040 --> 0:20:00.159 have biases in our society, and they're reflected in 0:20:00.200 --> 0:20:04.000 our data and our text and our images everywhere. And 0:20:04.040 --> 0:20:06.920 then the models are just mapping from what 0:20:06.960 --> 0:20:09.480 they've seen in their training data to the result 0:20:09.520 --> 0:20:11.400 that you're trying to get them to give, 0:20:11.440 --> 0:20:15.240 and then these biases come out. So there's a very 0:20:15.600 --> 0:20:19.439 active program of research, and, you know, we do 0:20:19.800 --> 0:20:22.600 quite a bit at IBM Research and MIT, but 0:20:22.840 --> 0:20:25.960 also all over the community, in industry and academia, trying 0:20:25.960 --> 0:20:29.320 to figure out how we explicitly remove these biases, 0:20:29.359 --> 0:20:31.560 how we identify them, how 0:20:31.600 --> 0:20:33.840 we build tools that allow people to audit their 0:20:33.840 --> 0:20:36.439 systems to make sure they aren't biased. So this is 0:20:36.440 --> 0:20:38.560 a really important thing.
And, you know, again, this has been 0:20:38.600 --> 0:20:43.200 here since the beginning, you know, of machine learning and AI, 0:20:43.680 --> 0:20:47.240 but foundation models and large language models and generative AI 0:20:48.119 --> 0:20:50.880 just bring it into even sharper focus, because there's 0:20:50.880 --> 0:20:53.399 just so much data, and it's sort of 0:20:54.000 --> 0:20:56.960 baking in all these different biases we have, so 0:20:57.080 --> 0:21:01.240 that's absolutely a problem that these models have. Another 0:21:01.280 --> 0:21:04.720 one that you mentioned was hallucinations. So even the most 0:21:04.720 --> 0:21:08.840 impressive of our models will often just make stuff up. 0:21:09.280 --> 0:21:11.840 You know, the technical term that the field has chosen 0:21:11.960 --> 0:21:15.280 is hallucination. To give you an example, I asked 0:21:15.359 --> 0:21:19.920 ChatGPT to create a biography of David Cox, IBM, and, 0:21:20.040 --> 0:21:22.320 you know, it started off really well. You know, it 0:21:22.440 --> 0:21:24.639 identified that I was the director of the MIT-IBM 0:21:24.680 --> 0:21:27.320 Watson AI Lab and said a few words about that, and then 0:21:27.359 --> 0:21:32.119