WEBVTT - Smart Talks with IBM: RE-AIR - Transformations in AI: Why Foundation Models Are the Future 0:00:00.160 --> 0:00:02.840 Hey everyone, it's Robert and Joe here. Today we've got 0:00:02.880 --> 0:00:04.760 something a little bit different to share with you. It 0:00:04.840 --> 0:00:08.039 is a new season of the Smart Talks with IBM 0:00:08.160 --> 0:00:09.160 podcast series. 0:00:09.640 --> 0:00:11.639 Today we are witness to one of those rare moments 0:00:11.680 --> 0:00:14.400 in history, the rise of an innovative technology with the 0:00:14.400 --> 0:00:18.680 potential to radically transform business and society forever. The technology, 0:00:18.760 --> 0:00:22.240 of course, is artificial intelligence, and it's the central focus 0:00:22.239 --> 0:00:24.840 for this new season of Smart Talks with IBM. 0:00:25.320 --> 0:00:28.400 Join hosts from your favorite Pushkin podcasts as they talk 0:00:28.480 --> 0:00:31.680 with industry experts and leaders to explore how businesses can 0:00:31.720 --> 0:00:35.400 integrate AI into their workflows and help drive real change 0:00:35.400 --> 0:00:38.240 in this new era of AI. And of course, host 0:00:38.280 --> 0:00:40.479 Malcolm Gladwell will be there to guide you through the 0:00:40.479 --> 0:00:42.640 season and throw in his two cents as well. 0:00:43.120 --> 0:00:46.200 Look out for new episodes of Smart Talks with IBM 0:00:46.400 --> 0:00:49.559 every other week on the iHeartRadio app, Apple Podcasts, or 0:00:49.600 --> 0:00:53.360 wherever you get your podcasts. And learn more at IBM 0:00:53.479 --> 0:00:55.480 dot com slash Smart Talks. 0:00:56.120 --> 0:00:59.760 Hey, it's Jacob Goldstein for Smart Talks with IBM. Last 0:00:59.840 --> 0:01:02.280 year I had the pleasure of sitting down with doctor 0:01:02.360 --> 0:01:07.399 David Cox, VP of AI Models at IBM Research.
We 0:01:07.480 --> 0:01:11.200 explored the fascinating world of AI foundation models and their 0:01:11.240 --> 0:01:15.919 revolutionary potential for business automation and innovation. When we first 0:01:16.000 --> 0:01:19.319 aired this episode, the concept of foundation models was just 0:01:19.400 --> 0:01:23.280 beginning to capture our attention. Since then, this technology has 0:01:23.400 --> 0:01:27.480 evolved and redefined the boundaries of what's possible. Businesses are 0:01:27.520 --> 0:01:30.959 becoming more savvy about selecting the right models and understanding 0:01:30.959 --> 0:01:34.360 how they can drive revenue and efficiency. As I listened 0:01:34.360 --> 0:01:37.280 back to the conversation, it was interesting to reflect on 0:01:37.319 --> 0:01:41.120 some new developments and ideas that have emerged, and many 0:01:41.160 --> 0:01:44.320 of these we will continue to explore throughout the season, 0:01:44.840 --> 0:01:46.920 like how to play an active role in choosing the 0:01:46.920 --> 0:01:50.440 best model for your needs. Whether you're a longtime listener 0:01:50.520 --> 0:01:52.920 or tuning in for the first time, I'm certain you'll 0:01:52.920 --> 0:01:56.680 find doctor Cox's insights as thought provoking as ever. Thanks 0:01:56.680 --> 0:01:59.760 as always for joining us. Now let's dive in. 0:02:01.440 --> 0:02:05.240 Hello, hello. Welcome to Smart Talks with IBM, a podcast 0:02:05.240 --> 0:02:10.720 from Pushkin Industries, iHeartRadio and IBM. I'm Malcolm Gladwell. Our 0:02:10.760 --> 0:02:15.440 guest today is doctor David Cox, VP of AI Models 0:02:15.480 --> 0:02:20.480 at IBM Research and IBM Director of the MIT-IBM 0:02:20.560 --> 0:02:24.800 Watson AI Lab, a first-of-its-kind industry-academic 0:02:24.919 --> 0:02:30.000 collaboration between IBM and MIT focused on the fundamental research 0:02:30.400 --> 0:02:35.160 of artificial intelligence.
Over the course of decades, David Cox 0:02:35.240 --> 0:02:39.880 watched as the AI revolution steadily grew from the simmering 0:02:39.919 --> 0:02:43.760 ideas of a few academics and technologists into the industrial 0:02:43.840 --> 0:02:48.160 boom we are experiencing today. Having dedicated his life to 0:02:48.240 --> 0:02:51.640 pushing the field of AI towards new horizons, David has 0:02:51.680 --> 0:02:55.600 both contributed to and presided over many of the major 0:02:55.720 --> 0:03:01.320 breakthroughs in artificial intelligence. In today's episode, you'll hear David 0:03:01.320 --> 0:03:06.440 explain some of the conceptual underpinnings of the current AI landscape, 0:03:06.600 --> 0:03:11.240 things like foundation models, in surprisingly comprehensible terms, I might add. 0:03:11.800 --> 0:03:14.959 We'll also get into some of the amazing practical applications 0:03:14.960 --> 0:03:18.000 for AI in business, as well as what implications AI 0:03:18.120 --> 0:03:21.640 will have for the future of work and design. David 0:03:21.680 --> 0:03:25.400 spoke with Jacob Goldstein, host of the Pushkin podcast What's 0:03:25.440 --> 0:03:29.720 Your Problem. A veteran business journalist, Jacob has reported for 0:03:29.760 --> 0:03:32.560 The Wall Street Journal, the Miami Herald, and was a 0:03:32.560 --> 0:03:38.200 longtime host of the NPR program Planet Money. Okay, let's 0:03:38.240 --> 0:03:39.120 get to the interview. 0:03:41.480 --> 0:03:43.240 Tell me about your job at IBM. 0:03:43.840 --> 0:03:47.160 So I wear two hats at IBM. So one, I'm 0:03:47.200 --> 0:03:50.080 the IBM Director of the MIT-IBM Watson AI Lab. 0:03:50.560 --> 0:03:53.600 So that's a joint lab between IBM and MIT where 0:03:53.640 --> 0:03:56.200 we try and invent what's next in AI.
It's been 0:03:56.280 --> 0:03:59.120 running for about five years, and then more recently I 0:03:59.160 --> 0:04:02.400 started as Vice President for AI Models, and I'm in 0:04:02.480 --> 0:04:07.040 charge of building IBM's foundation models, you know, building these 0:04:07.120 --> 0:04:09.240 big models, generative models that allow us to have 0:04:09.240 --> 0:04:11.320 all kinds of new exciting capabilities in AI. 0:04:11.840 --> 0:04:13.760 So I want to talk to you a lot 0:04:13.840 --> 0:04:17.240 about foundation models, about generative AI. But before we get 0:04:17.279 --> 0:04:19.360 to that, let's just spend a minute on the 0:04:19.440 --> 0:04:25.080 IBM-MIT collaboration. Where did that partnership start? How 0:04:25.080 --> 0:04:25.920 did it originate? 0:04:27.080 --> 0:04:29.880 Yeah. So, actually it turns out that MIT and IBM 0:04:30.120 --> 0:04:33.200 have been collaborating for a very long time in the 0:04:33.240 --> 0:04:37.280 area of AI. In fact, the term artificial intelligence was 0:04:37.360 --> 0:04:41.000 coined in a nineteen fifty six workshop that was held 0:04:41.040 --> 0:04:43.200 at Dartmouth. It was actually organized by an IBMer, 0:04:43.279 --> 0:04:46.560 Nathaniel Rochester, who led the development of the IBM seven 0:04:46.600 --> 0:04:49.840 oh one. So we've really been together in AI since 0:04:49.880 --> 0:04:54.719 the beginning, and as AI kept accelerating more and more 0:04:54.760 --> 0:04:58.360 and more, I think there was a really interesting decision 0:04:58.360 --> 0:05:01.080 to say, let's make this a formal partnership.
So IBM, 0:05:01.120 --> 0:05:02.960 in twenty seventeen, announced it would be committing close 0:05:02.960 --> 0:05:05.840 to a quarter billion dollars over ten years to have 0:05:05.920 --> 0:05:09.799 this joint lab with MIT, and we located ourselves 0:05:09.839 --> 0:05:12.000 right on the campus, and we've been developing very, very 0:05:12.080 --> 0:05:14.359 deep relationships where we can, you know, really get to 0:05:14.400 --> 0:05:17.479 know each other, work shoulder to shoulder, conceiving what we 0:05:17.480 --> 0:05:20.120 should work on next, and then executing the projects. And 0:05:20.200 --> 0:05:24.159 it's really, you know, very few entities like this exist 0:05:24.640 --> 0:05:27.640 between academia and industry. It's been really fun over the last 0:05:27.680 --> 0:05:28.920 five years to be a part of it. 0:05:29.560 --> 0:05:31.080 And what do you think are some of the most 0:05:31.080 --> 0:05:34.600 important outcomes of this collaboration between IBM and MIT? 0:05:36.000 --> 0:05:38.680 Yeah, so we're really kind of the tip of the 0:05:38.720 --> 0:05:43.640 spear for IBM's AI strategy. So we're really looking, 0:05:43.880 --> 0:05:46.680 you know, at what's coming ahead, and, you know, in areas 0:05:46.720 --> 0:05:50.480 like foundation models, you know, as the field changes, MIT 0:05:50.640 --> 0:05:53.279 people are interested in working on, you know, faculty, students 0:05:53.279 --> 0:05:55.440 and staff are interested in working on what's the latest thing, 0:05:55.480 --> 0:05:58.520 what's the next thing. We at IBM Research are very 0:05:58.560 --> 0:06:00.839 much interested in the same. We can kind of put 0:06:00.839 --> 0:06:03.719 out feelers, you know, interesting things that we're seeing in 0:06:03.760 --> 0:06:06.480 our research, interesting things we're hearing in the field. We 0:06:06.520 --> 0:06:09.640 can go and chase those opportunities.
So when something big comes, 0:06:09.760 --> 0:06:12.719 like the big change that's been happening lately with foundation models, 0:06:12.720 --> 0:06:15.120 we're ready to jump on it. That's really the purpose, 0:06:15.200 --> 0:06:18.160 that's the lab functioning the way it should. We're 0:06:18.200 --> 0:06:21.680 also really interested in how do we advance, you know, 0:06:21.839 --> 0:06:24.560 AI that can help with climate change or, you know, 0:06:24.680 --> 0:06:27.320 build better materials and all these kinds of things that 0:06:27.360 --> 0:06:30.320 are, you know, a broader aperture sometimes than what we 0:06:30.400 --> 0:06:33.400 might consider just looking at the product portfolio of IBM, 0:06:33.720 --> 0:06:35.480 and that gives us again a breadth where we 0:06:35.520 --> 0:06:38.160 can see connections that we might not have seen otherwise. 0:06:38.520 --> 0:06:41.080 We can, you know, think of things that help out society 0:06:41.200 --> 0:06:42.680 and also help out our customers. 0:06:43.480 --> 0:06:47.920 So the last, whatever, six months, say, there has been 0:06:47.960 --> 0:06:53.680 this wild rise in the public's interest in AI, right, 0:06:53.720 --> 0:06:57.160 clearly coming out of these generative AI models that are 0:06:57.160 --> 0:07:01.279 really accessible, you know, certainly ChatGPT, language models like that, 0:07:01.320 --> 0:07:04.400 as well as models that generate images, like Midjourney. 0:07:05.000 --> 0:07:08.480 I mean, can you just sort of briefly talk about 0:07:07.920 --> 0:07:11.920 the breakthroughs in AI that have made this moment feel 0:07:12.000 --> 0:07:15.520 so exciting, so revolutionary for artificial intelligence? 0:07:16.560 --> 0:07:21.280 Yeah. You know, I've been studying AI basically my entire 0:07:21.320 --> 0:07:23.440 adult life. Before I came to IBM, I was a 0:07:23.440 --> 0:07:26.000 professor at Harvard.
I've been doing this a long time, 0:07:26.240 --> 0:07:28.520 and I've gotten used to being surprised. It sounds like 0:07:28.560 --> 0:07:31.640 a joke, but it's serious. Like, I'm getting used to 0:07:31.680 --> 0:07:35.600 being surprised at the acceleration of the pace. Again, it 0:07:35.680 --> 0:07:38.560 tracks actually a long way back. You know, there's lots 0:07:38.600 --> 0:07:41.160 of things where there was an idea that just simmered 0:07:41.840 --> 0:07:45.200 for a really long time. Some of the key math 0:07:45.680 --> 0:07:49.280 behind the stuff that we have today, which is amazing. 0:07:50.000 --> 0:07:52.880 There's an algorithm called backpropagation, which is sort of 0:07:52.960 --> 0:07:55.600 key to training neural networks, that's been around, you know, 0:07:55.640 --> 0:07:59.520 since the eighties in wide use. And really what happened 0:07:59.640 --> 0:08:03.320 was it simmered for a long time, and then enough 0:08:03.480 --> 0:08:07.239 data and enough compute came. So we had enough data 0:08:07.280 --> 0:08:11.240 because, you know, we all started carrying multiple cameras around 0:08:11.240 --> 0:08:13.640 with us. Our mobile phones have, you know, all 0:08:13.640 --> 0:08:16.400 these cameras, and we put everything on the Internet, 0:08:16.520 --> 0:08:18.760 and there's all this data out there. We caught a 0:08:18.840 --> 0:08:21.320 lucky break that there was something called the graphics processing unit, 0:08:21.400 --> 0:08:23.640 which turns out to be really useful for doing these 0:08:23.720 --> 0:08:26.360 kinds of algorithms, maybe even more useful than it is 0:08:26.400 --> 0:08:30.360 for doing graphics. They're great at graphics too. And things just 0:08:30.480 --> 0:08:33.080 kept kind of adding to the snowball.
So we had 0:08:33.240 --> 0:08:37.439 deep learning, which is sort of a rebrand of the neural 0:08:37.480 --> 0:08:39.960 networks that I mentioned from the eighties, and that was 0:08:40.040 --> 0:08:43.199 enabled again by data, because we digitized the world, and 0:08:43.679 --> 0:08:46.520 compute, because we kept building faster and faster and more 0:08:46.520 --> 0:08:49.760 powerful computers, and then that allowed us to make this 0:08:49.760 --> 0:08:53.440 big breakthrough. And then, you know, more recently, using 0:08:53.520 --> 0:08:57.600 the same building blocks and that inexorable rise of more and 0:08:57.640 --> 0:09:02.160 more and more data, came a technology called self-supervised learning. 0:09:02.640 --> 0:09:07.360 The key difference there: in traditional deep learning, you know, 0:09:07.400 --> 0:09:10.040 for classifying images, you know, like is this a cat 0:09:10.120 --> 0:09:13.320 or is this a dog in a picture, those technologies 0:09:13.800 --> 0:09:17.120 require supervision, so you have to take what you 0:09:17.200 --> 0:09:18.560 have and then you have to label it. So you 0:09:18.600 --> 0:09:19.920 have to take a picture of a cat and then 0:09:19.960 --> 0:09:22.640 you label it as a cat, and it turns out that, 0:09:22.800 --> 0:09:25.000 you know, that's very powerful, but it takes a lot 0:09:25.000 --> 0:09:27.920 of time to label cats and to label dogs, and 0:09:28.360 --> 0:09:30.280 there's only so many labels that exist in the world. 0:09:30.679 --> 0:09:34.240 So what really changed more recently is that we have 0:09:34.320 --> 0:09:36.800 self-supervised learning, where you don't have to have the labels. 0:09:36.800 --> 0:09:39.360 We can just take unannotated data. And what that does 0:09:39.400 --> 0:09:42.480 is it allows you to use even more data. And that's 0:09:42.520 --> 0:09:46.120 really what drove this latest sort of rage.
And 0:09:46.120 --> 0:09:48.280 then all of a sudden we start getting 0:09:48.320 --> 0:09:52.199 these really powerful models. And then really, this has 0:09:52.240 --> 0:09:57.040 been simmering technology, right, this has been happening for a 0:09:57.080 --> 0:10:01.280 while and progressively getting more and more powerful. One of 0:10:01.280 --> 0:10:05.560 the things that really happened with ChatGPT and technologies like 0:10:06.000 --> 0:10:09.079 Stable Diffusion and Midjourney was that they made it 0:10:09.640 --> 0:10:12.319 visible to the public. You know, you put it out 0:10:12.360 --> 0:10:14.600 there, the public can touch and feel, and they're like, wow, 0:10:14.920 --> 0:10:18.480 not only is there palpable change, and wow, this, you know, 0:10:18.520 --> 0:10:20.000 I can talk to this thing. Wow, this thing can 0:10:20.080 --> 0:10:22.959 generate an image. Not only that, but everyone can touch 0:10:23.000 --> 0:10:27.280 and feel and try. My kids can use some of 0:10:27.280 --> 0:10:32.720 these AI art generation technologies, and that's really just launched, 0:10:32.800 --> 0:10:36.040 you know, it's like a slingshot that propelled us into 0:10:36.360 --> 0:10:38.400 a different regime in terms of the public awareness of 0:10:38.400 --> 0:10:39.239 these technologies. 0:10:39.920 --> 0:10:43.040 You mentioned earlier in the conversation foundation models, and I 0:10:43.080 --> 0:10:44.920 want to talk a little bit about that. I mean, 0:10:44.960 --> 0:10:48.360 can you just tell me, you know, what are foundation 0:10:48.600 --> 0:10:51.360 models for AI and why are they a big deal?
0:10:52.520 --> 0:10:56.240 Yeah. So this term foundation model was coined by a 0:10:56.280 --> 0:10:59.960 group at Stanford, and I think it's actually a really 0:11:00.080 --> 0:11:03.600 apt term because, remember, I said, you know, one of 0:11:03.600 --> 0:11:06.920 the big things that unlocked this latest excitement was the 0:11:06.920 --> 0:11:10.520 fact that we could use large amounts of unannotated data. 0:11:11.080 --> 0:11:12.440 We could train a model. We don't have to go 0:11:12.480 --> 0:11:16.000 through the painful effort of labeling each and every example. 0:11:16.559 --> 0:11:18.800 You still need to have your model do something you 0:11:18.800 --> 0:11:21.000 want it to do. You still need to tell it what 0:11:21.040 --> 0:11:22.600 you want it to do. You can't just have a model 0:11:22.640 --> 0:11:25.160 that doesn't, you know, have any purpose. But what a 0:11:25.200 --> 0:11:29.040 foundation model does is provide a foundation, like a literal foundation. 0:11:29.280 --> 0:11:31.320 You can sort of stand on the shoulders of giants. 0:11:31.360 --> 0:11:34.079 You can have these massively trained models, and then 0:11:34.120 --> 0:11:36.280 do a little bit on top. You know, you could 0:11:36.480 --> 0:11:38.560 use just a few examples of what you're looking for 0:11:39.360 --> 0:11:41.520 and you can get what you want from the model. 0:11:42.040 --> 0:11:44.080 So just a little bit on top now gets to 0:11:44.240 --> 0:11:46.240 the results that used to take a huge amount of effort, 0:11:46.280 --> 0:11:48.199 you know, to get from the 0:11:48.320 --> 0:11:50.360 ground up to that level. 0:11:50.640 --> 0:11:54.560 I was trying to think of an analogy for 0:11:54.679 --> 0:11:57.720 sort of foundation models versus what came before, and I 0:11:57.760 --> 0:12:00.240 don't know that I came up with a good one, 0:12:00.280 --> 0:12:01.959 but the best I could do was this. I want 0:12:01.960 --> 0:12:04.920 you to tell me if it's plausible.
It's like before 0:12:05.000 --> 0:12:08.520 foundation models, you had these sort of 0:12:08.600 --> 0:12:11.800 single-use kitchen appliances. You could make a waffle iron 0:12:11.840 --> 0:12:14.360 if you wanted waffles, or you could make a toaster 0:12:14.520 --> 0:12:17.079 if you wanted to make toast. But a foundation model 0:12:17.160 --> 0:12:19.720 is like an oven with a range on top. 0:12:19.800 --> 0:12:21.559 So it's like this machine and you could just cook 0:12:21.640 --> 0:12:23.480 anything with this machine. 0:12:24.120 --> 0:12:28.600 Yeah, that's a great analogy. They're very versatile. The other 0:12:28.720 --> 0:12:31.280 piece of it, too, is that they dramatically lower the 0:12:31.400 --> 0:12:34.560 effort that it takes to do something that you want 0:12:34.600 --> 0:12:37.600 to do. Something I used to say about the 0:12:37.640 --> 0:12:39.600 old world of AI is, you know, the problem 0:12:39.640 --> 0:12:43.400 with automation is that it's too labor intensive. It sounds 0:12:43.440 --> 0:12:44.400 like I'm making a joke. 0:12:44.640 --> 0:12:49.200 Indeed. Famously, if automation does one thing, it substitutes machines 0:12:49.320 --> 0:12:52.520 or computing power for labor. Right? So what does it 0:12:52.600 --> 0:12:56.880 mean to say AI, or automation, is too labor intensive? 0:12:57.360 --> 0:12:59.320 It sounds like I'm making a joke, but I'm actually serious. 0:12:59.559 --> 0:13:02.800 What I mean is that the effort it took in the 0:13:02.880 --> 0:13:06.719 old regime to automate something was very, very high. So 0:13:06.920 --> 0:13:09.800 if I need to go and curate all this data, 0:13:09.840 --> 0:13:13.040 collect all this data, and then carefully label all these examples, 0:13:13.440 --> 0:13:17.360 that labeling itself might be incredibly expensive and time consuming.
So 0:13:17.760 --> 0:13:20.360 we estimate anywhere between eighty to ninety percent of 0:13:20.400 --> 0:13:23.240 the effort it takes to field an AI solution actually 0:13:23.360 --> 0:13:26.959 is just spent on data, so that has some consequences, 0:13:27.240 --> 0:13:32.600 which is the threshold for bothering. You know, if you're 0:13:32.600 --> 0:13:34.800 going to only get a little bit of value back 0:13:35.040 --> 0:13:37.280 from something, are you going to go through this huge 0:13:37.280 --> 0:13:40.800 effort to curate all this data? And then when it 0:13:40.800 --> 0:13:43.240 comes time to train the model, you need highly skilled 0:13:43.240 --> 0:13:47.280 people who are expensive or hard to find in the labor market. 0:13:47.440 --> 0:13:48.959 You know, are you really going to do something that's 0:13:49.000 --> 0:13:50.920 just a tiny, little incremental thing? No, you're going to 0:13:50.960 --> 0:13:54.600 do only the highest value things that warrant that 0:13:54.800 --> 0:13:56.000 level, because... 0:13:55.679 --> 0:13:59.080 you have to essentially build the whole machine from scratch, 0:13:59.280 --> 0:14:02.200 and there aren't many things where it's worth that much 0:14:02.240 --> 0:14:04.280 work to build a machine that's only going to do 0:14:04.400 --> 0:14:05.600 one narrow thing. 0:14:06.040 --> 0:14:09.000 That's right, and then you tackle the next problem and 0:14:09.080 --> 0:14:11.400 you basically have to start over. And, you know, there 0:14:11.440 --> 0:14:14.240 are some nuances here. Like for images, you can pre- 0:14:14.280 --> 0:14:16.800 train a model on some other task and change it around. 0:14:16.800 --> 0:14:19.760 So there are some examples of this, like, non-recurring 0:14:19.880 --> 0:14:22.480 cost that we have in the old world too. But 0:14:22.520 --> 0:14:25.040 by and large, it's just a lot of effort. It's hard. 0:14:25.320 --> 0:14:29.600 It takes, you know, a large level of skill to implement.
0:14:30.400 --> 0:14:33.160 One analogy that I like is, you know, think about 0:14:33.200 --> 0:14:35.320 it as, you know, you have a river of data, 0:14:35.720 --> 0:14:39.080 you know, running through your company or your institution. Traditional 0:14:39.080 --> 0:14:41.600 AI solutions are kind of like building a dam on 0:14:41.600 --> 0:14:45.080 that river. You know, dams are very expensive things to build. 0:14:45.440 --> 0:14:49.680 They require highly specialized skills and lots of planning. And 0:14:49.880 --> 0:14:51.560 you know, you're only going to put a dam on 0:14:51.960 --> 0:14:54.680 a river that's big enough that you're gonna get enough 0:14:54.800 --> 0:14:57.200 energy out of it that it was worth the trouble. You're 0:14:57.200 --> 0:14:58.640 gonna get a lot of value out of that dam 0:14:58.680 --> 0:15:00.320 if you have a river like that, you know, a 0:15:00.400 --> 0:15:03.960 river of data. But actually the vast majority of 0:15:04.160 --> 0:15:06.520 the water, you know, in your kingdom isn't in 0:15:06.560 --> 0:15:10.560 that river. It's in puddles and creeks and babbling brooks, 0:15:10.640 --> 0:15:14.080 and, you know, there's a lot of value left on 0:15:14.120 --> 0:15:16.680 the table because it's like, well, there's nothing 0:15:16.720 --> 0:15:18.520 you can do about it. That's just too 0:15:19.480 --> 0:15:22.600 low value. It takes too much effort, so I'm 0:15:22.640 --> 0:15:24.200 just not going to do it. The return on investment 0:15:24.560 --> 0:15:27.120 just isn't there, so you just end up not automating 0:15:27.160 --> 0:15:29.960 things because it's too much of a pain. Now what 0:15:30.000 --> 0:15:32.440 foundation models do is they say, well, actually, no, we 0:15:32.480 --> 0:15:35.680 can train a base model, a foundation that you can 0:15:35.720 --> 0:15:37.360 work on, and we don't care. We don't 0:15:37.400 --> 0:15:39.240 specify what the task is ahead of time.
We just 0:15:39.280 --> 0:15:42.440 need to learn about the domain of data. So if 0:15:42.440 --> 0:15:45.320 we want to build something that can understand English language, 0:15:45.640 --> 0:15:48.920 there's a ton of English language text available out in 0:15:48.960 --> 0:15:53.040 the world. We can now train models on huge quantities 0:15:53.040 --> 0:15:56.200 of it, and then it learns the structure. It learns 0:15:56.280 --> 0:15:59.040 how language, you know, a good part of how language works, 0:15:59.120 --> 0:16:01.400 from all that unlabeled data. And then when you roll 0:16:01.480 --> 0:16:04.440 up with your task, you know, I want to solve 0:16:04.440 --> 0:16:07.560 this particular problem, you don't have to start from scratch. 0:16:07.600 --> 0:16:11.040 You're starting from a very, very, very high place. So 0:16:11.080 --> 0:16:13.560 that just gives you the ability to, you know, now, 0:16:13.600 --> 0:16:16.440 all of a sudden, everything is accessible. All the puddles 0:16:16.440 --> 0:16:19.200 and creeks and babbling brooks and kettle ponds, you know, those 0:16:19.200 --> 0:16:23.960 are all accessible now. And that's very exciting. It 0:16:24.040 --> 0:16:26.520 just changes the equation on what kinds of problems 0:16:26.640 --> 0:16:27.840 you could use AI to solve. 0:16:27.960 --> 0:16:33.400 And so foundation models basically mean that automating some new 0:16:33.520 --> 0:16:36.760 task is much less labor intensive. The sort of marginal 0:16:36.840 --> 0:16:39.840 effort to do some new automation thing is much lower 0:16:39.880 --> 0:16:43.120 because you're building on top of the foundation model rather 0:16:43.200 --> 0:16:47.560 than starting from scratch. Absolutely. So that is 0:16:47.680 --> 0:16:51.280 like the exciting good news.
I do feel like there's 0:16:52.080 --> 0:16:54.680 a little bit of a countervailing idea that's worth talking 0:16:54.720 --> 0:16:57.080 about here, and that is the idea that even though 0:16:57.080 --> 0:17:01.120 there are these foundation models that are really powerful, that 0:17:01.160 --> 0:17:04.200 are relatively easy to build on top of, it's still 0:17:04.240 --> 0:17:07.439 the case, right, that there is not some one-size- 0:17:07.480 --> 0:17:11.159 fits-all foundation model. So, you know, what does that 0:17:11.320 --> 0:17:13.520 mean and why is that important to think about in 0:17:13.560 --> 0:17:14.320 this context? 0:17:14.880 --> 0:17:18.679 Yeah, so we believe very strongly that there isn't just 0:17:18.800 --> 0:17:21.680 one model to rule them all. There's a number of 0:17:21.720 --> 0:17:24.720 reasons why that could be true. One which I think 0:17:24.800 --> 0:17:28.800 is important and very relevant today is how much energy 0:17:29.119 --> 0:17:33.880 these models can consume. So these models, you know, can 0:17:33.920 --> 0:17:39.360 get very, very large. So one thing that we're starting 0:17:39.400 --> 0:17:42.120 to see, or starting to believe, is that you probably 0:17:42.160 --> 0:17:47.280 shouldn't use one giant sledgehammer model to solve every single problem. 0:17:47.480 --> 0:17:49.400 You know, we should pick the right size model 0:17:49.440 --> 0:17:52.239 to solve the problem. We shouldn't necessarily assume that we 0:17:52.280 --> 0:17:56.840 need the biggest, baddest model for every little use case. 0:17:57.320 --> 0:17:59.520 And we're also seeing that, you know, small models that 0:17:59.520 --> 0:18:03.439 are trained to specialize on particular domains can 0:18:03.480 --> 0:18:07.600 actually outperform much bigger models. So bigger isn't always better. 0:18:07.720 --> 0:18:10.280 So they're more efficient and they do the thing you 0:18:10.320 --> 0:18:11.960 want them to do better as well.
0:18:12.480 --> 0:18:15.760 That's right. So Stanford, for instance, a group at Stanford 0:18:15.800 --> 0:18:18.919 trained a model. It's a two point seven billion 0:18:18.960 --> 0:18:22.080 parameter model, which isn't terribly big by today's standards. They 0:18:22.080 --> 0:18:24.359 trained it just on the biomedical literature, you know, this 0:18:24.400 --> 0:18:26.800 is the kind of thing that universities do, and what 0:18:26.840 --> 0:18:30.320 they showed was that this model was better at answering 0:18:30.400 --> 0:18:32.920 questions about the biomedical literature than some models that were 0:18:33.440 --> 0:18:37.159 one hundred billion parameters, you know, many times larger. So 0:18:37.320 --> 0:18:39.880 it's a little bit like, you know, asking an expert 0:18:40.320 --> 0:18:43.600 for help on something versus asking the smartest person you know. 0:18:44.160 --> 0:18:46.720 The smartest person you know may be very smart, but 0:18:46.800 --> 0:18:49.679 they're not going to have the expertise. And then as an 0:18:49.720 --> 0:18:52.199 added bonus, you know, this is now a much smaller model, 0:18:52.280 --> 0:18:54.159 it's much more efficient to run, you know, 0:18:54.760 --> 0:18:58.640 it's cheaper. So there's lots of different advantages there. 0:18:58.680 --> 0:19:02.280 So I think we're going to see a tension in the 0:19:02.320 --> 0:19:05.600 industry between vendors that say, hey, this is the one, 0:19:05.800 --> 0:19:08.159 you know, big model, and then others that say, well, actually, 0:19:08.440 --> 0:19:10.960 you know, there's lots of different tools 0:19:10.960 --> 0:19:13.000 we can use that all have this nice quality that 0:19:13.040 --> 0:19:15.680 we outlined at the beginning, and then we should really 0:19:15.680 --> 0:19:17.200 pick the one that makes the most sense for the 0:19:17.560 --> 0:19:18.280 task at hand.
0:19:19.560 --> 0:19:23.960 So there's sustainability, basically efficiency. Another kind of set of 0:19:23.960 --> 0:19:27.880 issues that come up a lot with AI are bias and hallucination. 0:19:28.600 --> 0:19:31.200 Can you talk a little bit about bias and hallucination, 0:19:31.320 --> 0:19:34.240 what they are and how you're working to mitigate those problems? 0:19:34.640 --> 0:19:37.520 Yeah, so there are lots of issues still, as amazing 0:19:37.520 --> 0:19:40.440 as these technologies are, and they are amazing, let's 0:19:40.480 --> 0:19:42.960 be very clear, lots of great things we're going to 0:19:43.080 --> 0:19:46.920 enable with these kinds of technologies. Bias isn't a new problem. 0:19:47.240 --> 0:19:51.840 So, you know, basically we've seen this since the beginning 0:19:51.880 --> 0:19:54.800 of AI. If you train a model on data that 0:19:55.200 --> 0:19:57.320 has a bias in it, the model is going to 0:19:57.359 --> 0:20:01.920 recapitulate that bias when it provides its answers. So every time, 0:20:02.119 --> 0:20:04.639 you know, if all the text you have is, you know, 0:20:04.680 --> 0:20:07.760 more likely to refer to female nurses and male scientists, 0:20:07.800 --> 0:20:09.879 then you're going to, you know, get models that, you know... 0:20:09.960 --> 0:20:13.040 For instance, there was an example where a machine learning 0:20:13.040 --> 0:20:17.480 based translation system translated from Hungarian to English. Hungarian doesn't 0:20:17.480 --> 0:20:20.800 have gendered pronouns. English does, and when you asked it 0:20:20.800 --> 0:20:23.119 to translate, it would translate "they are a nurse" to 0:20:23.560 --> 0:20:26.520 "she is a nurse," and translate "they are a scientist" to 0:20:26.600 --> 0:20:29.720 "he is a scientist." And that's not because the people 0:20:29.720 --> 0:20:32.520 who wrote the algorithm were building in bias and coding 0:20:32.560 --> 0:20:34.080 in, like, oh, it's got to be this way.
It's 0:20:34.119 --> 0:20:36.359 because the data was like that. You know, we have 0:20:36.480 --> 0:20:40.920 biases in our society and they're reflected in our data 0:20:40.960 --> 0:20:44.600 and our text and our images everywhere. And then the 0:20:44.640 --> 0:20:47.600 models are just mapping from what they've seen 0:20:47.600 --> 0:20:50.120 in their training data to the result that you're 0:20:50.200 --> 0:20:51.800 trying to get them to give, and 0:20:51.840 --> 0:20:56.280 then these biases come out. So there's a very active 0:20:57.119 --> 0:21:00.320 program of research, and, you know, we do quite a 0:21:00.320 --> 0:21:03.880 bit at IBM Research and MIT, but also all 0:21:03.920 --> 0:21:06.639 over the community, in industry and academia, trying to figure 0:21:06.640 --> 0:21:09.840 out how do we explicitly remove these biases, how do 0:21:09.840 --> 0:21:12.000 we identify them, how do we 0:21:12.040 --> 0:21:14.679 build tools that allow people to audit their systems to 0:21:14.680 --> 0:21:17.000 make sure they aren't biased. So this is a really 0:21:17.040 --> 0:21:20.200