WEBVTT - Smart Talks with IBM - Transformations in AI: why foundation models are the future 0:00:04.480 --> 0:00:10.440 Welcome to Tech Stuff, a production from iHeartRadio. 0:00:12.200 --> 0:00:13.840 Today, we are witness 0:00:13.360 --> 0:00:16.600 to one of those rare moments in history, the rise 0:00:16.720 --> 0:00:20.480 of an innovative technology with the potential to radically transform 0:00:20.560 --> 0:00:26.280 business and society forever. That technology, of course, is artificial intelligence, 0:00:26.480 --> 0:00:29.240 and it's the central focus for this new season of 0:00:29.280 --> 0:00:33.360 Smart Talks with IBM. Join hosts from your favorite Pushkin 0:00:33.440 --> 0:00:37.000 podcasts as they talk with industry experts and leaders to 0:00:37.080 --> 0:00:40.839 explore how businesses can integrate AI into their workflows and 0:00:41.000 --> 0:00:44.239 help drive real change in this new era of AI. 0:00:44.680 --> 0:00:47.320 And of course, host Malcolm Gladwell will be there to 0:00:47.360 --> 0:00:49.960 guide you through the season and throw in his two 0:00:49.960 --> 0:00:53.040 cents as well. Look out for new episodes of Smart 0:00:53.040 --> 0:00:56.240 Talks with IBM every other week on the iHeartRadio app, 0:00:56.440 --> 0:00:59.920 Apple Podcasts, or wherever you get your podcasts, and learn 0:01:00.000 --> 0:01:03.800 more at IBM dot com slash Smart Talks. 0:01:05.280 --> 0:01:09.280 Hey, it's Jacob Goldstein for Smart Talks with IBM. Last year, 0:01:09.360 --> 0:01:12.399 I had the pleasure of sitting down with Dr. David Cox, 0:01:12.720 --> 0:01:17.120 VP of AI Models at IBM Research. We explored the 0:01:17.160 --> 0:01:21.560 fascinating world of AI foundation models and their revolutionary potential 0:01:21.640 --> 0:01:26.199 for business automation and innovation. 
When we first aired this episode, 0:01:26.319 --> 0:01:29.440 the concept of foundation models was just beginning to capture 0:01:29.480 --> 0:01:34.000 our attention. Since then, this technology has evolved and redefined 0:01:34.000 --> 0:01:37.640 the boundaries of what's possible. Businesses are becoming more savvy 0:01:37.680 --> 0:01:40.560 about selecting the right models and understanding how they can 0:01:40.640 --> 0:01:44.640 drive revenue and efficiency. As I listened back to the conversation, 0:01:44.880 --> 0:01:47.600 it was interesting to reflect on some new developments and 0:01:47.680 --> 0:01:51.200 ideas that have emerged, and many of these we will 0:01:51.240 --> 0:01:54.640 continue to explore throughout the season, like how to play 0:01:54.680 --> 0:01:57.480 an active role in choosing the best model for your needs. 0:01:58.240 --> 0:02:00.480 Whether you're a longtime listener or tuning in for the 0:02:00.520 --> 0:02:03.760 first time, I'm certain you'll find Dr. Cox's insights as 0:02:03.800 --> 0:02:07.120 thought-provoking as ever. Thanks as always for joining us. 0:02:07.480 --> 0:02:08.880 Now let's dive in. 0:02:10.600 --> 0:02:14.360 Hello, hello. Welcome to Smart Talks with IBM, a podcast 0:02:14.400 --> 0:02:19.880 from Pushkin Industries, iHeartRadio and IBM. I'm Malcolm Gladwell. Our 0:02:19.919 --> 0:02:24.560 guest today is Dr. David Cox, VP of AI Models 0:02:24.600 --> 0:02:29.639 at IBM Research and IBM Director of the MIT-IBM 0:02:29.680 --> 0:02:33.960 Watson AI Lab, a first-of-its-kind industry-academic 0:02:34.080 --> 0:02:39.120 collaboration between IBM and MIT focused on the fundamental research 0:02:39.560 --> 0:02:44.280 of artificial intelligence. 
Over the course of decades, David Cox 0:02:44.400 --> 0:02:49.040 watched as the AI revolution steadily grew from the simmering 0:02:49.040 --> 0:02:52.919 ideas of a few academics and technologists into the industrial 0:02:53.000 --> 0:02:57.280 boom we are experiencing today. Having dedicated his life to 0:02:57.360 --> 0:03:00.800 pushing the field of AI towards new horizons, David has 0:03:00.840 --> 0:03:04.760 both contributed to and presided over many of the major 0:03:04.880 --> 0:03:10.440 breakthroughs in artificial intelligence. In today's episode, you'll hear David 0:03:10.480 --> 0:03:15.600 explain some of the conceptual underpinnings of the current AI landscape, 0:03:15.720 --> 0:03:20.680 things like foundation models, in surprisingly comprehensible terms, I might add. 0:03:20.919 --> 0:03:24.080 We'll also get into some of the amazing practical applications 0:03:24.080 --> 0:03:27.120 for AI in business, as well as what implications AI 0:03:27.280 --> 0:03:30.799 will have for the future of work and design. David 0:03:30.800 --> 0:03:34.520 spoke with Jacob Goldstein, host of the Pushkin podcast What's 0:03:34.560 --> 0:03:38.880 Your Problem? A veteran business journalist, Jacob has reported for 0:03:38.920 --> 0:03:41.680 The Wall Street Journal, the Miami Herald, and was a 0:03:41.720 --> 0:03:47.360 longtime host of the NPR program Planet Money. Okay, let's 0:03:47.400 --> 0:03:48.280 get to the interview. 0:03:50.600 --> 0:03:53.800 Tell me about your job at IBM. So, I wear 0:03:53.920 --> 0:03:57.080 two hats at IBM. So one, I'm the IBM Director 0:03:57.120 --> 0:04:00.320 of the MIT-IBM Watson AI Lab. That's a 0:04:00.480 --> 0:04:03.640 joint lab between IBM and MIT where we try and 0:04:03.680 --> 0:04:06.040 invent what's next in AI. 
It's been running for about 0:04:06.040 --> 0:04:09.120 five years, and then more recently I started as the 0:04:09.160 --> 0:04:12.040 vice president for AI Models, and I'm in charge of 0:04:12.080 --> 0:04:16.960 building IBM's foundation models, you know, building these big models, 0:04:16.960 --> 0:04:18.800 generative models that allow us to have all kinds of 0:04:18.880 --> 0:04:20.440 new exciting capabilities in AI. 0:04:21.000 --> 0:04:22.880 So I want to talk to you a lot 0:04:22.960 --> 0:04:26.400 about foundation models, about generative AI. But before we get 0:04:26.400 --> 0:04:28.520 to that, let's just spend a minute on 0:04:28.600 --> 0:04:31.240 the IBM-MIT collaboration. 0:04:32.240 --> 0:04:35.039 Where did that partnership start? How did it originate? 0:04:36.200 --> 0:04:39.039 Yeah, so, actually it turns out that MIT and IBM 0:04:39.240 --> 0:04:42.320 have been collaborating for a very long time in the 0:04:42.360 --> 0:04:46.440 area of AI. In fact, the term artificial intelligence was 0:04:46.480 --> 0:04:50.160 coined in a nineteen fifty-six workshop that was held 0:04:50.200 --> 0:04:52.360 at Dartmouth. It was actually organized by an IBMer, 0:04:52.400 --> 0:04:55.719 Nathaniel Rochester, who led the development of the IBM 0:04:55.720 --> 0:04:59.000 701. So we've really been together in AI since 0:04:59.000 --> 0:05:03.839 the beginning, and as AI kept accelerating more and more 0:05:03.880 --> 0:05:07.480 and more, I think there was a really interesting decision 0:05:07.480 --> 0:05:10.239 to say let's make this a formal partnership. 
So IBM 0:05:10.279 --> 0:05:12.080 in twenty seventeen announced it would be committing close 0:05:12.120 --> 0:05:14.960 to a quarter billion dollars over ten years to have 0:05:15.040 --> 0:05:18.880 this joint lab with MIT, and we located ourselves 0:05:18.960 --> 0:05:21.200 right on the campus and we've been developing very, very 0:05:21.200 --> 0:05:23.520 deep relationships where we can, you know, really get to 0:05:23.520 --> 0:05:26.640 know each other, work shoulder to shoulder, conceiving what we 0:05:26.640 --> 0:05:29.279 should work on next, and then executing the projects. And 0:05:29.320 --> 0:05:33.320 it's really, you know, very few entities like this exist 0:05:33.800 --> 0:05:36.800 between academia and industry. It's been really fun over the last 0:05:36.800 --> 0:05:38.080 five years to be a part of it. 0:05:38.720 --> 0:05:40.160 And what do you think are some of the most 0:05:40.200 --> 0:05:43.760 important outcomes of this collaboration between IBM and MIT? 0:05:45.160 --> 0:05:47.800 Yeah, so we're really kind of the tip of the 0:05:47.839 --> 0:05:52.799 spear for IBM's AI strategy. So we're really looking at, 0:05:53.000 --> 0:05:55.800 you know, what's coming ahead, and you know, in areas 0:05:55.839 --> 0:05:59.760 like foundation models, you know, as the field changes, MIT, they 0:06:00.120 --> 0:06:02.480 are interested in working on, you know, faculty, students and 0:06:02.520 --> 0:06:04.520 staff are interested in working on what's the latest thing, 0:06:04.600 --> 0:06:07.680 what's the next thing. We at IBM Research are very 0:06:07.720 --> 0:06:09.800 much interested in the same. So we can kind of 0:06:09.800 --> 0:06:12.719 put out feelers, you know, interesting things that we're seeing 0:06:12.760 --> 0:06:15.559 in our research, interesting things we're hearing in the field. 0:06:15.560 --> 0:06:17.960 We can go and chase those opportunities. 
So when something 0:06:18.000 --> 0:06:21.000 big comes, like the big change that's been happening lately 0:06:21.040 --> 0:06:23.599 with foundation models, we're ready to jump on it. That's 0:06:23.600 --> 0:06:26.600 really the purpose, that's the lab functioning the way 0:06:26.600 --> 0:06:29.800 it should. We're also really interested in how do we 0:06:29.839 --> 0:06:32.440 advance, you know, the AI that can help with climate 0:06:32.560 --> 0:06:35.840 change or, you know, build better materials and all these 0:06:35.920 --> 0:06:38.200 kinds of things that are, you know, a broader aperture 0:06:38.279 --> 0:06:40.960 sometimes than what we might consider just looking at 0:06:40.960 --> 0:06:43.719 the product portfolio of IBM, and that gives us 0:06:43.760 --> 0:06:45.920 again a breadth where we can see connections that we 0:06:46.000 --> 0:06:48.880 might not have seen otherwise. We can, you know, think 0:06:48.880 --> 0:06:51.839 things that help out society and also help out our customers. 0:06:52.600 --> 0:06:57.080 So the last whatever six months, say, there has been 0:06:57.120 --> 0:07:02.800 this wild rise in the public's interest in AI, right, 0:07:02.839 --> 0:07:06.280 clearly coming out of these generative AI models that are 0:07:06.320 --> 0:07:10.400 really accessible, you know, certainly ChatGPT, language models like that, 0:07:10.480 --> 0:07:13.520 as well as models that generate images, like Midjourney. 0:07:14.160 --> 0:07:17.640 I mean, can you just sort of briefly talk about 0:07:17.040 --> 0:07:21.080 the breakthroughs in AI that have made this moment feel 0:07:21.160 --> 0:07:24.640 so exciting, so revolutionary for artificial intelligence? 0:07:25.680 --> 0:07:30.440 Yeah, you know, I've been studying AI basically my entire 0:07:30.480 --> 0:07:32.600 adult life. Before I came to IBM, I was a 0:07:32.600 --> 0:07:35.160 professor at Harvard. 
I've been doing this a long time, 0:07:35.400 --> 0:07:37.680 and I've gotten used to being surprised. It sounds like 0:07:37.680 --> 0:07:40.960 a joke, but it's serious, like getting used to being 0:07:41.000 --> 0:07:43.559 surprised at the acceleration of the pace. 0:07:44.360 --> 0:07:44.600 Again. 0:07:44.640 --> 0:07:47.400 It tracks actually a long way back. You know, there's 0:07:47.480 --> 0:07:49.720 lots of things where there was an idea that just 0:07:49.840 --> 0:07:53.800 simmered for a really long time. Some of the key 0:07:54.000 --> 0:07:58.400 math behind the stuff that we have today, which is amazing. 0:07:59.120 --> 0:08:01.960 There's an algorithm called backpropagation, which is sort of 0:08:02.080 --> 0:08:04.720 key to training neural networks, that's been around, you know, 0:08:04.760 --> 0:08:08.679 since the eighties in wide use. And really what happened 0:08:08.800 --> 0:08:12.480 was it simmered for a long time and then enough 0:08:12.640 --> 0:08:16.400 data and enough compute came. So we had enough data 0:08:16.440 --> 0:08:20.320 because, you know, we all started carrying multiple cameras around 0:08:20.360 --> 0:08:22.760 with us. Our mobile phones have all, you know, all 0:08:22.800 --> 0:08:25.600 these cameras, and we put everything on the Internet 0:08:25.680 --> 0:08:27.920 and there's all this data out there. We caught a 0:08:27.960 --> 0:08:30.480 lucky break that there was something called a graphics processing unit, 0:08:30.520 --> 0:08:32.800 which turns out to be really useful for doing these 0:08:32.880 --> 0:08:35.480 kinds of algorithms, maybe even more useful than it is 0:08:35.559 --> 0:08:39.520 for doing graphics. They're great for graphics too. And things just 0:08:39.600 --> 0:08:42.240 kept kind of adding to the snowball. 
So we had 0:08:42.360 --> 0:08:46.600 deep learning, which is sort of a rebrand of neural 0:08:46.600 --> 0:08:49.079 networks that I mentioned from the eighties, and that was 0:08:49.200 --> 0:08:52.360 enabled again by data, because we digitalized the world, and 0:08:52.800 --> 0:08:55.480 compute, because we kept building faster and faster and 0:08:55.520 --> 0:08:58.679 more powerful computers, and then that allowed us to make 0:08:58.720 --> 0:09:01.840 this big breakthrough. And then, you know, more recently, 0:09:02.320 --> 0:09:06.600 using the same building blocks, that inexorable rise of more 0:09:06.640 --> 0:09:10.280 and more and more data met the technology called self- 0:09:10.320 --> 0:09:16.240 supervised learning. The key difference there: in traditional deep learning, 0:09:16.320 --> 0:09:18.800 you know, for classifying images, you know, like is this 0:09:18.880 --> 0:09:20.640 a cat or is this a dog in a picture, 0:09:21.080 --> 0:09:26.040 those technologies require supervision, so you have to take what 0:09:26.160 --> 0:09:27.560 you have and then you have to label it. So 0:09:27.600 --> 0:09:28.920 you have to take a picture of a cat and 0:09:28.920 --> 0:09:31.480 then you label it as a cat, and it turns 0:09:31.520 --> 0:09:33.920 out that, you know, that's very powerful, but it takes 0:09:33.920 --> 0:09:37.000 a lot of time to label cats and to label dogs, 0:09:37.000 --> 0:09:39.400 and there's only so many labels that exist in the world. 0:09:39.840 --> 0:09:43.400 So what really changed more recently is that we have 0:09:43.480 --> 0:09:45.960 self-supervised learning, where you don't have to have the labels. 0:09:45.960 --> 0:09:48.480 We can just take unannotated data. And what that does 0:09:48.559 --> 0:09:51.640 is allows you to use even more data. And that's 0:09:51.640 --> 0:09:55.240 really what drove this latest sort of rage. 
And 0:09:55.280 --> 0:09:57.400 then all of a sudden we start getting 0:09:57.440 --> 0:10:01.360 these really powerful models. And then really this has 0:10:01.400 --> 0:10:06.160 been simmering technologies, right, this has been happening for a 0:10:06.240 --> 0:10:10.400 while and progressively getting more and more powerful. One of 0:10:10.440 --> 0:10:14.720 the things that really happened with ChatGPT and technologies like 0:10:15.160 --> 0:10:18.600 Stable Diffusion and Midjourney was that they made it 0:10:18.800 --> 0:10:21.480 visible to the public. You know, you put it out 0:10:21.480 --> 0:10:23.760 there, the public can touch and feel, and they're like, wow, 0:10:24.040 --> 0:10:27.640 not only is there palpable change, and wow, this, you know, 0:10:27.679 --> 0:10:29.199 I can talk to this thing. Wow, this thing can 0:10:29.240 --> 0:10:32.120 generate an image. Not only that, but everyone can touch 0:10:32.120 --> 0:10:36.400 and feel and try. My kids can use some of 0:10:36.440 --> 0:10:41.880 these AI art generation technologies. And that's really just launched, 0:10:41.920 --> 0:10:45.160 you know, it's like it propelled, slingshotted us into 0:10:45.520 --> 0:10:47.520 a different regime in terms of the public awareness of 0:10:47.559 --> 0:10:48.920 these technologies. 0:10:49.040 --> 0:10:52.200 You mentioned earlier in the conversation foundation models, and I 0:10:52.240 --> 0:10:54.040 want to talk a little bit about that. I mean, 0:10:54.080 --> 0:10:57.520 can you just tell me, you know, what are foundation 0:10:57.720 --> 0:11:00.480 models for AI and why are they a big deal? 
0:11:01.679 --> 0:11:05.360 Yeah, so this term foundation model was coined by a 0:11:05.400 --> 0:11:09.080 group at Stanford, and I think it's actually a really 0:11:09.200 --> 0:11:12.679 apt term because, remember I said, you know, one of 0:11:12.720 --> 0:11:16.080 the big things that unlocked this latest excitement was the 0:11:16.080 --> 0:11:19.680 fact that we could use large amounts of unannotated data. 0:11:20.000 --> 0:11:21.480 We could train a model. We don't have to 0:11:21.520 --> 0:11:25.120 go through the painful effort of labeling each and every example. 0:11:25.720 --> 0:11:27.920 You still need to have your model do something you 0:11:27.960 --> 0:11:30.120 want it to do. You still need to tell it what 0:11:30.160 --> 0:11:31.760 you want it to do. You can't just have a model 0:11:31.760 --> 0:11:33.720 that doesn't, you know, have any purpose. 0:11:34.000 --> 0:11:34.400 But what a 0:11:34.320 --> 0:11:38.160 foundation model does is it provides a foundation, like a literal foundation. 0:11:38.440 --> 0:11:40.520 You can sort of stand on the shoulders of giants. 0:11:40.520 --> 0:11:43.040 You can have one of these massively trained models and 0:11:43.080 --> 0:11:45.160 then do a little bit on top. You know, you 0:11:45.160 --> 0:11:47.440 could use just a few examples of what you're looking 0:11:47.480 --> 0:11:50.640 for and you can get what you want from the model. 0:11:51.200 --> 0:11:53.199 So just a little bit on top now gets you to 0:11:53.360 --> 0:11:55.440 the results that you used to have to put a huge 0:11:55.440 --> 0:11:57.360 amount of effort in, you know, to get from the 0:11:57.440 --> 0:11:59.520 ground up to that level. 0:12:00.200 --> 0:12:04.160 Trying to think of an analogy for sort of 0:12:04.440 --> 0:12:07.240 foundation models versus what came before, and I don't know 0:12:07.280 --> 0:12:09.679 that I came up with a good one, but the 0:12:09.720 --> 0:12:11.320 best I could do was this. 
I want you to 0:12:11.320 --> 0:12:15.240 tell me if it's plausible. It's like before foundation models, 0:12:15.760 --> 0:12:18.400 it was like you had these sort of single-use 0:12:18.640 --> 0:12:21.160 kitchen appliances. You could make a waffle iron if you 0:12:21.200 --> 0:12:23.199 wanted waffles, or you could make a 0:12:23.160 --> 0:12:24.840 toaster if you wanted to make toast. 0:12:25.160 --> 0:12:27.960 But a foundation model is like an oven with 0:12:28.040 --> 0:12:29.960 a range on top. So it's like this machine and 0:12:30.000 --> 0:12:32.640 you could just cook anything with this machine. 0:12:33.280 --> 0:12:37.760 Yeah, that's a great analogy. They're very versatile. The other 0:12:37.880 --> 0:12:40.440 piece of it, too, is that they dramatically lower the 0:12:40.520 --> 0:12:43.679 effort that it takes to do something that you want 0:12:43.720 --> 0:12:46.760 to do. And something I used to say about the 0:12:46.800 --> 0:12:48.720 old world of AI is, you know, the problem 0:12:48.760 --> 0:12:52.200 with automation is that it's too labor intensive. It 0:12:52.240 --> 0:12:53.560 sounds like I'm making a joke. 0:12:53.760 --> 0:12:58.320 Indeed, famously, if automation does one thing, it substitutes machines 0:12:58.440 --> 0:13:01.679 or computing power for labor. Right? So what does that 0:13:01.720 --> 0:13:06.040 mean, to say AI is, or automation is, too labor intensive? 0:13:06.520 --> 0:13:08.480 It sounds like I'm making a joke, but I'm actually serious. 0:13:08.520 --> 0:13:11.240 And what I mean is that the effort it took 0:13:11.840 --> 0:13:15.600 in the old regime to automate something was very, very high. 0:13:15.720 --> 0:13:18.920 So if I need to go and curate all this data, 0:13:18.960 --> 0:13:22.199 collect all this data, and then carefully label all these examples, 0:13:22.600 --> 0:13:26.559 that labeling itself might be incredibly expensive and time-consuming. 
So 0:13:26.880 --> 0:13:29.520 we estimate anywhere between eighty to ninety percent of 0:13:29.559 --> 0:13:32.440 the effort it takes to field an AI solution actually 0:13:32.520 --> 0:13:36.079 is just spent on data, so that has some consequences, 0:13:36.400 --> 0:13:41.720 which is the threshold for bothering. You know, if you're 0:13:41.760 --> 0:13:43.920 going to only get a little bit of value back 0:13:44.200 --> 0:13:46.400 from something, are you going to go through this huge 0:13:46.440 --> 0:13:49.960 effort to curate all this data? And then when it 0:13:49.960 --> 0:13:52.320 comes time to train the model, you need highly skilled 0:13:52.400 --> 0:13:56.439 people, expensive or hard to find in the labor market. 0:13:56.600 --> 0:13:58.120 You know, are you really going to do something that's 0:13:58.160 --> 0:14:00.000 just a tiny little incremental thing? No, you're going to 0:14:00.080 --> 0:14:03.240 do only the highest-value things that warrant that 0:14:03.920 --> 0:14:05.000 level, because you 0:14:04.960 --> 0:14:08.559 have to essentially build the whole machine from scratch, and 0:14:08.960 --> 0:14:11.600 there aren't many things where it's worth that much work 0:14:11.640 --> 0:14:13.760 to build a machine that's only going to do one 0:14:13.880 --> 0:14:14.720 narrow thing. 0:14:15.200 --> 0:14:18.120 That's right, and then you tackle the next problem and 0:14:18.200 --> 0:14:20.560 you basically have to start over. And you know, there 0:14:20.560 --> 0:14:23.360 are some nuances here, like for images, you can pretrain 0:14:23.440 --> 0:14:25.880 a model on some other tasks and change it around. 0:14:25.960 --> 0:14:28.920 So there are some examples of this, like non-recurring 0:14:29.040 --> 0:14:31.600 cost, that we have in the old world too. But 0:14:31.640 --> 0:14:34.160 by and large, it's just a lot of effort. It's hard, 0:14:34.480 --> 0:14:38.760 it takes, you know, a large level of skill to implement. 
0:14:39.520 --> 0:14:42.320 One analogy that I like is, you know, think about 0:14:42.360 --> 0:14:44.480 it as, you know, you have a river of data, 0:14:44.840 --> 0:14:48.160 you know, running through your company or your institution. Traditional 0:14:48.240 --> 0:14:50.720 AI solutions are kind of like building a dam on 0:14:50.760 --> 0:14:54.240 that river. You know, dams are very expensive things to build. 0:14:54.560 --> 0:14:58.800 They require highly specialized skills and lots of planning. And 0:14:59.000 --> 0:15:00.680 you know, you're only going to put a dam on 0:15:01.120 --> 0:15:03.640 a river that's big enough that you're going to get 0:15:03.680 --> 0:15:05.800 enough energy out of it that it was worth your trouble. 0:15:06.200 --> 0:15:07.720 You're gonna get a lot of value out of that dam 0:15:07.800 --> 0:15:09.400 if you have a river like that, you know, a 0:15:09.520 --> 0:15:13.080 river of data. But it's actually the vast majority of 0:15:13.280 --> 0:15:15.640 the water, you know, in your kingdom actually isn't in 0:15:15.680 --> 0:15:19.720 that river. It's in puddles and creeks and babbling brooks. 0:15:19.800 --> 0:15:23.240 And you know, there's a lot of value left on 0:15:23.280 --> 0:15:25.840 the table because it's like, well, I can't, there's nothing 0:15:25.840 --> 0:15:27.640 you can do about it. It's just that that's too 0:15:28.640 --> 0:15:31.760 low value. So it takes too much effort, so I'm 0:15:31.760 --> 0:15:33.320 just not going to do it. The return on investment 0:15:33.720 --> 0:15:36.280 just isn't there, so you just end up not automating 0:15:36.320 --> 0:15:39.120 things because it's too much of a pain. Now what 0:15:39.160 --> 0:15:41.600 foundation models do is they say, well, actually, no, we 0:15:41.640 --> 0:15:44.800 can train a base model, a foundation that you can 0:15:44.840 --> 0:15:46.560 work on, and we don't care, we don't have to 0:15:46.560 --> 0:15:48.400 specify what the task is ahead of time. 
We just 0:15:48.400 --> 0:15:51.560 need to learn about the domain of data. So if 0:15:51.560 --> 0:15:54.440 we want to build something that can understand English language, 0:15:54.760 --> 0:15:58.080 there's a ton of English language text available out in 0:15:58.120 --> 0:16:02.440 the world. We can now train on huge quantities of it, 0:16:02.880 --> 0:16:06.680 and then it learns the structure, learns how language, you know, 0:16:06.800 --> 0:16:09.680 a good part of how language works, on all that unlabeled data, 0:16:09.760 --> 0:16:11.880 and then when you roll up with your task, you 0:16:11.880 --> 0:16:15.440 know, I want to solve this particular problem, you don't 0:16:15.480 --> 0:16:18.080 have to start from scratch. You're starting from a very, 0:16:18.200 --> 0:16:20.920 very, very high place. So that just gives you the 0:16:20.960 --> 0:16:23.320 ability to just, you know, now, all of a sudden, 0:16:23.360 --> 0:16:26.560 everything is accessible. All the puddles and creeks and babbling 0:16:26.560 --> 0:16:30.720 brooks and kettle ponds, you know, those are all accessible now. 0:16:31.240 --> 0:16:33.920 And that's very exciting. But it just changes the 0:16:33.920 --> 0:16:36.440 equation on what kinds of problems you could use AI 0:16:36.560 --> 0:16:36.960 to solve. 0:16:37.080 --> 0:16:42.560 And so foundation models basically mean that automating some new 0:16:42.640 --> 0:16:45.920 task is much less labor intensive. The sort of marginal 0:16:45.960 --> 0:16:49.000 effort to do some new automation thing is much lower 0:16:49.000 --> 0:16:52.280 because you're building on top of the foundation model rather 0:16:52.320 --> 0:16:56.720 than starting from scratch. Absolutely. So that is 0:16:56.800 --> 0:17:00.520 like the exciting good news. 
I do feel like there's 0:17:01.200 --> 0:17:03.840 a little bit of a countervailing idea that's worth talking 0:17:03.840 --> 0:17:06.200 about here, and that is the idea that even though 0:17:06.240 --> 0:17:10.280 there are these foundation models that are really powerful, that 0:17:10.320 --> 0:17:13.359 are relatively easy to build on top of, it's still 0:17:13.359 --> 0:17:17.240 the case, right, that there is not some one-size-fits- 0:17:16.960 --> 0:17:18.200 all foundation model. 0:17:18.760 --> 0:17:21.320 So you know, what does that mean and why is 0:17:21.359 --> 0:17:22.520 that important to think about 0:17:22.560 --> 0:17:23.800 in this context? 0:17:24.040 --> 0:17:27.840 Yeah, so we believe very strongly that there isn't just 0:17:27.920 --> 0:17:30.800 one model to rule them all. There's a number of 0:17:30.840 --> 0:17:33.840 reasons why that could be true. One which I think 0:17:33.920 --> 0:17:37.960 is important and very relevant today is how much energy 0:17:38.280 --> 0:17:43.040 these models can consume. So these models, you know, can 0:17:43.080 --> 0:17:48.520 get very, very large. So one thing that we're starting 0:17:48.560 --> 0:17:51.280 to see, or starting to believe, is that you probably 0:17:51.280 --> 0:17:56.440 shouldn't use one giant sledgehammer model to solve every single problem, 0:17:56.600 --> 0:17:58.560 you know, like we should pick the right size model 0:17:58.560 --> 0:18:01.359 to solve the problem. We shouldn't necessarily assume that we 0:18:01.440 --> 0:18:06.000 need the biggest, baddest model for every little use case. 0:18:06.440 --> 0:18:08.639 And we're also seeing that, you know, small models that 0:18:08.680 --> 0:18:12.880 are trained, like, to specialize on particular domains can actually 0:18:12.920 --> 0:18:16.760 outperform much bigger models. So bigger isn't always even better. 
0:18:16.840 --> 0:18:19.439 So they're more efficient and they do the thing you 0:18:19.440 --> 0:18:21.080 want them to do better as well. 0:18:21.640 --> 0:18:22.120 That's right. 0:18:22.240 --> 0:18:25.639 So Stanford, for instance, a group at Stanford trained a model. 0:18:26.359 --> 0:18:28.920 It is a two point seven billion parameter model, which 0:18:28.960 --> 0:18:31.800 isn't terribly big by today's standards. They trained it just 0:18:31.880 --> 0:18:33.160 on the biomedical literature. 0:18:33.200 --> 0:18:33.359 You know. 0:18:33.400 --> 0:18:35.760 This is the kind of thing that universities do, and 0:18:35.840 --> 0:18:39.119 what they showed was that this model was better at 0:18:39.119 --> 0:18:41.840 answering questions about the biomedical literature than some models that 0:18:41.920 --> 0:18:45.639 are one hundred billion parameters, you know, many times larger. 0:18:46.200 --> 0:18:48.560 So it's a little bit like, you know, asking an 0:18:48.560 --> 0:18:52.439 expert for help on something versus asking the smartest person 0:18:52.480 --> 0:18:55.240 you know. The smartest person you know may be very smart, 0:18:55.680 --> 0:18:58.720 but they're not going to beat expertise. And then as 0:18:58.760 --> 0:19:00.520 an added bonus, you know, this is now a much 0:19:00.520 --> 0:19:02.960 smaller model, it's much more efficient to run, 0:19:03.119 --> 0:19:06.639 you know, it's cheaper. So there's lots of 0:19:06.640 --> 0:19:09.359 different advantages there. 
So I think we're going to see 0:19:09.760 --> 0:19:14.280 a tension in the industry between vendors that say, hey, this 0:19:14.359 --> 0:19:16.480 is the one, you know, big model, and then others 0:19:16.480 --> 0:19:18.800 that say, well, actually, you know, there's, you know, 0:19:19.160 --> 0:19:21.080 lots of different tools we can use that all have 0:19:21.160 --> 0:19:24.119 this nice quality that we outlined at the beginning, and 0:19:24.119 --> 0:19:25.600 then we should really pick the one that makes the 0:19:25.680 --> 0:19:27.360 most sense for the task at hand. 0:19:28.720 --> 0:19:33.080 So there's sustainability, basically efficiency. Another kind of set of 0:19:33.119 --> 0:19:37.000 issues that come up a lot with AI are bias and hallucination. 0:19:37.720 --> 0:19:40.359 Can you talk a little bit about bias and hallucination, 0:19:40.440 --> 0:19:43.360 what they are, and how you're working to mitigate those problems? 0:19:43.800 --> 0:19:46.639 Yeah, so there are lots of issues still. As amazing 0:19:46.680 --> 0:19:49.640 as these technologies are, and they are amazing, let's 0:19:49.640 --> 0:19:52.119 be very clear, lots of great things we're going to 0:19:52.200 --> 0:19:56.040 enable with these kinds of technologies. Bias isn't a new problem. 0:19:56.400 --> 0:20:01.000 So you know, basically we've seen this since the beginning 0:20:01.000 --> 0:20:03.919 of AI. If you train a model on data that 0:20:04.320 --> 0:20:06.439 has a bias in it, the model is going to 0:20:06.480 --> 0:20:11.080 recapitulate that bias when it provides its answers. So every time, 0:20:11.240 --> 0:20:13.800 you know, if all the text you have says, you know, 0:20:13.840 --> 0:20:16.919 it's more likely to refer to female nurses and male scientists, 0:20:16.960 --> 0:20:19.040 then you're going to, you know, get models that, you know. 
0:20:19.080 --> 0:20:22.160 For instance, there was an example where a machine learning 0:20:22.200 --> 0:20:26.600 based translation system translated from Hungarian to English. Hungarian doesn't 0:20:26.600 --> 0:20:29.919 have gendered pronouns, English does, and when you asked it 0:20:29.920 --> 0:20:32.280 to translate, it would translate they are a nurse to 0:20:32.680 --> 0:20:35.560 she is a nurse, and would translate they are a scientist 0:20:35.600 --> 0:20:37.800 to he is a scientist. And that's not because the 0:20:38.600 --> 0:20:41.199 people who wrote the algorithm were building in bias and 0:20:41.320 --> 0:20:43.040 coding in, like, oh, it's got to be this way. 0:20:43.119 --> 0:20:45.359 It's because the data was like that. You know, we 0:20:45.440 --> 0:20:49.719 have biases in our society and they're reflected in our 0:20:49.800 --> 0:20:53.600 data and our text and our images everywhere, and then 0:20:53.640 --> 0:20:56.760 the models, they're just mapping from what they've seen in 0:20:56.800 --> 0:20:59.560 their training data to the result that you're trying to 0:20:59.560 --> 0:21:01.960 get them to do and to give, and then these 0:21:01.960 --> 0:21:06.840 biases come out. So there's a very active program of 0:21:06.920 --> 0:21:09.560 research, and you know, we do quite a bit 0:21:09.600 --> 0:21:13.240 at IBM Research and MIT, but also all over 0:21:13.400 --> 0:21:16.000 the community and industry and academia, trying to figure out 0:21:16.040 --> 0:21:19.080 how do we explicitly remove these biases, how do we 0:21:19.119 --> 0:21:21.480 identify them, how do, you know, how do we build 0:21:21.640 --> 0:21:23.959