WEBVTT - Smart Talks with IBM - Transformations in AI: why foundation models are the future 0:00:04.519 --> 0:00:10.480 Welcome to Tech Stuff, a production from iHeartRadio. 0:00:12.240 --> 0:00:13.800 Today, we are witness 0:00:13.400 --> 0:00:16.640 to one of those rare moments in history, the rise 0:00:16.760 --> 0:00:20.520 of an innovative technology with the potential to radically transform 0:00:20.600 --> 0:00:26.320 business and society forever. That technology, of course, is artificial intelligence, 0:00:26.520 --> 0:00:29.280 and it's the central focus for this new season of 0:00:29.320 --> 0:00:33.400 Smart Talks with IBM. Join hosts from your favorite Pushkin 0:00:33.479 --> 0:00:37.040 podcasts as they talk with industry experts and leaders to 0:00:37.120 --> 0:00:40.879 explore how businesses can integrate AI into their workflows and 0:00:41.000 --> 0:00:44.160 help drive real change in this new era of AI, 0:00:44.720 --> 0:00:47.360 and of course, host Malcolm Gladwell will be there to 0:00:47.400 --> 0:00:49.919 guide you through the season and throw in his two 0:00:50.000 --> 0:00:53.080 cents as well. Look out for new episodes of Smart 0:00:53.080 --> 0:00:56.240 Talks with IBM every other week on the iHeartRadio app, 0:00:56.480 --> 0:00:59.960 Apple Podcasts, or wherever you get your podcasts, and learn 0:01:00.160 --> 0:01:03.840 more at IBM dot com slash smart Talks. 0:01:06.200 --> 0:01:09.680 Hello, hello. Welcome to Smart Talks with IBM, a podcast 0:01:09.680 --> 0:01:15.399 from Pushkin Industries, iHeartRadio and IBM. I'm Malcolm Gladwell.
This season, 0:01:15.440 --> 0:01:19.920 we're continuing our conversation with new creators, visionaries who are 0:01:19.959 --> 0:01:24.000 creatively applying technology in business to drive change, but with 0:01:24.080 --> 0:01:28.480 a focus on the transformative power of artificial intelligence and 0:01:28.560 --> 0:01:31.520 what it means to leverage AI as a game-changing 0:01:31.600 --> 0:01:36.800 multiplier for your business. Our guest today is Dr. David Cox, 0:01:37.440 --> 0:01:42.320 VP of AI Models at IBM Research and IBM Director 0:01:42.360 --> 0:01:46.440 of the MIT-IBM Watson AI Lab, a first of 0:01:46.480 --> 0:01:52.360 its kind industry-academic collaboration between IBM and MIT focused 0:01:52.400 --> 0:01:57.080 on the fundamental research of artificial intelligence. Over the course 0:01:57.120 --> 0:02:02.000 of decades, David Cox watched as the AI revolution steadily 0:02:02.080 --> 0:02:05.120 grew from the simmering ideas of a few academics and 0:02:05.160 --> 0:02:10.560 technologists into the industrial boom we are experiencing today. Having 0:02:10.600 --> 0:02:13.720 dedicated his life to pushing the field of AI towards 0:02:13.800 --> 0:02:18.040 new horizons, David has both contributed to and presided over 0:02:18.400 --> 0:02:24.040 many of the major breakthroughs in artificial intelligence. In today's episode, 0:02:24.280 --> 0:02:28.520 you'll hear David explain some of the conceptual underpinnings of 0:02:28.560 --> 0:02:33.320 the current AI landscape, things like foundation models, in surprisingly 0:02:33.440 --> 0:02:36.520 comprehensible terms, I might add. We'll also get into some 0:02:36.560 --> 0:02:40.120 of the amazing practical applications for AI in business, as 0:02:40.160 --> 0:02:43.280 well as what implications AI will have for the future 0:02:43.320 --> 0:02:47.280 of work and design. David spoke with Jacob Goldstein, host 0:02:47.320 --> 0:02:52.080 of the Pushkin podcast What's Your Problem.
A veteran business journalist, 0:02:52.400 --> 0:02:55.680 Jacob has reported for The Wall Street Journal, the Miami Herald, 0:02:55.919 --> 0:03:01.080 and was a longtime host of the NPR program Planet Money. Okay, 0:03:01.760 --> 0:03:02.920 let's get to the interview. 0:03:09.360 --> 0:03:11.360 Tell me about your job at IBM. 0:03:11.720 --> 0:03:15.040 So I wear two hats at IBM. So one, I'm 0:03:15.080 --> 0:03:17.919 the IBM Director of the MIT-IBM Watson AI Lab. 0:03:18.440 --> 0:03:21.480 So that's a joint lab between IBM and MIT where 0:03:21.480 --> 0:03:24.080 we try and invent what's next in AI. It's been 0:03:24.120 --> 0:03:27.000 running for about five years, and then more recently I 0:03:27.040 --> 0:03:30.200 started as the vice president for AI Models, and I'm 0:03:30.200 --> 0:03:34.760 in charge of building IBM's foundation models, you know, building 0:03:34.800 --> 0:03:37.000 these these big models, generative models that allow us to 0:03:37.040 --> 0:03:39.160 have all kinds of new exciting capabilities in AI. 0:03:39.720 --> 0:03:41.640 So, so I want to talk to you a lot 0:03:41.720 --> 0:03:45.120 about foundation models, about generative AI. But before we get 0:03:45.160 --> 0:03:47.240 to that, let's just spend a minute on the on 0:03:47.320 --> 0:03:52.800 the IBM-MIT collaboration. Where where did that partnership start? 0:03:52.840 --> 0:03:53.800 How did it originate? 0:03:54.960 --> 0:03:57.320 Yeah, so actually it turns out that MIT and 0:03:57.360 --> 0:04:01.000 IBM have been collaborating for a very long time in 0:04:01.000 --> 0:04:05.040 the area of AI. In fact, the term artificial intelligence 0:04:05.120 --> 0:04:08.720 was coined in a nineteen fifty six workshop that was 0:04:08.720 --> 0:04:11.000 held at Dartmouth. It was actually organized by an IBMer, 0:04:11.040 --> 0:04:14.200 Nathaniel Rochester, who led the development of the IBM 0:04:14.240 --> 0:04:17.400 701.
So we've really been together in AI 0:04:17.480 --> 0:04:22.360 since the beginning, and as AI kept accelerating more and 0:04:22.400 --> 0:04:25.839 more and more, I think there was a really interesting 0:04:25.880 --> 0:04:28.400 decision to say, let's make this a formal partnership. So 0:04:28.600 --> 0:04:30.640 IBM in twenty seventeen announced it'd be committing 0:04:30.640 --> 0:04:33.520 close to a quarter billion dollars over ten years to 0:04:33.600 --> 0:04:37.640 have this joint lab with MIT, and we located ourselves 0:04:37.720 --> 0:04:39.880 right on the campus and we've been developing very, very 0:04:39.960 --> 0:04:42.760 deep relationships where we can really get to know each other, 0:04:42.880 --> 0:04:46.240 work shoulder to shoulder, conceiving what we should work on next, 0:04:46.240 --> 0:04:50.239 and then executing the projects. And it's really very few 0:04:50.880 --> 0:04:54.520 entities like this exist between academia and industry. It's been really 0:04:54.560 --> 0:04:56.800 fun the last five years to be a part of it. 0:04:57.440 --> 0:04:58.960 And what do you think are some of the most 0:04:58.960 --> 0:05:02.480 important outcomes of this collaboration between IBM and MIT? 0:05:03.880 --> 0:05:06.560 Yeah, so we're really kind of the tip of the 0:05:06.600 --> 0:05:10.920 spear for IBM's AI strategy. So we're we're 0:05:10.960 --> 0:05:14.039 really looking what, you know, what's coming ahead, and you know, 0:05:14.080 --> 0:05:16.800 in areas like foundation models, you know, as the field 0:05:16.880 --> 0:05:20.080 changes, MIT people are interested in working on 0:05:20.279 --> 0:05:22.280 you know, faculty, students and staff are interested in working 0:05:22.320 --> 0:05:24.039 on what's the latest thing, what's the next thing.
We 0:05:24.080 --> 0:05:27.440 at IBM Research are very much interested in the same. 0:05:27.720 --> 0:05:29.760 So we can kind of put out feelers, you know, 0:05:29.839 --> 0:05:33.400 interesting things that we're seeing in our research, interesting things 0:05:33.400 --> 0:05:34.960 we're hearing in the field. We can go and chase 0:05:35.000 --> 0:05:38.120 those opportunities. So when something big comes, like the big 0:05:38.200 --> 0:05:41.040 change that's been happening lately with foundation models, we're ready 0:05:41.080 --> 0:05:43.479 to jump on it. That's really the purpose, that's that's 0:05:43.520 --> 0:05:46.600 the lab functioning the way it should. We're also really 0:05:46.600 --> 0:05:50.159 interested in how do we advance, you know, AI that 0:05:50.200 --> 0:05:53.039 can help with climate change or, you know, build better 0:05:53.200 --> 0:05:55.760 materials and all these kinds of things that are, you know, 0:05:55.839 --> 0:05:58.960 a broader aperture sometimes than than what we might consider 0:05:59.120 --> 0:06:01.800 just looking at the product portfolio of IBM, and that 0:06:02.120 --> 0:06:04.440 gives us again a breadth where we can see connections 0:06:04.480 --> 0:06:07.240 that we might not have seen otherwise. We can, you know, 0:06:07.400 --> 0:06:09.920 think things that help out society and also help out 0:06:09.920 --> 0:06:10.560 our customers. 0:06:11.720 --> 0:06:16.200 So the last whatever six months, say, there has been 0:06:16.200 --> 0:06:21.920 this wild rise in the public's interest in AI, right, 0:06:21.960 --> 0:06:25.400 clearly coming out of these generative AI models that are 0:06:25.400 --> 0:06:29.520 really accessible, you know, certainly ChatGPT, language models like that, 0:06:29.600 --> 0:06:32.640 as well as models that generate images like Midjourney.
0:06:33.279 --> 0:06:36.040 I mean, can you just sort of briefly talk about 0:06:36.160 --> 0:06:40.200 the breakthroughs in AI that have made this moment feel 0:06:40.279 --> 0:06:43.760 so exciting, so revolutionary for artificial intelligence? 0:06:44.800 --> 0:06:49.560 Yeah. You know, I've been studying AI basically my entire 0:06:49.600 --> 0:06:51.680 adult life. Before I came to IBM, I was a 0:06:51.680 --> 0:06:54.280 professor at Harvard. I've been doing this a long time, 0:06:54.520 --> 0:06:56.760 and I've gotten used to being surprised. It sounds like 0:06:56.800 --> 0:07:00.000 a joke, but it's serious, like getting used to being 0:07:00.080 --> 0:07:04.400 surprised at the acceleration of the pace again. It tracks 0:07:04.440 --> 0:07:06.960 actually a long way back. You know, there's lots of 0:07:06.960 --> 0:07:10.240 things where there was an idea that just simmered for 0:07:10.280 --> 0:07:14.480 a really long time. Some of the key math behind 0:07:15.080 --> 0:07:18.400 the stuff that we have today, which is amazing. There's 0:07:18.400 --> 0:07:21.560 an algorithm called backpropagation, which is sort of key to 0:07:21.640 --> 0:07:24.240 training neural networks that's been around you know since the 0:07:24.280 --> 0:07:28.400 eighties in wide use, and really what happened was it 0:07:28.520 --> 0:07:32.360 simmered for a long time and then enough data and 0:07:32.520 --> 0:07:36.720 enough compute came. So we had enough data because you know, 0:07:36.760 --> 0:07:39.920 we all started carrying multiple cameras around with us, our 0:07:39.960 --> 0:07:42.760 mobile phones have all, you know, all these cameras and 0:07:42.800 --> 0:07:45.240 this we put everything on the Internet, and there's all 0:07:45.240 --> 0:07:47.680 this data out there. 
We caught a lucky break that 0:07:47.720 --> 0:07:50.160 there was something called a graphics processing unit, which you know, 0:07:50.200 --> 0:07:52.200 turns out to be really useful for doing these kinds 0:07:52.240 --> 0:07:54.720 of algorithms, maybe even more useful than it is for 0:07:54.840 --> 0:07:58.880 doing graphics. They're great at graphics too. And things just kept 0:07:58.960 --> 0:08:02.160 kind of adding to the snowball. So we had deep learning, 0:08:02.600 --> 0:08:06.240 which is sort of a rebrand of neural networks that 0:08:06.280 --> 0:08:08.880 I mentioned from the eighties, and that was enabled again 0:08:08.920 --> 0:08:12.880 by data because we digitalized the world and compute because 0:08:12.920 --> 0:08:15.600 we kept building faster and faster and more powerful computers. 0:08:16.120 --> 0:08:18.800 And then that allowed us to make this this big breakthrough. 0:08:19.080 --> 0:08:23.400 And then you know, more recently, using the same building blocks, 0:08:23.880 --> 0:08:26.800 that inexorable rise of more and more and more data 0:08:27.480 --> 0:08:32.400 met the technology called self-supervised learning. Where the key 0:08:32.679 --> 0:08:37.080 difference there, in traditional deep learning, you know, for classifying images, 0:08:37.160 --> 0:08:38.680 you know, like is this a cat or is this 0:08:38.720 --> 0:08:43.079 a dog in a picture, those technologies require supervision, 0:08:43.200 --> 0:08:45.920 so you have to take what you have and then 0:08:45.920 --> 0:08:47.160 you have to label it. So you have to take 0:08:47.160 --> 0:08:48.720 a picture of a cat and then you label it 0:08:48.720 --> 0:08:51.199 as a cat, and it turns out that, you know, 0:08:51.240 --> 0:08:53.600 that's very powerful, but it takes a lot of time 0:08:53.640 --> 0:08:56.959 to label cats and to label dogs, and there's only 0:08:57.040 --> 0:08:59.360 so many labels that exist in the world.
So what 0:08:59.520 --> 0:09:03.280 really changed more recently is that we have self-supervised 0:09:03.320 --> 0:09:05.120 learning where you don't have to have the labels. We 0:09:05.120 --> 0:09:07.720 can just take unannotated data. And what that does is 0:09:07.720 --> 0:09:11.000 it lets you use even more data. And that's really 0:09:11.000 --> 0:09:15.120 what drove this latest sort of rage. And then and 0:09:15.160 --> 0:09:17.040 then all of a sudden we start getting these these 0:09:17.040 --> 0:09:23.120 really powerful models. And then really, this has been simmering technologies, right, 0:09:23.600 --> 0:09:27.760 this has been happening for a while and progressively getting 0:09:27.800 --> 0:09:30.400 more and more powerful. One of the things that really 0:09:30.600 --> 0:09:35.760 happened with ChatGPT and technologies like Stable Diffusion and 0:09:35.880 --> 0:09:39.000 Midjourney was that they made it visible to the public. 0:09:39.640 --> 0:09:41.600 You know, you put it out there, the public can 0:09:41.640 --> 0:09:43.760 touch and feel, and they're like, wow, not only is 0:09:43.800 --> 0:09:47.000 there palpable change and wow this, you know, I can 0:09:47.080 --> 0:09:49.199 talk to this thing. Wow, this thing can generate an image. 0:09:49.559 --> 0:09:52.160 Not only that, but everyone can touch and feel and try. 0:09:53.000 --> 0:09:57.640 My kids can use some of these AI art generation technologies. 0:09:58.200 --> 0:10:01.640 And that's really just launched, you know. It's like it 0:10:02.080 --> 0:10:05.679 propelled, slingshotted us into a different regime in terms 0:10:05.720 --> 0:10:07.480 of the public awareness of these technologies. 0:10:08.160 --> 0:10:11.320 You mentioned earlier in the conversation foundation models, and I 0:10:11.360 --> 0:10:13.160 want to talk a little bit about that.
I mean, 0:10:13.200 --> 0:10:16.600 can you just tell me, you know, what are foundation 0:10:16.840 --> 0:10:19.600 models for AI and why are they a big deal? 0:10:20.800 --> 0:10:24.480 Yeah. So this term foundation model was coined by a 0:10:24.520 --> 0:10:28.240 group at Stanford, and I think it's actually a really 0:10:28.280 --> 0:10:31.800 apt term because remember I said, you know, one of 0:10:31.840 --> 0:10:35.200 the big things that unlocked this latest excitement was the 0:10:35.200 --> 0:10:38.800 fact that we could use large amounts of unannotated data. 0:10:39.000 --> 0:10:40.680 We could train a model. We don't have to go 0:10:40.720 --> 0:10:44.240 through the painful effort of labeling each and every example. 0:10:44.840 --> 0:10:47.040 You still need to have your model do something you 0:10:47.080 --> 0:10:49.240 want it to do. You still need to tell it what 0:10:49.280 --> 0:10:50.880 you want to do. You can't just have a model 0:10:50.880 --> 0:10:53.440 that doesn't, you know, have any purpose. But what a 0:10:53.440 --> 0:10:57.280 foundation model does is provide a foundation, like a literal foundation, 0:10:57.559 --> 0:10:59.560 you can sort of stand on the shoulders of giants. 0:10:59.600 --> 0:11:02.160 You can have one of these massively trained models and 0:11:02.200 --> 0:11:04.280 then do a little bit on top. You know, you 0:11:04.280 --> 0:11:06.520 could use just a few examples of what you're looking 0:11:06.559 --> 0:11:09.760 for and you can get what you want from the model. 0:11:10.320 --> 0:11:12.240 So just a little bit on top. Now it gets 0:11:12.280 --> 0:11:14.439 to the results that a huge amount of effort used 0:11:14.480 --> 0:11:16.400 to have to be put in, you know, to get from 0:11:16.440 --> 0:11:18.640 the ground up to that level.
0:11:18.880 --> 0:11:22.680 I was trying to think of of an analogy for 0:11:22.960 --> 0:11:25.959 sort of foundation models versus what came before, and I 0:11:26.000 --> 0:11:28.440 don't know that I came up with a good one, 0:11:28.480 --> 0:11:30.199 but the best I could do was this. I want 0:11:30.240 --> 0:11:33.160 you to tell me if it's plausible. It's like before 0:11:33.240 --> 0:11:36.800 foundation models, it was like you had these sort of 0:11:36.840 --> 0:11:40.040 single use kitchen appliances. You could make a waffle iron 0:11:40.080 --> 0:11:42.600 if you wanted waffles, or you could make a toaster 0:11:42.760 --> 0:11:45.319 if you wanted to make toast. But a foundation model 0:11:45.400 --> 0:11:47.960 is like like an oven with a range on top. 0:11:48.040 --> 0:11:49.800 So it's like this machine and you could just cook 0:11:49.920 --> 0:11:51.760 anything with this machine. 0:11:52.360 --> 0:11:56.880 Yeah, that's a great analogy. They're very versatile. The other 0:11:57.000 --> 0:11:59.520 piece of it too, is that they dramatically lowered the 0:11:59.600 --> 0:12:02.800 effort that it takes to do something that you want 0:12:02.840 --> 0:12:06.120 to do. And I used to say about the old 0:12:06.160 --> 0:12:07.960 world of AI, would say, you know, the problem with 0:12:08.000 --> 0:12:11.760 automation is that it's too labor intensive, which sounds like 0:12:11.800 --> 0:12:12.680 I'm making a joke. 0:12:12.880 --> 0:12:17.440 Indeed, famously, if automation does one thing, it substitutes machines 0:12:17.559 --> 0:12:20.760 or computing power for labor, right, So what does that 0:12:20.840 --> 0:12:25.160 mean to say AI is or automation is too labor intensive. 0:12:25.640 --> 0:12:27.600 It sounds like I'm making a joke, but I'm actually serious. 0:12:27.840 --> 0:12:31.040 What I mean is that the effort it took the 0:12:31.120 --> 0:12:34.960 old regime to automate something was very, very high. 
So 0:12:35.160 --> 0:12:38.040 if I need to go and curate all this data, 0:12:38.080 --> 0:12:41.319 collect all this data, and then carefully label all these examples, 0:12:41.720 --> 0:12:46.120 that labeling itself might be incredibly expensive and time-consuming, and 0:12:46.120 --> 0:12:48.760 we estimate anywhere between eighty to ninety percent of the 0:12:48.800 --> 0:12:51.720 effort it takes to field an AI solution actually is 0:12:51.880 --> 0:12:55.199 just spent on data. So that that has some consequences, 0:12:55.480 --> 0:13:00.839 which is the threshold for bothering. You know, if you're 0:13:00.880 --> 0:13:03.040 going to only get a little bit of value back 0:13:03.320 --> 0:13:05.520 from something, are you going to go through this huge 0:13:05.559 --> 0:13:09.000 effort to curate all this data, and then when it 0:13:09.040 --> 0:13:11.480 comes time to train the model, you need highly skilled 0:13:11.480 --> 0:13:15.160 people, expensive or hard to find in the labor market. 0:13:15.720 --> 0:13:17.240 You know, are you really going to do something that's 0:13:17.240 --> 0:13:19.280 just a tiny, incremental thing? No, you're going to do 0:13:19.320 --> 0:13:23.280 only the highest value things that warrant that level. 0:13:23.640 --> 0:13:24.199 Because you have 0:13:24.240 --> 0:13:28.240 to essentially build the whole machine from scratch, and there 0:13:28.240 --> 0:13:30.840 aren't many things where it's worth that much work to 0:13:30.880 --> 0:13:33.840 build a machine that's only going to do one narrow thing. 0:13:34.320 --> 0:13:37.240 That's right, and then you tackle the next problem and 0:13:37.320 --> 0:13:39.640 you basically have to start over. And you know, there 0:13:39.679 --> 0:13:42.480 are some nuances here, like for images, you can pre- 0:13:42.520 --> 0:13:45.000 train a model on some other task and change it around.
0:13:45.080 --> 0:13:48.040 So there are some examples of this, like non-recurring 0:13:48.120 --> 0:13:50.719 cost that we have in the old world too, but 0:13:50.760 --> 0:13:53.280 by and large, it's just a lot of effort. It's hard. 0:13:53.559 --> 0:13:57.880 It takes, you know, a large level of skill to implement. 0:13:58.640 --> 0:14:01.439 One analogy that I like is, you know, think about 0:14:01.440 --> 0:14:03.559 it as, you know, you have a river of data, 0:14:03.960 --> 0:14:07.280 you know, running through your company or your institution. Traditional 0:14:07.360 --> 0:14:09.840 AI solutions are kind of like building a dam on 0:14:09.880 --> 0:14:13.320 that river. You know, dams are very expensive things to build. 0:14:13.679 --> 0:14:17.960 They require highly specialized skills and lots of planning. And 0:14:18.120 --> 0:14:19.800 you know, you're only going to put a dam on 0:14:20.240 --> 0:14:22.960 a river that's big enough that you're gonna get enough 0:14:23.040 --> 0:14:24.920 energy out of it that it was worth your trouble. 0:14:25.320 --> 0:14:26.600 You're gonna get a lot of value out of that 0:14:26.680 --> 0:14:28.400 dam if you have a river like that, you know, 0:14:28.480 --> 0:14:32.080 a river of data, but it's actually the vast majority 0:14:32.080 --> 0:14:34.640 of the water, you know, in your kingdom actually isn't 0:14:34.680 --> 0:14:38.800 in that river. It's in puddles and creeks and babbling brooks. 0:14:38.880 --> 0:14:42.360 And you know, there's a lot of value left on 0:14:42.360 --> 0:14:44.960 the table because it's like, well, I can't, there's nothing 0:14:44.960 --> 0:14:46.760 you can do about it. It's just that that's too 0:14:47.760 --> 0:14:50.880 low value. So it takes too much effort, so I'm 0:14:50.880 --> 0:14:52.000 just not going to do it. The return on the 0:14:52.000 --> 0:14:54.800 investment just isn't there, so you just end up not 0:14:54.880 --> 0:14:58.120 automating things. It's too much of a pain.
Now what 0:14:58.280 --> 0:15:00.720 foundation models do is they say, well, actually, no, we 0:15:00.760 --> 0:15:03.920 can train a base model, a foundation that you can 0:15:03.960 --> 0:15:06.240 work on. We don't care. We're not specifying what 0:15:06.280 --> 0:15:07.800 the task is ahead of time. We just need to 0:15:08.240 --> 0:15:10.920 learn about the domain of data. So if we want 0:15:10.960 --> 0:15:14.240 to build something that can understand English language, there's a 0:15:14.280 --> 0:15:17.640 ton of English language text available out in the world. 0:15:17.760 --> 0:15:21.560 We can now train models on huge quantities of it. 0:15:22.000 --> 0:15:25.400 And then it learned the structure. It learned how language, 0:15:25.600 --> 0:15:27.640 you know, good part of how language works on all 0:15:27.640 --> 0:15:29.920 that unlabeled data. And then when you roll up with 0:15:30.000 --> 0:15:33.760 your task, you know, I want to solve this particular problem, 0:15:34.200 --> 0:15:36.560 you don't have to start from scratch. You're starting from 0:15:36.640 --> 0:15:39.840 a very very very high place. So that just gives 0:15:39.880 --> 0:15:42.440 you the ability to, you know, now all of a sudden, 0:15:42.480 --> 0:15:45.640 everything is accessible. All the puddles and creeks and babbling 0:15:45.680 --> 0:15:49.840 brooks and kettle ponds, you know, those are all accessible now, 0:15:50.360 --> 0:15:53.040 and that's that's very exciting. But it just changes the 0:15:53.040 --> 0:15:55.560 equation on what kinds of problems you could use AI 0:15:55.720 --> 0:15:56.080 to solve. 0:15:56.200 --> 0:16:01.680 And so foundation models basically mean that automating some new 0:16:01.760 --> 0:16:05.000 task is much less labor intensive.
The sort of marginal 0:16:05.080 --> 0:16:08.120 effort to do some new automation thing is much lower 0:16:08.120 --> 0:16:11.400 because you're building on top of the foundation model rather 0:16:11.440 --> 0:16:16.200 than starting from scratch. Absolutely, so that is like the 0:16:16.560 --> 0:16:20.720 exciting good news. I do feel like there's a little 0:16:20.760 --> 0:16:23.480 bit of a countervailing idea that's worth talking about here, 0:16:23.520 --> 0:16:25.640 and that is the idea that even though there are 0:16:25.680 --> 0:16:30.280 these foundation models that are really powerful that are relatively 0:16:30.320 --> 0:16:32.880 easy to build on top of, it's still the case 0:16:32.960 --> 0:16:36.240 right that there is not some one size fits all 0:16:36.320 --> 0:16:39.960 foundation model. So you know, what does that mean and 0:16:40.040 --> 0:16:42.560 why is that important to think about in this context? 0:16:43.160 --> 0:16:46.920 Yeah, So we believe very strongly that there isn't just 0:16:47.040 --> 0:16:49.960 one model to rule them all. There's a number of 0:16:49.960 --> 0:16:52.960 reasons why that could be true. One which I think 0:16:53.040 --> 0:16:57.080 is important and very relevant today is how much energy 0:16:57.400 --> 0:17:02.160 these models can consume. So these models, you know, can 0:17:02.200 --> 0:17:07.640 get very, very large. So one thing that we're starting 0:17:07.640 --> 0:17:10.399 to see or starting to believe, is that you probably 0:17:10.400 --> 0:17:15.560 shouldn't use one giant sledgehammer model to solve every single problem, 0:17:15.720 --> 0:17:17.680 you know, like we should pick the right size model 0:17:17.680 --> 0:17:20.480 to solve the problem. We shouldn't necessarily assume that we 0:17:20.560 --> 0:17:25.119 need the biggest, baddest model for every little use case. 
0:17:25.560 --> 0:17:27.760 And we're also seeing that, you know, small models that 0:17:27.800 --> 0:17:32.000 are trained like to specialize on particular domains can actually 0:17:32.040 --> 0:17:35.920 outperform much bigger models. So bigger isn't always even better. 0:17:35.960 --> 0:17:38.520 So they're more efficient and they do the thing you 0:17:38.560 --> 0:17:40.200 want them to do better as well. 0:17:40.760 --> 0:17:43.960 That's right. So Stanford, for instance, a group at Stanford 0:17:44.040 --> 0:17:47.880 trained a model, it was a two point seven billion parameter model, 0:17:47.880 --> 0:17:50.719 which isn't terribly big by today's standards. They trained it 0:17:50.840 --> 0:17:52.760 just on the biomedical literature, you know, this is the 0:17:52.800 --> 0:17:55.560 kind of thing that universities do. And what they showed 0:17:55.680 --> 0:17:59.120 was that this model was better at answering questions about 0:17:59.119 --> 0:18:01.800 the biomedical literature than some models that are one 0:18:01.880 --> 0:18:05.760 hundred billion parameters, you know, many times larger. So it's 0:18:05.800 --> 0:18:08.679 a little bit like, you know, asking an expert for 0:18:08.760 --> 0:18:11.840 help on something versus asking the smartest person you know. 0:18:12.480 --> 0:18:15.280 The smartest person you know may be very smart, but they're 0:18:15.320 --> 0:18:18.639 not going to have the expertise. And then as an added bonus, 0:18:18.680 --> 0:18:20.680 you know, this is now a much smaller model, it's 0:18:20.760 --> 0:18:23.199 much more efficient to run, you know, 0:18:23.240 --> 0:18:27.040 it's cheaper. So there's lots of different advantages there.
So 0:18:27.280 --> 0:18:31.119 I think we're going to see a tension in the industry 0:18:31.480 --> 0:18:34.200 between vendors that say, hey, this is the one, you know, 0:18:34.280 --> 0:18:36.879 big model, and then others that say, well, actually, you know, 0:18:36.880 --> 0:18:39.439 there's there's, you know, lots of different tools we can 0:18:39.560 --> 0:18:41.840 use that all have this nice quality that we outlined 0:18:41.880 --> 0:18:44.199 at the beginning, and then we should really pick the 0:18:44.200 --> 0:18:46.520 one that makes the most sense for the task at hand. 0:18:47.840 --> 0:18:52.199 So there's sustainability, basically efficiency. Another kind of set of 0:18:52.240 --> 0:18:54.520 issues that come up a lot with AI are 0:18:54.680 --> 0:18:58.479 bias and hallucination. Can you talk a little bit about bias 0:18:58.720 --> 0:19:01.000 and hallucination, what they are and how you're working to 0:19:01.400 --> 0:19:02.480 mitigate those problems? 0:19:02.920 --> 0:19:05.760 Yeah, so there are lots of issues still, as amazing 0:19:05.800 --> 0:19:08.960 as these technologies are, and they are amazing, let's be 0:19:09.080 --> 0:19:11.600 very clear, lots of great things we're going to enable 0:19:11.640 --> 0:19:15.160 with these kinds of technologies. Bias isn't a new problem, 0:19:15.520 --> 0:19:20.119 so you know, basically we've seen this since the beginning 0:19:20.119 --> 0:19:23.040 of AI. If you train a model on data that 0:19:23.440 --> 0:19:25.560 has a bias in it, the model is going to 0:19:25.600 --> 0:19:30.200 recapitulate that bias when it provides its answers. So every time, 0:19:30.359 --> 0:19:32.919 you know, if all the text you have says, you know, 0:19:32.920 --> 0:19:35.960 it's more likely to refer to female nurses and male scientists, 0:19:36.080 --> 0:19:38.760 then you're going to get models that you know.
For instance, 0:19:39.080 --> 0:19:41.960 there was an example where a machine learning based translation 0:19:42.040 --> 0:19:46.840 system translated from Hungarian to English. Hungarian doesn't have gendered pronouns. 0:19:46.960 --> 0:19:49.520 English does, and when you asked it to translate, it 0:19:49.560 --> 0:19:52.560 would translate "they are a nurse" to "she is a nurse," 0:19:53.200 --> 0:19:55.680 translate "they are a scientist" to "he is a scientist." 0:19:55.920 --> 0:19:58.960 And that's not because the people who wrote the algorithm 0:19:58.960 --> 0:20:01.520 were building in bias and coding in, like, oh, it's 0:20:01.520 --> 0:20:03.320 got to be this way. It's because the data was 0:20:03.359 --> 0:20:06.080 like that. You know, we have biases in our society 0:20:06.560 --> 0:20:10.119 and they're reflected in our data and our text and 0:20:10.200 --> 0:20:14.640 our images everywhere. And then the models, they're just mapping 0:20:14.680 --> 0:20:16.560 from what they've what they've seen in their training data 0:20:16.640 --> 0:20:19.000 to to the result that you're trying to get them 0:20:19.000 --> 0:20:21.880 to do and to give, and then these biases come out. 0:20:22.000 --> 0:20:27.439 So there's a very active program of research in, you know, 0:20:27.480 --> 0:20:30.280 we we do quite a bit at IBM Research and MIT, 0:20:31.000 --> 0:20:34.240 but also all over the community and industry and academia 0:20:34.280 --> 0:20:37.840 trying to figure out how do we explicitly remove these biases, 0:20:37.880 --> 0:20:40.080 how do we identify them, how do you know, how 0:20:40.080 --> 0:20:42.320 do we build tools that allow people to audit their 0:20:42.359 --> 0:20:44.919 systems to make sure they aren't biased.
So this is 0:20:44.960 --> 0:20:47.040 a really important thing, and you know, again this was 0:20:47.080 --> 0:20:51.600 here since the beginning, you know, of machine learning and AI, 0:20:52.160 --> 0:20:55.640 but foundation models and large language models and generative AI 0:20:56.600 --> 0:20:59.360 just bring it into sharper, even sharper focus because there's 0:20:59.359 --> 0:21:02.880 just so much and it's sort of building in, baking 0:21:02.920 --> 0:21:06.160 in all these different biases we have. So that's that's 0:21:06.200 --> 0:21:10.000 absolutely a problem that these models have. Another one that 0:21:10.040 --> 0:21:13.879 you mentioned was hallucinations. So even the most impressive of 0:21:13.880 --> 0:21:17.960 our models will often just make stuff up. You know, 0:21:18.000 --> 0:21:21.159 the technical term that the field has chosen is hallucination. 0:21:21.760 --> 0:21:24.719 To give you an example, I asked ChatGPT to 0:21:24.960 --> 0:21:28.760 create a biography of David Cox IBM, and you know, 0:21:29.000 --> 0:21:31.560 it started off really well. You know, it identified that 0:21:31.600 --> 0:21:34.040 I was the director of the MIT-IBM Watson AI Lab and 0:21:34.040 --> 0:21:36.440 said a few words about that, and then it proceeded 0:21:36.480 --> 0:21:41.040 to create an authoritative but completely fake biography of me, 0:21:41.080 --> 0:21:43.560 where I was British, I was born in the UK. 0:21:44.960 --> 0:21:47.880 I went to British university, you know, universities in the UK. 0:21:47.960 --> 0:21:51.359 I was a professor. The authority, right, it's the certainty that 0:21:51.359 --> 0:21:54.600 that is weird about it, right? It's it's dead certain 0:21:54.640 --> 0:21:56.520 that you're from the UK, et cetera. 0:21:57.080 --> 0:22:00.119