WEBVTT - Rerun: Deep Learning and Deepfakes 0:00:04.400 --> 0:00:07.760 Welcome to TechStuff, a production from iHeartRadio. 0:00:11.800 --> 0:00:14.200 Hey there, and welcome to TechStuff. I'm your host, 0:00:14.320 --> 0:00:16.840 Jonathan Strickland. I'm an executive producer with iHeartRadio, 0:00:16.880 --> 0:00:20.720 and how the tech are you? I am currently on 0:00:21.320 --> 0:00:26.280 vacation celebrating my anniversary, and I didn't want to leave 0:00:26.360 --> 0:00:29.560 you without an episode, so the episode we're going to 0:00:29.600 --> 0:00:32.640 play for you was recorded and published on September seven, 0:00:32.760 --> 0:00:36.200 twenty twenty. It is called Deep Learning and Deepfakes, 0:00:36.200 --> 0:00:41.400 and recent developments in the deep fakes field include researchers 0:00:41.440 --> 0:00:46.400 creating tools that can detect tells in artificial voices, for example. 0:00:47.040 --> 0:00:49.280 But really, when you think about that, it's just a 0:00:49.360 --> 0:00:52.560 seesaw-like pattern where we'll see deep fake technology 0:00:52.600 --> 0:00:56.040 improve over time, and then our ability to detect deep 0:00:56.080 --> 0:00:59.200 fakes will improve, and this will keep going until one 0:00:59.200 --> 0:01:03.600 side or the other has the edge permanently. Now we 0:01:03.800 --> 0:01:06.640 kind of talk about that in this episode, in fact. 0:01:07.120 --> 0:01:10.160 Also, deep fakes are very much in the spotlight, literally, 0:01:10.600 --> 0:01:13.679 on the popular TV series America's Got Talent. A team 0:01:13.680 --> 0:01:16.360 from the startup Metaphysic made it all the way to 0:01:16.360 --> 0:01:19.400 the final round of the competition by creating deep fake 0:01:19.560 --> 0:01:22.319 copies of the famous judges on the show, all in 0:01:22.400 --> 0:01:28.640 real time. It's equal parts entertaining and terrifying. Okay, maybe 0:01:29.520 --> 0:01:38.360 entertaining terrifying. 
Anyway, enjoy this episode, Deep Learning and Deepfakes. Now, 0:01:38.360 --> 0:01:41.039 before I get into today's episode, I want to give 0:01:41.160 --> 0:01:45.120 a little listener warning here. The topic at hand involves 0:01:45.200 --> 0:01:48.680 some adult content, including the use of technology to do 0:01:48.800 --> 0:01:55.120 stuff that can be unethical, illegal, hurtful, and just plain awful. Now, 0:01:55.160 --> 0:01:57.960 I think this is an important topic, but I wanted 0:01:58.000 --> 0:01:59.840 to give a bit of a heads up at the 0:02:00.000 --> 0:02:02.440 start of the episode, just in case any of you 0:02:02.440 --> 0:02:05.720 guys are listening to a podcast on, like, a family 0:02:05.960 --> 0:02:08.799 road trip or something. I think this is an important 0:02:08.840 --> 0:02:12.200 topic and I think everyone should know about it and 0:02:12.240 --> 0:02:14.320 think about it. But I also respect that for some 0:02:14.360 --> 0:02:17.800 people this subject might get a bit taboo. So let's 0:02:17.840 --> 0:02:23.160 go on with the episode. Back in nineteen ninety-three, a movie 0:02:23.320 --> 0:02:28.160 called Rising Sun, directed by Philip Kaufman, based on a 0:02:28.320 --> 0:02:32.079 Michael Crichton novel and starring Wesley Snipes and Sean Connery, 0:02:32.240 --> 0:02:35.639 came out in theaters. Now, I didn't see it in theaters, 0:02:36.320 --> 0:02:38.640 but I did catch it when it came on, you know, 0:02:39.200 --> 0:02:43.040 HBO or Cinemax or something later on. The movie included 0:02:43.080 --> 0:02:46.280 a sequence that I found to be totally unbelievable. And 0:02:46.320 --> 0:02:50.000 I'm not talking about buying into Sean Connery being an 0:02:50.040 --> 0:02:54.600 expert on Japanese culture and business practices. 
Actually, side note, 0:02:54.760 --> 0:02:59.000 Sean Connery has an interesting history of playing unlikely characters, 0:02:59.040 --> 0:03:01.760 such as in Highlander, where he played an immortal 0:03:01.919 --> 0:03:05.720 who is supposedly Egyptian, then who lived in feudal Japan 0:03:06.240 --> 0:03:09.120 and ended up in Spain, where he became known as Ramirez, 0:03:09.520 --> 0:03:12.080 and all the while he's talking to a Scottish Highlander 0:03:12.240 --> 0:03:15.519 who's played by a Belgian actor. But I'm getting way 0:03:15.520 --> 0:03:19.320 off track here. Besides, I've heard Crichton actually wrote the 0:03:19.440 --> 0:03:22.280 character while thinking of Connery, so you know, what the 0:03:22.280 --> 0:03:25.600 heck do I know? In the film, Snipes and Connery 0:03:25.720 --> 0:03:29.519 are investigators, and they're looking into a homicide that happened 0:03:29.560 --> 0:03:34.080 at a Japanese business but on American soil. The security 0:03:34.080 --> 0:03:38.360 system in the building captured video of the homicide, and 0:03:38.400 --> 0:03:40.800 the identity of the killer appears to be a pretty 0:03:40.840 --> 0:03:44.200 open and shut case. But that's not how it all 0:03:44.240 --> 0:03:47.800 turns out. The investigators talk to a security expert played 0:03:47.840 --> 0:03:51.720 by Tia Carrere, and she demonstrates in real time how 0:03:51.840 --> 0:03:56.440 video footage can be altered. She records a short video 0:03:56.800 --> 0:04:00.240 of Connery and Snipes and loads that onto a computer. She 0:04:00.560 --> 0:04:04.520 freezes a frame of the video and essentially performs a 0:04:04.600 --> 0:04:07.800 cut and paste job, swapping the heads of our two 0:04:07.880 --> 0:04:11.440 lead characters. Then she resumes the video and the head 0:04:11.480 --> 0:04:16.800 swap remains in place, and that head swap stuff is possible. 
0:04:17.040 --> 0:04:19.120 I mean, clearly it has to be possible, because you 0:04:19.160 --> 0:04:22.240 actually do see that effect in the film itself. But 0:04:22.480 --> 0:04:25.040 it takes a bit more than a quick cut and 0:04:25.120 --> 0:04:27.720 paste job. But we'll leave off of that for now. 0:04:28.279 --> 0:04:31.920 The whole point of that sequence, apart from showing off 0:04:32.000 --> 0:04:36.640 some cinema magic, is to demonstrate to the investigators that video, 0:04:36.800 --> 0:04:41.080 like photographs, can be altered. The expert has detected a 0:04:41.080 --> 0:04:44.520 blue halo around the face of the supposed murderer in 0:04:44.600 --> 0:04:48.000 the footage, indicating that some sort of trickery has happened. 0:04:48.480 --> 0:04:51.719 She also reveals that she cannot magically restore the video 0:04:51.760 --> 0:04:54.839 to its previous unaltered state, which I think was actually 0:04:54.880 --> 0:04:57.680 a nice change of pace for a movie. By the way, 0:04:58.240 --> 0:05:01.480 I think this movie is really, you know, not good, 0:05:01.960 --> 0:05:07.040 like not worth your time, but that's my opinion anyway. 0:05:07.080 --> 0:05:10.400 For years, this kind of video sorcery was pretty much 0:05:10.560 --> 0:05:14.600 limited to the film and TV industries. It usually required 0:05:14.640 --> 0:05:18.360 a lot of pre planning beforehand, so it wasn't as 0:05:18.400 --> 0:05:21.560 simple as just taking footage that was already shot and 0:05:21.680 --> 0:05:24.559 changing it in post on a whim with a couple 0:05:24.600 --> 0:05:26.800 of clicks of a button. If it were, we would 0:05:26.800 --> 0:05:30.640 see a lot fewer mistakes left in movies and television 0:05:30.760 --> 0:05:33.240 because you could catch it later and just fix it. 0:05:33.640 --> 0:05:37.080 But the tricks were possible, they were just difficult to 0:05:37.120 --> 0:05:40.159 pull off. 
It just wasn't something you or I would 0:05:40.200 --> 0:05:43.960 ever encounter in our day to day lives. But today 0:05:44.400 --> 0:05:47.239 we live in a different world, a world that has 0:05:47.279 --> 0:05:52.240 examples of synthetic media, commonly referred to as deep fakes. 0:05:52.880 --> 0:05:56.640 These are videos that have been altered or generated so 0:05:56.760 --> 0:05:59.360 that the subject of the video is doing something that 0:05:59.400 --> 0:06:03.640 they probably really would or could never do. They've brought 0:06:03.640 --> 0:06:07.279 into question whether or not video evidence is even reliable, 0:06:07.600 --> 0:06:10.919 much as the film Rising Sun was talking about. We 0:06:11.000 --> 0:06:16.560 already know that eyewitness testimony is terribly unreliable. Our perception 0:06:16.720 --> 0:06:20.080 and memories play tricks on us, and we can quote 0:06:20.160 --> 0:06:24.160 unquote remember stuff that just didn't happen the way things 0:06:24.279 --> 0:06:28.640 actually unfolded in reality. But now we're looking at video 0:06:28.720 --> 0:06:33.320 evidence in potentially the same light. I mean, it's scary. 0:06:33.400 --> 0:06:37.680 So today we're going to learn about synthetic media, how 0:06:37.960 --> 0:06:42.080 it can be generated, the implications that follow with that 0:06:42.240 --> 0:06:44.680 sort of reality, and ways that people are trying to 0:06:44.720 --> 0:06:50.240 counteract a potentially dangerous threat. You know, fun stuff. Now, first, 0:06:50.360 --> 0:06:54.719 the term synthetic media has a particular meaning. It refers 0:06:54.760 --> 0:06:59.560 to art created through some sort of automated process, so 0:06:59.680 --> 0:07:04.000 it's a largely hands off approach to creating the final 0:07:04.720 --> 0:07:08.560 art piece. 
Now, under that definition, the example of Rising 0:07:08.600 --> 0:07:12.040 Sun would not apply here, because we see in the 0:07:12.080 --> 0:07:14.840 film, and presumably this happens in the book as well, 0:07:14.880 --> 0:07:18.360 but I haven't read the book, that a human being 0:07:18.600 --> 0:07:22.400 actually changes that. People have used tools to alter the 0:07:22.480 --> 0:07:26.240 video footage. This would be more like using Photoshop to 0:07:26.240 --> 0:07:29.280 touch up a still image, with the computer system presumably 0:07:29.360 --> 0:07:32.520 doing some of the work in the background to keep 0:07:32.560 --> 0:07:35.200 things matched up. Either that or you would need to 0:07:35.240 --> 0:07:38.920 alter each image in the footage frame by frame, or 0:07:39.000 --> 0:07:42.960 use some sort of matte approach. To learn more about mattes, 0:07:43.360 --> 0:07:45.480 you can listen to my episode about how blue and 0:07:45.560 --> 0:07:50.000 green screens work. Synthetic media as a general practice has 0:07:50.200 --> 0:07:54.320 been around for centuries. Artists have set up various contraptions 0:07:54.360 --> 0:07:58.360 to create works with little or no human guidance. In 0:07:58.400 --> 0:08:01.559 the twentieth century we started to see a movement called 0:08:01.760 --> 0:08:05.160 generative art take form. This type of art is all 0:08:05.200 --> 0:08:08.960 about creating a system that then creates or generates the 0:08:09.040 --> 0:08:12.600 finished art piece. That would mean that the finished work, 0:08:12.720 --> 0:08:16.800 such as a painting, wouldn't reflect the feelings or thoughts 0:08:16.800 --> 0:08:20.240 of the artists who created the system. In fact, it 0:08:20.320 --> 0:08:23.560 starts to raise the question, what is the art? Is 0:08:23.600 --> 0:08:26.400 it the painting that came about due to a machine 0:08:26.480 --> 0:08:30.480 following a program of some sort, or is the art 0:08:30.720 --> 0:08:35.079 the program itself? 
Is the art the process by which 0:08:35.120 --> 0:08:37.800 the painting was made? Now, I'm not here to answer 0:08:37.840 --> 0:08:41.400 that question. I just think it is an interesting question 0:08:41.440 --> 0:08:46.480 to ask. Sometimes people ask much less polite questions, such 0:08:46.520 --> 0:08:50.600 as, is it art at all? Some art critics went 0:08:50.640 --> 0:08:53.640 out of their way to dismiss generative art in the 0:08:53.679 --> 0:08:58.280 early days. They found it insulting, but hey, that's kind 0:08:58.320 --> 0:09:02.160 of the history of art in general. Each new movement 0:09:02.200 --> 0:09:07.199 in art inevitably finds both supporters and critics as it emerges. 0:09:07.640 --> 0:09:11.480 If anything, you might argue that such a response legitimizes 0:09:11.720 --> 0:09:14.679 the movement in, you know, a weird way. If people 0:09:14.720 --> 0:09:18.880 hate it, it must be something. In two thousand eighteen, 0:09:19.000 --> 0:09:23.800 an artist collective called Obvious, located out of Paris, France, 0:09:24.200 --> 0:09:27.760 submitted portrait style paintings that were created not by 0:09:27.880 --> 0:09:32.719 an actual human painter, but by an artificially intelligent system. 0:09:32.760 --> 0:09:37.280 Now they looked a lot like typical eighteenth century style portraits. 0:09:37.920 --> 0:09:41.400 There was no attempt to pass off the portrait as 0:09:41.440 --> 0:09:44.360 if it were actually made by a human artist. In fact, 0:09:44.800 --> 0:09:47.959 the appeal of the piece was largely due to it 0:09:48.080 --> 0:09:52.840 being synthetically generated. It went to auction at Christie's and 0:09:52.960 --> 0:09:59.000 the AI created painting fetched more than four hundred thousand dollars. 0:09:59.120 --> 0:10:02.240 And the way the group trained their AI is relevant 0:10:02.280 --> 0:10:06.720 to our discussion about deep fakes. 
The collective relied on 0:10:06.800 --> 0:10:11.560 a type of machine learning called generative adversarial networks, or 0:10:11.800 --> 0:10:16.320 GAN, which in turn depends on deep learning. 0:10:16.400 --> 0:10:18.079 So it looks like we've got a few things we're 0:10:18.080 --> 0:10:20.760 gonna have to define here. Now, I'm going to keep 0:10:20.840 --> 0:10:24.840 things fairly high level, because, as it turns out, there 0:10:24.880 --> 0:10:28.240 are a few different ways to create machine learning models, 0:10:28.600 --> 0:10:31.160 and to go through all of them in exhaustive detail 0:10:31.280 --> 0:10:34.760 would represent a university level course in machine learning. I 0:10:34.800 --> 0:10:38.280 have neither the time for that nor the expertise. I 0:10:38.320 --> 0:10:41.920 would do a terrible job, so we'll go with a 0:10:42.040 --> 0:10:47.560 high level perspective here. First, a generative adversarial network uses 0:10:47.679 --> 0:10:51.800 two systems. You have a generator and you have a discriminator. 0:10:52.280 --> 0:10:55.600 Both of these systems are a type of neural network. 0:10:56.000 --> 0:10:59.600 A neural network is a computing model that is inspired 0:10:59.640 --> 0:11:03.960 by the way our brains work. Our brains contain billions 0:11:03.960 --> 0:11:08.319 of neurons, and these neurons work together, communicating through electrical 0:11:08.360 --> 0:11:13.080 and chemical signals, controlling and coordinating pretty much everything in 0:11:13.120 --> 0:11:18.440 our bodies. With computers, the neurons are nodes. The job 0:11:18.559 --> 0:11:21.720 of a node is, you know, supposed to be kind 0:11:21.720 --> 0:11:24.400 of like a neuron in the brain. It's to 0:11:24.520 --> 0:11:29.200 take in multiple weighted input values and then generate a 0:11:29.320 --> 0:11:34.160 single output value. 
Now, the word weighted, W E I 0:11:34.320 --> 0:11:37.920 G H T E D, weighted, is really important here, 0:11:37.960 --> 0:11:42.160 because the larger an input's weight, the more that input 0:11:42.280 --> 0:11:45.679 will have an effect on whatever the output is. So 0:11:45.720 --> 0:11:48.720 it kind of comes down to which inputs are the 0:11:48.800 --> 0:11:52.440 most important for that node's particular function. Now, if I 0:11:52.480 --> 0:11:55.720 were to make an analogy, I would say, your boss 0:11:55.840 --> 0:11:59.560 hands you three tasks to do. One of those tasks 0:11:59.600 --> 0:12:03.840 has the label extremely important, and the second task has 0:12:03.920 --> 0:12:08.120 the label critically important, and the third task has a 0:12:08.200 --> 0:12:10.200 label saying you should have finished that one before it 0:12:10.280 --> 0:12:13.240 was handed to you. Okay, so that's just some sort 0:12:13.280 --> 0:12:15.719 of snarky office humor that I need to get off 0:12:15.720 --> 0:12:20.200 my chest. But more seriously, imagine a node accepting three inputs. 0:12:20.200 --> 0:12:24.679 In this example, input one has a fifty percent weight, input 0:12:24.760 --> 0:12:28.320 two has a forty percent weight, and input three has a ten 0:12:28.440 --> 0:12:32.040 percent weight. That adds up to one hundred percent, and that would tell 0:12:32.080 --> 0:12:35.520 you that the output that node generates will be most 0:12:35.679 --> 0:12:39.880 affected by input one, followed by input two, and then 0:12:39.880 --> 0:12:43.439 input three would have a smaller effect on whatever the 0:12:43.480 --> 0:12:48.560 output is. Each node applies a nonlinear transformation on the 0:12:48.600 --> 0:12:53.720 input values, again affected by each input's weight value, and 0:12:53.800 --> 0:12:58.920 that generates the output value. The details of that really 0:12:58.920 --> 0:13:02.360 are not important for our episode. 
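To make the weighted-node idea concrete, here is a minimal sketch in Python. It assumes three inputs with illustrative weights of 0.5, 0.4, and 0.1, and it uses a sigmoid as the nonlinear transformation. The sigmoid and the name node_output are assumptions for illustration only; the episode does not say which function a real network would use.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through a nonlinear squashing function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid, one common choice

# Illustrative weights: input one matters most, input three least.
weights = [0.5, 0.4, 0.1]
baseline = node_output([1.0, 1.0, 1.0], weights)

# Nudge each input by the same amount and record how far the output moves.
shifts = []
for i in range(3):
    nudged = [1.0, 1.0, 1.0]
    nudged[i] += 1.0
    shifts.append(node_output(nudged, weights) - baseline)

print(shifts)  # the first shift is the largest, the last is the smallest
```

The same nudge moves the output most when it is applied to the input with the largest weight, which is the sense in which that input is the most important one for the node.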
It involves performing changes 0:13:02.360 --> 0:13:06.040 on variables that in turn change the correlation between variables, 0:13:06.040 --> 0:13:08.560 and it gets a bit mathy, and we would get 0:13:08.600 --> 0:13:11.840 lost in the weeds pretty quickly. The important thing to 0:13:11.880 --> 0:13:15.520 remember is that a node within a neural network takes 0:13:15.640 --> 0:13:20.520 in a weighted sum of inputs, then performs a process 0:13:20.559 --> 0:13:25.480 on those inputs before passing the result on as an output. 0:13:25.920 --> 0:13:30.319 Then some other node a layer down will accept that output, 0:13:30.640 --> 0:13:33.079 along with outputs from a couple of other nodes one 0:13:33.160 --> 0:13:36.840 layer up, and then will perform an operation based on 0:13:36.920 --> 0:13:40.079 those weighted inputs and pass that on to the next layer, 0:13:40.160 --> 0:13:43.480 and so on. So these nodes are in layers, like, 0:13:43.600 --> 0:13:47.240 you know, a cake. One layer of nodes processes some inputs, 0:13:47.280 --> 0:13:50.280 they send it on to the next layer of nodes, and 0:13:50.320 --> 0:13:52.240 then that one does, on to the next one, and the 0:13:52.280 --> 0:13:56.400 next one, and so on. This isn't a new idea. 0:13:56.800 --> 0:14:02.160 Computer scientists began theorizing and experimenting with neural network approaches 0:14:02.640 --> 0:14:06.000 as far back as the nineteen fifties with the perceptron, 0:14:06.320 --> 0:14:09.680 which was a hypothetical system that was described by Frank 0:14:09.760 --> 0:14:13.559 Rosenblatt of Cornell University. But it wasn't until the last 0:14:13.640 --> 0:14:17.160 decade that computing power and our ability to handle a 0:14:17.200 --> 0:14:20.520 lot of data reached a point where these sorts of 0:14:20.600 --> 0:14:24.480 learning models could really take off. 
The goal of this 0:14:24.680 --> 0:14:28.440 system is to train it to perform a particular task 0:14:28.920 --> 0:14:33.120 within a certain level of precision. The weights I mentioned 0:14:33.160 --> 0:14:35.880 are adjustable, so you can think of it as teaching 0:14:35.880 --> 0:14:39.760 a system which bits are the most important in order 0:14:39.760 --> 0:14:42.520 to do whatever it is the system is supposed to 0:14:42.520 --> 0:14:45.680 do. In order to achieve your task, these are the 0:14:45.680 --> 0:14:49.200 bits that are the most important and therefore should matter 0:14:49.240 --> 0:14:52.040 the most when you weigh a decision. This is a 0:14:52.040 --> 0:14:54.840 bit easier if we talk about a similar system with 0:14:55.080 --> 0:14:59.120 the version of IBM's Watson that played on Jeopardy. 0:14:59.360 --> 0:15:03.080 That system famously was not connected to the Internet. It 0:15:03.200 --> 0:15:06.400 had to rely on all the information that was stored 0:15:06.480 --> 0:15:11.920 within itself. When the system encountered a clue in Jeopardy, 0:15:11.960 --> 0:15:14.760 it would analyze the clue, and then it would reference 0:15:14.800 --> 0:15:17.920 its database to look for possible answers to whatever that 0:15:18.000 --> 0:15:21.920 clue was. The system would weigh those possible answers and 0:15:21.960 --> 0:15:25.240 attempt to determine which, if any, were the most likely 0:15:25.320 --> 0:15:29.040 to be correct. If the certainty was over a certain threshold, 0:15:29.400 --> 0:15:33.320 such as sure, the system would buzz in with its answer. 0:15:33.680 --> 0:15:37.360 If no response rose above that threshold, the system would 0:15:37.400 --> 0:15:40.080 not buzz in. So you could say that Watson was 0:15:40.120 --> 0:15:43.320 playing the game with a best guess sort of approach. 0:15:43.840 --> 0:15:48.880 Neural networks do essentially that sort of processing. 
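The best-guess-with-a-threshold idea can be sketched in a few lines of Python. This is only an illustration of the decision rule as described, not Watson's actual code. The function name choose_response, the confidence scores, and the 0.8 threshold are all hypothetical.

```python
def choose_response(candidates, threshold=0.8):
    """Return the most confident candidate answer, or None to stay silent.

    candidates maps each possible answer to a confidence between 0 and 1.
    The 0.8 default threshold is an assumed value for illustration.
    """
    if not candidates:
        return None
    answer, confidence = max(candidates.items(), key=lambda item: item[1])
    return answer if confidence >= threshold else None

# Confident enough: buzz in with the top-scoring answer.
print(choose_response({"What is Toronto?": 0.30, "What is Chicago?": 0.91}))
# Nothing clears the threshold: do not buzz in.
print(choose_response({"What is Toronto?": 0.30, "What is Chicago?": 0.55}))
```

Raising the threshold makes the system answer less often but with fewer mistakes, and lowering it does the reverse.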
With this 0:15:48.920 --> 0:15:52.440 particular type of approach, we know what we want the 0:15:52.520 --> 0:15:55.480 outcome to be, so we can judge whether or not 0:15:55.560 --> 0:15:59.760 the system was successful. After each attempt, we can adjust 0:16:00.120 --> 0:16:03.800 the weights on the inputs between nodes to refine the decision 0:16:03.840 --> 0:16:07.760 making process to get more accurate results. If the system 0:16:07.840 --> 0:16:11.080 succeeds in its task, we can increase the weights that 0:16:11.160 --> 0:16:15.240 contributed to the system picking the correct answer and thus 0:16:15.480 --> 0:16:21.800 decrease the inputs that did not contribute to the successful response. 0:16:22.280 --> 0:16:25.880 If the system done messed up and gave the wrong answer, 0:16:26.440 --> 0:16:28.880 then we do the opposite. We look at the inputs 0:16:28.920 --> 0:16:32.880 that contributed to the wrong answer, we diminish their weights, 0:16:33.200 --> 0:16:35.560 and we increase the weights of the other inputs, and 0:16:35.560 --> 0:16:40.120 then we run the test again. A lot. I'll explain 0:16:40.320 --> 0:16:42.760 a bit more about this process when we come back, 0:16:42.800 --> 0:16:54.200 but first let's take a quick break. Early in the 0:16:54.320 --> 0:16:58.680 history of neural networks, computer scientists were hitting some pretty 0:16:58.760 --> 0:17:02.240 hard stops due to the limitations of computing power at 0:17:02.280 --> 0:17:06.040 the time. Early networks were only a couple of layers deep, 0:17:06.119 --> 0:17:08.800 which really meant they weren't terribly powerful, and they could 0:17:08.800 --> 0:17:12.560 only tackle rudimentary tasks like figuring out whether or not 0:17:12.600 --> 0:17:16.679 a square is drawn on a piece of paper. That 0:17:17.200 --> 0:17:23.320 isn't terribly sophisticated. 
In nineteen eighty-six, David Rumelhart, Geoffrey Hinton, and 0:17:23.480 --> 0:17:28.520 Ronald Williams published a lecture titled Learning Representations by Back 0:17:28.640 --> 0:17:34.159 Propagating Errors. This was a big breakthrough with deep learning. 0:17:34.760 --> 0:17:36.960 This all has to do with a deep learning system 0:17:37.000 --> 0:17:40.200 improving its ability to complete a specific task. And basically 0:17:40.240 --> 0:17:43.679 the algorithm's job is to go from the output layer, 0:17:43.920 --> 0:17:46.800 you know, where the system has made a decision, and 0:17:46.840 --> 0:17:50.480 then work backward through the neural network, adjusting the weights 0:17:50.520 --> 0:17:55.960 that led to an incorrect decision. So let's say it's 0:17:56.040 --> 0:17:59.520 a system that is looking to figure out whether or 0:17:59.560 --> 0:18:02.760 not a cat is in a photograph, and it says, 0:18:02.960 --> 0:18:05.320 there's a cat in this picture, and you look at 0:18:05.320 --> 0:18:08.159 the picture and there is no cat there. Then you 0:18:08.160 --> 0:18:12.439 would look at the inputs one level back, just before 0:18:12.480 --> 0:18:15.080 the system said here's a picture of a cat, and 0:18:15.119 --> 0:18:17.520 you'd say, all right, which of these inputs led the 0:18:17.520 --> 0:18:20.760 system to believe this was a picture of a cat? 0:18:21.160 --> 0:18:23.639 And then you would adjust those. Then you would go 0:18:23.840 --> 0:18:27.720 back one layer up, so you're working your way up 0:18:27.920 --> 0:18:31.919 the model, and say, which inputs here led to it 0:18:32.119 --> 0:18:36.240 giving the outputs that led to the mistake, and you 0:18:36.320 --> 0:18:39.640 do this all the way up until you get up 0:18:39.640 --> 0:18:42.800 to the input level at the top of the computer model. 0:18:42.840 --> 0:18:46.000 You are back propagating, and then you run the test 0:18:46.040 --> 0:18:50.760 again to see if you've got improvement. 
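The work-backward-from-the-output idea can be sketched on a toy network with one hidden layer. This is a minimal illustration under stated assumptions, not the algorithm exactly as published: the sigmoid nodes, the squared-error measure, the starting weights, and the names forward and backprop_step are all chosen here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Run the inputs through one hidden layer and a single output node."""
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, row))) for row in w_hidden]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """Adjust weights starting at the output and working back one layer."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output layer: how much did its weighted sum contribute to the error?
    delta_out = (output - target) * output * (1.0 - output)
    # One layer back: share the blame according to each outgoing weight.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    new_w_out = [w_out[j] - lr * delta_out * hidden[j] for j in range(len(hidden))]
    new_w_hidden = [[w_hidden[j][i] - lr * delta_hidden[j] * x[i]
                     for i in range(len(x))] for j in range(len(hidden))]
    return new_w_hidden, new_w_out

# One "no cat" example: the target output is 0, but the network starts out unsure.
x, target = [1.0, 0.0, 1.0], 0.0
w_hidden = [[0.2, -0.4, 0.6], [0.5, 0.1, -0.3]]
w_out = [0.7, -0.2]

errors = []
for _ in range(50):
    _, out = forward(x, w_hidden, w_out)
    errors.append(0.5 * (out - target) ** 2)
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)

print(errors[0], errors[-1])  # the error shrinks as the test is rerun
```

Each pass repeats the cycle described above: make a decision, measure the mistake, adjust the weights from the output backward, and run the test again.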
It's exhaustive, but 0:18:50.840 --> 0:18:56.080 it's also drastically improved neural network performance, much faster than 0:18:56.160 --> 0:18:59.920 just throwing more brute force at it. The algorithm 0:19:00.080 --> 0:19:02.439 essentially is checking to see if a small change in 0:19:02.520 --> 0:19:06.520 each input value received by a layer of nodes would 0:19:06.560 --> 0:19:08.679 have led to a more accurate result. So it's all 0:19:08.720 --> 0:19:11.960 about going from that output, working your way backward. In 0:19:12.040 --> 0:19:15.520 two thousand twelve, Alex Krizhevsky published a paper that gave 0:19:15.600 --> 0:19:19.320 us the next big breakthrough. He argued that a really 0:19:19.520 --> 0:19:23.080 deep neural network with a lot of layers could give 0:19:23.200 --> 0:19:26.359 really great results if you paired it with enough data 0:19:26.440 --> 0:19:29.800 to train the system. So you needed to throw lots 0:19:29.840 --> 0:19:33.680 of data at these models, and it needed to be 0:19:33.760 --> 0:19:37.760 an enormous amount of data. However, once trained, the system 0:19:37.840 --> 0:19:40.880 would produce lower error rates. So yeah, it would take 0:19:40.880 --> 0:19:43.640 a long time, but you would get better results. Now, 0:19:43.680 --> 0:19:46.439 at the time, a good error rate for such a 0:19:46.480 --> 0:19:51.480 system was twenty-five percent. That means one out of four conclusions the 0:19:51.560 --> 0:19:54.480 system would come to would be wrong. If you ran 0:19:54.560 --> 0:19:58.400 it across a long enough number of decisions, you would 0:19:58.400 --> 0:20:02.240 find that one out of every four wasn't right. The 0:20:02.320 --> 0:20:05.959 system that Alex's team worked on produced results that had 0:20:06.000 --> 0:20:09.399 an error rate of sixteen percent, so much lower. 
And 0:20:09.440 --> 0:20:13.879 then in just five years, with more improvements to this process, 0:20:14.280 --> 0:20:18.080 the classification error rate had dropped down to two point 0:20:18.320 --> 0:20:22.800 three percent for deep learning systems. So from twenty-five percent to two 0:20:22.880 --> 0:20:27.560 point three percent, it was really powerful stuff. Okay, so 0:20:27.720 --> 0:20:31.879 you've got your artificial neural network. You've got your layers 0:20:31.960 --> 0:20:35.760 and layers of nodes. You've adjusted the weights of the 0:20:35.800 --> 0:20:39.719 inputs into each node to see if your system can identify, 0:20:40.119 --> 0:20:44.960 you know, pictures of cats, and you start feeding images 0:20:45.040 --> 0:20:48.879 to this system, lots of them. This is the domain 0:20:49.080 --> 0:20:51.360 that you are feeding to your system. The more images 0:20:51.400 --> 0:20:53.520 you can feed to it, the better. And you want 0:20:53.520 --> 0:20:55.840 a wide variety of images of all sorts of stuff, 0:20:56.240 --> 0:20:58.800 not just of different types of cats, but stuff that 0:20:58.920 --> 0:21:03.400 most certainly is not a cat, like dogs, or cars, 0:21:03.520 --> 0:21:06.760 or chartered public accountants. You name it. And you look 0:21:06.840 --> 0:21:10.520 to see which images the system identifies correctly and which 0:21:10.560 --> 0:21:14.040 ones it screws up, both images it says have cats in 0:21:14.080 --> 0:21:17.880 them that actually don't have cats in them, and images 0:21:17.920 --> 0:21:20.760 the system has identified as saying there is no cat here, 0:21:20.960 --> 0:21:23.880 but there is a cat there. This guides you into 0:21:23.920 --> 0:21:27.520 adjusting the weights again and again, and you start over 0:21:27.560 --> 0:21:29.440 and you do it again, and that's your basic deep 0:21:29.520 --> 0:21:33.000 learning system, and it gets better over time as you 0:21:33.080 --> 0:21:36.399 train it. It learns. 
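The two kinds of mistakes described here, a cat reported where there is none and a real cat that gets missed, can be tallied with a short Python sketch. The function names tally_errors and error_rate and the sample predictions are hypothetical and exist only to illustrate the bookkeeping.

```python
def tally_errors(predictions, labels):
    """Count correct calls, false positives, and false negatives.

    predictions[i] is True when the system says image i contains a cat.
    labels[i] is True when image i really does contain a cat.
    """
    counts = {"correct": 0, "false_positive": 0, "false_negative": 0}
    for predicted_cat, is_cat in zip(predictions, labels):
        if predicted_cat == is_cat:
            counts["correct"] += 1
        elif predicted_cat:
            counts["false_positive"] += 1   # said cat, but no cat there
        else:
            counts["false_negative"] += 1   # said no cat, but there is one
    return counts

def error_rate(counts):
    """Fraction of all decisions that were wrong."""
    total = sum(counts.values())
    wrong = counts["false_positive"] + counts["false_negative"]
    return wrong / total if total else 0.0

predictions = [True, True, False, False, True, False, True, False]
labels      = [True, False, False, False, True, True, True, False]
counts = tally_errors(predictions, labels)
print(counts, error_rate(counts))  # 2 wrong out of 8 is a 25 percent error rate
```

An error rate of 0.25 is the one-out-of-four figure in spoken terms, and the two error counts show which direction the weights need to move.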
Now, let's transition over to the 0:21:36.440 --> 0:21:40.439 adversarial systems I mentioned earlier, because they take this and 0:21:40.480 --> 0:21:45.560 twist it a little bit. So you've got two artificial 0:21:45.720 --> 0:21:49.520 neural networks, and they are using this general approach to 0:21:49.720 --> 0:21:53.400 deep learning, and you're setting them up so that they 0:21:53.440 --> 0:21:58.000 feed into each other. One network, the generator, has the 0:21:58.040 --> 0:22:01.919 task to learn how to do something, such as create 0:22:01.960 --> 0:22:05.919 an eighteenth century style portrait based off lots and lots 0:22:06.000 --> 0:22:09.600 of examples of the real thing: the domain, the problem 0:22:09.960 --> 0:22:14.760 domain. The second network, the discriminator, has a different job. 0:22:15.359 --> 0:22:18.800 It has to tell the difference between authentic portraits that 0:22:19.040 --> 0:22:23.960 came from the problem domain and computer generated portraits that 0:22:24.040 --> 0:22:27.919 came from the generator itself. So essentially, the discriminator is 0:22:28.000 --> 0:22:31.199 like the model I mentioned earlier that was identifying pictures 0:22:31.200 --> 0:22:33.320 of cats. It's doing the same sort of thing, except 0:22:33.359 --> 0:22:36.600 instead of saying cat or no cat, it's saying real 0:22:36.760 --> 0:22:40.600 portrait or computer generated portrait. So there are essentially two 0:22:40.600 --> 0:22:44.359 outcomes the discriminator could reach, and that's whether 0:22:44.440 --> 0:22:48.119 an image is computer generated or it isn't. So do you 0:22:48.119 --> 0:22:51.680 see where this is going? You train up both models. 0:22:52.119 --> 0:22:54.879 You have the generator attempt to make its own version 0:22:54.960 --> 0:22:58.400 of something, such as that eighteenth century portrait. 
It does 0:22:58.440 --> 0:23:01.119 so. It designs the portrait based on what the 0:23:01.160 --> 0:23:05.720 model believes are the key elements of a portrait, so 0:23:05.920 --> 0:23:10.679 things like colors, shapes, the ratio of size, like, you know, 0:23:10.720 --> 0:23:13.720 how large should the head be in relation to the body. 0:23:13.760 --> 0:23:17.960 All of these factors and many more come into play. 0:23:18.119 --> 0:23:22.399 The generator creates its own idea of what a portrait 0:23:22.520 --> 0:23:25.159 is supposed to look like, and chances are the early 0:23:25.240 --> 0:23:29.879 rounds of this will not be terribly convincing. The results 0:23:30.040 --> 0:23:33.280 are then fed to the discriminator, which tries to suss 0:23:33.320 --> 0:23:36.359 out which of the images fed to it are computer 0:23:36.480 --> 0:23:40.360 generated and which ones aren't. After that round, both models 0:23:40.600 --> 0:23:45.480 are tweaked. The generator adjusts input weights to get closer 0:23:45.560 --> 0:23:49.159 to the genuine article, and the discriminator adjusts weights to 0:23:49.320 --> 0:23:53.320 reduce false positives or to catch computer generated images. And 0:23:53.359 --> 0:23:57.560 then you go again and again and again and again, 0:23:57.840 --> 0:24:01.479 and they both get better over time. So, assuming everything 0:24:01.560 --> 0:24:04.840 is working properly, over time, the adjustment of input weights 0:24:04.880 --> 0:24:08.320 will lead to more convincing results, and given enough time 0:24:08.520 --> 0:24:11.480 and enough repetition, you'll end up with a computer generated 0:24:11.520 --> 0:24:13.879 painting that you can auction off for nearly half a 0:24:13.960 --> 0:24:18.479 million dollars. Though keep in mind that huge price relates 0:24:18.520 --> 0:24:21.720 back to the novelty of it being an early AI 0:24:21.760 --> 0:24:25.399 generated painting. 
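The back-and-forth between generator and discriminator can be sketched with a deliberately tiny toy. This is an illustration under heavy assumptions, nothing like the system that painted the portrait: the "portraits" are single numbers drawn around 4.0, the generator is a straight line, the discriminator is a single sigmoid node, and the names discriminator_step, generator_step, and train are invented here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_step(d, real, fake, lr):
    """d = [w, c]; D(x) = sigmoid(w*x + c) is the estimated chance x is real."""
    w, c = d
    p_real, p_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    # Gradient of -[log D(real) + log(1 - D(fake))] with respect to w and c.
    grad_w = (p_real - 1.0) * real + p_fake * fake
    grad_c = (p_real - 1.0) + p_fake
    return [w - lr * grad_w, c - lr * grad_c]

def generator_step(g, d, noise, lr):
    """g = [a, b]; G(z) = a*z + b turns random noise into a fake sample."""
    a, b = g
    w, c = d
    p_fake = sigmoid(w * (a * noise + b) + c)
    # Gradient of -log D(G(z)): nudge fakes toward what D currently calls real.
    grad_fake = (p_fake - 1.0) * w
    return [a - lr * grad_fake * noise, b - lr * grad_fake]

def train(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    g, d = [1.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)      # stand-in for the problem domain
        noise = rng.gauss(0.0, 1.0)
        fake = g[0] * noise + g[1]      # the generator's attempt
        d = discriminator_step(d, real, fake, lr)   # discriminator is tweaked
        g = generator_step(g, d, noise, lr)         # then the generator is tweaked
    return g, d

generator, discriminator = train()
print(generator, discriminator)
```

Each loop iteration is one round of the contest described above. Toy setups like this one can oscillate rather than settle, so the sketch shows the structure of the training loop and makes no promise about the final numbers.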
It would be shocking to me if we 0:24:25.480 --> 0:24:29.400 saw that actually become a trend. Also, the painting, while interesting, 0:24:29.880 --> 0:24:32.760 isn't exactly so astounding as to make you think there's 0:24:32.800 --> 0:24:35.399 no way a machine did that. You'd look at them 0:24:35.400 --> 0:24:38.160 and go, yeah, I can imagine a machine did that one. 0:24:38.840 --> 0:24:43.160 A group of computer scientists first described the generative adversarial 0:24:43.200 --> 0:24:46.040 network architecture in a paper in two thousand and fourteen, 0:24:46.640 --> 0:24:49.840 and like other neural networks, these models require a lot 0:24:49.880 --> 0:24:52.480 of data. The more the better. In fact, smaller data 0:24:52.480 --> 0:24:56.159 sets mean the models have to make some pretty big assumptions, 0:24:56.720 --> 0:25:00.440 and you tend to get pretty lousy results. More data, 0:25:00.600 --> 0:25:03.879 as in more examples, teaches the models more about the 0:25:03.920 --> 0:25:07.119 parameters of the domain, whatever it is they are trying 0:25:07.160 --> 0:25:10.560 to generate. It refines the approach. So if you have 0:25:10.600 --> 0:25:13.280 a sophisticated enough pair of models and you have enough 0:25:13.400 --> 0:25:16.280 data to fill up a domain, you can generate some 0:25:16.440 --> 0:25:20.520 convincing material, and that includes video. And this brings us 0:25:20.560 --> 0:25:26.240 around to deep fakes. And in addition to generative adversarial networks, 0:25:26.280 --> 0:25:31.400