WEBVTT - Deep Dreaming with Google 0:00:00.160 --> 0:00:07.080 Brought to you by Toyota. Let's go places. Welcome to 0:00:07.280 --> 0:00:13.680 Forward Thinking. Hey there, and welcome to Forward Thinking, the 0:00:13.880 --> 0:00:17.120 podcast that looks at the future and says, Oh, Dream Weaver, 0:00:17.440 --> 0:00:20.239 I believe you can help me through the night. I'm Jonathan, 0:00:21.040 --> 0:00:24.560 I'm Lauren Vogelbaum, and I'm Joe McCormick. And today I 0:00:24.640 --> 0:00:27.880 would begin by asking both of you if you had 0:00:27.920 --> 0:00:32.360 seen images produced by Google Deep Dream. But I already 0:00:32.360 --> 0:00:36.440 know the answer because we spent part of this afternoon 0:00:36.520 --> 0:00:40.920 looking at pictures of Jonathan in his Renaissance festival costume 0:00:42.000 --> 0:00:44.960 with caterpillars growing out of his arm sockets and so 0:00:45.120 --> 0:00:49.559 many dogs, yeah, like bison puppies in his head. It 0:00:49.640 --> 0:00:52.880 was kind of like... It's kind of like the American 0:00:52.960 --> 0:00:56.840 Kennel Club mixed with Lovecraftian horror, thrown on top 0:00:56.880 --> 0:00:59.800 of a Renaissance festival thrown on top of a giant 0:01:00.080 --> 0:01:05.000 pile of acid. So now you listeners at home 0:01:05.160 --> 0:01:07.520 or wherever you are, whether you're at home or standing 0:01:07.560 --> 0:01:11.520 in line somewhere, or standing on top of a giant 0:01:11.640 --> 0:01:17.040 statue of a naked Greek hero, someone is, I guarantee 0:01:17.040 --> 0:01:19.160 it, probably waiting to hear the end of the sentence. 0:01:20.560 --> 0:01:22.959 You are now in one of two camps. You're either going, 0:01:23.080 --> 0:01:27.440 oh man, Google Deep Dream, this is crazy, or you're saying, 0:01:27.560 --> 0:01:30.440 what are they talking about? If you're in the latter camp, 0:01:30.520 --> 0:01:33.000 the people saying what are they talking about?
Pause this 0:01:33.160 --> 0:01:38.320 right now. Stop what you're doing. Go look up these images. 0:01:39.200 --> 0:01:42.080 And there are numerous images for you to look at, 0:01:42.240 --> 0:01:44.520 and uh, and ultimately what you need to know is 0:01:44.560 --> 0:01:48.160 that the images are all pictures that have been altered 0:01:48.560 --> 0:01:53.440 by essentially artificial intelligence. Yeah, and in case you're like 0:01:53.720 --> 0:01:58.920 driving or otherwise visually indisposed, allow me to paint a 0:01:58.960 --> 0:02:02.680 brief word picture for you. You put in a normal 0:02:02.720 --> 0:02:06.600 old photograph and what comes out of these algorithms 0:02:06.840 --> 0:02:11.480 is recognizably your subject and your background. But the subject 0:02:11.600 --> 0:02:15.680 might have extra faces in places where faces usually are not, 0:02:16.320 --> 0:02:19.840 and the background is maybe dripping with tentacles and eyes, 0:02:20.480 --> 0:02:22.839 and the edges of things are feathered out, as though 0:02:22.919 --> 0:02:26.720 technicolor anemone hats have suddenly become all the rage for animals, 0:02:26.800 --> 0:02:30.600 vegetables and minerals. It looks, you guys, like 0:02:30.600 --> 0:02:34.520 a gritty post Frank Miller reboot of Yellow Submarine. I 0:02:34.600 --> 0:02:37.360 was going to say it looks like everything is taking 0:02:37.400 --> 0:02:40.720 place in the sacred halls of Lord Dagon, and just 0:02:40.800 --> 0:02:46.760 like dripping tentacled creatures. I go with Lovecraft plus impressionist painters. Yeah, 0:02:47.080 --> 0:02:50.680 all of these descriptions are valid descriptions. Yeah. Actually, 0:02:50.720 --> 0:02:53.520 the other night I found someone who had created an 0:02:53.560 --> 0:02:56.760 app that makes use of Google's Deep Dream 0:02:57.000 --> 0:03:00.640 algorithm and allows you to submit your own photos and 0:03:00.720 --> 0:03:03.520 get them deep dreamed.
So I sent a picture of 0:03:03.560 --> 0:03:07.360 my dog, Charles Darwin, a little Charlie sitting on the couch, 0:03:07.400 --> 0:03:10.160 and he's so cute. But in the deep dream photo, 0:03:10.240 --> 0:03:13.480 what's going on with him? Well, the turquoise pillow that 0:03:13.520 --> 0:03:15.960 he was laying on in the original picture has turned 0:03:16.000 --> 0:03:19.320 into a giant caterpillar with lots of strange eyes. I 0:03:19.360 --> 0:03:22.440 see more like a parrot, like a tentacle parrot. Okay, 0:03:22.440 --> 0:03:25.679 it's kind of a tentacle parrot. Yeah. Charlie's face has 0:03:25.760 --> 0:03:30.679 turned into a sort of bifurcated evil, sweet dog Two-Face 0:03:30.919 --> 0:03:33.520 face. And then he's also got a face in 0:03:33.600 --> 0:03:37.480 his butt, which appears to be a very similar face, 0:03:37.600 --> 0:03:42.440 almost exactly. There's another weird dog face in his butt. 0:03:42.560 --> 0:03:45.720 His leg has tentacles and antennae, and like it looks 0:03:45.720 --> 0:03:48.280 like a fish baby on one foot and 0:03:48.320 --> 0:03:51.000 then a bird where his tail would be. His tail 0:03:51.120 --> 0:03:53.600 is a bird. Peeking over the top of the couch, 0:03:53.640 --> 0:03:55.240 I was trying to figure out what this was. In 0:03:55.280 --> 0:03:58.520 the original image, it's just some details against the wall. 0:03:58.560 --> 0:04:01.160 It's like the top of my coffee maker and stuff. 0:04:02.000 --> 0:04:04.880 But that has turned into a creepy teddy bear head 0:04:05.000 --> 0:04:09.000 peeking over the back of the couch cushion. Angry Ewok. 0:04:09.280 --> 0:04:11.960 I think you're essentially making the box 0:04:12.080 --> 0:04:18.560 art for the next Five Nights at Freddy's video game.
Yeah, 0:04:18.680 --> 0:04:21.800 so what is going on with these images that you 0:04:21.880 --> 0:04:25.400 may have seen going around the internet with all these 0:04:25.440 --> 0:04:28.279 animal faces or they're not all animal faces. That's my 0:04:28.400 --> 0:04:32.400 favorite version, but we'll explain how you get these different 0:04:32.440 --> 0:04:34.719 filters coming through. But there are also some that have 0:04:35.240 --> 0:04:39.880 just strange accents on curves and corners, or that have 0:04:40.279 --> 0:04:45.480 geometrical patterns emerging from the figures in the in the picture. Yeah, 0:04:45.520 --> 0:04:47.600 so what is going on? I mean, has Google just 0:04:47.640 --> 0:04:51.919 decided to make a super trippy weird art project? Is 0:04:51.960 --> 0:04:54.800 that the purpose of deep Dream? No, Deep Dream actually 0:04:55.080 --> 0:04:58.720 is an extension of research that's been going on at 0:04:58.720 --> 0:05:02.640 Google about image processing that I think is mainly based 0:05:02.640 --> 0:05:06.080 in the idea of image recognition. Uh. And this is 0:05:06.160 --> 0:05:09.239 done through something we've talked about on the podcast before, 0:05:09.279 --> 0:05:11.800 but we'll go into more detail about today, which is 0:05:11.920 --> 0:05:17.000 artificial neural networks. UH. And in this case, the the 0:05:17.080 --> 0:05:21.240 application you could see applications for this beyond you know, 0:05:21.279 --> 0:05:25.200 making trippy images for for practical purposes, doing things like 0:05:25.680 --> 0:05:28.239 let's say you've got a picture that has some blurry 0:05:28.320 --> 0:05:31.279 elements to it and you've already taken the picture. You 0:05:31.320 --> 0:05:33.800 can't unless you're using uh, you know, like a light 0:05:33.880 --> 0:05:38.479 field capture camera. You can't change the focus after you've 0:05:38.480 --> 0:05:40.640 taken the picture. 
But you might be able to use 0:05:40.640 --> 0:05:44.039 algorithms to recognize elements within a photo and bring 0:05:44.080 --> 0:05:46.640 it into focus after the picture has already been taken, 0:05:46.960 --> 0:05:49.320 assuming the algorithms are good enough to do that reliably 0:05:49.360 --> 0:05:53.880 and not turn it into a nightmarish experience. Yeah. That 0:05:54.000 --> 0:05:56.360 is one of the weird outcomes of this type of 0:05:57.080 --> 0:06:00.640 artificial neural network and image processing: it could 0:06:00.680 --> 0:06:05.320 actually lead to the idea of zoom and enhance. I mean, 0:06:05.360 --> 0:06:08.640 it wouldn't be perfect, but it might be better than 0:06:08.680 --> 0:06:12.039 anything we've ever had in this fake idea of zoom 0:06:12.040 --> 0:06:15.360 and enhance today. Yeah. Yeah, so these beautiful trippy pictures 0:06:15.360 --> 0:06:18.080 are kind of a mid step between what we have today, 0:06:18.240 --> 0:06:21.680 which is not zoom and enhance, and really 0:06:21.760 --> 0:06:25.640 amazing artificial intelligence. Yeah. So let's get into the mechanisms 0:06:25.680 --> 0:06:29.560 behind what's going on to produce these crazy trippy pictures. 0:06:29.640 --> 0:06:32.680 And the main thing to talk about is what is 0:06:32.720 --> 0:06:35.520 going on with artificial neural networks. And I have to 0:06:35.560 --> 0:06:39.360 admit I have had a lot of trouble like actually 0:06:39.600 --> 0:06:43.400 visualizing and understanding artificial neural networks. And I've read about 0:06:43.440 --> 0:06:46.039 them plenty of times before, but they're one of 0:06:46.040 --> 0:06:50.560 those abstract concepts where it's tough to fit it 0:06:50.640 --> 0:06:53.520 to a real world example that makes it make sense 0:06:53.560 --> 0:06:55.839 to people who have a, I don't know, more intuitive 0:06:55.920 --> 0:06:59.040 kind of kinetic grasp on things.
After a while 0:06:59.080 --> 0:07:00.800 reading about them, my brain kind of goes, yeah, I'm 0:07:00.800 --> 0:07:03.360 gonna go get some sushi, and like that's it. It's 0:07:03.440 --> 0:07:07.440 tricky, largely because there is such a difference between 0:07:07.520 --> 0:07:11.400 the way our brains work and the way computer processors work. Right, 0:07:11.440 --> 0:07:14.640 so artificial neural networks are problem solving systems that are 0:07:14.640 --> 0:07:18.080 designed to work like our brains. Actually, they're trying to 0:07:18.160 --> 0:07:21.720 take computer hardware... Well, actually you could create an artificial 0:07:21.760 --> 0:07:24.320 neural network that was hardware based, but I think we're 0:07:24.320 --> 0:07:29.680 talking usually about using software within a traditional computer architecture 0:07:30.080 --> 0:07:33.560 to mimic the cells inside a biological brain. So 0:07:33.560 --> 0:07:38.720 they solve problems by directing data through these layers of 0:07:38.880 --> 0:07:44.520 nodes that form information exchanging connections. So let me walk 0:07:44.560 --> 0:07:48.160 you through, and I'll explain how computer processors at a 0:07:48.240 --> 0:07:51.000 high level work, and then the difference between that and 0:07:51.040 --> 0:07:55.560 an organic brain, and then how this artificial neural network 0:07:55.600 --> 0:07:58.680 is attempting to simulate what's going on with a brain. So, 0:07:58.720 --> 0:08:03.920 your typical computer processor has transistors, right? They have transistors, 0:08:03.960 --> 0:08:09.040 all of them, and transistors are serially linked.
So typically 0:08:09.080 --> 0:08:11.560 you would find a transistor that's linked at most to 0:08:11.720 --> 0:08:15.800 two other transistors, and these are forming logic gates collectively 0:08:16.360 --> 0:08:21.720 which direct the ones and zeros based upon very simple rules, 0:08:21.800 --> 0:08:24.560 and then collectively, when you get lots of them together, 0:08:25.000 --> 0:08:29.600 you can do neat complex stuff. But they're still linking 0:08:29.720 --> 0:08:34.640 just to one or two other transistors. Brains, however, have 0:08:34.840 --> 0:08:37.920 neurons along with a lot of other types of cells. 0:08:37.960 --> 0:08:43.000 But neurons are interconnected with each other in super complex ways. 0:08:43.040 --> 0:08:47.640 They're not serially linked, they're linked in parallel, so a 0:08:47.679 --> 0:08:50.200 single neuron could have connections to as many as ten 0:08:50.280 --> 0:08:54.800 thousand other neurons. And also, while you look at the 0:08:54.880 --> 0:08:58.160 number of transistors that are on a microprocessor, we keep 0:08:58.200 --> 0:09:01.760 on increasing that number by decreasing the size of those 0:09:01.880 --> 0:09:05.480 discrete elements. So you're talking around two billion or so 0:09:06.400 --> 0:09:09.840 on a microprocessor, which is a lot, but our brains 0:09:09.920 --> 0:09:14.640 have somewhere around eighty to one hundred billion neurons. So we 0:09:14.720 --> 0:09:17.640 have way more neurons in our brains with much more 0:09:17.640 --> 0:09:22.280 sophisticated interconnectivity than you would find in a microprocessor. So 0:09:22.280 --> 0:09:25.760 it's no big surprise that our brains work in a 0:09:25.880 --> 0:09:28.680 very different way. Now, one of the cool things about 0:09:28.679 --> 0:09:33.559 our brains is that we can innovate, we can be creative, 0:09:33.640 --> 0:09:36.440 we can learn things.
It takes time for us to 0:09:36.520 --> 0:09:39.679 learn stuff, but once we learn things, we can then 0:09:39.720 --> 0:09:43.160 extrapolate from what we've learned and create new things. And 0:09:43.160 --> 0:09:47.000 this is where we get everything from, hey, maybe this 0:09:47.000 --> 0:09:48.880 would work better if we try it this way, to 0:09:49.800 --> 0:09:52.640 a genius like Mozart. I mean that's sure. Yeah, this 0:09:52.720 --> 0:09:56.400 is the basis of imagination and engineering and invention and 0:09:56.520 --> 0:09:59.319 everything that we kind of, when it really comes down 0:09:59.400 --> 0:10:02.920 to it, talk about as being human. Like what it 0:10:03.080 --> 0:10:06.040 is to be human is these qualities, and there are, you 0:10:06.080 --> 0:10:09.240 know, other elements as well that may play a part 0:10:09.280 --> 0:10:11.440 in this. But our understanding of the brain is still 0:10:11.480 --> 0:10:15.280 so limited we cannot say definitively like how much of 0:10:15.320 --> 0:10:19.280 this is what is required for consciousness, for example. But 0:10:19.320 --> 0:10:21.520 we talked about that in previous episodes, 0:10:21.520 --> 0:10:23.199 so I'm not gonna go over that again. 0:10:23.360 --> 0:10:26.200 So artificial neural networks attempt to capture some of that 0:10:26.280 --> 0:10:29.080 complexity and sophistication found in the brain, usually through a 0:10:29.200 --> 0:10:35.600 software virtualization as opposed to, let's hook up these... Finding 0:10:35.640 --> 0:10:38.400 eighty billion computers just laying around and trying to connect 0:10:38.440 --> 0:10:41.200 them together is probably not your best use of time. So 0:10:41.320 --> 0:10:44.400 you're usually going to be creating this through software. Uh. 0:10:44.400 --> 0:10:48.320 And they have these units, they call them units, that 0:10:48.440 --> 0:10:53.400 are interconnected. Um.
And you want to try and use 0:10:53.480 --> 0:10:58.520 these simulations to teach a computer something, for example, pattern 0:10:58.600 --> 0:11:02.280 recognition or the one that we've talked about before, what 0:11:02.520 --> 0:11:04.680 a cat is, even if you don't tell it that 0:11:04.760 --> 0:11:07.640 this is a cat. If you feed enough pictures of 0:11:07.760 --> 0:11:12.319 cats uh to an artificial neural network, and you use 0:11:12.360 --> 0:11:15.360 a feedback system so that it is able to 0:11:15.400 --> 0:11:18.760 differentiate between cat and things that are not a cat, 0:11:19.360 --> 0:11:22.640 it then understands that a cat is a thing, even 0:11:22.640 --> 0:11:26.640 if it's seeing different pictures of different cats. Um, it starts 0:11:26.679 --> 0:11:29.840 to pick out the common elements to all of these 0:11:30.280 --> 0:11:32.480 data points that are being fed through it. 0:11:32.720 --> 0:11:36.320 Right. Now, the important part is the training process, because 0:11:36.360 --> 0:11:40.199 without that training process and feedback, it never learns. Right, 0:11:40.520 --> 0:11:44.319 it's meaningless to the artificial neural network. So 0:11:44.880 --> 0:11:48.520 in this artificial neural network, each artificial neuron is 0:11:48.559 --> 0:11:51.920 a unit. There are three types. There are input units. This 0:11:51.960 --> 0:11:55.720 is what accepts the incoming information, so that kitty cat picture, 0:11:55.840 --> 0:11:59.160 for example. Then you have on the other side of 0:11:59.200 --> 0:12:02.200 this network, you've got the output units. That's what ends 0:12:02.240 --> 0:12:04.439 up being the information that says, yes, that is a 0:12:04.480 --> 0:12:07.040 picture of a kitty cat, or no, that is most 0:12:07.080 --> 0:12:10.680 certainly not a kitty cat. In between the input 0:12:10.679 --> 0:12:13.440 and output you have the hidden units.
These are the 0:12:13.559 --> 0:12:17.240 layers of neurons that represent the various parts of the 0:12:17.280 --> 0:12:20.679 brain, the interconnections that would happen. Um. And 0:12:20.880 --> 0:12:24.440 essentially all these units are connected to all the other units. Uh, 0:12:24.480 --> 0:12:29.280 and those connections are weighted. By weighted, I mean they 0:12:29.320 --> 0:12:33.280 have a specific relationship from one unit to the next unit. 0:12:33.320 --> 0:12:35.320 And it helps to visualize this as thinking of it 0:12:35.360 --> 0:12:39.320 being from left to right, with left most being input units, 0:12:39.720 --> 0:12:42.280 right most being output units, and everything in between being 0:12:42.320 --> 0:12:47.040 the hidden units. So the connections between each unit as 0:12:47.080 --> 0:12:49.320 you move from left to right are weighted. If it's 0:12:49.320 --> 0:12:52.000 a positive weight it means that the unit on the 0:12:52.080 --> 0:12:55.679 left can excite the unit on the right. All right, 0:12:55.760 --> 0:12:58.800 so input coming into the unit on the left, if 0:12:58.880 --> 0:13:01.640 the connection to the next unit on the right 0:13:01.720 --> 0:13:04.679 is weighted positively, it excites the unit on the right. 0:13:05.240 --> 0:13:09.160 If it's negative, it means it suppresses the next unit 0:13:09.200 --> 0:13:11.520 that it's connected to. And keep in mind that 0:13:12.000 --> 0:13:14.680 each of these hidden units is connected to lots of 0:13:14.679 --> 0:13:17.200 other units. It's not serial, so it's not 0:13:17.280 --> 0:13:19.480 just, you know, a straight line left to right, it's an 0:13:19.480 --> 0:13:24.480 interconnected network of these connections. Um, I'm using "connect" a lot. 0:13:24.520 --> 0:13:28.640 Sorry about that. But anyway, the bigger the weighted number, 0:13:28.679 --> 0:13:31.240 the greater the influence one unit will have on the 0:13:31.280 --> 0:13:35.040 next one.
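A minimal sketch of the weighted connections just described, in Python. The function name and every number here are invented for illustration and do not come from the episode or from Google's code: a positive weight lets a left-hand unit excite the unit on its right, a negative weight suppresses it, and a larger weight means more influence.

```python
def incoming_signal(left_activations, weights):
    """Sum each left-hand unit's activation times the weight of its connection."""
    return sum(a * w for a, w in zip(left_activations, weights))

# Three left-hand units feeding one right-hand unit (all values are made up).
left_activations = [1.0, 1.0, 0.0]   # the third unit is silent
weights = [0.8, -0.5, 2.0]           # excites, suppresses, would excite strongly

# 0.8 (excite) - 0.5 (suppress) + 0.0 (silent unit contributes nothing)
signal = incoming_signal(left_activations, weights)
print(round(signal, 2))
```

The third connection has the biggest weight, but because its unit is not active it adds nothing; influence depends on both the weight and the activity arriving over it.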
And a single unit might have all these 0:13:35.080 --> 0:13:37.680 multiple connections. Some of them are weighted positively, some of 0:13:37.679 --> 0:13:40.400 them are weighted negatively. The whole point of it is 0:13:41.000 --> 0:13:44.240 this represents a single sort of, think of it almost 0:13:44.280 --> 0:13:47.160 like a decision or a perception. So in the case 0:13:47.160 --> 0:13:50.800 of the kitty cat picture, the first wave might be very 0:13:50.880 --> 0:13:54.400 general shapes that would be associated with cats, and then 0:13:54.400 --> 0:13:57.240 the next wave might be more particular details, and the 0:13:57.280 --> 0:14:01.520 next wave more particular details. And uh as units pick 0:14:01.679 --> 0:14:05.040 up on those details and send the message on further 0:14:05.160 --> 0:14:07.839 down the line, it starts to refine it and refine 0:14:07.840 --> 0:14:09.800 it until it finally comes to the decision of yes, 0:14:09.880 --> 0:14:13.480 kitty cat, or no, not a kitty cat. I'm way 0:14:13.559 --> 0:14:15.880 oversimplifying, but yeah. So there's 0:14:15.880 --> 0:14:18.520 a bunch of layers in the middle here where the 0:14:18.600 --> 0:14:22.040 machine is going like, yeah, this is probably a kitty cat. Yeah, yeah, 0:14:22.200 --> 0:14:24.640 probably or probably not, until you get to the very 0:14:24.720 --> 0:14:27.440 end, and generally you have like a threshold, and 0:14:28.000 --> 0:14:30.160 if the data at the end of it meets that 0:14:30.240 --> 0:14:34.040 threshold or exceeds it, then it's one result, and if 0:14:34.080 --> 0:14:36.600 it doesn't, it's a different result. It's a negative result.
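The left-to-right pass and the final threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid squashing function, the two-feature input, the hand-picked weights in `LAYERS`, and the names `feed_forward` and `is_cat` are all hypothetical, not anything taken from the episode or from Google's system.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """weights[j][i] is the connection from input i to unit j of this layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass activations left to right through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

def is_cat(inputs, layers, threshold=0.5):
    """Compare the single output unit against a threshold."""
    return feed_forward(inputs, layers)[0] >= threshold

# Hand-picked weights: one hidden layer of two units, then one output unit.
LAYERS = [
    ([[2.0, -2.0], [-1.0, 3.0]], [0.0, 0.0]),  # input -> hidden
    ([[4.0, -4.0]], [0.0]),                    # hidden -> output
]

print(is_cat([1.0, 0.0], LAYERS))  # first feature present
print(is_cat([0.0, 1.0], LAYERS))  # second feature present
```

With these made-up weights the first input clears the threshold and the second does not. A trained network would have learned its weights from examples instead of having them typed in.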
0:14:37.000 --> 0:14:39.400 So you can almost think of it as the probabilistic 0:14:39.480 --> 0:14:43.040 approach that a system like Watson goes through when it's 0:14:43.040 --> 0:14:45.360 trying to determine if an answer to a Jeopardy question, 0:14:45.600 --> 0:14:48.160 or rather the question to a Jeopardy answer, is the 0:14:48.200 --> 0:14:51.840 appropriate one, where it says, all right, as long as 0:14:51.840 --> 0:14:56.240 it meets this level of being sure this 0:14:56.280 --> 0:14:58.680 is the correct one, we're going with it. It'll push 0:14:58.720 --> 0:15:01.240 the button and, yeah, give an answer in the 0:15:01.240 --> 0:15:03.640 form of a question. And maybe win a copy of the 0:15:03.640 --> 0:15:07.960 home game. Uh so, all 0:15:08.000 --> 0:15:10.600 of this is going on in what is called a feed 0:15:10.640 --> 0:15:14.600 forward network, which is just one type of artificial neural network. 0:15:14.920 --> 0:15:17.120 I'm using the feed forward network because it's one of 0:15:17.160 --> 0:15:20.360 the easiest ones to explain. Uh there are others that 0:15:20.400 --> 0:15:23.840 get way more complicated than this, and it requires an 0:15:23.920 --> 0:15:28.760 understanding of artificial neural networks that goes beyond my surface 0:15:29.120 --> 0:15:32.640 shallow level of understanding. Now, one of the ways that 0:15:32.760 --> 0:15:36.440 artificial neural networks have become most significant is in the 0:15:36.480 --> 0:15:40.880 field of machine learning, where you're not just coming up 0:15:40.920 --> 0:15:44.640 with a logical process for a machine, but you're showing 0:15:44.720 --> 0:15:48.040 a machine how it can refine its own decision making. 0:15:48.280 --> 0:15:50.760 And that comes in with the feedback that I was 0:15:50.760 --> 0:15:53.320 talking about earlier. You have to have feedback. You have 0:15:53.440 --> 0:15:55.440 to tell the machine.
You have to be able to 0:15:55.440 --> 0:15:58.080 communicate to the machine when it has made a success 0:15:58.200 --> 0:16:00.000 versus when it has failed. And you have to be 0:16:00.000 --> 0:16:03.040 able to tweak the machine. And by machine, I'm talking 0:16:03.040 --> 0:16:05.160 about software in this case. You have 0:16:05.200 --> 0:16:08.160 to tweak that design so that you get the outcome 0:16:08.200 --> 0:16:11.760 you want. Now, this is where we get a little meta. 0:16:12.160 --> 0:16:14.280 We as people know if a picture is of a 0:16:14.320 --> 0:16:16.040 kitty cat or not when we look at it and 0:16:16.040 --> 0:16:17.680 we recognize whether or not it's a kitty cat. Well, 0:16:17.680 --> 0:16:21.240 as an adult human who has experienced cats. Yes, yes, 0:16:21.400 --> 0:16:24.560 don't overgeneralize, Jonathan. Sorry, my one year old 0:16:24.600 --> 0:16:29.240 niece knows what a kitty cat horse. All right, 0:16:29.760 --> 0:16:32.400 let's say that we have determined ourselves that this 0:16:32.440 --> 0:16:35.440 photograph we hold in our hands is that of 0:16:35.480 --> 0:16:39.080 a kitty cat. And I'm sorry to... you know 0:16:39.120 --> 0:16:41.840 what, we're gonna title this episode after a 0:16:41.880 --> 0:16:44.880 Hamlet quote. It's going to be called Very Like a Whale, uh, 0:16:44.960 --> 0:16:49.200 because it makes sense in that context. So anyway, you've 0:16:49.200 --> 0:16:51.080 got a picture of a kitty cat. Now you 0:16:51.160 --> 0:16:54.240 feed it to the computer and the computer output comes 0:16:54.280 --> 0:16:56.040 out and it says it's not a kitty cat. And 0:16:56.080 --> 0:16:58.720 you know that's the wrong answer.
So you have to 0:16:58.760 --> 0:17:01.480 look at how to fix the system so that it 0:17:01.520 --> 0:17:04.000 recognizes the picture you're showing it is in fact a 0:17:04.040 --> 0:17:06.399 kitty cat, right? And that might require you to 0:17:06.560 --> 0:17:10.640 dig down back through those layers and pick out 0:17:10.680 --> 0:17:13.120 the one that kind of said like, nah, well, it's 0:17:13.160 --> 0:17:18.080 more triangular, so it's obviously not a cat, right, or whatever. Exactly, 0:17:18.119 --> 0:17:20.239 you have to figure out where in this 0:17:20.440 --> 0:17:24.640 stage of interconnections did that one decision or maybe multiple 0:17:24.680 --> 0:17:27.320 decisions lead to the conclusion that it was not a 0:17:27.400 --> 0:17:30.560 kitty cat. One way of doing this is 0:17:30.560 --> 0:17:33.439 called back propagation, where you start with the output and 0:17:33.480 --> 0:17:35.560 you work your way backwards and you say, all right, 0:17:35.600 --> 0:17:37.600 in order for this to say yes, this is 0:17:37.640 --> 0:17:40.480 a kitty cat, we need to have this result at 0:17:40.480 --> 0:17:42.959 this stage. Do we have that result? Yes? All right, 0:17:43.040 --> 0:17:46.520 let's go one step, you know... actually probably no. No? Well, 0:17:46.560 --> 0:17:49.199 then what is going on the step before it, that 0:17:49.280 --> 0:17:50.959 sort of thing, and you work your way back and 0:17:51.000 --> 0:17:53.520 you start tweaking those weightings I was talking about, the 0:17:53.520 --> 0:17:56.440 connections, and you say, all right, well, maybe this 0:17:56.520 --> 0:17:59.760 connection is actually weighted too much. It's too far in 0:17:59.840 --> 0:18:01.920 the positive. We need to bring that down. Or maybe 0:18:01.920 --> 0:18:03.600 it's in the negative and we need to switch it 0:18:03.640 --> 0:18:06.439 to positive.
So you start making these adjustments in the 0:18:06.480 --> 0:18:10.479 software to those weighted connections in the neural network, and 0:18:10.600 --> 0:18:13.200 that might end up allowing you to pass that same 0:18:13.280 --> 0:18:15.840 kitty cat picture through and now it says, oh, that's 0:18:15.840 --> 0:18:18.960 a kitty cat. Like, yeah. You do this a lot, 0:18:20.200 --> 0:18:24.600 with lots of different examples, and eventually you get to 0:18:24.640 --> 0:18:28.680 a point where you feel confident that it is doing 0:18:28.720 --> 0:18:30.639 what you intended it to do, that it is in 0:18:30.720 --> 0:18:33.760 fact able to recognize the picture of the kitty cat 0:18:34.760 --> 0:18:38.520 at a high enough percentage 0:18:38.840 --> 0:18:41.520 that you say, this can recognize a kitty cat. Then 0:18:41.560 --> 0:18:44.080 you can start feeding it pictures you have never shown 0:18:44.080 --> 0:18:46.359 it before, including pictures of stuff that looks like a 0:18:46.440 --> 0:18:49.399 kitty cat but isn't, and kitty cats that may be slightly 0:18:49.400 --> 0:18:53.160 outside the norm of what it had experienced before, and 0:18:53.640 --> 0:18:56.080 see how it does in that case, and once you 0:18:56.080 --> 0:18:59.720 get to a certain point, the device is able to 0:18:59.760 --> 0:19:03.240 maintain its ability to recognize things without you having to 0:19:03.280 --> 0:19:06.280 go in there and tweak stuff in between the training 0:19:06.280 --> 0:19:09.920 sessions. It has been trained. Yeah. So that's what 0:19:10.040 --> 0:19:13.520 they were working on, was the idea of image recognition.
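The feedback loop described here, working backwards from a wrong answer and nudging the weights, can be sketched as a very small back propagation example in Python. Everything specific is an assumption made for illustration: the `TinyNet` class, its starting weights, the two invented input features, the squared-error measure, and the learning rate. It is a toy, not the training procedure Google used.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """Two inputs -> two hidden units -> one output unit (hypothetical toy)."""

    def __init__(self):
        # Fixed, arbitrary starting weights so every run behaves the same.
        self.w_hidden = [[0.5, -0.3], [0.2, 0.4]]
        self.b_hidden = [0.0, 0.0]
        self.w_out = [0.3, -0.2]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, target, rate=0.5):
        hidden, out = self.forward(x)
        # Start at the output: how wrong was it, and in which direction?
        delta_out = (out - target) * out * (1.0 - out)
        # Work backwards: share the blame with each hidden unit.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        # Tweak the weightings a little in the direction that reduces the error.
        for j, h in enumerate(hidden):
            self.w_out[j] -= rate * delta_out * h
        self.b_out -= rate * delta_out
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= rate * d * xi
            self.b_hidden[j] -= rate * d

def total_error(net, examples):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in examples)

# Made-up training data: label 1.0 ("cat") only when both features are present.
EXAMPLES = [([1.0, 1.0], 1.0), ([1.0, 0.0], 0.0),
            ([0.0, 1.0], 0.0), ([0.0, 0.0], 0.0)]

net = TinyNet()
error_before = total_error(net, EXAMPLES)
for _ in range(2000):          # "you do this a lot"
    for x, t in EXAMPLES:
        net.train_step(x, t)
error_after = total_error(net, EXAMPLES)
print(error_before, error_after)
```

The point of the sketch is the direction of travel: the error measured after the repeated adjustments is smaller than before them. A real system would then be checked on pictures it had never been shown, as the hosts describe.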
0:19:14.000 --> 0:19:16.240 But one of the things that comes out in the 0:19:16.280 --> 0:19:20.040 Google research blog post where they were first describing the 0:19:20.040 --> 0:19:22.520 the idea and the genesis of the Google Deep Dream 0:19:23.080 --> 0:19:25.280 was that the researchers found that and I'm gonna have 0:19:25.320 --> 0:19:28.399 to quote this here, neural networks that were trained to 0:19:28.480 --> 0:19:32.080 discriminate between different kinds of images have quite a bit 0:19:32.080 --> 0:19:36.679 of the information needed to generate images too. So that 0:19:36.880 --> 0:19:39.040 in training, and that was the end of the quote, 0:19:39.080 --> 0:19:42.120 So that in training, what they had done in teaching 0:19:42.200 --> 0:19:45.679 these neural networks how to recognize images was also sort 0:19:45.720 --> 0:19:48.919 of teach them how to make images of things. Right. 0:19:49.040 --> 0:19:51.560 If you say, here are the shapes that you would 0:19:51.600 --> 0:19:56.000 see in Japanese architecture, and these are the these are 0:19:56.000 --> 0:19:58.600 the shapes commonly used in that architecture. So this is 0:19:58.600 --> 0:20:01.879 how you can recognize you're shown an image of a 0:20:02.040 --> 0:20:08.760 building from historic region of Japan. Then it knows it 0:20:08.920 --> 0:20:12.399 being you know, knowing and is being generous here, but 0:20:12.520 --> 0:20:16.760 it recognizes those features. Those are the features that define 0:20:16.800 --> 0:20:20.320 what that thing is. It can now generate those same features. 0:20:21.080 --> 0:20:25.720 And so if it sees quote unquote sees patterns within 0:20:25.720 --> 0:20:30.200 an image that resemble that, it could generate those images, 0:20:30.800 --> 0:20:36.000 kind of kind of tweaking and shaping the the fed 0:20:36.119 --> 0:20:41.080 image and producing something new. Yeah. 
So they had some 0:20:41.160 --> 0:20:45.119 wonderful examples on this Google research blog post where one 0:20:45.200 --> 0:20:49.200 of the things they were doing was just refining images 0:20:49.400 --> 0:20:54.639 based on white noise until they started to show the 0:20:54.680 --> 0:20:58.200 image that was desired. So you would start off with static, 0:20:58.720 --> 0:21:02.280 just pure static in an image, and then tell the 0:21:02.320 --> 0:21:07.119 algorithm to constantly tweak that static to enhance it to 0:21:07.200 --> 0:21:10.960 become more like an image of a banana, and eventually, 0:21:10.920 --> 0:21:15.119 for example, yeah, yeah, and eventually the static evolved into 0:21:15.200 --> 0:21:18.359 a banana or a cluster of bananas, kind of. I 0:21:18.359 --> 0:21:22.160 wouldn't say, like, not like a group of banana, 0:21:22.720 --> 0:21:27.920 but a banana pile, like some sort of weird minion 0:21:28.160 --> 0:21:32.120 slash banana box made out of bananas. And I thought 0:21:32.119 --> 0:21:36.840 this was funny because this is the digital equivalent of apophenia. 0:21:37.600 --> 0:21:41.159 Do you know the process of apophenia? It's in psychology 0:21:41.160 --> 0:21:46.320 where we see significance in random patterns, sort of like 0:21:46.359 --> 0:21:48.760 pareidolia being a very specific version of that, where 0:21:48.760 --> 0:21:53.120 you see faces in shapes, like seeing something 0:21:53.200 --> 0:21:58.720 in a cloud, very like a whale, very much. But yeah, 0:21:58.800 --> 0:22:03.520 so we were essentially here teaching computers how to, well, 0:22:04.000 --> 0:22:07.480 just keep trying at all of this random noise until 0:22:07.680 --> 0:22:12.119 you can find the banana there.
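The "keep tweaking the static until a banana shows up" idea can be sketched in Python as well. This is a heavily simplified stand-in: the eight-pixel "image", the hand-written `DETECTOR_WEIGHTS`, the bias, and the step size are all invented, and a single weighted unit plays the role of a full trained network. The loop leaves the detector alone and changes the picture instead, nudging each pixel in the direction that raises the detector's score.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical "banana detector": one unit with fixed, made-up weights.
DETECTOR_WEIGHTS = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5]
DETECTOR_BIAS = -1.0

def banana_score(pixels):
    total = sum(w * p for w, p in zip(DETECTOR_WEIGHTS, pixels)) + DETECTOR_BIAS
    return sigmoid(total)

def enhance(pixels, rate=0.5):
    """Nudge every pixel a little in the direction that raises the score."""
    s = banana_score(pixels)
    slope = s * (1.0 - s)  # derivative of the sigmoid at the current score
    nudged = [p + rate * slope * w for p, w in zip(pixels, DETECTOR_WEIGHTS)]
    # Keep pixel values in the valid 0..1 brightness range.
    return [min(1.0, max(0.0, p)) for p in nudged]

rng = random.Random(0)                          # fixed seed for repeatability
image = [rng.random() for _ in DETECTOR_WEIGHTS]  # start from pure static
score_before = banana_score(image)

for _ in range(200):
    image = enhance(image)

score_after = banana_score(image)
print(score_before, score_after)
```

After enough steps the pixels the detector likes are pushed toward full brightness and the ones it dislikes toward zero, so the static ends up looking like whatever this detector was built to respond to.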
Yeah, it's almost like 0:22:12.160 --> 0:22:14.720 pointing a sculptor at a block of marble and saying, 0:22:14.800 --> 0:22:19.040 just keep cutting away until the masterpiece, like David, emerges, 0:22:19.119 --> 0:22:22.359 until you find the banana. I'm sure that is what 0:22:22.440 --> 0:22:24.960 someone told Michelangelo as a very young boy. Pretty 0:22:25.000 --> 0:22:28.159 sure that was in one of his famous paintings. 0:22:28.200 --> 0:22:30.280 It's just there's a little thing at the very end, 0:22:30.320 --> 0:22:35.280 like have fun, find a banana. But yeah, obviously the 0:22:35.320 --> 0:22:38.160 actual goal of the research is image recognition. I mean 0:22:38.160 --> 0:22:42.159 we've actually done podcasts about image recognition of various forms before. 0:22:42.320 --> 0:22:45.119 Oh yeah, well that and speech recognition and 0:22:45.200 --> 0:22:48.520 facial recognition, which is very uh kind of creepy and 0:22:48.560 --> 0:22:51.800 important in our daily Internet lives. And if 0:22:51.840 --> 0:22:53.880 you want to, you know, we've talked about this so often. 0:22:53.920 --> 0:22:55.520 If you want to do a really deep dive on 0:22:55.560 --> 0:22:58.920 these topics, you can check out episodes such as Can 0:22:59.000 --> 0:23:03.360 Computers Describe What They See? from November, I Know That 0:23:03.480 --> 0:23:10.280 Face from October, Zoom and Enhance from August, and Speech 0:23:10.280 --> 0:23:14.240 Recognition from April. Man, we talk about computers learning a lot. 0:23:14.480 --> 0:23:17.720 Well, it is the future, and there's still 0:23:17.760 --> 0:23:21.240 a lot of challenges to this field, right? I mean, 0:23:21.440 --> 0:23:25.159 it's not so easy to make a computer see and 0:23:25.560 --> 0:23:28.359 recognize what it sees the way a human or an 0:23:28.359 --> 0:23:32.320