WEBVTT - Deep Dreaming with Google

0:00:00.160 --> 0:00:07.080
<v Speaker 1>Brought to you by Toyota. Let's go places. Welcome to

0:00:07.280 --> 0:00:13.680
<v Speaker 1>Forward Thinking. Hey there, and welcome to Forward Thinking, the

0:00:13.880 --> 0:00:17.120
<v Speaker 1>podcast that looks at the future and says, Oh, dream Weaver,

0:00:17.440 --> 0:00:20.239
<v Speaker 1>I believe you can help me through the night. I'm Jonathan,

0:00:21.040 --> 0:00:24.560
<v Speaker 1>I'm Lauren Vogelbaum, and I'm Joe McCormick. And today I

0:00:24.640 --> 0:00:27.880
<v Speaker 1>would begin by asking both of you if you had

0:00:27.920 --> 0:00:32.360
<v Speaker 1>seen images produced by Google Deep Dream. But I already

0:00:32.360 --> 0:00:36.440
<v Speaker 1>know the answer because we spent part of this afternoon

0:00:36.520 --> 0:00:40.920
<v Speaker 1>looking at pictures of Jonathan in his Renaissance festival costume

0:00:42.000 --> 0:00:44.960
<v Speaker 1>with caterpillars growing out of his arm sockets and so

0:00:45.120 --> 0:00:49.559
<v Speaker 1>many dogs, yeah, like bison puppies in his head. It

0:00:49.640 --> 0:00:52.880
<v Speaker 1>was kind of like It's kind of like the American

0:00:52.960 --> 0:00:56.840
<v Speaker 1>Kennel Club mixed with Lovecraftian horror, thrown on top

0:00:56.880 --> 0:00:59.800
<v Speaker 1>of a Renaissance festival thrown on top of a giant

0:01:00.080 --> 0:01:05.000
<v Speaker 1>pile of acid. So now you you listeners at home

0:01:05.160 --> 0:01:07.520
<v Speaker 1>or wherever you are, whether you're at home or standing

0:01:07.560 --> 0:01:11.520
<v Speaker 1>in line somewhere, or standing on top of a giant

0:01:11.640 --> 0:01:17.040
<v Speaker 1>statue of a naked Greek hero, someone is I guarantee

0:01:17.040 --> 0:01:19.160
<v Speaker 1>it, probably waiting to hear the end of this sentence.

0:01:20.560 --> 0:01:22.959
<v Speaker 1>You are now in one of two camps. You're either going,

0:01:23.080 --> 0:01:27.440
<v Speaker 1>oh man, Google Deep Dream, this is crazy, or you're saying,

0:01:27.560 --> 0:01:30.440
<v Speaker 1>what are they talking about? If you're in the latter camp,

0:01:30.520 --> 0:01:33.000
<v Speaker 1>the people saying what are they talking about? Pause this

0:01:33.160 --> 0:01:38.320
<v Speaker 1>right now, Stop what you're doing. Go look up these images.

0:01:39.200 --> 0:01:42.080
<v Speaker 1>And there are numerous images for you to look at,

0:01:42.240 --> 0:01:44.520
<v Speaker 1>and uh, and ultimately what you need to know is

0:01:44.560 --> 0:01:48.160
<v Speaker 1>that the images are all pictures that have been altered

0:01:48.560 --> 0:01:53.440
<v Speaker 1>by essentially artificial intelligence. Yeah, and in case you're like

0:01:53.720 --> 0:01:58.920
<v Speaker 1>driving or otherwise visually indisposed, allow me to paint a

0:01:58.960 --> 0:02:02.680
<v Speaker 1>brief word picture for you. You put in a normal

0:02:02.720 --> 0:02:06.600
<v Speaker 1>old photograph and and what comes out of these algorithms

0:02:06.840 --> 0:02:11.480
<v Speaker 1>is recognizably your subject and your background. But the subject

0:02:11.600 --> 0:02:15.680
<v Speaker 1>might have extra faces in places where faces usually are not,

0:02:16.320 --> 0:02:19.840
<v Speaker 1>and and the background is maybe dripping with tentacles and eyes,

0:02:20.480 --> 0:02:22.839
<v Speaker 1>and the edges of things are feathered out, as though

0:02:22.919 --> 0:02:26.720
<v Speaker 1>technicolor anemone hats have suddenly become all the rage for animals,

0:02:26.800 --> 0:02:30.600
<v Speaker 1>vegetables and minerals. It looks, you guys, like, like, like

0:02:30.600 --> 0:02:34.520
<v Speaker 1>a gritty post Frank Miller reboot of Yellow Submarine. I

0:02:34.600 --> 0:02:37.360
<v Speaker 1>was going to say it looks like everything is taking

0:02:37.400 --> 0:02:40.720
<v Speaker 1>place in the sacred halls of Lord Dagon, and just

0:02:40.800 --> 0:02:46.760
<v Speaker 1>like dripping tentacled creatures. I go with Lovecraft plus Impressionist painters. Yeah,

0:02:47.080 --> 0:02:50.680
<v Speaker 1>all of these descriptions are valid descriptions. Yeah. I Actually

0:02:50.720 --> 0:02:53.520
<v Speaker 1>the other night I found someone who had created an

0:02:53.560 --> 0:02:56.760
<v Speaker 1>app that makes use of the, of Google's Deep Dream

0:02:57.000 --> 0:03:00.640
<v Speaker 1>algorithm and allows you to submit your own photos and

0:03:00.720 --> 0:03:03.520
<v Speaker 1>get them deep dreamed. So I sent a picture of

0:03:03.560 --> 0:03:07.360
<v Speaker 1>my dog, Charles Darwin, a little Charlie sitting on the couch,

0:03:07.400 --> 0:03:10.160
<v Speaker 1>and he's so cute. But in the Deep Dream photo,

0:03:10.240 --> 0:03:13.480
<v Speaker 1>what's going on with him? Well, the turquoise pillow that

0:03:13.520 --> 0:03:15.960
<v Speaker 1>he was laying on in the original picture has turned

0:03:16.000 --> 0:03:19.320
<v Speaker 1>into a giant caterpillar with lots of strange eyes. I

0:03:19.360 --> 0:03:22.440
<v Speaker 1>see more like a parrot, like a tentacle parrot. Okay,

0:03:22.440 --> 0:03:25.679
<v Speaker 1>it's kind of a tentacle parrot. Yeah. Charlie's face has

0:03:25.760 --> 0:03:30.679
<v Speaker 1>turned into a sort of bifurcated evil-sweet dog Two-Face

0:03:30.919 --> 0:03:33.520
<v Speaker 1>face. And then he's also got a face in

0:03:33.600 --> 0:03:37.480
<v Speaker 1>his butt which which appears to be a very similar face,

0:03:37.600 --> 0:03:42.440
<v Speaker 1>almost exactly. There's another weird dog face in his butt.

0:03:42.560 --> 0:03:45.720
<v Speaker 1>His leg has tentacles and antennae, and like it looks

0:03:45.720 --> 0:03:48.280
<v Speaker 1>like a fish baby on one foot and then and

0:03:48.320 --> 0:03:51.000
<v Speaker 1>then a bird where his tail would be. His tail

0:03:51.120 --> 0:03:53.600
<v Speaker 1>is a bird peeking over the top of the couch.

0:03:53.640 --> 0:03:55.240
<v Speaker 1>I was trying to figure out what this was. In

0:03:55.280 --> 0:03:58.520
<v Speaker 1>the original image. It's just some details against the wall.

0:03:58.560 --> 0:04:01.160
<v Speaker 1>It's like the top of my coffee maker and stuff.

0:04:02.000 --> 0:04:04.880
<v Speaker 1>But that has turned into a creepy teddy bear head

0:04:05.000 --> 0:04:09.000
<v Speaker 1>peeking over the back of the couch cushion. Angry Ewok.

0:04:09.280 --> 0:04:11.960
<v Speaker 1>I think it's I think you're essentially making the box

0:04:12.080 --> 0:04:18.560
<v Speaker 1>art for the next Five Nights at Freddy's video game. Yeah,

0:04:18.680 --> 0:04:21.800
<v Speaker 1>so what is going on with these images that you

0:04:21.880 --> 0:04:25.400
<v Speaker 1>may have seen going around the internet with all these

0:04:25.440 --> 0:04:28.279
<v Speaker 1>animal faces or they're not all animal faces. That's my

0:04:28.400 --> 0:04:32.400
<v Speaker 1>favorite version, but we'll explain how you get these different

0:04:32.440 --> 0:04:34.719
<v Speaker 1>filters coming through. But there are also some that have

0:04:35.240 --> 0:04:39.880
<v Speaker 1>just strange accents on curves and corners, or that have

0:04:40.279 --> 0:04:45.480
<v Speaker 1>geometrical patterns emerging from the figures in the in the picture. Yeah,

0:04:45.520 --> 0:04:47.600
<v Speaker 1>so what is going on? I mean, has Google just

0:04:47.640 --> 0:04:51.919
<v Speaker 1>decided to make a super trippy weird art project? Is

0:04:51.960 --> 0:04:54.800
<v Speaker 1>that the purpose of Deep Dream? No, Deep Dream actually

0:04:55.080 --> 0:04:58.720
<v Speaker 1>is an extension of research that's been going on at

0:04:58.720 --> 0:05:02.640
<v Speaker 1>Google about image processing that I think is mainly based

0:05:02.640 --> 0:05:06.080
<v Speaker 1>in the idea of image recognition. Uh. And this is

0:05:06.160 --> 0:05:09.239
<v Speaker 1>done through something we've talked about on the podcast before,

0:05:09.279 --> 0:05:11.800
<v Speaker 1>but we'll go into more detail about today, which is

0:05:11.920 --> 0:05:17.000
<v Speaker 1>artificial neural networks. Uh. And in this case, the, the

0:05:17.080 --> 0:05:21.240
<v Speaker 1>application you could see applications for this beyond you know,

0:05:21.279 --> 0:05:25.200
<v Speaker 1>making trippy images for for practical purposes, doing things like

0:05:25.680 --> 0:05:28.239
<v Speaker 1>let's say you've got a picture that has some blurry

0:05:28.320 --> 0:05:31.279
<v Speaker 1>elements to it and you've already taken the picture. You

0:05:31.320 --> 0:05:33.800
<v Speaker 1>can't unless you're using uh, you know, like a light

0:05:33.880 --> 0:05:38.479
<v Speaker 1>field capture camera. You can't change the focus after you've

0:05:38.480 --> 0:05:40.640
<v Speaker 1>taken the picture. But you might be able to use

0:05:40.640 --> 0:05:44.039
<v Speaker 1>algorithms to to recognize elements within a photo and bring

0:05:44.080 --> 0:05:46.640
<v Speaker 1>it into focus after the picture has already been taken,

0:05:46.960 --> 0:05:49.320
<v Speaker 1>assuming the algorithms are good enough to do that reliably

0:05:49.360 --> 0:05:53.880
<v Speaker 1>and not turn it into a nightmarish experience. Yeah. That

0:05:54.000 --> 0:05:56.360
<v Speaker 1>is one of the weird outcomes of this type of

0:05:57.080 --> 0:06:00.640
<v Speaker 1>artificial neural network and image processing is that it could

0:06:00.680 --> 0:06:05.320
<v Speaker 1>actually lead to the idea of zoom and enhance. I mean,

0:06:05.360 --> 0:06:08.640
<v Speaker 1>it wouldn't be perfect, but it might be better than

0:06:08.680 --> 0:06:12.039
<v Speaker 1>anything we've ever had in this fake idea of zoom

0:06:12.040 --> 0:06:15.360
<v Speaker 1>and enhance today. Yeah. Yeah, so these beautiful trippy pictures

0:06:15.360 --> 0:06:18.080
<v Speaker 1>are kind of a mid step between what we have today,

0:06:18.240 --> 0:06:21.680
<v Speaker 1>which is not zoom and enhance, and, and really

0:06:21.760 --> 0:06:25.640
<v Speaker 1>amazing artificial intelligence. Yeah. So let's get into the mechanisms

0:06:25.680 --> 0:06:29.560
<v Speaker 1>behind what's going on to produce these crazy trippy pictures.

0:06:29.640 --> 0:06:32.680
<v Speaker 1>And the main thing to talk about is what is

0:06:32.720 --> 0:06:35.520
<v Speaker 1>going on with artificial neural networks. And I have to

0:06:35.560 --> 0:06:39.360
<v Speaker 1>admit I have had a lot of trouble like actually

0:06:39.600 --> 0:06:43.400
<v Speaker 1>visualizing and understanding artificial neural networks. And I've read about

0:06:43.440 --> 0:06:46.039
<v Speaker 1>them plenty of times before, but they're they're one of

0:06:46.040 --> 0:06:50.560
<v Speaker 1>those abstract concepts where it's it's tough to fit it

0:06:50.640 --> 0:06:53.520
<v Speaker 1>to a real world example that makes it make sense

0:06:53.560 --> 0:06:55.839
<v Speaker 1>to people who have a, I don't know, more intuitive

0:06:55.920 --> 0:06:59.040
<v Speaker 1>kind of kinetic grasp on things. After after a while

0:06:59.080 --> 0:07:00.800
<v Speaker 1>reading about them, my mind kind of goes, yeah, I'm

0:07:00.800 --> 0:07:03.360
<v Speaker 1>gonna go get some sushi, and like that's it. It's

0:07:03.440 --> 0:07:07.440
<v Speaker 1>it's tricky, largely because there is such a difference between

0:07:07.520 --> 0:07:11.400
<v Speaker 1>the way our brains work and the way computer processors work. Right,

0:07:11.440 --> 0:07:14.640
<v Speaker 1>So artificial neural networks are problem-solving systems that are

0:07:14.640 --> 0:07:18.080
<v Speaker 1>designed to work like our brains. Actually, they're trying to

0:07:18.160 --> 0:07:21.720
<v Speaker 1>take computer hardware. Well, actually you could create an artificial

0:07:21.760 --> 0:07:24.320
<v Speaker 1>neural network that was hardware based, but I think we're

0:07:24.320 --> 0:07:29.680
<v Speaker 1>talking usually about using software within a traditional computer architecture

0:07:30.080 --> 0:07:33.560
<v Speaker 1>to mimic the cells inside a biological brain. So

0:07:33.560 --> 0:07:38.720
<v Speaker 1>they solve problems by directing data through these layers of

0:07:38.880 --> 0:07:44.520
<v Speaker 1>nodes that form information exchanging connections. So let me walk

0:07:44.560 --> 0:07:48.160
<v Speaker 1>you through, and I'll explain how computer processors at a

0:07:48.240 --> 0:07:51.000
<v Speaker 1>high level work, and then the difference between that and

0:07:51.040 --> 0:07:55.560
<v Speaker 1>an organic brain, and then how this artificial neural network

0:07:55.600 --> 0:07:58.680
<v Speaker 1>is attempting to simulate what's going on with a brain. So,

0:07:58.720 --> 0:08:03.920
<v Speaker 1>your typical computer processor has transistors, right? They have transistors,

0:08:03.960 --> 0:08:09.040
<v Speaker 1>all of them, and transistors are serially linked. So typically

0:08:09.080 --> 0:08:11.560
<v Speaker 1>you would find a transistor that's linked at most to

0:08:11.720 --> 0:08:15.800
<v Speaker 1>two other transistors, and these are forming logic gates collectively

0:08:16.360 --> 0:08:21.720
<v Speaker 1>which direct the ones and zeros based upon very simple rules,

0:08:21.800 --> 0:08:24.560
<v Speaker 1>and then collectively, when you get lots of them together,

0:08:25.000 --> 0:08:29.600
<v Speaker 1>you can do neat complex stuff. But they're still linking

0:08:29.720 --> 0:08:34.640
<v Speaker 1>just to one or two other transistors. Brains however, have

0:08:34.840 --> 0:08:37.920
<v Speaker 1>neurons along with a lot of other types of cells.

0:08:37.960 --> 0:08:43.000
<v Speaker 1>But neurons are interconnected with each other in super complex ways.

0:08:43.040 --> 0:08:47.640
<v Speaker 1>They're not serially linked, they're linked in parallel, so a

0:08:47.679 --> 0:08:50.200
<v Speaker 1>single neuron could have connections to as many as ten

0:08:50.280 --> 0:08:54.800
<v Speaker 1>thousand other neurons. And also, while you look at the

0:08:54.880 --> 0:08:58.160
<v Speaker 1>number of transistors that are on a microprocessor, we keep

0:08:58.200 --> 0:09:01.760
<v Speaker 1>on increasing that number by decreasing the size of those

0:09:01.880 --> 0:09:05.480
<v Speaker 1>discrete elements. So you're talking around two billion or so

0:09:06.400 --> 0:09:09.840
<v Speaker 1>on a microprocessor, which that's a lot, but our brains

0:09:09.920 --> 0:09:14.640
<v Speaker 1>have somewhere around eighty to a hundred billion neurons. So we

0:09:14.720 --> 0:09:17.640
<v Speaker 1>have way more neurons in our brains with much more

0:09:17.640 --> 0:09:22.280
<v Speaker 1>sophisticated interconnectivity than you would find in a microprocessor. So

0:09:22.280 --> 0:09:25.760
<v Speaker 1>it's no big surprise that our brains work in a

0:09:25.880 --> 0:09:28.680
<v Speaker 1>very different way. Now, one of the cool things about

0:09:28.679 --> 0:09:33.559
<v Speaker 1>our brains is that we can innovate, we can be creative,

0:09:33.640 --> 0:09:36.440
<v Speaker 1>we can learn things. It takes time for us to

0:09:36.520 --> 0:09:39.679
<v Speaker 1>learn stuff, but once we learn things, we can then

0:09:39.720 --> 0:09:43.160
<v Speaker 1>extrapolate from what we've learned and create new things. And

0:09:43.160 --> 0:09:47.000
<v Speaker 1>this is where we get everything from. Hey, maybe this

0:09:47.000 --> 0:09:48.880
<v Speaker 1>would work better if we try it this way," to

0:09:49.800 --> 0:09:52.640
<v Speaker 1>a genius like Mozart. I mean that's sure. Yeah, this

0:09:52.720 --> 0:09:56.400
<v Speaker 1>is the basis of imagination and engineering and invention and

0:09:56.520 --> 0:09:59.319
<v Speaker 1>everything that we kind of when it really comes down

0:09:59.400 --> 0:10:02.920
<v Speaker 1>to it, talk about as being human, Like what it

0:10:03.080 --> 0:10:06.040
<v Speaker 1>is to be human is these qualities, and there are, you

0:10:06.080 --> 0:10:09.240
<v Speaker 1>know other elements as well that may play a part

0:10:09.280 --> 0:10:11.440
<v Speaker 1>in this. But our understanding of the brain is still

0:10:11.480 --> 0:10:15.280
<v Speaker 1>so limited we cannot say definitively like how much of

0:10:15.320 --> 0:10:19.280
<v Speaker 1>this is what is required for consciousness for example, But

0:10:19.320 --> 0:10:21.520
<v Speaker 1>that that we've we talked about that in previous episodes,

0:10:21.520 --> 0:10:23.199
<v Speaker 1>so I'm not gonna I'm not gonna go over that again.

0:10:23.360 --> 0:10:26.200
<v Speaker 1>So artificial neural networks attempt to capture some of that

0:10:26.280 --> 0:10:29.080
<v Speaker 1>complexity and sophistication found in the brain, usually through a

0:10:29.200 --> 0:10:35.600
<v Speaker 1>software virtualization as opposed to let's hook up these finding

0:10:35.640 --> 0:10:38.400
<v Speaker 1>eighty billion computers just laying around and trying to connect

0:10:38.440 --> 0:10:41.200
<v Speaker 1>them together probably not your best use of time. So

0:10:41.320 --> 0:10:44.400
<v Speaker 1>you're usually going to be creating this through software. Uh.

0:10:44.400 --> 0:10:48.320
<v Speaker 1>And they have these units, they call them units that

0:10:48.440 --> 0:10:53.400
<v Speaker 1>are interconnected um. And you want to try and use

0:10:53.480 --> 0:10:58.520
<v Speaker 1>these simulations to teach a computer something, for example, pattern

0:10:58.600 --> 0:11:02.280
<v Speaker 1>recognition or the one that we've talked about before what

0:11:02.520 --> 0:11:04.680
<v Speaker 1>a cat is, even if you don't tell it that

0:11:04.760 --> 0:11:07.640
<v Speaker 1>this is a cat. If you feed enough pictures of

0:11:07.760 --> 0:11:12.319
<v Speaker 1>cats uh to an artificial neural network, and you use

0:11:12.360 --> 0:11:15.360
<v Speaker 1>a feedback system so that it is able to different

0:11:15.400 --> 0:11:18.760
<v Speaker 1>differentiate between cat and things that are not a cat,

0:11:19.360 --> 0:11:22.640
<v Speaker 1>it then understands that a cat is a thing, even

0:11:22.640 --> 0:11:26.640
<v Speaker 1>if it's seeing different pictures of different cats. Um, it starts

0:11:26.679 --> 0:11:29.840
<v Speaker 1>to pick out the common elements to all of these

0:11:30.280 --> 0:11:32.480
<v Speaker 1>all of these data points that are being fed through it.

0:11:32.720 --> 0:11:36.320
<v Speaker 1>Right now, the important part is the training process, because

0:11:36.360 --> 0:11:40.199
<v Speaker 1>without that training process and feedback, it never learns, right

0:11:40.520 --> 0:11:44.319
<v Speaker 1>you would, it's meaningless to the artificial neural network. So

0:11:44.880 --> 0:11:48.520
<v Speaker 1>in in this artificial neural network, each artificial neuron is

0:11:48.559 --> 0:11:51.920
<v Speaker 1>a unit. There are three types. There are input units. This

0:11:51.960 --> 0:11:55.720
<v Speaker 1>is what accepts the incoming information, so that kitty cat picture,

0:11:55.840 --> 0:11:59.160
<v Speaker 1>for example. Then you have on the other side of

0:11:59.200 --> 0:12:02.200
<v Speaker 1>this network you've got the output units. That's what ends

0:12:02.240 --> 0:12:04.439
<v Speaker 1>up being the information that says, yes, that is a

0:12:04.480 --> 0:12:07.040
<v Speaker 1>picture of a kitty cat, or no, that is most

0:12:07.080 --> 0:12:10.680
<v Speaker 1>certainly not a kitty cat. In between the, the input

0:12:10.679 --> 0:12:13.440
<v Speaker 1>and output you have the hidden units. These are the

0:12:13.559 --> 0:12:17.240
<v Speaker 1>layers of neurons that represent the various parts of the

0:12:17.280 --> 0:12:20.679
<v Speaker 1>brain, the interconnections that would happen. Um. And

0:12:20.880 --> 0:12:24.440
<v Speaker 1>essentially all these units are connected to all the other units. Uh,

0:12:24.480 --> 0:12:29.280
<v Speaker 1>And those connections are weighted. By weighted, I mean they

0:12:29.320 --> 0:12:33.280
<v Speaker 1>have a specific relationship from one unit to the next unit.

0:12:33.320 --> 0:12:35.320
<v Speaker 1>And it helps to visualize this as thinking of it

0:12:35.360 --> 0:12:39.320
<v Speaker 1>being from left to right, with left most being input units,

0:12:39.720 --> 0:12:42.280
<v Speaker 1>right most being output units, and everything in between being

0:12:42.320 --> 0:12:47.040
<v Speaker 1>the hidden units. So the connections between each unit as

0:12:47.080 --> 0:12:49.320
<v Speaker 1>you move from left to right are weighted. If it's

0:12:49.320 --> 0:12:52.000
<v Speaker 1>a positive weight it means that the unit on the

0:12:52.080 --> 0:12:55.679
<v Speaker 1>left can excite the unit on the right. All right,

0:12:55.760 --> 0:12:58.800
<v Speaker 1>So input coming into the unit on the left, it

0:12:58.880 --> 0:13:01.640
<v Speaker 1>excites... the connection to the next unit on the right

0:13:01.720 --> 0:13:04.679
<v Speaker 1>is weighted positively, so it excites the unit on the right.

0:13:05.240 --> 0:13:09.160
<v Speaker 1>If it's negative, it means it suppresses the next unit

0:13:09.200 --> 0:13:11.520
<v Speaker 1>that it's connected to that. And keep in mind that

0:13:12.000 --> 0:13:14.680
<v Speaker 1>each of these hidden units is connected to lots of

0:13:14.679 --> 0:13:17.200
<v Speaker 1>other units. It's not it's not serial, so it's not

0:13:17.280 --> 0:13:19.480
<v Speaker 1>just you know, a straight line left right, it's an

0:13:19.480 --> 0:13:24.480
<v Speaker 1>interconnected network of these connections. Um, I'm using "connect" a lot.

0:13:24.520 --> 0:13:28.640
<v Speaker 1>Sorry about that. But anyway, the bigger the weighted number,

0:13:28.679 --> 0:13:31.240
<v Speaker 1>the greater the influence one unit will have on the

0:13:31.280 --> 0:13:35.040
<v Speaker 1>next one. And a single unit might have all these

0:13:35.080 --> 0:13:37.680
<v Speaker 1>multiple connections. Some of them are weighted positively, some of

0:13:37.679 --> 0:13:40.400
<v Speaker 1>them are weighted negatively. The whole point of it is

0:13:41.000 --> 0:13:44.240
<v Speaker 1>this represents a single sort of think of it almost

0:13:44.280 --> 0:13:47.160
<v Speaker 1>like a decision or a perception. So in the case

0:13:47.160 --> 0:13:50.800
<v Speaker 1>of the kitty cat picture, the first wave might be very

0:13:50.880 --> 0:13:54.400
<v Speaker 1>general shapes that would be associated with cats, and then

0:13:54.400 --> 0:13:57.240
<v Speaker 1>the next wave might be more particular details, and the

0:13:57.280 --> 0:14:01.520
<v Speaker 1>next wave more particular details. And uh as units pick

0:14:01.679 --> 0:14:05.040
<v Speaker 1>up on those details and send the message on further

0:14:05.160 --> 0:14:07.839
<v Speaker 1>down the line, it starts to refine it and refine

0:14:07.840 --> 0:14:09.800
<v Speaker 1>it until it finally comes to the decision of yes,

0:14:09.880 --> 0:14:13.480
<v Speaker 1>kitty cat or no, not a kitty cat. I'm way

0:14:13.559 --> 0:14:15.880
<v Speaker 1>oversimplifying, but, but, yeah. Yeah. So, so there's, there's

0:14:15.880 --> 0:14:18.520
<v Speaker 1>a bunch of layers in the middle here where the

0:14:18.600 --> 0:14:22.040
<v Speaker 1>machine is going like, yeah, this is probably a kitty cat. Yeah, yeah,

0:14:22.200 --> 0:14:24.640
<v Speaker 1>probably or probably not until you get to the very

0:14:24.720 --> 0:14:27.440
<v Speaker 1>end and and generally you have like a threshold and

0:14:28.000 --> 0:14:30.160
<v Speaker 1>if the data at the end of it meets that

0:14:30.240 --> 0:14:34.040
<v Speaker 1>threshold or exceeds it. Then it's one result, and if

0:14:34.080 --> 0:14:36.600
<v Speaker 1>it doesn't, it's a different result. It's a negative result.
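
NOTE
A minimal sketch of the left-to-right, weighted, thresholded network described
in the cues above. This is not Google's actual network. The feature names, the
weights and biases, and the 0.5 threshold are all invented assumptions for
illustration; a real image network learns its weights from pixels.
```python
import math
# Hypothetical features for one picture: [pointy_ears, whiskers, wheels].
# Every number below is an invented assumption, not a trained value.
HIDDEN_WEIGHTS = [[2.0, 2.0, -3.0], [1.0, 3.0, -2.0]]  # one row per hidden unit
HIDDEN_BIASES = [-1.0, -1.0]
OUTPUT_WEIGHTS = [[3.0, 3.0]]  # one output unit fed by both hidden units
OUTPUT_BIASES = [-3.0]
def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))
def layer(inputs, weights, biases):
    """Each unit adds up its weighted inputs plus a bias, then squashes the total.
    Positive weights excite the unit; negative weights suppress it."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]
def is_cat(features, threshold=0.5):
    """Feed the input left to right through the layers, then apply the threshold."""
    hidden = layer(features, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    score = layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]
    return score >= threshold
```
With these assumed weights, is_cat([1, 1, 0]) is True and is_cat([0, 0, 1]) is
False: the "wheels" input suppresses both hidden units through negative weights.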

0:14:37.000 --> 0:14:39.400
<v Speaker 1>So you can almost think of it as the probabilistic

0:14:39.480 --> 0:14:43.040
<v Speaker 1>approach that a system like Watson goes through when it's

0:14:43.040 --> 0:14:45.360
<v Speaker 1>trying to determine if an answer to a Jeopardy question,

0:14:45.600 --> 0:14:48.160
<v Speaker 1>or rather the question to a Jeopardy answer, is the

0:14:48.200 --> 0:14:51.840
<v Speaker 1>appropriate one, where it says, all right, as long as

0:14:51.840 --> 0:14:56.240
<v Speaker 1>it meets this level of of of being sure this

0:14:56.280 --> 0:14:58.680
<v Speaker 1>is the correct one, we're going with it. It'll push

0:14:58.720 --> 0:15:01.240
<v Speaker 1>the button and, yeah, give an answer in the

0:15:01.240 --> 0:15:03.640
<v Speaker 1>form of a question. And maybe win a copy of the

0:15:03.640 --> 0:15:07.960
<v Speaker 1>home game. Uh so, all of these, all of all

0:15:08.000 --> 0:15:10.600
<v Speaker 1>of this is going in what is called a feed

0:15:10.640 --> 0:15:14.600
<v Speaker 1>forward network, which is just one type of artificial neural network.

0:15:14.920 --> 0:15:17.120
<v Speaker 1>I'm using the feed forward network because it's one of

0:15:17.160 --> 0:15:20.360
<v Speaker 1>the easiest ones to explain. Uh there are others that

0:15:20.400 --> 0:15:23.840
<v Speaker 1>get way more complicated than this, and it requires an

0:15:23.920 --> 0:15:28.760
<v Speaker 1>understanding of artificial neural networks that goes beyond my surface

0:15:29.120 --> 0:15:32.640
<v Speaker 1>shallow level of understanding. Now, one of the ways that

0:15:32.760 --> 0:15:36.440
<v Speaker 1>artificial neural networks have become most significant is in the

0:15:36.480 --> 0:15:40.880
<v Speaker 1>field of machine learning, where you're not just coming up

0:15:40.920 --> 0:15:44.640
<v Speaker 1>with a logical process for a machine. But you're showing

0:15:44.720 --> 0:15:48.040
<v Speaker 1>a machine how it can refine its own decision making.
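
NOTE
A minimal sketch of a machine refining its own decisions through feedback.
It uses the classic perceptron update rule on a single unit, which is a much
simpler stand-in for the multi-layer back propagation discussed later in the
episode. The features, labels, learning rate, and epoch count are assumptions
chosen only for illustration.
```python
def predict(weights, bias, features):
    """Return 1 (cat) if the weighted sum reaches zero, otherwise 0 (not a cat)."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total >= 0 else 0
def train(examples, epochs=20, rate=0.5):
    """Nudge each weight whenever the answer is wrong; leave it alone when right."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # the feedback signal
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias
# Hypothetical labeled pictures: ([pointy_ears, whiskers, wheels], is_cat).
EXAMPLES = [([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 0, 1], 0), ([0, 1, 1], 0)]
weights, bias = train(EXAMPLES)
```
Before training, the all-zero weights call everything a cat. After training on
this small assumed data set, every example in EXAMPLES is classified correctly.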

0:15:48.280 --> 0:15:50.760
<v Speaker 1>And that comes in with the feedback that I was

0:15:50.760 --> 0:15:53.320
<v Speaker 1>talking about earlier. You have to have feedback. You have

0:15:53.440 --> 0:15:55.440
<v Speaker 1>to tell the machine. You have to be able to

0:15:55.440 --> 0:15:58.080
<v Speaker 1>communicate to the machine when it has made a success

0:15:58.200 --> 0:16:00.000
<v Speaker 1>versus when it has failed. And you have to be

0:16:00.000 --> 0:16:03.040
<v Speaker 1>able to tweak the machine. And by machine, i'm talking

0:16:03.040 --> 0:16:05.160
<v Speaker 1>about software in this case, you have to you have

0:16:05.200 --> 0:16:08.160
<v Speaker 1>to tweak that design so that you get the outcome

0:16:08.200 --> 0:16:11.760
<v Speaker 1>you want. Now, this is where we get a little meta.

0:16:12.160 --> 0:16:14.280
<v Speaker 1>We as people know if a picture is of a

0:16:14.320 --> 0:16:16.040
<v Speaker 1>kitty cat or not when we look at it, and

0:16:16.040 --> 0:16:17.680
<v Speaker 1>we recognize it, whether or not it's a kitty cat. Well,

0:16:17.680 --> 0:16:21.240
<v Speaker 1>as an adult human who has experienced cats. Yes, yes,

0:16:21.400 --> 0:16:24.560
<v Speaker 1>don't overgeneralize, Jonathan. Sorry, my, my one-year-old

0:16:24.600 --> 0:16:29.240
<v Speaker 1>niece knows what a kitty cat is, of course. All right. Let's

0:16:29.760 --> 0:16:32.400
<v Speaker 1>let's say that we have determined ourselves that this this

0:16:32.440 --> 0:16:35.440
<v Speaker 1>photograph we hold and on multile hands is that of

0:16:35.480 --> 0:16:39.080
<v Speaker 1>a kitty cat. And, I'm sorry, you know

0:16:39.120 --> 0:16:41.840
<v Speaker 1>what? We're gonna, we're gonna title this episode after a

0:16:41.880 --> 0:16:44.880
<v Speaker 1>Hamlet quote. It's going to be called "Very Like a Whale," uh,

0:16:44.960 --> 0:16:49.200
<v Speaker 1>because it makes sense in that context. So anyway, you've

0:16:49.200 --> 0:16:51.080
<v Speaker 1>got a picture of a kitty cat. Now you, you

0:16:51.160 --> 0:16:54.240
<v Speaker 1>feed it to the computer and the computer output comes

0:16:54.280 --> 0:16:56.040
<v Speaker 1>out and it says it's not a kitty cat. And

0:16:56.080 --> 0:16:58.720
<v Speaker 1>you know that's the wrong answer. So you have to

0:16:58.760 --> 0:17:01.480
<v Speaker 1>look at how to fix the system so that it

0:17:01.520 --> 0:17:04.000
<v Speaker 1>recognizes the picture you're showing it is in fact a

0:17:04.040 --> 0:17:06.399
<v Speaker 1>kitty cat, right, And that might require you to to

0:17:06.560 --> 0:17:10.640
<v Speaker 1>dig down back through those layers and and pick out

0:17:10.680 --> 0:17:13.120
<v Speaker 1>the one that kind of said, like, nah, well, it's

0:17:13.160 --> 0:17:18.080
<v Speaker 1>more triangular, so it's obviously not a cat, right or whatever. Exactly,

0:17:18.119 --> 0:17:20.239
<v Speaker 1>you have to figure out where in this, in this

0:17:20.440 --> 0:17:24.640
<v Speaker 1>stage of interconnections, did that one decision or maybe multiple

0:17:24.680 --> 0:17:27.320
<v Speaker 1>decisions lead to the conclusion that it was not a

0:17:27.400 --> 0:17:30.560
<v Speaker 1>kitty cat. The one one way of doing this is

0:17:30.560 --> 0:17:33.439
<v Speaker 1>called back propagation, where you start with the output and

0:17:33.480 --> 0:17:35.560
<v Speaker 1>you work your way backwards and you say, all right,

0:17:35.600 --> 0:17:37.600
<v Speaker 1>for in order for this to say yes, this is

0:17:37.640 --> 0:17:40.480
<v Speaker 1>a kitty cat, we need to have this result at

0:17:40.480 --> 0:17:42.959
<v Speaker 1>this stage. Do we have that result? Yes? All right,

0:17:43.040 --> 0:17:46.520
<v Speaker 1>let's go one step you know actually probably no, no, Well,

0:17:46.560 --> 0:17:49.199
<v Speaker 1>then what is going on the step before it, that

0:17:49.280 --> 0:17:50.959
<v Speaker 1>sort of thing, and you work your way back and

0:17:51.000 --> 0:17:53.520
<v Speaker 1>you start tweaking those weightings I was talking about, the

0:17:53.520 --> 0:17:56.440
<v Speaker 1>connect connections, and you say, all right, well, maybe this

0:17:56.520 --> 0:17:59.760
<v Speaker 1>connection is actually weighted too much. It's too far in

0:17:59.840 --> 0:18:01.920
<v Speaker 1>the positive. We need to bring that down. Or maybe

0:18:01.920 --> 0:18:03.600
<v Speaker 1>it's in the negative and we need to switch it

0:18:03.640 --> 0:18:06.439
<v Speaker 1>to positive. So you start making these adjustments in the

0:18:06.480 --> 0:18:10.479
<v Speaker 1>software to those weighted connections in the neural network, and

0:18:10.600 --> 0:18:13.200
<v Speaker 1>that might end up allowing you to pass that same

0:18:13.280 --> 0:18:15.840
<v Speaker 1>kitty cat picture through and now it says, oh, that's

0:18:15.840 --> 0:18:18.960
<v Speaker 1>a kitty cat. Like Yeah. You do this a lot,

0:18:20.200 --> 0:18:24.600
<v Speaker 1>with lots of different examples, and eventually you get to

0:18:24.640 --> 0:18:28.680
<v Speaker 1>a point where you feel confident that it is doing

0:18:28.720 --> 0:18:30.639
<v Speaker 1>what you intended it to do, that it is in

0:18:30.720 --> 0:18:33.760
<v Speaker 1>fact able to recognize the picture of the kitty cat

0:18:34.760 --> 0:18:38.520
<v Speaker 1>at a high enough percentage that you're, that it's, you,

0:18:38.840 --> 0:18:41.520
<v Speaker 1>that you say, this can recognize a kitty cat. Then

0:18:41.560 --> 0:18:44.080
<v Speaker 1>you can start feeding it pictures you have never shown

0:18:44.080 --> 0:18:46.359
<v Speaker 1>it before, including pictures of stuff that looks like a

0:18:46.440 --> 0:18:49.399
<v Speaker 1>kitty cat but isn't, and kitty cats that may be slightly

0:18:49.400 --> 0:18:53.160
<v Speaker 1>outside the norm of what it had experienced before, and

0:18:53.640 --> 0:18:56.080
<v Speaker 1>see how it does in that case, and once you

0:18:56.080 --> 0:18:59.720
<v Speaker 1>get to a certain point, the device is able to

0:18:59.760 --> 0:19:03.240
<v Speaker 1>maintain its ability to recognize things without you having to

0:19:03.280 --> 0:19:06.280
<v Speaker 1>go in there and tweak stuff in between the training

0:19:06.280 --> 0:19:09.920
<v Speaker 1>sessions. It has been trained. Yeah. So, so that's what

0:19:10.040 --> 0:19:13.520
<v Speaker 1>they were working on, was the idea of image recognition.
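
NOTE
A minimal sketch of the weight-tweaking loop described above, assuming a
single artificial neuron so there is only one step to work backwards through
(real backpropagation repeats this through many layers). The data, names and
numbers are hypothetical and are not taken from the Google post.
```python
import math
# Hypothetical toy data: two made-up features per picture, label 1 = kitty cat.
EXAMPLES = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
def predict(weights, bias, features):
    """Return the neuron's confidence (0 to 1) that the picture is a cat."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-total))
def train(examples, steps=2000, rate=0.5):
    """Nudge each weighted connection down when the answer was too high, up when too low."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        for features, label in examples:
            # Positive error means the neuron was too confident it saw a cat.
            error = predict(weights, bias, features) - label
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias
weights, bias = train(EXAMPLES)
# Pictures the neuron has never been shown before (also made up).
unseen_cat_score = predict(weights, bias, (0.85, 0.75))
unseen_other_score = predict(weights, bias, (0.15, 0.25))
```
After training, unseen_cat_score should land above 0.5 and unseen_other_score
below it, which is the "feed it pictures it has never seen" check from the episode.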

0:19:14.000 --> 0:19:16.240
<v Speaker 1>But one of the things that comes out in the

0:19:16.280 --> 0:19:20.040
<v Speaker 1>Google research blog post where they were first describing the

0:19:20.040 --> 0:19:22.520
<v Speaker 1>idea and the genesis of the Google Deep Dream

0:19:23.080 --> 0:19:25.280
<v Speaker 1>was that the researchers found that and I'm gonna have

0:19:25.320 --> 0:19:28.399
<v Speaker 1>to quote this here, "neural networks that were trained to

0:19:28.480 --> 0:19:32.080
<v Speaker 1>discriminate between different kinds of images have quite a bit

0:19:32.080 --> 0:19:36.679
<v Speaker 1>of the information needed to generate images too." So that

0:19:36.880 --> 0:19:39.040
<v Speaker 1>in training, and that was the end of the quote,

0:19:39.080 --> 0:19:42.120
<v Speaker 1>So that in training, what they had done in teaching

0:19:42.200 --> 0:19:45.679
<v Speaker 1>these neural networks how to recognize images was also sort

0:19:45.720 --> 0:19:48.919
<v Speaker 1>of teach them how to make images of things. Right.

0:19:49.040 --> 0:19:51.560
<v Speaker 1>If you say, here are the shapes that you would

0:19:51.600 --> 0:19:56.000
<v Speaker 1>see in Japanese architecture, and these are the these are

0:19:56.000 --> 0:19:58.600
<v Speaker 1>the shapes commonly used in that architecture. So this is

0:19:58.600 --> 0:20:01.879
<v Speaker 1>how you can recognize when you're shown an image of a

0:20:02.040 --> 0:20:08.760
<v Speaker 1>building from a historic region of Japan. Then it knows, it

0:20:08.920 --> 0:20:12.399
<v Speaker 1>being, you know, "knowing" is being generous here, but

0:20:12.520 --> 0:20:16.760
<v Speaker 1>it recognizes those features. Those are the features that define

0:20:16.800 --> 0:20:20.320
<v Speaker 1>what that thing is. It can now generate those same features.

0:20:21.080 --> 0:20:25.720
<v Speaker 1>And so if it sees quote unquote sees patterns within

0:20:25.720 --> 0:20:30.200
<v Speaker 1>an image that resemble that, it could generate those images,

0:20:30.800 --> 0:20:36.000
<v Speaker 1>kind of tweaking and shaping the fed

0:20:36.119 --> 0:20:41.080
<v Speaker 1>image and producing something new. Yeah. So they had some

0:20:41.160 --> 0:20:45.119
<v Speaker 1>wonderful examples on this Google research blog post where one

0:20:45.200 --> 0:20:49.200
<v Speaker 1>of the things they were doing was just refining images

0:20:49.400 --> 0:20:54.639
<v Speaker 1>based on white noise until they started to show the

0:20:54.680 --> 0:20:58.200
<v Speaker 1>image that was desired. So you would start off with static,

0:20:58.720 --> 0:21:02.280
<v Speaker 1>just pure static in an image, and then tell the

0:21:02.320 --> 0:21:07.119
<v Speaker 1>algorithm to constantly tweak that static to enhance it to

0:21:07.200 --> 0:21:10.960
<v Speaker 1>become more like an image of a banana, and eventually,

0:21:10.920 --> 0:21:15.119
<v Speaker 1>for example, yeah, yeah, and eventually the static evolved into

0:21:15.200 --> 0:21:18.359
<v Speaker 1>a banana or a cluster of bananas kind of I

0:21:18.359 --> 0:21:22.160
<v Speaker 1>wouldn't say, like like not like a group of banana,

0:21:22.720 --> 0:21:27.920
<v Speaker 1>but a banana pile, like some sort of weird minion

0:21:28.160 --> 0:21:32.120
<v Speaker 1>slash banana box made out of bananas. And I thought

0:21:32.119 --> 0:21:36.840
<v Speaker 1>this was funny because this is the digital equivalent of apophenia.

0:21:37.600 --> 0:21:41.159
<v Speaker 1>Do you know the process of apophenia? It's in psychology

0:21:41.160 --> 0:21:46.320
<v Speaker 1>where we see significance in random patterns, sort of like

0:21:46.359 --> 0:21:48.760
<v Speaker 1>pareidolia being a very specific version of that, where you

0:21:48.760 --> 0:21:53.120
<v Speaker 1>can see faces in shapes, like seeing something

0:21:53.200 --> 0:21:58.720
<v Speaker 1>in a cloud, very like a whale, very much. But yeah,

0:21:58.800 --> 0:22:03.520
<v Speaker 1>so we were essentially here teaching computers how to, well,

0:22:04.000 --> 0:22:07.480
<v Speaker 1>just keep trying at all of this random noise until

0:22:07.680 --> 0:22:12.119
<v Speaker 1>you can find the banana there. Yeah, it's almost like

0:22:12.160 --> 0:22:14.720
<v Speaker 1>pointing a sculptor at a block of marble and saying,

0:22:14.800 --> 0:22:19.040
<v Speaker 1>just keep cutting away until the masterpiece, like David, emerges,

0:22:19.119 --> 0:22:22.359
<v Speaker 1>until you find the banana. I'm sure that is what

0:22:22.440 --> 0:22:24.960
<v Speaker 1>someone told Michelangelo as a very young boy. Pretty

0:22:25.000 --> 0:22:28.159
<v Speaker 1>sure that was. That was in one of his famous paintings.

0:22:28.200 --> 0:22:30.280
<v Speaker 1>It's just there's a little thing at the very end,

0:22:30.320 --> 0:22:35.280
<v Speaker 1>like have fun, find a banana. But yeah, obviously the

0:22:35.320 --> 0:22:38.160
<v Speaker 1>actual goal of the research is image recognition. I mean

0:22:38.160 --> 0:22:42.159
<v Speaker 1>we've actually done podcasts about image recognition of various forms before.

0:22:42.320 --> 0:22:45.119
<v Speaker 1>Oh yeah, well, that and speech recognition and

0:22:45.200 --> 0:22:48.520
<v Speaker 1>facial recognition, which is very uh kind of creepy and

0:22:48.560 --> 0:22:51.800
<v Speaker 1>important in our daily Internet lives. And if you if

0:22:51.840 --> 0:22:53.880
<v Speaker 1>you want to, you know, we've talked about this so often.

0:22:53.920 --> 0:22:55.520
<v Speaker 1>If you want to do a really deep dive on

0:22:55.560 --> 0:22:58.920
<v Speaker 1>these topics, you can check out episodes such as "Can

0:22:59.000 --> 0:23:03.360
<v Speaker 1>Computers Describe What They See?" from November, "I Know That

0:23:03.480 --> 0:23:10.280
<v Speaker 1>Face" from October, "Zoom and Enhance" from August, and "Speech

0:23:10.280 --> 0:23:14.240
<v Speaker 1>Recognition" from April. Man, we talk about computers learning a lot.

0:23:14.480 --> 0:23:17.720
<v Speaker 1>It's well, it is the future and and there's still

0:23:17.760 --> 0:23:21.240
<v Speaker 1>a lot of challenges to this field, right. I mean,

0:23:21.440 --> 0:23:25.159
<v Speaker 1>it's not so easy to make a computer see and

0:23:25.560 --> 0:23:28.359
<v Speaker 1>recognize what it sees the way a human or an

0:23:28.359 --> 0:23:32.320
<v Speaker 1>animal would. Well, and a lot of these involve, uh,

0:23:32.640 --> 0:23:36.400
<v Speaker 1>systems that are tweaked for very specific types of recognition.

0:23:36.760 --> 0:23:40.160
<v Speaker 1>It's not like you have one neural network that recognizes everything.

0:23:40.720 --> 0:23:43.439
<v Speaker 1>It's really just the cat network or the

0:23:43.480 --> 0:23:46.720
<v Speaker 1>banana network. Yeah, by the way, I get the banana

0:23:46.760 --> 0:23:54.600
<v Speaker 1>network. It's really appealing. So how are you guys

0:23:59.160 --> 0:24:02.359
<v Speaker 1>just refusing to acknowledge that entirely? I think we keep that

0:24:02.400 --> 0:24:04.840
<v Speaker 1>whole part. I think we do. I think we should

0:24:04.880 --> 0:24:07.439
<v Speaker 1>continue right now. Okay, but but so if you if

0:24:07.440 --> 0:24:11.959
<v Speaker 1>you want to go on the deep dive. So if

0:24:12.000 --> 0:24:13.840
<v Speaker 1>you want to go on a deep dive about all

0:24:13.880 --> 0:24:16.240
<v Speaker 1>of this stuff, you certainly can. But let's go over

0:24:16.320 --> 0:24:19.359
<v Speaker 1>like like a basic overview of why it's so difficult

0:24:19.440 --> 0:24:22.600
<v Speaker 1>to get computers to to see and hear the way

0:24:22.640 --> 0:24:25.359
<v Speaker 1>that we do. Well, the big one being that architecture,

0:24:25.720 --> 0:24:28.760
<v Speaker 1>you know, the difference between computer architecture and the way

0:24:28.800 --> 0:24:31.720
<v Speaker 1>our brains work. That's that's the biggest, right, that's just

0:24:32.320 --> 0:24:34.480
<v Speaker 1>fundamentally they work in very different ways. And we have

0:24:34.600 --> 0:24:37.600
<v Speaker 1>expressed this in multiple episodes too, especially dealing with things

0:24:37.600 --> 0:24:41.159
<v Speaker 1>like how computer memory is so different from our memory.

0:24:41.520 --> 0:24:44.439
<v Speaker 1>That's just one easy way of pointing at this. So

0:24:44.760 --> 0:24:48.400
<v Speaker 1>that's a big one. Even the software simulations are

0:24:48.400 --> 0:24:52.879
<v Speaker 1>incredibly limited because they require a great deal of processing

0:24:52.880 --> 0:24:56.920
<v Speaker 1>power to work properly. Um, and uh, you know, we're

0:24:56.920 --> 0:24:59.919
<v Speaker 1>still learning how the brain works, and so to create

0:25:00.080 --> 0:25:03.080
<v Speaker 1>simulation of it while we have only a partial understanding

0:25:03.119 --> 0:25:06.240
<v Speaker 1>is really tough. In fact, a lot of people, I

0:25:06.240 --> 0:25:10.680
<v Speaker 1>saw one person say, uh, the simulation the artificial neural

0:25:10.720 --> 0:25:13.520
<v Speaker 1>network is similar in a way, like you wouldn't say

0:25:13.560 --> 0:25:15.800
<v Speaker 1>it's a brain, the same way you wouldn't say a

0:25:15.840 --> 0:25:20.480
<v Speaker 1>weather simulation is an actual weather front. It's you know,

0:25:20.560 --> 0:25:23.200
<v Speaker 1>it's as close as we can get right now,

0:25:23.280 --> 0:25:27.080
<v Speaker 1>based upon our understanding and our technological sophistication, and that's

0:25:27.080 --> 0:25:29.760
<v Speaker 1>going to only improve as time goes on. But we're

0:25:29.800 --> 0:25:33.600
<v Speaker 1>still at the very early stages of that, sure. And

0:25:33.720 --> 0:25:36.920
<v Speaker 1>it's also just a major difference in the approach of

0:25:37.400 --> 0:25:41.439
<v Speaker 1>problem solving. I mean, typically, computers as they exist today

0:25:41.720 --> 0:25:45.760
<v Speaker 1>are good at learning by explicit instructions to get the

0:25:45.840 --> 0:25:49.160
<v Speaker 1>right answer. Yes, and assuming that everything in the computer

0:25:49.280 --> 0:25:53.480
<v Speaker 1>is working properly, then they will reliably execute those instructions

0:25:54.160 --> 0:25:57.640
<v Speaker 1>precisely every single time. Yeah, And our brains are exactly

0:25:57.680 --> 0:26:02.040
<v Speaker 1>the opposite. They're not getting the perfect answer, but they're

0:26:02.119 --> 0:26:05.440
<v Speaker 1>very good at something computers aren't: approximating. They're good

0:26:05.440 --> 0:26:08.680
<v Speaker 1>at approximating based on a lot of inputs. So we

0:26:08.840 --> 0:26:12.520
<v Speaker 1>learn what a chair is not by reading a definition

0:26:12.640 --> 0:26:15.040
<v Speaker 1>of the key features of a chair, and then a

0:26:15.160 --> 0:26:19.399
<v Speaker 1>list of all the possible exceptions, including every variation on

0:26:19.440 --> 0:26:22.119
<v Speaker 1>a chair there could be, I mean, like, why would

0:26:22.119 --> 0:26:25.640
<v Speaker 1>we do that? Instead we just see a bunch of chairs.

0:26:26.119 --> 0:26:29.600
<v Speaker 1>Notice that people identify all of these things as chairs.

0:26:29.960 --> 0:26:33.480
<v Speaker 1>We can generally yeah, and then we get an approximate

0:26:33.560 --> 0:26:37.359
<v Speaker 1>idea of Okay, here's basically what a chair is. So

0:26:37.400 --> 0:26:40.840
<v Speaker 1>we have a sort of fuzzy feel for what constitutes

0:26:41.000 --> 0:26:44.080
<v Speaker 1>that object. And that's what the neural networks are trying

0:26:44.119 --> 0:26:48.399
<v Speaker 1>to do. They they have large samples of data and

0:26:48.440 --> 0:26:51.560
<v Speaker 1>they try to get a feel for it. And it

0:26:51.560 --> 0:26:56.320
<v Speaker 1>takes the work of actual human beings

0:26:56.359 --> 0:26:59.320
<v Speaker 1>to make certain that that early stage of training is

0:26:59.320 --> 0:27:02.040
<v Speaker 1>actually working. It's not like, it's not like we have

0:27:02.320 --> 0:27:04.720
<v Speaker 1>a computer that you can just turn on and it

0:27:04.800 --> 0:27:07.080
<v Speaker 1>just automatically starts to learn and it knows when it's

0:27:07.160 --> 0:27:09.040
<v Speaker 1>right and knows when it's wrong, and it can thus

0:27:09.560 --> 0:27:13.240
<v Speaker 1>start to learn everything. We're nowhere close to that. We

0:27:13.359 --> 0:27:15.280
<v Speaker 1>are to the point where you turn a computer on,

0:27:15.359 --> 0:27:17.800
<v Speaker 1>you feed it some information, you see what comes out,

0:27:18.240 --> 0:27:20.600
<v Speaker 1>and then you either say, all right, looks like this

0:27:20.720 --> 0:27:24.840
<v Speaker 1>particular, uh, go-through worked fine. Let's try something else,

0:27:24.920 --> 0:27:27.000
<v Speaker 1>or you say, oh, this didn't work. Let's find out

0:27:27.000 --> 0:27:28.600
<v Speaker 1>what's wrong and fix it so we can try it

0:27:28.640 --> 0:27:30.320
<v Speaker 1>again before you ever get to a point where you

0:27:30.320 --> 0:27:33.359
<v Speaker 1>can start showing it new stuff. Right. And kind of

0:27:33.440 --> 0:27:35.640
<v Speaker 1>the way that you do that, that you build a

0:27:35.680 --> 0:27:39.080
<v Speaker 1>better neural network is that you you check in on

0:27:39.320 --> 0:27:42.760
<v Speaker 1>what it's doing, and by by asking a layer to

0:27:42.960 --> 0:27:46.879
<v Speaker 1>create a visualization, a layer of these artificial neurons

0:27:46.880 --> 0:27:49.639
<v Speaker 1>to create a visualization of what it's working through, is

0:27:50.000 --> 0:27:53.920
<v Speaker 1>one way that you have of checking in. Interesting. As

0:27:54.040 --> 0:27:57.119
<v Speaker 1>an example from this Google research blog, you might have

0:27:57.160 --> 0:27:59.720
<v Speaker 1>set your network to figure out what a dumbbell is,

0:28:00.680 --> 0:28:02.400
<v Speaker 1>but until you ask it for an image, you might

0:28:02.440 --> 0:28:04.800
<v Speaker 1>not realize that all the pictures of dumbbells that it's

0:28:04.800 --> 0:28:08.200
<v Speaker 1>found so far include beefy human arms. So it's obviously

0:28:08.240 --> 0:28:12.680
<v Speaker 1>searching through our stock image libraries that we use. Sure, sure,

0:28:12.800 --> 0:28:14.760
<v Speaker 1>but, you know, so once you see its images

0:28:14.800 --> 0:28:18.440
<v Speaker 1>of these really weird arm dumbbell hybrids, you can help

0:28:18.480 --> 0:28:21.320
<v Speaker 1>it correct by telling it to enhance the bits that

0:28:21.359 --> 0:28:23.960
<v Speaker 1>are dumbbells, you know, the shapes and the colors that

0:28:24.040 --> 0:28:27.800
<v Speaker 1>go with dumbbells, and to ignore the beefy arm bits. Yeah,

0:28:28.359 --> 0:28:30.679
<v Speaker 1>it actually is very interesting because as much as I

0:28:30.760 --> 0:28:33.480
<v Speaker 1>joke about the stock image thing, it really does

0:28:33.560 --> 0:28:38.480
<v Speaker 1>tell you that when we choose certain images to represent concepts,

0:28:38.880 --> 0:28:42.719
<v Speaker 1>we often will go to very similar ones, and to

0:28:42.760 --> 0:28:45.600
<v Speaker 1>the point where as humans, we know we can differentiate

0:28:45.720 --> 0:28:48.640
<v Speaker 1>the thing in that image that actually represents the concept

0:28:48.760 --> 0:28:52.560
<v Speaker 1>versus some other supplemental thing. But a computer doesn't know

0:28:52.640 --> 0:28:55.920
<v Speaker 1>that unless Yeah, and it's so loaded with context. I mean,

0:28:56.080 --> 0:28:59.000
<v Speaker 1>you wouldn't know why a beefy arm and a dumbbell

0:28:59.040 --> 0:29:02.440
<v Speaker 1>would go together unless you understood, culturally speaking, that

0:29:02.560 --> 0:29:04.520
<v Speaker 1>sometimes when you work with a lot of dumbbells, you

0:29:04.520 --> 0:29:07.640
<v Speaker 1>get big beefy arms. So all of this is extraneous information

0:29:07.640 --> 0:29:10.760
<v Speaker 1>that a computer can't possibly be asked to automatically know

0:29:11.000 --> 0:29:14.680
<v Speaker 1>the way that a human person would. Uh And okay,

0:29:14.760 --> 0:29:18.000
<v Speaker 1>So so extrapolating out a little bit further from this concept,

0:29:18.520 --> 0:29:21.160
<v Speaker 1>the end goal is really for your neural network to

0:29:21.280 --> 0:29:25.280
<v Speaker 1>be able to auto correct, so you can program it

0:29:25.360 --> 0:29:28.680
<v Speaker 1>to enhance whatever it thinks is important, and it will

0:29:28.720 --> 0:29:31.200
<v Speaker 1>look for patterns in the visual data of an image

0:29:31.280 --> 0:29:35.040
<v Speaker 1>and enhance those, then evaluate the resulting image and find

0:29:35.040 --> 0:29:38.160
<v Speaker 1>more patterns and enhance them, and so on. It's a

0:29:38.200 --> 0:29:40.360
<v Speaker 1>little like like asking a child to tell you what

0:29:40.560 --> 0:29:43.440
<v Speaker 1>shapes she sees in the clouds. You can get a

0:29:43.440 --> 0:29:46.640
<v Speaker 1>decent sense based on her answers of her abilities to

0:29:46.680 --> 0:29:52.520
<v Speaker 1>think abstractly and to extrapolate visually. Interesting. So yeah, yeah,

0:29:52.560 --> 0:29:56.560
<v Speaker 1>because I think of times where, uh, me and my

0:29:56.600 --> 0:29:59.240
<v Speaker 1>friends would be looking at the clouds and we would

0:29:59.280 --> 0:30:01.440
<v Speaker 1>sit there and talk about what shapes we saw. And

0:30:01.480 --> 0:30:03.720
<v Speaker 1>I remember, like, a friend might say, oh, I

0:30:03.720 --> 0:30:05.600
<v Speaker 1>see a dog, and another guy says, I see a man,

0:30:05.640 --> 0:30:07.040
<v Speaker 1>and they'd say, what do you see? And I'd say,

0:30:07.080 --> 0:30:09.920
<v Speaker 1>it looks like it's going to rain, and m hm

0:30:11.640 --> 0:30:14.960
<v Speaker 1>explains why I don't have friends. But but yeah, you know,

0:30:15.040 --> 0:30:17.280
<v Speaker 1>so so you get to you get to evaluate this

0:30:17.480 --> 0:30:20.960
<v Speaker 1>kid's conceptualizations, and you also probably get to have a

0:30:20.960 --> 0:30:23.640
<v Speaker 1>really rad, trippy conversation. And these are the two things

0:30:23.680 --> 0:30:26.840
<v Speaker 1>that we are getting out of Deep Dream. Yeah, yeah, certainly.

0:30:26.960 --> 0:30:30.440
<v Speaker 1>And to bring it back to this sort of byproduct

0:30:30.560 --> 0:30:34.320
<v Speaker 1>these Deep Dream images, how you actually get these is

0:30:34.520 --> 0:30:40.080
<v Speaker 1>that refining feedback process. So at a certain layer of analysis,

0:30:40.720 --> 0:30:44.120
<v Speaker 1>you tell the neural network, okay, what whatever you found here,

0:30:44.600 --> 0:30:46.680
<v Speaker 1>focus in on that and and pay a lot of

0:30:46.720 --> 0:30:50.080
<v Speaker 1>attention to it, and then look at it again and

0:30:50.120 --> 0:30:54.080
<v Speaker 1>then pay more attention. Yeah, and and it does. Paying

0:30:54.080 --> 0:30:58.560
<v Speaker 1>attention goes beyond just focusing. It goes to the addition

0:30:58.720 --> 0:31:01.640
<v Speaker 1>of information, right if relation of information Yeah, yeah, Like

0:31:01.680 --> 0:31:04.240
<v Speaker 1>if you think that you see a pagoda in those clouds,

0:31:04.520 --> 0:31:09.600
<v Speaker 1>then really really enhance that pagoda tendency, right, And this

0:31:09.680 --> 0:31:13.720
<v Speaker 1>is how you end up with Jonathan having eyes all

0:31:13.760 --> 0:31:17.800
<v Speaker 1>over his shoulders and caterpillars for arms. It might be

0:31:17.880 --> 0:31:23.200
<v Speaker 1>looking for images that it has recognized before in biological life,

0:31:23.760 --> 0:31:28.320
<v Speaker 1>in pictures of animals, pictures of insects, and say, yeah, okay,

0:31:28.360 --> 0:31:31.640
<v Speaker 1>that arm's kind of, you know, kind of tube-shaped.

0:31:31.720 --> 0:31:33.600
<v Speaker 1>So maybe we can make that a little bit more

0:31:33.680 --> 0:31:36.160
<v Speaker 1>like a caterpillar. Oh now it's looking a lot like

0:31:36.200 --> 0:31:39.880
<v Speaker 1>a caterpillar. Make it look more like a caterpillar, even. Yeah, totally.
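
NOTE
A rough sketch of the "whatever you found here, enhance it" feedback loop
described above, assuming a toy layer of two hand-made feature detectors on a
4-pixel image. The detector names and all numbers are hypothetical. The step
follows the gradient of half the sum of squared activations with respect to
the pixels, which illustrates the idea and is not Google's actual code.
```python
# Hypothetical feature detectors for a 4-pixel "image"; names and numbers are made up.
DETECTORS = {"caterpillar": [1.0, 1.0, 0.0, 0.0], "dog": [0.0, 0.0, 1.0, -1.0]}
def activations(image):
    """How strongly each detector fires on the image (negative responses are clipped to zero)."""
    return {name: max(0.0, sum(w * p for w, p in zip(ws, image))) for name, ws in DETECTORS.items()}
def dream_step(image, rate=0.1):
    """Change the pixels, not the weights, so that whatever already fires will fire more."""
    acts = activations(image)
    # Gradient of 0.5 * sum(activation ** 2) with respect to each pixel.
    grad = [sum(acts[name] * ws[i] for name, ws in DETECTORS.items()) for i in range(len(image))]
    return [p + rate * g for p, g in zip(image, grad)]
image = [0.3, 0.2, 0.1, 0.1]  # faintly caterpillar-like, not dog-like at all
before = activations(image)
for _ in range(20):
    image = dream_step(image)
after = activations(image)
```
The faint caterpillar response grows with every pass while the silent dog
detector stays silent; starting from random static instead of a photo is the
same loop with a different first image.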

0:31:40.000 --> 0:31:41.960
<v Speaker 1>By the way, you can get to that same destination

0:31:42.040 --> 0:31:43.840
<v Speaker 1>just by hanging out with me on a Saturday night,

0:31:44.960 --> 0:31:49.760
<v Speaker 1>Same, same result. To me, this actually raises a pretty

0:31:49.800 --> 0:31:56.240
<v Speaker 1>weird and possibly interesting, possibly superficial question about artificial intelligence.

0:31:57.280 --> 0:32:00.840
<v Speaker 1>If we're teaching our computers to see more like us,

0:32:01.640 --> 0:32:05.880
<v Speaker 1>does that mean they'll eventually learn to hallucinate like us?

0:32:05.920 --> 0:32:11.920
<v Speaker 1>Like, is hallucinating a natural consequence of human levels of

0:32:12.000 --> 0:32:16.520
<v Speaker 1>vision and object recognition? Well, I would ask, how how

0:32:16.800 --> 0:32:20.880
<v Speaker 1>do we know that they're not already hallucinating? There's a

0:32:20.960 --> 0:32:23.760
<v Speaker 1>theory, follow me here, follow me here, there's, there's a

0:32:23.840 --> 0:32:27.480
<v Speaker 1>theory of baby brain growth that suggests that infant sensory

0:32:27.520 --> 0:32:30.600
<v Speaker 1>processing is on a level similar to adults that are

0:32:30.680 --> 0:32:34.720
<v Speaker 1>using hallucinogens, because there's there's less or or even no

0:32:34.920 --> 0:32:39.200
<v Speaker 1>awareness of context, and there's less separation between internal

0:32:39.200 --> 0:32:43.640
<v Speaker 1>and external stimuli. So everything feels and looks and sounds

0:32:44.040 --> 0:32:47.800
<v Speaker 1>real and immediate, including things that are artifacts of the brain's

0:32:47.800 --> 0:32:52.360
<v Speaker 1>inner processings like associations and fragments of memories and misunderstandings

0:32:52.360 --> 0:32:57.080
<v Speaker 1>of what you're seeing and hearing. So, uh, so are

0:32:57.120 --> 0:33:00.760
<v Speaker 1>our computers all tripping all the time? Is the question

0:33:00.840 --> 0:33:04.880
<v Speaker 1>I posed? Do androids dream of electric sheep or are

0:33:04.920 --> 0:33:08.360
<v Speaker 1>the electric sheep there for them always? I don't know. Well,

0:33:08.440 --> 0:33:12.040
<v Speaker 1>they don't dream of electric sheep standing in a pasture.

0:33:12.080 --> 0:33:15.200
<v Speaker 1>They dream of electric sheep emerging out of your pectoral

0:33:15.280 --> 0:33:20.440
<v Speaker 1>muscles. Again, Saturday night. Yeah, this is, ah, to me,

0:33:20.520 --> 0:33:23.320
<v Speaker 1>this is, this is a great, a great way of

0:33:23.360 --> 0:33:30.120
<v Speaker 1>appreciating how weird and amazing artificial intelligence as a discipline is.

0:33:30.760 --> 0:33:34.280
<v Speaker 1>And while while this is almost like a byproduct, like

0:33:34.320 --> 0:33:37.520
<v Speaker 1>it's just an interesting byproduct of something that was intended

0:33:37.560 --> 0:33:42.720
<v Speaker 1>to improve upon image recognition software, it also has created

0:33:43.160 --> 0:33:47.520
<v Speaker 1>some truly remarkable images. I mean, it's it's you could

0:33:47.600 --> 0:33:52.040
<v Speaker 1>argue it's a new form of art, and you start

0:33:52.200 --> 0:33:55.880
<v Speaker 1>by feeding it an image that you think is already

0:33:55.920 --> 0:33:59.120
<v Speaker 1>interesting or maybe not interesting. Because either way it works

0:33:59.680 --> 0:34:02.600
<v Speaker 1>and then you see what comes out of it, and,

0:34:02.800 --> 0:34:06.440
<v Speaker 1>uh, we've even seen some pretty, you know, mostly it's

0:34:06.480 --> 0:34:08.520
<v Speaker 1>done for laughs, but there have been some that I

0:34:08.560 --> 0:34:11.319
<v Speaker 1>think are just really striking images that make me think

0:34:11.320 --> 0:34:15.440
<v Speaker 1>of Impressionism and some other, even more abstract

0:34:15.480 --> 0:34:17.680
<v Speaker 1>and surreal approaches to art as well. Some of the

0:34:17.680 --> 0:34:21.279
<v Speaker 1>ones I've seen very much have a Salvador Dalí kind

0:34:21.280 --> 0:34:25.560
<v Speaker 1>of the ones with like a dripping architecture where there

0:34:25.560 --> 0:34:27.920
<v Speaker 1>seemed to be you see, like arches and things that

0:34:28.000 --> 0:34:30.799
<v Speaker 1>you would recognize from buildings, except they seem to be

0:34:30.880 --> 0:34:33.920
<v Speaker 1>made of liquids somehow. Yeah. Yeah. And there's also an

0:34:33.920 --> 0:34:36.200
<v Speaker 1>element, like Dalí, of that, of that kind of, of

0:34:36.239 --> 0:34:39.640
<v Speaker 1>that kind of Escher sort of influence in there too,

0:34:39.640 --> 0:34:42.280
<v Speaker 1>because of the way that the shapes repeat and twist

0:34:42.400 --> 0:34:46.439
<v Speaker 1>on each other. And oh it's fascinating, like impossible perspective.

0:34:46.840 --> 0:34:52.319
<v Speaker 1>Yeah yeah, yeah. So we're getting some some phenomenal pictures, uh,

0:34:52.440 --> 0:34:55.840
<v Speaker 1>and maybe we'll even post some from the Forward Thinking

0:34:55.880 --> 0:34:59.279
<v Speaker 1>crew later on. We've talked about the possibility of of

0:34:59.320 --> 0:35:02.680
<v Speaker 1>doing a photo shoot just just to feed it through

0:35:02.719 --> 0:35:05.440
<v Speaker 1>here and find out what kind of fresh horror awaits us.

0:35:05.480 --> 0:35:09.000
<v Speaker 1>But I really think that this was cool also just

0:35:09.080 --> 0:35:11.200
<v Speaker 1>to kind of get a look at how artificial neural

0:35:11.239 --> 0:35:15.080
<v Speaker 1>networks work and and the process that they tend to

0:35:15.239 --> 0:35:18.080
<v Speaker 1>use in order for machines to be able to learn stuff,

0:35:18.080 --> 0:35:21.200
<v Speaker 1>because we've talked about learning so much without really getting

0:35:21.200 --> 0:35:24.320
<v Speaker 1>into the process that's going on, just you know, to

0:35:24.360 --> 0:35:27.879
<v Speaker 1>say that, hey, this machine can learn. It's it's it's

0:35:27.880 --> 0:35:30.160
<v Speaker 1>not it's doing it a disservice. So this was great

0:35:30.200 --> 0:35:33.680
<v Speaker 1>to get into the nuts and bolts of that. I'm

0:35:33.680 --> 0:35:37.120
<v Speaker 1>going to go home and lay on a caterparrot. Okay,

0:35:37.200 --> 0:35:40.560
<v Speaker 1>well you you own one. We've seen it. So guys,

0:35:40.680 --> 0:35:44.280
<v Speaker 1>if you have any suggestions for future episodes of Forward Thinking,

0:35:44.360 --> 0:35:46.680
<v Speaker 1>maybe there's something you've always wanted to know more about,

0:35:46.760 --> 0:35:48.320
<v Speaker 1>like what's that going to be like in the future,

0:35:48.400 --> 0:35:50.160
<v Speaker 1>Or maybe there's even a topic we've covered in the

0:35:50.200 --> 0:35:52.960
<v Speaker 1>past that you want to have us focus on a

0:35:53.080 --> 0:35:55.960
<v Speaker 1>very specific part of that, or whatever it may be.

0:35:56.520 --> 0:35:58.720
<v Speaker 1>You should write us, and maybe you just have comments

0:35:58.719 --> 0:36:02.880
<v Speaker 1>about the Google project or artificial intelligence or machine learning.

0:36:03.160 --> 0:36:05.759
<v Speaker 1>Send us a message. Our email is FW thinking at

0:36:05.760 --> 0:36:07.920
<v Speaker 1>HowStuffWorks dot com, or drop us a line

0:36:08.200 --> 0:36:11.680
<v Speaker 1>on Facebook, Google Plus or Twitter. At Google Plus and Twitter,

0:36:11.719 --> 0:36:14.480
<v Speaker 1>we are FW Thinking. On Facebook, just search FW

0:36:14.640 --> 0:36:16.279
<v Speaker 1>Thinking in the search bar. We will pop right up.

0:36:16.320 --> 0:36:18.160
<v Speaker 1>You can leave us a message and we will talk

0:36:18.160 --> 0:36:25.560
<v Speaker 1>to you again really soon. For more on this

0:36:25.680 --> 0:36:28.680
<v Speaker 1>topic and the future of technology, visit Forward Thinking

0:36:28.800 --> 0:36:41.880
<v Speaker 1>dot com. Brought to you by Toyota. Let's go places.