WEBVTT - What is Apple's Neural Engine? 0:00:04.120 --> 0:00:07.160 Get in touch with technology with TechStuff from 0:00:07.200 --> 0:00:13.880 HowStuffWorks.com. Hey there, and welcome to TechStuff. 0:00:13.960 --> 0:00:16.680 My name is Jonathan Strickland. I happen to be the 0:00:16.680 --> 0:00:19.239 host of this show. I'm also an executive producer at 0:00:19.239 --> 0:00:23.000 HowStuffWorks. And hey, I love all things tech, 0:00:23.480 --> 0:00:26.960 and today we're doing a little listener mail request. Dan 0:00:27.240 --> 0:00:29.600 wrote in and asked if I might do an episode 0:00:29.840 --> 0:00:34.800 about Apple's so-called neural engine in its more recent iPhones. 0:00:35.200 --> 0:00:38.800 So today we are going to learn what a neural 0:00:38.920 --> 0:00:42.360 engine is and what it does. And if you guys, 0:00:42.479 --> 0:00:45.440 by the way, have any requests for topics where you've always thought, hey, 0:00:45.479 --> 0:00:49.600 I want to have an episode about this particular tech topic, remember, 0:00:49.640 --> 0:00:52.040 you can send those to me by sending an email 0:00:52.080 --> 0:00:54.920 to techstuff@howstuffworks.com. And 0:00:54.960 --> 0:00:58.920 now let's talk about this neural engine. Well, the general 0:00:58.960 --> 0:01:04.240 public first heard about this topic back in September 0:01:04.240 --> 0:01:08.640 2017, when Apple CEO Tim Cook presented at what 0:01:08.760 --> 0:01:12.000 has become an annual tradition for Apple at around that 0:01:12.080 --> 0:01:16.160 time of year. Pretty much every September is when Apple 0:01:16.200 --> 0:01:18.800 will come out and unveil the latest in its line 0:01:18.840 --> 0:01:23.880 of iPhone smartphones, and in 2017 that would have been the 0:01:24.080 --> 0:01:28.120 iconic iPhone X, the tenth anniversary edition of the iPhone, 0:01:28.160 --> 0:01:32.440 also the one that's been discontinued now.
Cook listed off 0:01:32.480 --> 0:01:35.399 a lot of features when he went to that presentation, 0:01:35.440 --> 0:01:38.360 but the one we're really interested in today is part 0:01:38.400 --> 0:01:42.840 of the phone's A11 microprocessor, also called the 0:01:42.920 --> 0:01:47.680 A11 Bionic CPU. The most recent iPhones as of 0:01:47.680 --> 0:01:51.720 the recording of this podcast now have the next generation 0:01:52.040 --> 0:01:55.520 of that chip, the A12. But in both cases, 0:01:55.720 --> 0:01:58.280 the neural engine is one of the elements that gets 0:01:58.320 --> 0:02:00.800 a lot of coverage. So let's go to the 0:02:00.880 --> 0:02:03.000 A11, since that was the first one to have it. 0:02:03.000 --> 0:02:07.520 It's more than just a CPU. It's technically a system 0:02:07.600 --> 0:02:11.640 on a chip, or SoC. It's an 0:02:11.840 --> 0:02:15.320 ARM 64-bit chip. But that doesn't really tell 0:02:15.360 --> 0:02:18.160 you anything if you're not, you know, deep into the 0:02:18.200 --> 0:02:21.120 world of microprocessors. So what does that actually mean? Well, 0:02:21.360 --> 0:02:24.600 the ARM-based part means that it's based on 0:02:24.800 --> 0:02:29.519 the ARM microarchitecture in chip design. So for our 0:02:29.560 --> 0:02:33.760 purposes we can simplify this to say the chip's components, 0:02:33.880 --> 0:02:38.240 the stuff that's actually on the microprocessor, are laid out 0:02:38.480 --> 0:02:42.000 in a way that was developed by ARM Holdings. That's 0:02:42.000 --> 0:02:47.040 the company behind ARM processors. Now that is different from 0:02:47.040 --> 0:02:49.560 the layout you would find in a chip that was 0:02:49.639 --> 0:02:54.119 made by Intel, for example. So the architecture part literally 0:02:54.200 --> 0:02:58.400 refers to the layout of components in the microprocessor and 0:02:58.440 --> 0:03:02.519 how they interact with each other.
And generally speaking, companies 0:03:02.560 --> 0:03:08.080 that make microprocessors develop an architecture. They do so in 0:03:08.120 --> 0:03:11.240 a way that is supposed to maximize the efficiency of 0:03:11.320 --> 0:03:13.320 the chips, so you get the most power for 0:03:13.400 --> 0:03:17.679 the least amount of energy input you can. With the 0:03:17.760 --> 0:03:19.600 least amount of waste, really, is the best way of 0:03:19.600 --> 0:03:21.440 putting it. You don't want to waste too much and 0:03:22.000 --> 0:03:25.680 produce too much heat. And then you typically would 0:03:25.880 --> 0:03:29.880 reduce the size of the various components. And then after 0:03:29.919 --> 0:03:32.040 you reduce the size of the components, you might figure 0:03:32.040 --> 0:03:35.720 out a new architecture that makes better use of these 0:03:36.080 --> 0:03:39.720 smaller components. And this process goes on and on. Intel 0:03:39.760 --> 0:03:44.320 calls this the tick-tock methodology. So that's what the 0:03:44.400 --> 0:03:47.720 ARM-based part means. It's from this particular company following 0:03:47.720 --> 0:03:51.440 this particular layout. As for that 64-bit part, 0:03:51.520 --> 0:03:54.360 what does that mean? Well, that refers to the data 0:03:54.480 --> 0:03:59.840 width of the arithmetic logic unit, or ALU. The 0:04:00.040 --> 0:04:02.680 ALU is the part of the processor that actually carries out 0:04:03.120 --> 0:04:08.640 those operations on data from computer instructions. So data width 0:04:08.840 --> 0:04:12.160 essentially tells you how much information the ALU 0:04:12.360 --> 0:04:16.919 can accept or handle at a given time, and it 0:04:16.960 --> 0:04:20.240 tells you this in bits. Now, a bit, just to 0:04:20.279 --> 0:04:24.760 remind you, is a single unit of computational information, and 0:04:24.800 --> 0:04:29.560 it is binary, meaning it has two states, which we designate 0:04:29.640 --> 0:04:33.799 as being either a zero or a one.
Some people 0:04:33.839 --> 0:04:38.120 say off and on, or false and true, but it's zero 0:04:38.279 --> 0:04:42.200 and one. The number of bits tells you how big 0:04:42.640 --> 0:04:46.919 these actual numbers can get before the ALU 0:04:47.120 --> 0:04:49.960 can't handle them anymore. So let's say you have an 0:04:49.960 --> 0:04:52.320 8-bit chip, because that's a lot easier to talk about. 0:04:53.000 --> 0:04:56.360 You would be able to add, subtract, multiply, divide, you know, 0:04:56.400 --> 0:05:02.320 the basic arithmetic and logical operations, on 8-bit numbers 0:05:02.480 --> 0:05:05.800 with an 8-bit chip. Now, a single bit is 0:05:05.800 --> 0:05:08.960 a zero or a one, and an 8-bit number you 0:05:09.000 --> 0:05:14.000 can represent as a string of eight digits, either 0:05:14.120 --> 0:05:17.359 zeros or ones. So you could have eight zeros in 0:05:17.360 --> 0:05:20.200 a row, up to eight ones in a row, and 0:05:20.240 --> 0:05:23.200 everything in between. So it could be seven zeros and 0:05:23.240 --> 0:05:25.680 then a one, or it could be six zeros and 0:05:25.720 --> 0:05:28.159 then a one and then another zero. You get the point. 0:05:28.880 --> 0:05:32.320 With that many combinations, that means you would be able 0:05:32.400 --> 0:05:36.680 to go from the typical numbers of zero to two 0:05:36.760 --> 0:05:41.120 hundred fifty five. That's with 8-bit. However, we're not 0:05:41.200 --> 0:05:44.840 talking 8-bit. We're talking about a 64-bit chip. 0:05:44.839 --> 0:05:48.960 So now you have sixty four digits in a row 0:05:49.000 --> 0:05:52.240 that can be either a zero or one.
That provides 0:05:52.279 --> 0:05:57.640 you a lot more combinations, which means you could range 0:05:57.720 --> 0:06:04.040 in number from zero to nine quintillion, two hundred 0:06:04.320 --> 0:06:10.160 twenty three quadrillion, three hundred seventy two trillion, thirty six billion, 0:06:10.360 --> 0:06:15.000 eight hundred fifty four million, seven hundred seventy five thousand, 0:06:15.080 --> 0:06:19.880 eight hundred seven. That's a pretty big range. It can 0:06:19.920 --> 0:06:23.760 handle way, way larger numbers than an 8-bit chip. 0:06:24.120 --> 0:06:28.200 So that tells you the type of architecture this chip 0:06:28.279 --> 0:06:30.440 has and the amount of data it can handle at 0:06:30.440 --> 0:06:35.680 a time. The A11 has six cores. Processors 0:06:35.680 --> 0:06:40.400 with multiple cores can work on parts of a problem simultaneously. 0:06:40.520 --> 0:06:43.200 If you have something that's called a parallel problem, you 0:06:43.240 --> 0:06:45.839 can divide that problem up into different segments and have 0:06:45.960 --> 0:06:49.440 different cores tackle it. Two of those six cores are 0:06:49.480 --> 0:06:53.240 what Apple calls high performance cores. They have a clock 0:06:53.320 --> 0:06:57.160 speed of 2.39 gigahertz 0:06:57.200 --> 0:06:59.640 in the A11. So the clock speed tells you 0:06:59.680 --> 0:07:03.360 how many clock cycles a CPU can perform per second. 0:07:03.760 --> 0:07:06.920 2.39 gigahertz means that these cores 0:07:06.960 --> 0:07:11.480 can each perform 2.39 billion clock cycles 0:07:11.480 --> 0:07:16.320 per second. Now, clock cycles do not easily translate over 0:07:16.440 --> 0:07:21.000 into actions. It's not necessarily one clock cycle per action.
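The bit-width ranges quoted in this passage can be checked with a few lines of Python. This is only a sketch; the helper names `unsigned_max` and `signed_max` are made up for illustration. One thing it makes visible: the nine-quintillion figure is the largest value of a signed 64-bit integer, where one bit is reserved for the sign. If all sixty four bits count upward from zero, the ceiling is roughly twice that.

```python
def unsigned_max(bits: int) -> int:
    """Largest value when every bit contributes to the magnitude."""
    return 2**bits - 1


def signed_max(bits: int) -> int:
    """Largest value when one bit is reserved for the sign (two's complement)."""
    return 2**(bits - 1) - 1


print(unsigned_max(8))    # 255
print(signed_max(64))     # 9223372036854775807
print(unsigned_max(64))   # 18446744073709551615
```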
0:07:21.400 --> 0:07:24.960 But generally these numbers tell you how much a core 0:07:26.080 --> 0:07:28.840 of the processor is able to handle per second, how 0:07:28.880 --> 0:07:32.280 many tasks it can do per second, assuming a certain 0:07:32.320 --> 0:07:36.640 number of clock cycles per task. Now, these two cores 0:07:36.680 --> 0:07:41.080 are referred to as Monsoon. The other four cores are 0:07:41.200 --> 0:07:44.440 what Apple refers to as energy efficient cores. They are 0:07:44.480 --> 0:07:47.440 not at that same high clock speed. They are meant 0:07:47.480 --> 0:07:52.440 to handle more routine tasks. They are called Mistral. So 0:07:52.520 --> 0:07:57.760 you have Monsoon and Mistral: two Monsoon cores, four Mistral cores. 0:07:57.800 --> 0:08:00.360 But the A11 is not just a CPU. It also 0:08:00.440 --> 0:08:05.760 has a three-core graphics processing unit, or GPU, incorporated 0:08:05.840 --> 0:08:08.600 into this chip. And then there are the two processing 0:08:08.640 --> 0:08:14.119 cores dedicated specifically to handling tasks related to machine learning algorithms. 0:08:14.520 --> 0:08:18.400 This pair of processors is the neural engine. They are 0:08:18.640 --> 0:08:23.720 essentially an artificial neural network. And I've talked a little 0:08:23.760 --> 0:08:27.440 bit about artificial neural networks before, but we're really going 0:08:27.480 --> 0:08:29.400 to try and get an understanding of what makes them 0:08:29.400 --> 0:08:34.160 special today, because that's really why the neural engine means anything 0:08:34.160 --> 0:08:37.160 in the first place. So this means we get to 0:08:37.160 --> 0:08:40.080 do a quick history lesson, because this is TechStuff, 0:08:40.160 --> 0:08:42.960 and of course we have to go into the history.
0:08:43.040 --> 0:08:47.559 So here we go. Back in the nineteen forties and 0:08:47.640 --> 0:08:51.800 the nineteen fifties, there were some smart guys, one named Warren 0:08:51.840 --> 0:08:56.040 McCulloch, who was a neurophysiologist, and another guy named Walter 0:08:56.200 --> 0:08:58.960 Pitts, who was a computer scientist and a logician, and 0:08:59.000 --> 0:09:04.560 they began developing theories that brought together computational science and neuroscience, 0:09:04.600 --> 0:09:08.200 in other words, the way machines process information and the 0:09:08.240 --> 0:09:13.599 way brains process information, which is different. McCulloch wrote a 0:09:13.640 --> 0:09:16.679 couple of papers about this, and he asserted that the 0:09:16.720 --> 0:09:19.840 basic unit of logic in the brain is the neuron. 0:09:20.320 --> 0:09:24.400 So the nerve cell, the brain cell, is your 0:09:24.440 --> 0:09:28.040 basic unit of logic in a brain, so it would 0:09:28.040 --> 0:09:30.840 act kind of like a gate or a transistor in 0:09:30.880 --> 0:09:35.000 a circuit. And so you might have a transistor being 0:09:35.040 --> 0:09:40.280 the smallest unit, not a metric of logic, but the 0:09:40.320 --> 0:09:43.640 smallest unit to allow this to happen in a circuit, 0:09:44.080 --> 0:09:47.920 and neurons in the brain. Pitts and McCulloch began developing computer 0:09:48.000 --> 0:09:51.280 algorithms that attempted to guide machines to process information in 0:09:51.280 --> 0:09:54.320 a way that was at least conceptually similar to the 0:09:54.320 --> 0:09:57.640 way our brains process information. McCulloch had proposed that by 0:09:57.640 --> 0:10:00.400 doing this, you could train a machine to recognize 0:10:00.520 --> 0:10:06.000 handwritten characters like numbers or letters, even if those representations 0:10:06.280 --> 0:10:09.520 varied in size or style.
And I've talked about this 0:10:09.600 --> 0:10:13.480 being a challenge in the past as well, that training 0:10:13.559 --> 0:10:18.760 a computer to recognize a specific type of image or 0:10:18.800 --> 0:10:22.800 a specific thing in an image is challenging. So I 0:10:22.800 --> 0:10:25.200 always use coffee mugs as an example. I don't know why, 0:10:25.400 --> 0:10:27.800 but I like that particular one. So we're gonna 0:10:27.840 --> 0:10:30.880 go with it again. If you were to create a 0:10:30.920 --> 0:10:34.200 computer program where you feed an image of a coffee 0:10:34.280 --> 0:10:36.680 mug to the computer program, and you tell the computer 0:10:36.760 --> 0:10:42.679 program this image corresponds with this concept called coffee mug. 0:10:43.240 --> 0:10:47.080 And the image shows a blue coffee mug and its 0:10:47.120 --> 0:10:50.280 handle is pointed toward the right from the perspective of 0:10:50.320 --> 0:10:54.280 the viewer. And then you were to feed it a different image, 0:10:54.360 --> 0:10:56.360 maybe of that same coffee mug, but now at a 0:10:56.360 --> 0:11:01.040 different angle. Well, the machine is looking at this as 0:11:01.080 --> 0:11:05.360 if it's a totally new thing. It cannot just 0:11:05.559 --> 0:11:08.120 extricate that information and say, oh, this is also a 0:11:08.120 --> 0:11:11.120 coffee mug. Or maybe it's a different coffee mug. It's 0:11:11.120 --> 0:11:13.800 a different color or a different size or different shape. 0:11:14.840 --> 0:11:18.920 The computer doesn't understand the concept of coffee mug. So 0:11:18.960 --> 0:11:21.720 how can you teach it this concept? How can you 0:11:21.760 --> 0:11:25.480 train it so it recognizes coffee mugs? That was what 0:11:25.600 --> 0:11:29.400 McCulloch was looking at. Then you have another guy who 0:11:29.400 --> 0:11:33.480 came along, Frank Rosenblatt, very smart man, who built on 0:11:33.520 --> 0:11:37.760 this work. He developed an artificial neuron called the perceptron.
Now, 0:11:37.760 --> 0:11:41.160 a perceptron's job is, from a very high level, pretty simple. 0:11:41.280 --> 0:11:46.400 It accepts multiple binary inputs. So it accepts inputs that 0:11:46.440 --> 0:11:49.920 are either zeros or ones, and then it produces a 0:11:50.000 --> 0:11:54.440 single binary output either a zero or a one based 0:11:54.520 --> 0:11:58.199 upon processing that information. So let's say you want to 0:11:58.240 --> 0:12:01.199 create a program that can help you decide which restaurant 0:12:01.320 --> 0:12:03.920 you want to go to, and you've come up with 0:12:04.040 --> 0:12:07.560 three criteria that you think are really important in order 0:12:07.559 --> 0:12:10.560 for you to make this decision. And the three criteria 0:12:10.760 --> 0:12:14.160 you have are is the restaurant within a twenty minute 0:12:14.240 --> 0:12:19.439 drive or less? So, is it relatively close? Will a 0:12:19.440 --> 0:12:23.080 meal cost less than fifty dollars for two people to 0:12:23.160 --> 0:12:27.360 have dinner there? And does the restaurant serve tacos? Those 0:12:27.400 --> 0:12:30.200 are your three points of criteria, and you can represent 0:12:30.280 --> 0:12:33.800 each of those variables with a binary figure. So, for example, 0:12:34.360 --> 0:12:37.880 you could say that if the restaurant is closer than 0:12:37.920 --> 0:12:41.160 a twenty minute drive, if it is nearby, you represent 0:12:41.240 --> 0:12:44.120 that variable with a one. If it is further away 0:12:44.160 --> 0:12:47.440 than that, it's a zero. If the dinner for two 0:12:47.520 --> 0:12:50.439 is cheaper than fifty dollars, that's a one. If it's 0:12:50.440 --> 0:12:54.920 more expensive, it's a zero. And if it serves tacos, 0:12:54.920 --> 0:12:57.840 it's a one. And if it does not serve tacos, 0:12:57.960 --> 0:13:00.680 it's a big fat zero. 
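The three yes-or-no criteria described here can be turned into binary inputs with a small helper. This is a minimal sketch: the function name `encode_restaurant`, its parameters, and the sample restaurant are hypothetical, and it assumes a drive of exactly twenty minutes still counts as close.

```python
def encode_restaurant(drive_minutes: float, dinner_for_two: float,
                      serves_tacos: bool) -> tuple[int, int, int]:
    """Map one restaurant onto the three binary inputs (close, cheap, tacos)."""
    close = 1 if drive_minutes <= 20 else 0   # assumption: exactly 20 minutes counts as close
    cheap = 1 if dinner_for_two < 50 else 0   # dinner for two under fifty dollars
    tacos = 1 if serves_tacos else 0
    return (close, cheap, tacos)


# A made-up restaurant: 15 minutes away, 65 dollars for two, serves tacos.
print(encode_restaurant(15, 65.0, True))   # (1, 0, 1)
```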
Then you have a list 0:13:00.720 --> 0:13:04.319 of various restaurants. You could feed each restaurant through your 0:13:04.320 --> 0:13:07.480 criteria and see how they do. And then you 0:13:07.480 --> 0:13:10.000 could narrow your choices this way, and perhaps there is 0:13:10.040 --> 0:13:13.280 no single restaurant that meets all those criteria, so you 0:13:13.320 --> 0:13:17.640 really should take another step. And that's where Rosenblatt introduces 0:13:17.679 --> 0:13:23.679 the concept of weights, where you change how important 0:13:23.720 --> 0:13:26.400 each of the criteria are in relation to each other. 0:13:26.480 --> 0:13:31.160 Weights are real numbers that indicate the importance of a particular criterion. 0:13:31.720 --> 0:13:36.520 So let's say, of all those three criteria you've identified, 0:13:36.720 --> 0:13:39.400 the distance, the cost, and whether or not they have tacos, 0:13:39.880 --> 0:13:43.560 you have decided the most critical piece of information is 0:13:43.600 --> 0:13:47.360 whether or not the restaurant serves tacos. So you could 0:13:47.400 --> 0:13:52.000 then assign a greater weight to that criterion, saying this 0:13:52.080 --> 0:13:54.679 is more important to me, and that will influence the 0:13:54.720 --> 0:13:58.440 output of the neuron. You must also determine a threshold 0:13:58.520 --> 0:14:02.160 value for the decision. In other words, you say, in 0:14:02.280 --> 0:14:05.920 order to produce a positive result, to tell me, yes, 0:14:05.960 --> 0:14:08.640 this is a restaurant you should go to, you must 0:14:08.720 --> 0:14:13.240 at least meet this threshold. That's the minimum value the 0:14:13.280 --> 0:14:15.880 calculation has to meet or exceed in order to produce 0:14:15.960 --> 0:14:19.280 a go to this restaurant result. I'll explain a bit 0:14:19.280 --> 0:14:21.720 more about this in just a second. But first I'm 0:14:21.720 --> 0:14:24.440 going to take a quick break and thank our sponsors.
0:14:32.320 --> 0:14:35.240 That threshold value that I mentioned before the break is 0:14:35.280 --> 0:14:38.360 really important because it tells your model what sort of 0:14:38.400 --> 0:14:42.920 results count as valid versus not valid. So let's say 0:14:43.040 --> 0:14:45.920 I've weighted the criteria so that the distance to the 0:14:46.000 --> 0:14:49.040 restaurant and the expense of the meal each have a 0:14:49.120 --> 0:14:53.120 weight of two, but the presence of tacos is a six. 0:14:53.680 --> 0:14:56.240 That's how important I think tacos are. And I've set 0:14:56.240 --> 0:14:59.240 a threshold of four. Well, that means that if the 0:14:59.280 --> 0:15:04.240 restaurant is relatively close and it's relatively inexpensive, it's going 0:15:04.280 --> 0:15:06.400 to pass my criteria because I've given a weight of 0:15:06.440 --> 0:15:09.560 two for both of those, and added together that's four. 0:15:09.640 --> 0:15:12.320 It equals the threshold. Good to go. But even if 0:15:12.360 --> 0:15:16.040 the restaurant is far away and even if it's expensive, 0:15:16.720 --> 0:15:20.520 if it serves tacos, it still passes my criteria because 0:15:20.520 --> 0:15:23.760 I gave the tacos a weight of six. Raising the 0:15:23.800 --> 0:15:29.160 threshold value reduces the number of valid restaurants. So if 0:15:29.200 --> 0:15:32.920 I make the threshold eight instead of four, now the 0:15:33.080 --> 0:15:36.160 only way I can get a valid result, a result 0:15:36.200 --> 0:15:39.240 of yes, go to this restaurant, is if the restaurant 0:15:39.360 --> 0:15:44.760 has tacos and it's either close by, or it's inexpensive, 0:15:45.160 --> 0:15:47.800 or both. And if I set the threshold to ten, 0:15:48.680 --> 0:15:51.840 all three criteria would need to be met for this 0:15:51.880 --> 0:15:55.720 option to be valid.
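The effect of raising the threshold can be sketched by listing every combination of the three binary inputs and keeping the ones whose weighted sum meets or exceeds the threshold. The weights (two, two, six) and thresholds (four, eight, ten) come from the example above; the function names are illustrative only.

```python
from itertools import product

WEIGHTS = (2, 2, 6)  # (close, cheap, tacos), as in the example above


def passes(inputs, threshold, weights=WEIGHTS):
    """True when the weighted sum meets or exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total >= threshold


def valid_combinations(threshold):
    """Every (close, cheap, tacos) combination that clears the threshold."""
    return [c for c in product((0, 1), repeat=3) if passes(c, threshold)]


for threshold in (4, 8, 10):
    print(threshold, valid_combinations(threshold))
# 4 [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
# 8 [(0, 1, 1), (1, 0, 1), (1, 1, 1)]
# 10 [(1, 1, 1)]
```

With a threshold of four, tacos alone or close-plus-cheap is enough. At eight, tacos plus at least one other criterion is required. At ten, only a restaurant meeting all three passes.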
Now, in artificial intelligence, for the 0:15:55.760 --> 0:15:59.200 purposes of notation, many people will move the threshold value 0:15:59.440 --> 0:16:01.880 to the other side of the equation, and in this 0:16:01.920 --> 0:16:04.760 case we now call it a bias, and a bias 0:16:04.880 --> 0:16:07.360 essentially is a measurement to tell you how easy or 0:16:07.440 --> 0:16:10.640 difficult it is to get the perceptron to fire off 0:16:10.720 --> 0:16:14.520 a positive value. If you have a big positive bias, 0:16:14.640 --> 0:16:17.640 that means it's easier for the perceptron to produce a 0:16:17.680 --> 0:16:22.400 positive result, a one. A large negative bias does the opposite, 0:16:22.840 --> 0:16:25.400 and thus you would get a zero. So we can 0:16:25.480 --> 0:16:29.480 write out the perceptron's rules like this. Take the value 0:16:29.680 --> 0:16:32.440 of a variable, which is either going to be a 0:16:32.520 --> 0:16:36.120 zero or a one. It will be binary. You multiply 0:16:36.440 --> 0:16:40.680 the value of this variable by the weight of that variable, 0:16:41.280 --> 0:16:47.800 and weights can be different values. Let's say that the 0:16:49.000 --> 0:16:52.000 distance and expense are both weighted at two. Tacos gets 0:16:52.000 --> 0:16:56.560 a big hefty six. You're going to add your various 0:16:56.760 --> 0:17:00.040 weighted variable results together, and then you add the bias 0:17:00.320 --> 0:17:03.160 for the perceptron. And in our example, the bias 0:17:03.360 --> 0:17:08.000 is a minus six. That's to tell us that in 0:17:08.200 --> 0:17:11.840 order for this perceptron to fire, you 0:17:11.840 --> 0:17:14.240 have to be able to factor in that minus six 0:17:14.359 --> 0:17:17.600 and beat it. So if after adding these elements together, 0:17:18.200 --> 0:17:21.800 you get a result that is zero or lower, the 0:17:21.880 --> 0:17:24.439 output is a zero or a negative, saying, don't go 0:17:24.480 --> 0:17:27.280 to this restaurant.
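The rule just described, multiply each input by its weight, add the results, add the bias, and fire only if the total is greater than zero, fits in a few lines. This is a sketch rather than any particular library's API: the name `perceptron` and the three sample input tuples are hypothetical, while the weights (two, two, six) and the bias (minus six) are the ones from the example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus the bias is greater than zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


WEIGHTS = (2, 2, 6)   # (close, cheap, tacos)
BIAS = -6

print(perceptron((1, 1, 0), WEIGHTS, BIAS))  # 2 + 2 + 0 - 6 = -2, so 0
print(perceptron((0, 0, 1), WEIGHTS, BIAS))  # 0 + 0 + 6 - 6 = 0, not above zero, so 0
print(perceptron((1, 0, 1), WEIGHTS, BIAS))  # 2 + 0 + 6 - 6 = 2, so 1
```

Note that with a bias of minus six, tacos alone lands exactly on zero and does not fire, because the rule asks for a result strictly greater than zero.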
So after adding that negative six, if 0:17:27.320 --> 0:17:29.920 you have a zero or less, you don't go. If 0:17:29.920 --> 0:17:32.000 you get a result that's greater than zero, it's a 0:17:32.040 --> 0:17:34.840 positive result. It says, go to that restaurant. So here 0:17:34.840 --> 0:17:38.000 in our hypothetical perceptron, we've decided on a bias of 0:17:38.000 --> 0:17:40.760 minus six, and we take our three variables as we 0:17:40.840 --> 0:17:44.240 examine a single restaurant. So this restaurant is twenty five 0:17:44.240 --> 0:17:47.800 minutes away. So that means for our first variable, which 0:17:47.880 --> 0:17:51.159 is all about distance, it gets a zero because it 0:17:51.240 --> 0:17:53.919 is further than twenty minutes away. So that variable is 0:17:53.920 --> 0:17:57.160 a zero. And we multiply the variable times the weight. 0:17:57.560 --> 0:18:00.720 The weight is two for that particular variable. Two times 0:18:00.840 --> 0:18:05.040 zero is zero. Then I look and I see that 0:18:05.119 --> 0:18:07.240 dinner for two at that restaurant's gonna set me back 0:18:07.359 --> 0:18:10.640 thirty dollars, and that's below the limit we had set 0:18:10.680 --> 0:18:13.240 of fifty dollars. So that means the value of the 0:18:13.320 --> 0:18:16.160 variable is one. It is cheaper than fifty dollars, so 0:18:16.200 --> 0:18:19.720 that gets a one. The weight for this variable is 0:18:19.760 --> 0:18:22.840 two, so multiply the weight times the variable. Two times 0:18:22.880 --> 0:18:26.800 one is two. Then we have the question, does the 0:18:26.880 --> 0:18:29.800 restaurant serve tacos? And I know you're dying to know this. 0:18:30.240 --> 0:18:34.560 I'm glad to report the restaurant does in fact serve tacos, 0:18:35.040 --> 0:18:38.080 and that means that the variable is a one. It's positive, 0:18:38.600 --> 0:18:41.159 and we weighted this variable very heavily with a six, 0:18:41.359 --> 0:18:44.800 so six times one is six.
Now we have to 0:18:44.840 --> 0:18:49.119 add all of those results together, so we have zero 0:18:49.280 --> 0:18:52.280 from the first one, two from the second one, six 0:18:52.480 --> 0:18:55.200 from the third one. Add that together, you get eight. 0:18:55.840 --> 0:18:58.199 Now we have to add in the bias, and the 0:18:58.240 --> 0:19:02.680 bias for this perceptron is a minus six. Eight plus 0:19:02.720 --> 0:19:06.240 minus six gives us a final value of two. Two 0:19:06.359 --> 0:19:09.640 is greater than zero. So by the rules we have established, 0:19:09.680 --> 0:19:13.120 the perceptron says this is a positive result and fires 0:19:13.119 --> 0:19:15.439 off a one. So the restaurant we fed to the 0:19:15.440 --> 0:19:18.800 perceptron met the criteria based on that bias. Now, if 0:19:18.840 --> 0:19:24.520 our bias had been minus ten or minus nine, we 0:19:24.520 --> 0:19:28.159 would not have produced this positive result. We would have gotten 0:19:28.640 --> 0:19:31.199 a negative number, and it would have said no. 0:19:31.840 --> 0:19:34.840 So that bias is very important, as is the weight 0:19:35.080 --> 0:19:38.600 of the various variables. And that is one neuron. Now 0:19:38.640 --> 0:19:41.439 you can actually create layers of neurons. That's why we 0:19:41.480 --> 0:19:45.240 call it an artificial neural network, not just an artificial neuron. 0:19:45.920 --> 0:19:49.400 And by doing that you can have results from one 0:19:49.640 --> 0:19:55.480 neuron's decisions feed directly into another neuron. Also, a perceptron 0:19:55.600 --> 0:19:59.560 can perform as a type of logical gate called a 0:19:59.760 --> 0:20:04.359 NAND gate, N-A-N-D. That stands for 0:20:04.800 --> 0:20:08.480 NOT AND. It's a type of logical gate that 0:20:08.560 --> 0:20:13.000 produces a false or negative output if all its inputs 0:20:13.200 --> 0:20:16.720 are true or positive.
So, in other words, with the 0:20:16.800 --> 0:20:20.840 right weights and biases, a perceptron will produce an output 0:20:20.920 --> 0:20:24.679 of zero if all of its inputs are ones. The 0:20:24.800 --> 0:20:29.240 NAND gate in computer science is a universal gate because 0:20:29.600 --> 0:20:33.520 you can use different creations and combinations of NAND gates 0:20:34.080 --> 0:20:36.840 and build any kind of computation. You just have to 0:20:36.880 --> 0:20:39.280 link them together properly in order to do it. It's 0:20:39.280 --> 0:20:41.679 not always the most efficient way to do this, but 0:20:41.800 --> 0:20:45.600 it does work. So if you had a perceptron that accepted 0:20:45.720 --> 0:20:48.760 two variables, each with a weight of minus two, and 0:20:48.800 --> 0:20:51.680 the perceptron had a bias of three, it would act 0:20:51.760 --> 0:20:55.480 like a NAND gate. That's because if both variables are one, 0:20:56.080 --> 0:20:58.560 then the first term in the equation you'd use to determine the output 0:20:58.560 --> 0:21:02.600 would be minus two, because you multiply the weight of 0:21:02.680 --> 0:21:05.560 minus two times the variable of one, and then you 0:21:05.720 --> 0:21:09.040 have to add a second minus two because the second 0:21:09.080 --> 0:21:12.359 variable is the same way. And then you would add 0:21:12.359 --> 0:21:15.080 the bias, which is three. But minus two plus minus 0:21:15.119 --> 0:21:18.600 two is minus four. You add in plus three, you 0:21:18.640 --> 0:21:21.320 get minus one as the result. Minus one is 0:21:21.400 --> 0:21:23.760 less than zero, which means the output for the perceptron 0:21:23.960 --> 0:21:27.000 must be zero as opposed to one. You get a 0:21:27.040 --> 0:21:31.560 false or an off or a zero result. Two positive 0:21:31.560 --> 0:21:34.800 inputs create a negative output, one of the few times you 0:21:34.840 --> 0:21:37.639 can say two positives make a negative.
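The NAND behavior can be verified by running all four input pairs through a perceptron with the weights (minus two each) and bias (three) given above. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product


def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus the bias is greater than zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def nand(a, b):
    """Perceptron with two weights of -2 and a bias of 3, as in the example above."""
    return perceptron((a, b), (-2, -2), 3)


for a, b in product((0, 1), repeat=2):
    print(a, b, nand(a, b))
# 0 0 1
# 0 1 1
# 1 0 1
# 1 1 0
```

Only the last row, where both inputs are one, produces a zero, which is exactly the NAND truth table.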
Now that means 0:21:37.920 --> 0:21:42.760 we can ask progressively more complicated questions, with each perceptron 0:21:42.840 --> 0:21:46.480 handling one aspect of that question and feeding into another 0:21:46.560 --> 0:21:50.320 layer of perceptrons. Each perceptron will produce either a positive 0:21:50.400 --> 0:21:52.440 or a negative result, so you either get a one 0:21:52.560 --> 0:21:55.680 or a zero, and these results will feed into other 0:21:55.760 --> 0:21:58.400 neurons in the network, which will use them to perform 0:21:58.560 --> 0:22:02.280 their own calculations with their own weights and their own biases. 0:22:02.800 --> 0:22:05.480 All of this is to feed those questions through a 0:22:05.560 --> 0:22:07.840 network to produce a result, and I should be clear, 0:22:08.320 --> 0:22:11.000 the weights for each variable along this path can change 0:22:11.400 --> 0:22:13.880 from one part of the decision making process to the next. 0:22:13.920 --> 0:22:18.360 We're not just talking about identical perceptrons all through the network, 0:22:18.760 --> 0:22:21.320 and that last bit is the most important part, because 0:22:21.520 --> 0:22:24.359 if this were just a matter of setting biases and 0:22:24.400 --> 0:22:27.600 weights and building out a network of perceptrons, there'd be 0:22:27.640 --> 0:22:30.879 nothing special about it, because we already have NAND gates. 0:22:31.760 --> 0:22:35.240 They existed before perceptrons. It would just mean that we 0:22:35.320 --> 0:22:39.119 have a different way to implement something we could already do, 0:22:39.440 --> 0:22:41.320 and finding a new way to do something you were 0:22:41.359 --> 0:22:45.480 already doing is rarely super transformative. You might be able 0:22:45.520 --> 0:22:48.760 to make it a better way of doing the same thing, 0:22:48.840 --> 0:22:52.080 but in this case it might be less efficient than 0:22:52.119 --> 0:22:54.680 the old way.
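To make the idea of perceptrons feeding into other perceptrons concrete, here is a small sketch that wires four NAND-style perceptrons into layers to compute exclusive-or (XOR), something a single perceptron of this kind cannot do alone. The wiring is a standard NAND construction chosen for illustration, not something described in the episode, and it uses identical perceptrons throughout, whereas a trained network would generally have different weights and biases at each neuron.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus the bias is greater than zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def nand(a, b):
    """NAND-style perceptron: weights of -2 and -2, bias of 3."""
    return perceptron((a, b), (-2, -2), 3)


def xor(a, b):
    """Three layers of NAND perceptrons, each layer feeding the next."""
    first = nand(a, b)            # layer 1
    left = nand(a, first)         # layer 2
    right = nand(b, first)        # layer 2
    return nand(left, right)      # layer 3 (output)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```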
However, there is something else that makes 0:22:54.680 --> 0:22:58.600 these perceptrons special, and that's by pairing them with those 0:22:58.640 --> 0:23:02.280 special algorithms that McCulloch and Pitts were proposing back in 0:23:02.320 --> 0:23:06.040 the forties and fifties. These would be learning algorithms. These 0:23:06.040 --> 0:23:10.800 algorithms are instructions that can, based upon external stimuli, dynamically 0:23:10.880 --> 0:23:15.240 and automatically tune the weights and biases of perceptrons in 0:23:15.320 --> 0:23:18.800 a neural network. In other words, a program can guide 0:23:18.960 --> 0:23:22.840 the network so that it learns how to solve problems. 0:23:22.880 --> 0:23:26.639 But how? Well, it all comes down to making small 0:23:26.800 --> 0:23:30.399 changes in those weights and biases in order to fine 0:23:30.440 --> 0:23:33.680 tune outputs. So let's say we're working on an image 0:23:33.720 --> 0:23:37.119 recognition algorithm. That's one of the big things that the 0:23:37.160 --> 0:23:40.879 neural engine in Apple's iPhones does. That's one of 0:23:40.920 --> 0:23:43.920 its main purposes. So in our example, let's say we're 0:23:43.960 --> 0:23:49.479 training the neural network to recognize handwritten printed lowercase letters. 0:23:49.520 --> 0:23:52.240 It's very similar to what McCulloch was talking about. But 0:23:52.359 --> 0:23:55.960 let's say our model is having trouble differentiating a lowercase 0:23:56.280 --> 0:24:00.480 L from a lowercase I. It was just having issues 0:24:00.880 --> 0:24:04.280 being able to tell those two apart in particular. Now 0:24:04.440 --> 0:24:07.320 we've got a specific example in which our model is 0:24:07.400 --> 0:24:10.679 misidentifying an L as an I.
Let's say, in the 0:24:10.760 --> 0:24:14.199 hypothetical situation, we decide we're gonna make some 0:24:14.280 --> 0:24:18.280 minor tweaks in the weights and biases earlier on in 0:24:18.320 --> 0:24:22.920 the artificial neural network to guide our network so that 0:24:22.960 --> 0:24:27.000 it can more readily tell the difference between lowercase 0:24:27.000 --> 0:24:29.239 Ls and lowercase Is. And we get our 0:24:29.280 --> 0:24:31.760 model closer to being able to tell that difference. We 0:24:31.880 --> 0:24:35.400 keep making these small adjustments until we get more consistent output. 0:24:35.840 --> 0:24:38.320 The network as a whole is said to quote unquote 0:24:38.520 --> 0:24:41.639 learn through this process. It's getting better at creating an 0:24:41.680 --> 0:24:45.359 output that's more reflective of reality. But there's a bit of 0:24:45.359 --> 0:24:47.920 a problem, and anyone who has worked in QA has 0:24:47.960 --> 0:24:52.200