WEBVTT - AI Defeats Anti-AI Test

0:00:03.720 --> 0:00:07.000
<v Speaker 1>What happens when an artificially intelligent network figures out how

0:00:07.040 --> 0:00:10.240
<v Speaker 1>to beat your anti-AI test? Time to come up

0:00:10.280 --> 0:00:13.360
<v Speaker 1>with a new test. I'm Jonathan Strickland, and this is

0:00:13.400 --> 0:00:19.599
<v Speaker 1>Tech Stuff Daily. If you've ever filled out any sort

0:00:19.640 --> 0:00:23.000
<v Speaker 1>of online form, you've probably encountered a CAPTCHA test. The

0:00:23.079 --> 0:00:26.400
<v Speaker 1>word stands for a Completely Automated Public Turing test to

0:00:26.480 --> 0:00:29.360
<v Speaker 1>tell computers and humans apart. It's a bit of a mouthful,

0:00:29.400 --> 0:00:32.000
<v Speaker 1>so I'm glad we can stick with CAPTCHA. Perhaps the

0:00:32.000 --> 0:00:34.680
<v Speaker 1>most interesting part of that name is the Turing test.

0:00:35.120 --> 0:00:38.839
<v Speaker 1>Named after computer scientist and codebreaker Alan Turing, the

0:00:38.840 --> 0:00:42.640
<v Speaker 1>Turing test has evolved into an interesting concept. The Turing

0:00:42.640 --> 0:00:45.320
<v Speaker 1>test of today is a scenario in which a human

0:00:45.360 --> 0:00:49.080
<v Speaker 1>interrogator uses some sort of interface, often a computer terminal,

0:00:49.320 --> 0:00:52.760
<v Speaker 1>to communicate with someone else. That someone else might be

0:00:52.840 --> 0:00:56.160
<v Speaker 1>a human being, or it might be a computer program.

0:00:56.200 --> 0:00:59.800
<v Speaker 1>If enough human interrogators are fooled into thinking the computer

0:00:59.800 --> 0:01:02.960
<v Speaker 1>program is actually a person, that program is said to

0:01:02.960 --> 0:01:05.760
<v Speaker 1>have passed the Turing test. The program has appeared to

0:01:05.800 --> 0:01:08.840
<v Speaker 1>be indistinguishable from a human being to a significant percentage

0:01:08.840 --> 0:01:11.679
<v Speaker 1>of interrogators. In the case of CAPTCHAs, the Turing test

0:01:11.800 --> 0:01:14.399
<v Speaker 1>is a sort of shorthand for a gateway that is

0:01:14.400 --> 0:01:18.040
<v Speaker 1>meant to allow humans to pass through while stopping artificially

0:01:18.040 --> 0:01:21.959
<v Speaker 1>intelligent constructs at the border. The general idea behind the

0:01:22.000 --> 0:01:24.800
<v Speaker 1>CAPTCHA test is that it should be something relatively simple

0:01:24.840 --> 0:01:27.920
<v Speaker 1>for a human to complete, but very challenging for a

0:01:27.959 --> 0:01:31.000
<v Speaker 1>program to do. This is important because you don't want

0:01:31.000 --> 0:01:33.360
<v Speaker 1>a CAPTCHA to be challenging for humans to complete, since

0:01:33.360 --> 0:01:35.400
<v Speaker 1>that would discourage people from filling out the form in

0:01:35.440 --> 0:01:37.640
<v Speaker 1>the first place. It doesn't do you any good if

0:01:37.640 --> 0:01:40.280
<v Speaker 1>you prevent anyone from going through the gate. There are

0:01:40.319 --> 0:01:42.880
<v Speaker 1>some things that robots and software can do much better

0:01:42.920 --> 0:01:46.440
<v Speaker 1>than humans, such as drive a car under normal conditions

0:01:46.600 --> 0:01:48.840
<v Speaker 1>or play a game of chess. But there are some

0:01:48.960 --> 0:01:51.840
<v Speaker 1>things that we humans can do pretty easily that AI

0:01:51.960 --> 0:01:54.920
<v Speaker 1>has a really tough time handling. At least that's how

0:01:55.000 --> 0:01:57.960
<v Speaker 1>things stand now. AI is getting better all the time,

0:01:58.000 --> 0:01:59.920
<v Speaker 1>and we humans seem to have stalled out a

0:02:00.000 --> 0:02:04.160
<v Speaker 1>little bit, evolutionarily speaking. A simple CAPTCHA example that you've

0:02:04.240 --> 0:02:06.520
<v Speaker 1>likely seen would be a series of letters and numbers

0:02:06.560 --> 0:02:10.600
<v Speaker 1>that are deformed or semi-obscured by other shapes. As humans,

0:02:10.639 --> 0:02:13.720
<v Speaker 1>we can recognize which shapes represent letters or numbers and

0:02:13.760 --> 0:02:16.639
<v Speaker 1>which are just distractions, but computers have a lot harder

0:02:16.639 --> 0:02:20.359
<v Speaker 1>time separating the signal from the noise. Enter the AI

0:02:20.480 --> 0:02:24.400
<v Speaker 1>research firm Vicarious, a company that has received funding from

0:02:24.440 --> 0:02:28.919
<v Speaker 1>such tech luminaries as Mark Zuckerberg and Jeff Bezos. Vicarious

0:02:29.000 --> 0:02:32.519
<v Speaker 1>created an artificial neural network that the firm claims can

0:02:32.560 --> 0:02:35.920
<v Speaker 1>solve CAPTCHA puzzles. That's pretty impressive considering some of the

0:02:35.960 --> 0:02:39.760
<v Speaker 1>trickier puzzles can stump human beings. Google's reCAPTCHA has an

0:02:39.760 --> 0:02:44.000
<v Speaker 1>eighty-seven percent solution rate among humans. The neural network

0:02:44.080 --> 0:02:46.359
<v Speaker 1>can suss out the shape of letters and numbers while

0:02:46.360 --> 0:02:49.959
<v Speaker 1>ignoring all that distracting noise. A neural network mimics the

0:02:50.000 --> 0:02:53.400
<v Speaker 1>way our brains work. Instead of neurons, the nerve cells

0:02:53.440 --> 0:02:56.800
<v Speaker 1>in our brains, the network uses networked computer systems. The

0:02:56.880 --> 0:03:00.040
<v Speaker 1>artificial neurons attack problems in layers, with each neuron

0:03:00.120 --> 0:03:03.079
<v Speaker 1>effectively tackling a different part of the problem, and then

0:03:03.120 --> 0:03:05.400
<v Speaker 1>combining all of those solutions together to come up with

0:03:05.440 --> 0:03:08.840
<v Speaker 1>an overall solution to the CAPTCHA. These networks aren't just

0:03:08.919 --> 0:03:12.200
<v Speaker 1>automatically good at completing tasks. You have to actually teach

0:03:12.240 --> 0:03:14.120
<v Speaker 1>them how to do it, just as you would a

0:03:14.200 --> 0:03:17.480
<v Speaker 1>human being. Back in two thousand twelve, folks across the

0:03:17.520 --> 0:03:20.160
<v Speaker 1>Internet reacted in glee at the news that a network

0:03:20.200 --> 0:03:24.720
<v Speaker 1>of sixteen thousand computers had learned how to recognize cats

0:03:24.960 --> 0:03:29.120
<v Speaker 1>by examining ten million images culled from YouTube videos. What

0:03:29.240 --> 0:03:32.760
<v Speaker 1>could be more representative of the modern online experience? But

0:03:32.840 --> 0:03:36.600
<v Speaker 1>that example illustrates how complicated this process is. It took

0:03:36.680 --> 0:03:39.880
<v Speaker 1>thousands of computers working together to learn how to identify

0:03:39.920 --> 0:03:43.360
<v Speaker 1>a cat based off a sample size of ten million images.

0:03:43.840 --> 0:03:46.360
<v Speaker 1>So while it is possible to teach computers to do

0:03:46.440 --> 0:03:51.200
<v Speaker 1>things like recognize cats or CAPTCHAs, it's not easy. Humans

0:03:51.200 --> 0:03:55.760
<v Speaker 1>are still better than machines are at learning. Unfortunately,

0:03:56.400 --> 0:03:59.600
<v Speaker 1>CAPTCHA tests aren't really about learning, or else we would

0:03:59.640 --> 0:04:02.600
<v Speaker 1>still have a leg up on the robotic competition. As

0:04:02.640 --> 0:04:06.480
<v Speaker 1>AI has become better at recognizing CAPTCHAs, designers have created

0:04:06.520 --> 0:04:11.160
<v Speaker 1>more difficult puzzles. That obviously impacts both humans and non-humans.

0:04:11.720 --> 0:04:16.599
<v Speaker 1>According to researchers with Vicarious, since "Vicarious researchers" sounds weird,

0:04:17.040 --> 0:04:20.320
<v Speaker 1>even with new and improved CAPTCHAs, their software can solve

0:04:20.360 --> 0:04:22.800
<v Speaker 1>the puzzles sixty-six point six percent of the time

0:04:22.839 --> 0:04:26.320
<v Speaker 1>with Google's reCAPTCHA system, fifty-seven point four percent of

0:04:26.360 --> 0:04:29.480
<v Speaker 1>the time for Yahoo's CAPTCHAs, and fifty-seven point one

0:04:29.520 --> 0:04:33.880
<v Speaker 1>percent of the time with PayPal's CAPTCHAs. According to security experts,

0:04:33.920 --> 0:04:36.200
<v Speaker 1>this means we can expect to see hackers make use

0:04:36.240 --> 0:04:39.040
<v Speaker 1>of those same techniques within a few months. That will

0:04:39.080 --> 0:04:41.600
<v Speaker 1>mean the CAPTCHA will cease to be an effective gatekeeper

0:04:41.640 --> 0:04:45.000
<v Speaker 1>to keep bots out. We may see more creative approaches.

0:04:45.160 --> 0:04:48.120
<v Speaker 1>For example, some sites use a short written passage followed

0:04:48.120 --> 0:04:50.760
<v Speaker 1>by a simple question about the reading material to test

0:04:50.839 --> 0:04:54.440
<v Speaker 1>for humanness. Eventually, AI and neural networks may be

0:04:54.440 --> 0:04:57.360
<v Speaker 1>good enough to navigate those systems that we humans are

0:04:57.440 --> 0:05:00.599
<v Speaker 1>used to without any trouble at all. At that point,

0:05:00.600 --> 0:05:02.800
<v Speaker 1>we may need to re-examine how we protect sites

0:05:02.839 --> 0:05:05.919
<v Speaker 1>and services from abuse. Or maybe by then we'll have

0:05:06.080 --> 0:05:09.440
<v Speaker 1>merged with the machines and entered the singularity. To learn

0:05:09.480 --> 0:05:13.000
<v Speaker 1>more about artificial intelligence and how machine and human intelligence

0:05:13.040 --> 0:05:15.840
<v Speaker 1>may one day merge into one, subscribe to the Tech

0:05:15.880 --> 0:05:20.159
<v Speaker 1>Stuff podcast. We also talk about other tech, like catapults,

0:05:20.160 --> 0:05:23.480
<v Speaker 1>sort of like a grab bag of awesome tech topics.

0:05:23.960 --> 0:05:26.400
<v Speaker 1>The show publishes on Wednesdays and Fridays, and it's a

0:05:26.440 --> 0:05:29.800
<v Speaker 1>deep dive into all things tech. I'll see you again soon.