Speaker 1: Welcome to Tech Stuff, a production from iHeartRadio.

Hey there, and welcome to Tech Stuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeart Podcasts. And how the tech are you? I thought today we could do a real quickie, because, you know, sometimes it's nice just to do a short thing to talk about a subject in tech. And there are a whole bunch of Tech Stuff episodes in which I have talked about the Turing test. There are a lot of different variations of the Turing test. It's based off a thought experiment from Alan Turing, the computer scientist, very influential, very important in World War Two. He helped crack the Enigma code. And actually, the movie that sort of depicted his efforts in cracking Enigma is called The Imitation Game. Well, that title is a reference to the Turing test itself; the imitation game is what Turing called his test. He kind of proposed this test when people would ask him if he thought machines would be capable of thought. Now keep in mind, this is back in the forties and fifties: do you think that machines will be capable of thinking? And he said, I don't really think that's a very interesting question. For one thing, I don't know that there's any meaningful way to answer it. However, I do think we can be a little more precise if we think about it in terms of a thought experiment, a test.

So imagine this is the situation you find yourself in. You go into a room and there's a computer terminal there, and that's it. You can't see into any other rooms or anything. It's just a desk with a computer terminal. You sit down at this terminal and there's a little prompt there that lets you get into a chat session. You enter into this chat session, and you have five minutes, and you can ask the person on the other end of the chat session any questions you want within that five-minute time frame.
And once those five minutes are up, you're asked to determine: was the person on the other end of the chat an actual human being, or was it a computer program, some form of artificial intelligence? A bot, is what we would call it today. And if you are unable to determine, to any reliable degree, whether the subject on the other end of the chat is human or a bot, then you could say that that program passed the Turing test: I could find no way of telling the difference between that computer program and an actual person. Turing suggested that, due to advancements in computer science, people would have at best a success rate of around seventy percent at telling whether the quote unquote person on the other end of the chat was a human being or a computer program, and he expected that to be the case in about fifty years' time. It took a little bit longer than that, but I would say that with the sophistication we've reached with chatbots these days, you could fairly conclusively say that we've got programs out there that can quote unquote beat the Turing test. Part of the problem is that Turing was saying that in the future these programs were going to be sophisticated enough to fool people into thinking they're another person. He wasn't saying, oh, you have to meet this specific threshold for your system to have beaten the Turing test. That would come afterward; other people would create the criteria. Since then, people have used the phrase Turing test to reference essentially any kind of task designed to determine whether a machine has, or at least appears to have, the property of intelligence, and when I say that, I mean really general intelligence.
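If you wanted to turn that judging setup into something concrete, here is a minimal sketch in Python, assuming you have collected verdicts from a series of five-minute sessions. The Session type and function name are illustrative assumptions; the 0.70 cutoff is the figure from Turing's prediction mentioned above.

```python
# A minimal sketch of the pass criterion described above. The names here
# are hypothetical stand-ins, not any standard benchmark's API.
from dataclasses import dataclass

@dataclass
class Session:
    judged_human: bool  # the interrogator's verdict after five minutes
    was_human: bool     # what was actually on the other end

def passes_turing_test(sessions: list[Session], cutoff: float = 0.70) -> bool:
    """True if interrogators can't reliably tell the machine from a person."""
    correct = sum(s.judged_human == s.was_human for s in sessions)
    # Turing predicted an average interrogator would have no more than a
    # 70 percent chance of a correct identification after five minutes.
    return correct / len(sessions) <= cutoff

# Three mock sessions: the judges were right twice out of three (about 67%),
# which falls at or below the cutoff, so this machine "passes".
trials = [Session(True, False), Session(False, False), Session(True, True)]
print(passes_turing_test(trials))  # True
```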
But there's another specific use of Turing tests that I would like to bring up today, and that is the Completely Automated Public Turing test to tell Computers and Humans Apart, which, once you turn it into an acronym, becomes CAPTCHA. These are those little tasks that you occasionally encounter on certain websites, and they require you to do something like type in a string of characters that are displayed on screen, usually deformed in some way and set against a crazy background. Or you might be given a big selection of images and told to pick out all the ones that have a cat in them or something. Or you might have to drag a little picture of a puzzle piece into an image where it fits into a very specific spot. All of these are meant to separate out actual human visitors to a website or service from all the automated programs, or bots, or whatever you might want to call them. So I thought it would be fun to do a quick episode on where CAPTCHAs came from and what purpose they serve (I kind of touched on that already), but also how they fit into the grand picture of artificial intelligence, because, interestingly, they play a pretty important part. They have helped drive the development and advancement of artificial intelligence, not necessarily in a way that is helpful to everybody out there, but it has certainly served as a way to get people thinking about how to tackle certain AI problems.

So our story actually begins with the good old website Yahoo. Y'all remember Yahoo. I mean, it's still a thing, but I remember a time when Yahoo was practically synonymous with the Internet for a lot of folks. You may not even remember this if you haven't been on Yahoo in ages, but once upon a time, Yahoo was sort of a portal to the rest of the Internet. Yahoo was kind of like a landing page.
A lot of people had it set as their homepage, so when they would go into a web browser, they'd go right into Yahoo, and you would find articles there and all sorts of other links, as well as chat rooms and of course the search engine, where you could search for other stuff online besides the stuff that just popped up on Yahoo. Well, in those chat rooms, moderators were running into a really serious problem: the chat spaces were being invaded by bots posing as people. Now, this is in two thousand. The bots were not particularly sophisticated, but they were creating a lot of spam. Like, they were jamming up chat spaces with spam messages while people were trying to chat. In some cases, they were gathering personal information of users in an effort to exploit those users in some way or another. So Yahoo didn't want this to keep going; it wasn't reflecting well on the company. They turned to the computer science department at Carnegie Mellon University to see, hey, is there some way that we could, you know, kind of have a bouncer out front, a gatekeeper if you will, that would allow humans into the various systems so that they could make use of them the way they were intended, but keep all the robots, all the AI programs, all the computer software or algorithms, however you want to define it, from getting access.

So a team led by Manuel Blum, and including folks like John Langford, Luis von Ahn, Nicholas Hopper, and others, tackled this challenge. They needed to come up with a test. Now, in an ideal world, the test would be a cinch for a human being to complete, but it would be a real stumper for an algorithmically driven bot. And that is the basic philosophy of CAPTCHA: make a test that humans find really easy to complete, perhaps even trivial, like it's just a mild inconvenience, as they say, but for bots it's a flat-out turn-away.
You're never going to be able to get this. Now, some of y'all might be saying something along the lines of, but Jonathan, whenever I run into CAPTCHAs these days, they're sometimes really hard. Like, it's hard to see what they spell out. I'll try and type things in three, four times and get kicked out. And you're right, that is a problem. It is something that actually is happening. It doesn't mean that you're not human. If you're having existential crises, I would like to set your mind at ease by saying you're probably human. I mean, I don't think I could say anything for certain, but I feel fairly confident saying you're probably human. But the reason why CAPTCHAs have become really difficult in some cases, anyway, with some specific types of CAPTCHAs, is largely because other programmers figured out how to make better automated programs that can parse and respond to CAPTCHAs. So as one group of programmers figured out how to design tools to defeat a CAPTCHA, the CAPTCHA designers would go back to the drawing board to create new tests that would be more challenging for those bots. To say, well, they got good at this, let's change these things and reintroduce the CAPTCHA so that it will trip up those systems, because while they're good at what we used to use for gatekeeping, they've never run into this before. And unfortunately, that sometimes means the tests become more challenging for human beings as well. It is no longer a case where something is trivial for a human but difficult for robots, at least for certain types of CAPTCHAs. And that's particularly true if the human has some impairment, like color blindness, for example, or some other visual impairment. There are real issues in making CAPTCHAs that do what they're supposed to do, that is, weed out all the non-humans, but also be accessible to all humans, even those who might have impairments that would otherwise make it difficult or challenging to complete a CAPTCHA.
It is not an easy path to walk. We're going to take a quick break. When we come back, I'll talk more about the CAPTCHA story.

We're back. So in the early days of CAPTCHAs, they mostly took the form of distorted text printed over a busy background. And the idea was that most automated programs would not be able to recognize distorted text. Like, it would be an image, not just text characters where a bot would be able to read the code used to generate the letters and say, oh, well, those are these letters, I can replicate that and get through, no problem. You had to have something that was going to really stump them. Now, image recognition is a pretty tricky science. I've talked about it on this show before. Training computer systems to recognize images takes a lot of time and effort and lots and lots and lots of samples, so that the computer system can quote unquote learn what those images represent. Now, it's one thing to teach a computer how to recognize standard letters that are in a recognizable font. If the Internet only ever used one font, and only used one size of that font, then it would be relatively trivial for those who want to defeat CAPTCHAs, because once you train a computer vision system on what a lowercase t looks like, for example, the system would recognize a lowercase t every time one popped up. But of course, there are lots of different fonts and typefaces on the Internet, and they come in different sizes and colors and on different backgrounds. So teaching a computer system what a Times New Roman lowercase t looks like against a blank background doesn't mean it's also going to recognize a lowercase t in some other font on some crazy background.
Plus, maybe the t is a little wavy, a little distorted. Distorting that text makes it more challenging for image recognition systems, because they're looking for defining features to be able to match the image of a letter with the actual letter. You see, when we teach a human what something looks like, it's a lot easier for humans to associate other things that look kind of the way the first example did, but maybe not exactly the same. In other words, the example I always use is coffee mugs, right? If I show you a coffee mug and I say, this is a coffee mug, and then I show you a second kind that looks totally different, different color, different size, you know, maybe it has different writing on it, whatever it might be, and I say, this is also a coffee mug, and then I show you a third example that looks unlike the first two, you could say, oh, okay, I get the idea. I get the different features that make up what a coffee mug is. I understand now. And now when I encounter different types of coffee mugs, even though they might not look anything like any of the other ones I've encountered, I know, okay, that's probably a coffee mug. Until someone says, no, that's a teacup, and then your world is turned upside down. But you get what I'm saying. Computers don't work that way. With a computer, if you teach it that an example is a thing, it doesn't necessarily understand that similar but distinctly different versions of that same thing fall into the same category. That takes lots and lots of training. So the whole idea of distortion was that it would make it very tricky for most systems to parse that information reliably and fool the CAPTCHA system. That doesn't mean it was foolproof. Over time, those systems did get better at recognizing those figures on screen, even better than humans could in some cases, which is obviously a problem.
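To make the distortion idea concrete, here is a minimal sketch in Python using the Pillow imaging library: render some text, speckle the background, then warp each column of pixels with a sine wave. All of the parameters here are illustrative assumptions, not the settings of any real CAPTCHA system.

```python
# A toy distorted-text CAPTCHA in the spirit of the early designs.
# Sizes, wave amplitude, and noise density are assumptions for illustration.
import math
import random

from PIL import Image, ImageDraw, ImageFont

def make_captcha(text: str, width: int = 240, height: int = 80) -> Image.Image:
    img = Image.new("L", (width, height), color=255)  # white grayscale canvas
    draw = ImageDraw.Draw(img)

    # Busy background: random speckle plus a few crossing lines.
    for _ in range(600):
        draw.point((random.randrange(width), random.randrange(height)),
                   fill=random.randrange(100, 200))
    for _ in range(4):
        draw.line([(random.randrange(width), random.randrange(height)),
                   (random.randrange(width), random.randrange(height))],
                  fill=120)

    # The challenge text itself (default font keeps the sketch self-contained).
    draw.text((20, height // 3), text, fill=0, font=ImageFont.load_default())

    # Sine-wave warp: shift each column of pixels vertically, so a template
    # trained on clean, undistorted glyphs no longer lines up with the image.
    src = img.load()
    warped = Image.new("L", (width, height), color=255)
    dst = warped.load()
    for x in range(width):
        offset = int(8 * math.sin(2 * math.pi * x / 60))
        for y in range(height):
            if 0 <= y + offset < height:
                dst[x, y] = src[x, y + offset]
    return warped

make_captcha("xk7pq").save("captcha.png")
```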
Now, there have been lots of other CAPTCHA systems, not just the original CAPTCHA. For example, there's one called Asirra. Asirra did something I mentioned earlier in the episode: it would present the visitor with a collection of photographs that included cats and dogs, and it would ask you, okay, identify the pictures that have cats in them. So that was one way to get around this. It wasn't just figuring out text; it was differentiating between cats and dogs, something that, again, computer systems couldn't do natively. They had to be taught how to recognize the features that belonged to a cat versus those that belonged to a dog, just the same as all other image recognition software.

Then there's reCAPTCHA, which came out of Luis von Ahn's group at Carnegie Mellon and was later acquired by Google, and that one actually served a dual purpose. It was kind of sneaky. So with reCAPTCHA, you would go to a website and you would be greeted by some, you know, kind of grainy text, and you'd be asked to type it out. You'd actually get a couple of different words, not just one. And this text came from scans of physical books being digitized; in other words, books where they had put the page down on a scanner and created a scan. Some of these books were in, you know, pretty bad shape; they weren't all crisp, clear images. For the first word you'd be presented with, Google actually knew the answer. So let's say the word is salamander and you type in salamander, and Google says, all right, I already knew that this scanned word is salamander; this is obviously a person who has typed this in. But the second image would be a scan from a book, maybe a really smudged one, like one that's harder to read, and it would ask you what that word was. Let's say it's surgeon, and you type in surgeon. Well, the secret sauce here is that Google didn't know that that scanned word was surgeon.
What Google was doing was crowdsourcing, crowdsourcing the effort to figure out what the text in this scanned image actually said. So if you and thousands of other people all put the same word in when you encountered this particular scan, Google would say, all right, that word is very likely surgeon, because, you know, ninety-eight percent of the people who were shown this reCAPTCHA typed surgeon in, so now we know what that word is. Which meant that they could essentially transcribe these digitized texts by using the crowd to do the work for them. And that is kind of the heart of where CAPTCHA and AI meet: CAPTCHAs have been used to help train AI so that it's more effective. Like, if you've encountered other Google ones where it's like, pick all the images here that have motorcycles in them, or stairs, well, part of that is training Google's image recognition systems so that they're more accurate. Right? Like, an image recognition system might have trouble differentiating an actual stone staircase out in front of a building from a pedestrian crosswalk, because, you know, you've got those broken lines on a crosswalk; those could look like stairs to a computer image recognition system. So by giving users the task of, hey, identify all the examples in this list that have stairs in them, Google starts to train its own image recognition algorithms to be more effective and more accurate. So, in a way, we were essentially being used as free labor to make these AI systems more accurate, just so that we could get access to whatever it was we were trying to visit, whether that was an online shop or a chat room, or, you know, whatever it might be. So, yeah, we've been working for free, y'all. Actually, in some cases we've been working for free and denied access to tools that we wanted to use, because the CAPTCHAs were too hard for us to solve.
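The mechanics of that two-word trick are simple enough to sketch in code. Here is a minimal Python sketch of the flow as described above, with the caveat that every name in it (check_recaptcha, KNOWN_WORDS, the vote thresholds) is a hypothetical stand-in, not Google's actual implementation.

```python
# A sketch of the reCAPTCHA idea: one control word gates access, and the
# answer to the unknown word becomes a crowdsourced transcription vote.
from collections import Counter

KNOWN_WORDS = {"scan_001": "salamander"}    # control word with a known answer
pending_votes: dict[str, Counter] = {}      # crowd answers for unknown scans

def check_recaptcha(control_id: str, control_answer: str,
                    unknown_id: str, unknown_answer: str) -> bool:
    # Gate on the control word: only a correct answer counts as human.
    if control_answer.strip().lower() != KNOWN_WORDS[control_id]:
        return False
    # A trusted visitor's answer to the unknown scan is recorded as a vote.
    votes = pending_votes.setdefault(unknown_id, Counter())
    votes[unknown_answer.strip().lower()] += 1
    return True

def consensus(unknown_id: str, min_votes: int = 100, threshold: float = 0.98):
    """Accept a transcription once, say, 98% of enough respondents agree."""
    votes = pending_votes.get(unknown_id, Counter())
    total = sum(votes.values())
    if total < min_votes:
        return None  # not enough crowd data yet
    word, count = votes.most_common(1)[0]
    return word if count / total >= threshold else None
```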
But yeah, that's the quick story about the history and evolution of CAPTCHAs. Clearly they're still used today. Sometimes it's something simple, like click this box to prove you're human, that kind of thing where it requires you to take an action. Those obviously are much simpler for humans to complete than for robots, so those still follow the philosophy of the original CAPTCHAs. A lot of other ones, though, get pretty tricky, to the point where sometimes I'm discouraged from even going further and visiting the website in question. I'm just like, you know what, I don't need to feel stupid because I couldn't find all the fire hydrants in these photographs, so I'm just out. But yeah, that's it. And like I said, it plays a really important part with AI. It's kind of a seesaw effect, right? You create a barrier that AI can't get over, until it can, and then you have to go back and create a harder barrier. And meanwhile, the folks developing the AI keep making advancements, so the AI gets more sophisticated and powerful over time. So, yeah, it's a delicate balance, and not everybody benefits, as I said. I hope that was interesting and informative to y'all. I hope you're all doing well, and I'll talk to you again really soon.

Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.