Speaker 1: Welcome to Tech Stuff, a production from iHeartRadio.

Hey there, and welcome to Tech Stuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeart Podcasts. And how the tech are you? I thought today we could do a real quickie, because, you know, sometimes it's nice just to do a short thing to talk about a subject in tech. And there are a whole bunch of Tech Stuff episodes in which I have talked about the Turing test. There are a lot of different variations of the Turing test. It's based off a thought experiment from Alan Turing, the computer scientist, very influential, very important in World War Two. He helped crack the Enigma code. And actually, the movie that sort of depicted his efforts in cracking Enigma is called The Imitation Game. Well, that title is a reference to the Turing test itself; the imitation game is what Turing called his test. He kind of proposed this test when people would ask him if he thought machines would be capable of thought. Now keep in mind, this is back in the forties and fifties: do you think that machines will be capable of thinking? And he said, I don't really think that's a very interesting question. For one thing, I don't know that there's any meaningful way to answer it. However, I do think we can be a little more precise if we think about it in terms of a thought experiment, a test.

So imagine this is the situation you find yourself in. You go into a room and there's a computer terminal there, and that's it. You can't see into any other rooms or anything. It's just a desk with a computer terminal. You sit down at this terminal and there's a little prompt there that lets you get into a chat session. You enter into this chat session, and you have five minutes, and you can ask the person on the other end of the chat session any questions you want within that five-minute time frame.
And once those five minutes are up, you're asked to determine: was the person on the other end of the chat an actual human being, or was it a computer program, some form of artificial intelligence? A bot, is what we would call it today. And if you are unable to determine, to any reliable degree, whether the subject on the other end of the chat is human or a bot, then you could say that that program passed the Turing test: I could find no way of telling the difference between that computer program and an actual person. Turing suggested that, due to advancements in computer science, people would have at best a success rate of around seventy percent at telling whether the quote unquote person on the other end of the chat was a human being or a computer program, and he expected that to be the case in about fifty years' time. It took a little bit longer than that, but I would say that with the sophistication we've reached with chatbots these days, you could fairly conclusively say that we've got programs out there that can quote unquote beat the Turing test. Part of the problem is that Turing was saying that in the future these programs were going to be sophisticated enough to fool people into thinking they're another person. He wasn't saying, oh, you have to meet this specific threshold for your system to have beaten the Turing test. That would come afterward; other people would create the criteria. Since then, people have used the phrase Turing test to reference essentially any kind of task designed to determine whether a machine has, or at least appears to have, the property of intelligence, and when I say that, I mean really general intelligence.
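If you wanted to turn that judging setup into something concrete, here is a minimal sketch in Python, assuming you have collected verdicts from a series of five-minute sessions. The Session type and function name are illustrative assumptions; the 0.70 cutoff is the figure from Turing's prediction mentioned above.

```python
# A minimal sketch of the pass criterion described above. The names here
# are hypothetical stand-ins, not any standard benchmark's API.
from dataclasses import dataclass

@dataclass
class Session:
    judged_human: bool  # the interrogator's verdict after five minutes
    was_human: bool     # what was actually on the other end

def passes_turing_test(sessions: list[Session], cutoff: float = 0.70) -> bool:
    """True if interrogators can't reliably tell the machine from a person."""
    correct = sum(s.judged_human == s.was_human for s in sessions)
    # Turing predicted an average interrogator would have no more than a
    # 70 percent chance of a correct identification after five minutes.
    return correct / len(sessions) <= cutoff

# Three mock sessions: the judges were right twice out of three (about 67%),
# which falls at or below the cutoff, so this machine "passes".
trials = [Session(True, False), Session(False, False), Session(True, True)]
print(passes_turing_test(trials))  # True
```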
But there's another specific use of Turing tests that I would like to bring up today, and that is the Completely Automated Public Turing test to tell Computers and Humans Apart, which, once you turn it into an acronym, becomes CAPTCHA. These are those little tasks that you occasionally encounter on certain websites, and they require you to do something like type in a string of characters that are displayed on screen, usually deformed in some way and set against a crazy background. Or you might be given a big selection of images and told to pick out all the ones that have a cat in them or something. Or you might have to drag a little picture of a puzzle piece into an image where it fits into a very specific spot. All of these are meant to separate out actual human visitors to a website or service from all the automated programs, or bots, or whatever you might want to call them. So I thought it would be fun to do a quick episode on where CAPTCHAs came from and what purpose they serve (I kind of touched on that already), but also how they fit into the grand picture of artificial intelligence, because, interestingly, they play a pretty important part. They have helped drive the development and advancement of artificial intelligence, not necessarily in a way that is helpful to everybody out there, but it has certainly served as a way to get people thinking about how to tackle certain AI problems.

So our story actually begins with the good old website Yahoo. Y'all remember Yahoo. I mean, it's still a thing, but I remember a time when Yahoo was practically synonymous with the Internet for a lot of folks. You may not even remember this if you haven't been on Yahoo in ages, but once upon a time, Yahoo was sort of a portal to the rest of the Internet. Yahoo was kind of like a landing page.
A lot of people had it set as their homepage, so when they would go into a web browser, they'd go right into Yahoo, and you would find articles there and all sorts of other links, as well as chat rooms and of course the search engine, where you could search for other stuff online besides the stuff that just popped up on Yahoo. Well, in those chat rooms, moderators were running into a really serious problem: the chat spaces were being invaded by bots posing as people. Now, this is in two thousand. The bots were not particularly sophisticated, but they were creating a lot of spam. Like, they were jamming up chat spaces with spam messages while people were trying to chat. In some cases, they were gathering personal information of users in an effort to exploit those users in some way or another. So Yahoo didn't want this to keep going; it wasn't reflecting well on the company. They turned to the computer science department at Carnegie Mellon University to see, hey, is there some way that we could, you know, kind of have a bouncer out front, a gatekeeper if you will, that would allow humans into the various systems so that they could make use of them the way they were intended, but keep all the robots, all the AI programs, all the computer software or algorithms, however you want to define it, from getting access.

So a team led by Manuel Blum, and including folks like John Langford, Luis von Ahn, Nicholas Hopper, and others, tackled this challenge. They needed to come up with a test. Now, in an ideal world, the test would be a cinch for a human being to complete, but it would be a real stumper for an algorithmically driven bot. And that is the basic philosophy of CAPTCHA: make a test that humans find really easy to complete, perhaps even trivial, like it's just a mild inconvenience, as they say, but for bots it's a flat-out turn-away.
You're never going to be able to get this. Now, some of y'all might be saying something along the lines of, but Jonathan, whenever I run into CAPTCHAs these days, they're sometimes really hard. Like, it's hard to see what they spell out. I'll try and type things in three, four times and get kicked out. And you're right, that is a problem. It is something that actually is happening. It doesn't mean that you're not human. If you're having existential crises, I would like to set your mind at ease by saying you're probably human. I mean, I don't think I could say anything for certain, but I feel fairly confident saying you're probably human. But the reason why CAPTCHAs have become really difficult in some cases, anyway, with some specific types of CAPTCHAs, is largely because other programmers figured out how to make better automated programs that can parse and respond to CAPTCHAs. So as one group of programmers figured out how to design tools to defeat a CAPTCHA, the CAPTCHA designers would go back to the drawing board to create new tests that would be more challenging for those bots. To say, well, they got good at this, let's change these things and reintroduce the CAPTCHA so that it will trip up those systems, because while they're good at what we used to use for gatekeeping, they've never run into this before. And unfortunately, that sometimes means the tests become more challenging for human beings as well. It is no longer a case where something is trivial for a human but difficult for robots, at least for certain types of CAPTCHAs. And that's particularly true if the human has some impairment, like color blindness, for example, or some other visual impairment. There are real issues in making CAPTCHAs that do what they're supposed to do, that is, weed out all the non-humans, but also be accessible to all humans, even those who might have impairments that would otherwise make it difficult or challenging to complete a CAPTCHA.
It is not an easy path to walk. We're going to take a quick break. When we come back, I'll talk more about the CAPTCHA story.

We're back. So in the early days of CAPTCHAs, they mostly took the form of distorted text printed over a busy background. And the idea was that most automated programs would not be able to recognize distorted text. Like, it would be an image, not just text characters where a bot would be able to read the code used to generate the letters and say, oh, well, those are these letters, I can replicate that and get through, no problem. You had to have something that was going to really stump them. Now, image recognition is a pretty tricky science. I've talked about it on this show before. Training computer systems to recognize images takes a lot of time and effort and lots and lots and lots of samples, so that the computer system can quote unquote learn what those images represent. Now, it's one thing to teach a computer how to recognize standard letters that are in a recognizable font. If the Internet only ever used one font, and only used one size of that font, then it would be relatively trivial for those who want to defeat CAPTCHAs, because once you train a computer vision system on what a lowercase t looks like, for example, the system would recognize a lowercase t every time one popped up. But of course, there are lots of different fonts and typefaces on the Internet, and they come in different sizes and colors and on different backgrounds. So teaching a computer system what a Times New Roman lowercase t looks like against a blank background doesn't mean it's also going to recognize a lowercase t in some other font on some crazy background.
Plus, maybe the t is a little wavy, a little distorted. Distorting that text makes it more challenging for image recognition systems, because they're looking for defining features to be able to match the image of a letter with the actual letter. You see, when we teach a human what something looks like, it's a lot easier for humans to associate other things that look kind of the way the first example did, but maybe not exactly the same. In other words, the example I always use is coffee mugs, right? If I show you a coffee mug and I say, this is a coffee mug, and then I show you a second kind that looks totally different, different color, different size, you know, maybe it has different writing on it, whatever it might be, and I say, this is also a coffee mug, and then I show you a third example that looks unlike the first two, you could say, oh, okay, I get the idea. I get the different features that make up what a coffee mug is. I understand now. And now when I encounter different types of coffee mugs, even though they might not look anything like any of the other ones I've encountered, I know, okay, that's probably a coffee mug. Until someone says, no, that's a teacup, and then your world is turned upside down. But you get what I'm saying. Computers don't work that way. With a computer, if you teach it that an example is a thing, it doesn't necessarily understand that similar but distinctly different versions of that same thing fall into the same category. That takes lots and lots of training. So the whole idea of distortion was that it would make it very tricky for most systems to parse that information reliably and fool the CAPTCHA system. That doesn't mean it was foolproof. Over time, those systems did get better at recognizing those figures on screen, even better than humans could in some cases, which is obviously a problem.
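To make the distortion idea concrete, here is a minimal sketch in Python using the Pillow imaging library: render some text, speckle the background, then warp each column of pixels with a sine wave. All of the parameters here are illustrative assumptions, not the settings of any real CAPTCHA system.

```python
# A toy distorted-text CAPTCHA in the spirit of the early designs.
# Sizes, wave amplitude, and noise density are assumptions for illustration.
import math
import random

from PIL import Image, ImageDraw, ImageFont

def make_captcha(text: str, width: int = 240, height: int = 80) -> Image.Image:
    img = Image.new("L", (width, height), color=255)  # white grayscale canvas
    draw = ImageDraw.Draw(img)

    # Busy background: random speckle plus a few crossing lines.
    for _ in range(600):
        draw.point((random.randrange(width), random.randrange(height)),
                   fill=random.randrange(100, 200))
    for _ in range(4):
        draw.line([(random.randrange(width), random.randrange(height)),
                   (random.randrange(width), random.randrange(height))],
                  fill=120)

    # The challenge text itself (default font keeps the sketch self-contained).
    draw.text((20, height // 3), text, fill=0, font=ImageFont.load_default())

    # Sine-wave warp: shift each column of pixels vertically, so a template
    # trained on clean, undistorted glyphs no longer lines up with the image.
    src = img.load()
    warped = Image.new("L", (width, height), color=255)
    dst = warped.load()
    for x in range(width):
        offset = int(8 * math.sin(2 * math.pi * x / 60))
        for y in range(height):
            if 0 <= y + offset < height:
                dst[x, y] = src[x, y + offset]
    return warped

make_captcha("xk7pq").save("captcha.png")
```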
Now, there have been lots of other CAPTCHA systems, not just the original CAPTCHA. For example, there's one called Asirra. Asirra did something I mentioned earlier in the episode: it would present the visitor with a collection of photographs that included cats and dogs, and it would ask you, okay, identify the pictures that have cats in them. So that was one way to get around this. It wasn't just figuring out text; it was differentiating between cats and dogs, something that, again, computer systems couldn't do natively. They had to be taught how to recognize the features that belonged to a cat versus those that belonged to a dog, just the same as all other image recognition software.

Then there's reCAPTCHA, which came out of Luis von Ahn's group at Carnegie Mellon and was later acquired by Google, and that one actually served a dual purpose. It was kind of sneaky. So with reCAPTCHA, you would go to a website and you would be greeted by some, you know, kind of grainy text, and you'd be asked to type it out. You'd actually get a couple of different words, not just one. And this text came from scans of physical books being digitized; in other words, books where they had put the page down on a scanner and created a scan. Some of these books were in, you know, pretty bad shape; they weren't all crisp, clear images. For the first word you'd be presented with, Google actually knew the answer. So let's say the word is salamander and you type in salamander, and Google says, all right, I already knew that this scanned word is salamander; this is obviously a person who has typed this in. But the second image would be a scan from a book, maybe a really smudged one, like one that's harder to read, and it would ask you what that word was. Let's say it's surgeon, and you type in surgeon. Well, the secret sauce here is that Google didn't know that that scanned word was surgeon.
What Google was doing was crowdsourcing, crowdsourcing the effort to figure out what the text in this scanned image actually said. So if you and thousands of other people all put the same word in when you encountered this particular scan, Google would say, all right, that word is very likely surgeon, because, you know, ninety-eight percent of the people who were shown this reCAPTCHA typed surgeon in, so now we know what that word is. Which meant that they could essentially transcribe these digitized texts by using the crowd to do the work for them. And that is kind of the heart of where CAPTCHA and AI meet: CAPTCHAs have been used to help train AI so that it's more effective. Like, if you've encountered other Google ones where it's like, pick all the images here that have motorcycles in them, or stairs, well, part of that is training Google's image recognition systems so that they're more accurate. Right? Like, an image recognition system might have trouble differentiating an actual stone staircase out in front of a building from a pedestrian crosswalk, because, you know, you've got those broken lines on a crosswalk; those could look like stairs to a computer image recognition system. So by giving users the task of, hey, identify all the examples in this list that have stairs in them, Google starts to train its own image recognition algorithms to be more effective and more accurate. So, in a way, we were essentially being used as free labor to make these AI systems more accurate, just so that we could get access to whatever it was we were trying to visit, whether that was an online shop or a chat room, or, you know, whatever it might be. So, yeah, we've been working for free, y'all. Actually, in some cases we've been working for free and denied access to tools that we wanted to use, because the CAPTCHAs were too hard for us to solve.
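The mechanics of that two-word trick are simple enough to sketch in code. Here is a minimal Python sketch of the flow as described above, with the caveat that every name in it (check_recaptcha, KNOWN_WORDS, the vote thresholds) is a hypothetical stand-in, not Google's actual implementation.

```python
# A sketch of the reCAPTCHA idea: one control word gates access, and the
# answer to the unknown word becomes a crowdsourced transcription vote.
from collections import Counter

KNOWN_WORDS = {"scan_001": "salamander"}    # control word with a known answer
pending_votes: dict[str, Counter] = {}      # crowd answers for unknown scans

def check_recaptcha(control_id: str, control_answer: str,
                    unknown_id: str, unknown_answer: str) -> bool:
    # Gate on the control word: only a correct answer counts as human.
    if control_answer.strip().lower() != KNOWN_WORDS[control_id]:
        return False
    # A trusted visitor's answer to the unknown scan is recorded as a vote.
    votes = pending_votes.setdefault(unknown_id, Counter())
    votes[unknown_answer.strip().lower()] += 1
    return True

def consensus(unknown_id: str, min_votes: int = 100, threshold: float = 0.98):
    """Accept a transcription once, say, 98% of enough respondents agree."""
    votes = pending_votes.get(unknown_id, Counter())
    total = sum(votes.values())
    if total < min_votes:
        return None  # not enough crowd data yet
    word, count = votes.most_common(1)[0]
    return word if count / total >= threshold else None
```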
But yeah, that's the quick story about the history and evolution of CAPTCHAs. Clearly they're still used today. Sometimes it's something simple, like click this box to prove you're human, that kind of thing where it requires you to take an action. Those obviously are much simpler for humans to complete than for robots, so those still follow the philosophy of the original CAPTCHAs. A lot of other ones, though, get pretty tricky, to the point where sometimes I'm discouraged from even going further and visiting the website in question. I'm just like, you know what, I don't need to feel stupid because I couldn't find all the fire hydrants in these photographs, so I'm just out. But yeah, that's it. And like I said, it plays a really important part with AI. It's kind of a seesaw effect, right? You create a barrier that AI can't get over, until it can, and then you have to go back and create a harder barrier. And meanwhile, the folks developing the AI keep making advancements, so the AI gets more sophisticated and powerful over time. So, yeah, it's a delicate balance, and not everybody benefits, as I said. I hope that was interesting and informative to y'all. I hope you're all doing well, and I'll talk to you again really soon.

Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.