WEBVTT - Drowning in Data

0:00:00.160 --> 0:00:07.400
<v Speaker 1>Brought to you by Toyota. Let's go places. Welcome to

0:00:07.560 --> 0:00:15.280
<v Speaker 1>Forward Thinking. Hey there, everyone, and welcome to Forward Thinking, the

0:00:15.440 --> 0:00:18.240
<v Speaker 1>podcast that looks at the future and says, what's the story,

0:00:18.320 --> 0:00:21.880
<v Speaker 1>Morning Glory? What's the word, Hummingbird? I'm Jonathan Strickland, I'm

0:00:21.920 --> 0:00:25.000
<v Speaker 1>Lauren Vogelbaum, and I'm Joe McCormick, and we have jazz hands

0:00:25.760 --> 0:00:29.640
<v Speaker 1>and today you know it totally does not translate to radio.

0:00:29.760 --> 0:00:34.040
<v Speaker 1>But anyway, we're doing Yeah, and we were. Actually I wasn't,

0:00:34.120 --> 0:00:37.400
<v Speaker 1>but you two were. Honesty in podcasting. That's true. So

0:00:37.440 --> 0:00:40.840
<v Speaker 1>today we wanted to talk about the concept of big

0:00:40.920 --> 0:00:44.479
<v Speaker 1>data or big data, which is not a gigantic character

0:00:44.520 --> 0:00:47.840
<v Speaker 1>from Star Trek: The Next Generation, unfortunately. No, that would

0:00:47.840 --> 0:00:52.320
<v Speaker 1>have been both interesting and terrifying. But that is not

0:00:52.360 --> 0:00:54.520
<v Speaker 1>the case. Actually, I guess you could still argue that

0:00:54.560 --> 0:00:58.080
<v Speaker 1>big data is both interesting and to some people terrifying. Yeah,

0:00:58.400 --> 0:01:01.040
<v Speaker 1>it's it's so what is data? I've got a couple

0:01:01.040 --> 0:01:04.320
<v Speaker 1>of official definitions, and then Joe, I think you have

0:01:04.400 --> 0:01:06.440
<v Speaker 1>your own definition, So let me let me go through

0:01:06.480 --> 0:01:10.280
<v Speaker 1>these quote unquote official definitions. These are from IBM, and

0:01:10.400 --> 0:01:12.800
<v Speaker 1>IBM is one of those companies that has a lot

0:01:12.959 --> 0:01:16.520
<v Speaker 1>invested in big data in general and big data management

0:01:16.920 --> 0:01:20.959
<v Speaker 1>and so in a paper called Demystifying Big Data. Here's

0:01:21.000 --> 0:01:22.640
<v Speaker 1>one of the definitions, which is big data is a

0:01:22.680 --> 0:01:26.199
<v Speaker 1>phenomenon defined by the rapid acceleration and the expanding volume

0:01:26.240 --> 0:01:29.360
<v Speaker 1>of high velocity, complex, and diverse types of data. Big

0:01:29.440 --> 0:01:34.360
<v Speaker 1>data is often defined along three dimensions, volume, velocity, and variety.

0:01:34.480 --> 0:01:36.959
<v Speaker 1>And then the other definition is: big data is a

0:01:37.080 --> 0:01:40.480
<v Speaker 1>term that describes large volumes of high velocity, complex and

0:01:40.520 --> 0:01:44.160
<v Speaker 1>variable data that require advanced techniques and technologies to enable

0:01:44.200 --> 0:01:49.000
<v Speaker 1>the capture, storage, distribution, management, and analysis of the information. So, Joe,

0:01:49.000 --> 0:01:51.480
<v Speaker 1>what's your what's your definition? That was a lot of words. Yeah,

0:01:51.480 --> 0:01:54.880
<v Speaker 1>it was, uh, well it just seems to me and

0:01:55.320 --> 0:01:58.880
<v Speaker 1>I'm no expert here, but big what's the difference between

0:01:58.920 --> 0:02:01.680
<v Speaker 1>just a lot of data and then big data?

0:02:01.960 --> 0:02:04.920
<v Speaker 1>That's a good question because we've had a lot of

0:02:05.000 --> 0:02:08.120
<v Speaker 1>data before, but suddenly there's sort of this new paradigm

0:02:08.160 --> 0:02:10.400
<v Speaker 1>where we have to think about, oh, it's big data.

0:02:10.520 --> 0:02:13.440
<v Speaker 1>It's a separate thing. It's not just a matter of degrees.

0:02:13.800 --> 0:02:17.200
<v Speaker 1>And I think it's the point at which our intuition

0:02:17.360 --> 0:02:20.800
<v Speaker 1>kicks in and tells us something weird is going on, right,

0:02:20.840 --> 0:02:23.320
<v Speaker 1>there's so much here to handle that I can't even

0:02:23.400 --> 0:02:28.040
<v Speaker 1>imagine handling it. Yeah, it's it's it's when you suddenly realize, oh,

0:02:28.280 --> 0:02:32.240
<v Speaker 1>hold onto your butts. The phrase was actually entered into

0:02:32.280 --> 0:02:35.600
<v Speaker 1>the Oxford English Dictionary just this quarter, like this month.

0:02:35.639 --> 0:02:40.720
<v Speaker 1>As of June, they entered it into their quarterly online

0:02:40.800 --> 0:02:43.880
<v Speaker 1>update and and there there there's a definition that is

0:02:44.000 --> 0:02:46.000
<v Speaker 1>very much like the one that Jonathan read off, but

0:02:46.120 --> 0:02:50.040
<v Speaker 1>slightly more succinct. Um, well, it's the OED. Well,

0:02:50.080 --> 0:02:53.200
<v Speaker 1>they also include several examples of its

0:02:53.320 --> 0:02:58.480
<v Speaker 1>use um the original going back to when a social

0:02:58.520 --> 0:03:02.760
<v Speaker 1>historian, um, by the name of C. Tilly referenced it. But

0:03:02.840 --> 0:03:06.680
<v Speaker 1>in terms of meaning being obscured when you are in

0:03:06.720 --> 0:03:12.200
<v Speaker 1>the presence of an increasingly complex system of of of information. Right,

0:03:12.400 --> 0:03:14.680
<v Speaker 1>so it has something to do with it's hard for

0:03:14.760 --> 0:03:19.040
<v Speaker 1>us to grasp intuitively. Right. Well, well, yeah, it's kind

0:03:19.040 --> 0:03:21.080
<v Speaker 1>of like, if you want to boil it down to

0:03:21.160 --> 0:03:24.160
<v Speaker 1>a cliche, it's the whole not being able to see

0:03:24.160 --> 0:03:27.040
<v Speaker 1>the forest for the trees. Like you're able to see

0:03:27.720 --> 0:03:30.639
<v Speaker 1>the stuff that's immediately around you, but when you're trying

0:03:30.639 --> 0:03:33.320
<v Speaker 1>to get a bigger picture, your perspective is blocked by

0:03:33.320 --> 0:03:36.120
<v Speaker 1>the fact that there's just so much there. So if

0:03:36.160 --> 0:03:38.240
<v Speaker 1>you can't get a good grasp on the big picture,

0:03:38.480 --> 0:03:40.560
<v Speaker 1>So exactly how many trees are we talking about here? Well,

0:03:40.840 --> 0:03:43.240
<v Speaker 1>let's let's boil that down. Let's boil that down to

0:03:43.360 --> 0:03:46.480
<v Speaker 1>talking about how we measure data in the computer world.

0:03:46.480 --> 0:03:50.560
<v Speaker 1>And for anyone who has any background in computers at all,

0:03:50.800 --> 0:03:52.840
<v Speaker 1>this is probably going to seem super basic to you.

0:03:52.880 --> 0:03:54.840
<v Speaker 1>But it's important to have the building blocks there for

0:03:54.920 --> 0:03:58.280
<v Speaker 1>us to understand the enormity of big data. Right, if

0:03:58.280 --> 0:03:59.840
<v Speaker 1>you go look on the internet, it seems like a

0:03:59.840 --> 0:04:03.760
<v Speaker 1>lot of times people confuse terms like data and facts

0:04:04.000 --> 0:04:07.440
<v Speaker 1>and information like they just use them interchangeably. Data and

0:04:07.480 --> 0:04:10.640
<v Speaker 1>information are, you know, of course, synonyms.

0:04:10.640 --> 0:04:14.040
<v Speaker 1>But when we're talking about data in the terms of computation,

0:04:14.480 --> 0:04:17.480
<v Speaker 1>then we're talking about bytes, right? So a byte

0:04:17.680 --> 0:04:22.080
<v Speaker 1>is eight bits, and one byte can represent one character.

0:04:22.440 --> 0:04:24.840
<v Speaker 1>So when I'm talking about character, I'm talking about like

0:04:24.880 --> 0:04:26.680
<v Speaker 1>a letter or a number or a symbol. I'm not

0:04:26.720 --> 0:04:30.240
<v Speaker 1>talking about Jean Valjean from Les Misérables. That's a totally different

0:04:30.320 --> 0:04:33.559
<v Speaker 1>kind of character. So eight bits can be one character,

0:04:33.600 --> 0:04:37.359
<v Speaker 1>So it takes about ten or so bytes, so eighty bits

0:04:37.400 --> 0:04:40.080
<v Speaker 1>total ten bytes to make up about you know, one

0:04:40.160 --> 0:04:45.240
<v Speaker 1>word or so. Uh So that's your basic unit of data.
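
NOTE
Editor's sketch, not part of the episode: the byte and bit counts described above can be checked in a few lines of Python. It assumes plain ASCII text, where each character takes exactly one byte; the example word and the variable names are illustrative only.
```python
# Illustrative only: count the bytes and bits in one ten-letter word.
word = "Strickland"  # ten characters, picked as an example
encoded = word.encode("ascii")  # ASCII stores each character in one byte
num_bytes = len(encoded)
num_bits = num_bytes * 8  # eight bits per byte
print(num_bytes, num_bits)  # 10 80
```
Other encodings, such as UTF-8 for non-English characters, can use more than one byte per character.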

0:04:45.640 --> 0:04:48.080
<v Speaker 1>So if we look at a kilobyte, that's,

0:04:48.080 --> 0:04:51.880
<v Speaker 1>we say it's a thousand bytes. Technically it's one thousand,

0:04:51.960 --> 0:04:56.080
<v Speaker 1>twenty four bytes. And in counting, yeah, this this gets

0:04:56.080 --> 0:04:59.480
<v Speaker 1>a little, this gets a little complex as we go,

0:04:59.760 --> 0:05:03.240
<v Speaker 1>higher up, so I will be rounding down to

0:05:03.320 --> 0:05:10.280
<v Speaker 1>the nearest. Uh yeah, because otherwise, uh we'll be spending

0:05:10.279 --> 0:05:12.360
<v Speaker 1>the entire podcast just listening to me read out an

0:05:12.360 --> 0:05:14.920
<v Speaker 1>incredibly long number. I'll give an example of that when

0:05:14.920 --> 0:05:16.800
<v Speaker 1>I get to it, but I'll skip over most of

0:05:16.800 --> 0:05:20.320
<v Speaker 1>them anyway. So a kilobyte, we'll just say, is roughly a

0:05:20.360 --> 0:05:22.600
<v Speaker 1>thousand bytes. So if you were to type out a

0:05:22.600 --> 0:05:25.800
<v Speaker 1>page of text, that would be about two kilobytes of information.

0:05:25.800 --> 0:05:28.640
<v Speaker 1>Now that's if you're just typing text, not like images

0:05:28.720 --> 0:05:31.480
<v Speaker 1>or anything else. But that would be about two kilobytes. Now,

0:05:31.480 --> 0:05:35.800
<v Speaker 1>if you had a low res photo, that's probably around

0:05:35.839 --> 0:05:39.440
<v Speaker 1>a hundred kilobytes, maybe fewer, depending upon the resolution. I

0:05:39.440 --> 0:05:41.360
<v Speaker 1>mean that you know some things that if you save

0:05:41.400 --> 0:05:43.960
<v Speaker 1>it for the web, it can be between like thirty

0:05:44.000 --> 0:05:47.080
<v Speaker 1>and a hundred kilobytes or so. Uh, the next step

0:05:47.160 --> 0:05:51.839
<v Speaker 1>up is megabyte. So that's technically one million, forty eight thousand,

0:05:51.920 --> 0:05:54.000
<v Speaker 1>five hundred seventy-six bytes, but we'll usually just say

0:05:54.000 --> 0:05:55.960
<v Speaker 1>one million. And that's the last I'm going to do

0:05:56.040 --> 0:05:59.560
<v Speaker 1>of the specific numbers. Uh, so a high-res photo could

0:05:59.600 --> 0:06:02.800
<v Speaker 1>be at least two megabytes. Five megabytes is enough to

0:06:02.839 --> 0:06:06.719
<v Speaker 1>hold the complete works of Shakespeare. So now wait a minute,

0:06:06.760 --> 0:06:11.240
<v Speaker 1>would that be plain text or formatted documents? It would

0:06:11.240 --> 0:06:14.000
<v Speaker 1>be essentially plain text? Yeah, plain text would be about

0:06:14.040 --> 0:06:18.960
<v Speaker 1>five megabytes Shakespeare. Yeah, now if you want to if

0:06:18.960 --> 0:06:23.520
<v Speaker 1>you want there's there's all these bears pursuing you in

0:06:24.080 --> 0:06:27.040
<v Speaker 1>Winter's Tale, so you got to fill that out. Uh.

0:06:27.240 --> 0:06:31.039
<v Speaker 1>The a CD ROM, do you guys remember those? There's

0:06:31.040 --> 0:06:32.960
<v Speaker 1>still they still exist. I have one in my computer

0:06:33.040 --> 0:06:36.280
<v Speaker 1>in front of us, an optical drive. They're round, they

0:06:36.320 --> 0:06:39.320
<v Speaker 1>were uh you know, along one side, but if you

0:06:39.360 --> 0:06:42.120
<v Speaker 1>looked at them in profile, they were flat. That's what

0:06:42.160 --> 0:06:45.200
<v Speaker 1>you used to play Myst, right? Yes, it was, in fact,

0:06:45.480 --> 0:06:48.200
<v Speaker 1>but a CD ROM could hold between about six fifty

0:06:48.279 --> 0:06:51.599
<v Speaker 1>and nine hundred megabytes. And we're talking about the twelve

0:06:51.600 --> 0:06:55.040
<v Speaker 1>centimeter discs, not the eight centimeter discs, because they had

0:06:55.040 --> 0:06:57.600
<v Speaker 1>mini discs as well. They weren't as popular here in

0:06:57.600 --> 0:07:00.599
<v Speaker 1>the United States, but in Asia they were very popular.

0:07:01.240 --> 0:07:05.000
<v Speaker 1>Then you have gigabyte that ends up being the billion

0:07:05.040 --> 0:07:09.320
<v Speaker 1>mark of bytes. Again, I'm just simplifying here about one

0:07:09.360 --> 0:07:13.200
<v Speaker 1>gigabyte can hold a broadcast quality movie. By the way,

0:07:13.320 --> 0:07:15.920
<v Speaker 1>I remember when I had a computer that had a

0:07:16.160 --> 0:07:20.560
<v Speaker 1>I think three gigabyte hard drive, and that was so huge. Yeah.

0:07:20.560 --> 0:07:22.520
<v Speaker 1>I remember when I got a two hundred and fifty-six

0:07:22.640 --> 0:07:24.920
<v Speaker 1>megabyte hard drive and I thought that there's no way

0:07:24.960 --> 0:07:27.400
<v Speaker 1>anyone could fill up that much space. And look at

0:07:27.480 --> 0:07:31.480
<v Speaker 1>what I know. A twenty gigabyte drive could hold the

0:07:31.560 --> 0:07:35.640
<v Speaker 1>high fidelity recordings of the entire works of Beethoven. A

0:07:35.760 --> 0:07:38.800
<v Speaker 1>fifty gigabyte hard drive is equivalent to a floor of

0:07:38.880 --> 0:07:42.760
<v Speaker 1>books in a typical library. The next step up is terabyte,

0:07:42.960 --> 0:07:46.040
<v Speaker 1>which is one trillion bytes. Two terabytes would be

0:07:46.080 --> 0:07:49.960
<v Speaker 1>equivalent to an academic research library, and ten terabytes would

0:07:49.960 --> 0:07:53.600
<v Speaker 1>be equivalent to all of the printed collection at the

0:07:53.720 --> 0:07:57.000
<v Speaker 1>United States Library of Congress, just the printed materials. It's

0:07:57.040 --> 0:08:00.520
<v Speaker 1>amazing how much cheaper the storage has become over

0:08:00.560 --> 0:08:02.880
<v Speaker 1>the years. What does it cost to go buy like

0:08:02.960 --> 0:08:06.240
<v Speaker 1>a one terabyte external hard drive? It all depends

0:08:06.240 --> 0:08:08.800
<v Speaker 1>on where you go, but you're talking around a hundred

0:08:09.080 --> 0:08:11.880
<v Speaker 1>couple hundred dollars at most for most places. And and

0:08:11.920 --> 0:08:14.200
<v Speaker 1>the thing is that you know, the the technology has

0:08:14.240 --> 0:08:18.080
<v Speaker 1>improved over time and the manufacturing processes have improved over time,

0:08:18.240 --> 0:08:21.240
<v Speaker 1>which has brought the price down over time. But we're

0:08:21.240 --> 0:08:24.080
<v Speaker 1>not done yet. We gotta go back. So if you

0:08:24.520 --> 0:08:26.360
<v Speaker 1>that was one trillion bytes, let's say you one of

0:08:26.400 --> 0:08:29.600
<v Speaker 1>the levels, let's say you want to quadrup Well, the

0:08:29.680 --> 0:08:32.280
<v Speaker 1>numbers we're going to be talking about are way bigger

0:08:32.280 --> 0:08:36.000
<v Speaker 1>than well, they don't know, we haven't said it yet.

0:08:36.040 --> 0:08:39.040
<v Speaker 1>So next is one quadrillion bytes. That's a

0:08:39.240 --> 0:08:42.360
<v Speaker 1>that's a petabyte, and, uh, two petabytes would

0:08:42.400 --> 0:08:46.880
<v Speaker 1>be all US academic research libraries combined. Two hundred

0:08:46.920 --> 0:08:51.840
<v Speaker 1>petabytes would be all printed material everywhere. Uh. If you

0:08:51.880 --> 0:08:57.199
<v Speaker 1>went up to one quintillion bytes, I don't know how

0:08:57.200 --> 0:09:00.720
<v Speaker 1>many zeros that is, it's a lot. Uh, that's an exabyte,

0:09:01.200 --> 0:09:04.360
<v Speaker 1>and five exabytes would be enough to contain all the

0:09:04.400 --> 0:09:07.839
<v Speaker 1>words ever spoken by human beings. If you were to

0:09:07.880 --> 0:09:11.120
<v Speaker 1>break down all the words we've ever spoken into bytes,

0:09:11.640 --> 0:09:14.559
<v Speaker 1>it would fit within five exabytes. Did we say whose

0:09:14.640 --> 0:09:17.839
<v Speaker 1>estimates these are? Oh? Well, these are actually estimates that

0:09:17.880 --> 0:09:21.840
<v Speaker 1>are up all over the place there. You know. IBM

0:09:21.880 --> 0:09:27.920
<v Speaker 1>actually cites these as their their benchmarks as well. Yeah,

0:09:28.000 --> 0:09:30.400
<v Speaker 1>and then the next two levels up you want to

0:09:30.400 --> 0:09:33.240
<v Speaker 1>go up. So we've done quintillions, So sextillion would be

0:09:33.360 --> 0:09:37.040
<v Speaker 1>zettabyte, and I guess septillion would be yottabyte.
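
NOTE
Editor's sketch, not part of the episode: the unit ladder walked through above, in both the rounded decimal form (steps of 1,000) and the exact binary form (steps of 1,024) that the host mentions for the kilobyte and megabyte. The helper name unit_sizes is hypothetical.
```python
# Unit names from the discussion; each step up multiplies by 1,000 (decimal) or 1,024 (binary).
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]
def unit_sizes(base):
    """Return {unit name: size in bytes} for the given base (1000 or 1024)."""
    return {name: base ** (step + 1) for step, name in enumerate(UNITS)}
decimal = unit_sizes(1000)  # the rounded figures used in the episode
binary = unit_sizes(1024)   # the "technically" exact figures
for name in UNITS:
    print(f"{name:>10}: decimal {decimal[name]:,} bytes, binary {binary[name]:,} bytes")
```
The binary megabyte comes out to 1,048,576 bytes, matching the figure read out earlier, and the decimal exabyte is 10 to the 18th power, which is one quintillion.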

0:09:37.640 --> 0:09:43.800
<v Speaker 1>So anyway, or yocto? Yotta, not Yoda, not Yoda, not

0:09:44.000 --> 0:09:50.520
<v Speaker 1>yocto, not yocto. Yotta. So, so anyway, those, those are

0:09:50.520 --> 0:09:52.760
<v Speaker 1>the scales. At the exabyte, we get to the point where it's

0:09:52.760 --> 0:09:55.480
<v Speaker 1>all the words we've ever spoken, right, So that gives

0:09:55.520 --> 0:09:59.000
<v Speaker 1>you the idea of of the basics of bytes and

0:09:59.120 --> 0:10:03.560
<v Speaker 1>how much information is equivalent to you know, various real

0:10:03.600 --> 0:10:06.680
<v Speaker 1>world examples. So let's talk about the amount of information

0:10:07.120 --> 0:10:11.600
<v Speaker 1>that we create on a daily basis. So, for example,

0:10:11.600 --> 0:10:18.200
<v Speaker 1>according to IBM, we create about twelve terabytes of tweets

0:10:18.240 --> 0:10:22.840
<v Speaker 1>in one day. That's twelve trillion bytes of messages at

0:10:22.840 --> 0:10:26.040
<v Speaker 1>a hundred forty characters or fewer. Tweets alone. And Twitter

0:10:26.120 --> 0:10:29.680
<v Speaker 1>isn't even by far the most popular social networking No

0:10:30.160 --> 0:10:32.640
<v Speaker 1>is not is not the most popular at all. And

0:10:32.720 --> 0:10:35.760
<v Speaker 1>yet that's not even rich text, is it? That's just plain text.

0:10:35.800 --> 0:10:38.760
<v Speaker 1>It's plain text. A short messaging service, really, is what it is.

0:10:38.920 --> 0:10:42.080
<v Speaker 1>So remember, two terabytes is equivalent to a single

0:10:42.160 --> 0:10:45.240
<v Speaker 1>academic research library. So you've got the equivalent of six

0:10:45.360 --> 0:10:50.480
<v Speaker 1>research libraries academic research libraries, just in tweets alone. Now,

0:10:50.520 --> 0:10:52.440
<v Speaker 1>I'm not saying that you're going to be able to

0:10:52.760 --> 0:10:58.320
<v Speaker 1>research your next uh thesis only using Twitter. It's not

0:10:58.400 --> 0:11:01.439
<v Speaker 1>necessarily useful data that's out there, but that's how much,

0:11:01.720 --> 0:11:04.080
<v Speaker 1>you know, once you put it together. Lots of people

0:11:04.120 --> 0:11:08.280
<v Speaker 1>are are tracking trends and keywords and and you know,

0:11:08.600 --> 0:11:11.960
<v Speaker 1>sure all kinds of things. There are lots of useful

0:11:12.480 --> 0:11:15.600
<v Speaker 1>Twitter to predict the stock market and earthquakes, and they're

0:11:15.679 --> 0:11:19.160
<v Speaker 1>using Twitter for all sorts of stuff. Yeah, consumer behavior,

0:11:19.840 --> 0:11:22.480
<v Speaker 1>Not that we're endorsing those ideas, by the way, but

0:11:24.000 --> 0:11:26.079
<v Speaker 1>going back to the kind of data that we actually

0:11:26.080 --> 0:11:28.520
<v Speaker 1>are producing on a daily basis. Back in two thousand twelve,

0:11:28.600 --> 0:11:32.880
<v Speaker 1>when Facebook still had just under one billion registered users,

0:11:33.520 --> 0:11:36.360
<v Speaker 1>they were collecting According to Facebook, they did an earnings

0:11:36.400 --> 0:11:39.319
<v Speaker 1>call where they said they were collecting about five hundred

0:11:39.480 --> 0:11:44.120
<v Speaker 1>terabytes per day from users. So that's all the stuff

0:11:44.160 --> 0:11:47.200
<v Speaker 1>that everyone is doing on Facebook, whether they are posting

0:11:47.200 --> 0:11:50.680
<v Speaker 1>a status update or liking a page or sharing a link.

0:11:51.200 --> 0:11:54.600
<v Speaker 1>All of that was folded into this number. But five

0:11:54.840 --> 0:12:00.640
<v Speaker 1>hundred terabytes every day. So then you've got YouTube. You guys,

0:12:00.720 --> 0:12:04.000
<v Speaker 1>of course have heard the famous, uh, stat. Now it

0:12:04.120 --> 0:12:07.640
<v Speaker 1>is one hundred hours of video that's uploaded every minute

0:12:09.880 --> 0:12:12.080
<v Speaker 1>between the time we shot that video and by the

0:12:12.080 --> 0:12:14.319
<v Speaker 1>time we but between the time we wrote it and

0:12:14.480 --> 0:12:17.400
<v Speaker 1>the time we shot, really time we shot in the

0:12:17.440 --> 0:12:20.079
<v Speaker 1>time it published exactly. Yeah, it went from seventy two.

0:12:21.240 --> 0:12:23.760
<v Speaker 1>Technically it had been growing all that time, but Google

0:12:23.880 --> 0:12:26.679
<v Speaker 1>gave the official announcement. Yeah, so one hundred hours of

0:12:26.720 --> 0:12:29.440
<v Speaker 1>YouTube footage is uploaded every minute. So that means that

0:12:29.920 --> 0:12:31.679
<v Speaker 1>in a single day you get about a hundred and

0:12:31.760 --> 0:12:37.960
<v Speaker 1>forty four thousand hours of video on YouTube added every day.
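
NOTE
Editor's sketch, not part of the episode: the daily-volume arithmetic quoted above, using the rounded decimal terabyte. The inputs are the figures as stated by the hosts; the variable names are illustrative.
```python
# Figures as quoted in the episode, not independently verified.
TB = 10 ** 12  # one terabyte, rounded decimal value
tweets_per_day = 12 * TB       # twelve terabytes of tweets per day
research_library = 2 * TB      # one academic research library
libraries_of_tweets = tweets_per_day // research_library
hours_uploaded_per_minute = 100  # YouTube upload rate as quoted
minutes_per_day = 24 * 60
hours_uploaded_per_day = hours_uploaded_per_minute * minutes_per_day
print(libraries_of_tweets, hours_uploaded_per_day)  # 6 144000
```
Both results agree with the numbers given in the conversation: six research libraries of tweets and 144,000 hours of video per day.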

0:12:38.000 --> 0:12:44.000
<v Speaker 1>That's sick. It's a little it's a little stomach churning actually. Um.

0:12:44.080 --> 0:12:45.880
<v Speaker 1>And so if you were to look at all the

0:12:45.960 --> 0:12:49.760
<v Speaker 1>data that we are creating, and this is beyond social media,

0:12:49.800 --> 0:12:52.560
<v Speaker 1>we're talking about all the information being created not just

0:12:52.600 --> 0:12:55.280
<v Speaker 1>by human beings, but by things like sensors that are

0:12:55.600 --> 0:12:58.079
<v Speaker 1>connected to computer systems and are sending that data in

0:12:58.200 --> 0:13:03.000
<v Speaker 1>So things like weather, uh, sensors or traffic sensors, the

0:13:03.080 --> 0:13:05.080
<v Speaker 1>cameras we've talked about in the past. All of this

0:13:05.160 --> 0:13:11.080
<v Speaker 1>stuff combined generates about two point five quintillion bytes of

0:13:11.200 --> 0:13:15.920
<v Speaker 1>data, also known as exabytes, every single day. Well,

0:13:15.960 --> 0:13:19.760
<v Speaker 1>that makes me curious, does more data come from human

0:13:19.880 --> 0:13:24.120
<v Speaker 1>entry or from other machines? Right now, humans are actually

0:13:24.120 --> 0:13:27.960
<v Speaker 1>generating most of that data. Eighty percent of it, in fact,

0:13:28.160 --> 0:13:33.960
<v Speaker 1>is coming from unstructured information, which includes things like email, video, blogs, uh,

0:13:34.080 --> 0:13:38.439
<v Speaker 1>social media, call center conversations. All of this goes into

0:13:38.480 --> 0:13:43.400
<v Speaker 1>that data. That's going to change, probably. If you

0:13:43.440 --> 0:13:45.640
<v Speaker 1>remember our Internet of Things episode, we talked a lot

0:13:45.640 --> 0:13:47.559
<v Speaker 1>about how we're going to be living in this world

0:13:47.640 --> 0:13:50.920
<v Speaker 1>where our environments are going to be constantly collecting Yeah,

0:13:50.960 --> 0:13:53.880
<v Speaker 1>the more devices are collecting information about us and yeah,

0:13:53.679 --> 0:13:55.520
<v Speaker 1>yeah it Well, it'll it'll get to a point where

0:13:55.559 --> 0:13:59.520
<v Speaker 1>we'll start to see that number probably uh fluctuate quite

0:13:59.559 --> 0:14:03.280
<v Speaker 1>a bit. We'll see that percentage drop for human produced

0:14:03.400 --> 0:14:08.720
<v Speaker 1>data versus you know, automated sensors. So two point five

0:14:08.800 --> 0:14:11.920
<v Speaker 1>quintillion bytes of data produced every day, that means every

0:14:11.960 --> 0:14:15.440
<v Speaker 1>two days we produce as much information as all the

0:14:15.480 --> 0:14:19.160
<v Speaker 1>words we've ever spoken. So that's kind of incredible and also,

0:14:19.280 --> 0:14:23.080
<v Speaker 1>according to IBM, uh, in the last two years, we

0:14:23.200 --> 0:14:27.800
<v Speaker 1>have produced of all the world's data, meaning that everything

0:14:27.960 --> 0:14:33.280
<v Speaker 1>prior to those two years represents of all the data. Ever,

0:14:33.480 --> 0:14:36.680
<v Speaker 1>it's a it's a pretty crazy, crazily curved graph as

0:14:36.720 --> 0:14:40.000
<v Speaker 1>of So, so, you know, years ago at this point,

0:14:40.480 --> 0:14:42.360
<v Speaker 1>Eric Schmidt of Google said that we were creating as

0:14:42.440 --> 0:14:45.160
<v Speaker 1>much info every two days as as we had from

0:14:45.200 --> 0:14:49.320
<v Speaker 1>the dawn of humankind up through two thousand three. Yeah,

0:14:49.760 --> 0:14:52.840
<v Speaker 1>so in one point we created one point eight

0:14:52.920 --> 0:14:57.280
<v Speaker 1>zettabytes of information globally, uh, in that year, and that

0:14:57.320 --> 0:15:02.080
<v Speaker 1>amount is expected to double every year. So that means

0:15:02.720 --> 0:15:05.120
<v Speaker 1>one point eight zettabytes. If you want to know, like, okay,

0:15:05.120 --> 0:15:06.680
<v Speaker 1>well what does that mean to me? Like, how how

0:15:06.720 --> 0:15:12.600
<v Speaker 1>can I conceptualize this amount of information? That's equivalent to

0:15:13.800 --> 0:15:19.920
<v Speaker 1>two hundred billion two-hour HD movies. And if you

0:15:19.920 --> 0:15:23.520
<v Speaker 1>wanted to watch those two hundred billion HD movies and

0:15:23.600 --> 0:15:26.040
<v Speaker 1>just sit down and have a marathon, it would take

0:15:26.080 --> 0:15:33.320
<v Speaker 1>you forty seven million years to do it, no bathroom breaks. Yeah,

0:15:33.480 --> 0:15:36.040
<v Speaker 1>that that those these these numbers don't even make any

0:15:36.200 --> 0:15:39.040
<v Speaker 1>sense to me at that point. It's just you know,

0:15:39.200 --> 0:15:41.800
<v Speaker 1>and that's and that's the that's the key, right that

0:15:41.800 --> 0:15:44.160
<v Speaker 1>that's why we call it big data, because they are

0:15:44.400 --> 0:15:47.160
<v Speaker 1>such huge numbers that when you think about you're like,

0:15:48.120 --> 0:15:51.160
<v Speaker 1>what what can I even do with all this information?
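
NOTE
Editor's sketch, not part of the episode: a rough check of the large figures quoted in this stretch. It assumes decimal units and a 365.25-day year; all inputs are the numbers as stated by the hosts, and the variable names are illustrative.
```python
# Inputs are the figures as quoted in the episode, not independently verified.
EB = 10 ** 18  # one exabyte (one quintillion bytes)
ZB = 10 ** 21  # one zettabyte (one sextillion bytes)
daily_bytes = 25 * 10 ** 17     # 2.5 quintillion bytes produced per day
spoken_words_bytes = 5 * EB     # all words ever spoken, as quoted
days_to_match_spoken_words = spoken_words_bytes / daily_bytes
yearly_bytes = 18 * 10 ** 20    # 1.8 zettabytes
movies = 200 * 10 ** 9          # two hundred billion HD movies
bytes_per_movie = yearly_bytes // movies
marathon_hours = movies * 2     # each movie runs two hours
hours_per_year = 24 * 365.25    # assumption: average year length
marathon_years = marathon_hours / hours_per_year
print(days_to_match_spoken_words)  # 2.0
print(bytes_per_movie)             # 9000000000, about nine gigabytes per movie
print(round(marathon_years))       # roughly 45.6 million
```
Under these assumptions the marathon works out to about 45.6 million years, the same order of magnitude as the forty-seven million quoted.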

0:15:51.200 --> 0:15:53.560
<v Speaker 1>How can I make use of it? But that's exactly

0:15:53.600 --> 0:15:55.960
<v Speaker 1>the thing, right, We're going to a point where it's

0:15:56.000 --> 0:15:58.240
<v Speaker 1>not so much that you're going to do something with it,

0:15:58.560 --> 0:16:01.240
<v Speaker 1>but the machines are going to do something right And

0:16:01.240 --> 0:16:04.760
<v Speaker 1>and there's some very creative processes that people have come

0:16:04.840 --> 0:16:08.440
<v Speaker 1>up with that break this down into more manageable problems

0:16:08.520 --> 0:16:11.200
<v Speaker 1>that machines can handle. Well, I shouldn't say that you're

0:16:11.240 --> 0:16:12.960
<v Speaker 1>not going to use it for sure. I mean, this

0:16:13.040 --> 0:16:15.440
<v Speaker 1>kind of thing is probably really useful to people who

0:16:15.440 --> 0:16:18.520
<v Speaker 1>are involved in, say marketing. Well, it's useful for marketing,

0:16:18.560 --> 0:16:21.800
<v Speaker 1>but it's also and we'll do yeah, well, we'll do

0:16:21.840 --> 0:16:26.680
<v Speaker 1>a full episode on some of the applications of using data. Yeah,

0:16:27.000 --> 0:16:28.560
<v Speaker 1>that will be our next episode that we recorded. Me

0:16:30.000 --> 0:16:33.680
<v Speaker 1>But but yeah, there here's an example that all of

0:16:33.760 --> 0:16:37.160
<v Speaker 1>us in this room could use UH with big data.

0:16:37.200 --> 0:16:39.840
<v Speaker 1>Now we would not be actually doing the analyzing ourselves,

0:16:40.160 --> 0:16:43.560
<v Speaker 1>but we would benefit from the UH, the the the

0:16:43.600 --> 0:16:47.400
<v Speaker 1>actual work, and that would be traffic, real time traffic

0:16:47.440 --> 0:16:51.600
<v Speaker 1>reports. So if you are using some sort of GPS

0:16:51.640 --> 0:16:56.480
<v Speaker 1>that allows you to get incoming traffic information that's being

0:16:56.520 --> 0:16:59.960
<v Speaker 1>gathered by various means, and there are different ways of

0:17:00.040 --> 0:17:02.960
<v Speaker 1>doing it depending upon what system you're using, then you

0:17:03.000 --> 0:17:07.800
<v Speaker 1>are essentially benefiting from the from the analysis of big

0:17:07.920 --> 0:17:11.840
<v Speaker 1>data because it's taking all this information about the actual

0:17:11.920 --> 0:17:15.440
<v Speaker 1>environment around you that's gathered from multiple sources and helping

0:17:15.480 --> 0:17:18.760
<v Speaker 1>you route the most efficient way. Right, it does dynamic routing.

0:17:18.800 --> 0:17:22.640
<v Speaker 1>Which is really a cool thing and an obvious benefit.

0:17:22.880 --> 0:17:27.879
<v Speaker 1>But that's just one application. So how do you end

0:17:27.960 --> 0:17:31.879
<v Speaker 1>up navigating this much information? Like what what's the what's

0:17:31.920 --> 0:17:34.600
<v Speaker 1>the magic key to it? And there are actually a

0:17:34.640 --> 0:17:36.600
<v Speaker 1>couple different ways. I don't want to I don't want

0:17:36.600 --> 0:17:38.480
<v Speaker 1>to throw my co host under the bus because I

0:17:38.520 --> 0:17:41.280
<v Speaker 1>specifically looked this stuff up. And so I ask the question.

0:17:41.280 --> 0:17:45.000
<v Speaker 1>They're like, I, uh, it's a lot throw a dart.

0:17:45.160 --> 0:17:53.119
<v Speaker 1>I don't know. Um, you use computers, that's good, Joe, supercomputers.

0:17:53.200 --> 0:17:56.359
<v Speaker 1>Perhaps you could use supercomputers. You could also use grid computing,

0:17:56.359 --> 0:17:58.520
<v Speaker 1>which is where you end up using a lot of

0:17:58.560 --> 0:18:02.120
<v Speaker 1>computers to work on a single problem. So you guys

0:18:02.160 --> 0:18:06.440
<v Speaker 1>have probably heard the term parallel processing. Parallel processing

0:18:06.520 --> 0:18:08.800
<v Speaker 1>is is an idea where you are able to take

0:18:08.880 --> 0:18:12.400
<v Speaker 1>certain kinds of computer problems where you can divide up

0:18:12.400 --> 0:18:16.879
<v Speaker 1>the problem into different UH sections, maybe even subsections of data,

0:18:17.320 --> 0:18:21.160
<v Speaker 1>and then assign each of those parts to a different processor.

0:18:21.520 --> 0:18:24.280
<v Speaker 1>And this this could happen within a computer, or you

0:18:24.280 --> 0:18:28.080
<v Speaker 1>could be talking about spreading it out across lots of different computers.

0:18:28.119 --> 0:18:30.080
<v Speaker 1>So if you have a computer that has multiple cores,

0:18:30.200 --> 0:18:33.359
<v Speaker 1>for example a multi core processor, each core could be

0:18:33.400 --> 0:18:36.680
<v Speaker 1>taking part of this problem and working on it separately.

0:18:37.200 --> 0:18:39.520
<v Speaker 1>And really what you have is you have you essentially

0:18:39.600 --> 0:18:44.160
<v Speaker 1>have one unit acting as the director, and the director's

0:18:44.240 --> 0:18:46.560
<v Speaker 1>job is to take the problem and to divide it

0:18:46.680 --> 0:18:50.000
<v Speaker 1>up into manageable chunks, and then to parcel that out

0:18:50.080 --> 0:18:53.120
<v Speaker 1>to all the other elements of the system, whether it's

0:18:53.119 --> 0:18:56.840
<v Speaker 1>other computers or other processors or other cores. Their job

0:18:56.920 --> 0:18:59.400
<v Speaker 1>is to work on that particular part of the problem

0:18:59.680 --> 0:19:03.320
<v Speaker 1>and then send the results back to the master. The

0:19:03.359 --> 0:19:06.000
<v Speaker 1>master then takes all the results and has a collective

0:19:06.000 --> 0:19:10.080
<v Speaker 1>result as as the final product, and that's where you

0:19:10.119 --> 0:19:13.639
<v Speaker 1>get the answer that you're looking for. Uh, it's called

0:19:14.119 --> 0:19:17.119
<v Speaker 1>often this this approach is called a map reduce framework.

0:19:17.200 --> 0:19:19.720
<v Speaker 1>You're mapping out the problem and then routing it out

0:19:20.200 --> 0:19:23.240
<v Speaker 1>and then you reduce the problem that way, when all

0:19:23.280 --> 0:19:25.840
<v Speaker 1>the uh and when all the answers are sent back

0:19:25.880 --> 0:19:28.200
<v Speaker 1>to you, you you reduce all those answers into one answer.

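NOTE
Editor's sketch, not part of the spoken episode: the split, work, and combine steps described above, shown as a word count in Python. Every name here (split_into_chunks, map_chunk, reduce_counts, word_count) is hypothetical, and threads inside one process stand in for the separate computers or cores. Real MapReduce frameworks also handle scheduling, shuffling, and failures, which this leaves out.
```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
def split_into_chunks(items, n_chunks):
    """Director step: divide the input into roughly equal chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]
def map_chunk(lines):
    """Worker step: count the words in one chunk only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts
def reduce_counts(partials):
    """Reduce step: merge every worker's partial result into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
def word_count(lines, n_workers=4):
    """Split the lines, count each chunk on its own worker, then combine."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```
Calling word_count(["big data big", "data is big"]) merges the per-chunk tallies into a single Counter, the same answer one worker would get on the whole input.
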
0:19:28.600 --> 0:19:32.200
<v Speaker 1>So that's the whole process for taking generally a huge

0:19:32.240 --> 0:19:35.400
<v Speaker 1>problem and making it manageable. Now, the key to all

0:19:35.440 --> 0:19:39.000
<v Speaker 1>of this, and a lot of people and companies that

0:19:39.040 --> 0:19:42.439
<v Speaker 1>specialize in big data will tell everyone this is that

0:19:42.480 --> 0:19:45.199
<v Speaker 1>you can't just do anything with this information. What you

0:19:45.320 --> 0:19:48.120
<v Speaker 1>have to do is decide what is it that you

0:19:48.200 --> 0:19:50.760
<v Speaker 1>want to do with the information. What specific thing are we

0:19:50.880 --> 0:19:54.120
<v Speaker 1>looking for, right, and then you build a system that

0:19:54.200 --> 0:19:57.520
<v Speaker 1>lets you get that from derive that from all the

0:19:57.600 --> 0:20:00.520
<v Speaker 1>data you have. So it's not like you just look

0:20:00.560 --> 0:20:02.920
<v Speaker 1>at it. Yeah, you can't just look at this big

0:20:02.960 --> 0:20:06.640
<v Speaker 1>ball of zeros and ones and then just magically draw

0:20:06.720 --> 0:20:09.320
<v Speaker 1>out the information you need. What you do and or

0:20:09.359 --> 0:20:11.160
<v Speaker 1>you just sit there and you look at and say,

0:20:11.400 --> 0:20:13.840
<v Speaker 1>you know, what can we learn from this? If you

0:20:13.880 --> 0:20:15.720
<v Speaker 1>look for a specific kind of pattern, like if you're

0:20:15.760 --> 0:20:18.000
<v Speaker 1>looking for a needle in a haystack, you're you're looking

0:20:18.000 --> 0:20:21.199
<v Speaker 1>for something shiny, for example, and and you know, if

0:20:21.240 --> 0:20:23.520
<v Speaker 1>you if you just go like, yeah, if you just say, well,

0:20:23.520 --> 0:20:26.000
<v Speaker 1>I'm looking for something kind of pointy and short. Then

0:20:26.040 --> 0:20:29.080
<v Speaker 1>that's not What is interesting is that when you get

0:20:29.160 --> 0:20:32.199
<v Speaker 1>information on this scale, this huge amount of information, you

0:20:32.200 --> 0:20:35.080
<v Speaker 1>can actually start to recognize patterns that otherwise would have

0:20:35.119 --> 0:20:38.720
<v Speaker 1>been completely Yeah, you would never have been able to Again,

0:20:38.720 --> 0:20:40.920
<v Speaker 1>it's the forest for the trees thing. You would never

0:20:40.920 --> 0:20:43.119
<v Speaker 1>have been able to have seen the forest because you

0:20:43.160 --> 0:20:45.119
<v Speaker 1>were right there in the middle of all those trees.

0:20:45.560 --> 0:20:47.440
<v Speaker 1>So the same sort of thing, you'd be able to

0:20:47.440 --> 0:20:50.440
<v Speaker 1>see these big patterns that happen. And that's where especially

0:20:50.520 --> 0:20:53.520
<v Speaker 1>things like marketing ends up being a big deal, because

0:20:53.920 --> 0:20:57.920
<v Speaker 1>you can see things like tendencies for customers to behave

0:20:57.960 --> 0:20:59.679
<v Speaker 1>in a certain way, and if you want them to

0:20:59.720 --> 0:21:01.959
<v Speaker 1>behave in a particular way, you can start to focus

0:21:02.000 --> 0:21:04.000
<v Speaker 1>on things that kind of guide them in that direction.

0:21:04.040 --> 0:21:07.639
<v Speaker 1>But I've even seen bizarre representations, a thing online that

0:21:07.800 --> 0:21:10.560
<v Speaker 1>was the strange cornucopia shape and it was just

0:21:10.720 --> 0:21:14.680
<v Speaker 1>labeled like the geometry of big data. Yeah, it's it's

0:21:14.680 --> 0:21:17.760
<v Speaker 1>a little like again, when we're talking about something so

0:21:17.880 --> 0:21:20.479
<v Speaker 1>huge that it's difficult for us to get a mental

0:21:20.560 --> 0:21:25.080
<v Speaker 1>grasp on it, trying to find a representation of that

0:21:25.080 --> 0:21:28.800
<v Speaker 1>that makes sense to us is something of an uphill

0:21:28.840 --> 0:21:31.919
<v Speaker 1>battle. You know, a lot of people have tried,

0:21:32.400 --> 0:21:36.639
<v Speaker 1>but it's really difficult to make this and make it

0:21:36.760 --> 0:21:39.280
<v Speaker 1>understandable in a way that doesn't just blow out the

0:21:39.320 --> 0:21:43.160
<v Speaker 1>scale immediately, where you know, you have the manageable amount

0:21:43.200 --> 0:21:46.280
<v Speaker 1>of data and then the spike just goes all the

0:21:46.280 --> 0:21:48.199
<v Speaker 1>way up through the top of the graph and you

0:21:48.240 --> 0:21:52.280
<v Speaker 1>can't see the top of it. That's, so, it's really big. Other

0:21:52.359 --> 0:21:55.240
<v Speaker 1>ways to handle all that data, there's also uh the

0:21:55.280 --> 0:21:58.200
<v Speaker 1>approach of doing just real time analytics and streaming of data.

0:21:58.640 --> 0:22:00.480
<v Speaker 1>In this case, this would be kind of like traffic

0:22:00.520 --> 0:22:04.240
<v Speaker 1>example I gave earlier. So with traffic, you have all

0:22:04.240 --> 0:22:07.480
<v Speaker 1>these sensors gathering data, and then you have uh that

0:22:07.880 --> 0:22:11.360
<v Speaker 1>analysis and streaming of the data happening immediately, and then

0:22:11.400 --> 0:22:14.800
<v Speaker 1>you get the results. Uh, in this case, you don't

0:22:14.840 --> 0:22:17.639
<v Speaker 1>have to worry so much about storing lots of data

0:22:17.760 --> 0:22:21.960
<v Speaker 1>because it doesn't really matter if there was a slow

0:22:22.080 --> 0:22:25.600
<v Speaker 1>spot on the highway two hours ago. What matters is

0:22:25.600 --> 0:22:28.119
<v Speaker 1>what's going on right now, So you don't have to

0:22:28.160 --> 0:22:31.800
<v Speaker 1>worry about building these huge data centers to store all

0:22:31.800 --> 0:22:34.240
<v Speaker 1>that information. You just have to build a system that's

0:22:34.280 --> 0:22:38.200
<v Speaker 1>large enough to handle incoming information and give outgoing information.

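NOTE
Editor's sketch, not part of the spoken episode: a streaming calculation like the traffic case can keep only a short window of recent readings instead of storing history. RecentSpeedWindow, its five-minute default, and the (timestamp, speed) reading format are all assumptions made for illustration, and readings are assumed to arrive in time order.
```python
from collections import deque
class RecentSpeedWindow:
    """Keeps only readings newer than window_seconds; older ones are dropped."""
    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self.readings = deque()  # (timestamp, speed) pairs, oldest first
        self.total = 0.0
    def add(self, timestamp, speed):
        """Ingest one reading, evict stale ones, return the current average."""
        self.readings.append((timestamp, speed))
        self.total += speed
        cutoff = timestamp - self.window_seconds
        # The newest reading is never older than the cutoff, so the deque
        # cannot become empty inside this loop.
        while self.readings[0][0] < cutoff:
            _, old_speed = self.readings.popleft()
            self.total -= old_speed
        return self.total / len(self.readings)
```
Memory use depends on the window length and the arrival rate, not on how long the system has been running, which is why storage matters less here than input and output capacity.
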
0:22:38.200 --> 0:22:41.320
<v Speaker 1>So you have to have a good input output basis. Uh,

0:22:41.440 --> 0:22:43.960
<v Speaker 1>that's what's important in those types of systems. Now, in

0:22:44.040 --> 0:22:46.119
<v Speaker 1>the other types of systems where you are collecting and

0:22:46.160 --> 0:22:48.879
<v Speaker 1>analyzing enormous amounts of information, you have to have a

0:22:48.880 --> 0:22:51.360
<v Speaker 1>place for that information to live, and that's where storage

0:22:51.359 --> 0:22:54.560
<v Speaker 1>comes into play. And that's where we see these enormous

0:22:54.720 --> 0:22:58.680
<v Speaker 1>data centers, buildings that are specifically made to house

0:22:59.240 --> 0:23:02.200
<v Speaker 1>data servers. So it's if you were to walk into

0:23:02.240 --> 0:23:05.720
<v Speaker 1>one of these places, it essentially would look like a

0:23:05.800 --> 0:23:09.720
<v Speaker 1>huge warehouse or maybe even like an airline hangar. I mean,

0:23:09.760 --> 0:23:13.919
<v Speaker 1>these buildings can be enormous and they're filled with shelves

0:23:14.280 --> 0:23:21.439
<v Speaker 1>of servers. They usually have massive HVAC systems. Sometimes they

0:23:21.520 --> 0:23:24.760
<v Speaker 1>ideally they are because of course, the warmer things get,

0:23:24.840 --> 0:23:28.840
<v Speaker 1>the more poorly technology can perform. To an extent. You

0:23:28.840 --> 0:23:31.680
<v Speaker 1>don't want to super cool everything because then it can

0:23:32.160 --> 0:23:34.600
<v Speaker 1>have its own problems, but you do want it to

0:23:34.680 --> 0:23:38.160
<v Speaker 1>maintain a decent operable temperature. So you might have even

0:23:38.160 --> 0:23:41.359
<v Speaker 1>a water cooling system and not just air cooling. They

0:23:41.400 --> 0:23:44.600
<v Speaker 1>have to have sort of a distributed redundancy too, don't they,

0:23:44.960 --> 0:23:48.840
<v Speaker 1>like because if one machine dies and with that many machines,

0:23:48.920 --> 0:23:52.280
<v Speaker 1>you know, just every so often several machines are going

0:23:52.320 --> 0:23:56.439
<v Speaker 1>to die. You you can't lose something, right. So for example, Google,

0:23:56.680 --> 0:23:59.720
<v Speaker 1>which is that's a great example, because Google has lots

0:23:59.720 --> 0:24:03.760
<v Speaker 1>of data centers, uh, and that Google uses famously.

0:24:03.880 --> 0:24:08.480
<v Speaker 1>They use fairly inexpensive servers in the grand scheme of things.

0:24:08.480 --> 0:24:11.480
<v Speaker 1>They're not buying the top of the line, fresh off

0:24:11.520 --> 0:24:15.760
<v Speaker 1>the manufacturing plant servers. They want things that are plentiful

0:24:15.840 --> 0:24:18.480
<v Speaker 1>and easy to replace. Yeah, they're going for efficiency, not

0:24:19.200 --> 0:24:21.840
<v Speaker 1>high power. Yeah, they don't. They don't need each server

0:24:22.040 --> 0:24:25.880
<v Speaker 1>to be able to handle the workload of three other servers.

0:24:25.920 --> 0:24:28.080
<v Speaker 1>They want things that are going to be reliable and

0:24:28.119 --> 0:24:30.159
<v Speaker 1>if it does break down, make it easy to switch

0:24:30.200 --> 0:24:33.160
<v Speaker 1>it out with something else. But on their system, they

0:24:33.200 --> 0:24:37.280
<v Speaker 1>do have lots of redundancy. And it's this idea that

0:24:37.359 --> 0:24:41.160
<v Speaker 1>you know, stuff breaks, machines go down, power goes out.

0:24:41.440 --> 0:24:44.879
<v Speaker 1>With that many, it's just statistically guaranteed. Exactly, you know

0:24:44.960 --> 0:24:47.440
<v Speaker 1>what's going to happen. So the way you protect against

0:24:47.480 --> 0:24:50.560
<v Speaker 1>it is that you build extra you have extra machines

0:24:50.680 --> 0:24:53.200
<v Speaker 1>involved there so that some of them have a little

0:24:53.200 --> 0:24:55.919
<v Speaker 1>bit of information from like if you have servers A

0:24:56.040 --> 0:24:58.399
<v Speaker 1>through Z, server D, um, might have a little bit of

0:24:58.440 --> 0:25:01.960
<v Speaker 1>information from server A, and then maybe even server you know,

0:25:02.200 --> 0:25:04.240
<v Speaker 1>J has a little bit from server A. So it's

0:25:04.280 --> 0:25:06.280
<v Speaker 1>the idea is that you've spread it out so that if

0:25:06.320 --> 0:25:10.080
<v Speaker 1>any one server goes down, you still have access to that information,

0:25:10.160 --> 0:25:13.560
<v Speaker 1>so that there's no interruption in service. Uh. Now there

0:25:13.600 --> 0:25:16.800
<v Speaker 1>are cases of servers that have gone down where that

0:25:16.880 --> 0:25:20.440
<v Speaker 1>was the only really source of that information and that

0:25:20.520 --> 0:25:23.679
<v Speaker 1>has been a spectacular failure. You mean my data is

0:25:23.720 --> 0:25:26.560
<v Speaker 1>not safe. Well, I'm not going to say that, Joe.

0:25:26.640 --> 0:25:30.280
<v Speaker 1>That's not, let's not spread fear, uncertainty, and doubt. That's

0:25:30.320 --> 0:25:33.960
<v Speaker 1>not what this podcast is about. No, it's not that.

0:25:34.040 --> 0:25:36.080
<v Speaker 1>It's just that there have been times in the past

0:25:36.119 --> 0:25:38.560
<v Speaker 1>where it became clear like this is the way to go,

0:25:38.640 --> 0:25:40.679
<v Speaker 1>The redundancy way is the way to go, and I

0:25:40.680 --> 0:25:43.359
<v Speaker 1>would I would say that, you know, I can't imagine

0:25:43.400 --> 0:25:48.200
<v Speaker 1>any operation of the size that would involve big data

0:25:48.640 --> 0:25:51.960
<v Speaker 1>would not also have redundancy plans there. And then of

0:25:52.000 --> 0:25:53.440
<v Speaker 1>course that does mean that you have to have even

0:25:53.440 --> 0:25:57.600
<v Speaker 1>more machines than what you would require at minimum, and

0:25:57.640 --> 0:26:01.040
<v Speaker 1>that requirement is constantly going up. But as we're generating

0:26:01.080 --> 0:26:03.840
<v Speaker 1>more and more data every day. So then the question

0:26:03.880 --> 0:26:06.160
<v Speaker 1>becomes data management. You know, how long do you keep

0:26:06.200 --> 0:26:09.840
<v Speaker 1>that information? At what point do you you know, do

0:26:09.920 --> 0:26:13.760
<v Speaker 1>you ever wipe uh a drive so that you can

0:26:13.920 --> 0:26:16.359
<v Speaker 1>you know, fill it up again? Or it all depends

0:26:16.359 --> 0:26:18.800
<v Speaker 1>on what your, your service is and what the purpose of

0:26:18.840 --> 0:26:22.480
<v Speaker 1>it is. But uh, yeah, I mean that's it's it's

0:26:22.560 --> 0:26:26.960
<v Speaker 1>kind of crazy, like how much how much infrastructure needs

0:26:27.000 --> 0:26:29.119
<v Speaker 1>to exist just so that these zeros and ones have

0:26:29.160 --> 0:26:33.520
<v Speaker 1>a home? Well, so if we extrapolate that outward, that

0:26:33.640 --> 0:26:36.000
<v Speaker 1>leads me to what seems like kind of a weird

0:26:36.160 --> 0:26:40.919
<v Speaker 1>philosophical question almost Uh, is there a limit to the

0:26:41.000 --> 0:26:45.399
<v Speaker 1>kind of data we can process? And ultimately, if you

0:26:45.480 --> 0:26:49.600
<v Speaker 1>say no, there's no real physical limit, there's no necessary limit.

0:26:50.119 --> 0:26:53.760
<v Speaker 1>Is it possible to represent the entire universe, all of

0:26:53.840 --> 0:26:58.240
<v Speaker 1>reality as data? Or is there something about the universe

0:26:58.320 --> 0:27:01.880
<v Speaker 1>that can't ever be reduced to information? I am going

0:27:01.920 --> 0:27:07.679
<v Speaker 1>to tackle your question in multiple parts. Part the first, uh,

0:27:07.880 --> 0:27:10.720
<v Speaker 1>is there a limit? I I hesitate to ever say

0:27:10.720 --> 0:27:13.560
<v Speaker 1>that there is a limit in the sense that there

0:27:13.600 --> 0:27:17.800
<v Speaker 1>there's always more innovation that allows us to do bigger

0:27:17.800 --> 0:27:20.960
<v Speaker 1>and better things. But I will say there is a

0:27:21.000 --> 0:27:24.880
<v Speaker 1>limit to the amount of energy that is in the universe, right, yeah, yeah,

0:27:24.920 --> 0:27:28.240
<v Speaker 1>I was just thinking about like, well, there are, say, physical constants,

0:27:28.280 --> 0:27:30.919
<v Speaker 1>and there's like the speed of light and stuff like that,

0:27:31.040 --> 0:27:35.600
<v Speaker 1>but we're talking about things that ever represent a barrier. Well,

0:27:35.600 --> 0:27:38.160
<v Speaker 1>I don't think we found any kind of physical law

0:27:38.280 --> 0:27:41.440
<v Speaker 1>that states that once you get to this amount of data,

0:27:41.520 --> 0:27:44.399
<v Speaker 1>there's no amount of parsing that you can do to

0:27:44.480 --> 0:27:47.720
<v Speaker 1>make it useful. I don't think, well, certainly, we haven't

0:27:47.800 --> 0:27:50.480
<v Speaker 1>encountered that yet. I don't think it's I don't think

0:27:50.480 --> 0:27:54.240
<v Speaker 1>that's possible, simply because as we get more and more data,

0:27:54.280 --> 0:27:57.200
<v Speaker 1>we're also building more and more powerful machines that can

0:27:57.280 --> 0:28:00.159
<v Speaker 1>handle larger amounts of data. And if we're able to

0:28:00.240 --> 0:28:04.960
<v Speaker 1>break down those problems into smaller bits anyway, then really

0:28:05.000 --> 0:28:07.600
<v Speaker 1>the limitation we're the limiting factor we're looking at here

0:28:07.680 --> 0:28:12.040
<v Speaker 1>is energy, not computing power. So although at the current moment,

0:28:12.200 --> 0:28:14.600
<v Speaker 1>some computer scientists are concerned about the amount of data

0:28:14.600 --> 0:28:17.000
<v Speaker 1>that we're crunching versus there they're saying that the amount

0:28:17.000 --> 0:28:20.639
<v Speaker 1>of data that we're creating is, um, fast outstripping Moore's

0:28:20.720 --> 0:28:23.480
<v Speaker 1>law in terms of how fast processors are going. Sure,

0:28:23.880 --> 0:28:26.200
<v Speaker 1>there will there will be bottlenecks. I mean, there will

0:28:26.240 --> 0:28:29.240
<v Speaker 1>obviously be bottlenecks. But if you're looking at a truly philosophical,

0:28:29.400 --> 0:28:35.120
<v Speaker 1>idealistic approach, you're essentially saying that eventually you could create

0:28:35.200 --> 0:28:38.160
<v Speaker 1>more data than you could possibly process, only because you

0:28:38.240 --> 0:28:40.280
<v Speaker 1>don't have enough there's not enough energy in the universe

0:28:40.320 --> 0:28:43.040
<v Speaker 1>to run all the processors you would need to handle

0:28:43.040 --> 0:28:45.240
<v Speaker 1>that much data. There's that or like I had the

0:28:45.280 --> 0:28:47.600
<v Speaker 1>crazy absurd idea. I know this is silly, but like

0:28:48.080 --> 0:28:51.719
<v Speaker 1>you've got so much data that the server farm is

0:28:51.800 --> 0:28:55.520
<v Speaker 1>so big that the pieces of information are too far

0:28:55.600 --> 0:29:01.120
<v Speaker 1>apart to communicate with each other efficiently. Like physically, I

0:29:01.160 --> 0:29:03.400
<v Speaker 1>can see what you're saying. So you're saying like like

0:29:03.880 --> 0:29:10.200
<v Speaker 1>we had we had yeah, if we had planets, it

0:29:10.240 --> 0:29:13.520
<v Speaker 1>all depends on on Yeah, I can see what you're saying.

0:29:13.520 --> 0:29:15.400
<v Speaker 1>So let's say that Let's say that we're filling up

0:29:15.440 --> 0:29:18.320
<v Speaker 1>space with servers. This again is a very philosophical kind

0:29:18.320 --> 0:29:21.120
<v Speaker 1>of you know, thought experiment approach. But let's say we're

0:29:21.160 --> 0:29:25.760
<v Speaker 1>filling up, fifty years out, where we filled up space

0:29:25.800 --> 0:29:30.320
<v Speaker 1>with with with computer servers, and you were just packing

0:29:30.440 --> 0:29:33.479
<v Speaker 1>space with the servers so that you could process more

0:29:33.520 --> 0:29:37.720
<v Speaker 1>and more data. You could get to all right, enough

0:29:37.760 --> 0:29:39.160
<v Speaker 1>of the comedy. I'm trying to make a point here.

0:29:39.200 --> 0:29:41.800
<v Speaker 1>You can't. You could get to a point in that

0:29:42.000 --> 0:29:45.680
<v Speaker 1>in that theoretically where you've got a server that literally

0:29:45.760 --> 0:29:49.000
<v Speaker 1>is light years away, could be physically next to billions

0:29:49.040 --> 0:29:53.120
<v Speaker 1>of other servers, but it's away from where. Yeah, then

0:29:53.160 --> 0:29:57.000
<v Speaker 1>you're talking about you're limited by the speed of communication.

0:29:57.240 --> 0:30:01.080
<v Speaker 1>Not, it's again not really the processing limitation. It's

0:30:01.520 --> 0:30:04.160
<v Speaker 1>the speed of light that's limiting you. But but you'd

0:30:04.200 --> 0:30:07.680
<v Speaker 1>be able to do it, it would just take time. Um,

0:30:07.760 --> 0:30:11.440
<v Speaker 1>your other question, could all of the universe, all reality itself,

0:30:11.440 --> 0:30:15.640
<v Speaker 1>be broken down into data? I don't know, because first

0:30:15.640 --> 0:30:18.440
<v Speaker 1>of all, we're only able to observe part of the universe,

0:30:18.480 --> 0:30:20.080
<v Speaker 1>and we don't know how much of the rest of

0:30:20.120 --> 0:30:25.400
<v Speaker 1>it there is. Uh. But assuming that we could, that

0:30:25.440 --> 0:30:27.760
<v Speaker 1>would mean that all right, let's let's assume that it

0:30:27.840 --> 0:30:30.640
<v Speaker 1>is possible to break down all of reality into zeros

0:30:30.640 --> 0:30:32.800
<v Speaker 1>and ones. Sure, sure, in that twenty fifty years, we

0:30:32.840 --> 0:30:34.480
<v Speaker 1>have figured out what dark matter is and we have

0:30:34.880 --> 0:30:37.040
<v Speaker 1>observed all of it. Right, We've got we've got to

0:30:37.320 --> 0:30:39.760
<v Speaker 1>we've got it down. Our fingers on the pulse of

0:30:39.760 --> 0:30:41.920
<v Speaker 1>the universe. And we know what makes it tick, and

0:30:41.960 --> 0:30:45.160
<v Speaker 1>then we can actually create a simulation of that because

0:30:45.200 --> 0:30:47.760
<v Speaker 1>we know that, we can break down the universe itself

0:30:47.840 --> 0:30:52.000
<v Speaker 1>into what you know, into data, make that transition. That

0:30:52.040 --> 0:30:54.320
<v Speaker 1>would then raise the argument of well, if we could

0:30:54.320 --> 0:30:57.360
<v Speaker 1>do that, then theoretically we could create a simulation of

0:30:57.400 --> 0:31:01.400
<v Speaker 1>our universe on a smaller scale digitally and then be

0:31:01.440 --> 0:31:03.520
<v Speaker 1>able to run interesting numbers on you know, what would

0:31:03.520 --> 0:31:06.440
<v Speaker 1>have happened if we had tweaked this protein in this protozoa,

0:31:06.680 --> 0:31:08.480
<v Speaker 1>Or what would have happened if there had been more

0:31:09.000 --> 0:31:12.080
<v Speaker 1>antimatter particles rather than matter particles, or what would have

0:31:12.120 --> 0:31:14.840
<v Speaker 1>happened if yeah, if I mean that that age old question,

0:31:14.840 --> 0:31:17.440
<v Speaker 1>if if someone had gone back and killed Hitler, and yeah,

0:31:17.680 --> 0:31:20.000
<v Speaker 1>to the point where you could actually, you know, theoretically

0:31:20.040 --> 0:31:24.240
<v Speaker 1>create life virtual life, which then raises the question, wait,

0:31:24.360 --> 0:31:26.640
<v Speaker 1>if that is possible, If all of that is possible

0:31:26.640 --> 0:31:28.600
<v Speaker 1>for us to be able to make this, to break

0:31:28.640 --> 0:31:31.120
<v Speaker 1>down the universe into a simulation and make it ourselves

0:31:31.200 --> 0:31:33.920
<v Speaker 1>and be able to watch it, really we will. That means, yes,

0:31:33.960 --> 0:31:36.520
<v Speaker 1>we will one day do that because we're people and

0:31:36.560 --> 0:31:38.400
<v Speaker 1>we're curious and we want to do that, which means

0:31:39.720 --> 0:31:44.960
<v Speaker 1>which means Yeah, that that means that that the highest

0:31:45.040 --> 0:31:47.160
<v Speaker 1>likelihood is that we are in fact living in a

0:31:47.160 --> 0:31:50.000
<v Speaker 1>computer simulation right now, because if it's possible, then we

0:31:50.040 --> 0:31:52.560
<v Speaker 1>will do it, and if we will do it, then

0:31:52.760 --> 0:31:56.400
<v Speaker 1>we probably already have done it, and that we the

0:31:56.440 --> 0:31:58.440
<v Speaker 1>people who are living in this reality right now, are

0:31:58.440 --> 0:32:01.200
<v Speaker 1>in fact living through a computer simulation, which could be

0:32:01.560 --> 0:32:04.880
<v Speaker 1>a computer simulation, which could be a computer simulation. Yeah,

0:32:04.920 --> 0:32:07.320
<v Speaker 1>that's what what I started thinking about when we really

0:32:07.320 --> 0:32:10.200
<v Speaker 1>got into the absurdities. Here is this kind of snake

0:32:10.240 --> 0:32:12.840
<v Speaker 1>eating its own tail sort of thing, like, well, imagine

0:32:12.840 --> 0:32:16.560
<v Speaker 1>you could represent the entire universe somehow as data. That

0:32:16.760 --> 0:32:20.760
<v Speaker 1>universe representation would have to include all of the simulations

0:32:20.800 --> 0:32:24.080
<v Speaker 1>and data within the universe. Um. Well, if you were

0:32:24.120 --> 0:32:26.600
<v Speaker 1>creating a simulation of the universe, you could be selective

0:32:26.640 --> 0:32:28.920
<v Speaker 1>in what you were including and what you weren't including.

0:32:28.920 --> 0:32:31.520
<v Speaker 1>But if you were trying to build a an actual like,

0:32:31.840 --> 0:32:35.760
<v Speaker 1>it's essentially like making a map to scale in a

0:32:35.840 --> 0:32:39.080
<v Speaker 1>one to one scale. Yeah, Like, here's my map of

0:32:39.120 --> 0:32:42.000
<v Speaker 1>Atlanta at one to one scale. It's the size of Atlanta.

0:32:42.360 --> 0:32:46.240
<v Speaker 1>Not very useful, Um, but anyway, this is this is

0:32:46.280 --> 0:32:48.600
<v Speaker 1>a philosophical argument that's been made before about whether or

0:32:48.640 --> 0:32:50.680
<v Speaker 1>not we're in a computer simulation, which really was more

0:32:50.800 --> 0:32:54.640
<v Speaker 1>about the idea that we probably will never get there

0:32:54.680 --> 0:32:56.840
<v Speaker 1>in the sense that you know, it wasn't It wasn't

0:32:56.840 --> 0:32:58.760
<v Speaker 1>that we definitely are living in a computer simulation, but

0:32:58.880 --> 0:33:03.520
<v Speaker 1>rather that humankind would very likely end its own existence

0:33:03.560 --> 0:33:06.920
<v Speaker 1>before reaching a point where we were capable of doing

0:33:06.960 --> 0:33:08.480
<v Speaker 1>such a thing. And you're talking about the point where

0:33:08.480 --> 0:33:11.560
<v Speaker 1>you're actually harnessing the power of stars themselves in order

0:33:11.560 --> 0:33:15.000
<v Speaker 1>to generate computer power you need. Well, that's pretty cool,

0:33:15.040 --> 0:33:17.240
<v Speaker 1>and I'm not so much skeptical about that as I

0:33:17.280 --> 0:33:23.000
<v Speaker 1>am about housing consciousness inside a you know, a computer processor, right, Well,

0:33:23.040 --> 0:33:25.960
<v Speaker 1>and and again we this is kind of getting way

0:33:26.000 --> 0:33:28.400
<v Speaker 1>off track, but but but there's the idea that if

0:33:28.400 --> 0:33:31.520
<v Speaker 1>you were able to create a simulation of a human brain,

0:33:31.640 --> 0:33:33.600
<v Speaker 1>there's no way of predicting whether or not it would

0:33:33.680 --> 0:33:38.120
<v Speaker 1>develop its own consciousness. Yeah, we don't know, because we

0:33:38.200 --> 0:33:40.800
<v Speaker 1>haven't been able to build a human brain on a

0:33:40.880 --> 0:33:44.680
<v Speaker 1>real time, uh scale, like a one to one scale

0:33:45.320 --> 0:33:48.600
<v Speaker 1>without you know, we we've built very small models that

0:33:48.640 --> 0:33:52.040
<v Speaker 1>could run in very slow amount of time. But well,

0:33:52.080 --> 0:33:54.080
<v Speaker 1>and and again, I mean, much like the universe we

0:33:54.120 --> 0:33:55.840
<v Speaker 1>really don't know what's going on inside the human brain.

0:33:55.880 --> 0:33:57.640
<v Speaker 1>There's so much of it that we don't understand. Simulating

0:33:57.680 --> 0:34:02.000
<v Speaker 1>the brain. That's it is big data. Yeah, so anyway, uh,

0:34:02.080 --> 0:34:04.400
<v Speaker 1>you know that, I guess that kind of wraps up

0:34:04.440 --> 0:34:07.120
<v Speaker 1>this whole overview of what big data is and why

0:34:07.240 --> 0:34:09.120
<v Speaker 1>it's And I know I keep saying big data and

0:34:09.120 --> 0:34:12.399
<v Speaker 1>switching to big data, but that's what I do every day.

0:34:12.440 --> 0:34:14.719
<v Speaker 1>But but yeah, this is big business, is what it

0:34:14.760 --> 0:34:18.840
<v Speaker 1>really boils down to, because companies are are trying to

0:34:18.880 --> 0:34:22.239
<v Speaker 1>harness all this information to make it meaningful in some way.

0:34:22.520 --> 0:34:26.160
<v Speaker 1>If if it weren't possible to do that, then we'd

0:34:26.160 --> 0:34:29.040
<v Speaker 1>probably see a lot of these services die off pretty

0:34:29.120 --> 0:34:31.640
<v Speaker 1>quickly because there just wouldn't be the support there to

0:34:32.280 --> 0:34:37.600
<v Speaker 1>financially to have them continue. They make money by serving advertising,

0:34:37.719 --> 0:34:40.120
<v Speaker 1>right right, there's the advertising, and then there's you know,

0:34:40.160 --> 0:34:43.520
<v Speaker 1>there's some companies that are not revenue supported, but they

0:34:43.560 --> 0:34:46.680
<v Speaker 1>are supported by investors, and a lot of these these

0:34:46.719 --> 0:34:49.560
<v Speaker 1>investors are saying, look, I know that right now there's

0:34:49.640 --> 0:34:52.160
<v Speaker 1>no direct way that this service is making money, but

0:34:52.200 --> 0:34:55.600
<v Speaker 1>the data it generates is intrinsically valuable, and as soon

0:34:55.640 --> 0:34:58.200
<v Speaker 1>as we figure out a way of leveraging that data.

0:34:58.800 --> 0:35:01.520
<v Speaker 1>We make all of our investments back, So I mean,

0:35:01.760 --> 0:35:04.120
<v Speaker 1>you know, it's a, it's a money game. Step

0:35:04.200 --> 0:35:08.120
<v Speaker 1>four: profit. Exactly. All right. Well, that wraps up

0:35:08.239 --> 0:35:11.799
<v Speaker 1>this conversation about big data and underpants gnomes. If you

0:35:11.840 --> 0:35:15.640
<v Speaker 1>guys have any suggestions for future episodes of Forward Thinking,

0:35:15.960 --> 0:35:17.920
<v Speaker 1>I recommend you get in touch with us. Send us

0:35:17.920 --> 0:35:21.080
<v Speaker 1>an email. That's FW Thinking at discovery dot com, or

0:35:21.120 --> 0:35:23.840
<v Speaker 1>go to FW Thinking dot com. Check out our blogs,

0:35:23.920 --> 0:35:27.160
<v Speaker 1>check out our podcasts, check out the social media, check

0:35:27.160 --> 0:35:28.920
<v Speaker 1>out all of the links we have there. We've got

0:35:28.920 --> 0:35:30.840
<v Speaker 1>some really cool content that we want to share with

0:35:30.880 --> 0:35:32.040
<v Speaker 1>you guys, and we want you to be part of

0:35:32.040 --> 0:35:34.520
<v Speaker 1>the conversation. So come on and join us and we

0:35:34.560 --> 0:35:40.600
<v Speaker 1>will talk to you again really soon. For more on

0:35:40.640 --> 0:35:43.840
<v Speaker 1>this topic and the future of technology, visit forward thinking

0:35:43.920 --> 0:35:57.440
<v Speaker 1>dot com. Brought to you by Toyota. Let's go places.