WEBVTT - Short Stuff: DNA Data Storage

0:00:04.160 --> 0:00:06.200
<v Speaker 1>Hey, and welcome to the Short Stuff. I'm Josh, and

0:00:06.280 --> 0:00:08.639
<v Speaker 1>there's Chuck, and Jerry's here too, and so's Dave in

0:00:08.720 --> 0:00:12.360
<v Speaker 1>spirit, and we're coming at you from the future of

0:00:12.520 --> 0:00:13.000
<v Speaker 1>right now.

0:00:14.120 --> 0:00:18.480
<v Speaker 2>This is one of those where it's so interesting, so cool,

0:00:18.920 --> 0:00:21.920
<v Speaker 2>so mind blowing and so promising, and then you get

0:00:21.920 --> 0:00:27.400
<v Speaker 2>to the very end and then you're like, oh.

0:00:25.960 --> 0:00:27.720
<v Speaker 1>To me, that just meant just give it a little

0:00:27.760 --> 0:00:28.280
<v Speaker 1>more time.

0:00:28.480 --> 0:00:30.080
<v Speaker 2>No, and a lot of times that is the case,

0:00:30.080 --> 0:00:31.880
<v Speaker 2>and probably will be in this case. But it was

0:00:31.920 --> 0:00:35.000
<v Speaker 2>such a oh. And you'll see what in about you know,

0:00:35.080 --> 0:00:36.520
<v Speaker 2>twelve-ish minutes what we're talking about.

0:00:36.840 --> 0:00:39.720
<v Speaker 1>So essentially, what we're talking about first is data. We've

0:00:39.720 --> 0:00:42.479
<v Speaker 1>got a lot of data. Like anytime somebody says something,

0:00:42.720 --> 0:00:45.360
<v Speaker 1>thinks something, writes something down, somebody comes up with a

0:00:45.360 --> 0:00:48.760
<v Speaker 1>new recipe or a new patent or whatever, that gets encoded.

0:00:48.800 --> 0:00:51.520
<v Speaker 1>It's data that gets saved. We don't really throw stuff

0:00:51.560 --> 0:00:55.520
<v Speaker 1>away anymore. And so we're kind of awash in data.

0:00:56.080 --> 0:00:57.720
<v Speaker 1>And if you want to take that data, you want

0:00:57.720 --> 0:00:59.480
<v Speaker 1>to save it, you want to preserve it. Let's say,

0:00:59.480 --> 0:01:04.080
<v Speaker 1>it's really, like you're the Library of Congress. Sure. Get this, man,

0:01:04.120 --> 0:01:06.240
<v Speaker 1>I did not realize this: what you do is you

0:01:06.280 --> 0:01:09.160
<v Speaker 1>take that data and you transfer onto the same kind

0:01:09.160 --> 0:01:14.240
<v Speaker 1>of magnetic reels that those old room-sized computer mainframes

0:01:14.360 --> 0:01:17.720
<v Speaker 1>used to read and write data. Yeah, you put it

0:01:17.760 --> 0:01:21.319
<v Speaker 1>on tape. Yeah, exactly. Well, I didn't realize that, but

0:01:21.880 --> 0:01:27.080
<v Speaker 1>it's just the proven go-to means of long-term

0:01:27.120 --> 0:01:30.240
<v Speaker 1>it's called archival storage of the kind of data that

0:01:30.319 --> 0:01:33.960
<v Speaker 1>you don't really need to access anytime soon. It's called

0:01:34.000 --> 0:01:38.080
<v Speaker 1>low touch data. You're just putting it literally in cold storage.

0:01:38.720 --> 0:01:40.959
<v Speaker 2>Yeah, I mean, it's been around for a long time,

0:01:41.600 --> 0:01:46.679
<v Speaker 2>very dependable, very durable, very reliable. It doesn't cost a

0:01:46.680 --> 0:01:48.960
<v Speaker 2>lot of money. It can hold a ton of data.

0:01:49.680 --> 0:01:53.480
<v Speaker 2>One tape can hold between one million and fifteen million

0:01:53.520 --> 0:01:58.400
<v Speaker 2>gigabytes or one to fifteen petabytes. That's a lot of stuff.

0:01:58.840 --> 0:01:59.120
<v Speaker 1>Really.

0:02:00.000 --> 0:02:03.280
<v Speaker 2>The problem is, and you know it's all relative, but

0:02:04.760 --> 0:02:08.040
<v Speaker 2>they're kind of big, but not big big. They're three

0:02:08.040 --> 0:02:10.160
<v Speaker 2>inches by three inches and you're like, Chuck, that's not

0:02:10.240 --> 0:02:12.640
<v Speaker 2>very big at all, But that is big when you

0:02:12.720 --> 0:02:16.440
<v Speaker 2>talk about you know, potentially billions of these things and

0:02:16.480 --> 0:02:19.680
<v Speaker 2>having to store them in a place that is, like

0:02:19.720 --> 0:02:23.040
<v Speaker 2>you said, cold storage. So it's the cost of building

0:02:23.080 --> 0:02:27.040
<v Speaker 2>these cold storage buildings that is the issue. When it

0:02:27.040 --> 0:02:28.840
<v Speaker 2>comes to this three by three inch thing.

0:02:29.200 --> 0:02:31.520
<v Speaker 1>That and then also you know they've been around for

0:02:31.600 --> 0:02:34.200
<v Speaker 1>three quarters of a century, so we know they last

0:02:34.240 --> 0:02:35.960
<v Speaker 1>that long if you keep them in cold storage, but

0:02:36.000 --> 0:02:38.880
<v Speaker 1>we don't know exactly how long they will last, so

0:02:39.200 --> 0:02:43.560
<v Speaker 1>there's also a question of that. So that combined with

0:02:43.680 --> 0:02:47.800
<v Speaker 1>so cost, questions about how long it will last, and

0:02:47.840 --> 0:02:52.280
<v Speaker 1>then also just the enormous amounts of information we're adding

0:02:52.520 --> 0:02:56.239
<v Speaker 1>every year are making people look for other ways to

0:02:56.720 --> 0:03:00.600
<v Speaker 1>encapsulate data, to encode data in ways that are cheaper,

0:03:00.639 --> 0:03:03.959
<v Speaker 1>that are smaller, that require less money to keep cold.

0:03:04.520 --> 0:03:07.399
<v Speaker 1>And what they've come up with, Chuck. For anybody who

0:03:07.480 --> 0:03:10.080
<v Speaker 1>has looked at the title of this episode, they won't

0:03:10.080 --> 0:03:13.160
<v Speaker 1>be very surprised. But DNA, that's right.

0:03:13.639 --> 0:03:15.080
<v Speaker 2>I know it's early, but we got to take a

0:03:15.080 --> 0:03:15.600
<v Speaker 2>break right there.

0:03:15.520 --> 0:03:17.799
<v Speaker 1>Right, agreed. All right, we'll be right back.

0:03:38.800 --> 0:03:41.280
<v Speaker 2>All right. So you dropped a pretty big truth bomb

0:03:41.360 --> 0:03:45.280
<v Speaker 2>on everyone. I'm sure there are people that for sixty seconds,

0:03:45.320 --> 0:03:50.280
<v Speaker 2>were like, what? Storing data on DNA? Dude, that's in

0:03:50.360 --> 0:03:53.480
<v Speaker 2>my body? Like, what are you talking about putting data

0:03:53.480 --> 0:03:54.120
<v Speaker 2>in my body?

0:03:54.360 --> 0:03:55.240
<v Speaker 1>You got that straight?

0:03:56.120 --> 0:03:58.920
<v Speaker 2>You don't have that straight. But here's here's a pretty

0:03:58.920 --> 0:04:01.160
<v Speaker 2>good as far as how much this stuff can hold.

0:04:01.640 --> 0:04:05.280
<v Speaker 2>This is pretty staggering stuff. And this is from a

0:04:05.320 --> 0:04:08.720
<v Speaker 2>couple of dudes from the Los Alamos National Lab, and

0:04:09.040 --> 0:04:12.720
<v Speaker 2>I think you got it from Scientific American. Here's how

0:04:12.800 --> 0:04:18.280
<v Speaker 2>much DNA can hold. Seventy four million million bytes of information,

0:04:18.320 --> 0:04:20.280
<v Speaker 2>which is basically the Library of Congress.

0:04:20.400 --> 0:04:20.960
<v Speaker 1>That's a lot.

0:04:21.200 --> 0:04:24.040
<v Speaker 2>That's a lot. You can put that, if you were

0:04:24.040 --> 0:04:28.160
<v Speaker 2>putting it on DNA, into the size of something as

0:04:28.160 --> 0:04:31.800
<v Speaker 2>big as a poppy seed, six thousand times over. Right.

0:04:33.120 --> 0:04:35.360
<v Speaker 2>Said another way, if you split that seed in half,

0:04:35.920 --> 0:04:39.200
<v Speaker 2>you could store all of the data on Facebook.

0:04:39.920 --> 0:04:44.680
<v Speaker 1>Yeah, and then by twenty twenty five, the size of

0:04:44.760 --> 0:04:49.160
<v Speaker 1>the data that humanity's generated, it will reach an estimated

0:04:49.200 --> 0:04:52.760
<v Speaker 1>thirty-three zettabytes, so three point three followed by

0:04:52.800 --> 0:04:56.960
<v Speaker 1>twenty two zeros of bytes of information, a lot of bytes.

0:04:57.760 --> 0:05:01.400
<v Speaker 1>If you can transcribe that all to DNA, you could

0:05:01.440 --> 0:05:04.400
<v Speaker 1>fit the whole thing into a ping pong ball. Yeah,

0:05:04.560 --> 0:05:09.839
<v Speaker 1>not a three by three plastic cartridge, multiple times over.

0:05:10.160 --> 0:05:12.720
<v Speaker 1>A single ping pong ball could hold all of the

0:05:12.760 --> 0:05:15.880
<v Speaker 1>world's data. And you can make multiple ping pong balls

0:05:15.880 --> 0:05:16.880
<v Speaker 1>as backups too.

0:05:17.160 --> 0:05:19.359
<v Speaker 2>Yeah, and you don't need to. And it's pretty easy

0:05:19.360 --> 0:05:22.480
<v Speaker 2>to duplicate them, apparently, and you don't need to keep

0:05:22.520 --> 0:05:24.600
<v Speaker 2>them in the fridge, even though you could put it

0:05:24.640 --> 0:05:28.400
<v Speaker 2>in an egg carton sure and be set. You don't

0:05:28.440 --> 0:05:30.520
<v Speaker 2>even have to. It's going to last a long time

0:05:31.600 --> 0:05:33.919
<v Speaker 2>not being in cold storage, and probably even longer in

0:05:33.960 --> 0:05:34.680
<v Speaker 2>cold storage.

0:05:34.760 --> 0:05:37.039
<v Speaker 1>You could give a ping pong ball to every living

0:05:37.080 --> 0:05:40.520
<v Speaker 1>human to keep in their fridge and like it would

0:05:40.600 --> 0:05:43.159
<v Speaker 1>have no problem whatsoever. Be like here, you keep this

0:05:43.240 --> 0:05:46.040
<v Speaker 1>cold for one hundred and fifty years.

0:05:45.839 --> 0:05:48.640
<v Speaker 2>And only half of them would eat that ping pong ball thinking

0:05:48.680 --> 0:05:49.279
<v Speaker 2>it was an egg.

0:05:49.640 --> 0:05:52.080
<v Speaker 1>Yeah. Yeah, so you'd still be left with all of

0:05:52.120 --> 0:05:53.240
<v Speaker 1>those backups.

0:05:53.800 --> 0:05:57.359
<v Speaker 2>Here's where it gets super interesting though, because you know,

0:05:57.400 --> 0:05:59.560
<v Speaker 2>as most people listening probably are, like I said, as

0:05:59.600 --> 0:06:01.680
<v Speaker 2>I was reading this, I was like, Okay, that's a

0:06:01.720 --> 0:06:04.559
<v Speaker 2>cool idea, but like, how in the world does this work?

0:06:05.120 --> 0:06:09.080
<v Speaker 2>And it turns out that it's not that mind blowing

0:06:09.080 --> 0:06:11.000
<v Speaker 2>or difficult. I'm not saying I could go out and

0:06:11.040 --> 0:06:13.359
<v Speaker 2>do it, but it makes a lot of sense to

0:06:13.360 --> 0:06:17.359
<v Speaker 2>wrap your head around. Because DNA, as we all know,

0:06:18.040 --> 0:06:26.280
<v Speaker 2>is composed of four nucleotides, or at least combinations of guanine, thymine, adenine,

0:06:26.320 --> 0:06:34.599
<v Speaker 2>and cytosine. Just remember GTAC and Gattaca. Yeah, ooh, ironically,

0:06:35.480 --> 0:06:39.000
<v Speaker 2>all this digital data, though, is encoded, that's out there

0:06:39.040 --> 0:06:41.680
<v Speaker 2>in the world, as everyone knows, in ones and zeros.

0:06:42.360 --> 0:06:44.880
<v Speaker 2>So it's it's you know, it sounds like you know,

0:06:44.920 --> 0:06:47.000
<v Speaker 2>and it can be any combination of ways. But when

0:06:47.000 --> 0:06:48.719
<v Speaker 2>you really break it down, it's really you can either

0:06:48.760 --> 0:06:51.560
<v Speaker 2>just have zero zero, zero one, one zero, or one

0:06:51.640 --> 0:06:55.360
<v Speaker 2>one as far as those combinations go. And that's four things.

0:06:55.640 --> 0:06:58.640
<v Speaker 2>And there are those four nucleotides. So if you just

0:06:58.760 --> 0:07:02.240
<v Speaker 2>like say, hey, each one of these nucleotides is going

0:07:02.279 --> 0:07:05.080
<v Speaker 2>to be assigned to a different number, then that's all you need.

0:07:05.320 --> 0:07:06.400
<v Speaker 2>There's the key to your map.

0:07:06.800 --> 0:07:12.000
<v Speaker 1>Yeah, so say adenine stands for zero zero, and guanine

0:07:12.000 --> 0:07:15.200
<v Speaker 1>stands for one one, and so forth, each one

0:07:15.240 --> 0:07:18.160
<v Speaker 1>stands for one of those pairs of possible combinations. Then

0:07:18.200 --> 0:07:21.800
<v Speaker 1>you can take any string of binary data zeros and

0:07:21.840 --> 0:07:26.520
<v Speaker 1>ones and turn it into genetic code based on those nucleotides.

0:07:26.560 --> 0:07:29.360
<v Speaker 1>So like you would just have you go from a

0:07:29.360 --> 0:07:31.920
<v Speaker 1>string of ones and zeros to a string of A's, T's, G's,

0:07:32.000 --> 0:07:35.360
<v Speaker 1>and C's. That's it. The thing is is you're you're

0:07:35.360 --> 0:07:40.480
<v Speaker 1>not turning ones and zeros into letters. You're actually transcribing

0:07:41.000 --> 0:07:45.520
<v Speaker 1>the ones and zeros from binary code into physical genetic material.
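
NOTE
Editor's sketch of the two-bit mapping described above, in Python.
The hosts only say adenine stands for 00 and guanine for 11; the codes
for thymine (01) and cytosine (10) are assumptions for illustration, as
are the function names encode and decode. Real DNA storage schemes also
add error correction and avoid long repeats, which this sketch ignores.
```python
# Two bits per nucleotide. A=00 and G=11 follow the episode; T=01 and C=10 are assumed.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
def encode(data: bytes) -> str:
    """Turn bytes into a string of A/T/C/G letters, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))
def decode(strand: str) -> bytes:
    """Reverse encode(): read each base back as two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```
This only produces the letter sequence; the physical step of synthesizing
the strand, which the hosts go on to describe, is outside what code can show.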

0:07:45.720 --> 0:07:49.840
<v Speaker 1>You're actually putting a base of adenine right there.

0:07:49.960 --> 0:07:53.520
<v Speaker 1>You're putting a base of thymine next to it, like,

0:07:53.960 --> 0:07:56.720
<v Speaker 1>depending on how the code reads with the ones and

0:07:56.720 --> 0:07:59.880
<v Speaker 1>the zeros and what order they're in. You're actually physically

0:08:00.120 --> 0:08:04.760
<v Speaker 1>creating genetic material DNA. But rather than encoding the information

0:08:05.000 --> 0:08:09.440
<v Speaker 1>to building a living thing, you're encoding the information to

0:08:10.680 --> 0:08:13.880
<v Speaker 1>the entire catalog of Stuff You Should Know. And honestly,

0:08:14.240 --> 0:08:18.800
<v Speaker 1>isn't that the first thing we should preserve in DNA? Sure? Good?

0:08:19.160 --> 0:08:21.520
<v Speaker 2>After the movies of Gene Wilder.

0:08:22.800 --> 0:08:24.920
<v Speaker 1>How about at the same time as the movies of

0:08:24.960 --> 0:08:26.920
<v Speaker 1>Gene Wilder, can we just agree to that?

0:08:26.960 --> 0:08:27.680
<v Speaker 2>How dare you?

0:08:28.400 --> 0:08:33.200
<v Speaker 1>Hey, I think highly of us and Gene Wilder. Uh.

0:08:33.320 --> 0:08:35.120
<v Speaker 2>I don't know why he's been on my mind lately,

0:08:35.160 --> 0:08:35.760
<v Speaker 2>but he has been.

0:08:36.160 --> 0:08:38.240
<v Speaker 1>He's been shaking it for you in your head.

0:08:38.679 --> 0:08:41.160
<v Speaker 2>He's been shaking it for me. So this all sounds great.

0:08:41.240 --> 0:08:43.880
<v Speaker 2>And like I mentioned at the very beginning, this is

0:08:43.920 --> 0:08:45.920
<v Speaker 2>one of those things where like, holy cow, this is

0:08:45.960 --> 0:08:48.959
<v Speaker 2>the future, this is it, and then the L at

0:08:48.960 --> 0:08:52.680
<v Speaker 2>the end, and the L is that it's really expensive

0:08:53.640 --> 0:08:56.080
<v Speaker 2>to do this. Like, we can do this, we figured

0:08:56.080 --> 0:08:58.360
<v Speaker 2>out how to do this, it's possible, we have the

0:08:58.400 --> 0:09:02.440
<v Speaker 2>tech to do this, but here's a tape named

0:09:02.679 --> 0:09:06.040
<v Speaker 2>LTO-9. It's a magnetic storage tape. You can

0:09:06.040 --> 0:09:07.640
<v Speaker 2>get it for eight bucks and you can get one

0:09:07.679 --> 0:09:11.240
<v Speaker 2>petabyte of storage. That would cost you about a trillion

0:09:11.320 --> 0:09:14.040
<v Speaker 2>dollars to do for DNA.

0:09:14.440 --> 0:09:16.400
<v Speaker 1>Yeah, there was a guy who was interviewed in Ars

0:09:16.440 --> 0:09:19.360
<v Speaker 1>Technica named Hyunjun Park. He's the CEO of

0:09:19.400 --> 0:09:22.640
<v Speaker 1>a data storage company called Catalog, and he even estimated said,

0:09:22.679 --> 0:09:24.680
<v Speaker 1>let's say it cost you three cents to print a

0:09:24.720 --> 0:09:28.959
<v Speaker 1>single nucleotide. Yes, that's cheap, but for each base pair, right,

0:09:29.000 --> 0:09:30.760
<v Speaker 1>now you're up to six cents. And then now you're

0:09:30.800 --> 0:09:34.880
<v Speaker 1>translating gigabytes, you're entering millions of dollars. So if it

0:09:34.920 --> 0:09:39.440
<v Speaker 1>cost millions of dollars to translate a gigabyte, it cost

0:09:39.559 --> 0:09:42.720
<v Speaker 1>trillions of dollars to do a petabyte. And the other

0:09:42.760 --> 0:09:45.959
<v Speaker 1>problem of it, too, Chuck, is that it's really really slow, right.
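
NOTE
Editor's rough check of the cost figures above, in Python. It takes the
quoted three cents per nucleotide and assumes two bits per nucleotide
(the simple four-letter mapping) and decimal units (a gigabyte as 10**9
bytes, a petabyte as 10**15 bytes). The function name is hypothetical,
and the six-cents-per-base-pair figure would double these totals.
```python
CENTS_PER_NUCLEOTIDE = 3   # figure quoted in the episode
BITS_PER_NUCLEOTIDE = 2    # assumed: one of four bases encodes two bits
GIGABYTE = 10**9           # bytes, decimal convention (assumed)
PETABYTE = 10**15          # bytes, decimal convention (assumed)
def synthesis_cost_dollars(num_bytes: int) -> float:
    """Estimated cost of printing enough nucleotides to hold num_bytes."""
    nucleotides = num_bytes * 8 // BITS_PER_NUCLEOTIDE
    return nucleotides * CENTS_PER_NUCLEOTIDE / 100
print(f"1 GB: ${synthesis_cost_dollars(GIGABYTE):,.0f}")
print(f"1 PB: ${synthesis_cost_dollars(PETABYTE):,.0f}")
```
Under these assumptions a gigabyte comes to about 120 million dollars and
a petabyte to about 120 trillion dollars, in line with the "millions" and
"trillions" mentioned in the conversation.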

0:09:46.600 --> 0:09:49.560
<v Speaker 2>It's super slow. So this is a clear case of

0:09:49.600 --> 0:09:51.200
<v Speaker 2>one of those things like you mentioned, which is like

0:09:51.320 --> 0:09:54.480
<v Speaker 2>just wait, because like with any technology, it's going to

0:09:54.520 --> 0:09:57.439
<v Speaker 2>get quicker, it's going to get cheaper. I don't know

0:09:57.480 --> 0:09:59.320
<v Speaker 2>if this is like one hundred years into the future,

0:09:59.360 --> 0:10:01.480
<v Speaker 2>but I don't think at this point the cost is

0:10:01.880 --> 0:10:05.480
<v Speaker 2>just so outrageous that no government is going to

0:10:05.559 --> 0:10:06.280
<v Speaker 2>fund something like this.

0:10:06.360 --> 0:10:09.600
<v Speaker 1>I mean, a trillion dollars for one petabyte of information

0:10:10.360 --> 0:10:13.480
<v Speaker 1>is not, you're not going to sell that very, very easily.

0:10:13.600 --> 0:10:16.319
<v Speaker 1>And then yeah, like I was saying, the speed, if

0:10:16.320 --> 0:10:21.360
<v Speaker 1>you're transferring information from one of those magnetic storage tapes,

0:10:21.679 --> 0:10:25.679
<v Speaker 1>you're transferring it about a gigabyte per second, typically. If

0:10:25.720 --> 0:10:29.120
<v Speaker 1>it takes even like a second to print a single nucleotide,

0:10:29.160 --> 0:10:32.120
<v Speaker 1>which is still very fast, but you're we're thinking on

0:10:32.240 --> 0:10:35.200
<v Speaker 1>human level fast. We need to think on like how

0:10:35.240 --> 0:10:39.680
<v Speaker 1>many ones and zeros are in the average gigabyte of code.

0:10:40.480 --> 0:10:44.439
<v Speaker 1>Now you're talking about decades to transfer a petabyte disc's

0:10:44.520 --> 0:10:50.160
<v Speaker 1>worth of information using DNA technology. Yeah, so, yes, it's

0:10:50.400 --> 0:10:54.040
<v Speaker 1>very slow right now, it's very expensive right now, But

0:10:54.240 --> 0:10:56.400
<v Speaker 1>I don't think we're one hundred years off, Chuck, because

0:10:56.480 --> 0:10:59.719
<v Speaker 1>we're able to do this now relatively cheaply because the

0:11:00.280 --> 0:11:03.040
<v Speaker 1>Human Genome Project came along. That was twenty years ago.

0:11:03.480 --> 0:11:05.600
<v Speaker 1>Think about how much, how long, how far we've come,

0:11:05.760 --> 0:11:08.280
<v Speaker 1>And this is like the hardest chunk the first twenty years.

0:11:08.640 --> 0:11:10.640
<v Speaker 1>I think it's just going to get faster and easier.

0:11:10.920 --> 0:11:12.480
<v Speaker 1>I don't think we're going to be waiting one hundred

0:11:12.520 --> 0:11:14.160
<v Speaker 1>years to see DNA data storage.

0:11:14.640 --> 0:11:16.280
<v Speaker 2>Does that mean that Stuff You Should Know is in

0:11:16.320 --> 0:11:19.760
<v Speaker 2>the hardest chunk when you're fifteen?

0:11:19.920 --> 0:11:25.040
<v Speaker 1>I think so. Yeah, it feels like it. Okay, I'm kidding. Well,

0:11:25.080 --> 0:11:29.760
<v Speaker 1>I'm teasing Chuck, right, Yeah, just teasing, which means, of course,

0:11:30.160 --> 0:11:33.840
<v Speaker 1>short stuff is out.

0:11:34.440 --> 0:11:37.320
<v Speaker 2>Stuff you Should Know is a production of iHeartRadio. For

0:11:37.400 --> 0:11:41.560
<v Speaker 2>more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts,

0:11:41.679 --> 0:11:41.719
<v Speaker 2>or

0:11:41.760 --> 0:11:49.040
<v Speaker 1>Wherever you listen to your favorite shows.