We are in an era of data explosion, and the amount of global data is growing exponentially!
IDC (International Data Corporation) predicts that global data will reach 175ZB by 2025, with a five-year average compound growth rate of 8%. 1ZB equals 1 trillion GB, so storing 175ZB of data on 1GB portable hard drives would take at least 175 trillion drives. Data storage is set to become a pain point in the development of the Internet.
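The drive count above is plain unit arithmetic. Here is a minimal Python sketch of it. It assumes decimal units, as the text does (1ZB = 1 trillion GB). The helper name `drives_needed` is hypothetical.

```python
# Decimal (SI) units, matching "1ZB equals 1 trillion GB"
GB_PER_ZB = 10**12

def drives_needed(total_zb: float, drive_gb: float = 1.0) -> float:
    """Number of drives of size drive_gb needed to hold total_zb zettabytes."""
    return total_zb * GB_PER_ZB / drive_gb

print(f"{drives_needed(175):.3e}")  # 1.750e+14, i.e. 175 trillion 1GB drives
```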
To solve the data storage problem, researchers have drawn inspiration from biology and turned to the DNA in our own bodies.
The largest human chromosome contains nearly 250 million base pairs. If data could be stored on every base pair, then in theory a coffee cup filled with DNA could hold all the data in the world, according to Mark Bathe, a professor of biological engineering at MIT. At that density, storing 175ZB of data would not be a problem.
This promising emerging storage technology was included in the draft outline of China's 14th Five-Year Plan in March this year. In addition, a steady stream of research and implementation progress in 2021 has made DNA storage technology increasingly popular.
On January 11, 2021, Nature published a paper by a Columbia University team that translated "hello world" into the language of bases and wrote it into E. coli DNA; on May 26, Zhongke Carbon Yuan, incubated by the Shenzhen Institute of Advanced Technology of the Chinese Academy of Sciences, was founded to promote the development and commercialization of DNA data storage; on November 12, Liu Hong's team at Southeast University published a paper in Science Advances on writing the school motto "Strive for Perfection" into DNA; on November 24, Microsoft announced the first nanoscale DNA storage writer...
It should be noted that DNA chips, in the broad sense, are tools for genomics and genetics research. They are made either by synthesizing oligonucleotides in situ on a solid support or by micro-printing a large number of pre-prepared DNA probes in an orderly array onto the support surface, after which the chip is hybridized with labeled samples. Because the support is often a computer chip, the device is called a DNA chip.
There are many types of DNA chips, including chips for detecting genes or chromosomes and chips for clinical diagnosis. The focus here is the type that imitates the structure of DNA molecules to store data: the DNA storage chip.
01
Bases map to binary: a hand-length DNA strand can store 1 billion GB of data
From patterns carved on ancient stone walls to the emergence of writing, and then to books, the most important information carrier of all, humanity did not actually produce much information. But since the start of the information age, the information humans have recorded in the past 50 years has far exceeded that of the previous 2,000 years.
We live in an era of big data and exploding information. All information on the Internet is saved as data, from web pages and applications to the security and satellite fields.
According to IDC, global big-data storage volume was 4.3ZB, 6.6ZB, and 8.6ZB from 2013 to 2015, growing at about 40% a year. In 2016 it reached 16.1ZB, a growth rate of 87.21%. From 2017 to 2019 it was 21.6ZB, 33ZB, and 41ZB, respectively, and in 2020 global data volume reached 60ZB. As the big data field continues to develop, storage methods keep changing to meet the demand for massive data storage.
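As a quick sanity check, the year-over-year growth rates can be recomputed from the IDC volumes quoted above. The helper `yoy_growth` is a hypothetical name used only for this sketch. The 2016 value comes out at about 87.21%, which matches the figure in the text.

```python
# Global big-data storage volume (ZB), as quoted above from IDC
volumes = {2013: 4.3, 2014: 6.6, 2015: 8.6, 2016: 16.1,
           2017: 21.6, 2018: 33.0, 2019: 41.0, 2020: 60.0}

def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Percentage growth of each year over the previous year in the series."""
    years = sorted(series)
    return {year: (series[year] / series[prev] - 1) * 100
            for prev, year in zip(years, years[1:])}

for year, pct in yoy_growth(volumes).items():
    print(f"{year}: {pct:5.1f}%")
```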
▲IDC monitors the trend of global data volume changes from 2015 to 2020 and forecasts it for 2025
DNA is the carrier of genetic information. It holds the information needed to synthesize RNA and proteins and can encode all the information of an organism.
In the 1950s, researchers began to draw parallels between biological structures and man-made objects. DNA molecules are built from four bases, while data is built from binary 0s and 1s. DNA stores genetic information, and data needs a medium in which to be stored. This led Soviet physicist Mikhail Samoilovich Neiman to ask: could the structure of DNA be used to store data?
Unlike traditional storage media, DNA storage technology has the following notable advantages.
First, DNA has high storage density.
A DNA molecule can retain all the genetic information of a species. The largest human chromosome contains nearly 250 million base pairs, which means that a DNA chain about the length of a human hand can store 1EB (1EB=1.074 billion GB) of data.
In terms of storage density, hard disks store about $10^{13}$ bits per cubic centimeter, flash memory about $10^{16}$ bits, and DNA about $10^{19}$ bits.
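One way to make these density figures concrete is to ask how much material 1TB would occupy in each medium. The sketch below only applies the approximate densities quoted above. It assumes decimal units (1TB = $10^{12}$ bytes), and the helper `volume_cm3` is hypothetical.

```python
# Approximate storage densities quoted above, in bits per cubic centimeter
DENSITY_BITS_PER_CM3 = {"hard disk": 1e13, "flash": 1e16, "DNA": 1e19}

def volume_cm3(num_bytes: float, medium: str) -> float:
    """Volume of the medium needed to hold num_bytes at the quoted density."""
    return num_bytes * 8 / DENSITY_BITS_PER_CM3[medium]

one_tb = 1e12  # 1 TB in decimal bytes
for medium in DENSITY_BITS_PER_CM3:
    print(f"{medium:>9}: {volume_cm3(one_tb, medium):.1e} cm^3 per TB")
```

At these densities, DNA needs about a thousandth of the volume of flash memory and about a millionth of the volume of a hard disk.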
Second, DNA is a stable storage medium.
In February this year, a paper in the leading international journal Nature reported that paleontologists had extracted genetic material from mammoths dating back 1.2 million years from the permafrost of northeastern Siberia and analyzed their DNA. This set a new record for the age of preserved DNA.
It is reported that DNA can retain data for at least hundreds of years. In comparison, data on hard drives and magnetic tapes can only be retained for about 10 years at most.
Finally, DNA storage has low maintenance costs.
Data stored in the form of DNA is easy to maintain. Unlike traditional data centers, it does not require a large amount of manpower and financial resources, and only needs to be stored in a low-temperature environment.
In terms of energy consumption, a hard disk storing 1GB of data consumes about 0.04W, while DNA storage consumes less than $10^{-10}$W.
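To put the power figures on a common footing, the sketch below converts them into energy per year for 1PB of stored data. It assumes that both quoted figures apply per GB stored and scale linearly, and it treats the DNA bound as if it were the actual value. The helper `annual_kwh` is hypothetical.

```python
HDD_WATTS_PER_GB = 0.04     # quoted figure for a hard disk
DNA_WATTS_PER_GB = 1e-10    # quoted upper bound for DNA storage
HOURS_PER_YEAR = 24 * 365

def annual_kwh(watts_per_gb: float, gb: float) -> float:
    """Energy in kWh to keep gb gigabytes stored for one year (linear scaling assumed)."""
    return watts_per_gb * gb * HOURS_PER_YEAR / 1000

petabyte_gb = 1e6  # 1 PB in decimal GB
print(f"hard disk: {annual_kwh(HDD_WATTS_PER_GB, petabyte_gb):,.0f} kWh/year")
print(f"DNA:       {annual_kwh(DNA_WATTS_PER_GB, petabyte_gb):.6f} kWh/year")
print(f"ratio:     {HDD_WATTS_PER_GB / DNA_WATTS_PER_GB:.0e}")
```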
02
Low-cost scalability: a chip that can hold millions of DNA sequences
As early as the 1950s, scientists proposed creating artificial objects in the microscopic world that resembled biological structures, and believed such objects would have a wider range of capabilities. Less than a decade later, Soviet physicist Mikhail Samoilovich Neiman independently proposed using DNA and RNA molecules to record, store, and retrieve information.
Using DNA for data storage really began in 1988. That year, artist Joe Davis and researchers from Harvard University stored an image of an ancient Germanic rune, representing life and the female earth, in the DNA of E. coli. The image was laid out as a 5x7 matrix, with binary 1 representing dark pixels and 0 representing bright pixels.
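The pixel-to-bit step described above can be sketched as follows. The bitmap is a hypothetical pattern, not the actual rune. The sketch assumes a layout of 5 columns by 7 rows, read row by row. It covers only the image-to-binary stage, not how the 1988 team mapped those bits onto bases.

```python
# Hypothetical 5-wide x 7-tall bitmap: '#' = dark pixel, '.' = bright pixel.
# Illustrative only; this is NOT the rune image used in 1988.
BITMAP = [
    "..#..",
    "..#..",
    "#.#.#",
    ".###.",
    "..#..",
    ".#.#.",
    "#...#",
]

def bitmap_to_bits(rows: list[str]) -> str:
    """Flatten the matrix row by row: dark pixel -> '1', bright pixel -> '0'."""
    if len(rows) != 7 or any(len(row) != 5 for row in rows):
        raise ValueError("expected 7 rows of 5 pixels")
    return "".join("1" if px == "#" else "0" for row in rows for px in row)

def bits_to_bitmap(bits: str, width: int = 5) -> list[str]:
    """Inverse of bitmap_to_bits, used to check the round trip."""
    return ["".join("#" if b == "1" else "." for b in bits[i:i + width])
            for i in range(0, len(bits), width)]

bits = bitmap_to_bits(BITMAP)
print(len(bits), bits)
```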
In subsequent studies, researchers proposed a variety of encoding methods for DNA storage. In 2011, a research team encoded a 659KB book using a one-to-one correspondence: adenine or cytosine represented binary 0, and guanine or thymine represented 1. However, when the researchers checked the stored data, they found 22 errors in the DNA. This one-to-one encoding method has low accuracy.
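Below is a minimal sketch of a one-bit-per-base code of this kind, with A or C for 0 and G or T for 1. Each bit has two possible bases, so the encoder has some freedom. Here that freedom is used to avoid placing the same base twice in a row. This is an assumed policy for illustration, not necessarily the one the 2011 team used. Both function names are hypothetical.

```python
ZERO_BASES = ("A", "C")  # either base encodes binary 0
ONE_BASES = ("G", "T")   # either base encodes binary 1

def encode_bit_per_base(bits: str) -> str:
    """Encode one bit per base, picking whichever option differs from the previous base."""
    out: list[str] = []
    for b in bits:
        options = ZERO_BASES if b == "0" else ONE_BASES
        prev = out[-1] if out else None
        out.append(options[1] if options[0] == prev else options[0])
    return "".join(out)

def decode_bit_per_base(seq: str) -> str:
    """A or C -> '0', G or T -> '1'."""
    return "".join("0" if base in ZERO_BASES else "1" for base in seq)

print(encode_bit_per_base("0000"))  # ACAC
```

The price of this code is density: at one bit per base, it stores half as much as a code that uses all four bases independently.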
DNA is made of four bases that join into base pairs and form a double-helix structure. The four bases are adenine (A), thymine (T), guanine (G), and cytosine (C). Under the principle of complementary base pairing, A pairs with T and G pairs with C, and the order of the bases along the DNA molecule stores genetic information.
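Complementary pairing means that one strand fully determines its partner. The sketch below derives the partner strand, read in the conventional 5'-to-3' direction (the reverse complement). The function name `complement_strand` is hypothetical.

```python
# Watson-Crick pairing: A <-> T, G <-> C
PAIR = str.maketrans("ATGC", "TACG")

def complement_strand(seq: str) -> str:
    """Return the partner strand read 5'->3', i.e. the reverse complement."""
    return seq.translate(PAIR)[::-1]

print(complement_strand("ATGCGT"))  # ACGCAT
```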
These four "letters" also give DNA storage chips a natural coding alphabet.
▲Schematic diagram of DNA molecular structure
DNA storage technology includes four steps: information encoding, storage, retrieval, and decoding.
In a computer, data is represented as binary 0s and 1s. To store data in DNA, the 0s and 1s must first be converted into the four bases A, C, T, and G, and DNA with the correct base sequence must then be synthesized. The synthesized DNA is stored in vivo or in vitro. For decoding, a DNA sequencer reads out the base sequence, and decoding software converts it back into 0s and 1s to restore the original data.
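The encoding and decoding steps can be sketched with a simple two-bits-per-base mapping. The mapping 00→A, 01→C, 10→G, 11→T is an assumption for illustration, and real systems add constraints and error correction on top. Synthesis, storage, and sequencing are not modeled: the strand is passed straight from encoder to decoder.

```python
# Assumed mapping: each pair of bits -> one base (2 bits per nucleotide)
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    """Encoding step: bytes -> bit string -> base sequence."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(seq: str) -> bytes:
    """Decoding step: base sequence (as read by a sequencer) -> bits -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = bytes_to_dna(b"hi")
print(strand)                # CGGACGGC
print(dna_to_bytes(strand))  # b'hi'
```

Two bits per base is the theoretical maximum for four letters. This sketch reaches that maximum only because it ignores the sequence constraints discussed below.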
In 2012, a research team at Harvard University confirmed that DNA can serve as a storage medium, much like hard drives and tapes. They encoded digital information into DNA, including a 53,400-byte HTML draft, 11 JPG images, and a JavaScript program, using a one-to-one mapping between bits and bases. However, this method can produce long runs of the same base, which makes sequencing error-prone.
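A run of identical bases is called a homopolymer. The check below measures the longest one in a sequence. The example uses a hypothetical fixed mapping (0→A, 1→G) to show how repeated bits turn directly into long runs of a single base.

```python
from itertools import groupby

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)

# Hypothetical fixed mapping 0 -> A, 1 -> G: repeated bits become repeated bases
naive = "".join("A" if b == "0" else "G" for b in "0000000011110000")
print(naive, longest_homopolymer(naive))  # AAAAAAAAGGGGAAAA 8
```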
In 2013, researchers moved beyond this simple one-to-one encoding. Researchers at the European Bioinformatics Institute (EBI) reported that they had stored, retrieved, and replicated more than 5 million bits of data, and that every DNA file reproduced its information with 99.99% to 100% accuracy. During encoding, the team added an error-correction scheme and used overlapping short oligonucleotides that could be identified by their sequences.
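The following is a simplified sketch of the overlapping-fragment idea. A long sequence is cut into short fragments that overlap, so most positions are covered several times. The sequence is then rebuilt by a per-position majority vote, which can outvote an error in one copy. The fragment length, the step size, and the plain start offset used as an index are illustrative assumptions, not the EBI team's actual parameters. All function names are hypothetical.

```python
from collections import Counter

def split_overlapping(seq: str, length: int, step: int) -> list[tuple[int, str]]:
    """Cut seq into fragments of `length` bases starting every `step` bases.

    Each fragment carries its start offset, standing in for the index a real
    scheme would encode into the oligo itself.
    """
    starts = range(0, max(len(seq) - length, 0) + 1, step)
    frags = [(s, seq[s:s + length]) for s in starts]
    if frags[-1][0] + length < len(seq):  # make sure the tail is covered
        s = len(seq) - length
        frags.append((s, seq[s:]))
    return frags

def reassemble(frags: list[tuple[int, str]], total_len: int) -> str:
    """Rebuild the sequence by majority vote over all fragments covering each position."""
    votes = [Counter() for _ in range(total_len)]
    for start, frag in frags:
        for i, base in enumerate(frag):
            votes[start + i][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)

original = "ACGTTGCAAC" * 4
frags = split_overlapping(original, length=20, step=5)
# Simulate one synthesis/sequencing error in the copy of position 20 held by fragment 2
start, frag = frags[2]
frags[2] = (start, frag[:10] + ("C" if frag[10] != "C" else "A") + frag[11:])
print(reassemble(frags, len(original)) == original)  # True: 3 good copies outvote 1 bad
```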
Since then, research teams from Columbia University, University of Washington, Imperial College London and other institutions have carried out a series of studies.
To demonstrate the long-term stability of DNA-encoded data, researchers from the Swiss Federal Institute of Technology (ETH Zurich) published a paper in the leading international journal Angewandte Chemie International Edition on February 4, 2015. They added redundancy with Reed-Solomon error-correcting codes and used a sol-gel process to encapsulate the DNA in silica glass spheres. This may be the earliest form of a DNA storage chip.
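A faithful Reed-Solomon implementation is too long to include here. The sketch below therefore swaps in a much simpler technique, a single XOR parity block, to illustrate the same principle of adding redundancy so that lost data can be rebuilt. Unlike Reed-Solomon, it can recover only one missing block and cannot locate corrupted symbols. All names are hypothetical.

```python
from functools import reduce
from typing import Optional

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(blocks: list[bytes]) -> list[bytes]:
    """Append one parity block: the XOR of all equal-length data blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    return blocks + [reduce(xor_blocks, blocks)]

def recover_missing(stored: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (None) by XOR-ing all the others."""
    missing = [i for i, b in enumerate(stored) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can only recover one loss")
    restored = list(stored)
    if missing:
        present = [b for b in restored if b is not None]
        restored[missing[0]] = reduce(xor_blocks, present)
    return restored[:-1]  # drop the parity block

data = [b"DNA-", b"stor", b"age!"]
stored = add_parity(data)
stored[1] = None                # simulate a lost fragment
print(recover_missing(stored))  # [b'DNA-', b'stor', b'age!']
```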
Since November 2021, several research teams have announced new progress in DNA storage chip research, including groups from Southeast University, Microsoft Research, Northwestern University in Illinois, and the Georgia Institute of Technology.
On November 12, Liu Hong's team from the School of Life Sciences and Medical Engineering and the State Key Laboratory of Bioelectronics at Southeast University in China successfully stored the school motto "Strive for Perfection" in a DNA sequence. The paper was published in Science Advances.
To make DNA storage miniaturized, integrated, and automated, the research team optimized the synthesis and sequencing process. Using an electrochemical single-electrode method for DNA synthesis and sequencing, they improved traditional phosphoramidite chemical synthesis with electrochemical deprotection. They then sequenced the DNA molecules on the electrode surface based on the charge oscillation phenomenon, successfully encoding and decoding the school motto.
▲Flowchart of the DNA data storage system based on electrochemical DNA synthesis and sequencing developed by Liu Hong’s team (Image source: Southeast University official website)
On November 24, a paper on a breakthrough in DNA storage made by Microsoft Research and the Molecular Information Systems Laboratory (MISL) of the University of Washington was published in Science Advances. The research team announced the first nanoscale DNA storage writer. The molecular controller and DNA writer on the DNA chip are equipped with a PCIe interface, which can construct four strands of synthetic DNA at a time, producing a DNA chain containing 100 bases.
Microsoft Research said that longer DNA chains are more prone to errors, but this will be improved as hardware develops. The experiment proved the possibility of expanding the storage scale of DNA helical structure.
On November 29, the Center for Synthetic Biology at Northwestern University in Illinois proposed a new method for recording information into DNA, as reported by Technology Networks. For encoding, the researchers tried to harness the capabilities of DNA itself to create a new data storage solution.
In the experiment, they used a new enzymatic system to synthesize DNA, recording rapidly changing environmental signals directly into the DNA sequence. Keith EJ Tyo, a professor of engineering at Northwestern University, said that by directly controlling the enzymes that synthesize DNA, it is possible to express and store information in advance.
To scale up DNA data storage while reducing costs, researchers at the Georgia Tech Research Institute (GTRI) have designed a new chip. "The functional density on our new chip is about 100 times higher than current commercial devices," Nicholas Guise, a senior research scientist at GTRI, told the BBC on December 1.
The chip can grow DNA strands in an ultra-dense format at very low cost, giving it large-scale storage capacity. It has 10 groups of "microwells" several hundred nanometers deep, in which DNA molecules grow in parallel, eventually accumulating millions of DNA sequences on the chip. Compared with the traditional process for manufacturing synthetic DNA, this method uses electrochemically localized activation of synthesis, which makes it much cheaper.
▲Experimental encoding and decoding process of the Georgia Tech Research Institute (GTRI) research team (image source: illustration from the paper)
03
$7,000 to synthesize 2MB, $2,000 to read it
Ongoing research suggests that DNA storage could open a new era of data storage. However, since the idea was proposed in the 1950s, there has been little substantive progress in its development. Microsoft Research, an early entrant in DNA data storage, began related research in 2015, but it was not until 2019 that it demonstrated a fully automated system for encoding data into DNA and decoding it.
DNA storage chips offer high density and long-term storage, but the technology is not yet widely used in computing. It is currently used mainly for content that is rarely accessed but must be preserved.
Several factors are keeping DNA storage chips from commercialization.
First, the cost of writing and reading DNA storage data is high.
In 2017, Columbia University's experiment showed that it cost $7,000 to synthesize 2MB of DNA data, and $2,000 to read the data. Although this is much lower than the cost of $12,400 per megabyte in 2013, if a user needs to store a 1GB movie in DNA form, encoding it will cost about $3.58 million, and reading the data will cost another $1.02 million.
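The movie-cost figures follow directly from the 2017 prices. They match exactly when 1GB is taken as 1,024MB (binary units), which is the assumption in this sketch. The helper `cost_usd` is hypothetical.

```python
SYNTH_USD_PER_2MB = 7000  # 2017 synthesis cost quoted above
READ_USD_PER_2MB = 2000   # 2017 reading cost quoted above
MB_PER_GB = 1024          # binary convention; reproduces the figures in the text

def cost_usd(size_gb: float, usd_per_2mb: float) -> float:
    """Cost of processing size_gb gigabytes at a price quoted per 2MB."""
    return size_gb * MB_PER_GB / 2 * usd_per_2mb

print(cost_usd(1, SYNTH_USD_PER_2MB))  # 3584000.0 -> about $3.58 million
print(cost_usd(1, READ_USD_PER_2MB))   # 1024000.0 -> about $1.02 million
```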
Second, decoding DNA-stored data requires bulky equipment.
At present, decoding in DNA storage still relies on sequencers to read the DNA molecules. Most sequencers on the market are built for laboratories, clinical applications, and other settings with strict turnaround requirements, and they remain far from everyday use.
▲The iSeq 100 sequencer product of sequencing service provider Illumina (picture source: Illumina official website)
In addition, DNA storage has slow read and write speeds. In early December 2021, research at the Georgia Institute of Technology raised the DNA storage speed to 20GB of data per day, whereas solid-state drives currently read and write at about 500MB per second. According to IDC's "Data Age 2025" report, the world will generate 175ZB of data a year by 2025, equivalent to 491EB per day. Even if DNA storage chips become dense enough, their real-time read speed cannot meet current data storage needs.
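The throughput gap can be quantified with the figures above. The sketch uses binary units (1EB = 1,024PB, and so on), which reproduces the 491EB-per-day figure. It also estimates how many 20GB-per-day writers would be needed to keep up. That estimate is a hypothetical extrapolation that assumes perfectly parallel writers.

```python
DNA_GB_PER_DAY = 20      # GTRI figure quoted above
SSD_MB_PER_S = 500       # typical SSD read/write speed quoted above
SECONDS_PER_DAY = 86_400
ANNUAL_ZB = 175          # IDC forecast for 2025

ssd_gb_per_day = SSD_MB_PER_S * SECONDS_PER_DAY / 1024  # binary MB -> GB
daily_eb = ANNUAL_ZB * 1024 / 365                       # binary ZB -> EB, per day
writers_needed = daily_eb * 1024**3 / DNA_GB_PER_DAY    # EB -> GB, then / 20 GB

print(f"SSD: {ssd_gb_per_day:,.1f} GB/day, about {ssd_gb_per_day / DNA_GB_PER_DAY:,.0f}x DNA")
print(f"Data generated per day in 2025: {daily_eb:.0f} EB")
print(f"DNA writers at 20 GB/day needed: {writers_needed:.2e}")
```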
DNA storage chips are an ideal medium for future large-capacity storage. However, most current research is still at the proof-of-concept stage, and it will take a long time before the hardware is ready for deployment.
04
Conclusion: the key to commercializing DNA storage is achieving low cost and high density
DNA memory chips have the advantages of high storage density, high stability and easy maintenance, which determine their potential to become the next generation of storage devices. However, there are still many limitations to the further commercialization of this technology, such as high cost, many storage environment restrictions, slow real-time reading speed, etc., which indicate that it still has a long way to go before it can become a mainstream storage device.
We live in a digital age in which smartphones, tablets, PCs, and wearable devices generate huge amounts of information every day. This makes it urgent to find storage devices that offer higher performance at lower cost.
The half-life of DNA is 521 years. Under cold or suitable conditions, DNA can persist for hundreds of thousands or even millions of years. If DNA storage technology is truly commercialized, in the future, our data archives may become "fossils" and be preserved.
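A half-life describes exponential decay. Assuming simple first-order decay at the quoted 521-year half-life, the fraction of intact DNA left after t years is $0.5^{t/521}$. The sketch below evaluates this formula. Real degradation rates depend strongly on temperature and storage conditions, and the helper `fraction_remaining` is hypothetical.

```python
HALF_LIFE_YEARS = 521  # figure quoted above

def fraction_remaining(years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """Fraction left intact after `years`, assuming first-order (exponential) decay."""
    return 0.5 ** (years / half_life)

for t in (100, 521, 1000, 5210):
    print(t, f"{fraction_remaining(t):.4f}")
```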
Focus on industry hot spots and understand the latest frontiers
Please pay attention to EEWorld electronic headlines
https://www.eeworld.com.cn/mp/wap
The following WeChat public accounts belong to
EEWorld(www.eeworld.com.cn)
EEWorld Subscription Account: Electronic Engineering World
EEWorld Service Account: Electronic Engineering World Welfare Club