Is DNA the Next Frontier for Data Storage?
Whether you are an individual consumer or a large enterprise, data storage is a hassle: setting up cloud solutions, or even just external hard drives, can be extremely time-consuming. And things might get more stressful still. It's recommended that data on hard drives be fully replaced every decade, so a set of results sitting safely on a pen drive won't necessarily be readable (if you want to look at them) ten or fifteen years down the line. In fact, long-term data storage remains an uncertain area, and some applications are still served by seemingly antiquated technology such as magnetic tape.
However, there may be a new answer: storing our digital data in DNA. In nature, DNA holds a great deal of information in a tiny space for an extremely long time. But the best way to transfer our data onto DNA is still a topic of extensive debate. To find out more, we spoke to Dr Bill Efcavitch, Co-founder and Chief Scientific Officer of Molecular Assemblies, Inc., a company that has developed its own proprietary method of keeping our data safe on DNA.
Molly Campbell (MC): How is binary data incorporated into physical DNA? What challenges are typically encountered in this process?
Bill Efcavitch (BE): To encode binary data, the zeroes and ones are converted into a two-nucleotide code: A and T bases represent zero and one bits in odd positions, and G and C bases represent zero and one bits in even positions. Molecular Assemblies’ proprietary approach encodes each bit as a short homopolymer of A, C, G, or T rather than as a single nucleotide. We therefore created a different process for writing DNA data strands from the one used for “natural” DNA strands. In this way, we have greatly simplified the synthesis cycle, enabling high-throughput hardware to achieve highly parallel and fast “write” speeds.
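To make the odd/even scheme concrete, here is a minimal Python sketch. It is an illustration, not Molecular Assemblies' software, and it rests on assumptions: bit positions are counted from one, the bases are read in the order given (A and G for zero, T and C for one), and every homopolymer run is three bases long. The names `encode_bits` and `decode_bits` are hypothetical. Because odd and even positions draw on different base pairs, adjacent runs never share a base, so a reader can find run boundaries without knowing how long each run is.

```python
# Hypothetical base assignment, read from the description above:
# odd-numbered bits (1st, 3rd, ...) use A for 0 and T for 1;
# even-numbered bits (2nd, 4th, ...) use G for 0 and C for 1.
ODD = {"0": "A", "1": "T"}
EVEN = {"0": "G", "1": "C"}
ODD_REV = {base: bit for bit, base in ODD.items()}
EVEN_REV = {base: bit for bit, base in EVEN.items()}


def encode_bits(bits: str, run_length: int = 3) -> str:
    """Write each bit as a homopolymer run of a single base (assumed length)."""
    runs = []
    for i, bit in enumerate(bits):
        table = ODD if i % 2 == 0 else EVEN  # index 0 is the 1st (odd) bit
        runs.append(table[bit] * run_length)
    return "".join(runs)


def decode_bits(strand: str) -> str:
    """Recover bits by collapsing runs; the length of each run is ignored."""
    bits = []
    prev = None
    for base in strand:
        if base == prev:
            continue  # still inside the same homopolymer run
        prev = base
        table = ODD_REV if len(bits) % 2 == 0 else EVEN_REV
        bits.append(table[base])
    return "".join(bits)
```

For example, `encode_bits("0110")` yields `"AAACCCTTTGGG"`, and a strand with uneven runs such as `decode_bits("AACCCCTGG")` still returns `"0110"`. Tolerating imprecise run lengths is one plausible benefit of a homopolymer code; that motivation is an inference rather than something stated in the interview.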
We have demonstrated this data storage ability by converting a text message into a binary code, encoding the binary data into the letters of DNA (A, G, C, and T bases), and then writing the message as a DNA sequence. To recover the text message, this process is reversed.
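The round trip described above (text to binary, binary to DNA letters, then back again) can be sketched in a few lines of Python. This is a hedged illustration rather than the company's pipeline: it assumes UTF-8 to turn text into bits, the odd/even scheme from the previous answer with A or G standing for zero and T or C for one (the interview does not specify which base carries which value), and a fixed run length per bit. `text_to_dna` and `dna_to_text` are hypothetical names.

```python
# Hypothetical base assignment: index 0 of each pair encodes bit 0, index 1 encodes bit 1.
ODD, EVEN = "AT", "GC"  # odd-numbered bits use A/T, even-numbered bits use G/C


def text_to_dna(message: str, run_length: int = 3) -> str:
    # Step 1: text -> binary (UTF-8 bytes, 8 bits per byte)
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    # Step 2: binary -> DNA letters, one homopolymer run per bit
    return "".join((ODD if i % 2 == 0 else EVEN)[int(b)] * run_length
                   for i, b in enumerate(bits))


def dna_to_text(strand: str) -> str:
    # Reverse step 2: collapse each run to one base (adjacent runs always differ)
    bases = [b for i, b in enumerate(strand) if i == 0 or b != strand[i - 1]]
    # ...then map each base back to its bit, alternating odd/even tables
    bits = "".join(str((ODD if i % 2 == 0 else EVEN).index(b))
                   for i, b in enumerate(bases))
    # Reverse step 1: binary -> bytes -> text
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

With these assumptions, `text_to_dna("Hi")` produces a 48-base strand (two bytes, 16 bits, three bases per bit), and `dna_to_text` recovers `"Hi"` from it. Runs cost extra bases per bit compared with one base per bit; the interview presents the homopolymer approach as a way to simplify the synthesis cycle.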
Ruairi Mackenzie (RM): What separates Molecular Assemblies’ DNA data storage techniques from those developed by, for example, Twist Bioscience in collaboration with Microsoft?
BE: We believe that enzymatic DNA synthesis is the optimal way to create a cost-effective, sustainable and scalable integrated system that stores and reads information in DNA. Molecular Assemblies is the first industry group to disclose the successful completion of an end-to-end run to store and retrieve digital information in DNA using enzymatic synthesis.
Molecular Assemblies’ technology employs a unique, aqueous-based enzymatic synthesis approach, while most DNA providers use the now 38-year-old, non-aqueous phosphoramidite chemical synthesis method. The latter is limited to DNA strands fewer than 150 nucleotides long, while our technology has the potential to synthesize DNA data strands that are 10 to 50 times longer.
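As a rough worked example, taking the 150-nucleotide ceiling of phosphoramidite synthesis as the baseline, the stated 10- to 50-fold improvement would correspond to strands of roughly:

```latex
L_{\text{enzymatic}} \approx k \times 150\ \text{nt}, \quad k \in [10, 50]
\;\Rightarrow\; L_{\text{enzymatic}} \approx 1{,}500 \text{ to } 7{,}500\ \text{nt}
```

These are potential lengths implied by the quoted ratio, not measured results.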
The older chemical synthesis method requires extensive post-synthesis processing, and is therefore time-consuming and expensive. It is also not optimal for data storage applications, as the DNA sequences produced are short and the cycle requires complex hardware. Our proprietary method improves on traditional chemical synthesis by producing DNA with minimal purification and processing steps. Its use of aqueous, non-toxic reagents makes it compatible with a data storage facility and will allow the development of integrated write/read devices, which can never be achieved with the current chemical method.
Molecular Assemblies has developed two distinct DNA synthesis technologies: one creates long, pure strands of DNA for life sciences applications and markets, while the other writes long DNA data strands. Our proprietary enzymatic method can therefore synthesize the desired DNA sequences for either purpose.
Molecular Assemblies’ proprietary enzymatic DNA synthesis method is based on making DNA the way nature makes DNA, which enables us to produce long, high-quality, sequence-specific DNA reliably, affordably, and sustainably. Also, our technology has the potential to drastically simplify and streamline the reading, writing, and storage of DNA, and it opens up the possibility of creating integrated DNA data storage devices.
RM: Why shouldn't we be satisfied with current electronic data storage methods?
BE: It is estimated that only about 10% of the data currently generated is saved, and that global annual data production will reach 163 zettabytes by 2025 (one zettabyte is about one trillion gigabytes). So how and where will all this information be stored if we run out of storage space?
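The unit conversion above can be checked directly, assuming decimal (SI) prefixes, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The last line applies the roughly 10% retention figure to the 2025 projection, which is an assumption, since the interview cites that figure for data generated today:

```latex
1\ \text{ZB} = \frac{10^{21}\ \text{bytes}}{10^{9}\ \text{bytes/GB}} = 10^{12}\ \text{GB}
\qquad
163\ \text{ZB} = 163 \times 10^{12}\ \text{GB} = 1.63 \times 10^{14}\ \text{GB}
\\[4pt]
0.10 \times 163\ \text{ZB} \approx 16.3\ \text{ZB saved per year (if the 10\% rate held)}
```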
Long-term information storage is currently achieved using tape media and disc drives. These physical storage media need to be rewritten every five to seven years to prevent data loss. What’s more, large-scale data storage facilities consume a great deal of power and occupy a large real-estate footprint.
Current electronic data storage methods are not sufficient to hold the predicted flood of data, and we think that DNA could be an important part of the solution. If we can emulate the way nature stores genetic information in the genome, essentially as microscopic DNA packages that compact massive amounts of data into nanograms of stable, easily replicable material, we could revolutionize data storage.
RM: What improvements on current storage tech will DNA data storage offer?
BE: A key facet of DNA-based storage methods is the ability to pack tons of data into microscopic spaces. The higher data density of DNA means a smaller physical footprint for devices.
Also, DNA as a storage medium has an extremely long lifetime and is virtually indestructible, and its passive power consumption is much lower than that of current methods.
DNA-based data storage is immune to the electromagnetic pulse generated by nuclear weapons, which is important to the survivability of both government and financial records.
Modern DNA synthesis approaches are designed to maximize the speed and efficiency of the readback mechanism.
RM: How much would it currently cost to store a megabyte of data using DNA synthesis? How long will it be before the speed and cost of DNA storage matches that of electronic methods?
BE: It is too early to make this estimate because the technologies are still under development. With enough capital investment, we believe that the costs will be greatly reduced. Currently, the amount of capital injected into DNA data storage R&D is minuscule compared to the cash flow into electronic methods, because it is a newer technology still in its infancy.
The exponential increase in the generation of data will force the adoption of DNA-based data storage in the not too distant future. If done successfully, DNA data storage could eventually become cheaper than current electronic methods of data storage.
As for speed, the writing speed of DNA storage methods may never match that of electronic methods. But that doesn’t matter for long-term, archival storage, because once the data is written, it will be stored for 20 to 30 years. We can’t predict the future, but it is safe to say that today’s DNA writing speeds are compatible with long-term storage.
MC: Are there any security implications associated with storing data in DNA?
BE: Molecular Assemblies’ data strand synthesis technology generates non-biological DNA that could be a more secure form of data storage, because it is much harder to hack and cannot be repurposed for bioterrorism. Since Molecular Assemblies uses short homopolymers to encode data bits, the resulting DNA is not usable in biological systems. The DNA data writing hardware is actually incapable of writing the non-homopolymer DNA that would be required to create harmful pathogens and toxins for bioterrorism.
Bill Efcavitch was speaking to Molly R Campbell and Ruairi J Mackenzie, Science Writers for Technology Networks.