Scrolls thousands of years old provide us with a glimpse into long-forgotten cultures and the knowledge of our ancestors. In this digital era, in contrast, a large part of our knowledge is located on servers and hard drives. It will be a challenge for this data to survive 50 years, let alone thousands of years. Researchers are therefore searching for new ways to store large volumes of data over the long term. Particular attention is being paid to a storage medium found in nature: the genetic material DNA.
DNA lends itself to this task as it can store large amounts of information in a compact manner. Unfortunately, the data is not always retrievable error-free: gaps and false information in the encoded data arise through chemical degradation and mistakes in DNA sequencing. Now researchers led by Robert Grass, a lecturer at ETH Zurich's Department of Chemistry and Applied Biosciences, have revealed how the long-term, error-free storage of information can be achieved, potentially for more than a million years. First, they encapsulate the information-bearing segments of DNA in silica (glass) and second, they use an algorithm in order to correct mistakes in the data.
'Synthetic fossil' forms a protective cloak
Two years ago, researchers demonstrated that data could be saved and reread in the form of DNA. In that case, the time between 'writing' the information - the synthesis of the corresponding coding sequence of the DNA - and the reading, or sequencing, of the data was very short. But even a short period presents a problem in terms of the margin of error, as mistakes occur in both the writing and the reading of the DNA. Over the longer term, DNA can change significantly as it reacts chemically with the environment, which presents an obstacle to long-term storage. However, genetic material found in fossilised bones several hundred thousand years old can still be isolated and analysed because it has been encapsulated and protected. "Similar to these bones, we wanted to protect the information-bearing DNA with a synthetic 'fossil' shell," explains Grass.
In order to do that, his team encapsulated the DNA in silica spheres with a diameter of roughly 150 nanometres. The researchers encoded Switzerland's Federal Charter of 1291 and The Methods of Mechanical Theorems by Archimedes in the DNA. In order to simulate the degradation of the information-bearing DNA over a long period of time, researchers stored it at a temperature of between 60 and 70 degrees Celsius for up to a month. Such high temperatures replicate the chemical degradation that takes place over hundreds of years within a few weeks. In this manner, researchers could compare the storage of DNA in a sheath of silica glass with other common storage methods: on impregnated filter paper and in a biopolymer. The DNA encapsulated in the glass shell turned out to be particularly robust. Through the use of a fluoride solution, it could be easily separated from the silica glass, and the information read from it.
Because encapsulation in silica is roughly comparable to encapsulation in fossilised bones, the researchers could draw on prehistoric information about the long-term stability of encapsulated DNA and from this calculate a prognosis: stored at low temperatures, such as the minus 18 degrees Celsius maintained in the Svalbard Global Seed Vault, DNA-encoded information can survive for over a million years. In contrast, data recorded on microfilm can be preserved for only an estimated 500 years.
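The article does not say which model the researchers used for their prognosis. One common way to reason about accelerated ageing is the Arrhenius relation, in which the reaction rate scales with exp(-Ea / (R * T)). The sketch below assumes that relation holds; the function name and the activation energy are hypothetical placeholders, not values from the study.

```python
import math

R = 8.314  # universal gas constant in J/(mol*K)

def acceleration_factor(ea_j_per_mol, hot_c, cold_c):
    """Ratio of Arrhenius reaction rates at hot_c versus cold_c (degrees Celsius)."""
    hot_k = hot_c + 273.15
    cold_k = cold_c + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / cold_k - 1.0 / hot_k))

# ASSUMPTION: placeholder activation energy, NOT a value reported in the study.
EA_PLACEHOLDER = 100_000.0  # J/mol

# One week at 65 degrees C corresponds to `factor` weeks at -18 degrees C.
factor = acceleration_factor(EA_PLACEHOLDER, hot_c=65.0, cold_c=-18.0)
print(f"1 week at 65 C is roughly {factor / 52:.0f} years at -18 C (placeholder Ea)")
```

The result is very sensitive to the activation energy, so any real extrapolation would have to use a measured value for the material in question.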
Retrieval of lost data points
Nevertheless, it is not enough simply to store the information over long periods of time without substantial damage; the data must also be readable without error. Thanks to significant technological advances in DNA sequencing, reading the stored data is affordable and will become even more cost-effective in the future. These technologies, however, are not error-free.
To address this problem, Reinhard Heckel from ETH Zurich's Communication Technology Laboratory developed an error-correction scheme based on Reed-Solomon codes, similar to those used in the transmission of data over long distances, for example in radio communication with spacecraft. The key is additional information attached to the actual data, explains Heckel. "In order to define a parabola, you basically need only three points. We added a further two in case one gets lost or is shifted." The DNA-encoded data is more complex, but in principle the researchers' DNA-encoded 'back-up' functions in the same manner. Even when stored in adverse conditions, the information saved for the test - Switzerland's Federal Charter and Archimedes' text - could be retrieved error-free.
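Heckel's parabola analogy can be turned into a small toy example. The sketch below is not the researchers' actual code: it treats three message symbols as the coefficients of a parabola over a small prime field, stores five points on it, and recovers the message when two points are lost or one is shifted. The field size and all function names are assumptions made for illustration, and the brute-force decoder only makes sense at this tiny scale.

```python
from itertools import combinations

P = 257  # small prime field, chosen purely for illustration

def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... modulo P using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def encode(message, n):
    """Turn k message symbols into n points (x, y) on the polynomial."""
    return [(x, evaluate(message, x)) for x in range(1, n + 1)]

def interpolate(points):
    """Lagrange interpolation: coefficients of the polynomial through the points."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # product of (x - xj) for j != i, lowest degree first
        denom = 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            # Multiply the running product by (x - xj).
            basis = [(a - xj * b) % P for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs

def decode(received, k):
    """Recover k symbols from surviving points, some of which may be wrong."""
    allowed_errors = (len(received) - k) // 2
    for subset in combinations(received, k):
        candidate = interpolate(list(subset))
        agree = sum(evaluate(candidate, x) == y for x, y in received)
        if len(received) - agree <= allowed_errors:
            return candidate
    raise ValueError("too many lost or corrupted points")

message = [72, 105, 33]           # three symbols define the parabola
points = encode(message, 5)       # two extra points as redundancy

lost_two = [points[0], points[2], points[4]]          # two points lost
shifted = list(points)
shifted[1] = (shifted[1][0], (shifted[1][1] + 7) % P)  # one point shifted

assert decode(lost_two, 3) == message
assert decode(shifted, 3) == message
```

With five points on a parabola, any three intact points suffice to rebuild it, which is why two losses or a single shifted point can be tolerated. Practical Reed-Solomon decoders use far more efficient algebraic methods than trying every subset.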
What kind of information would Grass save for millions of years? The documents in Unesco's Memory of the World Programme, he says. And Wikipedia as well: "Many entries are described in detail, others less so. This probably provides a good overview of what our society knows, what occupies it and to what extent."
Literature reference: Grass RN, Heckel R, Puddu M, Paunescu D, Stark WJ: Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition, 54 (8), 2552-2555, DOI: 10.1002/anie.201411378.