DNA has stored 'data' for millions of years in the form of its four bases: A, T, G, and C. Synthetic attempts to recreate this enormous data-storing capability promise the same storage density and stability, but have yet to compete with current optical and magnetic storage devices because of high error rates and large costs. A novel approach stores the data in the backbone of native DNA instead of in the bases themselves; this greatly reduces costs while reducing the error rate to zero. The technique is also capable of bitwise random-access memory, opening the door to the world of molecular computing.

In nature, DNA is found as a double-helix ladder comprising billions of molecular building blocks known as bases (A, T, G, and C), which code the instructions for proteins. The closely packed nature of these nucleotides enables DNA to encode huge quantities of data: one gram of DNA can hold 455 exabytes [1] (1 exabyte = 10^18 bytes), enough storage for all of the data produced by every major tech company, with room to spare. Traditional data storage methods are reaching their physical limits, with hard drives restricted to about 1 terabyte (10^12 bytes) per square inch [2]. With the exponential growth of big data, alternative storage media such as DNA are being explored, made possible by huge advances in DNA sequencing technology.

Current synthesis-based DNA storage approaches [2–4] assign segments of binary code to specific nucleotide sequences, which are then synthesised in vitro using enzymes to join the nucleotides together, iterating through many cycles to build up the DNA strand. The information encoded in the DNA can then be retrieved via next-generation sequencing (NGS) or third-generation nanopore sequencing. Nanopore sequencing entails feeding a strand of DNA through a tiny hole, much like a thread going through a needle, reading each nucleotide as it passes through.
Advancements in sequencing technologies allow large sequences to be read at relatively low cost: sequencing the first human genome cost ≈ £2.07 billion [2], whereas sequencing a full genome now costs around £1,000 [2].

However, despite the promise of synthesis-based DNA storage, problems arise during the synthesis step. First, adding one nucleotide per cycle means a full sequence takes hours to complete, making synthesis very slow and expensive compared with traditional optical and magnetic writing techniques. Second, DNA synthesis is naturally a very error-prone process, with around 1% of bases containing substitution errors (an incorrect base attached) or indel errors (insertion or deletion of a base) [3]. This is a serious issue when dealing with large volumes of DNA, where errors are inevitable, and it can have dire repercussions for compressed file formats. Finally, DNA synthesis machinery is limited to creating relatively short sequences, reducing the amount of readouts, and hence data, per fragment [5].

Tabatabaei et al. [5] propose a novel method of storing data in DNA via a topological 'nicking' approach instead of direct synthesis. The data is stored not in the nucleotides themselves, but in the sugar-phosphate backbone of the DNA. This makes it possible to use a known DNA sequence that can be mapped to a reference genome, bypassing the aforementioned issues associated with synthesis-based storage.
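The nicking idea can be made concrete with a small model: a known reference strand has a fixed set of candidate backbone sites, and each bit is written as the presence (1) or absence (0) of a nick at its site. The reference sequence, site positions, and helper names below are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of topological 'nicking' storage: bits live in the
# sugar-phosphate backbone, not the bases. A nicked site reads as 1,
# an intact site as 0. All positions here are made up for illustration.
REFERENCE = "ATGCGTACGTTAGCCATGGA"  # native sequence; its bases carry no data
NICK_SITES = [2, 5, 9, 13, 17]      # predefined backbone positions, one per bit

def write_bits(bits: str) -> set:
    """Return the set of nicked positions encoding the bit string."""
    return {site for site, bit in zip(NICK_SITES, bits) if bit == "1"}

def read_bits(nicks: set) -> str:
    """Full readout: check every candidate site against the reference."""
    return "".join("1" if site in nicks else "0" for site in NICK_SITES)

def read_bit(nicks: set, i: int) -> str:
    """Bitwise random access: inspect a single site, no full readout."""
    return "1" if NICK_SITES[i] in nicks else "0"

nicked = write_bits("10110")
assert read_bits(nicked) == "10110"
assert read_bit(nicked, 2) == "1"
```

Because the base sequence is fixed and known in advance, 'writing' never synthesises new DNA, and reading a single bit only requires probing one site, which is what enables the bitwise random access mentioned earlier.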