DNA Data Storage: A Decade of Coding and Decoding, How Far Have We Got?

Abstract

DNA molecules offer an alternative medium for storing information at extremely high spatial density. They remain stable for hundreds of years at a low energy cost, making them a natural information repository. As we approach the 10th anniversary of Church et al.’s publication, it is timely to look back and review the main accomplishments since then. Although it was not the first time that information had been encoded in DNA, their work demonstrated the technology’s viability. Storing digital data in DNA raises challenges common to other storage media, such as speeding up the writing and reading of data, and some that are specific to the medium, such as its characteristic error types and the need to encode data in a non-binary alphabet. The goal of this presentation is to show how the coding algorithms for DNA data storage have evolved over the last decade, including the mapping schemes that convert data between bits and DNA bases, data compression, and the data recovery strategies that handle the errors inherent to DNA storage. Finally, the analyses are summarized in a feature comparison table of the major mapping and error-correcting methods in the recent literature. The table considers the maximum data density (excluding the index and redundancy segments); the constraints that the algorithms satisfy (homopolymer runs, GC content, self-reverse complementarity, and undesirable motifs); and the inner and outer codes used as the ECC (Error Correction Code).
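To make the idea of a mapping scheme and its constraints concrete, the following is a minimal sketch, not a method taken from the surveyed literature. It assumes a naive fixed mapping of two bits per base (A=00, C=01, G=10, T=11), which reaches 2 bits per nucleotide but does nothing to avoid homopolymers or to balance GC content. The function names `encode`, `decode`, `max_homopolymer_run`, and `gc_content` are hypothetical helpers introduced only for illustration.

```python
# Illustrative only: a naive 2-bits-per-base mapping (an assumption,
# not a scheme attributed to any specific publication).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each pair of bits to one DNA base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(); assumes an error-free strand of whole bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def max_homopolymer_run(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = current = 0
    previous = None
    for base in strand:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


strand = encode(b"Hi")
print(strand, max_homopolymer_run(strand), gc_content(strand))
```

Because this naive mapping places no restriction on the output, an input such as a run of zero bytes produces a long homopolymer of A. Constrained mapping schemes of the kind compared in the presentation give up some density to rule out such strands.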
