HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints
- PMID: 32675237
- PMCID: PMC7414044
- DOI: 10.1073/pnas.2004821117
Abstract
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
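The kind of user-defined sequence constraint mentioned above (limiting repeats and keeping windowed GC content within bounds) can be illustrated with a minimal sketch. This is not the HEDGES implementation: the function names, the window length of 12, the GC bounds of 0.35 to 0.65, and the maximum run length of 4 are all hypothetical values chosen for illustration, and the idea of filtering the candidate next base against the constraints is an assumption about how such constraints could be enforced during encoding.

```python
def gc_fraction(window: str) -> float:
    """Fraction of G and C bases in a non-empty window."""
    return sum(base in "GC" for base in window) / len(window)


def longest_run(seq: str) -> int:
    """Length of the longest run of identical bases in seq (0 if empty)."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def allowed_next_bases(prefix: str, window: int = 12,
                       gc_min: float = 0.35, gc_max: float = 0.65,
                       max_run: int = 4) -> list:
    """Bases that may follow prefix without breaking either constraint.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    allowed = []
    for base in "ACGT":
        full = prefix + base
        # Repeat constraint: the newest max_run + 1 bases must not be identical.
        if longest_run(full[-(max_run + 1):]) > max_run:
            continue
        # GC constraint: only checked once a full window is available.
        recent = full[-window:]
        if len(recent) == window and not gc_min <= gc_fraction(recent) <= gc_max:
            continue
        allowed.append(base)
    return allowed


if __name__ == "__main__":
    print(allowed_next_bases("ACGTAAAA"))     # a fifth A would exceed max_run
    print(allowed_next_bases("GCGCGCGATAT"))  # another G or C would exceed gc_max
```

In a sketch like this, an encoder would choose each output base only from the returned list, so the constraint costs some coding capacity wherever fewer than four bases are allowed.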
Keywords: DNA; Reed–Solomon; error-correcting code; indel; information storage.
Copyright © 2020 the Author(s). Published by PNAS.
Similar articles
- Indel-correcting DNA barcodes for high-throughput sequencing. Hawkins JA, Jones SK Jr, Finkelstein IJ, Press WH. Proc Natl Acad Sci U S A. 2018 Jul 3;115(27):E6217-E6226. doi: 10.1073/pnas.1802640115. Epub 2018 Jun 20. PMID: 29925596. Free PMC article.
- Error-correcting codes and information in biology. Battail G. Biosystems. 2019 Oct;184:103987. doi: 10.1016/j.biosystems.2019.103987. Epub 2019 Jul 8. PMID: 31295534.
- Efficient DNA-based data storage using shortmer combinatorial encoding. Preuss I, Rosenberg M, Yakhini Z, Anavy L. Sci Rep. 2024 Apr 2;14(1):7731. doi: 10.1038/s41598-024-58386-z. PMID: 38565928. Free PMC article.
- Novel Modalities in DNA Data Storage. Lim CK, Nirantar S, Yew WS, Poh CL. Trends Biotechnol. 2021 Oct;39(10):990-1003. doi: 10.1016/j.tibtech.2020.12.008. Epub 2021 Jan 14. PMID: 33455842. Review.
- Error-Free Synthetic DNA by Molecular Dictation. Knyphausen P, Lindenburg L, Hollfelder F. Trends Biotechnol. 2021 Sep;39(9):861-865. doi: 10.1016/j.tibtech.2021.02.001. Epub 2021 Feb 27. PMID: 33653603. Review.
Cited by
- Efficient DNA Coding Algorithm for Polymerase Chain Reaction Amplification Information Retrieval. Wang Q, Zhang S, Li Y. Int J Mol Sci. 2024 Jun 11;25(12):6449. doi: 10.3390/ijms25126449. PMID: 38928155. Free PMC article.
- Explorer: efficient DNA coding by De Bruijn graph toward arbitrary local and global biochemical constraints. Dou C, Yang Y, Zhu F, Li B, Duan Y. Brief Bioinform. 2024 Jul 25;25(5):bbae363. doi: 10.1093/bib/bbae363. PMID: 39073829. Free PMC article.
- Adaptive coding for DNA storage with high storage density and low coverage. Cao B, Zhang X, Cui S, Zhang Q. NPJ Syst Biol Appl. 2022 Jul 4;8(1):23. doi: 10.1038/s41540-022-00233-w. PMID: 35788589. Free PMC article.
- Overcoming the High Error Rate of Composite DNA Letters-Based Digital Storage through Soft-Decision Decoding. Xu Y, Ding L, Wu S, Ruan J. Adv Sci (Weinh). 2024 Aug;11(30):e2402951. doi: 10.1002/advs.202402951. Epub 2024 Jun 14. PMID: 38874370. Free PMC article.
- In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). Mortuza GM, Guerrero J, Llewellyn S, Tobiason MD, Dickinson GD, Hughes WL, Zadegan R, Andersen T. BMC Bioinformatics. 2023 Apr 21;24(1):160. doi: 10.1186/s12859-023-05264-6. PMID: 37085766. Free PMC article. Review.