HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints
- PMID: 32675237
- PMCID: PMC7414044
- DOI: 10.1073/pnas.2004821117
Abstract
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
Keywords: DNA; Reed–Solomon; error-correcting code; indel; information storage.
Copyright © 2020 the Author(s). Published by PNAS.
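The abstract mentions user-defined sequence constraints such as avoiding excess repeats and keeping windowed GC content within bounds. As a rough illustration only, the sketch below checks a finished DNA string against two such constraints. It is not the HEDGES encoder, which according to the abstract incorporates constraints into the code itself. The function names and the default thresholds (`max_run`, `window`, `gc_min`, `gc_max`) are hypothetical values chosen for the example, not parameters taken from the paper.

```python
def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def windowed_gc_fractions(seq: str, window: int) -> list[float]:
    """GC fraction of every full-length sliding window over seq."""
    if window <= 0 or len(seq) < window:
        return []
    is_gc = [base in "GC" for base in seq]
    count = sum(is_gc[:window])
    fractions = [count / window]
    for i in range(window, len(seq)):
        # Slide the window one base: add the entering base, drop the leaving one.
        count += is_gc[i] - is_gc[i - window]
        fractions.append(count / window)
    return fractions


def satisfies_constraints(seq: str, max_run: int = 4, window: int = 12,
                          gc_min: float = 0.35, gc_max: float = 0.65) -> bool:
    """Check a strand against illustrative (assumed) run-length and GC limits.

    Sequences shorter than `window` are only checked for run length.
    """
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= f <= gc_max
               for f in windowed_gc_fractions(seq, window))


if __name__ == "__main__":
    for strand in ("ACGTACGTACGTACGT", "AAAAACGTACGTACGT", "ATATATATATATGCGC"):
        print(strand, satisfies_constraints(strand))
```

A checker like this can only accept or reject a strand after the fact; a code that enforces constraints during encoding avoids having to discard or re-encode strands that fail.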
Similar articles
- Indel-correcting DNA barcodes for high-throughput sequencing. Hawkins JA, Jones SK Jr, Finkelstein IJ, Press WH. Proc Natl Acad Sci U S A. 2018 Jul 3;115(27):E6217-E6226. doi: 10.1073/pnas.1802640115. Epub 2018 Jun 20. PMID: 29925596. Free PMC article.
- Error-correcting codes and information in biology. Battail G. Biosystems. 2019 Oct;184:103987. doi: 10.1016/j.biosystems.2019.103987. Epub 2019 Jul 8. PMID: 31295534.
- Efficient DNA-based data storage using shortmer combinatorial encoding. Preuss I, Rosenberg M, Yakhini Z, Anavy L. Sci Rep. 2024 Apr 2;14(1):7731. doi: 10.1038/s41598-024-58386-z. PMID: 38565928. Free PMC article.
- Novel Modalities in DNA Data Storage. Lim CK, Nirantar S, Yew WS, Poh CL. Trends Biotechnol. 2021 Oct;39(10):990-1003. doi: 10.1016/j.tibtech.2020.12.008. Epub 2021 Jan 14. PMID: 33455842. Review.
- Error-Free Synthetic DNA by Molecular Dictation. Knyphausen P, Lindenburg L, Hollfelder F. Trends Biotechnol. 2021 Sep;39(9):861-865. doi: 10.1016/j.tibtech.2021.02.001. Epub 2021 Feb 27. PMID: 33653603. Review.
Cited by
- Composite Hedges Nanopores codec system for rapid and portable DNA data readout with high INDEL-Correction. Zhao X, Li J, Fan Q, Dai J, Long Y, Liu R, Zhai J, Pan Q, Li Y. Nat Commun. 2024 Oct 30;15(1):9395. doi: 10.1038/s41467-024-53455-3. PMID: 39477940. Free PMC article.
- FrameD: framework for DNA-based data storage design, verification, and validation. Volkel KD, Lin KN, Hook PW, Timp W, Keung AJ, Tuck JM. Bioinformatics. 2023 Oct 3;39(10):btad572. doi: 10.1093/bioinformatics/btad572. PMID: 37713474. Free PMC article.
- An Analysis of Algebraic Codes over Lattice Valued Intuitionistic Fuzzy Type-3R-Submodules. Riaz A, Kousar S, Kausar N, Pamucar D, Addis GM. Comput Intell Neurosci. 2022 Jun 23;2022:8148284. doi: 10.1155/2022/8148284. eCollection 2022. PMID: 35785082. Free PMC article.
- RepairNatrix: a Snakemake workflow for processing DNA sequencing data for DNA storage. Schwarz PM, Welzel M, Heider D, Freisleben B. Bioinform Adv. 2023 Aug 26;3(1):vbad117. doi: 10.1093/bioadv/vbad117. eCollection 2023. PMID: 38496344. Free PMC article.
- Turbo autoencoders for the DNA data storage channel with Autoturbo-DNA. Welzel M, Dreßler H, Heider D. iScience. 2024 Mar 27;27(5):109575. doi: 10.1016/j.isci.2024.109575. eCollection 2024 May 17. PMID: 38638577. Free PMC article.