Genome Biol. 2015 Oct 12;16:224.
doi: 10.1186/s13059-015-0776-0.

Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA

Gabriel Renaud et al. Genome Biol.

Abstract

Ancient DNA is typically highly degraded with appreciable cytosine deamination, and contamination with present-day DNA often complicates the identification of endogenous molecules. Together, these factors impede accurate assembly of the endogenous ancient mitochondrial genome. We present schmutzi, an iterative approach to jointly estimate present-day human contamination in ancient human DNA datasets and reconstruct the endogenous mitochondrial genome. By using sequence deamination patterns and fragment length distributions, schmutzi accurately reconstructs the endogenous mitochondrial genome sequence even when contamination exceeds 50 %. Given sufficient coverage, schmutzi also produces reliable estimates of contamination across a range of contamination rates.

Availability: https://bioinf.eva.mpg.de/schmutzi/ License: GPLv3.

Figures

Fig. 1
Schematic illustration of mitochondrial sequences from an ancient DNA library. When DNA from an ancient human sample is sequenced, DNA from the ancient human (endogenous fragments represented in green) as well as contaminant DNA fragments from the individuals who have handled the bone (contaminating fragments represented in red) are included. Because DNA undergoes deamination over time, endogenous fragments are likely to carry deaminated cytosines (represented as T’s in a blue frame), particularly near the ends of the DNA fragments. The non-deaminated cytosines are represented as unframed blue C’s. Schmutzi first identifies the endogenous fragments and, in a second step, uses these to quantify contamination. These steps are repeated until convergence is achieved and a single mitochondrial genome is identified.
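The deamination signal described in this caption can be illustrated with a minimal Python sketch. It is not schmutzi's code: the function name, the `max_pos` parameter and the toy data are hypothetical, and the fragments are assumed to be aligned to the reference without gaps and read from the 5' end.

```python
from collections import Counter


def c_to_t_rate_by_position(pairs, max_pos=5):
    """Return the C->T substitution frequency at each of the first max_pos positions.

    pairs: iterable of (reference, fragment) strings, assumed to be
    gap-free alignments that start at the 5' end of the fragment.
    """
    ref_c = Counter()   # positions where the reference carries a C
    c_to_t = Counter()  # of those, positions where the fragment reads T
    for ref, frag in pairs:
        for i in range(min(max_pos, len(ref), len(frag))):
            if ref[i] == "C":
                ref_c[i] += 1
                if frag[i] == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0 rather than undefined.
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(max_pos)]


# Toy data (hypothetical): two of the three fragments with a reference C
# at the first position read T there.
toy = [
    ("CACGT", "TACGT"),
    ("CCAGT", "TCAGT"),
    ("CAGCT", "CAGCT"),
    ("ACCGT", "ACCGT"),
]
rates = c_to_t_rate_by_position(toy)
```

In real data the rate is expected to be highest at the fragment ends and to fall off towards the interior, which is the pattern the caption describes for endogenous fragments.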
Fig. 2
Schmutzi workflow. An initial contamination estimate is computed using the deamination rates of fragments by conditioning on the other end being deaminated and comparing these to the deamination rate of all fragments in the dataset (contDeam). This prior is provided to call an endogenous consensus (endoCaller). The consensus call is, in turn, used to re-estimate mitochondrial contamination (mtCont). Deamination rates and fragment length distributions are measured for fragments that support endogenous and contaminant mitochondrial genomes (splitEndo). The information from mtCont and splitEndo is used as input for re-calling the endogenous consensus (endoCaller). This cycle is repeated until a stable contamination rate is reached. db, database.
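The "repeat until a stable contamination rate is reached" loop in this caption can be sketched as a generic two-component mixture iteration. This is a stand-in for illustration only, not the contDeam, endoCaller, mtCont or splitEndo programs: it assumes each fragment already comes with a likelihood under an endogenous model and under a contaminant model, and the function name, tolerance and toy values are hypothetical.

```python
def estimate_contamination(likelihoods, start=0.5, tol=1e-6, max_iter=1000):
    """Iterate a contamination rate until it stops changing.

    likelihoods: list of (p_endogenous, p_contaminant) pairs, one per
    fragment; at least one of the two values must be non-zero.
    """
    c = start
    for _ in range(max_iter):
        # Posterior probability that each fragment is a contaminant,
        # given the current contamination rate c.
        post = [c * pc / (c * pc + (1 - c) * pe) for pe, pc in likelihoods]
        # Re-estimate the rate as the mean posterior.
        new_c = sum(post) / len(post)
        if abs(new_c - c) < tol:
            return new_c  # stable rate reached
        c = new_c
    return c


# Toy input (hypothetical): three clearly endogenous fragments and one
# clearly contaminant fragment.
clear_cut = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rate = estimate_contamination(clear_cut)

# Less clear-cut likelihoods still give a rate strictly between 0 and 1.
noisy = [(0.9, 0.1)] * 7 + [(0.2, 0.8)] * 3
noisy_rate = estimate_contamination(noisy)
```

In the workflow shown in the figure, the per-fragment evidence is itself recomputed in every cycle because the consensus call changes; the sketch keeps it fixed to isolate the convergence check.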
Fig. 3
Effect of increasing contamination on endogenous genome sequence reconstruction and contaminant genome sequence reconstruction of simulated data. Accuracy of the ancient (a) and present-day contaminant (b) mitochondrial consensus sequences produced by schmutzi on simulated data for an early modern human, a Neanderthal and a Denisovan mitochondrial genome. We define an error as either a mismatch or an indel between the predicted endogenous sequence and the published mitochondrial sequence used for simulations. As contamination increases, inference of the endogenous mitochondrial genome becomes more difficult (a). In contrast, the prediction of the contaminant genome becomes more accurate at higher levels of present-day human contamination (b).
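The error definition in this caption (a mismatch or an indel) can be made concrete with a short Python sketch. It assumes the two sequences have already been pairwise aligned with `-` as the gap character, and it counts every gap column as one indel; both choices, like the function name, are assumptions for illustration and may differ from how the authors scored multi-base indels.

```python
def count_errors(aligned_pred, aligned_ref, gap="-"):
    """Count mismatches and indel columns between two aligned sequences."""
    if len(aligned_pred) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    mismatches = 0
    indels = 0
    for a, b in zip(aligned_pred, aligned_ref):
        if a == gap and b == gap:
            continue  # column carries no information
        if a == gap or b == gap:
            indels += 1  # base present in only one of the sequences
        elif a != b:
            mismatches += 1
    return mismatches, indels


# Toy alignment (hypothetical): one gap column and one substitution.
errors = count_errors("ACG-TA", "ACGTTC")
```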
Fig. 4
Consensus call and contamination estimate accuracy for empirical datasets. (a) The htslib consensus call (yellow) and the schmutzi consensus call (red) were performed on a subset of the data from three Neanderthals, one Denisovan and one early modern human. The number of mismatches between the mitochondrial consensus sequence and the published mitochondrial genome from the same individual was calculated. (b) Contamination was estimated using schmutzi (red) and contamMix v.1.0-10 (blue) and compared to the contamination computed using diagnostic positions (gray per fragment and black per base). For the two Mezmaiskaya individuals, the endogenous genome used for comparison was obtained using another library with low levels of contamination from the same individual. diag pos, diagnostic position; Nean, Neanderthal.
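As a rough illustration of the diagnostic-position comparison in panel (b), the sketch below computes a per-fragment and a per-base contamination estimate. The caption does not spell out the exact counting rules, so everything here is an assumption: the data layout, the function name, and the choice to skip fragments whose diagnostic bases disagree with each other.

```python
def diagnostic_contamination(fragments, diagnostic):
    """Return (per_fragment, per_base) contamination estimates.

    fragments:  list of dicts mapping position -> observed base.
    diagnostic: dict mapping position -> (endogenous_base, contaminant_base).
    """
    endo_bases = cont_bases = 0
    endo_frags = cont_frags = 0
    for frag in fragments:
        e = c = 0
        for pos, base in frag.items():
            if pos not in diagnostic:
                continue  # not an informative position
            endo, cont = diagnostic[pos]
            if base == endo:
                e += 1
            elif base == cont:
                c += 1
        endo_bases += e
        cont_bases += c
        # A fragment is counted only if all its diagnostic bases agree.
        if c and not e:
            cont_frags += 1
        elif e and not c:
            endo_frags += 1
    total_bases = endo_bases + cont_bases
    total_frags = endo_frags + cont_frags
    per_base = cont_bases / total_bases if total_bases else None
    per_fragment = cont_frags / total_frags if total_frags else None
    return per_fragment, per_base


# Toy data (hypothetical positions and alleles).
diagnostic = {10: ("A", "G"), 20: ("C", "T")}
fragments = [
    {10: "A", 20: "C"},  # endogenous at both positions
    {10: "G"},           # contaminant
    {20: "T", 5: "A"},   # contaminant; position 5 is ignored
    {10: "A"},           # endogenous
    {30: "G"},           # covers no diagnostic position
]
per_fragment, per_base = diagnostic_contamination(fragments, diagnostic)
```

The two estimates differ because a fragment spanning several diagnostic positions contributes several bases but only one fragment.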
Fig. 5
Contamination estimates and phylogenetic placement of Mezmaiskaya 1 (library ID B9687). (a) The posterior probability distribution for contamination in Mezmaiskaya 1. The dotted line represents the estimate obtained using an ad hoc method based on fixed sites. (b) A maximum-likelihood tree showing the placement of the mitochondrial genome of Mezmaiskaya 1 (labeled MT in the tree) and the inferred contaminant (labeled MTc in the tree), compared to 20 present-day humans and nine archaic humans.
Fig. 6
Simulated versus measured contamination rates. Several sets contained simulated aDNA fragments from a mitochondrial genome belonging to an early modern human (left), a Neanderthal (middle) or a Denisovan (right). All simulated sets had damage patterns associated with a single-stranded library protocol. The double-stranded figure can be found in Additional file 1: Results. A contaminating present-day human was pooled together at various rates to simulate contamination. The dotted black line represents a perfect prediction, and blue dots are the predicted rates of contamination by schmutzi once convergence was achieved. The red dots represent sets for which the algorithm stopped prematurely due to lack of information about the contaminant fragments. The black whiskers represent the 95 % confidence interval for contamination.
Fig. 7
Robustness of the contamination estimate to lower coverage. The simulated dataset with a contamination rate of ∼47 % and single-stranded deamination patterns was subsampled at various coverages from 0 to 1250 ×. Top: Contamination rates were estimated across a range of coverages in simulated data for a Neanderthal, a Denisovan and an early modern human (Ust’-Ishim). Bottom: Contamination estimates when a high-quality mtDNA sequence from a closely related individual is used as the endogenous genome. Robust estimates can be made down to 5 × coverage even at 47 % contamination. For the early modern human, the contamination estimate provided was computed using the database alone and not the prediction of the contaminant genome, thus leading to underestimates (see Table 4 for an example of the effect of using the predicted contaminant in the contamination estimate).
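Subsampling to a target coverage, as used for this figure, can be sketched in a few lines of Python. This is not the authors' procedure: it assumes coverage is defined as total sequenced bases divided by genome length, uses the 16,569 bp human mitochondrial reference length as a default, and keeps each fragment independently with a fixed probability. The function names and the toy fragment lengths are hypothetical.

```python
import random

MT_LENGTH = 16569  # length of the human mitochondrial reference (rCRS), in bp


def coverage(lengths, genome_length=MT_LENGTH):
    """Average depth: total sequenced bases divided by genome length."""
    return sum(lengths) / genome_length


def subsample(lengths, target, seed=0, genome_length=MT_LENGTH):
    """Keep each fragment with probability target / current coverage."""
    current = coverage(lengths, genome_length)
    if target >= current:
        return list(lengths)  # nothing to remove
    keep = target / current
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return [n for n in lengths if rng.random() < keep]


# Toy dataset (hypothetical): 33,138 fragments of 50 bp give exactly 100x.
lengths = [50] * 33138
full = coverage(lengths)
reduced = subsample(lengths, target=5)
reduced_cov = coverage(reduced)
```

Because fragments are kept at random, the achieved coverage only approximates the target; on small datasets the deviation can be noticeable.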