DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.


google/deepvariant


DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.

DeepVariant supports germline variant-calling in diploid organisms.

DeepVariant case-studies for germline variant calling:

Pangenome-aware DeepVariant case-studies:

We have also adapted DeepVariant for somatic calling. See the DeepSomatic repo for details.

Please also note:

  • DeepVariant currently supports variant calling on organisms where the ploidy/copy-number is two. This is because the genotypes supported are hom-alt, het, and hom-ref.
  • The models included with DeepVariant are only trained on human data. For other organisms, see the blog post on non-human variant-calling for some possible pitfalls and how to handle them.
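The three supported genotypes correspond to the GT field of a diploid VCF record. The following is a minimal sketch of that mapping, assuming GT strings such as 0/1 or 1|1; the function name classify_genotype is illustrative and is not part of DeepVariant.

```python
import re


def classify_genotype(gt: str) -> str:
    """Label a diploid VCF GT string as hom-ref, het, or hom-alt.

    Illustrative helper, not a DeepVariant API. Both phased ("|") and
    unphased ("/") separators are accepted. Anything that is not exactly
    two called alleles is rejected, mirroring the diploid-only assumption.
    """
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or not all(a.isdigit() for a in alleles):
        raise ValueError(f"not a called diploid genotype: {gt!r}")
    first, second = (int(a) for a in alleles)
    if first == 0 and second == 0:
        return "hom-ref"
    if first == second:
        return "hom-alt"
    # Two different alleles, including two different alternates (e.g. 1/2).
    return "het"
```

A haploid call such as "1" or a no-call such as "./." raises ValueError in this sketch, since neither fits the two-copy model described above.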

DeepTrio

DeepTrio is a deep learning-based trio variant caller built on top of DeepVariant. DeepTrio extends DeepVariant's functionality, allowing it to utilize the power of neural networks to predict genomic variants in trios or duos. See this page for more details and instructions on how to run DeepTrio.

DeepTrio supports germline variant-calling in diploid organisms for the following types of input data:

Please also note:

  • All DeepTrio models were trained on human data.
  • It is possible to use DeepTrio with only 2 samples (child, and one parent).
  • The external tool GLnexus is used to merge output VCFs.

How to run DeepVariant

We recommend using our Docker solution. The command will look like this (text between ** markers is annotation and must be removed before running):

BIN_VERSION="1.9.0"
docker run \
  -v "YOUR_INPUT_DIR":"/input" \
  -v "YOUR_OUTPUT_DIR:/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \ **Replace this string with exactly one of the following [WGS,WES,PACBIO,ONT_R104,HYBRID_PACBIO_ILLUMINA]**
  --ref=/input/YOUR_REF \
  --reads=/input/YOUR_BAM \
  --output_vcf=/output/YOUR_OUTPUT_VCF \
  --output_gvcf=/output/YOUR_OUTPUT_GVCF \
  --num_shards=$(nproc) \ **This will use all your cores to run make_examples. Feel free to change.**
  --vcf_stats_report=true \ **Optional. Creates VCF statistics report in html file. Default is false.**
  --disable_small_model=true \ **Optional. Disables the small model from make_examples stage. Default is false.**
  --logging_dir=/output/logs \ **Optional. This saves the log output for each stage separately.**
  --haploid_contigs="chrX,chrY" \ **Optional. Heterozygous variants in these contigs will be re-genotyped as the most likely of reference or homozygous alternates. For a sample with karyotype XY, it should be set to "chrX,chrY" for GRCh38 and "X,Y" for GRCh37. For a sample with karyotype XX, this should not be used.**
  --par_regions_bed="/input/GRCh3X_par.bed" \ **Optional. If --haploid_contigs is set, then this can be used to provide PAR regions to be excluded from genotype adjustment. Download links to these files are available on this page.**
  --dry_run=false **Default is false. If set to true, commands will be printed out but not executed.**
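When the command is assembled by a script rather than typed by hand, building it as an argument list avoids quoting mistakes. The sketch below is one possible way to do that in Python. It uses only the required flags shown in the template above; the helper name build_run_command and the example directories /data/in and /data/out are hypothetical.

```python
import shlex

# Model types listed in the command template.
MODEL_TYPES = ("WGS", "WES", "PACBIO", "ONT_R104", "HYBRID_PACBIO_ILLUMINA")


def build_run_command(input_dir, output_dir, ref, reads, output_vcf,
                      model_type="WGS", num_shards=1, bin_version="1.9.0"):
    """Return the docker argument list for run_deepvariant.

    Hypothetical helper: it only assembles strings and does not run Docker.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}")
    return [
        "docker", "run",
        "-v", f"{input_dir}:/input",
        "-v", f"{output_dir}:/output",
        f"google/deepvariant:{bin_version}",
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref=/input/{ref}",
        f"--reads=/input/{reads}",
        f"--output_vcf=/output/{output_vcf}",
        f"--num_shards={num_shards}",
    ]


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    cmd = build_run_command("/data/in", "/data/out", "YOUR_REF", "YOUR_BAM",
                            "YOUR_OUTPUT_VCF", model_type="WES", num_shards=8)
    # Print for review; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Optional flags from the template can be appended to the returned list in the same --name=value form.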

For details on X,Y support, please see DeepVariant haploid support and the case study in DeepVariant X, Y case study. You can download the PAR bed files from here: GRCh38_par.bed, GRCh37_par.bed.
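The rules for --haploid_contigs and --par_regions_bed can be summarized as a small lookup. This is a sketch under the assumptions stated in the command annotations (XY samples only, contig names depending on the reference build); the function haploid_flags and its parameters are hypothetical and not part of DeepVariant.

```python
# Contig names per reference build, as given in the command annotations.
HAPLOID_CONTIGS = {"GRCh38": "chrX,chrY", "GRCh37": "X,Y"}


def haploid_flags(karyotype, reference, par_bed=None):
    """Return the extra run_deepvariant flags for sex-chromosome handling.

    Hypothetical helper. For karyotype XX no flags are returned, because
    --haploid_contigs should not be used for such samples.
    """
    if karyotype == "XX":
        return []
    if karyotype != "XY":
        raise ValueError(f"unsupported karyotype: {karyotype!r}")
    if reference not in HAPLOID_CONTIGS:
        raise ValueError(f"unknown reference build: {reference!r}")
    flags = [f"--haploid_contigs={HAPLOID_CONTIGS[reference]}"]
    if par_bed:
        # PAR regions are excluded from the genotype adjustment.
        flags.append(f"--par_regions_bed={par_bed}")
    return flags
```

For example, haploid_flags("XY", "GRCh37") yields a single --haploid_contigs=X,Y flag, while an XX sample yields an empty list.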

To see all flags you can use, run: docker run google/deepvariant:"${BIN_VERSION}"

If you're using GPUs, or want to use Singularity instead, see Quick Start for more details.

If you are running on a machine with a GPU, an experimental mode is available that enables running the make_examples stage on the CPU while the call_variants stage runs on the GPU simultaneously. For more details, refer to the Fast Pipeline case study.

For more information, also see:

How to cite

If you're using DeepVariant in your work, please cite:

A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo.
doi:https://doi.org/10.1038/nbt.4235

Additionally, if you are generating multi-sample calls using our DeepVariant and GLnexus Best Practices, please cite:

Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Bioinformatics (2021).
Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll, and Cory Y. McLean.
doi:https://doi.org/10.1093/bioinformatics/btaa1081

Why Use DeepVariant?

  • High accuracy - DeepVariant won the 2020 PrecisionFDA Truth Challenge V2 for All Benchmark Regions in the ONT, PacBio, and Multiple Technologies categories, and the 2016 PrecisionFDA Truth Challenge for best SNP Performance. DeepVariant maintains high accuracy across data from different sequencing technologies, prep methods, and species. For lower coverage, using DeepVariant makes an especially great difference. See metrics for the latest accuracy numbers on each of the sequencing types.
  • Flexibility - Out-of-the-box use for PCR-positive samples and low quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
  • Ease of use - No filtering is needed beyond setting your preferred minimum quality threshold.
  • Cost effectiveness - With a single non-preemptible n1-standard-16 machine on Google Cloud, it costs ~$11.8 to call a 30x whole genome and ~$0.89 to call an exome. With preemptible pricing, the cost is $2.84 for a 30x whole genome and $0.21 for a whole exome (not considering preemption).
  • Speed - See metrics for the runtime of all supported datatypes on a 96-core CPU-only machine. Multiple options for acceleration exist.
  • Usage options - DeepVariant can be run via Docker or binaries, either on on-premise hardware or in the cloud, with support for hardware accelerators like GPUs and TPUs.
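As an illustration of the "minimum quality threshold" mentioned under Ease of use, the sketch below keeps only VCF records whose QUAL value meets a threshold. It assumes the threshold is applied to the QUAL column (the sixth tab-separated field of a VCF record) and that the input is uncompressed text; the function name filter_by_min_qual and any particular threshold value are illustrative, not DeepVariant recommendations.

```python
def filter_by_min_qual(vcf_lines, min_qual):
    """Yield header lines plus records whose QUAL is at least min_qual.

    Illustrative sketch: records with a missing QUAL (".") are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # Columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if qual != "." and float(qual) >= min_qual:
            yield line
```

Because the function works on any iterable of lines, it can be applied to an open file handle and its output written straight to a new file.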

(1): Time estimates do not include mapping.

How DeepVariant works

Stages in DeepVariant

For more information on the pileup images and how to read them, please see the "Looking through DeepVariant's Eyes" blog post.

DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework. Nucleus was built with DeepVariant in mind and open-sourced separately so it can be used by anyone in the genomics research community for other projects. See this blog post on Using Nucleus and TensorFlow for DNA Sequencing Error Correction.

DeepVariant Setup

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 3.10

Official Solutions

Below are the official solutions provided by the Genomics team in Google Health.

Name | Description
Docker | This is the recommended method.
Build from source | DeepVariant comes with scripts to build it on Ubuntu 20.04. To build and run on other Unix-based systems, you will need to modify these scripts.
Prebuilt Binaries | Available at gs://deepvariant/. These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the /proc/cpuinfo file on your computer, which lists these features under "flags".
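The /proc/cpuinfo check for the prebuilt binaries can be scripted. The sketch below parses the "flags" lines and reports any required feature that is absent. It assumes that "SSE4" means both the sse4_1 and sse4_2 flags as Linux reports them; the function name missing_cpu_features is illustrative.

```python
def missing_cpu_features(cpuinfo_text, required=("sse4_1", "sse4_2", "avx")):
    """Return the required CPU flags that do not appear in cpuinfo_text.

    Illustrative helper. The default `required` tuple is an assumption
    about which /proc/cpuinfo flag names correspond to SSE4 and AVX.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    # Linux only: /proc/cpuinfo does not exist on other systems.
    with open("/proc/cpuinfo") as handle:
        missing = missing_cpu_features(handle.read())
    print("missing features:", missing or "none")
```

An empty result means every listed feature was found on at least one processor entry.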

Contribution Guidelines

Please open a pull request if you wish to contribute to DeepVariant. Note, we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.

If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.

License

BSD-3-Clause license

Acknowledgements

DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:

We thank all of the developers and contributors to these packages for their work.

Disclaimer

This is not an official Google product.

NOTE: the content of this research code repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.
