sigven/gvanno

Generic human DNA variant annotation pipeline

gvanno - generic workflow for functional and clinical annotation of human DNA variants

Overview

The generic variant annotator (gvanno) is a software package intended for simple analysis and interpretation of human DNA variants. Variants and genes are annotated with disease-related and functional associations. Technically, the workflow is developed in Python, and it relies upon Docker / Singularity technology for encapsulation of software dependencies.

gvanno accepts query files encoded in the VCF format, and can analyze both SNVs and short insertions or deletions (indels). The workflow relies heavily upon Ensembl's Variant Effect Predictor (VEP) and vcfanno. It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations per variant record. Note that if your input VCF contains data (genotypes) from multiple samples (i.e. a multisample VCF), the output TSV file will contain one line/record per sample variant.

News

December 29th 2023 - 1.7.0 release
- Data updates: ClinVar, GENCODE, GWAS catalog
- Software updates: VEP
- Improved Singularity support
April 27th 2023 - 1.6.0 release
- Added option --oncogenicity_annotation - classifies variants according to oncogenicity (Horak et al., Genet Med, 2022)
- Data updates: ClinVar, GENCODE, GWAS catalog, CancerMine
- Excluded extensive disease associations from the Open Targets Platform
September 26th 2022 - 1.5.1 release
- Added option --vep_coding_only - only report variants that fall into coding regions of transcripts (VEP option --coding_only)

Annotation resources (v1.7.0)

VEP - Variant Effect Predictor v110 (GENCODE v44/v19 as the gene reference dataset)
dbNSFP - Database of non-synonymous functional predictions (v4.5, November 2023)
gnomAD - Germline variant frequencies exome-wide (release 2.1, October 2018) - from VEP
dbSNP - Database of short genetic variants (build 154) - from VEP
ClinVar - Database of variants related to human health/disease phenotypes (December 2023)
CancerMine - literature-mined database of drivers, oncogenes and tumor suppressors in cancer (version 50, March 2023)
Mutation hotspots - Database of mutation hotspots in cancer
NHGRI-EBI GWAS Catalog - Catalog of published genome-wide association studies (November 2023)

Getting started

STEP 0: Prerequisites

Python
An installation of Python (version >=3.6) is required to run gvanno. Check that Python is installed by typing python --version in your terminal window.
Other utilities
The script that installs the reference data requires that the user has bgzip and tabix installed. See here for instructions. The script also requires that basic Linux/UNIX commands are available (i.e. gzip, tar).
NOTE: gvanno should be installed on a macOS or Linux/UNIX operating system
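
The prerequisites above can be checked from Python before going further. The following is a minimal sketch, not part of gvanno itself: the function name `missing_prerequisites` and its parameters are hypothetical, and it only checks the Python version and whether the listed tools are on the PATH.

```python
import shutil
import sys

# Tools named in the prerequisites section above.
REQUIRED_TOOLS = ("bgzip", "tabix", "gzip", "tar")


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=(3, 6), which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d is required" % min_python)
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        if which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result only says the executables were found; it does not verify their versions.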

STEP 1: Installation of Docker/Singularity

The gvanno workflow can be executed with either Docker or Singularity container technology.

Installation of Docker

Install the Docker engine on your preferred platform
- installing Docker on Linux
- installing Docker on Mac OS
- NOTE: We have not yet been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with gvanno (an example being mounting of data volumes)
Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window
Adjust the computing resources dedicated to Docker, i.e.:
- Memory: minimum 5GB
- CPUs: minimum 4
- How to - Mac OS X
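
The "test that Docker is running" step can also be scripted. This is a small sketch under the assumption that a zero exit status from `docker ps` means the daemon is reachable; the helper name `docker_is_running` is hypothetical and it does not check the memory or CPU settings.

```python
import subprocess


def docker_is_running(run=subprocess.run):
    """Return True if `docker ps` exits with status 0.

    `run` is injectable so the function can be tested without Docker installed.
    """
    try:
        result = run(
            ["docker", "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The docker executable itself is missing from the PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker is running" if docker_is_running() else "Docker is not available")
```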

Installation of Singularity

Install Singularity

STEP 2: Download gvanno and data bundle

Download and unpack the latest release
Install the assembly-specific VEP cache and the gvanno-specific reference data using the download_gvanno_refdata.py script, i.e.:
- python download_gvanno_refdata.py --download_dir <PATH_TO_DOWNLOAD_DIR> --genome_assembly grch38
NOTE: This can take a considerable amount of time depending on your local bandwidth (approx. 20GB per assembly-specific bundle)
Pull container images
- Docker
  - Pull the gvanno Docker image (v1.7.0) from DockerHub (approx. 3.8GB):
  - docker pull sigven/gvanno:1.7.0 (gvanno annotation engine)
- Singularity
  - Download the gvanno SIF image (v1.7.0) (approx. 1.2GB) and use this as the argument for --sif_file in the gvanno.py run script.
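
The download and pull commands above can be assembled in Python, for instance when setting up several machines. This is a sketch only: the helper names `refdata_command` and `docker_pull_command` are hypothetical, and accepting grch37 as well as grch38 for the download script is an assumption based on the assemblies that gvanno.py lists.

```python
import subprocess

# Assumption: the download script accepts the same assemblies as gvanno.py.
ASSEMBLIES = ("grch37", "grch38")


def refdata_command(download_dir, genome_assembly="grch38"):
    """Build the argument list for download_gvanno_refdata.py."""
    if genome_assembly not in ASSEMBLIES:
        raise ValueError("genome_assembly must be one of %s" % (ASSEMBLIES,))
    return [
        "python", "download_gvanno_refdata.py",
        "--download_dir", download_dir,
        "--genome_assembly", genome_assembly,
    ]


def docker_pull_command(version="1.7.0"):
    """Build the argument list for pulling the gvanno Docker image."""
    return ["docker", "pull", "sigven/gvanno:" + version]


def run_setup(download_dir, genome_assembly="grch38"):
    """Run both steps; check=True raises CalledProcessError if either fails."""
    subprocess.run(refdata_command(download_dir, genome_assembly), check=True)
    subprocess.run(docker_pull_command(), check=True)
```

Calling `run_setup` from the unpacked release directory would start the download, so expect it to take a while.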

STEP 3: Input preprocessing

The gvanno workflow accepts a single input file:

An unannotated, single-sample VCF file (>= v4.2) with germline variants (SNVs/InDels)

We strongly recommend that the input VCF is compressed and indexed using bgzip and tabix. NOTE: If the input VCF contains multi-allelic sites, these will be subject to decomposition.
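
To see how many records in a VCF would be affected by decomposition, and to prepare the bgzip/tabix commands, something like the following sketch may help. The helper names are hypothetical; a site is treated as multi-allelic when its ALT column lists more than one comma-separated allele, and the actual decomposition is left to gvanno.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_multiallelic_sites(lines):
    """Count records whose ALT column (5th field) holds more than one allele."""
    count = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4 and "," in fields[4]:
            count += 1
    return count


def compress_and_index_commands(vcf_path):
    """Commands that compress an uncompressed VCF in place and index it.

    `bgzip FILE` writes FILE.gz; `tabix -p vcf` builds the .tbi index.
    """
    return [["bgzip", vcf_path], ["tabix", "-p", "vcf", vcf_path + ".gz"]]
```

For a file on disk, `count_multiallelic_sites(open_vcf(path))` gives the count; the command lists can be passed to `subprocess.run` one at a time.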

STEP 5: Run example

Run the workflow with gvanno.py, which takes the following arguments and options:

usage: gvanno.py -h [options] --query_vcf <QUERY_VCF> --gvanno_dir <GVANNO_DIR> --output_dir <OUTPUT_DIR> --genome_assembly <grch37|grch38> --sample_id <SAMPLE_ID> --container <docker|singularity>

gvanno - workflow for functional and clinical annotation of germline nucleotide variants

Required arguments:
  --query_vcf QUERY_VCF
                VCF input file with germline query variants (SNVs/InDels).
  --gvanno_dir GVANNO_DIR
                Directory that contains the gvanno reference data, e.g. ~/gvanno-1.7.0
  --output_dir OUTPUT_DIR
                Output directory
  --genome_assembly {grch37,grch38}
                Genome assembly build: grch37 or grch38
  --container {docker,singularity}
                Run gvanno with docker or singularity
  --sample_id SAMPLE_ID
                Sample identifier - prefix for output files

VEP optional arguments:
  --vep_regulatory
                Enable Variant Effect Predictor (VEP) to look for overlap with regulatory regions (option --regulatory in VEP).
  --vep_gencode_basic
                Consider only basic GENCODE transcripts with Variant Effect Predictor (VEP).
  --vep_lof_prediction
                Predict loss-of-function variants with the LOFTEE plugin in Variant Effect Predictor (VEP), default: False
  --vep_n_forks VEP_N_FORKS
                Number of forks for Variant Effect Predictor (VEP) processing, default: 4
  --vep_buffer_size VEP_BUFFER_SIZE
                Variant buffer size (variants read into memory simultaneously) for Variant Effect Predictor (VEP) processing
                - set lower to reduce memory usage, higher to increase speed, default: 500
  --vep_pick_order VEP_PICK_ORDER
                Comma-separated string of ordered transcript properties for primary variant pick in
                Variant Effect Predictor (VEP) processing, default: canonical,appris,biotype,ccds,rank,tsl,length,mane
  --vep_no_intergenic
                Skip intergenic variants in Variant Effect Predictor (VEP) processing, default: False
  --vep_coding_only
                Only report variants falling into coding regions of transcripts (VEP), default: False

Other optional arguments:
  --force_overwrite
                By default, the script will fail with an error if any output file already exists.
                You can force the overwrite of existing result files by using this flag, default: False
  --version
                show program's version number and exit
  --no_vcf_validate
                Skip validation of input VCF with Ensembl's vcf-validator, default: False
  --docker_uid DOCKER_USER_ID
                Docker user ID. default is the host system user ID. If you are experiencing permission errors, try setting this up to root (`--docker_uid root`)
  --vcfanno_n_processes VCFANNO_N_PROCESSES
                Number of processes for vcfanno processing (see https://github.com/brentp/vcfanno#-p), default: 4
  --oncogenicity_annotation
                Classify variants according to oncogenicity (Horak et al., Genet Med, 2022)
  --debug
                Print full Docker/Singularity commands to log and do not delete intermediate files with warnings etc.
  --sif_file
                gvanno SIF image file for usage of gvanno workflow with option '--container singularity'

The examples folder contains an example VCF file. Analysis of the example VCF can be performed by the following command (Docker-based):

python ~/gvanno-1.7.0/gvanno.py \
  --query_vcf ~/gvanno-1.7.0/examples/example.grch37.vcf.gz \
  --gvanno_dir ~/gvanno-1.7.0 \
  --output_dir ~/gvanno-1.7.0 \
  --sample_id example \
  --genome_assembly grch37 \
  --container docker \
  --force_overwrite

or Singularity-based:

python ~/gvanno-1.7.0/gvanno.py \
  --query_vcf ~/gvanno-1.7.0/examples/example.grch37.vcf.gz \
  --gvanno_dir ~/gvanno-1.7.0 \
  --output_dir ~/gvanno-1.7.0 \
  --sample_id example \
  --genome_assembly grch37 \
  --container singularity \
  --sif_file gvanno_1.7.0.sif \
  --force_overwrite

Either command runs the gvanno workflow (Docker- or Singularity-based) and produces the following output files in the directory given by --output_dir (~/gvanno-1.7.0 in the commands above):

example_gvanno_grch37.pass.vcf.gz (.tbi) - Bgzipped VCF file with rich set of functional/clinical variant and gene annotations
example_gvanno_grch37.pass.tsv.gz - Compressed TSV file with rich set of functional/clinical variant and gene annotations

Similar files are produced for all variants, not only variants with a PASS designation in the VCF FILTER column.
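
The example run above can also be launched from a Python script. The sketch below builds the argument list from the options documented in the usage text; the function name `gvanno_command` is hypothetical, and requiring `--sif_file` whenever the container is singularity is an assumption drawn from the help text rather than a documented rule.

```python
import os
import subprocess


def gvanno_command(gvanno_dir, query_vcf, output_dir, sample_id, genome_assembly,
                   container="docker", sif_file=None, force_overwrite=False):
    """Build the gvanno.py argument list for one sample."""
    if genome_assembly not in ("grch37", "grch38"):
        raise ValueError("genome_assembly must be grch37 or grch38")
    if container not in ("docker", "singularity"):
        raise ValueError("container must be docker or singularity")
    if container == "singularity" and sif_file is None:
        raise ValueError("sif_file is needed with container='singularity'")

    # subprocess does not expand '~', so expand it here.
    gvanno_dir = os.path.expanduser(gvanno_dir)
    cmd = [
        "python", os.path.join(gvanno_dir, "gvanno.py"),
        "--query_vcf", os.path.expanduser(query_vcf),
        "--gvanno_dir", gvanno_dir,
        "--output_dir", os.path.expanduser(output_dir),
        "--sample_id", sample_id,
        "--genome_assembly", genome_assembly,
        "--container", container,
    ]
    if container == "singularity":
        cmd += ["--sif_file", os.path.expanduser(sif_file)]
    if force_overwrite:
        cmd.append("--force_overwrite")
    return cmd


if __name__ == "__main__":
    example = gvanno_command(
        "~/gvanno-1.7.0",
        "~/gvanno-1.7.0/examples/example.grch37.vcf.gz",
        "~/gvanno-1.7.0",
        "example",
        "grch37",
        force_overwrite=True,
    )
    print(" ".join(example))
    # To actually run it: subprocess.run(example, check=True)
```

Building the command as a list avoids shell quoting problems when paths contain spaces.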

Documentation

The variant and gene annotations are documented in the header of the annotated VCF file. The column names of the tab-separated values (TSV) file are identical to the INFO tags documented in that header.
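
Since the annotation documentation lives in the VCF header, a short script can list it. This is a minimal sketch that assumes standard `##INFO=<ID=...,Description="...">` header lines; the function names are hypothetical and the regular expressions do not handle escaped quotes inside descriptions.

```python
import gzip
import re

INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")
INFO_DESC = re.compile(r'Description="(.*?)"')


def info_tags(lines):
    """Map each INFO tag ID to its description, reading only the meta-header."""
    tags = {}
    for line in lines:
        if not line.startswith("##"):
            break  # the #CHROM line ends the meta-information block
        id_match = INFO_ID.match(line)
        if id_match:
            desc_match = INFO_DESC.search(line)
            tags[id_match.group(1)] = desc_match.group(1) if desc_match else ""
    return tags


def info_tags_from_file(path):
    """Read INFO tag documentation from a bgzipped/gzipped VCF file."""
    with gzip.open(path, "rt") as handle:
        return info_tags(handle)
```

For example, `info_tags_from_file("example_gvanno_grch37.pass.vcf.gz")` would return a dictionary whose keys should match the TSV column names for the annotation fields.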

Contact

sigven AT ifi.uio.no
