Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
NotificationsYou must be signed in to change notification settings

vezzi/joint_variant_calling

Repository files navigation

Python package to run Join Calling on population at NGI (National Genomics Infrastructure) Sweden.The scriptjoint_variant_calling.py implements the GATK-Workflow described inhttps://www.broadinstitute.org/gatk/guide/article?id=3893

With option--mixed-positions VQSR step is executed following best-practice described inhttp://gatkforums.broadinstitute.org/gatk/discussion/2805/howto-recalibrate-variant-quality-scores-run-vqsrin order to avoid the problem with MIXED positions (i.e., position where an indel overlaps a SNP). This is the mode used to run SweGen dataset.

Samples to be be join called can be specified in two ways:

  • in the sample field of the config.yaml file to be provided as input. Complete path to the gvcf file needs to be provided
  • a file named 00_samples.txt in the cwd. If present samples specified in this file will be used instead of those speccified in the config

If run likepython joint_variant_calling.py --config config.yamlit creates the following folder structure

  • 00_intervals: optional see Intervals section
  • 00_samples.txt: samples that are processed (i.e., samples that are join called)
  • 01_CombineGVCFs: step one is CombineGVCFs, batching gvcf files together
  • 02_GenotypeGVCFs: then run GenotypeGVCFs
  • 03_CatVariants: merge the gvcfs into one in case computation has been spread out into intervals (see Intervals section)
  • 04_SelectVariants: extract SNPs and INDELs and run eveluation tool from GATK to prepare VQSR
  • 05_VariantRecalibrator: first step of VQSR
  • 06_ApplyRecalibration: second step of VQSR

If run like python joint_variant_calling.py --config config.yaml --mixed-positionsit creates the following folder structure

Folders 01, 02, ..., 06 are all organised in the same way:

  • sbatch: sbatch files to be submitted to the slurm queue (Uppmax assumed)
  • std_err: output of the standard error
  • std_out: output of the standard output
  • VCF: contains the gvcf or vcf files (in general results files, in case of05_VariantRecalibrator contains recalibration tables)

Folder01_CombineGVCFs contains an extra sub-folder:

  • batches: used to restart joint calling if new samples are available [UNSTABLE: UNDER TEST]

If run like python joint_variant_calling.py --config config.yaml --resumeit resumes the joint calling adding new samples and recomputing only the last (if needed) and the new batches (in01_CombineGVCFs)

Intervals

The most time consuming steps of the workflow are01_CombineGVCFs and02_GenotypeGVCFs. These two steps can be parallised running the commands on non overlapping sections of the genome.For this purpose an utility script is provided:python utils/create_interval_list.py --dict intervals/human_g1k_v37.dict --block 1000000000the fileintervals/human_g1k_v37.dict can be found in this repo.If run in a directory (e.g. 00_intervals) it creates the intrvals files (.intervals). This folder need to be provided in the config.yaml file. Running the example command will create 4 blocks,first 3 of 1Gbp and the last one of ~200Mbp.

Theexample directory in this repo contains a dry-run of joint_variant_calling.py run on intervals generated by this command, on 7 samples, in batches of 4 (i.e.,this creates two batches, one of 4 and one of 3 samples)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp