lkuchenb/MultiHLA

WES HLA Typing based on multiple alternative tools

License

MIT license

HLA Typing Workflow

Scope of this workflow

This workflow enables the concurrent analysis of WES or WGS data using publicly available software to derive HLA haplotypes.

Currently available software tools

xHLA
Xie, C., Yeo, Z. X., Wong, M., Piper, J., Long, T., Kirkness, E. F., ... & Brady, C. (2017). Fast and accurate HLA typing from short-read next-generation sequence data with xHLA. Proceedings of the National Academy of Sciences, 114(30), 8059-8064.
The workflow maps the reads against hg38 without alt contigs using bwa mem, as instructed by the authors. The mapped reads are then sorted and indexed using samtools.
The workflow uses the Docker image provided by the authors to perform the actual HLA typing.
HLA-VBSeq
Nariai, N., Kojima, K., Saito, S., Mimori, T., Sato, Y., Kawai, Y., ... & Nagasaki, M. (2015, December). HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data. In BMC genomics (Vol. 16, No. S2, p. S7). BioMed Central.
Wang, Y. Y., Mimori, T., Khor, S. S., Gervais, O., Kawai, Y., Hitomi, Y., ... & Nagasaki, M. (2019). HLA-VBSeq v2: improved HLA calling accuracy with full-length Japanese class-I panel. Human Genome Variation, 6(1), 1-5.
The workflow maps the reads against hg19 without alt contigs. The authors' instructions merely state to "map against hg19" without further specifics, but mapping against hg19 with alt contigs yielded very poor typing results with missing HLA class I genes. The workflow therefore uses hg19 without alt contigs.
HLA-VBSeq released two reference database versions:
- v1 database based on IMGT/HLA database, Release 3.15.0
- v2 database based on IMGT/HLA database Release 3.31.0 and Japanese HLA reference dataset
OptiType
Szolek, A., Schubert, B., Mohr, C., Sturm, M., Feldhahn, M., & Kohlbacher, O. (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30(23), 3310-3316.
The workflow invokes the OptiType snakemake wrapper without prior filtering of reads.
HLA-LA
Dilthey, A. T., Mentzer, A. J., Carapito, R., Cutland, C., Cereb, N., Madhi, S. A., ... & Phillippy, A. M. (2019). HLA*LA - HLA typing from linearly projected graph alignments. Bioinformatics, 35(21), 4394-4396.
The workflow uses reads mapped against the human genome (hg38) without alt contigs as input for HLA-LA. A corresponding reference txt file for HLA-LA is part of this workflow repository. The preprocessed graph directory PRG_MHC_GRCh38_withIMGT can either be placed manually in typing/hla_la/hla_la.graphs/ or it will be downloaded and preprocessed automatically.
The workflow uses the HLA-LA bioconda package for graph preprocessing and HLA typing.
arcasHLA
Orenbuch, R., Filip, I., Comito, D., Shaman, J., Pe’er, I., & Rabadan, R. (2020). arcasHLA: high-resolution HLA typing from RNAseq. Bioinformatics, 36(1), 33-40.
The workflow maps RNAseq reads against the human genome (hg38) without alt contigs using the STAR aligner with default parameters. It then invokes the 'extract' and 'genotype' subtools provided by arcasHLA.
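The per-tool inputs described above can be summarised as a small lookup table. This is a minimal Python sketch, not part of the workflow code: the name `TOOL_INPUTS`, its field names, and the helper `tools_for_reference` are hypothetical, and the values only restate what this section says. A value of `None` means the README does not state it.

```python
# Hypothetical summary of the per-tool inputs described in this README.
# None of these names exist in the workflow code; None = not stated here.
TOOL_INPUTS = {
    "xHLA":      {"reference": "hg38", "alt_contigs": False, "aligner": "bwa mem"},
    "HLA-VBSeq": {"reference": "hg19", "alt_contigs": False, "aligner": None},
    "OptiType":  {"reference": None,   "alt_contigs": None,  "aligner": None},
    "HLA-LA":    {"reference": "hg38", "alt_contigs": False, "aligner": None},
    "arcasHLA":  {"reference": "hg38", "alt_contigs": False, "aligner": "STAR"},
}


def tools_for_reference(reference):
    """Return the names of tools whose input is mapped against `reference`, sorted."""
    return sorted(
        name for name, cfg in TOOL_INPUTS.items() if cfg["reference"] == reference
    )


print(tools_for_reference("hg38"))
print(tools_for_reference("hg19"))
```

Such a table makes it easy to see which tools can share a mapping reference and which need their own.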

Usage

Install snakemake

conda install -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
conda activate snakemake

Clone the MultiHLA repository

git clone https://github.com/lkuchenb/MultiHLA.git hla_typing
cd hla_typing

Put the input files in place
MultiHLA comes with a predefined folder structure:
- datasets/
  A dataset is defined as a set of samples. Place a TSV file here for every dataset with the following three named columns:
```
SampleName  FileNameR1                              FileNameR2
Donor1      SEQ_D1_DAT_01_S53_L001_R1_001.fastq.gz  SEQ_D1_DAT_01_S53_L001_R2_001.fastq.gz
Donor1      SEQ_D1_DAT_01_S53_L002_R1_001.fastq.gz  SEQ_D1_DAT_01_S53_L002_R2_001.fastq.gz
Donor2      SEQ_D2_DAT_01_S54_L001_R1_001.fastq.gz  SEQ_D2_DAT_01_S54_L001_R2_001.fastq.gz
Donor2      SEQ_D2_DAT_01_S54_L002_R1_001.fastq.gz  SEQ_D2_DAT_01_S54_L002_R2_001.fastq.gz
Donor3      SEQ_D3_DAT_01_S55_L001_R1_001.fastq.gz  SEQ_D3_DAT_01_S55_L001_R2_001.fastq.gz
Donor3      SEQ_D3_DAT_01_S55_L002_R1_001.fastq.gz  SEQ_D3_DAT_01_S55_L002_R2_001.fastq.gz
```
  FASTQ files have to come in gzipped pairs and be named {prefix}_R[12]{suffix}.fastq.gz. A sample can be covered by an arbitrary number of FASTQ pairs (at least one).
- fastq/
  Place the FASTQ files as listed in your dataset sheet here.
- ref/
  Place or link the required human genome references here as described for each supported method, otherwise they will be automatically downloaded.
- trim/
  This is an output folder. It will be filled with adapter trimmed versions of the provided FASTQ files.
- typing/{method}/
  This is an output folder. It will be filled with subfolders for each method.
- workflow/
  This folder contains the workflow code.
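Before starting a run, it can help to check a dataset sheet against the naming rule described above. The following is a minimal Python sketch and not part of the workflow: the function `read_dataset_sheet` and the regular expression are hypothetical, and the check that R1 and R2 share the same prefix and suffix is an assumption about what "pairs" means here.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical pattern derived from the rule {prefix}_R[12]{suffix}.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])(?P<suffix>.*)\.fastq\.gz$")


def read_dataset_sheet(handle):
    """Group FASTQ pairs by SampleName and check the file naming convention."""
    samples = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        r1 = FASTQ_NAME.match(row["FileNameR1"])
        r2 = FASTQ_NAME.match(row["FileNameR2"])
        if r1 is None or r2 is None:
            raise ValueError(f"FASTQ name does not match the convention: {row}")
        if r1["read"] != "1" or r2["read"] != "2":
            raise ValueError(f"R1/R2 columns look swapped: {row}")
        # Assumption: both mates of a pair share prefix and suffix.
        if (r1["prefix"], r1["suffix"]) != (r2["prefix"], r2["suffix"]):
            raise ValueError(f"R1 and R2 do not belong to the same pair: {row}")
        samples[row["SampleName"]].append((row["FileNameR1"], row["FileNameR2"]))
    return dict(samples)


# Example with two lanes for one sample, taken from the sheet shown above.
example = (
    "SampleName\tFileNameR1\tFileNameR2\n"
    "Donor1\tSEQ_D1_DAT_01_S53_L001_R1_001.fastq.gz\tSEQ_D1_DAT_01_S53_L001_R2_001.fastq.gz\n"
    "Donor1\tSEQ_D1_DAT_01_S53_L002_R1_001.fastq.gz\tSEQ_D1_DAT_01_S53_L002_R2_001.fastq.gz\n"
)
pairs = read_dataset_sheet(io.StringIO(example))
print(len(pairs["Donor1"]))
```

For a real sheet, the same function can be called with an open file handle, for example `open("datasets/dataset_1.tsv", newline="")`.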
Run the workflow
Invoke snakemake using snakemake --use-conda --use-singularity. This enables snakemake to automatically install dependencies into conda environments that are created on the fly, and also enables the container-based jobs to run. To process all samples of a dataset, for example the dataset dataset_1 described in datasets/dataset_1.tsv, use:
```
snakemake --use-conda --use-singularity typing/dataset_1.all.multihla
```
Memory and run time requirements for each job are noted in its resources (mem_mb and time).
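When several datasets need processing, the call shown above can be assembled programmatically. This is a minimal Python sketch under stated assumptions: the helper `multihla_command` is hypothetical, the target pattern `typing/{dataset}.all.multihla` is taken from the example above, and the optional `--cores` flag is a general snakemake option that this README does not mention.

```python
import shlex


def multihla_command(dataset, cores=None):
    """Build the snakemake call that processes all samples of one dataset."""
    cmd = ["snakemake", "--use-conda", "--use-singularity"]
    if cores is not None:
        # Assumption: --cores is not mentioned in this README; it is a
        # general snakemake option and may be required by newer versions.
        cmd += ["--cores", str(cores)]
    cmd.append(f"typing/{dataset}.all.multihla")
    return cmd


# Print the command for the dataset described in datasets/dataset_1.tsv.
print(shlex.join(multihla_command("dataset_1")))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root to start the run.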
