The Polygenic Score Catalog Calculator (`pgsc_calc`)

Introduction

pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.

Pipeline summary

Important

Whole genome sequencing (WGS) data are not currently supported by the calculator.
It’s possible to create compatible gVCFs from WGS data. We plan to improve support for WGS data in the near future.

The workflow performs the following steps:

- Downloads scoring files using the PGS Catalog API in a specified genome build (GRCh37 or GRCh38)
- Reads custom scoring files (and performs a liftover if the genotyping data is in a different build)
- Automatically combines and creates scoring files for efficient parallel computation of multiple PGS
- Matches variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format)
- Calculates PGS for all samples (linear sum of weights and dosages)
- Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC)
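
To make the "linear sum of weights and dosages" step concrete, here is a minimal sketch in plain Python. It is not the pipeline's implementation: the variant IDs, weights, dosages, and the `polygenic_score` helper are all hypothetical and chosen only for illustration. It also shows why variant matching matters, since a variant missing from the target data contributes nothing to the sum.

```python
# Illustrative sketch only: not pgsc_calc code. All names and values
# below are hypothetical toy data.

def polygenic_score(weights, dosages):
    """Sum weight * dosage over variants found in both inputs.

    weights: variant ID -> effect weight from a scoring file
    dosages: variant ID -> effect-allele dosage (0 to 2) for one sample
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Toy scoring file with three variants
weights = {"1:1000:A:G": 0.5, "2:2000:C:T": -0.25, "3:3000:G:A": 1.0}

# Toy sample: the third variant is absent from the target data
sample_dosages = {"1:1000:A:G": 2, "2:2000:C:T": 1}

score = polygenic_score(weights, sample_dosages)

# Fraction of scoring-file variants that were matched in the target data
match_rate = sum(v in sample_dosages for v in weights) / len(weights)

print(score, match_rate)
```

A low match rate means the computed score omits part of the original model, which is one reason the summary report includes variant matching QC.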

And optionally:

- Genetic Ancestry: calculate similarity of target samples to populations in a reference dataset (1000 Genomes (1000G)), using principal components analysis (PCA)
- PGS Normalization: Using reference population data and/or PCA projections to report individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry
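
As a rough illustration of what reporting a z-score or percentile against a reference population means, the sketch below compares one sample's score with a toy reference distribution. This is an assumption-laden simplification: the helper names and data are hypothetical, it uses a single reference group, and it does not reproduce the PCA-based adjustments the pipeline can apply.

```python
# Illustrative sketch only: not pgsc_calc code. Helper names and the
# reference scores are hypothetical.
from statistics import mean, pstdev


def z_score(score, reference_scores):
    """Standardize a score against the reference mean and (population) SD."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


def percentile(score, reference_scores):
    """Percentage of reference scores less than or equal to the score."""
    at_or_below = sum(r <= score for r in reference_scores)
    return 100.0 * at_or_below / len(reference_scores)


# Toy PGS values for a reference population assumed to be
# genetically similar to the target sample
reference = [0.0, 1.0, 2.0, 3.0, 4.0]

z = z_score(3.0, reference)
pct = percentile(3.0, reference)

print(round(z, 3), pct)
```

Choosing the reference group is the important design decision here: the same raw score can map to very different z-scores depending on which population it is compared with, which is why ancestry is taken into account.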

See documentation for a list of planned features under development.

PGS applications and libraries

pgsc_calc uses applications and libraries internally developed at the PGS Catalog, which can do helpful things like:

- Query the PGS Catalog to bulk download scoring files in a specific genome build
- Match variants from scoring files to target variants
- Adjust calculated PGS in the context of genetic ancestry

If you want to write Python code to work with PGS, check out the pygscatalog repository to learn more.

If you want a simpler way of working with PGS, ignore this section and continue below to learn more about pgsc_calc.

Quick start

- Install Nextflow (>=23.10.0)
- Install Docker or Singularity (v3.8.3 minimum) (please only use Conda as a last resort)
- Download the pipeline and test it on a minimal dataset with a single command:
```
nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>
```

- Start running your own analysis!
```
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229
```

See getting started for more details.

Documentation

Full documentation is available on Read the Docs

Credits

pgscatalog/pgsc_calc is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).

The pipeline seeks to provide a standardized workflow for PGS calculation and ancestry inference implemented in nextflow derived from an existing set of tools/scripts developed by Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu) and PGS Catalog teams (Samuel Lambert, Laurent Gil).

The adaptation of the codebase, nextflow implementation, and PGS Catalog features are written by Benjamin Wingfield, Samuel Lambert, Laurent Gil with additional input from Aoife McMahon (EBI). Development of new features, testing, and code review is ongoing including Inouye lab members (Rodrigo Canovas, Scott Ritchie) and others. If you use the tool we ask you to cite our paper describing software and updated PGS Catalog resource:

Lambert, Wingfield et al. (2024) Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nature Genetics. doi:10.1038/s41588-024-01937-x.

This pipeline is distributed under an Apache License and uses code and infrastructure developed and maintained by the nf-core community (Ewels et al. Nature Biotech (2020) doi:10.1038/s41587-020-0439-x), reused here under the MIT license.

Additional references of open-source tools and data used in this pipeline are described in CITATIONS.md.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.

About

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation

pgsc-calc.readthedocs.io/en/latest/

Releases: 23

Latest release: v2.1.0 (Jun 23, 2025)
