Description
Hi!
I am running pgsc_calc on an HPC, on a large dataset (800 individuals and 5k PGS scores), as follows:
```
nextflow run pgscatalog/pgsc_calc \
    -profile apptainer \
    -resume \
    -c my_nextflow_config.config \
    --input samplesheet_subset.csv \
    --target_build GRCh37 \
    --scorefile '*_hmPOS_GRCh37.txt.gz' \
    --run_ancestry /pgsc_HGDP+1kGP_v1.tar.zst \
    --outdir subset_inclancestry
```
The config file is copied from the tutorial on your website, thank you for providing it!
The pipeline works for a subset of 100 scores. I expect it to use a lot more memory as I increase the number of scores, so as a precaution I have already increased the memory for the MATCH_VARIANTS and MATCH_COMBINE steps in my config file (see the sketch below). However, I noticed that even with 100 scores the first step of the pipeline (INPUT_CHECK_FORMAT_SCOREFILES) needs a lot of memory. Is there a way to increase the memory for this step beforehand? Currently it has to fail at the 16 GB limit before retrying with 32 GB and then 64 GB, which wastes a lot of time when I already know the process will run out of memory.
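For reference, this is roughly what I added to my config for the matching steps, using Nextflow's `withName` process selectors. The selector patterns and memory values here are just my guesses based on the process names I see in the logs; please correct me if the scorefile-formatting step needs a different selector:

```
// Sketch of my_nextflow_config.config (process names assumed from the log output)
process {
    withName: '.*MATCH_VARIANTS' {
        memory = 64.GB
    }
    withName: '.*MATCH_COMBINE' {
        memory = 128.GB
    }
    // This is what I would like to do for the scorefile formatting step,
    // if a selector like this is the right way to target it:
    withName: '.*FORMAT_SCOREFILES' {
        memory = 64.GB
    }
}
```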
I hope you can help me; perhaps I have missed an obvious way to change this setting, but I did not find it!