
Quickstart

Francisco Zorrilla edited this page Jul 27, 2024 · 26 revisions

❗Please refer to the main README installation one-liner or the detailed setup guide for the recommended installation❗

Automated installation

Clone this repository to your HPC or local computer:

git clone https://github.com/franciscozorrilla/metaGEM.git # Download metaGEM repo
cd metaGEM # Move into metaGEM directory
rm -r .git # Remove ~250 Mb of unneeded git history files

Press y and Enter when prompted to remove write-protected files; these are not necessary and only take up disk space.

rm: remove write-protected regular file ‘.git/objects/pack/pack-f4a65f7b63c09419a9b30e64b0e4405c524a5b35.pack’? y
rm: remove write-protected regular file ‘.git/objects/pack/pack-f4a65f7b63c09419a9b30e64b0e4405c524a5b35.idx’? y

Run the env_setup.sh script:
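
If you would rather avoid the confirmation prompts, the following is a minimal sketch. It relies on the standard -f flag of rm, which removes write-protected files without asking. The demo_repo directory is a hypothetical stand-in created only so the snippet can run on its own; in practice you would run rm -rf .git inside your metaGEM clone.

```shell
# Hypothetical stand-in for a freshly cloned repo (not part of metaGEM)
demo_repo="$(mktemp -d)"
mkdir -p "$demo_repo/.git/objects/pack"
touch "$demo_repo/.git/objects/pack/example.pack"
chmod a-w "$demo_repo/.git/objects/pack/example.pack"  # mimic write-protected pack files

# -r recurses into .git, -f suppresses the "remove write-protected file?" prompts
(cd "$demo_repo" && rm -rf .git)
```

Since -f never asks for confirmation, double-check that you are in the intended directory before running it.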

bash env_setup.sh # Run automated setup script

This env_setup.sh script will prompt you to set up 4 conda environments in the envs/ folder:

mamba
- Only used for installing mamba and setting up subsequent environments from recipe files
metagem
- Contains most metaGEM core workflow tools
- Python 3
metawrap
- Contains only metaWRAP and its dependencies
- Python 2
prokkaroary
- Contains bonus tools

Don't worry, you don't need to install everything right away. You can already start processing your raw sequences with just the metagem conda env installed.

If you run into issues with the automated installation, please refer to the manual installation page.

Checking your installation

To make sure that the basics have been properly configured, run the check task using the metaGEM.sh parser:

bash metaGEM.sh -t check

This will check if conda is installed/available and verify that the environments were properly set up by the env_setup.sh script. Additionally, this check function will prompt you to create results folders if they are not already present. Finally, this task will check if any sequencing files are present in the dataset folder, prompting the user either to organize already existing files into sample-specific subfolders or to download a small toy dataset.

Environments

The conda environments will be set up under the envs/ folder:

envs/
├── mamba/
├── metagem/
├── metawrap/
└── prokkaroary/
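
To see at a glance which of these folders exist, something like the sketch below can be used. check_envs is a hypothetical helper written for this illustration and is not part of metaGEM; the check task described above remains the supported way to verify the setup. The demo folder is a throwaway directory so that the snippet runs anywhere; in a real checkout you would call check_envs envs.

```shell
# check_envs: hypothetical helper, not part of metaGEM.
# Reports which of the four expected environment folders are present
# and returns the number of missing ones as its exit status.
check_envs() {
    envs_dir="${1:-envs}"
    missing=0
    for name in mamba metagem metawrap prokkaroary; do
        if [ -d "$envs_dir/$name" ]; then
            echo "found:   $name"
        else
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Throwaway demo folder with only two environments present
demo_envs="$(mktemp -d)"
mkdir -p "$demo_envs/mamba" "$demo_envs/metagem"
check_envs "$demo_envs" || echo "some environments are not installed yet"
```

A missing folder is not necessarily a problem, since only the metagem environment is needed to start processing raw sequences.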

Input data

metaGEM expects data files to be organized into sample-specific subdirectories within the dataset folder. Note that this is done automatically after downloading the toy dataset files. Alternatively, users can place all fastq files in the dataset folder and run the metaGEM task organizeData:

bash metaGEM.sh --task organizeData

This is how the dataset folder should look:

dataset/
└── {SAMPLE ID 1}/
    ├── {SAMPLE ID 1}_R1.fastq.gz
    └── {SAMPLE ID 1}_R2.fastq.gz
└── {SAMPLE ID 2}/
    ├── {SAMPLE ID 2}_R1.fastq.gz
    └── {SAMPLE ID 2}_R2.fastq.gz
└── {SAMPLE ID 3}/
    ├── {SAMPLE ID 3}_R1.fastq.gz
    └── {SAMPLE ID 3}_R2.fastq.gz
...

Note that the organizeData task expects that your samples are named according to the following scheme:

{SAMPLE ID}_R{1|2}.fastq.gz, e.g. ERR260137_R1.fastq.gz, ERR260137_R2.fastq.gz, ERR260138_R1.fastq.gz, etc.
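
To make the naming scheme concrete, here is a minimal sketch of how files named this way map onto sample-specific subfolders. organize_fastq is a hypothetical helper written for illustration only; it is not the implementation behind the organizeData task, which should be preferred in practice. The demo folder and its empty files are placeholders.

```shell
# organize_fastq: hypothetical helper, NOT metaGEM's organizeData implementation.
# Moves every {SAMPLE ID}_R1/_R2.fastq.gz file into a folder named after its sample ID.
organize_fastq() {
    dataset_dir="$1"
    for f in "$dataset_dir"/*_R[12].fastq.gz; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name="$(basename "$f")"
        sample="${name%_R[12].fastq.gz}"   # strip the _R1/_R2 suffix to get the sample ID
        mkdir -p "$dataset_dir/$sample"
        mv "$f" "$dataset_dir/$sample/"
    done
}

# Throwaway demo folder with empty placeholder files
demo_dataset="$(mktemp -d)"
touch "$demo_dataset/ERR260137_R1.fastq.gz" \
      "$demo_dataset/ERR260137_R2.fastq.gz" \
      "$demo_dataset/ERR260138_R1.fastq.gz"
organize_fastq "$demo_dataset"
```

Files that do not follow the scheme are left untouched by this sketch, which makes misnamed samples easy to spot in the dataset folder.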

Config files

Make sure to inspect and set up the two config files to ensure smooth metaGEM runs:

Snakemake configuration

The config.yaml handles all the tunable parameters, subfolder names, paths, and more. The root path is automatically set by the metaGEM.sh parser to be the current working directory. Most importantly, you should make sure that the scratch path is properly configured. Most clusters have a location for temporary or high I/O operations such as $TMPDIR or $SCRATCH, e.g. see here. Please refer to the config.yaml wiki page for a more in-depth look at this config file.
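
Before filling in the scratch path, it can help to confirm which candidate location is actually usable on your system. The following is a minimal sketch: pick_scratch is a hypothetical helper, and the /tmp fallback is an assumption made for this example rather than a metaGEM default. It only prints a directory; entering that value in config.yaml is still done by hand.

```shell
# pick_scratch: hypothetical helper, not part of metaGEM.
# Prints the first candidate that is an existing, writable directory;
# returns non-zero if none of the candidates qualifies.
pick_scratch() {
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -d "$candidate" ] && [ -w "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Try the common cluster variables first; /tmp is an assumed fallback
pick_scratch "${TMPDIR:-}" "${SCRATCH:-}" /tmp || echo "no writable scratch location found"
```

On clusters where $TMPDIR is only defined inside a running job, run the check from within a job rather than on the login node.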

Cluster configuration

The cluster_config.json handles parameters for submitting jobs to the cluster workload manager. Most importantly, you should make sure that the account is properly defined to be able to submit jobs to your cluster. Please refer to the cluster_config.json wiki page for a more in-depth look at this config file.

Tools requiring additional configuration

Please note that you will need to set up the following tools/databases to run the complete core metaGEM workflow:

1. CheckM

CheckM is used extensively within the metaWRAP modules to evaluate the output of various intermediate steps. Although the CheckM package is installed in the metawrap environment, the user is required to download the CheckM database and run checkm data setRoot <db_dir> as outlined in the CheckM installation guide.

2. GTDB-Tk

GTDB-Tk is used for taxonomic assignment of MAGs, and requires a database to be downloaded and configured. Please refer to the installation documentation for detailed instructions.

3. CPLEX

Unfortunately, CPLEX cannot be automatically installed by the env_setup.sh script; you must install this dependency manually within the metagem conda environment. GEM reconstruction and GEM community simulations require the IBM CPLEX solver, which is free to download with an academic license. Refer to the CarveMe and SMETANA installation instructions for further information or troubleshooting. Note: CPLEX v.12.8 is recommended.
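
After a manual install, one quick sanity check is to see whether the CPLEX Python module can be imported. The sketch below assumes that the metagem environment is already active, that python on your PATH belongs to it, and that importing the cplex module is a reasonable proxy for a working install. python_has_module is a hypothetical helper, not part of metaGEM, CarveMe, or SMETANA.

```shell
# python_has_module: hypothetical helper, not part of metaGEM.
# Succeeds if the python found on PATH can import the given module.
python_has_module() {
    python -c "import $1" >/dev/null 2>&1
}

# Assumes the metagem conda environment is currently active
if python_has_module cplex; then
    echo "cplex module found"
else
    echo "cplex module not found in the active environment"
fi
```

A successful import does not confirm the CPLEX version or the license, so the CarveMe and SMETANA instructions remain the reference for troubleshooting.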
