Reusable and maintained Luigi tasks to incorporate in bioinformatics pipelines
PavlidisLab/bioluigi
- Provides Luigi tasks for tools from samtools, bcftools, STAR, RSEM, vcfanno, GATK, Ensembl VEP and much more!
- Reuses as much as possible the `ExternalProgramTask` interface from the `external_program` contrib module and extends its features to make it work on modern schedulers such as Slurm.
- Provides basic resource management for a local scheduler: all tasks are annotated with reasonable default `cpus` and `memory` parameters that can be tuned and constrained via the `[resources]` configuration. In the case of externally scheduled tasks, resource management is deferred to the external scheduler.
- Provides a command-line interface for interacting more conveniently with the Luigi scheduler:

      bioluigi list [--status STATUS] [--user USER] [--detailed] TASK_GLOB
      bioluigi show TASK_ID
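
To make the `[resources]` constraint mentioned above more concrete, here is a minimal, standard-library-only sketch of how per-task `cpus` and `memory` requests could be checked against configured totals. The configuration values, the helper names `load_resource_limits` and `fits`, and the choice to treat unlisted resources as unconstrained are all assumptions made for illustration; this is not bioluigi's or Luigi's actual scheduling code.

```python
import configparser

# Hypothetical luigi.cfg contents; the totals are illustrative only.
LUIGI_CFG = """
[resources]
cpus = 16
memory = 32
"""


def load_resource_limits(cfg_text):
    """Return the [resources] section as a dict of integer totals."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {name: int(value) for name, value in parser["resources"].items()}


def fits(limits, running, requested):
    """Check whether `requested` can start alongside the `running` tasks.

    Simplification of this sketch: a resource that is not listed in
    `limits` is treated as unconstrained.
    """
    for name, amount in requested.items():
        in_use = sum(task.get(name, 0) for task in running)
        if in_use + amount > limits.get(name, float("inf")):
            return False
    return True


limits = load_resource_limits(LUIGI_CFG)
running = [{"cpus": 8, "memory": 8}]  # one task already holds 8 CPUs

print(fits(limits, running, {"cpus": 8, "memory": 16}))  # True: 16 <= 16 CPUs
print(fits(limits, running, {"cpus": 12, "memory": 4}))  # False: 20 > 16 CPUs
```

Tuning a task's `cpus` or `memory` changes what it requests; the `[resources]` totals cap how many such requests can run at once.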
Here's a list of supported tools:
- sratoolkit with `prefetch` and `fastq-dump`
- bcftools
- FastQC
- MultiQC

Supported schedulers:

- local
- Slurm
The most convenient way of using the pre-defined tasks is to yield them dynamically in the body of the `run` function. It's also possible to require them since they inherit from `luigi.Task`.

    import luigi
    from bioluigi.tasks import bcftools

    class MyTask(luigi.Task):
        def input(self):
            return luigi.LocalTarget('source.vcf.gz')

        def run(self):
            # annotations_file is a placeholder that must be defined elsewhere
            yield bcftools.Annotate(self.input().path,
                                    annotations_file,
                                    self.output().path,
                                    ...,
                                    scheduler='slurm',
                                    cpus=8)

        def output(self):
            return luigi.LocalTarget('annotated.vcf.gz')
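
What "yielding a task dynamically" means can be sketched with a toy driver that needs neither Luigi nor bioluigi. The classes `ToyTask`, `Annotate` and `Pipeline` and the function `execute` are hypothetical stand-ins. Luigi itself tracks completion through output targets rather than a `done` flag, so treat this as a simplified model of the control flow and not as Luigi's implementation.

```python
import inspect

log = []  # records the order in which things happen


class ToyTask:
    """Minimal stand-in for a task; completion is tracked with a flag."""
    done = False


class Annotate(ToyTask):
    """Stand-in for a pre-defined tool task."""

    def run(self):
        log.append("annotate")
        self.done = True


class Pipeline(ToyTask):
    """A task whose dependency is only discovered while it runs."""

    def run(self):
        log.append("pipeline: start")
        yield Annotate()  # hand the dependency to the driver
        log.append("pipeline: resumed")
        self.done = True


def execute(task):
    """Run a task, executing any tasks it yields before resuming it."""
    result = task.run()
    if inspect.isgenerator(result):
        for dependency in result:
            if not dependency.done:
                execute(dependency)


pipeline = Pipeline()
execute(pipeline)
print(log)  # ['pipeline: start', 'annotate', 'pipeline: resumed']
```

A required task, by contrast, would be declared up front and completed before `run` is entered at all.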
You can define your own scheduled task by subclassing the `ScheduledExternalProgramTask` class. Note that the default scheduler is `local` and will use Luigi's `[resources]` allocation mechanism.

    import datetime
    from bioluigi.scheduled_external_program import ScheduledExternalProgramTask

    class MyScheduledTask(ScheduledExternalProgramTask):
        scheduler = 'slurm'
        walltime = datetime.timedelta(seconds=10)
        cpus = 1
        memory = 1

        def program_args(self):
            return ['sleep', '10']
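
To give an intuition for what the `walltime`, `cpus` and `memory` attributes could turn into on a Slurm cluster, the sketch below wraps the program arguments in an `srun` invocation. The helpers `format_walltime` and `slurm_command` are hypothetical, and reading `memory` as gigabytes is an assumption of this example; the command bioluigi actually builds may differ.

```python
import datetime


def format_walltime(delta):
    """Format a timedelta in Slurm's days-hours:minutes:seconds notation."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def slurm_command(program_args, cpus, memory_gb, walltime):
    """Prefix program arguments with srun resource flags.

    Assumption of this sketch: memory is expressed in gigabytes.
    """
    return [
        "srun",
        "--cpus-per-task", str(cpus),
        "--mem", f"{memory_gb}G",
        "--time", format_walltime(walltime),
    ] + list(program_args)


cmd = slurm_command(["sleep", "10"], cpus=1, memory_gb=1,
                    walltime=datetime.timedelta(seconds=10))
print(" ".join(cmd))
# srun --cpus-per-task 1 --mem 1G --time 0-00:00:10 sleep 10
```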