miku/gluish
Note: v0.2.X cleans up some cruft from v0.1.X. v0.2.X still passes the same tests as v0.1.X, but removes a lot of functionality unrelated to luigi. Please check before you upgrade.
Luigi 2.0 compatibility: gluish 0.2.3 or higher.
Note that luigi dropped Python 2 support, and so does this package, starting with 0.3.0.
Some glue around luigi.
Provides a base class that autogenerates its output filenames based on
- some base path,
- a tag,
- the task id (the classname and the significant parameters)
Additionally, this package provides a few smaller utilities, like a TSV format, a benchmarking decorator and some task templates.
This project has been developed for Project finc at Leipzig University Library.
`gluish.task.BaseTask` is intended to be used as a supertask.
```python
from gluish.task import BaseTask
import datetime
import luigi
import tempfile

class DefaultTask(BaseTask):
    """ Some default abstract task for your tasks. BASE and TAG determine
    the paths, where the artefacts will be stored. """
    BASE = tempfile.gettempdir()
    TAG = 'just-a-test'

class RealTask(DefaultTask):
    """ Note that this task has a `self.path()`, that figures out the full
    path for this class' output. """
    date = luigi.DateParameter(default=datetime.date(1970, 1, 1))

    def run(self):
        with self.output().open('w') as output:
            output.write('Hello World!')

    def output(self):
        return luigi.LocalTarget(path=self.path())
```
When instantiating a `RealTask` instance, it will automatically be assigned a structured output path, consisting of `BASE`, `TAG`, task name and a slugified version of the significant parameters.
```python
task = RealTask()
task.output().path
# would be something like this on OS X:
# /var/folders/jy/g_b2kpwx0850/T/just-a-test/RealTask/date-1970-01-01.tsv
```
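To make the naming scheme concrete, here is a minimal, stdlib-only sketch of how such a path could be assembled. It is not gluish's actual implementation: the helper names `slugify` and `structured_path`, the sorting of parameters and the `tsv` default extension are assumptions chosen to mirror the example path above.

```python
import datetime
import os
import re


def slugify(value):
    """Filesystem-friendly rendering of a parameter value (hypothetical helper)."""
    return re.sub(r'[^A-Za-z0-9_.-]+', '-', str(value)).strip('-')


def structured_path(base, tag, task_name, params, ext='tsv'):
    """Join base, tag, task name and 'name-value' pairs into one path."""
    # Sort parameters so the same task always maps to the same file name.
    parts = ['%s-%s' % (name, slugify(value))
             for name, value in sorted(params.items())]
    filename = '%s.%s' % ('-'.join(parts) or 'output', ext)
    return os.path.join(base, tag, task_name, filename)


path = structured_path('/tmp', 'just-a-test', 'RealTask',
                       {'date': datetime.date(1970, 1, 1)})
print(path)
```

Because the path is a pure function of the class name and its parameters, two instances with the same parameters point to the same artefact, which is what lets luigi decide whether a task is already complete.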
The TSV format was started on the mailing list. Continuing the example from above, let's create a task named `TabularSource` that generates TSV files.
```python
from gluish.format import TSV

class TabularSource(DefaultTask):
    date = luigi.DateParameter(default=datetime.date(1970, 1, 1))

    def run(self):
        with self.output().open('w') as output:
            for i in range(10):
                output.write_tsv(i, 'Hello', 'World')

    def output(self):
        return luigi.LocalTarget(path=self.path(), format=TSV)
```
Another class, `TabularConsumer`, can use `iter_tsv` on the handle obtained by opening the file. The `row` will be a tuple or, if `cols` is specified, a `collections.namedtuple`.
```python
class TabularConsumer(DefaultTask):
    date = luigi.DateParameter(default=datetime.date(1970, 1, 1))

    def requires(self):
        return TabularSource()

    def run(self):
        with self.input().open() as handle:
            for row in handle.iter_tsv(cols=('id', 'greeting', 'greetee')):
                print('{0} {1}!'.format(row.greeting, row.greetee))

    def complete(self):
        # Never report completion, so the task runs every time.
        return False
```
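The tuple versus namedtuple behaviour can be illustrated without luigi. The following is a rough stdlib sketch, not the code behind `gluish.format.TSV`; the free functions `write_tsv` and `iter_tsv` are hypothetical stand-ins for the methods of the same name on the file handle.

```python
import collections
import io


def write_tsv(handle, *values):
    """Write the values as one tab-separated line."""
    handle.write('\t'.join(str(v) for v in values) + '\n')


def iter_tsv(handle, cols=None):
    """Yield each line as a tuple, or as a namedtuple if cols is given."""
    Row = collections.namedtuple('Row', cols) if cols else None
    for line in handle:
        fields = line.rstrip('\n').split('\t')
        yield Row(*fields) if Row else tuple(fields)


buffer = io.StringIO()
for i in range(3):
    write_tsv(buffer, i, 'Hello', 'World')
buffer.seek(0)

rows = list(iter_tsv(buffer, cols=('id', 'greeting', 'greetee')))
print('{0} {1}!'.format(rows[0].greeting, rows[0].greetee))
```

Note that in this sketch every field comes back as a string, so `rows[0].id` is `'0'`, not `0`; any type conversion is left to the consumer.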
Leverage command line tools with `gluish.utils.shellout`. `shellout` will take a string argument and will format it according to the keyword arguments. The `{output}` placeholder is special, since it will be automatically assigned a path to a temporary file, if it is not specified as a keyword argument.
The return value of `shellout` is the path to the `{output}` file.
Spaces in the given string are normalized, unless `preserve_whitespace=True` is passed. A literal curly brace can be inserted by `{{` and `}}` respectively.
An exception is raised whenever the command exits with a non-zero return value.
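The behaviour described above can be approximated in a few lines of standard-library Python. This is only a sketch under assumptions, not the real `shellout`: the name `shellout_sketch`, the `shellout-` temp file prefix and the use of `subprocess.CalledProcessError` as the raised exception are all choices made for this illustration. It assumes a POSIX-like shell.

```python
import os
import re
import subprocess
import tempfile


def shellout_sketch(template, preserve_whitespace=False, **kwargs):
    """Format a command template, run it, and return the output path."""
    if 'output' not in kwargs:
        # No explicit output given: reserve a temporary file for it.
        fd, kwargs['output'] = tempfile.mkstemp(prefix='shellout-')
        os.close(fd)
    # str.format also turns '{{' and '}}' into literal braces.
    command = template.format(**kwargs)
    if not preserve_whitespace:
        command = re.sub(r'\s+', ' ', command).strip()
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, shell=True, check=True)
    return kwargs['output']


path = shellout_sketch("""
    echo   hello {name} > {output}
""", name='world')
with open(path) as handle:
    content = handle.read()
print(content.strip())
```

Returning the output path rather than the command's stdout keeps large intermediate results on disk, where a follow-up step can move them to the final target.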
Note: If you want to make sure an executable is available on your system before the task runs, you can use a `gluish.common.Executable` task as a requirement.
```python
from gluish.common import Executable
from gluish.utils import shellout
import luigi

class GIFScreencast(DefaultTask):
    """ Given a path to a screencast .mov, generate a GIF
    which is funnier by definition. """
    filename = luigi.Parameter(description='Path to a .mov screencast')
    delay = luigi.IntParameter(default=3)

    def requires(self):
        # Both tools used in the pipeline below must be installed.
        return [Executable(name='ffmpeg'),
                Executable(name='gifsicle', message='http://www.lcdf.org/gifsicle/')]

    def run(self):
        output = shellout("""ffmpeg -i {infile} -s 600x400 -pix_fmt rgb24 -r 10 -f gif - |
                             gifsicle --optimize=3 --delay={delay} > {output} """,
                          infile=self.filename, delay=self.delay)
        luigi.LocalTarget(output).move(self.output().path)

    def output(self):
        return luigi.LocalTarget(path=self.path())
```
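Outside of a luigi workflow, the same kind of precondition can be checked with `shutil.which` from the standard library. The sketch below is not how `gluish.common.Executable` is implemented; `missing_executables` is a hypothetical helper that only illustrates the idea of failing early when a tool is not on the `PATH`.

```python
import shutil


def missing_executables(names):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables(['ffmpeg', 'gifsicle'])
if missing:
    print('Please install: {0}'.format(', '.join(missing)))
```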
Sometimes the effective date for a task needs to be determined dynamically.
Consider for example a workflow involving an FTP server.
A data source is fetched from FTP, but it is not known when updates are supplied. So the FTP server needs to be checked at regular intervals. Dependent tasks do not need to be updated as long as there is nothing new on the FTP server.
To map an arbitrary date to the closest date in the past where an update occurred, you can use a `gluish.parameter.ClosestDateParameter`, which is just an ordinary `DateParameter` but will invoke `task.closest()` behind the scenes to figure out the effective date.
```python
from gluish.parameter import ClosestDateParameter
import datetime
import luigi

class SimpleTask(DefaultTask):
    """ Reuse DefaultTask from above """
    date = ClosestDateParameter(default=datetime.date.today())

    def closest(self):
        # invoke dynamic checks here ...
        # for simplicity, map this task to the last monday
        return self.date - datetime.timedelta(days=self.date.weekday())

    def run(self):
        with self.output().open('w') as output:
            output.write("It's just another manic Monday!")

    def output(self):
        return luigi.LocalTarget(path=self.path())
```
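For the FTP scenario, a `closest()` method would typically look up the most recent update on or before the requested date. A minimal sketch of that lookup follows, assuming the update dates are already known as a list; the function name `closest_update` and the sample dates are made up for illustration, and how the dates are obtained from the server is left out.

```python
import bisect
import datetime


def closest_update(date, update_dates):
    """Return the latest update date that is not after `date`."""
    ordered = sorted(update_dates)
    # Index of the first entry strictly greater than `date`.
    index = bisect.bisect_right(ordered, date)
    if index == 0:
        raise ValueError('no update on or before {0}'.format(date))
    return ordered[index - 1]


updates = [datetime.date(2015, 1, 5),
           datetime.date(2015, 2, 2),
           datetime.date(2015, 3, 2)]
effective = closest_update(datetime.date(2015, 2, 20), updates)
print(effective)
```

With such a mapping, every date between two updates resolves to the same effective date and therefore to the same output path, so dependent tasks are not rerun until a new update appears.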
A short, self-contained example can be found in this gist.