common-workflow-language/cwlprov-py


cwlprov: Command line tool and Python API to explore Research Objects containing provenance of Common Workflow Language executions


CWLProv Python tool

The cwlprov Python tool is a command line interface to validate and inspect CWLProv Research Objects that capture workflow runs, typically executed in a Common Workflow Language implementation.

Installation

You'll need Python 3.

To install from pip try:

pip3 install cwlprov

If you would rather install from the checkout of this source code:

pip3 install .

If you would like to use the cwltool rerun feature you may also need:

pip3 install cwlref-runner

Development

To develop cwlprov-py it is recommended to set up a new virtualenv:

virtualenv -p python3 venv3

To activate the environment and install your development version of cwlprov:

. venv3/bin/activate
pip3 install .

Usage

Use cwlprov --help to see all options. For instance, cwlprov validate will check that the folder is valid according to CWLProv.

$ cwlprov --help
usage: cwlprov [-h] [--version] [--directory DIRECTORY] [--relative]
               [--absolute] [--output OUTPUT] [--verbose] [--quiet] [--hints]
               [--no-hints]
               {validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
               ...

cwlprov explores Research Objects containing provenance of Common Workflow
Language executions. <https://w3id.org/cwl/prov/>

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --directory DIRECTORY, -d DIRECTORY
                        Path to CWLProv Research Object (default: .)
  --relative            Output paths relative to current directory (default if
                        -d is missing or relative)
  --absolute            Output absolute paths (default if -d is absolute)
  --output OUTPUT, -o OUTPUT
                        File to write output to (default: stdout)
  --verbose, -v         Verbose logging (repeat for more verbose)
  --quiet, -q           No logging or hints
  --hints               Show hints on cwlprov usage
  --no-hints            Do not show hints

commands:
  {validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
    validate            validate the CWLProv Research Object
    info                show research object metadata
    who                 show who ran the workflow
    prov                export workflow execution provenance in PROV format
    inputs              list workflow/step input files/values
    outputs             list workflow/step output files/values
    run                 show workflow execution log
    runs                List all workflow executions in RO
    rerun               Rerun a workflow or step
    derived             List what was derived from a data item, based on
                        activity usage/generation
    runtimes            Calculate average step execution runtimes

The test/ folder contains some examples of workflow runs for different CWLProv profiles.

All cwlprov commands will attempt to detect the CWLProv research object from the current directory; alternatively, use the --directory option to specify the root folder.

The --quiet option may be used in scripts for less verbose output. The --verbose option has the opposite effect and enables logging. For debug logging, use -vv or --verbose --verbose.

Note that the general arguments listed above must be provided before the command, e.g.

cwlprov --quiet --directory /tmp/1 validate
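
When cwlprov is driven from a script, this ordering rule is easy to get wrong. Below is a minimal sketch of a helper that always places the general options before the command. The function name build_cwlprov_argv and its parameters are hypothetical and not part of cwlprov; only the flags are taken from the help text above.

```python
import shlex
from typing import List, Optional, Sequence


def build_cwlprov_argv(
    command: str,
    command_args: Sequence[str] = (),
    directory: Optional[str] = None,
    quiet: bool = False,
    verbosity: int = 0,
) -> List[str]:
    """Return an argv list with general options placed before the command.

    Hypothetical helper for illustration; not part of the cwlprov package.
    """
    argv = ["cwlprov"]
    if quiet:
        argv.append("--quiet")
    # --verbose may be repeated for more detailed logging
    argv.extend(["--verbose"] * verbosity)
    if directory is not None:
        argv.extend(["--directory", directory])
    # The command and its own arguments always come last
    argv.append(command)
    argv.extend(command_args)
    return argv


argv = build_cwlprov_argv("validate", directory="/tmp/1", quiet=True)
print(shlex.join(argv))
```

If cwlprov is installed, the resulting list can be passed directly to subprocess.run(argv).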

Many of the commands accept additional arguments, which can be accessed by cwlprov COMMAND --help, e.g.:

$ cwlprov run --help
usage: cwlprov run [-h] [--step STEP] [--steps] [--no-steps] [--start]
                   [--no-start] [--end] [--no-end] [--duration]
                   [--no-duration] [--labels] [--no-labels] [--inputs]
                   [--outputs]
                   [id]

positional arguments:
  id                    workflow run UUID

optional arguments:
  -h, --help            show this help message and exit
  --step STEP, -s STEP  Show only step with given UUID
  --steps               List steps of workflow
  --no-steps            Do not list steps
  --start               Show start timestamps (default)
  --no-start, -S        Do not show start timestamps
  --end, -e             Show end timestamps
  --no-end              Do not show end timestamps
  --duration            Show step duration (default)
  --no-duration, -D     Do not show step duration
  --labels              Show activity labels
  --no-labels, -L       Do not show activity labels
  --inputs, -i          Show inputs
  --outputs, -o         Show outputs

Validation

Running cwlprov with no commands will return with status 0 if a CWLProv folder structure is detected:

$ cd test/revsort-cwlprov-0.4.0
test/revsort-cwlprov-0.4.0$ cwlprov
Detected CWLProv Research Object: /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
$ cd /tmp
/tmp$ cwlprov
ERROR:cwlprov.tool:Could not find bagit.txt, try cwlprov -d mybag/

If a CWLProv research object is not detected or is invalid, cwlprov exits with an error code.

cwlprov && echo Do cwlprov-stuff
ERROR:cwlprov.tool:Could not find bagit.txt, try cwlprov -d mybag/

Combined with the --quiet option, cwlprov can be useful to find the root of a CWLProv folder:

test/revsort-cwlprov-0.4.0/metadata/provenance$ cwlprov -q
/home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
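
The error message shown earlier mentions bagit.txt, which suggests that detection looks for that file. The sketch below approximates the idea by walking up the parent directories until a bagit.txt is found. This is an assumption made for illustration, not cwlprov's actual implementation, and find_bag_root is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_bag_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains bagit.txt."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "bagit.txt").is_file():
            return candidate
    return None


# Demonstration on a throwaway directory tree (not a real research object)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "mybag"
    nested = root / "metadata" / "provenance"
    nested.mkdir(parents=True)
    (root / "bagit.txt").write_text("BagIt-Version: 0.97\n", encoding="utf-8")
    found_from_nested = find_bag_root(nested) == root

print(found_from_nested)
```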

All commands of cwlprov will by default perform a quick validation, which confirms that all files are present with the correct file size. For instance, if we remove a file:

test/revsort-cwlprov-0.4.0$ rm data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
test/revsort-cwlprov-0.4.0$ cwlprov
ERROR:cwlprov.tool:BagIt validation failed for: /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0: Payload-Oxum validation failed. Expected 3 files and 3333 bytes but found 2 files and 2222 bytes
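
The Payload-Oxum named in this error is a BagIt field in bag-info.txt of the form "<total bytes>.<file count>". A minimal sketch of such a size-and-count check follows; it only stats the files under data/ and never reads their content, which is why it is fast. The function names are hypothetical, and the real check is done by the BagIt library that cwlprov uses.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def payload_oxum(bag_root: Path) -> Tuple[int, int]:
    """Return (total_bytes, file_count) for all files under data/."""
    total_bytes = 0
    file_count = 0
    for path in (bag_root / "data").rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            file_count += 1
    return total_bytes, file_count


def expected_oxum(bag_root: Path) -> Tuple[int, int]:
    """Read 'Payload-Oxum: <bytes>.<files>' from bag-info.txt."""
    text = (bag_root / "bag-info.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "payload-oxum":
            octets, _, streams = value.strip().partition(".")
            return int(octets), int(streams)
    raise ValueError("bag-info.txt has no Payload-Oxum entry")


# Demonstration on a tiny made-up bag: two payload files, 15 bytes in total
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data" / "32").mkdir(parents=True)
    (bag / "data" / "32" / "a").write_bytes(b"x" * 10)
    (bag / "data" / "b").write_bytes(b"y" * 5)
    (bag / "bag-info.txt").write_text("Payload-Oxum: 15.2\n", encoding="utf-8")
    ok_before = payload_oxum(bag) == expected_oxum(bag)
    (bag / "data" / "b").unlink()  # simulate the removed file
    ok_after = payload_oxum(bag) == expected_oxum(bag)

print(ok_before, ok_after)
```

Because only sizes and counts are compared, an edit that keeps the file size unchanged would pass this check.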

To perform full validation, use cwlprov validate:

test/revsort-cwlprov-0.4.0$ cwlprov validate
WARNING:bdbag.bdbagit:data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 exists in manifest but was not found on filesystem
ERROR:cwlprov.tool:BagIt validation failed for: /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0: Bag validation failed: data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 exists in manifest but was not found on filesystem
test/revsort-cwlprov-0.4.0$ git checkout .
test/revsort-cwlprov-0.4.0$ cwlprov validate
Valid CWLProv RO: .

Unlike the quick validation, cwlprov validate will confirm checksums on all files, and thus detect byte-level changes. For instance, let's pretend every upper case I has been replaced with lower case i in a data file:

test/revsort-cwlprov-0.4.0$ sed -i 's/I/i/g' data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
test/revsort-cwlprov-0.4.0$ cwlprov
Detected CWLProv Research Object: /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
test/revsort-cwlprov-0.4.0$ cwlprov validate
WARNING:bdbag.bdbagit:data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 sha1 validation failed: expected="327fc7aedf4f6b69a42a7c8b808dc5a7aff61376" found="60c41d3758bc8b03e78db07bc0f17d1804d2662d"
ERROR:cwlprov.tool:BagIt validation failed for: /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0: Bag validation failed: data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 sha1 validation failed: expected="327fc7aedf4f6b69a42a7c8b808dc5a7aff61376" found="60c41d3758bc8b03e78db07bc0f17d1804d2662d"
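
To illustrate what a full checksum pass involves, here is a minimal sketch that recomputes the SHA-1 of every file listed in a BagIt manifest-sha1.txt (one "<checksum> <path>" entry per line) and reports mismatches. The function names are hypothetical and the demo bag is made up. Its data/<first two hex digits>/<sha1> layout merely mimics the paths seen in the output above and is an assumption about how those paths are formed.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha1_of(path: Path) -> str:
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_checksums(bag_root: Path) -> Dict[str, str]:
    """Map each mismatching payload path to the checksum actually found."""
    failures = {}
    manifest = bag_root / "manifest-sha1.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(maxsplit=1)
        relative_path = relative_path.strip()
        target = bag_root / relative_path
        found = sha1_of(target) if target.is_file() else "missing"
        if found != expected.lower():
            failures[relative_path] = found
    return failures


# Demonstration: same-size tampering (I -> i) is caught by the checksum
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    content = b"HELLO IN CAPITALS\n"
    checksum = hashlib.sha1(content).hexdigest()
    relative = f"data/{checksum[:2]}/{checksum}"
    (bag / relative).parent.mkdir(parents=True)
    (bag / relative).write_bytes(content)
    (bag / "manifest-sha1.txt").write_text(f"{checksum}  {relative}\n", encoding="utf-8")
    clean = failed_checksums(bag)
    (bag / relative).write_bytes(content.replace(b"I", b"i"))
    tampered = failed_checksums(bag)

print(len(clean), len(tampered))
```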

Research Object information

The cwlprov info command gives high-level information about the research object and its identifiers.

test/revsort-cwlprov-0.4.0$ cwlprov info
Research Object of CWL workflow run
Research Object ID: arcp://uuid,d47d3d43-4830-44f0-aa32-4cda74849c63/
Profile: https://w3id.org/cwl/prov/0.4.0
Workflow run ID: urn:uuid:d47d3d43-4830-44f0-aa32-4cda74849c63
Packaged: 2018-08-21

The Profile indicates the version of CWLProv that the research object implements, which determines which features of a workflow run are represented.
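
For scripting, the "Key: value" lines printed by cwlprov info are straightforward to pick apart. The sketch below parses the sample output shown above into a dictionary and extracts the profile version and the run UUID. It assumes the output keeps this line format; parse_info is a hypothetical helper, not part of cwlprov.

```python
import uuid
from typing import Dict

# Sample output copied from the cwlprov info example above
INFO_OUTPUT = """\
Research Object of CWL workflow run
Research Object ID: arcp://uuid,d47d3d43-4830-44f0-aa32-4cda74849c63/
Profile: https://w3id.org/cwl/prov/0.4.0
Workflow run ID: urn:uuid:d47d3d43-4830-44f0-aa32-4cda74849c63
Packaged: 2018-08-21
"""


def parse_info(text: str) -> Dict[str, str]:
    """Split 'Key: value' lines into a dict; lines without ': ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(": ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


info = parse_info(INFO_OUTPUT)
# The profile version is the last path segment of the profile URI
profile_version = info["Profile"].rstrip("/").rsplit("/", 1)[-1]
# uuid.UUID accepts the urn:uuid: prefix directly
run_uuid = uuid.UUID(info["Workflow run ID"])
# In this sample the research object ID embeds the same UUID as the run ID
same_uuid = str(run_uuid) in info["Research Object ID"]
print(profile_version, run_uuid, same_uuid)
```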

Note that a warning will be printed if an unknown CWLProv version is detected:

$ cwlprov
WARNING:cwlprov.tool:Unsupported CWLProv version: {'https://w3id.org/cwl/prov/0.8.0'}
Supported profiles:
https://w3id.org/cwl/prov/0.6.0
https://w3id.org/cwl/prov/0.5.0
https://w3id.org/cwl/prov/0.4.0
https://w3id.org/cwl/prov/0.3.0

This typically means that cwlprov-py is outdated, although that is normally harmless. Try pip install --upgrade cwlprov

The cwlprov who command will try to determine the user that ran the workflow.

$ cwlprov who
Packaged By: cwltool 1.0.20180925133620 <urn:uuid:d9c16ea5-c3fd-4c56-b125-f3a5207e6c38>
Executed By: Stian Soiland-Reyes <https://orcid.org/0000-0001-9842-9718>

Note that, for privacy reasons, CWL executors like cwltool do not log such user information unless this has been enabled with options like --orcid, --full-name or --enable-user-provenance.

Workflow run

To list the step executions of a workflow use cwlprov run:

test/revsort-cwlprov-0.4.0$ cwlprov run
2018-08-21 17:26:24.467844 Flow d47d3d43-4830-44f0-aa32-4cda74849c63 [ Run of workflow/packed.cwl#main
2018-08-21 17:26:24.530884 Step 6f501717-0c97-492e-b18a-10bc096f1797   Run of workflow/packed.cwl#main/rev  (0:00:01.122498)
2018-08-21 17:26:25.656084 Step e7c8b2c0-dee6-4c61-b674-f0807cb47344   Run of workflow/packed.cwl#main/sorted  (0:00:01.087999)
2018-08-21 17:26:26.752493 Flow d47d3d43-4830-44f0-aa32-4cda74849c63 ] Run of workflow/packed.cwl#main  (0:00:02.284649)
Legend:
 [ Workflow start
 ] Workflow end

The listing can be customized, see cwlprov run --help for details. For example:

test/revsort-cwlprov-0.4.0$ cwlprov --no-hints run --no-labels --start --end --no-duration
2018-08-21 17:26:24.467844                            Flow d47d3d43-4830-44f0-aa32-4cda74849c63 [
2018-08-21 17:26:24.530884 2018-08-21 17:26:25.653382 Step 6f501717-0c97-492e-b18a-10bc096f1797
2018-08-21 17:26:25.656084 2018-08-21 17:26:26.744083 Step e7c8b2c0-dee6-4c61-b674-f0807cb47344
                           2018-08-21 17:26:26.752493 Flow d47d3d43-4830-44f0-aa32-4cda74849c63 ]
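
The start and end timestamps in this listing are consistent with the durations shown in the default listing. As a small worked check, the sketch below parses the Step lines of the sample output and subtracts start from end. The regular expression assumes the column layout shown above, and step_durations is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta
from typing import Dict

# Sample listing copied from the example above
LISTING = """\
2018-08-21 17:26:24.467844                            Flow d47d3d43-4830-44f0-aa32-4cda74849c63 [
2018-08-21 17:26:24.530884 2018-08-21 17:26:25.653382 Step 6f501717-0c97-492e-b18a-10bc096f1797
2018-08-21 17:26:25.656084 2018-08-21 17:26:26.744083 Step e7c8b2c0-dee6-4c61-b674-f0807cb47344
                           2018-08-21 17:26:26.752493 Flow d47d3d43-4830-44f0-aa32-4cda74849c63 ]
"""

TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
# A Step line carries a start timestamp, an end timestamp and a UUID
STEP_LINE = re.compile(rf"^({TIMESTAMP}) ({TIMESTAMP}) Step ([0-9a-f-]{{36}})")


def step_durations(listing: str) -> Dict[str, timedelta]:
    """Return end minus start for each Step line, keyed by step UUID."""
    durations = {}
    for line in listing.splitlines():
        match = STEP_LINE.match(line)
        if match:
            start, end = (
                datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
                for stamp in match.group(1, 2)
            )
            durations[match.group(3)] = end - start
    return durations


for step_id, duration in step_durations(LISTING).items():
    print(step_id, duration)
```

For the two steps this yields 0:00:01.122498 and 0:00:01.087999, matching the durations printed by the default cwlprov run listing.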

Nested workflows

Nested workflows, that is, steps that are themselves workflows, are indicated in cwlprov run with a *:

(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run
2018-08-08 22:44:06.573330 Flow 39408a40-c1c8-4852-9747-87249425be1e [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.691722 Step 4f082fb6-3e4d-4a21-82e3-c685ce3deb58   Run of workflow/packed.cwl#main/create-tar  (0:00:00.010133)
2018-08-08 22:44:06.702976 Step 0cceeaf6-4109-4f08-940b-f06ac959944a * Run of workflow/packed.cwl#main/compile  (unknown duration)
2018-08-08 22:44:12.680097 Flow 39408a40-c1c8-4852-9747-87249425be1e ] Run of workflow/packed.cwl#main  (0:00:06.106767)
Legend:
 [ Workflow start
 * Nested provenance, use UUID to explore: cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
 ] Workflow end
(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
2018-08-08 22:44:06.607210 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.707070 Step 83752ab4-8227-4d4a-8baa-78376df34aed   Run of workflow/packed.cwl#main/untar  (0:00:00.008149)
2018-08-08 22:44:06.718554 Step f56d8478-a190-4251-84d9-7f69fe0f6f8b   Run of workflow/packed.cwl#main/argument  (0:00:00.532052)
2018-08-08 22:44:07.251588 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a ] Run of workflow/packed.cwl#main  (0:00:00.644378)
Legend:
 [ Workflow start
 ] Workflow end

Note that there is a bug in the CWLProv 0.3.0 logging shown above: steps of nested workflows are misleadingly labeled under #main.

You can list all workflow runs (including nested workflow runs) with cwlprov runs:

test/nested-cwlprov-0.3.0$ cwlprov runs
39408a40-c1c8-4852-9747-87249425be1e * Run of workflow/packed.cwl#main
0cceeaf6-4109-4f08-940b-f06ac959944a   Run of workflow/packed.cwl#main
Legend:
 * master workflow
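
A script that wants to visit every nested run can separate the master workflow from the others by looking at the * marker. The sketch below parses the sample cwlprov runs output shown above. It assumes the "UUID, marker, label" layout of that sample; parse_runs is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Sample output copied from the cwlprov runs example above
RUNS_OUTPUT = """\
39408a40-c1c8-4852-9747-87249425be1e * Run of workflow/packed.cwl#main
0cceeaf6-4109-4f08-940b-f06ac959944a   Run of workflow/packed.cwl#main
Legend:
 * master workflow
"""

# UUID, then a one-character marker column ("*" or blank), then the label
RUN_LINE = re.compile(
    r"^([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) ([* ]) (.*)$"
)


def parse_runs(text: str) -> List[Tuple[str, bool, str]]:
    """Return (uuid, is_master, label) for each run line; other lines are skipped."""
    runs = []
    for line in text.splitlines():
        match = RUN_LINE.match(line)
        if match:
            run_id, marker, label = match.groups()
            runs.append((run_id, marker == "*", label))
    return runs


runs = parse_runs(RUNS_OUTPUT)
nested_ids = [run_id for run_id, is_master, _ in runs if not is_master]
# Each nested run can then be explored with: cwlprov run <UUID>
for run_id in nested_ids:
    print("cwlprov run", run_id)
```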

To explore the nested workflow run with other commands you may have to provide the run UUID with the --run argument, e.g.

test/nested-cwlprov-0.3.0$ cwlprov outputs --format=files --run 0cceeaf6-4109-4f08-940b-f06ac959944a 83752ab4-8227-4d4a-8baa-78376df34aed
Output example_out:
data/93/93035905e94e150874f5a881d39f3c5c6378dd38

License

This repository is distributed under the Apache License, version 2.0.

See the file LICENSE.txt for details, and NOTICE for required notices.

SPDX-License-Identifier: Apache-2.0

Contributing

cwlprov-py is maintained at https://github.com/common-workflow-language/cwlprov-py/ by the Common Workflow Language project.

Feel free to raise an issue or a pull request to contribute to CWLProv. Contributions are assumed to be covered by section 5 of the Apache License.

For an informal CWLProv discussion with other developers, join the (relatively quiet) Gitter room common-workflow-language/cwlprov, or the (busier) common-workflow-language/common-workflow-language.

Code of Conduct

The CWL Project is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, age, race, or religion. We do not tolerate harassment of participants in any form. This code of conduct applies to all CWL Project spaces, including the Google Group, the Gitter chat room, the Google Hangouts chats, both online and off. Anyone who violates this code of conduct may be sanctioned or expelled from these spaces at the discretion of the leadership team.

For more details, see our Code of Conduct.


Releases: 2 (latest: 0.1.2, Mar 7, 2022)
