ManGO ingest is a lightweight tool that monitors a local directory for file changes and ingests (part of) them into iRODS. There is no need for cron jobs, as it is based on Python watchdog, which starts its own threads for continuous operation.
The main purpose is to be an easy entry point for ingesting files into iRODS, from where a ManGO Flow task may pick them up and handle further processing.
Current state of supported platforms: beta software
Initial development focuses on Linux, but Windows and macOS are also target platforms. It may or may not work for you (yet); please use the issue tracker to report your findings and use cases.
Afterwards, verify that the executable mango_ingest is available in your PATH:
$ mango_ingest --help
Quick checkout
Just check out the repository and copy the script mango_ingest.py to wherever you want to execute it.
Authentication
Authentication is done by creating an iRODSSession from a configuration file: either the one specified by the environment variable IRODS_ENVIRONMENT_FILE or, as a fallback, the current user's ~/.irods/irods_environment.json.
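The lookup described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper names resolve_irods_env_file and open_session are hypothetical, and the session is created with the irods_env_file argument of python-irodsclient's iRODSSession.

```python
import os
from pathlib import Path


def resolve_irods_env_file() -> Path:
    """Return the iRODS environment file to use (hypothetical helper).

    IRODS_ENVIRONMENT_FILE wins if set; otherwise fall back to the
    current user's ~/.irods/irods_environment.json.
    """
    default = Path.home() / ".irods" / "irods_environment.json"
    return Path(os.environ.get("IRODS_ENVIRONMENT_FILE") or default).expanduser()


def open_session():
    """Create an iRODSSession from the resolved environment file."""
    # Imported lazily so the path helper also works without python-irodsclient.
    from irods.session import iRODSSession

    return iRODSSession(irods_env_file=str(resolve_irods_env_file()))
```

Calling open_session() requires python-irodsclient to be installed and a valid environment file to be present.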
Usage
mango_ingest [OPTIONS] [COMMAND [OPTIONS] [ARGS]]
When it detects a newly created file, the file is checked against a whitelist (glob patterns and/or regular expressions); if any of them matches, the file is uploaded to the specified path in iRODS/ManGO.
Ignore glob patterns (--ignore-glob) and ignore regular expressions (--ignore) are evaluated before any --glob and/or --regex patterns.
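The evaluation order can be illustrated with a small sketch. It is not the tool's implementation: the function name should_ingest is hypothetical, and several details are assumptions made for the example (globs are matched against the file name, regular expressions are searched in the full path, and a file is rejected when no whitelist pattern matches).

```python
import fnmatch
import re
from pathlib import Path


def should_ingest(path, globs=(), regexes=(), ignore_globs=(), ignores=()):
    """Decide whether a path passes the filters (illustrative only)."""
    name = Path(path).name
    text = str(path)

    # Ignore rules are evaluated first: any hit rejects the file.
    if any(fnmatch.fnmatch(name, pattern) for pattern in ignore_globs):
        return False
    if any(re.search(pattern, text) for pattern in ignores):
        return False

    # Whitelist: a single matching glob or regex is enough to accept.
    if any(fnmatch.fnmatch(name, pattern) for pattern in globs):
        return True
    return any(re.search(pattern, text) for pattern in regexes)
```

For instance, should_ingest("data/run1.csv", globs=["*.csv"], ignore_globs=["run1*"]) is rejected because the ignore pattern is checked before the whitelist.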
CUSTOM FILTERS
Custom filters can also be specified with --filter-func, provided they can be resolved through a dynamic import. The parameter is a string naming the module and function in the form <module>.<function>. The function takes the pathlib.Path of the file to validate as its first positional parameter, followed by an optional set of keyword arguments. See also the option --filter-func-kwargs, which accepts a dict as a JSON string.
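A custom filter could look like the sketch below. The filter min_size and the helper resolve are hypothetical names. The description above only fixes the call signature (a pathlib.Path followed by keyword arguments), so returning True to accept a file is an assumption. The resolve helper shows one common way to turn a <module>.<function> string into a callable and is not necessarily how the tool does it.

```python
import importlib
from pathlib import Path


def min_size(path: Path, min_bytes: int = 1, **kwargs) -> bool:
    """Hypothetical filter: accept only existing files of at least min_bytes."""
    return path.is_file() and path.stat().st_size >= min_bytes


def resolve(dotted: str):
    """Resolve a '<module>.<function>' string via dynamic import."""
    module_name, _, func_name = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), func_name)
```

If min_size lived in an importable module called my_filters (a made-up name), it could be selected using the option names from the tool's help output: mango_ingest --filter-func my_filters.min_size --filter-func-kwargs '{"min_bytes": 1024}'.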
METADATA
In addition, there are several ways to add metadata on the fly. A few builtin functions cover the more obvious cases, such as metadata included in the path (--metadata-path, or --md-path for short) and file system properties such as the modification time (--metadata-mtime) and symlink information.
You can also add a custom handler in much the same way as a custom filter; see --help and the --metadata-handler option. An example is included in doc/examples/extract_metadata.py, which relies on the exiftool executable and the corresponding Python module.
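To make the idea of path-based metadata concrete, here is a minimal sketch. It assumes that named groups in the regular expression become metadata keys; the source does not confirm this, and the function names and the "mtime" key are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def metadata_from_path(path, pattern):
    """Extract metadata from a path via named regex groups (assumption)."""
    match = re.search(pattern, str(path))
    return match.groupdict() if match else {}


def mtime_metadata(path):
    """Return the modification time as an ISO 8601 string (UTC)."""
    mtime = Path(path).stat().st_mtime
    return {"mtime": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()}
```

With the pattern r"/(?P<project>[^/]+)/(?P<year>\d{4})/", the path /data/projA/2024/scan.tif would yield project=projA and year=2024.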
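A much simpler handler than the exiftool example might look like this sketch. The signature mirrors the one described for custom filters (a pathlib.Path followed by keyword arguments). Returning a dict of metadata key/value pairs is an assumption, and the name basic_file_metadata and the prefix parameter are hypothetical.

```python
from pathlib import Path


def basic_file_metadata(path: Path, prefix: str = "file_", **kwargs) -> dict:
    """Hypothetical handler: derive metadata from the file name only."""
    path = Path(path)
    return {
        f"{prefix}suffix": path.suffix.lstrip("."),
        f"{prefix}stem": path.stem,
    }
```

The included doc/examples/extract_metadata.py remains the reference for what the tool actually expects from a handler.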
ENVIRONMENT VARIABLES
All parameters can also be set via environment variables using their long name, uppercased and prefixed with MANGO_.
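The naming rule can be written down as a one-line helper. The function env_var_name is hypothetical, and the replacement of hyphens by underscores is an assumption (it is the usual convention, but the text above does not state it).

```python
def env_var_name(long_option: str, prefix: str = "MANGO_") -> str:
    """Derive the environment variable name for a long option.

    Assumption: hyphens inside the option name become underscores.
    """
    return prefix + long_option.lstrip("-").replace("-", "_").upper()
```

Under these assumptions, --destination maps to MANGO_DESTINATION and --polling-interval maps to MANGO_POLLING_INTERVAL.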
Besides command line options and environment variables, you can also specify a YAML formatted configuration file through the environment variable MANGO_INGEST_CONFIG. It can hold all or a subset of the command line options. It acts as the default setting for each option; a value specified by a command line option or environment variable takes precedence.
The builtin subcommand generate-config will create such a YAML formatted config file for you.
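The layering of settings can be sketched with plain dictionaries. This is an illustration, not the tool's code: effective_options is a hypothetical name, the option values are made up, and ranking command line values above environment variables is an assumption (the text only says that both override the config file).

```python
from collections import ChainMap


def effective_options(cli: dict, env: dict, config: dict) -> dict:
    """Merge option layers; earlier layers take precedence.

    Values of None are treated as "not set" so lower layers show through.
    """
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, env, config)
    ]
    return dict(ChainMap(*layers))


config = {"destination": "/zone/home/default", "recursive": True, "polling_interval": 5}
env = {"polling_interval": 10}
cli = {"destination": "/zone/home/project", "polling_interval": None}
print(effective_options(cli, env, config))
```

Here the config file supplies recursive, the environment overrides polling_interval, and the command line overrides destination.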
Options:
  -v, --verbose                   Show runtime messages  [default: 0]
  -r, --recursive                 Also watch sub directories
  -p, --path TEXT                 The (local) path to monitor  [default: .]
  -d, --destination TEXT          iRODS destination collection path
  --observer [native|polling]     The observer system to use for getting
                                  changed paths. Defaults to 'polling' which
                                  is recommended for most use cases, but you
                                  can use also 'native' in for linux/mac
                                  filesystems when watching for new files that
                                  are directly written into the directory

                                  polling is a rather brute force algorithm,
                                  needed for network mounted drives and
                                  windows for example  [default: polling]
  --polling-interval INTEGER      Polling interval in seconds in case the
                                  observer is specified as 'polling'
                                  [default: 5]
  --regex TEXT                    regular expression to match [multiple]
  --glob TEXT                     glob expression to match as a simpler
                                  alternative to --regex [multiple]
  --filter-func TEXT              use an external filter (along regex/glob
                                  patterns), it will be dynamically imported
  --filter-func-kwargs TEXT       A json string that will be parsed as a dict
                                  and injected as kwargs into the filter after
                                  the path
  --ignore TEXT                   regular expression to ignore certain
                                  files/folders [multiple]
  --ignore-glob TEXT              glob patterns to ignore files / folders
                                  [multiple]
  --sync                          Do an initial sync
  --verify-checksum               Verify checksums
  --restart PATH                  Use restart file to retry failed uploads
                                  from a previous run
  --dry-run                       Dry run: do not upload anything, implies
                                  --verbose
  -nw, --no-watch                 Do not start monitoring for future changes,
                                  implies --sync
  --metadata-path, --md-path TEXT
                                  regular expression to extract metadata from
                                  the path [multiple]
  --metadata-mtime, --md-mtime    Add the original modify time as metadata
  --metadata-handler, --md-handler TEXT
                                  a custom PYPON_PATH accessible
                                  module.function to handle metadata
  --metadata-handler-kwargs, --md-handler-kwargs TEXT
                                  kwargs parameters for the metadata-handler
                                  as a json string
  --help                          Show this message and exit.

Commands:
  check-regex      Utilty to test a regular expression against a filename...
  clean            Clean up older (default) or all (-a) result files
  examples         Examples
  generate-config  Generate a YAML config template
  show             Show parameter and values as would be used given the...