damast: A Python library to facilitate the creation of reproducible data processing pipelines and usage of FAIR data
The main purpose of this library is to facilitate the reusability of data and data processing pipelines. For this, damast introduces a means to associate metadata with data frames and enables consistency checking.
To ensure semantic consistency, transformation steps in a pipeline can be annotated with allowed data ranges for inputs and outputs, as well as units.
class LatLonTransformer(PipelineElement):
    """
    The LatLonTransformer will consume a lat(itude) and a lon(gitude) column and perform
    cyclic normalization. It will add four columns to a dataframe, namely lat_x, lat_y, lon_x, lon_y.
    """
    @damast.core.describe("Lat/Lon cyclic transformation")
    @damast.core.input({
        "lat": {"unit": units.deg},
        "lon": {"unit": units.deg}
    })
    @damast.core.output({
        "lat_x": {"value_range": MinMax(-1.0, 1.0)},
        "lat_y": {"value_range": MinMax(-1.0, 1.0)},
        "lon_x": {"value_range": MinMax(-1.0, 1.0)},
        "lon_y": {"value_range": MinMax(-1.0, 1.0)}
    })
    def transform(self, df: AnnotatedDataFrame) -> AnnotatedDataFrame:
        lat_cyclic_transformer = CycleTransformer(features=["lat"], n=180.0)
        lon_cyclic_transformer = CycleTransformer(features=["lon"], n=360.0)

        _df = lat_cyclic_transformer.fit_transform(df=df)
        _df = lon_cyclic_transformer.fit_transform(df=_df)
        df._dataframe = _df
        return df
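To make the idea behind the annotated output ranges concrete, here is a minimal standard-library sketch of a cyclic encoding followed by a range check. It does not use damast and is not its implementation: the helpers `cyclic_encode` and `check_range` are hypothetical, and mapping the cosine to `_x` and the sine to `_y` is an assumption. Only the periods (180.0 for latitude, 360.0 for longitude) and the expected range of -1.0 to 1.0 are taken from the example above.

```python
import math


def cyclic_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle.

    Assumption: x is the cosine and y is the sine component; the
    convention used by damast itself may differ.
    """
    angle = 2.0 * math.pi * value / period
    return math.cos(angle), math.sin(angle)


def check_range(values, lower: float = -1.0, upper: float = 1.0) -> None:
    """Raise ValueError if any value lies outside [lower, upper]."""
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} is outside the range [{lower}, {upper}]")


# Hypothetical sample coordinates in degrees (latitude, longitude).
samples = [(59.9, 10.7), (-33.9, 151.2), (0.0, -180.0)]

for lat, lon in samples:
    lat_x, lat_y = cyclic_encode(lat, period=180.0)
    lon_x, lon_y = cyclic_encode(lon, period=360.0)
    # Every encoded component must respect the declared output range.
    check_range([lat_x, lat_y, lon_x, lon_y])
```

Because the encoding is periodic, longitudes of -180 and 180 degrees end up at (numerically almost) the same point, which is the property a cyclic normalization is meant to provide.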
For detailed examples, check the documentation at: https://simula.github.io/damast
First, create an isolated development environment for Python, either conda- or venv-based. The following walks through a venv-based setup.
Let us assume you operate with a 'workspace' directory for this project:
cd workspace
Here, you will create a virtual environment. To get an overview of the venv command:
python -m venv --help
Create your venv and activate it:
python -m venv damast-venv
source damast-venv/bin/activate
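If you are unsure whether the activation worked, a small check like the following can help. It is only a sketch: the helper name `in_virtualenv` is made up for illustration, and the check relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

After activating the environment, the printed prefix should point into the `damast-venv` directory created above.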
Clone the repo and install:
git clone https://github.com/simula/damast
cd damast
pip install -e ".[test,dev]"
or alternatively:
pip install "damast[test,dev]"
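To confirm that the installation succeeded, you can query the installed distribution metadata. The sketch below uses only the standard library; the helper `installed_version` is a hypothetical name, and the distribution name `damast` is assumed to match the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("damast")
    if found is None:
        print("damast is not installed in this environment")
    else:
        print(f"damast {found} is installed")
```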
If you prefer to work or start with a Docker container, you can build it using the provided Dockerfile:
docker build -t damast:latest -f Dockerfile .
To enter the container:
docker run -it --rm damast:latest /bin/bash
The easiest way to get the usage documentation is to check the published documentation.
Alternatively, you can generate the latest documentation locally once you have installed the package:
tox -e build_docs
Then open the documentation with a browser:
<yourbrowser> _build/html/index.html
Install the project and use the predefined default test environment:
tox -e py
This project is open to contributions. For details on how to contribute, please check the Contribution Guidelines.
This project is licensed under the BSD-3-Clause License.
Copyright (c) 2023-2025 Simula Research Laboratory, Oslo, Norway
This work has been derived from work that is part of the T-SAR project. Some derived work is mainly part of the specific data processing for the 'maritime' domain.
The development of this library is part of the EU project AI4COPSEC, which receives funding from the Horizon Europe framework programme under Grant Agreement N. 101190021.