This project provides tools for packaging and uploading datasets along with their metadata, specifically tailored for compatibility with the Frictionless Data specifications, Open Energy Metadata (OEM) standards and the Open Energy Platform (OEP).
Features
"OEM Data Package": Gathers and packages datasets and metadata into a customized Frictionless Data Package, ready for sharing or uploading to OEP. The package is fully compatible with the Frictionless Data specification and framework and also conforms to the OEM/OEP specifications.
Geodata Support: Includes custom handling for GeoPackage files, extracting and validating CRS (Coordinate Reference System), geometry types, and bounding boxes.
Data Upload: Manages data upload to OEP-DB, automatically creating the necessary tables and uploading the data in batches of rows (the batch size can be adjusted).
Update metadata on OEP: Supports updating metadata of specific tables on OEP based on the contents of the respective OEM files in the provided data package.
OEM Validation Reports: Generates detailed reports on metadata validation, identifying any discrepancies or non-compliance issues with OEM standards.
CLI Support: Offers a command-line interface for easy execution of the data packaging and uploading process.
Contents
oem_datapackage.py: Defines the OemDataPackage class, which packages datasets and metadata into a custom "OEM Data Package".
oep_uploadhandler.py: Implements the OepUploadHandler class for uploading the datasets and metadata of an OEM Data Package to the OEP.
cli.py: Provides a Command Line Interface (CLI) to facilitate the use of OemDataPackage and OepUploadHandler functionalities.
utils.py: Contains utility functions that support data processing tasks across the project.
requirements.txt: Lists all the necessary Python packages required to run this project.
setup.py: Contains setup configurations for packaging this project.
Installation
Ensure you have Python 3.8 or newer installed. Clone or download the project repository, navigate to the project directory, and install the required dependencies:
pip install -r requirements.txt
OEM Data Package
The OemDataPackage class is designed to streamline the creation, validation, and packaging of datasets along with their respective OEM files, adhering to the Frictionless Data Package standard and incorporating the standards of OEM and OEP. It facilitates the organization of datasets for easy sharing, publication, and further processing, and in particular improves integration with the OEP.
Class Features
Automated Packaging: Packages datasets and metadata for OEP and OEM compliance.
Metadata Validation: Ensures metadata meets OEM standards for publication.
Customization: Enables detailed package naming, description, and versioning.
Geospatial Features: Validates geospatial data, including CRS and geometry.
Validation Reports: Provides reports on metadata compliance with OEM.
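As a rough illustration of what a metadata validation report might contain, the sketch below checks a metadata dictionary for a handful of keys and collects the problems it finds. The key list REQUIRED_KEYS and the function build_report are hypothetical and are not taken from this project or from the OEM schema; the project's own validation may work quite differently.

```python
import json

# Hypothetical subset of keys, chosen for illustration only;
# the real OEM schema defines its own required fields.
REQUIRED_KEYS = ("name", "title", "description", "licenses")


def build_report(metadata: dict) -> dict:
    """Return a small report listing missing or empty keys."""
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    empty = [key for key in REQUIRED_KEYS if key in metadata and not metadata[key]]
    return {"valid": not missing and not empty, "missing": missing, "empty": empty}


if __name__ == "__main__":
    example = {"name": "example_table", "title": "", "licenses": [{"name": "CC0-1.0"}]}
    print(json.dumps(build_report(example), indent=2))
```

Separating "missing" from "empty" keys keeps the report actionable: the first points to a structural gap in the file, the second to a field that still needs a value.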
Usage
Initialize: Specify the input directory containing datasets and metadata, the output directory for the data package, package name, description, version, and whether to enable OEM integration.
from oem_dpkg import OemDataPackage

package = OemDataPackage(
    input_path="/path/to/datasets",
    output_path="/path/to/output",
    name="Example Data Package",
    description="A comprehensive data package for energy research.",
    version="1.0",
    oem=True,
)
Create the Data Package: Call the create() method to automatically package the datasets, perform metadata validation, and prepare the data package.
package.create()
This process copies the datasets and metadata to the specified output directory, validates the metadata against OEM standards, and generates a datapackage.json file that describes the entire data package.
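To make the shape of the resulting descriptor more concrete, here is a minimal sketch that scans an input directory and writes a bare-bones datapackage.json using only the standard library. It is not the implementation used by create(); the function build_descriptor, the restriction to CSV files, and the chosen descriptor fields are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def build_descriptor(input_dir: Path, name: str, version: str) -> dict:
    """Collect one resource entry per CSV file in subdirectories that hold a metadata.json."""
    resources = []
    for subdir in sorted(p for p in input_dir.iterdir() if p.is_dir()):
        if not (subdir / "metadata.json").is_file():
            continue  # skip folders without a metadata file
        for data_file in sorted(subdir.glob("*.csv")):
            resources.append({
                "name": data_file.stem.lower(),
                "path": data_file.relative_to(input_dir).as_posix(),
                "format": "csv",
            })
    return {"name": name, "version": version, "resources": resources}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "dataset1").mkdir()
        (root / "dataset1" / "metadata.json").write_text("{}")
        (root / "dataset1" / "dataset1.csv").write_text("id,value\n1,2\n")
        descriptor = build_descriptor(root, "example-data-package", "1.0")
        (root / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
        print(descriptor["resources"])
```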
Considerations
Ensure the input directory is well-organized, with each dataset and its corresponding metadata placed in a separate subdirectory (see example structure).
The metadata files should be named metadata.json and formatted according to the OEM standards for seamless validation.
The naming convention for the data package should adhere to Frictionless Data Package specifications. The class automatically adjusts names to fit these specifications, if necessary.
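The Frictionless Data Package specification recommends that a package name be lower-case and contain only alphanumeric characters plus ".", "_" and "-". How this project adjusts names is not shown here; the following is a minimal sketch of one possible normalization, with normalize_package_name being a hypothetical helper rather than part of the class.

```python
import re


def normalize_package_name(raw: str) -> str:
    """Lower-case a name and keep only alphanumerics, '.', '_' and '-'."""
    name = raw.strip().lower()
    name = re.sub(r"\s+", "-", name)          # whitespace runs become one hyphen
    name = re.sub(r"[^a-z0-9._-]", "", name)  # drop every other character
    return name


if __name__ == "__main__":
    print(normalize_package_name("Example Data Package"))  # example-data-package
```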
OepUploadHandler
The OepUploadHandler class is designed to improve the workflow of uploading data to the Open Energy Platform (OEP). It facilitates the preparation and uploading of datasets and their metadata, ensuring compliance with the Open Energy Metadata (OEM) standards. This guide explains how to use the class effectively and highlights important considerations for successful data uploads.
Class Features
Batch Uploads: Enables efficient dataset uploads to the Open Energy Platform (OEP) in batches.
Selective Processing: Offers flexibility in processing specific datasets or all datasets within a package.
Metadata Updates: Automatically updates metadata on OEP to keep dataset information current.
Table Management: Offers options to create the necessary tables or to manage existing tables on OEP, including overwriting them after user confirmation.
Prerequisites
Before using OepUploadHandler, ensure you have:
A valid API token for the Open Energy Platform.
Properly packaged the datasets and metadata you intend to upload, using the OemDataPackage class.
Usage
Initialize: To begin, instantiate the OepUploadHandler class with the path to your data package, your OEP API token, and other relevant information:
from oem_dpkg import OepUploadHandler

upload_handler = OepUploadHandler(
    datapackage_path="path/to/your/datapackage.json",
    api_token="your_oep_api_token",
    oep_username="your_oep_username",
    oep_schema="model_draft",  # Optional, defaults to "model_draft"
    dataset_selection=["dataset1", "dataset2"],  # Optional, specify datasets to upload
)
Extract Dataset Resources: The extract_dataset_resources method filters the datasets you wish to upload based on your dataset_selection. If no selection is provided, all datasets within the data package are processed:
upload_handler.extract_dataset_resources()
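The selection logic described above can be pictured with a small standalone sketch. This is not the code behind extract_dataset_resources; select_resources is a hypothetical function, and raising an error for unknown dataset names is an assumption made here so that typos do not go unnoticed.

```python
from typing import Optional


def select_resources(resources: list, selection: Optional[list] = None) -> list:
    """Return the resources named in `selection`, or all of them if none is given."""
    if not selection:
        return list(resources)
    wanted = set(selection)
    unknown = wanted - {res["name"] for res in resources}
    if unknown:
        raise ValueError(f"Unknown datasets requested: {sorted(unknown)}")
    return [res for res in resources if res["name"] in wanted]


if __name__ == "__main__":
    resources = [{"name": "dataset1"}, {"name": "dataset2"}, {"name": "dataset3"}]
    print(select_resources(resources, ["dataset1", "dataset2"]))
```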
Set up OEP Database Connection: Establish a connection to the OEP Database API using setup_db_connection:
upload_handler.setup_db_connection()
This step is crucial for enabling dataset uploads and table creation on the OEP.
Upload Datasets: Use the upload_datasets method to upload the datasets to the OEP. This method handles data preparation, batch uploading, and metadata updating:
upload_handler.upload_datasets()
During the upload process, a progress bar will display the upload status for each dataset.
Update Metadata:
If you need to update the metadata for a dataset already on the OEP, use the update_oep_metadata method. Provide the path to the OEM file and the table name:
With run_all(), you execute the complete upload process in one call (initializing the handler, extracting resources, setting up a database connection, creating the necessary tables, and uploading metadata and data).
Data and Metadata Compatibility: Ensure your datasets and metadata files comply with the OEM standards and the specific requirements of the OEP database schema you're targeting.
Batch Size: The default batch size is set to 1000 rows. Depending on your dataset size and network conditions, you may adjust this value in the upload_data_to_table method call within upload_datasets.
Error Handling: The class includes basic error handling for database connections, API requests, and batch uploads. Monitor the console output for error messages to troubleshoot issues.
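To illustrate what uploading "in batches of rows" means in practice, the sketch below splits an iterable of rows into chunks of at most 1000 entries. It is a generic pattern, not the code inside upload_data_to_table; iter_batches is a hypothetical helper and the actual upload call is only indicated by a comment.

```python
from typing import Iterable, Iterator, List


def iter_batches(rows: Iterable[dict], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


if __name__ == "__main__":
    rows = ({"id": i} for i in range(2500))
    for number, batch in enumerate(iter_batches(rows), start=1):
        # A real upload would send `batch` to the OEP at this point.
        print(f"batch {number}: {len(batch)} rows")
```

A smaller batch size means more requests but less data lost or repeated when a single request fails, which is one reason to lower it on unstable connections.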
CLI
The provided CLI tool (cli.py) offers an accessible way to use the functionalities of this project from the command line, streamlining the process of data package creation and uploading to OEP.
Creating a Data Package
To create a data package from your datasets and metadata, run:
Used for converting Open Energy Metadata (OEM) into database schemas and tables, facilitating the automated creation of database structures based on the standardized metadata definitions.
Essential for validating the metadata of datasets being packaged. This ensures that all data packages comply with the latest OEM standards, promoting discoverability and usability.
Enables querying and manipulating OEP data using SQLAlchemy ORM, thus abstracting database specifics and allowing for seamless integration of OEP data into the project's workflows.