ML Dataset Governance Policy for Autonomous Vehicle Datasets
TRI-ML/dgp
To ensure the traceability, reproducibility and standardization of all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the Dataset-Governance-Policy (DGP) that codifies the schema and maintenance of all TRI's Autonomous Vehicle (AV) datasets.
- Schema: Protobuf-based schemas for raw data, annotations and dataset management.
- DataLoaders: Universal PyTorch DatasetClass to load all DGP-compliant datasets.
- CLI: Main CLI for handling DGP datasets and the entrypoint of visualization tools.
Please see Getting Started for environment setup.
Getting started is as simple as initializing a dataset class with the relevant dataset JSON, raw data sensor names, annotation types, and split information. Below, we show an example of initializing a PyTorch dataset for multi-modal learning from 2D and 3D bounding boxes.
```python
from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2d and 3d
# bounding box annotations.
dataset = SynchronizedSceneDataset(
    '<dataset_name>_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train'
)
```
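Since the loader is described as a PyTorch dataset class, the object created above should support `len()` and integer indexing like any map-style dataset. The sketch below is a minimal, non-authoritative way to sanity-check a freshly loaded dataset. `summarize` and `inspect_dgp_dataset` are hypothetical helper names, not part of DGP, and nothing is assumed about the structure of each returned sample.

```python
def summarize(dataset, max_samples=2):
    """Return the dataset length and the type names of its first few samples.

    Works with any map-style dataset, i.e. any object that implements
    __len__ and __getitem__ (a plain list is enough for a dry run).
    """
    total = len(dataset)
    sample_types = [
        type(dataset[i]).__name__ for i in range(min(total, max_samples))
    ]
    return total, sample_types


def inspect_dgp_dataset(dataset_json):
    """Hypothetical helper: load a DGP dataset as in the README and summarize it.

    Assumes DGP is installed and that dataset_json points at a real dataset
    JSON. The sensor and annotation names are copied from the README example
    and may not exist in every dataset.
    """
    # Imported lazily so that summarize() can be used without DGP installed.
    from dgp.datasets import SynchronizedSceneDataset

    dataset = SynchronizedSceneDataset(
        dataset_json,
        datum_names=('camera_01', 'lidar'),
        requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
        split='train',
    )
    return summarize(dataset)
```

`summarize` relies only on `__len__` and `__getitem__`, so it can be tried on a plain list before pointing `inspect_dgp_dataset` at a real dataset JSON.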
A list of starter scripts is provided in the examples directory.
- examples/load_dataset.py: Simple example script to load a multi-modal dataset based on the Getting Started section above.
You can build the base Docker image and run the tests within a Docker container via:
```sh
make docker-build
make docker-run-tests
```
We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see Contribution Guidelines.
Job | CI | Notes |
---|---|---|
docker-build | Docker build and push to container registry | |
pre-merge | Pre-merge testing | |
doc-gen | GitHub Pages doc generation | |
coverage | Code coverage metrics and badge generation | |
Type | Platforms |
---|---|
🚨 Bug Reports | GitHub Issue Tracker |
🎁 Feature Requests | GitHub Issue Tracker |
DGP is developed and currently maintained by Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa and Kuan-Hui Lee from the ML-Engineering team at Toyota Research Institute (TRI), with contributions coming from the ML-Research team at TRI, Woven Planet and Parallel Domain.