Working repo to support the Alliance's Open Trusted Data Initiative
The-AI-Alliance/open-trusted-data-initiative
Welcome to the AI Alliance Open Trusted Data Initiative (OTDI).
OTDI is building a high-quality, trusted, and open catalog of datasets for AI LLM pre-training, fine-tuning, and domain-specific applications. These datasets are amenable to a wide variety of use cases in enterprises, governments, regulated industries, and wherever high trust in the data foundations of AI is essential.
The initiative consists of several projects:
- Define Openness Criteria: What has to be true about a dataset in order for it to be considered truly open for use? This project defines those criteria. See the Dataset Specification page for our evolving thinking on the minimally-sufficient criteria.
- Find Diverse Datasets: We seek a very broad range of datasets, including: text (especially under-served languages), multimedia (audio, video, images), time series (targeting any domain or application), science (molecular discovery, drug discovery, geospatial, physics, etc.), specific domains and use cases (industry-specific and use case-specific data), synthetic (datasets for all of the above can be synthetic or "real").
- Data Pipelines: Data pipelines implemented using tools like DPK are used both to validate datasets proposed for inclusion in our catalog and, eventually, to derive new datasets specialized for particular purposes. See the How We Process Datasets page for more information.
- Open Dataset Catalog: A catalog of datasets from many sources that meet our criteria for openness. See the Dataset Catalog page for more information.
Each of these projects welcomes enthusiastic participants! Please join us!
This repo will contain the "code" for the OTDI website, as well as the code that implements the projects for OTDI.
The website is published using GitHub Pages, where the pages are written in Markdown/HTML and served using Jekyll. We use the Just the Docs Jekyll theme.
See GITHUB_PAGES.md for more information, especially for instructions on previewing changes locally using `jekyll`.
See the static-catalog/README.md for details about building the current "static" catalog.
Note
All documentation is licensed under Creative Commons Attribution 4.0 International. See LICENSE.CC-BY-4.0.
This repo will also host the code for the projects that are part of OTDI, listed above. Eventually, as these projects grow, we may move them out to separate repos.
Miscellaneous other documentation, not in the website, is also captured here:
- tools-notes: Notes on potential tool choices.
- data-processing-notes: Notes on requirements and data-specific tool choices.
We welcome contributions as PRs. Please see our Alliance community repo for general information about contributing to any of our projects. This section provides some specific details you need to know.
In particular, see the AI Alliance CONTRIBUTING instructions. You will need to agree to the AI Alliance Code of Conduct.
All code contributions are licensed under the Apache 2.0 LICENSE (which is also in this repo, LICENSE.Apache-2.0).
All documentation contributions are licensed under the Creative Commons Attribution 4.0 International (which is also in this repo, LICENSE.CC-BY-4.0).
All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (which is also in this repo, LICENSE.CDLA-2.0).
Warning
Before you make any git commits with changes, understand what's required for the Developer Certificate of Origin (DCO).
See the Alliance contributing guide section on DCO for details. In practical terms, supporting this requirement means you must use the `-s` flag with your `git commit` commands.
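As a minimal sketch of what a signed-off commit looks like, the commands below create a throwaway repository and commit one file with `-s`. The directory, file name, commit message, and identity are placeholders chosen for illustration, not anything this repo requires. In your own clone you would only need the `git commit -s` step, with your real name and email already configured.

```shell
# Sketch only: work in a temporary repository so nothing real is touched.
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# Placeholder identity; the sign-off trailer is built from these two values.
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

echo "hello" > notes.txt
git add notes.txt

# -s (long form: --signoff) appends a "Signed-off-by:" trailer to the message.
git commit -q -s -m "Add notes file"

# Print the full commit message, including the trailer.
git log -1 --format=%B
```

If you forget the flag on your most recent commit, `git commit --amend -s --no-edit` adds the trailer without changing the message.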