Working repo to support the Alliance's Open Trusted Data Initiative

The-AI-Alliance/open-trusted-data-initiative

Welcome to the AI Alliance Open Trusted Data Initiative (OTDI).

Vision

OTDI is building a high-quality, trusted, and open catalog of datasets for AI LLM pre-training, fine-tuning, and domain-specific applications. These datasets are amenable to a wide variety of use cases in enterprises, governments, regulated industries, and wherever high trust in the data foundations of AI is essential.

The initiative consists of several projects:

  • Define Openness Criteria: What has to be true about a dataset for it to be considered truly open for use? This project defines those criteria. See the Dataset Specification page for our evolving thinking on the minimally sufficient criteria.
  • Find Diverse Datasets: We seek a very broad range of datasets, including: text (especially under-served languages), multimedia (audio, video, images), time series (targeting any domain or application), science (molecular discovery, drug discovery, geospatial, physics, etc.), specific domains and use cases (industry-specific and use case-specific data), and synthetic data (datasets for all of the above can be synthetic or "real").
  • Data Pipelines: Data pipelines implemented using tools like DPK are used both to validate datasets proposed for inclusion in our catalog and, eventually, to derive new datasets specialized for particular purposes. See the How We Process Datasets page for more information.
  • Open Dataset Catalog: A catalog of datasets from many sources that meet our criteria for openness. See the Dataset Catalog page for more information.

Each of these projects welcomes enthusiastic participants! Please join us!

Using This Repo

This repo will contain the "code" for the OTDI website, as well as the code that implements the projects for OTDI.

About the GitHub Pages Website Published from this Repo

The website is published using GitHub Pages, where the pages are written in Markdown/HTML and served using Jekyll. We use the Just the Docs Jekyll theme.

See GITHUB_PAGES.md for more information, especially for instructions on previewing changes locally using jekyll.

See the static-catalog/README.md for details about building the current "static" catalog.

Note

All documentation is licensed under Creative Commons Attribution 4.0 International. See LICENSE.CC-BY-4.0.

Other Documentation and Code

This repo will also host the code for the projects that are part of OTDI, listed above. Eventually, as these projects grow, we may move them out to separate repos.

Miscellaneous other documentation, not in the website, is also captured here:

Getting Involved

We welcome contributions as PRs. Please see our Alliance community repo for general information about contributing to any of our projects. This section provides some specific details you need to know.

In particular, see the AI Alliance CONTRIBUTING instructions. You will need to agree to the AI Alliance Code of Conduct.

All code contributions are licensed under the Apache 2.0 LICENSE (which is also in this repo, LICENSE.Apache-2.0).

All documentation contributions are licensed under the Creative Commons Attribution 4.0 International (which is also in this repo, LICENSE.CC-BY-4.0).

All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (which is also in this repo, LICENSE.CDLA-2.0).

We use the "Developer Certificate of Origin" (DCO).

Warning

Before you make any git commits with changes, understand what's required for DCO.

See the Alliance contributing guide section on DCO for details. In practical terms, supporting this requirement means you must use the -s flag with your git commit commands.
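As an illustration of what the -s flag does: git appends a trailer line of the form "Signed-off-by: Name <email>" to the commit message. The following is a minimal sketch, not part of this repo, of how you might check a message for that trailer before pushing. The function name has_sign_off and the sample messages are hypothetical, and the pattern is a simplification of what DCO tooling actually verifies.

```python
import re

# Simplified pattern for the trailer that `git commit -s` appends:
#   Signed-off-by: Name <email>
# Assumption: real DCO checks may be stricter (e.g. matching the commit author).
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the message contains a Signed-off-by trailer line."""
    return SIGN_OFF.search(commit_message) is not None


if __name__ == "__main__":
    # Hypothetical commit messages, for illustration only.
    signed = "Fix typo in README\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    unsigned = "Fix typo in README\n"
    print(has_sign_off(signed))
    print(has_sign_off(unsigned))
```

If a commit is missing the trailer, git commit --amend -s adds it to the most recent commit.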
