Datasets Overview

Datasets on the Hub

The Hugging Face Hub hosts a large number of community-curated datasets for a diverse range of tasks such as translation, automatic speech recognition, and image classification. Alongside the information contained in the dataset card, many datasets, such as GLUE, include a Dataset Viewer to showcase the data.

Each dataset is a Git repository that contains the data required to generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Data files Configuration page. Following the supported repo structure ensures that the dataset page on the Hub has a Viewer.
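
Because each dataset is a Git repository, it can in principle be cloned like any other repository. The following is a minimal sketch, assuming dataset repositories are served under `https://huggingface.co/datasets/<repo_id>` and that `git` is installed locally. The helper names are hypothetical, and `nyu-mll/glue` is used only as an illustrative repository id.

```python
import subprocess

HUB_URL = "https://huggingface.co"  # assumed public Hub endpoint


def dataset_clone_command(repo_id, target_dir=None):
    """Build the `git clone` command for a dataset repository (hypothetical helper)."""
    command = ["git", "clone", f"{HUB_URL}/datasets/{repo_id}"]
    if target_dir is not None:
        # Optional local directory name, passed straight through to git.
        command.append(target_dir)
    return command


def clone_dataset(repo_id, target_dir=None):
    """Run the clone; needs network access and a local git installation."""
    subprocess.run(dataset_clone_command(repo_id, target_dir), check=True)


if __name__ == "__main__":
    # Only prints the command; call clone_dataset(...) to actually clone.
    print(" ".join(dataset_clone_command("nyu-mll/glue")))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect what would be executed before touching the network.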

Search for datasets

Like models and Spaces, you can search the Hub for datasets using the search bar in the top navigation or on the main datasets page. You can filter results by language, task, and license to find a dataset that’s right for you.
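
The same kind of search can also be run programmatically. The sketch below uses only the Python standard library and assumes the Hub exposes a JSON listing at `https://huggingface.co/api/datasets` that accepts `search`, `filter`, and `limit` query parameters, with filters written as tags such as `language:en` or `task_categories:translation`. The endpoint, the tag prefixes, and the helper names are assumptions for illustration, so check them against the current Hub API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_DATASETS_API = "https://huggingface.co/api/datasets"  # assumed endpoint


def build_search_url(search=None, language=None, task=None, license_id=None, limit=10):
    """Build a dataset search URL (hypothetical helper, no network access)."""
    params = []
    if search:
        params.append(("search", search))
    # Filters are assumed to be Hub tags of the form "<kind>:<value>".
    if language:
        params.append(("filter", f"language:{language}"))
    if task:
        params.append(("filter", f"task_categories:{task}"))
    if license_id:
        params.append(("filter", f"license:{license_id}"))
    params.append(("limit", str(limit)))
    return f"{HUB_DATASETS_API}?{urlencode(params)}"


def search_datasets(**kwargs):
    """Fetch matching dataset ids; requires network access."""
    with urlopen(build_search_url(**kwargs)) as response:
        results = json.load(response)
    # Each entry is assumed to carry the repository id under "id".
    return [entry["id"] for entry in results]
```

For example, `search_datasets(search="glue", language="en", limit=5)` would request up to five English-language datasets matching "glue". In practice the `huggingface_hub` Python client is the more usual route for this kind of query.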

Privacy

Since datasets are repositories, you can toggle their visibility between private and public through the Settings tab. If a dataset is owned by an organization, the privacy settings apply to all the members of the organization.
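
Visibility can also be changed outside the web interface. The sketch below builds the HTTP request with the standard library, assuming the Hub accepts a `PUT` to `/api/datasets/<repo_id>/settings` with a JSON body of the form `{"private": true}` and a bearer token that has write access. The endpoint path, the repository id `username/my_dataset`, the token value, and the helper names are all assumptions for illustration.

```python
import json
from urllib.request import Request, urlopen

HUB_URL = "https://huggingface.co"  # assumed public Hub endpoint


def build_visibility_request(repo_id, private, token):
    """Build (but do not send) a request that sets a dataset's visibility."""
    return Request(
        url=f"{HUB_URL}/api/datasets/{repo_id}/settings",  # assumed path
        data=json.dumps({"private": private}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def set_dataset_visibility(repo_id, private, token):
    """Send the request; requires network access and a valid write token."""
    with urlopen(build_visibility_request(repo_id, private, token)) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; nothing is sent here.
    request = build_visibility_request("username/my_dataset", True, "hf_xxx")
    print(request.get_method(), request.full_url)
```

The Settings tab remains the simplest way to make a one-off change; a scripted request like this is mainly useful when many repositories need the same setting.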
