Datasets on the Hub
The Hugging Face Hub hosts a large number of community-curated datasets for a diverse range of tasks such as translation, automatic speech recognition, and image classification. Alongside the information contained in the dataset card, many datasets, such as GLUE, include a Dataset Viewer to showcase the data.
Each dataset is a Git repository that contains the data required to generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Data files Configuration page. Following the supported repo structure ensures that the dataset page on the Hub has a Viewer.
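Because a dataset is a Git repository, it can be cloned like any other repository. The following is a minimal sketch, assuming the repository is served at https://huggingface.co/datasets/<repo_id> and that git is installed locally. The helper names clone_command and clone_dataset are hypothetical, and the repository ID nyu-mll/glue in the usage note is an assumed example rather than something stated on this page.

```python
import subprocess

HUB_URL = "https://huggingface.co"  # assumed public Hub endpoint


def clone_command(repo_id: str) -> list[str]:
    """Build the `git clone` command for a dataset repository (hypothetical helper)."""
    return ["git", "clone", f"{HUB_URL}/datasets/{repo_id}"]


def clone_dataset(repo_id: str) -> None:
    """Clone the dataset repository into the current directory.

    Requires git on the PATH and network access; raises CalledProcessError on failure.
    """
    subprocess.run(clone_command(repo_id), check=True)
```

Calling clone_dataset("nyu-mll/glue") would run the command built by clone_command. Building the command separately keeps it easy to inspect or log before anything touches the network.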
Search for datasets
Like models and spaces, you can search the Hub for datasets using the search bar in the top navigation or on the main datasets page. You can filter the results by language, task, and license to find a dataset that’s right for you.
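The same kind of search can also be done programmatically. The sketch below uses only the Python standard library and assumes the Hub's public HTTP endpoint /api/datasets with its search, filter, and limit query parameters. The function names and the tag strings (for example language:en) are illustrative assumptions, so check them against the Hub API documentation before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "https://huggingface.co/api/datasets"  # assumed Hub API endpoint


def build_search_url(search: str, tags: list[str], limit: int = 10) -> str:
    """Build a dataset search URL; each tag becomes one `filter` parameter."""
    params = [("search", search)]
    params += [("filter", tag) for tag in tags]
    params.append(("limit", str(limit)))
    return f"{API_URL}?{urlencode(params)}"


def fetch_dataset_ids(url: str) -> list[str]:
    """Fetch the search results and return the dataset IDs (needs network access)."""
    with urllib.request.urlopen(url) as response:
        results = json.load(response)
    # Assumes the endpoint returns a JSON list of objects with an "id" field.
    return [item["id"] for item in results]
```

For example, fetch_dataset_ids(build_search_url("glue", ["language:en"], limit=5)) would request up to five matching datasets. The huggingface_hub client library offers a higher-level interface for the same search, if installing a dependency is acceptable.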


Privacy
Since datasets are repositories, you can toggle their visibility between private and public through the Settings tab. If a dataset is owned by an organization, the privacy settings apply to all the members of the organization.
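Visibility can likely be changed outside the web interface as well. The sketch below is an assumption-heavy illustration, not the documented method of this page: it assumes the Hub exposes a settings endpoint at /api/datasets/<repo_id>/settings that accepts a PUT request with a JSON body containing a private flag, authenticated with a user access token. The function names, the repository ID, and the token value are hypothetical.

```python
import json
import urllib.request

HUB_URL = "https://huggingface.co"  # assumed public Hub endpoint


def build_visibility_request(repo_id: str, private: bool, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that sets a dataset's visibility.

    The endpoint path and payload shape are assumptions; verify them first.
    """
    return urllib.request.Request(
        url=f"{HUB_URL}/api/datasets/{repo_id}/settings",
        data=json.dumps({"private": private}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def set_visibility(repo_id: str, private: bool, token: str) -> int:
    """Send the request and return the HTTP status code (needs network access)."""
    request = build_visibility_request(repo_id, private, token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling set_visibility("my-org/my-dataset", True, token) would attempt to make that hypothetical repository private. For everyday use, the Settings tab remains the simplest route.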