Dataset viewer

Overview

The dataset viewer automatically converts public datasets smaller than 5GB on the Hub to Parquet files and publishes them. If a dataset is already in Parquet format, it is published as is. Parquet files are column-based, and they shine when you're working with big data.

For private datasets, the feature is provided if the repository is owned by a PRO user or an Enterprise Hub organization.

There are several different libraries you can use to work with the published Parquet files:

ClickHouse, a column-oriented database management system for online analytical processing
cuDF, a Python GPU DataFrame library
DuckDB, a high-performance SQL database for analytical queries
Pandas, a data analysis tool for working with data structures
Polars, a Rust-based DataFrame library
PostgreSQL via pgai, a powerful, open source object-relational database system
mlcroissant, a library for loading datasets from Croissant metadata
pyspark, the Python API for Apache Spark
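
Whichever library you choose, the first step is usually to find the URLs of the published Parquet files. The sketch below is a minimal illustration, not an official client. It assumes the dataset viewer API exposes a `/parquet` endpoint at `https://datasets-server.huggingface.co` and that its JSON response contains a `parquet_files` list whose entries carry `split` and `url` fields; neither detail is stated in this overview. The helper names `parquet_listing_url` and `parquet_urls` are hypothetical, and `ibm/duorc` is only an example dataset name.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: base URL of the dataset viewer API (not stated in this overview).
PARQUET_ENDPOINT = "https://datasets-server.huggingface.co/parquet"


def parquet_listing_url(dataset: str) -> str:
    """Build the URL that lists the published Parquet files of a dataset."""
    # urlencode percent-encodes the "/" in names such as "ibm/duorc".
    return f"{PARQUET_ENDPOINT}?{urlencode({'dataset': dataset})}"


def parquet_urls(payload: dict, split: Optional[str] = None) -> list:
    """Extract file URLs from a listing response, optionally for one split.

    Assumed payload shape: {"parquet_files": [{"split": ..., "url": ...}, ...]}
    """
    return [
        entry["url"]
        for entry in payload.get("parquet_files", [])
        if split is None or entry.get("split") == split
    ]


if __name__ == "__main__":
    # Prints the listing URL; fetching and decoding it is left to the caller.
    print(parquet_listing_url("ibm/duorc"))
```

Once you have a file URL, any of the libraries above can read it; with Pandas, for instance, `pandas.read_parquet(url)` accepts a URL directly. Keeping the URL construction and the response parsing separate from the HTTP request makes the helpers easy to test without network access.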
