
Dataset viewer documentation


Overview

The dataset viewer automatically converts public datasets smaller than 5 GB on the Hub to Parquet and publishes the resulting files. If a dataset is already in Parquet format, its files are published as is. Parquet is a column-oriented format, so a query can read only the columns it needs, which makes it well suited to analytical workloads over large datasets.

For private datasets, the feature is available if the repository is owned by a PRO user or an Enterprise Hub organization.
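The published files can also be listed programmatically. The following is a minimal sketch using only the Python standard library. It assumes the dataset viewer's `/parquet` endpoint at `https://datasets-server.huggingface.co/parquet`, which returns JSON containing a `parquet_files` array whose entries carry `config`, `split` and `url` fields. For private datasets, it assumes a Hugging Face user access token is sent as a Bearer token. The helper names (`build_request`, `fetch_parquet_listing`, `urls_by_split`) and the sample payload are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed dataset viewer endpoint that lists a dataset's published Parquet files.
PARQUET_ENDPOINT = "https://datasets-server.huggingface.co/parquet"


def build_request(dataset: str, token: str | None = None) -> urllib.request.Request:
    """Build a GET request for the Parquet listing of `dataset` (hypothetical helper).

    `token` (a Hugging Face user access token) is only needed for private
    datasets; it is sent in an `Authorization: Bearer ...` header.
    """
    query = urllib.parse.urlencode({"dataset": dataset})
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(f"{PARQUET_ENDPOINT}?{query}", headers=headers)


def fetch_parquet_listing(dataset: str, token: str | None = None) -> dict:
    """Perform the network call and decode the JSON response."""
    with urllib.request.urlopen(build_request(dataset, token)) as response:
        return json.load(response)


def urls_by_split(listing: dict) -> dict[tuple[str, str], list[str]]:
    """Group Parquet file URLs by (config, split), keeping the listed order."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for entry in listing.get("parquet_files", []):
        grouped[(entry["config"], entry["split"])].append(entry["url"])
    return dict(grouped)


# Hypothetical, hand-written payload shaped like the assumed response format.
SAMPLE_LISTING = {
    "parquet_files": [
        {"config": "default", "split": "train", "url": "https://example.com/default/train/0000.parquet"},
        {"config": "default", "split": "train", "url": "https://example.com/default/train/0001.parquet"},
        {"config": "default", "split": "test", "url": "https://example.com/default/test/0000.parquet"},
    ]
}
```

For example, `urls_by_split(fetch_parquet_listing("user/demo"))` (with a hypothetical dataset name) would map each config/split pair to the URLs of its Parquet files. Those URLs can then be passed to any of the libraries listed below.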

Several libraries can work with the published Parquet files:

  • ClickHouse, a column-oriented database management system for online analytical processing
  • cuDF, a Python GPU DataFrame library
  • DuckDB, a high-performance SQL database for analytical queries
  • Pandas, a Python data analysis library built around DataFrame structures
  • Polars, a Rust-based DataFrame library
  • PostgreSQL via pgai, a powerful, open-source object-relational database system
  • mlcroissant, a library for loading datasets from Croissant metadata
  • pyspark, the Python API for Apache Spark