Hub documentation
Downloading datasets
Integrated libraries
If a dataset on the Hub is tied to a supported library, you can load it in just a few lines. To see how, click the “Use this dataset” button on the dataset page. For example, the samsum dataset page shows how to do so with datasets.
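As a rough illustration of what such a snippet usually looks like for the datasets library, the sketch below wraps `datasets.load_dataset` in a small helper. The helper name `load_hub_dataset` and the `YOUR_REPO_ID` placeholder are assumptions made for this example; the authoritative snippet for any given dataset is the one shown under its “Use this dataset” button.

```python
from typing import Optional


def load_hub_dataset(repo_id: str, split: Optional[str] = None):
    """Load a Hub dataset with the `datasets` library (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require `datasets`.
    from datasets import load_dataset

    # With split=None, load_dataset returns every available split;
    # passing a split name such as "train" returns only that split.
    return load_dataset(repo_id, split=split)


# Example (needs network access and the `datasets` package):
# ds = load_hub_dataset("YOUR_REPO_ID", split="train")
# print(ds)
```

Some datasets also need a configuration name or extra dependencies, so treat this as a starting point rather than a universal recipe.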




Using the Hugging Face Client Library
You can use the huggingface_hub library to create, delete, update and retrieve information from repos. For example, to download the HuggingFaceH4/ultrachat_200k dataset from the command line, run:
hf download HuggingFaceH4/ultrachat_200k --repo-type dataset
See the HF CLI download documentation for more information.
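The same kind of whole-repository download can also be done from Python. The sketch below uses `snapshot_download` from huggingface_hub; the wrapper name `download_dataset_snapshot` and the optional file-pattern filter are illustrative choices for this example, not something the CLI command above requires.

```python
from typing import List, Optional


def download_dataset_snapshot(
    repo_id: str, allow_patterns: Optional[List[str]] = None
) -> str:
    """Download a dataset repo into the local cache and return its folder path."""
    # Lazy import keeps the helper definable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # same role as `--repo-type dataset` on the CLI
        allow_patterns=allow_patterns,  # e.g. ["*.parquet"]; None means all files
    )


# Example (needs network access):
# path = download_dataset_snapshot("HuggingFaceH4/ultrachat_200k")
# print(path)
```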
You can also integrate this into your own library! For example, you can quickly load a CSV dataset with a few lines using Pandas.
from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "YOUR_REPO_ID"
FILENAME = "data.csv"

# hf_hub_download returns the local path of the downloaded file
dataset = pd.read_csv(
    hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="dataset")
)
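Because `hf_hub_download` returns a path to a local file, any reader can consume the result, not only Pandas. The following sketch swaps in the standard-library `csv` module and makes the download function injectable so the helper can be tried offline; the name `load_csv_rows` and the `download` parameter are assumptions introduced for this example.

```python
import csv
from typing import Callable, Dict, List, Optional


def load_csv_rows(
    repo_id: str,
    filename: str,
    download: Optional[Callable[..., str]] = None,
) -> List[Dict[str, str]]:
    """Fetch one CSV file from a dataset repo and parse it into row dicts."""
    if download is None:
        # Default to the real downloader; imported lazily so a stand-in
        # can be used where huggingface_hub or the network is unavailable.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download

    # The downloader returns the path of the file on the local disk.
    local_path = download(repo_id=repo_id, filename=filename, repo_type="dataset")

    with open(local_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example (needs network access):
# rows = load_csv_rows("YOUR_REPO_ID", "data.csv")
```

Unlike `pd.read_csv`, `csv.DictReader` keeps every value as a string, so numeric columns need explicit conversion.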
Using Git
Since all datasets on the Hub are Xet-backed Git repositories, you can clone the datasets locally by installing git-xet and running:
git xet install
git lfs install
git clone git@hf.co:datasets/<dataset ID>
# example: git clone git@hf.co:datasets/allenai/c4
If you have write-access to the particular dataset repo, you’ll also have the ability to commit and push revisions to the dataset.
Add your SSH public key to your user settings to push changes and/or access private repos.
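If you want to script the clone step, a minimal sketch might look like the following. It only assembles and runs the same `git clone` command shown in this section; the function names `clone_command` and `clone_dataset` are hypothetical, and the code assumes git, git-xet and an SSH key are already set up.

```python
import subprocess
from typing import List


def clone_command(dataset_id: str) -> List[str]:
    """Build the `git clone` argument list for a dataset repo over SSH."""
    # URL pattern from this section: git@hf.co:datasets/<dataset ID>
    return ["git", "clone", f"git@hf.co:datasets/{dataset_id}"]


def clone_dataset(dataset_id: str) -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(clone_command(dataset_id), check=True)


# Example (needs git, git-xet, network access and a registered SSH key):
# clone_dataset("allenai/c4")
```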