Interact with the Hub through the Filesystem API
In addition to the HfApi, the huggingface_hub library provides HfFileSystem, a Pythonic fsspec-compatible file interface to the Hugging Face Hub. The HfFileSystem builds on top of the HfApi and offers typical filesystem-style operations like cp, mv, ls, du, glob, get_file, and put_file.
[!WARNING] [HfFileSystem](/docs/huggingface_hub/v1.4.0/en/package_reference/hf_file_system#huggingface_hub.HfFileSystem) provides fsspec compatibility, which is useful for libraries that require it (e.g., reading Hugging Face datasets directly with pandas). However, this compatibility layer introduces additional overhead. For better performance and reliability, use HfApi methods when possible.
Usage
>>> from huggingface_hub import hffs

>>> # List all files in a directory
>>> hffs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> hffs.glob("datasets/my-username/my-dataset-repo/**/*.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read a remote file
>>> with hffs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     train_data = f.readlines()

>>> # Read the content of a remote file as a string
>>> train_data = hffs.read_text("datasets/my-username/my-dataset-repo/data/train.csv", revision="dev")

>>> # Write a remote file (each row ends with a newline so the rows do not run together)
>>> with hffs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
...     f.write("text,label\n")
...     f.write("Fantastic movie!,good\n")
The optional revision argument can be passed to run an operation against a specific revision, such as a branch, a tag name, or a commit hash.
Unlike Python’s built-in open, fsspec’s open defaults to binary mode, "rb". This means you must explicitly set the mode to "r" for reading and "w" for writing in text mode. Appending to a file (modes "a" and "ab") is not supported yet.
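Because append modes are not supported, one possible workaround is to read the existing content and rewrite the file in full. The sketch below is only an illustration: append_text is a hypothetical helper, not part of huggingface_hub, and it assumes fs is an fsspec-style filesystem (such as an HfFileSystem instance) exposing exists, read_text, and open.

```python
def append_text(fs, path, text):
    """Emulate text-mode appending on a filesystem without "a" mode.

    Hypothetical helper: `fs` is assumed to be an fsspec-style filesystem
    (for example an HfFileSystem instance). The file is read in full and
    then rewritten, so this is neither atomic nor cheap for large files.
    """
    # Start from the current content, or from an empty string for a new file.
    existing = fs.read_text(path) if fs.exists(path) else ""
    # Text mode must be requested explicitly, since the default is "rb".
    with fs.open(path, "w") as f:
        f.write(existing + text)
```

Each call rewrites the whole file, so for frequent or large updates it is likely better to build the full content locally and write it once.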
Integrations
The HfFileSystem can be used with any library that integrates fsspec, provided the URL follows the scheme:
hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>

The repo_type_prefix is datasets/ for datasets and spaces/ for Spaces; models don’t need a prefix in the URL.
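As a minimal sketch of the scheme above, the following helper assembles an hf:// URL from its parts. build_hf_url and REPO_TYPE_PREFIXES are hypothetical names introduced for illustration, not part of huggingface_hub, and the helper performs no escaping, so it assumes a simple revision name such as a branch or tag without special characters.

```python
# Hypothetical helper (not part of huggingface_hub). It follows the scheme
# hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>.
REPO_TYPE_PREFIXES = {
    "model": "",            # models need no prefix
    "dataset": "datasets/",
    "space": "spaces/",
}


def build_hf_url(repo_id, path_in_repo, repo_type="model", revision=None):
    """Return an hf:// URL for a file or directory in a Hub repository."""
    if repo_type not in REPO_TYPE_PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    url = "hf://" + REPO_TYPE_PREFIXES[repo_type] + repo_id
    if revision is not None:
        # The optional revision is attached to the repo id with "@".
        url += "@" + revision
    # Avoid a double slash if the caller passes a leading "/".
    return url + "/" + path_in_repo.lstrip("/")


dataset_url = build_hf_url(
    "my-username/my-dataset-repo", "data/train.csv", repo_type="dataset", revision="dev"
)
```

The resulting string can then be passed to any fsspec-aware reader in place of a hand-written URL.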
Some interesting integrations where HfFileSystem simplifies interacting with the Hub are listed below:
Reading/writing a Pandas DataFrame from/to a Hub repository:
>>> import pandas as pd

>>> # Read a remote CSV file into a dataframe
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

>>> # Write a dataframe to a remote CSV file
>>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
The same workflow can also be used for Dask and Polars DataFrames.
Querying (remote) Hub files with DuckDB:
>>> from huggingface_hub import HfFileSystem
>>> import duckdb

>>> fs = HfFileSystem()
>>> duckdb.register_filesystem(fs)
>>> # Query a remote file and get the result back as a dataframe
>>> fs_query_file = "hf://datasets/my-username/my-dataset-repo/data_dir/data.parquet"
>>> df = duckdb.query(f"SELECT * FROM '{fs_query_file}' LIMIT 10").df()
Using the Hub as an array store with Zarr:
>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
...     foo = root.create_group("embeddings")
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
...     foobar[:] = embeddings

>>> # Read an array from a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
...     first_row = root["embeddings/experiment_0"][0]
Authentication
In many cases, you must be logged in with a Hugging Face account to interact with the Hub. Refer to the Authentication section of the documentation to learn more about authentication methods on the Hub.
It is also possible to log in programmatically by passing your token as an argument to HfFileSystem:
>>> from huggingface_hub import HfFileSystem
>>> hffs = HfFileSystem(token=token)
If you log in this way, be careful not to accidentally leak the token when sharing your source code!
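One common way to keep a token out of shared source code is to read it from an environment variable at runtime. The sketch below assumes the token is stored in HF_TOKEN; token_from_env is a hypothetical helper, not part of huggingface_hub.

```python
import os


def token_from_env(var_name="HF_TOKEN"):
    """Return the access token stored in an environment variable.

    Hypothetical helper: the variable name is an assumption and can be
    changed to whatever your deployment uses.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage (requires huggingface_hub to be installed):
# from huggingface_hub import HfFileSystem
# hffs = HfFileSystem(token=token_from_env())
```

With this approach the token never appears in the file itself, so the script can be committed or shared without exposing credentials.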