Cache-system reference
The caching system was updated in v0.8.0 to become the central cache-system shared across libraries that depend on the Hub. Read the cache-system guide for a detailed presentation of caching at HF.
Helpers
try_to_load_from_cache
huggingface_hub.try_to_load_from_cache
( repo_id: str, filename: str, cache_dir: typing.Union[str, pathlib.Path, NoneType] = None, revision: typing.Optional[str] = None, repo_type: typing.Optional[str] = None ) → Optional[str] or _CACHED_NO_EXIST
Parameters
- cache_dir (str or os.PathLike): The folder where the cached files lie.
- repo_id (str): The ID of the repo on huggingface.co.
- filename (str): The filename to look for inside repo_id.
- revision (str, optional): The specific model version to use. Will default to "main" if it's not provided and no commit_hash is provided either.
- repo_type (str, optional): The type of the repository. Will default to "model".
Returns
Optional[str] or _CACHED_NO_EXIST
Will return None if the file was not cached. Otherwise:
- The exact path to the cached file if it's found in the cache
- A special value _CACHED_NO_EXIST if the file does not exist at the given commit hash and this fact was cached.
Explores the cache to return the latest cached file for a given revision if found.
This function will not raise any exception if the file is not cached.
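The three possible return values described above can be told apart as in the following minimal sketch. The helper name `describe_cache_state` and the repo and file names are illustrative assumptions; the demo points at an empty temporary directory so that nothing is downloaded.

```python
import tempfile

from huggingface_hub import _CACHED_NO_EXIST, try_to_load_from_cache


def describe_cache_state(repo_id, filename, cache_dir=None):
    """Map the three possible return values to a readable status.

    Hypothetical helper, not part of huggingface_hub.
    """
    result = try_to_load_from_cache(repo_id, filename, cache_dir=cache_dir)
    if isinstance(result, str):
        # The file is cached: result is the path to it.
        return f"cached at {result}"
    if result is _CACHED_NO_EXIST:
        # The cache recorded that the file does not exist at this revision.
        return "known to be missing at this revision"
    # None: nothing is known about this file in the cache.
    return "not cached"


# Demo against an empty temporary cache directory (no network access).
with tempfile.TemporaryDirectory() as empty_cache:
    state = describe_cache_state("t5-small", "config.json", cache_dir=empty_cache)

print(state)
```

Comparing with `is _CACHED_NO_EXIST` rather than testing truthiness keeps the "missing on the Hub" case separate from the "not cached" case.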
cached_assets_path
huggingface_hub.cached_assets_path
( library_name: str, namespace: str = 'default', subfolder: str = 'default', assets_dir: typing.Union[str, pathlib.Path, NoneType] = None )
Parameters
- library_name (str): Name of the library that will manage the cache folder. Example: "dataset".
- namespace (str, optional, defaults to "default"): Namespace to which the data belongs. Example: "SQuAD".
- subfolder (str, optional, defaults to "default"): Subfolder in which the data will be stored. Example: extracted.
- assets_dir (str, Path, optional): Path to the folder where assets are cached. This must not be the same folder where Hub files are cached. Defaults to HF_HOME / "assets" if not provided. Can also be set with the HF_ASSETS_CACHE environment variable.
Return a folder path to cache arbitrary files.
huggingface_hub provides a canonical folder path to store assets. This is the recommended way to integrate a cache in a downstream library, as it will benefit from the built-in tools to scan and delete the cache properly.
The distinction is made between files cached from the Hub and assets. Files from the Hub are cached in a git-aware manner and entirely managed by huggingface_hub. See related documentation. All other files that a downstream library caches are considered to be "assets" (files downloaded from external sources, extracted from a .tar archive, preprocessed for training, ...).
Once the folder path is generated, it is guaranteed to exist and to be a directory. The path is based on 3 levels of depth: the library name, a namespace and a subfolder. These 3 levels grant flexibility while allowing huggingface_hub to expect folders when scanning/deleting parts of the assets cache. Within a library, it is expected that all namespaces share the same subset of subfolder names, but this is not a mandatory rule. The downstream library then has full control over which file structure to adopt within its cache. Namespace and subfolder are optional (they default to a "default/" subfolder) but the library name is mandatory, as we want every downstream library to manage its own cache.
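As a minimal sketch of the three-level layout, a downstream library could request its folder and write into it directly. The library name `my_library` and the file `train.txt` are made-up placeholders, and `assets_dir` points at a temporary directory so the real assets cache is left untouched.

```python
import tempfile
from pathlib import Path

from huggingface_hub import cached_assets_path

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = cached_assets_path(
        library_name="my_library",            # hypothetical downstream library
        namespace="Helsinki-NLP/tatoeba_mt",  # "/" is not kept as a path separator
        subfolder="extracted",
        assets_dir=tmp_dir,
    )

    # The returned folder already exists, so it can be written to right away.
    folder_exists = folder.is_dir()
    (folder / "train.txt").write_text("hello\n")
    stored_files = sorted(p.name for p in folder.iterdir())

    # The three levels: library name / namespace / subfolder.
    relative_parts = folder.resolve().relative_to(Path(tmp_dir).resolve()).parts

print(relative_parts)
```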
Expected tree:
assets/
└── datasets/
│   ├── SQuAD/
│   │   ├── downloaded/
│   │   ├── extracted/
│   │   └── processed/
│   ├── Helsinki-NLP--tatoeba_mt/
│       ├── downloaded/
│       ├── extracted/
│       └── processed/
└── transformers/
    ├── default/
    │   ├── something/
    ├── bert-base-cased/
    │   ├── default/
    │   └── training/
hub/
└── models--julien-c--EsperBERTo-small/
    ├── blobs/
    │   ├── (...)
    │   ├── (...)
    ├── refs/
    │   └── (...)
    └── [ 128]  snapshots/
        ├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
        │   ├── (...)
        └── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/
            └── (...)
Example:
>>> from huggingface_hub import cached_assets_path

>>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="download")
PosixPath('/home/wauplin/.cache/huggingface/assets/datasets/SQuAD/download')

>>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="extracted")
PosixPath('/home/wauplin/.cache/huggingface/assets/datasets/SQuAD/extracted')

>>> cached_assets_path(library_name="datasets", namespace="Helsinki-NLP/tatoeba_mt")
PosixPath('/home/wauplin/.cache/huggingface/assets/datasets/Helsinki-NLP--tatoeba_mt/default')

>>> cached_assets_path(library_name="datasets", assets_dir="/tmp/tmp123456")
PosixPath('/tmp/tmp123456/datasets/default/default')
scan_cache_dir
huggingface_hub.scan_cache_dir
( cache_dir: typing.Union[str, pathlib.Path, NoneType] = None )
Scan the entire HF cache-system and return an HFCacheInfo structure.
Use scan_cache_dir in order to programmatically scan your cache-system. The cache will be scanned repo by repo. If a repo is corrupted, a CorruptedCacheException will be thrown internally but captured and returned in the HFCacheInfo structure. Only valid repos get a proper report.
>>> from huggingface_hub import scan_cache_dir

>>> hf_cache_info = scan_cache_dir()
HFCacheInfo(
    size_on_disk=3398085269,
    repos=frozenset({
        CachedRepoInfo(
            repo_id='t5-small',
            repo_type='model',
            repo_path=PosixPath(...),
            size_on_disk=970726914,
            nb_files=11,
            revisions=frozenset({
                CachedRevisionInfo(
                    commit_hash='d78aea13fa7ecd06c29e3e46195d6341255065d5',
                    size_on_disk=970726339,
                    snapshot_path=PosixPath(...),
                    files=frozenset({
                        CachedFileInfo(
                            file_name='config.json',
                            size_on_disk=1197,
                            file_path=PosixPath(...),
                            blob_path=PosixPath(...),
                        ),
                        CachedFileInfo(...),
                        ...
                    }),
                ),
                CachedRevisionInfo(...),
                ...
            }),
        ),
        CachedRepoInfo(...),
        ...
    }),
    warnings=[
        CorruptedCacheException("Snapshots dir doesn't exist in cached repo: ..."),
        CorruptedCacheException(...),
        ...
    ],
)
You can also print a detailed report directly from the hf command line using:
> hf cache ls
ID                          SIZE     LAST_ACCESSED LAST_MODIFIED REFS
--------------------------- -------- ------------- ------------- -----------
dataset/nyu-mll/glue          157.4M 2 days ago    2 days ago    main script
model/LiquidAI/LFM2-VL-1.6B     3.2G 4 days ago    4 days ago    main
model/microsoft/UserLM-8b      32.1G 4 days ago    4 days ago    main

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.
Raises:
CacheNotFound: If the cache directory does not exist.
ValueError: If the cache directory is a file, instead of a directory.
Returns: an HFCacheInfo object.
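A scan result can be walked programmatically. The sketch below lists repos from largest to smallest and reports any captured warnings; `summarize_cache` is a hypothetical helper, and the demo scans an empty temporary directory instead of the real cache.

```python
import tempfile

from huggingface_hub import scan_cache_dir


def summarize_cache(cache_dir=None):
    """Return the total size in bytes and one summary line per repo or warning.

    Hypothetical helper, not part of huggingface_hub.
    """
    cache_info = scan_cache_dir(cache_dir)
    lines = []
    # Largest repos first.
    for repo in sorted(cache_info.repos, key=lambda r: r.size_on_disk, reverse=True):
        lines.append(
            f"{repo.repo_type}/{repo.repo_id}: {repo.size_on_disk_str}, "
            f"{len(repo.revisions)} revision(s)"
        )
    # Corrupted repos are not in `repos`; they only show up as warnings.
    for warning in cache_info.warnings:
        lines.append(f"skipped: {warning}")
    return cache_info.size_on_disk, lines


with tempfile.TemporaryDirectory() as empty_cache:
    total_size, report = summarize_cache(empty_cache)

print(total_size, report)
```

Calling `summarize_cache()` without an argument scans the default cache location instead.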
Data structures
All structures are built and returned by scan_cache_dir() and are immutable.
HFCacheInfo
class huggingface_hub.HFCacheInfo
( size_on_disk: int, repos: frozenset, warnings: list )
Parameters
- size_on_disk (int): Sum of all valid repo sizes in the cache-system.
- repos (frozenset[CachedRepoInfo]): Set of CachedRepoInfo describing all valid cached repos found on the cache-system while scanning.
- warnings (list[CorruptedCacheException]): List of CorruptedCacheException that occurred while scanning the cache. Those exceptions are captured so that the scan can continue. Corrupted repos are skipped from the scan.
Frozen data structure holding information about the entire cache-system.
This data structure is returned by scan_cache_dir() and is immutable.
Here size_on_disk is equal to the sum of all repo sizes (only blobs). However, if some cached repos are corrupted, their sizes are not taken into account.
delete_revisions
( *revisions: str )
Prepare the strategy to delete one or more revisions cached locally.
Input revisions can be any revision hash. If a revision hash is not found in the local cache, a warning is logged but no error is raised. Revisions can be from different cached repos since hashes are unique across repos.
Examples:
>>> from huggingface_hub import scan_cache_dir
>>> cache_info = scan_cache_dir()
>>> delete_strategy = cache_info.delete_revisions(
...     "81fd1d6e7847c99f5862c9fb81387956d99ec7aa"
... )
>>> print(f"Will free {delete_strategy.expected_freed_size_str}.")
Will free 7.9K.
>>> delete_strategy.execute()
Cache deletion done. Saved 7.9K.

>>> from huggingface_hub import scan_cache_dir
>>> scan_cache_dir().delete_revisions(
...     "81fd1d6e7847c99f5862c9fb81387956d99ec7aa",
...     "e2983b237dccf3ab4937c97fa717319a9ca1a96d",
...     "6c0e6080953db56375760c0471a8c5f2929baf11",
... ).execute()
Cache deletion done. Saved 8.6G.
delete_revisions returns a DeleteCacheStrategy object that needs to be executed. The DeleteCacheStrategy is not meant to be modified but allows having a dry run before actually executing the deletion.
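One possible use of the dry run is to plan the removal of every detached revision (a revision with no refs pointing to it) and inspect the plan before executing it. In this sketch `plan_detached_cleanup` is a hypothetical helper, and the demo runs against an empty temporary directory, so the resulting plan is empty.

```python
import tempfile

from huggingface_hub import scan_cache_dir


def plan_detached_cleanup(cache_dir=None):
    """Build a deletion plan for all revisions that no ref points to.

    Hypothetical helper, not part of huggingface_hub. Nothing is deleted here.
    """
    cache_info = scan_cache_dir(cache_dir)
    detached_hashes = [
        revision.commit_hash
        for repo in cache_info.repos
        for revision in repo.revisions
        if not revision.refs  # an empty set of refs means "detached"
    ]
    return cache_info.delete_revisions(*detached_hashes)


with tempfile.TemporaryDirectory() as empty_cache:
    strategy = plan_detached_cleanup(empty_cache)
    planned_bytes = strategy.expected_freed_size
    planned_snapshots = len(strategy.snapshots)

print(f"Would delete {planned_snapshots} snapshot(s), freeing {planned_bytes} bytes.")
# Only strategy.execute() actually removes files.
```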
export_as_table
( verbosity: int = 0 ) → str
Generate a table from the HFCacheInfo object.
Pass verbosity=0 to get a table with a single row per repo, with columns "repo_id", "repo_type", "size_on_disk", "nb_files", "last_accessed", "last_modified", "refs", "local_path".
Pass verbosity=1 to get a table with a row per repo and revision (thus multiple rows can appear for a single repo), with columns "repo_id", "repo_type", "revision", "size_on_disk", "nb_files", "last_modified", "refs", "local_path".
Example:
>>> from huggingface_hub.utils import scan_cache_dir

>>> hf_cache_info = scan_cache_dir()
HFCacheInfo(...)

>>> print(hf_cache_info.export_as_table())
REPO ID      REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
------------ --------- ------------ -------- ------------- ------------- ---- ---------------------------------------------
roberta-base model             2.7M        5 1 day ago     1 week ago    main ~/.cache/huggingface/hub/models--roberta-base
suno/bark    model             8.8K        1 1 week ago    1 week ago    main ~/.cache/huggingface/hub/models--suno--bark
t5-base      model           893.8M        4 4 days ago    7 months ago  main ~/.cache/huggingface/hub/models--t5-base
t5-large     model             3.0G        4 5 weeks ago   5 months ago  main ~/.cache/huggingface/hub/models--t5-large

>>> print(hf_cache_info.export_as_table(verbosity=1))
REPO ID      REPO TYPE REVISION                                 SIZE ON DISK NB FILES LAST_MODIFIED REFS LOCAL PATH
------------ --------- ---------------------------------------- ------------ -------- ------------- ---- ------------------------------------------------------------------------------------------------
roberta-base model     e2da8e2f811d1448a5b465c236feacd80ffbac7b         2.7M        5 1 week ago    main ~/.cache/huggingface/hub/models--roberta-base/snapshots/e2da8e2f811d1448a5b465c236feacd80ffbac7b
suno/bark    model     70a8a7d34168586dc5d028fa9666aceade177992         8.8K        1 1 week ago    main ~/.cache/huggingface/hub/models--suno--bark/snapshots/70a8a7d34168586dc5d028fa9666aceade177992
t5-base      model     a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1       893.8M        4 7 months ago  main ~/.cache/huggingface/hub/models--t5-base/snapshots/a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1
t5-large     model     150ebc2c4b72291e770f58e6057481c8d2ed331a         3.0G        4 5 months ago  main ~/.cache/huggingface/hub/models--t5-large/snapshots/150ebc2c4b72291e770f58e6057481c8d2ed331a
CachedRepoInfo
class huggingface_hub.CachedRepoInfo
( repo_id: str, repo_type: typing.Literal['model', 'dataset', 'space'], repo_path: Path, size_on_disk: int, nb_files: int, revisions: frozenset, last_accessed: float, last_modified: float )
Parameters
- repo_id (str): Repo id of the repo on the Hub. Example: "google/fleurs".
- repo_type (Literal["dataset", "model", "space"]): Type of the cached repo.
- repo_path (Path): Local path to the cached repo.
- size_on_disk (int): Sum of the blob file sizes in the cached repo.
- nb_files (int): Total number of blob files in the cached repo.
- revisions (frozenset[CachedRevisionInfo]): Set of CachedRevisionInfo describing all revisions cached in the repo.
- last_accessed (float): Timestamp of the last time a blob file of the repo has been accessed.
- last_modified (float): Timestamp of the last time a blob file of the repo has been modified/created.
Frozen data structure holding information about a cached repository.
size_on_disk is not necessarily the sum of all revision sizes because of duplicated files. Besides, only blobs are taken into account, not the (negligible) size of folders and symlinks.
The reliability of last_accessed and last_modified can depend on the OS you are using. See python documentation for more details.
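The reason a repo's size can be smaller than the sum of its revision sizes is that revisions share blobs. The following standalone sketch illustrates the arithmetic with made-up revisions, blob ids and sizes; it does not call huggingface_hub.

```python
# Each entry: (revision, file name, blob id, blob size in bytes).
# All values are hypothetical and only illustrate the deduplication.
cached_files = [
    ("rev_a", "config.json", "blob_1", 1_000),
    ("rev_a", "model.bin", "blob_2", 50_000),
    ("rev_b", "config.json", "blob_1", 1_000),  # unchanged file: same blob as rev_a
    ("rev_b", "model.bin", "blob_3", 60_000),   # changed file: new blob
]


def revision_size(files, revision):
    """Size of one revision: each blob it points to is counted once."""
    blobs = {blob: size for rev, _, blob, size in files if rev == revision}
    return sum(blobs.values())


def repo_size(files):
    """Size of the repo: each blob is counted once across all revisions."""
    blobs = {blob: size for _, _, blob, size in files}
    return sum(blobs.values())


sum_of_revisions = revision_size(cached_files, "rev_a") + revision_size(cached_files, "rev_b")
total = repo_size(cached_files)

print(sum_of_revisions, total)
```

Here the shared `config.json` blob is counted twice in the per-revision sum but only once in the repo total.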
size_on_disk_str
( )
(property) Sum of the blob file sizes as a human-readable string.
Example: “42.2K”.
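To give an idea of how such a string relates to a byte count, here is a rough sketch. It assumes decimal units (powers of 1000) and one decimal place, which is an assumption made for illustration and not a description of the library's actual implementation; `format_size` is a hypothetical name.

```python
def format_size(num_bytes):
    """Format a byte count as a short human-readable string.

    Hypothetical helper: assumes decimal (1000-based) units and one decimal.
    """
    value = float(num_bytes)
    for unit in ("", "K", "M", "G", "T"):
        if abs(value) < 1000.0:
            return f"{value:.1f}{unit}"
        value /= 1000.0
    return f"{value:.1f}P"


print(format_size(42_200))
print(format_size(7_900))
```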
refs
( )
(property) Mapping between refs and revision data structures.
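A possible use of this mapping is to look up which commit a ref such as "main" points to in the local cache. `commit_hash_for_ref` is a hypothetical helper, and the demo scans an empty temporary directory, so no revision is found.

```python
import tempfile

from huggingface_hub import scan_cache_dir


def commit_hash_for_ref(cache_info, repo_id, ref="main", repo_type="model"):
    """Return the cached commit hash that `ref` points to, or None if unknown.

    Hypothetical helper, not part of huggingface_hub.
    """
    for repo in cache_info.repos:
        if repo.repo_id == repo_id and repo.repo_type == repo_type:
            revision = repo.refs.get(ref)  # ref name -> CachedRevisionInfo
            return revision.commit_hash if revision is not None else None
    return None


with tempfile.TemporaryDirectory() as empty_cache:
    commit_hash = commit_hash_for_ref(scan_cache_dir(empty_cache), "t5-small")

print(commit_hash)
```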
CachedRevisionInfo
class huggingface_hub.CachedRevisionInfo
( commit_hash: str, snapshot_path: Path, size_on_disk: int, files: frozenset, refs: frozenset, last_modified: float )
Parameters
- commit_hash (str): Hash of the revision (unique). Example: "9338f7b671827df886678df2bdd7cc7b4f36dffd".
- snapshot_path (Path): Path to the revision directory in the snapshots folder. It contains the exact tree structure as the repo on the Hub.
- files (frozenset[CachedFileInfo]): Set of CachedFileInfo describing all files contained in the snapshot.
- refs (frozenset[str]): Set of refs pointing to this revision. If the revision has no refs, it is considered detached. Example: {"main", "2.4.0"} or {"refs/pr/1"}.
- size_on_disk (int): Sum of the blob file sizes that are symlink-ed by the revision.
- last_modified (float): Timestamp of the last time the revision has been created/modified.
Frozen data structure holding information about a revision.
A revision corresponds to a folder in the snapshots folder and is populated with the exact tree structure as the repo on the Hub but contains only symlinks. A revision can be either referenced by 1 or more refs or be "detached" (no refs).
last_accessed cannot be determined correctly on a single revision as blob files are shared across revisions.
size_on_disk is not necessarily the sum of all file sizes because of possible duplicated files. Besides, only blobs are taken into account, not the (negligible) size of folders and symlinks.
size_on_disk_str
( )
(property) Sum of the blob file sizes as a human-readable string.
Example: “42.2K”.
nb_files
( )
(property) Total number of files in the revision.
CachedFileInfo
class huggingface_hub.CachedFileInfo
( file_name: str, file_path: Path, blob_path: Path, size_on_disk: int, blob_last_accessed: float, blob_last_modified: float )
Parameters
- file_name (str): Name of the file. Example: config.json.
- file_path (Path): Path of the file in the snapshots directory. The file path is a symlink referring to a blob in the blobs folder.
- blob_path (Path): Path of the blob file. This is equivalent to file_path.resolve().
- size_on_disk (int): Size of the blob file in bytes.
- blob_last_accessed (float): Timestamp of the last time the blob file has been accessed (from any revision).
- blob_last_modified (float): Timestamp of the last time the blob file has been modified/created.
Frozen data structure holding information about a single cached file.
The reliability of blob_last_accessed and blob_last_modified can depend on the OS you are using. See python documentation for more details.
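The relationship between file_path and blob_path can be reproduced with a simplified mock of the layout, built with the standard library only. The folder and blob names below are made up, and the sketch assumes a platform where creating symlinks is permitted.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Simplified mock of a cached repo: one blob and one snapshot entry.
    repo = Path(tmp_dir).resolve() / "models--demo--repo"  # hypothetical repo folder
    blob_path = repo / "blobs" / "0123abcd"
    file_path = repo / "snapshots" / "deadbeef" / "config.json"
    blob_path.parent.mkdir(parents=True)
    file_path.parent.mkdir(parents=True)

    blob_path.write_text('{"hidden_size": 8}')
    file_path.symlink_to(blob_path)  # the snapshot entry only points to the blob

    is_symlink = file_path.is_symlink()
    resolves_to_blob = file_path.resolve() == blob_path
    size_on_disk = blob_path.stat().st_size  # the size is measured on the blob

print(is_symlink, resolves_to_blob, size_on_disk)
```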
size_on_disk_str
( )
(property) Size of the blob file as a human-readable string.
Example: “42.2K”.
DeleteCacheStrategy
class huggingface_hub.DeleteCacheStrategy
( expected_freed_size: int, blobs: frozenset, refs: frozenset, repos: frozenset, snapshots: frozenset )
Parameters
- expected_freed_size (float): Expected freed size once strategy is executed.
- blobs (frozenset[Path]): Set of blob file paths to be deleted.
- refs (frozenset[Path]): Set of reference file paths to be deleted.
- repos (frozenset[Path]): Set of entire repo paths to be deleted.
- snapshots (frozenset[Path]): Set of snapshots to be deleted (directory of symlinks).
Frozen data structure holding the strategy to delete cached revisions.
This object is not meant to be instantiated programmatically but to be returned by delete_revisions(). See documentation for usage example.
expected_freed_size_str
( )
(property) Expected size that will be freed as a human-readable string.
Example: “42.2K”.
Exceptions
CorruptedCacheException
class huggingface_hub.CorruptedCacheException
( )
Exception for any unexpected structure in the Huggingface cache-system.
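In practice this exception is usually met as an entry in the warnings list of a scan rather than as something to catch. The sketch below builds a deliberately malformed repo folder in a temporary directory (the folder name is made up) and scans it; it assumes that a repo folder lacking a snapshots directory is reported as corrupted, as suggested by the warning message shown in the scan_cache_dir example.

```python
import tempfile
from pathlib import Path

from huggingface_hub import CorruptedCacheException, scan_cache_dir

with tempfile.TemporaryDirectory() as cache_dir:
    # A repo folder without the expected "snapshots" directory.
    (Path(cache_dir) / "models--demo--broken").mkdir()

    cache_info = scan_cache_dir(cache_dir)
    valid_repos = len(cache_info.repos)
    warnings = list(cache_info.warnings)

for warning in warnings:
    print(type(warning).__name__, warning)
```

The scan itself does not raise: the corrupted repo is skipped and the exception is kept for inspection.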