Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

CacheStore containing source store and cache store#3366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

@ruaridhg
Copy link
Contributor

@ruaridhgruaridhg commentedAug 11, 2025
edited
Loading

Closes#3357

Store containing 2 stores i.e. primary store (where the data is coming from) and a cache store (where you want to store the data as a cache).

Introduces the classCacheStore

This cache wraps any Store implementation and uses a separate Store instance as the cache backend. This provides persistent caching capabilities with time-based expiration, size-based eviction, and flexible cache storage options.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented indocs/user-guide/*.rst
  • Changes documented as a new file inchanges/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

d-v-b, alxmrs, ziw-liu, and golmschenk reacted with rocket emoji
@codecov
Copy link

codecovbot commentedAug 11, 2025
edited
Loading

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.86%. Comparing base (1cd4f7e) to head (af81c17).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@##             main    #3366      +/-   ##==========================================+ Coverage   61.25%   61.86%   +0.61%==========================================  Files          84       85       +1       Lines        9949    10110     +161     ==========================================+ Hits         6094     6255     +161  Misses       3855     3855
Files with missing linesCoverage Δ
src/zarr/experimental/cache_store.py100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

await self._cache.delete(key)
self._remove_from_tracking(key)

def cache_info(self) -> dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

it would be great to use a typeddict here, so that the return type is known

Copy link
Contributor

@TomAugspurgerTomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the PR. I left a few small comments, but a couple bigger picture things:

  1. This is perhaps the best demonstration for why#2473 (statically associating a Store with a Buffer type) might be helpful. Iimagine that we don't want to use precious GPU RAM as a cache.
  2. Do we care about thread safety here at all? Given that the primary interface to concurrency in Zarr is async, I think we're probably OK not worrying about it. But there might be spots where we can do operations atomically (e.g. usingdict.pop instead of anif key in dict followed by the operation) with little cost. Trying to synchronize changes to multiple dictionaries would require much more effort.

except Exception as e:
logger.warning("_evict_key: failed to evict key %s: %s", key, e)

def _cache_value(self, key: str, value: Any) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why are we accepting arbitraryvalue here? Is it possible to scope this to justBuffer objects (or maybeBuffer andNDBuffer)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

At a glance, it looks like we only call this withBuffer so hopefully this is an easy fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

fixed

Comment on lines 189 to 190
if key in self._cache_order:
del self._cache_order[key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't know if this is called from multiple threads, but this could be done atomically withself._cache_order.pop(key, None).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

fixed


def _cache_value(self, key: str, value: Any) -> None:
"""Cache a value with size tracking."""
value_size = buffer_size(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If we only acceptBuffer here, thenbuffer_size can be removed hopefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

buffer_size is now removed, as we are using theBuffer type

>>> cached_store = zarr.storage.CacheStore(
... store=source_store,
... cache_store=cache_store,
... max_size=256*1024*1024 # 256MB cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for this PR. I think it would be better to have the LRU functionality on thecache_store (in this example the MemoryStore). Otherwise the enclosingCacheStore would need to keep track of all keys and their access order in the inner store. That could be problematic if the inner store would be shared with other CacheStores or other code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That could be problematic if the inner store would be shared with other CacheStores or other code.

As long as one of the design goals is to use a regular zarr store as the caching layer, there will be nothing we can do to guard against external access to the cache store. For example, if someone uses aLocalStore as a cache, we can't protect the local file system from external modification. I think it's the user's responsibility to ensure that they don't use the same cache for separateCacheStores.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

but our default behavior could be to create a freshMemoryStore, which would be a safe default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My main concern here is about the abstraction. The LRU fits better in the inner store than in the CacheStore, imo. There could even be anLRUStore that wraps a store and implements the tracking and eviction.
The safety concern is, as you pointed out, something the user should take care of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My main concern here is about the abstraction. The LRU fits better in the inner store than in the CacheStore, imo.

That makes sense, maybe we could implementLRUStore as another store wrapper?

negin513 reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it would be better to have the LRU functionality on the cache_store (in this example the MemoryStore)

To understand, is the suggestion here that

  1. There's noCacheStore class
  2. Instead all the logic for caching is implemented onStore, and there is acache_store property that can be set to a second store to enable caching?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@normanrz unless you have a concrete proposal for a refactor that would be workable in the scope of this PR, I would suggest we move forward with the PR as-is, and use experience to dial in the abstraction in a later PR.

But knowing that design here might change, I think we should introduce an "experimental" storage module, e.g.zarr.storage.experimental, or a top-level experimental module,zarr.experimental, and put this class there until we are sure that the API is final.

Thoughts? I would like to ship this important feature while also retaining the ability to safely adjust it later. An experimental module seems like a safe way to do that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just commenting that I'd love to have this included sooner rather than later, it will be immediately useful 🎉 Thanks@ruaridhg for taking the initiative and putting this together!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice use ofexperimental here! 🤩

@d-v-bd-v-b mentioned this pull requestSep 23, 2025
@github-actionsgithub-actionsbot added the needs release notesAutomatically applied to PRs which haven't added release notes labelSep 30, 2025
@github-actionsgithub-actionsbot removed the needs release notesAutomatically applied to PRs which haven't added release notes labelSep 30, 2025
@d-v-b
Copy link
Contributor

@ruaridhg I pushed a bunch of changes, please have a look before I merge!

@d-v-bd-v-b mentioned this pull requestOct 2, 2025
Copy link
Contributor

@d-v-bd-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is missing some doctests, but that's because we don't have doctests working (see#3500). I'm going to merge this as-is, and we can circle back with more fixes later. Thanks for this contribution@ruaridhg!

@d-v-bd-v-benabled auto-merge (squash)October 2, 2025 08:46
@d-v-bd-v-b merged commit2eb89e1 intozarr-developers:mainOct 2, 2025
29 checks passed
@ruaridhg
Copy link
ContributorAuthor

This is missing some doctests, but that's because we don't have doctests working (see#3500). I'm going to merge this as-is, and we can circle back with more fixes later. Thanks for this contribution@ruaridhg!

@d-v-b Thanks for making changes and merging. I'm on another project with a tight deadline so haven't had time to look at this, much appreciated!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@joshmoorejoshmoorejoshmoore left review comments

@normanrznormanrznormanrz left review comments

@TomAugspurgerTomAugspurgerTomAugspurger left review comments

@d-v-bd-v-bd-v-b approved these changes

+2 more reviewers

@dstansbydstansbydstansby left review comments

@cmalinmayorcmalinmayorcmalinmayor left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Add general CacheStore to Zarr 3.0

7 participants

@ruaridhg@d-v-b@joshmoore@normanrz@TomAugspurger@dstansby@cmalinmayor

[8]ページ先頭

©2009-2025 Movatter.jp