Support rechunking to seasonal frequency with SeasonalResampler #10519
def _for_chunking(self) -> Self:
    """
    Return a version of this resampler suitable for chunking.
    For SeasonResampler, this returns a version with drop_incomplete=False
    to prevent data from being silently dropped during chunking operations.
    """
    return type(self)(seasons=self.seasons, drop_incomplete=False)
Suggested change:
- def _for_chunking(self) -> Self:
-     """
-     Return a version of this resampler suitable for chunking.
-     For SeasonResampler, this returns a version with drop_incomplete=False
-     to prevent data from being silently dropped during chunking operations.
-     """
-     return type(self)(seasons=self.seasons, drop_incomplete=False)
# Create a temporary resampler that ignores drop_incomplete for chunking
# This prevents data from being silently dropped during chunking
resampler_for_chunking = self._for_chunking()
Suggested change:
- resampler_for_chunking = self._for_chunking()
+ resampler_for_chunking = type(self)(seasons=self.seasons, drop_incomplete=False)
data = create_test_data()
for chunks in [1, 2, 3, 4, 5]:
    rechunked = data.chunk({"dim1": chunks})
    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
        (8 % chunks,) if 8 % chunks else ()
    )
    rechunked = data.chunk({"dim2": chunks})
    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
        (9 % chunks,) if 9 % chunks else ()
    )
    rechunked = data.chunk({"dim1": chunks, "dim2": chunks})
    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
        (8 % chunks,) if 8 % chunks else ()
    )
    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
        (9 % chunks,) if 9 % chunks else ()
    )
Suggested change:
- data = create_test_data()
- for chunks in [1, 2, 3, 4, 5]:
-     rechunked = data.chunk({"dim1": chunks})
-     assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
-         (8 % chunks,) if 8 % chunks else ()
-     )
-     rechunked = data.chunk({"dim2": chunks})
-     assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
-         (9 % chunks,) if 9 % chunks else ()
-     )
-     rechunked = data.chunk({"dim1": chunks, "dim2": chunks})
-     assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
-         (8 % chunks,) if 8 % chunks else ()
-     )
-     assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
-         (9 % chunks,) if 9 % chunks else ()
-     )
)
# Test standard seasons
rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "JJA", "SON"]))
we'll need to error on a missing season like this:
rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "SON"]))
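One way such a check could look, sketched in plain Python without xarray: map each season string onto month numbers and raise if the seasons do not cover all twelve months. The helpers `season_to_months` and `check_full_coverage` are hypothetical names for illustration and are not part of the PR or of xarray.

```python
# Hypothetical helpers for illustration; not xarray's implementation.
MONTH_INITIALS = "JFMAMJJASOND"


def season_to_months(season: str) -> tuple[int, ...]:
    """Map a season string such as "DJF" to month numbers (1-12)."""
    # Search a doubled string so that seasons may wrap around the year end.
    start = (MONTH_INITIALS * 2).find(season)
    if not season or start == -1:
        raise ValueError(f"Unrecognized season {season!r}")
    return tuple((start + i) % 12 + 1 for i in range(len(season)))


def check_full_coverage(seasons: list[str]) -> None:
    """Raise if the seasons leave any month of the year unassigned."""
    covered = {m for s in seasons for m in season_to_months(s)}
    missing = sorted(set(range(1, 13)) - covered)
    if missing:
        raise ValueError(
            f"Seasons {seasons} do not cover months {missing}; "
            "rechunking would leave those time steps without a chunk."
        )


check_full_coverage(["DJF", "MAM", "JJA", "SON"])  # passes silently
try:
    check_full_coverage(["DJF", "MAM", "SON"])
except ValueError as err:
    print(err)  # reports months [6, 7, 8] as uncovered
```

Single-letter seasons such as "J" are ambiguous under this lookup, so a real implementation would need a stricter parser.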
chunks = chunks.dropna(name).astype(int)
chunks_tuple: tuple[int, ...] = tuple(chunks.data.tolist())
return chunks_tuple
return resampler.compute_chunks(name, variable)
Suggested change:
- return resampler.compute_chunks(name, variable)
+ newchunks = resampler.compute_chunks(name, variable)
+ if sum(newchunks) != variable.shape[0]:
+     raise ValueError(f"Logic bug in rechunking using {resampler!r}. New chunks tuple does not match size of data. Please open an issue.")
+ return newchunks
Let's protect ourselves a bit from logic bugs in the resampler
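To make the failure mode concrete, here is a minimal standalone sketch of the same guard. `checked_chunks` is a hypothetical name, and the chunk tuples are hand-computed day counts for two years of daily data starting 2000-01-01, not output from xarray.

```python
def checked_chunks(
    newchunks: tuple[int, ...], size: int, resampler: object
) -> tuple[int, ...]:
    """Return newchunks unchanged, or raise if they do not span the dimension."""
    if sum(newchunks) != size:
        raise ValueError(
            f"Logic bug in rechunking using {resampler!r}. "
            "New chunks tuple does not match size of data. Please open an issue."
        )
    return newchunks


size = 366 + 365  # daily data for 2000 and 2001

# Chunks that cover every day, including the partial seasons at both ends.
complete = (60, 92, 92, 91, 90, 92, 92, 91, 31)
assert checked_chunks(complete, size, "SeasonResampler") == complete

# If the partial first and last seasons were dropped, 91 days would be missing,
# and the guard raises instead of silently returning a short chunks tuple.
truncated = complete[1:-1]
try:
    checked_chunks(truncated, size, "SeasonResampler")
except ValueError as err:
    print(err)
```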
# Test standard seasons
rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "JJA", "SON"]))
# Should have multiple chunks along time dimension
assert len(rechunked.chunksizes["time"]) > 1
let's assert an actual value here.
N = 365 * 2  # 2 years
if use_cftime:
    time = xr.date_range("2001-01-01", periods=N, freq="D", use_cftime=True)
else:
    time = xr.date_range("2001-01-01", periods=N, freq="D")
Suggested change:
- N = 365 * 2  # 2 years
- if use_cftime:
-     time = xr.date_range("2001-01-01", periods=N, freq="D", use_cftime=True)
- else:
-     time = xr.date_range("2001-01-01", periods=N, freq="D")
+ N = 366 + 365  # 2 years
+ time = xr.date_range("2000-01-01", periods=N, freq="D", use_cftime=use_cftime)
By starting in 2000, we can check leap year logic
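With this range, candidate expected chunk sizes can be worked out with a few lines of standard-library Python. The sketch below assumes that a chunk boundary falls wherever the season label changes between consecutive days; that is an assumption about the intended behaviour, not something the PR states. `expected_season_chunks` is a hypothetical helper.

```python
from datetime import date, timedelta
from itertools import groupby


def expected_season_chunks(
    start: date, ndays: int, seasons: dict[str, tuple[int, ...]]
) -> tuple[int, ...]:
    """Count runs of consecutive days sharing a season label (Gregorian calendar)."""
    month_to_season = {m: name for name, months in seasons.items() for m in months}
    days = (start + timedelta(days=i) for i in range(ndays))
    labels = (month_to_season[d.month] for d in days)
    return tuple(len(list(run)) for _, run in groupby(labels))


standard = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}
chunks = expected_season_chunks(date(2000, 1, 1), 366 + 365, standard)
print(chunks)  # (60, 92, 92, 91, 90, 92, 92, 91, 31)
```

The leading 60 comes from January plus the 29-day February of 2000; the same range starting in 2001 would begin with 59.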
Can we parameterize this over calendars too? `360_day`, `noleap`, and `standard` should be good enough.
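These calendars differ in their month lengths, so expected sizes change with the calendar. A rough standard-library sketch of the differences follows. `days_in_month` and `days_in_years` are hypothetical helpers, and the calendar names are assumed to follow the CF conventions (`360_day`: twelve 30-day months; `noleap`: 365 days every year; `standard`: Gregorian leap years for the dates used here).

```python
import calendar


def days_in_month(cal: str, year: int, month: int) -> int:
    """Length of a month under a CF-style calendar name (illustrative only)."""
    if cal == "360_day":
        return 30
    if cal == "noleap":
        # Use a known non-leap year so February always has 28 days.
        return calendar.monthrange(2001, month)[1]
    if cal == "standard":
        return calendar.monthrange(year, month)[1]
    raise ValueError(f"Unsupported calendar {cal!r}")


def days_in_years(cal: str, years: tuple[int, ...]) -> int:
    """Total number of days across the given years."""
    return sum(days_in_month(cal, y, m) for y in years for m in range(1, 13))


for cal in ("360_day", "noleap", "standard"):
    print(cal, days_in_years(cal, (2000, 2001)), days_in_month(cal, 2000, 2))
# 360_day 720 30
# noleap 730 28
# standard 731 29
```

Because the test fixes `periods=N`, the same `N` spans a different stretch of dates in each calendar, so the expected chunk tuples would need to be computed per calendar.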
    {"x": 2, "time": SeasonResampler(["DJFM", "AM", "JJA", "SON"])}
)
# Should have multiple chunks along time dimension
assert len(rechunked.chunksizes["time"]) > 1
here too let's assert actual chunks tuple
whats-new.rst
api.rst
Previously, users could not use `SeasonResampler` for chunking operations in xarray, despite it being a natural fit for seasonal data analysis. When attempting `ds.chunk(time=SeasonResampler(["DJF", "MAMJ", "JAS", "ON"]))`, users encountered obscure errors because the chunking logic was hardcoded to work only with `TimeResampler` objects. This limitation prevented efficient seasonal analysis workflows and forced users to rely on workarounds or manual chunking strategies.

This PR adds a generalized chunking approach: a `resolve_chunks` method on the `Resampler` base class, with the chunking logic updated to work with all `Resampler` objects, not just `TimeResampler`. It also adds a `_for_chunking` method to `SeasonResampler` that ensures `drop_incomplete=False` during chunking operations to prevent silent data loss. The solution maintains full backward compatibility with existing `TimeResampler` functionality while enabling seamless seasonal chunking.
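The `_for_chunking` idea can be illustrated with a toy stand-in. The class below is not xarray's `SeasonResampler`; it is a hypothetical dataclass that mirrors only the two attributes visible in the diff (`seasons` and `drop_incomplete`), to show that the copy used for chunking leaves the user's resampler untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySeasonResampler:
    """Hypothetical stand-in; only the fields needed for the illustration."""

    seasons: tuple[str, ...]
    drop_incomplete: bool = True

    def _for_chunking(self) -> "ToySeasonResampler":
        # Same seasons, but never drop partial seasons while computing chunks,
        # so every element along the dimension lands in some chunk.
        return replace(self, drop_incomplete=False)


user_resampler = ToySeasonResampler(("DJF", "MAM", "JJA", "SON"))
chunking_resampler = user_resampler._for_chunking()

print(user_resampler.drop_incomplete)      # True: the original is unchanged
print(chunking_resampler.drop_incomplete)  # False: used only for chunk sizes
```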