Support rechunking to seasonal frequency with SeasonalResampler #10519


Open
dhruvak001 wants to merge 12 commits into pydata:main from dhruvak001:issue#10425

dhruvak001
Contributor

Users could not use SeasonResampler for chunking operations in xarray, even though it is a natural fit for seasonal data analysis. Calling ds.chunk(time=SeasonResampler(["DJF", "MAMJ", "JAS", "ON"])) raised obscure errors because the chunking logic was hardcoded to work only with TimeResampler objects. This forced users to fall back on workarounds or manual chunking for seasonal analysis workflows.

This PR generalizes chunking by adding a resolve_chunks method to the Resampler base class and updating the chunking logic to accept any Resampler object, not just TimeResampler. It also adds a _for_chunking method to SeasonResampler that forces drop_incomplete=False during chunking so that data is not silently dropped. Existing TimeResampler behavior is unchanged, so the change is backward compatible.
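
The dispatch idea described above can be illustrated with a small standard-library sketch. It is not xarray's implementation: ToyResampler, ToyYearResampler, ToySeasonResampler and chunks_for are hypothetical names, and the seasons are passed as explicit month tuples rather than parsed from strings. The point is only that the chunking helper checks for the base class and lets each resampler compute its own chunk sizes.

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta
from itertools import groupby
from typing import Union


class ToyResampler(ABC):
    """Hypothetical stand-in for a resampler base class (not xarray's)."""

    @abstractmethod
    def compute_chunks(self, times: list[date]) -> tuple[int, ...]:
        """Return chunk sizes along the time axis."""


class ToyYearResampler(ToyResampler):
    def compute_chunks(self, times: list[date]) -> tuple[int, ...]:
        # One chunk per run of consecutive timestamps in the same year.
        return tuple(len(list(g)) for _, g in groupby(times, key=lambda t: t.year))


class ToySeasonResampler(ToyResampler):
    def __init__(self, seasons: dict[str, tuple[int, ...]]) -> None:
        # Map each month number to the label of the season containing it.
        self._season_of = {m: name for name, months in seasons.items() for m in months}

    def compute_chunks(self, times: list[date]) -> tuple[int, ...]:
        # One chunk per run of consecutive timestamps in the same season.
        return tuple(
            len(list(g))
            for _, g in groupby(times, key=lambda t: self._season_of[t.month])
        )


def chunks_for(spec: Union[int, ToyResampler], times: list[date]) -> tuple[int, ...]:
    """Dispatch on the base class instead of one concrete resampler type."""
    if isinstance(spec, ToyResampler):
        return spec.compute_chunks(times)
    # Plain integer: fixed-size chunks plus a shorter remainder chunk.
    full, rest = divmod(len(times), spec)
    return (spec,) * full + ((rest,) if rest else ())


# Two years of daily timestamps starting 2001-01-01.
times = [date(2001, 1, 1) + timedelta(days=i) for i in range(730)]
seasons = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}
season_chunks = chunks_for(ToySeasonResampler(seasons), times)
year_chunks = chunks_for(ToyYearResampler(), times)
```

Because no timestamps are dropped, the chunk sizes always sum to the length of the time axis, which is the property that drop_incomplete=False is meant to preserve during chunking.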

dhruvak001 changed the title from "Support chunking" to "Support rechunking to seasonal frequency with SeasonalResampler" on Jul 9, 2025
Comment on lines +1035 to +1043
def _for_chunking(self) -> Self:
    """
    Return a version of this resampler suitable for chunking.

    For SeasonResampler, this returns a version with drop_incomplete=False
    to prevent data from being silently dropped during chunking operations.
    """
    return type(self)(seasons=self.seasons, drop_incomplete=False)


Suggested change
-def _for_chunking(self) -> Self:
-    """
-    Return a version of this resampler suitable for chunking.
-    For SeasonResampler, this returns a version with drop_incomplete=False
-    to prevent data from being silently dropped during chunking operations.
-    """
-    return type(self)(seasons=self.seasons, drop_incomplete=False)


# Create a temporary resampler that ignores drop_incomplete for chunking
# This prevents data from being silently dropped during chunking
resampler_for_chunking = self._for_chunking()

Suggested change
-resampler_for_chunking = self._for_chunking()
+resampler_for_chunking = type(self)(seasons=self.seasons, drop_incomplete=False)

Comment on lines +1140 to +1158
data = create_test_data()
for chunks in [1, 2, 3, 4, 5]:
    rechunked = data.chunk({"dim1": chunks})
    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
        (8 % chunks,) if 8 % chunks else ()
    )

    rechunked = data.chunk({"dim2": chunks})
    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
        (9 % chunks,) if 9 % chunks else ()
    )

    rechunked = data.chunk({"dim1": chunks, "dim2": chunks})
    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
        (8 % chunks,) if 8 % chunks else ()
    )
    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
        (9 % chunks,) if 9 % chunks else ()
    )

Suggested change
-data = create_test_data()
-for chunks in [1, 2, 3, 4, 5]:
-    rechunked = data.chunk({"dim1": chunks})
-    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
-        (8 % chunks,) if 8 % chunks else ()
-    )
-    rechunked = data.chunk({"dim2": chunks})
-    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
-        (9 % chunks,) if 9 % chunks else ()
-    )
-    rechunked = data.chunk({"dim1": chunks, "dim2": chunks})
-    assert rechunked.chunks["dim1"] == (chunks,) * (8 // chunks) + (
-        (8 % chunks,) if 8 % chunks else ()
-    )
-    assert rechunked.chunks["dim2"] == (chunks,) * (9 // chunks) + (
-        (9 % chunks,) if 9 % chunks else ()
-    )

)

# Test standard seasons
rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "JJA", "SON"]))
@dcherian, Jul 18, 2025

we'll need to error on a missing season like this:

        rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "SON"]))
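
One way such a check might look is sketched below, using only the standard library. It assumes that "missing season" means the season strings do not cover all twelve months; months_in_season and check_seasons_cover_year are hypothetical helpers, not xarray functions, and the error the PR finally raises may differ.

```python
MONTH_INITIALS = "JFMAMJJASOND"


def months_in_season(season: str) -> list[int]:
    """Translate a season string such as "DJF" into month numbers (1-12)."""
    # Search the doubled string so that seasons wrapping the year end resolve.
    start = (MONTH_INITIALS * 2).find(season)
    if not season or start < 0:
        raise ValueError(f"unrecognised season {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def check_seasons_cover_year(seasons: list[str]) -> None:
    """Raise if any month is not part of some season (an assumed rule)."""
    covered = {month for season in seasons for month in months_in_season(season)}
    missing = sorted(set(range(1, 13)) - covered)
    if missing:
        raise ValueError(
            f"seasons {seasons} do not cover months {missing}; "
            "rechunking would leave those timestamps without a chunk"
        )


check_seasons_cover_year(["DJF", "MAM", "JJA", "SON"])  # passes silently
```

With ["DJF", "MAM", "SON"] the helper raises, because June, July and August belong to no season.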

chunks = chunks.dropna(name).astype(int)
chunks_tuple: tuple[int, ...] = tuple(chunks.data.tolist())
return chunks_tuple
return resampler.compute_chunks(name, variable)

Suggested change
-return resampler.compute_chunks(name, variable)
+newchunks = resampler.compute_chunks(name, variable)
+if sum(newchunks) != variable.shape[0]:
+    raise ValueError(f"Logic bug in rechunking using {resampler!r}. New chunks tuple does not match size of data. Please open an issue.")
+return newchunks

Let's protect ourselves a bit from logic bugs in the resampler
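
The same guard can be shown in isolation with plain Python. This is a sketch of the pattern only: checked_chunks is a hypothetical wrapper, and the callable stands in for resampler.compute_chunks(name, variable).

```python
from typing import Callable, Sequence


def checked_chunks(
    compute: Callable[[], Sequence[int]], size: int, label: str = "resampler"
) -> tuple[int, ...]:
    """Run a chunk computation and verify that it accounts for every element."""
    newchunks = tuple(compute())
    if sum(newchunks) != size:
        # A mismatch means elements were dropped or double counted upstream.
        raise ValueError(
            f"Logic bug in rechunking using {label}. New chunks tuple "
            f"{newchunks} sums to {sum(newchunks)}, expected {size}."
        )
    return newchunks


good = checked_chunks(lambda: (59, 92, 92, 91, 31), size=365)
```

A resampler that silently drops an incomplete season would return a tuple that sums to less than the axis length, and the guard turns that into a loud failure instead of a malformed dask graph.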

# Test standard seasons
rechunked = ds.chunk(x=2, time=SeasonResampler(["DJF", "MAM", "JJA", "SON"]))
# Should have multiple chunks along time dimension
assert len(rechunked.chunksizes["time"]) > 1

let's assert an actual value here.

Comment on lines +1169 to +1173
N = 365 * 2  # 2 years
if use_cftime:
    time = xr.date_range("2001-01-01", periods=N, freq="D", use_cftime=True)
else:
    time = xr.date_range("2001-01-01", periods=N, freq="D")

Suggested change
-N = 365 * 2  # 2 years
-if use_cftime:
-    time = xr.date_range("2001-01-01", periods=N, freq="D", use_cftime=True)
-else:
-    time = xr.date_range("2001-01-01", periods=N, freq="D")
+N = 366 + 365  # 2 years
+time = xr.date_range("2000-01-01", periods=N, freq="D", use_cftime=use_cftime)

By starting in 2000, we can check leap year logic
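
The effect of the start year on the expected chunk sizes can be checked with standard-library date arithmetic. The sketch assumes daily data in the proleptic Gregorian calendar and a first chunk that holds the partial DJF season (January and February only); days_between is a hypothetical helper.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of daily timestamps in the half-open interval [start, end)."""
    return (end - start).days


# Length of the two-year axis when starting in the leap year 2000.
n_total = days_between(date(2000, 1, 1), date(2002, 1, 1))  # 731 == 366 + 365

# First (partial) DJF chunk: January and February of the start year.
first_chunk_2000 = days_between(date(2000, 1, 1), date(2000, 3, 1))  # 60
first_chunk_2001 = days_between(date(2001, 1, 1), date(2001, 3, 1))  # 59

# First complete DJF chunk: December 2000 through February 2001.
full_djf = days_between(date(2000, 12, 1), date(2001, 3, 1))  # 90
```

Starting in 2001 would never exercise the 29 February case, so a test that asserts the first chunk is 60 long only works with the 2000 start date.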


Can we parameterize this over calendars too? 360_day, noleap and standard should be good enough.
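
To see what a calendar-parameterized test might assert, the expected season lengths can be derived without xarray or cftime. The sketch below is an assumption-laden illustration: days_in_month and expected_chunks are hypothetical helpers, "standard" is approximated by Python's proleptic Gregorian calendar (equivalent for these years), and the data is assumed to be daily, starting on 1 January and spanning whole years.

```python
import calendar as pycalendar

NOLEAP_DAYS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def days_in_month(year: int, month: int, cal_name: str) -> int:
    """Month length under three CF calendar names."""
    if cal_name == "360_day":
        return 30  # every month has exactly 30 days
    if cal_name == "noleap":
        return NOLEAP_DAYS[month - 1]  # February always has 28 days
    if cal_name == "standard":
        return pycalendar.monthrange(year, month)[1]
    raise ValueError(f"unsupported calendar {cal_name!r}")


def expected_chunks(first_year: int, n_years: int, cal_name: str) -> tuple[int, ...]:
    """Season run lengths for daily data covering whole years."""
    chunks: list[int] = []
    previous = None
    for year in range(first_year, first_year + n_years):
        for month in range(1, 13):
            season = SEASON_OF_MONTH[month]
            n_days = days_in_month(year, month, cal_name)
            if season == previous:
                chunks[-1] += n_days  # same season continues
            else:
                chunks.append(n_days)  # a new season starts a new chunk
                previous = season
    return tuple(chunks)


expected = {name: expected_chunks(2000, 2, name) for name in ("360_day", "noleap", "standard")}
```

Note that the length of the time axis itself depends on the calendar (720, 730 and 731 days here), so a parameterized test would need to derive the number of periods per calendar rather than hardcode 366 + 365.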

{"x": 2, "time": SeasonResampler(["DJFM", "AM", "JJA", "SON"])}
)
# Should have multiple chunks along time dimension
assert len(rechunked.chunksizes["time"]) > 1

here too let's assert actual chunks tuple

Reviewers

@dcherian left review comments

Development

Successfully merging this pull request may close these issues.

Support rechunking to seasonal frequency with SeasonalResampler
2 participants
@dhruvak001, @dcherian
