
TYP: preserve type for split functions via structural Protocol #30450


Open
janosh wants to merge 1 commit into numpy:main
from janosh:fix-split-return-type-23005


@janosh commented Dec 16, 2025 (edited)

Add overloads for array_split, split, hsplit, vsplit, and dsplit that preserve the input type when the input implements swapaxes and __getitem__.

This uses structural typing (a Protocol) based on the methods these split functions actually call internally, not __array_function__. The Protocol is:

```python
@type_check_only
class _SupportsSplitOps(Protocol):
    def swapaxes(self, axis1: Any, axis2: Any, copy: Any = ...) -> Any: ...
    def __getitem__(self, key: Any, /) -> Any: ...
```

Objects like pandas.DataFrame that implement these methods (and return Self from them) will have their type preserved through split operations.
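The real overloads live in NumPy's .pyi stubs. As a rough, runnable approximation of the idea (array_split_preserving is a hypothetical wrapper name, not a NumPy API), a TypeVar bound to the Protocol lets a type checker infer list[T] for matching inputs, while other array-likes still fall back to list[NDArray]. At runtime the wrapper simply forwards to np.array_split; only the inferred type changes.

```python
from typing import Any, Protocol, TypeVar, overload

import numpy as np
import numpy.typing as npt


class _SupportsSplitOps(Protocol):
    def swapaxes(self, axis1: Any, axis2: Any, copy: Any = ...) -> Any: ...
    def __getitem__(self, key: Any, /) -> Any: ...


# The bound ties the return type to whatever concrete type was passed in.
_SplitT = TypeVar("_SplitT", bound=_SupportsSplitOps)


@overload
def array_split_preserving(
    ary: _SplitT, indices_or_sections: Any, axis: int = 0
) -> list[_SplitT]: ...
@overload
def array_split_preserving(
    ary: npt.ArrayLike, indices_or_sections: Any, axis: int = 0
) -> list[npt.NDArray[Any]]: ...
def array_split_preserving(
    ary: Any, indices_or_sections: Any, axis: int = 0
) -> list[Any]:
    # Hypothetical wrapper: the runtime behaviour is exactly np.array_split's.
    return np.array_split(ary, indices_or_sections, axis)


pieces = array_split_preserving(np.arange(5), 2)
```

The overload order matters: a type checker tries the Protocol-bound overload first, so only inputs that do not match it fall through to the ndarray result.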

Example

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
splits = np.array_split(df, 2)
splits[0].columns  # before: type error (ndarray has no attribute "columns"); after: works
```

Why this works

The split functions use swapaxes and slicing (__getitem__) internally. For objects like DataFrame that return their own type from these operations, the split results keep that type. This is pure duck typing, not __array_function__ dispatch.
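The mechanism can be sketched in pure Python. array_split_sketch below is a simplified, hypothetical re-creation of the loop in numpy.array_split: it handles only an integer number of sections, and it calls the methods directly where NumPy goes through np.swapaxes. Column is a made-up 1-D container whose methods return its own type. Because every piece is produced by swapaxes and slicing, the pieces come back as Column.

```python
from typing import Any


def array_split_sketch(ary: Any, sections: int, axis: int = 0) -> list[Any]:
    """Simplified re-creation of numpy.array_split's slicing loop (sketch only)."""
    total = ary.shape[axis]
    each, extras = divmod(total, sections)
    # As in NumPy, the first `extras` pieces get one extra element.
    sizes = [each + 1] * extras + [each] * (sections - extras)
    bounds = [0]
    for size in sizes:
        bounds.append(bounds[-1] + size)
    moved = ary.swapaxes(axis, 0)  # bring the split axis to the front
    # Slice along axis 0, then swap the axis back into place.
    return [moved[start:stop].swapaxes(axis, 0) for start, stop in zip(bounds, bounds[1:])]


class Column:  # hypothetical 1-D container whose methods return its own type
    def __init__(self, values: Any) -> None:
        self.values = list(values)

    @property
    def shape(self) -> tuple[int, ...]:
        return (len(self.values),)

    def swapaxes(self, axis1: int, axis2: int) -> "Column":
        return self  # a 1-D object has only axis 0, so the swap is a no-op

    def __getitem__(self, key: slice) -> "Column":
        return Column(self.values[key])


pieces = array_split_sketch(Column(range(5)), 2)
print([p.values for p in pieces])  # [[0, 1, 2], [3, 4]]
```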

Closes #23005

@shoyer (Member)

The design for __array_function__ (see NEP-18) predates support for type checking in NumPy, but it isn't obvious to me that adding typing overloads for non-NumPy arrays is a good idea. This is particularly true for NumPy APIs acting on types like pandas.DataFrame, which intentionally deviate from NumPy's design and implement only an incomplete subset of it.

Generally speaking, my recommendation would be not to try to combine type checking and __array_function__. There is basically no way to guarantee type safety here, given its highly dynamic design. The __array_module__ API is a much saner (but more limited) approach.

@janosh changed the title from "TYP: preserve type for split functions with __array_function__ objects" to "TYP: preserve type for split functions via structural Protocol" on Dec 17, 2025
@janosh (Author)

Thanks for the feedback! I updated the PR title and description; they were outdated.

To clarify: this PR does not use __array_function__ at all. The approach is purely structural typing via a Protocol:

```python
class _SupportsSplitOps(Protocol):
    def swapaxes(self, axis1: Any, axis2: Any, copy: Any = ...) -> Any: ...
    def __getitem__(self, key: Any, /) -> Any: ...
```

The split functions internally use swapaxes and slicing. For objects like DataFrame that implement these methods and return Self, the type is preserved. This is just duck typing; no __array_function__ dispatch is involved.

The Protocol matches what the implementation actually does, so it should be type-safe for any object that properly implements these methods.

@shoyer (Member)

OK, interesting. I'm still hesitant to recommend relying on this behavior, but it's been around in NumPy long enough that it's probably not going to change.

The real challenge is that, strictly following the implementation, we don't know what the return type should be; in principle, it could be an entirely different type.
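A small, hypothetical illustration of that concern: if __getitem__ does not return Self, the pieces are not of the input type. ListRows below is made up for illustration. Its swapaxes returns self, but slicing returns a plain list. A list has no swapaxes method, so when array_split swaps the axis back on each slice, np.swapaxes falls back to converting the slice to an ndarray.

```python
import numpy as np


class ListRows:  # hypothetical: swapaxes returns Self, but slicing returns a list
    def __init__(self, values):
        self.values = list(values)

    @property
    def shape(self):
        return (len(self.values),)

    def swapaxes(self, axis1, axis2):
        return self  # 1-D, so swapping axis 0 with itself is a no-op

    def __getitem__(self, key):
        return self.values[key]  # a list, not a ListRows


parts = np.array_split(ListRows([1, 2, 3, 4]), 2)
# Each slice is a list without .swapaxes, so np.swapaxes converts it to an ndarray.
print([type(p).__name__ for p in parts])
```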

I would still suggest that anybody concerned with type safety should avoid these implicitly overloaded NumPy APIs. In retrospect, they were clearly a design mistake.

@jorenham self-requested a review on December 17, 2025 08:16

@jorenham (Member) left a comment (edited)

Thanks for this PR.

Overall this looks fine to me. The shapes are retained, so even if the input arrays include static shape-typing information, this will still be sound.
One issue with this protocol is that it isn't complete. If you look at the array_split implementation, for example, you'll see:

```python
try:
    Ntotal = ary.shape[axis]
except AttributeError:
    Ntotal = len(ary)
```

But as it currently stands, there are objects assignable to _SupportsSplitOps that have neither a shape nor a __len__. Please take a closer look at each implementation to see what other methods/attributes the Protocol should have.
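The gap is easy to hit at runtime. The hypothetical class below satisfies the original two-method Protocol. However, np.array_split first tries ary.shape and then falls back to len(ary), so it fails with a TypeError before any splitting happens.

```python
import numpy as np


class NoLength:  # hypothetical: has swapaxes and __getitem__, but no shape or __len__
    def swapaxes(self, axis1, axis2, copy=None):
        return self

    def __getitem__(self, key):
        return self


try:
    np.array_split(NoLength(), 2)
except TypeError as exc:  # raised by the len(ary) fallback
    error = exc
else:
    error = None
print(repr(error))
```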

NumPy recently dropped support for Python 3.11, so we're now using PEP 695 syntax instead of TypeVar. You'll see ruff complaining about this if you run ruff check.

This needs tests to verify that it works as intended. We don't have pandas available in our test environment, so you'll have to duck-type something (that also works at runtime) instead. You can find the tests under typing/tests/data.


@jorenham (Member)

The primer error is unrelated.

Add overloads for array_split, split, hsplit, vsplit, and dsplit that
preserve the input type when the input implements _SupportsSplitOps.

The Protocol requires:
- shape: tuple[int, ...] - for ary.shape[axis] access
- ndim: int - for dimensional checks in hsplit/vsplit/dsplit
- swapaxes(axis1, axis2) -> Self - for axis manipulation
- __getitem__(key) -> Self - for slicing

Uses PEP 695 syntax and adds tests with a duck-typed SplitableArray
class that implements the protocol.

Closes numpy#23005
@janosh force-pushed the fix-split-return-type-23005 branch from a2a391b to 627f537 on December 18, 2025 05:21
@janosh (Author)

@jorenham thanks for taking a look and for the great feedback. I resolved the merge conflicts and I think I've addressed all your suggestions.

```python
    def shape(self) -> tuple[int, ...]: ...
    @property
    def ndim(self) -> int: ...
    def swapaxes(self, axis1: SupportsIndex, axis2: SupportsIndex, /) -> Self: ...
```
Member

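I'm not sure whether this would cause any issues for pandas, but axis1 and axis2 are in contravariant (input) positions, so it would be more general to annotate them as int. A Protocol that requires accepting any SupportsIndex rejects implementations whose axis parameters are typed as plain int.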


Reviewers: @jorenham requested changes (requested changes must be addressed to merge this pull request)
Assignees: no one assigned
Projects: none yet
Milestone: no milestone
Development: successfully merging this pull request may close the issue "TYP: wrong return type when applying np.split to pd.DataFrame"
3 participants: @janosh, @shoyer, @jorenham
