Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork11.9k
TYP: preserve type for split functions via structural Protocol#30450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
b56fe9d toa2a391bCompareshoyer commentedDec 17, 2025
The design for Generally speaking my recommendation would be not to try to combine type checking and |
janosh commentedDec 17, 2025
thanks for the feedback! i updated the PR title and description - they were outdated. to clarify: this PR doesnot use class_SupportsSplitOps(Protocol):defswapaxes(self,axis1:Any,axis2:Any,copy:Any= ...)->Any: ...def__getitem__(self,key:Any,/)->Any: ... the split functions internally use the Protocol matches what the implementation actually does, so it should be type-safe for any object that properly implements these methods. |
shoyer commentedDec 17, 2025
OK, interesting. I'm still hesitant to recommend relying on this behavior, but it's been around in NumPy long enough that it's probaby not going to change. The real challenge is that strictly following the implementation, we don't know what the return type should be -- it could be an entirely different type, principle. I would still suggest that anybody concerned with type safety should avoid these implicitly overloaded NumPy APIs. In retrospect, they were clearly a design mistake. |
jorenham left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for this PR.
Overall this looks fine to me. The shapes are retained, so even if the input arrays include static shape-typing information, this will still be sound.
One issue with this protocol is that it isn't complete. If you look at thearray_split implementation, for example, you'll see
try:Ntotal=ary.shape[axis]exceptAttributeError:Ntotal=len(ary)
But as it currently stands, there are objects assignable to_SupportsSplitOps that have neither ashape nor__len__. Please take a closer look at each implementation to see what other methods/attributes theProtocol should have.
Recently, NumPy dropped support for Python 3.11, so we're using PEP 695 syntax now, in favor ofTypeVar. You'll see ruff complaining about this if you runruff check.
This needs tests to verify that it is working as intended. We don't have Pandas available in our test environment, so you'll have to ducktype something (that'll also work at runtime) instead. You can find them undertyping/tests/data
This comment has been minimized.
This comment has been minimized.
jorenham commentedDec 17, 2025
The primer error is unrelated |
Add overloads for array_split, split, hsplit, vsplit, and dsplit thatpreserve the input type when the input implements _SupportsSplitOps.The Protocol requires:- shape: tuple[int, ...] - for ary.shape[axis] access- ndim: int - for dimensional checks in hsplit/vsplit/dsplit- swapaxes(axis1, axis2) -> Self - for axis manipulation- __getitem__(key) -> Self - for slicingUses PEP 695 syntax and adds tests with a duck-typed SplitableArrayclass that implements the protocol.Closesnumpy#23005
a2a391b to627f537Comparejanosh commentedDec 18, 2025
@jorenham thanks for taking a look and the great feedback. i resolved merge conflicts and i think addressed all your suggestions |
| defshape(self)->tuple[int, ...]: ... | ||
| @property | ||
| defndim(self)->int: ... | ||
| defswapaxes(self,axis1:SupportsIndex,axis2:SupportsIndex,/)->Self: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not sure if this would cause any issues for pandas, but theaxis{1,2} are in contravariant (input) positions, so it would be more general to annotate them as: int.
Uh oh!
There was an error while loading.Please reload this page.
add overloads for
array_split,split,hsplit,vsplit, anddsplitthat preserve the input type when the input implementsswapaxesand__getitem__.this uses structural typing (Protocol) based on the methods actually used by these split functions internally - not
__array_function__. the Protocol is:objects like
pandas.DataFramethat implement these methods (and returnSelffrom them) will have their type preserved through split operations.example
why this works
the split functions use
swapaxesand slicing (__getitem__) internally. for objects like DataFrame that return their own type from these operations, the split result preserves the type. this is pure duck-typing, not__array_function__dispatch.closes#23005