Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH: Implemented MultiIndex.searchsorted method ( GH14833)#61435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
GSAUC3 wants to merge18 commits intopandas-dev:main
base:main
Choose a base branch
Loading
fromGSAUC3:searchsorted_branch

Conversation

GSAUC3
Copy link

@GSAUC3GSAUC3 commentedMay 12, 2025
edited
Loading

@datapythonista
Copy link
Member

Thanks@GSAUC3 for the PR. Is there and issue, or has there been any discussion about this elsewhere?

@GSAUC3
Copy link
Author

GSAUC3 commentedMay 12, 2025
edited
Loading

Hi@datapythonista, this was the issue#14833 , against which i made the pull request.

@GSAUC3
Copy link
Author

HI, I had ran pytest and pre-commit locally, is it possible to run all these test locally?

@RB-zentronlabs
Copy link

@GSAUC3 Add a return statement in your function afterexcept block.

  • check teh docstring and proper typing hints,IndexOpsMixin check this class, where searchsorted has a properly defined parameter structure.
GSAUC3 reacted with thumbs up emoji

@GSAUC3
Copy link
Author

GSAUC3 commentedMay 15, 2025
edited
Loading

hi,@datapythonista , I am having trouble running all these tests, locally, before committing my code, so far i have only ran the pytest, locally, and that worked, could you please, guide me, how to set up the testing environment locally, before each commit?

will running
pre-commit run --all-files
suffice?

@datapythonista
Copy link
Member

pre-commit should run automatically if it's set up to work as intended. You have all the information on how to set up the development environment, run tests... in the development documentation:https://pandas.pydata.org/docs/development/index.html

@RB-zentronlabs
Copy link

RB-zentronlabs commentedMay 15, 2025
edited
Loading

Hi,@datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

=================================== FAILURES ===================================__________________________ test_searchsorted[tuples] ___________________________[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 1[48](https://github.com/pandas-dev/pandas/actions/runs/15033468287/job/42250710830?pr=61435#step:5:52)33___________________ test_searchsorted[mi-with-dt64tz-level] ____________________[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 14833___________________________ test_searchsorted[multi] ___________________________

@datapythonista
Copy link
Member

Hi,@datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

Yes, that's correct. I guess we have an xfail for the test that should be removed.

GSAUC3 reacted with thumbs up emoji

@GSAUC3
Copy link
Author

Hi@datapythonista . Thank you for your suggestions, I've addressed the feedback from earlier and the CI checks are now passing. This PR should be ready for review whenever you get a chance. Please let me know if any changes are required. Thanks again!

RB-zentronlabs reacted with thumbs up emoji

@GSAUC3GSAUC3 changed the titleSearchsorted branchImplemented MultiIndex.searchsorted method (bug #14833)May 17, 2025
@GSAUC3GSAUC3 changed the titleImplemented MultiIndex.searchsorted method (bug #14833)ENH: Implemented MultiIndex.searchsorted method ( GH14833)May 17, 2025
Copy link
Member

@datapythonistadatapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good, added few comments.

You will have to add a not in the whatsnew for 3.0. You can check other PRs to see how we do it.

GSAUC3 reacted with thumbs up emoji
Comment on lines 150 to 157
#if isinstance(obj, pd.MultiIndex):
# # See gh-14833
# request.applymarker(
# pytest.mark.xfail(
# reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833"
# )
# )
if obj.dtype.kind == "c" and isinstance(obj, Index):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you remove this instead of commenting please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Absolutely, I will do it right away.

.gitignore Outdated
@@ -141,3 +141,5 @@ doc/source/savefig/
# Pyodide/WASM related files #
##############################
/.pyodide-xbuildenv-*

*.ipynb

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Better to remove this. We do have some notebooks in this repo iirc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, I will remove it.

@@ -1029,3 +1029,28 @@ def test_get_loc_namedtuple_behaves_like_tuple():
assert idx.get_loc(("i1", "i2")) == 0
assert idx.get_loc(("i3", "i4")) == 1
assert idx.get_loc(("i5", "i6")) == 2


def test_searchsorted():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it would be better to divide this test. Ideally, when a test fails, we want to know what's wrong just by checking the test name, and also we want to still test everything else. With a single test, the first thing that fails will make the whole test fail. You can use pytest fixtures and parametrize to not repeat too much code.

Copy link
Author

@GSAUC3GSAUC3May 19, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for the suggestion, it is quite insightful for me. I have created three different functions as follows:

deftest_searchsorted_single():...# for single inputsdeftest_searchsorted_lists():...# list of single inputsdeftest_searchsorted_invalid():...# for invalid inputs

Would this be the correct way to do it?

RB-zentronlabs reacted with thumbs up emoji

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, that may seem silly, but then when a test fails it's very easy to know what it fails. Ideally an assert per test, and if more than one input is asserted, then pytest.mark.parametrize to reuse a test with multiple inputs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No no, it is not at all silly. I understand the importance of descriptive test functions. Thank you.

RB-zentronlabs reacted with thumbs up emoji
@GSAUC3
Copy link
Author

Hi@datapythonista,
I hope you're doing well. Apologies if this is a basic question—I'm still relatively new to open source, and I noticed that the pull request now shows an “outdated” tag on some of the files I contributed to.
I'm not entirely sure what that means. Should I be concerned about it? Should i update the branch?
Thanks in advance for your guidance!

Copy link
Member

@datapythonistadatapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks@GSAUC3, this looks good to me now. I'll let someone else review it before merging.

Don't worry about the branch not being updated. This will be the case almost always, as soon as other PRs get merged, which is most days. The only cases this is an issue is when there are conflicts, but this is not the case here.

Copy link
Member

@mroeschkemroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the proper solution would involve possibly allowingalgorithms.searchsorted to support 2D inputs. The generalsearchsorted pattern is to dispatch to the array, so thatExtensionArray automatically gain support as well. However,ExtensionArray only support 1D data so I think progress on that front is needed first

if not value:
raise ValueError("searchsorted requires a non-empty value")

if not isinstance(value, (tuple, list)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These types do not match what the type annotation suggest

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the comment, you're absolutely right.
The current runtime check assumes value must be a tuple or list of tuples but the type annotation allows for array-like or ExtensionArray inputs, which may not satisfy that condition.
I will update this.

Comment on lines 3864 to 3879
dtype = np.dtype(
[
(f"level_{i}", np.asarray(level).dtype)
for i, level in enumerate(self.levels)
]
)

val_array = np.array([v], dtype=dtype)

pos = np.searchsorted(
np.asarray(self.values, dtype=dtype),
val_array,
side=side,
sorter=sorter,
)
result.append(np.intp(pos[0]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This probably won't preserve pandas nullable extension types

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hi@mroeschke, thanks for the comment.

Would it make sense to switch to using:

dtype=np.dtype(    [(f"level_{i}",level.dtype)fori,levelinenumerate(self.levels)])

My concern here is that level.dtype might include extension dtypes (e.g., StringDtype, Int64Dtype) which are not supported by NumPy structured arrays and could raise a TypeError.

An alternative would be to avoid np.searchsorted entirely in the fallback path and implement a manual binary search over the MultiIndex.values, which would preserve pandas’ nullable types and avoid reliance on NumPy's structured dtypes.

Would you prefer we go with the binary search approach instead, or is there another path you'd recommend?

Thanks!

@GSAUC3
Copy link
Author

Hi@mroeschke, thank you for the review and helpful feedback.

I understand that ExtensionArray currently only supports 1D data, and making it work with 2D inputs would likely take some deeper changes.

If the long-term goal is to update algorithms.searchsorted to support 2D inputs and dispatch to the array — so that ExtensionArray can benefit automatically — I’d be happy to help with that.

Please let me know how you’d like to move forward. I’d be glad to contribute to any changes or help explore what’s needed.

@GSAUC3GSAUC3 requested a review frommroeschkeMay 31, 2025 17:36
@GSAUC3
Copy link
Author

Hi@mroeschke, I've made the required changes.
I had a question; would it be appropriate to implement this using binary search?
I already have a working implementation ready, and I'm happy to push it if that's the recommended approach.
Let me know what you think!

@GSAUC3
Copy link
Author

Hi@datapythonista and@mroeschke 👋,

I hope you're both doing well! Just a gentle reminder regardingPR #61435. I've addressed the requested changes and would appreciate it if you could take a look when you have a moment. Please let me know if any further modifications are needed.

Thank you very much for your time and guidance!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@datapythonistadatapythonistadatapythonista approved these changes

@mroeschkemroeschkeAwaiting requested review from mroeschke

Requested changes must be addressed to merge this pull request.

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Multiarray searchsorted fails
4 participants
@GSAUC3@datapythonista@RB-zentronlabs@mroeschke

[8]ページ先頭

©2009-2025 Movatter.jp