NotificationsYou must be signed in to change notification settings
Fork18.6k
Star45.9k

ENH: Implemented MultiIndex.searchsorted method ( GH14833)#61435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

GSAUC3 wants to merge18 commits intopandas-dev:main

base:main

Choose a base branch

fromGSAUC3:searchsorted_branch

Open

ENH: Implemented MultiIndex.searchsorted method ( GH14833)#61435

GSAUC3 wants to merge18 commits intopandas-dev:mainfromGSAUC3:searchsorted_branch

Conversation

Copy link

GSAUC3 commentedMay 12, 2025•
edited
Loading

closesMultiarray searchsorted fails #14833
Tests added and passed if fixing a bug or adding a new feature
Allcode checks passed.
Addedtype annotations to new arguments/methods/functions.
Added an entry in the latestdoc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

GSAUC3 added4 commits

May 12, 2025 22:48

sortedarray in side multi.py implemented, testing pending

cffb863

implemented the searchsorted() method, w.r.t issuepandas-dev#18433

1ba7ff8

modified test_searchsorted, discarded the use of numpy.testing

ac70f3e

applying pre-commit fixes

275b0e2

Copy link

Member

datapythonista commentedMay 12, 2025

Thanks@GSAUC3 for the PR. Is there and issue, or has there been any discussion about this elsewhere?

Copy link

Author

GSAUC3 commentedMay 12, 2025•
edited
Loading

Hi@datapythonista, this was the issue#14833 , against which i made the pull request.

datapythonista added Enhancement MultiIndex labels

May 12, 2025

Copy link

Author

GSAUC3 commentedMay 12, 2025

HI, I had ran pytest and pre-commit locally, is it possible to run all these test locally?

Copy link

RB-zentronlabs commentedMay 13, 2025

@GSAUC3 Add a return statement in your function afterexcept block.

check teh docstring and proper typing hints,IndexOpsMixin check this class, where searchsorted has a properly defined parameter structure.

GSAUC3 added3 commits

May 14, 2025 00:27

fixed the returned statement error

0e0b9b5

solved the mypy type checking error, and implemented searchsorted und…

4747609

…er MultiIndex()

Merge branch 'main' into searchsorted_branch

9ac62ab

Copy link

Author

GSAUC3 commentedMay 15, 2025•
edited
Loading

hi,@datapythonista , I am having trouble running all these tests, locally, before committing my code, so far i have only ran the pytest, locally, and that worked, could you please, guide me, how to set up the testing environment locally, before each commit?

will running
pre-commit run --all-files
suffice?

Copy link

Member

datapythonista commentedMay 15, 2025

pre-commit should run automatically if it's set up to work as intended. You have all the information on how to set up the development environment, run tests... in the development documentation:https://pandas.pydata.org/docs/development/index.html

Copy link

RB-zentronlabs commentedMay 15, 2025•
edited
Loading

Hi,@datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

=================================== FAILURES ===================================__________________________ test_searchsorted[tuples] ___________________________[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 1[48](https://github.com/pandas-dev/pandas/actions/runs/15033468287/job/42250710830?pr=61435#step:5:52)33___________________ test_searchsorted[mi-with-dt64tz-level] ____________________[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 14833___________________________ test_searchsorted[multi] ___________________________

Copy link

Member

datapythonista commentedMay 15, 2025

Hi,@datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

Yes, that's correct. I guess we have an xfail for the test that should be removed.

GSAUC3 added4 commits

May 17, 2025 10:02

Closes the GH 14833 issue

e2c2c5e

pre-commit run successful, closes issue 14833

e88da57

fixed incompatible return type issue, and closes issue 14833

94f7c44

fixed typing checks; closes issue 14833

1f4a1c9

Copy link

Author

GSAUC3 commentedMay 17, 2025

Hi@datapythonista . Thank you for your suggestions, I've addressed the feedback from earlier and the CI checks are now passing. This PR should be ready for review whenever you get a chance. Please let me know if any changes are required. Thanks again!

GSAUC3 changed the title~~Searchsorted branch~~Implemented MultiIndex.searchsorted method (bug #14833)

May 17, 2025

GSAUC3 changed the title~~Implemented MultiIndex.searchsorted method (bug #14833)~~ENH: Implemented MultiIndex.searchsorted method ( GH14833)

May 17, 2025

datapythonista reviewed

May 19, 2025

View reviewed changes

Copy link

Member

datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good, added few comments.

You will have to add a not in the whatsnew for 3.0. You can check other PRs to see how we do it.

pandas/tests/base/test_misc.py Outdated

Comment on lines 150 to 157

		#if isinstance(obj, pd.MultiIndex):
		# # See gh-14833
		# request.applymarker(
		# pytest.mark.xfail(
		# reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833"
		# )
		# )
		if obj.dtype.kind == "c" and isinstance(obj, Index):

Copy link

Member

datapythonistaMay 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you remove this instead of commenting please?

Copy link

Author

GSAUC3May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Absolutely, I will do it right away.

.gitignore Outdated

		@@ -141,3 +141,5 @@ doc/source/savefig/
		# Pyodide/WASM related files #
		##############################
		/.pyodide-xbuildenv-*

		*.ipynb

Copy link

Member

datapythonistaMay 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Better to remove this. We do have some notebooks in this repo iirc.

Copy link

Author

GSAUC3May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, I will remove it.

pandas/tests/indexes/multi/test_indexing.py Outdated

		@@ -1029,3 +1029,28 @@ def test_get_loc_namedtuple_behaves_like_tuple():
		assert idx.get_loc(("i1", "i2")) == 0
		assert idx.get_loc(("i3", "i4")) == 1
		assert idx.get_loc(("i5", "i6")) == 2


		def test_searchsorted():

Copy link

Member

datapythonistaMay 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it would be better to divide this test. Ideally, when a test fails, we want to know what's wrong just by checking the test name, and also we want to still test everything else. With a single test, the first thing that fails will make the whole test fail. You can use pytest fixtures and parametrize to not repeat too much code.

Copy link

Author

GSAUC3May 19, 2025•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for the suggestion, it is quite insightful for me. I have created three different functions as follows:

deftest_searchsorted_single():...# for single inputsdeftest_searchsorted_lists():...# list of single inputsdeftest_searchsorted_invalid():...# for invalid inputs

Would this be the correct way to do it?

Copy link

Member

datapythonistaMay 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, that may seem silly, but then when a test fails it's very easy to know what it fails. Ideally an assert per test, and if more than one input is asserted, then pytest.mark.parametrize to reuse a test with multiple inputs.

Copy link

Author

GSAUC3May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No no, it is not at all silly. I understand the importance of descriptive test functions. Thank you.

GSAUC3 added3 commits

May 19, 2025 23:05

updated whatsnew list, added more descriptive tests

ffd99d8

warning in rst file solved. closes issue 14833

5e2caa4

added searchsorted method in indexing.rst

6b0d0ab

Copy link

Author

GSAUC3 commentedMay 21, 2025

Hi@datapythonista,
I hope you're doing well. Apologies if this is a basic question—I'm still relatively new to open source, and I noticed that the pull request now shows an “outdated” tag on some of the files I contributed to.
I'm not entirely sure what that means. Should I be concerned about it? Should i update the branch?
Thanks in advance for your guidance!

datapythonista approved these changes

May 21, 2025

View reviewed changes

Copy link

Member

datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks@GSAUC3, this looks good to me now. I'll let someone else review it before merging.

Don't worry about the branch not being updated. This will be the case almost always, as soon as other PRs get merged, which is most days. The only cases this is an issue is when there are conflicts, but this is not the case here.

mroeschke requested changes

May 27, 2025

View reviewed changes

Copy link

Member

mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the proper solution would involve possibly allowingalgorithms.searchsorted to support 2D inputs. The generalsearchsorted pattern is to dispatch to the array, so thatExtensionArray automatically gain support as well. However,ExtensionArray only support 1D data so I think progress on that front is needed first

pandas/core/indexes/multi.py Outdated

		if not value:
		raise ValueError("searchsorted requires a non-empty value")

		if not isinstance(value, (tuple, list)):

Copy link

Member

mroeschkeMay 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These types do not match what the type annotation suggest

Copy link

Author

GSAUC3May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the comment, you're absolutely right.
The current runtime check assumes value must be a tuple or list of tuples but the type annotation allows for array-like or ExtensionArray inputs, which may not satisfy that condition.
I will update this.

pandas/core/indexes/multi.py Outdated

Comment on lines 3864 to 3879

		dtype = np.dtype(
		[
		(f"level_{i}", np.asarray(level).dtype)
		for i, level in enumerate(self.levels)
		]
		)

		val_array = np.array([v], dtype=dtype)

		pos = np.searchsorted(
		np.asarray(self.values, dtype=dtype),
		val_array,
		side=side,
		sorter=sorter,
		)
		result.append(np.intp(pos[0]))

Copy link

Member

mroeschkeMay 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This probably won't preserve pandas nullable extension types

Copy link

Author

GSAUC3May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hi@mroeschke, thanks for the comment.

Would it make sense to switch to using:

dtype=np.dtype(    [(f"level_{i}",level.dtype)fori,levelinenumerate(self.levels)])

My concern here is that level.dtype might include extension dtypes (e.g., StringDtype, Int64Dtype) which are not supported by NumPy structured arrays and could raise a TypeError.

An alternative would be to avoid np.searchsorted entirely in the fallback path and implement a manual binary search over the MultiIndex.values, which would preserve pandas’ nullable types and avoid reliance on NumPy's structured dtypes.

Would you prefer we go with the binary search approach instead, or is there another path you'd recommend?

Thanks!

Copy link

Author

GSAUC3 commentedMay 28, 2025

Hi@mroeschke, thank you for the review and helpful feedback.

I understand that ExtensionArray currently only supports 1D data, and making it work with 2D inputs would likely take some deeper changes.

If the long-term goal is to update algorithms.searchsorted to support 2D inputs and dispatch to the array — so that ExtensionArray can benefit automatically — I’d be happy to help with that.

Please let me know how you’d like to move forward. I’d be glad to contribute to any changes or help explore what’s needed.

GSAUC3 added2 commits

May 30, 2025 21:35

ensured typing hints consistency, closes issue 14833

73e308b

fixed mypy error, closes issue 14833

1342657

GSAUC3 requested a review frommroeschke

May 31, 2025 17:36

binary search approach for searchsorted(), closes issue 14833

68a3b81

Copy link

Author

GSAUC3 commentedJun 3, 2025

Hi@mroeschke, I've made the required changes.
I had a question; would it be appropriate to implement this using binary search?
I already have a working implementation ready, and I'm happy to push it if that's the recommended approach.
Let me know what you think!

fixed the overload pyright error

a681c2d

Copy link

Author

GSAUC3 commentedJun 19, 2025

Hi@datapythonista and@mroeschke 👋,

I hope you're both doing well! Just a gentle reminder regardingPR #61435. I've addressed the requested changes and would appreciate it if you could take a look when you have a moment. Please let me know if any further modifications are needed.

Thank you very much for your time and guidance!

Labels

Enhancement MultiIndex

4 participants

Movatterモバイル変換

Uh oh!

ENH: Implemented MultiIndex.searchsorted method ( GH14833)#61435

Are you sure you want to change the base?

ENH: Implemented MultiIndex.searchsorted method ( GH14833)#61435

Uh oh!

Conversation

GSAUC3 commentedMay 12, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

datapythonista commentedMay 12, 2025

Uh oh!

GSAUC3 commentedMay 12, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

GSAUC3 commentedMay 12, 2025

Uh oh!

RB-zentronlabs commentedMay 13, 2025

Uh oh!

GSAUC3 commentedMay 15, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

datapythonista commentedMay 15, 2025

Uh oh!

RB-zentronlabs commentedMay 15, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

datapythonista commentedMay 15, 2025

Uh oh!

GSAUC3 commentedMay 17, 2025

Uh oh!

datapythonista left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

GSAUC3May 19, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

GSAUC3 commentedMay 21, 2025

Uh oh!

datapythonista left a comment

Choose a reason for hiding this comment

Uh oh!

mroeschke left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

GSAUC3 commentedMay 28, 2025

Uh oh!

GSAUC3 commentedJun 3, 2025

Uh oh!

GSAUC3 commentedJun 19, 2025

Uh oh!

Uh oh!

GSAUC3 commentedMay 12, 2025•
edited
Loading

GSAUC3 commentedMay 12, 2025•
edited
Loading

GSAUC3 commentedMay 15, 2025•
edited
Loading

RB-zentronlabs commentedMay 15, 2025•
edited
Loading

GSAUC3May 19, 2025•
edited
Loading