PERF: Restore old performances with .isin() on columns typed as np.ui…#61320


Merged
mroeschke merged 2 commits into pandas-dev:main from pbrochart:main
May 19, 2025

Conversation

@pbrochart (Contributor) commented Apr 20, 2025 (edited)

…nt64

Only if dtypes are equal (e.g. uint64 vs. uint64, uint32 vs. uint32, ...)

%timeit data["uints"].isin([np.uint64(1), np.uint64(2)]) # 17ms (!)
The last line, with older numpy==1.26.4 (last version <2.0), is even worse: ~200ms.
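The dtype-equality condition above can be motivated by NumPy's promotion rules. The following is a minimal sketch, not the code changed in this PR: it only queries `np.result_type` to show which dtype pairs promote to float64 and which stay integral.

```python
import numpy as np

# uint64 and int64 have no common integer type, so NumPy promotes to float64.
mixed = np.result_type(np.uint64, np.int64)

# Equal dtypes need no promotion at all.
same_u64 = np.result_type(np.uint64, np.uint64)
same_u32 = np.result_type(np.uint32, np.uint32)

# A smaller unsigned type still fits into int64, so the result stays integral.
u32_i64 = np.result_type(np.uint32, np.int64)

print(mixed, same_u64, same_u32, u32_i64)
```

When both sides already share a dtype, no promotion takes place, which is why the float64 precision problem cannot arise in that case.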

@pbrochart (Contributor, Author) commented:

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@pbrochart (Contributor, Author) commented:

Implicit conversion to float64 happens only with uint64/int64.
I reverted PR #46693 to provide an example based on the initial issue #46485:

import pandas as pd
import numpy as np
test_df = pd.DataFrame([{'a': 1378774140726870442}], dtype=np.uint64)
print(1378774140726870442 == 1378774140726870528)  # False
print(test_df['a'].isin([1378774140726870528])[0])  # True
print(test_df['a'].isin([1])[0])  # False

The second print should be False. PR #46693 fixed this case,
which arises from the implicit conversion to float64.
But if we change it to:

print(test_df['a'].isin([np.uint64(1378774140726870528)])[0])  # False

The result is correct because in this case there is no implicit conversion, so it is not necessary to fall back to object dtype.
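To make the float64 collision concrete, here is a small sketch using only NumPy scalars. It does not call pandas and is not part of the PR. It only shows that the two integers from the example differ as uint64 but round to the same float64.

```python
import numpy as np

a = 1378774140726870442
b = 1378774140726870528

# As uint64 the two values are distinct.
ints_equal = bool(np.uint64(a) == np.uint64(b))

# float64 has a 53-bit significand; between 2**60 and 2**61 adjacent
# floats are 256 apart, so both integers round to the same float64.
floats_equal = bool(np.float64(a) == np.float64(b))

# Converting the rounded float back to int yields b, not a.
roundtrip = int(np.float64(a))

print(ints_equal, floats_equal, roundtrip)
```

This is the reason a comparison routed through float64 reports a match, while a comparison that stays in uint64 does not.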
Regarding performance, this partially resolves issue #60098:

Before:

import pandas as pd, numpy as np
data = pd.DataFrame({
    "uints": np.random.randint(10000, size=300000, dtype=np.uint64),
    "ints": np.random.randint(10000, size=300000, dtype=np.int64),
})
%timeit data["uints"].isin([np.uint64(1), np.uint64(2)]) # 239ms

After:

import pandas as pd, numpy as np
data = pd.DataFrame({
    "uints": np.random.randint(10000, size=300000, dtype=np.uint64),
    "ints": np.random.randint(10000, size=300000, dtype=np.int64),
})
%timeit data["uints"].isin([np.uint64(1), np.uint64(2)]) # 4ms
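The `%timeit` magic only works in IPython. For anyone who wants to reproduce the measurement as a plain script, a rough equivalent using the standard `timeit` module might look like the sketch below. The run count of 5 is an arbitrary choice, and the timings will vary by machine and by pandas/NumPy version.

```python
import timeit

import numpy as np
import pandas as pd

data = pd.DataFrame({
    "uints": np.random.randint(10000, size=300000, dtype=np.uint64),
})
values = [np.uint64(1), np.uint64(2)]

# Compute the mask once so its correctness can be checked separately.
mask = data["uints"].isin(values)

# Average wall-clock seconds per call over a handful of runs.
runs = 5  # arbitrary; increase for more stable numbers
per_call = timeit.timeit(lambda: data["uints"].isin(values), number=runs) / runs
print(f"{per_call * 1e3:.2f} ms per call")
```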

@rhshadrach (Member) left a comment:

lgtm

@rhshadrach added the Performance, Regression, and isin labels May 19, 2025
@rhshadrach added this to the 3.0 milestone May 19, 2025
@mroeschke merged commit eca6bd3 into pandas-dev:main May 19, 2025
54 checks passed
@mroeschke (Member) commented:

Thanks @pbrochart

Reviewers: mroeschke (approved), rhshadrach (approved)
Assignees: No one assigned
Labels: isin, Performance, Regression
Projects: None yet
Milestone: 3.0
Development: Successfully merging this pull request may close the issue "PERF: Slowdowns with .isin() on columns typed as np.uint64"
Participants (3): pbrochart, mroeschke, rhshadrach
