BUG: Fix #57608: queries on categorical string columns in HDFStore.select() return unexpected results. #61225

Merged
mroeschke merged 4 commits into pandas-dev:main from SofiaSM45:bugfix-branch
May 20, 2025

Conversation

SofiaSM45
Contributor

In `__init__()` of class `Selection` (pandas/core/io/pytables.py), `self.terms.evaluate()` was not returning the correct value for the `where` condition. The issue stemmed from `convert_value()` of class `BinOp` (pandas/core/computation/pytables.py), where `searchsorted()` did not return the correct index when matching the `where` condition against the metadata (the categories table). Replacing `searchsorted()` with `np.where()` resolves this issue.
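The failure mode described above can be reproduced with plain NumPy. The following is a minimal sketch, assuming the categories array is stored in an arbitrary, unsorted order; the array contents are made up for illustration and are not taken from the pandas test suite.

```python
import numpy as np

# Hypothetical categories table, stored in insertion order (not sorted).
metadata = np.array(["b", "c", "a"])
conv_val = "a"

# searchsorted runs a binary search and assumes sorted input, so on
# unsorted data it returns an insertion point rather than the position
# of the matching element.
wrong = metadata.searchsorted(conv_val, side="left")

# A linear scan finds the actual position of the first match.
right = np.where(metadata == conv_val)[0][0]

print(wrong, right)
```

Because the category code written to the HDF5 table is the position in this array, a wrong index turns the `where` condition into a query for a different category.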

@@ -239,7 +239,8 @@ def stringify(value):
 if conv_val not in metadata:
     result = -1
 else:
-    result = metadata.searchsorted(conv_val, side="left")
+    # Find the index of the first match of conv_val in metadata
+    result = np.where(metadata == conv_val)[0][0]
Member

Suggested change
- result = np.where(metadata == conv_val)[0][0]
+ result = np.flatnonzero(metadata == conv_val)[0]

Also, is it possible to know ahead of time whether metadata is sorted, so we can use `searchsorted`? It will be much faster in that case.
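For reference, `np.flatnonzero(mask)` returns the same indices as `np.where(mask)[0]` for a one-dimensional mask, and `searchsorted` agrees with both once the array is sorted. A small sketch with made-up values:

```python
import numpy as np

# Illustrative categories that happen to be sorted already.
metadata = np.array(["a", "b", "c", "d"])
conv_val = "c"
mask = metadata == conv_val

idx_where = np.where(mask)[0][0]      # linear scan, tuple-of-arrays API
idx_flat = np.flatnonzero(mask)[0]    # linear scan, returns the index array directly
idx_sorted = metadata.searchsorted(conv_val, side="left")  # binary search

print(idx_where, idx_flat, idx_sorted)
```

The binary search is only trustworthy here because the input is sorted; the two linear scans do not depend on ordering.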

@@ -239,7 +239,13 @@ def stringify(value):
 if conv_val not in metadata:
     result = -1
 else:
-    result = metadata.searchsorted(conv_val, side="left")
+    # Check if metadata is sorted
Member

Well, we probably won't want to do this check here, because it also incurs a performance penalty. I was just asking whether something in the preprocessing code above already checks this for us.

If not, just using `np.flatnonzero` here directly is fine.
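To illustrate the trade-off: a sortedness check has to compare every adjacent pair, so it costs a full pass over the array, the same order of work as the linear scan it would be guarding. The sketch below uses hypothetical helper names (`is_sorted`, `first_match`) that are not part of pandas.

```python
import numpy as np


def is_sorted(values: np.ndarray) -> bool:
    """Hypothetical helper: True when values are in non-decreasing order."""
    # Compares every adjacent pair, so it touches the whole array once.
    return bool(np.all(values[:-1] <= values[1:]))


def first_match(values: np.ndarray, key: str) -> int:
    """Linear scan: index of the first element equal to key, or -1."""
    hits = np.flatnonzero(values == key)
    return int(hits[0]) if hits.size else -1


unsorted = np.array(["b", "c", "a"])
print(is_sorted(unsorted))
print(first_match(unsorted, "a"))
```

For a single lookup, running the check and then a binary search does not beat the scan alone, which is why the check only helps if the ordering is already known from earlier processing.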

Contributor Author

It doesn’t look like there’s a check to ensure the metadata is ordered. This part seems to work with just the array of category values, so it may not be able to confirm whether they were ordered.

Member

OK thanks. You can revert this to how you had it before

github-actions (GitHub Actions)
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

HDFStore.select() return unexpected results. In function __init__() of class Selection (pandas/core/io/pytables.py), the method self.terms.evaluate() was not returning the correct value for the where condition. The issue stemmed from the function convert_value() of class BinOp (pandas/core/computation/pytables.py), where the function searchsorted() did not return the correct index when matching the where condition in the metadata (categories table). Replacing searchsorted() with np.where() resolves this issue.
@mroeschke
Member

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@SofiaSM45
Contributor Author

Rebased to include recent upstream changes. I apologize for the unused import in my earlier commit; thank you very much for the quick fix!

mroeschke added the IO HDF5 (read_hdf, HDFStore) label and removed the Stale label on May 20, 2025
mroeschke added this to the 3.0 milestone on May 20, 2025
mroeschke merged commit c75171a into pandas-dev:main on May 20, 2025
48 checks passed
@mroeschke
Member

Thanks @SofiaSM45

SofiaSM45 reacted with thumbs up emoji

Reviewers

mroeschke approved these changes

Assignees
No one assigned
Labels
IO HDF5 (read_hdf, HDFStore)
Projects
None yet
Milestone
3.0
Development

Successfully merging this pull request may close these issues.

BUG: queries on categorical string columns in HDFStore.select() return unexpected results
2 participants
@SofiaSM45, @mroeschke
