Commit c75171a

BUG: Fix #57608: queries on categorical string columns in HDFStore.select() return unexpected results. (#61225)

* BUG: Fix #57608: queries on categorical string columns in HDFStore.select() return unexpected results.

  In function __init__() of class Selection (pandas/core/io/pytables.py), the method self.terms.evaluate() was not returning the correct value for the where condition. The issue stemmed from the function convert_value() of class BinOp (pandas/core/computation/pytables.py), where the function searchsorted() did not return the correct index when matching the where condition in the metadata (categories table). Replacing searchsorted() with np.where() resolves this issue.

* BUG: Follow-up for #57608: check if metadata is sorted before search

* BUG: Follow-up for #57608: use direct match via np.flatnonzero

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 09a17c7, commit c75171a
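The commit message attributes the wrong results to searchsorted(), and the follow-up steps suggest the cause is that the categories table is not necessarily sorted. The following is a minimal sketch of that difference using plain NumPy, outside pandas. The helper names `code_via_searchsorted` and `code_via_flatnonzero` are hypothetical and exist only for illustration; the category order is borrowed from the regression test in this commit.

```python
import numpy as np

# Category order taken from the commit's regression test; note it is NOT sorted.
metadata = np.array(["name", "longname", "verylongname"])


def code_via_searchsorted(categories, value):
    """Binary search. Only meaningful when `categories` is sorted."""
    return int(categories.searchsorted(value, side="left"))


def code_via_flatnonzero(categories, value):
    """Position of the first exact match. Does not depend on ordering."""
    return int(np.flatnonzero(categories == value)[0])


# Compare both lookups against the true position of each category.
for position, value in enumerate(metadata):
    binary = code_via_searchsorted(metadata, value)
    direct = code_via_flatnonzero(metadata, value)
    status = "ok" if binary == position else "MISMATCH"
    print(f"{value!r}: true={position} searchsorted={binary} "
          f"flatnonzero={direct} ({status})")
```

On unsorted input the binary search can land on a position that belongs to a different category, while the direct match always yields the position of the value itself. The direct match is a linear scan, which should be acceptable here because a categories table is usually small.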

3 files changed: 26 additions and 1 deletion

‎doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -775,6 +775,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
+- Bug in :meth:`HDFStore.select` causing queries on categorical string columns to return unexpected results (:issue:`57608`)
 - Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)

‎pandas/core/computation/pytables.py

Lines changed: 2 additions & 1 deletion
@@ -239,7 +239,8 @@ def stringify(value):
             if conv_val not in metadata:
                 result = -1
             else:
-                result = metadata.searchsorted(conv_val, side="left")
+                # Find the index of the first match of conv_val in metadata
+                result = np.flatnonzero(metadata == conv_val)[0]
             return TermValue(result, result, "integer")
         elif kind == "integer":
             try:

‎pandas/tests/io/pytables/test_store.py

Lines changed: 23 additions & 0 deletions
@@ -23,6 +23,9 @@
     timedelta_range,
 )
 import pandas._testing as tm
+from pandas.api.types import (
+    CategoricalDtype,
+)
 from pandas.tests.io.pytables.common import (
     _maybe_remove,
     ensure_clean_store,
@@ -1107,3 +1110,23 @@ def test_store_bool_index(tmp_path, setup_path):
     df.to_hdf(path, key="a")
     result = read_hdf(path, "a")
     tm.assert_frame_equal(expected, result)
+
+
+@pytest.mark.parametrize("model", ["name", "longname", "verylongname"])
+def test_select_categorical_string_columns(tmp_path, model):
+    # Corresponding to BUG: 57608
+
+    path = tmp_path / "test.h5"
+
+    models = CategoricalDtype(categories=["name", "longname", "verylongname"])
+    df = DataFrame(
+        {"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
+    ).astype({"modelId": models, "value": int})
+
+    with HDFStore(path, "w") as store:
+        store.append("df", df, data_columns=["modelId"])
+
+    with HDFStore(path, "r") as store:
+        result = store.select("df", "modelId == model")
+        expected = df[df["modelId"] == model]
+        tm.assert_frame_equal(result, expected)
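The expectation in this test can also be explored without an HDF5 file. The sketch below is an in-memory analogy, not the HDFStore code path: it assumes that a query on a categorical column boils down to translating the string into its integer position in the categories and comparing that against the stored codes. The helper `select_by_category` is hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Same frame as in the regression test.
models = CategoricalDtype(categories=["name", "longname", "verylongname"])
df = pd.DataFrame(
    {"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
).astype({"modelId": models, "value": int})


def select_by_category(frame, column, value):
    """Hypothetical helper: filter rows by comparing integer category codes."""
    categories = frame[column].cat.categories.to_numpy()
    matches = np.flatnonzero(categories == value)
    # -1 mirrors the "conv_val not in metadata" branch of the patched code.
    # Caveat of this sketch: missing values also carry code -1.
    code = int(matches[0]) if matches.size else -1
    return frame[frame[column].cat.codes == code]


for model in ["name", "longname", "verylongname"]:
    result = select_by_category(df, "modelId", model)
    expected = df[df["modelId"] == model]
    pd.testing.assert_frame_equal(result, expected)
```

The third parameter value, "verylongname", is a declared category that no row uses, so the expected selection is an empty frame with the same columns and dtypes.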
