API (string dtype): comparisons between different string classes #60639

Closed

#61138

Labels

API - Consistency: Internal Consistency of API/Behavior
Needs Discussion: Requires discussion from core team before further action
Numeric Operations: Arithmetic, Comparison, and Logical operations
Strings: String extension data type and string data

Milestone

2.3

Description

rhshadrach opened on Jan 1, 2025

Some comparisons between different classes of string (e.g. string[pyarrow] and str) raise. Resolving this is straightforward except for what class should be returned. I would expect it should always be the left obj, e.g. string[pyarrow] == str should return string[pyarrow] whereas str == string[pyarrow] should return str. Is this the consensus?

We currently run into issues with how Python handles subclasses with comparison dunders.

    import numpy as np
    import pandas as pd

    lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
    rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))

    print(lhs.__eq__(rhs))
    # <ArrowExtensionArray>
    # [True, <NA>, True]
    # Length: 3, dtype: bool[pyarrow]

    print(lhs == rhs)
    # [ True False  True]

The two results above differ because ArrowStringArrayNumpySemantics is a proper subclass of ArrowStringArray and therefore Python first calls rhs.__eq__(lhs).
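The dispatch rule can be reproduced without pandas. The following is a minimal sketch in plain Python; the classes Base and Derived are hypothetical stand-ins for ArrowStringArray and ArrowStringArrayNumpySemantics, and each __eq__ only reports which method ran and on which operand.

```python
class Base:
    """Hypothetical stand-in for the parent array class."""

    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        # Report which implementation ran and which operand was `self`.
        return ("Base.__eq__", self.label)


class Derived(Base):
    """Hypothetical stand-in for the subclass with different semantics."""

    def __eq__(self, other):
        return ("Derived.__eq__", self.label)


lhs = Base("lhs")
rhs = Derived("rhs")

# Calling the dunder directly bypasses Python's operator dispatch,
# so the left operand's method runs.
explicit = lhs.__eq__(rhs)

# With the operator, the right operand's type is a subclass of the left
# operand's type, so Python tries the right operand's method first.
via_operator = lhs == rhs

print(explicit)      # ('Base.__eq__', 'lhs')
print(via_operator)  # ('Derived.__eq__', 'rhs')
```

This mirrors the report: the explicit call and the == operator reach different implementations, so the result follows the right-hand (subclass) operand rather than the left.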

We can avoid this by special-casing this particular case in ArrowStringArrayNumpySemantics, but I wanted to open up an issue for discussion before proceeding.
