Description
Some comparisons between different classes of string (e.g. `string[pyarrow]` and `str`) raise. Resolving this is straightforward except for which class should be returned. I would expect it to always be that of the left operand, e.g. `string[pyarrow] == str` should return `string[pyarrow]`, whereas `str == string[pyarrow]` should return `str`. Is this the consensus?
We currently run into issues with how Python handles subclasses with comparison dunders.
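    import numpy as np
    import pandas as pd

    lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
    rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))

    # Calling the dunder directly uses the left operand's implementation.
    print(lhs.__eq__(rhs))
    # <ArrowExtensionArray>
    # [True, <NA>, True]
    # Length: 3, dtype: bool[pyarrow]

    # The operator form gives a different result.
    print(lhs == rhs)
    # [ True False  True]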
The two results above differ because `ArrowStringArrayNumpySemantics` is a proper subclass of `ArrowStringArray`, and therefore Python first calls `rhs.__eq__(lhs)`.
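To illustrate the dispatch rule in isolation, here is a minimal sketch in plain Python with no pandas involved. `Base` and `Sub` are hypothetical stand-ins for `ArrowStringArray` and `ArrowStringArrayNumpySemantics`; each `__eq__` just returns a label naming the method that ran, so it does not reflect the real comparison logic.

```python
class Base:
    """Hypothetical stand-in for the parent array class."""

    def __eq__(self, other):
        return "Base.__eq__"


class Sub(Base):
    """Hypothetical stand-in for the subclass with different semantics."""

    def __eq__(self, other):
        return "Sub.__eq__"


lhs, rhs = Base(), Sub()

# Calling the dunder directly always runs the left operand's method.
explicit = lhs.__eq__(rhs)

# With the operator, the right operand's type is a proper subclass of the
# left operand's type, so Python tries rhs.__eq__(lhs) first.
via_operator = lhs == rhs

# With the subclass on the left, its method is also the one that runs.
swapped = rhs == lhs

print(explicit)      # Base.__eq__
print(via_operator)  # Sub.__eq__
print(swapped)       # Sub.__eq__
```

In this toy setup the subclass method runs for either operand order, so the left operand's class alone does not determine the result type unless the subclass handles the case explicitly.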
We can avoid this by special-casing this situation in `ArrowStringArrayNumpySemantics`, but I wanted to open an issue for discussion before proceeding.