BUG: Add fillna at the beginning of _where not to fill NA. #60729 #60772
pandas/core/generic.py Outdated
@@ -9674,6 +9674,13 @@ def _where(
        if axis is not None:
            axis = self._get_axis_number(axis)

        # We should not be filling NA. See GH#60729
Is this trying to fill missing values when NaN is the missing value indicator? I don't think that is right either: the missing values should propagate for all types. We may just be missing coverage for the NaN case (which should be added to the test).
sanggon6107 Jan 25, 2025 • edited
Thanks for the feedback, @WillAyd.
I thought we could make the values propagate by filling `cond` with `True`, since `_where()` ultimately keeps the values in `self` wherever `cond` is `True`.
Even if I don't fill those values here, `_where` would call `fillna()` with the value of `inplace` in the code below. That is also why the result varies depending on whether `inplace=True` or not.
Lines 9695 to 9698 in e3b2de8
    # make sure we are boolean
    fill_value = bool(inplace)
    cond = cond.fillna(fill_value)
    cond = cond.infer_objects()
Could you explain in more detail what you mean by "propagate for all types"? Do you mean we need to keep NA as it is even after this line?
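The argument above can be checked with a small sketch. It assumes a nullable `Int64` series modelled on the linked issue, and it fills the condition by hand with `fillna(True)` instead of relying on what `_where()` does internally, so the outcome should not depend on the pandas version in use.

```python
import pandas as pd

# Hypothetical data modelled on the issue: a nullable integer series with gaps.
series = pd.Series([None, 1, 2, None, 3, 4, None], dtype="Int64")

# Comparing pd.NA with a number yields pd.NA, so the condition has gaps
# in the same rows as the series.
cond = series <= 2

# Fill the gaps with True by hand. where() keeps the value from `series`
# wherever the condition is True, so the NA rows keep their NA.
filled = cond.fillna(True)
result = series.where(filled, -99)
```

Only the rows where the condition is explicitly `False` receive `-99`; the rows that were missing in `series` stay missing.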
Hi @WillAyd,
I've done some further investigation on this, but I still believe the current code is the simplest way to make the missing values propagate.
If we want to let NA propagate without calling `fillna()` here, too many code changes might be needed. See the code below:
- We would need to change the code below so that we don't fill the missing values when the caller is `where()` or `mask()`. If we don't, `fillna()` will fill them with the value of `inplace`.
Lines 9695 to 9698 in f1441b2
    # make sure we are boolean
    fill_value = bool(inplace)
    cond = cond.fillna(fill_value)
    cond = cond.infer_objects()
- We would need to change the code below as well, since `to_numpy()` will fill the missing values with the value of `inplace` when cond is a DataFrame.
Lines 9703 to 9716 in f1441b2
    if not isinstance(cond, ABCDataFrame):
        # This is a single-dimensional object.
        if not is_bool_dtype(cond):
            raise TypeError(msg.format(dtype=cond.dtype))
    else:
        for _dt in cond.dtypes:
            if not is_bool_dtype(_dt):
                raise TypeError(msg.format(dtype=_dt))
        if cond._mgr.any_extension_types:
            # GH51574: avoid object ndarray conversion later on
            cond = cond._constructor(
                cond.to_numpy(dtype=bool, na_value=fill_value),
                **cond._construct_axes_dict(),
            )
- Since `extract_bool_array()` fills the missing values using the argument `na_value=False` in `EABackedBlock.where()`, we might need to find every NA index in cond before we call this function (using `isna()`, for example) and then implement additional behaviour in `ExtensionArray._where()` to make those values propagate.
pandas/pandas/core/internals/blocks.py
Lines 1664 to 1668 in f1441b2
    def where(self, other, cond) -> list[Block]:
        arr = self.values.T

        cond = extract_bool_array(cond)
pandas/pandas/core/array_algos/putmask.py
Lines 116 to 127 in f1441b2
    def extract_bool_array(mask: ArrayLike) -> npt.NDArray[np.bool_]:
        """
        If we have a SparseArray or BooleanArray, convert it to ndarray[bool].
        """
        if isinstance(mask, ExtensionArray):
            # We could have BooleanArray, Sparse[bool], ...
            #  Except for BooleanArray, this is equivalent to just
            #  np.asarray(mask, dtype=bool)
            mask = mask.to_numpy(dtype=bool, na_value=False)

        mask = np.asarray(mask, dtype=bool)
        return mask
If `_where()` fills the missing values in `cond` anyway, I don't think we necessarily have to disfavour the current code change. Could you give me some feedback?
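The effect of `na_value` in the `to_numpy()` call quoted above can be seen in isolation with public API only. This is a minimal sketch on a hand-made nullable boolean array, not a reproduction of the internal block code.

```python
import pandas as pd

# A nullable boolean mask with one missing entry.
mask = pd.array([True, None, False], dtype="boolean")

# Same call as in the quoted extract_bool_array(): missing entries become False.
as_false = mask.to_numpy(dtype=bool, na_value=False)

# Choosing True instead would make where() keep the caller's value in that row.
as_true = mask.to_numpy(dtype=bool, na_value=True)
```

Once the mask is a plain `ndarray[bool]`, the information about which entries were missing is gone, which is why the fill has to be decided before this point.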
rhshadrach Mar 1, 2025 • edited
> Is this trying to fill missing values when NaN is the missing value indicator? I don't think that is right either - the missing values should propagate for all types.
By filling in the missing values on `cond` with True, the missing value in the caller propagates. It is leaving the missing values on `cond` unfilled that then fails to properly propagate the caller's missing value.
Co-authored-by: WillAyd <will_ayd@innobi.io>
FYI, it seems this has already been discussed at #53124 (comment).
Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
I think this looks like the right approach, but I have a question about whether we can simplify here.
pandas/core/generic.py Outdated
    if isinstance(cond, np.ndarray):
        cond = np.array(cond)
        cond[np.isnan(cond)] = True
    elif isinstance(cond, NDFrame):
        cond = cond.fillna(True)
We also do fillna on L9704 below. Can these be combined?
Also, what other types besides ndarray and NDFrame get here?
> Also, what other types besides ndarray and NDFrame get here?
Hi @rhshadrach, thanks for the review!
I tried to find any other types that could possibly get here, but I couldn't find any.
According to the documentation, cond should be one of these: bool `Series`/`DataFrame`, `array-like`, or `callable`.
An `array-like` such as a `list`/`tuple` is converted to an `NDFrame`/`np.ndarray` by the code below.
When we pass a `list`/`tuple` to `mask()`, or a `callable` (a function that returns a `list` or `tuple`) to `mask()`:
(pandas/core/generic.py > NDFrame.mask())
Lines 10096 to 10098 in 57fd502
    # see gh-21891
    if not hasattr(cond, "__invert__"):
        cond = np.array(cond)
When we pass a `list`/`tuple` to `where()`, a scalar to `mask()` or `where()`, or a `callable` (a function that returns a `list` or `tuple`) to `where()`:
(pandas/core/generic.py > NDFrame._where())
Lines 9712 to 9717 in 57fd502
    else:
        if not hasattr(cond, "shape"):
            cond = np.asanyarray(cond)
        if cond.shape != self.shape:
            raise ValueError("Array conditional must be same shape as self")
        cond = self._constructor(cond, **self._construct_axes_dict(), copy=False)
Please let me know if I'm missing anything.
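The two attribute checks quoted above can be tried outside pandas. The sketch below uses a hypothetical helper, `coerce_cond`, that chains both checks; it is not pandas API and only illustrates which input types end up as an `ndarray`.

```python
import numpy as np
import pandas as pd


def coerce_cond(cond):
    """Hypothetical helper mirroring the two quoted checks; not pandas API."""
    # mask(): anything that cannot be inverted with ~ is turned into an ndarray.
    if not hasattr(cond, "__invert__"):
        cond = np.array(cond)
    # _where(): anything without a shape is turned into an ndarray as well.
    if not hasattr(cond, "shape"):
        cond = np.asanyarray(cond)
    return cond


from_list = coerce_cond([True, False, True])
from_tuple = coerce_cond((True, False, True))
from_scalar = coerce_cond(True)
from_series = coerce_cond(pd.Series([True, False, True]))
```

Lists and tuples are caught by the first check, a Python scalar only by the second (it becomes a zero-dimensional array), and a `Series` passes through both unchanged.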
Hi @rhshadrach, sorry for the confusion. I just realized that `cond` could be either a list or a tuple when we pass a callable to `where()`. I will revise the code.
@rhshadrach,
I tried to combine this code with `fillna(inplace)` at L9732 as you suggested, but this results in some test failures, since `align()` at L9722 sometimes returns an `ndarray` full of `np.nan`, and `cond` is then supposed to be filled with `inplace` (= `False`) by L9732. Several tests currently expect this behaviour as it is. For example, at `tests.frame.indexing.test_mask.test_mask_stringdtype[Series]`:
    def test_mask_stringdtype(frame_or_series):
        # GH 40824
        obj = DataFrame(
            {"A": ["foo", "bar", "baz", NA]},
            index=["id1", "id2", "id3", "id4"],
            dtype=StringDtype(),
        )
        filtered_obj = DataFrame(
            {"A": ["this", "that"]}, index=["id2", "id3"], dtype=StringDtype()
        )
        expected = DataFrame(
            {"A": [NA, "this", "that", NA]},
            index=["id1", "id2", "id3", "id4"],
            dtype=StringDtype(),
        )
        if frame_or_series is Series:
            obj = obj["A"]
            filtered_obj = filtered_obj["A"]
            expected = expected["A"]

        filter_ser = Series([False, True, True, False])
        result = obj.mask(filter_ser, filtered_obj)

        tm.assert_equal(result, expected)

    result
    >>>
    id1     foo
    id2     bar
    id3     baz
    id4    <NA>
    Name: A, dtype: string

    expected
    >>>
    id1    <NA>
    id2    this
    id3    that
    id4    <NA>
    Name: A, dtype: string
I suspect the behaviour of `align()` at L9722 is not desirable, because the current code re-initializes `cond` in these cases and fills it with `False` regardless of the `cond` given by the user, as shown below.
    obj
    >>>
    id1     foo
    id2     bar
    id3     baz
    id4    <NA>
    Name: A, dtype: string

    filtered_obj
    >>>
    id2    this
    id3    that
    Name: A, dtype: string

    filter_ser = pd.Series([False, True, True, False])
    filter_ser_2 = pd.Series([False, False, False, False])
    filter_ser_3 = pd.Series([True, True, True, True])

    result = obj.mask(filter_ser, filtered_obj)
    # Should return ["foo", "this", "that", pd.NA]. But this test currently expects [pd.NA, "this", "that", pd.NA]
    result
    >>>
    id1    <NA>
    id2    this
    id3    that
    id4    <NA>
    Name: A, dtype: string

    result_2 = obj.mask(filter_ser_2, filtered_obj)
    # Should return ["foo", "bar", "baz", pd.NA]
    result_2
    >>>
    id1    <NA>
    id2    this
    id3    that
    id4    <NA>
    Name: A, dtype: string

    result_3 = obj.mask(filter_ser_3, filtered_obj)
    # Should return [pd.NA, "this", "that", pd.NA]
    result_3
    >>>
    id1    <NA>
    id2    this
    id3    that
    id4    <NA>
    Name: A, dtype: string
I think I'd better open another issue regarding this, but for now, I suppose it is best to leave `fillna()` as it is, without combining it with the one below. Could you please let me know what you think about this?
failing tests
tests.frame.indexing.test_getitem.TestGetitemBooleanMask.test_getitem_boolean_series_with_duplicate_columns
tests.frame.indexing.test_indexing.TestDataFrameIndexing.test_setitem_cast
tests.frame.indexing.test_mask.test_mask_stringdtype[Series]
tests.frame.indexing.test_mask.test_mask_where_dtype_timedelta
tests.frame.indexing.test_where.test_where_bool_comparison
tests.frame.indexing.test_where.test_where_string_dtype[Series]
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_alignment[float_string]
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_alignment[mixed_int]
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_bug
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_invalid
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_ndframe_align
tests.frame.indexing.test_where.TestDataFrameIndexingWhere.test_where_none
tests.indexing.multiindex.test_setitem.TestMultiIndexSetItem.test_frame_getitem_setitem_multislice
tests.indexing.test_indexing.TestMisc.test_no_reference_cycle
tests.series.indexing.test_mask.test_mask_casts
tests.series.indexing.test_where.test_where_error
tests.series.indexing.test_where.test_where_setitem_invalid
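The alignment effect described in the comment above can be reproduced with the public `Series.align` method. This is a sketch under the assumption that `_where()` aligns the condition to the caller with a right join; the data mirrors `test_mask_stringdtype`, and the variable names are only illustrative.

```python
import pandas as pd

obj = pd.Series(
    ["foo", "bar", "baz", pd.NA],
    index=["id1", "id2", "id3", "id4"],
    dtype="string",
)
# The condition carries the default integer index 0..3, not the string labels.
filter_ser = pd.Series([False, True, True, False])

# Align the condition to the caller's index (right join keeps obj's labels).
aligned = filter_ser.align(obj, join="right")[0]

# No label matches, so every entry of the aligned condition is missing.
all_missing = bool(aligned.isna().all())
```

Because none of the user's `True`/`False` values survive the alignment, whatever value is used to fill the aligned condition decides the whole result, which matches the identical outputs for the three filters shown above.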
    def test_mask_na():
        # We should not be filling pd.NA. See GH#60729
        series = Series([None, 1, 2, None, 3, 4, None], dtype=Int64Dtype())
Can you also add a test for arrow? You can parametrize with e.g.
@pytest.mark.parametrize("dtype", ["Int64", "int64[pyarrow]"])
sanggon6107 Mar 3, 2025 • edited
Thanks, just added a test for pyarrow.
Looking good!
pandas/core/generic.py Outdated
        cond[isna(cond)] = True
    elif isinstance(cond, NDFrame):
        cond = cond.fillna(True)
    elif isinstance(cond, (list, tuple)):
rhshadrach Mar 4, 2025 • edited
The docs for `where` state that `cond` can be "list-like", so we should be using `is_list_like` instead of this condition. However, can you instead move this section so that it's combined with the `if` block on L9714 immediately below?
sanggon6107 Mar 4, 2025 • edited
I've changed the code, but it seems this causes test failures. I'll convert the PR to draft and revise the code.
Just confirmed that all the tests passed. Thanks!
    from pandas import (
        Series,
    )
Can you revert this change?
sanggon6107 Mar 4, 2025 • edited
Code reverted. Thanks!
    if dtype == "int64[pyarrow]":
        pytest.importorskip("pyarrow")
Instead, can you change the parametrization to be:
pytest.param("int64[pyarrow]", marks=td.skip_if_no("pyarrow"))
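A per-case skip mark like the one suggested above can be sketched with public pytest API only. The mark below is a stand-in for the pandas-internal `td.skip_if_no("pyarrow")`, and the names `skip_without_pyarrow`, `DTYPES` and the placeholder test are assumptions made for illustration.

```python
import importlib.util

import pytest

# Stand-in for td.skip_if_no("pyarrow"): skip the case when pyarrow is not importable.
skip_without_pyarrow = pytest.mark.skipif(
    importlib.util.find_spec("pyarrow") is None,
    reason="pyarrow is not installed",
)

# Only the pyarrow-backed case carries the mark; "Int64" always runs.
DTYPES = ["Int64", pytest.param("int64[pyarrow]", marks=skip_without_pyarrow)]


@pytest.mark.parametrize("dtype", DTYPES)
def test_dtype_is_a_string(dtype):
    # Placeholder body; the real test would build a Series with this dtype.
    assert isinstance(dtype, str)
```

Attaching the mark to the parameter keeps the skip decision out of the test body, so the skipped case is reported as skipped at collection time instead of branching inside the test.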
Thanks for the suggestion. Just changed the code.
Many thanks for your feedback, @rhshadrach. All the changes you requested are now reflected. Could you review the changes?
pandas/core/generic.py Outdated
@@ -9712,6 +9713,9 @@ def _where(
        else:
            if not hasattr(cond, "shape"):
                cond = np.asanyarray(cond)
            else:
                cond = np.array(cond)
Suggested change:
    - cond = np.array(cond)
    + cond = extract_array(cond, extract_numpy=True)
Could we avoid the `copy` here somehow?
I think we might be able to call `fillna(True)` right below the `self._constructor()` call, since cond becomes an NDFrame there.
Hi @mroeschke, I confirmed that all the checks passed. Could you review the code change?
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Hi @mroeschke, thanks for the suggestion! But it seems some tests fail when we changed the code
Hi @mroeschke,
    >>> import pandas as pd
    >>> import numpy as np
    >>> series = pd.Series([None, 1, 2, None, 3, 4, None])
    >>> series.where([np.nan, False, False, np.nan, True, True, np.nan], -99)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\sangg\Desktop\pandas-sanggon-from-code\pandas\pandas\core\generic.py", line 10032, in where
        return self._where(cond, other, inplace=inplace, axis=axis, level=level)
      File "C:\Users\sangg\Desktop\pandas-sanggon-from-code\pandas\pandas\core\generic.py", line 9732, in _where
        raise TypeError(msg.format(dtype=cond.dtype))
    TypeError: Boolean array expected for the condition, not object
I think
    def _where(
        ...
        if isinstance(cond, NDFrame):
            ...
        else:
            if not hasattr(cond, "shape"):
                dt = None
                if set(cond).issubset({True, False, np.nan}):
                    dt = bool
                cond = np.asanyarray(cond, dtype=dt)
            if cond.shape != self.shape:
                raise ValueError("Array conditional must be same shape as self")
    def test_mask_na(dtype):
        # We should not be filling pd.NA. See GH#60729
        series = Series([None, 1, 2, None, 3, 4, None], dtype=dtype)
        result = series.mask(series <= 2, -99)
Can you also test the cases where the condition is an ndarray and a list?
Thanks @rhshadrach, I added tests for a list and an ndarray. Could you review the code change?
pandas/core/generic.py Outdated
    if not cond.flags.writeable:
        cond.setflags(write=True)
    cond[isna(cond)] = True
Doesn't this mutate the caller's `cond` in certain cases? We cannot mutate here.
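The mutation concern can be illustrated with NumPy alone: `np.asanyarray` hands back the same object when it is given an `ndarray`, so an in-place assignment through it would write into the caller's array. The sketch below uses illustrative variable names and fills a copy instead.

```python
import numpy as np
import pandas as pd

caller_cond = np.array([True, np.nan, False], dtype=object)

# np.asanyarray returns the very same object when given an ndarray,
# so an in-place fill through `alias` would also change `caller_cond`.
alias = np.asanyarray(caller_cond)
is_same_object = alias is caller_cond

# Filling a copy leaves the caller's array untouched.
safe = caller_cond.copy()
safe[pd.isna(safe)] = True
```

The copy costs one extra allocation, which is the trade-off raised elsewhere in this review when asking whether the copy can be avoided.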
    result = series.mask(cond, -99)
    tm.assert_series_equal(result, expected)
    result = series.mask(cond.to_list(), -99)
    tm.assert_series_equal(result, expected)
    result = series.mask(cond.to_numpy(), -99)
    tm.assert_series_equal(result, expected)
Can you instead parametrize this test? Something like:
@pytest.mark.parametrize("cond_type", ["series", "list", "numpy"])
and
    if cond_type == "list":
        cond = cond.to_list()
    elif cond_type == "numpy":
        cond = cond.to_numpy()
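Put together, the suggested parametrization might look like the sketch below. The helper `convert_cond` and the test name are hypothetical, and the assertion only checks that the missing entry survives the conversion; the actual test in the PR would go on to call `series.mask(cond, -99)` and compare against the expected series.

```python
import pandas as pd
import pytest


def convert_cond(cond, cond_type):
    """Hypothetical helper: present a Series condition as a list or an ndarray."""
    if cond_type == "list":
        return cond.to_list()
    if cond_type == "numpy":
        return cond.to_numpy()
    return cond


@pytest.mark.parametrize("cond_type", ["series", "list", "numpy"])
def test_cond_conversion_keeps_missing_entries(cond_type):
    series = pd.Series([None, 1, 2, None, 3, 4, None], dtype="Int64")
    cond = convert_cond(series <= 2, cond_type)
    # Whatever the container, the first entry of the condition is still missing.
    assert pd.isna(cond[0])
```

One parametrized test replaces the three repeated call-and-assert pairs, and each container type shows up as its own test case in the report.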
@@ -9701,6 +9702,7 @@ def _where(
        # align the cond to same shape as myself
        cond = common.apply_if_callable(cond, self)
        if isinstance(cond, NDFrame):
            cond = cond.fillna(True)
rhshadrach Apr 12, 2025 • edited
Below we `fillna` using the value of `inplace`. Using `True` as done here looks correct; using `inplace` seems to be a bug. On main:
    s = Series([1.0, 2.0, 3.0, 4.0])
    cond = Series([True, False])
    print(s.mask(cond))
    # 0    NaN
    # 1    2.0
    # 2    NaN
    # 3    NaN
    # dtype: float64
    s.mask(cond, inplace=True)
    print(s)
    # 0    NaN
    # 1    2.0
    # 2    3.0
    # 3    4.0
    # dtype: float64
Thanks @rhshadrach,
I think `align` at L9714 sometimes returns a `DataFrame` with NA in it, and those values need to be filled with either True or False depending on the caller. (`__setitem__` requires the former, and that's why it calls `_where()` with `inplace=True`.)
My suggestion is to add an argument to `_where` so that the caller can decide whether or not the missing values are filled with `True` after `align`. Could you let me know if this change would be acceptable?
Yea, that makes sense to me. But I believe that's only necessary to fix the bug mentioned above and not necessarily for this PR. If that is the case, I think it should be handled separately.
    if all(
        x is NA or isinstance(x, (np.bool_, bool)) or x is np.nan
        for x in cond.flatten()
    ):
Can you detail why this is necessary?
I wrote this conditional block to deal with a `list` cond that consists of `True`, `False` and `np.nan`.
I thought this conditional statement was necessary because we need to convert `cond` to `bool` (L10112) only in that case. Otherwise we might see some unexpected conversions, as below.
    >>> import pandas as pd
    >>> import numpy as np
    >>>
    >>> cond = [1, 2, 3, 4, np.nan]
    >>> cond = np.array(cond, dtype=object)
    >>> cond
    array([1, 2, 3, 4, nan], dtype=object)
    >>>
    >>> cond[pd.isna(cond)] = False
    >>> cond
    array([1, 2, 3, 4, False], dtype=object)
    >>>
    >>> cond = cond.astype(bool)
    >>> cond
    array([ True,  True,  True,  True, False])  # 1, 2, 3, 4 were converted to True
I'm not following why `cond = cond.astype(bool)` would be incorrect here. `bool(np.nan)` gives True, and so while users may be surprised by pandas replacing `np.nan` in such a case, I believe it is technically correct. `pd.NA`, on the other hand, doesn't allow conversion to Boolean and propagates in comparisons (unlike nan), so I think we should special-case this.
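The difference between the two missing-value markers described above can be checked directly. This is a minimal sketch using only `bool()` and `==` on the scalars themselves; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# NaN is a float, and any non-zero float is truthy.
nan_is_truthy = bool(np.nan)

# pd.NA refuses to be coerced to a boolean.
try:
    bool(pd.NA)
    na_raises = False
except TypeError:
    na_raises = True

# Comparisons: NaN compares False, while pd.NA propagates.
nan_comparison = np.nan == 1
na_comparison = pd.NA == 1
```

This is why an `astype(bool)` conversion silently turns `np.nan` into `True` but cannot be applied to an object array that still contains `pd.NA`.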
The docs state that `cond` should be a bool array-like, so I thought `mask()` should raise when `cond` has non-bool values (integers in the case below) in it.
I suppose `mask()` won't raise if we remove that condition. For example,
    >>> import pandas as pd
    >>> import numpy as np
    >>>
    >>> ser = pd.Series([1, 2, 3, 4, 5])
    >>> cond = [True, False, 33, 44, np.nan]  # bool, bool, int, int, np.nan
    >>>
    >>> res = ser.mask(cond, other=-99)  # mask() would raise at 601f6c9, but doesn't raise when the condition is removed.
    >>> res
    0   -99
    1     2
    2   -99
    3   -99
    4     5
    dtype: int64
Hi @rhshadrach,
could we rather construct a DataFrame here and then call `fillna(False)` and `infer_objects()`?
I think we could then avoid the generator expression and also deal with `NA`.
The code would be like:
    cond = common.apply_if_callable(cond, self)
    other = common.apply_if_callable(other, self)

    # see gh-21891
    if not hasattr(cond, "__invert__"):
        cond = np.array(cond, dtype=object)

    if isinstance(cond, np.ndarray):
        if cond.shape != self.shape:
            raise ValueError("Array conditional must be same shape as self")
        cond = self._constructor(cond, **self._construct_axes_dict(), copy=False)
        cond = cond.fillna(False)
        cond = cond.infer_objects()

    return self._where(
        ~cond,
        other=other,
        inplace=inplace,
        axis=axis,
        level=level,
    )
@sanggon6107 - could you investigate using `pd.array` on L10111 and then `fillna` prior to the negation when passing to `_where`? If you run into issues doing this, let me know and I can take a deeper look.
Thanks for the feedback, @rhshadrach, but it seems `pd.array()` raises when `cond` is a 2D list, since `pd.array()` only accepts 1-dimensional `data`.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Added fillna at the beginning of _where so that we can fill pd.NA.
Since this is my first PR, please correct me if I'm mistaken. Thanks!