BUG: groupby.agg with UDF changing pyarrow dtypes #59601
…unt for missing pyarrow dtypes
…b.com:undermyumbrella1/pandas into fix/group_by_agg_pyarrow_bool_numpy_same_type
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
…group_by_agg_pyarrow_bool_numpy_same_type
result = gb.agg(lambda x: {"number": 1})
arr = pa.array([{"number": 1}, {"number": 1}, {"number": 1}])
expected = DataFrame(
    {"B": ArrowExtensionArray(arr)},
    index=Index(["c1", "c2", "c3"], name="A"),
)
When the column starts as a PyArrow dtype and the UDF returns dictionaries, it seems questionable to me whether we should return the corresponding PyArrow dtype. The other option is a NumPy array of object dtype. Both seem like reasonable results, and I imagine the PyArrow dtype is likely to be more convenient for a user who is already using PyArrow dtypes.
Continuation of #58129
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Root cause: agg_series always forces the output dtype to be the same as the input dtype, but depending on the lambda, the output dtype can be different.
Fix: maybe_convert_object, as maybe_convert_object does not check for NA, and forces dtype to float if NA is present (NA is not float in pyarrow),