
DOC: User Guide Page on user-defined functions#61195


Merged

rhshadrach merged 21 commits into pandas-dev:main from arthurlw:udf_user_guide

May 18, 2025


Member

arthurlw commented Mar 28, 2025

Closes DOC: Write user guide page on apply/map/transform methods #61126
~~Tests added and passed if fixing a bug or adding a new feature~~
~~All code checks passed.~~
~~Added type annotations to new arguments/methods/functions.~~
~~Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.~~

arthurlw added 8 commits

March 25, 2025 19:39

udf user guide introduction

3f94137

added apply method

bf984ca

added agg, transform and filter

fe67ec8

added map, pipe and vectorized operations

4ec5697

bugfix

11392d7

updated map method

f322d9e

precommit

b6b7b02

trim trailing whitespace

d20bcc7

Member, Author

arthurlw commented Mar 28, 2025

Currently writing this, so I would appreciate any feedback on it!

toctree

72f7b62

rhshadrach requested changes

Mar 29, 2025

Member

rhshadrach left a comment

Thanks for the PR! I'm not opposed to a dedicated page on UDFs, but I am opposed to duplicating documentation that exists elsewhere in the user guide, as I think much of this does. Instead of, e.g., examples of `apply`, I recommend linking to the appropriate section. This page can then focus on recommendations of when to use `apply` vs other methods.

doc/source/user_guide/user_defined_functions.rst Outdated

Comment on lines 16 to 17

		Why Use User-Defined Functions?
		-------------------------------

Member

rhshadrach Mar 29, 2025

I think we should lead with "Why _not_ User-Defined Functions". While performance is called out further down, I think the poor behavior of UDFs should be mentioned as well: pandas has no information on what a UDF is doing, and so has to infer (guess) how to handle the result.

In particular, I think it should be mentioned that none of the examples on this page should be UDFs in practice.
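A minimal sketch of that inference, assuming pandas 2.x (the column names are illustrative): with `DataFrame.apply(axis=1)`, the same row-wise UDF produces a differently shaped result depending only on the type it returns, because pandas has to decide how to combine the results after the fact.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The UDF returns a plain list: pandas keeps each list as a single value,
# so the result is a Series of lists.
as_list = df.apply(lambda row: [row["a"], row["b"]], axis=1)

# The UDF returns a Series: pandas expands each one into columns,
# so the result is a DataFrame.
as_series = df.apply(lambda row: pd.Series({"x": row["a"], "y": row["b"]}), axis=1)

print(type(as_list).__name__)    # Series
print(type(as_series).__name__)  # DataFrame
```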

rhshadrach added the "Apply, Aggregate, Transform, Map" and "Docs" labels

Mar 29, 2025

Member, Author

arthurlw commented Mar 29, 2025

Hi @rhshadrach, thanks for the feedback! I agree with you and will push updates soon.

arthurlw added 5 commits

March 29, 2025 13:28

restructured udf user guide

90a2d24

updated documentation links

0d02d64

precommit

214f0ac

fix links

fffaad0

change links

561a1f5

rhshadrach requested changes

Apr 6, 2025

Member

rhshadrach left a comment

I think this is looking a lot better. Can we also link to https://pandas.pydata.org/pandas-docs/dev/user_guide/enhancingperf.html#numba-jit-compilation at the very bottom, in a section titled something like "Improving Performance with UDFs"?

doc/source/user_guide/user_defined_functions.rst Outdated

doc/source/user_guide/user_defined_functions.rst

doc/source/user_guide/user_defined_functions.rst Outdated

		ways to apply UDFs across different pandas data structures.

		.. note::
		Some of these methods are can also be applied to Groupby Objects. Refer to :ref:`groupby`.

Member

rhshadrach Apr 6, 2025

Can you also make a mention of resample, rolling, expanding, and ewm. Perhaps link to each section in the User Guide.
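For reference, a minimal sketch of one UDF passed to the `apply` methods of rolling, expanding, and resample objects, assuming pandas 2.x; `value_range` is a hypothetical helper, and `ewm` is not covered here.

```python
import pandas as pd

s = pd.Series(
    [1.0, 3.0, 2.0, 5.0],
    index=pd.date_range("2025-01-01", periods=4, freq="D"),
)

def value_range(values):
    # Hypothetical UDF: spread between the largest and smallest value.
    return values.max() - values.min()

# Rolling window of 2 observations; raw=True passes a NumPy array to the UDF.
rolling_range = s.rolling(window=2).apply(value_range, raw=True)

# Expanding window: every observation seen so far.
expanding_range = s.expanding().apply(value_range, raw=True)

# Resample into 2-day bins; the UDF receives each bin as a Series.
resampled_range = s.resample("2D").apply(value_range)
```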

Member

rhshadrach Apr 12, 2025

Can we add the other objects to this note? It seems to me they all belong together.

Suggested change

	Some of these methods are can also be applied to Groupby Objects. Refer to :ref:`groupby`.
	Some of these methods are can also be applied to groupby, resample, and various window objects. See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`, and :ref:`ewm()<window>` for details.

doc/source/user_guide/user_defined_functions.rst Outdated

updated user guide

c6891a0

rhshadrach requested changes

Apr 12, 2025

doc/source/user_guide/user_defined_functions.rst Outdated

		pandas comes with a set of built-in functions for data manipulation, UDFs offer
		flexibility when built-in methods are not sufficient. These functions can be
		applied at different levels: element-wise, row-wise, column-wise, or group-wise,
		and change the data differently, depending on the method used.

Member

rhshadrach Apr 12, 2025

nit: "change the data differently" sounds very close to mutating in a UDF, which we explicitly do not support. What do you think of "behave differently".

Member, Author

arthurlw Apr 12, 2025 (edited)

“Behave differently” sounds clearer and avoids implying mutation. I'll update it!
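A minimal sketch of the non-mutating style this exchange points toward, assuming pandas 2.x (`double` is a hypothetical UDF): the function returns a new object rather than modifying what pandas passes in, so the original frame is unchanged.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def double(column: pd.Series) -> pd.Series:
    # Build and return a new Series instead of assigning into `column`;
    # mutating the object passed to a UDF is not supported.
    return column * 2

result = df.apply(double)
```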

doc/source/user_guide/user_defined_functions.rst Outdated

Comment on lines 63 to 64

		* :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series,
		DataFrames, or groups of data.

Member

rhshadrach Apr 12, 2025

I'm thinking we should remove "groups of data" here. The `DataFrame.apply` that you're referencing doesn't operate on groups, and you mention groupby below.
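A small sketch of the distinction, assuming pandas 2.x: `DataFrame.apply` with the default `axis=0` hands the UDF one column at a time, while grouping goes through `groupby(...)`, whose own `apply` hands the UDF one group at a time.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B"], "value": [1, 2, 4]})

# DataFrame.apply: the UDF (here the builtin len) receives each column.
per_column = df.apply(len)

# groupby(...).apply: the UDF receives the "value" rows of each group.
per_group = df.groupby("group")["value"].apply(lambda s: s.sum())
```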

doc/source/user_guide/user_defined_functions.rst Outdated

Comment on lines 129 to 130

		When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
		functions across groups.

Member

rhshadrach Apr 12, 2025

Things like `.agg(["sum", "mean"])` aren't UDFs, so I don't think they should be mentioned here, and it could make users think these types of usages are slow (they are not).

Suggested change

	When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
	functions across groups.
	When to use: Use :meth:`DataFrame.agg` for performing custom aggregations, where the operation returns a scalar value on each input.
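A minimal sketch of the distinction drawn above, assuming pandas 2.x (`value_range` is a hypothetical UDF): passing a callable that reduces each column to a scalar is a UDF aggregation, while named reductions such as `"sum"` and `"mean"` are not UDFs.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 4, 2], "y": [10, 30, 20]})

def value_range(column: pd.Series):
    # Hypothetical custom reducer: returns one scalar per column.
    return column.max() - column.min()

custom = df.agg(value_range)        # UDF aggregation
builtin = df.agg(["sum", "mean"])   # named built-in reductions, not UDFs
```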

doc/source/user_guide/user_defined_functions.rst Outdated

		})

		# Using transform with mean
		df['Mean_Transformed'] = df.groupby('Category')['Values'].transform('mean')

Member

rhshadrach Apr 12, 2025

This isn't an example of a UDF. I really like your example of using linear regression - can we do that here? It's a bit unfortunate that groupby.transform does not allow operating on the entire group (only works column-by-column) here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'x': [1, 2, 3, 1, 2, 3],
    'y': [2, 4, 6, 1, 2, 1.5],
}).set_index("x")

# Function to fit a model to each group
def fit_model(group):
    x = group.index.to_frame()
    y = group
    model = LinearRegression()
    model.fit(x, y)
    pred = model.predict(x)
    return pred

result = df.groupby('group').transform(fit_model)
```

datapythonista reviewed

Apr 12, 2025

Member

datapythonista left a comment

Excellent job here @arthurlw, thanks for taking care of this. I added a general comment about using examples to incrementally illustrate what is explained here, and about changing the order of the sections a bit.

Please let me know if it doesn't make sense or if you have any comments. I'll review more in depth after the proposed changes are implemented or discussed. But at first glance, this is really nice.

doc/source/user_guide/index.rst Outdated

doc/source/user_guide/user_defined_functions.rst Outdated

		and change the data differently, depending on the method used.

		Why Not To Use User-Defined Functions
		-----------------------------------------

Member

datapythonista Apr 12, 2025

Not sure if Sphinx is more flexible now, but this used to have to be exactly the same length as the title above it.

Member

rhshadrach Apr 16, 2025

Title marker needs to be at least as long as the text, but can be longer.

doc/source/user_guide/user_defined_functions.rst Outdated

doc/source/user_guide/user_defined_functions.rst

		applied at different levels: element-wise, row-wise, column-wise, or group-wise,
		and change the data differently, depending on the method used.

		Why Not To Use User-Defined Functions

Member

datapythonista Apr 12, 2025

Maybe it's just personal opinion, but to me it makes more sense to explain what UDFs are in pandas before explaining when not to use them. This order seems reasonable if we assume users already know what pandas UDFs are in practice, but I'd personally prefer not to assume that in the user guide for UDFs.

In my opinion, after the previous introduction which is great, I'd show a very simple example so we make sure users reading this understand the very basics.

Something like:

```python
import pandas as pd

def add_one(x):
    return x + 1

my_series = pd.Series([1, 2, 3])
my_series.map(add_one)
```

Building on top of this, for example by then showing the same with a `DataFrame`, and at some point showing UDFs that receive the whole column with `.apply`..., should help make sure users are following and understanding all the information provided here.

Member

rhshadrach Apr 16, 2025

I am a bit negative here. This is duplicating a lot of other documentation that we already have. I think we should instead link to that documentation.

Member

datapythonista Apr 19, 2025

Do you mind pointing out a specific example, @rhshadrach? I found documentation for the aggregate functions, but not much for `map`, `apply`... on `Series` and `DataFrame` other than in the API docs. I agree with not having much duplication. Personally, if there is a little here and there, like in the FAQs, Performance page..., I'd rather have the docs related to these methods in this page, as it feels like the natural place, and link to the sections here from the FAQs, performance hints, groupby user guide... Of course there can be cases where the opposite makes more sense, but maybe we can discuss the specific cases where there is duplication.

Member

rhshadrach Apr 19, 2025 (edited)

apply: https://pandas.pydata.org/docs/user_guide/basics.html#row-or-column-wise-function-application
map: https://pandas.pydata.org/docs/user_guide/basics.html#applying-elementwise-functions

> I'd rather have the docs related to these methods in this page, as it feels like the natural place

If we are going to move the docs on e.g. `DataFrame.agg` here, then this is no longer a page just about UDFs, as `DataFrame.agg` does more than just use UDFs. In addition, that seems like a large reworking of the docs for little (in my opinion, actually negative) benefit.

Member

datapythonista Apr 20, 2025

I totally missed the Essential basic functionality page, thanks for pointing that out. I fully agree with you that what I proposed here repeats the whole https://pandas.pydata.org/docs/user_guide/basics.html#function-application section. And I agree that's not a good idea.

Personally, I'd rather not have that section, and have that content here. At least in my experience, map and apply are common, but not as essential as the other parts described in that page. I also think the structure of the user guide would be clearer, and things easier to find, with these changes.

For `DataFrame.agg`, there is already a groupby page, and I think just listing the methods among the methods that support UDFs would be good, plus a mention pointing to the groupby page, where grouping is explained in detail with examples.

There may be other structures, but what I'd like is to give users structure for the related methods. I think `Series` has around 200 methods and attributes. Users having to navigate that whole API to find out themselves that map, apply and pipe are kind of the same just changing the input of the udf, doesn't seem ideal. I think this page can really help with that.

What do you think?

Member

rhshadrach Apr 20, 2025

> Personally, I'd rather not have that section, and have that content here.

If we move the main documentation of `apply` here, then I am quite opposed to calling this a page on UDFs, as `apply` does more than just take UDFs. By documenting `apply("sum")` et al. here, it seems to me we make this page far less clear than leaving it solely about UDFs.

In any case, is that something you think should be tackled in this PR? This PR started as

> A dedicated page in the users guide that guides users on when to use udf, a general idea of the API, the differences between the different methods, the options available... seems a better idea.

I do not think we should morph it into moving around documentation from other places, especially when there are disagreements.

> Users having to navigate that whole API to find out themselves that map, apply and pipe are kind of the same just changing the input of the udf, doesn't seem ideal.

Which is why I think this page should be a comparison of UDF methods (as it mostly is now), while pointing to more thorough documentation elsewhere in the User Guide.

Member

datapythonista Apr 20, 2025

Fair enough, I think I understand your point better now. I'd maybe like to improve the apply/map docs in the Essential basic functionality page a bit, but that's unrelated to this PR. And I'm happy to move forward here focusing on the UDFs and not on the methods, as you describe.

updated udf user guide based on reviews

f56ec28

datapythonista reviewed

Apr 19, 2025

Member

datapythonista left a comment

Very nice, just a couple of small comments. We still need to decide about duplication, but in general it looks great. Thanks for the work here @arthurlw

doc/source/user_guide/user_defined_functions.rst Outdated

Comment on lines 90 to 100

		* :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series and
		DataFrames.
		* :meth:`~DataFrame.agg` (Aggregate) - Used for summarizing data, supporting custom
		aggregation functions.
		* :meth:`~DataFrame.transform` - Applies a function to Series and Dataframes while preserving the shape of
		the original data.
		* :meth:`~DataFrame.filter` - Filters Series and Dataframes based on a list of Boolean conditions.
		* :meth:`~DataFrame.map` - Applies an element-wise function to a Series or Dataframe, useful for
		transforming individual values.
		* :meth:`~DataFrame.pipe` - Allows chaining custom functions to process Series or
		Dataframes in a clean, readable manner.

Member

datapythonista Apr 19, 2025

What do you think about having this as a table? Personally I think it should make it easier to understand the differences between the methods. As a general idea:

| method | function input | function output | description |
|---|---|---|---|
| map | scalar | scalar | map each element to the element returned by the function elementwise |
| apply(axis=0) | column | column | map each column to the column returned by the function |
| apply(axis=1) | row | row | map each row to the row returned by the function |
| pipe | series or dataframe | series or dataframe | map the series or dataframe to a new series or dataframe returned by the function |

Not sure if it makes sense to combine with the table below.

Member, Author

arthurlw Apr 20, 2025 (edited)

Yeah, I agree with you, thanks for the suggestion! I will keep the two tables separate for now.

doc/source/user_guide/user_defined_functions.rst

Comment on lines +199 to +201

		.. note::
		:meth:`DataFrame.filter` does not accept UDFs, but can accept
		list comprehensions that have UDFs applied to them.

Member

datapythonista Apr 19, 2025

I'm unsure about having `filter` here for now. I think it's very good that you added it, as it doesn't support UDFs, but it probably should. So it opens a discussion we probably want to have about adding them. @rhshadrach thoughts?

Member

rhshadrach Apr 19, 2025

I suspect the reason this was added is that `DataFrameGroupBy.filter` does accept UDFs. Perhaps that should be mentioned instead?

I actually think `DataFrame.filter` should accept Boolean masks, similar to PySpark and Polars. But agreed that discussion is not for here!
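A sketch of the distinction above, assuming pandas 2.x: `DataFrameGroupBy.filter` calls a UDF on each group and keeps the groups for which it returns `True`, whereas `DataFrame.filter` selects labels (here via `items`) and takes no function.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B"], "value": [1, 2, 10]})

# GroupBy.filter: keep only the groups whose total "value" exceeds 5.
large_groups = df.groupby("group").filter(lambda g: g["value"].sum() > 5)

# DataFrame.filter: select columns by label; no UDF is involved.
value_only = df.filter(items=["value"])
```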

doc/source/user_guide/user_defined_functions.rst Outdated

		Documentation can be found at :meth:`~DataFrame.pipe`.


		Best Practices

Member

datapythonista Apr 19, 2025

Maybe just personal preference, but these last 3 sections seem to be talking about the same thing (performance); I'd have just one section about performance.

I'd keep it short for now, and we can iterate over it later. The reason is that each time we review this before merging it we need to re-read the whole document. So, if we can finish the main part above first, and have this as a placeholder, then in a second PR we can focus more on performance without having to keep reviewing the first part again.

arthurlw added 2 commits

April 19, 2025 17:55

updated definition section and performance section title

c00d1d2

updated definition table

8d41537

datapythonista reviewed

Apr 20, 2025

Member

datapythonista left a comment

Looks great. Some minor comments, and as you've seen, we need to make a decision on the structure to use. But I think this guide is a great addition.

doc/source/user_guide/user_defined_functions.rst Outdated

Comment on lines 93 to 95

		| :meth:`map` | Scalar | Scalar | Maps each element to the element returned by the function element-wise |
		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |

Member

datapythonista Apr 20, 2025

I think this comes from my suggestion, but checking the descriptions now, it feels like it'd be helpful to use the same terminology, to show how similar both functions are. Meaning that if we use "Apply a function to each column", I think it'd be helpful to use "Apply a function to each element". Feel free to disagree, and I see the point of using the name of the method in the description. But personally, I think it'd be good to highlight how similar these methods are and let users understand the difference easily and quickly, and I think what I'm proposing should help with that.

doc/source/user_guide/user_defined_functions.rst Outdated

		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`agg` | Series/DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`transform` | Series/DataFrame | Same shape as input | Transform values while preserving shape |

Member

datapythonista Apr 20, 2025

Transform is actually very similar to apply. The function input is a column or a row depending on `axis`, and the output is also a column or a row (exactly the same as apply). The only difference is that `apply` allows returning a shorter or longer DataFrame, while `transform` will raise. See this example:

```python
# apply works fine when the output still has 3 samples:
>>> pandas.DataFrame({"points": [100, 30, 50]}).apply(lambda x: pandas.Series([1, 2, 3]))
   points
0       1
1       2
2       3
# transform also works fine
>>> pandas.DataFrame({"points": [100, 30, 50]}).transform(lambda x: pandas.Series([1, 2, 3]))
   points
0       1
1       2
2       3
# apply is still happy now that we removed one of the samples:
>>> pandas.DataFrame({"points": [100, 30, 50]}).apply(lambda x: pandas.Series([1, 2]))
   points
0       1
1       2
# transform is not happy:
>>> pandas.DataFrame({"points": [100, 30, 50]}).transform(lambda x: pandas.Series([1, 2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/src/pandas/pandas/core/frame.py", line 10269, in transform
    result = op.transform()
             ^^^^^^^^^^^^^^
  File "/home/mgarcia/src/pandas/pandas/core/apply.py", line 356, in transform
    raise ValueError("Function did not transform")
ValueError: Function did not transform
```

This is more useful with groupby, and I'm not even sure if `DataFrame.transform` is that useful; it probably mostly exists for consistency.
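A minimal sketch of the groupby case, assuming pandas 2.x: `SeriesGroupBy.transform` with a UDF that returns a same-length Series for each group, here subtracting the group mean, so the result lines up with the original rows.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1.0, 3.0, 10.0, 20.0]})

# The UDF receives each group's "value" Series and must return the same length.
demeaned = df.groupby("group")["value"].transform(lambda s: s - s.mean())
```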

doc/source/user_guide/user_defined_functions.rst Outdated

		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`transform` | Series/DataFrame | Same shape as input | Transform values while preserving shape |
		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`filter` | Series/DataFrame | Series/DataFrame | Filter data using a boolean array |

Member

datapythonista Apr 20, 2025

For now I'd remove the function input/output fields, as there is no function.

doc/source/user_guide/user_defined_functions.rst

		df["new_col"] = df.apply(calc_ratio, axis=1)

		# Vectorized Operation
		df["new_col2"] = 100 * (df["one"] / df["two"])

Member

datapythonista Apr 20, 2025

Maybe it's also worth mentioning and comparing `.pipe`, which is both vectorized and a UDF?
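A minimal sketch of that comparison, assuming pandas 2.x. The body of `calc_ratio` is not shown in the snippet above, so the one below is a hypothetical reconstruction, as is `ratio_of`: the row-wise UDF is called once per row, while the `.pipe` UDF is called once with the whole frame, so its arithmetic is vectorized.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# Hypothetical row-wise UDF: called once per row with a Series.
def calc_ratio(row: pd.Series) -> float:
    return 100 * (row["one"] / row["two"])

# Hypothetical frame-wise UDF: called once with the whole DataFrame.
def ratio_of(frame: pd.DataFrame) -> pd.Series:
    return 100 * (frame["one"] / frame["two"])

row_wise = df.apply(calc_ratio, axis=1)   # UDF, one call per row
vectorized = 100 * (df["one"] / df["two"])  # no UDF
piped = df.pipe(ratio_of)                 # UDF, single vectorized call
```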

arthurlw added 2 commits

April 20, 2025 16:06

updated table of definitions and added .pipe discussion under performance section

467bc93

precommit

efd5201

datapythonista reviewed

May 3, 2025

Member

datapythonista left a comment

I added a few comments. Happy to get this merged once those are addressed; any further improvements to this doc can be done as a follow-up.

Thanks for all the work on this @arthurlw, really good job.

doc/source/user_guide/user_defined_functions.rst Outdated

		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |
		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`filter` | - | - | Return rows that satisfy a boolean condition |

Member

datapythonista May 3, 2025

Suggested change

	| :meth:`filter` | - | - | Return rows that satisfy a boolean condition |
	| :meth:`filter` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function it's called for each group, and the group is removed from the result if the function returns ``False`` |

doc/source/user_guide/user_defined_functions.rst Outdated

		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`agg` | Series/DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
		+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
		| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |

Member

datapythonista May 3, 2025

Suggested change

	| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |
	| :meth:`transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply`, but it raises an exception if the function changes the shape of the data |

`apply` and `transform` are almost the same; I'd also list it twice, for axis=0 and axis=1, like apply.

doc/source/user_guide/user_defined_functions.rst Outdated

		and :ref:`ewm()<window>` for details.


		Choosing the Right Method

Member

datapythonista May 3, 2025

Personally I'd only keep the first table. I think its descriptions can already explain anything in the second table that isn't otherwise clear.

And I'd make this section, "Choosing the Right Method", part of "Methods that support User-Defined Functions". To me it feels like we have two sections for mostly the same thing.

doc/source/user_guide/user_defined_functions.rst Outdated

		for common operations.

		.. note::
		If performance is critical, explore vectorizated operations before resorting

Member

datapythonista May 3, 2025

vectorized?

doc/source/user_guide/user_defined_functions.rst Outdated

		When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method or UDF method is available,
		but consider optimizing performance with vectorized operations wherever possible.

		Documentation can be found at :meth:`~DataFrame.apply`.

Member

datapythonista May 3, 2025

I'd remove this for all methods, as the section title and every mention of `apply` will already be a link.

doc/source/user_guide/user_defined_functions.rst Outdated

		~~~~~~~~~~~~~~~~~~~~~

		:meth:`DataFrame.map` is used specifically to apply element-wise UDFs and is better
		for this purpose compared to :meth:`DataFrame.apply` because of its better performance.

Member

datapythonista May 3, 2025

I don't think map is more performant than apply; in general apply will be more performant. I'd remove this as a reason, as it depends a lot on the use case, and I don't think this is a general rule.

Member, Author

arthurlw May 4, 2025

Got it. Just curious: Why is applying generally more performant than mapping in practice? I thought map was faster for simple element-wise UDFs.

Member

datapythonista May 4, 2025

In map, without anything special, pandas necessarily loops in Python over the elements, casting each to a PyObject. In apply the same can be true, but since a whole column is given to the user, the user could also use vectorization. For example, a simple function doing `x + 1` would be fast when using apply and slow when using map. That's why I wouldn't say map is faster.

updated udf user guide based on reviews

af7964b

Member

datapythonista commented May 4, 2025

/preview

Contributor

github-actions bot commented May 4, 2025

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61195/

Member

datapythonista commented May 4, 2025

@arthurlw I generated a preview of the rendered docs in this PR. If you want, have a look to check that everything renders as expected: https://pandas.pydata.org/preview/pandas-dev/pandas/61195/docs/user_guide/user_defined_functions.html

@rhshadrach can you have a look and see if this can be merged? Even if we want to make some improvements and extend this in the future, I think this PR is already a great first version. So, whenever there is no blocker, probably easier to get this merged and iterate in follow up PRs if needed.

Member, Author

arthurlw commented May 5, 2025

Thanks for the preview! I think adding an example that combines groupby and filter (which takes a UDF) could be beneficial. That said, I do think it might duplicate some existing docs, so not sure if it's worth including here.
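A minimal sketch of the groupby + filter combination mentioned above, assuming a made-up `team`/`score` DataFrame. `DataFrameGroupBy.filter` calls the UDF once per group with that group's sub-DataFrame. It keeps the rows of every group for which the UDF returns `True` and drops the rest.

```python
import pandas as pd

# Hypothetical toy data: three teams with a few scores each.
df = pd.DataFrame(
    {"team": ["A", "A", "B", "B", "C"], "score": [10, 20, 5, 7, 30]}
)

# The lambda receives one group's sub-DataFrame at a time and must return
# a single boolean; whole groups are kept or dropped, original index intact.
high_scoring = df.groupby("team").filter(lambda g: g["score"].mean() > 10)

print(high_scoring)
```

Here team B (mean score 6) is dropped, while teams A and C keep all their rows.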

Member

rhshadrach commented May 8, 2025

Just noting this is still on my radar, should be able to get to it in the next 3 days.

rhshadrach approved these changes

May 18, 2025

Member

rhshadrach left a comment

lgtm

rhshadrach added this to the 3.0 milestone

May 18, 2025

rhshadrach merged commit 5b0767a into pandas-dev:main

May 18, 2025

8 checks passed

Member

rhshadrach commented May 18, 2025

Thanks @arthurlw for all the work here!

Member, Author

arthurlw commented May 19, 2025

Thanks for the guidance on this UDF User Guide, @datapythonista @rhshadrach!

I'm interested in diving deeper into this area of pandas. Are there any related issues, features, or improvements you think could be tackled next?

Thanks in advance!

Member

datapythonista commented May 19, 2025

@arthurlw I created #61458 and assigned you to it. I think it's a good one for working on pandas UDFs: good for learning more about the status quo, with reasonable complexity, and very useful to the project since it will make the code much clearer for future changes. Also, if you work on this issue, I think you'll notice inconsistencies, missing documentation, and other related tasks that may be good to work on too. For example, it could be useful to document the executor interface in the documentation about extending pandas, so third-party library authors can easily learn how to create an execution engine for pandas map/apply.

Please let me know if this is not what you're looking for, and I can try to think of something else. The same goes if you have any questions (better to ask them in the other issue).

xaris96 pushed a commit to xaris96/pandas that referenced this pull request

May 30, 2025

DOC: User Guide Page on user-defined functions (pandas-dev#61195)

811d9a3

Labels

Apply

Apply, Aggregate, Transform, Map

Docs

3 participants
