bigframes.pandas.DataFrame.apply

DataFrame.apply(func, *, axis=0, args: Tuple = (), **kwargs)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). The final return type is inferred from the return type of the applied function.

Note

The axis=1 scenario is in preview.

Examples:

>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4

[2 rows x 2 columns]

>>> def square(x):
...     return x * x

>>> df.apply(square)
   col1  col2
0     1     9
1     4    16

[2 rows x 2 columns]
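To make the axis convention concrete, here is a toy model in plain Python. It is only a sketch: dicts and lists stand in for DataFrame and Series, `toy_apply` and `seen_labels` are hypothetical names, and nothing in it calls bigframes. It shows which labels the applied function sees for each axis.

```python
# Toy model only: plain dicts and lists stand in for DataFrame and Series.
# Nothing here calls bigframes; toy_apply is a hypothetical helper.

def toy_apply(columns, index, func, axis=0):
    """Call func once per column (axis=0) or once per row (axis=1)."""
    if axis == 0:
        # Each call sees one column, keyed by the row labels.
        return {
            name: func(dict(zip(index, values)))
            for name, values in columns.items()
        }
    # Each call sees one row, keyed by the column names.
    return {
        label: func({name: values[i] for name, values in columns.items()})
        for i, label in enumerate(index)
    }


def seen_labels(series):
    """Return the labels of whatever object the function was handed."""
    return list(series)


columns = {"col1": [1, 2], "col2": [3, 4]}
index = [0, 1]

per_column = toy_apply(columns, index, seen_labels, axis=0)
per_row = toy_apply(columns, index, seen_labels, axis=1)
print(per_column)  # {'col1': [0, 1], 'col2': [0, 1]}
print(per_row)     # {0: ['col1', 'col2'], 1: ['col1', 'col2']}
```

With axis=0 the function is handed a column labelled by the row index; with axis=1 it is handed a row labelled by the column names.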

You can apply a user-defined function to every row of the DataFrame by creating a remote function from it and using it with axis=1. Within the function, each row is passed as a pandas.Series. It is recommended to select only the necessary columns before calling apply(). Note: This feature is currently in preview.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(row: pd.Series) -> int:
...     result = 1
...     result += row["col1"]
...     result += row["col2"] * row["col2"]
...     return result

>>> df[["col1", "col2"]].apply(foo, axis=1)
0    11
1    19
dtype: Int64

You can also return an array for every input row from the remote function.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def marks_analyzer(marks: pd.Series) -> list[float]:
...     import statistics
...     average = marks.mean()
...     median = marks.median()
...     geometric_mean = statistics.geometric_mean(marks.values)
...     harmonic_mean = statistics.harmonic_mean(marks.values)
...     return [
...         round(stat, 2) for stat in
...         (average, median, geometric_mean, harmonic_mean)
...     ]

>>> df = bpd.DataFrame({
...     "physics": [67, 80, 75],
...     "chemistry": [88, 56, 72],
...     "algebra": [78, 91, 79]
... }, index=["Alice", "Bob", "Charlie"])
>>> stats = df.apply(marks_analyzer, axis=1)
>>> stats
Alice      [77.67 78.   77.19 76.71]
Bob        [75.67 80.   74.15 72.56]
Charlie    [75.33 75.   75.28 75.22]
dtype: list<item: double>[pyarrow]
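The per-row numbers can be checked locally without deploying anything. The sketch below uses only the standard-library `statistics` module; `marks_analyzer_local` is a hypothetical stand-in that takes a plain list rather than a pandas.Series, so it illustrates the arithmetic only and not the remote function itself.

```python
import statistics


def marks_analyzer_local(marks):
    """Plain-Python stand-in for the row function; takes a list of numbers."""
    stats = (
        statistics.mean(marks),
        statistics.median(marks),
        statistics.geometric_mean(marks),
        statistics.harmonic_mean(marks),
    )
    # Convert to float so integer medians print as 78.0 rather than 78.
    return [round(float(stat), 2) for stat in stats]


# Alice's marks in physics, chemistry, algebra.
alice = marks_analyzer_local([67, 88, 78])
print(alice)  # [77.67, 78.0, 77.19, 76.71]
```

The result agrees with the Alice row in the output above.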

You can also apply a remote function that accepts multiple parameters to every row of a DataFrame by using it with axis=1, provided the DataFrame has a matching number of columns and matching data types. Note: This feature is currently in preview.

>>> df = bpd.DataFrame({
...     'col1': [1, 2],
...     'col2': [3, 4],
...     'col3': [5, 5]
... })
>>> df
   col1  col2  col3
0     1     3     5
1     2     4     5

[2 rows x 3 columns]

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(x: int, y: int, z: int) -> float:
...     result = 1
...     result += x
...     result += y / z
...     return result

>>> df.apply(foo, axis=1)
0    2.6
1    3.8
dtype: Float64

Parameters:
  • func (function) –

    Function to apply to each column or row. To apply to each row (i.e. when axis=1 is specified), the function can be one of two types:

    (1) It accepts a single input parameter of type Series, in which case each row is delivered to the function as a pandas Series.

    (2) It accepts one or more parameters, in which case column values are delivered to the function as separate arguments (mapping to those parameters) for each row. For this to work, the DataFrame must have the same number of columns as the function has parameters, with matching data types.

  • axis ({index (0), columns (1)}) – Axis along which the function is applied. Specify 0 or ‘index’ to apply the function to each column. Specify 1 or ‘columns’ to apply the function to each row.

  • args (tuple) – Positional arguments to pass to func in addition to the array/series.

  • **kwargs – Additional keyword arguments to pass to func.

Returns:

Result of applying func along the given axis of the DataFrame.

Return type:

bigframes.pandas.DataFrame or bigframes.pandas.Series

Raises:
  • ValueError – If a remote function is not provided when axis=1 is specified.

  • ValueError – If the number of input params in the remote function is not the same as the number of columns in the dataframe.

  • ValueError – If the dtypes of the columns in the dataframe are not compatible with the data types of the remote function input params.
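The multi-parameter calling convention and the parameter-count check can be modelled in plain Python. This is a minimal sketch under stated assumptions, not how bigframes implements the check: `apply_rows_multi_param` is a hypothetical helper, rows are plain tuples, the function is an ordinary Python function rather than a remote function, and dtype compatibility is not modelled.

```python
import inspect


def apply_rows_multi_param(func, rows):
    """Hypothetical helper: call func(*row) per row, checking the arity first."""
    n_params = len(inspect.signature(func).parameters)
    results = []
    for row in rows:
        if len(row) != n_params:
            raise ValueError(
                f"function takes {n_params} parameters, row has {len(row)} values"
            )
        # Column values are passed positionally, one per parameter.
        results.append(func(*row))
    return results


def foo(x: int, y: int, z: int) -> float:
    result = 1
    result += x
    result += y / z
    return result


rows = [(1, 3, 5), (2, 4, 5)]  # (col1, col2, col3) per row, as in the example
results = apply_rows_multi_param(foo, rows)
print(results)  # close to 2.6 and 3.8, up to float rounding

mismatch_error = None
try:
    apply_rows_multi_param(foo, [(1, 3)])  # two values, three parameters
except ValueError as exc:
    mismatch_error = str(exc)
    print("ValueError:", mismatch_error)
```

The first call reproduces the values from the three-column example; the second shows the kind of mismatch that the second ValueError entry describes.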
