bigframes.pandas.DataFrame.apply
- DataFrame.apply(func, *, axis=0, args: Tuple = (), **kwargs)
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). The final return type is inferred from the return type of the applied function.

Note
The axis=1 scenario is in preview.

Examples:
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4

[2 rows x 2 columns]

>>> def square(x):
...     return x * x

>>> df.apply(square)
   col1  col2
0     1     9
1     4    16

[2 rows x 2 columns]

You can apply a user-defined function to every row of the DataFrame by creating a remote function out of it and using it with axis=1. Within the function, each row is passed as a pandas.Series. It is recommended to select only the necessary columns before calling apply(). Note: This feature is currently in preview.

>>> import pandas as pd
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(row: pd.Series) -> int:
...     result = 1
...     result += row["col1"]
...     result += row["col2"] * row["col2"]
...     return result

>>> df[["col1", "col2"]].apply(foo, axis=1)
0    11
1    19
dtype: Int64

You can return an array output for every input row from the remote function.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def marks_analyzer(marks: pd.Series) -> list[float]:
...     import statistics
...     average = marks.mean()
...     median = marks.median()
...     geometric_mean = statistics.geometric_mean(marks.values)
...     harmonic_mean = statistics.harmonic_mean(marks.values)
...     return [
...         round(stat, 2) for stat in
...         (average, median, geometric_mean, harmonic_mean)
...     ]

>>> df = bpd.DataFrame({
...     "physics": [67, 80, 75],
...     "chemistry": [88, 56, 72],
...     "algebra": [78, 91, 79]
... }, index=["Alice", "Bob", "Charlie"])
>>> stats = df.apply(marks_analyzer, axis=1)
>>> stats
Alice      [77.67 78.   77.19 76.71]
Bob        [75.67 80.   74.15 72.56]
Charlie    [75.33 75.   75.28 75.22]
dtype: list<item: double>[pyarrow]

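The figures in the Alice row above can be checked locally without a remote function. The following is a minimal sketch that uses only Python's standard statistics module on a plain list. The helper name marks_stats is hypothetical, and nothing here goes through BigQuery DataFrames.

```python
import statistics


def marks_stats(marks):
    """Return [mean, median, geometric mean, harmonic mean], rounded to 2 places."""
    values = (
        statistics.mean(marks),
        statistics.median(marks),
        statistics.geometric_mean(marks),
        statistics.harmonic_mean(marks),
    )
    return [round(float(v), 2) for v in values]


# Alice's marks from the example: physics, chemistry, algebra.
alice = marks_stats([67, 88, 78])
print(alice)  # [77.67, 78.0, 77.19, 76.71]
```

The four values line up with the array shown for Alice in the remote function output.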
You can also apply a remote function that accepts multiple parameters to every row of a DataFrame by using it with axis=1, provided the DataFrame has a matching number of columns and compatible data types. Note: This feature is currently in preview.

>>> df = bpd.DataFrame({
...     'col1': [1, 2],
...     'col2': [3, 4],
...     'col3': [5, 5]
... })
>>> df
   col1  col2  col3
0     1     3     5
1     2     4     5

[2 rows x 3 columns]

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def foo(x: int, y: int, z: int) -> float:
...     result = 1
...     result += x
...     result += y / z
...     return result

>>> df.apply(foo, axis=1)
0    2.6
1    3.8
dtype: Float64

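The two row-wise calling conventions used in the examples above differ only in how each row reaches the function. The sketch below emulates that difference in plain Python, with rows modeled as dicts rather than pandas Series. The helpers apply_rows_as_mapping and apply_rows_as_args are hypothetical names for illustration and are not part of bigframes.

```python
def apply_rows_as_mapping(rows, func):
    """Single-parameter style: pass each whole row to func as one argument."""
    return [func(row) for row in rows]


def apply_rows_as_args(rows, func):
    """Multi-parameter style: unpack each row's values into positional arguments."""
    return [func(*row.values()) for row in rows]


def foo_row(row):
    # Same arithmetic as the single-parameter remote function example.
    return 1 + row["col1"] + row["col2"] * row["col2"]


def foo_args(x, y, z):
    # Same arithmetic as the multi-parameter remote function example.
    return 1 + x + y / z


rows = [
    {"col1": 1, "col2": 3, "col3": 5},
    {"col1": 2, "col2": 4, "col3": 5},
]

print(apply_rows_as_mapping(rows, foo_row))  # [11, 19]
print(apply_rows_as_args(rows, foo_args))
```

In the unpacking style the number of values in each row has to equal the number of parameters of the function, which is why the DataFrame needs a matching number of columns.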
- Parameters:
func (function) –
Function to apply to each column or row. To apply to each row (i.e. when axis=1 is specified), the function can be one of two types:
- (1) It accepts a single input parameter of type Series, in which case each row is delivered to the function as a pandas Series.
- (2) It accepts one or more parameters, in which case column values are delivered to the function as separate arguments (mapping to those parameters) for each row. For this to work, the DataFrame must have the same number of columns and matching data types.
axis ({index (0), columns (1)}) – Axis along which the function is applied. Specify 0 or ‘index’ to apply the function to each column. Specify 1 or ‘columns’ to apply the function to each row.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
**kwargs – Additional keyword arguments to pass to func.
- Returns:
Result of applying func along the given axis of the DataFrame.
- Return type:
- Raises:
ValueError – If a remote function is not provided when axis=1 is specified.
ValueError – If the number of input params in the remote function is not the same as the number of columns in the dataframe.
ValueError – If the dtypes of the columns in the dataframe are not compatible with the data types of the remote function input params.
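
As an illustration of the parameter-count condition for multi-parameter functions, such a check can be written with the standard inspect module. This is only a sketch of the idea: check_param_count is a hypothetical helper, the error message is invented for the example, and the actual validation inside bigframes may be implemented differently.

```python
import inspect


def check_param_count(func, columns):
    """Raise ValueError if func's parameter count differs from the column count."""
    n_params = len(inspect.signature(func).parameters)
    if n_params != len(columns):
        raise ValueError(
            f"Function takes {n_params} parameter(s) but the DataFrame "
            f"has {len(columns)} column(s)."
        )


def foo(x: int, y: int, z: int) -> float:
    return 1 + x + y / z


# Three parameters, three columns: passes silently.
check_param_count(foo, ["col1", "col2", "col3"])

# Three parameters, two columns: raises ValueError.
try:
    check_param_count(foo, ["col1", "col2"])
except ValueError as exc:
    print(exc)
```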