pyspark.RDD.aggregate
- RDD.aggregate(zeroValue, seqOp, combOp)
Aggregate the elements of each partition, and then the results for all the partitions, using the given combine functions and a neutral "zero value."
The function op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object allocation; however, it should not modify t2.
The first function (seqOp) can return a different result type, U, than the type of this RDD. Thus, we need one operation for merging a T into a U and one operation for merging two U's.
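The semantics above can be sketched in plain Python. The helper below, `local_aggregate`, is a hypothetical name for illustration only: it folds each partition with seqOp starting from zeroValue, then merges the per-partition results with combOp, mirroring what aggregate does across a cluster.

```python
from functools import reduce

def local_aggregate(partitions, zero_value, seq_op, comb_op):
    # Fold each partition with seq_op, starting from zero_value (type U).
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # Merge the per-partition results (all of type U) with comb_op.
    return reduce(comb_op, partials, zero_value)

# Compute (sum, count) over two simulated partitions.
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])
result = local_aggregate([[1, 2], [3, 4]], (0, 0), seq_op, comb_op)
print(result)  # (10, 4)
```

Note that zeroValue is folded into every partition's result (and into the final merge), so it must be a true identity for both operations.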
New in version 1.1.0.
- Parameters
- zeroValue : U
the initial value for the accumulated result of each partition
- seqOp : function
a function used to accumulate results within a partition
- combOp : function
an associative function used to combine results from different partitions
- Returns
- U
the aggregated result
Examples
>>> seqOp = (lambda x, y: (x[0] + y, x[1] + 1))
>>> combOp = (lambda x, y: (x[0] + y[0], x[1] + y[1]))
>>> sc.parallelize([1, 2, 3, 4]).aggregate((0, 0), seqOp, combOp)
(10, 4)
>>> sc.parallelize([]).aggregate((0, 0), seqOp, combOp)
(0, 0)
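Because different partitions may be merged in any order, combOp must be associative (and in practice commutative) for the result to be deterministic. A minimal check, using the (sum, count) combiner from the example above:

```python
# combOp from the (sum, count) example: merges two partial (sum, count) pairs.
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

# Three hypothetical per-partition partial results.
partials = [(3, 2), (7, 2), (5, 1)]

# Associativity: grouping the merges differently must give the same answer.
left = comb_op(comb_op(partials[0], partials[1]), partials[2])
right = comb_op(partials[0], comb_op(partials[1], partials[2]))
print(left, right)  # (15, 5) (15, 5)
```

A non-associative combOp (e.g. subtraction) would yield partition-order-dependent results.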