pyspark.RDD.sortByKey
- RDD.sortByKey(ascending=True, numPartitions=None, keyfunc=<function RDD.<lambda>>)
Sorts this RDD, which is assumed to consist of (key, value) pairs.
New in version 0.9.1.
- Parameters
  - ascending : bool, optional, default True
    sort the keys in ascending or descending order
  - numPartitions : int, optional
    the number of partitions in the new RDD
  - keyfunc : function, optional, default identity mapping
    a function to compute the key
- Returns
  - RDD
    a new RDD sorted by key
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortByKey().first()
('1', 3)
>>> sc.parallelize(tmp).sortByKey(True, 1).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortByKey(True, 2).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> tmp2 = [('Mary', 1), ('had', 2), ('a', 3), ('little', 4), ('lamb', 5)]
>>> tmp2.extend([('whose', 6), ('fleece', 7), ('was', 8), ('white', 9)])
>>> sc.parallelize(tmp2).sortByKey(True, 3, keyfunc=lambda k: k.lower()).collect()
[('a', 3), ('fleece', 7), ('had', 2), ('lamb', 5),...('white', 9), ('whose', 6)]
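The examples above all sort in ascending order. As a minimal sketch of the remaining option (assuming the same running SparkContext bound to sc as in the examples above; the exact glom() partition split shown is one possible outcome, since sortByKey samples the keys to choose partition boundaries):

>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortByKey(ascending=False).collect()
[('d', 4), ('b', 2), ('a', 1), ('2', 5), ('1', 3)]
>>> sc.parallelize(tmp).sortByKey(True, 2).glom().collect()
[[('1', 3), ('2', 5)], [('a', 1), ('b', 2), ('d', 4)]]

Listing each partition's contents with glom() makes the structure visible: the sorted result is range-partitioned, so every key in an earlier partition sorts before every key in a later one, and collect() returns a totally ordered list.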