pyspark.RDD.sortByKey
- RDD.sortByKey(ascending=True, numPartitions=None, keyfunc=<function RDD.<lambda>>)
Sorts this RDD, which is assumed to consist of (key, value) pairs.
New in version 0.9.1.
- Parameters
  - ascending : bool, optional, default True
    sort the keys in ascending or descending order
  - numPartitions : int, optional
    the number of partitions in the new RDD
  - keyfunc : function, optional, default identity mapping
    a function to compute the key
- Returns
  - RDD
    a new RDD sorted by key
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortByKey().first()
('1', 3)
>>> sc.parallelize(tmp).sortByKey(True, 1).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortByKey(True, 2).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> tmp2 = [('Mary', 1), ('had', 2), ('a', 3), ('little', 4), ('lamb', 5)]
>>> tmp2.extend([('whose', 6), ('fleece', 7), ('was', 8), ('white', 9)])
>>> sc.parallelize(tmp2).sortByKey(True, 3, keyfunc=lambda k: k.lower()).collect()
[('a', 3), ('fleece', 7), ('had', 2), ('lamb', 5),...('white', 9), ('whose', 6)]
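The examples above all sort in ascending order. As a minimal sketch of the remaining option (assuming the same running SparkContext bound to sc as in the examples above; the exact glom() partition split shown is one possible outcome, since sortByKey samples the keys to choose partition boundaries):

>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortByKey(ascending=False).collect()
[('d', 4), ('b', 2), ('a', 1), ('2', 5), ('1', 3)]
>>> sc.parallelize(tmp).sortByKey(True, 2).glom().collect()
[[('1', 3), ('2', 5)], [('a', 1), ('b', 2), ('d', 4)]]

Listing each partition's contents with glom() makes the structure visible: the sorted result is range-partitioned, so every key in an earlier partition sorts before every key in a later one, and collect() returns a totally ordered list.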