pyspark.SparkContext.parallelize

SparkContext.parallelize(c, numSlices=None)

Distribute a local Python collection to form an RDD. If the input represents a range, passing a range object is recommended for performance.

New in version 0.7.0.

Parameters
c : collections.abc.Iterable

iterable collection to distribute

numSlices : int, optional

the number of partitions of the new RDD

Returns
RDD

RDD representing distributed collection.

Examples

>>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
[[0], [2], [3], [4], [6]]
>>> sc.parallelize(range(0, 6, 2), 5).glom().collect()
[[], [0], [], [2], [4]]

Deal with a list of strings.

>>> strings = ["a", "b", "c"]
>>> sc.parallelize(strings, 2).glom().collect()
[['a'], ['b', 'c']]
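
The partition layouts in the examples above (including the empty partitions when numSlices exceeds the number of elements) are consistent with cutting the collection at evenly spaced integer positions. The following is a minimal pure-Python sketch of that rule, not Spark's actual implementation: the helper names slice_positions and split_into_slices are hypothetical, and the formula is an assumption that reproduces the outputs shown on this page. It needs no Spark installation.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def slice_positions(length: int, num_slices: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs for each partition.

    Hypothetical helper: assumes partition i covers the half-open index
    range [i * length // num_slices, (i + 1) * length // num_slices).
    """
    if num_slices < 1:
        raise ValueError("num_slices must be a positive integer")
    for i in range(num_slices):
        start = (i * length) // num_slices
        end = ((i + 1) * length) // num_slices
        yield start, end


def split_into_slices(items: Iterable[T], num_slices: int) -> List[List[T]]:
    """Mimic the shape of sc.parallelize(items, num_slices).glom().collect()."""
    data = list(items)  # materialize so we can index and take len()
    return [data[start:end] for start, end in slice_positions(len(data), num_slices)]


if __name__ == "__main__":
    print(split_into_slices([0, 2, 3, 4, 6], 5))
    print(split_into_slices(range(0, 6, 2), 5))
    print(split_into_slices(["a", "b", "c"], 2))
```

Under this assumption, three elements spread over five slices leave two slices empty, which matches the range(0, 6, 2) example. For real workloads, inspect the actual layout with glom().collect() rather than relying on this sketch.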
