pyspark.SparkContext.sequenceFile

SparkContext.sequenceFile(path, keyClass=None, valueClass=None, keyConverter=None, valueConverter=None, minSplits=None, batchSize=0)

Read a Hadoop SequenceFile with arbitrary key and value Writable class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. The mechanism is as follows:

  1. A Java RDD is created from the SequenceFile or other InputFormat, using the key and value Writable classes

  2. Serialization is attempted via pickling

  3. If this fails, the fallback is to call ‘toString’ on each key and value

  4. CPickleSerializer is used to deserialize pickled objects on the Python side
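
The pickle-then-toString fallback in steps 2 to 4 can be pictured with a pure-Python analogy. This is only a sketch: the real conversion happens on the JVM side, and the `Opaque` class and `to_python_side` helper are hypothetical names introduced for illustration, not part of PySpark.

```python
import pickle


class Opaque:
    """Hypothetical stand-in for a Writable that cannot be pickled."""

    def __init__(self, label):
        self.label = label

    def __reduce__(self):
        # Force pickling to fail, mimicking an unsupported Writable type.
        raise pickle.PicklingError("Opaque cannot be pickled")

    def __str__(self):
        return f"Opaque({self.label})"


def to_python_side(obj):
    """Pickle obj if possible, otherwise pickle its string form, then unpickle."""
    try:
        payload = pickle.dumps(obj)        # step 2: attempt pickling
    except Exception:
        payload = pickle.dumps(str(obj))   # step 3: fall back to the string form
    return pickle.loads(payload)           # step 4: deserialize on the Python side


pairs = [(1, {3.0: "bb"}), (2, Opaque("x"))]
converted = [(to_python_side(k), to_python_side(v)) for k, v in pairs]
```

Under these assumptions, picklable keys and values arrive unchanged, while the unpicklable value arrives as its string representation.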

New in version 1.3.0.

Parameters
path: str

path to the SequenceFile

keyClass: str, optional

fully qualified classname of key Writable class (e.g. “org.apache.hadoop.io.Text”)

valueClass: str, optional

fully qualified classname of value Writable class (e.g. “org.apache.hadoop.io.LongWritable”)

keyConverter: str, optional

fully qualified name of a function returning key WritableConverter

valueConverter: str, optional

fully qualified name of a function returning value WritableConverter

minSplits: int, optional

minimum splits in dataset (default min(2, sc.defaultParallelism))

batchSize: int, optional, default 0

The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
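
To make the batchSize idea concrete, the sketch below groups several Python objects into one pickled payload. It is a pure-Python illustration rather than Spark's serializer: `pickle_in_batches` is a hypothetical helper, and it only covers an explicit positive batch size, not the automatic choice made when batchSize is 0.

```python
import pickle
from itertools import islice


def pickle_in_batches(records, batch_size):
    """Yield one pickled payload per group of up to batch_size records."""
    if batch_size < 1:
        raise ValueError("this sketch needs an explicit positive batch_size")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # One payload carries the whole batch of Python objects.
        yield pickle.dumps(batch)


records = [(1, "aa"), (2, "bb"), (3, "cc"), (4, "dd"), (5, "ee")]
payloads = list(pickle_in_batches(records, batch_size=2))

# Unpickling every payload and flattening recovers the original pairs in order.
restored = [pair for payload in payloads for pair in pickle.loads(payload)]
```

Larger batches mean fewer payloads to transfer at the cost of more memory per payload, which is the trade-off this parameter exposes.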

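The documented minSplits default, min(2, sc.defaultParallelism), can be worked through with a few values. This is a small sketch of the stated formula only; `resolve_min_splits` is a hypothetical helper, and the parallelism values are made up for illustration.

```python
def resolve_min_splits(min_splits, default_parallelism):
    """Return the caller's value, or the documented default when none is given."""
    if min_splits is not None:
        return min_splits
    return min(2, default_parallelism)


on_single_core = resolve_min_splits(None, default_parallelism=1)
on_eight_cores = resolve_min_splits(None, default_parallelism=8)
explicit_value = resolve_min_splits(16, default_parallelism=8)
```

The default is therefore capped at 2 however large the default parallelism is, so a larger minimum has to be requested explicitly.
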
Returns
RDD

RDD of tuples of key and corresponding value

Examples

>>> import os
>>> import tempfile

Set the class of output format

>>> output_format_class = "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"

>>> with tempfile.TemporaryDirectory(prefix="sequenceFile") as d:
...     path = os.path.join(d, "hadoop_file")
...
...     # Write a temporary Hadoop file
...     rdd = sc.parallelize([(1, {3.0: "bb"}), (2, {1.0: "aa"}), (3, {2.0: "dd"})])
...     rdd.saveAsNewAPIHadoopFile(path, output_format_class)
...
...     collected = sorted(sc.sequenceFile(path).collect())

>>> collected
[(1, {3.0: 'bb'}), (2, {1.0: 'aa'}), (3, {2.0: 'dd'})]
