pyspark.SparkContext.hadoopFile

SparkContext.hadoopFile(path, inputFormatClass, keyClass, valueClass, keyConverter=None, valueConverter=None, conf=None, batchSize=0)

Read an ‘old’ Hadoop InputFormat with arbitrary key and value class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. The mechanism is the same as for SparkContext.sequenceFile().

New in version 1.1.0.

A Hadoop configuration can be passed in as a Python dict. This will be converted into a Configuration in Java.
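As a sketch, the conf dict is an ordinary string-to-string mapping of Hadoop property names to values; the specific property names below are standard Hadoop keys chosen only for illustration, not requirements of hadoopFile:

```python
# A Hadoop configuration is a plain Python dict of string keys and
# string values; Spark converts it into a Java Configuration object.
# These property names are illustrative examples of Hadoop keys.
hadoop_conf = {
    "mapreduce.input.fileinputformat.split.minsize": "134217728",  # 128 MiB
    "io.file.buffer.size": "65536",
}
# It would then be passed via the conf parameter, e.g.
# sc.hadoopFile(path, input_format_class, key_class, value_class,
#               conf=hadoop_conf)
```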

Parameters
path : str

    path to Hadoop file

inputFormatClass : str

    fully qualified classname of Hadoop InputFormat
    (e.g. “org.apache.hadoop.mapreduce.lib.input.TextInputFormat”)

keyClass : str

    fully qualified classname of key Writable class
    (e.g. “org.apache.hadoop.io.Text”)

valueClass : str

    fully qualified classname of value Writable class
    (e.g. “org.apache.hadoop.io.LongWritable”)

keyConverter : str, optional

    fully qualified name of a function returning key WritableConverter

valueConverter : str, optional

    fully qualified name of a function returning value WritableConverter

conf : dict, optional

    Hadoop configuration, passed in as a dict

batchSize : int, optional, default 0

    The number of Python objects represented as a single
    Java object. (default 0, choose batchSize automatically)

Returns
RDD

    RDD of tuples of key and corresponding value

Examples

>>> import os
>>> import tempfile

Set the related classes

>>> output_format_class = "org.apache.hadoop.mapred.TextOutputFormat"
>>> input_format_class = "org.apache.hadoop.mapred.TextInputFormat"
>>> key_class = "org.apache.hadoop.io.IntWritable"
>>> value_class = "org.apache.hadoop.io.Text"
>>> with tempfile.TemporaryDirectory(prefix="hadoopFile") as d:
...     path = os.path.join(d, "old_hadoop_file")
...
...     # Write a temporary Hadoop file
...     rdd = sc.parallelize([(1, ""), (1, "a"), (3, "x")])
...     rdd.saveAsHadoopFile(path, output_format_class, key_class, value_class)
...
...     loaded = sc.hadoopFile(path, input_format_class, key_class, value_class)
...     collected = sorted(loaded.collect())
>>> collected
[(0, '1\t'), (0, '1\ta'), (0, '3\tx')]
