pyspark.SparkContext.textFile
- SparkContext.textFile(name, minPartitions=None, use_unicode=True)
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings. The text files must be encoded as UTF-8.
New in version 0.7.0.
- Parameters
- name : str
directory of the input data files; the path can be a comma-separated list of paths to use as multiple inputs
- minPartitions : int, optional
suggested minimum number of partitions for the resulting RDD; see the partitioning sketch after the examples
- use_unicode : bool, default True
If use_unicode is False, the strings will be kept as str (encoded as UTF-8), which is faster and smaller than unicode; see the bytes sketch after the examples.
New in version 1.2.0.
- Returns
- RDD
RDD representing text data from the file(s).
Examples
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="textFile") as d:
...     path1 = os.path.join(d, "text1")
...     path2 = os.path.join(d, "text2")
...
...     # Write a temporary text file
...     sc.parallelize(["x", "y", "z"]).saveAsTextFile(path1)
...
...     # Write another temporary text file
...     sc.parallelize(["aa", "bb", "cc"]).saveAsTextFile(path2)
...
...     # Load text file
...     collected1 = sorted(sc.textFile(path1, 3).collect())
...     collected2 = sorted(sc.textFile(path2, 4).collect())
...
...     # Load two text files together
...     collected3 = sorted(sc.textFile('{},{}'.format(path1, path2), 5).collect())
>>> collected1
['x', 'y', 'z']
>>> collected2
['aa', 'bb', 'cc']
>>> collected3
['aa', 'bb', 'cc', 'x', 'y', 'z']
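Because minPartitions is only a suggested minimum, the actual partition count is decided by the Hadoop input-split computation and is often higher than the default. A minimal sketch of inspecting this with RDD.getNumPartitions(), assuming the same running sc as above (the temporary directory and the hint value of 10 are illustrative):

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="textFileParts") as d:
...     path = os.path.join(d, "numbers")
...     sc.parallelize(range(100), 2).map(str).saveAsTextFile(path)
...
...     # minPartitions is a hint, not a guarantee: the final count
...     # depends on how Hadoop computes input splits for the files
...     default_parts = sc.textFile(path).getNumPartitions()
...     hinted_parts = sc.textFile(path, minPartitions=10).getNumPartitions()
>>> hinted_parts >= default_parts
True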
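A sketch of the use_unicode=False path, again assuming a running sc. The str wording in the parameter description dates from Python 2; on Python 3 the undecoded records generally come back as bytes, so the expected output below is illustrative rather than guaranteed across versions:

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="textFileBytes") as d:
...     path = os.path.join(d, "raw")
...     sc.parallelize(["x", "y"]).saveAsTextFile(path)
...
...     # Skip UTF-8 decoding: each record is the raw encoded line
...     raw = sorted(sc.textFile(path, use_unicode=False).collect())
>>> raw
[b'x', b'y']

Skipping the decode step saves time and memory when records are passed straight to a byte-oriented sink, at the cost of working with undecoded data downstream.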