
pyspark.SparkContext.textFile

SparkContext.textFile(name, minPartitions=None, use_unicode=True)

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings. The text files must be encoded as UTF-8.

New in version 0.7.0.

Parameters
name : str

directory of the input data files; the path can be a comma-separated list of paths as multiple inputs

minPartitions : int, optional

suggested minimum number of partitions for the resulting RDD

use_unicode : bool, default True

If use_unicode is False, the strings will be kept as str (encoded as utf-8), which is faster and smaller than unicode.

New in version 1.2.0.

Returns
RDD

RDD representing text data from the file(s).

Examples

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="textFile") as d:
...     path1 = os.path.join(d, "text1")
...     path2 = os.path.join(d, "text2")
...
...     # Write a temporary text file
...     sc.parallelize(["x", "y", "z"]).saveAsTextFile(path1)
...
...     # Write another temporary text file
...     sc.parallelize(["aa", "bb", "cc"]).saveAsTextFile(path2)
...
...     # Load text file
...     collected1 = sorted(sc.textFile(path1, 3).collect())
...     collected2 = sorted(sc.textFile(path2, 4).collect())
...
...     # Load two text files together
...     collected3 = sorted(sc.textFile('{},{}'.format(path1, path2), 5).collect())
>>> collected1
['x', 'y', 'z']
>>> collected2
['aa', 'bb', 'cc']
>>> collected3
['aa', 'bb', 'cc', 'x', 'y', 'z']
