pyspark.RDD.pipe
- RDD.pipe(command, env=None, checkCode=False)
Return an RDD created by piping elements to a forked external process.
New in version 0.7.0.
- Parameters
- command : str
command to run.
- env : dict, optional
environment variables to set.
- checkCode : bool, optional
whether to check the return value of the shell command.
- Returns
Examples
>>> sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
['1', '2', '', '3']
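The sketch below imitates, without Spark, what piping elements through an external process might look like for a single partition. It is not Spark's implementation. The helper name `pipe_partition`, its `check_code` argument, the one-element-per-line framing, and the merging of `env` into the current environment are all assumptions made for illustration.

```python
import os
import shlex
import subprocess


def pipe_partition(elements, command, env=None, check_code=False):
    """Hypothetical local stand-in for piping one partition through a command.

    Not Spark code: it only imitates the behaviour the docstring describes.
    """
    # Assumption: each element is sent to the process as one line of text.
    stdin_text = "".join(str(e).rstrip("\n") + "\n" for e in elements)

    # Assumption: extra variables are layered on top of the current
    # environment so that the command can still be found on PATH.
    child_env = {**os.environ, **(env or {})}

    proc = subprocess.run(
        shlex.split(command),
        input=stdin_text,
        capture_output=True,
        text=True,
        env=child_env,
    )

    # Mirrors the idea behind checkCode: fail loudly on a non-zero exit status.
    if check_code and proc.returncode != 0:
        raise RuntimeError(
            f"command {command!r} exited with status {proc.returncode}"
        )

    # Each line of standard output becomes one output element.
    lines = proc.stdout.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    return lines


if __name__ == "__main__":
    print(pipe_partition(["1", "2", "", "3"], "cat"))  # ['1', '2', '', '3']
```

Under these assumptions, the empty string survives the round trip because it is written as an empty line and read back as one, which matches the `cat` example above.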