pyarrow.fs.HadoopFileSystem#

class pyarrow.fs.HadoopFileSystem(str host, int port=8020, str user=None, *, int replication=3, int buffer_size=0, default_block_size=None, kerb_ticket=None, extra_conf=None)#

Bases: FileSystem

HDFS-backed FileSystem implementation.

Parameters:
host : str

HDFS host to connect to. Set to “default” for fs.defaultFS from core-site.xml.

port : int, default 8020

HDFS port to connect to. Set to 0 for default or logical (HA) nodes.

user : str, default None

Username when connecting to HDFS; None implies login user.

replication : int, default 3

Number of copies each block will have.

buffer_size : int, default 0

If 0, no buffering will happen; otherwise the size of the temporary read and write buffer.

default_block_size : int, default None

None means the default configuration for HDFS; a typical block size is 128 MB.

kerb_ticket : str or path, default None

If not None, the path to the Kerberos ticket cache.

extra_conf : dict, default None

Extra key/value pairs for configuration; will override any hdfs-site.xml properties.

Examples

>>> from pyarrow import fs
>>> hdfs = fs.HadoopFileSystem(host, port, user=user, kerb_ticket=ticket_cache_path)
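
A minimal sketch of passing extra configuration; the host, user and the dfs.client.use.datanode.hostname property below are purely illustrative, not required values:

>>> hdfs = fs.HadoopFileSystem(
...     'namenode.example.com', port=8020, user='analyst',
...     extra_conf={'dfs.client.use.datanode.hostname': 'true'})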

For usage of the methods see examples for LocalFileSystem().

__init__(*args, **kwargs)#

Methods

__init__(*args, **kwargs)

copy_file(self, src, dest)

Copy a file.

create_dir(self, path, *, bool recursive=True)

Create a directory and subdirectories.

delete_dir(self, path)

Delete a directory and its contents, recursively.

delete_dir_contents(self, path, *, ...)

Delete a directory's contents, recursively.

delete_file(self, path)

Delete a file.

equals(self, FileSystem other)

from_uri(uri)

Instantiate HadoopFileSystem object from a URI string.

get_file_info(self, paths_or_selector)

Get info for the given files.

move(self, src, dest)

Move / rename a file or directory.

normalize_path(self, path)

Normalize filesystem path.

open_append_stream(self, path[, ...])

Open an output stream for appending.

open_input_file(self, path)

Open an input file for random access reading.

open_input_stream(self, path[, compression, ...])

Open an input stream for sequential reading.

open_output_stream(self, path[, ...])

Open an output stream for sequential writing.

Attributes

type_name

The filesystem's type name.

copy_file(self, src, dest)#

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Parameters:
src : str

The path of the file to be copied from.

dest : str

The destination path where the file is copied to.

Examples

>>> local.copy_file(path,
...                 local_path + '/pyarrow-fs-example_copy.dat')

Inspect the file info:

>>> local.get_file_info(local_path + '/pyarrow-fs-example_copy.dat')
<FileInfo for '/.../pyarrow-fs-example_copy.dat': type=FileType.File, size=4>
>>> local.get_file_info(path)
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>

create_dir(self, path, *, bool recursive=True)#

Create a directory and subdirectories.

This function succeeds if the directory already exists.

Parameters:
path : str

The path of the new directory.

recursive : bool, default True

Create nested directories as well.
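
Examples

A minimal sketch, assuming the hdfs instance from the class example above and an illustrative path:

>>> hdfs.create_dir('/tmp/new_dir/nested')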

delete_dir(self, path)#

Delete a directory and its contents, recursively.

Parameters:
path : str

The path of the directory to be deleted.
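
Examples

A minimal sketch, assuming the hdfs instance from the class example above; the directory and everything under it is removed:

>>> hdfs.create_dir('/tmp/scratch/sub')
>>> hdfs.delete_dir('/tmp/scratch')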

delete_dir_contents(self, path, *, bool accept_root_dir=False, bool missing_dir_ok=False)#

Delete a directory’s contents, recursively.

Like delete_dir, but doesn’t delete the directory itself.

Parameters:
path : str

The path of the directory to be deleted.

accept_root_dir : bool, default False

Allow deleting the root directory’s contents (if path is empty or “/”).

missing_dir_ok : bool, default False

If False, an error is raised if path does not exist.
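
Examples

A minimal sketch, assuming the hdfs instance and the fs import from the class example above; the directory itself is kept, only its contents are removed:

>>> hdfs.create_dir('/tmp/scratch/sub')
>>> hdfs.delete_dir_contents('/tmp/scratch')
>>> hdfs.get_file_info(fs.FileSelector('/tmp/scratch'))
[]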

delete_file(self, path)#

Delete a file.

Parameters:
path : str

The path of the file to be deleted.
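
Examples

A minimal sketch, assuming the hdfs instance from the class example above and an illustrative file path:

>>> with hdfs.open_output_stream('/tmp/example.dat') as f:
...     _ = f.write(b'data')
>>> hdfs.delete_file('/tmp/example.dat')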

equals(self, FileSystem other)#

Parameters:
other : pyarrow.fs.FileSystem

Returns:
bool
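
Examples

A minimal sketch, assuming the hdfs instance and the fs import from the class example above:

>>> hdfs.equals(hdfs)
True
>>> hdfs.equals(fs.LocalFileSystem())
False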
static from_uri(uri)#

Instantiate HadoopFileSystem object from a URI string.

The following two calls are equivalent:

  • HadoopFileSystem.from_uri('hdfs://localhost:8020/?user=test&replication=1')

  • HadoopFileSystem('localhost', port=8020, user='test', replication=1)

Parameters:
uri : str

A string URI describing the connection to HDFS. In order to change the user, replication, buffer_size or default_block_size pass the values as query parts.

Returns:
HadoopFileSystem
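
Examples

A minimal sketch; the namenode address and query parts below are illustrative:

>>> hdfs = fs.HadoopFileSystem.from_uri('hdfs://namenode.example.com:8020/?user=analyst&replication=2')
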
get_file_info(self, paths_or_selector)#

Get info for the given files.

Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object and has a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).

Parameters:
paths_or_selector : FileSelector, path-like or list of path-likes

Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, use allow_not_found.

Returns:
FileInfo or list of FileInfo

A single FileInfo object is returned for a single path; otherwise a list of FileInfo objects is returned.

Examples

>>> local
<pyarrow._fs.LocalFileSystem object at ...>
>>> local.get_file_info(f"/{local_path}/pyarrow-fs-example.dat")
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
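
A minimal sketch of listing a directory with a FileSelector, assuming the fs import from the class example above; the use of local_path and the recursive flag are illustrative:

>>> selector = fs.FileSelector(local_path, recursive=True)
>>> infos = local.get_file_info(selector)
>>> isinstance(infos, list)
True
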
move(self, src, dest)#

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Parameters:
src : str

The path of the file or the directory to be moved.

dest : str

The destination path where the file or directory is moved to.

Examples

Create a new folder with a file:

>>> local.create_dir('/tmp/other_dir')
>>> local.copy_file(path, '/tmp/move_example.dat')

Move the file:

>>> local.move('/tmp/move_example.dat',
...            '/tmp/other_dir/move_example_2.dat')

Inspect the file info:

>>> local.get_file_info('/tmp/other_dir/move_example_2.dat')
<FileInfo for '/tmp/other_dir/move_example_2.dat': type=FileType.File, size=4>
>>> local.get_file_info('/tmp/move_example.dat')
<FileInfo for '/tmp/move_example.dat': type=FileType.NotFound>

Delete the folder:

>>> local.delete_dir('/tmp/other_dir')

normalize_path(self, path)#

Normalize filesystem path.

Parameters:
path : str

The path to normalize

Returns:
normalized_path : str

The normalized path
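
Examples

A minimal sketch, assuming the hdfs instance from the class example above and an illustrative path; the exact normalized form depends on the filesystem:

>>> normalized = hdfs.normalize_path('/tmp//example.dat')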

open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)#

Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

Note

Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.

Parameters:
path : str

The source to open for writing.

compression : str optional, default ‘detect’

The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

buffer_size : int optional, default None

If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.

metadata : dict optional, default None

If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns:
stream : NativeFile

Examples

Append new data to a nonempty file on a FileSystem subclass:

>>> with local.open_append_stream(path) as f:
...     f.write(b'+newly added')
12

Print out the content of the file:

>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data+newly added'

open_input_file(self, path)#

Open an input file for random access reading.

Parameters:
path : str

The source to open for reading.

Returns:
stream : NativeFile

Examples

Print the data from the file with open_input_file():

>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data'

open_input_stream(self, path, compression='detect', buffer_size=None)#

Open an input stream for sequential reading.

Parameters:
path : str

The source to open for reading.

compression : str optional, default ‘detect’

The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

buffer_size : int optional, default None

If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.

Returns:
stream : NativeFile

Examples

Print the data from the file with open_input_stream():

>>> with local.open_input_stream(path) as f:
...     print(f.readall())
b'data'

open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)#

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Parameters:
path : str

The source to open for writing.

compression : str optional, default ‘detect’

The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).

buffer_size : int optional, default None

If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.

metadata : dict optional, default None

If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.

Returns:
stream : NativeFile

Examples

>>> local = fs.LocalFileSystem()
>>> with local.open_output_stream(path) as stream:
...     stream.write(b'data')
4
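
A minimal sketch of writing with on-the-fly compression; the output path and the choice of gzip are illustrative:

>>> with local.open_output_stream('/tmp/example.dat.gz', compression='gzip') as stream:
...     _ = stream.write(b'data')
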
type_name#

The filesystem’s type name.