pyarrow.fs.AzureFileSystem#
- class pyarrow.fs.AzureFileSystem(account_name, account_key=None, *, blob_storage_authority=None, blob_storage_scheme=None, client_id=None, client_secret=None, dfs_storage_authority=None, dfs_storage_scheme=None, sas_token=None, tenant_id=None)#
Bases: FileSystem
Azure Blob Storage backed FileSystem implementation
This implementation supports flat namespace and hierarchical namespace (HNS), a.k.a. Data Lake Gen2, storage accounts. HNS will be automatically detected and HNS-specific features will be used when they provide a performance advantage. The Azurite emulator is also supported. Note: / is the only supported delimiter.
The storage account is considered the root of the filesystem. When enabled, containers will be created or deleted during relevant directory operations. This also requires authenticating with the corresponding additional permissions.
By default, DefaultAzureCredential is used for authentication. This means it will try several types of authentication and go with the first one that works. If any authentication parameters are provided when initialising the FileSystem, they will be used instead of the default credential.
- Parameters:
- account_name
str  Azure Blob Storage account name. This is the globally unique identifier for the storage account.
- account_key
str, default None  Account key of the storage account. If sas_token and account_key are None, the default credential will be used. The parameters account_key and sas_token are mutually exclusive.
- blob_storage_authority
str, default None  hostname[:port] of the Blob Service. Defaults to .blob.core.windows.net. Useful for connecting to a local emulator, like Azurite.
- blob_storage_scheme
str, default None  Either http or https. Defaults to https. Useful for connecting to a local emulator, like Azurite.
- client_id
str, default None  The client ID (Application ID) for Azure Active Directory authentication. Its interpretation depends on the credential type being used:
- For ClientSecretCredential: it is the Application (client) ID of your registered Azure AD application (Service Principal). It must be provided together with tenant_id and client_secret to use ClientSecretCredential.
- For ManagedIdentityCredential: it is the client ID of a specific user-assigned managed identity. This is only necessary if you are using a user-assigned managed identity and need to explicitly specify which one (e.g., if the resource has multiple user-assigned identities). For system-assigned managed identities, this parameter is typically not required.
- client_secret
str, default None  Client secret for Azure Active Directory authentication. Must be provided together with tenant_id and client_id to use ClientSecretCredential.
- dfs_storage_authority
str, default None  hostname[:port] of the Data Lake Gen 2 Service. Defaults to .dfs.core.windows.net. Useful for connecting to a local emulator, like Azurite.
- dfs_storage_scheme
str, default None  Either http or https. Defaults to https. Useful for connecting to a local emulator, like Azurite.
- sas_token
str, default None  SAS token for the storage account, used as an alternative to account_key. If sas_token and account_key are None, the default credential will be used. The parameters account_key and sas_token are mutually exclusive.
- tenant_id
str, default None  Tenant ID for Azure Active Directory authentication. Must be provided together with client_id and client_secret to use ClientSecretCredential.
Examples
>>> from pyarrow import fs
>>> azure_fs = fs.AzureFileSystem(account_name='myaccount')
>>> azurite_fs = fs.AzureFileSystem(
...     account_name='devstoreaccount1',
...     account_key='Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
...     blob_storage_authority='127.0.0.1:10000',
...     dfs_storage_authority='127.0.0.1:10000',
...     blob_storage_scheme='http',
...     dfs_storage_scheme='http',
... )
For usage of the methods see examples for LocalFileSystem().
- __init__(*args, **kwargs)#
Methods
__init__(*args, **kwargs)
copy_file(self, src, dest)  Copy a file.
create_dir(self, path, *, bool recursive=True)  Create a directory and subdirectories.
delete_dir(self, path)  Delete a directory and its contents, recursively.
delete_dir_contents(self, path, *, ...)  Delete a directory's contents, recursively.
delete_file(self, path)  Delete a file.
equals(self, FileSystem other)
from_uri(uri)  Create a new FileSystem from URI or Path.
get_file_info(self, paths_or_selector)  Get info for the given files.
move(self, src, dest)  Move / rename a file or directory.
normalize_path(self, path)  Normalize filesystem path.
open_append_stream(self, path[, ...])  Open an output stream for appending.
open_input_file(self, path)  Open an input file for random access reading.
open_input_stream(self, path[, compression, ...])  Open an input stream for sequential reading.
open_output_stream(self, path[, ...])  Open an output stream for sequential writing.
Attributes
type_name  The filesystem's type name.
- copy_file(self, src, dest)#
Copy a file.
If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters:
Examples
>>> local.copy_file(path,
...                 local_path + '/pyarrow-fs-example_copy.dat')
Inspect the file info:
>>> local.get_file_info(local_path + '/pyarrow-fs-example_copy.dat')
<FileInfo for '/.../pyarrow-fs-example_copy.dat': type=FileType.File, size=4>
>>> local.get_file_info(path)
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
- create_dir(self, path, *, bool recursive=True)#
Create a directory and subdirectories.
This function succeeds if the directory already exists.
- delete_dir(self,path)#
Delete a directory and its contents, recursively.
- Parameters:
- path
str The path of the directory to be deleted.
- delete_dir_contents(self, path, *, bool accept_root_dir=False, bool missing_dir_ok=False)#
Delete a directory’s contents, recursively.
Like delete_dir, but doesn’t delete the directory itself.
- equals(self, FileSystem other)#
- Parameters:
- Returns:
- static from_uri(uri)#
Create a new FileSystem from URI or Path.
Recognized URI schemes are “file”, “mock”, “s3fs”, “gs”, “gcs”, “hdfs” and “viewfs”. In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters:
- uri
str  URI-based path, for example: file:///some/local/path.
- Returns:
tuple of (FileSystem, str path)  With (filesystem, path) tuple where path is the abstract path inside the FileSystem instance.
Examples
Create a new FileSystem subclass from a URI:
>>> uri = f'file:///{local_path}/pyarrow-fs-example.dat'
>>> local_new, path_new = fs.FileSystem.from_uri(uri)
>>> local_new
<pyarrow._fs.LocalFileSystem object at ...>
>>> path_new
'/.../pyarrow-fs-example.dat'
Or from an S3 bucket:
>>> fs.FileSystem.from_uri("s3://usgs-landsat/collection02/")
(<pyarrow._s3fs.S3FileSystem object at ...>, 'usgs-landsat/collection02')
Or from an fsspec+ URI:
>>> fs.FileSystem.from_uri("fsspec+memory:///path/to/file")
(<pyarrow._fs.PyFileSystem object at ...>, '/path/to/file')
- get_file_info(self, paths_or_selector)#
Get info for the given files.
Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileInfo object with a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters:
- paths_or_selector
FileSelector, path-like or list of path-likes  Either a selector object, a path-like object or a list of path-like objects. The selector’s base directory will not be part of the results, even if it exists. If it doesn’t exist, use allow_not_found.
- Returns:
Examples
>>> local
<pyarrow._fs.LocalFileSystem object at ...>
>>> local.get_file_info(f"/{local_path}/pyarrow-fs-example.dat")
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
- move(self, src, dest)#
Move / rename a file or directory.
If the destination exists:
- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).
- Parameters:
Examples
Create a new folder with a file:
>>> local.create_dir('/tmp/other_dir')
>>> local.copy_file(path, '/tmp/move_example.dat')
Move the file:
>>> local.move('/tmp/move_example.dat',
...            '/tmp/other_dir/move_example_2.dat')
Inspect the file info:
>>> local.get_file_info('/tmp/other_dir/move_example_2.dat')
<FileInfo for '/tmp/other_dir/move_example_2.dat': type=FileType.File, size=4>
>>> local.get_file_info('/tmp/move_example.dat')
<FileInfo for '/tmp/move_example.dat': type=FileType.NotFound>
Delete the folder:
>>> local.delete_dir('/tmp/other_dir')
- normalize_path(self, path)#
Normalize filesystem path.
- open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for appending.
If the target doesn’t exist, a new empty file is created.
Note
Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.
- Parameters:
- path
str The source to open for writing.
- compression
str optional, default ‘detect’  The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size
int optional, default None  If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata
dict optional, default None  If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.
- Returns:
- stream
NativeFile
Examples
Append new data to an existing, nonempty file:
>>> with local.open_append_stream(path) as f:
...     f.write(b'+newly added')
12
Print out the content of the file:
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data+newly added'
- open_input_file(self, path)#
Open an input file for random access reading.
- Parameters:
- path
str The source to open for reading.
- Returns:
- stream
NativeFile
Examples
Print the data from the file with open_input_file():
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data'
- open_input_stream(self, path, compression='detect', buffer_size=None)#
Open an input stream for sequential reading.
- Parameters:
- path
str The source to open for reading.
- compression
str optional, default ‘detect’  The compression algorithm to use for on-the-fly decompression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size
int optional, default None  If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.
- Returns:
- stream
NativeFile
Examples
Print the data from the file with open_input_stream():
>>> with local.open_input_stream(path) as f:
...     print(f.readall())
b'data'
- open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for sequential writing.
If the target already exists, existing data is truncated.
- Parameters:
- path
str The source to open for writing.
- compression
str optional, default ‘detect’  The compression algorithm to use for on-the-fly compression. If “detect” and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. “gzip”).
- buffer_size
int optional, default None  If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata
dict optional, default None  If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as “Content-Type”). Unsupported metadata keys will be ignored.
- Returns:
- stream
NativeFile
Examples
>>> local = fs.LocalFileSystem()
>>> with local.open_output_stream(path) as stream:
...     stream.write(b'data')
4
- type_name#
The filesystem’s type name.

