pyarrow.fs.S3FileSystem#
- class pyarrow.fs.S3FileSystem(access_key=None, *, secret_key=None, session_token=None, bool anonymous=False, region=None, request_timeout=None, connect_timeout=None, scheme=None, endpoint_override=None, bool background_writes=True, default_metadata=None, role_arn=None, session_name=None, external_id=None, load_frequency=900, proxy_options=None, allow_delayed_open=False, allow_bucket_creation=False, allow_bucket_deletion=False, check_directory_existence_before_creation=False, retry_strategy: S3RetryStrategy = AwsStandardS3RetryStrategy(max_attempts=3), force_virtual_addressing=False, tls_ca_file_path=None)#
Bases: FileSystem

S3-backed FileSystem implementation.
AWS access_key and secret_key can be provided explicitly.

If role_arn is provided instead of access_key and secret_key, temporary credentials will be fetched by issuing a request to STS to assume the specified role.

If neither access_key nor secret_key are provided, and role_arn is also not provided, then attempts to establish the credentials automatically. S3FileSystem will try the following methods, in order:
- AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables
- configuration files such as ~/.aws/credentials and ~/.aws/config
- for nodes on Amazon EC2, the EC2 Instance Metadata Service
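The lookup order above can be sketched in plain Python. This is only an illustrative approximation, not pyarrow's actual implementation (which delegates to the AWS SDK), and it omits the EC2 Instance Metadata Service step; `sketch_default_credentials` is a hypothetical helper:

```python
import configparser
import os

def sketch_default_credentials(env=os.environ, config_path='~/.aws/credentials'):
    # Illustrative approximation of the credential lookup order described
    # above; not pyarrow's actual implementation. Returns a
    # (access_key, secret_key, session_token) tuple, or None if nothing
    # was found (the EC2 metadata step is omitted here).
    # 1. Environment variables.
    key = env.get('AWS_ACCESS_KEY_ID')
    secret = env.get('AWS_SECRET_ACCESS_KEY')
    if key and secret:
        return key, secret, env.get('AWS_SESSION_TOKEN')
    # 2. Shared credentials file (e.g. ~/.aws/credentials).
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(config_path))
    if cfg.has_section('default'):
        return (cfg.get('default', 'aws_access_key_id', fallback=None),
                cfg.get('default', 'aws_secret_access_key', fallback=None),
                None)
    return None
```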
Note: S3 buckets are special and the operations available on them may be limited or more expensive than desired.

When S3FileSystem creates new buckets (assuming allow_bucket_creation is True), it does not pass any non-default settings. In AWS S3, the bucket and all objects will not be publicly visible, and will have no bucket policies and no resource tags. To have more control over how buckets are created, use a different API to create them.
- Parameters:
- access_key : str, default None
  AWS Access Key ID. Pass None to use the standard AWS environment variables and/or configuration file.
- secret_key : str, default None
  AWS Secret Access Key. Pass None to use the standard AWS environment variables and/or configuration file.
- session_token : str, default None
  AWS Session Token. An optional session token, required if access_key and secret_key are temporary credentials from STS.
- anonymous : bool, default False
  Whether to connect anonymously if access_key and secret_key are None. If true, will not attempt to look up credentials using standard AWS configuration methods.
- role_arn : str, default None
  AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role.
- session_name : str, default None
  An optional identifier for the assumed role session.
- external_id : str, default None
  An optional unique identifier that might be required when you assume a role in another account.
- load_frequency : int, default 900
  The frequency (in seconds) with which temporary credentials from an assumed role session will be refreshed.
- region : str, default None
  AWS region to connect to. If not set, the AWS SDK will attempt to determine the region using heuristics such as environment variables, configuration profile, EC2 metadata, or default to 'us-east-1' when SDK version <1.8. One can also use pyarrow.fs.resolve_s3_region() to automatically resolve the region from a bucket name.
- request_timeout : double, default None
  Socket read timeouts on Windows and macOS, in seconds. If omitted, the AWS SDK default value is used (typically 3 seconds). This option is ignored on non-Windows, non-macOS systems.
- connect_timeout : double, default None
  Socket connection timeout, in seconds. If omitted, the AWS SDK default value is used (typically 1 second).
- scheme : str, default 'https'
  S3 connection transport scheme.
- endpoint_override : str, default None
  Override region with a connect string such as "localhost:9000".
- background_writes : bool, default True
  Whether file writes will be issued in the background, without blocking.
- default_metadata : mapping or pyarrow.KeyValueMetadata, default None
  Default metadata for open_output_stream. This will be ignored if non-empty metadata is passed to open_output_stream.
- proxy_options : dict or str, default None
  If a proxy is used, provide the options here. Supported options are: 'scheme' (str: 'http' or 'https'; required), 'host' (str; required), 'port' (int; required), 'username' (str; optional), 'password' (str; optional). A proxy URI (str) can also be provided, in which case these options will be derived from the provided URI. The following are equivalent:

  S3FileSystem(proxy_options='http://username:password@localhost:8020')
  S3FileSystem(proxy_options={'scheme': 'http', 'host': 'localhost', 'port': 8020, 'username': 'username', 'password': 'password'})
- allow_delayed_open : bool, default False
  Whether to allow file-open methods to return before the actual open. This option may reduce latency as it decreases the number of round trips. The downside is failures such as opening a file in a non-existing bucket will only be reported when actual I/O is done (at worst, when attempting to close the file).
- allow_bucket_creation : bool, default False
  Whether to allow directory creation at the bucket-level. This option may also be passed in a URI query parameter.
- allow_bucket_deletion : bool, default False
  Whether to allow directory deletion at the bucket-level. This option may also be passed in a URI query parameter.
- check_directory_existence_before_creation : bool, default False
  Whether to check the directory existence before creating it. If false, when creating a directory the code will not check if it already exists or not. It's an optimization to try directory creation and catch the error, rather than issue two dependent I/O calls. If true, when creating a directory the code will only create the directory when necessary at the cost of extra I/O calls. This can be used for key/value cloud storage which has a hard rate limit to number of object mutation operations or scenarios such as the directories already exist and you do not have creation access.
- retry_strategy : S3RetryStrategy, default AwsStandardS3RetryStrategy(max_attempts=3)
  The retry strategy to use with S3; fail after max_attempts. Available strategies are AwsStandardS3RetryStrategy, AwsDefaultS3RetryStrategy.
- force_virtual_addressing : bool, default False
  Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access.
- tls_ca_file_path : str, default None
  If set, this should be the path of a file containing TLS certificates in PEM format which will be used for TLS verification.
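The equivalence between the URI form and the dict form of proxy_options (shown above) can be illustrated with a small stdlib sketch; `proxy_uri_to_options` is a hypothetical helper, not part of pyarrow:

```python
from urllib.parse import urlsplit

def proxy_uri_to_options(uri):
    # Hypothetical helper: derive the dict form of proxy_options from
    # the URI form, mirroring the equivalence documented above.
    parts = urlsplit(uri)
    options = {'scheme': parts.scheme, 'host': parts.hostname, 'port': parts.port}
    if parts.username:
        options['username'] = parts.username
    if parts.password:
        options['password'] = parts.password
    return options

proxy_uri_to_options('http://username:password@localhost:8020')
# {'scheme': 'http', 'host': 'localhost', 'port': 8020,
#  'username': 'username', 'password': 'password'}
```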
Examples
>>> from pyarrow import fs
>>> s3 = fs.S3FileSystem(region='us-west-2')
>>> s3.get_file_info(fs.FileSelector(
...    'power-analysis-ready-datastore/power_901_constants.zarr/FROCEAN', recursive=True
... ))
[<FileInfo for 'power-analysis-ready-datastore/power_901_constants.zarr/FROCEAN/.zarray...
For usage of the methods see examples for LocalFileSystem().

- __init__(*args, **kwargs)#
Methods
__init__(*args, **kwargs)
copy_file(self, src, dest) - Copy a file.
create_dir(self, path, *, bool recursive=True) - Create a directory and subdirectories.
delete_dir(self, path) - Delete a directory and its contents, recursively.
delete_dir_contents(self, path, *, ...) - Delete a directory's contents, recursively.
delete_file(self, path) - Delete a file.
equals(self, FileSystem other)
from_uri(uri) - Create a new FileSystem from URI or Path.
get_file_info(self, paths_or_selector) - Get info for the given files.
move(self, src, dest) - Move / rename a file or directory.
normalize_path(self, path) - Normalize filesystem path.
open_append_stream(self, path[, ...]) - Open an output stream for appending.
open_input_file(self, path) - Open an input file for random access reading.
open_input_stream(self, path[, compression, ...]) - Open an input stream for sequential reading.
open_output_stream(self, path[, ...]) - Open an output stream for sequential writing.
Attributes
- copy_file(self, src, dest)#
Copy a file.
If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters:
Examples
>>> local.copy_file(path,
...                 local_path + '/pyarrow-fs-example_copy.dat')
Inspect the file info:
>>> local.get_file_info(local_path + '/pyarrow-fs-example_copy.dat')
<FileInfo for '/.../pyarrow-fs-example_copy.dat': type=FileType.File, size=4>
>>> local.get_file_info(path)
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
- create_dir(self, path, *, bool recursive=True)#
Create a directory and subdirectories.
This function succeeds if the directory already exists.
- delete_dir(self, path)#
Delete a directory and its contents, recursively.
- Parameters:
- path : str
  The path of the directory to be deleted.
- delete_dir_contents(self, path, *, bool accept_root_dir=False, bool missing_dir_ok=False)#
Delete a directory’s contents, recursively.
Like delete_dir, but doesn’t delete the directory itself.
- equals(self, FileSystem other)#
- Parameters:
- Returns:
- static from_uri(uri)#
Create a new FileSystem from URI or Path.
Recognized URI schemes are "file", "mock", "s3fs", "gs", "gcs", "hdfs" and "viewfs". In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters:
- uri : str
  URI-based path, for example: file:///some/local/path.
- Returns:
tuple of (FileSystem, str path)
  With (filesystem, path) tuple where path is the abstract path inside the FileSystem instance.
Examples
Create a new FileSystem subclass from a URI:
>>> uri = f'file:///{local_path}/pyarrow-fs-example.dat'
>>> local_new, path_new = fs.FileSystem.from_uri(uri)
>>> local_new
<pyarrow._fs.LocalFileSystem object at ...>
>>> path_new
'/.../pyarrow-fs-example.dat'
Or from an S3 bucket:
>>> fs.FileSystem.from_uri("s3://usgs-landsat/collection02/")
(<pyarrow._s3fs.S3FileSystem object at ...>, 'usgs-landsat/collection02')
Or from an fsspec+ URI:
>>> fs.FileSystem.from_uri("fsspec+memory:///path/to/file")
(<pyarrow._fs.PyFileSystem object at ...>, '/path/to/file')
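The way from_uri() splits a URI into a filesystem and an inner path can be approximated with the stdlib. `split_fs_uri` below is a hypothetical illustration only: the real FileSystem.from_uri() returns an instantiated FileSystem object, not just the scheme name:

```python
from urllib.parse import urlsplit

def split_fs_uri(uri):
    # Hypothetical illustration of how a URI decomposes into a
    # filesystem scheme and the abstract path inside that filesystem.
    parts = urlsplit(uri)
    if parts.scheme in ('s3', 'gs', 'gcs'):
        # Bucket-style schemes: the bucket name becomes part of the path.
        return parts.scheme, (parts.netloc + parts.path).rstrip('/')
    return parts.scheme or 'file', parts.path

split_fs_uri('s3://usgs-landsat/collection02/')
# ('s3', 'usgs-landsat/collection02')
```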
- get_file_info(self, paths_or_selector)#
Get info for the given files.
Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileStat object and has a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters:
- paths_or_selector : FileSelector, path-like or list of path-likes
  Either a selector object, a path-like object or a list of path-like objects. The selector's base directory will not be part of the results, even if it exists. If it doesn't exist, use allow_not_found.
- Returns:
Examples
>>> local
<pyarrow._fs.LocalFileSystem object at ...>
>>> local.get_file_info(f"/{local_path}/pyarrow-fs-example.dat")
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
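Because a missing path is reported through the returned FileInfo rather than an exception, an existence check reduces to inspecting the info's type. `exists` below is a hypothetical, duck-typed helper (any object with a get_file_info method returning an info whose type has a name attribute will do):

```python
def exists(filesystem, path):
    # Hypothetical helper: get_file_info() reports a missing or
    # unreachable path as FileType.NotFound instead of raising,
    # so an existence check is a simple comparison on the type name.
    return filesystem.get_file_info(path).type.name != 'NotFound'
```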
- move(self, src, dest)#
Move / rename a file or directory.
If the destination exists:
- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).
- Parameters:
Examples
Create a new folder with a file:
>>> local.create_dir('/tmp/other_dir')
>>> local.copy_file(path, '/tmp/move_example.dat')
Move the file:
>>> local.move('/tmp/move_example.dat',
...            '/tmp/other_dir/move_example_2.dat')
Inspect the file info:
>>> local.get_file_info('/tmp/other_dir/move_example_2.dat')
<FileInfo for '/tmp/other_dir/move_example_2.dat': type=FileType.File, size=4>
>>> local.get_file_info('/tmp/move_example.dat')
<FileInfo for '/tmp/move_example.dat': type=FileType.NotFound>

Delete the folder:

>>> local.delete_dir('/tmp/other_dir')
- normalize_path(self, path)#
Normalize filesystem path.
- open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for appending.
If the target doesn’t exist, a new empty file is created.
Note
Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.
- Parameters:
- path : str
  The source to open for writing.
- compression : str optional, default 'detect'
  The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int optional, default None
  If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata : dict optional, default None
  If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
- Returns:
- stream : NativeFile
Examples
Append new data to a nonempty file:
>>> with local.open_append_stream(path) as f:
...     f.write(b'+newly added')
12
Print out the content of the file:
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data+newly added'
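Since some backends (S3 among them) cannot append in place, code that wants append-like behavior often falls back to writing part files, as the note above suggests. A hedged sketch with a hypothetical helper `append_or_new_part` (duck-typed over any filesystem exposing open_append_stream and open_output_stream):

```python
def append_or_new_part(filesystem, path, data, part_index):
    # Hypothetical helper: try to append; if the filesystem does not
    # support appends (open_append_stream raises NotImplementedError),
    # write a separate part file instead and return its path.
    try:
        with filesystem.open_append_stream(path) as f:
            f.write(data)
        return path
    except NotImplementedError:
        part_path = f'{path}.part{part_index}'
        with filesystem.open_output_stream(part_path) as f:
            f.write(data)
        return part_path
```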
- open_input_file(self, path)#
Open an input file for random access reading.
- Parameters:
- path : str
  The source to open for reading.
- Returns:
- stream : NativeFile
Examples
Print the data from the file with open_input_file():
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data'
- open_input_stream(self, path, compression='detect', buffer_size=None)#
Open an input stream for sequential reading.
- Parameters:
- path : str
  The source to open for reading.
- compression : str optional, default 'detect'
  The compression algorithm to use for on-the-fly decompression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int optional, default None
  If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.
- Returns:
- stream : NativeFile
Examples
Print the data from the file with open_input_stream():
>>> with local.open_input_stream(path) as f:
...     print(f.readall())
b'data'
- open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for sequential writing.
If the target already exists, existing data is truncated.
- Parameters:
- path : str
  The source to open for writing.
- compression : str optional, default 'detect'
  The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
- buffer_size : int optional, default None
  If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
- metadata : dict optional, default None
  If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
- Returns:
- stream : NativeFile
Examples
>>> local = fs.LocalFileSystem()
>>> with local.open_output_stream(path) as stream:
...     stream.write(b'data')
4
- region#
The AWS region this filesystem connects to.
- type_name#
The filesystem’s type name.

