pyarrow.fs.GcsFileSystem#
- class pyarrow.fs.GcsFileSystem(bool anonymous=False, *, access_token=None, target_service_account=None, credential_token_expiration=None, default_bucket_location='US', scheme=None, endpoint_override=None, default_metadata=None, retry_time_limit=None, project_id=None)#
Bases: FileSystem
Google Cloud Storage (GCS) backed FileSystem implementation.
By default, uses the process described in https://google.aip.dev/auth/4110 to resolve credentials. If not running on Google Cloud Platform (GCP), this generally requires the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to a JSON file containing credentials.
Note: GCS buckets are special and the operations available on them may be limited or more expensive than expected compared to local file systems.
Note: When pickling a GcsFileSystem that uses default credentials, the resolved credentials are not stored in the serialized data. Therefore, when unpickling it is assumed that the necessary credentials are in place for the target process.
- Parameters:
  - anonymous : bool, default False
    Whether to connect anonymously. If true, will not attempt to look up credentials using standard GCP configuration methods.
  - access_token : str, default None
    GCP access token. If provided, temporary credentials will be fetched by assuming this role; a credential_token_expiration must be specified as well.
  - target_service_account : str, default None
    An optional service account to try to impersonate when accessing GCS. This requires the specified credential user or service account to have the necessary permissions.
  - credential_token_expiration : datetime, default None
    Expiration for credential generated with an access token. Must be specified if access_token is specified.
  - default_bucket_location : str, default 'US'
    GCP region to create buckets in.
  - scheme : str, default 'https'
    GCS connection transport scheme.
  - endpoint_override : str, default None
    Override endpoint with a connect string such as "localhost:9000".
  - default_metadata : mapping or pyarrow.KeyValueMetadata, default None
    Default metadata for open_output_stream. This will be ignored if non-empty metadata is passed to open_output_stream.
  - retry_time_limit : timedelta, default None
    Set the maximum amount of time the GCS client will attempt to retry transient errors. Subsecond granularity is ignored.
  - project_id : str, default None
    The GCP project identifier to use for creating buckets. If not set, the library uses the GOOGLE_CLOUD_PROJECT environment variable. Most I/O operations do not need a project id; only applications that create new buckets need a project id.
- __init__(*args, **kwargs)#
Methods
- __init__(*args, **kwargs)
- copy_file(self, src, dest): Copy a file.
- create_dir(self, path, *, bool recursive=True): Create a directory and subdirectories.
- delete_dir(self, path): Delete a directory and its contents, recursively.
- delete_dir_contents(self, path, *, ...): Delete a directory's contents, recursively.
- delete_file(self, path): Delete a file.
- equals(self, FileSystem other): Test if this filesystem equals another.
- from_uri(uri): Create a new FileSystem from URI or Path.
- get_file_info(self, paths_or_selector): Get info for the given files.
- move(self, src, dest): Move / rename a file or directory.
- normalize_path(self, path): Normalize filesystem path.
- open_append_stream(self, path[, ...]): Open an output stream for appending.
- open_input_file(self, path): Open an input file for random access reading.
- open_input_stream(self, path[, compression, ...]): Open an input stream for sequential reading.
- open_output_stream(self, path[, ...]): Open an output stream for sequential writing.
Attributes
- default_bucket_location: The GCP location this filesystem will write to.
- project_id: The GCP project id this filesystem will use.
- type_name: The filesystem's type name.
- copy_file(self, src, dest)#
Copy a file.
If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.
- Parameters:
  - src : str
    The path of the file to be copied from.
  - dest : str
    The destination path where the file is copied to.
Examples
>>> local.copy_file(path,
...                 local_path + '/pyarrow-fs-example_copy.dat')
Inspect the file info:
>>> local.get_file_info(local_path + '/pyarrow-fs-example_copy.dat')
<FileInfo for '/.../pyarrow-fs-example_copy.dat': type=FileType.File, size=4>
>>> local.get_file_info(path)
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
- create_dir(self, path, *, bool recursive=True)#
Create a directory and subdirectories.
This function succeeds if the directory already exists.
- default_bucket_location#
The GCP location this filesystem will write to.
- delete_dir(self, path)#
Delete a directory and its contents, recursively.
- Parameters:
  - path : str
    The path of the directory to be deleted.
- delete_dir_contents(self, path, *, bool accept_root_dir=False, bool missing_dir_ok=False)#
Delete a directory’s contents, recursively.
Like delete_dir, but doesn’t delete the directory itself.
- equals(self, FileSystem other)#
- Parameters:
  - other : FileSystem
    The other filesystem to compare with.
- Returns:
  - bool
    True if the filesystems are equal.
- static from_uri(uri)#
Create a new FileSystem from URI or Path.
Recognized URI schemes are "file", "mock", "s3fs", "gs", "gcs", "hdfs" and "viewfs". In addition, the argument can be a pathlib.Path object, or a string describing an absolute local path.
- Parameters:
  - uri : str
    URI-based path, for example: file:///some/local/path.
- Returns:
  - tuple of (FileSystem, str path)
    With (filesystem, path) tuple where path is the abstract path inside the FileSystem instance.
Examples
Create a new FileSystem subclass from a URI:
>>> uri = f'file:///{local_path}/pyarrow-fs-example.dat'
>>> local_new, path_new = fs.FileSystem.from_uri(uri)
>>> local_new
<pyarrow._fs.LocalFileSystem object at ...>
>>> path_new
'/.../pyarrow-fs-example.dat'
Or from a s3 bucket:
>>> fs.FileSystem.from_uri("s3://usgs-landsat/collection02/")
(<pyarrow._s3fs.S3FileSystem object at ...>, 'usgs-landsat/collection02')
Or from an fsspec+ URI:
>>> fs.FileSystem.from_uri("fsspec+memory:///path/to/file")
(<pyarrow._fs.PyFileSystem object at ...>, '/path/to/file')
- get_file_info(self, paths_or_selector)#
Get info for the given files.
Any symlink is automatically dereferenced, recursively. A non-existing or unreachable file returns a FileStat object and has a FileType of value NotFound. An exception indicates a truly exceptional condition (low-level I/O error, etc.).
- Parameters:
  - paths_or_selector : FileSelector, path-like or list of path-likes
    Either a selector object, a path-like object or a list of path-like objects. The selector's base directory will not be part of the results, even if it exists. If it doesn't exist, use allow_not_found.
- Returns:
  - FileInfo or list of FileInfo
    A single FileInfo object is returned for a single path, otherwise a list of FileInfo objects is returned.
Examples
>>> local
<pyarrow._fs.LocalFileSystem object at ...>
>>> local.get_file_info(f"/{local_path}/pyarrow-fs-example.dat")
<FileInfo for '/.../pyarrow-fs-example.dat': type=FileType.File, size=4>
- move(self, src, dest)#
Move / rename a file or directory.
If the destination exists:
- if it is a non-empty directory, an error is returned
- otherwise, if it has the same type as the source, it is replaced
- otherwise, behavior is unspecified (implementation-dependent).
- Parameters:
  - src : str
    The path of the file or directory to be moved.
  - dest : str
    The destination path where the file or directory is moved to.
Examples
Create a new folder with a file:
>>> local.create_dir('/tmp/other_dir')
>>> local.copy_file(path, '/tmp/move_example.dat')
Move the file:
>>> local.move('/tmp/move_example.dat',
...            '/tmp/other_dir/move_example_2.dat')
Inspect the file info:
>>> local.get_file_info('/tmp/other_dir/move_example_2.dat')
<FileInfo for '/tmp/other_dir/move_example_2.dat': type=FileType.File, size=4>
>>> local.get_file_info('/tmp/move_example.dat')
<FileInfo for '/tmp/move_example.dat': type=FileType.NotFound>
Delete the folder:
>>> local.delete_dir('/tmp/other_dir')
- normalize_path(self, path)#
Normalize filesystem path.
- open_append_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for appending.
If the target doesn't exist, a new empty file is created.
Note
Some filesystem implementations do not support efficient appending to an existing file, in which case this method will raise NotImplementedError. Consider writing to multiple files (using e.g. the dataset layer) instead.
- Parameters:
  - path : str
    The source to open for writing.
  - compression : str optional, default 'detect'
    The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
  - buffer_size : int optional, default None
    If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
  - metadata : dict optional, default None
    If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
- Returns:
  - stream : NativeFile
Examples
Append new data to a FileSystem subclass with a nonempty file:
>>> with local.open_append_stream(path) as f:
...     f.write(b'+newly added')
12
Print out the content of the file:
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data+newly added'
- open_input_file(self, path)#
Open an input file for random access reading.
- Parameters:
  - path : str
    The source to open for reading.
- Returns:
  - stream : NativeFile
Examples
Print the data from the file with open_input_file():
>>> with local.open_input_file(path) as f:
...     print(f.readall())
b'data'
- open_input_stream(self, path, compression='detect', buffer_size=None)#
Open an input stream for sequential reading.
- Parameters:
  - path : str
    The source to open for reading.
  - compression : str optional, default 'detect'
    The compression algorithm to use for on-the-fly decompression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
  - buffer_size : int optional, default None
    If None or 0, no buffering will happen. Otherwise the size of the temporary read buffer.
- Returns:
  - stream : NativeFile
Examples
Print the data from the file with open_input_stream():
>>> with local.open_input_stream(path) as f:
...     print(f.readall())
b'data'
- open_output_stream(self, path, compression='detect', buffer_size=None, metadata=None)#
Open an output stream for sequential writing.
If the target already exists, existing data is truncated.
- Parameters:
  - path : str
    The source to open for writing.
  - compression : str optional, default 'detect'
    The compression algorithm to use for on-the-fly compression. If "detect" and source is a file path, then compression will be chosen based on the file extension. If None, no compression will be applied. Otherwise, a well-known algorithm name must be supplied (e.g. "gzip").
  - buffer_size : int optional, default None
    If None or 0, no buffering will happen. Otherwise the size of the temporary write buffer.
  - metadata : dict optional, default None
    If not None, a mapping of string keys to string values. Some filesystems support storing metadata along the file (such as "Content-Type"). Unsupported metadata keys will be ignored.
- Returns:
  - stream : NativeFile
Examples
>>> local = fs.LocalFileSystem()
>>> with local.open_output_stream(path) as stream:
...     stream.write(b'data')
4
- project_id#
The GCP project id this filesystem will use.
- type_name#
The filesystem’s type name.

