pyarrow.fs.copy_files

pyarrow.fs.copy_files(source, destination, source_filesystem=None, destination_filesystem=None, *, chunk_size=1048576, use_threads=True)

Copy files between FileSystems.

This function allows you to recursively copy directories of files from one file system to another, such as from S3 to your local machine.

Parameters:
source : str

Source file path or URI to a single file or directory. If a directory, files will be copied recursively from this path.

destination : str

Destination file path or URI. If source is a file, destination is also interpreted as the destination file (not directory). Directories will be created as necessary.

source_filesystem : FileSystem, optional

Source filesystem, needs to be specified if source is not a URI, otherwise inferred.

destination_filesystem : FileSystem, optional

Destination filesystem, needs to be specified if destination is not a URI, otherwise inferred.

chunk_size : int, default 1MB

The maximum size of block to read before flushing to the destination file. A larger chunk_size will use more memory while copying but may help accommodate high-latency FileSystems.

use_threads : bool, default True

Whether to use multiple threads to accelerate copying.

Examples

Inspect an S3 bucket’s files:

>>> s3, path = fs.FileSystem.from_uri(
...     "s3://registry.opendata.aws/roda/ndjson/")
>>> selector = fs.FileSelector(path)
>>> s3.get_file_info(selector)
[<FileInfo for 'registry.opendata.aws/roda/ndjson/index.ndjson':...]

Copy one file from S3 bucket to a local directory:

>>> fs.copy_files("s3://registry.opendata.aws/roda/ndjson/index.ndjson",
...               f"file:///{local_path}/index_copy.ndjson")
>>> fs.LocalFileSystem().get_file_info(str(local_path) + '/index_copy.ndjson')
<FileInfo for '.../index_copy.ndjson': type=FileType.File, size=...>

Copy file using a FileSystem object:

>>> fs.copy_files("registry.opendata.aws/roda/ndjson/index.ndjson",
...               f"file:///{local_path}/index_copy.ndjson",
...               source_filesystem=fs.S3FileSystem())