Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.



drivendataorg/cloudpathlib



Our goal is to be the meringue of file management libraries: the subtle sweetness of pathlib working in harmony with the ethereal lightness of the cloud.

A Python library with classes that mimic pathlib.Path's interface for URIs from different cloud storage services.

```python
from cloudpathlib import CloudPath

with CloudPath("s3://bucket/filename.txt").open("w+") as f:
    f.write("Send my changes to the cloud!")
```
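Writing through open() on a cloud path is possible because the edits can be made to a local copy that is uploaded once the file handle closes. The standard-library sketch below illustrates that write-then-upload pattern under stated assumptions: a plain dict named fake_bucket stands in for cloud storage, and upload_on_close is a hypothetical helper, not part of cloudpathlib's API.

```python
import contextlib
import tempfile
from pathlib import Path

# Hypothetical stand-in for a cloud bucket: maps object keys to bytes.
fake_bucket: dict[str, bytes] = {}


@contextlib.contextmanager
def upload_on_close(key: str, mode: str = "w+"):
    """Hypothetical helper: edit a local copy of `key`, upload it on a clean exit."""
    with tempfile.TemporaryDirectory() as cache_dir:
        local_copy = Path(cache_dir) / Path(key).name
        # "Download" the current contents so modes such as "a" or "r+" can see them.
        local_copy.write_bytes(fake_bucket.get(key, b""))
        with open(local_copy, mode) as f:
            yield f
        # Reached only if the block did not raise; the file is closed and complete.
        fake_bucket[key] = local_copy.read_bytes()


with upload_on_close("bucket/filename.txt") as f:
    f.write("Send my changes to the cloud!")

print(fake_bucket["bucket/filename.txt"])  # prints b'Send my changes to the cloud!'
```

Because the upload happens after the with block exits normally, an exception raised while writing leaves the stored object untouched.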

Why use cloudpathlib?

  • Familiar: If you know how to interact with Path, you know how to interact with CloudPath. All of the cloud-relevant Path methods are implemented.
  • Supported clouds: AWS S3, Google Cloud Storage, and Azure Blob Storage are implemented. FTP is on the way.
  • Extensible: The base classes do most of the work generically, so implementing two small classes, MyPath and MyClient, is all you need to add support for a new cloud storage service.
  • Read/write support: Reading just works. Calling write_text, write_bytes, or .open('w') uploads your changes to cloud storage without any additional file management on your part.
  • Seamless caching: Files are downloaded locally only when necessary. You can also easily pass a persistent cache folder so that across processes and sessions you only re-download what is necessary.
  • Tested: Comprehensive test suite and code coverage.
  • Testability: Local filesystem implementations that can be used to easily mock cloud storage in your unit tests.
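The split between a path class and a client class can be sketched in a few lines. In this hypothetical design (cloudpathlib's real base classes have a different and larger set of abstract methods), BasePath implements shared path logic once, BaseClient declares the storage primitives, and MyPath and MyClient are the two small classes a new service supplies. Here MyClient stores objects under a local directory, which is also the idea behind using local implementations as test doubles. All names and the myfs:// prefix are illustrative.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path, PurePosixPath


class BaseClient(ABC):
    """Hypothetical minimal client: moves raw bytes to and from storage."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def download(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class BasePath:
    """Shared path behaviour, written once against any BaseClient."""

    prefix = ""  # each concrete path class sets its own, e.g. "myfs://"

    def __init__(self, uri: str, client: BaseClient):
        if not uri.startswith(self.prefix):
            raise ValueError(f"{uri!r} does not start with {self.prefix!r}")
        self.key = uri[len(self.prefix):]
        self.client = client

    def __truediv__(self, other: str) -> "BasePath":
        # Join like pathlib, then rebuild an instance of the same subclass.
        joined = PurePosixPath(self.key) / other
        return type(self)(self.prefix + str(joined), self.client)

    def exists(self) -> bool:
        return self.client.exists(self.key)

    def write_text(self, text: str) -> int:
        self.client.upload(self.key, text.encode("utf-8"))
        return len(text)  # like pathlib: number of characters written

    def read_text(self) -> str:
        return self.client.download(self.key).decode("utf-8")


class MyClient(BaseClient):
    """Toy client keeping 'objects' as files under a local directory."""

    def __init__(self, root: Path):
        self.root = root

    def upload(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def download(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def exists(self, key: str) -> bool:
        return (self.root / key).is_file()


class MyPath(BasePath):
    """The path side of a new service only needs its URI prefix."""

    prefix = "myfs://"


with tempfile.TemporaryDirectory() as tmp:
    hello = MyPath("myfs://bucket", MyClient(Path(tmp))) / "nested/hello.txt"
    print(hello.exists())  # False
    hello.write_text("hello")
    print(hello.read_text())  # hello
```

Everything generic lives in BasePath, so a real backend would differ from MyClient only in how upload, download, and exists talk to the service.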

Installation

cloudpathlib depends on the cloud services' SDKs (e.g., boto3, google-cloud-storage, azure-storage-blob) to communicate with their respective storage service. If you try to use cloud paths for a cloud service for which you don't have dependencies installed, cloudpathlib will error and let you know what you need to install.
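Erroring with an install hint is a common optional-dependency pattern. A minimal sketch of it, assuming a hypothetical SDK_EXTRAS table that maps each URI prefix to the SDK module it needs and the pip extra that provides it (module names follow the SDKs listed above; this is not cloudpathlib's own code):

```python
import importlib.util

# Hypothetical mapping: URI prefix -> (importable SDK module, pip extra name).
SDK_EXTRAS = {
    "s3://": ("boto3", "s3"),
    "gs://": ("google.cloud.storage", "gs"),
    "az://": ("azure.storage.blob", "azure"),
}


def _is_installed(module: str) -> bool:
    try:
        # find_spec returns None when a top-level module is missing.
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package (e.g. "google") is missing.
        return False


def require_sdk(uri: str, extras: dict = SDK_EXTRAS) -> str:
    """Return the SDK module needed for `uri`, or raise an ImportError with a hint."""
    for prefix, (module, extra) in extras.items():
        if uri.startswith(prefix):
            if not _is_installed(module):
                raise ImportError(
                    f"{uri!r} needs {module!r}; install it with: "
                    f'pip install "cloudpathlib[{extra}]"'
                )
            return module
    raise ValueError(f"No cloud service registered for {uri!r}")
```

Checking with importlib.util.find_spec avoids importing a heavy SDK just to learn whether it is available.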

To install a cloud service's SDK dependency when installing cloudpathlib, you need to specify it using pip's "extras" specification. For example:

```bash
pip install cloudpathlib[s3,gs,azure]
```

With some shells, you may need to use quotes:

```bash
pip install "cloudpathlib[s3,gs,azure]"
```

Currently supported cloud storage services are: azure, gs, s3. You can also use "all" to install all available services' dependencies.

If you do not specify any extras or separately install any cloud SDKs, you will only be able to develop with the base classes for rolling your own cloud path class.

conda

cloudpathlib is also available using conda from conda-forge. Note that to install the necessary cloud service SDK dependency, you should include the appropriate suffix in the package name. For example:

```bash
conda install cloudpathlib-s3 -c conda-forge
```

If no suffix is used, only the base classes will be usable. See the conda-forge/cloudpathlib-feedstock for all installation options.

Development version

You can get the latest development version from GitHub:

```bash
pip install "cloudpathlib[all] @ git+https://github.com/drivendataorg/cloudpathlib.git"
```

Note that you similarly need to specify cloud service dependencies, such as "all" in the above example command.

Quick usage

Here's an example to get the gist of using the package. By default, cloudpathlib authenticates with the environment variables supported by each respective cloud service SDK. For more details and advanced authentication options, see the "Authentication" documentation.

```python
from cloudpathlib import CloudPath

# dispatches to S3Path based on prefix
root_dir = CloudPath("s3://drivendata-public-assets/")
root_dir
#> S3Path('s3://drivendata-public-assets/')

# there's only one file, but globbing works in nested folders
for f in root_dir.glob('**/*.txt'):
    text_data = f.read_text()
    print(f)
    print(text_data)
#> s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt
#> Eviction Lab Data Dictionary
#>
#> Additional information in our FAQ evictionlab.org/help-faq/
#> Full methodology evictionlab.org/methods/
#>
#> ... (additional text output truncated)

# use / to join paths (here, pointing at a file that does not exist yet)
new_file_copy = root_dir / "nested_dir/copy_file.txt"
new_file_copy
#> S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt')

# show things work and the file does not exist yet
new_file_copy.exists()
#> False

# writing text data to the new file in the cloud
new_file_copy.write_text(text_data)
#> 6933

# file now listed
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt'),
#>  S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]

# but, we can remove it
new_file_copy.unlink()

# no longer there
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]
```
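Object stores such as S3 have no real directories: a bucket holds flat keys, and "folders" are implied by "/" in key names. The sketch below shows how a recursive glob like glob('**/*.txt') can be answered from a flat key listing, and how a "directory" check reduces to a prefix test. It is an illustration, not cloudpathlib's implementation; the KEYS list is hypothetical (only its first entry mirrors the output above), and rglob_keys and is_dir are made-up helper names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical flat listing of object keys in one bucket.
KEYS = [
    "odsc-west-2019/DATA_DICTIONARY.txt",
    "nested_dir/copy_file.txt",
    "nested_dir/deeper/notes.md",
    "readme.txt",
]


def rglob_keys(keys, root: str, name_pattern: str):
    """Yield keys under `root` whose last segment matches `name_pattern`.

    Similar in spirit to Path(root).rglob(name_pattern): no recursion is
    needed because every key below the prefix is already in the listing.
    """
    prefix = root.rstrip("/") + "/" if root else ""
    for key in sorted(keys):
        if key.startswith(prefix) and fnmatchcase(PurePosixPath(key).name, name_pattern):
            yield key


def is_dir(keys, path: str) -> bool:
    """An implied directory exists if at least one key lives underneath it."""
    prefix = path.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in keys)


print(list(rglob_keys(KEYS, "", "*.txt")))
print(is_dir(KEYS, "nested_dir"), is_dir(KEYS, "readme.txt"))  # True False
```

Appending "/" before the prefix test keeps "nested" from matching "nested_dir".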

Supported methods and properties

Most methods and properties from pathlib.Path are supported, except for the ones that don't make sense in a cloud context. There are also a few additional methods and properties that are specific to cloud paths in general or to individual cloud services.

Methods + propertiesAzureBlobPathS3PathGSPath
absolute
anchor
as_uri
drive
exists
glob
is_absolute
is_dir
is_file
is_relative_to
iterdir
joinpath
match
mkdir
name
open
parent
parents
parts
read_bytes
read_text
relative_to
rename
replace
resolve
rglob
rmdir
samefile
stat
stem
suffix
suffixes
touch
unlink
with_name
with_stem
with_suffix
write_bytes
write_text
as_posix
chmod
cwd
expanduser
group
hardlink_to
home
is_block_device
is_char_device
is_fifo
is_mount
is_reserved
is_socket
is_symlink
lchmod
link_to
lstat
owner
readlink
root
symlink_to
as_url
clear_cache
cloud_prefix
copy
copytree
download_to
etag
fspath
is_junction
is_valid_cloudpath
rmtree
upload_from
validate
walk
with_segments
blob
bucket
container
key
md5
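Pure path operations such as name, suffix, with_suffix, parent, and / carry over to cloud URIs because the part after the "scheme://" prefix behaves like a POSIX path, while members such as chmod, owner, or symlink_to rely on a local filesystem. A sketch of that idea, assuming a hypothetical UriPath helper that splits off the prefix and delegates to the standard library's PurePosixPath; it is far less complete than cloudpathlib's own classes.

```python
from pathlib import PurePosixPath


class UriPath:
    """Hypothetical helper: treats the part after 'scheme://' as a POSIX path."""

    def __init__(self, uri: str):
        scheme, sep, rest = uri.partition("://")
        if not sep or not rest.strip("/"):
            raise ValueError(f"expected 'scheme://bucket/...', got {uri!r}")
        self.anchor = scheme + "://"
        self._path = PurePosixPath(rest)

    def _wrap(self, pure: PurePosixPath) -> "UriPath":
        return UriPath(self.anchor + str(pure))

    def __str__(self) -> str:
        return self.anchor + str(self._path)

    def __truediv__(self, other: str) -> "UriPath":
        return self._wrap(self._path / other)

    @property
    def name(self) -> str:
        return self._path.name

    @property
    def stem(self) -> str:
        return self._path.stem

    @property
    def suffix(self) -> str:
        return self._path.suffix

    @property
    def parent(self) -> "UriPath":
        # Stop at the bucket instead of producing something like "s3://.".
        if len(self._path.parts) <= 1:
            return self
        return self._wrap(self._path.parent)

    def with_suffix(self, suffix: str) -> "UriPath":
        return self._wrap(self._path.with_suffix(suffix))


p = UriPath("s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt")
print(p.name)                 # DATA_DICTIONARY.txt
print(p.with_suffix(".csv"))  # s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.csv
print(p.parent / "other.txt")  # s3://drivendata-public-assets/odsc-west-2019/other.txt
```

Delegating to PurePosixPath reuses the standard library's splitting and joining rules rather than reimplementing them with string slicing.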

Icon made by srip from www.flaticon.com.
Sample code block generated using the reprexpy package.
