rkhomenko/s5cmdPublic

forked frompeak/s5cmd

NotificationsYou must be signed in to change notification settings
Fork0
Star0

Parallel S3 and local filesystem execution tool.

License

MIT license

0 stars 274 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 528 Commits
.github		.github
benchmark		benchmark
command		command
doc		doc
e2e		e2e
error		error
log		log
orderedwriter		orderedwriter
parallel		parallel
progressbar		progressbar
storage		storage
strutil		strutil
vendor		vendor
version		version
.gitignore		.gitignore
.goreleaser.yml		.goreleaser.yml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Repository files navigation

Overview

s5cmd is a very fast S3 and local filesystem execution tool. It comes with supportfor a multitude of operations including tab completion and wildcard supportfor files, which can be very handy for your object storage workflow while workingwith large number of files.

There are already other utilities to work with S3 and similar object storageservices, thus it is natural to wonder whats5cmd has to offer that others don't.

In short,s5cmd offers a very fast speed.Thanks toJoshua Robinson for hisstudy and experimentation ons5cmd; to quote his mediumpost:

For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli.For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmdand aws-cli can only reach 85 MB/s and 375 MB/s respectively.

If you would like to know more about performance ofs5cmd and thereasons for its fast speed, refer tobenchmarks section

Features

s5cmd supports wide range of object management tasks both for cloudstorage services and local filesystems.

List buckets and objects
Upload, download or delete objects
Move, copy or rename objects
Set Server Side Encryption using AWS Key Management Service (KMS)
Set Access Control List (ACL) for objects/files on the upload, copy, move.
Print object contents to stdout
Select JSON records from objects using SQL expressions
Create or remove buckets
Summarize objects sizes, grouping by storage class
Wildcard support for all operations
Multiple arguments support for delete operation
Command file support to run commands in batches at very high execution speeds
Dry run support
S3 Transfer Acceleration support
Google Cloud Storage (and any other S3 API compatible service) support
Structured logging for querying command outputs
Shell auto-completion
S3 ListObjects API backward compatibility

Installation

Official Releases

Binaries

TheReleases page provides pre-builtbinaries for Linux, macOS and Windows.

Homebrew

For macOS, ahomebrew tap is provided:

brew install peak/tap/s5cmd

Unofficial Releases (by Community)

WarningThese releases are maintained by the community. They might be out of date compared to the official releases.

MacPorts

You can also installs5cmd fromMacPorts on macOS:

sudo port selfupdatesudo port install s5cmd

Conda

s5cmd isincluded in theconda-forge channel, and it can be downloaded through theConda.

Installings5cmd from theconda-forge channel can be achieved by addingconda-forge to your channels with:
conda config --add channels conda-forgeconda config --set channel_priority strict
Once theconda-forge channel has been enabled,s5cmd can be installed withconda:
conda install s5cmd

ps. Quoted froms5cmd feedstock. You can also find further instructions on itsREADME.

FreeBSD

On FreeBSD you can install s5cmd as a package:

pkg install s5cmd

or via ports:

cd /usr/ports/net/s5cmdmake install clean

Build from source

You can builds5cmd from source if you haveGo 1.19+installed.

go install github.com/peak/s5cmd/v2@master

⚠️ Please note that building frommaster is not guaranteed to be stable sincedevelopment happens onmaster branch.

Docker

Hub

$ docker pull peakcom/s5cmd$ docker run --rm -v ~/.aws:/root/.aws peakcom/s5cmd <S3 operation>

ℹ️/aws directory is the working directory of the image. Mounting your current working directory to it allows you to runs5cmd as if it was installed in your system;

docker run --rm -v $(pwd):/aws -v ~/.aws:/root/.aws peakcom/s5cmd <S3 operation>

Build

$ git clone https://github.com/peak/s5cmd && cd s5cmd$ docker build -t s5cmd .$ docker run --rm -v ~/.aws:/root/.aws s5cmd <S3 operation>

Usage

s5cmd supports multiple-level wildcards for all S3 operations. This isachieved by listing all S3 objects with the prefix up to the first wildcard,then filtering the results in-memory. For example, for the following command;

s5cmd cp 's3://bucket/logs/2020/03/*' .

first aListObjects request is send, then the copy operation will be executedagainst each matching object, in parallel.

Specifying credentials

s5cmd uses official AWS SDK to access S3. SDK requires credentials to signrequests to AWS. Credentials can be provided in avariety of ways:

Command line options--profile to use a named profile,--credentials-file flag to use the specified credentials file

# Use your company profile in AWS default credential files5cmd --profile my-work-profile ls s3://my-company-bucket/# Use your company profile in your own credential files5cmd --credentials-file~/.your-credentials-file --profile my-work-profile ls s3://my-company-bucket/

Environment variables

# Export your AWS access key and secret pairexport AWS_ACCESS_KEY_ID='<your-access-key-id>'export AWS_SECRET_ACCESS_KEY='<your-secret-access-key>'export AWS_PROFILE='<your-profile-name>'export AWS_REGION='<your-bucket-region>'s5cmd ls s3://your-bucket/

Ifs5cmd runs on an Amazon EC2 instance, EC2 IAM role
Ifs5cmd runs on EKS, Kube IAM role

Or, you can send requests anonymously with--no-sign-request option

# List objects in a public buckets5cmd --no-sign-request ls s3://public-bucket/

Region detection

While executing the commands,s5cmd detects the region according to the following order of priority:

--source-region or--destination-region flags ofcp command.
AWS_REGION environment variable.
Region section of AWS profile.
Auto detection from bucket region (viaHeadBucket API call).
us-east-1 as default region.

Examples

Download a single S3 object

s5cmd cp s3://bucket/object.gz .

Download multiple S3 objects

Suppose we have the following objects:

s3://bucket/logs/2020/03/18/file1.gzs3://bucket/logs/2020/03/19/file2.gzs3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd cp 's3://bucket/logs/2020/03/*' logs/

s5cmd will match the given wildcards and arguments by doing an efficientsearch against the given prefixes. All matching objects will be downloaded inparallel.s5cmd will create the destination directory if it is missing.

logs/ directory content will look like:

$ tree.└── logs    ├── 18    │   └── file1.gz    └── 19        ├── file2.gz        └── originals            └── file3.gz4 directories, 3 files

ℹ️s5cmd preserves the source directory structure by default. If you want toflatten the source directory structure, use the--flatten flag.

s5cmd cp --flatten 's3://bucket/logs/2020/03/*' logs/

logs/ directory content will look like:

$ tree.└── logs    ├── file1.gz    ├── file2.gz    └── file3.gz1 directory, 3 files

Upload a file to S3

s5cmd cp object.gz s3://bucket/

by setting server side encryption (aws kms) of the file:

s5cmd cp -sse aws:kms -sse-kms-key-id <your-kms-key-id> object.gz s3://bucket/

by setting Access Control List (acl) policy of the object:

s5cmd cp -acl bucket-owner-full-control object.gz s3://bucket/

Upload multiple files to S3

s5cmd cp directory/ s3://bucket/

Will upload all files at given directory to S3 while keeping the folder hierarchyof the source.

Stream stdin to S3

You can upload remote objects by piping stdin tos5cmd:

curl https://github.com/peak/s5cmd/ | s5cmd pipe s3://bucket/s5cmd.html

Or you can compress the data before uploading:

tar -cf - file.bin | s5cmd pipe s3://bucket/file.bin.tar

Delete an S3 object

s5cmd rm s3://bucket/logs/2020/03/18/file1.gz

Delete multiple S3 objects

s5cmd rm s3://bucket/logs/2020/03/19/*

Will remove all matching objects:

s3://bucket/logs/2020/03/19/file2.gzs3://bucket/logs/2020/03/19/originals/file3.gz

s5cmd utilizes S3 delete batch API. If matching objects are up to 1000,they'll be deleted in a single request. However, it should be noted that commands such as

s5cmd rm s3://bucket-foo/object s3://bucket-bar/object

are not supported bys5cmd and result in error (since we have 2 different buckets), as it is in odds with the benefit of performing batch delete requests. Thus, if in need, one can uses5cmd run mode for this case, i.e,

$ s5cmd runrm s3://bucket-foo/objectrm s3://bucket-bar/object

more details and examples ons5cmd run are presented in alater section.

Copy objects from S3 to S3

s5cmd supports copying objects on the server side as well.

s5cmd cp 's3://bucket/logs/2020/*' s3://bucket/logs/backup/

Will copy all the matching objects to the given S3 prefix, respecting the sourcefolder hierarchy.

⚠️ Copying objects (from S3 to S3) larger than 5GB is not supported yet. We haveanopen ticket to track the issue.

Using Exclude and Include Filters

s5cmd supports the--exclude and--include flags, which can be used to specify patterns for objects to be excluded or included in commands.

The--exclude flag specifies objects that should be excluded from the operation. Any object that matches the pattern will be skipped.
The--include flag specifies objects that should be included in the operation. Only objects that match the pattern will be handled.
If both flags are used,--exclude has precedence over--include. This means that if an object URL matches any of the--exclude patterns, the object will be skipped, even if it also matches one of the--include patterns.
The order of the flags does not affect the results (unlikeaws-cli).

The command below will delete only objects that end with.log.

s5cmd rm --include "*.log" 's3://bucket/logs/2020/*'

The command below will delete all objects except those that end with.log or.txt.

s5cmd rm --exclude "*.log" --exclude "*.txt" 's3://bucket/logs/2020/*'

If you wish, you can use multiple flags, like below. It will download objects that start withrequest or end with.log.

s5cmd cp --include "*.log" --include "request*" 's3://bucket/logs/2020/*' .

Using a combination of--include and--exclude also possible. The command below will only sync objects that end with.log or.txt but exclude those that start withaccess_. For example,request.log, andlicense.txt will be included, whileaccess_log.txt, andreadme.md are excluded.

s5cmd sync --include "*.log" --exclude "access_*" --include "*.txt" 's3://bucket/logs/*' .

Select JSON object content using SQL

s5cmd supports theSelectObjectContent S3 operation, and will run yourSQL queryagainst objects matching normal wildcard syntax and emit matching JSON records via stdout. Recordsfrom multiple objects will be interleaved, and order of the records is not guaranteed (though it'slikely that the records from a single object will arrive in-order, even if interleaved with otherrecords).

$ s5cmd select --compression GZIP \  --query "SELECT s.timestamp, s.hostname FROM S3Object s WHERE s.ip_address LIKE '10.%' OR s.application='unprivileged'" \  s3://bucket-foo/object/2021/*{"timestamp":"2021-07-08T18:24:06.665Z","hostname":"application.internal"}{"timestamp":"2021-07-08T18:24:16.095Z","hostname":"api.github.com"}

At the moment this operationonly supports JSON records selected with SQL. S3 calls thislines-type JSON, but it seems that it works even if the records aren't line-delineated. YMMV.

Count objects and determine total size

$ s5cmd du --humanize 's3://bucket/2020/*'30.8M bytes in 3 objects: s3://bucket/2020/*

Run multiple commands in parallel

The most powerful feature ofs5cmd is the commands file. Thousands of S3 andfilesystem commands are declared in a file (or simply piped in from anotherprocess) and they are executed using multiple parallel workers. Since only oneprogram is launched, thousands of unnecessary fork-exec calls are avoided. Thisway S3 execution times can reach a few thousand operations per second.

s5cmd run commands.txt

cat commands.txt | s5cmd run

commands.txt content could look like:

cp s3://bucket/2020/03/* logs/2020/03/# line comments are supportedrm s3://bucket/2020/03/19/file2.gz# empty lines are OK too like above# rename an S3 objectmv s3://bucket/2020/03/18/file1.gz s3://bucket/2020/03/18/original/file.gz

Sync

sync command synchronizes S3 buckets, prefixes, directories and files between S3 buckets and prefixes as well.It compares files between source and destination, taking source files assource-of-truth;

copies files those do not exist in destination
copies files those exist in both locations if the comparison made with sync strategy allows it so

It makes a one way synchronization from source to destination without modifying any of the source files and deleting any of the destination files (unless--delete flag has passed).

Suppose we have following files;

   -  29 Sep 10:00 .5000  29 Sep 11:00 ├── favicon.ico 300  29 Sep 10:00 ├── index.html  50  29 Sep 10:00 ├── readme.md  80  29 Sep 11:30 └── styles.css

s5cmd ls s3://bucket/static/2021/09/29 10:00:01               300 index.html2021/09/29 11:10:01                10 readme.md2021/09/29 10:00:01                90 styles.css2021/09/29 11:10:01                10 test.html

running would;

copyfavicon.ico
- file does not exist in destination.
copystyles.css
- source file is newer than to remote counterpart.
copyreadme.md
- even though the source one is older, it's size differs from the destination one; assuming source file is the source of truth.

s5cmd sync . s3://bucket/static/cp favicon.ico s3://bucket/static/favicon.icocp styles.css s3://bucket/static/styles.csscp readme.md s3://bucket/static/readme.md

Running with--delete flag would delete files those do not exist in the source;

s5cmd sync --delete . s3://bucket/static/rm s3://bucket/test.htmlcp favicon.ico s3://bucket/static/favicon.icocp styles.css s3://bucket/static/styles.csscp readme.md s3://bucket/static/readme.md

It's also possible to use wildcards to sync only a subset of files.

To sync only.html files in S3 bucket above to same local file system;

s5cmd sync 's3://bucket/static/*.html' .cp s3://bucket/prefix/index.html index.htmlcp s3://bucket/prefix/test.html test.html

Strategy

Default

By defaults5cmd compares files' both sizeand modification times, treating source files assource of truth. Any difference in size or modification time would causes5cmd to copy source object to destination.

mod time	size	should sync
src > dst	src != dst	✅
src > dst	src == dst	✅
src <= dst	src != dst	✅
src <= dst	src == dst	❌

Size only

With--size-only flag, it's possible to use the strategy that would only compare file sizes. Source treated assource of truth and any difference in sizes would causes5cmd to copy source object to destination.

mod time	size	should sync
src > dst	src != dst	✅
src > dst	src = dst	❌
src <= dst	src != dst	✅
src <= dst	src == dst	❌

Dry run

--dry-run flag will output what operations will be performed without actuallycarrying out those operations.

s3://bucket/pre/file1.gz...s3://bucket/last.txt

running

s5cmd --dry-run cp s3://bucket/pre/* s3://another-bucket/

will output

cp s3://bucket/pre/file1.gz s3://another-bucket/file1.gz...cp s3://bucket/pre/last.txt s3://anohter-bucket/last.txt

however, those copy operations will not be performed. It is displaying whats5cmd will do when ran without--dry-run

Note that--dry-run can be used with any operation that has a side effect, i.e.,cp, mv, rm, mb ...

S3 ListObjects API Backward Compatibility

The--use-list-objects-v1 flag will force using S3 ListObjectsV1 API. Thisflag is useful for services that do not support ListObjectsV2 API.

s5cmd --use-list-objects-v1 ls s3://bucket/

Shell auto-completion

Shell completion is supported for bash, pwsh (PowerShell) and zsh.

Runs5cmd --install-completion to obtain the appropriate auto-completion script for your shell, note thatinstall-completion does not install the auto-completion but merely gives the instructions to install. The name is kept as it is for backward compatibility.

To actually enable auto-completion:

in bash and zsh:

you should add auto-completion script to.bashrc and.zshrc file.

in pwsh:

you should save the autocompletion script to a file nameds5cmd.ps1 and add the full path of "s5cmd.ps1" file to profile file (which you can locate with$profile)

Finally, restart your shell to activate the changes.

NoteThe environment variableSHELL must be accurate for the autocompletion to function properly. That is it should point tobash binary in bash, tozsh binary in zsh and topwsh binary in PowerShell.

NoteThe autocompletion is tested with following versions of the shells:
zsh 5.8.1 (x86_64-apple-darwin21.0)
GNUbash, version 5.1.16(1)-release (x86_64-apple-darwin21.1.0)
PowerShell 7.2.6

Google Cloud Storage support

s5cmd supports S3 API compatible services, such as GCS, Minio or your favoriteobject storage.

s5cmd --endpoint-url https://storage.googleapis.com ls

or an alternative with environment variable

S3_ENDPOINT_URL="https://storage.googleapis.com" s5cmd ls# orexport S3_ENDPOINT_URL="https://storage.googleapis.com"s5cmd ls

all variants will return your GCS buckets.

s5cmd reads.aws/credentials to access Google Cloud Storage. Populate theaws_access_key_id andaws_secret_access_key fields in.aws/credentials with an HMAC key created using thisprocedure.

s5cmd will use virtual-host style bucket resolving for S3, S3 transferacceleration and GCS. If a custom endpoint is provided, it'll fallback topath-style.

Retry logic

s5cmd uses an exponential backoff retry mechanism for transient or potentialserver-side throttling errors. Non-retriable errors, such asinvalid credentials,authorization errors etc, will not be retried. By default,s5cmd will retry 10 times for up to a minute. Number of retries are adjustablevia--retry-count flag.

ℹ️ Enable debug level logging for displaying retryable errors.

Integrity Verification

s5cmd verifies the integrity of files uploaded to Amazon S3 by checking theContent-MD5 andX-Amz-Content-Sha256 headers. These headers are added by the AWS SDK for both standard and multipart uploads.

Content-MD5 is a checksum of the file's contents, calculated using theMD5 algorithm.
X-Amz-Content-Sha256 is a checksum of the file's contents, calculated using theSHA256 algorithm.

If the checksums in these headers do not match the checksum of the file that was actually uploaded, thens5cmd will fail the upload. This helps to ensure that the file was not corrupted during transmission.

If the checksum calculated by S3 does not match the checksums provided in theContent-MD5 andX-Amz-Content-Sha256 headers, S3 will not store the object. Instead, it will return an error message tos5cmd with the error codeInvalidDigest for anMD5 mismatch orXAmzContentSHA256Mismatch for aSHA256 mismatch.

Error Code	Description
`InvalidDigest`	The checksum provided in the`Content-MD5` header does not match the checksum calculated by S3.
`XAmzContentSHA256Mismatch`	The checksum provided in the`X-Amz-Content-Sha256` header does not match the checksum calculated by S3.

Ifs5cmd receives either of these error codes, it will not retry to upload the object again and exit code will be1.

If theMD5 checksum mismatches, you will see an error like the one below.

ERROR "cp file.log s3://bucket/file.log": InvalidDigest: The Content-MD5 you specified was invalid. status code: 400, request id: S3TR4P2E0A2K3JMH7, host id: XTeMYKd2KECOHWk5S

If theSHA256 checksum mismatches, you will see an error like the one below.

ERROR "cp file.log s3://bucket/file.log": XAmzContentSHA256Mismatch: The provided 'x-amz-content-sha256' header does not match what was computed. status code: 400, request id: S3TR4P2E0A2K3JMH7, host id: XTeMYKd2KECOHWk5S

aws-cli ands5cmd are both command-line tools that can be used to interact with Amazon S3. However, there are some differences between the two tools in terms of how they verify the integrity of data uploaded to S3.

Number of retries:aws-cli will retry up to five times to upload a file, whiles5cmd will not retry.
Checksums: If you enableSignature Version 4 in your~/.aws/config file,aws-cli will only check theSHA256 checksum of a file whiles5cmd will check both theMD5 andSHA256 checksums.

Sources:

Using wildcards

On some shells, like zsh, the* character gets treated as a file globbingwildcard, which causes unexpected results fors5cmd. You might see an outputlike:

zsh: no matches found

If that happens, you need to wrap your wildcard expression in single quotes, like:

s5cmd cp '*.gz' s3://bucket/

Output

s5cmd supports both structured and unstructured outputs.

unstructured output

$ s5cmd cp s3://bucket/testfile.cp s3://bucket/testfile testfile

$ s5cmd cp --no-clobber s3://somebucket/file.txt file.txtERROR"cp s3://somebucket/file.txt file.txt": object already exists

If--json flag is provided:

{"operation":"cp","success":true,"source":"s3://bucket/testfile","destination":"testfile","object":"[object]"}{"operation":"cp","job":"cp s3://somebucket/file.txt file.txt","error":"'cp s3://somebucket/file.txt file.txt': object already exists"}

Configuring Concurrency

numworkers

numworkers is a global option that sets the size of the global worker pool. Default value ofnumworkers is256.Commands such ascp,select andrun, which can benefit from parallelism use this worker pool to execute tasks. A task can be an upload, a download or anything in arun file.

For example, if you are uploading 100 files to an S3 bucket and the--numworkers is set to 10, thens5cmd will limit the number of files concurrently uploaded to 10.

s5cmd --numworkers 10 cp '/Users/foo/bar/*' s3://mybucket/foo/bar/

concurrency

concurrency is acp command option. It sets the number of parts that will be uploaded or downloaded in parallel for a single file.This parameter is used by the AWS Go SDK. Default value ofconcurrency is5.

numworkers andconcurrency options can be used together:

s5cmd --numworkers 10 cp --concurrency 10 '/Users/foo/bar/*' s3://mybucket/foo/bar/

If you have a few, large files to download, setting--numworkers to a very high value will not affect download speed. In this scenario setting--concurrency to a higher value may have a better impact on the download speed.

Benchmarks

Some benchmarks regarding the performance ofs5cmd are introduced below. For moredetails refer to thispostwhich is the source of the benchmarks to be presented.

Upload/download of single large file

Uploading large number of small-sized files

Performance comparison on different hardware

So, where does all this speed come from?

There are mainly two reasons for this:

It is written in Go, a statically compiled language designed to make developmentof concurrent systems easy and make full utilization of multi-core processors.
Parallelization.s5cmd starts out with concurrent worker pools and parallelizesworkloads as much as possible while trying to achieve maximum throughput.

performance regression tests

bench.py script can be used to compare performance of two different s5cmd builds. Refer to thisreadme file for further details.

Advanced Usage

Some of the advanced usage patterns provided below are inspired by the followingarticle (thank you!@joshuarobinson)

Integrate s5cmd operations with Unix commands

Assume we have a set of objects on S3, and we would like to list them in sorted fashion according to object names.

$ s5cmd ls s3://bucket/reports/ | sort -k 42020/08/17 09:34:33              1364 antalya.csv2020/08/17 09:34:33                 0 batman.csv2020/08/17 09:34:33             23114 istanbul.csv2020/08/17 09:34:33             26154 izmir.csv2020/08/17 09:34:33               112 samsun.csv2020/08/17 09:34:33             12552 van.csv

For a more practical scenario, let's say we have anavocado prices dataset, and we would like to take a peek at the few lines of the data by fetching only the necessary bytes.

$ s5cmd cat s3://bucket/avocado.csv.gz | gunzip | xsv slice --len 5 | xsv table    Date        AveragePrice  Total Volume  4046     4225       4770   Total Bags  Small Bags  Large Bags  XLarge Bags  type          year  region0   2015-12-27  1.33          64236.62      1036.74  54454.85   48.16  8696.87     8603.62     93.25       0.0          conventional  2015  Albany1   2015-12-20  1.35          54876.98      674.28   44638.81   58.33  9505.56     9408.07     97.49       0.0          conventional  2015  Albany2   2015-12-13  0.93          118220.22     794.7    109149.67  130.5  8145.35     8042.21     103.14      0.0          conventional  2015  Albany3   2015-12-06  1.08          78992.15      1132.0   71976.41   72.58  5811.16     5677.4      133.76      0.0          conventional  2015  Albany4   2015-11-29  1.28          51039.6       941.48   43838.39   75.78  6183.95     5986.26     197.69      0.0          conventional  2015  Albany

Beast Mode s5cmd

s5cmd allows to pass in some file, containing list of operations to be performed, as an argument to therun command as illustrated in theabove example. Alternatively, one can pipe in commands intotherun:

BUCKET=s5cmd-test; s5cmd ls s3://$BUCKET/*test | grep -v DIR | awk ‘{print $NF}’| xargs -I {} echo “cp s3://$BUCKET/{} /local/directory/” | s5cmd run

The above command performs twos5cmd invocations; first, searches for files withtest suffix and then creates acopy to local directory command for each matching file and finally, pipes in those into the run.

Let's examine another usage instance, where we migrate files older than30 days to a cloud object storage:

find /mnt/joshua/nachos/ -type f -mtime +30 | awk '{print "mv "$1" s3://joshuarobinson/backup/"$1}'| s5cmd run

It is worth to mention that,run command should not be considered as asilver bullet for all operations. For example, assume we want to remove the following objects:

s3://bucket/prefix/2020/03/object1.gzs3://bucket/prefix/2020/04/object1.gz...s3://bucket/prefix/2020/09/object77.gz

Rather than executing

rm s3://bucket/prefix/2020/03/object1.gzrm s3://bucket/prefix/2020/04/object1.gz...rm s3://bucket/prefix/2020/09/object77.gz

withrun command, it is better to just use

rm s3://bucket/prefix/2020/0*/object*.gz

the latter sends single delete request per thousand objects, whereas using the former approachsends a separate delete request for each subcommand provided torun. Thus, there can be asignificant runtime difference between those two approaches.

LICENSE

MIT. SeeLICENSE.

About

Parallel S3 and local filesystem execution tool.

Releases

No releases published

Packages

No packages published

Languages

Go96.9%
Python2.9%
Other0.2%

Movatterモバイル変換

License

rkhomenko/s5cmd

Folders and files

Latest commit

History

Repository files navigation

Overview

Features

Installation

Official Releases

Binaries

Homebrew

Unofficial Releases (by Community)

MacPorts

Conda

FreeBSD

Build from source

Docker

Hub

Build

Usage

Specifying credentials

Region detection

Examples

Download a single S3 object

Download multiple S3 objects

Upload a file to S3

Upload multiple files to S3

Stream stdin to S3

Delete an S3 object

Delete multiple S3 objects

Copy objects from S3 to S3

Using Exclude and Include Filters

Select JSON object content using SQL

Count objects and determine total size

Run multiple commands in parallel

Sync

Strategy

Default

Size only

Dry run

S3 ListObjects API Backward Compatibility

Shell auto-completion

in bash and zsh:

in pwsh:

Google Cloud Storage support

Retry logic

Integrity Verification

Using wildcards

Output

Configuring Concurrency

numworkers

concurrency

Benchmarks

performance regression tests

Advanced Usage

Integrate s5cmd operations with Unix commands

Beast Mode s5cmd

LICENSE

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Languages

Packages