WAL-G

This documentation is also available at wal-g.readthedocs.io

WAL-G is an archival and restoration tool for PostgreSQL, MySQL/MariaDB, and MS SQL Server (beta for MongoDB and Redis).

WAL-G is the successor of WAL-E with a number of key differences. WAL-G uses LZ4, LZMA, ZSTD, or Brotli compression, multiple processors, and non-exclusive base backups for Postgres. More information on the original design and implementation of WAL-G can be found in the Citus Data blog post "Introducing WAL-G by Citus: Faster Disaster Recovery for Postgres".

Installation

A precompiled binary of the latest version of WAL-G for Linux AMD64 can be obtained under the Releases tab.

The binary name has the following format: wal-g-DBNAME-OSNAME, where DBNAME stands for the name of the database (for example pg, mysql) and OSNAME stands for the name of the operating system used for building the binary.

To decompress the binary, use:

tar -zxvf wal-g-DBNAME-OSNAME-amd64.tar.gz
mv wal-g-DBNAME-OSNAME-amd64 /usr/local/bin/wal-g

For example, for Postgres and Ubuntu 18.04:

tar -zxvf wal-g-pg-ubuntu-18.04-amd64.tar.gz
mv wal-g-pg-ubuntu-18.04-amd64 /usr/local/bin/wal-g

For other systems, please consult the Development section for more information.

WAL-G supports bash and zsh autocompletion. Run wal-g help completion for more info.

Configuration

There are two ways to configure WAL-G:

Using environment variables
Using a config file
The --config /path flag can be used to specify the path where the config file is located.
We support every format that the viper package supports: JSON, YAML, envfile, and others.

Every configuration variable mentioned in the following documentation can be specified either as an environment variable or a field in the config file.
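
As a minimal sketch of the two approaches, the snippet below sets one variable through the environment and writes the same setting to a JSON config file. The file path /tmp/walg-example.json and the value zstd are illustrative assumptions, and the final wal-g call is left commented out because it needs a configured storage.

```shell
# Option 1: environment variable.
export WALG_COMPRESSION_METHOD=zstd

# Option 2: config file (hypothetical path), using the variable name as the field.
config_file=/tmp/walg-example.json
cat > "$config_file" <<'EOF'
{
  "WALG_COMPRESSION_METHOD": "zstd"
}
EOF

# Pass the file with the --config flag, for example:
# wal-g --config "$config_file" backup-list
```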

Storage

To configure where WAL-G stores backups, please consult the Storages section.

Compression

WALG_COMPRESSION_METHOD

To configure the compression method used for backups. Possible options are: lz4, lzma, zstd, brotli. The default method is lz4. LZ4 is the fastest method, but its compression ratio is poor. LZMA is much slower; however, it compresses backups about 6 times better than LZ4. Brotli and zstd are a good trade-off between speed and compression ratio, which is about 3 times better than LZ4.

Encryption

YC_CSE_KMS_KEY_ID

To configure Yandex Cloud KMS key for client-side encryption and decryption. By default, no encryption is used.

YC_SERVICE_ACCOUNT_KEY_FILE

To configure the name of a file containing the private key of a Yandex Cloud Service Account. If not set, a token from the metadata service (http://169.254.169.254) will be used to make API calls to Yandex Cloud KMS.

WALG_LIBSODIUM_KEY

To configure encryption and decryption with libsodium. WAL-G uses an algorithm that only requires a secret key. libsodium keys are fixed-size keys of 32 bytes. For optimal cryptographic security, it is recommended to use a random 32-byte key. To generate a random key, you can use something like openssl rand -hex 32 (set WALG_LIBSODIUM_KEY_TRANSFORM to hex) or openssl rand -base64 32 (set WALG_LIBSODIUM_KEY_TRANSFORM to base64).

WALG_LIBSODIUM_KEY_PATH

Similar to WALG_LIBSODIUM_KEY, but the value is the path to the key on the file system. Whitespace characters will be trimmed from the file content.

WALG_LIBSODIUM_KEY_TRANSFORM

The transform that will be applied to the WALG_LIBSODIUM_KEY to get the required 32-byte key. Supported transformations are base64, hex, or none (default). The option none exists for backwards compatibility: the user input will be converted to 32 bytes either via truncation or by zero-padding.
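
The key generation described above can be sketched as follows, assuming the openssl command-line tool is installed. The hex variant is shown; the base64 variant works the same way with the matching transform.

```shell
# Generate a random 32-byte key, hex-encoded (64 hex characters).
WALG_LIBSODIUM_KEY="$(openssl rand -hex 32)"
export WALG_LIBSODIUM_KEY

# Tell WAL-G to decode the value from hex into the 32-byte key.
export WALG_LIBSODIUM_KEY_TRANSFORM=hex
```

Store the generated key somewhere safe: backups encrypted with it cannot be decrypted without it.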

WALG_GPG_KEY_ID (alternative form WALE_GPG_KEY_ID) ⚠️ DEPRECATED

To configure GPG key for encryption and decryption. By default, no encryption is used. Public keyring is cached in the file "/.walg_key_cache".

WALG_PGP_KEY

To configure encryption and decryption with the OpenPGP standard. You can join a multiline key into one line using \n symbols (mostly used with daemontools and envdir). Set the private key value when you need to execute the wal-fetch or backup-fetch command. Set the public key value when you need to execute the wal-push or backup-push command. Keep in mind that the private key also contains the public key.
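
One possible way to join a multiline key into a single line is sketched below. The use of awk is an assumption (any tool that replaces line breaks with a literal \n would do), and the key file here is a placeholder, not a real key.

```shell
# Placeholder file standing in for an exported armored key (not a real key).
key_file="$(mktemp)"
printf '%s\n' \
  '-----BEGIN PGP PUBLIC KEY BLOCK-----' \
  'placeholder' \
  '-----END PGP PUBLIC KEY BLOCK-----' > "$key_file"

# Replace every line break with the two characters "\n" so the key fits on one line.
WALG_PGP_KEY="$(awk 'BEGIN { ORS = "\\n" } { print }' "$key_file")"
export WALG_PGP_KEY

rm -f "$key_file"
```

If keeping the key in a file is acceptable, WALG_PGP_KEY_PATH avoids the joining step altogether.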

WALG_PGP_KEY_PATH

Similar to WALG_PGP_KEY, but the value is the path to the key on the file system.

WALG_PGP_KEY_PASSPHRASE

If your private key is encrypted with a passphrase, you should set the passphrase for decryption.

WALG_ENVELOPE_PGP_KEY

To configure encryption and decryption with the envelope PGP key stored in a key management system. This option allows you to securely manage your PGP keys by storing them in the KMS. It is crucial to ensure that the key passed is encrypted using KMS and encoded with base64. Both the private and public parts should also be present in the key, because the envelope key will be injected into metadata and used later in wal-fetch and backup-fetch.

Please note that currently only Yandex Cloud Key Management Service (KMS) is supported. Ensure that you have set up and configured Yandex Cloud KMS as mentioned below before attempting to use this feature.

WALG_ENVELOPE_CACHE_EXPIRATION

This setting controls KMS response expiration. The default value is 0, which stores keys permanently in memory. Please note that if the system is not able to re-decrypt the key in KMS after expiration, the previous response will be used.

WALG_ENVELOPE_PGP_YC_ENDPOINT

Endpoint is an API endpoint of Yandex.Cloud against which the SDK is used. Most users won't need to explicitly set it.

WALG_ENVELOPE_PGP_YC_CSE_KMS_KEY_ID

Similar to YC_CSE_KMS_KEY_ID, but only used for envelope PGP keys.

WALG_ENVELOPE_PGP_YC_SERVICE_ACCOUNT_KEY_FILE

Similar to YC_SERVICE_ACCOUNT_KEY_FILE, but only used for envelope PGP keys.

WALG_ENVELOPE_PGP_KEY_PATH

Similar to WALG_ENVELOPE_PGP_KEY, but the value is the path to the key on the file system.

Monitoring

WALG_STATSD_ADDRESS

To enable metrics publishing to statsd or statsd_exporter. Metrics will be sent on a best-effort basis via UDP. The default port for statsd is 8125.

WALG_STATSD_EXTRA_TAGS

Use this setting to add static tags (host, operation, database, etc.) to the metrics WAL-G publishes to statsd.

If you want to set up a demo for testing purposes, you can use the graphite service from the docker-compose file.

Profiling

Profiling is useful for identifying bottlenecks within WAL-G.

PROFILE_SAMPLING_RATIO

A float value between 0 and 1, defines likelihood of the profiler getting enabled. When set to 1, it will always run. This allows probabilistic sampling of invocations. Since WAL-G processes may get created several times per second (e.g. wal-g wal-push), we do not want to profile all of them.

PROFILE_MODE

The type of pprof profiler to use. Can be one of cpu, mem, mutex, block, threadcreation, trace, goroutine. See the runtime/pprof docs for more information. Defaults to cpu.

PROFILE_PATH

The directory to store profiles in. Defaults to $TMPDIR.
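
As an illustration, the three profiling variables could be combined as shown below. The sampling ratio of 0.01 and the directory /tmp/walg-profiles are hypothetical values chosen for this sketch.

```shell
# Profile roughly 1 in 100 invocations (hypothetical ratio).
export PROFILE_SAMPLING_RATIO=0.01

# cpu is the default mode; it is set explicitly here for clarity.
export PROFILE_MODE=cpu

# Hypothetical directory for the collected profiles.
export PROFILE_PATH=/tmp/walg-profiles
mkdir -p "$PROFILE_PATH"
```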

Rate limiting

WALG_NETWORK_RATE_LIMIT

Network traffic rate limit during the backup-push/backup-fetch operations, in bytes per second.
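
Because the value is expressed in bytes per second, shell arithmetic can help avoid unit mistakes. The 50 MiB/s limit below is only an example figure.

```shell
# 50 MiB/s expressed in bytes per second.
export WALG_NETWORK_RATE_LIMIT=$((50 * 1024 * 1024))
echo "$WALG_NETWORK_RATE_LIMIT"   # prints 52428800
```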

Database-specific options

More options are available for the chosen database. See Databases.

Usage

WAL-G currently supports these commands for all types of databases:

`backup-list`

Lists names and creation time of available backups.

--pretty flag prints list in a table

--json flag prints list in JSON format, pretty-printed if combined with --pretty

--detail flag prints extra backup details, pretty-printed if combined with --pretty, json-encoded if combined with --json

`delete`

Is used to delete backups and WALs before them. By default, delete will perform a dry run. If you want to execute deletion, you have to add the --confirm flag at the end of the command. Backups marked as permanent will not be deleted.

delete can operate in four modes: retain, before, everything, and target.

retain [FULL|FIND_FULL] %number% [--after %name|time%]

If FULL is specified, keep %number% full backups and everything in the middle. If the --after flag is used, keep the %number% most recent backups and backups made after %name|time% (inclusive).

before [FIND_FULL] %name%

If FIND_FULL is specified, WAL-G will calculate the minimum backup needed to keep all deltas alive. If FIND_FULL is not specified and the call can produce orphaned deltas, the call will fail with the list.

everything [FORCE]

target [FIND_FULL] %name% | --target-user-data %data% will delete the backup specified by name or user data. Unlike other delete commands, this command does not delete any archived WALs.

(Only in Postgres & MySQL) By default, if a delta backup is provided as the target, WAL-G will also delete all the dependent delta backups. If FIND_FULL is specified, WAL-G will delete all backups with the same base backup as the target.

Examples

everything all backups will be deleted (if there are no permanent backups)

everything FORCE all backups, including permanent ones, will be deleted

retain 5 will fail if 5th is delta

retain FULL 5 will keep 5 full backups and all deltas of them

retain FIND_FULL 5 will find necessary full for 5th and keep everything after it

retain 5 --after 2019-12-12T12:12:12 keep 5 most recent backups and backups made after 2019-12-12 12:12:12

before base_000010000123123123 will fail if base_000010000123123123 is delta

before FIND_FULL base_000010000123123123 will keep everything after base of base_000010000123123123

target base_0000000100000000000000C9 delete the base backup and all dependant delta backups

target --target-user-data "{ \"x\": [3], \"y\": 4 }" delete backup specified by user data

target base_0000000100000000000000C9_D_0000000100000000000000C4 delete delta backup and all dependant delta backups

target FIND_FULL base_0000000100000000000000C9_D_0000000100000000000000C4 delete delta backup and all delta backups with the same base backup
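
The dry-run-then-confirm workflow can be sketched as below. The WALG variable is a hypothetical helper introduced for this example: by default it only prints each command so the snippet is safe to run anywhere, and setting WALG=wal-g makes it call the real binary.

```shell
# Prints the commands by default; set WALG=wal-g to actually run them.
WALG="${WALG:-echo wal-g}"

# 1. Dry run (the default behaviour of delete): shows what would be removed.
$WALG delete retain FULL 5

# 2. The same command with --confirm at the end performs the deletion.
$WALG delete retain FULL 5 --confirm
```

Reviewing the dry-run output before repeating the command with --confirm is a cheap safeguard, since deleted backups cannot be recovered from WAL-G itself.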

More commands are available for the chosen database engine. See Databases.

Storage tools

The wal-g st command series allows direct interaction with the configured storage. See the Storage tools documentation.

Databases

Development

The following steps describe how to build WAL-G for PostgreSQL, but the process is the same for other databases. For example, to build WAL-G for MySQL, use make mysql_build instead of make pg_build.

Optional:

To build with brotli compressor and decompressor, set the USE_BROTLI environment variable.
To build with libsodium, set the USE_LIBSODIUM environment variable.
To build with lzo decompressor, set the USE_LZO environment variable.

Installing

Ubuntu

# Install latest Go compiler
sudo add-apt-repository ppa:longsleep/golang-backports
sudo apt update
sudo apt install golang-go

# Install lib dependencies
sudo apt install libbrotli-dev liblzo2-dev libsodium-dev curl cmake

# Fetch project and build
# Go 1.15 and below
go get github.com/wal-g/wal-g
# Go 1.16+ - just clone repository to $GOPATH
# if you want to save space add --depth=1 or --single-branch
git clone https://github.com/wal-g/wal-g $(go env GOPATH)/src/github.com/wal-g/wal-g

cd $(go env GOPATH)/src/github.com/wal-g/wal-g

# optional exports (see above)
export USE_BROTLI=1
export USE_LIBSODIUM=1
export USE_LZO=1

make deps
make pg_build
main/pg/wal-g --version

Users can also install WAL-G by using make pg_install. Specifying the GOBIN environment variable before installing allows the user to specify the installation location. By default, make pg_install puts the compiled binary in the root directory (/).

export USE_BROTLI=1
export USE_LIBSODIUM=1
export USE_LZO=1
make pg_clean
make deps
GOBIN=/usr/local/bin make pg_install

macOS

# brew command is Homebrew for Mac OS
brew install cmake

# Fetch project and build
# Go 1.15 and below
go get github.com/wal-g/wal-g
# Go 1.16+ - just clone repository to $GOPATH
# if you want to save space add --depth=1 or --single-branch
git clone https://github.com/wal-g/wal-g $(go env GOPATH)/src/github.com/wal-g/wal-g

cd $(go env GOPATH)/src/github.com/wal-g/wal-g

export USE_BROTLI=1
export USE_LIBSODIUM="true" # since we're linking libsodium later
./link_brotli.sh
./link_libsodium.sh
make install_and_build_pg

# if you need to install
GOBIN=/usr/local/bin make pg_install

To build on ARM64, set the corresponding GOOS/GOARCH environment variables:

env GOOS=darwin GOARCH=arm64 make install_and_build_pg

To build a Linux binary on an ARM Mac:

GOOS=linux GOARCH=amd64 make mysql_build

The compiled binary to run is main/pg/wal-g

Testing

WAL-G relies heavily on unit tests. These tests do not require S3 configuration as the upload/download parts are tested using mocked objects. Unit tests can be run using

./link_brotli.sh
export USE_BROTLI=1
make unittest

For more information on testing, please consult test, testtools, and the unittest section in the Makefile.

WAL-G will perform a round-trip compression/decompression test that generates a directory for data (e.g., data...), compressed files (e.g., compressed), and extracted files (e.g., extracted). These directories will only get cleaned up if the files in the original data directory match the files in the extracted one.

Test coverage can be obtained using:

export USE_BROTLI=1
make coverage

This command generates the coverage.out file and opens an HTML representation of the coverage.

Development on Windows

Information about installing and usage

Troubleshooting

A good way to start troubleshooting problems is by setting one or both of these environment variables:

WALG_LOG_LEVEL=DEVEL

Prints out the used configuration of WAL-G and detailed logs of the used command.

S3_LOG_LEVEL=DEVEL

If your commands seem to be stuck, it could be that S3 is not reachable, or that there are certificate problems or other S3-related issues. With this environment variable set, you can see the requests and responses from S3.
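
Both variables can be set for a single invocation instead of being exported for the whole session. In the sketch below, WALG is a hypothetical helper variable that prints the command by default; set WALG=wal-g to run the real binary. The backup-list command is just an example of a command to debug.

```shell
# Prints the command by default; set WALG=wal-g to actually run it.
WALG="${WALG:-echo wal-g}"

# Enable verbose WAL-G and S3 logging for this one command only.
WALG_LOG_LEVEL=DEVEL S3_LOG_LEVEL=DEVEL $WALG backup-list
```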

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License, Version 2.0, but the lzo support is licensed under GPL 3.0+. Please refer to the LICENSE.md file for more details.

Acknowledgments

WAL-G would not have happened without the support of Citus Data.

WAL-G came into existence as a result of the collaboration between a summer engineering intern at Citus, Katie Li, and Daniel Farina, the original author of WAL-E, who currently serves as a principal engineer on the Citus Cloud team. Citus Data also has an open-source extension to Postgres that distributes database queries horizontally to deliver scale and performance.

WAL-G development is supported by Yandex Cloud.