Block-level incremental backup engine for PostgreSQL


postgrespro/ptrack


ptrack

Overview

Ptrack is a block-level incremental backup engine for PostgreSQL. You can use the ptrack engine to take incremental backups with pg_probackup, a backup and recovery manager for PostgreSQL.

It is designed to allow false positives (i.e. a block/page is marked in the ptrack map but has not actually been changed), but never to allow false negatives (i.e. losing any PGDATA changes, except hint bits).

Currently, the ptrack codebase is split between a small PostgreSQL core patch and an extension. All public SQL API methods and the main engine are placed in the ptrack extension, while the core patch contains only certain hooks and modifies binary utilities to ignore ptrack.map.* files.

This extension is compatible with PostgreSQL 11, 12, 13, 14, and 15.

Installation

  1. Specify the PostgreSQL branch to work with:
export PG_BRANCH=REL_15_STABLE
  2. Get the latest PostgreSQL sources:
git clone https://github.com/postgres/postgres.git -b $PG_BRANCH
  3. Get the latest ptrack sources:
git clone https://github.com/postgrespro/ptrack.git postgres/contrib/ptrack
  4. Change to the ptrack directory:
cd postgres/contrib/ptrack
  5. Apply the PostgreSQL core patch:
make patch
  6. Compile and install PostgreSQL:
make install-postgres prefix=$PWD/pgsql # or some other prefix of your choice
  7. Add the newly created binaries to the PATH:
export PATH=$PWD/pgsql/bin:$PATH
  8. Compile and install ptrack:
make install USE_PGXS=1
  9. Load ptrack via shared_preload_libraries and set ptrack.map_size (in MB):
echo "shared_preload_libraries = 'ptrack'" >> <DATA_DIR>/postgresql.conf
echo "ptrack.map_size = 64" >> <DATA_DIR>/postgresql.conf
  10. Run PostgreSQL and create the ptrack extension:
postgres=# CREATE EXTENSION ptrack;

Configuration

The only configurable option is ptrack.map_size (in MB). The default is 0, which means ptrack is turned off. To reduce the number of false positives, it is recommended to set ptrack.map_size to 1/1000 of the expected PGDATA size (i.e. 1000 for a 1 TB database).

To disable ptrack and clean up all remaining service files, set ptrack.map_size to 0.
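
As a rough illustration of the 1/1000 rule of thumb, the sketch below turns a data directory size in MB into a suggested ptrack.map_size. The function name recommended_map_size_mb is hypothetical and not part of ptrack, and rounding up is an assumption made here to stay on the safe side.

```shell
# Suggest a ptrack.map_size value (in MB) for a data directory of the
# given size in MB: 1/1000 of the size, rounded up (rounding is an assumption).
# recommended_map_size_mb is a hypothetical helper, not shipped with ptrack.
recommended_map_size_mb() {
    pgdata_mb=$1
    echo $(( (pgdata_mb + 999) / 1000 ))
}

# A 1 TB database, taken here as 1,000,000 MB.
recommended_map_size_mb 1000000   # prints 1000
```

The input size could come from a tool such as du run against the data directory; the exact flags for reporting megabytes differ between platforms.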

Public SQL API

  • ptrack_version() - returns the ptrack version string.
  • ptrack_init_lsn() - returns the LSN of the last ptrack map initialization.
  • ptrack_get_pagemapset(start_lsn pg_lsn) - returns the set of data files changed since the specified start_lsn, with the number of changed blocks and their bitmaps.
  • ptrack_get_change_stat(start_lsn pg_lsn) - returns statistics of changes (number of files, pages and size in MB) since the specified start_lsn.

Usage example:

postgres=# SELECT ptrack_version();
 ptrack_version
----------------
 2.4
(1 row)

postgres=# SELECT ptrack_init_lsn();
 ptrack_init_lsn
-----------------
 0/1814408
(1 row)

postgres=# SELECT * FROM ptrack_get_pagemapset('0/185C8C0');
        path         | pagecount |                pagemap
---------------------+-----------+----------------------------------------
 base/16384/1255     |         3 | \x001000000005000000000000
 base/16384/2674     |         3 | \x0000000900010000000000000000
 base/16384/2691     |         1 | \x00004000000000000000000000
 base/16384/2608     |         1 | \x000000000000000400000000000000000000
 base/16384/2690     |         1 | \x000400000000000000000000
(5 rows)

postgres=# SELECT * FROM ptrack_get_change_stat('0/285C8C8');
 files | pages |        size, MB
-------+-------+------------------------
    20 |    25 | 0.19531250000000000000
(1 row)
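
To relate the pagemap column to pagecount, the sketch below counts the set bits in a bytea hex string. It assumes that each set bit marks one changed block, which is consistent with the sample output above but is not stated explicitly here. The helper count_set_bits is hypothetical and makes no assumption about which bit corresponds to which block number.

```shell
# Count the set bits in a pagemap printed by psql in bytea hex format (\x...).
# Assumption: one set bit per changed block, as the sample rows suggest.
# count_set_bits is a hypothetical helper, not part of ptrack.
count_set_bits() {
    hex=${1#\\x}          # drop the leading \x prefix, if any
    total=0
    while [ -n "$hex" ]; do
        rest=${hex#?}             # everything after the first character
        digit=${hex%"$rest"}      # the first character itself
        case $digit in
            0) n=0 ;;
            1|2|4|8) n=1 ;;
            3|5|6|9|a|A|c|C) n=2 ;;
            7|b|B|d|D|e|E) n=3 ;;
            f|F) n=4 ;;
            *) echo "not a hex digit: $digit" >&2; return 1 ;;
        esac
        total=$((total + n))
        hex=$rest
    done
    echo "$total"
}

# First row of the sample output: pagecount is 3.
count_set_bits '\x001000000005000000000000'   # prints 3
```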

Upgrading

Usually, you only have to install the new version of ptrack and run ALTER EXTENSION ptrack UPDATE;. However, some specific actions may be required as well:

Upgrading from 2.0.0 to 2.1.*:

  • Put shared_preload_libraries = 'ptrack' into postgresql.conf.
  • Rename ptrack_map_size to ptrack.map_size.
  • Do ALTER EXTENSION ptrack UPDATE;.
  • Restart your server.

Upgrading from 2.1.* to 2.2.*:

Since version 2.2 we use a different algorithm for tracking changed pages. Thus, data recorded in ptrack.map by pre-2.2 versions of ptrack is incompatible with newer versions. After the extension upgrade and server restart, the old ptrack.map will be discarded with a WARNING and initialized from scratch.

Upgrading from 2.2.* to 2.3.*:

  • Stop your server
  • Update ptrack binaries
  • Remove global/ptrack.map.mmap if it exists in the server data directory
  • Start server
  • Do ALTER EXTENSION ptrack UPDATE;.
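
The file removal step of the 2.2.* to 2.3.* upgrade can be sketched as a small shell helper. The name remove_stale_ptrack_mmap is hypothetical, and the helper assumes the server has already been stopped and that its argument is the server data directory.

```shell
# Remove the obsolete global/ptrack.map.mmap file from a data directory,
# if it is present. Run this only while the server is stopped.
# remove_stale_ptrack_mmap is a hypothetical helper, not shipped with ptrack.
remove_stale_ptrack_mmap() {
    pgdata=$1
    stale="$pgdata/global/ptrack.map.mmap"
    if [ -e "$stale" ]; then
        rm -f -- "$stale"
        echo "removed $stale"
    else
        echo "nothing to remove in $pgdata/global"
    fi
}

# Possible placement in the upgrade sequence (assumes pg_ctl manages the server):
#   pg_ctl -D "$PGDATA" stop
#   ... update ptrack binaries ...
#   remove_stale_ptrack_mmap "$PGDATA"
#   pg_ctl -D "$PGDATA" start
```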

Upgrading from 2.3.* to 2.4.*:

  • Stop your server
  • Update ptrack binaries
  • Start server
  • Do ALTER EXTENSION ptrack UPDATE;.

Limitations

  1. You can only use ptrack safely with wal_level >= 'replica'. Otherwise, you can lose tracking of some changes if crash recovery occurs, since certain commands are designed not to write WAL at all if wal_level is minimal, but we only durably flush the ptrack map at checkpoint time.

  2. The only production-ready backup utility that fully supports ptrack is pg_probackup.

  3. You cannot resize the ptrack map at runtime, only on postmaster start. Also, you will lose all tracked changes, so it is recommended to do so in a maintenance window and accompany this operation with a full backup.

  4. You will need up to ptrack.map_size * 2 of additional disk space, since ptrack uses an additional temporary file for durability purposes. See the Architecture section for details.
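
The first limitation lends itself to a pre-flight check. The sketch below accepts a wal_level value and succeeds only for 'replica' or 'logical', the two settings at or above 'replica'. The function name wal_level_ok_for_ptrack is hypothetical, and how the value is obtained (for example from psql) is left to the caller.

```shell
# Succeed if the given wal_level value is safe for ptrack,
# i.e. 'replica' or 'logical'; fail for 'minimal' or anything else.
# wal_level_ok_for_ptrack is a hypothetical helper, not part of ptrack.
wal_level_ok_for_ptrack() {
    case $1 in
        replica|logical) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage against a running server (assumes psql can connect):
#   wal_level_ok_for_ptrack "$(psql -At -c 'SHOW wal_level')" \
#       || echo "wal_level is too low for ptrack" >&2
```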

Benchmarks

Briefly, the overhead of using ptrack on TPS usually does not exceed a couple of percent (~1-3%) for a database of dozens to hundreds of gigabytes in size, while the backup time scales down linearly with backup size with a coefficient of ~1. This means that an incremental ptrack backup of a database with only 20% of changed pages will be 5 times faster than a full backup. More details here.
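
The arithmetic behind the "5 times faster" figure can be written out as a one-line estimate. It assumes, as stated above, that backup time is proportional to the amount of changed data with a coefficient of about 1. The function name estimated_speedup is hypothetical, and the result is a back-of-the-envelope figure rather than a measured one.

```shell
# Estimate how many times faster an incremental backup is than a full one,
# given the percentage of changed pages, assuming time is proportional to
# backup size. estimated_speedup is a hypothetical helper.
estimated_speedup() {
    LC_ALL=C awk -v pct="$1" 'BEGIN { printf "%.1f\n", 100 / pct }'
}

estimated_speedup 20   # prints 5.0
```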

Architecture

We use a single shared hash table in ptrack. Due to the fixed size of the map there may be false positives (when some block is marked as changed without actually being modified), but no false negatives. However, these false positives may be completely eliminated by setting a high enough ptrack.map_size.

All reads/writes are made using atomic operations on uint64 entries, so the map is completely lockless during normal PostgreSQL operation. Because we do not use locks for read/write access, ptrack keeps the map (ptrack.map) as of the last checkpoint intact and uses up to 1 additional temporary file:

  • temporary file ptrack.map.tmp to durably replace ptrack.map during checkpoint.

The map is written to disk atomically at the end of a checkpoint, block by block, together with a CRC32 checksum that is verified the next time the whole map is re-read after crash recovery or restart.
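
The general pattern of writing to a temporary file, recording a checksum, and renaming over the old file can be sketched in shell. This is only an analogy under stated assumptions, not ptrack's implementation: the function names are hypothetical, the on-disk layout (payload followed by a checksum line) is invented for the example, the cksum utility computes the POSIX CRC, which need not match the CRC32 variant ptrack uses, and the explicit flushing a durable implementation needs is omitted.

```shell
# Write the payload plus its CRC to <map>.tmp, then rename it over <map>.
# The rename replaces the old file in one step, so a reader sees either the
# old map or the new one. Hypothetical helpers, not ptrack's actual code.
write_map_atomically() {
    map=$1
    payload=$2
    tmp="$map.tmp"
    printf '%s\n' "$payload" > "$tmp" || return 1
    crc=$(cksum < "$tmp" | cut -d' ' -f1) || return 1
    printf '%s\n' "$crc" >> "$tmp" || return 1
    mv -f -- "$tmp" "$map"
}

# Recompute the CRC of everything except the last line and compare it
# with the stored value on the last line.
verify_map() {
    map=$1
    stored=$(tail -n 1 "$map")
    actual=$(sed '$d' "$map" | cksum | cut -d' ' -f1)
    [ "$stored" = "$actual" ]
}

demo_dir=$(mktemp -d)
write_map_atomically "$demo_dir/ptrack.map" "changed-blocks-bitmap" \
    && verify_map "$demo_dir/ptrack.map" \
    && echo "map verified"
rm -rf "$demo_dir"
```

Keeping the checksum inside the renamed file, rather than in a separate file, means the payload and its checksum become visible together.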

To gather the whole changeset of modified blocks in ptrack_get_pagemapset(), we walk the entire PGDATA (base/**/*, global/*, pg_tblspc/**/*) and use the map to check whether each block of each relation has been modified since the specified LSN.

Contribution

Feel free to send a pull request, create an issue, or reach us by e-mail if you are interested in ptrack.

Tests

All changes of the source code in this repository are checked by CI - see commit statuses and the project status badge. You can also run tests locally by executing a few Makefile targets.

Prerequisites

To run Python tests install the following packages:

OS packages:

  • python3-pip
  • python3-six
  • python3-pytest
  • python3-pytest-xdist

PIP packages:

  • testgres

For example, for Ubuntu:

sudo apt update
sudo apt install python3-pip python3-six python3-pytest python3-pytest-xdist
sudo pip3 install testgres

Testing

Install PostgreSQL and ptrack as described in Installation, install the testing prerequisites, then do (assuming the current directory is ptrack):

git clone https://github.com/postgrespro/pg_probackup.git ../pg_probackup # clone the repository into postgres/contrib/pg_probackup
# remember to export PATH=/path/to/pgsql/bin:$PATH
make install-pg-probackup USE_PGXS=1 top_srcdir=../..
make test-tap USE_PGXS=1
make test-python

If pg_probackup is not located in postgres/contrib, then additionally specify the path to the pg_probackup directory when building pg_probackup:

make install-pg-probackup USE_PGXS=1 top_srcdir=/path/to/postgres pg_probackup_dir=/path/to/pg_probackup

You can use a public Docker image which already has the necessary build environment (but not the testing prerequisites):

docker run -e USER_ID=`id -u` -it -v $PWD:/work --name=ptrack ghcr.io/postgres-dev/ubuntu-22.04:1.0
dev@a033797d2f73:~$

Environment variables

| Variable | Possible values | Required | Default value | Description |
|----------|-----------------|----------|---------------|-------------|
| NPROC | An integer greater than 0 | No | Output of nproc | The number of threads used for building and running tests |
| PG_CONFIG | File path | No | pg_config (from the PATH) | The path to the pg_config binary |
| TESTS | A Pytest filter expression | No | Not set (run all Python tests) | A filter to include only selected tests into the run. See the Pytest -k option for more information. This variable is only applicable to test-python for the tests located in tests. |
| TEST_MODE | normal, legacy, paranoia | No | normal | The "legacy" mode runs tests in an environment similar to a 32-bit Windows system. This mode is only applicable to test-tap. The "paranoia" mode compares the checksums of each block of the database catalog (PGDATA) contents before making a backup and after the restoration. This mode is only applicable to test-python. |

TODO

  • Should we introduce ptrack.map_path to allow storing ptrack service files outside of PGDATA? That would let us avoid patching PostgreSQL binary utilities to ignore ptrack.map.* files.
  • Can we resize the ptrack map on restart but keep the previously tracked changes?
  • Can we write a formal proof that we never lose any modified page with ptrack? With TLA+?
