Block-level incremental backup engine for PostgreSQL
postgrespro/ptrack
PTRACK speeds up incremental backups of huge PostgreSQL databases. PTRACK stores changes of physical blocks in memory. You can effectively use the PTRACK engine for taking incremental backups with pg_probackup.
The current patch is available for PostgreSQL 11, 12, 13, 14, and 15.
Enterprise PTRACK is part of Postgres Pro Enterprise and can track more than 100 000 tables and indexes concurrently without any degradation in speed (see CFS, the compressed file system). According to benchmarks, it operates up to 5 times faster and is useful for ERP and DWH systems with a huge number of tables and relations between them.
- Get the latest PTRACK sources:
git clone https://github.com/postgrespro/ptrack.git
- Get latest PostgreSQL sources:
git clone https://github.com/postgres/postgres.git -b REL_14_STABLE && cd postgres
- Apply PostgreSQL core patch:
git apply -3 ../ptrack/patches/REL_14_STABLE-ptrack-core.diff
- Compile and install PostgreSQL
- Set ptrack.map_size (in MB):
echo "shared_preload_libraries = 'ptrack'" >> postgres_data/postgresql.conf
echo "ptrack.map_size = 64" >> postgres_data/postgresql.conf
- Compile and install the PTRACK extension:
USE_PGXS=1 make -C /path/to/ptrack/ install
- Run PostgreSQL and create the PTRACK extension:
postgres=# CREATE EXTENSION ptrack;
The only configurable option is ptrack.map_size (in MB). The default is 0, which means PTRACK is turned off. To reduce the number of false positives, it is recommended to set ptrack.map_size to 1 / 1000 of the expected PGDATA size (i.e., 1000 for a 1 TB database).
To disable PTRACK and clean up all remaining service files, set ptrack.map_size to 0.
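As a rough illustration of the sizing rule, the following Python sketch derives a ptrack.map_size value from an expected PGDATA size. The helper name recommended_map_size_mb is hypothetical, and rounding up to a whole megabyte with a floor of 1 MB is an assumption, not something PTRACK prescribes.

```python
def recommended_map_size_mb(pgdata_size_mb: int) -> int:
    """Return roughly 1/1000 of the expected PGDATA size, in MB.

    Hypothetical helper: rounding up and the 1 MB floor are assumptions,
    chosen so that a non-empty cluster never gets 0 (which disables PTRACK).
    """
    if pgdata_size_mb <= 0:
        raise ValueError("expected PGDATA size must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (pgdata_size_mb + 999) // 1000)


# A 1 TB database, taken here as 1,000,000 MB
print(f"ptrack.map_size = {recommended_map_size_mb(1_000_000)}")  # prints ptrack.map_size = 1000
```

The printed line can be appended to postgresql.conf; changing the value takes effect only on postmaster start.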
- ptrack_version(): returns the ptrack version string.
- ptrack_init_lsn(): returns the LSN of the last ptrack map initialization.
- ptrack_get_pagemapset(start_lsn pg_lsn): returns the set of data files changed since the specified start_lsn, with the number of changed blocks and their bitmaps.
- ptrack_get_change_stat(start_lsn pg_lsn): returns change statistics (number of files, number of pages, and size in MB) since the specified start_lsn.
Usage example:
postgres=# SELECT ptrack_version();
 ptrack_version
----------------
 2.4
(1 row)

postgres=# SELECT ptrack_init_lsn();
 ptrack_init_lsn
-----------------
 0/1814408
(1 row)

postgres=# SELECT * FROM ptrack_get_pagemapset('0/185C8C0');
        path         | pagecount |                pagemap
---------------------+-----------+----------------------------------------
 base/16384/1255     |         3 | \x001000000005000000000000
 base/16384/2674     |         3 | \x0000000900010000000000000000
 base/16384/2691     |         1 | \x00004000000000000000000000
 base/16384/2608     |         1 | \x000000000000000400000000000000000000
 base/16384/2690     |         1 | \x000400000000000000000000
(5 rows)

postgres=# SELECT * FROM ptrack_get_change_stat('0/285C8C8');
 files | pages |        size, MB
-------+-------+------------------------
    20 |    25 | 0.19531250000000000000
(1 row)
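To make the pagemap column more concrete, here is a small Python sketch that expands a bitmap into block numbers. The number of set bits agrees with pagecount in the example output, but the bit order within a byte (least significant bit first, so bit i of byte j marks block j * 8 + i) is an assumption about the encoding. changed_blocks is a hypothetical helper, not part of PTRACK.

```python
from typing import List


def changed_blocks(pagemap_hex: str) -> List[int]:
    """Expand a bytea hex literal such as '\\x0010' into changed block numbers."""
    if pagemap_hex.startswith("\\x"):
        pagemap_hex = pagemap_hex[2:]
    bitmap = bytes.fromhex(pagemap_hex)
    blocks = []
    for byte_no, byte in enumerate(bitmap):
        for bit_no in range(8):
            # Assumed layout: least significant bit first within each byte.
            if byte & (1 << bit_no):
                blocks.append(byte_no * 8 + bit_no)
    return blocks


# pagemap of base/16384/1255 from the usage example (pagecount = 3)
print(changed_blocks(r"\x001000000005000000000000"))  # prints [12, 40, 42]
```

A backup tool would then copy only these blocks of the file instead of reading it in full.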
Usually, you only have to install the new version of PTRACK and run ALTER EXTENSION ptrack UPDATE;. However, some specific actions may be required as well:
- Put shared_preload_libraries = 'ptrack' into postgresql.conf.
- Rename ptrack_map_size to ptrack.map_size.
- Run ALTER EXTENSION ptrack UPDATE;.
- Restart your server.
Since version 2.2 we use a different algorithm for tracking changed pages. Thus, data recorded in ptrack.map by pre-2.2 versions of PTRACK is incompatible with newer versions. After the extension upgrade and server restart, the old ptrack.map will be discarded with a WARNING and initialized from scratch.
- Stop your server
- Update PTRACK binaries
- Remove global/ptrack.map.mmap if it exists in the server data directory
- Start the server
- Run ALTER EXTENSION ptrack UPDATE;.
- Stop your server
- Update PTRACK binaries
- Start the server
- Run ALTER EXTENSION ptrack UPDATE;.
- You can only use PTRACK safely with wal_level >= 'replica'. Otherwise, you can lose tracking of some changes if crash recovery occurs, since certain commands are designed not to write WAL at all if wal_level is minimal, but we only durably flush the PTRACK map at checkpoint time.
- The only production-ready backup utility that fully supports PTRACK is pg_probackup.
- You cannot resize the PTRACK map at runtime, only on postmaster start. Resizing also discards all tracked changes, so it is recommended to do it in a maintenance window and follow it with a full backup.
- You will need up to ptrack.map_size * 2 of additional disk space, since PTRACK uses an additional temporary file for durability purposes. See the Architecture section for details.
Briefly, the overhead of using PTRACK on TPS usually does not exceed a couple of percent (~1-3%) for a database of dozens to hundreds of gigabytes in size, while the backup time scales down linearly with backup size with a coefficient of ~1. This means that an incremental PTRACK backup of a database with only 20% of changed pages will be 5 times faster than a full backup. More details here.
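The arithmetic behind that estimate can be written out in Python. The model below simply assumes that backup time is proportional to the fraction of changed pages and ignores any fixed costs; estimated_speedup is a hypothetical name used only for this illustration.

```python
def estimated_speedup(changed_fraction: float, coefficient: float = 1.0) -> float:
    """Estimate how many times faster an incremental backup is than a full one.

    Assumed model: incremental time = coefficient * changed_fraction * full time.
    """
    if not 0 < changed_fraction <= 1:
        raise ValueError("changed_fraction must be in (0, 1]")
    if coefficient <= 0:
        raise ValueError("coefficient must be positive")
    return 1.0 / (coefficient * changed_fraction)


# 20% of pages changed, coefficient ~1 as stated in the text
print(round(estimated_speedup(0.20), 2))
```

With half of the pages changed the same model gives a factor of about 2, so the benefit shrinks quickly as the changed fraction grows.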
PTRACK is designed to permit false positives (i.e., a block/page is marked as altered in the PTRACK map when it hasn't actually been changed), but it never tolerates false negatives (i.e., it never loses any PGDATA modifications, barring hint bits).
At present, the PTRACK codebase is divided between a small PostgreSQL core patch and an extension. The public SQL API methods and the main engine are housed in the PTRACK extension, whereas the core patch only includes specific hooks and modifies binary utilities to disregard ptrack.map.* files.
In PTRACK, we use a single shared hash table. Due to the fixed size of the map, there can be false positives (when a block is marked as changed without actual modification), but false negatives are not allowed. Nevertheless, these false positives can be completely removed by setting a sufficiently high ptrack.map_size.
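A toy Python model can illustrate why a fixed-size map yields false positives but no false negatives. This is a deliberately simplified sketch and not PTRACK's actual algorithm: it assumes one slot per block chosen by a CRC-32 hash, with each slot holding the highest LSN of any block that hashes to it. All names here (ToyPtrackMap, mark, changed_since) are hypothetical.

```python
import zlib


class ToyPtrackMap:
    """Fixed-size array of LSNs indexed by a hash of the block address."""

    def __init__(self, n_slots: int) -> None:
        self.slots = [0] * n_slots  # 0 means "never modified"

    def _slot(self, relfile: str, block_no: int) -> int:
        key = f"{relfile}:{block_no}".encode()
        return zlib.crc32(key) % len(self.slots)

    def mark(self, relfile: str, block_no: int, lsn: int) -> None:
        """Record a modification; keep the newest LSN seen for the slot."""
        i = self._slot(relfile, block_no)
        self.slots[i] = max(self.slots[i], lsn)

    def changed_since(self, relfile: str, block_no: int, start_lsn: int) -> bool:
        """True if the block may have changed at or after start_lsn."""
        return self.slots[self._slot(relfile, block_no)] >= start_lsn


# With a single slot every block collides, so the effect is easy to see.
tiny = ToyPtrackMap(n_slots=1)
tiny.mark("base/16384/1255", 12, lsn=200)
print(tiny.changed_since("base/16384/1255", 12, start_lsn=100))  # True: really modified
print(tiny.changed_since("base/16384/2674", 0, start_lsn=100))   # True: false positive
```

A modified block always finds an LSN at least as new as its own change in its slot, so it is never missed. Adding slots only lowers the chance that an untouched block shares a slot with a modified one.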
All reads/writes are performed using atomic operations on uint64 entries, making the map completely lockless during standard PostgreSQL operation. Since we do not utilize locks for read/write access, PTRACK maintains a map (ptrack.map) from the last checkpoint unaltered and uses a maximum of one additional temporary file:
- temporary file ptrack.map.tmp to durably replace ptrack.map during checkpoint.
The map is written to disk atomically at the end of a checkpoint, block by block, together with a CRC32 checksum. The checksum is verified the next time the whole map is re-read after crash recovery or restart.
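The write-then-rename pattern behind this can be sketched in Python as follows. This is an illustration under stated assumptions, not PTRACK's on-disk format: it uses zlib's CRC-32 as a stand-in checksum stored in the last 4 bytes of the file, writes the payload in one call rather than block by block, and all function names are hypothetical.

```python
import os
import struct
import tempfile
import zlib


def write_map_atomically(map_path: str, payload: bytes) -> None:
    """Write payload plus CRC to a temporary file, then rename it into place."""
    tmp_path = map_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.write(struct.pack("<I", zlib.crc32(payload)))
        f.flush()
        os.fsync(f.fileno())        # make the new contents durable first
    os.replace(tmp_path, map_path)  # atomic rename on POSIX file systems


def read_map(map_path: str) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    with open(map_path, "rb") as f:
        data = f.read()
    if len(data) < 4:
        raise ValueError("map file is too short")
    payload = data[:-4]
    (stored_crc,) = struct.unpack("<I", data[-4:])
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("checksum mismatch, map must be reinitialized")
    return payload


# Usage: round trip through a scratch directory standing in for PGDATA/global
scratch = tempfile.TemporaryDirectory()
demo_path = os.path.join(scratch.name, "ptrack.map")
write_map_atomically(demo_path, b"\x00" * 64)
print(len(read_map(demo_path)))  # prints 64
```

Because the old file is replaced only after the new one is fully written and synced, a crash at any point leaves either the previous map or the new one on disk, never a mix of the two.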
To gather the whole changeset of modified blocks in ptrack_get_pagemapset(), we walk the entire PGDATA (base/**/*, global/*, pg_tblspc/**/*) and verify using the map whether each block of each relation was modified since the specified LSN or not.
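The traversal can be pictured with the following Python sketch. It is only a schematic: it assumes the default 8 kB block size, treats every regular file under the three directories as a relation file, and takes the map lookup as a caller-supplied predicate. The names iter_changed_blocks and is_changed are hypothetical.

```python
import os
import tempfile
from typing import Callable, Iterator, Tuple

BLOCK_SIZE = 8192  # PostgreSQL's default block size; an assumption here


def iter_changed_blocks(
    pgdata: str,
    is_changed: Callable[[str, int], bool],
) -> Iterator[Tuple[str, int]]:
    """Yield (relative path, block number) for every block the map reports."""
    for top in ("base", "global", "pg_tblspc"):
        # followlinks=True because pg_tblspc holds symlinks to tablespaces.
        for dirpath, _dirs, files in os.walk(os.path.join(pgdata, top), followlinks=True):
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, pgdata)
                n_blocks = os.path.getsize(full) // BLOCK_SIZE
                for block_no in range(n_blocks):
                    if is_changed(rel, block_no):
                        yield rel, block_no


# Usage with a scratch directory standing in for PGDATA
scratch = tempfile.TemporaryDirectory()
os.makedirs(os.path.join(scratch.name, "base", "16384"))
with open(os.path.join(scratch.name, "base", "16384", "1255"), "wb") as f:
    f.write(b"\x00" * (3 * BLOCK_SIZE))

# Pretend the map reports only block 1 of each file as changed.
found = list(iter_changed_blocks(scratch.name, lambda rel, blk: blk == 1))
print(found)
```

Since every block of every file is checked against the map, the cost of this step grows with the size of PGDATA rather than with the number of changes.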
Feel free to send pull requests and file issues. See also the list of authors who participated in this project.
Everything is tested automatically with travis-ci.com and codecov.io, but you can also run tests locally via Docker:
export PG_BRANCH=REL_14_STABLE
export TEST_CASE=all
export MODE=paranoia
./make_dockerfile.sh
docker-compose build
docker-compose run tests
Available test modes (MODE) are basic (default) and paranoia (per-block checksum comparison of PGDATA content before and after the backup-restore process). Available test cases (TEST_CASE) are tap (minimalistic PostgreSQL tap test), all, or any specific pg_probackup test, e.g. test_ptrack_simple.
PTRACK development is supported by Postgres Professional.
You can ask any question about contributing or usage in the Russian chat or the International chat of pg_probackup.