Block-level incremental backup engine for PostgreSQL
postgrespro/ptrack
Ptrack is a block-level incremental backup engine for PostgreSQL. You can effectively use the ptrack engine for taking incremental backups with the pg_probackup backup and recovery manager for PostgreSQL.
It is designed to allow false positives (i.e. a block/page is marked in the ptrack map but has not actually been changed), but to never allow false negatives (i.e. losing any PGDATA changes, except hint bits).
Currently, the ptrack codebase is split between a small PostgreSQL core patch and an extension. All public SQL API methods and the main engine are placed in the ptrack extension, while the core patch contains only certain hooks and modifies binary utilities to ignore ptrack.map.* files.
- Get the latest PostgreSQL sources:
    git clone https://github.com/postgres/postgres.git -b REL_12_STABLE && cd postgres
- Apply the PostgreSQL core patch:
    git apply -3 ptrack/patches/REL_12_STABLE-ptrack-core.diff
- Compile and install PostgreSQL.
- Add ptrack to shared_preload_libraries and set ptrack.map_size (in MB) in postgresql.conf:
    echo "shared_preload_libraries = 'ptrack'" >> postgres_data/postgresql.conf
    echo "ptrack.map_size = 64" >> postgres_data/postgresql.conf
- Compile and install the ptrack extension:
    USE_PGXS=1 make -C /path/to/ptrack/ install
- Run PostgreSQL and create the ptrack extension:
    postgres=# CREATE EXTENSION ptrack;
The only configurable option is ptrack.map_size (in MB). The default is -1, which means ptrack is turned off. To completely avoid false positives, it is recommended to set ptrack.map_size to 1/1000 of the expected PGDATA size (i.e. 1000 for a 1 TB database), since a single 8-byte ptrack map record tracks changes in a standard 8 KB PostgreSQL page.
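The 1/1000 rule comes from the entry-to-page ratio: one 8-byte entry per 8192-byte page is 8/8192 = 1/1024. The C sketch below turns an expected PGDATA size into a ptrack.map_size value. Because ptrack may keep up to three copies of the map on disk, it also computes the worst-case footprint. The helper names are hypothetical (they are not part of ptrack), and the sketch assumes the default 8 KB block size.

```c
#include <stdint.h>

/* Assumed PostgreSQL defaults: 8 KB pages, 8-byte (uint64) map entries. */
#define ASSUMED_BLCKSZ     8192u
#define PTRACK_ENTRY_SIZE  8u

/*
 * Hypothetical helper: map size (in MB) that provides one map entry per
 * data page of the expected PGDATA, i.e. PGDATA size / 1024, rounded up
 * so that a small cluster still gets at least 1 MB.
 */
static uint64_t recommended_map_size_mb(uint64_t pgdata_mb)
{
    uint64_t ratio = ASSUMED_BLCKSZ / PTRACK_ENTRY_SIZE; /* 1024 */
    return (pgdata_mb + ratio - 1) / ratio;
}

/* Hypothetical helper: worst-case disk usage, ptrack.map plus two temporary copies. */
static uint64_t worst_case_disk_mb(uint64_t map_size_mb)
{
    return map_size_mb * 3;
}
```

For a 1 TB (1048576 MB) cluster this yields 1024 MB, which the text above rounds to 1000. The worst-case disk footprint is then about 3 GB.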
To disable ptrack and clean up all remaining service files, set ptrack.map_size to 0.
- ptrack_version() — returns the ptrack version string.
- ptrack_init_lsn() — returns the LSN of the last ptrack map initialization.
- ptrack_get_pagemapset('LSN') — returns a set of changed data files with bitmaps of blocks changed since the specified LSN.
Usage example:

postgres=# SELECT ptrack_version();
 ptrack_version
----------------
 2.1
(1 row)

postgres=# SELECT ptrack_init_lsn();
 ptrack_init_lsn
-----------------
 0/1814408
(1 row)

postgres=# SELECT ptrack_get_pagemapset('0/186F4C8');
           ptrack_get_pagemapset
-------------------------------------------
 (global/1262,"\\x0100000000000000000000")
 (global/2672,"\\x0200000000000000000000")
 (global/2671,"\\x0200000000000000000000")
(3 rows)
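The bitmaps returned by ptrack_get_pagemapset() are bytea values, and a client such as a backup tool must turn them back into block numbers. The sketch below assumes the layout used by PostgreSQL's pg_rewind datapagemap, where block N corresponds to bit N % 8 of byte N / 8. This assumption is consistent with the sample output above (`\x01` is block 0, `\x02` is block 1), but it is not confirmed by this README. `decode_pagemap` and `hex_value` are hypothetical helpers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char) tolower((unsigned char) c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Decode a bytea hex string such as "\x0200..." (as shown by psql, possibly
 * with the backslash doubled) and store the numbers of blocks whose bit is
 * set into out[] (at most out_cap of them). Assumed layout: block N is bit
 * (N % 8) of byte (N / 8). Returns the total number of set bits, which may
 * exceed out_cap, or -1 on malformed input.
 */
static long decode_pagemap(const char *hex, unsigned long *out, size_t out_cap)
{
    size_t count = 0;
    size_t byte_index = 0;

    while (*hex == '\\')            /* skip one or two leading backslashes */
        hex++;
    if (*hex == 'x')
        hex++;
    if (strlen(hex) % 2 != 0)
        return -1;

    for (; hex[0] != '\0'; hex += 2, byte_index++)
    {
        int hi = hex_value(hex[0]);
        int lo = hex_value(hex[1]);
        if (hi < 0 || lo < 0)
            return -1;

        unsigned byte = (unsigned) ((hi << 4) | lo);
        for (unsigned bit = 0; bit < 8; bit++)
        {
            if (byte & (1u << bit))
            {
                if (count < out_cap)
                    out[count] = byte_index * 8 + bit;
                count++;
            }
        }
    }
    return (long) count;
}
```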
- You can only use ptrack safely with wal_level >= 'replica'. Otherwise, you can lose tracking of some changes if crash recovery occurs, since certain commands are designed not to write WAL at all if wal_level is minimal, but we only durably flush the ptrack map at checkpoint time.
- The only production-ready backup utility that fully supports ptrack is pg_probackup.
- Currently, you cannot resize the ptrack map at runtime, only on postmaster start. Also, you will lose all tracked changes, so it is recommended to do so in a maintenance window and to accompany this operation with a full backup. See the TODO list for details.
- You will need up to ptrack.map_size * 3 of additional disk space, since ptrack uses two additional temporary files for durability. See the architecture description for details.
Briefly, the TPS overhead of using ptrack usually does not exceed a couple of percent (~1-3%) for a database dozens to hundreds of gigabytes in size, while backup time scales down linearly with backup size, with a coefficient of ~1. This means that an incremental ptrack backup of a database with only 20% of changed pages will be about 5 times faster than a full backup. More details here.
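The 5x figure follows from that linear model. The notation below is introduced here: $f$ is the fraction of changed pages, $k \approx 1$ is the scaling coefficient, and $T$ is backup time.

```latex
T_{\mathrm{incr}} \approx k \, f \, T_{\mathrm{full}}, \qquad
\frac{T_{\mathrm{full}}}{T_{\mathrm{incr}}} \approx \frac{1}{k f}
  = \frac{1}{1 \cdot 0.2} = 5
```

This simple model ignores fixed costs that do not depend on the number of changed pages, such as walking PGDATA in ptrack_get_pagemapset().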
We use a single shared hash table in ptrack, which is mapped into memory from a file on disk using mmap. Due to the fixed size of the map, there may be false positives (when some block is marked as changed without actually being modified), but no false negatives. However, these false positives may be completely eliminated by setting a high enough ptrack.map_size.
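Below is a minimal in-memory model of such a fixed-size map, using C11 `<stdatomic.h>`. Each block identifier is hashed to one of a fixed number of uint64 slots, and each slot keeps the highest LSN ever written to it. Two blocks that collide share a slot, which is how false positives arise. A slot's LSN never moves backwards, so a change is never missed. The `BlockId` struct, the FNV-1a hash, and the function names are illustrative assumptions, not ptrack's actual code. A plain array stands in for the mmap'ed file.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative block identifier: relation file node, fork, block number. */
typedef struct
{
    uint32_t relnode;
    uint32_t forknum;
    uint32_t blocknum;
} BlockId;

#define MAP_SLOTS 1024   /* fixed number of entries, like a fixed ptrack.map_size */

static _Atomic uint64_t map[MAP_SLOTS];

/* Hypothetical hash (FNV-1a over the bytes of the three fields). */
static uint32_t block_hash(BlockId id)
{
    uint32_t fields[3] = { id.relnode, id.forknum, id.blocknum };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 3; i++)
        for (int b = 0; b < 4; b++)
        {
            h ^= (fields[i] >> (8 * b)) & 0xFFu;
            h *= 16777619u;
        }
    return h;
}

/* Record that a block was modified at `lsn`: raise the slot's LSN without locks. */
static void mark_block(BlockId id, uint64_t lsn)
{
    _Atomic uint64_t *slot = &map[block_hash(id) % MAP_SLOTS];
    uint64_t old = atomic_load(slot);

    /* CAS loop: only move forward; on failure `old` is refreshed with the current value. */
    while (old < lsn && !atomic_compare_exchange_weak(slot, &old, lsn))
        ;
}

/* May return true for an unchanged block (collision), never false for a changed one. */
static bool block_changed_since(BlockId id, uint64_t since_lsn)
{
    return atomic_load(&map[block_hash(id) % MAP_SLOTS]) >= since_lsn;
}
```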
All reads and writes are made using atomic operations on uint64 entries, so the map is completely lockless during normal PostgreSQL operation. Because we do not use locks for read/write access and cannot control mmap eviction back to disk, ptrack keeps the map from the last checkpoint (ptrack.map) intact and uses up to 2 additional temporary files:
- working copy ptrack.map.mmap for doing mmap on it (there is a TODO item);
- temporary file ptrack.map.tmp to durably replace ptrack.map during checkpoint.
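The ptrack.map.tmp step matches the usual POSIX pattern for durably replacing a file. First write the new contents to a temporary file and fsync it. Then rename it over the target, and finally fsync the containing directory so the rename itself survives a crash. The sketch below shows that general pattern. The function names and paths are hypothetical, and ptrack's own checkpoint code may differ in details.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose fsync() even under strict C modes */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0)
    {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Durably replace `path` with `len` bytes from `buf`, staging them in
 * `tmp_path` (e.g. a "ptrack.map.tmp"-style file) inside directory `dir`.
 * Returns 0 on success, -1 on error.
 */
static int durable_replace(const char *dir, const char *tmp_path,
                           const char *path, const void *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* rename() atomically swaps the old file for the new one. */
    if (rename(tmp_path, path) != 0)
        return -1;

    /* fsync the directory so that the new directory entry is durable too. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

After a crash, a reader sees either the complete old file or the complete new one, never a partially written map under the final name.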
The map is written to disk atomically, block by block, at the end of each checkpoint, together with a CRC32 checksum that is verified when the whole map is re-read after crash recovery or restart.
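To illustrate the checksum step, the sketch below computes CRC-32C bitwise over a buffer and checks it against a stored value on re-read. CRC-32C is the Castagnoli variant that PostgreSQL provides as pg_crc32c. Which exact CRC variant and on-disk layout ptrack uses for ptrack.map is an assumption here, and the function names are hypothetical. Production code would use a table-driven or hardware-accelerated implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: reflected polynomial 0x82F63B78, init and final XOR 0xFFFFFFFF. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* On restart: accept the map contents only if the checksum matches the stored one. */
static bool map_checksum_ok(const void *map, size_t len, uint32_t stored_crc)
{
    return crc32c(map, len) == stored_crc;
}
```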
To gather the whole changeset of modified blocks in ptrack_get_pagemapset(), we walk the entire PGDATA (base/**/*, global/*, pg_tblspc/**/*) and use the map to check whether each block of each relation was modified since the specified LSN.
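For a single relation file, the per-block check reduces to the loop below. For each block number, it looks up the map slot and sets the block's bit if the slot's LSN is at or after the requested LSN. This is a simplified, self-contained model. The slot function is a hypothetical hash, a plain array stands in for the shared mmap'ed map, and the bitmap layout (block N is bit N % 8 of byte N / 8) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SLOTS 1024   /* illustrative map size, in entries */

/* Hypothetical slot function: mixes relation and block number into a slot index. */
static size_t slot_of(uint32_t relnode, uint32_t blkno)
{
    uint64_t key = ((uint64_t) relnode << 32) | blkno;
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;   /* 64-bit mixing constant */
    key ^= key >> 33;
    return (size_t) (key % N_SLOTS);
}

/*
 * Fill `bitmap` ((nblocks + 7) / 8 bytes) with one bit per block of the
 * relation whose map slot holds an LSN >= since_lsn.
 * Returns the number of blocks marked.
 */
static size_t build_pagemap(const uint64_t map[N_SLOTS], uint32_t relnode,
                            uint32_t nblocks, uint64_t since_lsn,
                            uint8_t *bitmap)
{
    size_t marked = 0;
    memset(bitmap, 0, (nblocks + 7) / 8);
    for (uint32_t blkno = 0; blkno < nblocks; blkno++)
    {
        if (map[slot_of(relnode, blkno)] >= since_lsn)
        {
            bitmap[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
            marked++;
        }
    }
    return marked;
}
```

Because several blocks can share one slot, a block may be reported even though only its slot-mate changed. This is the false-positive behavior described above.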
Feel free to send pull requests, file issues, or just reach one of us directly (e.g. <Alexey Kondratov, @ololobus>) if you are interested in ptrack.
Everything is tested automatically with travis-ci.com and codecov.io, but you can also run tests locally via Docker:

    export PG_VERSION=12
    export PG_BRANCH=REL_12_STABLE
    export TEST_CASE=all
    export MODE=paranoia
    docker-compose build
    docker-compose run tests
Available test modes (MODE) are basic (default) and paranoia (per-block checksum comparison of PGDATA content before and after the backup-restore process). Available test cases (TEST_CASE) are tap (a minimalistic PostgreSQL TAP test), all, or any specific pg_probackup test, e.g. test_ptrack_simple.
- Use POSIX shm_open() instead of open() to avoid creating an additional working copy of the ptrack map file.
- Should we introduce ptrack.map_path to allow storing ptrack service files outside of PGDATA? Doing that, we would avoid patching PostgreSQL binary utilities to ignore ptrack.map.* files.
- Can we resize the ptrack map on restart but keep the previously tracked changes?
- Can we resize the ptrack map dynamically?
- Can we write a formal proof that we never lose any modified page with ptrack? With TLA+?