Earlier this year, I had the incredible opportunity to present at PGConf India, where I delved into the intricacies of Write-Ahead Logging (WAL) in PostgreSQL. My presentation aimed to demystify this crucial database feature that ensures data integrity and enhances performance.
I shared insights on WAL's components, its role in crash recovery, and its significance in both streaming and logical replication. The session was a deep dive into the mechanics of WAL, offering attendees a comprehensive understanding of how it underpins the reliability and efficiency of PostgreSQL databases.
Below, you'll find the slides from my presentation at PGConf India 2025.
Vigneshwaran C
Software Lead Developer
Fujitsu
postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT
postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT
Note: This modified page is not written to the data files on disk immediately.
WAL logs are the binary files in which various transactions are stored. These logs are written to disk before the changes are made to the database. PostgreSQL maintains multiple WAL (Write-Ahead Log) files. Instead of a single WAL file, it uses a sequence of WAL segments to continuously log database changes. When one WAL segment fills up, PostgreSQL creates a new WAL file. This allows for efficient storage and recycling of WAL files.
Checkpoints are points in the sequence of transactions within a log file where all the data is guaranteed to have been updated with the information before the checkpoints. These are particularly important in crashes since the WAL file recovers from the most recent checkpoint. This is then where the system performs the REDO operation.
In PostgreSQL the working units that read and write data to your tables are called "buffers”. The buffers of write-ahead logs are typically called transaction log buffers. These determine the amount of memory allocated to storing WAL data (See 'wal_buffers' GUC later).
Log sequence numbers are 64-bit integers assigned to individual WAL records within a WAL, indicating the INSERT position. They are byte offsets unique within a single log stream that increase with each new transaction and are of data type pg_lsn. LSNs can help determine the amount of data in bytes between different transactions.
Each WAL record consists of multiple components:
*: Optional
Type | Details |
CREATE | Represents the creation of a database object such as a table, index, or other schema-related entity. |
INSERT | Logs the insertion of a new row into a table. |
INSERT+INIT | Similar to an INSERT record but occurs when inserting into a newly initialized (empty) page. |
UPDATE | Logs the modification of an existing row in a table. |
DELETE | Logs the deletion of a row from a table. |
TRUNCATE | Logs the truncation of a table, which removes all rows efficiently without logging individual row deletions. |
COMMIT | Logs a transaction commit, ensuring that all its changes are permanently recorded. |
LOCK | Represents a lock operation, typically used for locking database objects such as tables. |
INVALIDATION | Logs cache invalidation events to ensure changes to shared data (e.g., catalogs) are recognized by all backends. |
RUNNING_XACTS | Logs running transactions at a particular moment, used in replication and recovery to determine transaction visibility. |
HOT_UPDATE | Special form of UPDATE that allows heap-only tuple updates (HOT), optimizing performance by avoiding unnecessary index updates. |
HOT_UPDATE+INIT | Similar to HOT_UPDATE, but involves an update into an initialized (empty) page. |
NEW_CID | Logs the assignment of a new command ID, used to track command execution order within a transaction. |
...pg_control version number: 1700...Latest checkpoint location: 0/449B5E0Latest checkpoint's REDO location: 0/449B588Latest checkpoint's REDO WAL file: 000000010000000000000004...
A WAL segment switch occurs when:
Config | Details |
wal_level | replica (default). |
wal_buffers | Sets the number of disk-page buffers in shared memory for WAL. |
wal_keep_segments | Minimum number of file segments to keep in the pg_wal directory. If you do not want to archive, set a slightly increased value. |
checkpoint_flush_after | Whenever more than this amount of data has been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage. |
archive_mode | When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command or archive_library. |
archive_command | Command for archiving the WAL. Example:cp %p /mnt/serv/arch_dir %f |
synchronous_commit | Synchronization level of the standby servers. Setremote_apply to optimize data freshness andon to maintain performance and reliability. |
max_slot_wal_keep_size | Specify the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time. |
max_wal_senders | Number of standby servers +1. This value cannot exceed the value set in max_connections. |
wal_sender_timeout | Time to wait before determining that the WAL receiver process is in an abnormal state. |
max_connections | Maximum number of simultaneous connections to the database server. Set a value greater than max_wal_senders. |
synchronous_standby_names | Standby servers for synchronous replication. No setting is required for asynchronous replication. Example:s1 |
Logical replication is a method of replicating selective data changes based on the definition of the publication from the publisher to subscriber.
It is performed by the archiver (background) process. The copied file is called an archive log. This feature is typically used for hot physical backup and PITR (Point-in-Time Recovery)
For example, the following parameter would copy WAL segment files to the directory/home/db/archives/ every time a segment switch occurs:
archive_command = cp %p /home/db/archives/%f
This enables the use of scp or other file backup tools to transfer archive logs to remote hosts, instead of relying on simple copy commands.
synchronous_commit
Specifies how much WAL processing must complete before the database server returns a success indication to the client.
Valid values areremote_apply,on (default),remote_write,local, andoff
synchronous_commit setting | Local durable commot | Standby durable commit after Postgres crash | Standby durable commit after OS crash | Standby query consistency |
remote_apply | ||||
on | ||||
remote_write | ||||
local | ||||
off |
pg_waldump is a PostgreSQL server application to display a human-readable rendering of the WAL of a PostgreSQL database cluster.
pg_waldump [option...] [startseg [endseg]]
Let’s say you want to see the WAL’s generated for table test_wal whose tablespace ID is1663, database ID is5, and relation ID is16384
./pg_waldump node1/pg_wal/000000010000000000000001 -R1663/5/16384rmgr: Heap len (rec/tot): 59/59, tx: 744, lsn: 0/014AE238, prev 0/014AE200, desc: INSERT+INIT off: 1, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap len (rec/tot): 70/70, tx: 755, lsn: 0/014F0C28, prev 0/014F0BF0, desc: HOT_UPDATE old_xmax: 755, old_off: 1, old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 2,blkref #0: rel1663/5/16384 blk 0rmgr: Heap len (rec/tot): 59/59, tx: 756, lsn: 0/014F0CD8, prev 0/014F0CA0, desc: INSERT off: 3, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap len (rec/tot): 54/54, tx: 757, lsn: 0/014F0D48, prev 0/014F0D18, desc: DELETE xmax: 757, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel1663/5/16384 blk 0
postgres=# SELECT * FROM pg_get_wal_records_info('0/014AEC80', '0/014FE6B0') LIMIT 1;-[ RECORD 1 ]----+---------------------------------------------------------start_lsn | 0/14AEC80end_lsn | 0/14AECB8prev_lsn | 0/14AEC50xid | 752resource_manager | Heaprecord_type | DELETErecord_length | 54main_data_length | 8fpi_length | 0description | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_ref | blkref #0: rel 1663/5/16384 fork main blk 0
Gets information about each block reference from all the valid WAL records between start_lsn and end_lsn with one or more block references.
postgres=# SELECT * FROM pg_get_wal_block_info('0/014AEC80', '0/014FE6B0') limit 1;-[ RECORD 1 ]-----+---------------------------------------------------------start_lsn | 0/14AEC80end_lsn | 0/14AECB8prev_lsn | 0/14AEC50block_id | 0reltablespace | 1663reldatabase | 5relfilenode | 16384relforknumber | 0relblocknumber | 0xid | 752resource_manager | Heaprecord_type | DELETErecord_length | 54main_data_length | 8block_data_length | 0block_fpi_length | 0block_fpi_info | description | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_data | block_fpi_data |
Presenting at PGConf India 2025 was an enriching experience, allowing me to share my insights on WAL with a vibrant community of PostgreSQL enthusiasts. I tried to encapsulate the essence of WAL, from its fundamental concepts to its critical role in ensuring data integrity and performance. By exploring the various components, configurations, and diagnostic tools associated with WAL, I aimed to provide a comprehensive understanding that attendees could apply in their own database environments.
I hope that sharing the slides here serve as a valuable resource for anyone looking to deepen their knowledge of PostgreSQL's WAL mechanism. Thank you for joining me on this journey through the intricacies of WAL!
Topics:PostgreSQL
Read our most recent articles regarding all aspects of PostgreSQL and Fujitsu Enterprise Postgres.
Fill the form to receive notifications of future posts