Fujitsu PostgreSQL blog

< Back to blog homeFujitsu PostgreSQL blog
Understanding PostgreSQL Write-Ahead Logging (WAL)
Vigneshwaran C | May 9, 2025

Earlier this year, I had the incredible opportunity to present at PGConf India, where I delved into the intricacies of Write-Ahead Logging (WAL) in PostgreSQL. My presentation aimed to demystify this crucial database feature that ensures data integrity and enhances performance.

I shared insights on WAL's components, its role in crash recovery, and its significance in both streaming and logical replication. The session was a deep dive into the mechanics of WAL, offering attendees a comprehensive understanding of how it underpins the reliability and efficiency of PostgreSQL databases.

I discussed howPostgreSQL WAL ensures your database survives crashes, supports point-in-time recovery, and forms the bedrock of replication.

Unlocking the secrets of WAL: My PGConf India 2025 talk

Below, you'll find the slides from my presentation at PGConf India 2025.

Side by side

Click to view the slides side by side

Top to bottomClick to view the slides in vertical orientation

Understanding Write-Ahead Logging (WAL)

Vigneshwaran C

Software Lead Developer

Fujitsu

Agenda

What is WAL
DML prior to WAL
WAL was introduced
WAL components
WAL segment file
WAL record types
Checkpoint processing
Recovery after crash
WAL segment management
WAL configurations
WAL in streaming replication
WAL in logical replication
WAL Summarizer process
Archiving
WAL modes
WAL level
Diagnosing using pg_waldump
Diagnosing using pg_walinspect

What is WAL

The WAL (Write-ahead logs) is asequential record of all changes made to a database
WAL files are stored in the directory pg_wal under the data directory
Whenever a transaction modifies data in PostgreSQL, the changes are first written to the WAL file before they are applied to the actual on-disk table data files
This process is known as write-ahead logging
The WAL in PostgreSQL is stored in a set of log files called WAL segments (16MB by default)
Once a segment is filled, it is archived, depending on the database configuration
WAL segments preceding the one containing the redo record are no longer needed and can be recycled or removed
The WAL segments can be replayed during crash recovery or used for replication purposes

Aredo record is a specific type of WAL record that must be replayed during crash recovery or replication

DML prior to WAL(<Postgres 7.0)

User performs some Insert into table t1 and commits it.

postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT

Server loads the table t1 to shared buffer pool and modified the page creating dirty pages.
Write to the disk files. If there are changes in 100 tables, 100 files had to be written.
The backend responds to the user, confirming that the transaction has been committed.

Since synchronous write had to be done in random location for various tables, the DML operations were extremely slow in versions 7.0 or earlier

WAL was introduced

User performs some Insert into table t1 and commits it.

postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT

Server loads the table t1 to shared buffer pool and modified the page creating dirty pages.
Note: This modified page is not written to the data files on disk immediately.
WAL records for insert operation are written to WAL buffer.
The WAL buffers for the transaction are written to the WAL files during commit operation by the backend in case of sync commit or by walwriter in case of async commit.
The backend responds to the user, confirming that the transaction has been committed.
Checkpointer or background writer will write the dirty buffer to the physical file of the table.

WAL Writer process

It periodically checks the WAL buffer and writes all unwritten XLOG records to the WAL segments.
The WAL writer is enabled by default and cannot be disabled.
The check interval is set to the configuration parameter wal_writer_delay, which defaults to 200 milliseconds.
The backends first block the space in WAL buffer and will then write later.
In the case the backends have not finished writing to WAL buffer, WAL writer process will wait till the WAL buffer is written before flushing to disk.

WAL components

WAL Logs
WAL logs are the binary files in which various transactions are stored. These logs are written to disk before the changes are made to the database. PostgreSQL maintains multiple WAL (Write-Ahead Log) files. Instead of a single WAL file, it uses a sequence of WAL segments to continuously log database changes. When one WAL segment fills up, PostgreSQL creates a new WAL file. This allows for efficient storage and recycling of WAL files.
Checkpoints
Checkpoints are points in the sequence of transactions within a log file where all the data is guaranteed to have been updated with the information before the checkpoints. These are particularly important in crashes since the WAL file recovers from the most recent checkpoint. This is then where the system performs the REDO operation.
WAL buffers
In PostgreSQL the working units that read and write data to your tables are called "buffers”. The buffers of write-ahead logs are typically called transaction log buffers. These determine the amount of memory allocated to storing WAL data (See 'wal_buffers' GUC later).
Log Sequence Numbers
Log sequence numbers are 64-bit integers assigned to individual WAL records within a WAL, indicating the INSERT position. They are byte offsets unique within a single log stream that increase with each new transaction and are of data type pg_lsn. LSNs can help determine the amount of data in bytes between different transactions.

WAL segment file

A WAL segment is 16 MB by default.
A WAL file name consists of three parts:
- Timeline ID: Tracks different versions of the database’s history. Think of it like saving multiple versions of a database.
- Logical file ID: Represents the WAL file’s sequence number within the timeline.
- Segment ID: Identifies a specific segment within the log file. Each segment is typically 16 MB in size.

WAL segment file

Each WAL segment is divided into 8 KB pages
The first page contains a special header -XLogLongPageHeaderData
All other pages have standard headers -XLogPageHeaderData
WAL records are written to WAL page

WAL records

Each WAL record consists of multiple components:

WAL record header
- LSN (Log Sequence Number): Unique identifier for the record
- Transaction ID (XID): Transaction that generated the record
- Record type: Type of operation (e.g., INSERT, UPDATE, DELETE, COMMIT)
- CRC checksum: Ensures WAL integrity
WAL record data
- Block references: Identifies which pages (relations) are affected
- Tuple data (optional): Stores new or old row values (for UPDATE/DELETE)
- Redo information: Required details to redo the change
WAL record tail^*
- Full-page Image (FPI): Snapshot of the entire page (used after checkpoints).
- Commit status: Indicates if the transaction is fully committed.

*: Optional

WAL record types

Type	Details
CREATE	Represents the creation of a database object such as a table, index, or other schema-related entity.
INSERT	Logs the insertion of a new row into a table.
INSERT+INIT	Similar to an INSERT record but occurs when inserting into a newly initialized (empty) page.
UPDATE	Logs the modification of an existing row in a table.
DELETE	Logs the deletion of a row from a table.
TRUNCATE	Logs the truncation of a table, which removes all rows efficiently without logging individual row deletions.
COMMIT	Logs a transaction commit, ensuring that all its changes are permanently recorded.
LOCK	Represents a lock operation, typically used for locking database objects such as tables.
INVALIDATION	Logs cache invalidation events to ensure changes to shared data (e.g., catalogs) are recognized by all backends.
RUNNING_XACTS	Logs running transactions at a particular moment, used in replication and recovery to determine transaction visibility.
HOT_UPDATE	Special form of UPDATE that allows heap-only tuple updates (HOT), optimizing performance by avoiding unnecessary index updates.
HOT_UPDATE+INIT	Similar to HOT_UPDATE, but involves an update into an initialized (empty) page.
NEW_CID	Logs the assignment of a new command ID, used to track command execution order within a transaction.

Checkpoint processing

A WAL record of this checkpoint is recorded in the log
Shared memory contents like clog are written to disk
All dirty pages of data are written to disk
The checkpoint information like checkpoint LSN, timeline, oldest transaction ID are written to the pg_control file
A WAL record saying checkpoint done is recorded in the log. WAL files that are not required (prior to last checkpoint’s location) are removed

Acheckpoint is a process that ensures data consistency by flushing all modified (dirty) buffers from memory to disk and updating metadata. It marks a point in the WAL (Write-Ahead Log) from which crash recovery can start, minimizing WAL replay time.

Recovery after crash

...pg_control version number:            1700...Latest checkpoint location:           0/449B5E0Latest checkpoint's REDO location:    0/449B588Latest checkpoint's REDO WAL file:    000000010000000000000004...

PostgreSQL read pg_control to identify the redo location from which the restart must be started.
It then reads the WAL from the redo point and replays the WAL record. It will skip applying the WAL if it is already applied.
By this process all committed transactions are restored.
Any partially executed transactions that were not committed will be rolled back.
Now recovery is complete and database is ready to use.

WAL segment management

A WAL segment switch occurs when:

The segment is filled up
The functionpg_switch_wal() is called
archive_mode is enabled andarchive_timeout expires

WAL configurations

Config	Details
wal_level	replica (default).
wal_buffers	Sets the number of disk-page buffers in shared memory for WAL.
wal_keep_segments	Minimum number of file segments to keep in the pg_wal directory. If you do not want to archive, set a slightly increased value.
checkpoint_flush_after	Whenever more than this amount of data has been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage.
archive_mode	When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command or archive_library.
archive_command	Command for archiving the WAL. Example:cp %p /mnt/serv/arch_dir %f
synchronous_commit	Synchronization level of the standby servers. Setremote_apply to optimize data freshness andon to maintain performance and reliability.
max_slot_wal_keep_size	Specify the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time.
max_wal_senders	Number of standby servers +1. This value cannot exceed the value set in max_connections.
wal_sender_timeout	Time to wait before determining that the WAL receiver process is in an abnormal state.
max_connections	Maximum number of simultaneous connections to the database server. Set a value greater than max_wal_senders.
synchronous_standby_names	Standby servers for synchronous replication. No setting is required for asynchronous replication. Example:s1

WAL in streaming replication

Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real time, so that the databases of the primary server and standby server can be kept in sync.
Streaming replication works by transferring, or shipping, the WAL to the standby server in real time, and applying it on the standby server.

WAL in streaming replication

User performs transaction commit on primary
The backend process handles the transaction and writes changes to the primary data pages.
Changes made by the transaction are logged in the Write-Ahead Log (WAL).
The WAL sender process reads the WAL records from the primary node.
WAL records are transmitted over a network connection from the primary to the standby.
The WAL receiver process on the standby node writes incoming WAL records to its local WAL storage.
The standby server reads the WAL records that were received.
The startup process applies WAL changes to the standby’s data files.
The recovery process applies WAL changes to the standby’s data files.

WAL in streaming replication

Logical replication is a method of replicating selective data changes based on the definition of the publication from the publisher to subscriber.

User initiates a transaction commit on the publisher node
The backend process applies changes to the publisher's data files
Changes are logged in WAL (Write-Ahead Log) on the publisher
The WAL sender process checks if the change should be published based on the publication rules
The WAL sender transmits only relevant committed transactions to the subscriber node
The apply worker receives changes and applies transactions to the subscriber's data
The subscriber node can accept read and write queries independently

WAL Summarizer process

The WAL Summarizer process was introduced in PostgreSQL 17. It supports incremental backups, tracks changes to all database blocks (including relations and visibility maps) and writes these modifications to WAL summary files located in thepg_wal/summaries directory
The WAL Summarizer process operates as follows:
- During each checkpoint, it reads WAL segment files from the previous REDO point to the current REDO point
- It tracks changes to all blocks of all relations (inc. visibility maps) using the WAL segment files
- It writes the summary to WAL summary files located in thepg_wal/summaries/ directory
In the context of the WAL Summarizer process,previous REDO point andcurrent REDO point are referred to asstart_lsn andend_lsn, respectively
Wal summarizer process will be started ifsummarize_wal guc is set.

The Redo Point in PostgreSQL refers to the Log Sequence Number (LSN) from which WAL (Write-Ahead Log) replay must start during crash recovery. It is the position where the last checkpoint started, and serves as the earliest point required for consistent recovery.

Archiving

Continuous archiving is a feature that copies WAL segment files to an archival area at the time when a WAL segment switch occurs
It is performed by the archiver (background) process. The copied file is called an archive log. This feature is typically used for hot physical backup and PITR (Point-in-Time Recovery)
The path to the archival area is set by the archive_command configuration parameter
For example, the following parameter would copy WAL segment files to the directory/home/db/archives/ every time a segment switch occurs:
```
archive_command = cp %p /home/db/archives/%f
```
The%p placeholder represents the copied WAL segment, and the%f placeholder represents the archive log
The archive_command parameter can be set to any Unix command or tool.
This enables the use of scp or other file backup tools to transfer archive logs to remote hosts, instead of relying on simple copy commands.

WAL modes

synchronous_commit

Specifies how much WAL processing must complete before the database server returns a success indication to the client.
Valid values areremote_apply,on (default),remote_write,local, andoff

synchronous_commit setting	Local durable commot	Standby durable commit after Postgres crash	Standby durable commit after OS crash	Standby query consistency
remote_apply
on
remote_write
local
off

WAL level

minimal
- This level generates the least WAL volume. It logs no row information for permanent relations in transactions that create or rewrite them. This can make operations much faster. But minimal WAL does not contain sufficient information for point-in-time recovery, so replica or higher must be used to enable continuous archiving (archive_mode) and streaming binary replication.
replica
- It means that the WAL contains enough information for streaming replication to read-only standby servers.
logical
- In logical level, the same information is logged as with replica, plus information needed to extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are executed.

Diagnosing using pg_waldump

pg_waldump is a PostgreSQL server application to display a human-readable rendering of the WAL of a PostgreSQL database cluster.

pg_waldump [option...] [startseg [endseg]]

Let’s say you want to see the WAL’s generated for table test_wal whose tablespace ID is1663, database ID is5, and relation ID is16384

./pg_waldump node1/pg_wal/000000010000000000000001 -R1663/5/16384rmgr: Heap  len (rec/tot): 59/59, tx: 744, lsn: 0/014AE238, prev 0/014AE200, desc: INSERT+INIT off: 1, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 70/70, tx: 755, lsn: 0/014F0C28, prev 0/014F0BF0, desc: HOT_UPDATE old_xmax: 755, old_off: 1, old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 2,blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 59/59, tx: 756, lsn: 0/014F0CD8, prev 0/014F0CA0, desc: INSERT off: 3, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 54/54, tx: 757, lsn: 0/014F0D48, prev 0/014F0D18, desc: DELETE xmax: 757, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel1663/5/16384 blk 0

Diagnosing using pg_walinspect

pg_walinspect is an extension which provides SQL functions that allow you to inspect WAL content
This is very useful in cloud environment for people having only database access
Similar topg_waldump, but accessible through SQL rather than separate utility
Gets WAL record info about a record located at or after the in_lsn argument

postgres=# SELECT * FROM pg_get_wal_records_info('0/014AEC80', '0/014FE6B0') LIMIT 1;-[ RECORD 1 ]----+---------------------------------------------------------start_lsn        | 0/14AEC80end_lsn          | 0/14AECB8prev_lsn         | 0/14AEC50xid              | 752resource_manager | Heaprecord_type      | DELETErecord_length    | 54main_data_length | 8fpi_length       | 0description      | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_ref        | blkref #0: rel 1663/5/16384 fork main blk 0

Diagnosing using pg_walinspect

Gets information about each block reference from all the valid WAL records between start_lsn and end_lsn with one or more block references.

postgres=# SELECT * FROM pg_get_wal_block_info('0/014AEC80', '0/014FE6B0') limit 1;-[ RECORD 1 ]-----+---------------------------------------------------------start_lsn         | 0/14AEC80end_lsn           | 0/14AECB8prev_lsn          | 0/14AEC50block_id          | 0reltablespace     | 1663reldatabase       | 5relfilenode       | 16384relforknumber     | 0relblocknumber    | 0xid               | 752resource_manager  | Heaprecord_type       | DELETErecord_length     | 54main_data_length  | 8block_data_length | 0block_fpi_length  | 0block_fpi_info    | description       | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_data        | block_fpi_data    |

Summary

What is WAL
Why WAL is required
How WAL helps in recovery
How WAL helps in logical & streaming replication
Various WAL configurations
Diagnosis of WAL

References

The Internals of PostgreSQL
https://www.interdb.jp/pg/
PostgreSQL 14 Internals
https://postgrespro.com/community/books/internals
PostgreSQL Documentation
https://www.postgresql.org/docs/

Thank you

Vigneshwaran C

Software Lead Developer

Fujitsu

linkedin.com/in/vigneshwaran-c-80b36029

Conclusion

ill-people-using-laptop-around-shield-and-gear-and-magnifying-glass-01-variation-04 Presenting at PGConf India 2025 was an enriching experience, allowing me to share my insights on WAL with a vibrant community of PostgreSQL enthusiasts. I tried to encapsulate the essence of WAL, from its fundamental concepts to its critical role in ensuring data integrity and performance. By exploring the various components, configurations, and diagnostic tools associated with WAL, I aimed to provide a comprehensive understanding that attendees could apply in their own database environments.

I hope that sharing the slides here serve as a valuable resource for anyone looking to deepen their knowledge of PostgreSQL's WAL mechanism. Thank you for joining me on this journey through the intricacies of WAL!

Topics:PostgreSQL

Receive our blog

Search by topic

see all >

Vigneshwaran C

Software Lead Developer, Fujitsu OSS PostgreSQL team

Vignesh is a recognized PostgreSQL Contributor working in adding enhancements and fixing bugs in PostgreSQL.

He presents talks related to logical replication in various conferences, and he authors blogs on various PostgreSQL topics.

View all Vignesh's blog posts