Movatterモバイル変換


[0]ホーム

URL:


Logo: Fujitsu and home icon
Start  trial
    bnr-elephant-39-variation-01

    Fujitsu PostgreSQL blog

    < Back to blog homeFujitsu PostgreSQL blog
    Understanding PostgreSQL Write-Ahead Logging (WAL)
    Vigneshwaran C | May 9, 2025

    Earlier this year, I had the incredible opportunity to present at PGConf India, where I delved into the intricacies of Write-Ahead Logging (WAL) in PostgreSQL. My presentation aimed to demystify this crucial database feature that ensures data integrity and enhances performance. 

    I shared insights on WAL's components, its role in crash recovery, and its significance in both streaming and logical replication. The session was a deep dive into the mechanics of WAL, offering attendees a comprehensive understanding of how it underpins the reliability and efficiency of PostgreSQL databases.

    I discussed howPostgreSQL WAL ensures your database survives crashes, supports point-in-time recovery, and forms the bedrock of replication.

    Unlocking the secrets of WAL: My PGConf India 2025 talk

    Below, you'll find the slides from my presentation at PGConf India 2025.

    Side by sideClick to view the slides side by side
    Top to bottomClick to view the slides in vertical orientation

    Understanding Write-Ahead Logging (WAL)

    Vigneshwaran C

    Software Lead Developer

    Fujitsu

    Agenda

    • What is WAL
    • DML prior to WAL
    • WAL was introduced
    • WAL components
    • WAL segment file
    • WAL record types
    • Checkpoint processing
    • Recovery after crash
    • WAL segment management
    • WAL configurations
    • WAL in streaming replication
    • WAL in logical replication
    • WAL Summarizer process
    • Archiving
    • WAL modes
    • WAL level
    • Diagnosing using pg_waldump
    • Diagnosing using pg_walinspect

    What is WAL

    • The WAL (Write-ahead logs) is asequential record of all changes made to a database
    • WAL files are stored in the directory pg_wal under the data directory
    • Whenever a transaction modifies data in PostgreSQL, the changes are first written to the WAL file before they are applied to the actual on-disk table data files
    • This process is known as write-ahead logging
    • The WAL in PostgreSQL is stored in a set of log files called WAL segments (16MB by default)
    • Once a segment is filled, it is archived, depending on the database configuration
    • WAL segments preceding the one containing the redo record are no longer needed and can be recycled or removed
    • The WAL segments can be replayed during crash recovery or used for replication purposes
    Aredo record is a specific type of WAL record that must be replayed during crash recovery or replication

    DML prior to WAL(<Postgres 7.0)

    1. User performs some Insert into table t1 and commits it.
      postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT
    2. Server loads the table t1 to shared buffer pool and modified the page creating dirty pages.
    3. Write to the disk files. If there are changes in 100 tables, 100 files had to be written.
    4. The backend responds to the user, confirming that the transaction has been committed.
    Since synchronous write had to be done in random location for various tables, the DML operations were extremely slow in versions 7.0 or earlier

    WAL was introduced

    1. User performs some Insert into table t1 and commits it.
      postgres=# begin;BEGINpostgres=*# INSERT INTO t1 VALUES(1);INSERT 0 1postgres=*# COMMIT;COMMIT
    2. Server loads the table t1 to shared buffer pool and modified the page creating dirty pages.

      Note: This modified page is not written to the data files on disk immediately.

    3. WAL records for insert operation are written to WAL buffer.
    4. The WAL buffers for the transaction are written to the WAL files during commit operation by the backend in case of sync commit or by walwriter in case of async commit.
    5. The backend responds to the user, confirming that the transaction has been committed.
    6. Checkpointer or background writer will write the dirty buffer to the physical file of the table.

    WAL Writer process

    • It periodically checks the WAL buffer and writes all unwritten XLOG records to the WAL segments.
    • The WAL writer is enabled by default and cannot be disabled.
    • The check interval is set to the configuration parameter wal_writer_delay, which defaults to 200 milliseconds.
    • The backends first block the space in WAL buffer and will then write later.
    • In the case the backends have not finished writing to WAL buffer, WAL writer process will wait till the WAL buffer is written before flushing to disk.

    WAL components

    • WAL Logs

      WAL logs are the binary files in which various transactions are stored. These logs are written to disk before the changes are made to the database. PostgreSQL maintains multiple WAL (Write-Ahead Log) files. Instead of a single WAL file, it uses a sequence of WAL segments to continuously log database changes. When one WAL segment fills up, PostgreSQL creates a new WAL file. This allows for efficient storage and recycling of WAL files.

    • Checkpoints

      Checkpoints are points in the sequence of transactions within a log file where all the data is guaranteed to have been updated with the information before the checkpoints. These are particularly important in crashes since the WAL file recovers from the most recent checkpoint. This is then where the system performs the REDO operation.

    • WAL buffers

      In PostgreSQL the working units that read and write data to your tables are called "buffers”. The buffers of write-ahead logs are typically called transaction log buffers. These determine the amount of memory allocated to storing WAL data (See 'wal_buffers' GUC later).

    • Log Sequence Numbers

      Log sequence numbers are 64-bit integers assigned to individual WAL records within a WAL, indicating the INSERT position. They are byte offsets unique within a single log stream that increase with each new transaction and are of data type pg_lsn. LSNs can help determine the amount of data in bytes between different transactions.

    WAL segment file

    • A WAL segment is 16 MB by default.
    • A WAL file name consists of three parts:
      • Timeline ID: Tracks different versions of the database’s history. Think of it like saving multiple versions of a database.
      • Logical file ID: Represents the WAL file’s sequence number within the timeline.
      • Segment ID: Identifies a specific segment within the log file. Each segment is typically 16 MB in size.

    WAL segment file

    • Each WAL segment is divided into 8 KB pages
    • The first page contains a special header -XLogLongPageHeaderData
    • All other pages have standard headers -XLogPageHeaderData
    • WAL records are written to WAL page

    WAL records

    Each WAL record consists of multiple components:

    • WAL record header
      • LSN (Log Sequence Number): Unique identifier for the record
      • Transaction ID (XID): Transaction that generated the record
      • Record type: Type of operation (e.g., INSERT, UPDATE, DELETE, COMMIT)
      • CRC checksum: Ensures WAL integrity
    • WAL record data
      • Block references: Identifies which pages (relations) are affected
      • Tuple data (optional): Stores new or old row values (for UPDATE/DELETE)
      • Redo information: Required details to redo the change
    • WAL record tail*
      • Full-page Image (FPI): Snapshot of the entire page (used after checkpoints).
      • Commit status: Indicates if the transaction is fully committed.

    *: Optional

    WAL record types

    TypeDetails
    CREATERepresents the creation of a database object such as a table, index, or other schema-related entity.
    INSERTLogs the insertion of a new row into a table.
    INSERT+INITSimilar to an INSERT record but occurs when inserting into a newly initialized (empty) page.
    UPDATELogs the modification of an existing row in a table.
    DELETELogs the deletion of a row from a table.
    TRUNCATELogs the truncation of a table, which removes all rows efficiently without logging individual row deletions.
    COMMITLogs a transaction commit, ensuring that all its changes are permanently recorded.
    LOCKRepresents a lock operation, typically used for locking database objects such as tables.
    INVALIDATIONLogs cache invalidation events to ensure changes to shared data (e.g., catalogs) are recognized by all backends.
    RUNNING_XACTSLogs running transactions at a particular moment, used in replication and recovery to determine transaction visibility.
    HOT_UPDATESpecial form of UPDATE that allows heap-only tuple updates (HOT), optimizing performance by avoiding unnecessary index updates.
    HOT_UPDATE+INITSimilar to HOT_UPDATE, but involves an update into an initialized (empty) page.
    NEW_CIDLogs the assignment of a new command ID, used to track command execution order within a transaction.

    Checkpoint processing

    1. A WAL record of this checkpoint is recorded in the log
    2. Shared memory contents like clog are written to disk
    3. All dirty pages of data are written to disk
    4. The checkpoint information like checkpoint LSN, timeline, oldest transaction ID are written to the pg_control file
    5. A WAL record saying checkpoint done is recorded in the log. WAL files that are not required (prior to last checkpoint’s location) are removed
    Acheckpoint is a process that ensures data consistency by flushing all modified (dirty) buffers from memory to disk and updating metadata. It marks a point in the WAL (Write-Ahead Log) from which crash recovery can start, minimizing WAL replay time.

    Recovery after crash

    ...pg_control version number:            1700...Latest checkpoint location:           0/449B5E0Latest checkpoint's REDO location:    0/449B588Latest checkpoint's REDO WAL file:    000000010000000000000004...
    • PostgreSQL read pg_control to identify the redo location from which the restart must be started.
    • It then reads the WAL from the redo point and replays the WAL record. It will skip applying the WAL if it is already applied.
    • By this process all committed transactions are restored.
    • Any partially executed transactions that were not committed will be rolled back.
    • Now recovery is complete and database is ready to use.

    WAL segment management

    A WAL segment switch occurs when:

    • The segment is filled up
    • The functionpg_switch_wal() is called
    • archive_mode is enabled andarchive_timeout expires

    WAL configurations

    ConfigDetails
    wal_levelreplica (default).
    wal_buffersSets the number of disk-page buffers in shared memory for WAL.
    wal_keep_segmentsMinimum number of file segments to keep in the pg_wal directory.
    If you do not want to archive, set a slightly increased value.
    checkpoint_flush_afterWhenever more than this amount of data has been written while performing a checkpoint, attempt to force the OS to issue these writes to the underlying storage.
    archive_modeWhen archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command or archive_library.
    archive_commandCommand for archiving the WAL.
    Example:cp %p /mnt/serv/arch_dir %f
    synchronous_commitSynchronization level of the standby servers.
    Setremote_apply to optimize data freshness andon to maintain performance and reliability.
    max_slot_wal_keep_sizeSpecify the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time.
    max_wal_sendersNumber of standby servers +1.
    This value cannot exceed the value set in max_connections.
    wal_sender_timeoutTime to wait before determining that the WAL receiver process is in an abnormal state.
    max_connectionsMaximum number of simultaneous connections to the database server.
    Set a value greater than max_wal_senders.
    synchronous_standby_namesStandby servers for synchronous replication. No setting is required for asynchronous replication.
    Example:s1

    WAL in streaming replication

    • Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real time, so that the databases of the primary server and standby server can be kept in sync.
    •  
    • Streaming replication works by transferring, or shipping, the WAL to the standby server in real time, and applying it on the standby server.
    •  

    WAL in streaming replication

    1. User performs transaction commit on primary
    2. The backend process handles the transaction and writes changes to the primary data pages.
    3. Changes made by the transaction are logged in the Write-Ahead Log (WAL).
    4. The WAL sender process reads the WAL records from the primary node.
    5. WAL records are transmitted over a network connection from the primary to the standby.
    6. The WAL receiver process on the standby node writes incoming WAL records to its local WAL storage.
    7. The standby server reads the WAL records that were received.
    8. The startup process applies WAL changes to the standby’s data files.
    9. The recovery process applies WAL changes to the standby’s data files.

    WAL in streaming replication

    Logical replication is a method of replicating selective data changes based on the definition of the publication from the publisher to subscriber.

    1. User initiates a transaction commit on the publisher node
    2. The backend process applies changes to the publisher's data files
    3. Changes are logged in WAL (Write-Ahead Log) on the publisher
    4. The WAL sender process checks if the change should be published based on the publication rules
    5. The WAL sender transmits only relevant committed transactions to the subscriber node
    6. The apply worker receives changes and applies transactions to the subscriber's data
    7. The subscriber node can accept read and write queries independently

    WAL Summarizer process

    • The WAL Summarizer process was introduced in PostgreSQL 17. It supports incremental backups, tracks changes to all database blocks (including relations and visibility maps) and writes these modifications to WAL summary files located in thepg_wal/summaries directory
    • The WAL Summarizer process operates as follows:
      • During each checkpoint, it reads WAL segment files from the previous REDO point to the current REDO point
      • It tracks changes to all blocks of all relations (inc. visibility maps) using the WAL segment files
      • It writes the summary to WAL summary files located in thepg_wal/summaries/ directory
    • In the context of the WAL Summarizer process,previous REDO point andcurrent REDO point are referred to asstart_lsn andend_lsn, respectively
    • Wal summarizer process will be started ifsummarize_wal guc is set.
    The Redo Point in PostgreSQL refers to the Log Sequence Number (LSN) from which WAL (Write-Ahead Log) replay must start during crash recovery. It is the position where the last checkpoint started, and serves as the earliest point required for consistent recovery.

    Archiving

    • Continuous archiving is a feature that copies WAL segment files to an archival area at the time when a WAL segment switch occurs

      It is performed by the archiver (background) process. The copied file is called an archive log. This feature is typically used for hot physical backup and PITR (Point-in-Time Recovery)

    • The path to the archival area is set by the archive_command configuration parameter

      For example, the following parameter would copy WAL segment files to the directory/home/db/archives/ every time a segment switch occurs:

      archive_command = cp %p /home/db/archives/%f
    • The%p placeholder represents the copied WAL segment, and the%f placeholder represents the archive log
    • The archive_command parameter can be set to any Unix command or tool.

      This enables the use of scp or other file backup tools to transfer archive logs to remote hosts, instead of relying on simple copy commands.

    WAL modes

    synchronous_commit

    Specifies how much WAL processing must complete before the database server returns a success indication to the client.
    Valid values areremote_apply,on (default),remote_write,local, andoff

    synchronous_commit
    setting
    Local
    durable
    commot
    Standby
    durable commit
    after Postgres crash
    Standby
    durable commit
    after OS crash
    Standby
    query
    consistency
    remote_apply
    on
    remote_write
    local
    off

    WAL level

    • minimal
      • This level generates the least WAL volume. It logs no row information for permanent relations in transactions that create or rewrite them. This can make operations much faster. But minimal WAL does not contain sufficient information for point-in-time recovery, so replica or higher must be used to enable continuous archiving (archive_mode) and streaming binary replication.
    • replica
      • It means that the WAL contains enough information for streaming replication to read-only standby servers.
    • logical
      • In logical level, the same information is logged as with replica, plus information needed to extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are executed.

    Diagnosing using pg_waldump

    pg_waldump is a PostgreSQL server application to display a human-readable rendering of the WAL of a PostgreSQL database cluster.

    pg_waldump [option...] [startseg [endseg]]

    Let’s say you want to see the WAL’s generated for table test_wal whose tablespace ID is1663, database ID is5, and relation ID is16384

    ./pg_waldump node1/pg_wal/000000010000000000000001 -R1663/5/16384rmgr: Heap  len (rec/tot): 59/59, tx: 744, lsn: 0/014AE238, prev 0/014AE200, desc: INSERT+INIT off: 1, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 70/70, tx: 755, lsn: 0/014F0C28, prev 0/014F0BF0, desc: HOT_UPDATE old_xmax: 755, old_off: 1, old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 2,blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 59/59, tx: 756, lsn: 0/014F0CD8, prev 0/014F0CA0, desc: INSERT off: 3, flags: 0x08, blkref #0: rel1663/5/16384 blk 0rmgr: Heap  len (rec/tot): 54/54, tx: 757, lsn: 0/014F0D48, prev 0/014F0D18, desc: DELETE xmax: 757, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel1663/5/16384 blk 0

    Diagnosing using pg_walinspect

    • pg_walinspect is an extension which provides SQL functions that allow you to inspect WAL content
    • This is very useful in cloud environment for people having only database access
    • Similar topg_waldump, but accessible through SQL rather than separate utility
    • Gets WAL record info about a record located at or after the in_lsn argument
    postgres=# SELECT * FROM pg_get_wal_records_info('0/014AEC80', '0/014FE6B0') LIMIT 1;-[ RECORD 1 ]----+---------------------------------------------------------start_lsn        | 0/14AEC80end_lsn          | 0/14AECB8prev_lsn         | 0/14AEC50xid              | 752resource_manager | Heaprecord_type      | DELETErecord_length    | 54main_data_length | 8fpi_length       | 0description      | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_ref        | blkref #0: rel 1663/5/16384 fork main blk 0

    Diagnosing using pg_walinspect

    Gets information about each block reference from all the valid WAL records between start_lsn and end_lsn with one or more block references.

    postgres=# SELECT * FROM pg_get_wal_block_info('0/014AEC80', '0/014FE6B0') limit 1;-[ RECORD 1 ]-----+---------------------------------------------------------start_lsn         | 0/14AEC80end_lsn           | 0/14AECB8prev_lsn          | 0/14AEC50block_id          | 0reltablespace     | 1663reldatabase       | 5relfilenode       | 16384relforknumber     | 0relblocknumber    | 0xid               | 752resource_manager  | Heaprecord_type       | DELETErecord_length     | 54main_data_length  | 8block_data_length | 0block_fpi_length  | 0block_fpi_info    | description       | xmax: 752, off: 2, infobits: [KEYS_UPDATED], flags: 0x00block_data        | block_fpi_data    |

    Summary

    • What is WAL
    • Why WAL is required
    • How WAL helps in recovery
    • How WAL helps in logical & streaming replication
    • Various WAL configurations
    • Diagnosis of WAL

    References

    Thank you

    Vigneshwaran C

    Software Lead Developer

    Fujitsu

    linkedin.com/in/vigneshwaran-c-80b36029

    Conclusion

    ill-people-using-laptop-around-shield-and-gear-and-magnifying-glass-01-variation-04Presenting at PGConf India 2025 was an enriching experience, allowing me to share my insights on WAL with a vibrant community of PostgreSQL enthusiasts. I tried to encapsulate the essence of WAL, from its fundamental concepts to its critical role in ensuring data integrity and performance. By exploring the various components, configurations, and diagnostic tools associated with WAL, I aimed to provide a comprehensive understanding that attendees could apply in their own database environments.

    I hope that sharing the slides here serve as a valuable resource for anyone looking to deepen their knowledge of PostgreSQL's WAL mechanism. Thank you for joining me on this journey through the intricacies of WAL!

    Topics:PostgreSQL

    Receive our blog

    Search by topic

    see all >
    Vigneshwaran C
    Software Lead Developer, Fujitsu OSS PostgreSQL team
    Vignesh is a recognized PostgreSQL Contributor working in adding enhancements and fixing bugs in PostgreSQL.

    He presents talks related to logical replication in various conferences, and he authors blogs on various PostgreSQL topics.
    OurMigration Portal helps you assess the effort required to move to the enterprise-built version of Postgres - Fujitsu Enterprise Postgres.
    We also have a series of technical articles for PostgreSQL enthusiasts of all stripes, with tips and how-to's.

     

    Explore PostgreSQL Insider >
    Subscribe to be notified of future blog posts
    If you would like to be notified of my next blog posts and other PostgreSQL-related articles, fill the formhere.

    Read our latest blogs

    Read our most recent articles regarding all aspects of PostgreSQL and Fujitsu Enterprise Postgres.

    Receive our blog

    Fill the form to receive notifications of future posts

    Search by topic

    see all >

    [8]ページ先頭

    ©2009-2025 Movatter.jp