Replication lag Stay organized with collections Save and categorize content based on your preferences.
This page describes how to troubleshoot and fix replication lag for Cloud SQLread replicas.
Overview
Cloud SQL read replicas usePostgreSQL streaming replication.Changes are written to Write-Ahead Log (WAL) in the primary instance.The WAL sender sends the WAL to the WAL receiver in the replica, where they are applied.Replication lag can happen in a few scenarios, such as:
- The primary instance can't send the changes fast enough to the replica.
- The replica can't receive the changes quickly enough.
- The replica can't apply the changes quickly enough.
network_lag
metric.The third is observed via thereplica_lag
metric. Highreplica_lag
means thatthe replica can't apply replication changes fast enough. The total lag can beobserved viareplica_byte_lag
metric, which has labels to indicate furtherdetails. These metrics are described in theMonitor replication lagsection below.Optimize queries and schema
This section suggests some common query and schema optimizations you can make toimprove replication performance.
Long-running queries in the read replica
Long-running queries in the replica might block replication for Cloud SQL.You might want to have separate replicas for online transaction processing(OLTP) and online analytical processing (OLAP) purposes and only sendlong-running queries to the OLAP replica.
Consider adjusting themax_standby_archive_delay
andmax_standby_streaming_delay
flags for your replica.
If you suspect VACUUM is the culprit, and query cancellation is not acceptable,consider setting thehot_standby_feedback
flag in the replica.
ReviewPostgreSQL documentation for more information.
Exclusive locks due to DDL
Data definition language (DDL) commands, such asALTER TABLE
andCREATE INDEX
, can cause replication lag in the replica due toexclusive locks. To avoid lock contention, consider scheduling DDL executionduring times when the query load is lower on the replicas.
Overloaded replica
If a read replica is receiving too many queries, replication could be blocked.Consider splitting the reads among multiple replicas to reduce the load on eachone.
To avoid query spikes, consider throttling replica read queries in yourapplication logic or in a proxy layer if you use one.
If there are spikes of activity on the primary instance, consider spreading outupdates.
Monolithic primary database
Consider sharding the primary database vertically (or horizontally) to preventone or more lagging tables from holding back all the other tables.
Monitor replication lag
You can use thereplica_lag
andnetwork_lag
metrics to monitor replicationlag and identify whether the cause of the lag is in the primary database,the network, or the replica.
Metric | Description |
---|---|
Replication lag ( cloudsql.googleapis.com ) | The number of seconds that the replica's state is lagging behind the state of the primary instance. This is the difference between the current time and the original timestamp at which the primary database committed the transaction that is currently being applied on the replica. In particular, writes might be counted as lagging even if they have been received by the replica, if the replica hasn't yet applied the write to the database. This metric is calculated using |
Lag bytes ( cloudsql.googleapis.com ) | The amount of bytes by which the replica's state is lagging behind the state of the primary database.
|
Network lag ( cloudsql.googleapis.com ) | The amount of time in, seconds that it takes from commit in the primary database to reach the WAL receiver in the replica. If the |
Verify replication
To verify that replication is working, run the following statement against thereplica: select status, last_msg_receipt_time from pg_stat_wal_receiver;
If replication is happening, you see the statusstreaming
and a recentlast_msg_receipt_time:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver; status | last_msg_receipt_time -----------+------------------------------- streaming | 2020-01-21 20:19:51.461535+00 (1 row)
If replication is not happening, an empty result is returned:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver; status | last_msg_receipt_time --------+----------------------- (0 rows)
What's next:
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-14 UTC.