Replication lag

MySQL  |  PostgreSQL  |  SQL Server

This page describes how to troubleshoot and fix replication lag for Cloud SQLread replicas.

Overview

Cloud SQL read replicas useSQL Server Always-On Availability Groups for replication. Changes are written to the transaction log in the primaryinstance. The primary instance forwards transactions to any secondary replicainstances, where the changes are applied. The availability mode used isAsynchronous-commit mode.

Replication lag can happen in a few scenarios, such as:

  • The primary instance can't send the changes fast enough to the replica.
  • The replica can't receive the changes quickly enough.
  • The replica can't apply the changes quickly enough.
The first two listed scenarios can be monitored with thenetwork_lagmetric. For more information about the metric, seeMonitor replication lag.

Ensure replica is adequately provisioned

A replica instance that is smaller than the primary instance (for example, withless vCPUs and memory) can experience replication lag. A smaller replica mightalso have different default configuration flags compared to a larger primaryinstance. We recommend that the replica instance is at least as large as theprimary instance to have enough resources to handle the replication load.

High CPU utilization on the replica can also cause replication lag. If thereplica's CPU utilization is high (for example, over 90%), consider increasingthe replica's CPU capacity.

Optimize queries and schema

This section suggests some common query and schema optimizations you can make toimprove replication performance.

Long-running queries in the read replica

Long-running queries in the replica might block replication for Cloud SQL.You might want to have separate replicas for online transaction processing(OLTP) and online analytical processing (OLAP) purposes and only sendlong-running queries to the OLAP replica.

Exclusive locks due to DDL

Data definition language (DDL) commands, such asALTER TABLE andCREATE INDEX, can cause replication lag in the replica due toexclusive locks. To avoid lock contention, consider scheduling DDL executionduring times when the query load is lower on the replicas.

Additionally, DDL statements such asCREATE INDEX,ALTER INDEX, andINDEXMAINTENANCE can cause replication lag due to the large number of transactionlog records that these statements can generate.

Overloaded replica

If a read replica is receiving too many queries, replication could be blocked.Consider splitting the reads among multiple replicas to reduce the load on eachone.

To avoid query spikes, consider throttling replica read queries in yourapplication logic or in a proxy layer if you use one.

If there are spikes of activity on the primary instance, consider spreading outupdates.

Monolithic primary database

Consider sharding the primary database vertically (or horizontally) to preventone or more lagging tables from holding back all the other tables.

Monitor replication lag

You can use thereplica_lag andnetwork_lag metrics to monitor replicationlag and identify whether the cause of the lag is in the primary database,the network, or the replica.

MetricDescription
Network lag
(cloudsql.googleapis.com/database/replication/network_lag)

The difference between the timestamp of the last received log entry on the replica and the last sent log entry on the primary.

Verify replication

To verify that replication is working, check the value of thecloudsql.googleapis.com/database/replication/state metric on the primaryinstance. If the state isRunning, then replication is healthy.

What's next:

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.