Last month, I was proud to talk about the internals of logical replication to a room full of PostgreSQL enthusiasts at PGConf India 2023. In case you missed it, let's go through it together.
My paper on the internals of logical replication was one of the 27 talks selected from 120 CFP submissions. During the event, I covered the following topics.
Logical replication is a method of replicating data changes from publisher to subscriber. The node where a publication is defined is referred to as the publisher. The node where a subscription is defined is referred to as the subscriber. Logical replication allows fine-grained control over both data replication and security.
Logical replication uses a publish and subscribe model with one or more subscribers subscribing to one or more publications on a publisher node. Subscribers pull data from the publications they subscribe to and may subsequently re-publish data to allow cascading replication or more complex configurations.
Below, I illustrate how logical replication works in PostgreSQL 15. I will refer back to this diagram later in this post.
Publications can be defined on the primary node whose changes should be replicated. A publication is a set of changes generated from a table or a group of tables and might also be described as a change set or replication set. Each publication exists in only one database.
Each table can be added to multiple publications if needed. Publications may currently only contain tables and all tables in a schema.
Publications can choose to limit the changes they produce to any combination of INSERT, UPDATE, DELETE, and TRUNCATE, similar to how triggers are fired by particular event types. By default, all operation types are replicated.
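For example, a publication limited to INSERT and UPDATE operations could be created as follows (the publication and table names here are hypothetical):

```sql
-- Hypothetical publication that replicates only INSERT and UPDATE
-- operations for the table "users"
CREATE PUBLICATION pub_users_ins_upd
    FOR TABLE users
    WITH (publish = 'insert, update');
```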
When a publication is created, the publication information will be added to the pg_publication catalog table:
Information about table publications is added to the pg_publication_rel catalog table:
Information about schema publications is added to the pg_publication_namespace catalog table:
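Once a publication exists, these catalog tables can be inspected with simple queries such as:

```sql
SELECT * FROM pg_publication;            -- one row per publication
SELECT * FROM pg_publication_rel;        -- tables published explicitly
SELECT * FROM pg_publication_namespace;  -- schemas published via TABLES IN SCHEMA
```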
A subscription is the downstream side of logical replication. It defines the connection to another database and the set of publications (one or more) to which it wants to subscribe.
The subscriber database behaves in the same way as any other PostgreSQL instance, and can be used as a publisher for other databases by defining its own publications. A subscriber node may have multiple subscriptions. It is possible to define multiple subscriptions between a single publisher-subscriber pair, in which case care must be taken to ensure that the subscribed publication objects don't overlap.
Each subscription will receive changes via one replication slot. Additional replication slots may be required for the initial synchronization of pre-existing table data; those slots are dropped at the end of data synchronization.
When a subscription is created, the subscription information will be added to the pg_subscription catalog table:
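A minimal subscription for the pub_alltables publication might look like this (the connection string values are placeholders for your environment):

```sql
-- Run on the subscriber; host/port/dbname are placeholders
CREATE SUBSCRIPTION sub_alltables
    CONNECTION 'host=publisher-host port=5432 dbname=postgres'
    PUBLICATION pub_alltables;
```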
The subscriber will connect to the publisher and get the list of tables that the publisher is publishing. In our earlier example, we created pub_alltables to publish data of all tables - the publication relations will be added to the pg_subscription_rel catalog table:
The subscriber connects to the publisher and creates a replication slot, whose information is available in pg_replication_slots:
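The slot can be examined on the publisher; for logical replication the plugin column shows pgoutput:

```sql
SELECT slot_name, plugin, slot_type, active, restart_lsn
FROM pg_replication_slots;
```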
Subscribers add the subscription stats information to pg_stat_subscription:
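This view can be queried on the subscriber to monitor replication progress, for example:

```sql
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```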
The initial part of the CREATE SUBSCRIPTION command will be completed and control returned to the user. The remaining work will be done in the background by the replication launcher, walsender, apply worker, and tablesync worker after the CREATE SUBSCRIPTION command is completed.
This process is started by the postmaster during the start of the instance. It will periodically check the pg_subscription catalog table to see if any subscriptions have been added or enabled.
The logical replication worker launcher uses the background worker infrastructure to start the logical replication workers for every enabled subscription.
Once the launcher process identifies that a new subscription has been created or enabled, it will start an apply worker process.
The running apply worker can be seen in the process list:
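On Linux, for example, the launcher and apply worker can be spotted by their process titles (the subscription OID shown is illustrative):

```shell
$ ps -ef | grep 'logical replication'
# Typical process titles include:
#   postgres: logical replication launcher
#   postgres: logical replication apply worker for subscription 16407
```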
The above information illustrates step 1 mentioned in the Architecture section above.
The apply worker will iterate through the table list and launch tablesync workers to synchronize the tables. Each table will be synchronized by one tablesync worker.
Multiple tablesync workers (one for each table) will run in parallel based on the max_sync_workers_per_subscription configuration.
The apply worker will wait until the tablesync worker copies the initial table data and sets the table state to 'ready' in pg_subscription_rel.
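The synchronization state of each table can be watched on the subscriber as it progresses towards 'r' (ready):

```sql
-- srsubstate: 'i' = initializing, 'd' = copying data,
-- 's' = synchronized, 'r' = ready
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;
```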
The above information illustrates step 2 mentioned in the Architecture section above.
Note: Currently, DDL operations are not supported by logical replication. Only DML changes will be replicated.
The above information illustrates step 3 mentioned in the Architecture section above.
The walsender is started when the subscriber connects to the publisher and requests WAL. It then reads the WAL record by record, and decodes it to get the tuple data and size.
The changes are queued into the reorderbuffer queue. The reorderbuffer collects individual pieces of transactions in the order they are written to the WAL. When a transaction is completed, the reorderbuffer reassembles it and calls the output plugin with the changes. If the memory used by the reorderbuffer exceeds logical_decoding_work_mem, the largest transaction is found and evicted to disk.
If streaming is enabled, then this transaction data will be sent to the subscriber, but will be applied in the subscriber only after the transaction is committed in the publisher.
Once the transaction is committed, the walsender performs the following:
The walsender then updates statistics such as the transaction count and bytes, spill count, spill bytes, spill transactions, stream count, stream bytes, and stream transactions.
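These counters are exposed on the publisher through the pg_stat_replication_slots view:

```sql
SELECT slot_name, total_txns, total_bytes,
       spill_txns, spill_count, spill_bytes,
       stream_txns, stream_count, stream_bytes
FROM pg_stat_replication_slots;
```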
The above information illustrates steps 7 and 8 mentioned in the Architecture section above.
Incremental changes are handled by the walsender and the apply worker, as described below.
If the apply worker fails due to an error, the apply worker process will exit. During normal operation, the apply worker keeps track of the origin LSN of the last committed transaction.
The replication launcher will periodically check if the subscription worker is running. If the launcher identifies that it is not, then it will restart the worker for the subscription. The apply worker will request start_replication streaming from the last origin LSN that was committed. The walsender will start streaming transactions from the origin LSN (last committed transaction) requested by the apply worker.
Whenever the apply worker encounters a constraint error (such as a unique or check constraint violation), it will exit, and the steps mentioned above will be repeated.
There is an option to skip the LSN in case of errors - the user can set the skip LSN to that of the failing transaction. If a skip LSN is set, the apply worker will check if the transaction matches the specified LSN, skip that transaction, and proceed to the next one.
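For example, assuming the failing transaction's LSN was reported in the subscriber's server log, it can be skipped as follows (the LSN value and subscription name are placeholders):

```sql
-- Take the LSN from the error message in the subscriber's log
ALTER SUBSCRIPTION sub_alltables SKIP (lsn = '0/14C0378');
```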
The user can use disable_on_error instead of repeatedly retrying the steps. In this case, any error in the apply worker will be caught using PG_TRY()/PG_CATCH(), and the subscription will be disabled before the apply worker exits. As the subscription is disabled, the launcher will not restart the apply worker for that subscription.
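The option can be enabled when the subscription is created or later, for example (subscription name as in the earlier sketch):

```sql
ALTER SUBSCRIPTION sub_alltables SET (disable_on_error = true);
```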
The apply worker will periodically check the current subscription values against the new ones - if they have changed, the apply worker will exit and restart so that the new values take effect.
In the subscriber, create a subscription with the synchronous_commit option set to 'on'.
In the publisher, use ALTER SYSTEM SET to set the synchronous_standby_names option to the subscription name, and reload the configuration using pg_reload_conf(). Verify that sync_state is 'sync' for the subscription in pg_stat_replication.
Subscriber
Publisher
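Putting the two steps together, a sketch of the configuration might look like this (the subscription name and connection string are placeholders):

```sql
-- On the subscriber
CREATE SUBSCRIPTION sub_sync
    CONNECTION 'host=publisher-host port=5432 dbname=postgres'
    PUBLICATION pub_alltables
    WITH (synchronous_commit = 'on');

-- On the publisher
ALTER SYSTEM SET synchronous_standby_names TO 'sub_sync';
SELECT pg_reload_conf();

-- Verify on the publisher that the subscription is synchronous
SELECT application_name, sync_state FROM pg_stat_replication;
```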
A replication slot ensures that the publisher will retain the WAL records that are needed by the subscribers, even when a subscriber is disconnected.
As mentioned earlier, each (active) subscription receives changes from a replication slot on the remote (publishing) side.
Additional table synchronization slots are normally transient, created internally to perform initial table synchronization, and dropped automatically when they are no longer needed.
Normally, the remote replication slot is created automatically when the subscription is created during CREATE SUBSCRIPTION, and it is dropped automatically when the subscription is dropped during DROP SUBSCRIPTION.
Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys.
When creating a publication, a WHERE clause can be specified. This information is stored in the pg_publication_rel catalog table:
Rows that don't satisfy this WHERE clause will be filtered by the publisher. This allows a set of tables to be partially replicated.
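A hypothetical example, publishing only the active rows of a users table:

```sql
-- Only rows satisfying the WHERE clause are replicated
CREATE PUBLICATION pub_active_users
    FOR TABLE users WHERE (active = true);
```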
During table synchronization, only the table data that satisfies the row filter will be copied to the subscriber. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), the expressions get OR'ed, and rows satisfying any of the expressions are replicated.
If the subscription has several publications, and at least one of them was specified using ALL TABLES or TABLES IN SCHEMA and the table belongs to the referred schema, then those publications take precedence, and the publication behaves as if there are no row filters.
For insert operations, the publisher checks if the new row satisfies the row filter condition to determine if the new record should be sent to the subscriber or skipped.
For delete operations, the publisher checks if the row satisfies the row filter condition to determine if the operation should be sent to the subscriber or skipped.
The update operation is handled in a slightly different manner: the publisher evaluates the row filter against both the old and the new row. If both satisfy the filter, the change is replicated as an UPDATE; if only the old row satisfies it, the change is transformed into a DELETE; if only the new row satisfies it, the change is transformed into an INSERT; if neither does, the change is skipped.
When creating a publication, a column list can be specified. This information is stored in the pg_publication_rel catalog table:
postgres=# CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname);
CREATE PUBLICATION
postgres=# SELECT * FROM pg_publication_rel;
  oid  | prpubid | prrelid | prqual | prattrs
-------+---------+---------+--------+---------
 16453 |   16452 |   16436 |        | 1 2
(1 row)
postgres=# SELECT * FROM pg_publication_tables;
    pubname     | schemaname | tablename |      attnames       | rowfilter
----------------+------------+-----------+---------------------+-----------
 users_filtered | public     | users     | {user_id,firstname} |
(1 row)
Columns not included in this list are not sent to the subscriber. This allows the schema on the subscriber to be a subset of the publisher schema, as illustrated below.
During the initial table synchronization, only columns included in the column list are copied to the subscriber.
When sending incremental transaction changes, the publisher will check for the relation information and send to the subscriber the values only for the columns that match the specified column list.
For partitioned tables, publish_via_partition_root determines whether the column list for the root or leaf relation will be used. If the parameter is 'false' (the default), the list defined for the leaf relation is used. Otherwise, the column list for the root partition will be used.
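As a sketch, a column list defined on a (hypothetical) partitioned root table would be used for all leaf partitions when the option is enabled:

```sql
-- "measurement" is a hypothetical partitioned table; the column list
-- defined here on the root is used because of publish_via_partition_root
CREATE PUBLICATION pub_measurement
    FOR TABLE measurement (city_id, logdate)
    WITH (publish_via_partition_root = true);
```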
Specifying a column list when the publication also publishes FOR TABLES IN SCHEMA is not supported.
There's currently no support for subscriptions comprising several publications where the same table has been published with different column lists.
The row filter and column list features provide the following advantages: they reduce the amount of data sent over the network and stored on the subscriber, and they allow sensitive rows or columns to be withheld from replication, improving security.
One or more schemas can be specified in FOR TABLES IN SCHEMA. This information is maintained in the pg_publication_namespace catalog table:
During the initial table synchronization, only tables that belong to the specified schema are copied to the subscriber. When sending the incremental transaction changes, the publisher will check if this transaction’s relation belongs to one of the schemas and publish only those changes.
If the subscription has several publications, and at least one was specified using ALL TABLES, then those publications will be given higher precedence and all the table data will be sent to the subscriber.
Any new table created in the schema after creation of the publication will be automatically added to it. Similarly, tables removed from the schema will be automatically removed from the publication. But data of newly created tables (after creation of subscription) will not be replicated automatically - the user will have to run ALTER SUBSCRIPTION … REFRESH PUBLICATION, which will fetch the missing tables and take care of synchronizing the data from the publisher.
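A sketch of this workflow, with hypothetical schema, publication, and subscription names:

```sql
-- On the publisher
CREATE PUBLICATION pub_sales FOR TABLES IN SCHEMA sales;

-- On the subscriber, after new tables are created in the "sales" schema,
-- fetch the missing tables and synchronize their data
ALTER SUBSCRIPTION sub_sales REFRESH PUBLICATION;
```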
ALL TABLES replication is similar to TABLES IN SCHEMA publication, except that it will replicate data from all tables, instead of just the tables present in the schema.
If you have more questions or would like to extend your knowledge, here is some recommended reading: