Synapse database schema files

Synapse's database schema is stored in thesynapse.storage.schema module.

Logical databases

Synapse supports splitting its datastore across multiple physical databases (which canbe useful for large installations), and the schema files are therefore split accordingto the logical database they apply to.

At the time of writing, the following "logical" databases are supported:

state - used to store Matrix room state (more specifically,state_groups,their relationships and contents).
main - stores everything else.

Additionally, thecommon directory contains schema files for tables which must bepresent onall physical databases.

Synapse schema versions

Synapse manages its database schema via "schema versions". These are mainly used tohelp avoid confusion if the Synapse codebase is rolled back after the database isupdated. They work as follows:

The Synapse codebase defines a constantsynapse.storage.schema.SCHEMA_VERSIONwhich represents the expectations made about the database by that version. Forexample, as of Synapse v1.36, this is59.
The database stores a "compatibility version" inschema_compat_version.compat_version which defines theSCHEMA_VERSION of theoldest version of Synapse which will work with the database. On startup, ifcompat_version is found to be newer thanSCHEMA_VERSION, Synapse will refuse tostart.
Synapse automatically updates this field fromsynapse.storage.schema.SCHEMA_COMPAT_VERSION.
Whenever a backwards-incompatible change is made to the database format (normallyvia adelta file),synapse.storage.schema.SCHEMA_COMPAT_VERSION is also updatedso that administrators can not accidentally roll back to a too-old version of Synapse.

Generally, the goal is to maintain compatibility with at least one or two previousreleases of Synapse, so any substantial change tends to require multiple releases and abit of forward-planning to get right.

As a worked example: we want to remove theroom_stats_historical table. Here is how itmight pan out.

Replace any code thatreads fromroom_stats_historical with alternativeimplementations, but keep writing to it in case of rollback to an earlier version.Also, increasesynapse.storage.schema.SCHEMA_VERSION. In thisinstance, there is no existing code which reads fromroom_stats_historical, soour starting point is:
v1.36.0:SCHEMA_VERSION=59,SCHEMA_COMPAT_VERSION=59
Next (say in Synapse v1.37.0): remove the code thatwrites toroom_stats_historical, but don’t yet remove the table in case of rollback tov1.36.0. Again, we increasesynapse.storage.schema.SCHEMA_VERSION, butbecause we have not broken compatibility with v1.36, we do not yet updateSCHEMA_COMPAT_VERSION. We now have:
v1.37.0:SCHEMA_VERSION=60,SCHEMA_COMPAT_VERSION=59.
Later (say in Synapse v1.38.0): we can remove the table altogether. This willbreak compatibility with v1.36.0, so we must updateSCHEMA_COMPAT_VERSION accordingly.There is no need to updatesynapse.storage.schema.SCHEMA_VERSION, since there is nochange to the Synapse codebase here. So we end up with:
v1.38.0:SCHEMA_VERSION=60,SCHEMA_COMPAT_VERSION=60.

If in doubt about whether to updateSCHEMA_VERSION or not, it is generally best tolean towards doing so.

Full schema dumps

In thefull_schemas directories, only the most recently-numbered snapshot is used(54 at the time of writing). Older snapshots (eg,16) are present for historicalreference only.

Building full schema dumps

If you want to recreate these schemas, they need to be made from a database thathas had all background updates run.

To do so, usescripts-dev/make_full_schema.sh. This will produce newfull.sql.postgres andfull.sql.sqlite files.

Ensure postgres is installed, then run:

./scripts-dev/make_full_schema.sh -p postgres_username -o output_dir/

NB at the time of writing, this script predates the split into separatestate/maindatabases so will require updates to handle that correctly.

Delta files

Delta files define the steps required to upgrade the database from an earlier version.They can be written as either a file containing a series of SQL statements, or a Pythonmodule.

Synapse remembers which delta files it has applied to a database (they are stored in theapplied_schema_deltas table) and will not re-apply them (even if a given file issubsequently updated).

Delta files should be placed in a directory namedsynapse/storage/schema/<database>/delta/<version>/.They are applied in alphanumeric order, so by convention the first two charactersof the filename should be an integer such as01, to put the file in the right order.

SQL delta files

These should be named*.sql, or — for changes which should only be applied for agiven database engine —*.sql.posgres or*.sql.sqlite. For example, a delta whichadds a new column to thefoo table might be called01add_bar_to_foo.sql.

Note that our SQL parser is a bit simple - it understands comments (-- and/*...*/),but complex statements which require a; in the middle of them (such asCREATE TRIGGER) are beyond it and you'll have to use a Python delta file.

Python delta files

For more flexibility, a delta file can take the form of a python module. These shouldbe named*.py. Note that database-engine-specific modules are not supported here –instead you can writeif isinstance(database_engine, PostgresEngine) or similar.

A Python delta module should define either or both of the following functions:

import synapse.config.homeserverimport synapse.storage.enginesimport synapse.storage.typesdef run_create(    cur: synapse.storage.types.Cursor,    database_engine: synapse.storage.engines.BaseDatabaseEngine,) -> None:    """Called whenever an existing or new database is to be upgraded"""    ...def run_upgrade(    cur: synapse.storage.types.Cursor,    database_engine: synapse.storage.engines.BaseDatabaseEngine,    config: synapse.config.homeserver.HomeServerConfig,) -> None:    """Called whenever an existing database is to be upgraded."""    ...

Background updates

It is sometimes appropriate to perform database migrations as part of a backgroundprocess (instead of blocking Synapse until the migration is done). In particular,this is useful for migrating data when adding new columns or tables.

Pending background updates stored in thebackground_updates table and are denotedby a unique name, the current status (stored in JSON), and some dependency information:

Whether the update requires a previous update to be complete.
A rough ordering for which to complete updates.

A new background update needs to be added to thebackground_updates table:

INSERT INTO background_updates (ordering, update_name, depends_on, progress_json) VALUES  (7706, 'my_background_update', 'a_previous_background_update' '{}');

And then needs an associated handler in the appropriate datastore:

self.db_pool.updates.register_background_update_handler(    "my_background_update",    update_handler=self._my_background_update,)

There are a few types of updates that can be performed, see theBackgroundUpdater:

register_background_update_handler: A generic handler for custom SQL
register_background_index_update: Create an index in the background
register_background_validate_constraint: Validate a constraint in the background(PostgreSQL-only)
register_background_validate_constraint_and_delete_rows: Similar toregister_background_validate_constraint, but deletes rows which don't fitthe constraint.

Forregister_background_update_handler, the generic handler must track progressand then finalize the background update:

async def _my_background_update(self, progress: JsonDict, batch_size: int) -> int:    def _do_something(txn: LoggingTransaction) -> int:        ...        self.db_pool.updates._background_update_progress_txn(            txn, "my_background_update", {"last_processed": last_processed}        )        return last_processed - prev_last_processed    num_processed = await self.db_pool.runInteraction("_do_something", _do_something)    await self.db_pool.updates._end_background_update("my_background_update")    return num_processed

Synapse will attempt to rate-limit how often background updates are run via thegiven batch-size and the returned number of processed entries (and how long thefunction took to run). Seebackground update controller callbacks.

Boolean columns

Boolean columns require special treatment, since SQLite treats booleans thesame as integers.

Any new boolean column must be added to theBOOLEAN_COLUMNS list insynapse/_scripts/synapse_port_db.py. This tells the port script to castthe integer value from SQLite to a boolean before writing the value to thepostgres database.

`event_id` global uniqueness

event_id's can be considered globally unique although there has been a lot ofdebate on this topic in places likeMSC2779 andMSC2848 whichhas no resolution yet (as of 2022-09-01). There are several places in Synapseand even in the Matrix APIs likeGET /_matrix/federation/v1/event/{eventId}where we assume that event IDs are globally unique.

When scopingevent_id in a database schema, it is often nice to accompany itwithroom_id (PRIMARY KEY (room_id, event_id) and aFOREIGN KEY(room_id) REFERENCES rooms(room_id)) which makes flexible lookups easy. For example itmakes it very easy to find and clean up everything in a room when it needs to bepurged (no need to use sub-select query or join from theevents table).

A note on collisions: In room versions1 and2 it's possible to end up withtwo events with the sameevent_id (in the same or different rooms). After roomversion3, that can only happen with a hash collision, which we basically hopewill never happen (SHA256 has a massive big key space).

Worked examples of gradual migrations

Some migrations need to be performed gradually. A prime example of this is anythingwhich would need to do a large table scan — including adding columns, indices orNOT NULL constraints to non-empty tables — such a migration should be done as abackground update where possible, at least on Postgres.We can afford to be more relaxed about SQLite databases since they are usuallyused on smaller deployments and SQLite does not support the same concurrentDDL operations as Postgres.

We also typically insist on having at least one Synapse version's worth ofbackwards compatibility, so that administrators can roll back Synapse if an upgradedid not go smoothly.

This sometimes results in having to plan a migration across multiple versionsof Synapse.

This section includes an example and may include more in the future.

Transforming a column into another one, with`NOT NULL` constraints

This example illustrates how you would introduce a new column, write data into itbased on data from an old column and then drop the old column.

We are aiming for semantic equivalence to:

ALTER TABLE mytable ADD COLUMN new_column INTEGER;UPDATE mytable SET new_column = old_column * 100;ALTER TABLE mytable ALTER COLUMN new_column ADD CONSTRAINT NOT NULL;ALTER TABLE mytable DROP COLUMN old_column;

Synapse version`N`

SCHEMA_VERSION = SSCHEMA_COMPAT_VERSION = ... # unimportant at this stage

Invariants:

old_column is read by Synapse and written to by Synapse.

Synapse version`N + 1`

SCHEMA_VERSION = S + 1SCHEMA_COMPAT_VERSION = ... # unimportant at this stage

Changes:

ALTER TABLE mytable ADD COLUMN new_column INTEGER;

Invariants:

old_column is read by Synapse and written to by Synapse.
new_column is written to by Synapse.

Notes:

new_column can't have aNOT NULL NOT VALID constraint yet, because the previous Synapse version did not write to the new column (since we haven't bumped theSCHEMA_COMPAT_VERSION yet, we still need to be compatible with the previous version).

Synapse version`N + 2`

SCHEMA_VERSION = S + 2SCHEMA_COMPAT_VERSION = S + 1 # this signals that we can't roll back to a time before new_column existed

Changes:

On Postgres, add aNOT VALID constraint to ensure new rows are compliant.SQLite does not have such a construct, but it would be unnecessary anyway since there is no way to concurrently perform this migration on SQLite.
```
ALTER TABLE mytable ADD CONSTRAINT CHECK new_column_not_null (new_column IS NOT NULL) NOT VALID;
```
Start a background update to perform migration: it should gradually run e.g.
```
UPDATE mytable SET new_column = old_column * 100 WHERE 0 < mytable_id AND mytable_id <= 5;
```
This background update is technically pointless on SQLite, but you must schedule it anyway so that theportdb script to migrate to Postgres still works.
Upon completion of the background update, you should runVALIDATE CONSTRAINT on Postgres to turn theNOT VALID constraint into a valid one.
```
ALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;
```
This will take some time but doesNOT hold an exclusive lock over the table.

Invariants:

old_column is read by Synapse and written to by Synapse.
new_column is written to by Synapse and new rows always have a non-NULL value in this field.

Notes:

If you wish, you can convert theCHECK (new_column IS NOT NULL) to aNOT NULL constraint free of charge in Postgres by adding theNOT NULL constraint and then dropping theCHECK constraint, because Postgres can statically verify that theNOT NULL constraint is implied by theCHECK constraint without performing a table scan.
It might be tempting to make versionN + 2 redundant by moving the background update toN + 1 and delaying adding theNOT NULL constraint toN + 3, but that would mean the constraint would always be validated in the foreground inN + 3. Whereas if theN + 2 step is kept, the migration inN + 3 would be fast in the happy case.

Synapse version`N + 3`

SCHEMA_VERSION = S + 3SCHEMA_COMPAT_VERSION = S + 1 # we can't roll back to a time before new_column existed

Changes:

(Postgres) Update the table to populate values ofnew_column in case the background update had not completed. Additionally,VALIDATE CONSTRAINT to make the check fully valid.

-- you ideally want an index on `new_column` or e.g. `(new_column) WHERE new_column IS NULL` first, or perhaps you can find a way to skip this if the `NOT NULL` constraint has already been validated.UPDATE mytable SET new_column = old_column * 100 WHERE new_column IS NULL;-- this is a no-op if it already ran as part of the background updateALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;

(SQLite) Recreate the table by precisely followingthe 12-step procedure for SQLite table schema changes.During this table rewrite, you should recreatenew_column asNOT NULL and populate any outstandingNULL values at the same time.Unfortunately, you can't dropold_column yet because it must be present for compatibility with the Postgres schema, as needed byportdb.(Otherwise you could do this all in one go with SQLite!)

Invariants:

old_column is written to by Synapse (but no longer read by Synapse!).
new_column is read by Synapse and written to by Synapse. Moreover, all rows have a non-NULL value in this field, as guaranteed by a schema constraint.

Notes:

We can't dropold_column yet, or even stop writing to it, because that would break a rollback to the previous version of Synapse.
Application code can now rely onnew_column being populated. The remaining steps are only motivated by the wish to clean-up old columns.

Synapse version`N + 4`

SCHEMA_VERSION = S + 4SCHEMA_COMPAT_VERSION = S + 3 # we can't roll back to a time before new_column was entirely non-NULL

Invariants:

old_column exists but is not written to or read from by Synapse.
new_column is read by Synapse and written to by Synapse. Moreover, all rows have a non-NULL value in this field, as guaranteed by a schema constraint.

Notes:

We can't dropold_column yet because that would break a rollback to the previous version of Synapse.
TODO: It may be possible to relax this and drop the column straight away as long as the previous version of Synapse detected a rollback occurred and stopped attempting to write to the column. This could possibly be done by checking whether the database's schema compatibility version wasS + 3.

Synapse version`N + 5`

SCHEMA_VERSION = S + 5SCHEMA_COMPAT_VERSION = S + 4 # we can't roll back to a time before old_column was no longer being touched

Changes:

ALTER TABLE mytable DROP COLUMN old_column;

Movatterモバイル変換

Synapse