tsg/pgstream (public)

forked from xataio/pgstream


PostgreSQL replication with DDL changes

www.xata.io

License

Apache-2.0 license


pgstream - Postgres replication with DDL changes

pgstream is an open source change data capture (CDC) command-line tool and library that provides Postgres replication, including DDL changes, to any supported target.

Features

- Schema change tracking and replication of DDL changes
- Support for multiple out-of-the-box targets
  - Elasticsearch/OpenSearch
  - Webhooks
  - PostgreSQL
- Initial and on-demand PostgreSQL snapshots (for when you don't need continuous replication)
- Column value transformations (anonymise your data on the go!)
- Modular deployment configuration; only requires Postgres
- Kafka support with schema-based partitioning
- Extendable support for custom targets

Usage

pgstream can be used via the readily available CLI or as a library. For detailed information about CLI usage, check out the dedicated CLI documentation section.

CLI Installation

```sh
# Download the latest release
curl -L https://github.com/xataio/pgstream/releases/latest/download/pgstream-linux-amd64 -o pgstream
chmod +x pgstream
sudo mv pgstream /usr/local/bin/

# Or use go install
go install github.com/xataio/pgstream@latest

# Or build from source (main.go lives at the repository root)
git clone https://github.com/xataio/pgstream.git
cd pgstream
go build -o pgstream .

# Or install via homebrew on macOS or Linux
brew tap xataio/pgstream
brew install pgstream
```

Environment setup

If you already have an environment with at least Postgres and whichever other resources you plan to use, you can skip this step. Otherwise, a Docker Compose setup is available in this repository, with profiles that selectively start Postgres, Kafka and OpenSearch.

To run all profiles:

```sh
docker-compose -f build/docker/docker-compose.yml up
```

If you only want to run PostgreSQL-to-PostgreSQL replication, you can use the `pg2pg` profile:

```sh
docker-compose -f build/docker/docker-compose.yml --profile pg2pg up
```

You can also run multiple profiles. For example, to start two PostgreSQL instances and Kafka:

```sh
docker-compose -f build/docker/docker-compose.yml --profile pg2pg --profile kafka up
```

List of supported Docker profiles:

- `pg2pg`
- `pg2os`
- `pg2webhook`
- `kafka`
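To start an arbitrary combination of the profiles listed above, a small shell helper can build the `--profile` flags for you. This is a hypothetical convenience function, not something shipped in the repository; the name `compose_up` and the `COMPOSE` override (handy for a dry run) are assumptions.

```sh
#!/bin/sh
# compose_up: hypothetical helper (not part of pgstream) that starts the
# docker-compose setup with one --profile flag per argument.
# COMPOSE is an assumed override; it defaults to docker-compose.
compose_up() {
  compose="${COMPOSE:-docker-compose}"
  # Rebuild the positional parameters as "--profile <name>" pairs so each
  # profile name stays a separate argument (no word-splitting surprises).
  for profile in "$@"; do
    set -- "$@" --profile "$profile"
    shift
  done
  "$compose" -f build/docker/docker-compose.yml "$@" up
}
```

With this helper, `compose_up pg2pg kafka` runs the same command as the multi-profile example above.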

Configuration

The pgstream source and target must be configured before any command can be run. This can be done:

- Using the relevant CLI flags for each command
- Using a YAML configuration file
- Using environment variables (a .env file is supported)

Check the configuration documentation for more information about the configuration options, or the CLI documentation for details on the available flags. Additionally, you can find sample files for both .env and .yaml here.

If you want to configure column transformations, leveraging the greenmask, neosync and go-masker open source integrations as well as custom transformers, check the transformation rules configuration for more details, along with the list of available transformers.

Run `pgstream`

Replication mode

The `run` command starts streaming data from the configured source into the configured target. When the `--init` flag is passed to `run`, pgstream initialises its state in the source Postgres database before starting replication. It will:

- Create a `pgstream` schema
- Create tables/functions/triggers to keep track of schema changes for DDL replication (see Tracking schema changes for more details)
- Create a replication slot
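After initialisation, the objects listed above can be checked from the source database with standard PostgreSQL catalog queries. The sketch below assumes `psql` is installed and defaults to the source URL used in this README's examples; `check_pgstream_init` and the `PSQL` override are hypothetical, and listing event triggers assumes that is how the DDL tracking triggers are implemented. The exact slot and table names pgstream creates are not stated here, so the queries do not filter on them.

```sh
#!/bin/sh
# check_pgstream_init: hypothetical helper (not shipped with pgstream) that
# inspects the source database after `pgstream init` or `pgstream run --init`.
# Assumes psql is on PATH; PSQL may point at another command for a dry run.
check_pgstream_init() {
  url="${1:-postgres://postgres:postgres@localhost:5432?sslmode=disable}"
  psql_bin="${PSQL:-psql}"

  # Tables in the pgstream schema (an empty result suggests init did not run here).
  "$psql_bin" "$url" -c "SELECT table_name FROM information_schema.tables WHERE table_schema = 'pgstream';"

  # Event triggers in the database (assumed mechanism for capturing DDL changes).
  "$psql_bin" "$url" -c "SELECT evtname, evtevent FROM pg_event_trigger;"

  # Replication slots; the slot created during initialisation should be listed.
  "$psql_bin" "$url" -c "SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;"
}
```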

Initialisation is required for pgstream replication. It can alternatively be performed by running the `pgstream init` command separately before `pgstream run`. Check out the CLI documentation for more details.
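The two-step flow described above, initialising once and then running without `--init`, could be scripted as follows. This is a sketch: `start_replication` and the `PGSTREAM_BIN`/`CONFIG` overrides are hypothetical, the default path reuses the OpenSearch example configuration, and it assumes `pgstream init` accepts the same `-c` configuration flag as `pgstream run` (check the CLI documentation to confirm).

```sh
#!/bin/sh
# start_replication: hypothetical wrapper that initialises pgstream state and
# then starts replication. PGSTREAM_BIN and CONFIG are assumed overrides.
start_replication() {
  bin="${PGSTREAM_BIN:-pgstream}"
  config="${CONFIG:-docs/examples/pg2os.yaml}"

  # init creates the pgstream schema, DDL tracking objects and replication slot
  # (assumption: init reads the same -c config file as run).
  # run only starts if init succeeded, and no longer needs --init.
  "$bin" init -c "$config" && "$bin" run -c "$config" --log-level info
}
```

Since initialisation only needs to happen once, later restarts can call `pgstream run` directly.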

Example running pgstream replication from Postgres -> OpenSearch:

```sh
# using the environment configuration file
pgstream run -c docs/examples/pg2os.env --init --log-level trace

# using the yaml configuration file
pgstream run -c docs/examples/pg2os.yaml --init --log-level info

# using the CLI flags
pgstream run --source postgres --source-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --target opensearch --target-url "http://admin:admin@localhost:9200" --init
```

Example running pgstream from Postgres -> Kafka and, in a separate terminal, from Kafka -> OpenSearch:

```sh
# using the environment configuration file
pgstream run -c docs/examples/pg2kafka.env --init --log-level trace

# using the yaml configuration file
pgstream run -c docs/examples/pg2kafka.yaml --init --log-level info

# using the CLI flags
pgstream run --source postgres --source-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --target kafka --target-url "localhost:9092" --init
```

```sh
# using the environment configuration file
pgstream run -c docs/examples/kafka2os.env --init --log-level trace

# using the yaml configuration file
pgstream run -c docs/examples/kafka2os.yaml --init --log-level info

# using the CLI flags
pgstream run --source kafka --source-url "localhost:9092" --target opensearch --target-url "http://admin:admin@localhost:9200" --init
```

An initial snapshot can be performed before starting replication by providing the `--snapshot-tables` flag or by setting the relevant configuration fields (check the configuration documentation for more details on advanced configuration options).

Example running pgstream with PostgreSQL -> PostgreSQL with initial snapshot enabled:

```sh
# using the CLI flags
pgstream run --source postgres --source-url "postgres://postgres:postgres@localhost:5432?sslmode=disable" --target postgres --target-url "postgres://postgres:postgres@localhost:7654?sslmode=disable" --snapshot-tables test --init
```

Snapshot mode

pgstream can also be used to perform a point-in-time snapshot of the source database. This is useful when you don't need continuous replication but still want to keep the source and target in sync, for example by running nightly snapshots.

The `snapshot` command doesn't require any initialisation or pgstream-specific state, since it only performs read operations on the source Postgres database.

Example running pgstream to perform a snapshot from PostgreSQL -> PostgreSQL:

```sh
# using the environment configuration file
pgstream snapshot -c docs/examples/snapshot2pg.env --log-level trace

# using the yaml configuration file
pgstream snapshot -c docs/examples/snapshot2pg.yaml --log-level info

# using the CLI flags
pgstream snapshot --postgres-url="postgres://postgres:postgres@localhost:5432?sslmode=disable" --target=postgres --target-url="postgres://postgres:postgres@localhost:7654?sslmode=disable" --tables="test" --reset
```
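For the nightly snapshots mentioned above, the yaml-based snapshot command can be scheduled with cron. The entry below is an illustrative crontab line, not something provided by pgstream: the 02:00 schedule, the `/etc/pgstream/snapshot2pg.yaml` config path and the log file path are assumptions. The binary path matches the `/usr/local/bin` install location from the CLI installation instructions; cron usually runs with a minimal `PATH`, so an absolute path is the safer choice.

```
# minute hour day-of-month month day-of-week command
0 2 * * * /usr/local/bin/pgstream snapshot -c /etc/pgstream/snapshot2pg.yaml --log-level info >> /var/log/pgstream-snapshot.log 2>&1
```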

Tutorials

Documentation

For more advanced usage, implementation details, and detailed configuration settings, please refer to the full documentation below.

Benchmarks

Snapshots

Datasets used: IMDB database, MusicBrainz database, Firenibble database.

All benchmarks were run using the same setup, with pgstream v0.7.2, pg_dump/pg_restore (PostgreSQL) 17.4 and PostgreSQL 17.4, using identical resources to ensure a fair comparison.

For more details on performance benchmarking for snapshots to PostgreSQL with pgstream, check out this blog post.

Limitations

Some of the limitations of the initial release include:

- Single Kafka topic support
- Postgres plugin support limited to wal2json
- No row-level filtering support
- Primary key/unique not null column required for replication
- Kafka serialisation support limited to JSON
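Because replication requires a primary key or a unique not null column, it can help to find tables that have neither before enabling replication. The query below is a sketch using standard PostgreSQL catalogs (`pg_class`, `pg_namespace`, `pg_index`), not part of pgstream; the helper name and the `PSQL` override are hypothetical. It only checks for a primary key or any unique index, so a table whose only unique index covers nullable columns will not be flagged.

```sh
#!/bin/sh
# find_tables_without_key: hypothetical pre-flight check (not part of pgstream)
# that lists ordinary tables with neither a primary key nor a unique index.
# Assumes psql is on PATH; PSQL may point at another command for a dry run.
find_tables_without_key() {
  url="${1:-postgres://postgres:postgres@localhost:5432?sslmode=disable}"
  # -A: unaligned output, -t: rows only, so each line is one schema.table name.
  "${PSQL:-psql}" "$url" -A -t -c "
    SELECT n.nspname || '.' || c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
      AND NOT EXISTS (
        SELECT 1 FROM pg_index i
        WHERE i.indrelid = c.oid
          AND (i.indisprimary OR i.indisunique)
      )
    ORDER BY 1;"
}
```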

Contributing

We welcome contributions from the community! If you'd like to contribute to pgstream, please follow these guidelines and adhere to our code of conduct.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Support

If you have any questions, encounter issues, or need assistance, open an issue in this repository or join our Discord, and our community will be happy to help.

Made with 💜 by Xata 🦋

