Commit 6b10c9a

Merge branch 'PGPROEE9_6_MULTIMASTER' of https://gitlab.postgrespro.ru/pgpro-dev/postgrespro into PGPROEE9_6_MULTIMASTER

2 parents f492dce + 5d632f9; commit 6b10c9a

3 files changed: 191 additions, 123 deletions

contrib/mmts/README.md

Lines changed: 23 additions & 118 deletions

````diff
@@ -1,8 +1,7 @@
 # `Postgresql multi-master`
 
-Multi-master is an extension and set of patches to a Postegres database, that turns Postgres into a
-synchronous shared-nothing cluster to provide OLTP scalability and high availability with automatic
-disaster recovery.
+Multi-master is an extension and a set of patches to a Postgres database that turns Postgres into a synchronous shared-nothing cluster, providing OLTP scalability and high availability with automatic disaster recovery.
+
 
 ## Features
 
@@ -12,148 +11,54 @@ disaster recovery.
 * Fault tolerance
 * Automatic node recovery
 
-## Overview
-
-Multi-master replicates same database to all nodes in cluster and allows writes to each node. Transaction
-isolation is enforced cluster-wide, so in case of concurrent updates on different nodes database will use the
-same conflict resolution rules (mvcc with repeatable read isolation level) as single node uses for concurrent
-backends and always stays in consistent state. Any writing transaction will write to all nodes, hence increasing
-commit latency for amount of time proportional to roundtrip between nodes nedded for synchronization. Read only
-transactions and queries executed locally without measurable overhead. Replication mechanism itself based on
-logical decoding and earlier version of pglogical extension provided for community by 2ndQuadrant team.
-
-Cluster consisting of N nodes can continue to work while majority of initial nodes are alive and reachable by
-other nodes. This is done by using 3 phase commit protocol and heartbeats for failure discovery. Node that is
-brought back to cluster can be fast-forwaded to actual state automatically in case when transactions log still
-exists since the time when node was excluded from cluster (this depends on checkpoint configuration in postgres).
-
-Read more about internals on [architecture](/contrib/mmts/doc/architecture.md) page.
-
-
-
-## Installation
-
-Multi-master consist of patched version of postgres and extension mmts, that provides most of functionality, but
-doesn't requiere changes to postgres core. To run multimaster one need to install postgres and several extensions
-to all nodes in cluster.
-
-### Sources
-
-Ensure that following prerequisites are installed:
-
-for Debian based linux:
-
-```sh
-apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
-```
-
-for RedHat based linux:
-
-```sh
-yum groupinstall 'Development Tools'
-yum install git, automake, libtool, bison, flex readline-devel
-```
-
-After that everything is ready to install postgres along with extensions
-
-```sh
-git clone https://github.com/postgrespro/postgres_cluster.git
-cd postgres_cluster
-./configure && make && make -j 4 install
-cd ../../contrib/mmts && make install
-```
-
-### Docker
-
-Directory contrib/mmts also includes docker-compose.yml that is capable of building multi-master and starting
-3 node cluster.
-
-```sh
-cd contrib/mmts
-docker-compose up
-```
-
-### PgPro packages
-
-After things go more stable we will release prebuilt packages for major platforms.
-
-## Configuration
-
-1. Add these required options to the `postgresql.conf` of each instance in the cluster.
-```sh
-wal_level = logical # multimaster is build on top of
-# logical replication and will not work otherwise
-max_connections = 100
-max_prepared_transactions = 300 # all transactions are implicitly two-phase, so that's
-# a good idea to set this equal to max_connections*N_nodes.
-max_wal_senders = 10 # at least the number of nodes
-max_replication_slots = 10 # at least the number of nodes
-max_worker_processes = 250 # Each node has:
-# N_nodes-1 receiver
-# N_nodes-1 sender
-# 1 mtm-sender
-# 1 mtm-receiver
-# Also transactions executed at neighbour nodes can cause spawn of
-# background pool worker at our node. At max this will be equal to
-# sum of max_connections on neighbour nodes.
-
 
+## Overview
 
-shared_preload_libraries = 'multimaster'
-multimaster.max_nodes = 3 # cluster size
-multimaster.node_id = 1 # the 1-based index of the node in the cluster
-multimaster.conn_strings = 'dbname=mydb host=node1.mycluster, ...'
-# comma-separated list of connection strings to neighbour nodes.
-```
-2. Allow replication in `pg_hba.conf`.
+Multi-master replicates the same database to all nodes of the cluster and allows writes to each node. Transaction isolation is enforced cluster-wide, so in case of concurrent updates on different nodes the database uses the same conflict resolution rules (MVCC with the repeatable read isolation level) as a single node uses for concurrent backends, and always stays in a consistent state. Any writing transaction writes to all nodes, hence increasing commit latency by an amount of time proportional to the roundtrip between nodes needed for synchronization. Read-only transactions and queries are executed locally without measurable overhead. The replication mechanism itself is based on logical decoding and an earlier version of the pglogical extension provided to the community by the 2ndQuadrant team.
 
-Read description of all configuration params at [configuration](/contrib/mmts/doc/configuration.md)
+A cluster consisting of N nodes can continue to work while a majority of the initial nodes are alive and reachable by the other nodes. This is done by using a 3-phase commit protocol and heartbeats for failure discovery. A node that is brought back to the cluster can be fast-forwarded to the actual state automatically when the transaction log still exists for the period since the node was excluded from the cluster (this depends on the checkpoint configuration in postgres).
 
-## Management
 
-`create extension mmts;` to gain access to these functions:
+## Documentation
 
-* `mtm.get_nodes_state()` -- show status of nodes on cluster
-* `mtm.get_cluster_state()` -- show whole cluster status
-* `mtm.get_cluster_info()` -- print some debug info
-* `mtm.make_table_local(relation regclass)` -- stop replication for a given table
+1. [Administration](doc/administration.md)
+    1. [Installation](doc/administration.md)
+    1. [Setting up empty cluster](doc/administration.md)
+    1. [Setting up cluster from pre-existing database](doc/administration.md)
+    1. [Tuning configuration params](doc/administration.md)
+    1. [Monitoring](doc/administration.md)
+    1. [Adding nodes to cluster](doc/administration.md)
+    1. [Excluding nodes from cluster](doc/administration.md)
+1. [Architecture and internals](doc/architecture.md)
+1. [List of configuration variables](doc/configuration.md)
+1. [Built-in functions and views](doc/configuration.md)
 
-Read description of all management functions at [functions](/contrib/mmts/doc/functions.md)
 
+## Tests
 
+### Fault tolerance
 
-## Tests
+(Link to test/failure matrix)
 
 ### Performance
 
 (Show TPC-C here on 3 nodes)
 
-### Fault tolerance
-
-(Link to test/failure matrix)
-
 
 ## Limitations
 
 * Commit latency.
-Current implementation of logical replication sends data to subscriber nodes only after local commit, so in case of
-heavy-write transaction user will wait for transaction processing two times: on local node and on all other nodes
-(simultaneosly). We have plans to address this issue in future.
+The current implementation of logical replication sends data to subscriber nodes only after local commit, so for a heavy-write transaction the user waits for transaction processing twice: on the local node and on all other nodes (simultaneously). We plan to address this issue in the future.
 
 * DDL replication.
-While data is replicated on logical level, DDL replicated by statements performing distributed commit with the same
-statement. Some complex DDL scenarious including stored procedures and temp temp tables aren't working properly. We
-are working right now on proving full compatibility with ordinary postgres. Currently we are passing 141 of 164
-postgres regression tests.
+While data is replicated at the logical level, DDL is replicated by statements, performing a distributed commit with the same statement. Some complex DDL scenarios, including stored procedures and temp tables, aren't working properly yet. We are currently working on proving full compatibility with ordinary postgres; at the moment we pass 141 of 164 postgres regression tests.
 
 * Isolation level.
-Multimaster currently support only _repeatable read_ isolation level. This is stricter than default _read commited_,
-but also increases probability of serialization failure during commit. _Serializable_ level isn't supported yet.
+Multimaster currently supports only the _repeatable read_ isolation level. This is stricter than the default _read committed_, but it also increases the probability of serialization failure during commit. The _serializable_ level isn't supported yet.
 
 * One database per cluster.
 
 
-
 ## Credits and Licence
 
 Multi-master developed by the PostgresPro team.
````
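
The overview paragraph added above is easy to verify end-to-end once a cluster is up. Below is a minimal sketch (an illustration, not part of the commit), assuming the 3-node setup from the administration guide that follows, with database mydb, user myuser and hosts node1 and node2:

```sh
# Write through one node; per the overview, the writing transaction
# commits on all nodes of the cluster.
psql "host=node1 dbname=mydb user=myuser" -c "CREATE TABLE t (id int PRIMARY KEY, note text);"
psql "host=node1 dbname=mydb user=myuser" -c "INSERT INTO t VALUES (1, 'written via node1');"

# Read from another node; read-only queries execute locally, so node2
# serves the row without contacting node1.
psql "host=node2 dbname=mydb user=myuser" -c "SELECT * FROM t;"
```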

contrib/mmts/doc/administration.md

Lines changed: 164 additions & 0 deletions

````diff
@@ -0,0 +1,164 @@
+# `Administration`
+
+1. [Installation](doc/administration.md)
+1. [Setting up empty cluster](doc/administration.md)
+1. [Setting up cluster from pre-existing database](doc/administration.md)
+1. [Tuning configuration params](doc/administration.md)
+1. [Monitoring](doc/administration.md)
+1. [Adding nodes to cluster](doc/administration.md)
+1. [Excluding nodes from cluster](doc/administration.md)
+
+
+
+## Installation
+
+Multi-master consists of a patched version of postgres and the extension mmts, which provides most of the functionality but depends on several modifications to the postgres core.
+
+
+### Sources
+
+Ensure that the following prerequisites are installed:
+
+For Debian-based Linux:
+
+```
+apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
+```
+
+For RedHat-based Linux:
+
+```
+yum groupinstall 'Development Tools'
+yum install git automake libtool bison flex readline-devel
+```
+
+On macOS it is enough to have the Xcode command line tools installed.
+
+After that everything is ready to install postgres along with the multimaster extension.
+
+```
+git clone https://github.com/postgrespro/postgres_cluster.git
+cd postgres_cluster
+./configure --prefix=/path/to/install && make -j 4 install
+cd contrib/mmts && make install
+```
+
+```./configure``` here is the standard postgres autotools script, so it is possible to specify [any options](https://www.postgresql.org/docs/9.6/static/install-procedure.html) available in postgres. Also please ensure that /path/to/install/bin is listed in the ```PATH``` environment variable for the current user:
+
+```
+export PATH=$PATH:/path/to/install/bin
+```
+
+
+### Docker
+
+The directory contrib/mmts also includes a docker-compose.yml that is capable of building multi-master and starting a 3-node cluster listening on ports 15432, 15433 and 15434.
+
+```
+cd contrib/mmts
+docker-compose up
+```
+
+### PgPro packages
+
+When things get more stable we will release prebuilt packages for major platforms.
+
+
+
+## Configuration
+
+After installing the software on all cluster nodes we can configure our cluster. Here we describe how to set up a multimaster consisting of 3 nodes with an empty database. Suppose our nodes are accessible via the domain names ```node1```, ```node2``` and ```node3```. Perform the following steps on each node (sequentially or in parallel, it doesn't matter):
+
+1. As with usual postgres, first of all we need to initialize the directory where postgres will store its files:
+    ```
+    initdb -D ./datadir
+    ```
+    In that directory we are interested in the files ```postgresql.conf``` and ```pg_hba.conf```, which are responsible for general and security configuration respectively.
+
+1. Create the user and database that will be used with multimaster. This will require an intermediate launch of postgres.
+
+    ```
+    pg_ctl -D ./datadir -l ./pg.log start
+    createuser myuser -h localhost
+    createdb mydb -O myuser -h localhost
+    pg_ctl -D ./datadir -l ./pg.log stop
+    ```
+
+1. To be able to run multimaster we need the following changes to ```postgresql.conf```:
+
+    ```
+    ### General postgres options to let multimaster work
+
+    wal_level = logical # multimaster is built on top of
+                        # logical replication and will not work otherwise
+    max_connections = 100
+    max_prepared_transactions = 300 # all transactions are implicitly two-phase, so it's
+                                    # a good idea to set this equal to max_connections*N_nodes.
+    max_wal_senders = 10 # at least the number of nodes
+    max_replication_slots = 10 # at least the number of nodes
+    max_worker_processes = 250 # Each node has:
+                               # N_nodes-1 receivers
+                               # N_nodes-1 senders
+                               # 1 mtm-sender
+                               # 1 mtm-receiver
+                               # Also transactions executed at neighbour nodes can cause spawning of
+                               # background pool workers at our node. At max this will be equal to the
+                               # sum of max_connections on neighbour nodes.
+
+    ### Multimaster-specific options
+
+    shared_preload_libraries = 'multimaster'
+    multimaster.max_nodes = 3 # cluster size
+    multimaster.node_id = 1 # the 1-based index of the node in the cluster
+    multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
+                              # comma-separated list of connection strings to neighbour nodes.
+    ```
+
+    A full description of all configuration parameters is available in the [configuration](doc/configuration.md) section. Depending on the network environment and expected usage patterns one may want to tweak the parameters.
+
+1. Allow replication in `pg_hba.conf`:
+
+    ```
+    host mydb myuser node1 trust
+    host mydb myuser node2 trust
+    host mydb myuser node3 trust
+    host replication all node1 trust
+    host replication all node2 trust
+    host replication all node3 trust
+    ```
+
+1. Finally start postgres:
+
+    ```
+    pg_ctl -D ./datadir -l ./pg.log start
+    ```
+
+1. When postgres is started on all nodes you can connect to any node and create the multimaster extension to get access to the monitoring functions:
+    ```
+    psql -h node1
+    > CREATE EXTENSION multimaster;
+    ```
+
+    To ensure that everything is working, check the multimaster view ```mtm.get_cluster_state()```:
+
+    ```
+    > select * from mtm.get_cluster_state();
+    ```
+
+    Check that liveNodes in this view is equal to allNodes.
+
+
+## Setting up cluster from pre-existing database
+## Tuning configuration params
+## Monitoring
+
+* `mtm.get_nodes_state()` -- show the status of nodes in the cluster
+* `mtm.get_cluster_state()` -- show the whole cluster status
+* `mtm.get_cluster_info()` -- print some debug info
+* `mtm.make_table_local(relation regclass)` -- stop replication for a given table
+
+Read the description of all management functions at [functions](doc/functions.md)
+
+## Adding nodes to cluster
+## Excluding nodes from cluster
+
````
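
The comment block next to max_worker_processes in the new administration guide encodes a sizing rule that is easier to check as plain arithmetic. Here is a worked example (not part of the commit) for the exact cluster configured above, N_nodes = 3 and max_connections = 100 on every node; the numbers follow directly from those comments:

```sh
# Worker budget per node, with N_nodes = 3 and max_connections = 100 everywhere:
#   logical-replication receivers: N_nodes - 1                  = 2
#   logical-replication senders:   N_nodes - 1                  = 2
#   mtm-sender:                                                   1
#   mtm-receiver:                                                 1
#   background pool workers:       up to the sum of neighbours'
#                                  max_connections = 100 + 100  = 200
#   worst-case total:                                             206
# so max_worker_processes = 250 leaves comfortable headroom.
#
# Likewise max_prepared_transactions = 300 is max_connections * N_nodes
# (100 * 3), since every transaction is implicitly two-phase.
```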

contrib/mmts/doc/configuration.md

Lines changed: 4 additions & 5 deletions

````diff
@@ -16,31 +16,30 @@
 
 ```multimaster.ignore_tables_without_pk``` Do not replicate tables withpout primary key. Boolean.
 
+```multimaster.cluster_name``` Name of the cluster; doesn't affect anything, just in case. If set, mmts will check name correspondence.
 
 ## Questionable
 
 (probably we will delete that variables, most of them are useful only for development purposes --stas)
 
-```multimaster.cluster_name``` Name of the cluster, desn't affect anything. Just in case.
-
 ```multimaster.min_2pc_timeout``` Minimal timeout between receiving PREPARED message from nodes participated in transaction to coordinator (milliseconds). Default = 2000, /* 2 seconds */.
 
 ```multimaster.max_2pc_ratio``` Maximal ratio (in percents) between prepare time at different nodes: if T is time of preparing transaction at some node, then transaction can be aborted if prepared responce was not received in T*MtmMax2PCRatio/100. default = 200, /* 2 times */
 
 ```multimaster.queue_size``` Multimaster queue size. default = 256*1024*1024,
 
+```multimaster.trans_spill_threshold``` Maximal size (Mb) of a transaction after which the transaction is written to disk. Default = 1000, /* 1Gb */ (istm reorderbuffer also can do that, isn't it?)
+
 ```multimaster.vacuum_delay``` Minimal age of records which can be vacuumed (seconds). default = 1.
 
 ```multimaster.worker``` Number of multimaster executor workers. Default = 8. (use dynamic workers with some timeout to die?)
 
 ```multimaster.max_worker``` Maximal number of multimaster dynamic executor workers. (set this to max_conn?) Default = 100.
 
-```multimaster.gc_period```Number of distributed transactions after which garbage collection is started. Multimaster is building xid->csn hash map which has to be cleaned to avoid hash overflow. This parameter specifies interval of invoking garbage collector for this map. default = MTM_HASH_SIZE/10
+```multimaster.gc_period``` Number of distributed transactions after which garbage collection is started. Multimaster is building xid->csn hash map which has to be cleaned to avoid hash overflow. This parameter specifies interval of invoking garbage collector for this map. default = MTM_HASH_SIZE/10
 
 ```multimaster.max_node``` Maximal number of cluster nodes. This parameters allows to add new nodes to the cluster, default value 0 restricts number of nodes to one specified in multimaster.conn_strings (May be just set that to 64 and allow user to add node when trey need without restart?) default = 0
 
-```multimaster.trans_spill_threshold``` Maximal size (Mb) of transaction after which transaction is written to the disk. Default = 1000, /* 1Gb */ (istm reorderbuffer also can do that, isn't it?)
-
 ```multimaster.node_disable_delay``` Minimal amount of time (msec) between node status change. This delay is used to avoid false detection of node failure and to prevent blinking of node status node. default = 2000. (We can just increase heartbeat_recv_timeout)
 
 ```multimaster.connect_timeout``` Multimaster nodes connect timeout. Interval in milliseconds for establishing connection with cluster node. default = 10000, /* 10 seconds */
````
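
All of the parameters above are ordinary GUCs in the multimaster namespace, so they are set in postgresql.conf next to the options shown in the administration guide. A hypothetical fragment for illustration only; the values below simply restate the documented defaults and are not tuning advice:

```
multimaster.min_2pc_timeout = 2000        # milliseconds; the documented default
multimaster.max_2pc_ratio = 200           # percent
multimaster.queue_size = 268435456        # 256*1024*1024
multimaster.trans_spill_threshold = 1000  # Mb, i.e. 1Gb
multimaster.vacuum_delay = 1              # seconds
```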
