Commit 6b10c9a

Merge branch 'PGPROEE9_6_MULTIMASTER' of https://gitlab.postgrespro.ru/pgpro-dev/postgrespro into PGPROEE9_6_MULTIMASTER

2 parents f492dce + 5d632f9; commit 6b10c9a

3 files changed: 191 additions, 123 deletions

contrib/mmts/README.md

Lines changed: 23 additions & 118 deletions

````diff
@@ -1,8 +1,7 @@
 # `Postgresql multi-master`
 
-Multi-master is an extension and set of patches to a Postegres database, that turns Postgres into a
-synchronous shared-nothing cluster to provide OLTP scalability and high availability with automatic
-disaster recovery.
+Multi-master is an extension and a set of patches to a Postgres database that turns Postgres into a synchronous shared-nothing cluster, providing OLTP scalability and high availability with automatic disaster recovery.
+
 
 ## Features
 
@@ -12,148 +11,54 @@ disaster recovery.
 * Fault tolerance
 * Automatic node recovery
 
-## Overview
-
-Multi-master replicates same database to all nodes in cluster and allows writes to each node. Transaction
-isolation is enforced cluster-wide, so in case of concurrent updates on different nodes database will use the
-same conflict resolution rules (mvcc with repeatable read isolation level) as single node uses for concurrent
-backends and always stays in consistent state. Any writing transaction will write to all nodes, hence increasing
-commit latency for amount of time proportional to roundtrip between nodes nedded for synchronization. Read only
-transactions and queries executed locally without measurable overhead. Replication mechanism itself based on
-logical decoding and earlier version of pglogical extension provided for community by 2ndQuadrant team.
-
-Cluster consisting of N nodes can continue to work while majority of initial nodes are alive and reachable by
-other nodes. This is done by using 3 phase commit protocol and heartbeats for failure discovery. Node that is
-brought back to cluster can be fast-forwaded to actual state automatically in case when transactions log still
-exists since the time when node was excluded from cluster (this depends on checkpoint configuration in postgres).
-
-Read more about internals on [architecture](/contrib/mmts/doc/architecture.md) page.
-
-
-
-## Installation
-
-Multi-master consist of patched version of postgres and extension mmts, that provides most of functionality, but
-doesn't requiere changes to postgres core. To run multimaster one need to install postgres and several extensions
-to all nodes in cluster.
-
-### Sources
-
-Ensure that following prerequisites are installed:
-
-for Debian based linux:
-
-```sh
-apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
-```
-
-for RedHat based linux:
-
-```sh
-yum groupinstall 'Development Tools'
-yum install git, automake, libtool, bison, flex readline-devel
-```
-
-After that everything is ready to install postgres along with extensions
-
-```sh
-git clone https://github.com/postgrespro/postgres_cluster.git
-cd postgres_cluster
-./configure && make && make -j 4 install
-cd ../../contrib/mmts && make install
-```
-
-### Docker
-
-Directory contrib/mmts also includes docker-compose.yml that is capable of building multi-master and starting
-3 node cluster.
-
-```sh
-cd contrib/mmts
-docker-compose up
-```
-
-### PgPro packages
-
-After things go more stable we will release prebuilt packages for major platforms.
-
-## Configuration
-
-1. Add these required options to the `postgresql.conf` of each instance in the cluster.
-```sh
-wal_level = logical # multimaster is build on top of
-# logical replication and will not work otherwise
-max_connections = 100
-max_prepared_transactions = 300 # all transactions are implicitly two-phase, so that's
-# a good idea to set this equal to max_connections*N_nodes.
-max_wal_senders = 10 # at least the number of nodes
-max_replication_slots = 10 # at least the number of nodes
-max_worker_processes = 250 # Each node has:
-# N_nodes-1 receiver
-# N_nodes-1 sender
-# 1 mtm-sender
-# 1 mtm-receiver
-# Also transactions executed at neighbour nodes can cause spawn of
-# background pool worker at our node. At max this will be equal to
-# sum of max_connections on neighbour nodes.
-
 
+## Overview
 
-shared_preload_libraries = 'multimaster'
-multimaster.max_nodes = 3 # cluster size
-multimaster.node_id = 1 # the 1-based index of the node in the cluster
-multimaster.conn_strings = 'dbname=mydb host=node1.mycluster, ...'
-# comma-separated list of connection strings to neighbour nodes.
-```
-2. Allow replication in `pg_hba.conf`.
+Multi-master replicates the same database to all nodes of the cluster and allows writes to each node. Transaction isolation is enforced cluster-wide, so in case of concurrent updates on different nodes the database uses the same conflict resolution rules (MVCC with the repeatable read isolation level) as a single node uses for concurrent backends, and always stays in a consistent state. Any writing transaction writes to all nodes, hence increasing commit latency by an amount of time proportional to the roundtrip between nodes needed for synchronization. Read-only transactions and queries are executed locally without measurable overhead. The replication mechanism itself is based on logical decoding and an earlier version of the pglogical extension provided to the community by the 2ndQuadrant team.
 
-Read description of all configuration params at [configuration](/contrib/mmts/doc/configuration.md)
+A cluster consisting of N nodes can continue to work while a majority of the initial nodes are alive and reachable by the other nodes. This is done by using a 3-phase commit protocol and heartbeats for failure discovery. A node that is brought back to the cluster can be fast-forwarded to the actual state automatically when the transaction log still exists for the period since the node was excluded from the cluster (this depends on the checkpoint configuration in postgres).
 
-## Management
 
-`create extension mmts;` to gain access to these functions:
+## Documentation
 
-* `mtm.get_nodes_state()` -- show status of nodes on cluster
-* `mtm.get_cluster_state()` -- show whole cluster status
-* `mtm.get_cluster_info()` -- print some debug info
-* `mtm.make_table_local(relation regclass)` -- stop replication for a given table
+1. [Administration](doc/administration.md)
+    1. [Installation](doc/administration.md)
+    1. [Setting up empty cluster](doc/administration.md)
+    1. [Setting up cluster from pre-existing database](doc/administration.md)
+    1. [Tuning configuration params](doc/administration.md)
+    1. [Monitoring](doc/administration.md)
+    1. [Adding nodes to cluster](doc/administration.md)
+    1. [Excluding nodes from cluster](doc/administration.md)
+1. [Architecture and internals](doc/architecture.md)
+1. [List of configuration variables](doc/configuration.md)
+1. [Built-in functions and views](doc/configuration.md)
 
-Read description of all management functions at [functions](/contrib/mmts/doc/functions.md)
 
+## Tests
 
+### Fault tolerance
 
-## Tests
+(Link to test/failure matrix)
 
 ### Performance
 
 (Show TPC-C here on 3 nodes)
 
-### Fault tolerance
-
-(Link to test/failure matrix)
-
 
 ## Limitations
 
 * Commit latency.
-Current implementation of logical replication sends data to subscriber nodes only after local commit, so in case of
-heavy-write transaction user will wait for transaction processing two times: on local node and on all other nodes
-(simultaneosly). We have plans to address this issue in future.
+The current implementation of logical replication sends data to subscriber nodes only after local commit, so for a heavy-write transaction the user waits for transaction processing twice: on the local node and on all other nodes (simultaneously). We plan to address this issue in the future.
 
 * DDL replication.
-While data is replicated on logical level, DDL replicated by statements performing distributed commit with the same
-statement. Some complex DDL scenarious including stored procedures and temp temp tables aren't working properly. We
-are working right now on proving full compatibility with ordinary postgres. Currently we are passing 141 of 164
-postgres regression tests.
+While data is replicated at the logical level, DDL is replicated by statements, performing a distributed commit with the same statement. Some complex DDL scenarios, including stored procedures and temp tables, aren't working properly yet. We are currently working on proving full compatibility with ordinary postgres; at the moment we pass 141 of 164 postgres regression tests.
 
 * Isolation level.
-Multimaster currently support only _repeatable read_ isolation level. This is stricter than default _read commited_,
-but also increases probability of serialization failure during commit. _Serializable_ level isn't supported yet.
+Multimaster currently supports only the _repeatable read_ isolation level. This is stricter than the default _read committed_, but it also increases the probability of serialization failure during commit. The _serializable_ level isn't supported yet.
 
 * One database per cluster.
 
 
-
 ## Credits and Licence
 
 Multi-master developed by the PostgresPro team.
````
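
The overview paragraph added above is easy to verify end-to-end once a cluster is up. Below is a minimal sketch (an illustration, not part of the commit), assuming the 3-node setup from the administration guide that follows, with database mydb, user myuser and hosts node1 and node2:

```sh
# Write through one node; per the overview, the writing transaction
# commits on all nodes of the cluster.
psql "host=node1 dbname=mydb user=myuser" -c "CREATE TABLE t (id int PRIMARY KEY, note text);"
psql "host=node1 dbname=mydb user=myuser" -c "INSERT INTO t VALUES (1, 'written via node1');"

# Read from another node; read-only queries execute locally, so node2
# serves the row without contacting node1.
psql "host=node2 dbname=mydb user=myuser" -c "SELECT * FROM t;"
```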

contrib/mmts/doc/administration.md

Lines changed: 164 additions & 0 deletions

````diff
@@ -0,0 +1,164 @@
+# `Administration`
+
+1. [Installation](doc/administration.md)
+1. [Setting up empty cluster](doc/administration.md)
+1. [Setting up cluster from pre-existing database](doc/administration.md)
+1. [Tuning configuration params](doc/administration.md)
+1. [Monitoring](doc/administration.md)
+1. [Adding nodes to cluster](doc/administration.md)
+1. [Excluding nodes from cluster](doc/administration.md)
+
+
+
+## Installation
+
+Multi-master consists of a patched version of postgres and the extension mmts, which provides most of the functionality but depends on several modifications to the postgres core.
+
+
+### Sources
+
+Ensure that the following prerequisites are installed:
+
+For Debian-based Linux:
+
+```
+apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
+```
+
+For RedHat-based Linux:
+
+```
+yum groupinstall 'Development Tools'
+yum install git automake libtool bison flex readline-devel
+```
+
+On macOS it is enough to have the Xcode command line tools installed.
+
+After that everything is ready to install postgres along with the multimaster extension.
+
+```
+git clone https://github.com/postgrespro/postgres_cluster.git
+cd postgres_cluster
+./configure --prefix=/path/to/install && make -j 4 install
+cd contrib/mmts && make install
+```
+
+```./configure``` here is the standard postgres autotools script, so it is possible to specify [any options](https://www.postgresql.org/docs/9.6/static/install-procedure.html) available in postgres. Also please ensure that /path/to/install/bin is listed in the ```PATH``` environment variable for the current user:
+
+```
+export PATH=$PATH:/path/to/install/bin
+```
+
+
+### Docker
+
+The directory contrib/mmts also includes a docker-compose.yml that is capable of building multi-master and starting a 3-node cluster listening on ports 15432, 15433 and 15434.
+
+```
+cd contrib/mmts
+docker-compose up
+```
+
+### PgPro packages
+
+When things get more stable we will release prebuilt packages for major platforms.
+
+
+
+## Configuration
+
+After installing the software on all cluster nodes we can configure our cluster. Here we describe how to set up a multimaster consisting of 3 nodes with an empty database. Suppose our nodes are accessible via the domain names ```node1```, ```node2``` and ```node3```. Perform the following steps on each node (sequentially or in parallel, it doesn't matter):
+
+1. As with usual postgres, first of all we need to initialize the directory where postgres will store its files:
+    ```
+    initdb -D ./datadir
+    ```
+    In that directory we are interested in the files ```postgresql.conf``` and ```pg_hba.conf```, which are responsible for general and security configuration respectively.
+
+1. Create the user and database that will be used with multimaster. This will require an intermediate launch of postgres.
+
+    ```
+    pg_ctl -D ./datadir -l ./pg.log start
+    createuser myuser -h localhost
+    createdb mydb -O myuser -h localhost
+    pg_ctl -D ./datadir -l ./pg.log stop
+    ```
+
+1. To be able to run multimaster we need the following changes to ```postgresql.conf```:
+
+    ```
+    ### General postgres options to let multimaster work
+
+    wal_level = logical # multimaster is built on top of
+                        # logical replication and will not work otherwise
+    max_connections = 100
+    max_prepared_transactions = 300 # all transactions are implicitly two-phase, so it's
+                                    # a good idea to set this equal to max_connections*N_nodes.
+    max_wal_senders = 10 # at least the number of nodes
+    max_replication_slots = 10 # at least the number of nodes
+    max_worker_processes = 250 # Each node has:
+                               # N_nodes-1 receivers
+                               # N_nodes-1 senders
+                               # 1 mtm-sender
+                               # 1 mtm-receiver
+                               # Also transactions executed at neighbour nodes can cause spawning of
+                               # background pool workers at our node. At max this will be equal to the
+                               # sum of max_connections on neighbour nodes.
+
+    ### Multimaster-specific options
+
+    shared_preload_libraries = 'multimaster'
+    multimaster.max_nodes = 3 # cluster size
+    multimaster.node_id = 1 # the 1-based index of the node in the cluster
+    multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
+                              # comma-separated list of connection strings to neighbour nodes.
+    ```
+
+    A full description of all configuration parameters is available in the [configuration](doc/configuration.md) section. Depending on the network environment and expected usage patterns one may want to tweak the parameters.
+
+1. Allow replication in `pg_hba.conf`:
+
+    ```
+    host mydb myuser node1 trust
+    host mydb myuser node2 trust
+    host mydb myuser node3 trust
+    host replication all node1 trust
+    host replication all node2 trust
+    host replication all node3 trust
+    ```
+
+1. Finally start postgres:
+
+    ```
+    pg_ctl -D ./datadir -l ./pg.log start
+    ```
+
+1. When postgres is started on all nodes you can connect to any node and create the multimaster extension to get access to the monitoring functions:
+    ```
+    psql -h node1
+    > CREATE EXTENSION multimaster;
+    ```
+
+    To ensure that everything is working, check the multimaster view ```mtm.get_cluster_state()```:
+
+    ```
+    > select * from mtm.get_cluster_state();
+    ```
+
+    Check that liveNodes in this view is equal to allNodes.
+
+
+## Setting up cluster from pre-existing database
+## Tuning configuration params
+## Monitoring
+
+* `mtm.get_nodes_state()` -- show the status of nodes in the cluster
+* `mtm.get_cluster_state()` -- show the whole cluster status
+* `mtm.get_cluster_info()` -- print some debug info
+* `mtm.make_table_local(relation regclass)` -- stop replication for a given table
+
+Read the description of all management functions at [functions](doc/functions.md)
+
+## Adding nodes to cluster
+## Excluding nodes from cluster
+
````
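
The comment block next to max_worker_processes in the new administration guide encodes a sizing rule that is easier to check as plain arithmetic. Here is a worked example (not part of the commit) for the exact cluster configured above, N_nodes = 3 and max_connections = 100 on every node; the numbers follow directly from those comments:

```sh
# Worker budget per node, with N_nodes = 3 and max_connections = 100 everywhere:
#   logical-replication receivers: N_nodes - 1                  = 2
#   logical-replication senders:   N_nodes - 1                  = 2
#   mtm-sender:                                                   1
#   mtm-receiver:                                                 1
#   background pool workers:       up to the sum of neighbours'
#                                  max_connections = 100 + 100  = 200
#   worst-case total:                                             206
# so max_worker_processes = 250 leaves comfortable headroom.
#
# Likewise max_prepared_transactions = 300 is max_connections * N_nodes
# (100 * 3), since every transaction is implicitly two-phase.
```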

contrib/mmts/doc/configuration.md

Lines changed: 4 additions & 5 deletions

````diff
@@ -16,31 +16,30 @@
 
 ```multimaster.ignore_tables_without_pk``` Do not replicate tables withpout primary key. Boolean.
 
+```multimaster.cluster_name``` Name of the cluster; doesn't affect anything, just in case. If set, mmts will check name correspondence.
 
 ## Questionable
 
 (probably we will delete that variables, most of them are useful only for development purposes --stas)
 
-```multimaster.cluster_name``` Name of the cluster, desn't affect anything. Just in case.
-
 ```multimaster.min_2pc_timeout``` Minimal timeout between receiving PREPARED message from nodes participated in transaction to coordinator (milliseconds). Default = 2000, /* 2 seconds */.
 
 ```multimaster.max_2pc_ratio``` Maximal ratio (in percents) between prepare time at different nodes: if T is time of preparing transaction at some node, then transaction can be aborted if prepared responce was not received in T*MtmMax2PCRatio/100. default = 200, /* 2 times */
 
 ```multimaster.queue_size``` Multimaster queue size. default = 256*1024*1024,
 
+```multimaster.trans_spill_threshold``` Maximal size (Mb) of a transaction after which the transaction is written to disk. Default = 1000, /* 1Gb */ (istm reorderbuffer also can do that, isn't it?)
+
 ```multimaster.vacuum_delay``` Minimal age of records which can be vacuumed (seconds). default = 1.
 
 ```multimaster.worker``` Number of multimaster executor workers. Default = 8. (use dynamic workers with some timeout to die?)
 
 ```multimaster.max_worker``` Maximal number of multimaster dynamic executor workers. (set this to max_conn?) Default = 100.
 
-```multimaster.gc_period```Number of distributed transactions after which garbage collection is started. Multimaster is building xid->csn hash map which has to be cleaned to avoid hash overflow. This parameter specifies interval of invoking garbage collector for this map. default = MTM_HASH_SIZE/10
+```multimaster.gc_period``` Number of distributed transactions after which garbage collection is started. Multimaster is building xid->csn hash map which has to be cleaned to avoid hash overflow. This parameter specifies interval of invoking garbage collector for this map. default = MTM_HASH_SIZE/10
 
 ```multimaster.max_node``` Maximal number of cluster nodes. This parameters allows to add new nodes to the cluster, default value 0 restricts number of nodes to one specified in multimaster.conn_strings (May be just set that to 64 and allow user to add node when trey need without restart?) default = 0
 
-```multimaster.trans_spill_threshold``` Maximal size (Mb) of transaction after which transaction is written to the disk. Default = 1000, /* 1Gb */ (istm reorderbuffer also can do that, isn't it?)
-
 ```multimaster.node_disable_delay``` Minimal amount of time (msec) between node status change. This delay is used to avoid false detection of node failure and to prevent blinking of node status node. default = 2000. (We can just increase heartbeat_recv_timeout)
 
 ```multimaster.connect_timeout``` Multimaster nodes connect timeout. Interval in milliseconds for establishing connection with cluster node. default = 10000, /* 10 seconds */
````
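
All of the parameters above are ordinary GUCs in the multimaster namespace, so they are set in postgresql.conf next to the options shown in the administration guide. A hypothetical fragment for illustration only; the values below simply restate the documented defaults and are not tuning advice:

```
multimaster.min_2pc_timeout = 2000        # milliseconds; the documented default
multimaster.max_2pc_ratio = 200           # percent
multimaster.queue_size = 268435456        # 256*1024*1024
multimaster.trans_spill_threshold = 1000  # Mb, i.e. 1Gb
multimaster.vacuum_delay = 1              # seconds
```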
