<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.19 2007/11/08 19:18:23 momjian Exp $ -->

<chapter id="high-availability">
 <title>High Availability, Load Balancing, and Replication</title>

 <variablelist>

  <varlistentry>
   <term>Shared Disk Failover</term>
   <listitem>

   <para>
    Shared disk failover avoids synchronization overhead by having only one
    copy of the database. It uses a single disk array that is shared by
    multiple servers. If the main database server fails, the standby server
    is able to mount and start the database as though it were recovering from
    a database crash. This allows rapid failover with no data loss.
   </para>

   <para>
    Shared hardware functionality is common in network storage devices.
    Using a network file system is also possible, though care must be
    taken that the file system has full POSIX behavior (see <xref
    linkend="creating-cluster-nfs">). One significant limitation of this
    method is that if the shared disk array fails or becomes corrupt, the
    primary and standby servers are both nonfunctional. Another issue is
    that the standby server should never access the shared storage while
    the primary server is running.
   </para>

   </listitem>
  </varlistentry>

  <varlistentry>
   <term>File System Replication</term>
   <listitem>

   <para>
    A modified version of shared hardware functionality is file system
    replication, where all changes to a file system are mirrored to a file
    system residing on another computer. The only restriction is that
    the mirroring must be done in a way that ensures the standby server
    has a consistent copy of the file system — specifically, writes
    to the standby must be done in the same order as those on the master.
    DRBD is a popular file system replication solution for Linux.
   </para>

<!--
https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
only committed once to disk and there is a distributed locking
protocol to make nodes agree on a serializable transactional order.
-->

   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Warm Standby Using Point-In-Time Recovery (<acronym>PITR</>)</term>
   <listitem>

   <para>
    A warm standby server (see <xref linkend="warm-standby">) can
    be kept current by reading a stream of write-ahead log (WAL)
    records. If the main server fails, the warm standby contains
    almost all of the data of the main server, and can be quickly
    made the new master database server. This is asynchronous and
    can only be done for the entire database server.
   </para>
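
   <para>
    As a rough sketch, WAL shipping is typically configured with
    <varname>archive_command</> on the primary and
    <varname>restore_command</> on the standby; the host name and
    archive directory below are only placeholders, and the exact
    commands depend on the installation:
   </para>

<programlisting>
# postgresql.conf on the primary: copy each completed WAL segment
# to the standby host (placeholder host and path)
archive_command = 'scp %p standby:/var/lib/pgsql/walarchive/%f'

# recovery.conf on the standby: replay segments as they arrive
# (a real warm standby uses a command that waits for the next segment)
restore_command = 'cp /var/lib/pgsql/walarchive/%f %p'
</programlisting>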
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Master-Slave Replication</term>
   <listitem>

   <para>
    A master-slave replication setup sends all data modification
    queries to the master server. The master server asynchronously
    sends data changes to the slave server. The slave can answer
    read-only queries while the master server is running. The
    slave server is ideal for data warehouse queries.
   </para>

   <para>
    Slony-I is an example of this type of replication, with per-table
    granularity and support for multiple slaves. Because it
    updates the slave server asynchronously (in batches), there is
    possible data loss during failover.
   </para>
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Statement-Based Replication Middleware</term>
   <listitem>

   <para>
    With statement-based replication middleware, a program intercepts
    every SQL query and sends it to one or all servers. Each server
    operates independently. Read-write queries are sent to all servers,
    while read-only queries can be sent to just one server, allowing
    the read workload to be distributed.
   </para>

   <para>
    If queries are simply broadcast unmodified, functions like
    <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
    sequences would have different values on different servers.
    This is because each server operates independently, and because
    SQL queries are broadcast (and not actual modified rows). If
    this is unacceptable, either the middleware or the application
    must query such values from a single server and then use those
    values in write queries. Also, care must be taken that all
    transactions either commit or abort on all servers, perhaps
    using two-phase commit (<xref linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
    Pgpool and Sequoia are examples of this type of replication.
   </para>
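
   <para>
    For example, a middleware layer might obtain the non-deterministic
    values from one designated server and then broadcast a rewritten
    query containing only literal values (the table and sequence names
    here are hypothetical):
   </para>

<programlisting>
-- run on a single, designated server only
SELECT nextval('order_id_seq'), CURRENT_TIMESTAMP;

-- broadcast to every server, with the fetched values substituted in
INSERT INTO orders (id, placed_at, customer)
    VALUES (42017, '2007-11-08 19:18:23', 'acme');
</programlisting>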
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Asynchronous Multi-Master Replication</term>
   <listitem>

   <para>
    For servers that are not regularly connected, like laptops or
    remote servers, keeping data consistent among servers is a
    challenge. Using asynchronous multi-master replication, each
    server works independently, and periodically communicates with
    the other servers to identify conflicting transactions. The
    conflicts can be resolved by users or conflict resolution rules.
   </para>
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Synchronous Multi-Master Replication</term>
   <listitem>

   <para>
    In synchronous multi-master replication, each server can accept
    write requests, and modified data is transmitted from the
    original server to every other server before each transaction
    commits. Heavy write activity can cause excessive locking,
    leading to poor performance. In fact, write performance is
    often worse than that of a single server. Read requests can
    be sent to any server. Some implementations use shared disk
    to reduce the communication overhead. Synchronous multi-master
    replication is best for mostly read workloads, though its big
    advantage is that any server can accept write requests —
    there is no need to partition workloads between master and
    slave servers. Because the data changes, rather than the queries,
    are sent from one server to another, there is no problem with
    non-deterministic functions like <function>random()</>.
   </para>

   <para>
    <productname>PostgreSQL</> does not offer this type of replication,
    though <productname>PostgreSQL</> two-phase commit (<xref
    linkend="sql-prepare-transaction"
    endterm="sql-prepare-transaction-title"> and <xref
    linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
    can be used to implement this in application code or middleware.
   </para>
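
   <para>
    A minimal sketch of how application code or middleware could use
    two-phase commit for this purpose (the table and the transaction
    identifier are hypothetical):
   </para>

<programlisting>
-- issue the same transaction on every server, then prepare it
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 1;
PREPARE TRANSACTION 'xfer-4711';

-- once every server has prepared successfully, commit everywhere
COMMIT PREPARED 'xfer-4711';

-- if any server failed to prepare, roll back everywhere instead
-- ROLLBACK PREPARED 'xfer-4711';
</programlisting>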
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Data Partitioning</term>
   <listitem>

   <para>
    Data partitioning splits tables into data sets. Each set can
    be modified by only one server. For example, data can be
    partitioned by offices, e.g. London and Paris, with a server
    in each office. If queries combining London and Paris data
    are necessary, an application can query both servers, or
    master/slave replication can be used to keep a read-only copy
    of the other office's data on each server.
   </para>
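
   <para>
    As an illustration only (the table definitions are hypothetical),
    the London server might hold the writable London rows plus a
    replicated, read-only copy of the Paris rows, with a check
    constraint guarding against writes that belong to the other office:
   </para>

<programlisting>
-- on the London server: locally writable data
CREATE TABLE customers_london (
    id     integer PRIMARY KEY,
    office text NOT NULL CHECK (office = 'London'),
    name   text
);

-- customers_paris is maintained as a read-only replica from Paris;
-- queries spanning both offices read the union of the two tables
SELECT * FROM customers_london
UNION ALL
SELECT * FROM customers_paris;
</programlisting>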
   </listitem>
  </varlistentry>

  <varlistentry>
   <term>Commercial Solutions</term>
   <listitem>

   <para>
    Because <productname>PostgreSQL</> is open source and easily
    extended, a number of companies have taken <productname>PostgreSQL</>
    and created commercial closed-source solutions with unique
    failover, replication, and load balancing capabilities.
   </para>
   </listitem>
  </varlistentry>

 </variablelist>
