Commit 5e550ac

Document Warm Standby for High Availability

Includes sample standby script. Simon Riggs

1 parent 075c0ca · commit 5e550ac
File tree

1 file changed: +307 −1 lines changed

doc/src/sgml/backup.sgml

Lines changed: 307 additions & 1 deletion
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ -->

<chapter id="backup">
 <title>Backup and Restore</title>
@@ -1203,6 +1203,312 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
 </sect2>
</sect1>

<sect1 id="warm-standby">
 <title>Warm Standby Servers for High Availability</title>

 <indexterm zone="backup">
  <primary>Warm Standby</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>PITR Standby</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>Standby Server</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>Log Shipping</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>Witness Server</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>STONITH</primary>
 </indexterm>

 <indexterm zone="backup">
  <primary>High Availability</primary>
 </indexterm>

 <para>
  Continuous Archiving can be used to create a High Availability (HA)
  cluster configuration with one or more Standby Servers ready to take
  over operations should the Primary Server fail. This capability is
  more widely known as Warm Standby Log Shipping.
 </para>

 <para>
  The Primary and Standby Servers work together to provide this
  capability, though the servers are only loosely coupled. The Primary
  Server operates in Continuous Archiving mode, while the Standby Server
  operates in continuous Recovery mode, reading the WAL files from the
  Primary. No changes to the database tables are required to enable this
  capability, so it offers low administration overhead in comparison
  with other replication approaches. This configuration also has a very
  low performance impact on the Primary Server.
 </para>

 <para>
  Directly moving WAL or "log" records from one database server to
  another is typically described as Log Shipping. PostgreSQL implements
  file-based Log Shipping, meaning WAL records are batched one file at a
  time. WAL files can be shipped easily and cheaply over any distance,
  whether it be to an adjacent system, another system on the same site
  or another system on the far side of the globe. The bandwidth required
  for this technique varies according to the transaction rate of the
  Primary Server. Record-based Log Shipping is also possible with
  custom-developed procedures, as discussed in a later section. Future
  developments are likely to include options for synchronous and/or
  integrated record-based log shipping.
 </para>

 <para>
  It should be noted that the log shipping is asynchronous, i.e. the WAL
  records are shipped after transaction commit. As a result there can be
  a small window of data loss, should the Primary Server suffer a
  catastrophic failure. The window of data loss is minimised by the use
  of the archive_timeout parameter, which can be set as low as a few
  seconds if required. A very low setting can increase the bandwidth
  requirements for file shipping.
 </para>
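
 <para>
  For example, a minimal postgresql.conf fragment on the Primary might
  look like the following sketch. The archive directory is an
  illustrative path, assumed here to be a directory on the Standby
  mounted via NFS or similar:
 </para>
<programlisting>
archive_command = 'cp %p /mnt/standby/archivedir/%f'   # ship each completed WAL file
archive_timeout = 60                                   # switch WAL files at least once a minute
</programlisting>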

 <para>
  The Standby server is not available for access, since it is continually
  performing recovery processing. Recovery performance is sufficiently
  good that the Standby will typically be only minutes away from full
  availability once it has been activated. As a result, we refer to this
  capability as a Warm Standby configuration that offers High
  Availability. Restoring a server from an archived base backup and
  rolling forward can take considerably longer, so that technique really
  offers a solution only for Disaster Recovery, not HA.
 </para>

 <para>
  Other mechanisms for High Availability replication are available, both
  commercially and as open-source software.
 </para>

 <para>
  In general, log shipping between servers running different major
  release levels will not be possible. It is the policy of the
  PostgreSQL Global Development Group not to make changes to disk
  formats during minor release upgrades, so it is likely that running
  different minor release levels on Primary and Standby servers will
  work successfully. However, no formal support for that is offered and
  you are advised not to allow this to occur over long periods.
 </para>

<sect2 id="warm-standby-planning">
1306+
<title>Planning</title>
1307+
1308+
<para>
1309+
On the Standby server all tablespaces and paths will refer to similarly
1310+
named mount points, so it is important to create the Primary and Standby
1311+
servers so that they are as similar as possible, at least from the
1312+
perspective of the database server. Furthermore, any CREATE TABLESPACE
1313+
commands will be passed across as-is, so any new mount points must be
1314+
created on both servers before they are used on the Primary. Hardware
1315+
need not be the same, but experience shows that maintaining two
1316+
identical systems is easier than maintaining two dissimilar ones over
1317+
the whole lifetime of the application and system.
1318+
</para>
1319+
1320+
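
  <para>
   For example, with a hypothetical tablespace path, the following
   command run on the Primary will be replayed as-is on the Standby, so
   the same directory must already exist there:
  </para>
<programlisting>
# run on the Primary; /mnt/disk2/ts1 must already exist on the Standby too
psql -c "CREATE TABLESPACE ts1 LOCATION '/mnt/disk2/ts1'"
</programlisting>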

  <para>
   There is no special mode required to enable a Standby server. The
   operations that occur on both Primary and Standby servers are
   entirely normal continuous archiving and recovery tasks. The primary
   point of contact between the two database servers is the archive of
   WAL files that both share: Primary writing to the archive, Standby
   reading from the archive. Care must be taken to ensure that WAL
   archives for separate servers do not become mixed together or
   confused.
  </para>

  <para>
   The magic that makes the two loosely coupled servers work together is
   simply a restore_command that waits for the next WAL file to be
   archived from the Primary. The restore_command is specified in the
   recovery.conf file on the Standby Server. Normal recovery processing
   would request a file from the WAL archive, causing an error if the
   file was unavailable. For Standby processing it is normal for the
   next file to be unavailable, so we must be patient and wait for it to
   appear. A waiting restore_command can be written as a custom script
   that loops, polling for the existence of the next WAL file. There
   must also be some way to trigger failover, which should interrupt the
   restore_command, break the loop and return a file-not-found error to
   the Standby Server. This ends recovery and the Standby will then come
   up as a normal server.
  </para>

  <para>
   Sample pseudocode for the C version of the restore_command would be:
<programlisting>
triggered = false;
while (!NextWALFileReady() && !triggered)
{
    usleep(100000L);    /* wait for ~0.1 sec */
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
    CopyWALFileForRecovery();
</programlisting>
  </para>
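
  <para>
   The same logic can be sketched as a shell script suitable for use as
   a restore_command. The script below is an illustration only: the
   archive directory and trigger file paths are hypothetical, and a
   production version would need more careful error handling (for
   example, recovery also requests timeline history files, for which it
   should not wait indefinitely):
  </para>
<programlisting>
#!/bin/sh
# restore_wal.sh - hypothetical waiting restore_command
# usage (in recovery.conf): restore_command = 'restore_wal.sh %f %p'

ARCHIVE=/mnt/standby/archivedir   # WAL archive written by the Primary (illustrative)
TRIGGER=/tmp/pgsql.trigger        # hypothetical failover trigger file
WALFILE="$1"                      # %f - file name requested by recovery
DEST="$2"                         # %p - destination path to copy it to

while true
do
    # A trigger file requests failover: return "file not found" so that
    # recovery ends and the Standby comes up as a normal server.
    if [ -f "$TRIGGER" ]; then
        exit 1
    fi
    # Copy the WAL file as soon as the Primary has archived it.
    if [ -f "$ARCHIVE/$WALFILE" ]; then
        cp "$ARCHIVE/$WALFILE" "$DEST"
        exit $?
    fi
    sleep 1                       # wait, then poll again
done
</programlisting>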

  <para>
   PostgreSQL does not provide the system software required to identify
   a failure on the Primary and notify the Standby system and then the
   Standby database server. Many such tools exist and are well
   integrated with other aspects of a system failover, such as IP
   address migration.
  </para>

  <para>
   Triggering failover is an important part of planning and design. The
   restore_command is executed in full once for each WAL file. The
   process running the restore_command is therefore created and dies for
   each file, so there is no daemon or server process, and we cannot use
   signals and a signal handler. A more permanent notification is
   required to trigger the failover. It is possible to use a simple
   timeout facility, especially if used in conjunction with a known
   archive_timeout setting on the Primary, though this is somewhat error
   prone since a network problem or a busy Primary server might be
   enough to initiate failover. A notification mechanism such as the
   explicit creation of a trigger file is less error prone, if this can
   be arranged.
  </para>
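
  <para>
   With the hypothetical script shown earlier, for example, failover
   would be triggered from an operator session or a cluster-management
   tool simply by creating the trigger file on the Standby:
  </para>
<programlisting>
touch /tmp/pgsql.trigger    # hypothetical trigger file checked by restore_wal.sh
</programlisting>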
 </sect2>

 <sect2 id="warm-standby-config">
  <title>Implementation</title>

  <para>
   The short procedure for configuring a Standby Server is as follows.
   For full details of each step, refer to previous sections as noted.
   <orderedlist>
    <listitem>
     <para>
      Set up Primary and Standby systems as near identically as
      possible, including two identical copies of PostgreSQL at the same
      release level.
     </para>
    </listitem>
    <listitem>
     <para>
      Set up Continuous Archiving from the Primary to a WAL archive
      located in a directory on the Standby Server. Ensure that both
      <xref linkend="guc-archive-command"> and <xref
      linkend="guc-archive-timeout"> are set. (See <xref
      linkend="backup-archiving-wal">)
     </para>
    </listitem>
    <listitem>
     <para>
      Make a Base Backup of the Primary Server. (See <xref
      linkend="backup-base-backup">)
     </para>
    </listitem>
    <listitem>
     <para>
      Begin recovery on the Standby Server from the local WAL archive,
      using a recovery.conf that specifies a restore_command that waits
      as described previously; a minimal recovery.conf sketch follows
      this list. (See <xref linkend="backup-pitr-recovery">)
     </para>
    </listitem>
   </orderedlist>
  </para>
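
  <para>
   As a minimal illustration of the last step, the recovery.conf on the
   Standby might contain little more than the waiting restore_command
   (the script name is the hypothetical one sketched earlier):
  </para>
<programlisting>
restore_command = 'restore_wal.sh %f %p'   # blocks until each WAL file arrives, or failover is triggered
</programlisting>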

  <para>
   Recovery treats the WAL Archive as read-only, so once a WAL file has
   been copied to the Standby system it can be copied to tape at the
   same time as it is being used by the Standby database server to
   recover. Thus, running a Standby Server for High Availability can be
   performed at the same time as files are stored for longer-term
   Disaster Recovery purposes.
  </para>

  <para>
   For testing purposes, it is possible to run both Primary and Standby
   servers on the same system. This does not provide any worthwhile
   improvement in server robustness, nor would it be described as HA.
  </para>
 </sect2>

 <sect2 id="warm-standby-failover">
  <title>Failover</title>

  <para>
   If the Primary Server fails, then the Standby Server should begin
   failover procedures.
  </para>

  <para>
   If the Standby Server fails then no failover need take place. If the
   Standby Server can be restarted, then the recovery process can also
   be immediately restarted, taking advantage of Restartable Recovery.
  </para>

  <para>
   If the Primary Server fails and then immediately restarts, you must
   have a mechanism for informing it that it is no longer the Primary.
   This is sometimes known as STONITH (Shoot The Other Node In The
   Head), which is necessary to avoid situations where both systems
   think they are the Primary, which can lead to confusion and
   ultimately data loss.
  </para>

  <para>
   Many failover systems use just two systems, the Primary and the
   Standby, connected by some kind of heartbeat mechanism to continually
   verify the connectivity between the two and the viability of the
   Primary. It is also possible to use a third system, known as a
   Witness Server, to avoid some problems of inappropriate failover, but
   the additional complexity may not be worthwhile unless it is set up
   with sufficient care and rigorous testing.
  </para>

  <para>
   At the instant that failover takes place to the Standby, we have only
   a single server in operation. This is known as a degenerate state.
   The former Standby is now the Primary, but the former Primary is down
   and may stay down. We must now fully re-create a Standby server,
   either on the former Primary system when it comes up, or on a third,
   possibly new, system. Once complete, the Primary and Standby can be
   considered to have switched roles. Some people choose to use a third
   server to provide additional protection across the failover interval,
   though clearly this complicates the system configuration and
   operational processes (and such a server can also act as a Witness
   Server).
  </para>

  <para>
   So, switching from Primary to Standby Server can be fast, but
   requires some time to re-prepare the failover cluster. Regular
   switching from Primary to Standby is encouraged, since it allows the
   regular downtime on each system that is required to maintain HA. It
   also acts as a test of the failover mechanism, ensuring that it will
   really work when you need it. Written administration procedures are
   advised.
  </para>
 </sect2>

 <sect2 id="warm-standby-record">
  <title>Implementing Record-based Log Shipping</title>

  <para>
   The main features for Log Shipping in this release are based around
   the file-based Log Shipping described above. It is also possible to
   implement record-based Log Shipping using the
   pg_xlogfile_name_offset() function, though this requires custom
   development.
  </para>

  <para>
   An external program can call pg_xlogfile_name_offset() to find out
   the filename and the exact byte offset within it of the latest WAL
   pointer. If the external program regularly polls the server it can
   find out how far forward the pointer has moved. It can then access
   the WAL file directly and copy those bytes across to a less
   up-to-date copy on a Standby Server.
  </para>
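
  <para>
   As a rough sketch of that approach, assuming the
   pg_current_xlog_location() and pg_xlogfile_name_offset() functions of
   this release, an external poller might look like this (the loop
   interval and the copying step are left as placeholders):
  </para>
<programlisting>
#!/bin/sh
# Hypothetical record-based shipping loop, run against the Primary.
PREV=""
while true
do
    # Ask the server which WAL file the current write position is in,
    # and the byte offset within that file ("filename|offset").
    POS=$(psql -At -c "SELECT file_name, file_offset
                       FROM pg_xlogfile_name_offset(pg_current_xlog_location())")
    if [ "$POS" != "$PREV" ]; then
        # New WAL bytes exist: copy the range between PREV and POS from
        # pg_xlog/&lt;file_name&gt; to the Standby here (e.g. with dd and scp).
        PREV="$POS"
    fi
    sleep 1
done
</programlisting>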

 </sect2>
</sect1>

<sect1 id="migration">
 <title>Migration Between Releases</title>