1 |  | -<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="backup">
|
4 | 4 | <title>Backup and Restore</title>
|
@@ -1203,6 +1203,312 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
|
1203 | 1203 | </sect2>
|
1204 | 1204 | </sect1>
|
1205 | 1205 |
|
| 1206 | + <sect1 id="warm-standby"> |
| 1207 | + <title>Warm Standby Servers for High Availability</title> |
| 1208 | + |
| 1209 | + <indexterm zone="backup"> |
| 1210 | + <primary>Warm Standby</primary> |
| 1211 | + </indexterm> |
| 1212 | + |
| 1213 | + <indexterm zone="backup"> |
| 1214 | + <primary>PITR Standby</primary> |
| 1215 | + </indexterm> |
| 1216 | + |
| 1217 | + <indexterm zone="backup"> |
| 1218 | + <primary>Standby Server</primary> |
| 1219 | + </indexterm> |
| 1220 | + |
| 1221 | + <indexterm zone="backup"> |
| 1222 | + <primary>Log Shipping</primary> |
| 1223 | + </indexterm> |
| 1224 | + |
| 1225 | + <indexterm zone="backup"> |
| 1226 | + <primary>Witness Server</primary> |
| 1227 | + </indexterm> |
| 1228 | + |
| 1229 | + <indexterm zone="backup"> |
| 1230 | + <primary>STONITH</primary> |
| 1231 | + </indexterm> |
| 1232 | + |
| 1233 | + <indexterm zone="backup"> |
| 1234 | + <primary>High Availability</primary> |
| 1235 | + </indexterm> |
| 1236 | + |
| 1237 | + <para> |
| 1238 | + Continuous Archiving can be used to create a High Availability (HA) |
| 1239 | + cluster configuration with one or more Standby Servers ready to take |
| 1240 | + over operations in the case that the Primary Server fails. This |
| 1241 | + capability is more widely known as Warm Standby Log Shipping. |
| 1242 | + </para> |
| 1243 | + |
| 1244 | + <para> |
| 1245 | + The Primary and Standby Servers work together to provide this capability, |
| 1246 | + though the servers are only loosely coupled. The Primary Server operates |
| 1247 | + in Continuous Archiving mode, while the Standby Server operates in a |
| 1248 | + continuous Recovery mode, reading the WAL files from the Primary. No |
| 1249 | + changes to the database tables are required to enable this capability, |
| 1250 | + so it offers a low administration overhead in comparison with other |
| 1251 | + replication approaches. This configuration also has a very low |
| 1252 | + performance impact on the Primary server. |
| 1253 | + </para> |
| 1254 | + |
| 1255 | + <para> |
| 1256 | + Directly moving WAL or "log" records from one database server to another |
| 1257 | + is typically described as Log Shipping. PostgreSQL implements file-based |
| 1258 | + Log Shipping, meaning WAL records are batched one file at a time. WAL |
| 1259 | + files can be shipped easily and cheaply over any distance, whether it be |
| 1260 | + to an adjacent system, another system on the same site or another system |
| 1261 | + on the far side of the globe. The bandwidth required for this technique |
| 1262 | + varies according to the transaction rate of the Primary Server. |
| 1263 | + Record-based Log Shipping is also possible with custom-developed |
| 1264 | + procedures, discussed in a later section. Future developments are likely |
| 1265 | + to include options for synchronous and/or integrated record-based log |
| 1266 | + shipping. |
| 1267 | + </para> |
| 1268 | + |
| 1269 | + <para> |
| 1270 | + It should be noted that the log shipping is asynchronous, i.e. the WAL |
| 1271 | + records are shipped after transaction commit. As a result there can be a |
| 1272 | + small window of data loss, should the Primary Server suffer a |
| 1273 | + catastrophic failure. The window of data loss is minimised by the use of |
| 1274 | + the archive_timeout parameter, which can be set as low as a few seconds |
| 1275 | + if required. A very low setting can increase the bandwidth requirements |
| 1276 | + for file shipping. |
| 1277 | + </para> |
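|  | + |
|  | + <para> |
|  | + As an illustration only, the data-loss window might be bounded with |
|  | + settings along these lines in postgresql.conf on the Primary (the |
|  | + command and path shown are examples, not requirements): |
|  | +<programlisting> |
|  | +archive_command = 'cp %p /mnt/server/archivedir/%f'  # example archiving command |
|  | +archive_timeout = 60                                  # force a WAL file switch at least once a minute |
|  | +</programlisting> |
|  | + </para> |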
| 1278 | + |
| 1279 | + <para> |
| 1280 | + The Standby server is not available for access, since it is continually |
| 1281 | + performing recovery processing. Recovery performance is sufficiently |
| 1282 | + good that the Standby will typically be only minutes away from full |
| 1283 | + availability once it has been activated. As a result, we refer to this |
| 1284 | + capability as a Warm Standby configuration that offers High |
| 1285 | + Availability. Restoring a server from an archived base backup and |
| 1286 | + rollforward can take considerably longer and so that technique only |
| 1287 | + really offers a solution for Disaster Recovery, not HA. |
| 1288 | + </para> |
| 1289 | + |
| 1290 | + <para> |
| 1291 | + Other mechanisms for High Availability replication are available, both |
| 1292 | + commercially and as open-source software. |
| 1293 | + </para> |
| 1294 | + |
| 1295 | + <para> |
| 1296 | + In general, log shipping between servers running different release |
| 1297 | + levels will not be possible. It is the policy of the PostgreSQL Global |
| 1298 | + Development Group not to make changes to disk formats during minor release |
| 1299 | + upgrades, so it is likely that running different minor release levels |
| 1300 | + on Primary and Standby servers will work successfully. However, no |
| 1301 | + formal support for that is offered and you are advised not to allow this |
| 1302 | + to occur over long periods. |
| 1303 | + </para> |
| 1304 | + |
| 1305 | + <sect2 id="warm-standby-planning"> |
| 1306 | + <title>Planning</title> |
| 1307 | + |
| 1308 | + <para> |
| 1309 | + On the Standby server all tablespaces and paths will refer to similarly |
| 1310 | + named mount points, so it is important to create the Primary and Standby |
| 1311 | + servers so that they are as similar as possible, at least from the |
| 1312 | + perspective of the database server. Furthermore, any CREATE TABLESPACE |
| 1313 | + commands will be passed across as-is, so any new mount points must be |
| 1314 | + created on both servers before they are used on the Primary. Hardware |
| 1315 | + need not be the same, but experience shows that maintaining two |
| 1316 | + identical systems is easier than maintaining two dissimilar ones over |
| 1317 | + the whole lifetime of the application and system. |
| 1318 | + </para> |
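|  | + |
|  | + <para> |
|  | + For example (the tablespace name and directory here are illustrative |
|  | + only), a command such as |
|  | +<programlisting> |
|  | +CREATE TABLESPACE bulkdata LOCATION '/mnt/disk2/pg_bulkdata'; |
|  | +</programlisting> |
|  | + executed on the Primary will be replayed as-is on the Standby, so the |
|  | + directory /mnt/disk2/pg_bulkdata must already exist there too. |
|  | + </para> |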
| 1319 | + |
| 1320 | + <para> |
| 1321 | + There is no special mode required to enable a Standby server. The |
| 1322 | + operations that occur on both Primary and Standby servers are entirely |
| 1323 | + normal continuous archiving and recovery tasks. The primary point of |
| 1324 | + contact between the two database servers is the archive of WAL files |
| 1325 | + that both share: Primary writing to the archive, Standby reading from |
| 1326 | + the archive. Care must be taken to ensure that WAL archives for separate |
| 1327 | + servers do not become mixed together or confused. |
| 1328 | + </para> |
| 1329 | + |
| 1330 | + <para> |
| 1331 | + The magic that makes the two loosely coupled servers work together is |
| 1332 | + simply a restore_command that waits for the next WAL file to be archived |
| 1333 | + from the Primary. The restore_command is specified in the recovery.conf |
| 1334 | + file on the Standby Server. Normal recovery processing would request a |
| 1335 | + file from the WAL archive, causing an error if the file was unavailable. |
| 1336 | + For Standby processing it is normal for the next file to be unavailable, |
| 1337 | + so we must be patient and wait for it to appear. A waiting |
| 1338 | + restore_command can be written as a custom script that loops after |
| 1339 | + polling for the existence of the next WAL file. There must also be some |
| 1340 | + way to trigger failover, which should interrupt the restore_command, |
| 1341 | + break the loop and return a file not found error to the Standby Server. |
| 1342 | + This then ends recovery and the Standby will then come up as a normal |
| 1343 | + server. |
| 1344 | + </para> |
| 1345 | + |
| 1346 | + <para> |
| 1347 | + Sample code for the C version of the restore_command would be: |
| 1348 | +<programlisting> |
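|  | +/* NextWALFileReady(), CheckForExternalTrigger() and CopyWALFileForRecovery() |
|  | + * stand for routines to be supplied by the administrator */ |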
| 1349 | +triggered = false; |
| 1350 | +while (!NextWALFileReady() && !triggered) |
| 1351 | +{ |
| 1352 | + usleep(100000L); /* wait for ~0.1 sec */ |
| 1353 | + if (CheckForExternalTrigger()) |
| 1354 | + triggered = true; |
| 1355 | +} |
| 1356 | +if (!triggered) |
| 1357 | + CopyWALFileForRecovery(); |
| 1358 | +</programlisting> |
| 1359 | + </para> |
| 1360 | + |
| 1361 | + <para> |
| 1362 | + PostgreSQL does not provide the system software required to identify a |
| 1363 | + failure on the Primary and notify the Standby system, and then the |
| 1364 | + Standby database server. Many such tools exist and are well integrated |
| 1365 | + with other aspects of system failover, such as IP address migration. |
| 1366 | + </para> |
| 1367 | + |
| 1368 | + <para> |
| 1369 | + Triggering failover is an important part of planning and design. The |
| 1370 | + restore_command is executed in full once for each WAL file. The process |
| 1371 | + running the restore_command is therefore created and dies for each file, |
| 1372 | + so there is no daemon or server process, and we cannot use signals or |
| 1373 | + a signal handler. A more permanent notification is required to trigger |
| 1374 | + the failover. It is possible to use a simple timeout facility, |
| 1375 | + especially if used in conjunction with a known archive_timeout setting |
| 1376 | + on the Primary. This is somewhat error prone since a network problem or |
| 1377 | + busy Primary server might be sufficient to initiate failover. A notification |
| 1378 | + mechanism such as the explicit creation of a trigger file is less error |
| 1379 | + prone, if this can be arranged. |
| 1380 | + </para> |
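|  | + |
|  | + <para> |
|  | + A trigger-file check of the kind assumed by the sample code above |
|  | + could, as a sketch only, look like the following; the path of the |
|  | + trigger file is arbitrary and must match whatever the failover |
|  | + procedure creates: |
|  | +<programlisting> |
|  | +#include <stdbool.h> |
|  | +#include <sys/stat.h> |
|  | + |
|  | +#define TRIGGER_FILE  "/var/lib/pgsql/failover.trigger"  /* example path only */ |
|  | + |
|  | +/* Returns true once the failover trigger file has been created */ |
|  | +bool |
|  | +CheckForExternalTrigger(void) |
|  | +{ |
|  | +    struct stat statbuf; |
|  | + |
|  | +    return stat(TRIGGER_FILE, &statbuf) == 0; |
|  | +} |
|  | +</programlisting> |
|  | + </para> |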
| 1381 | + </sect2> |
| 1382 | + |
| 1383 | + <sect2 id="warm-standby-config"> |
| 1384 | + <title>Implementation</title> |
| 1385 | + |
| 1386 | + <para> |
| 1387 | + The short procedure for configuring a Standby Server is as follows. For |
| 1388 | + full details of each step, refer to previous sections as noted. |
| 1389 | + <orderedlist> |
| 1390 | + <listitem> |
| 1391 | + <para> |
| 1392 | + Set up Primary and Standby systems as near identically as possible, |
| 1393 | + including two identical copies of PostgreSQL at the same release level. |
| 1394 | + </para> |
| 1395 | + </listitem> |
| 1396 | + <listitem> |
| 1397 | + <para> |
| 1398 | + Set up Continuous Archiving from the Primary to a WAL archive located |
| 1399 | + in a directory on the Standby Server. Ensure that both <xref |
| 1400 | + linkend="guc-archive-command"> and <xref linkend="guc-archive-timeout"> |
| 1401 | + are set. (See <xref linkend="backup-archiving-wal">) |
| 1402 | + </para> |
| 1403 | + </listitem> |
| 1404 | + <listitem> |
| 1405 | + <para> |
| 1406 | + Make a Base Backup of the Primary Server. (See <xref |
| 1407 | + linkend="backup-base-backup">) |
| 1408 | + </para> |
| 1409 | + </listitem> |
| 1410 | + <listitem> |
| 1411 | + <para> |
| 1412 | + Begin recovery on the Standby Server from the local WAL archive, |
| 1413 | + using a recovery.conf that specifies a restore_command that waits |
| 1414 | + as described previously; an example appears below. (See <xref linkend="backup-pitr-recovery">) |
| 1415 | + </para> |
| 1416 | + </listitem> |
| 1417 | + </orderedlist> |
| 1418 | + </para> |
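|  | + |
|  | + <para> |
|  | + The recovery.conf referred to in the final step might contain just a |
|  | + single line; the waiting script named here is purely illustrative and |
|  | + stands for whatever looping command you have implemented: |
|  | +<programlisting> |
|  | +restore_command = '/usr/local/bin/wait_for_wal /mnt/server/archivedir %f "%p"' |
|  | +</programlisting> |
|  | + </para> |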
| 1419 | + |
| 1420 | + <para> |
| 1421 | + Recovery treats the WAL Archive as read-only, so once a WAL file has |
| 1422 | + been copied to the Standby system it can be copied to tape at the same |
| 1423 | + time as it is being used by the Standby database server to recover. |
| 1424 | + Thus, running a Standby Server for High Availability can be performed at |
| 1425 | + the same time as files are stored for longer term Disaster Recovery |
| 1426 | + purposes. |
| 1427 | + </para> |
| 1428 | + |
| 1429 | + <para> |
| 1430 | + For testing purposes, it is possible to run both Primary and Standby |
| 1431 | + servers on the same system. This does not provide any worthwhile |
| 1432 | + improvement on server robustness, nor would it be described as HA. |
| 1433 | + </para> |
| 1434 | + </sect2> |
| 1435 | + |
| 1436 | + <sect2 id="warm-standby-failover"> |
| 1437 | + <title>Failover</title> |
| 1438 | + |
| 1439 | + <para> |
| 1440 | + If the Primary Server fails then the Standby Server should begin |
| 1441 | + failover procedures. |
| 1442 | + </para> |
| 1443 | + |
| 1444 | + <para> |
| 1445 | + If the Standby Server fails then no failover need take place. If the |
| 1446 | + Standby Server can be restarted, then the recovery process can also be |
| 1447 | + immediately restarted, taking advantage of Restartable Recovery. |
| 1448 | + </para> |
| 1449 | + |
| 1450 | + <para> |
| 1451 | + If the Primary Server fails and then immediately restarts, you must have |
| 1452 | + a mechanism for informing it that it is no longer the Primary. This is |
| 1453 | + sometimes known as STONITH (Shoot The Other Node In The Head), which is |
| 1454 | + necessary to avoid situations where both systems think they are the |
| 1455 | + Primary, which can lead to confusion and ultimately data loss. |
| 1456 | + </para> |
| 1457 | + |
| 1458 | + <para> |
| 1459 | + Many failover systems use just two systems, the Primary and the Standby, |
| 1460 | + connected by some kind of heartbeat mechanism to continually verify the |
| 1461 | + connectivity between the two and the viability of the Primary. It is |
| 1462 | + also possible to use a third system, known as a Witness Server to avoid |
| 1463 | + some problems of inappropriate failover, but the additional complexity |
| 1464 | + may not be worthwhile unless it is set up with sufficient care and |
| 1465 | + rigorous testing. |
| 1466 | + </para> |
| 1467 | + |
| 1468 | + <para> |
| 1469 | + At the instant that failover takes place to the Standby, we have only a |
| 1470 | + single server in operation. This is known as a degenerate state. |
| 1471 | + The former Standby is now the Primary, but the former Primary is down |
| 1472 | + and may stay down. We must now fully re-create a Standby server, |
| 1473 | + either on the former Primary system when it comes up, or on a third, |
| 1474 | + possibly new, system. Once complete the Primary and Standby can be |
| 1475 | + considered to have switched roles. Some people choose to use a third |
| 1476 | + server to provide additional protection across the failover interval, |
| 1477 | + though clearly this complicates the system configuration and |
| 1478 | + operational processes (and this can also act as a Witness Server). |
| 1479 | + </para> |
| 1480 | + |
| 1481 | + <para> |
| 1482 | + So, switching from Primary to Standby Server can be fast, but requires |
| 1483 | + some time to re-prepare the failover cluster. Regular switching from |
| 1484 | + Primary to Standby is encouraged, since it allows the regular downtime |
| 1485 | + on each system required to maintain HA. This also acts as a test of the |
| 1486 | + failover so that it definitely works when you really need it. Written |
| 1487 | + administration procedures are advised. |
| 1488 | + </para> |
| 1489 | + </sect2> |
| 1490 | + |
| 1491 | + <sect2 id="warm-standby-record"> |
| 1492 | + <title>Implementing Record-based Log Shipping</title> |
| 1493 | + |
| 1494 | + <para> |
| 1495 | + The main features for Log Shipping in this release are based around the |
| 1496 | + file-based Log Shipping described above. It is also possible to |
| 1497 | + implement record-based Log Shipping using the pg_xlogfile_name_offset() |
| 1498 | + function, though this requires custom development. |
| 1499 | + </para> |
| 1500 | + |
| 1501 | + <para> |
| 1502 | + An external program can call pg_xlogfile_name_offset() to find out the |
| 1503 | + filename and the exact byte offset within it of the latest WAL pointer. |
| 1504 | + If the external program regularly polls the server it can find out how |
| 1505 | + far forward the pointer has moved. It can then access the WAL file |
| 1506 | + directly and copy those bytes across to a less up-to-date copy on a |
| 1507 | + Standby Server. |
| 1508 | + </para> |
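|  | + |
|  | + <para> |
|  | + For example, such a program might poll the Primary with a query like |
|  | + the following to learn the current WAL file name and byte offset, |
|  | + comparing the result with the previous poll to see how many new bytes |
|  | + need to be copied to the Standby: |
|  | +<programlisting> |
|  | +SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location()); |
|  | +</programlisting> |
|  | + </para> |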
| 1509 | + </sect2> |
| 1510 | + </sect1> |
| 1511 | + |
1206 | 1512 | <sect1 id="migration">
|
1207 | 1513 | <title>Migration Between Releases</title>
|
1208 | 1514 |
|
|