Commit e05a8c5

Add section on reliable operation, talking about caching and storage
subsystem reliability.
1 parent 1c25594 commit e05a8c5

1 file changed

+105
-24
lines changed


‎doc/src/sgml/wal.sgml

Lines changed: 105 additions & 24 deletions
@@ -1,33 +1,114 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.31 2004/11/15 06:32:14 neilc Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.32 2005/09/28 18:18:02 momjian Exp $ -->
22

3-
<chapter id="wal">
4-
<title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
3+
<chapter id="reliability">
4+
<title>Reliability</title>
55

6-
<indexterm zone="wal">
7-
<primary>WAL</primary>
8-
</indexterm>
6+
<para>
7+
Reliability is a major feature of any serious database system, and
8+
<productname>PostgreSQL</> does everything possible to guarantee
9+
reliable operation. One aspect of reliable operation is that all data
10+
recorded by a transaction should be stored in a non-volatile area
11+
that is safe from power loss, operating system failure, and hardware
12+
failure (unrelated to the non-volatile area itself). To accomplish
13+
this, <productname>PostgreSQL</> uses the magnetic platters of modern
14+
disk drives for permanent storage that is immune to the failures
15+
listed above. In fact, a computer can be completely destroyed, but if
16+
the disk drives survive they can be moved to another computer with
17+
similar hardware and all committed transactions will remain intact.
18+
</para>
919

10-
<indexterm>
11-
<primary>transaction log</primary>
12-
<see>WAL</see>
13-
</indexterm>
20+
<para>
21+
While forcing data periodically to the disk platters might seem like
22+
a simple operation, it is not. Because disk drives are dramatically
23+
slower than main memory and CPUs, several layers of caching exist
24+
between the computer's main memory and the disk drive platters.
25+
First, there is the operating system kernel cache, which caches
26+
frequently requested disk blocks and delays disk writes. Fortunately,
27+
all operating systems give applications a way to force writes from
28+
the kernel cache to disk, and <productname>PostgreSQL</> uses those
29+
features. In fact, the <xref linkend="guc-wal-sync-method"> parameter
30+
controls how this is done.
31+
</para>
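
The paragraph above says applications can force writes out of the kernel cache. As a minimal sketch of that idea (not PostgreSQL's actual code), the Python snippet below writes a buffer and then calls `os.fsync`, which wraps the `fsync` system call; `fsync` is one of the methods that `wal_sync_method` can select. The function name `durable_write` and the file name are hypothetical and used only for illustration.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the kernel to flush it to the storage device.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel has pushed its cached copy to the device.
        # This cannot see past a volatile write-back cache in the disk
        # controller or in the drive itself.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "commit_record.bin")  # hypothetical name
        durable_write(target, b"committed transaction")
        with open(target, "rb") as fh:
            print(fh.read())
```

Note that the flush only covers the kernel cache; the controller and drive caches described in the following paragraphs are outside its reach.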
32+
<para>
33+
Secondly, there is an optional disk drive controller cache,
34+
particularly popular on <acronym>RAID</> controller cards. Some of
35+
these caches are <literal>write-through</>, meaning writes are passed
36+
along to the drive as soon as they arrive. Others are
37+
<literal>write-back</>, meaning data is passed on to the drive at
38+
some later time. Such caches can be a reliability problem because the
39+
disk controller card cache is volatile, unlike the disk drive
40+
platters, unless the disk drive controller has a battery-backed
41+
cache, meaning the card has a battery that maintains power to the
42+
cache in case of server power loss. When the disk drives are later
43+
accessible, the data is written to the drives.
44+
</para>
1445

1546
<para>
16-
<firstterm>Write-Ahead Logging</firstterm> (<acronym>WAL</acronym>)
17-
is a standard approach to transaction logging. Its detailed
18-
description may be found in most (if not all) books about
19-
transaction processing. Briefly, <acronym>WAL</acronym>'s central
20-
concept is that changes to data files (where tables and indexes
21-
reside) must be written only after those changes have been logged,
22-
that is, when log records describing the changes have been flushed
23-
to permanent storage. If we follow this procedure, we do not need
24-
to flush data pages to disk on every transaction commit, because we
25-
know that in the event of a crash we will be able to recover the
26-
database using the log: any changes that have not been applied to
27-
the data pages can be redone from the log records. (This is
28-
roll-forward recovery, also known as REDO.)
47+
And finally, most disk drives have caches. Some are write-through
48+
(typically SCSI), and some are write-back (typically IDE), and the
49+
same concerns about data loss exist for write-back drive caches as
50+
exist for disk controller caches. To have reliability, all
51+
storage subsystems must be reliable in their storage characteristics.
52+
When the operating system sends a write request to the drive platters,
53+
there is little it can do to make sure the data has arrived at a
54+
non-volatile storage area on the system. Rather, it is the
55+
administrator's responsibility to be sure that all storage components
56+
have reliable characteristics.
57+
</para>
58+
59+
<para>
60+
Another potential source of data loss is the disk platter writes
61+
themselves. Disk platters are internally made up of 512-byte sectors.
62+
When a write request arrives at the drive, it might be for 512 bytes,
63+
1024 bytes, or 8192 bytes, and the process of writing could fail due
64+
to power loss at any time, meaning some of the 512-byte sectors were
65+
written, and others were not, or the first half of a 512-byte sector
66+
has new data, and the remainder has the original data. Obviously, on
67+
startup, <productname>PostgreSQL</> would not be able to deal with
68+
these partially written cases. To guard against that,
69+
<productname>PostgreSQL</> periodically writes full page images to
70+
permanent storage <emphasis>before</> modifying the actual page on
71+
disk. By doing this, during recovery <productname>PostgreSQL</> can
72+
restore partially-written pages. If you have a battery-backed disk
73+
controller that prevents partial page writes, you can turn off this
74+
page imaging by using the <xref linkend="guc-full-page-writes">
75+
parameter.
76+
</para>
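
To make the partial-write case above concrete, here is a small simulation, assuming the 512-byte sectors and 8192-byte write size mentioned in the text. It is a simplified model, not PostgreSQL's recovery code: `torn_write` and `recover` are hypothetical names, and the "full page image" is simply a copy of the new page that was saved before the page itself was overwritten.

```python
from typing import Optional

SECTOR_SIZE = 512   # sector size stated in the text
PAGE_SIZE = 8192    # one of the write sizes mentioned in the text


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Return what is on disk if power fails after `sectors_written` sectors."""
    cut = sectors_written * SECTOR_SIZE
    # Sectors before the cut hold new data; the rest still hold old data.
    return new_page[:cut] + old_page[cut:]


def recover(on_disk: bytes, full_page_image: Optional[bytes]) -> bytes:
    """Replace a possibly torn page with the saved full page image, if any."""
    return full_page_image if full_page_image is not None else on_disk


if __name__ == "__main__":
    old = b"A" * PAGE_SIZE
    new = b"B" * PAGE_SIZE

    saved_image = new                  # saved before the page is modified
    torn = torn_write(old, new, 5)     # power loss after 5 of 16 sectors

    assert torn != old and torn != new          # page is a mix of both
    assert recover(torn, saved_image) == new    # image restores the page
    assert recover(torn, None) == torn          # without it, page stays torn
```

The last assertion corresponds to running with page imaging turned off: nothing is available to repair the mixed page, which is why the text limits that setting to hardware that prevents partial page writes.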
77+
78+
<para>
79+
The following sections go into detail about how the Write-Ahead Log
80+
is used to obtain efficient, reliable operation.
2981
</para>
3082

83+
<sect1 id="wal">
84+
<title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
85+
86+
<indexterm zone="wal">
87+
<primary>WAL</primary>
88+
</indexterm>
89+
90+
<indexterm>
91+
<primary>transaction log</primary>
92+
<see>WAL</see>
93+
</indexterm>
94+
95+
<para>
96+
<firstterm>Write-Ahead Logging</firstterm> (<acronym>WAL</acronym>)
97+
is a standard approach to transaction logging. Its detailed
98+
description may be found in most (if not all) books about
99+
transaction processing. Briefly, <acronym>WAL</acronym>'s central
100+
concept is that changes to data files (where tables and indexes
101+
reside) must be written only after those changes have been logged,
102+
that is, when log records describing the changes have been flushed
103+
to permanent storage. If we follow this procedure, we do not need
104+
to flush data pages to disk on every transaction commit, because we
105+
know that in the event of a crash we will be able to recover the
106+
database using the log: any changes that have not been applied to
107+
the data pages can be redone from the log records. (This is
108+
roll-forward recovery, also known as REDO.)
109+
</para>
110+
</sect1>
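
The central rule in the section above (log first, flush the log, and only then change the data) can be sketched with a toy key-value store. This is an illustration under simplifying assumptions, not how PostgreSQL stores WAL: the `ToyWAL` class, its JSON-lines log format, and the file name are all hypothetical.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy store that logs every change before applying it (hypothetical)."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.pages = {}  # in-memory "data pages"; lost on a crash

    def set(self, key: str, value: int) -> None:
        record = json.dumps({"key": key, "value": value})
        # 1. Append the log record and flush it to permanent storage.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only now change the data; it need not be flushed at commit.
        self.pages[key] = value

    def recover(self) -> None:
        """Roll forward (REDO): replay every logged change in order."""
        self.pages = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                rec = json.loads(line)
                self.pages[rec["key"]] = rec["value"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_wal.log")  # hypothetical file name
        db = ToyWAL(path)
        db.set("a", 1)
        db.set("b", 2)
        db.set("a", 3)

        # Simulate a crash: a fresh instance has lost the in-memory pages.
        restarted = ToyWAL(path)
        restarted.recover()
        assert restarted.pages == {"a": 3, "b": 2}
```

Because the log alone is enough to rebuild the state, the sketch never writes `pages` to disk at all, which mirrors the point that data pages need not be flushed on every commit.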
111+
31112
<sect1 id="wal-benefits">
32113
<title>Benefits of <acronym>WAL</acronym></title>
33114

@@ -238,7 +319,7 @@
238319
</sect1>
239320

240321
<sect1 id="wal-internals">
241-
<title>Internals</title>
322+
<title>WAL Internals</title>
242323

243324
<para>
244325
<acronym>WAL</acronym> is automatically enabled; no action is
