Commit 6b0be33

Update WAL configuration discussion to reflect post-7.1 tweaking.
Minor copy-editing.

1 parent 8394e47 · commit 6b0be33

doc/src/sgml/wal.sgml: 80 additions, 39 deletions
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.11 2001/09/29 04:02:19 tgl Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.12 2001/10/26 23:10:21 tgl Exp $ -->
 
 <chapter id="wal">
 <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
@@ -88,8 +88,11 @@
 transaction identifiers. Once UNDO is implemented,
 <filename>pg_clog</filename> will no longer be required to be
 permanent; it will be possible to remove
-<filename>pg_clog</filename> at shutdown, split it into segments
-and remove old segments.
+<filename>pg_clog</filename> at shutdown. (However, the urgency
+of this concern has decreased greatly with the adoption of a segmented
+storage method for <filename>pg_clog</filename> --- it is no longer
+necessary to keep old <filename>pg_clog</filename> entries around
+forever.)
 </para>
 
 <para>
@@ -116,6 +119,18 @@
 copying the data files (operating system copy commands are not
 suitable).
 </para>
+
+<para>
+A difficulty standing in the way of realizing these benefits is that they
+require saving <acronym>WAL</acronym> entries for considerable periods
+of time (eg, as long as the longest possible transaction if transaction
+UNDO is wanted). The present <acronym>WAL</acronym> format is
+extremely bulky since it includes many disk page snapshots.
+This is not a serious concern at present, since the entries only need
+to be kept for one or two checkpoint intervals; but to achieve
+these future benefits some sort of compressed <acronym>WAL</acronym>
+format will be needed.
+</para>
 </sect2>
 </sect1>
 
@@ -133,8 +148,8 @@
 <para>
 <acronym>WAL</acronym> logs are stored in the directory
 <Filename><replaceable>$PGDATA</replaceable>/pg_xlog</Filename>, as
-a set of segment files, each 16 MB in size. Each segment is
-divided into 8 kB pages. The log record headers are described in
+a set of segment files, each 16MB in size. Each segment is
+divided into 8KB pages. The log record headers are described in
 <filename>access/xlog.h</filename>; record content is dependent on
 the type of event that is being logged. Segment files are given
 ever-increasing numbers as names, starting at
@@ -147,8 +162,8 @@
 The <acronym>WAL</acronym> buffers and control structure are in
 shared memory, and are handled by the backends; they are protected
 by lightweight locks. The demand on shared memory is dependent on the
-number of buffers; the default size of the <acronym>WAL</acronym>
-buffers is 64 kB.
+number of buffers. The default size of the <acronym>WAL</acronym>
+buffers is 8 8KB buffers, or 64KB.
 </para>
 
 <para>
@@ -166,8 +181,8 @@
 disk drives that falsely report a successful write to the kernel,
 when, in fact, they have only cached the data and not yet stored it
 on the disk. A power failure in such a situation may still lead to
-irrecoverable data corruption; administrators should try to ensure
-that disks holding <productname>PostgreSQL</productname>'s data and
+irrecoverable data corruption. Administrators should try to ensure
+that disks holding <productname>PostgreSQL</productname>'s
 log files do not make such false reports.
 </para>
 
@@ -179,11 +194,12 @@
 checkpoint's position is saved in the file
 <filename>pg_control</filename>. Therefore, when recovery is to be
 done, the backend first reads <filename>pg_control</filename> and
-then the checkpoint record; next it reads the redo record, whose
-position is saved in the checkpoint, and begins the REDO operation.
-Because the entire content of the pages is saved in the log on the
-first page modification after a checkpoint, the pages will be first
-restored to a consistent state.
+then the checkpoint record; then it performs the REDO operation by
+scanning forward from the log position indicated in the checkpoint
+record.
+Because the entire content of data pages is saved in the log on the
+first page modification after a checkpoint, all pages changed since
+the checkpoint will be restored to a consistent state.
 </para>
 
 <para>
@@ -217,9 +233,9 @@
 buffers. This is undesirable because <function>LogInsert</function>
 is used on every database low level modification (for example,
 tuple insertion) at a time when an exclusive lock is held on
-affected data pages and the operation is supposed to be as fast as
-possible; what is worse, writing <acronym>WAL</acronym> buffers may
-also cause the creation of a new log segment, which takes even more
+affected data pages, so the operation needs to be as fast as
+possible. What is worse, writing <acronym>WAL</acronym> buffers may
+also force the creation of a new log segment, which takes even more
 time. Normally, <acronym>WAL</acronym> buffers should be written
 and flushed by a <function>LogFlush</function> request, which is
 made, for the most part, at transaction commit time to ensure that
@@ -230,7 +246,7 @@
 one should increase the number of <acronym>WAL</acronym> buffers by
 modifying the <varname>WAL_BUFFERS</varname> parameter. The default
 number of <acronym>WAL</acronym> buffers is 8. Increasing this
-value will have an impact on shared memory usage.
+value will correspondingly increase shared memory usage.
 </para>
 
 <para>
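
A minimal sketch of the tuning this hunk describes, as it might look in
postgresql.conf (the lowercase spelling and the value 32 are assumptions
for illustration, not a recommendation):

    # Each WAL buffer holds one 8KB log page; the default is 8 buffers (64KB).
    # Raising the count to 32 would use 32 * 8KB = 256KB of shared memory.
    wal_buffers = 32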
@@ -243,34 +259,28 @@
 log (known as the redo record) it should start the REDO operation,
 since any changes made to data files before that record are already
 on disk. After a checkpoint has been made, any log segments written
-before the undo records are removed, so checkpoints are used to free
-disk space in the <acronym>WAL</acronym> directory. (When
-<acronym>WAL</acronym>-based <acronym>BAR</acronym> is implemented,
-the log segments can be archived instead of just being removed.)
-The checkpoint maker is also able to create a few log segments for
-future use, so as to avoid the need for
-<function>LogInsert</function> or <function>LogFlush</function> to
-spend time in creating them.
+before the undo records are no longer needed and can be recycled or
+removed. (When <acronym>WAL</acronym>-based <acronym>BAR</acronym> is
+implemented, the log segments would be archived before being recycled
+or removed.)
 </para>
 
 <para>
-The <acronym>WAL</acronym> log is held on the disk as a set of 16
-MB files called <firstterm>segments</firstterm>. By default a new
-segment is created only if more than 75% of the current segment is
-used. One can instruct the server to pre-create up to 64 log segments
+The checkpoint maker is also able to create a few log segments for
+future use, so as to avoid the need for
+<function>LogInsert</function> or <function>LogFlush</function> to
+spend time in creating them. (If that happens, the entire database
+system will be delayed by the creation operation, so it's better if
+the files can be created in the checkpoint maker, which is not on
+anyone's critical path.)
+By default a new 16MB segment file is created only if more than 75% of
+the current segment has been used. This is inadequate if the system
+generates more than 4MB of log output between checkpoints.
+One can instruct the server to pre-create up to 64 log segments
 at checkpoint time by modifying the <varname>WAL_FILES</varname>
 configuration parameter.
 </para>
 
-<para>
-For faster after-crash recovery, it would be better to create
-checkpoints more often. However, one should balance this against
-the cost of flushing dirty data pages; in addition, to ensure data
-page consistency, the first modification of a data page after each
-checkpoint results in logging the entire page content, thus
-increasing output to log and the log's size.
-</para>
-
 <para>
 The postmaster spawns a special backend process every so often
 to create the next checkpoint. A checkpoint is created every
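
To illustrate the pre-creation behavior described above, a hypothetical
postgresql.conf entry (the value 16 is arbitrary; the documented maximum
is 64):

    # Pre-create up to 16 of the 16MB segment files at each checkpoint,
    # useful when more than 4MB of log is generated between checkpoints.
    wal_files = 16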
@@ -281,6 +291,35 @@
 <command>CHECKPOINT</command>.
 </para>
 
+<para>
+Reducing <varname>CHECKPOINT_SEGMENTS</varname> and/or
+<varname>CHECKPOINT_TIMEOUT</varname> causes checkpoints to be
+done more often. This allows faster after-crash recovery (since
+less work will need to be redone). However, one must balance this against
+the increased cost of flushing dirty data pages more often. In addition,
+to ensure data page consistency, the first modification of a data page
+after each checkpoint results in logging the entire page content.
+Thus a smaller checkpoint interval increases the volume of output to
+the log, partially negating the goal of using a smaller interval, and
+in any case causing more disk I/O.
+</para>
+
+<para>
+The number of 16MB segment files will always be at least
+<varname>WAL_FILES</varname> + 1, and will normally not exceed
+<varname>WAL_FILES</varname> + 2 * <varname>CHECKPOINT_SEGMENTS</varname>
++ 1. This may be used to estimate space requirements for WAL. Ordinarily,
+when an old log segment file is no longer needed, it is recycled (renamed
+to become the next sequential future segment). If, due to a short-term
+peak of log output rate, there are more than <varname>WAL_FILES</varname> +
+2 * <varname>CHECKPOINT_SEGMENTS</varname> + 1 segment files, then unneeded
+segment files will be deleted instead of recycled until the system gets
+back under this limit. (If this happens on a regular basis,
+<varname>WAL_FILES</varname> should be increased to avoid it. Deleting log
+segments that will only have to be created again later is expensive and
+pointless.)
+</para>
+
 <para>
 The <varname>COMMIT_DELAY</varname> parameter defines for how many
 microseconds the backend will sleep after writing a commit
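
A worked example of the space estimate above, using made-up parameter
values:

    WAL_FILES = 8, CHECKPOINT_SEGMENTS = 3, segment size = 16MB
    minimum:        WAL_FILES + 1                           =  9 segments (144MB)
    normal maximum: WAL_FILES + 2 * CHECKPOINT_SEGMENTS + 1 = 15 segments (240MB)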
@@ -294,6 +333,8 @@
 Note that on most platforms, the resolution of a sleep request is
 ten milliseconds, so that any nonzero <varname>COMMIT_DELAY</varname>
 setting between 1 and 10000 microseconds will have the same effect.
+Good values for these parameters are not yet clear; experimentation
+is encouraged.
 </para>
 
 <para>
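
Since the patch itself says good values are not yet clear, any concrete
setting is only a starting point for the experimentation it encourages; a
hypothetical postgresql.conf entry:

    # Sleep 1000 microseconds (1ms) after writing a commit record, hoping
    # that concurrent commits can share one log flush. On platforms with
    # 10ms sleep resolution, any value from 1 to 10000 acts the same.
    commit_delay = 1000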
