11<!--
2- $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.15 2002/06/22 04:08:07 momjian Exp $
2+ $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.16 2002/06/23 03:37:12 momjian Exp $
33-->
44
55<chapter id="maintenance">
@@ -55,8 +55,8 @@ $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.15 2002/06/22 04:08:07
5555 </indexterm>
5656
5757 <para>
58- <productname>PostgreSQL</productname>'s <command>VACUUM</> command must be
59- run on a regular basis for several reasons:
58+ <productname>PostgreSQL</productname>'s <command>VACUUM</> command
59+ must be run on a regular basis for several reasons:
6060
6161 <orderedlist>
6262 <listitem>
@@ -100,26 +100,27 @@ $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.15 2002/06/22 04:08:07
100100 </indexterm>
101101
102102 <para>
103- In normal <productname>PostgreSQL</productname> operation, an <command>UPDATE</> or
104- <command>DELETE</> of a row does not immediately remove the old <firstterm>tuple</>
105- (version of therow). This approach is necessary to gain thebenefits
106- of multiversion concurrency control (see the<citetitle>User's Guide</>):
107- the tuple must not be deleted while
108- it is still potentially visible to other transactions. But eventually,
109- an outdated or deleted tuple is no longer of interest to any transaction.
110- The space it occupies must be reclaimed for reuse by new tuples, to avoid
111- infinite growth of disk space requirements. This is done by running
112- <command>VACUUM</>.
103+ In normal <productname>PostgreSQL</productname> operation, an
104+ <command>UPDATE</> or <command> DELETE</> of a row does not
105+ immediately remove theold <firstterm>tuple</> (version of therow).
106+ This approach is necessary to gain thebenefits of multiversion
107+ concurrency control (see the <citetitle>User's Guide</>): the tuple
108+ must not be deleted while it is still potentially visible to other
109+ transactions. But eventually, an outdated or deleted tuple is no
110+ longer of interest to any transaction. The space it occupies must be
111+ reclaimed for reuse by new tuples, to avoid infinite growth of disk
112+ space requirements. This is done by running <command>VACUUM</>.
113113 </para>
114114
115115 <para>
116116 Clearly, a table that receives frequent updates or deletes will need
117- to be vacuumed more often than tables that are seldom updated. It may
118- be useful to set up periodic <application>cron</> tasks that vacuum only selected tables,
119- skipping tables that are known not to change often. This is only likely
120- to be helpful if you have both large heavily-updated tables and large
121- seldom-updated tables --- the extra cost of vacuuming a small table
122- isn't enough to be worth worrying about.
117+ to be vacuumed more often than tables that are seldom updated. It
118+ may be useful to set up periodic <application>cron</> tasks that
119+ vacuum only selected tables, skipping tables that are known not to
120+ change often. This is only likely to be helpful if you have both
121+ large heavily-updated tables and large seldom-updated tables --- the
122+ extra cost of vacuuming a small table isn't enough to be worth
123+ worrying about.
123124 </para>
124125
125126 <para>
@@ -174,18 +175,18 @@ $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.15 2002/06/22 04:08:07
174175
175176 <para>
176177 As with vacuuming for space recovery, frequent updates of statistics
177- are more useful for heavily-updated tables than for seldom-updated ones.
178- But even for a heavily-updated table, there may be no need for
179- statistics updates if the statistical distribution of the data is not
180- changing much. A simple rule of thumb is to think about how much
178+ are more useful for heavily-updated tables than for seldom-updated
179+ ones. But even for a heavily-updated table, there may be no need for
180+ statistics updates if the statistical distribution of the data is
181+ not changing much. A simple rule of thumb is to think about how much
181182 the minimum and maximum values of the columns in the table change.
182- For example, a <type>timestamp</type> column that contains the time of row update
183- will have a constantly-increasing maximum value as rows are added and
184- updated; such a column will probably need more frequent statistics
185- updates than, say, a column containing URLs for pages accessed on a
186- website. The URL column may receive changes just as often, but the
187- statistical distribution of its values probably changes relatively
188- slowly.
183+ For example, a <type>timestamp</type> column that contains the time
184+ of row update will have a constantly-increasing maximum value as
185+ rows are added and updated; such a column will probably need more
186+ frequent statistics updates than, say, a column containing URLs for
187+ pages accessed on a website. The URL column may receive changes just
188+ as often, but the statistical distribution of its values probably
189+ changes relatively slowly.
189190 </para>
190191
191192 <para>
@@ -247,42 +248,45 @@ $Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.15 2002/06/22 04:08:07
247248
248249 <para>
249250 Prior to <productname>PostgreSQL</productname> 7.2, the only defense
250- against XID wraparound was to re-<command>initdb</> at least every 4 billion
251- transactions. This of course was not very satisfactory for high-traffic
252- sites, so a better solution has been devised. The new approach allows an
253- installation to remain up indefinitely, without <command>initdb</> or any sort of
254- restart. The price is this maintenance requirement:
255- <emphasis>every table in the database must be vacuumed at least once every
256- billion transactions</emphasis>.
251+ against XID wraparound was to re-<command>initdb</> at least every 4
252+ billion transactions. This of course was not very satisfactory for
253+ high-traffic sites, so a better solution has been devised. The new
254+ approach allows an installation to remain up indefinitely, without
255+ <command>initdb</> or any sort of restart. The price is this
256+ maintenance requirement: <emphasis>every table in the database must
257+ be vacuumed at least once every billion transactions</emphasis>.
257258 </para>
258259
259260 <para>
260- In practice this isn't an onerous requirement, but since the consequences
261- of failing to meet it can be complete data loss (not just wasted disk
262- space or slow performance), some special provisions have been made to help
263- database administrators keep track of the time since the last
264- <command>VACUUM</>. The remainder of this section gives the details.
261+ In practice this isn't an onerous requirement, but since the
262+ consequences of failing to meet it can be complete data loss (not
263+ just wasted disk space or slow performance), some special provisions
264+ have been made to help database administrators keep track of the
265+ time since the last <command>VACUUM</>. The remainder of this
266+ section gives the details.
265267 </para>
266268
267269 <para>
268- The new approach to XID comparison distinguishes two special XIDs, numbers
269- 1 and 2 (<literal>BootstrapXID</> and <literal>FrozenXID</>). These two
270- XIDs are always considered older than every normal XID. Normal XIDs (those
271- greater than 2) are compared using modulo-2<superscript>31</> arithmetic. This means
270+ The new approach to XID comparison distinguishes two special XIDs,
271+ numbers 1 and 2 (<literal>BootstrapXID</> and
272+ <literal>FrozenXID</>). These two XIDs are always considered older
273+ than every normal XID. Normal XIDs (those greater than 2) are
274+ compared using modulo-2<superscript>31</> arithmetic. This means
272275 that for every normal XID, there are two billion XIDs that are
273- <quote>older</> and two billion that are <quote>newer</>; another way to
274- say it is that the normal XID space is circular with no endpoint.
275- Therefore, once a tuple has been created with a particular normal XID, the
276- tuple will appear to be <quote>in the past</> for the next two billion
277- transactions, no matter which normal XID we are talking about. If the
278- tuple still exists after more than two billion transactions, it will
279- suddenly appear to be in the future. To prevent data loss, old tuples
280- must be reassigned the XID <literal>FrozenXID</> sometime before they reach
281- the two-billion-transactions-old mark. Once they are assigned this
282- special XID, they will appear to be <quote>in the past</> to all normal
283- transactions regardless of wraparound issues, and so such tuples will be
284- good until deleted, no matter how long that is. This reassignment of
285- XID is handled by <command>VACUUM</>.
276+ <quote>older</> and two billion that are <quote>newer</>; another
277+ way to say it is that the normal XID space is circular with no
278+ endpoint. Therefore, once a tuple has been created with a particular
279+ normal XID, the tuple will appear to be <quote>in the past</> for
280+ the next two billion transactions, no matter which normal XID we are
281+ talking about. If the tuple still exists after more than two billion
282+ transactions, it will suddenly appear to be in the future. To
283+ prevent data loss, old tuples must be reassigned the XID
284+ <literal>FrozenXID</> sometime before they reach the
285+ two-billion-transactions-old mark. Once they are assigned this
286+ special XID, they will appear to be <quote>in the past</> to all
287+ normal transactions regardless of wraparound issues, and so such
288+ tuples will be good until deleted, no matter how long that is. This
289+ reassignment of XID is handled by <command>VACUUM</>.
286290 </para>
287291
288292 <para>
@@ -346,21 +350,22 @@ VACUUM
346350 <para>
347351 <command>VACUUM</> with the <command>FREEZE</> option uses a more
348352 aggressive freezing policy: tuples are frozen if they are old enough
349- to be considered good by all open transactions. In particular, if
350- a <command>VACUUM FREEZE</> is performed in an otherwise-idle database,
351- it is guaranteed that <emphasis>all</> tuples in that database will be
352- frozen. Hence, as long as the database is not modified in any way, it
353- will not need subsequent vacuuming to avoid transaction ID wraparound
354- problems. This technique is used by <filename>initdb</> to prepare the
355- <filename>template0</> database. It should also be used to prepare any
356- user-created databases that are to be marked <literal>datallowconn</> =
357- <literal>false</> in <filename>pg_database</>, since there isn't any
358- convenient way to vacuum a database that you can't connect to. Note
359- that <command>VACUUM</command>'s automatic warning message about unvacuumed databases will
360- ignore <filename>pg_database</> entries with <literal>datallowconn</> =
361- <literal>false</>, so as to avoid giving false warnings about these
362- databases; therefore it's up to you to ensure that such databases are
363- frozen correctly.
353+ to be considered good by all open transactions. In particular, if a
354+ <command>VACUUM FREEZE</> is performed in an otherwise-idle
355+ database, it is guaranteed that <emphasis>all</> tuples in that
356+ database will be frozen. Hence, as long as the database is not
357+ modified in any way, it will not need subsequent vacuuming to avoid
358+ transaction ID wraparound problems. This technique is used by
359+ <filename>initdb</> to prepare the <filename>template0</> database.
360+ It should also be used to prepare any user-created databases that
361+ are to be marked <literal>datallowconn</> = <literal>false</> in
362+ <filename>pg_database</>, since there isn't any convenient way to
363+ vacuum a database that you can't connect to. Note that
364+ <command>VACUUM</command>'s automatic warning message about
365+ unvacuumed databases will ignore <filename>pg_database</> entries
366+ with <literal>datallowconn</> = <literal>false</>, so as to avoid
367+ giving false warnings about these databases; therefore it's up to
368+ you to ensure that such databases are frozen correctly.
364369 </para>
365370
366371 </sect2>
@@ -375,13 +380,20 @@ VACUUM
375380 </indexterm>
376381
377382 <para>
378- <productname>PostgreSQL</productname> is unable to reuse index pages
379- in some cases. The problem is that if indexed rows are deleted, those
380- indexes pages can only be reused by rows with similar values. In
381- cases where low indexed rows are deleted and newly inserted rows have
382- high values, disk space used by the index will grow indefinately, even
383- if <command>VACUUM</> is run frequently.
384- TO BE COMPLETED 2002-06-22 bjm
383+ <productname>PostgreSQL</productname> is unable to reuse btree index
384+ pages in certain cases. The problem is that if indexed rows are
385+ deleted, those index pages can only be reused by rows with similar
386+ values. For example, if indexed rows are deleted and newly
387+ inserted/updated rows have much higher values, the new rows can't use
388+ the index space made available by the deleted rows. Instead, such
389+ new rows must be placed on new index pages. In such cases, disk
390+ space used by the index will grow indefinately, even if
391+ <command>VACUUM</> is run frequently.
392+ </para>
393+ <para>
394+ As a solution, you can use the <command>REINDEX</> command
395+ periodically to discard pages used by deleted rows. There is also
396+ <filename>contrib/reindex</> which can reindex an entire database.
385397 </para>
386398 </sect1>
387399
@@ -404,31 +416,32 @@ VACUUM
404416 </para>
405417
406418 <para>
407- If you simply direct the postmaster's <systemitem>stderr</> into a file, the only way
408- to truncate the log file is to stop and restart the postmaster. This
409- may be OK for development setups but you won't want to run a production
410- server that way.
419+ If you simply direct the postmaster's <systemitem>stderr</> into a
420+ file, the only way to truncate the log file is to stop and restart
421+ the postmaster. This may be OK for development setups but you won't
422+ want to run a production server that way.
411423 </para>
412424
413425 <para>
414- The simplest production-grade approach to managing log output is to send it
415- all to <application>syslog</> and let <application>syslog</> deal with file
416- rotation. To do this, make sure <productname>PostgreSQL</> was built with
417- the <option>--enable-syslog </>configure option, and set
418- <literal> syslog</>to 2
419- (log to syslog only) in <filename>postgresql.conf</>.
420- Then you can send a <literal>SIGHUP</literal> signal to the
421- <application>syslog</> daemon whenever you want toforce it to start
422- writing a new log file.
426+ The simplest production-grade approach to managing log output is to
427+ send it all to <application>syslog</> and let <application>syslog</>
428+ deal with file rotation. To do this, make sure
429+ <productname>PostgreSQL </>was built with the
430+ <option>--enable- syslog</>configure option, and set
431+ <literal>syslog</> to 2 (log to syslog only) in
432+ <filename>postgresql.conf</>. Then you can send a
433+ <literal>SIGHUP</literal> signal tothe <application>syslog</> daemon
434+ whenever you want to force it to start writing a new log file.
423435 </para>
424436
425437 <para>
426438 On many systems, however, syslog is not very reliable, particularly
427439 with large log messages; it may truncate or drop messages just when
428- you need them the most. You may find it more useful to pipe the
429- <application>postmaster</>'s <systemitem>stderr</> to some type of log rotation script.
430- If you start the postmaster with <application>pg_ctl</>, then the
431- postmaster's <systemitem>stderr</> is already redirected to <systemitem>stdout</>, so you just need a
440+ you need them the most. You may find it more useful to pipe the
441+ <application>postmaster</>'s <systemitem>stderr</> to some type of
442+ log rotation script. If you start the postmaster with
443+ <application>pg_ctl</>, then the postmaster's <systemitem>stderr</>
444+ is already redirected to <systemitem>stdout</>, so you just need a
432445 pipe command:
433446
434447<screen>