1- <!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.27 2009/04/23 10:20:27 heikki Exp $ -->
1+ <!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.28 2009/05/16 22:03:53 tgl Exp $ -->
22
33<chapter id="storage">
44
@@ -33,7 +33,7 @@ these required items, the cluster configuration files
3333<filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
3434<filename>pg_ident.conf</filename> are traditionally stored in
3535<varname>PGDATA</> (although in <productname>PostgreSQL</productname> 8.0 and
36- later, it is possible to keep them elsewhere).
36+ later, it is possible to keep them elsewhere).
3737</para>
3838
3939<table tocentry="1" id="pgdata-contents-table">
7474<row>
7575 <entry><filename>pg_multixact</></entry>
7676 <entry>Subdirectory containing multitransaction status data
77- (used for shared row locks)</entry>
77+ (used for shared row locks)</entry>
7878</row>
7979
8080<row>
@@ -131,12 +131,12 @@ there.
131131Each table and index is stored in a separate file, named after the table
132132or index's <firstterm>filenode</> number, which can be found in
133133<structname>pg_class</>.<structfield>relfilenode</>. In addition to the
134- main file (aka. main fork), a <firstterm>free space map</> (see
135- <xref linkend="storage-fsm">) that stores information about free space
136- available in the relation, is stored in a file named after the filenode
137- number, with the <literal>_fsm</> suffix. Tables also have a visibility map
138- fork, with the <literal>_vm</> suffix, to track which pages are known to have
139- no dead tuples and therefore need no vacuuming.
134+ main file (a/k/a main fork), each table and index has a <firstterm>free space
135+ map</> (see <xref linkend="storage-fsm">), which stores information about free
136+ space available in the relation. The free space map is stored in a file named
137+ with the filenode number plus the suffix <literal>_fsm</>. Tables also have a
138+ visibility map fork, with the suffix <literal>_vm</>, to track which pages are
139+ known to have no dead tuples and therefore need no vacuuming.
140140</para>
141141
142142<caution>
@@ -157,6 +157,8 @@ This arrangement avoids problems on platforms that have file size limitations.
157157(Actually, 1 GB is just the default segment size. The segment size can be
158158adjusted using the configuration option <option>--with-segsize</option>
159159when building <productname>PostgreSQL</>.)
160+ In principle, free space map and visibility map forks could require multiple
161+ segments as well, though this is unlikely to happen in practice.
160162The contents of tables and indexes are discussed further in
161163<xref linkend="storage-page-layout">.
162164</para>
@@ -193,7 +195,7 @@ if a tablespace other than <literal>pg_default</> is specified for them.
193195The name of a temporary file has the form
194196<filename>pgsql_tmp<replaceable>PPP</>.<replaceable>NNN</></filename>,
195197where <replaceable>PPP</> is the PID of the owning backend and
196- <replaceable>NNN</> distinguishes different files of that backend.
198+ <replaceable>NNN</> distinguishes different temporary files of that backend.
197199</para>
198200
199201</sect1>
@@ -215,10 +217,10 @@ Oversized-Attribute Storage Technique).
215217<para>
216218<productname>PostgreSQL</productname> uses a fixed page size (commonly
2172198 kB), and does not allow tuples to span multiple pages. Therefore, it is
218- not possible to store very large field values directly. To overcome
220+ not possible to store very large field values directly. To overcome
219221this limitation, large field values are compressed and/or broken up into
220222multiple physical rows. This happens transparently to the user, with only
221- small impact on most of the backend code. The technique is affectionately
223+ small impact on most of the backend code. The technique is affectionately
222224known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
223225</para>
224226
@@ -377,24 +379,24 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
377379
378380<title>Free Space Map</title>
379381
380- <indexterm>
381- <primary>Free Space Map</primary>
382- </indexterm>
383- <indexterm><primary>FSM</><see>Free Space Map</></indexterm>
382+ <indexterm>
383+ <primary>Free Space Map</primary>
384+ </indexterm>
385+ <indexterm><primary>FSM</><see>Free Space Map</></indexterm>
384386
385387<para>
386- A Free Space Map is stored with every heap and index relation, except for
387- hash indexes, to keep track of available space in the relation. It's stored
388- along the main relation data, in a separate FSM relation fork, named after
389- relfilenode of the relation, but with a <literal>_fsm</> suffix. For example,
390- if the relfilenode of a relation is 12345, the FSM is stored in a file called
388+ Each heap and index relation, except for hash indexes, has a Free Space Map
389+ (FSM) to keep track of available space in the relation. It's stored
390+ alongside the main relation data in a separate relation fork, named after the
391+ filenode number of the relation, plus a <literal>_fsm</> suffix. For example,
392+ if the filenode of a relation is 12345, the FSM is stored in a file called
391393<filename>12345_fsm</>, in the same directory as the main relation file.
392394</para>
393395
394396<para>
395397The Free Space Map is organized as a tree of <acronym>FSM</> pages. The
396- bottom level <acronym>FSM</> pages stores the free space available on every
397- heap (or index) page, using one byte to represent each heap page. The upper
398+ bottom level <acronym>FSM</> pages store the free space available on each
399+ heap (or index) page, using one byte to represent each such page. The upper
398400levels aggregate information from the lower levels.
399401</para>
400402
@@ -409,8 +411,8 @@ at the root.
409411<para>
410412See <filename>src/backend/storage/freespace/README</> for more details on
411413how the <acronym>FSM</> is structured, and how it's updated and searched.
412- <xref linkend="pgfreespacemap"> contrib module can be used to view the
413- information stored in free space maps.
414+ The <filename>contrib/pg_freespacemap</> module can be used to examine the
415+ information stored in free space maps (see <xref linkend="pgfreespacemap">).
414416</para>
415417
416418</sect1>
@@ -515,7 +517,7 @@ data. Empty in ordinary tables.</entry>
515517 and <structfield>pd_special</structfield>). These contain byte offsets
516518 from the page start to the start
517519 of unallocated space, to the end of unallocated space, and to the start of
518- the special space.
520+ the special space.
519521 The next 2 bytes of the page header,
520522 <structfield>pd_pagesize_version</structfield>, store both the page size
521523 and a version indicator. Beginning with
@@ -530,15 +532,15 @@ data. Empty in ordinary tables.</entry>
530532 more than one page size in an installation.
531533 The last field is a hint that shows whether pruning the page is likely
532534 to be profitable: it tracks the oldest un-pruned XMAX on the page.
533-
535+
534536 </para>
535-
537+
536538 <table tocentry="1" id="pageheaderdata-table">
537539 <title>PageHeaderData Layout</title>
538540 <titleabbrev>PageHeaderData Layout</titleabbrev>
539- <tgroup cols="4">
541+ <tgroup cols="4">
540542 <thead>
541- <row>
543+ <row>
542544 <entry>Field</entry>
543545 <entry>Type</entry>
544546 <entry>Length</entry>
@@ -627,25 +629,25 @@ data. Empty in ordinary tables.</entry>
627629 </para>
628630
629631 <para>
630-
632+
631633 The items themselves are stored in space allocated backwards from the end
632634 of unallocated space. The exact structure varies depending on what the
633635 table is to contain. Tables and sequences both use a structure named
634636 <type>HeapTupleHeaderData</type>, described below.
635637
636638 </para>
637-
639+
638640 <para>
639-
641+
640642 The final section is the <quote>special section</quote> which can
641643 contain anything the access method wishes to store. For example,
642644 b-tree indexes store links to the page's left and right siblings,
643645 as well as some other data relevant to the index structure.
644646 Ordinary tables do not use a special section at all (indicated by setting
645647 <structfield>pd_special</> to equal the page size).
646-
648+
647649 </para>
648-
650+
649651 <para>
650652
651653 All table rows are structured in the same way. There is a fixed-size
@@ -669,15 +671,15 @@ data. Empty in ordinary tables.</entry>
669671 <structfield>t_hoff</> a MAXALIGN multiple will appear between the null
670672 bitmap and the object ID. (This in turn ensures that the object ID is
671673 suitably aligned.)
672-
674+
673675 </para>
674-
676+
675677 <table tocentry="1" id="heaptupleheaderdata-table">
676678 <title>HeapTupleHeaderData Layout</title>
677679 <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
678- <tgroup cols="4">
680+ <tgroup cols="4">
679681 <thead>
680- <row>
682+ <row>
681683 <entry>Field</entry>
682684 <entry>Type</entry>
683685 <entry>Length</entry>
@@ -743,7 +745,7 @@ data. Empty in ordinary tables.</entry>
743745 </para>
744746
745747 <para>
746-
748+
747749 Interpreting the actual data can only be done with information obtained
748750 from other tables, mostly <structname>pg_attribute</structname>. The
749751 key values needed to identify field locations are
@@ -753,7 +755,7 @@ data. Empty in ordinary tables.</entry>
753755 null values. All this trickery is wrapped up in the functions
754756 <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
755757 and <firstterm>heap_getsysattr</firstterm>.
756-
758+
757759 </para>
758760 <para>
759761
@@ -767,7 +769,7 @@ data. Empty in ordinary tables.</entry>
767769 value and some flag bits. Depending on the flags, the data can be either
768770 inline or in a <acronym>TOAST</> table;
769771 it might be compressed, too (see <xref linkend="storage-toast">).
770-
772+
771773 </para>
772774</sect1>
773775
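
The fork naming rules described in the revised text (the main file is named after the filenode number, the free space map adds a `_fsm` suffix, and tables additionally have a `_vm` visibility map fork) can be illustrated with a small sketch. This is only an illustration of the wording in the patch, not PostgreSQL code: the function name `fork_filenames` and the `kind` labels are hypothetical, and segment files beyond the first are deliberately not modelled.

```python
def fork_filenames(filenode, kind="table"):
    """Return the fork file names the text describes for one relation.

    `kind` is a hypothetical label used only in this sketch:
    "table", "index", or "hash_index".
    """
    # The main fork is named after the filenode number itself.
    names = {"main": str(filenode)}

    # Every heap and index relation except hash indexes has a free space map.
    if kind != "hash_index":
        names["fsm"] = f"{filenode}_fsm"

    # Only tables have a visibility map fork.
    if kind == "table":
        names["vm"] = f"{filenode}_vm"

    return names


if __name__ == "__main__":
    # Uses the filenode 12345 from the documentation's own example.
    for kind in ("table", "index", "hash_index"):
        print(kind, fork_filenames(12345, kind))
```

For the documentation's example filenode 12345, the table case yields the `12345_fsm` file named in the text, stored in the same directory as the main relation file.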