1
- <!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.27 2009/04/23 10:20:27 heikki Exp $ -->
1
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.28 2009/05/16 22:03:53 tgl Exp $ -->
2
2
3
3
<chapter id="storage">
4
4
@@ -33,7 +33,7 @@ these required items, the cluster configuration files
33
33
<filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
34
34
<filename>pg_ident.conf</filename> are traditionally stored in
35
35
<varname>PGDATA</> (although in <productname>PostgreSQL</productname> 8.0 and
36
- later, it is possible to keep them elsewhere).
36
+ later, it is possible to keep them elsewhere).
37
37
</para>
38
38
39
39
<table tocentry="1" id="pgdata-contents-table">
74
74
<row>
75
75
<entry><filename>pg_multixact</></entry>
76
76
<entry>Subdirectory containing multitransaction status data
77
- (used for shared row locks)</entry>
77
+ (used for shared row locks)</entry>
78
78
</row>
79
79
80
80
<row>
@@ -131,12 +131,12 @@ there.
131
131
Each table and index is stored in a separate file, named after the table
132
132
or index's <firstterm>filenode</> number, which can be found in
133
133
<structname>pg_class</>.<structfield>relfilenode</>. In addition to the
134
- main file (aka. main fork), a <firstterm>free space map</> (see
135
- <xref linkend="storage-fsm">) that stores information about free space
136
- available in the relation, is stored in a file named after the filenode
137
- number, with the <literal>_fsm</> suffix. Tables also have a visibility map
138
- fork, with the <literal>_vm</> suffix, to track which pages are known to have
139
- no dead tuples and therefore need no vacuuming.
134
+ main file (a/k/a main fork), each table and index has a <firstterm>free space
135
+ map</> (see <xref linkend="storage-fsm">), which stores information about free
136
+ space available in the relation. The free space map is stored in a file named
137
+ with the filenode number plus the suffix <literal>_fsm</>. Tables also have a
138
+ visibility map fork, with the suffix <literal>_vm</>, to track which pages are
139
+ known to have no dead tuples and therefore need no vacuuming.
140
140
</para>
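The naming rule in the new paragraph above can be sketched in a few lines of Python. This is only an illustration: `fork_file_names` and its `has_visibility_map` parameter are hypothetical names, not part of PostgreSQL, and the sketch ignores relations that lack a free space map.

```python
def fork_file_names(filenode: int, has_visibility_map: bool) -> list[str]:
    """Return the file names of a relation's forks (hypothetical helper).

    The main fork is named after the filenode number alone, the free
    space map adds the suffix _fsm, and tables add a _vm fork as well.
    """
    names = [str(filenode), f"{filenode}_fsm"]
    if has_visibility_map:
        # Per the text, only tables have a visibility map fork.
        names.append(f"{filenode}_vm")
    return names


if __name__ == "__main__":
    print(fork_file_names(12345, has_visibility_map=True))
```

For a table, pass `has_visibility_map=True`; for an index, pass `False`.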
141
141
142
142
<caution>
@@ -157,6 +157,8 @@ This arrangement avoids problems on platforms that have file size limitations.
157
157
(Actually, 1 GB is just the default segment size. The segment size can be
158
158
adjusted using the configuration option <option>--with-segsize</option>
159
159
when building <productname>PostgreSQL</>.)
160
+ In principle, free space map and visibility map forks could require multiple
161
+ segments as well, though this is unlikely to happen in practice.
160
162
The contents of tables and indexes are discussed further in
161
163
<xref linkend="storage-page-layout">.
162
164
</para>
@@ -193,7 +195,7 @@ if a tablespace other than <literal>pg_default</> is specified for them.
193
195
The name of a temporary file has the form
194
196
<filename>pgsql_tmp<replaceable>PPP</>.<replaceable>NNN</></filename>,
195
197
where <replaceable>PPP</> is the PID of the owning backend and
196
- <replaceable>NNN</> distinguishes different files of that backend.
198
+ <replaceable>NNN</> distinguishes different temporary files of that backend.
197
199
</para>
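As a small illustration of the temporary file name pattern described above, the following Python sketch assembles a name from a backend PID and a per-backend number. The function name `temp_file_name` is hypothetical, and the sketch assumes that both placeholders are rendered as plain decimal integers.

```python
def temp_file_name(pid: int, file_number: int) -> str:
    """Build a name of the form pgsql_tmpPPP.NNN (hypothetical helper).

    pid is the PID of the owning backend; file_number distinguishes
    different temporary files of that backend.
    """
    return f"pgsql_tmp{pid}.{file_number}"


if __name__ == "__main__":
    print(temp_file_name(4321, 0))
```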
198
200
199
201
</sect1>
@@ -215,10 +217,10 @@ Oversized-Attribute Storage Technique).
215
217
<para>
216
218
<productname>PostgreSQL</productname> uses a fixed page size (commonly
217
219
8 kB), and does not allow tuples to span multiple pages. Therefore, it is
218
- not possible to store very large field values directly. To overcome
220
+ not possible to store very large field values directly. To overcome
219
221
this limitation, large field values are compressed and/or broken up into
220
222
multiple physical rows. This happens transparently to the user, with only
221
- small impact on most of the backend code. The technique is affectionately
223
+ small impact on most of the backend code. The technique is affectionately
222
224
known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
223
225
</para>
224
226
@@ -377,24 +379,24 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
377
379
378
380
<title>Free Space Map</title>
379
381
380
- <indexterm>
381
- <primary>Free Space Map</primary>
382
- </indexterm>
383
- <indexterm><primary>FSM</><see>Free Space Map</></indexterm>
382
+ <indexterm>
383
+ <primary>Free Space Map</primary>
384
+ </indexterm>
385
+ <indexterm><primary>FSM</><see>Free Space Map</></indexterm>
384
386
385
387
<para>
386
- A Free Space Map is stored with every heap and index relation, except for
387
- hash indexes, to keep track of available space in the relation. It's stored
388
- along the main relation data, in a separate FSM relation fork, named after
389
- relfilenode of the relation, but with a <literal>_fsm</> suffix. For example,
390
- if the relfilenode of a relation is 12345, the FSM is stored in a file called
388
+ Each heap and index relation, except for hash indexes, has a Free Space Map
389
+ (FSM) to keep track of available space in the relation. It's stored
390
+ alongside the main relation data in a separate relation fork, named after the
391
+ filenode number of the relation, plus a <literal>_fsm</> suffix. For example,
392
+ if the filenode of a relation is 12345, the FSM is stored in a file called
391
393
<filename>12345_fsm</>, in the same directory as the main relation file.
392
394
</para>
393
395
394
396
<para>
395
397
The Free Space Map is organized as a tree of <acronym>FSM</> pages. The
396
- bottom level <acronym>FSM</> pages stores the free space available on every
397
- heap (or index) page, using one byte to represent each heap page. The upper
398
+ bottom level <acronym>FSM</> pages store the free space available on each
399
+ heap (or index) page, using one byte to represent each such page. The upper
398
400
levels aggregate information from the lower levels.
399
401
</para>
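The tree organization described above can be sketched as follows. This is a simplified model, not the actual on-disk layout: the 8 kB page size, the even division of the page into 256 categories, the binary fanout, and the use of the maximum as the aggregate are all assumptions made for illustration, and the names `to_category` and `build_levels` are hypothetical.

```python
PAGE_SIZE = 8192   # assumed page size (the common default)
CATEGORIES = 256   # one byte per page allows 256 distinct values


def to_category(free_bytes: int) -> int:
    """Quantize a page's free space into one byte (assumed even steps)."""
    step = PAGE_SIZE // CATEGORIES
    return min(free_bytes // step, CATEGORIES - 1)


def build_levels(leaf_categories: list[int], fanout: int = 2) -> list[list[int]]:
    """Return the tree levels from the bottom up.

    Each upper-level entry aggregates its children; this sketch assumes
    the aggregate is the maximum, so the root holds the largest value.
    """
    levels = [list(leaf_categories)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append(
            [max(below[i:i + fanout]) for i in range(0, len(below), fanout)]
        )
    return levels


if __name__ == "__main__":
    leaves = [to_category(n) for n in (40, 4000, 0, 8192)]
    for level in build_levels(leaves):
        print(level)
```

Under these assumptions, a search for a page with enough room can look at the root first and give up immediately if even the largest value is too small.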
400
402
@@ -409,8 +411,8 @@ at the root.
409
411
<para>
410
412
See <filename>src/backend/storage/freespace/README</> for more details on
411
413
how the <acronym>FSM</> is structured, and how it's updated and searched.
412
- <xref linkend="pgfreespacemap"> contrib module can be used to view the
413
- information stored in free space maps.
414
+ The <filename>contrib/pg_freespacemap</> module can be used to examine the
415
+ information stored in free space maps (see <xref linkend="pgfreespacemap">).
414
416
</para>
415
417
416
418
</sect1>
@@ -515,7 +517,7 @@ data. Empty in ordinary tables.</entry>
515
517
and <structfield>pd_special</structfield>). These contain byte offsets
516
518
from the page start to the start
517
519
of unallocated space, to the end of unallocated space, and to the start of
518
- the special space.
520
+ the special space.
519
521
The next 2 bytes of the page header,
520
522
<structfield>pd_pagesize_version</structfield>, store both the page size
521
523
and a version indicator. Beginning with
@@ -530,15 +532,15 @@ data. Empty in ordinary tables.</entry>
530
532
more than one page size in an installation.
531
533
The last field is a hint that shows whether pruning the page is likely
532
534
to be profitable: it tracks the oldest un-pruned XMAX on the page.
533
-
535
+
534
536
</para>
535
-
537
+
536
538
<table tocentry="1" id="pageheaderdata-table">
537
539
<title>PageHeaderData Layout</title>
538
540
<titleabbrev>PageHeaderData Layout</titleabbrev>
539
- <tgroup cols="4">
541
+ <tgroup cols="4">
540
542
<thead>
541
- <row>
543
+ <row>
542
544
<entry>Field</entry>
543
545
<entry>Type</entry>
544
546
<entry>Length</entry>
@@ -627,25 +629,25 @@ data. Empty in ordinary tables.</entry>
627
629
</para>
628
630
629
631
<para>
630
-
632
+
631
633
The items themselves are stored in space allocated backwards from the end
632
634
of unallocated space. The exact structure varies depending on what the
633
635
table is to contain. Tables and sequences both use a structure named
634
636
<type>HeapTupleHeaderData</type>, described below.
635
637
636
638
</para>
637
-
639
+
638
640
<para>
639
-
641
+
640
642
The final section is the <quote>special section</quote> which can
641
643
contain anything the access method wishes to store. For example,
642
644
b-tree indexes store links to the page's left and right siblings,
643
645
as well as some other data relevant to the index structure.
644
646
Ordinary tables do not use a special section at all (indicated by setting
645
647
<structfield>pd_special</> to equal the page size).
646
-
648
+
647
649
</para>
648
-
650
+
649
651
<para>
650
652
651
653
All table rows are structured in the same way. There is a fixed-size
@@ -669,15 +671,15 @@ data. Empty in ordinary tables.</entry>
669
671
<structfield>t_hoff</> a MAXALIGN multiple will appear between the null
670
672
bitmap and the object ID. (This in turn ensures that the object ID is
671
673
suitably aligned.)
672
-
674
+
673
675
</para>
674
-
676
+
675
677
<table tocentry="1" id="heaptupleheaderdata-table">
676
678
<title>HeapTupleHeaderData Layout</title>
677
679
<titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
678
- <tgroup cols="4">
680
+ <tgroup cols="4">
679
681
<thead>
680
- <row>
682
+ <row>
681
683
<entry>Field</entry>
682
684
<entry>Type</entry>
683
685
<entry>Length</entry>
@@ -743,7 +745,7 @@ data. Empty in ordinary tables.</entry>
743
745
</para>
744
746
745
747
<para>
746
-
748
+
747
749
Interpreting the actual data can only be done with information obtained
748
750
from other tables, mostly <structname>pg_attribute</structname>. The
749
751
key values needed to identify field locations are
@@ -753,7 +755,7 @@ data. Empty in ordinary tables.</entry>
753
755
null values. All this trickery is wrapped up in the functions
754
756
<firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
755
757
and <firstterm>heap_getsysattr</firstterm>.
756
-
758
+
757
759
</para>
758
760
<para>
759
761
@@ -767,7 +769,7 @@ data. Empty in ordinary tables.</entry>
767
769
value and some flag bits. Depending on the flags, the data can be either
768
770
inline or in a <acronym>TOAST</> table;
769
771
it might be compressed, too (see <xref linkend="storage-toast">).
770
-
772
+
771
773
</para>
772
774
</sect1>
773
775