@@ -22,9 +22,13 @@ refers to data that is stored in <productname>PostgreSQL</productname> tables.
22
22
</para>
23
23
24
24
<para>
25
- <xref linkend="page-table"> shows how pages in both normal <productname>PostgreSQL</productname> tables
26
- and <productname>PostgreSQL</productname> indexes
27
- (e.g., a B-tree index) are structured.
25
+
26
+ <xref linkend="page-table"> shows how pages in both normal
27
+ <productname>PostgreSQL</productname> tables and
28
+ <productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
29
+ are structured. This structure is also used for TOAST tables and sequences.
30
+ There are five parts to each page.
31
+
28
32
</para>
29
33
30
34
<table tocentry="1" id="page-table">
@@ -43,113 +47,255 @@ Item
43
47
<tbody>
44
48
45
49
<row>
46
- <entry>itemPointerData</entry>
47
- </row>
48
-
49
- <row>
50
- <entry>filler</entry>
50
+ <entry>PageHeaderData</entry>
51
+ <entry>20 bytes long. Contains general information about the page that is needed to access it.</entry>
51
52
</row>
52
53
53
54
<row>
54
- <entry>itemData...</entry>
55
+ <entry>itemPointerData</entry>
56
+ <entry>List of (offset,length) pairs pointing to the actual items.</entry>
55
57
</row>
56
58
57
59
<row>
58
- <entry>Unallocated Space</entry>
60
+ <entry>Free space</entry>
61
+ <entry>The unallocated space. All new tuples are allocated from here, generally from the end.</entry>
59
62
</row>
60
63
61
64
<row>
62
- <entry>ItemContinuationData</entry>
65
+ <entry>items</entry>
66
+ <entry>The actual items themselves. Different access methods store different data here.</entry>
63
67
</row>
64
68
65
69
<row>
66
70
<entry>Special Space</entry>
71
+ <entry>Access method specific data. Different access methods store different data. Unused by normal tables.</entry>
67
72
</row>
68
73
69
- <row >
70
- <entry><quote>ItemData 2</quote></entry >
71
- </row >
74
+ </tbody>
75
+ </tgroup>
76
+ </table>
72
77
73
- <row>
74
- <entry><quote>ItemData 1</quote></entry>
75
- </row>
78
+ <para>
76
79
77
- <row>
78
- <entry>ItemIdData</entry>
79
- </row>
80
+ The first 20 bytes of each page consist of a page header
81
+ (PageHeaderData). Its format is detailed in <xref
82
+ linkend="pageheaderdata-table">. The first two fields deal with
83
+ WAL-related information. This is followed by three 2-byte integer fields
84
+ (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
85
+ <firstterm>special</firstterm>). These represent byte offsets to the start
86
+ of unallocated space, to the end of unallocated space, and to the start of
87
+ the special space.
88
+
89
+ </para>
90
+
91
+ <table tocentry="1" id="pageheaderdata-table">
92
+ <title>PageHeaderData Layout</title>
93
+ <titleabbrev>PageHeaderData Layout</titleabbrev>
94
+ <tgroup cols="4">
95
+ <thead>
96
+ <row>
97
+ <entry>Field</entry>
98
+ <entry>Type</entry>
99
+ <entry>Length</entry>
100
+ <entry>Description</entry>
101
+ </row>
102
+ </thead>
103
+ <tbody>
104
+ <row>
105
+ <entry>pd_lsn</entry>
106
+ <entry>XLogRecPtr</entry>
107
+ <entry>8 bytes</entry>
108
+ <entry>LSN: next byte after last byte of xlog</entry>
109
+ </row>
110
+ <row>
111
+ <entry>pd_sui</entry>
112
+ <entry>StartUpID</entry>
113
+ <entry>4 bytes</entry>
114
+ <entry>SUI of last change (currently used by the heap AM only)</entry>
115
+ </row>
116
+ <row>
117
+ <entry>pd_lower</entry>
118
+ <entry>LocationIndex</entry>
119
+ <entry>2 bytes</entry>
120
+ <entry>Offset to start of free space.</entry>
121
+ </row>
122
+ <row>
123
+ <entry>pd_upper</entry>
124
+ <entry>LocationIndex</entry>
125
+ <entry>2 bytes</entry>
126
+ <entry>Offset to end of free space.</entry>
127
+ </row>
128
+ <row>
129
+ <entry>pd_special</entry>
130
+ <entry>LocationIndex</entry>
131
+ <entry>2 bytes</entry>
132
+ <entry>Offset to start of special space.</entry>
133
+ </row>
134
+ <row>
135
+ <entry>pd_opaque</entry>
136
+ <entry>OpaqueData</entry>
137
+ <entry>2 bytes</entry>
138
+ <entry>AM-generic information. Currently just stores the page size.</entry>
139
+ </row>
140
+ </tbody>
141
+ </tgroup>
142
+ </table>
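The header layout in the table above can be illustrated with a short Python sketch. It is not taken from the PostgreSQL sources. It assumes that pd_lsn is stored as two 32-bit words so that the fields add up to the 20-byte header, that the page is little-endian, and that pd_opaque can be treated as an uninterpreted 16-bit value. The names `PAGE_HEADER`, `parse_page_header` and `free_space` are hypothetical.

```python
import struct

# Assumed layout (not confirmed by the text): pd_lsn as two 32-bit words,
# pd_sui as one 32-bit word, then four 16-bit fields. "<" selects
# little-endian byte order with no padding, which is also an assumption.
PAGE_HEADER = struct.Struct("<IIIHHHH")


def parse_page_header(page: bytes) -> dict:
    """Unpack the first 20 bytes of a page into named header fields."""
    lsn_hi, lsn_lo, sui, lower, upper, special, opaque = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (lsn_hi, lsn_lo),
        "pd_sui": sui,
        "pd_lower": lower,      # offset to start of free space
        "pd_upper": upper,      # offset to end of free space
        "pd_special": special,  # offset to start of special space
        "pd_opaque": opaque,    # left uninterpreted in this sketch
    }


def free_space(header: dict) -> int:
    """Bytes of unallocated space between pd_lower and pd_upper."""
    return header["pd_upper"] - header["pd_lower"]


# Hypothetical empty 8192-byte page: no item identifiers, no special space.
page = PAGE_HEADER.pack(0, 0, 0, 20, 8192, 8192, 0) + bytes(8192 - 20)
header = parse_page_header(page)
```

On this made-up page, pd_lower points just past the header and pd_upper at the end of the page, so everything after the header counts as free space.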
80
143
81
- <row>
82
- <entry>PageHeaderData</entry>
83
- </row>
144
+ <para>
145
+ Special space is a region at the end of the page that is allocated at page
146
+ initialization time and contains information specific to an access method.
147
+ The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
148
+ currently store only the page size. Page size is stored in each page
149
+ because frames in the buffer pool may be subdivided into equal sized pages
150
+ on a frame by frame basis within a table (is this true? - mvo).
84
151
85
- </tbody>
86
- </tgroup>
87
- </table>
152
+ </para>
88
153
89
- <!--
90
- .\" Running
91
- .\" .q .../bin/dumpbpages
92
- .\" or
93
- .\" .q .../src/support/dumpbpages
94
- .\" as the postgres superuser
95
- .\" with the file paths associated with
96
- .\" (heap or B-tree index) classes,
97
- .\" .q .../data/base/<database-name>/<class-name>,
98
- .\" will display the page structure used by the classes.
99
- .\" Specifying the
100
- .\" .q -r
101
- .\" flag will cause the classes to be
102
- .\" treated as heap classes and for more information to be displayed.
103
- -->
154
+ <para>
104
155
105
- <para>
106
- The first 8 bytes of each page consists of a page header
107
- (PageHeaderData).
108
- Within the header, the first three 2-byte integer fields
109
- (<firstterm>lower</firstterm>,
110
- <firstterm>upper</firstterm>,
111
- and
112
- <firstterm>special</firstterm>)
113
- represent byte offsets to the start of unallocated space, to the end
114
- of unallocated space, and to the start of <firstterm>special space</firstterm>.
115
- Special space is a region at the end of the page that is allocated at
116
- page initialization time and contains information specific to an
117
- access method. The last 2 bytes of the page header,
118
- <firstterm>opaque</firstterm>,
119
- encode the page size and information on the internal fragmentation of
120
- the page. Page size is stored in each page because frames in the
121
- buffer pool may be subdivided into equal sized pages on a frame by
122
- frame basis within a table. The internal fragmentation information is
123
- used to aid in determining when page reorganization should occur.
124
- </para>
156
+ Following the page header are item identifiers
157
+ (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated
158
+ from the first four bytes of unallocated space. Because an item
159
+ identifier is never moved until it is freed, its index may be used to
160
+ indicate the location of an item on a page. In fact, every pointer to an
161
+ item (<firstterm>ItemPointer</firstterm>, also known as
162
+ <firstterm>CTID</firstterm>) created by
163
+ <productname>PostgreSQL</productname> consists of a frame number and an
164
+ index of an item identifier. An item identifier contains a byte-offset to
165
+ the start of an item, its length in bytes, and a set of attribute bits
166
+ which affect its interpretation.
125
167
126
- <para>
127
- Following the page header are item identifiers
128
- (<firstterm>ItemIdData</firstterm>).
129
- New item identifiers are allocated from the first four bytes of
130
- unallocated space. Because an item identifier is never moved until it
131
- is freed, its index may be used to indicate the location of an item on
132
- a page. In fact, every pointer to an item
133
- (<firstterm>ItemPointer</firstterm>)
134
- created by <productname>PostgreSQL</productname> consists of a frame number and an index of an item
135
- identifier. An item identifier contains a byte-offset to the start of
136
- an item, its length in bytes, and a set of attribute bits which affect
137
- its interpretation.
138
- </para>
168
+ </para>
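As a rough illustration of the item identifier array, the Python sketch below counts the identifiers on a page and decodes one of them. The 20-byte header and the 4-byte identifier size come from the text. The bit layout (15 bits of offset, 2 flag bits, 15 bits of length, packed into a little-endian 32-bit word) is an assumption, since bit-field ordering depends on the compiler and platform. The function names and the zero-based index are hypothetical.

```python
import struct

PAGE_HEADER_SIZE = 20  # from the text
ITEM_ID_SIZE = 4       # from the text


def item_id_count(pd_lower: int) -> int:
    """Number of item identifiers between the header and the free space."""
    return (pd_lower - PAGE_HEADER_SIZE) // ITEM_ID_SIZE


def decode_item_id(page: bytes, index: int) -> tuple:
    """Return (offset, flags, length) for the identifier at a zero-based index.

    Assumed bit layout: offset in the low 15 bits, then 2 flag bits,
    then 15 bits of length.
    """
    (word,) = struct.unpack_from("<I", page, PAGE_HEADER_SIZE + index * ITEM_ID_SIZE)
    offset = word & 0x7FFF
    flags = (word >> 15) & 0x3
    length = (word >> 17) & 0x7FFF
    return offset, flags, length


# Hypothetical page fragment: a zeroed header followed by one identifier
# that points at a 100-byte item starting at byte 8000.
example_word = 8000 | (1 << 15) | (100 << 17)
page = bytes(PAGE_HEADER_SIZE) + struct.pack("<I", example_word) + bytes(8)
```

Because an identifier keeps its position until it is freed, the index passed to `decode_item_id` plays the role of the second half of an item pointer.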
139
169
140
- <para>
141
- The items themselves are stored in space allocated backwards from
142
- the end of unallocated space. Usually, the items are not interpreted.
143
- However when the item is too long to be placed on a single page or
144
- when fragmentation of the item is desired, the item is divided and
145
- each piece is handled as distinct items in the following manner. The
146
- first through the next to last piece are placed in an item
147
- continuation structure
148
- (<firstterm>ItemContinuationData</firstterm>).
149
- This structure contains
150
- itemPointerData
151
- which points to the next piece and the piece itself. The last piece
152
- is handled normally.
153
- </para>
170
+ <para>
171
+
172
+ The items themselves are stored in space allocated backwards from the end
173
+ of unallocated space. The exact structure varies depending on what the
174
+ table is to contain. Sequences and tables both use a structure named
175
+ <firstterm>HeapTupleHeaderData</firstterm>, described below.
176
+
177
+ </para>
178
+
179
+ <para>
180
+
181
+ The final section is the "special section" which may contain anything the
182
+ access method wishes to store. Ordinary tables do not use this at all
183
+ (indicated by setting the offset to the page size).
184
+
185
+ </para>
186
+
187
+ <para>
188
+
189
+ All tuples are structured the same way: a header of around 31 bytes,
190
+ followed by an optional null bitmask and the data. The header is detailed
191
+ below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is
192
+ only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
193
+ <firstterm>t_infomask</firstterm>. If it is present it takes up the space
194
+ between the end of the header and the beginning of the data, as indicated
195
+ by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
196
+ indicates not-null and a 0 bit indicates null.
197
+
198
+ </para>
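The null bitmask test described above might look like the following Python sketch. The rule that a 1 bit means not-null comes from the text. The bit order within each byte (least significant bit first) and the zero-based attribute numbering are assumptions, and the helper names are hypothetical.

```python
def bitmap_length(natts: int) -> int:
    """Bytes needed to hold one bit per attribute."""
    return (natts + 7) // 8


def is_null(bitmap: bytes, attnum: int) -> bool:
    """True if the attribute at zero-based position attnum is null.

    A set bit means not-null; a clear bit means null. Bits are assumed
    to be numbered from the least significant bit of each byte.
    """
    byte = bitmap[attnum >> 3]
    return (byte & (1 << (attnum & 7))) == 0


# Hypothetical three-attribute tuple whose second attribute is null.
bitmap = bytes([0b00000101])
```

This only applies when the HEAP_HASNULL bit is set in t_infomask. Without it there is no bitmask and no attribute is null.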
199
+
200
+ <table tocentry="1" id="heaptupleheaderdata-table">
201
+ <title>HeapTupleHeaderData Layout</title>
202
+ <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
203
+ <tgroup cols="4">
204
+ <thead>
205
+ <row>
206
+ <entry>Field</entry>
207
+ <entry>Type</entry>
208
+ <entry>Length</entry>
209
+ <entry>Description</entry>
210
+ </row>
211
+ </thead>
212
+ <tbody>
213
+ <row>
214
+ <entry>t_oid</entry>
215
+ <entry>Oid</entry>
216
+ <entry>4 bytes</entry>
217
+ <entry>OID of this tuple</entry>
218
+ </row>
219
+ <row>
220
+ <entry>t_cmin</entry>
221
+ <entry>CommandId</entry>
222
+ <entry>4 bytes</entry>
223
+ <entry>insert CID stamp</entry>
224
+ </row>
225
+ <row>
226
+ <entry>t_cmax</entry>
227
+ <entry>CommandId</entry>
228
+ <entry>4 bytes</entry>
229
+ <entry>delete CID stamp</entry>
230
+ </row>
231
+ <row>
232
+ <entry>t_xmin</entry>
233
+ <entry>TransactionId</entry>
234
+ <entry>4 bytes</entry>
235
+ <entry>insert XID stamp</entry>
236
+ </row>
237
+ <row>
238
+ <entry>t_xmax</entry>
239
+ <entry>TransactionId</entry>
240
+ <entry>4 bytes</entry>
241
+ <entry>delete XID stamp</entry>
242
+ </row>
243
+ <row>
244
+ <entry>t_ctid</entry>
245
+ <entry>ItemPointerData</entry>
246
+ <entry>6 bytes</entry>
247
+ <entry>current TID of this or newer tuple</entry>
248
+ </row>
249
+ <row>
250
+ <entry>t_natts</entry>
251
+ <entry>int16</entry>
252
+ <entry>2 bytes</entry>
253
+ <entry>number of attributes</entry>
254
+ </row>
255
+ <row>
256
+ <entry>t_infomask</entry>
257
+ <entry>uint16</entry>
258
+ <entry>2 bytes</entry>
259
+ <entry>Various flags</entry>
260
+ </row>
261
+ <row>
262
+ <entry>t_hoff</entry>
263
+ <entry>uint8</entry>
264
+ <entry>1 byte</entry>
265
+ <entry>length of tuple header. Also offset of data.</entry>
266
+ </row>
267
+ </tbody>
268
+ </tgroup>
269
+ </table>
270
+
271
+ <para>
272
+
273
+ All the details may be found in src/include/storage/bufpage.h.
274
+
275
+ </para>
276
+
277
+ <para>
278
+
279
+ Interpreting the actual data can only be done with information obtained
280
+ from other tables, mostly <firstterm>pg_attribute</firstterm>. The
281
+ relevant fields are <firstterm>attlen</firstterm> and
282
+ <firstterm>attalign</firstterm>. There is no way to directly get a
283
+ particular attribute, except when there are only fixed width fields and no
284
+ NULLs. All this trickery is wrapped up in the functions
285
+ <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
286
+ and <firstterm>heap_getsysattr</firstterm>.
287
+
288
+ </para>
289
+ <para>
154
290
291
+ To read the data you need to examine each attribute in turn. First check
292
+ whether the field is NULL according to the null bitmap. If it is, go to
293
+ the next. Then make sure you have the right alignment. If the field is a
294
+ fixed-width field, its bytes are simply stored in place. If it's a
295
+ variable length field (attlen == -1) then it's a bit more complicated,
296
+ using the variable length structure <firstterm>varattrib</firstterm>.
297
+ Depending on the flags, the data may be either inline, compressed or in
298
+ another table (TOAST).
299
+
300
+ </para>
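The attribute walk described above can be sketched in Python for the fixed-width case. This is an illustration, not the logic of heap_getattr. It assumes attalign codes map to 1, 2, 4 and 8 bytes for 'c', 's', 'i' and 'd' (typical, but platform-dependent), that alignment is measured from the start of the tuple, and that null flags have already been extracted into a list. Variable-length fields (attlen == -1) are deliberately left out because the varattrib format is not described here. All names are hypothetical.

```python
import struct

# Assumed alignment in bytes for each attalign code; real values are
# platform-dependent.
ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def read_fixed_attributes(tuple_bytes: bytes, t_hoff: int, attrs: list, nulls: list) -> list:
    """Return the raw bytes of each attribute, or None for nulls.

    attrs is a list of (attlen, attalign) pairs in column order;
    nulls is a list of booleans in the same order.
    """
    values = []
    offset = t_hoff  # data starts where the header (and bitmask) ends
    for (attlen, attalign), null in zip(attrs, nulls):
        if null:
            # Null attributes occupy no space in the data area.
            values.append(None)
            continue
        if attlen == -1:
            raise NotImplementedError("variable-length fields are not handled in this sketch")
        offset = align_up(offset, ALIGN_BYTES[attalign])
        values.append(tuple_bytes[offset:offset + attlen])
        offset += attlen
    return values


# Hypothetical tuple: a 32-byte header area, then an int4, an int2,
# two padding bytes, and a float8.
attrs = [(4, "i"), (2, "s"), (8, "d")]
tuple_bytes = (bytes(32) + struct.pack("<i", 7) + struct.pack("<h", 3)
               + bytes(2) + struct.pack("<d", 1.5))
values = read_fixed_attributes(tuple_bytes, 32, attrs, [False, False, False])
```

The walk has to visit every earlier attribute because a null or a padding gap shifts the position of everything after it, which is why a direct lookup only works for fixed-width columns with no nulls.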
155
301
</chapter>