Commit d1fcd33

Add new documentation on page format.
Martijn van Ooster
1 parent 42ef2c9 commit d1fcd33

1 file changed

+234
-88
lines changed

‎doc/src/sgml/page.sgml

Lines changed: 234 additions & 88 deletions
@@ -22,9 +22,13 @@ refers to data that is stored in <productname>PostgreSQL</productname> tables.
2222
</para>
2323

2424
<para>
25-
<xref linkend="page-table"> shows how pages in both normal <productname>PostgreSQL</productname> tables
26-
and <productname>PostgreSQL</productname> indexes
27-
(e.g., a B-tree index) are structured.
25+
26+
<xref linkend="page-table"> shows how pages in both normal
27+
<productname>PostgreSQL</productname> tables and
28+
<productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
29+
are structured. This structure is also used for TOAST tables and sequences.
30+
There are five parts to each page.
31+
2832
</para>
2933

3034
<table tocentry="1" id="page-table">
@@ -43,113 +47,255 @@ Item
4347
<tbody>
4448

4549
<row>
46-
<entry>itemPointerData</entry>
47-
</row>
48-
49-
<row>
50-
<entry>filler</entry>
50+
<entry>PageHeaderData</entry>
51+
<entry>20 bytes long. Contains general information about the page that is needed to access it.</entry>
5152
</row>
5253

5354
<row>
54-
<entry>itemData...</entry>
55+
<entry>itemPointerData</entry>
56+
<entry>List of (offset,length) pairs pointing to the actual items.</entry>
5557
</row>
5658

5759
<row>
58-
<entry>Unallocated Space</entry>
60+
<entry>Free space</entry>
61+
<entry>The unallocated space. All new tuples are allocated from here, generally from the end.</entry>
5962
</row>
6063

6164
<row>
62-
<entry>ItemContinuationData</entry>
65+
<entry>items</entry>
66+
<entry>The actual items themselves. Different access methods store different data here.</entry>
6367
</row>
6468

6569
<row>
6670
<entry>Special Space</entry>
71+
<entry>Access-method-specific data. Different methods store different data. Unused by ordinary tables.</entry>
6772
</row>
6873

69-
<row>
70-
<entry><quote>ItemData 2</quote></entry>
71-
</row>
74+
</tbody>
75+
</tgroup>
76+
</table>
7277

73-
<row>
74-
<entry><quote>ItemData 1</quote></entry>
75-
</row>
78+
<para>
7679

77-
<row>
78-
<entry>ItemIdData</entry>
79-
</row>
80+
The first 20 bytes of each page consist of a page header
81+
(PageHeaderData). Its format is detailed in <xref
82+
linkend="pageheaderdata-table">. The first two fields hold WAL-related
83+
information. These are followed by three 2-byte integer fields
84+
(<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
85+
<firstterm>special</firstterm>). These represent byte offsets to the start
86+
of unallocated space, to the end of unallocated space, and to the start of
87+
the special space.
88+
89+
</para>
90+
91+
<table tocentry="1" id="pageheaderdata-table">
92+
<title>PageHeaderData Layout</title>
93+
<titleabbrev>PageHeaderData Layout</titleabbrev>
94+
<tgroup cols="4">
95+
<thead>
96+
<row>
97+
<entry>Field</entry>
98+
<entry>Type</entry>
99+
<entry>Length</entry>
100+
<entry>Description</entry>
101+
</row>
102+
</thead>
103+
<tbody>
104+
<row>
105+
<entry>pd_lsn</entry>
106+
<entry>XLogRecPtr</entry>
107+
<entry>8 bytes</entry>
108+
<entry>LSN: next byte after last byte of xlog record for last change to this page</entry>
109+
</row>
110+
<row>
111+
<entry>pd_sui</entry>
112+
<entry>StartUpID</entry>
113+
<entry>4 bytes</entry>
114+
<entry>SUI of last changes (currently it's used by heap AM only)</entry>
115+
</row>
116+
<row>
117+
<entry>pd_lower</entry>
118+
<entry>LocationIndex</entry>
119+
<entry>2 bytes</entry>
120+
<entry>Offset to start of free space.</entry>
121+
</row>
122+
<row>
123+
<entry>pd_upper</entry>
124+
<entry>LocationIndex</entry>
125+
<entry>2 bytes</entry>
126+
<entry>Offset to end of free space.</entry>
127+
</row>
128+
<row>
129+
<entry>pd_special</entry>
130+
<entry>LocationIndex</entry>
131+
<entry>2 bytes</entry>
132+
<entry>Offset to start of special space.</entry>
133+
</row>
134+
<row>
135+
<entry>pd_opaque</entry>
136+
<entry>OpaqueData</entry>
137+
<entry>2 bytes</entry>
138+
<entry>AM-generic information. Currently just stores the page size.</entry>
139+
</row>
140+
</tbody>
141+
</tgroup>
142+
</table>
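As a rough illustration of the layout in the table above, the following Python sketch unpacks a header from a raw page image. It assumes little-endian byte order, no padding between fields, and an 8-byte pd_lsn stored as two 32-bit words (which is what makes the fields add up to the stated 20 bytes). The names parse_page_header, PageHeader, free_space and has_special_space are invented for this example and are not part of PostgreSQL.

```python
import struct
from collections import namedtuple

# Field names follow the table; the packing format itself is an assumption
# (little-endian, unpadded, pd_lsn stored as two 32-bit words).
PageHeader = namedtuple(
    "PageHeader",
    "lsn_hi lsn_lo pd_sui pd_lower pd_upper pd_special pd_opaque",
)
HEADER_FORMAT = "<IIIHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4+4+4+2+2+2+2 bytes


def parse_page_header(page: bytes) -> PageHeader:
    """Unpack the leading header bytes of a raw page image."""
    return PageHeader(*struct.unpack_from(HEADER_FORMAT, page, 0))


def free_space(header: PageHeader) -> int:
    """Bytes between the start and the end of unallocated space."""
    return header.pd_upper - header.pd_lower


def has_special_space(header: PageHeader, page_size: int) -> bool:
    """Ordinary tables set pd_special to the page size (no special space)."""
    return header.pd_special < page_size


if __name__ == "__main__":
    # Build a synthetic, empty 8192-byte page instead of reading a real file.
    page_size = 8192
    raw = struct.pack(
        HEADER_FORMAT, 0, 0, 0, HEADER_SIZE, page_size, page_size, page_size
    )
    page = raw + bytes(page_size - len(raw))
    hdr = parse_page_header(page)
    print(hdr.pd_lower, hdr.pd_upper, free_space(hdr),
          has_special_space(hdr, page_size))
```

On a real data file the byte order and any compiler padding depend on the platform that wrote it, so the format string would need to be checked against src/include/storage/bufpage.h before use.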
80143

81-
<row>
82-
<entry>PageHeaderData</entry>
83-
</row>
144+
<para>
145+
Special space is a region at the end of the page that is allocated at page
146+
initialization time and contains information specific to an access method.
147+
The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
148+
currently only stores the page size. Page size is stored in each page
149+
because frames in the buffer pool may be subdivided into equal-sized pages
150+
on a frame-by-frame basis within a table (is this true? - mvo).
84151

85-
</tbody>
86-
</tgroup>
87-
</table>
152+
</para>
88153

89-
<!--
90-
.\" Running
91-
.\" .q .../bin/dumpbpages
92-
.\" or
93-
.\" .q .../src/support/dumpbpages
94-
.\" as the postgres superuser
95-
.\" with the file paths associated with
96-
.\" (heap or B-tree index) classes,
97-
.\" .q .../data/base/<database-name>/<class-name>,
98-
.\" will display the page structure used by the classes.
99-
.\" Specifying the
100-
.\" .q -r
101-
.\" flag will cause the classes to be
102-
.\" treated as heap classes and for more information to be displayed.
103-
-->
154+
<para>
104155

105-
<para>
106-
The first 8 bytes of each page consists of a page header
107-
(PageHeaderData).
108-
Within the header, the first three 2-byte integer fields
109-
(<firstterm>lower</firstterm>,
110-
<firstterm>upper</firstterm>,
111-
and
112-
<firstterm>special</firstterm>)
113-
represent byte offsets to the start of unallocated space, to the end
114-
of unallocated space, and to the start of <firstterm>special space</firstterm>.
115-
Special space is a region at the end of the page that is allocated at
116-
page initialization time and contains information specific to an
117-
access method. The last 2 bytes of the page header,
118-
<firstterm>opaque</firstterm>,
119-
encode the page size and information on the internal fragmentation of
120-
the page. Page size is stored in each page because frames in the
121-
buffer pool may be subdivided into equal sized pages on a frame by
122-
frame basis within a table. The internal fragmentation information is
123-
used to aid in determining when page reorganization should occur.
124-
</para>
156+
Following the page header are item identifiers
157+
(<firstterm>ItemIdData</firstterm>). New item identifiers are allocated
158+
from the first four bytes of unallocated space. Because an item
159+
identifier is never moved until it is freed, its index may be used to
160+
indicate the location of an item on a page. In fact, every pointer to an
161+
item (<firstterm>ItemPointer</firstterm>, also known as
162+
<firstterm>CTID</firstterm>) created by
163+
<productname>PostgreSQL</productname> consists of a frame number and an
164+
index of an item identifier. An item identifier contains a byte-offset to
165+
the start of an item, its length in bytes, and a set of attribute bits
166+
which affect its interpretation.
125167

126-
<para>
127-
Following the page header are item identifiers
128-
(<firstterm>ItemIdData</firstterm>).
129-
New item identifiers are allocated from the first four bytes of
130-
unallocated space. Because an item identifier is never moved until it
131-
is freed, its index may be used to indicate the location of an item on
132-
a page. In fact, every pointer to an item
133-
(<firstterm>ItemPointer</firstterm>)
134-
created by <productname>PostgreSQL</productname> consists of a frame number and an index of an item
135-
identifier. An item identifier contains a byte-offset to the start of
136-
an item, its length in bytes, and a set of attribute bits which affect
137-
its interpretation.
138-
</para>
168+
</para>
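The paragraph above says that an item identifier holds an offset, a length and some attribute bits in four bytes, but it does not give the bit layout. The Python sketch below therefore assumes one for illustration only: a little-endian 32-bit word with a 15-bit offset in the low bits, 2 flag bits, and a 15-bit length. The names ItemId, decode_item_id, encode_item_id and item_ids are hypothetical.

```python
import struct
from typing import List, NamedTuple


class ItemId(NamedTuple):
    offset: int  # byte offset of the item from the start of the page
    flags: int   # attribute bits that affect interpretation
    length: int  # item length in bytes


ITEM_ID_SIZE = 4  # identifiers are allocated four bytes at a time


def decode_item_id(word: int) -> ItemId:
    """Split a 32-bit identifier using the ASSUMED 15/2/15 bit layout."""
    return ItemId(word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF)


def encode_item_id(item: ItemId) -> int:
    """Inverse of decode_item_id, under the same assumed layout."""
    return ((item.offset & 0x7FFF)
            | ((item.flags & 0x3) << 15)
            | ((item.length & 0x7FFF) << 17))


def item_ids(page: bytes, header_size: int, pd_lower: int) -> List[ItemId]:
    """Decode every identifier between the header and the start of free space."""
    count = (pd_lower - header_size) // ITEM_ID_SIZE
    words = struct.unpack_from("<%dI" % count, page, header_size)
    return [decode_item_id(w) for w in words]


if __name__ == "__main__":
    # Synthetic page: 20 header bytes, two identifiers, then padding.
    ids = [ItemId(8000, 1, 192), ItemId(7900, 1, 100)]
    body = struct.pack("<2I", *(encode_item_id(i) for i in ids))
    page = bytes(20) + body + bytes(64)
    for index, item in enumerate(item_ids(page, 20, 20 + len(body))):
        print(index, item)
```

Because identifiers never move until freed, the list index produced here is the stable half of a CTID; the other half is the frame number of the page.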
139169

140-
<para>
141-
The items themselves are stored in space allocated backwards from
142-
the end of unallocated space. Usually, the items are not interpreted.
143-
However when the item is too long to be placed on a single page or
144-
when fragmentation of the item is desired, the item is divided and
145-
each piece is handled as distinct items in the following manner. The
146-
first through the next to last piece are placed in an item
147-
continuation structure
148-
(<firstterm>ItemContinuationData</firstterm>).
149-
This structure contains
150-
itemPointerData
151-
which points to the next piece and the piece itself. The last piece
152-
is handled normally.
153-
</para>
170+
<para>
171+
172+
The items themselves are stored in space allocated backwards from the end
173+
of unallocated space. The exact structure varies depending on what the
174+
table is to contain. Sequences and tables both use a structure named
175+
<firstterm>HeapTupleHeaderData</firstterm>, described below.
176+
177+
</para>
178+
179+
<para>
180+
181+
The final section is the "special section" which may contain anything the
182+
access method wishes to store. Ordinary tables do not use this at all
183+
(indicated by setting the offset to the page size).
184+
185+
</para>
186+
187+
<para>
188+
189+
All tuples are structured the same way: a header of around 31 bytes,
190+
followed by an optional null bitmask and the data. The header is detailed
191+
below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is
192+
only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
193+
<firstterm>t_infomask</firstterm>. If it is present it takes up the space
194+
between the end of the header and the beginning of the data, as indicated
195+
by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
196+
indicates not-null and a 0 bit indicates null.
197+
198+
</para>
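The null bitmask test described above can be sketched in a few lines of Python. The bit ordering used here (attribute 0 in the least significant bit of the first byte) is an assumption, and att_is_null and bitmap_length are names made up for this example.

```python
def bitmap_length(natts: int) -> int:
    """Number of bytes needed to hold one bit per attribute."""
    return (natts + 7) // 8


def att_is_null(bitmap: bytes, attnum: int) -> bool:
    """Return True if zero-based attribute attnum is null.

    A 1 bit means not-null and a 0 bit means null, as stated in the text.
    ASSUMPTION: attribute 0 is the lowest bit of the first byte.
    """
    return not (bitmap[attnum >> 3] & (1 << (attnum & 7)))


if __name__ == "__main__":
    # Three attributes; the middle one is null (bits 0 and 2 set).
    bitmap = bytes([0b00000101])
    for attnum in range(3):
        print(attnum, "null" if att_is_null(bitmap, attnum) else "not null")
```

A caller would consult the bitmask only when HEAP_HASNULL is set in t_infomask; otherwise every attribute is present.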
199+
200+
<table tocentry="1" id="heaptupleheaderdata-table">
201+
<title>HeapTupleHeaderData Layout</title>
202+
<titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
203+
<tgroup cols="4">
204+
<thead>
205+
<row>
206+
<entry>Field</entry>
207+
<entry>Type</entry>
208+
<entry>Length</entry>
209+
<entry>Description</entry>
210+
</row>
211+
</thead>
212+
<tbody>
213+
<row>
214+
<entry>t_oid</entry>
215+
<entry>Oid</entry>
216+
<entry>4 bytes</entry>
217+
<entry>OID of this tuple</entry>
218+
</row>
219+
<row>
220+
<entry>t_cmin</entry>
221+
<entry>CommandId</entry>
222+
<entry>4 bytes</entry>
223+
<entry>insert CID stamp</entry>
224+
</row>
225+
<row>
226+
<entry>t_cmax</entry>
227+
<entry>CommandId</entry>
228+
<entry>4 bytes</entry>
229+
<entry>delete CID stamp</entry>
230+
</row>
231+
<row>
232+
<entry>t_xmin</entry>
233+
<entry>TransactionId</entry>
234+
<entry>4 bytes</entry>
235+
<entry>insert XID stamp</entry>
236+
</row>
237+
<row>
238+
<entry>t_xmax</entry>
239+
<entry>TransactionId</entry>
240+
<entry>4 bytes</entry>
241+
<entry>delete XID stamp</entry>
242+
</row>
243+
<row>
244+
<entry>t_ctid</entry>
245+
<entry>ItemPointerData</entry>
246+
<entry>6 bytes</entry>
247+
<entry>current TID of this or newer tuple</entry>
248+
</row>
249+
<row>
250+
<entry>t_natts</entry>
251+
<entry>int16</entry>
252+
<entry>2 bytes</entry>
253+
<entry>number of attributes</entry>
254+
</row>
255+
<row>
256+
<entry>t_infomask</entry>
257+
<entry>uint16</entry>
258+
<entry>2 bytes</entry>
259+
<entry>Various flags</entry>
260+
</row>
261+
<row>
262+
<entry>t_hoff</entry>
263+
<entry>uint8</entry>
264+
<entry>1 byte</entry>
265+
<entry>length of tuple header. Also offset of data.</entry>
266+
</row>
267+
</tbody>
268+
</tgroup>
269+
</table>
270+
271+
<para>
272+
273+
All the details may be found in src/include/storage/bufpage.h.
274+
275+
</para>
276+
277+
<para>
278+
279+
Interpreting the actual data can only be done with information obtained
280+
from other tables, mostly <firstterm>pg_attribute</firstterm>. The
281+
particular fields are <firstterm>attlen</firstterm> and
282+
<firstterm>attalign</firstterm>. There is no way to directly get a
283+
particular attribute, except when there are only fixed width fields and no
284+
NULLs. All this trickery is wrapped up in the functions
285+
<firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
286+
and <firstterm>heap_getsysattr</firstterm>.
287+
288+
</para>
289+
<para>
154290

291+
To read the data you need to examine each attribute in turn. First check
292+
whether the field is NULL according to the null bitmap. If it is, go to
293+
the next. Then make sure you have the right alignment. If the field is a
294+
fixed-width field, its bytes are simply stored as they are. If it's a
295+
variable length field (attlen == -1) then it's a bit more complicated,
296+
using the variable length structure <firstterm>varattrib</firstterm>.
297+
Depending on the flags, the data may be either inline, compressed or in
298+
another table (TOAST).
299+
300+
</para>
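The attribute walk described above can be sketched as follows for the fixed-width case. This Python example is an illustration under stated assumptions, not the logic of heap_getattr: offsets are measured from the start of the data, the alignment table maps attalign codes to typical boundaries (the boundary for 'd' is platform dependent), the null bitmask uses the lowest bit of the first byte for attribute 0, and variable-length attributes are deliberately left out. All function names are hypothetical.

```python
# attalign codes mapped to ASSUMED byte boundaries.
ALIGNMENT = {"c": 1, "s": 2, "i": 4, "d": 8}


def align_up(offset: int, boundary: int) -> int:
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def fixed_width_offsets(attrs, null_bitmap=None):
    """Return {attnum: (offset, attlen)} relative to the start of the data.

    attrs is a list of (attlen, attalign) pairs in column order, as they
    would be read from pg_attribute. Null attributes occupy no space and
    are left out of the result.
    """
    offsets = {}
    pos = 0
    for attnum, (attlen, attalign) in enumerate(attrs):
        if null_bitmap is not None and not (
            null_bitmap[attnum >> 3] & (1 << (attnum & 7))
        ):
            continue  # null: nothing is stored, go to the next attribute
        if attlen < 0:
            # attlen == -1 marks a variable-length field; not handled here.
            raise NotImplementedError("variable-length attribute")
        pos = align_up(pos, ALIGNMENT[attalign])
        offsets[attnum] = (pos, attlen)
        pos += attlen
    return offsets


if __name__ == "__main__":
    columns = [(4, "i"), (2, "s"), (8, "d"), (1, "c")]
    print(fixed_width_offsets(columns))
    # Same columns with the third attribute null (bits 0, 1 and 3 set).
    print(fixed_width_offsets(columns, bytes([0b00001011])))
```

The second call shows why a particular attribute cannot be located directly once NULLs are possible: skipping the null 8-byte column changes the offset of every attribute after it.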
155301
</chapter>
