Movatterモバイル変換


[0]ホーム

URL:



Facebook
Postgres Pro
Facebook
Downloads
65.6. Database Page Layout
Prev UpChapter 65. Database Physical StorageHome Next

65.6. Database Page Layout#

This section provides an overview of the page format used withinPostgreSQL tables and indexes.[17] Sequences andTOAST tables are formatted just like a regular table.

In the following explanation, abyte is assumed to contain 8 bits. In addition, the termitem refers to an individual data value that is stored on a page. In a table, an item is a row; in an index, an item is an index entry.

Every table and index is stored as an array ofpages of a fixed size (usually 8 kB, although a different page size can be selected when compiling the server). In a table, all the pages are logically equivalent, so a particular item (row) can be stored in any page. In indexes, the first page is generally reserved as ametapage holding control information, and there can be different types of pages within the index, depending on the index access method.

Table 65.2 shows the overall layout of a page. There are five parts to each page.

Table 65.2. Overall Page Layout

ItemDescription
PageHeaderData24 bytes long. Contains general information about the page, including free space pointers.
ItemIdDataArray of item identifiers pointing to the actual items. Each entry is an (offset,length) pair. 4 bytes per item.
Free spaceThe unallocated space. New item identifiers are allocated from the start of this area, new items from the end.
ItemsThe actual items themselves.
Special spaceIndex access method specific data. Different methods store different data. Empty in ordinary tables.

The first 24 bytes of each page consists of a page header (PageHeaderData). Its format is detailed inTable 65.3. The first field tracks the most recent WAL entry related to this page. The second field contains the page checksum ifdata checksums are enabled. Next is a 2-byte field containing flag bits. This is followed by three 2-byte integer fields (pd_lower,pd_upper, andpd_special). These contain byte offsets from the page start to the start of unallocated space, to the end of unallocated space, and to the start of the special space. The next 2 bytes of the page header,pd_pagesize_version, store both the page size and a version indicator. Beginning withPostgreSQL 8.3 the version number is 4;PostgreSQL 8.1 and 8.2 used version number 3;PostgreSQL 8.0 used version number 2;PostgreSQL 7.3 and 7.4 used version number 1; prior releases used version number 0. (The basic page layout and header format has not changed in most of these versions, but the layout of heap row headers has.) The page size is basically only present as a cross-check; there is no support for having more than one page size in an installation. The last field is a hint that shows whether pruning the page is likely to be profitable: it tracks the oldest un-pruned XMAX on the page.

Table 65.3. PageHeaderData Layout

FieldTypeLengthDescription
pd_lsnPageXLogRecPtr8 bytesLSN: next byte after last byte of WAL record for last change to this page
pd_checksumuint162 bytesPage checksum
pd_flagsuint162 bytesFlag bits
pd_lowerLocationIndex2 bytesOffset to start of free space
pd_upperLocationIndex2 bytesOffset to end of free space
pd_specialLocationIndex2 bytesOffset to start of special space
pd_pagesize_versionuint162 bytesPage size and layout version number information
pd_prune_xidTransactionId4 bytesOldest unpruned XMAX on page, or zero if none

All the details can be found insrc/include/storage/bufpage.h.

Following the page header are item identifiers (ItemIdData), each requiring four bytes. An item identifier contains a byte-offset to the start of an item, its length in bytes, and a few attribute bits which affect its interpretation. New item identifiers are allocated as needed from the beginning of the unallocated space. The number of item identifiers present can be determined by looking atpd_lower, which is increased to allocate a new identifier. Because an item identifier is never moved until it is freed, its index can be used on a long-term basis to reference an item, even when the item itself is moved around on the page to compact free space. In fact, every pointer to an item (ItemPointer, also known asCTID) created byPostgreSQL consists of a page number and the index of an item identifier.

The items themselves are stored in space allocated backwards from the end of unallocated space. The exact structure varies depending on what the table is to contain. Tables and sequences both use a structure namedHeapTupleHeaderData, described below.

The final section is thespecial section which can contain anything the access method wishes to store. For example, b-tree indexes store links to the page's left and right siblings, as well as some other data relevant to the index structure. Ordinary tables do not use a special section at all (indicated by settingpd_special to equal the page size).

Figure 65.1 illustrates how these parts are laid out in a page.

Figure 65.1. Page Layout


65.6.1. Table Row Layout#

All table rows are structured in the same way. There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data. The header is detailed inTable 65.4. The actual user data (columns of the row) begins at the offset indicated byt_hoff, which must always be a multiple of the MAXALIGN distance for the platform. The null bitmap is only present if theHEAP_HASNULL bit is set int_infomask. If it is present it begins just after the fixed header and occupies enough bytes to have one bit per data column (that is, the number of bits that equals the attribute count int_infomask2). In this list of bits, a 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not present, all columns are assumed not-null. The object ID is only present if theHEAP_HASOID_OLD bit is set int_infomask. If present, it appears just before thet_hoff boundary. Any padding needed to maket_hoff a MAXALIGN multiple will appear between the null bitmap and the object ID. (This in turn ensures that the object ID is suitably aligned.)

Table 65.4. HeapTupleHeaderData Layout

FieldTypeLengthDescription
t_xminTransactionId4 bytesinsert XID stamp
t_xmaxTransactionId4 bytesdelete XID stamp
t_cidCommandId4 bytesinsert and/or delete CID stamp (overlays with t_xvac)
t_xvacTransactionId4 bytesXID for VACUUM operation moving a row version
t_ctidItemPointerData6 bytescurrent TID of this or newer row version
t_infomask2uint162 bytesnumber of attributes, plus various flag bits
t_infomaskuint162 bytesvarious flag bits
t_hoffuint81 byteoffset to user data

All the details can be found insrc/include/access/htup_details.h.

Interpreting the actual data can only be done with information obtained from other tables, mostlypg_attribute. The key values needed to identify field locations areattlen andattalign. There is no way to directly get a particular attribute, except when there are only fixed width fields and no null values. All this trickery is wrapped up in the functionsheap_getattr,fastgetattr andheap_getsysattr.

To read the data you need to examine each attribute in turn. First check whether the field is NULL according to the null bitmap. If it is, go to the next. Then make sure you have the right alignment. If the field is a fixed width field, then all the bytes are simply placed. If it's a variable length field (attlen = -1) then it's a bit more complicated. All variable-length data types share the common header structurestruct varlena, which includes the total length of the stored value and some flag bits. Depending on the flags, the data can be either inline or in aTOAST table; it might be compressed, too (seeSection 65.2).



[17] Actually, use of this page format is not required for either table or index access methods. Theheap table access method always uses this format. All the existing index methods also use the basic format, but the data kept on index metapages usually doesn't follow the item layout rules.


Prev Up Next
65.5. The Initialization Fork Home 65.7. Heap-Only Tuples (HOT)
pdfepub
Go to PostgreSQL 17
By continuing to browse this website, you agree to the use of cookies. Go toPrivacy Policy.

[8]ページ先頭

©2009-2025 Movatter.jp