
Commit d1fcd33


Add new documentation on page format.

Martijn van Ooster

1 parent 42ef2c9, commit d1fcd33

File tree

1 file changed

+234

-88

lines changed

doc/src/sgml
- page.sgml

`‎doc/src/sgml/page.sgml`

Lines changed: 234 additions & 88 deletions

Original file line number	Diff line number	Diff line change
`@@ -22,9 +22,13 @@ refers to data that is stored in <productname>PostgreSQL</productname> tables.`
`22`	`22`	`</para>`
`23`	`23`
`24`	`24`	`<para>`
`25`		`-<xref linkend="page-table"> shows how pages in both normal <productname>PostgreSQL</productname> tables`
`26`		`- and <productname>PostgreSQL</productname> indexes`
`27`		`-(e.g., a B-tree index) are structured.`
	`25`	`+`
	`26`	`+<xref linkend="page-table"> shows how pages in both normal`
	`27`	`+ <productname>PostgreSQL</productname> tables and`
	`28`	`+ <productname>PostgreSQL</productname> indexes (e.g., a B-tree index)`
	`29`	`+are structured. This structure is also used for toast tables and sequences.`
	`30`	`+There are five parts to each page.`
	`31`	`+`
`28`	`32`	`</para>`
`29`	`33`
`30`	`34`	`<table tocentry="1" id="page-table">`
`@@ -43,113 +47,255 @@ Item`
`43`	`47`	`<tbody>`
`44`	`48`
`45`	`49`	`<row>`
`46`		`-<entry>itemPointerData</entry>`
`47`		`-</row>`
`48`		`-`
`49`		`-<row>`
`50`		`-<entry>filler</entry>`
	`50`	`+ <entry>PageHeaderData</entry>`
	`51`	`+ <entry>20 bytes long. Contains general information about the page that is needed to access it.</entry>`
`51`	`52`	`</row>`
`52`	`53`
`53`	`54`	`<row>`
`54`		`-<entry>itemData...</entry>`
	`55`	`+<entry>itemPointerData</entry>`
	`56`	`+<entry>List of (offset,length) pairs pointing to the actual item.</entry>`
`55`	`57`	`</row>`
`56`	`58`
`57`	`59`	`<row>`
`58`		`-<entry>Unallocated Space</entry>`
	`60`	`+<entry>Free space</entry>`
	`61`	`+<entry>The unallocated space. All new tuples are allocated from here, generally from the end.</entry>`
`59`	`62`	`</row>`
`60`	`63`
`61`	`64`	`<row>`
`62`		`-<entry>ItemContinuationData</entry>`
	`65`	`+<entry>items</entry>`
	`66`	`+<entry>The actual items themselves. Different access methods have different data here.</entry>`
`63`	`67`	`</row>`
`64`	`68`
`65`	`69`	`<row>`
`66`	`70`	`<entry>Special Space</entry>`
	`71`	`+<entry>Access method specific data. Different methods store different data. Unused by normal tables.</entry>`
`67`	`72`	`</row>`
`68`	`73`
`69`		`-<row>`
`70`		`-<entry><quote>ItemData 2</quote></entry>`
`71`		`-</row>`
	`74`	`+</tbody>`
	`75`	`+</tgroup>`
	`76`	`+</table>`
`72`	`77`
`73`		`-<row>`
`74`		`-<entry><quote>ItemData 1</quote></entry>`
`75`		`-</row>`
	`78`	`+ <para>`
`76`	`79`
`77`		`-<row>`
`78`		`-<entry>ItemIdData</entry>`
`79`		`-</row>`
	`80`	`+ The first 20 bytes of each page consists of a page header`
	`81`	`+ (PageHeaderData). Its format is detailed in <xref`
	`82`	`+ linkend="pageheaderdata-table">. The first two fields deal with WAL`
	`83`	`+ related stuff. This is followed by three 2-byte integer fields`
	`84`	`+ (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and`
	`85`	`+ <firstterm>special</firstterm>). These represent byte offsets to the start`
	`86`	`+ of unallocated space, to the end of unallocated space, and to the start of`
	`87`	`+ the special space.`
	`88`	`+`
	`89`	`+ </para>`
	`90`	`+`
	`91`	`+ <table tocentry="1" id="pageheaderdata-table">`
	`92`	`+ <title>PageHeaderData Layout</title>`
	`93`	`+ <titleabbrev>PageHeaderData Layout</titleabbrev>`
	`94`	`+ <tgroup cols="4">`
	`95`	`+ <thead>`
	`96`	`+ <row>`
	`97`	`+ <entry>Field</entry>`
	`98`	`+ <entry>Type</entry>`
	`99`	`+ <entry>Length</entry>`
	`100`	`+ <entry>Description</entry>`
	`101`	`+ </row>`
	`102`	`+ </thead>`
	`103`	`+ <tbody>`
	`104`	`+ <row>`
	`105`	`+ <entry>pd_lsn</entry>`
	`106`	`+ <entry>XLogRecPtr</entry>`
	`107`	`+ <entry>6 bytes</entry>`
	`108`	`+ <entry>LSN: next byte after last byte of xlog</entry>`
	`109`	`+ </row>`
	`110`	`+ <row>`
	`111`	`+ <entry>pd_sui</entry>`
	`112`	`+ <entry>StartUpID</entry>`
	`113`	`+ <entry>4 bytes</entry>`
	`114`	`+ <entry>SUI of last changes (currently it's used by heap AM only)</entry>`
	`115`	`+ </row>`
	`116`	`+ <row>`
	`117`	`+ <entry>pd_lower</entry>`
	`118`	`+ <entry>LocationIndex</entry>`
	`119`	`+ <entry>2 bytes</entry>`
	`120`	`+ <entry>Offset to start of free space.</entry>`
	`121`	`+ </row>`
	`122`	`+ <row>`
	`123`	`+ <entry>pd_upper</entry>`
	`124`	`+ <entry>LocationIndex</entry>`
	`125`	`+ <entry>2 bytes</entry>`
	`126`	`+ <entry>Offset to end of free space.</entry>`
	`127`	`+ </row>`
	`128`	`+ <row>`
	`129`	`+ <entry>pd_special</entry>`
	`130`	`+ <entry>LocationIndex</entry>`
	`131`	`+ <entry>2 bytes</entry>`
	`132`	`+ <entry>Offset to start of special space.</entry>`
	`133`	`+ </row>`
	`134`	`+ <row>`
	`135`	`+ <entry>pd_opaque</entry>`
	`136`	`+ <entry>OpaqueData</entry>`
	`137`	`+ <entry>2 bytes</entry>`
	`138`	`+ <entry>AM-generic information. Currently just stores the page size.</entry>`
	`139`	`+ </row>`
	`140`	`+ </tbody>`
	`141`	`+ </tgroup>`
	`142`	`+ </table>`
`80`	`143`
`81`		`-<row>`
`82`		`-<entry>PageHeaderData</entry>`
`83`		`-</row>`
	`144`	`+ <para>`
	`145`	`+ Special space is a region at the end of the page that is allocated at page`
	`146`	`+ initialization time and contains information specific to an access method.`
	`147`	`+ The last 2 bytes of the page header, <firstterm>opaque</firstterm>,`
	`148`	`+ currently only stores the page size. Page size is stored in each page`
	`149`	`+ because frames in the buffer pool may be subdivided into equal sized pages`
	`150`	`+ on a frame by frame basis within a table (is this true? - mvo).`
`84`	`151`
`85`		`-</tbody>`
`86`		`-</tgroup>`
`87`		`-</table>`
	`152`	`+ </para>`
`88`	`153`
`89`		`-<!--`
`90`		`-.\" Running`
`91`		`-.\" .q .../bin/dumpbpages`
`92`		`-.\" or`
`93`		`-.\" .q .../src/support/dumpbpages`
`94`		`-.\" as the postgres superuser`
`95`		`-.\" with the file paths associated with`
`96`		`-.\" (heap or B-tree index) classes,`
`97`		`-.\" .q .../data/base/<database-name>/<class-name>,`
`98`		`-.\" will display the page structure used by the classes.`
`99`		`-.\" Specifying the`
`100`		`-.\" .q -r`
`101`		`-.\" flag will cause the classes to be`
`102`		`-.\" treated as heap classes and for more information to be displayed.`
`103`		`--->`
	`154`	`+ <para>`
`104`	`155`
`105`		`-<para>`
`106`		`-The first 8 bytes of each page consists of a page header`
`107`		`-(PageHeaderData).`
`108`		`-Within the header, the first three 2-byte integer fields`
`109`		`-(<firstterm>lower</firstterm>,`
`110`		`-<firstterm>upper</firstterm>,`
`111`		`-and`
`112`		`-<firstterm>special</firstterm>)`
`113`		`-represent byte offsets to the start of unallocated space, to the end`
`114`		`-of unallocated space, and to the start of <firstterm>special space</firstterm>.`
`115`		`-Special space is a region at the end of the page that is allocated at`
`116`		`-page initialization time and contains information specific to an`
`117`		`-access method. The last 2 bytes of the page header,`
`118`		`-<firstterm>opaque</firstterm>,`
`119`		`-encode the page size and information on the internal fragmentation of`
`120`		`-the page. Page size is stored in each page because frames in the`
`121`		`-buffer pool may be subdivided into equal sized pages on a frame by`
`122`		`-frame basis within a table. The internal fragmentation information is`
`123`		`-used to aid in determining when page reorganization should occur.`
`124`		`-</para>`
	`156`	`+ Following the page header are item identifiers`
	`157`	`+ (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated`
	`158`	`+ from the first four bytes of unallocated space. Because an item`
	`159`	`+ identifier is never moved until it is freed, its index may be used to`
	`160`	`+ indicate the location of an item on a page. In fact, every pointer to an`
	`161`	`+ item (<firstterm>ItemPointer</firstterm>, also known as`
	`162`	`+ <firstterm>CTID</firstterm>) created by`
	`163`	`+ <productname>PostgreSQL</productname> consists of a frame number and an`
	`164`	`+ index of an item identifier. An item identifier contains a byte-offset to`
	`165`	`+ the start of an item, its length in bytes, and a set of attribute bits`
	`166`	`+ which affect its interpretation.`
`125`	`167`
`126`		`-<para>`
`127`		`-Following the page header are item identifiers`
`128`		`-(<firstterm>ItemIdData</firstterm>).`
`129`		`-New item identifiers are allocated from the first four bytes of`
`130`		`-unallocated space. Because an item identifier is never moved until it`
`131`		`-is freed, its index may be used to indicate the location of an item on`
`132`		`-a page. In fact, every pointer to an item`
`133`		`-(<firstterm>ItemPointer</firstterm>)`
`134`		`-created by <productname>PostgreSQL</productname> consists of a frame number and an index of an item`
`135`		`-identifier. An item identifier contains a byte-offset to the start of`
`136`		`-an item, its length in bytes, and a set of attribute bits which affect`
`137`		`-its interpretation.`
`138`		`-</para>`
	`168`	`+ </para>`
`139`	`169`
`140`		`-<para>`
`141`		`-The items themselves are stored in space allocated backwards from`
`142`		`-the end of unallocated space. Usually, the items are not interpreted.`
`143`		`-However when the item is too long to be placed on a single page or`
`144`		`-when fragmentation of the item is desired, the item is divided and`
`145`		`-each piece is handled as distinct items in the following manner. The`
`146`		`-first through the next to last piece are placed in an item`
`147`		`-continuation structure`
`148`		`-(<firstterm>ItemContinuationData</firstterm>).`
`149`		`-This structure contains`
`150`		`-itemPointerData`
`151`		`-which points to the next piece and the piece itself. The last piece`
`152`		`-is handled normally.`
`153`		`-</para>`
	`170`	`+ <para>`
	`171`	`+`
	`172`	`+ The items themselves are stored in space allocated backwards from the end`
	`173`	`+ of unallocated space. The exact structure varies depending on what the`
	`174`	`+ table is to contain. Sequences and tables both use a structure named`
	`175`	`+ <firstterm>HeapTupleHeaderData</firstterm>, described below.`
	`176`	`+`
	`177`	`+ </para>`
	`178`	`+`
	`179`	`+ <para>`
	`180`	`+`
	`181`	`+ The final section is the "special section" which may contain anything the`
	`182`	`+ access method wishes to store. Ordinary tables do not use this at all`
	`183`	`+ (indicated by setting the offset to the pagesize).`
	`184`	`+`
	`185`	`+ </para>`
	`186`	`+`
	`187`	`+ <para>`
	`188`	`+`
	`189`	`+ All tuples are structured the same way. A header of around 31 bytes`
	`190`	`+ followed by an optional null bitmask and the data. The header is detailed`
	`191`	`+ below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is`
	`192`	`+ only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the`
	`193`	`+ <firstterm>t_infomask</firstterm>. If it is present it takes up the space`
	`194`	`+ between the end of the header and the beginning of the data, as indicated`
	`195`	`+ by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit`
	`196`	`+ indicates not-null, a 0 bit is a null.`
	`197`	`+`
	`198`	`+ </para>`
	`199`	`+`
	`200`	`+ <table tocentry="1" id="heaptupleheaderdata-table">`
	`201`	`+ <title>HeapTupleHeaderData Layout</title>`
	`202`	`+ <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>`
	`203`	`+ <tgroup cols="4">`
	`204`	`+ <thead>`
	`205`	`+ <row>`
	`206`	`+ <entry>Field</entry>`
	`207`	`+ <entry>Type</entry>`
	`208`	`+ <entry>Length</entry>`
	`209`	`+ <entry>Description</entry>`
	`210`	`+ </row>`
	`211`	`+ </thead>`
	`212`	`+ <tbody>`
	`213`	`+ <row>`
	`214`	`+ <entry>t_oid</entry>`
	`215`	`+ <entry>Oid</entry>`
	`216`	`+ <entry>4 bytes</entry>`
	`217`	`+ <entry>OID of this tuple</entry>`
	`218`	`+ </row>`
	`219`	`+ <row>`
	`220`	`+ <entry>t_cmin</entry>`
	`221`	`+ <entry>CommandId</entry>`
	`222`	`+ <entry>4 bytes</entry>`
	`223`	`+ <entry>insert CID stamp</entry>`
	`224`	`+ </row>`
	`225`	`+ <row>`
	`226`	`+ <entry>t_cmax</entry>`
	`227`	`+ <entry>CommandId</entry>`
	`228`	`+ <entry>4 bytes</entry>`
	`229`	`+ <entry>delete CID stamp</entry>`
	`230`	`+ </row>`
	`231`	`+ <row>`
	`232`	`+ <entry>t_xmin</entry>`
	`233`	`+ <entry>TransactionId</entry>`
	`234`	`+ <entry>4 bytes</entry>`
	`235`	`+ <entry>insert XID stamp</entry>`
	`236`	`+ </row>`
	`237`	`+ <row>`
	`238`	`+ <entry>t_xmax</entry>`
	`239`	`+ <entry>TransactionId</entry>`
	`240`	`+ <entry>4 bytes</entry>`
	`241`	`+ <entry>delete XID stamp</entry>`
	`242`	`+ </row>`
	`243`	`+ <row>`
	`244`	`+ <entry>t_ctid</entry>`
	`245`	`+ <entry>ItemPointerData</entry>`
	`246`	`+ <entry>6 bytes</entry>`
	`247`	`+ <entry>current TID of this or newer tuple</entry>`
	`248`	`+ </row>`
	`249`	`+ <row>`
	`250`	`+ <entry>t_natts</entry>`
	`251`	`+ <entry>int16</entry>`
	`252`	`+ <entry>2 bytes</entry>`
	`253`	`+ <entry>number of attributes</entry>`
	`254`	`+ </row>`
	`255`	`+ <row>`
	`256`	`+ <entry>t_infomask</entry>`
	`257`	`+ <entry>uint16</entry>`
	`258`	`+ <entry>2 bytes</entry>`
	`259`	`+ <entry>Various flags</entry>`
	`260`	`+ </row>`
	`261`	`+ <row>`
	`262`	`+ <entry>t_hoff</entry>`
	`263`	`+ <entry>uint8</entry>`
	`264`	`+ <entry>1 byte</entry>`
	`265`	`+ <entry>length of tuple header. Also offset of data.</entry>`
	`266`	`+ </row>`
	`267`	`+ </tbody>`
	`268`	`+ </tgroup>`
	`269`	`+ </table>`
	`270`	`+`
	`271`	`+ <para>`
	`272`	`+`
	`273`	`+ All the details may be found in src/include/storage/bufpage.h.`
	`274`	`+`
	`275`	`+ </para>`
	`276`	`+`
	`277`	`+ <para>`
	`278`	`+`
	`279`	`+ Interpreting the actual data can only be done with information obtained`
	`280`	`+ from other tables, mostly <firstterm>pg_attribute</firstterm>. The`
	`281`	`+ particular fields are <firstterm>attlen</firstterm> and`
	`282`	`+ <firstterm>attalign</firstterm>. There is no way to directly get a`
	`283`	`+ particular attribute, except when there are only fixed width fields and no`
	`284`	`+ NULLs. All this trickery is wrapped up in the functions`
	`285`	`+ <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>`
	`286`	`+ and <firstterm>heap_getsysattr</firstterm>.`
	`287`	`+`
	`288`	`+ </para>`
	`289`	`+ <para>`
`154`	`290`
	`291`	`+ To read the data you need to examine each attribute in turn. First check`
	`292`	`+ whether the field is NULL according to the null bitmap. If it is, go to`
	`293`	`+ the next. Then make sure you have the right alignment. If the field is a`
	`294`	`+ fixed width field, then all the bytes are simply placed. If it's a`
	`295`	`+ variable length field (attlen == -1) then it's a bit more complicated,`
	`296`	`+ using the variable length structure <firstterm>varattrib</firstterm>.`
	`297`	`+ Depending on the flags, the data may be either inline, compressed or in`
	`298`	`+ another table (TOAST).`
	`299`	`+`
	`300`	`+ </para>`
`155`	`301`	`</chapter>`
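
As a rough illustration of the layout the new text describes, the sketch below decodes a page header, derives the free space and the number of item identifiers from `pd_lower` and `pd_upper`, and tests a bit in a tuple's null bitmask. It is not taken from the commit or from PostgreSQL itself. Several details are assumptions: `pd_lsn` is read as two 32-bit words so that the fields add up to the stated 20 bytes (the table lists 6 bytes), byte order is taken to be little-endian, bits in the null bitmask are counted from the least significant bit of each byte, and all function names are hypothetical.

```python
import struct

# Assumed layout: pd_lsn as two 32-bit words, pd_sui as one 32-bit word,
# then pd_lower, pd_upper, pd_special, pd_opaque as 16-bit words (20 bytes).
# Little-endian order is an assumption made only for this sketch.
PAGE_HEADER = struct.Struct("<IIIHHHH")
ITEM_ID_SIZE = 4  # one item identifier takes four bytes, per the text


def parse_page_header(page):
    """Decode the fixed-size header at the start of a page image."""
    xlogid, xrecoff, sui, lower, upper, special, opaque = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (xlogid, xrecoff),
        "pd_sui": sui,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "pd_opaque": opaque,
    }


def free_space(header):
    """Bytes between the end of the item identifiers and the first item."""
    return header["pd_upper"] - header["pd_lower"]


def item_id_count(header):
    """Number of item identifiers stored directly after the header."""
    return (header["pd_lower"] - PAGE_HEADER.size) // ITEM_ID_SIZE


def attribute_is_null(null_bitmap, attnum):
    """True if the 0-based attribute is NULL (a 0 bit); bit order is assumed."""
    return not (null_bitmap[attnum >> 3] & (1 << (attnum & 7)))


# Made-up 8192-byte page: three item identifiers, items starting at byte 8000,
# and no special space (pd_special equals the page size).
page = bytearray(8192)
PAGE_HEADER.pack_into(page, 0, 0, 0, 0, 20 + 3 * ITEM_ID_SIZE, 8000, 8192, 8192)
header = parse_page_header(bytes(page))
print(free_space(header), item_id_count(header))  # prints "7968 3"
```

On the made-up page, `pd_special` equal to the page size corresponds to the "ordinary tables do not use this at all" case in the text. Reading actual attribute values would additionally need `attlen` and `attalign` from `pg_attribute`, which this sketch does not attempt.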

0 commit comments
