63.2. TOAST#
This section provides an overview ofTOAST (The Oversized-Attribute Storage Technique).
Postgres Pro uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known asTOAST (or“the best thing since sliced bread”). TheTOAST infrastructure is also used to improve handling of large data values in-memory.
Only certain data types supportTOAST — there is no need to impose the overhead on data types that cannot produce large field values. To supportTOAST, a data type must have a variable-length (varlena) representation, in which, ordinarily, the first four-byte word of any stored value contains the total length of the value in bytes (including itself).TOAST does not constrain the rest of the data type's representation. The special representations collectively calledTOASTed values work by modifying or reinterpreting this initial length word. Therefore, the C-level functions supporting aTOAST-able data type must be careful about how they handle potentiallyTOASTed input values: an input might not actually consist of a four-byte length word and contents until after it's beendetoasted. (This is normally done by invokingPG_DETOAST_DATUM
before doing anything with an input value, but in some cases more efficient approaches are possible. SeeSection 36.13.1 for more detail.)
TOAST usurps two bits of the varlena length word (the high-order bits on big-endian machines, the low-order bits on little-endian machines), thereby limiting the logical size of any value of aTOAST-able data type to 1 GB (230 - 1 bytes). When both bits are zero, the value is an ordinary un-TOASTed value of the data type, and the remaining bits of the length word give the total datum size (including length word) in bytes. When the highest-order or lowest-order bit is set, the value has only a single-byte header instead of the normal four-byte header, and the remaining bits of that byte give the total datum size (including length byte) in bytes. This alternative supports space-efficient storage of values shorter than 127 bytes, while still allowing the data type to grow to 1 GB at need. Values with single-byte headers aren't aligned on any particular boundary, whereas values with four-byte headers are aligned on at least a four-byte boundary; this omission of alignment padding provides additional space savings that is significant compared to short values. As a special case, if the remaining bits of a single-byte header are all zero (which would be impossible for a self-inclusive length), the value is a pointer to out-of-line data, with several possible alternatives as described below. The type and size of such aTOAST pointer are determined by a code stored in the second byte of the datum. Lastly, when the highest-order or lowest-order bit is clear but the adjacent bit is set, the content of the datum has been compressed and must be decompressed before use. In this case the remaining bits of the four-byte length word give the total size of the compressed datum, not the original data. Note that compression is also possible for out-of-line data but the varlena header does not tell whether it has occurred — the content of theTOAST pointer tells that, instead.
The compression technique used for either in-line or out-of-line compressed data can be selected for each column by setting theCOMPRESSION
column option inCREATE TABLE
orALTER TABLE
. The default for columns with no explicit setting is to consult thedefault_toast_compression parameter at the time data is inserted.
As mentioned, there are multiple types ofTOAST pointer datums. The oldest and most common type is a pointer to out-of-line data stored in aTOAST table that is separate from, but associated with, the table containing theTOAST pointer datum itself. Theseon-disk pointer datums are created by theTOAST management code when a tuple to be stored on disk is too large to be stored as-is. Further details appear inSection 63.2.1. Alternatively, aTOAST pointer datum can contain a pointer to out-of-line data that appears elsewhere in memory. Such datums are necessarily short-lived, and will never appear on-disk, but they are very useful for avoiding copying and redundant processing of large data values. Further details appear inSection 63.2.2.
63.2.1. Out-of-Line, On-Disk TOAST Storage#
If any of the columns of a table areTOAST-able, the table will have an associatedTOAST table, whose OID is stored in the table'spg_class
.reltoastrelid
entry. On-diskTOASTed values are kept in theTOAST table, as described in more detail below.
Out-of-line values are divided (after compression if used) into chunks of at mostTOAST_MAX_CHUNK_SIZE
bytes (by default this value is chosen so that four chunk rows will fit on a page, making it about 2000 bytes). Each chunk is stored as a separate row in theTOAST table belonging to the owning table. EveryTOAST table has the columnschunk_id
(an OID identifying the particularTOASTed value),chunk_seq
(a sequence number for the chunk within its value), andchunk_data
(the actual data of the chunk). A unique index onchunk_id
andchunk_seq
provides fast retrieval of the values. A pointer datum representing an out-of-line on-diskTOASTed value therefore needs to store the OID of theTOAST table in which to look and the OID of the specific value (itschunk_id
). For convenience, pointer datums also store the logical datum size (original uncompressed data length), physical stored size (different if compression was applied), and the compression method used, if any. Allowing for the varlena header bytes, the total size of an on-diskTOAST pointer datum is therefore 18 bytes regardless of the actual size of the represented value.
TheTOAST management code is triggered only when a row value to be stored in a table is wider thanTOAST_TUPLE_THRESHOLD
bytes (normally 2 kB). TheTOAST code will compress and/or move field values out-of-line until the row value is shorter thanTOAST_TUPLE_TARGET
bytes (also normally 2 kB, adjustable) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs noTOAST costs if none of the out-of-line values change.
TheTOAST management code recognizes four different strategies for storingTOAST-able columns on disk:
PLAIN
prevents either compression or out-of-line storage. This is the only possible strategy for columns of non-TOAST-able data types.EXTENDED
allows both compression and out-of-line storage. This is the default for mostTOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.
EachTOAST-able data type specifies a default strategy for columns of that data type, but the strategy for a given table column can be altered withALTER TABLE ... SET STORAGE
.
TOAST_TUPLE_TARGET
can be adjusted for each table usingALTER TABLE ... SET (toast_tuple_target = N)
This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work of the executor will be done using the main row entry. The big values ofTOASTed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. Thus, the main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and sorts will more often be done entirely in memory. A little test showed that a table containing typical HTML pages and their URLs was stored in about half of the raw data size including theTOAST table, and that the main table contained only about 10% of the entire data (the URLs and some small HTML pages). There was no run time difference compared to an un-TOASTed comparison table, in which all the HTML pages were cut down to 7 kB to fit.
IndirectTOAST pointers simply point at a non-indirect varlena value stored somewhere in memory. This case was originally created merely as a proof of concept, but it is currently used during logical decoding to avoid possibly having to create physical tuples exceeding 1 GB (as pulling all out-of-line field values into the tuple might do). The case is of limited use since the creator of the pointer datum is entirely responsible that the referenced data survives for as long as the pointer could exist, and there is no infrastructure to help with this.
ExpandedTOAST pointers are useful for complex data types whose on-disk representation is not especially suited for computational purposes. As an example, the standard varlena representation of aPostgres Pro array includes dimensionality information, a nulls bitmap if there are any null elements, then the values of all the elements in order. When the element type itself is variable-length, the only way to find theN
'th element is to scan through all the preceding elements. This representation is appropriate for on-disk storage because of its compactness, but for computations with the array it's much nicer to have an“expanded” or“deconstructed” representation in which all the element starting locations have been identified. TheTOAST pointer mechanism supports this need by allowing a pass-by-reference Datum to point to either a standard varlena value (the on-disk representation) or aTOAST pointer that points to an expanded representation somewhere in memory. The details of this expanded representation are up to the data type, though it must have a standard header and meet the other API requirements given insrc/include/utils/expandeddatum.h
. C-level functions working with the data type can choose to handle either representation. Functions that do not know about the expanded representation, but simply applyPG_DETOAST_DATUM
to their inputs, will automatically receive the traditional varlena representation; so support for an expanded representation can be introduced incrementally, one function at a time.
For all types of in-memoryTOAST pointer, theTOAST management code ensures that no such pointer datum can accidentally get stored on disk. In-memoryTOAST pointers are automatically expanded to normal in-line varlena values before storage — and then possibly converted to on-diskTOAST pointers, if the containing tuple would otherwise be too big.
TOAST is a part ofPostgres Pro core.TOAST is not extensible, has the same strategy for all data types and is not effective for structured data (JSON) or data that requires a special workflow (bytea
).
Experimental pluggableTOAST provides an open API allowing you to develop and plug in customTOAST implementations for table columns and data types in addition to the default one. These implementations are called TOASTers. A TOASTer can process TOASTed data using the knowledge about the internal data structure and workflow.
The pluggableTOAST consists of the core, that is, theTOAST API, and custom TOASTers plugged in toPostgres Pro with this API.
This API does not possess any logic regarding theTOAST functionality, it is just a wraparound that allows registering, calling and dropping plugged TOASTers.
To enable using customTOAST implementations, theTOAST API defines a newCustom
type ofTOAST pointer in addition toExternal
andExtended
(corresponding toEXTERNAL
andEXTENDED
TOAST-able column storing strategies).
Currently the pluggableTOAST has several limitations:
Custom TOASTers must only be assigned to a column with the
EXTERNAL
storing strategy.Currently the pluggableTOAST does not support logical replication. A customTOAST pointer is skipped by the logical replication engine and is not replicated. This is because support of custom pointers requires heavy modifications in thePostgres Pro core (the logical replication engine) and even architecture changes as now custom pointers allow updating TOASTed values, while default pointers do not.
When the TOASTer is added to a system, a TOASTer-extension dependency is created.
When the TOASTer is assigned to a column, a TOASTer-relation dependency is created.
When the TOASTer is unassigned from a column, the TOASTer-relation dependency is deleted.
When the TOASTer is dropped from a system, the TOASTer-extension dependency is deleted.
add_toaster(
→toaster_name
text
,toaster_handler_func
text
)integer
set_toaster(
→toaster_name
text
,tab_name
text
,col_name
text
)integer
reset_toaster(
→tab_name
text
,col_name
text
)integer
drop_toaster(
toaster_name
text
)Returns the TOASTer OID on success or 0 if the TOASTer is used.
get_toaster(
→tab_name
text
,col_name
text
)integer
Get the OID of the TOASTer assigned to a table column.
Returns the OID of the TOASTer assigned to a column or 0 if none.
The following example illustrates the use of the pluggable TOAST API:
CREATE EXTENSION toastapi;CREATE EXTENSION bytea_toaster;SELECT add_toaster('bytea_toaster');CREATE TABLE test_bytea_append (id int, a bytea);ALTER TABLE test_bytea_append ALTER a SET STORAGE external;SELECT set_toaster('bytea_toaster', 'test_bytea_append', 'a'); test_set_toaster ------------------ (1 row)...SELECT get_toaster('test_bytea_append', 'a') AS bytea_toaster_oid; get_toaster ------------- 16348(1 row)SELECT pgpro_toast.reset_toaster('test_bytea_append','a'); reset_toaster --------------- 0(1 row)DROP TABLE test_bytea_append;SELECT pgpro_toast.drop_toaster('bytea_toaster'); drop_toaster -------------- 16348(1 row)DROP EXTENSION bytea_toaster;