9.5. Binary String Functions and Operators#
This section describes functions and operators for examining and manipulating binary strings, that is values of typebytea
. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are inTable 9.11.Postgres Pro also provides versions of these functions that use the regular function invocation syntax (seeTable 9.12).
Table 9.11. SQL Binary String Functions and Operators
Function/Operator Description Example(s) |
---|
bytea || bytea →bytea
Concatenates the two binary strings. '\x123456'::bytea || '\x789a00bcde'::bytea →\x123456789a00bcde
|
bit_length (bytea ) →integer
Returns number of bits in the binary string (8 times theoctet_length ). bit_length('\x123456'::bytea) →24
|
btrim (bytes bytea ,bytesremoved bytea ) →bytea
Removes the longest string containing only bytes appearing inbytesremoved from the start and end ofbytes . btrim('\x1234567890'::bytea, '\x9012'::bytea) →\x345678
|
ltrim (bytes bytea ,bytesremoved bytea ) →bytea
Removes the longest string containing only bytes appearing inbytesremoved from the start ofbytes . ltrim('\x1234567890'::bytea, '\x9012'::bytea) →\x34567890
|
octet_length (bytea ) →integer
Returns number of bytes in the binary string. octet_length('\x123456'::bytea) →3
|
overlay (bytes bytea PLACING newsubstring bytea FROM start integer [FOR count integer ] ) →bytea
Replaces the substring ofbytes that starts at thestart 'th byte and extends forcount bytes withnewsubstring . Ifcount is omitted, it defaults to the length ofnewsubstring . overlay('\x1234567890'::bytea placing '\002\003'::bytea from 2 for 3) →\x12020390
|
position (substring bytea IN bytes bytea ) →integer
Returns first starting index of the specifiedsubstring withinbytes , or zero if it's not present. position('\x5678'::bytea in '\x1234567890'::bytea) →3
|
rtrim (bytes bytea ,bytesremoved bytea ) →bytea
Removes the longest string containing only bytes appearing inbytesremoved from the end ofbytes . rtrim('\x1234567890'::bytea, '\x9012'::bytea) →\x12345678
|
substring (bytes bytea [FROM start integer ] [FOR count integer ] ) →bytea
Extracts the substring ofbytes starting at thestart 'th byte if that is specified, and stopping aftercount bytes if that is specified. Provide at least one ofstart andcount . substring('\x1234567890'::bytea from 3 for 2) →\x5678
|
trim ( [LEADING |TRAILING |BOTH ]bytesremoved bytea FROM bytes bytea ) →bytea
Removes the longest string containing only bytes appearing inbytesremoved from the start, end, or both ends (BOTH is the default) ofbytes . trim('\x9012'::bytea from '\x1234567890'::bytea) →\x345678
|
trim ( [LEADING |TRAILING |BOTH ] [FROM ]bytes bytea ,bytesremoved bytea ) →bytea
This is a non-standard syntax fortrim() . trim(both from '\x1234567890'::bytea, '\x9012'::bytea) →\x345678
|
Additional binary string manipulation functions are available and are listed inTable 9.12. Some of them are used internally to implement theSQL-standard string functions listed inTable 9.11.
Table 9.12. Other Binary String Functions
Function Description Example(s) |
---|
bit_count (bytes bytea ) →bigint
Returns the number of bits set in the binary string (also known as“popcount”). bit_count('\x1234567890'::bytea) →15
|
get_bit (bytes bytea ,n bigint ) →integer
Extractsn'th bit from binary string. get_bit('\x1234567890'::bytea, 30) →1
|
get_byte (bytes bytea ,n integer ) →integer
Extractsn'th byte from binary string. get_byte('\x1234567890'::bytea, 4) →144
|
length (bytea ) →integer
Returns the number of bytes in the binary string. length('\x1234567890'::bytea) →5
|
length (bytes bytea ,encoding name ) →integer
Returns the number of characters in the binary string, assuming that it is text in the givenencoding . length('jose'::bytea, 'UTF8') →4
|
md5 (bytea ) →text
Computes the MD5hash of the binary string, with the result written in hexadecimal. md5('Th\000omas'::bytea) →8ab2d3c9689aaf18b4958c334c82d8b1
|
set_bit (bytes bytea ,n bigint ,newvalue integer ) →bytea
Setsn'th bit in binary string tonewvalue . set_bit('\x1234567890'::bytea, 30, 0) →\x1234563890
|
set_byte (bytes bytea ,n integer ,newvalue integer ) →bytea
Setsn'th byte in binary string tonewvalue . set_byte('\x1234567890'::bytea, 4, 64) →\x1234567840
|
sha224 (bytea ) →bytea
Computes the SHA-224hash of the binary string. sha224('abc'::bytea) →\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
|
sha256 (bytea ) →bytea
Computes the SHA-256hash of the binary string. sha256('abc'::bytea) →\xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
|
sha384 (bytea ) →bytea
Computes the SHA-384hash of the binary string. sha384('abc'::bytea) →\xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
|
sha512 (bytea ) →bytea
Computes the SHA-512hash of the binary string. sha512('abc'::bytea) →\xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
|
substr (bytes bytea ,start integer [,count integer ] ) →bytea
Extracts the substring ofbytes starting at thestart 'th byte, and extending forcount bytes if that is specified. (Same assubstring(bytes fromstart forcount ) .) substr('\x1234567890'::bytea, 3, 2) →\x5678
|
Functionsget_byte
andset_byte
number the first byte of a binary string as byte 0. Functionsget_bit
andset_bit
number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.
For historical reasons, the functionmd5
returns a hex-encoded value of typetext
whereas the SHA-2 functions return typebytea
. Use the functionsencode
anddecode
to convert between the two. For example writeencode(sha256('abc'), 'hex')
to get a hex-encoded text representation, ordecode(md5('abc'), 'hex')
to get abytea
value.
Functions for converting strings between different character sets (encodings), and for representing arbitrary binary data in textual form, are shown inTable 9.13. For these functions, an argument or result of typetext
is expressed in the database's default encoding, while arguments or results of typebytea
are in an encoding named by another argument.
Table 9.13. Text/Binary String Conversion Functions
Function Description Example(s) |
---|
convert (bytes bytea ,src_encoding name ,dest_encoding name ) →bytea
Converts a binary string representing text in encodingsrc_encoding to a binary string in encodingdest_encoding (seeSection 22.3.4 for available conversions). convert('text_in_utf8', 'UTF8', 'LATIN1') →\x746578745f696e5f75746638
|
convert_from (bytes bytea ,src_encoding name ) →text
Converts a binary string representing text in encodingsrc_encoding totext in the database encoding (seeSection 22.3.4 for available conversions). convert_from('text_in_utf8', 'UTF8') →text_in_utf8
|
convert_to (string text ,dest_encoding name ) →bytea
Converts atext string (in the database encoding) to a binary string encoded in encodingdest_encoding (seeSection 22.3.4 for available conversions). convert_to('some_text', 'UTF8') →\x736f6d655f74657874
|
encode (bytes bytea ,format text ) →text
Encodes binary data into a textual representation; supportedformat values are:base64 ,escape ,hex . encode('123\000\001', 'base64') →MTIzAAE=
|
decode (string text ,format text ) →bytea
Decodes binary data from a textual representation; supportedformat values are the same as forencode . decode('MTIzAAE=', 'base64') →\x3132330001
|
Theencode
anddecode
functions support the following textual formats:
- base64#
Thebase64
format is that ofRFC 2045 Section 6.8. As per theRFC, encoded lines are broken at 76 characters. However instead of the MIME CRLF end-of-line marker, only a newline is used for end-of-line. Thedecode
function ignores carriage-return, newline, space, and tab characters. Otherwise, an error is raised whendecode
is supplied invalid base64 data — including when trailing padding is incorrect.- escape#
Theescape
format converts zero bytes and bytes with the high bit set into octal escape sequences (\
nnn
), and it doubles backslashes. Other byte values are represented literally. Thedecode
function will raise an error if a backslash is not followed by either a second backslash or three octal digits; it accepts other byte values unchanged.
- hex#
Thehex
format represents each 4 bits of data as one hexadecimal digit,0
throughf
, writing the higher-order digit of each byte first. Theencode
function outputs thea
-f
hex digits in lower case. Because the smallest unit of data is 8 bits, there are always an even number of characters returned byencode
. Thedecode
function accepts thea
-f
characters in either upper or lower case. An error is raised whendecode
is given invalid hex data — including when given an odd number of characters.
See also the aggregate functionstring_agg
inSection 9.21 and the large object functions inSection 33.4.