PostgreSQL lexical structure and syntax

This page describes the lexical structure used to create PostgreSQLstatements in Spanner and defines the syntax of the lexical elements,calledtokens, of this structure.

The content on this page is based on the PostgreSQL documentation, which is available under thePostgreSQL License. There are differences in behavior between the PostgreSQL capabilities supported in Spanner and their open-source PostgreSQL equivalents.

A PostgreSQL statement comprises a series of tokens. Tokens includeidentifiers,quoted identifiers,keywords,literals,operators, andspecial characters. You can separate tokens with whitespace (for example,space, tab, newline) or comments.

Identifiers and quoted identifiers

Identifiers are names that are associated with columns, tables, and otherdatabase objects. They can be unquoted or quoted.

  • The maximum identifier length is 63 characters. For more information aboutidentifier lengths, seeQuotas &limits. The PostgreSQL parser truncates identifiers that arelonger than 63 characters.
  • Unquoted identifiers must begin with a letter or an underscore character.Subsequent characters can be letters, numbers, or underscores.
  • Identifiers that start with an underscore can only be used for aliases.Schema objects such as tables, views, indexes, and columns cannot have aname that starts with an underscore.
  • Quoted identifiers must be enclosed by double quote (") characters.
    • Quoted identifiers can contain any character, such as spaces or symbols.
    • Quoted identifiers cannot be empty.
    • Quoted identifiers support the same escape sequences asstring literals.
    • Akeyword must be a quoted identifier if it is a standalonekeyword or the first component of a path expression. It may be unquotedas the second or later component of a path expression.

Examples

These are valid identifiers (unquoted identifiers are converted to lower case):

Customers5"5Customers"dataField_dataField1ADGROUP"tableName~""GROUP"

These path expressions contain valid identifiers:

foo."GROUP"foo.GROUP

These are invalid identifiers:

5Customers_dataField!GROUP

5Customers begins with a number, not a letter or underscore._dataField!contains the special character "!" which is not a letter, number, or underscore.GROUP is a keyword, and therefore cannot be used as an identifier withoutbeing enclosed by double quote characters.

Fully qualified names

Fully qualified names (FQNs) combine the schema name and the object name toidentify a database object, for example,sales.customers. When you add an FQNin DDL, it requires quotes around each part of the name:

"foo"."Group"

Case sensitivity

PostgreSQL is case sensitive. Quoted identifiers are case preserving.Unquoted identifiers are not. Unquoted identifiers are all converted to lowercase before being compared.

Examples

CREATETABLEFoo(aintprimarykey);select*fromfoo;-- worksCREATETABLEFoo(aintprimarykey);select*from"foo";-- worksCREATETABLE"Foo"(aintprimarykey);select*fromfoo;-- failsCREATETABLEFoo(aintprimarykey);select*from"Foo";-- fails

Keywords

In the following example, the tokensSELECT,UPDATE, andVALUES areexamples of keywords, that is, words that have a fixed meaning in the SQLlanguage.

SELECT*FROMMY_TABLE;UPDATEMY_TABLESETA=5;INSERTINTOMY_TABLEVALUES(3,'hi there');

Keywords and identifiers have the same lexical structure, meaning that one can'tknow whether a token is an identifier or a keyword without knowing the language.

Literals (constants)

There are three kinds ofimplicitly-typed constants in PostgreSQL:strings, bit strings, and numbers. Constants can also be specified with explicittypes, which can enable more accurate representation and more efficient handlingby the system. These alternatives are discussed in the following subsections.

String constants

A string constant in SQL is an arbitrary sequence of characters bounded bysingle quotes ('), for example'This is a string'. To include a single-quotecharacter within a string constant, write two adjacent single quotes, forexample'Dianne''s horse'. Note that this isnot the same as a double-quotecharacter (").

Two string constants that are only separated by whitespace with at least onenewline are concatenated and effectively treated as if the string had beenwritten as one constant. For example:

SELECT'foo''bar';

is equivalent to:

SELECT 'foobar';

but:

SELECT 'foo' 'bar';

is not valid syntax.

String constants with C-style escapes

PostgreSQL also acceptsescape string constants, which are anextension to the SQL standard. An escape string constant is specified by writingthe letter E (upper or lower case) just before the opening single quote, forexampleE'foo'. (When continuing an escape string constant across lines, writeE only before the first opening quote.) Within an escape string, a backslashcharacter (\) begins a C-likebackslash escape sequence, in which thecombination of backslash and following character(s) represent a special bytevalue, as shown in the following table.

Backslash Escape SequenceInterpretation
\bbackspace
\fform feed
\nnewline
\rcarriage return
\ttab
\o,\oo,\ooo (o = 0–7)octal byte value
\xh,\xhh (h = 0–9, A–F)hexadecimal byte value
\uxxxx,\Uxxxxxxxx (x = 0–9, A–F)16 or 32-bit hexadecimal Unicode character value

Any other character following a backslash is taken literally. Thus, to include abackslash character, write two backslashes (\\). Also, a single quote can beincluded in an escape string by writing\', in addition to the normal way ofwriting''.

Note: Backslash escapes are recognized only in escape string constants, not in regular string constants.

Ensure that the byte sequences that you create, especially when using the octalor hexadecimal escapes, compose valid characters in the server character setencoding. A useful alternative is to use Unicode escapes or the alternativeUnicode escape syntax; then the server will check that the character conversionis possible.

Note that Spanner supports only UTF-8 for strings.

The character with the code zero cannot be in a string constant.

String Constants With Unicode Escapes

PostgreSQL also supports another type of escape syntax for strings thatallows specifying arbitrary Unicode characters by code point. A Unicode escapestring constant starts with U& (upper or lower case letter U followed byampersand) immediately before the opening quote, without any spaces in between,for example U&'foo'. (Note that this creates an ambiguity with the operator &.Use spaces around the operator to avoid this problem.) Inside the quotes,Unicode characters can be specified in escaped form by writing a backslashfollowed by the four-digit hexadecimal code point number or alternatively abackslash followed by a plus sign followed by a six-digit hexadecimal code pointnumber. For example, the string 'data' could be written as the following:

U&'d\0061t\+000061'

The following less trivial example writes the Russian word "slon" (elephant) inCyrillic letters:

U&'\0441\043B\043E\043D'

If you want to use an escape character other than backslash, you can specify itusing theUESCAPE clause after the string, for example:

U&'d!0061t!+000061' UESCAPE '!'

The escape character can be any single character other than a hexadecimal digit,the plus sign, a single quote, a double quote, or a whitespace character.

To include the escape character in the string literally, write it twice.

Dollar-Quoted String Constants

Although the standard syntax for specifying string constants is usuallyconvenient, it can be difficult to understand when the string contains manysingle quotes or backslashes, because each of those must be doubled. To allowmore readable queries in such situations, PostgreSQL provides anotherway, called "dollar quoting", to write string constants. A dollar-quoted stringconstant consists of a dollar sign ($), an optional "tag" of zero or morecharacters, another dollar sign, an arbitrary sequence of characters that makesup the string content, a dollar sign, the same tag that began this dollar quote,and a dollar sign. For example, here are two different ways to specify thestring "Dianne's horse" using dollar quoting:

$$Dianne's horse$$$SomeTag$Dianne's horse$SomeTag$

Notice that inside the dollar-quoted string, single quotes can be used withoutneeding to be escaped. Indeed, no characters inside a dollar-quoted string areever escaped: the string content is always written literally. Backslashes arenot special, and neither are dollar signs, unless they are part of a sequencematching the opening tag.

It's possible to nest dollar-quoted string constants by choosing different tagsat each nesting level.

The tag, if any, of a dollar-quoted string follows the same rules as an unquotedidentifier, except that it cannot contain a dollar sign. Tags are casesensitive, so$tag$String content$tag$ is correct, but$TAG$Stringcontent$tag$ is not.

A dollar-quoted string that follows a keyword or identifier must be separatedfrom it by whitespace; otherwise the dollar quoting delimiter would be taken aspart of the preceding identifier.

Dollar quoting is not part of the SQL standard, but it is often a moreconvenient way to write complicated string literals than the standard-compliantsingle quote syntax. It is particularly useful when representing stringconstants inside other constants.

Numeric Constants

Numeric constants are accepted in these general forms:

digitsdigits.[digits][e[+-]digits][digits].digits[e[+-]digits]digitse[+-]digits

where digits is one or more decimal digits (0 through 9). At least one digitmust be before or after the decimal point, if one is used. At least one digitmust follow the exponent marker (e), if one is present. There cannot be anyspaces or other characters embedded in the constant. Note that any leading plusor minus sign is not actually considered part of the constant; it is an operatorapplied to the constant.

These are some examples of valid numeric constants:

423.54..0015e21.925e-3

A numeric constant that contains neither a decimal point nor an exponent isinitially presumed to be type bigint if its value fits in type bigint (64 bits);otherwise it is taken to be type numeric. Constants that contain decimal pointsor exponents are always initially presumed to be type numeric.

The initially assigned data type of a numeric constant is just a starting pointfor the type resolution algorithms. In most cases the constant will beautomatically coerced to the most appropriate type depending on context. Whennecessary, you can force a numeric value to be interpreted as a specific datatype by casting it. For example, you can force a bigint value to be treated astype numeric by writing:

NUMERIC '123'123::numeric

These are actually just special cases of the general casting notations discussedin the next section.

Constants Of Other Types

A constant of an arbitrary type can be entered using any one of the followingnotations:

type 'string''string'::typeCAST ( 'string' AS type )

The string constant's text is passed to the input conversion routine for thetype calledtype. The result is a constant of the indicated type. The explicittype cast can be omitted if there is no ambiguity as to the type the constantmust be (for example, when it is assigned directly to a table column), in whichcase it is automatically coerced.

The string constant can be written using either regular SQL notation ordollar-quoting.

The:: andCAST() syntaxes can also be used to specify run-time typeconversions of arbitrary expressions. To avoid syntactic ambiguity, thetype'string' syntax can only be used to specify the type of a literal constant.

Another restriction on thetype 'string' syntax is that it does not work forarray types; use:: orCAST() to specify the type of an array constant.

TheCAST() syntax conforms to SQL. Thetype 'string' syntax is ageneralization of the standard: SQL specifies this syntax only for a few datatypes, but PostgreSQL allows it for all types. The syntax with:: ishistorical PostgreSQL usage.

Special characters

Some characters that are not alphanumeric have a special meaning that isdifferent from being an operator. Details on the usage can be found at thelocation where the respective syntax element is described. This section onlyexists to advise the existence and summarize the purposes of these characters.

  • A dollar sign ($) followed by digits is used to represent a positionalparameter in the body of a prepared statement. In other contexts the dollarsign can be part of an identifier or a dollar-quoted string constant.

  • Parentheses (()) have their usual meaning to group expressions and enforceprecedence. In some cases parentheses are required as part of the fixedsyntax of a particular SQL command.

  • Brackets ([]) are used to select the elements of an array.

  • Commas (,) are used in some syntactical constructs to separate the elementsof a list.

  • The semicolon (;) terminates a SQL command. It cannot appear anywherewithin a command, except within a string constant or quoted identifier.

  • The asterisk (*) is used in some contexts to denote all the fields of atable row or composite value. It also has a special meaning when used as theargument of an aggregate function, namely that the aggregate does notrequire any explicit parameter.

  • The period (.) is used in numeric constants, and to separate table, andcolumn names.

Comments

A comment is a sequence of characters beginning with double dashes and extendingto the end of the line. For example:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment *   with nesting: /* nested block comment */ */

where the comment begins with /* and extends to the matching occurrence of */.These block comments nest, as specified in the SQL standard but unlike C, sothat one can comment out larger blocks of code that might contain existing blockcomments.

A comment is removed from the input stream before further syntax analysis and iseffectively replaced by whitespace. The exception is a comment that containshints, which are preserved for interpretation by the query planner.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.