@@ -5966,6 +5966,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
59665966
59675967<!-- end re_syntax.n man page -->
59685968
5969+ <sect3 id="posix-vs-xquery">
5970+ <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title>
5971+
5972+ <indexterm zone="posix-vs-xquery">
5973+ <primary><literal>LIKE_REGEX</literal></primary>
5974+ </indexterm>
5975+
5976+ <indexterm zone="posix-vs-xquery">
5977+ <primary>XQuery regular expressions</primary>
5978+ </indexterm>
5979+
5980+ <para>
5981+ Since SQL:2008, the SQL standard includes
5982+ a <literal>LIKE_REGEX</literal> operator that performs pattern
5983+ matching according to the XQuery regular expression
5984+ standard. <productname>PostgreSQL</productname> does not yet
5985+ implement this operator, but you can get very similar behavior using
5986+ the <function>regexp_match()</function> function, since XQuery
5987+ regular expressions are quite close to the ARE syntax described above.
5988+ </para>
5989+
5990+ <para>
5991+ Notable differences between the existing POSIX-based
5992+ regular-expression feature and XQuery regular expressions include:
5993+
5994+ <itemizedlist>
5995+ <listitem>
5996+ <para>
5997+ XQuery character class subtraction is not supported. An example of
5998+ this feature is using the following to match only English
5999+ consonants: <literal>[a-z-[aeiou]]</literal>.
6000+ </para>
6001+ </listitem>
6002+ <listitem>
6003+ <para>
6004+ XQuery character class shorthands <literal>\c</literal>,
6005+ <literal>\C</literal>, <literal>\i</literal>,
6006+ and <literal>\I</literal> are not supported.
6007+ </para>
6008+ </listitem>
6009+ <listitem>
6010+ <para>
6011+ XQuery character class elements
6012+ using <literal>\p{UnicodeProperty}</literal> or the
6013+ inverse <literal>\P{UnicodeProperty}</literal> are not supported.
6014+ </para>
6015+ </listitem>
6016+ <listitem>
6017+ <para>
6018+ POSIX interprets character classes such as <literal>\w</literal>
6019+ (see <xref linkend="posix-class-shorthand-escapes-table"/>)
6020+ according to the prevailing locale (which you can control by
6021+ attaching a <literal>COLLATE</literal> clause to the operator or
6022+ function). XQuery specifies these classes by reference to Unicode
6023+ character properties, so equivalent behavior is obtained only with
6024+ a locale that follows the Unicode rules.
6025+ </para>
6026+ </listitem>
6027+ <listitem>
6028+ <para>
6029+ The SQL standard (not XQuery itself) attempts to cater for more
6030+ variants of <quote>newline</quote> than POSIX does. The
6031+ newline-sensitive matching options described above consider only
6032+ ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
6033+ us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
6034+ (a Windows-style newline), and some Unicode-only characters like
6035+ LINE SEPARATOR (U+2028) as newlines as well.
6036+ Notably, <literal>.</literal> and <literal>\s</literal> should
6037+ count <literal>\r\n</literal> as one character not two according to
6038+ SQL.
6039+ </para>
6040+ </listitem>
6041+ <listitem>
6042+ <para>
6043+ Of the character-entry escapes described in
6044+ <xref linkend="posix-character-entry-escapes-table"/>,
6045+ XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
6046+ and <literal>\t</literal>.
6047+ </para>
6048+ </listitem>
6049+ <listitem>
6050+ <para>
6051+ XQuery does not support
6052+ the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
6053+ for character classes within bracket expressions.
6054+ </para>
6055+ </listitem>
6056+ <listitem>
6057+ <para>
6058+ XQuery does not have lookahead or lookbehind constraints,
6059+ nor any of the constraint escapes described in
6060+ <xref linkend="posix-constraint-escapes-table"/>.
6061+ </para>
6062+ </listitem>
6063+ <listitem>
6064+ <para>
6065+ The metasyntax forms described in <xref linkend="posix-metasyntax"/>
6066+ do not exist in XQuery.
6067+ </para>
6068+ </listitem>
6069+ <listitem>
6070+ <para>
6071+ The regular expression flag letters defined by XQuery are
6072+ related to but not the same as the option letters for POSIX
6073+ (<xref linkend="posix-embedded-options-table"/>). While the
6074+ <literal>i</literal> and <literal>q</literal> options behave the
6075+ same, others do not:
6076+ <itemizedlist>
6077+ <listitem>
6078+ <para>
6079+ XQuery's <literal>s</literal> (allow dot to match newline)
6080+ and <literal>m</literal> (allow <literal>^</literal>
6081+ and <literal>$</literal> to match at newlines) flags provide
6082+ access to the same behaviors as
6083+ POSIX's <literal>n</literal>, <literal>p</literal>
6084+ and <literal>w</literal> flags, but they
6085+ do <emphasis>not</emphasis> match the behavior of
6086+ POSIX's <literal>s</literal> and <literal>m</literal> flags.
6087+ Note in particular that dot-matches-newline is the default
6088+ behavior in POSIX but not XQuery.
6089+ </para>
6090+ </listitem>
6091+ <listitem>
6092+ <para>
6093+ XQuery's <literal>x</literal> (ignore whitespace in pattern) flag
6094+ is noticeably different from POSIX's expanded-mode flag.
6095+ POSIX's <literal>x</literal> flag also
6096+ allows <literal>#</literal> to begin a comment in the pattern,
6097+ and POSIX will not ignore a whitespace character after a
6098+ backslash.
6099+ </para>
6100+ </listitem>
6101+ </itemizedlist>
6102+ </para>
6103+ </listitem>
6104+ </itemizedlist>
6105+ </para>
6106+
6107+ </sect3>
59696108 </sect2>
59706109 </sect1>
59716110
@@ -11739,6 +11878,14 @@ table2-mapping
1173911878</programlisting>
1174011879 </para>
1174111880 </listitem>
11881+
11882+ <listitem>
11883+ <para>
11884+ There are minor differences in the interpretation of regular
11885+ expression patterns used in <literal>like_regex</literal> filters, as
11886+ described in <xref linkend="jsonpath-regular-expressions"/>.
11887+ </para>
11888+ </listitem>
1174211889 </itemizedlist>
1174311890
1174411891 <sect3 id="strict-and-lax-modes">
@@ -11818,6 +11965,63 @@ table2-mapping
1181811965
1181911966 </sect3>
1182011967
11968+ <sect3 id="jsonpath-regular-expressions">
11969+ <title>Regular Expressions</title>
11970+
11971+ <indexterm zone="jsonpath-regular-expressions">
11972+ <primary><literal>LIKE_REGEX</literal></primary>
11973+ <secondary>in SQL/JSON</secondary>
11974+ </indexterm>
11975+
11976+ <para>
11977+ SQL/JSON path expressions allow matching text to a regular expression
11978+ with the <literal>like_regex</literal> filter. For example, the
11979+ following SQL/JSON path query would case-insensitively match all
11980+ strings in an array that start with an English vowel:
11981+ <programlisting>
11982+ '$[*] ? (@ like_regex "^[aeiou]" flag "i")'
11983+ </programlisting>
11984+ </para>
11985+
11986+ <para>
11987+ The optional <literal>flag</literal> string may include one or more of
11988+ the characters
11989+ <literal>i</literal> for case-insensitive match,
11990+ <literal>m</literal> to allow <literal>^</literal>
11991+ and <literal>$</literal> to match at newlines,
11992+ <literal>s</literal> to allow <literal>.</literal> to match a newline,
11993+ and <literal>q</literal> to quote the whole pattern (reducing the
11994+ behavior to a simple substring match).
11995+ </para>
11996+
11997+ <para>
11998+ The SQL/JSON standard borrows its definition for regular expressions
11999+ from the <literal>LIKE_REGEX</literal> operator, which in turn uses the
12000+ XQuery standard. PostgreSQL does not currently support the
12001+ <literal>LIKE_REGEX</literal> operator. Therefore,
12002+ the <literal>like_regex</literal> filter is implemented using the
12003+ POSIX regular expression engine described in
12004+ <xref linkend="functions-posix-regexp"/>. This leads to various minor
12005+ discrepancies from standard SQL/JSON behavior, which are cataloged in
12006+ <xref linkend="posix-vs-xquery"/>.
12007+ Note, however, that the flag-letter incompatibilities described there
12008+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
12009+ match what the POSIX engine expects.
12010+ </para>
12011+
12012+ <para>
12013+ Keep in mind that the pattern argument of <literal>like_regex</literal>
12014+ is a JSON path string literal, written according to the rules given in
12015+ <xref linkend="datatype-jsonpath"/>. This means in particular that any
12016+ backslashes you want to use in the regular expression must be doubled.
12017+ For example, to match strings that contain only digits:
12018+ <programlisting>
12019+ '$ ? (@ like_regex "^\\d+$")'
12020+ </programlisting>
12021+ </para>
12022+
12023+ </sect3>
12024+
1182112025 <sect3 id="functions-sqljson-path-operators">
1182212026 <title>SQL/JSON Path Operators and Methods</title>
1182312027
@@ -12059,10 +12263,11 @@ table2-mapping
1205912263 <row>
1206012264 <entry><literal>like_regex</literal></entry>
1206112265 <entry>
12062- Tests pattern matching with POSIX regular expressions
12063- (see <xref linkend="functions-posix-regexp"/>). Supported flags
12064- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
12065- <literal>x</literal>, and <literal>q</literal>.</entry>
12266+ Tests whether the first operand matches the regular expression
12267+ given by the second operand, optionally with modifications
12268+ described by a string of <literal>flag</literal> characters (see
12269+ <xref linkend="jsonpath-regular-expressions"/>)
12270+ </entry>
1206612271 <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry>
1206712272 <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry>
1206812273 <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>