@@ -5970,6 +5970,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
59705970
59715971<!-- end re_syntax.n man page -->
59725972
5973+ <sect3 id="posix-vs-xquery">
5974+ <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title>
5975+
5976+ <indexterm zone="posix-vs-xquery">
5977+ <primary><literal>LIKE_REGEX</literal></primary>
5978+ </indexterm>
5979+
5980+ <indexterm zone="posix-vs-xquery">
5981+ <primary>XQuery regular expressions</primary>
5982+ </indexterm>
5983+
5984+ <para>
5985+ Since SQL:2008, the SQL standard includes
5986+ a <literal>LIKE_REGEX</literal> operator that performs pattern
5987+ matching according to the XQuery regular expression
5988+ standard. <productname>PostgreSQL</productname> does not yet
5989+ implement this operator, but you can get very similar behavior using
5990+ the <function>regexp_match()</function> function, since XQuery
5991+ regular expressions are quite close to the ARE syntax described above.
5992+ </para>
5993+
5994+ <para>
5995+ Notable differences between the existing POSIX-based
5996+ regular-expression feature and XQuery regular expressions include:
5997+
5998+ <itemizedlist>
5999+ <listitem>
6000+ <para>
6001+ XQuery character class subtraction is not supported. An example of
6002+ this feature is using the following to match only English
6003+ consonants: <literal>[a-z-[aeiou]]</literal>.
6004+ </para>
6005+ </listitem>
6006+ <listitem>
6007+ <para>
6008+ XQuery character class shorthands <literal>\c</literal>,
6009+ <literal>\C</literal>, <literal>\i</literal>,
6010+ and <literal>\I</literal> are not supported.
6011+ </para>
6012+ </listitem>
6013+ <listitem>
6014+ <para>
6015+ XQuery character class elements
6016+ using <literal>\p{UnicodeProperty}</literal> or the
6017+ inverse <literal>\P{UnicodeProperty}</literal> are not supported.
6018+ </para>
6019+ </listitem>
6020+ <listitem>
6021+ <para>
6022+ POSIX interprets character classes such as <literal>\w</literal>
6023+ (see <xref linkend="posix-class-shorthand-escapes-table"/>)
6024+ according to the prevailing locale (which you can control by
6025+ attaching a <literal>COLLATE</literal> clause to the operator or
6026+ function). XQuery specifies these classes by reference to Unicode
6027+ character properties, so equivalent behavior is obtained only with
6028+ a locale that follows the Unicode rules.
6029+ </para>
6030+ </listitem>
6031+ <listitem>
6032+ <para>
6033+ The SQL standard (not XQuery itself) attempts to cater for more
6034+ variants of <quote>newline</quote> than POSIX does. The
6035+ newline-sensitive matching options described above consider only
6036+ ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
6037+ us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
6038+ (a Windows-style newline), and some Unicode-only characters like
6039+ LINE SEPARATOR (U+2028) as newlines as well.
6040+ Notably, <literal>.</literal> and <literal>\s</literal> should
6041+ count <literal>\r\n</literal> as one character not two according to
6042+ SQL.
6043+ </para>
6044+ </listitem>
6045+ <listitem>
6046+ <para>
6047+ Of the character-entry escapes described in
6048+ <xref linkend="posix-character-entry-escapes-table"/>,
6049+ XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
6050+ and <literal>\t</literal>.
6051+ </para>
6052+ </listitem>
6053+ <listitem>
6054+ <para>
6055+ XQuery does not support
6056+ the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
6057+ for character classes within bracket expressions.
6058+ </para>
6059+ </listitem>
6060+ <listitem>
6061+ <para>
6062+ XQuery does not have lookahead or lookbehind constraints,
6063+ nor any of the constraint escapes described in
6064+ <xref linkend="posix-constraint-escapes-table"/>.
6065+ </para>
6066+ </listitem>
6067+ <listitem>
6068+ <para>
6069+ The metasyntax forms described in <xref linkend="posix-metasyntax"/>
6070+ do not exist in XQuery.
6071+ </para>
6072+ </listitem>
6073+ <listitem>
6074+ <para>
6075+ The regular expression flag letters defined by XQuery are
6076+ related to but not the same as the option letters for POSIX
6077+ (<xref linkend="posix-embedded-options-table"/>). While the
6078+ <literal>i</literal> and <literal>q</literal> options behave the
6079+ same, others do not:
6080+ <itemizedlist>
6081+ <listitem>
6082+ <para>
6083+ XQuery's <literal>s</literal> (allow dot to match newline)
6084+ and <literal>m</literal> (allow <literal>^</literal>
6085+ and <literal>$</literal> to match at newlines) flags provide
6086+ access to the same behaviors as
6087+ POSIX's <literal>n</literal>, <literal>p</literal>
6088+ and <literal>w</literal> flags, but they
6089+ do <emphasis>not</emphasis> match the behavior of
6090+ POSIX's <literal>s</literal> and <literal>m</literal> flags.
6091+ Note in particular that dot-matches-newline is the default
6092+ behavior in POSIX but not XQuery.
6093+ </para>
6094+ </listitem>
6095+ <listitem>
6096+ <para>
6097+ XQuery's <literal>x</literal> (ignore whitespace in pattern) flag
6098+ is noticeably different from POSIX's expanded-mode flag.
6099+ POSIX's <literal>x</literal> flag also
6100+ allows <literal>#</literal> to begin a comment in the pattern,
6101+ and POSIX will not ignore a whitespace character after a
6102+ backslash.
6103+ </para>
6104+ </listitem>
6105+ </itemizedlist>
6106+ </para>
6107+ </listitem>
6108+ </itemizedlist>
6109+ </para>
6110+
6111+ </sect3>
59736112 </sect2>
59746113 </sect1>
59756114
@@ -11793,6 +11932,14 @@ table2-mapping
1179311932</programlisting>
1179411933 </para>
1179511934 </listitem>
11935+
11936+ <listitem>
11937+ <para>
11938+ There are minor differences in the interpretation of regular
11939+ expression patterns used in <literal>like_regex</literal> filters, as
11940+ described in <xref linkend="jsonpath-regular-expressions"/>.
11941+ </para>
11942+ </listitem>
1179611943 </itemizedlist>
1179711944
1179811945 <sect3 id="strict-and-lax-modes">
@@ -11872,6 +12019,63 @@ table2-mapping
1187212019
1187312020 </sect3>
1187412021
12022+ <sect3 id="jsonpath-regular-expressions">
12023+ <title>Regular Expressions</title>
12024+
12025+ <indexterm zone="jsonpath-regular-expressions">
12026+ <primary><literal>LIKE_REGEX</literal></primary>
12027+ <secondary>in SQL/JSON</secondary>
12028+ </indexterm>
12029+
12030+ <para>
12031+ SQL/JSON path expressions allow matching text to a regular expression
12032+ with the <literal>like_regex</literal> filter. For example, the
12033+ following SQL/JSON path query would case-insensitively match all
12034+ strings in an array that start with an English vowel:
12035+ <programlisting>
12036+ '$[*] ? (@ like_regex "^[aeiou]" flag "i")'
12037+ </programlisting>
12038+ </para>
12039+
12040+ <para>
12041+ The optional <literal>flag</literal> string may include one or more of
12042+ the characters
12043+ <literal>i</literal> for case-insensitive match,
12044+ <literal>m</literal> to allow <literal>^</literal>
12045+ and <literal>$</literal> to match at newlines,
12046+ <literal>s</literal> to allow <literal>.</literal> to match a newline,
12047+ and <literal>q</literal> to quote the whole pattern (reducing the
12048+ behavior to a simple substring match).
12049+ </para>
12050+
12051+ <para>
12052+ The SQL/JSON standard borrows its definition for regular expressions
12053+ from the <literal>LIKE_REGEX</literal> operator, which in turn uses the
12054+ XQuery standard. PostgreSQL does not currently support the
12055+ <literal>LIKE_REGEX</literal> operator. Therefore,
12056+ the <literal>like_regex</literal> filter is implemented using the
12057+ POSIX regular expression engine described in
12058+ <xref linkend="functions-posix-regexp"/>. This leads to various minor
12059+ discrepancies from standard SQL/JSON behavior, which are cataloged in
12060+ <xref linkend="posix-vs-xquery"/>.
12061+ Note, however, that the flag-letter incompatibilities described there
12062+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
12063+ match what the POSIX engine expects.
12064+ </para>
12065+
12066+ <para>
12067+ Keep in mind that the pattern argument of <literal>like_regex</literal>
12068+ is a JSON path string literal, written according to the rules given in
12069+ <xref linkend="datatype-jsonpath"/>. This means in particular that any
12070+ backslashes you want to use in the regular expression must be doubled.
12071+ For example, to match strings that contain only digits:
12072+ <programlisting>
12073+ '$ ? (@ like_regex "^\\d+$")'
12074+ </programlisting>
12075+ </para>
12076+
12077+ </sect3>
12078+
1187512079 <sect3 id="functions-sqljson-path-operators">
1187612080 <title>SQL/JSON Path Operators and Methods</title>
1187712081
@@ -12113,10 +12317,11 @@ table2-mapping
1211312317 <row>
1211412318 <entry><literal>like_regex</literal></entry>
1211512319 <entry>
12116- Tests pattern matching with POSIX regular expressions
12117- (see <xref linkend="functions-posix-regexp"/>). Supported flags
12118- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
12119- <literal>x</literal>, and <literal>q</literal>.</entry>
12320+ Tests whether the first operand matches the regular expression
12321+ given by the second operand, optionally with modifications
12322+ described by a string of <literal>flag</literal> characters (see
12323+ <xref linkend="jsonpath-regular-expressions"/>)
12324+ </entry>
1212012325 <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry>
1212112326 <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry>
1212212327 <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>