Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitcf9b0fe

Browse files
committed
Implement regexp_match(), a simplified alternative to regexp_matches().
regexp_match() is like regexp_matches(), but it disallows the 'g' flagand in consequence does not need to return a set. Instead, it returnsa simple text array value, or NULL if there's no match. Previously peopleusually got that behavior with a sub-select, but this way is considerablymore efficient.Documentation adjusted so that regexp_match() is presented first and thenregexp_matches() is introduced as a more complicated version. This isa bit historically revisionist but seems pedagogically better.Still TODO: extend contrib/citext to support this function.Emre Hasegeli, reviewed by David JohnstonDiscussion: <CAE2gYzy42sna2ME_e3y1KLQ-4UBrB-eVF0SWn8QG39sQSeVhEw@mail.gmail.com>
1 parent2d7e591 commitcf9b0fe

File tree

9 files changed

+252
-93
lines changed

9 files changed

+252
-93
lines changed

‎doc/src/sgml/func.sgml

Lines changed: 109 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -2036,6 +2036,23 @@
20362036
<entry><literal>'42.5'</literal></entry>
20372037
</row>
20382038

2039+
<row>
2040+
<entry>
2041+
<indexterm>
2042+
<primary>regexp_match</primary>
2043+
</indexterm>
2044+
<literal><function>regexp_match(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</function></literal>
2045+
</entry>
2046+
<entry><type>text[]</type></entry>
2047+
<entry>
2048+
Return captured substring(s) resulting from the first match of a POSIX
2049+
regular expression to the <parameter>string</parameter>. See
2050+
<xref linkend="functions-posix-regexp"> for more information.
2051+
</entry>
2052+
<entry><literal>regexp_match('foobarbequebaz', '(bar)(beque)')</literal></entry>
2053+
<entry><literal>{bar,beque}</literal></entry>
2054+
</row>
2055+
20392056
<row>
20402057
<entry>
20412058
<indexterm>
@@ -2045,12 +2062,12 @@
20452062
</entry>
20462063
<entry><type>setof text[]</type></entry>
20472064
<entry>
2048-
Returnallcapturedsubstrings resulting from matching a POSIX regular
2049-
expressionagainst the <parameter>string</parameter>. See
2065+
Return capturedsubstring(s) resulting from matching a POSIX regular
2066+
expressionto the <parameter>string</parameter>. See
20502067
<xref linkend="functions-posix-regexp"> for more information.
20512068
</entry>
2052-
<entry><literal>regexp_matches('foobarbequebaz', '(bar)(beque)')</literal></entry>
2053-
<entry><literal>{bar,beque}</literal></entry>
2069+
<entry><literal>regexp_matches('foobarbequebaz', 'ba.', 'g')</literal></entry>
2070+
<entry><literal>{bar}</literal><para><literal>{baz}</literal></para> (2 rows)</entry>
20542071
</row>
20552072

20562073
<row>
@@ -4112,6 +4129,9 @@ substring('foobar' from '#"o_b#"%' for '#') <lineannotation>NULL</lineannotat
41124129
<indexterm>
41134130
<primary>regexp_replace</primary>
41144131
</indexterm>
4132+
<indexterm>
4133+
<primary>regexp_match</primary>
4134+
</indexterm>
41154135
<indexterm>
41164136
<primary>regexp_matches</primary>
41174137
</indexterm>
@@ -4272,64 +4292,106 @@ regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
42724292
</para>
42734293

42744294
<para>
4275-
The <function>regexp_matches</> function returns a text array of
4276-
all of the captured substrings resulting from matching a POSIX
4277-
regular expression pattern. It has the syntax
4278-
<function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
4279-
<optional>, <replaceable>flags</> </optional>).
4280-
The function can return no rows, one row, or multiple rows (see
4281-
the <literal>g</> flag below). If the <replaceable>pattern</>
4282-
does not match, the function returns no rows. If the pattern
4283-
contains no parenthesized subexpressions, then each row
4284-
returned is a single-element text array containing the substring
4285-
matching the whole pattern. If the pattern contains parenthesized
4286-
subexpressions, the function returns a text array whose
4287-
<replaceable>n</>'th element is the substring matching the
4288-
<replaceable>n</>'th parenthesized subexpression of the pattern
4289-
(not counting <quote>non-capturing</> parentheses; see below for
4290-
details).
4291-
The <replaceable>flags</> parameter is an optional text
4292-
string containing zero or more single-letter flags that change the
4293-
function's behavior. Flag <literal>g</> causes the function to find
4294-
each match in the string, not only the first one, and return a row for
4295-
each such match. Supported flags (though
4296-
not <literal>g</>)
4297-
are described in <xref linkend="posix-embedded-options-table">.
4295+
The <function>regexp_match</> function returns a text array of
4296+
captured substring(s) resulting from the first match of a POSIX
4297+
regular expression pattern to a string. It has the syntax
4298+
<function>regexp_match</function>(<replaceable>string</>,
4299+
<replaceable>pattern</> <optional>, <replaceable>flags</> </optional>).
4300+
If there is no match, the result is <literal>NULL</>.
4301+
If a match is found, and the <replaceable>pattern</> contains no
4302+
parenthesized subexpressions, then the result is a single-element text
4303+
array containing the substring matching the whole pattern.
4304+
If a match is found, and the <replaceable>pattern</> contains
4305+
parenthesized subexpressions, then the result is a text array
4306+
whose <replaceable>n</>'th element is the substring matching
4307+
the <replaceable>n</>'th parenthesized subexpression of
4308+
the <replaceable>pattern</> (not counting <quote>non-capturing</>
4309+
parentheses; see below for details).
4310+
The <replaceable>flags</> parameter is an optional text string
4311+
containing zero or more single-letter flags that change the function's
4312+
behavior. Supported flags are described
4313+
in <xref linkend="posix-embedded-options-table">.
42984314
</para>
42994315

43004316
<para>
43014317
Some examples:
43024318
<programlisting>
4303-
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)');
4304-
regexp_matches
4305-
----------------
4319+
SELECT regexp_match('foobarbequebaz', 'bar.*que');
4320+
regexp_match
4321+
--------------
4322+
{barbeque}
4323+
(1 row)
4324+
4325+
SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
4326+
regexp_match
4327+
--------------
43064328
{bar,beque}
43074329
(1 row)
4330+
</programlisting>
4331+
In the common case where you just want the whole matching substring
4332+
or <literal>NULL</> for no match, write something like
4333+
<programlisting>
4334+
SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
4335+
regexp_match
4336+
--------------
4337+
barbeque
4338+
(1 row)
4339+
</programlisting>
4340+
</para>
4341+
4342+
<para>
4343+
The <function>regexp_matches</> function returns a set of text arrays
4344+
of captured substring(s) resulting from matching a POSIX regular
4345+
expression pattern to a string. It has the same syntax as
4346+
<function>regexp_match</function>.
4347+
This function returns no rows if there is no match, one row if there is
4348+
a match and the <literal>g</> flag is not given, or <replaceable>N</>
4349+
rows if there are <replaceable>N</> matches and the <literal>g</> flag
4350+
is given. Each returned row is a text array containing the whole
4351+
matched substring or the substrings matching parenthesized
4352+
subexpressions of the <replaceable>pattern</>, just as described above
4353+
for <function>regexp_match</function>.
4354+
<function>regexp_matches</> accepts all the flags shown
4355+
in <xref linkend="posix-embedded-options-table">, plus
4356+
the <literal>g</> flag which commands it to return all matches, not
4357+
just the first one.
4358+
</para>
4359+
4360+
<para>
4361+
Some examples:
4362+
<programlisting>
4363+
SELECT regexp_matches('foo', 'not there');
4364+
regexp_matches
4365+
----------------
4366+
(0 rows)
43084367

43094368
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
4310-
regexp_matches
4369+
regexp_matches
43114370
----------------
43124371
{bar,beque}
43134372
{bazil,barf}
43144373
(2 rows)
4315-
4316-
SELECT regexp_matches('foobarbequebaz', 'barbeque');
4317-
regexp_matches
4318-
----------------
4319-
{barbeque}
4320-
(1 row)
43214374
</programlisting>
43224375
</para>
43234376

4324-
<para>
4325-
It is possible to force <function>regexp_matches()</> to always
4326-
return one row by using a sub-select; this is particularly useful
4327-
in a <literal>SELECT</> target list when you want all rows
4328-
returned, even non-matching ones:
4377+
<tip>
4378+
<para>
4379+
In most cases <function>regexp_matches()</> should be used with
4380+
the <literal>g</> flag, since if you only want the first match, it's
4381+
easier and more efficient to use <function>regexp_match()</>.
4382+
However, <function>regexp_match()</> only exists
4383+
in <productname>PostgreSQL</> version 10 and up. When working in older
4384+
versions, a common trick is to place a <function>regexp_matches()</>
4385+
call in a sub-select, for example:
43294386
<programlisting>
43304387
SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
43314388
</programlisting>
4332-
</para>
4389+
This produces a text array if there's a match, or <literal>NULL</> if
4390+
not, the same as <function>regexp_match()</> would do. Without the
4391+
sub-select, this query would produce no output at all for table rows
4392+
without a match, which is typically not the desired behavior.
4393+
</para>
4394+
</tip>
43334395

43344396
<para>
43354397
The <function>regexp_split_to_table</> function splits a string using a POSIX
@@ -4408,6 +4470,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
44084470
zero-length matches that occur at the start or end of the string
44094471
or immediately after a previous match. This is contrary to the strict
44104472
definition of regexp matching that is implemented by
4473+
<function>regexp_match</> and
44114474
<function>regexp_matches</>, but is usually the most convenient behavior
44124475
in practice. Other software systems such as Perl use similar definitions.
44134476
</para>
@@ -5482,22 +5545,22 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
54825545
into the digits and the parts before and after them. We might try to
54835546
do that like this:
54845547
<screen>
5485-
SELECTregexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
5548+
SELECTregexp_match('abc01234xyz', '(.*)(\d+)(.*)');
54865549
<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
54875550
</screen>
54885551
That didn't work: the first <literal>.*</> is greedy so
54895552
it <quote>eats</> as much as it can, leaving the <literal>\d+</> to
54905553
match at the last possible place, the last digit. We might try to fix
54915554
that by making it non-greedy:
54925555
<screen>
5493-
SELECTregexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
5556+
SELECTregexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
54945557
<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
54955558
</screen>
54965559
That didn't work either, because now the RE as a whole is non-greedy
54975560
and so it ends the overall match as soon as possible. We can get what
54985561
we want by forcing the RE as a whole to be greedy:
54995562
<screen>
5500-
SELECTregexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
5563+
SELECTregexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
55015564
<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
55025565
</screen>
55035566
Controlling the RE's overall greediness separately from its components'

‎src/backend/catalog/information_schema.sql

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2068,7 +2068,7 @@ CREATE VIEW triggers AS
20682068
-- XXX strange hacks follow
20692069
CAST(
20702070
CASE WHEN pg_has_role(c.relowner,'USAGE')
2071-
THEN (SELECT m[1]FROM regexp_matches(pg_get_triggerdef(t.oid), E'.{35,} WHEN\\((.+)\\) EXECUTE PROCEDURE')AS rm(m)LIMIT1)
2071+
THEN (regexp_match(pg_get_triggerdef(t.oid), E'.{35,} WHEN\\((.+)\\) EXECUTE PROCEDURE'))[1]
20722072
ELSEnull END
20732073
AS character_data)AS action_condition,
20742074
CAST(

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp