Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit0471cd5

Browse files
committed
Clarify description of greedy and non-greedy POSIX regular expressions,
per discussion in Nov 2004 with Ken Tanzer.
1 parenta9566cc commit0471cd5

File tree

1 file changed

+94
-30
lines changed

1 file changed

+94
-30
lines changed

‎doc/src/sgml/func.sgml

Lines changed: 94 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.233 2005/01/08 05:19:18 tgl Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.234 2005/01/09 20:08:50 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -3772,45 +3772,109 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
37723772
In the event that an RE could match more than one substring of a given
37733773
string, the RE matches the one starting earliest in the string.
37743774
If the RE could match more than one substring starting at that point,
3775-
its choice is determined by its <firstterm>preference</>:
3776-
either the longest substring, or the shortest.
3775+
either the longest possible match or the shortest possible match will
3776+
be taken, depending on whether the RE is <firstterm>greedy</> or
3777+
<firstterm>non-greedy</>.
37773778
</para>
37783779

37793780
<para>
3780-
Most atoms, and all constraints, have no preference.
3781-
A parenthesized RE has the same preference (possibly none) as the RE.
3782-
A quantified atom with quantifier
3783-
<literal>{</><replaceable>m</><literal>}</>
3784-
or
3785-
<literal>{</><replaceable>m</><literal>}?</>
3786-
has the same preference (possibly none) as the atom itself.
3787-
A quantified atom with other normal quantifiers (including
3788-
<literal>{</><replaceable>m</><literal>,</><replaceable>n</><literal>}</>
3789-
with <replaceable>m</> equal to <replaceable>n</>)
3790-
prefers longest match.
3791-
A quantified atom with other non-greedy quantifiers (including
3792-
<literal>{</><replaceable>m</><literal>,</><replaceable>n</><literal>}?</>
3793-
with <replaceable>m</> equal to <replaceable>n</>)
3794-
prefers shortest match.
3795-
A branch has the same preference as the first quantified atom in it
3796-
which has a preference.
3797-
An RE consisting of two or more branches connected by the
3798-
<literal>|</> operator prefers longest match.
3781+
Whether an RE is greedy or not is determined by the following rules:
3782+
<itemizedlist>
3783+
<listitem>
3784+
<para>
3785+
Most atoms, and all constraints, have no greediness attribute (because
3786+
they cannot match variable amounts of text anyway).
3787+
</para>
3788+
</listitem>
3789+
<listitem>
3790+
<para>
3791+
Adding parentheses around an RE does not change its greediness.
3792+
</para>
3793+
</listitem>
3794+
<listitem>
3795+
<para>
3796+
A quantified atom with a fixed-repetition quantifier
3797+
(<literal>{</><replaceable>m</><literal>}</>
3798+
or
3799+
<literal>{</><replaceable>m</><literal>}?</>)
3800+
has the same greediness (possibly none) as the atom itself.
3801+
</para>
3802+
</listitem>
3803+
<listitem>
3804+
<para>
3805+
A quantified atom with other normal quantifiers (including
3806+
<literal>{</><replaceable>m</><literal>,</><replaceable>n</><literal>}</>
3807+
with <replaceable>m</> equal to <replaceable>n</>)
3808+
is greedy (prefers longest match).
3809+
</para>
3810+
</listitem>
3811+
<listitem>
3812+
<para>
3813+
A quantified atom with a non-greedy quantifier (including
3814+
<literal>{</><replaceable>m</><literal>,</><replaceable>n</><literal>}?</>
3815+
with <replaceable>m</> equal to <replaceable>n</>)
3816+
is non-greedy (prefers shortest match).
3817+
</para>
3818+
</listitem>
3819+
<listitem>
3820+
<para>
3821+
A branch &mdash; that is, an RE that has no top-level
3822+
<literal>|</> operator &mdash; has the same greediness as the first
3823+
quantified atom in it that has a greediness attribute.
3824+
</para>
3825+
</listitem>
3826+
<listitem>
3827+
<para>
3828+
An RE consisting of two or more branches connected by the
3829+
<literal>|</> operator is always greedy.
3830+
</para>
3831+
</listitem>
3832+
</itemizedlist>
3833+
</para>
3834+
3835+
<para>
3836+
The above rules associate greediness attributes not only with individual
3837+
quantified atoms, but with branches and entire REs that contain quantified
3838+
atoms. What that means is that the matching is done in such a way that
3839+
the branch, or whole RE, matches the longest or shortest possible
3840+
substring <emphasis>as a whole</>. Once the length of the entire match
3841+
is determined, the part of it that matches any particular subexpression
3842+
is determined on the basis of the greediness attribute of that
3843+
subexpression, with subexpressions starting earlier in the RE taking
3844+
priority over ones starting later.
3845+
</para>
3846+
3847+
<para>
3848+
An example of what this means:
3849+
<screen>
3850+
SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
3851+
<lineannotation>Result: </lineannotation><computeroutput>123</computeroutput>
3852+
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
3853+
<lineannotation>Result: </lineannotation><computeroutput>1</computeroutput>
3854+
</screen>
3855+
In the first case, the RE as a whole is greedy because <literal>Y*</>
3856+
is greedy. It can match beginning at the <literal>Y</>, and it matches
3857+
the longest possible string starting there, i.e., <literal>Y123</>.
3858+
The output is the parenthesized part of that, or <literal>123</>.
3859+
In the second case, the RE as a whole is non-greedy because <literal>Y*?</>
3860+
is non-greedy. It can match beginning at the <literal>Y</>, and it matches
3861+
the shortest possible string starting there, i.e., <literal>Y1</>.
3862+
The subexpression <literal>[0-9]{1,3}</> is greedy but it cannot change
3863+
the decision as to the overall match length; so it is forced to match
3864+
just <literal>1</>.
37993865
</para>
38003866

38013867
<para>
3802-
Subject to the constraints imposed by the rules for matching the whole RE,
3803-
subexpressions also match the longest or shortest possible substrings,
3804-
based on their preferences,
3805-
with subexpressions starting earlier in the RE taking priority over
3806-
ones starting later.
3807-
Note that outer subexpressions thus take priority over
3808-
their component subexpressions.
3868+
In short, when an RE contains both greedy and non-greedy subexpressions,
3869+
the total match length is either as long as possible or as short as
3870+
possible, according to the attribute assigned to the whole RE. The
3871+
attributes assigned to the subexpressions only affect how much of that
3872+
match they are allowed to <quote>eat</> relative to each other.
38093873
</para>
38103874

38113875
<para>
38123876
The quantifiers <literal>{1,1}</> and <literal>{1,1}?</>
3813-
can be used to forcelongest and shortest preference, respectively,
3877+
can be used to forcegreediness or non-greediness, respectively,
38143878
on a subexpression or a whole RE.
38153879
</para>
38163880

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp