Commit 7dc13a0

Change regex \D and \W shorthands to always match newlines.
Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
1 parent 2a0af7f, commit 7dc13a0
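The commit message points to the default behavior of other regex engines as the reason for the change. As a rough illustration only, the sketch below uses Python's `re` module as a stand-in for a Perl-style engine. Python is not mentioned in the commit, and the helper `first_match` is a hypothetical name introduced here. The subject strings are borrowed from the regression tests touched by this commit.

```python
import re


def first_match(pattern, text, flags=0):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None


if __name__ == "__main__":
    # \D and \W are the complements of "digit" and "word character",
    # so a newline belongs to both and the match runs across the line break.
    print(repr(first_match(r"\D+", "abc\ndef345")))
    print(repr(first_match(r"\W+", "***\n@@@___")))

    # re.MULTILINE only changes ^ and $; \D still matches the newline.
    print(repr(first_match(r"\D+", "abc\ndef345", re.MULTILINE)))

    # Excluding the newline by hand stops the match at the line break,
    # which is comparable to what [^[:digit:]] does in PostgreSQL's
    # newline-sensitive mode.
    print(repr(first_match(r"[^\d\n]+", "abc\ndef345")))
```

The first three calls return matches that span the newline, and the last one stops at `abc`. That is the same split the commit draws between `\D` and an explicit `[^[:digit:]]` in newline-sensitive mode.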

4 files changed: +36 -17 lines changed

‎doc/src/sgml/func.sgml

Lines changed: 20 additions & 8 deletions
@@ -6323,32 +6323,38 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 <tbody>
 <row>
 <entry> <literal>\d</literal> </entry>
-<entry> <literal>[[:digit:]]</literal> </entry>
+<entry> matches any digit, like
+<literal>[[:digit:]]</literal> </entry>
 </row>

 <row>
 <entry> <literal>\s</literal> </entry>
-<entry> <literal>[[:space:]]</literal> </entry>
+<entry> matches any whitespace character, like
+<literal>[[:space:]]</literal> </entry>
 </row>

 <row>
 <entry> <literal>\w</literal> </entry>
-<entry> <literal>[[:word:]]</literal> </entry>
+<entry> matches any word character, like
+<literal>[[:word:]]</literal> </entry>
 </row>

 <row>
 <entry> <literal>\D</literal> </entry>
-<entry> <literal>[^[:digit:]]</literal> </entry>
+<entry> matches any non-digit, like
+<literal>[^[:digit:]]</literal> </entry>
 </row>

 <row>
 <entry> <literal>\S</literal> </entry>
-<entry> <literal>[^[:space:]]</literal> </entry>
+<entry> matches any non-whitespace character, like
+<literal>[^[:space:]]</literal> </entry>
 </row>

 <row>
 <entry> <literal>\W</literal> </entry>
-<entry> <literal>[^[:word:]]</literal> </entry>
+<entry> matches any non-word character, like
+<literal>[^[:word:]]</literal> </entry>
 </row>
 </tbody>
 </tgroup>
@@ -6813,14 +6819,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 If newline-sensitive matching is specified, <literal>.</literal>
 and bracket expressions using <literal>^</literal>
 will never match the newline character
-(so that matches will never cross newlines unless the RE
-explicitly arranges it)
+(so that matches will not cross lines unless the RE
+explicitly includes a newline)
 and <literal>^</literal> and <literal>$</literal>
 will match the empty string after and before a newline
 respectively, in addition to matching at beginning and end of string
 respectively.
 But the ARE escapes <literal>\A</literal> and <literal>\Z</literal>
 continue to match beginning or end of string <emphasis>only</emphasis>.
+Also, the character class shorthands <literal>\D</literal>
+and <literal>\W</literal> will match a newline regardless of this mode.
+(Before <productname>PostgreSQL</productname> 14, they did not match
+newlines when in newline-sensitive mode.
+Write <literal>[^[:digit:]]</literal>
+or <literal>[^[:word:]]</literal> to get the old behavior.)
 </para>

 <para>
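The documentation paragraph in this hunk separates line anchors (`^`, `$`) from string anchors (`\A`, `\Z`) under newline-sensitive matching. A loose analogy, not PostgreSQL itself: Python's `re.MULTILINE` flag draws the same line between the two kinds of anchor, and `.` skips newlines by default. The constant `TEXT` and the helper `matches` are hypothetical names used only for this sketch.

```python
import re

# A two-line subject string, chosen only for illustration.
TEXT = "one\ntwo"


def matches(pattern, flags=0):
    """Return every non-overlapping match of pattern in TEXT."""
    return re.findall(pattern, TEXT, flags)


if __name__ == "__main__":
    # ^ and $ match at every line boundary when re.MULTILINE is set.
    print(matches(r"^\w+$", re.MULTILINE))

    # \A and \Z keep matching only at the start and end of the whole string.
    print(matches(r"\A\w+", re.MULTILINE))
    print(matches(r"\w+\Z", re.MULTILINE))

    # "." does not match a newline by default, so each line matches separately.
    print(matches(r".+"))
```

With line anchors both lines are found, while `\A` finds only `one` and `\Z` finds only `two`.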

‎src/backend/regex/re_syntax.n

Lines changed: 6 additions & 1 deletion
@@ -804,7 +804,7 @@ and bracket expressions using
 \fB^\fR
 will never match the newline character
 (so that matches will never cross newlines unless the RE
-explicitly arranges it)
+explicitly includes a newline)
 and
 \fB^\fR
 and
@@ -817,6 +817,11 @@ ARE
 and
 \fB\eZ\fR
 continue to match beginning or end of string \fIonly\fR.
+Also, the character class shorthands
+\fB\eD\fR
+and
+\fB\eW\fR
+will match a newline regardless of this mode.
 .PP
 If partial newline-sensitive matching is specified,
 this affects \fB.\fR

‎src/backend/regex/regcomp.c

Lines changed: 2 additions & 4 deletions
@@ -1407,10 +1407,6 @@ charclasscomplement(struct vars *v,

 	/* build arcs for char class; this may cause color splitting */
 	subcolorcvec(v, cv, cstate, cstate);
-
-	/* in NLSTOP mode, ensure newline is not part of the result set */
-	if (v->cflags & REG_NLSTOP)
-		newarc(v->nfa, PLAIN, v->nlcolor, cstate, cstate);
 	NOERR();

 	/* clean up any subcolors in the arc set */
@@ -1612,6 +1608,8 @@ cbracket(struct vars *v,

 	NOERR();
 	bracket(v, left, right);
+
+	/* in NLSTOP mode, ensure newline is not part of the result set */
 	if (v->cflags & REG_NLSTOP)
 		newarc(v->nfa, PLAIN, v->nlcolor, left, right);
 	NOERR();
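The effect of the two hunks above can be modelled with plain sets. This is a deliberately simplified sketch, not the engine's NFA and color-map machinery. It assumes that both C functions build the set of characters to exclude and then complement it, which is how the in-code comments read. `ALPHABET`, `DIGITS`, `shorthand_complement`, `bracket_complement` and the `legacy` switch are all hypothetical names introduced for illustration.

```python
NEWLINE = "\n"

# A tiny toy alphabet standing in for the full character set (assumption).
ALPHABET = frozenset("ab1_ *\n")
DIGITS = frozenset("1")


def shorthand_complement(char_class, nlstop, legacy=False):
    """Toy model of charclasscomplement(), which handles \\D and \\W.

    With legacy=True it mimics the code removed by this commit, which
    added the newline to the excluded set in NLSTOP mode.
    """
    excluded = set(char_class)
    if legacy and nlstop:
        excluded.add(NEWLINE)
    return ALPHABET - excluded


def bracket_complement(members, nlstop):
    """Toy model of cbracket(), which handles explicit [^...] brackets.

    The NLSTOP check stays here, so the newline is still excluded.
    """
    excluded = set(members)
    if nlstop:
        excluded.add(NEWLINE)
    return ALPHABET - excluded


if __name__ == "__main__":
    print(NEWLINE in shorthand_complement(DIGITS, nlstop=True))               # new \D
    print(NEWLINE in shorthand_complement(DIGITS, nlstop=True, legacy=True))  # old \D
    print(NEWLINE in bracket_complement(DIGITS, nlstop=True))                 # [^[:digit:]]
```

Under this model, only the explicit bracket path still removes the newline in newline-sensitive mode. That matches the documentation change telling users to write `[^[:digit:]]` or `[^[:word:]]` to get the old behavior.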

‎src/test/modules/test_regex/expected/test_regex.out

Lines changed: 8 additions & 4 deletions
@@ -2144,7 +2144,8 @@ select * from test_regex('\D+', E'abc\ndef345', 'nLP');
 test_regex
 -------------------------------
 {0,REG_UNONPOSIX,REG_ULOCALE}
-{abc}
+{"abc +
+def"}
 (2 rows)

 select * from test_regex('[\D]+', E'abc\ndef345', 'LPE');
@@ -2159,7 +2160,8 @@ select * from test_regex('[\D]+', E'abc\ndef345', 'nLPE');
 test_regex
 ----------------------------------------
 {0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}
-{abc}
+{"abc +
+def"}
 (2 rows)

 select * from test_regex('\w+', E'abc_012\ndef', 'LP');
@@ -2202,7 +2204,8 @@ select * from test_regex('\W+', E'***\n@@@___', 'nLP');
 test_regex
 -------------------------------
 {0,REG_UNONPOSIX,REG_ULOCALE}
-{***}
+{"*** +
+@@@"}
 (2 rows)

 select * from test_regex('[\W]+', E'***\n@@@___', 'LPE');
@@ -2217,7 +2220,8 @@ select * from test_regex('[\W]+', E'***\n@@@___', 'nLPE');
 test_regex
 ----------------------------------------
 {0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}
-{***}
+{"*** +
+@@@"}
 (2 rows)

 -- doing 13 "escapes"
