Commit 7dc13a0

Change regex \D and \W shorthands to always match newlines.

Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
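The commit message's point about other engines can be checked against any engine where `\D` is mode-independent. The sketch below uses Python's `re` module purely as a stand-in; it is not PostgreSQL's regex engine, and `re.MULTILINE` is only a rough analogue of the 'n' flag (it changes `^` and `$`, not `.`). The helper name `first_match` is made up for illustration.

```python
import re

text = "abc\ndef345"

def first_match(pattern, flags=0):
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None

# \D matches the newline in the default mode ...
default_mode = first_match(r"\D+")

# ... and still does when line-oriented anchors are enabled.
multiline_mode = first_match(r"\D+", re.MULTILINE)

# The mode does change where ^ can match: at offset 0 and after the newline.
line_starts = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]

print(repr(default_mode), repr(multiline_mode), line_starts)
```

In this stand-in engine the line-oriented flag affects only the anchors, which is the behavior the commit adopts for `\D` and `\W` in newline-sensitive mode.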

1 parent 2a0af7f, commit 7dc13a0

File tree

4 files changed, +36 -17 lines changed

doc/src/sgml
- func.sgml
src
- backend/regex
  - re_syntax.n
  - regcomp.c
- test/modules/test_regex/expected
  - test_regex.out

`doc/src/sgml/func.sgml`

Lines changed: 20 additions & 8 deletions

Original file line number	Diff line number	Diff line change
`@@ -6323,32 +6323,38 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;`
`6323`	`6323`	`<tbody>`
`6324`	`6324`	`<row>`
`6325`	`6325`	`<entry> <literal>\d</literal> </entry>`
`6326`		`- <entry> <literal>[[:digit:]]</literal> </entry>`
	`6326`	`+ <entry> matches any digit, like`
	`6327`	`+ <literal>[[:digit:]]</literal> </entry>`
`6327`	`6328`	`</row>`
`6328`	`6329`
`6329`	`6330`	`<row>`
`6330`	`6331`	`<entry> <literal>\s</literal> </entry>`
`6331`		`- <entry> <literal>[[:space:]]</literal> </entry>`
	`6332`	`+ <entry> matches any whitespace character, like`
	`6333`	`+ <literal>[[:space:]]</literal> </entry>`
`6332`	`6334`	`</row>`
`6333`	`6335`
`6334`	`6336`	`<row>`
`6335`	`6337`	`<entry> <literal>\w</literal> </entry>`
`6336`		`- <entry> <literal>[[:word:]]</literal> </entry>`
	`6338`	`+ <entry> matches any word character, like`
	`6339`	`+ <literal>[[:word:]]</literal> </entry>`
`6337`	`6340`	`</row>`
`6338`	`6341`
`6339`	`6342`	`<row>`
`6340`	`6343`	`<entry> <literal>\D</literal> </entry>`
`6341`		`- <entry> <literal>[^[:digit:]]</literal> </entry>`
	`6344`	`+ <entry> matches any non-digit, like`
	`6345`	`+ <literal>[^[:digit:]]</literal> </entry>`
`6342`	`6346`	`</row>`
`6343`	`6347`
`6344`	`6348`	`<row>`
`6345`	`6349`	`<entry> <literal>\S</literal> </entry>`
`6346`		`- <entry> <literal>[^[:space:]]</literal> </entry>`
	`6350`	`+ <entry> matches any non-whitespace character, like`
	`6351`	`+ <literal>[^[:space:]]</literal> </entry>`
`6347`	`6352`	`</row>`
`6348`	`6353`
`6349`	`6354`	`<row>`
`6350`	`6355`	`<entry> <literal>\W</literal> </entry>`
`6351`		`- <entry> <literal>[^[:word:]]</literal> </entry>`
	`6356`	`+ <entry> matches any non-word character, like`
	`6357`	`+ <literal>[^[:word:]]</literal> </entry>`
`6352`	`6358`	`</row>`
`6353`	`6359`	`</tbody>`
`6354`	`6360`	`</tgroup>`
`@@ -6813,14 +6819,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.?)(\d+)(.)){1,1}');`
`6813`	`6819`	`If newline-sensitive matching is specified, <literal>.</literal>`
`6814`	`6820`	`and bracket expressions using <literal>^</literal>`
`6815`	`6821`	`will never match the newline character`
`6816`		`- (so that matches will never cross newlines unless the RE`
`6817`		`- explicitly arranges it)`
	`6822`	`+ (so that matches will not cross lines unless the RE`
	`6823`	`+ explicitly includes a newline)`
`6818`	`6824`	`and <literal>^</literal> and <literal>$</literal>`
`6819`	`6825`	`will match the empty string after and before a newline`
`6820`	`6826`	`respectively, in addition to matching at beginning and end of string`
`6821`	`6827`	`respectively.`
`6822`	`6828`	`But the ARE escapes <literal>\A</literal> and <literal>\Z</literal>`
`6823`	`6829`	`continue to match beginning or end of string <emphasis>only</emphasis>.`
	`6830`	`+ Also, the character class shorthands <literal>\D</literal>`
	`6831`	`+ and <literal>\W</literal> will match a newline regardless of this mode.`
	`6832`	`+ (Before <productname>PostgreSQL</productname> 14, they did not match`
	`6833`	`+ newlines when in newline-sensitive mode.`
	`6834`	`+ Write <literal>[^[:digit:]]</literal>`
	`6835`	`+ or <literal>[^[:word:]]</literal> to get the old behavior.)`
`6824`	`6836`	`</para>`
`6825`	`6837`
`6826`	`6838`	`<para>`

`src/backend/regex/re_syntax.n`

Lines changed: 6 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -804,7 +804,7 @@ and bracket expressions using`
`804`	`804`	`\fB^\fR`
`805`	`805`	`will never match the newline character`
`806`	`806`	`(so that matches will never cross newlines unless the RE`
`807`		`-explicitly arranges it)`
	`807`	`+explicitly includes a newline)`
`808`	`808`	`and`
`809`	`809`	`\fB^\fR`
`810`	`810`	`and`
`@@ -817,6 +817,11 @@ ARE`
`817`	`817`	`and`
`818`	`818`	`\fB\eZ\fR`
`819`	`819`	`continue to match beginning or end of string \fIonly\fR.`
	`820`	`+Also, the character class shorthands`
	`821`	`+\fB\eD\fR`
	`822`	`+and`
	`823`	`+\fB\eW\fR`
	`824`	`+will match a newline regardless of this mode.`
`820`	`825`	`.PP`
`821`	`826`	`If partial newline-sensitive matching is specified,`
`822`	`827`	`this affects \fB.\fR`

`src/backend/regex/regcomp.c`

Lines changed: 2 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -1407,10 +1407,6 @@ charclasscomplement(struct vars *v,`
`1407`	`1407`
`1408`	`1408`	`/* build arcs for char class; this may cause color splitting */`
`1409`	`1409`	`subcolorcvec(v, cv, cstate, cstate);`
`1410`		`-`
`1411`		`-/* in NLSTOP mode, ensure newline is not part of the result set */`
`1412`		`-if (v->cflags & REG_NLSTOP)`
`1413`		`-newarc(v->nfa, PLAIN, v->nlcolor, cstate, cstate);`
`1414`	`1410`	`NOERR();`
`1415`	`1411`
`1416`	`1412`	`/* clean up any subcolors in the arc set */`
`@@ -1612,6 +1608,8 @@ cbracket(struct vars *v,`
`1612`	`1608`
`1613`	`1609`	`NOERR();`
`1614`	`1610`	`bracket(v, left, right);`
	`1611`	`+`
	`1612`	`+/* in NLSTOP mode, ensure newline is not part of the result set */`
`1615`	`1613`	`if (v->cflags & REG_NLSTOP)`
`1616`	`1614`	`newarc(v->nfa, PLAIN, v->nlcolor, left, right);`
`1617`	`1615`	`NOERR();`

`src/test/modules/test_regex/expected/test_regex.out`

Lines changed: 8 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -2144,7 +2144,8 @@ select * from test_regex('\D+', E'abc\ndef345', 'nLP');`
`2144`	`2144`	`test_regex`
`2145`	`2145`	`-------------------------------`
`2146`	`2146`	`{0,REG_UNONPOSIX,REG_ULOCALE}`
`2147`		`- {abc}`
	`2147`	`+ {"abc +`
	`2148`	`+ def"}`
`2148`	`2149`	`(2 rows)`
`2149`	`2150`
`2150`	`2151`	`select * from test_regex('[\D]+', E'abc\ndef345', 'LPE');`
`@@ -2159,7 +2160,8 @@ select * from test_regex('[\D]+', E'abc\ndef345', 'nLPE');`
`2159`	`2160`	`test_regex`
`2160`	`2161`	`----------------------------------------`
`2161`	`2162`	`{0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}`
`2162`		`- {abc}`
	`2163`	`+ {"abc +`
	`2164`	`+ def"}`
`2163`	`2165`	`(2 rows)`
`2164`	`2166`
`2165`	`2167`	`select * from test_regex('\w+', E'abc_012\ndef', 'LP');`
`@@ -2202,7 +2204,8 @@ select * from test_regex('\W+', E'***\n@@@___', 'nLP');`
`2202`	`2204`	`test_regex`
`2203`	`2205`	`-------------------------------`
`2204`	`2206`	`{0,REG_UNONPOSIX,REG_ULOCALE}`
`2205`		`- {***}`
	`2207`	`+ {"*** +`
	`2208`	`+ @@@"}`
`2206`	`2209`	`(2 rows)`
`2207`	`2210`
`2208`	`2211`	`select * from test_regex('[\W]+', E'***\n@@@___', 'LPE');`
`@@ -2217,7 +2220,8 @@ select * from test_regex('[\W]+', E'***\n@@@___', 'nLPE');`
`2217`	`2220`	`test_regex`
`2218`	`2221`	`----------------------------------------`
`2219`	`2222`	`{0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}`
`2220`		`- {***}`
	`2223`	`+ {"*** +`
	`2224`	`+ @@@"}`
`2221`	`2225`	`(2 rows)`
`2222`	`2226`
`2223`	`2227`	`-- doing 13 "escapes"`
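The old and new expected outputs above can be modeled outside PostgreSQL. The following is a rough sketch in Python's `re` module, which is not the engine under test: the shorthand stands in for the new behavior, and a complemented class that also excludes `\n` stands in for the old newline-sensitive behavior. The names `CASES` and `compare` are hypothetical, and the subject strings are taken from the `test_regex` calls in the diff.

```python
import re

# (subject, shorthand, complemented class that also excludes newline)
CASES = {
    "digit": ("abc\ndef345", r"\D+", r"[^\d\n]+"),
    "word": ("***\n@@@___", r"\W+", r"[^\w\n]+"),
}

def compare(subject, shorthand, no_newline_class):
    """Return (new-style match, old-style match) anchored at the start."""
    new = re.match(shorthand, subject).group(0)
    old = re.match(no_newline_class, subject).group(0)
    return new, old

results = {name: compare(*case) for name, case in CASES.items()}

for name, (new, old) in results.items():
    print(name, repr(new), repr(old))
```

The first element of each pair spans the newline, like the `+` lines in the expected output, while the second stops at the end of the first line, like the `-` lines. In PostgreSQL itself the commit message recommends `[^[:digit:]]` or `[^[:word:]]` to keep the old behavior; the explicit `\n` exclusion here is only needed because Python's complemented classes match newlines.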
