
Commit b107744


Improve comment in regc_pg_locale.c.

Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com

1 parent 3fae25c, commit b107744


1 file changed (+13, -16 lines)


`src/backend/regex/regc_pg_locale.c`

Lines changed: 13 additions & 16 deletions

```diff
@@ -21,22 +21,22 @@
 #include "utils/pg_locale.h"
 
 /*
- * To provide as much functionality as possible on a variety of platforms,
- * without going so far as to implement everything from scratch, we use
- * several implementation strategies depending on the situation:
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
  *
  * 1. In C/POSIX collations, we use hard-wired code. We can't depend on
  * the <ctype.h> functions since those will obey LC_CTYPE. Note that these
  * collations don't give a fig about multibyte characters.
  *
- * 2. In the "default" collation (which is supposed to obey LC_CTYPE):
- *
- * 2a. When working in UTF8 encoding, we use the <wctype.h> functions.
+ * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
  * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode. On some platforms
+ * as the wchar_t representation of Unicode. (XXX: ICU makes this assumption
+ * even for non-UTF8 encodings, which may be a problem.) On some platforms
  * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
  *
- * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar
+ * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
  * values up to 255, and punt for values above that. This is 100% correct
  * only in single-byte encodings such as LATINn. However, non-Unicode
  * multibyte encodings are mostly Far Eastern character sets for which the
@@ -46,14 +46,11 @@
  * the platform's wchar_t representation matches what we do in pg_wchar
  * conversions.
  *
- * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
- * functions, under exactly the same cases as #2.
- *
- * There is one notable difference between cases 2 and 3: in the "default"
- * collation we force ASCII letters to follow ASCII upcase/downcase rules,
- * while in a non-default collation we just let the library functions do what
- * they will. The case where this matters is treatment of I/i in Turkish,
- * and the behavior is meant to match the upper()/lower() SQL functions.
+ * As a special case, in the "default" collation, (2) and (3) force ASCII
+ * letters to follow ASCII upcase/downcase rules, while in a non-default
+ * collation we just let the library functions do what they will. The case
+ * where this matters is treatment of I/i in Turkish, and the behavior is
+ * meant to match the upper()/lower() SQL functions.
  *
  * We store the active collation setting in static variables. In principle
  * it could be passed down to here via the regex library's "struct vars" data
```
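The three libc-provider strategies and the "default"-collation special case described in the revised comment can be illustrated with a single case-mapping routine. This is a minimal sketch under stated assumptions, not PostgreSQL's implementation: `sketch_wchar` stands in for `pg_wchar`, the names `sketch_strategy`, `choose_strategy()`, `ascii_toupper()` and `sketch_toupper()` are hypothetical, ICU and other providers are ignored, and the global-locale `toupper()`/`towupper()` are used in place of whatever locale-specific calls the real code makes.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for PostgreSQL's pg_wchar (a code point in UTF8). */
typedef uint32_t sketch_wchar;

/* Hypothetical labels for the three libc-provider strategies. */
typedef enum
{
    STRATEGY_C,     /* (1) C/POSIX collation: hard-wired ASCII-only code */
    STRATEGY_WIDE,  /* (2) UTF8 encoding: <wctype.h> functions */
    STRATEGY_1BYTE  /* (3) other encodings: <ctype.h> functions up to 255 */
} sketch_strategy;

/* Pick a strategy from the collation kind and the database encoding. */
static sketch_strategy
choose_strategy(bool collation_is_c, bool encoding_is_utf8)
{
    if (collation_is_c)
        return STRATEGY_C;
    return encoding_is_utf8 ? STRATEGY_WIDE : STRATEGY_1BYTE;
}

/* Map ASCII a-z to A-Z; leave every other value unchanged. */
static sketch_wchar
ascii_toupper(sketch_wchar c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* Upper-case one character under the given strategy (assumption: libc only). */
static sketch_wchar
sketch_toupper(sketch_wchar c, sketch_strategy s, bool is_default_collation)
{
    /* (1): never consult the C library, since it would obey LC_CTYPE. */
    if (s == STRATEGY_C)
        return ascii_toupper(c);

    /*
     * Special case for (2) and (3): the "default" collation forces ASCII
     * rules on ASCII letters, so 'i' maps to 'I' even in a Turkish locale.
     */
    if (is_default_collation && c <= 0x7F)
        return ascii_toupper(c);

    switch (s)
    {
        case STRATEGY_WIDE:
            /* Punt (return c unchanged) if a 16-bit wchar_t cannot hold c. */
            if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
                return (sketch_wchar) towupper((wint_t) c);
            return c;
        case STRATEGY_1BYTE:
            /* Only values up to 255 go to <ctype.h>; punt above that. */
            if (c <= UCHAR_MAX)
                return (sketch_wchar) toupper((unsigned char) c);
            return c;
        default:
            break;
    }
    /* STRATEGY_C returned earlier, so no valid strategy reaches this point. */
    assert(!"unknown strategy");
    return c;
}
```

With no `setlocale()` call the process runs in the "C" locale, so `sketch_toupper('i', STRATEGY_WIDE, true)` yields `'I'` through the ASCII special case, while `sketch_toupper(0x100, STRATEGY_1BYTE, false)` is punted and returns its argument unchanged. The early return for `STRATEGY_C` mirrors the comment's point that C/POSIX collations must not depend on `<ctype.h>`.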
