Commit b107744

Improve comment in regc_pg_locale.c.
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com

1 parent 3fae25c commit b107744

1 file changed: +13 -16 lines changed

src/backend/regex/regc_pg_locale.c
@@ -21,22 +21,22 @@
 #include "utils/pg_locale.h"
 
 /*
- * To provide as much functionality as possible on a variety of platforms,
- * without going so far as to implement everything from scratch, we use
- * several implementation strategies depending on the situation:
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
  *
  * 1. In C/POSIX collations, we use hard-wired code. We can't depend on
  * the <ctype.h> functions since those will obey LC_CTYPE. Note that these
  * collations don't give a fig about multibyte characters.
  *
- * 2. In the "default" collation (which is supposed to obey LC_CTYPE):
- *
- * 2a. When working in UTF8 encoding, we use the <wctype.h> functions.
+ * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
  * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode. On some platforms
+ * as the wchar_t representation of Unicode. (XXX: ICU makes this assumption
+ * even for non-UTF8 encodings, which may be a problem.) On some platforms
  * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
  *
- * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar
+ * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
  * values up to 255, and punt for values above that. This is 100% correct
  * only in single-byte encodings such as LATINn. However, non-Unicode
  * multibyte encodings are mostly Far Eastern character sets for which the
@@ -46,14 +46,11 @@
  * the platform's wchar_t representation matches what we do in pg_wchar
  * conversions.
  *
- * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
- * functions, under exactly the same cases as #2.
- *
- * There is one notable difference between cases 2 and 3: in the "default"
- * collation we force ASCII letters to follow ASCII upcase/downcase rules,
- * while in a non-default collation we just let the library functions do what
- * they will. The case where this matters is treatment of I/i in Turkish,
- * and the behavior is meant to match the upper()/lower() SQL functions.
+ * As a special case, in the "default" collation, (2) and (3) force ASCII
+ * letters to follow ASCII upcase/downcase rules, while in a non-default
+ * collation we just let the library functions do what they will. The case
+ * where this matters is treatment of I/i in Turkish, and the behavior is
+ * meant to match the upper()/lower() SQL functions.
  *
  * We store the active collation setting in static variables. In principle
  * it could be passed down to here via the regex library's "struct vars" data
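
For readers who want to see the three strategies from the revised comment side by side, here is a minimal standalone sketch in C. It is not the code in regc_pg_locale.c. Every identifier below (demo_wchar, demo_strategy, demo_isalpha, demo_toupper, the is_default flag) is hypothetical and chosen only for illustration. The sketch also assumes the plain <wctype.h> and <ctype.h> functions rather than their locale_t-extended forms.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for a wide character value; not PostgreSQL's pg_wchar. */
typedef uint32_t demo_wchar;

/* Hypothetical labels for the three cases listed in the comment. */
typedef enum
{
	DEMO_STRATEGY_C,		/* case 1: C/POSIX collation, hard-wired code */
	DEMO_STRATEGY_WIDE,		/* case 2: UTF8 encoding, <wctype.h> functions */
	DEMO_STRATEGY_1BYTE		/* case 3: other encodings, <ctype.h> up to 255 */
} demo_strategy;

static bool
demo_isalpha(demo_wchar c, demo_strategy strategy)
{
	switch (strategy)
	{
		case DEMO_STRATEGY_C:
			/* Hard-wired ASCII test that ignores LC_CTYPE entirely. */
			return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
		case DEMO_STRATEGY_WIDE:
			/* Punt for codepoints that a 16-bit wchar_t cannot hold. */
			if (sizeof(wchar_t) < 4 && c > 0xFFFF)
				return false;
			return iswalpha((wint_t) c) != 0;
		case DEMO_STRATEGY_1BYTE:
			/* Punt for values above 255. */
			if (c > UCHAR_MAX)
				return false;
			return isalpha((unsigned char) c) != 0;
	}
	return false;
}

static demo_wchar
demo_toupper(demo_wchar c, demo_strategy strategy, bool is_default)
{
	/*
	 * ASCII rule: always used in case 1, and used in cases 2 and 3 when the
	 * collation is the "default" one, so that i/I do not follow Turkish
	 * rules there.
	 */
	if (c < 128 && (strategy == DEMO_STRATEGY_C || is_default))
		return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;

	switch (strategy)
	{
		case DEMO_STRATEGY_C:
			return c;			/* non-ASCII is left unchanged */
		case DEMO_STRATEGY_WIDE:
			if (sizeof(wchar_t) < 4 && c > 0xFFFF)
				return c;
			return (demo_wchar) towupper((wint_t) c);
		case DEMO_STRATEGY_1BYTE:
			if (c > UCHAR_MAX)
				return c;
			return (demo_wchar) toupper((unsigned char) c);
	}
	return c;
}
```

With is_default set to true, demo_toupper('i', ...) yields 'I' under any strategy, whatever LC_CTYPE says. With is_default false under cases 2 and 3, the result is whatever the C library returns for the active locale, which is the difference the rewritten comment calls out.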
