|
21 | 21 | #include "utils/pg_locale.h"
|
22 | 22 |
|
23 | 23 | /*
|
24 |
| - * To provide as much functionality as possible on a variety of platforms, |
25 |
| - * without going so far as to implement everything from scratch, we use |
26 |
| - * several implementation strategies depending on the situation: |
| 24 | + * For the libc provider, to provide as much functionality as possible on a |
| 25 | + * variety of platforms without going so far as to implement everything from |
| 26 | + * scratch, we use several implementation strategies depending on the |
| 27 | + * situation: |
27 | 28 | *
|
28 | 29 | * 1. In C/POSIX collations, we use hard-wired code. We can't depend on
|
29 | 30 | * the <ctype.h> functions since those will obey LC_CTYPE. Note that these
|
30 | 31 | * collations don't give a fig about multibyte characters.
|
31 | 32 | *
|
32 |
| - * 2. In the "default" collation (which is supposed to obey LC_CTYPE): |
33 |
| - * |
34 |
| - * 2a. When working in UTF8 encoding, we use the <wctype.h> functions. |
| 33 | + * 2. When working in UTF8 encoding, we use the <wctype.h> functions. |
35 | 34 | * This assumes that every platform uses Unicode codepoints directly
|
36 |
| - * as the wchar_t representation of Unicode. On some platforms |
| 35 | + * as the wchar_t representation of Unicode. (XXX: ICU makes this assumption |
| 36 | + * even for non-UTF8 encodings, which may be a problem.) On some platforms |
37 | 37 | * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
|
38 | 38 | *
|
39 |
| - * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar |
| 39 | + * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar |
40 | 40 | * values up to 255, and punt for values above that. This is 100% correct
|
41 | 41 | * only in single-byte encodings such as LATINn. However, non-Unicode
|
42 | 42 | * multibyte encodings are mostly Far Eastern character sets for which the
|
|
46 | 46 | * the platform's wchar_t representation matches what we do in pg_wchar
|
47 | 47 | * conversions.
|
48 | 48 | *
|
49 |
| - * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h> |
50 |
| - * functions, under exactly the same cases as #2. |
51 |
| - * |
52 |
| - * There is one notable difference between cases 2 and 3: in the "default" |
53 |
| - * collation we force ASCII letters to follow ASCII upcase/downcase rules, |
54 |
| - * while in a non-default collation we just let the library functions do what |
55 |
| - * they will. The case where this matters is treatment of I/i in Turkish, |
56 |
| - * and the behavior is meant to match the upper()/lower() SQL functions. |
| 49 | + * As a special case, in the "default" collation, (2) and (3) force ASCII |
| 50 | + * letters to follow ASCII upcase/downcase rules, while in a non-default |
| 51 | + * collation we just let the library functions do what they will. The case |
| 52 | + * where this matters is treatment of I/i in Turkish, and the behavior is |
| 53 | + * meant to match the upper()/lower() SQL functions. |
57 | 54 | *
|
58 | 55 | * We store the active collation setting in static variables. In principle
|
59 | 56 | * it could be passed down to here via the regex library's "struct vars" data
|
|