This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Improved check-localedef script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
- Date: Tue, 08 Aug 2017 09:00:26 +0200
- Subject: Re: Improved check-localedef script
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=mfabian at redhat dot com
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 739FBC04B31B
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <CAKCAbMhVb3+CzRcSGTHVuahuwHryhtZTEYq=XiSyERtjPwbmXw@mail.gmail.com>
Zack Weinberg <zackw@panix.com> wrote:

> On Thu, Aug 3, 2017 at 5:17 PM, Zack Weinberg <zackw@panix.com> wrote:
>> Here is an improved version of the check-localedef script I posted the
>> other week.
>
> Here is another revision which uses the SUPPORTED file to learn the
> legacy encodings for each locale, rather than looking at %Charset:
> annotations in the source files. You run it like this now (from the
> top level of the source tree):
>
> $ ./scripts/check-localedef.py -p localedata/locales -f
> localedata/SUPPORTED localedata/locales/*
>
> The final "localedata/locales/*" part is not _required_; it only
> enables the script to tell you about any locales that are missing from
> the SUPPORTED file.
>
> (Also, still more bugs have been fixed; in particular the
> "inappropriate character" errors have been restored. Doh.)
>
> It's possible that Python isn't going to work out as the
> implementation language for this script. I used it because its
> standard library provides Unicode normalization and many codecs for
> legacy encodings, but it doesn't know all of the encodings mentioned
> in localedata/SUPPORTED (ARMSCII-8, GEORGIAN-PS, and EUC-TW are
> missing) and I don't think it knows how to do transliteration, either.
> And it's still a solid order of magnitude slower than it should be.

    localedata/locales/uz_UZ:212: string not representable in iso8859-1: 0073 006F 02BB 006D

That is “soʻm” where the 3rd character is U+02BB MODIFIER LETTER TURNED COMMA.

In the Latin1 version of the uz_UZ locale this gets transliterated
into U+0027 APOSTROPHE:

    $ LC_ALL=uz_UZ.ISO-8859-1 locale -k currency_symbol
    currency_symbol="so'm"

It looks like most of the “string not representable” warnings are false
positives.

-- 
Mike FABIAN <mfabian@redhat.com>
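
The false positive described in the message above can be illustrated with a rough Python sketch. This is not code from check-localedef.py; the function names and the one-entry TRANSLIT table are hypothetical and exist only for illustration. A real check would presumably take its fallbacks from the locale's own transliteration data rather than a hard-coded table. The sketch shows that a strict encode of "soʻm" to ISO-8859-1 fails, while the same string becomes representable once U+02BB is replaced by U+0027.

```python
# Hypothetical fallback table: only the single mapping mentioned in the
# message (U+02BB MODIFIER LETTER TURNED COMMA -> U+0027 APOSTROPHE).
TRANSLIT = {"\u02bb": "'"}


def representable(s, encoding):
    """Return True if s can be encoded strictly in the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(s, table=TRANSLIT):
    """Replace each character that has a fallback in the table."""
    return "".join(table.get(ch, ch) for ch in s)


def check(s, encoding):
    """Classify a string: representable as-is, only after
    transliteration, or not representable at all."""
    if representable(s, encoding):
        return "ok"
    if representable(transliterate(s), encoding):
        return "ok-after-translit"
    return "not-representable"


som = "so\u02bbm"  # code points 0073 006F 02BB 006D, as in the warning
print(check(som, "iso8859-1"))
print(transliterate(som))
```

Under these assumptions, a checker that tries the transliterated form before reporting would not flag the uz_UZ currency symbol, which matches what the installed Latin-1 locale prints.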