This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Improved check-localedef script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Rafal Luzynski <digitalfreak at lingonborough dot com>
- Cc: Zack Weinberg <zackw at panix dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 04 Aug 2017 11:50:00 +0200
- Subject: Re: Improved check-localedef script
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=mfabian at redhat dot com
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 587CB8124F
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9d60e3bspn.fsf@redhat.com> <26692227.553011.1501838716734@poczta.nazwa.pl>
Rafal Luzynski <digitalfreak@lingonborough.com> wrote:

> 4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:
>
>> But even though U+20AC cannot be converted to ISO-8859-1, the
>> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>>
>> $ LC_ALL=ca_ES locale -k currency_symbol charmap
>> currency_symbol="EUR"
>> charmap="ISO-8859-1"
>>
>> So this does not cause an actual problem.
>
> So the "€" character is actually representable in ISO-8859-1 because
> we can convert it to "EUR". Looks like a false positive then.

Yes.

>> The ca_ES source file is not ASCII, it has
>>
>> % català
>> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>>
>> So maybe I could just convert the file to UTF-8
>> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
>> to get rid of the check-localedef warning.
>>
>> Would that be OK?
>
> I think that no, it's not OK. If I understand correctly the
> "source file is ASCII" sentence means that the individual characters:
> '<', '2', '0', 'A', 'C', '>' are ASCII.

Yes.

> They may describe something more complex like <U00E0>. But even this
> is not UTF-8 because UTF-8 would be <C3> <A0> (UTF-8 is 8-bit). The
> closest charset would be UCS-2 or simply a generic Unicode.

My understanding at the moment is that the “% Charset: ...” comment
indicates the encoding used to write the source file. So something like
“<U20AC>” is definitely ASCII. Non-ASCII stuff in locale source files
seems to exist only in comments at the moment.

-- 
Mike FABIAN <mfabian@redhat.com>
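The points made in this exchange can be illustrated with a short Python 3.8+ sketch. It is not part of glibc or of the check-localedef script under discussion. The helper names (`decode_ucs_names`, `non_ascii_outside_comments`, `representable`) are hypothetical, and the sketch assumes whole-line comments introduced by `%` (the `comment_char` that glibc locale sources declare) and handles only the `<Uxxxx>` form of symbolic character names. It shows three things: the ca_ES `lang_name` line is plain ASCII outside comments, `<U00E0>` would be the two bytes C3 A0 in UTF-8, and U+20AC has no ISO-8859-1 encoding, which is why the ca_ES.ISO-8859-1 locale relies on transliteration.

```python
import re

# Hypothetical pattern for localedef symbolic names of the form <Uxxxx>.
# Assumption: other symbolic names defined by a charmap are not handled.
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs_names(text: str) -> str:
    """Replace every <Uxxxx> token in text by the character it names."""
    return UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), text)


def non_ascii_outside_comments(lines, comment_char="%"):
    """Return (line_number, line) pairs with non-ASCII characters outside
    comment lines. Assumes comments are whole lines starting with
    comment_char after optional leading whitespace."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(comment_char):
            continue  # comments may legitimately contain non-ASCII text
        if not line.isascii():
            hits.append((number, line))
    return hits


def representable(text: str, codec: str) -> bool:
    """True if text can be encoded in codec without any transliteration."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


source = [
    "% català",
    'lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"',
]

# Outside the comment line, the source is plain ASCII.
print(non_ascii_outside_comments(source))        # []

# The <Uxxxx> tokens only *name* the characters.
print(decode_ucs_names(source[1]))               # lang_name "català"

# In UTF-8, U+00E0 would be two bytes.
print("\u00e0".encode("utf-8").hex(" ").upper())  # C3 A0

# U+20AC (EURO SIGN) has no ISO-8859-1 code point; per the thread the
# value is transliterated, so locale -k reports currency_symbol="EUR".
print(representable("\u20ac", "iso-8859-1"))     # False
print(representable("\u00e0", "iso-8859-1"))     # True
```

The check on the source text and the check on the decoded characters are deliberately separate: the first concerns the encoding of the file itself (what the “% Charset:” comment describes), while the second concerns whether the characters the file names fit the target charmap.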