[Python-Dev] [I18n-sig] Re: Unicode debate
Tom Emerson tree@basistech.com
Fri, 28 Apr 2000 06:44:00 -0400 (EDT)
Just van Rossum writes:
 > How will other parts of a program know which encoding was used for
 > non-unicode string literals?

This is the exact reason that Unicode should be used for all string
literals: from a language design perspective I don't understand the
rationale for providing both "traditional" and "unicode" strings.

 > It seems to me that an encoding attribute for 8-bit strings solves this
 > nicely. The attribute should only be set automatically if the encoding of
 > the source file was specified or when the string has been encoded from a
 > unicode string. The attribute should *only* be used when converting to
 > unicode. (Hm, it could even be used when calling unicode() without the
 > encoding argument.) It should *not* be used when comparing (or adding,
 > etc.) 8-bit strings to each other, since they still may contain binary
 > goop, even in a source file with a specified encoding!

In Dylan there is an explicit split between 'characters' (which are
always Unicode) and 'bytes'.

What are the compelling reasons to not use UTF-8 as the (source)
document encoding? In the past the usual response has been, "the tools
aren't there for authoring UTF-8 documents". This argument becomes more
specious as more OSes move towards Unicode. I firmly believe this can
be done without Java's bloat.

One off-the-cuff solution is this:

All character strings are Unicode (UTF-8 encoding). Language terminals
and operators are restricted to US-ASCII, whose encoding is identical
in UTF-8. The contents of comments are not interpreted in any way.

 > >- We need a way to indicate the encoding of input and output data
 > >files, and we need shortcuts to set the encoding of stdin, stdout and
 > >stderr (and maybe all files opened without an explicit encoding).
 >
 > Can you open a file *with* an explicit encoding?

If you cannot, you lose. You absolutely must be able to specify the
encoding of a file when opening it, so that the runtime can transcode
into the native encoding as you read it. This should otherwise be
transparent to the user.
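
As a rough illustration of opening a file with an explicit encoding, here is a
minimal sketch in present-day Python 3, which postdates this message; it is not
what was available or proposed at the time. The helper names transcode_file and
demo, the sample text, and the file names are made up for the example. The
built-in open() and its encoding argument are the only real API relied on.

```python
import os
import tempfile


def transcode_file(src_path, src_encoding, dst_path, dst_encoding="utf-8"):
    """Read src_path as src_encoding and rewrite it as dst_encoding.

    Hypothetical helper for illustration: the runtime decodes bytes to
    Unicode text on read, so the caller never handles raw bytes.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()  # a Unicode str, whatever the file's encoding was
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)  # encoded to dst_encoding on write
    return text


def demo():
    """Write a Latin-1 file, transcode it to UTF-8, return (text, raw bytes)."""
    sample = "na\u00efve caf\u00e9"  # made-up sample text with non-ASCII letters
    with tempfile.TemporaryDirectory() as tmp:
        latin1_path = os.path.join(tmp, "latin1.txt")
        utf8_path = os.path.join(tmp, "utf8.txt")
        with open(latin1_path, "wb") as f:
            f.write(sample.encode("latin-1"))
        text = transcode_file(latin1_path, "latin-1", utf8_path)
        with open(utf8_path, "rb") as f:
            raw = f.read()
    return text, raw


# US-ASCII text has the same byte representation in UTF-8, which is why
# restricting keywords and operators to ASCII costs nothing.
ascii_is_utf8 = "if x == 1:".encode("ascii") == "if x == 1:".encode("utf-8")
```

Calling demo() returns the decoded text together with the UTF-8 bytes written
out; the only place an encoding is named is at the open() calls, which is the
kind of transparency argued for above.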
    -tree

-- 
Tom Emerson                                   Basis Technology Corp.
Language Hacker                             http://www.basistech.com
 "Beware the lollipop of mediocrity: lick it once and you suck forever"