[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)
Guido van Rossum guido@python.org
Wed, 05 Apr 2000 10:16:15 -0400
> Sigh. In our company we use 'german' as our master language so
> we have string literals containing iso-8859-1 umlauts all over the place.
> Okay as long as we don't mix them with Unicode objects, this doesn't
> hurt anybody.
>
> What I would love to see, would be a well defined way to tell the
> interpreter to use 'latin-1' as default encoding instead of 'UTF-8'
> when dealing with string literals from our modules.

It would be better if this was supported for u"..." literals, so that
it was taken care of at the source code level completely. The running
program shouldn't have to worry about what encoding its source code
was!

For 8-bit literals, this would mean that if you had source code using
Latin-1, the literals would be translated from Latin-1 to UTF-8 by the
code generator. This would mean that len('ç') would return 2. I'm
not sure this is a great idea -- but then I'm not sure that using
Latin-1 in source code is a great idea either.

> The tokenizer in Python 1.6 already contains smart logic to get the
> size of TABs right (pasting from tokenizer.c):
>
>     /* Skip comment, while looking for tab-setting magic */
>     if (c == '#') {
>         static char *tabforms[] = {
>             "tab-width:",       /* Emacs */
>             ":tabstop=",        /* vim, full form */
>             ":ts=",             /* vim, abbreviated form */
>             "set tabsize=",     /* will vi never die? */
>             /* more templates can be added here to support other editors */
>         };
>     ..
>
> It wouldn't be to hard to add something there to recognize
> other "pragma" comments like for example:
> #content-transfer-encoding: iso-8859-1
> But what to do with it? May be adding a default encoding to every string
> object? Is this bloat? Just an idea.

Before we go any further we should design pragmas. The current
approach is inefficient and only designed to accommodate
editor-specific magical commands.

I say it's a Python 1.7 issue.

--Guido van Rossum (home page: http://www.python.org/~guido/)
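
The remark that len('ç') would return 2 can be checked with a small sketch. This is written for modern Python 3, which is an assumption: in the interpreter discussed in the message, '...' literals were 8-bit strings, so the byte objects below stand in for them. The variable names are illustrative only and do not come from the message.

```python
# Sketch in modern Python 3 (an assumption; not the 2000-era interpreter).
# Bytes objects stand in for the 8-bit string literals discussed above.

character = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA, i.e. 'ç'

# How the literal is stored in a Latin-1 encoded source file: one byte.
latin1_bytes = character.encode("latin-1")

# The translation step described in the message: Latin-1 -> UTF-8.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

print(len(latin1_bytes))  # 1
print(len(utf8_bytes))    # 2
```

The character is the same in both cases; only the 8-bit representation changes, which is why the length of an 8-bit literal would depend on the translation.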