[Python-Dev] Internal representation of strings and Micropython
Nick Coghlan ncoghlan at gmail.com
Thu Jun 5 15:15:54 CEST 2014
On 5 June 2014 22:37, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> On Thu, 5 Jun 2014 22:20:04 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> problems caused by trusting the locale encoding to be correct, but the
>> startup code will need non-trivial changes for that to happen - the
>> C.UTF-8 locale may even become widespread before we get there).
>
> ... And until those golden times come, it would be nice if Python did
> not force its perfect world model, which unfortunately is not based on
> surrounding reality, and let users solve their encoding problems
> themselves - when they need, because again, one can go quite a long way
> without dealing with encodings at all. Whereas now Python3 forces users
> to deal with encoding almost universally, but forcing a particular for
> all strings (which is again, doesn't correspond to the state of
> surrounding reality). I already hear response that it's good that users
> taught to deal with encoding, that will make them write correct
> programs, but that's a bit far away from the original aim of making it
> write "correct" programs easy and pleasant. (And definition of
> "correct" vary.)

As I've said before in other contexts, find me Windows, Mac OS X and
JVM developers, or educators and scientists that are as concerned by
the text model changes as folks that are primarily focused on Linux
system (including network) programming, and I'll be more willing to
concede the point.

Windows, Mac OS X, and the JVM are all opinionated about the text
encodings to be used at platform boundaries (using UTF-16, UTF-8 and
UTF-16, respectively). By contrast, Linux (or, more accurately, POSIX)
says "well, it's configurable, but we won't provide a reliable
mechanism for finding out what the encoding is.
So either guess as best you can based on the info the OS *does*
provide, assume UTF-8, assume 'some ASCII compatible encoding', or
don't do anything that requires knowing the encoding of the data being
exchanged with the OS, like, say, displaying file names to users or
accepting arbitrary text as input, transforming it in a content aware
fashion, and echoing it back in a console application".

None of those options are perfectly good choices. 6(ish) years ago, we
chose the first option, because it has the best chance of working
properly on Linux systems that use ASCII incompatible encodings like
ShiftJIS, ISO-2022, and various other East Asian codecs. For normal
user space programming, Linux is pretty reliable when it comes to
ensuring the locale encoding is set to something sensible, but the
price we currently pay for that decision is interoperability issues
with things like daemons not receiving any configuration settings and
hence falling back to the POSIX locale, and ssh environment forwarding
moving a client's encoding settings to a session on a server with
different settings. I still consider it preferable to impose
inconveniences like that based on use case (situations where Linux
systems don't provide sensible encoding settings) than geographic
region (locales where ASCII incompatible encodings are likely to still
be in common use).

If I (or someone else) ever find the time to implement PEP 432 (or
something like it) to address some of the limitations of the
interpreter startup sequence that currently make it difficult to avoid
relying on the POSIX locale encoding on Linux, then we'll be in a
position to reassess that decision based on the increased adoption of
UTF-8 by Linux distributions in recent years.
As the major community Linux distributions complete the migration of
their system utilities to Python 3, we'll get to see if they decide
it's better to make their locale settings more reliable, or help make
it easier for Python 3 to ignore them when they're wrong.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
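
As an illustration of the first two options discussed in the message (guess from what the OS provides, or assume UTF-8), a minimal Python 3 sketch might look like the following. The helper names are hypothetical. The use of the surrogateescape error handler (PEP 383) is an assumption about how one would keep undecodable bytes intact; the message itself does not prescribe it.

```python
import locale
import sys


def reported_encodings():
    """Return what the OS and interpreter report (the 'guess' option)."""
    return {
        # Encoding derived from the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding Python uses for file names and similar OS data.
        "filesystem": sys.getfilesystemencoding(),
    }


def decode_os_bytes(raw, encoding="utf-8"):
    """Decode OS-supplied bytes under an assumed encoding.

    Bytes that are invalid in that encoding become lone surrogates
    instead of raising, so the original data is not lost.
    """
    return raw.decode(encoding, errors="surrogateescape")


def encode_os_text(text, encoding="utf-8"):
    """Reverse decode_os_bytes, restoring any escaped bytes."""
    return text.encode(encoding, errors="surrogateescape")


if __name__ == "__main__":
    print(reported_encodings())
    # A Latin-1 encoded name, which is not valid UTF-8.
    raw_name = b"caf\xe9.txt"
    text = decode_os_bytes(raw_name)
    # The bytes survive the round trip even though the guess was wrong.
    print(encode_os_text(text) == raw_name)
```

The round trip only preserves the bytes; text decoded under a wrong guess still displays incorrectly, which is the trade-off the message describes.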