[I18n-sig] Re: [Python-Dev] Unicode debate

Guido van Rossum guido@python.org
Fri, 28 Apr 2000 10:50:05 -0400


[Paul Prescod]
> I think that maybe an important point is getting lost here. I could be
> wrong, but it seems that all of this emphasis on encodings is misplaced.

In practical applications that manipulate text, encodings creep up all
the time.  I remember a talk or message by Andy Robinson about the
messiness of producing printed reports in Japanese for a large
investment firm.  Most of the issues that took his time had to do
with encodings, if I recall correctly.  (Andy, do you remember what
I'm talking about?  Do you have a URL?)

> > The truth of the matter is: the encoding of string objects is in the
> > mind of the programmer.  When I read a GIF file into a string object,
> > the encoding is "binary goop".
>
> IMHO, it's a mistake of history that you would even think it makes sense
> to read a GIF file into a "string" object and we should be trying to
> erase that mistake, as quickly as possible (which is admittedly not very
> quickly) not building more and more infrastructure around it. How can we
> make the transition to a "binary goops are not strings" world easiest?

I'm afraid that's a bigger issue than we can solve for Python 1.6.
We're committed to by and large backwards compatibility while
supporting Unicode -- the backwards compatibility with tons of
extension modules (many 3rd party) requires that we deal with 8-bit
strings in basically the same way as we did before.

> > The moral of all this?  8-bit strings are not going away.
>
> If that is a statement of your long term vision, then I think that it is
> very unfortunate. Treating string literals as if they were isomorphic
> with byte arrays was probably the right thing in 1991 but it won't be in
> 2005.

I think you're a tad too optimistic about the evolution speed of
software (Windows 2000 *still* has to support DOS programs), but I see
your point.  As I stated in another message, in Python 3000 we'll have
to consider a more Java-esque solution: *character* strings are
Unicode, and for bytes we have (mutable!) byte arrays.  Certainly 8-bit
bytes as the smallest storage unit aren't going away.

> It doesn't meet the definition of string used in the Unicode spec., nor
> in XML, nor in Java, nor at the W3C nor in most other up and coming
> specifications.

OK, so that's a good indication of where you're coming from.  Maybe
you should spend a little more time in the trenches and a little less
in standards bodies.  Standards are good, but sometimes disconnected
from reality (remember ISO networking? :-).

> From the W3C site:
>
> ""While ISO-2022-JP is not sufficient for every ISO10646 document, it is
> the case that ISO10646 is a sufficient document character set for any
> entity encoded with ISO-2022-JP.""

And this is exactly why encodings will remain important: entities
encoded in ISO-2022-JP have no compelling reason to be recoded
permanently into ISO10646, and there are lots of forces that make it
convenient to keep them encoded in ISO-2022-JP (like existing tools).

> http://www.w3.org/MarkUp/html-spec/charset-harmful.html

I know that document well.

--Guido van Rossum (home page: http://www.python.org/~guido/)
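
To make the ISO-2022-JP point above concrete, here is a minimal sketch. It
assumes a modern Python 3 interpreter, where the str/bytes split mentioned for
"Python 3000" exists; it is not the Python 1.6 API under discussion. The sample
text and variable names are chosen purely for illustration.

```python
# Illustrative sketch for modern Python 3, not the Python 1.6 API.
text = "日本語"  # sample text, chosen only for illustration

# The same characters stored under two different encodings.
jis_bytes = text.encode("iso2022_jp")
utf8_bytes = text.encode("utf-8")

# The byte sequences differ, yet both decode to the same Unicode string,
# so Unicode is a sufficient character set for this ISO-2022-JP data.
assert jis_bytes != utf8_bytes
assert jis_bytes.decode("iso2022_jp") == utf8_bytes.decode("utf-8") == text

# ISO-2022-JP is a 7-bit encoding: every byte is below 0x80.
assert all(b < 0x80 for b in jis_bytes)

# Bytes carry no encoding label. Decoding with the wrong guess does not
# fail here; it silently yields different text.
misread = jis_bytes.decode("latin-1")
assert misread != text
```

The last step is the "in the mind of the programmer" problem in miniature:
nothing in the bytes says which encoding they use, so the program has to know.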

