[Python-Dev] accept string in a2b and base64?
Stephen J. Turnbull stephen at xemacs.org
Wed Feb 22 08:37:55 CET 2012
R. David Murray writes:

 > If most people agree with Antoine I won't fight it, but it seems to me
 > that accepting unicode in the binascii and base64 APIs is a bad
 > idea.

First, I agree with David that this change should have been brought up on python-dev before committing it. The distinctions Python 3 has made between APIs for bytes and those for str are both obviously controversial and genuinely delicate.

Second, if Unicode is to be accepted in these APIs, there is a doc issue (which I haven't checked). It must be made clear that the "printable ASCII" in question is the set represented by the *integers* 33 to 126, *not* the ASCII characters ! to ~. Those characters, or look-alikes of them, are present in the Unicode repertoire in many other places (specifically the "full-width ASCII" compatibility character set around U+FF20, but also several Greek and Cyrillic characters, and possibly others).

I'm going to side with Antoine and Nick on these particular changes because in practice (except maybe in the email module :-( ) the BASE-encoded "text" to be decoded is going to be consistently defined by the client as either str or bytes, but not both. The fact that the repr of the encoded text is identical (except for the presence or absence of a leading "b") is very suggestive here. I do harbor a slight niggle that there is more room for confusion here than in Nick's urllib work.

However, once we clarify that confusion in *our* minds, I don't think there's much potential for dangerous confusion for API clients. (I agree with Antoine on that point.) The BASE## decoding APIs in the abstract are "text" to bytes. Pedantically, in Python that suggests a str -> bytes signature, but RFC 4648 doesn't anywhere require a 1-byte representation of ASCII, only that the representation be interpreted as integers in the ASCII coding. However, an RFC-4648-conforming implementation MUST reject any string containing characters not allowed in the representation, so it's actually stricter than requiring ASCII.
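For concreteness, here is a minimal sketch of the decoding behavior under discussion, as released in Python 3.3 and later, where `base64.b64decode` accepts either `bytes` or an ASCII-only `str`. It illustrates two points made above: full-width look-alikes of the characters 33 to 126 are refused rather than treated as the ASCII characters they resemble, and passing `validate=True` gives the strict RFC 4648 behavior of rejecting characters outside the base64 alphabet. The helper `try_decode` and the variable `fullwidth` are hypothetical names introduced only for this illustration.

```python
import base64


def try_decode(data, validate=False):
    """Hypothetical helper: return decoded bytes, or the exception's class name."""
    try:
        return base64.b64decode(data, validate=validate)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        return type(exc).__name__


# The str and bytes spellings of the same ASCII text decode identically.
print(try_decode(b"aGVsbG8="))   # b'hello'
print(try_decode("aGVsbG8="))    # b'hello'

# Map each integer 33..126 to its full-width form (offset 0xFEE0, e.g. 'a' -> U+FF41).
# These are not ASCII, so decoding refuses them with ValueError.
fullwidth = "aGVsbG8=".translate({c: c + 0xFEE0 for c in range(0x21, 0x7F)})
print(try_decode(fullwidth))     # ValueError

# By default, characters outside the base64 alphabet are silently discarded...
print(try_decode("aGVs!bG8="))   # b'hello'
# ...while validate=True rejects them (binascii.Error), as RFC 4648 requires.
print(try_decode("aGVs!bG8=", validate=True))  # Error
```

Note that the default, non-validating mode is more lenient than the RFC: it discards the stray "!" instead of rejecting the input.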
I see no problem with allowing str-or-bytes -> bytes polymorphism here.

The remaining issue, to my mind, is that we'd also like bytes -> str-or-bytes polymorphism for symmetry, but this is not Haskell (Python can't choose a function's behavior by the return type the caller expects), so we can't have it. The same is true for binascii, I suppose, assuming that the module is specified (as the name suggests) to produce and consume only ASCII text as a representation of bytes.
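The asymmetry can be seen in a short sketch, again assuming Python 3.3 or later: the decoders accept str or bytes, but `base64.b64encode` and `binascii.b2a_base64` always return `bytes`, so a caller who wants text has to convert explicitly. `b64encode_text` below is a hypothetical convenience wrapper for illustration, not part of the standard library.

```python
import base64
import binascii


def b64encode_text(data: bytes) -> str:
    """Hypothetical wrapper: base64-encode bytes and return the result as str.

    The base64 output alphabet is pure ASCII, so decoding with 'ascii' cannot fail.
    """
    return base64.b64encode(data).decode("ascii")


raw = b"hello"
encoded_bytes = base64.b64encode(raw)   # always bytes
encoded_text = b64encode_text(raw)      # str, only by explicit conversion

# Decoding is polymorphic on its input but always returns bytes.
assert base64.b64decode(encoded_bytes) == base64.b64decode(encoded_text) == raw

# binascii behaves the same way: a2b_* accepts ASCII-only str, b2a_* returns bytes.
assert binascii.a2b_base64(encoded_text) == raw
print(binascii.b2a_base64(raw))         # b'aGVsbG8=\n'
```

A wrapper like this keeps the choice of output type with the caller, which is the only way to get the "symmetric" behavior without return-type polymorphism.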