
[Python-Dev] accept string in a2b and base64?

Nick Coghlan ncoghlan at gmail.com
Tue Feb 21 03:51:08 CET 2012


On Tue, Feb 21, 2012 at 11:24 AM, R. David Murray <rdmurray at bitdance.com> wrote:
> If most people agree with Antoine I won't fight it, but it seems to me
> that accepting unicode in the binascii and base64 APIs is a bad idea.

I see it as essentially the same as the changes I made in
urllib.urlparse to support pure ASCII bytes->bytes in many of the APIs
(which work by doing an implicit ascii+strict decode at the beginning
of the function, and then reversing that at the end). For those, if
your byte sequence has non-ASCII data in it, they'll throw a
UnicodeDecodeError and it's up to you to figure out where those
non-ASCII bytes are coming from. Similarly, if one of these updated
APIs throws ValueError, then you'll have to figure out where the
non-ASCII code points are coming from.

Yes, it's a niggling irritation from a purist point of view, but it's
also an acknowledgement of the fact that whether a pure ASCII sequence
should be treated as a sequence of bytes or a sequence of code points
is going to be application and context dependent. Sometimes it will
make more sense to treat it as binary data, other times as text.

The key point is that any multimode support that depends on implicit
type conversion from bytes->str (or vice-versa) really needs to be
limited to *strict* ASCII only (if no other information on the
encoding is available). If something is 7-bit ASCII pure, then odds
are very good that it really *is* ASCII text. As soon as that
high-order bit gets set though, all bets are off and we have to push
the text encoding problem back on the API caller to figure out.

The reason Python 2's implicit str<->unicode conversions are so
problematic isn't just because they're implicit: it's because they
effectively assume *latin-1* as the encoding on the 8-bit str side.
That means reliance on implicit decoding can silently corrupt
non-ASCII data instead of triggering exceptions at the point of
implicit conversion.
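
As a minimal illustration of that difference (a sketch run under Python 3, with explicit `decode` calls standing in for an implicit conversion, and sample text made up for the example), a permissive decode never fails and quietly produces the wrong text, while a strict ASCII decode fails at the point of conversion:

```python
# UTF-8 bytes for the text "café"; the last two bytes are non-ASCII.
raw = "caf\u00e9".encode("utf-8")

# Permissive decode: latin-1 maps every possible byte to a code point,
# so this never raises. The result is silently wrong text (mojibake).
permissive = raw.decode("latin-1")

# Strict decode: ASCII rejects any byte with the high-order bit set,
# so the problem surfaces right here instead of further downstream.
try:
    raw.decode("ascii", "strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

print(repr(permissive))
print(type(strict_error).__name__)
```
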
If you're lucky, some *other* part of the
application will detect the corruption and you'll have at least a
vague hope of tracking it down. Otherwise, the corrupted data may
escape the application and you'll have an even *thornier* debugging
problem on your hands.

My one concern with the base64 patch is that it doesn't test that
mixing types triggers TypeError. While this shouldn't require any
extra code (the error should arise naturally from the method
implementation), it should still be tested explicitly to ensure type
mismatches fail as expected. Checking explicitly for mismatches in the
code would then just be a matter of wanting to emit nice error
messages explaining the problem rather than being needed for
correctness reasons (e.g. urlparse uses pre-checks in order to emit a
clear error message for type mismatches, but it has significantly
longer function signatures to deal with).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
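
For illustration, the multimode pattern and the explicit type-mismatch tests discussed in the message might look roughly like the sketch below. `_coerce`, `join_fields` and `raises` are hypothetical names invented for this example and are not the actual urllib or base64 code. Only the standard-library calls at the end are real APIs, shown as they behave in current Python 3, where the URL functions live in `urllib.parse`.

```python
import base64
from urllib.parse import urljoin, urlparse


def _coerce(*args):
    """Hypothetical helper: return (str_args, restore).

    Bytes input is decoded as strict ASCII up front; restore() reverses
    that on the result. Mixed str/bytes input is rejected with a
    pre-check so the error message is clear.
    """
    is_str = isinstance(args[0], str)
    for arg in args[1:]:
        if isinstance(arg, str) != is_str:
            raise TypeError("cannot mix str and bytes arguments")
    if is_str:
        return args, lambda result: result
    decoded = tuple(a.decode("ascii", "strict") for a in args)
    return decoded, lambda result: result.encode("ascii")


def join_fields(left, right):
    """Hypothetical multimode API: str in, str out; bytes in, bytes out."""
    (left, right), restore = _coerce(left, right)
    return restore(left + ":" + right)


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, else False."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Explicit tests for the sketch: both modes work, non-ASCII bytes are
# rejected, and mixing types fails with TypeError.
assert join_fields("a", "b") == "a:b"
assert join_fields(b"a", b"b") == b"a:b"
assert raises(UnicodeDecodeError, join_fields, b"caf\xe9", b"b")
assert raises(TypeError, join_fields, "a", b"b")

# The same behaviour in the standard library.
assert urlparse(b"http://example.com/x").netloc == b"example.com"
assert raises(UnicodeDecodeError, urlparse, b"http://example.com/\xe9")
assert raises(TypeError, urljoin, "http://example.com/", b"x")
assert base64.b64decode("aGVsbG8=") == b"hello"
assert raises(ValueError, base64.b64decode, "aGVsbG8\u00e9")
```

The pre-check in `_coerce` is there only for the clearer error message; without it, `left + ":" + right` would still raise TypeError on mixed input, which is the "arises naturally" case the message says should be covered by a test anyway.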
