Movatterモバイル変換

[Python-Dev] Revised PEP 349: Allow str() to return unicode strings

Neil Schemenauernas at arctrix.com
Mon Aug 22 23:31:42 CEST 2005
Previous message:[Python-Dev] Weekly Python Patch/Bug Summary
Next message:[Python-Dev] Revised PEP 349: Allow str() to return unicodestrings
Messages sorted by:[ date ][ thread ][ subject ][ author ]
[Please mail followups topython-dev at python.org.]The PEP has been rewritten based on a suggestion by Guido to changestr() rather than adding a new built-in function.  Based on mytesting, I believe the idea is feasible.  It would be helpful ifpeople could test the patched Python with their own applications andreport any incompatibilities.PEP: 349Title: Allow str() to return unicode stringsVersion: $Revision: 1.3 $Last-Modified: $Date: 2005/08/22 21:12:08 $Author: Neil Schemenauer <nas at arctrix.com>Status: DraftType: Standards TrackContent-Type: text/plainCreated: 02-Aug-2005Post-History: 06-Aug-2005Python-Version: 2.5Abstract    This PEP proposes to change the str() built-in function so that it    can return unicode strings.  This change would make it easier to    write code that works with either string type and would also make    some existing code handle unicode strings.  The C function    PyObject_Str() would remain unchanged and the function    PyString_New() would be added instead.Rationale    Python has had a Unicode string type for some time now but use of    it is not yet widespread.  There is a large amount of Python code    that assumes that string data is represented as str instances.    The long term plan for Python is to phase out the str type and use    unicode for all string data.  Clearly, a smooth migration path    must be provided.    We need to upgrade existing libraries, written for str instances,    to be made capable of operating in an all-unicode string world.    We can't change to an all-unicode world until all essential    libraries are made capable for it.  Upgrading the libraries in one    shot does not seem feasible.  A more realistic strategy is to    individually make the libraries capable of operating on unicode    strings while preserving their current all-str environment    behaviour.    First, we need to be able to write code that can accept unicode    instances without attempting to coerce them to str instances.  Let    us label such code as Unicode-safe.  Unicode-safe libraries can be    used in an all-unicode world.    Second, we need to be able to write code that, when provided only    str instances, will not create unicode results.  Let us label such    code as str-stable.  Libraries that are str-stable can be used by    libraries and applications that are not yet Unicode-safe.        Sometimes it is simple to write code that is both str-stable and    Unicode-safe.  For example, the following function just works:        def appendx(s):            return s + 'x'    That's not too surprising since the unicode type is designed to    make the task easier.  The principle is that when str and unicode    instances meet, the result is a unicode instance.  One notable    difficulty arises when code requires a string representation of an    object; an operation traditionally accomplished by using the str()    built-in function.        Using the current str() function makes the code not Unicode-safe.    Replacing a str() call with a unicode() call makes the code not    str-stable.  Changing str() so that it could return unicode    instances would solve this problem.  As a further benefit, some code    that is currently not Unicode-safe because it uses str() would    become Unicode-safe.Specification    A Python implementation of the str() built-in follows:        def str(s):            """Return a nice string representation of the object.  The            return value is a str or unicode instance.            """            if type(s) is str or type(s) is unicode:                return s            r = s.__str__()            if not isinstance(r, (str, unicode)):                raise TypeError('__str__ returned non-string')            return r                The following function would be added to the C API and would be the    equivalent to the str() built-in (ideally it be called PyObject_Str,    but changing that function could cause a massive number of    compatibility problems):        PyObject *PyString_New(PyObject *);    A reference implementation is available on Sourceforge [1] as a    patch.                Backwards Compatibility    Some code may require that str() returns a str instance.  In the    standard library, only one such case has been found so far.  The    function email.header_decode() requires a str instance and the    email.Header.decode_header() function tries to ensure this by    calling str() on its argument.  The code was fixed by changing    the line "header = str(header)" to:        if isinstance(header, unicode):            header = header.encode('ascii')    Whether this is truly a bug is questionable since decode_header()    really operates on byte strings, not character strings.  Code that    passes it a unicode instance could itself be considered buggy.Alternative Solutions    A new built-in function could be added instead of changing str().    Doing so would introduce virtually no backwards compatibility    problems.  However, since the compatibility problems are expected to    rare, changing str() seems preferable to adding a new built-in.    The basestring type could be changed to have the proposed behaviour,    rather than changing str().  However, that would be confusing    behaviour for an abstract base type.References    [1]http://www.python.org/sf/1266570Copyright    This document has been placed in the public domain.Local Variables:mode: indented-textindent-tabs-mode: nilsentence-end-double-space: tfill-column: 70End:-------------- next part --------------PEP: 349Title: Allow str() to return unicode stringsVersion: $Revision: 1.3 $Last-Modified: $Date: 2005/08/22 21:12:08 $Author: Neil Schemenauer <nas at arctrix.com>Status: DraftType: Standards TrackContent-Type: text/plainCreated: 02-Aug-2005Post-History: 06-Aug-2005Python-Version: 2.5Abstract    This PEP proposes to change the str() built-in function so that it    can return unicode strings.  This change would make it easier to    write code that works with either string type and would also make    some existing code handle unicode strings.  The C function    PyObject_Str() would remain unchanged and the function    PyString_New() would be added instead.Rationale    Python has had a Unicode string type for some time now but use of    it is not yet widespread.  There is a large amount of Python code    that assumes that string data is represented as str instances.    The long term plan for Python is to phase out the str type and use    unicode for all string data.  Clearly, a smooth migration path    must be provided.    We need to upgrade existing libraries, written for str instances,    to be made capable of operating in an all-unicode string world.    We can't change to an all-unicode world until all essential    libraries are made capable for it.  Upgrading the libraries in one    shot does not seem feasible.  A more realistic strategy is to    individually make the libraries capable of operating on unicode    strings while preserving their current all-str environment    behaviour.    First, we need to be able to write code that can accept unicode    instances without attempting to coerce them to str instances.  Let    us label such code as Unicode-safe.  Unicode-safe libraries can be    used in an all-unicode world.    Second, we need to be able to write code that, when provided only    str instances, will not create unicode results.  Let us label such    code as str-stable.  Libraries that are str-stable can be used by    libraries and applications that are not yet Unicode-safe.        Sometimes it is simple to write code that is both str-stable and    Unicode-safe.  For example, the following function just works:        def appendx(s):            return s + 'x'    That's not too surprising since the unicode type is designed to    make the task easier.  The principle is that when str and unicode    instances meet, the result is a unicode instance.  One notable    difficulty arises when code requires a string representation of an    object; an operation traditionally accomplished by using the str()    built-in function.        Using the current str() function makes the code not Unicode-safe.    Replacing a str() call with a unicode() call makes the code not    str-stable.  Changing str() so that it could return unicode    instances would solve this problem.  As a further benefit, some code    that is currently not Unicode-safe because it uses str() would    become Unicode-safe.Specification    A Python implementation of the str() built-in follows:        def str(s):            """Return a nice string representation of the object.  The            return value is a str or unicode instance.            """            if type(s) is str or type(s) is unicode:                return s            r = s.__str__()            if not isinstance(r, (str, unicode)):                raise TypeError('__str__ returned non-string')            return r                The following function would be added to the C API and would be the    equivalent to the str() built-in (ideally it be called PyObject_Str,    but changing that function could cause a massive number of    compatibility problems):        PyObject *PyString_New(PyObject *);    A reference implementation is available on Sourceforge [1] as a    patch.                Backwards Compatibility    Some code may require that str() returns a str instance.  In the    standard library, only one such case has been found so far.  The    function email.header_decode() requires a str instance and the    email.Header.decode_header() function tries to ensure this by    calling str() on its argument.  The code was fixed by changing    the line "header = str(header)" to:        if isinstance(header, unicode):            header = header.encode('ascii')    Whether this is truly a bug is questionable since decode_header()    really operates on byte strings, not character strings.  Code that    passes it a unicode instance could itself be considered buggy.Alternative Solutions    A new built-in function could be added instead of changing str().    Doing so would introduce virtually no backwards compatibility    problems.  However, since the compatibility problems are expected to    rare, changing str() seems preferable to adding a new built-in.    The basestring type could be changed to have the proposed behaviour,    rather than changing str().  However, that would be confusing    behaviour for an abstract base type.References    [1]http://www.python.org/sf/1266570Copyright    This document has been placed in the public domain.Local Variables:mode: indented-textindent-tabs-mode: nilsentence-end-double-space: tfill-column: 70End:
Previous message:[Python-Dev] Weekly Python Patch/Bug Summary
Next message:[Python-Dev] Revised PEP 349: Allow str() to return unicodestrings
Messages sorted by:[ date ][ thread ][ subject ][ author ]
More information about the Python-Devmailing list
[8]ページ先頭