Support for the LC_NUMERIC locale category in Python 2.3 is implemented only in Python-space. This causes inconsistent behavior and thread-safety issues for applications that use extension modules and libraries implemented in C that parse and generate floats from strings. This document proposes a plan for removing this inconsistency by providing and using substitute locale-agnostic functions as necessary.
Python provides generic localization services through the locale module, which among other things allows localizing the display and conversion process of numeric types. Locale categories, such as LC_TIME and LC_COLLATE, allow configuring precisely what aspects of the application are to be localized.
The LC_NUMERIC category specifies formatting for non-monetary numeric information, such as the decimal separator in float and fixed-precision numbers. Localization of the LC_NUMERIC category is currently implemented only in Python-space; C libraries invoked from the Python runtime are unaware of Python’s LC_NUMERIC setting. This is done to avoid changing the behavior of certain low-level functions that are used by the Python parser and related code [2].
However, this presents a problem for extension modules that wrap C libraries. Applications that use these extension modules will inconsistently display and convert floating-point values.
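The split between locale-unaware built-ins and the locale module's helpers can be illustrated with a short sketch. It assumes a Python 3 interpreter and probes a few comma-decimal locale names (pt_BR, de_DE) that may not be installed on a given system; the helper name find_comma_locale is hypothetical and exists only for this illustration.

```python
import locale


def find_comma_locale(candidates=("pt_BR.UTF-8", "pt_BR", "de_DE.UTF-8", "de_DE")):
    """Set LC_NUMERIC to the first installed candidate that uses ',' and return its name.

    Returns None if no candidate is available (hypothetical helper).
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # locale not installed on this system
        if locale.localeconv()["decimal_point"] == ",":
            return name
    return None


saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
try:
    name = find_comma_locale()

    # The built-ins ignore LC_NUMERIC and always use '.'.
    builtin_text = str(1.5)
    builtin_value = float("1.5")

    if name is not None:
        # The locale module's helpers honor the configured separator.
        localized_text = locale.str(1.5)
        localized_value = locale.atof("1,5")
    else:
        localized_text = localized_value = None
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)  # restore the original setting
```

If none of the assumed locales is installed, the localized values stay None and only the built-in behavior is exercised.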
James Henstridge, the author of PyGTK [3], has additionally pointed out that the setlocale() function also presents thread-safety issues, since a thread may call the C library setlocale() outside of the GIL, and cause Python to parse and generate floats incorrectly.
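The process-wide nature of setlocale() can be observed from Python itself. The following is a minimal sketch, assuming a Python 3 interpreter on a platform where the locale is global to the process; the candidate locale names are assumptions and may not be installed. A worker thread changes LC_NUMERIC, and the main thread then sees a different decimal separator without having changed anything itself.

```python
import locale
import threading

# Assumed candidate locale names; none of them may be installed.
CANDIDATES = ("pt_BR.UTF-8", "pt_BR", "de_DE.UTF-8", "de_DE")


def switch_locale(applied):
    """Worker: set LC_NUMERIC to the first available candidate locale."""
    for name in CANDIDATES:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # not installed, try the next one
        applied.append(name)
        return


saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
try:
    locale.setlocale(locale.LC_NUMERIC, "C")
    before = locale.localeconv()["decimal_point"]

    applied = []
    worker = threading.Thread(target=switch_locale, args=(applied,))
    worker.start()
    worker.join()

    # Read in the main thread, which never changed the locale after "C".
    after = locale.localeconv()["decimal_point"]

    # float() is locale-unaware, so it still parses '.' here.
    parsed = float("1.5")
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)  # restore the original setting
```

Any code in the main thread that relied on a locale-dependent C conversion would have been affected by the worker's call, which is the hazard described above.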
The inconsistency between Python and C library localization for LC_NUMERIC is a problem for any localized application using C extensions. The exact nature of the problem will vary depending on the application, but it will most likely occur when parsing or formatting a floating-point value.
The initial problem that motivated this PEP is related to the GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the PyGTK module. The widget can be set to numeric mode, and when this occurs, characters typed into it are evaluated as a number.
Problems occur when LC_NUMERIC is set to a locale with a float separator that differs from the C locale’s standard (for instance, ‘,’ instead of ‘.’ for the Brazilian locale pt_BR). Because LC_NUMERIC is not set at the libc level, float values are displayed incorrectly (using ‘.’ as a separator) in the spinbutton’s text entry, and it is impossible to enter fractional values using the ‘,’ separator.
This small example demonstrates reduced usability for localized applications using this toolkit when coded in Python.
Martin v. Löwis commented on the initial constraints for an acceptable solution to the problem on python-dev:
- LC_NUMERIC can be set at the C library level without breaking the parser.
- float() and str() stay locale-unaware.
- Locale-aware str() and atof() stay in the locale module.

An analysis of the Python source suggests that the following functions currently depend on LC_NUMERIC being set to the C locale:
- Python/compile.c:parsenumber()
- Python/marshal.c:r_object()
- Objects/complexobject.c:complex_to_buf()
- Objects/complexobject.c:complex_subtype_from_string()
- Objects/floatobject.c:PyFloat_FromString()
- Objects/floatobject.c:format_float()
- Objects/stringobject.c:formatfloat()
- Modules/stropmodule.c:strop_atof()
- Modules/cPickle.c:load_float()

The proposed approach is to implement LC_NUMERIC-agnostic functions for converting from (strtod()/atof()) and to (snprintf()) float formats, using these functions where the formatting should not vary according to the user-specified locale.
The locale module should also be changed to remove the special-casing for LC_NUMERIC.
This change should also solve the aforementioned thread-safety problems.
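The idea of a locale-agnostic conversion layered on top of locale-dependent routines can be sketched in Python. This is an illustration under stated assumptions, not the contributed C code: c_like_format() and c_like_parse() are hypothetical stand-ins for snprintf() and strtod() running under a locale with a given decimal point, and ascii_format() and ascii_parse() are hypothetical names for the wrappers that undo the localization. The stand-in parser is simplified and raises on unexpected input, whereas a real strtod() would stop parsing at the first unrecognized character.

```python
def c_like_format(value, decimal_point):
    """Stand-in for snprintf("%.12g") under a locale with this decimal point."""
    return ("%.12g" % value).replace(".", decimal_point)


def c_like_parse(text, decimal_point):
    """Stand-in for strtod() under a locale with this decimal point."""
    if decimal_point != "." and "." in text:
        raise ValueError("unexpected '.' for this locale: %r" % text)
    return float(text.replace(decimal_point, "."))


def ascii_format(value, decimal_point):
    """Format with the locale-dependent routine, then undo the localization."""
    localized = c_like_format(value, decimal_point)
    return localized.replace(decimal_point, ".")


def ascii_parse(text, decimal_point):
    """Localize a '.'-separated string, then call the locale-dependent parser."""
    localized = text.replace(".", decimal_point)
    return c_like_parse(localized, decimal_point)


# Under a ',' locale the raw routines produce and expect '1,5' ...
raw_text = c_like_format(1.5, ",")
# ... while the wrappers always produce and accept the '.' form.
stable_text = ascii_format(1.5, ",")
stable_value = ascii_parse("1.5", ",")
```

Callers such as the parser or marshal code would use the wrappers, so their input and output format would not change when the user switches LC_NUMERIC.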
This problem was initially reported as a problem in the GTK+ libraries [5]; since then it has been correctly diagnosed as an inconsistency in Python’s implementation. However, in a fortunate coincidence, the glib library (developed primarily for GTK+, not to be confused with the GNU C library) implements a number of LC_NUMERIC-agnostic functions (for an example, see [6]) for reasons similar to those presented in this paper.
In the same GTK+ problem report, Havoc Pennington suggested that the glib authors would be willing to contribute this code to the PSF, which would simplify implementation of this PEP considerably. Alex Larsson, the original author of the glib code, submitted a PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code could be safely integrated; this agreement has been received and accepted.
There may be cross-platform issues with the provided locale-agnostic functions, though this risk is low given that the code supplied simply reverses any locale-dependent changes made to floating-point numbers.
Martin and Guido pointed out potential copyright issues with the contributed code. I believe we will have no problems in this area as members of the GTK+ and glib teams have said they are fine with relicensing the code, and a PSF contributor agreement has been mailed in to ensure this safety.
Tim Peters has pointed out [9] that there are situations involving threading in which the proposed change is insufficient to solve the problem completely. A complete solution, however, does not currently exist.
An implementation was developed by Gustavo Carneiro <gjc at inescporto.pt>, and attached to Sourceforge.net bug 774665 [10].
The final patch [11] was integrated into Python CVS by Martin v. Löwis on 2004-06-08, as stated in the bug report.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0331.rst
Last modified: 2025-02-01 08:59:27 GMT