



Issue 1580

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Use shorter float repr when possible
Type: enhancement    Stage: resolved
Components: Interpreter Core    Versions: Python 3.1
process
Status: closed    Resolution: accepted
Dependencies:    Superseder:
Assigned To:    Nosy List: alexandre.vassalotti, amaury.forgeotdarc, christian.heimes, eric.smith, eric.snow, gvanrossum, jaredgrubb, mark.dickinson, nascheme, noam, olau, pitrou, preston, rhettinger, tim.peters
Priority: normal    Keywords: patch

Created on 2007-12-10 19:13 by noam, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name               Uploaded                           Description
short_float_repr.diff   noam, 2007-12-10 19:13
ieee_dbl.c              christian.heimes, 2007-12-11 02:51
float.diff              gvanrossum, 2008-07-04 04:56       Tentative patch for py3k
float2.diff             gvanrossum, 2008-07-07 03:27       Improved tentative patch for py3k
unnamed                 preston, 2009-02-25 18:11
unnamed                 preston, 2009-02-27 03:57
Messages (109)
msg58359 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-10 19:13

The current float repr() always calculates the first 17 digits of the decimal representation of a float, and displays all of them (discarding trailing zeros). This is a simple method for making sure that eval(repr(f)) == f. However, it often results in a long string where a shorter string would have sufficed. For example, currently repr(1.1) is '1.1000000000000001', where obviously '1.1' would have been good enough, and much easier to read.

This patch implements an algorithm for finding the shortest string that will evaluate to the right number. It is based on the code from http://www.cs.indiana.edu/~burger/fp/index.html, and also on the floating-point formatting function of Tcl, Tcl_PrintDouble.

The patch also adds a test case, which takes a long list of floating point numbers, created especially for testing binary to decimal conversion, and makes sure that eval(repr(f)) == f. See floating_points.txt for the source of the list.
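The "shortest string that round-trips" idea can be sketched in pure Python (an illustrative sketch only; the helper name shortest_repr is ours, and the actual patch implements the Burger/Dybvig algorithm in C rather than searching):

```python
def shortest_repr(x, max_digits=17):
    """Return the decimal string with the fewest significant digits
    such that float(s) == x.  Brute-force search for illustration."""
    for ndigits in range(1, max_digits + 1):
        s = "%.*g" % (ndigits, x)
        if float(s) == x:
            return s
    # 17 significant digits always round-trip on IEEE 754 doubles
    return "%.17g" % x

print(shortest_repr(1.1))  # '1.1', not '1.1000000000000001'
```

The search relies on the platform's string-to-float conversion rounding correctly, which is exactly the portability question debated later in this thread.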
msg58364 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 19:49

I like this; but I don't have time for a complete thorough review. Maybe Tim can lend a hand? If Tim has no time, I propose that if it works correctly without leaks on at least Windows, OSX and Linux, we check it in, and worry about more review later.

Crys, can you test this on Windows and see if it needs any project file updates? I'll check OSX. Linux works and I see no leaks.
msg58374 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-10 22:20

Applied in r59457. I had to add checks for _M_X64 and _M_IA64 to doubledigits.c. The rest was fine on Windows.
msg58394 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 00:37

Noam, perhaps you can help with this? We checked this in but found a problem: repr(1e5) suddenly returns '1.0'. Can you think of a cause for this?
msg58398 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 01:08

I don't know; for me it works fine, even after downloading a fresh SVN copy. On what platform does it happen?
msg58399 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 01:09

I've disabled the new repr() in trunk and py3k until somebody has sorted out the build problems. I've removed doubledigits.c from Makefile.in and disabled the code with #ifdef Py_BROKEN_REPR in floatobject.c.
msg58401 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 01:24

> On what platform does it happen?

Linux on x86. It seems fine on PPC OSX. This suggests it could be a byte order bug.
msg58403 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 01:31

I also use Linux on x86. I think that byte order would cause different results (the repr of a random float shouldn't be "1.0"). Does the test case run OK? Because if it does, it's really strange.
msg58405 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-12-11 01:58

There is nothing you can do to repr() that's sufficient by itself to ensure eval(repr(x)) == x. The other necessary part is correctly rounded float /input/ routines.

The 754 standard does not require correctly rounding input or output routines. It does require that eval(repr(x)) == x provided that repr(x) produce at least 17 significant digits (not necessarily correctly rounded, but "good enough" so that eval(repr(x)) == x). That's why Python produces 17 significant digits. While there's no guarantee that all platform I/O routines are good enough to meet the 754 requirement, most do, and what Python's repr() actually does is the strongest that /can/ be done assuming no more than full 754 conformance.

Again, this cannot be improved cross-platform without Python supplying both output and input routines. Burger's output routines supply only half of what's needed. An efficient way to do the other half (correctly rounding input) was first written up by Clinger. Note that neither the Burger/Dybvig output code nor the Clinger input code are used much in practice anymore; David Gay's code is usually used instead for "go fast" reasons:

http://mail.python.org/pipermail/python-list/2004-July/272167.html

Clinger also endorses Gay's code:

ftp://ftp.ccs.neu.edu/pub/people/will/retrospective.pdf

However, as the Python list link says, Gay's code is "mind-numbingly complicated".

There is no easy cross-platform solution, where "cross-platform" means just "limited to 754 platforms".
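Tim's "17 significant digits" guarantee can be demonstrated directly (a sketch; the helper name roundtrips is ours, and the check assumes the platform's input routine is at least 754-quality, as discussed above):

```python
import random

def roundtrips(x):
    """True if printing x with 17 significant digits and reading it
    back recovers the exact same double."""
    return float("%.17g" % x) == x

# Spot-check a spread of finite doubles; on an IEEE 754 platform with
# good-enough input routines, every one of these must round-trip.
random.seed(1)
samples = [0.1, 1.1, 1e5, 1e-300] + \
          [random.uniform(-1e300, 1e300) for _ in range(1000)]
assert all(roundtrips(x) for x in samples)
```

As Tim notes, passing such a spot check proves nothing about the other ~2**64 doubles; only a proof about the conversion routine does.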
msg58407 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 02:51

I've written a small C program for autoconf that checks the endianness of IEEE doubles. Neil has promised to add something to configure.in when I come up with a small C program. It *should* do the right thing on a BE platform but I can't test it.

Tim, does Python run on a non-IEEE 754 platform? Should the new repr() be ripped out? I hate to break Python for scientists just to silence some ignorant and inexperienced newbies.
msg58411 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-12-11 03:15

Again, without replacing float input routines too, this is /not/ good enough to ensure eval(repr(x)) == x even across just 100%-conforming 754 platforms.

It is good enough to ensure (well, assuming the code is 100% correct) eval(repr(x)) == x across conforming 754 platforms that go /beyond/ the standard by also supplying correctly rounding input routines. Some platform C libraries do, some don't. For example, I believe glibc does (and builds its I/O code on David Gay's, mentioned before), but Microsoft's does not (though Microsoft's routines do meet the 754 standard, which, again, isn't strong enough for "shortest output" to round-trip correctly in all cases).
msg58416 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 07:59

Oh, this is sad. Now I know why Tcl has also implemented a decimal to binary routine.

Perhaps we can simply use both their routines? If I am not mistaken, their only real dependency is on a library which allows arbitrarily long integers, called tommath, from which they use a few basic functions. We can use instead the functions from longobject.c. It will probably be somewhat slower, since longobject.c wasn't created to allow in-place operations, but I don't think it should be that bad -- we are mostly talking about compile time.
msg58419 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 08:37

The Tcl code can be found here: http://tcl.cvs.sourceforge.net/tcl/tcl/generic/tclStrToD.c?view=markup

What Tim says gives another reason for using that code: it means that currently, compiling the same source code on two platforms can result in code which does different things.

Just to make sure: IEEE does require that operations on doubles do the same thing on different platforms, right?
msg58422 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 09:11

It's really a shame. It was a nice idea ... Could we at least use the new formatting for str(float) and the display of floats? In Python 2.6 floats are not displayed with repr(). They seem to use yet another hook.

>>> repr(11./5)
'2.2'
>>> 11./5
2.2000000000000002
>>> str(11./5)
'2.2'
msg58424 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 12:30

I think that for str(), the current method is better: using the new repr() method would make str(1.1*3) == '3.3000000000000003', instead of '3.3'. (The repr is right -- you can check, and 1.1*3 != 3.3. But for str() purposes it's fine.)

But I actually think that we should also use Tcl's decimal to binary conversion -- otherwise, a .pyc file created by a Python compiled with Microsoft will cause different behaviour from a .pyc file created by a Python compiled with GNU, which is quite strange.
msg58427 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 13:21

If I think about it some more, why not get rid of all the float platform-dependencies and define how +inf, -inf and nan behave? I think that it means:

* inf and -inf are legitimate floats just like any other float. Perhaps there should be a builtin Inf, or at least math.inf.
* nan is an object of type float, which behaves like None, that is: "nan == nan" is true, but "nan < nan" and "nan < 3" will raise an exception. Mathematical operations which used to return nan will raise an exception (division by zero does this already, but "inf + -inf" will do that too, instead of returning nan). Again, there should be a builtin NaN, or math.nan. The reason for having a special nan object is compatibility with IEEE floats: I want to be able to pass around IEEE floats easily even if they happen to be nan.

This is basically what Tcl did, if I understand correctly -- see item 6 in http://www.tcl.tk/cgi-bin/tct/tip/132.html .
msg58429 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 13:46

Noam Raphael wrote:
> * nan is an object of type float, which behaves like None, that is:
> "nan == nan" is true, but "nan < nan" and "nan < 3" will raise an
> exception.

No, that's not correct. The standard defines that nan is always unequal to nan.

>>> float("nan") == float("nan")
False
>>> float("inf") == float("inf")
True
>>> float("inf") == -1*float("-inf")
True

The float module could gain three singletons nan, inf and neginf so you can do

True

Christian
msg58430 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 13:58

That's right, but the standard also defines that 0.0/0 -> nan, and 1.0/0 -> inf, but instead we raise an exception. It's just that in Python, every object is expected to be equal to itself. Otherwise, how can I check if a number is nan?
msg58432 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 15:09

I propose that we add three singletons to the float implementation:

PyFloat_NaN
PyFloat_Inf
PyFloat_NegInf

The singletons are returned from PyFloat_FromString() for "nan", "inf" and "-inf". The other PyFloat_ methods must return the singletons, too. It's easy to check the signum, special and nan/inf status of an IEEE double with a struct.

I've already started to work on a way to create nan, inf and -inf in a cross-platform-conformant way. Right now float("nan"), float("inf") and float("-inf") work on Linux but don't work on Windows. We could also add four methods to math: signum(float) -> 1/-1, finity(float) -> bool, infinite(float), nan(float).

Here is my work in progress:

#define Py_IEEE_DBL *(double*)&(ieee_double)

#if defined(LITTLE_ENDIAN_IEEE_DOUBLE) && !defined(BIG_ENDIAN_IEEE_DOUBLE)
typedef struct {
        unsigned int    m4 : 16;
        unsigned int    m3 : 16;
        unsigned int    m2 : 16;
        unsigned int    m1 :  4;
        unsigned int   exp : 11;
        unsigned int  sign :  1;
} ieee_double;
#define Py_IEEE_NAN Py_IEEE_DBL{0xffff, 0xffff, 0xffff, 0xf, 0x7ff, 0}
#define Py_IEEE_INF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 0}
#define Py_IEEE_NEGINF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 1}
#elif !defined(LITTLE_ENDIAN_IEEE_DOUBLE) && defined(BIG_ENDIAN_IEEE_DOUBLE)
typedef struct {
        unsigned int  sign :  1;
        unsigned int   exp : 11;
        unsigned int    m1 :  4;
        unsigned int    m2 : 16;
        unsigned int    m3 : 16;
        unsigned int    m4 : 16;
} ieee_double;
#define Py_IEEE_NAN Py_IEEE_DBL{0, 0x7ff, 0xf, 0xffff, 0xffff, 0xffff}
#define Py_IEEE_INF Py_IEEE_DBL{0, 0x7ff, 0, 0, 0, 0}
#define Py_IEEE_NEGINF Py_IEEE_DBL{1, 0x7ff, 0, 0, 0, 0}
#else
#error Unknown or no IEEE double implementation
#endif
msg58433 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 15:21

(1) Despite Tim's grave language, I don't think we'll need to write our own correctly-rounding float input routine. We can just say that Python won't work correctly unless your float input routine is rounding correctly; a unittest should detect whether this is the case. I expect that more recent C runtimes for Windows are correct.

(1a) Perhaps it's better to only do this for Python 3.0, which has a smaller set of platforms to support.

(2) Have you two (Christian and Noam) figured out yet why repr(1e5) is '1.0' on some platforms? If it's not byte order (and I no longer believe it is) what could it be? An uninitialized variable? It doesn't seem to vary with the processor, but it could vary with the C runtime or OS.

(3) Detecting NaNs and infs in a platform-independent way is tricky, probably beyond you (I know it's beyond me). (Of course, producing standardized output for them and recognizing these as input is easy, and that should definitely be done.)

(4) Looks like we've been a bit too hasty checking this in. Let's be extra careful for round #2.
msg58436 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 15:47

Guido van Rossum wrote:
> (1a) Perhaps it's better to only do this for Python 3.0, which has a
> smaller set of platforms to support.

+1

Does Python depend on a working, valid and non-broken IEEE 754 floating point arithmetic? Could we state that Python's float type depends on the IEEE 754-1985 conformant 64-bit double precision format? Developers for a platform with no IEEE 754 conformant double must roll their own floatobject implementation.

> (2) Have you two (Christian and Noam) figured out yet why repr(1e5) is
> '1.0' on some platforms?  If it's not byte order (and I no longer
> believe it is) what could it be?  An uninitialized variable?  It doesn't
> seem to vary with the processor, but it could vary with the C runtime or OS.

I wasn't able to reproduce the problem on my workstations. Can you give us more information about the computer with failing tests? CPU, OS, kernel, libc etc.

> (3) Detecting NaNs and infs in a platform-independent way is tricky,
> probably beyond you (I know it's beyond me).  (Of course, producing
> standardized output for them and recognizing these as input is easy, and
> that should definitely be done.)

In fact detecting NaNs and Infs in a platform-independent way is dead easy IFF the platform has IEEE 754 support. A double uses 64 bits to store the data: 1 sign bit, 11 exponent bits and a 52-bit stored fraction part (aka mantissa; 53 bits of precision with the implicit leading bit). When all bits of the exponent are set (0x7ff) the number is either a NaN or +/-Inf. For Infs the fraction part has no bits set (0) and the sign bit signals +Inf (sign: 0) or -Inf (sign: 1). If at least one bit of the mantissa is set, it's a NaN.

The form (normalized or denormalized) is irrelevant for the detection of NaN and Inf. An autoconf macro can easily detect the endianness of the double implementation.

We can also use the much slower version and check if x > DBL_MAX for +Inf, x < -DBL_MAX for -Inf and x != x for NaN.

Christian
msg58443 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 17:34

> > (1a) Perhaps it's better to only do this for Python 3.0, which has a
> > smaller set of platforms to support.
>
> +1
> Does Python depend on a working, valid and non-broken IEEE 754 floating
> point arithmetic? Could we state that Python's float type depends on
> the IEEE 754-1985 conformant 64-bit double precision format? Developers for a
> platform with no IEEE 754 conformant double must roll their own floatobject
> implementation.

No, traditionally Python has just used whatever C's double provides. There are some places that benefit from IEEE 754, but few that require it (dunno about optional extension modules).

> I wasn't able to reproduce the problem on my workstations. Can you give
> us more information about the computer with failing tests? CPU, OS,
> kernel, libc etc.

So far I have only one box where it is broken (even after make clobber and ./configure). I cannot reproduce it with a debug build. It's an x86 Linux box running in mixed 64-32 bit mode.

From /etc/lsb-release: "Ubuntu 6.06 LTS"
uname -a: Linux pythonic.corp.google.com 2.6.18.5-gg16-mixed64-32 #1 SMP Wed Oct 10 17:45:08 PDT 2007 x86_64 GNU/Linux
gcc -v: gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

I'm afraid I'll have to debug this myself, but not today.

> In fact detecting NaNs and Infs in a platform-independent way is dead easy IFF
> the platform has IEEE 754 support. A double uses 64 bits to store the
> data: 1 sign bit, 11 exponent bits and a 52-bit fraction part (aka
> mantissa). When all bits of the exponent are set (0x7ff) the number is
> either a NaN or +/-Inf.
>
> The form (normalized or denormalized) is irrelevant for the detection of
> NaN and Inf. An autoconf macro can easily detect the endianness of the
> double implementation.

Only for IEEE 754 though.

> We can also use the much slower version and check if x > DBL_MAX for
> +Inf, x < -DBL_MAX for -Inf and x != x for NaN.

Of course the latter isn't guaranteed to help for non-IEEE-754 platforms -- some platforms don't have NaNs at all!
msg58445 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-12-11 17:51

> Of course the latter isn't guaranteed to help for
> non-IEEE-754 platforms -- some platforms don't have
> NaNs at all!

ISTM that years of toying with Infs and NaNs has not yielded a portable, workable solution. I'm concerned that further efforts will contort and slow down our code without benefit. NaNs in particular are a really difficult case because our equality testing routines have a fast path where identity implies equality. I don't think NaNs can be correctly implemented without breaking that fast path, which appears throughout the code base (even inside the code for dictobject.c; otherwise, how could you look up a NaN value?).

I recommend punting on the subject of NaNs. Attempting to "fix" it will necessarily entail introducing a number of little atrocities that just aren't worth it. IMO, practicality beats purity on this one.

The decimal module provides full support for NaNs. While it is slow, it may be sufficient for someone who isn't happy with the way float handles them.
msg58449 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-12-11 17:57

[Raymond]
> ...
> NaNs in particular are a really
> difficult case because our equality testing routines
> have a fast path where identity implies equality.

Works as intended in 2.5; this is Windows output:

1.#INF
>>> nan = inf - inf
>>> nan  # really is a NaN
-1.#IND
>>> nan is nan  # of course this is true
True
>>> nan == nan  # but not equal anyway
False
msg58452 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-12-11 18:24

[Guido]
> ...  We can just say that Python
> won't work correctly unless your float input routine is rounding
> correctly; a unittest should detect whether this is the case.

Sorry, but that's intractable. Correct rounding is a property that needs to be proved, not tested. Processors aren't fast enough to test all (roughly) 2**64 possible doubles in reasonable time, and there's only one way to get all of them right (correct rounding). There are many ways to get at least one of them wrong, and short of proof only exhaustive testing can determine whether at least one is wrong. There is no "smoking gun" specific test case, either -- different flawed routines will fail on different inputs.

> I expect that more recent C runtimes for Windows are correct.

Based on what? Unless MS says so somewhere, it's very hard to know. Vern Paxson and David Hough developed a test program in the early 90's, based on an excruciatingly difficult algorithm I invented for finding "hard cases", and the code they wrote still appears to be available from the places Vern mentions in this brief msg:

http://www.netlib.org/fp/testbase

That code doesn't guarantee to find a failing correct-rounding case if one exists, but did find failures reasonably efficiently on many platforms tested at the time.
msg58454 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 18:46

Guido van Rossum wrote:
> No, traditionally Python has just used whatever C's double provides.
> There are some places that benefit from IEEE 754, but few that require
> it (dunno about optional extension modules).

I asked Thomas Wouters about IEEE 754:

"""I don't know of any system with an FPU that does not support 754 semantics, and all Intel CPUs since the 486 have FPUs."""

During the discussion we found only one system that has supported Python in the past but doesn't have IEEE 754 doubles. Pippy (Python for PalmOS) has no floats at all since the hardware doesn't have an FPU.

> So far I have only one box where it is broken (even after make clobber
> and ./configure). I cannot reproduce it with a debug build.
> It's an x86 Linux box running in mixed 64-32 bit mode.
>
> From /etc/lsb-release: "Ubuntu 6.06 LTS"
> uname -a: Linux pythonic.corp.google.com 2.6.18.5-gg16-mixed64-32 #1
> SMP Wed Oct 10 17:45:08 PDT 2007 x86_64 GNU/Linux
> gcc -v: gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
>
> I'm afraid I'll have to debug this myself, but not today.

The problem could either be related to SMP or to the mixed mode. Please try to bind the Python process to one CPU with schedtool or schedutils.

> Only for IEEE 754 though.
> Of course the latter isn't guaranteed to help for non-IEEE-754
> platforms -- some platforms don't have NaNs at all!

Do you know of any system that supports Python and floats but doesn't have IEEE 753 semantics? I'm asking out of curiosity.

Christian
msg58456 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 18:57

> Do you know of any system that supports Python and floats but doesn't
> have IEEE 753 semantics?

(Assuming you meant 754.)

I'm pretty sure the VAX doesn't have IEEE FP, and it used to run Unix and Python. Ditto for Crays -- unsure if we still support that though. What about Nokia S60? Tim probably knows more.
msg58457 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 19:03

> Correct rounding is a property that needs to be proved, not tested.

I take it your position is that this can never be done 100% correctly so it shouldn't go in? That's disappointing, because the stream of complaints that "round is broken" won't stop (we had two of these in the bug tracker just this month).

I really want to address this without having to either change interactive mode to use something besides repr() or change repr() to drop precision (assuming correctly rounding input).

We can fix the roundtripping in marshal and pickle and the like by explicitly using "%.17g" % x instead of repr(x). Scientists who worry about roundtripping can also do this (though are there any scientists who work on platforms that don't have correctly-rounding input routines?).

We can also fix it by adopting the Tcl string-to-float code.
msg58466 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2007-12-11 20:26

[Guido]
> I take it your position is that this can never be done 100% correctly

No. David Gay's code is believed to be 100% correctly-rounded and is also reasonably fast in most cases. I don't know of any other "open" string<->float code that achieves both (& expect that's why glibc adopted Gay's code as its base). I don't know anything about Tcl's code here, except that, unlike Gay's code, it appears that Tcl's code also tries to cater to non-754 platforms. It's also easy to be 100% correct if you don't care about speed (but that's factors of 1000s slower than Gay's code).

If you care about guaranteeing eval(repr(x)) == x for binary floats, sorry, there's no way to do it that satisfies all of {cheap, easy, fast, correct}. If you don't care about guaranteeing eval(repr(x)) == x, then it's dead easy to do any number of "wrong" things instead. For example, much easier and faster than anything discussed here would be to use the float str() implementation instead. "Produce the shortest possible output that reproduces the same input" is a genuinely difficult task (for both output and input) when speed matters, but it's not really what people are asking for when they see 0.10000000000000001. They would be just as happy with str(x)'s output instead: it's not people who know a lot about floats who are asking to begin with, it's people who don't know much at all.

It's not a coincidence that the whole idea of "shortest possible output" came from the Scheme world ;-) That is, "formally correct at any price".

> so it shouldn't go in?

What shouldn't go in is something that doesn't solve "the problem", whatever that's thought to be today. Changing repr() to do shortest-possible output is not, by itself, a solution /if/ the problem is thought to be "getting rid of lots of zeroes or nines while preserving eval(repr(x)) == x". It slows things down yet isn't enough to guarantee eval(repr(x)) == x. This is just going in circles.

> That's disappointing, because the stream of complaints that "round is
> broken" won't stop (we had two of these in the bug tracker just this month).

I don't care about complaints from people who use binary floating point without understanding it; they should use the decimal module instead if they're unwilling or unable to learn. Or give up on eval(repr(x)) == x for binary floats and then it's all much easier -- although people using binary floats naively will get burned by something subtler then.

A sufficient response to "round() is broken" complaints is to point to the Tutorial appendix that specifically discusses "round() is broken" complaints.
msg58467 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 20:43

I'd be willing to require eval(repr(x)) == x only for platforms whose float input routine is correctly rounding. That would make the current patch acceptable, I believe -- but I believe you think there's a better way in that case too? What way is that?

Also, I'd rather not lock out non-IEEE-754 platforms, but I'm OK with dropping the above roundtripping invariant there.
msg58473 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 23:07

If I understand correctly, there are two main concerns: speed and portability. I think that they are both not that terrible. How about this:

* For IEEE-754 hardware, we implement decimal/binary conversions, and define the exact behaviour of floats.
* For non-IEEE-754 hardware, we keep the current method of relying on the system libraries.

About speed, perhaps it's not such a big problem, since decimal/binary conversions are usually related to I/O, and this is relatively slow anyway. I think that usually a program does relatively few decimal/binary conversions.

About portability, I think (from a small research I just made) that S90 supports IEEE-754. This leaves VAX and Cray users, who will have to live with non-perfect floating-point behaviour.

If I am correct, it will let 99.9% of the users get a deterministic floating-point behaviour, where eval(repr(f)) == f and repr(1.1)=='1.1', with a speed penalty they won't notice.
msg58474 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-11 23:14

Sounds okay, except that I think that for some folks (e.g. numeric Python users) I/O speed *does* matter, as their matrices are very large, and their disks and networks are very fast.
msg58477 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-11 23:46

If I were in that situation I would prefer to store the binary representation. But if someone really needs to store decimal floats, we can add a method "fast_repr" which always calculates 17 decimal digits.

Decimal to binary conversion, in any case, shouldn't be slower than it is now, since on GNU it is done anyway, and I don't think that our implementation should be much slower.
msg58478 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-12 00:06

> If I were in that situation I would prefer to store the binary
> representation. But if someone really needs to store decimal floats,
> we can add a method "fast_repr" which always calculates 17 decimal
> digits.

They can just use "%.17g" % x

> Decimal to binary conversion, in any case, shouldn't be slower than it
> is now, since on GNU it is done anyway, and I don't think that our
> implementation should be much slower.

Sure.
msg58480 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-12 00:31

I've tracked my problem to the GCC optimizer. The default optimizer setting is -O3. When I edit the Makefile to change this to -O1 or -O0 and recompile (only) doubledigits.c, repr(1e5) starts returning '100000.0' again. -O2 behaves the same as -O3.

Now, don't immediately start pointing fingers at the optimizer; it's quite possible that the code depends on behavior that the C standard doesn't actually guarantee. The code looks quite complicated.

This is with GCC 4.0.3.
msg58509 - (view) Author: Noam Yorav-Raphael (noam) Date: 2007-12-12 23:34

Ok, so if I understand correctly, the ideal thing would be to implement decimal to binary conversion by ourselves. This would make str <-> float conversion do the same thing on all platforms, and would make repr(1.1)=='1.1'. This would also allow us to define exactly how floats operate, with regard to infinities and NaNs. All this is for IEEE-754 platforms -- for the rare platforms which don't support it, the current state remains.

However, I don't think I'm going, in the near future, to add a decimal to binary implementation -- the Tcl code looks very nice, but it's quite complicated and I don't want to fiddle with it right now.

If nobody is going to implement the correctly rounding decimal to binary conversion, then I see three options:

1. Revert to the previous situation
2. Keep the binary to shortest decimal routine and use it only when we know that the system's decimal to binary routine is correctly rounding (we can check - perhaps Microsoft has changed theirs?)
3. Keep the binary to shortest decimal routine and drop repr(f) == f (I don't like that option).

If options 2 or 3 are chosen, we can check the 1e5 bug.
msg58510 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-12 23:43

> Ok, so if I understand correctly, the ideal thing would be to
> implement decimal to binary conversion by ourselves. This would make
> str <-> float conversion do the same thing on all platforms, and would
> make repr(1.1)=='1.1'. This would also allow us to define exactly how
> floats operate, with regard to infinities and NaNs. All this is for
> IEEE-754 platforms -- for the rare platforms which don't support it,
> the current state remains.

Does doubledigits.c not work for non-754 platforms?

> However, I don't think I'm going, in the near future, to add a decimal
> to binary implementation -- the Tcl code looks very nice, but it's
> quite complicated and I don't want to fiddle with it right now.

Fair enough. And the glibc code is better anyway, according to Tim. (Although probably even more complex.)

> 1. Revert to the previous situation

That would be sad since we're so close.

> 2. Keep the binary to shortest decimal routine and use it only when we
> know that the system's decimal to binary routine is correctly rounding
> (we can check - perhaps Microsoft has changed theirs?)

Tim says you can't check (test) for this -- you have to prove it from source, or trust the vendor's documentation. I would have no idea where to find this documented.

> 3. Keep the binary to shortest decimal routine and drop repr(f) == f
> (I don't like that option).

It would mean that on platforms with unknown-correct rounding *input* the *output* would be ugly again. I wouldn't mind so much if this was only VAX or Cray; but if we had to do this on Windows it'd be a real shame.

> If options 2 or 3 are chosen, we can check the 1e5 bug.

First you'll have to reproduce it -- I can't afford to spend much quality time with gdb, especially since I don't understand the source code at all. Though you may be able to pinpoint a possible candidate based on the characteristics of the error -- it seems to have to do with not placing the decimal point correctly when there are trailing zeros.
msg58524 -(view)Author: Noam Yorav-Raphael (noam)Date: 2007-12-13 09:12
2007/12/13, Guido van Rossum <report@bugs.python.org>:
>
> > Ok, so if I understand correctly, the ideal thing would be to
> > implement decimal to binary conversion by ourselves. This would make
> > str <-> float conversion do the same thing on all platforms, and would
> > make repr(1.1)=='1.1'. This would also allow us to define exactly how
> > floats operate, with regard to infinities and NaNs. All this is for
> > IEEE-754 platforms -- for the rare platforms which don't support it,
> > the current state remains.
>
> Does doubledigits.c not work for non-754 platforms?

No. It may be a kind of an oops, but currently it just won't compile on platforms which it doesn't recognize, and it only recognizes 754 platforms.

> > 2. Keep the binary to shortest decimal routine and use it only when we
> > know that the system's decimal to binary routine is correctly rounding
> > (we can check - perhaps Microsoft has changed theirs?)
>
> Tim says you can't check (test) for this -- you have to prove it from
> source, or trust the vendor's documentation. I would have no idea
> where to find this documented.

The program for testing floating point compatibility is in http://www.cant.ua.ac.be/ieeecc754.html

To run it, on my computer, I used:

./configure -target Conversions -platform IntelPentium_cpp
make
./IeeeCC754 -d -r n -n x Conversion/testsets/d2bconvd
less ieee.log

This tests only doubles, round to nearest, and ignores flags which should be raised to signal inexact conversion. You can use any file in Conversions/testsets/d2b* - I chose this one pretty randomly.

It turns out that even on my gcc 4.1.3 it finds a few floats not correctly rounded. :(

Anyway, it can be used to test other platforms. If not by the executable itself, we can pretty easily write a python program which uses the test data.

I don't know what exactly the errors with gcc 4.1.3 mean - is there a problem with the algorithm of glibc, or perhaps the testing program didn't set some flag?
msg58707 -(view)Author: Noam Yorav-Raphael (noam)Date: 2007-12-17 21:53
Ok, I think I have a solution!

We don't really need always the shortest decimal representation. We just want that for most floats which have a nice decimal representation, that representation will be used. Why not do something like that:

def newrepr(f):
    r = str(f)
    if eval(r) == f:
        return r
    else:
        return repr(f)

Or, in more words:

1. Calculate the decimal representation of f with 17 precision digits, s1, using the system's routines.
2. Create a new string, s2, by rounding the resulting string to 12 precision digits.
3. Convert the resulting rounded string to a new double, g, using the system's routines.
4. If f==g, return s2. Otherwise, return s1.

It will take some more time than the current repr(), because of the additional decimal to binary conversion, but we already said that if speed is extremely important one can use "'%f.17' % f". It will obviously preserve the eval(repr(f)) == f property. And it will return a short representation for almost any float that has a short representation.

This algorithm I will be glad to implement.

What do you think?
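The four steps above can be sketched directly in Python (an illustration only, not the proposed patch; it uses the format machinery to do the 12-digit rounding rather than rounding the string by hand):

```python
def short_repr(f):
    # Step 1: 17 significant digits always round-trip for IEEE doubles.
    s1 = '%.17g' % f
    # Step 2: round to 12 significant digits.
    s2 = '%.12g' % f
    # Steps 3-4: if the rounded string converts back to the same double,
    # prefer it; otherwise fall back to the full 17-digit form.
    return s2 if float(s2) == f else s1
```

The extra string-to-float conversion in step 3 is the additional cost mentioned in the message.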
msg58710 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2007-12-17 22:56
This is what I was thinking of before, although I'd use "%.16g"%f and "%.17g"%f instead of str(f) and repr(f), and I'd use float() instead of eval().

I suspect that it doesn't satisfy Tim Peters though, because this may depend on a rounding bug in the local platform's input function.
msg58726 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2007-12-18 07:28
It's not a question of bugs.  Call the machine writing the string W and the machine reading the string R.  Then there are 4 ways R can get back the double W started with when using the suggested algorithm:

1. W and R are the same machine.  This is the way that's most obvious. The rest apply when W and R are different machines (by which I really mean both use 754 doubles, but use different C libraries for double<->string conversions):

2. W and R both do correct-rounding string->double.  This is obvious too (although it may not be obvious that it's /not/ also necessary that W do correct-rounding double->string).

3. W and R both conform to the 754 standard, and W did produce 17 significant digits (or fewer if trailing zeroes were chopped).  This one is neither obvious nor non-obvious:  that "it works" is something the 754 standard requires.

4. W produced fewer than 17 significant digits, and W and/or R do not do correct rounding.  Then it works, or fails, by luck, and this has nothing to do with whether the double<->string algorithms have bugs.  There are countless ways to code such routines that will fail in some cases -- doing correct rounding in all cases is the only way that won't fail in some cases of producing fewer than 17 significant digits (and exactly which cases depend on all the details of the specific non-correct-rounding algorithms W and/or R implement).

This has nothing to do with what will or won't satisfy me, either.  I'm happy with what Python currently does, which is to rely on #3 above.  That's explainable (what's hard about understanding "%.17g"?), and relies only on what the 754 standard requires.

There is no easy way around this, period.  If you're willing to give up on float(repr(x)) == x cross-754-platform in some cases (and are willing to be unable to spell out /which/ cases), then many easy alternates exist.

Maybe this is a missing fact:  I don't care whether float(repr(x)) == x.  It would, for example, be fine by me if Python used a plain %.8g format for str and repr.  But I'm not indulging wishful thinking about the difficulties in ensuring float(repr(x)) == x either ;-)

BTW, about http://www.cant.ua.ac.be/ieeecc754.html:  passing that test does not guarantee the platform does correct rounding.  It uses a fixed set of test cases that were in fact obtained from running the algorithm I invented for efficiently finding "hard" cases in the early 90s (and implemented by Vern Paxson, and then picked up by the authors of the test suite you found).  It's "very likely" that an implementation that doesn't do correct rounding will fail on at least one of those cases, but not certain.  Exhaustive testing is impossible (too many cases).
msg58727 -(view)Author: Noam Yorav-Raphael (noam)Date: 2007-12-18 08:09
I think that we can give up float(repr(x)) == x across different platforms, since we don't guarantee something more basic: We don't guarantee that the same program doing only floating point operations will produce the same results across different 754 platforms, because in the compilation process we rely on the system's decimal to binary conversion. In other words, using the current repr(), one can pass a value x from platform A to platform B and be sure to get the same value. But if he has a python function f, he can't be sure that f(x) on platform A will result in the same value as f(x) on platform B. So the cross-platform repr() doesn't really matter.

I like eval(repr(x)) == x because it means that repr(x) captures all the information about x, not because it lets me pass x from one platform to another. For communication, I use other methods.
msg58728 -(view)Author: Christian Heimes (christian.heimes)*(Python committer)Date: 2007-12-18 08:09
Tim Peters wrote:
> This has nothing to do with what will or won't satisfy me, either.  I'm
> happy with what Python currently does, which is to rely on #3 above.
> That's explainable (what's hard about understanding "%.17g"?), and
> relies only on what the 754 standard requires.
>
> There is no easy way around this, period.  If you're willing to give up
> on float(repr(x)) == x cross-754-platform in some cases (and are willing
> to be unable to spell out /which/ cases), then many easy alternates exist.
>
> Maybe this is a missing fact:  I don't care whether float(repr(x)) == x.
>  It would, for example, be fine by me if Python used a plain %.8g format
> for str and repr.  But I'm not indulging wishful thinking about the
> difficulties in ensuring float(repr(x)) == x either ;-)

Tim is right. The only way to ensure that it works on all platforms is a full and mathematically correct proof of the algorithm and all functions used by the algorithms. In the mean time *we* are going to release Python 40k while *you* are still working on the proof. *scnr* :)

I'd like to propose a compromise. The patch is about making the representation of a float nicer when users - newbies and ordinary people w/o mathematical background - work on a shell or do some simple stuff. They want a pretty and short representation of 2.2:

>>> "My value is %s" % (11./5.)
'My value is 2.2'

The latter already prints 2.2 without the patch but the former prints 2.2000000000000002 on my machine. Now here comes the interesting part. In 2.6 the shell uses PyObject_Print and the tp_print slot to print a float. We can overwrite the str() and tp_print slot to use the new algorithm while keeping the old algorithm for repr().

Unfortunately the tp_print slot is no longer used in Python 3.0. It could be resurrected for this purpose (object.c:internal_print()).

str(1.) -> new code
repr(1.) -> old and good representation
>>> 1. -> new code

Christian
msg58731 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2007-12-18 09:05
ISTM shorter repr's are inherently misleading and will make it harder to diagnose why 1.1 * 3 != 3.3 or why round(1.0 % 0.1, 1) == 0.1 which is *very* far-off of what you might expect. The 17 digit representation is useful in that it suggests where the problem lies.  In contrast, showing two numbers with reprs of different lengths will strongly suggest that the shorter one is exactly represented.  Currently, that is a useful suggestion: 10.25 shows as 10.25 while 10.21 shows as 10.210000000000001 (indicating that the latter is not exactly represented).  If you start showing 1.1 as 1.1, then you've lost both benefits.

FWIW, I prefer the current 17 digit behavior, its speed, its cross-754 round-trip guarantees, and its psychological cues about exactness.
msg58733 -(view)Author: Noam Yorav-Raphael (noam)Date: 2007-12-18 09:34
2007/12/18, Raymond Hettinger <report@bugs.python.org>:
> The 17 digit representation is useful in that it suggests where the
> problem lies.  In contrast, showing two numbers with reprs of different
> lengths will strongly suggest that the shorter one is exactly
> represented.  Currently, that is a useful suggestion, 10.25 shows as
> 10.25 while 10.21 shows as 10.210000000000001 (indicating that the
> latter is not exactly represented).  If you start showing 1.1 as 1.1,
> then you've lost both benefits.

Currently, repr(1.3) == '1.3', suggesting that it is exactly represented, which isn't true. I think that unless you use an algorithm that will truncate zeros only if the decimal representation is exact, the suggested algorithm is less confusing than the current one, in that it doesn't suggest that 1.3 is exactly stored and 1.1 isn't.
msg58736 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2007-12-18 10:25
Right, there are plenty of exceptions to the suggestion of exactness. Still, I find the current behavior to be more helpful than not (especially when trying to explain the examples I gave in the previous post).

I'm concerned that the tone of the recent discussion shows a strong desire to just get something checked-in despite dubious benefits and despite long standing invariants.

It is a bit cavalier to declare "we can give up float(repr(x)) == x across different platforms."  Our NumPy friends might not be amused. If a PEP on this were circulated, I'm sure we would receive some frank and specific feedback.

Likewise, I find it questionable that shorter reprs are helpful. Is there any evidence supporting that assertion?  My arguments above suggest short reprs are counter-productive because they mask the cause of problems that might otherwise be self-evident.  Of course, it's possible that both could be true: short reprs may help in some cases (by masking oddities) and may hurt in other cases (by masking the underlying cause of an oddity or by making a false psychological suggestion).
msg58748 -(view)Author: Noam Yorav-Raphael (noam)Date: 2007-12-18 12:34
About the educational problem: If someone is puzzled by "1.1*3 != 3.3", you could always use '%50f' % 1.1 instead of repr(1.1). I don't think that trying to teach people that floating points don't always do what they expect them to do is a good reason to print uninteresting and visually distracting digits when you don't have to.

About the compatibility problem: I don't see why it should matter to the NumPy people if the repr() of some floats is made shorter. Anyway, we can ask them, using a PEP or just the mailing list.

About the benefit: If I have data which contains floats, I'm usually interested about their (physical) value, not about their last bits. That's why str(f) does what it does. I like repr(x) to be one-to-one, as I explained in the previous message, but if it can be made more readable, why not make it so?
msg58753 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2007-12-18 19:03
[Tim: when I said "bugs" I just meant non-correct rounding. Sorry.]

On the educational issue: it's still embarrassingly easy to run into situations where *arithmetic* using floats produces "educational" results.  Simplest case I could find quickly: 0.1+0.2 != 0.3.

But I think this is separate from the more directly baffling

>>> 0.1
0.10000000000000001
>>>

I propose the following set of changes:

(1) Get rid of doubledigits.c.  IMO it is evil because it uses static global variables.

(2) Change pickle.py, cPickle.c and marshal.c to use "%.17g" instead of repr.  This means that these serialization protocols are not affected by incorrectly-rounding atof() implementations since the IEEE-754 standard guarantees that 17 digits input will produce the correct value regardless of rounding correctness.

(3) Change repr(float) so as to use "%.16g" if the result roundtrips using atof() and "%.17g" if not.  (*)

This means that platforms where atof() is not correctly rounding may produce incorrect results when the input was produced by repr() -- even when the repr() was run on a platform whose atof() rounds correctly, IIU Tim correctly.  I'm okay with that.  It's still better than changing repr() to always use %.16g or even %.12g, which would really turn the educational issue into an educational nightmare, trying to explain why two numbers both print the same but compare unequal (and God forbid anyone proposes adding fuzz to float ==).

(*) If atof() is much slower than "%.17g", we could compute "%.17g" and "%.16g" first and if they are the same string avoid the atof() call altogether.  We could even avoid the "%.16g" if the "%.17g" step produced at least one (suppressed) trailing zero.  OTOH if atof() is about the same speed as "%.17g" or faster, it makes more sense to try "%.16g" and atof() first, and if this doesn't roundtrip, fall back to "%.17g".
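Point (3) of the proposal can be sketched in a few lines of Python (an illustration only; the real change would live in floatobject.c and also handle inf/nan and the trailing-'.0' flagging):

```python
def repr_candidate(x):
    # Prefer 16 significant digits when they round-trip back to x;
    # otherwise fall back to the always-safe 17 digits.
    s = '%.16g' % x
    if float(s) != x:
        s = '%.17g' % x
    return s
```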
msg58787 -(view)Author: Skip Montanaro (skip.montanaro)*(Python triager)Date: 2007-12-18 23:52
Guido> ... trying to explain why two numbers both print the same but
    Guido> compare unequal ...

This is not a Python-specific issue.  The notion of limited precision was pounded into our heads in the numerical analysis class I took in college, 1980-ish.  I'm pretty sure that was before both Python and IEEE-754.  I'm pretty sure I programmed for that class in Fortran.

    Guido> ... (and God forbid anyone proposes adding fuzz to float ==).

Is it possible to produce a FuzzyFloat class which is a subclass of float that people could use where they needed it?  Its constructor could take a tolerance arg, or it could be initialized from a global default.  I'm not suggesting that such a class replace float in the core, but that it might be useful to have it available in some contexts (available via PyPI, of course).  I'm offline at the moment or I'd poke around before asking.

Skip
msg58790 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2007-12-19 03:09
Guido, right, for that to work reliably, double->str and str->double must both round correctly on the platform doing the repr(), and str->double must round correctly on the platform reading the string.

It's quite easy to understand why at a high level:  a simple (but tedious) pigeonhole argument shows that there are more representable doubles than there are possible 16-digit decimal outputs.  Therefore there must exist distinct doubles (say, x and y) that map to the same 16-digit decimal output (in fact, there are cases where 7 consecutive doubles all map to the same correctly-rounded 16-digit decimal string).  When reading that back in, there's no room for error in deciding which of {x, y} was /intended/ -- the str->double conversion has to be flawless in deciding which of {x, y} is closest to the 16-digit decimal input, and in case x and y are equally close, has to apply 754's round-to-nearest/even rule correctly.  This can require exact multi-thousand bit arithmetic in some cases, which is why the 754 standard doesn't require it (thought to be impractically expensive at the time).

> which would really turn the educational issue into an educational
> nightmare, trying to explain why two numbers both print the same
> but compare unequal

Well, newbies see that in, e.g., spreadsheets every day.  You generally have to work to get a spreadsheet to show you "all the digits"; in some spreadsheets it's not even possible.  "Not all digits are displayed" should be within the grasp of most students without suffering nightmares ;-)
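The pigeonhole count is easy to verify for a single binade (a quick sketch; the interval [1.0, 2.0) is chosen here just for simplicity):

```python
# Doubles in [1.0, 2.0) share one exponent and differ only in their
# 52-bit fraction, so there are 2**52 of them.  Decimals with 16
# significant digits in the same interval run from 1.000000000000000
# to 1.999999999999999, so there are only 10**15.  Some doubles must
# therefore collide onto the same 16-digit string.
doubles_per_binade = 2 ** 52
decimals_16_digits = 10 ** 15
assert doubles_per_binade > decimals_16_digits
```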
msg58966 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2007-12-23 01:58
If someone has a more recent version of MS's compiler, I'd be interested to know what this does:

inc = 2.0**-43
base = 1024.0
xs = ([base + i*inc for i in range(-4, 0)] +
      [base] +
      [base + 2*i*inc for i in (1, 2)])
print xs
print ["%.16g" % x for x in xs]

That creates 7 distinct doubles all of which map to "1024" when correctly rounded to 16 significant digits.  And that's what the Cygwin Python 2.5.1 (which presumably uses David Gay's correct-rounding conversion routines from glibc) displays for the last line:

['1024', '1024', '1024', '1024', '1024', '1024', '1024']

The released Windows Python 2.5.1 displays this instead:

['1024', '1024', '1024', '1024', '1024', '1024', '1024.000000000001']

That's a pretty gross rounding error, since the exact value of the last element is

1024.00000000000045474735088646411895751953125

and so the 16'th digit should indeed not round up to 1.  It's a "pretty gross" error because the rounded-off part isn't particularly close to a half unit in the last (16'th) place.
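For anyone re-running the probe on a current interpreter, the same snippet in Python 3 syntax (on a build with correctly rounded conversions, all seven strings come out as '1024'):

```python
inc = 2.0 ** -43
base = 1024.0
# Seven consecutive doubles straddling 1024.0.
xs = ([base + i * inc for i in range(-4, 0)]
      + [base]
      + [base + 2 * i * inc for i in (1, 2)])
print(xs)
print(["%.16g" % x for x in xs])
```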
msg59053 -(view)Author: Amaury Forgeot d'Arc (amaury.forgeotdarc)*(Python committer)Date: 2007-12-30 22:19
> If someone has a more recent version of MS's compiler,
> I'd be interested to know what this does:

Visual Studio 2008 Express Edition gives the same results:

['1024', '1024', '1024', '1024', '1024', '1024', '1024.000000000001']

(Tested release and debug builds, with trunk version and py3k branch)
msg59054 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2007-12-30 22:26
Thanks, Amaury!  That settles an issue raised earlier:  MS's string<->double routines still don't do correct rounding, and "aren't even close" (the specific incorrect rounding showed here isn't a hard almost-exactly-half-way case).
msg69232 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-07-03 23:39
I'd like to reopen this. I'm still in favor of something like this algorithm:

def float_repr(x):
  s = "%.16g" % x
  if float(s) != x:
    s = "%.17g" % x
  s1 = s
  if s1.startswith('-'):
    s1 = s[1:]
  if s1.isdigit():
    s += '.0'  # Flag it as a float
  # XXX special case inf and nan, see floatobject.c::format_double
  return s

combined with explicitly using %.17g or the new hex notation (see issue3008) in places where cross-platform correctness matters, like pickle and marshal.

This will ensure float(repr(x)) == x on all platforms, but not cross platforms -- and I don't care.

I'm fine with only doing this in 3.0.
msg69235 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2008-07-04 01:02
If you think using 16 (when possible) will stop complaints, think again ;-)  For example,

>>> for x in 0.07, 0.56:
...     putatively_improved_repr = "%.16g" % x
...     assert float(putatively_improved_repr) == x
...     print putatively_improved_repr
0.07000000000000001
0.5600000000000001

(those aren't the only "2-digit" examples, but I expect those specific examples work the same under Linux and Windows).

IOW, 16 doesn't eliminate base-conversion surprises, except at the 8 "single-digit surprises" (i/10.0 for i in range(1, 5) + range(6, 10)).

To eliminate "2-digit surprises" (like 0.07 and 0.56 above), and higher, in this way, you need to add a loop and keep trying smaller conversion widths so long as they convert back to the original input.  Keep it up, and you can make repr(float) so slow it would be faster to do perfect rounding in Python ;-)
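The loop Tim describes (and warns about) would look something like this sketch; it finds the shortest width that survives the round-trip, at the cost of up to 17 string<->float conversions per call:

```python
def shortest_roundtrip_repr(x):
    # Try successively wider %g formats; return the first whose result
    # converts back to exactly the same double.
    for ndigits in range(1, 17):
        s = '%.*g' % (ndigits, x)
        if float(s) == x:
            return s
    return '%.17g' % x  # 17 digits always round-trip for IEEE doubles
```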
msg69242 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-07-04 04:56
Here's a rough-and-tumble implementation of that idea, for Py3k.
msg69243 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-07-04 04:58
That is truly maddening! :-(

I guess Noam's proposal to return str(x) if float(str(x)) == x makes more sense then.  I don't really care as much about 1.234567890123 vs. 1.2345678901229999 as I care about 1.2345 vs. 1.2344999999999999.

(This comment is supposed to *precede* the patch.)
msg69368 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-07-07 03:27
Here's a fixed patch, float2.diff. (The previous one tasted of an earlier attempt.)
msg69421 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-07-08 09:24
[Tim]
> If you think using 16 (when possible) will stop complaints, think again
> ;-)  For example, ...

Aha!  But using *15* digits would be enough to eliminate all 1, 2, 3, 4, ..., 15 digit 'surprises', wouldn't it?!  16 digits doesn't quite work, essentially because 2**-53 is still a little larger than 10**-16, but 15 digits ought to be okay.
msg69422 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-07-08 10:08
Here's the 'proof' that 15 digits should be enough:

Suppose that x is a positive (for simplicity) real number that's exactly representable as a decimal with <= 15 digits.  We'd like to know that '%.15g' % (nearest_float_to_x) recovers x.

There are integers k and m such that 2**(k-1) <= x <= 2**k, and 10**(m-1) < x <= 10**m.  (e.g. k = ceiling(log_2(x)), m = ceiling(log_10(x)))

Let y be the closest floating-point number to x.  Then |y-x| <= 2**(k-54).  Hence |y-x| <= x*2**-53 < x/2 * 10**-15 <= 10**(m-15)/2.  It follows that x is the closest 15-digit decimal to y, hence that applying '%.15g' to y should recover x exactly.

The key step here is in the middle inequality, which uses the fact that 2**-52 < 10**-15.

The above doesn't work for subnormals, but I'm guessing that people don't care too much about these.  The argument above also depends on correct rounding, but there's actually sufficient leeway here (that is, 2**-52 is quite substantially smaller than 10**-15) that the whole thing would probably work even in the absence of correct rounding.
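An empirical spot-check of the claim is easy to run (a sketch; random 15-significant-digit decimals in the normal range, with the seed and ranges chosen arbitrarily):

```python
import random
from decimal import Decimal

# Spot-check: a decimal with 15 significant digits, converted to the
# nearest double, is recovered exactly by '%.15g'.
random.seed(12345)
for _ in range(2000):
    digits = random.randrange(10 ** 14, 10 ** 15)   # 15 significant digits
    s = '%de%d' % (digits, random.randrange(-20, 6) - 14)
    x = float(s)                    # nearest double to the decimal s
    # Decimal comparison is numeric, so formatting differences
    # ('1' vs '100000000000000e-14') don't matter.
    assert Decimal('%.15g' % x) == Decimal(s)
```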
msg69429 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-07-08 13:34
For what it's worth, I'm -0.1 (or should that be -0.10000000000000001?) on this change.  It seems better to leave the problems caused by binary floating-point out in the open than try to partially hide them, and the proposed change just seems a little bit too 'magic'.  The inconsistency in the following current behaviour *does* irk me slightly (but only slightly):

>>> 0.13
0.13
>>> 0.14
0.14000000000000001

But practical issues aside, my preference would be to go to the other extreme: fix this inconsistency by changing the first output to 0.13000000000000000 rather than changing the second output to 0.14.  That is, have repr(float) *always* output 17 digits, possibly except when that float is *exactly* representable in 16 or fewer decimal digits (e.g. 3.5, 0.125, ...).  All the zeros act as a subtle reminder that the stored value is not exactly 0.13.  But I suspect this, too, isn't very practical.
msg69552 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-07-11 15:31
Mildly off-topic:  it seems that currently eval(repr(x)) == x isn't always true, anyway.  On OS X 10.5.4/Intel, I get:

>>> x = (2**52-1)*2.**(-1074)
>>> x
2.2250738585072009e-308
>>> y = eval(repr(x))
>>> y
2.2250738585072014e-308
>>> x == y
False

This is almost certainly an OS X bug...
msg69554 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2008-07-11 16:22
About (2**52-1)*2.**(-1074):  same outcome under Cygwin 2.5.1, which is presumably based on David Gay's "perfect rounding" code.  Cool ;-)

Under the native Windows 2.5.1:

>>> x = (2**52-1)*2.**(-1074)
>>> x
2.2250738585072009e-308
>>> y = eval(repr(x))
>>> y
0.0

Hard to say whether that's actually worse :-(

About %.15g, good point!  It probably would "make Guido happy".  It's hard to know for sure, because despite what people say about this, the real objection to the status quo is aesthetic, not technical, and there's no disputing tastes.

My tastes agree that %.17g is a pain in the ass when using the interactive shell -- I rarely want to see so many digits.  But %.15g, and even %.12g, would /still/ be a pain in the ass, and for the same reason.  Most times I'd be happiest with plain old %g (same as %.6g).  OTOH, sometimes I do want to see all the bits, and sometimes I'd like a fixed format (like %.2f), and so on.

Alas, all choices suck for strong reasons (either technical or aesthetic -- and, BTW, your "lots of trailing zeroes" idea won't fly because of the latter).

The introduction of hex formats for floats should make this starkly clear:  using hex I/O for float repr would be the ideal technical choice:  easy to code, extremely fast, compact representation, and highly portable.  Those are supposedly repr's goals, regardless of whether the repr string is easily readable by humans.  But you know without asking that using hex format for repr(float) has no chance of being adopted -- because it's ugly :-)
msg82708 -(view)Author: Preston Briggs (preston)Date: 2009-02-25 17:09
In all this discussion, it seems that we have not discussed the possibility of adapting David Gay's code, dtoa.c, which nicely handles both halves of the problem.  It's also free and has been well exercised over the years.

It's available here: http://www.netlib.org/fp/

It'd probably have to be touched up a bit.  Guido notes that, for example, malloc is called without checking the return value.  Nevertheless, the core is what we're looking for, I believe.

It's described in the paper: "Correctly rounded binary-decimal and decimal-binary conversions", David M. Gay

It comes recommended by Steele and White (output) and by Clinger (input), in the retrospectives on their seminal papers.
msg82710 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-02-25 17:45
> It'd probably have to be touched up a bit.

This may be an understatement. :-)

In the first 50 lines of the 3897-line dtoa.c file, I see this warning:

/* On a machine with IEEE extended-precision registers, it is
 * necessary to specify double-precision (53-bit) rounding precision
 * before invoking strtod or dtoa.  [...] */

It seems to me that figuring out how and when to do this, and writing and testing the appropriate configure/pyport/whatever code, is already a nontrivial task.
msg82711 -(view)Author: Preston Briggs (preston)Date: 2009-02-25 18:11
>> It'd probably have to be touched up a bit.
> This may be an understatement. :-)

Probably so.  Nevertheless, it's got to be easier than approaching the problem from scratch.  And considering that this discussion has been going on for over a year now, it might be a way to move forward.

> In the first 50 lines of the 3897-line dtoa.c file, I see this warning:
>
> /* On a machine with IEEE extended-precision registers, it is
>  * necessary to specify double-precision (53-bit) rounding precision
>  * before invoking strtod or dtoa.  [...] */
>
> It seems to me that figuring out how and when to do this, and
> writing and testing the appropriate configure/pyport/whatever
> code, is already a nontrivial task.

I would consider compiling the library with flags appropriate to forcing 64-bit IEEE arithmetic if possible.  Another possibility is to explore gdtoa.c, which is supposed to include code for extended precision, quad precision, etc.
msg82718 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-02-25 20:49
> I would consider compiling the library with flags appropriate to
> forcing 64-bit IEEE arithmetic if possible.

Using the right compiler flags is only half the battle, though.  You should really be setting the rounding precision dynamically:  set it to 53-bit precision before the call to strtod, and then restore it to whatever the original precision was afterwards.  And the magic formula for doing this is likely to vary from platform to platform.  As far as I can tell, the C99 standard doesn't suggest any standard spellings for changing the rounding precision.

It's not unknown for libm functions on x86 to blindly assume that the x87 FPU is set to its default of extended-precision (64-bit) rounding, so it's risky to just set 53-bit rounding for the whole of Python's execution.

And contrary to popular belief, setting 53-bit rounding *still* doesn't give the x87 FPU IEEE 754 semantics:  there are issues at the extremes of the floating-point range, and I'm not 100% convinced that those issues wouldn't affect the dtoa.c strtod.

I'm not saying that all this is impossible;  just that it's subtle and would involve some work.
msg82724 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2009-02-25 22:27
Even if someone devoted the time to possibly get this right, it would be somewhat difficult to maintain.
msg82727 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2009-02-25 22:50
What maintenance issues are you anticipating?
msg82728 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2009-02-25 23:09
Gay's code is 3800+ lines and includes many ifdef paths that we need to get right.  Mark points out that the code itself needs additional work.  The discussions so far also get into setting compiler flags on different systems and I would imagine that that would need to be maintained as those compilers change and as hardware gets updated.  The main challenge for a maintainer is to be sure that any change doesn't break something.  AFAICT, this code is very difficult to validate.

The comments in the first 180 lines of code give a good survey of the issues a maintainer would face when given a bug report in the form of: eval(repr(f)) != f for myhardware x compiled with mycompiler c with myflags f and running on operating system z.  These reports would be even more challenging for cross-platform comparisons.
msg82730 -(view)Author: Tim Peters (tim.peters)*(Python committer)Date: 2009-02-25 23:17
The GNU library's float<->string routines are based on David Gay's.  Therefore you can compare those to Gay's originals to see how much effort was required to make them "mostly" portable, and can look at the history of those to get some feel for the maintenance burden.

There's no difficulty in doing this correctly except for denormals.  The pain is almost entirely in doing it correctly /fast/ -- and that is indeed hard (which accounts for the extreme length and complexity of Gay's code).
msg82752 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-02-26 10:14
> The GNU library's float<->string routines are based on David Gay's.
> Therefore you can compare those to Gay's originals

Sounds reasonable.

> (which accounts for the extreme length and complexity of Gay's code).

Looking at the code, I'm actually not seeing *extreme* complexity. There are a lot of ifdefs for things that Python probably doesn't care about (e.g., worrying about whether the inexact and underflow flags get set or not; hexadecimal floats, which Python already has its own code for; weird rounding modes, ...).  There's some pretty standard looking bignum integer stuff (it might even be possible to substitute PyLong arithmetic for this bit).  There's lots of boring material for things like parsing numeric strings.  The really interesting part of the code is probably only a few hundred lines.

I think it looks feasible to adapt Gay's code for Python.  I'm not sure I'd want to adapt it without understanding it fully, but actually that looks quite feasible too.

The x87 control word stuff isn't really a big problem:  it can be dealt with.  The bit I don't know how to do here is using the autoconf machinery to figure out whether the x87 FPU exists and is in use on a particular machine.  (I put in an autoconf test for x87-related double rounding recently, which should be a pretty reliable indicator, but that still fails if someone's already set 53-bit rounding precision...).

A side-note:  for x86/x86_64, we should really be enabling SSE2 by default whenever possible (e.g., on all machines newer than Pentium 3).
msg82774 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2009-02-26 20:35
Mark, "extreme complexity" is relative to what's possible if you don't care about speed; e.g., if you use only bigint operations very straightforwardly, correct rounding amounts to a dozen lines of obviously correct Python code.
msg82776 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-26 20:50
So is it worth trying to come up with a patch for this?  (Where this = making David Gay's code for strtod and dtoa usable from Python.)
msg82777 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2009-02-26 21:01
Is it worth it?  To whom ;-) ?  It was discussed several times before on various Python mailing lists, and nobody was willing to sign up for the considerable effort required (both to update Gay's code and to fight with shifting platform quirks ever after).

If /you/ have the time, interest, and commitment, then yes, it's a Very Good Thing.  The only hangup has been that it's also a labor-intensive thing, in an obscure area requiring true expertise.
msg82801 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-02-27 01:12
On Thu, Feb 26, 2009 at 1:01 PM, Tim Peters <report@bugs.python.org> wrote:
> Is it worth it?  To whom ;-) ?  It was discussed several times before on
> various Python mailing lists, and nobody was willing to sign up for the
> considerable effort required (both to update Gay's code and to fight
> with shifting platform quirks ever after).
>
> If /you/ have the time, interest, and commitment, then yes, it's a Very
> Good Thing.  The only hangup has been that it's also a labor-intensive
> thing, in an obscure area requiring true expertise.

I *think* I understood Preston as saying that he has the expertise and is interested in doing this. He's just looking for guidance on how we'd like the software engineering side of things to be done, and I think he's getting good feedback here. (Preston, please butt in if I misunderstood you.)
msg82807 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2009-02-27 01:42
Huh.  I didn't see Preston volunteer to do anything here ;-)

One bit of software engineering for whoever does sign on: nothing kills porting a language to a new platform faster than needing to get an obscure but core subsystem working.  So whatever is done here, to the extent that it requires fiddling with obscure platform-specific gimmicks for manipulating FPU state, to that extent also is it important to retain a std C fallback (i.e., one requiring no porting effort).
msg82817 - (view) Author: Preston Briggs (preston) Date: 2009-02-27 03:57
This all started with email to Guido that y'all didn't see, wherein I wondered if Python was interested in such a thing. Guido said: Sure, in principle, please see the discussion associated with this change.

I probably don't have all the required expertise today, but I am patient and persistent.  I wouldn't be very fast either, 'cause I have other things on my plate.  Nevertheless, it seems like an interesting project and I'm happy to work on it.

If Mark or someone else wants to jump in, that's fine too.

Preston
msg82832 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-27 13:31
I'd be interested in working with Preston on adapting David Gay's code. (I'm interested in looking at this anyway, but I'd much prefer to do it in collaboration with someone else.)

It would be nice to get something working before the 3.1 betas, but that seems a little optimistic.  2.7 and 3.2 are more likely targets.

Preston, are you going to be at PyCon?
msg82994 - (view) Author: Noam Yorav-Raphael (noam) Date: 2009-03-02 00:00
I'm sorry, but it seems to me that the conclusion of the discussion in 2008 is that the algorithm should simply use the system's binary-to-decimal routine, and if the result is like 123.456, round it to 15 digits after the 0, check if the result evaluates to the original value, and if so, return the rounded result. This would satisfy most people, and has no need for complex rounding algorithms. Am I mistaken?

If I implement this, will anybody be interested?

Noam
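The proposal above amounts to only a few lines. A minimal sketch (an editor's illustration with an invented helper name, not the eventual patch), assuming the platform's '%g' formatting and float() parsing round correctly, which is exactly the assumption disputed elsewhere in this thread:

```python
def simple_repr(x):
    # Try the 15-significant-digit form first; if evaluating it gives
    # back exactly x, it is short enough.  Otherwise fall back to 17
    # digits, which always round-trip an IEEE 754 double.
    s = '%.15g' % x
    return s if float(s) == x else '%.17g' % x
```

For example, simple_repr(1.1) gives '1.1' rather than the traditional 17-digit '1.1000000000000001'.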
msg82995 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-02 00:15
I tried that, and it was more subtle than that in corner cases.

Another argument against it is that on Windows the system input routine doesn't correctly round unless 17 digits of precision are given.  One of Tim Peters's responses should have evidence of that (or perhaps it was on a python-dev thread).
msg83049 - (view) Author: Noam Yorav-Raphael (noam) Date: 2009-03-02 23:53
Do you mean msg58966?

I'm sorry, I still don't understand what the problem is with returning f_15(x) if eval(f_15(x)) == x and otherwise returning f_17(x). You said (msg69232) that you don't care if float(repr(x)) == x isn't cross-platform. Obviously, the simple method will preserve eval(repr(x)) == x, no matter what rounding bugs are present on the platform.
msg83050 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-02 23:56
I changed my mind on the cross-platform requirement.
msg83189 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-05 11:49
Would it be acceptable to use shorter float repr only on big-endian and little-endian IEEE 754 platforms, and use the full 17-digit repr on other platforms?  This would greatly simplify the adaptation and testing of Gay's code.

Notable platforms that would still have long repr under this scheme, in order of decreasing significance:

  - IBM zSeries mainframes when using their hexadecimal FPU (recent models also have IEEE 754 floating-point, and Linux on zSeries uses that in preference to the hex floating-point).
  - ARM, Old ABI (uses a mixed-endian IEEE 754 format; a newer set of ABIs appeared recently that use big or little-endian IEEE 754)
  - Cray SV1 (introduced 1998) and earlier Cray machines.  Gay's code doesn't support the Cray floating-point format anyway.  I believe that newer Crays (e.g., the X1, introduced in 2003) use IEEE floating-point.
  - VAX machines, using VAX D and G floating-point formats.  More recent Alpha machines support IEEE 754 floats as well.
  - many ancient machines with weird and wonderful floating-point schemes

As far as I can determine, apart from the IBM machines described above all new platforms nowadays use an IEEE 754 double format.  Sometimes IEEE arithmetic is only emulated in software; often there are shortcuts taken with respect to NaNs and underflow, but in all cases the underlying format is still IEEE 754, so Gay's code should work.  Gay's careful to avoid problems with machines that flush underflow to zero, for example.

Gay's code only actually supports 3 floating-point formats: IEEE 754 (big and little-endian), IBM hex format and VAX D floating-point format.  There are actually (at least?) two variants of D floating-point: the version used on VAX, which has a 56-bit mantissa and an 8-bit exponent, and the Alpha version, which only has a 53-bit mantissa (but still only an 8-bit exponent); Gay's code only supports the former.

I'm quite convinced that the VAX stuff in Gay's code is not worth much to us, so really the only question is whether it's worth keeping the code to support IBM hexadecimal floating-point.  Given the difficulty of testing this code and the (significant but not overwhelming) extra complexity it adds, I'd like to remove it.
msg83199 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-05 15:10
Sounds good to me.
msg83200 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-05 15:17
+1 on the fallback strategy for platforms we don't know how to handle.
msg84666 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-03-30 21:52
Eric and I have set up a branch of py3k for work on this issue.  URL for (read-only) checkout is:

http://svn.python.org/projects/python/branches/py3k-short-float-repr
msg85690 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-04-07 09:33
My changes on the py3k-short-float-repr branch include:

- Create a new function PyOS_double_to_string. This will replace PyOS_ascii_formatd. All existing internal uses of PyOS_ascii_formatd follow this pattern: printf into a buffer to build up the format string, call PyOS_ascii_formatd, then parse the format string to determine what to do. The changes to the API are:
  - discrete parameters instead of a printf-like format string. No parsing is required.
  - returns PyMem_Malloc'd memory instead of writing into a supplied buffer.
- Modify all callers of PyOS_ascii_formatd to call PyOS_double_to_string instead. These are:
  - Modules/_pickle.c
  - Objects/complexobject.c
  - Objects/floatobject.c
  - Objects/stringlib/formatter.h
  - Objects/unicodeobject.c
  - Python/marshal.c

Remaining to be done:

- Re-enable 'n' (locale-aware) formatting in float.__format__().
- Decide what to do about the change in meaning of "alt" formatting with format code 'g' when an exponent is present, also in float.__format__().
- Have marshal.c deal with potential memory failures returned from PyOS_double_to_string.
- Deprecate PyOS_ascii_formatd.
msg85692 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 10:10
So work on the py3k-short-float-repr branch is nearing completion, and we (Eric and I) would like to get approval for merging these changes into the py3k branch before this month's beta.

A proposal: I propose that the short float representation should be considered an implementation detail for CPython, not a requirement for Python the language.  This leaves Jython and IronPython and friends free to do as they wish.  All that should be required for Python itself is that float(repr(x)) == x for (non-nan, non-infinite) floats x.  Does this seem reasonable to people following this issue?  If it's controversial I'll take it to python-dev.

Eric's summarized his changes above.  Here are mine (mostly---some of this has bits of Eric's work in too):

- change repr(my_float) to produce output with minimal number of digits, on IEEE 754 big- and little-endian systems (which includes all systems we can actually get our hands on right now).

- there's a new file Python/dtoa.c (and a corresponding file Include/dtoa.h) with a cut-down version of Gay's dtoa and strtod in it. There are comments at the top of that file indicating the changes that have been made from Gay's original code.

- one minor semantic change: repr(x) changes to exponential format at 1e16 instead of 1e17.  This avoids a strange situation where Gay's code produces a 16-digit 'shortest' integer string for a particular float and Python's formatting code then pads with an incorrect '0':

>>> x = 2e16+8.  # 2e16+8. is exactly representable as a float
>>> x
20000000000000010.0

There's no way that this padding with bogus digits can happen for numbers less than 1e16.
msg85693 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 10:26
Changing target Python versions.

I'll upload a patchset to Rietveld sometime soon (later today, I hope).
msg85696 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 12:26
I've uploaded the current version to Rietveld:

http://codereview.appspot.com/33084/show
msg85697 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 12:34
The Rietveld patch set doesn't show the three new files, which are:

Python/dtoa.c
Include/dtoa.h
Lib/test/formatfloat_testcases.txt
msg85698 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 12:53
Those three missing files have now been added to Rietveld.

Just for reference, in case anyone else encounters this: the reason those files were missing from the initial upload was that after I svn merge'd from py3k-short-float-repr to py3k, those files were being treated (by svn) as *copies* from py3k-short-float-repr, rather than new files, so svn diff didn't show any output for them.  Doing:

svn revert Python/dtoa.c
svn add Python/dtoa.c

(and similarly for the other two files) fixed this.
msg85705 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-04-07 14:54
On Tue, Apr 7, 2009 at 3:10 AM, Mark Dickinson <report@bugs.python.org> wrote:
> A proposal: I propose that the short float representation should be
> considered an implementation detail for CPython, not a requirement for
> Python the language.  This leaves Jython and IronPython and friends free
> to do as they wish.

In principle that's fine with me.

> All that should be required for Python itself is
> that float(repr(x)) == x for (non-nan, non-infinite) floats x.  Does
> this seem reasonable to people following this issue?  If it's
> controversial I'll take it to python-dev.

Historically, we've had a stronger requirement: if you print repr(x) and ship that string to a different machine, float() of that string returns the same value, assuming both systems use the same internal FP representation (e.g. IEEE). Earlier in this bug (or elsewhere) Tim Peters has pointed out that on some platforms, in particular Windows, input conversion doesn't always round correctly and only works correctly when the input has at least 17 digits (in which case you apparently don't need to round, and truncation works just as well -- that's all the C standard requires, I believe).

Now that pickle and marshal no longer use repr() for floats I think this is less important, but still at least worth considering. I think by having our own input function we solve this if both hosts run CPython, but it might break for other implementations.

In order to make progress I recommend that we just note this and not use it to hold up the implementation, but I think it's worth noticing in docs somewhere.
msg85733 - (view) Author: Jared Grubb (jaredgrubb) Date: 2009-04-07 18:32
I think ANY attempt to rely on eval(repr(x))==x is asking for trouble, and it should probably be removed from the docs.

Example: The following C code can vary *even* on an IEEE 754 platform, even in two places in the same source file (so same compile options), even in the same 5 lines of code recompiled after changing code that does not even touch/read 'x' or 'y':

double x, y;
x = 3.0/7.0;
y = x;
/* ... code that doesn't touch/read x or y ... */
printf(" x==y: %s", (x==y) ? "true" : "false");

So, how can we hope that eval(repr(x))==x is EVER stable? Equality and floating point should raise the hairs on the back of everyone's neck...

(Code above based on http://docs.sun.com/source/806-3568/ncg_goldberg.html in section "Current IEEE 754 Implementations", along with a great explanation of why this is true. The code example is a little different, but the same concept applies.)
msg85739 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 19:43
> Historically, we've had a stronger requirement: if you print repr(x)
> and ship that string to a different machine, float() of that string
> returns the same value, assuming both systems use the same internal FP
> representation (e.g. IEEE).

Hmm.  With the py3k-short-float-repr stuff, we should be okay moving things between different CPython boxes, since float() would use Gay's code and so would be correctly rounded for any length input, not just input with 17 significant digits.

But for a CPython-generated short repr to give the correct value on some other implementation, that implementation would also have to have a correctly rounded string->float.

For safety's sake, I'll make sure that marshal (version 1) and pickle (protocol 0) are still outputting the full 17 digits.
msg85741 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-07 20:04
> I think ANY attempt to rely on eval(repr(x))==x is asking for trouble,
> and it should probably be removed from the docs.

I disagree.  I've read the paper you refer to; nevertheless, it's still perfectly possible to guarantee eval(repr(x)) == x in spite of the various floating-point details one has to deal with.

In the worst case, assuming that C doubles are IEEE 754 binary64 format, for double -> string conversion one can simply cast the double to an array of 8 bytes, extract the sign, exponent and mantissa from those bytes, and then compute the appropriate digit string using nothing but integer arithmetic; similarly for the reverse operation: take a digit string, do some integer arithmetic to figure out what the exponent, mantissa and sign should be, and pack those values back into the double.  Hey presto: correctly rounded string -> double and double -> string conversions!  Add a whole bag of tricks to speed things up in common cases and you end up with something like David Gay's code.

It *is* true that the correctness of Gay's code depends on the FPU being set to use 53-bit precision, something that isn't true by default on x87 FPUs.  But this isn't hard to deal with, either: first, you make sure that you're not using the x87 FPU unless you really have to (all Intel machines that are newer than Pentium 3 have the SSE2 instruction set, which doesn't have the problems that x87 does); second, if you *really* have to use x87 then you figure out how to set its control word to get 53-bit precision and round-half-to-even rounding; third, if you can't figure out how to set the control word then you use a fallback something like that described above.
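The "cast the double to 8 bytes, then work with integers" step described above can be sketched in Python with the struct module. This is an editor's illustration of the decomposition step only (the name decompose is invented here), not the patch's C code:

```python
import struct

def decompose(x):
    # View an IEEE 754 binary64 as its raw 64 bits, then peel off the
    # fields with pure integer arithmetic: 1 sign bit, 11 exponent
    # bits (biased by 1023), 52 mantissa bits.
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa
```

For a normal number, the value is (-1)**sign * (1 + mantissa/2**52) * 2**(exponent-1023); e.g. decompose(1.0) gives (0, 1023, 0). Everything after this point can be done with exact integer arithmetic, which is why correctly rounded conversions are possible at all.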
msg85747 - (view) Author: Jared Grubb (jaredgrubb) Date: 2009-04-07 21:02
The process that you describe in msg85741 is a way of ensuring "memcmp(&x, &y, sizeof(x))==0", and it's portable and safe and is the Right Thing that we all want and expect. But that's not "x==y", as that Sun paper explains. It's close, but not technically accurate, as the implication arrow only goes one way (just as "x=1/y" implies "xy=1" in algebra, but not the other way around).

I'd be interested to see if you could say that the Python object model/bytecode interpretation enforces a certain quantum of operations that actually does imply "eval(repr(x))==x"; but I suspect it's unprovable, and it's fragile as Python grows to have more support in CLI/LLVM/JVM backends.

My pedantic mind would strip any and all references to floating-point equality out of the docs, as it's dangerous and insidiously misleading, even in "obvious" cases. But, I'll stop now :) (FYI: I've enjoyed the ~100 messages here.. Great stuff!)
msg86048 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-16 21:52
The py3k-short-float-repr branch has been merged to py3k in two parts:

r71663 is mostly concerned with the inclusion of David Gay's code into the core, and the necessary floating-point fixups to allow Gay's code to be used (SSE2 detection, x87 control word manipulation, etc.)

r71665 contains Eric's *mammoth* rewrite and upgrade of all the float formatting code to use the new _Py_dg_dtoa and _Py_dg_strtod functions.

Note: the new code doesn't give short float repr on *all* platforms, though it's close.  The sticking point is that Gay's code needs 53-bit rounding precision, and the x87 FPU uses 64-bit rounding precision by default (though some operating systems---e.g., FreeBSD, Windows---change that default).

For the record, here's the strategy that I used: feedback (esp. from experts) would be appreciated.

- If the float format is not IEEE 754, don't use Gay's code.

- Otherwise, if we're not on x86, we're probably fine.  (Historically, there are other FPUs that have similar problems---e.g., the Motorola 68881/2 FPUs---but I think they're all too old to be worth worrying about by now.)

- x86-64 in 64-bit mode is also fine: there, SSE2 is the default. x86-64 in 32-bit mode (e.g., 32-bit Linux on Core 2 Duo) has the same problems as plain x86.  (OS X is fine: its gcc has SSE2 enabled by default even for 32-bit mode.)

- Windows/x86 appears to set rounding precision to 53-bits by default, so we're okay there too.

So:

- On gcc/x86, detect the availability of SSE2 (by examining the result of the cpuid instruction) and add the appropriate flags (-msse2 -mfpmath=sse2) to BASECFLAGS if SSE2 is available.

- On gcc/x86, if SSE2 is *not* available, so that we're using the x87 FPU, use inline assembler to set the rounding precision (and rounding mode) before calling Gay's code, and restore the FPU state directly afterwards.  Use of inline assembler is pretty horrible, but it seems to be *more* portable than any of the alternatives.  The official C99 way is to use fegetenv/fesetenv to get and set the floating-point environment, but the fenv_t type used to store the environment can (and will) vary from platform to platform.

- There's an autoconf test for double-rounding.  If there's no evidence of double rounding then it's likely to be safe to use Gay's code: double rounding is an almost unavoidable symptom of 64-bit precision on x87.

So on non-Windows x86 platforms that *aren't* using gcc and *do* exhibit double rounding (implying that they're not using SSE2, and that the OS doesn't change the FPU rounding precision to 53 bits), we're out of luck.  In this case the old long float repr is used.

The most prominent platform that I can think of that's affected by this would be something like Solaris/x86 with Sun's own compiler, or more generally any Unix/x86 combination where the compiler isn't gcc.  Those platforms need to be dealt with on a case-by-case basis, by figuring out for each such platform how to detect and use SSE2, and how to get and set the x87 control word if SSE2 instructions aren't available.

Note that if any of the above heuristics is wrong and we end up using Gay's code inappropriately, then there will be *loud* failure: we'll know about it.
msg86069 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-17 11:19
Closing this.  There are a few known problems remaining, but they've all got their own issue numbers: see issue 5780, issue 4482.
msg86127 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-18 17:55
Hello folks,

IIUC, autoconf tries to enable SSE2 by default without asking. Isn't it a problem for people distributing Python binaries (e.g. Linux vendors) and expecting these binaries to work on legacy systems even though the system on which the binaries were compiled supports SSE2?

Or am I misunderstanding what the changes are?
msg86129 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-18 18:08
Yes, I think you're right.

Perhaps the SSE2 support should be turned into an --enable-sse2 configure option, that's disabled by default?  One problem with this is that I don't know how to enable SSE2 instructions for compilers other than gcc, so the option would only apply to gcc-based systems.

Disabling SSE2 would just mean that all x86/gcc systems would end up using the inline assembler to get and set the control word; so we'd still get short float repr everywhere we did before.
msg86131 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-18 18:34
Perhaps better to drop the SSE2 bits completely.  Anybody who actually wants SSE2 instructions in their binary can do a

CC="gcc -msse2 -mfpmath=sse" configure && ...

Unless there are objections, I'll drop everything involving SSE2 from the configure script.

It's a bit of a shame, though: it's definitely desirable to be using the SSE2 instructions for floating-point whenever possible.  Fewer surprises due to double rounding or random 80-bit register spilling, better performance on underflow and NaNs, ...
msg86141 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-18 20:56
SSE2 detection and flags removed in r71723.  We'll see how the buildbots fare...
msg86144 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-18 23:35
Is there a way to use SSE when available and x86 when it's not?  IIRC, floating point used to work that way (using a C lib as a fallback on systems w/o coprocessor support).
msg86146 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-18 23:43
> Is there a way to use SSE when available and x86 when it's not.

Probably, but I don't think there is any point doing so. The main benefit of SSE2 is to get higher performance on floating point intensive code, which no pure Python code could be regarded as (the CPU cycles elapsed between two FP instructions would dwarf the cost of executing the FP instructions themselves).

The situation is different for specialized packages like numpy, but their decisions need not be the same as ours AFAIK.
msg86149 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-18 23:52
The advantage is accuracy.  No double rounding.  This will also help the math.fsum() function that is also susceptible to double rounding.
msg86158 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-04-19 08:44
[Raymond]
> Is there a way to use SSE when available and x86 when it's not.

I guess it's possible in theory, but I don't know of any way to do this in practice.  I suppose one could trap the SIGILL generated by the attempted use of an SSE2 instruction on a non-supported platform---is this how things used to work for 386s without the 387?  That would make a sequence of floating-point instructions on non-SSE2 x86 horribly slow, though.

Antoine: as Raymond said, the advantage of SSE2 for numeric work is accuracy, predictability, and consistency across platforms.  The SSE2 instructions finally put an end to all the problems arising from the mismatch between the precision of the x87 floating-point registers (64-bits) and the precision of a C double (53-bits).  Those difficulties include (1) unpredictable rounding of intermediate values from 64-bit precision to 53-bit precision, due to spilling of temporaries from FPU registers to memory, and (2) double-rounding.  The arithmetic of Python itself is largely immune to the former, but not the latter.  (And of course the register spilling still causes headaches for various bits of CPython).

Those difficulties can be *mostly* dealt with by setting the x87 rounding precision to double (instead of extended), though this doesn't fix the exponent range, so one still ends up with double-rounding on underflow.  The catch is that one can't mess with the x87 state globally, as various library functions (especially libm functions) might depend on it being in whatever the OS considers to be the default state.

There's a very nice paper by David Monniaux that covers all this: definitely recommended reading after you've finished Goldberg's "What Every Computer Scientist...".  It can currently be found at:

http://hal.archives-ouvertes.fr/hal-00128124/en/

An example: in Python (any version), try this:

>>> 1e16 + 2.9999
10000000000000002.0

On OS X, Windows and FreeBSD you'll get the answer above. (OS X gcc uses SSE2 by default; Windows and FreeBSD both make the default x87 rounding-precision 53 bits).

On 32-bit Linux/x86 or Solaris/x86 you'll likely get the answer

10000000000000004.0

instead, because Linux doesn't (usually?) change the Intel default rounding precision of 64-bits.  Using SSE2 instead of the x87 would have fixed this.

</standard x87 rant>
msg117723 - (view) Author: Ole Laursen (olau) Date: 2010-09-30 11:15
Just came across this bug. I don't want to reopen this or anything, but regarding the SSE2 code I couldn't help thinking: why can't you just detect the presence of SSE2 when the interpreter starts up and then switch implementations based on that? I think that's what liboil does (http://liboil.freedesktop.org/wiki/).
History
Date | User | Action | Args
2022-04-11 14:56:28 | admin | set | github: 45921
2014-03-07 21:30:36 | eric.snow | set | nosy: +eric.snow
2010-09-30 11:15:48 | olau | set | nosy: +olau; messages: +msg117723
2009-04-19 08:44:24 | mark.dickinson | set | messages: +msg86158
2009-04-18 23:52:33 | rhettinger | set | messages: +msg86149
2009-04-18 23:43:06 | pitrou | set | messages: +msg86146
2009-04-18 23:35:34 | rhettinger | set | messages: +msg86144
2009-04-18 20:56:29 | mark.dickinson | set | messages: +msg86141
2009-04-18 18:34:55 | mark.dickinson | set | messages: +msg86131
2009-04-18 18:08:30 | mark.dickinson | set | messages: +msg86129
2009-04-18 17:55:49 | pitrou | set | nosy: +pitrou; messages: +msg86127
2009-04-17 11:19:21 | mark.dickinson | set | status: open -> closed; resolution: accepted; messages: +msg86069; stage: resolved
2009-04-16 21:52:38 | mark.dickinson | set | messages: +msg86048
2009-04-07 21:02:02 | jaredgrubb | set | messages: +msg85747
2009-04-07 20:04:23 | mark.dickinson | set | messages: +msg85741
2009-04-07 19:44:00 | mark.dickinson | set | messages: +msg85739
2009-04-07 18:32:26 | jaredgrubb | set | nosy: +jaredgrubb; messages: +msg85733
2009-04-07 14:54:02 | gvanrossum | set | messages: +msg85705
2009-04-07 12:53:52 | mark.dickinson | set | messages: +msg85698
2009-04-07 12:34:25 | mark.dickinson | set | messages: +msg85697
2009-04-07 12:26:23 | mark.dickinson | set | messages: +msg85696
2009-04-07 10:26:52 | mark.dickinson | set | messages: +msg85693; versions: + Python 3.1, - Python 2.6, Python 3.0
2009-04-07 10:10:30 | mark.dickinson | set | messages: +msg85692
2009-04-07 09:33:09 | eric.smith | set | messages: +msg85690
2009-03-30 21:52:27 | mark.dickinson | set | messages: +msg84666
2009-03-30 13:33:31 | mark.dickinson | set | nosy: +eric.smith
2009-03-05 15:17:55 | rhettinger | set | messages: +msg83200
2009-03-05 15:10:25 | gvanrossum | set | messages: +msg83199
2009-03-05 11:49:31 | mark.dickinson | set | messages: +msg83189
2009-03-02 23:56:46 | gvanrossum | set | messages: +msg83050
2009-03-02 23:53:36 | noam | set | messages: +msg83049
2009-03-02 00:15:51 | gvanrossum | set | messages: +msg82995
2009-03-02 00:00:12 | noam | set | messages: +msg82994
2009-02-27 17:50:51 | skip.montanaro | set | nosy: -skip.montanaro
2009-02-27 13:31:39 | mark.dickinson | set | messages: +msg82832
2009-02-27 03:57:32 | preston | set | files: +unnamed; messages: +msg82817
2009-02-27 01:42:12 | tim.peters | set | messages: +msg82807
2009-02-27 01:12:03 | gvanrossum | set | messages: +msg82801
2009-02-26 21:01:33 | tim.peters | set | messages: +msg82777
2009-02-26 20:50:06 | mark.dickinson | set | messages: +msg82776
2009-02-26 20:35:24 | tim.peters | set | messages: +msg82774
2009-02-26 10:14:08 | mark.dickinson | set | messages: +msg82752
2009-02-25 23:17:22 | tim.peters | set | messages: +msg82730
2009-02-25 23:09:35 | rhettinger | set | messages: +msg82728
2009-02-25 22:50:22 | gvanrossum | set | messages: +msg82727
2009-02-25 22:27:50 | rhettinger | set | messages: +msg82724
2009-02-25 20:49:14 | mark.dickinson | set | messages: +msg82718
2009-02-25 18:11:20 | preston | set | files: +unnamed; messages: +msg82711
2009-02-25 17:45:33 | mark.dickinson | set | messages: +msg82710
2009-02-25 17:09:53 | preston | set | nosy: +preston; messages: +msg82708
2008-07-11 16:22:40 | tim.peters | set | messages: +msg69554
2008-07-11 15:31:53 | mark.dickinson | set | messages: +msg69552
2008-07-08 13:34:36 | mark.dickinson | set | messages: +msg69429
2008-07-08 10:08:04 | mark.dickinson | set | messages: +msg69422
2008-07-08 09:24:56 | mark.dickinson | set | messages: +msg69421
2008-07-07 03:27:09 | gvanrossum | set | files: +float2.diff; messages: +msg69368
2008-07-04 04:58:40 | gvanrossum | set | messages: +msg69243
2008-07-04 04:57:01 | gvanrossum | set | files: +float.diff; messages: +msg69242
2008-07-04 01:02:27 | tim.peters | set | messages: +msg69235
2008-07-03 23:39:33 | gvanrossum | set | messages: +msg69232
2008-06-29 10:50:18 | mark.dickinson | set | nosy: +mark.dickinson
2008-06-10 04:27:08 | alexandre.vassalotti | set | nosy: +alexandre.vassalotti
2008-01-06 22:29:44 | admin | set | keywords: -py3k; versions: Python 2.6, Python 3.0
2007-12-30 22:26:50 | tim.peters | set | messages: +msg59054
2007-12-30 22:19:08 | amaury.forgeotdarc | set | nosy: +amaury.forgeotdarc; messages: +msg59053
2007-12-23 01:58:14 | tim.peters | set | messages: +msg58966
2007-12-19 03:09:09 | tim.peters | set | messages: +msg58790
2007-12-18 23:52:12 | skip.montanaro | set | nosy: +skip.montanaro; messages: +msg58787
2007-12-18 19:03:19 | gvanrossum | set | messages: +msg58753
2007-12-18 12:34:50 | noam | set | messages: +msg58748
2007-12-18 10:25:43 | rhettinger | set | messages: +msg58736
2007-12-18 09:34:42 | noam | set | messages: +msg58733
2007-12-18 09:05:59 | rhettinger | set | messages: +msg58731
2007-12-18 08:09:34 | christian.heimes | set | messages: +msg58728
2007-12-18 08:09:24 | noam | set | messages: +msg58727
2007-12-18 07:28:54 | tim.peters | set | messages: +msg58726
2007-12-17 22:56:49 | gvanrossum | set | messages: +msg58710
2007-12-17 21:53:20 | noam | set | messages: +msg58707
2007-12-13 09:12:27 | noam | set | messages: +msg58524
2007-12-12 23:43:27 | gvanrossum | set | messages: +msg58510
2007-12-12 23:34:18 | noam | set | messages: +msg58509
2007-12-12 00:31:07 | gvanrossum | set | messages: +msg58480
2007-12-12 00:06:46 | gvanrossum | set | messages: +msg58478
2007-12-11 23:46:57 | noam | set | messages: +msg58477
2007-12-11 23:14:40 | gvanrossum | set | messages: +msg58474
2007-12-11 23:07:40 | noam | set | messages: +msg58473
2007-12-11 20:43:26 | gvanrossum | set | messages: +msg58467
2007-12-11 20:26:45 | tim.peters | set | messages: +msg58466
2007-12-11 19:03:41 | gvanrossum | set | messages: +msg58457
2007-12-11 18:57:42 | gvanrossum | set | priority: high -> normal; messages: +msg58456
2007-12-11 18:46:53 | christian.heimes | set | messages: +msg58454
2007-12-11 18:24:31 | tim.peters | set | messages: +msg58452
2007-12-11 17:57:48 | tim.peters | set | messages: +msg58449
2007-12-11 17:51:04 | rhettinger | set | nosy: +rhettinger; messages: +msg58445
2007-12-11 17:34:06 | gvanrossum | set | messages: +msg58443
2007-12-11 15:47:22 | christian.heimes | set | messages: +msg58436
2007-12-11 15:21:20 | gvanrossum | set | messages: +msg58433
2007-12-11 15:09:19 | christian.heimes | set | messages: +msg58432
2007-12-11 13:58:58 | noam | set | messages: +msg58430
2007-12-11 13:46:41 | christian.heimes | set | messages: +msg58429
2007-12-11 13:21:51 | noam | set | messages: +msg58427
2007-12-11 12:30:37 | noam | set | messages: +msg58424
2007-12-11 09:11:02 | christian.heimes | set | messages: +msg58422
2007-12-11 08:37:51 | noam | set | messages: +msg58419
2007-12-11 07:59:20 | noam | set | messages: +msg58416
2007-12-11 03:15:06 | tim.peters | set | messages: +msg58411
2007-12-11 02:51:08 | christian.heimes | set | files: +ieee_dbl.c; nosy: +nascheme; messages: +msg58407
2007-12-11 01:58:14 | tim.peters | set | messages: +msg58405
2007-12-11 01:33:16 | christian.heimes | set | versions: + Python 2.6
2007-12-11 01:31:13 | noam | set | messages: +msg58403; versions: - Python 2.6
2007-12-11 01:24:18 | gvanrossum | set | messages: +msg58401
2007-12-11 01:09:53 | christian.heimes | set | versions: + Python 2.6; messages: +msg58399; priority: high; assignee: christian.heimes -> ; keywords: +py3k; resolution: fixed -> (no value)
2007-12-11 00:37:02 | gvanrossum | set | status: closed -> open
messages: +msg58394
2007-12-10 22:20:21christian.heimessetstatus: open -> closed
resolution: fixed
messages: +msg58374
2007-12-10 19:49:55gvanrossumsetkeywords: +patch
assignee:christian.heimes
messages: +msg58364
nosy: +gvanrossum,christian.heimes,tim.peters
2007-12-10 19:13:27noamcreate
Supported by The Python Software Foundation,
Powered by Roundup
Copyright © 1990-2022, Python Software Foundation
Legal Statements
