Description
Currently `sum()` makes no effort to improve accuracy over a simple running total. We do have `math.fsum()` that makes extreme efforts to be almost perfect; however, that function isn't well known, it runs 10x slower than regular `sum()`, and it always coerces to a float.
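To make the accuracy gap concrete, here is a small illustrative example (assuming a typical IEEE-754 double build):

```python
from math import fsum

d = [0.1] * 10
print(sum(d))    # 0.9999999999999999 -- the running total accumulates rounding error
print(fsum(d))   # 1.0 -- fsum() is correctly rounded
```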
I suggest switching the builtin `sum()` handling of float inputs to Arnold Neumaier's improved variation of compensated summation. Per his paper, this algorithm has excellent error bounds (though not as perfect as `fsum()`):
```
|s - š| ≤ ε|s| + ε²(3/4·n² + n)·Σ|aᵢ|             (IV,12)
|s - š| ≤ ε|s| + ε²(1/4·n³ + 5/2·n² + n)·Max|aᵢ|  (IV,13)
```
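For reference, a minimal pure-Python sketch of Neumaier's variant (the proposed change itself is in C inside the builtin; this is only to show the technique):

```python
def neumaier_sum(data):
    """Compensated summation (Neumaier's improved Kahan-Babuška variant)."""
    total = 0.0
    comp = 0.0                         # running compensation for lost low-order bits
    for x in data:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x    # low-order bits of x were lost
        else:
            comp += (x - t) + total   # low-order bits of total were lost
        total = t
    return total + comp
```

The `abs()` comparison is the branch discussed below; GH-100860 experimented with replacing this branchy Fast2Sum formulation with the branchless 2Sum.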
The compensation tracking runs in parallel to the main accumulation. And except for pathological cases, the branch is predictable, making the test essentially free. Thanks to the magic of instruction level parallelism and branch prediction, this improvement has zero cost on my Apple M1 build. Timed with:
```
./python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
```
N.B. NumPy switched from a simple running total to partial pairwise summation. That isn't as accurate as what is being proposed here, but it made more sense for them because the extra work of Neumaier summation isn't masked by the overhead of fetching values from an iterator, as it is here. Also, with an iterator we can't do pairwise summation without using auxiliary memory.
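For comparison, a sketch of the pairwise idea (a hypothetical pure-Python helper; NumPy's actual implementation is blocked, unrolled C):

```python
def pairwise_sum(a, lo=0, hi=None, block=128):
    """Recursive pairwise summation over an indexable sequence."""
    if hi is None:
        hi = len(a)
    if hi - lo <= block:               # short runs: plain running total
        total = 0.0
        for i in range(lo, hi):
            total += a[i]
        return total
    mid = (lo + hi) // 2               # split in half, sum each half, add
    return pairwise_sum(a, lo, mid, block) + pairwise_sum(a, mid, hi, block)
```

Note the indexing into `a`: the halving step needs random access to the data, which is exactly why an iterator-driven `sum()` can't use pairwise summation without first buffering into auxiliary memory.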
Linked PRs
- GH-100425: Improve accuracy of builtin sum() for float inputs #100426
- GH-100425: Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum #100860
- gh-100425: Update tutorial docs related to sum() accuracy #101854
- GH-100425: Note improved commutativity in sum(). #107785
- [3.12] GH-100425: Note improved commutativity in sum(). (GH-107785) #107787