Math hypot exactfloat fastpath #8949


Merged

@rhettinger
Contributor

Provide a fast path for the common case of exact float inputs. Saves the overhead of an external function call and of the x == 1.0 error check. Allows the inner loops to mostly use registers.
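A rough Python-level model of the conversion step may help show which inputs take the fast path. This is only a sketch, not the C code from the patch: `as_double` and `Celsius` are hypothetical names, `type(item) is float` stands in for `PyFloat_CheckExact(item)`, and `float(item)` stands in for the general conversion that the fast path avoids.

```python
def as_double(item):
    """Hypothetical model of the per-item conversion, not the actual C code."""
    if type(item) is float:
        # Fast path: exact float, so the value can be read directly.
        # No external call is made and nothing can fail here.
        return item
    # General path: ints, float subclasses, and objects defining __float__
    # go through the full conversion, which may run arbitrary code or raise.
    return float(item)


class Celsius(float):
    """Hypothetical float subclass; it does NOT qualify for the fast path."""


print(as_double(2.5))           # exact float: fast path
print(as_double(3))             # int: general path
print(as_double(Celsius(1.5)))  # subclass: general path
```

Only exact `float` objects skip the general conversion; everything else behaves as before, so the fast path changes performance and not results.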

Speeds up the overall function by approximately 25%:

    $ ------ baseline -------
    $ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
    1000000 loops, best of 7: 297 nsec per loop
    $ ------ patched -------
    $ py -m timeit -r7 -s 'from math import dist; p=tuple(map(float, range(20))); q=tuple(reversed(p))' 'dist(p, q)'
    1000000 loops, best of 7: 215 nsec per loop
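The same measurement can be approximated from a script with the standard `timeit` module. The sketch below reuses the setup and statement from the command line above; the helper name `best_nsec_per_loop` and the default loop count are assumptions made for illustration, and absolute numbers will differ by machine and build.

```python
import timeit

# Same setup and statement as the command-line benchmark.
SETUP = (
    "from math import dist; "
    "p = tuple(map(float, range(20))); "
    "q = tuple(reversed(p))"
)
STMT = "dist(p, q)"


def best_nsec_per_loop(repeat=7, number=100_000):
    """Return the best per-call time in nanoseconds over `repeat` runs."""
    timer = timeit.Timer(STMT, setup=SETUP)
    # Each entry is the total time in seconds for `number` calls.
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"best of 7: {best_nsec_per_loop():.0f} nsec per loop")
```

Running the script once on a baseline build and once on a patched build gives a comparison analogous to the two figures quoted above.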

Disassembly of math_hypot() using GCC 8.2 shows a very tight inner loop without unnecessary register spills and reloads and without external calls that have to save and restore registers:

    L378:
        xorl    %eax, %eax
        andpd   lC5(%rip), %xmm0         # x = fabs(x);
        ucomisd %xmm0, %xmm0
        movl    $1, %ecx
        setne   %al
        cmovp   %ecx, %eax
        orl     %eax, %ebx               # found_nan |= Py_IS_NAN(x);
    L377:
        movsd   %xmm0, (%r12,%r15,8)     # coordinates[i] = x;
        maxsd   %xmm1, %xmm0             # if (x > max) { max = x; }
        addq    $1, %r15                 # i++
        cmpq    %r15, %rbp               # i < n
        movapd  %xmm0, %xmm1
        jle     L418
    L385:
        movq    24(%r13,%r15,8), %rdi    # item = PyTuple_GET_ITEM(args, i);
        cmpq    %r14, 8(%rdi)            # if (PyFloat_CheckExact(item))
        jne     L375
        movsd   16(%rdi), %xmm0          # x = PyFloat_AS_DOUBLE(item)
        jmp     L378
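Read together, the comments in the listing describe one pass over the arguments that stores absolute values, tracks the maximum, and records whether a NaN was seen. A minimal Python model of that pass, assuming the structure implied by the comments, might look like the following. `scan_coordinates` is a hypothetical name, and the code is an illustration of the loop's logic, not a transcription of the C source.

```python
import math


def scan_coordinates(args):
    """Hypothetical model of the scanning loop annotated in the disassembly."""
    n = len(args)
    coordinates = [0.0] * n
    max_val = 0.0
    found_nan = False
    for i in range(n):
        item = args[i]                  # item = PyTuple_GET_ITEM(args, i)
        if type(item) is float:         # if (PyFloat_CheckExact(item))
            x = item                    #     x = PyFloat_AS_DOUBLE(item)
        else:
            x = float(item)             # general conversion (the jne L375 branch)
        x = math.fabs(x)                # x = fabs(x)
        found_nan |= math.isnan(x)      # found_nan |= Py_IS_NAN(x)
        coordinates[i] = x              # coordinates[i] = x
        if x > max_val:                 # if (x > max) { max = x; }
            max_val = x
    return coordinates, max_val, found_nan


print(scan_coordinates((3.0, -4.0)))
```

In the compiled version each of these steps maps to one or two instructions on values held in registers, which is the point the listing is making.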

Saves function call overhead and lets the inner loop be performed in registers with no spills/reloads.
@rhettinger added the performance (Performance or resource usage), skip issue, and skip news labels Aug 27, 2018
@rhettinger merged commit 74734f7 into python:master Aug 27, 2018
@rhettinger deleted the math-hypot-exactfloat-fastpath branch August 27, 2018 00:38
CuriousLearner added a commit to CuriousLearner/cpython that referenced this pull request Aug 27, 2018
* master: (104 commits)
  Fast path for exact floats in math.hypot() and math.dist() (pythonGH-8949)
  Remove AIX workaround test_subprocess (pythonGH-8939)
  bpo-34503: Fix refleak in PyErr_SetObject() (pythonGH-8934)
  closes bpo-34504: Remove the useless NULL check in PySequence_Check(). (pythonGH-8935)
  closes bpo-34501: PyType_FromSpecWithBases: Check spec->name before dereferencing it. (pythonGH-8930)
  closes bpo-34502: Remove a note about utf8_mode from sys.exit() docs. (pythonGH-8928)
  Remove unneeded PyErr_Clear() in _winapi_SetNamedPipeHandleState_impl() (pythonGH-8281)
  Fix markup in stdtypes documentation (pythonGH-8905)
  bpo-34395: Don't free allocated memory on realloc fail in load_mark() in _pickle.c. (pythonGH-8788)
  Fix upsizing of marks stack in pickle module. (pythonGH-8860)
  bpo-34171: Prevent creating Lib/trace.cover when run the trace module. (pythonGH-8841)
  closes bpo-34493: Objects/genobject.c: Add missing NULL check to compute_cr_origin() (pythonGH-8911)
  Fixed typo with asynccontextmanager code example (pythonGH-8845)
  bpo-34426: fix typo (__lltrace__ -> __ltrace__) (pythonGH-8822)
  bpo-13312: Avoid int underflow in time year. (pythonGH-8912)
  bpo-34492: Python/coreconfig.c: Fix _Py_wstrlist_copy() (pythonGH-8910)
  bpo-34448: Improve output of usable wchar_t check (pythonGH-8846)
  closes bpo-34471: _datetime: Add missing NULL check to tzinfo_from_isoformat_results. (pythonGH-8869)
  bpo-6700: Fix inspect.getsourcelines for module level frames/tracebacks (pythonGH-8864)
  Fix typo in the dataclasses's doc (pythonGH-8896)
  ...
3 participants

@rhettinger, @the-knights-who-say-ni, @bedevere-bot
