gh-146393: Optimize float division operations by mutating uniquely-referenced operands in place (JIT only) #146397

Draft
eendebakpt wants to merge 2 commits into python:main from eendebakpt:jit_float_truediv

eendebakpt commented Mar 24, 2026

We optimize float division for the case where one of the operands is a unique reference. This is similar to #146307, but with a guard for division by zero.

  • We do not add opcodes in tier 1.
  • For tier 2 we can specialize both for the case where one of the operands is a unique reference and for the case where there are no unique references. The latter case (no unique references, or missing uniqueness information) brings no performance improvement in itself, but might be useful for propagating types better. For the cases tested so far it was not needed, so this PR adds no _BINARY_TRUEDIV_FLOAT opcode for tier 2.
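
As a rough illustration of the idea, the following pure-Python model shows what an in-place true division with a zero guard might look like. `FloatBox`, its explicit `refcount` field, and `truediv_inplace` are hypothetical stand-ins for a float object and its reference count; the real work happens in C uops that this description does not show, so treat this as a sketch under those assumptions.

```python
class FloatBox:
    """Hypothetical stand-in for a heap-allocated float object."""

    def __init__(self, value, refcount=1):
        self.value = value
        self.refcount = refcount


def truediv_inplace(lhs, rhs):
    """Divide lhs by rhs, reusing lhs's storage when it is uniquely referenced."""
    # Guard: division by zero must raise before anything is mutated.
    if rhs.value == 0.0:
        raise ZeroDivisionError("float division by zero")
    if lhs.refcount == 1:
        # Unique reference: nobody else can observe the mutation.
        lhs.value = lhs.value / rhs.value
        return lhs
    # Shared operand: fall back to allocating a fresh result.
    return FloatBox(lhs.value / rhs.value)


tmp = FloatBox(2.0 + 3.0)       # models the temporary produced by (a + b)
c = FloatBox(4.0, refcount=2)   # models a value that is referenced elsewhere too
result = truediv_inplace(tmp, c)
print(result is tmp, result.value)  # True 1.25
```

Checking the divisor before touching `lhs` is the point of the guard: if the division fails, the operand is left unchanged.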

Micro-benchmarks (min of 3 runs, 2M iterations)

| Pattern | main (ns/iter) | branch (ns/iter) | Speedup | Notes |
|---|---|---|---|---|
| `(a+b) * c` | 10.8 | 10.8 | -- | baseline (multiply, already optimized) |
| `a / b` | 20.6 | 13.3 | 1.5x | unique result enables inplace `+=` |
| `(a+b) / c` | 26.4 | 10.8 | 2.4x | inplace LHS, guard inserted for `c` |
| `(2.0+x) / y` | 25.1 | 11.1 | 2.3x | inplace LHS, guard inserted for `y` |
| `c / (a+b)` | 26.0 | 11.4 | 2.3x | inplace RHS, guard inserted for `c` |
| `(a/b) / (c/d)` | 41.3 | 22.3 | 1.9x | unique results enable inplace ops |
| `(a/b) + (c/d)` | 29.1 | 21.9 | 1.3x | unique results enable inplace `+` |
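
The speedup column is simply the main timing divided by the branch timing, rounded to one decimal. A short check of that arithmetic, using only the numbers reported in the table (the `timings` dictionary and `speedup` helper are names chosen for this illustration):

```python
# (main ns/iter, branch ns/iter) copied from the micro-benchmark table.
timings = {
    "(a+b) * c": (10.8, 10.8),
    "a / b": (20.6, 13.3),
    "(a+b) / c": (26.4, 10.8),
    "(2.0+x) / y": (25.1, 11.1),
    "c / (a+b)": (26.0, 11.4),
    "(a/b) / (c/d)": (41.3, 22.3),
    "(a/b) + (c/d)": (29.1, 21.9),
}


def speedup(main_ns, branch_ns):
    """Ratio of time on main to time on the branch (higher is better)."""
    return main_ns / branch_ns


for pattern, (main_ns, branch_ns) in timings.items():
    print(f"{pattern:<15} {speedup(main_ns, branch_ns):.1f}x")
```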
Benchmark script

```python
"""Benchmark for float true division tier 2 specialization.

Usage:
    ./python bench_truediv.py
"""
import timeit

N = 2_000_000
INNER = 1000


def bench(label, fn):
    iters = N // INNER
    times = [timeit.timeit(fn, number=iters) for _ in range(3)]
    t = min(times)
    print(f"  {label}: {t/N*1e9:.1f} ns/iter")


def f_chain_mul(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) * c
    return t


def f_div(n, a, b):
    t = 0.0
    for i in range(n):
        t += a / b
    return t


def f_chain_div(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) / c
    return t


def f_2px_div_y(n, x, y):
    t = 0.0
    for i in range(n):
        t += (2.0 + x) / y
    return t


def f_div_rhs(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += c / (a + b)
    return t


def f_ab_div_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) / (c / d)
    return t


def f_ab_add_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) + (c / d)
    return t


# Warmup
f_chain_mul(10000, 2.0, 3.0, 4.0)
f_div(10000, 10.0, 3.0)
f_chain_div(10000, 2.0, 3.0, 4.0)
f_2px_div_y(10000, 3.0, 4.0)
f_div_rhs(10000, 2.0, 3.0, 4.0)
f_ab_div_cd(10000, 10.0, 3.0, 4.0, 5.0)
f_ab_add_cd(10000, 10.0, 3.0, 4.0, 5.0)

print("Float truediv benchmark (min of 3 runs):")
bench("(a+b) * c              (baseline)", lambda: f_chain_mul(INNER, 2.0, 3.0, 4.0))
bench("a / b                  (generic) ", lambda: f_div(INNER, 10.0, 3.0))
bench("(a+b) / c              (inplace L)", lambda: f_chain_div(INNER, 2.0, 3.0, 4.0))
bench("(2.0+x) / y            (inplace L)", lambda: f_2px_div_y(INNER, 3.0, 4.0))
bench("c / (a+b)              (inplace R)", lambda: f_div_rhs(INNER, 2.0, 3.0, 4.0))
bench("(a/b) / (c/d)          (type prop)", lambda: f_ab_div_cd(INNER, 10.0, 3.0, 4.0, 5.0))
bench("(a/b) + (c/d)          (type prop)", lambda: f_ab_add_cd(INNER, 10.0, 3.0, 4.0, 5.0))
```

Analysis

The inplace truediv kicks in when at least one operand is a uniquely-referenced float (e.g. the result of a prior add/multiply). The optimizer emits _BINARY_OP_TRUEDIV_FLOAT_INPLACE or _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT, saving one PyFloat_FromDouble allocation + deallocation per iteration.
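
The selection logic can be pictured with a toy model. The uop names below are the ones mentioned in this PR; `AbstractValue` and `choose_truediv_uops` are hypothetical, and the ordering of the checks (left operand first, generic `_BINARY_OP` as the fallback) is an assumption made for the sketch, not a description of the actual optimizer code.

```python
from dataclasses import dataclass


@dataclass
class AbstractValue:
    """Hypothetical summary of what the optimizer knows about a stack slot."""
    known_float: bool = False
    unique: bool = False


def choose_truediv_uops(lhs, rhs):
    """Return the uop sequence this toy model would emit for `lhs / rhs`."""
    uops = []
    if lhs.known_float and lhs.unique:
        if not rhs.known_float:
            uops.append("_GUARD_TOS_FLOAT")   # RHS sits on top of the stack
        uops.append("_BINARY_OP_TRUEDIV_FLOAT_INPLACE")
    elif rhs.known_float and rhs.unique:
        if not lhs.known_float:
            uops.append("_GUARD_NOS_FLOAT")   # LHS is next on the stack
        uops.append("_BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT")
    else:
        uops.append("_BINARY_OP")             # assumed fallback: keep the generic op
    return uops


temp = AbstractValue(known_float=True, unique=True)   # e.g. the result of (a + b)
unknown = AbstractValue()                             # e.g. an argument such as c
print(choose_truediv_uops(temp, unknown))   # models (a + b) / c
print(choose_truediv_uops(unknown, temp))   # models c / (a + b)
```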

The optimization works well for several cases. For some (e.g. (a/b) + (c/d)) the performance gain comes not from an inplace division but from better type propagation, which allows the + to be specialized inplace. The a / b case is also faster because of better type propagation and the += in the test script.

…izer

Add inplace float true division ops that the tier 2 optimizer emits when at least one operand is a known float:

- _BINARY_OP_TRUEDIV_FLOAT_INPLACE (unique LHS)
- _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT (unique RHS)

The optimizer inserts _GUARD_TOS_FLOAT / _GUARD_NOS_FLOAT for operands not yet known to be float, enabling specialization in expressions like `(a + b) / c`.

Also marks the result of all NB_TRUE_DIVIDE operations as unique float in the abstract interpreter, enabling downstream inplace ops even for generic `a / b` (the `+=` can reuse the division result).

Speeds up chain division patterns by ~2.3x and simple `total += a/b` by ~1.5x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Operations that always return a new float (true division, float**int, int**negative_int, mixed int/float arithmetic) now mark their result as PyJitRef_MakeUnique. This enables downstream operations to mutate the result in place instead of allocating a new float.

Int results are NOT marked unique because small ints are cached/immortal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
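
The operation categories listed in the second commit message can be checked directly in plain Python. The snippet below only confirms the result types at the language level; it says nothing about how the JIT tracks uniqueness, and the `always_float` dictionary is just a name chosen for this illustration.

```python
# Each expression below produces a float regardless of the operand values.
always_float = {
    "true division (int / int)": 4 / 2,
    "float ** int": 2.0 ** 3,
    "int ** negative int": 2 ** -1,
    "mixed int + float": 1 + 2.0,
}
for label, value in always_float.items():
    print(f"{label:<28} -> {value!r} ({type(value).__name__})")

# By contrast, int ** non-negative int stays an int.
print(type(2 ** 3).__name__)
```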

Reviewers

  • markshannon (code owner): awaiting requested review
  • tomasr8 (code owner): awaiting requested review
  • Fidget-Spinner (code owner): awaiting requested review
  • savannahostrowski (code owner): awaiting requested review


2 participants: @eendebakpt, @savannahostrowski
