python/cpython (Public)


gh-146393: Optimize float division operations by mutating uniquely-referenced operands in place (JIT only) #146397

Draft

eendebakpt wants to merge 2 commits into python:main from eendebakpt:jit_float_truediv

eendebakpt commented Mar 24, 2026 • edited by bedevere-app bot

We optimize float divisions for the case where one of the operands is a unique reference. This is similar to #146307, but with a guard for division by zero.

We do not add opcodes in tier 1.
For tier 2 we can specialize both for the case where one of the operands is a unique reference and for the case where there are no unique references. The latter case (no unique references, or missing uniqueness information) gives no performance improvement in itself, but might be useful for propagating types better. For the cases tested so far it was not needed, so there is no `_BINARY_TRUEDIV_FLOAT` opcode for tier 2 in this PR.

Micro-benchmarks (min of 3 runs, 2M iterations)

| Pattern | main (ns/iter) | branch (ns/iter) | Speedup | Notes |
| --- | --- | --- | --- | --- |
| `(a+b) * c` | 10.8 | 10.8 | -- | baseline (multiply, already optimized) |
| `a / b` | 20.6 | 13.3 | 1.5x | unique result enables inplace `+=` |
| `(a+b) / c` | 26.4 | 10.8 | 2.4x | inplace LHS, guard inserted for `c` |
| `(2.0+x) / y` | 25.1 | 11.1 | 2.3x | inplace LHS, guard inserted for `y` |
| `c / (a+b)` | 26.0 | 11.4 | 2.3x | inplace RHS, guard inserted for `c` |
| `(a/b) / (c/d)` | 41.3 | 22.3 | 1.9x | unique results enable inplace ops |
| `(a/b) + (c/d)` | 29.1 | 21.9 | 1.3x | unique results enable inplace `+` |
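As a quick sanity check on the Speedup column, the ratios can be recomputed from the two timing columns. The snippet below is only a sketch: the `rows` list and the `speedup` helper are illustrative names, and the numbers are copied from the table above.

```python
# (pattern, main ns/iter, branch ns/iter), copied from the table above.
rows = [
    ("(a+b) * c", 10.8, 10.8),
    ("a / b", 20.6, 13.3),
    ("(a+b) / c", 26.4, 10.8),
    ("(2.0+x) / y", 25.1, 11.1),
    ("c / (a+b)", 26.0, 11.4),
    ("(a/b) / (c/d)", 41.3, 22.3),
    ("(a/b) + (c/d)", 29.1, 21.9),
]


def speedup(main_ns, branch_ns):
    """Speedup factor of the branch relative to main (higher is better)."""
    return main_ns / branch_ns


# Round to one decimal place, matching the precision used in the table.
speedups = {pattern: round(speedup(m, b), 1) for pattern, m, b in rows}

for pattern, value in speedups.items():
    print(f"{pattern:15s} {value:.1f}x")
```

The baseline row comes out at 1.0x, which the table shows as `--`.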

Benchmark script

```python
"""Benchmark for float true division tier 2 specialization.

Usage:
    ./python bench_truediv.py
"""
import timeit

N = 2_000_000
INNER = 1000


def bench(label, fn):
    iters = N // INNER
    times = [timeit.timeit(fn, number=iters) for _ in range(3)]
    t = min(times)
    print(f"  {label}: {t/N*1e9:.1f} ns/iter")


def f_chain_mul(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) * c
    return t


def f_div(n, a, b):
    t = 0.0
    for i in range(n):
        t += a / b
    return t


def f_chain_div(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) / c
    return t


def f_2px_div_y(n, x, y):
    t = 0.0
    for i in range(n):
        t += (2.0 + x) / y
    return t


def f_div_rhs(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += c / (a + b)
    return t


def f_ab_div_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) / (c / d)
    return t


def f_ab_add_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) + (c / d)
    return t


# Warmup
f_chain_mul(10000, 2.0, 3.0, 4.0)
f_div(10000, 10.0, 3.0)
f_chain_div(10000, 2.0, 3.0, 4.0)
f_2px_div_y(10000, 3.0, 4.0)
f_div_rhs(10000, 2.0, 3.0, 4.0)
f_ab_div_cd(10000, 10.0, 3.0, 4.0, 5.0)
f_ab_add_cd(10000, 10.0, 3.0, 4.0, 5.0)

print("Float truediv benchmark (min of 3 runs):")
bench("(a+b) * c              (baseline)", lambda: f_chain_mul(INNER, 2.0, 3.0, 4.0))
bench("a / b                  (generic) ", lambda: f_div(INNER, 10.0, 3.0))
bench("(a+b) / c              (inplace L)", lambda: f_chain_div(INNER, 2.0, 3.0, 4.0))
bench("(2.0+x) / y            (inplace L)", lambda: f_2px_div_y(INNER, 3.0, 4.0))
bench("c / (a+b)              (inplace R)", lambda: f_div_rhs(INNER, 2.0, 3.0, 4.0))
bench("(a/b) / (c/d)          (type prop)", lambda: f_ab_div_cd(INNER, 10.0, 3.0, 4.0, 5.0))
bench("(a/b) + (c/d)          (type prop)", lambda: f_ab_add_cd(INNER, 10.0, 3.0, 4.0, 5.0))
```

Analysis

The inplace truediv kicks in when at least one operand is a uniquely-referenced float (e.g. the result of a prior add/multiply). The optimizer emits `_BINARY_OP_TRUEDIV_FLOAT_INPLACE` or `_BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT`, saving one `PyFloat_FromDouble` allocation + deallocation per iteration.
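To make the allocation saving concrete, here is a toy Python model of the idea, not the actual uop implementation. `FloatBox`, its `refcount` field and the `truediv` helper are hypothetical names invented for this sketch. The toy also checks uniqueness at run time, whereas in the PR the tier 2 optimizer decides which uop to emit. The zero check runs before any mutation, so a failed division leaves both operands untouched.

```python
class FloatBox:
    """Toy stand-in for a heap-allocated float object (hypothetical)."""

    allocations = 0  # counts how many boxes have been created

    def __init__(self, value):
        FloatBox.allocations += 1
        self.value = value
        self.refcount = 1  # a fresh box is uniquely referenced


def truediv(lhs, rhs):
    """Divide lhs by rhs, reusing a uniquely-referenced operand if there is one."""
    if rhs.value == 0.0:
        # Guard: raise before mutating anything.
        raise ZeroDivisionError("division by zero")
    result = lhs.value / rhs.value
    if lhs.refcount == 1:
        lhs.value = result  # inplace on the left operand
        return lhs
    if rhs.refcount == 1:
        rhs.value = result  # inplace on the right operand
        return rhs
    return FloatBox(result)  # no unique operand: allocate a new box


# Models (a + b) / c: the sum is a fresh temporary, c is also held elsewhere.
temp = FloatBox(2.0 + 3.0)
c = FloatBox(4.0)
c.refcount = 2  # pretend a local variable still references c
before = FloatBox.allocations
out = truediv(temp, c)
print(out is temp, out.value, FloatBox.allocations - before)
```

In this model the division reuses `temp` and allocates nothing, which corresponds to the saved allocation and deallocation described above.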

The optimization works well for several cases. For some (e.g. `(a/b) + (c/d)`) the performance gain comes not from an inplace division but from better type propagation, which allows the `+` to be specialized inplace. The `a / b` case is also faster because of better type propagation combined with the `+=` in the test script.
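The type-propagation effect can be sketched with a toy abstract interpreter. This is a simplified model under stated assumptions, not CPython's optimizer: `Sym` and `choose_op` are hypothetical names, the labels `"generic /"`, `"float +"` and `"inplace float +"` are made up for illustration, and only the two `_BINARY_OP_TRUEDIV_FLOAT_INPLACE*` names come from this PR. The toy treats every true division result as a unique float, following the PR's first commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """What the toy optimizer knows about one stack entry."""

    is_float: bool = False
    unique: bool = False


def choose_op(op, lhs, rhs):
    """Return (label, result_sym) for a binary operation in this toy model."""
    if op == "/":
        # Assumption taken from the commit message: the result of a true
        # division is marked as a unique float.
        result = Sym(is_float=True, unique=True)
        if lhs.is_float and lhs.unique:
            return "_BINARY_OP_TRUEDIV_FLOAT_INPLACE", result
        if rhs.is_float and rhs.unique:
            return "_BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT", result
        return "generic /", result
    if op == "+" and lhs.is_float and rhs.is_float:
        result = Sym(is_float=True, unique=True)
        if lhs.unique or rhs.unique:
            return "inplace float +", result
        return "float +", result
    return f"generic {op}", Sym()


# (a/b) + (c/d) with nothing known about a, b, c, d:
unknown = Sym()
div1_label, q1 = choose_op("/", unknown, unknown)
div2_label, q2 = choose_op("/", unknown, unknown)
add_label, _ = choose_op("+", q1, q2)
print(div1_label, "|", div2_label, "|", add_label)
```

Both divisions stay generic in this model, yet the addition can still be done inplace because its operands are now known to be unique floats. That mirrors the explanation for the `(a/b) + (c/d)` row.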

Issue: Optimize float division operations by mutating uniquely-referenced operands in place (JIT only) #146393

gh-146306: Specialize float/float true division in tier 2 optimizer (c846269)

Add inplace float true division ops that the tier 2 optimizer emits when at least one operand is a known float:

- `_BINARY_OP_TRUEDIV_FLOAT_INPLACE` (unique LHS)
- `_BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT` (unique RHS)

The optimizer inserts `_GUARD_TOS_FLOAT` / `_GUARD_NOS_FLOAT` for operands not yet known to be float, enabling specialization in expressions like `(a + b) / c`.

Also marks the result of all `NB_TRUE_DIVIDE` operations as unique float in the abstract interpreter, enabling downstream inplace ops even for generic `a / b` (the `+=` can reuse the division result).

Speeds up chain division patterns by ~2.3x and simple `total += a/b` by ~1.5x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eendebakpt requested review from Fidget-Spinner, markshannon, savannahostrowski and tomasr8 as code owners (March 24, 2026 22:40)

bedevere-app bot added the awaiting review label (Mar 24, 2026)

bedevere-app bot mentioned this pull request (Mar 24, 2026): Optimize float division operations by mutating uniquely-referenced operands in place (JIT only) #146393 (Open)

eendebakpt marked this pull request as draft (March 24, 2026 22:56)

bedevere-app bot removed the awaiting review label (Mar 24, 2026)

Mark results of float-producing `_BINARY_OP` as unique (97889f5)

Operations that always return a new float (true division, `float**int`, `int**negative_int`, mixed int/float arithmetic) now mark their result as `PyJitRef_MakeUnique`. This enables downstream operations to mutate the result in place instead of allocating a new float.

Int results are NOT marked unique because small ints are cached/immortal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
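The reason cached ints cannot be treated as unique can be illustrated with a toy model. This is only a sketch: `IntBox`, `_SMALL_INTS`, `make_int` and `add_inplace_unsafe` are hypothetical names, and the range -5..256 mirrors CPython's small-int cache. Mutating a cached object in place silently changes the value seen by every other user of that cache entry.

```python
class IntBox:
    """Toy stand-in for an int object (hypothetical)."""

    def __init__(self, value):
        self.value = value


# Shared objects for small values, mirroring CPython's small-int cache range.
_SMALL_INTS = {n: IntBox(n) for n in range(-5, 257)}


def make_int(n):
    """Return the shared box for small values, otherwise a fresh box."""
    cached = _SMALL_INTS.get(n)
    return cached if cached is not None else IntBox(n)


def add_inplace_unsafe(lhs, rhs):
    """Wrongly assume lhs is uniquely referenced and overwrite it."""
    lhs.value += rhs.value
    return lhs


one = make_int(1)
add_inplace_unsafe(one, make_int(2))
corrupted = make_int(1).value  # the shared "1" has been overwritten
print(corrupted)
```

A freshly allocated float has no such sharing, which is why only float results are marked unique.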
