python/cpython


Tail calling interpreter for MSVC #139922


Closed


Labels

OS-windows
interpreter-core (Objects, Python, Grammar, and Parser dirs)
type-feature (A feature request or enhancement)

Description

Fidget-Spinner opened on Oct 10, 2025

Feature or enhancement

Proposal:

We should get tail-calling interpreter support for MSVC.
The latest figures for the tail-calling interpreter are:

1-3% faster on pyperformance on Ubuntu x64
4-5% faster on pyperformance on macOS AArch64

The last benchmarks for the tail-calling interpreter on Windows MSVC reported a 17% speedup on pyperformance.

On Windows, the performance isn't easy to measure because pyperf system tune doesn't work there. However, running some pyperformance benchmarks on a best-effort quiet system, these are my results:

Mean +- std dev: [spectralnorm_tc_no] 146 ms +- 1 ms -> [spectralnorm_tc] 98.3 ms +- 1.1 ms: 1.48x faster
Mean +- std dev: [nbody_tc_no] 145 ms +- 2 ms -> [nbody_tc] 107 ms +- 2 ms: 1.35x faster
Mean +- std dev: [bm_django_template_tc_no] 26.9 ms +- 0.5 ms -> [bm_django_template_tc] 22.8 ms +- 0.4 ms: 1.18x faster
Mean +- std dev: [xdsl_tc_no] 64.2 ms +- 1.6 ms -> [xdsl_tc] 56.1 ms +- 1.5 ms: 1.14x faster

Note that I had to apply a 2-line patch to fix xDSL for Python 3.15.
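To make the arithmetic behind these figures explicit: the "Nx faster" ratio is the baseline mean (_tc_no) divided by the new mean (_tc), and "N% faster" is that ratio minus one, times 100. The helper names below are hypothetical and only illustrate the calculation.

```c
/* Speedup as a "Nx faster" ratio: baseline mean divided by new mean. */
static double speedup(double baseline_ms, double new_ms) {
    return baseline_ms / new_ms;
}

/* The same comparison expressed as "N% faster": 1.14x -> about 14%. */
static double percent_faster(double baseline_ms, double new_ms) {
    return (speedup(baseline_ms, new_ms) - 1.0) * 100.0;
}
```

For example, speedup(146.0, 98.3) is about 1.485 and percent_faster(64.2, 56.1) is about 14.4. Recomputing from the rounded means printed above can differ in the last digit from the reported figure (145 / 107 is about 1.355, reported as 1.35x), presumably because pyperf works from the unrounded measurements.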

Nbody and spectralnorm are toy shootout benchmarks. xDSL is a new benchmark: an MLIR-based DSL library written almost entirely in pure Python. I consider it a large pure-Python library not too far off from mypy (though it's obviously smaller). A 14% speedup on xDSL is massive. It's roughly half of the speedup we got in 3.11 with the specializing interpreter, and that took thousands of lines of code. This change, in contrast, will be just one PR. Django templating shows a big speedup too: 18% faster.

This could not have been possible without help from the MSVC team. Specifically, I'd like to give a shoutout to Hulon Jenkins and the MSVC team in the patch notes when I land this.

I've discussed this during the Wednesday CHIPS meetings, and there was no opposition to the new set of changes required to get this working. The changes are minimal and mostly involve adding correct restrict qualifiers and tighter scoping so that MSVC can tell a local variable doesn't escape. This should also benefit GCC and Clang in some fashion.
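A minimal sketch of why scoping matters for tail calls, assuming the pattern is roughly like this (all names are hypothetical, not CPython's). A call can only become a jump if nothing the callee might legally touch still lives in the caller's frame. If tmp were declared at function scope, its lifetime would last until handler returns, so a pointer stashed by load_value could still be valid inside finish and the frame must stay. Declaring tmp in an inner block ends its lifetime before the final call, which lets the compiler reuse the frame. restrict is a separate promise: that out is the only path through which load_value accesses that object.

```c
/* Hypothetical helper that writes through a pointer. `restrict` tells the
   compiler that `out` does not alias any other object this function uses. */
static void load_value(int *restrict out, int seed) {
    *out = seed * 2;
}

/* Hypothetical "next step" reached by a call in tail position. */
static int finish(int value) {
    return value + 1;
}

static int handler(int x) {
    int result;
    {
        /* `tmp` is scoped to this block: its lifetime ends at the closing
           brace, so no valid pointer to it can exist at the call below. */
        int tmp;
        load_value(&tmp, x);
        result = tmp;
    }
    /* Tail position: the compiler is free to turn this into a jump. */
    return finish(result);
}
```

Calling handler(5) returns 11 either way; the scoping only changes what the optimizer is allowed to assume, not the result.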

We require working CI before this is merged, so I am now waiting on GitHub Actions or some other CI to support this.

Very special shoutout as well to @chris-eibl, who has been helping me with this on Windows.

Some other benefits: the tail-calling interpreter on MSVC correctly "resets" MSVC's inlining heuristics. This means that once we distribute builds with tail calling, we can eventually get rid of all the macros and ugly hacks we currently need to make the interpreter faster on MSVC. For an example of a hack where we use macros instead of static inline functions on MSVC just because the interpreter loop breaks MSVC's inlining heuristics, see #121263.
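The kind of workaround described above can be sketched as follows. The object type, the obj_incref helper and the USE_TAIL_CALLS switch are all hypothetical; the point is only the shape of the hack: on MSVC a macro is used because textual expansion cannot be skipped by inlining heuristics inside one huge function, while other compilers get the cleaner static inline function.

```c
#include <stdint.h>

/* Hypothetical object header with a reference count. */
typedef struct Obj { intptr_t refcnt; } Obj;

#if defined(_MSC_VER) && !defined(USE_TAIL_CALLS)
/* Workaround: a macro is expanded textually, so it is effectively
   "inlined" even when the compiler's heuristics would give up. */
#  define obj_incref(o) ((void)((o)->refcnt++))
#else
/* Preferred form: type-checked, debuggable, and evaluates `o` once. */
static inline void obj_incref(Obj *o) { o->refcnt++; }
#endif
```

Both forms have the same observable effect; once the heuristics behave, only the static inline branch would be needed.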

Possibility of a compiler bug this time

I doubt it: MSVC has no computed goto, so it can't have the same bug we bumped into with Clang last time.
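For context, computed-goto dispatch relies on the GNU C "labels as values" extension (&&label and goto *ptr), supported by GCC and Clang but not by MSVC. A minimal sketch with a hypothetical four-opcode instruction set:

```c
#include <stdint.h>

enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };  /* hypothetical opcodes */

/* Requires GCC or Clang: MSVC has no equivalent of this extension. */
static int64_t run_goto(const uint8_t *pc) {
    /* Table of label addresses, indexed by opcode. */
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_double, &&do_halt };
    int64_t acc = 0;
/* Each handler jumps directly to the next handler's label. */
#define NEXT() goto *labels[*pc++]
    NEXT();
do_inc:
    acc += 1;
    NEXT();
do_dec:
    acc -= 1;
    NEXT();
do_double:
    acc *= 2;
    NEXT();
do_halt:
    return acc;
#undef NEXT
}
```

Running the program INC, INC, DOUBLE, DEC, HALT yields 3.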

Where are the perf gains coming from?

It's mainly the better inlining we get from the tail-calling interpreter, and the elimination of the double jumps that the switch-case interpreter performs.
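A toy sketch of the two dispatch styles, assuming a hypothetical four-opcode instruction set (this is not CPython's code). In the switch loop, each handler typically ends with a jump back to the loop header, which then performs the indirect jump through the switch table: two jumps per instruction. In the tail-calling version, each handler is a small separate function that ends by calling the next handler in tail position, which compiles to a single indirect jump, and each small function is optimized (and inlined into) on its own rather than as one enormous loop. The MUSTTAIL macro uses the musttail attribute where the compiler advertises it (recent Clang and GCC) and otherwise falls back to a plain return; the MSVC spelling is deliberately not assumed here.

```c
#include <stdint.h>

/* Guarantee the tail call where supported; otherwise rely on the optimizer. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };  /* hypothetical opcodes */

typedef struct VM { int64_t acc; } VM;
typedef int64_t (*Handler)(VM *vm, const uint8_t *pc);

static int64_t op_inc(VM *vm, const uint8_t *pc);
static int64_t op_dec(VM *vm, const uint8_t *pc);
static int64_t op_double(VM *vm, const uint8_t *pc);
static int64_t op_halt(VM *vm, const uint8_t *pc);

static const Handler handlers[] = { op_inc, op_dec, op_double, op_halt };

/* Jump straight to the handler for the next opcode. */
#define DISPATCH(vm, pc) MUSTTAIL return handlers[*(pc)]((vm), (pc))

static int64_t op_inc(VM *vm, const uint8_t *pc) { vm->acc += 1; pc++; DISPATCH(vm, pc); }
static int64_t op_dec(VM *vm, const uint8_t *pc) { vm->acc -= 1; pc++; DISPATCH(vm, pc); }
static int64_t op_double(VM *vm, const uint8_t *pc) { vm->acc *= 2; pc++; DISPATCH(vm, pc); }
static int64_t op_halt(VM *vm, const uint8_t *pc) { (void)pc; return vm->acc; }

/* Tail-calling entry point: start at the first opcode's handler. */
static int64_t run_tail(const uint8_t *code) {
    VM vm = { 0 };
    return handlers[code[0]](&vm, code);
}

/* Switch-case loop for comparison: `break` returns to the loop header,
   which then dispatches again through the switch. */
static int64_t run_switch(const uint8_t *pc) {
    int64_t acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        }
    }
}
```

Both interpreters compute the same result for the same bytecode; only the shape of the dispatch differs.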

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response


