Tail calling interpreter for MSVC #139922

Closed
@Fidget-Spinner

Description

Feature or enhancement

Proposal:

We should get tail calling interpreter support for MSVC.
The latest up-to-date figures for the tail calling interpreter are:

  • 1-3% pyperformance faster on Ubuntu x64
  • 4-5% pyperformance faster on macOS AArch64
  • The last benchmarks for the tail calling interpreter on Windows MSVC reported a 17% speedup on pyperformance.

On Windows, performance isn't easy to measure because pyperf system tune doesn't work there. However, on a best-effort quiet system, running some benchmarks from pyperformance on my machine gives these results:

Mean +- std dev: [spectralnorm_tc_no] 146 ms +- 1 ms -> [spectralnorm_tc] 98.3 ms +- 1.1 ms: 1.48x faster
Mean +- std dev: [nbody_tc_no] 145 ms +- 2 ms -> [nbody_tc] 107 ms +- 2 ms: 1.35x faster
Mean +- std dev: [bm_django_template_tc_no] 26.9 ms +- 0.5 ms -> [bm_django_template_tc] 22.8 ms +- 0.4 ms: 1.18x faster
Mean +- std dev: [xdsl_tc_no] 64.2 ms +- 1.6 ms -> [xdsl_tc] 56.1 ms +- 1.5 ms: 1.14x faster

Note that I had to apply a 2-line patch to fix xDSL for Python 3.15.

Nbody and spectralnorm are toy shootout benchmarks. xDSL is a newly added benchmark: an MLIR DSL library written almost entirely in pure Python. I consider it a large pure-Python library not too far off from mypy (though it's obviously smaller). 14% faster for xDSL is massive. It's roughly half of the speedup we got in 3.11 with the specializing interpreter, and that was thousands of lines of code. This change, in contrast, will be just one PR. Django templating shows a big speedup too -- 18% faster.
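For readers who want to relate the "x faster" figures to the percentages quoted above, here is a minimal sketch of the arithmetic. It assumes the ratio is simply the old mean divided by the new mean; the helper names `speedup` and `percent_faster` are made up for illustration. Because the means in the report are rounded, recomputing from them can differ from the reported ratio in the last digit.

```c
/* Ratio of mean run times: values above 1.0 mean the new build is faster.
   Assumption: this mirrors how the "x faster" figure is derived. */
static double speedup(double old_ms, double new_ms)
{
    return old_ms / new_ms;
}

/* The same ratio expressed as "percent faster", e.g. 1.14x -> about 14%. */
static double percent_faster(double old_ms, double new_ms)
{
    return (speedup(old_ms, new_ms) - 1.0) * 100.0;
}
```

With the xDSL means from the results (64.2 ms and 56.1 ms) this gives a little over 14%, and with the Django template means (26.9 ms and 22.8 ms) roughly 18%.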

This would not have been possible without help from the MSVC team. Specifically, I'd like to give a shoutout to Hulon Jenkins and the MSVC team in the patch notes when I land this.

I've discussed this during the Wednesday CHIPS meetings, and there was no opposition to the new set of changes required to get this working. The changes are minimal and mostly just involve adding correct restrict qualifiers and scoping to tell MSVC that a local variable doesn't escape. This should also benefit GCC and Clang in some fashion.
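As a rough illustration of the two kinds of change mentioned above (not the actual CPython patch), the sketch below shows a `restrict`-qualified function and a local variable kept in the narrowest possible scope. All function names here are hypothetical.

```c
#include <stddef.h>

/* restrict promises the compiler that dst and src never overlap, so it
   does not have to reload src[i] after each store to dst[i]. */
static void scale_into(int *restrict dst, const int *restrict src,
                       size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * k;
    }
}

/* Hypothetical helper that needs the address of an output variable. */
static void read_pair(const int *in, int *out)
{
    *out = in[0] + in[1];
}

static int sum_pairs(const int *data, size_t n_pairs)
{
    int total = 0;
    for (size_t i = 0; i < n_pairs; i++) {
        /* tmp lives only inside this block. Its address is taken, but the
           lifetime ends at the closing brace, which makes it easier for
           the compiler to see that nothing outside can still refer to it. */
        int tmp;
        read_pair(data + 2 * i, &tmp);
        total += tmp;
    }
    return total;
}
```

The point of the second function is that `total`, whose address is never taken, can stay in a register, while the address-taken `tmp` is confined to a small scope instead of being declared at the top of the function.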

We require working CI before this is merged, so I am now waiting on GitHub Actions or some other CI to get this working.

Very special shoutout as well to @chris-eibl, who has been helping me with this on Windows.

Some other benefits: the tail calling interpreter actually correctly "resets" the inlining heuristic on MSVC. This means we can eventually get rid of all the macros and ugly hacks we have to make the current interpreter faster on MSVC once we distribute the builds with tail calling. An example of such a hack, where we use macros over static inlines on MSVC just because the interpreter loop breaks MSVC's inlining heuristics: #121263
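To make the macro-versus-static-inline trade-off concrete, here is a small generic sketch. It is not taken from CPython or from #121263; `CLAMP_MACRO` and `clamp_inline` are invented names.

```c
/* Macro version: expanded textually by the preprocessor, so it is always
   "inlined" regardless of the compiler's heuristics. The cost is that
   arguments may be evaluated more than once and there is no type checking. */
#define CLAMP_MACRO(x, lo, hi) \
    ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Static inline version: type-checked and each argument is evaluated once,
   but whether the call is actually inlined is up to the compiler. */
static inline int clamp_inline(int x, int lo, int hi)
{
    if (x < lo) {
        return lo;
    }
    if (x > hi) {
        return hi;
    }
    return x;
}
```

Both forms compute the same result for plain integer arguments. The macro is only preferable when the compiler refuses to inline the function in a hot path, which is the situation described above.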

Possibility of a compiler bug this time

I doubt it, as MSVC has no computed goto, so it can't have the same bug as Clang that we bumped into the previous time.

Where are the perf gains coming from?

It's mainly the better inlining that we get from the tail calling interpreter, and elimination of double jumps vs the switch-case interpreter.
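The difference between the two dispatch styles can be sketched with a toy interpreter. This is a simplified illustration under stated assumptions, not CPython's code: the three-opcode bytecode and every name below are hypothetical, opcodes are assumed valid in the tail-call version, and the plain `return` calls only become guaranteed tail calls when the compiler is told to enforce it (for example with Clang's `__attribute__((musttail))`, omitted here to keep the sketch portable).

```c
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Switch-based dispatch: after each instruction, control typically jumps
   back to the loop head and then through the switch again. All handlers
   live in one large function. */
static int run_switch(const unsigned char *code)
{
    int acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1;  /* unknown opcode */
        }
    }
}

/* Tail-call dispatch: one small function per instruction. Each handler
   ends by calling the next handler directly through the table. */
typedef int (*handler_fn)(const unsigned char *code, int acc);

static int op_inc(const unsigned char *code, int acc);
static int op_double(const unsigned char *code, int acc);
static int op_halt(const unsigned char *code, int acc);

static const handler_fn handlers[] = { op_inc, op_double, op_halt };

/* code points at the next opcode: look up its handler and pass code + 1. */
#define DISPATCH(code, acc) return handlers[*(code)]((code) + 1, (acc))

static int op_inc(const unsigned char *code, int acc)
{
    acc += 1;
    DISPATCH(code, acc);
}

static int op_double(const unsigned char *code, int acc)
{
    acc *= 2;
    DISPATCH(code, acc);
}

static int op_halt(const unsigned char *code, int acc)
{
    (void)code;  /* nothing left to dispatch */
    return acc;
}

static int run_tail(const unsigned char *code)
{
    return handlers[*code](code + 1, 0);
}
```

Running both functions on the same program, such as `{OP_INC, OP_DOUBLE, OP_INC, OP_HALT}`, should produce the same accumulator value. Because each handler in the second style is a small separate function, the compiler's inlining decisions are made per handler rather than for one huge loop body.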

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response
