Performance issue with ARM64 windows Python release binaries #134524

Open
Labels: OS-windows, pending, performance, triaged
@Akash9824

Description


Bug report

Bug description:

Hi Team,

I have a Snapdragon X Elite laptop (ARM64 SoC) and have been running Python workloads on it. Python appears to run noticeably slower on Windows on ARM devices. To investigate, I used pybench, which provides a solid set of test cases for performance benchmarking. To measure the performance delta, I also ran it on an Intel x64 Lunar Lake (258V) laptop, which has Geekbench scores similar to the X Elite.

pybench: https://share.gtd-gmbh.de/d/7e9368c6350a4894bf8f/files/?p=%2FWorklets%2FPyBench%2Fpybench-for-3.10.tar.gz&dl=1

I collected the following results (total pybench time, lower is better):

| Environment | Total time (ms) |
| --- | --- |
| Windows on ARM64 | 802 |
| Windows on x64 | 507 |
| WSL2 (Linux on Windows ARM64) | 515 |
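
For reference, the relative gaps can be computed directly from these totals. The snippet below is a small illustrative helper, not part of pybench: it expresses each total relative to the x64 result and also compares native Windows ARM64 against WSL2 on the same hardware, which suggests the gap comes from the Windows build or platform rather than from the CPU itself.

```python
# Total pybench times (ms) as reported above.
TOTALS_MS = {
    "Windows on ARM64": 802,
    "Windows on x64": 507,
    "WSL2 (Linux on Windows ARM64)": 515,
}

def slowdown(slow_ms: float, fast_ms: float) -> float:
    """Fraction by which slow_ms exceeds fast_ms (0.5 means 50% longer)."""
    return slow_ms / fast_ms - 1.0

x64 = TOTALS_MS["Windows on x64"]
for env, ms in TOTALS_MS.items():
    print(f"{env}: {ms} ms ({slowdown(ms, x64):+.1%} vs. Windows on x64)")

# Same ARM64 hardware, different OS and Python build.
native_vs_wsl = slowdown(TOTALS_MS["Windows on ARM64"],
                         TOTALS_MS["WSL2 (Linux on Windows ARM64)"])
print(f"Windows ARM64 vs. WSL2 on the same machine: {native_vs_wsl:+.1%}")
```

By this arithmetic, native Windows ARM64 takes about 58% longer than x64 and about 56% longer than WSL2 running on the same machine.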

To investigate further, I tested multiple Python versions and found that earlier ARM64 Windows releases performed better than the latest one:
Python Version Comparison

| Version | Windows ARM64 (ms) | Windows x64 (ms) |
| --- | --- | --- |
| 3.11.0 | 763 | 575 |
| 3.11.3 | 589 | Not tested |
| 3.11.6 | 590 | Not tested |
| 3.11.9 | 568 | Not tested |
| 3.12.0 | 666 | 545 |
| 3.12.5 | 688 | Not tested |
| 3.12.6 | 700 | Not tested |
| 3.12.7 | 802 | Not tested |
| 3.12.10 | 802 | 507 |

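On x64, total time improved with each tested release (575 → 545 → 507 ms for 3.11.0, 3.12.0, and 3.12.10). On ARM64, results have been inconsistent: times improved through the 3.11 series (763 → 568 ms), then regressed across 3.12, reaching 802 ms in 3.12.7 and 3.12.10.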
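
To locate where the ARM64 regression entered, it can help to look at the change between consecutive tested releases. The sketch below is illustrative only; it uses just the ARM64 totals from the version table and a hypothetical helper `step_changes`, and the tested versions are not every release, so each step may span several releases.

```python
# Windows ARM64 pybench totals (ms) per tested release, from the version table.
ARM64_MS = [
    ("3.11.0", 763), ("3.11.3", 589), ("3.11.6", 590), ("3.11.9", 568),
    ("3.12.0", 666), ("3.12.5", 688), ("3.12.6", 700), ("3.12.7", 802),
    ("3.12.10", 802),
]

def step_changes(series):
    """Yield (version, ms, fractional change vs. the previous tested release)."""
    prev = None
    for version, ms in series:
        change = None if prev is None else (ms - prev) / prev
        yield version, ms, change
        prev = ms

for version, ms, change in step_changes(ARM64_MS):
    step = "" if change is None else f" ({change:+.1%} vs. previous)"
    print(f"{version}: {ms} ms{step}")

best_version, best_ms = min(ARM64_MS, key=lambda item: item[1])
latest_version, latest_ms = ARM64_MS[-1]
print(f"fastest: {best_version} ({best_ms} ms); "
      f"{latest_version} takes {latest_ms / best_ms - 1:.1%} longer")
```

The two largest steps are 3.11.9 → 3.12.0 (about +17.3%) and 3.12.6 → 3.12.7 (about +14.6%), which may be reasonable starting points for bisecting; overall, 3.12.10 takes about 41% longer than 3.11.9.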
I also cloned the Python 3.12.10 source and built it on the ARM64 Windows device with different compilers. A build using clang-cl 19.1.2 with computed gotos enabled performed significantly better than the official release:
Compiled vs. Released (Python 3.12.10)

| Version | ARM64 official release (ms) | ARM64 self-built with clang-cl (ms) |
| --- | --- | --- |
| Python 3.12.10 | 802 | 628 |
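
The size of the clang-cl improvement, and the gap that remains, can be quantified the same way. This is a minimal sketch using only the three 3.12.10 totals reported in this issue (official ARM64 release, clang-cl build, official x64 release on the Lunar Lake machine).

```python
release_ms = 802   # official ARM64 release, Python 3.12.10
clang_ms = 628     # self-built with clang-cl 19.1.2 and computed gotos
x64_ms = 507       # official x64 release, Python 3.12.10

time_saved = 1 - clang_ms / release_ms    # fraction of runtime removed
remaining_gap = clang_ms / x64_ms - 1     # how much longer than x64 it still takes
print(f"clang-cl build: {time_saved:.1%} less time than the official release")
print(f"clang-cl build: still {remaining_gap:.1%} slower than the x64 release")
```

By this arithmetic the clang-cl build removes about 22% of the runtime but still takes about 24% longer than x64, and remains slower than the 3.11.9 ARM64 release (568 ms).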

My questions:

1. Could someone share the compilation steps (compiler and flags) used to build the official ARM64 Windows release binaries?
2. If they are built with MSVC, is there a specific reason for not using clang-cl? In my pybench experiments, clang-cl gives noticeably better results.
3. Are there other test suites run against the release binaries where clang-cl does not perform better?
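
When comparing the official release with a self-built interpreter, it helps to confirm which compiler each binary reports and that it is a native ARM64 build rather than an x64 build running under emulation. The sketch below uses only standard-library calls (`platform.python_compiler()`, `platform.python_build()`, `sysconfig.get_platform()`, `platform.machine()`); official MSVC builds typically report a compiler string starting with `MSC v.`, and a native ARM64 Windows build should report the platform tag `win-arm64`. The helper name `build_info` is hypothetical.

```python
import platform
import sys
import sysconfig

def build_info() -> dict:
    """Collect details that identify how the running interpreter was built."""
    return {
        "version": sys.version,
        "compiler": platform.python_compiler(),
        "build": " ".join(platform.python_build()),   # (build number, build date)
        "platform_tag": sysconfig.get_platform(),     # e.g. win-arm64 vs. win-amd64
        "machine": platform.machine(),
    }

if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key:>12}: {value}")
```

Running this with both the release `python.exe` and the clang-cl build makes it easy to attach the exact build details to benchmark results.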

Analysis:
I collected ETW traces (.etl files) on Windows and analyzed the run in Profile Explorer. The bottleneck appears to be the bytecode interpreter loop: most of the time is spent in python312.dll!_PyEval_EvalFrameDefault.
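
Because the profile points at `_PyEval_EvalFrameDefault`, a small interpreter-bound micro-benchmark could help compare the release and clang-cl builds on the same machine without the rest of pybench. This is a hypothetical workload (the function `interpreter_bound` is illustrative, not taken from pybench): a tight pure-Python loop whose time is dominated by bytecode dispatch, timed with the standard `timeit` module and reporting the best of several repeats to reduce noise.

```python
import timeit

def interpreter_bound(n: int = 10_000) -> int:
    """Pure-Python loop: nearly all time goes to evaluating bytecode."""
    total = 0
    for i in range(n):
        if i % 3:
            total += i * 2
        else:
            total -= i
    return total

def bench(repeat: int = 5, number: int = 200) -> float:
    """Return the best per-call time in milliseconds across `repeat` runs."""
    times = timeit.repeat(interpreter_bound, repeat=repeat, number=number)
    return min(times) / number * 1000.0

if __name__ == "__main__":
    print(f"best per-call time: {bench():.3f} ms")
```

Run the same script with each `python.exe` (for example, the official 3.12.10 ARM64 release and the clang-cl build) and compare the reported times; taking the minimum rather than the mean is a common choice because background noise only ever adds time.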

CPython versions tested on:

CPython main branch, 3.12

Operating systems tested on:

Windows
