Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

py: Add a new dual-VM option to improve performance ofsys.settrace#18571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
dpgeorge wants to merge8 commits intomicropython:master
base:master
Choose a base branch
Loading
fromdpgeorge:py-settrace-dual-vm

Conversation

@dpgeorge
Copy link
Member

Summary

MicroPython supports the standardsys.settrace() facility, to trace bytecode execution. That feature is enabled throughMICROPY_PY_SYS_SETTRACE and is disabled by default because it really slows the VM down, and increases memory use in many places due to the overhead of tracking the state of functions.

This PR adds a new optionMICROPY_PY_SYS_SETTRACE_DUAL_VM that, when enabled, duplicates the VM, themp_execute_bytecode() function:

  1. One version is calledmp_execute_bytecode_standard() and has settrace disabled.
  2. The other version is calledmp_execute_bytecode_tracing() and has settrace enabled.

Then a wrapper functionmp_execute_bytecode() calls into the appropriate VM depending on whether the user has enabled a settrace callback or not.

The result is:

  1. An increase of about 5k in firmware size (on ARM Cortex-M) when enabling the dual-VM (that's on top of having settrace already enabled).
  2. Almost no performance is lost if a settrace callback is not registered.
  3. Full support for settrace if needed.

This is useful for, eg, webassembly where PyScript wants to havesys.settrace enabled, and can afford to have a dual-VM in order not to lose performance with settrace enabled.

This PR also includes some minor fixes for the webassembly port to be able to run the performance benchmarks on that port.

Testing

Tested by running the standard test suite on the unix and webassembly ports, which also have dual-VM enabled in CI.

On the webassembly port, theperf_bench/misc_aes.py performance test runs 200 times faster (!) with dual-VM enabled, compared to just enabling settrace without the dual-VM. So it's a very big win there.

On PYBV10 the metrics are:

  • code size increases by 14.5k withMICROPY_PY_SYS_SETTRACE enabled
  • code size increases by an extra 5.3k withMICROPY_PY_SYS_SETTRACE_DUAL_VM enabled as well
  • performance is roughly halved when enabling settrace
  • performance is pretty much all regained when enabling dual-VM

The performance measurements on PYBV10 in detail

EnablingMICROPY_PY_SYS_SETTRACE:

diff of scores (higher is better)N=100 M=100                perf-pyb-std -> perf-pyb-settrace         diff      diff% (error%)bm_chaos.py                    356.93 ->     261.95 :     -94.98 = -26.610% (+/-0.01%)bm_fannkuch.py                  72.90 ->      61.28 :     -11.62 = -15.940% (+/-0.00%)bm_fft.py                     2331.03 ->    1681.29 :    -649.74 = -27.874% (+/-0.02%)bm_hexiom.py                    45.75 ->      13.35 :     -32.40 = -70.820% (+/-0.00%)bm_nqueens.py                 4111.33 ->    2695.70 :   -1415.63 = -34.432% (+/-0.00%)bm_pidigits.py                 663.16 ->     606.83 :     -56.33 =  -8.494% (+/-0.02%)bm_wordcount.py                 47.36 ->      39.05 :      -8.31 = -17.546% (+/-0.01%)core_import_mpy_multi.py       640.86 ->     483.22 :    -157.64 = -24.598% (+/-0.01%)core_import_mpy_single.py      105.97 ->      84.55 :     -21.42 = -20.213% (+/-0.01%)core_locals.py                  40.28 ->      32.41 :      -7.87 = -19.538% (+/-0.00%)core_qstr.py                   204.36 ->     192.74 :     -11.62 =  -5.686% (+/-0.00%)core_str.py                     25.86 ->      22.55 :      -3.31 = -12.800% (+/-0.00%)core_yield_from.py             355.61 ->       3.84 :    -351.77 = -98.920% (+/-0.00%)misc_aes.py                    407.85 ->      32.12 :    -375.73 = -92.125% (+/-0.01%)misc_mandel.py                3166.90 ->    2791.44 :    -375.46 = -11.856% (+/-0.01%)misc_pystone.py               2378.74 ->     279.43 :   -2099.31 = -88.253% (+/-0.00%)misc_raytrace.py               361.02 ->     256.57 :    -104.45 = -28.932% (+/-0.01%)viper_call0.py                 499.49 ->     537.73 :     +38.24 =  +7.656% (+/-0.00%)viper_call1a.py                522.73 ->     514.72 :      -8.01 =  -1.532% (+/-0.01%)viper_call1b.py                379.77 ->     385.81 :      +6.04 =  +1.590% (+/-0.00%)viper_call1c.py                384.11 ->     389.38 :      +5.27 =  +1.372% (+/-0.01%)viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.00%)viper_call2b.py                335.06 ->     341.81 :      +6.75 =  +2.015% (+/-0.01%)

Tests that are heavy on the VM (misc_aes.py andmisc_pystone.py) run at about half the speed with settrace enabled.

Enabling bothMICROPY_PY_SYS_SETTRACE andMICROPY_PY_SYS_SETTRACE_DUAL_VM (comparing to settrace completely disabled):

diff of scores (higher is better)N=100 M=100                perf-pyb-std -> perf-pyb-settrace-dual-vm         diff      diff% (error%)bm_chaos.py                    356.93 ->     353.71 :      -3.22 =  -0.902% (+/-0.01%)bm_fannkuch.py                  72.90 ->      72.78 :      -0.12 =  -0.165% (+/-0.00%)bm_fft.py                     2331.03 ->    2320.26 :     -10.77 =  -0.462% (+/-0.00%)bm_float.py                   5713.01 ->    5607.72 :    -105.29 =  -1.843% (+/-0.02%)bm_hexiom.py                    45.75 ->      45.06 :      -0.69 =  -1.508% (+/-0.00%)bm_nqueens.py                 4111.33 ->    4035.90 :     -75.43 =  -1.835% (+/-0.00%)bm_pidigits.py                 663.16 ->     639.93 :     -23.23 =  -3.503% (+/-0.02%)bm_wordcount.py                 47.36 ->      46.71 :      -0.65 =  -1.372% (+/-0.00%)core_import_mpy_multi.py       640.86 ->     608.33 :     -32.53 =  -5.076% (+/-0.01%)core_import_mpy_single.py      105.97 ->     101.29 :      -4.68 =  -4.416% (+/-0.01%)core_locals.py                  40.28 ->      39.92 :      -0.36 =  -0.894% (+/-0.00%)core_qstr.py                   204.36 ->     202.55 :      -1.81 =  -0.886% (+/-0.03%)core_str.py                     25.86 ->      25.71 :      -0.15 =  -0.580% (+/-0.00%)core_yield_from.py             355.61 ->     328.12 :     -27.49 =  -7.730% (+/-0.01%)misc_aes.py                    407.85 ->     398.19 :      -9.66 =  -2.369% (+/-0.01%)misc_mandel.py                3166.90 ->    3148.39 :     -18.51 =  -0.584% (+/-0.01%)misc_pystone.py               2378.74 ->    2336.25 :     -42.49 =  -1.786% (+/-0.00%)misc_raytrace.py               361.02 ->     357.93 :      -3.09 =  -0.856% (+/-0.01%)viper_call0.py                 499.49 ->     495.08 :      -4.41 =  -0.883% (+/-0.01%)viper_call1a.py                522.73 ->     514.77 :      -7.96 =  -1.523% (+/-0.00%)viper_call1b.py                379.77 ->     374.68 :      -5.09 =  -1.340% (+/-0.01%)viper_call1c.py                384.11 ->     378.07 :      -6.04 =  -1.572% (+/-0.00%)viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.01%)viper_call2b.py                335.06 ->     332.42 :      -2.64 =  -0.788% (+/-0.00%)

Performance is almost all regained with dual-VM enabled.

Trade-offs and Alternatives

This obviously duplicates the VM, but I tried to make it as simple as possible in the implementation so that it's easy to maintain. In the future this mechanism could potentially be extended to optimise for other things, although I can't think of any obvious candidates for that.

An alternative to having a dual-VM could be to try and optimise the internal VM macrosFRAME_UPDATE andTRACE_TICK. Although that could save code size, no matter how well these macros are optimised it won't be able to reach the performance of a dual-VM architecture.

I'm not sure exactly how CPython implements efficientsys.settrace these days (they've changed a lot recently with their VM), but could look there for alternative ideas as well.

Josverl reacted with heart emoji
Signed-off-by: Damien George <damien@micropython.org>
For testing.Signed-off-by: Damien George <damien@micropython.org>
@github-actions
Copy link

github-actionsbot commentedDec 15, 2025
edited
Loading

Code size report:

Reference:  all: Bump version to 1.28.0-preview. [099cc9f]Comparison: py/vm: Always track current frame when settrace enabled. [merge of bf7cab6]  mpy-cross:    +0 +0.000%    bare-arm:    +0 +0.000% minimal x86:    +0 +0.000%    unix x64:    +0 +0.000% standard      stm32:    +0 +0.000% PYBV10     mimxrt:    +0 +0.000% TEENSY40        rp2:    +0 +0.000% RPI_PICO_W       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS  qemu rv32:    +0 +0.000% VIRT_RV32

It can be `undefined` as well.Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Enabling `MICROPY_PY_SYS_SETTRACE` increases the output wasm file by +41k.Enabling `MICROPY_PY_SYS_SETTRACE_DUAL_VM` on top of that adds a further+17k, but performance is regained.  In particular, `perf_bench/misc_aes.py`runs 200 times faster with dual-VM enabled compared to having it disabledwith `sys.settrace` enabled.Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
So that, after enabling a settrace callback, the outer-most function callwill call into the callback when dual-VM is enabled.Signed-off-by: Damien George <damien@micropython.org>
@codecov
Copy link

codecovbot commentedDec 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.36%. Comparing base (78ff170) to head (bf7cab6).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@##           master   #18571      +/-   ##==========================================- Coverage   98.38%   98.36%   -0.02%==========================================  Files         171      172       +1       Lines       22300    22052     -248     ==========================================- Hits        21939    21691     -248  Misses        361      361

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

🚀 New features to boost your workflow:
  • ❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dpgeorgedpgeorge marked this pull request as draftDecember 15, 2025 04:46
@dpgeorge
Copy link
MemberAuthor

I have no idea why the webassembly CI is failing to run the test suite. Running it locally works fine. Needs further investigation.

There are also some caveats I need to detail when using this option.

So, marking this PR as draft for now.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

port-webassemblypy-coreRelates to py/ directory in source

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

@dpgeorge

[8]ページ先頭

©2009-2025 Movatter.jp