The Viper integer-indexed generator bits are now able to utilise a specialised operation implementation if the platform emitter exposes one. This means that macros namedASM_{LOAD,STORE}{8,16,32}_REG_REG_OFFSET are used if they are present in the platform backend.

ASM_LOAD16_REG_REG_OFFSET,ASM_LOAD32_REG_REG_OFFSET, andASM_STORE32_REG_REG_OFFSET were already present since they are used by the native emitter, so their implementations are used also by the Viper emitter now. This may or may not reduce the code size, but it streamlines the current behaviour by having one single optimised implementation on each platform.

However, those three implementations are now exposed to scenarios where accessing larger than usual offsets is a somewhat probable occurrence - requiring some changes to make them work in all possible cases on Arm, Thumb, and Xtensa. The full set of platform-optimised integer indexed load/store operations is not implemented for all Viper operations on all platforms, as the focus of this PR is ARMv7-M improvements. The rest of the changes is there to adapt the other platform backends to the infrastructure changes needed for ARMv7-M. The remaining load/store operations' code will come later in a different PR if the code size increase can be kept to a minimum.

The Arm emitter is now able to generateLDR andSTR integer-indexed opcodes whose offsets can span the full 32 bits address space. The original behaviour would be undefined if the offset was larger than 12 bits, as it would corrupt the upper bits of the generated opcode itself.

The Thumb emitter is now able to generate ARMv7-M opcodes for all data sizes if both ARMv7-M opcodes are allowed and if the opcode offset spans between 6 and 12 bits. In that case a single 32 bits opcode will be generated (LDR{B,H}.W andSTR{B,H}.W), falling back to the existing generic code sequence generator if the offset is too big to be embedded in a single opcode.

The Xtensa emitter is now able to generate opcode sequences spanning the full 32 bits address space for 32 bits loads and stores, and for 16 bit loads. The existing implementations would raise aRuntimeError if the offset could not be embedded in a single opcode.

Testing

The native and viper tests intests/micropython ran successfully on QEMUMPS2-AN385,SABRELITE, andVIRT_RV32 boards, on a NodeMCU v3 8266 board, on a generic ESP32 D0WD board, and on x64 via the Unix port.

Trade-offs and Alternatives

I've tried to keep the extra code footprint as limited as possible without having to make the code too complex to follow (leaving some potential ARMv7-M only optimisations out as well, taking up a bit more space).

This is mostly evident for armv6/armv7 targets, where the overhead should be ~200 bytes. This was offset a bit by moving out some code from the native emitter to the platform backends, where a simpler implementation could suffice.

However there are some benefits in more optimised code being generated, and in a slight decrease of complexity in the Viper emitter as less platform-specific code is present there.

Copy link

codecovbot commentedMay 22, 2025•
edited
Loading

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base(e43a384) to head(43f6013).
Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@##           master   #17342      +/-   ##==========================================- Coverage   98.54%   98.54%   -0.01%==========================================  Files         169      169                Lines       21943    21938       -5     ==========================================- Hits        21623    21618       -5  Misses        320      320

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

🚀 New features to boost your workflow:

❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

github-actionsbot commentedMay 22, 2025•
edited
Loading

Code size report:

   bare-arm:    +0 +0.000% minimal x86:    +0 +0.000%    unix x64:   -80 -0.009% standard      stm32:  +184 +0.047% PYBV10     mimxrt:  +176 +0.047% TEENSY40        rp2:  -152 -0.017% RPI_PICO_W       samd:  +188 +0.070% ADAFRUIT_ITSYBITSY_M4_EXPRESS  qemu rv32:   -58 -0.013% VIRT_RV32

agatti force-pushed theoh-no-more-viper-improvements branch 2 times, most recently from59f8f83 to5e3fe57Compare

May 28, 2025 04:18

dpgeorge reviewed

May 29, 2025

View reviewed changes

py/asmx86.h OutdatedShow resolvedHide resolved

dpgeorge added this to therelease-1.26.0 milestone

Jun 4, 2025

agatti force-pushed theoh-no-more-viper-improvements branch from5e3fe57 to0d4598eCompare

June 4, 2025 20:35

Copy link

ContributorAuthor

agatti commentedJun 4, 2025•
edited
Loading

Hrm, the test runner change is clashing withnanbox? Looking into it now...

Yep,nanbox doesn't support native emitters and CI requests a full test run, which forces native code output, and you can guess the rest.

agatti force-pushed theoh-no-more-viper-improvements branch from8e2bd5b toe801a72Compare

June 4, 2025 21:24

Copy link

ContributorAuthor

agatti commentedJun 4, 2025

There are lots of ways to work around thenanbox CI issue - please let me know if the one I chose is OK with you.

dpgeorge reviewed

Jun 5, 2025

View reviewed changes

py/asmxtensa.c OutdatedShow resolvedHide resolved

dpgeorge reviewed

Jun 5, 2025

View reviewed changes

py/asmthumb.h

		#define ASM_STORE_REG_REG_OFFSET(as, reg_src, reg_base, word_offset) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), (word_offset))
		#define ASM_STORE8_REG_REG(as, reg_src, reg_base) asm_thumb_strb_rlo_rlo_i5((as), (reg_src), (reg_base), 0)
		#define ASM_STORE16_REG_REG(as, reg_src, reg_base) asm_thumb_strh_rlo_rlo_i5((as), (reg_src), (reg_base), 0)
		#define ASM_STORE32_REG_REG(as, reg_src, reg_base) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), 0)

Copy link

Member

dpgeorgeJun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These inline functions likeasm_thumb_strb_rlo_rlo_i5() are now unused, so could be removed. But maybe worth keeping them around in case they are needed again one day?

Copy link

ContributorAuthor

agattiJun 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

They're not adding any code to the final binary and they're under version control anyway. I'd personally take them out but maybe some external code is relying on those functions?

Copy link

Member

dpgeorgeJun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's leave them in for now, until all these changes to the native emitter settle down.

dpgeorge reviewed

Jun 5, 2025

View reviewed changes

py/asmarm.c OutdatedShow resolvedHide resolved

agatti force-pushed theoh-no-more-viper-improvements branch 2 times, most recently from07818dc to15cb103Compare

June 9, 2025 06:05

agattiand others added10 commits

June 10, 2025 11:29

py/emitnative: Let Viper int-indexed code use appropriate operands.

78ee1ba

This commit extends the generic ASM API by adding the rest of theASM_{LOAD,STORE}[size]_REG_REG_OFFSET macros whenever applicable.The Viper int-indexed load/store code generator was changed to use thoseAPI functions if they are available, falling back to backend-specificimplementations if possible and ultimately to a generic implementation.Right now all backends except for x64 implement load16, load32, andstore32 operations (x64 only implements load16).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmthumb: Extend load/store generators with ARMv7-M opcodes.

1f5ba69

This commit lets the Thumb native code generator backend emit ARMv7-Mspecific opcodes for indexed load/store operations if possible.Now T3 opcode encodings are used if the generator backend is configuredto allow emitting ARMv7-M opcodes and if the (unsigned) scaled indexfits in 12 bits.  Or, in other words, LDR{B,H}.W and STR{B,H}.W opcodesare now emitted if possible.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmxtensa: Extend existing specialised load/store operations range.

84ad2c6

This commit updates the existing specialised implementations forint-indexed 32-bit load and store operations, and adds a specialisedimplementation for int-indexed 16-bit load.The 32-bit operations relied on the fact that their applicability waslimited to a specific range, falling back on a generic implementationotherwise.  Introducing a single entry point for each int-indexedload/store operation size would break that assumption.  Now those twooperations contain fallback code to generate working code by themselvesinstead of raising an exception.The 16-bit operation instead simply did not have any range check, but itwas not exposed directly to the Viper emitter.  When a 16-bitint-indexed load entry point was introduced, the existing implementationwould fail when accessing memory outside its 0..255 halfwords range.  Aspecialised implementation is now present, performing fewer operationsthan the existing Viper emitter equivalent.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/emitnative: Remove redundant RV32 Viper int-indexed code.

901c96d

This commit removes redundant RV32 implementations of certainint-indexed code generation operations (32-bit load/store and 16-bitload).Those operations were already available as part of the native emitterAPI but were not exposed to the Viper code generator.  As part of theintroduction of more specialised load and store API calls toint-indexed Viper load/store generator bits, the existing native emitterimplementations are reused, thus making the Viper implementationsredundant.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmarm: Extend int-indexed 32-bit load/store offset ranges.

bbab2e9

This commit extends the range for int-indexed load/store opcodegenerators, making them emit correct code sequences for offsets thatspan more than 12 bits.This is necessary due to those generator bits being also used in theViper emitter, where it's more probable to reference offsets that cannot be embedded in the LDR/STR opcodes.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmarm: Give a proper name to the temporary register.

5b90d6d

This commit performs a small refactoring on the Arm native emitter, byrenaming all but one instance of ASM_ARM_REG_R8 into REG_TEMP.ASM_ARM_REG_R8 is the temporary register used by the emitter whenoperations cannot overwrite the value of a particular register and someextra storage is needed.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

tests/run-tests.py: Unconditionally enable native tests if asked.

c1c73d9

This commit lets the test runner enumerate and run native tests if thefeature check fails but native tests were explicitly requested from thecommand line.The old behaviour would disable native tests anyway if the feature checkfailed, however this hid a bug in the x86 native emitter that would betriggered even during the feature check.  That meant the test suitewould pass on x86 even with a broken emitter, as those tests would havebeen skipped anyway.Now, if the user asks for native code it will get native code out of therunner no matter what.Co-authored-by: Damien George <damien@micropython.org>Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

tools/ci.sh: Fix nanbox CI test runs.

0da22b2

This commit fixes CI test runs for the `nanbox` target, which werebroken by the unconditional native emitter code output changes in thetest runner.The `nanbox` configuration does not enable native emitters of any kind,and with a full test run that includes executing emitted native codethings would break when doing CI runs.This is worked around by introducing a common subset of tests that donot involve the native emitter, and a more comprehensive set of teststhat include both non-emitter and emitter tests.  The `nanbox` CI testrun will stop at the first subset, whilst other configurations will runthat and execute further tests.Function names have been kept the same for steps that involve nativecode, with the `nanbox` subset having another one.  This should nottrigger any breakage in existing CI configurations or external scripts.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmxtensa: Extend BCCZ range to 18 bits.

80b823b

This commit lets the native emitter backend extends the range of theBCCZ family of opcodes (BEQZ, BNEZ, BLTZ, BGEZ) from 12 bits to 18bits.The test suite contains some test files that, when compiled into nativecode, would require BCCZ jumps outside the (signed) 12 bits range.  Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range.  However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (3 bytes for a 12-bitsoffset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

py/asmxtensa: Extend BCC range to 18 bits.

43f6013

This commit lets the native emitter backend extends the range of theBCC family of opcodes (BALL, BANY, BBC, BBS, BEQ, BGE, BGEU, BLT,BLTU, BNALL, BNE, BNONE) from 8 bits to 18 bits.The test suite contains some test files that, when compiled into nativecode, would require BCC jumps outside the (signed) 8 bits range.  Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range.  However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (8-bits offset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

dpgeorge force-pushed theoh-no-more-viper-improvements branch from15cb103 to43f6013Compare

June 10, 2025 02:56

dpgeorge approved these changes

Jun 10, 2025

View reviewed changes

Copy link

Member

dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for updating/fixing. It looks like everything is resolved now.

Tested on ESP8266_GENERIC and ESP32_GENERIC, withrun-tests.py --via-mpy --emit native. Not everything passes, but those that fail did already fail before this PR.

dpgeorge merged commit43f6013 intomicropython:master

Jun 10, 2025

69 checks passed

agatti deleted the oh-no-more-viper-improvements branch

June 10, 2025 09:41

Labels

None yet

2 participants

Movatterモバイル変換

Uh oh!

py/emitnative: Further Viper integer-indexed load/store improvements.#17342

py/emitnative: Further Viper integer-indexed load/store improvements.#17342

Uh oh!

Conversation

agatti commentedMay 22, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Summary

Testing

Trade-offs and Alternatives

Uh oh!

codecovbot commentedMay 22, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actionsbot commentedMay 22, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

Uh oh!

agatti commentedJun 4, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

agatti commentedJun 4, 2025

Uh oh!

Uh oh!

dpgeorgeJun 5, 2025

Choose a reason for hiding this comment

Uh oh!

agattiJun 8, 2025

Choose a reason for hiding this comment

Uh oh!

dpgeorgeJun 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dpgeorge left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

agatti commentedMay 22, 2025•
edited
Loading

codecovbot commentedMay 22, 2025•
edited
Loading

github-actionsbot commentedMay 22, 2025•
edited
Loading

agatti commentedJun 4, 2025•
edited
Loading