Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.3k
py/emitnative: Further Viper integer-indexed load/store improvements.#17342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
py/emitnative: Further Viper integer-indexed load/store improvements.#17342
Uh oh!
There was an error while loading.Please reload this page.
Conversation
codecovbot commentedMay 22, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@## master #17342 +/- ##==========================================- Coverage 98.54% 98.54% -0.01%========================================== Files 169 169 Lines 21943 21938 -5 ==========================================- Hits 21623 21618 -5 Misses 320 320 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
github-actionsbot commentedMay 22, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Code size report:
|
59f8f83
to5e3fe57
CompareUh oh!
There was an error while loading.Please reload this page.
5e3fe57
to0d4598e
Compareagatti commentedJun 4, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Hrm, the test runner change is clashing with Yep, |
8e2bd5b
toe801a72
CompareThere are lots of ways to work around the |
Uh oh!
There was an error while loading.Please reload this page.
#define ASM_STORE_REG_REG_OFFSET(as, reg_src, reg_base, word_offset) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), (word_offset)) | ||
#define ASM_STORE8_REG_REG(as, reg_src, reg_base) asm_thumb_strb_rlo_rlo_i5((as), (reg_src), (reg_base), 0) | ||
#define ASM_STORE16_REG_REG(as, reg_src, reg_base) asm_thumb_strh_rlo_rlo_i5((as), (reg_src), (reg_base), 0) | ||
#define ASM_STORE32_REG_REG(as, reg_src, reg_base) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
These inline functions likeasm_thumb_strb_rlo_rlo_i5()
are now unused, so could be removed. But maybe worth keeping them around in case they are needed again one day?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
They're not adding any code to the final binary and they're under version control anyway. I'd personally take them out but maybe some external code is relying on those functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Let's leave them in for now, until all these changes to the native emitter settle down.
Uh oh!
There was an error while loading.Please reload this page.
07818dc
to15cb103
CompareThis commit extends the generic ASM API by adding the rest of theASM_{LOAD,STORE}[size]_REG_REG_OFFSET macros whenever applicable.The Viper int-indexed load/store code generator was changed to use thoseAPI functions if they are available, falling back to backend-specificimplementations if possible and ultimately to a generic implementation.Right now all backends except for x64 implement load16, load32, andstore32 operations (x64 only implements load16).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the Thumb native code generator backend emit ARMv7-Mspecific opcodes for indexed load/store operations if possible.Now T3 opcode encodings are used if the generator backend is configuredto allow emitting ARMv7-M opcodes and if the (unsigned) scaled indexfits in 12 bits. Or, in other words, LDR{B,H}.W and STR{B,H}.W opcodesare now emitted if possible.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit updates the existing specialised implementations forint-indexed 32-bit load and store operations, and adds a specialisedimplementation for int-indexed 16-bit load.The 32-bit operations relied on the fact that their applicability waslimited to a specific range, falling back on a generic implementationotherwise. Introducing a single entry point for each int-indexedload/store operation size would break that assumption. Now those twooperations contain fallback code to generate working code by themselvesinstead of raising an exception.The 16-bit operation instead simply did not have any range check, but itwas not exposed directly to the Viper emitter. When a 16-bitint-indexed load entry point was introduced, the existing implementationwould fail when accessing memory outside its 0..255 halfwords range. Aspecialised implementation is now present, performing fewer operationsthan the existing Viper emitter equivalent.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit removes redundant RV32 implementations of certainint-indexed code generation operations (32-bit load/store and 16-bitload).Those operations were already available as part of the native emitterAPI but were not exposed to the Viper code generator. As part of theintroduction of more specialised load and store API calls toint-indexed Viper load/store generator bits, the existing native emitterimplementations are reused, thus making the Viper implementationsredundant.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit extends the range for int-indexed load/store opcodegenerators, making them emit correct code sequences for offsets thatspan more than 12 bits.This is necessary due to those generator bits being also used in theViper emitter, where it's more probable to reference offsets that cannot be embedded in the LDR/STR opcodes.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit performs a small refactoring on the Arm native emitter, byrenaming all but one instance of ASM_ARM_REG_R8 into REG_TEMP.ASM_ARM_REG_R8 is the temporary register used by the emitter whenoperations cannot overwrite the value of a particular register and someextra storage is needed.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the test runner enumerate and run native tests if thefeature check fails but native tests were explicitly requested from thecommand line.The old behaviour would disable native tests anyway if the feature checkfailed, however this hid a bug in the x86 native emitter that would betriggered even during the feature check. That meant the test suitewould pass on x86 even with a broken emitter, as those tests would havebeen skipped anyway.Now, if the user asks for native code it will get native code out of therunner no matter what.Co-authored-by: Damien George <damien@micropython.org>Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit fixes CI test runs for the `nanbox` target, which werebroken by the unconditional native emitter code output changes in thetest runner.The `nanbox` configuration does not enable native emitters of any kind,and with a full test run that includes executing emitted native codethings would break when doing CI runs.This is worked around by introducing a common subset of tests that donot involve the native emitter, and a more comprehensive set of teststhat include both non-emitter and emitter tests. The `nanbox` CI testrun will stop at the first subset, whilst other configurations will runthat and execute further tests.Function names have been kept the same for steps that involve nativecode, with the `nanbox` subset having another one. This should nottrigger any breakage in existing CI configurations or external scripts.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the native emitter backend extends the range of theBCCZ family of opcodes (BEQZ, BNEZ, BLTZ, BGEZ) from 12 bits to 18bits.The test suite contains some test files that, when compiled into nativecode, would require BCCZ jumps outside the (signed) 12 bits range. Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range. However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (3 bytes for a 12-bitsoffset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the native emitter backend extends the range of theBCC family of opcodes (BALL, BANY, BBC, BBS, BEQ, BGE, BGEU, BLT,BLTU, BNALL, BNE, BNONE) from 8 bits to 18 bits.The test suite contains some test files that, when compiled into nativecode, would require BCC jumps outside the (signed) 8 bits range. Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range. However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (8-bits offset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
15cb103
to43f6013
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for updating/fixing. It looks like everything is resolved now.
Tested on ESP8266_GENERIC and ESP32_GENERIC, withrun-tests.py --via-mpy --emit native
. Not everything passes, but those that fail did already fail before this PR.
43f6013
intomicropython:masterUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Summary
This PR introduces improvements in the code that gets generated for integer-indexed load and store operations for various architectures.
The Viper integer-indexed generator bits are now able to utilise a specialised operation implementation if the platform emitter exposes one. This means that macros named
ASM_{LOAD,STORE}{8,16,32}_REG_REG_OFFSET
are used if they are present in the platform backend.ASM_LOAD16_REG_REG_OFFSET
,ASM_LOAD32_REG_REG_OFFSET
, andASM_STORE32_REG_REG_OFFSET
were already present since they are used by the native emitter, so their implementations are used also by the Viper emitter now. This may or may not reduce the code size, but it streamlines the current behaviour by having one single optimised implementation on each platform.However, those three implementations are now exposed to scenarios where accessing larger than usual offsets is a somewhat probable occurrence - requiring some changes to make them work in all possible cases on Arm, Thumb, and Xtensa. The full set of platform-optimised integer indexed load/store operations is not implemented for all Viper operations on all platforms, as the focus of this PR is ARMv7-M improvements. The rest of the changes is there to adapt the other platform backends to the infrastructure changes needed for ARMv7-M. The remaining load/store operations' code will come later in a different PR if the code size increase can be kept to a minimum.
The Arm emitter is now able to generate
LDR
andSTR
integer-indexed opcodes whose offsets can span the full 32 bits address space. The original behaviour would be undefined if the offset was larger than 12 bits, as it would corrupt the upper bits of the generated opcode itself.The Thumb emitter is now able to generate ARMv7-M opcodes for all data sizes if both ARMv7-M opcodes are allowed and if the opcode offset spans between 6 and 12 bits. In that case a single 32 bits opcode will be generated (
LDR{B,H}.W
andSTR{B,H}.W
), falling back to the existing generic code sequence generator if the offset is too big to be embedded in a single opcode.The Xtensa emitter is now able to generate opcode sequences spanning the full 32 bits address space for 32 bits loads and stores, and for 16 bit loads. The existing implementations would raise a
RuntimeError
if the offset could not be embedded in a single opcode.Testing
The native and viper tests in
tests/micropython
ran successfully on QEMUMPS2-AN385
,SABRELITE
, andVIRT_RV32
boards, on a NodeMCU v3 8266 board, on a generic ESP32 D0WD board, and on x64 via the Unix port.Trade-offs and Alternatives
I've tried to keep the extra code footprint as limited as possible without having to make the code too complex to follow (leaving some potential ARMv7-M only optimisations out as well, taking up a bit more space).
This is mostly evident for armv6/armv7 targets, where the overhead should be ~200 bytes. This was offset a bit by moving out some code from the native emitter to the platform backends, where a simpler implementation could suffice.
However there are some benefits in more optimised code being generated, and in a slight decrease of complexity in the Viper emitter as less platform-specific code is present there.