Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.2k
py/emitnative: Further Viper integer-indexed load/store improvements.#17342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This commit extends the generic ASM API by adding the rest of theASM_{LOAD,STORE}[size]_REG_REG_OFFSET macros whenever applicable.The Viper int-indexed load/store code generator was changed to use thoseAPI functions if they are available, falling back to backend-specificimplementations if possible and ultimately to a generic implementation.Right now all backends except for x64 implement load16, load32, andstore32 operations (x64 only implements load16).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
codecovbot commentedMay 22, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@## master #17342 +/- ##==========================================- Coverage 98.54% 98.54% -0.01%========================================== Files 169 169 Lines 21898 21893 -5 ==========================================- Hits 21579 21574 -5 Misses 319 319 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
github-actionsbot commentedMay 22, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Code size report:
|
This commit lets the Thumb native code generator backend emit ARMv7-Mspecific opcodes for indexed load/store operations if possible.Now T3 opcode encodings are used if the generator backend is configuredto allow emitting ARMv7-M opcodes and if the (unsigned) scaled indexfits in 12 bits. Or, in other words, LDR{B,H}.W and STR{B,H}.W opcodesare now emitted if possible.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit updates the existing specialised implementations forint-indexed 32-bit load and store operations, and adds a specialisedimplementation for int-indexed 16-bit load.The 32-bit operations relied on the fact that their applicability waslimited to a specific range, falling back on a generic implementationotherwise. Introducing a single entry point for each int-indexedload/store operation size would break that assumption. Now those twooperations contain fallback code to generate working code by themselvesinstead of raising an exception.The 16-bit operation instead simply did not have any range check, but itwas not exposed directly to the Viper emitter. When a 16-bitint-indexed load entry point was introduced, the existing implementationwould fail when accessing memory outside its 0..255 halfwords range. Aspecialised implementation is now present, performing fewer operationsthan the existing Viper emitter equivalent.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit removes redundant RV32 implementations of certainint-indexed code generation operations (32-bit load/store and 16-bitload).Those operations were already available as part of the native emitterAPI but were not exposed to the Viper code generator. As part of theintroduction of more specialised load and store API calls toint-indexed Viper load/store generator bits, the existing native emitterimplementations are reused, thus making the Viper implementationsredundant.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit extends the range for int-indexed load/store opcodegenerators, making them emit correct code sequences for offsets thatspan more than 12 bits.This is necessary due to those generator bits being also used in theViper emitter, where it's more probable to reference offsets that cannot be embedded in the LDR/STR opcodes.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
cb22367
to59f8f83
Compare
Uh oh!
There was an error while loading.Please reload this page.
Summary
This PR introduces improvements in the code that gets generated for integer-indexed load and store operations for various architectures.
The Viper integer-indexed generator bits are now able to utilise a specialised operation implementation if the platform emitter exposes one. This means that macros named
ASM_{LOAD,STORE}{8,16,32}_REG_REG_OFFSET
are used if they are present in the platform backend.ASM_LOAD16_REG_REG_OFFSET
,ASM_LOAD32_REG_REG_OFFSET
, andASM_STORE32_REG_REG_OFFSET
were already present since they are used by the native emitter, so their implementations are used also by the Viper emitter now. This may or may not reduce the code size, but it streamlines the current behaviour by having one single optimised implementation on each platform.However, those three implementations are now exposed to scenarios where accessing larger than usual offsets is a somewhat probable occurrence - requiring some changes to make them work in all possible cases on Arm, Thumb, and Xtensa. The full set of integer-indexed load/store is not implemented for all Viper operations on all platforms, as the focus of this PR is ARMv7-M improvements. The rest of the changes is there to adapt the other platform backends to the infrastructure changes needed for ARMv7-M. The remaining load/store operations' code will come later in a different PR if the code size increase can be kept to a minimum.
The Arm emitter is now able to generate
LDR
andSTR
integer-indexed opcodes whose offsets can span the full 32 bits address space. The original behaviour would be undefined if the offset was larger than 12 bits, as it would corrupt the upper bits of the generated opcode itself.The Thumb emitter is now able to generate ARMv7-M opcodes for all data sizes if both ARMv7-M opcodes are allowed and if the opcode offset spans between 6 and 12 bits. In that case a single 32 bits opcode will be generated (
LDR{B,H}.W
andSTR{B,H}.W
), falling back to the existing generic code sequence generator if the offset is too big to be embedded in a single opcode.The Xtensa emitter is now able to generate opcode sequences spanning the full 32 bits address space for 32 bits loads and stores, and for 16 bit loads. The existing implementations would raise a
RuntimeError
if the offset could not be embedded in a single opcode.Testing
The native and viper tests in
tests/micropython
were ran successfully on QEMUMPS2-AN385
,SABRELITE
, andVIRT_RV32
boards, on a NodeMCU v3 8266 board, on a generic ESP32 D0WD board, and on x64 via the Unix port.Trade-offs and Alternatives
I've tried to keep the extra code footprint as limited as possible without having to make the code too complex to follow (leaving some potential ARMv7-M only optimisations out as well, taking up a bit more space).
This is mostly evident for armv6/armv7 targets, where the overhead should be ~200 bytes. This was offset a bit by moving out some code from the native emitter to the platform backends, where a simpler implementation could suffice.
However there are some benefits in more optimised code being generated, and in a slight decrease of complexity in the Viper emitter as less platform-specific code is present there.