Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

py/emitnative: Further Viper integer-indexed load/store improvements.#17342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

agatti
Copy link
Contributor

@agattiagatti commentedMay 22, 2025
edited
Loading

Summary

This PR introduces improvements in the code that gets generated for integer-indexed load and store operations for various architectures.

The Viper integer-indexed generator bits are now able to utilise a specialised operation implementation if the platform emitter exposes one. This means that macros namedASM_{LOAD,STORE}{8,16,32}_REG_REG_OFFSET are used if they are present in the platform backend.

ASM_LOAD16_REG_REG_OFFSET,ASM_LOAD32_REG_REG_OFFSET, andASM_STORE32_REG_REG_OFFSET were already present since they are used by the native emitter, so their implementations are used also by the Viper emitter now. This may or may not reduce the code size, but it streamlines the current behaviour by having one single optimised implementation on each platform.

However, those three implementations are now exposed to scenarios where accessing larger than usual offsets is a somewhat probable occurrence - requiring some changes to make them work in all possible cases on Arm, Thumb, and Xtensa. The full set of platform-optimised integer indexed load/store operations is not implemented for all Viper operations on all platforms, as the focus of this PR is ARMv7-M improvements. The rest of the changes is there to adapt the other platform backends to the infrastructure changes needed for ARMv7-M. The remaining load/store operations' code will come later in a different PR if the code size increase can be kept to a minimum.

The Arm emitter is now able to generateLDR andSTR integer-indexed opcodes whose offsets can span the full 32 bits address space. The original behaviour would be undefined if the offset was larger than 12 bits, as it would corrupt the upper bits of the generated opcode itself.

The Thumb emitter is now able to generate ARMv7-M opcodes for all data sizes if both ARMv7-M opcodes are allowed and if the opcode offset spans between 6 and 12 bits. In that case a single 32 bits opcode will be generated (LDR{B,H}.W andSTR{B,H}.W), falling back to the existing generic code sequence generator if the offset is too big to be embedded in a single opcode.

The Xtensa emitter is now able to generate opcode sequences spanning the full 32 bits address space for 32 bits loads and stores, and for 16 bit loads. The existing implementations would raise aRuntimeError if the offset could not be embedded in a single opcode.

Testing

The native and viper tests intests/micropython ran successfully on QEMUMPS2-AN385,SABRELITE, andVIRT_RV32 boards, on a NodeMCU v3 8266 board, on a generic ESP32 D0WD board, and on x64 via the Unix port.

Trade-offs and Alternatives

I've tried to keep the extra code footprint as limited as possible without having to make the code too complex to follow (leaving some potential ARMv7-M only optimisations out as well, taking up a bit more space).

This is mostly evident for armv6/armv7 targets, where the overhead should be ~200 bytes. This was offset a bit by moving out some code from the native emitter to the platform backends, where a simpler implementation could suffice.

However there are some benefits in more optimised code being generated, and in a slight decrease of complexity in the Viper emitter as less platform-specific code is present there.

@codecovCodecov
Copy link

codecovbot commentedMay 22, 2025
edited
Loading

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base(e43a384) to head(43f6013).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@##           master   #17342      +/-   ##==========================================- Coverage   98.54%   98.54%   -0.01%==========================================  Files         169      169                Lines       21943    21938       -5     ==========================================- Hits        21623    21618       -5  Misses        320      320

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

🚀 New features to boost your workflow:
  • ❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actionsGitHub Actions
Copy link

github-actionsbot commentedMay 22, 2025
edited
Loading

Code size report:

   bare-arm:    +0 +0.000% minimal x86:    +0 +0.000%    unix x64:   -80 -0.009% standard      stm32:  +184 +0.047% PYBV10     mimxrt:  +176 +0.047% TEENSY40        rp2:  -152 -0.017% RPI_PICO_W       samd:  +188 +0.070% ADAFRUIT_ITSYBITSY_M4_EXPRESS  qemu rv32:   -58 -0.013% VIRT_RV32

@agattiagattiforce-pushed theoh-no-more-viper-improvements branch 2 times, most recently from59f8f83 to5e3fe57CompareMay 28, 2025 04:18
@dpgeorgedpgeorge added this to therelease-1.26.0 milestoneJun 4, 2025
@agattiagattiforce-pushed theoh-no-more-viper-improvements branch from5e3fe57 to0d4598eCompareJune 4, 2025 20:35
@agatti
Copy link
ContributorAuthor

agatti commentedJun 4, 2025
edited
Loading

Hrm, the test runner change is clashing withnanbox? Looking into it now...

Yep,nanbox doesn't support native emitters and CI requests a full test run, which forces native code output, and you can guess the rest.

@agattiagattiforce-pushed theoh-no-more-viper-improvements branch from8e2bd5b toe801a72CompareJune 4, 2025 21:24
@agatti
Copy link
ContributorAuthor

There are lots of ways to work around thenanbox CI issue - please let me know if the one I chose is OK with you.

#define ASM_STORE_REG_REG_OFFSET(as, reg_src, reg_base, word_offset) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), (word_offset))
#define ASM_STORE8_REG_REG(as, reg_src, reg_base) asm_thumb_strb_rlo_rlo_i5((as), (reg_src), (reg_base), 0)
#define ASM_STORE16_REG_REG(as, reg_src, reg_base) asm_thumb_strh_rlo_rlo_i5((as), (reg_src), (reg_base), 0)
#define ASM_STORE32_REG_REG(as, reg_src, reg_base) asm_thumb_str_rlo_rlo_i5((as), (reg_src), (reg_base), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These inline functions likeasm_thumb_strb_rlo_rlo_i5() are now unused, so could be removed. But maybe worth keeping them around in case they are needed again one day?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

They're not adding any code to the final binary and they're under version control anyway. I'd personally take them out but maybe some external code is relying on those functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's leave them in for now, until all these changes to the native emitter settle down.

@agattiagattiforce-pushed theoh-no-more-viper-improvements branch 2 times, most recently from07818dc to15cb103CompareJune 9, 2025 06:05
agattiand others added10 commitsJune 10, 2025 11:29
This commit extends the generic ASM API by adding the rest of theASM_{LOAD,STORE}[size]_REG_REG_OFFSET macros whenever applicable.The Viper int-indexed load/store code generator was changed to use thoseAPI functions if they are available, falling back to backend-specificimplementations if possible and ultimately to a generic implementation.Right now all backends except for x64 implement load16, load32, andstore32 operations (x64 only implements load16).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the Thumb native code generator backend emit ARMv7-Mspecific opcodes for indexed load/store operations if possible.Now T3 opcode encodings are used if the generator backend is configuredto allow emitting ARMv7-M opcodes and if the (unsigned) scaled indexfits in 12 bits.  Or, in other words, LDR{B,H}.W and STR{B,H}.W opcodesare now emitted if possible.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit updates the existing specialised implementations forint-indexed 32-bit load and store operations, and adds a specialisedimplementation for int-indexed 16-bit load.The 32-bit operations relied on the fact that their applicability waslimited to a specific range, falling back on a generic implementationotherwise.  Introducing a single entry point for each int-indexedload/store operation size would break that assumption.  Now those twooperations contain fallback code to generate working code by themselvesinstead of raising an exception.The 16-bit operation instead simply did not have any range check, but itwas not exposed directly to the Viper emitter.  When a 16-bitint-indexed load entry point was introduced, the existing implementationwould fail when accessing memory outside its 0..255 halfwords range.  Aspecialised implementation is now present, performing fewer operationsthan the existing Viper emitter equivalent.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit removes redundant RV32 implementations of certainint-indexed code generation operations (32-bit load/store and 16-bitload).Those operations were already available as part of the native emitterAPI but were not exposed to the Viper code generator.  As part of theintroduction of more specialised load and store API calls toint-indexed Viper load/store generator bits, the existing native emitterimplementations are reused, thus making the Viper implementationsredundant.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit extends the range for int-indexed load/store opcodegenerators, making them emit correct code sequences for offsets thatspan more than 12 bits.This is necessary due to those generator bits being also used in theViper emitter, where it's more probable to reference offsets that cannot be embedded in the LDR/STR opcodes.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit performs a small refactoring on the Arm native emitter, byrenaming all but one instance of ASM_ARM_REG_R8 into REG_TEMP.ASM_ARM_REG_R8 is the temporary register used by the emitter whenoperations cannot overwrite the value of a particular register and someextra storage is needed.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the test runner enumerate and run native tests if thefeature check fails but native tests were explicitly requested from thecommand line.The old behaviour would disable native tests anyway if the feature checkfailed, however this hid a bug in the x86 native emitter that would betriggered even during the feature check.  That meant the test suitewould pass on x86 even with a broken emitter, as those tests would havebeen skipped anyway.Now, if the user asks for native code it will get native code out of therunner no matter what.Co-authored-by: Damien George <damien@micropython.org>Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit fixes CI test runs for the `nanbox` target, which werebroken by the unconditional native emitter code output changes in thetest runner.The `nanbox` configuration does not enable native emitters of any kind,and with a full test run that includes executing emitted native codethings would break when doing CI runs.This is worked around by introducing a common subset of tests that donot involve the native emitter, and a more comprehensive set of teststhat include both non-emitter and emitter tests.  The `nanbox` CI testrun will stop at the first subset, whilst other configurations will runthat and execute further tests.Function names have been kept the same for steps that involve nativecode, with the `nanbox` subset having another one.  This should nottrigger any breakage in existing CI configurations or external scripts.Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the native emitter backend extends the range of theBCCZ family of opcodes (BEQZ, BNEZ, BLTZ, BGEZ) from 12 bits to 18bits.The test suite contains some test files that, when compiled into nativecode, would require BCCZ jumps outside the (signed) 12 bits range.  Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range.  However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (3 bytes for a 12-bitsoffset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the native emitter backend extends the range of theBCC family of opcodes (BALL, BANY, BBC, BBS, BEQ, BGE, BGEU, BLT,BLTU, BNALL, BNE, BNONE) from 8 bits to 18 bits.The test suite contains some test files that, when compiled into nativecode, would require BCC jumps outside the (signed) 8 bits range.  Inthis case either the MicroPython interpreter or mpy-cross would raise anexception, not running the test when using the "--via-mpy --emit native"command line options with the test runner.This comes with a 3 bytes penalty on each forward jump, bringing thefootprint of those jumps to 6 bytes each, as a longer opcode sequencehas to be emitted to let jumps access a larger range.  However, this isslightly offset by the fact that backward jumps can be emitted with asingle opcode if the range is small enough (8-bits offset).Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
@dpgeorgedpgeorgeforce-pushed theoh-no-more-viper-improvements branch from15cb103 to43f6013CompareJune 10, 2025 02:56
Copy link
Member

@dpgeorgedpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for updating/fixing. It looks like everything is resolved now.

Tested on ESP8266_GENERIC and ESP32_GENERIC, withrun-tests.py --via-mpy --emit native. Not everything passes, but those that fail did already fail before this PR.

@dpgeorgedpgeorge merged commit43f6013 intomicropython:masterJun 10, 2025
69 checks passed
@agattiagatti deleted the oh-no-more-viper-improvements branchJune 10, 2025 09:41
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@dpgeorgedpgeorgedpgeorge approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
release-1.26.0
Development

Successfully merging this pull request may close these issues.

2 participants
@agatti@dpgeorge

[8]ページ先頭

©2009-2025 Movatter.jp