Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ARM64 - Optimizing a % b operations#65535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
TIHan merged 35 commits intodotnet:mainfromTIHan:arm64-opt-mod
Mar 14, 2022
Merged

Conversation

TIHan
Copy link
Contributor

@TIHanTIHan commentedFeb 18, 2022
edited
Loading

Addressingpart of this issue:#34937

Description

There are various ways to optimize% for integers on ARM64.

a % b can be transformed intoa & (b - 1) if they are unsigned integers andb is a constant with the power of 2.

Acceptance Criteria

  • Add Tests (asmdiffs cover this)

kunalspathak reacted with thumbs up emoji
@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelFeb 18, 2022
@ghostghost assignedTIHanFeb 18, 2022
@ghost
Copy link

Tagging subscribers to this area:@JulieLeeMSFT
See info inarea-owners.md if you want to be subscribed.

Issue Details

Addressing this issue:#34937

Description

There are various ways to optimize% for integers on ARM64.

Example:
a % b can be transformed intoa & (b - 1) if they are unsigned integers andb is a constant with the power of 2.

Acceptance Criteria

  • Add signed int mod optimization with known constant
  • Add signed int mod optimization without a known constant
  • Add Tests
Author:TIHan
Assignees:-
Labels:

area-CodeGen-coreclr

Milestone:-

@EgorBo
Copy link
Member

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead?

@TIHan
Copy link
ContributorAuthor

@EgorBo are you referring toLowering::LowerConstIntDivOrMod ?

@EgorBo
Copy link
Member

LowerConstIntDivOrMod

yeah, and just movemod toa % b = a - (a / b) * b; there - thus, we won't have to re-implement the already existing optimization.

One potential problem with this approach that it might produce regressions wherea - (a / b) * b previously led to more CSE opportunities, e.g. witha / b next toa % b

@TIHan
Copy link
ContributorAuthor

It makes sense that we should do it there so the earlier phases don't screw it up.

@TIHan
Copy link
ContributorAuthor

I did some work to see if I could move the existing mod optimizations to lowering, but it might be a bit much for what the PR is trying to accomplish.

@tannergooding
Copy link
Member

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize arounddiv/rem in many cases)?

@kunalspathak
Copy link
Contributor

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

I agree.

@EgorBo
Copy link
Member

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize arounddiv/rem in many cases)?

cc@kunalspathak@tannergooding

I personally think it's not, for any non-leaf X inX % Y we have to introduce a new local (ASG node) instead of just keeping a simplex mod y expression, e.g(x + 1) % y early in morph is converted into:

[000015] -A-X-+--R---              \--*  SUB       int   [000012] -----+------                 +--*  LCL_VAR   int    V03 tmp1         [000014] -A-X-+------                 \--*  MUL       int   [000004] -A-X-+------                    +--*  DIV       int   [000011] -A---+------                    |  +--*  COMMA     int   [000009] -A---+------                    |  |  +--*  ASG       int   [000008] D----+-N----                    |  |  |  +--*  LCL_VAR   int    V03 tmp1         [000002] -----+------                    |  |  |  \--*  ADD       int   [000000] -----+------                    |  |  |     +--*  LCL_VAR   int    V00 arg0         [000001] -----+------                    |  |  |     \--*  CNS_INT   int    1[000010] -----+------                    |  |  \--*  LCL_VAR   int    V03 tmp1         [000003] -----+------                    |  \--*  LCL_VAR   int    V01 arg1         [000013] -----+------                    \--*  LCL_VAR   int    V01 arg1

instead of just:

[000004] ---X--------              \--*  MOD       int   [000002] ------------                 +--*  ADD       int   [000000] ------------                 |  +--*  LCL_VAR   int    V00 arg0         [000001] ------------                 |  \--*  CNS_INT   int    1[000003] ------------                 \--*  LCL_VAR   int    V01 arg1

E.g. it makes it non-hoistable for ARM64, e.g. see this:
image
loops are highlighted for both x64 and arm64

@tannergooding
Copy link
Member

tannergooding commentedFeb 18, 2022
edited
Loading

@EgorBo, I was referring specifically to thex % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replacex % SomePow2 withx & (SomePow2 - 1) since that's the same number of nodes, still a constant, but alsoAND is better understood and optimized thanDIV orMOD

@EgorBo
Copy link
Member

@EgorBo, I was referring specifically to thex % SomePow2 optimization being introduced here.

It should be a clear improvement to recognize and replacex % SomePow2 withx & (SomePow2 - 1) since that's the same number of nodes, still a constant, but alsoAND is better understood and optimized thanDIV orMOD

I'm fine with doingx umod POT early in morph - it makes sense and doesn't produce additional local, I was referring to my suggestion to remove the early expansion of generalX [u]mod Y

@TIHan
Copy link
ContributorAuthor

@kunalspathak @echesakovMSFT This is ready.

Will try to restart CI.

Copy link
Contributor

@kunalspathakkunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM with nice diffs.

@TIHanTIHan merged commitedf14c1 intodotnet:mainMar 14, 2022
@TIHanTIHan deleted the arm64-opt-mod branchMarch 14, 2022 18:29
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull requestMar 30, 2022
* Initial work for ARM64 mod optimization* Updated comment* Updated comment* Updated comment* Fixing build* Remove uneeded var* Use '%' morph logic for both x64/arm64* Adding back in divisor check for x64* Formatting* Update comments* Update comments* Fixing* Updated comment* Updated comment* Tweaking x64 transformation logic for the mod opt* Tweaking x64 transformation logic for the mod opt* Using IntCon* Fixed build* Minor tweak* Fixing x64 diffs* Removing flag set* Feedback* Fixing build* Feedback* Fixing tests* Fixing tests* Fixing tests* Formatting* Fixing tests* Feedback* Fixing build
@ghostghost locked asresolvedand limited conversation to collaboratorsApr 13, 2022
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@EgorBoEgorBoEgorBo left review comments

@echesakovechesakovechesakov left review comments

@SingleAccretionSingleAccretionSingleAccretion left review comments

@tannergoodingtannergoodingtannergooding approved these changes

@kunalspathakkunalspathakkunalspathak approved these changes

Assignees

@TIHanTIHan

Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@TIHan@EgorBo@tannergooding@kunalspathak@echesakov@SingleAccretion

[8]ページ先頭

©2009-2025 Movatter.jp