
ARM64 - Optimizing a % b operations #65535


Merged

TIHan merged 35 commits into dotnet:main from TIHan:arm64-opt-mod

Mar 14, 2022


Conversation

Contributor

TIHan commented Feb 18, 2022 (edited)

Addressing part of this issue: #34937

Description

There are various ways to optimize `%` for integers on ARM64.

`a % b` can be transformed into `a & (b - 1)` if they are unsigned integers and `b` is a constant power of 2.
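A minimal C++ sketch of that identity, for illustration only. The helper names `is_pow2` and `umod` are hypothetical and are not the JIT's code; the sketch only checks that the mask and `%` agree for unsigned operands when the divisor is a power of 2.

```cpp
#include <cassert>
#include <cstdint>

// True when b is a nonzero power of two (exactly one bit set).
constexpr bool is_pow2(uint32_t b)
{
    return b != 0 && (b & (b - 1)) == 0;
}

// Hypothetical helper: unsigned remainder that takes the mask path when the
// divisor is a power of two and falls back to % otherwise. b must be nonzero.
constexpr uint32_t umod(uint32_t a, uint32_t b)
{
    return is_pow2(b) ? (a & (b - 1)) : (a % b);
}

static_assert(umod(13u, 8u) == 13u % 8u, "mask path matches %");
static_assert(umod(13u, 6u) == 13u % 6u, "fallback path matches %");

// Compare the helper against % for a range of small inputs.
inline void check_umod()
{
    for (uint32_t b = 1; b <= 64; b++)
        for (uint32_t a = 0; a < 1024; a++)
            assert(umod(a, b) == a % b);
}
```

The mask only works for unsigned operands: for a negative signed dividend, `%` yields a non-positive result in C++ (and C#), which a plain `&` does not reproduce.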

Acceptance Criteria

~~Add Tests~~ (asmdiffs cover this)

TIHan added 3 commits

February 17, 2022 13:12

Initial work for ARM64 mod optimization

64f7042

Updated comment

d1dce26

Updated comment

a3fbe54

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

Feb 18, 2022

ghost assigned TIHan

Feb 18, 2022

ghost commented Feb 18, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Addressing this issue: #34937

Description

There are various ways to optimize `%` for integers on ARM64.

Example:
`a % b` can be transformed into `a & (b - 1)` if they are unsigned integers and `b` is a constant power of 2.

Acceptance Criteria

Add signed int mod optimization with known constant
Add signed int mod optimization without a known constant
Add Tests
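
The signed case with a known constant is only listed here as a goal; the thread does not show how it is implemented. As a rough sketch under that caveat, one division-free formulation for a positive power-of-2 divisor is to mask the magnitude and restore the sign. The helper name `smod_pow2` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: x % p for signed x and a positive power-of-two p,
// computed without a division. The result takes the sign of x, matching
// truncating division. Unsigned arithmetic avoids overflow for INT32_MIN.
constexpr int32_t smod_pow2(int32_t x, int32_t p)
{
    return x >= 0
        ? static_cast<int32_t>(static_cast<uint32_t>(x) & static_cast<uint32_t>(p - 1))
        : -static_cast<int32_t>((0u - static_cast<uint32_t>(x)) & static_cast<uint32_t>(p - 1));
}

static_assert(smod_pow2(7, 4) == 7 % 4, "positive dividend");
static_assert(smod_pow2(-7, 4) == -7 % 4, "negative dividend keeps its sign");
static_assert(smod_pow2(-8, 4) == 0, "exact multiple");
```

This is not a claim about the instruction sequence the JIT emits; it only shows why the signed case needs more than a single `AND`.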

Author:	TIHan
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

Updated comment

729057d

Member

EgorBo commented Feb 18, 2022

Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead?

Contributor, Author

TIHan commented Feb 18, 2022

@EgorBo are you referring to `Lowering::LowerConstIntDivOrMod`?

Member

EgorBo commented Feb 18, 2022

> LowerConstIntDivOrMod

yeah, and just move the `mod` to `a % b = a - (a / b) * b` expansion there; thus, we won't have to re-implement the already existing optimization.

One potential problem with this approach is that it might produce regressions where `a - (a / b) * b` previously led to more CSE opportunities, e.g. with `a / b` next to `a % b`.
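
To illustrate the CSE point: ARM64 has no integer remainder instruction, so a remainder is rebuilt from the quotient. When `a / b` and `a % b` appear next to each other, the expanded form lets both share one division. The sketch below is illustrative C++ with hypothetical names (`DivMod`, `div_mod`), not JIT code.

```cpp
#include <cassert>
#include <cstdint>

struct DivMod
{
    int32_t quot;
    int32_t rem;
};

// Hypothetical helper. Requires b != 0 and (a, b) != (INT32_MIN, -1).
inline DivMod div_mod(int32_t a, int32_t b)
{
    const int32_t q = a / b;     // the single division (the shared subexpression)
    const int32_t r = a - q * b; // remainder rebuilt as a - (a / b) * b
    return {q, r};
}
```

If the expansion only happens late, in lowering, the `a / b` inside the remainder is no longer visible to CSE, which is the regression risk described above.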

Contributor, Author

TIHan commented Feb 18, 2022

It makes sense that we should do it there so the earlier phases don't screw it up.

Contributor, Author

TIHan commented Feb 18, 2022

I did some work to see if I could move the existing mod optimizations to lowering, but it might be a bit much for what the PR is trying to accomplish.

TIHan added 2 commits

February 17, 2022 17:40

Fixing build

c2eee76

Remove uneeded var

ca333e0

Member

tannergooding commented Feb 18, 2022

> Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead

@EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around `div`/`rem` in many cases)?

tannergooding reviewed

Feb 18, 2022

src/coreclr/jit/gentree.h (outdated, resolved)

Contributor

kunalspathak commented Feb 18, 2022

> @EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around div/rem in many cases)?

I agree.

Member

EgorBo commented Feb 18, 2022

> > Shouldn't we just remove early expansion of UMOD/MOD for arm64 from morph and use the shared (with x86) impl in lower instead
>
> @EgorBo, I would've thought it was better to do it early in morph so other things can more easily take advantage of the optimization (we don't optimize around `div`/`rem` in many cases)?

cc @kunalspathak @tannergooding

I personally think it's not: for any non-leaf `X` in `X % Y` we have to introduce a new local (ASG node) instead of just keeping a simple `x mod y` expression, e.g. `(x + 1) % y` early in morph is converted into:

[000015] -A-X-+--R---              \--*  SUB       int
[000012] -----+------                 +--*  LCL_VAR   int    V03 tmp1
[000014] -A-X-+------                 \--*  MUL       int
[000004] -A-X-+------                    +--*  DIV       int
[000011] -A---+------                    |  +--*  COMMA     int
[000009] -A---+------                    |  |  +--*  ASG       int
[000008] D----+-N----                    |  |  |  +--*  LCL_VAR   int    V03 tmp1
[000002] -----+------                    |  |  |  \--*  ADD       int
[000000] -----+------                    |  |  |     +--*  LCL_VAR   int    V00 arg0
[000001] -----+------                    |  |  |     \--*  CNS_INT   int    1
[000010] -----+------                    |  |  \--*  LCL_VAR   int    V03 tmp1
[000003] -----+------                    |  \--*  LCL_VAR   int    V01 arg1
[000013] -----+------                    \--*  LCL_VAR   int    V01 arg1

instead of just:

[000004] ---X--------              \--*  MOD       int
[000002] ------------                 +--*  ADD       int
[000000] ------------                 |  +--*  LCL_VAR   int    V00 arg0
[000001] ------------                 |  \--*  CNS_INT   int    1
[000003] ------------                 \--*  LCL_VAR   int    V01 arg1
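
Read as source code, the two dumps above correspond roughly to the following C++ sketch. The function names are hypothetical; `tmp1` mirrors the `V03 tmp1` local in the first dump.

```cpp
#include <cassert>
#include <cstdint>

// Shape of the early expansion: the non-leaf dividend (x + 1) is evaluated
// once into a temporary, then reused by the DIV and the outer SUB.
inline int32_t mod_expanded(int32_t x, int32_t y)
{
    const int32_t tmp1 = x + 1;   // ASG tmp1 = ADD(arg0, 1)
    return tmp1 - (tmp1 / y) * y; // SUB(tmp1, MUL(DIV(tmp1, arg1), arg1))
}

// Shape of the unexpanded tree: a single MOD node over the ADD.
inline int32_t mod_plain(int32_t x, int32_t y)
{
    return (x + 1) % y; // MOD(ADD(arg0, 1), arg1)
}
```

Both compute the same value (for `y != 0` and no overflow); the difference is that the first form carries an assignment to a new local, while the second remains one pure expression.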

E.g. it makes it non-hoistable for ARM64, see this:

[Figure: loops are highlighted for both x64 and arm64]

Member

tannergooding commented Feb 18, 2022 (edited)

@EgorBo, I was referring specifically to the `x % SomePow2` optimization being introduced here.

It should be a clear improvement to recognize and replace `x % SomePow2` with `x & (SomePow2 - 1)`, since that's the same number of nodes, still a constant, but also `AND` is better understood and optimized than `DIV` or `MOD`.

Member

EgorBo commented Feb 18, 2022

> @EgorBo, I was referring specifically to the `x % SomePow2` optimization being introduced here.
> It should be a clear improvement to recognize and replace `x % SomePow2` with `x & (SomePow2 - 1)`, since that's the same number of nodes, still a constant, but also `AND` is better understood and optimized than `DIV` or `MOD`.

I'm fine with doing `x umod POT` early in morph: it makes sense and doesn't produce an additional local. I was referring to my suggestion to remove the early expansion of general `X [u]mod Y`.

TIHan added 3 commits

February 23, 2022 15:02

Use '%' morph logic for both x64/arm64

7fc88ed

Merge remote-tracking branch 'upstream/main' into arm64-opt-mod

4720f55

Adding back in divisor check for x64

6656763

This was referenced Feb 24, 2022

System.Runtime.Serialization.Xml.Tests failed #59926

Closed

profiler.elt work item test failures in slowpatheltenter #60018

Closed

baseservices\exceptions\stackoverflow\stackoverflowtester\stackoverflowtester.cmd #65201

Closed

System.IO.Tests work item failing with SIGKILL #65791

Closed

TIHan added 3 commits

February 25, 2022 11:31

Formatting

cfa9805

Update comments

e058553

Update comments

b950ab4

TIHan added 6 commits

March 2, 2022 13:20

Tweaking x64 transformation logic for the mod opt

dee80b5

Using IntCon

1fac071

Fixed build

372bcf3

Minor tweak

8809058

Fixing x64 diffs

b16d381

Removing flag set

9235f92

Contributor, Author

TIHan commented Mar 9, 2022

@kunalspathak @echesakovMSFT This is ready.

Will try to restart CI.

TIHan mentioned this pull request

Mar 9, 2022

ARM64 - Optimizing a % b operations part 2 #66407

Merged

2 tasks

Merge remote-tracking branch 'upstream/main' into arm64-opt-mod

03b2cf0

EgorBo reviewed

Mar 9, 2022

src/coreclr/jit/morph.cpp (outdated, resolved)

EgorBo reviewed

Mar 9, 2022

src/coreclr/jit/morph.cpp (outdated, resolved)

kunalspathak approved these changes

Mar 10, 2022

Contributor

kunalspathak left a comment

LGTM with nice diffs.

SingleAccretion reviewed

Mar 10, 2022

src/coreclr/jit/gentree.h (outdated, resolved)

src/coreclr/jit/morph.cpp (outdated, resolved)

src/coreclr/jit/gentree.h (outdated, resolved)

TIHan added 3 commits

March 10, 2022 10:42

Feedback

36b1e6a

Fixing build

964b426

Feedback

ec8246a

tannergooding approved these changes

Mar 11, 2022


TIHan added 5 commits

March 11, 2022 12:22

Fixing tests

27c3894

Fixing tests

4c36be7

Fixing tests

19f16b0

Formatting

f8921b9

Fixing tests

ade331d

echesakov reviewed

Mar 12, 2022

src/coreclr/jit/morph.cpp (outdated, resolved)

TIHan added 2 commits

March 12, 2022 17:01

Feedback

06d1124

Fixing build

f142ca2

TIHan merged commit edf14c1 into dotnet:main

Mar 14, 2022

TIHan deleted the arm64-opt-mod branch

March 14, 2022 18:29

radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request

Mar 30, 2022

ARM64 - Optimizing a % b operations (dotnet#65535)

73b1897

* Initial work for ARM64 mod optimization
* Updated comment
* Updated comment
* Updated comment
* Fixing build
* Remove uneeded var
* Use '%' morph logic for both x64/arm64
* Adding back in divisor check for x64
* Formatting
* Update comments
* Update comments
* Fixing
* Updated comment
* Updated comment
* Tweaking x64 transformation logic for the mod opt
* Tweaking x64 transformation logic for the mod opt
* Using IntCon
* Fixed build
* Minor tweak
* Fixing x64 diffs
* Removing flag set
* Feedback
* Fixing build
* Feedback
* Fixing tests
* Fixing tests
* Fixing tests
* Formatting
* Fixing tests
* Feedback
* Fixing build