Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

JIT: Have lowering set up IR for post-indexed addressing and make strength reduced IV updates amenable to post-indexed addressing#105185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
jakobbotsch merged 2 commits intodotnet:mainfromjakobbotsch:lower-post-indexing
Jul 22, 2024

Conversation

@jakobbotsch
Copy link
Member

@jakobbotschjakobbotsch commentedJul 20, 2024
edited
Loading

This adds a transformation in lowering that tries to set up the IR to be
amenable to post-indexed addressing in the backend. It does so by
looking for RMW additions/subtractions of a local that was also recently
used as the address to an indirection, and making them adjacent.

Additionally, have strength reduction try to insert IV updates after the last
use if that last use is a legal insertion point. This allows the lowering transformation
to kick in.

For a simple loop:

[MethodImpl(MethodImplOptions.NoInlining)]publicstaticintSum(int[]arr){intsum=0;foreach(intxinarr){sum+=x;}returnsum;}

this results in:

@@ -19,12 +19,11 @@ G_M53154_IG03:        ; bbWeight=0.25, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b ;; size=4 bbWeight=0.25 PerfScore 0.12  G_M53154_IG04:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref, isz-            ldr     w3, [x0]+            ldr     w3, [x0], #0x04             add     w1, w3, w1-            add     x0, x0, #4             sub     w2, w2, #1             cbnz    w2, G_M53154_IG04-;; size=20 bbWeight=4 PerfScore 22.00+;; size=16 bbWeight=4 PerfScore 20.00  G_M53154_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref                              ; byrRegs -[x0]@@ -35,5 +34,5 @@ G_M53154_IG06:        ; bbWeight=1, epilog, nogc, extend             ldp     fp, lr, [sp], #0x10             ret     lr ;; size=8 bbWeight=1 PerfScore 2.00-; Total bytes of code: 60+; Total bytes of code: 56

The .NET 8 vs .NET 9 codegen diff for this loop becomes:

@@ -7,11 +7,10 @@ G_M53154_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, G_M53154_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz                              ; gcrRegs +[x0]             mov     w1, wzr-            mov     w2, wzr-            ldr     w3, [x0, #0x08]-            cmp     w3, #0+            ldr     w2, [x0, #0x08]+            cmp     w2, #0             ble     G_M53154_IG05-;; size=20 bbWeight=1 PerfScore 5.50+;; size=16 bbWeight=1 PerfScore 5.00  G_M53154_IG03:        ; bbWeight=0.25, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref             add     x0, x0, #16@@ -20,12 +19,11 @@ G_M53154_IG03:        ; bbWeight=0.25, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b ;; size=4 bbWeight=0.25 PerfScore 0.12  G_M53154_IG04:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref, isz-            ldr     w4, [x0, w2, UXTW #2]-            add     w1, w4, w1-            add     w2, w2, #1-            cmp     w3, w2-            bgt     G_M53154_IG04-;; size=20 bbWeight=4 PerfScore 22.00+            ldr     w3, [x0], #0x04+            add     w1, w3, w1+            sub     w2, w2, #1+            cbnz    w2, G_M53154_IG04+;; size=16 bbWeight=4 PerfScore 20.00  G_M53154_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref                              ; byrRegs -[x0]@@ -36,5 +34,5 @@ G_M53154_IG06:        ; bbWeight=1, epilog, nogc, extend             ldp     fp, lr, [sp], #0x10             ret     lr ;; size=8 bbWeight=1 PerfScore 2.00-; Total bytes of code: 64+; Total bytes of code: 56

@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelJul 20, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

@jakobbotsch
Copy link
MemberAuthor

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
MemberAuthor

@jakobbotschjakobbotsch marked this pull request as ready for reviewJuly 21, 2024 10:42
@jakobbotsch
Copy link
MemberAuthor

cc @dotnet/jit-contrib PTAL@AndyAyersMS

Diffs. Some cool diffs:
image

Suchiman and PaulusParssinen reacted with heart emoji

@AndyAyersMS
Copy link
Member

Interesting diff in windows arm benchmarks pgo

+4 (+0.09%) : 7422.dasm - System.Text.RegularExpressions.RegexPrefixAnalyzer:<FindPrefixes>g__FindPrefixesCore|0_1(System.Text.RegularExpressions.RegexNode,System.Collections.Generic.List`1[System.Text.StringBuilder],ubyte):ubyte (Tier0-FullOpts)@@ -1600,10 +1600,12 @@ G_M12455_IG128:        ; bbWeight=0.05, gcrefRegs=180000 {x19 x20}, byrefRegs=C0 ;; size=4 bbWeight=0.05 PerfScore 0.02 G_M12455_IG129:        ; bbWeight=0.49, gcrefRegs=180000 {x19 x20}, byrefRegs=C00000 {x22 x23}, byref, isz             ldrh    w1, [x23, x21]-            stp     wzr, w1, [fp, #0x50]// [V16 loc13], [V15 loc12]+            str     w1, [fp, #0x54]// [V15 loc12]+            add     x21, x21, #2+            str     wzr, [fp, #0x50]// [V16 loc13]

int maxCount =min(m_blockIndirs.Height(), POST_INDEXED_ADDRESSING_MAX_DISTANCE /2);
for (int i =0; i < maxCount; i++)
{
SavedIndir& prev = m_blockIndirs.TopRef(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it be more efficient to start checking with the last indir instead of the first?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This does start with the last indir (since it is usingTopRef instead ofBottomRef)

assert((prevIndir->gtLIRFlags & LIR::Flags::Mark) ==0);
m_scratchSideEffects.Clear();

for (GenTree* cur = prevIndir->gtNext; cur != store; cur = cur->gtNext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wonder if this could be cheaper if you computed two side effect sets and then checked for interference. But it probably doesn't make much difference.

Copy link
MemberAuthor

@jakobbotschjakobbotschJul 21, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hmm, possibly -- although it would be a bit less precise than what's here since not all nodes that are part ofstore's dataflow necessarily happen after all the nodes we are moving.

This adds a transformation in lowering that tries to set up the IR to beamenable to post-indexed addressing in the backend. It does so bylooking for RMW additions/subtractions of a local that was also recentlyused as the address to an indirection.
…singOn arm64 have strength reduction try to insert IV updates after the lastuse if that last use is a legal insertion point. This often allows thebackend to use post-indexed addressing to combine the use with the IVupdate.
@jakobbotsch
Copy link
MemberAuthor

jakobbotsch commentedJul 21, 2024
edited
Loading

Interesting diff in windows arm benchmarks pgo

+4 (+0.09%) : 7422.dasm - System.Text.RegularExpressions.RegexPrefixAnalyzer:<FindPrefixes>g__FindPrefixesCore|0_1(System.Text.RegularExpressions.RegexNode,System.Collections.Generic.List`1[System.Text.StringBuilder],ubyte):ubyte (Tier0-FullOpts)@@ -1600,10 +1600,12 @@ G_M12455_IG128:        ; bbWeight=0.05, gcrefRegs=180000 {x19 x20}, byrefRegs=C0 ;; size=4 bbWeight=0.05 PerfScore 0.02 G_M12455_IG129:        ; bbWeight=0.49, gcrefRegs=180000 {x19 x20}, byrefRegs=C00000 {x22 x23}, byref, isz             ldrh    w1, [x23, x21]-            stp     wzr, w1, [fp, #0x50]// [V16 loc13], [V15 loc12]+            str     w1, [fp, #0x54]// [V15 loc12]+            add     x21, x21, #2+            str     wzr, [fp, #0x50]// [V16 loc13]

We end up with this IR after strength reduction:

*****BB135 [0056]STMT00207 (0x2B1[E-] ...0x2BB )N007 (10,7) [000666]DA-XGO-----STORE_LCL_VAR intV15 loc12        d:1 $d1aN006 (10,7) [000664]---XGO-N---└──▌COMMA     ushort <l:$b0d, c:$b0c>N001 (0,0) [000657]-----------├──▌NOP       voidN005 (10,7) [003041]---XGO-----└──▌IND       ushort <l:$b0a, c:$b0b>N004 (7,5) [000663]----GO-N---└──▌ADD       byref  $c12N002 (3,2) [000662]-----------├──▌LCL_VAR   byrefV165 tmp129      u:2 $c11N003 (3,2) [003994]-----------└──▌LCL_VAR   longV247 rat2*****BB135 [0056]STMT00720 (??? ...??? )N004 (9,8) [003993]DA---------STORE_LCL_VAR longV247 rat2N003 (5,5) [003992]-----------└──▌ADD       longN001 (3,2) [003991]-----------├──▌LCL_VAR   longV247 rat2N002 (1,2) [003989]-----------└──▌CNS_INT   long2 $34d*****BB135 [0056]STMT00208 (0x2BD[E-] ...0x2BE )N002 (1,3) [000668]DA---------STORE_LCL_VAR intV16 loc13        d:1 $VN.VoidN001 (1,2) [000667]-----------└──▌CNS_INT   int0 $c0

Lowering does not try to make the indirection[003041 and update[003993] adjacent since the indirection isn't directly onV247, so there won't be a chance for post-indexed addressing. And this breaks thestp that previously happened forV15 andV16 that end up being stack allocated in adjacent slots.

Strength reduction doesn't do much (any) sanity checking of whether we actually expect to be able to do post-indexed after moving the IV update. That would require us to check that the use is of a supported pattern. But I figure that complication is unnecessary since the exact place we update the IV at shouldn't matter much here -- it is live throughout the loop anyway. It might even be better for scheduling purposes to update it as soon as possible after that last use.

@jakobbotschjakobbotsch merged commit7dd68f4 intodotnet:mainJul 22, 2024
@jakobbotschjakobbotsch deleted the lower-post-indexing branchJuly 22, 2024 09:06
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsAug 22, 2024
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@AndyAyersMSAndyAyersMSAndyAyersMS approved these changes

Assignees

@jakobbotschjakobbotsch

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@jakobbotsch@AndyAyersMS

[8]ページ先頭

©2009-2025 Movatter.jp