JIT: Have lowering set up IR for post-indexed addressing and make strength reduced IV updates amenable to post-indexed addressing #105185
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
jakobbotsch commented Jul 20, 2024
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
e907ed1 to 9598fdd
jakobbotsch commented Jul 21, 2024
Ran jitstress in https://dev.azure.com/dnceng-public/public/_build/results?buildId=749010&view=results and libraries-jitstress in https://dev.azure.com/dnceng-public/public/_build/results?buildId=749011&view=results. jitstress failures were #105186.
jakobbotsch commented Jul 21, 2024
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs. Some cool diffs:
AndyAyersMS commented Jul 21, 2024
Interesting diff in windows arm benchmarks pgo
+4 (+0.09%) : 7422.dasm - System.Text.RegularExpressions.RegexPrefixAnalyzer:<FindPrefixes>g__FindPrefixesCore|0_1(System.Text.RegularExpressions.RegexNode,System.Collections.Generic.List`1[System.Text.StringBuilder],ubyte):ubyte (Tier0-FullOpts)
    @@ -1600,10 +1600,12 @@ G_M12455_IG128: ; bbWeight=0.05, gcrefRegs=180000 {x19 x20}, byrefRegs=C0
     ;; size=4 bbWeight=0.05 PerfScore 0.02
     G_M12455_IG129: ; bbWeight=0.49, gcrefRegs=180000 {x19 x20}, byrefRegs=C00000 {x22 x23}, byref, isz
                 ldrh w1, [x23, x21]
    -            stp wzr, w1, [fp, #0x50] // [V16 loc13], [V15 loc12]
    +            str w1, [fp, #0x54] // [V15 loc12]
    +            add x21, x21, #2
    +            str wzr, [fp, #0x50] // [V16 loc13]
    int maxCount = min(m_blockIndirs.Height(), POST_INDEXED_ADDRESSING_MAX_DISTANCE / 2);
    for (int i = 0; i < maxCount; i++)
    {
        SavedIndir& prev = m_blockIndirs.TopRef(i);
Would it be more efficient to start checking with the last indir instead of the first?
This does start with the last indir (since it is using TopRef instead of BottomRef).
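To illustrate the iteration order discussed here, the following is a minimal sketch. `MiniStack` and `ScanFromTop` are hypothetical stand-ins written for this example, not the JIT's actual stack type; the only assumption carried over from the thread is that `TopRef(i)` indexes from the most recently pushed entry and `BottomRef(i)` from the oldest.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stack that can be indexed from either end.
template <typename T>
class MiniStack
{
    std::vector<T> m_data;

public:
    void Push(const T& value)
    {
        m_data.push_back(value);
    }

    int Height() const
    {
        return static_cast<int>(m_data.size());
    }

    // Index 0 is the most recently pushed element.
    T& TopRef(int i = 0)
    {
        assert(i >= 0 && i < Height());
        return m_data[m_data.size() - 1 - static_cast<size_t>(i)];
    }

    // Index 0 is the oldest element.
    T& BottomRef(int i = 0)
    {
        assert(i >= 0 && i < Height());
        return m_data[static_cast<size_t>(i)];
    }
};

// Visits at most maxCount entries, newest first, mirroring the loop shape in
// the snippet under review.
std::vector<int> ScanFromTop(MiniStack<int>& stack, int maxCount)
{
    std::vector<int> visited;
    int count = std::min(stack.Height(), maxCount);
    for (int i = 0; i < count; i++)
    {
        visited.push_back(stack.TopRef(i));
    }
    return visited;
}
```

With entries pushed in the order 1, 2, 3 and a limit of 2, the scan visits 3 and then 2, so the most recently recorded indirection is checked first.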
    assert((prevIndir->gtLIRFlags & LIR::Flags::Mark) == 0);
    m_scratchSideEffects.Clear();
    for (GenTree* cur = prevIndir->gtNext; cur != store; cur = cur->gtNext)
I wonder if this could be cheaper if you computed two side effect sets and then checked for interference. But it probably doesn't make much difference.
jakobbotsch, Jul 21, 2024 • edited
Hmm, possibly -- although it would be a bit less precise than what's here since not all nodes that are part of store's dataflow necessarily happen after all the nodes we are moving.
This adds a transformation in lowering that tries to set up the IR to be amenable to post-indexed addressing in the backend. It does so by looking for RMW additions/subtractions of a local that was also recently used as the address to an indirection.
…sing
On arm64 have strength reduction try to insert IV updates after the last use if that last use is a legal insertion point. This often allows the backend to use post-indexed addressing to combine the use with the IV update.
9598fdd to c5cc900
jakobbotsch commented Jul 21, 2024 • edited
We end up with this IR after strength reduction:

    ***** BB135 [0056]
    STMT00207 ( 0x2B1[E-] ... 0x2BB )
    N007 (10,7) [000666] DA-XGO-----  ▌  STORE_LCL_VAR int V15 loc12 d:1 $d1a
    N006 (10,7) [000664] ---XGO-N---  └──▌  COMMA ushort <l:$b0d, c:$b0c>
    N001 (0,0) [000657] -----------     ├──▌  NOP void
    N005 (10,7) [003041] ---XGO-----     └──▌  IND ushort <l:$b0a, c:$b0b>
    N004 (7,5) [000663] ----GO-N---        └──▌  ADD byref $c12
    N002 (3,2) [000662] -----------           ├──▌  LCL_VAR byref V165 tmp129 u:2 $c11
    N003 (3,2) [003994] -----------           └──▌  LCL_VAR long V247 rat2

    ***** BB135 [0056]
    STMT00720 ( ??? ... ??? )
    N004 (9,8) [003993] DA---------  ▌  STORE_LCL_VAR long V247 rat2
    N003 (5,5) [003992] -----------  └──▌  ADD long
    N001 (3,2) [003991] -----------     ├──▌  LCL_VAR long V247 rat2
    N002 (1,2) [003989] -----------     └──▌  CNS_INT long 2 $34d

    ***** BB135 [0056]
    STMT00208 ( 0x2BD[E-] ... 0x2BE )
    N002 (1,3) [000668] DA---------  ▌  STORE_LCL_VAR int V16 loc13 d:1 $VN.Void
    N001 (1,2) [000667] -----------  └──▌  CNS_INT int 0 $c0

Lowering does not try to make the indirection

Strength reduction doesn't do much (any) sanity checking of whether we actually expect to be able to use post-indexed addressing after moving the IV update. That would require us to check that the use is of a supported pattern. But I figure that complication is unnecessary since the exact place we update the IV at shouldn't matter much here -- it is live throughout the loop anyway. It might even be better for scheduling purposes to update it as soon as possible after that last use.

This adds a transformation in lowering that tries to set up the IR to be
amenable to post-indexed addressing in the backend. It does so by
looking for RMW additions/subtractions of a local that was also recently
used as the address to an indirection, and making them adjacent.
Additionally, have strength reduction try to insert IV updates after the last
use if that last use is a legal insertion point. This allows the lowering transformation
to kick in.
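The idea behind the lowering transformation can be sketched on a toy linear IR. This is a simplified model under stated assumptions, not the JIT's implementation: `Kind`, `Node` and `MakeUpdatesAdjacent` are hypothetical names, each node touches at most one local, and "interference" is reduced to "some node in between touches the same local". The real transformation works on LIR and uses side-effect analysis, which this sketch does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: one node per entry, in execution order.
enum class Kind
{
    Indir,    // load/store through the address held in `local`
    AddStore, // RMW update of `local`: local = local + constant
    Other     // anything else; `local` is the local it touches, or -1 for none
};

struct Node
{
    Kind kind;
    int  local;
};

// For each RMW update of a local, look backwards (at most maxDistance nodes)
// for the nearest node touching that local. If it is an indirection, nothing
// in between reads or writes the local, so the update can be moved directly
// after the indirection. Returns the number of updates moved.
int MakeUpdatesAdjacent(std::vector<Node>& nodes, std::size_t maxDistance)
{
    int moved = 0;
    for (std::size_t i = 0; i < nodes.size(); i++)
    {
        if (nodes[i].kind != Kind::AddStore)
        {
            continue;
        }

        std::size_t steps = 0;
        for (std::size_t j = i; j-- > 0 && steps < maxDistance; steps++)
        {
            if (nodes[j].local != nodes[i].local)
            {
                continue; // unrelated node, keep looking further back
            }

            if (nodes[j].kind == Kind::Indir && j + 1 != i)
            {
                // Move nodes[i] to position j + 1, shifting the rest right.
                std::rotate(nodes.begin() + j + 1, nodes.begin() + i, nodes.begin() + i + 1);
                moved++;
            }
            break; // the first node touching the local decides the outcome
        }
    }
    return moved;
}
```

Once the update sits directly after the indirection, a backend that supports post-indexed addressing has the pair in the shape it needs to fold the two into one instruction. The distance limit plays the same role as the bounded search in the review snippet: it keeps the backwards scan cheap.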
For a simple loop:
this results in:
The .NET 8 vs .NET 9 codegen diff for this loop becomes: