Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Certain loops do not get recorded in optLoopTable #43713

Closed
Assignees
jakobbotsch
Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone
@kunalspathak

Description

@kunalspathak

Today, we fail to capture certain loops insideoptLoopTable.optLoopTable is a data structure that should ideally contain all the loops so we can make further loop optimizations likeloop unrolling,loop cloning orhoist loop code.

Below is the code taken fromQuickSortSpan benchmark.

for(i=lo,j=hi,pivot=data[hi];i<j;){while(i<j&&data[i]<=pivot){++i;}while(j>i&&data[j]>=pivot){--j;}if(i<j){temp=data[i];data[i]=data[j];data[j]=temp;}}

In the assembly code, the 2 inner while loops have this kind of representation and hence do not make it tooptLoopTable.

Assembly code
; Assembly listing for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]); Emitting BLENDED_CODE for X64 CPU with AVX - Windows; optimized code; rsp based frame; fully interruptible; Final local variable assignments; Lcl frame size = 56G_M37982_IG01:              ;; offset=0000H 00007ff8`a20ebd2057pushrdi 00007ff8`a20ebd2156pushrsi 00007ff8`a20ebd2255pushrbp 00007ff8`a20ebd2353pushrbx 00007ff8`a20ebd24        4883EC38subrsp,56 00007ff8`a20ebd28        33C0xorrax,rax 00007ff8`a20ebd2a4889442428mov      qword ptr[rsp+28H],rax 00007ff8`a20ebd2f4889442430mov      qword ptr[rsp+30H],rax;; bbWeight=1    PerfScore 6.50G_M37982_IG02:              ;; offset=0014H 00007ff8`a20ebd34        488B31movrsi, bword ptr[rcx] 00007ff8`a20ebd37        8B7908movedi, dword ptr[rcx+8];; bbWeight=1    PerfScore 4.00G_M37982_IG03:              ;; offset=001AH 00007ff8`a20ebd3a        83FF01cmpedi,1 00007ff8`a20ebd3d        7F09jg       SHORT G_M37982_IG05;; bbWeight=1    PerfScore 1.25G_M37982_IG04:              ;; offset=001FH 00007ff8`a20ebd3f        4883C438addrsp,56; =========================== 32B boundary =========================== 00007ff8`a20ebd43        5Bpoprbx 00007ff8`a20ebd44        5Dpoprbp 00007ff8`a20ebd45        5Epoprsi 00007ff8`a20ebd46        5Fpoprdi 00007ff8`a20ebd47        C3ret;; bbWeight=0.50 PerfScore 1.63G_M37982_IG05:              ;; offset=0028H 00007ff8`a20ebd48        8D4FFFleaecx,[rdi-1] 00007ff8`a20ebd4b        33DBxorebx,ebx 00007ff8`a20ebd4d        8BD1movedx,ecx 00007ff8`a20ebd4f        3BD7cmpedx,edi 00007ff8`a20ebd51        0F8315010000jae      G_M37982_IG23 00007ff8`a20ebd57        4863C2movsxdrax,edx 00007ff8`a20ebd5a        8B0486moveax, dword ptr[rsi+4*rax] 00007ff8`a20ebd5d        EB60jmp      SHORT G_M37982_IG12;; bbWeight=0.50 PerfScore 3.25G_M37982_IG06:              ;; offset=003FH 00007ff8`a20ebd5f        FFC3incebx; =========================== 32B boundary ===========================;; bbWeight=2    PerfScore 0.50G_M37982_IG07:              ;; offset=0041H 00007ff8`a20ebd61        3BDAcmpebx,edx 00007ff8`a20ebd63        7D15jge      SHORT G_M37982_IG10 00007ff8`a20ebd65        3BDFcmpebx,edi 00007ff8`a20ebd67        0F83FF000000jae      G_M37982_IG23 00007ff8`a20ebd6d        4C63C3movsxdr8,ebx 00007ff8`a20ebd7042390486cmp      dword ptr[rsi+4*r8],eax 00007ff8`a20ebd74        7EE9jle      SHORT G_M37982_IG06;; bbWeight=16    PerfScore 92.00G_M37982_IG08:              ;; offset=0056H 00007ff8`a20ebd76        EB02jmp      SHORT G_M37982_IG10;; bbWeight=2    PerfScore 4.00G_M37982_IG09:              ;; offset=0058H 00007ff8`a20ebd78        FFCAdecedx;; bbWeight=8    PerfScore 2.00G_M37982_IG10:              ;; offset=005AH 00007ff8`a20ebd7a        3BD3cmpedx,ebx 00007ff8`a20ebd7c        7E11jle      SHORT G_M37982_IG11 00007ff8`a20ebd7e        3BD7cmpedx,edi; =========================== 32B boundary =========================== 00007ff8`a20ebd80        0F83E6000000jae      G_M37982_IG23 00007ff8`a20ebd86        4C63C2movsxdr8,edx 00007ff8`a20ebd8942390486cmp      dword ptr[rsi+4*r8],eax 00007ff8`a20ebd8d        7DE9jge      SHORT G_M37982_IG09;; bbWeight=16    PerfScore 92.00G_M37982_IG11:              ;; offset=006FH 00007ff8`a20ebd8f        3BDAcmpebx,edx 00007ff8`a20ebd91        7D2Cjge      SHORT G_M37982_IG12 00007ff8`a20ebd93        3BDFcmpebx,edi 00007ff8`a20ebd95        0F83D1000000jae      G_M37982_IG23 00007ff8`a20ebd9b        4C63C3movsxdr8,ebx 00007ff8`a20ebd9e        468B0486movr8d, dword ptr[rsi+4*r8]; =========================== 32B boundary =========================== 00007ff8`a20ebda2        4C63CBmovsxdr9,ebx 00007ff8`a20ebda5        3BD7cmpedx,edi 00007ff8`a20ebda7        0F83BF000000jae      G_M37982_IG23 00007ff8`a20ebdad        4C63D2movsxdr10,edx 00007ff8`a20ebdb0        468B1496movr10d, dword ptr[rsi+4*r10] 00007ff8`a20ebdb4        4689148Emov      dword ptr[rsi+4*r9],r10d 00007ff8`a20ebdb8        4C63CAmovsxdr9,edx 00007ff8`a20ebdbb        4689048Emov      dword ptr[rsi+4*r9],r8d;; bbWeight=2    PerfScore 21.50G_M37982_IG12:              ;; offset=009FH 00007ff8`a20ebdbf        3BDAcmpebx,edx; =========================== 32B boundary =========================== 00007ff8`a20ebdc1        7C9Ejl       SHORT G_M37982_IG07;; bbWeight=4    PerfScore 5.00G_M37982_IG13:              ;; offset=00A3H 00007ff8`a20ebdc3        3BD9cmpebx,ecx 00007ff8`a20ebdc5        741Cje       SHORT G_M37982_IG14 00007ff8`a20ebdc7        3BDFcmpebx,edi 00007ff8`a20ebdc9        0F839D000000jae      G_M37982_IG23 00007ff8`a20ebdcf        4C63C3movsxdr8,ebx 00007ff8`a20ebdd2        468B0486movr8d, dword ptr[rsi+4*r8] 00007ff8`a20ebdd6        4863D3movsxdrdx,ebx 00007ff8`a20ebdd9890496mov      dword ptr[rsi+4*rdx],eax 00007ff8`a20ebddc        4863C9movsxdrcx,ecx 00007ff8`a20ebddf        4489048Emov      dword ptr[rsi+4*rcx],r8d; =========================== 32B boundary ===========================;; bbWeight=0.50 PerfScore 3.63G_M37982_IG14:              ;; offset=00C3H 00007ff8`a20ebde3        8BCBmovecx,ebx 00007ff8`a20ebde5        8BD7movedx,edi 00007ff8`a20ebde7        483BCAcmprcx,rdx 00007ff8`a20ebdea        777Aja       SHORT G_M37982_IG22;; bbWeight=0.50 PerfScore 0.88G_M37982_IG15:              ;; offset=00CCH 00007ff8`a20ebdec        48B960300EB402020000movrcx,0x202B40E3060 00007ff8`a20ebdf6        488B29movrbp, gword ptr[rcx] 00007ff8`a20ebdf9        488BD5movrdx,rbp 00007ff8`a20ebdfc4889542420mov      gword ptr[rsp+20H],rdx; =========================== 32B boundary =========================== 00007ff8`a20ebe01        488BCAmovrcx,rdx 00007ff8`a20ebe04        85DBtestebx,ebx 00007ff8`a20ebe06        7D08jge      SHORT G_M37982_IG17;; bbWeight=0.50 PerfScore 2.50G_M37982_IG16:              ;; offset=00E8H 00007ff8`a20ebe08        488BD1movrdx,rcx 00007ff8`a20ebe0b        E8B8FE8DFFcall     System.Diagnostics.Debug:Fail(System.String,System.String);; bbWeight=0.25 PerfScore 0.31G_M37982_IG17:              ;; offset=00F0H 00007ff8`a20ebe10        8BCBmovecx,ebx 00007ff8`a20ebe12        488D442428learax, bword ptr[rsp+28H] 00007ff8`a20ebe17488930mov      bword ptr[rax],rsi 00007ff8`a20ebe1a894808mov      dword ptr[rax+8],ecx 00007ff8`a20ebe1d        488D4C2428learcx, bword ptr[rsp+28H]; =========================== 32B boundary =========================== 00007ff8`a20ebe22        E889CA8FFFcall     Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]) 00007ff8`a20ebe27        FFC3incebx 00007ff8`a20ebe29        3BDFcmpebx,edi 00007ff8`a20ebe2b7739ja       SHORT G_M37982_IG22;; bbWeight=0.50 PerfScore 2.88G_M37982_IG18:              ;; offset=010DH 00007ff8`a20ebe2d        2BFBsubedi,ebx 00007ff8`a20ebe2f        488B542420movrdx, gword ptr[rsp+20H] 00007ff8`a20ebe34        85FFtestedi,edi 00007ff8`a20ebe36        7D08jge      SHORT G_M37982_IG20;; bbWeight=0.50 PerfScore 1.25G_M37982_IG19:              ;; offset=0118H 00007ff8`a20ebe38        488BCAmovrcx,rdx 00007ff8`a20ebe3b        E888FE8DFFcall     System.Diagnostics.Debug:Fail(System.String,System.String); =========================== 32B boundary ===========================;; bbWeight=0.25 PerfScore 0.31G_M37982_IG20:              ;; offset=0120H 00007ff8`a20ebe40        4863CBmovsxdrcx,ebx 00007ff8`a20ebe43        488D0C8Elearcx, bword ptr[rsi+4*rcx] 00007ff8`a20ebe47        488D442428learax, bword ptr[rsp+28H] 00007ff8`a20ebe4c488908mov      bword ptr[rax],rcx 00007ff8`a20ebe4f897808mov      dword ptr[rax+8],edi 00007ff8`a20ebe52        488D4C2428learcx, bword ptr[rsp+28H] 00007ff8`a20ebe57        E854CA8FFFcall     Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]) 00007ff8`a20ebe5c90nop;; bbWeight=0.50 PerfScore 2.50G_M37982_IG21:              ;; offset=013DH 00007ff8`a20ebe5d        4883C438addrsp,56; =========================== 32B boundary =========================== 00007ff8`a20ebe61        5Bpoprbx 00007ff8`a20ebe62        5Dpoprbp 00007ff8`a20ebe63        5Epoprsi 00007ff8`a20ebe64        5Fpoprdi 00007ff8`a20ebe65        C3ret;; bbWeight=0.50 PerfScore 1.63G_M37982_IG22:              ;; offset=0146H 00007ff8`a20ebe66        E83DEB8EFFcall     System.ThrowHelper:ThrowArgumentOutOfRangeException() 00007ff8`a20ebe6b        CCint3;; bbWeight=0    PerfScore 0.00G_M37982_IG23:              ;; offset=014CH 00007ff8`a20ebe6c        E8AF87065Fcall     CORINFO_HELP_RNGCHKFAIL 00007ff8`a20ebe71        CCint3;; bbWeight=0    PerfScore 0.00; Total bytes of code 338, prolog size 20, PerfScore 283.30, instruction count 115 (MethodHash=bb696ba1) for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]); ============================================================

Here, none of the 3 loops (outerfor and 2 innerwhile) loops are present inoptLoopTable. On investigation,@BruceForstall and I found out that it happens because the entry to the loop is below the bottom of the loop and from the entry it jumps up to the top block, however the bottom's backedge jumps to the first block of the loop.

Here is the pictorial representation. Ideally, we favor below loops. The diagram is taken fromoptimizer.cpp.

        |        v      head        |        |  top/first <--+        |       |       |        |      ...      |        |       |       |        |       v       |        +---> entry     |                |       |               ...      |                |       |                v       |         +-- exit/tail  |         |      |       |         |     ...      |         |      |       |         |      v       |         |    bottom ---+         |         +------+                |                v

However, in this case, the loop is little different and hence we reject it inFindNaturalLoops() because the codeFindEntry returnsnullptr and hence wedon't recognize it as a loop.

        |        v      head        |        |   +--> first        |   |      |        |   |      |        |   |      v        |   |     top <---+        |   |     ...     |        |   |      |      |        |   |      v      |        |   +--- bottom   |        |          |      |        |         ...     |        |          |      |        |          v      |        +------> entry ---+        |        +------+                |                v

We should fix it to identify such loops or at least make an entry tooptLoopTable to take advantage of other loop optimizations.

This should be one of the part of#43549.

category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:medium

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions


    [8]ページ先頭

    ©2009-2025 Movatter.jp