- Notifications
You must be signed in to change notification settings - Fork5.2k
Description
Today, we fail to capture certain loops insideoptLoopTable.optLoopTable is a data structure that should ideally contain all the loops so we can make further loop optimizations likeloop unrolling,loop cloning orhoist loop code.
Below is the code taken fromQuickSortSpan benchmark.
for(i=lo,j=hi,pivot=data[hi];i<j;){while(i<j&&data[i]<=pivot){++i;}while(j>i&&data[j]>=pivot){--j;}if(i<j){temp=data[i];data[i]=data[j];data[j]=temp;}}
In the assembly code, the 2 inner while loops have this kind of representation and hence do not make it tooptLoopTable.
Assembly code
; Assembly listing for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]); Emitting BLENDED_CODE for X64 CPU with AVX - Windows; optimized code; rsp based frame; fully interruptible; Final local variable assignments; Lcl frame size = 56G_M37982_IG01: ;; offset=0000H 00007ff8`a20ebd2057pushrdi 00007ff8`a20ebd2156pushrsi 00007ff8`a20ebd2255pushrbp 00007ff8`a20ebd2353pushrbx 00007ff8`a20ebd24 4883EC38subrsp,56 00007ff8`a20ebd28 33C0xorrax,rax 00007ff8`a20ebd2a4889442428mov qword ptr[rsp+28H],rax 00007ff8`a20ebd2f4889442430mov qword ptr[rsp+30H],rax;; bbWeight=1 PerfScore 6.50G_M37982_IG02: ;; offset=0014H 00007ff8`a20ebd34 488B31movrsi, bword ptr[rcx] 00007ff8`a20ebd37 8B7908movedi, dword ptr[rcx+8];; bbWeight=1 PerfScore 4.00G_M37982_IG03: ;; offset=001AH 00007ff8`a20ebd3a 83FF01cmpedi,1 00007ff8`a20ebd3d 7F09jg SHORT G_M37982_IG05;; bbWeight=1 PerfScore 1.25G_M37982_IG04: ;; offset=001FH 00007ff8`a20ebd3f 4883C438addrsp,56; =========================== 32B boundary =========================== 00007ff8`a20ebd43 5Bpoprbx 00007ff8`a20ebd44 5Dpoprbp 00007ff8`a20ebd45 5Epoprsi 00007ff8`a20ebd46 5Fpoprdi 00007ff8`a20ebd47 C3ret;; bbWeight=0.50 PerfScore 1.63G_M37982_IG05: ;; offset=0028H 00007ff8`a20ebd48 8D4FFFleaecx,[rdi-1] 00007ff8`a20ebd4b 33DBxorebx,ebx 00007ff8`a20ebd4d 8BD1movedx,ecx 00007ff8`a20ebd4f 3BD7cmpedx,edi 00007ff8`a20ebd51 0F8315010000jae G_M37982_IG23 00007ff8`a20ebd57 4863C2movsxdrax,edx 00007ff8`a20ebd5a 8B0486moveax, dword ptr[rsi+4*rax] 00007ff8`a20ebd5d EB60jmp SHORT G_M37982_IG12;; bbWeight=0.50 PerfScore 3.25G_M37982_IG06: ;; offset=003FH 00007ff8`a20ebd5f FFC3incebx; =========================== 32B boundary ===========================;; bbWeight=2 PerfScore 0.50G_M37982_IG07: ;; offset=0041H 00007ff8`a20ebd61 3BDAcmpebx,edx 00007ff8`a20ebd63 7D15jge SHORT G_M37982_IG10 00007ff8`a20ebd65 3BDFcmpebx,edi 00007ff8`a20ebd67 0F83FF000000jae G_M37982_IG23 00007ff8`a20ebd6d 4C63C3movsxdr8,ebx 00007ff8`a20ebd7042390486cmp dword ptr[rsi+4*r8],eax 00007ff8`a20ebd74 7EE9jle SHORT G_M37982_IG06;; bbWeight=16 PerfScore 92.00G_M37982_IG08: ;; offset=0056H 00007ff8`a20ebd76 EB02jmp SHORT G_M37982_IG10;; bbWeight=2 PerfScore 4.00G_M37982_IG09: ;; offset=0058H 00007ff8`a20ebd78 FFCAdecedx;; bbWeight=8 PerfScore 2.00G_M37982_IG10: ;; offset=005AH 00007ff8`a20ebd7a 3BD3cmpedx,ebx 00007ff8`a20ebd7c 7E11jle SHORT G_M37982_IG11 00007ff8`a20ebd7e 3BD7cmpedx,edi; =========================== 32B boundary =========================== 00007ff8`a20ebd80 0F83E6000000jae G_M37982_IG23 00007ff8`a20ebd86 4C63C2movsxdr8,edx 00007ff8`a20ebd8942390486cmp dword ptr[rsi+4*r8],eax 00007ff8`a20ebd8d 7DE9jge SHORT G_M37982_IG09;; bbWeight=16 PerfScore 92.00G_M37982_IG11: ;; offset=006FH 00007ff8`a20ebd8f 3BDAcmpebx,edx 00007ff8`a20ebd91 7D2Cjge SHORT G_M37982_IG12 00007ff8`a20ebd93 3BDFcmpebx,edi 00007ff8`a20ebd95 0F83D1000000jae G_M37982_IG23 00007ff8`a20ebd9b 4C63C3movsxdr8,ebx 00007ff8`a20ebd9e 468B0486movr8d, dword ptr[rsi+4*r8]; =========================== 32B boundary =========================== 00007ff8`a20ebda2 4C63CBmovsxdr9,ebx 00007ff8`a20ebda5 3BD7cmpedx,edi 00007ff8`a20ebda7 0F83BF000000jae G_M37982_IG23 00007ff8`a20ebdad 4C63D2movsxdr10,edx 00007ff8`a20ebdb0 468B1496movr10d, dword ptr[rsi+4*r10] 00007ff8`a20ebdb4 4689148Emov dword ptr[rsi+4*r9],r10d 00007ff8`a20ebdb8 4C63CAmovsxdr9,edx 00007ff8`a20ebdbb 4689048Emov dword ptr[rsi+4*r9],r8d;; bbWeight=2 PerfScore 21.50G_M37982_IG12: ;; offset=009FH 00007ff8`a20ebdbf 3BDAcmpebx,edx; =========================== 32B boundary =========================== 00007ff8`a20ebdc1 7C9Ejl SHORT G_M37982_IG07;; bbWeight=4 PerfScore 5.00G_M37982_IG13: ;; offset=00A3H 00007ff8`a20ebdc3 3BD9cmpebx,ecx 00007ff8`a20ebdc5 741Cje SHORT G_M37982_IG14 00007ff8`a20ebdc7 3BDFcmpebx,edi 00007ff8`a20ebdc9 0F839D000000jae G_M37982_IG23 00007ff8`a20ebdcf 4C63C3movsxdr8,ebx 00007ff8`a20ebdd2 468B0486movr8d, dword ptr[rsi+4*r8] 00007ff8`a20ebdd6 4863D3movsxdrdx,ebx 00007ff8`a20ebdd9890496mov dword ptr[rsi+4*rdx],eax 00007ff8`a20ebddc 4863C9movsxdrcx,ecx 00007ff8`a20ebddf 4489048Emov dword ptr[rsi+4*rcx],r8d; =========================== 32B boundary ===========================;; bbWeight=0.50 PerfScore 3.63G_M37982_IG14: ;; offset=00C3H 00007ff8`a20ebde3 8BCBmovecx,ebx 00007ff8`a20ebde5 8BD7movedx,edi 00007ff8`a20ebde7 483BCAcmprcx,rdx 00007ff8`a20ebdea 777Aja SHORT G_M37982_IG22;; bbWeight=0.50 PerfScore 0.88G_M37982_IG15: ;; offset=00CCH 00007ff8`a20ebdec 48B960300EB402020000movrcx,0x202B40E3060 00007ff8`a20ebdf6 488B29movrbp, gword ptr[rcx] 00007ff8`a20ebdf9 488BD5movrdx,rbp 00007ff8`a20ebdfc4889542420mov gword ptr[rsp+20H],rdx; =========================== 32B boundary =========================== 00007ff8`a20ebe01 488BCAmovrcx,rdx 00007ff8`a20ebe04 85DBtestebx,ebx 00007ff8`a20ebe06 7D08jge SHORT G_M37982_IG17;; bbWeight=0.50 PerfScore 2.50G_M37982_IG16: ;; offset=00E8H 00007ff8`a20ebe08 488BD1movrdx,rcx 00007ff8`a20ebe0b E8B8FE8DFFcall System.Diagnostics.Debug:Fail(System.String,System.String);; bbWeight=0.25 PerfScore 0.31G_M37982_IG17: ;; offset=00F0H 00007ff8`a20ebe10 8BCBmovecx,ebx 00007ff8`a20ebe12 488D442428learax, bword ptr[rsp+28H] 00007ff8`a20ebe17488930mov bword ptr[rax],rsi 00007ff8`a20ebe1a894808mov dword ptr[rax+8],ecx 00007ff8`a20ebe1d 488D4C2428learcx, bword ptr[rsp+28H]; =========================== 32B boundary =========================== 00007ff8`a20ebe22 E889CA8FFFcall Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]) 00007ff8`a20ebe27 FFC3incebx 00007ff8`a20ebe29 3BDFcmpebx,edi 00007ff8`a20ebe2b7739ja SHORT G_M37982_IG22;; bbWeight=0.50 PerfScore 2.88G_M37982_IG18: ;; offset=010DH 00007ff8`a20ebe2d 2BFBsubedi,ebx 00007ff8`a20ebe2f 488B542420movrdx, gword ptr[rsp+20H] 00007ff8`a20ebe34 85FFtestedi,edi 00007ff8`a20ebe36 7D08jge SHORT G_M37982_IG20;; bbWeight=0.50 PerfScore 1.25G_M37982_IG19: ;; offset=0118H 00007ff8`a20ebe38 488BCAmovrcx,rdx 00007ff8`a20ebe3b E888FE8DFFcall System.Diagnostics.Debug:Fail(System.String,System.String); =========================== 32B boundary ===========================;; bbWeight=0.25 PerfScore 0.31G_M37982_IG20: ;; offset=0120H 00007ff8`a20ebe40 4863CBmovsxdrcx,ebx 00007ff8`a20ebe43 488D0C8Elearcx, bword ptr[rsi+4*rcx] 00007ff8`a20ebe47 488D442428learax, bword ptr[rsp+28H] 00007ff8`a20ebe4c488908mov bword ptr[rax],rcx 00007ff8`a20ebe4f897808mov dword ptr[rax+8],edi 00007ff8`a20ebe52 488D4C2428learcx, bword ptr[rsp+28H] 00007ff8`a20ebe57 E854CA8FFFcall Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]) 00007ff8`a20ebe5c90nop;; bbWeight=0.50 PerfScore 2.50G_M37982_IG21: ;; offset=013DH 00007ff8`a20ebe5d 4883C438addrsp,56; =========================== 32B boundary =========================== 00007ff8`a20ebe61 5Bpoprbx 00007ff8`a20ebe62 5Dpoprbp 00007ff8`a20ebe63 5Epoprsi 00007ff8`a20ebe64 5Fpoprdi 00007ff8`a20ebe65 C3ret;; bbWeight=0.50 PerfScore 1.63G_M37982_IG22: ;; offset=0146H 00007ff8`a20ebe66 E83DEB8EFFcall System.ThrowHelper:ThrowArgumentOutOfRangeException() 00007ff8`a20ebe6b CCint3;; bbWeight=0 PerfScore 0.00G_M37982_IG23: ;; offset=014CH 00007ff8`a20ebe6c E8AF87065Fcall CORINFO_HELP_RNGCHKFAIL 00007ff8`a20ebe71 CCint3;; bbWeight=0 PerfScore 0.00; Total bytes of code 338, prolog size 20, PerfScore 283.30, instruction count 115 (MethodHash=bb696ba1) for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32]); ============================================================
Here, none of the 3 loops (outerfor and 2 innerwhile) loops are present inoptLoopTable. On investigation,@BruceForstall and I found out that it happens because the entry to the loop is below the bottom of the loop and from the entry it jumps up to the top block, however the bottom's backedge jumps to the first block of the loop.
Here is the pictorial representation. Ideally, we favor below loops. The diagram is taken fromoptimizer.cpp.
| v head | | top/first <--+ | | | | ... | | | | | v | +---> entry | | | ... | | | v | +-- exit/tail | | | | | ... | | | | | v | | bottom ---+ | +------+ | vHowever, in this case, the loop is little different and hence we reject it inFindNaturalLoops() because the codeFindEntry returnsnullptr and hence wedon't recognize it as a loop.
| v head | | +--> first | | | | | | | | v | | top <---+ | | ... | | | | | | | v | | +--- bottom | | | | | ... | | | | | v | +------> entry ---+ | +------+ | vWe should fix it to identify such loops or at least make an entry tooptLoopTable to take advantage of other loop optimizations.
This should be one of the part of#43549.
category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:medium
Metadata
Metadata
Assignees
Labels
Type
Projects
Status