JIT: Add scalar evolution analysis and do IV widening based on it #97865


Merged
jakobbotsch merged 73 commits into dotnet:main from jakobbotsch:simplify-ivs
Feb 27, 2024

jakobbotsch commented Feb 2, 2024 (edited)

This adds a new phase meant for optimizing induction variables. It adds infrastructure for SSA-based analysis of induction variables (scalar evolution analysis), and uses it to do induction variable widening.
For example, with this optimization, codegen for

[MethodImpl(MethodImplOptions.NoInlining)]
static int Foo(int[] arr)
{
    int sum = 0;
    for (int i = 0; i < arr.Length; i++)
    {
        sum += arr[i];
    }

    return sum;
}

goes from

; Assembly listing for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T02] (  4,  7   )     ref  ->  rcx         class-hnd single-def <int[]>
;  V01 loc0         [V01,T01] (  4, 10   )     int  ->  rax
;  V02 loc1         [V02,T00] (  5, 17   )     int  ->  rdx
;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 cse0         [V04,T03] (  3,  6   )     int  ->   r8         "CSE - aggressive"
;
; Lcl frame size = 40

G_M8112_IG01:
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:
       xor      eax, eax
       xor      edx, edx
       mov      r8d, dword ptr [rcx+0x08]
       test     r8d, r8d
       jle      SHORT G_M8112_IG04
       align    [0 bytes for IG03]
       ;; size=13 bbWeight=1 PerfScore 3.75
G_M8112_IG03:
       mov      r10d, edx
       add      eax, dword ptr [rcx+4*r10+0x10]
       inc      edx
       cmp      r8d, edx
       jg       SHORT G_M8112_IG03
       ;; size=15 bbWeight=4 PerfScore 19.00
G_M8112_IG04:
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 37, prolog size 4, PerfScore 24.25, instruction count 14, allocated bytes for code 37 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; ============================================================

to

; Assembly listing for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T02] (  4,  7   )     ref  ->  rcx         class-hnd single-def <int[]>
;  V01 loc0         [V01,T01] (  4, 10   )     int  ->  rax
;* V02 loc1         [V02,T04] (  0,  0   )     int  ->  zero-ref
;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 tmp1         [V04,T00] (  5, 17   )    long  ->   r8         "Widened primary induction variable"
;  V05 cse0         [V05,T03] (  3,  6   )     int  ->  rdx         "CSE - aggressive"
;
; Lcl frame size = 40

G_M8112_IG01:  ;; offset=0x0000
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:  ;; offset=0x0004
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M8112_IG04
       xor      r8d, r8d
       align    [0 bytes for IG03]
       ;; size=12 bbWeight=1 PerfScore 3.75
G_M8112_IG03:  ;; offset=0x0010
       add      eax, dword ptr [rcx+4*r8+0x10]
       inc      r8d
       cmp      edx, r8d
       jg       SHORT G_M8112_IG03
       ;; size=13 bbWeight=4 PerfScore 18.00
G_M8112_IG04:  ;; offset=0x001D
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 34, prolog size 4, PerfScore 23.25, instruction count 13, allocated bytes for code 34 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)

where we were able to drop a zero extension of the index inside the loop. In the future I plan to build strength reduction on top of the same analysis package.

The analysis is inspired by [1] and by LLVM's scalar evolution package. It provides a small IR that represents the evolving value of IR nodes inside loops. At the core of this IR is the notion of an "add recurrence", which describes a changing value as <loop, start, step>; the value of such an add recurrence is $start + N * step$, where N is the iteration index. Currently only simple add recurrences are supported where the start and step are either constants or invariant locals, but the framework generalizes nicely to allow chains of recurrences if we wish to support that. The IR also supports constants, invariant locals and operators on top of these (casts, adds, multiplications and shifts).
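To make the definition concrete, here is a minimal Python sketch of an add recurrence with constant start and step. The `AddRec` class and its `value_at` method are hypothetical names for illustration only; they are not the JIT's actual data structures, and overflow is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddRec:
    """Hypothetical model of an add recurrence <loop, start, step>."""
    loop: str
    start: int
    step: int

    def value_at(self, n: int) -> int:
        # Value in iteration n (0-based): start + n * step.
        return self.start + n * self.step


# The index of `for (int i = 0; i < arr.Length; i++)` is <L00, 0, 1>.
index = AddRec("L00", 0, 1)
values = [index.value_at(n) for n in range(4)]
print(values)
```

With start 0 and step 1 the recurrence simply enumerates the iteration index; other starts and steps shift and scale it.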

Analysis example

For the IR for the above, the analysis produces the following:

Analyzing scalar evolution in L00 header: BB03
  Members (1): BB03
  Entry: BB02 -> BB03
  Exit: BB03 -> BB04
  Back: BB03 -> BB03
BB03 [0001] [006..016) -> BB03,BB04 (cond), preds={BB02,BB03} succs={BB04,BB03}

STMT00009 ( ??? ... ??? )
N004 (0,0) [000045] DA---------             STORE_LCL_VAR int V01 loc0 d:3
N003 (0,0) [000044] -----------             └──▌ PHI       int
N001 (0,0) [000050] ----------- pred BB03      ├──▌ PHI_ARG   int V01 loc0 u:4
N002 (0,0) [000047] ----------- pred BB02      └──▌ PHI_ARG   int V01 loc0 u:2
[000047] => 0

STMT00007 ( ??? ... ??? )
N004 (0,0) [000041] DA---------             STORE_LCL_VAR int V02 loc1 d:3
N003 (0,0) [000040] -----------             └──▌ PHI       int
N001 (0,0) [000051] ----------- pred BB03      ├──▌ PHI_ARG   int V02 loc1 u:4
N002 (0,0) [000048] ----------- pred BB02      └──▌ PHI_ARG   int V02 loc1 u:2
[000051] => <L00, 1, 1>
[000048] => 0
[000041] => <L00, 0, 1>

For example, here it was able to show that V02, the index, is a primary induction variable; it is an add recurrence that starts at 0 and steps by 1 every iteration of the loop. It also showed that the value that comes from the backedge is similarly an add recurrence, except that it starts at 1 in the first iteration.

STMT00003 ( 0x006[E-] ... 0x00B )
N015 (8,9) [000015] DA--GO-----  STORE_LCL_VAR int V01 loc0 d:4
N014 (8,9) [000014] ----GO-----  └──▌ ADD       int
N012 (6,7) [000033] ----GO-N---     ├──▌ COMMA     int
N001 (0,0) [000025] -----------     │  ├──▌ NOP       void
N011 (6,7) [000034] n---GO-----     │  └──▌ IND       int
N010 (3,5) [000032] -----O-----     │     └──▌ ARR_ADDR  byref int[]
N009 (3,5) [000031] -------N---     │        └──▌ ADD       byref
N002 (1,1) [000022] -----------     │           ├──▌ LCL_VAR   ref V00 arg0 u:1
N008 (4,5) [000030] -------N---     │           └──▌ ADD       long
N006 (3,4) [000028] -------N---     │              ├──▌ LSH       long
N004 (2,3) [000026] ---------U-     │              │  ├──▌ CAST      long <- uint
N003 (1,1) [000023] -----------     │              │  │  └──▌ LCL_VAR   int V02 loc1 u:3
N005 (1,1) [000027] -------N---     │              │  └──▌ CNS_INT   long 2
N007 (1,1) [000029] -----------     │              └──▌ CNS_INT   long 16
N013 (1,1) [000009] -----------     └──▌ LCL_VAR   int V01 loc0 u:3 (last use)
[000022] => V00.1
[000023] => <L00, 0, 1>
[000026] => <L00, 0, 1>
[000027] => 2
[000028] => <L00, 0, 4>
[000029] => 16
[000030] => <L00, 16, 4>
[000031] => <L00, (V00.1 + 16), 4>
[000032] => <L00, (V00.1 + 16), 4>

This one is more interesting since we can see hints of how strength reduction is going to utilize the information. In particular, the analysis was able to show that the address [000032] is also an add recurrence; it starts at value (V00.1 + 16) (the address of the first array element) and steps by 4 in every iteration.
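The way the address computation folds into a single add recurrence can be sketched as follows. This is a simplified Python model under stated assumptions: `Local`, `Sum`, `AddRec`, `shl` and `add` are hypothetical names, only "local + constant" starts are modelled, and overflow is ignored. It is not the JIT's implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Local:
    """A loop-invariant SSA local, e.g. V00.1 (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Sum:
    """An invariant expression of the form local + constant."""
    local: Local
    const: int


Invariant = Union[int, Local, Sum]


@dataclass(frozen=True)
class AddRec:
    loop: str
    start: Invariant
    step: int


def add_invariants(a: Invariant, b: Invariant) -> Invariant:
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    if isinstance(a, int):
        a, b = b, a  # keep the constant on the right
    if isinstance(b, int):
        if isinstance(a, Local):
            return Sum(a, b)
        if isinstance(a, Sum):
            return Sum(a.local, a.const + b)
    raise NotImplementedError("only local + constant is modelled")


def shl(rec: AddRec, amount: int) -> AddRec:
    # <L, a, s> << k  ==  <L, a << k, s << k>, for a constant start.
    if not isinstance(rec.start, int):
        raise NotImplementedError("only constant starts are modelled")
    return AddRec(rec.loop, rec.start << amount, rec.step << amount)


def add(rec: AddRec, inv: Invariant) -> AddRec:
    # <L, a, s> + c  ==  <L, a + c, s>, for a loop-invariant c.
    return AddRec(rec.loop, add_invariants(rec.start, inv), rec.step)


index = AddRec("L00", 0, 1)             # [000023]
scaled = shl(index, 2)                  # [000028]
offset = add(scaled, 16)                # [000030]
address = add(offset, Local("V00.1"))   # [000031]
print(scaled, offset, address, sep="\n")
```

Each step mirrors one node of the dump: the shift scales both start and step, while adding an invariant only changes the start.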

STMT00004 ( 0x00C[E-] ... 0x00F )
N004 (3,3) [000019] DA---------  STORE_LCL_VAR int V02 loc1 d:4
N003 (3,3) [000018] -----------  └──▌ ADD       int
N001 (1,1) [000016] -----------     ├──▌ LCL_VAR   int V02 loc1 u:3 (last use)
N002 (1,1) [000017] -----------     └──▌ CNS_INT   int 1
[000016] => <L00, 0, 1>
[000017] => 1
[000018] => <L00, 1, 1>
[000019] => <L00, 1, 1>

STMT00002 ( 0x010[E-] ... 0x014 )
N005 (7,7) [000008] ---X-------  JTRUE     void
N004 (5,5) [000007] J--X---N---  └──▌ GT        int
N002 (3,3) [000006] ---X-------     ├──▌ ARR_LENGTH int
N001 (1,1) [000005] -----------     │  └──▌ LCL_VAR   ref V00 arg0 u:1
N003 (1,1) [000004] -----------     └──▌ LCL_VAR   int V02 loc1 u:4
[000005] => V00.1
[000004] => <L00, 1, 1>

Here the analysis shows that the array object is invariant (otherwise analysis would fail) and that the compared index is an add recurrence that starts at 1.

When the induction variable has uses after the loop, the widening pass stores the widened version back to the old local. This is only possible if every exit where the old local is live-in is a non-critical block, in the sense that all of its preds come from inside the loop.
Exit canonicalization ensures that this is usually the case, but RBO/assertion prop may have uncovered new natural loops, so we still have to repeat the check.
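The exit condition described above amounts to a simple predecessor check. The Python sketch below illustrates it on a hypothetical CFG; the function name, the dictionary representation and the block lists are assumptions made for the example and do not reflect the JIT's flow graph API.

```python
def can_store_back(loop_blocks, preds, exits_with_live_in):
    """True if every exit where the old local is live-in has only in-loop preds."""
    return all(
        pred in loop_blocks
        for exit_block in exits_with_live_in
        for pred in preds[exit_block]
    )


loop = {"BB03"}

# Hypothetical CFG 1: the exit BB04 is reached only from inside the loop.
dedicated = {"BB04": ["BB03"]}
# Hypothetical CFG 2: BB04 is also reached from outside the loop (BB02).
shared = {"BB04": ["BB02", "BB03"]}

print(can_store_back(loop, dedicated, ["BB04"]))
print(can_store_back(loop, shared, ["BB04"]))
```

In the second CFG a store placed in BB04 would also execute on the path that never entered the loop, which is why such exits have to be rejected.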

[1] Michael Wolfe. 1992. Beyond induction variables. In Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation (PLDI '92). Association for Computing Machinery, New York, NY, USA, 162–174. https://doi.org/10.1145/143095.143131

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 2, 2024

@ghost

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This adds a new phase meant for optimizing induction variables. It adds infrastructure for SSA-based analysis of induction variables (scalar evolution analysis), and uses it to do induction variable widening.
For example, with this optimization, codegen for

[MethodImpl(MethodImplOptions.NoInlining)]
static int Foo(int[] arr)
{
    int sum = 0;
    foreach (int v in arr)
    {
        sum += v;
    }

    return sum;
}

goes from

; Assembly listing for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T02] (  4,  7   )     ref  ->  rcx         class-hnd single-def <int[]>
;  V01 loc0         [V01,T01] (  4, 10   )     int  ->  rax
;  V02 loc1         [V02,T00] (  5, 17   )     int  ->  rdx
;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 cse0         [V04,T03] (  3,  6   )     int  ->   r8         "CSE - aggressive"
;
; Lcl frame size = 40

G_M8112_IG01:
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:
       xor      eax, eax
       xor      edx, edx
       mov      r8d, dword ptr [rcx+0x08]
       test     r8d, r8d
       jle      SHORT G_M8112_IG04
       align    [0 bytes for IG03]
       ;; size=13 bbWeight=1 PerfScore 3.75
G_M8112_IG03:
       mov      r10d, edx
       add      eax, dword ptr [rcx+4*r10+0x10]
       inc      edx
       cmp      r8d, edx
       jg       SHORT G_M8112_IG03
       ;; size=15 bbWeight=4 PerfScore 19.00
G_M8112_IG04:
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 37, prolog size 4, PerfScore 24.25, instruction count 14, allocated bytes for code 37 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; ============================================================

to

;  V00 arg0         [V00,T02] (  4,  7   )     ref  ->  rcx         class-hnd single-def <int[]>
;  V01 loc0         [V01,T01] (  4, 10   )     int  ->  rax
;* V02 loc1         [V02,T04] (  0,  0   )     int  ->  zero-ref
;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 tmp1         [V04,T00] (  5, 17   )    long  ->   r8         "Widened primary induction variable"
;  V05 cse0         [V05,T03] (  3,  6   )     int  ->  rdx         "CSE - aggressive"
;
; Lcl frame size = 40

G_M8112_IG01:
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M8112_IG04
       xor      r8d, r8d
       jmp      SHORT G_M8112_IG03
       align    [0 bytes for IG03]
       ;; size=14 bbWeight=1 PerfScore 5.75
G_M8112_IG03:
       add      eax, dword ptr [rcx+4*r8+0x10]
       inc      r8d
       cmp      edx, r8d
       jg       SHORT G_M8112_IG03
       ;; size=13 bbWeight=4 PerfScore 18.00
G_M8112_IG04:
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 36, prolog size 4, PerfScore 25.25, instruction count 14, allocated bytes for code 36 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)

where we were able to drop a zero extension of the index inside the loop. In the future I plan to build strength reduction on top of the same analysis package.


I do not think we can enable this by default yet, for two reasons:

  1. It breaks range check in some cases since range check does not handle TYP_LONG locals well.
  2. Currently it rebuilds SSA fully from scratch whenever it widens an IV. There is some tension in this phase since it is SSA-based but also creates new locals that we really want to put in SSA. My plan is to add the ability to incrementally update SSA to get rid of this rebuilding.

Also, there is work necessary to properly prove that some analysis and folding is ok in the face of potential overflows. Still, I do not see why the use in this PR for IV widening should not be correct, so I want to see what CI finds. The implementation makes use of the fact that

  1. Almost all our extensions are zero extensions, and it is free for our targeted platforms to do a zero extension when storing from a 32-bit value; thus, the actual operations themselves are not widened and there is no risk of overflow changing semantics.
  2. The 64-bit backends already support TYP_INT access of TYP_LONG locals, so no truncations need to be inserted.
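The two points above can be illustrated with a small Python model of a 64-bit register that is only ever written by 32-bit operations. It relies on the x64 and arm64 behaviour that a 32-bit register write zeroes the upper 32 bits; the helper names are hypothetical and the model is a sketch, not a description of the JIT's code generator.

```python
MASK32 = 0xFFFF_FFFF


def write32(value: int) -> int:
    """Model a 32-bit register write: the upper 32 bits of the register become zero."""
    return value & MASK32


def inc32(reg64: int) -> int:
    # Like `inc r8d`: a 32-bit increment whose result is zero-extended.
    return write32((reg64 & MASK32) + 1)


def narrow_uses(start: int, iterations: int) -> list:
    """Keep a 32-bit index and zero-extend it at every use (the old codegen)."""
    i = start & MASK32
    uses = []
    for _ in range(iterations):
        uses.append(i & MASK32)  # explicit zero extension, like `mov r10d, edx`
        i = (i + 1) & MASK32
    return uses


def widened_uses(start: int, iterations: int) -> list:
    """Keep the index in a 64-bit register updated only by 32-bit operations."""
    reg = write32(start)
    uses = []
    for _ in range(iterations):
        uses.append(reg)  # the full register is already zero-extended
        reg = inc32(reg)
    return uses


print(widened_uses(0xFFFF_FFFE, 4))
```

Because the increment itself stays 32-bit, the widened register wraps exactly where the narrow index would, so the values seen at each use are identical even across overflow.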


Author: jakobbotsch
Assignees: jakobbotsch
Labels: area-CodeGen-coreclr
Milestone: -

@ryujit-bot

Diff results for #97865

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,507,286 contexts (1,007,092 MinOpts, 1,500,194 FullOpts).

MISSED contexts: base: 27 (0.00%), diff: 32 (0.00%)

Overall (+1,719,060 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch | 15,557,856 | +27,216
benchmarks.run_pgo.linux.arm64.checked.mch | 80,097,444 | +360,784
benchmarks.run_tiered.linux.arm64.checked.mch | 24,601,848 | +14,124
coreclr_tests.run.linux.arm64.checked.mch | 508,729,840 | +334,212
libraries.crossgen2.linux.arm64.checked.mch | 55,844,176 | +78,032
libraries.pmi.linux.arm64.checked.mch | 76,294,380 | +89,360
libraries_tests.run.linux.arm64.Release.mch | 395,716,116 | +669,880
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 165,002,020 | +115,260
realworld.run.linux.arm64.checked.mch | 15,904,164 | +20,608
smoke_tests.nativeaot.linux.arm64.checked.mch | 2,946,828 | +9,584

FullOpts (+1,719,060 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch | 15,252,928 | +27,216
benchmarks.run_pgo.linux.arm64.checked.mch | 54,162,660 | +360,784
benchmarks.run_tiered.linux.arm64.checked.mch | 4,863,412 | +14,124
coreclr_tests.run.linux.arm64.checked.mch | 160,581,688 | +334,212
libraries.crossgen2.linux.arm64.checked.mch | 55,842,540 | +78,032
libraries.pmi.linux.arm64.checked.mch | 76,174,396 | +89,360
libraries_tests.run.linux.arm64.Release.mch | 180,584,164 | +669,880
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 151,504,556 | +115,260
realworld.run.linux.arm64.checked.mch | 15,323,240 | +20,608
smoke_tests.nativeaot.linux.arm64.checked.mch | 2,945,840 | +9,584

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,517,881 contexts (991,070 MinOpts, 1,526,811 FullOpts).

MISSED contexts: 28 (0.00%)

Overall (+572,703 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.linux.x64.checked.mch | 14,336,747 | +1,856
benchmarks.run_pgo.linux.x64.checked.mch | 71,580,068 | +153,953
benchmarks.run_tiered.linux.x64.checked.mch | 21,436,595 | -2,845
coreclr_tests.run.linux.x64.checked.mch | 403,712,034 | -51,428
libraries.crossgen2.linux.x64.checked.mch | 38,727,480 | +18,381
libraries.pmi.linux.x64.checked.mch | 60,420,078 | +23,143
libraries_tests.run.linux.x64.Release.mch | 337,125,301 | +391,429
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 132,553,319 | +29,341
realworld.run.linux.x64.checked.mch | 13,174,530 | +8,436
smoke_tests.nativeaot.linux.x64.checked.mch | 4,234,684 | +437

FullOpts (+572,703 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.linux.x64.checked.mch | 14,037,444 | +1,856
benchmarks.run_pgo.linux.x64.checked.mch | 47,780,079 | +153,953
benchmarks.run_tiered.linux.x64.checked.mch | 3,695,815 | -2,845
coreclr_tests.run.linux.x64.checked.mch | 123,957,574 | -51,428
libraries.crossgen2.linux.x64.checked.mch | 38,726,282 | +18,381
libraries.pmi.linux.x64.checked.mch | 60,307,221 | +23,143
libraries_tests.run.linux.x64.Release.mch | 153,365,692 | +391,429
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 121,935,551 | +29,341
realworld.run.linux.x64.checked.mch | 12,788,646 | +8,436
smoke_tests.nativeaot.linux.x64.checked.mch | 4,233,735 | +437

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,270,841 contexts (932,669 MinOpts, 1,338,172 FullOpts).

MISSED contexts: base: 26 (0.00%), diff: 29 (0.00%)

Overall (+1,124,576 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch | 11,186,636 | +27,356
benchmarks.run_pgo.osx.arm64.checked.mch | 34,442,660 | +59,268
benchmarks.run_tiered.osx.arm64.checked.mch | 15,517,000 | +13,372
coreclr_tests.run.osx.arm64.checked.mch | 486,415,276 | +312,820
libraries.crossgen2.osx.arm64.checked.mch | 55,725,640 | +77,884
libraries.pmi.osx.arm64.checked.mch | 80,218,788 | +99,136
libraries_tests.run.osx.arm64.Release.mch | 324,601,508 | +399,340
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 162,572,640 | +114,848
realworld.run.osx.arm64.checked.mch | 15,061,556 | +20,552

FullOpts (+1,124,576 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch | 11,186,100 | +27,356
benchmarks.run_pgo.osx.arm64.checked.mch | 18,141,360 | +59,268
benchmarks.run_tiered.osx.arm64.checked.mch | 4,012,516 | +13,372
coreclr_tests.run.osx.arm64.checked.mch | 153,802,352 | +312,820
libraries.crossgen2.osx.arm64.checked.mch | 55,724,012 | +77,884
libraries.pmi.osx.arm64.checked.mch | 80,097,660 | +99,136
libraries_tests.run.osx.arm64.Release.mch | 120,885,660 | +399,340
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 149,418,912 | +114,848
realworld.run.osx.arm64.checked.mch | 14,497,600 | +20,552

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,341,080 contexts (938,449 MinOpts, 1,402,631 FullOpts).

MISSED contexts: base: 34 (0.00%), diff: 37 (0.00%)

Overall (+1,320,572 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch | 10,966,144 | +26,900
benchmarks.run_pgo.windows.arm64.checked.mch | 45,576,512 | +150,544
benchmarks.run_tiered.windows.arm64.checked.mch | 15,587,900 | +14,344
coreclr_tests.run.windows.arm64.checked.mch | 495,268,304 | +323,640
libraries.crossgen2.windows.arm64.checked.mch | 59,069,380 | +83,104
libraries.pmi.windows.arm64.checked.mch | 79,846,048 | +94,408
libraries_tests.run.windows.arm64.Release.mch | 330,812,748 | +472,236
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 171,579,600 | +120,020
realworld.run.windows.arm64.checked.mch | 15,905,252 | +21,532
smoke_tests.nativeaot.windows.arm64.checked.mch | 3,970,272 | +13,844

FullOpts (+1,320,572 bytes)
Collection | Base size (bytes) | Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch | 10,965,608 | +26,900
benchmarks.run_pgo.windows.arm64.checked.mch | 29,565,116 | +150,544
benchmarks.run_tiered.windows.arm64.checked.mch | 4,410,712 | +14,344
coreclr_tests.run.windows.arm64.checked.mch | 156,579,200 | +323,640
libraries.crossgen2.windows.arm64.checked.mch | 59,067,744 | +83,104
libraries.pmi.windows.arm64.checked.mch | 79,726,064 | +94,408
libraries_tests.run.windows.arm64.Release.mch | 127,379,152 | +472,236
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 158,425,852 | +120,020
realworld.run.windows.arm64.checked.mch | 15,341,272 | +21,532
smoke_tests.nativeaot.windows.arm64.checked.mch | 3,969,260 | +13,844

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,512,182 contexts (997,391 MinOpts, 1,514,791 FullOpts).

MISSED contexts: base: 29 (0.00%), diff: 30 (0.00%)

Overall (-18,030 bytes)
Collection | Base size (bytes) | Diff size (bytes)
aspnet.run.windows.x64.checked.mch | 47,040,393 | +5,552
benchmarks.run.windows.x64.checked.mch | 8,742,476 | -2,838
benchmarks.run_pgo.windows.x64.checked.mch | 36,233,314 | -2,927
benchmarks.run_tiered.windows.x64.checked.mch | 12,416,932 | -4,780
coreclr_tests.run.windows.x64.checked.mch | 393,194,802 | -71,902
libraries.crossgen2.windows.x64.checked.mch | 39,486,541 | -1,798
libraries.pmi.windows.x64.checked.mch | 61,663,361 | -2,020
libraries_tests.run.windows.x64.Release.mch | 282,135,537 | +68,806
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,061,874 | -5,098
realworld.run.windows.x64.checked.mch | 14,130,826 | +1,095
smoke_tests.nativeaot.windows.x64.checked.mch | 5,083,179 | -2,120

FullOpts (-18,030 bytes)
Collection | Base size (bytes) | Diff size (bytes)
aspnet.run.windows.x64.checked.mch | 28,549,344 | +5,552
benchmarks.run.windows.x64.checked.mch | 8,742,113 | -2,838
benchmarks.run_pgo.windows.x64.checked.mch | 22,062,658 | -2,927
benchmarks.run_tiered.windows.x64.checked.mch | 3,317,693 | -4,780
coreclr_tests.run.windows.x64.checked.mch | 120,406,222 | -71,902
libraries.crossgen2.windows.x64.checked.mch | 39,485,354 | -1,798
libraries.pmi.windows.x64.checked.mch | 61,549,840 | -2,020
libraries_tests.run.windows.x64.Release.mch | 106,277,303 | +68,806
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 126,442,768 | -5,098
realworld.run.windows.x64.checked.mch | 13,744,217 | +1,095
smoke_tests.nativeaot.windows.x64.checked.mch | 5,082,232 | -2,120



Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.78% to +2.55%)
Collection | PDIFF
benchmarks.run.linux.arm64.checked.mch | +1.23%
benchmarks.run_pgo.linux.arm64.checked.mch | +2.23%
benchmarks.run_tiered.linux.arm64.checked.mch | +0.90%
coreclr_tests.run.linux.arm64.checked.mch | +1.15%
libraries.crossgen2.linux.arm64.checked.mch | +0.99%
libraries.pmi.linux.arm64.checked.mch | +1.04%
libraries_tests.run.linux.arm64.Release.mch | +2.55%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.78%
realworld.run.linux.arm64.checked.mch | +1.32%
smoke_tests.nativeaot.linux.arm64.checked.mch | +1.36%

MinOpts (-0.00% to +0.01%)
Collection | PDIFF
libraries.pmi.linux.arm64.checked.mch | +0.01%

FullOpts (+0.80% to +3.29%)
Collection | PDIFF
benchmarks.run.linux.arm64.checked.mch | +1.23%
benchmarks.run_pgo.linux.arm64.checked.mch | +2.52%
benchmarks.run_tiered.linux.arm64.checked.mch | +1.86%
coreclr_tests.run.linux.arm64.checked.mch | +1.93%
libraries.crossgen2.linux.arm64.checked.mch | +0.99%
libraries.pmi.linux.arm64.checked.mch | +1.04%
libraries_tests.run.linux.arm64.Release.mch | +3.29%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.80%
realworld.run.linux.arm64.checked.mch | +1.33%
smoke_tests.nativeaot.linux.arm64.checked.mch | +1.36%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.82% to +2.31%)
Collection | PDIFF
benchmarks.run.linux.x64.checked.mch | +1.19%
benchmarks.run_pgo.linux.x64.checked.mch | +2.29%
benchmarks.run_tiered.linux.x64.checked.mch | +0.90%
coreclr_tests.run.linux.x64.checked.mch | +1.23%
libraries.crossgen2.linux.x64.checked.mch | +1.08%
libraries.pmi.linux.x64.checked.mch | +1.10%
libraries_tests.run.linux.x64.Release.mch | +2.31%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.82%
realworld.run.linux.x64.checked.mch | +1.39%
smoke_tests.nativeaot.linux.x64.checked.mch | +1.50%

FullOpts (+0.84% to +2.94%)
Collection | PDIFF
benchmarks.run.linux.x64.checked.mch | +1.20%
benchmarks.run_pgo.linux.x64.checked.mch | +2.57%
benchmarks.run_tiered.linux.x64.checked.mch | +1.86%
coreclr_tests.run.linux.x64.checked.mch | +2.06%
libraries.crossgen2.linux.x64.checked.mch | +1.08%
libraries.pmi.linux.x64.checked.mch | +1.10%
libraries_tests.run.linux.x64.Release.mch | +2.94%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.84%
realworld.run.linux.x64.checked.mch | +1.39%
smoke_tests.nativeaot.linux.x64.checked.mch | +1.50%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.78% to +2.27%)
Collection | PDIFF
benchmarks.run.osx.arm64.checked.mch | +1.58%
benchmarks.run_pgo.osx.arm64.checked.mch | +2.05%
benchmarks.run_tiered.osx.arm64.checked.mch | +1.15%
coreclr_tests.run.osx.arm64.checked.mch | +1.14%
libraries.crossgen2.osx.arm64.checked.mch | +0.99%
libraries.pmi.osx.arm64.checked.mch | +1.06%
libraries_tests.run.osx.arm64.Release.mch | +2.27%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | +0.78%
realworld.run.osx.arm64.checked.mch | +1.37%

FullOpts (+0.80% to +3.18%)
Collection | PDIFF
benchmarks.run.osx.arm64.checked.mch | +1.58%
benchmarks.run_pgo.osx.arm64.checked.mch | +2.52%
benchmarks.run_tiered.osx.arm64.checked.mch | +1.99%
coreclr_tests.run.osx.arm64.checked.mch | +1.92%
libraries.crossgen2.osx.arm64.checked.mch | +0.99%
libraries.pmi.osx.arm64.checked.mch | +1.07%
libraries_tests.run.osx.arm64.Release.mch | +3.18%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | +0.80%
realworld.run.osx.arm64.checked.mch | +1.38%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.77% to +2.53%)
Collection | PDIFF
benchmarks.run.windows.arm64.checked.mch | +1.58%
benchmarks.run_pgo.windows.arm64.checked.mch | +2.53%
benchmarks.run_tiered.windows.arm64.checked.mch | +1.16%
coreclr_tests.run.windows.arm64.checked.mch | +1.14%
libraries.crossgen2.windows.arm64.checked.mch | +1.00%
libraries.pmi.windows.arm64.checked.mch | +1.05%
libraries_tests.run.windows.arm64.Release.mch | +2.21%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | +0.77%
realworld.run.windows.arm64.checked.mch | +1.36%
smoke_tests.nativeaot.windows.arm64.checked.mch | +1.48%

FullOpts (+0.79% to +3.08%)
Collection | PDIFF
benchmarks.run.windows.arm64.checked.mch | +1.58%
benchmarks.run_pgo.windows.arm64.checked.mch | +2.88%
benchmarks.run_tiered.windows.arm64.checked.mch | +1.92%
coreclr_tests.run.windows.arm64.checked.mch | +1.93%
libraries.crossgen2.windows.arm64.checked.mch | +1.00%
libraries.pmi.windows.arm64.checked.mch | +1.06%
libraries_tests.run.windows.arm64.Release.mch | +3.08%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | +0.79%
realworld.run.windows.arm64.checked.mch | +1.37%
smoke_tests.nativeaot.windows.arm64.checked.mch | +1.48%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.83% to +2.44%)
Collection | PDIFF
aspnet.run.windows.x64.checked.mch | +1.80%
benchmarks.run.windows.x64.checked.mch | +1.59%
benchmarks.run_pgo.windows.x64.checked.mch | +2.44%
benchmarks.run_tiered.windows.x64.checked.mch | +1.14%
coreclr_tests.run.windows.x64.checked.mch | +1.23%
libraries.crossgen2.windows.x64.checked.mch | +1.09%
libraries.pmi.windows.x64.checked.mch | +1.11%
libraries_tests.run.windows.x64.Release.mch | +2.05%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | +0.83%
realworld.run.windows.x64.checked.mch | +1.35%
smoke_tests.nativeaot.windows.x64.checked.mch | +1.54%

FullOpts (+0.85% to +2.80%)
Collection | PDIFF
aspnet.run.windows.x64.checked.mch | +2.06%
benchmarks.run.windows.x64.checked.mch | +1.59%
benchmarks.run_pgo.windows.x64.checked.mch | +2.80%
benchmarks.run_tiered.windows.x64.checked.mch | +1.84%
coreclr_tests.run.windows.x64.checked.mch | +2.04%
libraries.crossgen2.windows.x64.checked.mch | +1.09%
libraries.pmi.windows.x64.checked.mch | +1.11%
libraries_tests.run.windows.x64.Release.mch | +2.80%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | +0.85%
realworld.run.windows.x64.checked.mch | +1.35%
smoke_tests.nativeaot.windows.x64.checked.mch | +1.54%



Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.68% to +2.26%)
Collection | PDIFF
coreclr_tests.run.linux.arm64.checked.mch | +0.92%
benchmarks.run_pgo.linux.arm64.checked.mch | +2.03%
benchmarks.run_tiered.linux.arm64.checked.mch | +0.78%
libraries.pmi.linux.arm64.checked.mch | +0.92%
libraries_tests.run.linux.arm64.Release.mch | +2.26%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.68%
smoke_tests.nativeaot.linux.arm64.checked.mch | +1.22%
libraries.crossgen2.linux.arm64.checked.mch | +0.88%
benchmarks.run.linux.arm64.checked.mch | +1.06%
realworld.run.linux.arm64.checked.mch | +1.15%

FullOpts (+0.70% to +2.91%)
Collection | PDIFF
coreclr_tests.run.linux.arm64.checked.mch | +1.59%
benchmarks.run_pgo.linux.arm64.checked.mch | +2.29%
benchmarks.run_tiered.linux.arm64.checked.mch | +1.58%
libraries.pmi.linux.arm64.checked.mch | +0.92%
libraries_tests.run.linux.arm64.Release.mch | +2.91%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.70%
smoke_tests.nativeaot.linux.arm64.checked.mch | +1.22%
libraries.crossgen2.linux.arm64.checked.mch | +0.88%
benchmarks.run.linux.arm64.checked.mch | +1.07%
realworld.run.linux.arm64.checked.mch | +1.16%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.72% to +2.06%)

| Collection | PDIFF |
| --- | --- |
| libraries.pmi.linux.x64.checked.mch | +0.96% |
| libraries.crossgen2.linux.x64.checked.mch | +0.95% |
| benchmarks.run_pgo.linux.x64.checked.mch | +2.06% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +1.31% |
| libraries_tests.run.linux.x64.Release.mch | +2.02% |
| benchmarks.run_tiered.linux.x64.checked.mch | +0.76% |
| realworld.run.linux.x64.checked.mch | +1.20% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.72% |
| coreclr_tests.run.linux.x64.checked.mch | +0.96% |
| benchmarks.run.linux.x64.checked.mch | +1.02% |

FullOpts (+0.74% to +2.58%)

| Collection | PDIFF |
| --- | --- |
| libraries.pmi.linux.x64.checked.mch | +0.96% |
| libraries.crossgen2.linux.x64.checked.mch | +0.95% |
| benchmarks.run_pgo.linux.x64.checked.mch | +2.31% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +1.31% |
| libraries_tests.run.linux.x64.Release.mch | +2.58% |
| benchmarks.run_tiered.linux.x64.checked.mch | +1.56% |
| realworld.run.linux.x64.checked.mch | +1.21% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.74% |
| coreclr_tests.run.linux.x64.checked.mch | +1.67% |
| benchmarks.run.linux.x64.checked.mch | +1.03% |

Details here


Throughput diffs for linux/arm ran on windows/x86

Overall (+0.31% to +1.70%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm.checked.mch | +0.65% |
| benchmarks.run_pgo.linux.arm.checked.mch | +0.31% |
| benchmarks.run_tiered.linux.arm.checked.mch | +0.62% |
| coreclr_tests.run.linux.arm.checked.mch | +1.70% |
| libraries.crossgen2.linux.arm.checked.mch | +0.57% |
| libraries.pmi.linux.arm.checked.mch | +0.60% |
| libraries_tests.run.linux.arm.Release.mch | +0.77% |
| libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | +0.47% |
| realworld.run.linux.arm.checked.mch | +0.93% |

FullOpts (+0.33% to +2.77%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm.checked.mch | +0.65% |
| benchmarks.run_pgo.linux.arm.checked.mch | +0.33% |
| benchmarks.run_tiered.linux.arm.checked.mch | +0.75% |
| coreclr_tests.run.linux.arm.checked.mch | +2.77% |
| libraries.crossgen2.linux.arm.checked.mch | +0.57% |
| libraries.pmi.linux.arm.checked.mch | +0.60% |
| libraries_tests.run.linux.arm.Release.mch | +0.97% |
| libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | +0.49% |
| realworld.run.linux.arm.checked.mch | +0.94% |

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.48% to +1.86%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.x86.checked.mch | +1.03% |
| benchmarks.run_pgo.windows.x86.checked.mch | +0.48% |
| benchmarks.run_tiered.windows.x86.checked.mch | +1.12% |
| coreclr_tests.run.windows.x86.checked.mch | +1.86% |
| libraries.crossgen2.windows.x86.checked.mch | +0.68% |
| libraries.pmi.windows.x86.checked.mch | +0.66% |
| libraries_tests.run.windows.x86.Release.mch | +0.95% |
| libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | +0.50% |
| realworld.run.windows.x86.checked.mch | +0.81% |

FullOpts (+0.50% to +2.77%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.x86.checked.mch | +1.03% |
| benchmarks.run_pgo.windows.x86.checked.mch | +0.50% |
| benchmarks.run_tiered.windows.x86.checked.mch | +1.34% |
| coreclr_tests.run.windows.x86.checked.mch | +2.77% |
| libraries.crossgen2.windows.x86.checked.mch | +0.68% |
| libraries.pmi.windows.x86.checked.mch | +0.66% |
| libraries_tests.run.windows.x86.Release.mch | +1.19% |
| libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | +0.51% |
| realworld.run.windows.x86.checked.mch | +0.81% |

Details here


@jakobbotsch (Member, Author)

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch (Member, Author)

Test failures look like #97892.
Regressions are much larger outside win-x64, and I need to understand why. Looking at the examples, almost all of them are range-check related, for the reason I mentioned above. I don't understand why that wouldn't affect win-x64 equally, though.
Some of the arm64 diffs appear to be because we do not support the zero-cost `STORE_LCL_VAR(CAST<long <- int>(data))` pattern there, but that should be implementable. Also, IV widening is proportionally less useful on arm64, since address modes can zero-extend as part of the instruction, so fewer direct benefits are to be expected there.
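
The effect discussed above can be illustrated outside the JIT. The following is a minimal C++ sketch, not the JIT's implementation; the function names `SumNarrowIndex` and `SumWideIndex` are hypothetical. It contrasts a loop whose 32-bit index is extended to pointer width on every iteration with one whose index has been widened to 64 bits up front, which is roughly the source-level analogue of IV widening.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Narrow induction variable: the index is 32 bits wide, so every array
// access first extends it to pointer width before it can be used in an
// address computation.
static int32_t SumNarrowIndex(const int32_t* data, int32_t length)
{
    assert(length >= 0);
    int32_t sum = 0;
    for (int32_t i = 0; i < length; i++)
    {
        // Per-iteration zero extension of the 32-bit index.
        size_t widened = static_cast<size_t>(static_cast<uint32_t>(i));
        sum += data[widened];
    }
    return sum;
}

// Widened induction variable: the index is 64 bits wide from the start,
// so it can be used directly in the address computation. The bound is
// extended once, outside the loop.
static int32_t SumWideIndex(const int32_t* data, int32_t length)
{
    assert(length >= 0);
    int32_t sum = 0;
    const int64_t limit = length;
    for (int64_t i = 0; i < limit; i++)
    {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same result for any non-negative `length`; only the width of the loop counter differs. An optimizing C++ compiler may well perform this transformation itself, so the sketch shows the shape of the change rather than a measurable difference. On targets where the extension folds into the load's addressing mode, as described for arm64 above, the first form already costs little, which fits the expectation of smaller gains there.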

We currently support scaled addressing modes when the index also needs an extension through contained `BFIZ` nodes. However, we did not support scaled addressing modes if the index was 64 bits. This adds that support as a natural extension to the `GT_LEA`.
@ryujit-bot

Diff results for #97865

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.30% to +1.63%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm64.checked.mch | +0.56% |
| benchmarks.run_pgo.linux.arm64.checked.mch | +1.40% |
| benchmarks.run_tiered.linux.arm64.checked.mch | +0.30% |
| coreclr_tests.run.linux.arm64.checked.mch | +0.39% |
| libraries.crossgen2.linux.arm64.checked.mch | +0.47% |
| libraries.pmi.linux.arm64.checked.mch | +0.54% |
| libraries_tests.run.linux.arm64.Release.mch | +1.63% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.41% |
| realworld.run.linux.arm64.checked.mch | +0.67% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | +0.61% |

MinOpts (-0.03% to +0.00%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm64.checked.mch | -0.02% |
| benchmarks.run_pgo.linux.arm64.checked.mch | -0.02% |
| benchmarks.run_tiered.linux.arm64.checked.mch | -0.02% |
| libraries.pmi.linux.arm64.checked.mch | -0.03% |
| libraries_tests.run.linux.arm64.Release.mch | -0.01% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | -0.01% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | -0.01% |

FullOpts (+0.42% to +2.10%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm64.checked.mch | +0.56% |
| benchmarks.run_pgo.linux.arm64.checked.mch | +1.59% |
| benchmarks.run_tiered.linux.arm64.checked.mch | +0.65% |
| coreclr_tests.run.linux.arm64.checked.mch | +0.66% |
| libraries.crossgen2.linux.arm64.checked.mch | +0.47% |
| libraries.pmi.linux.arm64.checked.mch | +0.54% |
| libraries_tests.run.linux.arm64.Release.mch | +2.10% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.42% |
| realworld.run.linux.arm64.checked.mch | +0.67% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | +0.61% |

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.33% to +1.50%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.x64.checked.mch | +0.57% |
| benchmarks.run_pgo.linux.x64.checked.mch | +1.49% |
| benchmarks.run_tiered.linux.x64.checked.mch | +0.33% |
| coreclr_tests.run.linux.x64.checked.mch | +0.44% |
| libraries.crossgen2.linux.x64.checked.mch | +0.56% |
| libraries.pmi.linux.x64.checked.mch | +0.60% |
| libraries_tests.run.linux.x64.Release.mch | +1.50% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.46% |
| realworld.run.linux.x64.checked.mch | +0.74% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +0.74% |

MinOpts (+0.00% to +0.01%)

| Collection | PDIFF |
| --- | --- |
| libraries.pmi.linux.x64.checked.mch | +0.01% |

FullOpts (+0.47% to +1.91%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.x64.checked.mch | +0.58% |
| benchmarks.run_pgo.linux.x64.checked.mch | +1.67% |
| benchmarks.run_tiered.linux.x64.checked.mch | +0.68% |
| coreclr_tests.run.linux.x64.checked.mch | +0.74% |
| libraries.crossgen2.linux.x64.checked.mch | +0.56% |
| libraries.pmi.linux.x64.checked.mch | +0.60% |
| libraries_tests.run.linux.x64.Release.mch | +1.91% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.47% |
| realworld.run.linux.x64.checked.mch | +0.74% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +0.74% |

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.37% to +1.43%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.osx.arm64.checked.mch | +0.73% |
| benchmarks.run_pgo.osx.arm64.checked.mch | +0.95% |
| benchmarks.run_tiered.osx.arm64.checked.mch | +0.39% |
| coreclr_tests.run.osx.arm64.checked.mch | +0.37% |
| libraries.crossgen2.osx.arm64.checked.mch | +0.47% |
| libraries.pmi.osx.arm64.checked.mch | +0.56% |
| libraries_tests.run.osx.arm64.Release.mch | +1.43% |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | +0.41% |
| realworld.run.osx.arm64.checked.mch | +0.69% |

MinOpts (-0.04% to +0.01%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.osx.arm64.checked.mch | -0.03% |
| benchmarks.run_tiered.osx.arm64.checked.mch | -0.03% |
| libraries.crossgen2.osx.arm64.checked.mch | -0.01% |
| libraries.pmi.osx.arm64.checked.mch | -0.04% |
| libraries_tests.run.osx.arm64.Release.mch | -0.01% |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | -0.01% |
| realworld.run.osx.arm64.checked.mch | +0.01% |

FullOpts (+0.42% to +2.02%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.osx.arm64.checked.mch | +0.73% |
| benchmarks.run_pgo.osx.arm64.checked.mch | +1.18% |
| benchmarks.run_tiered.osx.arm64.checked.mch | +0.70% |
| coreclr_tests.run.osx.arm64.checked.mch | +0.63% |
| libraries.crossgen2.osx.arm64.checked.mch | +0.47% |
| libraries.pmi.osx.arm64.checked.mch | +0.56% |
| libraries_tests.run.osx.arm64.Release.mch | +2.02% |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | +0.42% |
| realworld.run.osx.arm64.checked.mch | +0.70% |

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.39% to +1.50%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.arm64.checked.mch | +0.73% |
| benchmarks.run_pgo.windows.arm64.checked.mch | +1.50% |
| benchmarks.run_tiered.windows.arm64.checked.mch | +0.40% |
| coreclr_tests.run.windows.arm64.checked.mch | +0.39% |
| libraries.crossgen2.windows.arm64.checked.mch | +0.47% |
| libraries.pmi.windows.arm64.checked.mch | +0.55% |
| libraries_tests.run.windows.arm64.Release.mch | +1.40% |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | +0.41% |
| realworld.run.windows.arm64.checked.mch | +0.69% |
| smoke_tests.nativeaot.windows.arm64.checked.mch | +0.65% |

MinOpts (-0.03% to +0.00%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.windows.arm64.checked.mch | -0.03% |
| benchmarks.run_tiered.windows.arm64.checked.mch | -0.03% |
| libraries.pmi.windows.arm64.checked.mch | -0.03% |
| libraries_tests.run.windows.arm64.Release.mch | -0.01% |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | -0.01% |
| smoke_tests.nativeaot.windows.arm64.checked.mch | -0.01% |

FullOpts (+0.42% to +1.96%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.arm64.checked.mch | +0.73% |
| benchmarks.run_pgo.windows.arm64.checked.mch | +1.72% |
| benchmarks.run_tiered.windows.arm64.checked.mch | +0.69% |
| coreclr_tests.run.windows.arm64.checked.mch | +0.66% |
| libraries.crossgen2.windows.arm64.checked.mch | +0.47% |
| libraries.pmi.windows.arm64.checked.mch | +0.55% |
| libraries_tests.run.windows.arm64.Release.mch | +1.96% |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | +0.42% |
| realworld.run.windows.arm64.checked.mch | +0.70% |
| smoke_tests.nativeaot.windows.arm64.checked.mch | +0.65% |

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.43% to +1.31%)

| Collection | PDIFF |
| --- | --- |
| aspnet.run.windows.x64.checked.mch | +1.19% |
| benchmarks.run.windows.x64.checked.mch | +0.78% |
| benchmarks.run_pgo.windows.x64.checked.mch | +1.28% |
| benchmarks.run_tiered.windows.x64.checked.mch | +0.43% |
| coreclr_tests.run.windows.x64.checked.mch | +0.44% |
| libraries.crossgen2.windows.x64.checked.mch | +0.56% |
| libraries.pmi.windows.x64.checked.mch | +0.60% |
| libraries_tests.run.windows.x64.Release.mch | +1.31% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | +0.46% |
| realworld.run.windows.x64.checked.mch | +0.72% |
| smoke_tests.nativeaot.windows.x64.checked.mch | +0.76% |

MinOpts (+0.00% to +0.01%)

| Collection | PDIFF |
| --- | --- |
| libraries.pmi.windows.x64.checked.mch | +0.01% |

FullOpts (+0.47% to +1.79%)

| Collection | PDIFF |
| --- | --- |
| aspnet.run.windows.x64.checked.mch | +1.36% |
| benchmarks.run.windows.x64.checked.mch | +0.78% |
| benchmarks.run_pgo.windows.x64.checked.mch | +1.47% |
| benchmarks.run_tiered.windows.x64.checked.mch | +0.70% |
| coreclr_tests.run.windows.x64.checked.mch | +0.74% |
| libraries.crossgen2.windows.x64.checked.mch | +0.56% |
| libraries.pmi.windows.x64.checked.mch | +0.60% |
| libraries_tests.run.windows.x64.Release.mch | +1.79% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | +0.47% |
| realworld.run.windows.x64.checked.mch | +0.72% |
| smoke_tests.nativeaot.windows.x64.checked.mch | +0.76% |

Details here


Throughput diffs for linux/arm ran on windows/x86

MinOpts (-0.02% to -0.00%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.linux.arm.checked.mch | -0.01% |
| benchmarks.run_pgo.linux.arm.checked.mch | -0.01% |
| benchmarks.run_tiered.linux.arm.checked.mch | -0.01% |
| libraries.pmi.linux.arm.checked.mch | -0.02% |
| libraries_tests.run.linux.arm.Release.mch | -0.01% |

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.01%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.x86.checked.mch | +0.01% |
| benchmarks.run_pgo.windows.x86.checked.mch | +0.01% |
| benchmarks.run_tiered.windows.x86.checked.mch | +0.01% |
| coreclr_tests.run.windows.x86.checked.mch | +0.01% |
| libraries.crossgen2.windows.x86.checked.mch | +0.01% |
| libraries.pmi.windows.x86.checked.mch | +0.01% |
| libraries_tests.run.windows.x86.Release.mch | +0.01% |
| libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | +0.01% |
| realworld.run.windows.x86.checked.mch | +0.01% |

MinOpts (+0.00% to +0.02%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.windows.x86.checked.mch | +0.01% |
| benchmarks.run_tiered.windows.x86.checked.mch | +0.01% |
| libraries.pmi.windows.x86.checked.mch | +0.02% |
| libraries_tests.run.windows.x86.Release.mch | +0.01% |
| libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | +0.01% |

FullOpts (+0.01%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run.windows.x86.checked.mch | +0.01% |
| benchmarks.run_pgo.windows.x86.checked.mch | +0.01% |
| benchmarks.run_tiered.windows.x86.checked.mch | +0.01% |
| coreclr_tests.run.windows.x86.checked.mch | +0.01% |
| libraries.crossgen2.windows.x86.checked.mch | +0.01% |
| libraries.pmi.windows.x86.checked.mch | +0.01% |
| libraries_tests.run.windows.x86.Release.mch | +0.01% |
| libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | +0.01% |
| realworld.run.windows.x86.checked.mch | +0.01% |

Details here


@ryujit-bot

Diff results for #97865

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,505,877 contexts (1,007,092 MinOpts, 1,498,785 FullOpts).

MISSED contexts: base: 1,433 (0.06%), diff: 1,441 (0.06%)

Overall (+904,416 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.linux.arm64.checked.mch | 15,545,232 | +13,448 |
| benchmarks.run_pgo.linux.arm64.checked.mch | 79,735,896 | +232,112 |
| benchmarks.run_tiered.linux.arm64.checked.mch | 24,597,876 | +1,620 |
| coreclr_tests.run.linux.arm64.checked.mch | 508,521,576 | +166,136 |
| libraries.crossgen2.linux.arm64.checked.mch | 55,834,212 | +40,568 |
| libraries.pmi.linux.arm64.checked.mch | 76,274,408 | +50,684 |
| libraries_tests.run.linux.arm64.Release.mch | 394,024,440 | +327,664 |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 164,978,292 | +57,440 |
| realworld.run.linux.arm64.checked.mch | 15,900,008 | +10,512 |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 2,829,664 | +4,232 |

MinOpts (-22,888 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run_pgo.linux.arm64.checked.mch | 25,934,784 | -5,292 |
| benchmarks.run_tiered.linux.arm64.checked.mch | 19,738,436 | -3,656 |
| coreclr_tests.run.linux.arm64.checked.mch | 348,148,152 | -2,712 |
| libraries_tests.run.linux.arm64.Release.mch | 215,131,952 | -10,636 |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 13,497,464 | -592 |

FullOpts (+927,304 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.linux.arm64.checked.mch | 15,240,304 | +13,448 |
| benchmarks.run_pgo.linux.arm64.checked.mch | 53,801,112 | +237,404 |
| benchmarks.run_tiered.linux.arm64.checked.mch | 4,859,440 | +5,276 |
| coreclr_tests.run.linux.arm64.checked.mch | 160,373,424 | +168,848 |
| libraries.crossgen2.linux.arm64.checked.mch | 55,832,576 | +40,568 |
| libraries.pmi.linux.arm64.checked.mch | 76,154,424 | +50,684 |
| libraries_tests.run.linux.arm64.Release.mch | 178,892,488 | +338,300 |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 151,480,828 | +58,032 |
| realworld.run.linux.arm64.checked.mch | 15,319,084 | +10,512 |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 2,828,676 | +4,232 |

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,516,325 contexts (991,070 MinOpts, 1,525,255 FullOpts).

MISSED contexts: 1,584 (0.06%)

Overall (+565,740 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.linux.x64.checked.mch | 14,336,053 | +1,904 |
| benchmarks.run_pgo.linux.x64.checked.mch | 71,262,198 | +157,649 |
| benchmarks.run_tiered.linux.x64.checked.mch | 21,436,449 | -2,851 |
| coreclr_tests.run.linux.x64.checked.mch | 403,514,426 | -53,512 |
| libraries.crossgen2.linux.x64.checked.mch | 38,726,572 | +18,689 |
| libraries.pmi.linux.x64.checked.mch | 60,417,780 | +23,461 |
| libraries_tests.run.linux.x64.Release.mch | 335,675,120 | +382,346 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 132,550,474 | +29,541 |
| realworld.run.linux.x64.checked.mch | 13,174,358 | +8,391 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,064,942 | +122 |

FullOpts (+565,740 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.linux.x64.checked.mch | 14,036,750 | +1,904 |
| benchmarks.run_pgo.linux.x64.checked.mch | 47,462,209 | +157,649 |
| benchmarks.run_tiered.linux.x64.checked.mch | 3,695,669 | -2,851 |
| coreclr_tests.run.linux.x64.checked.mch | 123,759,966 | -53,512 |
| libraries.crossgen2.linux.x64.checked.mch | 38,725,374 | +18,689 |
| libraries.pmi.linux.x64.checked.mch | 60,304,923 | +23,461 |
| libraries_tests.run.linux.x64.Release.mch | 151,915,511 | +382,346 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 121,932,706 | +29,541 |
| realworld.run.linux.x64.checked.mch | 12,788,474 | +8,391 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,063,993 | +122 |

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,270,092 contexts (932,669 MinOpts, 1,337,423 FullOpts).

MISSED contexts: base: 772 (0.03%), diff: 778 (0.03%)

Overall (+539,840 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.osx.arm64.checked.mch | 11,173,700 | +13,580 |
| benchmarks.run_pgo.osx.arm64.checked.mch | 34,346,112 | +21,404 |
| benchmarks.run_tiered.osx.arm64.checked.mch | 15,512,636 | +1,296 |
| coreclr_tests.run.osx.arm64.checked.mch | 486,269,368 | +155,172 |
| libraries.crossgen2.osx.arm64.checked.mch | 55,715,716 | +40,472 |
| libraries.pmi.osx.arm64.checked.mch | 80,194,908 | +61,468 |
| libraries_tests.run.osx.arm64.Release.mch | 323,385,308 | +178,832 |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 162,548,024 | +57,088 |
| realworld.run.osx.arm64.checked.mch | 15,058,824 | +10,528 |

MinOpts (-21,812 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run_pgo.osx.arm64.checked.mch | 16,301,300 | -4,868 |
| benchmarks.run_tiered.osx.arm64.checked.mch | 11,504,484 | -3,636 |
| coreclr_tests.run.osx.arm64.checked.mch | 332,612,924 | -2,076 |
| libraries_tests.run.osx.arm64.Release.mch | 203,715,848 | -10,640 |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 13,153,728 | -592 |

FullOpts (+561,652 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.osx.arm64.checked.mch | 11,173,164 | +13,580 |
| benchmarks.run_pgo.osx.arm64.checked.mch | 18,044,812 | +26,272 |
| benchmarks.run_tiered.osx.arm64.checked.mch | 4,008,152 | +4,932 |
| coreclr_tests.run.osx.arm64.checked.mch | 153,656,444 | +157,248 |
| libraries.crossgen2.osx.arm64.checked.mch | 55,714,088 | +40,472 |
| libraries.pmi.osx.arm64.checked.mch | 80,073,780 | +61,468 |
| libraries_tests.run.osx.arm64.Release.mch | 119,669,460 | +189,472 |
| libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 149,394,296 | +57,680 |
| realworld.run.osx.arm64.checked.mch | 14,494,868 | +10,528 |

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,339,802 contexts (938,449 MinOpts, 1,401,353 FullOpts).

MISSED contexts: base: 1,309 (0.06%), diff: 1,315 (0.06%)

Overall (+659,716 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.windows.arm64.checked.mch | 10,957,488 | +13,160 |
| benchmarks.run_pgo.windows.arm64.checked.mch | 45,442,912 | +82,132 |
| benchmarks.run_tiered.windows.arm64.checked.mch | 15,586,420 | +1,528 |
| coreclr_tests.run.windows.arm64.checked.mch | 495,080,320 | +159,760 |
| libraries.crossgen2.windows.arm64.checked.mch | 59,059,452 | +43,292 |
| libraries.pmi.windows.arm64.checked.mch | 79,827,256 | +49,804 |
| libraries_tests.run.windows.arm64.Release.mch | 329,452,088 | +233,324 |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 171,557,220 | +59,636 |
| realworld.run.windows.arm64.checked.mch | 15,902,524 | +11,028 |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 3,798,548 | +6,052 |

MinOpts (-21,684 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run_pgo.windows.arm64.checked.mch | 16,011,396 | -4,920 |
| benchmarks.run_tiered.windows.arm64.checked.mch | 11,177,188 | -3,600 |
| coreclr_tests.run.windows.arm64.checked.mch | 338,689,104 | -2,184 |
| libraries_tests.run.windows.arm64.Release.mch | 203,433,596 | -10,388 |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 13,153,748 | -592 |

FullOpts (+681,400 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.windows.arm64.checked.mch | 10,956,952 | +13,160 |
| benchmarks.run_pgo.windows.arm64.checked.mch | 29,431,516 | +87,052 |
| benchmarks.run_tiered.windows.arm64.checked.mch | 4,409,232 | +5,128 |
| coreclr_tests.run.windows.arm64.checked.mch | 156,391,216 | +161,944 |
| libraries.crossgen2.windows.arm64.checked.mch | 59,057,816 | +43,292 |
| libraries.pmi.windows.arm64.checked.mch | 79,707,272 | +49,804 |
| libraries_tests.run.windows.arm64.Release.mch | 126,018,492 | +243,712 |
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 158,403,472 | +60,228 |
| realworld.run.windows.arm64.checked.mch | 15,338,544 | +11,028 |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 3,797,536 | +6,052 |

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,510,841 contexts (997,391 MinOpts, 1,513,450 FullOpts).

MISSED contexts: base: 1,370 (0.05%), diff: 1,371 (0.05%)

Overall (-20,210 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| aspnet.run.windows.x64.checked.mch | 47,016,272 | +5,893 |
| benchmarks.run.windows.x64.checked.mch | 8,741,804 | -2,717 |
| benchmarks.run_pgo.windows.x64.checked.mch | 36,134,420 | -2,823 |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,416,717 | -4,767 |
| coreclr_tests.run.windows.x64.checked.mch | 393,035,946 | -72,969 |
| libraries.crossgen2.windows.x64.checked.mch | 39,485,796 | -1,523 |
| libraries.pmi.windows.x64.checked.mch | 61,660,326 | -1,714 |
| libraries_tests.run.windows.x64.Release.mch | 280,853,081 | +65,922 |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,059,674 | -5,007 |
| realworld.run.windows.x64.checked.mch | 14,130,524 | +1,107 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,880,142 | -1,612 |

FullOpts (-20,210 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| aspnet.run.windows.x64.checked.mch | 28,525,223 | +5,893 |
| benchmarks.run.windows.x64.checked.mch | 8,741,441 | -2,717 |
| benchmarks.run_pgo.windows.x64.checked.mch | 21,963,764 | -2,823 |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,317,478 | -4,767 |
| coreclr_tests.run.windows.x64.checked.mch | 120,247,366 | -72,969 |
| libraries.crossgen2.windows.x64.checked.mch | 39,484,609 | -1,523 |
| libraries.pmi.windows.x64.checked.mch | 61,546,805 | -1,714 |
| libraries_tests.run.windows.x64.Release.mch | 104,994,847 | +65,922 |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 126,440,568 | -5,007 |
| realworld.run.windows.x64.checked.mch | 13,743,915 | +1,107 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,879,195 | -1,612 |

Details here
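
The byte counts in the assembly diff tables are easier to compare across collections as a fraction of the base size. The snippet below is a small sketch of that arithmetic; the helper name `RelativeDiffPercent` is hypothetical and not part of the diff tooling.

```cpp
#include <cassert>
#include <cstdint>

// Relative code-size change in percent, given a collection's base size
// and its size diff (both in bytes). A negative diff means the code shrank.
static double RelativeDiffPercent(int64_t baseBytes, int64_t diffBytes)
{
    assert(baseBytes > 0);
    return 100.0 * static_cast<double>(diffBytes) / static_cast<double>(baseBytes);
}
```

For example, the `libraries_tests.run.linux.arm64.Release.mch` row (394,024,440 bytes base, +327,664 bytes diff) works out to roughly +0.083%, while the `coreclr_tests.run.windows.x64.checked.mch` row (393,035,946 bytes base, -72,969 bytes diff) works out to roughly -0.019%.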


Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.32% to +1.53%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.linux.arm64.checked.mch | +1.38% |
| libraries_tests.run.linux.arm64.Release.mch | +1.53% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | +0.61% |
| coreclr_tests.run.linux.arm64.checked.mch | +0.38% |
| benchmarks.run_tiered.linux.arm64.checked.mch | +0.32% |
| libraries.crossgen2.linux.arm64.checked.mch | +0.48% |
| benchmarks.run.linux.arm64.checked.mch | +0.55% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.41% |
| realworld.run.linux.arm64.checked.mch | +0.66% |
| libraries.pmi.linux.arm64.checked.mch | +0.55% |

MinOpts (+0.01% to +0.08%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.linux.arm64.checked.mch | +0.01% |
| libraries_tests.run.linux.arm64.Release.mch | +0.02% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | +0.02% |
| coreclr_tests.run.linux.arm64.checked.mch | +0.01% |
| benchmarks.run_tiered.linux.arm64.checked.mch | +0.01% |
| libraries.crossgen2.linux.arm64.checked.mch | +0.01% |
| benchmarks.run.linux.arm64.checked.mch | +0.05% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.01% |
| realworld.run.linux.arm64.checked.mch | +0.01% |
| libraries.pmi.linux.arm64.checked.mch | +0.08% |

FullOpts (+0.42% to +1.96%)

| Collection | PDIFF |
| --- | --- |
| benchmarks.run_pgo.linux.arm64.checked.mch | +1.55% |
| libraries_tests.run.linux.arm64.Release.mch | +1.96% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | +0.61% |
| coreclr_tests.run.linux.arm64.checked.mch | +0.65% |
| benchmarks.run_tiered.linux.arm64.checked.mch | +0.64% |
| libraries.crossgen2.linux.arm64.checked.mch | +0.48% |
| benchmarks.run.linux.arm64.checked.mch | +0.56% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | +0.42% |
| realworld.run.linux.arm64.checked.mch | +0.66% |
| libraries.pmi.linux.arm64.checked.mch | +0.55% |

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.31% to +1.43%)

| Collection | PDIFF |
| --- | --- |
| libraries.crossgen2.linux.x64.checked.mch | +0.54% |
| benchmarks.run_pgo.linux.x64.checked.mch | +1.43% |
| coreclr_tests.run.linux.x64.checked.mch | +0.40% |
| realworld.run.linux.x64.checked.mch | +0.70% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +0.71% |
| benchmarks.run.linux.x64.checked.mch | +0.54% |
| libraries.pmi.linux.x64.checked.mch | +0.56% |
| libraries_tests.run.linux.x64.Release.mch | +1.37% |
| benchmarks.run_tiered.linux.x64.checked.mch | +0.31% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.43% |

MinOpts (+0.00% to +0.01%)

| Collection | PDIFF |
| --- | --- |
| libraries.pmi.linux.x64.checked.mch | +0.01% |

FullOpts (+0.44% to +1.75%)

| Collection | PDIFF |
| --- | --- |
| libraries.crossgen2.linux.x64.checked.mch | +0.54% |
| benchmarks.run_pgo.linux.x64.checked.mch | +1.60% |
| coreclr_tests.run.linux.x64.checked.mch | +0.70% |
| realworld.run.linux.x64.checked.mch | +0.70% |
| smoke_tests.nativeaot.linux.x64.checked.mch | +0.71% |
| benchmarks.run.linux.x64.checked.mch | +0.54% |
| libraries.pmi.linux.x64.checked.mch | +0.56% |
| libraries_tests.run.linux.x64.Release.mch | +1.75% |
| benchmarks.run_tiered.linux.x64.checked.mch | +0.64% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | +0.44% |

Details here


@jakobbotsch
Member, Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Member, Author

jakobbotsch commented Feb 22, 2024

cc @dotnet/jit-contrib PTAL @BruceForstall @AndyAyersMS

Diffs. I spent a long time analyzing these, see #97865 (comment) and #97865 (comment).

There were a bunch of regressions from IV widening in cases where the same local is the primary IV of multiple loops. This particularly affects cloning. I've disabled this case for now; I want to look closer at it and figure out how to enable it separately.

The initial implementation always placed the initialization of the widened IV in the preheader. However, if the reaching def from outside the loop comes from a block much further back, then that means the lifetime of the widened IV is shorter than the old IV. In a lot of cases that resulted in suboptimal register allocation of the widened IV (because other live variables ended up taking the "good" register before we got to the initialization of the widened IV). So this implementation tries to place the initialization of the widened IV in the same spot as the old narrow IV, and then does a best-effort pass to replace uses of the narrow IV with the wide IV on the way to the loop. That seems to work well, but it's a bit of a hack -- I hope to reevaluate it in the future, perhaps with some LSRA work.
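For readers less familiar with the transformation, here is a source-level C++ sketch of what widening does to a loop like the `Foo` example in the PR description. It is only an illustration: the JIT performs this on its IR, and the names `SumNarrow` and `SumWidened` are hypothetical. The widened variable is initialized at the same point where the narrow one was, mirroring the placement described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Before widening: the IV is 32 bits wide, so every array access first needs
// an extension to pointer size (compare `mov r10d, edx` in the listing).
int32_t SumNarrow(const std::vector<int32_t>& arr)
{
    int32_t sum    = 0;
    int32_t length = static_cast<int32_t>(arr.size());
    for (int32_t i = 0; i < length; i++)
    {
        // Per-iteration zero extension from 32 to 64 bits.
        uint64_t index = static_cast<uint64_t>(static_cast<uint32_t>(i));
        sum += arr[static_cast<size_t>(index)];
    }
    return sum;
}

// After widening: the IV itself is 64 bits wide and is initialized where the
// narrow IV used to be initialized, so the loop body needs no extension.
// This is only valid because i is known to stay in [0, length).
int32_t SumWidened(const std::vector<int32_t>& arr)
{
    int32_t sum    = 0;
    int64_t length = static_cast<int64_t>(arr.size());
    for (int64_t i = 0; i < length; i++)
    {
        sum += arr[static_cast<size_t>(i)];
    }
    return sum;
}
```

Both functions compute the same result; the difference is only in how much work the loop body does per iteration to form the address.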

Throughput impact is in the 0.2-0.3% range for most collections, with benchmarks.run_pgo being a bit worse.

Member

@AndyAyersMS left a comment

Did a first review pass and it looks good overall. Will look again later.

I wonder if you might save some TP if you kept track of the uses during the locate phase and used that to drive replacement.

More generally if we want to recognize secondary IVs it seems like walking forward from the primary IV appearances might be a good plan. Perhaps when we get to strength reduction?

// => <L, 0, <L, 1, 1>>
//
// The main issue is that it requires cache invalidation afterwards
// and turning the recursive result into an addrec.
Member

Any idea how prevalent these more complicated IVs might be?

Member, Author

I haven't checked, but I have some (hacky) code lying around for it that I was planning to look at the impact of later.

There are some other notable missing things in the SCEV IR, like a subtraction operator, so loops that subtract by an invariant local are not representable (subtractions with a constant are usually represented as `GT_ADD` so those are ok).
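To illustrate the kind of IR being discussed, below is a minimal C++ sketch of an add-recurrence representation and how it can be evaluated at a given iteration. This is not the JIT's actual `Scev` class hierarchy; `ScevNode`, `MakeConstant`, `MakeAdd`, `MakeAddRec` and `Evaluate` are hypothetical names, and the sketch assumes a single loop, so the loop operand is omitted. It shows why the nested form `<L, 0, <L, 1, 1>>` is more complicated (the step itself changes per iteration) and why subtracting a constant needs no dedicated operator.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical, simplified SCEV node kinds (single loop assumed).
enum class ScevKind { Constant, Add, AddRec };

struct ScevNode;
using ScevPtr = std::shared_ptr<ScevNode>;

struct ScevNode
{
    ScevKind kind;
    int64_t  value; // Constant: the value; unused otherwise
    ScevPtr  op1;   // Add: left operand;  AddRec: start
    ScevPtr  op2;   // Add: right operand; AddRec: step
};

ScevPtr MakeConstant(int64_t value)
{
    return std::make_shared<ScevNode>(ScevNode{ScevKind::Constant, value, nullptr, nullptr});
}

ScevPtr MakeAdd(ScevPtr lhs, ScevPtr rhs)
{
    return std::make_shared<ScevNode>(ScevNode{ScevKind::Add, 0, lhs, rhs});
}

ScevPtr MakeAddRec(ScevPtr start, ScevPtr step)
{
    return std::make_shared<ScevNode>(ScevNode{ScevKind::AddRec, 0, start, step});
}

// Value of the expression at the start of the given (0-based) iteration.
int64_t Evaluate(const ScevPtr& scev, int64_t iteration)
{
    switch (scev->kind)
    {
        case ScevKind::Constant:
            return scev->value;
        case ScevKind::Add:
            return Evaluate(scev->op1, iteration) + Evaluate(scev->op2, iteration);
        case ScevKind::AddRec:
        {
            // Start is loop invariant; the step is added once per completed
            // iteration and may itself be a recurrence.
            int64_t result = Evaluate(scev->op1, 0);
            for (int64_t k = 0; k < iteration; k++)
            {
                result += Evaluate(scev->op2, k);
            }
            return result;
        }
    }
    return 0;
}
```

With these helpers, `i--` starting from 10 is just `MakeAddRec(MakeConstant(10), MakeConstant(-1))`, whereas a step that is an invariant local being subtracted would need either a negation or a subtraction node, which this sketch (like the IR described above) does not have.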

}

return nullptr;
}
Member

Hoisting has a more refined notion of loop invariance that can handle more tree shapes; any thought of using something like that here?

Member, Author

Do you mean by switching to use VN? I suppose that would be possible (although somewhat annoying since the VN wouldn't give me things like "reaching def from outside the loop" that I'm making use of here). I think it would mainly get us the ability to reason about loop invariant loads (so you could have e.g. `<L00, 0, IND(x)>`, provided we also started representing loads in the SCEV IR).

@jakobbotsch
Member, Author

> I wonder if you might save some TP if you kept track of the uses during the locate phase and used that to drive replacement.
>
> More generally if we want to recognize secondary IVs it seems like walking forward from the primary IV appearances might be a good plan. Perhaps when we get to strength reduction?

I pushed a change that switches the profitability check to use the tree list and also saves all statements in which appearances were found, so that the replacement step doesn't need to walk all the loop IR again.
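The idea of recording appearances once and driving replacement from that record can be sketched in C++ roughly as follows. This is a simplified model, not the JIT code: `Statement`, `LocateUses` and `ReplaceUses` are hypothetical names, and statements are modeled as plain lists of operand names rather than IR trees.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a statement: just the locals it mentions.
struct Statement
{
    std::vector<std::string> operands;
};

// Locate phase: walk the loop's statements once and remember the indices of
// the statements that mention the narrow IV.
std::vector<size_t> LocateUses(const std::vector<Statement>& loopStatements, const std::string& narrowIV)
{
    std::vector<size_t> recorded;
    for (size_t i = 0; i < loopStatements.size(); i++)
    {
        for (const std::string& operand : loopStatements[i].operands)
        {
            if (operand == narrowIV)
            {
                recorded.push_back(i);
                break; // one entry per statement is enough
            }
        }
    }
    return recorded;
}

// Replace phase: revisit only the recorded statements instead of the whole loop.
// Returns the number of operands that were rewritten.
size_t ReplaceUses(std::vector<Statement>&    loopStatements,
                   const std::vector<size_t>& recorded,
                   const std::string&         narrowIV,
                   const std::string&         wideIV)
{
    size_t replaced = 0;
    for (size_t index : recorded)
    {
        for (std::string& operand : loopStatements[index].operands)
        {
            if (operand == narrowIV)
            {
                operand = wideIV;
                replaced++;
            }
        }
    }
    return replaced;
}
```

The second pass touches only statements known to contain the IV, which is where the (modest) throughput saving would come from in this model.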

The savings are relatively modest (TP diff) -- I think most TP is eaten by the DFS and loop finding. But indeed it seems like we can reuse it for strength reduction as well, to limit what nodes we invoke the scalar evolution analysis on.

Member

@AndyAyersMS left a comment

LGTM.

Will be interesting to see how this evolves when you start to look for secondary IVs.

@BruceForstall
Contributor

> cc @dotnet/jit-contrib PTAL @BruceForstall @AndyAyersMS

I've started looking at this.

Contributor

@BruceForstall left a comment

Some comments:

  1. It would be valuable for this code to have more design documentation. E.g., a Markdown document (in the JIT "doc" tree) on SCEV, on the IV widening algorithm design, etc. This could also take the form of more extensive design-oriented comments.
  2. Would it make sense to extract the SCEV code to separate scev.h/cpp files? That is, separate the analysis and optimization code.
  3. Add cScev/dScev debug dumpers?
  4. Is there any useful dumper to be added to clrjit.natvis?
  5. To match LLVM conventions, should the SCEV display the addition operator (even though that is currently the only one), e.g., `<loop, start, +, step>` or `<start, +, step>(loop)`?

// Parameters:
// scev - The scev node.
//
void ScalarEvolutionContext::DumpScev(Scev* scev)
Contributor

I presume this can't be `static` for some reason? If it was static it would be easier to call from a debugger, perhaps.

Member, Author

The main thing was that I wanted to display the specific loop for `ScevAddRec`, and `ScevAddRec` itself doesn't have the loop field. However, I just added that loop field as a debug-only field, and made the dumping a member method on `Scev`. So now we have `cScev` and `dScev`.

}

//------------------------------------------------------------------------
// optBestEffortReplaceNarrowIVUsesWith: Try to find and replace uses of the specified
Contributor

What makes this "best effort"?

It seems awkward to end the function name with "With".

Member, Author

Renamed it to `optBestEffortReplaceNarrowIVUses`, and added a remarks section:

// Remarks:
//   This function is best effort; it might not find all uses of the provided
//   SSA num, particularly because it does not follow into joins. Note that we
//   only use this to replace uses of the narrow IV outside the loop; inside
//   the loop we do ensure that all uses/defs are replaced.
//   Keeping it best-effort outside the loop is ok; there is no correctness
//   issue since we do not invalidate the value of the old narrow IV in any
//   way, but it may mean we end up leaving the narrow IV live concurrently
//   with the new widened IV, increasing register pressure.
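One way to picture "does not follow into joins" is the C++ sketch below. It is an assumption-laden model rather than the actual JIT function: `Block` and `ReplaceBestEffort` are hypothetical, uses are modeled as plain names, and a "join" is taken to mean any block with more than one predecessor. The walk starts at the block holding the narrow IV's definition and only enters successors that have a single predecessor, so uses beyond a join are left untouched.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block.
struct Block
{
    std::vector<std::string> uses;       // locals used in this block
    std::vector<size_t>      successors; // indices of successor blocks
    size_t                   predCount = 0;
};

// Replaces uses of narrowIV with wideIV starting at `start`, walking forward
// through successors but never entering a block with multiple predecessors.
// Returns the number of uses that were replaced.
size_t ReplaceBestEffort(std::vector<Block>& blocks,
                         size_t              start,
                         const std::string&  narrowIV,
                         const std::string&  wideIV)
{
    std::vector<bool>   visited(blocks.size(), false);
    std::vector<size_t> worklist{start};
    size_t              replaced = 0;

    while (!worklist.empty())
    {
        size_t current = worklist.back();
        worklist.pop_back();
        if (visited[current])
        {
            continue;
        }
        visited[current] = true;

        for (std::string& use : blocks[current].uses)
        {
            if (use == narrowIV)
            {
                use = wideIV;
                replaced++;
            }
        }

        for (size_t succ : blocks[current].successors)
        {
            // A join may be reached with a different reaching definition on
            // another path, so this sketch conservatively stops there.
            if (blocks[succ].predCount == 1)
            {
                worklist.push_back(succ);
            }
        }
    }
    return replaced;
}
```

Any use left behind still reads the old narrow IV, which remains valid; the cost in this model is only that both variables stay live at once.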

@jakobbotsch
Member, Author

> Some comments:
>
> 1. It would be valuable for this code to have more design documentation. E.g., a Markdown document (in the JIT "doc" tree) on SCEV, on the IV widening algorithm design, etc. This could also take the form of more extensive design-oriented comments.

Makes sense, but personally I prefer to have it in the relevant files, so I added some documentation there.

> 2. Would it make sense to extract the SCEV code to separate scev.h/cpp files? That is, separate the analysis and optimization code.

Yeah, seems good to me. I split it into scev.h, scev.cpp and inductionvariableopts.cpp.

> 3. Add cScev/dScev debug dumpers?

Done.

> 4. Is there any useful dumper to be added to clrjit.natvis?

Added some basic display of the SCEV oper/type.

> 5. To match LLVM conventions, should the SCEV display the addition operator (even though that is currently the only one), e.g., `<loop, start, +, step>` or `<start, +, step>(loop)`?

I actually matched the notation from the paper (I think LLVM uses curly braces as well, and has a different representation of chains of recurrences). If we end up with more complex recurrences then we can consider changing the notation.
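For comparison, the two display forms under discussion can be produced by a small formatter like the C++ sketch below. `FormatAddRec` is a hypothetical helper written for illustration only and is not the dumper added in this PR; it takes the loop, start and step as already-formatted strings.

```cpp
#include <string>

// Formats an add recurrence either as <loop, start, step> (the notation used
// in this PR) or as <loop, start, +, step> (the variant with an explicit
// operator suggested in review).
std::string FormatAddRec(const std::string& loop,
                         const std::string& start,
                         const std::string& step,
                         bool               showOperator)
{
    std::string result = "<" + loop + ", " + start + ", ";
    if (showOperator)
    {
        result += "+, ";
    }
    return result + step + ">";
}
```

Because the step is passed as a string, nested recurrences compose naturally: formatting `<L, 1, 1>` first and passing it as the step yields `<L, 0, <L, 1, 1>>`.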

@jakobbotsch
Member, Author

SPMI 0 sized diffs look like #98996. All CI jobs were green last time it ran, and since then I only reorganized some things without any diffs, so the failures/timeouts look unrelated.


Reviewers

@AndyAyersMS approved these changes

@BruceForstall approved these changes

Assignees

@jakobbotsch

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI


4 participants

@jakobbotsch @ryujit-bot @AndyAyersMS @BruceForstall
