Increase max loops optimized by RyuJIT from 16 to 64. #55614
16 seems remarkably small. Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value numbering, so there is some overhead to increasing it.

A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio |
|------- |----------- |---------------------------------------------------------------------------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|
| LLoops | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 559.9 ms | 8.20 ms | 7.67 ms | 556.4 ms | 550.6 ms | 576.0 ms | 1.00 |
| LLoops | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 552.3 ms | 5.84 ms | 5.46 ms | 552.0 ms | 542.4 ms | 561.1 ms | 0.99 |
| MulMatrix | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 369.7 ms | 4.01 ms | 3.56 ms | 369.9 ms | 364.6 ms | 376.9 ms | 1.00 |
| MulMatrix | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 338.1 ms | 2.69 ms | 2.51 ms | 337.7 ms | 332.2 ms | 341.9 ms | 0.91 |
| Puzzle | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 403.4 ms | 6.93 ms | 6.48 ms | 402.3 ms | 394.8 ms | 412.5 ms | 1.00 |
| Puzzle | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 394.9 ms | 4.16 ms | 3.68 ms | 395.5 ms | 388.2 ms | 401.8 ms | 0.98 |

spmi diffs:

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 6264
Total bytes of diff: 6285
Total bytes of delta: 21 (0.34% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
          21 : 42451.dasm (0.34% of base)

1 total files with Code Size differences (0 improved, 1 regressed), 0 unchanged.

Top method regressions (bytes):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

Top method regressions (percentages):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

1 total methods with Code Size differences (0 improved, 1 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39243
Total bytes of diff: 39940
Total bytes of delta: 697 (1.78% of base)
Total relative delta: 0.16
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
         536 : 25723.dasm (18.29% of base)
         185 : 26877.dasm (2.21% of base)
          66 : 16212.dasm (1.39% of base)
          16 : 16196.dasm (0.36% of base)
          13 : 13322.dasm (0.27% of base)
           9 : 15596.dasm (0.16% of base)

Top file improvements (bytes):
        -125 : 27270.dasm (-6.93% of base)
          -3 : 13993.dasm (-0.05% of base)

8 total files with Code Size differences (2 improved, 6 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (bytes):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

Top method regressions (percentages):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (percentages):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

8 total methods with Code Size differences (2 improved, 6 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 142908
Total bytes of diff: 143387
Total bytes of delta: 479 (0.34% of base)
Total relative delta: 0.42
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
         536 : 248723.dasm (18.29% of base)
         390 : 220441.dasm (14.45% of base)
         390 : 220391.dasm (14.61% of base)
         150 : 239253.dasm (1.78% of base)
          71 : 234803.dasm (1.72% of base)
          16 : 225588.dasm (1.00% of base)
          16 : 225590.dasm (1.00% of base)
           5 : 225285.dasm (0.26% of base)

Top file improvements (bytes):
        -359 : 215690.dasm (-0.99% of base)
        -320 : 215701.dasm (-1.16% of base)
        -128 : 215723.dasm (-0.73% of base)
        -128 : 215666.dasm (-0.62% of base)
        -125 : 239280.dasm (-6.93% of base)
         -29 : 216754.dasm (-0.33% of base)
          -3 : 225316.dasm (-0.15% of base)
          -3 : 225313.dasm (-0.15% of base)

16 total files with Code Size differences (8 improved, 8 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (bytes):
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method regressions (percentages):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (percentages):
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

16 total methods with Code Size differences (8 improved, 8 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 15351
Total bytes of diff: 15448
Total bytes of delta: 97 (0.63% of base)
Total relative delta: 0.08
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
          71 : 63973.dasm (7.63% of base)
          26 : 106039.dasm (0.19% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 1 unchanged.

Top method regressions (bytes):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method regressions (percentages):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

2 total methods with Code Size differences (0 improved, 2 regressed), 1 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 22540
Total bytes of diff: 22576
Total bytes of delta: 36 (0.16% of base)
Total relative delta: 0.01
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
          23 : 104600.dasm (0.15% of base)
          14 : 43143.dasm (0.47% of base)

Top file improvements (bytes):
          -1 : 35267.dasm (-0.03% of base)

3 total files with Code Size differences (1 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this

Top method improvements (bytes):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

Top method regressions (percentages):
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

3 total methods with Code Size differences (1 improved, 2 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53782
Total bytes of diff: 53837
Total bytes of delta: 55 (0.10% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
```

<details>
<summary>Detail diffs</summary>

```
Top file regressions (bytes):
          28 : 28889.dasm (0.16% of base)
          27 : 28887.dasm (0.15% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 2 unchanged.

Top method regressions (bytes):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

Top method regressions (percentages):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

2 total methods with Code Size differences (0 improved, 2 regressed), 2 unchanged.
```

</details>

--------------------------------------------------------------------------------
BruceForstall commented Jul 14, 2021
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).
BruceForstall commented Jul 14, 2021
@AndyAyersMS @dotnet/jit-contrib PTAL
BruceForstall commented Jul 14, 2021 • edited
If anyone has an opinion on how the "best" number should be chosen, I'd be happy to hear it. Making it dynamic perhaps would be an option as well: not have a maximum. Not sure if there are algorithms that would need to be reconsidered to avoid bad behavior for large numbers.
kunalspathak commented Jul 14, 2021
Curious - does choosing 32 not show enough wins? Can we expose a
AndyAyersMS commented Jul 14, 2021
Does 64 cover all the SPMI cases...? This might be a place where a more extensive SPMI collection would prove valuable.
kunalspathak commented Jul 14, 2021
Could you elaborate on that?
AndyAyersMS commented Jul 14, 2021
I'm referring to some of the internal collections we had at one point, over much larger amounts of code.
BruceForstall commented Jul 14, 2021
I enabled

So, sticking with powers of 2, 32 only misses ~9 functions with > 32 loops. Even 16 max loops per function hits 99% of all loops (no surprise). Of course there's some crazy outlier: 192 loops in (at least) one function.

The stats for just the benchmarks is:

Unfortunately, the spmi collections, especially the benchmarks collection, have a lot of "MISSING" data currently, so it's not clear how that's skewing the data.
BruceForstall commented Jul 15, 2021 • edited
Interestingly, switching from 64 to 32 max loops still gives MulMatrix a 9% improvement, but Puzzle regresses by 6%! (Puzzle has 45 loops in its main function.)
BruceForstall commented Jul 15, 2021
As for throughput: a PIN spmi run with max loops = 64 shows no statistically significant TP difference.
BruceForstall commented Jul 15, 2021
Test failures are all infra or known issues |
AndyAyersMS left a comment
LGTM
16 seems remarkably small. 64 was not scientifically chosen, but does cover more
cases.
Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value
numbering, so there is some overhead to increasing it.
A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix