# [JIT] Improve inliner: new heuristics, rely on PGO data #52708
## Conversation
**EgorBo** commented May 17, 2021

@tannergooding Thanks for the feedback! Yeah, it's an early-stage WIP, but still open for any feedback :)
**EgorBo** commented May 27, 2021 (edited)
PowerShell benchmarks:
- EgorPR vs Main (Default parameters)
- EgorPR vs Main (TieredPGO=1)
- EgorPR vs Main (TieredPGO=1 TC_QJFL=1 OSR=1)
- EgorPR vs Main (R2R=0 TieredPGO=1 TC_QJFL=1 OSR=1)
- EgorPR (TieredPGO=1 TC_QJFL=1 OSR=1) vs Main (Default parameters)
- EgorPR (R2R=0 TieredPGO=1 TC_QJFL=1 OSR=1) vs Main (Default parameters)
**EgorBo** commented May 28, 2021 (edited)
JSON deserialization:

```cs
[Benchmark]
public RootObj ParseTextJson() => JsonSerializer.Deserialize<RootObj>(TestFileBytes);
```
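For context, a self-contained sketch of how such a BenchmarkDotNet benchmark might be set up — `RootObj`'s shape and the `test.json` input file are assumptions standing in for the original test type and data, which aren't shown above:

```cs
using System.IO;
using System.Text.Json;
using BenchmarkDotNet.Attributes;

// Hypothetical stand-in for the original test type.
public class RootObj
{
    public string? Name { get; set; }
    public int[]? Values { get; set; }
}

public class JsonBenchmarks
{
    private byte[] TestFileBytes = null!;

    [GlobalSetup]
    public void Setup() =>
        TestFileBytes = File.ReadAllBytes("test.json"); // placeholder input file

    [Benchmark]
    public RootObj? ParseTextJson() =>
        JsonSerializer.Deserialize<RootObj>(TestFileBytes);
}
```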
*(force-pushed from 61c7c91 to 98afcea)*

**EgorBo** commented Jun 10, 2021
aspnet Kestrel microbenchmarks (150 benchmarks): https://gist.github.com/EgorBo/719020f50575c34c146be535d44721ce
**EgorBo** commented Jun 11, 2021 (edited)
**danmoseley** commented Jun 11, 2021

Detailed PR descriptions like this are great for curious observers like me. Kudos!
**sebastienros** commented Jun 11, 2021 (edited)
We usually base the acceptance criteria on citrine. I appreciate that you did your investigations on the small machines, but it would be preferable to share the final results from citrine here. Might be even better gains! Update:
**EgorBo** commented Jun 11, 2021

Initially I just wanted to test some configs and didn't plan to publish any results, but the numbers looked too nice 🙂 I'll publish both Windows and Linux citrine results once I validate some theories.
**EgorBo** commented Jun 12, 2021 (edited)
TE benchmarks on aspnet-citrine-lin (Intel, 28 cores, Ubuntu 18.04) - the official TE hardware (except 4x bigger network bandwidth). Each config was run twice. The "Platform *" benchmarks were expected to show smaller diffs since they're heavily optimized by hand and mostly limited by OS/bandwidth, but they still show quite good numbers. Let me know if I should also test some other configurations (e.g. Multiquery, etc.).
**EgorBo** commented Jun 26, 2021 (edited)
**EgorBo** commented Jun 28, 2021

@jkotas Any objections to merging this?
**EgorBo** commented Jun 28, 2021 (edited)
**jkotas** commented Jun 28, 2021

No objections. I think that the defaults for R2R can probably use more tuning, but that can be done separately. @mangod9 @dotnet/crossgen-contrib The optimization for time (…)
**mangod9** commented Jun 28, 2021

Thanks. I assume the gains would be most pronounced with R2R composite mode?
**jkotas** commented Jun 28, 2021

It depends. I would expect it to impact both composite and non-composite cases.
**EgorBo** commented Jun 29, 2021 (edited)
It can be merged then. This PR mostly unlocks dynamic PGO's potential, which we won't use by default in .NET 6. However, users who care less about slower startup can set the following variables for the "FullPGO + Aggressive Inliner" mode:

```
DOTNET_TieredPGO=1                        # enable PGO instrumentation in tier0
DOTNET_TC_QuickJitForLoops=1              # don't bypass tier0 for methods with loops
DOTNET_ReadyToRun=0                       # don't use prejitted code and collect a better profile for everything
DOTNET_JitExtDefaultPolicyMaxIL=0x200     # allow inlining for large methods
DOTNET_JitExtDefaultPolicyMaxBB=15        # same here
DOTNET_JitExtDefaultPolicyProfScale=0x40  # try harder inlining hot methods
```

Here are the results for TE benchmarks on citrine-linux for this mode: (…)

This PR also improves the Default mode, but only by a few percent in general. And, just a reminder: if we use/benchmark something with (…)

What is not done, but nice to have: (…)
If something goes wrong, we can easily switch back to the previous (more conservative and PGO-independent) policy.
**EgorBo** commented Jun 30, 2021 (edited)
The example from #33349 is my favorite one:

```cs
private static bool Format(Span<byte> span, int value)
{
    return Utf8Formatter.TryFormat(value, span, out _, new StandardFormat('D', 2));
}
```

Since the `StandardFormat` is a constant at the callsite, the JIT can fold away most of `TryFormat` after inlining. Diff: https://www.diffchecker.com/WFVcKhJm (973 bytes -> 102 bytes)
**EgorBo** commented Jun 30, 2021 (edited)
Failure is unrelated (#54125).







---

## PR description
Closes #33338 (optimize comparisons for const strings)
Closes #7651 (Generic Type Check Inlining)
Closes #10303 (have inlining heuristics look for cases where inlining might enable devirtualization)
Closes #47434 (weird inlining of methods optimized by JIT)
Closes #41923 (don't inline calls inside rarely used basic blocks)
Closes #33349 (Utf8Formatter.TryFormat causes massive code bloat when used with a constant StandardFormat)
This PR was split; the first part is #53670.
Inlining is one of the most important compiler optimizations - it significantly widens opportunities for other optimizations such as Constant Folding, CSE, Devirtualization, etc. Unfortunately, there is no silver bullet for doing it right: it's essentially a sort of NP-complete Knapsack problem, especially in JIT environments where we don't see the whole graph of all calls, new types can be loaded/added dynamically, and we're limited in time. E.g. we can't just try to import all the possible callees into the JIT's IR nodes, apply all the existing optimizations we have, and decide whether it was profitable or not - we only have time to quickly inspect the raw IL and note some useful patterns (furthermore, we ask R2R to bake "noinline" into non-profitable methods forever).
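As a small illustration of that point (my own sketch, not code from this PR): inlining a callee can expose the concrete type of an object and thereby enable devirtualization:

```cs
abstract class Animal { public abstract string Speak(); }
sealed class Dog : Animal { public override string Speak() => "Woof"; }

static class Demo
{
    static Animal CreateAnimal() => new Dog();

    static string Test()
    {
        // Once CreateAnimal() is inlined, the JIT sees that 'a' is exactly
        // a Dog, so the virtual call below can be devirtualized (and then
        // possibly inlined as well), folding Test() down to "Woof".
        Animal a = CreateAnimal();
        return a.Speak();
    }
}
```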
It's important to note possible regressions due to bad inlining decisions, e.g. inlinees introduce new locals and the JIT only tracks the first 512 of them (`lclMAX_TRACKED = 512`) - a good example is this sharplab.io snippet.

### Current approach in RyuJIT
In RyuJIT we have several strategies (or policies), such as `DefaultPolicy`, `ModelPolicy` (based on an ML model to estimate impact), `SizePolicy` (prefers size over perf), and `ProfilePolicy` (heavily relies on PGO data), but only `DefaultPolicy` is used in production. Its algorithm, roughly:

1. `AggressiveInlining` attribute - always inline.
2. `NoInlining` attribute - always ignore.
3. Otherwise, estimate: `bool toInline = callsiteNativeSize * BenefitMultiplier > calleeNativeSize`.

An example with logs:

(BTW, in this PR this method is inlined.)
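For reference, a minimal sketch of the two attribute-driven cases (the type and method names here are made up):

```cs
using System.Runtime.CompilerServices;

static class InlineHints
{
    // Step 1: AggressiveInlining - the policy always tries to inline this.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int DoubleFast(int x) => x * 2;

    // Step 2: NoInlining - the policy never considers this a candidate.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int DoubleSlow(int x) => x * 2;

    // Everything else goes through the size/benefit estimate shown above.
    public static int DoubleMaybe(int x) => x * 2;
}
```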
### Things this PR tries to improve
- … `idiv`.
- Foldable expressions like `typeof(T1) == typeof(T2)` or `argStr.Length == 10` when `argStr` is a string literal at the callsite. (NOTE: only during prejitting or when a method was promoted to tier1 naturally; won't work for methods with loops or with `TieredCompilation=0`.)

A small example that covers some of the new observations the inliner makes:
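For instance (an illustrative sketch of the idea, not the original snippet from the PR; `StartsWithHttp` is a made-up helper): when the callsite passes a string literal, the length check and comparisons below become foldable, which makes the callee a much cheaper inline candidate:

```cs
static class Observations
{
    static bool StartsWithHttp(string s) =>
        s.Length >= 4 && s[0] == 'h' && s[1] == 't' && s[2] == 't' && s[3] == 'p';

    static bool Caller() =>
        // 's' is the literal "https://example.com" here, so after inlining
        // the JIT can fold the whole body of StartsWithHttp to 'true'.
        StartsWithHttp("https://example.com");
}
```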
- Thanks to @tannergooding's work, we can now resolve some `call` tokens and recognize intrinsics.
- Refactor `fgFindJumpTarget` and `PushedStack`: we pushed opcodes to that two-slot stack in order to recognize some patterns in the next (binary/unary) opcodes; unfortunately, we ignored a lot of opcodes, which led to an invalid state in that stack and hence invalid observations. Such invalid observations were the main source of regressions in my PR, because some of them led to better performance for some reason 😐 (see the sketch after this list).
- Find more optimal multipliers for the observations (using micro/TE benchmarks and deep performance analysis of traces/regressions; example: #52708 (comment)).
- Rely on PGO data - the most important part. We try harder to inline what was recognized as hot. Unfortunately, there are cases where a profile can be misleading/polluted and mark hot or semi-hot blocks as cold: (…)
- Allow the JIT to inline bigger functions if really needed: more than 5 basic blocks and more than 100 bytes of IL.
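Regarding the `PushedStack` refactoring above, here is a simplified sketch of the idea (an illustration in C#, not RyuJIT's actual C++ code): the observer keeps the last two pushed opcodes so that, when a binary/unary opcode arrives, it can recognize potentially foldable patterns; if "uninteresting" opcodes are skipped instead of pushed, the two slots go stale and produce bogus observations.

```cs
// Simplified illustration of a two-slot opcode stack, not RyuJIT's code.
enum Op { LdArg, LdcI4, Ceq, Call, Unknown }

struct PushedStack
{
    private Op _top, _second;

    // Every opcode must be pushed (at least as Unknown); silently skipping
    // "uninteresting" ones is exactly the bug class described above: the
    // slots then describe values that are no longer on top of the IL stack.
    public void Push(Op op) { _second = _top; _top = op; }

    // E.g. "ldarg; ldc.i4; ceq" suggests a potentially foldable comparison.
    public bool IsFoldableCompare() =>
        _top == Op.LdcI4 && _second == Op.LdArg;
}
```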
### Methodology
For now, I mostly run various benchmarks (TE, dotnet/performance, PowerShell, Kestrel, SignalR) and analyze the collected samples (flamegraphs) produced by profilers, dotnet-trace, or BDN itself. In the hot spots I inspect all calls in the chains, and if I see a method that looks like it should be inlined, I try to find what kind of observations (and corresponding benefit multipliers) can make that happen. Example:

There are some thoughts on how to automate this using ML/genetic algorithms.
### Benchmarks
I use PerfLab machines for TE/SignalR/YARP benchmarks. Besides that, I have locally: (…)
Also, I use different JIT modes:
- `DOTNET_TieredPGO=1` - collect a profile in tier0.
- `DOTNET_TC_QuickJitForLoops=1` - so we also instrument methods with loops in tier0. By default, they bypass tier0 to avoid the "cold loops with hot bodies" problem (OSR is not enabled by default yet).
- `DOTNET_ReadyToRun=0` - ignore all AOT code and re-collect an actual profile (with better class probes).

### Results
It seems like this PR unlocks PGO and GDV potential in FullPGO mode: many high-load benchmarks show +10-20% improvements. I tried to avoid startup/time-to-first-request regressions in the Default mode, so the RPS numbers are not as great as they could be with a more aggressive inliner. In order to avoid size regressions in R2R, we use the "old" inliner there - this leads to zero improvements when we benchmark R2R'd code in `DOTNET_TieredCompilation=0` mode (e.g. dotnet/performance microbenchmarks). Also, the prejitter bakes "noinline" into thousands of methods using the old algorithm, which somewhat limits the possibilities for the new one during prejitting.

The latest run of TE benchmarks on aspnet-citrine-lin:

^ Some P90-P99 numbers regressed because, I believe, they include the warmup stage; when I run a specific benchmark and ask it to run longer than 30 seconds, the P99 numbers are greatly improved.
Also, don't expect large numbers from the Platform-* benchmarks; those are heavily optimized by hand 😐.
UPD: the latest numbers: #52708 (comment)
NoPGO vs FullPGO (mainly to estimate the impact of PGO and show the room for improvement for the Default mode; NOTE: NoPGO numbers should be slightly better than the current .NET 5.0 results in the Default mode):
UPD: new numbers: #52708 (comment)
JIT ETW events show that the improved inliner usually inlines 10-20% more functions, e.g.:

Plaintext-MVC (Default mode):
Orchard CMS (Default mode):

YARP Proxy (Default mode):
