- Notifications
You must be signed in to change notification settings - Fork5.2k
JIT: Enable CSE for VectorX.Create#50644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
EgorBo commentedApr 2, 2021
PTAL @dotnet/jit-contrib@tannergooding |
src/coreclr/jit/valuenum.cpp Outdated
| if ((HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1) || | ||
| constbool isVariableArgs =HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1; | ||
| // In case of functions with variable number of args we only support single-arg ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just wondering, why 0 or 1 and not 0, 1, or 2?
tannergoodingApr 2, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Variable length args only impactVector.Create calls for ARM, but also impact cases likeSse.ReciprocalScalar for x86, where the upper limit is 2 args (and therefore op1 will not be aGT_LIST, butNumArgs will be-1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Initially I wanted to add support for args >= 2 cases later but since you noticed this case I've added it as part of this pr.
Repro:
staticintTest(){returnSse2.MoveMask(Sse2.Add(Vector128.Create(1,1),Vector128.Create(1,1)).AsByte());}
New codegen:
; Method Program:Test():intG_M58954_IG01:vzeroupper;; bbWeight=1 PerfScore 1.00G_M58954_IG02:vmovupdxmm0, xmmword ptr[reloc @RWD00]vpaddqxmm0,xmm0,xmm0 vpmovmskbeax,xmm0;; bbWeight=1 PerfScore 4.33G_M58954_IG03:ret;; bbWeight=1 PerfScore 1.00RWD00 dq0000000000000001h,0000000000000001h; Total bytes of code: 20
* upstream/main: (568 commits) [wasm] Set __DistroRid on Windows to browser-wasm (dotnet#50842) [wasm] Fix order of include paths, to have the obj dir first (dotnet#50303) [wasm] Fix debug build of AOT cross compiler (dotnet#50418) Fix outdated comment (dotnet#50834) [wasm][tests] Add properties to allow passing args to xharness (dotnet#50678) Vectorized common String.Split() paths (dotnet#38001) Fix binplacing symbol files. (dotnet#50819) Move type check to after the null ref branch in out marshalling of blittable classes. (dotnet#50735) Remove extraneous CMake version requirement. (dotnet#50805) [wasm] Remove unncessary condition for EMSDK (dotnet#50810) Add loop alignment stats to JitLogCsv (dotnet#50624) Resolve ILLink warnings in System.Diagnostics.DiagnosticSource (dotnet#50265) Avoid unnecessary closures/delegates in Process (dotnet#50496) Fix for field layout verification across version bubble boundary (dotnet#50364) JIT: Enable CSE for VectorX.Create (dotnet#50644) [main] Update dependencies from mono/linker (dotnet#50779) [mono] More domain cleanup (dotnet#50771) Race condition in Mock reference tracker runtime with GC. (dotnet#50804) Remove IAssemblyName (and various fusion remnants) (dotnet#50755) Disable failing test for GCStress. (dotnet#50828) ...
Uh oh!
There was an error while loading.Please reload this page.
A low hanging fruit a noticed while I was triaging#13811, but it'd be also nice to allow CSE for GT_LIST-type of args in future.
This PR enables CSE (and hoisting) for Vector128.Create (and other functions with variable amount of args where the actual amount of them is zero or one).
Example:
codegen diff:https://www.diffchecker.com/LMZuC6QS
jit-diff (-f):
The regression is actually a perf improvement:https://www.diffchecker.com/54f2vVgR