NotificationsYou must be signed in to change notification settings
Fork5.2k
Star17.2k

Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible#102301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

tannergooding merged 11 commits intodotnet:mainfromtannergooding:proto-102275

Jun 5, 2024

Merged

Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible#102301

tannergooding merged 11 commits intodotnet:mainfromtannergooding:proto-102275

Jun 5, 2024

Conversation

Copy link

Member

tannergooding commentedMay 16, 2024•
edited
Loading

This is a basic prototype for#102275 and covers the all vectorized types where the handling can be trivially done in managed (such as by deferring to another intrinsic API).

As part of that, it also removesMethodImpl(MethodImplOptions.AggressiveInlining) from various APIs where the method IL size is approx. less than the always inline threshold for RyuJIT (this was estimated by examining a few functions and finding that 2 or less calls with no complex logic typically fits the need).

tannergooding added3 commits

May 15, 2024 22:15

Change Vector4 to be implemented entirely in managed

603dc39

Change Plane to be implemented entirely in managed

c4bef1f

Change Quaternion to be implemented entirely in managed

10c5dfd

ghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label

May 16, 2024

dotnet-policy-servicebot assignedtannergooding

May 16, 2024

Copy link

Contributor

dotnet-policy-servicebot commentedMay 16, 2024

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

This was referencedMay 16, 2024

StackOverflowException on System.Runtime.Serialization.Xml.ReflectionOnly.Tests#99963

Closed

Test failure: Unknown chain building error in System.Net.Security.CertificateValidationRemoteServer.ConnectWithRevocation_WithCallback#101835

Closed

Avoid accidental recursion on Mono

bcd8926

build-analysisbot mentioned this pull request

May 16, 2024

[browser][MT] WasmBrowserTestRunner.RunConsoleMessagesPump - abrupt disconect#101618

Closed

Change Vector2/3, Vector<T>, and Vector64/128/256/512<T> to be implem…

69b9d08

…ented in managed where trivially possible

tannergooding changed the title~~Change Vector4, Quaternion, and Plane to be implemented entirely in managed~~Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible

May 16, 2024

tannergooding added2 commits

May 16, 2024 14:23

Merge remote-tracking branch 'dotnet/main' into proto-102275

d93b621

Don't regress the implementation of Dot

2514560

This comment was marked as outdated.

tannergooding closed this

May 17, 2024

This was referencedMay 17, 2024

[browser][MT] System.Net.Http.Functional.Tests.WorkItemExecution timeout#102385

Closed

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137#100558

Closed

tannergooding reopened this

Jun 3, 2024

Merge remote-tracking branch 'dotnet/main' into proto-102275

c993845

Copy link

MemberAuthor

tannergooding commentedJun 3, 2024

Reopened now that#102702,#102827, and#102973 has gone in. We should no longer see pessimizations caused from needing constants during import and so we should see fewer diffs overall and ideally be in a position to consider getting this merged and building on it more.

build-analysisbot mentioned this pull request

Jun 3, 2024

Dead lettering tests#101524

Closed

Copy link

MemberAuthor

tannergooding commentedJun 3, 2024

Things are already looking much better than before.

minopts size regressions are expected since it would now require inlining to get the optimized codegen. This is something that needs to be discussed around the overall impact, but I expect its fine since itsT0 code.

fullopts has almost no changes and we are doing the right thing for the most part. There are some more complex methods where we run into limits for the default inlining heuristics given that we no longer get the[Intrinsic] profitability boost. We could solve this by still treating APIs inSystem.Runtime.Intrinsics as[Intrinsic], regardless of whether or not they're annotated. This gives us the profitability boost without needing to also have the implementation in managed.

There is an assert aroundV512.CreateScalar that needs to be looked at and some mono failures, which at a glance looks to be a legitimate bug in Mono.

tannergooding added2 commits

June 3, 2024 12:17

Fixing the handling of Vector512.CreateScalar

3eae1a6

Continue tracking the System.Numerics and System.Runtime.Intrinsics A…

6c6d71e

…PIs as intrinsic for the profitability boost

tannergooding force-pushed theproto-102275 branch fromf7b27b2 to1d6cafaCompare

June 3, 2024 22:42

jkotas reviewed

Jun 3, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector4.Extensions.cs

		/// <returns><paramref name="value" /> reinterpreted as a new <see cref="Plane" />.</returns>
		internalstaticPlaneAsPlane(thisVector4value)
		{
		#ifMONO

Copy link

Member

jkotasJun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is this perf problem or functionality problem?

Either way, it is likely going to be fixed as side-effect of#102988 .

Copy link

MemberAuthor

tannergoodingJun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Possibly a functionality problem that I've not dug into yet. The previous tested commit had seen some failures related to Quaternion/Vector4/Plane and their equality tests, but only on Mono.

My current guess is that there's something subtly incorrect in the Mono handling that breaks for SIMD here and it will need a fix before BitCast can be used, but I'd like to try and ensure CoreCLR is clean without regressions before I do any more in depth Mono changes.

Don't use Unsafe.BitCast on Mono

4dae1fc

tannergooding force-pushed theproto-102275 branch from1d6cafa to4dae1fcCompare

June 3, 2024 22:50

build-analysisbot mentioned this pull request

Jun 4, 2024

NativeAOT legs timing out in CI#102239

Closed

This was referencedJun 4, 2024

Wasm test failure: Unable to evaluate script: target frame detached#102957

Closed

Wasm test failure: System addon update list error SyntaxError: XMLHttpRequest.open#102955

Closed

The Operation will be canceled. The next steps may not contain expected logs.dotnet/dnceng#3008

Open

[wasm] ystem addon update list error SyntaxError: XMLHttpRequest.open: 'http://%(server)s/dummy-system-addons.xml' is not a valid URL.#103060

Closed

Copy link

MemberAuthor

tannergooding commentedJun 5, 2024

@MihuBot

MihuBot mentioned this pull request

Jun 5, 2024

[X64] [tannergooding] Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/51 ...MihuBot/runtime-utils#400

Open

tannergooding marked this pull request as ready for review

June 5, 2024 19:22

Copy link

MemberAuthor

tannergooding commentedJun 5, 2024•
edited
Loading

One regression is inSystem.Drawing.RectangleF:ToVector4()

-       vmovups  xmm0, xmmword ptr [rdi]+       vmovss   xmm0, dword ptr [rdi]+       vinsertps xmm0, xmm0, dword ptr [rdi+0x04], 16+       vinsertps xmm0, xmm0, dword ptr [rdi+0x08], 32+       vinsertps xmm0, xmm0, dword ptr [rdi+0x0C], 48

This one is because while we havefgMorphCombineSIMDFieldStores, we don't have anfgMorphCombineSIMDFieldLoads instead we only had some logic as part ofNI_Vector4_Create that looked for consecutive field accesses. This one shouldn't be a blocker and can be pretty easily handled in a follow up PR.

There's a small regression inSystem.IO.Hashing.Crc32:UpdateVectorized:

 G_M1352_IG02:        mov      rax, rsi        mov      ecx, edx-       vmovups  xmm0, xmmword ptr [rax]        cmp      ecx, 128        jge      SHORT G_M1352_IG04- ;; size=17 bbWeight=1 PerfScore 5.75+;; size=13 bbWeight=1 PerfScore 1.75 G_M1352_IG03:-       vmovd    xmm1, rdi-       vpxor    xmm0, xmm1, xmm0+       vmovd    xmm0, rdi+       vpxor    xmm0, xmm0, xmmword ptr [rax]        jmp      G_M1352_IG07        align    [0 bytes for IG05]- ;; size=13 bbWeight=0.50 PerfScore 2.17+;; size=13 bbWeight=0.50 PerfScore 3.50 G_M1352_IG04:+       vmovups  xmm0, xmmword ptr [rsi]        vmovups  xmm1, xmmword ptr [rsi+0x10]

Nothing major, just a place where we don't share a load anymore.

In general the JIT ends up needing to create many additionalimpAppendStmt,Inlining Arg,spilled call-like call argument, andInline ldloca(s) first use temp locals. Most of these end up aszero-ref and so there's potential for the JIT to optimize things better around such scenarios to avoid additional or unnecessary overhead (throughput wise).

Most of the corelib regressions are in the xplat APIs that don't have any accelerated implementation today, likeDot<short> (lack of handling for multiply since it would have been fairly complex IR to support). Others are cases likeVector512.OnesComplement which would be fixed if we had it defer toop_OnesComplement rather thanVector256.OnesComplement (an easy fix in a follow up PR).

All in all, I think we are in a position where taking this is feasible and it doesn't look to regress any meaningful scenarios, only the already unaccelerated edge cases. -- Mono is being left "as is" for now since their inliner isn't as robust as the RyuJIT one, but we should be able to independently investigate removing the Mono support for the xplat APIs as well so they only need to support the platform specific APIs.

CC. @dotnet/jit-contrib

EgorBo approved these changes

Jun 5, 2024

View reviewed changes

Copy link

Member

EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice clean up in jit! 👍

tannergooding merged commit2a6bf6f intodotnet:main

Jun 5, 2024

This was referencedJun 5, 2024

Minor cleanup of the Vector64/128/256/512 implementations to improve fallbacks#103095

Merged

Move more of the xplat hwintrinsic API implementation into managed code#103150

Closed

Copy link

Member

LoopedBard3 commentedJun 11, 2024•
edited
Loading

Likely regressions:
Windows x64:dotnet/perf-autofiling-issues#36075,dotnet/perf-autofiling-issues#36067
Linux x64:dotnet/perf-autofiling-issues#36054,dotnet/perf-autofiling-issues#36055
Windows arm64:dotnet/perf-autofiling-issues#36252,dotnet/perf-autofiling-issues#36251,dotnet/perf-autofiling-issues#36216,dotnet/perf-autofiling-issues#36215,dotnet/perf-autofiling-issues#36623
Linux arm64:dotnet/perf-autofiling-issues#36174

cc:@tannergooding

Copy link

Member

lewing commentedJun 11, 2024•
edited
Loading

wasm regressions heredotnet/perf-autofiling-issues#36108 (aot) and improvementsdotnet/perf-autofiling-issues#36093 (interp)

cc@radekdoulik

Copy link

MemberAuthor

tannergooding commentedJun 11, 2024

Likely regressions:

These should be resolved with#103177

wasm regressions heredotnet/perf-autofiling-issues#36108 (aot)

Little bit surprised thatwasm aot regressed since the Mono handling hadn't been touched and the AOT support should have already been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

improvementsdotnet/perf-autofiling-issues#36093 (interp)

👍, fairly significant improvements at that. I know thatinterp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so lets it do less work and still get the benefits.

radekdoulik mentioned this pull request

Jun 12, 2024

[wasm][perf] Tracking#96444

Open

kotlarmilos mentioned this pull request

Jun 12, 2024

[Perf] Linux/x64: 15 Regressions on 6/5/2024 11:33:55 PMdotnet/perf-autofiling-issues#36078

Closed

Copy link

Member

lewing commentedJun 12, 2024

Likely regressions:
These should be resolved with#103177
wasm regressions heredotnet/perf-autofiling-issues#36108 (aot)
Little bit surprised thatwasm aot regressed since the Mono handling hadn't been touched and the AOT support should have already been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

I find it surprising too,@radekdoulik can you take a look at our codegen here please

improvementsdotnet/perf-autofiling-issues#36093 (interp)
👍, fairly significant improvements at that. I know thatinterp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so lets it do less work and still get the benefits.

Yes, nice improvements