Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible#102301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
tannergooding merged 11 commits intodotnet:mainfromtannergooding:proto-102275
Jun 5, 2024

Conversation

@tannergooding
Copy link
Member

@tannergoodingtannergooding commentedMay 16, 2024
edited
Loading

This is a basic prototype for#102275 and covers the all vectorized types where the handling can be trivially done in managed (such as by deferring to another intrinsic API).

As part of that, it also removesMethodImpl(MethodImplOptions.AggressiveInlining) from various APIs where the method IL size is approx. less than the always inline threshold for RyuJIT (this was estimated by examining a few functions and finding that 2 or less calls with no complex logic typically fits the need).

jakobbotsch, PaulusParssinen, am11, EgorBo, colejohnson66, breadnone, and jkrejcha reacted with hooray emoji
@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelMay 16, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

@tannergoodingtannergooding changed the titleChange Vector4, Quaternion, and Plane to be implemented entirely in managedChange Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possibleMay 16, 2024
@tannergooding

This comment was marked as outdated.

@tannergooding
Copy link
MemberAuthor

Reopened now that#102702,#102827, and#102973 has gone in. We should no longer see pessimizations caused from needing constants during import and so we should see fewer diffs overall and ideally be in a position to consider getting this merged and building on it more.

@build-analysisbuild-analysisbot mentioned this pull requestJun 3, 2024
@tannergooding
Copy link
MemberAuthor

Things are already looking much better than before.

minopts size regressions are expected since it would now require inlining to get the optimized codegen. This is something that needs to be discussed around the overall impact, but I expect its fine since itsT0 code.

fullopts has almost no changes and we are doing the right thing for the most part. There are some more complex methods where we run into limits for the default inlining heuristics given that we no longer get the[Intrinsic] profitability boost. We could solve this by still treating APIs inSystem.Runtime.Intrinsics as[Intrinsic], regardless of whether or not they're annotated. This gives us the profitability boost without needing to also have the implementation in managed.

There is an assert aroundV512.CreateScalar that needs to be looked at and some mono failures, which at a glance looks to be a legitimate bug in Mono.

/// <returns><paramref name="value" /> reinterpreted as a new <see cref="Plane" />.</returns>
internalstaticPlaneAsPlane(thisVector4value)
{
#ifMONO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is this perf problem or functionality problem?

Either way, it is likely going to be fixed as side-effect of#102988 .

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Possibly a functionality problem that I've not dug into yet. The previous tested commit had seen some failures related to Quaternion/Vector4/Plane and their equality tests, but only on Mono.

My current guess is that there's something subtly incorrect in the Mono handling that breaks for SIMD here and it will need a fix before BitCast can be used, but I'd like to try and ensure CoreCLR is clean without regressions before I do any more in depth Mono changes.

@tannergooding
Copy link
MemberAuthor

@MihuBot

@tannergooding
Copy link
MemberAuthor

tannergooding commentedJun 5, 2024
edited
Loading

One regression is inSystem.Drawing.RectangleF:ToVector4()

-       vmovups  xmm0, xmmword ptr [rdi]+       vmovss   xmm0, dword ptr [rdi]+       vinsertps xmm0, xmm0, dword ptr [rdi+0x04], 16+       vinsertps xmm0, xmm0, dword ptr [rdi+0x08], 32+       vinsertps xmm0, xmm0, dword ptr [rdi+0x0C], 48

This one is because while we havefgMorphCombineSIMDFieldStores, we don't have anfgMorphCombineSIMDFieldLoads instead we only had some logic as part ofNI_Vector4_Create that looked for consecutive field accesses. This one shouldn't be a blocker and can be pretty easily handled in a follow up PR.


There's a small regression inSystem.IO.Hashing.Crc32:UpdateVectorized:

 G_M1352_IG02:        mov      rax, rsi        mov      ecx, edx-       vmovups  xmm0, xmmword ptr [rax]        cmp      ecx, 128        jge      SHORT G_M1352_IG04- ;; size=17 bbWeight=1 PerfScore 5.75+;; size=13 bbWeight=1 PerfScore 1.75 G_M1352_IG03:-       vmovd    xmm1, rdi-       vpxor    xmm0, xmm1, xmm0+       vmovd    xmm0, rdi+       vpxor    xmm0, xmm0, xmmword ptr [rax]        jmp      G_M1352_IG07        align    [0 bytes for IG05]- ;; size=13 bbWeight=0.50 PerfScore 2.17+;; size=13 bbWeight=0.50 PerfScore 3.50 G_M1352_IG04:+       vmovups  xmm0, xmmword ptr [rsi]        vmovups  xmm1, xmmword ptr [rsi+0x10]

Nothing major, just a place where we don't share a load anymore.


In general the JIT ends up needing to create many additionalimpAppendStmt,Inlining Arg,spilled call-like call argument, andInline ldloca(s) first use temp locals. Most of these end up aszero-ref and so there's potential for the JIT to optimize things better around such scenarios to avoid additional or unnecessary overhead (throughput wise).

Most of the corelib regressions are in the xplat APIs that don't have any accelerated implementation today, likeDot<short> (lack of handling for multiply since it would have been fairly complex IR to support). Others are cases likeVector512.OnesComplement which would be fixed if we had it defer toop_OnesComplement rather thanVector256.OnesComplement (an easy fix in a follow up PR).

All in all, I think we are in a position where taking this is feasible and it doesn't look to regress any meaningful scenarios, only the already unaccelerated edge cases. -- Mono is being left "as is" for now since their inliner isn't as robust as the RyuJIT one, but we should be able to independently investigate removing the Mono support for the xplat APIs as well so they only need to support the platform specific APIs.

CC. @dotnet/jit-contrib

Copy link
Member

@EgorBoEgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice clean up in jit! 👍

@lewing
Copy link
Member

lewing commentedJun 11, 2024
edited
Loading

wasm regressions heredotnet/perf-autofiling-issues#36108 (aot) and improvementsdotnet/perf-autofiling-issues#36093 (interp)

cc@radekdoulik

@tannergooding
Copy link
MemberAuthor

Likely regressions:

These should be resolved with#103177

wasm regressions heredotnet/perf-autofiling-issues#36108 (aot)

Little bit surprised thatwasm aot regressed since the Mono handling hadn't been touched and the AOT support should have already been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

improvementsdotnet/perf-autofiling-issues#36093 (interp)

👍, fairly significant improvements at that. I know thatinterp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so lets it do less work and still get the benefits.

@lewing
Copy link
Member

Likely regressions:

These should be resolved with#103177

wasm regressions heredotnet/perf-autofiling-issues#36108 (aot)

Little bit surprised thatwasm aot regressed since the Mono handling hadn't been touched and the AOT support should have already been accelerated for these APIs. Is the support for PackedSIMD separate from the mainline Mono support around MonoLLVM?

I find it surprising too,@radekdoulik can you take a look at our codegen here please

improvementsdotnet/perf-autofiling-issues#36093 (interp)

👍, fairly significant improvements at that. I know thatinterp hadn't accelerated the Vector2/3/4 APIs but had accelerated the Vector128 APIs, so lets it do less work and still get the benefits.

Yes, nice improvements

@matouskozak
Copy link
Member

matouskozak commentedJun 14, 2024
edited
Loading

Possibly related noticeable size improvements on Mono iOS HelloWorlddotnet/perf-autofiling-issues#35768

@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsJul 21, 2024
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@jkotasjkotasjkotas left review comments

@EgorBoEgorBoEgorBo approved these changes

Assignees

@tannergoodingtannergooding

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@tannergooding@LoopedBard3@lewing@matouskozak@EgorBo@jkotas

[8]ページ先頭

©2009-2025 Movatter.jp