
Move memset/memcpy helpers to managed impl #98623


Merged
EgorBo merged 48 commits into dotnet:main from EgorBo:remove-jit-memset on Feb 25, 2024

Conversation

@EgorBo (Member) commented Feb 17, 2024 (edited by jkotas)

Closes #98620
Closes #95517

@ghost assigned EgorBo on Feb 17, 2024
@ghost added the area-VM-coreclr label on Feb 17, 2024
@jkotas (Member) commented Feb 18, 2024 (edited)

We have two places in CoreLib that call Unsafe.InitBlockUnaligned. These places are effectively going to call SpanHelpers.Fill now; we can fix them up to do that directly.

Unsafe.InitBlockUnaligned(ref Unsafe.As<T, byte>(ref _reference), *(byte*)&value, (uint)_length);

Unsafe.InitBlockUnaligned(ref b, 0, (uint)byteLength);

@EgorBo (Member, Author) commented Feb 18, 2024 (edited)

> We have two places in CoreLib that call Unsafe.InitBlockUnaligned. These places are effectively going to call SpanHelpers.Fill now; we can fix them up to do that directly.

The problem is that SpanHelpers.Fill is not optimized by the JIT for constant lengths. That was never needed before, because e.g. Span.Clear invoked Unsafe.InitBlockUnaligned.


This no longer happens if we change it to call SpanHelpers.Fill directly. I'll look into unrolling Fill as well in a separate PR (should be a simple fix).


@EgorBo (Member, Author)

@jkotas we don't use memset/memcpy helpers in the JIT for x86 and always use rep stosb/movsb; I wonder if we should switch those to helpers or not? I presume rep might also lead to GC starvation.

@jkotas (Member)

> I presume rep might also lead to GC starvation.

rep ... should be emitted as fully interruptible code to avoid GC starvation. It is the same as a loop without a call.

EgorBo and others added 2 commits on February 25, 2024 01:07
@EgorBo (Member, Author) commented Feb 25, 2024 (edited)

Btw, I found and fixed an existing bug: Memmove used to not throw an NRE if both src and dest are null and the length is not zero, e.g.:

Buffer.MemoryCopy(null, null, 10, 10);

This doesn't fail (and can likely be reproduced through other APIs that use Memmove under the hood).

@jkotas (Member)

> Memmove used to not throw an NRE if both src and dest are null and the length is not zero, e.g.

This is not a bug that needs fixing. Passing an invalid buffer to Memmove is UB.

@EgorBo (Member, Author) commented Feb 25, 2024 (edited)

> Memmove used to not throw an NRE if both src and dest are null and the length is not zero, e.g.
>
> This is not a bug that needs fixing. Passing an invalid buffer to Memmove is UB.

If I don't fix it, then:

void Test(MyStruct* a, MyStruct* b) => *a = *b; // or its ref version

won't fail when both are null (or the behavior will depend on the optimization level).

@jkotas (Member)

Ok, I agree that it makes sense to fix this for internal JIT use of memory copy.

@EgorBo (Member, Author) commented Feb 25, 2024 (edited)

Done. I wrote some benchmarks, where Core_Root_PR_tailcall is this PR plus a conservative fix for tailcalls (I don't see big improvements from this). The fix is conservative because it disables unrolling for constant sizes.

NOTE: PGO doesn't unroll these unknown-size memops, because the opt I landed for this currently requires R2R=0 and I didn't disable it.

Overall it looks good to me. It's possible that in some cases helper calls are always indirect because they're promoted to Tier 1 together with the benchmarks; for cast helpers we did this trick, but I am not sure we want to add the whole of SpanHelpers here as well.

CopyBlock4000/InitBlock4000 likely suffer from going to native memset/memmove anyway from the managed helpers.
It is also possible that these numbers vary a bit from run to run due to GC alignment (although I tried to use [MemoryRandomization], I am not sure it helps).

@@ -10343,18 +10343,34 @@ void Compiler::impImportBlockCode(BasicBlock* block)
// TODO: enable for X86 as well, it currently doesn't support memset/memcpy helpers
Review comment (Member):

x86 supports memset/memcpy helpers now (this can be a follow up change)

Reply (Member, Author):

Yes, it's part of a follow-up PR because it involves removal of the GT_STORE_DYN_BLK op; I definitely didn't want to do it as part of this PR.

#else
private const nuint MemmoveNativeThreshold = 2048;
#endif
// TODO: Determine optimal value
Review comment (Member):

Suggested change: remove // TODO: Determine optimal value

Anything TODO here?

Reply (Member, Author), Feb 25, 2024 (edited):

Um, do you mean it needs a link? I haven't spent enough time testing this threshold on different CPUs/platforms yet. I presume it also can't be too big, to avoid regressing NativeAOT compiled for a generic CPU.

Reply (Member):

If you believe that somebody should spend time on this, it should have a link.

It is complicated to pick one universal threshold, so I am not sure whether it would be time well spent. You end up picking your winners and losers. Anything in the 1kB-10kB range sounds about right to me. The point of threshold is to amortize the cost of the PInvoke transition.

If there is something to look at here, I think it would be the infinite threshold for Arm64 memcpy. I would expect that the unmanaged memcpy is going to have more tricks than we have here. The threshold for Arm64 was set years ago, when the Arm64 optimizations in the overall ecosystem were just starting to show up.

Reply (Member, Author):

I wasn't able to find any regressions from that infinite threshold on modern hardware in microbenchmarks, but it's very possible the picture is different in the real world due to non-temporal stores etc.
I agree it won't hurt to change that threshold to some big value like 32kB.

Reply (Member, Author):

I'll remove this comment as part of the GT_STORE_DYN_BLK cleanup PR, to avoid spinning CI for a comment-only change, if you don't mind.

@jkotas (Member)

> a conservative fix for tailcalls

Where would you expect it to help with the current state of the PR? The wrappers for JIT helpers are not there anymore, so the JIT helpers are not affected by this.

> for cast helpers we did this trick

This sounds like a general problem with the ordering of tiered compilation. For direct calls, it would be preferable to compile the bottom-layer methods before the upper-layer methods. I do not think we have anything in place to help with that (correct me if I am wrong). If there are a few more bottom-layer types that we care about in particular, I do not see a problem with adding them to this workaround.

@jkotas (Member)

> Is the SpanHelpers byte overload more efficient for sizeof(T) == 1, or is there no difference?
>
> No difference for now, but I wanted to improve the byte version so there will be.

Is this left for a follow-up?

@jkotas (Member) left a review:

Looks great. Thank you!

@EgorBo (Member, Author)

> Where would you expect it to help with the current state of the PR?

For ordinary managed calls to Clear/Memmove inside void methods, for example:

void Test(Span<byte> span) => span.Clear();

^ the codegen is expected to perform a tail call to SpanHelpers.ClearWithoutReferences (it doesn't happen with this PR yet).

> If there are a few more bottom-layer types that we care about in particular, I do not see a problem with adding them to this workaround.

Overall I agree; it's just that I'd rather have SpanHelpers promoted with a delay, to avoid setting weights in stone too early (one day we'll implement context-sensitive PGO to solve this problem differently). Also, all R2R'd methods can then stay in R2R form longer. This timing issue can only manifest in simple microbenchmarks, I presume (I'm still not sure the perf difference between indirect and direct calls is noticeable).

> Is this left for a follow-up?

Yep. We can probably ask our Arm64/Intel contributors for help here, although I am not sure non-zero memset is a popular operation (very few hits across SPMI).

@EgorBo merged commit fab69ef into dotnet:main on Feb 25, 2024
@EgorBo deleted the remove-jit-memset branch on February 25, 2024 16:37
@EgorBo mentioned this pull request on Feb 26, 2024
@github-actions bot locked and limited conversation to collaborators on Mar 27, 2024
Reviewers

@AustinWise left review comments
@stephentoub left review comments
@jkotas approved these changes
@hamarb123 left review comments
@xtqqczze left review comments
@SingleAccretion left review comments
@MichalStrehovsky: awaiting requested review (code owner)
@marek-safar: awaiting requested review

Assignees

@EgorBo

Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

memcpy/memset helpers on CoreCLR may not be suspension-friendly
NativeAOT: Memset/Memcpy helpers don't throw NRE
7 participants: @EgorBo, @jkotas, @xtqqczze, @AustinWise, @stephentoub, @hamarb123, @SingleAccretion
