Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Arm64: Memory barrier improvements#62895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
kunalspathak merged 4 commits intodotnet:mainfromkunalspathak:barriers
Jan 5, 2022

Conversation

kunalspathak
Copy link
Contributor

@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelDec 16, 2021
@ghost
Copy link

Tagging subscribers to this area:@JulieLeeMSFT
See info inarea-owners.md if you want to be subscribed.

Issue Details
Author:kunalspathak
Assignees:-
Labels:

area-CodeGen-coreclr

Milestone:-

@alexrp
Copy link
Contributor

I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Basically, some of the instructions don't quite have the semantics you might expect.

@kunalspathak
Copy link
ContributorAuthor

I see approx. 0.58% and 0.28% improvement in RPS in mvc and json benchmarks respectively. I believe these are in error range.

@kunalspathak
Copy link
ContributorAuthor

I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Thanks@alexrp . This PR just extends the optimal memory barriers forvolatile keyword and for one scenario where we can usedmb ishst.

@kunalspathakkunalspathak marked this pull request as ready for reviewJanuary 3, 2022 20:32
@kunalspathak
Copy link
ContributorAuthor

@dotnet/jit-contrib

@EgorBo
Copy link
Member

LGTM,@VSadov could you please take a quick look if you have time
tldr:

volatileVariable = 42;

used to emit a full memory barrier, now it emits a store-only one.

Also, for most cases stores/loads for variables marked as "volatile" now actually don't emit memory barriers at all and use e.g.stlr instead in case of store, basically the same what Volatile.Write emits today, see#60232

@@ -3280,7 +3280,7 @@ void CodeGen::genCodeForStoreInd(GenTreeStoreInd* tree)
else
{
// issue a full memory barrier before a volatile StInd
instGen_MemoryBarrier();
instGen_MemoryBarrier(BARRIER_STORE_ONLY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What is the typical scenario when this branch is taken? When value is a struct?

Copy link
Member

@VSadovVSadovJan 3, 2022
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As I understand the purpose of this barrier is to have release semantics for storing into volatile variable that does not fit into a single register (otherwise stlr could be used).

I think this needs a full barrier, since unlikestlr,dmb ishst only waits for stores in progress and has no effect on loads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Basically,ldar can be replaced withldr; dmb ishld , but there is no such equivalency betweenstlr anddmb ishst; str because ishst is too weak.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Fromhttps://developer.arm.com/documentation/100941/0100/Barriers, trying to understand what you stated.

image

image

Basically, forishst, loads can still be reordered around barrier and hence it is weaker than theishld where loads/stores need to wait till the barrier is complete.

If we change from full barrier toishst, we might have a load that should have been completed but got reordered and might end up reading the wrong value (pre-updated value). Is my understanding correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, loads that appear after ishst, in program order, may speculatively happen ahead of the store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

staticvolatileintx;staticvolatileinty;staticintxx;staticintyy;----one thread:x=42;yy=y;---another thread:y=42;xx=x;

Can bothxx andyy end up 0 ?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Makes sense. I will revert the change related toshst then.

Copy link
Member

@VSadovVSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM!

@EgorBo
Copy link
Member

EgorBo commentedJan 20, 2022
edited
Loading

Improvement on ubuntu-arm64:dotnet/perf-autofiling-issues#2981
and win-arm64:dotnet/perf-autofiling-issues#2977

@ghostghost locked asresolvedand limited conversation to collaboratorsFeb 19, 2022
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@EgorBoEgorBoEgorBo approved these changes

@VSadovVSadovVSadov approved these changes

Assignees

@kunalspathakkunalspathak

Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@kunalspathak@alexrp@EgorBo@VSadov

[8]ページ先頭

©2009-2025 Movatter.jp