
Arm64: Memory barrier improvements #62895


Merged

kunalspathak merged 4 commits into dotnet:main from kunalspathak:barriers

Jan 5, 2022


Conversation

kunalspathak (Contributor) commented Dec 16, 2021

Generate store barriers wherever possible. Currently, we generate full barriers for stores.
Generate one-way barriers for `volatile` variables, so that `volatile`-declared variables get the same speed as `Volatile.Read`/`Volatile.Write` (see [arm64] Volatile.Read/Write is 2x faster than "volatile" loads/stores #60232). Thanks @EgorBo for suggesting the solution as well.
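To illustrate what a "one-way barrier" means, here is a small C++ analogy, not the JIT's code: .NET `volatile` writes and reads have release and acquire semantics, which correspond roughly to `std::memory_order_release` stores and `std::memory_order_acquire` loads. The names (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`) are hypothetical. On ARM64, GCC and Clang typically lower the release store to `stlr` and the acquire load to `ldar` (or `ldapr` on cores with the RCpc extension), so no standalone `dmb` is needed; treat that mapping as typical compiler behavior rather than a guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: `payload` is plain data, `ready` publishes it.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;  // plain store
    // Release store: earlier loads/stores cannot be moved after it
    // (typically emitted as stlr on ARM64).
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire load: later loads/stores cannot be moved before it
    // (typically emitted as ldar on ARM64).
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer publishes
    }
    return payload;  // happens-after `payload = 42`, so this reads 42
}

int run_message_passing() {
    int observed = 0;
    std::thread reader([&observed] { observed = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return observed;
}
```

Each barrier only constrains movement in one direction, which is why it can be cheaper than a full `dmb ish` placed before the store.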

kunalspathak added 2 commits December 13, 2021 10:28

Use ishst instead of ish (74f24f3)

Do not contain address of volatile fields (d2d6d70)

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Dec 16, 2021

ghost assigned kunalspathak Dec 16, 2021

alexrp commented Dec 16, 2021

I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Basically, some of the instructions don't quite have the semantics you might expect.

kunalspathak (Contributor, Author) commented Dec 17, 2021

I see approximately 0.58% and 0.28% RPS improvements in the mvc and json benchmarks, respectively. I believe these are within the margin of error.

kunalspathak (Contributor, Author) commented Dec 17, 2021

> I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Thanks @alexrp. This PR just extends the optimal memory barriers to the `volatile` keyword, plus one scenario where we can use `dmb ishst`.

Do not contain address only for Arm64 (687277b)

kunalspathak marked this pull request as ready for review

January 3, 2022 20:32

kunalspathak (Contributor, Author) commented Jan 3, 2022

@dotnet/jit-contrib

EgorBo (Member) commented Jan 3, 2022

LGTM, @VSadov could you please take a quick look if you have time?
tl;dr:

`volatileVariable = 42;`

used to emit a full memory barrier; now it emits a store-only one.

Also, in most cases, stores/loads of variables marked `volatile` no longer emit separate `dmb` barrier instructions at all and instead use, e.g., `stlr` for stores, essentially what `Volatile.Write` emits today; see #60232.

EgorBo approved these changes

Jan 3, 2022


VSadov reviewed

Jan 3, 2022


src/coreclr/jit/codegenarm64.cpp (outdated)

```diff
@@ -3280,7 +3280,7 @@ void CodeGen::genCodeForStoreInd(GenTreeStoreInd* tree)
     else
     {
         // issue a full memory barrier before a volatile StInd
-        instGen_MemoryBarrier();
+        instGen_MemoryBarrier(BARRIER_STORE_ONLY);
```

VSadov (Member) Jan 3, 2022

What is the typical scenario when this branch is taken? When the value is a struct?

VSadov (Member) Jan 3, 2022 (edited)

As I understand it, the purpose of this barrier is to give release semantics to a store into a volatile variable that does not fit into a single register (otherwise `stlr` could be used).

I think this needs a full barrier since, unlike `stlr`, `dmb ishst` only waits for stores in progress and has no effect on loads.

VSadov (Member) Jan 3, 2022

Basically, `ldar` can be replaced with `ldr; dmb ishld`, but there is no such equivalence between `stlr` and `dmb ishst; str`, because `ishst` is too weak.
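The load-side equivalence has a direct C++ counterpart: a relaxed load followed by `std::atomic_thread_fence(std::memory_order_acquire)` gives that load acquire semantics, and on ARM64 GCC and Clang typically emit this pair as `ldr` followed by `dmb ishld`. The sketch below is an analogy under that assumption, not JIT code; all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int fence_payload = 0;           // plain, non-atomic data
std::atomic<int> fence_flag{0};  // publication flag

void publish_payload() {
    fence_payload = 7;
    fence_flag.store(1, std::memory_order_release);
}

int read_payload_with_fence() {
    // Relaxed load: imposes no ordering by itself (typically a plain ldr).
    while (fence_flag.load(std::memory_order_relaxed) == 0) {
        // spin until published
    }
    // Acquire fence: orders the preceding loads before all later loads and
    // stores (typically dmb ishld), so relaxed load + fence acts like ldar here.
    std::atomic_thread_fence(std::memory_order_acquire);
    return fence_payload;
}

int run_fence_example() {
    int observed = 0;
    std::thread reader([&observed] { observed = read_payload_with_fence(); });
    std::thread writer(publish_payload);
    writer.join();
    reader.join();
    return observed;
}
```

There is no standard C++ fence that maps to `dmb ishst` on the store side: a release fence must also order earlier loads, so compilers typically emit `std::atomic_thread_fence(std::memory_order_release)` as a full `dmb ish` on ARM64, which matches the point that `ishst` alone is too weak.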

kunalspathak (Contributor, Author) Jan 4, 2022

Reading https://developer.arm.com/documentation/100941/0100/Barriers to understand what you stated:

Basically, with `ishst`, loads can still be reordered around the barrier, so it is weaker than `ishld`, where loads/stores need to wait until the barrier completes.

If we change from a full barrier to `ishst`, a load that should have happened after the store might get reordered ahead of it and end up reading the wrong (pre-update) value. Is my understanding correct?

VSadov (Member) Jan 4, 2022

Yes, loads that appear after `ishst` in program order may speculatively happen ahead of the store.

VSadov (Member) Jan 4, 2022

```csharp
static volatile int x;
static volatile int y;
static int xx;
static int yy;

// one thread:
x = 42;
yy = y;

// another thread:
y = 42;
xx = x;
```

Can both `xx` and `yy` end up 0?
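One way to see the answer is to enumerate the execution orders of this store-buffering litmus test under a simplified model (not the full Arm memory model). The sketch assumes that when each thread's load must stay after its own store (as with `stlr` followed by `ldar`), every outcome is some interleaving in program order; and that `dmb ishst; str` followed by a load lets the load be satisfied before the store, modeled here as allowing each thread's load and store to swap. `Op`, `kOps`, and `outcomes` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>
#include <utility>

// One operation of the litmus test, executed atomically on shared memory.
struct Op {
    bool is_store;
    int var;     // 0 = x, 1 = y
    int target;  // for loads: 0 = xx, 1 = yy (unused for stores)
};

// Thread A: x = 42; yy = y;    Thread B: y = 42; xx = x;
constexpr std::array<Op, 4> kOps = {{
    {true, 0, -1},  // 0: A stores x = 42
    {false, 1, 1},  // 1: A loads y into yy
    {true, 1, -1},  // 2: B stores y = 42
    {false, 0, 0},  // 3: B loads x into xx
}};

// Returns every reachable (xx, yy). If allow_store_load_reorder is false,
// each thread's load must execute after its own store (program order).
std::set<std::pair<int, int>> outcomes(bool allow_store_load_reorder) {
    std::array<int, 4> order = {0, 1, 2, 3};
    std::set<std::pair<int, int>> result;
    do {
        std::array<int, 4> pos{};
        for (int k = 0; k < 4; ++k) pos[order[k]] = k;
        if (!allow_store_load_reorder && (pos[0] > pos[1] || pos[2] > pos[3])) {
            continue;  // violates program order within a thread
        }
        int mem[2] = {0, 0};   // x, y
        int regs[2] = {0, 0};  // xx, yy
        for (int idx : order) {
            const Op& op = kOps[idx];
            if (op.is_store) {
                mem[op.var] = 42;
            } else {
                regs[op.target] = mem[op.var];
            }
        }
        result.insert({regs[0], regs[1]});
    } while (std::next_permutation(order.begin(), order.end()));
    return result;
}
```

With `outcomes(false)` the reachable set is {(0, 42), (42, 0), (42, 42)}; with `outcomes(true)` it additionally contains (0, 0), which is the outcome a store-only barrier would allow.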

kunalspathak (Contributor, Author) Jan 4, 2022

Makes sense. I will revert the change related to `ishst`, then.

Remove ishst (c9267a5)

VSadov approved these changes

Jan 5, 2022


VSadov (Member) left a comment

LGTM!

kunalspathak merged commit 4427c56 into dotnet:main

Jan 5, 2022

This was referenced Jan 13, 2022:

[Perf] Changes at 1/5/2022 11:37:43 PM dotnet/perf-autofiling-issues#2836 (Closed)

[Perf] Changes at 1/5/2022 11:37:43 PM dotnet/perf-autofiling-issues#2840 (Closed)

EgorBo (Member) commented Jan 20, 2022 (edited)

Improvements on ubuntu-arm64: dotnet/perf-autofiling-issues#2981
and win-arm64: dotnet/perf-autofiling-issues#2977

JulieLeeMSFT mentioned this pull request Jan 25, 2022:
What's new in .NET 7 Preview 1 [WIP] dotnet/core#7106 (Closed)

EgorBo mentioned this pull request Jan 26, 2022:
ARM64: Avoid LEA for volatile IND #64354 (Merged)

EgorBo mentioned this pull request Feb 9, 2022:
AddRemoveFromDifferentThreads<string>.ConcurrentStack benchmark hangs on ARM64 #64980 (Closed)

ghost locked as resolved and limited conversation to collaborators Feb 19, 2022

Labels

area-CodeGen-coreclr

CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

4 participants
