- Notifications
You must be signed in to change notification settings - Fork5.1k
Arm64: Memory barrier improvements#62895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
ghost commentedDec 16, 2021
Tagging subscribers to this area:@JulieLeeMSFT Issue Details
|
I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64: Basically, some of the instructions don't quite have the semantics you might expect. |
I see approx. 0.58% and 0.28% improvement in RPS in mvc and json benchmarks respectively. I believe these are in error range. |
Thanks@alexrp . This PR just extends the optimal memory barriers for |
@dotnet/jit-contrib |
LGTM,@VSadov could you please take a quick look if you have time
used to emit a full memory barrier, now it emits a store-only one. Also, for most cases stores/loads for variables marked as "volatile" now actually don't emit memory barriers at all and use e.g. |
src/coreclr/jit/codegenarm64.cpp Outdated
@@ -3280,7 +3280,7 @@ void CodeGen::genCodeForStoreInd(GenTreeStoreInd* tree) | |||
else | |||
{ | |||
// issue a full memory barrier before a volatile StInd | |||
instGen_MemoryBarrier(); | |||
instGen_MemoryBarrier(BARRIER_STORE_ONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What is the typical scenario when this branch is taken? When value is a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
As I understand the purpose of this barrier is to have release semantics for storing into volatile variable that does not fit into a single register (otherwise stlr could be used).
I think this needs a full barrier, since unlikestlr
,dmb ishst
only waits for stores in progress and has no effect on loads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Basically,ldar
can be replaced withldr; dmb ishld
, but there is no such equivalency betweenstlr
anddmb ishst; str
because ishst is too weak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Fromhttps://developer.arm.com/documentation/100941/0100/Barriers, trying to understand what you stated.
Basically, forishst
, loads can still be reordered around barrier and hence it is weaker than theishld
where loads/stores need to wait till the barrier is complete.
If we change from full barrier toishst
, we might have a load that should have been completed but got reordered and might end up reading the wrong value (pre-updated value). Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, loads that appear after ishst, in program order, may speculatively happen ahead of the store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
staticvolatileintx;staticvolatileinty;staticintxx;staticintyy;----one thread:x=42;yy=y;---another thread:y=42;xx=x;
Can bothxx
andyy
end up 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Makes sense. I will revert the change related toshst
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM!
EgorBo commentedJan 20, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Improvement on ubuntu-arm64:dotnet/perf-autofiling-issues#2981 |
volatile
variable which makes the speed ofvolatile
declared variables as observed in[arm64] Volatile.Read/Write is 2x faster than "volatile" loads/stores #60232. Thanks@EgorBo for suggesting the solution as well.