- Notifications
You must be signed in to change notification settings - Fork5.1k
Port yield normalization from CoreCLR to Native AOT#103675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Tagging subscribers to this area:@mangod9 |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
@@ -46,9 +46,6 @@ uint32_t WINAPI FinalizerStart(void* pContext) | |||
g_pFinalizerThread = PTR_Thread(pThread); | |||
// We have some time until the first finalization request - use the time to calibrate normalized waits. | |||
EnsureYieldProcessorNormalizedInitialized(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
How is the measurement going to be triggered when this is deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm still trying to figure this out, I'm not very familiar with Native AOT in general so I'd appreciate any suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It looks like we would need to callYieldProcessorNormalization::PerformMeasurement()
from here or add aEnsureYieldProcessorNormalizedInitialized()
entry point to the new code that simply callsYieldProcessorNormalization::PerformMeasurement()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do you happen to know if this function is called every ~4 seconds or faster than that? Currently we letYieldProcessorNormalization::PerformMeasurement()
run every ~4 s so if that's the case, I believe we may add here the same call as in CoreCLR
if (YieldProcessorNormalization::IsMeasurementScheduled()) {GCX_PREEMP();YieldProcessorNormalization::PerformMeasurement(); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
FinalizerStart
function is called once per process. It is equivalent ofFinalizerThreadStart
function in regular CoreCLR.
I think you want to follow the same structure as in regular CoreCLR: Trigger the measurement from ScheduleMeasurementIfNecessary by callingRhEnableFinalization
(it is equivalent ofFinalizerThread::EnableFinalization
in regular CoreCLR) and then add the measurement to loop inProcessFinalizers()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do you happen to know if this function is called every ~4 seconds or faster than that?
I am not sure. The whole deal with measuring duration of something that is proportional to CPU cycle is not very precise, since the CPU cycle can change drastically and many times per second and will be different for every core. Unless machine is configured into HighPerformance power plan, every measurement is a bit of a coin toss and will produce the same result with the same error margins.
The main purpose of calibration is to continue using historically hard-coded spin counts in numerous places where we spinwait while allowing that to work on systems with vastly different pause durations (i.e. on post-skylake intel CPUs pause takes ~140 cycles, pre-skylake is about ~10 cycles). For such purpose the callibration is precise enough.
I am not sure about the value of redoing the measurement over and over.
Perhaps to support scenarios where a VM is migrated between pre/post skylake machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I guess we can add a periodic call PerformMeasurement in NativeAOT and see what happens.
My guess - nothing will change, just a bit more time spent in PerformMeasurement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There is value in having the same behavior though.
If the re-measuring (or the whole calibration deal) could be somehow avoided or improved, it would make sense to do for both runtimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The measurements done are very short and can be perturbed by CPU activity, the rolling min helps to stabilize it over time.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
af5ceaa
to6519b0b
CompareUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, thanks!
What kind of testing you have done on the change to validate that it works as expected? Do we expect improvements in any perf benchmarks? |
kouvel commentedJul 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I don't think there would be any changes to benchmarks. I would expect that the CPU time spent during startup in the measurements would be a lot less (the new scheme measures lazily, and in narrower windows), that's about it. It would be good to measure that. |
I tested the following snippet and I checked that the 8 initial measurements were done and subsequent measurements every ~4 seconds were done as well. usingSystem;usingSystem.Threading;intminutesToSpin=10;intstartTicks=Environment.TickCount;while(Environment.TickCount-startTicks<minutesToSpin*60*1000){Thread.SpinWait(1000);Thread.Sleep(2000);} |
d35f302
intodotnet:mainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Porting the current way yield normalization is done to Native AOT.
The CoreCLR implementation was moved to src/coreclr/vm/yieldprocessornormalizedshared.cpp.
Both the CoreCLR file (src/coreclr/vm/yieldprocessornormalized.cpp) and the Native AOT file (src/coreclr/nativeaot/Runtime/yieldprocessornormalized.cpp) now share the same implementation.