Never use heap for return buffers #112060
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
EgorBo commented Feb 3, 2025
/azp run Fuzzlyn
Azure Pipelines successfully started running 1 pipeline(s).
EgorBo commented Feb 4, 2025
/azp run runtime-coreclr jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr gcstress-extra, runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo, Fuzzlyn, runtime-coreclr pgostress, runtime-coreclr outerloop
Azure Pipelines successfully started running 8 pipeline(s).
src/coreclr/jit/compiler.h Outdated
STRESS_MODE(POISON_IMPLICIT_BYREFS) \
STRESS_MODE(STORE_BLOCK_UNROLLING) \
STRESS_MODE(THREE_OPT_LAYOUT) \
STRESS_MODE(NONHEAP_RET_BUFFER) \
Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.
Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think there is a high enough probability that we will end up passing heap pointers for return buffers by mistake without noticing it in other ways?
> Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.
Can't find any evidence that we run jitstress for NAOT even in outerloop and we definitely have no GCStress for it (#107850)
> Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think there is a high enough probability that we will end up passing heap pointers for return buffers by mistake without noticing it in other ways?
I had it locally for R2R too. My test apps fail badly if I remove the importer code that makes local copies instead of passing a heap pointer, even without any explicit stress mode (or the helper), so I presume I can delete it.
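A minimal sketch of the local-copy idea discussed here, written as plain C++ rather than JIT IR. All names (`Pair`, `MakePair`, `Holder`, `StoreViaLocalCopy`) are hypothetical and exist only for illustration. The callee writes its result through an explicit return-buffer pointer, and the caller always hands it a stack temporary, then copies the temporary into the heap object afterwards.

```cpp
#include <cassert>

// Hypothetical struct returned through a return buffer.
struct Pair
{
    long first;
    long second;
};

// Models a callee with a hidden return-buffer argument made explicit.
// Under the PR's invariant the callee may assume retBuf never points into the GC heap.
static void MakePair(Pair* retBuf)
{
    retBuf->first  = 1;
    retBuf->second = 2;
}

// Stands in for a heap-allocated object that has a struct field.
struct Holder
{
    Pair field;
};

// Caller side: pass a stack temp as the return buffer, then copy it to the heap field.
// In managed code, any required write barrier would belong to this copy at the call site.
static void StoreViaLocalCopy(Holder* heapObj)
{
    Pair tmp;             // stack-allocated return buffer
    MakePair(&tmp);       // callee writes only to the stack
    heapObj->field = tmp; // heap store happens in the caller
}

// Usage: the heap object ends up with the callee's result.
static void StoreViaLocalCopyExample()
{
    Holder* h = new Holder{};
    StoreViaLocalCopy(h);
    assert(h->field.first == 1 && h->field.second == 2);
    delete h;
}
```

The design point is that the callee never needs to know whether its destination is on the heap; only the caller's final copy does.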
jkotas left a comment
Thanks!
EgorBo commented Feb 8, 2025 • edited
@jakobbotsch @dotnet/jit-contrib does the JIT side look good (besides leaving a few things to follow-up PRs as improvements on top of it)?
src/coreclr/jit/importer.cpp Outdated
if (op->OperIsScalarLocal() && (op->AsLclVarCommon()->GetLclNum() == impInlineRoot()->info.compRetBuffArg))
{
    return true;
}
I don't think it's ok for this to return true without assigning lclVarTreeOut.
I think it would be better to add a new function that checks for the property we want, e.g. PointsOutsideHeap or similar. GenTreeIndir::IsAddressNotOnHeap could probably be switched to use it as well.
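A rough sketch of what such a single-purpose predicate could look like, using simplified stand-in types rather than the real GenTree API. `AddrKind`, `AddrNode`, and this body of `PointsOutsideHeap` are assumptions made for illustration. The point is that a query with no out-parameter cannot leave a caller reading an unassigned result.

```cpp
#include <cassert>

// Hypothetical classification of an address expression (not the real JIT node kinds).
enum class AddrKind
{
    LocalAddr, // address of a stack local
    RetBufArg, // the method's return buffer argument
    Unknown    // anything else
};

struct AddrNode
{
    AddrKind kind;
};

// Returns true only when the address is known not to point into the GC heap.
// There is no out-parameter, so every 'true' result is fully described by the return value.
inline bool PointsOutsideHeap(const AddrNode& addr)
{
    switch (addr.kind)
    {
        case AddrKind::LocalAddr:
        case AddrKind::RetBufArg: // assumed non-heap, per the invariant this PR establishes
            return true;
        default:
            return false; // conservative answer for everything else
    }
}

// Usage: the return buffer is treated as non-heap; unknown addresses are not.
inline void PointsOutsideHeapExample()
{
    assert(PointsOutsideHeap(AddrNode{AddrKind::RetBufArg}));
    assert(!PointsOutsideHeap(AddrNode{AddrKind::Unknown}));
}
```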
@jakobbotsch ah good idea, addressed
src/coreclr/jit/flowgraph.cpp Outdated
if (op->OperIs(GT_ADD))
{
    // If we have (base + offset), inspect the base. We assume someone else normalized the tree
    // so the constant offset is always on the right.
    GenTree* op2 = op->gtGetOp2();
    if (op2->TypeIs(TYP_I_IMPL) && op2->IsCnsIntOrI() && !op2->IsIconHandle() &&
        !fgIsBigOffset(op2->AsIntCon()->IconValue()))
    {
        return fgAddrCouldBeHeap(op->gtGetOp1());
    }
}
Could use gtPeelOffsets here.
Probably best to do it up front, before the check for op->OperIs(GT_LCL_ADDR); it might also catch some cases on the retbuffer.
I've already checked that it doesn't find anything new, but I guess it wouldn't hurt.
addressed
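The suggestion in this thread, peeling constant offsets first and then classifying the base, can be sketched on a toy tree. The `Node` type and both functions below are hypothetical analogues, not the JIT's `gtPeelOffsets` or `fgAddrCouldBeHeap`. The sketch also omits the handle and big-offset checks that the snippet under review performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified expression node.
enum class Oper { Add, ConstInt, LocalAddr, Other };

struct Node
{
    Oper        oper;
    int64_t     value; // meaningful for ConstInt only
    const Node* op1;
    const Node* op2;
};

// Strips (base + constant) wrappers and accumulates the total offset.
// Assumes the tree is normalized so the constant is the right operand.
inline const Node* PeelConstantOffsets(const Node* addr, int64_t* offset)
{
    *offset = 0;
    while ((addr->oper == Oper::Add) && (addr->op2 != nullptr) && (addr->op2->oper == Oper::ConstInt))
    {
        *offset += addr->op2->value;
        addr = addr->op1;
    }
    return addr;
}

// Peels offsets before any other check, so LOCAL_ADDR + 8 + 16 is classified
// the same way as a bare local address.
inline bool AddrCouldBeHeap(const Node* addr)
{
    int64_t     offset;
    const Node* base = PeelConstantOffsets(addr, &offset);
    return base->oper != Oper::LocalAddr; // conservative for everything else
}

// Usage: (local + 8) + 16 peels down to the local with a total offset of 24.
inline void PeelConstantOffsetsExample()
{
    Node local{Oper::LocalAddr, 0, nullptr, nullptr};
    Node c8{Oper::ConstInt, 8, nullptr, nullptr};
    Node c16{Oper::ConstInt, 16, nullptr, nullptr};
    Node inner{Oper::Add, 0, &local, &c8};
    Node outer{Oper::Add, 0, &inner, &c16};

    int64_t offset = 0;
    assert(PeelConstantOffsets(&outer, &offset) == &local);
    assert(offset == 24);
    assert(!AddrCouldBeHeap(&outer));
}
```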
GenTree* spilledCall = gtNewStoreLclVarNode(tmp, srcCall);
GenTree* comma       = gtNewOperNode(GT_COMMA, store->TypeGet(), spilledCall,
                                     gtNewLclvNode(tmp, lvaGetDesc(tmp)->TypeGet()));
store->Data()        = comma;
comma->AsOp()->gtOp1 = impStoreStruct(spilledCall, curLevel, pAfterStmt, di, block);
return impStoreStruct(store, curLevel, pAfterStmt, di, block);
We still have the problem here that this reorders the LHS of the store with the RHS. I think if the LHS has side effects/ordering effects we need to introduce a local and another comma for it to evaluate it before the call.
@jakobbotsch can you elaborate? we spill the destination to a local (even before my change)
Where is the destination spilled to a local? I think if you call impStoreStruct with a store like STORE_BLK(Foo(), Bar()), then the code here will reorder Foo() so that it happens after Bar().
@jakobbotsch I think I've addressed it in 4a48e64. Presumably, GT_RET_EXPR doesn't need special treatment, as we don't spill calls there by hand.
For inlining we already reorder things because of #112053, so regardless it's probably fine. Once that is fixed we can look into whether anything is necessary here to keep the LHS before the call as well.
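The ordering concern in this thread can be illustrated outside the JIT with ordinary C++. `Foo`, `Bar`, and the trace string below are hypothetical stand-ins for a destination address with side effects and a struct-returning call. Spilling only the call result to a temp moves `Bar()` ahead of `Foo()`, while spilling the address first keeps the source order of a store like STORE_BLK(Foo(), Bar()).

```cpp
#include <cassert>
#include <string>

struct Big
{
    int a;
    int b;
};

static std::string g_trace; // records the order in which side effects happen
static Big         g_dest;  // destination of the store

// Destination address computation with an observable side effect.
static Big* Foo()
{
    g_trace += "Foo;";
    return &g_dest;
}

// Struct-returning call with an observable side effect.
static Big Bar()
{
    g_trace += "Bar;";
    return Big{1, 2};
}

// Only the call is spilled to a temp: Bar() now runs before Foo().
static void StoreReordered()
{
    Big tmp = Bar();
    *Foo()  = tmp;
}

// The address is spilled first, then the call, then the copy: Foo() stays before Bar().
static void StoreInOrder()
{
    Big* addr = Foo();
    Big  tmp  = Bar();
    *addr     = tmp;
}

// Usage: compare the traces produced by the two strategies.
static void StoreOrderExample()
{
    g_trace.clear();
    StoreReordered();
    assert(g_trace == "Bar;Foo;");

    g_trace.clear();
    StoreInOrder();
    assert(g_trace == "Foo;Bar;");
}
```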
src/coreclr/jit/importer.cpp Outdated
    ((store->AsIndir()->Addr()->gtFlags & GTF_ALL_EFFECT) != 0))
{
    unsigned lclNum = lvaGrabTemp(true DEBUGARG("fgMakeTemp is creating a new local variable"));
    impStoreToTemp(lclNum, store->AsIndir()->Addr(), curLevel, pAfterStmt, di, block);
I don't think we can use impStoreStruct here since this function is called from outside import via gtNewTempStore. I think it needs to create a comma, or only call this function in some cases (see the checks below for GT_COMMA).
Um, which impStoreStruct are you referring to here? Did you mean impStoreToTemp? Also, it seems this function already appends stuff to statements, so it's weird to expect it not to?
I guess it's just an implicit contract that when gtNewTempStore calls it - it does not?
Yes, I meant impStoreToTemp.
> seems like this function already appends stuff to statements so it's weird expect it to not?
Where does it do that? I think we only do that for the GT_COMMA case, and it has guards to ensure that only happens during import.
We spoke offline and Egor convinced me that actually no reordering is happening here, so we don't need to do any spilling here. Sorry about that.
jakobbotsch left a comment
LGTM
b1ab309 into dotnet:main
CI experiment for #111127
Was:
Now:
where the write barrier is put at the callsite if needed (presumably, it happens rarely)
Updated stats for write barriers after #112227 was merged (it is supposed to help reduce the number of bulk barriers):
aspnet-win-x64 SPMI collection:
Looks like the aspnet collection has too many missed contexts currently (so the actual numbers are likely 5-10% higher)
MihuBot (PMI for BCL):