- Notifications
You must be signed in to change notification settings - Fork1k
fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable#19478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
related to usage of time.Now() in MustWaitForProvisionerAvailable andthe fact that UpdateProvisionerLastSeenAt can not use a time that isfurther in the past than the current LastSeenAt timeSigned-off-by: Callum Styan <callumstyan@gmail.com>
require.NoError(t,err,"Error getting provisioner for workspace") | ||
require.Equal(t,p.LastSeenAt.Time.UnixNano(),staleTime.UnixNano()) |
ethanndicksonAug 22, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Be mindful of the fact you can't callrequire
methods in a goroutine other than the main test goroutine. They all callFailNow
, and from the documentation:
FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test. Calling FailNow does not stop those other goroutines.
https://pkg.go.dev/testing#F.FailNow
You can callassert
functions, however!
Signed-off-by: Callum Styan <callumstyan@gmail.com>
provisionerSigned-off-by: Callum Styan <callumstyan@gmail.com>
transaction does not go through immediatelySigned-off-by: Callum Styan <callumstyan@gmail.com>
coderd/coderdtest/coderdtest.go Outdated
require.Eventually(t,func()bool { | ||
// Use the same logic as hasValidProvisioner but expect false | ||
provisionerDaemons,err:=db.GetProvisionerDaemonsByOrganization(ctx, database.GetProvisionerDaemonsByOrganizationParams{ | ||
OrganizationID:workspace.OrganizationID, | ||
WantTags:tags, | ||
}) | ||
iferr!=nil { | ||
returnfalse | ||
} | ||
// Check if NO provisioners are active (all are stale or gone) | ||
for_,pd:=rangeprovisionerDaemons { | ||
ifpd.LastSeenAt.Valid { | ||
age:=checkTime.Sub(pd.LastSeenAt.Time) | ||
ifage<=provisionerdserver.StaleInterval { | ||
returnfalse// Found an active provisioner, keep waiting | ||
} | ||
} | ||
} | ||
returntrue// No active provisioners found | ||
},testutil.WaitMedium,testutil.IntervalFast) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
suggest using methods intestutil/chan.go
and passing in a parent context in the method signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not sure what you're suggesting we actually change here since this set ofMustWaitForX
functions don't make use of a channel.
Did you mean to leave this comment on one of thestats := <-statsCh
lines that are in some of the actual test functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sorry, I got mixed up here. The method I wanted to highlight (testutil.Eventually
) is actually defined intestutil/eventually.go
.
The problem with the below
ctx := testutil.Context(t, testutil.WaitMedium)<setup code>require.Eventually(t, func bool() {<assertions that should eventually be true>}, testutil.WaitMedium, testutil.IntervalFast)
is thatctx
can reach its deadline before the condition inrequire.Eventually
is met, and is therefore prone to flakiness.
My suggestion is instead to use
ctx := testutil.Context(t, testutil.WaitMedium)<setup code>testutil.Eventually(t, func bool() {<assertions that should eventually be true>}, testutil.IntervalFast)
This will adhere to the deadline imposed byctx
and be less flake-prone. Apologies for the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
No worries at all 👍 I refactored all of the test helpers I'd written in this file (MustWaitForX) to make use oftestutil.Eventually
.
require.EventuallySigned-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM modulo formatting and nits
coderdtest.UpdateProvisionerLastSeenAt(t,db,p.ID,staleTime) | ||
p,err=coderdtest.GetProvisionerForTags(db,time.Now(),workspace.OrganizationID,provisionerDaemonTags) | ||
assert.NoError(t,err,"Error getting provisioner for workspace") | ||
assert.Eventually(t,func()bool {returnp.LastSeenAt.Time.UnixNano()==staleTime.UnixNano() },testutil.WaitMedium,testutil.IntervalFast) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: Suggest instead checking less than or equal to staleTime instead instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This should be fine,lastSeenAt
should never be updated within the execution of this testother than via this direct call tocoderdtest.UpdateProvisionerLastSeenAt
, since that ensures the provisioner is stale and the autobuild will then not execute. We want to ensure it's at leaststaleTime
or very close to it, as when we create a provisioner via the various test helpers it gets a defaultLastSeenAt
time of like January 2024.
We could make this more explicitly enforced by also double checking that in addition to thestats.Transitions
length being 0, assert that theLastSeenAt
time for this provisioner is still equal tostaleTime
.
321c2b8
intomainUh oh!
There was an error while loading.Please reload this page.
The flake here had two causes:
Previously the test here was calling
coderdtest.MustWaitForProvisionersAvailable
which was usingtime.Now
rather than the next tick time like the realhasProvisionersAvailable
function does. Additionally, when usingUpdateProvisionerLastSeenAt
the underlying db query enforces that the time we're trying to setLastSeenAt
to cannot be older than the current value.I was able to reliably reproduce the flake by executing both the
UpdateProvisionerLastSeenAt
call andtickCh <- next
in their own goroutines, the former with a small sleep to reliably ensure we'd trigger the autobuild before we set theLastSeenAt
time. That's when I also noticed thatcoderdtest.MustWaitForProvisionersAvailable
was usingtime.Now
instead of the tick time. When I updated that function to take in a tick time + added a 2nd call toUpdateProvisionerLastSeenAt
to set an original non-stale time, we could then never get the test to pass because the later call to set the stale time would not actually modifyLastSeenAt
. On top of that, calling the provisioner daemons closer in the middle of the function doesn't really do anything of value in this test.The fix for the flake is to keep the go routines, ensuring there would be a flake if there was not a relevant fix, but to include the fix which is to ensure that we explicitly wait for the provisioner to be stale before passing the time to
tickCh
.