- Notifications
You must be signed in to change notification settings - Fork948
fix(coderd): fix flake inTestAPI/ModifyAutostopWithRunningWorkspace
#18932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
This PR fixes a flaky testTestAPI/ModifyAutostopWithRunningWorkspace
caused by a race condition inAwaitWorkspaceBuildJobCompleted
. The issue occurs when the API returns stale workspace build data due to non-transactional database queries between workspace builds and provisioner jobs.
- Adds a re-fetch of the workspace build before accessing its deadline to ensure fresh data
- Restructures the test flow to validate the deadline state before making TTL updates
Comments suppressed due to low confidence (1)
coderd/workspaces_test.go:2880
- The variable name 'build' is being reused and shadows the original build variable from the outer scope. Consider using a more descriptive name like 'freshBuild' or 'updatedBuild' to avoid confusion.
build, err := client.WorkspaceBuild(ctx, build.ID)
f751f81
intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Fixescoder/internal#521
This happened due to a race condition present in how
AwaitWorkspaceBuildJobCompleted
works.AwaitWorkspaceBuildJobCompleted
works by waiting until/api/v2/workspacesbuilds/{workspacebuild}/
returns a workspace build with.Job.CompletedAt != nil
. The issue here is thatsometimes the returnedcodersdk.WorkspaceBuild
can contain a build frombefore a provisioner job completed, but contain the provisioner job fromafter it completed.Let me demonstrate:
Here we query the database for
database.WorkspaceBuild
.coder/coderd/coderd.go
Lines 1409 to 1415 ina3f64f7
Inside of the
workspaceBuild
route handler, we callworkspaceBuildsData
coder/coderd/workspacebuilds.go
Line 54 ina3f64f7
This then calls
GetProvisionerJobsByIDsWithQueuePosition
coder/coderd/workspacebuilds.go
Lines 852 to 856 ina3f64f7
As these two calls happenoutside of a transaction, the state of the world can change underneath. This can result in an in-progress workspace build having a completed provisioner job attached to it.
Note: The change in this PR only touches the flakey test. The underlying cause of the flake isn't being fixed. I'm happy to expand the scope of this PR to fix the cause of the flake as that might also fix other flakes.