chore: acquire lock for individual workspace transition #15859


Merged

DanielleMaywood merged 5 commits into main from dm-lifecycle-executor-race on Dec 13, 2024

Conversation

@DanielleMaywood (Contributor) commented Dec 13, 2024 (edited)

When Coder is run in High Availability mode, each Coder instance has a lifecycle executor. These lifecycle executors are all trying to do the same work, and whilst transactions save us from this causing an issue, we are still doing extra work that could be prevented.

This PR adds a TryAcquireLock call for each attempted workspace transition, meaning two Coder instances shouldn't duplicate effort.

This approach does still allow some duplicated effort to occur though. This is because we aren't locking the entire runOnce function, meaning the following scenario could still occur:

  1. Instance X calls GetWorkspacesEligibleForTransition, returning workspace W
  2. Instance X acquires the lock to transition workspace W
  3. Instance X starts transitioning workspace W
  4. Instance Y calls GetWorkspacesEligibleForTransition, returning workspace W
  5. Instance X finishes transitioning workspace W
  6. Instance X releases the lock to transition workspace W
  7. Instance Y acquires the lock to transition workspace W
  8. Instance Y starts transitioning workspace W
  9. Instance Y fails to transition workspace W
  10. Instance Y releases the lock to transition workspace W

I decided against locking runOnce for now as we run each workspace transition in its own transaction. Using nested transactions here will require extra design work and consideration.
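
As an editorial sketch of the pattern described above (not the PR's actual diff): a per-workspace transition guarded by a non-blocking Postgres advisory lock could look roughly like the following. The function name tryTransition, the key derivation, and the direct use of database/sql are assumptions for illustration; in Coder the lock attempt goes through a store method such as the TryAcquireLock mentioned above.

    package lifecycle

    import (
        "context"
        "database/sql"
        "hash/fnv"

        "github.com/google/uuid"
    )

    // tryTransition sketches the locking pattern: each workspace transition
    // runs in its own transaction, and a non-blocking advisory lock keyed on
    // the workspace ID lets other instances skip a workspace that is already
    // being transitioned. Names and key derivation are illustrative, not
    // Coder's actual code.
    func tryTransition(ctx context.Context, db *sql.DB, workspaceID uuid.UUID) error {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit succeeds

        // Derive a 64-bit advisory lock key from the workspace ID.
        h := fnv.New64()
        _, _ = h.Write(workspaceID[:])
        key := int64(h.Sum64())

        // pg_try_advisory_xact_lock returns false immediately if another
        // session holds the lock; the lock is released automatically when
        // this transaction commits or rolls back.
        var acquired bool
        if err := tx.QueryRowContext(ctx,
            "SELECT pg_try_advisory_xact_lock($1)", key,
        ).Scan(&acquired); err != nil {
            return err
        }
        if !acquired {
            // Another lifecycle executor is transitioning this workspace;
            // skip it rather than duplicate the effort.
            return nil
        }

        // ... perform the workspace transition within this transaction ...

        return tx.Commit()
    }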

    go func() {
        tickChB <- next
        close(tickChB)
    }()
Member commented:

Is this potentially racy? We're testing that the lock acquire works but theoretically that might not happen if the first coderd grabs the job, completes it, and then the second one does.

I doubt it matters as I suppose we're happy even if the try acquire is hit only a fraction of the time, but thought I'd flag it anyway.

@DanielleMaywood (Contributor, Author) replied:

Looking again, you're probably right. I ran the test with verbose logging and it looks like this all occurs within 0.05s.

If the test doesn't hit the lock, then we are likely to hit a flake. I'll have a go at increasing this time buffer.

@johnstcn (Member) commented Dec 13, 2024 (edited)

I think you might be able to reduce (but not eliminate) raciness by having a second chan struct{} that you then close after starting both goroutines, making them both wait until it's closed to start.

e.g.

    startCh := make(chan struct{})
    go func() {
        <-startCh
        tickChA <- next
        close(tickChA)
    }()
    go func() {
        <-startCh
        tickChB <- next
        close(tickChB)
    }()
    close(startCh)

You might also be able to get both of them to tick very closely in time by sharing the same tick channel, and making it buffered with size 2. (Of course then you'd want to avoid closing the channel twice to avoid a panic)
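
For reference, a sketch of that shared-channel alternative, assuming both executors are constructed to read ticks from the same channel (tickCh and the time.Time element type are assumptions; next follows the test's naming):

    // One buffered channel feeds both lifecycle executors: both sends
    // complete immediately thanks to the buffer, so the two executors
    // receive their ticks as close together in time as possible.
    tickCh := make(chan time.Time, 2)
    tickCh <- next
    tickCh <- next
    // Close exactly once; closing a shared channel from two places would panic.
    close(tickCh)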

@DanielleMaywood (Contributor, Author) replied:

I've gone with your proposal @johnstcn.

It looks like for testing we just use an echo provisioner job, so getting that to take artificially longer for this specific test may not be a trivial task.

DanielleMaywood merged commit 50ff06c into main on Dec 13, 2024
30 checks passed
DanielleMaywood deleted the dm-lifecycle-executor-race branch on December 13, 2024 16:59
github-actions bot locked and limited conversation to collaborators on Dec 13, 2024
Reviewers

mafredri approved these changes

johnstcn approved these changes

Awaiting requested review from mtojek

Assignees

@DanielleMaywood

Labels
None yet
Projects
None yet
Milestone
No milestone

3 participants
@DanielleMaywood, @mafredri, @johnstcn
