feat: add lifecycle.Executor to manage autostart and autostop #1183


Merged

johnstcn merged 28 commits into main from cj/gh-271/sched-autostart on May 11, 2022

Conversation

johnstcn (Member) commented Apr 26, 2022 (edited)

What:

This PR adds a package lifecycle and an Executor implementation that attempts to schedule a build of workspaces with autostart configured; a rough sketch of the executor loop follows the list below.

  • lifecycle.Executor takes a chan time.Time in its constructor (e.g. time.Tick(time.Minute))
  • Whenever a value is received from this channel, it executes one iteration of looping through the workspaces and triggering lifecycle operations.
  • When the context passed to the executor is Done, it exits.
  • Only workspaces that meet the following criteria will have a lifecycle operation applied to them:
    • Workspace has a valid and non-empty autostart or autostop schedule (either)
    • Workspace's last build was successful
  • The following transitions will be applied depending on the current workspace state:
    • If the workspace is currently running, it will be stopped.
    • If the workspace is currently stopped, it will be started.
    • Otherwise, nothing will be done.
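
A rough sketch of that loop, using illustrative field and method names that are assumptions rather than the PR's exact code:

package executor

import (
	"context"
	"time"
)

// Executor is an illustrative sketch of the type described above; field and
// method names are assumptions, not necessarily those used in this PR.
type Executor struct {
	ctx  context.Context
	tick <-chan time.Time
}

// Run blocks until the context is done, executing one pass per received tick.
func (e *Executor) Run() {
	for {
		select {
		case <-e.ctx.Done():
			return
		case t := <-e.tick:
			// One pass: examine each workspace that has an autostart or
			// autostop schedule and whose last build succeeded, then queue
			// the appropriate start or stop build inside one transaction.
			e.runOnce(t)
		}
	}
}

func (e *Executor) runOnce(t time.Time) {
	_ = t // omitted: query eligible workspaces and apply transitions
}

The real executor also needs the database store and a logger; the sketch only shows the control flow around the tick channel and the context.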

What was done:

  • Implement autostop logic

  • Add unit tests to verify that workspaces in liminal states and/or soft-deleted workspaces do not get touched by this.

  • Exclude workspaces without autostart or autostop enabled in the initial query

  • Move provisionerJobStatus literally anywhere else.

  • Wire up the AutostartExecutor to the real coderd

  • Renamed the autostart/lifecycle package to lifecycle/executor because that name was bugging me.

Possible improvements / "Future Work":

  • Make Test_Executor_Run/AlreadyRunning not wait ten seconds somehow:
    • This requires some plumbing -- we'd need some way of hooking more directly into provisionerd, which doesn't seem easy right now.
  • DRY the workspace build triggering logic as it's essentially copy-pasted from the workspaces handler:
    • Leaving this for now as we only use this in two places. Once we need to do this elsewhere, we can revisit.
  • Add a unit test for racy dueling coderd instances:
    • Leaving this for now, as doing our periodic checks in a single transaction should be sufficient to ensure consistency.

Closes #271.

codecov bot commented Apr 26, 2022 (edited)

Codecov Report

Merging #1183 (7627372) into main (2df92e6) will decrease coverage by 0.01%.
The diff coverage is 74.10%.

@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
- Coverage   67.08%   67.06%   -0.02%
==========================================
  Files         288      288
  Lines       18773    19079     +306
  Branches      241      241
==========================================
+ Hits        12593    12795     +202
- Misses       4906     4979      +73
- Partials     1274     1305      +31
Flag | Coverage Δ
unittest-go-macos-latest | 54.21% <65.62%> (+0.10%) ⬆️
unittest-go-postgres- | 65.54% <71.42%> (-0.02%) ⬇️
unittest-go-ubuntu-latest | 56.64% <65.62%> (+0.22%) ⬆️
unittest-go-windows-2022 | 52.63% <65.62%> (+0.19%) ⬆️
unittest-js | 74.24% <ø> (ø)
Impacted Files | Coverage Δ
cli/autostart.go | 75.29% <ø> (ø)
cli/autostop.go | 75.00% <ø> (ø)
coderd/autobuild/schedule/schedule.go | 90.90% <ø> (ø)
coderd/workspaces.go | 58.51% <ø> (ø)
coderd/database/queries.sql.go | 77.78% <61.29%> (-0.13%) ⬇️
coderd/autobuild/executor/lifecycle_executor.go | 71.60% <71.60%> (ø)
cli/server.go | 58.27% <100.00%> (+0.78%) ⬆️
coderd/coderdtest/coderdtest.go | 98.93% <100.00%> (+0.07%) ⬆️
coderd/audit/backends/postgres.go | 38.46% <0.00%> (-30.77%) ⬇️
codersdk/provisionerdaemons.go | 61.97% <0.00%> (-5.64%) ⬇️
... and 32 more

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2df92e6...7627372.

kylecarbs (Member) left a comment:

How do we intend on making this distributed? eg. if I have 5 coderd replicas, what stops them from queueing multiple workspace builds at the same time?

johnstcn (Author):

How do we intend on making this distributed? eg. if I have 5 coderd replicas, what stops them from queueing multiple workspace builds at the same time?

Kyle and I discussed briefly in Slack -- essentially the "best" way to do this would be to set up a distributed leader lock and have only the 'leader' coderd perform these lifecycle operations.

But with the current implementation that (ab?)uses one big transaction in which to perform all the autostarts, there shouldn't be any serious issues at small N.

A further optimization would essentially involve consolidating the entire logic of creating a new workspace build with the same parameters as the last build into a single query. This would be a big scary query though, so the complexity likely isn't worth it right now.

Probably the best approach right now is to add a unit test that ensures that two coderd instances running concurrently against the same database don't end up causing problems here.


johnstcn force-pushed the cj/gh-271/sched-autostart branch from a0a10f5 to 011ea51 on April 26, 2022 21:44
Emyrk (Member):

Kyle and I discussed briefly in Slack -- essentially the "best" way to do this would be to set up a distributed leader lock and only have the 'leader' coderd.

Just want to mention proper fault tolerant leadership stuff is typically solved with consensus algos like Raft or Paxos. Probably 100% overkill, but if we decide to have a "Leader" coderd, you do have to handle leaders failing, being slow, and being elected out. It's a fun problem that gets complicated super quick if you 100% cannot tolerate 2 concurrent leaders.

If you just do a pg lock and first come first serve, that's going to work fine and let pg determine who the "leader" is.


johnstcn (Author):

Just want to mention proper fault tolerant leadership stuff is typically solved with consensus algos like Raft or Paxos. Probably 100% overkill, but if we decide to have a "Leader" coderd, you do have to handle leaders failing, being slow, and being elected out. It's a fun problem that gets complicated super quick if you 100% cannot tolerate 2 concurrent leaders.

Yep, absolutely. Also, those consensus algos typically are used when there are multiple independent instances with separate persistent storage. Since all coderd replicas will be using the same database, just using a PG leader lock makes sense in this case.
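
For illustration, here is one way the PG-lock idea above could look, using a transaction-scoped advisory lock. The lock key, function name, and package are assumptions made for this sketch, not code from this PR:

package autobuild

import (
	"context"
	"database/sql"
)

// autobuildLockKey is an arbitrary advisory-lock key shared by all coderd
// replicas; the value here is made up for illustration.
const autobuildLockKey = 271

// runOnePass lets Postgres pick the "leader" for a single pass: whichever
// replica grabs the advisory lock first does the work for this tick, and the
// others return immediately.
func runOnePass(ctx context.Context, db *sql.DB, doPass func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer func() { _ = tx.Rollback() }()

	var acquired bool
	// pg_try_advisory_xact_lock returns false without blocking if another
	// session already holds the lock; the lock is released when the
	// transaction ends.
	err = tx.QueryRowContext(ctx,
		"SELECT pg_try_advisory_xact_lock($1)", autobuildLockKey,
	).Scan(&acquired)
	if err != nil {
		return err
	}
	if !acquired {
		return nil // another replica is handling this tick
	}
	if err := doPass(tx); err != nil {
		return err
	}
	return tx.Commit()
}

Whichever replica wins the lock performs the pass for that tick; the rest skip it, which avoids concurrent "leaders" without any consensus machinery.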

johnstcn force-pushed the cj/gh-271/sched-autostart branch 4 times, most recently from a8a50a4 to db03c74 on April 30, 2022 15:13
johnstcn force-pushed the cj/gh-271/sched-autostart branch 2 times, most recently from def317c to ed18e04 on May 10, 2022 07:28
johnstcn force-pushed the cj/gh-271/sched-autostart branch from b39fde2 to 364a27c on May 10, 2022 20:50
johnstcn marked this pull request as ready for review on May 10, 2022 20:51
johnstcn requested a review from a team as a code owner on May 10, 2022 20:51
johnstcn requested a review from a team on May 10, 2022 20:51
mafredri (Member) left a comment:

Some questions and minor suggestions, but overall a really clean PR!

}

func mustProvisionWorkspace(t *testing.T, client *codersdk.Client) codersdk.Workspace {
	t.Helper()
Member:

Nice use of helper functions, kept the tests focused on the relevant parts! 😎

kylecarbs (Member) left a comment:

LGTM! Just a comment on the package naming, as I'd hate for this beautiful package to be eventually bloated due to naming conflicts!


var validTransition database.WorkspaceTransition
var sched *schedule.Schedule
switch latestBuild.Transition {
Member:

Because builds are idempotent, we shouldn't need to switch off the last transition. We should be able to just start or stop.

johnstcn (Author):

Even so, it's probably no harm for this thing to make a best-effort at being idempotent as well.

Member:

Ahh yes, but I mean going off the last transition could lead to unexpected results if I'm reading correctly.

e.g. I'm in the stopped state but the workspace was updated, now my autostop wouldn't trigger to update my workspace. I'm not sure if this is a problem, but the behavior could be odd to customers.

johnstcn (Author):

So I wrote a test to see what happens right now in this state:

  • Given: we have a stopped workspace with autostart enabled
  • Also given: the TemplateVersion of the Template used by the workspace changes

In this case, we do trigger an autostart, but with the last template version used by the workspace.

As this is an automated process, I think this makes sense and we would want the user to manually update the template version if the workspace is outdated. In future, we may want to allow users to opt-in to automatically updating workspaces to the latest version.
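
For context, a hedged reconstruction of the switch under discussion; the helper shape, parameter names, and case bodies are assumptions, not the PR's exact code:

package executor

import (
	"github.com/coder/coder/coderd/autobuild/schedule"
	"github.com/coder/coder/coderd/database"
)

// nextTransition is a hedged reconstruction of the decision discussed above;
// the PR's actual code differs in detail. autostartSched and autostopSched
// stand in for the workspace's parsed schedules.
func nextTransition(
	last database.WorkspaceTransition,
	autostartSched, autostopSched *schedule.Schedule,
) (database.WorkspaceTransition, *schedule.Schedule, bool) {
	switch last {
	case database.WorkspaceTransitionStart:
		// The workspace is running, so the only schedulable action is an autostop.
		return database.WorkspaceTransitionStop, autostopSched, true
	case database.WorkspaceTransitionStop:
		// The workspace is stopped, so the only schedulable action is an autostart.
		return database.WorkspaceTransitionStart, autostartSched, true
	default:
		// Deleting or otherwise in-flight: leave the workspace alone.
		return "", nil, false
	}
}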

})
}

// TODO(cian): this function duplicates most of api.postWorkspaceBuilds. Refactor.
Member:

I wouldn't be opposed to exposing a function like: database.CreateWorkspaceBuild. I'm not a massive fan either, because it certainly conflicts with database.InsertWorkspaceBuild. 🤷

johnstcn (Author):

I'm holding off on extracting a function here because we only have two data points right now; I'd like to see at least one more before doing so.

Member:

I like that!
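
Purely as an illustration of the shape being floated here (hypothetical; not part of this PR):

package database // hypothetical placement, as discussed above

import "context"

// CreateWorkspaceBuild is a hypothetical helper of the shape discussed above,
// not code from this PR. It would wrap the "insert provisioner job + insert
// workspace build" steps that the workspaces handler and the autobuild
// executor currently duplicate.
func CreateWorkspaceBuild(
	ctx context.Context,
	store Store,
	workspace Workspace,
	transition WorkspaceTransition,
) (WorkspaceBuild, error) {
	// Omitted: look up the prior build and provisioner job, then insert a
	// new provisioner job and workspace build reusing the same template
	// version and parameters as the last successful build.
	var build WorkspaceBuild
	return build, nil
}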

Contributor:

suggestion(if-minor): Just one suggestion: ticket > todo. Entirely up to you, won't hold up on that.

johnstcn (Author):

@vapurrmaid good call: #1401

"github.com/stretchr/testify/require"
)

func Test_Executor_Autostart_OK(t *testing.T) {
Member:

We haven't used the _ idiom in test names in other places in the codebase. I'm impartial as to whether it's good or not, but we could remove them for consistency in this case!

@@ -0,0 +1,219 @@
package executor
Member:

What do you think about calling the lifecycle package autobuild or cronbuild? I'm concerned about calling it lifecycle, since that term could be interpreted very broadly.

johnstcn (Author):

autobuild is better and does what it says on the tin.

greyscaled (Contributor) commented May 11, 2022 (edited):

comment(in-support): autobuild seems intuitive/useful to me.

slogtest.Make(t, nil).Named("lifecycle.executor").Leveled(slog.LevelDebug),
options.LifecycleTicker,
)
go lifecycleExecutor.Run()
Member:

Since this Run function is always called in a goroutine, could we do that automatically for the caller in New()?

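
A small sketch of that suggestion, reusing the illustrative Executor type from the sketch earlier on this page (not the PR's actual constructor):

// New is an illustrative sketch of the suggestion above, reusing the Executor
// type from the earlier sketch: the constructor starts the run loop itself,
// so callers no longer write `go lifecycleExecutor.Run()`.
func New(ctx context.Context, tick <-chan time.Time) *Executor {
	e := &Executor{ctx: ctx, tick: tick}
	go e.Run() // exits when ctx is done
	return e
}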
}

// TODO(cian): this function duplicates most of api.postWorkspaceBuilds. Refactor.
func doBuild(ctx context.Context, store database.Store, workspace database.Workspace, trans database.WorkspaceTransition, priorHistory database.WorkspaceBuild, priorJob database.ProvisionerJob) error {
Member:

smallest nit possible: We could drop the do here

mafredri (Member) left a comment:

LGTM!

greyscaled (Contributor) left a comment:

well done ✔️

cli/server.go (Outdated)
lifecyclePoller := time.NewTicker(time.Minute)
defer lifecyclePoller.Stop()
lifecycleExecutor := executor.New(cmd.Context(), options.Database, logger, lifecyclePoller.C)
go lifecycleExecutor.Run()
Contributor:

Comment(general): Your code looks great. I'm feeling a bit lost in understanding all that's happening in this "Start a Coder server" RunE because it's growing quite large with some branching and goroutines. I'm wondering if that is just a me thing, or if maybe the complexity here is growing large. To understand what I mean, this diff in isolation looks harmless. In the grand scheme of initializing everything within this RunE, though, it's harder to really evaluate.

I'm under the impression we could extract out some initialization steps to named functions and call them in a pipeline, but maybe that's just a me thing and not the general preferred approach for Go + Cobra. Do you have any comments to that effect?

Regardless, this comment is not meant for your code/this PR, just a general observation.

johnstcn (Author):

Agreed, I'd be in favour of extracting some initialization logic to make things slightly more concise, just would want to be careful to avoid too many layers of indirection.
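
As one hypothetical example of such an extraction (helper name and signature are assumptions, not code from this PR), the wiring from the snippet above could move into a small helper called from RunE:

package cli

import (
	"context"
	"time"

	"cdr.dev/slog"

	"github.com/coder/coder/coderd/autobuild/executor"
	"github.com/coder/coder/coderd/database"
)

// startAutobuildExecutor is a hypothetical helper extracted from RunE, per the
// discussion above; the name and signature are assumptions. The caller is
// expected to defer ticker.Stop().
func startAutobuildExecutor(ctx context.Context, db database.Store, logger slog.Logger) *time.Ticker {
	ticker := time.NewTicker(time.Minute)
	exec := executor.New(ctx, db, logger, ticker.C)
	go exec.Run()
	return ticker
}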

@@ -17,3 +17,5 @@ coverage/
out/
storybook-static/
test-results/

**/*.swp
Contributor:

Praise: Thank you for thinking of these files. We should add a matching comment to each of the three ignore files reminding contributors to consider adding entries in all three. It's easy to miss/forget.

johnstcn merged commit f4da5d4 into main on May 11, 2022
johnstcn deleted the cj/gh-271/sched-autostart branch on May 11, 2022 22:03
misskniss added this to the V2 Beta milestone on May 15, 2022
kylecarbs pushed a commit that referenced this pull request on Jun 10, 2022:

This PR adds a package lifecycle and an Executor implementation that attempts to schedule a build of workspaces with autostart configured.
- lifecycle.Executor takes a chan time.Time in its constructor (e.g. time.Tick(time.Minute))
- Whenever a value is received from this channel, it executes one iteration of looping through the workspaces and triggering lifecycle operations.
- When the context passed to the executor is Done, it exits.
- Only workspaces that meet the following criteria will have a lifecycle operation applied to them:
  - Workspace has a valid and non-empty autostart or autostop schedule (either)
  - Workspace's last build was successful
- The following transitions will be applied depending on the current workspace state:
  - If the workspace is currently running, it will be stopped.
  - If the workspace is currently stopped, it will be started.
  - Otherwise, nothing will be done.
- Workspace builds will be created with the same parameters and template version as the last successful build (for example, template version)
Reviewers

kylecarbs approved these changes
mafredri approved these changes
greyscaled approved these changes
Emyrk: awaiting requested review

Assignees

johnstcn

Labels
None yet
Projects
None yet
Milestone
V2 Beta
Development

Successfully merging this pull request may close these issues.

General use scheduler for jobs like Auto ON/OFF
6 participants
@johnstcn, @Emyrk, @mafredri, @kylecarbs, @greyscaled, @misskniss
