Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: integrate Acquirer for provisioner jobs#9717

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
spikecurtis merged 12 commits intomainfromspike/9428-pt2-integrate-acquirer
Sep 19, 2023

Conversation

spikecurtis
Copy link
Contributor

Fixes#9428

Some notable things to be aware of

  • Adds a new "AcquireJobWithCancel" RPC to replace AcquireJob. It allows asynchronous canceling of acquisition to support graceful shutdown. AcquireJob is deprecated, but kept around for deployed external provisioners. In a few versions we could consider removing it.

  • Refactors Provisionerd to remove polling loops.

  • Changes coderdtest to gracefully shut down the provisionerd. I added some new error logs that were getting triggered by non-graceful shutdowns.

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@@ -57,400 +64,411 @@ func testUserQuietHoursScheduleStore() *atomic.Pointer[schedule.UserQuietHoursSc
return ptr
}

funcTestAcquireJob(t *testing.T) {
funcTestAcquireJob_LongPoll(t *testing.T) {
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

To review this file, I strongly suggest "Hide Whitespace"

Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis
Copy link
ContributorAuthor

CI testing has made me aware of a race condition --- sometimes wsbuilder is called from inside a transaction, so it's not safe to publish the job posting from inside wsbuilder, since then the publish could arrive before the outer transaction commits.

This issue would be solved with triggers, but that's a much bigger refactor of our database & pubsub interfaces.

For now I'm going to unwind the changes to wsbuilder and have the callers post the job.

Signed-off-by: Spike Curtis <spike@coder.com>
Comment on lines +262 to +269
logger.Error(streamCtx, "recv error and failed to cancel acquire job", slog.Error(recvErr))
// Well, this is awkward. We hit an error receiving from the stream, but didn't cancel before we locked a job
// in the database. We need to mark this job as failed so the end user can retry if they want to.
err := s.Database.UpdateProvisionerJobWithCompleteByID(
context.Background(),
database.UpdateProvisionerJobWithCompleteByIDParams{
ID: je.job.ID,
CompletedAt: sql.NullTime{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Follow-up idea: If this starts happening frequently it could be symptomatic of an underlying issue. Would it make sense to expose this as its own metric in Prometheus?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That's a good idea. Especially as part of a larger effort to add relevant provisioner metrics.

Comment on lines 1273 to 1292
{
Name: "Poll Interval",
Description: "Time to wait before polling for a new job.",
Flag: "provisioner-daemon-poll-interval",
Env: "CODER_PROVISIONER_DAEMON_POLL_INTERVAL",
Default: time.Second.String(),
Value: &c.Provisioner.DaemonPollInterval,
Group: &deploymentGroupProvisioning,
YAML: "daemonPollInterval",
},
{
Name: "Poll Jitter",
Description: "Random jitter added to the poll interval.",
Flag: "provisioner-daemon-poll-jitter",
Env: "CODER_PROVISIONER_DAEMON_POLL_JITTER",
Default: (100 * time.Millisecond).String(),
Value: &c.Provisioner.DaemonPollJitter,
Group: &deploymentGroupProvisioning,
YAML: "daemonPollJitter",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If we remove these from the deployment config then it becomes an error to specify them at all. Could we keep them around for a couple releases but mark them as deprecated?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We have deprecation for options that are replaced by something else. In this case there is no replacement -- these options are just not needed any more.

I don't feel like waiting a few releases to remove these options will do us much good. We don't have a good mechanism to getcoder server CLI warnings in front of operator eyeballs. It just drops some logs at start of day and they'll get buried. We're just as likely to break people when we remove them later.

I could just change the Descriptions to "This flag is deprecated and unused" and leave them in. I don't think there will ever be a better time than now to delete them.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thinking on this some more, this will probably only break folks who are specifying the arguments via CLI flags. I think if they're specified via environment variables they'll just get ignored. And we don't specify CLI args in our Helm chart, so the blast radius doesn't appear to be as large. And folks who are running Coder via the installation script would probably be specifying configuration knobs as environment variables in/etc/coder.d/coder.env.

So maybe it's OK to just remove them? 🤷

Thinking outside the scope of the PR, we could have a "Deprecated options" section in the deployment healthcheck JSON that lights up red if you have deprecated options specified.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thinking outside the scope of the PR, we could have a "Deprecated options" section in the deployment healthcheck JSON that lights up red if you have deprecated options specified.

That's a good idea

Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
Signed-off-by: Spike Curtis <spike@coder.com>
@bpmctbpmct mentioned this pull requestSep 18, 2023
@spikecurtisspikecurtis merged commit375c70d intomainSep 19, 2023
@spikecurtisspikecurtis deleted the spike/9428-pt2-integrate-acquirer branchSeptember 19, 2023 06:25
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsSep 19, 2023
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@johnstcnjohnstcnjohnstcn approved these changes

Assignees

@spikecurtisspikecurtis

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

don't rely on global state to rate-limit AcquireJob requests
2 participants
@spikecurtis@johnstcn

[8]ページ先頭

©2009-2025 Movatter.jp