Backoff acquiring provisioner jobs when the database is unreachable

3f95841

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping force-pushed the dk/17045 branch from 15588ef to cb302b6

April 11, 2025 14:13

dannykopping changed the title from ~~Reduce excessive logging when database is unreachable~~ to "fix: reduce excessive logging when database is unreachable"

dannykopping force-pushed the dk/17045 branch from cb302b6 to cf6af33

April 11, 2025 14:20

dannykopping commented

coderd/coderd.go Outdated (resolved)

coderd/tailnet_test.go (resolved)

codersdk/database.go Outdated (resolved)

provisionerd/provisionerd.go Outdated

-		p.acquireAndRunOne(client)
+		err := p.acquireAndRunOne(client)
+		if err != nil && ctx.Err() == nil { // Only log if context is not done.
+			p.opts.Logger.Debug(ctx, "retrying to acquire job", slog.F("retry_in_ms", retrier.Delay.Milliseconds()), slog.Error(err))

dannykopping (Contributor, Author), Apr 11, 2025

Self-review: acquireAndRunOne already logs its own warning - specifically, the "provisionerd was unable to acquire job" one is logged when the db is unreachable - so Debug is what felt most appropriate to me.

Checking for, and specifically handling, database unreachability in tailnet control protocol dialer

0448a74

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping force-pushed the dk/17045 branch from cf6af33 to 0448a74

April 11, 2025 14:29

Add len checks for returned resources

0136b70

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping marked this pull request as ready for review

April 11, 2025 14:50

dannykopping requested a review from spikecurtis

April 11, 2025 14:50

johnstcn reviewed

tailnet/controllers.go Outdated (resolved)

coderd/tailnet.go Outdated (resolved)

Merge branch 'main' of github.com:/coder/coder into dk/17045

8d94c3c

dannykopping commented

provisionerd/provisionerd.go (resolved)

Reset retrier after each successful job acquisition

f92e852

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

spikecurtis reviewed

coderd/tailnet.go Outdated (resolved)

coderd/coderd.go Outdated (resolved)

coderd/tailnet_test.go Outdated (resolved)

coderd/workspaceagents_test.go Outdated

		// This needs to be done after the server "starts" otherwise it'll fail straight away when trying to initialize.
		pdb.MarkUnhealthy()

		// Then: the tailnet controller will continually try to dial the coordination endpoint, exceeding its context timeout.

spikecurtis (Contributor), Apr 14, 2025

This comment is wrong: we don't continually retry, because DialAgent only waits until we hit a dial error. Once the first error is returned, the test is complete and we tear down the context.

Furthermore, I don't think the SDK DialAgent is really the thing that you care about testing here. It doesn't handle the retries anyway; tailnet does. Maybe simplify this and just use the WebsocketDialer and ensure it returns an error.

provisionerd/provisionerd.go (resolved)

provisionerd/provisionerd.go Outdated (resolved)

dannykopping added 3 commits

April 14, 2025 09:35

Wrap received error into codersdk.ErrDatabaseNotReachable

4b0fd6a

This has the downside of losing the details of the received error, but in this case it seems justified since we need to make responses conditional on codersdk.ErrDatabaseNotReachable.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

Replace DatabaseHealthcheckFn with interface

68867da

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

Review suggestions

6f60cbc

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping requested review from johnstcn and spikecurtis

April 14, 2025 13:19

johnstcn reviewed

coderd/coderd.go (resolved)

johnstcn reviewed

coderd/tailnet.go (resolved)

spikecurtis approved these changes

coderd/tailnet_test.go (resolved)

johnstcn approved these changes

dannykopping merged commit 0b18e45 into main

32 checks passed

dannykopping deleted the dk/17045 branch

April 15, 2025 08:55

github-actions bot locked and limited conversation to collaborators