feat(coderd): add provisioner_daemons to /debug/health endpoint#11393

Merged

johnstcn merged 18 commits into main from cj/provisionerd-healthcheck

Jan 8, 2024

Member

johnstcn commented Jan 3, 2024 (edited)

Adds a healthcheck for provisioner daemons to /debug/health endpoint.

Part of #10676

johnstcn self-assigned this

Jan 3, 2024

johnstcn force-pushed the cj/provisionerd-healthcheck branch from 1b2bc9a to 53ed901

January 4, 2024 16:33

johnstcn changed the base branch from main to cj/util-apiversion

January 4, 2024 16:34

Base automatically changed from cj/util-apiversion to main

January 5, 2024 10:22

johnstcn force-pushed the cj/provisionerd-healthcheck branch from 53ed901 to b83013b

January 5, 2024 12:19

johnstcn changed the title ~~WIP: add version healthcheck for provisioner daemons~~ feat(coderd): add provisioner_daemons to /debug/health endpoint

Jan 5, 2024

johnstcn marked this pull request as ready for review

January 5, 2024 12:31

johnstcn requested review from mafredri, mtojek and spikecurtis

January 5, 2024 12:31

spikecurtis reviewed

Jan 5, 2024

helm/provisioner/charts/libcoder-0.1.0.tgz

Contributor

Do you understand why this keeps getting marked as changed?

Member, Author

I think it's the timestamp on the files changing; I haven't yet figured out a way to ignore this.

coderd/healthcheck/provisioner.go (outdated, resolved)

mafredri reviewed

Jan 5, 2024

coderd/healthcheck/healthcheck.go (outdated, resolved)

coderd/healthcheck/provisioner.go (outdated, resolved)

coderd/coderd.go (outdated, resolved)

coderd/healthcheck/provisioner.go

			continue
		}
		// Daemon has gone away, skip.
		if now.Sub(daemon.LastSeenAt.Time) > (opts.StaleInterval) {

Member

Thinking out loud, is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Also, don't we want to apply the same rules to r.ProvisionerDaemons?

Member, Author

> is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Yes, and this is by design. If a provisioner daemon connects at some point, has a transient error, and then never heartbeats, I don't think it makes sense to warn about this.

Member

That's fair. I was entertaining the possibility of some error state resulting in the provisioner exiting, crashing, or simply ceasing to communicate with the server. But I didn't have anything concrete in mind, so this is fine.