feat: add connection statistics for workspace agents #6469


Merged
kylecarbs merged 29 commits into main from exportstats on Mar 9, 2023

Conversation

@kylecarbs (Member) commented Mar 7, 2023 (edited)

[Screenshot: the new deployment stats bar at the bottom of the dashboard]

This adds a bar at the bottom of the dashboard (only visible to admins) with periodically updating statistics on workspaces.

After merge, I'll add warnings for the DERPForcedWebsocket property to indicate reduced throughput.

@kylecarbs self-assigned this Mar 7, 2023
@kylecarbs requested a review from mafredri, March 7, 2023 16:09
@mafredri (Member) left a comment:

Took a preliminary look at this, I'll finish my review tomorrow.

select {
case a.connStatsChan <- stats:
	// Only store the latest stat when it's successfully sent!
	// Otherwise, it should be sent again on the next iteration.
	a.latestStat.Store(stats)
mafredri (Member):

Considering the previous comment about Tailscale resetting counts on every report, I'd think this current implementation will lose stats?

I imagine a more safe way to update the stats would be something along the lines of:

a.statMu.Lock()
a.stats.RxBytes += ...
select {
case a.connStatsChan <- a.stats: // note: a copy assuming basic struct
	// sent
default:
	// dropped this report
}
a.statMu.Unlock()

This way we're always incrementing the numbers, even from dropped reports.

kylecarbs (Member, Author):

Ahh, good point!

kylecarbs (Member, Author):

I wonder if we'd be better off blocking instead of dropping? It seems like that's fine from the Tailscale side, and then we don't really run any risks here.

kylecarbs (Member, Author):

I changed it to block instead. Let me know your thoughts! The loop retries anyways.
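Concretely, the blocking variant just replaces the default: drop with a plain send. A minimal sketch (the ctx case is an assumption so the agent can still shut down cleanly; it isn't taken from the diff):

select {
case a.connStatsChan <- stats:
	// Blocks until the reporting loop receives the stats, so nothing is
	// silently dropped; the caller simply waits for the next interval.
case <-ctx.Done():
	// Assumed escape hatch: stop waiting if the agent is shutting down.
}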

@@ -405,15 +408,16 @@ func New(options *Options) *API {
	r.Post("/csp/reports", api.logReportCSPViolations)

	r.Get("/buildinfo", buildInfo)
	r.Route("/deployment", func(r chi.Router) {
		r.Use(apiKeyMiddleware)
		r.Get("/config", api.deploymentConfig)
mafredri (Member):

Breaking change? /config/deployment => /deployment/config.

kylecarbs (Member, Author):

Since this endpoint is only relied on in the dashboard, I wouldn't consider this breaking, but if you think it is, I'm fine with that!

mafredri (Member):

No strong preference, just being cautious. 😄

@ammario removed their request for review, March 7, 2023 17:41
@mafredri (Member) left a comment:

I didn't look very closely at the frontend, but the backend looks mostly good. I still have some concerns about the agent stats reporting (see comments), but that's either a gap in my understanding or something we should fix; approving nonetheless.

@@ -406,7 +406,7 @@ func newConfig() *codersdk.DeploymentConfig {
	Usage:  "How frequently agent stats are recorded",
	Flag:   "agent-stats-refresh-interval",
	Hidden: true,
-	Default: 10 * time.Minute,
+	Default: 30 * time.Second,
mafredri (Member):

This is a pretty big change, I think it's OK but increases spamminess somewhat.

kylecarbs (Member, Author):

It definitely is, but I did some napkin math and I think it should be alright.

Even if a user has hundreds of workspaces, a few hundred more writes/minute shouldn't be a big deal. I suppose it might spam the logs, which I'll check and resolve before merge.
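For rough, illustrative numbers: at a 30-second interval each agent reports twice a minute, so 500 running workspaces produce about 1,000 stat rows per minute (roughly 17 inserts per second), compared with about 50 rows per minute at the old 10-minute interval.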

kylecarbs (Member, Author):

I'd hate for all of coderd to be spammed with stat logs in a large deployment ;p

@@ -1270,10 +1267,16 @@ func (a *agent) startReportingConnectionStats(ctx context.Context) {
	// Convert from microseconds to milliseconds.
	stats.ConnectionMedianLatencyMS /= 1000

	lastStat := a.latestStat.Load()
	if lastStat != nil && reflect.DeepEqual(lastStat, stats) {
mafredri (Member):

I guess this still confuses me a bit. If Tailscale stats aren't cumulative, isn't the only way this matches lastStat if there was no chatter (tx/rx) and the latency and session counts for SSH/Code/JetBrains stayed the same?

Since we're also doing network pings in the latency check, I think there is a non-zero chance for multiple reportStats to be running concurrently, essentially competing over Load()/Store() here?

kylecarbs (Member, Author):

Hmm, good points. I'll refactor this.

kylecarbs (Member, Author):

After looking at this again, it seems like this should be fine.

This will only match if there's no traffic, but that's arguably great because then we aren't spamming the database with nonsense. I don't want to do this in coderd, because we'd need to query for the last stat to ensure it's not the same.

reportStats is blocking, and so subsequent agent stat refreshes will wait before running again, so I don't think they'd compete.

Let me know if I'm overlooking something or didn't understand properly, I'm sick and my brain is stuffy right now ;p
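To spell out the intended flow, here's a condensed sketch (collectStats, reportStats, and the ticker wiring are illustrative stand-ins, not the exact code):

ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
	stats := a.collectStats(ctx) // illustrative: gathers tx/rx bytes, session counts, median latency
	lastStat := a.latestStat.Load()
	if lastStat == nil || !reflect.DeepEqual(lastStat, stats) {
		a.latestStat.Store(stats)
		reportStats(stats) // blocking, so the next refresh waits until this returns
	}
	select {
	case <-ticker.C:
	case <-ctx.Done():
		return
	}
}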

@@ -0,0 +1 @@
ALTER TABLE workspace_agent_stats ALTER COLUMN connection_median_latency_ms TYPE bigint;
mafredri (Member):

Will this work with non-empty data? You could consider adding two fixtures in testdata/fixtures (000106_pre_workspace_agent_stats_connection_latency.up.sql and 000107_post_workspace_agent_stats_connection_latency.up.sql). In the former you add a row with a bigint value and in the latter you add a row with a float value. If tests pass then all is good. 👍🏻

type DeploymentStats struct {
	// AggregatedFrom is the time from which stats are aggregated.
	// This might be back in time a specific duration or interval.
	AggregatedFrom time.Time `json:"aggregated_since" format:"date-time"`
mafredri (Member):

Any reason to keep json/field out of sync?

kylecarbs (Member, Author):

Nope, just a mistake on my end. Good catch!
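For the record, the fix is presumably just aligning the two, one way or the other (sketch, not the final code):

AggregatedFrom time.Time `json:"aggregated_from" format:"date-time"`   // rename the tag, or
AggregatedSince time.Time `json:"aggregated_since" format:"date-time"` // rename the field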

	BuildingWorkspaces int64 `json:"building_workspaces"`
	RunningWorkspaces  int64 `json:"running_workspaces"`
	FailedWorkspaces   int64 `json:"failed_workspaces"`
	StoppedWorkspaces  int64 `json:"stopped_workspaces"`
mafredri (Member):

Could utilize nesting, e.g. workspaces.pending, session_count.vscode, etc. Matter of preference, so dealer's choice.

kylecarbs (Member, Author):

Agreed that's a bit easier to parse!

	SessionCountReconnectingPTY int64 `json:"session_count_reconnecting_pty"`

	WorkspaceRxBytes int64 `json:"workspace_rx_bytes"`
	WorkspaceTxBytes int64 `json:"workspace_tx_bytes"`
mafredri (Member):

These could be under workspaces.rx_bytes. Singular vs. plural (workspace, workspaces) above is a bit confusing currently.
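For illustration, the nested shape could look roughly like this (field and tag names are illustrative, derived from the flat struct above, not the final API):

type DeploymentStats struct {
	AggregatedFrom time.Time `json:"aggregated_from" format:"date-time"`

	Workspaces struct {
		Pending  int64 `json:"pending"`
		Building int64 `json:"building"`
		Running  int64 `json:"running"`
		Failed   int64 `json:"failed"`
		Stopped  int64 `json:"stopped"`
		RxBytes  int64 `json:"rx_bytes"`
		TxBytes  int64 `json:"tx_bytes"`
	} `json:"workspaces"`

	SessionCount struct {
		VSCode          int64 `json:"vscode"`
		SSH             int64 `json:"ssh"`
		JetBrains       int64 `json:"jetbrains"`
		ReconnectingPTY int64 `json:"reconnecting_pty"`
	} `json:"session_count"`
}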

@kylecarbs merged commit 5304b4e into main, Mar 9, 2023
@kylecarbs deleted the exportstats branch, March 9, 2023 03:05
@github-actions bot locked and limited conversation to collaborators, Mar 9, 2023
Reviewers

@mafredri approved these changes
@bpmct awaiting requested review

Assignees

@kylecarbs

2 participants: @kylecarbs, @mafredri
