feat: add connection statistics for workspace agents #6469


Merged
kylecarbs merged 29 commits into main from exportstats on Mar 9, 2023

Conversation

@kylecarbs (Member) commented Mar 7, 2023 (edited)

[Screenshot: the new deployment stats bar at the bottom of the dashboard]

This adds a bar at the bottom of the dashboard (only visible to admins) with periodically updating statistics on workspaces.

After merge, I'll add warnings for the DERPForcedWebsocket property to indicate reduced throughput.

@kylecarbs self-assigned this Mar 7, 2023
@kylecarbs requested a review from mafredri, March 7, 2023 16:09
@mafredri (Member) left a comment:

Took a preliminary look at this, I'll finish my review tomorrow.

select {
case a.connStatsChan <- stats:
	// Only store the latest stat when it's successfully sent!
	// Otherwise, it should be sent again on the next iteration.
	a.latestStat.Store(stats)
mafredri (Member):

Considering the previous comment about Tailscale resetting counts on every report, I'd think this current implementation will lose stats?

I imagine a more safe way to update the stats would be something along the lines of:

a.statMu.Lock()
a.stats.RxBytes += ...
select {
case a.connStatsChan <- a.stats: // note: a copy assuming basic struct
	// sent
default:
	// dropped this report
}
a.statMu.Unlock()

This way we're always incrementing the numbers, even from dropped reports.

kylecarbs (Member, Author):

Ahh, good point!

kylecarbs (Member, Author):

I wonder if we'd be better off blocking instead of dropping? It seems like that's fine from the Tailscale side, and then we don't really run any risks here.

kylecarbs (Member, Author):

I changed it to block instead. Let me know your thoughts! The loop retries anyways.
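Concretely, the blocking variant just replaces the default: drop with a plain send. A minimal sketch (the ctx case is an assumption so the agent can still shut down cleanly; it isn't taken from the diff):

select {
case a.connStatsChan <- stats:
	// Blocks until the reporting loop receives the stats, so nothing is
	// silently dropped; the caller simply waits for the next interval.
case <-ctx.Done():
	// Assumed escape hatch: stop waiting if the agent is shutting down.
}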

@@ -405,15 +408,16 @@ func New(options *Options) *API {
	r.Post("/csp/reports", api.logReportCSPViolations)

	r.Get("/buildinfo", buildInfo)
	r.Route("/deployment", func(r chi.Router) {
		r.Use(apiKeyMiddleware)
		r.Get("/config", api.deploymentConfig)
mafredri (Member):

Breaking change? /config/deployment => /deployment/config.

kylecarbs (Member, Author):

Since this endpoint is only relied on in the dashboard, I wouldn't consider this breaking, but if you think it is, I'm fine with that!

mafredri (Member):

No strong preference, just being cautious. 😄

@ammario removed their request for review, March 7, 2023 17:41
@mafredri (Member) left a comment:

I didn't look very closely at the frontend, but the backend looks mostly good. I still have some concerns about the agent stats reporting (see comments), but that's either a gap in my understanding or something we should fix; approving nonetheless.

@@ -406,7 +406,7 @@ func newConfig() *codersdk.DeploymentConfig {
	Usage:  "How frequently agent stats are recorded",
	Flag:   "agent-stats-refresh-interval",
	Hidden: true,
-	Default: 10 * time.Minute,
+	Default: 30 * time.Second,
mafredri (Member):

This is a pretty big change, I think it's OK but increases spamminess somewhat.

kylecarbs (Member, Author):

It definitely is, but I did some napkin math and I think it should be alright.

Even if a user has hundreds of workspaces, a few hundred more writes/minute shouldn't be a big deal. I suppose it might spam the logs, which I'll check and resolve before merge.
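For rough, illustrative numbers: at a 30-second interval each agent reports twice a minute, so 500 running workspaces produce about 1,000 stat rows per minute (roughly 17 inserts per second), compared with about 50 rows per minute at the old 10-minute interval.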

kylecarbs (Member, Author):

I'd hate for all of coderd to be spammed with stat logs in a large deployment ;p

@@ -1270,10 +1267,16 @@ func (a *agent) startReportingConnectionStats(ctx context.Context) {
	// Convert from microseconds to milliseconds.
	stats.ConnectionMedianLatencyMS /= 1000

	lastStat := a.latestStat.Load()
	if lastStat != nil && reflect.DeepEqual(lastStat, stats) {
mafredri (Member):

I guess this still confuses me a bit. If Tailscale stats aren't cumulative, isn't the only way this matches lastStat if there was no chatter (tx/rx) and the latency and session counts for SSH/Code/JetBrains stayed the same?

Since we're also doing network pings in the latency check, I think there is a non-zero chance for multiple reportStats to be running concurrently, essentially competing over Load()/Store() here?

kylecarbs (Member, Author):

Hmm, good points. I'll refactor this.

kylecarbs (Member, Author):

After looking at this again, it seems like this should be fine.

This will only match if there's no traffic, but that's arguably great because then we aren't spamming the database with nonsense. I don't want to do this in coderd, because we'd need to query for the last stat to ensure it's not the same.

reportStats is blocking, and so subsequent agent stat refreshes will wait before running again, so I don't think they'd compete.

Let me know if I'm overlooking something or didn't understand properly, I'm sick and my brain is stuffy right now ;p
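To spell out the intended flow, here's a condensed sketch (collectStats, reportStats, and the ticker wiring are illustrative stand-ins, not the exact code):

ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
	stats := a.collectStats(ctx) // illustrative: gathers tx/rx bytes, session counts, median latency
	lastStat := a.latestStat.Load()
	if lastStat == nil || !reflect.DeepEqual(lastStat, stats) {
		a.latestStat.Store(stats)
		reportStats(stats) // blocking, so the next refresh waits until this returns
	}
	select {
	case <-ticker.C:
	case <-ctx.Done():
		return
	}
}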

@@ -0,0 +1 @@
ALTER TABLE workspace_agent_stats ALTER COLUMN connection_median_latency_ms TYPE bigint;
mafredri (Member):

Will this work with non-empty data? You could consider adding two fixtures in testdata/fixtures (000106_pre_workspace_agent_stats_connection_latency.up.sql and 000107_post_workspace_agent_stats_connection_latency.up.sql). In the former you add a row with a bigint value and in the latter you add a row with a float value. If tests pass then all is good. 👍🏻

type DeploymentStats struct {
	// AggregatedFrom is the time from which stats are aggregated.
	// This might be back in time a specific duration or interval.
	AggregatedFrom time.Time `json:"aggregated_since" format:"date-time"`
mafredri (Member):

Any reason to keep json/field out of sync?

kylecarbs (Member, Author):

Nope, just a mistake on my end. Good catch!
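For the record, the fix is presumably just aligning the two, one way or the other (sketch, not the final code):

AggregatedFrom time.Time `json:"aggregated_from" format:"date-time"`   // rename the tag, or
AggregatedSince time.Time `json:"aggregated_since" format:"date-time"` // rename the field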

	BuildingWorkspaces int64 `json:"building_workspaces"`
	RunningWorkspaces  int64 `json:"running_workspaces"`
	FailedWorkspaces   int64 `json:"failed_workspaces"`
	StoppedWorkspaces  int64 `json:"stopped_workspaces"`
mafredri (Member):

Could utilize nesting, e.g. workspaces.pending, session_count.vscode, etc. Matter of preference, so dealer's choice.

kylecarbs (Member, Author):

Agreed that's a bit easier to parse!

	SessionCountReconnectingPTY int64 `json:"session_count_reconnecting_pty"`

	WorkspaceRxBytes int64 `json:"workspace_rx_bytes"`
	WorkspaceTxBytes int64 `json:"workspace_tx_bytes"`
mafredri (Member):

These could be under workspaces.rx_bytes. Singular vs. plural (workspace, workspaces) above is a bit confusing currently.
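For illustration, the nested shape could look roughly like this (field and tag names are illustrative, derived from the flat struct above, not the final API):

type DeploymentStats struct {
	AggregatedFrom time.Time `json:"aggregated_from" format:"date-time"`

	Workspaces struct {
		Pending  int64 `json:"pending"`
		Building int64 `json:"building"`
		Running  int64 `json:"running"`
		Failed   int64 `json:"failed"`
		Stopped  int64 `json:"stopped"`
		RxBytes  int64 `json:"rx_bytes"`
		TxBytes  int64 `json:"tx_bytes"`
	} `json:"workspaces"`

	SessionCount struct {
		VSCode          int64 `json:"vscode"`
		SSH             int64 `json:"ssh"`
		JetBrains       int64 `json:"jetbrains"`
		ReconnectingPTY int64 `json:"reconnecting_pty"`
	} `json:"session_count"`
}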

@kylecarbs merged commit 5304b4e into main, Mar 9, 2023
@kylecarbs deleted the exportstats branch, March 9, 2023 03:05
@github-actions bot locked and limited conversation to collaborators, Mar 9, 2023
Reviewers

@mafredri approved these changes
@bpmct awaiting requested review

Assignees

@kylecarbs

2 participants: @kylecarbs, @mafredri
