Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix(coderd): workspaceapps: update last_used_at when workspace app reports stats#11603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
johnstcn merged 11 commits intomainfromcj/update-workspace-lastusedat-proxy
Jan 16, 2024

Conversation

johnstcn
Copy link
Member

@johnstcnjohnstcn commentedJan 12, 2024
edited
Loading

Fixes#11509

  • Adds a new queryBatchUpdateLastUsedAt
  • Adds calls toBatchUpdateLastUsedAt in app stats handler upon flush
  • Passes a stats flush chan to apptest setup scaffolding

This is not the 'correct' solution, but it puts a decent enough bandage on the problem until we figure out a more unified solution to this issue of updating the "Last Used At" field of workspaces based on certain events. I'll be opening a follow-up PR for this.

Now when the workspace apps stats collector flushes a batch of stats, we bumpLastUsedAt for all affected workspaces.

Note: I'm just updatingLastUsedAt to the same value for all workspaces. I don't know if there's a good way to insert a whole bunch of distinct values for a number of rows in a single transaction, and I want to keep this to a single query if possible.

I've verified that this works experimentally but this is the sort of thing that should be ossified in tests. I'm currently putting this intoTestWorkspaceApps as it seems to be the place where all the testing of proxying flows happens.

Also note: the tests are sadly a bit racy; I've done what I can for the moment but I may need to spend some follow-up time refactoring.

@johnstcnjohnstcn self-assigned thisJan 12, 2024
@johnstcnjohnstcnforce-pushed thecj/update-workspace-lastusedat-proxy branch 2 times, most recently from2d44bf4 to5b799e8CompareJanuary 15, 2024 17:16
@johnstcnjohnstcn changed the title[WIP] fix(coderd): workspaceapps: update last_used_at when workspace app reports statsfix(coderd): workspaceapps: update last_used_at when workspace app reports statsJan 15, 2024
@johnstcnjohnstcn marked this pull request as ready for reviewJanuary 15, 2024 17:20
Comment on lines +205 to +206
// TODO(cian): A blocked request should not count as workspace usage.
// assertWorkspaceLastUsedAtNotUpdated(t, appDetails.AppClient(t), appDetails)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: leaving this here to address in a follow-up. As this is now tied to stats collection, we bump whenever there are app usage stats for a workspace.

mafredri reacted with thumbs up emoji
Comment on lines +221 to +222
// TODO(cian): The initial redirect should not count as workspace usage.
// assertWorkspaceLastUsedAtNotUpdated(t, appDetails.AppClient(t), appDetails)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: follow-up

Comment on lines +240 to +241
// TODO(cian): The initial redirect should not count as workspace usage.
// assertWorkspaceLastUsedAtNotUpdated(t, appDetails.AppClient(t), appDetails)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: follow-up

Comment on lines +349 to +350
// TODO(cian): A blocked request should not count as workspace usage.
// assertWorkspaceLastUsedAtNotUpdated(t, appDetails.AppClient(t), appDetails)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: follow-up

// TODO(@deansheather): This should be 400. There's a todo in the
// resolve request code to fix this.
require.Equal(t,http.StatusInternalServerError,resp.StatusCode)
assertWorkspaceLastUsedAtUpdated(t,appDetails)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: I'm counting this as a successful and valid attempt to access a workspace.

mafredri reacted with thumbs up emoji
Comment on lines 1568 to 1591
// Accessing an app should update the workspace's LastUsedAt.
// NOTE: Despite our efforts with the flush channel, this is inherently racy.
funcassertWorkspaceLastUsedAtUpdated(t testing.TB,details*Details) {
t.Helper()

<-time.After(testutil.IntervalMedium)// Wait for Bicopy to finish.
details.FlushStats()
// Wait for stats to fully flush.
require.Eventually(t,func()bool {
ws,err:=details.SDKClient.Workspace(context.Background(),details.Workspace.ID)
assert.NoError(t,err)
returnws.LastUsedAt.After(details.Workspace.LastUsedAt)
},testutil.WaitShort,testutil.IntervalMedium,"workspace LastUsedAt not updated when it should have been")
}

// Except when it sometimes shouldn't (e.g. no access)
// NOTE: Despite our efforts with the flush channel, this is inherently racy.
funcassertWorkspaceLastUsedAtNotUpdated(t testing.TB,details*Details) {
t.Helper()

<-time.After(testutil.IntervalMedium)// Wait for Bicopy to finish.
details.FlushStats()
<-time.After(testutil.IntervalMedium)// Wait for stats to fully flush.
ws,err:=details.SDKClient.Workspace(context.Background(),details.Workspace.ID)
require.NoError(t,err)
require.Equal(t,ws.LastUsedAt,details.Workspace.LastUsedAt,"workspace LastUsedAt updated when it should not have been")
}
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: this is where all the race gremlins live:

  • Bicopy does not appear to exit immediately so if we flush stats immediately we may not get a finished session.
  • Even though a completed flushshould mean that LastUsedAt has been bumped, I've observed some wiggle room (especially when wsproxy is involved).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why don't we try looping the flush until we either time out or have the stats we're looking for? Might make sense to haveFlushStats take a context too, just to be sure.

johnstcn reacted with thumbs up emoji
Comment on lines 445 to 450
proxyFlushCh:=make(chanchan<-struct{})
flushStats:=func() {
proxyFlushDone:=make(chanstruct{})
proxyFlushCh<-proxyFlushDone
<-proxyFlushDone
}
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

review: there should be no need to flush the stats on the primary here, as the proxy sends its stats directly to the primary endpoint which is wired up directly to the Reporter, which just inserts immediately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Correct me if I'm wrong, but without a flush, we can't ensure thatReport() is being called (i.e. flushing the in-memory stats to coderd via post), right? So I think it may be needed here too, depending on what we're looking to achieve.

Copy link
MemberAuthor

@johnstcnjohnstcnJan 16, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From my reading, any POSTS to/app-stats go straight toapi.WorkspaceAppsStatsCollectorOptions.Reporter, and the wsproxy app stats reporter just hits that endpoint on flush. The stats reporter for the primary goes straight to the DB as well as far as I can tell.

Copy link
Member

@mtojekmtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I left few nit-picks, and I'm looking forward to follow-ups!

"cdr.dev/slog"
"nhooyr.io/websocket"

"cdr.dev/slog"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: is it expected or is the linter impaired?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Might be my linter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is fine, it's how these are usually grouped (coder/coder, cdr.dev, etc grouped together, separate from stdlib and 3rd party).

// temporary map to avoid O(q.workspaces*arg.workspaceIds)
m:=make(map[uuid.UUID]struct{})
for_,id:=rangearg.IDs {
m[id]=struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This could be before the mutex lock, but it's such a minor perf change not sure it's worth changing.

n++
}
ifn==0 {
returnsql.ErrNoRows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No need for err no rows here, an update never returns this unlessRETURNING * is used.

"cdr.dev/slog"
"nhooyr.io/websocket"

"cdr.dev/slog"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is fine, it's how these are usually grouped (coder/coder, cdr.dev, etc grouped together, separate from stdlib and 3rd party).

}

iferr:=tx.InsertWorkspaceAppStats(ctx,batch);err!=nil {
returnerr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

One could make a case for us trying to do the last used update even if this fails, wdyt?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Inserting app stats is just doing a big insert into a single table (which IIRC is unlogged), so if we run into issues doing that my gut tells me that any further database queries might not be successful. Might not hurt to try, but I'm not sure how much the extra complexity would be worth it. Bear in mind that we will end up just trying to do this again in another 30 seconds!

mafredri reacted with thumbs up emoji
uniqueIDs:=slice.Unique(batch.WorkspaceID)
iferr:=tx.BatchUpdateWorkspaceLastUsedAt(ctx, database.BatchUpdateWorkspaceLastUsedAtParams{
IDs:uniqueIDs,
LastUsedAt:dbtime.Now(),// This isn't 100% accurate, but it's good enough.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

IMO, it's close enough. We could ensure we use max time from the stats if we want slightly more accuracy (still off for some workspaces, though).

Comment on lines 445 to 450
proxyFlushCh:=make(chanchan<-struct{})
flushStats:=func() {
proxyFlushDone:=make(chanstruct{})
proxyFlushCh<-proxyFlushDone
<-proxyFlushDone
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Correct me if I'm wrong, but without a flush, we can't ensure thatReport() is being called (i.e. flushing the in-memory stats to coderd via post), right? So I think it may be needed here too, depending on what we're looking to achieve.

"*",
}

proxyFlushCh:=make(chanchan<-struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Naming these stats specific might keep things more clear.

Comment on lines 1568 to 1591
// Accessing an app should update the workspace's LastUsedAt.
// NOTE: Despite our efforts with the flush channel, this is inherently racy.
funcassertWorkspaceLastUsedAtUpdated(t testing.TB,details*Details) {
t.Helper()

<-time.After(testutil.IntervalMedium)// Wait for Bicopy to finish.
details.FlushStats()
// Wait for stats to fully flush.
require.Eventually(t,func()bool {
ws,err:=details.SDKClient.Workspace(context.Background(),details.Workspace.ID)
assert.NoError(t,err)
returnws.LastUsedAt.After(details.Workspace.LastUsedAt)
},testutil.WaitShort,testutil.IntervalMedium,"workspace LastUsedAt not updated when it should have been")
}

// Except when it sometimes shouldn't (e.g. no access)
// NOTE: Despite our efforts with the flush channel, this is inherently racy.
funcassertWorkspaceLastUsedAtNotUpdated(t testing.TB,details*Details) {
t.Helper()

<-time.After(testutil.IntervalMedium)// Wait for Bicopy to finish.
details.FlushStats()
<-time.After(testutil.IntervalMedium)// Wait for stats to fully flush.
ws,err:=details.SDKClient.Workspace(context.Background(),details.Workspace.ID)
require.NoError(t,err)
require.Equal(t,ws.LastUsedAt,details.Workspace.LastUsedAt,"workspace LastUsedAt updated when it should not have been")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why don't we try looping the flush until we either time out or have the stats we're looking for? Might make sense to haveFlushStats take a context too, just to be sure.

johnstcn reacted with thumbs up emoji
t.Helper()

<-time.After(testutil.IntervalMedium)// Wait for Bicopy to finish.
details.FlushStats()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same here, looping flush seems better than waiting an arbitrary time?

johnstcn reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A loop-flush here would probably just execute after the first iteration.

@johnstcn
Copy link
MemberAuthor

Filed#11643 for following up on unexpected stats inserts.

@johnstcnjohnstcn merged commitd583aca intomainJan 16, 2024
@johnstcnjohnstcn deleted the cj/update-workspace-lastusedat-proxy branchJanuary 16, 2024 14:06
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsJan 16, 2024
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@mafredrimafredrimafredri left review comments

@mtojekmtojekmtojek approved these changes

@spikecurtisspikecurtisAwaiting requested review from spikecurtis

Assignees

@johnstcnjohnstcn

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

bug: workspace.last_used_at not updated when accessing coder_app

3 participants

@johnstcn@mafredri@mtojek

[8]ページ先頭

©2009-2025 Movatter.jp