Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

perf: don't call GetUserByID unnecessarily for Agents metrics loops#19395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
cstyan merged 6 commits intomainfromcallum/reduce-agent-metric-db-calls
Aug 21, 2025

Conversation

cstyan
Copy link
Contributor

@cstyancstyan commentedAug 18, 2025
edited
Loading

At the moment, the loop which retrieves and updates the values of the agents metrics excessively callsGetUserByID (a DB query). First it retrieves a list of all workspaces, filtering out inactive agents (not entirely clear to me whether this is non-running workspaces, or just dead agents), and then iterates over those workspaces to get the rest of the relevant data for the metrics. The next call isGetUserByID forworkspace.OwnerID

This should at least partiallyresolvecoder/internal#726by caching seen Useruuid.UUID in a map for each iteration of the loop.

UPDATE: we now have a constraint for theusername field in theusers table which allows us to safely access the username field from the workspaces_expanded view. See#19453

UPDATE: It looks like, in theory, the calls here forGetUserByID should not even be necessary as we already have adatabase.Workspace object which also already has the owner ID and Username.

I left comments in both spots as to why the username should never be empty on the workspace again, but I'll reiterate here:
1. Theowner_id field on theworkspaces table is a FK reference to IDs in theusers table and has aNOT NULL constraint, so the owner fields of a workspace will always be populated
2. While the users table technically only has a constraint that the username has to beNOT NULL (meaning empty string is valid), at user creation time our httpmw package enforces non-empty usernames (for example it callscodersdk.NameValid which enforces that the name is at least 1 character and fits theUsernameValidRegex)
3. Theworkspaces_expanded view has an inner join onworkspaces.owner_id = visible_users.id, and if the owner is valid in theusers table (whichvisible_users is a view of) then they will have a username set

reduce calls to GetUserByIDSigned-off-by: Callum Styan <callumstyan@gmail.com>
Copy link
Member

@johnstcnjohnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Seeing as the only referenced field ofuser isUsername, why not useworkspace.OwnerUsername instead?

@cstyan
Copy link
ContributorAuthor

Seeing as the only referenced field ofuser isUsername, why not useworkspace.OwnerUsername instead?

Good point, I'll make that change so that we're only caching the username in the map 👍

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Comment on lines 337 to 348
ifusername=="" {
logger.Warn(ctx,"in prometheusmetrics.Agents the username on workspace was empty string, this should not be possible",
slog.F("workspace_id",workspace.ID),
slog.F("template_id",workspace.TemplateID))
// Fallback to GetUserByID if OwnerUsername is empty (e.g., in tests)
user,err:=db.GetUserByID(ctx,workspace.OwnerID)
iferr!=nil {
logger.Error(ctx,"can't get user from the database",slog.F("user_id",workspace.OwnerID),slog.Error(err))
agentsGauge.WithLabelValues(VectorOperationAdd,0,user.Username,workspace.Name,templateName,templateVersionName)
continue
}
username=user.Username
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As far as I can tell, this situation would only be possible if we completely messed up the view. In this case, many other things would be also broken. However, in this case we not only potentially do a whole bunch of user lookups, we also spam a bunch of log errors.

IMO this should just error loudly so it's very obvious.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As far as I can tell, this situation would only be possible if we completely messed up the view.

Yes it seems that way, though I'm not sure if there's some other edge case where this could happen, in which case dropping the fallback path of queryingGetUserByID would techncially be a breaking change.

Removing the logging (warning level for the "workspace did not have a username set") feels reasonable. I would lean towards keeping the fallback DB query unless we have no requirements around breaking changes for metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I would lean towards keeping the fallback DB query unless we have no requirements around breaking changes for metrics.

My point is, if we mess up the view in such a way thatOwnerUsername is empty, it's very likely that more things would be broken than metrics. Having a non-empty username is a pretty fundamental assumption baked into the codebase right now.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Right, but it could be possible thatOwnerUsername is empty for only a very small subset of workspaces.

Is your suggestion that we error updating the metric as a whole (for all found workspace agents) if we see any empty username, or just for those workspaces which have an empty username (and do not call the GetUserByID query as a fallback)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I see two possible scenarios here:

  1. Some users end up actually having empty usernames (which istheoretically possible based on the database schema). In this case, there would be no benefit from callingGetUserByID.

  2. Some error updating the view results in an empty string being returned forowner_username. As far as I can tell, this would break things like application routing, SSH access, and also be distinctly visible in the UI. In this case, we have three options:
    a) Continue submitting metrics with an emptyusername field,
    b) Fall back to theGetUserByID query,
    c) Skip over the user/agent.
    d) Refuse to generate potentially invalid metrics at all, error and exit early.

a) could actually signal the issue more quickly, but at the cost of messed up metrics.
b) would likely correct the issue, but we would suddenly be performing more database queries unexpectedly.
c) would also surface the error in a way
d) might actually be overkill now that I think about it.

Out of curiousity, I decided to see what would happen if a migration messed up the view and pushed#19426. The main takeaway I got from that is that a number of existing tests failed, but -- most notably --coderd/prometheusmetrics andcoderd/workspacestats didn't fail. If this is an area of concern, I'd suggest modifying the existing tests to guard for this so that we catch it.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Out of curiousity, I decided to see what would happen if a migration messed up the view and pushed#19426. The main takeaway I got from that is that a number of existing tests failed, but -- most notably -- coderd/prometheusmetrics and coderd/workspacestats didn't fail. If this is an area of concern, I'd suggest modifying the existing tests to guard for this so that we catch it.

IIUC that's because we would still have theGetUserByID call, so the view doesn't matter, we're taking the workspace owner ID and looking up that user.

I'm not sure I understand your point about Bwould likely correct the issue, but we would suddenly be performing more database queries unexpectedly since we're currently making a DB call for every active workspace.

Unless we want to introduce a potentially breaking change I think B is our only option. Otherwise we can remove the fallback query and go with option C, then workspaces with correct usernames set would still emit agent related metrics.

@cstyancstyan changed the titleperf: cache the seen user IDs in each iteration of the Agents metric loopperf: don't call GetUserByID unnecessarily for Agents metrics loopsAug 19, 2025
continue
username:=workspace.OwnerUsername
// This should never be possible, but we'll guard against it just in case.
// 1. The owner_id field on the workspaces table is a reference to IDs in the users table and has a `NOT NULL` constraint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we not just add a foreign key constraint on this field? Seems like an odd omission.
Along with that, a constraint to enforce the minimum length of the username would add a lot more integrity.

This would likely render the rest of this discussion moot as this situation would not be possible.
I do appreciate the defensiveness of this code though, thanks@cstyan 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That's a good observation@dannykopping - I might suggest a separate PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Good idea 👍 that PR should block this one.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not sure I understand the FK constraint suggestion, what would that look like or do for us? The query here just uses a viewworkspaces_expanded, and theownerID on theworkspaces table is already a FK tousers. So I think it's just the constraint to enforce a minimum username length that we need to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh whoops, I didn't see the FK at the bottom of the file:https://github.com/coder/coder/blob/main/coderd/database/dump.sql#L3325-L3326

cstyan added a commit that referenced this pull requestAug 21, 2025
Username length and format, via regex, are already enforced at theapplication layer, but we have some code paths with database querieswhere we could optimize away many of the DB query calls if we could besure at the database level that the username is never an empty string.For example:#19395---------Signed-off-by: Callum Styan <callumstyan@gmail.com>
constraint to users tableSigned-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyancstyan merged commit014a2d5 intomainAug 21, 2025
44 of 47 checks passed
@cstyancstyan deleted the callum/reduce-agent-metric-db-calls branchAugust 21, 2025 18:01
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsAug 21, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@dannykoppingdannykoppingdannykopping left review comments

@johnstcnjohnstcnjohnstcn approved these changes

Assignees

@cstyancstyan

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

bug: GetUserByID is called millions of times per day on dogfood
3 participants
@cstyan@dannykopping@johnstcn

[8]ページ先頭

©2009-2025 Movatter.jp