- Notifications
You must be signed in to change notification settings - Fork1.1k
fix: introduce dedicated queries for workspaces and workspace agents metrics#19786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
2974bfc toe4265acComparecoderd/database/dbauthz/dbauthz.go Outdated
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { | ||
| returnnil,err | ||
| } | ||
| returnq.db.GetWorkspacesForAgentMetrics(ctx,deleted) | ||
| } | ||
| func (q*querier)GetWorkspacesForWorkspaceMetrics(ctx context.Context,deletedbool) ([]database.GetWorkspacesForWorkspaceMetricsRow,error) { | ||
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wasn't sure ifResourceSystem was appropriate here, just following the pattern of other queries in the general vicinity of these ones. I don't know if we really want to go about adding a new resource type for all cases, but here we could probably just reuseResource.Workspace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I really want us to avoid proliferating the use ofResourceSystem --- it's a kitchen sink of different objects right now, and they have very different security properties. We need to go back and break it up at some point, but in the mean time we need to not make the problem worse.
UsingResourceWorkspace would be fine here, IMO, as long as the check is whether the Subject can readall workspaces on the site (since that's what this query does), not just whether they can read any workspace.
coderd/database/dbauthz/dbauthz.go Outdated
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { | ||
| returnnil,err | ||
| } | ||
| returnq.db.GetWorkspacesForAgentMetrics(ctx,deleted) | ||
| } | ||
| func (q*querier)GetWorkspacesForWorkspaceMetrics(ctx context.Context,deletedbool) ([]database.GetWorkspacesForWorkspaceMetricsRow,error) { | ||
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I really want us to avoid proliferating the use ofResourceSystem --- it's a kitchen sink of different objects right now, and they have very different security properties. We need to go back and break it up at some point, but in the mean time we need to not make the problem worse.
UsingResourceWorkspace would be fine here, IMO, as long as the check is whether the Subject can readall workspaces on the site (since that's what this query does), not just whether they can read any workspace.
| database.GetWorkspaceAgentsByWorkspaceAndBuildNumberParams{ | ||
| WorkspaceID:workspace.ID, | ||
| BuildNumber:workspace.BuildNumber, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This construction appears to be doing a "join" but as a Go loop with multiple queries. Does it make sense to combine this with thedb.GetWorkspacesForAgentMetrics(ctx, false) call into a single query that just returns all the agents directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I was just focused on splitting out the queries so that they would be more easily identifiable/separate from other instances ofGetWorkspaces, but this change makes sense 👍 made it in some follow up commits.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
fix testsSigned-off-by: Callum Styan <callumstyan@gmail.com>
299a3b3 toc34e3f7CompareSigned-off-by: Callum Styan <callumstyan@gmail.com>
c34e3f7 to63808d0Comparegithub-actionsbot commentedOct 16, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
All contributors have signed the CLA ✍️ ✅ |
Signed-off-by: Callum Styan <callumstyan@gmail.com>
aec1146 toc5d04b5CompareSigned-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
@spikecurtis thanks for the review. Reopened and updated, I was away for a few weeks so this got hit by the stale bot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Some inline comments about eliding some arguments and returned fields, but I don't need to review again.
| JOIN workspace_builds wbONw.id=wb.workspace_id | ||
| JOIN provisioner_jobs pjONwb.job_id=pj.id | ||
| LEFT JOIN template_versions tvONwb.template_version_id=tv.id | ||
| WHEREw.deleted= @deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
do we ever call this with deleted = true? Why would we? Maybe it would be better to eliminate this parameter as it might be confusing anyway --- if you set it to true youonly get the deleted workspaces, but it would be easy to think you'd get all workspaces.
| u.usernameas owner_username, | ||
| t.nameas template_name, | ||
| tv.nameas template_version_name, | ||
| wb.build_number, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't think we care about this; we don't use it anywhere in the metrics.
| LEFT JOIN template_versions tvONwb.template_version_id=tv.id | ||
| JOIN workspace_resources wrONwb.job_id=wr.job_id | ||
| JOIN workspace_agentsONwr.id=workspace_agents.resource_id | ||
| WHEREw.deleted= @deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Again, I don't think there are any circumstances where we would ever set this to true, so we can eliminate it.
| } | ||
| for_,agent:=rangeworkspaceAgents { | ||
| // Collect information about agents | ||
| agentsGauge.WithLabelValues(VectorOperationAdd,1,agent.OwnerUsername,agent.WorkspaceName,agent.TemplateName,agent.TemplateVersionName.String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
missing the check onTemplateVersionName.Valid from the original
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Thanks for the follow up review 👍 addressed your comments, so going to merge this. |
141ef23 intomainUh oh!
There was an error while loading.Please reload this page.
| } | ||
| for_,metric:=rangem.Metric { | ||
| fmt.Printf("metric: %+v\n",metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Did you mean to remove this?
Over the last 7d, the
GetWorkspacesquery is our most expensive:At least 60k of those 600k calls are from the
prometheusmetrics.Workspacesandprometheusmetrics.Agentspaths just to generate some Prometheus metrics.The
GetWorkspacesquery is very generic, supporting a range of end use cases in our application by joining on other tables to fill out details about each workspace. In the case of theWorkspacesandAgentsmetrics, we didn't actually need the vast majority of the data being returned in each row.In this PR I'm introducing new queries for these paths that only retrieve the workspaces/agents and the required rows for each as they're used within those code paths. This should make these queries significantly less expensive, but at the very least will help us more accurately attribute query cost more to its actual end use case.
As a quick example, an
EXPLAINof a simplified version of the original query has at it's top level:while the new query for
GetWorkspacesForWorkspaceMetricshas:Besides the tests, I ran som queries to confirm that the results of the new query lined up with the currently exported values for the Prometheus metrics on our own internal deployment.