- Notifications
You must be signed in to change notification settings - Fork948
fix: prioritise human-initiated builds over prebuilds#18933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
fix: prioritise human-initiated builds over prebuilds#18933
Conversation
…ueueThis change implements a priority queue system for provisioner jobs to ensurethat human-initiated workspace builds are processed before prebuild jobs,improving user experience during high queue periods.Changes:- Add priority column to provisioner_jobs table (1=human, 0=prebuild)- Update AcquireProvisionerJob query to order by priority DESC, created_at ASC- Set priority in workspace builder based on initiator (PrebuildsSystemUserID)- Expose priority field in API and SDK- Add comprehensive test for priority queue behaviorCo-authored-by: kylecarbs <7122116+kylecarbs@users.noreply.github.com>
…bs are picked up first
Uh oh!
There was an error while loading.Please reload this page.
@@ -26,6 +26,8 @@ WHERE | |||
-- they are aliases and the code that calls this query already relies on a different type | |||
AND provisioner_tagset_contains(@provisioner_tags :: jsonb,potential_job.tags :: jsonb) | |||
ORDER BY | |||
-- Ensure that human-initiated jobs are prioritized over prebuilds. | |||
potential_job.initiator_id='c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is a nice idea, I like the simplicity of prioritizing human-initiated jobs.
That said, I’m a bit concerned about the risk ofprebuild starvation under heavy load. If I understand the query correctly, we’re first ordering by whether the job is human-initiated (higher priority), and then bycreated_at
within each group. That meansprebuild jobs will only be picked up once all human-initiated jobs have been processed.
Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue? I know we don’t have an explicitenqueued_at
field, but maybecreated_at
could serve that purpose. That way, we could still prioritize human jobs, but allow older prebuilds to eventually be picked up and get processed, avoiding starvation while maintaining the intended prioritization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.
Correct. The assumption here is that there will never be a constant stream of human-initiated jobs. If this is the case, then the deployment is undersized and requires more provisioners.
Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue?
I'm not sure the added complexity is worthwhile here. How likely is it for users to be creating a constant stream of provisioner jobs similar in volume to that of the prebuilds reconciler?
Also, we need to decide how long is 'too long' for a prebuild to wait? 1 minute? 10 minutes? Does it need to be configurable?
Finally, after the initial waiting period elapses, users will still end up in the situation where they will be stuck waiting on a bunch of provisioner jobs (as the prebuilds reconciler creates provisioner jobs in batches).
My take is thatif you are running into this problem in the first place, you need more provisioners.
I do like the suggestion of leveragingcoder_workspace_tags
to conditionally send prebuilds to a separate set of provisioners, but it appears thattfparse
is not able to evaluate that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I had the same starvation concern in the original PR, but your point convinced me@johnstcn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do we need to update that query that tells us the queue position too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
oof maybe yea
coderabbitaibot commentedJul 21, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
WalkthroughThe changes introduce prioritization of human-initiated provisioner jobs over prebuild jobs in the job acquisition and queue ranking logic. SQL queries and their Go wrappers are updated to reflect this prioritization, and a new test is added to verify the behavior. No functional changes are made outside these areas. Changes
Sequence Diagram(s)sequenceDiagram participant User participant DB participant Provisioner User->>DB: Submit provisioner job (initiator_id: human or prebuild) loop Job Queue Provisioner->>DB: AcquireProvisionerJob() alt Human-initiated jobs present DB-->>Provisioner: Return next human-initiated job else DB-->>Provisioner: Return next prebuild job end Provisioner->>DB: Mark job as complete end Estimated code review effort2 (~15 minutes) Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat withCodeRabbit:
SupportNeed help? Create a ticket on oursupport page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
coderd/database/querier_test.go (1)
1370-1370
:Logging format matches previous review suggestion.The logging statement correctly implements the format suggested in the past review comments.
🧹 Nitpick comments (5)
coderd/database/queries/provisionerjobs.sql (1)
78-83
:Wide scan ofprovisioner_jobs
could hurt latency
pending_jobs
pullsall pending jobs (job_status = 'pending'
) before filtering by provisioner tags later. In busy installs this can be thousands of rows per call.If practical, narrow the CTE early (e.g., by
organization_id
,provisioner
, or time window) or ensure an index on(job_status, organization_id)
exists.coderd/provisionerjobs.go (1)
16-19
:Import grouping now diverges fromgoimports
outputThe blank line inserted after
cdr.dev/slog
splits third-party packages into two groups, sogoimports
will re-write this block. Re-rungoimports -w .
to keep CI lint happy.coderd/database/queries.sql.go (3)
8540-8541
:Consider parameterizing the hardcoded prebuild user UUID.The prioritization logic is correct and well-commented. However, the hardcoded UUID
c42fdf75-3097-471c-8c33-fb52454d81c0
should ideally be parameterized to improve maintainability and avoid magic values in the codebase.Consider adding this as a query parameter or using a configuration constant that can be passed to the query.
8790-8790
:Consistent prioritization logic implemented correctly.The ROW_NUMBER() window function correctly applies the same human-initiated job prioritization within each partition. The logic is consistent with the job acquisition query.
Same concern as before about the hardcoded UUID - consider parameterizing this value across all queries for consistency and maintainability.
8905-8905
:Consistent global queue position calculation with proper prioritization.The ROW_NUMBER() function correctly implements the prioritization logic for global queue position calculation. The logic is consistent across all modified queries.
To complete the refactoring suggestion from earlier comments, this hardcoded UUID should also be parameterized when addressing the maintainability concerns across all queries.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
coderd/database/querier_test.go
(2 hunks)coderd/database/queries.sql.go
(5 hunks)coderd/database/queries/provisionerjobs.sql
(5 hunks)coderd/provisionerjobs.go
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go
📄 CodeRabbit Inference Engine (.cursorrules)
**/*.go
: The codebase is rigorously linted with golangci-lint to maintain consistent code quality.
Coder emphasizes clear error handling, with specific patterns required: Concise error messages that avoid phrases like "failed to"; Wrapping errors with%w
to maintain error chains; Using sentinel errors with the "err" prefix (e.g.,errNotFound
).
**/*.go
: OAuth2-compliant error responses must use writeOAuth2Error in Go code
Public endpoints needing system access should use dbauthz.AsSystemRestricted(ctx) when calling GetOAuth2ProviderAppByClientID
Authenticated endpoints with user context should use ctx directly when calling GetOAuth2ProviderAppByClientID
Follow Uber Go Style Guide
Files:
coderd/provisionerjobs.go
coderd/database/queries.sql.go
coderd/database/querier_test.go
**/*_test.go
📄 CodeRabbit Inference Engine (.cursorrules)
**/*_test.go
: All tests must uset.Parallel()
to run concurrently, which improves test suite performance and helps identify race conditions.
All tests should run in parallel usingt.Parallel()
to ensure efficient testing and expose potential race conditions.
**/*_test.go
: Use unique identifiers in concurrent Go tests to prevent race conditions (e.g., fmt.Sprintf with t.Name() and time.Now().UnixNano())
Never use hardcoded names in concurrent Go tests
Files:
coderd/database/querier_test.go
coderd/database/queries/*.sql
📄 CodeRabbit Inference Engine (.cursorrules)
All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.
Modify coderd/database/queries/*.sql files for database changes
Files:
coderd/database/queries/provisionerjobs.sql
🧠 Learnings (4)
coderd/provisionerjobs.go (10)
Learnt from: ThomasK33
PR:#18809
File: coderd/apidoc/swagger.json:2160-2186
Timestamp: 2025-07-21T14:30:40.568Z
Learning: For the Coder repository, API specification or standards compliance issues should be raised in the Go source files (e.g., handler annotations or swaggo comments), not in auto-generated files like coderd/apidoc/swagger.json.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/coderd.go : The REST API is defined incoderd/coderd.go
and uses Chi for HTTP routing.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/coderdtest/**/* : Thecoderdtest
package incoderd/coderdtest/
provides utilities for creating test instances of the Coder server, setting up test users and workspaces, and mocking external components.
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to coderd/database/queries/.sql : Modify coderd/database/queries/.sql files for database changes
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : Follow Uber Go Style Guide
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*.go : Coder emphasizes clear error handling, with specific patterns required: Concise error messages that avoid phrases like "failed to"; Wrapping errors with%w
to maintain error chains; Using sentinel errors with the "err" prefix (e.g.,errNotFound
).
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/dbauthz/*.go : The database authorization (dbauthz) system enforces fine-grained access control across all database operations. All database operations must pass through this layer to ensure security.
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : OAuth2-compliant error responses must use writeOAuth2Error in Go code
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : Public endpoints needing system access should use dbauthz.AsSystemRestricted(ctx) when calling GetOAuth2ProviderAppByClientID
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to enterprise/audit/table.go : If audit errors occur after database changes, update enterprise/audit/table.go
coderd/database/queries.sql.go (2)
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to coderd/database/queries/.sql : Modify coderd/database/queries/.sql files for database changes
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/database/queries/*.sql : All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.
coderd/database/querier_test.go (10)
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/coderdtest/**/* : Thecoderdtest
package incoderd/coderdtest/
provides utilities for creating test instances of the Coder server, setting up test users and workspaces, and mocking external components.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*.go : Coder emphasizes clear error handling, with specific patterns required: Concise error messages that avoid phrases like "failed to"; Wrapping errors with%w
to maintain error chains; Using sentinel errors with the "err" prefix (e.g.,errNotFound
).
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderdenttest/**/* : Enterprise features have dedicated test utilities in thecoderdenttest
package.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/coderd.go : The REST API is defined incoderd/coderd.go
and uses Chi for HTTP routing.
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*_test.go : Never use hardcoded names in concurrent Go tests
Learnt from: ThomasK33
PR:#18809
File: coderd/apidoc/swagger.json:2160-2186
Timestamp: 2025-07-21T14:30:40.568Z
Learning: For the Coder repository, API specification or standards compliance issues should be raised in the Go source files (e.g., handler annotations or swaggo comments), not in auto-generated files like coderd/apidoc/swagger.json.
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*_test.go : Use unique identifiers in concurrent Go tests to prevent race conditions (e.g., fmt.Sprintf with t.Name() and time.Now().UnixNano())
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/dbauthz/*.go : The database authorization (dbauthz) system enforces fine-grained access control across all database operations. All database operations must pass through this layer to ensure security.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*_test.go : All tests must uset.Parallel()
to run concurrently, which improves test suite performance and helps identify race conditions.
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*_test.go : All tests should run in parallel usingt.Parallel()
to ensure efficient testing and expose potential race conditions.
coderd/database/queries/provisionerjobs.sql (2)
Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to coderd/database/queries/.sql : Modify coderd/database/queries/.sql files for database changes
Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/database/queries/*.sql : All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: test-go-pg (windows-2022)
- GitHub Check: test-go-pg (macos-latest)
- GitHub Check: test-e2e
- GitHub Check: test-go-race-pg
- GitHub Check: test-go-pg-17
- GitHub Check: test-go-pg (ubuntu-latest)
- GitHub Check: build-dylib
- GitHub Check: lint
- GitHub Check: gen
🔇 Additional comments (3)
coderd/database/querier_test.go (1)
1326-1398
:LGTM! Well-structured test for job prioritization logic.This test effectively validates the prioritization of human-initiated jobs over prebuild jobs. The implementation is solid:
- Properly uses
t.Parallel()
as required by coding guidelines- Creates alternating job types with proper time ordering to simulate a realistic queue
- Correctly verifies the expected acquisition order (human jobs first, then prebuilds)
- Includes helpful logging for debugging
- Follows Go testing best practices with clear Given/When/Then structure
The test logic correctly validates that the SQL changes for job prioritization are working as intended.
coderd/database/queries.sql.go (2)
8775-8775
:LGTM!Adding
initiator_id
to the SELECT statement is necessary to support the prioritization logic in the queue position calculation.
8890-8890
:LGTM!Consistent addition of
initiator_id
to support prioritization logic in the subsequent queue position calculation.
-- Ensure that human-initiated jobs are prioritized over prebuilds. | ||
potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid, | ||
potential_job.created_at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
🛠️ Refactor suggestion
Hard-coded prebuild-user UUID – please parameterize & document ordering semantics
A literal'c42fdf75-3097-471c-8c33-fb52454d81c0'
is repeated in threeORDER BY
clauses.
Problems:
- Future changes to the prebuild user ID require touching every query – error-prone.
- Relying on PostgreSQL’s default ascending sort for a boolean expression (
initiator_id = …
) is subtle; a quick skim may mis-read the intent (FALSE
<TRUE
).
Consider:
- ORDER BY potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid,+ ORDER BY (potential_job.initiator_id = @prebuilds_user_id::uuid) ASC,
and passprebuilds_user_id
as a sqlc argument (or reference asettings
table).
This eliminates magic literals, makes tests easier to set up with alternate IDs, and documents thatASC
(puttingFALSE
/human jobs first) is deliberate.
Also applies to: 94-95, 147-149
🤖 Prompt for AI Agents
In coderd/database/queries/provisionerjobs.sql around lines 29 to 31, replacethe hard-coded UUID 'c42fdf75-3097-471c-8c33-fb52454d81c0' with a parameternamed prebuilds_user_id passed into the query. Update all ORDER BY clauses usingthis UUID to use the parameter instead, and explicitly specify ASC or DESC forboolean expressions to clarify ordering intent. Add documentation commentsexplaining the ordering semantics and the purpose of the prebuilds_user_idparameter. This change should also be applied to similar code at lines 94-95 and147-149.
Uh oh!
There was an error while loading.Please reload this page.
Continues from#18882
ORDER BY initiator_id = $PREBUILDS_USER_ID
toAcquireProvisionerJob
Summary by CodeRabbit