NotificationsYou must be signed in to change notification settings
Fork948
Star10.5k

fix: prioritise human-initiated builds over prebuilds#18933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

johnstcn wants to merge7 commits intomain

base:main

Choose a base branch

fromcj/fix-database-AcquireProvisionerJob-prebuild-priority

Open

fix: prioritise human-initiated builds over prebuilds#18933

johnstcn wants to merge7 commits intomainfromcj/fix-database-AcquireProvisionerJob-prebuild-priority

+88 −8

Conversation

Copy link

Member

johnstcn commentedJul 21, 2025•
edited by coderabbitaibot
Loading

Continues from#18882

Reverts extraneous changes
Adds explicitORDER BY initiator_id = $PREBUILDS_USER_ID toAcquireProvisionerJob
Improves test added for above PR

Summary by CodeRabbit

New Features
- Human-initiated jobs are now prioritized ahead of prebuild jobs in the job queue.
Bug Fixes
- Improved accuracy of job queue position reporting for human-initiated and prebuild jobs.
Tests
- Added tests to verify correct prioritization of human-initiated jobs over prebuild jobs.

blink-sobotand others added5 commits

July 21, 2025 12:31

feat: prioritize human-initiated workspace builds over prebuilds in q…

a314c51

…ueueThis change implements a priority queue system for provisioner jobs to ensurethat human-initiated workspace builds are processed before prebuild jobs,improving user experience during high queue periods.Changes:- Add priority column to provisioner_jobs table (1=human, 0=prebuild)- Update AcquireProvisionerJob query to order by priority DESC, created_at ASC- Set priority in workspace builder based on initiator (PrebuildsSystemUserID)- Expose priority field in API and SDK- Add comprehensive test for priority queue behaviorCo-authored-by: kylecarbs <7122116+kylecarbs@users.noreply.github.com>

revert addition of provisioner_jobs.priority

bce74b7

adjust ordering in AcquireProvisionerJob to ensure human-initiated jo…

a6e9987

…bs are picked up first

improve test and move to correct location

0112806

fixup! improve test and move to correct location

9ad95c9

johnstcn self-assigned this

Jul 21, 2025

johnstcn requested review fromssncferreira,SasSwart andEmyrk

July 21, 2025 11:36

johnstcn commented

Jul 21, 2025

View reviewed changes

coderd/database/querier_test.go OutdatedShow resolvedHide resolved

improve test logging

6571937

ssncferreira reviewed

Jul 21, 2025

View reviewed changes

coderd/database/queries/provisionerjobs.sql

		@@ -26,6 +26,8 @@ WHERE
		-- they are aliases and the code that calls this query already relies on a different type
		AND provisioner_tagset_contains(@provisioner_tags :: jsonb,potential_job.tags :: jsonb)
		ORDER BY
		-- Ensure that human-initiated jobs are prioritized over prebuilds.
		potential_job.initiator_id='c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid,

Copy link

Contributor

ssncferreiraJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is a nice idea, I like the simplicity of prioritizing human-initiated jobs.
That said, I’m a bit concerned about the risk ofprebuild starvation under heavy load. If I understand the query correctly, we’re first ordering by whether the job is human-initiated (higher priority), and then bycreated_at within each group. That meansprebuild jobs will only be picked up once all human-initiated jobs have been processed.

Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue? I know we don’t have an explicitenqueued_at field, but maybecreated_at could serve that purpose. That way, we could still prioritize human jobs, but allow older prebuilds to eventually be picked up and get processed, avoiding starvation while maintaining the intended prioritization.

Copy link

MemberAuthor

johnstcnJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.

Correct. The assumption here is that there will never be a constant stream of human-initiated jobs. If this is the case, then the deployment is undersized and requires more provisioners.

Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue?

I'm not sure the added complexity is worthwhile here. How likely is it for users to be creating a constant stream of provisioner jobs similar in volume to that of the prebuilds reconciler?

Also, we need to decide how long is 'too long' for a prebuild to wait? 1 minute? 10 minutes? Does it need to be configurable?

Finally, after the initial waiting period elapses, users will still end up in the situation where they will be stuck waiting on a bunch of provisioner jobs (as the prebuilds reconciler creates provisioner jobs in batches).

My take is thatif you are running into this problem in the first place, you need more provisioners.

I do like the suggestion of leveragingcoder_workspace_tags to conditionally send prebuilds to a separate set of provisioners, but it appears thattfparse is not able to evaluate that right now.

Copy link

Contributor

SasSwartJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I had the same starvation concern in the original PR, but your point convinced me@johnstcn.

Copy link

Member

EmyrkJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Do we need to update that query that tells us the queue position too?

Copy link

MemberAuthor

johnstcnJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

oof maybe yea

update queue position based on initiator_id

64c28a9

Copy link

coderabbitaibot commentedJul 21, 2025•
edited
Loading

Walkthrough

The changes introduce prioritization of human-initiated provisioner jobs over prebuild jobs in the job acquisition and queue ranking logic. SQL queries and their Go wrappers are updated to reflect this prioritization, and a new test is added to verify the behavior. No functional changes are made outside these areas.

Changes

File(s)	Change Summary
coderd/database/querier_test.go	Added a new test (`TestAcquireProvisionerJob`) to verify prioritization of human-initiated jobs in the queue.
coderd/database/queries.sql.go, coderd/database/queries/provisionerjobs.sql	Updated SQL queries and Go wrappers to prioritize jobs where`initiator_id` matches the human UUID before others.
coderd/provisionerjobs.go	Added a blank line in the import section; no functional changes.

Sequence Diagram(s)

sequenceDiagram    participant User    participant DB    participant Provisioner    User->>DB: Submit provisioner job (initiator_id: human or prebuild)    loop Job Queue        Provisioner->>DB: AcquireProvisionerJob()        alt Human-initiated jobs present            DB-->>Provisioner: Return next human-initiated job        else            DB-->>Provisioner: Return next prebuild job        end        Provisioner->>DB: Mark job as complete    end

Estimated code review effort

2 (~15 minutes)

Poem

In the warren where jobs await their turn,
Humans now hop to the front, we learn!
Prebuilds, be patient, don’t fret or stew—
The queue’s been reshuffled for something new.
With SQL and tests, the carrots align,
Priorities sorted—oh, how divine! 🥕

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat withCodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag@coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag@coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on oursupport page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings togenerate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add@coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add@coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add@coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a.coderabbit.yaml file to the root of your repository.
Please see theconfiguration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit ourDocumentation for detailed information on how to use CodeRabbit.
Join ourDiscord Community to get help, request features, and share feedback.
Follow us onX/Twitter for updates and announcements.

coderabbitaibot reviewed

Jul 21, 2025

View reviewed changes

Copy link

coderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)

coderd/database/querier_test.go (1)
1370-1370:Logging format matches previous review suggestion.
The logging statement correctly implements the format suggested in the past review comments.

🧹 Nitpick comments (5)

coderd/database/queries/provisionerjobs.sql (1)
78-83:Wide scan ofprovisioner_jobs could hurt latency
pending_jobs pullsall pending jobs (job_status = 'pending') before filtering by provisioner tags later. In busy installs this can be thousands of rows per call.
If practical, narrow the CTE early (e.g., byorganization_id,provisioner, or time window) or ensure an index on(job_status, organization_id) exists.
coderd/provisionerjobs.go (1)
16-19:Import grouping now diverges fromgoimports output
The blank line inserted aftercdr.dev/slog splits third-party packages into two groups, sogoimports will re-write this block. Re-rungoimports -w . to keep CI lint happy.
coderd/database/queries.sql.go (3)
8540-8541:Consider parameterizing the hardcoded prebuild user UUID.
The prioritization logic is correct and well-commented. However, the hardcoded UUIDc42fdf75-3097-471c-8c33-fb52454d81c0 should ideally be parameterized to improve maintainability and avoid magic values in the codebase.
Consider adding this as a query parameter or using a configuration constant that can be passed to the query.
8790-8790:Consistent prioritization logic implemented correctly.
The ROW_NUMBER() window function correctly applies the same human-initiated job prioritization within each partition. The logic is consistent with the job acquisition query.
Same concern as before about the hardcoded UUID - consider parameterizing this value across all queries for consistency and maintainability.
8905-8905:Consistent global queue position calculation with proper prioritization.
The ROW_NUMBER() function correctly implements the prioritization logic for global queue position calculation. The logic is consistent across all modified queries.
To complete the refactoring suggestion from earlier comments, this hardcoded UUID should also be parameterized when addressing the maintainability concerns across all queries.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between0d3b770 and64c28a9.

📒 Files selected for processing (4)

coderd/database/querier_test.go (2 hunks)
coderd/database/queries.sql.go (5 hunks)
coderd/database/queries/provisionerjobs.sql (5 hunks)
coderd/provisionerjobs.go (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.go

📄 CodeRabbit Inference Engine (.cursorrules)

**/*.go: The codebase is rigorously linted with golangci-lint to maintain consistent code quality.
Coder emphasizes clear error handling, with specific patterns required: Concise error messages that avoid phrases like "failed to"; Wrapping errors with%w to maintain error chains; Using sentinel errors with the "err" prefix (e.g.,errNotFound).
**/*.go: OAuth2-compliant error responses must use writeOAuth2Error in Go code
Public endpoints needing system access should use dbauthz.AsSystemRestricted(ctx) when calling GetOAuth2ProviderAppByClientID
Authenticated endpoints with user context should use ctx directly when calling GetOAuth2ProviderAppByClientID
Follow Uber Go Style Guide

Files:

coderd/provisionerjobs.go
coderd/database/queries.sql.go
coderd/database/querier_test.go

**/*_test.go

📄 CodeRabbit Inference Engine (.cursorrules)

**/*_test.go: All tests must uset.Parallel() to run concurrently, which improves test suite performance and helps identify race conditions.
All tests should run in parallel usingt.Parallel() to ensure efficient testing and expose potential race conditions.
**/*_test.go: Use unique identifiers in concurrent Go tests to prevent race conditions (e.g., fmt.Sprintf with t.Name() and time.Now().UnixNano())
Never use hardcoded names in concurrent Go tests

Files:

coderd/database/querier_test.go

coderd/database/queries/*.sql

📄 CodeRabbit Inference Engine (.cursorrules)

All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.
Modify coderd/database/queries/*.sql files for database changes

Files:

coderd/database/queries/provisionerjobs.sql

🧠 Learnings (4)

coderd/provisionerjobs.go (10)

Learnt from: ThomasK33
PR:#18809
File: coderd/apidoc/swagger.json:2160-2186
Timestamp: 2025-07-21T14:30:40.568Z
Learning: For the Coder repository, API specification or standards compliance issues should be raised in the Go source files (e.g., handler annotations or swaggo comments), not in auto-generated files like coderd/apidoc/swagger.json.

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/coderd.go : The REST API is defined incoderd/coderd.go and uses Chi for HTTP routing.

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/coderdtest/**/* : Thecoderdtest package incoderd/coderdtest/ provides utilities for creating test instances of the Coder server, setting up test users and workspaces, and mocking external components.

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to coderd/database/queries/.sql : Modify coderd/database/queries/.sql files for database changes

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : Follow Uber Go Style Guide

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*.go : Coder emphasizes clear error handling, with specific patterns required: Concise error messages that avoid phrases like "failed to"; Wrapping errors with%w to maintain error chains; Using sentinel errors with the "err" prefix (e.g.,errNotFound).

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderd/dbauthz/*.go : The database authorization (dbauthz) system enforces fine-grained access control across all database operations. All database operations must pass through this layer to ensure security.

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : OAuth2-compliant error responses must use writeOAuth2Error in Go code

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*.go : Public endpoints needing system access should use dbauthz.AsSystemRestricted(ctx) when calling GetOAuth2ProviderAppByClientID

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to enterprise/audit/table.go : If audit errors occur after database changes, update enterprise/audit/table.go

coderd/database/queries.sql.go (2)

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.034Z
Learning: Applies to coderd/database/queries/*.sql : All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.

coderd/database/querier_test.go (10)

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to coderdenttest/**/* : Enterprise features have dedicated test utilities in thecoderdenttest package.

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*_test.go : Never use hardcoded names in concurrent Go tests

Learnt from: CR
PR: coder/coder#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:32:56.474Z
Learning: Applies to **/*_test.go : Use unique identifiers in concurrent Go tests to prevent race conditions (e.g., fmt.Sprintf with t.Name() and time.Now().UnixNano())

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*_test.go : All tests must uset.Parallel() to run concurrently, which improves test suite performance and helps identify race conditions.

Learnt from: CR
PR: coder/coder#0
File: .cursorrules:0-0
Timestamp: 2025-07-21T14:32:43.035Z
Learning: Applies to **/*_test.go : All tests should run in parallel usingt.Parallel() to ensure efficient testing and expose potential race conditions.

coderd/database/queries/provisionerjobs.sql (2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: test-go-pg (windows-2022)
GitHub Check: test-go-pg (macos-latest)
GitHub Check: test-e2e
GitHub Check: test-go-race-pg
GitHub Check: test-go-pg-17
GitHub Check: test-go-pg (ubuntu-latest)
GitHub Check: build-dylib
GitHub Check: lint
GitHub Check: gen

🔇 Additional comments (3)

coderd/database/querier_test.go (1)
1326-1398:LGTM! Well-structured test for job prioritization logic.
This test effectively validates the prioritization of human-initiated jobs over prebuild jobs. The implementation is solid:
Properly usest.Parallel() as required by coding guidelines
Creates alternating job types with proper time ordering to simulate a realistic queue
Correctly verifies the expected acquisition order (human jobs first, then prebuilds)
Includes helpful logging for debugging
Follows Go testing best practices with clear Given/When/Then structure
The test logic correctly validates that the SQL changes for job prioritization are working as intended.
coderd/database/queries.sql.go (2)
8775-8775:LGTM!
Addinginitiator_id to the SELECT statement is necessary to support the prioritization logic in the queue position calculation.
8890-8890:LGTM!
Consistent addition ofinitiator_id to support prioritization logic in the subsequent queue position calculation.

coderd/database/queries/provisionerjobs.sql

Comment on lines +29 to 31

		-- Ensure that human-initiated jobs are prioritized over prebuilds.
		potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid,
		potential_job.created_at

Copy link

coderabbitaibotJul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

🛠️ Refactor suggestion

Hard-coded prebuild-user UUID – please parameterize & document ordering semantics

A literal'c42fdf75-3097-471c-8c33-fb52454d81c0' is repeated in threeORDER BY clauses.
Problems:

Future changes to the prebuild user ID require touching every query – error-prone.
Relying on PostgreSQL’s default ascending sort for a boolean expression (initiator_id = …) is subtle; a quick skim may mis-read the intent (FALSE <TRUE ).

Consider:

-        ORDER BY potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid,+        ORDER BY (potential_job.initiator_id = @prebuilds_user_id::uuid) ASC,

and passprebuilds_user_id as a sqlc argument (or reference asettings table).

This eliminates magic literals, makes tests easier to set up with alternate IDs, and documents thatASC (puttingFALSE/human jobs first) is deliberate.

Also applies to: 94-95, 147-149

🤖 Prompt for AI Agents

In coderd/database/queries/provisionerjobs.sql around lines 29 to 31, replacethe hard-coded UUID 'c42fdf75-3097-471c-8c33-fb52454d81c0' with a parameternamed prebuilds_user_id passed into the query. Update all ORDER BY clauses usingthis UUID to use the parameter instead, and explicitly specify ASC or DESC forboolean expressions to clarify ordering intent. Add documentation commentsexplaining the ordering semantics and the purpose of the prebuilds_user_idparameter. This change should also be applied to similar code at lines 94-95 and147-149.