feat(scaletest): add runner for prebuilds #20571
base: main
Conversation
ethanndickson commented Oct 30, 2025 • edited
This stack of pull requests is managed by Graphite. Learn more about stacking.
```go
func TestRun(t *testing.T) {
	t.Parallel()
	t.Skip("This test takes several minutes to run, and is intended as a manual regression test")
```
ethanndickson Oct 30, 2025 • edited
I had a brief attempt at doing this with a fake API client, and it just became so complex so quickly. Because we're uploading the template from within the runner, we can't use the echo provisioner either, which would be considerably faster. With the Terraform provisioner it takes about 2 minutes.
Force-pushed from f0b61b4 to fff0900, then from fff0900 to d40117d.

```hcl
data "coder_workspace_preset" "presets" {
  count = local.num_presets
```
Is there some reason we need a local for this, instead of just `{{.NumPresets}}`?
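For illustration, a minimal sketch of the direct-substitution approach, assuming the Terraform source is rendered through Go's `text/template` and the data struct exposes a `NumPresets` field (names here are illustrative, not taken from the PR):

```go
package main

import (
	"os"
	"text/template"
)

// A pared-down version of the idea: substitute the preset count straight
// into the Terraform source rather than routing it through an HCL local.
const templateSource = `
data "coder_workspace_preset" "presets" {
  count = {{.NumPresets}}
}
`

func main() {
	tmpl := template.Must(template.New("tf").Parse(templateSource))
	// In the real runner the count would come from the scaletest config.
	_ = tmpl.Execute(os.Stdout, struct{ NumPresets int }{NumPresets: 4})
}
```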
```go
if err != nil {
	return nil, err
}
var result strings.Builder
```
I think a normal bytes buffer is fine here, since you never actually care about getting the result as a string.
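A sketch of the buffer-based variant being suggested, assuming the surrounding function renders a template and returns bytes (the function and parameter names are illustrative):

```go
package prebuilds

import (
	"bytes"
	"text/template"
)

// renderTemplate accumulates the rendered output in a bytes.Buffer and
// returns the raw bytes; no string conversion is needed if nothing
// downstream wants the result as a string.
func renderTemplate(tmpl *template.Template, data any) ([]byte, error) {
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```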
```go
for _, ws := range workspaces.Workspaces {
	if ws.LatestBuild.Job.Status == codersdk.ProvisionerJobRunning ||
		ws.LatestBuild.Job.Status == codersdk.ProvisionerJobSucceeded {
```
Jobs could technically fail. I know the template is really trivial, but I think we have a timeout for provisioners, and under extreme load some could time out and have their jobs marked failed. The prebuilds reconciler should retry them, but I'm curious how you think they should be counted in the metrics.
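One way the failed case could be surfaced, sketched with the codersdk status constants; the helper name and the decision to count failures separately are illustrative, not part of the PR:

```go
package prebuilds

import "github.com/coder/coder/v2/codersdk"

// countPrebuildJobs tallies workspaces by their latest build's job status,
// so that jobs marked failed (e.g. provisioner timeouts under load) show
// up explicitly instead of being silently excluded from the counts.
func countPrebuildJobs(workspaces []codersdk.Workspace) (runningOrSucceeded, failed int) {
	for _, ws := range workspaces {
		switch ws.LatestBuild.Job.Status {
		case codersdk.ProvisionerJobRunning, codersdk.ProvisionerJobSucceeded:
			runningOrSucceeded++
		case codersdk.ProvisionerJobFailed:
			failed++
		}
	}
	return runningOrSucceeded, failed
}
```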
```go
Subsystem: "scaletest",
Name:      "prebuild_job_creation_latency_seconds",
Help:      "Time from when prebuilds are unpaused to when all the template build jobs have been created.",
}, []string{"template_name"}),
```
The default buckets for a histogram are:

```go
var DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
```

That is, anything that takes longer than 10 seconds isn't a good fit, because you'll only fill one bucket and be unable to infer anything about the distribution other than the fact that it's longer than 10 seconds.
The other thing to say is that we are only going to record a single observation per template, so it doesn't make sense for `template_name` to be a label for this histogram. If you actually want to look at the latency per template, you need to make this a gauge and just set it to the latency.
With some tests only creating 12 or 18 templates, I kind of question the utility of a histogram anyway.
Prebuilds only reconcile every minute by default, so I think we've made a kind of tacit choice that we don't really care about the latency of prebuilds at the level of seconds. Another option for these metrics would be to have them be gauges for the number of workspaces created, acquired, and completed (and failed, if any). We'd only see their values change at the speed of Prometheus scraping (15-30s is typical), but I think we'd all be happy if everything completes within 30s at scale, and we wouldn't worry too much about the dynamics on the per-second level.
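For reference, a sketch of the per-template gauge alternative using prometheus/client_golang; the metric is set once to the observed latency rather than observed into a histogram (registration and naming details are illustrative):

```go
package prebuilds

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// A gauge labelled by template: with a single observation per template,
// setting a gauge to the latency conveys everything a histogram would,
// without the bucket-resolution problem.
var prebuildJobCreationLatency = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Subsystem: "scaletest",
	Name:      "prebuild_job_creation_latency_seconds",
	Help:      "Time from when prebuilds are unpaused to when all the template build jobs have been created.",
}, []string{"template_name"})

// RecordJobCreation sets the gauge for one template; registration with a
// prometheus.Registerer would happen once at metrics setup.
func RecordJobCreation(latency time.Duration, templateName string) {
	prebuildJobCreationLatency.WithLabelValues(templateName).Set(latency.Seconds())
}
```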
```go
r.cfg.DeletionBarrier.Wait()
logger.Info(ctx, "all runners reached deletion barrier, proceeding with prebuild deletion")
err = r.measureDeletion(ctx, logger)
```
I really don't think this should be in Cleanup. I kinda see your reasoning, but this is part of the test run because we are measuring stuff.
The comment on the harness cleanup says:

```go
// Cleanup should be called after the test run has finished and results have
// been collected.
func (h *TestHarness) Cleanup(ctx context.Context) (err error) {
```

The standard thing to do is run the harness, collect results, and then finally clean up. If you did that with a runner like this, you wouldn't get the deletion metrics as part of the results. For the sake of clarity and consistency, put this prebuild deletion stuff in the main Run.
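A structural sketch of that ordering, using stand-in types; only the Run/Cleanup split is the point, and the method and field names are illustrative rather than the runner's real signatures:

```go
package prebuilds

import "context"

// Stand-in types for the scaletest runner; real signatures differ.
type barrier interface{ Wait() }

type Runner struct {
	deletionBarrier barrier
}

func (r *Runner) measureCreation(ctx context.Context) error { return nil }
func (r *Runner) measureDeletion(ctx context.Context) error { return nil }
func (r *Runner) deleteTemplate(ctx context.Context) error  { return nil }

// Run performs both measured phases, so the deletion metrics exist by the
// time the harness collects results.
func (r *Runner) Run(ctx context.Context) error {
	if err := r.measureCreation(ctx); err != nil {
		return err
	}
	// Wait for every runner before tearing the prebuilds down.
	r.deletionBarrier.Wait()
	return r.measureDeletion(ctx)
}

// Cleanup only does teardown that is not being measured, after results
// have been collected.
func (r *Runner) Cleanup(ctx context.Context) error {
	return r.deleteTemplate(ctx)
}
```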
```go
	}
}
if r.prebuildDeletionJobCreationLatency == 0 && (currentCount < targetNumWorkspaces || deletingCount > 0) {
```
The metric is supposed to be when *all* workspaces have their deleting job created, but this will trigger if *any* deleting jobs are created.
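A sketch of the corrected guard, assuming deletingCount tracks workspaces whose deletion build job exists and targetNumWorkspaces is the total being torn down; the helper and the deletionStart parameter are hypothetical:

```go
package prebuilds

import "time"

// maybeRecordDeletionJobCreation sets the latency only once every
// workspace has a deletion job created, not when the first deleting
// job appears.
func maybeRecordDeletionJobCreation(latency *time.Duration, deletionStart time.Time, deletingCount, targetNumWorkspaces int) {
	if *latency == 0 && deletingCount == targetNumWorkspaces {
		*latency = time.Since(deletionStart)
	}
}
```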
```go
r.cfg.Metrics.RecordDeletionJobCreation(r.prebuildDeletionJobCreationLatency, r.template.Name)
}
if r.prebuildDeletionJobAcquiredLatency == 0 && deletingCount > 0 {
```
Again, this triggers on *any*, when it's supposed to be *all*.
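And the analogous fix for the acquired metric, again only firing once every deletion job has been picked up; acquiredCount is a hypothetical counter of deletion jobs a provisioner has started on:

```go
package prebuilds

import "time"

// maybeRecordDeletionJobAcquired sets the latency only once all deletion
// jobs have been acquired by a provisioner, mirroring the creation metric.
func maybeRecordDeletionJobAcquired(latency *time.Duration, deletionStart time.Time, acquiredCount, targetNumWorkspaces int) {
	if *latency == 0 && acquiredCount == targetNumWorkspaces {
		*latency = time.Since(deletionStart)
	}
}
```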

Relates to coder/internal#914
Adds a runner for scaletesting prebuilds. The runner uploads a no-op template with prebuilds, watches for the corresponding workspaces to be created, and then does the same to tear them down. I didn't originally plan on having metrics for the teardown, but I figured we might as well, as it's still the same prebuilds reconciliation loop mechanism being tested.