chore: cherry pick for release 2.22 #17842
Merged
…7572)

Closes coder/vscode-coder#447
Closes coder/jetbrains-coder#543
Closes coder/coder-jetbrains-toolbox#21

This PR adds Coder Connect support to `coder ssh --stdio`.

When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI first does a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned and it is within the Coder service prefix, the CLI does not create a new tailnet connection to the workspace, and instead dials the SSH server running on port 22 on the workspace directly over TCP.

This allows IDE extensions to use the Coder Connect tunnel without requiring any modifications to the extensions themselves.

Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe JetBrains?) will be able to read to indicate to the user that they are using Coder Connect.

One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build and the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel.

As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway).

To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR.

<details>
<summary>Benchmark</summary>

## Methodology

All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created. The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace.

To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using:

```
ssh -p 22 -L7001:localhost:7001 <host>
```

where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname.

For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port:

```
tcping -n 100 localhost 7001
```

For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used:

```
iperf3 -c localhost -p 7001
```

where an `iperf3` server was running on the workspace on port 7001.

## Test Cases

### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect

This case tests the implementation in this PR, such that we can write a config like:

```
Host codercliconnect
    ProxyCommand /path/to/coder ssh --stdio workspace
```

With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. The results were as follows:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 788.20 Mbits/sec
- Minimum average throughput: 731 Mbits/sec
- Maximum average throughput: 871 Mbits/sec
- Standard Deviation: 38.88 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.290ms
- Maximum: 0.473ms

### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand`

This is what we assume to be the 'best' way to use Coder Connect.

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 789.50 Mbits/sec
- Minimum average throughput: 708 Mbits/sec
- Maximum average throughput: 839 Mbits/sec
- Standard Deviation: 39.98 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.267ms
- Maximum: 0.440ms

### Testcase 3: `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process

This is what normally happens when you run `coder ssh`:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 610.20 Mbits/sec
- Minimum average throughput: 569 Mbits/sec
- Maximum average throughput: 664 Mbits/sec
- Standard Deviation: 27.29 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.335ms
- Minimum: 0.262ms
- Maximum: 0.452ms

## Analysis

Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, if the two configurations had the same true throughput, a difference at least this large would be expected in about 94.5% of repeated experiments.

## Conclusion

From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR.

Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2.

</details>

(cherry picked from commit 53ba361)
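The Coder Connect check described in the commit above (resolve the workspace hostname, then test whether the answer falls inside the service prefix) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `hypotheticalServicePrefix`, `pickServiceAddr`, `resolveDirectTarget`, and `parseAddrs` are made-up names, and the prefix value is a placeholder rather than the real Coder service prefix.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/netip"
)

// hypotheticalServicePrefix is a placeholder for illustration only; it is NOT
// the real Coder service prefix.
var hypotheticalServicePrefix = netip.MustParsePrefix("fd00:1234::/32")

// pickServiceAddr returns the first address that falls inside prefix.
func pickServiceAddr(addrs []netip.Addr, prefix netip.Prefix) (netip.Addr, bool) {
	for _, a := range addrs {
		if prefix.Contains(a) {
			return a, true
		}
	}
	return netip.Addr{}, false
}

// resolveDirectTarget looks up host and, if one of the returned addresses lies
// inside prefix, returns that address with port 22 so the caller can dial the
// SSH server directly over TCP. A false result means "create a new tunnel".
func resolveDirectTarget(ctx context.Context, host string, prefix netip.Prefix) (netip.AddrPort, bool) {
	addrs, err := net.DefaultResolver.LookupNetIP(ctx, "ip", host)
	if err != nil {
		return netip.AddrPort{}, false
	}
	addr, ok := pickServiceAddr(addrs, prefix)
	if !ok {
		return netip.AddrPort{}, false
	}
	return netip.AddrPortFrom(addr, 22), true
}

// parseAddrs is a small helper for building example inputs.
func parseAddrs(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}
	return out
}

func main() {
	// Static inputs stand in for DNS answers so the example runs offline.
	for _, candidate := range [][]netip.Addr{parseAddrs("fd00:1234::1"), parseAddrs("192.0.2.1")} {
		if addr, ok := pickServiceAddr(candidate, hypotheticalServicePrefix); ok {
			fmt.Println("dial directly:", netip.AddrPortFrom(addr, 22))
		} else {
			fmt.Println("fall back to a new tunnel")
		}
	}
}
```

Keeping the prefix test separate from the DNS lookup makes the decision easy to exercise without network access.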
…17628)

The regular network info file creation code also calls `MkdirAll`.

This wasn't picked up in manual testing as I already had the `/net` folder in my VSCode. It wasn't picked up in automated testing because we use an in-memory FS, which for some reason does this implicitly.

(cherry picked from commit c7fc7b9)
Closes coder/internal#563

The [Coder Connect tunnel](https://github.com/coder/coder/blob/main/vpn/tunnel.go) receives workspace state from the Coder server over a [dRPC stream](https://github.com/coder/coder/blob/114ba4593b2a82dfd41cdcb7fd6eb70d866e7b86/tailnet/controllers.go#L1029). When first connecting to this stream, the current state of the user's workspaces is received, with subsequent messages being diffs on top of that state.

However, if the client disconnects from this stream, such as when the user's device is suspended, and then reconnects later, no mechanism exists for the tunnel to differentiate that message containing the entire initial state from another diff, and so that state is incorrectly applied as a diff.

In practice:
- Tunnel connects, receives a workspace update containing all the existing workspaces & agents.
- Tunnel loses connection, but isn't completely stopped.
- All the user's workspaces are restarted, producing a new set of agents.
- Tunnel regains connection, and receives a workspace update containing all the existing workspaces & agents.
- This initial update is incorrectly applied as a diff, with the Tunnel's state containing both the old & new agents.

This PR introduces a solution in which tunnelUpdater, when created, sends a FreshState flag with the WorkspaceUpdate type. This flag is handled in the vpn tunnel in the following fashion:
- Preserve existing Agents
- Remove current Agents in the tunnel that are not present in the WorkspaceUpdate
- Remove unreferenced Workspaces

(cherry picked from commit 5f516ed)
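The FreshState handling listed above can be sketched like this. It is a simplified model under stated assumptions, not the tunnel's actual code: the `Update` and `Tunnel` types, their fields, and the reading of "unreferenced" as "not named in the update" are all hypothetical.

```go
package main

import "fmt"

// Update is a hypothetical, simplified stand-in for a workspace update.
type Update struct {
	FreshState bool
	Workspaces []string          // workspace IDs named in the update
	Agents     map[string]string // agent ID -> workspace ID
}

// Tunnel holds the state that updates are applied to.
type Tunnel struct {
	workspaces map[string]bool
	agents     map[string]string
}

func NewTunnel() *Tunnel {
	return &Tunnel{workspaces: map[string]bool{}, agents: map[string]string{}}
}

// Apply merges an update. When FreshState is set, anything the tunnel knows
// about that the update does not mention is dropped first; agents that are
// present in both are preserved.
func (t *Tunnel) Apply(u Update) {
	if u.FreshState {
		for id := range t.agents {
			if _, ok := u.Agents[id]; !ok {
				delete(t.agents, id)
			}
		}
		keep := make(map[string]bool, len(u.Workspaces))
		for _, ws := range u.Workspaces {
			keep[ws] = true
		}
		// Assumption: "unreferenced" means not named in the fresh update.
		for ws := range t.workspaces {
			if !keep[ws] {
				delete(t.workspaces, ws)
			}
		}
	}
	for _, ws := range u.Workspaces {
		t.workspaces[ws] = true
	}
	for id, ws := range u.Agents {
		t.agents[id] = ws
	}
}

func (t *Tunnel) AgentCount() int         { return len(t.agents) }
func (t *Tunnel) HasAgent(id string) bool { _, ok := t.agents[id]; return ok }

func main() {
	tun := NewTunnel()
	tun.Apply(Update{FreshState: true, Workspaces: []string{"ws1"}, Agents: map[string]string{"agent-old": "ws1"}})
	// Reconnect after the workspace restarted with a new agent.
	tun.Apply(Update{FreshState: true, Workspaces: []string{"ws1"}, Agents: map[string]string{"agent-new": "ws1"}})
	fmt.Println(tun.AgentCount(), tun.HasAgent("agent-old"), tun.HasAgent("agent-new"))
}
```

Without the flag, the second update would be merged as a diff and both agents would remain in the tunnel's state.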
Don't specify the template version for a delete transition, because the prebuilt workspace may have been created using an older template version.

If the template version isn't explicitly set, the builder will automatically use the version from the last workspace build, which is the desired behavior.

(cherry picked from commit ef11d4f)
PR contains:
- fix for claiming & deleting prebuilds with immutable params
- unit test for claiming scenario
- unit test for deletion scenario

The parameter resolver was failing when deleting/claiming prebuilds because a value for a previously used immutable parameter was provided to the resolver, and the resolver rejected it even though the value was unchanged (it comes from the preset). The resolver was missing a check to see if the old value != new value; if the values match then there's no mutation of an immutable parameter.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
(cherry picked from commit 98e5611)
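The missing check described above amounts to comparing the old and new values before rejecting a change to an immutable parameter. A minimal sketch, assuming a hypothetical `Param` type and `ValidateChange` helper (neither is the resolver's real API):

```go
package main

import (
	"errors"
	"fmt"
)

// Param is a hypothetical, simplified parameter definition.
type Param struct {
	Name    string
	Mutable bool
}

// ErrImmutable is returned when an immutable parameter would change value.
var ErrImmutable = errors.New("parameter is immutable")

// ValidateChange rejects a new value only if the parameter is immutable, had a
// previous value, and the new value actually differs from it.
func ValidateChange(p Param, oldValue, newValue string, hadOld bool) error {
	if p.Mutable || !hadOld {
		return nil
	}
	if oldValue == newValue {
		// Same value supplied again (for example from a preset): no mutation.
		return nil
	}
	return fmt.Errorf("%q: %w", p.Name, ErrImmutable)
}

func main() {
	region := Param{Name: "region", Mutable: false}
	fmt.Println(ValidateChange(region, "us", "us", true)) // unchanged value is accepted
	fmt.Println(ValidateChange(region, "us", "eu", true)) // changed value is rejected
}
```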
Currently we don't have a way to get insight into Postgres connections being exhausted.

By using Prometheus' [`DBStats` collector](https://github.com/prometheus/client_golang/blob/main/prometheus/collectors/dbstats_collector.go), we get some insight out-of-the-box.

```
# HELP go_sql_idle_connections The number of idle connections.
# TYPE go_sql_idle_connections gauge
go_sql_idle_connections{db_name="coder"} 1
# HELP go_sql_in_use_connections The number of connections currently in use.
# TYPE go_sql_in_use_connections gauge
go_sql_in_use_connections{db_name="coder"} 2
# HELP go_sql_max_idle_closed_total The total number of connections closed due to SetMaxIdleConns.
# TYPE go_sql_max_idle_closed_total counter
go_sql_max_idle_closed_total{db_name="coder"} 112
# HELP go_sql_max_idle_time_closed_total The total number of connections closed due to SetConnMaxIdleTime.
# TYPE go_sql_max_idle_time_closed_total counter
go_sql_max_idle_time_closed_total{db_name="coder"} 0
# HELP go_sql_max_lifetime_closed_total The total number of connections closed due to SetConnMaxLifetime.
# TYPE go_sql_max_lifetime_closed_total counter
go_sql_max_lifetime_closed_total{db_name="coder"} 0
# HELP go_sql_max_open_connections Maximum number of open connections to the database.
# TYPE go_sql_max_open_connections gauge
go_sql_max_open_connections{db_name="coder"} 10
# HELP go_sql_open_connections The number of established connections both in use and idle.
# TYPE go_sql_open_connections gauge
go_sql_open_connections{db_name="coder"} 3
# HELP go_sql_wait_count_total The total number of connections waited for.
# TYPE go_sql_wait_count_total counter
go_sql_wait_count_total{db_name="coder"} 28
# HELP go_sql_wait_duration_seconds_total The total time blocked waiting for a new connection.
# TYPE go_sql_wait_duration_seconds_total counter
go_sql_wait_duration_seconds_total{db_name="coder"} 0.086936235
```

`go_sql_wait_count_total` is the metric I'm most interested in gaining, but the others are also very useful.

Changing the prefix is easy (`prometheus.WrapRegistererWithPrefix`), but getting rid of the `go_` segment is not quite so easy. I've kept the changeset small for now.

**NOTE:** I imported a library to determine the database name from the given conn string. It's [not as simple](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING) as one might hope. The database name is used for the `db_name` label.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
(cherry picked from commit c278662)
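The metrics above are derived from Go's `database/sql` pool statistics (`sql.DBStats`, as returned by `(*sql.DB).Stats()`). As a rough illustration of how the wait counters can be read, the sketch below compares two snapshots using only the standard library. It is not the PR's code: `saturated`, `waitDelta`, and the example snapshot values are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// saturated reports whether every allowed connection is currently in use.
// A MaxOpenConnections of 0 means "unlimited", so it never counts as saturated.
func saturated(s sql.DBStats) bool {
	return s.MaxOpenConnections > 0 && s.InUse >= s.MaxOpenConnections
}

// waitDelta returns how many new waits happened between two snapshots and the
// average time each of those waits was blocked.
func waitDelta(prev, cur sql.DBStats) (int64, time.Duration) {
	waits := cur.WaitCount - prev.WaitCount
	if waits <= 0 {
		return 0, 0
	}
	return waits, (cur.WaitDuration - prev.WaitDuration) / time.Duration(waits)
}

// Hypothetical snapshots; in a real program these would come from db.Stats().
var (
	exampleBefore = sql.DBStats{MaxOpenConnections: 10, InUse: 2, WaitCount: 24, WaitDuration: 70 * time.Millisecond}
	exampleAfter  = sql.DBStats{MaxOpenConnections: 10, InUse: 10, WaitCount: 28, WaitDuration: 78 * time.Millisecond}
)

func main() {
	waits, avg := waitDelta(exampleBefore, exampleAfter)
	fmt.Printf("new waits=%d avg wait=%s saturated=%t\n", waits, avg, saturated(exampleAfter))
}
```

A rising wait count while the pool is saturated is the pattern that points at connection exhaustion.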
…n exhaustion (#17648)

Database transactions hold onto connections, and `pubsub.Publish` tries to acquire a connection of its own. If the latter is called within a transaction, this can lead to connection exhaustion.

I plan two follow-ups to this PR:

1. Make connection counts tuneable
   https://github.com/coder/coder/blob/main/cli/server.go#L2360-L2376
   We will then be able to write tests showing how connection exhaustion occurs.
2. Write a linter/ruleguard to prevent `pubsub.Publish` from being called within a transaction.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
(cherry picked from commit a646478)
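One general way to avoid publishing while a transaction still holds its connection is to queue events inside the transaction and publish them only after it returns successfully. The sketch below illustrates that pattern under stated assumptions; it is not necessarily how this PR restructures the code, and `Event`, `Publisher`, `RunThenPublish`, and `recorder` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a message to publish once the transaction has finished.
type Event struct {
	Channel string
	Payload []byte
}

// Publisher is a minimal stand-in for a pubsub client.
type Publisher interface {
	Publish(channel string, payload []byte) error
}

// errRollback is used in the example to simulate a failed transaction.
var errRollback = errors.New("rolled back")

// RunThenPublish runs tx, letting it enqueue events instead of publishing
// them. Events are published only after tx returns nil, so Publish never
// competes for a connection while the transaction still holds one.
func RunThenPublish(tx func(enqueue func(Event)) error, pub Publisher) error {
	var pending []Event
	if err := tx(func(e Event) { pending = append(pending, e) }); err != nil {
		return err // nothing is published for a failed transaction
	}
	for _, e := range pending {
		if err := pub.Publish(e.Channel, e.Payload); err != nil {
			return fmt.Errorf("publish %q: %w", e.Channel, err)
		}
	}
	return nil
}

// recorder is a Publisher that remembers which channels were published to.
type recorder struct{ channels []string }

func (r *recorder) Publish(channel string, _ []byte) error {
	r.channels = append(r.channels, channel)
	return nil
}

func main() {
	rec := &recorder{}
	err := RunThenPublish(func(enqueue func(Event)) error {
		enqueue(Event{Channel: "workspace_updated"})
		return nil
	}, rec)
	fmt.Println(err, rec.channels)
}
```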
`Collect()` is called whenever the `/metrics` endpoint is hit to retrieve metrics.

The queries used in prebuilds metrics collection are quite heavy, and we want to avoid having them running concurrently / too often to keep db load down.

Here I'm moving towards a background retrieval of the state required to set the metrics, which gets invalidated every interval.

Also introduces `coderd_prebuilt_workspaces_metrics_last_updated` which operators can use to determine when these metrics go stale.

See #17789 as well.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
(cherry picked from commit b2a1de9)
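The background-retrieval idea above can be sketched as a small cache that is refreshed on a timer and read cheaply on every scrape. This is an illustrative sketch only: `Snapshot`, `Cache`, and the `fetch` callback are hypothetical, and the real collector's types and fields will differ.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Snapshot is the cached state a scrape reads; UpdatedAt plays the role of a
// "last updated" timestamp that tells operators how stale the data is.
type Snapshot struct {
	Value     int
	UpdatedAt time.Time
}

// Cache runs the expensive fetch in the background so that reads never do.
type Cache struct {
	mu    sync.RWMutex
	snap  *Snapshot
	fetch func() (int, error)
}

func NewCache(fetch func() (int, error)) *Cache {
	return &Cache{fetch: fetch}
}

// Refresh runs the expensive fetch once and replaces the snapshot on success.
func (c *Cache) Refresh() error {
	v, err := c.fetch()
	if err != nil {
		return err // keep serving the previous snapshot
	}
	c.mu.Lock()
	c.snap = &Snapshot{Value: v, UpdatedAt: time.Now()}
	c.mu.Unlock()
	return nil
}

// Run refreshes immediately and then once per interval until ctx is done.
func (c *Cache) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		_ = c.Refresh()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Current returns the latest snapshot without touching the database.
func (c *Cache) Current() (Snapshot, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.snap == nil {
		return Snapshot{}, false
	}
	return *c.snap, true
}

func main() {
	cache := NewCache(func() (int, error) { return 42, nil })
	_ = cache.Refresh()
	snap, ok := cache.Current()
	fmt.Println(snap.Value, ok)
}
```

With this shape, hitting the scrape path many times in a row costs only a read lock, and the heavy query runs at most once per interval.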
Avoids two sequential scans of massive tables (`workspace_builds`, `provisioner_jobs`) and uses index scans instead. This new view largely replicates our already optimized query `GetWorkspaces` to fetch the latest build.

The original query and the new query were compared against the dogfood database to ensure they return the exact same data in the exact same order (minus the new `workspaces.deleted = false` filter to improve performance even more). The performance is massively improved even without the `workspaces.deleted = false` filter, but it was added to improve it even more.

Note: these query times are probably inflated due to high database load on our dogfood environment that this intends to partially resolve.

Before: 2,139ms ([explain](https://explain.dalibo.com/plan/997e4fch241b46e6))
After: 33ms ([explain](https://explain.dalibo.com/plan/c888dc223870f181))

Co-authored-by: Cian Johnston <cian@coder.com>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
(cherry picked from commit ef745c0)
The changes in `coder/preview` necessitated the changes in `codersdk/richparameters.go` & `provisioner/terraform/resources.go`.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
(cherry picked from commit 3ee95f1)
Emyrk approved these changes May 15, 2025
3a5c2d7
into release/2.22 27 of 31 checks passed
Incomplete, generating changelog.