docs: update reference architecture: glossary, scale tests methodology#12438
Merged
+173 −0
20 commits (all by mtojek):

- bce5e87 Glossary WIP
- 00c6e15 Provisioner
- e0a31b4 Glossary done
- eac4e04 WIP Scale tests methodology
- 49e44a2 Traffic projections
- 31f440b WIP
- f105ac8 Fixes
- 4151036 Cian's feedback
- 6d0cc8a Cian's feedback
- 4b6a3d4 Cian's feedback
- 55b2160 WIP
- c953d7e Overlap scenarios
- 47c50df WIP
- a3bfbb6 WIP
- ce7f8a3 WIP
- 92ce63e WIP
- 4540f96 Reword registry
- dccaa84 WIP
- c707d52 WIP
- d3f5afc Last one
docs/admin/reference-architectures.md: 173 changes (173 additions & 0 deletions)
@@ -0,0 +1,173 @@
# Reference architectures

This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 2000 users and outlines, at a high
level, the methodology currently used to scale-test Coder.

## General concepts

This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.

### Administrator

An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.

### Coder

Coder, also known as _coderd_, is the main service. We recommend deploying it
with multiple replicas to ensure high availability. It provides an API for
managing workspaces and templates. Each _coderd_ replica can host multiple
[provisioners](#provisioner).

### User

A user is an individual who uses the Coder platform to develop, test, and
deploy applications using workspaces. Users can select available templates to
provision workspaces. They interact with Coder using the web interface, the CLI
tool, or by calling API methods directly.

### Workspace

A workspace is an isolated development environment where users can write,
build, and run code. Workspaces are fully configurable and can be tailored to
specific project requirements, providing developers with a consistent and
efficient development environment. Workspaces can be autostarted and
autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like
`code-server`, facilitating collaboration and remote access. Additionally,
workspaces can be parameterized, allowing users to customize settings and
configurations based on their unique needs. Workspaces are instantiated using
Coder templates and deployed on resources created by provisioners.

### Template

A template in Coder is a predefined configuration for creating workspaces.
Templates streamline workspace creation by providing pre-configured settings,
tooling, and dependencies. They are built by template administrators on top of
Terraform, allowing for efficient management of infrastructure resources.
Additionally, templates can use Coder modules to reuse features shared with
other templates, enhancing flexibility and consistency across deployments.
Templates describe provisioning rules for infrastructure resources offered by
Terraform providers.

### Workspace Proxy

A workspace proxy relays connections for developers who reach their workspace
over SSH, through a workspace app, or through port forwarding. It helps reduce
network latency for geo-distributed teams by minimizing the distance network
traffic needs to travel. Notably, workspace proxies do not handle dashboard
connections or API calls.

### Provisioner

Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner handles one workspace build at a time, so the
number of provisioners sets the maximum number of concurrent builds.
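
Because each provisioner runs a single build at a time, a rough capacity
estimate follows from simple division. The sketch below is illustrative only:
`build_waves` is a hypothetical helper, and it assumes every build takes about
the same time, which real Terraform builds do not guarantee.

```python
import math


def build_waves(workspaces: int, provisioners: int) -> int:
    """Return how many sequential rounds of builds are needed.

    Assumes one build per provisioner at a time and roughly equal
    build durations (a simplifying assumption of this sketch).
    """
    if provisioners <= 0:
        raise ValueError("at least one provisioner is required")
    return math.ceil(workspaces / provisioners)


# 2000 workspaces built by 50 provisioners need 40 rounds of builds.
print(build_waves(2000, 50))
```

Multiplying the number of rounds by a typical build duration gives a
first-order estimate of how long it takes to provision an entire environment.
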
### Registry

The Coder Registry is a platform where you can find starter templates and
_Modules_ for various cloud services and platforms.

Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.

Please note that the Registry is a hosted service and isn't available for
offline use.

## Scale-testing methodology

Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.

A dedicated Kubernetes cluster for Coder is a Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
[Helm chart](../install/kubernetes#install-coder-with-helm).

Our scale tests include the following stages:

1. Prepare environment: create expected users and provision workspaces.
2. SSH connections: establish user connections with agents, verifying their
   ability to echo back received content.
3. Web Terminal: verify the PTY connection used for communication with Web
   Terminal.
4. Workspace application traffic: assess the handling of user connections with
   specific workspace apps, confirming their capability to echo back received
   content effectively.
5. Dashboard evaluation: verify the responsiveness and stability of Coder
   dashboards under varying load conditions. This is achieved by simulating user
   interactions using instances of headless Chromium browsers.
6. Cleanup: delete workspaces and users created in step 1.
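
The stage ordering above can be sketched as a small driver. This is not Coder's
scale-test tooling: `run_scale_test`, `noop`, and the stage names are
hypothetical, and running cleanup even when a scenario fails is a design
assumption of this sketch rather than documented behavior.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; both are hypothetical placeholders.
Stage = Tuple[str, Callable[[], None]]


def run_scale_test(
    prepare: Stage, scenarios: List[Stage], cleanup: Stage, log: List[str]
) -> None:
    """Run prepare, then each scenario in order; always run cleanup afterwards."""
    name, action = prepare
    action()
    log.append(name)
    try:
        for name, action in scenarios:
            action()
            log.append(name)
    finally:
        # Cleanup runs even when a scenario raises, so test users and
        # workspaces are not left behind (an assumption of this sketch).
        name, action = cleanup
        action()
        log.append(name)


def noop() -> None:
    """Placeholder for real work such as opening SSH connections."""


log: List[str] = []
run_scale_test(
    ("prepare environment", noop),
    [
        ("ssh connections", noop),
        ("web terminal", noop),
        ("workspace apps", noop),
        ("dashboard", noop),
    ],
    ("cleanup", noop),
    log,
)
print(log)
```

Passing the log in as a parameter keeps the record of completed stages
available to the caller even when a scenario raises.
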

### Infrastructure and setup requirements

The scale tests runner can distribute the workload so that individual scenarios
overlap, based on the workflow configuration:

|                      | T0  | T1  | T2  | T3  | T4  | T5  | T6  |
| -------------------- | --- | --- | --- | --- | --- | --- | --- |
| SSH connections      | X   | X   | X   | X   |     |     |     |
| Web Terminal (PTY)   |     | X   | X   | X   | X   |     |     |
| Workspace apps       |     |     | X   | X   | X   | X   |     |
| Dashboard (headless) |     |     |     | X   | X   | X   | X   |

This pattern closely reflects how our customers naturally use the system. SSH
connections are heavily used because they are the primary communication
channel for IDEs with VS Code and JetBrains plugins.
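
The staggered pattern in the table can be reproduced with a short sketch: each
scenario starts one time slot after the previous one and stays active for four
slots. The `duration` and `stagger` parameters are read off the table, not
taken from the actual runner configuration, and the function names are
hypothetical.

```python
from typing import Dict, List

SCENARIOS = [
    "SSH connections",
    "Web Terminal (PTY)",
    "Workspace apps",
    "Dashboard (headless)",
]


def overlap_schedule(
    scenarios: List[str], duration: int = 4, stagger: int = 1
) -> Dict[str, List[int]]:
    """Map each scenario to the time slots (T0, T1, ...) in which it is active."""
    return {
        name: list(range(i * stagger, i * stagger + duration))
        for i, name in enumerate(scenarios)
    }


def render(schedule: Dict[str, List[int]]) -> str:
    """Render the schedule as text, marking active slots with X."""
    slots = max(max(active) for active in schedule.values()) + 1
    rows = []
    for name, active in schedule.items():
        cells = " ".join("X" if t in active else "." for t in range(slots))
        rows.append(f"{name:<22}{cells}")
    return "\n".join(rows)


schedule = overlap_schedule(SCENARIOS)
print(render(schedule))
```

With these defaults, T3 is the only slot in which all four scenarios run at
once, which makes it the point of peak combined load in this model.
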

The basic setup of the scale tests environment involves:

1. Scale tests runner (32 vCPU, 128 GB RAM)
2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
3. Database: 1 instance (2 vCPU, 32 GB RAM)
4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
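
To size a cluster for this setup, the figures above can be totaled. The sketch
below assumes the vCPU and RAM values are per instance and treats 512 MB as
0.5 GB; both are assumptions, since the list does not say so explicitly. The
`Component` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    instances: int
    vcpu: float    # per instance (assumed)
    ram_gb: float  # per instance (assumed)


COMPONENTS = [
    Component("scale tests runner", 1, 32, 128),
    Component("coderd", 2, 4, 16),
    Component("database", 1, 2, 32),
    Component("provisioner", 50, 0.5, 0.5),  # 512 MB taken as 0.5 GB
]


def totals(components: List[Component]) -> Tuple[float, float]:
    """Return (total vCPU, total RAM in GB) across all instances."""
    vcpu = sum(c.instances * c.vcpu for c in components)
    ram_gb = sum(c.instances * c.ram_gb for c in components)
    return vcpu, ram_gb


vcpu, ram_gb = totals(COMPONENTS)
print(f"{vcpu} vCPU, {ram_gb} GB RAM")
```

Under these assumptions the 50 provisioners together account for 25 vCPU,
more than the two _coderd_ replicas and the database combined.
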

The test is deemed successful if users did not experience interruptions in their
workflows, `coderd` did not crash or require restarts, and no other internal
errors were observed.

### Traffic Projections

In our scale tests, we simulate activity from 2000 users, 2000 workspaces, and
2000 agents, with two items of workspace agent metadata being sent every 10
seconds. Here are the resulting metrics:

Coder:

- Median CPU usage for _coderd_: 3 vCPU, peaking at 3.7 vCPU during dashboard
  tests.
- Median API request rate: 350 req/s during dashboard tests, 250 req/s during
  Web Terminal and workspace apps tests.
- 2000 agent API connections with latency: p90 at 60 ms, p95 at 220 ms.
- An average of 2400 WebSocket connections during dashboard tests.
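
These aggregate figures can be broken down per replica and per user. The sketch
below assumes the two _coderd_ replicas from the test setup share requests
evenly, which a real load balancer may not do; the helper name is hypothetical.

```python
def per_unit(total: float, units: int) -> float:
    """Divide an aggregate figure evenly across units (replicas or users)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


REPLICAS = 2  # from the test setup; even load sharing is an assumption
USERS = 2000

# Request rate each replica would serve under even distribution.
print(per_unit(350, REPLICAS))  # dashboard tests
print(per_unit(250, REPLICAS))  # Web Terminal and workspace apps tests

# WebSocket connections per simulated user during dashboard tests.
print(per_unit(2400, USERS))
```
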

Provisionerd:

- Median CPU usage is 0.35 vCPU during workspace provisioning.

Database:

- Median CPU utilization is 80%, with a significant portion dedicated to writing
  metadata.
- Memory utilization averages at 40%.
- `write_ops_count` between 6.7 and 8.4 operations per second.
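
The metadata load behind these numbers follows directly from the test
parameters: 2000 agents each sending two metadata items every 10 seconds. A
minimal sketch of that arithmetic, with a hypothetical function name:

```python
def metadata_items_per_second(
    agents: int, items_per_agent: int, interval_seconds: float
) -> float:
    """Return the aggregate rate of metadata items sent by all agents."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return agents * items_per_agent / interval_seconds


# 2000 agents x 2 items every 10 seconds.
print(metadata_items_per_second(2000, 2, 10))
```

The rate scales linearly with the number of agents and inversely with the
reporting interval, so doubling the interval halves the metadata traffic in
this model.
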